Gene Ontology annotation of human sequence-specific DNA binding transcription factors (DbTFs) based on the TFClass database

Marcio Luis Acencio (1), George Georghiou (2), Sandra Orchard (2), Liv Thommensen (1), Martin Kuiper (1) and Astrid Lægreid (1). (1) Norwegian University of Science and Technology (NTNU), Trondheim, Norway; (2) European Bioinformatics Institute (EBI), Hinxton, Cambridgeshire, United Kingdom; 2018

The TFClass database (http://tfclass.bioinf.med.uni-goettingen.de/index.jsf) provides a comprehensive classification of mammalian DNA-binding transcription factors (DbTFs) based on their DNA-binding domains (DBDs) (PMID:29087517). TFClass uses a five-level classification (superclass, class, family, subfamily, and genus), in which the four highest levels represent groups defined by structural and sequence similarities (more details at http://www.edgar-wingender.de/TFClass_schema.html). This classification combines background knowledge of the molecular structural features of DBDs (PMID:9340487, PMID:23427989) with phylogenetic trees constructed via multiple sequence alignment with hierarchical clustering of manually validated DBDs and/or full-length protein sequences retrieved from UniProt (PMID:23427989, PMID:23180794).
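
As an illustration of this hierarchy, the following Python sketch splits a classification identifier into its levels. It assumes that identifiers are dotted decimals in which each successive number corresponds to one level (for example, a hypothetical "1.2.3" denoting a family). The function name tfclass_levels and the example identifier are illustrative assumptions and are not taken from TFClass itself.

```python
# Level names in order, from the highest level down.
LEVELS = ("superclass", "class", "family", "subfamily", "genus")


def tfclass_levels(identifier: str) -> dict[str, str]:
    """Map each level name to the identifier prefix that names that level.

    Assumption: identifiers are dotted decimals such as "1.2.3",
    with one number per level. This format is assumed for illustration.
    """
    parts = identifier.split(".")
    if not 1 <= len(parts) <= len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a dotted-decimal identifier: {identifier!r}")
    # Pair each level name with the prefix made of the first i + 1 numbers.
    return {
        level: ".".join(parts[: i + 1])
        for i, level in enumerate(LEVELS[: len(parts)])
    }


# Hypothetical family-level identifier
print(tfclass_levels("1.2.3"))
```

Under this assumption, an identifier with three numbers resolves to a superclass, a class, and a family, and shorter or longer identifiers resolve to fewer or more levels accordingly.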

The NTNU curation team has evaluated each family and assigned, as appropriate, a molecular function annotation, GO:0000981 (DNA-binding transcription factor activity, RNA polymerase II-specific), and a cellular component annotation, GO:0000790 (nuclear chromatin). The superclass/class/family or subfamily ID is specified in the “With/From” field. The annotations are supported by the evidence code ECO:0005556 (multiple sequence alignment evidence used in manual assertion).
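
The resulting annotations can be pictured as simple records. The Python sketch below is a minimal illustration, not the curation team's actual pipeline or file format: the Annotation class, the annotate function, the protein accession "P00000", and the With/From value "TFClass:1.2.3" are all hypothetical. Only the GO terms and the evidence code are taken from the text above.

```python
from dataclasses import dataclass

MF_TERM = "GO:0000981"    # DNA-binding transcription factor activity, RNA polymerase II-specific
CC_TERM = "GO:0000790"    # nuclear chromatin
EVIDENCE = "ECO:0005556"  # multiple sequence alignment evidence used in manual assertion


@dataclass(frozen=True)
class Annotation:
    """One annotation record (hypothetical layout, for illustration only)."""
    protein_id: str
    go_id: str
    aspect: str      # "F" = molecular function, "C" = cellular component
    evidence: str
    with_from: str   # classification ID that supports the annotation


def annotate(protein_id: str, classification_id: str, include_cc: bool = True) -> list[Annotation]:
    """Build the molecular function record and, optionally, the cellular component record."""
    records = [Annotation(protein_id, MF_TERM, "F", EVIDENCE, classification_id)]
    if include_cc:
        records.append(Annotation(protein_id, CC_TERM, "C", EVIDENCE, classification_id))
    return records


# Hypothetical accession and classification ID
for record in annotate("P00000", "TFClass:1.2.3"):
    print(record)
```

The include_cc flag reflects the "as appropriate" wording: the sketch leaves it to the caller to decide whether the cellular component annotation applies to a given protein.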