Electronic Gene Ontology annotations created by ARBA machine learning models

UniProt; 2021

Association-Rule-Based Annotator (ARBA) predicts Gene Ontology (GO) terms among other types of functional annotation such as Protein Description (DE), Keywords (KW), Enzyme Commission numbers (EC), subcellular LOcation (LO), etc. For all annotation types, reviewed UniProtKB/Swiss-Prot records having manual annotations as reference data are used to perform the machine learning phase and generate prediction models. For GO terms, ARBA has an additional feature to augment reference data using the relations between GO terms in the GO graph. The data augmentation is based on adding more general annotations into records containing manual GO terms, which will result in richer reference data. The predicted GO terms are then propagated to all unreviewed UniProtKB/TrEMBL proteins that meet the conditions of ARBA models. GO annotations using this technique receive the evidence code Inferred from Electronic Annotation (IEA; ECO:0000501). Links: ARBA documentation at UniProt (https://www.uniprot.org/help/arba), Blog on ARBA (http://insideuniprot.blogspot.com/2020/09/association-rule-based-annotator-arba.html).

External xrefs
  • J:342609