Manual transfer of experimentally-verified manual GO annotation data to orthologs by curator judgment of sequence similarity.

AgBase, BHF-UCL, Parkinson's UK-UCL, dictyBase, HGNC, Roslin Institute, FlyBase and UniProtKB curators.; 2011

Method for transferring manual annotations to an entry based on a curator’s judgment of its similarity to a putative ortholog that has annotations that are supported with experimental evidence. Annotations are created when a curator judges that the sequence of a protein shows high similarity to another protein that has annotation(s) supported by experimental evidence (and therefore display one of the evidence codes EXP, IDA, IGI, IMP, IPI or IEP). Annotations resulting from the transfer of GO terms display the ‘ISS’ evidence code and include an accession for the protein from which the annotation was projected in the ‘with’ field (column 8). This field can contain either a UniProtKB accession or an IPI (International Protein Index) identifier. Only annotations with an experimental evidence code and which do not have the ‘NOT’ qualifier are transferred. Putative orthologs are chosen using information combined from a variety of complementary sources. Potential orthologs are initially identified using sequence similarity search programs such as BLAST. Orthology relationships are then verified manually using a combination of resources including sequence analysis tools, phylogenetic and comparative genomics databases such as Ensembl Compara, INPARANOID and OrthoMCL, as well as other specialised databases such as species-specific collections (e.g. HGNC’s HCOP). In all cases curators check each alignment and use their experience to assess whether similarity is considered to be strong enough to infer that the two proteins have a common function so that they can confidently project an annotation. While there is no fixed cut-off point in percentage sequence similarity, generally proteins which have greater than 30% identity that covers greater than 80% of the length of both proteins are examined further. For mammalian proteins this cut-off tends to be higher, with an average of 80% identity over 90% of the length of both proteins. Strict orthologs are desirable but not essential. In general, when there is evidence of multiple paralogs for a single species, annotations using less specific GO terms are transferred to the paralogs, however, annotations using more specific GO terms may be transferred to the most similar paralog in each species, this decision is taken on a case by case basis and may be influenced by statements by researchers in the field. Further detailed information on this procedure, including how ISS annotations are made to protein isoforms, can be found at: https://wiki.geneontology.org/Inferred_from_Sequence_or_structural_Similarity_(ISS).

External xrefs
  • dictyBase_REF:9
  • J:342587
  • FB:FBrf0255270