the Gene Ontology

The Reference Genome Annotation Project

As more genomes are sequenced, the volume of genomic information is growing rapidly. Because resources for manual annotation are limited, automatic annotation will be the method of choice for many groups. The GO Consortium coordinates an effort to maximize and optimize the GO annotation of a large and representative set of key genomes, known as 'reference genomes'. The goal of this project is to completely annotate twelve reference genomes so that those annotations can be used to seed the automatic annotation of other genomes.

  • Reference Species and Databases
  • Priorities for Annotation
  • Overview of project strategy
  • How does this project differ from standard GO annotation?
  • How do we know when GO annotation is comprehensive?
  • Where can GO annotations from the project be viewed?
  • Reference Genome Graphs
  • Concluding Remarks
  • Activities

Reference Species and Databases

The reference genomes and responsible database groups are:

  • Arabidopsis thaliana (The Arabidopsis Information Resource (TAIR))
  • Caenorhabditis elegans (WormBase)
  • Danio rerio (zebrafish; Zebrafish Information Network (ZFIN))
  • Dictyostelium discoideum (dictyBase)
  • Drosophila melanogaster (FlyBase)
  • Escherichia coli (EcoliHub)
  • Gallus gallus (AgBase)
  • Homo sapiens (human Gene Ontology Annotation [GOA] @ EBI)
  • Mus musculus (Mouse Genome Informatics)
  • Rattus norvegicus (Rat Genome Database (RGD))
  • Saccharomyces cerevisiae (Saccharomyces Genome Database (SGD))
  • Schizosaccharomyces pombe (GeneDB S. pombe)

The Reference Genome GO Annotation Team, with trained and highly skilled GO curators from each genome annotation group, coordinates annotation, facilitates implementation of GO Consortium annotation priorities, and provides quantitative measures to assess progress toward the goal of broad and deep annotation of the reference genomes. This group represents the annotation expertise within the GO Consortium and provides key liaisons to the model organism databases that have primary responsibilities for the annotation of the reference genomes.

Priorities for Annotation

Our ultimate goal is to provide comprehensive GO annotation for all gene products in each of the reference genomes. This is a huge task that requires prioritizing curation targets. Our initial annotation efforts (August 2006 to September 2007) focused on orthologs of human disease genes; in October 2007 we widened our list to four priority areas:

  • Orthologs of human disease genes
  • Topical or 'hot' genes
  • Genes conserved from E. coli to human but currently lacking GO annotation
  • Genes involved in biochemical and/or signaling pathways

Each month we curate genes from each category, as selected by one of the participating databases on a rotational basis.

Overview of project strategy

Every month each database curates the same set of genes from our annotation priority list. Working on the same genes together promotes cross-organism discussion about annotations and frequently leads to new terms being added to the Gene Ontology. We start from a set of genes selected from the human genome.

Curation process:

  • Identify the ortholog(s)/homolog(s) of the selected target genes in each species. Not all species may have orthologs/homologs to selected genes.
  • Enter the gene identifiers in a shared spreadsheet so that all curators can see the set of genes being curated.
  • Collect and annotate available literature about the genes.
  • Assign GO terms based on experimental data.
  • Review existing GO annotations to make sure they conform to agreed GO annotation standards.
  • Record in the shared spreadsheet that the gene in question is considered comprehensively annotated as of a given date.

A web tool for reference genome annotation is under development. This will help curators to track and compare annotations, thus streamlining the annotation process.

How does this project differ from standard GO annotation?

The reference genome databases have agreed to follow more stringent guidelines than those used for standard GO annotation:

  • Experimental evidence codes (IDA, IPI, IMP, IGI, IEP) should be used where possible. The ultimate objective would be to provide experimentally-based annotations for all gene products from these organisms.
  • Terms inferred from sequence or structural similarity (ISS) should only be used where the terms are supported by experimental evidence for the similar sequence.
  • Non-traceable author statements (NAS) should not be used.
  • No new annotations should be based on traceable author statement (TAS); existing terms assigned with TAS should gradually be replaced with the appropriate experimental evidence code based on the primary literature.
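
The guidelines above can be read as a small set of rules on evidence codes. The following is a minimal sketch of such a check, not a tool used by the project. The function name, its parameters, and the returned labels are hypothetical; only the evidence codes and the rules themselves come from the list above.

```python
# Experimental evidence codes named in the guidelines above.
EXPERIMENTAL = {"IDA", "IPI", "IMP", "IGI", "IEP"}


def check_evidence(code, similar_seq_has_experimental=False, is_new=True):
    """Classify one evidence code against the reference genome guidelines.

    Returns one of the hypothetical labels "preferred", "allowed",
    "replace", "rejected", or "not covered".
    """
    code = code.upper()
    if code in EXPERIMENTAL:
        return "preferred"
    if code == "ISS":
        # ISS is acceptable only when the term is supported by
        # experimental evidence for the similar sequence.
        return "allowed" if similar_seq_has_experimental else "rejected"
    if code == "NAS":
        return "rejected"
    if code == "TAS":
        # No new TAS annotations; existing ones should be replaced
        # gradually with an experimental code from the primary literature.
        return "rejected" if is_new else "replace"
    # Codes not mentioned in the guidelines are left to the curator.
    return "not covered"
```

For example, `check_evidence("TAS", is_new=False)` flags an existing annotation for replacement rather than rejecting it, which mirrors the gradual replacement described in the last guideline.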

How do we know when GO annotation is comprehensive?

The amount of literature per gene varies widely. Where possible, we review every paper about a given gene for each organism and capture all possible GO terms. This is only feasible when there are tens of papers. For genes associated with hundreds or even thousands of publications, we cannot read all of the papers, so we prioritize the literature and aim to capture all functional attributes of the gene in the annotations. In these situations, we often start with recent reviews that lead us to key experimental papers. Users are encouraged to notify us if we have failed to capture some aspect of a specific gene by sending comments to the GO helpdesk.

When there are no experimental data for any of the reference genome species, but experimental data are available in other model systems, we submit GO annotations for the relevant species to GOA so that this information is captured from the primary literature.

Where can GO annotations from the project be viewed?

All GO annotations from this project are included in the gene association files that each group submits to GO. Annotations can also be viewed using the GO search engine and browser AmiGO.
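
As an illustration of working with gene association files directly, the sketch below counts evidence codes per gene product and lists the gene products that have no experimentally based annotation. It assumes the tab-delimited layout described in the GO File Format Guide, with the database object ID in column 2 and the evidence code in column 7, and comment lines starting with "!". The function names are hypothetical and the sample lines are invented placeholders, not real records.

```python
from collections import Counter

# Experimental evidence codes preferred by the reference genome project.
EXPERIMENTAL = {"IDA", "IPI", "IMP", "IGI", "IEP"}


def evidence_counts(lines):
    """Count evidence codes per gene product in gene association lines."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < 7:
            continue  # skip lines too short to hold an evidence code
        object_id, evidence = cols[1], cols[6]
        counts.setdefault(object_id, Counter())[evidence] += 1
    return counts


def lacks_experimental(counts):
    """Return the sorted IDs that have no experimental evidence code."""
    return sorted(
        obj for obj, codes in counts.items() if not (set(codes) & EXPERIMENTAL)
    )


# Invented sample lines for illustration only (assumed column order).
sample = [
    "! comment line",
    "\t".join(["DB", "GENE1", "g1", "", "GO:0000001", "REF:1", "IDA", "", "P"]),
    "\t".join(["DB", "GENE1", "g1", "", "GO:0000002", "REF:2", "TAS", "", "F"]),
    "\t".join(["DB", "GENE2", "g2", "", "GO:0000003", "REF:3", "ISS", "", "C"]),
]
print(lacks_experimental(evidence_counts(sample)))
```

To run this on a real file, pass an open file handle to `evidence_counts` in place of `sample`.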

Reference Genome Graphs

The reference genome effort can also be viewed on its own. A graphical representation of the annotations to a particular gene across the reference genomes is available in the Reference Genome graphs directory.

Each curated reference gene links to one graph. In addition to the graph, each page includes two tables: one comparing organism annotations for each term (rows are GO terms, columns are organisms), and one showing the full experimental annotations in each organism for the given gene. This facilitates comparison of the curation status in the twelve reference genomes and helps curators to identify genes that need attention.
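
The term-by-organism comparison table described above can be approximated with a simple pivot. This is a minimal sketch under stated assumptions, not the code behind the graph pages: the function names and the input shape, a list of (organism, GO term, evidence code) tuples, are hypothetical.

```python
def term_by_organism(annotations):
    """Build {go_term: {organism: sorted evidence codes}} from tuples."""
    table = {}
    for organism, go_term, evidence in annotations:
        table.setdefault(go_term, {}).setdefault(organism, set()).add(evidence)
    return {
        term: {org: sorted(codes) for org, codes in row.items()}
        for term, row in table.items()
    }


def render(table, organisms):
    """Render rows as GO terms and columns as organisms, tab-separated."""
    lines = ["\t".join(["GO term"] + list(organisms))]
    for term in sorted(table):
        # An organism with no annotation to the term is shown as "-".
        cells = [",".join(table[term].get(org, [])) or "-" for org in organisms]
        lines.append("\t".join([term] + cells))
    return "\n".join(lines)
```

A "-" cell marks an organism with no annotation to that term for the gene, which is the kind of gap the comparison is meant to make visible to curators.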

Figure: Partial graph of gene POLA

Concluding Remarks

This project aims to improve annotations across a wide range of organisms. The resulting high-quality annotations should improve the electronic annotations that propagate from this resource and will facilitate cross-species functional comparison. Furthermore, easy comparison of annotations between organisms may lead to new hypotheses and inspire new research.

Activities

  • Monthly conference calls
  • First Reference Genome Annotation Meeting, Princeton, NJ, Sept 26 - 27, 2007

Last modified Tuesday, 11-Dec-2007 10:40:10 PST
Copyright © 1999-2008 the Gene Ontology