File Format Guide

GO Formats

The GO File Format Guide documents the structure and syntax of the files available on the GO website, to assist users who need to read, write parsers for, or create these files. The following file formats are documented separately:


Annotation is the process of assigning GO terms to gene products. The annotation data in the GO database is contributed by members of the GO Consortium, and the Consortium continually encourages new groups to start contributing their annotations. The links below give details on the GO annotation policies and the annotation process, and direct users to other pages of interest on GO annotation conventions, the standard operating procedures used by some consortium members, and the GO annotation file format guide.

Ontology Documentation

The Gene Ontology project provides controlled vocabularies of defined terms representing gene product properties. These cover three domains: Cellular Component, the parts of a cell or its extracellular environment; Molecular Function, the elemental activities of a gene product at the molecular level, such as binding or catalysis; and Biological Process, operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.


The Gene Ontology Project

Introduction to GO

The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products across databases. Founded in 1998, the project began as a collaboration between three model organism databases, FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD). The GO Consortium (GOC) has since grown to incorporate many databases, including several of the world's major repositories for plant, animal, and microbial genomes. The GO Contributors page lists all member organizations.

IC: Inferred by Curator

Updated September 22, 2011

The IC evidence code is to be used for those cases where an annotation is not supported by any direct evidence, but can be reasonably inferred by a curator from other GO annotations, for which evidence is available.

RCA: Inferred from Reviewed Computational Analysis

Updated November 9, 2007

Note: Annotations using the RCA code should be reviewed after one year; any annotation older than that will be deleted.

ISO: Inferred from Sequence Orthology

  • Pairwise or multiple alignments between a query protein and experimentally characterized match proteins when the proteins are established to be orthologs of each other.
  • Phylogenetic analysis of a set of proteins to define orthologous groups.
  • An entry in the with field is mandatory.
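The mandatory with entry for ISO can be checked mechanically. The following is a minimal sketch, not a GO Consortium tool: it assumes each annotation has already been parsed into a dictionary, and the key names ("id", "evidence", "with") and the sample identifiers are hypothetical.

```python
def iso_missing_with(records):
    """Return the ISO-coded records whose with field is empty or blank."""
    return [
        r for r in records
        if r["evidence"] == "ISO" and not r.get("with", "").strip()
    ]


# Hypothetical sample records; the IDs are placeholders, not real accessions.
records = [
    {"id": "GENE1", "evidence": "ISO", "with": "DB:0001"},
    {"id": "GENE2", "evidence": "ISO", "with": ""},
    {"id": "GENE3", "evidence": "IEA", "with": ""},  # not ISO, so not checked here
]

flagged = iso_missing_with(records)
print([r["id"] for r in flagged])
```

Only the ISO rule stated above is checked; other evidence codes may carry their own requirements that this sketch does not cover.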

With/From Column Usage

We are aware that usage of the with/from column has varied. Some groups have listed IDs in the with/from field of a single annotation line to indicate specific interactions that occur in pairwise or other specific combinations. Others have used the with/from field to list all interactions with that gene that are described in a paper, without any indication as to whether they occur at the same time or not.
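The practical effect of this variability shows up as soon as the field is parsed. The sketch below assumes that multiple IDs in one with/from field are separated by a pipe character; that separator, the function name, and the sample IDs are assumptions made for illustration, not details taken from this page.

```python
def split_with_from(field, sep="|"):
    """Split a with/from field into individual IDs, dropping empty parts."""
    return [part.strip() for part in field.split(sep) if part.strip()]


# Hypothetical field value; the IDs are placeholders.
ids = split_with_from("DB:0001|DB:0002")
print(ids)
```

The resulting list says nothing about whether the listed interactions occur together or independently, so a consumer cannot recover the annotating group's intent from the field alone.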

Automatically-assigned Evidence Codes

The Automatically-assigned Evidence Code is:

IEA: Inferred from Electronic Annotation

Note: Annotations using the IEA code should be reviewed after one year; any annotation older than that will be deleted.
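The one-year rule stated for IEA (and for RCA earlier on this page) lends itself to a simple age check. This is a hedged sketch only: it assumes the annotation date is already available as a Python date, and it approximates "one year" as 365 days. The function name and the reference date are hypothetical.

```python
from datetime import date

# Evidence codes that this page describes as subject to the one-year rule.
TIME_LIMITED_CODES = {"IEA", "RCA"}


def needs_review(evidence, annotation_date, today, max_age_days=365):
    """Return True if a time-limited annotation is older than max_age_days."""
    if evidence not in TIME_LIMITED_CODES:
        return False
    return (today - annotation_date).days > max_age_days


today = date(2012, 1, 1)  # fixed reference date so the example is reproducible
print(needs_review("IEA", date(2010, 6, 1), today))  # True: older than one year
print(needs_review("IEA", date(2011, 6, 1), today))  # False: within one year
print(needs_review("IC", date(2005, 1, 1), today))   # False: code is not time-limited
```

Passing the reference date explicitly, rather than calling date.today() inside the function, keeps the check deterministic and easy to test.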