GO Annotation File (GAF) Format 2.1

Annotation data is submitted to the GO Consortium in the form of gene association files, or GAFs. This guide lays out the format specifications for GAF 2.1; for the previous GAF 2.0 file syntax, please see the GAF 2.0 file format guide.

For the first GAF 1.0 file syntax, please see the GAF 1.0 file format guide.

Please see the information on the changes in GAF 2.1.

The gene association files submitted by GO Consortium members are shown in the tables below. Files are in the GO annotation file format and are compressed using the UNIX gzip utility. Please see the appropriate README file for further details on the annotation set. Any errors or omissions in annotations should be reported by writing to the GO Helpdesk.
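As a rough illustration of working with one of these gzip-compressed files, the Python sketch below reads the data rows directly without unpacking the archive first. It assumes that lines beginning with ! are header or comment lines and that columns are tab-separated; the function name read_annotation_rows is made up for this example.

```python
import gzip


def read_annotation_rows(path):
    """Yield each data row of a gzip-compressed annotation file as a list of fields.

    Assumptions (not stated in the text above): lines starting with '!' are
    header/comment lines, and columns are separated by tab characters.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and '!' header/comment lines.
            if not line or line.startswith("!"):
                continue
            yield line.split("\t")
```

Because the function is a generator, large annotation sets can be streamed row by row rather than loaded into memory at once.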

Ontology and annotation data is integrated in the MySQL and XML files. See the GO database guide for more information.

GO Enrichment Analysis

One of the main uses of the GO is to perform enrichment analysis on gene sets. For example, given a set of genes that are up-regulated under certain conditions, an enrichment analysis will find which GO terms are over-represented (or under-represented) using annotations for that gene set.
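The text does not say which statistic is used to decide that a term is over-represented. One common choice is a one-sided hypergeometric test, sketched below in Python under that assumption. The function name and its parameters are illustrative only: population is the number of genes in the background set, annotated is how many of those carry the GO term, sample is the size of the gene set of interest, and hits is how many genes in that set carry the term.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample.

    This is the upper tail of a hypergeometric distribution: drawing `sample`
    genes without replacement from `population` genes, of which `annotated`
    carry the GO term. Assumes 0 <= hits and annotated, sample <= population.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A small p-value suggests the term appears in the gene set more often than chance alone would explain. In practice the test is repeated for many terms, so some multiple-testing correction is normally applied on top of this.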

GO Tools Registry

External Mapping File Format

Mappings of GO have been made to many other classification systems; a full list is available on the Mappings to GO page. This page describes the format of these files.

Format Specification

The source of the external file is given in the line beginning !Uses:, which also carries a date (e.g. 15 aug 2000).

The line syntax for mappings is:

external database:term identifier (id/name) > GO:GO term name ; GO:id

For example:
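A minimal Python sketch of reading a line in this syntax, using a made-up mapping line (the database name EXDB, the term names, and both identifiers are placeholders, not real entries). The function name parse_mapping_line is hypothetical, and treating lines that start with ! as headers is an assumption based on the !Uses: line described above.

```python
def parse_mapping_line(line):
    """Split one mapping line into its external and GO parts.

    Returns None for blank lines and '!' header lines (an assumption),
    and raises ValueError when the line does not follow the syntax.
    """
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    # Left of ' > ' is the external term, right of it is the GO term.
    external, _, go_part = line.partition(" > ")
    if not go_part:
        raise ValueError("missing ' > ' separator: " + line)
    database, _, external_term = external.partition(":")
    # The GO side is 'GO:<term name> ; GO:<id>'; split on the last ' ; '.
    go_name, _, go_id = go_part.rpartition(" ; ")
    if not go_name.startswith("GO:"):
        raise ValueError("malformed GO part: " + go_part)
    return {
        "database": database,
        "external_term": external_term.strip(),
        "go_name": go_name[len("GO:"):].strip(),
        "go_id": go_id.strip(),
    }


# Placeholder line, not taken from a real mapping file.
sample = "EXDB:123 sample term > GO:sample process ; GO:0000000"
mapping = parse_mapping_line(sample)
```

Splitting on the last " ; " rather than the first keeps GO term names that themselves contain a semicolon intact.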

Annotation Quality Control

The GO Consortium runs a number of automated checks on the quality of annotations submitted to the GO database. These checks are detailed on the annotation quality control checks page.

Gene Product Information (GPI) Format

Gene Product Information (GPI) format is used to submit gene and gene product information to the GO Consortium. Please note that the companion file to GPI, which carries the annotation information, uses the GPAD file format.

GPI format version

All annotation files must start with a single line denoting the file format. For GPI it is as follows:

!gpi-version: 1.2

Gene Product Association Data (GPAD) format

The GPAD file is an alternative to the Gene Association File (GAF) for exchanging annotations. The GPAD format is designed to be more normalized than GAF, and is intended to work in conjunction with a separate format for exchanging gene product information.

All annotation files must start with a single line denoting the file format. For GPAD it is as follows:

!gpa-version: 1.1
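Since both GPI and GPAD files announce themselves on their first line, a reader can tell them apart before parsing anything else. The Python sketch below shows one way to do that; the function name read_format_header is hypothetical, and the general pattern !<format>-version: <number> is inferred from the two header lines shown on this page rather than from a stated rule.

```python
def read_format_header(first_line):
    """Return (format_key, version) from a '!<format>-version: <number>' line.

    For '!gpa-version: 1.1' this gives ('gpa', '1.1').
    Raises ValueError if the line does not match the inferred pattern.
    """
    line = first_line.strip()
    if not line.startswith("!"):
        raise ValueError("first line is not a format header: " + line)
    key, sep, version = line[1:].partition(":")
    if not sep or not key.endswith("-version"):
        raise ValueError("unrecognised format header: " + line)
    # Drop the '-version' suffix to leave just the format key.
    return key[: -len("-version")], version.strip()
```

Returning the version as a string avoids floating-point surprises when comparing values such as 1.1 and 1.2.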

GO Annotation File Formats

This page documents the file formats used to store gene associations (annotations), data capturing the attributes of gene products using terms from the Gene Ontology. For more general information on annotation, please see the GO annotation guide.