Gene Product Association Data (GPAD) format

The GPAD file is an alternative means of exchanging annotations from the Gene Association File (GAF). The GPAD format is designed to be more normalized than GAF, and is intended to work in conjunction with a separate format for exchanging gene product information.

All annotation files must start with a single line denoting the file format. For GPAD it is as follows:

!gpa-version: 1.1

GO Annotation File Formats

This page documents the file formats used to store gene associations (annotations), data capturing the attributes of gene products using terms from the Gene Ontology. For more general information on annotation, please see the GO annotation guide.

GO Annotation File (GAF) Format 1.0

Annotation data is submitted to the GO Consortium in the form of gene association files, or GAFs. The following document lays out the format specification for GAF 1.0; for the newer GAF 2.0 file syntax, please see the GAF 2.0 file format guide.

More general information on annotation can be found in the GO annotation guide.

GO File Format Guide

GO Formats

The GO File Format Guide documents the structure and syntax of the files available on the GO website, to assist users who need to read, write parsers for, or create these files. The following file formats are documented separately:


Annotation is the process of assigning GO terms to gene products. The annotation data in the GO database is contributed by members of the GO Consortium, and the Consortium is continuously encouraging new groups to start contributing their annotations. The list of links below offer details on the GO annotation policies and the annotation process, as well as direct users to other pages of interest on GO annotation conventions, the standard operating procedures used by some consortium members, and the GO annotation file format guide.

Ontology Documentation

The Gene Ontology defines the universe of concepts relating to gene functions (‘GO terms’), and how these functions are related to each other (‘relations’). It is constantly revised and expanded as biological knowledge accumulates.

GO Annotation File Format 2.0

Annotation data is submitted to the GO Consortium in the form of Gene Association Format, or GAFs. This guide lays out the format specifications for GAF 2.0; for the older GAF 1.0 file syntax, please see the GAF 1.0 file format guide.

Please see the information on the changes in GAF 2.0.

General information about annotation can be found in the GO annotation guide.

The Reference Genome Annotation Project

The GO Consortium coordinated an effort to maximize and optimize GO annotations for a large and representative set of key genomes, known as 'reference genomes'. The Reference Genome Annotation Project aimed to completely annotate twelve reference genomes, producing a resource that may effectively seed automatic annotation efforts of other genomes.

Guide to GO Evidence Codes

A GO annotation consists of a GO term associated with a specific reference that describes the work or analysis upon which the association between a specific GO term and gene product is based. Each annotation must also include an evidence code to indicate how the annotation to a particular term is supported. Although evidence codes do reflect the type of work or analysis described in the cited reference which supports the GO term to gene product association, they are not necessarily a classification of types of experiments/analyses.