This page describes the Gene Product Association Data (GPAD) 2.0 format. This format has not yet been implemented in GO but is provided to help with the changeover from previous GPAD/GPI versions.

GPAD/GPI files

Gene Product Association Data (GPAD) and Gene Product Information (GPI) companion files reduce the redundancy of the Gene Association File (GAF). GAF files contain information about gene products that is repeated on each line of the GAF; the GPAD/GPI file system normalizes the data by separating the annotations and the metadata about gene and gene product entities into two separate files.

This page is a summary of the Gene Product Association Data (GPAD) 2.0 format; for full technical details and a summary of changes from GPAD 1.1 see the GitHub specification page.

The other file format that supports exchange of GO annotations is GAF. For more general information on annotation, please see the Introduction to GO annotation.

Gene Product Association Data (GPAD) 2.0 format

The GPAD file is a standardized way to exchange GO annotation data. Each line in the tab-delimited file represents a single association between a gene product and a GO term, and includes an evidence code, reference, and other relevant information.

The GPAD file must start with a header minimally consisting of a declaration of the file format, the group generating the file, and the date the file was generated. Each header line should be prefixed with an exclamation mark (!) so that these lines are ignored by data parsers:

!gpad-version: 2.0
!generated-by: MGI
!date-generated: 2023-01-30

The group in the generated-by field must be present in the dbxrefs.yaml file. The date must be in YYYY-MM-DD format, conforming to the date portion of the ISO 8601 standard.
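
The header rules above can be sketched in Python as follows. This is a minimal illustration, not an official GO parser: the function name parse_gpad_header is hypothetical, and the sketch does not check the generated-by group against dbxrefs.yaml.

```python
import re
from datetime import date

# The three header keys the page lists as the minimum.
REQUIRED_HEADERS = ("gpad-version", "generated-by", "date-generated")
DATE_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def parse_gpad_header(lines):
    """Collect '!key: value' header lines into a dict and check the required ones.

    Hypothetical helper; it does NOT look up generated-by in dbxrefs.yaml.
    """
    header = {}
    for line in lines:
        if not line.startswith("!"):
            break  # the first non-header line ends the header
        key, sep, value = line[1:].partition(":")
        if sep:  # skip '!' lines that carry no 'key: value' pair
            header[key.strip()] = value.strip()
    missing = [key for key in REQUIRED_HEADERS if key not in header]
    if missing:
        raise ValueError("missing required header(s): " + ", ".join(missing))
    match = DATE_RE.match(header["date-generated"])
    if not match:
        raise ValueError("date-generated must be in YYYY-MM-DD format")
    date(*map(int, match.groups()))  # raises ValueError for impossible dates
    return header


header = parse_gpad_header([
    "!gpad-version: 2.0",
    "!generated-by: MGI",
    "!date-generated: 2023-01-30",
])
print(header["generated-by"])
```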

Submitting groups may choose to include optional additional information, for example:

!URL: http://www.yeastgenome.org/
!Project-release: WS275
!Funding: NHGRI grant number HG012212
!Columns: DB:DB_Object_ID Negation    Relation    GO ID    DB:Reference(s)    Evidence Code    With (or) From    Interacting taxon ID    Date    Assigned by    Annotation Extension    Annotation Properties
!go-version: https://doi.org/10.5281/zenodo.8436609

Annotation file fields

The GPAD format comprises 12 tab-delimited fields. Some fields are mandatory and others are optional; cardinality varies by field and, for some fields, by other conditions. For fields that permit multiple values, values should be separated by pipes (|) for OR statements and commas (,) for AND statements.

GPAD 2.0 sample line:

SGD:S000002164	NOT	RO:0002331	GO:0043409	PMID:26546002	ECO:0000316	SGD:S000003631		2018-01-19	SGD	RO:0002233(UniProtKB:Q00772),BFO:0000050(GO:0071852)	noctua-model-id=gomodel:6086f4f200000223|model-state=production|contributor=orcid:0000-0003-3212-6364

Column  Content                          Required?  Cardinality   Example
1       DB:DB_Object_ID                  required   1             SGD:S000002164
2       Negation                         optional   0 or 1        NOT
3       Relation                         required   1             RO:0002331
4       GO ID                            required   1             GO:0043409
5       DB:Reference(s) (|DB:Reference)  required   1 or greater  PMID:26546002
6       Evidence Code                    required   1             ECO:0000316
7       With (or) From                   optional   0 or greater  SGD:S000003631
8       Interacting taxon ID             optional   0 or greater  NCBITaxon:5476
9       Date                             required   1             2018-01-19
10      Assigned by                      required   1             SGD
11      Annotation Extension             optional   0 or greater  RO:0002233(UniProtKB:Q00772),BFO:0000050(GO:0071852)
12      Annotation Properties            optional   0 or greater  noctua-model-id=gomodel:6086f4f200000223|model-state=production|contributor=orcid:0000-0003-3212-6364
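
As an illustration of the 12-column layout, the sketch below splits one annotation line on tabs and checks that the mandatory columns are filled. The function parse_gpad_line and the dictionary key names are assumptions made for this example; they are not defined by the GPAD format.

```python
# Hypothetical key names for the 12 columns, in file order.
GPAD_COLUMNS = (
    "db_object_id", "negation", "relation", "go_id", "references",
    "evidence_code", "with_from", "interacting_taxon", "date",
    "assigned_by", "annotation_extension", "annotation_properties",
)
# Columns marked "required" in the field list.
REQUIRED_COLUMNS = (
    "db_object_id", "relation", "go_id", "references",
    "evidence_code", "date", "assigned_by",
)


def parse_gpad_line(line):
    """Split one GPAD 2.0 annotation line into a dict keyed by column name."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(GPAD_COLUMNS):
        raise ValueError(
            f"expected {len(GPAD_COLUMNS)} tab-delimited fields, got {len(fields)}"
        )
    record = dict(zip(GPAD_COLUMNS, fields))
    missing = [name for name in REQUIRED_COLUMNS if not record[name]]
    if missing:
        raise ValueError("empty mandatory field(s): " + ", ".join(missing))
    if record["negation"] not in ("", "NOT"):
        raise ValueError("negation must be empty or 'NOT'")
    return record


# The sample line from this page, rebuilt with explicit tabs.
sample = "\t".join([
    "SGD:S000002164", "NOT", "RO:0002331", "GO:0043409", "PMID:26546002",
    "ECO:0000316", "SGD:S000003631", "", "2018-01-19", "SGD",
    "RO:0002233(UniProtKB:Q00772),BFO:0000050(GO:0071852)",
    "noctua-model-id=gomodel:6086f4f200000223|model-state=production"
    "|contributor=orcid:0000-0003-3212-6364",
])
record = parse_gpad_line(sample)
print(record["relation"], record["go_id"])
```

Optional columns that are empty, such as the interacting taxon in the sample line, are kept as empty strings so that the column positions stay fixed.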

Definitions and requirements for field contents

1. DB:DB Object ID

A unique identifier for the item being annotated. The DB prefix is the database from which the DB Object ID is drawn and must be one of the values from the set of GO database cross-references. The DB:DB Object ID is the combined identifier for the database object. The DB is not necessarily the same as the group submitting the file, which is named in column 10 Assigned by. Examples:

  • UniProtKB:P99999
  • SGD:S000002164
  • MGI:MGI:1919306

The identifier usually references the canonical form of a gene or gene product, including functional RNAs. Identifiers may also describe gene variants or distinct proteins produced by differential splicing, alternative translational starts, post-translational cleavage or post-translational modification. If the identifier is not a canonical gene or gene product identifier, the Gene Product Information (GPI) file should contain information about the canonical form of the gene or gene product.

This field is mandatory, cardinality 1.
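
Because the DB Object ID may itself contain a colon (as in MGI:MGI:1919306), a reader of the file would typically split the combined identifier at the first colon only. A minimal sketch, with the hypothetical helper name split_db_object_id:

```python
def split_db_object_id(identifier):
    """Split 'DB:DB_Object_ID' at the first colon into (db, object_id)."""
    db, sep, object_id = identifier.partition(":")
    if not (sep and db and object_id):
        raise ValueError(f"not a DB:DB_Object_ID identifier: {identifier!r}")
    return db, object_id


print(split_db_object_id("UniProtKB:P99999"))
print(split_db_object_id("MGI:MGI:1919306"))
```

The sketch does not check that the DB prefix is one of the GO database cross-references; that lookup is left out here.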

2. Negation

Negation is indicated by the ‘NOT’ value.

This field is optional, cardinality 0 or 1.

3. Relation

The permitted relations depend on the namespace (aspect) of the GO term, and the relation must be one of the currently allowed Gene Product to GO Term Relations listed below.

This field is mandatory, cardinality 1.

GO Aspect           Relations Ontology Label                    Relations Ontology ID  Usage Guidelines
Molecular Function  enables                                     RO:0002327             Default for all GO:0003674 molecular_function & child terms
Molecular Function  contributes to                              RO:0002326
Biological Process  involved in                                 RO:0002331
Biological Process  acts upstream of                            RO:0002263
Biological Process  acts upstream of positive effect            RO:0004034
Biological Process  acts upstream of negative effect            RO:0004035
Biological Process  acts upstream of or within                  RO:0002264             Default for all GO:0008150 biological_process & child terms
Biological Process  acts upstream of or within positive effect  RO:0004032
Biological Process  acts upstream of or within negative effect  RO:0004033
Cellular Component  part of                                     BFO:0000050            Default for all GO:0032991 protein-containing complex & child terms
Cellular Component  located in                                  RO:0001025             Default for GO:0005575 cellular_component except protein-containing complex
Cellular Component  is active in                                RO:0002432             Used to indicate where a gene product enables its MF
Cellular Component  colocalizes with                            RO:0002325
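
The relation list above lends itself to a simple lookup. The sketch below is illustrative only: it assumes the caller already knows the aspect of the GO term (which in practice requires the ontology), and the names ALLOWED_RELATIONS and relation_label are hypothetical.

```python
# Relation IDs and labels copied from the list above, grouped by GO aspect.
ALLOWED_RELATIONS = {
    "molecular_function": {
        "RO:0002327": "enables",
        "RO:0002326": "contributes to",
    },
    "biological_process": {
        "RO:0002331": "involved in",
        "RO:0002263": "acts upstream of",
        "RO:0004034": "acts upstream of positive effect",
        "RO:0004035": "acts upstream of negative effect",
        "RO:0002264": "acts upstream of or within",
        "RO:0004032": "acts upstream of or within positive effect",
        "RO:0004033": "acts upstream of or within negative effect",
    },
    "cellular_component": {
        "BFO:0000050": "part of",
        "RO:0001025": "located in",
        "RO:0002432": "is active in",
        "RO:0002325": "colocalizes with",
    },
}


def relation_label(aspect, relation_id):
    """Return the label of relation_id if it is allowed for the given aspect."""
    try:
        return ALLOWED_RELATIONS[aspect][relation_id]
    except KeyError:
        raise ValueError(
            f"{relation_id} is not an allowed relation for {aspect}"
        ) from None


print(relation_label("biological_process", "RO:0002331"))
```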

4. GO ID

The GO identifier for the term attributed to the DB object ID. Must be in the format GO:GOID.

This field is mandatory, cardinality 1.

5. DB:Reference

One or more unique identifiers for a single source cited as an authority for the attribution of the GO ID to the DB object ID. This may be a literature reference or a database record. Valid references are one of: PubMed, DOI, GO_REF, Agricola, MOD reference. The syntax is DB:accession.

Only one reference can be cited on a single line of the file. If a reference has identifiers in more than one database, multiple identifiers for that reference can be included on a single line. For example, if the reference is a published paper that has a PubMed ID, the PubMed ID must be included; if the model organism database has its own identifier for the reference, that can also be included (e.g. PMID:2676709|SGD_REF:S000047763).

This field is mandatory, cardinality 1 or greater; when more than one identifier is given, use a pipe to separate entries.

6. Evidence code

One of the codes from the Evidence & Conclusion Ontology, ECO. See the wiki linked from our evidence code documentation for more information.

This field is mandatory, cardinality 1.

7. With [or] From

Also referred to as With, From or the With/From column.

This field is used to hold an identifier for annotations using certain evidence codes: ECO:0000305 (IC); ECO:0000203, ECO:0000256, and ECO:0000265 (IEA & child terms); ECO:0000316 (IGI); ECO:0000021 (IPI); ECO:0000031, ECO:0000250 and ECO:0000255 (ISS & child terms). This column can identify another gene product to which the annotated gene product is similar (ECO:0000031, ECO:0000250 and ECO:0000255, ISS) or with which it interacts (ECO:0000021, IPI).

The With [or] From column may not be used with the evidence codes ECO:0000314 (IDA), ECO:0000304 (TAS), ECO:0000303 (NAS), or ECO:0000307 (ND).

A GO:ID is used only when the evidence code is IC, and refers to the GO term(s) used as the basis of a curator inference. In these cases the entry in the DB:Reference column will be that used to assign the GO term(s) from which the inference is made.

Cardinality 0, 1, >1 with the following rules:

  • Cardinality must be 0 for evidence codes IDA, TAS, NAS, or ND.

  • Cardinality must be 1, >1 for IEA, IC, IGI, IPI, ISS & child terms of ISS.

For cardinality >1 use a pipe to separate independent evidence (e.g. FB:FBgn1111111|FB:FBgn2222222). Use commas to indicate grouped evidence, e.g. two of three genes in a triply mutant organism.
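
The pipe and comma conventions and the cardinality rules can be sketched as follows. This is only an illustration: the function name parse_with_from is hypothetical, the ECO identifiers are the ones listed in this section, and child terms that are not listed here would need an ontology lookup that the sketch does not attempt.

```python
# Evidence codes for which With/From must be empty: IDA, TAS, NAS, ND.
WITH_FORBIDDEN = {"ECO:0000314", "ECO:0000304", "ECO:0000303", "ECO:0000307"}
# Evidence codes listed in this section as using With/From
# (IC, IEA & listed child terms, IGI, IPI, ISS & listed child terms).
WITH_REQUIRED = {
    "ECO:0000305",
    "ECO:0000203", "ECO:0000256", "ECO:0000265",
    "ECO:0000316",
    "ECO:0000021",
    "ECO:0000031", "ECO:0000250", "ECO:0000255",
}


def parse_with_from(field, evidence_code):
    """Return a list of groups; pipes separate groups, commas join members."""
    groups = [group.split(",") for group in field.split("|")] if field else []
    if evidence_code in WITH_FORBIDDEN and groups:
        raise ValueError(f"With/From must be empty for {evidence_code}")
    if evidence_code in WITH_REQUIRED and not groups:
        raise ValueError(f"With/From is required for {evidence_code}")
    return groups


print(parse_with_from("FB:FBgn1111111|FB:FBgn2222222", "ECO:0000316"))
```

Each inner list holds identifiers that act together as one piece of evidence (comma-separated in the file), while separate inner lists are independent pieces of evidence (pipe-separated).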

8. Interacting taxon ID

Taxonomic identifier for the interacting organism, to be used only in conjunction with terms that have the biological process term ‘GO:0044419 biological process involved in interspecies interaction between organisms’ or the cellular component term ‘GO:0018995 host cellular component’ as an ancestor. Identifiers must come from the NCBI Taxonomy database and have the NCBITaxon: prefix.

This field is optional, cardinality 0 or greater.

9. Date

Date on which the annotation was made; format is YYYY-MM-DD. Conforms to the date portion of ISO 8601.

This field is mandatory, cardinality 1.

10. Assigned By

The database that made the annotation; it must be one of the values from the set of GOC groups. This field is used for tracking the source of an individual annotation. The value may differ from the DB in the DB:DB Object ID column: any annotation that is made by one database and incorporated into another retains the original value.

This field is mandatory, cardinality 1.

11. Annotation Extension

Annotation extensions allow GO terms in standard annotations to be further specified using gene products, chemicals, cell types, or anatomical structures. The terms used to extend annotations come from GO or from external ontologies, and help build a more complete model of biological systems.

For example, if a gene product has a role in tetrahydrofolate interconversion during S phase, the GO ID (column 4) would be GO:0035999 and the Annotation Extension column would contain the Relations Ontology and appropriate GO term: RO:0002092(GO:0051320). Targets of certain processes or functions can also be included in this field to indicate the gene, gene product, or chemical involved; for example, if a gene product is annotated to protein kinase activity, the annotation extension column would contain the UniProtKB protein ID for the protein phosphorylated in the reaction. See the documentation on using the annotation extension column for details of practical usage.

This field is optional, cardinality 0 or greater.
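
An extension value such as RO:0002233(UniProtKB:Q00772),BFO:0000050(GO:0071852) can be broken into relation and target pairs. The sketch below assumes the pipe (OR) and comma (AND) separators described for multi-valued fields, and that identifiers contain no parentheses, commas, pipes or whitespace; parse_annotation_extension is a hypothetical name.

```python
import re

# One extension unit: RELATION_ID(TARGET_ID), with no parentheses or
# whitespace inside either identifier (an assumption of this sketch).
EXTENSION_UNIT = re.compile(r"^([^()\s]+)\(([^()\s]+)\)$")


def parse_annotation_extension(field):
    """Return a list of AND-groups, each a list of (relation, target) tuples."""
    groups = []
    if not field:
        return groups
    for group in field.split("|"):  # pipe: alternative (OR) extensions
        units = []
        for unit in group.split(","):  # comma: extensions that apply together
            match = EXTENSION_UNIT.match(unit)
            if not match:
                raise ValueError(f"malformed annotation extension: {unit!r}")
            units.append((match.group(1), match.group(2)))
        groups.append(units)
    return groups


print(parse_annotation_extension(
    "RO:0002233(UniProtKB:Q00772),BFO:0000050(GO:0071852)"
))
```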

12. Annotation Properties

The Annotation Properties column contains a list of property_name=property_value pairs. Each property, if present, is single valued. Annotation properties include GO-CAM information and comments on annotations. Examples:

  • id=GOA:2113861687
  • noctua-model-id=gomodel:6086f4f200000223
  • model-state=production
  • creation-date=2019-07-20T12:04:08

This field is optional, cardinality 0 or greater; for cardinality >1 use a pipe to separate entries.
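
A minimal sketch of reading this column into a dictionary is shown below. The function name parse_annotation_properties is hypothetical, and rejecting a repeated property name is an assumption that takes the "single valued" statement above literally.

```python
def parse_annotation_properties(field):
    """Turn 'name=value|name=value' into a dict of property names to values."""
    properties = {}
    if not field:
        return properties
    for entry in field.split("|"):
        # Split at the first '=' only, so values may themselves contain '='.
        name, sep, value = entry.partition("=")
        if not (sep and name):
            raise ValueError(f"malformed annotation property: {entry!r}")
        if name in properties:
            # Assumption: each property is single valued, so repeats are errors.
            raise ValueError(f"property given more than once: {name}")
        properties[name] = value
    return properties


props = parse_annotation_properties(
    "noctua-model-id=gomodel:6086f4f200000223|model-state=production"
    "|contributor=orcid:0000-0003-3212-6364"
)
print(props["model-state"])
```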