Guide to query and download GO

Query

Ontology and standard GO Annotations

GOlr: GO Solr search engine

The ontology and GO annotations can easily be searched and retrieved via the GO Solr search engine API called GOlr.

The following example query retrieves all metadata about the GO term GO:0030182:

http://golr-aux.geneontology.io/solr/select?fq=document_category:%22ontology_class%22&q=*:*&fq=id:%22GO:0030182%22&wt=json
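As a minimal sketch, the same query can be assembled in Python with the standard library only. The helper names build_golr_term_url and fetch_json are illustrative and not part of any GO client; the base URL and parameters are copied from the example above, and nothing is assumed about the shape of the JSON response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the query example above.
GOLR_SELECT = "http://golr-aux.geneontology.io/solr/select"


def build_golr_term_url(term_id, base=GOLR_SELECT):
    """Build the GOlr URL selecting the ontology_class document of one GO term."""
    params = [
        ("fq", 'document_category:"ontology_class"'),
        ("q", "*:*"),
        ("fq", f'id:"{term_id}"'),
        ("wt", "json"),
    ]
    # urlencode percent-escapes quotes, colons and asterisks.
    return f"{base}?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Download a URL and decode the body as JSON (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_golr_term_url("GO:0030182")
print(url)
```

Calling fetch_json(url) would then download and decode the response. The generated URL escapes a few more characters than the hand-written one (for example ":" and "*"), which should be equivalent from the server's point of view.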

GOlr powers the faceted search of AmiGO.

The purpose of the BioLink Data Model is to provide a high-level data model of biological entities (genes, diseases, phenotypes, pathways, individuals, substances, etc.), their properties, their relationships, and the ways in which they can be associated.

GO-CAMs (Experimental)

GO also provides an API to query data about GO-CAMs, along with Swagger documentation that describes the available routes and parameters. The API powers the http://geneontology.org/go-cam section of this site.

The following example query retrieves all GO terms contained in the GO-CAM 59a6110e00000067:

https://api.geneontology.cloud/models/go?gocams=59a6110e00000067

Download

General Download

GO provides different ways to download both its ontology and its annotations. Among them, GO provides a programmatic way to access full or holey BDBags.

Programmatic Download: BDBag

The following example requires both Python and pip to be installed. Once this is done, you can install the BDBag CLI by following these steps:

pip install bdbag
git clone https://github.com/fair-research/bdbag
cd bdbag
python setup.py install

Then check that all tests pass:

python setup.py test

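Create a symlink from your bdbag application to your /usr/bin/ folder. Use an absolute path for the target, because a relative target is resolved from the location of the link itself:

sudo ln -s "$(pwd)/bdbag" /usr/bin/bdbag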

Once the BDBag CLI is installed, fetch a DOI-versioned GO dataset, either the full archive or the holey bag.

In this example, we plan to access single files, so the holey bag (which contains only the references to the files) is sufficient. Once you have retrieved the DOI-versioned GO dataset from one of the two links above, notice a file named fetch.txt. It lists all the files contained in and accessible from this archive. Its syntax is as follows:

URL Length Filename
http://release.geneontology.org/2018-10-08/annotations/aspgd.gaf.gz 6346222 data/annotations/aspgd.gaf.gz
http://release.geneontology.org/2018-10-08/annotations/aspgd.gpad.gz 4883110 data/annotations/aspgd.gpad.gz
http://release.geneontology.org/2018-10-08/annotations/aspgd.gpi.gz 1367586 data/annotations/aspgd.gpi.gz
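To make the three-column layout concrete, here is a minimal Python sketch that parses such lines into records. FetchEntry and parse_fetch_lines are hypothetical names used for illustration and are not part of the BDBag API. The sketch assumes that the "URL Length Filename" line above is a description of the columns and not a line of the file, and that a length of "-" (which, to my understanding of the BagIt format, marks an unknown size) should be stored as None.

```python
from typing import NamedTuple, Optional


class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when the size is given as "-" (unknown)
    filename: str


def parse_fetch_lines(lines):
    """Parse fetch.txt-style lines into FetchEntry records, skipping blank lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first two runs of whitespace; the filename is the rest.
        url, length, filename = line.split(None, 2)
        size = None if length == "-" else int(length)
        entries.append(FetchEntry(url, size, filename))
    return entries


# Sample rows copied from the fetch.txt excerpt above.
SAMPLE = """\
http://release.geneontology.org/2018-10-08/annotations/aspgd.gaf.gz 6346222 data/annotations/aspgd.gaf.gz
http://release.geneontology.org/2018-10-08/annotations/aspgd.gpad.gz 4883110 data/annotations/aspgd.gpad.gz
http://release.geneontology.org/2018-10-08/annotations/aspgd.gpi.gz 1367586 data/annotations/aspgd.gpi.gz
"""

entries = parse_fetch_lines(SAMPLE.splitlines())
for entry in entries:
    print(entry.filename, entry.length)
```

With a real bag, the same function could be fed the lines of the bag's fetch.txt, for example through open("fetch.txt").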

The full range of possible queries over BDBags is described in the BDBag documentation.

Go to the folder of your DOI-versioned GO BDBag. From there you can, for instance, retrieve the first file (aspgd.gaf.gz) in two different ways:

By the URL of the file

bdbag --resolve all --fetch-filter url==http://release.geneontology.org/2018-10-08/annotations/aspgd.gaf.gz ./

By the name of the file

bdbag --resolve all --fetch-filter filename==data/annotations/aspgd.gaf.gz ./

The retrieved file is stored in the folder hierarchy given by its filename. In the previous example, the file aspgd.gaf.gz is stored locally in data/annotations/.

Notes:

  • this specific file could also be selected with length==6346222, but there is no guarantee that this size is unique. The length filter is therefore better suited to retrieving a set of files smaller or larger than a certain threshold
  • holey BDBags are a convenient way to retrieve only the files that matter to you, because they contain only the references needed to retrieve the files of interest
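To illustrate the note on length thresholds, the following Python sketch previews locally which entries of fetch.txt a size comparison would select before anything is downloaded. It is not how BDBag implements its filters; select_by_length and COMPARATORS are hypothetical names, and the rows are copied from the fetch.txt excerpt shown earlier.

```python
import operator

# (url, length, filename) rows copied from the fetch.txt excerpt.
ROWS = [
    ("http://release.geneontology.org/2018-10-08/annotations/aspgd.gaf.gz",
     6346222, "data/annotations/aspgd.gaf.gz"),
    ("http://release.geneontology.org/2018-10-08/annotations/aspgd.gpad.gz",
     4883110, "data/annotations/aspgd.gpad.gz"),
    ("http://release.geneontology.org/2018-10-08/annotations/aspgd.gpi.gz",
     1367586, "data/annotations/aspgd.gpi.gz"),
]

# Comparison operators supported by this illustrative helper.
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def select_by_length(rows, op, threshold):
    """Return the filenames whose length satisfies `length <op> threshold`."""
    compare = COMPARATORS[op]
    return [filename for _url, length, filename in rows if compare(length, threshold)]


# Files smaller than 5,000,000 bytes.
small = select_by_length(ROWS, "<", 5_000_000)
print(small)
```

An exact match such as select_by_length(ROWS, "==", 6346222) returns a single file here, but as noted above, nothing guarantees that a given size is unique within a larger bag.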