GO Subset Guide

What is a GO subset?

GO subsets (also known as GO slims) are cut-down versions of the GO ontologies containing a subset of the terms in GO. They give a broad overview of the ontology content without the detail of the specific fine-grained terms.

How are GO subsets used?

GO subsets are particularly useful for giving a summary of the results of GO annotation of a genome, microarray, or cDNA collection when broad classification of gene product function is required. Some groups annotate to GO subsets that are relevant to their domain of interest, rather than using the full GO.

Who creates and maintains GO subsets?

GO subsets are created by users according to their needs, and may be specific to species or to particular areas of the ontologies. GO provides a generic GO subset which, like GO itself, is not species-specific and should be suitable for most purposes. Alternatively, users can create their own GO subsets or use one of the model organism-specific subsets integrated into GO. Please email the GO helpdesk for more information about creating and submitting your GO subsets.

GO subsets available

Maintained GO subsets

The GO subsets in this list are maintained as part of the GO flat file. The files available below for download are generated by script from that file.
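As an illustration of how a subset file could be derived from the flat file, the sketch below collects the [Term] stanzas that carry a given subset tag in OBO-format text. It is a minimal sketch, not the script GO actually uses: the function name terms_in_subset is hypothetical, the sample stanzas are a toy excerpt, and the subset tags shown on them are illustrative rather than a statement of current GO content.

```python
def terms_in_subset(obo_text, subset_name):
    """Return {term_id: name} for [Term] stanzas tagged with subset_name."""
    selected = {}
    # OBO stanzas are separated by blank lines.
    for stanza in obo_text.split("\n\n"):
        lines = [ln.strip() for ln in stanza.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue  # skip the header block and non-term stanzas
        tags = {}
        for ln in lines[1:]:
            key, sep, value = ln.partition(": ")
            if sep:
                tags.setdefault(key, []).append(value)
        if subset_name in tags.get("subset", []) and "id" in tags:
            selected[tags["id"][0]] = tags.get("name", [""])[0]
    return selected


# Toy excerpt; the subset assignments here are illustrative only.
SAMPLE_OBO = """format-version: 1.2
subsetdef: goslim_generic "Generic GO slim"

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process
subset: goslim_generic

[Term]
id: GO:0006412
name: translation
namespace: biological_process
subset: goslim_generic
subset: goslim_yeast
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0002181
name: cytoplasmic translation
namespace: biological_process
is_a: GO:0006412 ! translation
"""

generic = terms_in_subset(SAMPLE_OBO, "goslim_generic")
for term_id, name in sorted(generic.items()):
    print(term_id, name)
```

A term may carry several subset tags, which is why the sketch stores tag values as lists rather than single strings.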

Maintained GO subsets for download

Organism or Usage | Developed by | Download
Generic GO subset | GO Consortium | OBO format
Aspergillus subset | Aspergillus Genome Database | OBO format
Candida albicans | Candida Genome Database | OBO format
ChEMBL Drug Target subset | Prudence Mutowo and Jane Lomax | OBO format
Metagenomics subset | Jane Lomax and the InterPro group | OBO format
Mouse GO slim | MGI | OBO format
Plant subset | The Arabidopsis Information Resource | OBO format
Protein Information Resource subset | Darren Natale, PIR | OBO format
Schizosaccharomyces pombe subset | Val Wood, PomBase | OBO format
Yeast subset | Saccharomyces Genome Database | OBO format

For internal checking purposes we also provide two "anti-slims":

  • Do not annotate -- the set of high level terms that are useful for grouping, but should have no direct annotations
  • Do not manually annotate -- as above, but it's permitted for automated tools to make direct annotations to these
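The kind of check these two anti-slims support can be sketched as follows. This is a minimal sketch under stated assumptions, not GO's own checking code: the function name check_annotations and the term IDs are hypothetical, and the sketch assumes that only the IEA evidence code marks an annotation as automated.

```python
# Assumption: annotations with evidence code IEA are treated as automated;
# everything else is treated as manual.
AUTOMATED_EVIDENCE = {"IEA"}


def check_annotations(annotations, do_not_annotate, do_not_manually_annotate):
    """Return the annotations that violate either anti-slim.

    annotations: iterable of (gene_product, term_id, evidence_code) tuples.
    """
    problems = []
    for gene, term, evidence in annotations:
        if term in do_not_annotate:
            # No direct annotation is allowed, manual or automated.
            problems.append((gene, term, evidence, "do not annotate"))
        elif term in do_not_manually_annotate and evidence not in AUTOMATED_EVIDENCE:
            # Automated annotations are permitted here; manual ones are not.
            problems.append((gene, term, evidence, "do not manually annotate"))
    return problems


# Hypothetical term IDs, made up for this example.
DO_NOT_ANNOTATE = {"GO:9000001"}
DO_NOT_MANUALLY_ANNOTATE = {"GO:9000002"}

ANNOTATIONS = [
    ("geneA", "GO:9000001", "IEA"),  # flagged: grouping term used directly
    ("geneB", "GO:9000002", "IEA"),  # allowed: automated annotation
    ("geneC", "GO:9000002", "IDA"),  # flagged: manual annotation
    ("geneD", "GO:9000003", "IDA"),  # allowed: term is in neither anti-slim
]

problems = check_annotations(ANNOTATIONS, DO_NOT_ANNOTATE, DO_NOT_MANUALLY_ANNOTATE)
for problem in problems:
    print(problem)
```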

Archived GO Slims

There is also an archive of deprecated GO slims that are no longer maintained or updated. These files have been deposited for two reasons: first, to give easy access to the GO slim used in a particular publication or analysis; second, to allow reuse by others in the community.

Users should note that the majority of these GO slims are no longer maintained by the authors, and they may contain GO terms which are now obsolete. All archival GO slims are in the deprecated GO flat file format.

Archived GO slims for download
Topic / Usage | Information | Download
Generic GO slim | Suparna Mundodi and Amelia Ireland; Aug 2002 | old GO format
Honey bee ESTs | C.W. Whitfield, M.R. Band, M.F. Bonaldo, C.G. Kumar, L. Liu, J.R. Pardinas, H.M. Robertson, M.B. Soares, G.E. Robinson, PMID:11923340; Apr 2002 | old GO format
Drosophila | M. Adams, M. Ashburner, G.M. Rubin, S.E. Lewis et al.; Adams et al., PMID:10731132; Mar 2000 | old GO format
Glossina ESTs | M. Berriman; Sep 2002 | old GO format
UniProtKB-GOA | N. Mulder, M. Pruess, PMID:12230037; Nov 2002 | old GO format
Mouse | The RIKEN Genome Exploration Group Phase II Team and the FANTOM Consortium, PMID:11217851; Feb 2001 | old GO format
P. falciparum | M. Berriman; July 2002 | old GO format
Plant | Suparna Mundodi; Dec 2002 | old GO format
Rice (Beijing) | J. Yu et al., PMID:11935017; Apr 2002 | old GO format
Rice (Syngenta) | J. Yu et al., PMID:11935018; Apr 2002 | old GO format
Yeast | SGD curators; Aug 2003 | old GO format
Prokaryotic subset | GO curators. Replaced by taxon constraints. | old GO format

Map2Slim option in OWLTools

Given a GO subset file and a current ontology (in one or more files), the Map2Slim script maps a gene association file (containing annotations to the full GO) to the terms in the GO subset. The script is an option of OWLTools and can be used in one of two ways: to create a new gene association file containing the most pertinent GO slim accessions, or, in count mode, to report distinct gene product counts for each subset term.
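The idea behind the two modes can be sketched in a few lines of Python. This is a conceptual illustration only, not the OWLTools implementation: the function names, the toy hierarchy, and the gene names are all hypothetical. The sketch assumes that "most pertinent" means the most specific slim terms at or above the annotated term, and that count mode counts a gene product under every slim term at or above any of its annotations.

```python
from collections import defaultdict


def ancestors_or_self(term, parents):
    """All terms reachable from `term` via parent links, including `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def map_to_slim(term, parents, slim):
    """Most specific slim terms at or above `term`."""
    candidates = ancestors_or_self(term, parents) & slim
    # A candidate is redundant if it is a strict ancestor of another candidate.
    redundant = set()
    for candidate in candidates:
        redundant |= ancestors_or_self(candidate, parents) - {candidate}
    return candidates - redundant


def slim_annotations(annotations, parents, slim):
    """Mapping mode: rewrite (gene, term) pairs onto slim terms."""
    mapped = set()
    for gene, term in annotations:
        for slim_term in map_to_slim(term, parents, slim):
            mapped.add((gene, slim_term))
    return sorted(mapped)


def slim_counts(annotations, parents, slim):
    """Count mode: distinct gene products per slim term."""
    genes = defaultdict(set)
    for gene, term in annotations:
        for slim_term in ancestors_or_self(term, parents) & slim:
            genes[slim_term].add(gene)
    return {term: len(members) for term, members in genes.items()}


# Simplified toy hierarchy (child -> parents), using labels instead of IDs.
PARENTS = {
    "cytoplasmic translation": ["translation"],
    "translation": ["metabolic process"],
    "metabolic process": ["biological_process"],
    "ion transport": ["transport"],
    "transport": ["biological_process"],
}
SLIM = {"biological_process", "translation", "transport"}
ANNOTATIONS = [
    ("geneA", "cytoplasmic translation"),
    ("geneB", "translation"),
    ("geneA", "ion transport"),
    ("geneC", "metabolic process"),
]

mapped = slim_annotations(ANNOTATIONS, PARENTS, SLIM)
counts = slim_counts(ANNOTATIONS, PARENTS, SLIM)
print(mapped)
print(counts)
```

In this toy data, geneC is annotated to a term with no slim term between it and the root, so it maps to biological_process. Counting distinct gene products rather than annotations means geneA contributes once to biological_process even though two of its annotations lead there.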

Background information and details on how to download, install, and use OWLTools, as well as instructions on how to run the Map2Slim script, are available from the OWLTools Wiki at https://github.com/owlcollab/owltools/wiki/Map2Slim.

On the web

There are also a couple of online tools that may be of use: the Princeton slimming tool and the legacy AmiGO slimmer. Note that online tools often have limitations, such as timeouts.