Introduction to the GO resource

Because of the staggering complexity of biological systems and the ever-increasing size of datasets to analyze, biomedical research is becoming increasingly dependent on knowledge stored in computable form. The Gene Ontology (GO) project provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products.

The GO knowledgebase is composed of two primary components:

  • the Gene Ontology (GO), which provides the logical structure of the biological functions (‘terms’) and their relationships to one another, manifested as a directed acyclic graph
  • the corpus of GO annotations, evidence-based statements relating a specific gene product (a protein, non-coding RNA, or macromolecular complex, which we often refer to as ‘genes’ for simplicity) to a specific ontology term
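To make the two components concrete, here is a minimal sketch of an ontology stored as a directed acyclic graph (each term pointing to its parent terms) together with a few annotations. It assumes a toy fragment: the term names, edges, and the genes "geneA" and "geneB" are hypothetical and not taken from a GO release. It also treats every relation as one generic parent edge, whereas the real ontology distinguishes relation types such as is_a and part_of. The sketch checks that the graph has no cycles and propagates each direct annotation to all ancestor terms, a simplified form of what GO documentation calls the true path rule.

```python
from collections import deque

# Toy ontology fragment, term -> set of parent terms.
# Names and edges are illustrative only; they are NOT copied from a GO release,
# and all relation types (is_a, part_of, ...) are treated as one generic edge.
PARENTS: dict[str, set[str]] = {
    "biological_process": set(),
    "transport": {"biological_process"},
    "transmembrane transport": {"transport"},
    "carbohydrate transport": {"transport"},
    # A term with two parents: the reason GO is a DAG rather than a tree.
    "glucose transmembrane transport": {"transmembrane transport", "carbohydrate transport"},
    "metabolic process": {"biological_process"},
    "glucose metabolic process": {"metabolic process"},
}

# Hypothetical direct annotations: gene product -> terms it is annotated to.
ANNOTATIONS: dict[str, set[str]] = {
    "geneA": {"glucose transmembrane transport"},
    "geneB": {"glucose metabolic process"},
}


def is_acyclic(parents: dict[str, set[str]]) -> bool:
    """Kahn's algorithm: the graph is a DAG iff every term can be removed in topological order."""
    children: dict[str, set[str]] = {t: set() for t in parents}
    for term, ps in parents.items():
        for p in ps:
            children[p].add(term)
    remaining = {t: len(ps) for t, ps in parents.items()}  # unprocessed parents per term
    queue = deque(t for t, n in remaining.items() if n == 0)  # start from the roots
    visited = 0
    while queue:
        term = queue.popleft()
        visited += 1
        for child in children[term]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    return visited == len(parents)


def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All terms reachable from `term` by following parent edges (breadth-first)."""
    seen: set[str] = set()
    queue = deque(parents[term])
    while queue:
        p = queue.popleft()
        if p not in seen:
            seen.add(p)
            queue.extend(parents[p])
    return seen


def propagate(gene: str) -> set[str]:
    """Direct annotations plus every ancestor term (simplified true path rule)."""
    terms = set(ANNOTATIONS[gene])
    for t in ANNOTATIONS[gene]:
        terms |= ancestors(t, PARENTS)
    return terms


if __name__ == "__main__":
    print(is_acyclic(PARENTS))
    print(sorted(propagate("geneA")))
```

Because "glucose transmembrane transport" has two parents, "geneA" ends up associated with both the transport branch and the carbohydrate transport branch; a tree could not express this, which is why a DAG is used.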

Together, the ontology and annotations aim to provide a comprehensive model of biological systems. Currently, the GO knowledgebase includes experimental findings from over 140 000 published papers, represented as over 600 000 experimentally supported GO annotations. These form the core dataset from which over 6 million additional functional annotations are inferred for a diverse set of organisms spanning the tree of life.

In addition to this core knowledgebase, Gene Ontology Consortium (GOC) resources include software for editing the ontologies and performing logical reasoning over them, web access to the ontology and annotations, and analytical tools that use the GO knowledgebase to support biomedical research.

Uses of the Gene Ontology and annotations

The most common use of GO annotations is to interpret large-scale molecular biology experiments, sometimes called "omics" experiments. These experiments measure 1) gene products (RNA and proteins), 2) variation in the DNA sequence of genes, or 3) small molecules metabolized by proteins. Because each of these is a gene product, a change in a gene, or a molecule acted on by gene products, all of them can be related to gene function.

A typical omics experiment measures the levels of thousands of molecules, which makes the underlying molecular changes (for example, between a cancer cell and a normal cell) difficult to interpret. "Gene Ontology enrichment analysis" identifies groups of genes that function together (genes annotated to the same GO term) and that are over-represented among the measured changes relative to what would be expected by chance. This reduces thousands of molecular changes to a much smaller number of biological functions, making it possible to understand what the changes mean.
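Enrichment tools differ in the statistics they use. A common choice, assumed in the sketch below, is a one-sided hypergeometric test per term: given N background genes of which K are annotated to a term, how surprising is it that k of the n genes in the study list carry that annotation? The sketch applies a Bonferroni correction across terms for simplicity. All gene and term names are hypothetical, and in practice the term-to-gene sets would normally include annotations propagated up the ontology graph.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: N genes in the background, K of them
    annotated to the term, n genes drawn (the study list)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    # Exact integer numerator, so k = 0 gives exactly 1.0.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return numerator / comb(N, n)


def enrichment(study: set[str], background: set[str],
               term_genes: dict[str, set[str]]) -> list[tuple[str, int, int, float, float]]:
    """Return (term, k, K, p, Bonferroni-adjusted p) per term, most significant first."""
    study = study & background  # only genes in the background can be drawn
    N, n = len(background), len(study)
    n_tests = len(term_genes)
    results = []
    for term, genes in term_genes.items():
        annotated = genes & background
        K = len(annotated)
        k = len(annotated & study)
        p = hypergeom_sf(k, N, K, n)
        results.append((term, k, K, p, min(1.0, p * n_tests)))
    return sorted(results, key=lambda r: r[3])


if __name__ == "__main__":
    # Hypothetical data: 20 background genes, 5 "changed" genes, two terms.
    background = {f"g{i}" for i in range(20)}
    study = {f"g{i}" for i in range(5)}
    term_genes = {
        "term X": {f"g{i}" for i in range(6)},       # 5 of its 6 genes are in the study list
        "term Y": {f"g{i}" for i in range(10, 16)},  # none are in the study list
    }
    for term, k, K, p, p_adj in enrichment(study, background, term_genes):
        print(f"{term}: {k}/{K} annotated genes in study list, p={p:.3g}, adjusted={p_adj:.3g}")
```

Computing the tail with exact integers avoids a dependency on a statistics library. Bonferroni is deliberately simple and conservative; many tools instead control the false discovery rate (for example with the Benjamini-Hochberg procedure).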

The Gene Ontology is also at the hub of a major effort to represent the vast amount of biomedical knowledge in a computable form. It is linked to many other biomedical ontologies, and is a foundation for research applying computer science in biology and medicine.

You can explore the scientific publications that have used the Gene Ontology resource.

Further reading about the Gene Ontology resource

For further guidance and reading, please see the following publications: