About the GO
The mission of the GO Consortium is to develop an up-to-date, comprehensive, computational model of biological systems, from the molecular level to larger pathways, cellular and organism-level systems.
The Gene Ontology knowledgebase provides a computational representation of our current scientific knowledge about the functions of genes (or, more properly, the protein and non-coding RNA molecules produced by genes) from many different organisms, from humans to bacteria. It is widely used to support scientific research, and has been cited in tens of thousands of publications.
Understanding gene function, that is, how individual genes contribute to the biology of an organism at the molecular, cellular and organism levels, is one of the primary aims of biomedical research. Moreover, experimental knowledge obtained in one organism is often applicable to other organisms, particularly if the organisms share the relevant genes because they inherited them from their common ancestor. The Gene Ontology (GO), as a consortium, began in 1998, when researchers studying the genomes of three model organisms agreed to work collaboratively on a common classification scheme for gene function. Those organisms were Drosophila melanogaster (fruit fly), Mus musculus (mouse), and Saccharomyces cerevisiae (brewer’s or baker’s yeast). Today the number of different organisms represented in GO is in the thousands. GO makes it possible, in a flexible and dynamic way, to provide comparable descriptions of homologous gene and protein sequences across the phylogenetic spectrum.
GO is also at the hub of a major effort to represent the vast amount of biomedical knowledge in a computable form. GO is linked to many other biomedical ontologies, and is a foundation for research applying computer science in biology and medicine.
The GO offers two primary resources:
- The GO ontology: the logical structure describing the full complexity of the biology. It comprises the ‘classes’ (often referred to as ‘terms’) for the many different kinds of biological functions, the pathways carrying out different biological programs, and the cellular locations where these occur. Equally important, it comprises the different types of specific relationships that indicate how each of these classes is related to other classes (see the Relations Ontology for more information).
- The corpus of GO annotations: the traceable, evidence-based statements relating a specific gene product (i.e. a protein, a non-coding RNA, or a macromolecular complex, often referred to simply as a ‘gene’) to specific ontology terms to describe its normal biological role.
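To make the two resources concrete, here is a minimal sketch of how classes, relationships, and annotations could be held in memory. It is an illustration only, not the GO Consortium's data model or file format: the class names, field names, identifiers (`EX:0001`, `geneA`, `REF:1`), and evidence labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """An ontology class ('term'). IDs and names are made up for this sketch."""
    term_id: str
    name: str


@dataclass(frozen=True)
class Relationship:
    """A typed, directed link from one ontology class to another."""
    subject_id: str
    relation: str  # hypothetical relation label, e.g. "is_a"
    object_id: str


@dataclass(frozen=True)
class Annotation:
    """A traceable statement linking a gene product to an ontology term."""
    gene_product: str
    term_id: str
    evidence: str   # hypothetical evidence label
    reference: str  # hypothetical identifier of the supporting source


def parents_of(term_id, relationships):
    """Return (relation, parent term ID) pairs for the given class."""
    return [(r.relation, r.object_id) for r in relationships if r.subject_id == term_id]


def terms_for(gene_product, annotations):
    """Return the sorted term IDs a gene product is annotated to."""
    return sorted({a.term_id for a in annotations if a.gene_product == gene_product})


# Illustrative data only; none of these identifiers exist in GO.
terms = {
    t.term_id: t
    for t in [
        Term("EX:0001", "example parent function"),
        Term("EX:0002", "example child function"),
        Term("EX:0003", "example cellular location"),
    ]
}
relationships = [Relationship("EX:0002", "is_a", "EX:0001")]
annotations = [
    Annotation("geneA", "EX:0002", "experimental", "REF:1"),
    Annotation("geneA", "EX:0003", "experimental", "REF:2"),
    Annotation("geneB", "EX:0003", "inferred", "REF:3"),
]

print(terms_for("geneA", annotations))
print(parents_of("EX:0002", relationships))
```

Keeping the evidence and reference on each annotation is what makes a statement traceable in this sketch; the relationships are stored separately because they describe how classes relate to each other, independent of any gene product.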
Together, the ontology and annotations provide a comprehensive model of biological systems. Currently, the GO includes experimental findings from over 150,000 published papers, represented as over 700,000 experimentally-supported annotations. These provide the core dataset for additional inference of over 6 million functional annotations for a diverse set of organisms spanning the tree of life.
In addition to this core knowledgebase, GO resources also include software to edit and perform logical reasoning over the ontologies, web access to the ontology and annotations, and analytical tools that use GO to support biomedical research.
Uses of the GO and annotations
The GO knowledgebase plays an essential role in supporting biomedical research and has been used in tens of thousands of scientific studies. The most common use of GO annotations is for interpretation of large-scale molecular biology experiments, sometimes called “omics” experiments. Whether genomics, transcriptomics, proteomics, or metabolomics, these experiments pool biological molecules to gain insight into the structure, function, and dynamics of an organism. “Gene Ontology enrichment analysis” is used to discover statistically significant similarities or differences under alternative controlled experimental conditions.
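The text above does not say which statistical test an enrichment analysis uses. One common choice is a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of genes annotated to a term in a study set drawn from a larger background. The sketch below assumes that test; the function name, parameter names, and numbers are illustrative and are not taken from any GO tool.

```python
from math import comb


def enrichment_p_value(population_size, population_hits, sample_size, sample_hits):
    """One-sided hypergeometric p-value, P(X >= sample_hits).

    population_size: number of genes in the background set
    population_hits: background genes annotated to the term
    sample_size:     number of genes in the study set
    sample_hits:     study-set genes annotated to the term
    """
    total = comb(population_size, sample_size)
    upper = min(population_hits, sample_size)
    # Sum the probability of every outcome at least as extreme as the one observed.
    tail = sum(
        comb(population_hits, k)
        * comb(population_size - population_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 annotated to the term;
# a study set of 3 genes, 2 of which carry the annotation.
p = enrichment_p_value(10, 4, 3, 2)
print(f"{p:.3f}")  # prints 0.333
```

In practice such a test is run once per term, so the resulting p-values are usually corrected for multiple testing before any term is reported as enriched.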
You can explore the scientific publications that have used the Gene Ontology knowledgebase.
The GO and the Alliance of Genome Resources
In 2016, the GO knowledgebase partnered with model organism databases (MODs) to form the Alliance of Genome Resources. The mission of the Alliance is to provide a comprehensive, sustainable resource that unites the diverse information available from each member. GO annotations, combined with data from each of the MODs and presented in the Alliance, are important not only to researchers working on each of the represented organisms, but also to researchers and clinicians interested in the genetic and genomic basis of human biology, health, and disease.
The partner MODs are: FlyBase, Mouse Genome Database (MGI), Rat Genome Database (RGD), Saccharomyces Genome Database (SGD), WormBase, Xenbase, and Zebrafish Information Network (ZFIN).
The GO and the Global Biodata Coalition
The Global Biodata Coalition (GBC), founded in 2019, is a forum working to ensure the efficient management and growth of biodata infrastructure by coordinating funding at the global level. In December 2022, the GBC announced its first set of Global Core Biodata Resources (GCBRs), which includes GO. Among other criteria, GCBR selection was based on the resources' status as authoritative databases or knowledgebases that are used extensively, have proven longevity, and provide free and open access to their high-quality data. For more information and to view the full list of GCBRs, visit the GBC Global Core Biodata Resource page.
The GO Consortium is funded by the National Human Genome Research Institute (US National Institutes of Health), grant number HG012212, with co-funding by NIGMS.
Further reading about the Gene Ontology knowledgebase
For further guidance and reading, please see the following publications: