Editorial Style Guide
The GO Style Guide introduces new users to (and reminds old users of) both the philosophy and the practicalities behind developing and maintaining GO. Its main purpose is to serve as a user manual for GO curators. You will find it more useful if you first read An Introduction to GO for more general background information about the GO project and how the ontology works. Information on annotating genes and gene products to GO can be found in the GO Annotation Guide and information on the structure and syntax of the GO files can be found in the GO File Format Guide.
General Conventions When Adding Terms
As explained in An Introduction to GO, the purpose of GO is to define particular attributes of gene products. When adding a new term, ensure that it is a valid concept within the scope of GO. The following concepts should not be introduced as GO terms:
- Gene products: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
- Processes, functions or components that are unique to mutants or diseases: e.g. "oncogenesis" is not a valid GO term because causing cancer is not the normal function of any gene.
- Attributes of sequence such as intron/exon parameters: these are not attributes of gene products and are described in a separate sequence ontology (see the Open Biomedical Ontologies website for more information).
- Protein domains or structural features.
- Protein-protein interactions.
The following stylistic points should be applied to all aspects of the ontologies.
Spelling conventions
Where there are differences in the accepted spelling between English and US usage, use the US form, e.g. polymerizing, signaling, rather than polymerising, signalling. There is a dictionary of words used in GO terms in the file GODict.DAT.
Abbreviations
Avoid abbreviations unless they're self-explanatory. Use full element names, not symbols. Use hydrogen for H+. Use copper and zinc rather than Cu and Zn. Use copper(II), copper(III), etc., rather than cuprous, cupric, etc. For biomolecules, spell out the term in full wherever practical: use fibroblast growth factor, not FGF.
Greek symbols
Spell out Greek symbols in full: e.g. alpha, beta, gamma.
Upper vs. lower case
GO terms are all lower case except where demanded by context, e.g. DNA, not dna.
Singular vs. Plural
Use the singular form of the term, except where a term is only used in the plural (e.g. caveolae).
Be Descriptive
Aim to be reasonably descriptive, even at the risk of some verbal redundancy. Remember, databases that refer to GO terms might list only the finest-level terms associated with a particular gene product. If the parent is aromatic amino acid family biosynthesis, then the child should be aromatic amino acid family biosynthesis, anthranilate pathway, not just "anthranilate pathway".
Anatomical Qualifiers
Do not use anatomical qualifiers in the biological process and molecular function ontologies. For example, GO has the molecular function term DNA-directed DNA polymerase activity but neither "nuclear DNA polymerase" nor "mitochondrial DNA polymerase". These terms with anatomical qualifiers are not necessary because annotators can use the cellular component ontology to attribute location to gene products, independently of process or function.
Gene Products
It is easy to confuse a gene product and its molecular function, because very often these are described in exactly the same words. For example, "alcohol dehydrogenase" can describe what you can put in an Eppendorf tube (the gene product) or it can describe the function of this stuff. There is, however, a formal difference: a single gene product might have several molecular functions, and many gene products can share a single molecular function. For example, there are many gene products that have the function alcohol dehydrogenase activity. Some, but by no means all, of these are encoded by genes with the name "alcohol dehydrogenase". A particular gene product might have both the functions alcohol dehydrogenase activity and acetaldehyde dismutase activity, and perhaps other functions as well. It's important to grasp that, whenever we use terms such as alcohol dehydrogenase activity in GO, we mean the function, not the entity; for this reason, most GO molecular function terms are appended with the word 'activity'.
Referring to Gene Products in Synonyms and Term Names
As noted above, GO terms do not represent gene products; GO term strings should avoid using gene product names if possible. In some cases, however, there are practical reasons for including gene product names, because biologists will search for them.
Gene product names can be used as synonyms for terms that do not name gene products in the primary text strings. Such synonyms are narrower than the terms. For some biological concepts, it would be awkward to use a wording that avoids mentioning a gene product name. In these cases, we use the word 'class' along with the gene product name, to indicate that the term is not restricted to the gene product named or to the species in which the gene product is found. An example is the class of cell cycle regulators known as p53:
- DNA damage response, signal transduction by p53 class mediator ; GO:0030330
- A cascade of processes induced by the cell cycle regulator phosphoprotein p53, or an equivalent protein, in response to the perception of DNA damage.
Term Definitions
Always define new terms
If you create a new term, or refine an existing term, you should add a definition for it, and note the references used in composing the definition.
Write definitions carefully
Definitions should explain clearly to the reader what is meant by a particular term. They should be concise, full sentences. They should begin with an upper-case letter and end with a period (full stop). Proofread your definitions carefully to eliminate typos and double spaces. The definition should be written at the same level of specificity as the term itself. It should also be consistent with the guidelines for the contents of each ontology. As with term names, avoid using abbreviations that may be ambiguous (e.g. "ER" can mean "endoplasmic reticulum" or "estrogen receptor").
Use Aristotelian definitions
Ideally, definitions should follow the genus-differentia ("Aristotelian") pattern: they should take the form of a genus (generic term, an is_a parent) and differentia (discriminating characteristics which mark instances of the specific term as being different from is_a sibling terms). (Note: cross-products can be represented using "intersection_of" tags in OBO files, and are a means of formally expressing the genus and differentiae; see the Logical Definitions documentation for more information.)
Database cross-references for definitions
If you define a term, you must document where your definition came from. If you use OBO-Edit, the software won't allow you to commit a definition without entering a cross-reference for it. Database cross-references have two parts, separated by a colon: an abbreviation for the database being cross-referenced (see the list of database cross-references used in GO) and the ID of the item in that database.
Definitions may be created by individual curators, groups of curators, community experts, or by consensus at meetings. For any such sources, use the database abbreviation 'GOC'. A list of curator cross-references currently in use is available; the guidelines for creating new dbxrefs are as follows:
- If the definition comes from an individual curator's head, use GOC and your initials in lower case as the ID; e.g. a definition written by Michael Ashburner has the dbxref GOC:ma.
- For a definition created by a group of curators, use the database abbreviation with '_curators' appended; e.g. a definition written by several curators at TAIR has the dbxref GOC:TAIR_curators.
- If an expert from the community has contributed to a definition, use the expert's initials following 'GOC:expert_'; e.g. a definition from John Pringle has the dbxref GOC:expert_jrp.
- For definitions created at meetings, the dbxref has 'mtg_' followed by the meeting start date; e.g. definitions written at the June 2006 content meeting on CNS development have the dbxref GOC:mtg_15jun06.
- If the definition comes from a book, use the ISBN; e.g. a dbxref to the Oxford Dictionary of Molecular Biology would be ISBN:0198506732. Hyphens should be removed from the ISBN.
- If the definition comes from a paper, use the PubMed ID, e.g. PMID:11910864. If the paper doesn't have a PubMed ID, use another ID such as a DOI or model organism database ID.
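The dbxref conventions above are mechanical enough to sketch in code. The following Python helpers are an illustration only: the function names are hypothetical and not part of OBO-Edit or any GO tool, and the hyphenated ISBN used in the usage note is simply an assumed input form.

```python
from datetime import date

# Month abbreviations are spelled out here so the result does not
# depend on the system locale.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def curator_dbxref(initials: str) -> str:
    """Definition written by an individual curator."""
    return f"GOC:{initials.lower()}"


def group_dbxref(database: str) -> str:
    """Definition written by a group of curators at one database."""
    return f"GOC:{database}_curators"


def expert_dbxref(initials: str) -> str:
    """Definition contributed by a community expert."""
    return f"GOC:expert_{initials.lower()}"


def meeting_dbxref(start: date) -> str:
    """Definition written at a meeting, keyed by the meeting start date."""
    return f"GOC:mtg_{start.day:02d}{MONTHS[start.month - 1]}{start.year % 100:02d}"


def isbn_dbxref(isbn: str) -> str:
    """Definition taken from a book; hyphens are removed from the ISBN."""
    return "ISBN:" + isbn.replace("-", "")


def pubmed_dbxref(pmid: int) -> str:
    """Definition taken from a paper that has a PubMed ID."""
    return f"PMID:{pmid}"
```

With these helpers, `meeting_dbxref(date(2006, 6, 15))` gives `GOC:mtg_15jun06` and `isbn_dbxref("0-19-850673-2")` gives `ISBN:0198506732`, matching the examples in the list above.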
Use of standard definitions
Wherever a 'standard' definition exists for a group of related terms, it should be used; please see the ontology guides for standard definitions used in each ontology. If you find yourself repeatedly using the same text string in a series of definitions, please send your standard definition to aji@ebi.ac.uk (Amelia Ireland), who keeps an up-to-date version of the list of standard definitions.
Redefining terms
A GO ID is really associated with a definition rather than with the term name. If we change the wording but not the meaning of a term, the GO ID stays the same; a new meaning requires a new GO ID, even if the text string doesn't change. Here's a trivial example that illustrates when we do and don't change GO IDs:
Assume that we have a term mouse, GO ID GO:0000123, in an ontology; it is defined as a small furry mammal.
- We decide to change the term wording to Mus musculus, keeping the definition the same. In this case we merely update the text; the GO ID stays the same because the meaning stays the same. We may choose to keep "mouse" as a synonym, but there would still only be one ID associated with the term.
- We decide that the term "mouse" should instead mean a piece of computer equipment. In this case, the old term and ID are moved to the obsolete category, and "mouse", as newly defined, gets a new GO ID, GO:0000456. The old GO ID and definitions are saved for posterity in case we ever need to know what happened to them.
See the term obsoletion protocol section below for details on how to obsolete a term.
Comments
All terms have an optional comments field for adding extra information about an entry. The purpose of this is to help annotators, especially if you have obsoleted or redefined a term. Comments can be anything relevant to the term or term definition. If you write a comment, you must use the appropriate syntax.
To refer to other terms in the ontologies, use the format
comment: Also see '[term name] ; GO:0000000'.
To make any other comment, prefix it with the following:
comment: Note that [comment].
See also the comment syntax for obsoletions and term splits.
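As a rough illustration of the two comment forms, the sketch below builds each string from its parts. The function names are hypothetical; only the output wording comes from the syntax given above.

```python
def see_also_comment(term_name: str, go_id: str) -> str:
    """Comment that points the reader to another term in the ontologies."""
    return f"comment: Also see '{term_name} ; {go_id}'."


def note_comment(text: str) -> str:
    """Any other comment; a trailing period on the input is not doubled."""
    return f"comment: Note that {text.rstrip('.')}."
```

For example, `see_also_comment("cell growth", "GO:0016049")` produces `comment: Also see 'cell growth ; GO:0016049'.`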
Synonyms
Often when terms are created, there are several words or phrases that could be used as the term name. In such cases, one form will be chosen as term name whilst the other possible names are added as synonyms. Despite the name, GO synonyms are not always 'synonymous' in the strictest sense of the word, as they do not always mean exactly the same as the term they are attached to. Instead, a GO synonym may be broader or narrower than the term string; it may be a related phrase; it may be alternative wording, spelling or use a different system of nomenclature; or it may be a true synonym. This flexibility allows GO synonyms to serve as valuable search aids, as well as being useful for applications such as text mining and semantic matching.
Having a single, broad relationship between a GO term and its synonyms is adequate for most search purposes, but for other applications such as semantic matching, the inclusion of a more formal relationship set is valuable. For this reason, GO records a relationship type for each synonym. These relationships are stored in the OBO format GO file.
Synonym scopes
The synonym relationship scopes are:
- exact: the term is an exact synonym; e.g. ornithine cycle is an exact synonym of urea cycle
- broad: the synonym is broader than the term name; e.g. cell division is a broad synonym of cytokinesis
- narrow: the synonym is narrower or more precise than the term name; e.g. pyrimidine-dimer repair by photolyase is a narrow synonym of photoreactive repair
- related: the terms are related; e.g. cytochrome bc1 complex is a related synonym of ubiquinol-cytochrome-c reductase activity; virulence is a related synonym of pathogenesis
The synonym scope related should be used where the relationship between a term and its synonym is NOT exact, narrower or broader.
In some cases, broader and narrower synonyms are created in the place of new parent or child terms because some synonym strings may not be valid GO terms but may still be useful for search purposes. For example, the string "respiration" is synonymous with both cellular respiration, the energy-generating metabolic processes of a cell, and respiratory gaseous exchange, or breathing; as its meaning is ambiguous, it is unsuitable for use as a GO term string, but we can add it as a broad synonym to both terms.
Adding synonyms
When you add a synonym using OBO-Edit, choose a scope from the pull-down selector (see the OBO-Edit user guide for more information). OBO-Edit will incorporate the synonym scope into the OBO format flat file when you save. The default synonym scope is 'related synonym', but this should be changed to a different scope if appropriate.
The number of synonyms for a term is not limited, and the same text string can be used as a synonym for more than one GO term.
Add synonyms if you edit a term name but the old name is still a valid synonym; for example, if you change "respiration" to "cellular respiration", keep "respiration" as a synonym. This helps other users to find familiar terms.
Add synonyms if the term has (or contains) a commonly used abbreviation. For example, FGF binding could be used as a synonym for fibroblast growth factor binding.
Do not add a synonym if the only difference is case (e.g. start vs. START). Synonyms, like term names, are all lower case except where demanded by context (e.g. DNA, not dna).
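The sketch below shows how a scoped synonym might be written out and how the case-only rule could be checked. It assumes the OBO 1.2 form of the synonym tag (quoted text, an upper-case scope keyword, then a dbxref list); consult the GO File Format Guide for the authoritative syntax. The function names are hypothetical.

```python
SCOPES = ("EXACT", "BROAD", "NARROW", "RELATED")


def synonym_line(text: str, scope: str = "RELATED") -> str:
    """Format one synonym tag; RELATED mirrors the OBO-Edit default scope."""
    if scope not in SCOPES:
        raise ValueError(f"unknown synonym scope: {scope}")
    # Escape backslashes and double quotes inside the quoted string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'synonym: "{escaped}" {scope} []'


def differs_only_by_case(term_name: str, candidate: str) -> bool:
    """True if the candidate synonym adds nothing but a change of case."""
    return term_name.casefold() == candidate.casefold()
```

Here `synonym_line("ornithine cycle", "EXACT")` yields `synonym: "ornithine cycle" EXACT []`, and `differs_only_by_case("start", "START")` is true, so that synonym would not be added.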
Rules For Synonyms
- acronyms are exactly synonymous with the full name, as long as the acronym is not used in any other sense elsewhere
- include implicit information when making a decision and take into account which ontology the term is in; e.g. an entry that ends in 'factor' is not synonymous with a molecular function
- jargon type phrases are exactly synonymous with the full name, as long as the phrase is not used in any other sense elsewhere
- proton is exactly synonymous with hydrogen where hydrogen refers to H+ (hydrogen ion); proton is not synonymous with H2 (hydrogen gas)
- ligand is not exactly synonymous with binding (ligand is an entity, binding is an action)
- x receptor ligand is not exactly synonymous with x (x is only one of the potential ligands so x receptor ligand is broader than x)
- x complex is not exactly synonymous with x (x is ambiguous - could be describing the activity of x)
- x transporter is broader than x porter, x symporter or x antiporter
Cross-referencing other databases
General database cross-references, or general dbxrefs, should be used where a GO term is identical to an object in another database. For more information on syntax, please refer to the GO File Format Guide and for a complete list of dbxrefs, see the database cross-references page.
| Ontology | Database | Sample dbxref |
|---|---|---|
| Function | Enzyme Commission | EC:3.5.1.6 |
| Function | Transport Protein Database | TC:2.A.29.10.1 |
| Function | University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) | UM-BBD_enzymeID:e0310 |
| Function | MetaCyc metabolic pathway database | MetaCyc:XXXX-RXN |
| Process | MetaCyc metabolic pathway database | MetaCyc:2ASDEG-PWY |
| Process | University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) | UM-BBD_pathwayID:dcb |
| Component | None | |
The GO database cross-references set is maintained by the BioMOBY project; please email the GO helpdesk to suggest any changes to this file.
Obsoleting terms
A term that is no longer used is not deleted, but is tagged 'obsolete'. Never delete a GO ID: GO IDs should be conserved at all times so that, even if a term is defunct or has a new GO ID, someone searching using the old GO ID can find it.
A term can become obsolete when it is removed or redefined, but a term should not be made obsolete due to changes in wording that do not alter the meaning of the term (see the documentation on redefining terms). When a term's definition changes meaning, the term should also be assigned a new GO ID, and the old ID considered obsolete.
As a general rule, if the annotations to a GO term would need to be changed as a result of the term definition changing, the term should be made obsolete. However, terms should not be made obsolete on the basis of incorrect annotations; the database that submitted the annotations should be informed of the error instead.
In the browser AmiGO and in OBO-Edit, an obsolete term becomes a child of the meta node obsolete. Obsolete terms are identified in the OBO format flat file by the 'is_obsolete: true' tag.
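As an illustration of how the 'is_obsolete: true' tag can be picked out of a flat file, here is a minimal sketch. It is not a full OBO parser: it assumes the usual stanza layout (a `[Term]` header followed by `tag: value` lines) and ignores everything else. The function names are hypothetical, and the sample text is built from the transfer RNA example in this guide.

```python
def parse_term_stanzas(obo_text: str) -> list:
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    stanzas = []
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return stanzas


def obsolete_ids(obo_text: str) -> list:
    """List the IDs of every term tagged 'is_obsolete: true'."""
    return [s["id"][0] for s in parse_term_stanzas(obo_text)
            if s.get("is_obsolete") == ["true"]]


SAMPLE = """
[Term]
id: GO:0005563
name: transfer RNA
is_obsolete: true

[Term]
id: GO:0030533
name: triplet codon-amino acid adaptor
"""
```

Calling `obsolete_ids(SAMPLE)` returns only the ID of the obsoleted term; the term itself stays in the file, so a search by its old GO ID still finds it.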
Term Obsoletion Protocol
Terms used for manual annotation, generic GO slim, or mappings
When there is a proposal to obsolete a term that has been used by a consortium group for manual annotation, for mappings, or as part of the generic GO slim, the following standard operating procedure should be used to notify the group.
An email is sent to the list with a structured subject line:
Alert: Proposal to obsolete GO:nnnnnnn: term that impacts existing annotation
The email should have this structure:
The proposal has been made to obsolete GO:nnnnnnn: term name.
There exist today annotations to this term as follows (data from AmiGO):
- SGD: n objects
- FB: n objects
- MGD: n objects
- (etc., for all gene association data presented by AmiGO)
*The term is used in the following mappings:
db2go: external term ; ID -> GO:nnnnnnn: term
*The term is found in the generic GO slim set.
The reasons for this proposal are a brief summary of the technical case for change.
The SourceForge discussion is to be found on SourceForge url.
UNLESS OBJECTIONS ARE RECEIVED BY date WE WILL ASSUME THAT YOU AGREE TO THIS CHANGE.
*Delete as appropriate.
Note: the reason for using AmiGO here, and not all of the gene association files, is both for simplicity and because the major impact of these changes is on manually curated, rather than computationally predicted, annotations.
Changes will only be implemented if the following criteria are met:
- No objections are received from consortium members who have existing annotations to this term (i.e. those listed in the above e-mail).
- There is no one else who replies negatively to the proposal within 14 days.
If there is any consortium member who very strongly opposes this change, then the proposal will be put on the agenda for discussion at the next Consortium meeting.
Both those with affected annotations and those without should preserve the subject line in their reply.
We hope, of course, that consensus can be reached without the need for a face-to-face discussion at our next meeting. However, the consequences of some of these changes for the annotating groups are quite severe in terms of the work needed to reannotate, and we consider that it is most reasonable for such changes to be made only after all have had a chance to discuss both the case for change and its implications.
Obsoletion of terms in the generic GO slim
If a term in the generic GO slim is obsoleted after agreement by the consortium, we alert the GO friends mailing list.
Terms not used for manual annotation, generic GO slim, or mappings
Proposed obsoletions should be implemented two weeks after they are announced on SourceForge unless objections are raised.
Comments for Obsolete Terms
When you make a term obsolete, insert the word 'OBSOLETE.' at the beginning of the term definition and add a comment that explains why the term has become obsolete and suggests alternative terms for annotators to use.
Use the following syntax for the reason for obsoletion:
comment: This term was made obsolete because [reason].
To suggest alternative terms, use one of the following:
Exact replacement(s)
If exact replacement is possible (i.e. it is safe to move all existing annotations, keyword mappings, etc. to one term), precede the suggested term with 'use':
To update annotations, use the [ontology name] term '[term] ; GO:[id]'.
example:
term: transfer RNA
goid: GO:0005563
comment: This term was made obsolete because it represents a gene product. To update annotations, use the molecular function term 'triplet codon-amino acid adaptor ; GO:0030533'.
No exact replacement(s)
In cases where all existing annotations and mappings can't necessarily be transferred to one term, put 'consider' in front of the suggested terms. Syntax for different situations:
1. There is only one suggestion, but it may not work for all annotations:
To update annotations, consider the [ontology name] term '[term] ; GO:[id]'.
example:
term: activation of MAPK (mating sensu Fungi)
goid: GO:0030456
comment: This term was made obsolete because it is a gene product specific term. To update annotations, consider the process term 'signal transduction during conjugation with cellular fusion ; GO:0000750'.
2. To make more than one specific suggestion:
a) from a single ontology, separate terms with commas:
To update annotations, consider the [ontology name] terms '[term1] ; GO:[id1]', '[term2] ; GO:[id2]', '[term3] ; GO:[id3]'.
example:
term: allantoin/allantoate transport
goid: GO:0006838
comment: This term was made obsolete because it is a composite term that represents two individual processes. To update annotations, consider the biological process term 'allantoin transport ; GO:0015720', 'allantoate transport ; GO:0015719'.
b) from more than one ontology, separate terms from one ontology with commas, and use 'and' between ontology names:
To update annotations, consider the [ontology name] terms '[term1] ; GO:[id1]' and the [ontology name] term '[term2] ; GO:[id2]'.
examples:
term: expansin
goid: GO:0009936
comment: This term was made obsolete because it represents a gene product. To update annotations, consider the cellular component term 'cell wall (sensu Magnoliophyta) ; GO:0009505' and the biological process term 'cell growth ; GO:0016049'.
term: blue-sensitive opsin
goid: GO:0015059
comment: This term was made obsolete because it refers to a class of proteins. To update annotations, consider the molecular function terms 'photoreceptor ; GO:0009881', '3,4-didehydroretinal binding ; GO:0046876' and 'retinal binding ; GO:0016918' and its children, the cellular component term 'integral to membrane ; GO:0016021' and the biological process terms 'phototransduction, visible light ; GO:0007603' and 'UV-A, blue light phototransduction ; GO:0009588'.
To suggest a term and all its children (as in the example above), use the syntax
consider the [ontology name] term '[term] ; GO:[id]' and its children
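The obsoletion comment syntax can be assembled from a reason and a list of suggested terms grouped by ontology. The sketch below is one possible way to do this, under the assumption that each suggestion is a (term name, GO ID) pair; the function names are hypothetical, and the 'and its children' suffix is not handled.

```python
def _group(ontology: str, terms: list) -> str:
    """Format the suggestions from one ontology, comma-separated."""
    noun = "term" if len(terms) == 1 else "terms"
    quoted = ", ".join(f"'{name} ; {go_id}'" for name, go_id in terms)
    return f"the {ontology} {noun} {quoted}"


def obsoletion_comment(reason: str, suggestions: list, exact: bool = False) -> str:
    """Build the comment for an obsolete term.

    suggestions is a list of (ontology name, [(term name, GO ID), ...]) pairs.
    exact=True means all annotations can safely move to the suggested term,
    so 'use' is written instead of 'consider'.
    """
    verb = "use" if exact else "consider"
    # Suggestions from different ontologies are joined with 'and'.
    groups = " and ".join(_group(ontology, terms) for ontology, terms in suggestions)
    return (f"comment: This term was made obsolete because {reason}. "
            f"To update annotations, {verb} {groups}.")
```

For instance, passing the reason "it represents a gene product", the single molecular function suggestion ("triplet codon-amino acid adaptor", "GO:0030533") and `exact=True` reproduces the transfer RNA comment shown earlier.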
Restoring obsolete terms
If you need to reinstate an obsolete term in the ontologies, use the following:
comment: Note that this term was reinstated from obsolete.
Term Merges, Splits and Movements
Term merges
Terms are merged in cases where two terms have exactly the same meaning. Usually this situation arises when one term exists, and another wording of the same concept is added as a new term instead of as a synonym, either because a curator didn't find the old term or didn't know it meant the same thing.
When two terms are merged, e.g. term A and term B are merged into term A, the GO ID of term B is made a secondary GO ID, and the term string is made a synonym. Usually, the ID that has existed longer is used as the primary ID, but exceptions can be made; for example, the term string of the newer ID may be more correct or the definition may be better.
Secondary GO IDs are stored in the OBO flat file with the 'alt_id' tag.
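A consumer of the ontology can use the secondary IDs to redirect lookups to the surviving term. The following sketch assumes terms are already loaded as dictionaries with an `id` and an optional `alt_id` list; the function name is hypothetical and the IDs are the placeholder values from the "mouse" example, not real merged terms.

```python
def build_id_index(terms: list) -> dict:
    """Map every primary and secondary GO ID to the primary ID of its term."""
    index = {}
    for term in terms:
        index[term["id"]] = term["id"]
        for secondary in term.get("alt_id", []):
            index[secondary] = term["id"]
    return index


# Term B has been merged into term A: B's ID becomes a secondary ID
# and B's term string becomes a synonym of A.
TERMS = [
    {"id": "GO:0000123", "name": "term A",
     "alt_id": ["GO:0000456"], "synonym": ["term B"]},
]
```

With `index = build_id_index(TERMS)`, a lookup of the old ID `index["GO:0000456"]` resolves to the primary ID of term A.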
Term splits
A term can be split if curators decide that it combines two or more concepts that should be represented by separate terms.
The standard procedure for splitting a term is to obsolete the original term and add a comment directing annotators to the new terms. See the example of a term that has been split.
Moving terms
Terms can be moved as long as the term's new position does not break the true path rule. Terms should not, however, be moved between ontologies; only within the same ontology. If you need to move a term to a different ontology, first obsolete it and then create a new term in the other ontology.
Understanding relationships in GO
The GO ontologies are structured as a directed acyclic graph (DAG), which means that a child (more specialized) term can have multiple parents (less specialized terms). This makes GO a powerful system to describe biology, but can also create some pitfalls for curators. Keeping the following guidelines in mind should help you to avoid these problems.
A child term can have one of two different relationships to its parent(s): is_a or part_of. The same term can have different relationships to different parents; for example, the child 'GO term 3' may be an is_a of its parent 'GO term 1' and a part_of its other parent, 'GO term 2'.
The is_a relationship
In GO, an is_a relationship means that the term is a subclass of its parent. For example, mitotic cell cycle is_a cell cycle. It should not be confused with an 'instance' which is a specific example. For example, clogs are a subclass or is_a of shoes, while the shoes I have on my feet now are an instance of shoes. GO, like most ontologies, does not use instances. The is_a relationship is transitive, which means that if 'GO term A' is_a 'GO term B', and 'GO term B' is_a 'GO term C', 'GO term A' is_a 'GO term C':
For example:
Terminal N-glycosylation is_a subclass of terminal glycosylation.
Terminal glycosylation is_a subclass of protein glycosylation.
Terminal N-glycosylation is_a subclass of protein glycosylation.
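Transitivity means that the full set of is_a ancestors can be found by following parent links repeatedly. A minimal sketch, assuming the ontology is held as a dictionary from each term to its direct is_a parents (the data structure and function name are illustrative, not a GO API):

```python
def is_a_ancestors(term: str, parents: dict) -> set:
    """Return every term reachable from `term` by following is_a links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Direct is_a links only; the third statement above is not stored.
IS_A = {
    "terminal N-glycosylation": ["terminal glycosylation"],
    "terminal glycosylation": ["protein glycosylation"],
}
```

Although only two direct links are recorded, `is_a_ancestors("terminal N-glycosylation", IS_A)` contains both terminal glycosylation and protein glycosylation.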
The part_of relationship
The use of part_of in GO is more complex. There are four basic levels of restriction for a part_of relationship:
The first type has no restrictions. That is, no inferences can be made from the relationship between parent and child other than that the parent may or may not have the child as a part, and the child may or may not be a part of the parent.
The second type, necessarily is_part, means that wherever the child exists, it is as part of the parent. To give a biological example, replication fork is part_of chromosome, so whenever replication fork occurs, it is a part of a chromosome, but chromosome does not necessarily have part replication fork.
Type three, necessarily has_part, is the exact inverse of type two; wherever the parent exists, it has the child as a part, but the child is not necessarily part of the parent. For example, nucleus always has chromosome as a part, but chromosome isn't necessarily part of the nucleus.
The final type is a combination of both two and three, has_part and is_part. An example of this is nuclear membrane is part_of nucleus. So nucleus always has the part nuclear membrane, and nuclear membrane is always a part of the nucleus.
The part_of relationship used in GO is usually type two, necessarily is_part. Note that part_of types 1 and 3 are not used in GO, as they would violate the true path rule. Like is_a, part_of is transitive, so that if 'GO term A' is part_of 'GO term B', and 'GO term B' is part_of 'GO term C', 'GO term A' is part_of 'GO term C':
For example:
Laminin-1 is part_of basal lamina.
Basal lamina is part_of basement membrane.
Laminin-1 is part_of basement membrane.
The regulates relationship
In GO, a regulates relationship means that the term is a process that modulates its parent process. For example, regulation of transcription regulates transcription. The regulation of a process is not a part of the process itself. For example, regulation of transcription describes the processes that affect the transcriptional machinery to modulate its activity.
The ontology editing tool OBO-Edit allows you to specify the necessity of relationships. The part_of relationship used in GO, necessarily is_part, would correspond to part_of, [inverse] necessarily true. For more information, see the OBO-Edit user guide.
For information on how these relationships are represented in the GO flat files, see the GO File Format Guide.
For technical information on the relationships used in GO and OBO, see the OBO relationships ontology.
Subsumption paths in GO
Not all of the GO ontologies currently have complete subsumption paths, that is, where every term has at least one path of is_a relationships back to the top node. There are several reasons why completing the subsumption hierarchy is a vital aim for GO.
Ontologically correct
Logically, everything that exists is a kind of something else; this applies to all entities in GO.
More accurate queries and reasoning
Without full subsumption paths we cannot get complete answers to queries such as "show me all the different kinds of membrane", "show me all the different kinds of protein complex". Or put another way, without these extra paths, GO is not complete.
Better compatibility with ontology tools
The majority of ontology tools, with the exception of OBO-Edit, assume that the subsumption hierarchy for an ontology is complete, so completing the subsumption hierarchy for GO will make it more compatible with existing tools, such as Protege-Frames, Protege-OWL and SWOOP.
Improved visualisation
Having complete is_a and part_of paths in GO will allow the design of tools to display alternative is_a and part_of views. This is a more intuitive way to view GO, as it disentangles the complicated mixed-relation view.
To this end, we have recently completed the subsumption hierarchy for the cellular component ontology. This was achieved by creating a set of new high-level terms ending in part. So for example, the term membrane was formerly only a part_of child of cell; it had no is_a parent:
cellular component
[i] cell
---[p] membrane
In the new structure, membrane is now part_of cell, and is_a cell part:
cellular component
[i] cell
---[p] cell part
------[i] membrane
[note that is_a and part_of combine: because membrane is_a cell part and cell part is part_of cell, membrane (like every kind of cell part) is implicitly part_of cell]
So the is_a path would be:
cellular component
[i] cell part
---[i] membrane
And the part_of path:
cell
[p] membrane
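The two views can be derived from the mixed structure by combining relations along each upward path. The sketch below assumes the commonly used composition rule: a path made only of is_a links yields is_a, and any path that includes a part_of link yields part_of. The edge list encodes the new structure described above, with cell part given an is_a link to cellular component as the is_a path implies; the function name and data layout are hypothetical.

```python
def inferred_relations(term: str, edges: dict) -> set:
    """Return (relation, ancestor) pairs implied for `term`."""
    results = set()
    stack = [(term, "is_a")]
    while stack:
        node, relation_so_far = stack.pop()
        for relation, parent in edges.get(node, ()):
            # Only an unbroken chain of is_a links stays is_a.
            if relation_so_far == "is_a" and relation == "is_a":
                combined = "is_a"
            else:
                combined = "part_of"
            if (combined, parent) not in results:
                results.add((combined, parent))
                stack.append((parent, combined))
    return results


EDGES = {
    "cell": [("is_a", "cellular component")],
    "cell part": [("is_a", "cellular component"), ("part_of", "cell")],
    "membrane": [("is_a", "cell part")],
}
```

For membrane, the result includes is_a cell part and is_a cellular component (the is_a view) as well as part_of cell (the part_of view), even though no direct membrane part_of cell link is stored.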
We are working on completing the process ontology subsumption hierarchy, which we hope will be done in 2007.
For more information on the is_a relationship, see the OBO Relations Ontology.
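Whether a subsumption hierarchy is complete can be checked mechanically: every term other than the root must reach the root through is_a links alone. A minimal sketch, assuming a dictionary of direct is_a parents (the function name and data layout are illustrative):

```python
def terms_without_is_a_path(terms: list, is_a: dict, root: str) -> list:
    """Return the terms that cannot reach `root` by is_a links alone."""
    def reaches_root(term: str, seen: set) -> bool:
        if term == root:
            return True
        seen.add(term)
        return any(parent not in seen and reaches_root(parent, seen)
                   for parent in is_a.get(term, ()))

    return sorted(t for t in terms if not reaches_root(t, set()))


# Former structure: membrane was only a part_of child of cell.
OLD_IS_A = {"cell": ["cellular component"]}

# New structure: membrane is_a cell part, which is_a cellular component.
NEW_IS_A = {
    "cell": ["cellular component"],
    "cell part": ["cellular component"],
    "membrane": ["cell part"],
}
```

Run against the former structure, the check reports membrane as lacking an is_a path to cellular component; against the new structure it reports nothing.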
True path rule
The true path rule states that "the pathway from a child term all the way up to its top-level parent(s) must always be true". One of the implications of this is that the type of part_of relationship used in GO, outlined more fully in the part_of relationship documentation above, is restricted to those types where a child term must always be part_of its parent.
Often, annotating a new gene product reveals relationships in an ontology that break the true path rule, or species specificity becomes a problem. In such cases, the ontology must be restructured by adding more nodes and connecting terms such that any path upwards is true. When a term is added to the ontology, the curator needs to add all of the parents and children of the new term.
This becomes clear with an example: consider how chitin metabolism is represented in the process ontology. Chitin metabolism is a part of cuticle synthesis in the fly and is also part of cell wall organization in yeast. This was once represented in the process ontology as follows:
cuticle synthesis
[i] chitin metabolism
cell wall biosynthesis
[i] chitin metabolism
---[i] chitin biosynthesis
---[i] chitin catabolism
The problem with this organization becomes apparent when one tries to annotate a specific gene product from one species. A fly chitin synthase could be annotated to chitin biosynthesis, and appear in a query for genes annotated to cell wall biosynthesis (and its children), which makes no sense because flies don't have cell walls.
This is the revised ontology structure which ensures that the true path rule is not broken:
chitin metabolism
[i] chitin biosynthesis
[i] chitin catabolism
[i] cuticle chitin metabolism
---[i] cuticle chitin biosynthesis
---[i] cuticle chitin catabolism
[i] cell wall chitin metabolism
---[i] cell wall chitin biosynthesis
---[i] cell wall chitin catabolism
The parent chitin metabolism now has the child terms cuticle chitin metabolism and cell wall chitin metabolism, with the appropriate catabolism and synthesis terms beneath them. With this structure, all the daughter terms can be followed up to chitin metabolism, but cuticle chitin metabolism terms do not trace back to cell wall terms, so all the paths are true. In addition, gene products such as chitin synthase can be annotated to nodes of appropriate granularity in both yeast and flies, and queries will yield the expected results.
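The effect of the restructuring on queries can be shown by walking upward from an annotated term in each version. The sketch below encodes only the links drawn in the two diagrams above; the dictionaries and function name are illustrative.

```python
def all_ancestors(term: str, parents: dict) -> set:
    """Collect every ancestor of `term`, following parent links recursively."""
    found = set()
    for parent in parents.get(term, ()):
        if parent not in found:
            found.add(parent)
            found |= all_ancestors(parent, parents)
    return found


# Original structure: chitin metabolism sits under both parents.
OLD = {
    "chitin metabolism": ["cuticle synthesis", "cell wall biosynthesis"],
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin catabolism": ["chitin metabolism"],
}

# Revised structure: cuticle and cell wall terms are separate branches.
NEW = {
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin catabolism": ["chitin metabolism"],
    "cuticle chitin metabolism": ["chitin metabolism"],
    "cuticle chitin biosynthesis": ["cuticle chitin metabolism"],
    "cuticle chitin catabolism": ["cuticle chitin metabolism"],
    "cell wall chitin metabolism": ["chitin metabolism"],
    "cell wall chitin biosynthesis": ["cell wall chitin metabolism"],
    "cell wall chitin catabolism": ["cell wall chitin metabolism"],
}
```

In the original structure, a fly gene product annotated to chitin biosynthesis has cell wall biosynthesis among its ancestors, which is the false path. In the revised structure, the ancestors of cuticle chitin biosynthesis are cuticle chitin metabolism and chitin metabolism only, so no cell wall term is reached.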
Dependent ontology terms
Some GO terms imply the presence of others in the ontology. Examples from the process ontology include the following:
- If either X biosynthesis or X catabolism exists, then the parent X metabolism must also exist.
- If regulation of X exists, then the process X must also exist. Potentially any process in the ontology can be regulated. Note: X may refer to a phenotype (for example cell size in regulation of cell size); in these cases, X should not be added to the ontology.
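These dependencies lend themselves to a simple consistency check over term names. The sketch below is a name-based heuristic only, assuming the naming patterns given in the two rules above; the function name is hypothetical, and phenotypes such as cell size must be supplied by the caller so that they are not reported as missing.

```python
def missing_dependencies(term_names: list, phenotypes: frozenset = frozenset()) -> list:
    """List terms implied by existing terms but absent from the ontology."""
    names = set(term_names)
    missing = set()
    for name in names:
        # X biosynthesis or X catabolism implies X metabolism.
        for suffix in (" biosynthesis", " catabolism"):
            if name.endswith(suffix):
                parent = name[: -len(suffix)] + " metabolism"
                if parent not in names:
                    missing.add(parent)
        # regulation of X implies X, unless X is a phenotype.
        prefix = "regulation of "
        if name.startswith(prefix):
            target = name[len(prefix):]
            if target not in names and target not in phenotypes:
                missing.add(target)
    return sorted(missing)
```

For example, an ontology containing only chitin biosynthesis, regulation of transcription and regulation of cell size, checked with cell size listed as a phenotype, would be reported as missing chitin metabolism and transcription.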