How do I annotate ESTs?

To make electronic GO annotations for ESTs, the usual approach is to BLAST the EST sequences against sequences that have been manually annotated, then transfer the annotations from similar sequences with the evidence code IEA (Inferred from Electronic Annotation).
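
As a rough illustration of the transfer step, here is a minimal Python sketch. It assumes the BLAST results were saved in the standard 12-column tabular format (-outfmt 6 in BLAST+), and that you already have a lookup from subject sequence ID to GO IDs. The function name transfer_go_terms, the subject_to_go dictionary, and the 1e-10 E-value cutoff are all hypothetical choices made for this example, not GO recommendations.

```python
from collections import defaultdict


def transfer_go_terms(blast_lines, subject_to_go, max_evalue=1e-10):
    """Transfer GO IDs from annotated BLAST subjects to the query ESTs.

    blast_lines   -- iterable of lines in 12-column BLAST tabular format
    subject_to_go -- dict: subject sequence ID -> iterable of GO IDs
                     (hypothetical lookup, built separately)
    max_evalue    -- hits with a larger E-value are ignored
                     (illustrative cutoff, an assumption of this sketch)
    """
    annotations = defaultdict(set)
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular hit line
        est_id, subject_id = fields[0], fields[1]
        evalue = float(fields[10])  # column 11 is the E-value
        if evalue > max_evalue:
            continue
        for go_id in subject_to_go.get(subject_id, ()):
            # Record the evidence code IEA and the sequence the term came from.
            annotations[est_id].add((go_id, "IEA", subject_id))
    return {est: sorted(rows) for est, rows in annotations.items()}
```

A suitable cutoff depends on your data; you may also want to filter on percent identity or alignment length, which are columns 3 and 4 of the same format.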

Some useful tools for EST annotation:

  • The previous version of the AmiGO browser has a built-in BLAST query feature, which you can still use to query annotated gene products in the GO database. For large batch queries, you may want to download the file of annotated sequences and run BLAST against it locally. The file is available from the GO FTP site (ftp://ftp.geneontology.org/pub/go) and is updated regularly.

    Another option might be to install the AmiGO code and GO database locally.

    The underlying data are in flat files that can be found in these directories on the GO FTP site:

    ftp://ftp.geneontology.org/pub/go/gene_associations (annotated gene products)
    ftp://ftp.geneontology.org/pub/go/gp2protein (UniProt IDs for annotated protein sequences)

    There is a README for the gp2protein directory. The format of the files in the /gene_associations directory is described in the GO annotation guide. Please let us know if you have questions about these files.

  • You could also try using InterProScan to find protein domains/motifs encoded by the ESTs, and transfer GO terms that have been associated with InterPro entries. See InterPro for more information. This related FAQ may be useful: "How do I annotate a de novo assembled transcriptome against the GO database?"
  • Several other groups have done automated assignment of GO terms to genes or proteins, including ESTs, and many of them would probably be willing to share their methods and software.
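
If you work from the flat files in the gene_associations directory, a lookup from gene product to GO IDs can be built with a few lines of Python. The sketch below assumes the tab-delimited GAF layout, in which comment lines start with "!" and columns 1, 2, 4 and 5 hold the database, the object ID, the qualifier and the GO ID. Check the GO annotation guide for the exact format of the files you download. The function name read_gene_associations is hypothetical.

```python
from collections import defaultdict


def read_gene_associations(lines):
    """Build a dict of 'DB:ID' -> set of GO IDs from gene association lines.

    Assumes tab-delimited GAF rows with at least 15 columns.
    Rows qualified with NOT are skipped, because they state that the
    gene product does not have the given function, process or location.
    """
    go_by_product = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # malformed or truncated row
        db, object_id, qualifier, go_id = cols[0], cols[1], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue
        go_by_product[f"{db}:{object_id}"].add(go_id)
    return dict(go_by_product)
```

For a gzip-compressed file, the lines can be supplied with gzip.open(path, "rt"). The keys here are gene product IDs, which may not match the sequence IDs in your BLAST database; the gp2protein files are the place to look for that correspondence.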