The use of multiple hierarchically independent gene ontology terms in gene function prediction and genome annotation

Y.I.A. Kourmpetis, A. van der Burgt, M.C.A.M. Bink, C.J.F. ter Braak, R.C.H.J. van Ham

Research output: Contribution to journal › Article › Academic › peer-review

4 Citations (Scopus)

Abstract

The Gene Ontology (GO) is a widely used controlled vocabulary for describing gene function. In this study we quantify the usage of multiple, hierarchically independent GO terms in the curated genome annotations of seven well-studied species. In most genomes, significant proportions (6 - 60%) of genes have been annotated with multiple hierarchically independent terms, which may be necessary to attain adequate specificity of description. One notable exception is Arabidopsis thaliana, in which genes are much less frequently annotated with multiple terms (6 - 14%). In contrast, an analysis of the occurrence of InterPro hits in the proteomes of the seven species, followed by a mapping of the hits to GO terms, did not reveal an aberrant pattern for the A. thaliana genome. This study shows the widespread usage of multiple hierarchically independent GO terms in the functional annotation of genes. Consequently, probabilistic methods that employ the GO and aim to predict gene function automatically by integrating diverse genomic datasets must be able to predict such multiple terms. We attribute the low frequency with which multiple GO terms are used in Arabidopsis to differences in genome annotation and curation practices between communities of annotators. Such differences may bias genome-scale comparisons of gene function between species. GO term assignment should therefore be performed according to strictly similar rules and standards.
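To make the notion of "hierarchically independent" terms concrete, the following minimal sketch treats two terms as independent when neither is an ancestor of the other in the ontology graph, and then computes the fraction of genes carrying at least one such pair. It is an illustration only, not the authors' procedure: the term identifiers, the toy hierarchy, the gene names, and all function names are hypothetical.

```python
from itertools import combinations

# Toy ontology: term -> set of direct parents.
# All identifiers are hypothetical and are NOT real GO accessions.
PARENTS = {
    "T:root": set(),
    "T:binding": {"T:root"},
    "T:dna_binding": {"T:binding"},
    "T:catalysis": {"T:root"},
    "T:kinase": {"T:catalysis"},
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def independent(a, b):
    """True when neither term is an ancestor of the other."""
    return a != b and a not in ancestors(b) and b not in ancestors(a)


def has_multiple_independent_terms(terms):
    """True when at least one pair of the gene's terms is independent."""
    return any(independent(a, b) for a, b in combinations(set(terms), 2))


def fraction_multi(annotations):
    """Fraction of genes annotated with >= 2 hierarchically independent terms."""
    flagged = [g for g, ts in annotations.items()
               if has_multiple_independent_terms(ts)]
    return len(flagged) / len(annotations)


# Hypothetical annotations for three made-up genes.
ANNOTATIONS = {
    "geneA": ["T:dna_binding", "T:kinase"],   # independent pair
    "geneB": ["T:binding", "T:dna_binding"],  # one term is an ancestor of the other
    "geneC": ["T:kinase"],                    # single term
}

if __name__ == "__main__":
    print(fraction_multi(ANNOTATIONS))
```

In this toy setting only `geneA` counts, because the two terms of `geneB` lie on the same path to the root. A real analysis would load the full GO graph and curated annotation files, and would have to decide how to handle the three GO sub-ontologies; those choices are left open here.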
Original language: English
Journal: In Silico Biology
Volume: 7
Issue number: 0050
Publication status: Published - 2007

Keywords

  • Annotation strategies
  • Arabidopsis genome
  • Gene function prediction
  • Gene Ontology
  • Genome annotation
  • Multi-label classification
  • Protein function
