While the number of sequenced prokaryotic genomes grows rapidly, the number of gene function assignments in the available databases remains low and skewed toward Escherichia coli. The February 2012 release of the UniProt-GOA database holds ~6000 annotations with experimental support for E. coli, and ~800 such annotations for all other prokaryotes combined.
Thus, there is a need for computational methods that predict gene function on a massive scale and help prioritize downstream experimental work. To this end, we developed GORBI, a machine learning framework for orthology- and paralogy-aware phylogenetic profiling, which provides a large number of computational annotations with high accuracy in train/test evaluations.
Importantly, we have shown as a proof of principle that our functional annotation model can be used to generate relevant biological hypotheses: experiments we performed on 38 E. coli knockout mutants demonstrate that GORBI provides realistic estimates of accuracy.
Our predictions include annotations for 1.3 million genes with an estimated precision of 90%; these, and many more predictions for 998 prokaryotic genomes, are freely available from the GORBI search page.
Please click the links in the left side-bar for more information on specific topics.