
OMA cliques in phylogenetic profiling

We based our functional annotation models on orthologous and paralogous pairs inferred by the OMA algorithm, briefly explained here and detailed in the literature: 

  • Roth et al., 2008, "Algorithm of OMA for large-scale orthology inference"
  • Altenhoff et al., 2011, "OMA 2011: orthology inference among 1000 complete genomes"

The OMA algorithm is available as a standalone version, and its results are also available for browsing.

The OMA algorithm is a graph-based method of orthology inference. It starts with an all-against-all sequence alignment: two proteins from different species are connected if, within a confidence interval, each is the best hit of the other in the compared species (best bidirectional hits). A connection is broken if a third species contains a pair of proteins, each of which is more similar to one protein of the evaluated pair than the two connected proteins are to each other; such broken pairs are inferred paralogs. The remaining connections are inferred orthologs. Finally, OMA cliques of orthologs are sub-graphs in which every protein is connected to every other protein by an orthologous relationship.
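The three steps described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the OMA implementation: the similarity scores, protein names and function names are hypothetical, the confidence interval is ignored, and maximal cliques are enumerated with a plain Bron-Kerbosch recursion rather than with OMA's own clique procedure.

```python
from itertools import permutations

def sim(scores, p, q):
    """Symmetric similarity lookup; pairs without a score count as 0."""
    return scores.get(frozenset((p, q)), 0.0)

def best_hit(p, target, species_of, scores):
    """Most similar protein to p in species `target`, or None if there is no hit."""
    cands = [q for q in species_of
             if species_of[q] == target and sim(scores, p, q) > 0]
    return max(cands, key=lambda q: sim(scores, p, q), default=None)

def bidirectional_best_hits(species_of, scores):
    """Connect proteins of two species that are each other's best hit."""
    pairs = set()
    for p, sp in species_of.items():
        for other in set(species_of.values()) - {sp}:
            q = best_hit(p, other, species_of, scores)
            if q is not None and best_hit(q, sp, species_of, scores) == p:
                pairs.add(frozenset((p, q)))
    return pairs

def has_witness(pair, species_of, scores):
    """True if a third species holds two proteins, one closer to each pair member."""
    x, y = tuple(pair)
    s_xy = sim(scores, x, y)
    by_species = {}
    for z, sp in species_of.items():
        if sp not in (species_of[x], species_of[y]):
            by_species.setdefault(sp, []).append(z)
    for zs in by_species.values():
        for z1, z2 in permutations(zs, 2):
            if sim(scores, x, z1) > s_xy and sim(scores, y, z2) > s_xy:
                return True
    return False

def maximal_cliques(edges):
    """Enumerate maximal cliques of the ortholog graph (basic Bron-Kerbosch)."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = []
    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return found

# Hypothetical toy data: species A and B each kept one copy of a duplicated gene,
# species C kept both copies.
species_of = {"a1": "A", "b1": "B", "c1": "C", "c2": "C"}
scores = {
    frozenset(("a1", "b1")): 50, frozenset(("a1", "c1")): 80,
    frozenset(("b1", "c2")): 80, frozenset(("a1", "c2")): 40,
    frozenset(("b1", "c1")): 40,
}

pairs = bidirectional_best_hits(species_of, scores)
paralogs = {p for p in pairs if has_witness(p, species_of, scores)}
orthologs = pairs - paralogs
cliques = maximal_cliques(orthologs)
```

In this toy case the pair a1/b1 is a best bidirectional hit, but species C acts as a witness (c1 is closer to a1 and c2 is closer to b1), so the pair is reclassified as paralogous, leaving the two ortholog cliques {a1, c1} and {b1, c2}.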

The figure below shows a small section of the protein graph obtained using the OMA algorithm and the possible relations used in constructing phylogenetic profiles. 

Part of the OMA graph
Members of an OMA group are all connected by orthologous relations and form a clique (red). Some orthologous proteins are left out when cliques are formed because they lack an orthologous connection to at least one group member (blue). Pairs broken by a witness of non-orthology are inferred paralogs (green).

To choose among the models for functional annotation, we constructed four kinds of phylogenetic profiles. First, phylogenetic profiles of OMA cliques of orthologs: each profile represents the pattern of presence/absence of an OMA clique member among 909 bacterial and 89 archaeal genomes. Second, we added the presence patterns for all orthologs inferred by the OMA algorithm that did not participate in the ortholog clique. Third, we further added the presence patterns for all paralogs inferred by the OMA algorithm. Fourth, we made a separate set of phylogenetic profiles that include only clique members and paralogs, but not the orthologs outside of the clique.
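As an illustration of the four profile kinds, the following Python sketch builds them for a single OMA clique. The function name, the profile labels and the three-genome example are hypothetical, and the third kind is read here as cumulative (clique members, outside orthologs and paralogs together).

```python
def build_profiles(genomes, clique_sp, ortholog_sp, paralog_sp):
    """Return the four presence/absence vectors for one OMA clique.

    clique_sp   -- genomes containing a clique member
    ortholog_sp -- genomes containing an ortholog left out of the clique
    paralog_sp  -- genomes containing an inferred paralog
    """
    present = {
        "clique": clique_sp,
        "clique+orthologs": clique_sp | ortholog_sp,
        "clique+orthologs+paralogs": clique_sp | ortholog_sp | paralog_sp,
        "clique+paralogs": clique_sp | paralog_sp,
    }
    # One binary entry per genome, in the order given by `genomes`.
    return {kind: [1 if g in found else 0 for g in genomes]
            for kind, found in present.items()}

# Hypothetical example with three genomes instead of the full genome set.
genomes = ["Species 1", "Species 2", "Species 3"]
profiles = build_profiles(
    genomes,
    clique_sp={"Species 1"},
    ortholog_sp={"Species 2"},
    paralog_sp={"Species 3"},
)
```

Here "Species 3" holds only a paralog, so it is marked present in the two paralog-aware profiles and absent in the other two.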

Constructing phylogenetic profiles: the presence of the corresponding homolog is shown by colours and their combinations. For example, when constructing the phylogenetic profile that accounts for OMA clique members (red) and all left-out orthologs (blue), the cell in the 1st column and 1st row will have '1': 'Species 1' has an 'OMA 1' clique member (red) and at least one more protein in an orthologous relationship with at least one protein from 'OMA 1' (blue); the cell in the 998th column and 2nd row will have '0': 'Species 998' only has protein(s) in a paralogous relationship to 'OMA 2' members. In the Function column, a Gene Ontology annotation is assigned when at least half of the OMA clique members have the respective annotation.
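The rule for the Function column can be sketched in Python as follows. The function name and the member identifiers are hypothetical, the GO terms are used only as illustrative labels, and the sketch assumes that members without any annotation still count towards the clique size.

```python
from collections import Counter

def clique_annotations(member_terms):
    """GO terms carried by at least half of the clique members.

    member_terms -- mapping from clique member to its set of GO terms
    """
    n_members = len(member_terms)
    counts = Counter(term
                     for terms in member_terms.values()
                     for term in set(terms))
    # "At least half" is checked with integers to avoid rounding issues.
    return {term for term, c in counts.items() if 2 * c >= n_members}

# Hypothetical clique with four members.
members = {
    "p1": {"GO:0006412", "GO:0005840"},
    "p2": {"GO:0006412"},
    "p3": set(),
    "p4": set(),
}
assigned = clique_annotations(members)
```

With four members, a term needs at least two carriers: GO:0006412 (two of four) is assigned to the clique, while GO:0005840 (one of four) is not.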

© 2010 Laboratory for information systems

Creative Commons License
GORBI: Gene Ontology at Ruđer Bošković Institute by http://gorbi.irb.hr is licensed under a Creative Commons Attribution-Non-Commercial-Share Alike 3.0 Croatia License.