OMA orthologs and paralogs
OMA is a graph-based method of orthology inference. The algorithm starts with an all-against-all sequence alignment: proteins from two species are connected if they are bidirectional best hits, where hits falling within a confidence interval of the best hit also count as best, so that near-ties are not discarded.
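The connection step can be illustrated with a small sketch. It assumes a hypothetical table of pairwise similarity scores between the proteins of two species (higher means more similar) and treats any hit whose score lies within a fixed `tolerance` of the best score as a best hit. The fixed tolerance is only a stand-in for the confidence interval OMA uses; this is not OMA's implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: similarity scores (higher = more similar) for protein
# pairs (protein_in_A, protein_in_B) from an all-against-all alignment.
Scores = Dict[Tuple[str, str], float]


def near_best_hits(scores: Scores, query: str, side: int, tolerance: float) -> Set[str]:
    """Partners of `query` scoring within `tolerance` of its best score.

    side=0 means `query` is a protein of species A, side=1 of species B.
    """
    hits = {pair[1 - side]: s for pair, s in scores.items() if pair[side] == query}
    if not hits:
        return set()
    best = max(hits.values())
    return {partner for partner, s in hits.items() if s >= best - tolerance}


def bidirectional_best_hits(scores: Scores, tolerance: float = 0.0) -> Set[Tuple[str, str]]:
    """Connect a (in A) and b (in B) when each is a near-best hit of the other."""
    connected = set()
    for a in {a for a, _ in scores}:
        for b in near_best_hits(scores, a, 0, tolerance):
            if a in near_best_hits(scores, b, 1, tolerance):
                connected.add((a, b))
    return connected


if __name__ == "__main__":
    toy = {("A1", "B1"): 95.0, ("A1", "B2"): 93.0,
           ("A2", "B1"): 60.0, ("A2", "B2"): 90.0}
    print(sorted(bidirectional_best_hits(toy)))       # strict best hits only
    print(sorted(bidirectional_best_hits(toy, 5.0)))  # near-ties tolerated
```

With `tolerance=0` only the strict mutual best pair (A1, B1) survives. With `tolerance=5.0` the near-tie between B1 and B2 for A1 also keeps (A1, B2), and A2 is connected to B2 because A2 now falls within B2's tolerated best hits.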
Next, these connections may be broken if a third species contains a pair of proteins more similar to each of these two proteins than the two connected proteins are to each other. Such broken pairs are inferred paralogs, while the remaining connections are inferred orthologs. Finally, OMA ortholog cliques are subgraphs in which every pair of proteins is connected by an orthology relationship.
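The witness and clique steps can be sketched as follows. This is a simplified illustration, not OMA's implementation. It assumes a hypothetical table of pairwise distances (lower means more similar) and uses one plausible reading of the witness test: a connected pair x1, x2 is reclassified as paralogous if a third species holds proteins y1, y2 such that x1 is closer to y1 than to y2 and x2 is closer to y2 than to y1, each by at least a fixed `margin` (a stand-in for OMA's confidence intervals). Maximal cliques of the remaining ortholog graph are then enumerated with the textbook Bron-Kerbosch algorithm; OMA's own grouping procedure may differ in detail.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]
Distances = Dict[FrozenSet[str], float]


def has_witness(x1: str, x2: str, species: Dict[str, str],
                dist: Distances, margin: float) -> bool:
    """Simplified witness-of-non-orthology test (an assumption, not OMA's exact rule)."""
    def d(p: str, q: str) -> float:
        return dist[frozenset((p, q))]

    # Group the proteins of every third species (neither x1's nor x2's).
    third: Dict[str, List[str]] = {}
    for prot, sp in species.items():
        if sp not in (species[x1], species[x2]):
            third.setdefault(sp, []).append(prot)
    for prots in third.values():
        for y1, y2 in permutations(prots, 2):
            # x1 groups with y1 and x2 with y2: they likely descend from different duplicates.
            if d(x1, y1) + margin < d(x1, y2) and d(x2, y2) + margin < d(x2, y1):
                return True
    return False


def split_pairs(edges: Set[Edge], species: Dict[str, str],
                dist: Distances, margin: float = 5.0) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (orthologs, paralogs): edges with a witness become paralogs."""
    orthologs: Set[Edge] = set()
    paralogs: Set[Edge] = set()
    for x1, x2 in edges:
        target = paralogs if has_witness(x1, x2, species, dist, margin) else orthologs
        target.add((x1, x2))
    return orthologs, paralogs


def maximal_cliques(edges: Set[Edge]) -> List[Set[str]]:
    """Bron-Kerbosch enumeration of maximal cliques in the ortholog graph."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


if __name__ == "__main__":
    species = {"s1a": "S1", "s2b": "S2", "s3a": "S3", "s3b": "S3", "s4a": "S4"}
    dist = {frozenset(k): v for k, v in {
        ("s1a", "s3a"): 10, ("s1a", "s3b"): 40,
        ("s2b", "s3a"): 40, ("s2b", "s3b"): 10,
        ("s4a", "s3a"): 10, ("s4a", "s3b"): 40}.items()}
    edges = {("s1a", "s2b"), ("s1a", "s3a"), ("s2b", "s3b"),
             ("s1a", "s4a"), ("s3a", "s4a")}
    orthologs, paralogs = split_pairs(edges, species, dist)
    print("paralogs:", paralogs)
    print("cliques:", [sorted(c) for c in maximal_cliques(orthologs)])
```

The toy data mimic a duplication followed by differential loss: S1 kept copy a, S2 kept copy b, and S3 kept both. S3 therefore witnesses that the s1a/s2b connection joins paralogs, and the remaining orthology edges form the cliques {s1a, s3a, s4a} and {s2b, s3b}. Because orthology edges only join proteins of different species, each clique holds at most one protein per species.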
Members of an OMA clique (red) are all interconnected by orthology relationships. Some orthologous proteins (blue) are left out of the clique because they lack an orthology connection to at least one clique member. A search for 'witnesses of non-orthology' detects paralogs (green).
Based on the above, we constructed four kinds of phylogenetic profiles:
- First, phylogenetic profiles of OMA cliques of orthologs: each profile records the presence or absence of a member of one OMA clique across 909 bacterial and 89 archaeal genomes.
- Second, we added the presence patterns of all remaining OMA-inferred orthologs that are not members of the original ortholog clique.
- Third, we added presence patterns for all paralogs inferred by the OMA algorithm.
- Fourth, we also made a separate set of phylogenetic profiles that include only clique members and paralogs, but not the orthologs outside of the clique.
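The four profile types amount to different inclusion rules applied to the same per-species membership data. The sketch below assumes, hypothetically, that for one OMA group we know which categories of protein each species has ("clique" for clique members, "ortholog" for left-out orthologs, "paralog" for paralogs), and it reads the second and third profile types as cumulative, each adding to the previous one. The profile names are illustrative labels, not terms from the source.

```python
from typing import Dict, List, Set

# Hypothetical labels for the four profile types; each lists the protein
# categories whose presence in a species sets that species' cell to 1.
PROFILE_KINDS: Dict[str, Set[str]] = {
    "clique": {"clique"},
    "clique+orthologs": {"clique", "ortholog"},
    "clique+orthologs+paralogs": {"clique", "ortholog", "paralog"},
    "clique+paralogs": {"clique", "paralog"},
}


def phylogenetic_profile(membership: Dict[str, Set[str]],
                         species_order: List[str], kind: str) -> List[int]:
    """0/1 vector over `species_order` for one OMA group.

    `membership` maps a species to the categories of this group's proteins it
    contains; species missing from the map get 0.
    """
    included = PROFILE_KINDS[kind]
    return [1 if membership.get(sp, set()) & included else 0 for sp in species_order]


if __name__ == "__main__":
    species_order = ["Species 1", "Species 2", "Species 3"]  # 998 genomes in the real data
    oma_1 = {"Species 1": {"clique", "ortholog"},
             "Species 2": {"ortholog"},
             "Species 3": {"paralog"}}
    for kind in PROFILE_KINDS:
        print(f"{kind:28s}", phylogenetic_profile(oma_1, species_order, kind))
```

In this toy group, a species that has only paralogs (Species 3) scores 0 in the first two profile types and 1 in the last two, while a species with only left-out orthologs (Species 2) appears only once those orthologs are included.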
Constructing phylogenetic profiles: the presence of each kind of homolog is shown by color (rows are OMA groups, columns are species). For example, in the profile that counts OMA clique members (red) and all left-out orthologs (blue), the cell in the 1st row and 1st column is '1': 'Species 1' has an 'OMA 1' clique member (red) and at least one more protein orthologous to at least one 'OMA 1' protein (blue). The cell in the 2nd row and 998th column is '0': 'Species 998' has only protein(s) paralogous to 'OMA 2' members. In the Function column, a Gene Ontology annotation was assigned when at least half of the OMA clique members carry that annotation.
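The annotation rule for the Function column is a simple threshold on member counts. A minimal sketch, assuming each clique member's Gene Ontology annotations are given as a set of GO IDs treated as flat labels (the source does not say whether annotations were propagated through the ontology); the GO IDs in the example are illustrative input, not data from this study.

```python
from collections import Counter
from typing import Iterable, Set


def clique_go_terms(member_annotations: Iterable[Set[str]]) -> Set[str]:
    """GO terms carried by at least half of the clique members."""
    members = list(member_annotations)
    # Each member counts a term at most once.
    counts = Counter(term for terms in members for term in set(terms))
    # 2 * count >= n expresses "at least half" without floating-point division.
    return {term for term, c in counts.items() if 2 * c >= len(members)}


if __name__ == "__main__":
    clique = [{"GO:0006412", "GO:0003735"},
              {"GO:0006412", "GO:0003735"},
              {"GO:0006412"},
              {"GO:0005840"}]
    print(sorted(clique_go_terms(clique)))
```

Here GO:0003735 is kept because 2 of the 4 members carry it (exactly half), whereas GO:0005840, carried by only 1 of 4, is dropped.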