We based our functional annotation models on orthologous and paralogous pairs inferred by the OMA algorithm, briefly explained here and detailed in the literature:
The OMA algorithm is a graph-based method of orthology inference. It starts with an all-against-all sequence alignment: two proteins from different species are connected if each is the other's best hit, within a confidence interval, in the compared species (bidirectional best hits). A connection is broken if a third species contains a pair of proteins, each of which is more similar to one member of the evaluated pair than the two connected proteins are to each other; the broken pairs are inferred to be paralogs, and the remaining connections are inferred to be orthologs. Finally, OMA cliques of orthologs are sub-graphs in which every protein is connected to every other protein by an orthologous relationship.
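The steps described above can be illustrated with a toy sketch. This is a simplified reading of the description, not the OMA implementation: the function names (`infer_pairs`, `is_clique`), the similarity scores, and the protein identifiers are hypothetical, the confidence interval is omitted, and a higher score is assumed to mean greater similarity.

```python
from itertools import combinations, permutations

def infer_pairs(sim, species_of):
    """Return (orthologs, paralogs) as sets of frozenset protein pairs."""
    def s(a, b):
        # Similarity lookup; unscored pairs count as 0 (an assumption).
        return sim.get(frozenset((a, b)), 0.0)

    by_species = {}
    for prot, sp in species_of.items():
        by_species.setdefault(sp, []).append(prot)

    def best_hit(p, sp):
        # Most similar protein to p within species sp.
        return max(by_species[sp], key=lambda q: s(p, q))

    # Step 1: connect bidirectional best hits between every pair of species.
    candidates = set()
    for sp_a, sp_b in combinations(by_species, 2):
        for p in by_species[sp_a]:
            q = best_hit(p, sp_b)
            if s(p, q) > 0 and best_hit(q, sp_a) == p:
                candidates.add(frozenset((p, q)))

    # Step 2: break a connection when a third species holds two distinct
    # proteins, one closer to x and the other closer to y than x is to y.
    orthologs, paralogs = set(), set()
    for pair in candidates:
        x, y = tuple(pair)
        others = [members for sp, members in by_species.items()
                  if sp not in (species_of[x], species_of[y])]
        witnessed = any(
            s(x, z1) > s(x, y) and s(y, z2) > s(x, y)
            for members in others
            for z1, z2 in permutations(members, 2)
        )
        (paralogs if witnessed else orthologs).add(pair)
    return orthologs, paralogs

def is_clique(proteins, orthologs):
    """True if every pair among the given proteins is an inferred ortholog."""
    return all(frozenset(p) in orthologs for p in combinations(proteins, 2))

# Toy data: an ancient duplication gave copies 1 and 2; species A kept only
# copy 1, species B kept only copy 2, and species C kept both.
species_of = {"a1": "A", "b2": "B", "c1": "C", "c2": "C"}
sim = {
    frozenset(("a1", "b2")): 55, frozenset(("a1", "c1")): 90,
    frozenset(("a1", "c2")): 50, frozenset(("b2", "c1")): 50,
    frozenset(("b2", "c2")): 90, frozenset(("c1", "c2")): 60,
}
orthologs, paralogs = infer_pairs(sim, species_of)
```

In this toy case `a1` and `b2` are bidirectional best hits, but species C contains `c1` and `c2`, which are closer to `a1` and `b2` respectively, so the pair is reclassified as paralogous; the pairs `a1`/`c1` and `b2`/`c2` remain orthologous.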
The figure below shows a small section of the protein graph obtained using the OMA algorithm and the possible relations used in constructing phylogenetic profiles.
To choose among the models for functional annotation, we constructed four kinds of phylogenetic profiles. First, phylogenetic profiles of OMA cliques of orthologs: each profile represents the pattern of presence or absence of an OMA clique member among 909 bacterial and 89 archaeal genomes. Second, to the clique profiles we added the presence patterns for all orthologs inferred by the OMA algorithm that were not members of the ortholog clique. Third, we further added the presence patterns for all paralogs inferred by the OMA algorithm. Fourth, we made a separate set of phylogenetic profiles that include only clique members and paralogs, but not the orthologs outside the clique.
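The four kinds of profiles can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoding used in this work: it assumes each relation type contributes its own block of per-genome presence bits and that blocks are concatenated (merging them into a single presence vector would be another reading). The function name `build_profiles`, the dictionary keys, and the toy identifiers are hypothetical.

```python
def build_profiles(clique, extra_orthologs, paralogs, genome_of, genomes):
    """Build the four profile variants for one OMA clique.

    clique, extra_orthologs, paralogs: sets of protein identifiers.
    genome_of: maps each protein to the genome that encodes it.
    genomes: ordered list of genomes defining the profile columns.
    """
    def presence(proteins):
        # 1 if any of the proteins is encoded in the genome, else 0.
        present = {genome_of[p] for p in proteins}
        return [1 if g in present else 0 for g in genomes]

    c = presence(clique)
    o = presence(extra_orthologs)   # orthologs outside the clique
    p = presence(paralogs)
    return {
        "clique": c,
        "clique+orthologs": c + o,
        "clique+orthologs+paralogs": c + o + p,
        "clique+paralogs": c + p,
    }

# Toy example with four genomes instead of the 998 used in the study.
genomes = ["G1", "G2", "G3", "G4"]
genome_of = {"p1": "G1", "p2": "G2", "p3": "G3", "p4": "G2", "p5": "G4"}
profiles = build_profiles(
    clique={"p1", "p2"},
    extra_orthologs={"p3"},
    paralogs={"p4", "p5"},
    genome_of=genome_of,
    genomes=genomes,
)
```

Keeping the blocks separate lets a downstream model distinguish a genome that contains a clique member from one that contains only a paralog, which is the distinction the fourth profile type is meant to probe.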