Supplementary Datasets for: Vidulin, Smuc and Supek, Bioinformatics (under review) 2016.

These data sets contain the predicted gene functions for 5,133,543 genes from 2,071 microbial genomes. They were obtained in a single massive analysis that draws on five established genomic methodologies: phylogenetic profiles, conserved gene neighborhoods, remote homology patterns, protein biophysical properties and codon adaptation profiles.

In particular, the data sets include:

  • the GO terms we initially assigned to each of the 21,626 COG/NOG gene families using the '50% rule'
  • the identifiers (PID, Entrez GeneID, gene name, synonym and RefSeq) of genes in each OG
  • for each of the five AFP methods, their predictions for each OG (the precision score, equivalent to 1-FDR, for all 1,227 learnable GO terms)
  • the predictions after integrating the methods via the top-performing 'best precision', 'weighted voting' and 'one vote' schemes

In the tables with predictions, columns are GO terms, rows are COGs/NOGs (from eggNOG v4), and each cell contains the precision score, which is equivalent to 1-FDR. Scores with precision < 0.10 (equivalently, FDR > 0.90) are all written as 0.
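The table layout described above can be read with a few lines of Python. This is only a minimal sketch: it assumes the files are tab-delimited, that the header row lists the GO terms, and that the first column holds the COG/NOG identifier. None of these details are confirmed on this page, and the function names and the sample identifiers are hypothetical.

```python
import csv
import gzip
import io


def read_predictions(handle, min_precision=0.5):
    """Yield (og_id, go_term, precision, fdr) for cells at or above min_precision.

    Assumed layout (not confirmed by the data set description): tab-delimited
    text, header row = GO terms, first column = COG/NOG identifier.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    go_terms = header[1:]  # skip the label of the identifier column
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        og_id = row[0]
        for go_term, cell in zip(go_terms, row[1:]):
            precision = float(cell)
            # A cell written as 0 stands for precision < 0.10 (FDR > 0.90),
            # so it is never reported as a prediction.
            if precision > 0 and precision >= min_precision:
                yield og_id, go_term, precision, 1.0 - precision


def read_predictions_file(path, min_precision=0.5):
    """Same as read_predictions, but opens a gzip-compressed file by path."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from read_predictions(handle, min_precision)


# Tiny in-memory example with made-up rows in the assumed layout.
sample = (
    "OG\tGO:0006412\tGO:0005524\n"
    "COG0001\t0.92\t0\n"
    "NOG12345\t0.30\t0.75\n"
)
hits = list(read_predictions(io.StringIO(sample), min_precision=0.5))
for og_id, go_term, precision, fdr in hits:
    print(og_id, go_term, precision, round(fdr, 2))
```

For a downloaded file, something like `read_predictions_file("predictions_Phyletic-profiles.txt.gz", min_precision=0.9)` would keep only predictions with FDR of at most 0.10, provided the layout assumptions hold.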

 

Predictions - individual methods (5-8 MB per file)

predictions_Phyletic-profiles.txt.gz

predictions_Empirical-kernel-map.txt.gz

predictions_Conserved-gene-neighborhoods.txt.gz

predictions_Biophysical-protein-sequence-properties.txt.gz

predictions_Translation-efficiency-profiles.txt.gz

 

Predictions - integration schemes (approx. 8 MB per file)

predictions_integrated_Best-precision.txt.gz

predictions_integrated_Weighted-voting.txt.gz

predictions_integrated_One-vote.txt.gz

 

ID and mapping files.

Gene-identifiers-to-OGs-mapping.txt.gz  (50 Mb)

GO-terms-assigned-to-OGs_using-50-percent-rule.txt.gz  (0.3 Mb)

 

More details of the methodology will become available upon acceptance of the manuscript.

 

GORBI: Gene Ontology at Ruđer Bošković Institute (http://gorbi.irb.hr) is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Croatia License.