To test how realistic the confidence estimates reported by our machine learning model are, we chose annotations for 38 genes in Escherichia coli K-12 at a threshold of 60% expected Precision.
We focused on three GO terms that were straightforward to investigate experimentally using readily available antibiotics:
The 38 E. coli strains, each carrying a deletion of one selected gene, were grown in the presence of antibiotics that target the above processes:
We considered a prediction confirmed only if: (1) the survival of the mutant was <25% of the wild type when grown with the antibiotic inhibiting the process predicted by GORBI, and (2) the survival of the mutant was >50% of the wild type when grown with each of the other two antibiotics, here used as negative controls.
For example, we predicted “DNA damage response” for the E. coli yfgI gene: when grown with DNA-damaging nalidixic acid, the yfgI mutant had 7% of the wild-type survival, but when grown with kasugamycin or ampicillin, the survival was much higher: 98% and 74% of the wild type, respectively. We therefore consider this prediction confirmed: the yfgI mutant is sensitive to a DNA-damaging agent, while exhibiting wild type-like resistance to other stresses.
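The two confirmation criteria can be written as a short decision rule. The sketch below is illustrative only: the function name `prediction_confirmed` and its parameter names are hypothetical and not part of GORBI, and survival values are expressed as fractions of the wild-type survival. The yfgI values are the ones reported above.

```python
def prediction_confirmed(target_survival, control_survivals,
                         sensitive_below=0.25, resistant_above=0.50):
    """Apply the two confirmation criteria to one deletion mutant.

    target_survival: mutant survival relative to wild type on the antibiotic
        that inhibits the predicted process.
    control_survivals: relative survival on the other two antibiotics
        (negative controls).
    The function and parameter names are hypothetical, chosen for illustration.
    """
    # Criterion 1: the mutant is sensitive to the antibiotic hitting the predicted process.
    sensitive = target_survival < sensitive_below
    # Criterion 2: the mutant survives at wild type-like levels on both controls.
    controls_ok = all(s > resistant_above for s in control_survivals)
    return sensitive and controls_ok


# yfgI: 7% on nalidixic acid; 98% and 74% on kasugamycin and ampicillin.
yfgI_confirmed = prediction_confirmed(0.07, [0.98, 0.74])
print(yfgI_confirmed)  # True
```

Under this rule a mutant that is sensitive to all three antibiotics is not counted as confirmed, because general sickness of the strain would fail the second criterion.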
With these criteria, 25 out of 38 genes had confirmed predictions, equivalent to an experimental Precision of 66% (95% C.I. 51%-81%). This agrees well with the expected Precision of ≥60%, showing that our estimates of accuracy are realistic.
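As a check on the arithmetic, the experimental Precision and its confidence interval can be recomputed from the counts. The text does not state how the interval was obtained; the sketch below assumes the normal-approximation (Wald) interval for a binomial proportion, which reproduces the reported bounds for 25 of 38. The function name `precision_with_ci` is hypothetical.

```python
import math


def precision_with_ci(confirmed, tested, z=1.96):
    """Return (precision, lower, upper) using a Wald interval.

    Assumption: a normal-approximation interval with z = 1.96 for 95%
    coverage; the source does not say which method was used.
    """
    p = confirmed / tested
    half_width = z * math.sqrt(p * (1 - p) / tested)
    # Clip to [0, 1] so the bounds remain valid proportions.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


p, lo, hi = precision_with_ci(25, 38)
print(f"{p:.0%} (95% C.I. {lo:.0%}-{hi:.0%})")  # 66% (95% C.I. 51%-81%)
```

With only 38 genes the interval is wide, so the agreement with the expected Precision should be read as consistency rather than as a tight estimate.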
Moreover, 14 of the 38 tested genes had predictions with an expected Precision of ≥85%. For these genes, the experiments showed 11/14 (79%) of the predictions to be correct, approximately matching the expected Precision of 85%.