When estimating the classification error, RF uses the bagging (bootstrap aggregating) methodology of resampling the dataset (Breiman, 2001). One bootstrap sample consists of the same number of examples as the original dataset, randomly drawn with replacement; consequently, it contains on average ~2/3 (more precisely, 1 - 1/e ≈ 63.2%) of the unique examples in the original dataset.
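A minimal sketch of this resampling, on hypothetical data rather than the pipeline described here: drawing n indices with replacement and measuring the unique fraction, which converges to 1 - 1/e ≈ 0.632.

```python
import random

n = 10_000                                        # size of a hypothetical dataset
sample = [random.randrange(n) for _ in range(n)]  # one bootstrap sample, drawn with replacement
unique_fraction = len(set(sample)) / n
print(f"unique examples in bootstrap sample: {unique_fraction:.3f}")  # ~0.632
```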
A CLUS-HMC decision tree is produced from each bootstrap sample. The examples omitted from the bootstrap sample (roughly one third of the original dataset) are used to calculate Precision, Recall, and the Area Under the Precision-Recall Curve (AUPRC). Because these out-of-bag examples played no part in building the tree, the resulting estimates are approximately unbiased.
The final prediction of the random forest is obtained by aggregating the predictions of the individual trees; for the error estimate, each example is scored only by the trees that did not see it during training. This method is called the out-of-bag (OOB) error estimate.
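The following is a schematic illustration of the OOB procedure, not the CLUS-HMC implementation: `train_tree` and `predict` stand in for any tree learner and its prediction function, and the data structures are assumptions for the sketch.

```python
import random
from statistics import mean

def oob_error(X, y, n_trees, train_tree, predict):
    """Estimate classification error using only out-of-bag predictions.

    train_tree(X_bag, y_bag) -> tree and predict(tree, x) -> label are
    assumed callables; any tree learner could be plugged in.
    """
    n = len(X)
    votes = [[] for _ in range(n)]                 # per-example OOB predictions
    for _ in range(n_trees):
        bag = [random.randrange(n) for _ in range(n)]   # bootstrap sample indices
        in_bag = set(bag)
        tree = train_tree([X[i] for i in bag], [y[i] for i in bag])
        for i in range(n):
            if i not in in_bag:                    # example i is out of bag for this tree
                votes[i].append(predict(tree, X[i]))
    # Majority vote per example; skip the rare examples never left out of bag.
    errors = [max(set(v), key=v.count) != y[i]
              for i, v in enumerate(votes) if v]
    return mean(errors)
```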
Here, we are dealing with a multi-label problem: each OMA clique can be annotated with multiple GO terms. The classifier we use, CLUS-HMC, can take advantage of such data and assigns each OMA group a confidence score for each GO term. By varying a cut-off for this probability from 1.0 to 0.0, we relax the stringency of the predictions: an increasing number of OMA groups are assigned an increasing number of GO terms.
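A toy sketch of this relaxation, with invented confidence values rather than CLUS-HMC output: lowering the cut-off from 1.0 towards 0.0 assigns more GO terms to more groups.

```python
# Hypothetical per-term confidence scores for two OMA groups.
scores = {
    "OMA1": {"GO:0008150": 0.95, "GO:0003674": 0.40},
    "OMA2": {"GO:0008150": 0.70, "GO:0005575": 0.15},
}

for threshold in (1.0, 0.8, 0.5, 0.1):
    predicted = {g: [t for t, p in terms.items() if p >= threshold]
                 for g, terms in scores.items()}
    n_assigned = sum(len(v) for v in predicted.values())
    print(f"cut-off {threshold:.1f}: {n_assigned} annotations pass")
```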
In addition to this ranking of GO predictions for individual proteins, we wanted to estimate how many candidate genes we would need to examine experimentally in order to obtain confirmed annotations. Therefore, we translated the probabilities into Precision for each GO term; Precision is equivalent to 1 - FDR (false discovery rate).
As above, we varied the probability cut-off and calculated the corresponding Precision for each GO term at each cut-off: out of all the OMA clique annotations that pass the threshold, we counted the true positives (TP) and false positives (FP), giving Precision = TP / (TP + FP).
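A sketch of this per-term calculation, with illustrative names rather than the actual analysis code; `scores` and `truth` follow the same assumed layout as the previous sketch.

```python
def precision_at_cutoffs(scores, truth, term, cutoffs):
    """Precision = TP / (TP + FP) = 1 - FDR for one GO term at each cut-off.

    scores: {group: {go_term: confidence}}; truth: {group: set of known GO terms}.
    """
    result = {}
    for c in cutoffs:
        tp = fp = 0
        for group, terms in scores.items():
            if terms.get(term, 0.0) >= c:          # prediction passes the cut-off
                if term in truth[group]:
                    tp += 1                        # term is in the known annotation
                else:
                    fp += 1
        result[c] = tp / (tp + fp) if tp + fp else None  # undefined if nothing passes
    return result
```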