Predicting Rifampicin Resistance Mutations in Bacterial RNA Polymerase Subunit Beta Based on Machine Learning Algorithms


Abstract

Background
Mutations in an enzyme target are one of the most common mechanisms whereby antibiotic resistance arises. Identification of the resistance mutations in bacteria is essential for understanding the structural basis of antibiotic resistance and for the design of new drugs. However, the experimental approaches traditionally used to identify resistance mutations are usually labor-intensive and costly.

Results
We present a machine learning (ML)-based classifier for predicting rifampicin (Rif) resistance mutations in bacterial RNA polymerase subunit β (RpoB). A total of 66 resistance mutations were gathered from the literature to form the positive dataset, while 53 residue variations of RpoB among a series of naturally occurring species were obtained as the negative dataset. The features of the mutated RpoB and their binding energies with Rif were calculated through computational methods and used as the mutation attributes for modelling. Classifiers based on four ML algorithms, i.e. decision tree, k nearest neighbors, naïve Bayes and support vector machine, were developed, which showed accuracy ranging from 0.69 to 0.76. A majority consensus approach was then used to obtain a new classifier based on the classifications of the four individual ML algorithms. The majority consensus classifier significantly improved the predictive performance, with accuracy, precision, recall and specificity of 0.83, 0.86, 0.84 and 0.83, respectively.

Conclusion
The majority consensus classifier provides an alternative methodology for rapid identification of resistance mutations in bacteria, which may help with early detection of antibiotic resistance and new drug discovery.

Background
Antibiotic resistance has become one of the greatest threats to public health all over the world. Pathogens with antibiotic resistance make infections more difficult to treat and lead to increasing mortality. As stated by the United Nations in 2019 [1], at least 700,000 deaths are caused by infections with resistant pathogens every year, and this number will soar to 10 million annually by 2050 if no action is taken. Among the ever-growing number of resistant pathogens, Mycobacterium tuberculosis (MTB) is of particular concern because this species is the causative agent of tuberculosis, one of the leading causes of death worldwide [1]. Rifampicin (Rif), an antibiotic of the rifamycin class, has been extensively used to treat tuberculosis. However, Rif resistance in MTB has been occurring with increasing frequency, raising health concerns [2,3]. It was estimated that approximately 484,000 new cases of Rif-resistant tuberculosis and 214,000 deaths related to Rif-resistant tuberculosis occurred in 2018 [4].
Antibiotic resistance in bacteria can originate from multiple sources, such as the acquisition of antibiotic resistance genes (ARGs) carried by mobile genetic elements (e.g. plasmids and transposons) [5], overexpression of multidrug efflux pumps [6] and de novo resistance mutations in bacterial genomes [7]. For Rif, resistance is primarily caused by single point mutations in RNA polymerase (RNAP), an enzyme that is essential for RNA synthesis [8,9]. Rif typically binds to the β subunit of RNAP (RpoB) and blocks RNA synthesis, leading to the death of bacterial cells.
Mutations in RpoB might change the conformation of RpoB and prevent Rif from binding to it, resulting in loss of the bactericidal activity of Rif. It should be noted that mutations occur randomly at any site of RpoB and do not always cause detrimental outcomes. Only those inducing resistance phenotypes (known as resistance mutations) are undesired and noteworthy. Currently, resistance mutations in bacteria are mostly identified experimentally, for example by sequencing and comparing the relevant DNA segments of the mutants and the wild type strain, which is time- and labor-consuming. Since a given protein can mutate in many possible ways, it is of great significance to build predictive, rather than experimental, approaches for quick screening of resistance mutations.
Machine learning (ML) is a branch of artificial intelligence (AI) that learns patterns and features from large amounts of data and uses them to make predictions and decisions on new data. ML algorithms have found applications in a variety of fields, such as speech recognition, traffic prediction and recommender systems [10-13]. In recent years, ML algorithms have also been widely used in molecular biology and toxicology. For example, Murakami and Mizuguchi (2010) developed a naïve Bayes (NB) classifier for predicting protein-protein interaction sites, and Zhang et al. (2016, 2017) constructed NB classifiers for predicting drug-induced liver injury and mitochondrial toxicity in humans. In those studies, ML algorithms served as a powerful tool for solving classification problems. Whether a mutation occurring in bacterial RpoB leads to Rif resistance is itself a classification problem, so ML could be useful for predicting the outcomes of these mutations, yet such attempts have been rare in the literature. In this paper, we report a novel ML-based method for predicting Rif resistance mutations in bacterial RpoB. The positive resistance mutations were collected from the literature, while the negative mutations were obtained by gathering the residue variations in RpoB among different naturally occurring species. Four ML algorithms, i.e. decision tree (DT), k nearest neighbors (kNN), NB and support vector machine (SVM), were employed for modeling using 66 positive and 53 negative mutations. A majority consensus classifier was finally obtained based on the classifications of the four individual ML algorithms, and it showed satisfactory predictive performance. The ML-based classifier provides an alternative approach for quickly identifying Rif resistance mutations in bacteria.

Results
Negative and positive mutations in RpoB of E. coli K12
Mutations occur both spontaneously and under stress during DNA replication and repair processes. Herein, mutations that confer bacterial resistance against antibiotics (i.e., antibiotic resistance mutations) are referred to as positive mutations, while those that do not induce any change in the bacterial resistance phenotype are designated negative mutations. Resistance to Rif in bacteria is thought to be solely attributable to single point mutations of the rpoB gene that encodes RpoB. Rif resistance mutations primarily occur within the Rif resistance-determining regions (RRDRs) of RpoB that are involved in the formation of the Rif-binding pocket (Fig. 1A). In the literature, the positive mutations were usually determined by sequencing the rpoB gene of the isolated Rif-resistant strain and comparing it with the wild type sequence. In the present research, 68 positive mutations at 25 amino acid positions were gathered from the literature. As shown in Fig. S1, a majority (17) of the sites were in RRDR-I, while two were in RRDR-N, five in RRDR-II and one in RRDR-III. Among these residues, 516, 526 and 531 in RRDR-I are the hotspots where resistance mutations occur most frequently [17].
Unlike the positive mutations, the negative mutations do not cause a resistant phenotype. Moreover, negative mutations can hardly be recognized because of the difficulties in isolating the negative mutants and enriching them for sequencing. In the present research, we proposed that the differences of RRDRs in RpoB among different naturally occurring strains could be considered as negative mutations. This assumption was based on the facts that RRDRs in RpoB are highly conserved among different prokaryotes and play important roles in the binding of Rif to RpoB, so variations within RRDRs of RpoB from different naturally occurring strains are unlikely to influence the binding between RpoB and Rif. Therefore, the negative mutations in this work were obtained by finding the residues of RpoB that differ among strains. A total of 32021 RpoB sequences were downloaded from the RefSeq database, and 100 of them were randomly selected to obtain a manageable number of sequences. Figure 1B shows the sequence alignment of E. coli and other species (only 11 sequences are shown), which suggested that the sequences had a very high identity within the RRDRs, with only a small minority of the sites displaying differences. The relationships between the number of sampled sequences and the cumulative numbers of differing sites and amino acid (AA) changes were described by rarefaction curves. As shown in Fig. 1C, the numbers of sites and AA substitutions obtained showed only slight changes as the number of sequences approached 100, suggesting a satisfactory sampling depth. A total of 55 AA changes were finally gathered from the 100 sequences (Table 1), among which only two changes (H526Q and H526N) also appeared in the positive dataset. The two AA changes shared by the positive and negative datasets were excluded from both datasets before the subsequent calculation and analysis. Overall, there are 119 point mutations in the final dataset, including 53 negative and 66 positive mutations.
Amino acid changes that appeared in both the negative and positive datasets are designated with a superscript "a" and shown in bold.
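The exclusion of mutations shared by the two datasets can be illustrated with a small set-based sketch. The function name `remove_shared` and the toy subsets below are assumptions made for illustration; only H526Q and H526N are stated in the text to be shared, and the placeholder negative entries are not real mutations.

```python
def remove_shared(positive, negative):
    """Drop mutations that occur in both datasets from each of them."""
    shared = positive & negative
    return positive - shared, negative - shared, shared

# Toy subsets only; the full lists are given in Table 1 of the paper.
positive = {"D516V", "S531L", "H526Q", "H526N"}
negative = {"H526Q", "H526N", "neg_placeholder_1", "neg_placeholder_2"}  # placeholders are hypothetical

pos_clean, neg_clean, shared = remove_shared(positive, negative)
print(sorted(shared))                  # the two overlapping changes
print(len(pos_clean), len(neg_clean))  # sizes after exclusion
```

Applied to the full datasets, the same step takes 68 positive and 55 negative changes down to 66 and 53, which gives the 119 mutations used for modelling.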

Evaluation of the mutated RpoB structures
Twelve features (Table S1) were obtained from the PremPS Server for each mutated RpoB. Among these features, ΔΔG is commonly used to predict the change in protein stability caused by a mutation [18]. It is obtained by quantifying the change of unfolding Gibbs free energy (ΔG) of a protein after a single point mutation. According to the results, almost all (118) of the 119 mutants had greater ΔGs than the wild type RpoB (i.e., ΔΔG > 0), suggesting destabilizing effects of these point mutations (Fig. 2C). The only exception was the A683S mutant, with a ΔΔG value of -0.61 kcal·mol⁻¹. Pearson's correlation coefficient for each pair of features is shown in Fig. 2A, and the p values are shown in Fig. 2B. The stability of the mutated RpoB (ΔΔG) was significantly correlated (p < 0.01) with location, PSSM, ΔOMH, P_L, P_FWY, SASA_pro, SASA_sol and the mutation type. As for the relationship between the mutation type and ΔΔG, Fig. 2C shows that most of the ΔΔG values of the negative mutants fell into the range of 0-2.0 kcal·mol⁻¹, with a mean of 0.86 kcal·mol⁻¹, whereas the ΔΔG values of the positive mutants were distributed over a larger range (0-3.31 kcal·mol⁻¹) with a greater mean (1.25 kcal·mol⁻¹). Statistical analysis with the unpaired Mann-Whitney test suggested that the difference in ΔΔG values between negative and positive mutations was statistically significant (p < 0.01) (Fig. 2C). This result implied that a mutant with a higher ΔΔG value had a greater tendency to be resistant. With respect to the location, it was found that mutations at residues buried in the protein (i.e., in the core) tended to have greater ΔΔG values than those occurring on the surface of the protein (Fig. 2D). This suggested that changes of the buried residues may have greater destabilizing effects on the protein. The stability of the mutants was also highly correlated with P_FWY (Fig. 2E), which represents the fraction of aromatic residues buried in the protein core. This suggested that the aromatic residues buried in the protein core may have important functions in maintaining the stability of the protein.
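The comparison of ΔΔG values between the two mutation classes can be sketched with a plain-Python Mann-Whitney U statistic. This is a minimal illustration under stated assumptions: the function name `mann_whitney_u` and the ΔΔG values are hypothetical, and only the U statistic is computed, not the p value reported in the text.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) by direct pairwise comparison; ties count as 0.5."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

# Hypothetical ddG values in kcal/mol, NOT taken from the paper.
ddg_positive = [1.9, 1.2, 0.7, 2.5, 1.1]
ddg_negative = [0.4, 0.9, 0.7, 1.0]

u_pos, u_neg = mann_whitney_u(ddg_positive, ddg_negative)
print(u_pos, u_neg)  # a large imbalance between the two suggests shifted distributions
```

A statistics package would normally be used to turn U into a p value; the pairwise form is shown here only because it makes the rank-based nature of the test explicit.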

Interactions between mutated RpoB and Rif molecule
The binding of the Rif molecule to mutated and wild type RpoB was studied by molecular docking. The Rif pose obtained by docking simulation was similar to that obtained experimentally (i.e., Rif in the complex of 5UAC). As shown in Fig. 3A, the macrocyclic moieties of Rif in the two conformations overlapped well, while the 1-methylpiperazine tail of Rif showed a slight change in orientation between the two conformations. Interactions between wild type RpoB and the Rif molecule (in the 5UAC complex) are shown in Fig. 3B. It can be seen that ARG529, SER531, ARG540 and PHE514 were key amino acids involved in hydrogen bond interactions with Rif, while LEU511, LEU533 and PRO564 played important roles in the hydrophobic interactions. In addition, it was found that the macrocyclic moiety of Rif dominated the binding with RpoB, while the 1-methylpiperazine tail had no interaction with RpoB (Fig. 3B). In the docked complex (Fig. 3C), however, more interactions were found between RpoB and Rif, especially between RpoB and the 1-methylpiperazine tail of Rif. Specifically, GLU565 and ASN568 formed hydrogen bond interactions with the 1-methylpiperazine tail, and PRO567 formed a π-alkyl interaction with the piperazine ring. These interactions could well explain the orientation change of the 1-methylpiperazine tail in the docked complex, which was probably closer to the realistic situation. The structures of the complexes of D516V and Rif, as well as S531L and Rif, have been determined experimentally in previous research [19]. Comparisons of the docked poses of Rif with the experimentally obtained conformations for D516V are shown in Fig. S2. Similar to the case of the wild type RpoB, the docked Rif in the D516V mutant also presented a slight change in the orientation of the 1-methylpiperazine tail, which was likely attributable to the interactions between the piperazine ring and the residues.
The distributions of the binding energies between Rif and the mutated RpoB models are shown in Fig. 3D. The mean binding energy of the negative mutants appeared lower than that of the positive mutants. Meanwhile, the binding energies between negative RpoB mutants and Rif were close to that between wild type RpoB and Rif (-8.47 kcal/mol), with only a few outliers (Fig. 3D). For the positive mutants, however, the binding energies were distributed over a wider range. This suggested that the positive mutants tended to present greater changes in the binding energy with Rif than the negative mutants.

Comparison of classifiers using different machine learning algorithms
Four classifiers with DT, kNN, NB and SVM algorithms were developed by using the PremPS-obtained attributes and the difference of the binding energies between wild type and mutated RpoB (denoted as ΔE). The overall confusion matrix for the classifiers and the detailed prediction results are shown in Table S2 and Table 2, respectively. The comparison of the prediction results of the classifiers is depicted in Fig. 4A. The kNN and NB classifiers had the greatest accuracy, with the same score of 0.76, followed by DT (0.71) and SVM (0.69). It should be noted that the predictive models developed herein are intended to identify the potential resistance mutations (i.e., positive mutations), so it is more important that the positive mutations are predicted correctly. In view of this, the recall of the classifiers is more important than the specificity. The F-measure, which conveys the balance between recall and precision, was also calculated. As listed in Table 2, NB had the highest F-measure score (0.81), followed by kNN (0.79), SVM (0.75) and DT (0.72).
Furthermore, the receiver operating characteristic (ROC) curves of the classifiers (Fig. 4B), which demonstrate the connection between recall and specificity, were obtained by plotting the recall versus "1-specificity" across all possible thresholds for the test. A ROC curve that is closer to the left-hand and top borders of the ROC space points to a high accuracy of the classifier, and the area under the ROC curve (AUC) was used to measure the separability of the classifiers. According to the results (Table 2), kNN, NB and SVM had similar AUC scores (0.81, 0.80 and 0.80, respectively), suggesting a similar separation capacity for the three classifiers, which was greater than that of DT, whose AUC score was 0.74.
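The construction of a ROC curve and its AUC described above can be sketched in a few lines. This is an illustrative sketch only: the helper names `roc_points` and `auc`, the scores and the labels are assumptions, and the paper itself generated the curves with a KNIME node.

```python
def roc_points(scores, labels):
    """Sweep the threshold over all observed scores; return (1 - specificity, recall) points."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical classifier scores (probability of "positive") and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

print(round(auc(roc_points(scores, labels)), 3))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is why the reported scores of 0.74 to 0.81 indicate moderate separability.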

Evaluation of the majority consensus prediction
The majority consensus prediction was based on the classification results of the four individual ML algorithms. The prediction results and ROC curves for the majority consensus are shown in Table 2 and Fig. 4B, respectively. In general, the majority consensus showed better predictive performance than the four individual ML algorithms, with higher accuracy, AUC, F-measure, precision and specificity scores (Fig. 4A). The accuracy of the majority consensus was 0.83, with 99 of 119 mutations being correctly predicted. In particular, the majority consensus prediction had a reasonably balanced recall and specificity, which were 0.84 and 0.83, respectively. That is, 55 positive mutations were identified from a total of 66, and 44 negative mutations were identified from 53. The precision, which represents the proportion of positive identifications that were correct, was higher for the majority consensus, with a score of 0.86, and the F-measure, which balances recall and precision, was also higher for the majority consensus (0.85) than for the individual algorithms.
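A majority consensus over four binary classifiers can be sketched as follows. The text does not say how a 2-2 tie was resolved, so the `tie_label` parameter (defaulting to "positive", in line with the stated priority on recall) is an assumption of this sketch, as are the function name and the example votes.

```python
def majority_consensus(votes, tie_label=1):
    """Combine 0/1 predictions from several classifiers into one label.

    tie_label is an assumed tie-breaking rule; the paper does not specify one.
    """
    positives = sum(votes)
    negatives = len(votes) - positives
    if positives > negatives:
        return 1
    if negatives > positives:
        return 0
    return tie_label

# Hypothetical votes in the order DT, kNN, NB, SVM.
print(majority_consensus([1, 1, 0, 1]))  # three of four vote positive
print(majority_consensus([0, 1, 0, 0]))  # three of four vote negative
print(majority_consensus([1, 0, 1, 0]))  # tie, resolved by tie_label
```

With an even number of voters the tie rule matters, so any reimplementation should state which label a split vote receives.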

Application of the classifiers
In theory, a total of 1197 (63 × 19) single point mutations can occur within the RRDRs of RpoB (19 possibilities for each site). All of the possible RpoB mutants were built through the PremPS server and were investigated for their interactions with the Rif molecule by LeDock. The PremPS-obtained features and the ΔE for the RpoB mutants are provided in Table S3. Afterwards, the resistance types of the mutated RpoB were predicted by using the four individual ML algorithms, i.e. DT, kNN, NB and SVM. A total of 685, 684, 917 and 1100 positive mutations were identified by the four algorithms, respectively. Then the classifications of the four ML algorithms were combined by majority consensus to obtain the final classification. The majority consensus identified 820 positive mutations, and the final classifications are depicted as a heatmap in Fig. 5. Resistance mutations could occur at a majority of the sites within the RRDRs, with the exceptions of GLU504, GLN510, PHE514, GLU565, PRO567, ILE569, LEU575 and ASN684, where no resistance mutations were predicted. However, it should be noted that the predicted resistance mutations shown in Fig. 5 do not necessarily occur in real conditions. They only indicate that the mutations have a high probability of conferring a resistance phenotype if they occur.
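The enumeration of all single point mutations (19 substitutions per site) can be sketched as below. The function name `enumerate_substitutions` is hypothetical, and only three of the 63 RRDR sites are listed, using the wild type residues implied by the mutation names D516V, H526Q and S531L in the text.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def enumerate_substitutions(wild_type):
    """wild_type maps residue position -> wild type residue; returns names such as 'D516V'."""
    mutations = []
    for pos in sorted(wild_type):
        wt = wild_type[pos]
        for aa in AMINO_ACIDS:
            if aa != wt:  # skip the synonymous case, leaving 19 per site
                mutations.append(f"{wt}{pos}{aa}")
    return mutations

# Three example sites only; the paper covers 63 sites within the RRDRs.
sites = {516: "D", 526: "H", 531: "S"}
muts = enumerate_substitutions(sites)
print(len(muts), muts[:3])
```

For the 63 RRDR sites the same loop yields 63 × 19 = 1197 candidate mutants, matching the number stated above.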

Discussion
Antibiotic resistance is a major threat to public health and has aroused great concern. One of the mechanisms of antibiotic resistance is mutation of an enzyme target, causing structural modifications of the active sites that no longer allow the antibiotic to bind [20]. Resistance originating in this way has been observed for many antibiotics. For example, mutations of pbp, a gene encoding penicillin-binding proteins, conferred β-lactam resistance on bacteria [21], and mutations of rrnA and rrnB, genes that encode 16S rRNA, conferred tetracycline resistance [22]. In the case of Rif, resistance usually arises as a result of single point mutations in bacterial RNAP, an enzyme that accommodates the Rif molecule at the β subunit. Identification of the resistance mutations is essential for understanding the structural basis of the antibiotic resistance, and provides useful information for preventing the emergence of resistance and for new drug discovery. In previous studies, the identification of the resistance mutations was usually achieved through labor-intensive and costly experiments, for example by isolating the resistant mutants and sequencing the relevant gene segments, followed by aligning the sequences with those of the sensitive strains [23,24]. The experimental approach is thus incapable of rapid identification, and moreover, identification by experiment usually lags behind the emergence of the resistance. Therefore, prediction of the resistance mutations based on known information is highly desirable.
Some previous studies have attempted to predict antibiotic resistance mutations. For example, Frey et al. (2010) reported a computational approach for predicting resistance mutations in dihydrofolate reductase of bacteria. In this approach, a protein design algorithm was employed for generating positive and negative mutations and the prediction was based on the analysis of the affinity between the mutants and the inhibitor. Compared with this approach, the predictive algorithm developed in the present work takes into account not only the affinity but also a series of features of the mutants. Moreover, the predictive algorithms in the present work were trained using a collection of both negative and positive mutations, and are thus more reliable and interpretable.
Until recently, there had been no attempts in the literature to use ML algorithms for predicting resistance mutations. Jamal et al. (2020) then reported a work that used AI and ML algorithms to predict resistant and susceptible mutations in MTB. However, the method used for developing predictive models in Jamal et al. (2020) is debatable. When preparing the dataset for modeling, the authors labeled mutations with positive ΔΔG values as susceptible mutations, and all others as resistant mutations. The problem is that ΔΔG is an indicator of the stability of a mutated protein, and it is improper to use this parameter alone for classifying the mutations, although it might have some relationship with the mutation type. Moreover, in the following steps, the authors used this parameter again as an attribute of the mutations for training the classification models, which is also improper. Unlike that work, the data used for developing the predictive models in the present research came from two sources: the resistance mutations that have been validated experimentally were obtained from the literature and used as the positive dataset, while the differing residues of RNAPs from a series of naturally occurring species were deemed neutral mutations and used as the negative dataset. In addition, we developed a majority consensus algorithm based on the predictions of four individual ML algorithms, which significantly improved the predictive performance. However, the classifier developed in this work still has some limitations. For example, it applies only to predicting Rif resistance mutations, and the classifications need further experimental validation. Nonetheless, the present work provides an alternative methodology for rapid identification of resistance mutations in bacteria, which may be helpful for early detection of resistance and new drug discovery.

Conclusion
A majority consensus classifier was developed for predicting Rif resistance mutations in bacterial RNAP based on four ML algorithms (i.e. DT, kNN, NB and SVM), using 66 positive resistance mutations and 53 negative mutations as the dataset. The features of the mutated RpoB and their binding with the Rif molecule were studied by computational approaches and used for developing the predictive models. Analysis of the features showed that the mutation type was significantly related to ΔΔG, PSSM and P_FWY. Evaluation of the predictive models showed that the four individual algorithms presented varying predictive performance, with accuracy ranging from 0.69 to 0.76, while the majority consensus classifier gave better predictions, with a higher accuracy of 0.83 as well as higher AUC (0.86), precision (0.86) and specificity (0.83). Using this majority consensus, the possible mutations within the RRDRs of RNAP were classified, and a majority of them were identified as resistance-conferring mutations. However, it should be noted that further experiments are needed to validate the classifications.

Mutation dataset preparation
Mutations in RpoB that confer Rif resistance on bacteria are designated positive, and all others negative. The positive mutations in RpoB of E. coli have been well documented in the literature [19,27] and were gathered to obtain the positive dataset. As shown in Table 1, the positive dataset included 68 amino acid (AA) substitutions at 25 sites in total. All of the 68 AA changes were within the RRDRs (Table S4) [19], which are considered highly conserved among prokaryotes. It should be noted that the AA changes gathered in the positive dataset were not necessarily exhaustive, because some changes that confer Rif resistance may not have been observed or reported yet. Despite the high conservation of the RRDRs, differences in the residues of the RRDRs may still exist among naturally occurring species. These differences are unlikely to affect the susceptibility of the species to Rif and can thus be deemed negative mutations. Therefore, the negative dataset used for the machine learning was obtained through the following procedure. First, the sequences of RpoB from different bacteria were downloaded from the NCBI reference sequence (RefSeq) database, filtered with "bacteria" in species and "1000-1400" in sequence length. A total of 32021 sequences were obtained in this way, and 100 of them were selected randomly and aligned with the RpoB sequence of E. coli. Afterwards, the residues within the RRDRs that differed between E. coli and the other species were gathered to obtain the negative dataset. The sequence alignment was conducted in Discovery Studio (DS) by using the Align Sequences protocol. The negative dataset is shown in Table 1.
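The step of gathering differing residues from an alignment can be sketched as follows. This is a stand-in for illustration, not the Discovery Studio protocol used in the paper: the function name `substitutions_in_region`, the toy sequences and the region bounds are all hypothetical, and positions are numbered along the reference sequence with alignment gaps written as "-".

```python
def substitutions_in_region(ref_aligned, other_aligned, region):
    """List substitutions (e.g. 'K2R') between two aligned sequences within an inclusive region."""
    start, end = region
    subs = []
    ref_pos = 0
    for r, o in zip(ref_aligned, other_aligned):
        if r == "-":
            continue  # insertion relative to the reference: no reference position to report
        ref_pos += 1
        if start <= ref_pos <= end and o != "-" and o != r:
            subs.append(f"{r}{ref_pos}{o}")
    return subs

# Toy aligned sequences, not real RpoB fragments.
ref = "MKT-AYLV"
other = "MRTGAYIV"
print(substitutions_in_region(ref, other, (2, 6)))
```

Running the same comparison for each of the 100 sampled sequences against E. coli and taking the union of the results would give a candidate negative dataset of the kind described above.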

Construction of the RpoB mutants
The wildtype RpoB in E. coli K12 was obtained from Protein Data Bank (ID: 5UAC, Chain C). Mutated RpoB with single point mutations were constructed through PremPS Server (https://lilab.jysw.suda.edu.cn/research/PremPS/), using the wildtype RpoB (5UAC) as a template. PremPS Server also gives a set of parameters that characterize the mutated proteins, such as the unfolding Gibbs free energy changes (ΔΔG), differences of hydrophobicity scale between mutated and wildtype RpoB (ΔOMH), and the solvent-accessible surface area (SASA_pro) [28]. Interpretations of these parameters are listed in Table S1.

Molecular docking
The interactions between the Rif molecule and mutated and wild type RNAP were investigated by LeDock, a molecular docking program [29]. Prior to the docking, the wild type RpoB and the mutated models were prepared by LePro, which adds missing hydrogens to the proteins and removes redundant structures. The binding pocket of the protein was set manually as a cubic box at the same location for both mutated and wild type RpoB, spanning x = 105 to 123, y = 20 to 38 and z = -17 to 1. The number of binding poses was set to 20, which means the process generates 20 random docking poses. The poses with the highest scores were chosen to represent the optimal binding poses of Rif with the mutated models.
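As a quick consistency check on the docking box, its center and edge lengths can be derived from the stated coordinate ranges. The sketch assumes that "z = -17-1" denotes the range from -17 to 1 and that the coordinates are in ångströms; the helper name `box_center_and_size` is hypothetical and is not part of LeDock.

```python
def box_center_and_size(bounds):
    """bounds maps axis -> (min, max); returns per-axis center and edge length."""
    center = {ax: (lo + hi) / 2 for ax, (lo, hi) in bounds.items()}
    size = {ax: hi - lo for ax, (lo, hi) in bounds.items()}
    return center, size

# Ranges as read from the text (z interpreted as -17 to 1, an assumption).
bounds = {"x": (105, 123), "y": (20, 38), "z": (-17, 1)}
center, size = box_center_and_size(bounds)
print(center)
print(size)
```

Under this reading all three edges have the same length, which agrees with the description of the pocket as a cube.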

Machine learning algorithms
Four supervised machine learning (ML) algorithms were employed for developing predictive models in the present research, i.e. naïve Bayes (NB), k nearest neighbors (kNN), support vector machine (SVM) and decision tree (DT). The NB classifier is a supervised learning algorithm based on Bayes' theorem with the "naïve" assumption that all attributes are independent given the value of the class variable [30,31]. The kNN classifier is based on the Euclidean distance between the target sample and the training samples, where k denotes the number of nearest neighbors used for classifying the target sample [32]. SVM is a statistical learning method that uses a hyperplane to optimally separate data into negative and positive categories [33]. DT predicts the category of a sample by using a tree-like flowchart, where the nodes represent tests on an attribute and the branches denote the outcomes of the tests [34]. All four ML algorithms are commonly used for solving classification problems. In the present research, the predictive models using the four ML algorithms were developed on the KNIME platform with the "Naive Bayes Learner", "K Nearest Neighbor", "SVM Learner" and "Decision Tree Learner" nodes, respectively. Cross validation with K-fold = 4 was performed by using a combination of the "X-partitioner" and "X-aggregator" nodes. Stratified sampling was used for partitioning the total data in the X-partitioner node. The receiver operating characteristic (ROC) curve was generated using a "ROC curve" node.
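Stratified 4-fold partitioning of the 66 positive and 53 negative mutations can be sketched in plain Python. This is only an illustration of the idea, not the KNIME X-partitioner implementation; the function name `stratified_folds` and the fixed random seed are assumptions.

```python
import random

def stratified_folds(labels, k=4, seed=0):
    """Return k lists of sample indices, each keeping roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across the folds
    return folds

labels = [1] * 66 + [0] * 53  # 66 positive and 53 negative mutations
folds = stratified_folds(labels)
for f in folds:
    n_pos = sum(labels[i] for i in f)
    print(len(f), n_pos, len(f) - n_pos)
```

Each fold serves once as the test set while the other three are used for training, so every mutation is predicted exactly once across the four rounds.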
The mean and standard deviation of the precision (P), recall (R), specificity (SP), accuracy (AC) and F-measure were calculated and gathered for evaluation of the classifiers. The workflow used for developing the predictive models is depicted in Fig. S3. The functions used for deriving these parameters were as follows.

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad SP = \frac{TN}{TN + FP}$$

$$AC = \frac{TP + TN}{TP + TN + FP + FN}, \qquad F\text{-measure} = \frac{2 \cdot P \cdot R}{P + R}$$

where TP, FP, TN and FN denote true positive, false positive, true negative and false negative, respectively.
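As a worked example, the standard definitions of these metrics in terms of TP, FP, TN and FN can be applied to the majority consensus counts reported in the Results (55 of 66 positives and 44 of 53 negatives correctly identified). The function name `classification_metrics` is hypothetical. Values computed from pooled counts may differ slightly from the reported ones, which are means over the cross-validation folds.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, specificity, accuracy and F-measure from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }

# Counts implied by the text: 55 of 66 positives and 44 of 53 negatives were correct.
m = classification_metrics(tp=55, fp=53 - 44, tn=44, fn=66 - 55)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```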