Prediction of Clostridium difficile infection in patients tested by polymerase chain reaction using a classification and regression tree

Background Clostridium difficile infection (CDI) is commonly diagnosed with the polymerase chain reaction (PCR), but this test yields a high percentage of false positives, so its use and interpretation are challenging in clinical practice. It is therefore necessary to define an algorithm that optimizes the use of PCR by considering clinical characteristics to classify patients with diarrhea as having or not having CDI. Objective To identify a predictive algorithm based on the clinical features that best classify patients with CDI vs. without CDI, to help physicians decide when to request PCR.


Introduction
Clostridium difficile infection (CDI) is the most common cause of healthcare-associated diarrhea (1). Identified risk factors for CDI include the use of antibiotics and stomach acid-suppressing medications, primarily proton-pump inhibitors (PPI), advanced age (over 64 years), prior hospitalization, exposure to patients with CDI, and poor health status, among others (2).
In the past two decades, the incidence and severity of CDI have increased worldwide.
Several reports have shown that C. difficile affects 20 to 50% of patients in the hospital environment. Indeed, a recent meta-analysis reported that C. difficile accounted for 20% of all antibiotic-associated diarrhea cases among hospitalized patients (3,4). Although data on this issue from Latin America are scarce, a Colombian study found that 16.7% of gastrointestinal infections in hospitalized adults were caused by C. difficile (5).
CDI is commonly diagnosed with the polymerase chain reaction (PCR), which is specific for genes encoding toxins produced by C. difficile (6,7). Although this test is useful in clinical practice, its interpretation and applicability are challenging (7). Approximately 50% of hospitalized patients may be asymptomatically colonized by toxigenic C. difficile (7).
Moreover, a positive PCR result does not distinguish between active toxin production and asymptomatic carriage, which results in a large number of false positives (7,8). All these issues lead to increased costs, unnecessary isolation, and antibiotic overuse (9).
One of the most significant challenges in managing patients with CDI-associated diarrhea is to advance the use of diagnostic decision-making algorithms in clinical practice. Clinical algorithms can increase the pretest probability of CDI. Furthermore, using such models in daily clinical practice may reduce the overuse of diagnostic tests, thus lowering costs and improving the appropriate use of specific CDI treatments. The present study used a classification tree to develop a predictive model for CDI diagnosed by PCR among inpatients treated in Cali, Colombia.

Data source
We performed a retrospective case-control study from 2012 to 2016 at Fundación Valle del Lili (FVL) in Cali, Colombia. FVL is a level IV university hospital affiliated with Universidad Icesi and serves as a referral facility for the southwest region of Colombia. All data were obtained directly from the clinical records.

Patients and Controls
We included hospitalized patients older than 18 years with diarrhea, abdominal pain, or other nonspecific gastrointestinal symptoms who underwent PCR testing for C. difficile. Cases were defined as patients with a positive PCR result for C. difficile. Controls were defined as patients with a negative PCR result.

Laboratory methods
The FVL clinical practice guideline (CPG) for CDI diagnosis includes the PCR test. The Xpert® C. difficile assay (Xpert CD assay; Cepheid, Sunnyvale, CA, USA) is a multiplex real-time PCR assay that uses primers targeting the cytotoxin gene (tcdB), the binary toxin genes (cdtA and cdtB), and a single-nucleotide deletion at position 117 in the tcdC gene. As a result, the Xpert CD assay can detect toxigenic C. difficile strains and presumptively identify the 027/NAP1/BI strain (10).

Statistical analysis
Predictive algorithms to classify patients with and without CDI were designed using a classification and regression tree (CART). CART is a non-parametric technique that determines a set of logical conditions (variables) to perform a binary classification of cases into homogeneous subgroups while accounting for complex relationships between the independent variables. In this study, the CART identifies the conditions under which patients are classified as PCR-negative or PCR-positive, with a low misclassification rate (11,12).
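To make "homogeneous subgroups" concrete, the following minimal Python sketch (not the authors' R code) scores each candidate binary variable by the weighted Gini impurity of the two child nodes it would create and keeps the variable with the lowest value, which is how a Gini-based CART chooses a division. The function names, the 0/1 coding (1 = PCR-positive), and applying the ten-patient minimum to each child node are assumptions made for illustration; the toy data are invented.

```python
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a node whose labels are 1 (PCR-positive) or 0 (PCR-negative)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_split(
    rows: List[Dict[str, int]], labels: List[int], min_node: int = 10
) -> Optional[Tuple[str, float]]:
    """Return (variable, weighted Gini) for the best yes/no split, or None.

    Each row maps a binary clinical variable (hypothetical names) to 0 or 1.
    A split is considered only if both child nodes keep at least `min_node`
    patients (assumed reading of the ten-patient rule) and it lowers the
    impurity of the parent node.
    """
    if not rows:
        return None
    n = len(labels)
    parent = gini(labels)
    best: Optional[Tuple[str, float]] = None
    for var in rows[0]:
        no = [y for r, y in zip(rows, labels) if r[var] == 0]
        yes = [y for r, y in zip(rows, labels) if r[var] == 1]
        if len(no) < min_node or len(yes) < min_node:
            continue
        weighted = (len(no) * gini(no) + len(yes) * gini(yes)) / n
        if weighted < parent and (best is None or weighted < best[1]):
            best = (var, weighted)
    return best


# Invented toy data: antibiotic use is strongly associated with a positive PCR,
# while "noise" is not.
rows = [{"antibiotics": 1, "noise": i % 2} for i in range(20)] + [
    {"antibiotics": 0, "noise": i % 2} for i in range(20)
]
labels = [1] * 15 + [0] * 5 + [1] * 2 + [0] * 18
print(best_split(rows, labels))
```

On the toy data the antibiotic variable is selected because it lowers the weighted impurity far more than the uninformative variable. Growing a full tree would apply `best_split` recursively to each child node; the stopping rule and cross-validated pruning are not shown.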
At each division (node), the tree selects the independent variable that best classifies patients with or without CDI. The algorithm then keeps adding variables until the nodes reach the highest possible homogeneity or contain too few observations. The steps for developing the CART were: 1) The maximum tree was constructed with the variables that showed p ≤ 0.20 in the bivariate analysis, in which quantitative data were compared using the Mann-Whitney U test and qualitative data using the Chi-squared test or Fisher's exact test; 2) The maximum tree was built using the Gini index as the criterion for selecting the best division, and the minimum change in the misclassification rate was used as the criterion for stopping the tree (adding new variables); 3) A minimum of ten patients per division was established; 4) The tree was pruned backward to obtain a parsimonious tree, using cross-validation on the same sample, with a cost-complexity measure of zero. The discriminative capacity of the resulting classification tree was evaluated through sensitivity, specificity, positive and negative predictive values, overall accuracy, likelihood ratios, and diagnostic accuracy measured as the area under the ROC curve. All analyses were performed using R software v.3.16 (13).
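Step 1 filters candidate variables by their bivariate p-value. The following is a minimal sketch of that screen for binary variables only, assuming each variable is summarized as a 2x2 table of exposure by PCR result and tested with a two-sided Fisher's exact test computed directly from the hypergeometric distribution. The chi-squared and Mann-Whitney U tests also named in the Methods are not reproduced, and the function names and example tables are hypothetical.

```python
from math import comb
from typing import Dict, List, Tuple

# (a, b, c, d) stands for the 2x2 table [[a, b], [c, d]], e.g. rows = exposed /
# not exposed, columns = PCR-positive / PCR-negative.
Table = Tuple[int, int, int, int]


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of all tables that are
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x: int) -> int:
        # Numerator of P(top-left = x); the shared denominator is comb(n, row1).
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    tail = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return tail / comb(n, row1)


def screen(tables: Dict[str, Table], alpha: float = 0.20) -> List[str]:
    """Keep the variables whose bivariate p-value is at most alpha."""
    return [name for name, t in tables.items() if fisher_exact_two_sided(*t) <= alpha]


# Hypothetical tables, not study data.
example = {"exposure_a": (8, 2, 1, 5), "exposure_b": (5, 5, 5, 5)}
print(screen(example))  # prints ['exposure_a']
```

Comparing integer numerators rather than floating-point probabilities avoids spurious mismatches when two tables are equally likely.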

Results
A total of 149 patients who fulfilled the inclusion criteria were included in the study: forty-eight cases with positive PCR and ninety-nine controls with negative PCR. Two cases were excluded from the analysis.
The median age was 55 years (IQR 35-71 years), and most patients were women. The most frequent comorbidity was cardiovascular disease. The leading cause of hospitalization was non-surgical conditions (40.9%, n=61). Table 1 shows the main clinical characteristics, and Table 2 shows the drugs prescribed to the patients. A total of 62.6% of patients had a history of antibiotic use; the most frequently used antimicrobials were meropenem and antifungal drugs. Other frequently used drugs were omeprazole, immunosuppressants, and steroids. Statistically significant differences were found in the use of meropenem, clindamycin, polymyxin B, antifungals, steroids, and chemotherapy. Statistically significant differences were also found in the history of use of antibiotics, PPI, and H2 antagonists.
The CART is presented in Figure 1. The history of antibiotic use alone gave the CART adequate capacity to discriminate between positive and negative PCR.
However, the CART also included other variables, namely PPI use, ranitidine use, and antifungal use, because these variables improved the classification. The CART showed the following performance: sensitivity 64.6%, specificity 85.8%, positive predictive value 68.8%, negative predictive value 83.3%, positive likelihood ratio 4.56, negative likelihood ratio 0.41, and AUC 79.7% (Figure 2).
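All of the reported performance measures except the AUC derive from a single 2x2 table of tree classification against PCR result. The sketch below computes them. The counts used (31 true positives, 17 false negatives, 85 true negatives, 14 false positives) are not reported in the paper; they are back-calculated from the reported percentages and the group sizes of 48 cases and 99 controls, so they should be treated as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 table of tree classification (rows) against PCR result (columns)."""
    tp: int  # classified positive, PCR-positive
    fn: int  # classified negative, PCR-positive
    tn: int  # classified negative, PCR-negative
    fp: int  # classified positive, PCR-negative


def diagnostic_metrics(c: Confusion) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    sens = c.tp / (c.tp + c.fn)
    spec = c.tn / (c.tn + c.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp),
        "lr_pos": sens / (1 - spec),  # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,  # negative likelihood ratio
    }


# Counts back-calculated from the reported percentages (assumption, see above).
m = diagnostic_metrics(Confusion(tp=31, fn=17, tn=85, fp=14))
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts the percentages agree with the reported values to within 0.1 percentage point and the likelihood ratios to within 0.01. The AUC is not computed here because it requires each patient's predicted probability rather than the 2x2 table alone.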
The rules for identifying patients with a negative PCR were as follows. If a patient had no history of antibiotic use, the probability of a negative PCR was 89.1%. In contrast, if a patient had a history of antibiotic use, the probability was 54.3%. However, if a patient had a history of antibiotic use but used neither PPI nor ranitidine, the probability of a negative PCR was 78.8%.
Finally, if a patient had a history of antibiotic use, did not use PPI, received ranitidine, and did not use antifungal drugs, the probability of a negative PCR was 71.4%.
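The decision rules above can be written as a small function. This sketch encodes only the branches whose probabilities are stated in the text, assuming the splits are ordered antibiotics, then PPI, then ranitidine, then antifungals, as the rules describe (Figure 1 remains the authoritative tree). Branches without a stated leaf probability return None; the function and argument names are hypothetical.

```python
from typing import Optional


def p_negative_pcr(
    antibiotics: bool, ppi: bool, ranitidine: bool, antifungal: bool
) -> Optional[float]:
    """Probability of a negative C. difficile PCR at the leaf the patient reaches.

    Values are those stated in the Results; None marks a leaf whose
    probability is not given in the text.
    """
    if not antibiotics:
        return 0.891  # no history of antibiotic use
    if ppi:
        return None  # antibiotics + PPI: leaf probability not stated in Results
    if not ranitidine:
        return 0.788  # antibiotics, neither PPI nor ranitidine
    if not antifungal:
        return 0.714  # antibiotics, ranitidine, no PPI, no antifungals
    return None  # antibiotics, ranitidine, and antifungals: not stated


print(p_negative_pcr(antibiotics=True, ppi=False, ranitidine=False, antifungal=False))  # prints 0.788
```

The 54.3% figure for all antibiotic users belongs to an internal node rather than a leaf, so the function never returns it.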

Discussion
In low- and middle-income countries, there is a lack of evidence-based diagnostic algorithms for early CDI diagnosis using a PCR test. In 2011, a study reported an in-hospital C. difficile prevalence of 10 per 10,000 inpatients in Cali, indicating a high prevalence of this infection and the need to identify better strategies for C. difficile control (14). Our study presents a straightforward algorithm that includes clinically relevant variables and may help clinicians in the diagnostic process of patients with suspected CDI. CDI is an important cause of nosocomial infection, so diagnostic strategies such as decision trees are an important alternative to guide physicians' decisions.
Our CART showed that if a patient had a history of antibiotic use, the probability of a positive PCR was 45.7%. Previous studies have shown that almost all antibiotics can increase vulnerability to CDI, but cephalosporins, fluoroquinolones, clindamycin, and certain penicillins (e.g., amoxicillin/clavulanic acid) increase the risk to the greatest extent (8,14,15). This evidence supports our finding that this exposure should be considered when deciding whether to perform a PCR.
PPI use has also been associated with increased odds of CDI and recurrence (8). The 2018 meta-analysis by Oshima et al. (16) concluded that PPI use was associated with CDI in adult (OR 2.30, 95% CI 1.89-2.80; p<0.00001) and pediatric patients (OR 3.00, 95% CI 1.44-6.23; p<0.00001), and with recurrent CDI (OR 1.73, 95% CI 1.39-2.15; p=0.02). In the CART, the probability of a positive PCR in patients with a history of antibiotic and PPI use was 75%. This finding suggests that, in a patient with a history of both antibiotic and PPI use, a PCR test might not be necessary for CDI diagnosis, because the physician is likely facing a case of CDI even without another diagnostic test.
Based on our results, we can recommend considering the PCR test when patients have been exposed to PPI, ranitidine, and antifungal drugs. Conversely, if the patient has no history of antibiotic use, the probability of a negative result is higher. The model showed good specificity (the capacity to classify patients without CDI as negative) and a high negative predictive value; thus, it can be considered an algorithm for identifying situations in which a PCR test is unnecessary in a patient with symptoms of CDI.
CDI increases patient healthcare costs due to extended hospitalization, re-hospitalization, laboratory tests, and medications. A systematic review found CDI to be a significant economic burden in the healthcare settings studied, with increased length of stay and costs (17). Our model therefore emerges as a diagnostic alternative for low- and middle-income countries that may optimize the indication of PCR for CDI diagnosis and thus reduce the economic burden of the disease on health systems.
The CART developed in this study has the advantage of being easy for health workers to interpret and implement in daily practice, because the variables it uses are available from a routine clinical review of inpatients. Compared with other statistical methods with good discriminative capacity, such as regression, this methodology allows the interactions between variables and the probability of a positive or negative PCR result to be identified directly from the tree. Therefore, this tree can support the decision to perform a PCR in patients with symptoms of CDI in settings where access to diagnostic tests such as PCR is restricted.
Limitations. Data were collected retrospectively, which can lead to information bias due to missing data for some variables. The predictive capacity of the variable

Not applicable
Availability of data and materials
The data are the property of the authors and are available upon request at fernando.rosso@fvl.org.co

Competing interests
The authors declare that they have no competing interests.

Funding
No funding. This article received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Authors' contributions
DMM, LGPL, and FR contributed equally to this work. DMM and LGPL designed the research. All authors performed the research, analyzed the data, and wrote the paper. All authors made important intellectual contributions to the manuscript and approved the final version before submission. FR supervised the entire process.

Figure 1
Classification tree for the prediction of CDI by PCR.

Figure 2
Results from validation of the classification tree.