The aim of the current study was to develop a predictive model based on ML algorithms to discriminate between gram-positive and gram-negative bacteremia in patients with severe bloodstream infection before pathogen test results become available. The model based on the RF algorithm showed satisfactory predictive performance in discriminating between gram-positive and gram-negative pathogens that cause bacteremia. To improve its applicability in real-life situations in which not all routine blood test results are available in time, especially in emergencies, a DT model was built using only five variables.
Empiric antimicrobial treatment of bacteremia is often problematic because of the increasing resistance of both gram-positive and gram-negative microbes to antimicrobial drugs. Gram-positive bacteria are a major concern, especially multidrug-resistant bacteria such as methicillin-resistant Staphylococcus aureus, vancomycin-resistant Enterococcus faecium, and β-lactam-resistant Streptococcus pneumoniae [24, 25]. Further, multidrug-resistant gram-negative bacteria, such as Enterobacteriaceae, Pseudomonas aeruginosa, and Acinetobacter baumannii, also pose a serious and rapidly emerging threat, especially for patients in intensive care units (ICUs) [26]. The easy-to-use model proposed in the present study can be used to promptly predict gram-positive and gram-negative bacteremia and could contribute to the timely and adequate elimination of the implicated pathogen. Adequate empiric antimicrobial treatment for sepsis has been demonstrated to directly affect the mortality rate in the ICU [27]. With this model, the probability of gram-positive or gram-negative bacteremia can be calculated offline once the values of the 32 variables are entered into the software provided. Further interventional studies based on this prediction model are necessary to verify its effect on patient outcomes.
Several laboratory blood test parameters have been proposed as potential predictive markers for the discrimination of gram-positive and gram-negative bacterial infections, and these are used to tailor empiric antimicrobial therapy before the results of the pathogen tests are obtained [6, 9, 28, 29]. However, there is no strong evidence that any of these parameters alone can predict the infecting pathogen. ML algorithms have proved helpful in combining several variables to discriminate between different subsets of patients. So far, however, there is no ideal ML model for predicting the pathogens that cause bacteremia. The ML model of Ratzinger et al., based on the K-star algorithm, detected gram-negative bacteremia with an AUC (0.675) comparable to that of the present study, but it had poor sensitivity (44.6%) and specificity (79.8%) [30]. Ratzinger's research also started with variables from routine laboratory tests, such as CBC count, liver function test, renal function test, serum electrolytes, and coagulation function test, but only seven variables (gender, lymphocyte count, monocyte count, monocyte percentage, fibrinogen, creatinine, and C-reactive protein) were included in the final K-star model. When building the current RF model, the results of blood gas analysis were also included. Moreover, 32 variables were entered into the RF model. The larger cohort of patients, the higher number of input variables, and the different algorithms used may explain why our model performed better.
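For readers comparing the figures quoted above, the following minimal sketch shows how sensitivity and specificity are derived from a confusion matrix when gram-negative bacteremia is treated as the positive class. The class name `ConfusionCounts`, the helper functions, and the counts are hypothetical and chosen only for illustration; they are not data from this study or from Ratzinger et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix with gram-negative as the positive class."""
    tp: int  # gram-negative cases predicted as gram-negative
    fn: int  # gram-negative cases predicted as gram-positive
    tn: int  # gram-positive cases predicted as gram-positive
    fp: int  # gram-positive cases predicted as gram-negative


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true gram-negative cases that the model detects."""
    positives = c.tp + c.fn
    if positives == 0:
        raise ValueError("no gram-negative cases; sensitivity is undefined")
    return c.tp / positives


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true gram-positive cases that the model correctly rules out."""
    negatives = c.tn + c.fp
    if negatives == 0:
        raise ValueError("no gram-positive cases; specificity is undefined")
    return c.tn / negatives


# Hypothetical counts for illustration only; not taken from any study.
example = ConfusionCounts(tp=45, fn=55, tn=80, fp=20)
print(f"sensitivity = {sensitivity(example):.3f}")
print(f"specificity = {specificity(example):.3f}")
```

A sensitivity below 50% means that more than half of the gram-negative cases would be missed at the chosen decision threshold, even if the AUC, which summarizes performance over all thresholds, appears acceptable.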
Considering that measurements of the 32 variables input in the RF model may not be available in some areas, medical institutions, and units, a well-performing DT model was also constructed with only five routinely measured variables: WBC count, basophil percentage, alkaline phosphatase (AKP), and lactate. Gram-negative bacteremia is associated with a higher level of inflammatory response than gram-positive bacteremia [6]. Accordingly, an association of gram-negative bacteremia with increased WBC levels has also been found in a previous report [31]. Additionally, as basophils are a type of WBC, the inclusion of basophil percentage as an indicator also makes sense. Gram-positive and gram-negative bacteria activate different receptor pathways [32] and cytokine production patterns in the host [33]. Certain cytokines (such as IL-3, IL-5, and GM-CSF) induced by gram-positive bacteria appear to be important developmental factors for basophils [34]. Further, lipopolysaccharide is found in abundance in the outer membrane of most gram-negative bacteria and plays a key role in host–pathogen interaction [35]. It raises blood lactate levels by enhancing glycolysis [36] and lactate production [35] and by causing early and severe impairment of lactate clearance [37]. Furthermore, it causes hepatotoxicity by inducing oxidative stress and consequent oxidative damage to biomolecules [38]. These effects of lipopolysaccharide may explain the significant increase in lactate levels and hepatic biomarkers (e.g., AKP and total bilirubin) in patients with gram-negative bacteremia.
Several limitations of this study must be considered. First, the laboratory blood test variables in the MIMIC database do not cover all commonly used infection-related parameters; for example, procalcitonin and C-reactive protein are not reported in the MIMIC database. Further, immune-related parameters, such as CD4, CD8, and HLA-DR, were rarely recorded in the MIMIC database and could not be included when developing the ML model. The exclusion of these parameters may limit the effectiveness of the ML algorithm. Second, as the datasets were evaluated retrospectively, most of the laboratory blood test results were not obtained on the same day that bacteremia was suspected. As there is no standard turnaround time for laboratory test results, the applicability of the model may be limited in certain situations. Finally, the model needs to be evaluated using data from different regions and countries, as well as in prospective cohorts.