Patient characteristics
In the discovery cohort, 383 patients were initially recruited, of whom 23 without complete data were excluded, leaving 360 patients eligible for analysis. For the validation cohort, 313 patients were initially enrolled and 3 were excluded due to missing prognostic information, leaving a final total of 310 patients. Tables 1 and S1 show the baseline characteristics (demographic characteristics, pre-existing conditions, primary sites of infection, vital signs at baseline, laboratory examination outcomes and specific treatments before BSI) as well as the prognostic outcomes (ICU complications and outcomes) of patients in the discovery and validation cohorts. In the discovery cohort, the median age was 64 years and 37.2% of the patients were female; the baseline SOFA score and ICU mortality rate for this cohort were 8 and 27.2%, respectively. In the validation cohort, the median age and proportion of female patients were 62 years and 48.7%, respectively, while the baseline SOFA score and ICU mortality rate were 7 and 25.5%, respectively.
Characteristics of clusters in the discovery cohort
Based on baseline variables, partitioning cluster analysis revealed two distinct clusters with differing prognostic outcomes. The optimal number of clusters indicated by the 'NbClust' package, the silhouette width and the elbow method was 2. The clusters were well separated from one another, as shown by principal component analysis plots (S-Figure 1). Patients in the two clusters had distinct baseline characteristics and prognostic outcomes (Table 2 and S-table 2).
Patients in cluster 1 (n=211) were more likely to have been transferred directly from the emergency department to the ICU, and had spent less time in the hospital before BSI. These patients had milder organ dysfunction (lower SOFA scores) and received fewer invasive treatments. The most common primary site of infection was the abdomen, followed by the urinary system, with fewer patients exhibiting pulmonary infections. Regarding prognosis, patients in cluster 1 were less likely to suffer from MODS, had lower incidences of ARDS and septic shock, and had lower ICU mortality rates.
Patients in cluster 2 (n=149) had significantly higher SOFA scores, more ICU complications (MODS, ARDS, and septic shock) as shown in Table 2 and Figure 2A, and poorer prognostic outcomes (longer hospital and ICU stays, higher ICU mortality). The Kaplan-Meier curve revealed a higher risk of death for cluster 2 patients, compared to cluster 1 (hazard ratio, 2.31 [95% CI, 1.53 to 3.51]; p < 0.001; Figure 3A). A higher proportion of patients in cluster 2 had been subjected to mechanical ventilation (117/149, 78.5%), deep vein catheterization (133/149, 89.3%), antibiotics (145/149, 97.3%) and vasoconstrictor agents (105/149, 70.5%) before the diagnosis of bloodstream infections.
Predicting the identified clusters using baseline variables
Using a random forest, the top 4 baseline variables (vasoconstrictor use before BSI, MV before BSI, DVC before BSI, and antibiotic use before BSI; Figure S2) were identified to predict the prognostic outcomes of the identified clusters in the discovery cohort. We then created a nomogram that integrated all four significant independent predictors. For ease of clinical application, we developed a bloodstream infections clustering (BSIC) score based on the derived nomogram, using only these four baseline variables (Figure 1A). Figure 1B shows adequate calibration of the score, as the proportion of patients attributed to cluster 2 increased with the score. The nomogram and BSIC score showed good discrimination, with an AUC of 0.96 (95% CI, 0.94 to 0.98 and 0.74 to 0.98, respectively; Figure 1C). The optimal cut-off value of the score was 5, determined from the ROC curve as the point with the highest Youden index. Patients with scores of 0 to 4 were assigned to cluster 1, while those with scores of 5 to 8 were assigned to cluster 2. The accuracy, sensitivity and specificity of classifying patients according to this cut-off value were 91%, 86% and 95%, respectively, with a PPV of 92% and an NPV of 90%.
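As an illustration of how a cut-off can be chosen by the highest Youden index (sensitivity + specificity − 1) and then applied to assign clusters, a minimal Python sketch follows. It is not the analysis code used in this study: the function names are hypothetical, and the scores and labels are toy values invented for illustration only, not patient data. The only rule taken from the text is that a score at or above the cut-off (5) maps to cluster 2.

```python
from typing import Iterable, Sequence, Tuple


def confusion_counts(scores: Sequence[int], labels: Sequence[int],
                     cutoff: int) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating score >= cutoff as 'cluster 2'.

    labels use 1 for true cluster 2 and 0 for true cluster 1.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_index(tp: int, fp: int, tn: int, fn: int) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def best_cutoff(scores: Sequence[int], labels: Sequence[int],
                candidates: Iterable[int]) -> int:
    """Pick the candidate cut-off with the highest Youden index."""
    return max(
        candidates,
        key=lambda c: youden_index(*confusion_counts(scores, labels, c)),
    )


def assign_cluster(score: int, cutoff: int = 5) -> int:
    """Scores below the cut-off map to cluster 1, otherwise cluster 2."""
    return 2 if score >= cutoff else 1


# Toy data (hypothetical, NOT from the study): scores on the 0-8 scale
# and true cluster labels (0 = cluster 1, 1 = cluster 2).
toy_scores = [0, 1, 2, 3, 4, 4, 5, 5, 6, 7, 8, 3]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

chosen = best_cutoff(toy_scores, toy_labels, range(1, 9))
print("cut-off with highest Youden index on the toy data:", chosen)
```

Accuracy, PPV and NPV can be derived from the same four counts, for example accuracy as (TP + TN) divided by the total number of patients.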
Validation of the BSIC score
The four baseline variables (vasoconstrictor use before BSI, MV before BSI, DVC before BSI, and antibiotic use before BSI) were used to predict the cluster labels of the 310 BSI patients in the validation cohort with the BSIC score. Of the 310 patients, 124 were assigned to cluster 1 and 186 were assigned to cluster 2. The baseline characteristics of the predicted clusters in the validation cohort and the prognostic differences between them are shown in S-table 3 and Table 3. Consistent with findings from the discovery cohort, cluster 2 patients had higher SOFA scores, more ICU complications (MODS, ARDS, septic shock, AKI, DIC), and poorer prognostic outcomes (longer hospital and ICU stays, higher ICU mortality) than patients in cluster 1 (Figures 2B and 3B). The Kaplan-Meier curve revealed a higher risk of death for cluster 2 patients, compared to cluster 1 (hazard ratio, 2.23 [95% CI, 1.34 to 3.71]; log-rank p = 0.001). The species of pathogens in the discovery and validation cohorts are shown in Figure S3. Escherichia coli, Klebsiella pneumoniae and Staphylococcus were the three most common pathogens in the discovery cohort. In contrast, the most common pathogens in the validation cohort were Staphylococcus, Candida and Klebsiella pneumoniae, in that order.