Study Design
|
Study Characteristics
(Timeline, Sample size, and Features)
|
Outcome Measures
|
Handling Missing Data
|
Algorithms
|
Performance
|
Summary
|
Retrospective cohort 35. (2021)
|
EHR (May 2010 to July 2014) from a single healthcare system;
9,986 CDI cases and 2,230,354 members without CDI;
104,518 hospital discharges for validation; case:control ≈ 1:23
≈20 risk factors
|
CDI
|
Not addressed
|
Univariate logistic regression
Logistic regression to develop two risk scores. Model 1: hospital discharge as the IDRSA. Model 2: a random date as the IDRSA.
|
Model 1: using hospital discharge as the IDRSA, C-statistic of 0.848 for the subsequent 31-365 days; Model 2: using a random date as the IDRSA, C-statistic of 0.722
|
Identification of high-risk populations for C. difficile vaccine trials to assess study feasibility (sample size and time to completion)
|
Case-control study 28. (2019)
|
Adult patients who received systemic antibiotics, admitted between July 1, 2015, and July 1, 2017 (multicenter study).
200 subjects (100 cases and 100 controls)
Reported 2 features
|
CDI (hospital-associated)
|
Not addressed
|
Univariate logistic regression
Multivariate logistic regression model to formulate a point-based risk prediction model
|
Sensitivity and specificity were 76% and 49%, respectively
Highest accuracy (63%)
AUROC = 0.7
|
A simple-to-implement hospital-onset CDI risk model, including only independent risk factors that can be obtained immediately on presentation to the healthcare facility
|
Retrospective Cohort 34. (2018)
|
EHR-based; adult inpatients admitted to two healthcare systems, one between January 1, 2010, and January 1, 2016, and the other between June 1, 2012, and June 1, 2014
191,014 (155,009/36,005 for training/testing) and 65,718 (33,477/32,241 for training/testing) for the two healthcare systems, respectively
Case:control ≈ 1:100
4,836 and 1,837 features from the two healthcare systems, respectively
|
CDI
|
Not addressed
|
L2 regularized logistic regression
Logistic regression to create a daily risk score for risk stratification
|
AUROC = 0.82 [0.80-0.84] and 0.75 [0.73-0.78] for the two cohorts, respectively.
|
Many of the top predictive factors differed between the cohorts from the two healthcare systems.
Supports institution-specific models instead of “one-size-fits-all” models
|
Retrospective Cohort 33. (2016)
|
Population-based sample of Medicare beneficiaries aged 65 and older on January 1, 2008, with continuous Medicare coverage from January 1, 2008, through December 31, 2009. Inpatient setting (58.5%)
Of 1,165,165 Medicare beneficiaries meeting the enrollment criteria, 6,838 had an incident CDI episode; case:control = 1:170
22 features
|
CDI
|
Not addressed
|
Logistic regression model for feature selection, sequentially removing features with < 0.8 change in the C-statistic;
A weighted score was developed for each of the risk factors based on its odds ratio, with the sum of all of the risk values representing a participant’s individual risk score
|
C-statistic = 0.858
NPV = 98.7%
|
Developed a risk stratification scoring system
Emphasized the age dependence of CDI risk
|
Retrospective cohort 32. (2016)
|
Patients admitted over a 1-year period (2013).
Total of 61,482 subjects: discovery dataset (40,990) and validation dataset (20,492); case:control ≈ 1:200
~25 features
|
CDI (hospital-associated)
|
Not addressed
|
Multivariable analysis to identify risk factors individually
Multivariable model based on six risk factors to develop a risk score
|
Sensitivity = 82.0%; Specificity = 75.7%; AUROC = 0.85
|
Developed a clinical prediction rule to identify patients at high risk for primary CDI.
|
Retrospective cohort (longitudinal) 31. (2015)
|
Hospital discharge data and pharmacy data from two large academic centers linked to active population-based CDI surveillance data from the Emerging Infections Program (EIP)
Of the 35,186 index hospitalizations, 288 (0.82%) had CDI ≥28 days post discharge
39 candidate features; 4 retained in the final model
|
CDI (≥28 days post-discharge)
|
Not addressed
|
Cox proportional hazards model (stepwise backward selection) for low- and high-risk groups
|
C-statistic = 0.75
|
Developed a risk score, applied at discharge, to identify patients at risk of CDI ≥28 days post-discharge
|
Case-control study 27. (2015)
|
Patients admitted between January 2005 and December 2011 from a single healthcare system
Discovery: 180 cases and 330 controls; Validation: 97 cases and 417 controls; case:control ≈ 1:120
12 features
|
CDI (hospital-associated)
|
Not addressed
|
Stepwise backward elimination to determine the best fit model.
Logistic regression to develop a simplified risk score
|
Corrected AUROC = 0.81[0.77-0.85]; calibration: Brier score = 0.004
|
Developed and validated a model to predict incident CDI in hospitalized patients who receive systemic antibiotic treatment
|
Retrospective cohort 30. (2014)
|
All patients admitted on or after April 12, 2011 and discharged on or before April 12
Training: 34,846 admissions (372 CDI cases). Validation: 34,722 admissions (355 CDI cases). Case:control ≈ 1:100
14 features (EHR Model)
1,017 features (Curated Model)
10,859 features (EHR ALL)
|
CDI (hours from the time of admission)
|
Not addressed
|
L2-regularized logistic regression
Three models compared, differing in the number of features included in the final model, to discriminate low-risk from high-risk patients
|
AUROC
Risk period > 24h
EHR = 0.81 (.79–.83)
Curated = 0.72 (.69–.75)
EHR ALL = 0.8140 (.80–.83)
Risk period > 48h
EHR = 0.7886 (.76–.82)
Curated = 0.69 (.66–.72)
EHR ALL = 0.79 (.76–.81)
|
Additional features from EHR data improved prediction and outperformed the model only considering a small set of known clinical risk factors.
|
Retrospective cohort 29. (2014)
|
All inpatient visits for the 2 years between April 2011 and April 2013.
1,348 test-positive cases of C. difficile out of 132,853 admissions from three hospitals varying in size and location; case:control ≈ 1:100
578 binary features; feature spaces differ across hospitals, comprising common (256) and hospital-specific features
|
CDI (hospital-associated)
|
Missingness discussed with respect to source and target feature spaces
|
L2-regularized logistic regression
Multivariate Logistic regression
|
AUROC ≈ 0.80, varying by approach and target task
|
The external data from other hospitals can be successfully and efficiently incorporated into hospital-specific models.
|
Case-control study 26. (2014)
|
Not available (abstract only)
8 known risk factors
|
CDI
|
Not addressed
|
All features included
Multivariate regression model to create a weighted score tool
|
Sensitivity = 92%; Specificity = 39%
|
Developed a weighted scoring tool to predict incident CDI
|
Retrospective cohort 25. (2014)
|
A consecutive cohort of patients admitted to the adult medical service over a period of 17 months (June 2011 to October 2012).
62 of 7,026 patients with a hospital stay of over 48 h developed hospital-onset CDI; case:control ≈ 1:100
Reported 6 features
|
CDI (hospital-onset)
|
Addressed missingness in serum albumin level
|
Univariate analysis to determine the potential risk factors included in the model
Multivariable logistic regression model using a forward stepwise selection for features
|
AUROC = 0.94 [0.92-0.95]; Sensitivity = 98.3% [90.2-99.9]; Specificity = 85.2% [84.3-86.0]
|
Developed a predictive scale for hospital-onset CDI which can be used for risk stratification
|
Retrospective Cohort 24. (2011)
|
Patients admitted for ≥48 hours during the calendar year 2003 from a single healthcare system
35,350 total admissions & 329 CDI cases. Case:control ≈ 1:100
11 features
|
CDI
|
Not addressed
|
Feature selection based on high-dimensional data reduction techniques such as PCA and cluster analysis
Logistic stepwise regression to determine the best-fit model
Logistic regression also tested for some feature interactions
|
C index = 0.88; Brier score = 0.009
|
Developed and validated a CDI risk prediction model using EHR with strong discriminative capacity.
|
Retrospective cohort 23. (2009)
|
Three-phase design: discovery dataset (NA), testing dataset (n = 1,468), and external validation (n = 29,425)
|
CDI (Clostridium difficile-associated disease, CDAD)
|
Not addressed
|
Logistic regression model
|
AUROC = 0.827; Sensitivity = 70% and specificity = 95%
|
Developed a predictive score to predict patients’ risk of developing CDAD.
|
Retrospective cohort 22 (2008)
|
Temporal split: development cohort (March 2005 to December 2006) and validation cohort (January 2007 to October 2007).
A cohort of hospital patients given broad-spectrum antibiotics: 392 CDI cases (288/104) out of 54,226 patients (41,224/13,002) in the development/validation cohorts; case:control ≈ 1:100
Reported 4 features
|
CDI
|
Not addressed
|
Logistic regression model to identify significant predictors individually
A scoring algorithm to create four categories of CDI risk
|
AUROC = 0.712
|
Developed an easily implemented risk index for risk stratification of patients.
|
Prospective Cohort 39. (2018)
|
Patients with symptomatic C. difficile infection between July 2014 and February 2015 from 14 Spanish hospitals
274 (training dataset); 183 (validation cohort). Reported 4 features
|
Recurrence
|
Not addressed
|
Logistic regression model, with calibration assessed using the Hosmer-Lemeshow test.
Logistic regression to form a GEIH-CDI score
|
AUROC = 0.72 (0.65-0.79).
|
Developed a risk score for recurrent CDI prediction and stratification
|
Retrospective cohort 44. (2019)
|
First episode of adult CDI from January 1 to December 2015
For recurrence, 36 vs 191
For poor outcome, 70 vs 157
(no testing dataset)
≈35 features
|
Recurrence
Severity
|
Addressed
|
Univariate analyses; backward stepwise multivariate logistic regression
Multivariate regression
|
AUROC = 0.728/0.789 for the clinical model; 0.775/0.801 for the EIA-included model; 0.785/0.804 for the PCR-included model for recurrence/severity
|
|
Retrospective cohort (longitudinal) 38. (2017)
|
EHR (2007–2013) from a single healthcare system.
Training: 9,386 incident CDI and 1,311 first CDI recurrences; testing: 1,865 incident CDI and 144 recurrent CDI
150 predictors
|
Recurrence (inpatient or outpatient)
|
Right-censored data
|
Univariate and bivariate regression
Competing risk discrete survival models and Cox competing risk survival regression
|
Basic (C-statistic: 0.591, sensitivity: 75.69, specificity: 41.19). Enhanced (C-statistic: 0.587, sensitivity: 69.44, specificity: 43.64). Automated (C-statistic: 0.605, sensitivity: 79.17, specificity: 32.04). Zilberg (C-statistic: 0.591, sensitivity: 74.31, specificity: 39.03).
|
None of the models showed good discriminative power.
Suggested including environmental and ecological predictors
|
Retrospective Cohort 37. (2015)
|
Patients with a positive laboratory test for C. difficile between January 2009 and June 2013 at a single healthcare system
198 CDI cases, 30 with CDR; split 70%/30% into training and testing
25 features
|
Recurrence
|
Not addressed
|
All features included
Random Forest
|
Sensitivity(83.3%), Specificity(63.1%), and AUROC (0.826)
|
Random forest model expected to achieve higher performance
|
Retrospective Cohort 36. (2013)
|
January 2006 to October 2010 from a 4-hospital health care organization
198 of 829 patients relapsed within 56 days of follow-up
Reported 6 features
|
Relapse
|
Not addressed
|
Univariate logistic regression
Multivariate logistic regression
|
Predicted 14.6% of CDI Episodes
|
Comprehensive EHR can be used to identify patients at high risk for CDI relapse. Major risk factors include antibiotic and PPI exposure
|
Retrospective Cohort 43. (2019)
|
Adult inpatients diagnosed with CDI from October 2010 to January 2013 at a single healthcare system. 89 of 1,144 CDI cases developed complicated CDI; 894 cases for training and 224 cases for testing.
23 features for the curated model; 4271 features from EHR; final selected 900 features; 923 features for EHR + curated
|
Severity (3-day complications)
|
No imputation or case-wise deletion.
|
Compared EHR-based model to one based on a small set of manually curated features
L2-regularized regression model
Logistic regression
|
AUROC = 0.69 [0.55-0.83] on the day of CDI diagnosis; AUROC = 0.90 [0.83-0.95] 2 days after CDI diagnosis; outperformed the curated feature model with AUROC = 0.84 [0.75-0.91]
|
Developed a model based on EHR data to accurately stratify CDI cases according to their risk of developing complications.
|
Prospective Cohort 42. (2015)
|
Discovery dataset: Boston site from December 2004 to January 2006; Validation dataset: Dublin site from November 2007 to June 2009, and Houston site from January 2006 to August 2010
251 for Discovery and 345 for validation
3 features (Age, WBC, and Creatinine)
|
Severity
|
Not addressed
|
Univariate logistic regression
Multivariate logistic regression analysis to form a Clostridium difficile severity score (CDSS)
|
AUROC = 0.725 [0.675-0.769]
|
Developed a CDSS scoring system to predict severe CDI
|
Retrospective cohort 41. (2011)
|
January 2004 to December 2007
255 patients
4 risk factors (history of malignancy + 3 laboratory variables)
|
Severity
|
Not addressed
|
Univariate analysis
Composite scoring: CDI severity index score
|
AUROC = 0.78; Sensitivity = 82%; Specificity = 65%
|
Developed a composite score for risk stratification of severe CDI
|
Prospective Cohort 40. (2009)
|
A single healthcare system
8 out of 58 on day 1 and 75 on day 3 having severe complications
3 Laboratory variables
|
Severity
|
Not addressed
|
No feature selection
Composite scoring: RUWA scoring system
|
Sensitivity: 80.0% [39.4-96.3] and 62.5% [32.3-85.6]; Specificity: 77.4% [73.5-78.9] and 82.1% [78.5-84.8] on day 1 and day 3, respectively.
|
RUWA: the Ratio of white cell count on the day of the positive C. difficile toxin test to that two days previously, plus Urea, White cell count, and Albumin on the day of the positive C. difficile toxin test.
|
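Most of the models summarized above share one pipeline: fit a (multivariable) logistic regression, convert its coefficients (log odds ratios) into integer points, sum the points into a risk score, and report discrimination (C-statistic/AUROC), threshold metrics (sensitivity, specificity, NPV) and, occasionally, calibration (Brier score). The Python sketch below illustrates that pipeline under stated assumptions: the coefficients are taken as already estimated, and the risk-factor names, coefficient values, and patients are hypothetical toy values, not data from any cited study. Dividing each coefficient by the smallest one and rounding is only one common way to assign points; the individual studies may have used other weighting rules.

```python
import math

# Hypothetical intercept and log-odds coefficients (beta = ln(OR)) from an
# already-fitted logistic regression; illustrative values only.
INTERCEPT = -5.0
COEFS = {"recent_antibiotics": 1.2, "ppi_use": 0.6, "age_65_plus": 0.7}


def points_table(coefs):
    """Integer points per risk factor: divide each coefficient by the smallest
    coefficient and round (an assumed convention; needs positive coefficients)."""
    ref = min(coefs.values())
    return {name: round(beta / ref) for name, beta in coefs.items()}


def risk_score(patient, points):
    """Sum the points of every risk factor flagged as present (1)."""
    return sum(points[name] for name, present in patient.items() if present)


def predicted_probability(patient, coefs, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(coefs[name] * x for name, x in patient.items())
    return 1.0 / (1.0 + math.exp(-z))


def c_statistic(scores, labels):
    """Share of (case, non-case) pairs in which the case scores higher,
    counting ties as 1/2; equals the AUROC for a binary outcome."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for c in cases:
        for k in controls:
            total += 1.0 if c > k else 0.5 if c == k else 0.0
    return total / (len(cases) * len(controls))


def threshold_metrics(scores, labels, cutoff):
    """Sensitivity, specificity and NPV when score >= cutoff means 'high risk'."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp), tn / (tn + fn)


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Toy usage with five hypothetical patients (1 = risk factor present).
PATIENTS = [
    {"recent_antibiotics": 1, "ppi_use": 1, "age_65_plus": 1},
    {"recent_antibiotics": 1, "ppi_use": 0, "age_65_plus": 1},
    {"recent_antibiotics": 0, "ppi_use": 1, "age_65_plus": 0},
    {"recent_antibiotics": 0, "ppi_use": 0, "age_65_plus": 1},
    {"recent_antibiotics": 0, "ppi_use": 0, "age_65_plus": 0},
]
LABELS = [1, 0, 1, 0, 0]  # 1 = developed CDI (toy outcome)

POINTS = points_table(COEFS)  # recent_antibiotics: 2, ppi_use: 1, age_65_plus: 1
SCORES = [risk_score(p, POINTS) for p in PATIENTS]  # [4, 3, 1, 1, 0]
PROBS = [predicted_probability(p, COEFS, INTERCEPT) for p in PATIENTS]

print("C-statistic:", c_statistic(SCORES, LABELS))  # 0.75
print("Sensitivity, specificity, NPV:", threshold_metrics(SCORES, LABELS, cutoff=2))
print("Brier score:", round(brier_score(PROBS, LABELS), 3))
```

The pairwise loop in c_statistic grows with (number of cases × number of non-cases), which is fine for illustration; for cohorts of the size listed above (hundreds of thousands of admissions), a rank-based (Mann-Whitney) computation or a library routine such as scikit-learn's roc_auc_score would be the practical choice.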