A machine-learning approach for dynamic prediction of sepsis-induced coagulopathy in critically ill patients with sepsis: an integrated analysis of the MIMIC-IV and eICU-CRD databases

Prediction performances of the models (the full model, the compact model, and Logistic Regression) and SIC scores were compared in internal and external validations. The full and the compact models were developed in MIMIC-IV, based on all or selected features, respectively. Logistic Regression was developed based on all features. In addition, the current SIC score was used to predict a patient's SIC risk of the next day. The Youden Index, defined as Sensitivity + Specificity − 1, and the AUC were used to assess the performance of the different models. All statistics were the median values over 1000 iterations of the Bootstrap Resampling technique.
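As a minimal illustration of the Youden Index described above (a sketch, not the study's code), the index can be evaluated at every candidate cutoff of a model's prediction scores and the best threshold selected:

```python
def youden_best_threshold(scores, labels):
    """Pick the score cutoff that maximizes the Youden Index (Sens + Spec - 1).

    scores: continuous model outputs; labels: 1 = SIC next day, 0 = non-SIC.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        # Sensitivity: fraction of positives at or above the cutoff
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        # Specificity: fraction of negatives below the cutoff
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j
```

With perfectly separated scores the index reaches its maximum of 1 at the cutoff separating the two classes.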


Sepsis-induced coagulopathy (SIC) criteria were developed by members of the Scientific and Standardization Committee (SSC) on Disseminated Intravascular Coagulation (DIC) of the International Society on Thrombosis and Haemostasis (ISTH) in 2017 [13] (Additional File 1: Table S1). The criteria are a scoring system designed to identify patients with "sepsis and coagulation disorders". SIC is defined as a score ≥ 4. It was found that the mortality rate increased as the SIC score rose, exceeding 30% at a score of 4 [13]. Compared with DIC, a significant cause of organ failure in sepsis, SIC is more relevant to the updated Sepsis-3 criteria [1,14].
Observational evidence has shown that SIC preceded DIC in most cases and that the mortality rates of the SIC and DIC cohorts were relatively high and comparable [15,16]. As a result, the new guideline in 2019 recommended that septic patients with thrombocytopenia (platelet count < 150 × 10⁹/L) should be screened, first by using the SIC diagnostic criteria and then by using the ISTH DIC diagnostic criteria [14].
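The scoring logic can be sketched as follows. The cutoffs used here are the commonly published ISTH SIC criteria (platelet count, PT-INR, and the four-item SOFA sum); the study's exact definitions are in its Additional File 1: Table S1, so these thresholds should be treated as assumptions for illustration:

```python
def sic_score(platelets, inr, sofa4):
    """SIC score from platelet count (x10^9/L), PT-INR, and the four-item
    SOFA sum (respiratory + cardiovascular + hepatic + renal).

    Cutoffs follow the commonly cited ISTH SIC criteria and are assumptions
    here, not a transcription of the study's Table S1.
    """
    score = 0
    # Platelet count: <100 -> 2 points, 100 to <150 -> 1 point
    if platelets < 100:
        score += 2
    elif platelets < 150:
        score += 1
    # PT-INR: >1.4 -> 2 points, >1.2 to 1.4 -> 1 point
    if inr > 1.4:
        score += 2
    elif inr > 1.2:
        score += 1
    # Four-item SOFA sum: >=2 -> 2 points, 1 -> 1 point
    if sofa4 >= 2:
        score += 2
    elif sofa4 == 1:
        score += 1
    return score

def is_sic(platelets, inr, sofa4):
    """SIC is diagnosed at a total score of 4 or more."""
    return sic_score(platelets, inr, sofa4) >= 4
```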
However, there is still a lack of predictive tools for coagulopathy in sepsis. The current SIC criteria serve as a diagnostic tool rather than a predictor of SIC. In our study, daily SIC scores were assessed to predict the SIC risk of the next day; the scoring system was outperformed by our predictive models in both internal and external validations. Furthermore, several new biomarkers have been found for the early detection of coagulopathy and DIC in sepsis or septic shock [17,18]. However, these promising results are not ready for large-scale clinical practice due to high cost and complicated test procedures [19].
Machine learning is a field of artificial intelligence that learns from data based on computational modeling. Advanced machine-learning models can fit high-order relationships between covariates and outcomes, and therefore they excel in the analysis of complex signals in data-rich environments [20]. The aim of this study was to develop and validate machine-learning models for the early prediction of SIC, and to assess feature importance in SIC prediction by interpreting the final model.

Source of data
We conducted our retrospective study based on two sizeable critical care databases: the Medical Information Mart for Intensive Care (MIMIC)-IV [21] and the eICU Collaborative Research Database (eICU-CRD) [22]. The MIMIC-IV database is an updated version of MIMIC-III. A number of improvements have been made, including simplifying the structure, adding new data elements, and improving the usability of previous data elements. Currently, MIMIC-IV contains comprehensive and high-quality data of patients admitted to intensive care units (ICUs) at the Beth Israel Deaconess Medical Center between 2008 and 2019. The other database, eICU-CRD, is a multicenter database comprising de-identified health data associated with over 200,000 admissions to ICUs across the United States between 2014 and 2015. One author (QZ) obtained access to the two databases and was responsible for data extraction. The study was reported according to the recommendations of the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [23].

Selection of participants
In MIMIC-IV, patients who fulfilled the definition of sepsis between 2008 and 2019 were included. According to the Sepsis-3 criteria, sepsis was defined as a suspected infection combined with an acute increase in the Sequential Organ Failure Assessment (SOFA) score of ≥ 2 [1]. Patients with prescriptions of antibiotics and sampling of bodily fluids for microbiological culture were considered to have suspected infection. In line with previous research, when the antibiotic was given first, the microbiological sample must have been collected within 24 h; when the microbiological sampling occurred first, the antibiotic must have been administered within 72 h [24]. Hourly SOFA was assessed based on clinical and laboratory data. In eICU-CRD, microbiology data were not well populated due to the limited availability of microbiology interfaces; instead, infection was identified according to the documented diagnosis.
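The paired time-window rule above can be sketched as a small predicate (a sketch of the logic only, not the study's SQL):

```python
from datetime import datetime, timedelta

def suspected_infection(abx_time, culture_time):
    """Suspected infection per the paired time windows described above:
    antibiotic first -> culture must follow within 24 h;
    culture first   -> antibiotic must follow within 72 h.
    """
    if abx_time <= culture_time:
        return culture_time - abx_time <= timedelta(hours=24)
    return abx_time - culture_time <= timedelta(hours=72)
```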
We only included patients who were older than 18 years and spent more than 24 hours in the ICU. No patients were excluded because of missing values. We made no attempt to estimate the sample size of the study; instead, all eligible patients in MIMIC-IV and eICU-CRD were included to maximize the statistical power of the predictive model.

Outcome (SIC)
Septic patients with coagulation disorders were identified according to the SIC criteria, as recommended [14].
The worst daily values of SIC-related indicators were extracted once the sepsis definition was fulfilled. Daily repeated scoring was then performed. A patient was defined as having SIC if he or she had a score ≥ 4 on that day.

Predictors of SIC
Clinical and laboratory variables were extracted during sepsis by using Structured Query Language (SQL). For the prediction of SIC, 88 variables were collected (Additional File 2: Table S2), including patient characteristics (age, gender, ethnicity, admission type), vital signs (respiratory rate, blood pressure, heart rate, SpO2, and temperature), laboratory data (blood gas, complete blood count, liver function, renal function, and coagulation profile), transfusion (red blood cells, platelets, and fresh frozen plasma), and urine output. Comorbidities were also collected based on recorded International Classification of Diseases (ICD)-9 and ICD-10 codes, including hypertension, diabetes mellitus, chronic obstructive pulmonary disease, congestive heart failure, myocardial infarction, chronic kidney disease, leukemia, stroke, cancer, and liver disease. In addition, medications such as heparin, antibiotics and vasopressors, continuous renal replacement therapy (CRRT), and mechanical ventilation (MV) were collected. Lastly, the length of hospital stay, the length of ICU stay, and 28-day mortality were also analyzed but were not used for prediction.

Statistical analysis
Variable values on the first sepsis day were compared between SIC and non-SIC groups in MIMIC-IV. As shown in Fig. 1.A, our model generated a continuous prediction score on each day when patients were diagnosed with sepsis. The scores assessed the SIC risk of the next day. Prediction was not performed if the SIC criteria were fulfilled on that day; when patients recovered from SIC, our model restarted predicting. No imputation was used for the advanced boosting machine-learning methods because they can automatically handle missing values; in contrast, missing values were imputed with median values for continuous variables or mode values for categorical variables when training the other models. As shown in Fig. 1.B, we preliminarily compared the prediction performance of 15 algorithms using the PyCaret package, an open-source, automated machine-learning workflow. The assessment was performed using 10-fold cross-validation. Accuracy and the Area Under the receiver operating characteristic Curve (AUC) were calculated on each fold and pooled to evaluate each model. The most promising algorithm, with the highest accuracy and the largest AUC, was selected. Then, we performed fine-grained hyperparameter adjustment for this model using the Bayesian Optimization Algorithm, an efficient constrained global optimization tool [25].
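The study used PyCaret's automated workflow for this comparison; the underlying pooling logic of k-fold cross-validation can be sketched in plain Python. The `fit` and `predict` callables below are hypothetical placeholders standing in for any classifier interface:

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(fit, predict, X, y, k=10):
    """Pool per-fold accuracy over k folds, as in the model-comparison step.

    fit(X_train, y_train) -> model; predict(model, x) -> label.
    """
    accs = []
    for fold in kfold_indices(len(X), k):
        test = set(fold)
        train = [i for i in range(len(X)) if i not in test]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return sum(accs) / k
```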
Hyperparameter search domains are listed in Additional File 1 (Table S3). The optimized model was regarded as the best model for SIC prediction and was defined as the full model.
The effects of features on prediction scores were measured using SHapley Additive exPlanations (SHAP) values, which assess the importance of each feature using a game-theoretic approach, based on the validation set [26]. We selected 15 features that had high importance and were as easy as possible to collect in the clinical setting (Additional File 2: Table S2). Then, a compact model was trained for SIC prediction based on the selected features. Although this model was not as accurate as the full model, it is considered more practical in clinical settings.
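For intuition about the game-theoretic attribution underlying SHAP: a feature's Shapley value is its marginal contribution to the model output, averaged over all orderings of the features. This can be computed exactly for a toy model with very few features (a conceptual sketch, not the SHAP library's tree-specific algorithm; `value_fn`, a function from a feature subset to a score, is a hypothetical stand-in for the model):

```python
from itertools import permutations

def shapley_values(value_fn, features):
    """Exact Shapley values: average each feature's marginal contribution
    to value_fn (a set -> score function) over all feature orderings.
    Exponential cost, so only feasible for a handful of features.
    """
    phi = {f: 0.0 for f in features}
    perms = list(permutations(features))
    for order in perms:
        present = set()
        for f in order:
            before = value_fn(present)
            present.add(f)
            phi[f] += value_fn(present) - before
    return {f: v / len(perms) for f, v in phi.items()}
```

For an additive model the Shapley value of each feature is exactly its standalone effect, which makes the toy case easy to check.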
External validation of the full and the compact models was performed in eICU-CRD. Medians and 95% confidence intervals of the AUC were calculated using the Bootstrap Resampling technique with 1000 iterations. Conventional Logistic Regression and the SIC scoring system were also assessed for predicting SIC risk and were compared with our models in both internal and external validations. Additionally, the performances of our models in different patient cohorts were assessed. Samples of the validation set were split into different groups based on Acute Physiology and Chronic Health Evaluation (APACHE)-IV score, age, region of the United States, ethnicity, time since sepsis onset, and unit type. The two models were validated in each sub-cohort separately.
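The bootstrap procedure can be sketched as follows (a minimal pure-Python sketch; the resample-skipping rule for single-class draws is an assumption, as the study does not describe how such draws were handled):

```python
import random

def auc(scores, labels):
    """Rank-based AUC: probability that a random positive outranks a random
    negative, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(scores, labels, n_iter=1000, seed=0):
    """Median and 95% CI of the AUC over bootstrap resamples, mirroring the
    1000-iteration resampling used for the reported statistics."""
    rng = random.Random(seed)
    stats = []
    n = len(scores)
    while len(stats) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:                         # AUC needs both classes
            continue
        stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    return (stats[n_iter // 2],
            stats[int(0.025 * n_iter)],
            stats[int(0.975 * n_iter)])
```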
All analyses were performed using Python (Version 3.6), and p < 0.01 was considered statistically significant.

Baseline characteristics
As shown in Fig. 2, of 12381 septic patients in MIMIC-IV, 11362 were included in the final cohort. A total of 6744 patients had SIC during sepsis, and 4618 did not. A cohort of 35252 septic patients in eICU-CRD was included, and 111002 samples were derived.

Comparison of 15 models
Daily data were extracted, and 16183 samples for prediction in MIMIC-IV were ultimately created. Of them, 1489 were labeled as positive (SIC on the next day) and 14694 were labeled as negative (still non-SIC on the next day).
The prediction performances are listed in Table 3. Model performance in different patient cohorts in eICU-CRD is shown in Fig. 5. As seen, the two models had the greatest AUC for patients who had APACHE-IV scores between 81 and 100, who were younger than 65 years old, or who were admitted to the NICU or SICU. The two models maintained good performance across the four regions of the United States. Moreover, the two models had better discrimination when sepsis had lasted for several days. A similar sub-cohort analysis was also performed in MIMIC-IV (Additional File 1: Fig. S1).
The summary plot of SHAP in Fig. 3 provides an overview of feature impacts on the final models. Additionally, the prediction results of two specific instances are explained in Fig. 6. The bars in red and blue represent risk factors and protective factors, respectively; longer bars mean greater feature importance. For the instance in Fig. 6.A, although her coagulation profile was still normal, she was in poor circulatory status with high lactate and vasopressor administration. The model successfully predicted that she would have SIC the next day. For the instance in Fig. 6.B, his condition was milder, and our model predicted a low-risk value.
Website-based tool
A website-based tool was established for clinicians to use the compact model: http://www.aimedicallab.com/tool/sic_risk.html. The SIC risk of the next day can be assessed using this tool, and an interpretation of the prediction result at the instance level is shown to the user.

Discussion
Our study analyzed and compared data of patients who would and who would not have SIC on the first sepsis day. Two variants of machine-learning models were developed (the full and the compact models), which could dynamically predict SIC with significantly improved accuracy. The relationships between clinical variables and SIC were analyzed based on model interpretation.
Our study compared the differences in characteristics between the SIC and non-SIC groups at the onset of sepsis. As shown in Table 1, SIC patients were significantly younger but had worse physiological status (higher Simplified Acute Physiology Score [SAPS]-II, higher SOFA, and higher rates of supportive treatment) than non-SIC patients. More types of antibiotics and a lower rate of heparin were administered to the SIC group on the first day. Interestingly, linezolid and vancomycin were administered to a higher proportion of SIC patients. This was probably because patients with SIC had more severe infections. On the other hand, the administration of these two antibiotics can decrease platelet counts and exacerbate clotting abnormalities [27,28]. Additionally, the SIC group had a significantly higher mortality rate and longer lengths of hospital/ICU stay than the non-SIC group, consistent with previous research [13].
Currently, there is a lack of reliable tools for the early prediction of coagulopathy in septic patients. Our study has demonstrated that advanced machine-learning algorithms can predict SIC with high accuracy and excellent AUC. They outperformed conventional Logistic Regression and SIC scores in both internal and external validations. CatBoost, an open-source gradient boosting algorithm, has not been widely adopted in critical care research. Gradient boosting is a powerful machine-learning technique that iteratively trains a weak classifier (e.g., a decision tree) to fit the residuals of previous models. Among these models, CatBoost successfully handles categorical features, dealing with them during training instead of at preprocessing time [29]. This means categorical features no longer need to be encoded, and a CatBoost model can be developed directly from raw data. Another advantage of the algorithm is that it uses a new schema to calculate leaf values when selecting the tree structure. The schema helps to reduce overfitting, the major problem that constrains the generalization ability of machine-learning models [29].
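The residual-fitting loop at the heart of gradient boosting can be illustrated with depth-1 regression stumps on squared loss (a conceptual sketch only; CatBoost's actual algorithm adds ordered boosting, categorical handling, and its leaf-value schema on top of this idea):

```python
def fit_stump(X, residuals):
    """Fit the depth-1 regression stump (single threshold split) that
    minimizes squared error against the residuals."""
    best = None
    for t in sorted(set(X)):
        left = [r for x, r in zip(X, residuals) if x < t]
        right = [r for x, r in zip(X, residuals) if x >= t]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def gradient_boost(X, y, n_rounds=20, lr=0.5):
    """Gradient boosting on squared loss: each round fits a weak learner
    (a stump) to the residuals of the current ensemble's predictions."""
    base = sum(y) / len(y)                # start from the mean prediction
    pred = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)   # weak learner fits what is left over
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, X)]
    return lambda x: base + sum(lr * s(x) for s in stumps)
```

On a cleanly separable toy target the residuals shrink geometrically, so the ensemble converges to the training labels after a few rounds.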
In this study, we developed two variants of CatBoost models that can identify patients at high risk of SIC and provide clinical decision-makers with more information. Generally, models based on more variables have better discrimination but worse clinical usability. Therefore, in our study, two model variants were developed for different application scenarios. The full model predicted SIC based on 88 clinical variables and reached the greatest AUC in this study. In external validation, the full model maintained good discrimination with only a slight reduction in AUC. However, it is difficult to collect all 88 variables and apply this model; as a result, the full model is recommended for hospitals with a well-designed clinical data system. By contrast, the compact model was trained on 15 selected variables, achieving as much practicality as possible while ensuring the necessary accuracy. As shown in Fig. 5, our models had high and comparable AUCs in different patient cohorts, demonstrating that machine-learning models based on big data have good generalization capability. In addition, a website tool was developed to help clinicians use the compact model in clinical practice. By logging onto the website and entering the values of the 15 variables, clinicians receive the compact model's prediction results together with an interpretation.
By interpreting the full model, it was found that many clinical variables can help indicate the risk of SIC. In this study, renal function indicators (urine output and creatinine) were the most important variables after the coagulation profile. As shown in Fig. 3, patients with poorer renal function (less urine output and higher serum creatinine) tended to have a higher risk of SIC. Also, body mass index (BMI), vital signs (heart rate and mean arterial pressure), laboratory tests (such as lactate and white blood cell count), the use of MV and vasopressors, and SAPS-II scores can help assess the risk of SIC. In addition, prediction results can be interpreted at the instance level, as shown in Fig. 6, which makes our model clinically explainable.
Several limitations of this study should be considered. First, only septic adults in critical care were included, whereas hospitalized non-ICU sepsis cases were not analyzed. Moreover, considering the immaturity of the coagulation system in children, especially newborns, more research is needed on SIC in children with sepsis. Second, our models screen out patients at high risk of SIC but do not indicate who will benefit from anticoagulant therapy. It is still up to clinicians to decide whether to administer anticoagulant agents. However, the progression from sepsis to severe coagulopathy is a continuous process arising from coagulation disorder. Early and accurate prediction of SIC can give clinical workers more time to adjust treatment strategies and can also help to study the potential effect of anticoagulant therapy at an early stage. Third, this is a retrospective observational study. Missing data and input errors exist, despite the very high quality of the MIMIC-IV and eICU-CRD databases. Therefore, prospective validation is still needed. Compared with septic shock, for which advances in recent years have yielded significant survival improvements, there is still a long way to go in the diagnosis and management of sepsis-associated coagulopathy.

Conclusions
In conclusion, the present study developed two variants of the CatBoost model (the full and the compact models) for the dynamic prediction of SIC in critically ill patients with sepsis.

Fig. 1 Schematic illustration of study design. Daily assessment was performed during the time when sepsis was diagnosed. If the SIC criteria were not fulfilled, the risk of SIC on the next day was predicted by our model. Prediction stopped when SIC was diagnosed and restarted when patients recovered from SIC. We compared the discrimination of 15 machine-learning models using 10-fold cross-validation. The one with the best accuracy and greatest AUC was chosen. Fine-grained hyperparameter adjustment was performed using the Bayesian Optimization Algorithm.

Fig. 5 Model performance in different patient cohorts in eICU-CRD. Different validation sets were derived based on APACHE-IV (A), age (B), region of the United States (C), ethnicity (D), time since sepsis onset (E), and unit type (F). The AUC of the full and the compact models in each set was measured using the Bootstrap Resampling technique. The colored area represents 95% confidence intervals. Abbreviations: Full, the full model; Comp, the compact model; AUC, area under the receiver operating characteristic curve; APACHE-IV, Acute Physiology and Chronic Health Evaluation-IV; CICU, cardiac intensive care unit; CSICU, cardiac surgical intensive care unit; CTICU, cardiothoracic intensive care unit; MICU, medical intensive care unit; NICU, neuro intensive care unit; SICU, surgical intensive care unit.