An interpretable machine learning models for predicting in-hospital mortality in patients with sepsis based on multiple databases

doi:10.21203/rs.3.rs-3308739/v1

Research Article

This work is licensed under a CC BY 4.0 License

Journal Publication

published 06 Mar, 2024

Read the published version in European Journal of Medical Research →

Background:

This study aimed to develop and validate an interpretable machine-learning model that utilizes clinical features and inflammatory biomarkers to predict the risk of in-hospital mortality in critically ill patients suffering from sepsis.

Methods:

We enrolled all patients diagnosed with sepsis in the Medical Information Mart for Intensive Care IV (MIMIC-IV, v.2.0), the eICU Collaborative Research Database (eICU-CRD 2.0), and the Amsterdam University Medical Centers database (AmsterdamUMCdb 1.0.2). Lasso regression was employed for feature selection. Seven machine-learning methods were applied to develop prognostic models. The optimal model was chosen based on its accuracy and area under the curve (AUC) in the validation cohort. Moreover, we used the SHapley Additive exPlanations (SHAP) method to quantify each feature's contribution to the model and to analyze how individual features affect the model's output. Finally, Spearman correlation analysis examined the associations among continuous predictor variables, and restricted cubic splines (RCS) explored potential non-linear relationships between continuous risk factors and in-hospital mortality.

Results:

A total of 3,535 patients with sepsis were eligible for participation in this study. The median age of the participants was 66 years (IQR, 55–77 years), and 56% were male. After selection, 12 of the 45 clinical parameters collected on the first day after ICU admission remained associated with prognosis and were used to develop machine-learning models. Among the seven constructed models, the eXtreme Gradient Boosting (XGBoost) model achieved the best performance, with an AUC of 0.73 and an accuracy of 85% in the validation cohort. Feature importance analysis revealed that age, AST, invasive ventilation treatment, and heart rate were the four most influential features of the XGBoost model. Some novel inflammatory biomarkers, such as NLR, NHR, and MHR, also contributed substantially to predicted in-hospital mortality in the XGBoost model. Furthermore, SHAP force analysis visualized how the model arrived at individual predictions.

Conclusions:

This study demonstrated the potential of machine-learning approaches for early prediction of outcomes in patients with sepsis. The SHAP method could improve the interpretability of machine-learning models and help clinicians better understand the reasoning behind the outcome.

Sepsis

prediction

machine learning

intensive care unit

XGBoost

Introduction

Sepsis is a severe illness that arises from various infections, leading to uncontrolled systemic inflammation. Despite medical advances and increased knowledge of its pathophysiology, sepsis remains a common cause of ICU admission and affects more than 30 million people worldwide each year^1,2. According to the third international consensus definition, sepsis and septic shock are rapidly progressive inflammatory conditions accompanied by a state of immunosuppression³. Neutrophils are primary effector cells during systemic inflammatory reactions and exert regulatory roles over other immune cells by secreting cytokines and chemokines that enhance their recruitment, activation, and function^4,5. The neutrophil-to-lymphocyte ratio (NLR), calculated as a simple ratio between the neutrophil and lymphocyte counts measured in peripheral blood, reflects two aspects of the immune system: innate immunity, predominantly mediated by neutrophils, and adaptive immunity, supported by lymphocytes⁶. The NLR has served as a reliable diagnostic marker for bacteremia and sepsis⁷, with higher NLR values associated with adverse prognoses in patients with sepsis⁸. Moreover, NLR values have shown potential for assessing sepsis severity and are notably elevated in patients with septic shock. Recent studies have explored prediction models based on NLR, revealing their excellent diagnostic and prognostic capabilities in sepsis^9–11.

In comparison with other markers, such as C-reactive protein (CRP) and white blood cell count (WBC), NLR exhibited moderate sensitivity and high specificity¹². Beyond neutrophils, high-density lipoprotein (HDL), known for its anti-inflammatory properties, has been demonstrated to have prognostic implications in patients with inflammatory disorders, including sepsis¹³. HDL levels decrease significantly during sepsis, and low HDL correlates with higher hospital mortality, likely due to its anti-inflammatory properties¹⁴. Furthermore, recent studies have shown that immune-inflammation markers such as the platelet-to-lymphocyte ratio (PLR), lymphocyte-to-monocyte ratio (LMR), neutrophil-to-high-density lipoprotein ratio (NHR), and monocyte-to-high-density lipoprotein cholesterol ratio (MHR) have garnered attention in identifying patients with sepsis and other infectious diseases^15–17. Considering these findings, our study was designed to investigate the correlation between immune-inflammation biomarkers and in-hospital mortality among septic patients. To strengthen the model's credibility, we recruited patients from three established critical care databases, namely MIMIC-IV, eICU-CRD, and AmsterdamUMCdb, in contrast to the single-center designs prevalent in previous predictive models for sepsis^18,19. In addition, we excluded patients with HIV infection, rheumatic diseases, cancer or metastatic tumors, and hematological diseases to minimize potential bias associated with immunosuppression. Through machine learning models, including the XGBoost model, we aimed to provide reliable prognostic insights into in-hospital mortality in septic patients.

Methods

Data Source

Data for this study were obtained from the MIMIC-IV 1.0 database, the eICU-CRD 2.0, and the AmsterdamUMCdb 1.0.2. The MIMIC-IV 2.0 database, an updated version of MIMIC-III, comprises data from over 40,000 patients admitted to the ICU at the Beth Israel Deaconess Medical Center (BIDMC)²⁰. The eICU-CRD contains data from multiple ICUs covering over 200,000 patients admitted in 2014 and 2015²¹. The AmsterdamUMCdb contains approximately 1 billion clinical data points from 23,106 admissions of 20,109 patients²².

Data collection and release were approved by the ethical standards of the institutional review board of the Massachusetts Institute of Technology (no.0403000206) and complied with the Health Insurance Portability and Accountability Act (HIPAA).

Participants

This study included participants aged 18 or older from MIMIC-IV 1.0, eICU-CRD 2.0, and AmsterdamUMCdb 1.0.2. Eligibility for inclusion was based on the following criteria: 1) documented or suspected infection and a Sequential Organ Failure Assessment (SOFA) score of ≥ 2 according to the Sepsis-3 criteria³ in the first 24 hours of ICU admission; 2) documentation of a peripheral complete blood count within the first 24 hours of ICU admission.

Exclusion criteria were: 1) ICU stay of fewer than 24 hours; 2) HIV infection, cancer, metastatic tumors, or rheumatic diseases; 3) repeat ICU admissions (for patients with multiple hospitalizations, only the first ICU admission was considered); 4) total cholesterol, triglyceride, HDL, low-density lipoprotein (LDL) not documented in the first 24 hours.

Data extraction and handling of missing data and outliers

The following clinical information was extracted using Structured Query Language (SQL) statements: 1) Laboratory blood and biochemical examination within the first 24 hours: WBC, platelets, neutrophil count, lymphocyte count, monocyte count, total cholesterol, HDL, LDL, and blood glucose. 2) Demographics and vital signs within the first 24 hours: age, sex, heart rate, systolic blood pressure, diastolic blood pressure, temperature (℃), and respiratory rate. 3) Blood gas analysis within the first 24 hours: arterial partial pressure of oxygen (PaO₂) and arterial partial pressure of carbon dioxide (PaCO₂). 4) ICU details: the length of ICU stay and the inpatient survival status. 5) Comorbidity and treatment modalities: myocardial infarction, congestive heart failure, chronic pulmonary disease, liver disease, renal disease, mechanical ventilation, and dialysis. In cases where a variable was recorded multiple times within the first 24 h of ICU admission, the value associated with the greatest severity of illness was used. The NLR was computed as the ratio of neutrophils to lymphocytes, the LMR as the ratio of lymphocytes to monocytes, the PLR as the ratio of platelets to lymphocytes, the MHR as the ratio of monocytes to HDL, and the NHR as the ratio of neutrophils to HDL.
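The ratio definitions above can be illustrated with a minimal Python sketch. The function name, argument names, and the choice to return None for an undefined ratio are assumptions made for illustration; the source does not describe how zero or missing denominators were handled. The example inputs reuse the survivor medians reported in Table 1 (cell counts as listed there, HDL in mg/dL), and a ratio of medians is not the same as a median of per-patient ratios.

```python
def inflammatory_ratios(neutrophils, lymphocytes, monocytes, platelets, hdl):
    """Return NLR, LMR, PLR, MHR and NHR for one patient.

    Hypothetical helper: a ratio is None when either input is missing
    or the denominator is zero (an assumption, not the authors' rule).
    """
    def ratio(numerator, denominator):
        if numerator is None or denominator is None or denominator == 0:
            return None
        return numerator / denominator

    return {
        "NLR": ratio(neutrophils, lymphocytes),  # neutrophils / lymphocytes
        "LMR": ratio(lymphocytes, monocytes),    # lymphocytes / monocytes
        "PLR": ratio(platelets, lymphocytes),    # platelets / lymphocytes
        "MHR": ratio(monocytes, hdl),            # monocytes / HDL
        "NHR": ratio(neutrophils, hdl),          # neutrophils / HDL
    }


# Illustrative inputs taken from the survivor medians in Table 1.
example = inflammatory_ratios(
    neutrophils=10.9, lymphocytes=1.45, monocytes=0.93, platelets=225.0, hdl=35.0
)
for name, value in example.items():
    print(name, round(value, 3))
```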

Variables with more than 30% missing values, including the PaO₂/FiO₂ ratio, FiO₂, lactate, SpO₂, PaCO₂, PaO₂, pH, and LDL, were excluded from analysis (Figure S1). The remaining 45 predictor candidates measured at ICU admission were selected for further analysis. Missing values for the selected variables were imputed by multiple imputation with predictive mean matching (pmm) using the "mice" package²³. Random forest outlier detection was implemented (Figure S2), with outliers replaced by pmm using the outForest R package^24,25.
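The 30% missingness rule can be sketched as follows. This is a plain-Python illustration only: the study used R, and the row layout, function names, and toy values here are hypothetical. It does not reproduce the pmm imputation or the random forest outlier step.

```python
def missing_fraction(rows, column):
    """Fraction of rows in which `column` is absent or None."""
    return sum(row.get(column) is None for row in rows) / len(rows)


def drop_sparse_columns(rows, threshold=0.30):
    """Keep only columns whose missing fraction does not exceed `threshold`."""
    columns = sorted({key for row in rows for key in row})
    kept = [c for c in columns if missing_fraction(rows, c) <= threshold]
    filtered = [{c: row.get(c) for c in kept} for row in rows]
    return filtered, kept


# Toy records (hypothetical values): lactate is missing in 2 of 4 rows (50%),
# albumin in 1 of 4 rows (25%), AST in none.
rows = [
    {"ast": 36.0, "albumin": 3.2, "lactate": None},
    {"ast": 60.0, "albumin": None, "lactate": 2.1},
    {"ast": 41.0, "albumin": 3.0, "lactate": None},
    {"ast": 55.0, "albumin": 2.8, "lactate": 4.0},
]
filtered, kept = drop_sparse_columns(rows, threshold=0.30)
print(kept)
```

Columns that survive the filter would still contain gaps, which is where an imputation step such as pmm would come in.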

Statistical analysis

To begin, we completed the Data or Specimens Only Research course of the Collaborative Institutional Training Initiative (CITI) program (Record ID: 9303810). Subsequently, we applied for access to both databases by creating an account on PhysioNet (https://physionet.org) and signing the PhysioNet restricted clinical database data use agreement. We then used SQL statements to extract the required clinical information.

All analyses were carried out using R 4.0.5. Continuous variables were presented as the mean ± SD or median (interquartile range) and compared using Student's t-test for normally distributed variables or the Mann-Whitney U test for non-normally distributed variables. Categorical variables were expressed as counts and proportions and analyzed using the Chi-square or Fisher's exact tests.

Lasso regularization was employed for variable selection, retaining pertinent variables while shrinking the coefficients of the others to zero, which reduces model complexity and mitigates overfitting risks^26,27. A key advantage of this approach is that a sparser model is easier to interpret. Ten-fold cross-validation with the "glmnet" package estimated the optimal penalty parameter (lambda) and the beta coefficients for the selected variables in the training cohort²⁸.
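How an L1 penalty sets some coefficients exactly to zero can be seen in a small coordinate-descent sketch. This is a deliberately simplified stand-in: it fits a linear least-squares lasso, whereas a binary outcome such as in-hospital mortality would normally use a penalized logistic model (as glmnet does), and the penalty here is fixed by hand rather than chosen by ten-fold cross-validation. The data, function names, and lambda value are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Toy design with two orthogonal, standardised features: the first drives y,
# the second carries only a weak signal.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)  # first coefficient is shrunk, second is dropped to exactly 0
```

In this toy case the weak feature is removed outright while the strong one is kept with a reduced coefficient, which is the selection behaviour the paragraph above relies on.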

Seven machine learning algorithms, namely eXtreme Gradient Boosting (XGBoost), logistic regression (LR), random forest (RF), support vector machine (SVM), K Nearest Neighbor (KNN), Naive Bayes, and Decision Tree (DT), were used to build the predictive models in our study. Model discrimination was evaluated using the area under the receiver operating characteristic curve (AUC-ROC), a widely accepted metric. To further assess practical utility and potential clinical impact, decision curve analysis (DCA) quantified net benefit across varying threshold probabilities, providing insight into the clinical relevance of each model and the decision strategies its predictions support²⁹. Spearman correlation analysis examined the associations among the continuous predictor variables. Restricted cubic splines (RCS) with four knots (at the 5th, 35th, 65th, and 95th percentiles) explored potential non-linear relationships between continuous risk factors and in-hospital mortality using the Regression Modeling Strategies (rms) package in R³⁰. Multivariable adjustment in the RCS analyses controlled for the effects of the other covariates, giving a more accurate estimate of the relationship between each continuous variable and in-hospital mortality. Collectively, these rigorous statistical techniques ensured robust and reliable results.
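The restricted cubic spline terms with knots at the 5th, 35th, 65th, and 95th percentiles can be sketched in plain Python using the standard truncated-power construction, which is cubic between knots and constrained to be linear beyond the outer knots. The study used the rms package in R; the helper names, the percentile interpolation rule, and the toy age values below are assumptions for illustration.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value: [x, k - 2 nonlinear terms]."""
    def cube_plus(u):
        return u ** 3 if u > 0 else 0.0

    t = list(knots)
    last, penultimate = t[-1], t[-2]
    gap = last - penultimate
    scale = (last - t[0]) ** 2  # keeps the nonlinear terms on the scale of x
    terms = [float(x)]
    for j in range(len(t) - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - penultimate) * (last - t[j]) / gap
            + cube_plus(x - last) * (penultimate - t[j]) / gap
        )
        terms.append(term / scale)
    return terms


# Hypothetical ages; four knots at the percentiles named in the text.
ages = list(range(18, 98))
knots = [percentile(ages, q) for q in (5, 35, 65, 95)]
row = rcs_basis(70, knots)
print(knots)
print(row)
```

Each patient's value is expanded into one linear and two nonlinear columns, which then enter the regression alongside the adjustment covariates; a test of whether the nonlinear coefficients are zero gives the "P for non-linear" reported in the results.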

Results

Clinical characteristics and demographics of patients

A total of 3,535 patients meeting the inclusion criteria were ultimately included in this study (Fig. 1). Median participant age was 66 years (IQR, 55–77 years), with 1977 of 3535 (56%) being male (Table 1). Diabetes was the most common comorbidity (1116 of 3535, 31.6%), followed by congestive heart failure (680 of 3535, 19.2%). Non-survivors tended to be older than survivors (69.0 [58.4–80.0] vs. 64.0 [53.0–75.0], p < 0.01) and more often received medical interventions, including invasive ventilation (79.1% vs. 59.8%, p < 0.01) and renal replacement treatment (RRT) (19.3% vs. 8.3%, p < 0.01). The median values of HDL, lymphocytes, hemoglobin, albumin, and LMR were higher in survivors, while the inflammatory biomarkers NLR and NHR were significantly lower in survivors than in non-survivors.

Table 1

Baseline characteristics of the patients
	Survivors	Non-survivors	P value
	N = 2980	N = 555
Age, years	65.0 (55.0–76.0)	72.0 (62.1–80.0)	< 0.001
Gender			0.260
F	1326 (44.5%)	232 (41.8%)
M	1654 (55.5%)	323 (58.2%)
White blood cells, ×10³/µL	13.6 (10.1–18.4)	14.7 (10.6–20.0)	0.001
Lymphocytes	1.45 (0.90–2.31)	1.26 (0.79–2.12)	0.001
Neutrophils	10.9 (7.44–15.2)	11.9 (7.85–16.9)	0.002
Monocytes	0.93 (0.59–1.42)	0.94 (0.55–1.42)	0.588
Platelet	225 (169–296)	214 (154–282)	0.004
Triglycerides, mg/dL	106 (74.0–157)	104 (77.0–147)	0.757
Total cholesterol, mg/dL	128 (99.0–164)	114 (88.9–150)	< 0.001
High density lipoprotein, mg/dL	35.0 (25.0–47.0)	32.0 (21.5–44.0)	< 0.001
Hemoglobin, g/dL	12.6 (10.9–14.3)	12.4 (10.3–14.0)	0.011
Albumin, g/dL	3.20 (2.70–3.70)	3.00 (2.40–3.50)	< 0.001
BUN, mg/dL	25.0 (16.2–40.0)	34.0 (20.9–51.0)	< 0.001
Calcium, mg/dL	8.70 (8.20–9.20)	8.70 (8.10–9.20)	0.025
Creatinine, mg/dL	1.25 (0.90–2.00)	1.70 (1.10–2.60)	< 0.001
Glucose, mg/dL	171 (133–229)	187 (143–247)	< 0.001
Sodium, mmol/L	140 (138–143)	141 (138–146)	< 0.001
Potassium, mmol/L	4.40 (4.10–4.90)	4.60 (4.20–5.20)	< 0.001
ALT, IU/L	30.0 (19.0–55.2)	38.0 (21.0–97.0)	< 0.001
AST, IU/L	36.0 (23.0–76.0)	60.0 (32.0–200)	< 0.001
pH	7.41 (7.36–7.46)	7.40 (7.35–7.46)	0.091
Heart rate	106 (92.0–122)	113 (96.0–129)	< 0.001
Systolic blood pressure	155 (136–176)	155 (134–178)	0.898
Diastolic blood pressure	90.0 (77.0–105)	91.0 (76.0–105)	0.631
Mean blood pressure	162 (137–187)	160 (134–186)	0.306
Respiratory rate	28.0 (24.0–35.0)	30.0 (25.0–37.0)	0.001
Temperature	37.4 (37.0–38.0)	37.3 (36.9–37.9)	0.072
First-day SOFA score	6.00 (4.00–9.00)	9.00 (7.00–12.0)	< 0.001
Length of hospital stay	38.0 (14.0–86.0)	36.0 (8.00–118)	0.605
Renal replacement therapy			< 0.001
No	2732 (91.7%)	448 (80.7%)
Yes	248 (8.32%)	107 (19.3%)
Invasive ventilation			< 0.001
No	1197 (40.2%)	116 (20.9%)
Yes	1783 (59.8%)	439 (79.1%)
Myocardial infarct			0.293
No	2648 (88.9%)	484 (87.2%)
Yes	332 (11.1%)	71 (12.8%)
Congestive heart failure			0.838
No	2409 (80.8%)	446 (80.4%)
Yes	571 (19.2%)	109 (19.6%)
Peripheral vascular disease			0.566
No	2834 (95.1%)	524 (94.4%)
Yes	146 (4.90%)	31 (5.59%)
Dementia			0.835
No	2831 (95.0%)	529 (95.3%)
Yes	149 (5.00%)	26 (4.68%)
Chronic pulmonary disease			0.465
No	2533 (85.0%)	479 (86.3%)
Yes	447 (15.0%)	76 (13.7%)
Peptic ulcer disease			0.612
No	2933 (98.4%)	544 (98.0%)
Yes	47 (1.58%)	11 (1.98%)
Renal disease			0.003
No	2508 (84.2%)	438 (78.9%)
Yes	472 (15.8%)	117 (21.1%)
Diabetes			0.003
No	2009 (67.4%)	410 (73.9%)
Yes	971 (32.6%)	145 (26.1%)
Liver disease			0.015
No	2876 (96.5%)	523 (94.2%)
Yes	104 (3.49%)	32 (5.77%)
Cerebrovascular disease			0.010
No	2509 (84.2%)	442 (79.6%)
Yes	471 (15.8%)	113 (20.4%)
LMR	1.63 (1.00–2.56)	1.42 (0.94–2.48)	0.018
NLR	7.16 (4.18–12.0)	8.71 (4.90–14.7)	< 0.001
PLR	152 (94.5–238)	151 (90.5–269)	0.436
MHR	0.03 (0.02–0.05)	0.03 (0.01–0.05)	0.204
NHR	0.31 (0.19–0.50)	0.35 (0.21–0.64)	< 0.001
LMR: the ratio of lymphocytes to monocytes; NLR: the ratio of neutrophils to lymphocytes; PLR: the ratio of platelets to lymphocytes; MHR: the ratio of monocytes to high density lipoprotein; NHR: the ratio of neutrophils to high density lipoprotein

Model Development and Validation

A total of 45 clinical variables were collected according to the inclusion criteria. LASSO regression identified 12 variables associated with sepsis prognosis out of the 45 clinical parameters (Figure S3), including age, AST, invasive ventilation treatment, renal replacement treatment, albumin, cerebrovascular disease, MHR, NLR, NHR, and potassium. Seven ML binary classifiers were constructed to predict sepsis mortality risk based on the selected variables: XGBoost, Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and Decision Tree (DT). In the training cohort, XGBoost demonstrated superior performance, with an area under the curve (AUC) of 0.89 and an accuracy of 0.88, compared to an AUC of 0.77 and an accuracy of 0.86 for the SOFA score (Fig. 2A). The other models showed comparatively lower performance (AUC: RF, 0.71; NB, 0.684; LR, 0.72; SVM, 0.65; KNN, 0.59; DT, 0.58. Accuracy: RF, 0.84; NB, 0.839; LR, 0.84; SVM, 0.83; KNN, 0.80; DT, 0.83). This trend persisted in the validation cohort (Table 2 and Fig. 2B). Given its optimal performance, the XGBoost model was selected for further prediction. DCA also showed that the XGBoost model conferred greater net clinical benefit at a threshold probability of 0.25 than the SOFA score and the other models in the training and validation cohorts (Fig. 3). Additionally, calibration curve analysis revealed superior goodness-of-fit of the XGBoost model over SOFA scoring in the validation cohort (Figure S4).
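For readers less familiar with the two metrics used above, the sketch below computes an AUC as the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, and the DCA net benefit at a threshold probability of 0.25. The labels and predicted risks are invented toy values, not study data, and the function names are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def net_benefit(y_true, scores, threshold):
    """Decision-curve net benefit: TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    true_pos = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    false_pos = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


# Toy example: 1 = died in hospital, 0 = survived; scores are predicted risks.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]

auc = roc_auc(y_true, scores)
nb = net_benefit(y_true, scores, threshold=0.25)
print(auc, nb)
```

A decision curve repeats the net-benefit calculation over a range of thresholds and compares each model with the "treat all" and "treat none" strategies.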

Table 2

Performances of the seven machine learning models for predicting in-hospital mortality
ML	AUC	Accuracy
XGBoost
Training set	0.89	0.88
Validation set	0.73	0.85
SOFA score
Training set	0.77	0.86
Validation set	0.71	0.84
Logistic regression
Training set	0.72	0.84
Validation set	0.67	0.83
Random forest
Training set	0.71	0.84
Validation set	0.69	0.83
K-nearest Neighbor
Training set	0.59	0.80
Validation set	0.61	0.78
Naïve Bayes
Training set	0.69	0.83
Validation set	0.60	0.82
SVM
Training set	0.65	0.83
Validation set	0.65	0.83
Decision Tree
Training set	0.58	0.83
Validation set	0.57	0.78
ML: machine learning, XGBoost: eXtreme Gradient Boosting, SVM: Support Vector Machine, AUC: the area under curve

Model Explanation

SHAP values from the optimal XGBoost model identified feature importance ranks (Fig. 4A). Age, AST, invasive ventilation treatment, and heart rate were the most critical features for in-hospital mortality. The summary plot (Fig. 4B) revealed the factors' positive or negative contributions to the XGBoost model. Higher heart rate, NHR, and potassium had positive SHAP values (in purple), driving the prediction toward in-hospital mortality. In contrast, higher albumin had negative SHAP values (in yellow), decreasing the predicted mortality risk. Multivariate-adjusted restricted cubic splines further explored the variables' relationships with in-hospital mortality. Linear associations were found for NLR, NHR, potassium, heart rate, and albumin (P for non-linear > 0.05) (Fig. 5A-D), and a significant positive correlation occurred between NHR and MHR (P for Spearman correlation analysis < 0.05) (Fig. 5E). However, non-linear relationships between in-hospital mortality and MHR, age, BUN, and AST were observed. A U-shaped association existed for MHR, with both higher and lower values conferring greater in-hospital mortality risk than the curve bottom (0.028) (Figure S5A). Age, BUN, and AST demonstrated steep initial increases that plateaued at certain levels (BUN: 60 mg/dL; AST: 234 IU/L; age: 78 years) (Figure S5B-E).
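The idea behind SHAP values can be illustrated by computing exact Shapley values for a small toy risk function. This is not the study's XGBoost model and not the algorithm the SHAP library uses for tree ensembles; the risk function, its coefficients, and the use of a single reference patient as the baseline are all assumptions for illustration. The feature values reuse the Table 1 medians for age, AST, and albumin (non-survivors as the patient being explained, survivors as the baseline).

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score with an age-by-AST interaction (made-up weights)."""
    age, ast, albumin = z
    return 0.01 * age + 0.001 * ast - 0.05 * albumin + 0.00001 * age * ast


patient = [72.0, 60.0, 3.0]    # age, AST, albumin
reference = [65.0, 36.0, 3.2]  # baseline patient
phi = shapley_values(toy_risk, patient, reference)
print(dict(zip(["age", "AST", "albumin"], phi)))
```

The contributions always add up to the difference between the prediction for the patient and the prediction for the baseline, which is the additivity property that force plots display.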

Discussion

In this retrospective study utilizing three large-scale public ICU databases, we developed and validated seven machine-learning algorithms to predict the in-hospital mortality of patients with sepsis. The XGBoost model outperformed LR, RF, NB, KNN, DT, and SVM. Furthermore, the XGBoost model demonstrated superior performance compared to the traditional SOFA score. To combine model performance with clinical interpretability, we employed SHAP to explain the XGBoost model, which enables physicians to better comprehend the model's decision-making process and facilitates the use of its predictions. In critical care research, XGBoost has been extensively used to predict the in-hospital mortality of patients and may assist clinicians' decision-making^31,32. We calculated SHAP feature importance and feature effects to confirm the variables' contributions to the model.

The most impactful input parameters contributing to predicted mortality risk in sepsis patients were age, AST, invasive ventilation treatment, and heart rate. Blood urea nitrogen and serum albumin were also highly predictive of in-hospital mortality in ICU sepsis patients, consistent with previous research^33,34. Interestingly, some novel inflammatory biomarkers critically impacted in-hospital mortality of sepsis patients in the XGBoost model. Prognostic prediction models based on inflammatory biomarkers have been developed previously, such as a nomogram by Hui Chen et al. based on age, NLR, PLR, LMR, and RDW to predict 28-day mortality in sepsis³⁵. Li et al. identified the NLR_MHR ratio as an independent mortality risk factor with predictive efficacy for 28-day mortality in septic patients¹⁷. This was the first XGBoost model incorporating inflammatory biomarkers such as NLR, NHR, and MHR to predict prognosis in sepsis patients. Our model encompassed three ICU databases to improve credibility and generalizability compared to previous single-center models. An observational study in Australia and New Zealand also demonstrated sepsis mortality under 5% in patients without comorbidities or advanced age³⁶. We also found that comorbidities such as cerebrovascular disease contributed to higher sepsis mortality. Our initial exclusion of patients with HIV, rheumatic disease, cancer, or metastatic tumors minimized potential immunosuppression-related biases across the three databases.

However, several limitations exist. The retrospective nature of the study leads to inherent selection bias. Therefore, a well-designed prospective study would be necessary to validate the utility of the model. Additionally, owing to the limitations of the MIMIC-IV, eICU-CRD, and AmsterdamUMCdb databases, essential information such as temporal changes in inflammatory biomarkers was insufficiently recorded, precluding analysis³⁷. Furthermore, our imputation approach for missing and outlier data may have introduced deviation from actual values. Regardless, we hope our constructed model will aid clinicians in the timely treatment of ICU sepsis patients.

Conclusion

In this study, we present the efficacy of ensemble machine learning algorithms in predicting the risk of in-hospital mortality among sepsis patients. The XGBoost algorithm exhibits optimal discrimination and excellent calibration. The inclusion of inflammatory biomarkers such as NLR, NHR, and MHR in the model assists clinicians in making informed clinical decisions regarding the management of patients with sepsis.

Declarations

Data Statement

The raw data supporting the conclusions of this article are available from MIMIC-IV v1.0 (physionet.org), eICU Collaborative Research Database v2.0 (physionet.org), and Amsterdam Medical Data Science.

Ethics Statement

Ethical review and approval were not required for the study on human participants following the local legislation and institutional requirements. This study did not require informed consent for participation under national legislation and institutional requirements.

Author Contribution

GZ collected the data, analyzed the data, and drafted the manuscript. TW, WY, JW, FS, RS, and JG extracted the data and participated in the study design. XQ participated in the literature research. ZT was responsible for the whole project, reviewed the manuscript, designed the study, and supervised the study. All authors contributed to the article and approved the submitted version.

Funding

There is no funding support for the study.

Conflict of Interest

The authors declare that the research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims in the article are solely for the authors and do not necessarily represent their affiliated organizations or the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that its manufacturer may make, is not guaranteed or endorsed by the publisher.

Acknowledgments

We thank all participants in the Emergency Medicine Clinical Research Center, Beijing Chaoyang Hospital, Capital Medical University.

References

Fleischmann C, Scherag A, Adhikari NK, et al. Assessment of Global Incidence and Mortality of Hospital-treated Sepsis. Current Estimates and Limitations. Am J Respir Crit Care Med. 2016;193(3):259-272.
Denstaedt SJ, Singer BH, Standiford TJ. Sepsis and Nosocomial Infection: Patient Characteristics, Mechanisms, and Modulation. Front Immunol. 2018;9:2446.
Singer M, Deutschman CS, Seymour CW, et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):801-810.
Li Y, Wang W, Yang F, Xu Y, Feng C, Zhao Y. The regulatory roles of neutrophils in adaptive immunity. Cell Commun Signal. 2019;17(1):147.
Zhu CL, Wang Y, Liu Q, et al. Dysregulation of neutrophil death in sepsis. Front Immunol. 2022;13:963955.
Song M, Graubard BI, Rabkin CS, Engels EA. Neutrophil-to-lymphocyte ratio and mortality in the United States general population. Sci Rep. 2021;11(1):464.
Drăgoescu AN, Pădureanu V, Stănculescu AD, et al. Neutrophil to Lymphocyte Ratio (NLR)-A Useful Tool for the Prognosis of Sepsis in the ICU. Biomedicines. 2021;10(1).
Huang Z, Fu Z, Huang W, Huang K. Prognostic value of neutrophil-to-lymphocyte ratio in sepsis: A meta-analysis. Am J Emerg Med. 2020;38(3):641-647.
Lin SF, Lin HA, Pan YH, Hou SK. A novel scoring system combining Modified Early Warning Score with biomarkers of monocyte distribution width, white blood cell counts, and neutrophil-to-lymphocyte ratio to improve early sepsis prediction in older adults. Clin Chem Lab Med. 2023;61(1):162-172.
Liu S, Wang X, She F, Zhang W, Liu H, Zhao X. Effects of Neutrophil-to-Lymphocyte Ratio Combined With Interleukin-6 in Predicting 28-Day Mortality in Patients With Sepsis. Front Immunol. 2021;12:639735.
Liu Y, Zheng J, Zhang D, Jing L. Neutrophil-lymphocyte ratio and plasma lactate predict 28-day mortality in patients with sepsis. J Clin Lab Anal. 2019;33(7):e22942.
Gürol G, Çiftci İ H, Terizi HA, Atasoy AR, Ozbek A, Köroğlu M. Are there standardized cutoff values for neutrophil-lymphocyte ratios in bacteremia or sepsis? J Microbiol Biotechnol. 2015;25(4):521-525.
Morin EE, Guo L, Schwendeman A, Li XA. HDL in sepsis - risk factor and therapeutic approach. Front Pharmacol. 2015;6:244.
Tanaka S, Stern J, Bouzid D, et al. Relationship between lipoprotein concentrations and short-term and 1-year mortality in intensive care unit septic patients: results from the HIGHSEPS study. Ann Intensive Care. 2021;11(1):11.
Zheng CF, Liu WY, Zeng FF, et al. Prognostic value of platelet-to-lymphocyte ratios among critically ill patients with acute kidney injury. Crit Care. 2017;21(1):238.
Demirdal T, Sen P. The significance of neutrophil-lymphocyte ratio, platelet-lymphocyte ratio and lymphocyte-monocyte ratio in predicting peripheral arterial disease, peripheral neuropathy, osteomyelitis and amputation in diabetic foot infection. Diabetes Res Clin Pract. 2018;144:118-125.
Li JY, Yao RQ, Liu SQ, Zhang YF, Yao YM, Tian YP. Efficiency of Monocyte/High-Density Lipoprotein Cholesterol Ratio Combined With Neutrophil/Lymphocyte Ratio in Predicting 28-Day Mortality in Patients With Sepsis. Front Med (Lausanne). 2021;8:741015.
Hu C, Li L, Huang W, et al. Interpretable Machine Learning for Early Prediction of Prognosis in Sepsis: A Discovery and Validation Study. Infect Dis Ther. 2022;11(3):1117-1132.
Hu C, Li L, Li Y, Wang F, Hu B, Peng Z. Explainable Machine-Learning Model for Prediction of In-Hospital Mortality in Septic Patients Requiring Intensive Care Unit Readmission. Infect Dis Ther. 2022;11(4):1695-1713.
Johnson AEW, Bulgarelli L, Shen L, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10(1):1.
Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018;5:180178.
Thoral PJ, Peppink JM, Driessen RH, et al. Sharing ICU Patient Data Responsibly Under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) Example. Crit Care Med. 2021;49(6):e563-e577.
van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw. 2011;45(3):1-67.
Georgakopoulos SV, Tasoulis SK, Vrahatis AG, Moustakidis S, Tsaopoulos DE, Plagianakos VP. Deep Hybrid Learning for Anomaly Detection in Behavioral Monitoring. Paper presented at: 2022 International Joint Conference on Neural Networks (IJCNN); 2022.
Mayer M, Mayer MM. Package ‘outForest’. 2023.
Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol. 1996;58(1):267-288.
Pavlou M, Ambler G, Seaman SR, et al. How to develop a more accurate risk prediction model when there are few events. BMJ. 2015;351.
Friedman J, Hastie T, Tibshirani R, et al. Package ‘glmnet’. 2021.
Van Calster B, Wynants L, Verbeek JF, et al. Reporting and interpreting decision curve analysis: a guide for investigators. Eur Urol. 2018;74(6):796-804.
Harrell FE Jr. Regression modeling strategies with applications to linear models, logistic and ordinal regression, and survival analysis. Springer; 2015.
Liu T, Zhao Q, Du B. Effects of high-flow oxygen therapy on patients with hypoxemia after extubation and predictors of reintubation: a retrospective study based on the MIMIC-IV database. BMC Pulm Med. 2021;21(1):160.
Yao RQ, Jin X, Wang GW, et al. A Machine Learning-Based Prediction of Hospital Mortality in Patients With Postoperative Sepsis. Front Med (Lausanne). 2020;7:445.
Cai S, Wang Q, Chen C, Guo C, Zheng L, Yuan M. Association between blood urea nitrogen to serum albumin ratio and in-hospital mortality of patients with sepsis in intensive care: A retrospective analysis of the fourth-generation Medical Information Mart for Intensive Care database. Front Nutr. 2022;9:967332.
Ye Z, Gao M, Ge C, et al. Association between albumin infusion and septic patients with coronary heart disease: A retrospective study based on medical information mart for intensive care III database. Front Cardiovasc Med. 2022;9:982969.
Zhao C, Wei Y, Chen D, Jin J, Chen H. Prognostic value of an inflammatory biomarker-based clinical algorithm in septic patients in the emergency department: An observational study. Int Immunopharmacol. 2020;80:106145.
Kaukonen KM, Bailey M, Suzuki S, Pilcher D, Bellomo R. Mortality related to severe sepsis and septic shock among critically ill patients in Australia and New Zealand, 2000-2012. JAMA. 2014;311(13):1308-1316.
Yue S, Li S, Huang X, et al. Machine learning for the prediction of acute kidney injury in patients with sepsis. J Transl Med. 2022;20(1):215.

No competing interests reported.

Editorial decision: Major revision
28 Sep, 2023
Reviews received at journal
06 Sep, 2023
Reviewers agreed at journal
02 Sep, 2023
Reviewers agreed at journal
01 Sep, 2023
Reviewers invited by journal
01 Sep, 2023
Editor assigned by journal
31 Aug, 2023
Submission checks completed at journal
30 Aug, 2023
First submitted to journal
30 Aug, 2023
