A machine learning predictive model of in-hospital mortality in patients with sepsis complicated by anemia: a retrospective study based on the MIMIC-III database

Background: Patients with sepsis complicated by anemia have a higher risk of mortality, and it is clinically important to study the risk factors associated with the prognosis of this condition. The aim of this study was to establish a predictive model of in-hospital mortality using clinical data extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) database. Methods: The clinical data of patients with sepsis complicated by anemia in the MIMIC-III database were retrospectively analyzed. Indexes were screened by stepwise logistic regression (LR), and machine learning predictive models, namely Decision Tree (DT), Random Forests (RF), and eXtreme Gradient Boosting (XGBoost), were developed and compared to identify the advantages and disadvantages of each. Results: A total of 13,547 patients with sepsis complicated by anemia were included, of whom 1,827 died during hospitalization and 11,720 were alive at discharge. The stepwise regression model selected 20 clinical indexes, including the Elixhauser comorbidity index, maximum blood urea nitrogen (BUN), and maximum hemoglobin reduction. The predictive models showed good discriminative ability (area under the receiver operating characteristic curve [AUROC]: LR, 0.777; DT, 0.726; RF, 0.788; XGBoost, 0.815) and goodness of fit (area under the precision-recall curve [AUPRC]: LR, 0.350; DT, 0.290; RF, 0.400; XGBoost, 0.428). Shapley Additive exPlanation (SHAP) values of the XGBoost model showed that the Elixhauser comorbidity index, maximum BUN, maximum hemoglobin reduction, ventilator use within 24 hours of admission, and age were important features for predicting in-hospital mortality in these patients. Conclusions: The XGBoost model had better discrimination ability and goodness of fit than the other models. 
Machine learning algorithms have significant practical value in the development of an early warning system for patients with sepsis complicated by anemia.

drafted the manuscript and take overall responsibility for its content. DW, XY, QM and YSW carried out literature search and data acquisition. YZ, YYQ, RQY and ZQW conducted the search and the statistical analysis. SHZ, SLM and FZ assessed the study eligibility and quality and interpreted the data. All authors contributed to the manuscript and approved the final version to be considered for publication.

Several studies have shown that anemia during the acute phase of sepsis may also have other causes, including degradation of the glycocalyx of the vascular endothelium [7,8], which leads to the entry of large amounts of tissue fluid into the blood vessels, and dilution of hemoglobin after massive fluid infusion [9].
Moreover, patients with sepsis complicated by anemia have a higher risk of mortality, since severe anemia aggravates tissue hypoxia, leading to further deterioration of organ function and circulatory disorders [10,11]. Meanwhile, severe anemia often requires blood transfusion, which also increases the risk of infection [10]. Hence, it is clinically important to study the risk factors associated with the prognosis of sepsis complicated by anemia. The aim of this study was to establish a predictive model of in-hospital mortality for patients with sepsis complicated by anemia using clinical data extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) database. Because traditional statistical methods are limited in handling covariates and missing values in retrospective studies, we adopted three machine learning models, namely Decision Tree (DT), Random Forests (RF), and eXtreme Gradient Boosting (XGBoost), and compared their respective advantages and disadvantages. Such tools can be used to explore the risk factors of patients with sepsis complicated by anemia and to predict their mortality risk.

Database
MIMIC is a large online clinical data set of critically ill patients created by the Massachusetts Institute of Technology (MIT) in 2003 [12,13]. MIMIC-III was released at the end of 2015 and includes 28,000 more records than MIMIC-II; the data were cleansed and proofread to simplify their structure and increase their reliability.

Materials and methods
This study was based on the MIMIC-III database, whose use was approved by the Institutional Review Boards of the Beth Israel Deaconess Medical Center and MIT. Because all protected health information had been de-identified, individual patient consent was not required. As this was a retrospective, observational study, the clinical data of patients with sepsis were extracted with database management software and query language tools, and all relevant data were exported, processed, and analyzed with data analysis software. Such analysis has no impact on patient treatment and poses no risk to patients [13]. A Collaborative Institutional Training Initiative (CITI) license was obtained (number 8761695), together with permission to use the MIMIC-III database in accordance with the relevant regulations.

Inclusion and exclusion criteria
Inclusion criteria: sepsis was diagnosed according to the operational definition of Sepsis-3 [14], that is, in patients with suspected infection and a sequential organ failure assessment (SOFA) score and quick SOFA (qSOFA) score ≥2. The time at which a patient diagnosed with sepsis entered the intensive care unit (ICU) was defined as the admission time. Anemia [15] was defined as hemoglobin <13.6 g/dL in males and <11.9 g/dL in females. Exclusion criteria: 1. incomplete hemoglobin data; 2. not admitted to any ICU; 3. age <16 years; 4. repeated ICU admission (only the first admission was considered); 5. pregnancy or the perinatal period; 6. total length of ICU stay <24 hours.
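The inclusion and exclusion rules can be expressed as a sequence of table filters. The sketch below is illustrative only. It assumes a hypothetical one-row-per-ICU-stay table with invented column names (`subject_id`, `intime`, `gender`, `age`, `hb_admit`, `los_icu_hours`, `pregnant`, `sepsis3`), and it assumes that anemia is judged on the admission hemoglobin, which the text does not specify. It does not reproduce the authors' MIMIC-III queries.

```python
import numpy as np
import pandas as pd

# Sex-specific anemia thresholds (g/dL) from the inclusion criteria.
ANEMIA_HB_THRESHOLD = {"M": 13.6, "F": 11.9}

def select_cohort(icu_stays: pd.DataFrame) -> pd.DataFrame:
    """Apply the stated criteria to a hypothetical one-row-per-ICU-stay table.

    Rows exist only for ICU stays, so "not admitted to any ICU" is implicit.
    """
    df = icu_stays.sort_values("intime")
    df = df.groupby("subject_id").head(1)    # first ICU admission only
    df = df[df["sepsis3"]]                   # Sepsis-3 diagnosis
    df = df.dropna(subset=["hb_admit"])      # incomplete hemoglobin data
    df = df[df["age"] >= 16]                 # exclude age < 16 years
    df = df[~df["pregnant"]]                 # exclude pregnancy / perinatal period
    df = df[df["los_icu_hours"] >= 24]       # exclude ICU stay < 24 h
    # Assumption: anemia is judged on the admission hemoglobin.
    threshold = df["gender"].map(ANEMIA_HB_THRESHOLD)
    return df[df["hb_admit"] < threshold]

# Toy data with invented values, one row per ICU stay.
stays = pd.DataFrame({
    "subject_id":    [1, 1, 2, 3, 4, 5, 6, 7],
    "intime":        [1, 2, 1, 1, 1, 1, 1, 1],
    "gender":        ["M", "M", "F", "F", "M", "F", "F", "F"],
    "age":           [60, 60, 40, 15, 70, 50, 30, 55],
    "hb_admit":      [10.0, 9.0, 12.5, 9.0, np.nan, 9.0, 9.5, 11.0],
    "los_icu_hours": [48, 48, 30, 48, 48, 12, 72, 36],
    "pregnant":      [False, False, False, False, False, False, True, False],
    "sepsis3":       [True] * 8,
})
cohort = select_cohort(stays)
print(sorted(cohort["subject_id"].tolist()))  # [1, 7]
```

Selecting the first stay before applying the other filters means that a patient whose first stay fails a criterion is excluded outright, rather than being re-entered through a later stay. This matches the wording "only the first admission was considered".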

Variable screening
Most extracted variables were indexes that might reflect tissue oxygen metabolism and common indexes related to sepsis-associated organ damage. Only indexes with a missing data rate ≤30% were included in the screening. Clinical and laboratory variables were collected within 24 hours of ICU admission. General information included age, gender, and vital signs, namely minimum systolic blood pressure (SBP), minimum diastolic blood pressure (DBP), maximum respiratory rate (RR), and maximum heart rate (HR). The laboratory tests included minimum hemoglobin, maximum hemoglobin reduction, maximum hemoglobin reduction rate, minimum albumin, minimum red blood cell (RBC) count, minimum hematocrit (HCT), minimum mean corpuscular hemoglobin concentration (MCHC), minimum oxygen partial pressure (PO2), maximum carbon dioxide partial pressure (PCO2), minimum arterial oxygen saturation (SaO2), maximum lactate, minimum oxygen saturation by pulse oximetry (SpO2), maximum B-type natriuretic peptide (BNP), maximum serum creatinine, maximum blood urea nitrogen (BUN), maximum D-dimer, maximum international normalized ratio (INR), maximum prothrombin time (PT), maximum partial thromboplastin time (PTT), minimum platelet count, maximum blood glucose, maximum troponin, and maximum uric acid. Renal failure, dialysis, ventilator use, sedation, the Elixhauser comorbidity index, and the use of vasoactive drugs (dobutamine, dopamine, epinephrine, norepinephrine, and phenylephrine) were also included. In-hospital mortality was the outcome event.
Two points should be noted: (1) following the relevant literature [12,13], indexes "within 24 hours" include data from 6 hours before to 24 hours after ICU admission; (2) because blood transfusion directly affects the hemoglobin level, hemoglobin values measured after erythrocyte transfusion were discarded.
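The two data-preparation rules above, the ≤30% missing-rate screen and the 6 h before to 24 h after time window with post-transfusion hemoglobin removed, can be sketched with pandas as follows. This is a minimal illustration under stated assumptions. The column names (`charttime`, `hb`, `bun_max`, and so on), the helper names, and all values are hypothetical, and the window is treated as inclusive at both ends, which the text does not specify.

```python
import numpy as np
import pandas as pd

def screen_by_missingness(df, max_missing=0.30):
    """Keep columns whose fraction of missing values is <= max_missing."""
    missing_rate = df.isna().mean()
    keep = missing_rate[missing_rate <= max_missing].index
    return df[keep], missing_rate

def hb_in_window(labs, icu_intime, first_rbc_time=None,
                 hours_before=6, hours_after=24):
    """Hemoglobin values from 6 h before to 24 h after ICU admission
    (inclusive), dropping values charted at or after the first RBC transfusion."""
    start = icu_intime - pd.Timedelta(hours=hours_before)
    end = icu_intime + pd.Timedelta(hours=hours_after)
    keep = labs["charttime"].between(start, end)
    if first_rbc_time is not None:
        keep &= labs["charttime"] < first_rbc_time   # pre-transfusion values only
    return labs.loc[keep, "hb"]

# Invented per-patient summary values for 10 patients.
features = pd.DataFrame({
    "bun_max": [30, 45, 22, 60, 18, 25, 40, 33, 28, 50],                      # 0% missing
    "lactate_max": [2.1, np.nan, 3.4, 1.8, 2.6, 2.2, np.nan, 1.5, 2.9, 4.0],  # 20% missing
    "bnp_max": [np.nan] * 6 + [500.0, 800.0, 1200.0, 300.0],                  # 60% missing
})
kept, rates = screen_by_missingness(features)
print(list(kept.columns))  # ['bun_max', 'lactate_max']

# Invented hemoglobin measurements (g/dL) for one patient.
intime = pd.Timestamp("2020-01-01 12:00")
labs = pd.DataFrame({
    "charttime": pd.to_datetime(["2020-01-01 04:00", "2020-01-01 08:00",
                                 "2020-01-01 14:00", "2020-01-01 20:00",
                                 "2020-01-02 06:00", "2020-01-02 14:00"]),
    "hb": [11.0, 10.5, 9.8, 8.9, 9.5, 10.0],
})
hb = hb_in_window(labs, intime, first_rbc_time=pd.Timestamp("2020-01-01 18:00"))
print(hb.tolist())  # [10.5, 9.8]
```

In this toy record, the 04:00 value falls more than 6 h before admission and the last value more than 24 h after. The 20:00 and 06:00 values fall inside the window but follow the transfusion, so only two measurements remain.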

Statistical methods
Statistical Product and Service Solutions 25 (SPSS 25, International Business Machines Corporation, New York, USA) and R 3.5.2 (R Project for Statistical Computing, Vienna, Austria) were used for data analysis. Normally distributed data with homogeneous variance are presented as mean ± standard deviation (x̄ ± s). Non-normally distributed data are presented as median (M) and quartiles (P25, P75). Count data are presented as numbers (percentages). The Student t test was used to compare normally distributed data, and the t' test was used when variances were unequal. The Mann-Whitney U test was used for non-normally distributed data. The chi-square test was used to compare count data between groups. P-values <0.05 were considered statistically significant.
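The test-selection rules above can be written as a small decision function. The study used SPSS and R, so this SciPy version is only a sketch. The text does not say how normality or variance homogeneity were checked, so the use of the Shapiro-Wilk and Levene tests is an assumption. Welch's unequal-variance t test also stands in for the t' test. All data below are synthetic.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05

def compare_continuous(a, b):
    """Choose a two-group test following the rules above.

    Assumptions: Shapiro-Wilk for normality, Levene for equal variances,
    and Welch's t test as a stand-in for the t' test.
    """
    if stats.shapiro(a).pvalue <= ALPHA or stats.shapiro(b).pvalue <= ALPHA:
        return "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
    equal_var = stats.levene(a, b).pvalue > ALPHA
    name = "Student t" if equal_var else "Welch t'"
    return name, stats.ttest_ind(a, b, equal_var=equal_var).pvalue

def compare_counts(table):
    """Chi-square test on a groups x categories contingency table; returns p."""
    return stats.chi2_contingency(np.asarray(table))[1]

# Synthetic example: roughly normal ages, right-skewed BUN values.
rng = np.random.default_rng(0)
age_survived, age_died = rng.normal(65, 12, 200), rng.normal(72, 12, 200)
bun_survived, bun_died = rng.lognormal(3.0, 0.8, 200), rng.lognormal(3.4, 0.8, 200)
print(compare_continuous(age_survived, age_died))
print(compare_continuous(bun_survived, bun_died))
print(compare_counts([[120, 80], [100, 100]]))  # invented sex-by-outcome counts
```

Skewed laboratory values such as BUN are routed to the Mann-Whitney U test. This is why such indexes are reported as median and quartiles rather than mean ± standard deviation.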
Further, we used the Python StatsModels module for feature filtering. A stepwise logistic regression model with a bidirectional (combined forward and backward) procedure based on the Akaike Information Criterion (AIC) was used to predict in-hospital mortality and to screen variables that might affect outcome events. The AIC, which is grounded in the concept of information entropy, is a standard measure that weighs how well a statistical model fits the data against the model's complexity. It is defined as

$$\mathrm{AIC} = -2\ln(L) + 2K \qquad (1)$$

where L is the maximized likelihood of the model and K is the number of parameters. Adding free parameters improves the fit (increases L), but each added parameter is penalized by the 2K term, so minimizing the AIC helps avoid overfitting. Therefore, we retained the variable set with the minimum AIC as the most predictive variables.

Construction of machine learning models
The models were built with the Scikit-Learn machine learning library in Python, using the features selected by stepwise logistic regression. Logistic regression (LR), DT, and RF models served as the baseline models in this study. The XGBoost model [16] was then used to explore the risk factors for patients with sepsis complicated by anemia and to predict the mortality risk.
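Given a data set of n patients with m features, $D = \{(x_i, y_i)\}$ ($i = 1, 2, \ldots, n$; $x_i \in \mathbb{R}^m$), the prediction $\hat{y}_i$ for each patient is obtained by summing the outputs of K trees:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F}$$

where $f_k$ is the prediction score of a single decision tree and $\mathcal{F}$ is the space of trees; for a binary outcome, $\hat{y}_i$ is a log-odds score that the logistic function maps to a predicted probability. To obtain the optimal model, the following regularized objective is minimized:

$$\mathcal{L} = \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2$$

where $l$ is the loss function measuring the difference between the predicted value $\hat{y}_i$ and the true value $y_i$, $\Omega$ is the penalty term, $T$ is the number of leaves, $w$ is the vector of leaf scores with $w_j$ the score of leaf node $j$, and $\gamma$ and $\lambda$ are the coefficient parameters.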
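To make the additive prediction and the regularized objective concrete, the toy calculation below evaluates them for a hand-built ensemble of two depth-1 trees (stumps). Everything here is hypothetical and illustrative: the splits, thresholds, leaf weights, γ = λ = 1, and the choice of the logistic loss for l. Real XGBoost also adds a global base score and learns the trees, which this sketch does not do.

```python
import math

# Hypothetical stumps: (feature index, threshold, left leaf weight, right leaf weight).
STUMPS = [(0, 65.0, -0.4, 0.6),   # e.g. a split on age (invented)
          (1, 30.0, -0.2, 0.5)]   # e.g. a split on maximum BUN (invented)

def stump_output(stump, x):
    feat, thr, w_left, w_right = stump
    return w_left if x[feat] < thr else w_right

def raw_score(x):
    """y_hat_i = sum_k f_k(x_i): additive log-odds score over all trees."""
    return sum(stump_output(s, x) for s in STUMPS)

def logistic_loss(y, score):
    """l(y_hat, y) for a binary outcome, taking the log-odds score as input."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def omega(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * ||w||^2, with T = number of leaves."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

def objective(X, y, gamma=1.0, lam=1.0):
    loss = sum(logistic_loss(yi, raw_score(xi)) for xi, yi in zip(X, y))
    penalty = sum(omega([s[2], s[3]], gamma, lam) for s in STUMPS)
    return loss, penalty, loss + penalty

X = [[70, 40], [50, 20], [80, 10]]   # invented [age, BUN] values
y = [1, 0, 0]
loss, penalty, total = objective(X, y)
print(round(loss, 3), round(penalty, 3), round(total, 3))
```

The penalty grows with both the number of leaves and the size of the leaf weights. Boosting therefore has to trade a lower training loss against simpler, smaller-valued trees.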
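Moreover, to make the XGBoost black-box model interpretable, the Shapley Additive exPlanation (SHAP) value algorithm [17] was used. The contribution of each feature is calculated as

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}\left[f_{S \cup \{i\}}\left(x_{S \cup \{i\}}\right) - f_S\left(x_S\right)\right]$$

where $F$ is the set of all features of the training data set, of dimension $|F|$, $S$ is a subset of $F$ of dimension $|S|$, and $f_S$ is the model prediction when only the features in $S$ are present. The model prediction is then explained additively as

$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i$$

where $\phi_i$ represents the contribution of feature $i$, $g(z')$ is the prediction value of the decision tree model, $z' \in \{0,1\}^M$ indicates which of the $M$ features are included in the decision path of the sample, and $\phi_0$ is a constant.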
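The Shapley formula can be checked by brute force on a small model. The sketch below enumerates every subset S, applies the factorial weights, and confirms additivity: the base value plus the sum of the contributions equals the model output. This is an illustration, not the method used in the study. It assumes that an "absent" feature is replaced by a single baseline value, and the toy model is invented. The shap package's TreeExplainer computes Shapley values efficiently for tree ensembles and defines feature absence differently, through background data or tree paths.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all subsets S of the other features.

    Assumption: a feature absent from S is set to its baseline value.
    """
    M = len(x)

    def v(S):
        z = [x[j] if j in S else baseline[j] for j in range(M)]
        return f(z)

    phi = []
    for i in range(M):
        others = [j for j in range(M) if j != i]
        total = 0.0
        for size in range(M):                     # |S| = 0 .. M-1
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for S in combinations(others, size):
                S = set(S)
                total += weight * (v(S | {i}) - v(S))
        phi.append(total)
    phi0 = v(set())                               # base value: all features absent
    return phi0, phi

def toy_model(z):
    # Invented model: two main effects plus an interaction of features 0 and 2.
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]

x, baseline = [1, 1, 1], [0, 0, 0]
phi0, phi = shapley_values(toy_model, x, baseline)
print(phi0, [round(p, 3) for p in phi])
```

The interaction term is split equally between features 0 and 2. The main effect of feature 1 is credited entirely to feature 1. This even sharing of joint effects is what distinguishes SHAP values from a plain feature-importance ranking.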
The patient data were randomly divided in a 4:1 ratio into a training set (four parts) and a testing set (one part). Grid search was used to find the best hyperparameters on the training set, with 5-fold cross-validation to avoid overfitting. The XGBoost hyperparameter settings are shown in Table S1, and basic information about the training and validation groups is shown in Table S2.
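The split-and-tune procedure can be sketched with scikit-learn. To keep the example dependency-free, it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost. The `xgboost.XGBClassifier` estimator follows the same scikit-learn interface and accepts the same three parameter names used in the grid. The grid values, the stratified split, and the synthetic data (about 13% positives, similar to 1,827 deaths among 13,547 patients) are assumptions. The study's actual settings are in Table S1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic, imbalanced stand-in for the cohort (~13% deaths).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           weights=[0.87], random_state=0)

# 4:1 split; stratifying on the outcome is an assumption, not stated in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hypothetical grid for illustration only.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),   # stand-in for XGBClassifier
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)          # 5-fold CV uses the training set only
best_model = search.best_estimator_   # refit on the whole training set
print(search.best_params_, round(search.best_score_, 3))
```

The testing set is never seen during the grid search. The scores reported on it are therefore an estimate of performance on new patients rather than of fit to the tuning data.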
After the best XGBoost model was obtained, the SHAP value algorithm was applied to improve its interpretability and to rank all features related to the outcome by importance, distinguishing protective factors from risk factors. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used to evaluate the performance of each model.
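A minimal sketch of the model comparison follows. Each model is fitted on the training part, and AUROC and AUPRC are computed from the predicted probabilities on the held-out part. The data are synthetic, the hyperparameters are arbitrary, and gradient boosting again stands in for XGBoost. The text does not say how AUPRC was computed. This sketch uses `average_precision_score`, and a trapezoidal area under the precision-recall curve is a common alternative. For reading AUPRC, note that a random classifier scores about the outcome prevalence, roughly 0.135 in this cohort.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (~13% positives).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           weights=[0.87], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(max_depth=5, random_state=1),
    "RF": RandomForestClassifier(n_estimators=200, random_state=1),
    "GBT": GradientBoostingClassifier(random_state=1),  # stand-in for XGBoost
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    p = model.predict_proba(X_test)[:, 1]               # predicted probability of death
    results[name] = (roc_auc_score(y_test, p), average_precision_score(y_test, p))

for name, (auroc, auprc) in results.items():
    print(f"{name}: AUROC={auroc:.3f}  AUPRC={auprc:.3f}")
```

AUPRC is more sensitive than AUROC to how well a model ranks the relatively few deaths. This is why the two metrics are reported side by side for an imbalanced outcome.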

Basic information about the patients
The patient selection process is shown in Fig. 1.

Effect of hemoglobin reduction on in-hospital mortality of patients with sepsis
LR was used to examine the impact of hemoglobin-related indexes on mortality in patients with sepsis, including hemoglobin at admission (HB_0), maximum hemoglobin reduction within 24 hours (Delta_HB_24h_down_max), and maximum hemoglobin reduction rate within 24 hours (Delta_HB_24h_down_max_R). Hemoglobin at admission and maximum hemoglobin reduction differed significantly between the survival and death groups (p<0.001, Table 1), indicating that a decrease in hemoglobin was an independent risk factor for in-hospital mortality in patients with sepsis.

Receiver operating characteristic (ROC) curve
Further analyses were limited to patients with sepsis complicated by anemia. ROC curves were used to explore the optimal threshold of hemoglobin-related indexes for predicting in-hospital mortality in these patients (Fig. 2). The AUROCs of hemoglobin at admission, minimum hemoglobin, maximum hemoglobin reduction, and maximum hemoglobin reduction rate were only 0.580, 0.521, 0.611, and 0.606, respectively. These results indicate that hemoglobin-related indexes alone cannot accurately predict the in-hospital mortality of patients with sepsis complicated by anemia.
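One common way to choose an "optimal threshold" from an ROC curve is to maximize Youden's J (sensitivity + specificity − 1). The text does not name its criterion, so Youden's J is an assumption here, and the hemoglobin values below are synthetic. Because lower hemoglobin is expected to mean higher risk, the sketch passes −hemoglobin as the score. Otherwise the AUROC would be mirrored around 0.5.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def youden_cutoff(y_true, score):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, score)
    best = int(np.argmax(tpr - fpr))
    return thresholds[best], tpr[best], 1.0 - fpr[best]

rng = np.random.default_rng(0)
died = (rng.random(1000) < 0.135).astype(int)     # ~13.5% deaths, synthetic
hb = np.where(died == 1, rng.normal(9.0, 1.5, 1000), rng.normal(9.6, 1.5, 1000))

# Lower hemoglobin is assumed to mean higher risk, so the score is -hb.
auroc = roc_auc_score(died, -hb)
cut, sens, spec = youden_cutoff(died, -hb)
print(f"AUROC={auroc:.3f}; predict death if hb <= {-cut:.2f} g/dL "
      f"(sensitivity {sens:.2f}, specificity {spec:.2f})")
```

`roc_curve` treats scores at or above the threshold as positive, so the rule "−hb ≥ cut" becomes "hb ≤ −cut". A single index with an AUROC near 0.6 yields a cutoff with modest sensitivity and specificity, which is consistent with the conclusion that these indexes alone are insufficient.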

Baseline data of patients with sepsis complicated by anemia
A total of 1,827 patients died during hospitalization. The death group was significantly older than the survival group, and the proportion of male patients was significantly lower in the death group than in the survival group. Regarding comorbidities, vasoactive drugs, and special procedures, renal failure, continuous renal replacement therapy (CRRT) use, ventilator use, sedation, dobutamine use, dopamine use, epinephrine use, norepinephrine use, phenylephrine use, and the Elixhauser comorbidity index differed significantly between the two groups. Regarding vital signs, minimum SBP, minimum DBP, maximum HR, maximum RR, and minimum SpO2 differed significantly between the two groups.
Among arterial blood gas indexes, minimum PO2, maximum PCO2, minimum SaO2, and maximum lactate differed significantly between the two groups.
Regarding laboratory tests, hemoglobin at admission, minimum hemoglobin, maximum hemoglobin reduction, maximum hemoglobin reduction rate, minimum RBC count, minimum HCT, minimum MCHC, minimum albumin, maximum BNP, maximum creatinine, maximum BUN, maximum blood glucose, maximum troponin, maximum PT, maximum INR, maximum PTT, maximum D-dimer, and minimum platelet count differed significantly between the two groups (Table 2).

Variable screening by stepwise logistic regression
Stepwise LR analysis was used to further screen the variables, and 20 variables were retained in the final model according to the AIC criterion (Table 3).

The XGBoost hyperparameter settings are shown in Table S1 in the Supplementary Material. SHAP is an additive explanation model. Compared with the traditional feature importance plot of the XGBoost model, its main advantage is that the influence of each feature is described by a SHAP value with a clear direction, distinguishing positive from negative effects. As shown in Fig. 3, the top 5 factors associated with in-hospital mortality in patients with sepsis complicated by anemia were the Elixhauser comorbidity index, maximum BUN, maximum hemoglobin reduction, ventilator use, and age. Taking age as an example, red represents older age and a higher SHAP value indicates a higher risk of mortality, so the risk of mortality increases with age.
The population was divided into high-risk and low-risk groups at the median, and survival curves were plotted. Survival probability decreased gradually over time, with a significant difference between the two groups (p<0.001, Fig. 4).
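The text does not name the survival estimator or the variable that was split at the median. The sketch below therefore assumes a Kaplan-Meier (product-limit) estimator and a split at the median of a model-predicted risk. It also measures time from ICU admission and treats discharge alive as censoring. All values are invented.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    events: 1 = in-hospital death observed, 0 = censored (e.g. discharged alive).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    survival, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                   # still under observation at t
        deaths = np.sum((times == t) & (events == 1))
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

def split_at_median(risk):
    """True = high-risk group (predicted risk above the median)."""
    risk = np.asarray(risk, dtype=float)
    return risk > np.median(risk)

# Hand-checkable example: times in days, one censoring tied with a death at day 2.
t, s = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])

# Median split on an invented predicted risk.
risk = np.array([0.9, 0.1, 0.7, 0.2, 0.8, 0.3])
los = np.array([3, 10, 5, 12, 2, 8])
died = np.array([1, 0, 1, 0, 1, 1])
high = split_at_median(risk)
t_high, s_high = kaplan_meier(los[high], died[high])
t_low, s_low = kaplan_meier(los[~high], died[~high])
print(t_high, s_high)
print(t_low, s_low)
```

A patient censored at the same time as a death is still counted as at risk at that time. This is the usual product-limit convention.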

Evaluation of the predictive capabilities of different models
Compared with the stepwise LR, DT, and RF models, the XGBoost model had the highest AUROC (0.815 vs 0.777, 0.726, and 0.788, respectively), indicating better discriminative ability (Fig. 5A). As shown in Fig. 5B, the XGBoost model also outperformed stepwise LR, DT, and RF in terms of AUPRC (0.428 vs 0.350, 0.290, and 0.400, respectively).

Discussion
The etiology and pathogenesis of sepsis complicated by anemia are varied. When infection and immune system dysfunction occur, the erythrocyte membrane develops abnormalities due to damage inflicted by bacteria and immune mechanisms, ultimately resulting in erythrocyte apoptosis [18]. Cytokines induce the activation of mononuclear macrophages and enhanced phagocytosis of erythrocytes [19]. In addition, sepsis is usually accompanied by the formation of a large number of microthrombi, and erythrocytes passing through these thrombi can be mechanically damaged, leading to anemia. Patients with sepsis and anemia have more severe tissue hypoxia, are prone to organ dysfunction and circulatory disorders, and have a higher risk of mortality. Therefore, it is of great clinical significance to study the prognostic risk of sepsis complicated by anemia.
Previous studies have shown that the mortality rate of patients with sepsis increases greatly when hemoglobin at admission is lower than 8.0 g/dL [20]. Our study also confirmed significant differences in hemoglobin at admission and in the maximum hemoglobin reduction within 24 hours between patients with sepsis who survived and those who died in hospital. We therefore limited the analysis to patients diagnosed with sepsis complicated by anemia and drew ROC curves with hemoglobin-related indexes to build an early warning model of mortality. However, although a univariable warning model is more convenient and intuitive than a multivariable one, it had poor early warning capability (AUROC ≤0.611). This may be because a single variable cannot fully capture the patient's overall condition.
In addition, although many studies have identified risk factors for in-hospital mortality in patients with sepsis complicated by anemia, usable models that accurately predict the clinical outcome of these patients are still lacking [15,21,22]. Machine learning algorithms can help clinicians build better prediction models than traditional linear models [23,24]. We therefore used a stepwise LR model to screen all clinical data of patients with sepsis complicated by anemia for factors that may strongly influence in-hospital mortality, and constructed DT, RF, and XGBoost machine learning predictive models, among which the XGBoost model had the best predictive performance. Based on these results, we believe that it is necessary to use advanced machine learning methods to build predictive models for clinical diseases with complicated pathophysiologic mechanisms and unclear etiopathogenesis.
In our study, we selected 30 covariates related to sepsis with anemia, screened them further by stepwise LR, and retained 20 clinical indexes. These indexes were used to construct the DT, RF, and XGBoost models to explore the risk factors and predict the risk of mortality. XGBoost is a gradient boosting tree model composed of multiple classification and regression trees (CART) that are generated sequentially, with each new tree correcting the errors of the previous ones. At each split, a CART divides patients into two branches according to a threshold value of one patient characteristic. After repeated splitting, patients who reach the same terminal node (leaf node) of a CART receive the same score from that tree. The output of XGBoost is calculated from the leaf node scores of all CARTs. Compared with traditional machine learning models (LR, DT, or RF), XGBoost is a tree-based ensemble algorithm that can not only handle data sparsity but also learn nonlinear relationships between features, thereby improving its generalization ability and robustness [25,26]. In our data, the XGBoost model showed better discrimination and goodness of fit than the other models.
To make the black-box model interpretable, the SHAP algorithm was adopted in this study. SHAP values can be used to interpret not only each individual patient but also the model's behavior across all patients. Because the marginal contribution of each feature is calculated for each sample, SHAP values explain each individual prediction, achieving local interpretation. Using SHAP values in the XGBoost model, we analyzed the influence of the values of each clinical index and found that the Elixhauser comorbidity index, maximum BUN, maximum hemoglobin reduction, ventilator use, and age, all recorded within the first 24 hours of ICU admission, were the main predictors of in-hospital mortality in patients with sepsis complicated by anemia.
The Elixhauser comorbidity index is a commonly used comprehensive scoring system for evaluating the prognosis of inpatients with underlying diseases [27]. It is often used to reflect disease severity and is an important confounding factor that needs to be adjusted for [28,29]. Indeed, both acute organ damage caused by infection and underlying diseases such as diabetes, tumors, and renal failure are closely related to mortality in sepsis [30,31]. The mortality of patients with sepsis is also positively correlated with age [32]. Therefore, it is clinically feasible to use the comorbidity index to determine the risk of mortality of patients with sepsis, consistent with other studies [33][34][35].
A high BUN is a risk factor for mortality in patients with sepsis [36,37]. Indeed, acute kidney injury is not uncommon in patients with sepsis or septic shock. BUN is an important index of renal function [38] that can also reflect the nutritional intake of critically ill patients over a period of time [39]. Because BUN is easily affected by diet, renal blood flow, hypercatabolism, intake of protein or amino acids, intestinal bleeding, hyperthyroidism, and other factors, it has traditionally been considered a poorer indicator of kidney function than creatinine [40]. However, a recent study [41] showed that elevated BUN, rather than serum creatinine, was closely related to increased mortality in critically ill patients whose creatinine was between 0.8 and 1.3 mg/dL. Therefore, whether BUN is more sensitive to kidney injury than creatinine in certain groups of critically ill patients remains to be elucidated.
Our model, based on a large database, showed that the maximum hemoglobin reduction within 24 hours in patients with sepsis complicated by anemia was inversely associated with the risk of mortality; that is, the greater the hemoglobin reduction, the lower the risk of mortality, which was contrary to our expectations. Because only hemoglobin values measured before transfusion were used, the values analyzed did not reflect any improvement achieved through erythrocyte transfusion. In our opinion, the large decrease in hemoglobin may be related to iatrogenic hemodilution caused by extensive early fluid resuscitation in patients with sepsis [42,43], and patients who receive timely and sufficient early resuscitation have a better prognosis [44,45]. In the SHAP value graph, the absolute hemoglobin reduction, rather than the reduction rate, made a significant contribution, which supports this view. Whether patients with sepsis complicated by anemia should receive blood transfusion remains controversial [46][47][48]. Our results suggest that the maximum hemoglobin reduction within 24 hours was not positively associated with the risk of mortality and that a certain degree of hemoglobin decrease does not worsen patient outcomes.
Finally, in the additional files, we built XGBoost models using only the top five and top ten contributing indexes and observed no significant decrease in AUROC or AUPRC, which supports the predictive value of the indexes with high SHAP values (Fig. S1 and Fig. S2). In addition, we have established a web page (https://wengzq-lab.cn/sepsismp/) implementing the machine learning prediction model for researchers to visit and evaluate.
The present study had several important limitations. Mining the MIMIC database, which contains a large amount of clinical information, can reveal hidden characteristics of diseases that conventional methods cannot detect, which is useful for prognosis and for evaluating drug use or surgical risk. However, its main disadvantage is that the data come from a single center in the US, and most of the population is of white or black ethnicity. Because of racial differences, the results may not be applicable to all ethnic groups, in particular to Asian people, who account for a low percentage of the database subjects. Therefore, our machine learning prediction model needs further evaluation in other large databases or, preferably, in prospective cohort studies.

Conclusions
Our retrospective study showed that, for predicting in-hospital mortality in patients with sepsis complicated by anemia, the XGBoost model had better discrimination ability and goodness of fit than the other models. It is necessary to use advanced machine learning methods to build predictive models for clinical diseases with complicated pathophysiologic mechanisms and unclear etiopathogenesis. However, because of racial differences, our results may not be applicable to all ethnic groups.

Key Messages
• Patients with sepsis complicated by anemia have a higher risk of mortality, and it is clinically important to study the risk factors associated with the prognosis of this condition.
• This study established a predictive model of in-hospital mortality using clinical data extracted from the MIMIC-III database. The five factors contributing most were the Elixhauser comorbidity index, maximum BUN, maximum hemoglobin reduction, ventilator use, and age.
• The XGBoost model had better discrimination ability and goodness of fit than the other models. Machine learning algorithms have significant practical value in the development of an early warning system for patients with sepsis complicated by anemia.

Additional files
Additional file 1: Table S1 XGBoost hyperparameter settings
Additional file 2: Table S2 Baseline data of the training and validation groups