A New Risk Model Based on the Machine Learning Approach for Prediction of Mortality in the Respiratory Intensive Care Unit

Background and objectives: Intensive care unit (ICU) resources are inadequate for the large population in China, so it is essential for physicians to evaluate the condition of patients at admission. In this study, our objective was to construct a machine learning risk prediction model for mortality in respiratory intensive care units (RICUs). Methods: This study involved 817 patients who made 1,063 visits and who were admitted to the RICU from January 1, 2012, to December 31, 2017. Potential predictors such as demographic information, laboratory results, vital signs and clinical characteristics were considered. Among the 1,063 visits, the RICU mortality rate was 13.5%. We constructed eXtreme Gradient Boosting (XGBoost) models and compared their predictive performance with that of random forest models, logistic regression models and clinical scores such as the Acute Physiology and Chronic Health Evaluation II (APACHE II) and the Sequential Organ Failure Assessment (SOFA) system. Results: For this dataset, the XGBoost model achieved the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.923 (95% CI: 0.889-0.957) in the test set, which was significantly greater than that of APACHE II (0.811, 95% CI: 0.778-0.844) and SOFA (0.805, 95% CI: 0.770-0.840). The Hosmer-Lemeshow statistic was 12.667 with a P-value of 0.124, indicating no evidence of poor calibration of our predictive model in the test set. The final model contained variables that were previously known to be associated with mortality, but it also included some features absent from the clinical scores. The mean N-terminal pro-B-type natriuretic peptide (NT-proBNP) of survivors was significantly lower than that of non-survivors (2066.43 pg/mL vs. 8232.81 pg/mL; p<0.001).
Conclusions: Our results showed that the XGBoost model could be a suitable model for predicting RICU mortality with easy-to-collect variables at admission and could help intensivists improve clinical decision-making for RICU patients. We found that NT-proBNP


Background
Intensive care unit (ICU) costs account for a large portion of increasing health care expenditures, and ICU resources are inadequate to accommodate large populations, especially in China. 1,2 To allocate resources and reduce costs, physicians have made many efforts to ensure that limited resources go to the patients who can benefit most from them. Therefore, evaluating patients' clinical condition at ICU admission, which amounts to assessing their risk of mortality, has received much attention.
Many clinical scores have been developed to determine patients' illness severity. The Acute Physiology and Chronic Health Evaluation II (APACHE II) score 3 stratifies the risk of mortality using clinical characteristics collected on the first ICU day, including twelve acute physiological variables, age and chronic health information. The Sequential Organ Failure Assessment (SOFA) system 4 quantifies the degree of dysfunction in six organ systems to evaluate severity. However, the accuracy of both scores for predicting mortality is limited. Moreover, the care conditions and illness severity of ICU patients in developing countries differ from those in developed countries, so there is a need for a risk model that can be applied to patients in developing countries.
In recent years, with the development of machine learning, many studies have applied multivariable prediction models such as eXtreme Gradient Boosting (XGBoost), random forest and logistic regression models to healthcare problems. 5,6 This study aimed to develop a new risk prediction model based on a machine learning method to predict the mortality of patients on their first day in the ICU using easy-to-obtain variables.

Data Source
The data were obtained from the hospital information system of the respiratory intensive care unit (RICU) of the Chinese People's Liberation Army General Hospital (PLAGH). PLAGH has 125 clinical, medical and technological departments and 4,000 patient beds, with an annual volume of more than 3.8 million outpatient visits, over 110,000 admissions and more than 65,000 operations. On average, nearly 200 patients are admitted to the RICU of PLAGH each year. The hospital information system contains clinical and demographic information, laboratory results, mortality and length of RICU stay for patients admitted to the RICU, up to the time of discharge. The data were collected and analysed retrospectively.
From January 1, 2012, to December 31, 2017, a total of 1,350 patients who made 1,643 visits to the RICU were identified. Patients were excluded if their length of RICU stay was less than 24 hours or if values of important features were missing. After exclusion, 817 patients with 1,063 visits remained in this study (Fig. 1). We treated each visit as an independent observation.
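The exclusion rule can be sketched as a simple filter over visit records. This is a minimal illustration under stated assumptions, not the study's code: the `Visit` record, its field names and the `key_features` list are hypothetical, because the text does not specify which features counted as important.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Visit:
    # Hypothetical visit record; field names are illustrative only.
    visit_id: int
    patient_id: int
    los_hours: float  # length of RICU stay in hours
    features: Dict[str, Optional[float]] = field(default_factory=dict)


def apply_exclusions(visits: List[Visit], key_features: List[str]) -> List[Visit]:
    """Drop visits with a stay under 24 h or a missing value in any key feature."""
    kept = []
    for visit in visits:
        if visit.los_hours < 24:
            continue  # RICU stay shorter than 24 hours
        if any(visit.features.get(name) is None for name in key_features):
            continue  # at least one important feature is missing
        kept.append(visit)
    return kept


if __name__ == "__main__":
    visits = [
        Visit(1, 10, 12.0, {"nt_probnp": 900.0}),
        Visit(2, 10, 72.0, {"nt_probnp": None}),
        Visit(3, 11, 96.0, {"nt_probnp": 4100.0}),
    ]
    print([v.visit_id for v in apply_exclusions(visits, ["nt_probnp"])])
```

Because each visit is treated as an independent observation, the filter works visit by visit: a patient keeps only those visits that pass both checks.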

Data description
The entire dataset of 1,063 visits was randomly split into training and test sets at an approximately 7:3 ratio. A total of 726 visits with 101 RICU deaths (13.9%) were placed in the training set, and 337 visits with 43 RICU deaths (12.8%) were placed in the test set.
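A random split and the per-set mortality rates can be sketched with the standard library. This is only a sketch: the seed and the helper names are hypothetical, and the text does not say how the split was drawn. Note that an exact visit-level 70% cut of 1,063 visits gives 744/319, whereas the reported sets are 726/337 (about 68:32), so the original procedure evidently differed in some way.

```python
import random
from typing import List, Sequence, Tuple


def random_split(ids: Sequence[int], train_frac: float = 0.7,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle visit ids with a fixed seed and cut them into train/test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


def mortality_pct(deaths: int, visits: int) -> float:
    """Mortality as a percentage, rounded to one decimal place."""
    return round(100 * deaths / visits, 1)


if __name__ == "__main__":
    train, test = random_split(range(1063))
    print(len(train), len(test))  # exact visit-level 7:3 cut
    # Rates from the counts reported in the text
    print(mortality_pct(101, 726), mortality_pct(43, 337), mortality_pct(144, 1063))
```

The last line reproduces the reported 13.9%, 12.8% and overall 13.5% from the stated counts.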

Outcome and candidate predictors
We defined the outcome as recorded death in the RICU. All features were derived from the original datasets. After data cleaning and processing, 107 features with less than 30% missing values were considered (details in Table S1).
The features included demographic information at the time of RICU admission (such as age, sex and marital status), laboratory results, vital signs and nursing assessments (such as pressure ulcer risk). These candidate variables were defined as the worst value within 24 hours after RICU admission. Clinical characteristics such as infection within 24 hours after RICU admission were also considered.
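The two preprocessing rules above (worst value within the first 24 hours, and keeping features with less than 30% missing values) can be sketched as follows. The record layout and the `worse_when` map are hypothetical: the text does not say which direction counts as "worst" for each variable, and this sketch assumes a single direction per feature, so variables that are abnormal in both directions (for example temperature) would need a different rule.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record: (visit_id, feature, hours after RICU admission, value)
Record = Tuple[int, str, float, Optional[float]]


def worst_in_first_24h(records: Iterable[Record],
                       worse_when: Dict[str, str]) -> Dict[Tuple[int, str], float]:
    """Keep, per (visit, feature), the most extreme value recorded in hours 0-24.

    worse_when maps each feature to "high" or "low" (hypothetical labels).
    """
    worst: Dict[Tuple[int, str], float] = {}
    for visit_id, feature, hours, value in records:
        if value is None or not 0 <= hours <= 24:
            continue  # not recorded, or outside the first 24 hours
        pick = max if worse_when[feature] == "high" else min
        key = (visit_id, feature)
        worst[key] = value if key not in worst else pick(worst[key], value)
    return worst


def features_below_missing_threshold(rows: List[Dict[str, Optional[float]]],
                                     features: List[str],
                                     max_missing: float = 0.30) -> List[str]:
    """Return features whose share of missing values is strictly below max_missing."""
    kept = []
    for name in features:
        missing = sum(1 for row in rows if row.get(name) is None)
        if missing / len(rows) < max_missing:
            kept.append(name)
    return kept
```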

Statistical analysis
We first used standard descriptive statistics to compare each feature between the survival group and the non-survival group at RICU discharge. For continuous variables, the mean and standard deviation were calculated, and differences between groups were assessed with the Mann-Whitney U test. For categorical variables, counts and proportions were reported, and differences between groups were examined with the chi-square test.
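For readers who want to see what the Mann-Whitney U test computes, here is a minimal standard-library sketch. It uses the normal approximation with a tie correction and no continuity correction, so its p-values can differ slightly from library implementations that apply a continuity correction or an exact distribution. It also shows a useful link to the AUROC used later: U divided by the product of the group sizes is the probability that a random value from one group exceeds one from the other (ties counted as one half).

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns U for sample x and the two-sided p-value.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_sum = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # tied values share their average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        t = j - i + 1
        tie_sum += t ** 3 - t  # contribution of this tie group to the variance correction
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


def auroc_from_u(u_x: float, n1: int, n2: int) -> float:
    """P(random x > random y), ties counting 1/2; equals the AUROC of x versus y."""
    return u_x / (n1 * n2)
```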
Then, we implemented the eXtreme Gradient Boosting 7 (XGBoost) method to predict the mortality of patients admitted to the RICU. XGBoost is a gradient boosting decision tree algorithm: each new tree is fitted to correct the errors of the trees fitted before it, and all trees are combined into an ensemble to achieve better predictive performance. We reached our final model through hyperparameter selection based on the area under the curve from 5-fold cross-validation. Starting from the default parameters, we tuned the maximum tree depth and the minimum sum of instance weights needed in a child, and tested various values of other parameters to avoid overfitting on the training set. The final XGBoost model used a 'gbtree' booster with a logistic regression learning objective for binary classification. The model used a maximum depth of 5, a minimum sum of instance weights of 20 and a gamma value of 5. The learning rate was set to 0.02 and the number of boosting iterations to 560. The subsample ratios of training instances and of columns when constructing each tree were both set to 0.6. The other parameters were left at their default values.
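The reported hyperparameters can be written as a configuration for the `xgboost` Python package's scikit-learn wrapper, `XGBClassifier`. The mapping of the prose to argument names (for example "minimum sum of instance weights" to `min_child_weight`, column subsampling to `colsample_bytree`) is our reading, not the authors' code. The imports are deferred into the functions, so the dictionary can be inspected without `xgboost` or `scikit-learn` installed; the cross-validation helper is a sketch of 5-fold AUROC scoring, not the original tuning procedure.

```python
# Final hyperparameters from the text, using xgboost's scikit-learn argument names
# (the name mapping is an assumption).
FINAL_XGB_PARAMS = {
    "booster": "gbtree",
    "objective": "binary:logistic",  # logistic loss for binary classification
    "max_depth": 5,                  # maximum tree depth
    "min_child_weight": 20,          # minimum sum of instance weight in a child
    "gamma": 5,                      # minimum loss reduction required to split
    "learning_rate": 0.02,           # shrinkage (eta)
    "n_estimators": 560,             # boosting iterations
    "subsample": 0.6,                # row subsample ratio per tree
    "colsample_bytree": 0.6,         # column subsample ratio per tree
}


def build_final_model():
    """Construct the classifier; needs the xgboost package at call time."""
    from xgboost import XGBClassifier
    return XGBClassifier(**FINAL_XGB_PARAMS)


def cv_auc(X, y, folds=5):
    """Mean k-fold cross-validated AUROC on the training data (needs scikit-learn)."""
    from sklearn.model_selection import cross_val_score
    scores = cross_val_score(build_final_model(), X, y, cv=folds, scoring="roc_auc")
    return scores.mean()
```

Missing values can be passed to XGBoost as NaN: its split finding learns a default branch direction for them, which is why imputation was needed only for the comparison models.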
We compared the XGBoost models with random forest models 8 and logistic regression 9 models. For the random forest models, we first determined the maximum tree depth by training a single-tree model on the data. Then, we trained with increasing numbers of trees until the out-of-bag error no longer decreased.
Our final random forest model had 100 estimators with a depth of 4. For the logistic regression models, we trained models with different regularizations; the final model used the L2 penalty. Because these models cannot handle missing data automatically, as XGBoost models can, we generated multiple imputations for the incomplete multivariate data. After the final model was constructed, we employed the Shapley Additive exPlanations (SHAP) method 10 to interpret the non-linear relationships between variables and outcome. SHAP assigns a value to each feature for each prediction by computing a weighted average of the differences between model outputs with the feature included and withheld, over all possible subsets of the other features. The higher a feature's SHAP value, the more that feature raises the predicted risk. Therefore, we could explain the output of our final model through its SHAP values.
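The weighted average described above is the Shapley value. The sketch below computes it exactly by enumerating every subset of the other features, using a hypothetical toy risk function in place of a trained model; the feature names and contributions are invented for illustration. Exact enumeration grows exponentially with the number of features, which is why tree-specific algorithms (such as `TreeExplainer` in the `shap` package) are used for models like XGBoost in practice.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def exact_shapley(features: Sequence[str],
                  value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values by enumerating every subset of the other features.

    value(S) returns the model output when only the features in S are present.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # size of the subset S that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(present: FrozenSet[str]) -> float:
    """Hypothetical risk function: additive effects plus one interaction."""
    risk = 0.10
    risk += 0.30 if "nt_probnp" in present else 0.0
    risk += 0.20 if "age" in present else 0.0
    risk += 0.05 if "urea" in present else 0.0
    if {"nt_probnp", "age"} <= present:
        risk += 0.10  # interaction, split equally between the two features
    return risk


if __name__ == "__main__":
    print(exact_shapley(["nt_probnp", "age", "urea"], toy_risk))
```

The values sum to the difference between the output with all features and with none, which is what lets SHAP values be read as additive contributions to a single prediction.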
The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUROC) were used to measure the predictive performance of the models. The AUROCs of the predictive models and clinical scores were compared using the DeLong test. 11 Calibration of the risk model was assessed with a quintile plot of observed versus expected risk and with the Hosmer-Lemeshow goodness-of-fit test. 12 All statistical analyses were performed in Python 3.6.4 and R 3.5.1 (packages: mice, pROC, ResourceSelection). A two-tailed p value < 0.05 was considered statistically significant.

Results
Table 1 summarizes the selected demographics, clinical characteristics, laboratory results and vital signs of the study cohort. Supplementary Table S1 compares all the features between the survival group and the non-survival group. The RICU mortality rate was 13.5%. Among all RICU patients, 73.2% were male and 88.7% were married. The mean age was 68.48 years in the survival group and 80.43 years in the non-survival group. The mean N-terminal pro-B-type natriuretic peptide (NT-proBNP) level of survivors was significantly lower than that of non-survivors (2066.43 pg/mL vs. 8232.81 pg/mL; p<0.001). The mean length of RICU stay was 20.32 days.

Model Performance
The XGBoost model achieved the best performance, with the highest AUROC of 0.923 (95% CI: 0.889-0.957) in the test set, followed by the random forest and logistic regression models. The ROC curves of all models are shown in Fig. 2. The XGBoost model significantly outperformed the random forest model (p = 0.018), the logistic regression model (p < 0.001) and the clinical scores (p < 0.001). The calibration of the XGBoost model is illustrated in Fig. 3. The Hosmer-Lemeshow statistic of the model was 12.67 (p = 0.124) in the test set, indicating no evidence of poor fit.
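The Hosmer-Lemeshow statistic compares observed and expected event counts within groups of patients ordered by predicted risk. The sketch below computes it with the standard library. Two assumptions should be stated: it uses equal-size groups of sorted predictions (R's `ResourceSelection::hoslem.test` instead cuts at quantiles of the predicted values, so results can differ slightly), and its closed-form chi-square tail only covers even degrees of freedom. The reported p = 0.124 for a statistic of 12.667 is reproduced with 8 degrees of freedom, which corresponds to the usual 10-group version of the test rather than the quintiles used in the calibration plot. Predicted probabilities must lie strictly between 0 and 1 so that no group has an expected count of 0 or of its full size.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true: Sequence[int], y_prob: Sequence[float],
                    groups: int = 10) -> Tuple[float, float]:
    """Hosmer-Lemeshow statistic and p-value (df = groups - 2)."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed deaths in the group
        expected = sum(p for p, _ in chunk)   # sum of predicted risks in the group
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat, chi2_sf_even_df(stat, groups - 2)


if __name__ == "__main__":
    print(round(chi2_sf_even_df(12.667, 8), 3))  # p-value for the reported statistic
```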

Feature Importance and Interpretations
Feature importance was calculated as the improvement in accuracy (gain) obtained when a feature was used to split a branch, indicating how valuable each feature was in constructing the prediction model. The ten most important features of the final XGBoost model are presented in Fig. 4. We applied the SHAP method to better understand the non-linear relationships between these variables and the outcome (Figs. S1-S10). NT-proBNP contributed the most to the model: when its value exceeded approximately 1200 pg/mL, the risk of death began to increase rapidly (Fig. S1). Age at RICU admission also had a strong influence, with two clear change points at approximately 80 and 85 years (Fig. S2). Urea (Fig. S3) ranked as the third most important feature, followed by lactic acid (Fig. S4), blood glucose (Fig. S5) and respiratory rate (Fig. S7). Their relationships with the outcome all shared an S shape: rising slowly at first, becoming steeper in the middle and flattening off again at the end. The pressure ulcer risk score (Fig. S8) and red blood cell count (Fig. S10) showed inverse S-shaped curves. When variables such as red blood cells in urine (Fig. S6) and myoglobin (Fig. S9) were below the normal range, the risk of death increased sharply.
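We read the importance measure above as XGBoost's "gain" importance type, the average loss reduction over all splits that use a feature; with a fitted model, the same numbers are returned by `model.get_booster().get_score(importance_type="gain")`. The sketch below shows the aggregation on hypothetical split records; the feature names and gain values are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def average_gain_importance(splits: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the loss reduction (gain) of every split that uses each feature."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for feature, gain in splits:
        totals[feature] += gain
        counts[feature] += 1
    return {f: totals[f] / counts[f] for f in totals}


def top_k(importance: Dict[str, float], k: int = 10) -> List[str]:
    """Feature names sorted from most to least important, truncated to k."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical (feature, gain) pairs collected from the split nodes of all trees
    splits = [("nt_probnp", 10.0), ("age", 4.0), ("nt_probnp", 6.0), ("urea", 5.0)]
    print(top_k(average_gain_importance(splits), k=3))
```

Averaging rewards features whose splits are individually informative; summing instead (XGBoost's `total_gain`) also rewards features that are split on often, so the two can rank features differently.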

Discussion
We trained three machine learning models and compared their performance with each other and with clinical scores using the AUROC. Among them, the XGBoost model had the best predictive power, with an AUROC of 0.923 (95% CI: 0.889-0.957) in the test set, which was significantly higher than those of the other models and the clinical scores (p < 0.05). Moreover, XGBoost showed good calibration by the Hosmer-Lemeshow goodness-of-fit test (12.67; p = 0.124). Therefore, our study established a new mortality prediction model that can be applied to RICU patients.
Many studies have attempted to develop risk models for predicting ICU mortality, and most of them focused on time-series data. Ghassemi M et al. 13 developed time-varying models combining latent topic features with baseline features and reported an AUC of 0.85. Another study 14 in a German tertiary care centre demonstrated real-time prediction of ICU mortality after cardiothoracic surgery. However, some studies 13,15 concluded that data collected within 24 hours after admission contributed the most to the predictive power of the model. Furthermore, because ICU resources are inadequate, many clinical decisions must be made within a limited time after admission to avoid treatment delays. Our predictive model was based on baseline features, all of which are easy to collect, to help intensivists assess patients' clinical condition in the first 24 hours after ICU admission.
We also identified several important features. As expected, baseline age and respiratory rate were important in the model; both are also components of the APACHE II score. Our results showed that patients aged 80 years and over had a higher risk of ICU death (Fig. S2), a finding supported by the study of Bagshaw SM et al. 16 Consistent with our findings (Figs. S4-S5), previous studies have shown that high blood glucose 17 and lactic acid levels 18 play an important role in mortality prediction. NT-proBNP, the feature that most influenced the model, is widely recognized as an excellent diagnostic and prognostic marker in heart failure. 19 Many studies have investigated its usefulness in ICU patients. One study 20 proposed that a single measurement of NT-proBNP at admission might be a prognostic marker in unselected ICU patients and found that survivors had significantly lower NT-proBNP than non-survivors, consistent with our result (Fig. S1). Urea, the third most important variable (Fig. S3), has been reported 21 to be an independent predictor of 12-month mortality in elderly patients admitted to the ICU, together with acute renal failure, the need for mechanical ventilation, a low Glasgow Coma Scale (GCS) score and age. The pressure ulcer risk score classifies the risk of pressure ulcers: a score under 14 indicates an increased risk of pressure ulcer development. 22 In addition, patients who developed pressure ulcers were more likely to die during their hospital stay, 23 which is consistent with our results (Fig. S8).
Our model introduced some novel predictors of mortality risk for RICU patients, such as red blood cells in the urine, myoglobin and red blood cell count. For example, myoglobin has long been evaluated as an early marker for diagnosing myocardial infarction and is associated with mortality after cardiac surgery. 24 However, the association between myoglobin and ICU mortality has not been discussed in previous work. In future studies, we may further examine the associations between these novel variables and ICU mortality. Although the relationships between several of the most influential features and the outcome have been briefly discussed, our primary objective was to build a predictive model combining all the features, not to identify individual risk predictors.
Our study has some limitations. It was a retrospective, single-centre study, so external validation at separate institutions would be necessary to test the generalizability of the model before future application. Moreover, our model was based only on the RICU population. Further studies on populations from mixed types of ICUs, and comparisons between different ICU types, are advised.

Conclusion
We developed a new predictive model of RICU mortality based on a machine learning approach. Our final model, with its superior predictive performance and good calibration, could help intensivists make better clinical decisions by identifying the patients at highest risk.

Declarations
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate
This retrospective study used previously documented medical records and was therefore exempt from clinical ethics review. As a retrospective data analysis, the study was not registered. The need for informed consent was waived.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Declarations
This study was conducted in accordance with the amended Declaration of Helsinki.

Table 1 Selected demographics, clinical characteristics, laboratory results and vital signs of RICU patients. Data are summarized as the mean ± standard deviation or n (%). NT-proBNP = N-terminal pro-B-type natriuretic peptide.

Fig. 3 Calibration plot of the XGBoost model in the test set. Data were divided into quintiles by predicted risk. The blue bars represent the observed proportion of events, and the red bars represent the predicted risk. The Hosmer-Lemeshow statistic of the model was 12.67 with a p value of 0.124.