Prediction Algorithm for ICU Mortality and Length of Stay Using Machine Learning

DOI: https://doi.org/10.21203/rs.3.rs-992995/v1

Abstract

Background: Machine learning can predict outcomes and determine variables contributing to precise prediction, and can thus classify patients with different risk factors of outcomes. This study aimed to investigate the predictive accuracy for mortality and length of stay in intensive care unit (ICU) patients using machine learning, and to identify the variables contributing to the precise prediction or classification of patients.

Methods: Patients (n=12,747) admitted to the ICU at Chiba University Hospital were randomly assigned to the training and test cohorts. After learning using the variables on admission in the training cohort, the area under the curve (AUC) was analyzed in the test cohort to evaluate the predictive accuracy of the supervised machine learning classifiers, including random forest (RF), for the outcomes (primary outcome, mortality; secondary outcome, length of ICU stay). The rank of the variables that contributed to the machine learning prediction was confirmed, and cluster analysis of the patients with risk factors of mortality was performed to identify the important variables associated with patient outcomes.

Results: Machine learning using RF revealed a high predictive value for mortality, with an AUC of 0.945. In addition, RF showed high predictive value for short and long ICU stays, with AUCs of 0.881 and 0.889, respectively. Lactate dehydrogenase (LDH) was identified as a variable contributing to the precise prediction in machine learning for both mortality and length of ICU stay. LDH was also identified as a contributing variable to classify patients into sub-populations based on different risk factors of mortality.

Conclusion: The machine learning algorithm could predict mortality and length of stay in ICU patients with high accuracy. LDH was identified as a contributing variable in mortality and length of ICU stay prediction and could be used to classify patients based on mortality risk.

Background

Critically ill patients have a potential risk of death. To prevent this, substantial data including physiological parameters and laboratory tests are collected on intensive care unit (ICU) admission and throughout the ICU stay. Based on an integrated evaluation of these data, health professionals, including physicians, can provide care and predict clinical outcomes. Death is the primary clinical outcome, while length of ICU stay, which potentially reflects illness severity in survivors and affects medical costs, may be another key outcome. Precise prediction of clinical outcomes at an early time point may contribute to improving the quality of patient care, medical resources/costs, and clinical outcomes[1, 2].

Recent advances in machine learning have led to an increase in the development of prediction algorithms using machine learning approaches in critical care[3]. A recent systematic review reported that a machine learning approach could be used to develop more precise prediction algorithms for mortality in ICU patients than conventional statistical approaches (patients n > 10,000, area under the curve [AUC] 0.89-0.94)[4, 5]. Although high precision in mortality prediction has been reported, analyses identifying the important variables in mortality prediction algorithms have not been sufficiently reported. Other clinical outcomes besides mortality, including length of ICU stay, are rarely analyzed. In addition to prediction, clustering into sub-populations is another strength of the machine learning approach; however, studies on clustering critically ill patients are limited.

Thus, we hypothesized that machine learning approaches could be used to develop precise prediction algorithms for ICU mortality and length of ICU stay, and that machine learning analysis could identify important variables for outcomes, while clustering may clarify subcategories. We collected a large data set on ICU admission from a single-center surgical/medical mixed ICU and analyzed three types of machine learning approaches.

Methods

Subjects

This was a retrospective cohort study performed using electronic health record data of consecutive patients admitted to the ICU at Chiba University Hospital, Japan, from November 2010 to March 2019. The surgical/medical ICU has 22 beds, with an annual admission number of patients ranging from 1,541 to 1,832. Of the 16,169 screened patients, 12,747 were enrolled in the present study after the exclusion of 3,422 with lower input rates (less than 50%) or missing data on clinical outcomes.

The study was approved by the Ethical Review Board of the Graduate School of Medicine, Chiba University (approval number: 3380), who waived the need for written informed consent.

Data Collection And Definitions

To develop prediction algorithms, the data of 94 input variables (Supplemental Table 1) were collected at the earliest time within 24 h after ICU admission from the ICU data system. These variables included 1) patient baseline characteristics (age, sex, height, weight, blood type, clinical department categories, diagnosis on admission, admission route [from emergency room, general ward, operating room, other hospitals], and acute physiology and chronic health evaluation [APACHE] II comorbidities [acquired immunodeficiency syndrome, acute myeloid leukemia/multiple myeloma, heart failure, lymphoma, respiratory failure, cancer metastasis, liver failure/cirrhosis, immunosuppressed status, and dialysis]); 2) blood tests (complete blood count, biochemistry, coagulation, and blood gas analysis); and 3) physiologic measurements (heart rate [HR], blood pressure, respiratory rate, peripheral oxygen saturation [SpO₂], body temperature, and end-tidal carbon dioxide [EtCO₂]).

Variable importance is defined as an index calculated by machine learning that indicates the extent to which a given variable was used by the machine learning model to make precise predictions. The top three variables by importance were defined as the key variables in this study. The length of ICU stay was analyzed in survivors and divided into three categories: short (within one week), medium (one to two weeks), and long (more than two weeks). The short and long categories were considered to have high clinical importance because they were reported to be associated with ICU mortality and severity[6, 7]. In addition, identifying patients who are at risk of long ICU stay may contribute to adequate ICU management and help avoid ICU bed shortages[6].
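As an illustration of the three length-of-stay categories defined above, a minimal Python sketch follows. The function name categorize_icu_stay, the use of days as the unit, and the inclusive boundaries at 7 and 14 days (chosen to match the ≤1 week, 1-2 weeks, and >2 weeks rows of Table 1) are assumptions made for illustration and are not taken from the authors' code.

```python
def categorize_icu_stay(days: float) -> str:
    """Map a length of ICU stay in days to 'short', 'medium', or 'long'.

    Assumed boundaries: short <= 7 days, medium > 7 and <= 14 days, long > 14 days.
    """
    if days < 0:
        raise ValueError("length of stay cannot be negative")
    if days <= 7:
        return "short"
    if days <= 14:
        return "medium"
    return "long"


# Hypothetical stays in days, used only to show the mapping.
example_stays = [2, 7, 7.5, 14, 21]
example_categories = [categorize_icu_stay(d) for d in example_stays]
```

In this sketch a stay of exactly 7 days counts as short and a stay of exactly 14 days counts as medium; a different convention at the boundaries would shift a small number of patients between categories.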

Imputation For Missing Values

We performed multiple imputations (10 times) for the missing values of numerical data using sklearn.impute.IterativeImputer in Python (scikit-learn 0.22.1; https://scikit-learn.org). Dummy coding was used to convert categorical variables into binary variables. After missing value imputation, the dataset was randomly split into the training and test cohorts, comprising 80% and 20% of the dataset, respectively, and the variables were compared between the two cohorts.
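The preprocessing steps described above (imputation, dummy coding, and the 80%/20% split) could be sketched as follows. This is a toy example under stated assumptions, not the authors' pipeline: the data frame, its column names, the helper impute_numeric, the use of sample_posterior=True with a different seed for each of the 10 imputations, and the choice to carry forward only the first imputed data set are all hypothetical, because the text does not specify how the 10 imputations were generated or combined.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
numeric_cols = ["age", "ldh", "lactate"]  # hypothetical column names

# Toy admission data with one categorical variable.
toy = pd.DataFrame({
    "age": rng.normal(65, 12, n),
    "ldh": rng.normal(250, 80, n),
    "lactate": rng.normal(2.0, 1.0, n),
    "admission_route": rng.choice(["ER", "ward", "OR"], n),
})

# Remove about 10% of the numeric entries to simulate missing values.
values = toy[numeric_cols].to_numpy(copy=True)
values[rng.random(values.shape) < 0.1] = np.nan
toy[numeric_cols] = values


def impute_numeric(df, cols, n_imputations=10):
    """Return a list of data frames, each with one stochastic imputation of `cols`."""
    imputed = []
    for seed in range(n_imputations):
        imputer = IterativeImputer(sample_posterior=True, random_state=seed)
        filled = df.copy()
        filled[cols] = imputer.fit_transform(df[cols])
        imputed.append(filled)
    return imputed


imputed_sets = impute_numeric(toy, numeric_cols)

# Dummy coding of the categorical variable, then a random 80/20 split.
encoded = pd.get_dummies(imputed_sets[0], columns=["admission_route"])
train, test = train_test_split(encoded, test_size=0.2, random_state=0)
```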

Statistical analysis

The primary outcome variable was ICU mortality, and the secondary outcome variable was the length of ICU stay. Outcome prediction was performed using machine learning algorithms with three types of classifiers, namely random forest (RF), XGBoost, and neural network, and using logistic regression analysis with either the APACHE II score or the sequential organ failure assessment (SOFA) score. After the machine learning algorithms were derived using the training cohort, the established algorithms were applied to the test cohort. As we found that the RF was superior to the other two machine learning models for the prediction of mortality, we confirmed the variable importance and key variables in the RF model.
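The workflow described above (derive the model in the training cohort, evaluate the AUC in the test cohort, then rank the variables) could be sketched as follows on synthetic data. All names and settings here are assumptions for illustration: the data come from make_classification, the single-column logistic regression is only a stand-in for a one-score baseline such as APACHE II or SOFA, the split is stratified for stability on a small toy sample, and feature_importances_ is scikit-learn's impurity-based importance, which may differ from the importance index used by the authors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced toy data (about 5% positives); shuffle=False keeps the
# informative features in the first five columns.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, n_redundant=0,
    weights=[0.95, 0.05], shuffle=False, random_state=0,
)
feature_names = [f"var_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Random forest trained on the training cohort, evaluated on the test cohort.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
rf_auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])

# Baseline: logistic regression on a single column standing in for one score.
baseline = LogisticRegression().fit(X_train[:, [0]], y_train)
baseline_auc = roc_auc_score(y_test, baseline.predict_proba(X_test[:, [0]])[:, 1])

# Key variables: the three features with the highest importance.
order = np.argsort(rf.feature_importances_)[::-1]
key_variables = [feature_names[i] for i in order[:3]]
```

Comparing rf_auc with baseline_auc mirrors the comparison between RF and the score-based logistic regression, and key_variables corresponds to the top three variables by importance.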

For robust clustering of ICU patients with higher risk factors for mortality, an RF dissimilarity measure was calculated to evaluate the similarity among patients. The RF dissimilarity was then used as an input for uniform manifold approximation and projection (UMAP) to provide a 2D representation of the patients in the test cohort. Subsequently, partitioning around medoids clustering was applied to the two scaling coordinates of the UMAP.
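The RF dissimilarity mentioned above is commonly computed from the forest's proximity, that is, the fraction of trees in which two patients fall into the same leaf. The sketch below shows one such construction on synthetic data; the function name rf_dissimilarity and the choice of 1 minus proximity (rather than, for example, its square root) are assumptions, as the text does not state the exact formula used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def rf_dissimilarity(forest, X):
    """Return 1 - proximity, where proximity is the fraction of trees in which
    two samples end up in the same leaf."""
    leaves = forest.apply(X)  # shape (n_samples, n_trees), leaf index per tree
    n_samples, n_trees = leaves.shape
    proximity = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        column = leaves[:, t]
        proximity += column[:, None] == column[None, :]
    return 1.0 - proximity / n_trees


# Toy data: fit on the first 200 samples, compare the remaining 100 samples.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:200], y[:200])
dissimilarity = rf_dissimilarity(forest, X[200:])
```

The resulting square matrix can serve as a precomputed distance input to an embedding method (umap-learn accepts metric="precomputed"), and the two embedding coordinates can then be clustered with any k-medoids implementation.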

To predict the length of ICU stay, we evaluated the short and long categories using machine learning with RF algorithm and logistic regression analysis using the APACHE II or SOFA scores. In the same manner as the analysis on mortality, variable importance and key variables associated with length of ICU stay prediction were confirmed. We also analyzed the predictive values of length of ICU stay using ordinalForest, which could estimate the predictive values for all three categories of ICU stay at the same time. All classifiers were implemented using Python, except for the ordinalForest, which was executed with R.
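One way to evaluate the short and long categories separately is to treat each as a binary target (category versus all other survivors) and compute an AUC per category. The sketch below does this on simulated data; the simulated length of stay, the feature matrix, and the helper one_vs_rest_auc are hypothetical, and the one-versus-rest construction is an assumption about how the category-specific AUCs could be obtained.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1500
X = rng.normal(size=(n, 6))

# Simulated length of stay in days, loosely driven by the first two features.
los_days = np.exp(1.2 + 0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n))

# Three categories: short (<= 7 days), medium (> 7 to 14 days), long (> 14 days).
category = np.where(los_days <= 7, "short", np.where(los_days <= 14, "medium", "long"))

X_train, X_test, cat_train, cat_test = train_test_split(
    X, category, test_size=0.2, random_state=0
)


def one_vs_rest_auc(target):
    """Train an RF to separate `target` from the other categories; return test AUC."""
    y_train = (cat_train == target).astype(int)
    y_test = (cat_test == target).astype(int)
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


auc_by_category = {target: one_vs_rest_auc(target) for target in ("short", "long")}
```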

Data are expressed as median (interquartile range) for continuous values and absolute numbers and percentages for categorical values. The area under the curve (AUC) was calculated to evaluate the predictive values. Statistical significance was set at P < 0.05. Analyses were performed using Python packages (sklearn.neural_network.MLPClassifier, sklearn.ensemble.RandomForestClassifier, xgboost, sklearn.linear_model.LogisticRegression) and R package (ordinalForest 2.4.1), to construct machine learning models. The Python and R codes used in this article are available at https://github.com/eiryo-kawakami/ICU_AI_code.

Results

Prediction of ICU mortality

The entire cohort of 12,747 patients was randomly split into a training cohort of 10,197 patients (80%) and a test cohort of 2,550 patients (20%) (Table 1). The number of ICU survivors was 12,133 (95.2%). Patient background and outcome were comparable between the training and test cohort data, except for the fraction of elective operations (P = 0.036).

Table 1

Baseline characteristics and outcomes of patients in the training and test cohorts
Variables	Training cohort (n=10,197)	Test cohort (n=2,550)	P-value
Demographic data
Age, years	67 (53 - 75)	67 (54 - 75)	0.41
Male	6,366 (62.4)	1,567 (61.5)	0.36
Height, cm	161.4 (154.0 - 168.0)	161.1 (153.1 - 167.8)	0.84
Weight, kg	58.0 (49.8 - 67)	57.7 (49.5 - 66.2)	0.40
Diagnosis on admission*
Sepsis/Septic shock	411 (4.0)	98 (3.8)	0.67
Post cardiac arrest syndrome	284 (2.8)	67 (2.6)	0.66
Stroke	151 (1.5)	41 (1.6)	0.64
Acute coronary syndrome	305 (3.0)	83 (3.3)	0.49
Heart failure	426 (4.2)	89 (3.5)	0.12
Trauma	222 (2.2)	61 (2.4)	0.51
Others	4,579 (44.9)	1,130 (44.3)	0.59
Entry route*
Emergency room	1,123 (11.0)	287 (11.3)	0.73
General ward	689 (6.8)	170 (6.7)	0.87
Operating room, emergency	241 (2.4)	59 (2.3)	0.89
Operating room, elective	2,765 (27.1)	639 (25.1)	0.036
Other hospital	30 (0.3)	13 (0.5)	0.15
Comorbidity
Acquired immunodeficiency syndrome	6 (0.1)	0 (0.0)	0.22
Hematological diseases†	29 (0.3)	7 (0.3)	0.93
Heart failure	361 (3.5)	80 (3.1)	0.32
Lymphoma	22 (0.2)	7 (0.3)	0.58
Respiratory failure	443 (4.3)	94 (3.7)	0.14
Metastasis	99 (1.0)	22 (0.9)	0.62
Immunosuppression	170 (1.7)	41 (1.6)	0.83
Liver failure	54 (0.5)	17 (0.7)	0.41
Cirrhosis	33 (0.3)	8 (0.3)	0.94
Dialysis	113 (1.1)	29 (1.1)	0.90
APACHE II score	20 (14 - 28)	19 (14 - 27)	0.16
SOFA score	5 (2 - 8)	4 (2 - 7)	0.13
Outcomes
Survival discharge	9,703 (95.2)	2,430 (95.3)	0.77
Length of ICU stay
≤1 week	8,758 (90.3)	2,216 (91.2)	0.16
1-2 weeks	549 (5.7)	121 (5.0)	0.19
>2 weeks	396 (4.1)	93 (3.8)	0.57
APACHE, acute physiology and chronic health evaluation; SOFA, sequential organ failure assessment; ICU, intensive care unit
* The percentages for diagnosis on admission and entry route sum to less than 100% because of missing values.
† Hematological diseases include acute myeloid leukemia and multiple myeloma.
Data are expressed as median (interquartile range) for continuous variables and as exact numbers (%) for categorical variables.
P-values were calculated using Pearson’s chi-square test and Mann-Whitney U test.
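As a worked check of the table, the P-value for elective operating room admission can be recomputed from the reported counts (2,765 of 10,197 in the training cohort and 639 of 2,550 in the test cohort) with Pearson's chi-square test. The sketch uses scipy.stats.chi2_contingency; omitting the continuity correction is an assumption, chosen because the uncorrected test reproduces the reported value of 0.036.

```python
from scipy.stats import chi2_contingency

# Rows: training cohort, test cohort.
# Columns: elective operating room admission, all other patients.
elective_table = [
    [2765, 10197 - 2765],
    [639, 2550 - 639],
]

# Pearson's chi-square test without Yates' continuity correction (assumption).
chi2, p_value, dof, expected = chi2_contingency(elective_table, correction=False)
```

With these counts the chi-square statistic is approximately 4.41 on one degree of freedom, and the P-value rounds to 0.036.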

We first compared predictive values of ICU mortality using three machine learning approaches and logistic regression with APACHE II or SOFA score. The prediction algorithm using RF had the highest predictive value for ICU mortality among the test cohort (Figure 1a, red line; AUC 0.945) (predictive value among training cohort, Supplemental Table 2). In the analysis of importance using the RF model, lactate, lactate dehydrogenase (LDH), and platelet count were identified as key variables (top three variables) (Figure 1b).

Cluster Analysis Of ICU Patients

To further evaluate the key variables, we performed a clustering analysis in the test cohort. The clustering analysis with UMAP classified the ICU patients into five clusters based on the risk of ICU mortality (Figure 2). Clusters 1 and 5 had low mortality rates compared to clusters 2, 3, and 4 (Figure 2a), and clusters 1 and 2 were characterized by entry route to the ICU (cluster 1, post elective surgeries; cluster 2, from emergency department) (Figure 2b). In terms of key variables for mortality, cluster 3 had high LDH and cluster 4 had low platelet counts, whereas high lactate was not localized in specific clusters (Figure 2b).

Prediction Of The Length Of ICU Stay

We next analyzed the prediction of the length of ICU stay category. The RF algorithm predicted both short and long stays better than the logistic regression models using the APACHE II or SOFA score in the test cohort (short/long, AUC 0.881/0.889) (Figure 3a and 3b) (predictive value among training cohort, Supplemental Table 3). Three subgroup predictions using the ordinalForest approach yielded high predictive values in the test cohort (AUCs: short, 0.872; medium, 0.839; long, 0.863) (Supplemental Figure 1).

The analysis of importance identified the key variables for short ICU stay to be admission after elective surgery, HR, and LDH; for long ICU stay to be LDH, HR, and urea nitrogen (UN) (Figure 3c and d). In accordance with the key variables in mortality, LDH was identified as a key variable for both short and long ICU stays.

Discussion

In the present study, the machine learning algorithms using RF demonstrated a higher predictive value for ICU mortality and length of ICU stay compared to classical scoring systems. The machine learning prediction indicated that LDH was a contributing variable in precise prediction for both mortality and length of ICU stay. Cluster analysis with UMAP also identified that LDH contributed to classifying patients into the mortality risk sub-populations.

A unique feature of our study is the examination of key variables to predict ICU mortality and length of ICU stay using machine learning in combination with cluster analysis. LDH was found to be an important predictor of both mortality and length of ICU stay, and was involved in the clustering of ICU patients based on mortality risk. LDH is elevated by cellular damage caused by infection, hematologic disease, or liver damage[8], and has been reported to be associated with mortality in critically ill patients. In septic patients, high LDH (>225 IU/L) was associated with 28-day mortality; LDH had a favorable predictive value for 28-day mortality among these patients (AUC 0.783)[9]. In acute respiratory distress syndrome patients, high LDH (>350 IU/L) was associated with increased 60-day mortality[10], and LDH was the most important variable for predicting mortality (AUC 0.854) among 12 variables (age, sex, pneumonia or not, sepsis or not, white blood cell, albumin, C-reactive protein, LDH, pulmonary surfactant-associated protein D, peptidase inhibitor 3, number of organ failures [none, one, two, and more than two], and PaO₂/FiO₂)[11]. In acute pancreatitis, patients who died had significantly higher LDH levels (mean 667 IU/L) than survivors (mean 494 IU/L)[12]. In line with these reports, our machine learning investigation found that LDH was a contributing variable in outcome prediction in ICU patients under various conditions.

Our investigation further identified that machine learning had a high predictive value for mortality in ICU patients. Machine learning can analyze a vast amount of data. As ICU patients undergo continuous monitoring of various parameters, they produce a massive amount of data and are thus an optimal target for machine learning[13]. With the development of automatic data collection systems in ICUs[14], investigations to identify the potential predictive ability for mortality in ICU patients using machine learning have been increasing recently. Previous investigations reported high predictive values for mortality using machine learning in ICU patients (AUC 0.85 - 0.89)[5, 15]. Our results showed similar predictive values to these reports.

In the analysis of the length of ICU stay prediction, we found a remarkable performance of the machine learning algorithm. Prediction of prolonged ICU stay is important for the adequate allocation of medical resources and management of ICU beds[16, 17]. Short and long ICU stays are reported to be related to the severity of illness[6]. Thus, precise prediction of short and long ICU stays is essential in the care of critically ill patients. A previous investigation reported that the predictive value for ICU stay longer than 6 days was higher in machine learning algorithms than in SOFA scores (machine learning, AUC 0.76; SOFA score, AUC 0.62)[18]. Patients with ICU stay within 10 days were identified using a machine learning algorithm with an AUC of 0.851[19]. Our investigation identified that the machine learning algorithm may predict both short (within one week) and longer (more than two weeks) ICU stays with high precision (short/long, AUC 0.881/0.889), in line with previous reports.

The present study has several limitations. First, this was a single-center, retrospective study. In this regard, the prediction accuracy should be validated in a future prospective study. Second, the data were analyzed at the earliest time point available after ICU admission; inconsistent timing of sample collection among patients could potentially affect prediction accuracy. Third, the prediction was performed based on data collected only at a single time point on admission. Real-time prediction could also be useful to improve the accuracy of prediction of ICU mortality and length of stay of critically ill patients whose condition is subject to abrupt changes. Future investigations using sequential data are warranted.

Conclusions

The machine learning algorithm could predict ICU mortality and short/long length of ICU stay with high accuracy. Moreover, LDH was found to be a key variable predicting both mortality and length of stay, and contributed to the clustering of ICU patients based on mortality risk.

Abbreviations

APACHE, acute physiology and chronic health evaluation; AUC, area under the curve; EtCO₂, end-tidal carbon dioxide; HR, heart rate; ICU, intensive care unit; LDH, lactate dehydrogenase; RF, random forest; SOFA, sequential organ failure assessment; SpO₂, peripheral oxygen saturation; UMAP, uniform manifold approximation and projection; UN, urea nitrogen

Declarations

Ethics approval and consent to participate

This study was approved by the research ethics committee of the Chiba University Graduate School of Medicine (approval number: 3380), who issued a waiver for written consent for the study because data collection was retrospective.

Consent for publication

Not applicable.

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Competing interests

The authors declare that they have no competing interests.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Authors’ contributions

SI and TN contributed to the study conception and design, acquisition of data, interpretation of data, drafting of the manuscript, and critical revision of the manuscript for important intellectual content. EK contributed to the study conception and design, acquisition of data, statistical analysis, interpretation of data, drafting of the manuscript, and critical revision of the manuscript for important intellectual content. TS, TO, TS, and NT contributed to the study conception and design, interpretation of data, and critical revision of the manuscript for important intellectual content. JY contributed to the acquisition of data, statistical analysis, and critical revision of the manuscript for important intellectual content. YY contributed to study conception and design, and critical revision of the manuscript for important intellectual content. All authors read and approved the final manuscript.

Acknowledgments

None.

References

Mamdani M, Slutsky AS. Artificial intelligence in intensive care medicine. Intensive Care Med. 2021;47:147–9.
Delahanty RJ, Kaufman D, Jones SS. Development and Evaluation of an Automated Machine Learning Algorithm for In-Hospital Mortality Risk Adjustment Among Critical Care Patients. Crit Care Med. 2018;46:e481-e8.
Gutierrez G. Artificial Intelligence in the Intensive Care Unit. Crit Care. 2020;24:101.
Shillan D, Sterne JAC, Champneys A, Gibbison B. Use of machine learning to analyse routinely collected intensive care unit data: a systematic review. Crit Care. 2019;23:284.
Pirracchio R, Petersen ML, Carone M, Rigon MR, Chevret S, van der Laan MJ. Mortality prediction in intensive care units with the Super ICU Learner Algorithm (SICULA): a population-based study. Lancet Respir Med. 2015;3:42–52.
Zampieri FG, Ladeira JP, Park M, Haib D, Pastore CL, Santoro CM, et al. Admission factors associated with prolonged (>14 days) intensive care unit stay. J Crit Care. 2014;29:60–5.
Laupland KB, Kirkpatrick AW, Kortbeek JB, Zuege DJ. Long-term mortality outcome associated with prolonged admission to the ICU. Chest. 2006;129:954–9.
Farhana A, Lappin SL. Biochemistry. Lactate Dehydrogenase. StatPearls. 2021.
Lu J, Wei Z, Jiang H, Cheng L, Chen Q, Chen M, et al. Lactate dehydrogenase is associated with 28-day mortality in patients with sepsis: a retrospective observational study. J Surg Res. 2018;228:314–21.
Anan K, Kawamura K, Suga M, Ichikado K. Clinical differences between pulmonary and extrapulmonary acute respiratory distress syndrome: a retrospective cohort study of prospectively collected data in Japan. J Thorac Dis. 2018;10:5796–803.
Hu J, Fei Y, Li WQ. Predicting the mortality risk of acute respiratory distress syndrome: radial basis function artificial neural network model versus logistic regression model. J Clin Monit Comput. 2021. doi:10.1007/s10877-021-00716-x.
Vengadakrishnan K, Koushik AK. A study of the clinical profile of acute pancreatitis and its correlation with severity indices. Int J Health Sci (Qassim). 2015;9:410–7.
Komorowski M. Artificial intelligence in intensive care: are we there yet? Intensive Care Med. 2019;45:1298–300.
Bulgarelli L, Deliberato RO, Johnson AEW. Prediction on critically ill patients: The role of "big data". J Crit Care. 2020;60:64–8.
Rongali S, Rose AJ, McManus DD, Bajracharya AS, Kapoor A, Granillo E, et al. Learning Latent Space Representations to Predict Patient Outcomes: Model Development and Validation. J Med Internet Res. 2020;22:e16374.
Houthooft R, Ruyssinck J, van der Herten J, Stijven S, Couckuyt I, Gadeyne B, et al. Predictive modelling of survival and length of stay in critically ill patients using sequential organ failure scores. Artif Intell Med. 2015;63:191–207.
Arabi Y, Venkatesh S, Haddad S, Al Shimemeri A, Al Malik S. A prospective study of prolonged stay in the intensive care unit: predictors and impact on resource utilization. Int J Qual Health Care. 2002;14:403–10.
Su L, Xu Z, Chang F, Ma Y, Liu S, Jiang H, et al. Early Prediction of Mortality, Severity, and Length of Stay in the Intensive Care Unit of Sepsis Patients Based on Sepsis 3.0 by Machine Learning Models. Front Med (Lausanne). 2021;8:664966.
Ma X, Si Y, Wang Z, Wang Y. Length of stay prediction for ICU patients using individualized single classification algorithm. Comput Methods Programs Biomed. 2020;186:105224.