Machine Learning Algorithms Predict Successful Weaning from Mechanical Ventilation Before Intubation

Predicting successful weaning from mechanical ventilation before intubation can facilitate discussions regarding end-of-life care before unnecessary intubation. In this context, we aimed to develop a machine-learning-based model that predicts successful weaning from ventilator support based on routine clinical and laboratory data taken before or immediately after intubation. We used the Medical Information Mart for Intensive Care-IV database, including adult patients who underwent mechanical ventilation in intensive care at the Beth Israel Deaconess Medical Center, USA. Clinical and laboratory variables collected before or within 24 hours of intubation were used to develop machine-learning models that predict the probability of successful weaning within 14 days of ventilator support. Of 23,242 patients, 19,025 (81.9%) were successfully weaned from mechanical ventilation within 14 days. We selected 46 clinical and laboratory variables to create machine-learning models. The machine-learning-based ensemble voting classifier achieved an area under the receiver operating characteristic curve of 0.863 (95% confidence interval 0.855–0.870), which was significantly better than that of the Sequential Organ Failure Assessment (0.588 [0.566–0.609]) and the Simplified Acute Physiology Score II (0.749 [0.742–0.756]). The top features included lactate, anion gap, and prothrombin time. The model's performance reached a plateau with approximately the top 21 variables. We developed machine learning algorithms that can predict successful weaning from mechanical ventilation before intubation in the intensive care unit. Our models can aid in appropriate management for patients who hesitate to decide on ventilator support or meaningless end-of-life care.


Introduction
Acute respiratory failure can be caused by various conditions, including pulmonary disease, cardiovascular disease, neuromuscular disorder, or the need for respiratory support after major surgery 1.
Although invasive mechanical ventilation (MV) is a life-sustaining intervention used to assist or replace spontaneous respiration in patients with acute respiratory failure, the procedure is associated with a risk of severe complications such as ventilator-associated pneumonia, pulmonary edema, and acute respiratory distress syndrome 2 .
Inevitably, a proportion of patients will be unable to recover rapidly from ventilatory support, mandating the use of MV for an extended period. The duration of MV is linearly associated with poor outcomes, and the number of days of ventilation support directly correlates with daily incremental costs and unexpected medical conditions like thromboembolic events and posttraumatic stress disorder 3. Prolonged MV is unavoidably accompanied by tracheostomy and life-sustaining care, which are not usually desired by patients 4,5. Tracheostomy has its advantages in a lower frequency of laryngeal ulcers, less airway resistance, and ease of management 4,5. However, prolonged MV and consequent tracheostomy are unlikely to benefit chronically ill patients with an expected dismal prognosis. Tracheostomy should only be performed if it aligns with the patient's goals and preferences. The possibility of undergoing a tracheostomy can be a reason for hesitancy to intubate elderly or chronically ill patients.
For these reasons, successful early prediction of whether a patient will undergo prolonged MV can support clinical decision-making in many respects. Several previous models have been suggested to anticipate prolonged MV or tracheostomy 6,7, but they were either short-term predictive models or unrealistic models that did not consider patient death. Given this background, we aimed to develop a thorough machine learning model that can predict, before intubation, the possibility of successful weaning from MV within 14 days after intubation.

Feature Importance of Each Variable
We chose the VC model as our representative model. The variables included in the model were ranked according to their information gain, and the top three features were lactate concentration, anion gap, and prothrombin time (Figure 3). To better understand the direction of influence each feature has on this model, the SHapley Additive exPlanations (SHAP) algorithm was applied to explain, for each feature, the magnitude and direction of its impact on the outcome prediction. The top risk features included anion gap, age, presence of cardiovascular disease, and blood urea nitrogen concentration. Specifically, a higher value or the presence of a variable indicates a higher chance of failure to wean from MV within 14 days (Figure 4).
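The ranking step can be illustrated with a minimal sketch of permutation feature importance, which the Model Development section names alongside SHAP. This is not the authors' code: the data are synthetic, the feature names are hypothetical, and a random forest stands in for the VC model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clinical table (assumption, not MIMIC-IV data).
# shuffle=False keeps the 3 informative columns first; the rest are noise.
X, y = make_classification(n_samples=600, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Stand-in model; the paper's representative model is a voting ensemble.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one column at a time and record the drop in held-out AUROC.
result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                n_repeats=10, random_state=0)

# Rank features from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
ranked_features = [(feature_names[i], float(result.importances_mean[i]))
                   for i in ranking]
```

Permutation importance gives magnitude only; the direction of each feature's effect, as reported in Figure 4, requires an attribution method such as SHAP.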

Change of Model Performance with Variables
We assessed the performance metrics (Cohen's kappa, AUROC, F1-score, and balanced accuracy) of the VC model according to the number of features included (Figure 5). Each metric's performance was calculated as the variables with the highest feature importance were sequentially added. The model reached its plateau performance in all four metrics with approximately 21 variables.
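The sequential-addition procedure might look like the sketch below, which refits a model on the top-k features for increasing k and records the four metrics. The data are synthetic, a random forest stands in for the VC model, and the ranking uses the forest's impurity-based importances as an assumption; the study's own ranking method may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (balanced_accuracy_score, cohen_kappa_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic data (assumption); 4 of 10 columns carry signal.
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Rank features once, using a model fitted on all of them.
full_model = RandomForestClassifier(n_estimators=100, random_state=0)
full_model.fit(X_train, y_train)
order = np.argsort(full_model.feature_importances_)[::-1]

# Refit on the top-k features for k = 1..n and record each metric.
curve = []
for k in range(1, len(order) + 1):
    cols = order[:k]
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train[:, cols], y_train)
    prob = model.predict_proba(X_test[:, cols])[:, 1]
    pred = (prob >= 0.5).astype(int)
    curve.append({
        "n_features": k,
        "auroc": roc_auc_score(y_test, prob),
        "kappa": cohen_kappa_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "balanced_accuracy": balanced_accuracy_score(y_test, pred),
    })
```

Plotting each metric in `curve` against `n_features` would show where additional variables stop improving performance.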

Discussion
Based on the clinical presentation of patients in the ICU, physicians intuitively determine the possibility of successful weaning from MV before endotracheal intubation. However, expressing these intuitions as numerical values is challenging. In this study, we developed and validated well-performing machine learning-based models that predict successful weaning from MV before or immediately after intubation in critically ill patients with ventilator support. The prediction performance of our model was considerably better than that of conventional prognostic scoring models used for patients who need ICU support. This is the first study to develop a prognostic model that predicts relatively long-term outcomes (14 days) based on variables within a day of intubation. The model characteristics render it clinically pragmatic and facilitate improved discussions about end-of-life care or prolonged MV with a tracheostomy. Previous efforts to predict the prognosis of patients undergoing MV in ICUs have shown several drawbacks. Clark et al. suggested a model consisting of clinical variables (intubation in the ICU, tachycardia, renal dysfunction, acidemia, elevated creatinine, and decreased HCO3- concentration) to identify individuals who may need prolonged MV at the time of intubation 6,8. However, their model was derived from and validated in relatively small patient populations (99 and 225 patients, respectively), and the AUROC value for prolonged MV prediction was about 0.75. Moreover, patients who died within two weeks of starting MV were excluded from the analysis, leading to a selection bias. Several other studies exist, but they predicted only mortality 9 or short-term outcomes 10, or they did not consider death in model building 7.
Numerous clinical factors have been proposed in predictive models for patients with MV. The I-TRACH model previously extracted tachycardia, renal dysfunction, acidemia, and a decreased HCO3- concentration as the main variables for constructing a scoring system 8. In another study that reported the prediction of 30-day mortality, the essential features in the models were Acute Physiology and Chronic Health Evaluation II score, Charlson Comorbidity Index, use of norepinephrine, and base excess 9. In our study, lactate level and anion gap were the two most important predictors in the final VC model. Consistent with the findings of this study, several prior reports have emphasized the prognostic importance of lactate level and anion gap. The efficacy of early lactate-guided therapy in ICU patients has been reported 11. The anion gap, a surrogate for levels of unmeasured anions, has been reported in a meta-analysis as an indicator of mortality in critically ill patients 12.
We proposed the VC ensemble comprising RLRC, RFC, and CBC as our representative model. Historically, logistic and linear regression models have been used for prognosis tasks in clinical decisions concerning ICU-admitted patients 13. Still, they are not fit for predictor variables with skewed distributions and tend to overfit. To avoid these shortcomings, more complex modeling approaches have been proposed 14. First, RLRC does not rely on multivariate normality and equal within-group covariance matrices, but predictions require large-scale sample data for stable outcomes 15,16. Second, RFC works well on data with several input variables and improves its classification accuracy because it keeps bias low and reduces variance. Still, the interpretation is complex, and evaluation is slow 17. Third, CBC requires lower computational costs but shows better accuracy than other tree-based models and support vector machines 18,19. The three models (RLRC, RFC, and CBC) showed a tradeoff between precision and recall 20; therefore, the use of the VC ensemble method improves performance by reducing the variance component of prediction errors made by the contributing models 21.
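A three-member voting ensemble of this kind can be sketched as follows. The member models are assumptions: this section does not expand RLRC, RFC, or CBC, so a regularized logistic regression, a random forest, and scikit-learn's gradient boosting (a stand-in for whatever boosting library CBC refers to) are used. Soft voting is also an assumption; the text does not say whether votes or probabilities were combined.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data (roughly 80% positives), not MIMIC-IV.
X, y = make_classification(n_samples=800, n_features=12, n_informative=5,
                           weights=[0.2, 0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Hypothetical member models standing in for RLRC, RFC, and CBC.
members = [
    ("logistic", make_pipeline(StandardScaler(),
                               LogisticRegression(C=1.0, max_iter=1000))),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("boosting", GradientBoostingClassifier(random_state=0)),
]

# Soft voting averages the members' predicted probabilities.
voting = VotingClassifier(estimators=members, voting="soft")
voting.fit(X_train, y_train)

ensemble_prob = voting.predict_proba(X_test)
ensemble_auroc = roc_auc_score(y_test, ensemble_prob[:, 1])
```

Averaging probabilities across dissimilar learners is one way the variance reduction described above can arise, since the members' errors are only partly correlated.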
Apart from the thorough development of the ensemble model, a strength of our study is the relatively high number of included patients (n=23,242) provided by the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, which is a well-established open database derived from an ICU in the USA. This is the first study to establish a machine-learning model to predict the weaning probability based on MIMIC-IV data. Moreover, precise search terms like "intubation/invasive ventilation" and "ventilator type/mode" were utilized to establish a more stringent patient selection and outcome definition. Such criteria can provide an example for the selection processes of intubated patients for future studies using the MIMIC-IV database.
Despite our meaningful findings, there are some inherent limitations to our study. First, some imbalance in patient numbers was noted between those with and without successful weaning from MV (81.9% vs. 18.1%, respectively). Therefore, we presented various performance metrics, such as AUPRC values.
Second, although most variables were collected before the time of intubation, some variables were collected within 24 hours after intubation.However, we used variables that are not likely to change dramatically according to intubation (e.g., body temperature, blood urea nitrogen, underlying diseases).
In conclusion, we developed and validated a VC ensemble machine learning model that can effectively predict successful weaning from MV within 14 days before or immediately after intubation. Our study indicates that machine learning algorithms may facilitate clinical decision-making, such as identifying patients more likely to benefit from MV before or immediately after endotracheal intubation. This information can relieve the burden and aid doctors in suggesting appropriate management for patients at risk of endotracheal intubation in the ICU, notably for those who hesitate to decide on ventilator support or meaningless end-of-life care due to advanced age or the presence of several comorbidities.

Data Source
Data on patients requiring MV were obtained from the MIMIC-IV v1.0 database. MIMIC-IV is a well-known, large-scale, single-center (Beth Israel Deaconess Medical Center), open-access database covering 524,740 admissions of 382,278 patients to this center from 2008 to 2019 22. The relevant records include demographic data, ICD-9-CM codes, hourly vital signs and input/output, laboratory tests and microbiological culture results, imaging data, treatment procedures, medication administration, and survival data. The database also provides multiple severity-of-illness scores generated from physiologic and laboratory variables on the first day of each ICU admission. MIMIC-IV has several advantages over its previous version, MIMIC-III. Its data are relatively homogeneous because they are sourced entirely from the clinical information system iMDSoft MetaVision; the "procedure events" information, one of the primary sources of clinical information in the ICU, is entirely present; and the number of included patients is larger.

Selection of Participants
For meticulous patient selection, patients with "Intubation" and "Invasive ventilation" codes appearing at least once in the "procedure event" or "chart event" files were selected. Additionally, patients with "Ventilator type" and "Ventilator mode" codes appearing five times or more within 24 hours after the first code were also included (Supplementary Figure 2). The exclusion criteria were as follows: (i) age <18 or >100 years, (ii) pre-existing tracheostomy, and (iii) missing SOFA score and SAPS II.
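The exclusion criteria translate naturally into a table filter. The sketch below assumes a hypothetical cohort table whose column names are illustrative and are not MIMIC-IV field names. It also assumes that a patient missing either score is excluded, which the wording of criterion (iii) leaves open.

```python
import pandas as pd

# Hypothetical cohort table; column names are illustrative only.
cohort = pd.DataFrame({
    "subject_id": [1, 2, 3, 4, 5],
    "age": [17, 64, 101, 72, 55],
    "preexisting_tracheostomy": [False, False, False, True, False],
    "sofa": [4, 6, 5, 7, None],          # None marks a missing score
    "saps_ii": [30, 41, 38, 45, 33],
})

# Keep adults aged 18 to 100 without a prior tracheostomy and with both
# scores present (assumption: either missing score excludes the patient).
keep = (
    cohort["age"].between(18, 100)
    & ~cohort["preexisting_tracheostomy"]
    & cohort["sofa"].notna()
    & cohort["saps_ii"].notna()
)
eligible = cohort.loc[keep]
```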

Data Collection and Outcome Definition
We collected clinical and laboratory variables recorded before and closest to the initiation of MV. For patients who did not have the values before intubation, the nearest value was obtained within 24 hours of intubation. To minimize the impact of intubation on each variable, we selected variables that are less likely to change dramatically after intubation (Supplementary Appendix 1). Missing values were imputed with the median value, and the missing rates are depicted in Supplementary Figure 3. As a comparator, two severity-of-illness scores, the SOFA and SAPS II scores, were calculated using the codes from Google's BigQuery database (https://github.com/MIT-LCP/mimic-code/tree/main/mimiciv/concepts/score). The primary outcome was successful weaning within 14 days of intubation, defined as documented MV discontinuation without death.
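Median imputation and the outcome label can be sketched on a toy table. The column names are hypothetical. The outcome rule is one possible reading of "MV discontinuation without death": extubation within 14 days and no death within the same 14 days; the source does not spell out the exact rule.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy table with hypothetical column names; NaN marks a missing value
# or an event that was never recorded.
patients = pd.DataFrame({
    "lactate": [1.2, np.nan, 4.5, 2.0],
    "anion_gap": [12.0, 18.0, np.nan, 14.0],
    "days_to_extubation": [3.0, 20.0, 5.0, np.nan],
    "days_to_death": [np.nan, np.nan, 4.0, 10.0],
})

# Replace each missing feature value with that column's median.
features = ["lactate", "anion_gap"]
imputer = SimpleImputer(strategy="median")
patients[features] = imputer.fit_transform(patients[features])

# Assumed outcome rule: weaned within 14 days and not dead within 14 days.
# Comparisons against NaN evaluate to False, so unrecorded events drop out.
weaned = patients["days_to_extubation"] <= 14
died = patients["days_to_death"] <= 14
patients["successful_weaning"] = (weaned & ~died).astype(int)
```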

Model Development
Several machine learning algorithms were used to develop predictive models, such as RLRC 16, RFC 17, CBC 19, and VC ensembles 21. The AUROC, AUPRC, Cohen's kappa, and F1-score in the entire dataset were calculated by five-fold cross-validation for each model, and they are presented as averages with 95% CIs.
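Five-fold cross-validated metrics with an interval might be computed as below. The data are synthetic and a logistic regression stands in for the study's models. The 95% CI here is a t-based interval over the five fold scores, which is an assumption; the text does not state how the intervals were derived.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic data (assumption), not the study cohort.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Stratified folds preserve the class ratio in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = {
    "auroc": "roc_auc",
    "auprc": "average_precision",
    "kappa": make_scorer(cohen_kappa_score),
    "f1": "f1",
}
scores = cross_validate(LogisticRegression(max_iter=1000), X, y,
                        cv=cv, scoring=scoring)


def mean_ci(values, level=0.95):
    """Return (mean, lower, upper) using a t-interval over fold scores."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    half = stats.t.ppf(0.5 + level / 2, df=len(values) - 1) * stats.sem(values)
    return mean, mean - half, mean + half


summary = {name: mean_ci(scores[f"test_{name}"]) for name in scoring}
```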
Likewise, the performance parameters for the SOFA and SAPS II scores were calculated, and the AUROCs of each model were compared using the DeLong test 23. The VC model's confusion matrix was presented using the threshold value that maximized Cohen's kappa. To determine the optimal hyperparameter setting, the GridSearchCV library (version 0.22) was used to search multiple optimal parameter values to fit estimators automatically. We calibrated the predictions using isotonic regression to verify that the predicted probability reflected the expected probability of successful weaning 24. Finally, to better understand how individual variables impact the outcome prediction, permutation feature importance and SHAP analyses were conducted on the best-performing model 17,25. For model development and validation, Python (3.6.9) and its packages such as NumPy (1.19.5) 26, pandas (1.1.5) 27, scikit-learn (0.23.2) 28, matplotlib (3.3.4) 29, seaborn (0.11.2) 30, rpy2 (3.4.5) (https://github.com/rpy2/rpy2), scipy (1.5.4) 31, and shap (0.41.0) 25, as well as R (version 3.4.4) 32 and its package pROC (1.18.0) 23, were used 33.
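Isotonic calibration followed by a kappa-maximizing threshold can be sketched as follows. The data are synthetic, a random forest stands in for the VC model, and the threshold grid is hypothetical. For brevity the threshold is chosen on the same held-out split used to report the matrix, which is a simplification and not a description of the study's procedure.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data (assumption).
X, y = make_classification(n_samples=800, n_features=10,
                           weights=[0.2, 0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Wrap the stand-in model so its probabilities are recalibrated with
# isotonic regression fitted on internal cross-validation folds.
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    method="isotonic", cv=5)
calibrated.fit(X_train, y_train)
prob = calibrated.predict_proba(X_test)[:, 1]

# Scan a hypothetical threshold grid and keep the kappa-maximizing cut-off.
thresholds = np.linspace(0.05, 0.95, 19)
kappas = [cohen_kappa_score(y_test, (prob >= t).astype(int))
          for t in thresholds]
best_threshold = float(thresholds[int(np.argmax(kappas))])

# Confusion matrix at the selected threshold (rows: true, columns: predicted).
matrix = confusion_matrix(y_test, (prob >= best_threshold).astype(int))
```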

Other statistical considerations
To compare the baseline characteristics, categorical variables were presented as total numbers (percentages) and compared using Fisher's exact test. Continuous variables were presented as means ± standard deviations and compared using the Wilcoxon rank-sum test. All statistical analyses in this study were performed using Google BigQuery, Python (version 3.6.9), and R (version 3.4.4), and p-values <0.05 were considered statistically significant.

Patients with evidence of endotracheal intubation were identified in the MIMIC-IV database. After careful selection, patients were divided into two groups according to whether they had been successfully weaned from MV within 14 days of intubation or not. Abbreviations: SOFA, Sequential Organ Failure Assessment; SAPS II, Simplified Acute Physiology Score II; MV, mechanical ventilation.
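The two group comparisons can be sketched with SciPy. The counts and the simulated lactate values are hypothetical and do not come from the study; they only show which call matches which test.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table. Rows: weaned / not weaned.
# Columns: comorbidity present / absent.
table = np.array([[30, 70],
                  [45, 55]])
odds_ratio, p_categorical = stats.fisher_exact(table)

# Hypothetical continuous variable simulated for the two groups.
rng = np.random.default_rng(0)
lactate_weaned = rng.normal(loc=1.8, scale=0.6, size=100)
lactate_not_weaned = rng.normal(loc=2.6, scale=0.9, size=100)

# Wilcoxon rank-sum test for two independent samples.
statistic, p_continuous = stats.ranksums(lactate_weaned, lactate_not_weaned)

# Apply the 0.05 significance level stated in the text.
significant = {
    "categorical": bool(p_categorical < 0.05),
    "continuous": bool(p_continuous < 0.05),
}
```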

Tables
Performance metrics of the ensemble voting classifier model according to the addition of explanatory variables.

Figures
Figure 1 Flowchart

Figure 3 Important

Figure 4 The

Table 1.
Baseline Characteristics of Patients in the Intensive Care Unit According to Successful Weaning from Mechanical Ventilation within 14 Days

Table 2.
Performance Metrics of the Developed Machine Learning Models, along with SOFA score and SAPS II. Values are presented as means (95% confidence intervals) calculated from 5-fold cross-validation. Hypothesis tests were conducted to determine whether the AUROC values of the models using machine learning algorithms were equal to those of conventional scores. * P-value <0.001 compared to SOFA score. † P-value <0.001 compared to SAPS II. Abbreviations: AUROC, area under the receiver operating characteristics curve; AUPRC, area under the precision-recall curve; VC, voting classifier; CBC,