Objective: This study aimed to develop an explainable model for predicting mechanical ventilation (MV) duration in patients with acute respiratory distress syndrome (ARDS) using a machine learning (ML) approach.
Method: Of the 4443 patients with ARDS in the Medical Information Mart for Intensive Care-IV database, 2702 were selected to construct feature set A (age at admission, BMI, Acute Physiology Score III [APS-III], Sequential Organ Failure Assessment [SOFA] score, and other features at MV initiation), and 2228 patients remained to construct feature sets B (age, APS-III, SOFA score, and remaining features after 24 hours of MV) and C (A+B). Each feature set was randomly split into a training cohort (70%) and a testing cohort (30%). Tenfold cross-validation was conducted on the training cohort to determine the best-performing model, which was then assessed in the corresponding testing cohort and explained using SHapley Additive exPlanations (SHAP).
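The split-and-validate procedure described in the Method can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the helper names (`split_indices`, `kfold`, `cv_rmse`, `mean_baseline`) are hypothetical, and a mean-only baseline stands in for the six algorithms that were actually compared. It only shows the mechanics of a 70/30 split followed by tenfold cross-validation scored by RMSE.

```python
import math
import random
import statistics

def split_indices(n, test_frac=0.3, seed=0):
    """Shuffle 0..n-1 and return (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]

def kfold(indices, k=10):
    """Yield (fit, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, val

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mean_baseline(y_fit, n_val):
    """Stand-in model (assumption): predict the mean of the fitting targets."""
    return [statistics.mean(y_fit)] * n_val

def cv_rmse(y, train_idx, predictor, k=10):
    """Return the mean and SD of the per-fold RMSE on the training cohort."""
    scores = []
    for fit, val in kfold(train_idx, k):
        preds = predictor([y[i] for i in fit], len(val))
        scores.append(rmse([y[i] for i in val], preds))
    return statistics.mean(scores), statistics.stdev(scores)

# Synthetic MV durations in days; these are NOT values from the study.
rng = random.Random(42)
durations = [rng.expovariate(1 / 7.0) for _ in range(2702)]

train_idx, test_idx = split_indices(len(durations))
cv_mean, cv_sd = cv_rmse(durations, train_idx, mean_baseline)
print(f"CV RMSE = {cv_mean:.2f} days (SD = {cv_sd:.2f} days)")
```

In a real comparison, each candidate algorithm would replace `mean_baseline`, and the one with the lowest cross-validated RMSE would be refit on the full training cohort before being evaluated once on the testing cohort.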
Result: The tenfold cross-validation results indicated that, among six algorithms, the Extreme Gradient Boosting model had the best performance on the training set (root-mean-square error [RMSE]=5.78 days [SD=0.52 days]). The Bland–Altman plot and paired-sample t-test results indicated that the predicted and actual values of the optimal model were consistent, with RMSE=6.85 days. The SHAP results indicated that the three most important features for the model were APS-III, age, and BMI, and that the interaction between APS-III and age had a clear effect on the SHAP values.
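For readers unfamiliar with the agreement checks named in the Result, the following sketch computes the quantities behind a Bland–Altman plot (mean bias and 95% limits of agreement) and the paired-sample t statistic. The four data points are invented for illustration and the function names are hypothetical; nothing here reproduces the study's values.

```python
import math
import statistics

def bland_altman(actual, predicted):
    """Return (bias, lower limit, upper limit) using bias +/- 1.96 SD of the differences."""
    diffs = [p - a for a, p in zip(actual, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def paired_t_statistic(actual, predicted):
    """Paired-sample t statistic: mean difference divided by its standard error."""
    diffs = [p - a for a, p in zip(actual, predicted)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Toy MV durations in days (illustrative only).
actual_days = [2.0, 4.0, 6.0, 8.0]
predicted_days = [3.0, 3.0, 7.0, 7.0]

bias, lower, upper = bland_altman(actual_days, predicted_days)
t_stat = paired_t_statistic(actual_days, predicted_days)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}], t = {t_stat:.2f}")
```

A bias near zero with a t statistic close to zero suggests no systematic over- or under-prediction, while the width of the limits of agreement shows how far individual predictions may deviate.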
Conclusion: ML models can accurately predict the MV duration of patients with ARDS in intensive care units. The feature set based on data at MV initiation had better predictive performance than the feature set based on data 24 hours after MV initiation.