Internal and External Validation of Machine Learning-Assisted Prediction Models for Mechanical Ventilation-Associated Severe Acute Kidney Injury

Preventive and therapeutic strategies for mechanical ventilation (MV)-associated severe acute kidney injury (AKI) remain limited. We developed clinical prediction models to give early warning of severe AKI during the first week of ICU stay after the initiation of MV. This was a retrospective analysis of a large ICU database (MIMIC-IV). Data were collected from the clinical information available on ICU admission and during the first 12 hours of MV. Predictors were selected successively using univariable and multivariable analyses. Two algorithms, logistic regression and random forest, were compared for model development. The primary outcomes were the development of AKI stage 2 or 3 (AKI-23) and AKI stage 3 (AKI-3) during the first week of ICU stay after the first 12 hours of MV. The developed models were externally validated in another multi-center ICU database (eICU) and evaluated in different patient subpopulations.

Conclusions
MV-associated severe AKI can be predicted early with machine learning models based on routine clinical information. The validated models are available at: https://apoet.shinyapps.io/mv_aki_2021_V2.0/

Background
Mechanical ventilation (MV) is a life-saving intervention that is widely required in the intensive care unit (ICU). It is also regarded as one of the most important risk factors for acute kidney injury (AKI) in critically ill patients, increasing the odds of AKI nearly threefold (1). Severe AKI, defined by a doubling of serum creatinine or the need for acute dialysis, is independently associated with up to a five-fold higher risk of death (2,3). Patients with both MV and severe AKI have a mortality rate of approximately 60-80% (3)(4)(5)(6)(7). Therapeutic strategies for MV-induced severe AKI remain limited (6), partly because severe AKI is identified late: serum creatinine (SCr) typically rises only 2-3 days after the initial injury. With no effective treatment for AKI available, prevention becomes critical, since it could substantially improve the poor outcomes of AKI in the ICU (8). Identifying patients at increased risk of MV-induced severe AKI before a detectable rise in SCr would give intensivists a critical time window for early evaluation and better-informed decisions about changing the intervention strategies used in a ventilated patient (9,10).
Experimental studies have explored various pathophysiological processes that may explain the interactions between MV and AKI (11,12). However, no method currently gives clinicians early information on the risk of severe AKI after the initiation of MV (13,14). The growing secondary analysis of large electronic datasets of critically ill patients offers a new opportunity to predict the occurrence of severe AKI in the ICU (15)(16)(17)(18)(19)(20). It has been widely suggested that prediction models built on clinical data and machine learning could predict AKI earlier, and be more applicable and cost-effective, than the currently delayed diagnostic criteria and the limited available biomarkers. We therefore hypothesized that machine learning could enable early prediction of severe AKI specifically in mechanically ventilated patients, who differ from non-ventilated patients, warrant closer attention, and consume substantial medical resources.
The aim of this study was to develop early clinical prediction models for the occurrence of severe AKI during the first week of ICU stay after the initiation of MV, using a large ICU database with a general patient population and machine learning algorithms. The developed models were then externally validated in another large ICU database and in different subpopulations.

Study design and cohorts
The prediction models were developed and internally validated by retrospective analysis of a single-center, publicly available ICU database (MIMIC-IV, v0.…). Both databases contained most routine clinical data, including demographics, vital signs, laboratory tests, diagnoses, medications, and procedures. Data collection was passive and had no impact on patient safety. The databases were de-identified in compliance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Secondary analysis of both databases was approved by the team of the Laboratory for Computational Physiology at the Massachusetts Institute of Technology (MIT-LCP), which contributed and maintains the MIMIC-IV and eICU databases.
All patients in MIMIC-IV and eICU were screened for this study, and only data from the first ICU admission of the first hospitalization were used (24). The inclusion criteria were: 1) age ≥ 18 years; 2) ICU stay of at least 24 consecutive hours; 3) invasive MV, including ventilation via endotracheal intubation, started on or after ICU admission and lasting at least 12 consecutive hours. The exclusion criteria were: 1) a clinical history of end-stage renal disease (ESRD); 2) baseline SCr ≥ 4 mg/dL; 3) dialysis or renal replacement therapy (RRT) within 24 hours after ICU admission; 4) SCr measurements inadequate to stage AKI (SCr missing on day 1 or missing for more than 48 hours); 5) insufficient data collection.
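The inclusion and exclusion criteria amount to a per-stay filter. The sketch below expresses them in Python over a hypothetical `IcuStay` summary record; the field names are illustrative assumptions, not the MIMIC-IV or eICU schema, and restriction to the first ICU admission of the first hospitalization is assumed to happen upstream.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IcuStay:
    # Hypothetical per-stay summary; these fields are NOT the MIMIC-IV/eICU schema.
    age_years: float
    icu_los_hours: float
    mv_start_hours: Optional[float]  # invasive MV start relative to ICU admission; None if no MV
    mv_duration_hours: float
    has_esrd: bool
    baseline_scr_mg_dl: Optional[float]
    rrt_within_24h: bool
    scr_on_day1: bool
    longest_scr_gap_hours: float
    sufficient_data: bool


def is_eligible(stay: IcuStay) -> bool:
    """Apply the inclusion and exclusion criteria to one (first) ICU stay."""
    included = (
        stay.age_years >= 18
        and stay.icu_los_hours >= 24
        and stay.mv_start_hours is not None
        and stay.mv_start_hours >= 0  # MV started on or after ICU admission
        and stay.mv_duration_hours >= 12
    )
    excluded = (
        stay.has_esrd
        or (stay.baseline_scr_mg_dl is not None and stay.baseline_scr_mg_dl >= 4.0)
        or stay.rrt_within_24h
        or not stay.scr_on_day1  # SCr missing on day 1
        or stay.longest_scr_gap_hours > 48  # SCr missing for more than 48 hours
        or not stay.sufficient_data
    )
    return included and not excluded


example = IcuStay(age_years=67, icu_los_hours=96, mv_start_hours=2.5,
                  mv_duration_hours=30, has_esrd=False, baseline_scr_mg_dl=1.1,
                  rrt_within_24h=False, scr_on_day1=True,
                  longest_scr_gap_hours=24, sufficient_data=True)
print(is_eligible(example))  # True
```

Keeping inclusion and exclusion as two separate boolean expressions mirrors the way the criteria are listed, which makes it easy to count how many stays each rule removes when drawing a cohort flow diagram.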

Severe AKI definitions
AKI was staged daily during the ICU stay according to the SCr criteria of Kidney Disease: Improving Global Outcomes (KDIGO) (25). Severe AKI was defined as AKI stage 2 or 3 according to the 48-hour and 7-day KDIGO SCr criteria (25). The mean of the SCr measurements recorded before ICU admission was used as the baseline SCr; in patients without a baseline value, the SCr measured on ICU admission was imputed (26). The dialysis/RRT criterion was not used for staging, because the models were subsequently evaluated for predicting the requirement for dialysis or RRT during the ICU stay.
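A minimal Python sketch of the SCr-only staging and baseline rule follows. It assumes SCr in mg/dL, measures the 0.3 mg/dL rise against the lowest value in the preceding 48 hours, applies the ratio criterion to the baseline within the 7-day window, and, as in the text, omits the RRT criterion; the function names are illustrative.

```python
from typing import Optional, Sequence


def baseline_scr(pre_icu_values: Sequence[float], admission_value: float) -> float:
    """Mean pre-ICU SCr as baseline; fall back to the ICU-admission SCr if none exists."""
    if pre_icu_values:
        return sum(pre_icu_values) / len(pre_icu_values)
    return admission_value


def kdigo_scr_stage(current: float, baseline: float,
                    min_prior_48h: Optional[float] = None) -> int:
    """KDIGO AKI stage (0-3) from serum creatinine only (mg/dL); RRT criterion omitted."""
    ratio = round(current / baseline, 6)  # rounding guards against float noise at cutoffs
    if ratio >= 3.0 or current >= 4.0:
        return 3
    if ratio >= 2.0:
        return 2
    rise_48h = None if min_prior_48h is None else round(current - min_prior_48h, 6)
    if ratio >= 1.5 or (rise_48h is not None and rise_48h >= 0.3):
        return 1
    return 0


base = baseline_scr([0.8, 1.0], admission_value=1.3)  # mean of pre-ICU values = 0.9
print(kdigo_scr_stage(1.9, base))  # ratio ~2.11 -> 2
```

Under this sketch, a stay counts as severe AKI on a given day whenever the returned stage is 2 or higher.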

Prediction tasks performed
The primary prediction task of this study was to predict the development of severe AKI during the first week of ICU stay after the first 12 hours of MV. The two primary outcomes were defined as follows:

1) AKI-23: prediction of the occurrence of AKI stage 2 or 3 during the first week of ICU stay after the first 12 hours of MV.
2) AKI-3: prediction of the first occurrence of AKI stage 3 during the first week of ICU stay after the first 12 hours of MV.
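Given stage assessments over time, the two outcome labels could be derived as below. This sketch encodes our reading of the definitions, not code from the study: stages are timestamped in hours since ICU admission, the window opens 12 hours after MV initiation and closes at the end of ICU day 7 (hour 168), and any stage 3 inside that window counts for AKI-3.

```python
from typing import Dict, Iterable, Tuple


def outcome_labels(stages: Iterable[Tuple[float, int]], mv_start_h: float,
                   week_end_h: float = 168.0, lead_h: float = 12.0) -> Dict[str, bool]:
    """Label AKI-23 and AKI-3 from (hours_since_icu_admission, kdigo_stage) pairs."""
    window_start = mv_start_h + lead_h  # predictions are made after 12 h of MV
    in_window = [stage for t, stage in stages if window_start < t <= week_end_h]
    worst = max(in_window, default=0)
    return {"AKI-23": worst >= 2, "AKI-3": worst >= 3}


# Stage 3 at hour 200 falls outside the first ICU week, so only AKI-23 is positive.
print(outcome_labels([(6, 1), (30, 2), (200, 3)], mv_start_h=2))
# {'AKI-23': True, 'AKI-3': False}
```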

Prediction models
For each prediction task, one model was developed from the clinical information available on ICU admission and during the first 12 hours after MV initiation. Models were developed and internally validated using logistic regression and random forest algorithms, and the better-performing algorithm was chosen for further investigation. Both algorithms were run in R version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria), with random forest models fitted using the R package "randomForest" (20).

Predictor selection
For each patient, most routinely available ICU data known to be associated with MV and AKI were used as candidate predictors, based on literature review and availability in the databases (17,26). All potential predictors in the development cohort entered the selection process. The final set of predictors for each model was determined successively by univariable and multivariable analyses: a variable was retained as a final predictor only when its association with severe AKI was statistically significant at the 0.10 level in both analyses. Univariable and multivariable analyses were performed in R version 3.5.1.
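The two-stage rule can be written as a small filter. In this sketch the univariable p-values and the multivariable fitting routine are supplied by the caller (in the study, both came from analyses in R); `fit_multivariable` is a hypothetical callback that fits one model on the surviving candidates and returns each variable's p-value, and the variable names and p-values in the example are made up.

```python
from typing import Callable, Dict, List


def select_predictors(univariable_p: Dict[str, float],
                      fit_multivariable: Callable[[List[str]], Dict[str, float]],
                      alpha: float = 0.10) -> List[str]:
    """Keep variables significant at `alpha` in univariable AND multivariable analysis."""
    # Stage 1: screen each candidate on its own.
    candidates = [name for name, p in univariable_p.items() if p < alpha]
    # Stage 2: fit one multivariable model on the survivors; keep those still below alpha.
    multivariable_p = fit_multivariable(candidates)
    return [name for name in candidates if multivariable_p.get(name, 1.0) < alpha]


# Hypothetical p-values, for illustration only.
uni = {"platelets": 0.001, "phosphorus": 0.02, "age": 0.45, "peep": 0.08}


def fake_fit(names: List[str]) -> Dict[str, float]:
    # Stand-in for a real multivariable logistic regression (hypothetical output).
    return {"platelets": 0.01, "phosphorus": 0.04, "peep": 0.30}


print(select_predictors(uni, fake_fit))  # ['platelets', 'phosphorus']
```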

Model development and validation
Predictor selection and model development were performed in the development cohort only. Model performance and stability were assessed by 10-fold cross-validation in the development cohort. The performance of the different models was then compared in the internal validation cohort to choose the better machine learning algorithm, after which the models were validated in the external cohort. The models were also evaluated separately in cardiac, respiratory, neurological, and septic patients, and for predicting the requirement for dialysis or RRT during the ICU stay. Model development and internal and external validation were performed in R version 3.5.1.
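Ten-fold cross-validation partitions the development cohort into ten disjoint folds, each held out once while the model is trained on the other nine. A minimal sketch of the index bookkeeping follows, assuming simple shuffled, unstratified folds; the study does not state whether folds were stratified, and in practice a library routine would usually do this.

```python
import random
from typing import List, Tuple


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits of range(n); each index is tested exactly once."""
    order = list(range(n))
    random.Random(seed).shuffle(order)  # seeded for reproducibility
    folds = [order[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


splits = kfold_indices(95, k=10)
print([len(test) for _, test in splits])  # [10, 10, 10, 10, 10, 9, 9, 9, 9, 9]
```

Averaging a performance measure such as the AUC over the ten held-out folds gives the cross-validated estimate, and its spread across folds gives a rough sense of stability.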

Performance evaluation and statistical analysis
Continuous variables were expressed as median and interquartile range (IQR). Differences between two groups were tested using the Mann-Whitney U test or Student's t-test, as appropriate. Categorical variables were expressed as absolute (n) and relative (%) frequencies and compared using the chi-square test or Fisher's exact test, as appropriate. Multivariable analysis was performed using logistic regression. A two-sided P value of less than 0.05 was considered statistically significant. To compare the performance criteria of models in the same patient subset, the differences and corresponding bootstrap confidence intervals (CIs) were computed; a difference was declared statistically significant at the 0.05 level if its 95% CI excluded 0.
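The bootstrap comparison can be sketched as a paired percentile bootstrap: patients are resampled with replacement, both models are scored on the same resample, and the 2.5th and 97.5th percentiles of the AUC differences form the 95% CI. The percentile method, the default of 2,000 resamples, and the pairwise (Mann-Whitney) AUC are our assumptions; the text does not specify them.

```python
import random
from typing import Sequence, Tuple


def auc(y: Sequence[int], score: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: P(score_pos > score_neg), ties counted as 1/2."""
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(y: Sequence[int], score_a: Sequence[float],
                             score_b: Sequence[float], n_boot: int = 2000,
                             seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate and 95% percentile CI of AUC(a) - AUC(b) on the same patients."""
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both outcome classes to define an AUC
            diffs.append(auc(ys, [score_a[i] for i in idx])
                         - auc(ys, [score_b[i] for i in idx]))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return auc(y, score_a) - auc(y, score_b), lower, upper
```

Under this sketch, the difference would be declared significant at the 0.05 level when `lower > 0 or upper < 0`, i.e. when the interval excludes 0.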
To assess the developed models, both discrimination and calibration were reported (20,27). Discrimination refers to how well the predictions separate patients with and without the outcome; it is commonly evaluated with the receiver operating characteristic (ROC) curve and quantified by the area under the ROC curve (AUC).
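To make the ROC/AUC definition concrete, the sketch below lowers the classification threshold past each distinct predicted risk, records the (false-positive rate, true-positive rate) pairs that trace the ROC curve, and integrates them with the trapezoidal rule. With tied scores moved across the threshold together, this area equals the probability that a randomly chosen patient with the outcome receives a higher predicted risk than one without it. It is an illustrative implementation, not the code used in the study.

```python
from typing import List, Sequence, Tuple


def roc_points(y: Sequence[int], score: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs as the threshold is lowered past each distinct score."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    ranked = sorted(zip(score, y), reverse=True)  # highest predicted risk first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Move all patients sharing this score across the threshold together.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(pts))  # 0.75
```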
Additional measures of discrimination were also reported, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), at the classification thresholds that maximized sensitivity and specificity in the development cohort. The classification threshold is the cutoff of predicted probability above which a patient is considered 'positive' for the outcome. In the reported tables, and as the default in the online calculator, we chose the cutoff giving the highest combined sensitivity and specificity in a population with similar characteristics (as determined for all patients in the development cohort). This cutoff corresponds to the optimal point on the ROC curve, i.e., the maximum Youden index. All statistical analyses were performed in R version 3.5.1.

… Table 1 and Supplemental Table 1. Information about the 7,458 patients excluded from MIMIC-IV for insufficient data is provided in Supplemental Table 2.
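The Youden-index threshold rule can be sketched as follows: every distinct predicted probability in the development cohort is tried as a cutoff, the one maximizing Youden's J = sensitivity + specificity - 1 is kept, and that fixed cutoff is then applied unchanged to other cohorts to report sensitivity, specificity, PPV, and NPV. This sketch counts a probability at or above the cutoff as positive and breaks ties toward the lower cutoff; both are assumptions, and the data are toy values.

```python
from typing import Dict, Sequence


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def confusion_metrics(y: Sequence[int], prob: Sequence[float],
                      threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when prob >= threshold counts as positive."""
    tp = sum(1 for yi, p in zip(y, prob) if yi == 1 and p >= threshold)
    fn = sum(1 for yi, p in zip(y, prob) if yi == 1 and p < threshold)
    fp = sum(1 for yi, p in zip(y, prob) if yi == 0 and p >= threshold)
    tn = sum(1 for yi, p in zip(y, prob) if yi == 0 and p < threshold)
    return {"sensitivity": _ratio(tp, tp + fn), "specificity": _ratio(tn, tn + fp),
            "ppv": _ratio(tp, tp + fp), "npv": _ratio(tn, tn + fn)}


def youden_threshold(y: Sequence[int], prob: Sequence[float]) -> float:
    """Cutoff (among observed probabilities) maximizing sensitivity + specificity - 1."""
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(prob)):
        m = confusion_metrics(y, prob, cut)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_cut, best_j = cut, j
    return best_cut


dev_y = [0, 0, 0, 1, 1, 1]
dev_p = [0.10, 0.20, 0.30, 0.40, 0.60, 0.70]
cut = youden_threshold(dev_y, dev_p)
print(cut)  # 0.4
# Apply the development-cohort cutoff unchanged to a (toy) validation cohort.
print(confusion_metrics([0, 1, 1, 0], [0.5, 0.45, 0.2, 0.1], cut))
# {'sensitivity': 0.5, 'specificity': 0.5, 'ppv': 0.5, 'npv': 0.5}
```

Raising the cutoff above the Youden point trades sensitivity for specificity and PPV, which is the trade-off behind offering alternative thresholds.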

Variable selection
We extracted a set of 57 candidate variables from the periods before and at ICU admission and after MV initiation, including demographics, primary diagnosis, comorbidities, vital signs, ventilator parameters, additional hemodynamic support, and laboratory values. Details of the candidate predictors are reported in Supplemental Table 3. Univariable associations with AKI-23 and AKI-3 in the development cohort are listed in Supplemental Tables 4-5, and the predictors selected by subsequent multivariable analysis for AKI-23 and AKI-3 model development are shown in Supplemental Tables 6-7. In total, 12 and 8 variables were selected as predictors for the AKI-23 and AKI-3 models, respectively.

Development cohort model performance
In the development cohort, the AKI-23 and AKI-3 models based on random forest outperformed those based on logistic regression. The AKI-23 model achieved AUCs of 0.77 (95% CI 0.69-0.84) with logistic regression and 0.82 (95% CI 0.76-0.88) with random forest. The AKI-3 model achieved AUCs of 0.79 (95% CI 0.70-0.88) and 0.89 (95% CI 0.84-0.95) with logistic regression and random forest, respectively. At the maximum Youden index, the PPVs of the AKI-23 model were 0.25 (95% CI 0.20-0.36) for logistic regression and 0.39 (95% CI 0.17-0.53) for random forest, and those of the AKI-3 model were 0.21 (95% CI 0.11-0.32) and 0.27 (95% CI 0.14-0.40), respectively. The other discrimination results for the AKI-23 and AKI-3 models in the development cohort, for both algorithms, are shown in Table 2.
Random forest was therefore selected as the final algorithm for model development. To improve specificity or PPV, other classification thresholds within the range of clinical usefulness could be chosen (Supplemental Table 8).

Internal validation cohort model performance
The performance of the random forest AKI-23 and AKI-3 models in the internal validation cohort is reported in Table 3. The AUCs were 0.78 (95% CI 0.74-0.82) for AKI-23 and 0.81 (95% CI 0.76-0.87) for AKI-3. The classification thresholds identified in the development cohort (23.5% for AKI-23 and 11.9% for AKI-3) remained robust, yielding similar sensitivities, specificities, PPVs, and NPVs in the internal validation cohort.
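Calibration slope and calibration-in-the-large, the calibration summaries used for these models, are commonly obtained by logistic recalibration: the observed outcome is regressed on the logit of the predicted probability, the fitted coefficient is the slope (ideal value 1), and the intercept fitted with the slope fixed at 1 is the calibration-in-the-large (ideal value 0). The sketch below fits both by Newton-Raphson in plain Python; it assumes predicted probabilities strictly between 0 and 1, uses toy data, and is not the study's R code.

```python
import math
from typing import Sequence


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def calibration_slope(y: Sequence[int], prob: Sequence[float], iters: int = 50) -> float:
    """Slope b of the logistic model logit P(y=1) = a + b * logit(prob)."""
    lp = [_logit(p) for p in prob]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for yi, x in zip(y, lp):
            mu = _sigmoid(a + b * x)
            w = mu * (1.0 - mu)
            g_a += yi - mu  # score (gradient of the log-likelihood)
            g_b += (yi - mu) * x
            h_aa += w  # Fisher information matrix entries
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det  # solve the 2x2 Newton system
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < 1e-12:
            break
    return b


def calibration_in_the_large(y: Sequence[int], prob: Sequence[float], iters: int = 50) -> float:
    """Intercept a of logit P(y=1) = a + logit(prob), i.e. slope fixed at 1 (offset)."""
    lp = [_logit(p) for p in prob]
    a = 0.0
    for _ in range(iters):
        mus = [_sigmoid(a + x) for x in lp]
        step = sum(yi - m for yi, m in zip(y, mus)) / sum(m * (1.0 - m) for m in mus)
        a += step
        if abs(step) < 1e-12:
            break
    return a


# Toy data: observed event rates of 1/5 and 4/5 in two risk groups.
y = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
calibrated = [0.2] * 5 + [0.8] * 5
overconfident = [1 / 17] * 5 + [16 / 17] * 5  # logit doubled: predictions too extreme
print(round(calibration_slope(y, calibrated), 3))     # 1.0
print(round(calibration_slope(y, overconfident), 3))  # 0.5
```

A slope above 1 indicates predictions that are too moderate and a slope below 1 predictions that are too extreme, which is how slopes such as 1.13 or 0.90 are usually read.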
The AKI-23 and AKI-3 models were well calibrated, with calibration slopes of 1.13 and 1.08, respectively, in the development cohort and 1.04 and 0.90 in the internal validation cohort. Calibration-in-the-large was close to 0, and the calibration curves were close to the diagonal (Supplemental Figure 2).

External validation cohort model performance
… Table 11.
Online MV-associated severe AKI prediction models
We made the fine-tuned MV-associated AKI-23 and AKI-3 models publicly available through our online portal at https://apoet.shinyapps.io/mv_aki_2021_V2

Discussion
The adverse relationship between MV and kidney function was first observed several decades ago, and it has become a critical global challenge because management options remain very limited (1,28,29). One reason for the lack of preventive and therapeutic strategies is the absence of a method that provides valuable early warning of AKI risk after the initiation of MV (17,30). Because severe AKI is strongly associated with poor clinical outcomes (2,3,31), in this study we developed and validated clinical prediction models for early prognostication of MV-associated AKI-23 and AKI-3 using large databases. Predicting severe AKI is relevant to clinical practice and research because it enables risk stratification; in clinical trials, it might also aid stratification or reduce heterogeneity. Higher-risk patients require special attention and might benefit from earlier preventive management.
Machine learning-based clinical prediction for prognostication or decision support in critical care related to MV or AKI has been studied extensively and has produced many cost-effective models (20,24,26,(30)(31)(32)(33)(34)(35)(36)(37)(38)(39)(40)(41)(42)(43). However, clinical prediction models for the development of severe AKI after the initiation of MV are scarce. Our results demonstrate that the development of AKI-23 and AKI-3, as defined by the KDIGO SCr criteria, during the first week of ICU stay after the first 12 hours of MV could be predicted accurately from clinical information alone. Although relatively few predictors were selected, the models performed well by the evaluation criteria in internal and external validation (20), and also in different patient subpopulations. The models also performed well in predicting the associated requirement for dialysis or RRT during the ICU stay.
A systematic review and meta-analysis established that invasive MV is a risk factor for AKI in critically ill patients (1), yet the relationship between MV-related parameters and AKI remains unclear. That meta-analysis, which included many observations with different levels of tidal volume and positive end-expiratory pressure (PEEP), found no association between MV-related parameters and AKI (1). Similarly, another retrospective study found no significant association with early or late AKI (12). In our multivariable analysis, we likewise found no strong association between MV-related parameters and AKI-23 or AKI-3. However, another study of patients with acute respiratory distress syndrome in the MIMIC-III database found that PEEP was the only respiratory variable with a direct causal association with severe AKI (44). Although PEEP may reduce cardiac output and increase central venous pressure, which could diminish renal blood flow, free water clearance, or the glomerular filtration rate, this effect should be further evaluated and validated in multivariable models that include other potentially correlated variables (11). Therefore, the relationship between MV-related parameters and severe AKI requires further investigation.
Among non-MV-related risk factors, and consistent with a previous study (12), the platelet count at the initiation of MV was strongly associated with AKI-23 and AKI-3. We also found that the lowest diastolic blood pressure and the serum phosphorus and magnesium levels in the first 12 hours after MV initiation were associated with AKI-23 and AKI-3 in multivariable analysis. These risk factors may be related to the underlying pathophysiological mechanisms of MV-kidney interactions and could guide further investigation (45). They could also serve as clinical indicators for early risk assessment of severe AKI in ventilated patients.
We made the developed and validated AKI-23 and AKI-3 models publicly available as an online prognostic calculator. Because identifying ventilated patients at high risk of severe AKI before a detectable rise in SCr opens a critical time window, clinicians could halt or reverse ongoing renal injury in time (46). Furthermore, different classification thresholds are likely required for populations at different risk of MV-associated severe AKI. For example, a higher threshold might be considered for septic patients than for neurological patients, which may yield better specificity and positive predictive value. In this study, we therefore provided multiple classification thresholds in the reported tables and in the online calculator for different clinical and research applications.
This study has several strengths. First, it used a large dataset of diverse patient populations who received MV in six ICUs (21). Second, patients who received MV for a minimum of 12 h were included, which provides a much earlier warning window for identifying patients at high risk of severe AKI than other studies (44,47). Third, because the included patients received various MV modes in multiple ICUs with differing ventilation practices, our models may be more broadly applicable (44). Fourth, we followed a development, internal validation, and external validation approach, comparing machine learning algorithms to select the most robust one for model development and further validating the models in different subpopulations. Fifth, we included most routinely available ICU data as candidate predictors and selected among them successively through univariable and multivariable association analyses (20). Sixth, the models use a small number of predictors that are likely available in most ICUs, which improves their potential for worldwide use and generalization. Seventh, the need for dialysis or RRT during the ICU stay could also be predicted well by the AKI-23 and AKI-3 models. Last, the validated models are publicly available as an online early-warning tool. This provides a useful platform for further model validation and for future studies of biomarkers, MV parameters, and MV-associated severe AKI prediction, and complements existing risk scores used at the bedside and/or in limited-resource settings.
This study also has some limitations. First, the development cohort came from a single center, whereas multi-center data might be more convincing; however, the models validated well in an independent multi-center dataset. Second, we excluded patients whose SCr measurements were inadequate to stage AKI or whose data collection was insufficient, so our models do not apply to these patients, which limits their generalizability. Third, some routine clinical variables, such as height, fluid intake, and other laboratory values, were not included as candidate predictors because of high missing rates in the databases, although these features might improve early-warning performance. Fourth, causal associations between MV parameters and severe AKI cannot be demonstrated by machine learning algorithms alone. Fifth, because only the KDIGO SCr criteria were used, patients who developed severe AKI according to the KDIGO urine output criteria could have been missed. Nevertheless, this approach is consistent with several previously published AKI risk scores, and the models predicted well the related, clinically relevant outcome of dialysis or RRT requirement at ICU discharge. Sixth, the study relied on retrospective data, which may lack important information that a prospective study would capture; in addition, the impact of model predictions on patient outcomes could not be assessed. Seventh, the modest model performance might be attributable to imperfect classification, since the AKI phenotype in MV-associated AKI has not been firmly established. Last, several steps are still needed before such models can be translated into clinical practice.
Therefore, these models also need further external validation in other independent datasets and prospective validation in a real-world clinical setting to demonstrate that preventive actions triggered by model outputs meaningfully alter patient outcomes, patient experiences, or care processes.

Conclusions
We have shown that the development of AKI-23 and AKI-3, as defined by the KDIGO SCr criteria, during the first week of ICU stay after the first 12 hours of MV can be accurately predicted by machine learning techniques based on clinical information. The developed clinical prediction models are freely available online at https://apoet.shinyapps.io/mv_aki_2021_V2 to facilitate further research.

Ethics approval and consent to participate
Data collection was passive and had no impact on patient safety. The data set was de-identified in compliance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Studies performed on de-identified data constitute non-human subject research; thus, no institutional or ethical approvals were required for this study.

Consent for publication
Not applicable.

Availability of data and materials
The datasets generated and analyzed during the current study are available in the MIMIC and eICU repositories, https://physionet.org/.

Competing interests
The authors declare that they have no competing interests.

Funding
There is no financial funding or interest to report.
Authors' contributions
CF designed the study, conducted the data collection, data analysis, and data interpretation, and wrote the manuscript. SH conducted the data analysis and data interpretation, built the online early-warning calculator, developed the website, and wrote the manuscript. LC and YB conducted the data interpretation and wrote the manuscript. LW and XZ conducted the data interpretation and reviewed the manuscript.

Ventilator parameters: highest minute volume, L/min, median (IQR): 9.0 (8.1-9.7) and 8.8 (7.8-9.6), P < 0.01; highest driving pressure, …

Classification threshold based on the maximum Youden index in the development cohort.
Abbreviations: AKI, acute kidney injury; AUC, area under the ROC curve; ROC, receiver operating characteristic.
Figure 1 Study cohort selection workflow for the MIMIC-IV database based on the designed inclusion and exclusion criteria. AKI: acute kidney injury, SCr: serum creatinine.