Baseline Characteristics
As shown in Fig. 1A, a total of 1722 patients diagnosed with CA were included in our study. Following the method of the previous study, we randomly divided all patients into a training set (1206 patients, 70%) and a validation set (516 patients, 30%). In the whole study population, the in-hospital mortality rate of CA patients was 52.43% (793 survivors and 929 non-survivors). Table 1 shows the comparison of demographics and variables between the training set and the validation set, as well as between patients who died and patients who survived during hospitalization. SBP, DBP and MBP values were lower in the training set. The proportion of dopamine use and the in-hospital mortality were lower in the validation set. There were no significant differences in the other selected variables between the training set and the validation set. Patients who died had lower SBP, DBP, MBP, temperature, SPO2, HCT, HB, platelet count, bicarbonate, calcium, pH and GCS score, and a lower proportion of men, CHF and myocardial infarction. In contrast, age, SOFA score, SAPS Ⅲ score, epinephrine use, dopamine use, HR, RR, WBC, anion gap, BUN, creatinine, glucose, sodium, potassium, INR, PT and lactate levels were significantly higher in patients who died during their hospital stay. There was no significant difference between surviving and non-surviving patients in the prevalence of DM or hypertension, the use of ventilation, or chloride levels.
Selected variables
In the training set, we performed LASSO regularization. The binomial deviance was computed on the test data as a measure of the predictive performance of the fitted models. The binomial deviance curve was plotted against log(λ) using 10-fold cross-validation, where λ is the tuning hyperparameter. Dotted vertical lines were drawn at the optimal values given by the minimum criterion and by the one-standard-error criterion. We chose the latter criterion (λ = 0.01944) because it imposes a stricter penalty, allowing us to reduce the number of covariates further than the minimum criterion (λ = 0.00332) (Fig. 2A, Fig. 2B). Finally, LASSO regression yielded 17 variables with nonzero coefficients (Fig. 3A). XGBoost was also applied to the training set to rank the predictive importance of all included variables for in-hospital death, and the top 17 variables were selected (Fig. 3B).
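The choice between the minimum criterion and the one-standard-error criterion can be illustrated with a minimal sketch. It assumes the cross-validation has already produced a mean deviance and a standard error for each candidate λ; the function name `select_lambdas` and the numbers below are hypothetical and are not the study's values.

```python
from typing import Sequence, Tuple


def select_lambdas(
    lambdas: Sequence[float],
    cv_mean: Sequence[float],
    cv_se: Sequence[float],
) -> Tuple[float, float]:
    """Return (lambda_min, lambda_1se) from cross-validated deviance.

    lambda_min minimizes the mean CV deviance. lambda_1se is the largest
    lambda whose mean deviance lies within one standard error of that
    minimum, i.e. the most heavily penalized model that is still comparable.
    """
    if not lambdas or not (len(lambdas) == len(cv_mean) == len(cv_se)):
        raise ValueError("inputs must be non-empty and of equal length")
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, m in zip(lambdas, cv_mean) if m <= threshold)
    return lambdas[i_min], lambda_1se


# Hypothetical CV results, for illustration only (not the study's values).
lambdas = [0.001, 0.003, 0.010, 0.020, 0.050]
cv_mean = [1.130, 1.100, 1.105, 1.115, 1.200]
cv_se = [0.020, 0.020, 0.020, 0.020, 0.020]
lam_min, lam_1se = select_lambdas(lambdas, cv_mean, cv_se)
```

Because a larger λ shrinks more coefficients to zero, the one-standard-error choice gives a sparser model at a small cost in cross-validated deviance.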
Model development
In the training set, the 17 variables screened by LASSO and the 17 variables screened by XGBoost were each entered into univariate logistic regression against in-hospital mortality, and the variables that were statistically significant in univariate logistic regression were then entered into multivariate logistic regression. Table 2 and Table 3 show the variables selected in the univariate and multivariate analyses for LASSO and XGBoost. Among the variables screened using LASSO, multivariate logistic regression identified age, SAPS Ⅲ, HR, MBP, RR, temperature, SPO2, GCS, man, bicarbonate and PT as the most significant predictors of mortality risk. Among the XGBoost-selected variables, it identified SAPS Ⅲ, RR, bicarbonate, SPO2, temperature, age, HR, GCS and HB as the most significant predictors of mortality risk.
We established an in-hospital mortality prediction algorithm using the LASSO-selected variables as follows: log odds of mortality = 18.746877 + 0.013344 × age + 0.010997 × SAPS Ⅲ + 0.019006 × HR - 0.017839 × MBP + 0.048912 × RR - 0.286264 × temperature - 0.080727 × SPO2 - 0.142085 × GCS - 0.258837 × man - 0.064604 × bicarbonate + 0.021723 × PT.
The variance inflation factors for these variables were 1.1, 1.2, 1.3, 1.1, 1.3, 1.2, 1.1, 1.0, 1.0, 1.1 and 1.0, respectively.
Using the XGBoost-selected variables, the in-hospital mortality prediction algorithm was as follows: log odds of mortality = 20.258476 + 0.011845 × SAPS Ⅲ + 0.052959 × RR - 0.071588 × bicarbonate - 0.099384 × SPO2 - 0.279351 × temperature + 0.013794 × age + 0.018419 × HR - 0.138026 × GCS - 0.102837 × HB.
The variance inflation factors for these variables were 1.2, 1.3, 1.1, 1.1, 1.2, 1.1, 1.3, 1.0, and 1.1, respectively.
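As a minimal sketch of how the two equations above turn into a predicted probability, the log odds can be passed through the logistic function. The coefficients are copied from the equations. The dictionary keys, the coding of `man` as 1 for male and 0 for female, the measurement units and the example patient are all assumptions made for illustration, because the text does not state them.

```python
import math
from typing import Dict

# Coefficients copied from the two reported equations.
LASSO_INTERCEPT = 18.746877
LASSO_COEF: Dict[str, float] = {
    "age": 0.013344, "saps3": 0.010997, "hr": 0.019006, "mbp": -0.017839,
    "rr": 0.048912, "temperature": -0.286264, "spo2": -0.080727,
    "gcs": -0.142085, "man": -0.258837, "bicarbonate": -0.064604,
    "pt": 0.021723,
}
XGB_INTERCEPT = 20.258476
XGB_COEF: Dict[str, float] = {
    "saps3": 0.011845, "rr": 0.052959, "bicarbonate": -0.071588,
    "spo2": -0.099384, "temperature": -0.279351, "age": 0.013794,
    "hr": 0.018419, "gcs": -0.138026, "hb": -0.102837,
}


def predicted_mortality(
    intercept: float, coef: Dict[str, float], patient: Dict[str, float]
) -> float:
    """Convert the linear predictor (log odds) into a probability."""
    log_odds = intercept + sum(b * patient[name] for name, b in coef.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical patient, for illustration only. Assumed units: temperature in
# degrees Celsius, SPO2 in percent, man coded as 1 for male and 0 for female.
patient = {
    "age": 65, "saps3": 60, "hr": 90, "mbp": 75, "rr": 20,
    "temperature": 36.5, "spo2": 96, "gcs": 10, "man": 1,
    "bicarbonate": 22, "pt": 14, "hb": 11,
}
p_lasso = predicted_mortality(LASSO_INTERCEPT, LASSO_COEF, patient)
p_xgb = predicted_mortality(XGB_INTERCEPT, XGB_COEF, patient)
```

Positive coefficients (for example age, HR and RR) raise the predicted risk as the variable increases, while negative coefficients (for example temperature, SPO2 and GCS) lower it.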
Model validation
The discrimination and calibration of the LASSO model and the XGBoost model in the training set and validation set are shown in Fig. 4A-C and Fig. 5A-C, respectively. In the training set, the AUCs of the LASSO model and the XGBoost model were 0.7879 (0.7627–0.8132) and 0.7854 (0.7599–0.8109), respectively. In the validation set, the AUCs of the LASSO model and the XGBoost model were 0.7994 (0.7618–0.8369) and 0.7941 (0.7560–0.8321), respectively. The calibration curves in Fig. 4B and 4C and in Fig. 5B and 5C show that, for each model, the predicted probabilities agreed closely with the observed outcomes in both the training set and the validation set.
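For readers less familiar with the AUC, the following sketch computes it directly from its rank-based definition: the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor. The data are hypothetical, and this is not necessarily how the AUCs reported here were computed.

```python
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (non-survivor, survivor) pairs in which the
    non-survivor (label 1) has the higher score; ties count one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted probabilities, for illustration only.
y = [1, 1, 1, 0, 0, 0]
p_hat = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
example_auc = auc(y, p_hat)  # 8 of the 9 pairs are ordered correctly
```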
NEWS 2 is based on RR, SPO2, SBP, pulse rate, level of consciousness or new confusion, and temperature; here it was used to predict the risk of in-hospital mortality for ICU patients with CA. We calculated NEWS 2 for all study patients. The ROC curve and calibration curve of NEWS 2 in the training set and validation set are shown in Fig. 6A-C. The AUCs in the training set and validation set were 0.6944 (0.6651–0.7237) and 0.7030 (0.6588–0.7472), respectively.
The DCA for the LASSO model, the XGBoost model and the NEWS 2 model is presented in Fig. 7A. For threshold probabilities from 0.18 to 0.88, all three models provided more net benefit than the ‘All’ or ‘None’ scheme.
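The net benefit plotted in a decision curve follows the standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below applies that definition to hypothetical data; the function names are illustrative, and the software used for the DCA in this study is not stated in the text.

```python
from typing import Sequence


def net_benefit(
    y_true: Sequence[int], probs: Sequence[float], threshold: float
) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # odds of the threshold probability
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of the 'All' scheme; the 'None' scheme is always 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes and predicted probabilities, for illustration only.
y = [1, 1, 1, 0, 0, 0]
p_hat = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
nb_model = net_benefit(y, p_hat, threshold=0.5)
nb_all = net_benefit_treat_all(y, threshold=0.5)
```

A model is useful at a given threshold when its net benefit exceeds both the 'All' value and zero, which is the comparison the decision curve makes across the whole threshold range.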
Model comparison
We compared the AUCs of the LASSO, XGBoost and NEWS 2 models in the total study population to assess the predictive effectiveness of the three models. Figure 7B shows that the AUCs for the LASSO model, the XGBoost model and the NEWS 2 model were 0.7912 (0.7703–0.8122), 0.7875 (0.7663–0.8088) and 0.6969 (0.6725–0.7212), respectively, which were confirmed to be 0.7845, 0.7873 and 0.6969 via bootstrapping validation (repeat = 1000). Comparison of the AUCs showed that the predictive effectiveness of both the LASSO model and the XGBoost model was significantly better than that of the NEWS 2 model (p < 0.001). There was no statistically significant difference between the LASSO model and the XGBoost model (p = 0.4605) (Table 4).
The LASSO model included 11 variables, while the XGBoost model included 9. Although the XGBoost model was more concise, the net benefit of the LASSO model was higher than that of the XGBoost model within the threshold range of 0.6-1.0. We considered a higher net benefit to be more beneficial for patients with CA. Therefore, we chose the LASSO model as the final model and represented it as the nomogram in Fig. 8. The nomogram uses parallel scaled lines to assign a score to each risk factor. The scores of all risk factors are summed, and the total score is then mapped to the predicted probability of the event, that is, in-hospital death.