In this study, we developed an EML-based nomogram to predict the 30-day risk of death in ICU stroke patients, and it outperformed the UC-N. In addition, our findings showed that the ability of EML to identify important variables and to explore complex non-linear associations can address shortcomings of traditional linear models (e.g., logistic regression). Our nomogram can therefore help clinicians to assess, easily and accurately, the risk of short-term death for stroke patients on the first day of ICU admission, which may improve patient treatment and care.
The identification of risk factors for death in stroke patients can improve patient management and enable a more accurate estimate of prognosis. Based on the SHAP summary plot (Fig. 2.A), we incorporated a total of 10 risk factors into the subsequent nomogram construction, with “sofa” being the variable that had the greatest impact on the LightGBM model. Although the “sofa” score originated from the sepsis-related organ failure assessment, it has been widely used for routine monitoring of acute morbidity in intensive care units [27, 28]. The “sofa” score is a comprehensive assessment of dysfunction in six organ systems. Its predictive value for early mortality risk in stroke patients has been demonstrated: Wei Qin et al. found that the first-day “sofa” score predicted the prognosis of stroke patients well [28]. In addition, our study revealed that stroke patients with a “sofa” score greater than 4 had a higher risk of death (Fig. 2.B and Fig. 3.A). A meta-analysis likewise found that 30-day sepsis mortality was significantly higher in study populations with higher “sofa” scores [29]. The other risk factor worth exploring in this study was “temperature_max”. Temperature management is particularly important for ICU patients, given that even small changes in body temperature can alter inflammation and immune function and have a variety of effects on patient outcomes [30–32]. Our study found that, for stroke patients, a maximum body temperature between 36.5 and 37.8 °C on the first day in the ICU was associated with a lower risk of death within 30 days. A large retrospective cohort of 28,679 Australian and 45,038 New Zealand stroke patients found that a maximum body temperature between 37 and 39 °C on the first day of ICU admission was associated with a lower risk of death [33], which is broadly consistent with our findings.
Other risk factors, including “age”, “sodium”, “bun” (blood urea nitrogen) and “heart rate”, were also identified in previous studies that predicted the risk of death in stroke patients using the MIMIC datasets [8, 10].
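The ranking read from a SHAP summary plot is, in essence, the mean absolute SHAP value of each feature across patients. The following minimal sketch computes exact Shapley values by enumerating feature coalitions and then ranks the features. It does not use the LightGBM model or the MIMIC-IV data of this study: the function `toy_risk`, its coefficients, and the five patient rows are hypothetical and serve only to illustrate the calculation.

```python
from itertools import combinations
from math import factorial

# Feature names are borrowed from the text for illustration only.
FEATURES = ["sofa", "temperature_max", "age"]


def toy_risk(row):
    """Hypothetical risk function (an assumption, NOT the study's model)."""
    sofa, temp, age = row
    risk = 0.05
    if sofa > 4:
        risk += 0.30
    if temp < 36.5 or temp > 37.8:
        risk += 0.10
    risk += 0.002 * (age - 60)
    return risk


def coalition_value(model, x, background, subset):
    """Mean model output when the features in `subset` are fixed to x
    and the remaining features are taken from each background row."""
    total = 0.0
    for b in background:
        mixed = [x[i] if i in subset else b[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


# Hypothetical patients: (sofa, temperature_max in degrees C, age in years).
patients = [
    (2, 37.0, 55),
    (6, 38.5, 70),
    (9, 36.0, 82),
    (3, 37.5, 64),
    (5, 39.1, 48),
]

all_phi = [shapley_values(toy_risk, p, patients) for p in patients]

# Global importance: mean absolute Shapley value per feature.
mean_abs = {
    name: sum(abs(phi[k]) for phi in all_phi) / len(all_phi)
    for k, name in enumerate(FEATURES)
}
ranking = sorted(mean_abs, key=mean_abs.get, reverse=True)
print(ranking)
```

Exact enumeration scales exponentially with the number of features, so it is practical only for toy cases like this one; tree-based explainers use faster algorithms for the same quantity.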
Given the huge burden of disease already caused by stroke, which alone was responsible for 6.6 million deaths worldwide, even small improvements in the accuracy of prognostic prediction models for stroke can have large benefits [34]. Our EML-N was a significant improvement over the UC-N in both the overall and the individual dimensions. We believe that two major methodological improvements in how we built the nomogram led to the higher performance of our EML-N. First, linear models (including logistic regression and Cox regression) are the most common choice for developing nomograms [12]. However, such linear models are not appropriate when there is a non-linear association between predictors and outcomes [35]. Daan et al. reported that restricted cubic spline (RCS) regression, a non-linear modeling method, outperformed logistic regression with linear terms when assessing the non-linear relationship between continuous predictors and the outcome [35]. Although some studies have fitted non-linear relationships between predictors and outcomes by entering variables as cubic splines in logistic regression, the choice of the location and number of knots is strongly influenced by a priori experience [9, 35, 36]. Taking the “temperature_max” variable in this study as an example, we found that RCS regressions using 3 knots (10th, 50th and 90th percentiles) and 5 knots (5th, 27.5th, 50th, 72.5th and 95th percentiles) showed markedly different trends once “temperature_max” exceeded 38 °C (results not reported). Moreover, when a large number of variables are included in an RCS model, selecting the best-fitting form for every variable is laborious and can easily lead to biased results. In contrast, the PDPs in this study avoided the knot-selection problem of RCS and allowed non-linear fitting of multiple variables simultaneously.
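To make the knot-placement issue concrete, the sketch below places knots at the two sets of percentiles named above and builds a restricted cubic spline basis in the commonly used truncated power form. It is an illustration under stated assumptions, not the analysis performed in this study: the temperature values are synthetic (drawn from an assumed normal distribution), and `percentile` and `rcs_basis` are hypothetical helper names.

```python
import random


def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of pre-sorted data."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def rcs_basis(x, knots):
    """Restricted cubic spline basis at x: the linear term plus k-2
    non-linear terms (truncated power form, unnormalised)."""
    t_last, t_prev = knots[-1], knots[-2]

    def cube(u):
        return max(u, 0.0) ** 3

    terms = [x]
    for t_j in knots[:-2]:
        terms.append(
            cube(x - t_j)
            - cube(x - t_prev) * (t_last - t_j) / (t_last - t_prev)
            + cube(x - t_last) * (t_prev - t_j) / (t_last - t_prev)
        )
    return terms


# Synthetic first-day maximum temperatures (assumed distribution, not MIMIC-IV).
rng = random.Random(0)
temps = sorted(rng.gauss(37.2, 0.7) for _ in range(500))

knots3 = [percentile(temps, p) for p in (10, 50, 90)]
knots5 = [percentile(temps, p) for p in (5, 27.5, 50, 72.5, 95)]

print("3 knots:", [round(k, 2) for k in knots3])
print("5 knots:", [round(k, 2) for k in knots5])
print("basis at 38.5 (3 knots):", rcs_basis(38.5, knots3))
print("basis at 38.5 (5 knots):", rcs_basis(38.5, knots5))
```

A restricted cubic spline is constrained to be linear beyond its outermost knots, so the 3-knot fit is forced to be linear above the 90th percentile while the 5-knot fit can still bend up to the 95th percentile. This is one plausible reason why two such fits can diverge at high temperatures.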
Second, the important variables associated with ICU mortality in stroke patients were readily selected from the SHAP summary plot in our study. A common method for selecting important variables in disease research is the least absolute shrinkage and selection operator (lasso) [37]. Zirui Meng et al. used lasso to select important variables from laboratory examination results [38]. However, lasso is a linear model: it selects only variables that are linearly related to the outcome, and it tends to choose only one variable from a set of highly correlated variables [39]. Some key variables may therefore be missed. In our study, both the UC-N and the EML-N were developed from the variables selected using the SHAP summary plot, and both had high AUC values.
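The behaviour of lasso with correlated predictors can be illustrated with a small coordinate-descent sketch. The example is deliberately extreme and entirely synthetic: the second predictor is an exact copy of the first, the third is unrelated noise, and the penalty `lam=0.1` is an arbitrary assumption. The function `lasso_cd` is a hypothetical minimal solver, not the software used in the cited studies.

```python
import random


def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate
    descent, without an intercept and starting from w = 0."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                pred_others = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w


# Synthetic data: column 1 duplicates column 0, column 2 is pure noise.
rng = random.Random(1)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]

X = [[a, a, c] for a, c in zip(x1, x3)]
y = [2.0 * a + 0.1 * e for a, e in zip(x1, noise)]

w = lasso_cd(X, y, lam=0.1)
print("coefficients:", [round(v, 4) for v in w])
```

With an exact duplicate the lasso solution is not unique, and which copy keeps the non-zero coefficient here is a consequence of the update order in this solver. With highly but not perfectly correlated predictors the split depends on the data and the penalty, which is why a clinically relevant variable can be dropped.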
This study has several limitations. First, our nomogram was constructed using only the MIMIC-IV dataset, so it may not generalize to other settings. Second, to improve the usability and convenience of the EML-N, we discretized all continuous variables, which may have led to some loss of information and thus reduced the performance of the nomogram. Third, although TIA is often regarded only as a herald of stroke, our study also included patients diagnosed with TIA [40]. A large cohort study lasting 66 years found that 40 (30.8%) of the 130 stroke patients identified at follow-up had a TIA within 30 days [41]. In addition, patients with TIA accounted for only 4.8% of our study population, which should not affect the robustness of our nomogram.
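The information loss from discretization mentioned in the second limitation can be sketched with a toy calculation. The risk curve, the temperature grid, and the cut points below are hypothetical assumptions chosen for illustration; they are not the categories used in the EML-N.

```python
def r_squared(y, y_hat):
    """Share of the variance in y that is reproduced by y_hat."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - h) ** 2 for v, h in zip(y, y_hat))
    return 1 - ss_res / ss_tot


# Hypothetical cut points (degrees C) defining three categories.
CUTS = (36.5, 37.8)


def to_bin(x):
    """Index of the category that x falls into (0, 1 or 2)."""
    return sum(x >= c for c in CUTS)


# Hypothetical V-shaped risk curve on a grid from 35.5 to 40.0 degrees C.
temps = [35.5 + 0.1 * i for i in range(46)]
risk = [0.05 + 0.04 * abs(t - 37.0) for t in temps]

# Replace each continuous risk by the mean risk of its category.
bins = [to_bin(t) for t in temps]
groups = {}
for b, r in zip(bins, risk):
    groups.setdefault(b, []).append(r)
bin_mean = {b: sum(v) / len(v) for b, v in groups.items()}
risk_binned = [bin_mean[b] for b in bins]

r2_binned = r_squared(risk, risk_binned)
print("share of risk variation kept after binning:", round(r2_binned, 3))
```

Any variation in risk within a category is discarded, so the retained share is below 1 whenever risk changes inside a bin. How much is lost in practice depends on the true risk curve and on where the cut points fall.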