Long-term mortality risk prediction tools for cardiac surgery can play an important role in enhancing continuity of care and in planning resource allocation. With the advancement of electronic medical records and artificial intelligence, ML algorithms have become more widely used in individualized medicine to assist clinical decision-making. In this study, several ML algorithms (NNET, NB, GBM, Ada, RF, BT, LR, and XGB) were developed and validated to predict 4-year mortality of patients undergoing cardiac surgery. In terms of predictive performance, the Ada model achieved the highest AUC and outperformed the remaining ML models. Moreover, to help surgeons use the model, a visual, publicly accessible online calculator with a user-friendly interface was developed. This study is the first to establish a long-term prediction model after cardiac surgery using early-stage, easily obtained variables and ML methods.
Cardiac surgery, as a unique type of operation, has a significant impact on circulation and physiology and poses significant challenges for lowering mortality. In the field of cardiac surgery, there has been increasing interest in risk prediction models for clinical use. Various risk stratification methods are cited in European guidelines for decision making, even though these scores cannot replace clinical judgment and multidisciplinary dialogue. Among the many scores that have been proposed, the original EuroSCORE, EuroSCORE II, and STS scores are the most widely used to predict short-term mortality after cardiac surgery. However, several studies have reported that these scores have limitations in some surgeries or patient subgroups [11–13]. Recently, a growing number of studies have focused on mid-term or long-term mortality after cardiac surgery [32–36]. For example, Wu et al. created a risk score predicting long-term mortality following isolated CABG surgery, with C-statistics ranging from 0.768 to 0.783 for mortality at 1, 3, 5, and 7 years of follow-up. Because more precise prediction models are needed, the application of ML approaches has been increasingly studied. A recent meta-analysis of 15 studies showed that, compared with LR, ML models provide better discrimination in predicting operative mortality after cardiac surgery. In the present study, the Ada model performed better than the traditional LR method in both discrimination (a higher AUC of 0.804) and goodness of fit (visualized by the calibration curve).
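To make the discrimination metric concrete, the following is a minimal sketch of how an AUC (equivalently, a C-statistic for a binary outcome) can be computed from predicted risks. It is not the code used in this study: the function name `auc`, the toy outcome vector, and the two sets of risk scores are all hypothetical and chosen only for illustration. The AUC here is the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half.

```python
def auc(labels, scores):
    """Rank-based AUC for a binary outcome.

    labels: 1 = died, 0 = survived (hypothetical coding).
    scores: predicted risk for each patient, in the same order.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # death ranked above survivor
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not study data).
died = [1, 1, 1, 0, 0, 0, 0, 0]
risk_model_a = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
risk_model_b = [0.6, 0.5, 0.2, 0.7, 0.4, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    print(f"model A AUC: {auc(died, risk_model_a):.3f}")
    print(f"model B AUC: {auc(died, risk_model_b):.3f}")
```

On this toy data, model A ranks 14 of the 15 death-survivor pairs correctly, whereas model B ranks 9.5 of 15 correctly, so model A has the higher AUC. Discrimination alone says nothing about whether the predicted probabilities are accurate, which is why calibration is assessed separately.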
The potential advantage of ML models is their capacity to capture nonlinearity and interactions among features without requiring the modeler to specify all interactions manually, as is needed with LR. Moreover, compared with traditional statistical methods, ML algorithms can handle missing data more efficiently because they do not rely on data distribution assumptions and are capable of more complex calculations. Clinical models constructed with ML have been used to predict short-term mortality in cardiac surgery, with AUCs ranging from 0.77 to 0.92 [19, 20, 39–47]. Zhou et al. and Ong et al. found that RF models predicted short-term mortality better than other models in cardiac surgical procedures. Additionally, several studies showed that the XGBoost method performed better than other ML methods in predicting operative or in-hospital mortality [19, 20, 41–43]. In our study, the outcome was long-term mortality, and the Ada model performed better than the RF and XGBoost models. This is consistent with the so-called No Free Lunch theorem in ML, which states that no single model works best for every problem or every dataset. Therefore, it is necessary to try and evaluate multiple ML models to determine which one performs best for a specific problem or study cohort. The Ada model is a technique that is seeing increasing application in clinical research [49–51]. Our study is the first to apply the Ada model in the context of cardiac surgery.
Through sophisticated ML methods, we determined that RDW, BUN, SAPS II, AG, age, urine output, chloride, creatinine, congestive heart failure, and SOFA were the top 10 predictors in the feature importance rankings. In general, the predictors of long-term mortality identified by the Ada model in this study are consistent with those of other studies. RDW is a simple measure of the breadth of the erythrocyte size distribution, conventionally called anisocytosis. A growing body of evidence has demonstrated that higher RDW is strongly correlated with higher mortality across a wide range of cardiovascular conditions, such as cardiac surgery, heart failure, and acute coronary syndrome [53–56]. However, less research is available on whether RDW affects long-term outcomes after cardiac surgery, and our study is a novel contribution to the published literature in this respect. The SAPS II, based on a large international sample of patients, provides an estimate of the risk of death without having to specify a primary diagnosis. According to our findings, the SAPS II score appears to be more important than the SOFA score in the feature importance rankings of the Ada model. Similarly, Schoe et al. found that the SOFA score, used as a mortality prediction model, underperformed compared with the SAPS II score in a large cohort of cardiac surgery patients. Urine output, BUN, and creatinine were all among the top 10 variables. Lassnigg et al. reported that even a slight increase in serum creatinine is correlated with a considerable increase in 30-day mortality following cardiac surgery. Tseng et al. developed and validated ML algorithms using 94 preoperative and intraoperative features to predict cardiac surgery-associated acute kidney injury, which is closely associated with increased morbidity and mortality. In their model, the importance matrix plot revealed that the most important variable contributing to the model was intraoperative urine output. Our results also underline the importance of detecting, evaluating, and improving preoperative renal function in patients requiring cardiac surgery, which might serve as a target for improving outcomes.
Our study has several strengths. Firstly, this is the first study to establish advanced ML death prediction models focusing on the long-term mortality of patients undergoing all types of cardiovascular surgery. Given the heterogeneity of patients on ICU admission, our findings can be used to identify patients at high risk of death and to determine which patients would benefit most from cardiac surgery. Providers can then offer targeted, individualized care for these patients, such as more extensive evaluation, post-discharge home visits, closer surveillance by a primary care physician, or earlier postoperative follow-up appointments, actions that might mitigate future adverse outcomes. Secondly, we used MIMIC-III, a high-quality database with a large sample size and extensive clinical data. Thirdly, we utilized advanced statistical methods, including eight ML models. To evaluate the performance of these models, AUCs, calibration curves, and DCA were calculated and plotted, representing discrimination, goodness of fit, and clinical usefulness, respectively. Fourthly, the models were built on readily available data collected within the first 24 hours after patients’ admission. It is worth noting that early and accurate prediction of mortality can give clinicians more time to adjust treatment strategies accordingly. Finally, to help surgeons use the model at the bedside, a calculator with a user-friendly interface was developed.
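As an illustration of what DCA measures, the following is a minimal sketch of the standard net benefit calculation at a given threshold probability, net benefit = TP/n − (FP/n) × pt/(1 − pt), together with the "treat all" reference strategy. It is not the code used in this study: the function names `net_benefit` and `net_benefit_treat_all` and the ten-patient toy dataset are hypothetical and chosen only for illustration.

```python
def net_benefit(labels, risks, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold.

    labels: 1 = died, 0 = survived (hypothetical coding).
    risks: predicted probability of death for each patient.
    threshold: threshold probability pt, strictly between 0 and 1.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    weight = threshold / (1.0 - threshold)
    return tp / n - weight * fp / n


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: flag every patient regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data, invented for illustration only (not study data).
died = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
risk = [0.8, 0.6, 0.3, 0.5, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

if __name__ == "__main__":
    for pt in (0.25, 0.5):
        print(
            f"pt={pt:.2f}  model={net_benefit(died, risk, pt):.3f}  "
            f"treat-all={net_benefit_treat_all(died, pt):.3f}  treat-none=0.000"
        )
```

A decision curve plots these quantities across a range of threshold probabilities; a model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat all" and "treat none" (net benefit of zero) strategies.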
Our study has several limitations. Firstly, we used data from a single academic medical center in the USA, with the earliest cases dating from almost 20 years ago, when care may have differed from currently accepted standards. Therefore, multicenter registry-based and prospective studies are needed to confirm these findings. Secondly, because the cohort consisted of adult ICU patients, the results of our study cannot be generalized to other populations, such as children and non-ICU patients. Thirdly, we did not obtain information on laboratory testing and interventions before ICU admission, which may introduce some confounding. Fourthly, owing to the contents of the MIMIC-III database, some important information was recorded incompletely and was not included in the analysis, including preoperative data (e.g., lactate, left ventricular ejection fraction, NYHA functional class, EuroSCORE, and STS score), intraoperative data (e.g., intraoperative hypotension, vasopressor-inotropes, and cardiopulmonary bypass time), and postoperative data (e.g., complications, late extubation, and length of ICU stay). Fifthly, although we included patients in the database whose primary diagnosis indicated cardiac surgery, it cannot be ruled out that some patients were admitted for treatment of other diseases. Finally, although our study explored 4-year mortality in the ICU setting in depth, other outcomes, such as the incidence of acute kidney injury, also require further investigation.