The major findings of this study are: (1) in a cohort of stage I–III gastric cancer patients who underwent gastrectomy, AutoML performed well in predicting early postoperative mortality; (2) the generated AutoML models produced predictions with high positive predictive value and sensitivity, and thus could be considered for clinical patient prognostication and counseling of those predicted to be high risk; (3) the variables most influential in predicting 90-day mortality include older age, a high ratio of positive nodes to nodes examined (nodal ratio), and prolonged hospital length of stay following surgery; (4) a multi-step model that first predicts a post-operative characteristic (i.e., pLOS and pNodeRatio) and then 90-day mortality can be used to design models for pre-operative use. Our work shows that AutoML can be used feasibly, efficiently, and easily to train and validate ML models using commonly collected perioperative factors. To our knowledge, our study is the first to demonstrate the applicability of AutoML for early postoperative mortality prediction in cancer surgery. Thus, in addition to its potential utility for surgical treatment of patients with gastric cancer, our study supports broader evaluation and application of AutoML to guide surgical oncologic care.
Numerous studies have highlighted the importance of predicting mortality among patients with advanced cancers to assist with appropriate treatment planning and patient counseling3, 4, 18. Post-gastrectomy outcomes and mortality have been associated with several factors, including stage of the disease, lymph node metastases, co-morbidities from neoadjuvant therapy, and age of the patient2, 19–21, but few clinical support tools or algorithms have been developed to accurately inform patient prognostication based on perioperative variables. Niu et al.’s review on the application of artificial intelligence within gastric cancer highlights several studies that used ML models to diagnose gastric cancer and predict recurrence and metastasis; however, most of these studies utilized endoscopy or computed tomography images, pathology slices, or genetic features7. Image-based prediction models require large quantities of accurately annotated data7, 22, and acquiring genetic features for all patients adds to the cost of patient care and requires substantial time.
Unlike those studies, we utilized common data elements found within readily available real-world data sources to train our ML models in patients with gastric cancer who underwent non-palliative gastrectomy. Given the importance of patient counseling and of optimizing surgery and treatment plans when early mortality risk is suspected, we chose to prioritize sensitivity and positive predictive value within our models. H2O.ai’s AutoML also allows one to change the F1 threshold or to optimize based on F2 scores, so that models can instead be tuned toward other priorities such as specificity and negative predictive value. Our approach provides a template for developing cost-effective and easy-to-implement decision-support tools for guiding patient selection for surgical treatment in this population. Furthermore, Lu et al.’s systematic review of 15 articles that utilized ML models to predict early mortality in patients with cancer using electronic health record data23 showed that model performance ranged from AUCs of 0.71 to 0.92. While many of the studies they included had small sample sizes, our study of 39,108 patients highlights the promising ability of ML models to predict early mortality among cancer patients using data from population-level registries.
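The trade-off between sensitivity and positive predictive value described above can be illustrated with a minimal sketch. It is not the H2O.ai implementation and does not use study data: the function names and the six labelled scores are hypothetical and chosen only for illustration. It relies on the standard F-beta definition, in which beta greater than 1 weights sensitivity more heavily and beta less than 1 weights positive predictive value more heavily.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN, TN when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of true events that were flagged (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def ppv(tp, fp):
    """Fraction of flagged cases that were true events (precision)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score; beta=1 gives F1, beta=2 gives F2."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_threshold(y_true, scores, beta=1.0):
    """Pick the observed score that maximizes F-beta when used as the cut-off."""
    def score_at(t):
        tp, fp, fn, _ = confusion_counts(y_true, scores, t)
        return f_beta(tp, fp, fn, beta)
    return max(sorted(set(scores)), key=score_at)


# Hypothetical labels (1 = died within 90 days) and model scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.1]

for beta in (1.0, 2.0):
    t = best_threshold(y_true, scores, beta)
    tp, fp, fn, tn = confusion_counts(y_true, scores, t)
    print(f"beta={beta}: threshold={t}, "
          f"sensitivity={sensitivity(tp, fn):.2f}, PPV={ppv(tp, fp):.2f}")
```

On this toy input, the F2-optimal threshold is lower than the F1-optimal one, so more true events are flagged at the cost of more false positives.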
Our use of an interpretable machine learning approach facilitates the identification of potentially targetable risk factors. Older patient age, higher nodal ratio, and greater number of days between surgery and discharge were the three most influential variables across models in predicting 90-day mortality. This is consistent with Shannon et al.’s multivariate retrospective analysis of patients within NCDB with stage I–III gastric adenocarcinoma who underwent total gastrectomy; their results showed that increasing age and a lower number of lymph nodes examined are associated with 90-day mortality2. Shu et al. further showed that older age (> 70 years) was associated with an increased rate of complications (20% vs 11% in those < 70 years) and higher 90-day mortality (3.7% vs 0.5%) in a cohort of 534 patients at a single institution. Notably, age independently predicted mortality after controlling for tumor biology, cancer stage, adjuvant therapy, and postoperative complications24, thereby highlighting the need for careful evaluation and counseling of older patients prior to gastrectomy.
To ensure clinical utility, the timing of implementing predictive models is crucial. The initial model in this study can inform post-operative patient prognostication, and it highlighted the importance of post-operative length of stay and nodal ratio in predicting 90-day mortality. This is consistent with previous efforts to enhance prognostication in gastric cancer, which reported that the number of nodes examined and nodal positivity independently influence survival25, 26. However, pre-operative prediction is necessary to assist with both patient prognostication and selection of surgery. To ensure that our predictive models are useful in the pre-operative setting, we used a multi-step modeling strategy in which we first predicted length of stay and nodal ratio using only parameters available pre-operatively. These predicted features were then used as input features in our final model for predicting mortality, which showed high discriminatory capability. This complex strategy was easy to implement through H2O.ai’s AutoML tools.
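The data flow of the multi-step strategy described above can be sketched as follows. This is a sketch under stated assumptions rather than the pipeline used in the study: simple stand-in models (group means and a plain logistic regression) take the place of H2O.ai’s AutoML, the field names are hypothetical, and the six training rows are synthetic values invented for illustration. The point it shows is that post-operative values are needed only when fitting the first-stage models, so a new patient can be scored from pre-operative fields alone.

```python
import math
from collections import defaultdict
from statistics import mean

# Synthetic toy rows, invented for illustration only (not study data).
TRAIN = [
    {"age": 55, "procedure": "distal", "stage": "I",   "los": 6,  "node_ratio": 0.0, "died_90d": 0},
    {"age": 62, "procedure": "distal", "stage": "II",  "los": 8,  "node_ratio": 0.1, "died_90d": 0},
    {"age": 49, "procedure": "distal", "stage": "I",   "los": 7,  "node_ratio": 0.0, "died_90d": 0},
    {"age": 71, "procedure": "total",  "stage": "II",  "los": 12, "node_ratio": 0.3, "died_90d": 0},
    {"age": 83, "procedure": "total",  "stage": "III", "los": 20, "node_ratio": 0.5, "died_90d": 1},
    {"age": 78, "procedure": "total",  "stage": "III", "los": 16, "node_ratio": 0.7, "died_90d": 1},
]


def fit_group_mean(rows, group_key, target_key):
    """Stage-1 stand-in: predict a post-operative value as the training mean
    within one pre-operative category (overall mean for unseen categories)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[target_key])
    overall = mean(row[target_key] for row in rows)
    table = {key: mean(values) for key, values in groups.items()}
    return lambda row: table.get(row[group_key], overall)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Stage-2 stand-in: logistic regression fitted by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    def predict(x):
        z = b + sum(wj * xj for wj, xj in zip(w, x))
        return 1.0 / (1.0 + math.exp(-z))
    return predict


# Step 1: learn to predict post-operative values from pre-operative fields.
predict_los = fit_group_mean(TRAIN, "procedure", "los")
predict_ratio = fit_group_mean(TRAIN, "stage", "node_ratio")


def preop_features(row):
    """Features available before surgery: age plus the two predicted values."""
    return [row["age"] / 100.0, predict_los(row) / 30.0, predict_ratio(row)]


# Step 2: fit the mortality model on predicted (not observed) LOS and nodal ratio.
predict_mortality = fit_logistic(
    [preop_features(r) for r in TRAIN], [r["died_90d"] for r in TRAIN]
)

# A new patient has no observed length of stay or nodal ratio yet.
new_patient = {"age": 80, "procedure": "total", "stage": "III"}
risk = predict_mortality(preop_features(new_patient))
print(f"predicted 90-day mortality risk: {risk:.2f}")
```

Fitting the second stage on predicted rather than observed values keeps the training inputs in the same form as those available at pre-operative scoring time.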
Despite better performance in prediction of pNodeRatio compared to pLOS, inclusion of pLOS provided the most improvement in model performance in predicting 90-day mortality. This suggests that patients who are at higher risk for longer hospital stays are highly susceptible to early postoperative mortality. Our work highlighted that patients’ income quartile, racial background, and whether they underwent distal or en bloc gastrectomy influenced length of stay predictions. This is in line with prior studies showing that the extent of resection and type of surgical procedure are independently predictive of post-operative length of stay in patients with gastric cancer27. In addition to these factors, patients’ pre-operative physical function/strength and co-morbidities influence both post-operative complications and length of hospital stay28, 29. Future models that incorporate these pre-operative characteristics may enhance pLOS prediction and subsequent early mortality prediction. Importantly, the congruence between prior research and the variables that were most influential in AutoML models provides confidence in these models’ clinical utility.
The influence of hospital length of stay on predicting early mortality also provides an opportunity for implementing clinical programs that help reduce this duration and, in turn, potentially reduce early postoperative mortality. Enhanced Recovery After Surgery (ERAS) protocols have been implemented following gastrectomy30, 31, and they incorporate preoperative counseling and nutrition, earlier mobilization and feeding following surgery, and avoidance of abdominal drains and nasogastric/nasojejunal decompression32, 33. Wee et al.’s meta-analysis comparing conventional post-operative care with ERAS protocols showed that ERAS programs decreased length of stay and care costs but did not significantly alter 30-day postoperative mortality or post-operative morbidity32. Weindelmayer’s single-institution study of 351 gastric cancer patients reported a reduction in 90-day mortality among patients in the ERAS program (0.8% vs 4.8% control)34; however, their overall 90-day mortality was only 2%. Further research is necessary to optimize ERAS programs and to assess whether they reduce early postoperative mortality. Within our dataset, there was a cohort of patients who were still admitted to the hospital past 90 days postoperatively, and while the primary aim of this study was to assess early mortality, further research is necessary to understand predictors of prolonged hospital stays as well as morbidity, mortality, and quality of life outcomes among these populations.
Numerous studies have piloted clinical implementation of machine learning tools. Avati et al. developed a deep neural network that screens the electronic health records of all admitted patients at Stanford Hospital and predicts all-cause mortality within 3–12 months. They implemented the ML algorithm as a screening tool that notifies palliative care of positive predictions35, thereby streamlining patient referrals and demonstrating how ML-based early mortality predictions can improve the efficiency of patient care. Manz et al. developed an ML algorithm to predict 180-day mortality among oncology clinic patients within a health system in Pennsylvania. Their randomized clinical trial implementing this model along with behavioral nudges (weekly performance feedback to clinicians) showed increased rates of serious illness conversations with high mortality risk patients, a positive clinician behavior that improves end-of-life care22. Our results provide the necessary first step towards bedside application by demonstrating the feasibility of using AutoML to produce robust mortality predictions. Specifically, AutoML-based predictions could be used to augment perioperative risk stratification and postoperative treatment planning.
Our results must be interpreted in light of several limitations. While NCDB allows us to train ML models on a large cohort of heterogeneous patients, the database itself is limited by missing data36, lack of information on the cause of death, and biases introduced by retrospective analysis2. Additionally, the database does not include information on patient transfers to hospice care, so we cannot discern what proportion of patients died in hospice care. Finally, while NCDB captures approximately 70% of cancer patients, it only has data from patients who were treated at accredited CoC facilities, and thus is not generalizable to the entire US population2, 36. Nonetheless, AutoML is able to handle missing data and reasonably predict early mortality in this heterogeneous population using only the available features. Given the easy-to-use nature of the platform, institutions can validate and optimize the models based on institutional data prior to implementation. Our work focused on only one AutoML approach, and further studies are necessary to understand the applicability of other models within surgical risk prediction. Lastly, while we focused on mortality prediction, it is not the only outcome of interest for patients and families considering gastric surgery. Thus, future studies focused on morbidity and quality of life predictions are needed.