Predictors of 30-Day Mortality Using Machine Learning Approach following Carotid Endarterectomy: An Insight from National Surgical Quality Database

BACKGROUND: Pre-operative prognostication of 30-day mortality in patients undergoing carotid endarterectomy (CEA) can optimize surgical risk stratification and guide the decision-making process to improve survival. We aimed to develop and validate a set of predictive variables of 30-day mortality following CEA. METHODS: The patient cohort was identified from the American College of Surgeons National Surgical Quality Improvement Program (2005-2016). We performed logistic regression (enter, stepwise and forward) and the least absolute shrinkage and selection operator (LASSO) method for selection of variables, which resulted in 28 candidate models. The final model was selected based upon clinical knowledge and numerical results. RESULTS: Statistical analysis included 65,807 patients, with 30-day mortality in 0.7% (n=466) of patients. The median age of our cohort was 71.0 years (range, 16-89 years). The model with 9 predictive factors (age, body mass index, functional health status, American Society of Anesthesiologists grade, chronic obstructive pulmonary disease, preoperative serum albumin, preoperative hematocrit, preoperative serum creatinine and preoperative platelet count) performed best on discrimination, calibration, Brier score and decision analysis and was used to develop a machine learning algorithm. Logistic regression showed higher AUCs than LASSO across these different models. The predictive probability derived from the best model was converted into an open-access scoring system. CONCLUSION: Machine learning algorithms show promising results for predicting 30-day mortality following CEA. These algorithms can be useful aids for counseling patients, assessing pre-operative medical risks, and predicting survival after surgery.


Introduction
The absolute benefit derived from carotid endarterectomy (CEA) is limited by the mortality and morbidity imposed by the surgery itself 1 . The balance of benefit against the risks of stroke and mortality among patients with asymptomatic carotid stenosis undergoing CEA is still debated 2 . However, CEA has been shown to be beneficial in patients with high-grade carotid stenosis and ipsilateral transient ischemic attack or stroke 3 .
Previous studies have reported perioperative mortality and morbidity rates of approximately 4% to 6% among patients with high-grade stenosis [4][5][6] . The risk of stroke or mortality following CEA varies in different settings 3 . Several studies reported perioperative mortality rates of approximately 1% and stroke rates of 3% [7][8][9][10] . Further, community-based studies have reported mortality rates of approximately 3%, with combined mortality and morbidity rates ranging from approximately 6% to 20% 113 . The overall combined mortality and morbidity rate from CEA in the United States has been estimated to be between 6% and 10% 3 .
There is currently no prognostic algorithm to stratify the risk of mortality in patients undergoing CEA. Therefore, the purpose of this study was (i) to develop a set of predictive factors for 30-day mortality following CEA using machine learning algorithms, and (ii) to create an open-access scoring system to use as a decision-support tool.

STUDY DESIGN AND DATA SOURCES
The American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) database was queried for patients who had undergone CEA in the United States from 2005 to 2016. This is a large multi-institutional, prospectively collected clinical database that reports 30-day postoperative outcomes in the United States. 12,13 Patients who underwent CEA were identified using applicable ICD-9 and Current Procedural Terminology codes. The de-identified NSQIP data are exempt from review by our institutional review board. 14,15 This was a retrospective machine learning classification study (the outcome was binary categorical) for prognostication following CEA.

SPLIT-SAMPLE APPROACH: TRAINING VERSUS VALIDATION SET
Based on the split proportions reported in the literature, 16 we randomly divided the data into training (70%) and validation (30%) datasets. A set of predictive models for 30-day mortality was built on the training dataset using (i) a generalized linear regression model with a logit link function (logistic regression model), and (ii) the least absolute shrinkage and selection operator (LASSO) regularization method. The performance of the prediction models developed on the training dataset was then evaluated on the validation dataset.
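As an illustration only, a 70/30 random split of this kind could be sketched as follows. The function name, the fixed seed, and the stand-in patient identifiers are assumptions made for the example and are not taken from the study; the sketch uses simple (unstratified) random sampling.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Randomly partition records into training and validation lists."""
    shuffled = list(records)      # copy so the caller's list is left untouched
    rng = random.Random(seed)     # fixed seed (an assumption) for reproducibility
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical stand-in for patient identifiers; not NSQIP data.
patient_ids = list(range(1000))
train, validation = split_train_validation(patient_ids)
print(len(train), len(validation))
```

With a rare outcome such as 30-day mortality, a stratified split that preserves the event rate in both partitions is a common alternative; the text does not state whether stratification was used.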

COHORT FEATURES
We extracted the following variables for each patient as potential predictors of postoperative 30-day mortality: 17,18 (i) age (continuous), (ii) gender (male and female), (iii) body mass index (BMI) (continuous), (iv) functional status (independent and dependent), (v) severity of systemic disease as assessed by the American Society of Anesthesiologists (ASA) Classification System (I, II, III, IV-V), (vi) comorbid conditions (diabetes mellitus, hypertension, smoking, cardiovascular disease, pulmonary and renal disorders), (vii) preoperative hematocrit (continuous), (viii) preoperative albumin (g/dl) (continuous), (ix) preoperative alkaline phosphatase, (x) preoperative white blood cell count (continuous), (xi) preoperative platelets (continuous), (xii) preoperative creatinine, and (xiii) preoperative sodium (mEq/L) (continuous). The baseline characteristics of the cohort are illustrated in Table 1. Missing data were imputed using multiple imputation with chained equations. The overall 30-day mortality identified in NSQIP was used as the dependent variable for the development of our algorithm.

OUTCOME VARIABLES: MORTALITY
Mortality was defined as death within 30 days following CEA.

BUILDING THE PREDICTIVE MODEL
The first approach to building the predictive model used logistic regression. A forward and backward stepwise selection procedure was conducted based upon the Akaike Information Criterion. 19 We used a natural cubic spline method to assess the non-linearity of the continuous variables. 20 The second approach was based on a penalized regression model that obtains shrinkage estimators for the regression coefficients using the least absolute shrinkage and selection operator (LASSO) method. 21 Because LASSO imposes a regularization constraint on the model parameters, the regression coefficients of some variables shrink to exactly zero. Furthermore, we used 10-fold cross-validation to find the tuning parameter for each predictive model. 22 We used the absolute value of the z-statistic in each model to evaluate the importance of each included variable.
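The way an L1 penalty drives some coefficients to exactly zero can be illustrated with the soft-thresholding operator, which coordinate-descent LASSO solvers apply at each coefficient update. This is a minimal sketch of that single step, not the fitting procedure used in the study; the coefficient values and the penalty of 0.1 are hypothetical.

```python
def soft_threshold(coefficient, penalty):
    """Shrink a coefficient toward zero by `penalty`; small ones become exactly 0."""
    if coefficient > penalty:
        return coefficient - penalty
    if coefficient < -penalty:
        return coefficient + penalty
    return 0.0


# Hypothetical unpenalized estimates, for illustration only.
unpenalized = {"age": 0.9, "albumin": -0.6, "sodium": 0.05}
penalized = {name: soft_threshold(value, 0.1) for name, value in unpenalized.items()}
print(penalized)
```

In this toy example the weak "sodium" coefficient is removed from the model while the larger coefficients are retained with reduced magnitude, which is how the penalty performs variable selection.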

PERFORMANCE EVALUATION OF THE PREDICTIVE MODEL
We assessed discrimination of the predictive model using the area under the receiver operating characteristic (ROC) curve (AUC) on both the training and validation datasets. Furthermore, calibration was assessed by plotting the observed incidence of mortality against the model-predicted probability. When predicted and observed risks agree, the points are expected to lie close to the 45° diagonal line. Overall model performance was further assessed using the Brier score, which is the mean squared error between the predicted probability and the observed outcome of each model. The Brier score ranges between 0 and 1, and a value of 0 indicates a perfect fit.
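Both metrics can be computed directly from their definitions. The sketch below is illustrative: the Brier score is the mean squared difference between predicted probability and observed outcome, and the AUC is computed as the proportion of event/non-event pairs in which the event received the higher predicted probability (ties count one half). The five outcomes and probabilities are made-up values, not study data.

```python
def brier_score(outcomes, probabilities):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


def auc(outcomes, probabilities):
    """Share of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    events = [p for y, p in zip(outcomes, probabilities) if y == 1]
    non_events = [p for y, p in zip(outcomes, probabilities) if y == 0]
    wins = 0.0
    for event_p in events:
        for non_event_p in non_events:
            if event_p > non_event_p:
                wins += 1.0
            elif event_p == non_event_p:
                wins += 0.5
    return wins / (len(events) * len(non_events))


# Made-up outcomes (1 = death within 30 days) and predicted probabilities.
outcomes = [0, 0, 1, 0, 1]
probabilities = [0.1, 0.4, 0.35, 0.2, 0.8]
print(round(auc(outcomes, probabilities), 3), round(brier_score(outcomes, probabilities), 4))
```

The pairwise AUC shown here is quadratic in the sample size; for a cohort of tens of thousands of patients a rank-based implementation from a statistics library would be the practical choice.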
A simulation study was performed to evaluate the influence of sample size on the performance of the prediction models for overall 30-day mortality. We randomly selected subsets of the data with sample sizes varying from n=10,000 to n=40,000 patients and repeated the model-fitting procedure with logistic regression, calculating the AUC for overall 30-day mortality at each sample size. Furthermore, decision curve analysis was performed to determine the best model for clinical management using net benefit over a range of probability thresholds.
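Net benefit at a given probability threshold is commonly computed as the true-positive rate per patient minus the false-positive rate per patient weighted by the odds of the threshold. The following sketch assumes that standard formulation, which the text does not spell out; the data and the threshold of 0.3 are illustrative only.

```python
def net_benefit(outcomes, probabilities, threshold):
    """Net benefit of changing management for patients with predicted risk >= threshold."""
    n = len(outcomes)
    flagged = [(y, p) for y, p in zip(outcomes, probabilities) if p >= threshold]
    true_positives = sum(1 for y, _ in flagged if y == 1)
    false_positives = sum(1 for y, _ in flagged if y == 0)
    odds = threshold / (1 - threshold)  # weight applied to each false positive
    return true_positives / n - (false_positives / n) * odds


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of changing management for every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up outcomes and predicted probabilities; changing management for no one gives 0.
outcomes = [0, 0, 1, 0, 1]
probabilities = [0.1, 0.4, 0.35, 0.2, 0.8]
model_nb = net_benefit(outcomes, probabilities, 0.3)
all_nb = net_benefit_treat_all(outcomes, 0.3)
print(round(model_nb, 4), round(all_nb, 4))
```

A decision curve is obtained by evaluating these quantities over a grid of thresholds and plotting them together; a model is considered useful over the thresholds where its curve lies above both reference strategies.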

COHORT CHARACTERISTICS
The overall demographics of the cohort of 65,807 patients are illustrated in Table 1. Figure 1 shows the frequency of patients undergoing CEA by patient age. The median age of the cohort was 71.0 years (range, 16-89 years), and 39,184 patients (59.5%) were male.

PREDICTING OVERALL 30-DAY MORTALITY
Overall, 30-day mortality was observed in 0.7% (n=466) of the patients. The receiver operating characteristic curves for overall 30-day mortality based on the training and validation data are shown in Figure 2 (A) and (B) respectively, with an AUC value of 0.72. Furthermore, a calibration plot showing good agreement between predicted probability and observed outcome is illustrated in Figure 3 (Brier score: 0.10). The AUC for predicting overall 30-day mortality was 0.72 (95% CI: 0.60-0.77) with logistic regression compared to 0.59 (95% CI: 0.52-0.69) with LASSO. This demonstrates that the performance of the predictive model based upon LASSO was lower than that of the model based upon logistic regression.

VALIDATION AND SENSITIVITY ANALYSIS
The remaining dataset (30%) was used for validation of the performance of this prediction model. The ROC curve of the overall 30-day mortality showed an AUC of 0.73, 95% CI: 0.61-0.79 (Brier score: 0.15) for the validation dataset.
We further performed a simulation study to determine the influence of increasing the sample size on the AUC. The AUC of the prediction model increased from 0.72 to 0.76 when the sample size was increased from N=10,000 to N=40,000. However, it did not change beyond this sample size, which indicates that further increases in sample size did not improve accuracy.

DECISION CURVE ANALYSIS
The decision curve was further plotted as illustrated in Figure 4, which shows that our prediction model provided a higher net benefit in the management of patients compared to changing management for no patients or for all patients undergoing CEA over all thresholds. The overall predictive probability of our model was 90.8% with an estimated risk of 0.09.

Discussion
Several trials have shown CEA to be superior to medical management for the prevention of stroke in patients with symptomatic or asymptomatic carotid artery stenosis 23,24 . The beneficial effects of CEA on preventing stroke in patients with symptomatic carotid stenosis have been well documented. Studies suggest that symptomatic patients have higher 30-day mortality and stroke rates than asymptomatic patients 25 . Among symptomatic patients, those suffering a pre-operative stroke have higher 30-day death and stroke rates than those presenting with transient ischemic attack or amaurosis fugax. In this study, we developed a predictive model to quantify risks for 30-day mortality following CEA. We evaluated the feasibility of using machine learning algorithms to identify a set of 9 predictive variables from NSQIP and used this to develop an accessible scoring system. The predictive model with the highest performance across discrimination, calibration, and decision analysis was selected. Previous studies 26 have used similar variables, identifying medical comorbidities such as renal failure, ischemic colitis, and acute myocardial infarction, among others, as important risk factors for perioperative morbidity after CEA.
Surgical results of CEA have been shown to be superior or equivalent to those of carotid artery stenting in numerous large-scale clinical trials 27,28 . The number of CEA procedures is declining rapidly relative to carotid artery stenting, as it takes time to master the surgical techniques of CEA 29 . While surgeons are proficient at recognizing neurologic impairment, the most challenging aspect of deciding upon surgery is often determining whether the patient will tolerate the intervention. A better understanding of what leads to complications and increases the risk of mortality, before they happen, may help surgeons and patients decide whether to proceed with intervention by carefully weighing the risks and potential benefits. Though the NSQIP data do not provide the granularity to develop a comprehensive model incorporating these variables, they do allow us to analyze and incorporate several components, including laboratory values, nutritional status, and medical co-morbidities.
In this study, the 30-day death rate of 0.7% is consistent with published data, albeit on the lower end.
Several predictive models were developed primarily using logistic regression with additive main effects as well as 2- and 3-factor interactions. Furthermore, the correlation among predictive factors derived by various methods was high, such as the correlation among factors derived from forward logistic regression and LASSO, showing an AUC of 0.73. Based upon our selected cohort, the predictive model obtained from logistic regression, which accounts for patient- and surgical-related factors, showed comparable accuracy in the training and validation datasets (AUC: 0.72 and 0.73 respectively). The model with 9 predictive factors (age, body mass index, functional health status, American Society of Anesthesiologists grade, chronic obstructive pulmonary disease, preoperative serum albumin, preoperative hematocrit, preoperative serum creatinine and preoperative platelet count) demonstrated a strong association with 30-day mortality following CEA. Since the purpose of this study was to see if we could create a tool to help supplement clinical decision making, it was important to calibrate the model. Therefore, we assessed the calibration slope and intercept both numerically and graphically. In this study, we used logistic regression to evaluate the independent effect of individual risk factors on 30-day mortality.
This study has several limitations. (i) NSQIP is a prospective database with information from 600+ hospitals in the United States, with wide variability in patient and surgical characteristics and limited patient-specific treatment data. However, this may be an advantage over single-institution studies, which are subject to clustering of patients and its associated biases. As such, clinicians should exercise caution and clinical judgement while interpreting the predictive probabilities derived from this analysis. (ii) NSQIP does not provide postoperative outcomes beyond 30 days. Since mortality can occur at any time after surgery, this is a limitation. However, for the cancer patient, the immediate postoperative recovery window is critical to starting or resuming systemic and/or radiation therapies. For this reason, the 30-day time point is relevant in the cancer setting. (iii) Although several variables were incorporated into this model, the AUC for the model did not exceed 0.8, limiting its performance. This could be due to variables not recorded in the database that might contribute to the overall risk of 30-day mortality. The lack of cancer-specific treatment information, such as molecular and hormonal markers and radiation therapies, also prevents the inclusion of these important factors. In addition, incorporation of radiographic imaging and patient-reported outcomes is not possible with this dataset. Alternative machine-learning algorithms such as tree-based methods, Bayes-point machines, neural networks or deep-learning methods may enhance the predictive accuracy, but there is no optimal methodology. (iv) The data in NSQIP are based upon CPT and ICD-9-CM codes; therefore, errors in coding may lead to over- or under-reporting of mortality. (v) The LASSO regression did not show better performance than logistic regression. This could be because the penalized regression approach does better when the number of observations is small relative to the number of features, which is not the case in our study.
Despite these limitations, we were able to develop a machine learning algorithm for predicting 30-day mortality following CEA. The model performed well across discrimination, calibration and decision analysis. Furthermore, we created a predictive probability scoring system and developed an open-access web calculator for risk stratification. With further development using prospectively acquired data incorporating medical, surgical, oncologic, radiographic, nutritional, and patient-reported outcome data, instruments like this could foreseeably be integrated into electronic medical record systems to provide real-time risk assessments to aid clinical decision making at the bedside.

Conclusions
Machine learning algorithms provide a novel computational model that can help predict outcomes following CEA. Applied to CEA, these algorithms may improve the assessment of postoperative outcomes, optimize risk stratification, and refine complication-avoidance management strategies. Further study is required to externally validate the proposed system in clinical practice.

Declarations
Competing interests:

Figure 3. The calibration model of the overall 30-day mortality for the training dataset.

Figure 4. Decision curve analysis for prediction of 30-day mortality: In decision curve analysis, the y-axis measures net benefit, calculated by summing the benefits (true positives) and subtracting the harms (false positives). The straight line indicates net benefit through changing the management for no patients, the horizontal line indicates changing the management for all patients, and the dotted line indicates changing management based on the overall prediction model.