STUDY DESIGN AND DATA SOURCES
The American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) database was queried for patients who had undergone CEA in the United States from 2005 to 2016. NSQIP is a large, multi-institutional, prospectively collected clinical database that reports 30-day postoperative outcomes.12,13 Patients who underwent CEA were identified using applicable ICD-9 and Current Procedural Terminology (CPT) codes. The de-identified NSQIP data are exempt from review by our Institutional Review Board. IBM® SPSS® Statistics software v23.0 (IBM Corp., Armonk, New York), Microsoft Azure (Microsoft Corporation, Redmond, Washington), R version 3.4.3 (The R Foundation, Vienna, Austria), RStudio version 1.0.153 (RStudio, Boston, Massachusetts), and Python version 3.6 (Python Software Foundation, Wilmington, Delaware) were used for data analysis, model development, and scoring system development.
GUIDELINES
The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement and the JMIR Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research were followed.14,15 This was a retrospective machine learning classification study (the outcome was binary categorical) for prognostication following CEA.
SPLIT-SAMPLE APPROACH: TRAINING VERSUS VALIDATION SET
Based on the split proportions reported in the literature,16 we randomly divided the data into a training dataset (70%) and a validation dataset (30%). Predictive models for 30-day mortality were developed on the training dataset using (i) a generalized linear model with logit link function (logistic regression) and (ii) a penalized regression model with the least absolute shrinkage and selection operator (LASSO) regularization method. The performance of the prediction models developed on the training dataset was then evaluated on the validation dataset.
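For illustration, this step can be sketched in Python as follows; the analysis itself was performed with the software listed above, and the file, DataFrame, and column names here are placeholders. Stratifying the split on the outcome is a common choice for rare events, not one reported in the text.

```python
# Minimal sketch of the 70/30 split, assuming the NSQIP extract has been
# loaded into a pandas DataFrame with a binary 30-day mortality column.
# File and column names are illustrative, not from the source.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("nsqip_cea_2005_2016.csv")  # hypothetical extract

train_df, valid_df = train_test_split(
    df,
    test_size=0.30,                 # 30% held out for validation
    random_state=42,                # fixed seed for reproducibility
    stratify=df["mortality_30d"],   # preserves the rare event rate (an assumption)
)
```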
COHORT FEATURES
We extracted the following variables for each patient as potential predictors of postoperative 30-day mortality:17,18 (i) age (continuous), (ii) gender (male or female), (iii) body mass index (BMI) (continuous), (iv) functional status (independent or dependent), (v) severity of systemic disease as assessed by the American Society of Anesthesiologists (ASA) Classification System (I, II, III, IV-V), (vi) co-morbid conditions (diabetes mellitus, hypertension, smoking, cardiovascular disease, pulmonary and renal disorders), (vii) preoperative hematocrit (continuous), (viii) preoperative albumin (g/dl) (continuous), (ix) preoperative alkaline phosphatase (continuous), (x) preoperative white blood cell count (continuous), (xi) preoperative platelets (continuous), (xii) preoperative creatinine (continuous), and (xiii) preoperative sodium (mEq/L) (continuous). The baseline characteristics of the cohort are summarized in Table 1. Missing data were imputed using multiple imputation with chained equations. The overall 30-day mortality identified in NSQIP was used as the dependent variable for the development of our algorithm.
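A chained-equations imputation can be sketched in Python as shown below. Note that scikit-learn's IterativeImputer performs a single MICE-style imputation pass rather than full multiple imputation with pooling, and the column names continue the placeholders used above.

```python
# Hedged sketch of chained-equations imputation. IterativeImputer is a
# MICE-style imputer that produces one completed dataset; full multiple
# imputation would repeat this with different seeds and pool the results.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

continuous_cols = ["age", "bmi", "hematocrit", "albumin", "alk_phos",
                   "wbc", "platelets", "creatinine", "sodium"]  # illustrative names

imputer = IterativeImputer(max_iter=10, random_state=0)
train_df[continuous_cols] = imputer.fit_transform(train_df[continuous_cols])
# Reuse the imputer fitted on the training data for the validation set
valid_df[continuous_cols] = imputer.transform(valid_df[continuous_cols])
```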
OUTCOME VARIABLES: MORTALITY
Mortality was defined as death within 30 days following CEA.
BUILDING THE PREDICTIVE MODEL
The first approach to building the predictive model used logistic regression. A forward and backward stepwise selection procedure was conducted based on the Akaike Information Criterion (AIC).19 We used a natural cubic spline method to assess non-linearity of the continuous variables.20 The second approach was based on a penalized regression model that obtains shrinkage estimators for the regression coefficients using the least absolute shrinkage and selection operator (LASSO) method.21 LASSO imposes a constraint on the model parameters through regularization, shrinking the regression coefficients of some variables to zero and thereby performing variable selection. Furthermore, we used 10-fold cross-validation to select the tuning parameter for each predictive model.22 The absolute value of the z-statistic was used in each model to evaluate the importance of the included variables.
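The LASSO step can be sketched in Python as an L1-penalized logistic regression with the penalty strength tuned by 10-fold cross-validation. This is a hedged sketch, not the authors' exact pipeline; it assumes predictors have already been encoded numerically and continues the placeholder names above.

```python
# Hedged sketch of LASSO logistic regression with 10-fold cross-validation
# over the penalty strength; not the authors' exact implementation.
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_train = train_df.drop(columns=["mortality_30d"])  # placeholder column name
y_train = train_df["mortality_30d"]

lasso_model = make_pipeline(
    StandardScaler(),                # the L1 penalty assumes comparable scales
    LogisticRegressionCV(
        penalty="l1",
        solver="liblinear",          # solver that supports the L1 penalty
        cv=10,                       # 10-fold cross-validation for tuning
        scoring="roc_auc",
        max_iter=1000,
    ),
)
lasso_model.fit(X_train, y_train)
# Coefficients shrunk exactly to zero correspond to excluded variables
```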
PERFORMANCE EVALUATION OF THE PREDICTIVE MODEL
We assessed discrimination of the predictive models using the area under the receiver operating characteristic (ROC) curve (AUC) on both the training and validation datasets. Furthermore, calibration was assessed by plotting the observed incidence of mortality against the model-predicted probability. For a well-calibrated model, the predictions are expected to lie close to the 45° diagonal line. Overall model performance was further assessed using the Brier score, the mean squared error between the predicted probabilities and the observed outcomes of each model. The Brier score ranges between 0 and 1, with a value of 0 indicating a perfect fit.
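These metrics can be computed, for example, as follows (a minimal Python sketch continuing the placeholder names used above):

```python
# Hedged sketch of discrimination, calibration, and Brier score on the
# validation set, continuing the placeholder names used above.
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss, roc_auc_score

X_valid = valid_df.drop(columns=["mortality_30d"])
y_valid = valid_df["mortality_30d"]

pred = lasso_model.predict_proba(X_valid)[:, 1]  # predicted mortality risk

auc = roc_auc_score(y_valid, pred)        # discrimination
brier = brier_score_loss(y_valid, pred)   # overall performance; 0 = perfect fit

# Calibration: observed event rate per bin of predicted probability.
# Plotting obs_rate against mean_pred should track the 45-degree diagonal
# for a well-calibrated model.
obs_rate, mean_pred = calibration_curve(y_valid, pred, n_bins=10)
```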
A simulation study was performed to evaluate the influence of sample size on the performance of the prediction models for overall 30-day mortality. We randomly selected subsets of the data with sample sizes varying from n = 10,000 to n = 40,000 patients and repeated the model-fitting procedure, calculating the AUC for the prediction of overall 30-day mortality using logistic regression. Furthermore, decision curve analysis was performed to determine the best model for clinical management, based on net benefit over a range of threshold probabilities.
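For reference, the quantity underlying decision curve analysis is the net benefit at a threshold probability pt, defined as TP/n - (FP/n)(pt/(1 - pt)), where TP and FP count patients treated appropriately and unnecessarily at that threshold. A sketch under the placeholder names above:

```python
# Hedged sketch of the net-benefit calculation used in decision curve
# analysis; variable names continue the placeholders above.
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk exceeds threshold."""
    treat = y_prob >= threshold
    n = len(y_true)
    tp = np.sum((y_true == 1) & treat) / n   # true positives per patient
    fp = np.sum((y_true == 0) & treat) / n   # false positives per patient
    return tp - fp * threshold / (1.0 - threshold)

# Evaluate each model's net benefit over a range of threshold probabilities
thresholds = np.linspace(0.01, 0.30, 30)
nb_curve = [net_benefit(y_valid.values, pred, t) for t in thresholds]
```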