Predicting Long-term Mortality in Patients with Stable Angina Across the Spectrum of Dysglycemia: A Machine Learning Approach

Background: We aimed to develop and validate a model to predict mortality in patients with stable angina across the spectrum of dysglycemia. Methods: A total of 1479 patients admitted for coronary angiography due to angina were enrolled. All-cause mortality was followed up and served as the primary endpoint. We compared the performance of different machine learning models for survival analysis and used least absolute shrinkage and selection operator (LASSO) to select important features. Performance was evaluated using Harrell’s C-index and the Brier score. The models were validated with five-fold cross validation to predict long-term mortality. Results: The features selected by LASSO were age, heart rate, plasma glucose levels at 30 min and 120 min during an oral glucose tolerance test (OGTT), use of angiotensin II receptor blockers, use of diuretics, and smoking history. The best performing model was built using a random survival forest with selected features. It had a good discriminative ability (Harrell’s C-index: 0.829) and acceptable calibration (Brier score: 0.08) for predicting long-term mortality. Among patients with obstructive coronary artery disease confirmed by angiography, our model outperformed the Global Registry of Acute Coronary Events discharge score for mortality prediction (Harrell’s C-index: 0.829 vs. 0.739, P < 0.001). Conclusions: We developed a machine learning model to predict long-term mortality among patients with stable angina. With the integration of OGTT, the model could help to identify patients with stable angina at high risk of mortality across the spectrum of dysglycemia.


Background
Angina is a common symptom of ischemic heart diseases [1]. Based on coronary angiography, ischemic heart disease can be stratified as obstructive coronary artery disease (CAD) and non-obstructive CAD [1,2]. Obstructive CAD has an established association with increased cardiovascular mortality and is the target of treatment in contemporary practice [3,4]. However, a growing body of evidence has revealed that almost 50%~70% of patients with angina have no significant obstruction on coronary angiography [5,6], and they are associated with a high risk of all-cause mortality and cardiovascular events [7,8]. Since there is heterogeneity in pathogenesis among patients with stable angina, the European Society of Cardiology published associated guidelines to address differences in diagnosis and prognosis [9]. However, the optimal management strategy for patients with stable angina is still under debate, and a risk stratification model to identify patients at high risk and to tailor personal management is warranted [10][11][12].
Current risk stratification models for patients with ischemic heart disease are mainly based on patients with obstructive CAD, ranging from stable CAD to acute coronary syndrome [13]; however, a predictive model focused on patients with stable angina is still lacking. In addition, glucose perturbation represents a well-documented risk factor for atherosclerosis and is commonly seen in patients with ischemic heart disease [14]. A recent survey showed that dysglycemia detected by the oral glucose tolerance test (OGTT) is prevalent in patients with CAD [15]. However, contemporary predictive models for patients with ischemic heart disease seldom integrate glucose indices into their parameters. For example, the Global Registry of Acute Coronary Events (GRACE) discharge score [16], which is a widely used risk score for patients with acute coronary syndromes and has recently had its predictive ability validated for patients with stable CAD [17], has long been criticized for not including glucose indices [18]. Herein, we aimed to use data collected from patients with stable angina undergoing coronary angiography and OGTT to build a machine learning model for predicting long-term mortality.

Setting and participants
Data were obtained from a prospective, observational study conducted at Taichung Veterans General Hospital. The study enrolled adult patients admitted for coronary angiography between April 2009 and December 2018 due to symptoms of typical angina and under suspicion of ischemic heart disease by cardiologists. Patients were excluded if they had undergone coronary artery bypass graft or had been diagnosed with diabetes. All patients underwent OGTT after overnight fasting at an outpatient visit after discharge, and glucose levels were tested at fasting, at 30 minutes, and at 120 minutes during the OGTT. Normal glucose regulation was defined as a fasting plasma glucose (FPG) < 100 mg/dL and a glucose level at 120 minutes (OGTT 120 min) < 140 mg/dL. Newly diagnosed diabetes was defined as an FPG ≥ 126 mg/dL or an OGTT 120 min ≥ 200 mg/dL. Prediabetes was defined as glucose regulation between that of normal glucose regulation and diabetes. Obstructive CAD was defined as at least one lesion with ≥ 50% stenosis, and non-obstructive CAD was defined as no lesion with ≥ 50% stenosis in coronary angiography reports. Baseline data collected upon admission, coronary angiography reports, and OGTT results and medications at the outpatient visit were considered as candidate variables for analysis. Variables with > 20% missing values were excluded, and multivariate imputation by chained equations was applied for the remaining variables [19]. Mortality data up to December 2019 were retrieved from the Collaboration Center of Health Information Application, Department of Health, Executive Yuan, Taiwan, and served as the outcome of interest.
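The glucose-regulation and CAD definitions above can be written as simple rules. The following is a minimal sketch of those thresholds only; the function and argument names are hypothetical and are not taken from the study's analysis code.

```python
def classify_glycemia(fpg_mg_dl, ogtt_120_mg_dl):
    """Glucose-regulation category from FPG and 120-min OGTT glucose (both in mg/dL)."""
    # Newly diagnosed diabetes: either threshold is enough.
    if fpg_mg_dl >= 126 or ogtt_120_mg_dl >= 200:
        return "newly diagnosed diabetes"
    # Normal glucose regulation: both values must be below their cut-offs.
    if fpg_mg_dl < 100 and ogtt_120_mg_dl < 140:
        return "normal glucose regulation"
    # Everything in between is prediabetes.
    return "prediabetes"


def classify_cad(stenosis_percentages):
    """CAD category from the percent stenosis of each lesion in an angiography report."""
    if any(s >= 50 for s in stenosis_percentages):
        return "obstructive CAD"
    return "non-obstructive CAD"


print(classify_glycemia(95, 130))   # normal glucose regulation
print(classify_glycemia(110, 130))  # prediabetes
print(classify_cad([30, 70]))       # obstructive CAD
```

Note that the 30-minute OGTT value is not part of these category definitions; in the study it enters only as a candidate predictor.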

Model development and evaluation
Model development consisted of two parts. In the first part, we used all available variables to develop prediction models. We chose different machine learning methods, including random survival forest (RSF), a gradient boosting algorithm (XGBoost) for survival analysis, and a discrete-time survival model for neural networks [20,21]. RSF is an ensemble tree method for analysis of right-censored survival data. It can handle complex interactions among variables, including mixed data types and nonlinear relationships between variables. XGBoost is a boosting tree-based ensemble algorithm whose performance is iteratively improved through optimization of a customized objective function. To handle time-to-event data, the objective function of XGBoost was set as Cox regression. The discrete-time survival model for neural networks was implemented in the Keras deep learning framework and was trained with the maximum likelihood method using minibatch stochastic gradient descent. The likelihood function was used as the loss function, and it naturally incorporated non-proportional hazards. The prediction of these machine learning methods was the hazard ratio of each individual, and their results were compared with those of Cox proportional hazards models.
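To illustrate why a discrete-time likelihood accommodates non-proportional hazards, the sketch below computes a survival curve and one patient's log-likelihood from per-interval conditional hazards. It is a simplified illustration, not the study's implementation: the hazards are hypothetical model outputs, and a censored patient is assumed to contribute only the intervals fully survived before censoring.

```python
import math


def survival_curve(hazards):
    """Survival at the end of each interval: S_k = prod over j <= k of (1 - h_j)."""
    curve, surviving = [], 1.0
    for h in hazards:
        surviving *= 1.0 - h
        curve.append(surviving)
    return curve


def log_likelihood(hazards, last_interval, event):
    """Log-likelihood of one patient under a discrete-time survival model.

    hazards: conditional event probability in each interval (hypothetical output)
    last_interval: index of the interval in which follow-up ended
    event: True if the patient died in that interval, False if censored
    """
    # Contribution of every interval the patient survived completely.
    ll = sum(math.log(1.0 - h) for h in hazards[:last_interval])
    # A death adds the probability of the event in the final interval.
    if event:
        ll += math.log(hazards[last_interval])
    return ll


hazards = [0.1, 0.2, 0.5]  # free to differ per interval and per patient
print(survival_curve(hazards))
print(log_likelihood(hazards, last_interval=2, event=True))
```

Because each interval has its own hazard, the ratio of hazards between two patients is free to change over time, which is the sense in which no proportional-hazards constraint is imposed.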
In the second part, we used features selected by the least absolute shrinkage and selection operator (LASSO)-derived Cox proportional hazards model to construct the predictive model. LASSO regularization has the ability to shrink the estimations and force certain coefficients to zero, thereby keeping only the important features [22]. Significant predictors with P values < 0.05 in the LASSO-derived Cox proportional hazards model were selected as input variables for the machine learning methods mentioned above.
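For reference, a LASSO-penalized Cox model is commonly written as the penalized partial-likelihood problem below. The notation is introduced here for illustration and is not taken from the original analysis: x_i is the covariate vector of patient i, δ_i the event indicator, R(t_i) the set of patients still at risk at time t_i, p the number of candidate variables, and λ ≥ 0 the penalty weight; tied event times are ignored.

```latex
\hat{\beta} = \arg\max_{\beta} \left\{ \sum_{i:\,\delta_i = 1} \left[ x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\!\left( x_j^{\top}\beta \right) \right] - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert \right\}
```

Larger values of λ drive more coefficients β_k exactly to zero, which is what allows the penalty to act as a feature selector.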
The prediction models were internally validated using five-fold cross validation. Discrimination ability was assessed using Harrell's C-index [23]. Calibration was evaluated using the Brier score [5]. The Brier score measures the mean squared difference between the predicted probability and the actual outcome, with a lower score indicating better calibrated predictions. To explain the contribution of each variable to the best performing model, Shapley values were utilized [24,25]. Based on game theory, Shapley values can explain a model's prediction by computing the importance of each feature to the prediction.
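The two evaluation metrics can be sketched in a few lines. This is an illustrative implementation on toy data, not the code used in the study: the Brier score here is the plain binary version at a fixed time horizon and, as a simplifying assumption, applies no correction for censoring.

```python
def harrell_c_index(times, events, risk_scores):
    """Share of comparable patient pairs in which the higher risk score died first."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def brier_score(predicted_probs, outcomes):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted_probs, outcomes)) / len(outcomes)


# Toy data: follow-up in years, 1 = died, higher score = higher predicted risk.
times = [2, 4, 6, 8]
events = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.1, 0.5]
print(harrell_c_index(times, events, scores))  # 0.75 (3 of 4 comparable pairs)
print(brier_score([0.9, 0.2, 0.6], [1, 0, 0]))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, whereas a lower Brier score is better.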
Comparison with GRACE discharge score
For patients with obstructive CAD confirmed by coronary angiography, we calculated their GRACE discharge score and compared its performance with that of the best performing model. Net reclassification improvement (NRI) and integrated discrimination improvement (IDI) were used to evaluate the improvement in predictive power of our final model compared with the GRACE discharge score [26]. Analyses were performed using R 3.4 software (The R Project for Statistical Computing, Vienna, Austria) and Python (version 3.6).
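As a rough illustration of the two reclassification indices, the sketch below computes the IDI and a category-free (continuous) NRI for a binary outcome at a fixed time horizon. The text does not state which NRI variant was used or how censoring was handled, so both choices here are assumptions, and the risk values are toy numbers rather than study data.

```python
from statistics import mean


def split_by_outcome(values, outcomes):
    """Separate values into those of patients with (1) and without (0) the event."""
    events = [v for v, y in zip(values, outcomes) if y == 1]
    nonevents = [v for v, y in zip(values, outcomes) if y == 0]
    return events, nonevents


def idi(old_risk, new_risk, outcomes):
    """Gain in the mean risk gap between events and non-events."""
    old_ev, old_ne = split_by_outcome(old_risk, outcomes)
    new_ev, new_ne = split_by_outcome(new_risk, outcomes)
    return (mean(new_ev) - mean(new_ne)) - (mean(old_ev) - mean(old_ne))


def continuous_nri(old_risk, new_risk, outcomes):
    """Net share of events moved up plus net share of non-events moved down."""
    diffs = [new - old for old, new in zip(old_risk, new_risk)]
    ev, ne = split_by_outcome(diffs, outcomes)
    nri_events = (sum(d > 0 for d in ev) - sum(d < 0 for d in ev)) / len(ev)
    nri_nonevents = (sum(d < 0 for d in ne) - sum(d > 0 for d in ne)) / len(ne)
    return nri_events + nri_nonevents


# Toy predicted risks from a reference score (old) and a new model.
old = [0.2, 0.4, 0.3, 0.5]
new = [0.1, 0.7, 0.2, 0.6]
died = [0, 1, 0, 1]
print(idi(old, new, died))
print(continuous_nri(old, new, died))
```

Positive values of either index indicate that the new model separates patients who died from those who survived better than the reference score does.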

Characteristics of enrolled patients
A total of 1479 patients were included in the analyses, and 157 patients (10.6%) died during a median follow up of 6 years (interquartile range, 3.1-9.1 years). Table 1 lists the cohort's baseline characteristics. Briefly, patients who died tended to be older, with higher heart rates, higher urine albumin-to-creatinine ratios, higher uric acid levels, higher high-density lipoprotein cholesterol levels, higher FPG, and higher OGTT 120 min, whereas their body mass index, diastolic blood pressure, triglycerides, glutamic pyruvic transaminase, hemoglobin, and estimated glomerular filtration rate were lower compared with patients who survived. Patients who died during follow up had a lower prevalence of beta-blocker use and a higher prevalence of newly diagnosed diabetes, CAD history before admission, smoking history, and use of angiotensin-converting enzyme inhibitors, angiotensin II receptor blockers (ARB), alpha blockers, and diuretics compared with patients who survived.

Feature importance and model performance
The significant predictors selected by the LASSO-derived Cox proportional hazards model included age, heart rate, glucose level at 30 minutes (OGTT 30 min), OGTT 120 min, CAD history, smoking history, use of angiotensin II receptor blockers, and use of diuretics (Table 2). The performance of the predictive models is shown in Table 3. The RSF model after feature selection had the highest Harrell's C-index (Harrell's C-index: 0.829) and acceptable calibration (Brier score: 0.08). Figure 1 shows the area under the receiver operating characteristic curve and the calibration plot of predicted risks at 10 years. Shapley values of the variables in the best performing model are shown in Fig. 2.

Comparison with GRACE discharge score
Among patients with obstructive CAD confirmed by angiography during admission, Harrell's C-index of the best performing model was significantly greater than that of the GRACE discharge score (0.829 vs. 0.739, respectively; P < 0.001). The NRI (0.328, 95% confidence interval [CI]: 0.096-0.583, P = 0.027) and IDI (0.135, 95% CI: 0.068-0.203, P = 0.007) indices also showed improvement in predictive ability compared with the GRACE discharge score (Table 4).

Table 4. Comparison between GRACE discharge score and the best-performing model. Columns: Model; Harrell's C-index.

Discussion
In this study, we built a machine learning-based model to predict all-cause mortality among patients with stable angina across the spectrum of dysglycemia. With glucose indices obtained from OGTT and other available clinical data, this model showed good discrimination and accuracy in predicting long-term mortality after coronary angiography. To the best of our knowledge, this study is among the first to compare state-of-the-art machine learning methods to predict survival in patients with stable angina, with an emphasis on OGTT results as important parameters.
For patients with obstructive CAD, several predictive models have been developed to predict major cardiovascular events and mortality. For example, the accuracy of the GRACE discharge score in predicting mortality 2 years after coronary angiography was recently validated, with an area under the curve of 0.61 for patients with stable CAD [17]. The ABC-CHD score [27], with risk factors identified by the Cox proportional hazards model, including age, biomarkers (N-terminal prohormone of brain natriuretic peptide and troponin-T), and clinical histories (smoking, diabetes, and presence of peripheral artery disease), has good discriminatory ability (Harrell's C-index: 0.71) and calibration for three-year mortality. However, previous models seldom include patients with non-obstructive CAD. It has been reported that more than half of patients with angina have no obstructive CAD during coronary angiography [28-30]. Since a sizable proportion of patients with angina have non-obstructive CAD, our model, which was derived from a cohort in which almost 50% of patients had non-obstructive CAD, is more representative of real-world patients with stable angina. Even for patients with obstructive CAD, our model outperformed the GRACE discharge score in predicting long-term mortality.
The predictors detected by our LASSO-derived Cox proportional hazards models were age, diuretic use, ARB use, heart rate at admission, OGTT 120 min, and OGTT 30 min. Although some are well-established risk factors for mortality and have been included in previous predictive models for patients with CAD, using OGTT results as parameters for risk stratification has not been investigated before. Our study showed high accuracy in mortality prediction with the integration of OGTT results. According to the EUROASPIRE study, OGTT 120 min is a predictor of major cardiovascular events and mortality for patients without diabetes [31]. Chattopadhyay et al. [32] showed that, after adjustment for OGTT 120 min, the GRACE discharge score has an improved prognostic ability for patients with acute coronary syndrome. Consistent with these studies, our model containing OGTT 120 min outperformed the GRACE discharge score for predicting mortality among patients with stable angina, supporting the importance of OGTT 120 min for mortality prediction among patients with ischemic heart disease. Our model also revealed previously unidentified predictors, such as OGTT 30 min. OGTT 30 min has predictive value for developing type 2 diabetes and is associated with inflammatory markers [33,34]; however, its role in mortality prediction has not been previously evaluated. Based on the Shapley value derived from our model, OGTT 30 min also contributed to mortality prediction in patients with stable angina. Further prospective studies are warranted to elucidate the prognostic value of OGTT 30 min.
There are several clinical applications of this model. Our model is derived from patients with stable angina. More than half of patients with stable angina have non-obstructive CAD, and approximately 20%~40% of patients with CAD still suffer from angina symptoms after revascularization [4]. There is heterogeneity in prognosis among patients with stable angina, and it is important to stratify their risk and tailor their management strategy. However, contemporary clinical practice mainly focuses on prevention and management of obstructive CAD [35], despite the fact that the risk of major cardiovascular events and mortality among patients with non-obstructive CAD is increased [36]. Our model could help to identify patients with stable angina at a high risk of mortality. In addition, our model emphasized the importance of OGTT. Screening for dysglycemia using OGTT in patients undergoing percutaneous coronary intervention has long been proposed and is also recommended in European Society of Cardiology guidelines [36,37]. However, adherence to this recommendation is poor [38], partially because the prognostic role of OGTT is less clear. Our model, which adds prognostic value to the OGTT for patients with stable angina, could increase adherence to this recommendation.
The major strength of this study is that we used the LASSO-derived Cox proportional hazards model for feature selection and advanced machine learning methods for model development. Only six variables were needed in our model after utilization of LASSO regularization, and most of them, except OGTT, were available from electronic health records, making it a convenient tool to implement in clinical practice. The best performing machine learning method in our study was RSF. Previous predictive models for CAD were usually built using Cox proportional hazards models; however, several assumptions must be met before applying the Cox proportional hazards model. Conversely, machine learning methods, such as RSF, can handle non-linear, complex relationships between features without such assumptions, thus widening their clinical application. However, there are still some limitations of the current study that should be highlighted.
First, enrollment in this cohort began in 2009, and contemporary anti-diabetic medications, such as sodium-glucose co-transporter 2 inhibitors and glucagon-like peptide-1 receptor agonists, which can reduce mortality risk in patients with type 2 diabetes, were seldom prescribed. Second, our model has not been externally validated with independent datasets, so its performance in other populations is unknown.

Conclusions
We developed a machine learning model containing OGTT results and other clinically available parameters to predict all-cause mortality among patients with stable angina. This model could help to identify patients at a high risk of mortality. With the integration of glucose indices from OGTT, patients with dysglycemia could be identified early, enabling their risk of mortality to be accurately evaluated.

Declarations
Ethics approval and consent to participate: The study complied with the Declaration of Helsinki and was approved by the Institutional Review Board of Taichung Veterans General Hospital. Written consent was obtained from each patient before the study procedures were performed.

Consent for publication:
Not applicable.
Availability of data and materials: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Competing interests: The authors declare that they have no competing interests.
Funding: This work was supported by Taichung Veterans General Hospital, Taichung, Taiwan (grant number TCVGH-1093503C), the Ministry of Science and Technology, Taiwan (grant number MOST 109-2314-B-075A-004), and National Health Research Institute (grant number NHRI-EX109-10927HT). The funders had no role in the decision to submit the manuscript for publication.
Authors' contributions: Y.L. contributed to the study design, data collection, data analysis, and drafting of the manuscript. W.H.S. contributed to the study design and data collection. W.Y. contributed to the data analysis. Y.C. contributed to the data analysis and to the drafting and editing of the manuscript. I.L. contributed to the study design, data collection, and editing of the manuscript. All authors performed a critical revision of the manuscript for important intellectual content. I.L. is the guarantor of this work and takes full responsibility for the work as a whole, including the study design, access to data, and the decision to submit and publish the manuscript.