Incorporating longitudinal history of risk factors into atherosclerotic cardiovascular disease risk prediction using deep learning

Background It is increasingly clear that longitudinal risk factor levels and trajectories are related to risk for atherosclerotic cardiovascular disease (ASCVD) above and beyond single measures. The Pooled Cohort Equations (PCE), currently used in clinical care, are based on regression methods that predict ASCVD risk from cross-sectional risk factor levels. Deep learning (DL) models have been developed to incorporate longitudinal data for risk prediction, but their benefit for ASCVD risk prediction relative to the traditional PCE remains unknown. Objective To develop an ASCVD risk prediction model that incorporates longitudinal risk factors using deep learning. Methods Our study included 15,565 participants from four cardiovascular disease cohorts free of baseline ASCVD who were followed for adjudicated ASCVD. Ten-year ASCVD risk was calculated in the training set using our benchmark, the PCE, and a longitudinal DL model, Dynamic-DeepHit. Predictors included those incorporated in the PCE: sex, race, age, total cholesterol, high-density lipoprotein cholesterol, systolic and diastolic blood pressure, diabetes, hypertension treatment, and smoking. The discrimination and calibration performance of the two models was evaluated in an overall hold-out testing dataset. Results Of the 15,565 participants in our dataset, 2,170 (13.9%) developed ASCVD. The performance of the longitudinal DL model that incorporated 8 years of longitudinal risk factor data improved upon that of the PCE [AUROC: 0.815 (CI: 0.782–0.844) vs 0.792 (CI: 0.760–0.825)] and the net reclassification index was 0.385. The Brier score for the DL model was 0.0514 compared with 0.0542 for the PCE. Conclusion Incorporating longitudinal risk factors in ASCVD risk prediction using DL can improve model discrimination and calibration.


Introduction
The Pooled Cohort Equations (PCE) were developed by the American College of Cardiology (ACC) and American Heart Association (AHA) in 2013 and updated in 2018 using data from 9 longitudinal cohort studies as a tool for clinicians to predict 10-year risk of atherosclerotic cardiovascular disease (ASCVD). 1,2 The PCE are a set of race- and sex-specific Cox proportional hazards models that include widely accepted clinical and behavioral risk factors for ASCVD, including age, sex, race, systolic (SBP) and diastolic blood pressure (DBP), total cholesterol, high-density lipoprotein (HDL) cholesterol, smoking status, and type 2 diabetes. In clinical practice, risk predictions from the PCE are a key criterion to determine eligibility for moderate- to high-intensity statins and hypertension treatments. 1,3 However, numerous studies have found that the performance of the PCE varies across demographic groups; 4-6 c-statistics from these studies ranged from 0.55 to 0.77 (average: 0.70) in men and 0.61 to 0.82 (average: 0.74) in women. 7,8 Additionally, current clinical guidelines provide more ambivalent and complex treatment recommendations for those who fall in the borderline (5-7.5%) and intermediate (7.5-20%) risk groups. 9 A more accurate and robust risk prediction algorithm can help physicians better assess an individual's risk, allowing them to make more appropriate treatment decisions.
A growing number of studies have demonstrated that long-term risk factor levels are associated with an individual's risk for the development of ASCVD. For instance, incident CVD risk was shown to be dependent on cumulative exposure to LDL-C. 10 In a separate study, incident CVD and survival were also found to be associated with 10-year cumulative SBP. 11 Hence, long-term risk factor patterns may be predictive of ASCVD risk above and beyond cross-sectional levels. 12 In a prior study, after including 5-year and 10-year cumulative blood pressure measurements in the PCE, researchers found a moderate improvement in the net reclassification index. 13 Additionally, full integration of multiple longitudinal trajectories of clinical factors into ASCVD prediction is now feasible in clinical practice given advances in computing and electronic medical record (EMR) systems that allow clinicians to access longitudinal risk factor data for their patients.
5][16] Compared with traditional statistical methods, deep learning methods are often superior at processing and creating representations of complex data, such as radiology images and unstructured physician notes, 17,18 without the need for prior feature engineering or selection. 15,19 Hence, deep learning can more thoroughly extract and leverage the rich features stored in longitudinal data, such as longitudinal blood pressure measurements recorded in electronic health records (EHRs), for predictive tasks.
In this study, we incorporated cross-sectional and longitudinal clinical and behavioral risk factor levels into a state-of-the-art deep learning architecture to create a new prediction model for 10-year risk of incident ASCVD in a pooled cohort of 4 US-based, diverse longitudinal cohorts. We evaluated our model's predictive performance in comparison to that of the PCE in the overall population and in key population subgroups to better understand the importance of longitudinal data for ASCVD risk prediction. Moreover, we determined the importance of each clinical variable used in the prediction model. Lastly, we performed additional evaluations of the model performance in the borderline and intermediate risk groups to better understand our model's potential impact on clinical decision making.

Baseline Characteristics
Baseline demographics and measurements of CVD risk factors included in the PCE are described in Table 1. Pooled cohort participants included in this study were 55% female, 27% non-Hispanic Black, and 50 years old on average. Participants who developed ASCVD in the prediction period had significantly higher levels of ASCVD risk factors than participants who did not develop ASCVD.

Performance of Models
Table 3 shows the discrimination of the three models in the training and testing datasets. The AUCs for the PCE and the longitudinal Dynamic-DeepHit model were 0.792 (CI: 0.760-0.825) and 0.815 (CI: 0.782-0.844), respectively. The Dynamic-DeepHit model shows a slight improvement in discrimination over the PCE model. The cross-sectional deep learning model achieved an AUC of 0.807 (CI: 0.778-0.838) (Supplemental Table 2). The continuous net reclassification index (NRI) for the Dynamic-DeepHit model compared with the PCE was 0.385. The Brier score was 0.054 for the PCE model, 0.052 for the cross-sectional deep learning model, and 0.051 for the longitudinal deep learning model, showing meaningful improvement in model calibration.
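For readers unfamiliar with the continuous NRI reported above, the following is a minimal sketch of the standard category-free definition: the net proportion of events whose predicted risk moves up plus the net proportion of non-events whose predicted risk moves down. It treats outcomes as binary and ignores censoring, which is a simplifying assumption; the function and argument names are illustrative and are not taken from the study's code.

```python
def continuous_nri(old_risk, new_risk, event):
    """Category-free NRI of new_risk relative to old_risk.

    old_risk, new_risk: predicted risks for the same individuals.
    event: 1/True if the individual had the outcome, else 0/False.
    Simplifying assumption: binary outcomes, no censoring.
    """
    up_e = down_e = n_e = 0  # counts among events
    up_n = down_n = n_n = 0  # counts among non-events
    for old, new, y in zip(old_risk, new_risk, event):
        if y:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    # Upward moves are correct for events; downward moves are correct for non-events.
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n
```

The value ranges from -2 to 2, with positive values indicating that the new model moves risks in the correct direction more often than in the wrong one.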
The predicted risks derived from the Dynamic-DeepHit model were found to be generally lower than the risks derived from the PCE (Figure 2). In Figure 3, the calibration of the Dynamic-DeepHit model is compared with the calibration of the PCE by comparing the predicted risk and observed risk within each decile of predicted risk. The PCE is shown to over-predict 10-year ASCVD risk, especially within the top 40% of predicted risk, which corresponds to the 7.5% risk threshold used in clinical guidelines.
Comparatively, the calibration of the Dynamic-DeepHit model is consistently better along the entire spectrum of risk.
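The decile-based comparison of predicted and observed risk can be sketched as follows. This is an illustrative helper, not the study's code: it assumes binary outcomes with no censoring (with censored follow-up, the observed risk in each bin would more typically come from a Kaplan-Meier estimate), and the function name is hypothetical.

```python
def calibration_by_decile(pred, event, n_bins=10):
    """Return (mean predicted risk, observed event rate) per bin of predicted risk.

    Individuals are sorted by predicted risk and cut into n_bins
    groups of (nearly) equal size, lowest risk first.
    """
    pairs = sorted(zip(pred, event))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer individuals than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(1 for _, y in chunk if y) / len(chunk)
        rows.append((mean_pred, observed))
    return rows
```

A bin in which the mean predicted risk exceeds the observed rate indicates over-prediction in that part of the risk spectrum.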

Feature Importance
The results of the leave-one-out feature importance analysis are shown in Figure 4. Removing age from the model produced the greatest decrease in AUROC (to 0.769, CI: 0.735-0.803); thus, age is considered the most important variable in the model. Following age, longitudinal SBP was the second most important predictor, with the AUROC reduced to 0.777 (CI: 0.744-0.809). Diabetes diagnosis and hypertension treatment were the most important categorical predictors, with AUROCs reduced to 0.779 (CI: 0.747-0.812) and 0.780 (CI: 0.748-0.813), respectively, when these predictors were removed.
Figure 5 shows the longitudinal trajectories of clinical risk factors, including SBP, DBP, total cholesterol, and HDL, among the individuals whose risk increased and those whose risk decreased after switching to the Dynamic-DeepHit model for ASCVD risk prediction. Between the two groups, the average terminal measurements of SBP and total cholesterol were similar, but the historical measurements of those risk factors were higher among those whose predicted risk increased in the Dynamic-DeepHit model.

Borderline Risk Stratification
Among the individuals in the borderline and intermediate risk groups, as determined by the risk derived from the PCE, the AUC from the Dynamic-DeepHit model was higher than that from the PCE: 0.688 (CI: 0.634-0.742) versus 0.652 (CI: 0.594-0.709). The NRI for the Dynamic-DeepHit model in the borderline and intermediate groups was 0.322. The Brier score was 0.069 for the PCE compared with 0.067 for the Dynamic-DeepHit model, again showing some improvement in model calibration.
Given the 7.5% risk threshold for moderate-intensity statin prescription, we examined the individuals whose risk crossed the threshold in either direction to understand the Dynamic-DeepHit model's potential impact on clinical decision making. In our testing dataset, among those who would be prescribed statins under the PCE risk (N=1,213), 33% (N=405) would not be prescribed statins under the new risk provided by the Dynamic-DeepHit model, and 95% (N=386) of those individuals did not develop ASCVD. Among those who would not be prescribed statins using the PCE (N=1,900), 2% (N=34) would be recommended statins under the Dynamic-DeepHit model. However, of those individuals, only 3% (N=1) developed ASCVD within 10 years.
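The threshold-crossing tabulation described above can be sketched as below. The function name and dictionary keys are hypothetical, and treating a risk of exactly 7.5% as meeting the threshold is an assumption; outcomes are treated as binary, without censoring.

```python
STATIN_THRESHOLD = 0.075  # 7.5% ten-year risk

def reclassification_counts(pce_risk, dl_risk, event, threshold=STATIN_THRESHOLD):
    """Count individuals whose risk crosses the threshold when moving from PCE to DL.

    'down': at or above the threshold under the PCE, below it under the DL model.
    'up':   below the threshold under the PCE, at or above it under the DL model.
    """
    counts = {"down": 0, "down_no_event": 0, "up": 0, "up_event": 0}
    for old, new, y in zip(pce_risk, dl_risk, event):
        if old >= threshold and new < threshold:
            counts["down"] += 1
            counts["down_no_event"] += 0 if y else 1
        elif old < threshold and new >= threshold:
            counts["up"] += 1
            counts["up_event"] += 1 if y else 0
    return counts
```

In the terms of the results above, `down` corresponds to the 405 individuals reclassified below the threshold and `down_no_event` to the 386 of them who did not develop ASCVD.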

Discussion

Principal Findings
In this study, we have demonstrated that by incorporating longitudinal data on the same clinical and behavioral predictors as in the PCE using a state-of-the-art and validated deep learning model, we can improve the calibration of 10-year ASCVD risk prediction. We leveraged data from 4 diverse cohorts for model training and testing and found that the longitudinal deep learning model outperformed the PCE both in the overall cohort and in specific subpopulations. We have demonstrated that the longitudinal deep learning model has clinical value through improved discrimination and better calibration for those with borderline risk of ASCVD, thus providing physicians with more reliable estimates of risk for clinical decision making.

Deep Learning in ASCVD Risk Prediction
Longitudinal trends of clinical factors such as blood pressure and cholesterol have long been established to be of clinical importance. 13 While this is not the first study to incorporate longitudinal data for predicting ASCVD, to our knowledge, it is the first study that uses a deep learning approach. Prior studies used methods such as including aggregate summary statistics of the longitudinal clinical data in the PCE or landmark models that could update data at fixed time intervals. 13,25,26 This foundational work led to minor improvements in model discrimination; however, we were able to achieve better performance because we utilized a deep learning method. A key advantage of deep learning models is their ability to recognize complex patterns by utilizing multiple layers of artificial neural networks, which are composed of inter-connected nodes. This advantage manifests in two ways in the Dynamic-DeepHit model. First, the improvement in the discrimination of the cross-sectional DeepHit model over the PCE demonstrates that, given the same cross-sectional data, neural networks can make better predictions of ASCVD than the PCE. Second, the RNN can create robust representations of longitudinal clinical data in the presence of missing values, preserving critical information for ASCVD risk prediction.

Clinical Implications
8][29][30][31][32] The Dynamic-DeepHit model performance in Black females may indicate that incorporating longitudinal data can better elucidate higher risks, allowing physicians to make more accurate treatment decisions and reducing health outcome disparities in these high-risk groups.
Among the individuals categorized as borderline- and intermediate-risk by the PCE, the Dynamic-DeepHit model improved discrimination and was better calibrated. One-third of the individuals in the intermediate PCE risk groups had overestimated 10-year ASCVD risk, which indicates the potential for over-prescribing. In these individuals, the Dynamic-DeepHit model slightly underestimates risk; that is, it is better at ruling out people who will not have ASCVD events, while not as good at identifying those who will develop ASCVD. As current clinical guidance requires further risk analysis for the individuals in these risk groups, guideline-concordant treatment is less optimal. With a better calibrated risk assessment, clinicians may be less concerned with over-prescribing and feel more confident in prescribing guideline-concordant treatment given the predicted risks from the Dynamic-DeepHit model.
The feature importance analysis shows that longitudinal measurements of clinical variables have a meaningful influence on the performance of the Dynamic-DeepHit model. In the Dynamic-DeepHit model, longitudinal SBP was the most important modifiable predictor, while total cholesterol was found to be relatively important as well. Similar to prior research, 22 age was found to be an important predictor in the Dynamic-DeepHit model. In addition, diabetic status, sex, and smoking status were also found to influence the AUROC of the model. At the population level, the aggregate terminal measurements in the observed 8-year trajectories of SBP, DBP, and total cholesterol were similar between the individuals whose risk increased and those whose risk decreased (Figure 5). If prediction used only those terminal measurements, a similar risk profile between those with increased risk and decreased risk would be assumed. However, the Dynamic-DeepHit model picked up separation in the historical values of those clinical factors, which contributed to the model identifying the differences in risk profiles of the two groups of individuals. Combined with the results of the feature importance analysis, this evidence further supports that longitudinal histories of clinical predictors can provide additional insight in evaluating ASCVD risk profiles.
With the proliferation of EHRs, longitudinal data is readily accessible. In addition, with the advent of cloud and edge computing, it is possible to deliver intensive computing capabilities to the EHR to support sophisticated machine learning or deep learning models for clinical risk prediction. This study shows that, with further validation, deep learning models can be a powerful tool to help clinicians leverage the silos of currently untapped historical patient data in the EHR to improve patient cardiovascular outcomes. New methods of interpreting these models will also add confidence in adoption among physicians.

Limitations
There are several limitations in our study. First, the cohorts used in this study may not reflect the clinical conditions of present-day patients, who are more likely to be on CVD treatments, such as statins. Therefore, given the limited information we had on statin usage, we did not exclude any participants who may have been on statin treatment. Second, data was recorded more sparsely in the cohort studies, whereas clinical measurements are often more frequent in clinical practice. 33 The quality of the data stored in the EHR could also be compromised due to the varying clinical contexts in which the data was collected. While these data problems exist, deep learning methods are still one of the best tools to overcome such issues. 34,35 On the other hand, EHRs often do not contain up to 8 years of longitudinal data on patients. As this study is a proof of concept, further work is needed to explore the efficacy and utility of incorporating longitudinal risk factors into ASCVD risk prediction within EHRs.

Study Population
The four longitudinal cohorts used in this study contributed data to the Cardiovascular Lifetime Risk Pooling Project (LRPP): the Framingham Heart Study, Framingham Offspring Study, Coronary Artery Risk Development in Young Adults (CARDIA) Study, and Atherosclerosis Risk in Communities (ARIC) Study. 20 These cohorts were selected for their number of participants, duration of follow-up, number of participant visits, and consistency of measurement of CVH risk factors.
As the examination schedules differed across cohorts, the number of exams within timeframes varied. To include the largest number of exams across the different studies while balancing the size of the timeframe for the study, we used 8 years of longitudinal data as the timeframe for CVD risk factor ascertainment (observation period). For consistency with the PCE, outcomes were then measured over a 10-year follow-up period. Thus, to maximize the number of exams included in our study, we included data beginning at the following index exams (i.e., the exam at which risk factor follow-up began) for the included studies (Figure 1): year 15 for the Framingham Heart Study, year 10 for the Framingham Offspring Study, year 18 for the CARDIA study, and year 1 for the ARIC study. The exact start and end years of each cohort, as well as the mean and interquartile range of the number of exams in each cohort, are shown in Table 2.
Eligible participants were over 40 and under 75 years of age at the point of prediction (i.e., the end of the 8-year observation period), had no record of self-reported or diagnosed ASCVD at the index exam or during the 8-year observation period, and had at least one measurement of SBP, DBP, total cholesterol, and HDL cholesterol. The LRPP is approved by the Northwestern IRB, and this study utilized de-identified data from each of the included cohorts in LRPP. Written informed consent was obtained for all participants, and analyses were performed in accordance with relevant guidelines.
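The eligibility criteria above can be expressed as a simple filter. The sketch below is illustrative only: the record layout and every field name (`age_at_prediction`, `prior_ascvd`, `measurements`) are hypothetical and do not reflect the LRPP data format.

```python
REQUIRED_MEASURES = ("sbp", "dbp", "total_chol", "hdl")

def is_eligible(participant):
    """Apply the stated inclusion criteria to one participant record.

    participant: dict with hypothetical keys
      'age_at_prediction': age at the end of the 8-year observation period
      'prior_ascvd': True if ASCVD was self-reported or diagnosed at the
                     index exam or during the observation period
      'measurements': dict mapping a measure name to its list of values
    """
    # "Over 40 and under 75" is read here as strict inequalities.
    age_ok = 40 < participant["age_at_prediction"] < 75
    measured = all(
        len(participant["measurements"].get(name, [])) >= 1
        for name in REQUIRED_MEASURES
    )
    return age_ok and not participant["prior_ascvd"] and measured
```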

Outcome: ASCVD incidence
The outcome in our study was ASCVD incidence, defined as the incidence of coronary heart disease, ischemic stroke, or CVD-related death, over a 10-year period that began at the end of the observation period (Figure 1). 11,20 Coronary heart disease and ischemic stroke were adjudicated by review of medical records by study investigators. 21 Participants without any recorded event at the end of the study, or who died of other causes during the follow-up period, were considered right censored.
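The outcome definition and right-censoring rule above can be sketched as a small function that maps a participant's follow-up to a (time, event) pair. The argument names are hypothetical, times are assumed to be in years from the point of prediction, and counting an event that falls exactly on the 10-year mark is an assumption.

```python
def ten_year_outcome(ascvd_time, other_death_time, last_followup, horizon=10.0):
    """Return (time, event) for the 10-year prediction window.

    ascvd_time: years to first ASCVD event, or None if none was recorded.
    other_death_time: years to death from other causes, or None.
    last_followup: years to the last contact with the participant.
    event is 1 for ASCVD and 0 for right censoring.
    """
    candidates = [(horizon, 0), (last_followup, 0)]  # end of window, loss to follow-up
    if ascvd_time is not None:
        candidates.append((ascvd_time, 1))
    if other_death_time is not None:
        candidates.append((other_death_time, 0))  # other-cause death is censored
    # Earliest time wins; on ties, an ASCVD event takes precedence over censoring.
    return min(candidates, key=lambda c: (c[0], -c[1]))
```

For example, `ten_year_outcome(4.2, None, 12.0)` yields an event at 4.2 years, while an ASCVD event at 11 years falls outside the window and is recorded as censored at 10 years.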

Features: CVD Risk Factors
CVD risk factors included in the original PCE include systolic BP, diastolic BP, total cholesterol, and HDL cholesterol, and were measured 1-4 times during the 8-year observation period. Blood pressure was measured using standard methods by clinic staff in the various cohorts. 21,22 Fasting HDL-C, total cholesterol measurements, and blood glucose were collected via blood serum. 20,22 Diagnosis of diabetes and treatment for hypertension, predictors also included in the PCE, were self-reported at the index visit. 21,22 Age, sex, race, ethnicity, smoking status, and alcohol consumption were self-reported at the index visit. 21,22

Statistical Analysis
The deep learning model used in this study is Dynamic-DeepHit, which enabled the incorporation of longitudinal risk factor data in a dynamic fashion to estimate 10-year risk of incident ASCVD. 23 The Dynamic-DeepHit model has been demonstrated to have substantial improvements over traditional predictive methods, including the Cox proportional hazards model, in predicting cystic fibrosis outcomes. 23 The Dynamic-DeepHit model consists of two neural networks: 1) a recurrent neural network (RNN) that processes the longitudinal measurements and predicts future measurements of time-varying covariates, and 2) a fully connected neural network that estimates the probability of the specific event at a given time. RNNs are commonly used for machine learning problems involving temporal or sequential data and can capture long-term dependencies in the data. The Dynamic-DeepHit model also utilizes an attention mechanism that identifies important longitudinal measurements when making risk predictions, which improves predictive performance. The second neural network takes as input the learned representations that are output from the first neural network along with the last recorded set of behavioral and clinical covariates (e.g., the most recent CVD risk factor measurements at the end of the 8-year observation period). The output layer of the second neural network converts the learned relationships between the risk factors and outcome into the 10-year risk of incident CVD.
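Two of the ideas in this description, attention over visit-level representations and an output layer that yields a 10-year risk, can be illustrated with a deliberately simplified sketch. This is not the Dynamic-DeepHit implementation: the hidden states and scores are taken as given rather than learned, the output is assumed to be a softmax over discrete time bins (a DeepHit-style formulation), and conditioning on survival up to the last measurement is omitted. All function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, scores):
    """Combine per-visit hidden vectors into one context vector.

    hidden_states: one vector per visit (assumed to come from an RNN).
    scores: one unnormalized attention score per visit; visits with
    higher scores contribute more to the pooled representation.
    """
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]

def risk_within_horizon(time_bin_logits, n_horizon_bins):
    """Sum the event probability mass in the first n_horizon_bins time bins.

    time_bin_logits: one logit per discrete future time bin; the softmax
    turns them into a probability distribution over event times.
    """
    probs = softmax(time_bin_logits)
    return sum(probs[:n_horizon_bins])
```

In a full model, the pooled vector would be concatenated with the most recent covariates and passed through the fully connected network that produces the time-bin logits.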
To explore the reasons for any improvements in predictive power, we also implemented a cross-sectional DeepHit model. This allowed us to disentangle whether the improvements were due to the incorporation of the longitudinal data or simply to the complexity of the neural network modeling methods. The DeepHit model was fitted on only the last set of measurements for each participant within the 8-year observation period. We also fit the traditional PCE model to understand its performance in this sample.
Data pre-processing included randomly splitting the dataset into 3 chunks, called training, tuning, and testing, at a 3:1:1 ratio. The Dynamic-DeepHit and cross-sectional DeepHit models were trained in the training dataset and corresponding hyperparameters were tuned in the tuning dataset. The training data for the PCE included both the training and tuning datasets. The testing dataset, not used in model development, was used for validation. The participants were the same in each of the respective datasets for each model.
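A random 3:1:1 split of this kind can be sketched as follows. The function name, the fixed seed, and the rounding convention (integer division, with any remainder going to the testing set) are assumptions for illustration; the study does not describe these details.

```python
import random

def split_3_1_1(participant_ids, seed=0):
    """Randomly split IDs into training, tuning, and testing sets at 3:1:1."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n = len(ids)
    n_train = (3 * n) // 5
    n_tune = n // 5
    train = ids[:n_train]
    tune = ids[n_train:n_train + n_tune]
    test = ids[n_train + n_tune:]  # remainder
    return train, tune, test
```

Under these assumptions, 15,565 participants split into 9,339 training, 3,113 tuning, and 3,113 testing participants. The PCE benchmark would then be fit on `train + tune`, while all models are evaluated on the same `test` set.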
We assessed model discrimination and calibration of all 3 models. We calculated and compared the Area Under the Receiver Operating Characteristic Curve (AUROC) for all models to evaluate model discrimination, the ability of the model to discriminate those who have a higher risk of having an event from those at lower risk. Brier scores were used to evaluate the calibration of the model, the extent to which the estimated risks correspond to observed event rates; lower scores indicate better calibration. 24 The trained Dynamic-DeepHit model was evaluated in the following population groups: Black males, Black females, other (White, Hispanic, Asian) males, other females, participants under 60 years old, and participants 60 years old or over. These demographic groups were chosen to mirror the classifications used for the sex- and race-specific PCE. As in the overall analysis, the AUROCs were compared between corresponding population subgroups.
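The two metrics can be sketched in their simplest binary-outcome form, as below. This is an illustration under the assumption of no censoring; evaluations on censored survival data commonly use time-dependent variants, and the study's exact computation is not specified here. The function names are illustrative.

```python
def auroc(pred, event):
    """Probability that a random event case is ranked above a random non-event case.

    Ties in predicted risk count as half a concordant pair.
    """
    pos = [p for p, y in zip(pred, event) if y]
    neg = [p for p, y in zip(pred, event) if not y]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

def brier_score(pred, event):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - (1 if y else 0)) ** 2 for p, y in zip(pred, event)) / len(pred)
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch but slower than rank-based implementations on large datasets.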
To understand the importance of each predictor in the Dynamic-DeepHit model, we took a leave-one-out approach. We removed one predictor at a time from the Dynamic-DeepHit model and retrained and retested the model. The change in the testing dataset AUROC was calculated for each feature removed: the greater the change in AUROC, the greater the importance of the predictor. To better understand the role of longitudinal clinical risk factors in the Dynamic-DeepHit model, we also examined the average trajectories of SBP, DBP, total cholesterol, and HDL for the individuals whose predicted risk increased and those whose risk decreased in the Dynamic-DeepHit model.
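The leave-one-out procedure can be sketched generically as below. The `fit_and_score` argument is a hypothetical callable standing in for the expensive step of retraining the model on a given feature set and returning its testing-set AUROC; it is not part of any library.

```python
def leave_one_out_importance(features, fit_and_score):
    """Rank features by the drop in score when each one is removed.

    features: list of feature names used by the full model.
    fit_and_score: hypothetical callable that retrains the model on the
    given feature list and returns its testing-set AUROC.
    Returns (feature, drop) pairs, largest drop first.
    """
    baseline = fit_and_score(list(features))
    drops = {}
    for removed in features:
        reduced = [f for f in features if f != removed]
        drops[removed] = baseline - fit_and_score(reduced)
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)
```

Because every feature requires a full retraining run, the cost grows linearly with the number of predictors, which is manageable for the small predictor set used here.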

Figure 3 Observed

Figure 4 Feature

Trajectories were created via generalized

Table 2. The official start year, start year of the observation period (after adjustment), end year of the 8-year follow-up period, average number of exams within the 8-year follow-up period, and interquartile range of the number of exams for each cohort.

Table 3. Discrimination performance of models overall and in population subgroups.