Study Population
The four longitudinal cohorts used in this study contributed data to the Cardiovascular Lifetime Risk Pooling Project (LRPP): the Framingham Heart Study, Framingham Offspring Study, Coronary Artery Risk Development in Young Adults (CARDIA) Study, and Atherosclerosis Risk in Communities (ARIC) Study.20 These cohorts were selected for their number of participants, duration of follow-up, number of participant visits, and consistency of measurement of CVH risk factors.
Because examination schedules differed across cohorts, the number of exams within a given timeframe varied. To include as many exams as possible across the studies while limiting the length of the timeframe, we used 8 years of longitudinal data for CVD risk factor ascertainment (the observation period). For consistency with the PCEs, outcomes were then measured over a 10-year follow-up period. Accordingly, we included data beginning at the following index exams (i.e., the exam at which risk factor follow-up began) for the included studies (Figure 1): year 15 for the Framingham Heart Study, year 10 for the Framingham Offspring Study, year 18 for the CARDIA study, and year 1 for the ARIC study. The exact start and end years of each cohort, along with the mean and interquartile range of the number of exams in each cohort, are shown in Table 2.
Eligible participants were older than 40 and younger than 75 years at the point of prediction (i.e., the end of the 8-year observation period), had no record of self-reported or diagnosed ASCVD at the index exam or during the 8-year observation period, and had at least one measurement of SBP, DBP, total cholesterol, and HDL cholesterol. The LRPP is approved by the Northwestern IRB, and this study used de-identified data from each of the included cohorts in the LRPP. Written informed consent was obtained from all participants, and analyses were performed in accordance with relevant guidelines.
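The eligibility criteria above can be expressed as a simple filter. The following is a minimal sketch, not the study's actual code: the `Participant` record and its field names are hypothetical, and it assumes that "at least one measurement" means at least one measurement of each of the four risk factors.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are hypothetical; they are not taken from the LRPP data dictionary.
    age_at_prediction: float       # age at the end of the 8-year observation period
    ascvd_before_prediction: bool  # ASCVD recorded at the index exam or during observation
    n_sbp: int                     # number of SBP measurements in the observation period
    n_dbp: int
    n_total_chol: int
    n_hdl: int


def is_eligible(p: Participant) -> bool:
    """Apply the stated inclusion criteria to one participant."""
    # Strict bounds: older than 40 and younger than 75 at the point of prediction.
    age_ok = 40 < p.age_at_prediction < 75
    # Assumption: at least one measurement of EACH of the four risk factors.
    measured = min(p.n_sbp, p.n_dbp, p.n_total_chol, p.n_hdl) >= 1
    return age_ok and not p.ascvd_before_prediction and measured
```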
Outcome: ASCVD incidence
The outcome in our study was ASCVD incidence, defined as the incidence of coronary heart disease, ischemic stroke, or CVD-related death over a 10-year period that began at the end of the observation period (Figure 1).11, 20 Coronary heart disease and ischemic stroke were adjudicated by study investigators through review of medical records.21 Participants without any recorded event at the end of the study, or who died of other causes during the follow-up period, were considered right-censored.
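As an illustration of this outcome definition, the sketch below derives a (time, event) pair for one participant. It is an assumption-laden example rather than the study's code: the function name and arguments are hypothetical, all times are in years from the end of the observation period, and follow-up is truncated at 10 years.

```python
from typing import Optional, Tuple

FOLLOW_UP_YEARS = 10.0  # length of the outcome window described in the text


def ascvd_outcome(
    years_to_ascvd: Optional[float],
    years_to_other_death: Optional[float],
    years_to_last_contact: float,
) -> Tuple[float, int]:
    """Return (time, event): event=1 for incident ASCVD, 0 for right-censoring.

    Hypothetical inputs; None means the corresponding event was never recorded.
    """
    # Censoring happens at last contact, at 10 years, or at death from another cause.
    censor_time = min(years_to_last_contact, FOLLOW_UP_YEARS)
    if years_to_other_death is not None:
        censor_time = min(censor_time, years_to_other_death)
    # An ASCVD event counts only if it occurs before the censoring time.
    if years_to_ascvd is not None and years_to_ascvd <= censor_time:
        return years_to_ascvd, 1
    return censor_time, 0
```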
Features: CVD Risk Factors
The CVD risk factors from the original PCE, namely systolic BP, diastolic BP, total cholesterol, and HDL cholesterol, were measured 1-4 times during the 8-year observation period. Blood pressure was measured using standard methods by clinic staff in the various cohorts.21, 22 Fasting HDL cholesterol, total cholesterol, and blood glucose were measured from blood serum.20, 22 Diagnosis of diabetes and treatment for hypertension, predictors also included in the PCE, were self-reported at the index visit.21, 22 Age, sex, race, ethnicity, smoking status, and alcohol consumption were self-reported at the index visit.21, 22
Statistical Analysis
The deep learning model used in this study is Dynamic-DeepHit, which enabled the incorporation of longitudinal risk factor data in a dynamic fashion to estimate the 10-year risk of incident ASCVD.23 Dynamic-DeepHit has been shown to substantially outperform traditional predictive methods, including the Cox proportional hazards model, in predicting cystic fibrosis outcomes.23
The Dynamic-DeepHit model consists of two neural networks: 1) a recurrent neural network (RNN) that processes the longitudinal measurements and predicts future measurements of time-varying covariates, and 2) a fully connected neural network that estimates the probability of the specific event at a given time. RNNs are commonly used for machine learning problems involving temporal or sequential data and can capture long-term dependencies in the data. The Dynamic-DeepHit model also uses an attention mechanism that identifies important longitudinal measurements when making risk predictions, which improves predictive performance. The second neural network takes as input the learned representations output by the first neural network along with the last recorded set of behavioral and clinical covariates (e.g., the most recent CVD risk factor measurements at the end of the 8-year observation period). The output layer of the second neural network converts the learned relationships between the risk factors and the outcome into the 10-year risk of incident ASCVD.
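To make the role of the attention mechanism more concrete, the sketch below shows a generic softmax attention step that collapses per-exam hidden states into a single context vector. It is not the published Dynamic-DeepHit implementation: in the real model the attention scores are learned by the network, whereas here they are supplied directly as inputs, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: weights are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(
    hidden_states: Sequence[Sequence[float]],  # one RNN hidden vector per exam
    scores: Sequence[float],                   # one importance score per exam (assumed given)
) -> List[float]:
    """Weighted average of hidden states; exams with higher scores contribute more."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
```

In this simplified picture, the context vector would then be concatenated with the most recent covariates and passed to the fully connected network.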
To explore the reasons for any improvements in predictive power, we also implemented a cross-sectional DeepHit model. This allowed us to disentangle whether the improvements were due to the incorporation of the longitudinal data or simply to the complexity of the neural network modeling methods. The DeepHit model was fit on only the last set of measurements for each participant within the 8-year observation period. We also fit the traditional PCE model to assess its performance in this sample.
Data pre-processing included randomly splitting the dataset into three subsets (training, tuning, and testing) at a 3:1:1 ratio. The Dynamic-DeepHit and cross-sectional DeepHit models were trained on the training dataset, and the corresponding hyperparameters were tuned on the tuning dataset. The training data for the PCE included both the training and tuning datasets. The testing dataset, which was not used in model development, was used for validation. Each of the three datasets contained the same participants for every model.
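A 3:1:1 random split of this kind could be implemented as follows. This is a minimal sketch under stated assumptions: the function name, the use of participant IDs, and the fixed seed are illustrative choices, not details reported in the study.

```python
import random
from typing import List, Sequence, Tuple


def split_3_1_1(
    ids: Sequence[int], seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Randomly split participant IDs into training, tuning, and testing sets (3:1:1)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so every model sees the same split
    n = len(shuffled)
    n_train = (3 * n) // 5
    n_tune = n // 5
    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    test = shuffled[n_train + n_tune:]     # remainder goes to the testing set
    return train, tune, test


train_ids, tune_ids, test_ids = split_3_1_1(range(100), seed=42)
# The PCE is refit on the training and tuning sets combined.
pce_train_ids = train_ids + tune_ids
```

Reusing one seeded split for all models keeps the testing set identical across the Dynamic-DeepHit, DeepHit, and PCE comparisons.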
We assessed the discrimination and calibration of all 3 models. To evaluate discrimination, the ability of a model to distinguish those at higher risk of an event from those at lower risk, we calculated and compared the area under the receiver operating characteristic curve (AUROC) for all models. Brier scores were used to evaluate calibration, the extent to which estimated risks correspond to observed event rates; lower scores indicate better calibration.24
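For readers unfamiliar with these metrics, the sketch below computes both for a binary 10-year outcome. It is a simplified illustration that ignores censoring, so it should not be read as the exact estimators used in the study; the function names are hypothetical.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random event case is scored above a random non-event case."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

An AUROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, while a Brier score of 0 corresponds to perfect predictions.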
The trained Dynamic-DeepHit model was evaluated in the following population groups: Black males, Black females, other (White, Hispanic, Asian) males, other females, participants under 60 years old, and participants 60 years or older. These demographic groups were chosen to mirror the classifications used for the sex- and race-specific PCE. As in the overall analysis, the AUROCs were compared between corresponding population subgroups.
To understand the importance of each predictor in the Dynamic-DeepHit model, we took a leave-one-out approach: we removed one predictor at a time, then retrained and retested the model. The change in the testing dataset AUROC was calculated for each feature removed; the greater the change in AUROC, the greater the importance of the predictor. To better understand the role of longitudinal clinical risk factors in the Dynamic-DeepHit model, we also examined the average trajectories of SBP, DBP, total cholesterol, and HDL cholesterol for individuals whose predicted risk increased and for those whose predicted risk decreased in the Dynamic-DeepHit model. Trajectories were estimated using generalized estimating equations (GEE) to account for correlation between repeated measurements within individuals. The trajectories were visualized across exam times with 95% confidence bands.
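The leave-one-out procedure can be sketched generically as below. The `fit_and_score` callable is a hypothetical stand-in for retraining the model on a given feature set and returning its testing-set AUROC; it is an assumption of this example, not part of any library.

```python
from typing import Callable, Dict, List, Sequence


def leave_one_out_importance(
    features: Sequence[str],
    fit_and_score: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Drop in testing AUROC when each feature is removed and the model is retrained."""
    baseline = fit_and_score(list(features))  # AUROC with all predictors
    importance = {}
    for f in features:
        reduced = [x for x in features if x != f]
        # A larger drop relative to baseline suggests a more important predictor.
        importance[f] = baseline - fit_and_score(reduced)
    return importance
```

Because each predictor requires a full retraining run, the cost of this approach grows linearly with the number of predictors.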
Current blood pressure and cholesterol control guidelines use risk thresholds based on the PCE to inform clinical care. Physicians are advised to prescribe moderate-intensity statins if an individual’s ASCVD risk is over 7.5%. However, differentiation of individuals between the borderline and intermediate PCE risk groups could be improved. We calculated the net reclassification index (NRI) between the PCE and the Dynamic-DeepHit model to understand how the Dynamic-DeepHit model changed individuals’ risk classification. We then conducted additional analyses to better understand the performance of the Dynamic-DeepHit model in the borderline and intermediate groups, and how clinical behavior would be affected if the risk derived from the Dynamic-DeepHit model were used instead of the risk from the PCE.
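A categorical NRI of this kind can be computed as sketched below. This is an illustrative example only: the cut points of 5%, 7.5%, and 20% (low, borderline, intermediate, high) are an assumption, since the text names the categories but states only the 7.5% threshold, and censoring is ignored.

```python
from bisect import bisect_right
from typing import Sequence

# Assumed category boundaries: low <5%, borderline 5-<7.5%, intermediate 7.5-<20%, high >=20%.
CUTS = (0.05, 0.075, 0.20)


def risk_category(risk: float) -> int:
    """Map a predicted 10-year risk to a category index (0=low ... 3=high)."""
    return bisect_right(CUTS, risk)


def categorical_nri(
    events: Sequence[int], risk_old: Sequence[float], risk_new: Sequence[float]
) -> float:
    """NRI = [P(up|event) - P(down|event)] + [P(down|non-event) - P(up|non-event)]."""
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for y, old, new in zip(events, risk_old, risk_new):
        move = risk_category(new) - risk_category(old)
        if y == 1:
            n_e += 1
            up_e += move > 0
            down_e += move < 0
        else:
            n_n += 1
            up_n += move > 0
            down_n += move < 0
    if n_e == 0 or n_n == 0:
        raise ValueError("NRI needs at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n
```

Here `risk_old` would hold the PCE risks and `risk_new` the Dynamic-DeepHit risks; a positive NRI indicates that the new model moves events up and non-events down more often than the reverse.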
All statistical analyses were performed using Python version 3.8 and R version 4.0.2. A 5% type I error rate was used when calculating all confidence intervals.