Dataset
In this study we used data from the UK Biobank (UKB), a large prospective population study including 500,000 participants aged 40 to 69 years at the time of recruitment [6]. Information on socio-demographics, lifestyle, environment, medical background, genetics, physical measurements, and clinical outcomes was recorded. Since 2015, over 70,000 participants have undergone CMR.
This study complies with the Declaration of Helsinki. It is covered by the ethical approval for UKB studies from the NHS National Research Ethics Service on 17 June 2011 (Ref 11/NW/0382), extended on 18 June 2021 (Ref 21/NW/0157). Written informed consent was obtained from all participants.
Study populations
The process for selecting our study cohorts is shown in Fig. 1. Participants with available ECG and CMR metrics were included; those with more than 50 missing values or with outlying values (beyond ± 5 interquartile ranges, IQR) were removed [7]. Next, a control group (n = 31,167) free from pre-existing cardiovascular disease (Supplemental Table 1) was identified and split into training and test sets to develop the heart aging model. We then identified two distinct disease groups at the time of imaging, based on diagnosis codes from the UKB showcase (Supplemental Table 2): the IHD group, comprising 2,142 subjects with prevalent IHD, and the CA group, comprising 1,683 subjects with prevalent CA. After computing the HAG, post-hoc analyses were performed in each group using the subjects with available values for the exposure variable of interest (Fig. 1).
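The exclusion step can be sketched with pandas, assuming one row per participant and one numeric column per ECG/CMR feature. The helper name `exclude_participants` is hypothetical, and the outlier rule is one possible reading of "± 5 IQR": a participant is dropped if any feature lies below Q1 − 5·IQR or above Q3 + 5·IQR of that feature.

```python
import pandas as pd


def exclude_participants(df: pd.DataFrame, max_missing: int = 50, k: float = 5.0) -> pd.DataFrame:
    """Hypothetical helper: drop participants with too many missing or outlying values.

    Assumed rule: a row is removed if it has more than `max_missing` NaNs, or if any
    numeric feature lies outside [Q1 - k*IQR, Q3 + k*IQR] computed per column.
    """
    numeric = df.select_dtypes(include="number")
    n_missing = numeric.isna().sum(axis=1)

    # Per-feature quartiles and interquartile range (NaNs are ignored by quantile)
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    # Comparisons with NaN evaluate to False, so missing values never count as outliers
    is_outlier = numeric.lt(lower, axis=1) | numeric.gt(upper, axis=1)
    keep = (n_missing <= max_missing) & ~is_outlier.any(axis=1)
    return df.loc[keep]


# Toy example: participant 4 has an extreme value in feature "a"
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 1000.0], "b": [1.0, 2.0, 3.0, 4.0, 5.0]})
print(exclude_participants(df).index.tolist())  # [0, 1, 2, 3]
```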
ECG
The UKB 12-lead ECGs were recorded in selected participants with electrodes placed in the standard positions. ECGs were sampled at 500 Hz for 10 seconds (Cardiosoft v6.51, GE) and stored as XML files. The XML files were downloaded from UKB and processed using GE MUSE v9.0 SP4 with the Marquette 12SL algorithm [8]. Files were imported into MUSE with re-analysis enabled to generate diagnostic codes and quantitative measurements, which MUSE stored in SQL databases. Detailed measurements were then extracted for further analysis with SQL queries in Microsoft SQL Server Management Studio (2014-12.0.4237.0). The acquired ECGs were also saved in Hilltop and PDF formats to allow manual review and validation; internal quality control of randomly selected ECGs confirmed the diagnostic codes.
The Marquette 12SL ECG Analysis Program measures 12-lead ECG recordings in sequential steps: QRS detection, ventricular rate calculation, median beat formation (beats of the same shape are combined into a representative cycle), detection of wave onsets, offsets, and intervals, wave measurement, and finally P-wave detection. The wave-measurement step generates, separately for each lead, a measurement matrix of amplitudes in µV relative to the voltage at QRS onset. Further definitions of the ECG features are shown in Fig. 2.
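The 12SL implementation is proprietary, so the following is only a generic NumPy illustration of two ideas described above, not the GE algorithm: forming a median beat from beats of the same morphology, and expressing amplitudes relative to the QRS-onset voltage. It assumes R-peak and QRS-onset sample indices are already known; the function names and the correlation threshold `min_corr` are hypothetical.

```python
import numpy as np


def median_beat(signal, r_peaks, pre, post, min_corr=0.9):
    """Generic median-beat sketch (not the 12SL algorithm).

    Cuts `pre` samples before and `post` samples after each R peak, keeps beats whose
    shape correlates with a provisional median template, and returns the sample-wise
    median of those similar beats.
    """
    windows = np.vstack([signal[r - pre:r + post] for r in r_peaks
                         if r - pre >= 0 and r + post <= len(signal)])
    template = np.median(windows, axis=0)
    corr = np.array([np.corrcoef(w, template)[0, 1] for w in windows])
    similar = windows[corr >= min_corr]
    if len(similar) == 0:  # fall back to all beats if none pass the threshold
        similar = windows
    return np.median(similar, axis=0)


def amplitudes_relative_to_onset(beat, qrs_onset, points):
    """Amplitude at each named sample index minus the voltage at QRS onset (input units, e.g. µV)."""
    baseline = beat[qrs_onset]
    return {name: float(beat[idx] - baseline) for name, idx in points.items()}


t = np.arange(100)
beat = 1000.0 * np.exp(-((t - 50) / 5.0) ** 2)      # idealized QRS complex in µV
signal = np.concatenate([beat] * 5 + [-beat])       # five normal beats plus one inverted ectopic beat
r_peaks = [50 + 100 * i for i in range(6)]
mb = median_beat(signal, r_peaks, pre=50, post=50)  # the inverted beat is excluded by correlation
print(amplitudes_relative_to_onset(mb, qrs_onset=40, points={"R": 50}))
```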
CMR analysis and conventional parameters
CMR images were acquired on 1.5 T scanners (MAGNETOM Aera, Syngo Platform VD13A, Siemens Healthcare, Erlangen, Germany) following a pre-defined acquisition protocol [9]. In brief, cardiac assessment used long-axis cines and a complete short-axis stack covering both the left and right ventricles (LV, RV), acquired with balanced steady-state free precession sequences [10] [11] [12]. Conventional measures of LV and RV structure and function were extracted using an automated pipeline with built-in quality control, as previously described [10]. The following nine parameters were derived and used in our study: LV and RV end-diastolic and end-systolic volumes (LVEDV, RVEDV, LVESV, RVESV), LV and RV stroke volumes (LVSV, RVSV), LV and RV ejection fractions (LVEF, RVEF), and LV mass (LVM).
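For reference, stroke volume and ejection fraction follow the standard definitions SV = EDV − ESV and EF = 100 × SV / EDV, applied to each ventricle. The LV-mass helper assumes the commonly used myocardial density of 1.05 g/mL applied to a segmented myocardial volume, which may differ from the exact implementation in the pipeline of [10]. Function names and keys are hypothetical.

```python
def derived_cmr_indices(edv_ml: float, esv_ml: float) -> dict:
    """Stroke volume (mL) and ejection fraction (%) from end-diastolic and end-systolic volumes."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    sv = edv_ml - esv_ml
    return {"SV_ml": sv, "EF_pct": 100.0 * sv / edv_ml}


def lv_mass_g(myocardial_volume_ml: float, density_g_per_ml: float = 1.05) -> float:
    """LV mass as myocardial volume times an assumed myocardial density (1.05 g/mL)."""
    return myocardial_volume_ml * density_g_per_ml


lv = derived_cmr_indices(edv_ml=150.0, esv_ml=60.0)  # {'SV_ml': 90.0, 'EF_pct': 60.0}
```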
CMR radiomics
Besides conventional CMR measures, a series of CMR radiomics features was extracted. A detailed description of the CMR radiomics analysis is provided in the Supplemental Methods. Briefly, radiomics features were extracted from segmentations of the left and right atria (LA, RA) on long-axis images at end-diastole (ED), and from segmentations of the RV, LV, and LV myocardium (MYO) on short-axis images at ED and end-systole (ES).
The open-source PyRadiomics library (version 2.2.0) was used to extract a total of 210 atrial radiomics features from long-axis images and 636 radiomics features from short-axis images describing ventricular and myocardial characteristics. These included shape, size, first-order statistics, and texture features (the latter two types are also referred to as signal intensity (SI)-based features). Before extraction, the CMR images were harmonized by histogram matching to a randomly selected reference image, and intensities were discretized with a bin width of 25 (the default value) before feature calculation. Together, these features characterize cardiac tissue properties in greater detail than conventional metrics and provide a rich input for modelling cardiac aging.
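The two preprocessing steps can be illustrated with NumPy. `match_histogram` is a simple quantile-mapping form of histogram matching (similar in spirit to `skimage.exposure.match_histograms`), and `discretize_fixed_bin_width` follows the fixed-bin-width formula described in the PyRadiomics documentation, X_b = ⌊X/W⌋ − ⌊min(X)/W⌋ + 1. Both function names are hypothetical; in the study, discretization is performed internally by PyRadiomics, and the exact histogram-matching implementation is not specified here.

```python
import numpy as np


def match_histogram(image: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Map the intensities of `image` so that its cumulative histogram matches `reference`."""
    src_values, src_idx, src_counts = np.unique(image.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / image.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, take the reference intensity at the same quantile
    matched = np.interp(src_cdf, ref_cdf, ref_values)
    return matched[src_idx].reshape(image.shape)


def discretize_fixed_bin_width(roi_values: np.ndarray, bin_width: float = 25.0) -> np.ndarray:
    """Assign intensities inside the ROI to bins of fixed width, numbered from 1."""
    lowest_bin = np.floor(roi_values.min() / bin_width)
    return (np.floor(roi_values / bin_width) - lowest_bin + 1).astype(int)


rng = np.random.default_rng(0)
image = rng.normal(300.0, 50.0, size=(32, 32))       # synthetic image to harmonize
reference = rng.normal(500.0, 80.0, size=(32, 32))   # synthetic randomly chosen reference
harmonized = match_histogram(image, reference)
bins = discretize_fixed_bin_width(harmonized, bin_width=25.0)
```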
Vascular risk factors
Traditional vascular risk factors (VRFs), including hypertension, diabetes, hypercholesterolemia, and smoking, were examined as possible exposures. Body mass index (BMI) and waist-hip ratio, two anthropometric indices of adiposity, were also evaluated. These exposures were ascertained using selected UKB fields (Supplemental Table 3).
Statistical analysis
The analysis was performed using Python 3.8.10 (Python Software Foundation, Delaware USA) and Scikit-learn version 0.23.2. A two-sided p value < 0.05 was considered statistically significant for all analyses.
The Central Illustration shows the overall analysis steps, whilst a more detailed description of each analysis is provided below.
Cardiac age model and HAG
The ECG and radiomics features were used as predictors to estimate cardiac age in the control group with a Bayesian ridge regression model. Weight, height, and sex were treated as confounders and regressed out of the input features using linear regression. The resulting residualized predictors were standardized to zero mean and unit standard deviation. The data were then split into training (80%) and test (20%) sets.
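A minimal scikit-learn sketch of these preprocessing steps, assuming `X` holds the ECG and radiomics features (one row per participant), `confounds` holds weight, height, and sex, and `age` holds chronological age. The function names and synthetic data are hypothetical. Following the order described above, the confound regression and scaling are fitted on the whole control group before the 80/20 split; fitting them on the training portion only is a stricter alternative that avoids leaking test-set information.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def regress_out(X: np.ndarray, confounds: np.ndarray) -> np.ndarray:
    """Residuals of every column of X after linear regression on the confounds."""
    model = LinearRegression().fit(confounds, X)  # multi-output fit: one linear model per feature
    return X - model.predict(confounds)


def prepare_control_data(X, confounds, age, seed=0):
    X_res = regress_out(X, confounds)
    X_std = StandardScaler().fit_transform(X_res)  # zero mean, unit standard deviation
    # Returns X_train, X_test, age_train, age_test (80% / 20%)
    return train_test_split(X_std, age, test_size=0.2, random_state=seed)


rng = np.random.default_rng(0)
confounds = rng.normal(size=(200, 3))  # synthetic stand-ins for weight, height, sex
X = confounds @ rng.normal(size=(3, 5)) + rng.normal(size=(200, 5))
age = rng.uniform(40, 80, size=200)
X_train, X_test, age_train, age_test = prepare_control_data(X, confounds, age)
```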
Because of the high dimensionality of the data, we applied principal component analysis (PCA) to the predictors of the training set (433 ECG biomarkers and 840 radiomics features) before building the model, retaining the number of components needed to explain 95% of the variance. The test data and the IHD and CA cohorts were projected onto the same space using the loadings learned on the training set. The resulting principal components (PCs) were standardized to zero mean and unit standard deviation and used as the independent variables of the model, with chronological age, de-meaned before fitting, as the dependent variable. The fitted model was then applied to the test data, and the mean absolute error (MAE) and coefficient of determination (R²) were calculated to assess model performance. After removing the age-dependency bias [13] [14], we calculated the HAG in the test set by subtracting chronological age from the estimated heart age, quantifying the deviation between heart age and chronological age in line with previous work [3] [4] [15]. The HAG was considered a marker of cardiac aging.
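The modelling steps can be sketched with scikit-learn as below. `PCA(n_components=0.95)` keeps the smallest number of components explaining 95% of the variance, and `BayesianRidge` is fitted to de-meaned age. The age-bias correction shown (regressing predicted on chronological age in the training set and inverting that line) is one common approach and an assumption here, since the specific method of [13] [14] is not detailed in this section. Function names, dictionary keys, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler


def fit_heart_age_model(X_train, age_train):
    pca = PCA(n_components=0.95, svd_solver="full").fit(X_train)
    pcs = pca.transform(X_train)
    pc_scaler = StandardScaler().fit(pcs)  # standardize PC scores
    age_mean = age_train.mean()
    model = BayesianRidge().fit(pc_scaler.transform(pcs), age_train - age_mean)
    m = {"pca": pca, "pc_scaler": pc_scaler, "model": model, "age_mean": age_mean}
    # Assumed bias correction: fit predicted = slope * age + intercept on training data
    bias = LinearRegression().fit(age_train.reshape(-1, 1), predict_age(m, X_train))
    m["slope"], m["intercept"] = bias.coef_[0], bias.intercept_
    return m


def predict_age(m, X):
    """Project new data with the training PCA and scaler, then add back the mean age."""
    pcs = m["pc_scaler"].transform(m["pca"].transform(X))
    return m["model"].predict(pcs) + m["age_mean"]


def heart_age_gap(m, X, age):
    """HAG = bias-corrected estimated heart age minus chronological age."""
    corrected = (predict_age(m, X) - m["intercept"]) / m["slope"]
    return corrected - age


rng = np.random.default_rng(1)
age_train, age_test = rng.uniform(40, 70, 400), rng.uniform(40, 70, 100)
w = rng.normal(size=10)


def simulate(age):
    # Synthetic features sharing an age-related signal plus noise
    return np.outer((age - 55.0) / 10.0, w) + 0.3 * rng.normal(size=(len(age), w.size))


X_train, X_test = simulate(age_train), simulate(age_test)
m = fit_heart_age_model(X_train, age_train)
pred_test = predict_age(m, X_test)
print("MAE:", mean_absolute_error(age_test, pred_test), "R2:", r2_score(age_test, pred_test))
hag_test = heart_age_gap(m, X_test, age_test)
```

The same fitted objects (`pca`, `pc_scaler`, `model`, and the bias line) would be reused unchanged for the IHD and CA cohorts, so that every group is projected onto the training-set space.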
The trained model was then applied to the two disease cohorts, IHD and CA, following the same steps. An independent Student's t-test was used to assess whether the difference in HAG between the control (test) group and each disease group was statistically significant.
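With SciPy, this comparison corresponds to `scipy.stats.ttest_ind`, whose default `equal_var=True` gives the classic Student's t-test. The HAG arrays below are synthetic placeholders, not study data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
hag_control = rng.normal(0.0, 5.0, size=500)  # synthetic control (test) HAG, years
hag_disease = rng.normal(2.0, 5.0, size=300)  # synthetic disease-group HAG, years

result = stats.ttest_ind(hag_control, hag_disease)  # Student's t-test (equal variances)
significant = result.pvalue < 0.05                  # two-sided threshold used in the study
```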
The most informative PCs in the model were identified using the SHAP (SHapley Additive exPlanations) method [16] [17]. The top ten PCs were then projected back to the original feature space to show the contribution of each feature to these PCs, revealing the effect of each feature on cardiac aging across the different groups.
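For a linear model such as Bayesian ridge regression on standardized, uncorrelated PCs, SHAP values have a closed form under the feature-independence assumption: φ_j = w_j (x_j − E[x_j]), which, to our understanding, is what `shap.LinearExplainer` reports with interventional (independent) masking. The sketch below uses that closed form instead of the `shap` package, ranks PCs by mean |SHAP|, and reads each top PC's loadings (rows of `PCA.components_`) to see which original features drive it. Function and feature names are hypothetical; this illustrates the idea rather than reproducing the study's code.

```python
import numpy as np


def linear_shap(coef, X, X_background):
    """Exact SHAP values of a linear model, assuming independent inputs: coef * (x - mean)."""
    return (X - X_background.mean(axis=0)) * coef


def top_components(shap_values, k=10):
    """Indices of the k PCs with the largest mean absolute SHAP value."""
    importance = np.abs(shap_values).mean(axis=0)
    return np.argsort(importance)[::-1][:k]


def top_features_per_pc(components, pc_indices, feature_names, n=5):
    """For each selected PC, the n original features with the largest absolute loading."""
    out = {}
    for i in pc_indices:
        order = np.argsort(np.abs(components[i]))[::-1][:n]
        out[int(i)] = [(feature_names[j], float(components[i, j])) for j in order]
    return out


coef = np.array([2.0, 0.1, -3.0])                       # model weights on 3 standardized PCs
pcs = np.array([[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]])  # toy PC scores (also the background)
phi = linear_shap(coef, pcs, pcs)
components = np.array([[0.1, -0.9, 0.4], [0.7, 0.1, -0.7], [0.6, 0.3, 0.74]])
names = ["QRS_dur", "LV_texture", "RA_volume"]          # hypothetical feature names
print(top_components(phi, k=2))  # [2 0]
print(top_features_per_pc(components, top_components(phi, k=2), names, n=1))
```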
To further assess the predictive ability of the heart age model for each disease condition, we used logistic regression with HAG as the independent variable and disease status as the dependent variable, adjusted for age, sex, and BMI. Two models were considered: control vs IHD and control vs CA. The HAG coefficient was reported as the effect size, and its p-value was used to determine significant associations.
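The adjusted logistic model can be sketched as a Newton-Raphson (IRLS) maximum-likelihood fit with Wald z-tests, using only NumPy and SciPy; a statistics package such as statsmodels should give equivalent estimates. The column order `[HAG, age, sex, BMI]`, the function name, and the simulated data are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def logistic_wald(X, y, max_iter=100, tol=1e-10):
    """Logistic regression with an intercept; returns coefficients, SEs, and Wald p-values."""
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hessian, X.T @ (y - p))  # Newton step on the log-likelihood
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    se = np.sqrt(np.diag(cov))
    pvalues = 2.0 * stats.norm.sf(np.abs(beta / se))
    return beta, se, pvalues


rng = np.random.default_rng(0)
n = 2000
hag = rng.normal(0.0, 5.0, n)
age = rng.uniform(45.0, 80.0, n)
sex = rng.integers(0, 2, n)
bmi = rng.normal(27.0, 4.0, n)
logit = -1.0 + 0.3 * hag                                  # synthetic truth: only HAG matters
status = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))    # 1 = disease, 0 = control
beta, se, pvalues = logistic_wald(np.column_stack([hag, age, sex, bmi]), status)
# beta[1] is the HAG coefficient (log-odds per year of HAG); pvalues[1] is its p-value
```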
Mediation analysis
Mediation analysis was conducted to assess the extent to which conventional CMR metrics mediated the effect (indirect effect) of disease status (IHD vs control; CA vs control) on cardiac aging (HAG). Ordinary least squares and logistic regression, as implemented in the PROCESS macro for R and SPSS [18], were used to estimate the associations (expressed as effects) between the variables (disease status, each CMR metric as mediator, and HAG). The models were adjusted for age, sex, and BMI. Nine analyses, one per mediator (CMR metric), were conducted for each cohort (i.e., test, IHD, and CA).
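The single-mediator logic can be sketched with ordinary least squares: path a regresses the CMR metric on disease status, path b and the direct effect c′ come from regressing HAG on disease status and the mediator, and the indirect effect is a·b with a percentile-bootstrap confidence interval. This mirrors the simple mediation model offered by PROCESS (model 4), but it is a NumPy illustration under that assumption, not the PROCESS macro itself; function names and simulated data are hypothetical.

```python
import numpy as np


def ols_coef(y, X):
    """OLS coefficients with an intercept prepended (intercept is element 0)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def mediation_paths(x, m, y, covariates):
    a = ols_coef(m, np.column_stack([x, covariates]))[1]  # disease -> mediator
    coef_y = ols_coef(y, np.column_stack([x, m, covariates]))
    return a, coef_y[2], coef_y[1]                        # a, b, direct effect c'


def simple_mediation(x, m, y, covariates, n_boot=2000, seed=0):
    a, b, direct = mediation_paths(x, m, y, covariates)
    rng = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, len(y), len(y))             # resample participants with replacement
        ba, bb, _ = mediation_paths(x[idx], m[idx], y[idx], covariates[idx])
        boot[i] = ba * bb
    ci = tuple(np.percentile(boot, [2.5, 97.5]))
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "ci95": ci}


rng = np.random.default_rng(1)
n = 1000
disease = rng.integers(0, 2, n).astype(float)             # 1 = disease, 0 = control
covars = np.column_stack([rng.uniform(45, 80, n), rng.integers(0, 2, n), rng.normal(27, 4, n)])
lvef = 60.0 - 5.0 * disease + rng.normal(0, 5, n)         # synthetic mediator (e.g. LVEF)
hag = -0.3 * lvef + 1.0 * disease + rng.normal(0, 3, n)   # synthetic outcome (HAG)
res = simple_mediation(disease, lvef, hag, covars, n_boot=500)
```

With OLS and the same covariates in both equations, the total effect c equals c′ + a·b exactly, which is a useful sanity check on any implementation.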
Association with VRFs
A linear regression model was used to assess the association between VRFs and cardiac aging (HAG) in each group (control (test), IHD, and CA), adjusted for age and sex. Before fitting, individuals whose HAG lay more than 1.5 IQR below the first quartile (Q1) or more than 1.5 IQR above the third quartile (Q3) were excluded as outliers.
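A NumPy/SciPy sketch of this step, assuming one VRF at a time as the exposure (binary or continuous) and HAG as the outcome: participants outside the Tukey fences [Q1 − 1.5·IQR, Q3 + 1.5·IQR] of HAG are excluded, then OLS with age and sex as covariates gives the VRF coefficient and its t-test p-value. Function names and data are hypothetical.

```python
import numpy as np
from scipy import stats


def tukey_mask(values, k=1.5):
    """True for values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)


def vrf_association(hag, vrf, age, sex):
    """OLS of HAG on a VRF adjusted for age and sex, after excluding HAG outliers."""
    keep = tukey_mask(hag)
    X = np.column_stack([np.ones(keep.sum()), vrf[keep], age[keep], sex[keep]])
    y = hag[keep]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    p = 2.0 * stats.t.sf(np.abs(beta / se), dof)
    return beta[1], se[1], p[1], int(keep.sum())  # VRF coefficient, SE, p-value, n used


rng = np.random.default_rng(0)
n = 1000
age = rng.uniform(45, 80, n)
sex = rng.integers(0, 2, n).astype(float)
hypertension = rng.integers(0, 2, n).astype(float)
hag = 2.0 * hypertension + 0.05 * (age - 60) + rng.normal(0, 1, n)
hag[:5] = 1000.0  # gross outliers that the IQR rule should remove
coef, se, p, n_used = vrf_association(hag, hypertension, age, sex)
```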