Overview of approach
The overview of study design is shown in Fig. 1. We conducted analyses using Illumina’s Methylation EPIC BeadChip (850K) and NHLBI Trans-Omics for Precision Medicine (TOPMed) Multi-Ethnic Study of Atherosclerosis (MESA) Exam 1 arterial stiffness and pulsatile hemodynamics (AS/PH) traits with about 900 multi-ancestry participants. We first applied quality control (QC) on DNA methylation (DNAm) data and filtered out methylation positions with low methylation variation (i.e., the standard deviation of methylation Beta-values < 0.02). The epigenetic analyses were then performed on the filtered methylation positions. First of all, we performed epigenome-wide association studies (EWAS) between AS/PH traits and DNAm at individual CpG position and co-methylated genomic region in race/ethnicity pooled analyses respectively. Then the CpGs included in both differentially methylated positions (DMPs) and differentially methylated regions (DMRs) were examined further in three follow-up analyses (i.e., gene set enrichment analysis, expression quantitative trait methylation analysis, and functional enrichment analysis) for each AS/PH trait. Finally, we used the overlap of significant findings across three follow-up analyses to prioritize CpGs and their annotated genes for each AS/PH trait.
Study participants
The study population consisted of participants free from clinical CVD enrolled in MESA, a longitudinal study of subclinical CVD and risk factors that predict progression to clinically overt cardiovascular disease or progression of the subclinical disease51. Between 2000 and 2002, MESA recruited 6,814 men and women 45 to 84 years of age from Forsyth County, North Carolina; New York City; Baltimore; St. Paul, Minnesota; Chicago; and Los Angeles. Exclusion criteria were clinical CVD, weight exceeding 136 kg (300 lb.), pregnancy, and impediment to long-term participation. Approximately 38 percent of the recruited participants are white, 28 percent African American, 22 percent Hispanic/Latino, and 12 percent Asian, predominantly of Chinese descent. All research was performed in accordance with relevant guidelines and regulations. All participants provided informed consent and the protocols of MESA were approved by the IRBs of collaborating institutions and the National Heart, Lung and Blood Institute and informed consent was obtained from all participants. Research involving human research participants was performed in accordance with the Declaration of Helsinki. The participants in this study are a subset of MESA individuals who have both AS/PH measures (described in detail previously52–56) and DNAm measures57 from MESA Exam 1 (2000–2002).
DNA methylation profiling
DNAm profiles were generated using the Infinium Methylation EPIC BeadChip (850K) (Illumina, Inc., San Diego, CA). As part of the TOPMed MESA Multi-Omics project, 900 participants were selected for MESA Exam 1 DNAm profiling based on the following criteria: (1) restrict to those already included in the TOPMed Whole Genome Sequencing effort57, (2) preserve the race/ethnic distribution of participants in the parent MESA cohort, (3) maximize the amount of overlapping ‘omics data (with the other ‘omics included in the TOPMed MESA Multi-Omics pilot requiring availability of plasma samples for proteomics/metabolomics and RNA from PBMCs, monocytes or T cells for RNA-seq). The QC was applied to DNAm data prior to analysis, including color bias correction, median background adjustment, standard quantile adjustment, batch effect correction (using ComBat function in sva58 R package), sex or race mismatch check, and outlier detection (using Gaphunter function in minifi59 R package). We further filtered out low methylation variation positions that the standard deviation of their methylation Beta-values is less than 0.02. Hence, 491,174 CpGs were left for epigenetic analyses. To preserve better statistical properties (i.e., homoscedasticity), the M-values (i.e, M = log(Beta/(1-Beta))) were used in the analyses.
Arterial stiffness and pulsatile hemodynamics traits
The study examined eight AS/PH traits, including: aortic augmentation index (AIx, %) measured by PWA; aortic arch pulse-wave velocity measured by cardiac magnetic resonance imaging (CMR-PWV, m/sec), ascending aortic distensibility (AAD, mmHg), descending aortic distensibility (DAD, mmHg); Young’s Elastic Modulus (YM, mmHg) and distensibility coefficient (DC, mmHg), measured by carotid ultrasound; PTC1 (milliseconds) and PTC2 (milliseconds) obtained from radial artery pressure waveforms. The definition and measurement of all eight AS/PH traits in MESA are described in previous studies21,54,56. Due to the skewness of raw phenotype data of eight AS/PH traits, the log-transformation was applied to AS/PH traits except for DC where the square-root transformation was applied.
EWAS at individual CpG
The linear regression adjusted for covariates was performed to test association between each AS/PH trait and M-value of individual CpG at epigenome-wide scale excluding sex chromosomes for 491,174 CpGs. The covariates included age, sex, race/ethnicity, BMI, smoking status (never, former, or current), smoking pack-years, mean arterial pressure, anti-hypertensive medication usage (yes or no), anti-diabetic medication usage (yes or no), lipid-lowering medication usage (yes or no), fasting glucose, high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, triglycerides, estimated cell type proportions (Lymph, Mono, and Neu), and the first four genomic principal components (PCs) of ancestry. We carried out pooled analyses across self-reported race/ethnic groups for EWAS. The Illumina60 was used for CpG annotation. We then generated a list of 59 candidate genes from literature-based approach7. The EWAS results for CpGs annotated to 59 candidate genes were referred to as candidate-gene approach results. All association tests were adjusted for multiple comparisons using false discovery rate (FDR) correction (Benjamini-Hochberg) at 5%. Due to the limited statistical power of this study, we considered CpGs with EWAS p-values passed suggestive significance as suggestive DMPs. The suggestive significance cut-offs for epigenome-wide approach and candidate-gene approach were 1x10− 4 and 0.05, respectively. All suggestive DMPs were then used in the follow-up analyses.
EWAS at co-methylated genomic region
We applied coMethDMR61 on 491,174 CpGs to identify DMRs. First, coMethDMR identified co-methylated sub-regions with closely located and co-methylated CpGs. We first extracted clusters of CpGs located closely within genomic regions, i.e., the CpG cluster has at least three CpGs and the maximum separation between any two consecutive CpGs within the cluster is 200 base pairs. This step helps to ensure the sub-regions with similar CpG densities. Then we used the correlation between methylation levels among CpGs (i.e., rdrop statistics > 0.4 in coMethDMR) in a sub-region to identify co-methylated CpGs. Next, the median of M-values of CpGs within a co-methylated region was used to test association with AS trait in a random coefficient mixed model in coMethDMR that allowed us to model both variations between CpGs within the region and differential methylation simultaneously. The mixed model was also adjusted for the same covariates as for EWAS at individual CpG. We used the AnnotateResults function in coMethDMR to annotate co-methylated regions. The association results for co-methylated regions annotated to 59 candidate genes were referred to as candidate-gene approach results. Similar to the identification of suggestive DMP, the co-methylated region with association p-values passed suggestive significance was defined as suggestive DMR. The suggestive significance cut-offs for DMR were 5x10− 4 and 0.05 for epigenome-wide approach and candidate-gene approach, respectively. All suggestive DMRs were then used in the follow-up analyses as well.
Weighted gene co-expression network analysis (WGCNA)
WGCNA62 is a systems biology method that can be used to find modules (clusters) with highly correlated methylation levels and to relate modules to clinical traits. We applied the WGCNA R package63 on 491,174 CpGs to identify modules significantly associated with AS/PH traits. First, an unsigned co-methylation network was constructed by using blockwiseModules function (soft thresholding power = 6, merge cut height = 0.25, and minimum module size = 30). 41 modules were identified from the WGCNA network. DNAm levels of CpGs within a module were summarized by the module eigengene (ME) value which represents the overall methylation level of CpGs clustering in a module. Next, the linear regression model adjusted for covariates was performed between ME value and AS/PH traits for each module to identify significantly associated modules with AS/PH traits. The covariates were the same as EWAS analysis. We considered the module with association p-value < 0.05 as a significant module. Finally, we used CpGs in the significant modules to carry out gene set enrichment analysis.
Gene set enrichment analysis (GSEA)
We conducted both Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis for each AS/PH trait on two sets of CpGs respectively. The first set of CpGs, the significant EWAS set, was composed of CpGs from the union of both suggestive DMPs and DMRs identified based on suggestive significance of EWAS for each trait of interest. The second set of CpGs, the significant WGCNA-module set, contained CpGs in trait-associated significant modules (P < 0.05) from the WGCNA network. We used the gsameth function in the missMethyl64 R package that was developed for genome-wide DNAm data to conduct GSEA. The FDR at 5% was applied to GSEA results to correct for multiple testing comparisons.
Expression quantitative trait methylation (eQTM) analysis
To identify genes whose expression is associated with significant DNAm differences, we conducted eQTM analysis for both suggestive DMPs and DMRs with their annotated genes (Illumina reference table was used to annotate genes for CpGs and DMR annotation was done by coMethyDMR). We used 587 multi-ancestry samples with both MESA Exam 1 RNA-seq normalized gene expression and DNA methylation for association analysis. First, we removed confounding effects in DNA methylation by fitting the linear regression model M value ~ age + gender + race + first 4 genomic PCs of ancestry + estimated cell type proportions (Lymph, Mono, and Neu) and extracting DNA methylation residuals from the model. We used the median M value for the DMR. Similarly, we removed potential confounding effects in RNA-seq by fitting model normalized gene expression ~ age + gender + race + first 4 genomic PCs of ancestry + PEER factors 1–10 and extracting gene expression residuals from the model. Next, for each gene expression and significant DNAm difference pair, we tested association between gene expression residuals (outcome) and DNA methylation residuals via a simple linear regression to quantify eQTM. The FDR at 5% was applied to eQTM results for suggestive DMPs and DMRs respectively to correct for multiple testing comparisons.
Functional enrichment analysis
To understand the complex interpretation of significant DNAm differences better, we applied eFORGE65,66 to test whether our AS/PH trait-associated DNAm differences were enriched in regulatory elements from the Roadmap Epigenomics Consortium67 across 20 tissues and cell types. The CpGs included in both suggestive DMPs and DMRs for each AS/PH trait were used for eFORGE 15 chromatin states enrichment analysis. eFORGE selects a background of 1,000 random CpGs with matching properties based on gene-centric categories (first Exon, 3′ untranslated region or UTR, 5′UTR, Body, intergenic region or IGR, TSS1500 and TSS200) and CpG island-centric categories (CpG island, CpG island shore/shelf, N/A or “open sea”). Then the eFORGE uses a 1 kb proximity to filter out highly correlated input CpGs. Finally, eFORGE compares the number of CpGs overlapping regulatory elements from the reference panel with those obtained randomly to calculate enrichment scores for each of the selected cell types. eFORGE performs the Benjamini–Yekutieli approach to account for multiple testing corrections for a cell-type level significance.
Tissue correlation look-up in GTEx
We used GTEx v8 gene expression data68 of both Whole-Blood and Aorta tissue (AS/PH-relevant tissue) to check their tissue correlation for the prioritized genes after follow-up analyses. The GTEx v8 gene expression data was downloaded from the GTEx portal. The inverse normalization was first applied to 360 tissue-overlapped GTEx samples. Then both Pearson’s correlation and its p-value were reported to measure tissue-correlation level.