Overview of the analysis workflow
We designed an integrative analysis workflow of multiple human retina omics datasets, incorporating DNAm, gene expression, imputed genotypes, AMD-GWAS, and Hi-C data (Fig. 1a). We carried out DNAm profiling of postmortem retinas (n = 160), with an equal distribution of males and females and mean age of 73 years, using the Human MethylationEPIC BeadChip (Supplementary Table 1). After quality control (QC) and covariate analysis (see Methods) (Extended Data Fig. 1a-d), DNAm data from 152 retina samples was integrated with the previously published corresponding genotype and expression data29,31 for cis-mQTL mapping. We also performed concurrent cis-eQTL analysis from 403 retinas with genotype and RNA-seq data from the same study29 correcting for covariates (see Methods). In addition, associations were tested between DNAm of CpG sites and gene expression (cis-eQTMs) using all 152 retina samples with methylation data, RNA-seq and covariates. To test for causal relationships between cis-mQTLs and cis-eQTLs, and between m/eQTLs and AMD GWAS signals, we pursued two approaches: (i) SMR to distinguish pleiotropic or causal association from linkage of genetic associations with DNAm, gene expression and AMD37,19,38, and (ii) colocalization analyses that test whether co-occurring association signals are tagging the same causal variant/haplotype, including eCAVIAR39, coloc40, and multiple-trait-coloc (moloc)41. We also integrated adult retina Hi-C data including chromatin loops, and cis-regulatory elements (CREs) and super-enhancers (SEs)33 inferred from chromatin histone marks with mQTLs, eQTLs, eQTMs, and SMR or moloc associations, to identify high confidence candidate AMD and QTL target genes through physical linking between variants, CpG sites and genes.
The landscape of retina mQTLs and eQTLs
To characterize genetic regulation of DNAm in human retina, we performed cis-mQTL (n = 152) analysis on all genotyped and imputed variants in cis (± 1 Mb) of 749,158 CpG sites that passed stringent QC criteria (see Methods) using QTLtools42. Controlling for genetic population structure with top 10 genotype principal components (PCs) and for batch and other hidden confounding effects with inferred surrogate variables (SVs), we identified 2,817,314 significant variant-CpG cis-mQTLs (FDR ≤ 0.05) for 36,906 CpG sites that map to 10,000 mGenes (Methods, and Supplementary Table 2). We detected 37,453 independent mQTL signals using conditional analysis (Extended Data Fig. 2a), with 98.5% having a single independent signal per CpG (Supplementary Table 2, Extended Data Fig. 2c). Only 564 (1.5%) CpG sites had two independent mQTL signals. Concurrent analysis of all variants within ± 1 Mb of 17,382 genes expressed in the retina (n = 403, Methods) revealed 2,023,293 significant variant-gene cis-eQTLs (FDR ≤ 0.05) for 9,395 eGenes; of these, 12,505 are independent eQTLs (Extended Data Fig. 2b). A large fraction of eGenes (26.8%) showed more than one independent genetic effect in contrast to the CpG sites, though we might detect more independent signals for CpG methylation with a larger sample size (Supplementary Table 3, Extended Data Fig. 2c). Almost 95% of the mQTL variants are clustered near CpGs (median distance of 11.4kb) (Extended Data Fig. 2d), as recently identified for other tissues21, whereas 95% of eQTL variants clustered near the transcriptional start site (TSS) of the corresponding eGene (median distance of 5.5kb) (Extended Data Fig. 2e). A majority of CpGs (including all tested and those with significant mQTLs) are present within 200 bp or 1,500 bp of TSSs, 5’ UTR, gene body and intergenic regions. In addition, CpGs with significant mQTLs were relatively depleted from exon boundary and 3’ UTR regions when compared to all CpGs (Fig. 1b).
Genomic features and biological processes enriched in retina mQTLs compared to eQTLs
We examined the enrichment of mQTL variants (FDR ≤ 0.05) in functional genomic elements and compared the results to eQTL variants using TORUS43,44 (see Methods). Retina mQTLs were significantly enriched in both coding regions and transcriptional regulatory elements, with the strongest enrichment detected in frameshift variants, followed by open chromatin regions, transcription factor (TF) binding sites, enhancers, synonymous variants, and 3’ and 5’ UTR (Fig. 1c). Though highly enriched among frameshift variants, this class of mQTLs accounted for only 214 mVariants (Extended Data Fig. 2f, right panel). Synonymous variants, enhancers, and open chromatin regions accounted for the largest number of enriched mQTLs (thousands of mVariants; Extended Data Fig. 2f, right panel). In contrast, eQTLs were strongly enriched among variants that affect splicing, followed by frameshift variants, 5’ UTR, TF binding sites, 3’ UTR, and regulatory regions (Extended Data Fig. 2f, left panel). Though not significantly enriched in open chromatin regions and enhancers (lower bound 95% confidence interval < 0), the fold-enrichment of eQTLs in these functional categories was higher than that of mQTLs.
We next examined the pathways that are enriched for genes regulated by mQTLs in the retina using gene ontology (GO) enrichment analysis (see Methods). The mQTL target genes are enriched (FDR ≤ 0.05) in a range of biological processes, including those related to cell adhesion, actin filament organization, synaptic signaling, and peptide hormone secretion (Extended Data Fig. 3a, and Supplementary Table 4). Examples of genes driving these GO enrichments include PARK7, a mitochondrial gene involved in synaptic signaling and a potential target of 5 mQTLs (Fig. 1d), and MTOR involved in the actin filament-based process, regulation of GTPase activity, control of cell growth and proliferation and a potential target of 2 mQTLs (Extended Data Fig. 3b and Supplementary Table 4). In contrast, target genes of retina eQTLs are enriched in cellular components (FDR ≤ 0.05), such as extracellular matrix and endoplasmic reticulum lumen (Extended Data Fig. 3c, and Supplementary Table 5). Thus, genetic effects on CpG methylation in the retina appear to be driven by distinct molecular mechanisms and biological processes compared to the genetic effects on gene expression.
Tissue-specificity of retina mQTLs
To evaluate the tissue-specificity of DNAm and mQTLs, we compared 36,906 methylated CpGs with significant mQTLs and 37,453 independent mQTLs in the retina to those identified in five different tissues: adipose (n = 119)45, blood (n = 614)46, endometrium (n = 66)47, brain frontal cortex (n = 526)48, and skeletal muscle (n = 282)20. We detected the highest representation of retina methylated CpGs in the frontal cortex (43%), followed by skeletal muscle (30%), blood (24%), endometrium (24%), and adipose (10%) (Fig. 1e). Our results reflect the shared neuronal nature of frontal cortex and retina. The retina mQTLs also show the highest overlap (22–30%) with mQTLs in skeletal muscle and frontal cortex (Fig. 1f). Interestingly, 13,458 mQTLs and 5,895 CpGs are unique to the retina (Supplementary Tables 2 and 6). Retina-specific methylated CpGs are similarly distributed within and around the gene body, and least represented in exon boundaries and 3’ UTR, as with all the methylated CpGs identified in the retina (Extended Data Fig. 3d). Retina-specific mQTLs are also enriched in similar functional genomic elements as all retina mQTLs, aside for showing significantly stronger enrichment in splice donor variants compared to all retina mQTLs, as in case of retina eQTLs (Fig. 1c). Thus, retina-specific mQTLs may be enriched for genetic effects on alternative isoform expression compared to all mQTLs. Gene ontology analysis of the retina-specific mQTL target genes reveals an enrichment (FDR ≤ 0.05) in unique biological processes that include synaptogenesis and photoreceptor cell maintenance (Extended Data Fig. 3e, and Supplementary Table 7).
Identification of eQTMs in the human retina
We evaluated the association between DNAm of CpG sites in cis (± 1Mb) and genes expressed in the retina, considering 732,506 CpG sites and 18,263 genes. Top 10 genotype PCs, known covariates (e.g., AMD grade), and SVs capturing hidden covariates of expression, were included in the linear regression model (see Methods). We identified a total of 13,747 significant cis-eQTMs (FDR ≤ 0.05) in the retina, comprised of 10,585 unique CpGs (1.4% of tested CpGs) regulating 13,747 unique genes (75.2% of total genes); of which, 11,248 (82%) are protein-coding genes (Supplementary Table 8). Of the 13,747 eQTMs, 770 CpGs and 7,292 genes showed a significant mQTL and eQTL, respectively. All eQTMs were independent signals, and none had a secondary signal with our current sample size. Most CpGs with significant eQTMs resided proximal to the target gene’s TSS with a median distance of 1.07 kb (Fig. 2a) and are most enriched (81%) within 1500 bp and 200 bp of TSSs, first exon, 5’ UTR, and the gene body (Fig. 2b), similar to the CpGs with significant mQTLs. The eQTM target genes are enriched (FDR < 0.05) in mitochondrial and translation-related processes (Extended Data Fig. 4a, and Supplementary Table 9).
We further examined the direction of effect of CpG methylation on gene expression. A higher fraction of CpGs showed a canonical negative correlation with their target gene expression (54.5%) compared to a positive correlation (45.5%), as observed in other tissues20,34,21 (Supplementary Table 8). For example, the CpG cg24846343 located in a gene body is negatively correlated with expression levels of Glutathione S-transferase, GSTT2B (Fig. 2c), and CpGs cg21653793 and cg10832655 located in a 5’ UTR region are negatively correlated with the expression of cholesterol transporter ABCA1 and the neuron derived neurotrophic factor NDNF, respectively (Fig. 2d, and Extended Data Fig. 4b). On the contrary, CpG cg24307499 located in a gene body is positively correlated with the expression of NLRP2, an immune response regulator, and CpG cg04718426 located within 200 bp of the TSS is positively correlated with the zinc finger protein, ZNF232 (Fig. 2e and Extended Data Fig. 4c).
Next, we inspected the distribution of number (fraction) of eQTMs with their target and all known genes across different chromosomes and uncovered a greater relative number of eQTMs on a few smaller chromosomes, e.g., chromosome 16, 17 and 19 (Fig. 2f). While on average eQTMs regulate a single gene, we noticed that 0.2% of CpGs regulated more than 10 genes in the retina and that a majority of these are on chromosomes 16 and 19 (Fig. 2g). Gene ontology analysis of these genes revealed an enrichment (FDR ≤ 0.05) of catabolic processes including autophagic mechanisms and proteasomal protein degradation (Extended Data Fig. 4d and Supplementary Table 10). The CpGs that regulate more than 10 genes and thereby have greater pleiotropic effects are clustered on p and q arms of both chromosomes 16 and 19 (Fig. 2g). Genes on chromosome 16 are enriched in regulation of telomere maintenance (Extended Data Fig. 4e and Supplementary Table 11) and on chromosome 19 in negative regulation of transcription by RNA polymerase II, RNA splicing via transesterification reactions, and positive regulation of cellular protein catabolic process (Extended Data Fig. 4f and Supplementary Table 12).
Causal or pleiotropic relationships between genetic regulation of DNAm and gene expression
Variants that regulate CpG methylation (mQTLs) may in turn affect gene expression, and variants that regulate gene expression (eQTLs) may influence CpG methylation; however, the extent to which these molecular mechanisms occur in the retina is not clear. We thus used Summary-data-based Mendelian Randomization (SMR)37 to examine whether mQTLs underlie the causal mechanism or share the same causal variant (pleiotropy) with eQTLs in the retina or vice versa (Fig. 3a). We performed SMR analysis with retina eQTL and mQTL summary statistics considering DNAm as the exposure and gene expression as the outcome (represented as M2E_SMR) or gene expression as the exposure and DNAm as the outcome (represented as E2M_SMR). We further applied a heterogeneity in dependent instruments (HEIDI) test to differentiate a causal or pleiotropic model from a linkage model (Fig. 3b). In the E2M_SMR analysis, we identified 7,869 associations (SMR P-value < 9.30x10− 7 after Bonferroni correction) (Supplementary Table 13); of these, 5,805 associations passed the linkage test (HEIDI P-value > 0.05), corresponding to 2,256 eQTLs, 5,612 mQTLs and 1,983 genes (Fig. 3c). In the M2E_SMR analysis, 9,307 associations (SMR P-value < 9.38x10− 7 after Bonferroni correction) (Supplementary Table 14) were evident, of which 6,592 associations passed the linkage test (HEIDI P-value > 0.05) corresponding to 5,175 mQTLs, 5,012 eQTLs and 2,101 genes (Fig. 3d). Of the 6,592 M2E_SMR associations, 232 associations (3.5%) corresponding to 232 genes were also identified as significant eQTMs. We detected 554 common associations between E2M_SMR and M2E_SMR (Extended Data Fig. 5a), proposing a feedback mechanism between DNAm and gene expression regulation for a minority of the 513 mQTLs (1.3%) and 292 eQTLs (2.3%). A larger number of significant M2E_SMR associations compared to E2M_SMR suggests that the genetic effect of DNAm on gene expression is a more predominant mechanism than the genetic effect of gene expression on DNAm.
We further compared the genes identified in both E2M_SMR and M2E_SMR analysis and discerned 1,447 common genes that were common out of 1,983 E2M_SMR and 2,101 M2E_SMR genes. Notably, ten common genes were enriched in the glutathione metabolism pathway (Fig. 3e and Extended Data Fig. 5b). The E2M_SMR and M2E_SMR shared genes were also enriched in immune related processes (such as antigen processing and presentation and autoimmune diseases) and metabolic processes (such as glycolysis-gluconeogenesis and metabolism of xenobiotics by cytochrome p450) (Fig. 3e and Supplementary Table 15). For example, for glutathione S-transferase mu 1, GSTM1, we detected an association in E2M_SMR with CpG cg24506221 and SNP rs36209093 (SMR P-value < 8.53x10− 13 and HEIDI P value > 0.05) and another association in M2E_SMR with SNP rs148490733 and CpG cg24506221 (SMR P-value < 4.60x10− 11 and HEIDI P value > 0.05) (Fig. 3f). This causal relationship was further supported by the eQTM analysis, where the CpG cg24506221 located within 200 bp of the TSS of GSTM1 was negatively correlated with GSTM1 expression levels (Fig. 3g).
To provide additional support for a shared causal variant between retina mQTLs and eQTLs that co-occur along the genome, we applied Bayesian colocalization, using coloc40, to all significant eQTL and/or mQTL (EM) variants (Extended Data Fig. 5c). Using a stringent posterior probability of association (PPA) ≥ 0.8, we determined a significant colocalization for 9,417 variants and 2,423 genes (Supplementary Table 16). Target genes of the colocalizing EM signals were enriched in purine and pyrimidine metabolism pathways (FDR < 0.05) (Extended Data Fig. 5d). We further compared EM results from coloc with E2M_SMR and recognized 343 associations in common corresponding to 177 genes (Extended Data Fig. 5e and Supplementary Table 17).
SMR and colocalization of retina mQTLs and eQTLs prioritize causal genes for AMD loci
To test for causal or pleiotropic (hereafter referred to as pleiotropic) relationships between DNAm and AMD risk, we applied SMR to all significant independent retina mQTLs and AMD GWAS summary statistics23 (Fig. 4a). We identified 28 significant associations between mQTLs and AMD at 7 GWAS loci (SMR P-value < 5.68x10− 7 after Bonferroni correction) (Supplementary Table 18); of which, 5 associations at 3 loci (KMT2E/SRPK2, PILRB/PILRA and ARMS2/HTRA1) passed the linkage test (HEIDI P-value > 0.05) and are thus likely to be pleiotropic (Fig. 4b). Similarly, we used our retina eQTLs to test for pleiotropic associations between gene expression and AMD GWAS and identified 14 associations at 7 loci (SMR P-value < 6.41x10− 6 after Bonferroni correction) (Supplementary Table 19); of which, 8 associations at 5 loci (CFI, C2/CFB/SKIV2L, PILRB/PILRA, RDH5/CD63 and TMEM97/VTN) passed the linkage test (HEIDI P-value > 0.05) (Fig. 4c). In our previously published eQTL study, we identified the same target genes for 4 of the loci, including CFI, PILRB/PILRA, RDH5/CD63 and TMEM97/VTN, based on colocalization analysis29 (Supplementary Table 19). The retina mQTLs and eQTLs proposed the same target genes for one locus PILRB/PILRA, and new causal mechanisms and genes for AMD at 3 (KMT2E/SRPK2, PILRB/PILRA and ARMS2/HTRA1) and 2 (C2/CFB/SKIV2L and PILRB/PILRA) loci, respectively. For example, at the PILRB/PILRA locus, we identified SNP rs11766752 and CpG cg07160278 influencing both DNAm and AMD (SMR P-value = 2.88x10− 7 and HEIDI P-value = 0.09). cg07160278 is localized in the TSS region of MEPCE and ZCWPW1 genes, and for this CpG we identified eQTM with PILRA gene (Fig. 4d). With eQTL and AMD GWAS SMR analysis, we identified SNP rs45451301 at the C2-CFB-SKIV2L locus influencing both DXO expression and AMD risk (SMR P-value = 2.02x10− 7 and HEIDI P-value = 0.38) (Fig. 4e). At the CFI locus, SMR detected SNP rs7439493 influencing PLA2G12A expression and AMD risk (SMR P-value = 2.6x10− 8 and HEIDI P-value = 0.22) (Extended Data Fig. 6a), as identified in our previous eQTL study29.
We next used the Bayesian colocalization method eCAVIAR39 to test whether retina mQTLs and eQTLs share one or more causal variants with known AMD GWAS loci. Significant colocalization would prioritize potential causal mQTLs/eQTLs and target CpGs or genes for AMD. Of 52 independent AMD GWAS variants at 34 loci, we identified significant colocalization (Colocalization posterior probability, CLPP > 0.01) of one or more mQTLs and/or eQTLs with 17 AMD GWAS variants in 10 loci (29.4% of loci) (Extended Data Fig. 6b). An average of 1.87 ± 0.52 causal genes per locus were proposed based on mQTLs, 1.4 ± 0.24 based on eQTLs, and 2.1 ± 0.50 causal genes based on mQTLs and/or eQTLs (Supplementary Table 20). A single causal gene was proposed for 5 (14.7%) of the loci (Supplementary Table 20). Fifteen AMD GWAS variants in 8 (23.5%) loci colocalized with at least one retina mQTLs, and 6 AMD GWAS variants in 5 loci (14.7% of loci) colocalized with at least one retina eQTL (Supplementary Table 20). Only mQTLs colocalized with an AMD signal at 5 loci, with CFH as a target gene at the CFH locus (Extended Data Fig. 6c and 6d), a lincRNA LINC01004 at the KMT2E/SRPK2 locus (Fig. 4f and 4g), ARHGAP21 and RNA5SP305 at the ARHGAP2 locus, RAD51B at the RAD51B locus, and MARK4 at the APOE(EXOC3L2/MARK4) locus. Two loci had only eQTLs colocalizing with the AMD signal, with SARM1 and TMEM199 target genes at the TMEM97/VTN locus and TMEM259 at the CNN2 locus. At the AMD locus rs147859257/rs2230199 on chromosome 19, both a mQTL (cg12024887 and cg07567260 that mapped to GRP108, MIR6791 and TRIP10) and an eQTL (acting on GRP108) colocalized with the AMD association signal, suggesting that the causal effect of the colocalizing mQTL in this AMD locus may be acting via altered GRP108 expression levels (Extended Data Fig. 6e, 6f and 6g). Eighteen of the colocalizing genes have not yet been identified as targets at corresponding AMD loci. The mQTL of CpG cg11712338 mapped to LINC01004 and colocalized with AMD at the KMT2E/SRPK2 locus (Fig. 4f). SMR analysis further validated this gene as a high confidence new AMD gene.
Colocalization analysis of retina mQTLs, eQTLs and GWAS identifies additional AMD genes
To determine shared causal variants between retina mQTLs or eQTLs and AMD and identify novel AMD genes, we applied the colocalization method coloc40 to the following pairwise comparisons: AMD GWAS and mQTL (GM), AMD GWAS and eQTL (GE), and mQTL and eQTL (EM) (described above), considering all significant mQTLs/eQTLs genome-wide (Fig. 5a and Methods). We discovered significant GM colocalization results for 69 mVariants and 81 CpG sites corresponding to 58 genes (PPA ≥ 0.8) at 11 AMD loci (Extended Data Fig. 7a and Supplementary Table 21) and significant GE colocalization for 44 eVariants and 32 genes at 3 AMD loci (Extended Data Fig. 7a and Supplementary Table 22). Three colocalizing variants at 3 AMD loci were significant with both GM and GE (Supplementary Table 23). The genes identified in GM and GE were enriched (FDR ≤ 0.05) in immune-related processes, such as antigen processing and presentation (Extended Data Fig. 7b). The GM and GE significant colocalizations detected 42 and 47 new AMD associations, respectively. Considering these new AMD associations, we identified 36 mVariants and 43 CpG sites corresponding to 30 genes in GM and 29 eVariants and 24 genes in GE (Supplementary Tables 21 and 22).
To further identify variants associated with DNAm, gene expression and AMD, we applied the multi-trait colocalization method, moloc41, across AMD GWAS, eQTL and mQTL (GEM) signals (Fig. 5b) (see Methods). We obtained strong evidence (PPA ≥ 0.8) at 18 CpG sites and 10 genes where variations in DNAm, gene expression and AMD GWAS were attributed to the same variant (Table 1, Supplementary Table 24). Seven of the significant colocalizing variants at 3 known AMD loci (C2/CFB/SKIV2L, PILRB/PILRA and CNN2) corresponded to 3 genes (Fig. 5c). SNP rs67538026 associated with methylation of CpG cg05475770, TMEM259 expression and AMD risk was identified at the CNN2 locus (Fig. 5d), and rs114820183 associated with cg22670819 methylation, DXO expression, and AMD risk at the C2-CFB-SKIV2L locus. DXO was also proposed as a potential AMD gene at this locus by SMR analysis of retina eQTL and AMD GWAS statistics.
Table 1
Significant moloc colocalization results for AMD GWAS, retina eQTL and mQTL
CpG
|
Target gene
|
chr
|
best.snp.GEM
|
Variant position
|
Loci
|
cg22670819
|
DXO
|
chr6
|
rs114820183
|
31933131
|
C2/CFB/SKIV2L
|
cg00854166
|
AP4M1
|
chr7
|
rs34130487
|
100161582
|
PILRB/PILRA
|
cg14736458
|
AP4M1
|
chr7
|
rs34130487
|
100161582
|
PILRB/PILRA
|
cg04039547
|
AP4M1
|
chr7
|
rs34130487
|
100161582
|
PILRB/PILRA
|
cg00472528
|
KMT5A
|
chr12
|
rs34477554
|
123405486
|
-
|
cg19881011
|
MTMR10
|
chr15
|
rs6493454
|
31101742
|
-
|
cg21565421
|
MPI
|
chr15
|
rs11072507
|
74762525
|
-
|
cg11955166
|
EHD4
|
chr15
|
rs72735670
|
40922983
|
-
|
cg13906792
|
MPI
|
chr15
|
rs11072507
|
74762525
|
-
|
cg16646645
|
MTMR10
|
chr15
|
rs6493454
|
31101742
|
-
|
cg14323928
|
MTMR10
|
chr15
|
rs6493454
|
31101742
|
-
|
cg21851553
|
MTMR10
|
chr15
|
rs6493454
|
31101742
|
-
|
cg23999607
|
CPLX4
|
chr18
|
rs4940875
|
59463337
|
-
|
cg03536881
|
KASH5
|
chr19
|
rs2098709
|
48684412
|
-
|
cg05475770
|
TMEM259
|
chr19
|
rs67538026
|
1031439
|
CNN2
|
cg00752531
|
TMEM259
|
chr19
|
rs67538026
|
1031439
|
CNN2
|
cg09935308
|
TMEM259
|
chr19
|
rs113772652
|
1031551
|
CNN2
|
cg03641251
|
SNX21
|
chr20
|
rs4810499
|
46340784
|
-
|
The moloc analysis also identified 7 colocalizing variants that affect DNAm and gene expression of 7 genes at 7 loci that are not reported AMD loci and are therefore new AMD associations (Supplementary Table 24). These include the SNP rs11072507 associated with methylation of CpG cg21565421, MPI expression and AMD (Fig. 5e), and the SNP rs6493454 associated with CpG cg19881011 methylation, MTMR10 expression and AMD (Extended Data Fig. 7c). We compared CpGs and genes identified in the GM or EM coloc and GEM moloc analyses, and only detected an overlap of 6 CpGs in GM and GME, 4 genes (KASH5, EHD4, CPLX4 and SNX21) common in GEM and EM, and 5 genes (C8orf58, PDZD7, FN3KRP, SLC16A3, SCAF1) common in GE and EM (Extended Data Fig. 7d and 7e). We further overlapped CpGs and genes identified in the SMR analysis (E2M_SMR, M2E_SMR) and EM coloc analysis, and noted 770 CpGs and 544 genes in common in all 3 analyses (Fig. 5f and 5g) (Supplementary Table 25). The CpGs were mapped to potential target genes (mGenes; see Methods), which showed enrichment in pyruvate metabolism and olfactory signaling pathway (Extended Data Fig. 7f).
Integrative analysis with Hi-C retina map improves target gene prioritization
Finally, to prioritize a high confidence list of variants that contribute to DNAm and gene expression variation, we integrated loops, CREs and SEs from our recently published retina Hi-C data33 with mQTLs, eQTLs, and eQTMs, and their target genes, and variant associations identified in the SMR, eCAVIAR, coloc and moloc analyses. We observed that 73% of mQTLs, 78% of eQTLs and 82% of eQTMs are localized to the A compartment (representing active transcription chromatin regions) compared to 23% of mQTLs,18% of eQTLs and 16% of eQTMs in the B compartment (representing closed chromatin regions with inactive genes); 4% of e/mQTLs or eQTMs fell in uncategorized chromatin regions (Fig. 6a and Supplementary Table 26). Most of the retinal mQTLs and eQTLs are present within the A compartment providing a direct mechanism to explain the impact of a variant on the mGene or eGene. We next examined the overlap of Hi-C loop foot locations with mQTL and eQTL variants and their target genes, distinguishing promoter mQTLs/eQTLs (± 2.5 kb from TSS of target gene) from distal mQTLs/eQTLs (> 2.5 kb from TSS) (see Methods), as previously described49. We identified 5,479 mQTLs overlapping a loop foot; of these, 104 (1.8%) interacted with their mGene promoter (promoter mQTLs), 705 (12.8%) were distal mGene mQTLs, 56 (1%) were promoter non-mGene mQTLs, and 4,614 (84.2%) were distal non-mGene mQTLs (Fig. 6b left panel and Supplementary Table 27). Among the 705 distal mGene mQTLs, 2 mQTLs are associated with known AMD loci (SYN3/TIMP3 and TRPM3). For eQTLs, 1,721 overlapped a loop foot; of these, 17 (0.9%) interacted with their eGene promoter (promoter eQTLs), 186 (10.8%) were distal eGene eQTLs, 15 (0.8%) were promoter non-eGene eQTLs, and 1,503 (87.3%) were distal non-eGene eQTLs (Fig. 6b middle panel and Supplementary Table 28). We similarly tested for overlap of promoter eQTMs (CpGs within ± 2.5 kb from target gene TSS) and distal eQTMs (> 2.5kb from TSS) with retina Hi-C loop foot locations. We identified 3,395 eQTMs overlapping a loop foot; of these, 24 (0.7%) interacted with their target gene promoter (promoter-eQTMs), 282 (8.3%) were promoter non-target gene eQTMs, 195 (5.7%) were distal target gene eQTMs, and 2,894 (85.2%) were distal non-target gene eQTMs (Fig. 6b right panel and Supplementary Table 29).
We next evaluated the overlap of retina mQTLs, eQTLs and eQTMs with retinal CREs and SEs that were identified based on epigenetic marks33 (see Methods). We discovered 6,834 CRE-mQTLs and 2,610 SE-mQTLs; of which, 2,048 (29.9%) and 271 (10.3%) were promoter-mGene mQTLs, respectively (Extended Data Fig. 8a left panel and Supplementary Table 30). For the retina eQTLs, we ascertained 1,953 CRE-eQTLs and 773 SE-eQTLs overlaps; of which, 609 (31.1%) CRE-eQTLs and 84 (10.8%) SE-eQTLs were promoter-eGene eQTLs (Extended Data Fig. 8a middle panel and Supplementary Table 31). For eQTMs, we uncovered 8,996 CRE-eQTMs and 1,392 SE-eQTMs; of which, 668 (7.4%) CRE-eQTMs and 88 (6.3%) SE-eQTMs were promoter-eQTMs (Extended Data Fig. 8a right panel, and Supplementary Table 32).
We further integrated Hi-C loops, CREs and SEs with target genes, and variant associations identified in our SMR, eCAVIAR, coloc and moloc analyses. In SMR analysis of mQTL and AMD GWAS with CRE, SE and loops, we identified target genes for 3 associations (MEPCE; ZCWPW1 and HTRA1) at 2 AMD loci (PILRB/PILRA, ARMS2/HTRA1) (Supplementary Table 33). For SMR analysis of eQTL and AMD GWAS with CRE, we detected target genes for 2 associations (STAG3L5P and BLOC1S1) at 2 AMD loci (PILRB/PILRA and RDH5/CD63) (Supplementary Table 34). In eCAVIAR analysis of mQTL and AMD GWAS with CRE and loops, we found target genes (ARHGAP21, KMT2E and LINC01004) for 2 colocalizations at 2 loci (ARHGAP21 and KMT2E/SRPK2) (Supplementary Table 35). For E2M_SMR associations, we identified targets genes for 1,058 associations with CRE, 538 associations with SE, and 186 associations with loops (Supplementary Table 36). These analyses also identified high confidence causal links with Hi-C loops between retina eQTLs and mQTLs, and 1 eQTL/mQTL association with ALDH2 identified in E2M_SMR (Fig. 6c). Similarly, for M2E_SMR associations, we identified targets genes for 609 associations with CRE, 516 associations with SE, and 187 associations with loops (Supplementary Table 37). We determined 3 mQTL/eQTL associations with GSTP1 identified in M2E_SMR (Fig. 6d), and 4 eQTL/mQTL and mQTL/eQTL associations with EML1 noted in E2M_SMR and M2E_SMR (Extended Data Fig. 8b). For coloc EM colocalizations, we identified target genes for 790 colocalizations with CRE, 450 colocalizations with SE and 216 colocalizations with loops (Supplementary Table 38). We also discovered a target gene for coloc EM colocalization in LPIN1 gene with Hi-C loops (Extended Data Fig. 8c).
Genes and pathways influenced by DNA methylation and associated with AMD
To obtain a non-redundant set of genes, we merged the target genes identified in mQTL and AMD GWAS analyses using SMR, eCAVIAR, coloc and moloc. We identified 4 target genes at 3 loci by SMR, 15 target genes at 8 loci using eCAVIAR, 58 target genes at 11 loci using coloc, and 10 target genes at 3 loci using moloc. By combining the target genes from the four methods, we identified a total of 87 non-redundant genes that are influenced by DNAm (Supplementary Table 39; 50 of these genes are at the reported AMD GWAS loci and 37 are new target genes. These genes belonged to a range of biological processes including mTOR signaling and RHOF GTPase cycle (Reactome, gProfiler adj. P < 0.05), and regulation of actin cytoskeleton reorganization and glycosylation (Gene Ontology, GeneEnrich adj. P < 0.05).