QTL mapping of human retina DNA methylation identifies 87 gene-epigenome interactions in age-related macular degeneration

DNA methylation (DNAm) provides a crucial epigenetic mark linking genetic variations to environmental influence. We analyzed array-based DNAm profiles of 160 human retinas with co-measured RNA-seq and > 8 million genetic variants, uncovering sites of genetic regulation in cis (37,453 mQTLs and 12,505 eQTLs) and 13,747 eQTMs (DNAm loci affecting gene expression), with over one-third specific to the retina. mQTLs and eQTMs show non-random distribution and enrichment of biological processes related to synapse, mitochondria, and catabolism. Summary data-based Mendelian randomization and colocalization analyses identify 87 target genes where methylation and gene-expression changes likely mediate the genotype effect on age-related macular degeneration (AMD). Integrated pathway analysis reveals epigenetic regulation of immune response and metabolism including the glutathione pathway and glycolysis. Our study thus defines key roles of genetic variations driving methylation changes, prioritizes epigenetic control of gene expression, and suggests frameworks for regulation of AMD pathology by genotype–environment interaction in retina.


Introduction
Common healthy and disease traits in humans exhibit extensive variability, are largely multifactorial, and dictated by a complex interplay between genetic architecture and widely varying environments 1 . Genomic variations can impact phenotypes through genetic and epigenetic control of gene expression programs.
Large genome-wide association studies (GWAS) have been effective in identifying thousands of genetic variants that are linked to common traits (https://www.ebi.ac.uk/gwas/); however, a vast majority of the associated variations are present in non-coding regions of the genome, likely impacting gene regulation and consequently disease pathogenesis 2,3,4 . The GTEx project has provided extensive descriptions of expression quantitative trait loci (eQTLs) for many human tissues 5,6 . Nonetheless, an array of dynamic environmental factors, including diet and socio-economic status, can complicate the interpretation of GWAS datasets 7 . Furthermore, many common traits are also in uenced by advanced age, and genetic inheritance and environment are critical determinants of the aging process itself 8, 9 . GWAS success has so far been limited in unraveling the complexities of gene-environment relationships and the contribution of individual genetic variations to multifactorial phenotypes.
DNA methylation (DNAm) is a key dynamic epigenetic mark that is established during development and largely preserved in differentiated cell types to maintain chromatin organization and genetic controls 10,11,12 . DNAm is an active contributor to gene regulation, and tissue-speci c aberrations in DNAm landscape are associated with environmental factors (such as diet and exercise), aging as well as agerelated diseases 13,11,14,12 . Non-coding variants, both in cis and trans, can exert strong in uence on DNAm, alter chromatin topology, and modulate the expression of target genes 15,16,17,18 . Integrated analyses of genetic variants affecting DNAm (mQTLs), association between DNAm sites and gene expression (eQTMs), and GWAS of complex traits have only recently begun to elucidate the complex relationships among multiple disease-causing factors 19,20,21 . However, no such information currently exists for eye or retinal tissues and traits.
Age-related macular degeneration (AMD) is a multifactorial progressive neurodegenerative disease, which is characterized by loss of central vision and is the leading cause of irreversible visual impairment and blindness in older individuals worldwide 22 . Patients with AMD exhibit lipid-rich extracellular deposits (drusen), atrophy of the retinal pigment epithelium (RPE), and loss of photoreceptors primarily in the central macular region of the retina. Advanced age is arguably a major critical component in addition to genetic susceptibility and environmental factors (such as smoking and diet), which together determine etiology and varying phenotypes of AMD 22 . A large AMD GWAS had previously identi ed association of 52 independent genetic variants at 34 loci 23 , which have been expanded to 46 loci in a larger GWAS metaanalysis 24 and 63 loci by a cross-ancestry GWAS 25 . Causal genes and functionally relevant variants at most AMD-associated loci are still unrecognized, though a few key biological pathways have begun to emerge 26 . Ultra-rare variants in case-control or family-based genetic studies can potentially point to causal genes 23,27 , as exempli ed by the identi cation of complement 8A and C8B 28 . Furthermore, like other complex traits, the majority of AMD-associated variants are present in non-coding regions that could regulate expression and epigenetic landscape of distal genes. Integrated statistical analyses of GWAS with gene expression quantitative trait loci (eQTLs) in retina and GTEx tissues have helped in prioritizing potential target genes for AMD [29][30][31][32] . Recent high-resolution mapping of human retinal genome topology further helps elucidate chromatin looping patterns of variants in distal cis-regulatory elements, such as enhancers, and re nes candidate disease-causing genes 33 . Despite innovative advances, we have limited understanding of underlying mechanisms that associate genetic regulation with epigenomic shifts linked to aging and environmental factors in AMD pathogenesis.
QTL mapping of DNAm and its integration with GWAS and eQTLs in the human retina can potentially uncover epigenomic regulation of disease pathogenesis, as demonstrated for multiple other tissues 20,21,34 . We note that alterations in DNAm are also associated with aging in the retina 35,36 . Here, we generated genome-wide DNAm pro les of 160 human retina samples to identify associations between genetic, epigenetic, and transcriptional variation relevant to retinal homeostasis and complex disease traits, such as AMD. We report mapping of mQTLs and eQTMs and an integrative analysis of mQTLs, eQTLs and AMD-GWAS variants. Using Summary data-based Mendelian Randomization (SMR), multiple colocalization methods, and Hi-C data, we demonstrate complex associations among genetic variants, DNAm, and gene expression and identify 87 unique genes affected by DNAm that may contribute to AMD risk. Our studies provide insights into molecular mechanisms underlying epigenomic regulation of AMD and suggest aging and environment-responsive pathways.

Overview of the analysis work ow
We designed an integrative analysis work ow of multiple human retina omics datasets, incorporating DNAm, gene expression, imputed genotypes, AMD-GWAS, and Hi-C data (Fig. 1a). We carried out DNAm pro ling of postmortem retinas (n = 160), with an equal distribution of males and females and mean age of 73 years, using the Human MethylationEPIC BeadChip (Supplementary Table 1). After quality control (QC) and covariate analysis (see Methods) (Extended Data Fig. 1a-d), DNAm data from 152 retina samples was integrated with the previously published corresponding genotype and expression data 29,31 for cis-mQTL mapping. We also performed concurrent cis-eQTL analysis from 403 retinas with genotype and RNA-seq data from the same study 29 correcting for covariates (see Methods). In addition, associations were tested between DNAm of CpG sites and gene expression (cis-eQTMs) using all 152 retina samples with methylation data, RNA-seq and covariates. To test for causal relationships between cis-mQTLs and cis-eQTLs, and between m/eQTLs and AMD GWAS signals, we pursued two approaches: (i) SMR to distinguish pleiotropic or causal association from linkage of genetic associations with DNAm, gene expression and AMD 37,19,38 , and (ii) colocalization analyses that test whether co-occurring association signals are tagging the same causal variant/haplotype, including eCAVIAR 39 , coloc 40 , and multiple-trait-coloc (moloc) 41 . We also integrated adult retina Hi-C data including chromatin loops, and cis-regulatory elements (CREs) and super-enhancers (SEs) 33 inferred from chromatin histone marks with mQTLs, eQTLs, eQTMs, and SMR or moloc associations, to identify high con dence candidate AMD and QTL target genes through physical linking between variants, CpG sites and genes.

The landscape of retina mQTLs and eQTLs
To characterize genetic regulation of DNAm in human retina, we performed cis-mQTL (n = 152) analysis on all genotyped and imputed variants in cis (± 1 Mb) of 749,158 CpG sites that passed stringent QC criteria (see Methods) using QTLtools 42 . Controlling for genetic population structure with top 10 genotype principal components (PCs) and for batch and other hidden confounding effects with inferred surrogate variables (SVs), we identi ed 2,817,314 signi cant variant-CpG cis-mQTLs (FDR ≤ 0.05) for 36,906 CpG sites that map to 10,000 mGenes (Methods, and Supplementary Table 2). We detected 37,453 independent mQTL signals using conditional analysis (Extended Data Fig. 2a), with 98.5% having a single independent signal per CpG (Supplementary Table 2, Extended Data Fig. 2c). Only 564 (1.5%) CpG sites had two independent mQTL signals. Concurrent analysis of all variants within ± 1 Mb of 17,382 genes expressed in the retina (n = 403, Methods) revealed 2,023,293 signi cant variant-gene cis-eQTLs (FDR ≤ 0.05) for 9,395 eGenes; of these, 12,505 are independent eQTLs (Extended Data Fig. 2b). A large fraction of eGenes (26.8%) showed more than one independent genetic effect in contrast to the CpG sites, though we might detect more independent signals for CpG methylation with a larger sample size (Supplementary Table 3, Extended Data Fig. 2c). Almost 95% of the mQTL variants are clustered near CpGs (median distance of 11.4kb) (Extended Data Fig. 2d), as recently identi ed for other tissues 21 , whereas 95% of eQTL variants clustered near the transcriptional start site (TSS) of the corresponding eGene (median distance of 5.5kb) (Extended Data Fig. 2e). A majority of CpGs (including all tested and those with signi cant mQTLs) are present within 200 bp or 1,500 bp of TSSs, 5' UTR, gene body and intergenic regions. In addition, CpGs with signi cant mQTLs were relatively depleted from exon boundary and 3' UTR regions when compared to all CpGs (Fig. 1b).

Genomic features and biological processes enriched in retina mQTLs compared to eQTLs
We examined the enrichment of mQTL variants (FDR ≤ 0.05) in functional genomic elements and compared the results to eQTL variants using TORUS 43,44 (see Methods). Retina mQTLs were signi cantly enriched in both coding regions and transcriptional regulatory elements, with the strongest enrichment detected in frameshift variants, followed by open chromatin regions, transcription factor (TF) binding sites, enhancers, synonymous variants, and 3' and 5' UTR (Fig. 1c). Though highly enriched among frameshift variants, this class of mQTLs accounted for only 214 mVariants (Extended Data Fig. 2f Fig. 2f, left panel). Though not signi cantly enriched in open chromatin regions and enhancers (lower bound 95% con dence interval < 0), the fold-enrichment of eQTLs in these functional categories was higher than that of mQTLs.
We next examined the pathways that are enriched for genes regulated by mQTLs in the retina using gene ontology (GO) enrichment analysis (see Methods). The mQTL target genes are enriched (FDR ≤ 0.05) in a range of biological processes, including those related to cell adhesion, actin lament organization, synaptic signaling, and peptide hormone secretion (Extended Data Fig. 3a, and Supplementary Table 4). Examples of genes driving these GO enrichments include PARK7, a mitochondrial gene involved in synaptic signaling and a potential target of 5 mQTLs (Fig. 1d), and MTOR involved in the actin lamentbased process, regulation of GTPase activity, control of cell growth and proliferation and a potential target of 2 mQTLs (Extended Data Fig. 3b and Supplementary Table 4). In contrast, target genes of retina eQTLs are enriched in cellular components (FDR ≤ 0.05), such as extracellular matrix and endoplasmic reticulum lumen (Extended Data Fig. 3c, and Supplementary Table 5). Thus, genetic effects on CpG methylation in the retina appear to be driven by distinct molecular mechanisms and biological processes compared to the genetic effects on gene expression.

Tissue-speci city of retina mQTLs
To evaluate the tissue-speci city of DNAm and mQTLs, we compared 36,906 methylated CpGs with signi cant mQTLs and 37,453 independent mQTLs in the retina to those identi ed in ve different tissues: adipose (n = 119) 45 , blood (n = 614) 46 , endometrium (n = 66) 47 , brain frontal cortex (n = 526) 48 , and skeletal muscle (n = 282) 20 . We detected the highest representation of retina methylated CpGs in the frontal cortex (43%), followed by skeletal muscle (30%), blood (24%), endometrium (24%), and adipose (10%) (Fig. 1e). Our results re ect the shared neuronal nature of frontal cortex and retina. The retina mQTLs also show the highest overlap (22-30%) with mQTLs in skeletal muscle and frontal cortex (Fig. 1f). Interestingly, 13,458 mQTLs and 5,895 CpGs are unique to the retina (Supplementary Tables 2  and 6). Retina-speci c methylated CpGs are similarly distributed within and around the gene body, and least represented in exon boundaries and 3' UTR, as with all the methylated CpGs identi ed in the retina (Extended Data Fig. 3d). Retina-speci c mQTLs are also enriched in similar functional genomic elements as all retina mQTLs, aside for showing signi cantly stronger enrichment in splice donor variants compared to all retina mQTLs, as in case of retina eQTLs (Fig. 1c). Thus, retina-speci c mQTLs may be enriched for genetic effects on alternative isoform expression compared to all mQTLs. Gene ontology analysis of the retina-speci c mQTL target genes reveals an enrichment (FDR ≤ 0.05) in unique biological processes that include synaptogenesis and photoreceptor cell maintenance (Extended Data Fig. 3e, and Supplementary Table 7).

Identi cation of eQTMs in the human retina
We evaluated the association between DNAm of CpG sites in cis (± 1Mb) and genes expressed in the retina, considering 732,506 CpG sites and 18,263 genes. Top 10 genotype PCs, known covariates (e.g., AMD grade), and SVs capturing hidden covariates of expression, were included in the linear regression model (see Methods). We identi ed a total of 13,747 signi cant cis-eQTMs (FDR ≤ 0.05) in the retina, comprised of 10,585 unique CpGs (1.4% of tested CpGs) regulating 13,747 unique genes (75.2% of total genes); of which, 11,248 (82%) are protein-coding genes (Supplementary Table 8). Of the 13,747 eQTMs, 770 CpGs and 7,292 genes showed a signi cant mQTL and eQTL, respectively. All eQTMs were independent signals, and none had a secondary signal with our current sample size. Most CpGs with signi cant eQTMs resided proximal to the target gene's TSS with a median distance of 1.07 kb (Fig. 2a) and are most enriched (81%) within 1500 bp and 200 bp of TSSs, rst exon, 5' UTR, and the gene body ( Fig. 2b), similar to the CpGs with signi cant mQTLs. The eQTM target genes are enriched (FDR < 0.05) in mitochondrial and translation-related processes (Extended Data Fig. 4a, and Supplementary Table 9).
We further examined the direction of effect of CpG methylation on gene expression. A higher fraction of CpGs showed a canonical negative correlation with their target gene expression (54.5%) compared to a positive correlation (45.5%), as observed in other tissues 20,34,21 (Supplementary Table 8). For example, the CpG cg24846343 located in a gene body is negatively correlated with expression levels of Glutathione S-transferase, GSTT2B (Fig. 2c), and CpGs cg21653793 and cg10832655 located in a 5' UTR region are negatively correlated with the expression of cholesterol transporter ABCA1 and the neuron derived neurotrophic factor NDNF, respectively (Fig. 2d, and Extended Data Fig. 4b). On the contrary, CpG cg24307499 located in a gene body is positively correlated with the expression of NLRP2, an immune response regulator, and CpG cg04718426 located within 200 bp of the TSS is positively correlated with the zinc nger protein, ZNF232 ( Fig. 2e and Extended Data Fig. 4c).
Next, we inspected the distribution of number (fraction) of eQTMs with their target and all known genes across different chromosomes and uncovered a greater relative number of eQTMs on a few smaller chromosomes, e.g., chromosome 16, 17 and 19 (Fig. 2f). While on average eQTMs regulate a single gene, we noticed that 0.2% of CpGs regulated more than 10 genes in the retina and that a majority of these are on chromosomes 16 and 19 (Fig. 2g). Gene ontology analysis of these genes revealed an enrichment (FDR ≤ 0.05) of catabolic processes including autophagic mechanisms and proteasomal protein degradation (Extended Data Fig. 4d and Supplementary Table 10). The CpGs that regulate more than 10 genes and thereby have greater pleiotropic effects are clustered on p and q arms of both chromosomes 16 and 19 (Fig. 2g). Genes on chromosome 16 are enriched in regulation of telomere maintenance (Extended Data Fig. 4e and Supplementary Table 11) and on chromosome 19 in negative regulation of transcription by RNA polymerase II, RNA splicing via transesteri cation reactions, and positive regulation of cellular protein catabolic process (Extended Data Fig. 4f and Supplementary Table 12).
Causal or pleiotropic relationships between genetic regulation of DNAm and gene expression Variants that regulate CpG methylation (mQTLs) may in turn affect gene expression, and variants that regulate gene expression (eQTLs) may in uence CpG methylation; however, the extent to which these molecular mechanisms occur in the retina is not clear. We thus used Summary-data-based Mendelian Randomization (SMR) 37 to examine whether mQTLs underlie the causal mechanism or share the same causal variant (pleiotropy) with eQTLs in the retina or vice versa (Fig. 3a). We performed SMR analysis with retina eQTL and mQTL summary statistics considering DNAm as the exposure and gene expression as the outcome (represented as M2E_SMR) or gene expression as the exposure and DNAm as the outcome (represented as E2M_SMR). We further applied a heterogeneity in dependent instruments (HEIDI) test to differentiate a causal or pleiotropic model from a linkage model (Fig. 3b). In the E2M_SMR analysis, we identi ed 7,869 associations (SMR P-value < 9.30x10 − 7 after Bonferroni correction) (Supplementary Table 13); of these, 5,805 associations passed the linkage test (HEIDI P-value > 0.05), corresponding to 2,256 eQTLs, 5,612 mQTLs and 1,983 genes (Fig. 3c). In the M2E_SMR analysis, 9,307 associations (SMR P-value < 9.38x10 − 7 after Bonferroni correction) (Supplementary Table 14) were evident, of which 6,592 associations passed the linkage test (HEIDI P-value > 0.05) corresponding to 5,175 mQTLs, 5,012 eQTLs and 2,101 genes (Fig. 3d). Of the 6,592 M2E_SMR associations, 232 associations (3.5%) corresponding to 232 genes were also identi ed as signi cant eQTMs. We detected 554 common associations between E2M_SMR and M2E_SMR (Extended Data Fig. 5a), proposing a feedback mechanism between DNAm and gene expression regulation for a minority of the 513 mQTLs (1.3%) and 292 eQTLs (2.3%). A larger number of signi cant M2E_SMR associations compared to E2M_SMR suggests that the genetic effect of DNAm on gene expression is a more predominant mechanism than the genetic effect of gene expression on DNAm.
We further compared the genes identi ed in both E2M_SMR and M2E_SMR analysis and discerned 1,447 common genes that were common out of 1,983 E2M_SMR and 2,101 M2E_SMR genes. Notably, ten common genes were enriched in the glutathione metabolism pathway ( Fig. 3e and Extended Data Fig. 5b). The E2M_SMR and M2E_SMR shared genes were also enriched in immune related processes (such as antigen processing and presentation and autoimmune diseases) and metabolic processes (such as glycolysis-gluconeogenesis and metabolism of xenobiotics by cytochrome p450) (Fig. 3e and  Supplementary Table 15). For example, for glutathione S-transferase mu 1, GSTM1, we detected an association in E2M_SMR with CpG cg24506221 and SNP rs36209093 (SMR P-value < 8.53x10 − 13 and HEIDI P value > 0.05) and another association in M2E_SMR with SNP rs148490733 and CpG cg24506221 (SMR P-value < 4.60x10 − 11 and HEIDI P value > 0.05) (Fig. 3f). This causal relationship was further supported by the eQTM analysis, where the CpG cg24506221 located within 200 bp of the TSS of GSTM1 was negatively correlated with GSTM1 expression levels (Fig. 3g).
To provide additional support for a shared causal variant between retina mQTLs and eQTLs that co-occur along the genome, we applied Bayesian colocalization, using coloc 40 , to all signi cant eQTL and/or mQTL (EM) variants (Extended Data Fig. 5c). Using a stringent posterior probability of association (PPA) ≥ 0.8, we determined a signi cant colocalization for 9,417 variants and 2,423 genes (Supplementary Table 16). Target genes of the colocalizing EM signals were enriched in purine and pyrimidine metabolism pathways (FDR < 0.05) (Extended Data Fig. 5d). We further compared EM results from coloc with E2M_SMR and recognized 343 associations in common corresponding to 177 genes (Extended Data Fig. 5e and Supplementary Table 17).
We next used the Bayesian colocalization method eCAVIAR 39 to test whether retina mQTLs and eQTLs share one or more causal variants with known AMD GWAS loci. Signi cant colocalization would prioritize potential causal mQTLs/eQTLs and target CpGs or genes for AMD. Of 52 independent AMD GWAS variants at 34 loci, we identi ed signi cant colocalization (Colocalization posterior probability, CLPP > 0.01) of one or more mQTLs and/or eQTLs with 17 AMD GWAS variants in 10 loci (29.4% of loci) (Extended Data Fig. 6b). An average of 1.87 ± 0.52 causal genes per locus were proposed based on mQTLs, 1.4 ± 0.24 based on eQTLs, and 2.1 ± 0.50 causal genes based on mQTLs and/or eQTLs (Supplementary Table 20). A single causal gene was proposed for 5 (14.7%) of the loci (Supplementary  Table 20). Fifteen AMD GWAS variants in 8 (23.5%) loci colocalized with at least one retina mQTLs, and 6 AMD GWAS variants in 5 loci (14.7% of loci) colocalized with at least one retina eQTL (Supplementary Table 20). Only mQTLs colocalized with an AMD signal at 5 loci, with CFH as a target gene at the CFH locus (Extended Data Fig. 6c and 6d), a lincRNA LINC01004 at the KMT2E/SRPK2 locus ( Fig. 4f and 4g), ARHGAP21 and RNA5SP305 at the ARHGAP2 locus, RAD51B at the RAD51B locus, and MARK4 at the APOE(EXOC3L2/MARK4) locus. Two loci had only eQTLs colocalizing with the AMD signal, with SARM1 and TMEM199 target genes at the TMEM97/VTN locus and TMEM259 at the CNN2 locus. At the AMD locus rs147859257/rs2230199 on chromosome 19, both a mQTL (cg12024887 and cg07567260 that mapped to GRP108, MIR6791 and TRIP10) and an eQTL (acting on GRP108) colocalized with the AMD association signal, suggesting that the causal effect of the colocalizing mQTL in this AMD locus may be acting via altered GRP108 expression levels (Extended Data Fig. 6e, 6f and 6g). Eighteen of the colocalizing genes have not yet been identi ed as targets at corresponding AMD loci. The mQTL of CpG cg11712338 mapped to LINC01004 and colocalized with AMD at the KMT2E/SRPK2 locus (Fig. 4f). SMR analysis further validated this gene as a high con dence new AMD gene.

Colocalization analysis of retina mQTLs, eQTLs and GWAS identi es additional AMD genes
To determine shared causal variants between retina mQTLs or eQTLs and AMD and identify novel AMD genes, we applied the colocalization method coloc 40 to the following pairwise comparisons: AMD GWAS and mQTL (GM), AMD GWAS and eQTL (GE), and mQTL and eQTL (EM) (described above), considering all signi cant mQTLs/eQTLs genome-wide ( Fig. 5a and Methods). We discovered signi cant GM colocalization results for 69 mVariants and 81 CpG sites corresponding to 58 genes (PPA ≥ 0.8) at 11 AMD loci (Extended Data Fig. 7a Table 23). The genes identi ed in GM and GE were enriched (FDR ≤ 0.05) in immune-related processes, such as antigen processing and presentation (Extended Data Fig. 7b). The GM and GE signi cant colocalizations detected 42 and 47 new AMD associations, respectively. Considering these new AMD associations, we identi ed 36 mVariants and 43 CpG sites corresponding to 30 genes in GM and 29 eVariants and 24 genes in GE (Supplementary Tables 21 and 22).
To further identify variants associated with DNAm, gene expression and AMD, we applied the multi-trait colocalization method, moloc 41 , across AMD GWAS, eQTL and mQTL (GEM) signals (Fig. 5b) (see Methods). We obtained strong evidence (PPA ≥ 0.8) at 18 CpG sites and 10 genes where variations in DNAm, gene expression and AMD GWAS were attributed to the same variant (Table 1, Supplementary   Table 24). Seven of the signi cant colocalizing variants at 3 known AMD loci (C2/CFB/SKIV2L, PILRB/PILRA and CNN2) corresponded to 3 genes (Fig. 5c). SNP rs67538026 associated with methylation of CpG cg05475770, TMEM259 expression and AMD risk was identi ed at the CNN2 locus (Fig. 5d), and rs114820183 associated with cg22670819 methylation, DXO expression, and AMD risk at the C2-CFB-SKIV2L locus. DXO was also proposed as a potential AMD gene at this locus by SMR analysis of retina eQTL and AMD GWAS statistics. The moloc analysis also identi ed 7 colocalizing variants that affect DNAm and gene expression of 7 genes at 7 loci that are not reported AMD loci and are therefore new AMD associations (Supplementary  Table 24). These include the SNP rs11072507 associated with methylation of CpG cg21565421, MPI expression and AMD (Fig. 5e), and the SNP rs6493454 associated with CpG cg19881011 methylation, MTMR10 expression and AMD (Extended Data Fig. 7c). We compared CpGs and genes identi ed in the GM or EM coloc and GEM moloc analyses, and only detected an overlap of 6 CpGs in GM and GME, 4 genes (KASH5, EHD4, CPLX4 and SNX21) common in GEM and EM, and 5 genes (C8orf58, PDZD7, FN3KRP, SLC16A3, SCAF1) common in GE and EM (Extended Data Fig. 7d and 7e). We further overlapped CpGs and genes identi ed in the SMR analysis (E2M_SMR, M2E_SMR) and EM coloc analysis, and noted 770 CpGs and 544 genes in common in all 3 analyses ( Fig. 5f and 5g) (Supplementary Table 25). The CpGs were mapped to potential target genes (mGenes; see Methods), which showed enrichment in pyruvate metabolism and olfactory signaling pathway (Extended Data Fig. 7f).
Integrative analysis with Hi-C retina map improves target gene prioritization Finally, to prioritize a high con dence list of variants that contribute to DNAm and gene expression variation, we integrated loops, CREs and SEs from our recently published retina Hi-C data 33 Table 36). These analyses also identi ed high con dence causal links with Hi-C loops between retina eQTLs and mQTLs, and 1 eQTL/mQTL association with ALDH2 identi ed in E2M_SMR (Fig. 6c). Similarly, for M2E_SMR associations, we identi ed targets genes for 609 associations with CRE, 516 associations with SE, and 187 associations with loops (Supplementary Table 37). We determined 3 mQTL/eQTL associations with GSTP1 identi ed in M2E_SMR (Fig. 6d), and 4 eQTL/mQTL and mQTL/eQTL associations with EML1 noted in E2M_SMR and M2E_SMR (Extended Data Fig. 8b). For coloc EM colocalizations, we identi ed target genes for 790 colocalizations with CRE, 450 colocalizations with SE and 216 colocalizations with loops (Supplementary   Table 38). We also discovered a target gene for coloc EM colocalization in LPIN1 gene with Hi-C loops (Extended Data Fig. 8c).
Genes and pathways in uenced by DNA methylation and associated with AMD To obtain a non-redundant set of genes, we merged the target genes identi ed in mQTL and AMD GWAS analyses using SMR, eCAVIAR, coloc and moloc. We identi ed 4 target genes at 3 loci by SMR, 15 target genes at 8 loci using eCAVIAR, 58 target genes at 11 loci using coloc, and 10 target genes at 3 loci using moloc. By combining the target genes from the four methods, we identi ed a total of 87 non-redundant genes that are in uenced by DNAm (Supplementary Table 39; 50 of these genes are at the reported AMD GWAS loci and 37 are new target genes. These genes belonged to a range of biological processes including mTOR signaling and RHOF GTPase cycle (Reactome, gPro ler adj. P < 0.05), and regulation of actin cytoskeleton reorganization and glycosylation (Gene Ontology, GeneEnrich adj. P < 0.05).

Discussion
DNA methylation (DNAm) has regulatory roles during development, aging and disease 11 . The relationship between DNAm and gene expression is complex and in uences diverse biological pathways. DNAm can be mitotically inherited and can act as an intermediary molecular link between genetic variation and tissue-speci c transcriptional regulation, thereby providing a mechanism for long-term maintenance of cellular identity 10 . As most of the GWAS signals map to non-coding regions, their functional signi cance may be best explained by regulatory effects on common human traits and diseases. Non-coding genetic variation also contributes to the establishment of DNAm patterns, independently or together with environmental exposures 50 . In the present study, we have performed an integrative genetic, transcriptomic, and epigenetic analysis identifying genes and pathways that are regulated in human retina under normal conditions and/or may be in uenced by advancing age and environmental factors. We support this proposal by identifying mQTLs that may be causal to eQTLs and vice versa using Mendelian randomization and colocalization analyses. The strength of our study is that we applied two different complementary approaches to testing whether mQTLs, eQTLs and/or AMD associations share a common causal variant and whether their association is causal or pleiotropic as opposed to being confounded by linkage. The 87 genes and corresponding pathways, we identi ed here, are modulated by DNAm and may be suggestive of disease progression and pathology.
DNAm can have a dual interplay by being a cause and a consequence of gene expression changes 51 . We discovered genetic associations with DNAm for over 36,000 CpG sites that mapped to about 72.6% of retina-expressed genes. Notably, approximately two-third of these associations are likely unique to the retina and not observed in brain, muscle, blood or adipose tissue. We also identi ed genetic associations with gene expression for 9,395 (54%) of retina genes. Retina mQTLs were primarily detected in open chromatin regions, transcription factor (TF) binding sites, enhancers, and synonymous variants. In contrast, the retina eQTLs showed the highest enrichment in splicing regions, 5' and 3' UTR, TF binding sites, and other regulatory regions. Interestingly, the retina mQTLs and eQTLs tended to affect genes in divergent biological processes.
Using multiple algorithms, our maps of mQTLs and eQTLs integrated with AMD GWAS have provided detailed information about genotype changes that mediate DNAm and gene expression in the retina under non-disease conditions and in relation to AMD. Even though mQTLs identi ed here are limited by sample size compared to the published eQTL studies 29,30,31 , we have reported more target mGenes associated or colocalized with AMD loci, indicating a larger impact of epigenetic changes due to environmental factors and advanced age. Our work suggests that mQTLs uncover novel AMD-associated molecular mechanisms that have been missed by eQTLs. In addition, integration with retina Hi-C data has yielded supportive evidence of regulatory mechanisms and helped in determining high con dence target genes for mQTLs and eQTLs. For example, we ascertained 5 mQTLs and 1 eQTL affecting PARK7, which is involved in synaptic signaling and protecting neurons against oxidative stress and cell death 52 . We also recognized a CRE overlapping these 5 mQTLs using retina epigenetic marks. A previous study of retina from PARK7/DJ-1 de cient 3-and 6-month-old mice showed that knock-out of PARK7 leads to changes in electroretinogram (ERGs), which measures the electrical activity of retina in response to light stimulus and increased oxidative stress, suggesting a possible connection to aging and AMD 53 .
We characterized eQTMs for over 10,000 CpG sites, which account for 1.4% of all methylated CpGs yet map to 75% of genes expressed in the retina. Our eQTM analysis revealed that the majority of CpGs affecting gene expression are localized in the TSS region, as observed in other tissues 20,34,21 . Though the methylated CpGs in eQTMs are enriched in TSS regions, 10-fold more eQTMs targeted a distal gene rather than a proximal gene supported by Hi-C chromatin loops; these results suggest that many CpGs act as enhancers regulating the associated gene. Interestingly, CpG methylation has been commonly associated with downregulation of gene expression; nonetheless, almost 45% of eQTMs we identi ed are positively associated with gene expression, consistent with previous eQTM studies 20,34,21 . We also noted that some TFs prefer to bind to methylated CpGs, and methylation does not always equal repression 54  The assessment of sharing of a causal variant between expression and methylation QTLs using colocalization analysis assumes a single causal variant (coloc) and/or summary statistic-based Mendelian randomization (SMR) 20,21,34 . In addition to using coloc, we have applied colocalization methods that account for multiple causal variants (allelic heterogeneity) in QTL and GWAS loci (eCAVIAR and moloc), as well as SMR that tests whether co-occurring mQTL, eQTL and AMD associations are pleiotropic or induced by linkage. These analyses have enabled us to prioritize a comprehensive list of candidate mQTL/eQTL and AMD genes for follow-up functional studies. As every method is unique and has somewhat different underlying assumptions, we could only demonstrate commonality only for a few results. For example, we detected one common mQTL of CpG cg11712338 at KMT2E/SRPK2 locus between SMR associations and eCAVIAR colocalization results for mQTL and AMD GWAS. For pairs that are not common between colocalization and SMR analyses, hidden confounders may not be well captured by the covariates we adjusted for, or this could be due to the limited power of each method.
Clearly, since each of these methods have their limitations, experimental testing will be needed to con rm causality.
Newer chromatin interaction methods, developed in the last decade, have implicated long-range looping interactions between regulatory elements and promoters 65 . Integration of gene expression and AMD GWAS with a retina chromatin map was recently reported by our group 33 . Incorporation of retina Hi-C data into the current study allowed us to identify a high con dence set of target genes that are linked to mQTLs, eQTLs, and eQTMs based on the SMR, eCAVIAR, coloc and moloc methods. For example, we show a remarkable long-range interaction between the variant rs11065898 and the target gene ALDH2, whose association was evident in the E2M_SMR analysis. ALDH2 is upregulated in AMD retinas, indicating that it responds to changes in cellular redox status 66 . We also identi ed the target gene EML1 for the variant rs11625037 with retina loops, SE and CREs, whose associations were found in E2M_SMR and M2E_SMR analysis. EML1 is a microtubule-associated protein-like gene in which rare mutations are linked to Usher syndrome (USH), a group of genetic disorders manifesting congenital deafness, retinitis pigmentosa, and vestibular dysfunction 67 . We thus establish how integration of mQTLs/eQTLs and eQTMs with Hi-C data can further facilitate the prioritization of candidate genes for disease.
Our study represents the rst to integrate epigenome and transcriptome dysregulation in the retina with genetic risk of disease such as AMD. We have discovered multiple genes and CpG sites that show association and colocalization at a shared genetic variant with known AMD GWAS loci. Several of the associated target genes we identi ed have not yet reached genome-wide signi cance in the largest published AMD GWAS 23 , demonstrating the ability of the current QTL study to propose novel genes related to this disease. The addition of a retina chromatin map has provided further insights into genes and biological mechanisms driving the genetic associations. It is important to note that different omics datasets and methods we applied provide both complementary and con rmatory information for target prioritization. We are especially interested in gene expression and DNAm levels measured in the retina since epigenome is a reversible system that can be in uence by diverse environmental factors, such as smoking and diet 13 . We hypothesize that future studies of DNAm changes with age and AMD using a larger sample size can potentially uncover joint contributions of genetic variability, aging, and environment on AMD risk. Finally, incorporating additional QTL types of molecular traits, such as splicing QTL (sQTL), protein QTL (pQTL), chromatin accessibility QTL (caQTL), and histone modi cation QTL (hQTL) are expected to yield greater understanding of retinal function and disease pathology. MUC73416) and the University of Cologne (application nr. 14-247), respectively.

Datasets
At the LMU, whole globes were prepared as described before 71 . After cleaning of the whole eye in 0.9% NaCl solution, immersing in 5% polyvinylpyrrolidone-iodine, and rinsing again with NaCl solution, the globe was prepared and vitreous was removed. Subsequently, the retina was harvested with two sterile forceps and dissected at the optic nerve head. Thereafter, the retina was transferred to a 1.5 ml Eppendorf cup and stored at -80°C. Donor eyes from the eye bank of the University of Cologne were processed within 6 h of death and retinal dissection was performed. Retinal samples were then ash frozen in liquid nitrogen and stored at − 80°C until further processing. Genomic DNA was extracted using the salting-out method described elsewhere 72  Genotyping data processing and quality control for mQTL analysis Genotyping data for the mQTL analysis was taken from our two previously published studies, which mQTL tissue speci city To gain insight into the tissue speci city of the discovered mQTLs for peripheral retina, we examined mQTLs from adipose (n = 119) 45 , blood (n = 614) 46 , endometrium (n = 66) 47 , frontal cortex (n = 526) 48 and skeletal muscle (n = 282) 20 . For comparative analysis, we obtained the signi cant mQTLs (variant-CpG pairs) for adipose, blood, endometrium, and frontal cortex tissues from the corresponding studies. Since for skeletal muscle only the nominal results were provided without signi cance information, we considered mQTL (variant-CpG pairs) with P value < 0.05 as signi cant in our analysis.

RNA-seq data
Gene expression data was taken from our previously described published studies for 406 RNA-seq retina samples from the Bethesda study 29 and 152 RNA-seq retina samples from the Cologne and Munich study 31 .
Genotyping data processing and quality control for eQTL analysis Genotyping and 1000 Genomes Project-imputed data for the eQTL analysis was taken from our previously published studies of 406 samples 29 . Three samples were further removed due to high perchromosome missingness (> 10%) and an average missingness above 3% across all chromosomes.
Genotype PCA was performed on LD-pruned SNPs using Eigenstrat (v6. Gene expression ~ Sample site + MGS + sex + age The set of negative-control genes for supervised SVA were taken from our previously published RNA-seq study 29 . We identi ed 21 SVs, and these were further used as covariates in eQTL analysis.  Fig. 2b).
TORUS was run on all signi cant variant-gene pair summary statistics of all signi cant CpGs and eGenes at FDR < 5%. All mQTLs found in retina, and mQTLs that are speci c to retina compared to adipose, blood, endometrium, frontal cortex, and skeletal muscle were tested. The functional annotations analyzed where based on Ensembl's Variant Effect Predictor (VEP) and Loss-Of-Function Transcript Effect Estimator (LOFTEE) (VEP v100, GENCODE v34) annotations of all variants in the imputed genotype array VCF. Distinct "consequences" from the annotated VCF le were extracted using BCFtools (v1.11) and a presence/absence matrix (PAM) was created. Any annotations with less than 100 variants were removed (lower bound requirement of TORUS). The PAM matrix together with reformatted FastQTL 82 les were used as input to TORUS for the QTL functional enrichment analysis. TORUS computed a point estimate and 95% con dence intervals (CIs) per annotation and QTL type. Annotations with a lower bound 95% CI value above zero were considered signi cant. ggforestplot (v0.1.0) in R 3.5.0 was used to plot the enrichment results. Due to large CI variations, annotations (consequences) with less than 550 variants were removed from the forest plot for mQTLs and eQTLs. The proportion and number of mQTLs and eQTLs that fell within each functional annotation were computed.

eQTM analysis
We tested for associations between methylation of CpG sites and expression levels of genes whose TSS as within a ± 1Mb window of each CpG site. eQTM analysis was performed using a linear regression model implemented in R. We analyzed 152 retina samples used in the mQTL analysis with their available gene expression data from RNA-seq 29,31 . To correct for batch effects in the RNA-seq data for 152 samples and avoid confounding by AMD grade, surrogate variables (SVs) were identi ed and estimated for known batch effects and latent factors using supervised SVA method 79 (v3.38.0) and the following model: Gene expression ~ Sample_site + MGS + sex + age A list of negative control genes for the supervised SVA were taken from our previously published RNA-seq study 29 .We identi ed SVs that and were used as covariates in eQTM analysis. Gene ontology and pathway enrichment of mQTL, eQTM and eQTL target genes We used GeneEnrich 2 to identify gene ontology (GO) terms enriched for target genes of signi cant mQTL, eQTM and eQTL (FDR < 5%). GeneEnrich estimates an empirical gene set enrichment p-value for a list of signi cant genes in prede ned gene sets, based on permutation analysis of a background set of genes expressed in a given tissue that adjusts for potential biases that may arise from the genes expressed in the given tissue. Speci cally, the gene set enrichment P-value for a given gene set or pathway is computed as the fraction of 1,000-100,000 randomly sampled sets of genes from a background (null) list of genes, of equal size to that of the signi cant set of genes, which have the same or more signi cant hypergeometric probability than the observed probability of the signi cant genes. The null set of genes was de ned in this study as all genes expressed in peripheral retina that were tested for mQTLs, eQTMs and eQTLs, excluding the Summary data-based mendelian randomization (SMR) analysis of eQTL, mQTL and AMD GWAS We performed summary data-based mendelian randomization (SMR) analysis to test for causal or pleiotropic associations between eQTL and mQTL using eQTL summary statistics as the exposure and mQTL as the outcome (represented as E2M_SMR analysis), or between mQTL and eQTL using mQTL summary statistics as the exposure and eQTL summary data as the outcome (represented as M2E_SMR analysis) using package GSMR 37,38 (v1.1.1). We used the heterogeneity independent instruments (HEIDI) test and P-value threshold of greater than 0.05 to differentiate pleiotropy from linkage 37,25 . We also performed SMR and HEIDI analysis to test causal or pleiotropic associations between mQTL or eQTL and AMD GWAS. All analyses were applied to all signi cant independent variant-by-gene or variant-by-CpG site QTLs.

Colocalization analysis between retina mQTLs or eQTLs and known AMD GWAS loci
To identify retina mQTLs and/or eQTLs that might underlie the causal mechanisms of genetic associations with AMD in known GWAS loci, we applied a Bayesian-based colocalization method, eCAVIAR 39 to each of the 52 independent AMD GWAS variants in 34 loci and overlapping peripheral retina mQTLs and eQTLs. Only mQTL or eQTLs with at least 5 signi cant variants in the LD interval per GWAS locus (r 2 > 0.1 plus 50kb on either side of the lead GWAS variant) were tested. All variants that fell within the GWAS locus LD interval were analyzed. A colocalization posterior probability (CLPP) above 0.01 was considered signi cant based on prior simulations 39 .To minimize false positive results, we excluded signi cant colocalizing GWAS-QTL signals if the me/eQTL and GWAS variants with CLPP > 0.01 did not pass the following signi cance cutoffs: me/eQTL FDR < 0.05 or P < 10 − 4 and/or GWAS P < 10 − 5 .
Supplementary Table 5 presents a summary of the colocalization results for 52 AMD GWAS loci. We added the direction of effect of the mQTL or eQTL on AMD risk if signi cant colocalization was found.
Colocalization analysis between eQTL, mQTL and AMD GWAS loci using Moloc We performed Bayesian colocalization analysis to identify shared causal variants or haplotypes between co-occurring eQTL, mQTL, and AMD GWAS signals. To estimate the posterior probability that a lead variant is associated with two traits (GWAS and mQTL, GWAS and eQTL, mQTL and eQTL), coloc 40 (v5.1.1) was implemented using the R package. Next, to estimate posterior probability that a lead variant is associated with three traits (GWAS, eQTL and mQTL), moloc 41 (v0.10) was implemented using the R package. Posterior probability above 0.8 was considered as evidence that two and three trait associations share the same causal variant based on coloc and moloc, respectively.
Integration of HiC data for identi cation of target genes for mQTLs, eQTLs, eQTMs, variants and genes identi ed in SMR, eCAVIAR, coloc and moloc analyses The mQTLs were classi ed based on the location of the variants relative to the canonical TSS of the associated mGene based on Illumina annotation 78 . eQTLs were classi ed based on the location of the variants relative to the canonical TSS of the associated eGene based on Ensembl 85 version. eQTMs were classi ed based on the location of the CpGs relative to the canonical TSS of the associated target genes based on Ensembl 85 version. mQTL and eQTL variants located within ± 2.5 kb of the mGene/eGene TSS were identi ed as promoter and those located > 2.5 kb from the mGene/eGene were identi ed as distal mQTLs/eQTLs. The mQTLs/eQTLs variants overlapping a chromatin loop foot were classi ed based on the location contacted by opposite loop foot; variants in contact with their mGene/eGene promoter were identi ed as mGene/eGene promoter mQTLs/eQTLs, and variants in contact with a promoter other than the mGene/eGene as non-eGene promoter mQTLs/eQTLs. The mQTLs/eQTLs were checked for overlap with CREs and SEs and subsequently for overlap with the associated mGene/eGene. eQTMs located within ± 2.5 kb of the TSS of its target gene were identi ed as promoter eQTMs and while those located > 2.5 kb from the target gene were identi ed as distal eQTMs.

Declarations Data Availability
The data that support this study are available from the corresponding author upon reasonable request.
Dataset produced in this study are accessible in GEO under the accession number: GSE231536 (Methylation array). Gene expression data are from our previous study, under the accession GSE115828 27 .

Code availability
Publicly available software and packages were used throughout this study according to each developer's instructions. Custom scripts generated for the study will be available upon request. The following software was used and can be downloaded or accessed online: genomics.org/plink2/) R v3.5.0 and v4.0.0 (https://cran.r-project.org/), Washington University Epigenome browser: http://epigenomegateway.wustl.edu/browser/, qctool v2 (https://www.well.ox.ac.uk/~gav/qctool_v2/). Figure 1 Graphic summary of datasets generated, integrated and analyses performed in the present study and robust identi cation of retina mQTLs a: Schematic representation of our genetic, epigenetic, and transcriptomic datasets and methods used in the identi cation and integration of methylation quantitative trait loci (cis-mQTL), expression quantitative