Distinct CSF biomarker-associated DNA methylation in Alzheimer’s disease and cognitively normal subjects

Background Growing evidence has demonstrated that DNA methylation (DNAm) plays an important role in Alzheimer’s disease (AD) and that DNAm differences can be detected in the blood of AD subjects. Most studies have correlated blood DNAm with the clinical diagnosis of AD in living individuals. However, as the pathophysiological process of AD can begin many years before the onset of clinical symptoms, there is often disagreement between neuropathology in the brain and clinical phenotypes. Therefore, blood DNAm associated with AD neuropathology, rather than with clinical data, would provide more relevant information on AD pathogenesis. Methods We performed a comprehensive analysis to identify blood DNAm associated with cerebrospinal fluid (CSF) pathological biomarkers for AD. Our study included matched samples of whole blood DNA methylation, CSF Aβ42, phosphorylated tau181 (pTau181), and total tau (tTau) biomarkers data, measured on the same subjects and at the same clinical visits from a total of 202 subjects (123 CN or cognitively normal, 79 AD) in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort. To validate our findings, we also examined the association between premortem blood DNAm and postmortem brain neuropathology measured on a group of 69 subjects in the London dataset. Results We identified a number of novel associations between blood DNAm and CSF biomarkers, demonstrating that changes in pathological processes in the CSF are reflected in the blood epigenome. Overall, the CSF biomarker-associated DNAm is relatively distinct in CN and AD subjects, highlighting the importance of analyzing omics data measured on cognitively normal subjects (which includes preclinical AD subjects) to identify diagnostic biomarkers, and considering disease stages in the development and testing of AD treatment strategies. 
Moreover, our analysis revealed biological processes associated with early brain impairment relevant to AD are marked by DNAm in the blood, and blood DNAm at several CpGs in the DMR on HOXA5 gene are associated with pTau181 in the CSF, as well as tau-pathology and DNAm in the brain, nominating DNAm at this locus as a promising candidate AD biomarker. Conclusions Our study provides a valuable resource for future mechanistic and biomarker studies of DNAm in AD.


Introduction
We obtained information for CSF biomarkers (Aβ42, pTau181, and tTau), which were measured by Roche Elecsys immunoassay, from the "UPENNBIOMK9.CSV" file at the ADNI website (adni.loni.usc.edu). Standardized CSF biomarker values were computed by log (base 2) transformation followed by centering on the study means, as in previous analyses of CSF biomarkers 38,39.
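The standardization step can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name and the input values are hypothetical, and it assumes the study mean is taken on the log2 scale.

```python
import math
from statistics import mean


def standardize_csf(values):
    """Log2-transform raw biomarker values, then center them on the study mean.

    Assumption: the mean is computed on the log2 scale, across all subjects
    passed in (the text does not spell this out).
    """
    logged = [math.log2(v) for v in values]
    study_mean = mean(logged)
    return [x - study_mean for x in logged]


# Hypothetical raw biomarker values, for illustration only.
raw = [512.0, 1024.0, 2048.0]
standardized = standardize_csf(raw)
```

After centering, the standardized values average to zero, so a one-unit difference corresponds to a doubling of the raw biomarker level.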

Identification of CSF biomarker-associated CpGs
To assess the associations between CSF biomarkers (Aβ42, pTau181, and tTau) and DNA methylation, we fitted the following linear regression model (Model 1) to CN and AD samples separately: standardized CSF biomarker ~ methylation.beta + age + methylation plate + sex + APOE4 + years of education + smoking history + immune cell-type proportions (B, NK, CD4T, Mono, Gran).
We also compared the methylation-to-CSF biomarker associations in CN samples and AD samples by fitting the following model (Model 2) to the combined CN and AD samples: standardized CSF biomarker ~ methylation.beta + diagnosis + methylation.beta × diagnosis + age + methylation plate + sex + APOE4 + years of education + smoking history + immune cell-type proportions (B, NK, CD4T, Mono, Gran). A significant methylation.beta × diagnosis interaction effect corresponds to a significant difference between the methylation-to-CSF biomarker associations in the CN samples and the AD samples.
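Written as an equation, Model 2 makes the role of the interaction term explicit. The notation below is introduced here for illustration and is not from the original text: Y_i is the standardized CSF biomarker, M_i the methylation beta value, Z_i the vector of remaining covariates, and D_i the diagnosis, assumed to be coded 0 for CN and 1 for AD.

```latex
\[
Y_i = \beta_0 + \beta_1 M_i + \beta_2 D_i + \beta_3 \,(M_i \times D_i)
      + \boldsymbol{\gamma}^{\top}\mathbf{Z}_i + \varepsilon_i
\]
\[
\frac{\partial\, \mathbb{E}[Y_i]}{\partial M_i} =
\begin{cases}
\beta_1, & D_i = 0 \ (\text{CN}),\\
\beta_1 + \beta_3, & D_i = 1 \ (\text{AD}).
\end{cases}
\]
```

Under this coding, testing whether the methylation-to-CSF biomarker association differs between the two groups amounts to testing H_0: β_3 = 0.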

Inflation assessment and correction
We estimated genomic inflation factors (lambda values) using both the conventional approach 40 and the bacon method 41, which was specifically proposed for a more accurate assessment of inflation in EWAS. Supplementary Table 2 shows the estimated inflation and bias of the test statistics from Model 1 described above. Specifically, lambda values (λ) by the conventional approach ranged from 0.719 to 1.096, and lambdas based on the bacon approach (λ.bacon) ranged from 0.863 to 1.019. The estimated bias ranged from -0.097 to 0.117. Genomic correction using the bacon method 41, as implemented in the bacon R package, was then applied to each dataset to obtain bacon-corrected effect sizes, standard errors, and P-values, and thus a more accurate estimate of statistical significance. After bacon correction, the estimated bias ranged from -0.002 to 0.002, the estimated inflation factors ranged from λ = 0.967 to 1.042, and λ.bacon ranged from 0.974 to 1.000.
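For orientation, the conventional inflation factor can be sketched as below. This assumes the "conventional approach" is the usual median-based genomic control estimate (median observed 1-df chi-square statistic divided by its expected median under the null); the function name is hypothetical, and the Bayesian bacon estimate is not reproduced here.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median-based lambda from two-sided P-values (each must be in (0, 1]).

    Assumption: this is the standard genomic-control estimator; it is not
    taken from the study's own code.
    """
    std_normal = NormalDist()
    # A two-sided P-value p corresponds to a 1-df chi-square statistic z**2,
    # where z is the standard normal quantile at p / 2.
    chi_sq = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of a 1-df chi-square distribution under the null (about 0.455).
    expected_median = std_normal.inv_cdf(0.25) ** 2
    return median(chi_sq) / expected_median
```

A value near 1 indicates test statistics that behave as expected under the null; values above 1 indicate inflation and values below 1 indicate deflation.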
For each CSF biomarker, we considered CpGs significant at a 5% false discovery rate (FDR) as statistically significant. Given the modest number of samples with both DNA methylation and CSF biomarker measurements, we expected our analysis to be underpowered. Therefore, based on our experience and previous analyses of EWAS measured in blood 37,42,43, we also prioritized CpGs with suggestive significance at the pre-specified threshold of P-value < 1 × 10^-5.
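The two-tier selection can be sketched as follows. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment is an assumption here, as are the function names and the strict inequalities; the suggestive threshold of 1 × 10^-5 is the one stated elsewhere in this paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Assumption: BH is used for illustration; the study's FDR method is not
    specified in the text.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def classify(p_value, fdr, fdr_cut=0.05, suggestive_cut=1e-5):
    """Label a CpG by the two tiers described above (hypothetical helper)."""
    if fdr < fdr_cut:
        return "FDR-significant"
    if p_value < suggestive_cut:
        return "suggestive"
    return "not prioritized"


# Hypothetical P-values, for illustration only.
pvals = [0.01, 0.04, 0.03, 0.005]
fdrs = benjamini_hochberg(pvals)
```

In a genome-wide setting with hundreds of thousands of CpGs, a CpG can reach P-value < 1 × 10^-5 without passing 5% FDR, which is why the second tier is reported separately.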

Differentially methylated regions (DMR) analysis
To identify the differentially methylated regions associated with CSF biomarkers, we used the comb-p software 44. Briefly, comb-p takes single-CpG P-values and the locations of the CpG sites and scans the genome for regions enriched with a series of adjacent low P-values. In our analysis, we used the bacon-corrected P-values from Model 1 above as the input, with the parameter settings --seed 0.05 and --dist 750 (a P-value of 0.05 is required to start a region, and the region is extended if another P-value is within 750 base pairs), which were shown to have optimal statistical properties in our previous comprehensive assessment of the comb-p software 45. As comb-p uses the Sidak method to correct P-values for multiple comparisons, we considered DMRs with Sidak-adjusted P-value < 0.05 as significant. To further reduce false positives, we imposed two additional criteria in our final selection of DMRs: (1) the DMR has a nominal P-value < 1 × 10^-5; (2) all CpGs within the DMR have a consistent direction of change in the estimated effect sizes from Model 1 described above.
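The final DMR selection criteria can be expressed as a simple filter. The sketch below is illustrative only: the `Dmr` record, its field names, and the example regions are hypothetical and do not reflect comb-p's actual output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dmr:
    region: str                # e.g. "chr1:100-500" (hypothetical)
    nominal_p: float           # region-level nominal P-value
    sidak_p: float             # Sidak-adjusted P-value
    cpg_effects: List[float]   # Model 1 effect estimates of CpGs in the region


def consistent_direction(effects):
    """True when all effects share one sign; an exact zero counts as inconsistent."""
    return all(e > 0 for e in effects) or all(e < 0 for e in effects)


def keep_dmr(dmr):
    """Apply the three selection criteria described in the text."""
    return (
        dmr.sidak_p < 0.05
        and dmr.nominal_p < 1e-5
        and consistent_direction(dmr.cpg_effects)
    )


# Hypothetical regions, for illustration only.
candidates = [
    Dmr("chr1:100-500", 2e-7, 1e-3, [0.4, 0.2, 0.1]),   # passes all criteria
    Dmr("chr2:100-500", 2e-7, 1e-3, [0.4, -0.2]),       # mixed directions
    Dmr("chr3:100-500", 1e-4, 0.03, [-0.3, -0.1]),      # nominal P too large
]
selected = [d.region for d in candidates if keep_dmr(d)]
```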

Functional annotation of significant methylation associations
47 (https://www.engreitzlab.org/resources/). Specifically, we selected enhancer-gene pairs with "positive" predictions from the ABC model, which included only expressed target genes, did not include promoter elements, and had an ABC score higher than 0.015. In addition, we also required that the enhancer-gene pairs be identified in cell lines relevant to this study (https://github.com/TransBioInfoLab/AD-metaanalysis-blood/blob/main/code/annotations/).

Pathway analysis
To identify biological pathways enriched with CSF biomarker-associated DNA methylation, we used the methylRRA function in the methylGSA R package 48 (version 1.14.0). The pathway analyses were performed separately for each of the three CSF biomarkers, and the most significant of the 3 P-values (one for each CSF biomarker) was then selected as the final P-value for each pathway. In each analysis, we used the bacon-corrected P-values from Model 1 described above as the input for methylGSA. Briefly, methylGSA first computes a gene-wise value by aggregating P-values from multiple CpGs mapped to each gene.

Correlation and overlap with genetic susceptibility loci
We searched mQTLs using the GoDMC database 53, which was downloaded from http://mqtldb.godmc.org.uk/downloads. To select significant blood mQTLs in GoDMC, we used the same criteria as the original study 53, that is, considering a cis P-value smaller than 10^-8 and a trans P-value smaller than 10^-14 as significant. The 24 LD blocks of genetic variants reaching genome-wide significance were obtained from Supplementary Table 8

Sensitivity analysis
Immune cell-type proportions were estimated using the IDOL algorithm 55, as implemented in the estimateCellCounts2 function in the R package FlowSorted.Blood.EPIC. We then fitted the same linear models described in "Identification of CSF biomarker-associated CpGs" above, except that the cell-type proportions estimated by the EpiDISH method were replaced with those estimated by the IDOL algorithm.

Validation analysis using an independent dataset
The London dataset 7,56, which consists of DNAm measured on premortem whole blood samples from 69 subjects, along with their postmortem neurofibrillary tangle burden as measured by AD Braak stage 57, as well as DNAm measured on the brain prefrontal cortex at autopsy, was downloaded from the GEO database (accession number: GSE29685). The blood and brain DNAm samples from the London dataset were pre-processed in the same way as described above. Given the relatively modest number of samples at some of the Braak stages, we modeled the Braak stage as a binary variable, with absent/low (Braak scores 0-2) vs. intermediate/high (Braak scores 3-6) neurofibrillary tangle tau pathology, as previously described 28. Specifically, to test the association between premortem blood DNAm and postmortem AD Braak stage, we fitted the model methylation M value ~ Braak stage (absent/low vs. intermediate/high) + sex + age at blood draw + batch. In the London dataset, none of the estimated blood cell-type proportions were significantly associated with the Braak stage (Supplementary Fig. 1), so they were unlikely to be confounding factors; therefore, we did not include them in the above linear model. To assess concordance between brain and blood DNAm at each CpG within the DMR located on the HOXA5 gene, we computed Spearman correlations.
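The two variable codings used in the validation model can be sketched as below. The function names are hypothetical. The M value formula is the standard logit (base 2) transformation of the beta value and is stated here as background, since the text does not define it.

```python
import math


def m_value(beta):
    """Standard M value: log2(beta / (1 - beta)), for 0 < beta < 1."""
    return math.log2(beta / (1.0 - beta))


def braak_group(stage):
    """Dichotomize Braak stage: 0 = absent/low (0-2), 1 = intermediate/high (3-6)."""
    if not 0 <= stage <= 6:
        raise ValueError("Braak stage must be between 0 and 6")
    return 0 if stage <= 2 else 1


# Hypothetical Braak stages, for illustration only.
groups = [braak_group(s) for s in [0, 2, 3, 6]]
```

A beta value of 0.5 maps to an M value of 0, and M values are unbounded, which makes them better suited than beta values to serve as the outcome of a linear model.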

Sample characteristics
To identify DNA methylation associated with CSF biomarkers, we studied matched whole blood DNA methylation, CSF Aβ42, phosphorylated tau181 (pTau181), and total tau (tTau) biomarker data measured on the same subjects and at the same clinical visits in the ADNI study 31,37. Our study included samples from a total of 202 subjects (123 cognitively normal, 79 AD cases). Table 1 shows the demographic information of these subjects. There were no significant differences in age, sex, smoking history, and educational attainment between the cognitively normal (CN) and AD subjects. Overall, the majority of the subjects were in their seventies (with an average age of 76.6) and highly educated (with an average of 16 years of education), and fewer than half of the subjects had smoked. Compared to CN subjects, the AD subjects had a higher proportion of APOE4 carriers (71% in AD vs. 25% in CN). Moreover, CSF Aβ42 levels were significantly lower in AD subjects, while CSF pTau181 and tTau levels were significantly higher in AD subjects. Finally, Mini-Mental State Examination (MMSE) scores were significantly lower in AD subjects (an average of 22 points in AD vs. an average of 29 points in CN), indicating more cognitive dysfunction.

DNA methylation in the blood is significantly associated with CSF biomarkers at individual CpGs and genomic regions
To identify DNAm differences associated with CSF biomarkers at different stages of the disease, we analyzed CN and AD samples separately. Supplementary Table 3 presents a summary of the significant CpGs and DMRs. In CN samples, after adjusting for covariate variables (age, sex, batch effects, years of education, number of APOE4 alleles, smoking history, immune cell-type proportions) and correcting for genomic inflation in each dataset, we identified 1 CpG, cg06171420, located in the vicinity of the PCBP3 gene, that was significantly associated with CSF levels of total tau (tTau) at 5% false discovery rate (FDR) (Supplementary Table 4). At P-value < 1 × 10^-5, we identified an additional 34, 15, and 11 CpGs significantly associated with CSF Aβ42, pTau181, and tTau levels, respectively ( Fig. 2) might be due to CSF Aβ42 reduction occurring earlier in the disease process, and thus is associated with more pervasive epigenetic effects.
Among these 198 significant CSF biomarker-associated CpGs in either CN or AD samples, the majority (61% or 120 CpGs) were negatively associated with increased levels of AD biomarkers; about two-thirds were located in distal regions of genes (65% or 129 CpGs); about half (51% or 100 CpGs) were located in CpG islands or shores; and only about a third were located in gene promoter regions (Supplementary Tables 4-6).

Blood DNAm associated with CSF biomarkers differed between diagnosis groups
Overall, we found the DNAm associated with CSF biomarkers were relatively distinct across diagnosis groups. Specifically, there was no overlap between the significant CpGs in AD samples and CN samples (Supplementary Fig. 2). Among the 184 DMRs that were significant in either CN or AD sample analysis (Supplementary Table 3

Pathway analysis revealed DNA methylation associated with CSF biomarkers is enriched in a number of biological pathways in cognitively normal and AD subjects
To better understand the biological pathways enriched with significant CSF biomarker-associated DNA methylation, we next performed pathway analysis using the methylGSA software 48. At 25% FDR (Methods), a total of 89 and 13 pathways were significant in CN and AD samples, respectively (Supplementary Table 10). Among them, 3 pathways (calcium signaling pathway, regulation of actin cytoskeleton, neuroactive ligand-receptor interaction) also reached 5% FDR in CN samples, and 2 pathways (cardiac conduction and muscle contraction) also reached 5% FDR in AD samples.
We next examined the overlap between significant pathways identified in CN samples and AD samples. Among the 95 pathways that reached 25% FDR in either CN or AD samples, only 7 pathways (7.4%) were significant in both groups (Supplementary Table 10). These seven pathways are regulation of actin cytoskeleton, neuroactive ligand-receptor interaction, ubiquitin mediated proteolysis, Wnt signaling pathway, MAPK signaling pathway, cardiac conduction, and muscle contraction. We also found pathway enrichment of the significant CSF biomarker-associated CpGs to be independent in CN samples and AD samples (Supplementary Fig. 5). These pathway analysis results are consistent with those described above for individual CpGs, in which we observed little correlation between the estimated effect sizes of CpG-to-CSF biomarker associations in CN and in AD.

Correlation of DNA methylation at significant CSF biomarker-associated CpGs and DMRs with expression of nearby genes
To prioritize significant DNAm with downstream functional effects, we next correlated DNA methylation levels of the significant DMRs or CpGs with the expression levels of genes found in their vicinity, using matched DNAm and gene expression samples generated from 263 independent subjects (84 AD cases and 179 CN) in the ADNI cohort. In CN subjects, after removing the effects of covariate variables from DNA methylation and gene expression levels separately (Methods), at 5% FDR, we found that DNAm at 2 CpGs and 6 DMRs were significantly associated with target gene expression levels (Supplementary Table 11). Interestingly, aside from 1 CpG (cg14074117) located in an intergenic region, all CpGs and DMRs were negatively associated with target gene expression. Among them, 3 DMRs were located in gene promoter regions and negatively associated with expression levels of the target genes GSTM5, CAT, and CRISP2. GSTM5 belongs to the Glutathione S-Transferase family of genes, which encode enzymes associated with oxidative stress in neurodegenerative diseases 59,60. Recently, GSTM5 was observed to be significantly downregulated in primary visual cortex brain tissues, an area that is mildly affected by tau pathology and corresponds to the "early" AD transcriptome 61. This previous finding is consistent with our result that DNAm increases with pTau181 and tTau levels and is negatively associated with expression of the target gene. Similarly, the CAT gene encodes catalase, another key antioxidative enzyme that mitigates oxidative stress 62. Defects in catalase have been implicated in a number of neurological disorders, including AD 63.
On the other hand, in AD samples, we found that DNAm at 5 CpGs and 5 DMRs were significantly associated with target gene expression levels. Half of these DNAm (4 CpGs and 1 DMR) had a negative correlation with target gene expression. Two DMRs, located in the promoter region of the TNNT1 gene, were positively associated with the expression level of the TNNT1 gene, which was shown to be a marker of central nervous system molecular stress associated with neuropsychiatric diseases 64. Our results are consistent with previous observations that DNAm at some promoter regions is correlated with increased target gene expression 65-68. While promoter methylation is traditionally thought to be associated with transcriptional silencing by blocking the binding of transcription factors (TFs), which are proteins that bind DNA to facilitate the transcription of DNA into RNA, recent studies suggest more complex patterns of protein-DNA interaction associated with the DNA methylome 69,70. In particular, several studies observed that the binding and activity of some TFs are enhanced by CpG methylation to activate gene expression 71-73. In addition, the positive promoter DNAm to target gene association could also be due to a coregulatory phenomenon in which both DNAm and the target gene are altered by proteins associated with TFs 53,69,74,75.

Correlation and overlap with genetic susceptibility loci
To identify methylation quantitative trait loci (mQTLs) for the significant DMRs and CpGs, we next performed look-up analyses using the GoDMC database 53 Table 14). Our comparison of the mQTLs with CSF biomarker-associated genetic loci 38 did not identify any overlapping variants. These results suggested that the majority of the CSF biomarker-associated CpGs, by and large, are not influenced by genetic variants at the GWAS loci for AD or AD biomarkers.
Therefore, even though a substantial proportion of the CpGs are influenced by genetic variants, we found no evidence that genetic variations might be confounding variables in our DNAm to CSF biomarker associations, because these genetic variations are not significantly associated with AD or AD biomarkers.
Finally, we also evaluated whether our significant methylation loci overlapped with the genetic risk loci associated with AD diagnosis 54 or CSF AD biomarkers 38. However, we found no overlap between the significant DNAm discovered in this study and the genetic risk loci associated with AD diagnosis or CSF AD biomarkers. This result is consistent with a previous study that also found no evidence of overlap between significant EWAS loci and GWAS loci in a meta-analysis of 11 blood-based EWAS of neurodegenerative disorders 36. The lack of commonality between genetic and epigenetic loci in AD supports previous findings that DNA methylation and genetic variants play relatively independent roles in AD 4,76.

Sensitivity analysis
We performed an additional analysis to evaluate the robustness of the DNAm to CSF biomarker associations with regard to different methods for estimating cell-type proportions. To this end, we estimated immune cell-type proportions using an alternative method, the IDOL algorithm described in Salas et al. (2018) 55.
Our results show that the cell-type proportions estimated by the IDOL method and by the EpiDISH method 35 used in our primary analyses are highly concordant (Supplementary Fig. 6). Next, we repeated our DNAm to CSF biomarker association analyses, adjusting for the cell-type proportions estimated by IDOL. The resulting blood DNAm to CSF biomarker associations are largely congruent with our primary analysis results. In particular, the Aβ42-associated CpGs and pTau181-associated CpGs remained highly significant, with P-values ranging from 1.10 × 10^-10 to 1.81 ×

Validation analysis using an independent dataset
To validate our findings, we also studied DNAm associated with brain pathology in an independent dataset. To this end, we analyzed DNAm measured on premortem blood samples from 69 subjects, along with their postmortem neurofibrillary tangle burden in the brain prefrontal cortex determined at autopsy, as measured by AD Braak stage 57 in the London dataset 7,56. At a nominal P-value less than 0.05, a number of CSF biomarker-associated CpGs and DMRs that we identified in the ADNI dataset are also significantly associated with the Braak stage in the London dataset (Supplementary Tables 17-18). These DNAm are located at the ERO1LB, MBTPS1, HOXA5, TRIM15, TYW3, MME, HMSD, CHAD, and SEMA3C genes, and in intergenic regions. Note that because CSF Aβ42 decreases and brain tau pathology increases in AD subjects, we selected CpGs or DMRs with opposite directions in the blood DNAm-to-CSF Aβ42 and blood DNAm-to-Braak stage associations.
After correcting for multiple comparisons, at Sidak-adjusted P-value less than 0.05, we observed that blood DNAm at two DMRs, located on the HOXA5 and CHAD genes, were significantly associated with AD Braak stage in the London dataset and overlapped with CSF pTau181- or Aβ42-associated DMRs in the ADNI dataset. Of particular interest is the strong replication signal located in the promoter region of the HOXA5 gene. In the ADNI (discovery) dataset, blood DNAm at DMR chr7:27183946-27184668 is significantly associated with CSF pTau181 (P-value = 1.06 × 10^-6, Sidak-adjusted P-value = 1.07 × 10^-3); in the London (replication) dataset, blood DNAm at this locus (at DMR chr7:27183133-27184451) is also significantly associated with Braak stage in the brain (P-value = 7.27 × 10^-20, Sidak-adjusted P-value = 2.49 × 10^-17) (Supplementary Table 18). Previously, Smith et al. (2018) also observed hypermethylation across the HOXA gene cluster in the brain that was significantly associated with AD Braak stage in the Mt. Sinai, London, and ROSMAP brain datasets 8. Intriguingly, we also observed significant correlations between brain and blood DNAm at 7 CpGs located within the DMR (Supplementary Fig. 7), as well as a significant association between the DMR and target gene expression (Supplementary Fig. 8).
Together, these results suggested the DMR at HOXA5 is a promising biomarker robustly associated with tau-pathology in both brain and the blood.

Discussion
In this study, we analyzed samples from the CN and AD subjects separately, as we reasoned that the CSF biomarker-associated DNAm discovered in CN samples would most likely be associated with AD risk; in contrast, after the onset of disease, the CSF biomarker-associated DNAm in AD samples would most likely be associated with both AD risk and changes caused by AD pathologies that accumulate in the brain. Supporting this premise, we found that the significant DNAm identified in AD and CN samples were largely distinct (Supplementary Fig. 2). There was also little correlation between DNAm-to-AD biomarker associations in the two groups of subjects, both at the level of CpGs (Supplementary Fig. 4) and at the level of pathways (Supplementary Fig. 5). These results suggest that the epigenetics associated with different pathological processes vary between cognitively normal subjects (some of whom might later proceed to develop AD) and AD patients, supporting the recommendation of considering the patients' disease stage in developing treatment strategies 77,78.
Our comprehensive analyses identified a number of DNAm differences significantly associated with the CSF biomarkers Aβ42, pTau181, and tTau, many of which were associated with genes previously implicated in AD pathogenesis. Specifically, in the analysis of CN subjects, we identified 1 CpG (cg06171420), mapped to around 5 kb upstream of the PCBP3 gene, that was significantly associated with tTau at 5% FDR (Supplementary Table 4, Supplementary Fig. 9). The PCBP3 gene encodes the RNA-binding protein hnRNPE3 (poly(rC) binding protein 3), which regulates alternative splicing of the tau gene 79,80. In Down Syndrome, AD, and other neurodegenerative diseases, an abnormal ratio of tau protein isoforms often results in aggregated tau, a major component of neurofibrillary tangles. In the region-based analysis, the most significant CSF Aβ42-associated DMR is located in the promoter of the THRB gene (Supplementary Fig. 10), which encodes a receptor for the thyroid hormone, previously observed to be dysregulated in AD subjects 81-83.
In AD subjects, we identified significantly more DNA methylation associated with the CSF biomarkers; a total of 112, 4, and 3 CpGs reached 5% FDR in their association with Aβ42, pTau181, and tTau, respectively.
Among the top 10 most significant CpGs associated with Aβ42 (Table 2), cg24037493 maps to the promoter of the SFXN1 gene and is significantly associated with CSF Aβ42 in AD subjects (Supplementary Fig. 11). SFXN1 encodes the mitochondrial serine transporter, which helps to maintain mitochondrial iron homeostasis 84. It has been observed that iron accumulates in the brains of AD subjects and that iron levels correlate significantly with cognitive decline 85-87. Similarly, among the top 10 most significant pTau181- and tTau-associated CpGs (Table 3), cg03037740 maps to the promoter of the RING1 gene and is significantly associated with CSF pTau181 (Supplementary Fig. 12). RING1 encodes a protein that interacts with the polycomb protein BMI1, which plays a critical role in AD pathogenesis. Remarkably, it has been demonstrated that reduced expression of BMI1 protein alone is sufficient to induce both amyloid and tau pathologies in both cellular and animal models 88,89. The most significant promoter DMR associated with Aβ42 is located at the TMEM204 gene (Supplementary Fig. 13), which encodes a transmembrane protein that functions as a cell surface marker for infiltrating microglia in the CNS during neuroinflammation 90. Similarly, the most significant promoter DMR associated with pTau181 is located at the FBP1 gene (Supplementary Fig. 14), which encodes an enzyme that regulates glucose and energy metabolism. It has been observed that the expression levels of FBP1 are reduced in the brains of patients at risk for AD 91,92, consistent with our observed hypermethylation at the promoter of the FBP1 gene in samples with increased levels of pTau181. Taken together, these results demonstrate that our analysis nominated biologically meaningful DNA methylation loci in the blood associated with AD and, importantly, that changes in the different pathological processes in the CSF, both before and after the clinical diagnosis of AD, are reflected in the epigenome.
In CN samples, interestingly, among the most significant pathways enriched with significant CpGs is the KEGG pathway "Alzheimer's disease", which was curated based on recent AD literature and includes genes that confer AD risk, such as APOE, PSENEN, MAPT, CALM3, MME, and others. Also, in CN samples, the most significant pathway is the calcium signaling pathway (P-value = 2.39 × 10^-4, FDR = 9.09 × 10^-3), consistent with the calcium hypothesis of AD, which posits that dysregulated neuronal calcium homeostasis induces impaired synaptic plasticity and defective neurotransmission, promotes accumulation of Aβ and tau proteins, and subsequently leads to neuronal apoptosis in the brain 98,99. Moreover, increased levels of free intracellular calcium have also been observed in normal aging, the strongest risk factor for AD 100,101. The second most significant pathway is the regulation of actin cytoskeleton (P-value = 1.61 × 10^-3, FDR = 2.51 × 10^-2), consistent with the observation that synapse degeneration is a key early feature of AD pathogenesis 102,103, and that stability of the actin cytoskeleton is crucial for maintaining the functional integrity of the dendritic spines at sites of neurotransmission in the brain 104. These results suggest that some of the brain impairment during the early (i.e., preclinical) stages of the disease is also reflected in the blood epigenome.
Although the majority of the CSF biomarker-associated DNAm differed in CN and AD samples, our analyses also identified a small number of DMRs that were significantly associated with CSF biomarkers in both groups (Supplementary Fig. 2), which could serve as candidate biomarkers in future studies of AD progression. Specifically, three DMRs, all of which were associated with Aβ42, reached Sidak-adjusted P-value < 0.05 in both CN and AD sample analyses. The first DMR, chr15:69744390-69744763, is located at the promoter of the RPLP1 gene, which encodes a subunit protein of the ribosome. Defective ribosomal function is associated with decreased capacity for protein synthesis and a reduced number of synapses, and has been observed as an early feature of AD preceding neuronal loss 105,106. Another noteworthy result is two overlapping DMRs significantly associated with CSF Aβ42, at chr6:30130819-30131284 in AD samples and chr6:30130819-30131362 in CN samples, both located in the promoter of the TRIM15 gene, which encodes a member of the TRIM protein family involved in the ubiquitin system that is responsible for degrading misfolded protein aggregates and plays important roles in neurodegenerative diseases 107,108.
To validate our findings, we studied premortem blood DNAm associated with postmortem Braak stage measured on prefrontal cortex samples in an independent dataset, previously described as the London dataset 7. Encouragingly, we found that a number of CSF biomarker-associated blood DNAm also correlated significantly with the Braak stage, which corresponds to neurofibrillary tangle tau pathology burden in the brain (Supplementary Tables 17-18). In the London dataset, we observed a strong blood DNAm to Braak stage association signal located at a DMR in the promoter region of the HOXA5 gene. Interestingly, this locus also showed a significant association with CSF pTau181 in the ADNI dataset (Supplementary Table 18, Supplementary Fig. 15). Moreover, we also observed a significant correlation between brain DNAm and blood DNAm at a subset of 7 CpGs within the DMR (Supplementary Fig. 7), as well as a significant association between the DMR and downstream target gene expression (Supplementary Fig. 8). Consistent with previous studies, which discovered extensive hypermethylation in the brain at the HOXA gene cluster significantly associated with tau neuropathology 7, our study provided strong evidence that these hypermethylated CpGs can also be observed in the blood epigenome and are significantly associated with pTau181 levels in the CSF (Supplementary Table 18). Taken together, these results nominate hypermethylation at the HOXA5 locus in the blood as a plausible biomarker for tau pathology.
On the other hand, given that brain and blood cells originate from different developmental cell lineages, previous studies also suggested that DNA methylation profiles are, by and large, distinct between brain and blood 7,17,109. Consistent with these previous results, our comparison of the blood DNAm from this study with brain DNAm associated with AD pathology in two large recent meta-analyses of postmortem brain tissues 9,110 shows that only a few DNAm (3 CpGs and 8 DMRs), mapped to the PRSSL1, LINGO3, SPRED2, HOXA2, NR2F1, CPT1B, HOXA5, and ZFPM1 genes and to intergenic regions, were significant in both the blood DNAm-to-CSF Aβ42/pTau181 associations and the brain DNAm-to-brain Aβ/tau associations (Supplementary Tables 4-9). Also, there is no overlap between blood DNAm associated with the CSF AD biomarkers and blood DNAm associated with clinical AD from our previous meta-analyses of two large clinical AD datasets 17,111. This is not surprising, given the disconnection between brain pathology and clinical diagnosis in AD; it has been observed that a substantial proportion of cognitively normal subjects also have AD pathology in the brain 20,21.
This study has several limitations. First, we analyzed the methylation levels measured on whole blood, which contains a complex mixture of cell types. To reduce confounding effects due to different cell types, we included estimated cell-type proportions as covariates in all our analyses. Future studies that utilize single-cell technology for gene expression and DNAm could improve power and shed more light on the particular cell types affected by the DNAm loci discovered in this study. Second, to study DNAm associated with CSF biomarkers in subjects at different stages of the disease (i.e., preclinical or clinical), we separately analyzed samples from cognitively normal and AD subjects, which reduced the sample sizes of the analysis datasets considerably. Given the modest sample size, we pre-defined a more liberal significance threshold (i.e., P-value < 10⁻⁵) based on previous analyses of blood DNA methylation data 17,37,43,112, to select a small number of loci that were then further prioritized using additional integrative analyses. Future studies with larger sample sizes are needed to identify and replicate DNAm loci at more stringent significance thresholds. Third, we did not consider MCI subjects in this study because there is considerable heterogeneity among MCI subjects, with subjects converting to AD along different trajectories 113. As ADNI is currently conducting additional phases of the study, future analyses with a larger sample size will make it possible to detect DNA methylation-to-CSF AD biomarker associations in different subgroups of MCI subjects. Fourth, although women make up about two-thirds of AD patients in the general U.S. population 1, our study cohort (which had both CSF biomarkers and blood DNAm available in ADNI) had a lower proportion of females in the AD group than in the CN group (37% vs. 51%) (Table 1).
Therefore, our study cohort may not represent a random sample from the general population. In all our analyses, we adjusted for sex in addition to other covariates, so the DNAm-to-CSF biomarker associations we identified are independent of sex. Large and diverse community-based cohort studies that validate our findings are needed. Fifth, as recent autopsy studies revealed that about a quarter of CN subjects also show AD neuropathology in the brain 20,21, the CSF biomarker-associated methylation we observed in CN subjects could potentially mark an early feature of AD that precedes clinical diagnosis. Future studies that develop DNAm-based prediction models for diagnosing AD and compare their performance with state-of-the-art plasma biomarkers of AD are needed. Finally, the associations we identified do not necessarily reflect causal relationships. Future studies are needed to establish the causality of the nominated DNA methylation markers.
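The covariate adjustment mentioned in the limitations above (cell-type proportions, sex) can be illustrated with a small regression sketch. The Python code below fits an ordinary least squares model of a biomarker on methylation plus two covariates. It is only an illustration under stated assumptions: the toy data are simulated, the variable and function names are hypothetical, ordinary least squares is used for simplicity, and no standard errors or P-values are computed. It does not reproduce the models in the study's analysis scripts.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest absolute entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def ols_coefficients(y: Sequence[float], columns: Sequence[Sequence[float]]) -> List[float]:
    """Fit y ~ intercept + columns by least squares via the normal equations.

    Returns [intercept, coefficient of columns[0], coefficient of columns[1], ...].
    """
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Simulated toy data (hypothetical): one CpG, sex, and one cell-type proportion.
methylation = [0.1, 0.4, 0.35, 0.8, 0.6, 0.2]
sex = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
cell_proportion = [0.5, 0.6, 0.55, 0.7, 0.52, 0.66]
# The outcome is generated with a known methylation effect of 3.0 and no noise.
biomarker = [
    2.0 + 3.0 * m + 1.0 * s - 4.0 * c
    for m, s, c in zip(methylation, sex, cell_proportion)
]

coefs = ols_coefficients(biomarker, [methylation, sex, cell_proportion])
methylation_effect = coefs[1]  # effect of methylation adjusted for sex and cell type
print(round(methylation_effect, 6))
```

Because the covariates enter the same model as methylation, the methylation coefficient describes the association that remains after sex and cell-type composition are accounted for.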

Conclusions
In this study, we leveraged AD biomarkers as quantitative outcomes to identify DNAm associated with various AD pathologies. Our study found a number of novel associations between blood DNAm and CSF Aβ42, phosphorylated tau181, and total tau, which are proxy biomarkers of AD pathophysiology, demonstrating that changes in various pathological processes in the CSF are reflected in the blood epigenome. Overall, the CSF biomarker-associated DNA methylome is relatively distinct in CN and AD subjects, highlighting the importance of analyzing omics data measured on cognitively normal subjects (which include preclinical AD subjects) to identify diagnostic biomarkers, and of considering disease stages in the development and testing of AD treatment strategies. Our analysis of blood samples from cognitively normal subjects pointed to a number of potential therapeutic targets relevant to the treatment of AD, such as calcium channel blockers associated with the calcium signaling pathway 98, and spine-stabilizing therapy associated with regulation of the actin cytoskeleton 104. Moreover, we found that blood DNAm at several CpGs in the DMR on the HOXA5 gene is associated not only with CSF pTau181, but also with tau pathology in the brain, as well as with brain DNAm at the same locus in an independent dataset, nominating DNAm at this locus as a promising candidate AD biomarker. In summary, our study provides a valuable resource for future mechanistic and biomarker studies in AD.

Declarations
Ethics approval and consent to participate Not Applicable

Not Applicable
Availability of data and materials The ADNI data can be accessed from http://adni.loni.usc.edu. The scripts for the analysis performed in this study can be accessed at https://github.com/TransBioInfoLab/AD-ATN-biomarkers-and-DNAm.

Tables

Table 1
Sample characteristics of the study dataset.

Table 2
Top 10 most significant CpGs associated with CSF Aβ42 in cognitively normal (CN) and Alzheimer's disease (AD) subjects. Annotations include the location of the CpG based on hg19/GRCh37 genomic annotation (chr, position) and nearby genes based on GREAT (GREAT_annotation). Regression analysis results for CpG-to-CSF Aβ42 association include effect estimate, standard error (se), and P-values after inflation correction using the bacon method (PMID: 28129774). Highlighted in red are gene promoter regions mapped to significant CpGs.

Table 4
Top 10 most significant DMRs associated with CSF Aβ42 in cognitively normal (CN) and Alzheimer's disease (AD) subjects. For each DMR, annotations include the location of the DMR based on hg19/GRCh37 genomic annotation (chr, start, end) and nearby genes based on GREAT (GREAT_annotation). Direction indicates a positive or negative association between DNA methylation at a CpG located within the DMR and CSF biomarker. Highlighted in red are gene promoter regions mapped to significant DMRs.
Table 5
Top 10 most significant DMRs associated with CSF phosphorylated tau181 (pTau181) in cognitively normal (CN) subjects and Alzheimer's disease (AD) subjects. For each DMR, annotations include the location of the DMR based on hg19/GRCh37 genomic annotation (chr, start, end) and nearby genes based on GREAT (GREAT_annotation). Direction indicates a positive or negative association between DNA methylation at a CpG located within the DMR and CSF biomarker. Highlighted in red are gene promoter regions mapped to significant DMRs.

Figure 1
Miami plot for CpGs significantly associated with CSF Aβ42 in the ADNI cohort. The X-axis shows chromosome numbers. The Y-axis shows -log10(P-value) of methylation-to-CSF Aβ42 association in cognitively normal (CN) subjects, or Alzheimer's disease (AD) subjects. The genes associated with the 20 most significant CpGs per subject group are highlighted. The red line indicates the P-value < 10⁻⁵ significance threshold.

Figure 2
Miami plot for CpGs significantly associated with CSF phosphorylated tau181 (pTau181) in the ADNI cohort. The X-axis shows chromosome numbers. The Y-axis shows -log10(P-value) of methylation-to-CSF pTau181 association in cognitively normal (CN) subjects, or Alzheimer's disease (AD) subjects. The genes associated with the 20 most significant CpGs per subject group are highlighted. The red line indicates the P-value < 10⁻⁵ significance threshold.
