AD risk regions and risk genes
Using 936 GWAS AD SNPs and linkage disequilibrium, we identified 598 genomic risk regions, spanning ~ 55.0 Mb of the human genome (Fig. 1). Based on the genomic annotation of genes and regulatory elements, we could connect 1,445 genes to 432 risk regions (Supplementary Fig. S2 and Table S5). Among AD risk gene candidates, 967 are proximal genes, overlapping AD risk regions, and 506 are distal genes, linked to AD risk regions through long-range gene regulatory elements (e.g., enhancers) (Supplementary Table S5). 28 genes are both proximal for some risk regions and distal for other risk regions. 35 candidates were not scored due to their lack of GO annotation and/or exclusion from the gene functional linkage network. Among 1,410 scored candidates, 342 loci, distributed in 203 risk regions (Fig. 2 and Supplementary Table S6), surpassed the “high” threshold (see Method) and thus were considered as (putative) AD risk genes. They included 233 (68.1%) candidates proximal to AD risk regions, and additional 115 (33.6%) distal genes, which are likely to be regulated by regulatory elements in the risk regions (Fig. 1 and Supplementary Table S6).
KEGG pathways and GO biological processes enriched with AD risk genes
AD risk genes were overrepresented in 10 KEGG pathways (FDR < 0.05) and with 151 GO terms (P < 0.01 after the Bonferroni correction) (Fig. 3A-B and Supplementary Table S7), all highly relevant to the disease pathology. The most significantly enriched KEGG pathway is the complement and coagulation cascades. Increasing evidence suggests that deregulation of the complement cascade is a contributing factor leading to chronic inflammation and neurodegeneration observed in AD. The complement system plays an important role in the innate and adaptive immune responses, restricts amyloid plaque formation, and helps clearance of plaque components associated with AD [32]. AD risk genes are also enriched in MAPK signaling pathway, which contributes to the AD pathogenesis through multiple mechanisms, including the regulation of neuronal apoptosis and phosphorylation of APP and tau [33]. Additional enriched pathways, such as GnRH signaling, PI3K-Akt signaling, neurotrophin signaling, and calcium signaling, all have been shown to likely play a role in AD [34–37], whereas Ras signaling may play an important role in aging [38]. Many enriched GO terms – e.g., “nervous system development”, "neurogenesis”, “neuron differentiation”, “neuron development” – are related to the development and differentiation of neurons. Some are directly related to AD: e.g., “phosphorylation (p < 0.01)”, “activation of immune response (p < 0.01)”, and “learning memory” are part of the currently predominant hypothesis of the pathogenesis of AD. In additional, several blood-related GO terms are enriched: “blood vessel development”, “vasculature development”, “negative regulation of blood coagulation”, and “hemopoiesis”. Increasing evidences has shown that the hematopoietic system may contribute to the initiation and/or progression of AD [39].
Ad Risk Genes And Human Aging
Eight AD risk genes – ABCA1, TREM2, PLD2, MAPK12, NR1H3, CDK5RAP3, IRAK2, and ABCG1 – were also implicated by rare variants for connection with human lifespan in a whole exome sequencing study of a longevity cohort (data not shown). IRAK2 is informative for AD prognosis. AD patients with high expression of IRAK2 have significantly better survival outcome than patients with low expression of it (P < 0.05) in superior temporal gyrus and inferior frontal gyrus. Another gene, ABCA1, was one of the top genes with the highest score in our longevity study. ABCA1 gene contributes to lipid processing and the formation of Aβ. Studies of transgenic mouse models revealed that deficiency or overexpression of ABCA1 is associated with increased or decreased Aβ deposition [40, 41], respectively, implying that ABCA1-mediated lipidation influences amyloid degradation. Moreover, ABCA1 also plays a crucial role in ApoE – a major genetic risk factor for LOAD – lipoprotein metabolism in the brain [42]. These data suggest that ABCA1 may be a therapeutic target for AD and aging.
Clustered expression of AD risk genes in different human tissues
Based on their expression profiles in different human tissues, AD risk genes can be clustered into three groups (Fig. 3C and Supplementary Fig. S3). The first group of 32 genes was expressed almost exclusively in the central nervous system (CNS), especially the frontal and the prefrontal cortices. Many genes in this group, such as BIN1, MAPT, and CNTNAP2, have been implicated in the pathogenesis of AD [43–45]. The second group included 131 genes actively expressed in the immune cells such as B and T lymphocytes. Recent studies showed that inflammation contributes to the pathogenesis of AD [46]. It is important to note that many immune related genes, such as TREM2, INPP5D, CD34, and CD55, were included in this group. TREM2 is a cell surface receptor of the immunoglobulin superfamily protein expressed in microglia in the CNS. As a potential key molecule in AD pathogenesis, it might protect against neurodegeneration by promoting phagocytosis to clear apoptotic neurons [47] and a broad array of microglial functions in response to Aβ deposition [48]. INPP5D plays a significant role in inflammatory responses and has been implicated in the pathogenesis of late-onset AD through the regulation of microglial and myeloid cell function [7]. This suggests that immune processes may directly contribute to the pathology and progression of AD, rather than being the consequence of the neurodegeneration. Cholesterol metabolism-associated genes in this group, such as SORL1 and ABCA7, have been linked to AD in previous studies. Studies showed that the suppression of SORL1 expression contribute to the overexpression of Aβ and an increased risk of AD [49]. ABCA7 is a genetic risk factor for late-onset AD and may participate in the regulation of Aβ homoeostasis in the brain [50]. Moreover, this group also includes tauopathy-associated AD risk genes, such as, PTK2B and PICALM. Previous studies in Drosophila indicated that PTK2B acts as an early marker and in vivo modulator of tau toxicity [51]. Cell-based and in vivo data showed that perturbations of PICALM levels might be a key for the regulation of autophagy and tau levels and therefore essential for modulating tau toxicity [52]. 115 genes in the third group were expressed across a wide range of different tissues, including the CNS. Many genes in this group, such as ApoE, CR1, and EPHA1, are known to be associated with AD. Human studies clearly indicate that ApoE isoforms differentially affect Aβ aggregation and clearance [8], and CR1 may play a role in the clearance of Aβ [53].
High expression of AD risk genes in microglia, endothelia, and pericytes of human brain
The expression of AD risk genes was significantly enriched in microglia, endothelia and pericytes in the frontal and the visual cortices and cerebellum from human adults (Fig. 3D and Supplementary Table S4). The over-expression profile was evident for many genes (Supplementary Fig. S4), which were enriched for microglial markers (e.g., HLA-DRA and TREM2) and endothelial markers (e.g., CD34). Previous studies showed that microglia are the primary cells contributing to the initiation of the immune response to AD pathology, and the aberrant microglial activation is a causal factor for the development of AD [54]. Recent studies also suggests endothelial dysfunction may be involved in the pathogenesis of AD [55]. Pericytes, cells in the blood-brain barrier, degenerate in AD and are reported to control multiple steps of AD-alike neurodegeneration cascade in mice overexpressing Aβ-precursor protein [56]. Moreover, oligodendrocytes are only significantly enriched with overexpressed AD risk genes in cerebellum. The major function of oligodendrocytes is the formation of myelin, whose breakdown is associated with AD [57].
Connectivity Of Ad Risk Genes In Co-expression Network
AD is a progressive neurodegenerative disease that involves alteration of gene expression at the whole transcriptome level. The perturbation in the sub-networks of co-expression involving AD risk genes can partially reflect AD progression. Finding the altered network hub genes involved in AD progression may help identify AD biomarkers. We carried out the gene co-expression network analysis across four brain regions to examine the gene regulation patterns among AD risk genes. Overall, we observed that connections among AD risk genes are less in AD patients compared to normal controls (Supplementary Fig. S5 and S6). For each brain region, we considered top 20 genes with most interactions with other genes as network hubs. We found that hub genes with high connectivity in AD patients – e.g., ARL6IP5 (BM10), RNF6 (BM22), TP53INP (BM36), and GGH (BM44) – tended to have low connectivity in healthy individuals (Fig. 3E and Supplementary Fig. S5). Rnf6, a ring-finger-dependent ubiquitin ligase, functions for proteasomal degradation in axonal growth cones of primary hippocampal neurons in mice by regulating the levels of Limk1, which play a crucial role in neurodevelopment and synaptic plasticity [58]. TP53INP1, a major regulator of p53 in response to oxidative stress [59], is a tumor suppressor associated with malignant tumor metastasis in breast, liver, pancreas, and stomach and plays a critical role in cancer progression. Interestingly, previous studies showed inverse correlation between cancer and AD [60]. It has been reported that tripeptide GGH might be used for Cu chelation therapy for AD treatment as Cu ion level was reported to be elevated in AD brains and accumulation of amyloid plaques leading to metal homeostasis dysregulation [61]. On the other hand, many hub genes in normal people – e.g., LMTK2 (BM10), SPPL2A (BM22), MAPT (BM44), and USP8 (BM36) – usually had low connectivity in AD patients. LMTK2 may contribute to the neurodegenerative process by disrupting axonal transport, tau hyperphosphorylation and enhancing apoptosis [62]. Its expression is decreased in a tau mouse model of AD [63]. As one of the deubiquitinases, which play a critical role in regulating synaptic function and whose dysfunction results in several neurological disorders, USP8 has been shown to be associated with AD [64], Parkinson’s disease, and Lewy body disease. MAPT encodes tau protein, whose hyper-phosphorylation and subsequent intracellular neurofibrillary entanglement is one of definitive neuropathological hallmarks of AD. SPPL2a is an intramembrane protease of lysosomes/late endosomes and plays a critical role in regulation of intramembrane proteolysis in B cells and the regulation of innate and adaptive immunity [65].
We also analyzed co-expression of AD risk genes at the proteomic level and observed similar patterns that the AD risk genes were less connected among AD patients than normal controls in the ACG region, while opposite pattern in the FC region (Supplementary Fig. S7). For example, ARL6IP5 and GGH were network hubs in AD patients but less connected in controls in the FC region (Supplementary Fig. S8). In the ACG region, we observed network hubs such as PTK2B, SPARC, and RAD50 showing large alteration between AD patients and controls (Supplementary Fig. S8). SPARC is a matricellular protein which can facilitate the migration of immune cells (e.g., blood-derived dendritic cells). Although its role in AD-related neuroinflammation is still not clear, a study has shown that there are significant alterations in its expression and it collocates to Aβ protein deposits in AD brain tissues [66].
Expression of AD risk genes in human brain and its connection to disease survival
Using data from three studies of differential gene expression between AD cases and controls in different brain regions [16–18], we found 171 (50%) AD risk genes were differentially expressed in at least one brain region, including 102 up-regulated genes, 64 down-regulated genes, and 5 genes showing both up- and down-regulation in different brain regions (Supplementary Table S8). Differential expression of AD risk genes was either widespread, occurring in multiple brain regions, or limited to a specific brain region. TGFB2, the highest ranked risk gene, was up-regulated in frontal cortex (FC), central nervous system (CNS), temporal cortex (TCX), superior temporal gyrus (STG) and parahippocampal gyrus (PHG), while PTK2B was down-regulated in brain cerebellum (CBE), TCX, and PHG. COL25A1, the second highest ranked risk gene, and PMAIP1 were separately down- and up-regulated only in the TCX region. Differential expression of some AD risk genes was discordant in different brain regions. For example, ApoE and CST3 in AD patients were up-regulated in TCX region but down-regulated in cerebellum.
Since AD is mainly a late-onset neurodegenerative disorder, we specifically examined how AD risk genes are expressed among adults. Using a binarization procedure [67], we analyzed their spatiotemporal expression patterns using RNA-seq data from BrianSpan. Although no strong pattern was found (Supplementary Fig. S9), the proportion of AD risk genes with dramatically suppressed expression was increased at the age of 40 compared to early ages. The proportion of AD risk genes that tend to be transcriptionally actively was relatively higher at the early ages (of 23 and 30) in comparison to the old ages (of 36 and 40). We next examined the spatiotemporal expression pattern of AD risk genes during the development of the frontal cortex across an extended range of ages (from 18.05 to 78.23). We did not observe any distinct expression pattern across this range of ages.
As a chronic neurodegenerative disease, AD starts slowly and gradually worsens overtime. We hypothesized that genes whose expression correlates with AD progression may mark AD severity and thus can be used to predict AD prognosis. To test this hypothesis, we assessed the impact of AD genes on survival using the Kaplan-Meier analysis. Based on expression levels, nine genes – NRG3, IL1RAP, PMAIP1, STRADA, SGK3, LAMTOR4, MAPK12, PHB, and GRB2 – separated AD patients into low- and high-risk groups with different disease survival (P < 0.05). Their expression also trends differently with age between healthy individuals and AD patients in at least one brain region (P < 0.05) (Fig. 4 and Supplementary Fig. S10).
Predicted Ad Causal Variants
Using the computational framework that we developed for this project, we predicted 150 unique potential causal variants for 109 AD risk genes (Supplementary Table S9 and Fig. S11A). To evaluate this prediction, we analyzed their effect on sequence motifs of transcription factors binding sites (TFBS) and compared these to the eQTL data from GTEx. Motif analysis revealed that 69 predicted causal variants (46%, Supplementary Table S9) cause either gain or loss of TFBS motifs, likely affecting TF binding. Among them, 32 (21%, Supplementary Table S9) have also been identified as eQTLs. Together, 85 (57%) of the predicted causal variants can be functionally annotated (Supplementary Fig. S11B). Three modules were developed in our computational framework to predict causal variants in different functional genomic regions:
Coding variants. We predicted 54 causal coding variants. For example, rs7412 and rs4147934 are two missense coding SNPs, each in high LD (r2 > 0.5) with one of the AD GWAS lead SNPs in its corresponding risk region, were predicted as causal variants (Fig. 5A and B, Supplementary Table S9). rs7412, in APOE with a PrimateAI score = 0.80, is a well-known variant reported to be associated with AD. rs4147934, in ABCA7 with a PrimateAI score = 0.78, has been proposed as a functional candidate variant accounting for the GWAS signal at ABCA7 locus in Caucasians [68]. Although AD risk from rs4147934 is probably population-specific since its association signal was not replicated in the African American cohort [69], our analysis provides additional evidence in support of its causal role in AD and thus its impact in non-European ancestry populations merits further investigation.
Non-coding variants in promoters. We also predicted 33 causal promotor variants in brain tissues or cells (Fig. 5C-E), including rs76516995 (ExPecto score = 0.194) for BIN1 in astrocytes, rs4292 (ExPecto score = 0.390) for ACE in neural cells, and rs12691088 (ExPecto score = 0.224) for APOC1 in astrocytes. rs76516995 and rs4292 have also been identified as eQTLs by GTEx, while rs12691088 has been shown to be associated with AD-related phenotypes in multiple brain regions [70].
Non-coding variants in enhancers. We predicted 64 causal enhancer variants. Two of them, rs2271920 (LINSIGHT score = 0.951) and rs117423666 (LINSIGHT score = 0.966) Fig. 5F and Supplementary Table S9), are causal variants for risk gene PTK2B. rs2271920 is an AD GWAS lead SNP itself [71]. As a GTEx eQTL, it changes the expression of PTK2B, likely by altering the binding sites motif of BCL6 and ZNF467. For the SNX1 locus, we also predicted two causal enhancer variants: rs146600064 (LINSIGHT score = 0.914) and rs60226406 (LINSIGHT score = 0.977) (Fig. 5G and Supplementary Table S9). rs146600064 is in total LD (r2 = 1) with the lead AD SNP rs74615166 [7]. In another risk region indexed by the AD lead SNP rs7207400 [71], several predicted causal variants – rs2049515, rs4341787, rs549929529, and rs242557 – are connected to multiple AD-risk genes – NMT1, CRHR1, SPPL2C, MAPT, and NSF (Fig. 5H and Supplementary Table S9). Although a disease risk gene could be influenced by multiple causal variants, we examined additional data – e.g., TFBS motifs, eQTL, and ClinVar – to further prioritize them. rs549929529 is most likely a causal variant for MAPT as it is reported in ClinVar to be associated with 'MAPT-Related_Spectrum_Disorders'. Compared with other SNPs, rs4341787 is in the highest LD (r2 = 0.779) with the AD lead SNP rs7207400 and is an eQTL identified by GTEx for NMT1. Also, in high LD (r2 = 0.775) with rs7207400, rs2049515 could be a causal variant for multiple genes as it is an eQTL for CRHR1, SPPL2C, and NSF (Fig. 5H and Supplementary Table S9). Moreover, rs1522388, a predicted causal variant (LINSIGHT score = 0.979) for FLNB, was also identified as a reporter assay QTL in HepG2 cell line [72], which experimentally demonstrated its functional impact as a regulatory variant.
To date, the functional impact of the aforementioned variants is still poorly understood in AD etiology, but our findings provide promising causal variant candidates for further experimental validation, which will in turn identify potential drug targets for the development of AD treatment. To explore the molecular function of the predicted causal variants, we examined eQTLs (FDR < 0.05) identified in three brain regions (temporal cortex, dorsolateral prefrontal cortex, and cerebellum) in an AD cohort (Synapse: syn17015233). We found 16 (11%, Supplementary Table S10) causal variants were AD-related eQTLs. Several causal variants – e.g., rs2236393 for CDH3 and rs12752439 for HSPG2 – were not identified as eQTLs by GTEx but instead are AD-related eQTLs. This analysis provides direct evidence for their involvement in AD pathogenesis.