Human brain single nucleus cell type enrichments in neurodegenerative diseases

Background: Single-cell RNA sequencing has opened a window into clarifying the complex underpinnings of disease, particularly in quantifying the relevance of tissue- and cell-type-specific gene expression. Methods: To identify the cell types and genes important to therapeutic target development across the neurodegenerative disease spectrum, we leveraged genome-wide association studies, recent single-cell sequencing data, and bulk expression studies in a diverse series of brain region tissues. Results: We were able to identify significant immune-related cell types in the brain across three major neurodegenerative diseases: Alzheimer’s disease, amyotrophic lateral sclerosis, and Parkinson’s disease. Subsequently, putative roles of 30 fine-mapped loci implicating seven genes in multiple neurodegenerative diseases and their pathogenesis were identified. Conclusions: We have helped refine the genetic regions and cell types affected across multiple neurodegenerative diseases, helping focus future translational research efforts.


Background
Neurodegenerative diseases (NDDs) encompass diseases characterized by progressive degeneration of cell types, including neurons and glia, in the central nervous system and/or peripheral nervous system.
NDDs vary in their anatomic vulnerabilities and main affected neuropathologies, resulting in specific cell type dysregulation that may or may not be shared between NDDs. 1,2 The complexity of biological processes and pathways involved in NDD pathogenesis has stymied progress in the understanding of disease and treatment development. While gene expression in one cell type of the brain, such as microglia, may be dysregulated within a disease state, other cell types may be regulated normally. Technology such as bulk RNA sequencing (RNA-seq) would be unlikely to identify these differences in expression, since bulk RNA-seq averages the expression of transcripts from all cells present in a tissue sample.
High-throughput single-cell, or single-nucleus, mRNA sequencing (scRNA-seq, or snRNA-seq) technology provides a new window into understanding the functionally complex interactions between cell populations and across tissues. While bulk RNA-seq provides an assessment of expression averaged across a population of sampled cells, snRNA-seq allows for nuanced insight at the cell-type level. Instead of identifying gene set enrichment across an entire tissue, one can identify enrichment of gene expression partitioned by cell type or changes in the composition of cell types themselves. In this report, we leverage population-scale genome-wide association study (GWAS) data in conjunction with snRNA-seq in the brain to gain more specific insights into potential cell-type-specific mechanisms of risk within and across neurodegenerative diseases.

Materials and methods
Single nucleus RNA-seq expression data
Data from snRNA-seq in adult human brain tissues, consisting of the forebrain, midbrain, and hindbrain, were obtained from Siletti et al. 3 The data were obtained by sequencing dissections from three healthy postmortem donors. They included each of the 461 superclusters defined by Siletti et al. as being represented by the islands identified through a two-dimensional representation calculated by t-distributed stochastic neighbor embedding (t-SNE). 3 Additionally, we calculated enrichment in the scRNA-seq data at the level of annotated cell type classes using the autoannotation data provided by Siletti et al. in the supplementary materials of their manuscript.

Genome-wide association study summary statistics
We included data from six NDD GWAS: Amyotrophic lateral sclerosis from van Rheenen et al. 4; Alzheimer's disease from Bellenguez et al. 5; Frontotemporal lobar degeneration from Pottier et al. 6; Lewy body dementia from Chia et al. 7; Parkinson's disease from Nalls et al. 8; and Progressive supranuclear palsy from Höglinger et al. 9 Data for Amyotrophic lateral sclerosis, Alzheimer's disease, Lewy body dementia, and Progressive supranuclear palsy were obtained from the GWAS Catalog (https://www.ebi.ac.uk/gwas), and data for Frontotemporal lobar degeneration and Parkinson's disease were obtained directly from the respective authors.

MetaBrain eQTL summary statistics
We included expression quantitative trait loci (eQTL) summary statistics from MetaBrain, a large-scale eQTL meta-analysis from de Klein et al. 10 For colocalization analysis, we included SNPs with a reported p-value no greater than 1×10⁻⁴ for the QTL component of the study.

Cell Type Enrichment Analyses
We conducted cell type enrichment analyses using the R package MAGMA. 11,12 Cell typing was completed for each combination of adult human brain snRNA-seq data and disease GWAS. Multiple testing correction (Bonferroni method) was used to allow for efficient identification of enriched cell types.
Required input data for conducting MAGMA analysis include formatted GWAS summary statistics and a CellTypeDataset (CTD) object. Preprocessing of disease-specific GWAS summary statistics and creation of the CTD object, which holds cell specificity data, were performed using the R packages MungeSumstats 13 and EWCE 14, respectively. Quality control and munging of input data were conducted using methods available through the R packages used (see supplementary methods).
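As an illustration of the Bonferroni step described above, the following minimal Python sketch divides a family-wise significance level by the number of tests and keeps the cell types that fall below the resulting threshold. It is not the MAGMA implementation; the function names, the significance level of 0.05, and the example p-values are assumptions made for illustration only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def significant_cell_types(p_values: dict, alpha: float = 0.05) -> dict:
    """Return the cell types whose p-value is below alpha / number of tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: p for name, p in p_values.items() if p < threshold}


# Hypothetical p-values for illustration only (not results from this study).
example_p_values = {
    "cell_type_a": 1.0e-6,
    "cell_type_b": 0.03,
    "cell_type_c": 0.20,
    "cell_type_d": 0.75,
}

hits = significant_cell_types(example_p_values)
print(hits)
```

With four tests the per-test threshold is 0.05 / 4, so a nominally significant p-value such as 0.03 does not survive correction in this toy example.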

Colocalization
We conducted Bayesian colocalization analysis using the R package coloc 15 for all pairwise combinations of putatively significant autosomal NDD GWAS SNPs (p ≤ 5×10⁻⁸) and MetaBrain eQTLs (p ≤ 1×10⁻⁴). To account for the possibility of a single SNP influencing expression at multiple loci, we iterated over significant probes (see supplementary methods). We report associations with a posterior probability of at least 90%. A summary of the data used and the numbers of hits is included in Supplementary Table S1.
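The thresholds used around the colocalization step can be sketched in Python as below. This is not the coloc algorithm itself: it shows only the input filters (GWAS p ≤ 5×10⁻⁸, eQTL p ≤ 1×10⁻⁴) and the reporting filter (posterior probability ≥ 0.90). The data layout, the function names, and the matching of the two inputs on a shared SNP identifier are assumptions for illustration.

```python
GWAS_P_MAX = 5e-8   # genome-wide significance threshold stated in the text
EQTL_P_MAX = 1e-4   # eQTL inclusion threshold stated in the text
PP_MIN = 0.90       # reporting threshold on the posterior probability


def select_shared_snps(gwas: dict, eqtl: dict) -> list:
    """Keep SNP ids present in both inputs that pass both p-value filters.

    gwas, eqtl: hypothetical dicts mapping SNP id -> p-value.
    """
    return sorted(
        snp
        for snp, p in gwas.items()
        if p <= GWAS_P_MAX and eqtl.get(snp, 1.0) <= EQTL_P_MAX
    )


def reportable(results: list) -> list:
    """Keep (label, posterior) pairs whose posterior is at least PP_MIN."""
    return [(label, pp) for label, pp in results if pp >= PP_MIN]


# Hypothetical inputs for illustration only.
gwas_example = {"rs1": 1e-9, "rs2": 1e-6, "rs3": 4e-8}
eqtl_example = {"rs1": 5e-5, "rs3": 2e-3}
shared = select_shared_snps(gwas_example, eqtl_example)

posteriors_example = [("pair_a", 0.95), ("pair_b", 0.50), ("pair_c", 0.90)]
reported = reportable(posteriors_example)
print(shared, reported)
```

In this toy input only rs1 passes both input filters, and pair_a and pair_c meet the reporting threshold.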

Gene Expression Summary
We summarized expression ranks for genes of interest within the single-cell adult human brain transcriptome data set adult_human_20221007.loom from Siletti et al. 3 Using custom R scripts, we converted feature counts into transcripts per million (TPM). For a given sample, feature counts were divided by maximum nonredundant intron-removed exon lengths to correct for differences in gene length.
Values were then multiplied by a sample-specific constant (10⁶ / T, where T is the sum of length-normalized counts) such that the resulting unitless vector sums to one million. We extracted exon lengths based on annotations from the GTF file used to originally annotate the single-cell data (gb_pri_annot.gtf).
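The TPM conversion described above can be sketched as follows. The original analysis used custom R scripts; this Python version is an illustrative re-expression under assumed inputs (one sample, with counts and exon lengths held in dictionaries keyed by gene), and the function name and example values are hypothetical.

```python
def counts_to_tpm(counts: dict, lengths: dict) -> dict:
    """Convert raw feature counts for one sample to transcripts per million.

    counts:  gene -> raw feature count
    lengths: gene -> exon length in base pairs (must be > 0)
    """
    # Step 1: correct for gene length.
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    # Step 2: T is the sum of length-normalized counts for the sample.
    total = sum(rates.values())
    if total == 0:
        raise ValueError("sample has no counts; TPM is undefined")
    # Step 3: multiply by the sample-specific constant 10^6 / T.
    scale = 1e6 / total
    return {gene: rate * scale for gene, rate in rates.items()}


# Hypothetical counts and lengths for illustration only.
example_counts = {"gene_a": 100, "gene_b": 300, "gene_c": 0}
example_lengths = {"gene_a": 1000, "gene_b": 3000, "gene_c": 500}

tpm = counts_to_tpm(example_counts, example_lengths)
print(tpm)
```

In the example, gene_b has three times the counts of gene_a but is three times as long, so both receive the same TPM value, and the values sum to one million.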
We calculated the expression percentile rank for genes of interest using the empirical cumulative distribution function and then calculated the mean and median expression percentile rank (EPR) value for each gene for each tested cell type.To ease interpretation, we binned the EPR values into 3 classes: off (EPR < 10), low (10 < EPR < 90) or high (EPR > 90).
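A minimal Python sketch of the percentile ranking and binning follows. It assumes the empirical cumulative distribution function is evaluated as the fraction of values less than or equal to each value, scaled to 0-100. The text leaves EPR values of exactly 10 or 90 unassigned; this sketch places them in the low class, which is an assumption. Function names and example values are hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(values: list) -> list:
    """ECDF-based percentile rank: 100 * (fraction of values <= x)."""
    ordered = sorted(values)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, v) / n for v in values]


def bin_epr(epr: float) -> str:
    """Bin an expression percentile rank into off / low / high.

    Boundary values (exactly 10 or 90) are assigned to "low" here;
    that choice is an assumption of this sketch.
    """
    if epr < 10:
        return "off"
    if epr > 90:
        return "high"
    return "low"


# Hypothetical expression values for 20 genes in one cell type.
example_expression = [float(i) for i in range(20)]
ranks = percentile_ranks(example_expression)
classes = [bin_epr(r) for r in ranks]
print(list(zip(ranks, classes)))
```

The per-gene mean and median EPR for a cell type would then be taken over the cells assigned to that cell type before binning.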

Data and Code Availability
The code used to generate and process our data can be accessed at our GitHub repository. All data for this project are publicly available via the original publications accessed by our team. Summaries of enrichment and colocalization data are also available to browse in our community target discovery and due diligence resource, the omicSynth web application.

Online methods
Additional method details can be found in the Supplementary Materials section as well as in the Data and Code Availability section. All analyses comprising this workflow are summarized in Fig. 1.

Cell type enrichments identified in Alzheimer's disease, Amyotrophic lateral sclerosis, and Parkinson's disease
We identified significant cell type enrichments in the adult human brain for three out of six tested diseases: Alzheimer's disease, Amyotrophic lateral sclerosis, and Parkinson's disease (Table 1; Supplementary Table 2). In general, we found that linear regression style enrichment analysis identified more significantly enriched cell types than the top 10% enrichment style (n = 30 for linear, n = 5 for top 10%).
The MAGMA.Celltyping documentation does state that the linear regression enrichment mode yields more significant results because of overlapping cell type signatures. In Parkinson's disease, we ran MAGMA cell typing analysis on three variations/subsets of the Nalls et al. meta-GWAS 8: the complete meta-GWAS; the meta-GWAS excluding 23andMe data; and the meta-GWAS excluding 23andMe and UK Biobank data. In the two reduced variations, no cell types at the supercluster or class level reached significant enrichment after Bonferroni correction (p < 9.2×10⁻⁴). The only GWAS variation that yielded significant cell type enrichments was the full Parkinson's disease meta-GWAS. We identified 14 enriched cell types when using linear regression analysis at the supercluster level. At the class level, we identified 10 enriched cell types (Supplementary Table S3). The top enriched cell types at the supercluster and class levels were thalamic  S4-5).
In Alzheimer's disease, we found microglia to be the only significantly enriched cell type at the supercluster level in both linear and top 10% analyses for risk loci (threshold p < 9.2×10⁻⁴; linear: p = 4.193×10⁻⁸, b = 3.978×10⁻³; top 10%: p = 1.195×10⁻⁶, b = 0.1278). At the class level, four cell types were found to be significant after Bonferroni correction in both analyses. Significantly enriched cell types included macrophages, monocytes, microglia, and natural killer cells (Supplementary Table S6). Alzheimer's disease was the only disease to have significant enrichment when using the top 10% enrichment analyses.
In Amyotrophic lateral sclerosis, we identified one significant cell type enrichment, using linear regression enrichment analysis with the class level annotations. Monocytes were the only cell type to reach significance (Bonferroni-corrected threshold of p < 9.2×10⁻⁴; monocytes: p = 8.516×10⁻⁴, b = 1.778×10⁻³; Supplementary Table S7). We did not detect significant cell type enrichments in Frontotemporal lobar degeneration, Lewy body dementia, or Progressive supranuclear palsy. No cell types at either the supercluster or class level reached significance after MAGMA-implemented Bonferroni correction (p < 9.2×10⁻⁴; Supplementary Table S2, Supplementary Tables S8-10).

Colocalization
Across all diseases tested, we fine-mapped a total of 205 association signals at a posterior probability > 90%. This included 89 unique genes identified as harboring putative causal associations. Of these 205 associations, 20 were centered around the HLA region, with colocalized omic associations in the cerebellum and cortex (the latter in multiple ancestry groups), suggesting an extremely complicated risk of neuroinflammation in this part of the genome. Interestingly, GRN is fine-mapped using colocalized eQTL signals in the cerebellum for Alzheimer's disease and the cortex for Parkinson's disease, suggesting related but potentially different mechanisms. The TMEM175/GAK region shows multiple colocalized signals across multiple diseases (Parkinson's disease and Lewy body dementia). Of particular interest is that GAK and TMEM175 are both fine-mapped to the same SNP (rs6599388) with eQTL effects in opposite directions in Lewy body dementia. At the same time, regional signals for the SLC26A1 gene were also fine-mapped for Parkinson's disease, Lewy body dementia and Alzheimer's disease, with the effect in Parkinson's disease being detected in the spinal cord, while the other disease QTLs were localized to the cortex.
Variants exhibiting a colocalization posterior probability ≥ 90% are summarized in Fig. 2 and Supplementary Table S11. Thirty loci were fine-mapped to a single gene per disease by leveraging QTL data. 10,16 Of these, six are known druggable genes: ABCA1, ADAM10, CD55, FGF7, OXGR1 and POLE (see Supplementary Table S12). Mining additional data on these druggable genes from the omicSynth 16 database (Supplementary Table S13), we note that ABCA1 has putative functional multiomic associations with Alzheimer's disease and Progressive supranuclear palsy in blood and brain tissues. ADAM10 has a similar pattern of functional inferences in Alzheimer's disease. CD55 has been shown to have multiple significant functional inferences in brain tissues mediating the risk for Alzheimer's disease and Lewy body dementia. However, FGF7 does not display any significant functional inferences in any diseases from the database query. Methylation QTLs in blood connect Progressive supranuclear palsy and FTD at OXGR1 via functional inferences using SMR, while multiple brain, blood and nerve associations connect this gene with Parkinson's disease risk across both expression and methylation QTLs. A similar pattern of disease and tissue associations is seen for POLE, although there is no significant neural tissue association for Parkinson's disease, and Progressive supranuclear palsy is also connected via blood eQTL to the same gene. Of the fine-mapped loci, 22 (Supplementary Table S16) have also been nominated elsewhere as potential therapeutic targets with likely functional impacts on neurodegenerative disease risk in the context of methylation, expression, protein or chromatin QTLs detailed in our omicSynth web application. 16

Cell type of colocalized genes
We evaluated gene expression from the snRNA-seq data used in the enrichment analysis. We calculated the mean and median expression percentile rank (EPR) for each gene across cells corresponding to the nominated supercluster cell types identified in our enrichment analyses and compared the aggregate mean and median values against the nominated colocalized genes (Supplementary Fig. 1, Supplementary Tables S14-15). To ease interpretation, we binned the mean EPR value for each gene-cell type combination into three categories: off, low, and high (see Materials and methods).
We identified 10 colocalized genes with high median EPR values out of the 14 tested genes. The PAM gene has six cell type combinations (CGE interneuron, Eccentric medium spiny neuron, MGE interneuron, Hippocampal dentate gyrus, Thalamic excitatory, and Mammillary body) classified as high (Fig. 3). Overall, the Mammillary body cell type had the greatest count of high-EPR genes, with eight.

Discussion
We identified enriched cell types in various brain regions for three (Alzheimer's disease, Amyotrophic lateral sclerosis, and Parkinson's disease) out of six tested NDDs utilizing snRNA-seq data of the 461 superclusters identified by Siletti et al. 3 We did not detect significant cell type enrichment at either the supercluster or class level in Frontotemporal lobar degeneration, Lewy body dementia, or Progressive supranuclear palsy. We speculate that, due to the smaller sample sizes of these GWAS, we were unable to identify any significantly enriched cell types after applying the MAGMA-implemented Bonferroni correction (p < 9.2×10⁻⁴; Supplementary Table S2, Supplementary Tables S8-10). The significantly enriched cell types had an associated positive beta for risk genes, and our identified significant cell type enrichments fall in line with current knowledge on cell types implicated in various NDD pathologies, which we discuss further below. Broadly, our results highlight the importance of immune-related cell types in the pathology of varying NDDs. 19,20
Monocytes were the only cell type significantly enriched across the three diseases: Alzheimer's disease, Amyotrophic lateral sclerosis, and Parkinson's disease. They were the most significantly enriched cell type in amyotrophic lateral sclerosis and Parkinson's disease (amyotrophic lateral sclerosis: p = 8.51×10⁻⁴; Parkinson's disease: p = 1.31×10 ) and the third most enriched cell type in Alzheimer's disease (p = 9.25×10⁻⁸). 22,23 Of the genes enriched in the monocyte cell type (previously described in our results), KYNU is a gene of note, being part of the tryptophan metabolic pathway, which has been found to play a role in amyloid-β formation. 19
Microglia are another significantly enriched cell type identified in our analyses, although they were significantly enriched only in Alzheimer's disease. 25,26 Microglia are known to function similarly to dendritic cells (DCs) and macrophages, which are derived from monocytes. 25 In the Alzheimer's disease literature, microglia are implicated as an affected cell type associated with neuroinflammation. 25,27 Our results highlight microglia as the second most enriched cell type in Alzheimer's disease after Bonferroni correction (p = 4.19×10⁻⁸). Microglia were found to be nominally significant in Lewy body dementia and Parkinson's disease (Lewy body dementia, linear: p = 0.0119; Parkinson's disease, linear: p = 9.25×10⁻³; Parkinson's disease, linear: p = 1.96×10⁻³; Supplementary Tables 2, 3, 6). 28-30 It is possible that, due to the limited sample size in the snRNA-seq data used, the analyses were unable to detect any significant cell type enrichment in microglia for amyotrophic lateral sclerosis.
Common risk factors across neurodegenerative diseases have always been of interest to the basic science and therapeutic industries. The association at the SLC26A1 locus is of particular interest across multiple NDDs, as it is strongly associated with IDUA protein levels in QTL studies, and deficiencies in this protein cause severe lysosomal storage disorders. 31,32 The HLA region is a complex locus both from a structural genetic standpoint and in terms of general risk of neuroinflammation, so it is not surprising that it yields a number of fine-mapped associations for multiple genes. 5,33 GRN is a positive control for this colocalization analysis effort, as previous efforts show that increased genetic risk in the region coincides with decreased expression of the gene in Parkinson's disease, Alzheimer's disease and amyotrophic lateral sclerosis. 34
Fine-mapping efforts localizing signals to single genes within risk loci help to identify novel therapeutic targets with known biological plausibility. ABCA1 is known to be associated with Tangier disease, which is a rare autosomal recessive disorder with low plasma levels of high-density lipoprotein (HDL) causing peripheral neuropathy. 35 ADAM10 is implicated in the formation of amyloid plaques in the brain and the processing of APP. 35,36 CD55 is known to interact with viruses and cause neuroinflammation, potentially leading to increased Alzheimer's disease risk. 37 COQ8A mutations have been shown to cause coenzyme Q10 deficiency leading to autosomal recessive ataxia, cerebellar atrophy, and progressive movement disorders. 38 C9orf72, known to be linked to amyotrophic lateral sclerosis, acts as a positive control here and displays unique amyotrophic lateral sclerosis/frontotemporal lobar degeneration colocalization. ADAM10 and C9orf72 were also nominated as potential therapeutic drug targets for neurodegenerative disease via Mendelian randomization. 16
Limitations to this study generally relate to the availability of data in this context. First and foremost, there is a limited amount of multiancestry or non-European data available for the GWAS and single-cell or QTL resources used here. This potentially introduces bias into therapeutic development and precision medicine applications. Second, low sample sizes for single-nucleus analyses (in terms of the number of human donors) limit our ability to generate eQTL databases, compared with coarser bulk RNA sequencing at scale.
Here, we provided insights that could potentially aid in therapeutic development for NDDs. On the macrolevel, we have identified cell-type level enrichments associated with disease risk in multiple neurodegenerative diseases, allowing biologists and drug developers to better focus their mechanistic and therapeutic research. On the microlevel, we have used eQTL colocalization methods to narrow down the large tracts of associated loci in GWAS to potentially functional variants, metaphorically going from a neighborhood to building level resolution on a map.

Conclusions
We were able to identify significant immune-related cell types enriched for risk signals in the brain across three major neurodegenerative diseases: Alzheimer's disease, amyotrophic lateral sclerosis, and Parkinson's disease. We fine-mapped 30 loci at a cell type level of resolution, implicating seven genes contributing to risk of multiple neurodegenerative diseases. All work is completely transparent and replicable within an open science framework, from data to code and results. An app has been built to aid in effectively sharing these results with the public [https://nih-card-ndd-smr-home-syboky.streamlit.

Figures

Table 1
Significant cell type enrichments across Alzheimer's disease, Amyotrophic lateral sclerosis, and Parkinson's disease.