Study design and analysis workflow. A schematic overview of the study design is shown in Fig. 1. We surveyed 26 GWASs [34–51, 1, 52, 10, 53–55, 11, 56] of ALS in the NHGRI-EBI GWAS Catalog (up to September 2021) and additionally included the most recent GWAS [57] and 3 recent post-GWAS studies [13, 8, 14] to summarize current knowledge of the susceptibility loci and candidate genes of ALS (Supplementary Table S1, Fig. 2B). We then based our post-GWAS analyses on the ALS GWAS of Nicolas et al. [10], which represents the largest ALS GWAS dataset to date and incorporates data from multiple cohorts, totaling 20,806 cases and 59,804 controls of European ancestry.
Briefly, we first characterized a large set of tissues and cell types potentially involved in the development of ALS by conducting a gene property analysis using FUMA. We next annotated the potential functions of a set of candidate genes using CADD scores, RegulomeDB scores, physical position relative to ALS-associated SNPs, eQTL evidence, and chromatin interactions. We then prioritized candidate genes and tissues for causality by applying TWAS, COLOC, and SMR to 18 eQTL datasets from GTEx v8, eMeta, and eQTLGen. Finally, we integrated the findings from the different methods and provided a list of variants and genes, with their corresponding tissues, that have a high probability of being causal.
Gene property analysis highlights a list of tissues and cell types. The FUMA platform implements the MAGMA method for tissue-specificity analysis using 54 tissues from GTEx. Using FUMA, we identified 13 brain regions and the pituitary in which tissue-specific gene expression profiles showed a nominally significant association with ALS-gene associations (p < 0.05). Two regions remained significant after Bonferroni correction: cerebellum (Bonferroni-adjusted p = 2.7×10⁻⁵) and cerebellar hemisphere (p = 1.3×10⁻⁴) (Supplementary Figure S2). These findings are consistent with those of the most recent GWAS by van Rheenen et al. [57], emphasizing an important role of various brain tissues, in particular the cerebellum and cerebellar hemisphere, in the development of ALS. In addition, we highlighted the pituitary as an important tissue for the analysis of locus-specific gene expression profiles within ALS-risk loci. We thus focused on the 13 brain regions and the pituitary in our subsequent analyses, and additionally included skeletal muscle, which has been implicated in ALS progression [58, 59], and whole blood, which was also implicated in previous studies [14].
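The Bonferroni step described above can be illustrated with a minimal sketch. It assumes the adjusted p-value is the raw p-value multiplied by the number of tests (54 GTEx tissues, as stated in the text) and capped at 1. The function name and the raw p-values are hypothetical and are not taken from the study.

```python
def bonferroni_adjust(p_values, n_tests=None):
    """Return Bonferroni-adjusted p-values: min(1, p * n_tests)."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three tissues (illustration only).
raw = {"tissue_a": 1e-6, "tissue_b": 2e-3, "tissue_c": 0.04}

# 54 tests, matching the number of GTEx tissues mentioned in the text.
adjusted = dict(zip(raw, bonferroni_adjust(list(raw.values()), n_tests=54)))

# Tissues that remain significant at 0.05 after correction.
significant = [tissue for tissue, p in adjusted.items() if p < 0.05]
print(significant)
```

In this toy input, a tissue that is nominally significant (raw p = 0.04) does not survive correction, which mirrors the distinction the text draws between nominal and Bonferroni-corrected significance.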
MAGMA cell-type specificity analyses covering a total of 196 cell types were performed in 8 different mouse and human single-cell RNA-seq datasets. In the human brain dataset from PsychENCODE, we identified both excitatory and inhibitory neurons as significantly (FDR < 0.05) associated with ALS (Supplementary Figure S3A, Supplementary Table S3), meaning that gene expression profiles in these cells are significantly associated with ALS-gene associations. Within-dataset step-wise conditional analyses identified inhibitory neuron 6 as showing an independent association (Supplementary Figure S3B, Supplementary Tables S4, S5), further highlighting the likelihood that inhibitory neuron 6 is the basic functional unit of ALS. In the 7 mouse datasets, within-dataset conditional analyses additionally identified oligodendrocytes and gamma-aminobutyric acidergic (GABAergic) neurons (Gad1/Gad2) as significantly associated cell types (Supplementary Figure S3A, Supplementary Tables S3–S5). To evaluate whether the significantly associated cell types from distinct datasets are driven by similar genetic signals, we further conducted a cross-dataset conditional analysis for all significant cell types in all 8 datasets. This analysis indicated that inhibitory neuron 6, oligodendrocytes, and GABAergic neurons (Gad1/Gad2) are likely driven by distinct genetic signals, while the various oligodendrocyte populations from different datasets are likely driven by similar genetic signals (Supplementary Figure S3C, Supplementary Table S6).
Our findings regarding inhibitory and excitatory neurons are consistent with the very recent study of Megat et al. [26], which showed that a large part of ALS heritability lies in genes expressed in inhibitory and excitatory neurons. Our conditional analysis further pinpointed inhibitory neuron 6 as the most likely functional unit. Our findings regarding oligodendrocytes and GABAergic neurons are highly consistent with the study of Saez-Atienzar et al. [60] and partially consistent with the study of van Rheenen et al. [57], in which oligodendrocytes were not significantly enriched in ALS but were significant in PD and AD proxy.
In addition, MAGMA gene-based analysis identified 7 genes showing significant association with ALS risk, and all have been identified in previous GWASs (Supplementary Figure S1).
Functional annotation of ALS-associated SNPs and genes. We annotated the functionality of 233 ALS-associated candidate SNPs from the 6 independent genome-wide significant loci (5q33.1 rs10463311, 9p21.2 rs3849943, 12q13.3-14.1 rs142321490, 12q14.2 rs74654358, 19p13.11 rs12973192, 21q22.3 rs75087725) in the GWAS of Nicolas et al. [10] using CADD [18], which predicts SNP deleteriousness based on an integrative annotation built from more than 60 genomic features. Of the 233 candidate SNPs, the CADD analysis identified 17 SNPs at 3 loci (9p21.2, 12q13.3-14.1, 12q14.2) with high scores (> 12.37) [18], suggesting a strong deleterious effect of these variants (Supplementary Table S7). At 9p21.2, 14 SNPs with high scores tightly surrounded C9orf72; the highest score was observed for rs3736319 (18.5), located 83 bp upstream of MOB3B and 7.6 kbp downstream of C9orf72, and the second highest for rs10967965 (17.2), an intronic variant of MOB3B. At 12q13.3-14.1, two SNPs had high scores: a 5′ UTR variant of NAB2 (rs185306972) and a nonsynonymous variant of KIF5A (rs113247976). At 12q14.2, one synonymous variant of TBK1 (rs41292019) had a high score.
A complementary analysis using the RegulomeDB [19] annotation further revealed three SNPs at 9p21.2 with strong evidence of regulation supported by eQTL and TF binding/DNase peak (evidence level 1f, Supplementary Table S7). All three were located in the intronic region of MOB3B and very close to C9orf72 (< 60 kbp), and one (rs10967965) was also highlighted in the CADD analysis. These results provided evidence for the presence of deleterious variants with pathogenic effects and SNPs with regulatory effects in 3 ALS-associated loci, and yielded a list of candidate genes in these loci.
Next, positional mapping, eQTL mapping, and chromatin interaction mapping analyses were conducted in the 6 ALS-associated loci. Together, these analyses mapped to 58 genes, of which 4 were mapped by all three methods: 5q33.1 TNIP1 and 9p21.2 C9orf72, MOB3B, and IFNK (Supplementary Table S8). Positional mapping with a maximum distance of 10 kbp mapped to 18 genes in 6 loci. The eQTL mapping identified 1,434 significant SNP-gene-tissue pairs (FDR < 0.05), mainly in brain tissues (948 pairs), which mapped to 22 expressed genes in 5 loci (Supplementary Table S9). The chromatin interaction mapping identified 34 genes in 5 loci, with 6 genes overlapping with those from eQTL mapping, i.e., 5q33.1 TNIP1, 9p21.2 C9orf72, MOB3B, IFNK, and 12q13.3 BAZ2A, PRIM1. Interestingly, unlike the other loci, the 9p21.2 locus clearly contained a DNA loop [61] in brain tissues (Supplementary Figure S4); such a loop brings distant parts of the DNA closer together and allows genes to be activated by regulatory elements known as enhancers. The 2 loci on chromosome 12 (12q14.1 and 12q14.2) contained more signals for both eQTL and chromatin interactions than the other 4 loci (Supplementary Figure S5). These results provided direct evidence for a list of SNPs and genes that are potentially functionally involved in the development of ALS.
Multi-tissue TWAS identified novel functional candidate genes. Based on the full GWAS summary statistics from the study of Nicolas et al. [10], we conducted a series of TWASs to test the association between gene expression levels predicted using PredictDB and ALS risk in 13 different brain tissues, pituitary, skeletal muscle, and whole blood. TWASs were carried out using S-PrediXcan, which analyzes one tissue at a time, and S-MultiXcan, which jointly analyzes all tissues.
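As a rough illustration of how a summary-statistics TWAS such as S-PrediXcan combines SNP-level results into a gene-level test, the sketch below uses the approximation Z_g ≈ Σ_l w_l (σ_l / σ_g) z_l, where w_l are the prediction weights, z_l the GWAS z-scores, σ_l the SNP standard deviations, and σ_g² = wᵀ Γ w for a reference SNP covariance matrix Γ. This is a simplified sketch and not the S-PrediXcan software itself; the function name and all inputs are hypothetical.

```python
import math


def gene_level_z(weights, gwas_z, snp_cov):
    """Approximate a gene-level Z-score from SNP weights and GWAS z-scores.

    weights : prediction-model weights, one per SNP (hypothetical values)
    gwas_z  : GWAS z-scores (beta / se) for the same SNPs
    snp_cov : SNP covariance matrix from a reference panel (list of lists)
    """
    n = len(weights)
    # Variance of predicted expression: sigma_g^2 = w' * Cov * w
    var_g = sum(weights[i] * snp_cov[i][j] * weights[j]
                for i in range(n) for j in range(n))
    sigma_g = math.sqrt(var_g)
    # Weighted sum of SNP z-scores, scaled by sigma_l / sigma_g
    return sum(weights[i] * math.sqrt(snp_cov[i][i]) / sigma_g * gwas_z[i]
               for i in range(n))


# Toy example: two uncorrelated SNPs with equal weights.
z_gene = gene_level_z([0.5, 0.5], [2.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
print(round(z_gene, 3))
```

With a single SNP and a positive weight the gene-level Z reduces to that SNP's GWAS z-score, which is a convenient sanity check on the scaling.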
S-PrediXcan found a total of 31 genes at 19 distinct loci showing significant (FDR < 0.05) association with ALS risk in at least one tissue. Among the 19 loci, 5 (1q23.3, 6q14.1, 16q24.1, 17p13.2, 22q13.33) are newly identified (Fig. 2B), highlighting 6 genes (NR1I3, PCP4L1, UBE3D, ZDHHC7, MIS12, DENND6B). In addition, 16 genes have not been previously suggested as functional candidates of ALS (Fig. 2A, Supplementary Table S10). Among the 31 genes, the most significant finding was C9orf72 (FDR = 5.03×10⁻¹⁸ in Brain_Nucleus_accumbens_basal_ganglia), which was orders of magnitude more significant than any other gene in any tissue (minimum FDR = 0.001). C9orf72, the most well-established ALS risk gene, was significant not only in 11 brain regions but also in pituitary, skeletal muscle, and blood. The second most significant finding was SCFD1 at 14q12, which also showed a significant association with ALS risk in 10 brain regions, pituitary, skeletal muscle, and blood (minimum FDR = 0.001 in Brain_Cerebellar_Hemisphere). This gene is also a well-established candidate gene for ALS risk. The 16 newly identified genes had similar significance levels (FDR between 0.001 and 0.05), and each was significant in at most three tissues. Among these 16 genes, 13 from 11 loci were significant in at least one brain region, and 3 from 3 different loci were significant only in non-brain tissues, i.e., skeletal muscle (12q13.3 PIP4K2C), blood (17q12 DHRS11), and pituitary (16q24.1 ZDHHC7).
S-MultiXcan found a total of 22 genes at 14 distinct loci, among which 6 genes at 6 distinct loci (8q13.2 ARFGEF1, 12q13.3 OS9, 12q14.1 CTDSP2, 12q24.31 RP11-173P15.9, 13q12.3 LINC00426, 15q25.2 HDGFRP3) were not identified by S-PrediXcan (Fig. 2A, Supplementary Table S11). Of these 6 genes, one (15q25.2 HDGFRP3) has been reported in a previous TWAS study [14], and the other 5 represent new findings. Among these 5 genes, the most significant was ARFGEF1 (FDR = 6.4×10⁻⁴), which is involved in vesicular trafficking and has previously been suggested to play a role in ALS pathogenesis based on Gene Ontology [62].
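The FDR < 0.05 threshold used above can be sketched as follows. The text does not state which FDR procedure was applied, so this example assumes the Benjamini-Hochberg step-up procedure; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical gene-level p-values (illustration only).
p_raw = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(p_raw)
significant = [i for i, q in enumerate(q_values) if q < 0.05]
print(q_values, significant)
```

Only the first hypothetical gene passes the 0.05 cut-off here, even though three of the four raw p-values are below 0.05.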
Overall, our S-PrediXcan and S-MultiXcan together identified 21 novel genes at 15 novel loci, complementing the lists of previously established functional candidate genes and ALS-associated loci.
Colocalization highlights genotype-mediated genes in corresponding tissues. GWAS associations driven by eQTLs may indicate functional mechanisms. However, few studies have investigated colocalization between ALS-associated loci and eQTLs. We conducted a series of eQTL colocalization analyses in 13 brain tissues, pituitary, skeletal muscle, and blood using eQTL datasets from GTEx v8 and the eQTLGen consortium. These analyses identified a total of 9 genes at 5 loci showing significant evidence (PP4 > 0.75 & PP3 + PP4 > 0.9 & PP4/PP3 > 3, Fig. 3A, Supplementary Table S12) of colocalization with eQTLs in at least one of the tissues investigated. These included 5q33.1 (TNIP1, GPX3), 9p21.2 (C9orf72), 10q25.2 (ZDHHC6, ACSL5), 14q12 (SCFD1, G2E3), and 14q32.12 (TRIP11, RP11-529H20.6).
The strongest signal according to the PP4/PP3 ratio was identified for rs2453555 at 9p21.2 (PP4/PP3 = 82.5), which was highly significantly associated with ALS risk (GWAS p = 6.5×10⁻³⁰) and at the same time was a highly significant eQTL of C9orf72 in the pituitary gland (eQTL p = 4.4×10⁻¹²), strongly suggesting a causal relationship (Fig. 3B). This SNP was also a significant eQTL of C9orf72 in spinal cord cervical, albeit at a much lower significance level (eQTL p = 8.1×10⁻⁶), and thus had weaker evidence of colocalization there (PP4/PP3 = 5.2). The association at this locus showed no evidence of colocalization with eQTLs in the other tissues investigated. The SNP rs2453555 is in very high linkage disequilibrium (LD) with the most significant SNP (rs3849943, p = 3.8×10⁻³⁰, r² = 0.98) in the GWAS of Nicolas et al. [10]. This result pinpoints rs2453555 as a variant that may regulate the expression of C9orf72 in the pituitary and consequently modify the risk of ALS. A very recent study did not find colocalization signals for C9orf72 [63]. Compared with that study, ours used the newest version of GTEx, in which sample sizes are on average 24% larger.
The second strongest signal was observed for 10q25.2 ZDHHC6 in the cerebellum (PP4/PP3 = 56.7) as well as in 5 other brain tissues (PP4/PP3 from 5.1 to 19.9). The other gene at this locus (ACSL5) also showed significant colocalization (5.1), but the evidence was much weaker than for ZDHHC6 and the signal was observed only in blood. The third signal was 5q33.1 TNIP1, with colocalization signals in the cerebellar hemisphere (18.0) and blood (9.7) but not in other tissues. The other gene at this locus (GPX3) was significant only in blood (10.0). The fourth signal was 14q12 SCFD1 in six brain tissues, skeletal muscle, and blood with similar strength of evidence (PP4/PP3 between 9.9 and 12, except in nucleus accumbens basal ganglia, where PP4/PP3 = 6.0). The other gene at this locus (G2E3) was detected only in skeletal muscle with weaker evidence (5.5). The last signal was 14q32.12 TRIP11 in the cerebellum, cerebellar hemisphere, pituitary, skeletal muscle, and blood with similar strength of evidence (5.1–6.1). The other gene at this locus (RP11-529H20.6) showed a relatively weak colocalization signal in blood (3.7). These results provided direct evidence of causality for a specific set of SNPs, genes, and corresponding tissues likely to be functionally involved in the development of ALS (Supplementary Table S12).
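The three colocalization criteria stated above (PP4 > 0.75, PP3 + PP4 > 0.9, PP4/PP3 > 3) can be written as a small filter. This is a minimal sketch: the function name is hypothetical, the posterior probabilities in the example are invented for illustration, and the handling of PP3 = 0 is an assumption made only to avoid division by zero.

```python
def is_colocalized(pp3, pp4, min_pp4=0.75, min_sum=0.9, min_ratio=3.0):
    """Apply the three thresholds from the text to COLOC posteriors.

    pp3: posterior probability of two distinct causal variants
    pp4: posterior probability of one shared causal variant
    """
    if pp3 <= 0:
        # Assumption: with PP3 = 0 the ratio is unbounded, so it passes.
        ratio_ok = pp4 > 0
    else:
        ratio_ok = pp4 / pp3 > min_ratio
    return pp4 > min_pp4 and (pp3 + pp4) > min_sum and ratio_ok


# Hypothetical (PP3, PP4) pairs for four gene-tissue combinations.
examples = {
    "strong": (0.05, 0.90),        # passes all three criteria
    "borderline": (0.24, 0.76),    # ratio about 3.2, still passes
    "low_pp4": (0.30, 0.65),       # PP4 too low
    "underpowered": (0.10, 0.78),  # PP3 + PP4 below 0.9
}
calls = {name: is_colocalized(pp3, pp4) for name, (pp3, pp4) in examples.items()}
print(calls)
```

Requiring PP3 + PP4 > 0.9 guards against loci where the data cannot distinguish any causal configuration, while the ratio criterion favours a shared variant over two distinct ones.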
SMR illustrates the causal relationships between SNPs, gene expression, and ALS risk. SMR & HEIDI is a powerful approach to test whether the effect of a SNP on a phenotype is mediated by transcription. Four studies [13, 52, 57, 64] have previously conducted SMR & HEIDI analyses for ALS. However, they either used a smaller GWAS or a limited number of tissues. Here, based on the GWAS of Nicolas et al. (n = 80,610), we additionally analyzed 9 brain tissues, pituitary, and skeletal muscle from GTEx v8.
Our SMR & HEIDI analysis identified a total of 9 genes from 6 loci with significant evidence of mediating the genetic associations observed in these loci (FDR < 0.05 & HEIDI > 0.01, Table 1). These included 3q24 PLOD2, 9p21.2 C9orf72, 10q22.2 NDST2, 14q12 SCFD1, 17q12 GGNBP2, MYO19, DYNLL2, ZNHIT3, and 22q13.33 PLXNB2. Among these 6 loci, 3q24 and 22q13.33 have not been previously reported. At 3q24, PLOD2 showed significant evidence of mediating the association between rs149615181 and ALS risk in skeletal muscle. This gene encodes a membrane-bound homodimeric enzyme localized to the cisternae of the rough endoplasmic reticulum, which plays a critical role in the stability of intermolecular collagen crosslinks and the progressive degradation of the extracellular matrix, but little is known about its potential relationship with ALS. At 22q13.33, PLXNB2 was significant in blood. This gene encodes a transmembrane receptor involved in axon guidance and cell migration in response to semaphorins [65], and it was recently shown to mediate the neurogenesis and neuroprotective activities of angiogenin [66], which has been implicated in ALS and AD [67].
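To make the SMR test concrete, the sketch below computes the standard single-SNP SMR estimate b_xy = b_zy / b_zx and the approximate test statistic T_SMR = z_zy² z_zx² / (z_zy² + z_zx²), which is compared with a chi-square distribution with 1 degree of freedom. Here z denotes the SNP, x gene expression, and y the trait. It is a simplified illustration, not the SMR software; the function name and the input effect sizes are hypothetical, and the HEIDI heterogeneity test is not implemented.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Return (b_xy, T_SMR, p) for one SNP, one gene and one trait.

    b_zx, se_zx : SNP effect on expression (eQTL) and its standard error
    b_zy, se_zy : SNP effect on the trait (GWAS) and its standard error
    """
    z_zx = b_zx / se_zx  # eQTL z-score
    z_zy = b_zy / se_zy  # GWAS z-score
    b_xy = b_zy / b_zx   # estimated effect of expression on the trait
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)
    # Upper tail of a chi-square with 1 df: P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_value


# Hypothetical summary statistics (illustration only).
b_xy, t_smr, p_smr = smr_test(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.02)
print(b_xy, t_smr, p_smr)
```

Because T_SMR is bounded by the smaller of the two squared z-scores, a weak eQTL limits the power of the test regardless of how strong the GWAS signal is.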
Table 1
Genes mediating the genetic associations with ALS in 6 loci from SMR & HEIDI analyses
Locus | Gene | SNP | FDR | HEIDI | Tissue | Database |
3q24 | PLOD2 | rs149615181 | 0.04 | 0.69 | Muscle_Skeletal | GTEx v8 |
9p21.2 | C9orf72 | rs2453565 | 2.00×10⁻⁵ | 0.14 | Pituitary | GTEx v8 |
| | rs700795 | 0.02 | 0.19 | Brain_Spinal_cord_cervical_c-1 | GTEx v8 |
10q22.2 | NDST2 | rs11000785 | 0.05 | 0.07 | Blood | eQTLGen |
14q12 | SCFD1 | rs7144204 | 3.6×10⁻³ | 0.10 | Blood | eQTLGen |
| | rs448175 | 0.01 | 0.35 | Blood | GTEx v8 |
| | rs229152 | 0.03 | 0.45 | Brain_Cerebellum | GTEx v8 |
| | rs229243 | 0.04 | 0.31 | Muscle_Skeletal | GTEx v8 |
| | rs2070339 | 0.03 | 0.22 | Multiple brain regions | Brain-eMeta |
17q12 | GGNBP2 | rs11650008 | 0.03 | 0.04 | Blood | eQTLGen |
| MYO19 | rs7222903 | 0.04 | 0.60 | Blood | eQTLGen |
| DYNLL2 | rs2877858 | 0.04 | 0.16 | Blood | eQTLGen |
| ZNHIT3 | rs4796224 | 0.05 | 0.50 | Blood | eQTLGen |
22q13.33 | PLXNB2 | rs62241220 | 0.02 | 0.77 | Blood | eQTLGen |
All genes with FDR < 0.05 & HEIDI > 0.01 are shown. Loci that have not been identified in previous GWASs or post-GWASs are indicated in bold. Genes that have not been reported in previous SMR studies are indicated in bold. Tissues that have not been reported in previous SMR studies are indicated in bold.
Interestingly, at 9p21.2, C9orf72 was found to mediate the association between rs2453565 and the risk of ALS in the pituitary with high significance (FDR = 2×10⁻⁵), orders of magnitude more significant than in other brain and non-brain tissues. This SNP is in very high LD with rs2453555 (r² = 0.95), which was identified by our colocalization analysis as described above. This result strengthens the evidence for a causal chain linking rs2453555/rs2453565, expression of C9orf72 in the pituitary, and the risk of ALS. In the very recent study of van Rheenen et al. [57], the HEIDI test rejected the hypothesis that expression of C9orf72 mediates the association between rs2453555 and ALS risk in blood. Our finding points to the pituitary as the tissue in which C9orf72 most likely plays a functional role in the development of ALS.
At 14q12, SCFD1 in blood (FDR = 3.6×10⁻³ in eQTLGen), cerebellum (FDR = 0.03), and skeletal muscle (FDR = 0.04) showed significant mediating effects on the genetic association with ALS. Notably, rs229243 was found to increase the risk of ALS by modifying SCFD1 expression in skeletal muscle. This SNP is also a significant eQTL of SCFD1 in skeletal muscle, as found in our colocalization analysis (PP4/PP3 = 9.9). Interestingly, SCFD1 had different effect directions in blood, skeletal muscle, and cerebellum (Supplementary Table S13). A very recent SMR analysis [64] found that rs229243 had a regulatory effect on ALS risk mediated by the expression of SCFD1 in blood (GTEx) and cerebellum. Our SMR and colocalization results thus provide further evidence for skeletal muscle as an additional tissue of possible functional relevance.
At 17q12, expressions of four genes (GGNBP2, MYO19, DYNLL2, ZNHIT3) in the blood significantly mediated the genetic association in this locus (Supplementary Table S13). Among these four genes, GGNBP2 was the most significant (FDR = 0.03), consistent with the finding from a previous study [52].
Integration of evidence pinpoints causal genes in corresponding tissues. Overall, our study identified a total of 43 genes at 24 loci showing significant evidence of causality (Supplementary Table S14). Among these 43 genes, 23 genes at 17 loci have not been functionally linked with ALS in previous studies. Among the 24 loci, 10 loci (9 from TWAS, one from SMR) have not been associated with ALS risk in previous studies.
A total of 8 genes at 6 loci were significant in at least two out of three analyses, i.e., TWAS, colocalization analysis and SMR analysis. These included 5q33.1 TNIP1, 9p21.2 C9orf72, 10q25.2 ACSL5, 10q22.2 NDST2, 14q12 SCFD1, 17q12 MYO19, GGNBP2, and ZNHIT3. All these 6 loci have been previously associated with ALS risk, and all 8 genes have been previously suggested as the functional candidates. Integrating the results from three different analyses conducted in various tissues, our study further revealed their most likely corresponding functional tissues (Table 2).
Table 2
Integration of TWAS, COLOC and SMR results.
Locus | Gene | Tissue | TWAS | COLOC (PP4/PP3) | SMR (HEIDI) | Overall evidence |
5q33.1 | TNIP1 | Blood | 2.2×10⁻³ | 0.91 (9.70) | 3.6×10⁻³ (6.7×10⁻⁴) | ** |
9p21.2 | C9orf72 | Brain_Spinal_cord_cervical_c-1 | 7.00×10⁻¹³ | 0.81 (5.10) | 1.5×10⁻² (0.19) | *** |
| C9orf72 | Pituitary | 5.00×10⁻¹⁴ | 0.99 (82.50) | 2.00×10⁻⁵ (0.14) | *** |
10q25.2 | ACSL5 | Blood | 2.7×10⁻² | 0.82 (5.10) | - | ** |
10q22.2 | NDST2 | Blood | 3.3×10⁻² | - | 4.9×10⁻² (0.07) | ** |
14q12 | SCFD1 | Brain_Cerebellum | 1.7×10⁻³ | 0.92 (11.81) | 3.1×10⁻² (0.45) | *** |
| SCFD1 | Muscle_Skeletal | 3.7×10⁻² | 0.91 (9.85) | 3.8×10⁻² (0.31) | *** |
| SCFD1 | Blood | 1.5×10⁻³ | 0.92 (12.00) | 1.1×10⁻² (0.35) | *** |
| SCFD1 | Brain_Anterior_cingulate_cortex_BA24 | 3.3×10⁻³ | 0.92 (11.85) | - | ** |
| SCFD1 | Brain_Cerebellar_Hemisphere | 1.1×10⁻³ | 0.92 (11.35) | - | ** |
| SCFD1 | Brain_Cortex | 2.4×10⁻³ | 0.91 (10.75) | - | ** |
| SCFD1 | Brain_Frontal_Cortex_BA9 | 1.3×10⁻³ | 0.92 (11.80) | - | ** |
| SCFD1 | Brain_Nucleus_accumbens_basal_ganglia | 2.7×10⁻³ | 0.77 (6.01) | - | ** |
17q12 | MYO19 | Blood | 2.2×10⁻³ | - | 4.0×10⁻² (0.6) | ** |
| GGNBP2 | Blood | 1.3×10⁻² | - | 2.5×10⁻³ (0.04) | ** |
| ZNHIT3 | Blood | 3.5×10⁻² | - | 4.9×10⁻² (0.5) | ** |
Blood refers to whole blood from GTEx v8 or eQTLGen, whichever is more significant. The TWAS column gives the p-value (FDR). The COLOC column gives PP4, with PP4/PP3 in parentheses. The SMR column gives the p-value (FDR) of the SMR test, with the p-value of the HEIDI test in parentheses. The number of asterisks in the overall evidence column represents the number of significant results from the three analyses.
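The asterisk scoring in Table 2 can be reproduced with a short sketch that counts how many of the three analyses are significant for a gene-tissue pair. It assumes the thresholds given in the text (TWAS FDR < 0.05; COLOC PP4 > 0.75 and PP4/PP3 > 3; SMR FDR < 0.05 with HEIDI > 0.01). The PP3 + PP4 > 0.9 criterion is omitted because PP3 is not tabulated, and the function name is hypothetical. The example inputs are copied from the TNIP1, C9orf72 (pituitary), and ACSL5 rows of Table 2.

```python
def overall_evidence(twas_fdr=None, coloc=None, smr=None):
    """Return one asterisk per significant analysis.

    twas_fdr : TWAS FDR, or None if not available
    coloc    : (PP4, PP4/PP3) pair, or None if not available
    smr      : (SMR FDR, HEIDI p-value) pair, or None if not available
    """
    hits = 0
    if twas_fdr is not None and twas_fdr < 0.05:
        hits += 1
    if coloc is not None:
        pp4, ratio = coloc
        if pp4 > 0.75 and ratio > 3:
            hits += 1
    if smr is not None:
        smr_fdr, heidi_p = smr
        # A HEIDI p-value at or below 0.01 indicates heterogeneity,
        # so the SMR result is not counted.
        if smr_fdr < 0.05 and heidi_p > 0.01:
            hits += 1
    return "*" * hits


tnip1_blood = overall_evidence(2.2e-3, (0.91, 9.70), (3.6e-3, 6.7e-4))
c9orf72_pituitary = overall_evidence(5.0e-14, (0.99, 82.50), (2.0e-5, 0.14))
acsl5_blood = overall_evidence(2.7e-2, (0.82, 5.10), None)
print(tnip1_blood, c9orf72_pituitary, acsl5_blood)
```

The TNIP1 row shows why the HEIDI filter matters: its SMR FDR is below 0.05, but the HEIDI p-value of 6.7×10⁻⁴ fails the > 0.01 requirement, so only two analyses count.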
The most significant finding was for 9p21.2 C9orf72, which was highly significant in all three analyses. All three analyses pinpointed the pituitary as the most likely functional tissue, with results orders of magnitude more significant than in any other tissue. We thus propose that, in the pituitary, the expression of C9orf72, regulated by rs2453555, is causally associated with ALS risk.
14q12 SCFD1 was significant in all three analyses in cerebellum, skeletal muscle, and blood, and was supported by two analyses in multiple other brain tissues, emphasizing the multi-tissue effect of SCFD1.
The remaining 4 loci (5q33.1, 10q25.2, 10q22.2, and 17q12) were supported by two analyses, but all pointed to blood rather than brain tissues as the causal tissue. This finding is somewhat surprising and requires experimental validation in future studies. For 17q12, three genes (MYO19, GGNBP2, and ZNHIT3) are functional candidates. A previous study [14] suggested MYO19 as the most likely functional candidate at this locus, while another [52] suggested GGNBP2. Our analysis suggested that GGNBP2 is a weaker candidate than the other two, because its HEIDI p-value was closer to the rejection threshold (p = 0.04), indicating more evidence of heterogeneity.