Study design
In this study, we used the SomaLogic SomaScan assay to measure the levels of 7,029 protein analytes in the CSF of 2,286 participants from the Knight ADRC,21 FACE, ADNI, and Barcelona-1 cohorts. We employed a three-stage analytical approach (stage 1, stage 2, and meta-analysis) to identify robust alterations in the AD CSF proteome. Based on the AT(N) framework, the discovery analysis (stage 1) was performed in the Knight ADRC and FACE cohorts (n=1,170; A-T- = 680, corresponding to biomarker-negative individuals, and A+T+ = 490, corresponding to biomarker-positive individuals). Proteins that were significant after false discovery rate correction (FDR < 0.05) were then tested for replication in stage 2 using the ADNI and Barcelona-1 cohorts (n=593; A-T- = 235 and A+T+ = 358). Finally, a meta-analysis of stages 1 and 2 was performed to identify robust protein associations passing a more stringent Bonferroni correction (Bonferroni-adjusted P < 0.05). We further validated these proteins in a completely independent CSF proteomics cohort (Stanford ADRC; A-T- = 80 and A+T+ = 27) profiled with a different version of the protein quantification platform (SomaScan 5K).
Using Lasso regression, we identified a distinctive signature of 11 proteins with high predictive power for AD (AUC = 0.97–0.99) in three datasets (stage 1 held-out test set, stage 2, and validation). The signature was specific to AD and showed low predictive power for other neurodegenerative disorders such as frontotemporal dementia (FTD; AUC = 0.61), dementia with Lewy bodies (DLB; AUC = 0.73), or Parkinson’s disease (PD; AUC = 0.57). The signature was also significantly associated with disease progression (β = 0.35, p = 2.1×10⁻⁴) and with the probability of remaining free of AD over time (p = 2.2×10⁻⁵⁸).
A comprehensive examination of protein abundance across AT groups (A-T-, A+T-, and A+T+) revealed distinct protein pseudo-trajectories (longitudinal trajectories estimated from cross-sectional data) spanning the entire AD continuum. Based on these stage-specific changes, we identified four groups of proteins, each with a unique pseudo-trajectory. Group-specific pathway enrichment analysis was performed to understand the biological processes compromised at different stages of the AD continuum. Each group was enriched for several biological systems (nervous system, immune response, biosynthesis, and signal transduction) and specific brain cell types (neurons, astrocytes, and microglia). Overall, the disease and pathway enrichment analyses highlighted several neurological disorders (e.g., AD, tauopathy, and synucleinopathy) and neuronal functions (neuron projection morphogenesis, synapse assembly, and axonogenesis) as significantly enriched (FDR < 0.05) in the altered AD CSF proteome (Table 1, Fig. 1).
Table 1: Demographic information of participants at the time of the CSF draw.
| | Knight ADRC | FACE | ADNI | Barcelona-1 | Stanford ADRC |
|---|---|---|---|---|---|
| Stage | Stage 1 (Discovery) | Stage 1 (Discovery) | Stage 2 (Replication) | Stage 2 (Replication) | Validation |
| Sample size | 836 | 618 | 700 | 132 | 132 |
| Females (%) | 54.31 | 59.22 | 42.86 | 54.55 | 55.30 |
| Males (%) | 45.69 | 40.78 | 57.14 | 45.45 | 44.70 |
| Age, mean (years) | 70.80 | 72.14 | 73.49 | 68.16 | 68.70 |
| Age, SD (years) | 8.51 | 8.44 | 7.50 | 8.19 | 7.55 |
| A+T+ (%) | 20.45 | 51.62 | 40.00 | 59.09 | 20.45 |
| A+T- (%) | 55.62 | 34.79 | 31.43 | 11.36 | 18.94 |
| A-T- (%) | 23.92 | 13.59 | 28.57 | 29.55 | 60.61 |
| APOE4+ (%) | 39.00 | 26.70 | 50.43 | 50.00 | 46.21 |
This table summarizes basic demographic information of the CSF proteomics study participants. For each cohort, we report the sample size, percentage of females and males, mean age and its standard deviation (SD), percentage of A+T+, A+T-, and A-T- participants, and percentage of APOE4+ individuals. Abbreviations: Knight ADRC, Knight Alzheimer’s Disease Research Center; ADNI, Alzheimer's Disease Neuroimaging Initiative; SD, standard deviation.
Identification of AD-specific CSF proteomic alterations
We performed a three-stage study to identify significant alterations in the AD CSF proteome (Fig. 2B). In the first stage, a discovery analysis was performed on 1,170 individuals (A+T+ = 490, A-T- = 680) from the Knight ADRC and FACE studies (Fig. 2A). We identified 3,565 proteins with significantly different levels (FDR < 0.05) between A-T- (biomarker negative, a proxy for controls) and A+T+ (biomarker positive, a proxy for AD cases) individuals (Fig. 2C and Supplementary Table 1). Consistent with previous proteomic studies,17–19,22–24 the significantly upregulated proteins included known AD biomarkers such as SMOC1 (FDR = 2.8×10⁻¹⁸¹), the 14-3-3 protein YWHAG (P = 4.1×10⁻¹⁷⁹), PPP3R1 (FDR = 1.8×10⁻¹⁴⁶), and NRGN (FDR = 1.8×10⁻⁹⁰).
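The stage 1 significance threshold above is an FDR cut-off. The text does not name the FDR procedure, so the sketch below assumes the Benjamini–Hochberg step-up method, the most common choice. It shows how per-protein p-values from the A+T+ vs A-T- comparison map to adjusted values (q-values) and to the set of proteins carried forward; the example p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) for a vector of p-values."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                            # indices of p sorted ascending
    scaled = p[order] * m / np.arange(1, m + 1)      # p_(i) * m / i
    # step-up: each q-value is the minimum of the scaled values at or above its rank
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)               # map back to the input order
    return q

# Hypothetical stage 1 p-values for five proteins (A+T+ vs A-T-)
stage1_p = [1e-12, 0.004, 0.04, 0.2, 0.8]
stage1_q = benjamini_hochberg(stage1_p)
significant = stage1_q < 0.05                        # proteins carried to stage 2
```

With 7,029 analytes, `m` would be the number of proteins tested in stage 1; the adjustment is applied once over all of them rather than per protein.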
In the second stage, the proteins that showed significant associations in stage 1 were tested in stage 2, which comprised 593 individuals (A-T- = 235, A+T+ = 358) from ADNI and Barcelona-1 (Fig. 2A). Of the 3,565 proteins identified in stage 1, 2,608 replicated in stage 2 after FDR correction (FDR < 0.05) with a consistent direction of effect (Fig. 2D and Supplementary Table 2). Of these, 1,693 were upregulated in A+T+ (cases) compared with A-T- (controls), and 915 were downregulated.
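The replication criterion combines two conditions: stage 2 FDR < 0.05 and the same sign of effect as in stage 1. A minimal sketch, assuming the stage 2 FDR is computed (again with Benjamini–Hochberg) only over the proteins carried forward from stage 1; the effect sizes and p-values are hypothetical, and `replicates` is an illustrative helper name.

```python
import numpy as np

def bh_adjust(p):
    """Benjamini-Hochberg q-values (same step-up procedure as in stage 1)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    q = p[order] * m / np.arange(1, m + 1)
    q = np.minimum.accumulate(q[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out

def replicates(beta1, beta2, p2, alpha=0.05):
    """Mask of stage 1 hits that pass stage 2 FDR with a consistent effect direction."""
    q2 = bh_adjust(p2)                               # FDR over the carried-forward proteins
    same_direction = np.sign(beta1) == np.sign(beta2)
    return (q2 < alpha) & same_direction

# Hypothetical effects (A+T+ vs A-T-) for four stage 1 hits
beta1 = np.array([0.8, 0.5, -0.6, -0.4])
beta2 = np.array([0.6, -0.3, -0.5, -0.1])
p2 = np.array([1e-6, 1e-4, 2e-3, 0.6])
mask = replicates(beta1, beta2, p2)
n_up = int(np.sum(mask & (beta2 > 0)))               # upregulated in cases
n_down = int(np.sum(mask & (beta2 < 0)))             # downregulated in cases
```

The second protein is significant in stage 2 but flips sign, so it is not counted as replicated.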
In the third stage, we performed a meta-analysis combining the p-values from stages 1 and 2 for the proteins that replicated in stage 2, and applied a stringent Bonferroni correction to minimize the chance of false-positive results (Fig. 2A). The meta-analysis identified 2,173 proteins associated with AT status after Bonferroni correction (Fig. 2E and Supplementary Table 3). Finally, we validated these findings using CSF proteomics data from an independent study (Stanford ADRC) that employed a different proteomic panel (SomaScan 5K). Because this validation cohort was small (n=132; Table 1), we assessed the consistency of effect sizes and significance (p-values) across studies. We observed strong correlations between the meta-analysis and the Stanford ADRC study for both effect sizes (corr = 0.90, p = 3.3×10⁻¹⁸⁷) and p-values (corr = 0.82, p = 1.5×10⁻¹³⁸; Figure S2). This independent validation supports the robustness of the meta-analysis results across platform versions. We used the 2,173 proteins that passed Bonferroni correction in stage 3 for downstream analyses: disease prediction models and pathway enrichment (Fig. 1).
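The text says the stage 1 and stage 2 p-values were combined but does not name the method. As an illustration only, the sketch below uses Fisher's method, under which X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom for k independent studies, followed by a Bonferroni filter. The Bonferroni denominator (here, the number of proteins entering the meta-analysis) is also an assumption, and the p-value pairs are hypothetical.

```python
import math

def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)      # X / 2
    # closed-form chi-square survival function for even degrees of freedom (2k)
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))

def bonferroni_pass(pvalues, alpha=0.05):
    """True where p multiplied by the number of tests is below alpha."""
    m = len(pvalues)
    return [p * m < alpha for p in pvalues]

# Hypothetical (stage 1, stage 2) p-values for three replicated proteins
pairs = [(1e-10, 1e-6), (0.01, 0.02), (0.2, 0.04)]
meta_p = [fisher_combine(pair) for pair in pairs]
passes = bonferroni_pass(meta_p)
```

Fisher's method ignores effect direction, which is acceptable here only because direction consistency was already required at the replication step; an inverse-variance weighted meta-analysis of effect sizes would be a common alternative.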
Identification of a robust and AD-specific prediction model
Because the full set of differentially abundant analytes (DAA; n=2,173) identified in the multi-stage meta-analysis is too large for a clinically practical proteomic panel for AD diagnosis and prognosis, we used machine learning to identify the minimum number of proteins with high predictive power (Fig. 3A). We trained a least absolute shrinkage and selection operator (Lasso) regression model25 on 70% of the stage 1 data (stage 1 training; n=819). Lasso with five-fold cross-validation selected 56 proteins. To further reduce the size of the signature, redundant proteins whose abundance levels were highly correlated in the stage 1 data (Pearson correlation > 0.8) were removed. Because the signature was also to be assessed in an independent study (Stanford ADRC) that used a different protein quantification panel (SomaScan 5K), only proteins present in both the signature and the Stanford ADRC data (n=25) were retained. Finally, we kept the 11 proteins that contributed significantly to the prediction (P < 0.05 in the multivariable model; Supplementary Table 4). The resulting signature includes well-known AD-associated proteins such as YWHAG,18,22 PIN1,26 and EZR.27
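The selection steps above can be sketched as follows. The text does not specify the Lasso implementation, so this assumes an L1-penalized logistic (binomial) model fitted by proximal gradient descent in NumPy, with the penalty chosen by five-fold cross-validated log-loss. The greedy correlation pruning, which keeps the proteins with the largest absolute coefficients first, is likewise a hypothetical choice. All function names and the simulated data are illustrative.

```python
import numpy as np

def fit_l1_logistic(X, y, lam, n_iter=500):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA).

    X: (n, p) standardised protein levels; y: 0/1 labels (1 = A+T+).
    Minimises mean log-loss + lam * sum(|w|); the intercept is not penalised.
    """
    n = X.shape[0]
    Xa = np.column_stack([X, np.ones(n)])
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        resid = 1.0 / (1.0 + np.exp(-(X @ w + b))) - y
        w = w - step * (X.T @ resid) / n
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-threshold (L1 prox)
        b -= step * resid.mean()
    return w, b

def log_loss(y, prob):
    prob = np.clip(prob, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob))

def select_lambda_cv(X, y, lambdas, n_folds=5, seed=0):
    """Pick the penalty with the lowest mean held-out log-loss over k folds."""
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    mean_loss = []
    for lam in lambdas:
        losses = []
        for k in range(n_folds):
            tr, te = folds != k, folds == k
            w, b = fit_l1_logistic(X[tr], y[tr], lam)
            losses.append(log_loss(y[te], 1.0 / (1.0 + np.exp(-(X[te] @ w + b)))))
        mean_loss.append(np.mean(losses))
    return lambdas[int(np.argmin(mean_loss))]

def drop_correlated(X, candidates, threshold=0.8):
    """Keep candidates in order, skipping any with Pearson r > threshold to one already kept."""
    kept = []
    for j in candidates:
        if all(np.corrcoef(X[:, j], X[:, i])[0, 1] <= threshold for i in kept):
            kept.append(j)
    return kept

# Simulated data: only the first two "proteins" carry signal
rng = np.random.default_rng(1)
X = rng.standard_normal((400, 6))
y = (rng.random(400) < 1 / (1 + np.exp(-(2.5 * X[:, 0] - 2.5 * X[:, 1])))).astype(float)
lam = select_lambda_cv(X, y, [0.005, 0.02, 0.05])
w, b = fit_l1_logistic(X, y, lam)
order = np.argsort(-np.abs(w))                       # strongest coefficients first
signature = drop_correlated(X, [j for j in order if w[j] != 0])
```

The platform-overlap filter and the multivariable P < 0.05 filter described in the text would follow as further subsetting steps on `signature`.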
This model (11 proteins with fixed weights) was assessed in the stage 1 testing (30% of stage 1 data; n=351), stage 2 (replication; n=593), and external validation (Stanford ADRC; n=107) datasets. The model showed strong predictive power for classifying A+T+ vs. A-T- individuals, with an area under the receiver operating characteristic curve (AUC) of 0.98 and 0.97 for the stage 1 testing and stage 2 datasets, respectively, and 0.99 in the independent Stanford ADRC cohort (Fig. 3B). Positive predictive value (PPV) and negative predictive value (NPV) were >0.86 in all cases (Supplementary Table 5). In contrast, a baseline model using only age and sex to predict AT status performed substantially worse in the stage 1 testing, stage 2, and Stanford validation datasets, with AUCs of 0.72, 0.59, and 0.57, respectively (Fig. 3B).
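Evaluating the frozen model on a new dataset involves three quantities: a linear score from the fixed stage 1 weights, the AUC, and PPV/NPV at a fixed cut-off. The sketch below computes the AUC as the Mann–Whitney probability that a randomly chosen case scores higher than a randomly chosen control. The weights, cut-off, and scores are hypothetical, and the linear-score form assumes a logistic-type model.

```python
import numpy as np

def signature_score(X, weights, intercept):
    """Linear score from fixed weights learned in stage 1 training (no refitting)."""
    return np.asarray(X, float) @ np.asarray(weights, float) + intercept

def auc(scores, is_case):
    """ROC AUC as the Mann-Whitney probability that a case outscores a control (ties = 1/2)."""
    s = np.asarray(scores, float)
    c = np.asarray(is_case, bool)
    cases, controls = s[c], s[~c]
    wins = (cases[:, None] > controls[None, :]).sum()
    ties = (cases[:, None] == controls[None, :]).sum()
    return (wins + 0.5 * ties) / (cases.size * controls.size)

def ppv_npv(scores, is_case, cutoff):
    """PPV and NPV when 'signature-positive' means score >= cutoff."""
    pred = np.asarray(scores, float) >= cutoff
    c = np.asarray(is_case, bool)
    tp, fp = np.sum(pred & c), np.sum(pred & ~c)
    tn, fn = np.sum(~pred & ~c), np.sum(~pred & c)
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical scores for two controls and two cases
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))                  # 0.75
print(ppv_npv(scores, labels, cutoff=0.35))
```

The same functions apply unchanged to the specificity analysis, since the weights and cut-off are never re-estimated on the DLB, FTD, or PD groups.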
We also tested whether the same model predicts clinical diagnosis (controls = 724, AD = 882) and obtained an AUC of 0.89 in stages 1 and 2 combined and 0.97 in the Stanford ADRC cohort (Fig. 3C). These high AUCs suggest that the model robustly distinguishes clinically diagnosed AD individuals from controls, in addition to stratifying individuals by AT biomarker status.
To further assess the specificity of this prediction model for AD, we applied it (same proteins, weights, and cut-off as identified in stage 1 training) to other neurodegenerative disorders, including dementia with Lewy bodies (DLB; n=25), frontotemporal dementia (FTD; n=42), and Parkinson’s disease (PD; n=507), as well as to other non-AD individuals (n=335) and healthy controls (n=1,157). The model did not have strong predictive power for these non-AD dementias or PD, with AUCs ranging from a maximum of 0.70 for DLB to a minimum of 0.44 for PD (Fig. 3D). Overall, these results indicate that we identified a signature of 11 proteins with consistently high power for predicting AD clinical or biomarker status. The signature is specific to AD, as it showed low predictive power for other disorders such as FTD, DLB, or PD.
Assessing progression to dementia and rate of memory decline
Next, we asked whether the CSF 11-protein signature can distinguish slow from fast progressors. For this analysis, we focused on individuals with an AD diagnosis at lumbar puncture and modeled the rate of memory decline as the change in Clinical Dementia Rating Sum of Boxes (CDR-SB) per year. We observed a significant separation between the regression slopes of individuals predicted as signature-positive and signature-negative (Fig. 3E; red and green slopes, respectively). Signature-positive individuals showed a faster rate of progression (β = 0.35, p = 2.1×10⁻⁴). No difference in slopes was observed between A-T- and A+T+ individuals (Fig. 3E; blue and orange slopes).
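A difference in slopes such as the reported β can be read as the coefficient of a time × group interaction. The model used in the study is not described here. This sketch fits ordinary least squares of CDR-SB on years since lumbar puncture, signature status, and their interaction, and treats repeated visits as independent. That is a simplification: a linear mixed model with per-person random effects would be the usual choice. The visit data are hypothetical.

```python
import numpy as np

def slope_difference(cdrsb, years, signature_pos):
    """OLS fit of CDR-SB ~ years + group + years:group; returns the interaction term.

    The interaction is the extra CDR-SB points per year in signature-positive
    individuals. Repeated visits are treated as independent (simplification).
    """
    years = np.asarray(years, float)
    g = np.asarray(signature_pos, float)
    design = np.column_stack([np.ones_like(years), years, g, years * g])
    coef, *_ = np.linalg.lstsq(design, np.asarray(cdrsb, float), rcond=None)
    return coef[3]

# Hypothetical visits: 4 people x 4 yearly visits; the last two are signature-positive
years = np.tile([0.0, 1.0, 2.0, 3.0], 4)
pos = np.repeat([0, 0, 1, 1], 4)
cdrsb = 3.0 + 0.8 * years + 0.5 * pos + 0.35 * years * pos
beta = slope_difference(cdrsb, years, pos)   # recovers the simulated 0.35
```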
We also performed a time-to-event analysis to assess whether the signature can identify cognitively normal individuals at lumbar puncture who are more likely to develop AD. Individuals positive for the 11-protein panel had a significantly higher probability of developing AD (p = 2.2×10⁻⁵⁸) than individuals negative for the signature (Fig. 3F). Nearly all signature-positive individuals developed AD within 10 years of the first clinical assessment, whereas signature-negative individuals had a 35% probability of developing AD over the same period.
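The text does not name the survival method. As an illustration, the sketch below implements two standard time-to-event tools: the Kaplan–Meier estimator of AD-free probability and a two-group log-rank test. The p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)). Follow-up times, events, and groups are hypothetical.

```python
import math
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier estimate of AD-free probability at each observed event time.

    time: years of follow-up; event: 1 if AD was diagnosed, 0 if censored.
    """
    time, event = np.asarray(time, float), np.asarray(event, int)
    times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(time >= t)
        d = np.sum((time == t) & (event == 1))
        s *= 1.0 - d / at_risk
        surv.append(s)
    return times, np.array(surv)

def logrank_p(time, event, group):
    """Two-group log-rank test; returns the chi-square (1 df) p-value."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    group = np.asarray(group, bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        dead = (time == t) & (event == 1)
        d, d1 = dead.sum(), (dead & group).sum()
        o_minus_e += d1 - d * n1 / n                 # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical follow-up: signature-positive individuals convert early
times, surv = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
p = logrank_p([1, 2, 3, 4, 5, 10, 11, 12, 13, 14], [1] * 10, [True] * 5 + [False] * 5)
```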
In summary, these results indicate that the AD CSF proteomic signature is a better predictor of dementia progression than AT status. Furthermore, individuals predicted to be positive for this signature have a significantly lower probability of remaining free of AD than signature-negative individuals.
The CSF proteome exhibits distinct protein abundance patterns throughout the AD continuum
Following the AT classification system, we categorized individuals into three groups that together cover the entire AD continuum: biomarker-negative individuals (A-T-), individuals in early AD stages with amyloid positivity but tau negativity (A+T-), and fully biomarker-positive individuals (A+T+). The goal of these analyses was to determine how protein levels change across the AD continuum (pseudo-trajectories), whether these changes follow specific patterns, and which pathways are associated with them.
Based on the differences in the estimates and their significance across three differential abundance analyses (A-T- vs. A+T-, A+T- vs. A+T+, and A-T- vs. A+T+), we identified four distinct groups of proteins (Fig. 4A and Supplementary Table 6). Group one (G1) included 471 proteins that showed a consistent “linear increase” in abundance from healthy controls (A-T-) to the asymptomatic (A+T-) to the AD (A+T+) stage. Group two (G2) included 482 proteins that followed an “up-down” trend, increasing from the biomarker-negative to the early stage and then decreasing at full biomarker positivity. Group three (G3) included 184 protein analytes that showed a consistent “linear decrease” from biomarker negative to biomarker positive. Finally, group four (G4) showed the opposite behavior of G2, a “down-up” trajectory in which an initial decrease was followed by an increase.
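The exact decision rule that maps the three contrasts to the four groups is not given in the text. The sketch below is one hypothetical rule. A step counts as “up” or “down” only if its FDR is below 0.05. Opposite significant early (A+T- vs A-T-) and late (A+T+ vs A+T-) steps define G2 or G4. Otherwise, a significant overall (A+T+ vs A-T-) change with no opposing step defines G1 or G3. The function names and thresholds are illustrative.

```python
def step(estimate, fdr, alpha=0.05):
    """+1 / -1 for a significant increase / decrease, 0 if not significant."""
    if fdr >= alpha:
        return 0
    return 1 if estimate > 0 else -1

def trajectory_group(early, late, overall, alpha=0.05):
    """Each argument is an (estimate, FDR) pair for one contrast.

    early: A+T- vs A-T-; late: A+T+ vs A+T-; overall: A+T+ vs A-T-.
    Returns a group label, or None if the protein fits no pattern.
    """
    e, l, o = (step(est, fdr, alpha) for est, fdr in (early, late, overall))
    if e == 1 and l == -1:
        return "G2 up-down"
    if e == -1 and l == 1:
        return "G4 down-up"
    if o == 1 and -1 not in (e, l):
        return "G1 linear increase"
    if o == -1 and 1 not in (e, l):
        return "G3 linear decrease"
    return None

print(trajectory_group((0.3, 0.01), (0.4, 0.01), (0.7, 1e-5)))   # G1 linear increase
```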
G1 (linear increase) includes key AD-associated proteins such as SPARC-related modular calcium-binding protein 1 (SMOC1,28 Extended Data Fig. 1), Neurofilament Light Chain (NEFL),29 Glial Fibrillary Acidic Protein (GFAP),30 Granulin Precursor (GRN),31 Protein Phosphatase 3 Regulatory Subunit B, Alpha (PPP3R1),32 and Alpha-Synuclein (SNCA).33 In addition to these established AD biomarkers, this group also includes NCK Adaptor Protein 2 (NCK2) and SHANK Associated RH Domain Interactor (SHARPIN), which are located at two known AD risk loci.34 Recent studies have shown that SMOC1 colocalizes with Aβ plaques in the brain35 and that its CSF levels increase almost 30 years before AD symptom onset.36
G2 (up-down) also includes proteins located at multiple known AD risk loci, such as SPI1 and Protein Tyrosine Kinase 2 Beta (PTK2B), as well as other proteins implicated in AD or neurodegeneration, such as Brain-Derived Neurotrophic Factor (BDNF),37,38 Cathepsin D (CTSD),39,40 and Nuclear Factor Kappa B Subunit 1 (NFKB1).41,42 Key proteins in G3 (linear decrease) include Carboxylesterase 1 (CES1),43 Interleukin 6 (IL6),44,45 and Forkhead Box O1 (FOXO1),46,47 which have been implicated in various metabolic, age-related, and immune system-related mechanisms underlying AD pathogenesis. Finally, consistent with a previous study,48 we found Triggering Receptor Expressed On Myeloid Cells 2 (TREM2) in G4 (down-up): its levels decreased from controls to the asymptomatic stage and were then significantly elevated in AD individuals. Besides TREM2, G4 contains various other proteins implicated in AD, including Apolipoprotein E (APOE),49 Neurogranin (NRGN),50,51 ADAM Metallopeptidase Domain 17 (ADAM17),52,53 and Nectin Cell Adhesion Molecule 2 (NECTIN2).54 Overall, these results identified four groups of proteins based on their estimated trajectories across the AD continuum, each including known proteins implicated in AD or neurodegeneration.
Network and pathway analysis of the CSF proteome reveals novel proteins related to AD pathophysiology
To identify the specific biological processes associated with each group of proteins with a unique trajectory, we conducted a functional pathway enrichment analysis using a set of selected topologically important proteins (Fig. 4F-I). To gain a systems-level understanding of the proteins in specific pathways, we used the STRING database55 to extract protein-protein interaction (PPI) information among the constituent proteins of the top 10 pathways.
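The enrichment test behind the pathway FDRs is not specified in the text. A common choice is over-representation analysis with a one-sided hypergeometric test. The sketch below asks how likely it is to see at least k pathway members among a group's n proteins when the K pathway members are drawn from a background of N measured proteins. Using all measured proteins as the background is an assumption. The resulting p-values would then be FDR-adjusted across pathways.

```python
from math import comb

def hypergeom_enrichment_p(k, n_group, K_pathway, N_background):
    """One-sided P(X >= k) for X ~ Hypergeometric(N_background, K_pathway, n_group).

    k: group proteins annotated to the pathway; n_group: proteins in the group;
    K_pathway: background proteins annotated to the pathway;
    N_background: all proteins measured on the panel (assumed background).
    """
    upper = min(n_group, K_pathway)
    hits = sum(comb(K_pathway, i) * comb(N_background - K_pathway, n_group - i)
               for i in range(k, upper + 1))
    return hits / comb(N_background, n_group)        # exact big-integer ratio, then float

# Toy example: 3 of 3 group proteins fall in a 4-protein pathway out of 10 measured
p = hypergeom_enrichment_p(k=3, n_group=3, K_pathway=4, N_background=10)   # 4/120
```

Python integers are arbitrary precision, so the binomial coefficients stay exact even for panel-sized numbers such as N = 7,029.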
G1 appears to capture neuronal death, apoptosis, and defects in phosphorylation/dephosphorylation. Specifically, proteins in G1 were enriched in nervous system-related pathways (Fig. 4F and Supplementary Table 7), including pathways of neurodegeneration – multiple diseases (FDR = 1.6×10⁻⁵), glutamatergic synapse (FDR = 1.8×10⁻⁴), dopaminergic synapse (FDR = 3.40×10⁻⁴), and Parkinson’s disease (FDR = 9.2×10⁻⁵), among others. The dopaminergic synapse pathway includes known kinases (GSK3A) and phosphatases (PPP3CA and PPP2R5D). GSK3A and calcineurin (PPP3CA) are known to be involved in the regulation of tau phosphorylation,56 and variants in PPP2R5D cause an autosomal dominant neurodevelopmental disorder, Jordan’s syndrome,57 although, to our knowledge, this is the first time this protein has been implicated in AD. The glutamatergic synapse pathway includes proteins known to be part of causal AD pathways, such as another calcineurin subunit (PPP3R1), which has been reported to be associated with phospho-tau levels and rate of memory decline,56 and HOMER1.58 This pathway also includes DLG4 and GLUL, both neuronal-specific proteins involved in signal transduction. G1 also includes several proteins implicated in Parkinson’s disease, such as PRKN, SNCA, and PARK7.59,60 These proteins may help explain why around 30% of AD cases have Lewy body pathology, which is typically found in PD. The G1 network also contains NCK2 and SHARPIN, located at two previously known AD risk loci,34 which are associated with the ErbB signaling pathway (FDR = 4.65×10⁻⁵) and necroptosis (FDR = 7.52×10⁻⁵), respectively. Using pQTL mapping coupled with colocalization and Mendelian randomization analyses, we recently found SHARPIN to be genetically dysregulated in AD cases and part of the causal pathways. Together, these results suggest that some of these proteins may be not only biomarkers but also part of the causal pathways of AD.
This group also includes known biomarkers such as NEFL and NEFH.61,62 Extending our previous results,63 we also found multiple 14-3-3 proteins (e.g., YWHAB, YWHAG, and YWHAH) in this group; these are predicted to be neuronal-specific and are part of the cell division pathway (FDR = 5.29×10⁻⁸). Multiple recent studies suggest that mosaic mutations resulting from mitotic defects64 could also be involved in AD pathogenesis.
In contrast to G1 (linear increase), which seems to capture early neuronal death, G2 (up-down) captures glia-specific immune response and endolysosomal pathways, including platelet activation (FDR = 0.006), chemokine signaling pathway (FDR = 0.008), and acute myeloid leukemia (FDR = 0.001; Fig. 4G and Supplementary Table 8), likely reflecting a response to early neuronal death. SPI1, a microglial marker gene located at a well-known AD risk locus,34 is part of the transcriptional misregulation in cancer pathway and most likely regulates the microglial inflammatory response in AD.65 We observed consistently lower SPI1 levels in the CSF of A+T+ individuals compared with A-T- individuals in both stage 1 (estimate = -0.001, FDR = 2.4×10⁻⁷) and stage 2 (estimate = -0.01, FDR = 1.5×10⁻⁴). In line with our findings, decreased SPI1 levels in both primary human microglia and the BV-2 mouse microglial cell line have been associated with reduced phagocytic capacity.65–67 Other proteins in this pathway that interact with SPI1 include FLT3, which is important for normal development and for the immune system and is a drug target for acute myeloid leukemia (AML).68 PML, another protein in this group, interacts with SPI1 and is a tumor suppressor associated with acute promyelocytic leukemia.69 CEBPA, another protein involved in leukemia,70 is also part of this network. Other important proteins in G2 include signal transducer and activator of transcription 1 (STAT1), NFKB1, and Forkhead Box O3 (FOXO3), which have previously been linked to the inflammatory response in AD brains.71–73 In summary, G2 captures many novel proteins in inflammatory and immune response pathways that may become dysregulated due to early neuronal death and apoptosis.
G3, which displayed a linear decrease throughout the AD continuum, appears to capture proteins and pathways related to brain plasticity or to mechanisms compensating for AD-related pathology, including biosynthesis-related biological processes (Fig. 4H and Supplementary Table 9) such as cholesterol biosynthetic process (P = 4.4×10⁻⁵), sterol biosynthetic process (P = 7.8×10⁻⁵), and stem cell proliferation (P = 2.1×10⁻⁴; Fig. 4D). Proteins within this group include AXIN2 and CTNNB1.74 AXIN2 is a suppressor of Wnt/β-catenin signaling, a pathway known to affect mitochondrial biogenesis, which is linked to several neurodegenerative disorders, including AD.75 Consistent with our findings, a significant reduction (~70%, P < 0.001) in soluble β-catenin (CTNNB1) levels has been shown in AD brains compared with controls,76 and accumulation of this protein is a marker of ubiquitination and rapid proteasomal degradation.77 SIRT6, part of the same pathway as CTNNB1, has been associated with longer lifespan at higher levels in several studies,78,79 which is in line with our finding that this protein is decreased in AD cases.
Lastly, G4 (down-up) appears to capture microglial activity distinct from that of G2 (up-down; Fig. 4E), consistent with its opposite pseudo-trajectory, as well as cell-to-cell crosstalk. Proteins in this group were enriched in MAPK signaling (FDR = 3.7×10⁻⁷), Ras signaling (FDR = 5.4×10⁻⁶), Rap1 signaling (FDR = 1.1×10⁻⁴), and cancer-related pathways, e.g., pathways in cancer (FDR = 2.4×10⁻⁴) and prostate cancer (FDR = 2.8×10⁻⁴; Fig. 4I and Supplementary Table 10). Notable proteins in this network include CSF1 and CSF1R, which are involved in several signaling pathways and are key to maintaining proper microglial activity.80–82 Other microglia- and inflammation-related proteins include FAS, IGF1, and IGF1R, which have been implicated in the pathogenesis of AD and other amyloidosis disorders.83 Other key proteins in this group that were not part of the top 10 pathways include TREM2, APOE, PLD3, and NRGN, which have been implicated in AD and are known to be involved in microglial or lysosomal activity.49,50,84,85 At the same time, multiple proteins in this group are encoded by marker genes for neurons (NRG1, a co-receptor of RELN, a gene recently implicated in AD resilience;86 NTRK2; L1CAM; and EPHA7) and astrocytes (CTNNB1, FGFR3, and PDGFC), most of which are enriched in signaling-related processes, suggesting microglia-neuron communication.
In summary, by first grouping proteins by trajectory and then performing pathway analyses, we identified specific mechanisms affecting AD pathogenesis at different disease stages that a general pathway analysis might have missed (Supplementary Tables 11–15; extended results).