Serum proteomics reveals APOE dependent and independent protein signatures in Alzheimer’s disease

The current demand for early intervention, prevention, and treatment of late onset Alzheimer’s disease (LOAD) warrants deeper understanding of the underlying molecular processes which could contribute to biomarker and drug target discovery. Utilizing high-throughput proteomic measurements in serum from a prospective population-based cohort of older adults (n = 5,294), we identified 303 unique proteins associated with incident LOAD (median follow-up 12.8 years). Over 40% of these proteins were associated with LOAD independently of APOE-ε4 carrier status. These proteins were implicated in neuronal processes and overlapped with protein signatures of LOAD in brain and cerebrospinal fluid. We found 17 proteins whose LOAD association was strongly dependent on APOE-ε4 carrier status. Most of them showed consistent associations with LOAD in cerebrospinal fluid and a third had brain-specific gene expression. Remarkably, four proteins in this group (TBCA, ARL2, S100A13 and IRF6) were downregulated by APOE-ε4 yet upregulated as a consequence of LOAD as determined in a bi-directional Mendelian randomization analysis, reflecting a potential response to the disease onset. Accordingly, the direct association of these proteins with LOAD was reversed upon APOE-ε4 genotype adjustment, a finding which we replicate in an external cohort (n = 719). Our findings provide an insight into the dysregulated pathways that may lead to the development and early detection of LOAD, including those both independent of and dependent on APOE-ε4. Importantly, many of the LOAD-associated proteins we find in the circulation have been found to be expressed - and have a direct link with AD - in brain tissue. Thus, the proteins identified here, and their upstream modulating pathways, provide a new source of circulating biomarker and therapeutic target candidates for LOAD.


Introduction
Alzheimer's disease (AD) is the most common cause of dementia, accounting for up to 80% of all dementia cases 1, of which late onset Alzheimer's disease (LOAD) is most common 2. As of 2022, approximately 55 million individuals worldwide had dementia, representing 1 out of 9 people aged 65 and over 3. While promising advances have been made in amyloid-targeting therapeutic options for early-stage LOAD 4,5, they still have limited benefit, and identification of additional risk pathways that can be used for early detection and intervention is highly needed. To meet these demands, a variety of biologically relevant circulating molecules have been broadly associated with LOAD risk. The proteome in particular has the potential to reveal circulating markers of disease-related molecular pathways from different tissues, and studies assessing the circulating proteomic signatures between non-demented older adults and individuals suffering from LOAD have been described 6-17. Modest sample sizes, low-throughput proteomics and lack of longitudinal measurements have, however, been limiting factors in these studies. A recent large-scale longitudinal study identified promising blood-based markers for all-cause incident dementia, although it is unknown how specific the results are to LOAD 18. Information on the global circulating proteomic profile preceding the onset of LOAD, and how well it reflects AD-related processes in brain and cerebrospinal fluid (CSF), is thus scarce.
Alzheimer's disease has a considerable genetic component, and both common 19 and rare risk variants have been identified 20, of which the strongest effects are conferred by variants in the well-known apolipoprotein E (APOE) gene. Approximately 25% of the general population carries the APOE-ε4 variant while it is present in over 50% of AD cases 21,22. The APOE-ε4 allele increases the risk of LOAD threefold in heterozygous carriers and up to twelvefold in homozygous carriers 23. Although the link between the ε4 allele and LOAD has been extensively researched, the precise mechanism by which the APOE gene affects LOAD onset and/or progression remains unclear. Importantly, recent large-scale proteogenomic studies have consistently established the APOE locus as a protein-regulatory hotspot, regulating levels of hundreds of proteins in both circulation 24-27 and cerebrospinal fluid (CSF) 28,29. Yet, it remains unknown to what extent these proteins relate to LOAD and whether they can provide new information on the mechanisms through which APOE-ε4 mediates its risk. Identifying LOAD-associated circulatory proteins and whether their association is APOE-dependent or independent is crucial for the understanding of AD more generally as well as for gaining insight into potential pathways suitable for targeting in personalized treatment.
The current study tests the hypotheses that specific proteomic signatures in the circulation precede LOAD diagnosis and can reflect dysregulated biological pathways in the brain and CSF. Furthermore, we expect that some of these protein signatures may be affected by the APOE-ε4 genotype and can thus provide a molecular read-out of pathways directly affected by APOE-ε4. To address these hypotheses, we used a high-throughput aptamer-based platform to characterize 4,137 serum proteins in 5,294 participants of the population-based Age, Gene/Environment Susceptibility-Reykjavik Study (AGES-RS) 30 to identify protein signatures of incident LOAD (events occurring during follow-up) and prevalent LOAD, taking an unbiased, longitudinal, and cross-sectional approach to the discovery of potential biomarkers for LOAD (Fig. 1). Considering APOE's protein-regulatory influence and how it may impact the way that serum proteins are associated with LOAD, we disentangled the LOAD protein signature into APOE-ε4 dependent and independent components by identifying proteins whose LOAD association is largely attenuated upon conditioning on APOE-ε4 carrier status. We compared the serum protein signature of LOAD to those observed in cerebrospinal fluid (CSF) and brain and, finally, used genetic variation as anchors to determine the potential causal direction between serum proteins and disease state.

Results
The AGES study cohort. This prospective population-based study was based on 5,127 participants free of dementia at baseline, after the exclusion of 163 individuals with prevalent non-AD dementia and 167 individuals with prevalent LOAD. During a median potential follow-up of 12.8 years, 655 individuals were diagnosed with incident LOAD, with the last individual being diagnosed 16 years from baseline. Participants with incident LOAD were older at entry, were more likely to carry an APOE-ε4 allele, had lower BMI, and had lower education levels compared to healthy individuals (Supplementary Table 1). See Fig. 1 for study overview.
Serum protein profile of incident LOAD in AGES. To investigate the LOAD-associated circulatory proteomic patterns that occur prior to disease onset, we used Cox proportional hazards (Cox PH) models and found 320 aptamers (303 proteins) to be significantly (FDR < 0.05) associated with incident LOAD diagnosis after adjusting for age and sex (model 1), with hazard ratios (HRs) ranging from 0.78 for TBCA to 1.47 for NTN1 (Fig. 2d, Supplementary Table 2). To account for variability related to APOE-ε4 carrier status, we adjusted for the genotype in an additional model (model 2, Supplementary Table 2), which resulted in 140 significant aptamers (130 proteins, HR 0.79 (CD4) to 1.25 (CGA/FSHB), FDR < 0.05) (Fig. 2e), all of which overlapped with model 1 (Fig. 2f). When comparing the two models, 43% of the serum proteins remained significant after APOE-ε4 adjustment, indicating that their LOAD association is independent of the APOE-ε4 genotype (Table 1). Adjusting for additional AD risk factors and eGFR (see Methods) retained 38 significant LOAD-associated aptamers (35 proteins, HR 0.80 (CD4) to 1.26 (SMOC1), FDR < 0.05) (model 3, Supplementary Table 2), which may reflect specific processes affecting risk of LOAD but not driven by currently established risk factors. As hazard ratio variability can arise with lengthy follow-up time, secondary analyses were implemented with a 10-year follow-up cut-off, which revealed mostly overlapping results (Supplementary Note 1, Supplementary Tables 3 and 4, Extended Data Fig. 1). We did, however, detect protein associations specific to the shorter follow-up time, which potentially reflect processes that take place closer to the LOAD diagnosis. As there may be further differences in proteomic profiles depending on whether protein sampling occurred before or after LOAD diagnosis, we additionally considered the protein profile of the 167 individuals with prevalent LOAD at baseline (Supplementary Note 2, Extended Data Fig. 2a-c, Supplementary Tables 5-7). Interestingly, many of the proteins associated with increased risk of incident LOAD showed the opposite direction of effect for prevalent LOAD, although generally not at statistical significance (Extended Data Fig. 2d). These contrasting results suggest an important temporal element in the LOAD-associated proteome. In total, 346 aptamers (329 unique proteins) were associated with LOAD when all outcomes (incident and prevalent LOAD), follow-up times and models were considered (Supplementary Tables 2, 3, 5, 6, and 7).
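The FDR < 0.05 threshold applied to the per-aptamer Cox PH P values above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which this section does not name, and the aptamer names and P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal p-values from an age- and sex-adjusted model.
p_model1 = {"aptamer_A": 0.001, "aptamer_B": 0.02, "aptamer_C": 0.03, "aptamer_D": 0.6}
q_model1 = dict(zip(p_model1, benjamini_hochberg(list(p_model1.values()))))
significant = [name for name, q in q_model1.items() if q < 0.05]
```

Here `significant` holds the aptamers that would pass the FDR threshold in this toy input; with thousands of aptamers the same function applies unchanged.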
To evaluate which biological processes are reflected by the overall incident LOAD-associated protein signature in AGES, we performed a gene set enrichment analysis (GSEA). The strongest enrichment for protein associations in model 1 was observed for gene ontology (GO) terms related to neuron development and morphogenesis (Fig. 2h-i, Supplementary Table 8), demonstrating that these terms were mainly driven by the APOE-ε4 independent component of the LOAD-associated protein profile.
Serum proteins with APOE-dependent association to incident LOAD. As previously mentioned, 43% of the protein associations with incident LOAD were independent of APOE-ε4. Of the remaining 57% that were affected by APOE-ε4 adjustment, we identified 17 proteins whose associations with incident LOAD were particularly strongly affected by APOE-ε4 carrier status (Table 2, Fig. 3a, Extended Data Fig. 3, Supplementary Table 2). These proteins, hereafter referred to as APOE-dependent proteins, were defined as proteins significantly (FDR < 0.05) associated with incident LOAD in model 1 but whose nominal significance was lost (P > 0.05) or whose direction of effect changed upon APOE-ε4 adjustment in model 2. These APOE-dependent proteins included those with the strongest associations to LOAD prior to adjusting for the APOE-ε4 allele (Fig. 2a, d). The levels of the APOE protein itself were not associated with either incident or prevalent LOAD. Figure 3b shows the intra-correlations among the 17 APOE-dependent proteins. All 17 APOE-dependent proteins were strongly regulated by the APOE-ε4 allele (Fig. 3c, Table 2, Extended Data Fig. 4, Supplementary Table 9), with the ε4 allele increasing the levels of five of the proteins and decreasing the levels of the other 12. Accordingly, we observed that increased levels of the five APOE-ε4 upregulated proteins and decreased levels of the 12 APOE-ε4 downregulated proteins were also associated with higher risk of LOAD, yielding a hazard ratio above and below one, respectively (Fig. 3d). By definition, most of the APOE-dependent proteins lost significance upon APOE-ε4 adjustment, yet, interestingly, the direction of effect inverted for five proteins after APOE-ε4 adjustment (ARL2, IRF6, NEFL, S100A13, TBCA) (Fig. 3e).
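The APOE-dependence definition given above can be written as a small rule. The following is a minimal sketch of that rule only; the `Assoc` container, the function name and the example values are hypothetical and do not come from the study data.

```python
from dataclasses import dataclass


@dataclass
class Assoc:
    log_hr: float  # log hazard ratio for incident LOAD per unit of protein level
    p: float       # nominal p-value
    fdr: float     # FDR-adjusted p-value


def is_apoe_dependent(model1: Assoc, model2: Assoc, alpha: float = 0.05) -> bool:
    """Apply the definition: FDR-significant in model 1 (age, sex), and in
    model 2 (additionally APOE-e4) either nominally non-significant or with
    a reversed direction of effect."""
    if model1.fdr >= alpha:
        return False
    attenuated = model2.p > alpha
    reversed_direction = model1.log_hr * model2.log_hr < 0
    return attenuated or reversed_direction


# Hypothetical examples of the three possible outcomes.
attenuated_case = is_apoe_dependent(Assoc(-0.20, 1e-6, 1e-4), Assoc(-0.03, 0.40, 0.70))
reversed_case = is_apoe_dependent(Assoc(-0.25, 1e-8, 1e-6), Assoc(0.08, 0.03, 0.20))
independent_case = is_apoe_dependent(Assoc(0.30, 1e-9, 1e-7), Assoc(0.22, 1e-5, 1e-3))
```

The first two calls return True (attenuated and direction-reversed, respectively) and the third returns False, which corresponds to an APOE-independent association under this definition.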
Table 2 - A summary table of the 17 APOE-dependent LOAD-associated proteins, describing their tissue specificity in the Human Protein Atlas v22, results from the association analyses in AGES (n = 5127), references for previous functional associations with LOAD and whether the aptamers have been orthogonally validated by mass spectrometry (MS) 26. The APOE dependency is defined as being significant (FDR < 0.05) in model 1 and either fully non-significant in model 2 (P > 0.05) or showing a reversed direction of effect for incident LOAD.
The HR conferred by APOE-ε4 for incident LOAD in AGES was 2.1 (Cox PH P = 1.23e-27) per ε4 allele. To evaluate whether any of the 17 APOE-dependent proteins might mediate the effect of APOE-ε4 on incident LOAD, we considered the change in HR for APOE-ε4 on risk of incident LOAD when adjusting for individual proteins. We found that adjustment for most proteins resulted in a minor effect decrease, suggesting they do not mediate the APOE-ε4 effect on LOAD (Extended Data Fig. 5a). Intriguingly, however, the adjustment for four proteins (NEFL, ARL2, TBCA and S100A13) caused an increase of ~10% in APOE-ε4 effect size (Extended Data Fig. 5a-b). Thus, the effect of APOE-ε4 on LOAD is partly masked by secondary opposing associations between these proteins and LOAD, which are further explored below. The effect of APOE-ε4 carrier status on LOAD risk was largely unchanged in a multivariable model containing all 17 APOE-dependent proteins, thus not supporting a mediating effect of these proteins for the LOAD risk conferred by APOE-ε4 (base Cox PH model: HR = 2.1, P = 1.23e-27; multivariable Cox PH model: HR = 2.2, P = 3.2e-10). However, although not direct mediators, the 17 proteins could be blood-based readouts of a true mediator within tissue-specific pathological processes occurring prior to LOAD diagnosis.
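The comparison of the APOE-ε4 effect before and after protein adjustment can be sketched as a relative change in effect size. This is a minimal illustration that assumes the change is measured on the log-HR (Cox coefficient) scale, which the text does not state; the function name is hypothetical.

```python
import math


def pct_change_log_hr(hr_base: float, hr_adjusted: float) -> float:
    """Percent change in the log hazard ratio after adding a covariate.

    A negative value means attenuation (compatible with mediation); a positive
    value means the effect grew (compatible with a masked opposing effect).
    """
    beta_base = math.log(hr_base)
    beta_adjusted = math.log(hr_adjusted)
    return 100.0 * (beta_adjusted - beta_base) / beta_base


# HRs reported above for the base and the 17-protein multivariable model.
change_multivariable = pct_change_log_hr(2.1, 2.2)
```

A positive `change_multivariable` is in line with the statement that the APOE-ε4 effect is not attenuated in the multivariable model; the exact percentage depends on the assumed scale.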
To map out potential tissues of origin for the circulating levels of the 17 APOE-dependent proteins, we considered gene expression data from the Human Protein Atlas 31. We observed that five (LRRN1, TMCC3, FAM159B, NEFL, GSTM1) of the APOE-dependent proteins had elevated gene expression in brain compared to other tissues and two additional proteins (IFIT2, NDE1) clustered with brain-specific genes (Table 2). Of the remaining APOE-dependent proteins, six were universally expressed, including in brain tissue, and four were enriched in other tissues. We did not detect any significantly enriched molecular signatures or GO terms for the 17 APOE-dependent proteins (Supplementary Table 7). However, a network analysis of measured and inferred physical protein-protein interactions 32 revealed that the APOE-dependent proteins interact directly with proteins involved in neuronal response and development, neuroinflammation and AD (Fig. 4, Supplementary Tables 10-12, Supplementary Note 3).
Given the well-established relationship between APOE and cholesterol 33, we explored the potential effect of serum lipid levels on the association between LOAD and the 17 APOE-dependent proteins (Supplementary Table 13, Extended Data Figs. 6-7, Supplementary Note 4). Our findings suggest that, while many of the APOE-dependent proteins are associated with cholesterol levels, cholesterol is not the driver of their link to LOAD.
External validation of protein associations with incident LOAD in the ACE cohort. We set out to externally evaluate our observations in an independent cohort, the ACE - Alzheimer Center Barcelona (n = 1,341), with SOMAscan platform (7k) measurements from plasma of individuals who were referred to the center. The longitudinal component of ACE consists of individuals who had been diagnosed with mild cognitive impairment (MCI) at the center and had been followed up. A total of 719 participants had follow-up information and 266 converted to LOAD over a median follow-up of 3.14 years (Supplementary Table 14). Despite the fundamentally different cohorts, with AGES being population-based and using the 5K SOMAscan platform and ACE being based on individuals with established symptoms and the 7K SOMAscan platform, we replicated 36 protein associations with LOAD at nominal significance (P < 0.05) in the smaller ACE cohort (Table 3, Fig. 2f-g). Of those, 30 proteins were nominally significant in model 1, with 97% being directionally consistent with the observations in AGES (Fig. 2f). In model 2, 21 proteins were nominally significant, 86% of which were directionally consistent (Fig. 2g). After multiple testing correction, seven proteins remained statistically significant (FDR < 0.05), all of which were directionally consistent (Table 3, Fig. 2f-g). Six were statistically significant (FDR < 0.05) in model 1 (NEFL, LRRN1, TBCA, CTF1, C1orf56 and TIMP4) and one in model 2 (S100A13) (Supplementary Table 9). Of all 332 tested aptamers, 213 (64%) were directionally consistent regardless of significance in model 1 (two-sided exact binomial test P = 2.0e-05) and 202 (61%) were directionally consistent in model 2 (two-sided exact binomial test P = 0.002), demonstrating an enrichment for consistency in direction of effect. The protein associations replicated in the ACE cohort are of particular interest as they represent potentially clinically relevant candidates for LOAD that are consistent in two different contexts: a general population and a clinically derived symptomatic sample set. However, our results suggest that many of the proteins that associate with long-term LOAD risk are not strongly associated with the conversion from MCI to AD, which is further into the AD trajectory, and this may also explain the limited overlap between the proteins associated with prevalent and incident LOAD in AGES.
External validation of reversed LOAD association conditional on APOE-ε4 for a subset of proteins. Specifically considering the APOE-dependent proteins, the association between the APOE-ε4 allele and the proteins was replicated for 13 of 17 proteins in the ACE cohort (Fig. 3f). Furthermore, the change in direction of effect for incident LOAD upon APOE-ε4 adjustment was replicated in the ACE cohort for 4 of 5 proteins (ARL2, NEFL, S100A13 and TBCA) (Fig. 3g-h, Supplementary Table 9), with even larger effects observed in the ACE cohort compared to AGES in the APOE-ε4 adjusted model and three proteins (ARL2, S100A13 and TBCA) becoming statistically significant (P < 0.05). Thus, the attenuation of the primary LOAD associations for these proteins upon APOE-ε4 adjustment meets the criteria of APOE-ε4 dependence (see Methods). No significant interaction between protein and APOE-ε4 carrier status on AD risk was observed in either the AGES or ACE cohorts. Taken together, our results show that these proteins are strongly downregulated by APOE-ε4 and consequently show an inverse relationship with incident LOAD, but when adjusting for the APOE-ε4 allele, their association with LOAD is still significant but reversed, suggesting a secondary non-APOE-ε4-mediated process that affects these same proteins in relation to LOAD in the opposite direction and that is more strongly observed in a cohort of individuals with MCI than in the population-based AGES cohort.
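The directional-consistency comparison between AGES and ACE relies on a two-sided exact binomial test. Below is a minimal sketch of such a test, assuming a null probability of 0.5 for a consistent direction and the common "sum of outcomes no more likely than the observed one" definition of two-sidedness. Both are assumptions, and no claim is made that this sketch reproduces the P values reported in the text, which may come from a differently specified test.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p0: float = 0.5) -> float:
    """Two-sided exact binomial p-value for k successes out of n trials."""
    pmf = [comb(n, i) * p0**i * (1.0 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Sum the probability of every outcome at most as likely as the observed
    # one; the small relative tolerance guards against floating-point ties.
    total = sum(prob for prob in pmf if prob <= observed * (1.0 + 1e-7))
    return min(1.0, total)


# Toy input: 8 of 10 effects share the same direction in both cohorts.
p_toy = binom_two_sided_p(8, 10)

# Counts from the text for model 1 (213 of 332 aptamers directionally consistent).
p_model1 = binom_two_sided_p(213, 332)
```

A small p-value here indicates more agreement in direction than expected if the two cohorts were unrelated, irrespective of whether individual associations reach significance.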
Mendelian randomization to identify potential causal associations between proteins and LOAD. The proteins associated with LOAD could include proteins causally related to the disease, or proteins whose serum level changes reflect a response to prodromal disease or genetic liability to LOAD. To test this hypothesis, we performed a bi-directional two-sample MR analysis, including the targets of all 346 aptamers associated with LOAD in our study. Genetic variant associations for serum protein levels were obtained from a catalog of cis-protein quantitative trait loci (pQTLs) from AGES 24, while variant associations with LOAD were extracted from a recent GWAS on 39,106 clinically diagnosed LOAD cases, 46,828 proxy-LOAD and dementia cases and 401,577 controls of European ancestry 19. In total, 117 (34%) of the LOAD-associated serum aptamers had cis-pQTLs that were suitable as genetic instruments and were included in the protein-LOAD MR analysis (Supplementary Table 15).
In the forward MR analysis, two proteins, integrin binding sialoprotein (IBSP) and amyloid precursor protein (APP), had support for causality (Supplementary Table 16). IBSP had a risk-increasing effect for LOAD in both the causal (OR = 1.26, FDR = 0.03) and observational analysis (incident LOAD full follow-up, HR = 1.13, FDR = 0.04). APP had a protective effect for LOAD in both the causal (OR = 0.76, FDR = 0.03) and observational analysis (incident LOAD full follow-up, HR = 0.87, FDR = 0.047). Notably, while not statistically significant, we observed suggestive support for a protective effect of genetically determined serum levels of acetylcholinesterase (ACHE, OR = 0.92, P = 0.061), a target of a clinically used therapeutic agent for dementia 34 (Supplementary Table 16, Extended Data Fig. 8). In a forward MR analysis of the APOE-dependent protein interaction partners, two proteins, APP and MAPK3, had support for causality (Supplementary Tables 10-12, Supplementary Note 3).
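A two-sample MR estimate of the kind reported above combines per-variant effects on the exposure (here, cis-pQTL effects on a serum protein) with per-variant effects on the outcome (LOAD). The following is a minimal sketch assuming a fixed-effect inverse-variance weighted (IVW) estimator with independent variants; this section does not state which estimator was used, and the function name and input values are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    Each variant contributes a Wald ratio (outcome effect / exposure effect),
    weighted by the inverse variance of that ratio. With a single variant this
    reduces to the plain Wald ratio.
    """
    weights = [bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se


# Hypothetical instruments: pQTL effects on protein level and on log-odds of LOAD.
bx = [0.30, 0.20, 0.25]
by = [0.06, 0.04, 0.05]
se_by = [0.02, 0.02, 0.02]
beta_ivw, se_ivw = ivw_estimate(bx, by, se_by)
odds_ratio = math.exp(beta_ivw)  # OR per unit increase in genetically predicted protein level
```

An odds ratio above one in such an analysis would correspond to a risk-increasing protein and a value below one to a protective protein.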
As most of the observational protein associations in the current study were detected for incident LOAD, and thus reflect changes that take place before the onset of clinically diagnosed disease, it is unlikely that their levels and effects are direct downstream consequences of the disease after it reaches a clinical stage. However, they may reflect a response to a prodromal stage of the disease. We therefore performed a reverse MR to test whether the observed changes in serum protein levels are likely to occur downstream of the genetic liability to LOAD, which may capture processes both at the prodromal and clinical stage. The APOE locus is likely to have a dominant pleiotropic effect in the reverse MR analysis (Supplementary Table 17, Extended Data Fig. 9, Supplementary Note 5), as it has a disproportionately strong effect on LOAD risk compared to all other common genetic variants, while also being a well-established pQTL trans-hotspot, affecting circulating levels of up to hundreds of proteins 24,25,27. We therefore performed the primary reverse MR analysis using only LOAD-associated genetic variants outside of the APOE locus as instruments. We found two proteins (S100A13 and ARL2) that were significantly (FDR < 0.05) affected by LOAD or its genetic liability (Supplementary Table 17, Extended Data Figs. 9-10). Interestingly, both were among the 17 previously identified APOE-dependent LOAD proteins, together with two additional proteins that were nominally significant in the reverse MR (TBCA, P = 4.4e-4, FDR = 0.051 and IRF6, P = 7.9e-4, FDR = 0.055). Intriguingly, these findings suggest that these four proteins are upregulated by LOAD, in contrast to the observed APOE-ε4 downregulation of the same proteins (Fig. 5). This supports our findings of competing biological effects described above (Fig. 3e, Extended Data Fig. 5), and collectively our results indicate that simultaneous opposing effects of APOE-ε4 on one hand and LOAD on the other result in differential regulation of these proteins in serum (Fig. 5b).
We performed a replication analysis of the effect of APOE-ε4 on protein levels and the reverse MR results for these four proteins using published protein GWAS summary statistics from two recent studies 25,35. In the external datasets, the downregulation of all four proteins by APOE-ε4 (as determined by the rs429358 C allele) was replicated. In the reverse MR analysis (excluding the APOE locus), the upregulation of protein levels by LOAD liability observed in AGES was also detected for two proteins (S100A13 and TBCA) in both validation cohorts, reaching significance (P < 0.05) in the study by Ferkingstad et al. (Extended Data Fig. 11, Supplementary Table 18). While the two proteins changed direction in a similar manner as in AGES, the effect size was considerably smaller in the validation cohorts. Importantly, however, individuals in these two cohorts are much younger than those in AGES, with mean ages of 55 and 48 years for the Ferkingstad et al. and Sun et al. studies, respectively, compared to 76 years in AGES. Therefore, we conducted an age-stratified reverse MR analysis in AGES, which showed a strong age-dependent effect, with a much larger effect of LOAD genetic liability on protein levels in individuals over 80 years old compared to those younger than 80 years (Extended Data Fig. 11). The effect size in AGES individuals under 80 years old was in line with the effect observed in the validation cohorts. Thus, if the upregulation of these proteins reflects a response to prodromal or preclinical LOAD, an older cohort may be needed to detect an association of the same degree as we found in AGES. However, the observed support in the validation cohorts for the discordant effects of APOE vs non-APOE LOAD-associated genetic variants on the same serum proteins strongly implicates these proteins as directly relevant to LOAD, potentially as readouts of biological processes that are both disrupted by APOE-ε4 and modulated in the opposite manner as a response to genetic predisposition to LOAD or the disease onset in general.
Together, these results indicate that LOAD or its general genetic liability causally affects the levels of some APOE-dependent proteins, but this effect is simultaneously masked by the strong effects of the APOE locus in the other direction (Fig. 5a). These outcomes strengthen the results described above, showing that the levels of these four proteins are strongly downregulated in APOE-ε4 carriers and that lower levels of these proteins are therefore associated with increased risk of LOAD in an APOE-dependent manner (Fig. 5b). Simultaneously, the reverse MR analysis shows that the collective effect of the other non-APOE LOAD risk variants is to upregulate the serum levels of these same proteins, possibly reflecting a response mechanism to LOAD pathogenesis (Fig. 5c). Again, this is in line with the observational analysis, where all four proteins changed direction of effect when adjusting for APOE-ε4 (Fig. 5a, Fig. 2d-e).
Overlap with the AD brain and CSF proteome. To evaluate to what extent our LOAD-associated serum proteins reflect the proteomic profile of AD in relevant tissues, we queried data from recent proteomic studies of AD in cerebrospinal fluid (CSF) 36 and brain 37, which also describe tissue-specific co-regulatory modules. We observed that of our LOAD-associated serum proteins, 51 proteins were also associated with AD in brain as measured by mass spectrometry, with 32 (63%) being directionally consistent (Fig. 6a-b, Supplementary Tables 19-20). Higher directional consistency was observed within the APOE-independent protein group, where 15 (71%) of the 21 proteins associated with AD in brain tissue were directionally consistent. Additionally, 60 proteins were directly associated with AD in CSF as measured with SOMAscan (7k) (Fig. 6a), with 46 (77%) being directionally consistent (Fig. 6b) 21. The proportion of directionally consistent associations between serum and CSF was higher in both the APOE-independent and APOE-dependent protein groups, at 88% (22 of 25 and 7 of 8 for APOE-independent and dependent proteins, respectively) (Fig. 6b, Supplementary Table 19). However, directional inconsistency between plasma and CSF AD proteomic profiles has been reported before in a similar comparison 38. Fourteen proteins overlapped between all three tissues in the context of AD (Fig. 6a, Supplementary Table 19). Many of these proteins have established links or are highly relevant to LOAD, such as Spondin 1 (SPON1), involved in the processing of amyloid precursor protein (APP) 39; Secreted Modular Calcium-Binding Protein 1 (SMOC1), previously proposed as a biomarker of LOAD in postmortem brains and CSF 40; Netrin-1 (NTN1), an interactor of APP and regulator of amyloid-beta production 41; Neurofilament light (NEFL), previously proposed as a plasma biomarker for LOAD and axon injury 42,43; and Von Willebrand factor (VWF), known for its role in blood clotting and associations with LOAD 44 (Supplementary Table 19). Notably, some of the APOE-dependent proteins, such as TBCA and TP53I11, were associated with AD across all three tissues.
We have previously described the co-regulatory structure of the serum proteome, which can broadly be defined as 27 modules of correlated proteins 26 (Supplementary Table 21). In the current study we found that among the 346 aptamers (329 proteins) associated with LOAD (prevalent or incident, any model), five serum protein modules (M27, M3, M11, M2 and M24) were overrepresented (Fig. 6c, Supplementary Table 22). In particular, the 140 APOE-independent aptamers were specifically overrepresented in module M27, enriched for proteins involved in neuron development and the extracellular matrix, and module M3, which is associated with growth factor signaling pathways (Supplementary Table 22). By contrast, the 17 APOE-dependent proteins were specifically enriched in protein module M11 (Supplementary Table 22), which is strongly enriched for lipid pathways and is under strong genetic control of the APOE locus 26. Serum modules M27, M24 and M11 were all enriched for AD associations in CSF (Fig. 6c). We next sought to understand to what extent our LOAD-associated proteins identified in serum might reflect AD protein signatures in CSF and brain tissue. Among the LOAD-associated proteins measured in serum, we found the APOE-dependent and APOE-independent proteins to be enriched in different CSF modules, most of which were also linked to AD (Fig. 6d, Supplementary Table 22). In brain tissue, the serum APOE-independent LOAD proteins were particularly enriched in brain module M42 (Matrisome), which is enriched for extracellular matrix (ECM) proteins 37. Strikingly, M42 was strongly enriched for the AD proteomic profiles of all three tissues (Fig. 6e, Supplementary Table 22). Interestingly, members of this module (SMOC1, APP, SPON1, NTN1, GPNMB) showed some of the strongest associations in serum to incident LOAD in our study (Fig. 2d, Supplementary Table 2) as well as in brain (Fig. 6b, Supplementary Table 22).
This module has furthermore been demonstrated to be correlated with amyloid beta (Aβ) deposition in the brain, and some of its protein constituents (e.g., MDK, NTN1 and SMOC1) have been shown to colocalize with and bind to Aβ 37. Additionally, the APOE locus regulates M42 levels in the brain (mod-qTL), and while the APOE protein is a member of module M42, this regulation was found not to be driven solely through the levels of the APOE protein itself 37. Our results simultaneously show that other members of the module, such as SPON1 and SMOC1, exhibit an APOE-independent association to incident LOAD in serum.
Interestingly, these same two proteins are increased in CSF thirty years prior to symptom onset in autosomal dominant early onset AD 45. In summary, we demonstrate significant overlaps in LOAD-associated protein expression across blood, CSF and brain at both the individual protein level and the protein module level.
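The module overrepresentation results above ask whether a set of LOAD-associated proteins falls into a given co-regulatory module more often than expected by chance. A minimal sketch of one common way to test this is a one-sided hypergeometric test; the text of this section does not name the test that was used, and the module size and overlap below are hypothetical, while the universe of 4,137 measured proteins and the 329 LOAD-associated proteins are taken from the text.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_module: int, n_hits: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when n_hits proteins are drawn without replacement
    from n_universe proteins, of which n_module belong to the module."""
    denominator = comb(n_universe, n_hits)
    upper = min(n_module, n_hits)
    numerator = sum(
        comb(n_module, k) * comb(n_universe - n_module, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return numerator / denominator


# Toy input that can be checked by hand.
p_toy = hypergeom_enrichment_p(n_universe=20, n_module=5, n_hits=4, n_overlap=2)

# Hypothetical module of 150 proteins containing 30 of the 329 LOAD-associated proteins.
p_module = hypergeom_enrichment_p(n_universe=4137, n_module=150, n_hits=329, n_overlap=30)
```

In practice one such p-value would be computed per module and then corrected for the number of modules tested.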

Discussion
We describe a comprehensive mapping of the serum protein profile of LOAD that provides insight into processes that are independent of or dependent on the genetic control of APOE-ε4 (Fig. 7). We identified 329 proteins in total that differed in the incident or prevalent LOAD cases compared to non-LOAD participants in a population-based cohort with long-term follow-up. Among these, we identified a novel grouping of proteins based on their primary LOAD association being statistically independent of (130 proteins), or dependent on (17 proteins), APOE-ε4 carrier status. Many of the APOE-independent proteins are implicated in neuronal pathways and are shared with the LOAD-associated CSF and brain proteome. The 17 APOE-dependent proteins overlap with AD-associated protein modules in CSF and interact directly with protein partners involved in LOAD, including APP. Another key finding is that, amongst these 17 proteins, four (ARL2, S100A13, TBCA and IRF6) change their LOAD-associated direction of effect both observationally and genetically when taking APOE-ε4 carrier status into account. Importantly, we replicate this directional change both observationally for three proteins (ARL2, S100A13, TBCA) and genetically for two proteins (S100A13, TBCA) in external cohorts. Collectively, our results suggest that while their primary association with LOAD reflects the risk conferred by APOE-ε4, there exists a secondary causal effect of LOAD itself on the protein levels in the reverse direction, as supported by the MR analysis, possibly reflecting a response to the disease onset.
Previous studies identifying proteins associated with LOAD have been limited to cross-sectional cohorts or based on all-cause dementia 18,46-48. Here we extend those findings by distinguishing LOAD cases from other types of dementia in a prospective cohort study to identify LOAD-specific serum protein signatures preceding clinical onset. Furthermore, our comparative approach of statistical models with and without APOE-ε4 adjustment provides a novel compartmentalized view of the LOAD serum protein profile and demonstrates how protein effects can differ depending on genetic confounders, which are imperative to take into consideration. We found that the proteins associated with incident LOAD in our study, in particular those associated independently of APOE-ε4 such as GPNMB, NTN1, SMOC1 and SPON1, overlap with the proteomic profile of LOAD in CSF 38 and brain 37, are enriched for neuronal pathways and have been functionally implicated in LOAD (Table 1), which may reflect an altered abundance of neuronal proteins in the circulation during the prodromal stage of LOAD. These overlaps that we find across independent cohorts and different proteomics technologies suggest that the serum levels of some proteins have a direct link to the biological systems involved in LOAD pathogenesis and may even provide a peripheral readout of neurodegenerative processes prior to clinical diagnosis of LOAD. In particular, the proteins that show directionally consistent effect sizes suggest exceptional AD-specific robustness, as the measurements vary by tissue, methodology and population.
We identified 17 proteins with a particularly strong APOE-dependent association to incident LOAD, of which eight were also associated with prevalent AD in CSF. The association between APOE-ε4 and circulating levels of these proteins has been reported by our group 24,26,27 and others 49, but their direct association with incident LOAD has to our knowledge not been previously described. These APOE-dependent proteins may point directly to the processes through which APOE-ε4 mediates its risk on LOAD and provide a readout of the pathogenic process in the circulation of the approximately 50% of LOAD patients worldwide carrying the variant 21,22. While our data do not provide information on the tissue origin of the APOE-dependent proteins, nine either exhibit brain-specific gene expression, cluster with brain-specific genes 50 or have been associated with LOAD at the transcriptomic or protein level in brain tissue or CSF (Table 2).
At the genetic level, a lookup in the GWAS catalog 51 shows that an intron variant in the IRF6 gene has a suggestive GWAS association with LOAD via APOE-ε4 carrier status interaction 52. In addition, variants in the TMCC3 gene have been linked to LOAD 76, educational attainment 53 and caudate volume change rate 54, and variants in the TBCA gene have been suggestively associated with reaction time 55 and PHF-tau levels 56. Collectively, the gene expression patterns for these proteins in the brain, interactions with proteins involved in neuronal processes and suggestive associations between genetic markers in or near these genes and brain-related outcomes suggest that these APOE-dependent proteins may reflect brain-specific processes affected by APOE-ε4 carrier status that affect the risk of developing LOAD. Importantly, the association patterns for ARL2, S100A13 and TBCA suggest the presence of a pathway that is downregulated by APOE-ε4 early in life, given the consistent effect of APOE-ε4 on the same proteins in younger cohorts, but upregulated at the onset of LOAD, as supported by the larger observed effects in the APOE-ε4 adjusted analysis in the ACE cohort of individuals who are closer to diagnosis on the AD trajectory than those in AGES. Additional studies are required to expand on these interpretations and dissect the complex mechanisms at play and to determine if the modulation of the process represented by these proteins has therapeutic potential.
Two proteins, IBSP and APP, were identified to potentially have a causal role in LOAD. IBSP was previously associated with plasma amyloid-β and incident dementia 57, while APP is the precursor protein for amyloid-β 58. Based on the MR analysis for the LOAD-associated proteins that could be tested, the majority do not appear to be causal in and of themselves, but their association with incident LOAD may still reflect changes that occur years before the onset of LOAD that could be of interest to target before irreversible damage accumulates.
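As a rough illustration of how a single genetic instrument (for example, a cis-pQTL) can yield a causal estimate in Mendelian randomization, the sketch below computes a Wald ratio. This is a generic textbook estimator and is not necessarily the method used in this study; the function name `wald_ratio` and all summary statistics are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate with a first-order standard error.

    beta_exposure : effect of the variant on the exposure (e.g. protein level)
    beta_outcome  : effect of the same variant on the outcome (log odds)
    se_outcome    : standard error of beta_outcome
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: ignores uncertainty in beta_exposure.
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical summary statistics, for illustration only.
est, se = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
odds_ratio = math.exp(est)  # odds ratio per unit increase in the exposure
```

In a bidirectional analysis, the same calculation would be repeated with the roles of exposure and outcome swapped, using instruments for LOAD rather than for the protein.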
A major strength of this study is the high-quality data from a prospective longitudinal population-based cohort study with detailed follow-up, broad coverage of circulating proteins and a comprehensive comparison to the AD-proteome in CSF and brain. The limitations of our study include that our results are based on a Northern European cohort and cannot necessarily be transferred directly to other populations or ethnicities. Additionally, while we partly replicate our overall findings in an external cohort, a greater replication proportion could be anticipated in a more comparable cohort. The ACE cohort consists of clinically referred individuals with MCI and proteomic measurements performed on a different version of the SOMAscan platform. Additionally, different normalization procedures were applied by SomaLogic for the two SOMAscan versions, which may have an effect on the LOAD associations 48. Further studies are required to determine the impact of time to event, platform and normalization approaches on the associations between circulating proteins and LOAD. Regardless of these differences, we did replicate the majority of the APOE-dependent LOAD associations, including the APOE-dependent change in effect for ARL2, S100A13 and TBCA. We could not test all LOAD-associated proteins for causality, including most of the APOE-dependent proteins, due to lack of significant cis-pQTLs for two thirds of the proteins, thus we cannot exclude the possibility that some could be causal but missed by our analysis. Finally, despite our LOAD diagnosis criteria it is possible that some of our findings reflect processes related to dementia in general. As a result, it is critical that these findings be validated in individuals with established amyloid-beta and tau deposits, as well as in experimental settings.
The proteins highlighted in this study and the mechanisms they point to may be used as a source of biomarkers or therapeutic targets that can be modulated for the prevention or treatment of LOAD. This large prospective cohort study, using both a longitudinal and cross-sectional design, represents a unified and comprehensive reference analysis with which past and future serum protein biomarkers and drug targets can be considered, compared, and evaluated.
Figure 1 Flowchart of the current study. a) Overview of the AGES cohort and study participants. Prevalent non-AD dementia cases were excluded from the analysis. b) Overview of the aptamers tested and their associations with LOAD. Serum measurements of 4782 aptamers were tested for association with prevalent and incident LOAD status, using logistic and Cox proportional hazards regression models, respectively. From the proteins associated with incident LOAD, sets of 140 proteins with an APOE-independent association and 17 proteins with an APOE-dependent association were defined. The APOE-dependent proteins were further expanded to first degree protein-protein interaction (PPI) partners. All sets of proteins were subjected to functional enrichment analysis and bidirectional Mendelian Randomization (MR) analysis. c) Overview of the replication cohorts used in the study, which include proteins measured in the circulation (ACE) as well as in brain and CSF (Emory).

Figure 2 Proteins associated with incident LOAD in AGES (n = 5127). a-b) Volcano plots showing the protein association profile for incident LOAD from the Cox PH a) without APOE-ε4 adjustment (model 1) and b) with APOE-ε4 adjustment (model 2). c) Venn diagram for the overlap between models 1 and 2 for incident LOAD. d-e) Enrichment of top Gene Ontology terms from GSEA analysis for incident LOAD (model 1) shown as d) dotplot, stratified by ontology and e) gene-concept network. f-g) Comparison of effect sizes (HR) for incident LOAD between the AGES and the ACE (n = 719) cohorts for all proteins reaching nominal significance (P < 0.05) in the Cox PH in ACE for f) model 1 and g) model 2.

Figure 3 Proteins with APOE-ε4 dependent association to incident LOAD in AGES (n = 5127). a) Spaghetti plot showing the statistical significance of protein associations with incident LOAD across the three Cox PH models, highlighting a set of 17 unique proteins (green) whose association with incident LOAD is attenuated upon APOE-ε4 adjustment. The horizontal lines indicate FDR < 0.05 (dashed) and P < 0.05 (dot-dashed). b) Pairwise Pearson's correlation between the 17 APOE-dependent proteins. c) Forest plot showing the linear associations between APOE genotype and the 17 APOE-dependent proteins. The beta coefficient shows the change in protein levels per ε4 allele count. d-e) Forest plots showing the associations between the 17 APOE-dependent proteins and incident LOAD d) without APOE-ε4 adjustment (model 1) and e) with APOE-ε4 adjustment (model 2). LOAD-HR indicates risk per SD increase of protein levels. Proteins that change direction of effect between the two models are highlighted in red.
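The comparison between models 1 and 2 can be illustrated with a minimal sketch. It assumes each protein has a hazard ratio and a P value from both models. The names `ModelResult` and `classify_protein`, the simple attenuation rule (significant in model 1 but not in model 2) and the example numbers are hypothetical, and are not the criteria or results of this study.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    hr: float  # hazard ratio per SD increase in protein level
    p: float   # P value (an FDR-adjusted value could be used instead)


def classify_protein(m1: ModelResult, m2: ModelResult, alpha: float = 0.05) -> dict:
    """Compare model 1 (no APOE-e4 adjustment) with model 2 (APOE-e4 adjusted)."""
    sig1 = m1.p < alpha
    sig2 = m2.p < alpha
    # The direction flips when the HR lies on opposite sides of 1 in the two models.
    flipped = (m1.hr - 1.0) * (m2.hr - 1.0) < 0
    return {
        "attenuated": sig1 and not sig2,  # association lost after adjustment
        "retained": sig1 and sig2,        # association survives adjustment
        "direction_flipped": flipped,
    }


# Hypothetical values, for illustration only.
example = classify_protein(ModelResult(hr=0.85, p=1e-6), ModelResult(hr=1.08, p=0.04))
print(example)
```

A protein flagged as `direction_flipped` in this sketch would correspond to one shown in red in panels d-e, where the hazard ratio moves from below 1 to above 1 (or vice versa) once APOE-ε4 is included as a covariate.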

Figure 4 Functional enrichment analysis of APOE-dependent protein-protein interaction partners. a) A scheme of the PPI partners selection, where first degree partners of the APOE-dependent proteins were extracted from the InWeb database. b-c) Enrichment of selected Gene Ontology terms for the PPI partner proteins shown as b) dotplot and c) gene-concept network. d-e) Enrichment of top seven unique WikiPathways shown as d) dotplot and e) gene-concept network.

Figure 5 Reverse Mendelian randomization analysis. a) Comparison of hazard ratios for incident LOAD with and without APOE-ε4 adjustment in the observational analysis (Cox PH), the effects of APOE-ε4 on protein levels and reverse MR odds ratios (excluding the APOE locus) for the four APOE-dependent proteins that change direction of effect in both observational and causal analyses when APOE is accounted for. b-c) Visual summaries of the observed data. b) Mediation diagrams showing 3 possible hypotheses that could explain the relationship between APOE-ε4, LOAD and the four proteins shown in a). Our analyses do not support the hypothesis that LOAD mediates the effect of APOE-ε4 on proteins (Hypothesis 1) nor the other way around (Hypothesis 2). However, our results from both the observational and causal analyses support the hypothesis that two mechanisms are at play that affect the same proteins in the opposite direction (Hypothesis 3). c) The APOE-ε4 mutation leads to increased risk of LOAD via its effects in brain tissue. The same mutation results in a downregulation of serum levels of four proteins that are themselves negatively associated with incident LOAD. Additionally, other non-APOE LOAD risk variants lead to upregulation of the same proteins in the reverse MR analysis, possibly reflecting a response to LOAD or its genetic liability.

Figure 6 Overlap between AD protein signatures in serum, brain and CSF. a) A Venn diagram showing the overlap of AD-associated proteins in serum, brain and CSF. b) A comparison of the effect sizes for AD-associated proteins that overlap between serum and brain (top) and serum and CSF (bottom). The proteins are stratified based on the APOE-dependence in AGES for incident LOAD. The effect size in AGES is shown for incident LOAD model 1 (Cox PH), except for proteins that were uniquely identified using the shorter 10-year follow-up (Cox PH) or prevalent LOAD (logistic regression), in which case the respective effect size from the significant association is shown. c-e) Heatmap showing the enrichment (Fisher's test) of AD-associated proteins by tissue type (x-axis) in c) the AGES serum protein modules, d) Emory CSF protein modules and e) Emory brain protein modules (y-axis).
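The module enrichment in panels c-e can be sketched as a one-sided Fisher's exact test, which is equivalent to a hypergeometric tail probability. Whether the study used a one- or two-sided test is not stated here, so the one-sided form is an assumption, and the function name `fisher_enrichment_p` and its arguments are hypothetical.

```python
from math import comb


def fisher_enrichment_p(overlap: int, module_size: int, hits: int, background: int) -> float:
    """One-sided Fisher's exact (hypergeometric) P value for over-representation.

    overlap     : AD-associated proteins found inside the module
    module_size : number of proteins in the module
    hits        : AD-associated proteins in the whole background
    background  : all proteins measured
    """
    max_overlap = min(module_size, hits)
    total = comb(background, module_size)
    # P(X >= overlap) when drawing module_size proteins without replacement.
    tail = sum(
        comb(hits, k) * comb(background - hits, module_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: all 5 proteins in a module are among the 5 hits in a background of 10.
p_toy = fisher_enrichment_p(overlap=5, module_size=5, hits=5, background=10)
```

Repeating this for every module and tissue combination gives a matrix of P values that could then be shown as a heatmap, ideally after multiple-testing correction.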

Table 1 -
A summary table of the top 20 significant APOE-independent proteins associated with incident LOAD in AGES (n = 5127). The effect size (Hazard Ratio (Confidence Interval)) and level of significance (P, FDR) are shown for model 2 in the Cox PH, adjusting for age, sex and APOE-ε4. The final column indicates if the aptamers have been orthogonally validated by mass-spectrometry (MS) 26. APOE-independence is defined as proteins remaining significantly (FDR < 0.05) associated with incident LOAD after APOE-ε4 adjustment.
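The FDR threshold used in this definition can be illustrated with a short sketch. It assumes the Benjamini-Hochberg procedure, which this section does not name, so the choice of method, the function name `benjamini_hochberg` and the example P values are all hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical model 2 P values for four proteins.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
apoe_independent = [q < 0.05 for q in fdr]
```

Under this sketch, a protein would be labelled APOE-independent if its adjusted value from the APOE-ε4 adjusted model stays below 0.05.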

Table 3 -
Replication of the LOAD-associated proteins from AGES (n = 5127) in the ACE cohort (n = 719). All proteins with nominal P < 0.05 in either model from the Cox PH are shown. P and FDR values < 0.05 are highlighted in bold.