Proteomic Clusters Underlie Heterogeneity in Preclinical AD Progression

Background: Heterogeneity in progression to AD poses challenges for both clinical prognosis and clinical trial implementation. In the absence of a well-defined understanding of future disease trajectory, participants may receive unnecessary treatment or true effects of pharmacological intervention may be obscured. We identified early differences in preclinical Alzheimer Disease (AD) biomarkers, assessed patterns for developing preclinical AD across the Amyloid-Tau-(Neurodegeneration) (AT(N)) framework, and considered potential sources of difference by analyzing the CSF proteome. Methods: We included 108 participants enrolled in longitudinal studies at the Knight Alzheimer Disease Research Center (ADRC) who completed four or more lumbar punctures and were cognitively normal at baseline. Cerebrospinal fluid (CSF) measures of Aβ42, pTau181, and neurofilament light chain (NfL), as well as proteomics values, were evaluated. Imaging biomarkers, including positron emission tomography (PET) amyloid and tau and structural magnetic resonance imaging (MRI), were repeatedly obtained when available. This allowed for staging individuals according to the AT(N) framework. Results: Growth mixture modeling, an unsupervised clustering technique, identified three patterns of biomarker progression as measured by CSF pTau181 and CSF Aβ42. Two groups (AD Biomarker Positive and AD Biomarker Intermediate) had distinct progression from normal biomarker status to having biomarkers consistent with preclinical AD. A third group (AD Biomarker Negative) did not develop abnormal AD biomarkers over time. Participants grouped by CSF trajectories were successfully re-classified using only proteomic profiles (AUC AD Biomarker Positive vs. AD Biomarker Negative = 0.970, AUC AD Biomarker Positive vs. Intermediate AD Biomarkers = 0.750, AUC Intermediate AD Biomarkers vs. AD Biomarker Negative = 0.698). Conclusions: We highlight heterogeneity in the development of AD biomarkers in cognitively normal individuals.
We identified individuals who became AD Biomarker Positive before age 50. A second group, AD Biomarker Intermediate, developed elevated CSF pTau181 in their mid-60s before becoming amyloid positive in their mid-70s. A third group remained AD Biomarker Negative over repeated testing. Our results could influence the selection of participants for specific treatments (e.g. amyloid-reducing vs. other agents) in clinical trials. CSF proteome analysis highlighted additional potential opportunities for non-AT(N) focused therapies, including blood brain barrier-, liver-, and neuroinflammatory-related targets.


Background
Alzheimer Disease (AD), a slowly progressive neurodegenerative disorder with an extended prodromal stage, affects nearly 6 million Americans (1). It is a disease that progresses from a clinically asymptomatic preclinical phase to a symptomatic clinical phase over many years (2). The disease continuum is generally thought to begin with amyloid accumulation, followed by the development of tau pathology concurrent with neurodegenerative changes, and finally clinically observable cognitive impairment (AT(N)) (3). However, there is significant heterogeneity from the time of development of amyloid to the time of clinical symptoms (4). expressed in CSF proteomics, can offer an enhanced understanding of the progression from preclinical to symptomatic AD.
In the literature, the observed heterogeneity in preclinical AD is frequently attributed to cognitive resilience (11). Additional factors that have been proposed as possible drivers of resilience include educational attainment (12,13), cortical thickness (12), personality (14), cardiovascular health (15), and synaptic function (16-18). A substantial proportion of the resilience literature relies on residual methods, which allow researchers to define resilience as the source of unexplained variance rather than identifying clinically meaningful underlying sources (11).
Although difficult, it is vitally important that we understand the etiology of heterogeneity in the preclinical and symptomatic stages of AD. Clinical trials are now targeting drug delivery during either the preclinical or very early symptomatic stages of AD (2). In the absence of a well-defined understanding of future disease trajectory, participants may receive unnecessary treatment or the true effects of pharmacological intervention may be obscured.
In an effort to further interrogate the mismatch between neuropathologic change and progression to clinical symptoms of AD, we evaluated longitudinal CSF samples from a well-characterized cohort of older adults who were cognitively normal at enrollment. More than 100 participants completed at least four lumbar punctures over a period of about 10 years, in addition to completing other traditional measures of amyloid, tau, and neurodegeneration (AT(N)). Recent reviews have highlighted the importance of longitudinal research, particularly with respect to heterogeneity (19); however, relatively few longitudinal studies have been conducted. One recent review found that, of more than 1400 AD studies performed between 1995 and 2015, only 48 included repeat biomarker measurements. Of those 48 longitudinal studies, only nine included CSF biomarkers, with almost all of these CSF studies relying on just two timepoints (20). Longitudinal studies are preferable for studying biomarker dynamics (21); measuring samples from the same individuals over time allows for a deeper understanding of potential time courses for disease development.
We compared the trajectories from the longitudinal CSF samples to well-established neuroimaging markers of PET amyloid (using Pittsburgh compound B (PiB)), PET tau (using AV1451), and neurodegeneration (using magnetic resonance imaging (MRI) measures of cortical thickness and white matter hyperintensities), as well as CSF neurofilament light chain (NfL). Because traditional AT(N) pathology does not fully explain the heterogeneity in symptomatic progression, we also included an analysis of proteomics.
Evaluating protein expression in preclinical AD and healthy aging provides insight into additional potential biological mechanisms that underlie observed heterogeneity (22). Beyond simply characterizing our participants using AD biomarkers, proteomics analysis allows us to consider potential pathways and mechanisms of disease progression. Patterns in protein expression in individuals who progress relatively rapidly to symptomatic AD may point to biological hazards, while proteomic expression in individuals who progress rather slowly may help identify protective factors.
The objectives of this project were as follows: to identify early differences in preclinical AD biomarker development, to assess patterns of development of preclinical AD biomarkers across the AT(N) framework, and to consider potential sources of difference by examining the CSF proteome.

Methods
We included 108 participants (Table 1) enrolled in longitudinal studies at the Knight ADRC, Washington University in St Louis (WUSTL) as previously described (23). For study inclusion, participants had to: 1) be cognitively normal at time of enrollment; 2) have longitudinal clinical, imaging and CSF measures; and 3) have at least one sequenced set of high throughput proteomics. Four CSF data points were required in order to ensure that observed nonlinear dynamics at the individual level were reflective of actual nonlinearity rather than noise. Enrollment in the study occurred over a mean period of 11.3 (SD = 2.4) years. A subset of participants also completed a PET PiB scan and/or PET AV1451 scan and/or structural MRI (Supplemental Fig. 1). This study was approved by the WUSTL Institutional Review Board, and each participant provided signed informed consent.

Data Acquisition

CDR
Participants in the study completed regular clinical assessments and were cognitively normal at time of enrollment as defined by the Clinical Dementia Rating® (CDR) scale. The CDR classifies the degree of cognitive impairment through the use of semi-structured interviews (24). Individuals with a CDR of 0 are considered to have no impairment; CDR 0.5 as very mild dementia; CDR 1 as mild dementia; CDR 2 as moderate dementia; and CDR 3 as severe dementia (24).

APOE Genotyping
DNA samples were collected at enrollment and genotyped using either an Illumina 610 or OmniExpress chip. Genotyping methods have been previously described (25).

Cerebrospinal fluid (CSF) acquisition
Each participant enrolled in this study completed at least four lumbar punctures (LP). On average, these occurred approximately 2 years apart. This process has been previously described (26). LP was performed at 8:00 AM following an overnight fast. An atraumatic Sprotte 22-gauge spinal needle was used to collect approximately 25 mL of CSF via gravity drip.

Cerebrospinal fluid (CSF) collection and processing
Participants underwent CSF collection by LP at approximately 8 AM following overnight fasting (27). CSF (20-30 mL) was collected in a 50-mL polypropylene tube via gravity drip using an atraumatic Sprotte 22-G spinal needle. The tube was inverted gently to disrupt potential gradient effects and centrifuged at low speed to pellet any cellular debris. The CSF was then aliquoted into polypropylene tubes and stored at −80°C. Concentrations of CSF Aβ40, CSF Aβ42, and CSF tau phosphorylated at 181 (CSF pTau181) were measured by chemiluminescent enzyme immunoassay using a fully automated platform (LUMIPULSE G1200, Fujirebio, Malvern, PA, USA). CSF NfL was measured via commercial ELISA kit (UMAN Diagnostics, Umeå, Sweden).

Structural MRI
MRI images were obtained on 3T Siemens scanners. T1-weighted scans were segmented using FreeSurfer 5.3 (Martinos Center for Biomedical Imaging, Charlestown, Massachusetts, USA), using the Desikan-Killiany atlas. Previous work has identified that cortical thickness decreases with the onset of AD (28-30). We calculated the average cortical thickness (30).

White Matter Hyperintensities
T2-weighted fluid-attenuated inversion recovery (FLAIR) images were also collected. White matter hyperintensities (WMH) were calculated via a lesion segmentation toolbox that relies on Statistical Parametric Mapping (SPM) (31).

Positron emission tomography (PET) imaging
PET scans using [11C] PiB were obtained via previously described methods (32). Images were then processed using the PET unified pipeline (PUP, https://github.com/ysu001/PUP) (33,34). Images were smoothed to achieve a spatial resolution of 8 mm, which minimized inter-scanner differences (34,35). A standard image registration technique was used to correct for motion (36, 37) using corresponding structural images. The cerebellum was used as the reference region. Regions of interest were defined using the Desikan-Killiany atlas based on the MRI. The standardized uptake value ratio (SUVR) in each region was evaluated using the 30-60 minute post-injection time window (38). We applied partial volume correction via a geometric transfer matrix approach (39). The PET PiB summary value was the arithmetic mean of SUVRs for the following regions: precuneus, prefrontal cortex (FreeSurfer regions: superior frontal and rostral middle frontal regions), gyrus rectus (FreeSurfer regions: lateral orbitofrontal and medial orbitofrontal regions), and lateral temporal regions (FreeSurfer regions: superior temporal and middle temporal regions) (40).
PET tau imaging utilized [18F]-Flortaucipir (AV-1451), but was otherwise conducted in a similar manner to PET PiB imaging. The SUVR was evaluated using the 80-100 minute post-injection time window. The whole cerebellum was used as the reference region (40). The PET Tau summary value, hereafter referred to as "Tauopathy", was the arithmetic mean of SUVRs for the following regions: amygdala, entorhinal cortex, inferior temporal region, and lateral occipital cortex (32).
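As a minimal sketch of the summary values described above, each one is an unweighted mean over a fixed list of regional SUVRs. The region labels, the helper name `summary_suvr`, and the numeric values below are illustrative assumptions, not actual PUP or FreeSurfer output names.

```python
from statistics import fmean

# Region lists follow the text; the string labels are hypothetical keys,
# not real FreeSurfer/PUP column names.
PIB_REGIONS = ["precuneus", "prefrontal", "gyrus_rectus", "lateral_temporal"]
TAU_REGIONS = ["amygdala", "entorhinal", "inferior_temporal", "lateral_occipital"]


def summary_suvr(regional_suvr, regions):
    """Return the arithmetic mean of the regional SUVRs named in `regions`."""
    missing = [r for r in regions if r not in regional_suvr]
    if missing:
        raise KeyError(f"missing regions: {missing}")
    return fmean(regional_suvr[r] for r in regions)


# Made-up regional SUVRs for one hypothetical PiB scan.
example_scan = {
    "precuneus": 1.2,
    "prefrontal": 1.0,
    "gyrus_rectus": 1.1,
    "lateral_temporal": 0.9,
}
pib_summary = summary_suvr(example_scan, PIB_REGIONS)
```

The same helper applies to the tau summary by passing `TAU_REGIONS` with the corresponding AV-1451 regional values.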

CSF Proteome Analysis
All participants had at least one CSF sample also processed for proteomics profiling. When multiple CSF samples were available, the most recent sample was retained for analysis. Briefly, proteomic data was generated using the SomaScan 1.3k panel (SomaLogic Inc), an aptamer-based platform as previously described (41,42). Quality control (QC) was performed at the sample and aptamer levels using control aptamers (positive and negative controls) and calibrator samples. At the sample level, hybridization controls on each plate were used to correct for systematic variability in hybridization. The median signal over all aptamers was used to correct for within-run technical variability. This median signal was assigned to different dilution sets within each tissue. For CSF samples, a 20% dilution rate was used.

Statistical Analysis
An unsupervised machine learning technique called growth mixture modeling was used to cluster the longitudinal trajectories of individuals' CSF pTau181 / CSF Aβ42 ratios (43). This approach identifies possible sub-groups within longitudinal data and has previously been employed to study cognitive trajectories in AD (4,15,44) and structural changes (45), but not preclinical amyloid and tau biomarkers. We searched for 1, 2, 3, and 4 latent clusters and selected the optimal number of clusters via Bayesian Information Criterion (BIC) minimization. We compared participant demographics across the identified latent clusters using the R package table1 (46).
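The BIC-based choice among 1 to 4 latent clusters can be sketched as follows. This illustrates only the selection step, not the growth mixture model fit itself. The log-likelihoods and parameter counts are invented placeholders, and using the number of participants as the sample size is an assumption.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_n_clusters(fits, n_obs):
    """Pick the cluster count with the lowest BIC.

    `fits` maps a candidate cluster count to (log_likelihood, n_params)
    obtained from a model that has already been fitted elsewhere.
    """
    bic_by_k = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
    best_k = min(bic_by_k, key=bic_by_k.get)
    return best_k, bic_by_k


# Hypothetical fit summaries for 1-4 latent clusters (not study results).
candidate_fits = {1: (-520.0, 4), 2: (-480.0, 9), 3: (-450.0, 14), 4: (-446.0, 19)}
best_k, scores = select_n_clusters(candidate_fits, n_obs=108)
```

In this invented example the four-cluster model has the highest likelihood, but its extra parameters are penalized enough that three clusters give the lowest BIC.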
To evaluate the time to pathology development, we performed survival analysis (47) to determine age at amyloid positivity (using CSF Aβ42/Aβ40 < 0.0673 as the threshold (48)), age at tau positivity (using CSF pTau181 > 42.5 pg/mL as the threshold (48)), and age at symptomatic onset (using the first instance of CDR > 0). Participants were considered pathology positive at the visit date where the relevant CSF measure first surpassed its threshold, and to have experienced symptom onset at the clinical visit where they first had a CDR > 0 rating. On average, participant visits where lumbar punctures were performed were spaced about 2 years apart. Clinical visits for CDR rating occurred annually. In the survival analysis, we grouped by the latent clusters that we had identified in the previous analysis (relying on the CSF pTau/Aβ42 ratio). For the survival analysis, we omitted participants who were amyloid positive (13/108) or tau positive (20/108) prior to study enrollment from their respective analyses, as we could not estimate their age at pathology positivity or change in CDR status.
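The event definition above can be sketched in a few lines. Each participant contributes the age at the first visit that crosses a threshold, or is censored at the last visit. The thresholds come from the text; the helper names, the example visits, and the choice of a plain Kaplan-Meier estimator are assumptions for illustration. Participants already positive at their first visit would need to be excluded beforehand, as described above.

```python
AMYLOID_RATIO_CUTOFF = 0.0673  # CSF Abeta42/Abeta40 below this counts as positive
PTAU181_CUTOFF = 42.5          # CSF pTau181 (pg/mL) above this counts as positive


def age_at_event(visits, is_positive):
    """Return (age, event_observed) from a list of (age, value) visits.

    The event age is the first visit where `is_positive(value)` is true;
    otherwise the participant is censored at the last visit.
    """
    ordered = sorted(visits)
    for age, value in ordered:
        if is_positive(value):
            return age, True
    return ordered[-1][0], False


def kaplan_meier(samples):
    """Kaplan-Meier curve as (time, survival) pairs from (time, event) samples."""
    event_times = sorted({t for t, observed in samples if observed})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for time, _ in samples if time >= t)
        events = sum(1 for time, observed in samples if time == t and observed)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def amyloid_positive(ratio):
    return ratio < AMYLOID_RATIO_CUTOFF


# Three hypothetical participants with (age, Abeta42/Abeta40) visits.
participants = [
    [(60, 0.09), (62, 0.07), (64, 0.06)],
    [(65, 0.10), (67, 0.095)],
    [(70, 0.08), (72, 0.05)],
]
samples = [age_at_event(v, amyloid_positive) for v in participants]
curve = kaplan_meier(samples)
```

Because lumbar punctures were roughly two years apart, the true crossing age lies somewhere before the recorded visit; this sketch, like the definition in the text, simply uses the visit age.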
We then evaluated the trajectory of a variety of amyloid (CSF Aβ42/Aβ40 and PET-PiB summary value), tau (CSF pTau181 and PET-AV1451 Tauopathy) and neurodegeneration (cortical thickness, WMH volume, CSF NfL) biomarkers, testing for differences across latent clusters. We applied generalized additive mixed models (GAMMs), selected for their interpretability, capacity for nonlinearity, and ability to handle repeated measures at inconsistent time intervals (49). Age at procedure, identified latent cluster (based on the longitudinal CSF pTau/Aβ42 ratios), and their interaction were included as regressors. The following parameters were utilized as response variables: CSF Aβ42, PET-PiB cortical amyloid summary, CSF pTau, PET-AV1451 tauopathy, cortical thickness, WMH volume, and CSF NfL.
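One generic way to write a model of this kind is given below. Here $y_{ij}$ is the biomarker value for participant $i$ at visit $j$, $c(i)$ is that participant's latent cluster, and $f_{c(i)}$ is a cluster-specific smooth function of age, which carries the age-by-cluster interaction. The participant-level random intercept $b_i$ is an assumption made for illustration; the text does not state the random-effects structure that was used.

```latex
y_{ij} = \beta_0 + \beta_{c(i)} + f_{c(i)}(\mathrm{age}_{ij}) + b_i + \varepsilon_{ij},
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```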
Finally, we applied Pelora, a supervised clustering technique, to classify individuals as members of the previously identified latent clusters using proteomics values (50). A total of 713 CSF proteins passed QC and were utilized as features in the supervised clustering model. For this analysis, single timepoint proteome values were compared to clusters derived from longitudinal trajectories.
After applying Pelora to identify ten protein clusters that were predictive of the latent cluster labels, we utilized 10-fold cross validated lasso regression to select clusters relevant for analysis (51). Because this portion of the analysis was for hypothesis generation rather than diagnostic development, we ran Pelora and the subsequent lasso binomial regression on the entire dataset. In order to assess the stability of the classification algorithm, we also completed 10-fold cross validation with an 80% train/20% test split and evaluated the area under the curve (AUC).
To complete our analysis of the proteome, we also calculated predictive power scores for each protein.
Further, we performed logistic regression using group membership in the identified latent clusters as the response variable and each protein as the regressor in order to calculate the individual AUC for each protein's ability to classify. We performed a pathway analysis for the proteins identified by Pelora, in order to better understand the function of the proteins associated with latent cluster group membership (52).
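A per-protein AUC of this kind can be sketched without fitting a model. With a single regressor, the fitted logistic probability is a monotone function of the protein value, so ranking by the raw value gives the same AUC when the fitted coefficient is positive and one minus that value when it is negative. The function name and the example values below are illustrative assumptions, not study data.

```python
def auc_from_scores(scores_pos, scores_neg):
    """Rank-based AUC.

    This is the probability that a randomly chosen member of the positive
    group has a higher score than a randomly chosen member of the negative
    group, with ties counted as one half.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up abundance values for one hypothetical protein in two clusters.
cluster_a = [2.1, 3.4, 2.9]
cluster_b = [1.0, 2.5, 1.8, 2.1]
protein_auc = auc_from_scores(cluster_a, cluster_b)
```

An AUC of 0.5 corresponds to no separation between the two clusters, and values near 0 or 1 indicate strong separation in one direction or the other.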

Identification of Early Differences in Preclinical AD Pathology
An unsupervised machine learning technique, growth mixture modeling, was used to cluster the longitudinal trajectories for each participant with regard to CSF pTau181 / Aβ42 (43). This was a novel application of the algorithm to preclinical amyloid and tau biomarkers. Using a data-driven search for the appropriate number of clusters, three latent growth trajectories were identified (Fig. 1a). The largest cluster was the "AD Biomarker Negative" group (N = 69) and contained individuals who had relatively low CSF pTau181 throughout the time period. Despite CSF Aβ42 progressively decreasing, these individuals continued to have relatively low CSF pTau. The second cluster of individuals was referred to as the "Intermediate AD Biomarkers" group (54) (N = 27). It comprised individuals who had higher CSF pTau181 and correspondingly lower CSF Aβ42 but did not have CSF pTau181 as high as the third group.
Participants in the third group were referred to as the "AD Biomarker Positive" group (N = 12). These individuals had high CSF pTau181 and low CSF Aβ42 that were consistent with AD positivity (48). These individuals were at the highest risk for developing AD (55).
In order to understand the different relationships between CSF Aβ42 and CSF pTau181 for the three groups, we used a generalized additive mixed effect model (GAMM), fitting cubic splines by group membership. A breakpoint for the Intermediate AD Biomarker cluster occurred just below 1000 pg/mL (Fig. 1a). After this threshold, the applied GAMM revealed three distinct slopes: one negative slope, indicating a decreasing relationship between CSF pTau181 and declining CSF Aβ42 (AD Biomarker Negative latent cluster), one approximately zero slope, indicating no relationship between CSF pTau181 and CSF Aβ42 (Intermediate AD Biomarkers latent cluster), and one positive slope, indicating an increasing relationship between CSF pTau181 and declining CSF Aβ42 (AD Biomarker Positive latent cluster).
Given recent work emphasizing the utility of Aβ40 as a means to mitigate abnormally high or low Aβ42 values that could be attributed to individual variation in protein production or ventricular volume (56), we evaluated the relationship between CSF Aβ42/Aβ40 and CSF pTau181 for the three latent clusters. This normalization of CSF Aβ42 by Aβ40 (Fig. 1b) transforms the apparent three trajectories such that all participants fall on a single monotonically increasing continuum where low CSF Aβ42/Aβ40 is associated with high CSF pTau.
Within this continuum, AD Biomarker Negative individuals exist nearly entirely below the thresholds for amyloid and tau positivity; Intermediate AD Biomarkers individuals exist in the transition area, where the relationship between CSF pTau181 and CSF Aβ42/Aβ40 goes from essentially flat to steeply increasing; and AD Biomarker Positive individuals show a steeply increasing relationship between CSF Aβ42/Aβ40 and CSF pTau181.
Although Fig. 1b seems to show a continuum of pathology, there are important demographic differences across the latent clusters (Table 1). There were differences with regard to age at study enrollment (AD Biomarker Negative individuals were the youngest while Intermediate AD Biomarkers participants were the oldest). These differences in age reveal that although the normalized plot of CSF pTau181 by CSF Aβ42/Aβ40 shows a continuum of pathology, this does not represent a continuum across time. There were also differences with regard to APOE status, where the APOE ε4 allele was most frequently found in the AD Biomarker Positive cohort and the APOE ε2 allele was most frequently found in the AD Biomarker Negative cohort. By the conclusion of the study, there was a statistically significant difference in CDR across the three latent clusters. The AD Biomarker Positive cohort had the greatest clinical decline.

Patterns of Development Across the AT(N)
We then performed survival analysis (47) in order to evaluate time to pathology development. We assessed age to amyloid positivity, age to tau positivity, and age to CDR conversion as stratified by the three latent clusters. In addition to survival analysis, we applied GAMMs to biomarkers of AT(N) pathology.
Survival analysis showed a clear separation in age at amyloid positivity (CSF Aβ42/Aβ40 < 0.0673 (48)) across clusters (Fig. 2a). The majority of AD Biomarker Positive participants were amyloid positive before age 65. GAMM modeling shows that the AD Biomarker Positive participants were, on average, CSF Aβ42/Aβ40 positive at time of enrollment (Fig. 3) (44, 54). AD Biomarker Positive participants were also CSF pTau positive (CSF pTau181 > 42.5 pg/mL) around 65 years old (48). Similar results were also seen using PET-AV1451. Relatively few participants converted to CDR > 0; however, the majority of the AD Biomarker Positive cluster developed clinical symptoms by their late 70s (Fig. 2c).
The majority of Intermediate AD Biomarker participants did not become amyloid positive by CSF Aβ42/Aβ40 until around 75 years old, based on survival analysis. The difference in time to amyloid positivity by cluster is statistically significant (Cox proportional hazards test, p < 0.001). GAMM modeling aligns with this observation. Interestingly, positivity as defined by CSF pTau (48) occurred prior to amyloid positivity as defined by CSF Aβ42/Aβ40 for the Intermediate AD Biomarkers cohort (Fig. 2b). This ordering does not align with the AT(N) hypothesis. For both the Intermediate and AD Biomarker Positive cohorts, clinical symptoms occurred after pathology developed, consistent with the AT(N) hypothesis (57). There were no observable differences in tauopathy as measured by PET-AV1451 between the Intermediate AD Biomarkers and AD Biomarker Negative cohorts; however, the data were relatively sparse.
There are no observable differences in any of the measures of neurodegeneration (cortical thickness, WMH volume, and CSF NfL) across the three clusters of participants.

Replicating Identified Clusters via the Proteome
We attempted to classify individuals as members of one of the three cohorts using only proteomics data. No additional covariates (e.g. age, sex, APOE genotype) were included in the initial analysis. For each pair of cohorts, we applied Pelora (58) twice: once to the entire set and once using cross validation. We evaluated and reported the proteins identified when we applied Pelora to the entire cohort. We also applied 10-fold cross validation in order to evaluate the generalizability of the classifications; the AUC associated with both model runs is subsequently reported. We repeated the analysis after including clinical values of age alone, sex alone, and both age and sex in accordance with the methods outlined (58). Our results remained robust after introduction of these covariates.
The Pelora algorithm (58), which is applied to labeled data (in this case, the labels were AD Biomarker Positive, Intermediate AD Biomarkers, and AD Biomarker Negative), identified groups of proteins that were either upregulated or downregulated. The expression of each of the highlighted proteins, separated by group, is shown in Supplemental Figs. 2-7. Heatmaps were generated that showed correlation between identified proteins (Supplemental Figs. 13-15). Proteins that were most important for each group were ranked by binomial log-likelihood. To get a general sense of which proteins played the most significant role in each comparison (Intermediate AD Biomarkers vs. AD Biomarker Negative, AD Biomarker Positive vs. AD Biomarker Negative, AD Biomarker Positive vs. Intermediate AD Biomarkers), the primary function of each of the ten most important proteins (as ranked by binomial log-likelihood) was classified after reviewing available literature. A pathway analysis is also reported in the supplement. Proteins were grouped as primarily associated with the blood brain barrier (BBB), cardiovascular disease (CVD), liver function, amyloid production and/or clearance, inflammation, or neurodegeneration. The relationship between each protein, its function, and which classification(s) it applied to is shown in Fig. 4.

Discussion
This study used a novel application of growth mixture modeling to identify unique pathological trajectories of participants as a function of CSF pTau181 / Aβ42. Previously this unsupervised clustering technique has been applied to markers of cognition and neurodegeneration (4,15,44,45). Our objective was to apply a data-driven method to understand heterogeneity in the longitudinal development of AD pathology.
After performing this classification, we examined the demographic characteristics of the three clusters that were identified. The AD Biomarker Positive group had the greatest proportion of APOE ε4+ individuals. This is consistent with studies that have previously identified the APOE ε4 allele as a risk factor for developing AD (69, 70) and for amyloid deposition at an earlier age (71). The AD Biomarker Negative individuals were younger than the other clusters at the time of enrollment. Interestingly, AD Intermediate individuals were the oldest. A priori, we would have anticipated that the oldest group would have been the AD Biomarker Positive cohort. The older age of the Intermediate cohort suggests that this group is developing AD pathology at a slower rate than the AD Biomarker Positive group and may exhibit some resilience in the face of increasing pathology.
When we looked at pTau181 as a function of CSF Aβ42/Aβ40, individuals appeared to move along a single continuum rather than three distinct paths. Although it appears in the analysis of the AT(N) that Intermediate individuals attain tau positivity before reaching amyloid positivity, the presented continuum shows steadily decreasing CSF Aβ42/Aβ40 with rapidly increasing CSF pTau181 after individuals reach an inflection point. This tipping point is approximately where the Intermediate AD individuals fall along the continuum. Figure 1b shows a relationship between amyloid and tau that is consistent with the prevailing literature (e.g. 2,54,56); however, it is important to recall that not all participants are the same age. The Intermediate cohort is oldest, meaning that although we observe a continuum of pathology, it is not aligned temporally.
We further interrogated the proposed AT(N) continuum through the use of survival analysis and application of GAMMs with a variety of biomarkers. Overall, longitudinal changes across the AT(N) aligned with existing literature (2,57,72). Compared to the other groups, the AD Biomarker Positive group had significantly lower CSF Aβ42/Aβ40 at the time of enrollment, and this difference persisted throughout subsequent time points. With the limited PET-PiB data available, we observed an elevation in amyloid starting at enrollment for the AD Biomarker Positive cohort. This has important implications for clinical trials that emphasize early intervention for amyloid detection and potential removal. For the AD Biomarker Positive group, amyloid-related changes occurred in cognitively normal adults before age 50, consistent with previous work (73). Individuals in the Intermediate AD Biomarkers cohort showed a steady decline in CSF Aβ42/Aβ40 over the study, with many eventually becoming amyloid positive. The Intermediate cohort displayed a very clear increase in amyloid pathology as measured by PET-PiB around age 70.
CSF pTau was also elevated at the time of enrollment for the AD Biomarker Positive cohort. These participants were CSF pTau181 positive by their mid-50s. Of note, tau positivity as measured by PET tau lagged behind CSF pTau181 positivity for this group, which is consistent with previously published literature (5). While the AD Biomarker Positive group followed the proposed AT(N) curves (57), the Intermediate cohort developed CSF tau positivity earlier (65 years old) than CSF amyloid positivity (75 years old). This development of tau before amyloid is consistent with previous studies suggesting that in some individuals, tau positivity can occur prior to amyloid positivity (74). In the survival analysis, we did not detect a statistically significant difference in time to positivity for amyloid compared to tau for either the AD Biomarker Positive or Intermediate AD Biomarkers cohorts, again suggesting that there may be heterogeneity in the progression to symptomatic AD. However, our survival analyses are limited by the number of individuals who were AD biomarker positive upon enrollment.
Throughout enrollment, there were no significant differences in neurodegeneration biomarkers between the three groups. This lack of difference was expected, as we focused on cognitively normal individuals who may be at the very earliest stages of AD. Neurodegeneration is proposed to occur during the later stages, and our results support the AT(N) hypothesis. At the conclusion of this study, we had one participant with a CDR = 2 and one participant with CDR = 1. Even though amyloid and tau pathology developed in this cohort, participants rarely progressed to symptomatic AD over the course of the study (~11 years). Of those who did, cognitive decline aligned with our assessment of disease pathology severity (42% of the AD Biomarker Positive cohort had decline on clinical assessments compared to 7% of the Intermediate AD Biomarker cohort).
Perhaps most surprising was our ability to classify individuals as AD Biomarker Positive, Intermediate AD Biomarker, or AD Biomarker Negative (groupings that emerged organically from an unsupervised clustering analysis) using an entirely separate method, namely the CSF proteome. Several post mortem studies have previously applied proteomic analysis to identify potential sources of resilience (16-18). In contrast to previous ex vivo brain-tissue based proteomics studies of resilience, we utilized in vivo CSF samples. Observed differences in the proteome represent a starting point rather than a conclusive identification of discrepancies in early preclinical AD progression. Future work will require quantitative targeted measurements of specific proteins.
Within the AD Biomarker Negative group, many proteins associated with neurodegeneration and amyloid production were downregulated. Further studies will be required to determine if the decreased expression of these proteins plays a protective role in slowing the progression of AD pathology. Ubiquitin Modifier 1 (75), a protein associated with amyloid production, was downregulated in the AD Biomarker Negative group as compared to the other two cohorts, highlighting the importance of amyloid in the progression to AD. Several BBB associated proteins (including SMOC, Nidogen-2, and Matrilysin) were also downregulated. Many neurodegeneration associated proteins (e.g. Calcineurin, 14-3-3 protein family, MCL-1, IGF-1, NOTCH03, and Kallikrein-8) were also less commonly expressed in this cohort, consistent with recent findings (41).
Within the Intermediate AD Biomarker group, several proteins were upregulated. This suggests that this cluster may offer insights into slowing the progression of AD. Angiogenin (76) and Nidogen-2 (77), proteins associated with BBB integrity, were elevated in this group. Nidogen-2 on its own was sufficient to distinguish the Intermediate from the AD Biomarker Positive cohort, indicating its overall importance.
Retinol Binding Protein 4 (78), a protective protein associated with liver health, was identified by Pelora as important and was also upregulated in this group. From this we find evidence consistent with previously published arguments for the importance of BBB integrity (79) and liver health (80) in the prevention of AD.

Limitations
Although this dataset is relatively large in the context of longitudinal CSF studies, we had relatively few datapoints in the context of machine learning. Because of this data sparsity, we were unable to retain a true validation dataset and instead had to rely on cross-validation to evaluate generalizability. We were also unable to compare results to an external cohort for validation. The requirement of multiple LPs with CSF Aβ42 and CSF pTau181 for unsupervised classification, in addition to the need for fully multiplexed proteomics via CSF, makes this a unique dataset. We hope that additional highly characterized longitudinal CSF samples become available for analysis in the future. Future collection of longitudinal PET Tau images could also greatly enhance this dataset and allow for a more complete investigation of tau progression in this cohort.

Conclusions
Our findings on both the timing of amyloid and aggregated tau development, and the ability of the CSF proteome to classify these groupings, have important implications for clinical trials. For therapies that focus on the AT(N), we highlight heterogeneity in amyloid and tau development. The AD Biomarker Positive group developed amyloid and tau pathology before age 50, suggesting very early intervention is necessary for this group. The Intermediate AD Biomarker group developed significant tau pathology before becoming amyloid positive. Amyloid-reducing agents may demonstrate less efficacy in this group, as these individuals do not seem to require amyloid positivity before developing substantial tau burden. We also identified additional potential non-AT(N) related targets for prospective AD drug development, including BBB integrity, liver function, and neuroinflammation.

The three clusters exhibit different behaviors across the Amyloid and Tau phases of the AT(N) progression. AD Biomarker Positive individuals have the greatest amyloid accumulation as quantified by both the CSF Aβ42/Aβ40 ratio and PET-PiB imaging. They also have the greatest level of tau accumulation as quantified by both CSF pTau181 and PET-AV1451 imaging. The Intermediate cohort develops both amyloid positivity and tau positivity during the period of enrollment of the study, but they become tau positive before they are amyloid positive. There are no differences in the selected neurodegenerative biomarkers across the clusters.

Figure 4
Using the log-loss criterion, we identified ten proteins that were most important for each classification (Intermediate vs. AD Biomarker Negative, AD Biomarker Positive vs. AD Biomarker Negative, AD