AGES study population
Participants aged 66 through 96 were from the Age, Gene/Environment Susceptibility (AGES)-Reykjavik Study cohort. AGES is a single-center prospective population-based study of deeply phenotyped subjects (n = 5,764, mean age 76.6 ± 5.6 years) and survivors of the 40-year-long prospective Reykjavik study, an epidemiologic study that aimed to understand aging in the context of gene/environment interaction by focusing on four biologic systems: vascular, neurocognitive (including sensory), musculoskeletal, and body composition/metabolism30. Of the AGES participants, 3,411 attended a 5-year follow-up visit. LOAD diagnosis at AGES baseline and follow-up visits was carried out using a three-step procedure described in Sigurdsson et al.76. Cognitive assessment was carried out on all participants. Neuropsychological testing was performed on individuals with suspected dementia. Individuals remaining suspect for dementia underwent further neurologic and proxy examinations in the second diagnosis step. Thirdly, a panel comprising a neurologist, geriatrician, neuroradiologist, and neuropsychologist assessed the positive-scoring participants according to international guidelines30 and gave a dementia diagnosis. The participants were followed up for incident dementia through medical and nursing home reports and death certificates. The follow-up time was up to 16.9 years, with the last individual being diagnosed 16 years from baseline. Nursing home reports were based on intake exams upon entry or standardized procedures carried out in all Icelandic nursing homes77. The participants diagnosed at baseline were defined as prevalent LOAD cases, while individuals diagnosed with LOAD during the follow-up period were defined as incident LOAD cases. All prevalent non-AD dementia cases (n = 163) were excluded from analyses.
Age, sex, education, and lifestyle variables were assessed via questionnaires at baseline. Education was categorized as primary, secondary, college, or university degree. Smoking was characterized as current, former, or never smoker. APOE genotyping was assessed via microplate array diagonal gel electrophoresis (MADGE)78. BMI and hypertension were assessed at baseline. BMI was calculated as weight (kg) divided by height (m) squared, and hypertension was defined as antihypertensive treatment or BP >140/90 mm Hg. Type 2 diabetes was defined from self-reported diabetes, diabetes medication use, or fasting plasma glucose ≥7 mmol/L. Serum creatinine was measured via the Roche Hitachi 912 instrument and estimated glomerular filtration rate (eGFR) was derived with the four-variable MDRD study equation79. The AGES study was approved by the National Bioethics Committee (NBC) in Iceland (approval number VSN-00-063), the National Institute on Aging Intramural Institutional Review Board, and the Data Protection Authority in Iceland.
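As an illustration only, the covariate definitions above can be written as a minimal Python sketch. The function and argument names are hypothetical. The reading of "BP >140/90" as systolic >140 or diastolic >90, the original MDRD constant of 186 with creatinine in mg/dL, and the omission of the race term are assumptions rather than details confirmed by the study protocol.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def has_hypertension(on_treatment: bool, systolic: float, diastolic: float) -> bool:
    # Assumption: "BP >140/90" means systolic > 140 OR diastolic > 90 mm Hg.
    return on_treatment or systolic > 140 or diastolic > 90


def has_type2_diabetes(self_reported: bool, on_medication: bool,
                       fasting_glucose_mmol_l: float) -> bool:
    # Any one of the three criteria is sufficient.
    return self_reported or on_medication or fasting_glucose_mmol_l >= 7.0


def egfr_mdrd4(creatinine_mg_dl: float, age_years: float, female: bool) -> float:
    """Four-variable MDRD estimate in mL/min/1.73 m^2.

    Assumptions: original (non-IDMS) constant 186, serum creatinine in
    mg/dL, and the race coefficient left at 1 (term omitted).
    """
    egfr = 186.0 * creatinine_mg_dl ** -1.154 * age_years ** -0.203
    return egfr * 0.742 if female else egfr
```

For example, `bmi(81.0, 1.8)` gives 25.0 kg/m², and the female eGFR estimate is 0.742 times the male estimate at the same age and creatinine level.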
Proteomic measurements
The proteomic measurements in AGES have been described in detail elsewhere27,80 and were available for 5,457 participants. Briefly, a custom version of the SOMAscan platform (Novartis V3-5K) was applied based on the slow-off rate modified aptamer (SOMAmer) protein profiling technology81,82, including 4,782 aptamers that bind to 4,137 human proteins. Serum was prepared using a standardized protocol83 from blood samples collected after an overnight fast and stored in 0.5 ml aliquots at −80°C; only serum samples that had not been previously thawed were used for the protein measurements. All samples were run as a single set at SomaLogic Inc. (Boulder, CO, US). Hybridization controls were used to adjust for systematic variability in detection, and calibrator samples of three dilution sets (40%, 1%, and 0.005%) were included so that the degree of fluorescence was a quantitative reflection of protein concentration. All aptamers that passed quality control had median intra-assay and inter-assay coefficients of variation (CV) < 5%. Finally, intraplate median signal normalization was applied to individual samples by SomaLogic instead of normalization to an external reference of healthy individuals, as is done for later versions of the SOMAscan platform (https://somalogic.com/wp-content/uploads/2022/07/SL00000048_Rev-3_2022-01_-Data-Standardization-and-File-Specification-Technical-Note-v2.pdf).
Of the 37 APOE-independent and APOE-dependent proteins highlighted in Tables 1 and 2, respectively, orthogonal mass spectrometry (MS) has verified the specificity of 7 aptamers (6 proteins) in previous studies26. Twelve additional aptamers were profiled (CD4 (3143_3_1), BRD4 (10043_31_3), SPON1 (5496_49_3), SMOC (13118_5_3), LRRN1 (11293_14_3), S100A13 (7223_60_3), CTF1 (13732_79_3), ARL2 (12587_65_3), C1orf56 (5744_12_3), MSN (5009_11_1), IRF6 (9999_1_3) and NEFL (10082_251_3)), of which two (C1orf56 (5744_12_3) and MSN (5009_11_1)) were confirmed with SOMAmer pull-down mass spectrometry (SP-MS) using elderly patient serum samples (>65 years) purchased from BioIVT (Table 2). The methodology for the new confirmations was consistent with previous publications26, but the instrumentation was updated. Data-dependent analysis was performed on an Orbitrap Eclipse operated in positive ionization mode, with an electrospray voltage of 1500 V and an ion transfer tube temperature of 275°C. Full MS scans with quadrupole isolation were acquired in the Orbitrap mass analyzer using a scan range of 375-1500 m/z, standard AGC target, and automatic maximum injection time. Data-dependent scans were acquired in the Orbitrap with a 0.7 m/z quadrupole isolation window, 50,000 resolution, 50% normalized AGC target, 200 ms maximum injection time, and 38% HCD collision energy over a 2 sec cycle time. Dynamic exclusion of 45 sec relative to +/- 10 ppm reference mass tolerance was applied. The peptides were eluted on Aurora Ultimate 25 cm x 75 µm ID, 1.7 µm C18 nano columns over a 90 min gradient on the Vanquish Neo UHPLC system (Thermo Fisher Scientific). Raw data files were processed in Proteome Discoverer v2.5 with SequestHT database search using a canonical human FASTA database (20528 sequences, updated 04/08/22).
ACE cohort
ACE Alzheimer Center Barcelona was founded in 1995 and has collected and analyzed roughly 18,000 genetic samples, diagnosed over 8,000 patients, and participated in nearly 150 clinical trials to date. For more details, visit www.fundacioace.com/en. The syndromic diagnosis of all subjects of the ACE cohort was established by a multidisciplinary group of neurologists, neuropsychologists, and social workers. Healthy controls (HCs), including individuals with a subjective cognitive decline (SCD) diagnosis, were assigned a Clinical Dementia Rating (CDR) of 0, and mild cognitive impairment (MCI) individuals a CDR of 0.5. For MCI diagnoses, the classification of López et al., 2003, and Petersen’s criteria were used84–87. The 2011 National Institute on Aging and Alzheimer’s Association (NIA-AA) guidelines were used for AD diagnosis88. All ACE clinical protocols have been previously published89–91. Paired plasma and CSF samples92, following consensus recommendations, were stored at -80°C. A subset of the ACE cohort (n = 1,370) was analyzed with the SOMAscan 7k proteomic platform93 (SomaLogic Inc., Boulder, CO, US). The proteomic data underwent standard quality control procedures at SomaLogic and was median normalized to reference using the Adaptive Normalization by Maximum Likelihood (ANML) method (https://somalogic.com/wp-content/uploads/2022/07/SL00000048_Rev-3_2022-01_-Data-Standardization-and-File-Specification-Technical-Note-v2.pdf). Additionally, APOE genotyping was assessed using TaqMan genotyping assays for the rs429358 and rs7412 SNPs (Thermo Fisher). Genotypes were furthermore extracted from the Axiom 815K Spanish Biobank Array (Thermo Fisher) performed by the Spanish National Center for Genotyping (CeGen, Santiago de Compostela, Spain).
Statistical analysis
Protein measurement data was centered, scaled and Box-Cox transformed, and extreme outliers excluded as previously described80. Sample size was not predetermined by any statistical method but rather by available data. The associations of serum protein profiles with prevalent AD (n = 167) were examined cross-sectionally via logistic regression at baseline. The associations of serum protein profiles with incident LOAD (n = 655) were examined longitudinally via Cox proportional-hazards models. Participants who died or were diagnosed with incident non-AD dementia were censored at date of death or diagnosis. To account for hazard ratio variability which may arise with lengthy follow-ups, a secondary analysis using a 10-year follow-up cut-off of incident LOAD was performed (nLOAD = 432). To compare the fits of the two follow-up times and test for time-dependence of the coefficients, we used anova and the survSplit function from the survival R package94. For both prevalent and incident LOAD, we examined three covariate-adjusted models. The primary model (model 1) included the covariates sex and age. Model 2 included as an additional covariate the APOE-ε4 allele count. The third model (model 3) included additional adjustment for cardiovascular risk factors, lifestyle, and kidney function (BMI, type 2 diabetes, education, hypertension, smoking history, eGFR) as they have been associated with risk of LOAD95. Benjamini-Hochberg false discovery rate (FDR) was used to account for multiple hypothesis testing. Analyses were conducted using R version 4.2.1. ACE SomaLogic proteomics data was similarly Box-Cox transformed and association analysis performed in the same manner as in AGES.
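To make the multiple-testing step concrete, the following is a minimal Python sketch of the Benjamini-Hochberg adjustment. It is an illustration of the standard procedure, not the code used in the study (the analyses were run in R), and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone in rank (step-up procedure), capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An aptamer would then be called FDR-significant when its adjusted value falls below 0.05.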
APOE-dependent proteins were defined as serum proteins that were associated with incident LOAD at FDR < 0.05 in model 1, thus unadjusted for the APOE-ε4 allele, but whose nominal significance was abolished upon APOE-ε4 correction in model 2 (P > 0.05). Serum proteins that remained nominally significantly associated with incident LOAD (P < 0.05) upon APOE-ε4 correction but changed direction of effect were also considered to meet the APOE-dependence criteria, as a reversal of the effect indicates that the primary association is driven by APOE-ε4.
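The APOE-dependence rule can be summarized as a small decision function. This is a sketch under stated assumptions: the function and argument names are hypothetical, and the effect estimates are taken to be signed coefficients (for example log hazard ratios) from models 1 and 2.

```python
def is_apoe_dependent(fdr_model1: float, p_model2: float,
                      beta_model1: float, beta_model2: float,
                      alpha: float = 0.05) -> bool:
    """Apply the two APOE-dependence criteria described in the text."""
    # The protein must first be FDR-significant without APOE-e4 adjustment.
    if fdr_model1 >= alpha:
        return False
    # Criterion 1: nominal significance is lost after APOE-e4 adjustment.
    lost_significance = p_model2 > alpha
    # Criterion 2: still nominally significant, but the effect flips sign.
    reversed_direction = p_model2 < alpha and beta_model1 * beta_model2 < 0
    return lost_significance or reversed_direction
```

A protein that stays nominally significant with the same direction of effect in model 2 is therefore not classified as APOE-dependent.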
Functional enrichment analyses were performed using Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA) using the R packages clusterProfiler and fgsea96,97. The association significance cut-off for inclusion in ORA was FDR < 0.05. Background for both methods was specified as all proteins tested from the analysis leading up to enrichment testing. The investigated gene sets were the following: Gene Ontology, Human Phenotype Ontology, KEGG, Wikipathways, Reactome, Pathway Interaction Database (PID), MicroRNA targets (MIRDB and Legacy), Transcription factor targets (GTRD and Legacy), ImmuneSigDB and the Vaccine response gene set98. Finally, we looked into tissue gene expression signatures via the same methods (ORA and GSEA) using data from GTEX99 and The Human Protein Atlas, where gene expression patterns across tissues were categorized in a similar manner as described by Uhlen et al.50 and tissue-elevated expression considered as gene expression in any of the categories ‘tissue-specific’, ‘tissue-enriched’ or ‘group-enriched’. The minGSSize parameter was set to 2 when investigating the LOAD-associated serum proteins directly. When investigating the APOE-dependent protein interaction partners, minGSSize was set to 15 and maxGSSize was set to 500. Overrepresentation of brain cell type markers among LOAD-associated proteins was tested using Fisher’s exact test, with the SOMApanel protein set as background. Tissue specificity lookup for the top LOAD-associated proteins was based on the Human Protein Atlas version 22 (https://v22.proteinatlas.org/). For the protein-protein interaction (PPI) network analysis, PPIs from InWeb32 (n = 14,448, after Entrez ID filtering) were used to obtain the first-degree interaction partners of the APOE-dependent proteins.
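For readers unfamiliar with over-representation testing, the underlying calculation is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The Python sketch below illustrates that calculation for a single gene set against a custom background. It is not the clusterProfiler implementation, and the function and argument names are hypothetical.

```python
from math import comb


def ora_p_value(n_background: int, n_in_set: int,
                n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric P(X >= n_overlap).

    n_background: proteins tested (the background universe)
    n_in_set:     background proteins annotated to the gene set
    n_selected:   significant proteins (e.g. FDR < 0.05)
    n_overlap:    significant proteins annotated to the gene set
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    # Sum the probability mass for overlaps at least as large as observed.
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

With a background of 10 proteins, a gene set of 4, and 3 significant proteins of which 2 fall in the set, the function returns 40/120, or about 0.33.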
Protein comparisons across serum, CSF and brain
To compare protein modules and AD associations across tissues, protein modules and protein associations to AD were obtained from brain37 and CSF36. The brain data, from the Banner Sun Health Research Institute100 and ROSMAP101, included TMT-MS-based quantitative proteomics for 106 controls, 200 asymptomatic AD cases and 182 AD cases. The CSF samples were collected under the auspices of the Emory Goizueta Alzheimer’s Disease Research Center (ADRC) and Emory Healthy Brain Study (EHBS)36. The cohort consisted of 140 healthy controls and 160 patients with AD as defined by the NIA research framework102. Protein measurements were performed using TMT-MS and SomaScan (7k). Only SomaLogic protein measurements were included in the comparison between CSF and serum, which were median normalized. Proteins were matched on SomaLogic aptamer ID when possible but otherwise by Entrez gene symbol. Overlaps between modules and AD-associated (FDR<0.05) proteins across tissues were evaluated with Fisher’s exact test.
Mendelian randomization
A two-sample bi-directional Mendelian randomization (MR) analysis was performed to first evaluate the potential causal effects of serum protein levels on AD (forward MR), and second to evaluate the potential causal effects of AD or its genetic liability on serum protein levels (reverse MR). All aptamers significantly (FDR<0.05) associated with LOAD (incident or prevalent) were included in the MR analyses, for a total of 346 unique aptamers (Supplementary Tables 2, 3 and 5), of which 320 aptamers were significant in the full follow-up incident LOAD analysis (models 1-3), 106 aptamers were significant in the 10-year follow-up incident LOAD analysis (models 1-3) and ten aptamers were significant in the prevalent LOAD analysis (models 1-3). Genetic instruments for serum protein levels were obtained from a GWAS of serum protein levels in AGES24 and defined as follows. All variants within a 1 Mb (±500 kb) cis-window for the protein-encoding gene were obtained for a given aptamer. A cis-window-wide significance level Pb = 0.05/N, where N equals the number of SNPs within a given cis-window, was computed and variants within the cis window for each aptamer were clumped (r2 ≥ 0.2, P ≥ Pb). The effect of the genetic instruments for serum protein levels on LOAD risk was obtained from a GWAS on 39,106 clinically diagnosed LOAD cases, 46,828 proxy-LOAD and dementia cases and 401,577 controls of European ancestry19. Genetic instruments for the serum protein levels not found in the LOAD GWAS data set were replaced by proxy SNPs (r2 > 0.8) when possible, to maximize SNP coverage. Genetic instruments for LOAD in the reverse causation MR analysis were obtained from the same LOAD GWAS24, where genome-wide significant variants were extracted (P < 5e-8) and clumped at a more stringent LD threshold (r2 ≥ 0.01) than for the protein instruments to limit overrepresentation of SNPs from any given locus across the genome.
In the reverse causation MR analysis, cis variants (±500 kb) for the given protein were excluded from the analysis to avoid including pleiotropic instruments affecting the outcome (protein levels) through mechanisms other than the exposure (LOAD). A secondary reverse causation MR analysis was performed excluding any variants in the APOE locus (chr19:45,048,858-45,733,201). The causal estimate for each protein was obtained by the generalized weighted least squares (GWLS) method103, which accounts for correlation between instruments. Causality for proteins with single cis-acting variants was assessed with the Wald ratio estimator. For the reverse causation MR analysis, the inverse variance weighted method was applied due to a more stringent LD filtering of the instruments. Instrument heterogeneity was evaluated with Cochran’s Q test and horizontal pleiotropy with the MR Egger test.
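The simpler estimators named above can be sketched from summary statistics as follows. This minimal Python example covers the Wald ratio for a single instrument and the fixed-effect inverse variance weighted (IVW) estimate with Cochran's Q for uncorrelated instruments. It does not implement GWLS, which additionally requires the LD correlation matrix between instruments, and it does not cover MR Egger. The function names are hypothetical, and the Wald standard error uses the common first-order approximation that ignores uncertainty in the exposure effect.

```python
from math import sqrt


def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float):
    """Single-instrument causal estimate and approximate standard error."""
    estimate = beta_outcome / beta_exposure
    # First-order approximation: exposure effect treated as known.
    se = abs(se_outcome / beta_exposure)
    return estimate, se


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate, its standard error, and Cochran's Q.

    Assumes independent (uncorrelated) instruments.
    """
    weights = [bx ** 2 / so ** 2 for bx, so in zip(beta_exposure, se_outcome)]
    numerator = sum(
        bx * by / so ** 2
        for bx, by, so in zip(beta_exposure, beta_outcome, se_outcome)
    )
    estimate = numerator / sum(weights)
    se = sqrt(1.0 / sum(weights))
    # Cochran's Q: weighted squared deviation of per-SNP Wald ratios from
    # the pooled estimate (chi-square with n_instruments - 1 df under
    # homogeneity).
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, se, q
```

When every instrument implies the same ratio, the IVW estimate equals that ratio and Q is zero; larger Q values indicate heterogeneity between instruments.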
Methods references
76. Sigurdsson, S. et al. Incidence of Brain Infarcts, Cognitive Change, and Risk of Dementia in the General Population: The AGES-Reykjavik Study (Age Gene/Environment Susceptibility-Reykjavik Study). Stroke 48, 2353–2360 (2017).
77. Jørgensen, L. M., El Kholy, K., Damkjær, K., Deis, A. & Schroll, M. »RAI« - Et internationalt system til vurdering af beboere på plejehjem. Ugeskr Laeger 159, 6371–6376 (1997).
78. Gudnason V, S. J. S. L. H. S. S. G. Association of apolipoprotein E polymorphism with plasma levels of high density lipoprotein and lipoprotein(a), and effect of diet in healthy men and women. Nutrition, Metabolism and Cardiovascular Diseases 3, 136–141 (1993).
79. Levey, A., Greene, T., Kusek, J. & Beck, G. A simplified equation to predict glomerular filtration rate from serum creatinine. Journal of the American Society of Nephrology 11, 155A (2000).
80. Gudmundsdottir, V. et al. Circulating Protein Signatures and Causal Candidates for Type 2 Diabetes. Diabetes 69, 1843 (2020).
81. Lamb, J. R., Jennings, L. L., Gudmundsdottir, V., Gudnason, V. & Emilsson, V. It’s in Our Blood: A Glimpse of Personalized Medicine. Trends Mol Med 27, 20–30 (2021).
82. Gold, L. et al. Aptamer-Based Multiplexed Proteomic Technology for Biomarker Discovery. PLoS One 5, e15004 (2010) doi:10.1371/journal.pone.0015004.
83. Tuck, M. K. et al. Standard operating procedures for serum and plasma collection: Early detection research network consensus statement standard operating procedure integration working group. J Proteome Res 8, 113–117 (2009).
84. Jessen, F. et al. A conceptual framework for research on subjective cognitive decline in preclinical Alzheimer’s disease. Alzheimer’s & Dementia 10, 844–852 (2014).
85. Lopez, O. L. et al. Risk Factors for Mild Cognitive Impairment in the Cardiovascular Health Study Cognition Study. Arch Neurol 60, 1394 (2003).
86. Petersen, R. C. et al. Mild cognitive impairment: a concept in evolution. J Intern Med 275, 214–228 (2014).
87. Petersen, R. C. et al. Mild Cognitive Impairment. Arch Neurol 56, 303 (1999).
88. Jack, C. R. et al. NIA‐AA Research Framework: Toward a biological definition of Alzheimer’s disease. Alzheimer’s & Dementia 14, 535–562 (2018).
89. Orellana, A. et al. Establishing In-House Cutoffs of CSF Alzheimer’s Disease Biomarkers for the AT(N) Stratification of the Alzheimer Center Barcelona Cohort. Int J Mol Sci 23, 6891 (2022).
90. Rodriguez-Gomez, O. et al. FACEHBI: A Prospective Study of Risk Factors, Biomarkers and Cognition in a Cohort of Individuals with Subjective Cognitive Decline. Study Rationale and Research Protocols. J Prev Alzheimers Dis 1–9 (2016) doi:10.14283/jpad.2016.122.
91. Moreno-Grau, S. et al. Genome-wide association analysis of dementia and its clinical endophenotypes reveal novel loci associated with Alzheimer’s disease and three causality networks: The GR@ACE project. Alzheimer’s & Dementia 15, 1333–1347 (2019).
92. Vanderstichele, H. et al. Standardization of preanalytical aspects of cerebrospinal fluid biomarker testing for Alzheimer’s disease diagnosis: A consensus paper from the Alzheimer’s Biomarkers Standardization Initiative. Alzheimer’s & Dementia 8, 65–73 (2012).
93. Candia, J., Daya, G. N., Tanaka, T., Ferrucci, L. & Walker, K. A. Assessment of variability in the plasma 7k SomaScan proteomics assay. Sci Rep 12, 17147 (2022).
94. Therneau, T., Crowson, C. & Clinic, M. Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model. (2014).
95. Gottesman, R. F. et al. Associations Between Midlife Vascular Risk Factors and 25-Year Incident Dementia in the Atherosclerosis Risk in Communities (ARIC) Cohort. JAMA Neurol 74, 1246 (2017).
96. Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters. OMICS 16, 284 (2012).
97. Korotkevich, G. et al. Fast gene set enrichment analysis. bioRxiv 060012 (2021) doi:10.1101/060012.
98. Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545–15550 (2005).
99. Aguet, F. et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318 (2020).
100. Beach, T. G. et al. Arizona Study of Aging and Neurodegenerative Disorders and Brain and Body Donation Program. Neuropathology 35, 354–389 (2015).
101. Bennett, D. A. et al. Religious Orders Study and Rush Memory and Aging Project. Journal of Alzheimer’s Disease 64, S161–S189 (2018).
102. Jack, C. R. et al. NIA‐AA Research Framework: Toward a biological definition of Alzheimer’s disease. Alzheimer’s & Dementia 14, 535–562 (2018).
103. Burgess, S., Dudbridge, F. & Thompson, S. G. Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods. Stat Med 35, 1880 (2016).