Manifestations of Alzheimer’s Disease Genetic Risk in the Blood: A Cross-Sectional Multi-Omic Analysis in Healthy Adults Aged 18-90+

Background: Genetics plays an important role in late-onset Alzheimer’s Disease (AD) etiology, and dozens of genetic variants have been implicated in AD risk through large-scale GWAS meta-analyses. However, the precise mechanistic effects of most of these variants have yet to be determined. Deeply phenotyped cohort data can reveal physiological changes associated with genetic risk for AD across an age spectrum that may provide clues to the biology of the disease.
Methods: We utilized over 2000 high-quality quantitative measurements obtained from the blood of 2831 cognitively normal adult clients of a consumer-based scientific wellness company, each with CLIA-certified whole-genome sequencing data. Measurements included clinical laboratory blood tests, targeted chip-based proteomics, and metabolomics. We performed a phenome-wide association study utilizing this diverse blood marker data and 25 known AD genetic variants, adjusting for sex, age, vendor (for clinical labs), and the first four genetic principal components; sex-SNP interactions were also assessed.
Results: We observed statistically significant SNP-analyte associations for five genetic variants after correction for multiple testing (for SNPs in or near NYAP1, ABCA7, INPP5D, and APOE), with effects detectable from early adulthood. The ABCA7 SNP and the APOE2 and APOE4 encoding alleles were associated with lipid variability, as seen in previous studies; in addition, six novel proteins were associated with the e2 allele. The most statistically significant finding was between the NYAP1 variant and PILRA and PILRB protein levels, supporting previous functional genomic studies in the identification of a putative causal variant within the PILRA gene. Sex modified the effects of four genetic variants, with multiple interrelated immune-modulating effects associated with the PICALM variant. In post-hoc analysis, sex-stratified GWAS results from an independent AD case-control meta-analysis supported sex-specific disease effects of the PICALM variant, highlighting the importance of sex as a biological variable.
Conclusions: Known AD genetic variation influenced lipid metabolism and immune response systems in a population of non-AD individuals, with associations observed from early adulthood onward. Further research is needed to determine whether and how these effects are implicated in early-stage biological pathways to AD. These analyses aim to complement ongoing work on the functional interpretation of AD-associated genetic variants.

Background
The rapidly decreasing cost of genomics paired with technological advances in the generation of multi-omic data has resulted in multiple datasets of deeply phenotyped individuals with a variety of health outcomes (1)(2)(3). The data collected in these studies have the potential to yield important insights into molecular drivers of health observable in the peripheral blood. The present study seeks to leverage a unique and relatively large set of multi-omic, deep-phenotyping data to shed light on genetic pathways to late-onset Alzheimer's disease (AD) by assessing differences in ~2000 blood analytes that show association with known genetic risk variants for AD. Coupled with high-dimensional data sets, this approach has the potential to yield clues into gene pleiotropy, disease processes, and possible early-intervention strategies, which are critically important given the essentially untreatable nature of late-stage Alzheimer's disease once significant brain deterioration has occurred.
Genetic variation plays a substantial role in AD risk, with twin studies estimating AD heritability at 58-79% (4). While the emergence of recent large-scale consortia efforts has facilitated well-powered meta-analyses of genome-wide association studies (GWAS) to identify multiple common variants with small effect sizes (5,6), the research community is still untangling exactly how this genetic variation influences disease risk. Functional genomics studies are beginning to identify likely genetic pathways to disease with the aid of transcriptomic, epigenomic, and endophenotypic data (7)(8)(9)(10). So far, genetic and multi-omic studies of AD have largely focused on older individuals with either clinically diagnosed AD or milder symptoms of cognitive decline, despite research pointing to highly variable AD pathobiology that occurs on a spectrum and begins decades before the onset of clinical symptoms (11).
In this study, we leveraged the results from a large-scale GWAS meta-analysis (5) alongside data from a deeply phenotyped wellness cohort to investigate the peripheral physiological effects of genetic risk for AD in individuals without known cognitive impairment, at all ages. We undertook an agnostic approach by adopting a phenome-wide association study (PheWAS) design (12). By examining how genetic variation impacts 2008 analytes in the blood of 2831 individuals, we sought to complement previous functional genomics studies as well as potentially reveal new testable hypotheses for future studies. In addition, we tested for associations between a polygenic risk score (PGRS) for AD and blood analytes to determine if a relative burden of genetic risk might impact observable changes in the blood, and we assessed effect modification of genetic risk by sex.

Population
The Institute for Systems Biology, through partnership with its spin-out company Arivale, has access to a wealth of data collected from subscribers to the commercially available (now closed) Arivale Scientific Wellness program (3,13) from July 2015 to May 2019. In brief, participants in the Arivale program were assigned a health coach upon joining the program, who then utilized data from clinical blood assays and detailed health-history and behavioral questionnaires to personalize health advice and management of health goals. Participants consented to their de-identified data being used for research purposes.

Blood-derived analytes
We identified 2831 individuals with whole genome sequencing (WGS) and at least one class of blood-derived analyte, described as follows. For each participant, fasting clinical blood laboratory tests were measured upon joining the program. Blood samples were collected at local facilities hosted by either LabCorp (North Carolina, USA) or Quest Diagnostics (New Jersey, USA). Whole genome sequencing was performed on DNA extracted from whole blood, with library preparation using the Illumina TruSeq Nano Library prep kit and sequencing on Illumina HiSeq X (PE-150, target 30X coverage) at a single Clinical Laboratory Improvement Amendments (CLIA)-approved sequencing laboratory. At the baseline blood draw, 2827 of the 2831 individuals with sequenced whole genomes had up to 63 fasting clinical blood lab tests. Clinical blood tests included standard markers for cardiometabolic health (lipid levels), diabetes, inflammation, kidney and liver function, nutrition (vitamins and minerals), and blood cell counts. Frozen plasma samples were also sent to Olink (Olink Bioscience, Sweden) for targeted proteomics assays based on Olink's proximity extension assay technique. Up to 2694 of these participants had quantitative proteomic data on 274 proteins from three Olink panels (Cardiovascular II, Cardiovascular III, and Inflammation panels). An additional 919 proteins (from 10 additional panels available at Olink at the time) were obtained from a subsample of 354 individuals, in which Apolipoprotein E (APOE) e2/e2 and APOE e4/e4 genotypes were overrepresented. Since multiple batches were performed, previously generated pooled control samples were run with each batch and used for batch correction, and multiple control samples were included on each plate. Aliquots of frozen plasma samples were sent to Metabolon, Inc. to conduct metabolomics assays using the Metabolon HD4 discovery platform. Up to 1855 of the participants had data from 754 metabolites. Relative concentration values were reported for each metabolite. Across all analyte classes, only analytes with <20% missing values were included in analyses.

SNP selection
We selected 25 common and somewhat-rare (>1% allele frequency) single nucleotide polymorphisms (SNPs) that were significantly associated with AD in a large-scale meta-analysis based on updated data from the International Genomics of Alzheimer's Project (IGAP) (5). In addition to these variants, we also included the SNP coding for APOE e2 (rs7412). The 25 SNPs were linked to 24 genes (two SNPs in APOE), as detailed in Table S1.

Polygenic risk score calculation for AD
The PGRS for age-associated AD risk was computed using summary statistics from the initial IGAP-driven GWAS meta-analysis (6). Briefly, the set of SNPs included in the PGRS was determined as follows. The Benjamini-Hochberg (14) procedure was applied to the p-values for all SNPs tested in the GWAS to account for multiple testing by controlling the false discovery rate (FDR) at a 5% level. This FDR-filtered set of SNPs was then further pruned using linkage disequilibrium (LD): pairs of SNPs in close proximity capturing highly correlated information (r² > 0.2) were identified, and the SNP with the smaller p-value in the pair was kept; this was repeated until all remaining SNPs were mutually uncorrelated (r² < 0.2 for all pairs). The PGRS for each individual was then calculated by multiplying the published effect size for each selected SNP by the number of effect alleles the individual carried for that SNP, and summing these products across all of the selected SNPs. Missing genotypes were mean imputed using the effect allele frequency.
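
The pruning and scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `ld_prune` and `polygenic_score`, the SNP labels, and all numeric values are hypothetical; the pruning is a greedy reading of the procedure that ignores the physical-proximity window; and mean imputation is assumed to mean an expected dosage of twice the effect allele frequency.

```python
from typing import Dict, FrozenSet, List, Optional


def ld_prune(pvalues: Dict[str, float],
             r2: Dict[FrozenSet[str], float],
             threshold: float = 0.2) -> List[str]:
    """Greedy LD pruning: visit SNPs from smallest to largest p-value and keep
    a SNP only if its r^2 with every SNP already kept is <= threshold.
    Pairs missing from `r2` are treated as uncorrelated (an assumption)."""
    kept: List[str] = []
    for snp in sorted(pvalues, key=lambda s: pvalues[s]):
        if all(r2.get(frozenset((snp, k)), 0.0) <= threshold for k in kept):
            kept.append(snp)
    return kept


def polygenic_score(genotypes: Dict[str, Optional[int]],
                    betas: Dict[str, float],
                    eaf: Dict[str, float]) -> float:
    """Sum of effect size x effect-allele count over the selected SNPs.
    A missing genotype (None) is replaced by 2 * effect allele frequency,
    which is the assumed form of mean imputation."""
    score = 0.0
    for snp, beta in betas.items():
        dosage = genotypes.get(snp)
        if dosage is None:
            dosage = 2.0 * eaf[snp]
        score += beta * dosage
    return score


# Toy inputs: hypothetical SNP names, p-values, LD, effect sizes, frequencies.
pvalues = {"snpA": 1e-8, "snpB": 1e-5, "snpC": 1e-6}
r2 = {frozenset(("snpA", "snpB")): 0.6}      # snpB is in LD with snpA
kept = ld_prune(pvalues, r2)                  # snpB is pruned
betas = {"snpA": 0.10, "snpC": -0.05}
eaf = {"snpA": 0.30, "snpC": 0.25}
person = {"snpA": 2, "snpC": None}            # snpC genotype is missing
score = polygenic_score(person, {s: betas[s] for s in kept}, eaf)
```

In the toy data, the individual contributes 0.10 x 2 for snpA and -0.05 x 0.5 for the imputed snpC.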

Statistical analysis
Following a phenome-wide association study (PheWAS) approach (12,15), the primary model for each SNP used linear regression, with genotype (0, 1, or 2, with 0 indicating homozygosity for the major allele and 2 indicating homozygosity for the minor allele) as the predictor, and each continuous quantitative analyte as the dependent variable. Clinical lab and metabolite values were natural log transformed to account for right skewness and outliers, with +1 added to each natural log transformation to prevent zero values. Proteomic quantities were presented as normalized protein expression (NPX), Olink's arbitrary unit, which is on a log2 scale. Genetic ancestry was represented by principal components (PCs) 1-4, calculated using previously described methods (16). All SNP models were adjusted for age, sex, genetic ancestry PCs 1-4, and vendor identification for the clinical labs. Secondary models tested effect modification by sex by including a gene x sex interaction term in the models. We accounted for multiple comparisons by applying the Benjamini-Hochberg (B-H) method (14) on a per-SNP basis, at alpha=0.05 for the main effect of genotype in the primary models and at alpha=0.1 for the sex-SNP interaction term in the gene x sex interaction models. The FDR correction took into account testing for all 2008 possible analytes, with the understanding that this adjustment was highly conservative given a high degree of correlation among multiple groups of analytes, and the fact that some analytes were sampled in only a subset of individuals. Both raw and adjusted p-values are reported.
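
As an illustration of the per-SNP multiple-testing step, the sketch below computes Benjamini-Hochberg adjusted p-values in plain Python. It is an assumption-laden example rather than the analysis code (the study used R): the function name `bh_adjust` and the `n_tests` argument are hypothetical, with `n_tests` included only to show how the adjustment can be scaled to a fixed family size such as all 2008 analytes even when fewer tests were run for a given SNP.

```python
from typing import List, Optional


def bh_adjust(pvalues: List[float], n_tests: Optional[int] = None) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.
    `n_tests` is the family size m; it defaults to len(pvalues)."""
    m = len(pvalues) if n_tests is None else n_tests
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(len(pvalues), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy raw p-values for one SNP tested against four analytes (hypothetical).
raw = [0.001, 0.04, 0.03, 0.5]
adj = bh_adjust(raw)
significant = [q < 0.05 for q in adj]  # primary-model threshold, alpha = 0.05
```

With these toy values only the first analyte passes the 0.05 threshold; the two middle p-values share the same adjusted value because of the monotonicity step.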
We also repeated the primary PheWAS approach with participants stratified by self-identified race, due to evidence for variable genetic risk for cognitive outcomes between non-Hispanic white (hereafter referred to as "white") and non-white populations (17,18). Unfortunately, due to small numbers of individuals in specific non-white racial and ethnic groups, which become vanishingly small when accounting for allele frequency and numbers of available samples (Table 1), we were not able to assess genetic risk effects in individual groups with statistical rigor and had to group all non-white participants into one stratum for analysis. The stratified white and non-white group analyses serve as an investigation into whether our primary results reflected the majority-white makeup of the Arivale population. PheWAS was applied as described above, with FDR correction to account for multiple comparisons.
To visualize genotype-analyte associations across adulthood, we created boxplots of the log-transformed analyte values by genotype, stratified by age group (by decade, from 18-29 to 70 and over). One-way analysis of variance (ANOVA) was used to test whether there was an overall difference between genotypes within each age group. All statistical analyses were performed in R v3.5.1 (https://www.R-project.org/).
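
For readers unfamiliar with the within-age-group test, the following sketch computes the one-way ANOVA F statistic for analyte values grouped by genotype. It is a minimal example, not the study's code (which was run in R): the function name `one_way_anova_f` and the toy values are hypothetical, and the p-value lookup from the F distribution is omitted.

```python
from statistics import fmean
from typing import List, Tuple


def one_way_anova_f(groups: List[List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.
    Each inner list holds the analyte values for one genotype group."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [fmean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy log-transformed analyte values for genotypes 0, 1, and 2 in one age group.
by_genotype = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(by_genotype)
```

Here the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, giving F = 13.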
In post-hoc exploratory analysis focused on the SNP in the PICALM (Phosphatidylinositol Binding Clathrin Assembly Protein) locus (rs3851179), sex-stratified and sex-interaction analyses were performed on 12,324 cases (57.7% female) and 11,453 controls (59.9% female) of European ancestry from the Alzheimer's Disease Genetics Consortium (ADGC) (see Supplementary Table 4 for dataset details). Datasets were imputed to the Haplotype Reference Consortium (HRC) (19) panel using the Michigan Imputation Server (20). Individuals with non-European ancestry according to principal components analysis of ancestry-informative markers were excluded from further analysis. Detailed descriptions of individual ADGC datasets can be found in Kunkle et al. (5) and Table S5. Study-specific logistic regression analyses employed Plink (21) for sex-interaction analysis and SNPTest (22) for sex-stratified analysis. The sex-interaction analysis, which tested the sex*variant interaction, and the sex-stratified analysis of males and females separately were each performed under two models, one adjusting for age, sex and PCs (model 1) and a second adjusting for age, sex, PCs and APOE (model 2). Results were meta-analyzed with METAL using inverse variance-based analysis (23).
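
The inverse variance-based pooling mentioned above follows a standard fixed-effect formula, sketched here in Python. This illustrates the general technique only and does not reproduce METAL or its interface: the function name `inverse_variance_meta` and the per-study estimates are hypothetical, and the two-sided p-value assumes a normal approximation.

```python
import math
from typing import List, Tuple


def inverse_variance_meta(estimates: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Fixed-effect inverse-variance meta-analysis.
    `estimates` holds (beta, standard error) per study, e.g. log odds ratios.
    Returns (pooled beta, pooled SE, z score, two-sided p-value)."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


# Toy per-study log odds ratios and standard errors (hypothetical values).
studies = [(-0.10, 0.05), (-0.20, 0.10)]
pooled_beta, pooled_se, z, p = inverse_variance_meta(studies)
```

The more precise study (standard error 0.05) receives four times the weight of the other, so the pooled estimate of about -0.12 sits closer to it.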

Summary of population and study design
Sixty-one percent of Arivale participants were female, 22% were of non-white self-reported ethnicity, and 28% were obese (Table 1). The mean age at blood draw was 47 years, with a range of 18 to 89+. In general, individuals who joined Arivale had somewhat higher levels of cardiovascular risk markers compared to the US population, and slightly lower rates of obesity and pre-diabetes (3) (these rates were consistent with rates in the geographies and ethnicities of the population, mostly from the west coast region of the United States).

Phenome-wide association study results
We observed 33 SNP-analyte associations that were statistically significant at FDR-adjusted p-value<0.05, with most of the associations observed for the APOE SNPs (rs7412, or the e2-defining allele, and rs429358, or the e4-defining allele). The other SNPs showing significant associations with at least one clinical chemistry, protein, or metabolite were rs10933431, rs12539172, and rs3752246 (Figure 1, Table S2).

NYAP1
The most robust SNP-analyte associations we observed were between rs12539172 in the 3' region of NYAP1 (Neuronal Tyrosine Phosphorylated Phosphoinositide-3-Kinase Adaptor 1) and two co-regulated proteins, paired immunoglobulin-like type 2 receptors beta and alpha (PILRB and PILRA) (Figure 2). Carriage of the minor allele (AD risk odds ratio (OR)=0.92) was associated with a significant reduction in normalized protein expression (NPX) of PILRB and PILRA compared to individuals homozygous for the major allele (FDR-adjusted p-values=2.2×10⁻³³ and 2.3×10⁻¹⁷, respectively), while the overall level of NPX increased with age among all genotypes. The reduction in protein levels appeared roughly dose-dependent with the number of minor alleles and was observed in all but the oldest and youngest age groups (likely due to small numbers of minor-allele carriers in these groups; Table S3A). These observations led us to previous studies pointing to PILRA as the causal gene at this locus, with a missense SNP as a leading candidate (G78R, rs1859788) (24)(25)(26)(27). In post-hoc analysis, we repeated the PheWAS with this putative causal SNP (which was in LD with our index SNP rs12539172, R²=0.77), and the associations became stronger (FDR-adjusted p-value for PILRB=3.6×10⁻⁵²; for PILRA=1.4×10⁻²²) (Figure 2), with genotype differences observed in all age groups (Table S3A).

APOE4
We observed significant associations between rs429358 (which encodes the e4 allele) and multiple related clinical measures of cholesterol (Figure 3). Differences by genotype were less pronounced in older age groups, likely due to statin use (Table S3B); exploratory analyses visualizing only individuals who did not report use of cholesterol-lowering (statin) medications showed more consistent genotype-dependent differences between rs429358 and the top cholesterol marker, low-density lipoprotein (LDL) particle number (Figure S1, Table S3B). The concentrations of two proteins in the blood were associated with the e4 allele: PLA2G7 (Platelet Activating Factor Acetylhydrolase) and CD28 (T-Cell-Specific Surface Glycoprotein CD28) (Figure 3). Selected lipid metabolites in the blood were positively associated with e4: two diacylglycerol (DG) metabolites (one of which was measured twice in the Metabolon panel) were higher in e4 carriers compared to non-carriers.

APOE2
We observed significantly lower levels of multiple clinical measures of LDL cholesterol associated with carriage of the e2 allele (Figure 4). As the unadjusted plots show, e2 homozygotes differ dramatically from other genotypes, though it should be noted that few e2 homozygotes were present in the population (n=16) and they fell within a limited age range (30-59 years). Selected lipid metabolites in the blood were positively associated with e2: a monoglyceride (MG) and four diacylglycerol (DG) metabolites (one of which was a replicate) were higher in e2 carriers compared to non-carriers. We observed six e2-protein associations (Figure 5), such that each of the following proteins was observed at higher levels in e2 carriers: low density lipoprotein receptor (LDLR), heme oxygenase-1 (HMOX-1), SLAM family member 8 (SLAMF8), ring finger protein 31 (RNF31), contactin associated protein 2 (CNTNAP2), and signal recognition particle 14 (SRP14).

ABCA7
The ABCA7 (ATP Binding Cassette Subfamily A Member 7) variant (rs3752246), which has been associated with increased risk of AD (OR=1.15, Table S1), was associated with lower levels of two lactosylceramide (LC) metabolites in the sphingolipid family. These differences were evident starting in the youngest age groups (Figure S2, Table S3A). The minor allele of rs3752246 was also associated with higher levels of DEFA1 (Defensin Alpha 1), an antimicrobial peptide.

INPP5D
An intronic SNP in INPP5D (Inositol Polyphosphate-5-Phosphatase D) (rs10933431), which was associated with a lower risk of AD in meta-analyses, was associated with lower levels of the protein IDUA (alpha-L-iduronidase) (Figure S2).

Polygenic risk score
No associations were observed between the APOE-free PGRS and any analyte after FDR correction for multiple testing, either in primary analyses or in analyses adjusted for e4 status, or among non-e4 individuals only. No effect modification by sex or APOE4 status was observed.

Sex-specific findings
We observed a SNP x sex interaction involving the AD-protective PICALM variant, such that the minor allele was associated with higher levels of 30 proteins in men and lower levels of the proteins in women (Figure 6, Figure S3, Table S4). These proteins were highly correlated with one another (mean pairwise Spearman's rho = 0.49); thus, it is unclear whether the associations are independently biologically meaningful, or whether there is a passenger effect, in which one or a few proteins are driving the sex-differential association with genotype observed in the data. In addition, the PICALM variant is associated with a sex-specific effect on five highly correlated long-chain fatty acid (LCFA) metabolites and one polyunsaturated fatty acid (PFA) metabolite (docosahexaenoic acid, DHA) (Figure 6). To investigate further, we conducted a post-hoc analysis examining the impact of this variant on AD risk stratified by sex, in a meta-analysis of clinically diagnosed late-onset AD (18,812 individuals, Table S5). While AD risk was reduced in both men and women among carriers of the minor allele, the effect was stronger among men (Table 2 and Table S6), which was consistent with the sex-stratified SNP-analyte analyses (data not shown). Other observed sex-specific effects were more modest (Figure 6). The SNP near CD2AP (CD2 Associated Protein) interacted with sex to affect three highly correlated sphingomyelins and three plasmalogens, while the SNP in SPI1 (Transcription Factor PU.1) interacted with sex to affect SPARC related modular calcium binding 2 (SMOC2). Lastly, the missense ABCA7 SNP interacted with sex to affect levels of Ubiquitin conjugating enzyme E2f (UBE2F).

Stratification by self-identified race/ethnicity
Unfortunately, due to vanishingly small numbers in individual self-identified groups (Table 1), we were not able to assess genetic risk effects in individual groups with statistical rigor. As expected, analyses restricted to white individuals recapitulated results of the overall analysis (Figure S4). In the non-white group overall, we observed effect sizes that were consistent with the overall results and white-only results (Figure S5).

Discussion
Our study examines associations between known genetic risk factors for AD and blood markers (clinical labs, proteins, and metabolites). It provides insight into the manifestation of AD-related genetic risk in blood-borne analytes from cognitively normal individuals and demonstrates how AD-related genetic variation manifests in the blood across adulthood. Our results contribute to the growing literature highlighting a potential causal variant (missense SNP in PILRA), point to potential new mechanisms of protection among APOE2 carriers, and suggest a role for infectious diseases as AD risk factors, alongside lipid metabolism, immune response, and endocytosis. We also uncovered intriguing differences between men and women in how genetic risk manifests in the blood. These analyses not only add to the existing literature on functional genomics in AD, but also offer multiple potential new hypotheses to catalyze future studies.
The strongest associations in the study were between the NYAP1 SNP (rs12539172) and the PILRB/PILRA proteins. This locus was originally identified by rs1476679 near ZCWPW1 (6). NYAP1 and ZCWPW1 are located near PILRA and PILRB on chromosome 7, within a linkage disequilibrium (LD) block. In previous gene expression studies, the initial index SNP for ZCWPW1 has been associated with expression of multiple PILRB and PILRA transcripts in brain (9, 28). PILRA and PILRB are paired, co-regulated inhibiting/activating receptors, respectively, that are expressed on innate immune cells, recognize certain O-glycosylated proteins, and have an important role in regulating acute inflammatory reactions (29). The R78 substitution in PILRA (rs1859788) has been shown to reduce the binding capacity of endogenous ligands and thereby potentially increase microglial activity (27). In addition, while controversial, work from our group and others (30-32) has suggested a potential viral role in AD risk. Notably, the R78 variant has been implicated in HSV-1 (Herpes Simplex Virus type 1) infectivity (27) and differences in HSV-1 antibody titer levels (24). While previous studies have hypothesized that reduced activity of PILRA was due to steric conformational changes in the protein leading to reduced binding of key ligands (including HSV-1 glycoprotein B), our results suggest that reduced levels of circulating PILRA protein in R78 carriers could also be a factor in the overall protective effect of this genetic variant.
Statistically significant associations were observed between multiple lipid analytes and the SNPs encoding both APOE4 (rs429358) and APOE2 (rs7412). APOE normally plays a key role in lipid transport, including shuttling cholesterol to neurons in healthy brains. Notably, APOE has a role in beta-amyloid (Aβ) metabolism, and while the exact mechanism is unknown, the e4 variant appears to accelerate neurotoxic Aβ accumulation, aggregation, and deposition in the brain (33). The associations we observed between the e4 variant and increased levels of total cholesterol and LDL cholesterol, along with lower levels of high-density lipoprotein (HDL), were consistent with previous cardiovascular disease cohort studies that included young, middle-aged, and older adults (34)(35)(36)(37). The e4 allele was associated with increased NPX of two inflammatory proteins. PLA2G7 is a known cardiovascular risk marker with pro-inflammatory and oxidative activities (38) which has previously been associated with APOE genotypes (39) and implicated in AD and cognitive decline (38, 40). To our knowledge, CD28 protein levels have not previously been associated with e4 status, though this relatively weak association may be a downstream result of APOE isoform-specific effects on inflammation (41).
Blood cholesterol levels among APOE2 carriers were also largely consistent with a body of existing data (35); the e2 variant was associated with lower levels of multiple measures of LDL cholesterol. It should be noted that while 5-10% of e2 homozygotes develop type III hyperlipoproteinemia (typically in the presence of an existing metabolic disorder (42)) resulting in elevated cholesterol levels, all e2 homozygotes in the study had significantly decreased levels of LDL cholesterol compared to other genotypes. In contrast, the e2 variant was associated with higher levels of six lipid metabolites in the diacylglycerol and monoacylglycerol family; interestingly, both the e4 and e2 variants were associated with increased levels of the same two lipid metabolites in the diacylglycerol family, despite the opposite effects of these two variants on circulating blood cholesterol. Diacylglycerol is a precursor to triacylglyceride (TG), which is typically higher in APOE2 carriers (37). The effects of high DGs and TGs remain unclear. Diabetic APOE-knockout mice fed DG-rich diets had reduced atherosclerosis and lower plasma cholesterol than mice fed TG-rich or western diets (43,44); however, non-targeted metabolomics studies have shown elevated levels of DGs and MGs in AD and mild cognitive impairment (MCI) patient brains and blood compared to cognitively intact individuals (45,46).
We observed six proteins that were significantly upregulated in APOE2 carriers (Figure 3). The LDLR protein had higher levels of NPX in e2 carriers, particularly in e2 homozygotes (Figure 3a). Though APOE2 is known to bind poorly to LDLR (~2% of e3 or e4 binding activity) (47), APOE2 was associated with lower levels of LDL cholesterol across age groups as noted previously, perhaps due to compensatory upregulation of LDLR (37). Greater understanding of the compensatory mechanism leading to upregulated LDLR and lower circulating LDL cholesterol is needed. The e2 variant was associated with increased levels of the highly inducible HMOX-1, which has antioxidant properties and has been associated with both neuroprotection and neurodegeneration (48). SLAMF8 may be another link to an antioxidant effect of APOE2, as it has been implicated in modulation of reactive oxygen species and inflammation via negative regulation of NOX activity (49). APOE2 carriers displayed higher levels of RNF31 protein (aka HOIP). HOIP is the catalytic component of the linear ubiquitin chain assembly complex (LUBAC), which was shown to have a role in the recognition and degradation of misfolded proteins (50). Variation in CNTNAP2, a member of the neurexin superfamily of proteins involved in cell-cell interactions in the nervous system, has been associated with neurodevelopmental disorders (51), and has been implicated in AD-related dementia (52). Lastly, SRP14, which has a role in targeting secretory proteins to the rough endoplasmic reticulum (ER) membrane, has been identified as one of many tau-associated ER proteins in AD brains (53). To our knowledge, the APOE2-protein associations described here are novel and may help point to the mechanisms of protection associated with the e2 variant.
ABCA7 is involved in lipid efflux from cells into lipoprotein particles, plays a role in lipid homeostasis (54), and has also been implicated in Aβ processing and deposition in the brain (55). Our results support ABCA7's lipid-related function by showing lower levels of two LC metabolites among individuals carrying the AD-risk allele of rs3752246. In contrast, we observed higher NPX of DEFA1 protein in carriers of the ABCA7 variant, which is consistent with previous studies showing higher levels of this protein in cerebrospinal fluid (CSF) and sera of AD patients compared to controls (56, 57), potentially linking ABCA7 with an inflammatory response pathway to AD. Lastly, lower NPX of IDUA was associated with the INPP5D SNP. INPP5D, which encodes the lipid phosphatase SHIP1, is a negative regulator of immune signaling and is expressed in microglia (58). To our knowledge, this association has not been previously observed.
Genetic variation likely affects men and women differentially, pointing to mechanisms that contribute to known differences in AD pathology between the sexes (59). The set of proteins that were differentially affected by sex and PICALM genotype are primarily implicated in immune processes, cell adhesion, and regulatory processes, with widely overlapping functions (Figure S6). Our results highlight an interaction between the AD-risk variant in PICALM and multiple proteins implicated in immune response in a sex-specific manner, and support emerging research showing sex differences in the neuroimmune response that impact microglia function (60). We also observed a sex-differential effect of the variant on multiple LCFA metabolites and one PFA metabolite (DHA). A potential link between PICALM function, lipids, and AD is feasible: fatty acids, and DHA in particular, have long been known to have a role in maintaining brain health and cognition (61), while PICALM expression has been shown to influence cholesterol homeostasis through multiple mechanisms (62). This multi-analyte interaction was supported by results from sex-stratified GWAS meta-analyses, which showed differing effect sizes of the variant on men vs. women.
In addition to these sex-specific PICALM effects, the SNP near CD2AP, a scaffolding protein, interacted with sex to affect three highly correlated sphingomyelins and three plasmalogens, while the SNP in SPI1, a transcription factor associated with microglial activation (63), interacted with sex to affect SMOC2, a protein involved in microgliosis that has been previously associated with Aβ positivity in CSF (64).
We also examined an AD-specific polygenic risk score. While the PGRS is predictive of disease in case/control studies (65), it was not associated with any blood analytes in the all-ages AD-free Arivale cohort. Combining genetic effects into a single score for AD likely served to dilute any individual genetic effect on the manifestation of genetic risk in the blood. In addition, the relative youth and cognitive health of this cohort should be taken into account. The PGRS may be more likely to detect perturbation in analytes that are markers of systemic inflammation or immune dysfunction in later life and among cohorts experiencing cognitive impairment.
The results presented here are novel and, we believe, will be of interest to the AD-related functional genomics community, though several limitations should be noted. The study population was not a random sample but was self-selected. The population is largely self-identified non-Hispanic white, was mostly located on the West Coast, and likely has higher than average socio-economic status (though these data were not captured). Thus, results may not be generalizable to a broader population. At this time, we are not aware of a suitable replication cohort that would contain parallel -omics panels in an all-ages health-heterogeneous cohort. Future studies will be needed to assess the generalizability of the findings to other populations, not only for the sake of replicability, but also because of the relative ancestral homogeneity of this data set. Previous studies have shown genetic heterogeneity between white and nonwhite individuals, particularly with regard to African Americans and risk of cognitive outcomes among carriers of APOE and ABCA7 variants (17,18). Given known wide-ranging racial/ethnic disparities in dementia incidence (66), it is imperative that future deep-phenotyping studies are far more inclusive than the study presented here.
Another limitation to the interpretation of results concerns the issue of pleiotropy; we cannot distinguish pleiotropic, non-AD-related effects from true causal effects that are implicated in AD pathogenesis. However, even if the associations described here are purely the result of pleiotropy and are unrelated to causal mechanisms of AD, the novel associations we described may provide clues to the function of several genes that are highly interesting to the AD community. Relatedly, we only obtained peripheral plasma, and are unable to examine effects in AD-relevant compartments such as brain or CSF. Although we had high-coverage WGS available, we did not interrogate other types of genetic variation such as copy number variants, indels, and short tandem repeats. Lastly, data harmonization with other studies will be a challenge. For instance, most previous metabolomics studies used metabolomics data that lacked complete speciation, and more work is needed with newer technologies that yield high-fidelity data to determine the biological effects of specific serum metabolites.
This study also has multiple strengths. While most studies focused on AD-related genetic variation consist of case/control cohorts in older adults, the Arivale data offered an unprecedented look into how genetic variation perturbs physiological pathways in the blood long before disease onset, in health-heterogeneous individuals of all ages. This feature, together with the relatively large sample size (2831 individuals with WGS) and the high quality of the blood analytes collected, allowed us to observe subtle changes in blood associated with genetic variation. Our results are from a "real-world" cohort, a type of cohort that promises to be an increasing source of large-scale data in the community going forward, with its accompanying advantages and disadvantages. Some results were previously unobserved and need to be replicated (such as the associations between APOE2 and multiple proteins), while other results agree with previous findings and serve to reinforce confidence that the results are reasonably representative and not simply spurious.

Conclusions
Due to a unified worldwide effort, dozens of genetic variants have been robustly implicated in the development of AD, though we are still in the early stages of understanding exactly how genetic variation contributes to disease. Our study showed that AD-related genetic variation manifests in the blood, from early adulthood onward, and highlights known targets for prevention in early and mid-life, such as cholesterol monitoring, mitigation of inflammation, and, possibly, HSV-1 prevention and/or viral load management. Importantly, as well as yielding new insight into the pathobiology of AD through adulthood, these results may provide a significant number of new drug targets that are highly novel and biologically plausible, or may serve as biomarkers if confirmed to have a consistent influence on AD pathophysiology. Lastly, these results highlight the need to assess sex differences in future studies. Taken together, these results not only illustrate previously unobserved biological phenomena resulting from AD-associated genetic variation, but also serve as an important resource for the generation of hypotheses for future functional genomics studies and emphasize the potential insight that can be gleaned from deeply phenotyped individuals.

Availability of data and materials
The datasets generated and/or analysed during the current study are not publicly available because the data were generated by a private investment firm under legal terms that mandate researchers to sign a data access agreement permitting the use of these data for non-profit research purposes. Upon reasonable request, researchers can access the Arivale de-identified dataset supporting the findings in this study for research purposes from ISB. Requests should be sent to data-access@isbscience.org. The data are available to qualified researchers on submission and approval of a research plan.
Competing Interests

Figure 1
Statistically significant SNP-analyte associations after correcting for multiple testing (threshold FDR-adjusted p-value=0.05), by SNP.

Unadjusted box plots of clinical chemistries and metabolites significantly associated with APOE2 genotype, by age group. White box plots=individuals who are homozygous for the major allele, gray box plots=heterozygotes, black box plots=minor allele homozygotes. Box plot midline=median value; lower/upper hinges=25th and 75th percentiles, respectively; lower/upper whiskers extend no further than 1.5 x interquartile range from the hinge. Data beyond the whiskers are outlying points.