Human peripheral monocytes capture elements of the state of microglial activation in the brain

Daniel Felsky Centre for Addiction and Mental Health https://orcid.org/0000-0003-1831-9848 Hans-Ulrich Klein Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322 https://orcid.org/0000-0002-6382-9428 Vilas Menon Columbia University Medical Center https://orcid.org/0000-0002-4096-8601 Yiyi Ma Columbia University Irving Medical Center https://orcid.org/0000-0002-3609-8877 Yanling Wang Rush University Medicine Center Milos Milic CAMH Peter Zhukovsky Centre for Addiction and Mental Health Julie Schneider David Bennett Rush University Medical Center Philip De Jager (pld2115@cumc.columbia.edu) Columbia University Medical Center https://orcid.org/0000-0002-8057-2505


Introduction
Over the past decade, the role of myeloid cells in susceptibility to Alzheimer's disease (AD) has become progressively more established, given the growing number of susceptibility loci with evidence of functional consequences in myeloid cells and the fact that such variants make up a large proportion of known AD susceptibility variants 6,7. While microglia have received most of the attention to date because they are the resident immune cells of the brain, many other myeloid cells are found in the central nervous system (CNS) and could have a role in AD, since (1) they share many of the transcriptional programs of microglia and (2) enrichment analyses of genetic data implicate all myeloid cell types. Further, monocytes have been a useful substitute in human functional genetic studies 8,9 and can be differentiated into microglia-like cells that complement induced pluripotent stem cell (iPSC)-derived microglia-like cells as a model system for microglia 10. Both model systems show a similar degree of transcriptional similarity to primary human microglia.
Monocytes, dendritic cells, and microglia-like cells are found in the cerebrospinal fluid 11, and macrophages are found in atherosclerotic lesions, around blood vessels, and in the meninges 12,13.
Further, infiltrating monocytes differentiate into activated parenchymal macrophages that, with current tools, are difficult to distinguish in tissue from activated microglia. In aging, blood-brain barrier integrity is compromised, and monocytes may be more likely to infiltrate brain tissue directly, where they can differentiate into microglia-like mononuclear macrophages and further contribute to the central immune response to age-related pathology 14. Overall, the genetic evidence in AD to date does not exclude any of these cell populations from having a role in AD. Myeloid cells are part of the innate immune system and have many different roles, including acute and chronic inflammatory responses to pathogens. One example is the association of gingivitis with the risk of AD: gingivitis could lead to a chronic inflammatory response that affects the CNS through circulating immune cells and/or mediators of systemic inflammation 15,16. Alternatively, peripheral cells could be responding to the same mediators of inflammation that affect all other organs, including the CNS.
Given the roles of monocytes in responding to events unfolding within the CNS, their molecular phenotype may be sensitive to neuropathological changes associated with aging even if they are not causally involved, making them a substrate for biomarker discovery in the diagnosis and prognosis of neurodegenerative illness 17. Here, we set out to examine the relationship between peripheral blood monocytes (which are mostly recent bone marrow emigrants) and AD by measuring the transcriptome of this cell type purified from blood collected from living participants in a longitudinal study of aging with prospective blood and brain collection. Specifically, we retrieved frozen peripheral blood mononuclear cells from an archive of cryopreserved material from participants who had undergone autopsy and had transcriptomic data generated from the dorsolateral prefrontal cortex (DLPFC). We compare monocyte RNA expression levels directly with those from bulk postmortem brain tissue in the same individuals, and we identify genes that are predictive of cognitive decline and certain aging-related neuropathologies. These associations are validated against results from over a dozen human and preclinical experiments of immune cell transcriptomics in inflammatory and AD-related models. Finally, we explore the moderating effects of APOE genotype, clinical AD stage, and biological sex on the identified associations.

Results
Cryopreserved peripheral blood mononuclear cells (PBMCs) were thawed, and fluorescence-activated cell sorting (FACS) was used to isolate the CD14+ cells from each sample into RNA lysis buffer. A total of 2,000 sorted CD14+ cells were collected per participant, and a transcriptome was generated from each sample using the SmartSeq2 protocol. Data were pre-processed as outlined in the Methods section, yielding expression values for 9,129 genes in 218 participants. Demographic details of the participants used in downstream analyses are presented in Supplementary Table 1.

Monocyte-brain gene expression correlation
We first asked the relatively naïve question of whether genes in the monocyte transcriptome were strongly correlated with the bulk cortical transcriptome in the same individuals, assessing all 8,757 genes with detectable expression in both tissues. Although the mean gene-wise correlation was small, it was significantly greater than expected under the null (mean observed rho=0.016; none of 1,000 permutations reached the observed value, p<0.001), and most genes showed no individual correlation between monocytes and cortex (Figure 1A). A small number of genes did exhibit significant FDR-corrected correlations in their RNA expression levels (170 positively correlated, 6 negatively correlated). The negatively correlated genes had rho values ranging from -0.26 to -0.24 (RPS16, PDE4B, RABGAP1, FOXP1, TSPY26P, RPS28), whereas the 10 most positively correlated genes had considerably larger effect sizes, with rho between 0.68 and 0.90 (HLA-DRB5, HLA-DRB1, ERAP2, RPS26, U2AF1L5, PI4KAP1, RPL9P9, XRRA1, RPSAP58, SNHG5).
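The permutation comparison of mean correlation against a null can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes subject-matched `mono` and `cortex` matrices (subjects in rows, the same genes in the same column order), a null generated by shuffling subject labels in one tissue, and the standard add-one correction, under which the smallest attainable p-value with 1,000 permutations is 1/1,001 rather than 0. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def genewise_spearman(x, y):
    """Spearman rho for each gene (column) between two subjects x genes matrices."""
    # Rank each gene across subjects (average ranks for ties), then take
    # the Pearson correlation of the ranks, which equals Spearman's rho.
    rx = rankdata(x, axis=0)
    ry = rankdata(y, axis=0)
    rx = (rx - rx.mean(axis=0)) / rx.std(axis=0)
    ry = (ry - ry.mean(axis=0)) / ry.std(axis=0)
    return (rx * ry).mean(axis=0)


def permutation_test_mean_rho(mono, cortex, n_perm=1000, seed=0):
    """One-sided permutation test: is the mean gene-wise rho larger than chance?"""
    rng = np.random.default_rng(seed)
    observed = genewise_spearman(mono, cortex).mean()
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle whole rows (subjects): gene-gene correlation structure is kept,
        # but the monocyte-cortex pairing of individuals is broken.
        perm = rng.permutation(mono.shape[0])
        null[i] = genewise_spearman(mono[perm], cortex).mean()
    # Add-one correction: a permutation p-value can never be exactly zero.
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p
```

Shuffling subjects rather than individual values keeps the dependence between genes intact, which matters here because the test statistic is an average over thousands of correlated genes.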
To understand which biologically relevant pathways are overrepresented among the correlated genes, we performed gene ontology (GO) enrichment analyses of four gene pools (positively and negatively correlated genes at nominal p<0.05 or FDR<0.05 thresholds) using gprofiler2 ordered queries (Figure 1B-C) 18. To improve interpretability, we semantically collapsed significantly enriched biological processes (see Methods), which revealed enrichment of antigen processing and presentation of endogenous peptide antigen in the pool of 170 positively correlated genes (a highly similar enrichment pattern was observed in the larger pool of 587 nominally correlated genes, which contains these 170).
Intriguingly, enrichment for pathogenesis of SARS-CoV-2 mediated by the nsp9-nsp10 complex was also observed in this pool. Because all brain and blood samples were collected before the pandemic, this enrichment cannot reflect SARS-CoV-2 infection and instead suggests a connection to systemic inflammation. Among negatively correlated genes (219 in the nominally correlated pool), enrichment was principally observed for mRNA catabolic process and SRP-dependent co-translational protein targeting to membrane, terms composed primarily of ribosomal protein subunit (RPS) and signal recognition particle (SRP) genes (all significant enrichment results are in Supplementary Table 2).
Next, to assess whether these correlations represent true blood proxies for brain expression of certain genes, we generated a transcriptome-wide map of cis expression quantitative trait loci (cis-eQTLs) in monocytes. This eQTL map is accessible online via the AD Knowledge Portal (will be uploaded prior to publication). We also accessed brain cis-eQTL data from our prior DLPFC xQTL-serve platform 19 (updated in June 2021) as well as GTEx (v8) frontal cortex data (Brodmann area (BA) 9). As illustrated in Figure 1D, there is strong overlap in eQTLs between frontal cortex and monocytes: 275/411 eGenes in monocytes have significant eQTLs in at least one other dataset, with the majority (189) shared with both (fold-enrichment=5.87, p=1.3x10^-99). In addition, genes with stronger monocyte:cortex correlations have significantly stronger eQTLs in both tissues (Figure 1E). Thus, while monocyte RNA expression of these genes could serve as a proxy for brain expression, the correlation largely reflects the effect of genetic variants in those genes, and the added utility of measuring this largely constitutive genetic effect on monocyte gene expression is probably limited.
Having established that the monocyte transcriptome offers little value as a proxy for the cortical transcriptome, we turned to evaluating the relation of individual gene expression levels to a variety of ante-mortem cognition-related traits, postmortem neuropathologic indices, and diagnostic categories.

Association of monocyte genes with cognitive and neuropathologic traits
Full monocyte transcriptome differential expression results are visualized in Figure 2A and compared to results obtained from the cortical transcriptome (Figure 2B-C) for the same traits, using the same analytic models.
Focusing first on cognition-related traits, we found few significant associations of monocyte gene expression with measures of cognitive performance in different domains collected at the time of blood sampling. The measure of global cognitive performance, which summarizes all of the tests, had only 10 genes meeting FDR<0.05. Most of those genes overlapped with results for the component cognitive measures, notably episodic memory, which is profoundly affected in AD (Figure 2D). By contrast, and consistent with prior analyses of these data, the DLPFC transcriptome is broadly associated with cognitive performance. At a more liberal threshold of FDR<0.2, we did find significant cross-tissue overlap in the genes associated with semantic memory (p=0.001) and episodic memory (p=0.002).
In evaluating neuropathologic indices, we first focus on AD-related traits. For a pathologic diagnosis of AD (pathoAD) using the Reagan criteria 20, we again see only a small number of associations, in contrast to the extensive changes seen in the DLPFC, a region where measurable atrophy occurs late in the disease. Pathologic AD is defined by the severity and extent of neuritic plaques and neurofibrillary tangles, the two defining pathologic features of AD. There are multiple ways to measure these pathologies, and all are strongly associated with a large fraction of the DLPFC transcriptome. In monocytes, we found no associations of gene expression with measures of tau proteinopathy (neurofibrillary tangles (NFT) measured by silver stain, or paired helical filament (PHF) tau measured by immunohistochemistry with the AT8 antibody). However, we did observe some associations with measures of amyloid β proteinopathy, including an immunofluorescence-based measure of amyloid β (Total Aβ), neuritic plaques measured by silver stain, and diffuse plaques (DP). In particular, the DP trait had more associations (51 genes) than neuritic plaques or Total Aβ, and all four genes associated with neuritic plaques were also associated with DP (Figure 2E). We also found associations with the severity of cerebral amyloid angiopathy (281 genes), a measure of Aβ deposition in meningeal and parenchymal blood vessels.
This suggests that the monocyte transcriptome may be influenced by, or have an effect on, some of the same factors that influence the accumulation of amyloid proteinopathy. This narrative connects with the observation that activated microglia surround amyloid plaques and that some AD variants may influence phagocytosis of Aβ 21.
There were also associations with other neuropathologies; in particular, there were strong associations with cerebral arteriolosclerosis (635 genes). However, the most prominent associations of monocyte gene expression are with the proportion of activated microglia (PAM), a trait based on counting the number of microglia with an activated, stage III morphology 22. The two measures of cortical PAM (from the midfrontal (MF) and inferior temporal (IT) cortex) showed the most associations, despite limited power from the reduced number of individuals with both monocyte RNA expression and PAM measures (n=51 subjects). All measures of PAM had significant overlap in their associated gene sets (Figure 2F), though the strongest overlap was observed between the cortical measures (154 overlapping genes), and 7 genes were significantly associated with all four measures (hypergeometric p=7.1x10^-9; EXOSC5, MLST8, NUP58, PVT1, RAF1, RNASEH2B, SERINC1). In Figure 3, we present an alternative visualization of the transcriptome-wide results, emphasizing global cognition, diffuse amyloid, cerebrovascular, and microglial pathologies. Full differential expression results and scatterplots for top individual gene effects for all phenotypes are available for download online (will be uploaded to AD Knowledge Portal prior to publication).
Overall, while hundreds of individual genes are significantly associated with arteriolosclerosis, cerebral amyloid angiopathy, and microglial activation, the monocyte transcriptome does not appear to be strongly associated with the canonical AD-related traits of Total Aβ, neuritic amyloid plaques, PHF tau, or NFTs.

Comparison with Published Myeloid Cell Differential Expression Signatures
In the absence of comparable monocyte datasets with postmortem neuropathologic phenotype data, we extended our findings by accessing summary statistics from the myeloid landscape 2 (ML2) repository 23,24. This collection of differential gene expression results is assembled from 35 experiments in human and preclinical brain tissues and myeloid cells of the CNS and periphery, each providing effect size and significance estimates for individual gene associations with Alzheimer's disease-related and neuroinflammatory traits, perturbations, preclinical models, and cell populations. We evaluated, in a pairwise manner, the overlap of each set of differentially expressed genes from our monocyte analyses (for all neuropathologic and cognitive traits) with the differentially expressed genes from each ML2 experiment, using the genes common to both analyses as the background (results in Supplementary Figure 1 and Supplementary Table 3). Overall, there were many overlapping gene sets (hypergeometric p<0.05), though not all showed consistent directions of effect. Among the top results were overlaps between genes differentially expressed in cortical CD11b+ microglia from lipopolysaccharide-injected mice (GSE75246 25 & GSE67858 26) and monocyte genes associated with cerebral arteriolosclerosis (p=6.8x10^-8), midfrontal PAM (p=0.027), and diffuse amyloid plaques (p=0.011).
Several other overlaps of interest were identified, e.g., between monocyte genes associated with posterior putamen PAM and genes differentially expressed in CD11b+ cells of a tau-P301S mutant mouse model (p=2.8x10^-3).
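The pairwise overlap tests against ML2 can be sketched as a one-sided hypergeometric test restricted to a shared gene background. This formulation is an assumption: the text states only that overlaps were tested with a hypergeometric p-value on the background of genes common to both analyses. The function name `overlap_test` and the fold-enrichment definition (observed over expected overlap) are illustrative.

```python
from scipy.stats import hypergeom


def overlap_test(set_a, set_b, background):
    """Overlap size, fold enrichment and one-sided hypergeometric p-value."""
    background = set(background)
    # Only genes measured in both analyses can contribute to the overlap.
    a = set(set_a) & background
    b = set(set_b) & background
    k = len(a & b)
    n_bg, n_a, n_b = len(background), len(a), len(b)
    expected = n_a * n_b / n_bg
    fold = k / expected if expected > 0 else float("nan")
    # P(X >= k) when drawing n_b genes from n_bg genes, n_a of which are "hits".
    p = hypergeom.sf(k - 1, n_bg, n_a, n_b)
    return k, fold, p
```

For example, with a 100-gene background, two identical 10-gene sets give an overlap of 10 and a fold enrichment of 10. Note that this test ignores direction of effect, which is why overlapping sets can still show poor directional concordance.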

Functional enrichment for biological processes
GO overrepresentation analyses for biological processes were performed separately for up- and downregulated gene sets for each trait (Supplementary Table 4). Given the low number of genes associated with some traits, we also performed rank-based gene set enrichment analyses (GSEA) using the full summary statistics for each differential expression analysis, ranked by moderated t-statistic (top results in Supplementary Figure 2; full results in Supplementary Table 5). Far fewer enrichments were observed in overrepresentation tests of discrete gene sets than in rank-based tests. GSEA revealed the most significant enrichment for ribosomal genes, translational initiation, and mRNA metabolism across several phenotypes, including subcortical PAM measures, cerebral atherosclerosis, TDP-43 proteinopathy, and cognition. The most significant enrichment for diffuse plaque-associated genes was for cellular response to decreased oxygen levels (NES=-2.4, FDR-adjusted p=4.5x10^-8). Notably, beta-amyloid clearance was significantly enriched among genes negatively associated with midfrontal PAM (NES=-2.0, FDR-adjusted p=0.01), suggesting that part of the monocyte signature for cortical microglial activation may reflect the loss of activated microglia's ability to clear amyloid effectively.
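Rank-based tests of this kind rest on a running-sum enrichment score computed over genes ordered by their test statistic. The sketch below computes the weighted, Kolmogorov-Smirnov-like enrichment score from the original GSEA method with a weight exponent of 1. The text does not name the GSEA implementation used, the normalized enrichment score (NES) additionally requires a permutation null that is omitted here, and the function name is hypothetical.

```python
import numpy as np


def enrichment_score(stats, genes, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    stats: per-gene statistics (e.g. moderated t); genes: matching gene IDs.
    """
    stats = np.asarray(stats, dtype=float)
    order = np.argsort(-stats)               # most positive statistic first
    ranked_genes = np.asarray(genes)[order]
    ranked_stats = stats[order]
    in_set = np.isin(ranked_genes, list(gene_set))
    n_hit = int(in_set.sum())
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    # Hits step up in proportion to |stat|^weight; misses step down uniformly.
    hit_w = np.where(in_set, np.abs(ranked_stats) ** weight, 0.0)
    p_hit = np.cumsum(hit_w) / hit_w.sum()
    p_miss = np.cumsum(~in_set) / n_miss
    running = p_hit - p_miss
    # The score is the signed maximum deviation of the running sum from zero.
    return running[np.argmax(np.abs(running))]
```

A negative score, as for the diffuse plaque and midfrontal PAM results, means the set's genes are concentrated among the most negative statistics.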

Analyzing modules of co-expressed genes
To complement the single-gene analyses and maximize power to discover broad patterns in gene expression given our moderate sample size, we created modules of co-expressed genes using the WGCNA method 27 (Figure 4). The monocyte transcriptome (Figure 4A) exhibited a somewhat bifurcated pattern of diffuse correlations (two major sets of genes represented in large part by the blue (1,096 genes) and turquoise (2,327 genes) modules), with several smaller, well-defined clusters of strongly co-expressed genes (a total of 14 modules, with 2,595 genes remaining unclustered). Compared with the clustered DLPFC transcriptome (Figure 4B), fewer modules were identified (14 vs. 38), consistent with the lack of cell type heterogeneity in our purified monocyte samples and the largely immature nature of monocytes. In line with our single-gene cross-tissue analyses, the purple monocyte
In module-trait association analyses, we observed associations of multiple monocyte gene modules with cerebrovascular traits and with Lewy body (α-synucleinopathy) pathology (Figure 4D), though no associations survived FDR correction for multiple testing. This result was somewhat surprising, given the large number of individual genes associated with microglial activation and cerebrovascular phenotypes in the single-gene analyses. We investigated this further by plotting the eigengene loadings of genes in each module against their standardized effect estimates from single-gene differential expression analyses (Supplementary Figure 4). This analysis revealed that modules enriched with significantly associated genes (e.g., the brown module has 87 genes significantly associated with midfrontal microglial activation) contained substantial numbers of genes with discordant directions of effect but concordant loadings.
This indicates a lack of co-expression among monocyte genes that are associated with brain pathology in the same direction, a result supported by the minimal biological process enrichment found in overrepresentation analyses of PAM-associated differentially expressed genes (Supplementary Table 4). Interestingly, the yellow monocyte module, which was nominally associated with lower cerebral amyloid angiopathy (beta=-2.2, raw p=0.041) and lower Lewy body stage (beta=-2.8, raw p=0.035), had ARID4B (AT-rich interactive domain-containing protein 4B) as its hub gene; expression of the family member ARID5B in human monocytes has recently been associated with carotid arteriosclerosis 28.

Interaction effects with APOE genotype, clinical AD diagnosis, and sex
Finally, we explored whether APOE genotype, clinical disease stage, and biological sex moderated the observed relationships between monocyte gene expression and brain pathology and cognition. To test this, we performed principal components analysis on expression residuals of the significantly differentially expressed genes for each phenotype. We then associated the top principal component (PC1; essentially an eigengene representing the cohesive expression of only trait-associated genes; Figure 5A) with its respective outcome in separate linear or logistic regression models, each including an interaction term for: 1) APOE ε4 allele carrier status (present vs. absent), 2) clinical AD diagnosis at time of death (no cognitive impairment, mild cognitive impairment, probable AD), or 3) biological sex (male vs. female) (Figure 5B). After FDR correction across all models (18 outcomes x 3 interacting factors = 54 models), one interaction remained significant: that between monocyte gene expression and clinical AD stage on MMSE scores (interaction p=2.5x10^-4, FDR-adjusted p=0.014), suggesting that the relationship between monocyte gene expression and MMSE is present only in those with an eventual consensus diagnosis of dementia likely due to AD (Figure 5C).
We interpret this result with caution, as the MMSE is often used to screen for dementia, and homoscedasticity is not maintained across diagnostic categories. Nonetheless, MMSE scores do not determine the consensus AD diagnosis, and the effect is pronounced, so we examined PC1 and found that it loaded most strongly, in opposite directions, onto IL6ST (0.68) and AMPD3 (-0.72). This indicates that anticorrelated expression of these genes in monocytes is strongly linked to MMSE score in those with substantial cognitive deficits, where variation in performance is greatest. The IL6ST result aligns well with the association of higher circulating IL6 levels with AD, as IL6ST encodes one of the subunits of the IL6 receptor. In addition, we found a nominally significant interaction of monocyte gene expression with APOE ε4 status on global cognitive performance that did not survive FDR correction (interaction p=2.8x10^-3, FDR-adjusted p=0.15), whereby the association of expression with cognition was stronger in ε4 carriers than in non-carriers (Figure 5D). Expression PC1 for global cognition genes loaded most strongly, in concordant directions, onto SARS1 (0.47), ARAP1 (0.44), and PQLC2 (0.41). No notable interactions were observed for biological sex.
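The interaction analysis can be sketched in two steps: summarize each trait's associated genes by their first principal component (an eigengene-like score), then regress the outcome on that score, the moderator, and their product. This minimal ordinary-least-squares sketch assumes a continuous outcome (such as MMSE or global cognition) and a binary 0/1 moderator (such as APOE ε4 carrier status); the three-level clinical diagnosis would need dummy coding, binary outcomes would need a logistic model, and the function names are hypothetical. The sign of a principal component is arbitrary, so loadings such as those reported for IL6ST and AMPD3 are interpretable only relative to each other.

```python
import numpy as np
from scipy import stats


def first_pc(residuals):
    """PC1 scores and gene loadings for a subjects x genes residual matrix."""
    x = residuals - residuals.mean(axis=0)
    x = x / x.std(axis=0)                    # standardize each gene
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u[:, 0] * s[0], vt[0]             # the sign of PC1 is arbitrary


def interaction_test(score, moderator, outcome, covariates=None):
    """OLS fit of outcome ~ score * moderator (+ covariates).

    Returns the interaction coefficient and its two-sided t-test p-value.
    """
    n = len(outcome)
    cols = [np.ones(n), score, moderator, score * moderator]
    if covariates is not None:
        # One column per covariate.
        cols.extend(np.asarray(covariates, dtype=float).reshape(n, -1).T)
    X = np.column_stack(cols)
    beta, _, rank, _ = np.linalg.lstsq(X, outcome, rcond=None)
    resid = outcome - X @ beta
    df = n - rank
    sigma2 = resid @ resid / df
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t = beta[3] / np.sqrt(cov[3, 3])         # column 3 is the product term
    p = 2 * stats.t.sf(abs(t), df)
    return beta[3], p
```

The interaction coefficient is the difference in the score-outcome slope between moderator groups, which is the quantity behind statements such as "the association was stronger in ε4 carriers".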

Discussion
This report establishes a new resource of peripheral monocyte gene expression with matched data from human postmortem brain tissue, with all data and results available through the AD Knowledge Portal to facilitate reuse. A recent review summarized gene expression studies in human monocytes 29; our data are unique in the availability of rich ante- and postmortem traits for each participant. The relationship between brain and peripheral gene expression has previously been explored in whole blood, which has limitations given the heterogeneity of its component cells and the smaller sample sizes of these studies 30,31. Here, we focused on a cell type robustly implicated in AD by human genetic analyses, and we have demonstrated that the monocyte transcriptome, captured ex vivo, has little correlation with the postmortem brain transcriptome measured in the prefrontal cortex. There are some exceptions, with a subset of strongly correlated immune and ribosomal genes; however, these correlations are driven by strong eQTLs that are active in both tissues. We have also shown that the peripheral monocyte transcriptome has little association with the majority of AD-related traits: overall, peripheral monocytes do not appear to have any robust relationship with canonical AD-related pathologies.
However, these monocyte transcriptomes from older individuals were associated with traditionally non-AD pathologies such as cerebral atherosclerosis, consistent with existing work 32. More interesting are the associations with microglial activation state in both cortical and subcortical brain regions. This result aligns with the intuition that central and peripheral immune cells may respond similarly to systemic cues. However, these signals appear to be diverse and may be driven by a variety of factors, as groups of individually associated genes do not yield evidence of association when collapsed into co-expressed gene modules. Further, while the proportion of activated microglia is associated with AD traits 22, these associations do not appear to be strongly linked to monocyte gene expression. Thus, peripheral monocytes deserve further attention given their relation to microglial states, but this interconnection of peripheral and central immunity is unlikely to be strongly related to AD; it may be more relevant for other aging-related pathologic processes.
In more detailed modelling, we found little evidence for an influence of AD disease stage or APOE genotype on gene-pathology associations. One exception was an effect of IL6ST on MMSE score that was observed only in individuals with a diagnosis of dementia (likely due to AD), suggesting that later stages of AD may be accompanied by perturbations of the IL6 pathway and other measures of systemic inflammation that are not part of an early risk profile. Thus, while monocytes may be less likely to have a causal role in AD, they may reflect some effects of the disease downstream of cognitive dysfunction, unlike microglia, which have been implicated in the accumulation of amyloid and tau proteinopathy 22,33,34. Another conditional effect (albeit not significant after correction for multiple testing) was observed for cognition-associated genes, represented most strongly by SARS1, ARAP1, and PQLC2, whose association with global cognition was stronger in APOE ε4 carriers. Some of these genes have known links to AD: brain expression of ARAP1 has recently been associated with beta-amyloid load in African Americans 35, and PQLC2 is a lysosomal lysine/arginine transporter important for the unfolded protein response 36, which is protective against neurodegeneration 37. It is possible that APOE ε4 carriers, who have an increased pro-inflammatory response to lipopolysaccharide injection 38, are particularly susceptible to inflammatory alterations in peripheral monocytes; our cognition-associated monocyte genes were significantly enriched for response to interleukin-1.
This study has several limitations. First, the sample size was limited, so our largely negative results for the AD-related traits can only exclude large effects. Second, the monocyte RNA sequencing data were produced in two batches; this technical heterogeneity was mitigated with cross-batch normalization and the use of voomWithQualityWeights.
However, such methods could over-correct or attenuate certain results. Third, we performed a large number of comparisons, and the FDR correction did not account for the testing of 26 phenotypes in addition to all genes in both tissues. This is a trade-off given the limited sample size; while not strictly conservative, this approach allows genes to be prioritized for further evaluation. Because FDR adjustment does not change the rank order of genes within a phenotype, our rank-based GSEA results and their interpretation are not affected.
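The point about rank order can be illustrated with the Benjamini-Hochberg procedure, assumed here because the text refers to FDR correction without naming the method. Adjusted values are non-decreasing in the raw p-values, so pooling additional phenotypes into one correction can raise adjusted p-values but cannot reorder genes. This sketch is intended to match R's `p.adjust(p, method = "BH")`; the function name is hypothetical.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: cumulative minimum from the largest p-value down.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted
```

For example, `bh_adjust([0.01, 0.04, 0.03, 0.2])` returns approximately `[0.04, 0.0533, 0.0533, 0.2]`; the ordering of the raw p-values is preserved, with ties allowed.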
In conclusion, we have identified monocyte gene expression signatures for microglial activation, neurovascular pathology, and cognitive performance in late life. Given the roles of monocytes in monitoring and responding to events unfolding within the CNS, their molecular phenotype may be sensitive to some neuropathological changes associated with aging, making them candidate biomarkers for vascular, inflammatory, and possibly cognitive consequences of neurodegenerative illness in aging 17.

Study Participants
All participants in this study were part of the Religious Orders Study or Rush Memory and Aging Project (ROS/MAP) 39 . All subjects were recruited free of known dementia (mean age at entry = 78 ± 7.8 (SD) years), agreed to annual clinical and neurocognitive evaluation, and signed an Anatomical Gift Act allowing for brain autopsy at time of death. Written informed consent was obtained from all participants and study protocols were approved by an Institutional Review Board of Rush University Medical Center.
All participants signed a repository consent that allowed for resource sharing. Data can be requested at https://www.radc.rush.edu.

Assessment of Cognitive Performance
All subjects were administered 17 cognitive tests annually, spanning five cognitive domains. Raw scores for tests within each domain were z-scored (using the mean and standard deviation of the entire ROS/MAP cohort at baseline) and averaged to form the composite measures used as outcomes in differential expression analysis. The list of individual cognitive tests and their corresponding domains has been published 22. The Mini-Mental State Examination (MMSE), a widely used 30-item measure of global cognition and dementia severity 40, was also administered.
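Composite construction of this kind amounts to standardizing each test against baseline cohort statistics and averaging within a domain. The sketch assumes that baseline means and standard deviations are supplied per test, that every test is scored so that higher is better, and that a missing test is simply skipped in the average; none of these details, nor the function name, comes from the text.

```python
import numpy as np


def domain_composite(raw_scores, baseline_mean, baseline_sd):
    """Composite domain score: mean of per-test z-scores.

    raw_scores: subjects x tests array for one cognitive domain (NaN = missing).
    baseline_mean, baseline_sd: per-test cohort statistics at baseline.
    """
    z = (np.asarray(raw_scores, dtype=float) - baseline_mean) / baseline_sd
    # Average the available z-scores for each subject.
    return np.nanmean(z, axis=1)
```

Using fixed baseline statistics, rather than re-standardizing at each visit, keeps composite scores comparable across years, so decline appears as a drop below zero.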

Assessment of Postmortem Neuropathology and Consensus Diagnoses
With an average postmortem interval of 9.3 hours (SD=8.1), brains were removed in a standard fashion as previously described 41. All brains were examined by a board-certified neuropathologist blinded to clinical data. A total of 18 disease- and age-related neuropathological traits were measured brain-wide, including validated measures of Aβ peptides, neuritic and diffuse plaques, hyperphosphorylated tau protein, neurofibrillary tangles, microscopic and macroscopic cerebral infarcts, cerebral atherosclerosis, degree of alpha-synucleinopathy, TDP-43 proteinopathy, and hippocampal sclerosis. A binary summary diagnosis of neuropathologic Alzheimer's disease was also derived according to NIA-Reagan criteria (absent if the likelihood was no or low, and present if intermediate or high) 20. A subset of samples was also evaluated for the presence of microglia at three stages of activation in four regions (midfrontal (MF) cortex, inferior temporal (IT) cortex, ventral medial caudate (VM), and posterior putamen (PPUT)), based on morphology: stage I (thin ramified processes), stage II (plump cytoplasm and thicker processes), and stage III (appearance of macrophages). Detailed descriptions of all neuropathological variables have been previously published 22. In addition, at the time of death, a consensus summary diagnostic opinion was rendered by one or more neurologists regarding the most likely clinical diagnosis of Parkinson's disease, blind to neuropathological data 42.

Sequencing of RNA from Bulk Brain (DLPFC)
RNA sequencing on DLPFC tissue was carried out in 13 batches within three distinct library preparation and sequencing pipelines. All samples were extracted using Qiagen's miRNeasy mini kit (cat. no. 217004) and the RNase-free DNase Set (cat. no. 79254), quantified by Nanodrop, and evaluated for quality with an Agilent Bioanalyzer. Full details on these methods are available on the AMP-AD knowledge portal (syn3219045).
Briefly, for pipeline #1, the Broad Institute's Genomics Platform performed RNA-Seq library preparation using the strand-specific dUTP method 43 with poly-A selection 44.

RNAseq Processing and Quality Control
Alignment pipeline and quality control
For monocytes, two batches (n=46 and n=201; initial total n=247) were processed using the same pipeline: 1) FASTQ file quality control was performed using FastQC v0.11.5 (default parameters), 2) STAR v2.5.3a was used to align reads (GRCh38.91 reference), 3) RSEM v1.2.31 was used to quantify expression from aligned BAM files, 4) MultiQC v1.5 was used to aggregate quality metrics from FastQC and Picard tools v2.17.4, and 5) quality reports were examined for each batch, and samples were initially excluded after manual identification of outliers, primarily considering low numbers of aligned reads, excess GC coverage bias, a high percentage of read duplicates, and an abnormal distribution of read assignments across genomic annotations (nine outliers were identified and removed at this stage; new total n=238). For DLPFC, the identical pipeline was applied to all 13 batches (initial total n=1,110). Expression of the XIST gene was evaluated at this stage to exclude subjects whose expression contradicted their reported biological sex. For monocytes, five such subjects were identified (new total n=233), and for DLPFC, 13 subjects were identified (new total n=1,097).

Count data quality control and initial outlier removal
Expected counts calculated by RSEM were aggregated across both batches and used as input to limma (v3.48.3) voom in R (v4.1.1). Genes with insufficient expression (median count below 15 across the combined sample) were removed. Naïve multidimensional scaling (MDS) analysis was then performed separately for each batch on the 5,000 most variable genes (using limma "plotMDS") to identify subjects with outlying expression patterns. Outliers were defined as subjects whose value on either of the first two latent dimensions deviated from the within-batch median by more than 4 times the interquartile range (IQR). Following this step, 224 subjects remained for monocytes, and 1,091 remained for DLPFC.
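The expression filter and the MDS-based outlier rule can be sketched as follows. Two assumptions are labeled here: sample coordinates come from a principal components decomposition of the most variable genes, swapped in for limma's `plotMDS` (whose default distance is computed differently), and the ±4 times IQR rule is read as a deviation from the batch median larger than four interquartile ranges. Function names are hypothetical, and each batch would be passed separately.

```python
import numpy as np


def filter_low_expression(counts, min_median=15):
    """Drop genes (rows) whose median count across samples is below min_median."""
    keep = np.median(counts, axis=1) >= min_median
    return counts[keep], keep


def mds_outliers(logexpr, n_top=5000, k=4.0):
    """Flag samples (columns) of one batch that are outlying on the first two dimensions."""
    # Keep the n_top most variable genes.
    top = logexpr[np.argsort(logexpr.var(axis=1))[::-1][:n_top]]
    centered = top - top.mean(axis=1, keepdims=True)
    # Rows of centered.T are samples; U * S gives their PC coordinates.
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    coords = u[:, :2] * s[:2]
    med = np.median(coords, axis=0)
    q75, q25 = np.percentile(coords, [75, 25], axis=0)
    return np.any(np.abs(coords - med) > k * (q75 - q25), axis=1)
```

Median and IQR are used instead of mean and standard deviation so that the outliers being sought cannot inflate the threshold that is meant to catch them.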
To limit bias in modeling due to extreme within-subject outlier observations, we conservatively coerced extreme expression values separately for each gene: any values lying beyond 8×IQR from the sample median of the voom-transformed values were coerced to the nearest maximum or minimum point of the sample distribution. This preserved some influence of such observations in linear modeling while limiting leverage inflation. These quality-controlled data were then normalized using the trimmed mean of M-values (TMM) method. For monocytes, influential technical variables were determined to be: batch, % of usable bases, % of passed-filter reads aligned, % read duplicates, median 3' bias, estimated library size, and study (ROS vs. MAP). In addition, we observed effects on expression PCs of important biological and clinical variables measured at the time of blood draw; these were also included in downstream association analyses of monocyte expression: biological sex, age, fasting status, hematologic medication status, blood hemoglobin, mean corpuscular volume (MCV), mean corpuscular hemoglobin concentration (MCHC), platelet count, and white blood cell count (WBC). These variables were not selected in isolation; correlations among all potential covariates were examined prior to inclusion, and variables were selected to avoid redundancy (Supplementary Figures 7 & 8). In downstream models of postmortem outcomes, age at death and postmortem interval (PMI) were also included as covariates. For cognitive outcomes, years of education were included.
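The per-gene coercion step leaves open exactly where extreme values are moved. The sketch below assumes they are replaced by the most extreme observation still inside median ± 8×IQR (one reading of "the nearest maximum or minimum point of the sample distribution"); clipping to the bounds themselves would be an equally plausible alternative. The function name is hypothetical.

```python
import numpy as np


def coerce_extremes(expr, k=8.0):
    """Per gene (row), pull values lying more than k * IQR from the row median
    back to the most extreme value that lies within that range (assumed reading)."""
    out = np.array(expr, dtype=float)  # copy; rows are modified in place below
    for row in out:
        med = np.median(row)
        q75, q25 = np.percentile(row, [75, 25])
        lo, hi = med - k * (q75 - q25), med + k * (q75 - q25)
        inside = row[(row >= lo) & (row <= hi)]
        if inside.size == 0:
            continue
        row[row > hi] = inside.max()
        row[row < lo] = inside.min()
    return out
```

Because coerced values stay at the edge of the observed distribution rather than being removed, each subject still contributes to the gene's linear model.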

Covariate selection
For DLPFC, the following variables were selected using the same procedure described above: batch, study, biological sex, age at death, PMI, median coefficient of variation for coverage values of the 1,000 most highly expressed genes, % of aligned bases mapping to ribosomal RNA, % coding bases, % UTR bases, log(estimated library size), log(passed-filter aligned reads), median 5' to 3' bias, % of passed-filter reads aligned, % read duplicates, median 3' bias, and % of intergenic bases. Again, for cognitive outcomes, years of education were also included.

Estimation of brain cell type proportions in DLPFC
Prior to calculating residual expression values, a brain cell type-corrected expression matrix was generated for DLPFC from voom-transformed expression values. This matrix was used as input for cross-tissue correlation analyses, transcriptome-wide differential expression analyses, and WGCNA. Cell type proportions were estimated using the Brain Cell Type Specific Gene Expression Analysis (BRETIGEA) 45.

Final QC for post-processed expression residuals
As a final step to ensure the robustness of included data, the linear effects of identified technical covariates, age at blood draw (age at death and PMI for DLPFC), and biological sex were removed using the limma "lmFit" function (specifying robust Huber regression 48), and expression values were residualized. Individuals were then hierarchically clustered (agglomerative clustering), and the resulting dendrograms (see Supplementary Figure 9) were manually inspected to identify any additional subject outliers that escaped detection by two-dimensional MDS (6 identified for monocytes, 18 for DLPFC). This resulted in two high-quality RNAseq datasets: monocytes (218 subjects and 9,129 genes) and DLPFC bulk tissue (17,465 genes).
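Robust residualization of a single gene can be illustrated with Huber M-estimation fitted by iteratively reweighted least squares (IRLS). This is a self-contained NumPy sketch rather than the limma lmFit call used in the study; the tuning constant c = 1.345 and the MAD-type residual scale are conventional defaults (as in MASS::rlm) assumed here, and the function name is hypothetical.

```python
import numpy as np


def huber_residuals(y, covariates, c=1.345, n_iter=100, tol=1e-10):
    """Residualize one gene's expression y on covariates with a Huber M-estimator (IRLS).
    An intercept column is added automatically."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), covariates])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r)) / 0.6745  # MAD-type residual scale
        if scale == 0:
            break
        # Huber weights: 1 for small standardized residuals, c/|u| beyond c
        w = c / np.maximum(np.abs(r / scale), c)
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return y - X @ beta
```

A single gross outlier keeps a large residual instead of dragging the fitted covariate effects toward itself, which is the property that motivates robust regression here.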

Weighted Gene Co-Expression Network Analysis (WGCNA)
The WGCNA pipeline (v1.70-3) 27 was used to detect co-expressed gene modules in the monocyte and DLPFC datasets separately, using the signed network approach. For network construction, gene expression residuals were used as input; for monocytes, the residualized covariates included all selected technical variables plus age at blood draw and biological sex. This set of residuals was chosen to allow downstream module-trait analyses of both cognitive and postmortem variables, which require different analytical covariates. DLPFC expression residuals were adjusted for all selected variables listed above, in addition to brain cell type proportions. Briefly, WGCNA calculates a topological overlap matrix (TOM) from the signed correlations between all pairs of input genes, raised to a soft-thresholding power (chosen to approximate scale-free network topology, which improves module detection). To improve the robustness of our TOM network structure, we used biweight midcorrelation as our pairwise similarity measure (corType="bicor"), allowing up to 5% of observations in any given correlation to be treated as outliers (maxPOutliers=0.05).
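Biweight midcorrelation down-weights observations far from each gene's median. A minimal NumPy sketch of the standard definition follows; it omits WGCNA's maxPOutliers adjustment and its Pearson fallback for genes with zero median absolute deviation, and the function name is hypothetical.

```python
import numpy as np


def bicor_matrix(expr):
    """Biweight midcorrelation between all pairs of genes.
    expr: samples x genes; assumes every gene has a nonzero MAD."""
    x = np.asarray(expr, dtype=float)
    med = np.median(x, axis=0)
    mad = np.median(np.abs(x - med), axis=0)
    u = (x - med) / (9.0 * mad)
    # Tukey biweight: observations with |u| >= 1 receive zero weight
    w = (1.0 - u**2) ** 2 * (np.abs(u) < 1)
    xt = (x - med) * w
    xt /= np.sqrt((xt**2).sum(axis=0))
    return xt.T @ xt
```

A single extreme value in one gene receives zero weight, so its correlation with an otherwise identical gene stays close to 1, whereas a Pearson correlation would be heavily distorted.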
The optimal soft threshold for calculation of adjacency was determined by visual assessment of the scale-free topology fit index R² calculated by the "pickSoftThreshold" function (8 for monocytes and 16 for DLPFC; see Supplementary Figure 10 for threshold determination results). The TOM was then hierarchically clustered, and distinct clusters of genes were detected using a dynamic tree-cutting and merging process (non-default parameters: minClusterSize=30, deepSplit=3, detectCutHeight=0.995, minKMEtoStay=0.3, pamStage=TRUE, pamRespectsDendro=TRUE). After clustering genes into modules, eigen-decomposition was used within each module and each dataset to identify the latent feature capturing the most linear covariance among all member genes. These "eigengenes" therefore summarize the expression of all genes within a given module, weighted by the similarity of each member gene to the other members. This resulted in a set of high-confidence, structurally cohesive gene modules for further examination. Gene modules identified in monocytes and DLPFC were assessed for overlapping identities (gene members) using the WGCNA "overlapTable" function.
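The signed adjacency, topological overlap, and eigengene steps can be sketched in NumPy as below. The formulas follow the standard WGCNA definitions (signed adjacency a_ij = ((1 + s_ij)/2)^β; TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)), but this is an illustration rather than the package's implementation: dynamic tree cutting and module merging are omitted, and the function names are hypothetical.

```python
import numpy as np


def signed_tom(similarity, beta):
    """Signed adjacency a_ij = ((1 + s_ij) / 2) ** beta, then topological overlap."""
    a = ((1.0 + np.asarray(similarity, dtype=float)) / 2.0) ** beta
    np.fill_diagonal(a, 0.0)
    k = a.sum(axis=1)  # connectivity of each gene
    shared = a @ a  # l_ij = sum over u of a_iu * a_uj (zero diagonal excludes u = i, j)
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom


def module_eigengene(expr_module):
    """First principal component of standardized module expression (samples x genes),
    scaled to unit variance and signed to agree with the average module expression."""
    x = np.asarray(expr_module, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eig = u[:, 0] / u[:, 0].std(ddof=1)
    if np.corrcoef(eig, z.mean(axis=1))[0, 1] < 0:
        eig = -eig
    return eig
```

In this scheme, 1 - TOM serves as the dissimilarity passed to hierarchical clustering, and the soft-thresholding power (8 or 16 above) enters only through `beta`.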

Monocyte Expression Quantitative Trait Loci (eQTL) Analysis
Genotype data were available for 2,067 ROS/MAP subjects across two batches: n_batch1 = 1,686 genotyped using the Affymetrix GeneChip 6.0 and n_batch2 = 381 genotyped using the Illumina OmniQuad Express platform. Details of raw genotype quality control have been previously described 49. Each batch was imputed separately using the TOPMed Imputation Server (TOPMed reference r2) 50, including Eagle (v2.4) for allelic phasing and Minimac4 (v1.5.7) for imputation. Prior to submission for imputation, genotypes were preprocessed using the data preparation pipeline recommended by the TOPMed Imputation Server, available here: https://topmedimpute.readthedocs.io/en/latest/prepare-your-data.html. Imputed output data from the TOPMed server for each batch were filtered for imputation quality (removing SNPs with r<0.8) before merging and mapping to rsIDs (dbSNP build 155). This resulted in a final high-quality dataset of 9,329,439 bi-allelic autosomal SNPs. PLINK2 (v2.00a3) was used to perform QC on SNP data prior to eQTL analysis, including removing variants with minor allele frequency (MAF) < 0.05 and those violating Hardy-Weinberg equilibrium (HWE; p < 1×10^-6). BootstrapQTL (v1.0.5) 51 was used to perform cis-eQTL mapping within 1 Mb of the start and stop sites of each gene passing QC in our monocyte RNAseq experiment (GRCh38 coordinates). Analyses co-varied for biological sex, age at blood draw, and the top 10 genomic PCs estimated only from genotyped SNPs shared between genotyping platforms (n_snps = 188,936). We performed 1,000 bootstrap iterations to address the "winner's curse" common in cis-eQTL mapping studies 51 and to maximize confidence in our results.
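Selecting candidate SNPs for a gene's cis-window can be sketched as follows, using the 1 Mb window and MAF threshold described above. The data layout (SNP ID, position, per-subject 0/1/2 dosages on the gene's chromosome) and the function names are hypothetical, and the bootstrap winner's-curse correction performed by BootstrapQTL is not reproduced.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (snp_id, position in bp, alt-allele dosages 0/1/2 per subject)
Snp = Tuple[str, int, Sequence[float]]


def minor_allele_frequency(dosages: Sequence[float]) -> float:
    alt = sum(dosages) / (2.0 * len(dosages))
    return min(alt, 1.0 - alt)


def cis_candidates(gene_start: int, gene_end: int, snps: List[Snp],
                   window: int = 1_000_000, min_maf: float = 0.05) -> List[str]:
    """IDs of SNPs within `window` bp of the gene's start/stop that pass the MAF filter.
    Assumes all SNPs lie on the gene's chromosome."""
    lo, hi = gene_start - window, gene_end + window
    return [sid for sid, pos, dos in snps
            if lo <= pos <= hi and minor_allele_frequency(dos) >= min_maf]
```

Each retained SNP would then be tested against the gene's expression with the covariates listed above.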

External Postmortem Brain Cis-eQTL Resources
To compare our monocyte cis-eQTL map with cis-eQTL data from human frontal cortex, we accessed two existing resources: the GTEx cis-eQTL database v8 (dbGaP accession phs000424.v8.p2) and the recently updated (June 2021) xQTL-serve 19, which was built directly from DLPFC expression and genotype data from the ROS/MAP studies. Specifically, for GTEx, the single-tissue eGene collection for frontal cortex (BA9) was downloaded. For both datasets, each gene was matched to its strongest eQTL (i.e., its lead SNP) for downstream analysis.
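Matching each gene to its strongest eQTL amounts to keeping, per gene, the SNP with the smallest p-value. The record layout below (gene, SNP, p-value) and the gene and SNP identifiers in the usage check are hypothetical simplifications of the GTEx and xQTL-serve tables.

```python
from typing import Dict, Iterable, Tuple


def lead_eqtls(records: Iterable[Tuple[str, str, float]]) -> Dict[str, Tuple[str, float]]:
    """Map each gene to (lead SNP, p-value), keeping the smallest p-value seen."""
    best: Dict[str, Tuple[str, float]] = {}
    for gene, snp, p in records:
        if gene not in best or p < best[gene][1]:
            best[gene] = (snp, p)
    return best
```

Ties keep the first SNP encountered; any other tie-breaking rule would need to be stated explicitly.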

Gene Set Enrichment Analyses
The R package gprofiler2 (v0.2.0) 18 was used for gene set overrepresentation analyses across a broad range of annotations in the cross-tissue correlation analysis (Figure 1B-C). For biological process gene ontology (GO) overrepresentation analysis of module membership, and for rank-based gene set enrichment analysis (GSEA) of transcriptome-wide differential expression results, the clusterProfiler R package (v4.0.2) was used (parameters: minGSSize=20; maxGSSize=300; OrgDb=org.Hs.eg.db). For GSEA, an FDR-corrected threshold of q<0.05 was applied across all tested GO gene sets within each phenotype. Following enrichment tests, the rrvgo package (v1.5.2; https://ssayols.github.io/rrvgo/) was used to reduce redundancy among enriched GO terms. rrvgo groups terms by semantic similarity (here using the "Rel" measure with a threshold of 0.7) to simplify and improve the interpretability of GO analyses in which many terms may be significantly enriched.

Extension of Findings in Publicly Available Myeloid Cell Experiments
To test whether genes identified in our monocyte differential expression analyses had previously been identified in myeloid cells or brain tissue as associated with traits related to Alzheimer's disease or neuroinflammation, we systematically queried the Myeloid Landscape 2 online database 23,24. This collection of differential expression results is assembled from 35 experiments in human and preclinical brain tissues and in myeloid cells of the CNS and periphery, each providing gene-wise effect size and significance estimates. Full descriptions of each component study and their analyses can be found online (http://research-pub.gene.com/BrainMyeloidLandscape/BrainMyeloidLandscape2/#).

Statistical Analyses
Correlation of gene expression between tissues
Using covariate-normalized gene expression residuals, Spearman rank correlations between monocytes and DLPFC were calculated for each gene, followed by correction for multiple testing using the Benjamini-Hochberg FDR 52. To assess whether the mean correlation differed from 0, we performed a permutation analysis, randomly shuffling subjects in both expression datasets and re-calculating correlation coefficients for every gene, 1,000 times. This procedure yielded a null distribution of mean correlations from which the p-value for the null hypothesis (H0: μ_ρ = 0, where μ_ρ is the mean cross-tissue rho) could be calculated as the proportion of permutations in which the absolute value of the permuted mean correlation exceeded the mean correlation observed in the unshuffled data.
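The permutation test for the mean cross-tissue correlation can be sketched in NumPy as below. As assumptions of this sketch: subject labels are permuted in one dataset relative to the other (which breaks subject pairing just as shuffling both does), absolute values are compared on both sides (a two-sided reading of the rule above), and ranks are computed without tie averaging, which is adequate for continuous residuals. Function names are hypothetical.

```python
import numpy as np


def _column_ranks(x):
    # Rank within each column (gene); ties are not averaged
    return np.argsort(np.argsort(x, axis=0), axis=0).astype(float)


def per_gene_spearman(a, b):
    """a, b: subjects x genes with rows matched by subject; one Spearman rho per gene."""
    ra, rb = _column_ranks(a), _column_ranks(b)
    ra -= ra.mean(axis=0)
    rb -= rb.mean(axis=0)
    return (ra * rb).sum(axis=0) / np.sqrt((ra**2).sum(axis=0) * (rb**2).sum(axis=0))


def mean_rho_permutation_test(a, b, n_perm=1000, seed=0):
    """Observed mean rho and permutation p-value for H0: mean rho = 0."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    observed = per_gene_spearman(a, b).mean()
    null = np.array([per_gene_spearman(a[rng.permutation(len(a))], b).mean()
                     for _ in range(n_perm)])
    p = np.mean(np.abs(null) > abs(observed))
    return observed, p
```

With 1,000 permutations the smallest attainable nonzero p-value is 0.001; a p-value of 0 means no permuted mean exceeded the observed one.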

Comparing cis-eQTL results with cross-tissue correlated genes
For each gene with at least one significant eQTL in both monocytes and DLPFC (corrected p_eQTL < 0.05), evidence of genetic regulation was summarized as the -log10 of the Fisher's meta p-value combining its top eSNPs in both tissues (not necessarily the same lead SNP). Cross-tabulation followed by two-sided Fisher's exact tests was used to visualize and statistically test pairwise overlap in significant (p_FDR < 0.05) eQTLs among all three datasets (monocytes, DLPFC, and GTEx frontal cortex).
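The per-gene genetic-regulation score can be computed with Fisher's method: X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom its upper tail has a closed form, so no statistics library is needed. The function names are hypothetical; the inputs are the top-eSNP p-values for one gene in each tissue.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method: upper tail of chi-square(2k) at X = -2 * sum(ln p_i).
    For even df, P(X > x) = exp(-x/2) * sum_{i<k} (x/2)**i / i!."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def regulation_score(p_monocyte, p_dlpfc):
    """-log10 of the Fisher meta p-value of a gene's top eSNPs in the two tissues."""
    return -math.log10(fisher_combined_p([p_monocyte, p_dlpfc]))
```

For two tissues the closed form reduces to p1·p2·(1 - ln(p1·p2)); with a single p-value the function returns that p-value unchanged.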
Hypergeometric tests were used to assess the significance of overlap between significant cross-tissue correlated genes and genes with significant eQTLs in each dataset separately. For consistency with the Figure 1A visualization, we also performed a Kruskal-Wallis test on eQTL significance (-log10(p-value)) across the three mutually exclusive pools of genes shown there: uncorrelated (p > 0.05), nominally correlated (p < 0.05 and p_FDR > 0.05), and significantly correlated (p_FDR < 0.05). Post-hoc Wilcoxon rank-sum tests were performed to identify pairwise differences between groups, with Bonferroni correction for multiple testing (three tests; p_threshold = 0.05/3 ≈ 0.0167).

Transcriptome-wide differential expression analysis
Differential expression analysis was performed using limma/voom for associations with neuropathology and cognitive performance outcomes. For monocytes, filtered gene expression values processed with voomWithQualityWeights were used as input, co-varying for all technical and biological confounders specified above. For DLPFC, the same approach was used, except that filtered and voom-transformed expression values were first corrected for brain cell type proportions, as described above. Robust linear modeling was used for differential expression, allowing up to 20,000 iterations to reach convergence. The significance of effects for target outcomes in our multivariable models was assessed using empirical Bayes moderation (eBayes function). P-values were adjusted using the FDR approach within each phenotype and tissue separately.
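The empirical Bayes moderation applied by limma's eBayes function can be summarized in its standard formulation: each gene's residual variance s_g^2, estimated on d_g residual degrees of freedom, is shrunk toward a prior value s_0^2 carrying d_0 prior degrees of freedom (both hyperparameters estimated from all genes), and v_{gj} denotes the unscaled variance of coefficient j from the linear model fit.

```latex
\tilde{s}_g^{2} = \frac{d_0 s_0^{2} + d_g s_g^{2}}{d_0 + d_g},
\qquad
\tilde{t}_{gj} = \frac{\hat{\beta}_{gj}}{\tilde{s}_g \sqrt{v_{gj}}}
\sim t_{d_0 + d_g} \quad \text{under } H_0\colon \beta_{gj} = 0
```

Borrowing information across genes in this way stabilizes per-gene variance estimates; the moderated p-values are then FDR-adjusted as described above.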

Comparison of differential expression results between monocytes and DLPFC
For each cognition-related and neuropathologic outcome, hypergeometric tests were used to determine whether the identities of associated genes significantly overlapped between monocytes and DLPFC. These tests used the 8,757 genes expressed in both tissues as background and were performed at three FDR significance levels (FDR < 0.05, 0.1, and 0.2). In addition, hypergeometric tests were applied to subsets of outcomes within each tissue to determine whether genes associated with similar traits also had overlapping identities. Tests were implemented using the SuperExactTest (v1.0.7) 53 package in R.
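For two gene lists, the exact distribution of the intersection size is hypergeometric, and the overlap test is its one-sided upper tail (SuperExactTest extends this idea to intersections of more than two sets). A standard-library sketch follows; the function name is hypothetical, and in the analysis above `background` would be the 8,757 genes expressed in both tissues.

```python
from math import comb


def overlap_p_value(overlap, n_a, n_b, background):
    """P(intersection >= overlap) for random gene sets of sizes n_a and n_b
    drawn from a common background of `background` genes."""
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, i) * comb(background - n_a, n_b - i)
               for i in range(overlap, upper + 1))
    return tail / comb(background, n_b)
```

Exact integer arithmetic avoids underflow in the binomial coefficients; only the final ratio is converted to a float.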

Association of gene modules with cognitive and neuropathological variables
Module eigengenes were associated with the same set of cognitive and pathological outcomes as in the transcriptome-wide differential expression analyses, using the same covariates. Ordinary least squares regression was used for association. FDR-based multiple testing correction was applied across the full set of modules and phenotypes, but separately for monocyte and DLPFC analyses.
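The multiple-testing scheme (Benjamini-Hochberg FDR pooled across all module-phenotype pairs, but computed separately per tissue) can be sketched as follows. The BH step is the usual step-up adjustment (as in R's p.adjust with method "BH"); the dictionary layout of p-values by tissue and the function names are hypothetical.

```python
import numpy as np


def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1-D array of p-values."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Step-up: enforce monotonicity from the largest p-value downwards
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def fdr_by_tissue(pvals_by_tissue):
    """Adjust each tissue's modules x phenotypes p-value matrix as one family."""
    return {tissue: bh_fdr(np.ravel(p)).reshape(np.shape(p))
            for tissue, p in pvals_by_tissue.items()}
```

Pooling all modules and phenotypes within a tissue into one family, rather than correcting per phenotype, is what makes this correction stricter than the per-phenotype adjustment used for differential expression.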

Data Availability
All data used in the analyses described can be found on the AMP-AD Knowledge Portal (https://adknowledgeportal.synapse.org/).

Supplementary Files
This is a list of supplementary files associated with this preprint: supplementv5.docx