Unique genetic architecture of CSF and brain metabolites pinpoints the novel targets for the traits of human wellness

Perturbations of brain metabolism can contribute to traits and diseases. We conducted the first large-scale metabolite genome-wide association studies (MGWAS) in cerebrospinal fluid (CSF) and brain, which identified 219 independent associations (59.8% novel) for 144 CSF metabolites and 36 independent associations (55.6% novel) for 34 brain metabolites. Most of the novel signals (97.7% in CSF and 70.0% in brain) were tissue specific. We also integrated MWAS-FUSION approaches with Mendelian randomization (MR) and colocalization to identify causal metabolites for 27 brain and human wellness phenotypes, and identified eight metabolites as causal for eight traits (11 relationships). Low mannose levels were causal for bipolar disorder, and mannose as a dietary supplement may provide therapeutic benefits. Low galactosylglycerol levels were found to be causal for Parkinson's disease (PD). Our study expands knowledge of metabolite quantitative trait loci (mQTLs) in the central nervous system, provides insights into human wellness, and demonstrates the utility of combining statistical approaches to inform interventions.


Introduction
Many metabolite levels are known to be heritable, with a median heritability of 19.7% 1,2. Hundreds of blood and urine metabolites have been associated with a number of loci through metabolite genome-wide association studies (MGWAS) [1][2][3][4][5][6][7][8][9][10][11][12]. The two recent large blood MGWAS included 8,299 and 14,296 individuals and identified hundreds of metabolite-signal associations 2,13. A large-scale German Chronic Kidney Disease (GCKD) urine study with 1,627 participants identified 240 associations at genes with enriched expression in kidney and metabolism-relevant cell types 9. These studies not only identified causal metabolites for common diseases and traits, including bone mineral density, asthma, and chronic kidney disease, but also illuminated the role of common variants in inborn error of metabolism (IEM) genes in producing phenotypic features similar to those observed in IEM patients 2,13. These studies contributed to the understanding of common diseases and complex human traits and nominated metabolic reactions causal to these phenotypes. However, very few MGWAS have been performed using neurologically relevant tissues.
To date, the genetic architecture of central nervous system (CNS) metabolite levels has not been characterized, except for an exploratory study in cerebrospinal fluid (CSF) 14, which is a commonly used proxy for brain tissue. CSF and brain metabolite concentrations might differ from those in blood due to multiple factors, such as the blood-brain barrier (BBB) and the metabolism of brain cells 15. Tissue-specific genetic effects on transcript and protein levels in brain and non-brain tissues may also propagate to downstream pathways that affect metabolite levels 16,17, leading to brain- and CSF-specific genetic regulation of metabolites. While the CSF and blood levels of many metabolites, such as some amino acid derivatives and xenobiotics 18,19, are highly correlated, some metabolites, such as tryptophan and kynurenic acid 20, show low correlation, and others, such as gamma-glutamylglutamine 19, are even inversely correlated, suggesting that peripheral metabolism can influence, but does not govern, the CSF metabolome. Understanding the genetic architecture of CNS metabolite levels can therefore provide information not captured in blood or urine. A recent CSF MGWAS identified metabolite perturbations implicated in neurological disorders 14. Although promising, this study had a limited sample size (N = 291), which resulted in a small number of findings. Larger studies are needed to comprehensively understand the etiology of brain-related traits and disorders and to identify their causal metabolites and potential drug targets.
Here, we conducted MGWAS using large, well-characterized CSF (N = 2,602) and brain (N = 1,016) datasets, first to determine the overall genetic architecture of metabolite levels in CSF and brain, and then to compare them with other tissues in order to identify tissue-specific associations. We subsequently integrated the MGWAS results with colocalization, functional summary-based imputation (FUSION) and Mendelian randomization (MR) approaches to identify causal and druggable metabolites for related traits such as Alzheimer's disease (AD), Parkinson's disease (PD), schizophrenia, cognitive performance, and bipolar disorder. The findings from this study can be leveraged to study other diseases and traits and to inform causal and druggable targets.

Study design
For CSF, we performed MGWAS using a three-stage study design: discovery, replication, and meta-analysis. The discovery stage included 1,224 unrelated non-Hispanic white (NHW) samples from the Knight Alzheimer Disease Research Center (Knight-ADRC), the Dominantly Inherited Alzheimer Network (DIAN), and the memory and disorder unit at the university hospital Mutua de Terrasa, Spain (Barcelona-1) cohorts (Supplementary Fig. 1). The replication stage included a total of 1,378 unrelated NHW samples from the Alzheimer's Disease Neuroimaging Initiative (ADNI), Fundació ACE Alzheimer Center (ACE), and the Wisconsin CSF study cohorts (WADRC and WRAP) 14 (detailed cohort information is provided in Supplementary Table 1). Metabolomics data for all cohorts were generated using the Metabolon HD4 platform. After rigorous quality control (QC; see Methods, Supplementary Fig. 2 and Supplementary Table 2), a total of 440 metabolites passed QC (329 non-xenobiotics, 59 xenobiotics and 52 minimally characterized metabolites; minimally characterized metabolites include unknown and partially characterized metabolites). To compare the CSF genetic architecture of metabolites with that of brain, we also performed a large-scale meta-analysis of brain MGWAS that included a total of  Table 3). Of these 962 metabolites, 360 passed QC in CSF, and therefore 602 were uniquely analyzed in brain (Supplementary Fig. 3). Following the identification of associations, we performed deep characterization and functional annotation to identify the effector genes.
In this study, we not only compared the genetic architecture of metabolite levels in CSF, brain, blood and urine, but also distinguished novel from known associations and loci by comparing our results with the largest CSF, plasma and urine MGWAS available at the time of the study. Furthermore, the genetic regulators of brain and CSF metabolites were used to identify genetically dysregulated metabolites (via TWAS/FUSION) and to uncover metabolites causal for 12 neurological and 15 non-neurological traits or disorders through colocalization and MR (Supplementary Fig. 1, Supplementary Table 4). Each of these traits and disorders has either been linked to the central nervous system or is a risk factor for brain disorders.
CSF and brain MGWAS identify hundreds of novel and tissue-specific metabolite associations
We performed the largest MGWAS to date in both CSF (N = 2,602) and brain (N = 1,016) tissues. In CSF, significant associations (metabolite-genetic locus pairs) were defined as those with: 1) nominal significance in both discovery and replication; 2) effects in the same direction; and 3) study-wide significance in the meta-analysis (P < 2.79×10⁻¹⁰; Supplementary Fig. 1; see Methods for additional details).
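The three CSF significance criteria can be expressed as a simple filter. The sketch below is illustrative only: it assumes a fixed-effect inverse-variance-weighted (IVW) meta-analysis of the two stages and reads "nominal significance" as P < 0.05; the function names are hypothetical, and the software and settings actually used are described in the Methods.

```python
import math

STUDY_WIDE_P = 2.79e-10  # study-wide threshold quoted in the text
NOMINAL_P = 0.05         # assumption: "nominal significance" read as P < 0.05


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    Returns (pooled beta, pooled standard error, two-sided normal P value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p


def is_significant_csf(discovery, replication):
    """Apply the three CSF criteria to one variant-metabolite pair.

    discovery and replication are (beta, se, p) tuples from each stage.
    """
    nominal_in_both = discovery[2] < NOMINAL_P and replication[2] < NOMINAL_P
    same_direction = discovery[0] * replication[0] > 0
    _, _, meta_p = ivw_meta([discovery[0], replication[0]],
                            [discovery[1], replication[1]])
    return nominal_in_both and same_direction and meta_p < STUDY_WIDE_P


# Hypothetical summary statistics for one index variant
print(is_significant_csf((0.30, 0.03, 1e-20), (0.28, 0.03, 1e-18)))   # True
print(is_significant_csf((0.30, 0.03, 1e-20), (-0.28, 0.03, 1e-18)))  # False
```

Requiring agreement in direction, rather than relying on the meta-analysis P value alone, guards against a single very strong stage driving a study-wide significant result on its own.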
Among 440 CSF metabolites, we identified a total of 192 associations for 144 metabolites at 102 distinct loci (Fig. 1a&b, Supplementary Figs. 4a&5a, Supplementary Tables 5&6). The effect sizes of the index variants were highly correlated between discovery and replication (r = 0.97; Supplementary Fig. 6), indicating high replicability and that both stages contributed to the associations. Of these 192 associations, 173 belong to 130 well-characterized metabolites (123 non-xenobiotics and 7 xenobiotics) and 19 belong to 14 minimally characterized metabolites (Supplementary Tables 2&5).
Given the potentially complex nature of certain associations, we performed conditional analysis to identify independent signals within the identified regions. To be considered an independent signal, a variant had to pass the study-wide threshold (P < 2.79×10⁻¹⁰) in the conditional analysis. Of the 192 associations, 13 had two independent signals and seven had three (Supplementary Fig. 7a). Therefore, in CSF we identified a total of 219 independent association signals for 144 metabolites at 102 loci. Of the seven association regions with three signals, two were novel (Supplementary Table 5). One was an association between methylmalonate (MMA) and the ACSF3 region (16q24.3). Its primary signal included a missense variant (rs11547019, p.Ala17Pro) in ACSF3, which encodes acyl-CoA synthetase family member 3, a malonyl-CoA synthetase. The two additional signals did not include any SNP that modifies the protein sequence, but harbored known expression quantitative trait loci (eQTLs; P < 10⁻⁴) for ACSF3. In the other case, three independent signals were found in the CNDP1 gene region, associated with homocarnosine, a substrate of the enzyme encoded by CNDP1. While the primary signal (rs56042934) was intronic to CNDP1 with an unknown mechanism of action, the two other signals cause benign (rs73973908) and deleterious (lead variant rs140836083) missense changes to the enzyme. In addition, the number of independent signals may be underestimated, because the study-wide threshold can be too stringent for conditional analysis. Indeed, associations with higher significance were more likely to lie in complex regions (Student's t = 3.24, P = 4×10⁻³; Supplementary Fig. 7b).
Regardless, these results indicate that several independent signals in the same locus may regulate metabolite levels through multiple independent events (some signals change the protein sequence or even protein function, while others alter protein levels by modifying mRNA levels) 17,21.
As our study was enriched for AD patients, our results may have been affected by participants' health status. We therefore conducted sensitivity analyses by performing MGWAS in either healthy (N = 883) or AD individuals (N = 769) only, classified by biomarker status (amyloid/tau/neurodegeneration (ATN) classification; see extended results). The effect sizes of the index variants were highly correlated with those from analyses including either controls only (r = 0.96; P < 2.2×10⁻¹⁶) or AD cases only (r = 0.97; P < 2.2×10⁻¹⁶), indicating that disease status minimally affected the identified associations (Supplementary Fig. 8; extended results).
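The replication and sensitivity checks above compare index-variant effect sizes between two analyses with a correlation coefficient, assumed here to be Pearson's r. A minimal sketch using made-up effect sizes rather than values from the study:

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of effect sizes."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical index-variant betas from the full analysis and a subgroup analysis
full = [0.42, -0.31, 0.18, 0.55, -0.12]
subgroup = [0.40, -0.29, 0.21, 0.50, -0.15]
print(round(pearson_r(full, subgroup), 3))
```

On Python 3.10 or later, statistics.correlation(full, subgroup) gives the same value.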
In the brain MGWAS, only associations that had the same direction of effect in two or more of the three cohorts (for shared metabolites) and study-wide significance in the meta-analysis were considered significant (Supplementary Fig. 1). The brain MGWAS (n = 1,016 individuals and 962 metabolites) identified 35 associations for 34 metabolites at 27 loci (Fig. 1a&c, Supplementary Figs. 4a, 5a and 9, Supplementary Tables 7&8; see extended results). Conditional analysis identified one additional independent signal for cytidine (Supplementary Fig. 7c). Therefore, we identified a total of 36 independent association signals for 34 metabolites at 27 loci. Of these associations, 16 were identified in both CSF and brain (PP.H4.abf > 0.8, Supplementary Table 9). Many more associations were identified in CSF than in brain, which could be due either to different genetic architectures in CSF and brain or simply to the higher statistical power afforded by the larger CSF sample size. To address this, we examined the study-wide significant associations (180 associations for 133 metabolites) of the 360 metabolites present in both tissues. The effect sizes of these 180 associations were highly correlated between CSF and brain (r = 0.81, P < 2.2×10⁻¹⁶; Supplementary Fig. 10f; extended results), indicating that the genetic architecture of metabolite levels is similar between CSF and brain. Additionally, we applied the mashr 16 approach to compare study-wide associations between CSF and brain in direction and magnitude. We found that 90% of associations had the same direction of effect and 57% shared effects in both direction and magnitude.
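The brain significance rule can be sketched the same way. This sketch rests on two assumptions not spelled out in this section: "same direction" is read as agreeing with the sign of the meta-analysis estimate, and the brain study-wide threshold is passed in as a parameter because its value is not stated here. The function name and example values are hypothetical.

```python
def is_significant_brain(cohort_betas, meta_beta, meta_p, study_wide_p):
    """Brain MGWAS rule: study-wide significant meta-analysis P value and, for
    metabolites measured in more than one cohort, the same direction of effect
    in at least two of the three cohorts.

    cohort_betas: per-cohort effect estimates, None where the metabolite was
    not measured in that cohort.
    """
    if meta_p >= study_wide_p:
        return False
    observed = [b for b in cohort_betas if b is not None]
    if len(observed) < 2:
        # Metabolite measured in a single cohort: no direction check is possible.
        return True
    agreeing = sum(1 for b in observed if b * meta_beta > 0)
    return agreeing >= 2


# Hypothetical examples (threshold value chosen for illustration only)
print(is_significant_brain([0.20, 0.15, -0.05], 0.12, 1e-12, 1e-10))  # True
print(is_significant_brain([0.20, -0.10, -0.05], 0.02, 1e-12, 1e-10))  # False
```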
We then compared the overall genetic architecture of metabolite levels across four tissues (brain, CSF, blood and urine) using mashr (Supplementary Fig. 11). We used the latest blood and urine MGWAS available at the time of the study 9,12. For this comparison, we focused on the 247 metabolites tested in all tissues and their genome-wide significant signals. All tissue pairs had over 80% of associations with a consistent direction, with the highest percentage for CSF and blood (91%; Supplementary Fig. 11b, Supplementary Table 10). When both direction consistency and magnitude similarity (within 2-fold) were considered, CSF and brain showed the highest overlap of associations (57%), followed by blood and urine (32%). Additionally, brain and urine had more direction-specific associations (effect direction in one tissue differing from that in other tissues) than other tissues, indicating unique genetic regulation of brain metabolism and renal function (urine metabolite levels). These findings emphasize the need to analyze brain-related tissues in MGWAS to better understand neurological diseases.
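The direction and magnitude criteria used in these comparisons can be illustrated with a simplified sharing calculation. mashr's own pairwise-sharing summary (get_pairwise_sharing, which by default treats two effects as shared when they have the same sign and lie within a factor of 0.5 of each other) is computed on posterior effect estimates after shrinkage; the sketch below applies the same same-sign and within-2-fold rule directly to point estimates, which is a simplification, and its inputs are hypothetical.

```python
def pairwise_sharing(effects_a, effects_b, factor=0.5):
    """Return (fraction with the same sign, fraction with the same sign and
    magnitudes within the given factor of each other).

    factor=0.5 means the ratio of the two effects lies between 0.5 and 2.
    """
    same_sign = 0
    shared = 0
    for a, b in zip(effects_a, effects_b):
        if a * b > 0:
            same_sign += 1
            ratio = a / b  # positive because the signs agree
            if factor <= ratio <= 1 / factor:
                shared += 1
    n = len(effects_a)
    return same_sign / n, shared / n


# Hypothetical effects of four associations in two tissues
print(pairwise_sharing([1.0, 1.0, -1.0, 2.0], [1.5, 3.0, 1.0, -1.0]))  # (0.5, 0.25)
```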
Finally, we examined whether the CSF and brain MGWAS yielded novel signals by comparing our results with the large blood and urine studies and with the earlier CSF study, whose data were included in our meta-analysis 1,8,9,12,14. Of the 219 independent association signals (192 associations) in CSF, 88 signals (70 association regions) had been previously reported in at least one of the five large-scale Metabolon platform-based studies (PP.H4.abf > 0.6; Fig. 1d; Supplementary Fig. 12). Of the 131 novel association signals (at 113 novel regions and 9 reported regions), 97.7% involved metabolites previously examined in blood or urine, suggesting that these associations may be specific to CSF. The 131 novel signals corresponded to 24 novel loci and 49 previously reported loci that had been associated with different metabolites (Fig. 1d; Supplementary Fig. 13). For brain, 16 independent signals (16 associations) of the 36 signals (35 associations) had been reported in previous studies (PP.H4.abf > 0.6; Fig. 1d, Supplementary Fig. 13). Therefore, we identified 20 novel association signals (at 13 novel association regions and six reported regions) for 18 metabolites, of which six signals (4 novel loci) were for metabolites not analyzed in any previous study. In addition, these 20 novel association signals correspond to seven novel loci and six reported loci associated with metabolites different from those identified here (Fig. 1d; Supplementary Fig. 13; see extended results).
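The PP.H4.abf thresholds above come from colocalization under coloc's approximate-Bayes-factor model, in which H0 is no association, H1 and H2 are an association with only one of the two traits, H3 is two distinct causal variants, and H4 is one shared causal variant. The Python sketch below re-implements that model for illustration, assuming standardized quantitative traits (prior effect SD 0.15) and coloc's default priors p1 = p2 = 1e-4 and p12 = 1e-5; it is not the coloc R package itself, the priors and settings used in the study may differ, and the summary statistics in the usage lines are hypothetical.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for one SNP (quantitative trait,
    assumed standardized so the prior effect SD is 0.15)."""
    r = prior_sd ** 2 / (se ** 2 + prior_sd ** 2)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z ** 2)


def logsumexp(xs):
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for two traits measured on
    the same SNPs, under the single-causal-variant coloc model."""
    l1, l2 = logsumexp(labf1), logsumexp(labf2)
    l12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    # H3 sums over pairs of *different* SNPs: all pairs minus same-SNP pairs.
    same_snp_fraction = math.exp(l12 - l1 - l2)
    if same_snp_fraction < 1:
        log_h3 = math.log(p1) + math.log(p2) + l1 + l2 + math.log1p(-same_snp_fraction)
    else:  # numerically indistinguishable from a single shared SNP
        log_h3 = float("-inf")
    log_h = [0.0, math.log(p1) + l1, math.log(p2) + l2, log_h3, math.log(p12) + l12]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


# Hypothetical five-SNP region: one strong signal (z = 10), other SNPs null.
null = log_abf(0.0, 0.05)
strong = log_abf(0.5, 0.05)
shared = coloc_posteriors([strong] + [null] * 4, [strong] + [null] * 4)
distinct = coloc_posteriors([strong] + [null] * 4, [null, strong] + [null] * 3)
print(round(shared[4], 3), round(distinct[3], 3))
```

When both traits peak at the same SNP, almost all posterior mass falls on H4; when they peak at different SNPs, it moves to H3.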

Pleiotropic loci and polygenic metabolites
Pleiotropic analyses can be instrumental in identifying metabolites that are part of the same metabolic reaction, or that are unknown substrates or products of one specific reaction. In CSF, 43 of the 102 identified loci were associated with more than one metabolite (Supplementary Table 6). Most of these loci were associated with two (23 loci) or three (10 loci) metabolites, although two loci were associated with four metabolites, four loci with five, two loci with six, one locus with seven, and one locus with ten (Fig. 2a, Supplementary Fig. 14). The most pleiotropic CSF locus, located in the SLC13A3/ADA gene region, was associated with ten metabolites (Fig. 2b, Supplementary Fig. 14a). This was a complex region with two independent signals (based on r² > 0.8): rs406383, intronic to ADA, was associated with N1-methyladenosine, and rs439143, intronic to SLC13A3, was associated with nine different metabolites. Of these nine metabolites, seven were potential direct substrates of the transporter encoded by SLC13A3, being either amino acid derivatives or Krebs cycle components, and the other two were carnitine molecules downstream of Krebs cycle components 22. In brain, six of the 27 loci were pleiotropic, associated with either two (four loci) or three (two loci) metabolites (Fig. 2d-f). All six loci were also identified as pleiotropic in CSF. The metabolites associated with these loci were often shared by both tissues, while additional metabolites were identified in brain either because they were uniquely analyzed in brain or because of brain-specific associations not identified in CSF (see extended results).
The pleiotropic nature of many loci corresponds to known biological mechanisms, as in the case of the SLC13A3/ADA locus and of CPS1, which encodes an enzyme catalyzing the first step of the urea cycle. The variants in the CPS1 region were associated with six metabolites (Supplementary Fig. 14b; e.g., homoarginine, glycine, and a glutamine degradant) that are part of the urea cycle or alternative ammonia-elimination pathways 23. The APOE/APOC1 locus was associated with five lipid metabolites: cholesterol and four phosphatidylcholines (1,2-dipalmitoyl-GPC (16:0/16:0), 1-myristoyl-2-palmitoyl-GPC (14:0/16:0), 1-palmitoyl-2-stearoyl-GPC (16:0/18:0), and 1-palmitoyl-2-palmitoleoyl-GPC (16:0/16:1)). Apolipoprotein E is known to interact with lipoproteins and to function as a carrier of cholesterol and phosphatidylcholines 24. APOE variants are among the major genetic risk factors for AD, and cholesterol has been associated with AD development downstream of Aβ and tau pathology. Several studies also indicate that phosphatidylcholines may lower the risk of dementia and AD [25][26][27]. These five metabolites were also predicted to be associated with AD based on the MWAS analyses, and all of them were lower in the CSF of AD patients based on differential abundance analysis (p < 0.05; Extended Data Table 1 and extended results).
In addition, many metabolites were polygenic, meaning that multiple loci were associated with the same metabolite. In CSF, of the 144 metabolites with study-wide association(s), 37 were associated with multiple loci: 29 metabolites with two loci, six with three, one (methylsuccinoylcarnitine) with four, and one (bilirubin (E,E)) with five (Fig. 2c, Supplementary Table 5). The nominated effector genes (see section "In silico functional annotation of the CSF and brain associations") for bilirubin (E,E) were UGT1A6 (2q37.1), GYPA (4q31.21), TWISTNB (7p21.1), FAS (10q23.31), and SPRY2 (13q31.1). UGT1A6 encodes an enzyme that transforms bilirubin into water-soluble molecules, and GYPA encodes the major intrinsic membrane protein of the erythrocyte, whose breakdown generates bilirubin 28. Mutations in FAS lead to autoimmune lymphoproliferative syndrome (ALPS), which is associated with hyperbilirubinemia 29. The role of SPRY2 in bilirubin metabolism is unknown, but these findings suggest that it is also part of the pathways that produce or regulate bilirubin. Methylsuccinoylcarnitine was associated with four loci whose signals were predicted to affect CPT2 (1p32.3), SUCLG2 (3p14.1), ACADS (12q24.31), and SLC13A3 (20q13.12) (Fig. 2c, Supplementary Table 5). Its association with CPT2 has been reported previously, yet the mechanism is unknown. The other three loci were novel, and their nominated functional genes have been implicated in the metabolism of methylsuccinoylcarnitine. SUCLG2 encodes a succinyl-CoA ligase, and both SUCLG2 and the metabolite are involved in succinyl-CoA pathways. Mutations in ACADS cause short-chain acyl-CoA dehydrogenase deficiency, a disorder in which methylsuccinate levels are altered 30. The protein encoded by SLC13A3 can transport succinate, a building block of methylsuccinoylcarnitine.
To our knowledge, this is the first time the novel loci described here have been linked to bilirubin and methylsuccinoylcarnitine levels. Additional functional analyses will be needed to characterize these genes in the context of these polygenic metabolites.
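Pleiotropy (one locus, several metabolites) and polygenicity (one metabolite, several loci) are two views of the same table of significant locus-metabolite pairs. The sketch below tabulates both; its input is a small illustrative subset of associations named in this section, with loci labelled by their nominated effector gene, and is not the full set of study results.

```python
from collections import Counter, defaultdict


def pleiotropy_and_polygenicity(pairs):
    """Summarize (locus, metabolite) association pairs.

    Returns two Counters: how many loci are associated with k > 1 metabolites
    (pleiotropic loci), and how many metabolites are associated with k > 1 loci
    (polygenic metabolites).
    """
    metabolites_per_locus = defaultdict(set)
    loci_per_metabolite = defaultdict(set)
    for locus, metabolite in pairs:
        metabolites_per_locus[locus].add(metabolite)
        loci_per_metabolite[metabolite].add(locus)
    pleiotropic = Counter(len(mets) for mets in metabolites_per_locus.values() if len(mets) > 1)
    polygenic = Counter(len(loci) for loci in loci_per_metabolite.values() if len(loci) > 1)
    return pleiotropic, polygenic


pairs = [
    ("UGT1A6", "bilirubin (E,E)"), ("GYPA", "bilirubin (E,E)"),
    ("TWISTNB", "bilirubin (E,E)"), ("FAS", "bilirubin (E,E)"),
    ("SPRY2", "bilirubin (E,E)"),
    ("CPT2", "methylsuccinoylcarnitine"), ("SUCLG2", "methylsuccinoylcarnitine"),
    ("ACADS", "methylsuccinoylcarnitine"), ("SLC13A3", "methylsuccinoylcarnitine"),
    ("ACADS", "ethylmalonate"),
]
pleiotropic, polygenic = pleiotropy_and_polygenicity(pairs)
print(pleiotropic)  # Counter({2: 1})
print(polygenic)    # Counter({5: 1, 4: 1})
```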
In silico functional annotation of the CSF and brain associations
To identify the effector gene for each association, we applied two complementary strategies (Fig. 3a, b).
The first strategy is based on the ProGeM 31 program, which incorporates both genetic annotation and broad metabolic relevance; it prioritizes a gene if 1) the associated signal (the sentinel variant and its tagged variants (r² > 0.8)) changes the protein sequence, 2) the gene in the locus belongs to metabolic pathways, 3) the gene has an eQTL that overlaps with the association signal, and 4) it is the nearest gene to the sentinel variant (Supplementary Fig. 15) 31. The second strategy is based on manually curated biological knowledge, relying on metabolite-gene relationships from the KEGG 32, GeneCards 33, and HMDB 34 databases.
For the 219 CSF signals, the ProGeM strategy nominated 130 genes for all 219 signals and the knowledge-based strategy nominated 89 genes for 165 signals (Fig. 3a). For brain, the ProGeM strategy nominated 29 genes for 36 signals and the knowledge-based strategy nominated 17 genes for 23 signals (Fig. 3b). The two strategies provided consistent predictions, nominating the same effector gene for 83.6% and 78.3% of the CSF and brain associations, respectively, for which both strategies made a nomination (Fig. 3a, b). In cases of discordance (27 CSF and 5 brain associations), the gene nominated by the knowledge-based strategy was prioritized over the ProGeM nomination, as we confirmed that the ProGeM-nominated gene was not biologically meaningful for the metabolite (see extended results).
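The two-strategy logic can be sketched as follows. This is not ProGeM's implementation: the lexicographic ordering of the four criteria (protein-altering variant first, distance to the sentinel variant as the final tie-breaker) is an assumption made for illustration, the Candidate fields are hypothetical, and the only rule taken directly from the text is that the knowledge-based gene is kept whenever the two strategies disagree. GENE_X in the example is a made-up neighbouring gene.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    protein_altering: bool = False  # sentinel or tagged (r2 > 0.8) variant alters the protein
    metabolic_gene: bool = False    # gene annotated to a metabolic pathway
    eqtl_overlap: bool = False      # an eQTL for the gene overlaps the association signal
    distance_bp: int = 0            # distance from the sentinel variant


def rule_based_gene(candidates):
    """Pick a gene by the four criteria in the order listed (assumed ordering)."""
    best = min(candidates, key=lambda c: (not c.protein_altering, not c.metabolic_gene,
                                          not c.eqtl_overlap, c.distance_bp))
    return best.gene


def effector_gene(candidates, knowledge_gene=None):
    """Return (effector gene, concordance); concordance is None when the
    knowledge-based strategy nominated nothing for this signal."""
    rule_gene = rule_based_gene(candidates)
    if knowledge_gene is None:
        return rule_gene, None
    # On discordance the knowledge-based nomination is kept, as in the text.
    return knowledge_gene, knowledge_gene == rule_gene


candidates = [
    Candidate("ACSF3", protein_altering=True, metabolic_gene=True, distance_bp=0),
    Candidate("GENE_X", distance_bp=15000),  # hypothetical neighbouring gene
]
print(effector_gene(candidates, knowledge_gene="ACSF3"))  # ('ACSF3', True)
```

Counting the True values among signals with a knowledge-based nomination gives the concordance percentages reported above.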
Once the effector gene was nominated for each association signal, we categorized the associations first by the location and consequence of the variants relative to the effector gene, and then by eQTLs for the effector gene. By consequence for the nominated genes, the association included a protein-sequence-altering variant (missense or splice acceptor variant) in 28.3% of CSF associations (62 association signals mapped to 39 genes) and 19.4% of brain associations (seven association signals mapped to 6 genes; Fig. 3c, d, Supplementary Fig. 16, Supplementary Table 5). Of these, 25 of the 62 CSF and three of the seven brain associations were predicted by SIFT and PolyPhen to be deleterious to protein function 35,36; ten of these CSF and two of these brain deleterious associations are novel. In CSF, loss-of-function or deleterious variants had larger effect sizes (deleterious vs. benign missense: t = 3.3, p = 2×10⁻³; deleterious vs. non-coding: t = 3.1, p = 6×10⁻³) and lower minor allele frequencies (deleterious vs. benign missense: t = -3.2, p = 2×10⁻³; deleterious vs. non-coding: t = -3.2, p = 3×10⁻³) than non-coding variants or variants predicted to be benign (Fig. 3e-h). We found that 58.9% of the CSF association signals (129 signals mapped to 81 genes) and 72.2% of the brain association signals (26 signals mapped to 19 genes) included an eQTL (Supplementary Table 5). In 15.1% of the CSF association signals (29 signals in 21 genes) and in 11.4% of the brain association signals, the same prioritized gene was supported by both altered-protein-sequence and eQTL evidence (Supplementary Tables 5&7).
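Comparisons such as "deleterious vs. benign missense" are two-sample t-tests on per-signal effect sizes or allele frequencies. The section does not say whether equal variances were assumed; the sketch below computes Welch's t statistic and its Welch-Satterthwaite degrees of freedom using only the standard library, with hypothetical absolute effect sizes. A P value would then come from the t distribution, for example via scipy.stats.ttest_ind(a, b, equal_var=False).

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical |beta| values for deleterious and benign missense signals
deleterious = [0.62, 0.55, 0.71, 0.48, 0.66]
benign = [0.31, 0.42, 0.28, 0.37, 0.35, 0.40]
t, df = welch_t(deleterious, benign)
print(round(t, 2), round(df, 1))
```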
Among the nominated effector genes, 91.8% in CSF (87.9% of unique genes) and 77.8% in brain (75.0% of unique genes) encoded enzymes or transporters (Supplementary Fig. 17c, d). In addition, 42.9% and 34.4% of all nominated effector genes for the CSF and brain association signals, respectively, corresponded to cis-proteins, defined as enzymes and transporters (involved in production, degradation, or transport) for a specific metabolite (Supplementary Fig. 17a, b).
We then investigated whether the nominated genes were enriched for any specific brain cell type. Cell-type specificity was determined for each gene based on gene expression (see Methods) 37. The nominated effector genes for the CSF associations were enriched for astrocytes (log2FC = 1.66, p = 4.7×10⁻⁵, Supplementary Fig. 18), which are key regulators of brain energy metabolism 38.
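One common way to quantify this kind of enrichment is the fold change of the proportion of cell-type-specific genes among the nominated genes over the background proportion, paired with a one-sided hypergeometric test. The section does not state which test produced the reported log2FC and P value (see Methods), so the sketch below, with hypothetical counts, only illustrates the calculation.

```python
import math


def celltype_enrichment(k, n, K, N):
    """k of n nominated genes are specific to the cell type; K of N background
    genes are. Returns (log2 fold change, one-sided hypergeometric P(X >= k))."""
    fold = (k / n) / (K / N)
    log2fc = math.log2(fold) if fold > 0 else float("-inf")
    upper = min(n, K)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return log2fc, tail / math.comb(N, n)


# Hypothetical counts: 30 of 100 nominated genes vs. 1,500 of 15,000 background genes
print(celltype_enrichment(30, 100, 1500, 15000))
```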
Insights into brain-related phenotypes using genetically regulated metabolites
Metabolic dysregulation, observed in many disorders, can be part of the causal pathway, making the dysregulated metabolites potentially good targets for intervention. The plasma MGWAS by Chen et al. identified 95 causal relationships for 12 phenotypes (five of which were included in this study), including O-sulfo-L-tyrosine for PD and the choline phosphate/choline ratio for AD 2. The recent urine MGWAS identified 684 relationships between 110 metabolites and 68 phenotypes (none of which overlapped with this study) through colocalization analyses 9. An earlier CSF MGWAS 14 identified 19 metabolite-trait pairs for multiple neurological and psychiatric disorders, including attention deficit hyperactivity disorder (ADHD)-malate and schizophrenia-N-delta-acetylornithine, through metabolome-wide association (MWAS) analyses.
Here, we integrated our CSF and brain MGWAS data to identify potential biomarkers and causal metabolites for 27 brain- and wellness-related traits or disorders (Alzheimer's disease, alcoholism, cognitive performance, among others; Supplementary Table 4). To identify metabolites dysregulated in these traits, we used the FUSION approach to build metabolite-level prediction models based on study-wide significant associations and then tested the association between predicted metabolite levels and phenotypes. Weights for predicting metabolite levels were calculated for 92.4% (133/144) of CSF metabolites and 85.3% (29/34) of brain metabolites that had at least one heritable association region (Supplementary Table 11). Through this approach, we identified 62 CSF metabolites associated with 19 phenotypes, including ADHD, alcoholism, and bipolar disorder (128 metabolite-phenotype pairs), and nine brain metabolites associated with 12 phenotypes (22 metabolite-phenotype pairs; Fig. 4). Both the CSF and brain analyses identified seven metabolite-trait pairs, involving four metabolites (succinylcarnitine (C4-DC), N6-methyllysine, methylsuccinate, ethylmalonate) and six traits (AD, baldness, educational attainment, major depressive disorder, schizophrenia, smoking initiation). Across tissues, these associations showed consistent effects in both direction and magnitude (Supplementary Table 14).
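In FUSION, the association between genetically predicted metabolite levels and a trait is computed from summary statistics: the SNP weights w of the metabolite prediction model are combined with the trait GWAS z-scores Z and the LD (correlation) matrix L among the same SNPs as z = wᵀZ / sqrt(wᵀLw). The sketch below implements only that statistic, with hypothetical inputs; fitting the weights, which FUSION does with its own set of prediction models, is out of scope.

```python
import math


def fusion_z(weights, gwas_z, ld):
    """Summary-statistic association between predicted metabolite levels and a
    trait: z = (w . Z) / sqrt(w^T L w), where L is the SNP correlation matrix."""
    m = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)


# Hypothetical two-SNP prediction model for one metabolite
weights = [0.8, 0.3]
trait_z = [3.1, 1.2]
ld = [[1.0, 0.4], [0.4, 1.0]]
print(round(fusion_z(weights, trait_z, ld), 3))
```

Because the weights appear in both numerator and denominator, rescaling them does not change z; the LD term prevents correlated SNPs from being counted as independent evidence.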
In total, we identified 140 unique metabolite-trait pairs in CSF and/or brain, of which only five had been reported in the previous CSF MWAS study (Supplementary Table 15) 14. The remaining 135 metabolite-trait pairs are therefore novel. Waist-to-hip ratio adjusted for BMI (WHRadjBMI) was associated with the largest number of metabolites (21), followed by educational attainment (19 metabolites), cognitive performance (13), schizophrenia (ten), and Alzheimer's disease (nine).
To investigate whether the metabolite-phenotype associations identified through MWAS shared the same functional variant for the metabolite and the phenotype, we performed colocalization analysis. Of the 128
To infer causal metabolites, we performed MR excluding highly pleiotropic regions (those associated with > 5 metabolites) 44,45. In CSF, we identified 38 metabolites causal for 22 traits after FDR correction (78 pairs; Supplementary Table 18). In brain, we identified 11 causal metabolites for 10 traits (20 pairs; Supplementary Table 19). In total, we identified 92 causal effects involving 46 metabolites and 22 phenotypes across both tissues. Five causal relationships were identified in both tissues, including succinylcarnitine for AD and for HDL (Supplementary Table 20). In addition, we conducted a sensitivity analysis using a more stringent MR approach that removed all genetic regions associated with more than one metabolite. This sensitivity analysis identified 46 causal relationships between 20 metabolites and 18 phenotypes across both tissues (Supplementary Tables 21-24, Supplementary Fig. 21).
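A sketch of the MR step, assuming the inverse-variance-weighted (IVW) estimator and Benjamini-Hochberg FDR correction; this section does not name the MR estimator or the FDR procedure used, so both are assumptions. The max_metabolites parameter mimics the two instrument filters described in the text: 5 for the standard analysis (excluding regions associated with more than five metabolites) and 1 for the stringent sensitivity analysis. Field names and example values are hypothetical.

```python
import math


def ivw_mr(instruments, max_metabolites=5):
    """Inverse-variance-weighted MR estimate from per-instrument summary data.

    Each instrument is a dict with beta_x (SNP-metabolite effect), beta_y and
    se_y (SNP-trait effect and SE) and n_metabolites (metabolites associated
    with the instrument's region). Returns (beta, se, p), or None if no
    instrument survives the pleiotropy filter.
    """
    kept = [s for s in instruments if s["n_metabolites"] <= max_metabolites]
    if not kept:
        return None
    num = sum(s["beta_x"] * s["beta_y"] / s["se_y"] ** 2 for s in kept)
    den = sum(s["beta_x"] ** 2 / s["se_y"] ** 2 for s in kept)
    beta, se = num / den, 1 / math.sqrt(den)
    return beta, se, math.erfc(abs(beta / se) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


snp = {"beta_x": 0.5, "beta_y": 0.1, "se_y": 0.02, "n_metabolites": 3}
print(ivw_mr([snp]))                     # single instrument: Wald ratio 0.1 / 0.5
print(ivw_mr([snp], max_metabolites=1))  # None: removed by the stringent filter
```

With a single instrument, the IVW estimate reduces to the Wald ratio beta_y / beta_x with standard error se_y / |beta_x|.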
The differences between the findings from the standard and stringent MR analyses stem from how pleiotropic regions were defined. In some scenarios, however, pleiotropic effects may reflect relevant biological processes. For example, when a finding points to an enzyme that catalyzes a specific metabolic reaction, changes in the enzyme's activity will affect at least two metabolites: the direct substrate and the direct product. More analytes may be affected if multiple substrates and products are involved. This could be the case for the signal (led by rs17279437) affecting ACADS 46, which encodes an enzyme in beta-oxidation, the pathway in which fatty acids carried by carnitines are broken down to produce energy. This signal was associated with various metabolites involved in beta-oxidation pathways, such as acylcarnitine-related molecules, methylsuccinate 47 and methylsuccinoylcarnitine, and fatty acids such as ethylmalonate. In other scenarios, the signal may be driven by a metabolite channel or transporter, where genetic variants that decrease the activity of the transporter lead to changes in the levels of several metabolites. For example, the signal (led by rs17279437) affecting the transporter-encoding gene SLC6A20 48 was associated with multiple substrates of the transporter, such as proline, betaine, and dimethylglycine. Therefore, although no single metabolite might be causal, the disease may be driven by dysregulation of a specific metabolic process. However, such signals are flagged as sources of pleiotropy and removed from MR analyses, potentially leading to many false-negative findings. For MGWAS, it may therefore be necessary to reconsider the definition of pleiotropy by incorporating biological knowledge.
In conclusion, the intertwined nature of metabolic pathways often results in pleiotropic signals, creating challenges for MR approaches and possibly redirecting us toward identifying causal metabolic reactions rather than metabolites themselves.
We also examined how many of the metabolite-trait pairs found in our MWAS and MR analyses had been reported in previous CSF, plasma or urine studies 2,14. We replicated one of the three findings of the previous CSF study, namely the causal effect of brain ethylmalonate on schizophrenia, whereas the other two (causal effects of N-delta-acetylornithine on cognitive performance and schizophrenia) were not replicated because of their pleiotropic signal at NAT8. Of the 95 causal metabolite (or metabolite ratio)-phenotype relationships identified in the plasma study by Chen et al. 2, we were able to analyze six pairs (three phenotypes and four metabolites) but could not replicate them: four due to tissue-specific findings, one due to differences in study power, and one due to differences in instrumental variable selection (see extended results). At the same time, our analyses identified five causal metabolite-disease pairs that had been tested previously but not found significant, and that therefore represent novel associations.
These included bilirubin (Z,Z) for type 2 diabetes (T2D) and N-acetylhistidine for WHRadjBMI (Supplementary Table 18). These findings were driven by tissue-specific MGWAS findings, highlighting the need not only to perform larger plasma studies but also to extend these studies to additional tissues.
We then integrated the findings from the three analyses: MWAS, colocalization and MR. In CSF, 26 metabolite-trait pairs were significant in both MWAS and MR, including nine pairs with colocalization evidence (Supplementary Table 25). These nine pairs involved six metabolites (Fig. 4) and seven traits (alcoholism, bipolar disorder, WHRadjBMI, brain volume, cognitive performance, PD, and T2D). In brain, 11 pairs were significant in both MWAS and MR, of which two, involving N6-methyllysine and N6,N6-dimethyllysine, showed colocalization with baldness (Supplementary Table 26).
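The integration step amounts to intersecting the sets of metabolite-trait pairs supported by each analysis and then counting the distinct metabolites and traits involved. A minimal sketch with placeholder pair labels (not study results):

```python
def integrate(mwas_pairs, mr_pairs, coloc_pairs):
    """Each argument is a set of (metabolite, trait) pairs significant in that
    analysis. Returns pairs supported by MWAS and MR, and by all three."""
    mwas_and_mr = mwas_pairs & mr_pairs
    return mwas_and_mr, mwas_and_mr & coloc_pairs


def summarize(pairs):
    """(number of pairs, distinct metabolites, distinct traits)."""
    return len(pairs), len({m for m, _ in pairs}), len({t for _, t in pairs})


mwas = {("met_A", "trait_1"), ("met_B", "trait_1"), ("met_C", "trait_2")}
mr = {("met_A", "trait_1"), ("met_C", "trait_2"), ("met_D", "trait_3")}
coloc = {("met_C", "trait_2")}
both, all_three = integrate(mwas, mr, coloc)
print(sorted(both))       # [('met_A', 'trait_1'), ('met_C', 'trait_2')]
print(sorted(all_three))  # [('met_C', 'trait_2')]
print(summarize(both))    # (2, 2, 2)
```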
For cognitive performance, we found causal associations with lower levels of two metabolites, 6-oxopiperidine-2-carboxylate and 3-hydroxyisobutyrate, supported by all three analyses: MWAS, MR and colocalization (Fig. 4 & Supplementary Table 25). These associations were not found in previous plasma studies because the genetic associations with these metabolites are CSF-specific. Previous studies support a role for these metabolites in cognition: 6-oxopiperidine-2-carboxylate and 3-hydroxyisobutyrate have been linked to cognition in AD- and epilepsy-specific studies 49,50 .
We also found that higher levels of mannose may be causal to alcoholism, T2D, bipolar disorder, and PD based on the MWAS, MR and colocalization analyses (Fig. 4 & Supplementary Table 25). The nominated effector gene for mannose, GCKR, has been shown to affect both lipids and carbohydrates, including sphingolipids, glycerolipids, and serine (a key connector of amino acids to lipids and carbohydrates) 51 . This suggests that lipid and carbohydrate pathways may play an important role in alcoholism, T2D and PD. Mannose has been implicated in alcohol-related metabolism, as it has an anti-steatosis role in alcoholic liver disease 52,53 . High fasting mannose levels have been associated with insulin resistance in individuals with diabetes, independent of obesity 54 . In addition, MBL2 (encoding mannose-binding lectin 2), which is implicated in mannose metabolism, has been associated with bipolar disorder in genetic studies, supporting a causal role of this metabolite in bipolar disorder [55][56][57] . Our analyses indicate that lower mannose levels were associated with a higher risk of bipolar disorder; mannose, an available dietary supplement, may therefore be worth studying as a potential intervention for bipolar disorder. Moreover, to our knowledge, this is the first report of a causal association between mannose and PD.
Besides mannose, lower galactosylglycerol levels were potentially causal to PD through a signal that colocalized at GALC (Fig. 4). Additionally, lower xanthine levels were predicted to have a causal effect on WHRadjBMI, with support from the MWAS, MR, and colocalization analyses (Fig. 4 & Supplementary Table 25). High xanthine oxidase activity, which reduces xanthine levels, has been observed in obese individuals 59 . Therefore, xanthine as a dietary supplement could be tested as a potential intervention for individuals with obesity. Finally, an unknown metabolite, X-24228, was found to be causal to brain volume, with the signals colocalizing at a novel locus, CLDN16. This gene plays a role in cell adhesion, a crucial component of brain development 60 .
In brain, we identified N6-methyllysine and N6,N6-dimethyllysine as causal for baldness according to the MWAS, MR, and colocalization analyses (Fig. 4 & Supplementary Table 26). PYROXD2 was the nominated effector gene for both metabolites. Interestingly, CSF N6-methyllysine neither shared the same causal signal with baldness nor had a causal effect on baldness, which could be explained by the tissue-specific association of N6-methyllysine (Supplementary Table 9).
Overall, we identified 11 high-confidence causal metabolite-trait relationships (nine in CSF and two in brain) supported by the MWAS, MR and colocalization analyses. These associations were novel either because the MGWAS signal was tissue-specific or because the metabolite had not been analyzed in other tissues.
Previous studies performed only one or two of these three types of analyses to identify metabolites implicated in or causal to traits. Here, we report metabolite-trait pairs that were not only identified by two approaches but were also significant in the third, which makes our analyses more stringent. Many more metabolite-trait pairs emerge if support from only two approaches is required: an additional 43 metabolite-trait associations (34 in CSF and 14 in brain) were supported by two approaches (MWAS + Coloc; MWAS + MR), such as succinylcarnitine and adenine for AD, and (N(1) + N(8))-acetylspermidine and 5-methylthioadenosine (MTA) for brain volume, among others (Extended results, Supplementary Table 27). Of these 43 additional pairs, 41 were novel and therefore warrant future exploration.
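To make the evidence tiers concrete, the following Python sketch classifies metabolite-trait pairs by the analyses that support them. It is a hypothetical illustration: the function name and the toy pair labels are placeholders, not study results, and it assumes each analysis has already been reduced to a set of significant (metabolite, trait) pairs.

```python
def classify_pairs(mwas, mr, coloc):
    """Group (metabolite, trait) pairs by supporting evidence.

    Each argument is a set of pairs that were significant in that analysis.
    'all_three' requires MWAS, MR and colocalization; 'two_methods' requires
    MWAS plus exactly one of MR or colocalization (MWAS + MR, MWAS + Coloc).
    """
    all_three = mwas & mr & coloc
    two_methods = ((mwas & mr) | (mwas & coloc)) - all_three
    return {"all_three": all_three, "two_methods": two_methods}


# Toy input with placeholder names
mwas = {("met_A", "trait_1"), ("met_B", "trait_2"), ("met_C", "trait_3"), ("met_D", "trait_4")}
mr = {("met_A", "trait_1"), ("met_B", "trait_2")}
coloc = {("met_A", "trait_1"), ("met_C", "trait_3")}

tiers = classify_pairs(mwas, mr, coloc)
print(sorted(tiers["all_three"]))    # supported by all three analyses
print(sorted(tiers["two_methods"]))  # supported by MWAS plus one other analysis
```

Pairs supported by MWAS alone (met_D here) fall into neither tier.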

The druggable metabolites
The metabolites identified in these analyses could be drug targets for improving disease outcomes or achieving desired phenotypes. Based on the DrugBank database 43 , 18 of the 67 metabolites identified in the MWAS analyses were of pharmacological interest (Supplementary Table 28); six of these are either approved drugs or targeted by approved drugs. Betaine, higher levels of which were associated with ADHD, autism, and WHRadjBMI, is used to treat homocystinuria by decreasing elevated blood homocysteine levels 61 . Valine, which was also positively associated with ADHD, autism and WHRadjBMI, is a crucial component of parenteral nutrition, and treatments such as Aminosyn II 7% have been approved for premature infants 62 . Asparaginase treatments, such as pegaspargase, reduce asparagine levels and are approved for acute lymphoblastic leukemia; we found lower asparagine levels to be associated with WHRadjBMI. Adenine (Rejuvesol), higher levels of which were found in AD, ADHD, and smaller brain volume, is approved for sickle cell disease (SCD) 63 , suggesting that adenine may be implicated in multiple complex traits with opposite effects. Therefore, if adenine is targeted for therapeutic intervention, it will be important to monitor potential increases in risk for other traits. Statins are FDA-approved drugs that lower cholesterol levels in cardiovascular disease and other conditions 64 . We found higher cholesterol levels to be associated with T2D and WHRadjBMI, so statins could potentially also be evaluated for T2D and obesity. In addition, five metabolites are commercial dietary supplements, and three others are at experimental stages (see extended results for a detailed description of these findings; Supplementary Table 28).
These results indicate that some of the metabolites identified as potential causal factors for these traits are druggable. At the same time, because of the complex nature of metabolic regulation, changing the levels of these metabolites may also increase the risk of other diseases, so close monitoring of such secondary effects will be needed.
In addition, in some cases the nominated effector gene, rather than the metabolite itself, may be druggable. For example, Tipiracil, an approved drug for gastric or colorectal malignancies, inhibits thymidine phosphorylase, which leads to higher levels of 2'-deoxyuridine. Our MWAS showed that 2'-deoxyuridine was lower in inflammatory bowel disease (IBD) through the TYMP locus (rs140522, p = 3.98 × 10^-57), and consistently, a recent study showed that a high uridine/2'-deoxyuridine ratio was causal for IBD 2 . Thus, increasing 2'-deoxyuridine levels with Tipiracil might provide therapeutic benefits in IBD, although experimental evidence would be needed to support this hypothesis. In another example, Belinostat and Panobinostat are approved inhibitors of histone deacetylase 10 (HDAC10), which regulates polyamine substrates including (N(1) + N(8))-acetylspermidine and diacetylspermidine (rs61748567, p = 5.56 × 10^-15; rs143617749, p = 1.54 × 10^-33). Higher levels of these metabolites were associated with smaller brain volume in the MWAS. CNS injury was associated with an increase in N1-acetylspermidine levels in rat brain, indicating a link between polyamine acetylation and impaired brain function 65 . Therefore, increases in (N(1) + N(8))-acetylspermidine and diacetylspermidine levels caused by Belinostat or Panobinostat could lead to the side effect of reduced brain volume.

Discussion
We described a comprehensive, large-scale MGWAS of 440 CSF metabolites in 2,602 individuals and 962 brain metabolites in 1,016 individuals. In CSF, we identified and replicated 219 independent signals (192 associations) for 144 metabolites at 102 loci, of which 131 association signals and 24 loci were novel. In brain, we identified 36 independent signals (35 associations) for 34 metabolites, of which 20 signals were novel. Tissue specificity can be inferred from the observation that 59.8% of CSF independent association signals were novel, yet only 2.3% of the novel signals were associated with newly analyzed metabolites. Our analyses indicate that CSF could be a good proxy for brain, as we found a very high overlap between these tissues (correlation of effect sizes of study-wide associations across tissues, r = 0.82). In addition, the magnitudes of shared associations were more similar between CSF and brain than between either tissue and blood or urine, indicating tissue-specific effects (Supplementary Fig. 11).
Most of the novel associations were driven either by metabolites measured only in CSF and brain or by genetic effects unique to these tissues that were not found in plasma and urine. Similarly, a few metabolite-disease associations reported by plasma or urine studies were likely driven by tissue-specific regulation. These observations have broader translational implications for understanding the biology of complex traits and identifying novel causal and druggable targets. It is therefore important to perform similar studies in more tissues, in addition to larger studies in the same tissue or in different populations.
Following the discovery and replication of the genetic associations with CSF and brain metabolite levels, we applied well-established and powerful statistical approaches to identify causal and druggable metabolites for various brain-relevant traits. These analyses also identified potential metabolic pathways involved in or causal to human phenotypes. For the 27 analyzed phenotypes, we identified 11 causal metabolite-to-trait effects for eight traits: alcoholism, bipolar disorder, WHRadjBMI, brain volume, cognitive performance, PD, T2D, and baldness. In addition, we identified 12 metabolites that are either druggable or already have compounds that could be repositioned. Of these, mannose was identified as a potential therapeutic target for bipolar disorder that requires further investigation, and mannose dietary supplements may offer an opportunity for drug repositioning. In other cases, metabolites were elevated in individuals with a disease or an undesirable trait, which makes these compounds unsuitable for treating disorders such as AD, ADHD, or autism, or for improving traits such as cognitive performance. In addition, drugs targeting the functional enzyme of a metabolite may provide therapeutic benefits. We found that Tipiracil may have potential to treat IBD by increasing 2'-deoxyuridine levels, although additional validation will be needed. Overall, we demonstrated the potential of the MWAS approach to guide drug development.
This study has some limitations that warrant future efforts in the field. First, the study focused on the non-Hispanic white population and therefore may miss ethnicity-specific findings in African, Asian or other populations. Second, although we performed the first brain MGWAS, the power of our brain study was limited by its sample size, which made it difficult to identify brain-specific associations and potentially interesting brain mechanisms.
In conclusion, our study describes the largest CSF and brain MGWAS to date, identifying 219 association signals for 144 CSF metabolites and 36 association signals for 34 brain metabolites.
Through MWAS, MR, and colocalization, we found eight metabolites causal for eight brain-related and wellness traits. Several of these metabolites are druggable or have compounds available for drug repositioning. We filled a knowledge gap left by the lack of metabolome-wide genetic studies in the CNS and improved our understanding of genetic risk loci for brain-related phenotypes from a metabolic perspective. In the replication stage, ADNI and ACE were analyzed together using measured individual-level data. The derived MGWAS summary statistics were combined with the UW-Madison study by meta-analysis, and the results were reported as the replication-stage results. Lastly, we performed a meta-analysis of the discovery and replication stages to obtain the final results. These meta-analysis summary statistics were used for downstream analyses. Given that this is the first large-scale CSF MQTL study, the discovery-replication design helps minimize false-positive discoveries.

Declarations
In the brain study, each cohort (WashU, ROSMAP and MAYO) was analyzed individually, followed by meta-analysis of the metabolites shared between cohorts.

CSF and brain samples
The 2,987 fasting CSF samples from 2,985 participants (from the five cohorts) were processed and stored at -80 °C. Samples were kept frozen until they were sent to Metabolon, Inc. (Durham, NC). Metabolon's untargeted Precision Metabolomics™ LC-MS (liquid chromatography-mass spectrometry) platform was used to generate the metabolomics data. Metabolite values were first volume-normalized and then median transformed to correct for day batches within each analytical method ("NEG", "POLAR", "POS EARLY", "POS LATE").
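The batch correction described above can be illustrated with a minimal Python sketch. It assumes that "median transformed" means each metabolite's values are divided by the median of their run-day batch within an analytical method, so that every batch has a median of 1; Metabolon's exact procedure may differ, volume normalization is not shown, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_normalize_by_batch(values, batches):
    """Scale one metabolite so that each batch (e.g., run day within one analytical
    method) has a median of 1. values may contain None for missing measurements."""
    observed = defaultdict(list)
    for v, b in zip(values, batches):
        if v is not None:
            observed[b].append(v)
    batch_median = {b: median(vs) for b, vs in observed.items()}
    # Missing values stay missing; observed values are divided by their batch median
    return [None if v is None else v / batch_median[b] for v, b in zip(values, batches)]


values = [2.0, 4.0, 6.0, 10.0, 20.0, None]
batches = ["day1", "day1", "day1", "day2", "day2", "day2"]
normalized = median_normalize_by_batch(values, batches)
print(normalized)
```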
The 472 brain tissue samples (~50mg) were collected from the parietal lobe cortex of 459 WU participants, with 13 samples having technical replicates. All samples were shipped to Metabolon at the same time and measured in a single round of analysis. The metabolomics dataset was volume normalized, median transformed, and batch corrected as performed in the CSF. In addition, for the publicly available data, the 514 ROSMAP brain samples were collected from dorsolateral prefrontal cortex and 196 MAYO brain samples were collected from temporal cortex.

Metabolite identification and quantification
All samples were measured using Metabolon's untargeted Precision Metabolomics™ LC-MS (liquid chromatography-mass spectrometry) platform. The technology comprises four methods: acidic positive ion conditions optimized for hydrophilic compounds, acidic positive ion conditions optimized for more hydrophobic compounds, basic negative ion conditions, and negative ionization. All methods used a Waters ACQUITY ultra-performance liquid chromatography (UPLC) system and a Thermo Scientific Q-Exactive high-resolution/accurate-mass spectrometer interfaced with a heated electrospray ionization (HESI-II) source and an Orbitrap mass analyzer operated at 35,000 mass resolution. The scan range varied but covered 70-1000 m/z.

Metabolomics quality control
Quality control of the CSF dataset and the three brain datasets (WU, ROSMAP, MAYO) followed the same pipeline, except for slight differences in the treatment of duplicated data. Duplicated samples in CSF came from longitudinal measures of the same individuals; both were therefore kept for QC purposes, but the duplicates were removed for the MGWAS analyses. Duplicated samples in brain were technical replicates extracted at the same time. Brain replicates were merged by averaging their values, except where only one replicate had a value. Given the difficulty of detecting low-abundance metabolites, such a single value was kept only when it was close to the limit of detection (the lowest 10% of all values).
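A minimal sketch of the replicate-merging rule follows. It assumes two replicates per sample and that "the lowest 10% of all values" refers to the 10th percentile of the metabolite's observed values; merge_replicates is a hypothetical helper, not part of the study's code.

```python
from statistics import mean, quantiles


def merge_replicates(rep_a, rep_b, all_values):
    """Merge two technical replicates of one metabolite in one brain sample.

    all_values: every observed value of this metabolite, used to define the
    low-abundance cut-off (assumed here to be the 10th percentile).
    """
    observed = [v for v in (rep_a, rep_b) if v is not None]
    if len(observed) == 2:
        return mean(observed)  # both replicates detected: average them
    if not observed:
        return None
    # Only one replicate detected the metabolite: keep it only if it is near the
    # limit of detection (within the lowest 10% of all values)
    cutoff = quantiles(all_values, n=10)[0]
    return observed[0] if observed[0] <= cutoff else None


all_values = [float(v) for v in range(1, 101)]
print(merge_replicates(2.0, 4.0, all_values))    # averaged
print(merge_replicates(5.0, None, all_values))   # low single value kept
print(merge_replicates(50.0, None, all_values))  # high single value set to missing
```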
The initial quality control steps assessed the missingness of each sample and each metabolite. A missing value can be caused by sample or technical issues, by the metabolite being absent from the sample, or by the metabolite level being below the detection limit. First, samples with > 50% missingness were removed.
Metabolon classified metabolites as non-xenobiotics (innate to the human system) or xenobiotics (foreign to it). Non-xenobiotics are expected to be present in many samples, while xenobiotics can be largely missing because of their foreign nature. Therefore, only non-xenobiotics with > 80% missingness were excluded; xenobiotics were not assessed at this step. Overall, taking non-xenobiotics and xenobiotics together, the average call rate for CSF metabolites was 89.6%, and the average call rates for brain metabolites were 94.4% (WU), 93.9% (ROSMAP), and 97.0% (MAYO; Supplementary Fig. 2).
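The two missingness filters can be sketched in Python. It is assumed, as the order of the text suggests, that metabolite missingness is computed after poorly measured samples are removed; the data layout, the function names and the toy data are hypothetical.

```python
def missing_fraction(values):
    """Fraction of None entries; an empty list counts as fully missing."""
    return sum(v is None for v in values) / len(values) if values else 1.0


def filter_missingness(data, xenobiotic, sample_max=0.5, metabolite_max=0.8):
    """data: metabolite -> list of values (None = missing), same sample order for all.
    xenobiotic: metabolite -> True if Metabolon classifies it as a xenobiotic.
    Returns (indices of kept samples, list of kept metabolites)."""
    n_samples = len(next(iter(data.values())))
    # Remove samples with > 50% missing values across all metabolites
    kept_samples = [
        i for i in range(n_samples)
        if missing_fraction([vals[i] for vals in data.values()]) <= sample_max
    ]
    # Remove non-xenobiotics with > 80% missingness; xenobiotics are not assessed here
    kept_metabolites = [
        m for m, vals in data.items()
        if xenobiotic[m] or missing_fraction([vals[i] for i in kept_samples]) <= metabolite_max
    ]
    return kept_samples, kept_metabolites


data = {
    "a": [1.0, 1.0, 1.0, 1.0, 1.0, None],
    "b": [1.0, 1.0, 1.0, 1.0, 1.0, None],
    "c": [None] * 6,  # non-xenobiotic, never detected
    "d": [None] * 6,  # xenobiotic, never detected
}
xeno = {"a": False, "b": False, "c": False, "d": True}
kept_samples, kept_metabolites = filter_missingness(data, xeno)
print(kept_samples, kept_metabolites)
```

Sample 5 is dropped because all four metabolites are missing, and "c" is dropped as a fully missing non-xenobiotic, while the equally missing xenobiotic "d" is retained.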
In addition, because individuals in all cohorts had mixed disease statuses (control, AD, PD, FTD, aging), metabolite missingness could reflect a biological effect. Both Fisher's exact tests and linear regression were performed for each disease-status group versus the control group, and removed metabolites that were consistently significantly associated with disease status were recovered. For example, if Fisher's exact test identified higher missingness associated with a disease and linear regression showed lower levels of the metabolite in affected individuals, then the missingness could be caused by disease status rather than a technical issue. None of the removed metabolites were recovered at this step, because their missingness was not caused by disease status. Given that the CSF dataset consisted of five cohorts, the structure of each cohort was taken into account in several steps. Missing values were imputed separately for each CSF and brain cohort. We imputed non-xenobiotics using the half-minimum value of the metabolite 66 , while xenobiotics were not imputed.
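Half-minimum imputation performed separately by cohort can be sketched as follows. The function name and toy inputs are hypothetical, and values in a cohort with no observed measurement are left missing, a case the text does not address.

```python
def half_min_impute(values, cohorts, is_xenobiotic):
    """Impute missing values of one metabolite with half of its minimum observed
    value, computed within each cohort. Xenobiotics are returned unchanged."""
    if is_xenobiotic:
        return list(values)
    cohort_min = {}
    for v, c in zip(values, cohorts):
        if v is not None:
            cohort_min[c] = min(v, cohort_min.get(c, v))
    # Cohorts with no observed value keep their missing entries
    return [cohort_min[c] / 2 if v is None and c in cohort_min else v
            for v, c in zip(values, cohorts)]


values = [4.0, None, 2.0, None, 10.0, None]
cohorts = ["A", "A", "A", "B", "B", "C"]
imputed = half_min_impute(values, cohorts, is_xenobiotic=False)
print(imputed)
```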
A log10 transformation was applied to achieve an approximately normal distribution. Moreover, because metabolites with little variation across samples are non-informative, we removed metabolites with an IQR equal to zero or a variance < 0.001. Outliers were determined separately within each cohort: metabolite outlier values were defined as values outside the range from the first quartile minus 1.5-fold IQR to the third quartile plus 1.5-fold IQR. In addition, we removed metabolites with a limited overall number of values (N < 50) to ensure sufficient power for analysis.
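The transformation and filtering steps for a single metabolite within one cohort can be sketched in Python. Assumptions: values outside the 1.5 × IQR (Tukey) fences are set to missing rather than removed, quartiles use the statistics module's default method, and the N < 50 check is applied to the remaining values of that one cohort, whereas the text applies it overall. The function name is hypothetical.

```python
import math
from statistics import quantiles, variance


def process_metabolite(values, min_var=1e-3, min_n=50):
    """Log10-transform one metabolite, drop it if uninformative or too sparse,
    and mask outliers. Returns the cleaned list, or None if the metabolite is removed."""
    logged = [None if v is None else math.log10(v) for v in values]
    observed = [x for x in logged if x is not None]
    if len(observed) < 2:
        return None
    q1, _, q3 = quantiles(observed, n=4)
    iqr = q3 - q1
    # Remove non-informative metabolites with no spread
    if iqr == 0 or variance(observed) < min_var:
        return None
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Assumed handling: values outside the fences are set to missing
    cleaned = [x if x is None or lower <= x <= upper else None for x in logged]
    if sum(x is not None for x in cleaned) < min_n:
        return None
    return cleaned


values = [10 ** (i / 100) for i in range(1, 60)] + [1e6]  # 59 typical values + 1 extreme
result = process_metabolite(values)
print(result[-1])  # the extreme value is masked
```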
Lastly, sample outliers, defined as samples > 5 SD from the mean of principal component one or two, were excluded.
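Sample-outlier detection on the first two principal components can be sketched with NumPy. This assumes a complete samples-by-metabolites matrix (after imputation) and column centering without scaling, neither of which is specified above; the function name and the simulated data are hypothetical.

```python
import numpy as np


def pca_outliers(X, n_sd=5.0):
    """Return indices of samples (rows) whose PC1 or PC2 score lies more than
    n_sd standard deviations from the mean score of that component.

    X: complete samples x metabolites matrix (no missing values).
    """
    Xc = X - X.mean(axis=0)                        # center each metabolite
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :2] * S[:2]                      # PC1 and PC2 sample scores
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    return np.flatnonzero((np.abs(z) > n_sd).any(axis=1))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # simulated log-scale data
X[0] += 50.0                    # make sample 0 an obvious outlier
print(pca_outliers(X))
```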
Given the discovery and replication design of the CSF study, we further removed metabolites with < 50 values in each analysis stage to ensure the feasibility of each analysis. Moreover, for longitudinal replicates in CSF, the earlier draw was selected.

Genotyping and imputation

GWAS of CSF and brain metabolite levels
GWAS analysis was performed for each metabolite in each cohort using PLINK (v2.00a3LM) 69 . A linear regression under an additive model was applied, controlling for multiple factors:
All meta-analyses were performed using the inverse-variance-weighted (IVW) approach of METAL (2011-03-25 version, STDERR scheme) 70 . In the CSF study, the internal replication study (ADNI and ACE) was first meta-analyzed with the previous CSF MQTL study 14 using shared variants (MAF > 0.05 in the previous study), including only variants tested in the ADNI and ACE study 14 . Then the discovery- and replication-phase results were meta-analyzed. With one analyte removed because of inflation, meta-analysis GWAS results were reported for a total of 440 metabolites. Significant signals were selected according to the following criteria: (1) P < 0.05 in both the discovery and replication phases; (2) a consistent direction of effect in the two phases; and (3) meta-analysis P < 2.79 × 10^-10 (5 × 10^-8 / 179 independent metabolites).
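The meta-analysis and signal-selection rules can be sketched in Python. This re-implements the fixed-effect inverse-variance-weighted formulas for illustration only (the study itself used METAL); the function names and toy numbers are hypothetical, and the two-sided P-value assumes a normal test statistic.

```python
import math

# Study-wide threshold: genome-wide 5e-8 divided by 179 independent CSF metabolites
STUDY_WIDE_P = 5e-8 / 179


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of per-study estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P-value under a normal Z
    return beta, se, p


def is_significant(discovery, replication, meta_p, threshold=STUDY_WIDE_P):
    """Apply the three selection criteria; discovery and replication are (beta, p)."""
    (b_disc, p_disc), (b_rep, p_rep) = discovery, replication
    both_nominal = p_disc < 0.05 and p_rep < 0.05
    same_direction = b_disc * b_rep > 0
    return both_nominal and same_direction and meta_p < threshold


beta, se, p = ivw_meta([0.2, 0.2], [0.1, 0.1])
print(beta, se, p)  # pooled estimate, its standard error and two-sided P
```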
For brain, the three cohorts were analyzed independently and then meta-analyzed. The meta-analysis results were combined with the single-cohort results for metabolites unique to one cohort. With one analyte removed because of inflation, the brain results included a total of 962 metabolites. The criteria for selecting significant signals were as follows: (1)

Identify tissue specificity using mashr
Mashr 16 was developed to compare significant associations across tissue types in both effect direction and effect magnitude. We compared metabolite-signal associations across four tissues: CSF, brain, blood and urine. We first prepared the 'random input' (representing all results) of association signals and then subset it to the 'strong input' (significant results). Following the mashr protocol, the 'random input' is used to account for multiple testing. Given that 91% of our identified associations nominated a metabolism gene as functional, we created the 'random input' by extracting one variant per metabolism gene (ProGeM 31 -curated metabolism genes from the GO, KEGG, MGI, orphaned, and Reactome databases) for each metabolite in the CSF study, using the cis-region of each gene (2Mb). Duplicated variants within each metabolite were removed, and variants within 2Mb of each other were merged by selecting the most significant variant. For the analysis, we first identified metabolites shared across all tissues and then curated the 'random input' for these metabolites. The 'strong input' was extracted from the 'random input' using the genome-wide significance threshold. The function 'get_pairwise_sharing' with factor=0 was used to calculate the percentage of associations with the same direction shared by two tissues. The function 'get_pairwise_sharing' with factor=0.5 was used to calculate the percentage of associations shared by two tissues not only in direction but also in magnitude.
The default setting of factor=0.5 indicates that two effect magnitudes are within two-fold of each other, which mashr defines as similar. Effect sizes of an association in two tissues that had the same direction and a magnitude ratio between 0.5 and 2 were considered to be the same association.
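The sharing rule can be illustrated with a simplified Python re-implementation. It is a sketch, not mashr's code: it assumes each association has a posterior mean effect in two tissues and a boolean flag for significance in each tissue, counts associations significant in at least one tissue, and calls an association shared when the ratio of its effects lies strictly between factor and 1/factor, so factor=0 checks sign only. The function name and toy values are hypothetical.

```python
import math


def pairwise_sharing(effects_1, effects_2, sig_1, sig_2, factor=0.5):
    """Proportion of associations, significant in at least one of two tissues, whose
    effect ratio lies strictly between factor and 1/factor (factor=0: same sign)."""
    idx = [k for k in range(len(effects_1)) if sig_1[k] or sig_2[k]]
    if not idx:
        return float("nan")
    upper = math.inf if factor == 0 else 1.0 / factor
    shared = 0
    for k in idx:
        if effects_2[k] == 0:
            continue  # ratio undefined; treated as not shared
        ratio = effects_1[k] / effects_2[k]
        if factor < ratio < upper:
            shared += 1
    return shared / len(idx)


e_csf = [1.0, 1.0, 1.0, -1.0, 0.1]
e_brain = [1.5, 3.0, -1.0, -0.8, 0.1]
sig_csf = [True, True, True, True, False]
sig_brain = [True, False, False, True, False]
by_magnitude = pairwise_sharing(e_csf, e_brain, sig_csf, sig_brain, factor=0.5)
by_sign = pairwise_sharing(e_csf, e_brain, sig_csf, sig_brain, factor=0)
print(by_magnitude, by_sign)
```

The last association is excluded because it is not significant in either tissue; the second shares direction but differs three-fold in magnitude, so it counts only under factor=0.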

Variant annotation and effector gene nomination
The lead-index variant of each association, together with its proxy variants (r² > 0.8), was annotated using Variant Effect Predictor (VEP) 71 . The consequences of all variants in each signal were extracted from the VEP output, prioritized by impact level (high to low) and then by physical distance to the gene coding region (close to far), and grouped into the following categories: intergenic, upstream or downstream, 3' UTR and 5' UTR, intronic, splice region, splice acceptor or donor, synonymous, missense, stop gained, and stop lost. The VEP results were input into ProGeM (prioritization of candidate causal Genes at Molecular QTLs) 31 to nominate effector gene(s). ProGeM uses two strategies, "bottom-up" and "top-down", and the gene(s) concurring between the two strategies are selected as candidate genes for an association. The "bottom-up" method prioritizes genes based on overlap with the signal (between the left-most and right-most proxy variants, ±5 kb), proximity (the 10 nearest genes), protein-coding status, regulation of their transcription by the variants according to the GTEx eQTL (v7) database, and variant impact based on the VEP consequences. The "top-down" method prioritizes genes within a distance range (1 Mb on both sides of the lead-index variant) by their relatedness to metabolic pathways in multiple databases. Candidate effector genes were selected as follows: 1) if the signal had a variant altering the protein sequence or dramatically changing protein level (regulatory_region_ablation), corresponding to high or moderate impact in VEP, the respective gene was selected; if there was more than one such gene, we chose the genes overlapping the signal's LD region. 2) Otherwise, genes selected by both the bottom-up and top-down methods were prioritized.
3) If no gene was shared by the two methods, the nearest gene with an eQTL overlapping the metabolite association was selected; if no eQTL was found, the nearest gene was selected. 4) When more than one gene was shared by the two methods, the nearest gene with an eQTL was selected; if no eQTL was found, the nearest gene was selected.
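The four selection steps can be expressed as a small decision function. This is a hypothetical Python sketch: the dictionary keys (high_impact_genes, ld_genes, bottom_up, top_down, genes_by_distance, eqtl_genes) are invented for illustration, it assumes VEP and ProGeM outputs have already been parsed into these fields, and it returns a list because step 1 can retain more than one gene.

```python
def _nearest(candidates, signal):
    """Nearest candidate with an overlapping eQTL if any, otherwise the nearest candidate.
    Assumes signal["genes_by_distance"] lists every candidate, nearest first."""
    ordered = [g for g in signal["genes_by_distance"] if g in candidates]
    with_eqtl = [g for g in ordered if g in signal["eqtl_genes"]]
    return (with_eqtl or ordered)[0]


def nominate_effector_genes(signal):
    # Step 1: genes hit by high/moderate-impact variants; if several, keep those in the LD region
    coding = set(signal["high_impact_genes"])
    if coding:
        if len(coding) > 1:
            coding = (coding & set(signal["ld_genes"])) or coding
        return sorted(coding, key=signal["genes_by_distance"].index)
    shared = set(signal["bottom_up"]) & set(signal["top_down"])
    if len(shared) == 1:  # Step 2: a single gene supported by both strategies
        return list(shared)
    if not shared:  # Step 3: no overlap, nearest gene (eQTL-supported first)
        return [_nearest(signal["genes_by_distance"], signal)]
    return [_nearest(shared, signal)]  # Step 4: several shared genes


def make_signal(**fields):
    """Build a toy signal record with empty defaults (hypothetical structure)."""
    base = {key: [] for key in ("high_impact_genes", "ld_genes", "bottom_up",
                                "top_down", "genes_by_distance", "eqtl_genes")}
    base.update(fields)
    return base


s1 = make_signal(high_impact_genes=["GENE1"], genes_by_distance=["GENE1", "GENE2"])
s2 = make_signal(bottom_up=["GENE1", "GENE2"], top_down=["GENE2", "GENE3"],
                 genes_by_distance=["GENE1", "GENE2", "GENE3"])
s3 = make_signal(bottom_up=["GENE1", "GENE2", "GENE3"], top_down=["GENE2", "GENE3"],
                 genes_by_distance=["GENE1", "GENE3", "GENE2"], eqtl_genes=["GENE2"])
s4 = make_signal(bottom_up=["GENE1"], top_down=["GENE2"],
                 genes_by_distance=["GENE1", "GENE2"])
for s in (s1, s2, s3, s4):
    print(nominate_effector_genes(s))
```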
In addition to applying the ProGeM pipeline to predict the effector gene of each metabolite-signal association, we performed biological knowledge-based nomination of the effector gene for each association of a named metabolite, considering the 10 nearest protein-coding genes (where applicable) within a 2Mb window centered on the lead variant. If multiple genes were found to be relevant, we selected the gene nearest to the signal. Sources included GeneCards 33 , HMDB 34 , the UniProt database 72 , and the KEGG database 32 . For associations successfully nominated with a biologically relevant gene, we defined the gene's action on the metabolite as cis-acting or trans-acting based on metabolic pathways. A cis-acting association was defined as one in which the gene encodes either a direct transporter of the metabolite or an enzyme catalyzing a reaction involving the metabolite. A trans-acting association was one in which the gene encodes a protein in the same pathway as the metabolite that does not directly catalyze a reaction involving it, or a protein related to the metabolite in the literature through an unknown mechanism.
Lastly, the biological knowledge-nominated gene was prioritized over the ProGeM-nominated gene when the two were discordant, and we confirmed that in each such case the ProGeM-nominated gene was not biologically meaningful for the metabolite. When no biologically relevant gene was found, the ProGeM-nominated gene was taken as the effector gene. We further categorized effector genes by function as enzymes or transporters, using information from the HMDB and GeneCards databases.
KEGG pathway analysis was performed with MetaboAnalyst's joint pathway analysis 73 , using the list of effector genes as input.

Cell-type enrichment analysis
Zhang et al. 37 published human brain RNA sequencing profiles for different cell types, including mature astrocytes, neurons, oligodendrocytes, microglia/macrophages, and endothelial cells. For each gene, total expression was calculated by summing its average expression levels across cell types, and the expression level in each cell type was divided by this sum to obtain the proportion of the gene's expression in that cell type. A gene was defined as cell-type specific if one cell type accounted for > 50% of its transcripts.
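The cell-type specificity rule can be sketched in Python. The input layout (a dictionary of average expression per gene and cell type), the function name and the toy numbers are hypothetical.

```python
def cell_type_specific_genes(avg_expr, threshold=0.5):
    """avg_expr: gene -> {cell type: average expression}.
    Returns gene -> cell type for genes in which one cell type accounts for more
    than `threshold` of the expression summed across cell types."""
    specific = {}
    for gene, by_cell in avg_expr.items():
        total = sum(by_cell.values())
        if total <= 0:
            continue  # unexpressed gene: no proportion can be computed
        cell, expr = max(by_cell.items(), key=lambda item: item[1])
        if expr / total > threshold:
            specific[gene] = cell
    return specific


toy = {
    "GENE1": {"astrocyte": 8.0, "neuron": 1.0, "microglia": 1.0},  # 80% astrocyte
    "GENE2": {"astrocyte": 3.0, "neuron": 3.0, "microglia": 4.0},  # no cell type > 50%
}
print(cell_type_specific_genes(toy))
```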

Independent variant selection through conditional analysis
Approximate conditional analysis was used to identify independent signals in each association, based on the LD structure of the WashU participants' genotypes (CSF: 2,311 unrelated non-Hispanic white individuals; brain: 1,016 unrelated non-Hispanic white individuals from the three brain cohorts). For each association, the GCTA 74 COJO --cojo-slct option was applied to perform stepwise forward selection of approximately independent variants (P < 2.79 × 10^-10 for CSF and P < 1.74 × 10^-10 for brain; --cojo-collinear 0.9, with other settings at their suggested values, such as --cojo-wind 10000 kb).
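A per-metabolite COJO call could be assembled as follows. This Python sketch only builds the argument list and does not run GCTA; the executable name gcta64, the file names and the helper name are assumptions, while the flags shown (--bfile, --cojo-file, --cojo-slct, --cojo-p, --cojo-collinear, --cojo-wind, --out) are standard GCTA-COJO options.

```python
def cojo_command(bfile, sumstats, out, p_threshold):
    """Return a GCTA-COJO stepwise-selection command as an argument list."""
    return [
        "gcta64",                          # assumed name of the GCTA binary
        "--bfile", bfile,                  # PLINK binary genotypes used as LD reference
        "--cojo-file", sumstats,           # metabolite GWAS summary statistics (COJO format)
        "--cojo-slct",                     # stepwise forward selection of independent signals
        "--cojo-p", f"{p_threshold:.3g}",  # study-wide P threshold
        "--cojo-collinear", "0.9",
        "--cojo-wind", "10000",            # window size in kb
        "--out", out,
    ]


cmd = cojo_command("ld_reference", "metabolite_X.ma", "metabolite_X_cojo", 2.79e-10)
print(" ".join(cmd))
# To execute: subprocess.run(cmd, check=True)
```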
Association with and causality to brain-related wellness traits and diseases
Two-sample Mendelian randomization analysis (R package TwoSampleMR version 0.5.6) 42 was conducted to estimate the causal effect of a metabolite (exposure) on a trait or disease (outcome). In CSF, the 144 metabolites with at least one association passing the study-wide threshold were analyzed; in brain, the 34 metabolites with at least one such association were analyzed. Instrumental variables were derived by PLINK v1.90b6.4 clumping, using the respective CSF and brain study participants' genotypes as the LD reference. We used the default clumping parameters of the R package (clump_kb = 10000, clump_r2 = 0.001). For metabolites with a single instrumental variable, the Wald ratio method was applied; for metabolites with multiple instrumental variables, the inverse-variance-weighted method was applied. Both are basic models in the R package. FDR correction was performed to account for the multiple metabolites and phenotypes tested. For phenotype files lacking standard error information, we used the "se.from.p" R function to infer standard errors. In the main analysis, we excluded variants within the FADS gene region and highly pleiotropic regions 44,45 associated with at least five CSF or brain metabolites identified from MGWAS. In the stringent analysis, we excluded variants within the FADS gene region and all pleiotropic regions associated with at least two CSF or brain metabolites identified from MGWAS.
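The Wald ratio, IVW and FDR steps can be sketched in Python. This is an illustrative re-implementation, not TwoSampleMR's code: the IVW function uses the fixed-effect standard error, whereas the package's own IVW may scale the standard error differently; se_from_p is a generic z-based back-calculation that may differ from the se.from.p function cited above; and Benjamini-Hochberg is assumed as the FDR procedure. Function names and toy values are hypothetical, and clumping and the FADS/pleiotropic-region exclusions are not shown.

```python
import math
from statistics import NormalDist


def se_from_p(beta, p):
    """Back-calculate a standard error from an effect and its two-sided P-value,
    assuming a normal Wald test: se = |beta| / z, with z = Phi^-1(1 - p/2)."""
    return abs(beta) / NormalDist().inv_cdf(1 - p / 2)


def wald_ratio(bx, by, se_y):
    """Single-instrument causal estimate by/bx with first-order standard error."""
    return by / bx, se_y / abs(bx)


def ivw(bx, by, se_y):
    """Fixed-effect inverse-variance-weighted estimate over several instruments."""
    weights = [x ** 2 / s ** 2 for x, s in zip(bx, se_y)]
    beta = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending order of P
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(wald_ratio(0.5, 0.1, 0.02))
print(ivw([0.5, 0.5], [0.1, 0.1], [0.02, 0.02]))
print(bh_fdr([0.01, 0.04, 0.03]))
```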

Druggable targets
We identified related drugs for metabolites that were associated with at least one phenotype in the MWAS. Using DrugBank 43 , we identified metabolites that have been approved or tested as drugs, as well as metabolites that could be modulated by drugs. We also identified metabolites that are commercially available as dietary supplements. In addition, we identified effector genes that overlap drug targets in DrugBank 43 .

Associated or causal relationships between metabolites and complex traits/diseases uncover insights into etiology. The plot shows associations between metabolite levels and traits identified from the metabolome-wide association study (MWAS) using CSF (top) and brain (bottom). The direction of effect and the strength of association (P-value) are represented by a range of colors. Colocalization was performed at each genetic locus through which an MWAS association was identified. Mendelian randomization (MR)