Investigating Causal Relations Between Circulating Metabolites and Alzheimer’s Disease: A Mendelian Randomization Study

Metabolomics is a promising approach for understanding the pathophysiological pathways of Alzheimer’s disease (AD). However, the relationships between metabolism and AD are poorly understood. The aim of this study was to investigate the causal association between circulating metabolites and risk of AD by combining metabolomics with genomics through a two-sample Mendelian randomization (MR) approach. Genetic associations with 123 circulating metabolic traits were used as exposures. Large-scale summary statistics from the International Genomics of Alzheimer’s Project, including 21,982 AD cases and 41,944 controls, were used in the primary analysis. Validation was further performed using family history of AD data from UK Biobank (27,696 cases of maternal AD, 14,338 cases of paternal AD and 272,244 controls). We used the inverse-variance weighted method as the primary analysis and four additional MR methods (MR-Egger, weighted median, weighted mode, and MR pleiotropy residual sum and outlier) for sensitivity analyses. We found evidence for causal effects of circulating glycoprotein acetyls, ApoB, LDL cholesterol, and serum total cholesterol on higher risk of AD, whereas glutamine showed a protective effect. Further research is required to decipher the biological pathways underpinning these associations.

synaptic loss [1,2]. Unfortunately, there is currently no effective prevention and treatment for AD [1].
Metabolomics is the newest systems biology approach; it measures the biochemical products of cell processes downstream of genomic, transcriptomic, and proteomic systems, as well as influences from the environment. It offers great potential for the diagnosis and prognosis of neurodegenerative diseases by capturing snapshots of the complex and multifactorial biochemical pathways that may be altered in AD [3]. Recent studies have shown that metabolomics can be used to measure alterations in biochemical pathways related to AD [4][5][6][7]. However, one of the key challenges in these metabolomics studies is the inability to establish whether the relationships between circulating metabolites and AD are causal. Understanding the causality between metabolites and AD, as well as the potential pathophysiological pathways of AD, is of great importance for inspiring drug discovery and for identifying biomarkers that aid early detection of high-risk individuals so that prevention, monitoring, and treatment can be initiated.
Mendelian randomization (MR) is an analytic approach that uses genetic variants as instrumental variables (IVs) to make causal inferences between an exposure and the outcome of interest [8]. The MR approach is largely independent of the unmeasured confounding and reverse causality inherent in observational studies, given that allocation of genotypes from parents to offspring is random and genetic variation is unlikely to be affected by environmental factors [8,9]. This is particularly relevant in a metabolomic study, where the inter-individual variability of circulating metabolites can be affected by a wide range of potential confounders such as age, sex, physical diseases, medication, exercise, weight, and time of sampling. In addition, two-sample MR analysis is an extension in which the effects of the genetic instrument on the exposure and on the outcome can be obtained from publicly available genome-wide association study (GWAS) summary data [10]. The two-sample MR approach enables us to link circulating metabolites with risk of AD using GWAS estimates for both metabolic phenotypes and AD.
The aim of this study was to apply a two-sample MR approach to investigate the causal relationship between circulating metabolites and risk of AD, combining genome-wide and metabolome-wide datasets generated from large-scale cohorts.

Methods

Genetically determined metabolites
We obtained summary statistics from a GWAS of 123 circulating metabolites [11]. Briefly, Kettunen et al. conducted a comprehensive GWAS with quantitative human serum/plasma metabolites as phenotypes [11]. Metabolomics data were acquired from human fasting blood samples; otherwise, the effect of fasting time was adjusted for in the original study. The 123 metabolites represent a broad molecular signature of systemic metabolism and were assigned to 12 classes (carboxylic acids and derivatives, fatty acyls, glycerolipids, glycerophospholipids, hydroxy acids and derivatives, keto acids and derivatives, lipoprotein, organooxygen compounds, protein, ratio, sphingolipids, steroids and steroid derivatives) based on the Human Metabolome Database classification and expert opinion [12,13]. Up to 24,925 individuals from 14 European cohorts were meta-analyzed. The mean age was 44.6 years, and females accounted for 54.6%. Written informed consent was obtained from all participants, and the study was approved by ethical committees.

Selection of instrumental variables
For each of the 123 metabolites, single nucleotide polymorphisms (SNPs) associated at genome-wide significance (P < 5×10⁻⁸) with a minor allele frequency greater than 0.01 were considered as potential instruments. Independent SNPs were selected at a linkage disequilibrium threshold of r² < 0.05 within a distance of 1000 kb. For palindromic SNPs, we aligned strands using allele frequency and discarded palindromic SNPs that had a minor allele frequency above 0.42. The exposure and outcome datasets were then harmonized, and the original datasets were checked to ensure that effect alleles were not reversed.
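The filtering rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the record fields (`pval`, `maf`, `ea`, `oa`) are hypothetical names, and linkage disequilibrium clumping is not shown because it requires a reference panel.

```python
# Illustrative pre-filter for candidate instruments.
# Field names ("pval", "maf", "ea", "oa") are hypothetical, not from the study.
P_THRESHOLD = 5e-8          # genome-wide significance threshold from the text
MIN_MAF = 0.01              # minor allele frequency must exceed this value
PALINDROME_MAX_MAF = 0.42   # palindromic SNPs above this MAF are discarded

# A SNP is palindromic when its two alleles are complementary bases.
PALINDROMIC_PAIRS = {frozenset("AT"), frozenset("CG")}


def is_palindromic(effect_allele, other_allele):
    """Return True for A/T or C/G SNPs, whose strand cannot be read from alleles alone."""
    pair = frozenset((effect_allele.upper(), other_allele.upper()))
    return pair in PALINDROMIC_PAIRS


def keep_snp(snp):
    """Apply the significance, MAF, and palindromic-SNP rules to one record."""
    if snp["pval"] >= P_THRESHOLD:
        return False
    if snp["maf"] <= MIN_MAF:
        return False
    if is_palindromic(snp["ea"], snp["oa"]) and snp["maf"] > PALINDROME_MAX_MAF:
        return False
    return True


candidates = [
    {"rsid": "snp_a", "pval": 1e-9, "maf": 0.20, "ea": "A", "oa": "G"},
    {"rsid": "snp_b", "pval": 1e-9, "maf": 0.45, "ea": "A", "oa": "T"},
    {"rsid": "snp_c", "pval": 1e-6, "maf": 0.30, "ea": "C", "oa": "T"},
]
instruments = [s["rsid"] for s in candidates if keep_snp(s)]
print(instruments)
```

In this toy input, `snp_b` is dropped because it is palindromic with a frequency too close to 0.5 for strand alignment, and `snp_c` is dropped because it does not reach genome-wide significance.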
The proportion of variance explained by the IVs was computed, and the F-statistic of each metabolite was calculated to judge the strength of the IVs. Typically, a strong instrument is defined as one with an F-statistic > 10 [14]. Additionally, power calculations were conducted using the R code provided by Burgess S with a two-sided type-I error rate α = 0.05 [15]. The proportion of variance explained by the IVs, the F-statistics, and the power are presented in Additional file 1.
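As an illustration of the instrument-strength check, the sketch below uses one commonly used form of the F-statistic, F = (R² / (1 − R²)) × ((N − k − 1) / k), where R² is the proportion of variance explained, N the sample size, and k the number of SNPs. The text does not state which formula was applied, so this choice is an assumption, and the input values are hypothetical.

```python
def f_statistic(r2, n_samples, n_snps):
    """Instrument strength from variance explained (assumed formula, see lead-in)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")
    if n_samples <= n_snps + 1:
        raise ValueError("n_samples must exceed n_snps + 1")
    return (r2 / (1.0 - r2)) * ((n_samples - n_snps - 1) / n_snps)


def is_strong_instrument(f_value, threshold=10.0):
    """Apply the conventional F > 10 rule mentioned in the text."""
    return f_value > threshold


# Hypothetical example: 10 SNPs explaining 5% of variance in 24,925 individuals.
f_value = f_statistic(r2=0.05, n_samples=24925, n_snps=10)
print(round(f_value, 1), is_strong_instrument(f_value))
```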

IGAP AD dataset
In the primary analysis, genetic variants associated with late-onset AD were obtained from a meta-analysis GWAS performed by the International Genomics of Alzheimer's Project (IGAP) [16]. There is no sample overlap between IGAP and the cohorts used for circulating metabolites. IGAP is a large two-stage study based on GWAS of individuals of European ancestry. Data from stage 1 were used in the present study, including 63,926 individuals (21,982 AD cases and 41,944 cognitively normal controls) of European descent from four consortia: the Alzheimer Disease Genetics Consortium (ADGC), the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium (CHARGE), the European Alzheimer's Disease Initiative (EADI), and the Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for Alzheimer's Disease Consortium (GERAD/PERADES) [16]. All AD diagnoses were autopsy-confirmed or satisfied the NINCDS-ADRDA criteria or DSM-IV guidelines [16]. The average age at onset of AD ranged from 74.4 to 81.9 years, and the average age at examination for 83% of controls was ≥ 76 years.

UK Biobank proxy-AD dataset
We additionally set out to validate our results using a proxy-AD dataset based on individuals in the UK Biobank (http://www.ukbiobank.ac.uk) [17,18]. The high heritability of AD implies that individuals with a family history of AD are likely to have a higher genetic AD risk load. Thus, individuals with one or two parents with AD were defined as proxy cases, that is, as having a family history of AD [19]. In this dataset, the proxy-AD case-control status was ascertained via self-report. Over 500,000 community-dwelling individuals aged between 37 and 73 years were recruited in the United Kingdom between 2006 and 2010 [17]. A total of 314,278 participants with available AD information on at least one parent were meta-analyzed in this analysis, including 27,696 cases of maternal AD, 14,338 cases of paternal AD, and 272,244 controls [18]. All data sources used in this MR study received approval from an ethics standards committee on human experimentation and obtained informed consent from all participants.

Statistical analysis for Mendelian randomization
We used the inverse-variance weighted (IVW) method as the primary analysis to determine the causal relationships between genetically determined circulating metabolites and AD. The IVW method returns an unbiased estimate in the absence of horizontal pleiotropy or when horizontal pleiotropy is balanced [20]. Because the outcome was dichotomous, results are presented as the odds ratio (OR) for AD per standard deviation (SD) increase in the genetically determined metabolite.
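To make the IVW calculation concrete, the following sketch computes a fixed-effect IVW estimate from per-SNP summary statistics and converts it to an OR per SD increase. It is a simplified illustration under stated assumptions: the function and variable names are hypothetical, the weights use only the standard error of the SNP-outcome association, and the "TwoSampleMR" package used in the study may apply a different standard-error model.

```python
import math


def wald_ratios(beta_exposure, beta_outcome):
    """Per-SNP causal estimates: SNP-outcome effect divided by SNP-exposure effect."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate (log odds per SD) and its standard error."""
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate into an OR with an approximate 95% interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical summary statistics for three SNPs.
bx = [0.10, 0.20, 0.15]      # SNP effects on the metabolite (SD units)
by = [0.020, 0.045, 0.028]   # SNP effects on AD (log odds)
se_y = [0.010, 0.012, 0.011]

beta_ivw, se_ivw = ivw_estimate(bx, by, se_y)
print(odds_ratio_with_ci(beta_ivw, se_ivw))
```

The IVW estimate is equivalent to a weighted mean of the Wald ratios with weights bx²/se_y², which is why a single pleiotropic SNP with a large weight can shift the result.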
We conducted sensitivity analyses using the weighted median [21], the weighted mode-based estimate (MBE) [22], MR-Egger regression [23], and Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO) [24]; the results are displayed in scatter plots. These methods rely on different assumptions at the cost of reduced statistical power. The weighted median allows up to 50% of the IVs to be invalid or pleiotropic [21]. The weighted MBE method uses the mode of the IVW empirical density function as the estimate and yields a causal effect estimate that is robust to horizontal pleiotropy [22]. MR-Egger regression allows more than 50% of the variants to be invalid [23]. MR-Egger relies on the "NO Measurement Error" (NOME) assumption (no measurement error in the SNP-exposure effects), which is evaluated by the regression dilution I²GX statistic (a value below 0.9 indicates a violation of NOME) [25]. Thus, the I²GX statistic was calculated to test for the presence of bias in MR-Egger.
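The weighted median can be illustrated with a short sketch. The version below orders the per-SNP ratio estimates, accumulates normalized weights with a half-weight offset, and interpolates at the 50th percentile, which is how the estimator is usually described. Function names and inputs are hypothetical, and standard errors (normally obtained by bootstrapping) are omitted.

```python
def inverse_variance_weights(beta_exposure, se_outcome):
    """First-order weights for ratio estimates: bx^2 / se_y^2."""
    return [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]


def weighted_median(ratios, weights):
    """Weighted median of per-SNP ratio estimates (point estimate only)."""
    if not ratios or len(ratios) != len(weights):
        raise ValueError("ratios and weights must be non-empty and of equal length")
    if len(ratios) == 1:
        return ratios[0]
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    b = [ratios[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    # Cumulative weight up to each estimate, minus half of its own weight.
    cumulative, running = [], 0.0
    for wi in w:
        running += wi
        cumulative.append((running - 0.5 * wi) / total)
    below = max((i for i, c in enumerate(cumulative) if c < 0.5), default=None)
    if below is None:
        return b[0]
    if below == len(b) - 1:
        return b[-1]
    span = cumulative[below + 1] - cumulative[below]
    # Linear interpolation between the two estimates that bracket 50%.
    return b[below] + (b[below + 1] - b[below]) * (0.5 - cumulative[below]) / span


# Three consistent SNPs and one hypothetical pleiotropic outlier.
ratios = [0.20, 0.20, 0.20, 5.00]
print(weighted_median(ratios, [1.0, 1.0, 1.0, 1.0]))
```

With equal weights, the outlier does not move the estimate, whereas a weighted mean of the same ratios would be pulled toward it.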
We checked each SNP prioritized in the present study using the GWAS Catalog (http://www.ebi.ac.uk/gwas) to ensure that no genetic instruments were associated directly with AD. Where SNPs significantly (P < 5×10⁻⁸) associated with AD were detected, they were excluded from the IVs and the analysis was re-run to determine whether removing those potentially pleiotropic SNPs changed the effect estimates.
Furthermore, we conducted the MR-Egger intercept test and the MR-PRESSO global test to detect the presence of horizontal pleiotropy or heterogeneity. In the case of horizontal pleiotropy, the MR-PRESSO outlier test compares the observed and expected distributions of the tested variants to identify outlier variants. If significant outliers (P < 0.05) were detected, they were removed from the analysis to return an unbiased causal estimate [24]. Heterogeneity in the IVW estimates was tested using Cochran's Q test and displayed in forest plots. Moreover, visual inspection of funnel plots and leave-one-out plots was also used to assess the MR "no horizontal pleiotropy" assumption (see Additional file 6).
To correct for multiple comparisons, we applied false discovery rate (FDR) correction to the IVW results. An FDR-corrected P-value < 0.05 was considered significant, and an unadjusted P-value < 0.05 was considered evidence of a suggestive association. Metabolites that showed suggestive evidence of association (P IVW < 0.05) with late-onset AD were assessed for validation using the UK Biobank GWAS.
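The heterogeneity check can be sketched as follows. Cochran's Q is computed here as the weighted sum of squared deviations of the per-SNP ratio estimates from the IVW estimate, with k − 1 degrees of freedom for k SNPs. The weights (bx²/se_y²) and all names are assumptions of this sketch, and the P-value, which requires a chi-square distribution function, is left out to keep the example within the standard library.

```python
def cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Return (Q, degrees of freedom) for the IVW model."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    # The IVW estimate is the weighted mean of the ratio estimates.
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1


# Hypothetical example: two SNPs whose ratio estimates disagree (0.1 vs 0.3).
q_value, dof = cochran_q([0.1, 0.1], [0.01, 0.03], [0.01, 0.01])
print(q_value, dof)
```

A Q value that is large relative to its degrees of freedom suggests that the SNPs do not all point to the same causal effect, which is one reason to inspect outliers.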
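The two-tier significance rule above can be illustrated with a short sketch. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up method is assumed here; function names and the example P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify(pvalues, alpha=0.05):
    """Label each test as 'significant', 'suggestive', or 'none'."""
    labels = []
    for raw, adj in zip(pvalues, benjamini_hochberg(pvalues)):
        if adj < alpha:
            labels.append("significant")
        elif raw < alpha:
            labels.append("suggestive")
        else:
            labels.append("none")
    return labels


print(classify([0.001, 0.04, 0.5]))
```

In this example the second test has an unadjusted P-value below 0.05 but an adjusted value above it, so it is labeled suggestive rather than significant.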
Results with consistent direction of point estimates across sensitivity analyses, validation, IVW, and estimates after correction of potential pleiotropy were considered as robust causal associations.
Analyses were conducted using R version 3.6.3, with the MR analysis performed using the "TwoSampleMR" package version 0.5.2 [26,27].

Pathway enrichment analysis
Metabolic enrichment analysis was conducted using the web-based MetaboAnalyst 5.0 (https://www.metaboanalyst.ca/). We analyzed the metabolite sets from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [28]. All metabolites that were associated with AD risk at a P-value < 0.05 were used to identify metabolic pathways.

Results
A flow diagram depicting the process of the MR analyses is shown in Fig 1. The IVW method identified 32 circulating metabolites that were associated with AD risk, including 27 significant (FDR P IVW < 0.05) traits and five suggestive (P IVW < 0.05) causal traits (Fig 2). For the 32 metabolites, the IVs comprised 6-52 SNPs and explained 3-20% of the variance of the corresponding metabolites (see Additional file 1). The F-statistics were all well above the weak-instrument threshold of 10 (see Additional file 1). Results of the sensitivity analyses are shown in Additional file 2. The negative MR results are presented in Additional file 3.

Effects of Genetically Determined Metabolites on AD
Those associated metabolites include two proteins, two carboxylic acids and derivatives, 26 lipoproteins, and two steroids and steroid derivatives (Fig 2). We observed a 1-fold increased risk of developing AD per

Pleiotropy and heterogeneity analysis
The Q-test detected no evidence of heterogeneity in the results of Gp, citrate, glutamine, S-HDL-P, XL-HDL-C, XL-HDL-CE, and XL-HDL-FC. The MR-Egger intercept test showed no evidence of pleiotropy in these IVW-identified metabolites (Table 1).

Validation
In validation, we used the UK Biobank dataset to verify the association between IVW-identified metabolites and proxy-AD (Fig 3). Of the 32 identified metabolites, 20 were still significantly associated with proxy-AD in validation, notably including ApoB, Gp, LDL-C, and several LDL subfractions of different composition. Among the 32 selected metabolites, the direction of the point estimates in validation was consistent with the primary IVW results for all except citrate (OR validation = 1.029), which showed the reverse direction of effect on AD risk (OR IVW = 0.83).

Pathway enrichment analysis
Our study identified six significant metabolic pathways that were involved in the pathogenesis of AD (Table 2). The most significant metabolic pathway was D-glutamine and D-glutamate metabolism (P = 1.30×10⁻⁵) from the KEGG database. L-glutamine and L-glutamine were involved in this metabolic pathway. Another two metabolic pathways involving two circulating metabolites (i.e., glutamine and citrate) survived FDR correction, that is, the pathway of alanine, aspartate, and glutamate metabolism (P = 0.0003), and glyoxylate and dicarboxylate metabolism (P = 0.0004). We also identified three pathways at the nominal P < 0.05, including nitrogen metabolism (P = 0.008), arginine biosynthesis (P = 0.018), and citrate cycle (i.e., tricarboxylic acid (TCA) cycle; P = 0.026).

Discussion
By performing a two-sample MR analysis, the present study supports the hypothesis that circulating metabolite levels can be causally related to risk of AD. We suggest a significant association between higher levels of Gp and higher risk of AD, and genetically predicted glutamine levels are significantly associated with lower risk of AD. Our results also reinforce the idea that circulating lipid-related metabolites may play a role in the pathophysiological process of AD. Particularly, we observed robust evidence of causal effects with respect to ApoB, serum-C, three IDL subfractions (i.e., IDL-C, IDL-L, IDL-P), and 13
The measured Gp are mainly α-1-acid glycoprotein (AGP) [11], also called orosomucoid. Gp is an acute-phase plasma α-globulin glycoprotein involved in many activities, including modulating immunity, binding and carrying drugs, maintaining the barrier function of capillaries, and mediating sphingolipid metabolism [29][30][31]. Gp is associated with AD because of its important role in modulating neuroinflammation [32]. Higher levels of plasma AGP were found in patients with cognitive impairment than in normal subjects [32]. A previous meta-analysis reported that plasma levels of Gp were associated with increased risk of dementia and lower cognitive function [33]. These results support our findings, suggesting a relationship between circulating Gp and AD.
ApoB is synthesized in the liver and circulates in the plasma as the major protein component of LDL, where it is involved in the transport of cholesterol to peripheral tissues [34]. Previous studies have demonstrated that AD patients have significantly higher levels of ApoB in serum [35,36] and plasma [37] than controls, especially AD subjects carrying the APOE ε4 allele [38]. In AD patients, higher serum levels of ApoB are significantly correlated with higher Aβ42 levels in the brain [36]. Additionally, genetic variants in the APOB gene are strongly associated with early-onset AD [39], suggesting a link between ApoB and AD risk. However, previous studies of circulating ApoB levels in humans are conflicting, with a large population study finding no association between circulating ApoB levels and incident AD [40]. Our MR results therefore provide important evidence that strengthens the association between circulating ApoB and AD and suggests that it is causal.
Our findings are biologically plausible, as many experimental studies have reported consistent evidence. Plasma ApoB was found to co-localize with cerebral amyloid plaques in a transgenic mouse AD model [41] and was positively correlated with Aβ plaque abundance in the brain [42]. Overexpressing APOB in a transgenic mouse model induces significant memory impairment and increases Aβ levels compared with wild-type mice, suggesting that increased ApoB levels can contribute to the development of AD-like pathology [43].
Because ApoB is involved in LDL-C metabolism and is regarded as a promising link between cholesterol and AD [44], much of the epidemiological evidence for an association between LDL-C and AD is consistent with that for ApoB [36]. Observational studies have indicated that LDL-C levels were significantly increased in AD patients [45][46][47]. Likewise, Zhou et al. suggest that an elevated concentration of LDL-C (> 121 mg/dl) may be a potential risk factor for AD [48]. Our MR analysis supports these results and suggests a causal effect of high circulating LDL-C levels in increasing the risk of AD. Consistent with our findings, two other published MR studies also revealed similar effects of LDL-C using different datasets [49,50], enhancing the reliability of the results.
According to molecular size, LDLs were further categorized as large (L), medium (M), and small (S) LDLs in the original study [11]. Variation in circulating levels and composition of these fractions may have different pathophysiological significance. In particular, plasma levels of L-LDL particles were significantly associated with greater cerebral amyloidosis and lower hippocampal volumes independent of LDL-C [51]. In addition to LDL-C, our findings suggest that six L-LDL subfractions, four M-LDL subfractions, and two S-LDL subfractions can influence AD risk, but further investigations are needed to fully understand the molecular mechanisms involved.
In observational studies, the effects of serum-C on risk of AD were highly heterogeneous [52]. Several meta-analyses revealed a non-significant effect of serum-C on AD [53], while other epidemiological studies reported that serum-C levels were significantly increased in AD patients [45][46][47]. The significance of serum TC differs between mid-life and older adults [54]. Several studies state that high mid-life serum TC levels represent a risk factor for subsequent AD [55], but that there are no detectable differences in serum TC levels at older ages [56]. Additionally, beyond long-term average serum TC levels, higher TC variability is significantly associated with increased risk of all-cause dementia and AD in the general population, independent of mean TC levels [57]. Thus, the inconsistent results between serum-C studies may be explained by variations in total cholesterol levels and by disease progression. Because MR is less affected by the unmeasured confounders inherent in observational studies [8], our MR results are more robust, suggesting that high serum-C levels may have a causal effect in increasing AD risk.
Much epidemiological evidence suggests a protective association of circulating HDL cholesterol (HDL-C) levels against AD risk [58]. In contrast, we found that HDL-C levels were not associated with AD risk despite sufficient power (see Additional file 3), which is concordant with a large population study [40]. In our study, four very large HDL subfractions (i.e., XL-HDL-C, XL-HDL-CE, XL-HDL-FC, and XL-HDL-P) showed inverse effects on AD risk; however, the sensitivity analyses showed effect estimates inconsistent with IVW. Further investigations are needed to clarify whether the relationship between HDL and AD is causal.
Our study also reported several metabolic pathways that might be involved in the pathogenesis of AD, among which D-glutamine and D-glutamate metabolism has been reported to be associated with AD [4,59]. An observational study showed that the glutamine concentration in plasma is positively correlated with that in the posterior cingulate cortex [60], which is associated with cognitive impairment in AD [61]. Consistent with this, we found that genetically determined circulating glutamine shows a protective effect against AD. Nevertheless, a cohort study found that higher glutamine levels were associated with lower cognitive function and higher risk of dementia [33]. Because observational studies are prone to reverse causation and confounding bias, an MR analysis with balanced horizontal pleiotropy is more credible [9]. Consistent with our results, a published two-sample MR study came to a similar conclusion about glutamine using a different AD dataset [62]. Furthermore, by conducting a series of rigorous sensitivity, pleiotropy, and validation analyses, our results are more comprehensive and robust. There is also biological evidence for this result. Anderson et al. observed that reduced glutamine metabolism, reduced TCA activity, and impaired oxidative glutamine metabolism precede amyloid plaque formation in an AD mouse model compared with controls [63]. In addition, glutamine has been shown to protect against oxidative stress-induced injury, which is intimately related to AD, in an AD mouse model [64].
Citrate is a key constituent of the TCA cycle and serves as a substrate in the cellular energy metabolism cycle involved in fatty acid synthesis, glycolysis, and gluconeogenesis [65]. Very few studies have explored the relationship between citrate and AD. However, our current analysis found a significant protective effect of citrate on AD risk. Although additional evidence is needed, this finding might provide valuable information to help understand the underlying biological mechanisms in the pathogenesis of AD.
Our study also has several limitations. First, a general challenge of MR is the persistent possibility of horizontal pleiotropic associations between exposure and outcome. In the present study, we conducted up-to-date analyses to detect and correct for potential pleiotropy. One limitation is that the Q-test and the MR-PRESSO global test were significant for some metabolites. Nevertheless, the MR-PRESSO outlier test was further performed to correct for horizontal pleiotropy and return an outlier-corrected causal estimate. Second, some metabolites yielded opposite directions of effect estimates between the sensitivity analyses and IVW. It is generally recommended that sensitivity analyses emphasize the direction of point estimates across the IVW and sensitivity methods, rather than just the P-values. Although this standard excluded several metabolites from the robust results, a strict screening protocol ensures the reliability of our results. For instance, Gp and glutamine showed the most robust causality. However, we did not have enough evidence to conclude that those "non-robust" metabolites are not associated with AD. Third, we used a proxy-AD GWAS dataset to verify our analysis. Hence, the phenotypes used for validation were different from those used in the primary analysis, resulting in smaller effect sizes. However, the validation is an independent replication analysis because there is no overlap between the primary AD dataset and the validation dataset.
Despite these limitations, the strengths of the study are notable. Our study provides novel insight by combining metabolomics with genomics to help understand the pathogenesis of AD. The use of a two-sample MR approach also enabled us to use very large AD case-control data, giving sufficient power to detect even small effects. In addition, there is no overlap between the exposure and outcome datasets; such overlap, which can bias effect estimates, cannot be avoided in many MR studies. Stronger evidence of causal relationships is of great importance because the pathophysiological mechanisms underlying AD are unclear. If levels of these circulating metabolites truly affect AD risk, they would be promising markers for early detection and potential avenues for effective therapeutic intervention in AD.

Conclusions
In conclusion, our study suggests that circulating Gp, ApoB, LDL-C, and serum-C were the metabolites most robustly associated with higher risk of AD, whereas glutamine showed the opposite effect. We found strong evidence for causal effects of several LDL subfractions of different composition on increased AD risk. The present study provides little evidence that raising circulating HDL-C levels would help to prevent AD. Further research is required to decipher the biological pathways underpinning these associations.

Declarations
Ethics approval and consent to participate
All data sources used in this MR study received approval from an ethics standards committee on human experimentation and obtained informed consent from all participants.

Consent for publication
Not applicable.

Availability of data and materials
All the data used in this study can be acquired from the original genome-wide association studies that are mentioned in the text. Any other data generated in the analysis process can be requested from the corresponding author.

Competing interests
The authors declare that they have no competing interests.

Mendelian randomization results for circulating metabolites and risk of Alzheimer's disease in the primary analysis. AD, Alzheimer's disease; CI, confidence interval; FDR, false discovery rate; SD, standard deviation; SNPs, single nucleotide polymorphisms; OR, odds ratio.

Mendelian randomization results for associations between circulating metabolites and family history of Alzheimer's disease in validation. AD, Alzheimer's disease; CI, confidence interval; SD, standard deviation; SNPs, single nucleotide polymorphisms; OR, odds ratio.

Overview of the design and results of this MR study on circulating metabolites and Alzheimer's disease.