Metabolites are highly relevant integrative markers of health and disease that can inform disease prediction and pathophysiology. However, measuring metabolites in the large datasets required to robustly interrogate metabolite-phenotype associations is costly, logistically challenging, and often infeasible. In this “virtual” metabolomics study, we leveraged state-of-the-art genetic methods in conjunction with large, phenotypically diverse clinical and genetic datasets to interrogate an extended set of metabolites against a broad clinical phenome. Among 726 metabolites analyzed, 336 and 107 showed significant associations among BioVU participants of European and African ancestries, respectively. Of these, 159 and 22, respectively, were associated under an MR framework using genetic instruments for metabolites constructed in an independent population, suggesting they may be mediators of disease risk. Among associations identified in the European ancestry population, we validated 16 of 22 metabolite-phenotype pairs using phenotypes derived from independent GWAS. The validated phenotypes included IBD, cholelithiasis, CAD, MI, neutropenia and lipid phenotypes. These analyses highlight the value of applying the “virtual” metabolomics approach in diverse, phenotype-rich biobanks to identify novel associations.
We found consistent associations between gastrointestinal disease phenotypes and bioactive lipids, highlighting both inflammation and resolution of inflammation as important disease mediators. Phosphatidylcholine (PC) (16:0/22:5n3, 18:1/20:4) and arachidonate (20:4n6) were inversely associated with IBD and Crohn’s disease, both inflammatory diseases of the gut mucosa. Circulating phosphatidylcholines have been reported to be reduced in inflammatory bowel disease, suggesting that they may have a protective role in the gut mucosa.16,17 The protective effects of PCs may be attributed to anti-inflammatory action and prevention of mucosal damage,16 with potential therapeutic application for IBD.18 It is important to identify the specific PC species involved in protecting the gut mucus against disease. One of the most abundant species of phosphatidylcholine in gut mucus is PC 16:0/18:1.16 This is consistent with our data indicating that lower genetically predicted phosphatidylcholine (16:0/22:5n3, 18:1/20:4) associates with IBD and Crohn’s disease. Arachidonate (20:4n6) was also associated with IBD. Arachidonic acid is a precursor of eicosanoids, with potential anti-inflammatory activity,19 and has previously been shown to be inversely associated with IBD, including UC and Crohn’s disease.20–22
We observed several other plausible disease-specific associations. Cholelithiasis (gallstone disease) was positively associated with both bilirubin (E,E) and X–21796. A causal association has been reported between extreme levels of bilirubin and increased risk of gallstone disease.23 This could be due to increased efflux of this metabolite into bile and/or variation in the expression of genes controlling both bilirubin levels and the disease.23 Bilirubin (E,E) is one of the water-soluble isomers of bilirubin and is formed from unconjugated bilirubin (Z,Z) upon exposure to light.24 Because X–21796 has an unknown identity, its associated pathway is also unknown. However, SNPs associated with X–21796 map to several members of the UGT1A family of genes, which have been associated with bilirubin levels and risk of gallstones,23 and to SLCO1B, which is involved in bilirubin transport into the liver.25 This highlights the utility of our approach for defining the mechanistic basis of associations with unknown metabolites using the underlying genetic data, which is generally not feasible with standard epidemiological approaches.
Interestingly, the “virtual” metabolomics approach provided us with a considerable opportunity for novel discovery in relation to cardiovascular disease (CVD). A previous meta-analysis found no association between serum concentrations of two common plant sterols (sitosterol and campesterol) and risk of CVD.26 However, in this large, well-powered study, we found a positive association between campesterol and risk of CAD and MI. Campesterol was also strongly associated with most of the phenotypes categorized in the lipid-related disorders group. Several mechanisms have been proposed to link elevated campesterol concentrations to increased risk of these two diseases, including common pathways influencing the absorption of cholesterol and plant sterols in the intestines,27 shared genetics linking lipoproteins and phytosterols to MI and atherosclerosis,28,29 poor nutritional status,30 and poor metabolic health.31 Future analyses will be needed to validate this novel finding and to explore its mechanistic basis and underlying pathophysiology.
This unbiased discovery approach allowed us to create and validate a resource of associations that identifies metabolites that are biomarkers and potential mediators of several other clinical phenotypes. For instance, we successfully validated an inverse association between the plasmalogen 1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1) and hypercholesterolemia. This metabolite has been reported to be inversely related to visceral adipose tissue volume and the percentage of fat in the liver and pancreas.32 We also found associations between 1-palmitoyl-2-stearoyl-GPC (16:0/18:0) and low-density lipoprotein (LDL) and total cholesterol; this metabolite has been found to be positively associated with dyslipidemia.33 Our data demonstrated that hypertriglyceridemia was positively associated with oleoyl-linoleoyl-glycerol (18:1/18:2), potentially a novel association. We also found and validated a significant association between phosphatidylcholine (16:0/22:5n3, 18:1/20:4) and neutropenia (low neutrophil count). There were other interesting associations that we were unable to validate because external datasets were not available. For instance, we observed significant positive associations of stearidonate (18:4n3) and 1-stearoyl-2-meadoyl-GPC (18:0/20:3n9) with nasal polyps. Dysregulated lipid metabolism has been reported in nasal polyps.34 These metabolites potentially represent new biomarkers of this disorder. An inverse association between methylsuccinate and Alzheimer's disease (AD) was not validated; however, given published data linking methylsuccinate supplementation to improvement in neuron dysfunction in AD, this association may merit further study.
A significant strength of this study was the use of large datasets that have proven robust for the discovery of SNPs associated with both metabolites and disease. A further strength is that we utilized genetic approaches that are well validated for the applications we propose.35,36 We analyzed data from multiple sources, including independent cohorts using independent metabolite measurement platforms, and performed analyses in both European and African American populations where possible. This allowed us to maximize discovery through increased sample sizes and a more diverse population sample, and to support the generalizability, reproducibility and rigor of the associations.37 Moreover, validating the observed associations using available external GWAS further strengthened our findings.
Our study also has some limitations. An important limitation of a genetics-based association approach is that the association may not be consistent when using directly measured levels of the metabolite. This can be due to pleiotropic associations, such as when a SNP in the predictor tags a genetic locus that is associated with an outcome through a mechanism unrelated to the metabolite, or due to weak instrument bias.38,39 Further, some metabolites are heavily modulated by environment and homeostatic physiology, which may mask an association. A second limitation is that we could not find GWAS data for all the phenotypes showing a significant association with metabolites. This limited the number of novel findings we could evaluate in external datasets.
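To make the two genetic sources of inconsistency concrete, the following is a minimal sketch of how instrument strength and per-SNP effect estimates are commonly inspected in a summary-statistics MR analysis. It assumes a fixed-effect inverse-variance weighted (IVW) estimator and the approximate per-SNP F-statistic (beta/SE)^2; these are generic textbook choices and are not necessarily the methods used in this study. All function names and all numbers are hypothetical and serve only as an illustration.

```python
import math

def instrument_f_statistics(beta_exposure, se_exposure):
    """Approximate per-SNP F-statistic, (beta / SE)^2, for the SNP-metabolite effect."""
    return [(b / s) ** 2 for b, s in zip(beta_exposure, se_exposure)]

def wald_ratios(beta_exposure, beta_outcome):
    """Per-SNP causal estimates: SNP-outcome effect divided by SNP-metabolite effect."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error across SNPs."""
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)

# Hypothetical summary statistics for three SNPs (illustration only, not study data).
beta_metabolite = [0.10, 0.20, 0.40]   # SNP effect on the metabolite
se_metabolite = [0.02, 0.02, 0.05]
beta_disease = [0.05, 0.10, 0.20]      # SNP effect on the phenotype
se_disease = [0.01, 0.01, 0.01]

f_stats = instrument_f_statistics(beta_metabolite, se_metabolite)
ratios = wald_ratios(beta_metabolite, beta_disease)
estimate, std_error = ivw_estimate(beta_metabolite, beta_disease, se_disease)

print("per-SNP F-statistics:", f_stats)
print("per-SNP Wald ratios:", ratios)
print("IVW estimate:", estimate, "SE:", std_error)
```

In this kind of check, an F-statistic below the conventional threshold of 10 flags a potentially weak instrument, and a SNP whose Wald ratio departs markedly from the others is one possible sign of pleiotropy.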
In summary, we identified novel metabolite-phenotype associations and confirmed known relationships between metabolites and disease. Further studies are needed to replicate and clinically validate these findings. This study highlights the utility of a genetics-based “virtual” metabolomics approach in conjunction with DNA biobanks to link metabolites to clinical diseases and diagnoses. As genetic biobanks continue to grow, the potential to discover the genetic underpinnings of the metabolome will also expand. This approach can be used to identify additional metabolite-disease associations, uncover novel disease biology and move towards application in clinical populations.