Genetic architecture of the plasma proteome in participants of African and European ancestry
To build genetic maps of the plasma proteome and metabolome, we performed pQTL and metabolite-QTL (mQTL) analyses separately (Fig. 1; Figure S3A-F, Table S2-7). To obtain the ancestry stratified maps, we further split the input data into African and European sub-cohorts. In summary, we utilized an aptamer-based assay (SOMAscan 7k platform18) to measure the multi-ancestry proteomics data and a mass-spectrometry assay (Metabolon HD4 platform19) for the metabolomics data of the same cohort. Following quality control procedures for the omics data and integration with array-based post-imputation genotype data, we constructed four maps: i) AFR pQTL (414 participants and 6,907 proteins), ii) EUR pQTL (2,338 participants and 6,907 proteins), iii) AFR mQTL (417 participants and 1,413 metabolites), and iv) EUR mQTL (2,392 participants and 1,483 metabolites). To determine the study-wide significant QTLs, we derived it from genome-wide significance after further accounting for the number of independent features within each separate map (see Methods for more details).
To identify genetic variants associated with the plasma proteome in individuals of African ancestry, we conducted pQTL mapping on African ancestry participants (Fig. 2A). Of 6,907 proteins that passed QC, we identified 881 proteins with 954 study-wide significant pQTLs (Table S11). Among these findings, 420 pQTLs were classified as cis, while 534 were trans-pQTLs. Consistent with previous studies20,15,21, we observed that the absolute effect size was negatively correlated with the minor allele frequency (Figure S4A). After assigning each pQTL to its corresponding linkage disequilibrium (LD) block22, we identified a total of 508 unique genetic loci, including 84 pleiotropic regions. Notably, we found 33 proteins associated with the APOE locus (Figure S5A), which ranked second in terms of AFR proteomic-associated pleiotropic regions and genomic hotspots. The other top-five pleiotropic loci included VTN (chr17q11.2) with 36 proteins, ABO (chr9q34.2) with 24 proteins, CFH (chr1q31.3) with 23 proteins, and the MHC region with 22 proteins. Next, we performed a stratified analysis for individuals of European ancestry. Of the 6,907 proteins, 2,400 proteins showed 2,848 significant pQTLs in this population; 1,282 cis-pQTLs and 1,566 trans-pQTL (Fig. 2B; Table S12; Figure S4B). Of the top five pleiotropic pQTL loci in EUR (totally 746 regions), the APOE locus (chr19q13.32) was associated with 126 proteins (Figure S5B). The other top-five pleiotropic loci included the VTN (chr17q11.2) with 182 proteins, CFH (chr1q31.3) with 151 proteins, MHC region with 86 proteins, and the ABO locus (chr9q34.2) with 82 proteins.
To determine how many of the pQTLs have been reported before, we conducted a comparison between our study-wide pQTLs and the three largest to date external studies covering both cis and trans associations while encompassing two genetic ancestries (see Methods). Of these four datasets from three external pQTL studies (Table S13), Ferkingstad et al.,23 and Sun et al.,15 included participants of EUR ancestry, while Surapaneni et al.,20 and Sun et al., 15 sampled individuals of AFR ancestry. Overall, out of the 954 AFR pQTLs (Table S14) identified, we found that 561 had been previously reported with a study-wide significant p-value threshold of 5×10− 11. Additionally, among the remaining 393 AFR pQTLs, 14 had been reported with a genome-wide threshold, 45 had passed a nominal threshold, and 242 did not show nominal significance in previous studies. Among the pQTLs that were not tested, 82 were due to missing proxy variants, and 10 were due to missing protein data. Considering the largest number of proteins (~ 5k) profiled from a large-scale European cohort in the study by Ferkingstad et al.,23, we can replicate the highest number of our findings for the EUR pQTLs, with a p-value below 5×10− 2 in this external study. Of the 2,848 EUR pQTLs identified (Table S14), 2,052 had been reported as study-wide significant (p < 5×10− 11), 43 were below a genome-wide threshold, 241 were below a nominal threshold, while 395 were above the nominal threshold. This indicates that 86% of the tested pQTLs have supportive evidence from previous studies. Among the untested pQTLs, 81 were due to missing proxy variants and 36 were due to the missing protein data.
Genetic architecture of the plasma metabolome in participants of African and European ancestry
To detect genetic variants associated with the plasma metabolome in African and European ancestry respectively, we performed mQTL mapping using the same participants from which the proteomic data was generated (Fig. 2C-D). After quality controlling for both genotype and metabolome datasets, we identified 65 significant mQTLs in 34 genetic loci, associated with 60 metabolites (out of a total of 1,413 metabolites tested) in the African-American cohort (Fig. 2C, Table S15). Notably, a significant number of these hits (27 out of 34 loci) were involved in and enriched for the super pathway of amino acids (odds ratio = 2.78, Fisher exact test, p-value = 6.63×10− 5) compared to other pathways. In the European-American cohort, we found 490 significant mQTLs in 124 genetic regions associated with 403 metabolites (Fig. 2D, Table S16). The majority of the hits were observed in two super pathways: lipids (198 metabolites, odds ratio = 1.26, Fisher exact test, p-value = 0.019) and amino acids (106 metabolites, odds ratio = 1.52, Fisher exact test, p-value = 0.0015). In line with a previous cross-platform mQTL study24, we observed both sets of mQTL followed a trend where the absolute effect size negatively correlated with the minor allele frequency across all variants (Figure S4C-D). Additionally, we identified top-ranked LD blocks associated with metabolites were loci near ALMS1P1 (chr2p13.1), UGT1A6 (chr2q37.1), and PYROXD2 (chr10q24.2) in the African cohort (Figure S5C), and FADS1/2 (chr11q12.2), SLCO1B1 (chr12p12.1), ALMS1P1 (chr2p13.1) in the European-American cohort (Figure S5D). These regions represent metabolite-associated pleiotropic regions and genomic hotspots specific to their respective populations.
To assess the presence of previously reported mQTLs, we examined our study-wide mQTLs using three of the up-to-date external studies with full summary statistics available, which included participants from both genetic ancestries (details see Methods). Among these three external mQTL studies (Table S17), Yin et al.,25, and Chen et al.,26 focused on individuals of EUR ancestry alone, while Rhee et al.,27 included participants of AFR ancestry. Of the 65 AFR mQTLs identified (Table S18), we found that 48 had already been reported after multiple testing corrections with a study-wide significance threshold of 5×10− 11. None of these 65 mQTLs were below a genome-wide threshold, one passed a nominal threshold, seven were above the nominal threshold, and nine were not examined as the metabolites were missing. On the other hand, of the 490 EUR mQTLs identified (Table S18), we found 412 passed the study-wide threshold of p-value as 5×10− 11, seven passed a genome-wide threshold, ten passed a nominal threshold, while nine did not reach the nominal threshold. Among the mQTLs that were not tested, two were due to the missing proxy variants and 50 were due to the missing metabolite data. These results indicate that approximately 88% of tested mQTL pairs in the African ancestry cohort and 98% of tested mQTL pairs in the European ancestry cohort have supportive evidence from previous studies.
All four QTL datasets can be interactively explored on the PheWeb28-based ONTIME browser (https://ontime.wustl.edu/).
Ancestry-specific pQTLs and mQTLs
To identify ancestry-specific xQTLs (i.e., pQTL and mQTL), we compared results between participants of African and European ancestry. Briefly, ancestry-specific hits were determined based on fold-change criteria and considering both the effect size and standard error (for deriving the Z-normalized effect size), following the methodology used in previous condition-dependent genetic studies29,30. The fold-change greater than ten-fold or smaller than 10%-fold was used as the threshold, with log10-based fold change boundaries of +/- 1 to determine context-specific QTLs (see Methods).
In the case of proteomics, of the 954 pQTLs identified in AFR participants, 29.6% were considered AFR-specific pQTLs (Fig. 3A, Table S19). For example, in African-ancestry participants, the protein levels of HSF2B (Heat shock factor 2-binding protein) were positively associated with the genetic variant chr5:14626365:T:C (\(Z=6.77\), \(MAF.afr=0.05\)), However, in European-ancestry participants, the HSF2B protein levels were similar across all genotypes of the same variant (\(Z=-0.007\), \(MAF.eur=0.10\)), resulting a large fold-change difference (\(Z.afr/Z.eur ratio =-949\), Fig. 3B). Similarly, among the 2,848 pQTL identified in EUR participants, 24.3% were considered EUR-specific pQTLs (Fig. 3C, Table S20). In the European-ancestry cohort, the protein levels of Apo A-V (Apolipoprotein A-V) were significantly decreased with the minor allele dosages of the variant chr11:116780399:C:T (\(Z=-5.55\), \(MAF.eur=0.20\)) but this association was not observed in African-ancestry participants (\(Z=2.51\), \(MAF.afr=0.26\)), leading to a fold-change ratio of -0.452 (Fig. 3D).
In the field of metabolomics, we compared mQTL results from participants of the two ancestries. Of the 65 mQTLs identified with AFR participants, 20% were classified as AFR-specific mQTLs (Fig. 3E, Table S21). For instance, in AFR participants, the metabolite abundances of the X-23447 were increased with the minor allele dosage of the genetic variant chr5:175129766:C:T (\(Z=6.89\), \(MAF.afr=0.0096\)). However, in EUR participants, this metabolite displayed similar levels across all genotypes of the variant (\(Z=-0.0116\), \(MAF.eur=0.014\)), resulting in a substantial fold-change difference (\(ratio of Z.afr/Z.eur =-594\), Fig. 3F). On the other hand, among the 490 mQTLs identified with EUR participants, 23.7% were considered EUR-specific mQTLs (Fig. 3G, Table S22). In the EUR cohort, the levels of myristoyl dihydrosphingomyelin (d18:0/14:0) were significantly and positively associated with the minor allele dosages of the variant chr20:12978750:T:C (\(Z=6.84\), \(MAF.eur=0.38\)), whereas in the AFR group, this association was not observed (\(Z=-1.98\), \(MAF.afr=0.41\)), leading to a fold-change ratio of -0.289 (Fig. 3H).
To further characterize the ancestry-specific QTLs, we categorized the genetic variants into three bins based on minor allele frequency (MAF). The bins were defined as follows: bin-1, MAF ranging from 0 to 0.01; bin-2, MAF ranging from 0.01 to 0.05; and bin-3, MAF ranging from 0.05 to 0.5. The MAF threshold for each variant was determined by considering the minimum MAF value between the two ancestries. We found that the larger the MAF of a genetic variant, the less likely it was to be specific to a particular ancestry (Figure S6A-G). On average, the proportion of ancestry-specific QTLs decreased from 68.5–45% as the MAF bins shifted from bin-1 to bin-2. Moreover, the ancestry-specific QTLs decreased to 10.8% at bin-3. Power analyses (see Methods) indicated that the sentinel xQTLs used in this ancestry specificity section were well-powered given the current sample size. Even though the variants with lower MAF tend to have a lower power with the same effect size, all the ancestry-specific variants showed > 80% power in the other ancestry.
As a complementary strategy, we employed a more flexible Bayesian approach, multivariate adaptive shrinkage (MASH) framework29, to calculate the posterior probability and posterior mean for each QTL-trait pair in the two ancestries. The posterior mean fold change also indicated a similar ancestry-sharing proportion ranging from 82.3–96.2% (Figure S7A-D), which supports the previous estimations of ancestry-specific QTLs. These results align with previous studies (Zhang et al.,13 for proteomics and Rhee et al.,27 for metabolomics datasets). Zhang et al.,13 reported 10% EUR-specific and 30% AFR-specific cis-pQTLs, while Rhee et al.,27 uncovered 22% ancestry-specific mQTLs. Moreover, our findings extend these previous reports by analyzing different MAF bins, revealing that the ancestry-specific QTLs are more likely to have lower frequencies. This observation could be explained by the fact that functional variants tend to have lower frequencies than non-functional variants, and therefore, these ancestry-specific QTLs may capture those functional variants. Alternatively, participants of African ancestry may have a higher prevalence of rare variants compared to those of European ancestry, which increases the likelihood of finding ancestry-specific associations.
Integration of proteins and metabolites with the risk of ancestry-matched T2D via PWAS and MWAS
Finally, to identify proteins associated with ancestry-specific risk of T2D, we first employed the PWAS framework. Specifically, we prioritized proteins that were associated with ancestry-matched T2D risk within each ancestry group, namely EUR- and AFR-stratified analyses (Fig. 4A-B, Figure S8-9, Table S23-24). For the EUR T2D risk analysis, we used the summary statistics from the GWAS published by Mahajan et al.,5 that included 80,154 cases and 853,816 controls. On the other hand, to investigate AFR T2D risk, we leveraged the study conducted by Vujkovic et al.,6 which enrolled 24,646 cases and 31,446 controls. These two GWAS reported 225 and 23 genome-wide significant hits in EUR and AFR GWAS, respectively.
From the ancestry-stratified PWAS analyses, we found 74 proteins in EUR and eight in AFR associated with T2D after multiple testing corrections (Fig. 4A). In the European ancestry group, the associated proteins included C4b, NFL, BGAT, among others. (Fig. 4A, top). For the African ancestry group, the associated proteins were FAM3D, MGAT3, SPC25, and BGAT (Fig. 4A, bottom). Four proteins (SPC25, BGAT, FAM3D, MGAT3) were found in common in the EUR and AFR-specific analyses.
We applied the same framework to assess the association between metabolite levels and T2D in an ancestry-specific manner (Fig. 4B, Figure S8-9, Table S25-26). We identified 23 EUR and two AFR metabolites given Bonferroni-corrected thresholds (Fig. 4B). Of the 23 significant metabolites from EUR MWAS analysis (Fig. 4B, top), 14 have been studied in differential level analyses concerning T2D, and six of these metabolites were reported to be differentially expressed between T2D cases and controls in a comprehensive external study by Zaghlool et al.,31. Furthermore, two noteworthy AFR metabolites from AFR MWAS analysis (Fig. 4B, bottom) that exhibited significance were GlcNAc sulfate conjugate of C21H34O2 steroid and 1-stearoyl-2-arachidonoyl-GPC. Neither of them had been reported previously in the differentially expression study31. One metabolite, 1-stearoyl-2-arachidonoyl-GPC (18:0/20:4), was in common between EUR and AFR groups.
Integration of proteins and metabolites with the risk of ancestry-matched T2D via genetic colocalization and Mendelian randomization
To minimize the possibility of false-positive findings in XWAS (including PWAS and MWAS), we implemented two supplementary analyses: genetic colocalization (Fig. 4C-D, Table S27-30) and Mendelian randomization (Fig. 4C-D, Table S31-34) between the molecular traits and T2D risk).
In the EUR-stratified analyses (Figure S10A-B), there were 36 proteins and 21 metabolites significant for the XWAS and colocalization analyses; and 13 proteins and six metabolites significant for the XWAS and MR. In the AFR-specific analyses (Figure S10C-D), four proteins and two metabolites were significant on the XWAS and colocalization analyses. Moreover, two proteins were significant in both PWAS and MR.
To emphasize the proteins and metabolites with the highest confidence in terms of their association with the T2D risk, we only highlighted those that are significant after multiple test correction in all three analyses: XWAS, colocalization, and MR. Consequently, we pinpointed five proteins and four metabolites in the EUR-specific (Figure S11A-E, Figure S12A-D) and one protein and one metabolites the AFR-specific analyses (Figure S11F; Figure S12E; Table S35-38). No proteins were in common between the EUR- and AFR-specific analyses (Fig. 4E).
In the AFR-specific analyses (PWAS, colocalization, MR), we nominated a protein called QSOX2 (Fig. 4E, Figure S11F). This protein is regulated by a trans-pQTL under the known risk gene, ABO6 (Table S35). It is worth noting that the variant chr9:133252214:G:A used as an instrumental variable was not in high LD (r2 = 4×10− 4) with the closest pleiotropic variant chr9:133254260:G:A, and hence it was included in the analysis). QSOX2 showed a positive association with T2D risk (PWAS.Z = 4.763 and MR.beta = 0.246). Additionally, the posterior probability of genetic colocalization between QSOX2 and the disease was high (PP.H4 = 0.977). Although QSOX2 has been previously annotated in the process of protein folding32, it has not been published as a known T2D effector.
Of the five EUR proteins associated with EUR T2D5 (Fig. 4E, Table S36, Figure S11A-E), Dtk, C1QT4, and MANBA were encoded by genes that were already known to be implicated in T2D. In this study, these proteins were nominated based on the PWAS, colocalization and MR results. These results were driven by cis-pQTLs to these proteins. These loci were reported as being genome-wide significant and those genes were nominated based on based on variant annotation, genetic colocalization with eQTLs, pcHi-C links, and TWAS significance5,6., Dtk has been reported to be involved in the positive regulation of kinase activity32; C1QT4 plays roles in positive regulation of interleukin-6 and NF-KappB signaling; and MANBA participates in protein modification processes, specifically glycosylation.
However our study also identified two new proteins (TBCE and TPP2) implicated in T2D. TBCE was associated with a cis-pQTL (with chr1:235423298:C:T (negLog10p-value = 12.2), PWAS p-value = 1.46×10− 5, and MR FDR = 7.35×10− 5 on EUR-T2D). This locus passed genome-wide significance in the T2D-GWAS (Figure S11A), but no candidate genes within the locus were identified5,6. TBCE was reported to be part of the causal pathway underlying progressive encephalopathy with distal spinal muscular atrophy33. The other novel protein TPP2 had a trans-pQTL (chr8:9172650:G:A negLog10p-value = 10.6, PWAS p-value = 4.09×10− 6, and MR FDR = 4.23×10− 4 on EUR-T2D). This variant, however, was reported to be within a T2D risk locus passing the genome-wide significance threshold (Figure S11C). The nominated functional genes within this region were ERT1 and MFHAS1 by the TWAS and colocalization results.. TPP2 functions in intracellular amino acid homeostasis32.
Just like the proteomic results, there were no metabolites in common between EUR and AFR-prioritized results based in the triple analyses (MWAS, colocalization, MR). The only AFR-significant metabolite, the GlcNAc sulfate conjugate of C21H34O2 steroid**, had not been reported to be implicated with T2D before (Fig. 4E, Table S37). This metabolite was regulated by the genetic variant near UGT3A1 (mQTL with chr5:35965868:A:C (negLog10p-value = 10.6), MWAS p-value = 7.79×10− 4, and MR FDR = 9.11×10− 4 on AFR-T2D) and it belonged to the partially characterized molecules. Interestingly, a similar metabolite (HMDB0001449) has been reported to be reduced in the blood of human patients with major depression34.
Of the four EUR metabolites associated with EUR T2D5 (Fig. 4E, Table S38), N-acetyl-isoputreanine was found to be associated with genetic variants in the AOC1 gene. This gene has been nominated as a functional gene in two multi-ancestry GWA studies.5,6 However, this is the study linking variants in the AOC1 gene with N-acetyl-isoputreanine as part of the causal pathway for T2D.35 Notably, this metabolite was the only one with a negative association in the EUR-specific T2D analyses. The other three metabolites implicated in T2D from our EUR-stratified analyses: 1-stearoyl-2-linoleoyl-GPE (18:0/18:2)*, 1-oleoyl-2-linoleoyl-GPE (18:1/18:2)*, 1-stearoyl-2-dihomo-linolenoyl-GPE (18:0/20:3n3 or 6)*, have not been reported as effector analyte in previous multi-ancestry T2D GWAS5,6. However, it is worth mentioning that 1-stearoyl-2-linoleoyl-GPE (18:0/18:2)* has been reported as significantly different between T2D cases and healthy controls by Zaghlool et al.,31. Our study serves as an example of integrating both genomics and metabolomics to support that this metabolite is part of the T2D pathway. Phosphatidylethanolamine (PE) metabolites, in general, have been implicated in T2D, non-alcoholic fatty liver disease, Parkinson’s disease, and Alzheimer Disease, based on previous metabolomic studies curated by HMDB.
To extend the XWAS and colocalization findings using an advanced statistical approach, we applied the recently published INTACT framework36 (Table S39-42). In summary, we found ancestry-matched T2D associations with six EUR and two AFR proteins, six EUR and one AFR metabolite (details in Supplementary Notes). These associations contained all the previously mentioned pairs, along with additional pairs related to one EUR protein (MANS4) and one AFR protein (BGAT), as well as two EUR metabolites (5-oxoproline and betaine). Notably, both proteins were reported as effector genes in previous multi-ancestry T2D GWAS5,6, whereas the two metabolites were not previously identified as such.
Moreover, to detect the patterns of the proteins and metabolites implicated here in T2D, we performed a cell-type-specific analysis using stratified-LDSC37. Regardless of ancestry, these proteins and metabolites displayed high expression in T cells and myeloid cells (Figure S13). Finally, to nominate druggable targets for repositioning, we queried the proteins and metabolites against the Drugbank database38. As a result, we found one EUR protein, Dtk, could be targeted by Fostamatinib, an FDA-approved drug used to treat chronic immune thrombocytopenia. On the other hand, the AFR protein, BGAT, exhibited targetability by 13 drugs, though none of them had FDA approval at the time of this study. Furthermore, two EUR metabolites, 5-oxoproline and betaine, could be targeted by pidolic acid and N,N,N-trimethylglycinium, respectively. Notably, pidolic acid has already been approved by the FDA to treat a family history of diabetes.