Genetic and trait-based characteristics of 1,999 samples
A total of 1,999 subjects with 29 biochemical indices that passed the QC call rate of 95% were analyzed, and a total of 709,211 SNPs in these subjects were subjected to the subsequent genetic analysis. The average GWAS inflation factor for all 29 biochemical indices was 1.029 (range: 0.975-1.060), suggesting that the stratification correlation worked well (Table S1). The heatmaps based on the Pearson correlation coefficient showed that 106 correlated pairs were found among these 29 traits (correlation coefficient was over 0.3 or less than -0.3 and the P value was less than 0.01) (Figure 1). In addition, cluster analysis with the hclust package in R package could be classified these 29 biochemical indices into 2 groups, one group including blood urea nitrogen (BUN), Cholesterol, Glucose, testosterone (TE), follicle-stimulating hormone (FSH), Insulin, immunoglobulin G (IgG), homocysteine (HCY), folate (FOL), alpha-fetoprotein (AFP), immunoglobulin A (IgA), low-density lipoprotein cholesterol (LDL-C), immunoglobulin M (IgM), C3, how-density lipoprotein cholesterol (HDL), Triglyce, C-reactive protein (CRP). The other group including vitamin B12 (B12), Ferritin (FRRR), Uricacid, immunoglobulin E (IgE), anti-streptococcus hemolysin "O" (ASO), Creatinine, osteocalcin (OSTEOC), Estradiol, sex hormone binding globulin (SHBG), alanine transaminase (ALT) (Figure S1). All of each group contained common lipid metabolism indices, suggesting that these traits were correlated with lipid metabolism.
Correlation analysis based on network medicine
For each trait, we used a linear mixed model estimate fixed value, adjusted with PC1 and PC2 of population stratification and age, respectively, to perform GWAS. A total of 86,556 SNPs (p-value<1×10-3) associated with all 29 biochemical indices were obtained and then annotated using the SNP Function database with default parameters and the south Asian population option[18]. A total of 12,521 genes were obtained, and protein-protein interactions were determined using the BioGRID database [19]. A total of 5,313 genes with known proteins were obtained, and the interactional network was built with Cytoscape 3[20]. The topological coefficient, clustering coefficient and degree distribution were important indices to evaluate network nodes. Details of these three factors for 5,313 genes are shown in Figure S2 (A, B, C, D).
The Jaccard correlation matrix heatmaps showed that there were 63 correlated pairs among 435 pairwise combinations among these 29 traits indices with MCI were over 0.6 (Figure 2). In these pairs, HCY, IgG, SHBG, B12, IgA and C4 were closely related with more than other six traits. However, in view of the information regarding gene/protein interactions in public databases is limited, interaction information for most of the genes/proteins in this study could not be obtained, and the Jaccard index was computed based on a small number of genes/proteins.
Correlation analysis based on linkage disequilibrium score regression (LDSC)
Genetics can help to elucidate cause and effect. However, single variants tend to have minor effects, and reverse causation involves an even smaller list of confounding factors. Therefore, interrogating genetic overlap via GWAS that focuses on genome-wide significant SNPs is predicted to be an effective means of mining the correlation between different phenotypes. The GWAS effect size estimate for a given SNP will capture information about SNPs near the linkage disequilibrium [21]. The correlations based on GWAS of the 29 quantitative clinical traits were estimated using cross-trait LDSC. The genetic correlation estimates for all 435 pairwise combinations among these 29 traits. After removing the outlier values, 68 significant correlated pairs (p<0.05) were found (Figure 3). The details for these 68 selected pairs of traits are shown in Table S2.
Integration and interpretation of important pairs identified by these three methods
To identify the correlation pairs among these three methods, we integrated the correlated traits fitting at least one of the following: Pearson coefficient was greater than 0.3 or less than -0.3 and P value less than 0.01, Jaccard coefficient was greater than 0.6, or P value of LDSC was less than 0.05. Totally 208 correlated pairs among biochemical indices were found, among them 106, 63, 68 correlated pairs were found by Pearson coefficient , Jaccard coefficient , LDSC, respectively. Only 1 correlated pairs was found by all three methods. 10 correlated pairs both by Pearson coefficient and LDSC, 10 correlated pairs were found both by Pearson coefficient and LDSC, 15 by Pearson and Jaccard coefficient, and 5 by Jaccard coefficient and LDSC. (Figure S3, A). The related traits were integrated if they fulfilled the following conditions: the Pearson coefficient was greater than 0.3 and p-value less than 0.01, the Jaccard coefficient was greater than 0.6, or the LDSC p value was less than 0.05. Six traits (IgA, IgG, HCY, AFP, IgE and B12) were the first top factors in the network of these 29, and related to more than 20 traits. Additionally, IgM, CRP, C4, BUN, TG, Creatinine and FSH were the second top factors and connected with more than 15-20 traits, and OSTEOC, Estradiol, Glucose, FOL, TE, SHBG, FERR, BMI, ALT and HDL were the third top traits which correlated with more than 10 traits (Figure S3, B).
Genes and SNPs those are potentially important across multiple traits
We selected SNPs with P<10-3 for each trait, resulting in a total of 60,644 SNPs for all 27 traits. The essential genes have a tendency to express in multiple tissues, and are topologically and functionally central [12]. After integrating all 5,313 genes and removing the free notes in the total network among 29 biochemical indices, 427 genes (with P<10-3 at least one SNP) were correlated with more than 5 traits. After filtering the genes with SNPs (P<10-4), there were 71 genes correlated with more than or equal to 3 traits, especially aldehyde dehydrogenase 2 family member (ALDH2), BRCA1 associated protein (BRAP), cadherin 13(CDH13) and CUB and Sushi multiple domains 1 (CSMD1) which was related with more than 5 traits. In these 71 genes, 38 genes were found to connect more than 5 other genes in the interactional network annotated from BioGRID database [19] (Table S3). This showed that essential genes related with multiple traits were sure to locate in the central gene interactional network.
Among all the genome wide variations SNPs, 481 (P<1✕10-3) were associated with three or more clinical biochemical quantitative traits, and 13 of these 481 SNPs were related with more than 5 traits. In these SNPs, rs12229654 (near cut like homeobox 2 (CUX2)), rs2188380 (locates in CUX2), rs3809297 (locates in CUX2) and rs3782886 (locates in BRAP) was related with more than 10 traits. Six SNPs in CUX2 were correlated with more than 5 traits, which denote that CUX2 should play an important role on this net. In addition, for all the SNPs with P<1✕10-4, 29 SNPs were related with three or more biochemical indices (Figure 4). After annotated 29 SNPs with P<1✕10-4 using the HaploReg database[22], we found that almost all these SNPs were related to enhancer histone binding, promoter DNase binding and transcript binding, which affected protein binding or the presence of eQTL (Table S4).
After integrating the SNPs associated with more than 2 traits(P<1✕10-4)with GWAS catalog[23], we found that 31 SNPs in 18 genes were found in GWAS catalog (Table S5). Among those SNPs, five SNPs (rs579459, rs649129, rs507666, rs495828, and rs651007) in ABO were associated with more than 10 quantitative traits and diseases. One SNP (rs671) in ALDH2 was related to 21 traits, six SNPs (rs10519302, rs16964211, rs2305707, rs2414095, rs6493487 and rs727479) in or near CYP19A1 were associated with mainly with hormone measurements. This finding supports the idea that shared genetics on traits can produce correlations among these traits.
The rs671 polymorphism in ALDH2 affects osteogenic and adipogenic differentiation of 3T3-L1 preadipocytes
An interaction between SNP (rs671) in ALDH2 was related to 13 traits were found in this study. The relations between rs671 and lipid metabolism or osteocalcin have been found in some literatures [24, 25], however, their function need to invest. Rs671 is a nonsynonymous (ns) SNP (G504L) in the ALDH2 gene, which is located on chromosome 12. To evaluate the effects of the rs671 polymorphism on osteogenic and adipogenic differentiation of 3T3-L1 preadipocytes, a lentivirus vector was used to overexpress ALDH2-WT or ALDH2-G504L-mut in 3T3-L1 preadipocytes (Figure S4). The cell growth curve of ALDH2-G504L-mut showed no obvious change compared with that of the control, but expression of ALDH2-WT induced a significant increase in cell proliferation (Figure 5A). The cell apoptosis results were consistent with this finding; overexpression of ALDH2-WT resulted in a 3.935-fold decrease in late apoptotic cells in comparison to that of ALDH2-G504L-mut or control cells (Figure 5B, C). We next investigated the impact of the ALDH2 G504L mutation on osteogenic and adipogenic differentiation of 3T3-L1 preadipocytes. At 7 days after osteoblast induction, cells were subjected to Alizarin red-S staining. ALDH2-WT cells showed more mineralized nodules than the control cells or those expressing ALDH2-G504L-mut (Figure 5D, E). In addition, mRNA expression of osteoblast-related genes, such as alkaline phosphatase (AKP), osteocalcin, RUNX family transcription factor 2 (Runx2), and collagen type I (Col1), was significantly higher in ALDH2-WT cells than in ALDH2-G504L-mut or control cells (Figure 5F). After 7 days of adipogenic induction, the ALDH2-WT cells displayed accumulation of lipid vacuoles, as detected by oil red O staining, compared with ALDH2-G504L-mut or control cells (Figure 5G, H). The expression levels of adipogenesis-related proteins, such as adiponectin, C/EBPα (CCAAT/enhancer binding protein α), C/EBPβ, adipocyte fatty acid-binding protein (Fabp4), and Pparγ (peroxisome proliferator-activated receptor), were much higher in ALDH2-WT cells than in ALDH2-G504L-mut or control cells (Figure 5I). Taken together, these results suggest that ALDH2-G504L-mut affected the osteogenic and adipogenic differentiation of 3T3-L1 preadipocytes.