BMD and LM are complex traits influenced by multiple genetic factors. Although previous studies have identified various genes associated with each of the two traits, they offer limited mechanistic clues. Genetic variants are known to influence traits by regulating gene expression levels. We therefore conducted this tissue-specific transcriptome-wide association study (TWAS) to investigate expression-trait associations for BMD and LM. Databases with both expression and genotype measurements in skeletal muscle, peripheral blood and whole blood were used as reference panels for expression imputation.
IGHMBP2 (immunoglobulin mu DNA binding protein 2) is one of the genes significantly associated with LM across MS, YBL and NBL. It is located at 11q13.2 and encodes a helicase that binds a specific DNA sequence in the immunoglobulin mu chain switch region. Mutations in this gene are thought to be involved in spinal muscular atrophy with respiratory distress type 1 (SMARD1), which causes muscle weakness and respiratory failure, typically beginning in infancy [24]. Deficiency of the IGHMBP2 protein can lead to degeneration of muscle cell nuclei [25]. However, how this gene affects the LM phenotype remains unclear and needs further study.
MTHFR is another notable gene associated with LM in MS and YBL. The protein encoded by this gene is an enzyme in the methyl cycle that catalyzes the conversion of 5,10-methylenetetrahydrofolate to 5-methyltetrahydrofolate. In a previous study by Liu [26], MTHFR gene polymorphism was found to be associated with body lean mass but not with body fat mass, which is consistent with our results.
CRIPAK was significantly associated with BMD in all three tissues. It modulates Pak1-mediated estrogen receptor transactivation by negatively regulating Pak1 [27]. Estrogen stimulation of cells can enhance CRIPAK expression and promote its co-localization with the estrogen receptor in the nuclear compartment [27]. Estrogen plays an important role in the growth and maturation of bone. At the cellular level in bone, estrogen inhibits the differentiation of osteoclasts, thereby decreasing their number and reducing the number of active remodeling units [28]. CRIPAK might therefore influence BMD through an estrogen-related pathway.
MSH3 was also found to be significantly associated with BMD in MS and YBL. The protein encoded by this gene binds to MSH2 to form MutS beta, a component of the post-replicative DNA mismatch repair system. Previous studies found that some SNPs in this gene (such as rs2035256 and rs33013) were significantly associated with BMD and had strong cis-effects on gene expression in human primary osteoblasts [29, 30]. However, because its function has not been studied further, how it influences the BMD phenotype remains unknown.
The enrichment analysis showed that the genes significantly associated with BMD were enriched in hormone-related categories, such as steroid hormone mediated signaling pathway (GO: 0043401), regulation of growth hormone secretion (GO: 0060123) and cellular response to glucocorticoid stimulus (GO: 0071385). Steroid hormones are hormones derived from steroid compounds and are secreted mainly by the adrenal cortex, testes and ovaries. They can be divided into five types according to their receptors: glucocorticoids, mineralocorticoids, androgens, estrogens and progestogens. Estrogens are known to be powerful regulators of bone metabolism. The loss of ovarian estrogen after menopause is typically accompanied by a decline in bone mineral density, which contributes to the higher rate of osteoporosis in women than in men. Glucocorticoids are associated with decreased bone mineral density and impaired bone microarchitecture parameters [31]. Growth hormone is a peptide hormone that stimulates cell reproduction and regeneration in humans and plays an important role in human development. These findings appear consistent with previous knowledge of BMD.
For LM, the associated genes appear more likely to be enriched in categories related to the metabolism of several substances, including heterocycle catabolic process (GO: 0046700), carbohydrate metabolism (R-HSA-5663084), regulation of lipid localization (GO: 1905952), tetrapyrrole metabolism (GO: 0033013), purine metabolism (hsa00230), lipid storage (GO: 0019915) and monocarboxylic acid metabolic process (GO: 0032787). These substances are all important to the body. Carbohydrates and lipids are two essential classes of macronutrients. A heterocycle is a cyclic compound containing atoms of at least two different elements in its ring; purines are one example and are basic components of nucleic acids. Tetrapyrroles are a group of compounds containing four pyrrole or pyrrole-like rings and form the core of some compounds with crucial biochemical roles in living systems, such as hemoglobin. Monocarboxylic acids are essential resources for amino acid synthesis. Compared with BMD, LM appears more likely to be regulated by biochemical metabolic processes.
The TWAS approach developed by Gusev et al. was adopted in this study [19]. Based on GWAS summary data, it can identify genes whose expression is significantly associated with complex traits without directly measuring expression levels in the individuals studied [19]. This approach has several potential advantages. First, expression-trait associations provide more interpretable clues for genetic studies, whereas GWAS often yields associated loci containing multiple significant SNPs in LD, which may not lie within genes. Moreover, unlike analyses that focus on the overlap between eQTLs and trait-associated SNPs, TWAS combines the full set of cis-SNP signals, regardless of whether they are individually significant, to impute expression. Finally, because it considers only the genetically predicted component of expression, it can avoid confounding from environmental differences caused by the traits.
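The summary-based association step can be illustrated with a minimal sketch. For a single gene, the TWAS statistic is commonly written as Z = w'z / sqrt(w'Σw), where w are the cis-SNP expression weights learned from the reference panel, z are the GWAS z-scores of the same SNPs, and Σ is their LD (correlation) matrix. The function names and the toy numbers below are hypothetical and serve only as an illustration; they are not the pipeline or the data used in this study.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-based TWAS z-score for one gene (illustrative sketch).

    weights: expression weights of the cis-SNPs (from a reference panel)
    gwas_z:  GWAS z-scores for the same SNPs, in the same order
    ld:      SNP-SNP correlation (LD) matrix as a list of lists
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching dimensions")
    # Numerator: weighted sum of GWAS z-scores (w'z)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Variance of the weighted sum under the null: w' * LD * w
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w' * LD * w must be positive")
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Toy example with made-up values for three cis-SNPs
toy_weights = [0.5, -0.2, 0.1]
toy_gwas_z = [2.0, -1.0, 0.5]
toy_ld = [
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
toy_z = twas_z(toy_weights, toy_gwas_z, toy_ld)
toy_p = two_sided_p(toy_z)
```

Because every cis-SNP contributes to the numerator in proportion to its weight, SNPs that are not individually significant in the GWAS still inform the gene-level statistic.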
This study also has some limitations. First, because gene expression is imputed from the reference panels, the results may be influenced by the quality and sample size of the reference data. Larger sample sizes and datasets from more tissues could mitigate this effect. Second, the number of genes that can be imputed also depends on the training data and is therefore limited to some degree. Finally, this approach considers only expression-mediated regulation, although SNPs can influence traits in many other ways.