HCM is a recognized genetic form of heart disease. Many patients have poor long-term outcomes, including heart failure, malignant arrhythmias and sudden cardiac death. Currently there are no treatments that can effectively reverse the disease, including drug therapy, interventional therapy, and surgical treatment. Therefore, there is an urgent need to explore new effective therapeutic strategies and etiological explanations for HCM. Genetic testing is an indispensable part of labor practice, which has diagnostic and predictive value. It also offers hope that scientists will be able to decipher the mechanisms of disease occurrence and identify targets for effective treatments. As well, new sequencing techniques have led to a large number of candidate genes. In this study, we used a global approach to construct a gene co-expression network to predict candidate gene clusters in the pathogenesis of HCM. Furthermore, we hope to predict candidate gene sets as the basis for a given biological process through closed co-expressed gene modules with common functional annotations.
WGCNA is a new statistical method based on gene correlation which can be used to construct gene networks, detect modules, identify central genes and screen candidate genes as biomarkers (Langfelder and Horvath, 2008). In the statistical process, WGCNA focuses on processing a set of gene modules rather than individual genes. This avoids the disadvantages of only processing genes and therefore ignoring molecular transcription networks. A few similar bioinformatic studies have been previously reported. Jing et al. identified specific modules and hub genes related to coronary artery disease by WGCNA (Jing Liu et al., 2016). In 2019, Ran et al. identified biomarkers correlated with hypertrophic cardiomyopathy with co-expression analysis (Ran Chen et al., 2019). Our study differs from previous similar studies mainly in that they used the differentially expressed genes for WGCNA. We did not filter genes by differential expression to avoid failures of choosing soft thresholding power by scale-free topology fit. In this study, we obtained 4000 genes of 105 HCM samples from a NCBI dataset, which yielded 7 modules through the use of the WGCNA method. According to a correlation study by the topological overlap matrix (TOM) plot (Figure 4), each module was shown to be independent of the others. In addition, functional enrichment analysis was performed on the genes in these modules to identify important modules and the genes they contained. By analyzing the functional richness of these seven modules, there are significant differences in their enrichment degrees and terms. By analyzing these data, we found that in both GO enrichment and KEGG pathway analysis, the turquoise module has the highest enrichment. The greatest number of genes (1202 genes) were enriched in the turquoise module. It accounts for 30% of the total number of genes. Therefore, the turquoise module is the most relevant module from the 7 previously identified. Through GO analysis, the genes were mainly concentrated in protein binding, poly(A) RNA binding and translation. Through KEGG analysis, we can find that differentially expressed genes in the turquoise module are highly rich in Ribosome, Proteasome and Oxidative phosphorylation. The WGCNA package function “chooseTopHubInEachModule” was used to identify the modular hub genes with the highest connectivity. The hub genes of each module are RPL35A for module Black, FH for module Blue, PREI3 for module Brown, CREB1 for module Green, LOC641848 for module Pink, MYH7 for module Turquoise and MYL6 for module Yellow. Interestingly, the hub gene of the most important module Turquoise obtained by GO and KEGG analysis was MYH7. It has been repeatedly reported as one of the prevalent pathogenic mutation genes for HCM (Richard et al., 2003; Andersen et al., 2009). Compared to normal people, the above 7 hub genes were further differentially expressed. We found that FH and MYH7 were highly expressed in the HCM group compared to the normal group. These results suggested that these two genes may play an important role in the occurrence and development of HCM. Therefore MYH7 (Figure 7A) and FH (Figure 7B) were regarded as true hub genes. Moreover, ROC curves revealed their high diagnostic value as biomarkers for HCM (Figure 7C; MYH7 AUC: 0.762, FH AUC: 0.612).
MYH7 was a gene encoding myosin heavy chain beta (MHC-β). It was mainly expressed in the heart and is also expressed in skeletal muscles (Quiat et al., 2011). Multiple previous independent studies have demonstrated that pathogenic mutations in the β-myosin heavy chain (MYH7) gene caused HCM (Geisterfer-Lowrance et al., 1990; Tanigawa et al., 1990). In addition, mutations in the MYH7 gene are very common in HCM, and can be seen in 25% to 40% of patients (Van Driest et al., 2005). The hub genes in the important modules calculated by WGCNA in this study are consistent with the results of many of the above studies.
The fumarate hydratase (FH) gene is localized to the chromosomal position 1q42.3-q43. In normal cells, FH is located in both mitochondria and cytosol and catalyzes fumarate to malate (Jardim-Messeder et al., 2017). Fumarate is a covalent oncometabolite. Its accumulation is characteristic of hereditary leiomyomatosis of genetic cancer syndrome. The mutation of the fumarate hydratase (FH) gene may cause the affected cells to transition to aerobic glycolysis (Warburg effect) (King et al., 2006). It has been found that mutations in fumaric acid can cause several fumarase-related diseases in humans, such as benign mesenchymal tumors of the uterus, leiomyomatosis and so on. In the results of this study, the FH gene may also be a key gene in the occurrence and development of HCM. At present there is little research on their correlation. In the future, exploration of the FH gene may be a salient new direction for further research on HCM.
Therefore, in order to determine the potential molecular function of these two important hub genes, we continued to use GSEA to search for KEGG pathways that are rich in high-expression samples. As shown in Figure 8A and 8B, genes in high expression groups of MYH7 and FH were enriched (p<0.05) in “Proteasome” and “PPAR signaling” pathways, respectively.
The proteasome is the main multicatalytic protease complex. It is involved in the degradation of most intracellular proteins. In addition, muscle fibers and sarcoma proteins have been shown to be primarily degraded by proteasome. Some scholars have pointed out that proteasome dysfunction is related to human HCM (Predmore JM et al., 2010). As well, a previous study suggested that the protease inhibitor ps-519 may cause a significant regression of cardiac hypertrophy.
Peroxisome proliferator-activated receptors (PPAR) are expressed in many tissues, such as skeletal and cardiac muscles, fat cells, liver cells, etc. Different subtypes of PPAR have different tissue distribution and expression profiles, leading to different clinical outcomes. In particular, the subtype of PPARα is highly expressed in tissues with high fatty acid oxidation capacity, such as liver, heart, and skeletal muscle. In addition, activation of the receptor for another subtype, PPARβ/δ has been shown to protect the myocardium from ischemia-reperfusion injury typical of diabetic cardiomyopathy. Although there is no research on the direct relationship between the PPAR pathway and HCM, this pathway may also be a direction for further researching HCM and finding therapeutic approaches.