2.1 Materials
The datasets of GSE36961 contributed by Hebl VB et al. and GSE89714 contributed by Li Y et al. were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), and the expression profiling was generated using GPL15389 (Illumina HumanHT-12 V3.0 expression beadchip) and GPL11154 (IlluminaHiSeq 2000). The GSE36961 dataset contains a total of 39 control samples and 106 case samples, and the other dataset – GSE89714 included 5 case samples and 4 control samples. The case samples were of HOCM patients who underwent surgical septal myectomy because of LVOTO, and the control samples were donor cardiac tissues. Statistical analysis with the R (version 3.6.0) was performed.
Meanwhile, the genes associated with HCM were obtained from the GENE database of NCBI (https://www.ncbi.nlm.nih.gov/gene/) and Online Mendelian Inheritance in Man database (OMIM, https://omim.org/ ).
2.2 Data procession
The datasets of R language including the matching probes with genes were processed, and the median value of the expression level of the gene was selected when multiple probes were applied to the same gene. The differentially expressed genes (DEGs) were then obtained by using the R software package “limma”, according to log2|fold change| (|log2FC|>0.58) and adjusted the p-value (adj.P.Val < 0.05).[15, 16]
2.3 Identification of candidate gene set and function analysis
The DEGs from the two datasets were summed up, and then the HCM-associated genes from GENE and OMIM databases were obtained. Next, the redundancy was removed to obtain candidate gene set, and the expression profile data of candidate gene set in GSE36961 were selected for the weighted co-expression network analysis (WGCNA). The “ClusterProfiler” software package from R language was used to perform enrichment analysis of genes of candidate gene set including KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analysis and gene ontology (GO) enrichment analysis.
2.4 WGCNA
The “WGCNA” software package from R language was used to construct the weighted co-expression network for the candidate gene set. After screening HOCM-related modules, the “ClusterProfiler” software package from R language was used for enrichment analysis of genes from modules. Based on GO analysis and KEGG analysis, the significant GO terms and pathways were identified (adj P-Val < 0.05). Meanwhile, the co-expressed gene set within the module was constructed into an interaction subnetwork, and the core genes of the module were screened according to the node degree of the network using the Cytoscape3.7.2 to visualize the network.
2.5 Construction of the multi-factor regulatory network
To screen the TF gene, the lncRNA, and the miRNA that interact with the core genes, the interaction pairs between ncRNAs and their target genes from the starBase v2.0 (http://starbase.sysu.edu.cn/) database (including miRNA-lncRNA and miRNA-mRNA)and interaction pairs between TFs and their target genes (TF-mRNA) from the TRRUST v2 database (https://www.grnpedia.org/trrust/) were downloaded [17, 18].
MiRNA-lncRNA pairs should meet the number of supporting experiments that are greater or equal to high stringency (≥ 3), and the miRNA-mRNA interaction pairs should also be identified by at least one of the softwares that consisted of targetScan, picTar, RNA22, PITA and miHCMnda/mirSVR. Finally, interaction pairs of core genes containing TF-mRNA, miRNA-lncRNA and miRNA-mRNA were obtained to construct a multi-factor regulatory network. By calculating the degree of each regulator (miRNA, lncRNA, TF), the regulators that act as core regulators (degree > 5) were screened.
2.6 Clusters of HOCM
The key genes that co-expressed to GSE36961 for unsupervised clustering were mapped, and then the HOCM samples (N = 106) were selected to classify the HOCM samples by K-means unsupervised clustering method combined with “tsne” dimension reduction. Firstly, we chose the optimal K value (number of categories) by finding the inflection point of SSE (sum of the squared error, i.e., the sum of the squares of the distances of all points to the center of the cluster to which they belong) to divide HOCM samples into clusters. After that, the expression patterns of these core genes in different subtypes were examined and the genes with significant differences in different clusters were analyzed, which might act as potential marker genes in HOCM subtypes.