Co-expression network analysis identified novel potential signature genes associated with human left ventricle cardiomyopathies arising from different etiologies

Background: Heart disease is a global pandemic and imposes a huge healthcare burden on society. However, the overall pattern of transcriptional dysregulation in cardiomyopathies arising from different etiologies remains elusive. Weighted Gene Co-Expression Network Analysis (WGCNA) was applied to construct co-expression networks and to screen for functional genes significantly related to the pathological features of different cardiomyopathies. Through co-expression and protein-protein interaction (PPI) network enrichment analysis, hub genes and key pathways correlated with cardiomyopathy traits were screened. To discover novel disease signature genes, the cardiovascular disease bioportal database, which contains independent investigations of clinical cardiomyopathy cases, was searched for validation. Results: The potential disease signature genes were identified and sorted into three common axes shared among five subtypes of cardiomyopathy. Four genes (MDM4, CFLAR, RPS6KB1, PKD1L2) were shared by the ischemic and ischemic cardiomyopathy groups. The second axis contained eight signature genes (MAPK1, MAPK11, MAPK14, LMNA, RAC1, PECAM1, XIAP, CREB1) and was shared by Ischemic Cardiomyopathy, Post-Partum Cardiomyopathy, Familial Cardiomyopathy and Idiopathic Cardiomyopathy. The third axis consisted of two common signature genes (TFAM, RHEB) shared among the subgroups of Viral Cardiomyopathy, Post-Partum Cardiomyopathy, Familial Cardiomyopathy and Idiopathic Cardiomyopathy. The majority of dysregulated functions and pathways were enriched in metabolic processes and in the pathways of MAPK signaling, protein processing in endoplasmic reticulum, and regulation of actin cytoskeleton. Conclusion: These results strongly suggest that dysregulated expression of the signature genes contributes to cardiac dysregulation and functional decline into cardiomyopathy. Taken together, these novel signature genes could be used as potential diagnostic biomarkers or therapeutic targets, which would benefit precise clinical diagnostics of cardiomyopathy and improve outcomes. In summary, this study should be of great interest to clinical research scientists.


Background
Cardiomyopathy is a pathological syndrome characterized by the inability of the heart to pump and/or fill with enough blood to meet the body's needs; in its worst state it progresses to heart failure. Heart failure is a complex pathophysiological condition involving dysfunction of left ventricle myocytes. It is caused by many physiological or pathological processes, such as ischemia, hypertension, pregnancy and diabetes mellitus. As a global pandemic, heart failure affects at least 26 million people and consumes over $30 billion in health expenditure worldwide [1]. Furthermore, the 5-year mortality of heart failure patients is higher than 50% [2]. Many historical binary classification systems, such as those using the term idiopathic, offered simplicity and clarity, but they may oversimplify a complex biological phenomenon [3]. As a group of heterogeneous, complex cardiovascular diseases, cardiomyopathies are typically characterized by primary abnormalities in the physiological structure and function of the heart. According to morphological features and etiology, cardiomyopathies are commonly grouped into several subtypes, including hypertrophic cardiomyopathy (HCM), dilated cardiomyopathy (DCM), viral cardiomyopathy (VCM), familial cardiomyopathy (FCM), post-partum cardiomyopathy (PCM) and ischemic cardiomyopathy (ISCM) [4]. This newer classification scheme is potentially useful for relating enigmatic etiologies to dysfunctional cardiovascular disease and for promoting a better understanding of the disease.
Early clinical investigations demonstrated that some subtypes of cardiomyopathy arise in patients who show a dysregulated gene expression profile despite an initially normal physiological condition and somatic genetic background. This strongly suggests that different etiologies play a critical role in epigenetic and transcriptional change [5]. Recently, to improve patient treatment and healthcare management, there has been a trend toward discovering clinically applicable disease signature genes or biomarkers for precise diagnostics by analyzing the genetic disorders and expression profiles of heart failure [6,7].
Among computational methodologies, Weighted Gene Co-Expression Network Analysis (WGCNA) is considered one of the most useful approaches for discovering functionally related gene co-expression networks from gene expression profiles [8]. Furthermore, WGCNA has been widely applied to screen novel biomarkers or therapeutic targets for the early diagnosis and treatment of cancers such as hepatocellular carcinoma [9] and lung cancer [10].
In this study, WGCNA was employed to analyze the gene expression profiles of several subtypes of cardiomyopathy and to discover highly connected modules whose gene significance is associated with heart failure. Novel signature genes and key biological pathways were identified and validated through the cardiovascular disease bioportal, and were listed as potential novel drug targets for cardiomyopathies. These results will help to better understand the pathogenesis, progression and prognosis of cardiomyopathies, and will benefit precise diagnostics and treatment for patients.
Enriched Genes Significance related to different cardiomyopathies
Module membership (MM) and Genes Significance (GS) correlations were compared among all significant modules, and the module with the most significant value was defined as the best candidate for correlation analysis with each pathological trait (Table.1, Fig.3B). These candidates were listed as turquoise module (cor = 0.77, p < 1.  Fig.S3G), respectively. In addition, MM was plotted against GS in a scatter plot for each significant module, in which each point represents one gene contained in the module.

Hierarchical Clustering of Eigengene Profiles with cardiomyopathies Traits
Based on the module eigengene (ME) values, hierarchical clustering was performed between all modules and the different cardiomyopathy traits to identify their relationships. Furthermore, eigengene dendrogram analysis was performed to relate each candidate module to the features of the different cardiomyopathy subtypes (Fig.S4A-G). In the idiopathic dilated group, the turquoise module clustered tightly with the idiopathic dilated trait (Fig. S4A). In the ischemic group, the greenyellow and lightyellow modules formed the branch closest to the ischemic trait (Fig.S4B). In the previous analysis step, the greenyellow module (t-value = 0.41, p-value = 8e−05, GS = 0.2418) showed the second-highest correlation with ischemic status (Fig.2B, Fig.3B). This suggests that genes in the greenyellow module are involved in the progression of ischemia. In the idiopathic cardiomyopathy group, the brown, magenta and purple modules clustered with idiopathic cardiomyopathy in a separate branch, and the magenta and purple modules fell in the same cluster (Fig.S4C). This suggests that the purple and magenta modules are the two most significantly associated with cardiomyopathy status (Fig.2B, Table.1). In the familial cardiomyopathy and post-partum cardiomyopathy groups, the brown, magenta and purple modules clustered tightly with familial cardiomyopathy, while the magenta module was the most significantly associated with disease status (Fig.S4D-4E). In the hypertrophic cardiomyopathy group, no module was associated with the pathological feature. In the ischemic cardiomyopathy group, the black and midnightblue modules clustered in the closest branch, while the yellow module fell in the adjacent branch (Fig.S4F).
Combined with the module-trait correlation and gene significance results, this suggested that the black, midnightblue and yellow modules were significantly associated with ischemic cardiomyopathy (Fig.2B, Fig. 3B, Table. 1). The brown and blue modules were excluded from further analysis because of their higher negative GS values (Table.1, Fig. 2B). The yellow module showed the most significant correlation with ischemic cardiomyopathy. In the viral cardiomyopathy group, the magenta, purple and brown modules clustered with viral cardiomyopathy in a separate branch, and the cyan module had the highest GS value associated with the pathological feature (t-value = 0.35, p-value = 1e−03, GS = 0.2245) (Fig.2B, Fig. 3B, Supporting Fig.4G). This suggested that genes in the magenta module are involved in the progression of viral cardiomyopathy.
Fig.S5F-A). The biological processes were mainly concentrated in the subgroups of cellular macromolecular metabolic process, protein metabolic process, organic substance metabolic process and macromolecule modification. The genes significantly enriched in molecular functions are summarized in Table.S4. The molecular functions were linked to endoplasmic reticulum functions, cell functions (migration, death, growth, division), DNA binding, protein-protein interaction, kinase activity and signal transduction, iron binding and nucleotide binding, etc. The cellular components were mainly enriched in membrane-bounded organelle, intracellular organelle part, etc. The pathways were concentrated in the mitogen-activated protein kinase (MAPK) signaling pathway, protein processing in endoplasmic reticulum, regulation of actin cytoskeleton, etc. These results suggested that dysregulation of cardiac functions is associated with abnormal metabolism and accelerated progression of cardiomyopathies.
Furthermore, through literature mining of gene networks and clustering analysis, these significant genes were clustered and labelled according to cellular functions and the corresponding pathological feature keywords in the literature (Ischemic, Fig Fig.4F). The real hub genes were determined as described in the Methods section (Table.3), and the numbers of real hub genes are listed (Fig. 4 A-F). The Idiopathic Dilated group was excluded from further analysis because no real hub gene was identified.
Through Venn diagram analysis, four common axes of hub genes were discovered among these cardiomyopathy groups (Fig.4G). The first axis was PICALM, which was shared by the Ischemic Cardiomyopathy, Idiopathic Cardiomyopathy and Post-Partum Cardiomyopathy groups (Fig.4G) and was significantly up-regulated in the Idiopathic Cardiomyopathy and Ischemic Cardiomyopathy groups (Table.3).
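The overlap analysis behind a Venn diagram can be sketched with plain set operations. The following is a minimal illustration, not the authors' pipeline: the function name `shared_axes` is hypothetical, and the input sets contain only the shared hub genes named in this section (group-specific hub genes are omitted), so the group contents are a reconstruction for demonstration rather than the full hub gene lists.

```python
from collections import defaultdict


def shared_axes(hubs_by_group):
    """Group genes by the exact set of disease groups that contain them.

    Returns {frozenset(groups): {genes}} for genes found in two or more groups.
    """
    membership = defaultdict(set)
    for group, genes in hubs_by_group.items():
        for gene in genes:
            membership[gene].add(group)

    axes = defaultdict(set)
    for gene, groups in membership.items():
        if len(groups) >= 2:  # keep only genes shared between groups
            axes[frozenset(groups)].add(gene)
    return dict(axes)


# Shared hub genes as named in the text; abbreviations follow the paper
# (IsCM, IdCM, PCM, FCM). Group-specific genes are intentionally left out.
axis3 = {"CREB1", "DBT", "NCOA2", "NUDT21", "PIK3C2A"}
hubs_by_group = {
    "IsCM": {"PICALM"},
    "IdCM": {"PICALM", "PRKACB", "MOB1A", "CDC40", "HNRNPC", "UEVLD"} | axis3,
    "PCM": {"PICALM", "PRKACB", "MOB1A", "CDC40"} | axis3,
    "FCM": {"HNRNPC", "UEVLD"} | axis3,
}

axes = shared_axes(hubs_by_group)
for groups, genes in axes.items():
    print(sorted(groups), "->", sorted(genes))
```

Grouping by the exact set of groups, rather than by pairwise intersections, keeps each gene in a single axis, which matches how the axes are described in the text.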
PICALM is a key regulator of iron homeostasis and clathrin-mediated endocytosis [11,12]. Overexpression of PICALM impaired endocytosis of the transferrin (Tf) receptor (TfR) and the epidermal growth factor receptor (EGFR) and disturbed iron homeostasis [12,13]. To date, the exact role and dysregulation mechanism of PICALM in cardiomyopathies remain elusive. These findings suggest that PICALM could serve as a potential novel biomarker and therapeutic target for these subtypes of cardiomyopathy. The second axis, containing the genes PRKACB, MOB1A and CDC40, was shared by the Post-Partum Cardiomyopathy and Idiopathic Cardiomyopathy groups. In addition, these genes (PRKACB, MOB1A, CDC40) were significantly overexpressed in the Idiopathic cardiomyopathy group, and MOB1A was up-regulated in the Post-Partum cardiomyopathy group (Table.3). These genes are linked to cAMP (cyclic AMP)-dependent protein kinase A (PKA), which mediates excitation-contraction coupling in cardiomyocytes [14]; they regulate microtubule stability, the cell cycle, and cell proliferation and migration, and restrain cardiomyocyte proliferation and size via the Hippo pathway [15,16]. Abnormal overexpression of PRKACB (protein kinase cAMP-activated catalytic subunit β gene) has been linked to congenital heart defects [17]. MOB1A (MOB kinase activator 1A) is required for cytokinesis through its regulation of microtubule stability. It acts as a binding partner and co-activator of Ndr family protein kinases and mediates phosphorecognition in the core Hippo pathway, which restrains cardiomyocyte proliferation during development to control cardiomyocyte size [15,16]. Overexpression of MOB1A causes centrosomes to fail to split and dysregulates cell size [18]. CDC40 (Cell Division Cycle 40), a splicing factor, regulates the cell cycle and cell proliferation and migration [19]. Overexpression of CDC40 causes abnormal cell proliferation and migration and is linked with carcinogenesis [20].
The third axis consisted of five genes (CREB1, DBT, NCOA2, NUDT21, PIK3C2A) shared among the three groups of Familial, Idiopathic and Post-Partum Cardiomyopathy (Fig.4G). CREB1 (cAMP-responsive element-binding protein) has been identified as a transcription factor that mediates cAMP stimulation by multiple extracellular signals, such as growth factors and hormones. CREB1 is a key regulator in the heart and is linked with heart disease via dysregulation of the cAMP-PKA pathway [21,22]. DBT (dihydrolipoamide branched chain transacylase E2) is a component of an inner-mitochondrial enzyme complex that degrades the branched-chain amino acids isoleucine, leucine, and valine [23]. DBT has been reported as a clinical diagnostic biomarker for patients with dilated cardiomyopathy through its association with mitochondrial dysfunction [24]. NCOA2 (nuclear receptor coactivator 2) is a transcriptional coactivator that assists nuclear hormone receptors, including steroid, thyroid, retinoid, and vitamin D receptors. NCOA2 promotes muscle cell maintenance and growth, and ultimately regulates cardiac cTnT levels [25,26]. Overexpression of NCOA2 regulated cell proliferation in cardiomyopathy [26,27]. NUDT21 (nudix hydrolase 21) is a novel regulator of cell fate that acts through alternative polyadenylation and chromatin signaling; suppression of NUDT21 enhances cell pluripotency and facilitates trans-differentiation into stem cells [28]. NUDT21 regulates cell proliferation through the ERK pathway [29]. To date, little is known about the function of NUDT21 in cardiomyocytes. PIK3C2A (phosphatidylinositol-4-phosphate 3-kinase catalytic subunit type 2 alpha) belongs to the superfamily of enzymes that phosphorylate the 3'-OH of the inositol ring of phosphatidylinositol (PI) and regulates multiple signaling pathways. PIK3C2A is mainly expressed in endothelial cells, vascular endothelium, and smooth muscle [30].
Lower expression of PIK3C2A in peripheral blood has been used as a significant biomarker for patients with acute myocardial infarction [31]. More interestingly, these hub genes showed different expression patterns. The expression levels of DBT, NCOA2, NUDT21 and PIK3C2A were significantly up-regulated in the Idiopathic cardiomyopathy group, and PIK3C2A was up-regulated in the Familial cardiomyopathy group (Table.3). This hints that these hub genes follow different regulatory patterns in the progression of these cardiomyopathy subtypes. The fourth axis of hub genes (HNRNPC, UEVLD) was shared by the Familial Cardiomyopathy and Idiopathic Cardiomyopathy groups, and both genes were significantly overexpressed in the Idiopathic cardiomyopathy group (Table.3). HNRNPC (heterogeneous nuclear ribonucleoprotein C) is an RNA-binding protein that belongs to the ubiquitously expressed heterogeneous nuclear ribonucleoprotein subfamily and mediates pre-mRNA transport and metabolism between the cytoplasm and nucleus [32,33]; its overexpression caused multinucleation of cells [34]. UEVLD (UEV and lactate/malate dehydrogenase domain-containing protein) is involved in protein degradation, and its dysregulation is linked with metabolic disease [35]. Furthermore, through differential expression analysis, the significantly changed hub genes were summarized (Table.3, p<0.05). Taken together, these results hint that the significantly differentially expressed hub genes play a dominant role and act as common key regulatory nodes in the progression of cardiomyopathies.

Discussion
In this study, to discover novel signature genes or biomarkers that could accelerate precise clinical diagnostics and intervention for cardiomyopathies, the WGCNA pipeline was applied to analyze the gene expression profiles of 90 clinical left ventricle biopsy samples representing 8 subtypes of cardiomyopathy. The whole transcriptome profile contained 20,283 target genes for prospective diagnostic assessment and covered a wide variety of biological and cellular processes. It is representative of the real pathological situation and valuable for discovering signature genes of cardiomyopathies. First, the co-expression networks were built with the different clinical cardiomyopathy traits using the Pearson correlation (Fig. 2A). To discover the modules related to cardiomyopathy phenotypes, the gene significance of the modules was calculated with a linear mixed effects model that tests the association of each node with the pathological phenotypes. This identified the significance of the association between individual modules of the gene expression profile and the different cardiomyopathy features (Fig. 2B). Through eigengene dendrogram analysis, the most significant module was picked out for further analysis (Fig. 3B, Supporting Fig. 4A-G). In the next step, the real hub genes in each significant module were screened by module membership (MM) versus gene significance (GS) analysis and protein-protein interaction network analysis; hub genes form key interconnected nodes within a functional network and play important roles in biological functions [44]. No real hub genes were identified in the Idiopathic Dilated group, which was therefore excluded from further analysis. In addition, the Idiopathic Dilated case was treated as a unique physiological state without impaired normal cardiac function.
Briefly, the subsequent analysis concentrated on the remaining five subtype groups: idiopathic cardiomyopathy (IdCM), familial cardiomyopathy (FCM), post-partum cardiomyopathy (PCM), ischemic cardiomyopathy (IsCM) and viral cardiomyopathy (VCM). Four axes of hub genes were shared among these cardiomyopathy groups, suggesting that these hub genes act as common key regulators. The exception was the viral cardiomyopathy group, which did not share any hub gene with the other groups; the expression dysregulation pattern of viral cardiomyopathy may be unique. Furthermore, to probe the correlation between the significant genes and the different cardiomyopathies more deeply, the significant genes were searched in GenCLiP 2.0 to mine gene networks and functional connections. The enrichment analysis results covered biological processes, molecular functions, cellular components and functional pathways.
Although the enriched pathways varied among the cardiomyopathy subtype groups, the key pattern was similar (Supporting Fig. 5A-F). The biological processes were significantly concentrated in cellular metabolic process, protein metabolic process, organic substance metabolic process and macromolecule modification, which suggests that disordered metabolic processes are associated with cardiomyopathies. The molecular functions mainly involved protein binding, heterocyclic compound binding, purine ribonucleotide binding, iron binding and nucleotide binding, etc. The cellular components included membrane-bounded organelle, intracellular organelle part, etc. The pathways were concentrated in the MAPK signaling pathway, protein kinase C, protein processing in endoplasmic reticulum, regulation of actin cytoskeleton, etc. (Supporting Fig. 6A-F). The related genes are summarized with their functions and pathways (Supporting Table. 4A-F). Furthermore, most of the significant genes were labelled as having no reported gene-term association, which indicates that the regulatory mechanisms of these genes in the progression of cardiomyopathies remain elusive. Through literature mining of gene networks, genes reported in the regulatory network were identified as associated with the different cardiomyopathies except post-partum cardiomyopathy (Supporting Fig. 5D-B), possibly because little research has focused on post-partum cardiomyopathy. For the subtypes idiopathic cardiomyopathy (IdCM), familial cardiomyopathy (FCM) and ischemic cardiomyopathy (IsCM), the highlighted genes were mainly concentrated in the MAPK signaling pathway, including MAPK1, MAPK14, CREB1 and RAC1. The genes YY1, RAPGEF1, SMAD2, JUND, ATF1 and SRA1 were linked with viral cardiomyopathy (Supporting Fig. 5F-B); these genes assemble in the SMAD signaling pathway [45], and YY1 is overexpressed in heart failure [46]. These results partially matched the functions and pathways linked to the key axes of hub genes.
These newly discovered genes linked to viral cardiomyopathy may offer a new perspective for clinical diagnostics and treatment. The results suggest that dysfunction of these significant genes is associated with metabolic disorder and the progression of cardiomyopathies.
In this study, because access to clinical samples was limited, a search of the cardiovascular disease BioPortal database was used as the validation strategy to explore the significant genes associated with the different cardiomyopathy subtypes. The number of disease signature genes retained after filtering varied among groups (Table.4). Through overlap analysis, three axes of common signature genes were identified among the five cardiomyopathy subtype groups (Fig. 5A). The first axis contained 4 disease signature genes (MDM4, CFLAR, RPS6KB1, PKD1L2) shared by the ischemic and ischemic cardiomyopathy groups. Compared with the healthy group, only MDM4 was significantly overexpressed (FC = 1.0495, p = 0.0037) in the ischemic group, while none of the four signature genes changed significantly in the ischemic cardiomyopathy group. This matches previous reports that up-regulated MDM4 has a cardio-protective effect in ischemia-reperfusion injury [47]. Overexpression of MDM4 may reflect self-correction of the physiological system under the abnormal physiological condition of ischemia. Artificially up-regulating MDM4 expression could be a new strategy for intervening in the progression of ischemia into cardiomyopathy. The second axis consisted of eight signature genes (MAPK1, MAPK11, MAPK14, LMNA, RAC1, PECAM1, XIAP, CREB1) shared by the Ischemic Cardiomyopathy group with the Post-Partum, Familial and Idiopathic Cardiomyopathy groups. Of these, MAPK1, MAPK11 and LMNA were significantly down-regulated and RAC1 was significantly up-regulated in the ischemic cardiomyopathy group, and only LMNA was significantly down-regulated in the Idiopathic Cardiomyopathy group (Table.5, Fig. 5G). These results hint that the dysregulated expression patterns of these three cardiomyopathies are more complex, although they share common signature genes. The MAPK pathway plays a dominant role in the progression of ischemic cardiomyopathy. The signature genes TFAM and RHEB were shared by the Viral Cardiomyopathy group and the Post-Partum, Familial and Idiopathic Cardiomyopathy groups and formed the third axis. The expression comparison indicated that only RHEB was significantly down-regulated, in the Idiopathic cardiomyopathy group. This suggests that TFAM and RHEB contribute less to the pathological progression of viral, post-partum, familial and idiopathic cardiomyopathy. Taken together, these results suggest that these genes could be used as promising biomarkers or therapeutic targets for cardiomyopathies. This progress will help to integrate precise clinical applications for the different cardiomyopathy subtypes.
There are some limitations in this study. Firstly, without a large number of clinical patient samples and more detailed clinical information, it was impossible to relate the pathological features of these samples to their expression profiles or to verify the potential biomarkers against the original patients' pathological features. Secondly, owing to the nature of bioinformatics analysis, the specific GO pathways and biomarkers discovered were not investigated further. Although the significant association of these genes with cardiomyopathy features was validated through the cardiovascular disease BioPortal database, it remains necessary to verify these potential biomarkers with more clinical patient biopsies. Future research should investigate the function and mechanism of these potential biomarkers or targets for cardiomyopathies in animal models and validate them with clinical data.

Conclusions
In summary, this study provides new insight into potential novel key regulatory biomarkers or therapeutic targets for cardiomyopathies induced by different etiologies. The disease signature genes associated with cardiomyopathies were identified and listed as potential therapeutic targets for clinical application. Future research will investigate the regulatory mechanisms of these disease signature genes in detail and develop novel therapeutic strategies for cardiomyopathies.

Clinical Samples information
To establish cardiac transcription profiles of cardiomyopathies, human left ventricle samples were collected from biopsies of patients undergoing cardiac transplantation whose heart failure arose from different etiologies (e.g. idiopathic dilated cardiomyopathy, ischemic cardiomyopathy) and from "normal" organ donors whose hearts could not be used for transplants. The transcriptional profiles of these samples were measured with the Affymetrix Human Genome U133 Plus 2.0 Array. The changes in transcriptional profiles were correlated with the physiological profile of the failing hearts acquired at the time of transplantation.
These samples and data were generated by the cardio-genomics lab at the Bauer Center for Genomic Research, Harvard University.

Original Gene Expression Data
The gene expression data used in this paper were obtained from the Gene Expression Omnibus (GEO) database at NCBI, and the platform accession number is GPL570. The dataset used for analysis is available in GEO under accession number GSE1145 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1145) and was derived from 90 left ventricle biopsy samples. The dataset contained 20,283 target genes for further analysis ( Figure.

Data preprocessing
The GSE1145 dataset was downloaded, and the gene expression values were normalized and log2-transformed. Each probe set was linked to a gene symbol through the Affymetrix annotation file for GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array), which contains a total of 54,675 probes corresponding to 20,283 genes. Microarray quality was assessed by sample clustering based on the distance between samples in the Pearson correlation matrix, and a height cut of 170000 was chosen to identify potential microarray outliers. Four normal samples (GSM18444/18445/18447/18448) were detected as outliers and excluded from the subsequent analysis (Supporting Fig.1).
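The log2 transformation and probe-to-gene mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the text does not say how multiple probes for one gene were combined, so averaging is assumed here; the `offset` of 1 is likewise an assumption to avoid taking the logarithm of zero; and the probe IDs, gene symbols and intensities are invented toy values, not data from GSE1145.

```python
import math
from collections import defaultdict


def log2_transform(values, offset=1.0):
    """Return log2(v + offset) for each value (the offset is an assumption)."""
    return [math.log2(v + offset) for v in values]


def collapse_probes(probe_expr, probe_to_gene):
    """Average the expression of all probes annotated to the same gene symbol.

    probe_expr:    {probe_id: [value per sample]}
    probe_to_gene: {probe_id: gene_symbol}; probes without annotation are skipped.
    """
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene:
            grouped[gene].append(values)

    collapsed = {}
    for gene, rows in grouped.items():
        n_probes = len(rows)
        # zip(*rows) iterates sample by sample across the probes of one gene
        collapsed[gene] = [sum(col) / n_probes for col in zip(*rows)]
    return collapsed


# Toy intensities for three hypothetical probes measured in three samples.
raw = {
    "probe_a": [7.0, 15.0, 31.0],
    "probe_b": [3.0, 7.0, 15.0],
    "probe_c": [1.0, 1.0, 3.0],
}
annotation = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}

logged = {probe: log2_transform(values) for probe, values in raw.items()}
gene_expr = collapse_probes(logged, annotation)
print(gene_expr)
```

Other collapsing rules (for example, keeping the probe with the highest mean expression) are also common; the choice affects the downstream network and should be reported.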

Construction of Weighted Gene Co-Expression Network
The WGCNA package for R (version 1.63) was downloaded and set up following the protocol described previously [48]. The WGCNA package provides functions for weighted correlation network analysis, including network construction, module detection, calculation of topological properties, data simulation, visualization, and interfacing with external software [48]. First, the data were checked to exclude samples with excessive missing values and to identify outlier samples. After preprocessing, principal component analysis (PCA) was applied to double-check data quality. The heart failure and healthy samples were separated in the PCA plot (Supporting Fig. 1), and hierarchical clustering of the samples was performed to detect potential outliers. In total, 86 samples were used for the subsequent analysis ( Figure. 1). The soft threshold β = 7 was chosen to construct the co-expression network because R² first reached its peak at β = 7. The plot of log10(p(k)) versus log10(k) (Supporting Fig. 2) indicated that the network was close to a scale-free network at β = 7, where k is the whole-network connectivity and p(k) is the corresponding frequency distribution (Supporting Table.1). At β = 7, R² was 0.98, ensuring that the network was close to scale-free. After the soft thresholding power β was determined, the Topological Overlap Matrix (TOM) and the dissimilarity dissTOM = 1−TOM were obtained. After the modules were identified, the t-test was used to calculate the significance p-value of candidate genes, and the gene significance (GS) was defined as the log-transformed p-value of each gene (GS = lgP). The module significance (MS) was then defined as the average GS of all genes in the module. The significance cut-off was set at a p-value lower than 0.05. In general, the module with the highest MS among all selected modules is considered to be the one associated with disease.
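The soft-threshold adjacency and topological overlap steps can be sketched in a few lines. This is a minimal pure-Python illustration, not the WGCNA R implementation: it assumes an unsigned network, where the adjacency is a_ij = |cor(x_i, x_j)|^β with β = 7 as in the text, and uses the standard unsigned topological overlap TOM_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij). The function names and the toy expression matrix are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expr, beta=7):
    """Unsigned soft-threshold adjacency; the diagonal is left at 0."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj


def tom_dissimilarity(adj):
    """Return dissTOM = 1 - TOM for an adjacency matrix with zero diagonal."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            diss[i][j] = diss[j][i] = 1.0 - tom
    return diss


# Toy matrix: 4 hypothetical genes (rows) measured in 5 samples (columns).
expr = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 3, 4, 1, 2],
    [1, 3, 2, 5, 4],
]
diss = tom_dissimilarity(adjacency(expr, beta=7))
for row in diss:
    print([round(v, 3) for v in row])
```

The resulting dissimilarity matrix is what a hierarchical clustering step would take as input to define modules; raising the correlation to the power β suppresses weak correlations while keeping the network weighted rather than hard-thresholded.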
In addition, the relevance between modules and the cardiomyopathy phenotypes of different etiologies (idiopathic dilated, ischemic, idiopathic cardiomyopathy, familial cardiomyopathy, post-partum cardiomyopathy, ischemic cardiomyopathy, viral cardiomyopathy) was calculated to identify the most relevant module. In WGCNA, the module membership (MM), MM(i) = cor(x_i, ME), is defined to measure the importance of gene i within the module: the greater the absolute value of MM(i), the more important gene i is in the module. The Genes Significance (GS) within a module is highly correlated with MM and is the most important element for discovering a significant module, indicating that the genes in the module are significantly associated with cardiomyopathy features. Hierarchical clustering analysis was used to identify gene modules, each of which is a cluster of genes densely interconnected in terms of co-expression, and colors were used to label the modules. Genes not assigned to any module are placed by WGCNA in a grey module as not co-expressed. The module eigengene (ME) is defined as the first principal component of the module and represents the overall expression level of the module. To identify modules significantly associated with the traits of different etiologies, the correlation of MEs (i.e. the first principal component of a module) [49] with clinical pathological features was calculated, and the most significant associations were identified.
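The definitions of ME and MM(i) = cor(x_i, ME) can be made concrete with a small example. This is a hedged pure-Python sketch rather than the WGCNA implementation: it assumes each gene is standardized across samples, finds the first principal component by power iteration (an illustrative choice of method), and orients the eigengene to correlate positively with the average expression. The function names, the toy module and the binary trait vector are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(x):
    """Scale one gene to mean 0 and sample standard deviation 1."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return [(v - m) / sd for v in x]


def module_eigengene(module_expr, iterations=200):
    """First principal component (one value per sample) of a genes x samples module."""
    z = [standardize(gene) for gene in module_expr]
    n_samples = len(z[0])
    v = z[0][:]  # start from the first gene's profile
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(gene, v)) for gene in z]  # Z v
        v = [sum(scores[g] * z[g][s] for g in range(len(z))) for s in range(n_samples)]  # Z^T (Z v)
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    # Fix the arbitrary sign: align the eigengene with average expression.
    avg = [sum(col) / len(z) for col in zip(*z)]
    if pearson(v, avg) < 0:
        v = [-a for a in v]
    return v


# Toy module of 3 co-expressed hypothetical genes across 6 samples.
module = [
    [1, 2, 3, 4, 5, 6],
    [2, 3, 5, 4, 6, 7],
    [1, 1, 2, 3, 3, 4],
]
trait = [0, 0, 0, 1, 1, 1]  # hypothetical coding: 0 = healthy, 1 = disease

me = module_eigengene(module)
mm = [pearson(gene, me) for gene in module]  # module membership per gene
module_trait_cor = pearson(me, trait)        # module-trait relationship
print([round(m, 3) for m in mm], round(module_trait_cor, 3))
```

In this toy module all three genes have high MM because they rise together across samples, and the eigengene correlates positively with the trait; in a real analysis the same two correlations are what the MM versus GS scatter plots and the module-trait heatmap summarize.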
Function & Pathway enrichment analysis of gene significance in module
Genes significantly related to the different pathological phenotypes were searched in the web-based tool GenCLiP 2.0, which analyzes human genes in relation to biological functions and molecular networks [50], and correlation analysis was conducted. The correlation analyses included Gene Cluster

Identification of hub genes
The module membership (MM) was defined as the correlation of a gene's expression profile with the module eigengene (ME). The GS measure was defined as (the absolute value of) the correlation between a gene and an external trait. Hub genes were defined as genes that play a more important role in biological processes than other genes in the whole network; they form key interconnected nodes within a functional network and play important roles in biological functions [44]. In this study, two methods, co-expression network analysis and PPI network analysis, were employed to identify the real hub genes in each significant module. Genes with the highest MM and highest GS in a module were treated as candidates for further functional research [51][52][53]. The criteria for screening hub genes were set as GS > 0.2 and MM > 0.8 with a threshold of P-value < 0.05, and hub genes were identified in the module most significantly correlated with a given clinical trait. In parallel, the protein-protein interaction (PPI) network of the genes in the selected modules was built through the STRING database. Within a significant module, an interaction between genes was defined as positive with a combined score cutoff of > 0.4, and a connectivity degree of ≥ 8 was required, using the STRING database [54]. Genes passing this filter in the PPI network were defined as hub genes. Hub genes that overlapped between the co-expression network and the PPI network were regarded as "real" hub genes and selected for further analyses.
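The two screening criteria and their overlap can be expressed directly as filters and a set intersection. This is a minimal sketch using the thresholds quoted above (GS > 0.2, MM > 0.8, p < 0.05; combined score > 0.4, degree ≥ 8); the function names are hypothetical, the combined score is assumed to be on a 0 to 1 scale, and all gene names and values are invented toy data.

```python
def coexpression_hubs(gene_stats, gs_min=0.2, mm_min=0.8, p_max=0.05):
    """Genes passing the GS, MM and p-value thresholds in a module."""
    return {
        gene
        for gene, s in gene_stats.items()
        if s["GS"] > gs_min and s["MM"] > mm_min and s["p"] < p_max
    }


def ppi_hubs(edges, score_min=0.4, degree_min=8):
    """Genes with at least degree_min interactions above the score cutoff.

    edges: iterable of (gene_a, gene_b, combined_score) tuples.
    """
    degree = {}
    for a, b, score in edges:
        if score > score_min:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
    return {gene for gene, d in degree.items() if d >= degree_min}


# Toy module statistics (hypothetical genes and values).
gene_stats = {
    "HUB1": {"GS": 0.30, "MM": 0.90, "p": 0.01},
    "GENE_X": {"GS": 0.25, "MM": 0.85, "p": 0.02},  # co-expression hub only
    "P1": {"GS": 0.10, "MM": 0.95, "p": 0.03},      # fails the GS threshold
}

# HUB1 has 8 confident partners; GENE_X has only 3; one weak edge is ignored.
edges = [("HUB1", f"P{i}", 0.7) for i in range(1, 9)]
edges += [("GENE_X", f"P{i}", 0.9) for i in range(1, 4)]
edges += [("HUB1", "P9", 0.3)]

real_hubs = coexpression_hubs(gene_stats) & ppi_hubs(edges)
print(real_hubs)
```

Requiring a gene to pass both filters is deliberately conservative: a gene that is central in the co-expression module but poorly connected in the PPI network (like `GENE_X` here) is dropped.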

Exploring Cardiovascular Disease Portal
The identified significant genes were searched in the Cardiovascular Disease Portal to filter for genes associated with cardiomyopathies. The Cardiovascular Disease Portal provides easy access to multiple genetic datasets associated with specific cardiomyopathy types and to annotations of 854 genes with verified functions related to human cardiomyopathies [55,56]. The portal integrates data for genes, QTLs and strains associated with the highlighted disease(s), together with translational research annotations and associated disease information. The filtered genes were defined as disease signature genes for cardiomyopathies.

Validations of Hub / Signature Genes Expression
The expression of significant genes was validated by comparison between the different cardiomyopathy groups and the healthy group, with the healthy group used as the benchmark. The expression of each gene in each group was presented as mean ± standard error of the mean (SEM), representing the distribution of cases within the group. Expression levels were compared quantitatively using the fold change ratio. The significance of differences in gene expression between healthy and cardiomyopathy cases was determined with Student's t-test in Prism software (GraphPad Software, Inc. San Diego, CA). A p-value < 0.05 was set as the standard for a significant difference (*, p value < 0.05; **, p value between 0.0001 and 0.001; ***, p value < 0.0001). Significant up-regulation was defined as fold change > 1.0 with p < 0.05, and significant down-regulation as fold change < 1.0 with p < 0.05.

Availability of data and materials
The datasets used and/or analyzed during this study are available from the online supplementary material or from the corresponding author (SL or YH) on request.

Competing interests
The authors declare that they have no competing interests.

Funding
This work was supported by grants for outstanding talent from abroad from the Chinese Academy of Sciences, startup funding from the "CAS Pioneer Hundred Talents Program" (E0241211H1), and the State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, the Chinese Academy of Sciences (Y8677211K1, Y8690211Z1), to Dr. Shubai Liu. These grants supported study design, data collection, analysis and interpretation, and manuscript writing and publishing.

Authors' Contributions
SL and YH designed the overall project study; YH collected data, performed data analysis, and drafted the manuscript; SL, YH, JT and ZW interpreted and summarized the results; SL and YH wrote and revised the