Profile’s Quality Control
The microarray dataset GSE26049 were applied into this study, gene expression profiles of normal samples as well as different tumor samples (n = 90) were normalized and generated. Firstly, we filter and identify the samples which we could use according to the quality control of these gene chips. Standard samples used in this study were selected by calculating their normalized unscaled standard errors (NUSE) and NUSE plot was also generated for visualization (Fig. 1A). Results illustrated that these 90 samples all had a good chip quality and could be used in our further research. Subsequent to data preprocessing using RMA algorithm, the boxplot of normalized and background corrected gene expression level was plotted (Fig. 1B), results showed that the median amount of gene expression in each sample was on a straight line, indicating that the preprocessed data was served as standard data and could be analyzed for the following study.
Construction of Weighted Gene Co-expression Networks
Firstly, hierarchical clustering analysis for each sample was applied to check the heterogeneity of samples to detect and remove the outliers (Fig. 2A), cutoff threshold height was set as 85 to filter outliers. Finally, one sample GSM639674 was excluded and the gene expression matrix contained 20482 genes in the rest of 89 samples were used for WGCNA analysis. Next, the optimal soft threshold power for WGCNA was calculated. As shown in Fig. 2B, when the soft threshold power β was equal to 16, the scale free topology model fit index (R^2) was higher than 0.90 and mean connectivity was infinitely approaching zero. As a result, power 16 was selected as the optimal soft threshold power value.
Based on the co-expression relationships, hierarchical clustering analysis was then performed to obtain the weighted co-expression network (Fig. 3A), results displayed that after generating the clustered dendrogram, 14 distinct gene co-expression modules, characterized by their unique module color, were identified and clustered. The module colors were salmon, tan, blue, purple, yellow, brown, pink, magenta, black, red, turquoise, green, green-yellow, grey, respectively. Interaction relationships of the 14 co-expression modules in all of the genes was shown in Fig. 3B, topological overlap heatmap revealed that each co-expression module could independently validate each other in the network, and dendrogram branches indicated that genes in each module were highly heterogenous. Consequently, it was necessary to further identify the interested modules in each subgroup.
Key Module Identification of PMF
After obtaining WGCNA network data, the module-trait relationships heat map was also performed. The interaction relationships between each module and each different feature including control, PV, ET, and PMF subgroups were fully assessed (Fig. 4A, 4B). From this heat map and histogram, we observed that the modules significantly associated with PV was black module, no co-expression module was significantly related to ET. As for PMF, results visualized that correlations and p values of PMF samples in the modules green and red had strong positive correlation compared to normal samples, while purple and yellow modules had a strong negative correlation compared to normal samples, indicating that modules green, red, purple and yellow could significantly differentiate the PMF samples from the normal samples and genes in these modules could be up-regulated or down-regulated in the progression of PMF. Of which, green module had the strongest correlation (r = 0.46) and the lowest p value (p = 8e-5), elucidating that the green module was highly correlated with PMF, the genes in green module played a crucial roles in the pathogenesis and oncogenesis of PMF. Besides, Module Membership (MM) in PMF versus Gene Significance (GS) scatter plot of green module was also generated (Fig. 4C), results illustrated that MM was highly correlated with GS in green module (cor = 0.48, p = 3.6e-5).
Module eigengene adjacency was subsequently calculated to cluster and assess the identified modules with feature PMF, and heatmap was plotted to depict their interactions within modules and PMF subgroup (Fig. 5A, 5B). Based on the dendrogram, result illustrated that the PMF subgroup clustered with green module, implying that these two eigenvalues had a highly correlations with each other. As for heat map, each module exhibited their independent validation to the other. Blue colors and red colors represented the different strength of co-expression interactions, and results displayed that the PMF subgroup had the highly correlations with green module.
Then we performed the hierarchical clustering of the gene expression profiles in green module between four subgroups: control, PV, ET and PMF in green module. As shown in Fig. 5C, hierarchical clustering analysis displayed that the genes in green module could significantly differentiate PMF group from other three groups, and PMF samples clustered tightly with each other compared to other groups. From the results above, green module was identified as key module accounting for the progression of PMF and subsequent research was pooled based on the green module genes.
Functional and Pathway Enrichment Analysis of Genes in Green Module
Totally the blue module contained 217 genes (as shown in supplementary table 1), to obtain a comprehensive understanding of the biological functions and signaling pathway relevance of the genes in green module, GO, KEGG and GSEA researches were conducted. These genes were uploaded into DAVID database for further investigation, the detailed results of GO, KEGG was shown in Table 1 and Fig. 6. GO results indicated that these genes were mainly associated with several biological process including erythrocyte differentiation, hemopoiesis, positive regulation of transcription from RNA polymerase II promoter and negative regulation of apoptosis process et al; cellular component including cytosol, hemoglobin complex, transcription factor complex et al; and several molecular functions such as protein binding, transcription factor binding, heme binding et al. KEGG analysis revealed that some signaling pathways were identified to be significantly changed, such as cell cycle, protein processing in endoplasmic reticulum, bile secretion and porphyrin metabolism. Then GSEA analysis was also conducted to screen the potential biological function and aberrant regulated pathways in PMF patients. As shown in Fig. 7, results indicated that the mutual genes in green module were commonly evolved in critical signaling pathways that were correlated with carcinogenesis of tumor, including cell cycle, hematopoietic cell lineage, JAK-STAT signaling pathway, oocyte meiosis, P53 signaling pathway and toll-like receptor signaling pathway et al.
Table 1
Functional and pathway (GO, KEGG) enrichment analysis of the genes in green module.
Category
|
Term
|
Count
|
%
|
P Value
|
GOTERM_BP_DIRECT
|
GO:0030218 ~ erythrocyte differentiation
|
5
|
2.304147
|
9.37E-04
|
GOTERM_BP_DIRECT
|
GO:0030097 ~ hemopoiesis
|
5
|
2.304147
|
0.003971
|
GOTERM_BP_DIRECT
|
GO:0043066 ~ negative regulation of apoptotic process
|
13
|
5.990783
|
0.004454
|
GOTERM_BP_DIRECT
|
GO:0008285 ~ negative regulation of cell proliferation
|
11
|
5.069124
|
0.012111
|
GOTERM_BP_DIRECT
|
GO:0000122 ~ negative regulation of transcription from RNA polymerase II promoter
|
16
|
7.373272
|
0.012984
|
GOTERM_BP_DIRECT
|
GO:0007050 ~ cell cycle arrest
|
6
|
2.764977
|
0.019582
|
GOTERM_BP_DIRECT
|
GO:0008584 ~ male gonad development
|
5
|
2.304147
|
0.019829
|
GOTERM_BP_DIRECT
|
GO:0045944 ~ positive regulation of transcription from RNA polymerase II promoter
|
19
|
8.75576
|
0.022256
|
GOTERM_BP_DIRECT
|
GO:0043161 ~ proteasome-mediated ubiquitin-dependent protein catabolic process
|
7
|
3.225806
|
0.024531
|
GOTERM_BP_DIRECT
|
GO:0006810 ~ transport
|
9
|
4.147465
|
0.03818
|
GOTERM_BP_DIRECT
|
GO:0030308 ~ negative regulation of cell growth
|
5
|
2.304147
|
0.044148
|
GOTERM_BP_DIRECT
|
GO:0043065 ~ positive regulation of apoptotic process
|
8
|
3.686636
|
0.047575
|
GOTERM_CC_DIRECT
|
GO:0005829 ~ cytosol
|
63
|
29.03226
|
2.93E-06
|
GOTERM_CC_DIRECT
|
GO:0030863 ~ cortical cytoskeleton
|
6
|
2.764977
|
3.03E-06
|
GOTERM_CC_DIRECT
|
GO:0005833 ~ hemoglobin complex
|
4
|
1.843318
|
2.47E-04
|
GOTERM_CC_DIRECT
|
GO:0016020 ~ membrane
|
39
|
17.97235
|
0.001861
|
GOTERM_CC_DIRECT
|
GO:0005737 ~ cytoplasm
|
76
|
35.02304
|
0.001901
|
GOTERM_CC_DIRECT
|
GO:0080008 ~ Cul4-RING E3 ubiquitin ligase complex
|
3
|
1.382488
|
0.013939
|
GOTERM_CC_DIRECT
|
GO:0005667 ~ transcription factor complex
|
7
|
3.225806
|
0.017789
|
GOTERM_CC_DIRECT
|
GO:0016323 ~ basolateral plasma membrane
|
6
|
2.764977
|
0.044445
|
GOTERM_CC_DIRECT
|
GO:0005856 ~ cytoskeleton
|
9
|
4.147465
|
0.046641
|
GOTERM_MF_DIRECT
|
GO:0005515 ~ protein binding
|
124
|
57.14286
|
9.16E-05
|
GOTERM_MF_DIRECT
|
GO:0051537 ~ 2 iron, 2 sulfur cluster binding
|
4
|
1.843318
|
0.002571
|
GOTERM_MF_DIRECT
|
GO:0008134 ~ transcription factor binding
|
9
|
4.147465
|
0.014121
|
GOTERM_MF_DIRECT
|
GO:0042803 ~ protein homodimerization activity
|
16
|
7.373272
|
0.015948
|
GOTERM_MF_DIRECT
|
GO:0003714 ~ transcription corepressor activity
|
7
|
3.225806
|
0.025659
|
GOTERM_MF_DIRECT
|
GO:0004842 ~ ubiquitin-protein transferase activity
|
9
|
4.147465
|
0.030192
|
KEGG_PATHWAY
|
hsa05144: Malaria
|
5
|
2.304147
|
0.003013
|
KEGG_PATHWAY
|
hsa04976: Bile secretion
|
5
|
2.304147
|
0.010206
|
KEGG_PATHWAY
|
hsa00860: Porphyrin and chlorophyll metabolism
|
4
|
1.843318
|
0.014778
|
KEGG_PATHWAY
|
hsa04141: Protein processing in endoplasmic reticulum
|
7
|
3.225806
|
0.017796
|
KEGG_PATHWAY
|
hsa00910: Cell Cycle
|
3
|
1.382488
|
0.018195
|
KEGG_PATHWAY
|
hsa05211: Renal cell carcinoma
|
4
|
1.843318
|
0.047755
|
Identification of Hub Genes and PPI (Protein-Protein Network) Construction
After screening the interested module, the PPI network was then analyzed into STRING database, and the weighted co-expression network was constructed and uploaded into Cytoscape software.
Co-related modules in these genes were also identified by the Molecular Complex Detection (MCODE) plug-in in Cytoscape, totally 155 nodes and 863 edges were acquired, together with the top 2 significant modules, as shown in Fig. 8A. Module 1 contained 10 nodes and 55 edges, module 2 contained 7 nodes and 30 edges. The functional annotation and pathway enrichment of screened modules were also conducted using DAVID database, as shown in Table 2. Results showed that genes in module 1 mainly played roles in hemoglobin metabolic process, erythrocyte differentiation, iron ion binding and hemoglobin complex. Genes in module 2 were primarily enriched in protein polyubiquitination, protein binding and cytoplasm et al.
Table 2.
Functional and pathway enrichment analysis of MCODE identified genes.
Module
|
Term
|
Count
|
%
|
P Value
|
1
|
GO:0020027~hemoglobin metabolic process
|
5
|
50
|
0.002856
|
GO:0030218~erythrocyte differentiation
|
4
|
40
|
0.018902
|
GO:0005833~hemoglobin complex
|
4
|
40
|
0.005912
|
GO:0005887~integral component of plasma membrane
|
3
|
30
|
0.027505
|
GO:0005506~iron ion binding
|
2
|
20
|
0.078693
|
2
|
GO:0000209~protein polyubiquitination
|
5
|
71.42857
|
2.06E-07
|
GO:0016567~protein ubiquitination
|
4
|
57.14286
|
1.85E-04
|
GO:0005737~cytoplasm
|
5
|
71.42857
|
0.025964
|
GO:0005654~nucleoplasm
|
4
|
57.14286
|
0.027962
|
GO:0004842~ubiquitin-protein transferase activity
|
7
|
100
|
5.24E-11
|
GO:0005515~protein binding
|
7
|
100
|
0.019848
|
Subsequently, hub genes in this weighted network were filtered with degrees ≥ 36, degrees meant the level of correlation between two genes. Altogether, 20 genes were identified as hub genes including EPB42, SLC4A1, CALR, MPL, FECH, GYPB, KLF1, DMTN, RBX1, HBD, GYPA, GLRX5, UBE2H, KAT2B, RHAG, SELENBP1, CDC34, TAL1, NEDD4L, SNCA, as listed in Table 3. Meanwhile, the gene expression levels of the hub genes between PMF and normal samples were tested. The results of gene expression levels were shown in heat map (Fig. 8B). Illustration showed that these hub genes expression levels in PMF samples were much higher compared to normal samples, indicating that these genes was responsible for the development of PMF. Additionally, EPB42, CALR, SLC4A1 and MPL associated genes including HBM, KLF1, GYPB, MYL4, AHSP, RHAG were also expressed abnormally, which demonstrated that EPB42, CALR, SLC4A1 and MPL associated pathways were upregulated aberrantly.
Table 3
Detailed information of the hub genes screened in green module.
Gene Symbol
|
Degree
|
Betweenness
|
Gene Symbol
|
Degree
|
Betweenness
|
CALR
|
68
|
0.178851
|
GYPA
|
40
|
0.013157
|
EPB42
|
63
|
0.171656
|
GLRX5
|
38
|
0.048597
|
MPL
|
62
|
0.090448
|
UBE2H
|
38
|
0.104637
|
SLC4A1
|
53
|
0.026492
|
KAT2B
|
38
|
0.106015
|
FECH
|
52
|
0.073644
|
RHAG
|
37
|
0.00664
|
GYPB
|
52
|
0.01537
|
SELENBP1
|
37
|
0.050267
|
KLF1
|
46
|
0.013714
|
CDC34
|
36
|
0.025117
|
DMTN
|
43
|
0.077598
|
TAL1
|
36
|
0.080359
|
RBX1
|
43
|
0.050277
|
NEDD4L
|
36
|
0.048756
|
HBD
|
41
|
0.004011
|
SNCA
|
36
|
0.093156
|
Among those genes, the node degree of genes EPB42, CALR, SLC4A1 and MPL ranked highest, showing that EPB42, CALR, SLC4A1 and MPL had the most links with other proteins, which indicated that these four genes could play a vital role in the pathogenesis and oncogenesis of PMF.