Construction of weighted co-expression network and identifcation of key modules
To construct a gene co-expression network, the original data of ES (GSE63155 and GSE63156) were downloaded from the GEO database. R is used for background correction and normalization, and the original data is preprocessed in the same way. The probe was matched with the gene symbol by r-packet annotation, and the probe matching multiple genes was removed. For the gene matched by multiple probes, the median was taken as the final expression value. In the end, we had a total of 24,991 genes. We calculated the SD values of each gene, sequenced them from large to small, and selected the first 5,000 genes as WGCNA. The function of fashClust in the WGCNA package was used for cluster analysis of 5000 genes (Figure 1A and Figure 2A). The selection of soft threshold power is an important step in the construction of WGCNA. The network topology of 1 ~ 20 threshold weights was analyzed, and the scale independence and average connectivity of WGCNA were determined. As shown in Figure 1B, C and Figure 2B, C, the power value 9 was selected as the lowest power (0.9) of the scaling free topological ft index to generate a hierarchical clustering tree (dendrogram) of 5,000 genes. We set the MEDiss Thres to 0.25 to merge similar modules (Figure 3A and Figure 4A) and generated 65 modules (Figure 5A and Figure 6A). The gene statistics in each module are shown in Table1 and 2. Genes that cannot be included in any module are put into the grey module and removed in subsequent analysis.
Correlation between modules and identifcation of key modules
We analyzed the interaction between 65 modules and drew a network heat map (FIG. 5A and 6A). The results show that the modules are independent from each other, and the modules have high independence from each other, and the gene expression of each module is relatively independent. In addition, we calculated the feature genes and clustered them according to their correlations to explore the co-expression similarity of all the modules (FIG. 5B and 6B). We found that the 65 modules are mainly divided into two clusters. A heat map based on the adjacency relationship shows similar results. In GSE63155, Blue module, black module and recurrence group are positively correlated, while brown module and Blue module are negatively correlated with non-recurrence group (figure 7). In GSE63156, Brown module, turquoise module and recurrence group are positively correlated, while blue module and non-recurrence group are negatively correlated (figure 8).
Look for common core genes
The core differentially expressed genes were verified and the core genes were found in two independent ES data sets (GSE63155, GSE63156). By analyzing the genes that play a key role in recurrence in GSE63155, we can find that Blue module and black module are positively correlated in the recurrence group through WGCNA, while GSE63156 is Brown module and turquoise module. Euler diagram shows that they share 12 genes (see figure 9). Second, through WGCNA, we can find that brown module and Blue module are negatively correlated in the non-recurrence group, while GSE63156 is Blue module. By Euler diagram, we found that they Shared 26 genes (see figure 10).So we have 38 genes that are critical to the recurrence group.
Function enrichment analysis in two key modules
In order to study the role of core genes, we conducted enrichment analysis and explored the pathways involved in BP and these two key modules. GO enrichment of BP by DAVID shows that, Recurrence of core set of genes is mainly enriched in regulation of phosphorus metabolic process, regulation of phosphate metabolic process, regulation of phosphorylation, the response to hypoxia, the response to oxygen levels, Regulation of protein amino acid phosphorylation (FIG. 11,12,13,).
Survival analysis of Core gene
Genes with significant interactions in the recurrence group included PSEN2, GLT1D1, MAP1A, RBM43, FDFT1, SOCS3, RPL5, SRSF11, COPS5, SMG7, TSPAN2, CYP2C19, AAMP, GPR137B, TGFBR1, HEBP2, C2orf57, GRM1, CDK2AP2, WDR5B, NUPL2, BPIL1, CLDN11, TNFRSF1A, ADIPOQ, TRIB2, FOXN1,CRYM, ELMO2, PENK, TPK1, FN3KRP, CINP, KCNK3, LDHA, TFPI, POU2F2, FGF17.In order to further study the influence of core genes on the prognosis of patients, we studied the influence of core genes on the prognosis by grouping the expression amount. Relapse group at the core of the correlated with prognosis of genes and gene TGFBR1, TFPI, SRSF11, SOCS3, RPL5, RBM43, POU2F2, NUPL2, MAP1A, GRM1, GLT1D1, FOXN1, FN3KRP, FGF17, CYP2C19, CRYM, COPS5, ELMO2, CLDN11, C2ORF57, CINP, ADIPOQ, AAMP,16,17 (Fig14, 15).
Identifcation of hub genes in the Salmon module and Darkorange module
We submitted the core gene set related to prognosis to the protein interaction of STRING, with the binding confidence interval of truncation set at 0.4. In Plugin Molecular Complex Detection (MCODE), significant mondels with strong protein-protein connections are calculated and selected, with default parameters (degree cut≥2, node score cut≥2, k-core ≥2, maximum depth =100). In order to P<0.05 was considered statistically significant. The candidate genes of node degree were sequenced and the core genes were selected for further analysis. Figure 18 shows the hub genes for SRSF11 TRIM39, SOCS3, NUPL2, COPS5.