Identification of DEGs in ES
We obtained the gene expression profile and clinical information of ES and normal tissues from GSE17679. Differently expressed genes (DEGs) between ES and normal tissues were identified using R package “limma”. A total of 4855 DEGs (2739 up-regulated genes and 2116 down-regulated genes) were screened based on the cut-off criteria of a adjust p-value < 0.01 and |log2FC (fold change) | > 1, in which 3779 protein-coding genes were chosen for subsequent analysis (Figure 1A-B).
Construction and identification of key modules
Next, 3779 protein-coding DEGs were used to perform WGCNA using R package “WGCNA”. A total of 88 ES samples with clinical features including age, sex (female and male), survival time (follow-up time), Status (alive and dead), and RM.status (recurrence/metastasis status) (Figure 2A). The most appropriate soft threshold to construct a scale-free network was 5 (R2 = 0.9) (Figure 2B). A total of twelve modules were identified (module size ≥ 30 and abline = 0.25) in the network (MEbrown, MEmagenta, MEgreen, MEblue, MEpurple, MEblack, MEsalmon, MEtan, MEgreenyellow, MEpink, MEturquoise, and MEgrey) (Figure 3A). Results indicated MEbrown module to be significantly positively associated with recurrence/metastasis status (Status) and survival status (RM.status) and negatively associated with survival time (Figure 3B-E). Thus, the MEbrown module was selected as a clinically pivotal module for subsequent analysis.
Functional analysis of MEbrown module
To further explore the potential function of genes in MEbrown module, we conducted the functional analysis using R package “clusterprofiler”. The functional enrichment results revealed that these genes were mainly associated with chromosome segregation and nuclear division in BP (biological process) category. For CC (cellular component) category, the genes in MEbrown module were mainly distributed in chromosomal region and spindle region. In MF (molecular function) category, the genes in MEbrown module mainly enriched in single-stranded DNA binding and catalytic activity, acting on DNA (Figure 4A). In addition, KEGG pathway analysis indicated the enrichment of MEbrown module mainly in Cell cycle pathway (Figure 4B).
Construction of the PPI network
To further explore the relationship between genes in MEbrown module, The STRING website was used to construct the PPI network. We identified three significant sub-modules using cluster analysis of PPI network in Cytoscape MCODE plug-in. These three sub-modules are respectively composed of 82, 24 and 11 genes (Figure 5A-C).
Construction of the Prognostic Model for ES
We further performed Kaplan-Meier survival analysis and screened 49 genes with p < 0.00001 (Supplementary Table 1). The prognostic model was constructed based on the 49 genes using lasso regression (Figure 6A-B). A total of 16 genes, including NOP2, DDB2, PTPN22, ASPM, GLB1L2, EMILIN1, TTK, CDCA3, NEK2, P4HTM, CENPW, PSMG3, RAD51AP1, SQLE, YWHAG, and SLC16A4 were selected by lasso regression (sup-Figure 1A-P). The prognosis index for each patient was calculated by the established formula: The Prognosis Index (PI) = (0.42 * expression of NOP2) + (-0.84 * expression of DDB2) + (-0.21 * expression of PTPN22) + (1.02 * expression of ASPM) + (0.77 * expression of GLB1L2) + (-0.07 * expression of EMILIN1) + (-0.04 * expression of TTK) + (-0.50 * expression of CDCA3) + (-0.15* expression of NEK2) + (-0.11 * expression of P4HTM) + (-0.07 * expression of CENPW) + (0.08 * expression of PSMG3) + (-0.43 * expression of RAD51AP1) + (-0.10 * expression of SQLE) + (1.10 * expression of YWHAG) + (-0.17 * expression of SLC16A4). All patients were divided into two groups (high- and low-risk groups) according to the median risk score. Kaplan-Meier survival analysis showed the prognostic model to be a good predictor of ES patient prognosis. The overall survival of ES patients was significantly poorer in the high-risk group (Figure 6C-D). Furthermore, ROC analysis was used to determine the accuracy of this signature in predicting the survival. As shown in Figure 6E, this prognostic model showed high sensitivity and specificity with the AUCs for 1-year, 3-year, and 6-year overall survival to be 0.903, 0.995, 0.953. Taken together, the prognostic model could serve as a good predictor of overall survival of ES patients.