Differentially expressed genes of EC
The flowchart of the study is depicted in Figure 1. We analyzed three integrated GEO expression profiles, including 128 cancer samples and 22 normal samples. A total of 501 DEGs (187 upregulated and 314 downregulated genes) were screened. In TCGA, we screened 7394 DEGs containing 4505 upregulated and 2889 downregulated genes from EC samples. Of the DEGs from GEO and TCGA databases, 146 upregulated genes and 217 downregulated genes were identified. The results of DEGs are shown in the volcano plots in (Figures 2A and 2B).
Gene Ontology and KEGG pathway enrichment
GO analysis revealed that the DEGs were mainly enriched in organelle fission, nuclear division, chromosome segregation, extracellular structure organization, and cell cycle, etc (Figure 2C). KEGG pathway analysis showed that the DEGs were related to microRNAs in cancer, cell cycle, cellular senescence, chemokine signaling pathway, IL-17 signaling pathway, p53 signaling pathway, and some disease-associated pathways (Figure 2D).
WGCNA results
We selected 150 samples and 12,402 genes from GEO for WGCNA and drew a sample dendrogram and trait heat map (Figure 3A). β = 4 (R2 = 0.92) was selected to construct a scale-free network (Figures 3B and 3C). On the basis of the TOM matrix, the modules were hierarchically clustered, and similar modules on the cluster tree were merged (Figure 4A). Twenty coexpression modules were obtained, and the correlation between these modules with clinical information was calculated (Figure 4B). The green module, which contained 649 genes, was the most related to clinical information and thus selected for further analysis. Correlation analysis between module membership and gene significance was performed for the green module. Results suggested the existence of a close relationship between them (Figure 4C).
Identification and validation of hub genes
A total of 65 upregulated and 36 downregulated genes were included in the green module (Figures 4D and 4E). A total of 64 real DEGs were obtained using UALCAN. These DEGs were evaluated via univariate Cox regression analyses in the TCGA cohort. For the top 15 genes correlated with the OS of EC patients according to the results of univariate analyses, multivariate Cox regression analysis was performed. Cox regression analysis successfully screened MTHFD2, KIF4A, TPX2, RPS6KA6, and SIX1, and these hub genes were used to establish a prediction model for the prognosis of EC. Risk scores based on gene expression and regression coefficient were calculated using the following formula: Risk score = MTHFD2 × 0.36349 + KIF4A × (−0.51299) + TPX2 × 0.46623 + RPS6KA6 × 0.54220 + SIX1 × 0.20801. According to the risk scores, the patients were divided into high-risk and low-risk groups. Kaplan–Meier curves showed that the high-risk group had significantly poorer OS than the low-risk group (Figure. 5A) The AUC value of the ROC curve was 0.70285 for the 5-year survival, indicating that both the high-risk and low-risk groups have a diagnostic value for the prognosis of patients with EC (Figure 5B). The expression of the five hub genes in different groups is depicted in a heat map (Figure 5C). In UALCAN, the expression of mRNA and protein levels markedly differed (Figures 6A–6K), and their mRNA expression levels not only correlated with sample type but also with cancer stages (Figures 6K–6O). Based on the HPA database, the protein expression levels of MTHFD2, TPX2, SIX1, and KIF4A were significantly higher in tumor tissues than in normal tissues, but no significant difference was found in the protein expression level of RPS6KA6 (Figures 7A–7E).
The ROC curve was analyzed using the dataset from the GEO database. Results indicated that the five hub genes have a good diagnostic value for discriminating tumor and normal tissues (Figures 8A–8E). Cox regression analysis suggested that age, grade, stage, type, and risk score were independent prognostic factors for patients with EC (Table 1).
Survival analysis
K–M curves from the Kaplan–Meier plotter database showed that the expression levels of the five hub genes were negatively associated with the prognosis of patients with EC. High expression levels of these hub genes resulted in low OS and RFS, except for the RFS of SIX1 (Figures 9A–9J). The results of TISIDB are summarized in Supplementary Figure1-5. High mRNA expression levels of KIF4A and TPX2 led to the poor prognosis of UCEC, ACC, KIRP, LGG, LUAD, and MESO (Supplementary Figures 1 and 2). Interestingly, the high expression of RPS6KA6 was not only associated with the poor prognosis of UCEC and CESC but also related to the good prognosis of KIRC and LGG (Supplementary Figure 3). The results of MTHFD2 and SIX1 were similar to those of RPS6KA6; high MTHFD2 expression was negatively correlated with the OS of several tumors, such as UCEC, HNDC, KIRC, and KIRP but positively correlated with GBM and LGG (Supplementary Figure 4). High mRNA expression of SIX1 led to a good prognosis of patients with LUAD and LUSC but a bad prognosis of patients with UCEC, ACC, KIRC, and SARC , etc (Supplementary Figure 5).
Hub gene analysis
In the GSCALite website, we found that these five hub genes are mainly related to the activation of several pathways, such as cell cycle, DNA damage response, and EMT. They are also involved in inhibiting various pathways, such as hormone ER, RAS/MAPK, and PI3K/AKT (Figures 11A and 11B). A total of 100 significant genes with positive or negative correlation were identified by GSEA. Interestingly, the KEGG pathways enriched in the high-risk group according to GSEA enrichment analysis were similar to those from the KEGG enrichment analysis of DEGs by clusterProfiter package. Overall, the pathways were highly associated with cell cycle, p53 signaling pathway, DNA replication, oocyte meiosis, and several cancer pathways (Figures 12A–12E). The expression profiles of 100 significant genes are shown in the heat map in Figure 12F. GSEA with the hallmark gene sets suggested that the most relevant pathways included mitotic spindle, G2M checkpoint, spermatogenesis, E2F targets, MYC targets V1, MYC targets V2, UV response, cholesterol homeostasis, mtorc1 signaling, and heme metabolism (Figures 12A–12E). The abnormal expression of these genes may have an important relationship to the disorder of these pathways and may influence the occurrence and development of EC.
Mutation and methylation of the hub genes
The genetic mutations of the five hub genes (TCGA and Firehose Legacy) were analyzed using the cBioPortal. Results showed that 61 out of 177 patients had mutations (34%). Among them, TPX2 and TSPYL5 had the highest mutation rates (19% and 12%, respectively) (Figure 13A). The KM curves showed that the patients with the mutations of the five hub genes had a lower overall survival rate and a shorter disease-free survival time than those without the mutations (Figures 13B and 13C).
Using UCSC Xena, we analyzed the methylation and its effects on the expression of the five hub genes in the TCGA cohort. Several sites had a significantly different methylation level in EC tissues compared with in normal tissues. Further analysis suggested that the methylation of these sites were closely related to the gene expression of the hub genes. The methylation levels of MTHFD2, KIF4A, and TPX2 were lower in the tumor samples than in the normal samples, whereas the overall methylation level of RPS6KA6 in the tumor samples was relatively high. The methylation levels were negatively correlated with the mRNA expression of these genes (Supplementary Figures 6–9). However, the methylation of SIX1 was different. In the 21 methylation sites affecting gene expression, 6 of them had a lower methylation status in the tumor tissues, whereas the other 15 showed a higher methylation status. Correlation analysis revealed that these six sites were negatively correlated with SIX1 expression unlike the other 15 sites. These results were consistent with the high expression level of SIX1 (Supplementary Figure 10).
Small-molecule drugs related to EC
We screened a total of 153 small-molecule drugs by using CMap. The top 10 drugs most relevant to EC are shown in Table 2. Among these small-molecule drugs, trichostatin A, resveratrol, and vorinostat showed a strong negative correlation with EC. The chemical structures of these drugs were obtained from the PubChem database (Figure 14).