Identication of candidate biomarkers associated with the diagnosis and prognosis of endometrial cancer: a bioinformatics analysis

Background Endometrial cancer (EC) is a common malignancy tumor that seriously threatens the wellbeing and health of women. This study aimed to map the hub genes and potential pathways that may be involved in EC. Methods In our study, three gene expression proles of GEO and EC data from the TCGA database were analyzed. We performed WGCNA, and Cox regression analyses to screen hub genes and further validated them by using HPA, KM plots, and other databases. We also studied the methylation level by UCSC Xena and mutation of hub genes using Cbioportal . Results We identied 363 differentially expressed genes (DEGs) (146 upregulated and 217 downregulated genes). Pathway analysis revealed that hub genes are mainly related to cell cycle, DNA damage response, EMT, hormone ER, RAS/MAPK, and PI3K/AKT. Finally, we identied ve hub genes that are related to the progression and prognosis of EC and screened several relevant small-molecule drugs.


Introduction
Endometrial cancer (EC) is a common malignancy tumor in the female reproductive system that seriously threatens the wellbeing and health of women. The age-standardized incidence rate of EC is steadily increasing. A previous study reported that approximately 63,400 new cases and 21,800 deaths from EC occurred in China in 2015 [1]. In the United States, about 65,620 new cases and 12,590 deaths from EC were estimated in 2020, representing 3.6% of all new cancer cases [2]. The early treatment of EC involves surgery, followed by radiotherapy and/or chemotherapy, which provide favorable prognosis.
Unfortunately, chemotherapy remains the main treatment option for advanced stages, and most cases are currently incurable [3]. Therefore, effective diagnostic markers, prognostic factors, and therapeutic targets must be identi ed for the successful treatment of patients with EC. Owing to experimental constraints, bioinformatics analyses have become a valuable tool for early diagnosis and prognosis of cancer. Bioinformatics analyses combined with databases are increasingly utilized to screen biomarkers RFS of patients with EC with gene expression can be easily obtained from the Kaplan-Meier plotter. TISIDB (http://cis.hku.hk/TISIDB/) was used to evaluate the prognostic value of the expression of the hub genes in other tumor types, as well as explore the relationship between the hub genes and their OS [13].
Hub gene analysis GSCA Lite (http://www.bioinfo.life.hust.edu.cn/web/GSCALite/) and GSEA were used to investigate further the molecular mechanisms of the hub genes in EC. GSCA Lite (http://bioinfo.life.hust.edu.cn/web/GSCALite/) is a network analysis platform for genomic cancer analysis. It can be used to calculate the score for 10 cancer-related pathways in 32 cancer types on the basis of the RPPA data from TCPA [14]. GSEA is a comprehensive analysis method for genome-wide expression pro le chip data [15]. In this study, EC samples from the TCGA database were divided into two groups according to the risk score and then utilized for GSEA using GSEA 4.0.3 software(http://software.broadinstitute.org/gsea/index.jsp).
.all.v7.1.symbols.gmt [HARMARKS], and c2.cp.kegg.v7.0.symbols.gmt [KEGG] was set as the gene set database. Each analysis process was performed 1,000 times by using the permutation test. P values less than 0.05 and FDR less than 0.25 were considered to be statistically signi cant.

Genetic mutation and methylation of the hub genes
The cBioPortal for Cancer Genomics (http://cbioportal.org/) is a web-based cancer genomics database based on TCGA [16]. It can be used to explore, visualize, and analyze multidimensional cancer genomic data. In the present study, it was used to explore genetic mutations connected with the ve hub genes and the effects of these mutations on prognosis. mRNA expression z-scores (RNASeq V2 RSEM) with a zscore threshold was set as ±2.0. Kaplan-Meier curves were plotted to show the relationship between the mutations of the ve genes with the OS and DFS of EC. UCSC Xena (http://xena.ucsc.edu/ ) is an online tool for multi-omic research that integrates data from TCGA, ICGC, GTEx, and other databases. This tool also includes clinical and phenotype data [17]. This database was used to study the methylation of the ve hub genes and analyze their differential methylation sites. The correlation between the mRNA expression of the hub genes and the signi cantly different methylation sites was explored.

Potential drug prediction
To explore the potential small-molecule drugs related to EC, we uploaded the DEG list to the CMap database for analysis.
CMap (https://portals.broadinstitute.org/cmap/) is a database used in studying gene expression and drugs. This database can help researchers use data on gene expression pro les to compare drugs with high correlation with diseases [18].
PubChem (https://pubchem.ncbi.nlm.nih.gov/) was used to retrieve the chemical structures of the related small-molecule drugs.

Result
Differentially expressed genes of EC The owchart of the study is depicted in Figure 1. We analyzed three integrated GEO expression pro les, including 128 cancer samples and 22 normal samples. A total of 501 DEGs (187 upregulated and 314 downregulated genes) were screened. In TCGA, we screened 7394 DEGs containing 4505 upregulated and 2889 downregulated genes from EC samples. Of the DEGs from GEO and TCGA databases, 146 upregulated genes and 217 downregulated genes were identi ed. The results of DEGs are shown in the volcano plots in (Figures 2A and 2B).
Gene Ontology and KEGG pathway enrichment GO analysis revealed that the DEGs were mainly enriched in organelle ssion, nuclear division, chromosome segregation, extracellular structure organization, and cell cycle, etc ( Figure 2C). KEGG pathway analysis showed that the DEGs were related to microRNAs in cancer, cell cycle, cellular senescence, chemokine signaling pathway, IL-17 signaling pathway, p53 signaling pathway, and some disease-associated pathways ( Figure 2D).

WGCNA results
We selected 150 samples and 12,402 genes from GEO for WGCNA and drew a sample dendrogram and trait heat map ( Figure 3A). β = 4 (R2 = 0.92) was selected to construct a scale-free network ( Figures 3B  and 3C). On the basis of the TOM matrix, the modules were hierarchically clustered, and similar modules on the cluster tree were merged ( Figure 4A). Twenty coexpression modules were obtained, and the correlation between these modules with clinical information was calculated ( Figure 4B). The green module, which contained 649 genes, was the most related to clinical information and thus selected for further analysis. Correlation analysis between module membership and gene signi cance was performed for the green module. Results suggested the existence of a close relationship between them ( Figure 4C).
Identi cation and validation of hub genes A total of 65 upregulated and 36 downregulated genes were included in the green module ( Figures 4D and 4E). A total of 64 real DEGs were obtained using UALCAN. These DEGs were evaluated via univariate Cox regression analyses in the TCGA cohort. For the top 15 genes correlated with the OS of EC patients according to the results of univariate analyses, multivariate Cox regression analysis was performed. Cox regression analysis successfully screened MTHFD2, KIF4A, TPX2, RPS6KA6, and SIX1, and these hub genes were used to establish a prediction model for the prognosis of EC. Risk scores based on gene expression and regression coe cient were calculated using the following formula: Risk score = MTHFD2 × 0.36349 + KIF4A × (−0.51299) + TPX2 × 0.46623 + RPS6KA6 × 0.54220 + SIX1 × 0.20801. According to the risk scores, the patients were divided into high-risk and low-risk groups. Kaplan-Meier curves showed that the high-risk group had signi cantly poorer OS than the low-risk group ( Figure. 5A) The AUC value of the ROC curve was 0.70285 for the 5-year survival, indicating that both the high-risk and low-risk groups have a diagnostic value for the prognosis of patients with EC ( Figure 5B). The expression of the ve hub genes in different groups is depicted in a heat map ( Figure 5C). In UALCAN, the expression of mRNA and protein levels markedly differed ( Figures 6A-6K), and their mRNA expression levels not only correlated with sample type but also with cancer stages (Figures 6K-6O). Based on the HPA database, the protein expression levels of MTHFD2, TPX2, SIX1, and KIF4A were signi cantly higher in tumor tissues than in normal tissues, but no signi cant difference was found in the protein expression level of RPS6KA6 ( Figures 7A-7E).
The ROC curve was analyzed using the dataset from the GEO database. Results indicated that the ve hub genes have a good diagnostic value for discriminating tumor and normal tissues ( Figures 8A-8E). Cox regression analysis suggested that age, grade, stage, type, and risk score were independent prognostic factors for patients with EC (Table 1).
Survival analysis K-M curves from the Kaplan-Meier plotter database showed that the expression levels of the ve hub genes were negatively associated with the prognosis of patients with EC. High expression levels of these hub genes resulted in low OS and RFS, except for the RFS of SIX1 (Figures 9A-9J Figures 1 and 2). Interestingly, the high expression of RPS6KA6 was not only associated with the poor prognosis of UCEC and CESC but also related to the good prognosis of KIRC and LGG (Supplementary Figure 3). The results of MTHFD2 and SIX1 were similar to those of RPS6KA6; high MTHFD2 expression was negatively correlated with the OS of several tumors, such as UCEC, HNDC, KIRC, and KIRP but positively correlated with GBM and LGG (Supplementary Figure 4). High mRNA expression of SIX1 led to a good prognosis of patients with LUAD and LUSC but a bad prognosis of patients with UCEC, ACC, KIRC, and SARC , etc (Supplementary Figure  5).

Hub gene analysis
In the GSCALite website, we found that these ve hub genes are mainly related to the activation of several pathways, such as cell cycle, DNA damage response, and EMT. They are also involved in inhibiting various pathways, such as hormone ER, RAS/MAPK, and PI3K/AKT (Figures 11A and 11B). A total of 100 signi cant genes with positive or negative correlation were identi ed by GSEA. Interestingly, the KEGG pathways enriched in the high-risk group according to GSEA enrichment analysis were similar to those from the KEGG enrichment analysis of DEGs by clusterPro ter package. Overall, the pathways were highly associated with cell cycle, p53 signaling pathway, DNA replication, oocyte meiosis, and several cancer pathways (Figures 12A-12E). The expression pro les of 100 signi cant genes are shown in the heat map in Figure 12F. GSEA with the hallmark gene sets suggested that the most relevant pathways included mitotic spindle, G2M checkpoint, spermatogenesis, E2F targets, MYC targets V1, MYC targets V2, UV response, cholesterol homeostasis, mtorc1 signaling, and heme metabolism (Figures 12A-12E). The abnormal expression of these genes may have an important relationship to the disorder of these pathways and may in uence the occurrence and development of EC.
Mutation and methylation of the hub genes The genetic mutations of the ve hub genes (TCGA and Firehose Legacy) were analyzed using the cBioPortal. Results showed that 61 out of 177 patients had mutations (34%). Among them, TPX2 and TSPYL5 had the highest mutation rates (19% and 12%, respectively) ( Figure 13A). The KM curves showed that the patients with the mutations of the ve hub genes had a lower overall survival rate and a shorter disease-free survival time than those without the mutations (Figures 13B and 13C). Using UCSC Xena, we analyzed the methylation and its effects on the expression of the ve hub genes in the TCGA cohort. Several sites had a signi cantly different methylation level in EC tissues compared with in normal tissues. Further analysis suggested that the methylation of these sites were closely related to the gene expression of the hub genes. The methylation levels of MTHFD2, KIF4A, and TPX2 were lower in the tumor samples than in the normal samples, whereas the overall methylation level of RPS6KA6 in the tumor samples was relatively high. The methylation levels were negatively correlated with the mRNA expression of these genes ( Supplementary Figures 6-9). However, the methylation of SIX1 was different. In the 21 methylation sites affecting gene expression, 6 of them had a lower methylation status in the tumor tissues, whereas the other 15 showed a higher methylation status. Correlation analysis revealed that these six sites were negatively correlated with SIX1 expression unlike the other 15 sites. These results were consistent with the high expression level of SIX1 (Supplementary Figure 10).

Small-molecule drugs related to EC
We screened a total of 153 small-molecule drugs by using CMap. The top 10 drugs most relevant to EC are shown in Table 2. Among these small-molecule drugs, trichostatin A, resveratrol, and vorinostat showed a strong negative correlation with EC. The chemical structures of these drugs were obtained from the PubChem database ( Figure 14).

Discussion
EC is a serious disease that leads to patient mortality. Abnormal vaginal bleeding is the main clinical manifestation of the early stage of EC. It may take a long period from the onset of symptoms to con rm EC. A previous study reported that only 9% of women with postmenopausal bleeding was diagnosed with EC [19]. The 5-year survival rate of patients with localized EC is about 95%, whereas it is 17.3% for patients with distant metastasis [2]. Although considerable progress has been made in EC research, the mortality rate of EC has been increasing worldwide in recent decades. According to the US Surveillance, Epidemiology, and End Results database, the death rate was 4.21% in 1992 and 5.02% in 2017 in the United States. Unfortunately, no new therapeutic approaches for EC have been developed [20]. Consequently, screening candidate genes is warranted for the early diagnosis and treatment of patients with EC.
In the present study, three gene expression datasets from the GEO and mRNA pro le data of EC were obtained. A total of 363 DEGs were screened. MTHFD2, KIF4A, TPX2, RPS6KA6, and SIX1 were screened as the candidate biomarkers. The risk scores based on these ve hub genes indicated a high predictive value for EC prognosis. The ROC curves showed the ability of these hub genes to distinguish tumor tissues from normal tissues, suggesting that these hub genes play an important role in the development and progression of EC.
MTHFD2 is an enzyme of folate metabolism that is a common target of cancer chemotherapeutics. Even during growth restriction, MTHFD2 overexpression has the capacity to promote cell proliferation [21]. Signi cant upregulation of MTHFD2 has been reported in various cancers, including CRC, HCC, breast cancer, and NSCLC [22][23][24][25]. In CRC, silencing MTHFD2 expression inhibits proliferation, weakens migration ability, blocks cell cycle, and promotes the apoptosis of cancer cells. Meanwhile, high MTHFD2 expression results in bad prognosis [22,23]. MTHFD2 overexpression is associated with poor prognosis and tumor metastasis in HCC and breast cancer [24,25]. In vitro and in vivo experiments of NSCLC indicated that knockdown of MTHFD2 can reduce cell growth and tumorigenicity [26]. Kawai et al. [27] reported that DS18561882, an MTHFD2 inhibitor, has a strong antitumor effect on mouse xenograft model. In the present study, the mRNA and protein expressions of MTHFD2 were higher in EC tissues than in normal tissues. Moreover, mRNA expression was associated with the patients' cancer stages. Furthermore, MTHFD2 was an independent prognostic factor for the short OS of patients with EC, indicating that MTHFD2 participates in the progression of EC.
KIF4A overexpression has been recently observed in various types of cancers, including HCC, OSCCs, and CRC [28][29][30]. Moreover, these studies reported that KIF4A overexpression promotes cell proliferation throughout the entire cell cycle. The level of KIF4A overexpression is negatively related to survival time but signi cantly correlated with clinical stage, metastasis, and tumor dimension [28]. Yasuyuki et al. [29] reported that cell-cycle is arrested at the G2/M phase and cell proliferation is signi cantly decreased in KIF4A knockdown cells in OSCCs. A study demonstrated that KIF4A promotes the proliferation of CRC via the p21-mediated cell cycle and results in a poor prognosis [30]. These results are consistent with those of our study and suggest that KIF4A may be a key regulator in the tumor progression of EC. TPX2, formerly known as P100, is known to be a key factor for mitosis and spindle assembly [31]. A review showed that TPX2 is considered a candidate oncogene because of its overexpression in numerous malignancies that correlate with the progression of diseases and unfavorable prognosis [32]. Yan et al. [33] found that the mRNA and protein levels of TPX are upregulated in bladder carcinoma. Moreover, they proved that TPX2 modulates cell proliferation, migration, invasion, and tumorigenicity via a TPX2-p53-GLIPR1 regulatory circuitry. In OC, TPX2 is upregulated, and TPX knockdown affects cell cycle and decreases proliferation [34]. Tomii et al. [35] reported that high TPX2 expression in GC is associated with a decrease in DFS and RFS. A study reported that miR-491 exerts inhibitory effects on BC cell invasion and migration by targeting TPX2 [36]. Zhang et al. [37] proved that TPX2 is regulated by miR-29a-5p overexpression in EC tissues and that it is associated with poor outcome. These results agreed with the ndings of the present study that TPX2 plays an important role in EC.
RPS6KA6, also known as RPS4, is considered a tumor suppressor. It shows a decreased expression in several cancers, including ovarian cancer, breast cancer, and GC [38]. A previous study reported that RPS6KA6 expression may have the capacity to limit the oncogenic, invasive, and metastatic potential of breast cancer [39]. In vitro and in vivo experiments suggested that RPS6KA6 has inhibitory effects on GC [40]. Dewdney et al. [41] found that RPS6KA6 frequently shows a higher methylation status and lower expression in EC compared with normal endometrial tissues. Their results were similar with those of the present study. We further explored the mutation, prognosis, and cancer stage of RPS6KA6 in EC and found that RPS6KA6 is an independent risk factor of OS in EC.
A previous review showed that SIX1 is overexpressed and associated with poor prognosis in numerous cancers [42]. SIX1 is the most well-studied hub gene in EC. Li et al. [43] reported that miR-23a regulates cell proliferation, migration, and invasion by targeting SIX1 in EC. Xin et al. [44] found that SIX1 is involved in the growth of EC via the ERK and AKT signaling cascades. Suen et al. [45] showed that the SIX1 oncoprotein is a biomarker of EC. In the present study, we found that SIX1 was not only aberrantly expressed but also associated with the cancer stage and overall survival of patients with EC.
The results KEGG pathway, GSEA, and GSCA Lite analyses showed that cell cycle had a tight association with EC. Kastan and Bartek [46] indicated that the disorder of the cell cycle pathway is a vital mechanism of tumorigenesis. Another study suggested that targeting cell cycle checkpoints may contribute in identifying new therapeutic targets [47]. Therefore, we speculate that the ve hub genes may play an important role in the tumorigenesis and progression of EC by regulating the cell cycle. This conjecture provides a new direction to understand further the molecular mechanism of EC.
The results of mutation analysis showed varying degrees of mutation of the ve hub genes. These mutations led to the poor prognosis of patients with EC. DNA methylation results can explain the abnormal expression of the ve hub genes in EC from an epigenetic perspective. We screened several small-molecule drugs that may have potential therapeutic effects on EC on the basis of the CMap database. However, the present study has several limitations. First, all data were derived from public databases. We did not explore the molecular mechanism of EC through experiments. Thus, more clinical data and in vitro and in vivo experiments are required to validate our results.

Conclusions
In Conclusion, we identi ed ve hub genes that are related to the progression and prognosis of EC.
Among them, MTHFD2 was reported for the rst time in EC and has a close relationship with the occurrence and development of EC patients. We hope that more research will be conducted in the future to explore his role in endometrial cancer. We established a risk prediction model that provides a favorable prognostic value on the basis of the ve hub genes and identi ed several candidate drugs that may have potential treatment effects on EC.

Declarations
Availability of data and materials The datasets supporting the conclusion of this article are included within the article.