Screening and identification of MSX1 for predicting progestin resistance in endometrial cancer

Background Progestin resistance is a critical obstacle for endometrial conservative therapy. Therefore, the studies to acquire a more comprehensive understanding of the mechanisms are very important. However, the pivotal roles of essential molecules are still unexplored. Methods We downloaded GSE121367 from the GEO database. The “limma” R language package was applied to identify differentially expressed genes (DEGs). We conducted Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA) analysis. Protein–protein interaction was constructed by STRING and visualized in Cytoscape. The tumor immune microenvironment was explored by TISIDB database. Methylation validation and overall survival analysis was conducted by TCGA database. In addition, the upstream modulators of hub genes were predicted by miRTarBase and Network Analyst database. Results A total of 3282 DEGs were identified and they were mostly enriched in cell adhesion pathway. We screened out ten hub genes including CDH1, JAG1, PTGES, EPCAM, CNTNAP2, TBX1, MSX1, KRT19, OAS1 and DAB2 among different groups, whose genomic alteration rates were low based on the current endometrial carcinoma sample sets. Has-miR-335-5p, has-miR-124-3p, MAZ and TFDP1 were the most prominent upstream regulators. The methylation status of CDH1, JAG1, EPCAM and MSX1 were decreased, corresponding to their high protein expression, which also predicted better overall survival. The homeobox protein of MSX1 showed significantly tissue specificity and better prognostic value. Conclusions Our study identified the gene of MSX1 promised to be the specific indicator. This would shed new light on the underlying biological mechanism to overcome progestin resistance of endometrial cancer. significant


Abstract
Background Progestin resistance is a critical obstacle for endometrial conservative therapy. Therefore, the studies to acquire a more comprehensive understanding of the mechanisms are very important. However, the pivotal roles of essential molecules are still unexplored.
Methods We downloaded GSE121367 from the GEO database. The "limma" R language package was applied to identify differentially expressed genes (DEGs). We conducted Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA) analysis. Protein-protein interaction was constructed by STRING and visualized in Cytoscape. The tumor immune microenvironment was explored by TISIDB database. Methylation validation and overall survival analysis was conducted by TCGA database. In addition, the upstream modulators of hub genes were predicted by miRTarBase and Network Analyst database.
Results A total of 3282 DEGs were identified and they were mostly enriched in cell adhesion pathway.
We screened out ten hub genes including CDH1, JAG1, PTGES, EPCAM, CNTNAP2, TBX1, MSX1, KRT19, OAS1 and DAB2 among different groups, whose genomic alteration rates were low based on the current endometrial carcinoma sample sets. Has-miR-335-5p, has-miR-124-3p, MAZ and TFDP1 were the most prominent upstream regulators. The methylation status of CDH1, JAG1, EPCAM and MSX1 were decreased, corresponding to their high protein expression, which also predicted better overall survival. The homeobox protein of MSX1 showed significantly tissue specificity and better prognostic value.
Conclusions Our study identified the gene of MSX1 promised to be the specific indicator. This would shed new light on the underlying biological mechanism to overcome progestin resistance of endometrial cancer.

Background
Endometrial carcinoma (EC), which results from aberrant regeneration in terms of excessive growth of endometrial glands [1], accounts for 4.4% of carcinoma cases among women in 2018 [2] with more than 60,000 cases estimated in the United States in 2019 [3]. As for endometrial precancerous lesions including atypical hyperplasia or endometrial intraepithelial neoplasia and well-differentiated cancer, hysterectomy would not be a feasible and effective optimal choice for them and conservative treatment to preserve fertility for young patients is becoming significantly essential. While progestin remedy is commonly applied, approximately 30% of such patients don't respond to the therapy, which causes poor effect for fertility preservation [4]. Till now, there is no effective solution to detect or predict which group of patients may respond to the progestin treatment.
A more comprehensive exploration of the precise molecular targets of progestin resistance would facilitate further improvements in disease diagnosis and would probe new biomarkers, continuous researches including ours have been carried out in the last decade to address the problem [5][6][7]. At present, the microarray technology and bioinformatics processing are considered to be promising tools for genomic analysis and could be well applied to identify genetic or epigenetic alterations in carcinogenesis and drug resistance, which become a necessary complement to experimental research [8]. Considering recent developments in open-access datasets like the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA), the exploration of key genes and detection of functional pathways have been implemented in EC [9]. The GEO and TCGA database contains thousands of clinical information and gene sequencing data, allowing for well-rounded analysis of various cancers.
The gene expression profiling interactive analysis (GEPIA), acts as a web server for gene expression profiling, survival analysis and correlative analyses on the basis of different tumor characteristics such as grades or stages [10]. Epigenetics which covers fields of aberrant DNA methylation, dysregulated noncoding RNA and altered post-translational histone modification refers to heritable changes in gene expression which is not associated with an alteration in DNA sequence but plays an essential role in carcinogenesis and resistance detection [11]. Aberrant DNA methylation is most widely explored and may become an effective detection indicator [12]. Up to now, there was no relevant analysis of bioinformatics focused on progestin resistance of EC and the exploration of methylation marker of resistant genes was needed.
In this research, bioinformatics analysis was applied to reveal the differential expressed genes (DEGs) that leaded to progestin resistance based on microarray datasets from GEO databases and screen out significant hub genes. Gene-related microRNAs (miRNAs), transcription factors (TFs), methylation status and survival analysis as well as biological functions and pathways were also integrated to explore the mechanisms and potential therapeutic value of these DEGs in resistance by constructing networks. Tumor Immune Estimation Resource (TIMER), Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA) were utilized to detect underlying biological mechanisms. Our results may help understand the pathogenesis of progestin resistance. Moreover, it may provide insight regarding the novel treatment for EC.

Microarray data and Data procession
The Gene Expression Omnibus (GEO) is a public repository for data storage. In the present study, the gene expression profiling data sets (GSE121367) was obtained from GEO database. It included endometrial cancer cell line Ishikawa and IshikawaPR which were established from Ishikawa cell as an acquired medroxyprogesterone acetate (MPA) resistant subline. Normalized data of GSE121367 was downloaded from GEO database and further processed by the "limma" R language package to identify differentially expressed genes (DEGs) between IshikawaPR and Ishikawa cell lines. P value < 0.05 and |log fold change(FC)| > 2 were set as criteria to screen DEGs. The Cancer Genome Atlas (TCGA) database of EC was used to verify the expression status and survival function of hub genes.

Functional and pathway enrichment analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID, http://davidd.ncifcrf.gov/) is an online program offering systematic and integrative functional annotation tools for researchers to explore biological meaning behind large list of genes [13]. In this study, the DAVID database and Metascape (http://metascape.org/) were introduced to perform both Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the top 250 DEGs [14]. P < 0.05 was set as the cut-off criterion.
Data analysis of Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA) In order to explore biological pathways of different groups, GSEA software (https://www.broadinstitute.org/gsea/index.jsp) was used. The annotated gene sets of c5.all.v7.0.symbols.gmt and h.all.v7.0.symbols.gmt were downloaded from the website and considered as the reference gene sets. The number of permutations was 1,000. Other parameters were set to default. Significant difference at P-value < 0.05 was defined as the cutoff criteria.
Normalized enrichment score (NES) and false discovery rate (FDR) were applied to determine the statistical differences. The differential results were visualized by Enrichment Map plug-in of Cytoscape [15]. Furthermore, the "GSVA" R package was utilized to explore the pathways most associated with hub genes [16]. On the basis of the median expression of hub gene, 91 EC samples were divided into two groups (high expression and low expression). P < 0.01 was defined as statistically significant.

Protein-protein interaction (PPI) Network Construction and Hub Genes Screening
Firstly, online database Search Tool for the Retrieval of Interacting Genes (STRING, http://stringdb.org) was employed to explore the functional interactions between DEG-encoded proteins and build the PPI network [17]. PPI pairs with combined score ≥ 0.4 were considered as the threshold value. Subsequently, the PPI network was visualized by Cytoscape software.16 [18] and the degree of connectivity was also analyzed. Then the network relationship file was downloaded and the top 10 hub genes were identified by the analysis tool of its plug-in (degrees ranking of cytoHubba) [19].

Validation of the hub genes
Gene Expression Profiling Interactive Analysis (GEPIA, http://gepia.cancer-pku.cn) is a web-based server to conduct out Hub genes expression analysis, correlation analysis and patient survival analysis [10]. Survival analyses of hub genes were conducted by log-rank tests and Kaplan-Meier survival curves were plotted. Then the mutation and DNA copy-number alterations of hub genes were investigated in cBioPortal (https://www.cbioportal.org/), the methylation status of hub genes was validated in Ualcan (http://ualcan.path.uab.edu/), which were based on TCGA analysis. Furthermore, the RNA expression level of hub genes in different carcinoma tissues were shown on basis of TCGA database and the protein level difference of hub genes were displayed by immunohistochemistry (IHC) on basis of the Human Protein Atlas database (HPA, https://www.proteinatlas.org/) [20].

Prediction of Relevant MicroRNAs and Transcriptional Factors (TFs) of Hub Genes
For the hub DEGs identified from the PPI network, the related miRNAs were predicted by miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/php/download.php), which is the database aiming to provide hundreds of published experimentally validated miRNA-gene interactions [21].The Network Analyst (https://www.networkanalyst.ca/faces/home.xhtml) is designed to support integrative analysis of gene expression data through statistical, visual, and network-based approaches [22]. In this study, Network Analyst was introduced to predict hub generated TFs. A list of the hub genes were enrolled into the input area and proceeded step by step, finally the gene-related TFs as well as TFs-gene interactions pairs were presented. Then the results were visualized using the Cytoscape software.16 and the correlations were also evaluated based on GEPIA.

Identification of Aberrantly Expressed Genes
After data preprocessing and quality evaluation, we obtained the expression matrices from the cell samples in the set GSE121367 and further processed with GEO2R tool. Results showed that a total number of 3282 commonly DEGs were screened out from the dataset GSE121367 with 1819 upregulated genes (logFC > 2) and 1463 down-regulated genes (logFC < 2). The volcano plot is shown in Results suggested that changes in BP of top 250 key genes were significantly enriched in "negative regulation of DNA binding", "type I interferon signaling pathway" and "neuron migration" (Fig. 1c), As for CC term, these genes showed enrichment in "anchored component of membrane", "extracellular space" and "multivesicular body" (Fig. 1d), Besides, MF term indicated enrichment predominantly at "protein dimerization activity", "Notch binding" and "transporter activity" (Fig. 1e). To further analyze the DEG-enriched pathways, KEGG pathway analysis was subsequently conducted Supplementary file 3 . As shown in Fig. 1f, it covered "Cell adhesion molecules pathway" and "Endocytosis pathway".
This functional investigation identified that these DEGs had close association with changes of DNA binding and cell adhesion pathway. Furthermore, we also analyzed the pathway of differential genes by the website of Metascape ( Fig. 1g-h) which revealed that differential genes were enriched in "mesenchymal cell differential", "cell-cell adhesion mediated by cadherin" and "negative regulation of DNA binding and cell proliferation".  Data Processing and Gene Set Enrichment Analysis (GSEA) Although differential expression of individual genes could play a critical role in mechanistic aspects of cellular regulation, many compounds and genes are regulated complicatedly. For the sake of categorizing such modules of cellular regulation, bioinformatics approaches for "gene set enrichment" (GSEA) statistics have been developed [23]. The consequences of GSEA analysis revealed that 428 gene sets were upregulated in the IshikawaPR cell line with P-value < 0.05, among which 145 gene sets were significantly enriched at nominal P-value < 0.01. A total of 116 gene sets were upregulated in the Ishikawa cell line with P-value < 0.05, among which 34 gene sets were significantly enriched at nominal P-value < 0.01 (Supplementary file 4). As is shown in Table 3, pathways including interferon gamma response, TNF-a signaling via NF-KB, epithelial mesenchymal transition, interleukin1 beta production and negative regulation of response to drug were significantly enriched in IshikawaPR cell line (Fig. 2b). While in Ishikawa cell line (Table 4), the consequences showed that pathways about mesenchymal to epithelial transition, negative regulation of insulin secretion, apical junction assembly and plasma membrane receptor complex compounds were highly enriched (Fig. 2c). All the differential results of gene sets were visualized by Enrichment Map plug-in of Cytoscape (Fig. 2a).
Altogether, these data suggested that when Ishikawa cells were stimulated and selected by MPA for almost 10 months, the functions of cell signal transduction such as nuclear receptor activity and cytokine biosynthetic process including interferon gamma, interleukin1 production and epithelial cell polarity were dramatically changed.   (Fig. 3a). The red color of a node reflects the upregulated gene and the green means the downregulated gene. The size of the node indicates the connectivity degree and the width of edge displays the combined score. PPI network analysis had been studied by using the threshold value of confidence > 0.4 and connectivity degree ≥ 10. In this network, it contained 159 nodes and 244 edges. The plug-in of cytoHubba in Cytoscape was used to screen hub genes, then a significant submodule was obtained (Fig. 3b), from which we chose the hub genes with high scores. Finally, 10 common hub genes (CDH1, JAG1, PTGES, EPCAM, CNTNAP2, TBX1, MSX1, KRT19, OAS1 and DAB2) were identified in the subnetwork (Table 5 and Table 6). Next, we observed the mutation and DNA copy number alterations of 10 key genes (Fig. 3c).

Gene-Associated MicroRNAs Network Analysis
To explore the potential upstream regulator of hub genes, predicted miRNAs of hub gene were analyzed by miRTarBase database. Main miRNAs with interactions of more than two genes were listed in Table 7. Moreover, Cytoscape was used to construct the hub gene-relevant miRNAs network ( Fig. 4a). There were a total number of 8 genes, 118 miRNAs and 128 genemiRNA pairs contained in the network. Some miRNAs were found to play a critical role in regulating essential genes. Has-miR-335-5p was predicted to regulate PTGES, OAS1, KRT19 and DAB2, has-miR-26b-5p may regulate DAB2, CDH1 and JAG1. Furthermore, genes of PTGES and JAG1 were regulated by has-miR-124-3p, whose high expression may be associated with worse survival (Supplementary Fig. 1), suggesting that it may be involved in tumor resistance and may become a prognostic indicator for endometrial cancer.

Core Transcriptional Factors Mediation Network Analysis of Hub Genes
To identify the transcriptional regulation of the hub genes and assess the effect of TFs on the expression of the hub genes, gene-TFs regulation network was performed by using a Network Analyst network-based service. Totally, 143 TFs were included in the network, constructing 203 gene-TFs interaction pairs (Fig. 4b). In this network, MAZ was considered as the key TF to regulate five hub genes: CDH1, EPCAM, KRT19, MSX1, and TBX1. In addition, TFDP1 plays a second important role in regulating CDH1, CNTNAP2, KRT19 and MSX1 (Table 8). Furthermore, we explored the correlation of hub genes and core TFs of MAZ and TFDP1in endometrial carcinoma using TCGA datasets, respectively. From these results, we found that MAZ and TFDP1 had positive correlations with CDH1, EPCAM, MSX1, KRT19, OAS1 and had negative correlations with JAG1, TBX1 and DAB2 (Fig. 5a-b).
Additionally, we found that MAZ most positively related with EPCAM and negatively related to DAB2.
While TFDP1 was positively associated with gene CDH1 and negatively associated with gene JAG1.

Methylation Status and Expression Validation of Hub Genes in HPA
The initiation of cancer resistance was controlled by both genetic and epigenetic events. Epigenetic changes also make an important impact on occurrence of drug resistance. Therefore, we decided to detect the methylation status of hub genes. As is performed in Fig. 6a, the genes of CDH1, JAG1, EPCAM and MSX1 were aberrantly methylated, which were in consistent with their expression respectively, on the basis of Ualcan website (Fig. 6b). In addition, Immunohistochemistry (IHC) staining obtained from the HPA database showed the dysregulation of the expression of hub genes ( Fig. 6c) we explored the prognostic values of the four essential genes further, which were obtained in GEPIA and displayed in Fig. 7e-

Validation of hub gene MSX1 based on the TCGA dataset
The relationship between MSX1 expression and different pathological grade was measured, which suggested that mRNA expression of MSX1 were significantly correlated with tumor grades (Fig. 8a).
To further investigate the potential functions of MSX1, we performed GSVA on the TCGA data. As shown in Fig. 8b, genes in high expression groups of MSX1 were enriched in "positive regulation of intrinsic apoptotic signaling pathway by p53 class mediator" and "epithelial to mesenchymal transition" pathways. Furthermore, we enrolled survival and follow-up data from TCGA cohort. It suggested that high expression of MSX1 were significantly associated with favorable prognosis in EC patients (Fig. 8c). We also used the tumor-immune system interactions (TISIDB) online database to detect the expression and prognostic value of MSX1 in other types of tumors (Fig. 8d). It showed better prediction function in endometrial cancer (Fig. 8e). In addition, we explored association of hub genes' expression with immune infiltration, the results showed that no or weak associations were observed between MSX1 and infiltration of lymphocytes ( Supplementary Fig. 2), while we found MSX1 was positively associated with immunostimulator NT5E (Fig. 8f-g).

Discussion
The majority of women in reproductive period with endometrial precancerous lesion and welldifferentiated endometrial neoplasm have an intense willing to preserve fertility. However, when closely following up younger patients with the progestin conservative therapy, clinicians observed that more than 30% of them responded poorly to the treatment [4]. The major account for such a high failure rate is that the potential molecular mechanisms of drug resistance remain unclear. Our team has been keen on this research theme for several years [5-7, 24, 25]. On the basis of our previous work, we recently conducted bioinformatics research replying on the dataset from GEO website. By comparing endometrial adenocarcinoma cell line Ishikawa with its counterpart IshikawaPR (MPA resistant cell line), we systematically analyzed their significant different genes, relevant molecular and pathways as well as their DNA copy number and the status of methylation in order to identify the essential candidate genes that promoted the carcinogenesis and progestin resistance.
In our present research, among the 24384 DEGs, we utilized the top 250 genes to conduct Functional and pathway enrichment analysis and GSEA analysis, which may provide novel insights for clarifying pathogenesis of progestin resistant. As was exhibited in DAVID, the genes were enriched in biological processes (BP) of negative regulation of DNA binding, type I interferon signaling pathway, neuron migration and axonogenesis involved in innervation, which made complementary remarks to previous points that EGF/EGFR [26] and insulin [27] signaling pathways may lead to progestin resistance.
Meanwhile, KEGG pathway showed that cell adhesion molecules pathway and endocytosis signals were involved. Furthermore, the results of GSEA reported that nuclear receptor activity, chronic inflammatory response and endothelial cell proliferation pathway were enriched in MPA resistant cells, which were significantly different from MPA sensitive cells, suggesting that prolonged progestin treatment may result in changes of cell membranes and nuclear receptors activity, affect signaling transduction and induce peripheral inflammation response and neurological development. As was said, intestinal bacteria can induce a chronic subclinical inflammatory process, leading to insulin resistance [28], confirming the association between inflammation and resistance, while how inflammatory cytokines evoked progestin resistance remained uncovered and the role neurological factors played in drug resistance provided a new perspective for us to explore.
PPI network of DEGs demonstrated the functional correlations, in which hub genes were screened out.
Afterwards, we investigated the mutation status of essential genes, all mutation rates were less than 10%, suggesting that their expressions were regulated by other factors, other than genomic mutations. Then, the protein expressions were testified in TCGA database, genes with statistically significant differences were enumerated. while PTGES and CNTNAP2 had no remarkable difference between normal and tumor samples, so the results were not shown. Moreover, the relevant microRNA and transcription factors were detected and has-miR-335-5p, has-miR-124-3p, MAZ and TFDP1 played a vital role. As was reported, has-miR-335-5p inhibits invasion and metastasis of thyroid cancer cells [29], breast cancer cells [30] and non-small cell lung cancer [31]. In endometrial stromal sarcomas (ESS), has-miR-335-5p was more highly expressed in patients with tumor metastasis and relapse [32].
In this study, high expression of has-miR-124-3p predicted worse survival, which was consistent with its effect on hepatocellular carcinoma [33]. MAZ was thought to act as a therapeutic target for aerobic glycolysis and the progression of neuroblastoma [34] and prostate cancer bone metastasis [35].
TFDP1 was involved in colon cancer stemness and cell cycle progression [36] and in the endometrium of women with deep infiltrating endometriosis (DIE) [37]. The relevant microRNA and interactions between hub genes and core TFs MAZ and TFDP1 had already been verified, suggesting they may participate in the formation of progestin resistance, while the detailed regulated mechanisms between TFs and hub genes needed to be further confirmed both in vitro and vivo.
Epigenetic processes, such as DNA methylation, are known to regulate specific gene expression [38].
Therefore, in the current study, we researched the methylation status of key genes and presented the Additionally, existing researches reported that MSX1 inhibited the growth and metastasis of breast cancer cells and was frequently silenced by promoter methylation [39], and that MSX1 induced G0/G1 arrest and apoptosis in cervical cancer [40] and that epigenetic regulation of MSX1 associated with platinum-resistant disease in high-grade serous epithelial ovarian cancer [41]. However, the expression and function of MSX1 had not been explored in endometrial resistant lesions. Therefore, although all of the four genes were associated with patients' overall survival, due to its high tissue specificity, MSX1 may become a promising tissue specific marker for endometrial cancer initiation and predict the therapy effect of progestin.
We further verified the expression of MSX1 in a larger TCGA cohort. The results confirmed our findings that MSX1 had significant prediction value in EC and may make an impact on cancer progression through the P53 pathway. We also explored associations of MSX1's expression with the tumor microenvironment and found the interaction between MSX1 and immunostimulator NT5E [42].
However, a series of experimental verification about our found mechanism need to be investigated and its immune-reinforcing effect needs further detection.
In conclusion, a comprehensive analysis of the hub genes based on the GEO dataset will likely shed new light on progestin therapy. Our study highlighted a novel understanding of the potential biological mechanism in progestin resistance and identified the homeobox gene MSX1 as a biomarker to detect the sensitivity and efficacy of progestin treatment.

Conclusions
In summary, by using comprehensive bioinformatics analysis, we have identified DEGs and demonstrated for the first time that MSX1 is likely one of the main molecular indicator of progestin resistance in endometrial cancer. If validated in a larger cohort, MSX1 may become a useful target to detect progestin therapy effect, which is beneficial for younger patients who want to preserve fertility.

Availability of data and materials
All data analyzed during this study are included in this published article.

Ethics approval and consent to participate
Not required.

Consent for publication
This article is original and has not already been published in another journal.          Correlation analyses of hub genes and core TFs. a Correlation analysis between hub genes and MAZ. b Correlation analysis between hub genes and TFDP1.

Figure 5
Correlation analyses of hub genes and core TFs. a Correlation analysis between hub genes and MAZ. b Correlation analysis between hub genes and TFDP1.     Validation of hub gene in the TCGA dataset.