Identication of Two Novel Genes SNX10 and PTGDS with Cervical Squamous Cell Carcinoma Prognosis

Background: Cervical cancer (CC) is the primary cause of death in women. This study sought to investigate the therapeutic targets of CC. Methods: We downloaded four gene expression proles from GEO. The RRA method was used to integrate and screen DEGs between CC and normal samples. Functional analysis was performed by clusterproler. We built PPI network by STRING and selected hub modules via MCODE. CMap was used to nd molecules with therapeutic potential for CC. We also validated hub genes in GEO datasets, GEPIA, immunohistochemistry. Cox regression analysis, TCGA methylation analysis and ONCOMINE were carried out. ROC curve analysis and GSEA were also done to dig out the signicance of hub genes. Results: Functional analysis revealed that DEGs were signicantly enriched in binding, cell proliferation, transcriptional activity and cell cycle regulation. PPI network screened 30 prominent proteins, with CDK1 having the strongest association with CC. Cmap showed that apigenin, thioguanine and trichostatin A might be used to treat CC. Eight genes were screened out through GEPIA. Of them, only PTGDS and SNX10 have not been reported in CC related articles. The validation in GEO showed that PTGDS showed low expression in tumor tissues while SNX10 showed high expression in tumor tissues. Their expression proles were consistent with the results in immunohistochemistry. They can distinguish CC and normal tissue and have good diagnostic eciency. GSEA showed that the two genes were associated with the chemokine signaling pathway. TCGA methylation analysis showed that patients with low-expressed and hyper-methylated PTGDS had a bad prognosis than the patients with high-expressed and hypo-methylated PTGDS. Cox regression analysis showed that SNX10 and PTGDS were independent prognostic indicators for OS among CC patients. Conclusions: In conclusion, PTGDS and SNX10 showed abnormal expression and methylation in CC. Both genes could be used to develop new target treatments for CC.

Up to now, the expression pro les of thousands of differentially expressed genes (DEGs) in CC have been researched [5][6][7]. However, the results on some mRNAs are inconsistent. Here we use an unbiased approach to solve this problem.
In our study, we screened DEGs from four pro les downloaded from GEO. PPI network was built by STRING Database and hub modules selected via plug-in MCODE. CMap was used to nd potential molecules associated with CC. We also validated hub genes with GEO datasets, GEPIA, immunohistochemistry and ONCOMINE. ROC curve analysis and GESA were also done to tease out the signi cance of hub genes. The ow chart of this research was displayed in Fig. 1.

Screening DEGs
Keywords "cervical cancer geo accession" were put in the GEO database (https://www.ncbi.nlm.nih.gov/geo/) and the gene expression pro les of GSE6791, GSE63514, GSE39001 and GSE9750 were downloaded. The dataset information is shown in Table 1. We processed unquali ed data by R package. The data is calibrated, standardized and log2-transformed. Gene expression analysis was performed using the limma [8] package in the Bioconductor package. Relevant codes were placed into R. We selected four microarray datasets and analyzed them with limma. The |log2fold change (FC)| > 2 and adjusted p < 0.05 were set as cutoffs. RRA package was download (http://cran.r-project.org/) [9] and R was used for running the instruction code.

Functional analysis based on DEGs
The biological signi cance of DEGs was analyzed with DAVID (https://david.ncifcrf.gov/) database and clusterpro ler [10] (a package visualizing the biological pro les of genes). P < 0.05 was considered to be statistically signi cant.

PPI network integration
Search Tool for the Retrieval of Interacting Genes Database (STRING) [11] (http://www.string-db.org/) was used to assess PPI complex between identi ed and predicted proteins. In addition, the plug-in MCODE [12] of Cytoscape was conducted to select and visualize hub modules in PPI complex.

Identi cation of potential drugs
CMap [13] is a computer simulation method for predicting the potential drug that may induce or reverse a biological state encoded by the gene expression signature. The different probe components commonly between CC and normal samples were screened out with CMap database and divided into the up-and down-regulated groups. An enrichment score representing similarity was calculated. The positive score illustrated that the drug could induce cancer in human; the negative score illustrated the drug function oppositely and had potential therapeutic value.
Page 4/24 2.5 Construction of a prognostic signature Univariate Cox proportional hazard regression analysis was performed based on DEGs. The genes associated with prognosis were de ned using the cutoff value of p < 0.05. Next, a multivariate Cox regression model was constructed with genes of p < 0.01. Cox regression with p < 0.05 was constructed to estimate the risk score of each patient on the expression of the DEGs. Patients were divided into a lowand high-risk group according to their mean scores of prognostic risks. Kaplan-Meier survival analysis was conducted based on the low-and high-risk group. We also performed ROC curve analysis using ve years as the predicted time to assess the predictive value of the outcome. The areas under the ROC curve, sensitivity and speci city were used to describe predictive values.

Validation of key genes
We used GEPIA [14] (Gene Expression Pro ling Interactive Analysis) to analyze the expression and prognostic signi cance of DEGs. After reviewing literature, we screened real hub genes from DEGs. Subsequently the real hub genes were validated into GEO datasets, including GSE7803 and GSE29576, and ONCOMINE database (www.oncomine.org). GSE7803 included 21 CC tissue samples and 10 normal tissue samples. GSE29576 included 45 CC tissue samples and 17 normal tissue samples. ONCOMINE [15] dataset is a publicly accessible online cancer microarray database that enables online analysis on correlations between a candidate gene and various tumors according to DNA and RNA sequence data. The Human Protein Atlas (HPA) (http://www.proteinatlas.org/) was also used to validate the expression of the real hub genes. ROC curve analysis was performed to distinguish normal and cancer tissues.

Survival analysis and mapping of methylation level
Survival analysis on gene methylation and expression was conducted through R package to identify key prognosis-associated genes in CC. To explore the correlation between aberrant methylation and expression of genes, we extracted key genes with methylated expression from the downloaded data on TCGA CESC methylation. Then we evaluated the association between the methylated expression and the gene expression.

Gene set enrichment analysis (GSEA)
Based on the hub gene expression level, the samples were divided into two groups. In order to study the potential function of the DEGs. GSEA [16] (http://software.broadinstitute.org/gsea/index.jsp) was used to research a series of prior de ned biological pathways which might be enriched in the gene rank derived from hub gene between the two groups. Annotated gene set of c2.cp.kegg.v6.0.symbols.gmt in Molecular Signatures Database (MSigDB, http://software.broadinstitute.org/gsea/msigdb/index.jsp) was selected as the reference. Additionally, we used "Clusterpro ler" package in R to handle the datasets, and the "Enrichplot" package to tease out the enriched pathways of the key genes. The adjusted-P< 0.05 was set as signi cant.

Identi cation of DEGs in CC
The CC expression microarray datasets (GSE6791, GSE9750, GSE39001 and GSE63514) were rstly standardized ( Figure S1). With limma package, 256 DEGs were ltered from GSE6791 (60 downregulated and 196 upregulated), 236 DEGs from GSE9750 (136 downregulated and 100 upregulated), 98 DEGs from GSE39001 (38 upregulated and 60 downregulated), 489 DEGs from GSE63514 (177 upregulated and 312 downregulated). DEGs from the 4 microarray datasets were exhibited in volcano maps ( Figure  S2A-D) and heatmaps ( Figure S3A-D). We analyzed the four microarray datasets via the limma package and then with RRA method according to their log-folding variation values ((|log2fold change (FC)| > 1 and adj. p <0.05). The RRA method was based on the theory that genes in each experiment were randomly ordered. For the genes ranking higher in the experiment, the possibility of differential expression is inversely proportional to the value of P. Through analytic hierarchy analysis, we sorted out 74 upregulated genes, and 73 down-regulated genes ( Table 2). Finally, the R-heatmap software was performed to plot the top 40 up-and down-regulated genes ( Figure 2).

Functional analysis of DEGs
The biological annotations of DEGs in CC were obtained with an online analysis tool named DAVID, which had GO analysis of up-and down-regulated genes (P 0.05). The GO analysis of DEGs covered three aspects: molecular function, biological processes and cellular composition ( Figure S4A). The upregulated genes were signi cantly enriched in microtubule binding, tubulin binding and ATPase activity ( Figure 3A), and the down-regulated genes in serine-type peptidase activity, serine hydrolase activity and serine-type endopeptidase ( Figure 3B). These results indicated that most DEGs were prominently enriched in structural molecule activity, midbody, kinesin complex and microtubule motor activity. (Figure S4 B-C and Figure S5A). Clusterpro le was performed to analyze the DEGs. The result showed that the upregulated genes were mostly enriched in DNA replication, Oocyte meiosis and Cell cycle. (Figure 3C), and the downregulated genes in Arachidonic acid metabolism, prostate cancer and signaling pathways regulating pluripotency of stem cells ( Figure 3D). The pathway-gene network ( Figure S5B) suggested that the cell cycle was the most important term in the biological processes of CC.

PPI network construction and modules selection
The PPI network of DEGs was constructed, including 147 nodes (74 up-regulated genes and 73 downregulated genes) and 562 edges ( Figure 4A). Degrees ≥30 was set as the cutoff. A total of 16 genes, such as CDK1, TOP2A, NCAPG, and KIF11, were showed the most signi cant difference in expression ( Figure  4B). A signi cant module was selected with plug-in MCODE, including 27 nodes and 343 edges ( Figure   S6A). GO and KEGG pathway enrichment analysis indicated that the genes in this module were mainly related to microtubule binding, tubulin binding, cell cycle and oocyte mitosis ( Figure S6B and 6C).

Small molecule drugs screening
CMap network was used to analyze 147 differentially expressed genes into two groups (74 in upregulated group and 73 in down-regulated group). After the signature query, the three compounds with the highest negative enrichment score (apigenin, thioguanine, and trichostatin A) were identi ed as potential therapeutic agents for CC (Table 3). The three-dimensional chemical structure of these three compounds is shown in Figure 5.

Validation of hub genes
We validated DEGs at GEPIA website, including survival analysis and tissue sample expression analysis ( Figure S7 and Figure S8). Eight genes (APOD, CXCL8, MMP1, MMP3, PLOD2, PTGDS, SNX10 and SPP1) had the same trend in both the above analysis. We literature-reviewed these eight genes, nding that only PTGDS and SNX10 had not been reported to be associated with CC. Therefore, we used GSE7803 and GSE29576 to validate PTGDS and SNX10 ( Figure S9). The results showed that PTGDS had high expression levels in normal tissues and low expression levels in tumor tissues, while SNX10 showed an opposite pro le. We further validated the two genes using immunohistochemistry ( Figure 6A-B) and ONCOMINE, obtaining the results consistent with those from the GEO database ( Figure 6C-D). The area under the curve of PTGDS was 0.919 and that of PTGDS was 0.905, suggesting that both can distinguish CC and normal tissue and have a good diagnostic e ciency ( Figure 7A). GSEA was performed to search KEGG pathways enriched in the TCGA samples. The top ten most enriched pathways included "hematopoietic cell lineage", "adhesion molecules cams", "vascular smooth muscle contraction", "systemic lupus erythematosus", "chemokine signaling pathway", "t cell receptor signaling pathway", "cytokine cytokine receptor interaction", "calcium signaling pathway", "neuroactive ligand receptor interaction" and "leukocyte transendothelial migration" ( Figure 7B) (adj.p < 0.05). In addition, the univariate and multivariate Cox proportional hazards regression analyses showed that SNX10 and PTGDS were independent prognostic indicators for OS among CESC patients (Table 4).
To nd out the mechanism of abnormal gene expression, we analyzed the gene expression level and methylation level from the Illumina Human Methylation 450 platform based on TCGA data. The association between the methylated expression and the gene expression of the two key driving genes were shown in Figure7C-D. The survival analysis showed that the patients with low-expressed and hypermethylated PTGDS had a worse prognosis than those with high-expressed and hypo-methylated PTGDS ( Figure 7E). However, SNX10 methylation has no statistical signi cance in survival analysis.
Kaplan-Meier survival analysis suggested that low-risk patients had better overall survival than high-risk patients in the TCGA cohort ( Figure 8A). ROC curve analysis was also completed according to the 5-year survival of the area under the receiver operating characteristic curve (AUC) value. The speci city and sensitivity were both highest when the risk score was 0.738 ( Figure 8B). The distribution of risk score, survival status, and the expression levels of ve genes was also analyzed ( Figure 8C-F). The expression levels of ve genes in low-and high-risk groups were shown in Figure S12A. The univariate and multivariate Cox proportional hazards regression analyses showed that only the risk score based on ve genes was independent prognostic indictor of CC ( Figure S12B-C).
The Heatmap showed the expression levels of the ve genes in high-and low-risk patients in the TCGA dataset. We observed signi cant difference in survival state (P < 0.001) and stage (P < 0.05) (Figure S   12D).

Discussion
CC brings on more than 265,700 deaths per year, making it the second most fatal malignancy in women. Nowadays, microarray and high-throughput sequencing technology have been used to identify the potential targets widely for CC treatment. Recent studies always use a single group or a small-size sample, lacking reliability.
This study integratively analyzed the expression pro les of four genes using R software and bioinformatics tools. A total of 147 differential genes were identi ed using RRA analysis, including 73 downregulated genes and 74 upregulated genes. GO functional annotation indicated that the upregulated DEGs were primarily associated with microtubule binding and the downregulated genes with serine-type peptidase activity. KEGG pathway analysis indicated that DEGs were primarily enriched in the cell cycle.
Similar to our ndings,Studies have reported that microtubule binding and cell cycle have an effect on the development of cervical cancer [17]. In other studies, It has shown that microtubule binding plays a role in the development of acute myeloid leukemia cells and the biology of colorectal cancer cells [18,19]. The change of cell cycle is the cause of abnormal proliferation of many tumor cells [20,21].
PPI network screened out 30 prominent CC associated proteins. Next, we found CDK1 was the most relevant protein in the PPI network of CESC. To our surprise, CDK1 was members of the top module 1 and suggesting that the top module 1 plays a crucial role in the PPI complex. Functional analysis indicated that the genes in this module were mainly enriched in microtubule binding, tubulin binding and cell cycle. CDK1 is one of the serine/threonine kinases that regulate the cell cycle by interacting with speci c cellcycle-regulatory cyclins [22]. It has been reported that CDK1 regulated CC [23].It is also involved in the proliferation of many tumors. Y. Zeng et al. found that Cyclin-dependent kinase 1 (CDK1)-mediated mitotic phosphorylation of the transcriptional co-repressor Vgll4 inhibits its tumor-suppressing activity. K.
Bednarek et al. found that CDK1 was involved in the process of laryngeal squamous cell carcinoma [24,25].
Cmap showed that apigenin, thioguanine and trichostatin A might be used to treat CC. Apigenin and trichostatin A have been shown to inhibit breast cancer growth[26, 27]. 6-thioguanine has also potential therapeutic effects on tumors[28]. Our ndings may help create the appropriate drugs for CC treatment.
Eight genes (APOD, CXCL8, MMP1, MMP3, PLOD2, PTGDS, SNX10 and SPP1) were screened from DEGs through GEPIA. Of them, only PTGDS and SNX10 had not been reported in CC research. According to Geo validation results, PTGDS was lowly expressed and SNX10 highly expressed in tumor tissues, which was consistent with the results from immunohistochemistry. TCGA methylation analysis showed that the patients with low-expressed and hyper-methylated PTGDS had a worse prognosis than those with highexpressed and hypo-methylated PTGDS. Cox regression analysis showed that SNX10 and PTGDS were independent prognostic indicators for OS among CC patients. GSEA showed that the two genes were associated with the chemokine signaling pathway. Zhong G et al. suggested that chemokine signaling was involved in the invasion and migration of lung cancer cells [29]. Chemokine signaling has also been reported to affect the progression of breast and hepatobiliary cancer [30,31]. In addition, the prognostic signature was constructed based on the eight hub genes. Of them, ve genes (SPP1, CXCL8, PTGDS, PLOD2, and MMP3) exhibited signi cant prognostic value. The Cox regression analysis showed that only the risk score of the ve genes was an independent prognostic indicator of CC.
Interestingly, all of the above genes have been shown to be associated with cervical cancer. Yan R et al.  [42]. It has also been identi ed that MMP1 participated in breast and ovarian cancer [43,44]. Banik D et al. demonstrated that MMP3 regulated tumor progression [45]. Ji Y et al. proved that C/EBPβ promoted tumor cell invasion and metastasis of colorectal cancer [46]. PLOD2 has been shown to play a role in cervical cancer [35] and renal  [54,55]. The present study is the rst to report the expression and prognosis of these two genes in CC. We found that their methylation is associated with CC prognosis, which has never been reported before.

Consent to publication
Not applicable.

Availability of data and material
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.   Flow chart of the present study.