Identification of DEGs in CC
The CC expression microarray datasets (GSE6791, GSE9750, GSE39001 and GSE63514) were firstly standardized (Figure S1). With limma package, 256 DEGs were filtered from GSE6791 (60 downregulated and 196 upregulated), 236 DEGs from GSE9750 (136 downregulated and 100 upregulated), 98 DEGs from GSE39001 (38 upregulated and 60 downregulated), 489 DEGs from GSE63514 (177 upregulated and 312 downregulated). DEGs from the 4 microarray datasets were exhibited in volcano maps (Figure S2A-D) and heatmaps (Figure S3A-D). We analyzed the four microarray datasets via the limma package and then with RRA method according to their log-folding variation values ((|log2fold change (FC)| > 1 and adj. p < 0.05). The RRA method was based on the theory that genes in each experiment were randomly ordered. For the genes ranking higher in the experiment, the possibility of differential expression is inversely proportional to the value of P. Through analytic hierarchy analysis, we sorted out 74 up-regulated genes, and 73 down-regulated genes (Table 2). Finally, the R-heatmap software was performed to plot the top 40 up- and down-regulated genes (Fig. 2).
Table 2
Screening DEGs in cervical cancer by integrated microarray
DEGs | Gene name |
Upregulated | MMP1 IFI44L MMP12 PLOD2 CXCL11 RFC4 HOXC6 TOP2A ISG15 SPP1 PRC1 RAD51AP1 SYCP2 DTL APOBEC3B MLF1 TTK CDKN2A INHBA NDC80 EZH2 CXCL8 KIF23 CTHRC1 MCM2 KIF20A KIF4A CDK1 MICB CENPE LAMP3 IFI44 CXCL13 CDC25B TOPBP1 CDC7 LMNB1 RRM2 CDC6 HLTF SYNGR3 NCAPG RYR1 ENO2 SMC4 NEK2 CXCL1 MCM3 C1QB SNX10 PPAP2C KIF11 MCM5 AIM2 AURKA MAD2L1 PBK CENPF KIF15 KNTC1 NTS FBXO5 STIL SPAG5 TRIP13 EPCAM MELK MMP3 KIF14 GZMB CDC20 CEP55 BUB1B NEFH CRNN MAL CRISP3 CRCT1 SPINK5 ALOX12 KRT13 SPRR3 PPP1R3C KRT1 SPRR1B APOD SPRR1A CFD IVL CXCL14 RHCG SPRR2B ENDOU EDN3 CRYAB TMPRSS11D CLIC3 HPGD UPK1A TST KLK11 BBOX1 EMP1 CLCA4 KLK12 SCNN1B NSG1 SLURP1 SOSTDC1 IL1R2 KRT4 KLF4 DSG1 PPL DEFB1 SULT2B1 GPX3 TGM3 ALOX12B ECM1 NDN ISL1 CRABP2 FCGBP PTGDS TMPRSS11B CCND1 FOSB GYS2 TGFBR3 LDOC1 S100A7 KRT2 FGFBP1 PRSS3 ID4 ADRB2 VAT1 SLIT2 CLDN8 KLK10 PTK6 SPINK2 AR PDGFD AKR1B10 EREG |
Downregulated |
Functional analysis of DEGs
The biological annotations of DEGs in CC were obtained with an online analysis tool named DAVID, which had GO analysis of up- and down-regulated genes (P༜0.05). The GO analysis of DEGs covered three aspects: molecular function, biological processes and cellular composition (Figure S4A). The upregulated genes were significantly enriched in microtubule binding, tubulin binding and ATPase activity (Fig. 3A), and the down-regulated genes in serine-type peptidase activity, serine hydrolase activity and serine-type endopeptidase (Fig. 3B). These results indicated that most DEGs were prominently enriched in structural molecule activity, midbody, kinesin complex and microtubule motor activity. (Figure S4 B-C and Figure S5A). Clusterprofile was performed to analyze the DEGs. The result showed that the upregulated genes were mostly enriched in DNA replication, Oocyte meiosis and Cell cycle. (Fig. 3C), and the down-regulated genes in Arachidonic acid metabolism, prostate cancer and signaling pathways regulating pluripotency of stem cells (Fig. 3D). The pathway-gene network (Figure S5B) suggested that the cell cycle was the most important term in the biological processes of CC.
PPI network construction and modules selection
The PPI network of DEGs was constructed, including 147 nodes (74 up-regulated genes and 73 down-regulated genes) and 562 edges (Fig. 4A). Degrees ≥ 30 was set as the cutoff. A total of 16 genes, such as CDK1, TOP2A, NCAPG, and KIF11, were showed the most significant difference in expression (Fig. 4B). A significant module was selected with plug-in MCODE, including 27 nodes and 343 edges (Figure S6A). GO and KEGG pathway enrichment analysis indicated that the genes in this module were mainly related to microtubule binding, tubulin binding, cell cycle and oocyte mitosis (Figure S6B and 6C).
Small molecule drugs screening
CMap network was used to analyze 147 differentially expressed genes into two groups (74 in up-regulated group and 73 in down-regulated group). After the signature query, the three compounds with the highest negative enrichment score (apigenin, thioguanine, and trichostatin A) were identified as potential therapeutic agents for CC (Table 3). The three-dimensional chemical structure of these three compounds is shown in Fig. 5.
Table 3
Rank | CMap name | Mean | N | Enrichment | P-value |
1 | apigenin | -0.848 | 4 | -0.973 | 0 |
2 | thioguanosine | -0.799 | 4 | -0.96 | 0 |
3 | trichostatin A | -0.386 | 182 | -0.261 | 0 |
4 | viomycin | 0.751 | 4 | 0.924 | 0.00004 |
5 | adiphenine | 0.779 | 5 | 0.907 | 0.00004 |
6 | atractyloside | 0.651 | 5 | 0.839 | 0.00024 |
7 | chrysin | -0.745 | 3 | -0.939 | 0.00032 |
8 | isoflupredone | 0.768 | 3 | 0.937 | 0.00044 |
9 | nadolol | 0.649 | 4 | 0.866 | 0.00044 |
Validation of hub genes
We validated DEGs at GEPIA website, including survival analysis and tissue sample expression analysis (Figure S7 and Figure S8). Eight genes (APOD, CXCL8, MMP1, MMP3, PLOD2, PTGDS, SNX10 and SPP1) had the same trend in both the above analysis. We literature-reviewed these eight genes, finding that only PTGDS and SNX10 had not been reported to be associated with CC. Therefore, we used GSE7803 and GSE29576 to validate PTGDS and SNX10 (Figure S9). The results showed that PTGDS had high expression levels in normal tissues and low expression levels in tumor tissues, while SNX10 showed an opposite profile. We further validated the two genes using immunohistochemistry (Fig. 6A-B) and ONCOMINE, obtaining the results consistent with those from the GEO database (Fig. 6C-D). The area under the curve of PTGDS was 0.919 and that of PTGDS was 0.905, suggesting that both can distinguish CC and normal tissue and have a good diagnostic efficiency (Fig. 7A). GSEA was performed to search KEGG pathways enriched in the TCGA samples. The top ten most enriched pathways included “hematopoietic cell lineage”, “adhesion molecules cams”, “vascular smooth muscle contraction”, “systemic lupus erythematosus”, “chemokine signaling pathway”, “t cell receptor signaling pathway”, “cytokine cytokine receptor interaction”, “calcium signaling pathway”, “neuroactive ligand receptor interaction” and “leukocyte transendothelial migration” (Fig. 7B) (adj.p < 0.05). In addition, the univariate and multivariate Cox proportional hazards regression analyses showed that SNX10 and PTGDS were independent prognostic indicators for OS among CESC patients (Table 4).
Table 4
Univariate analysis and multivariate analysis of the correlation of SNX10 and PTGDS expression with OS among cervical cancer patients.
Variables | Univariate analysis | | Multivariate analysis |
HR | 95%CI | p | | HR | 95%CI | p |
Stage(Stage I & Stage II vs Stage III & Stage IV) | 2.338 | 1.364–4.004 | 0.002 | | 3.078 | 1.329–7.129 | 0.009 |
Grade(G1 & G2 vs G3) | 1.221 | 0.947–1.574 | 0.124 | | 0.837 | 0.565–1.240 | 0.375 |
Age(≤ 50 vs ༞50) | 1.263 | 0.755–2.112 | 0.373 | | 1.197 | 0.711–2.011 | 0.498 |
SNX10 | 1.202 | 0.957–1.509 | 0.113 | | 1.424 | 1.103–1.838 | 0.007 |
PTGDS | 0.838 | 0.729–0.962 | 0.012 | | 0.802 | 0.693–0.928 | 0.003 |
To find out the mechanism of abnormal gene expression, we analyzed the gene expression level and methylation level from the Illumina Human Methylation 450 platform based on TCGA data. The association between the methylated expression and the gene expression of the two key driving genes were shown in Fig. 7C-D. The survival analysis showed that the patients with low-expressed and hyper-methylated PTGDS had a worse prognosis than those with high-expressed and hypo-methylated PTGDS (Fig. 7E). However, SNX10 methylation has no statistical significance in survival analysis.
Establishment of Cox regression model
Univariate cox regression analysis screened out seven genes, including APOD, CXCL8, MMP1, MMP3, PLOD2, PTGDS and SPP1 (Figure S10). Multivariate Cox regression analysis screened five genes, including SPP1, CXCL8, PTGDS, PLOD2 and MMP3 (Figure S11). The score for predicting overall survival risk was calculated as followed: Risk score = 0.143* SPP1 + 0.136* CXCL8-0.093* PTGDS + 0.206* PLOD2 + 0.067* MMP3. Based on the risk score, CC patients were divided into low- and high-risk groups. Kaplan-Meier survival analysis suggested that low-risk patients had better overall survival than high-risk patients in the TCGA cohort (Fig. 8A). ROC curve analysis was also completed according to the 5-year survival of the area under the receiver operating characteristic curve (AUC) value. The specificity and sensitivity were both highest when the risk score was 0.738 (Fig. 8B). The distribution of risk score, survival status, and the expression levels of five genes was also analyzed (Fig. 8C-F). The expression levels of five genes in low- and high-risk groups were shown in Figure S12A. The univariate and multivariate Cox proportional hazards regression analyses showed that only the risk score based on five genes was independent prognostic indictor of CC (Figure S12B-C).
The Heatmap showed the expression levels of the five genes in high- and low-risk patients in the TCGA dataset. We observed significant difference in survival state (P < 0.001) and stage (P < 0.05) (Figure S 12D).