Comprehensive analyses reveal three-gene signatures in ovarian cancer

Background: Ovarian cancer is a common cancer that affects the quality of women’s life. With the limitation of the early diagnosis of the disease, ovarian cancer has a high mortality rate worldwide. However, the molecular mechanisms underlying tumor invasion, proliferation, and metastasis in ovarian cancer remain unclear. We aimed to identify, using bioinformatics, important genes and pathways that may serve crucial roles in the prevention, diagnosis, and treatment of ovarian cancer. Methods: Three microarray datasets (GSE14407, GSE36668, and GSE26712) were selected for whole-genome gene expression proling , and differentially expressed genes were identied between normal and ovarian cancer tissues. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses were performed using DAVID. Additionally, a protein-protein interaction network was constructed to reveal possible interactions among the differently expressed genes. The prognostic values of the hub genes were investigated using Gene Expression Proling Interactive Analysis (GEPIA) and the KM plotter database. Meanwhile, the mRNA expression analysis of the hub genes was performed using the GEPIA database. Results: We obtained 247 upregulated and 530 downregulated differently expressed genes, and 52 hub genes in the signicant gene modules. Enrichment analysis revealed that the hub genes were signicantly ( P < 0.05) associated with proliferation. Additionally, BIRC5, CXCL13, and PBK were revealed to be signicantly associated with the clinical prognosis of patients with ovarian cancer. Immunohistochemical staining results obtained from the Human Protein Atlas revealed that BIRC5, PBK, and CXCL13 were highly expressed in ovaria cancer tissues.


Background
Ovarian cancer (OC) is the sixth most common meladies among women worldwide. However, even with advances in detection and therapeutic methods, the mortality rate in women is still high [1]. The most common cause of high mortality in OC is that patients experience nonspeci c symptoms, if any, in the early stages of the disease that can result in delayed diagnosis. More than 75% of OC patients are diagnosed at an advanced stage, which hinders the implementation of effective treatment measures [2,3]. Therefore, early screening and diagnosis is essential for OC patients.
To date, many genome-wide expression pro les have been acquired by gene-chip microarrays. We can identify new screening or treatment targets for different diseases using unbiased bioinformatic tools, and also understand potential biological functions and molecular mechanisms by analyzing gene expression data using pathway pro ling software. A large number of datasets have been generated by gene-chip microarray in OC patients [4]; differently expressed genes (DEGs) can be analyzed by gene ontology (GO) function annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses to explore the possible mechanisms for disease development [5]. This prompted us to investigate potential biomarkers in OC and further validate the molecular mechanism through cellular experiments in the future.

Identi cation of DEGs
We identi ed a total of 786 genes that met the above criteria, including 247 up-regulated genes and 539 down-regulated genes (Supplementary Table 1). Then, we plotted a heatmap to represent the expression of DEGs by pheatmap package in R( Supplementary Fig. 1).

GO function and KEGG pathway enrichment
We carried out GO function and KEGG pathway enrichment analyses with the criteria of p < 0.05 to explore the potential molecular mechanisms for DEGs using DAVID (Fig. 2). We presented the top ten signi cant terms for BP, CC, and MF by bubble plot based on p-values using the ggplot2.R package; while the top ten signi cant terms in the KEGG pathways were visualized using the GOplot.R package. The GO function enrichment analysis results showed that the DEGs were mainly enriched in the terms involving biological processes, including cellular protein modi cation processes, response to organic substances, regulation of signaling, regulation of cell communication, and positive regulation of metabolic processes (Fig. 2a). The cellular components were mainly involved in the extracellular region, membrane-bounded vesicles, extracellular exosomes, extracellular vesicles, and extracellular organelles (Fig. 2b). Regarding molecular function, the DEGs were signi cantly enriched in protein binding, protein homodimerization, calcium ion binding, heparin binding, and microtubule activities (Fig. 2c). The most enriched KEGG pathway terms for the DEGs were metabolic pathways, pathways in cancer, PI3K-Akt signaling pathway, proteoglycans in cancer, and ECM-receptor interaction (Fig. 2d).

PPI networks and modules
We uploaded the DEGs to the STRING website to construct the PPI network that were visualized using Cytoscape software. A total of 779 nodes with 1548 edges with scores ≥ 0.900 (the highest con dence) were selected to construct the PPI networks. The two signi cant modules were selected using the MOCODE plug-in. The rst module consisted of 28 nodes and 356 edges ( Supplementary Fig. 2a). In this module, 28 DEGs were up-regulated according to the gene expression pro ling in the module. The second module consisted of 24 nodes and 144 edges ( Supplementary Fig. 2b). In this module, 9 genes were upregulated and 15 genes were down-regulated according to the gene expression pro ling among the common hub genes in the module. For a more in-depth understanding of the common hub genes, we performed GO function and KEGG pathway enrichment analyses via the DAVID website. we found that the DEGs in Module 1 were mainly associated with the mitotic cell cycle process (BP), the microtubule cytoskeleton (CC), and heterocyclic compound binding (MF) (Supplementary Fig. 3a); whereas, DEGs in Module 2 were mainly associated with the G-protein coupled receptor signaling pathway (BP), the extracellular region (CC), and receptor binding (MF) (Supplementary Fig. 3b).

Validation of DEG expression levels and survival analysis of common hub genes
The prognostic information involving PFS and OS was obtained from the KM plotter and GEPIA database respectively. We found that BIRC5 (log-rank p = 1.6e-02), PBK (log-rank p = 3.5e-02), CXCL13 (log-rank p = 9.4e-03) were signi cantly correlated with the PFS of ovarian cancer patients based on the KM plotter. In order to determine the reliability of the prognosis values, we further analyzed the effect of the above genes on the overall survival of ovarian cancer patients using GEPIA. We found that only PBK (log-rank p = 4.4e-02), BIRC5 (log-rank p = 1.3e-02), and CXCL13 (log-rank p = 4.2e-04) were correlated with the overall survival of ovarian cancer patients. Moreover, the mRNA expression levels of PBK, BIRC5, and CXCL13 were also higher in OV tissues compared to normal tissues based on GEPIA database analysis (Fig. 3a).

Real hub gene validation
We further compared the transcriptional levels of the real hub genes in cancer tissue with normal tissues using ONCOMINE databases. ONCOMINE analysis revealed that the mRNA expression levels of BIRC5, CXCL13, PBK were signi cantly up-regulated in ovarian cancer tissues compared to normal tissues (Fig. 3b). Immunohistochemical staining data from HPA demonstrated the deregulated protein expression level of PBK, CXCL13, and BIRC5 (Fig. 3b). Therefore, we performed downstream analysis for BIRC5, CXCL13, and PBK.

Mining genetic alterations and co-expressed genes
We analyzed PBK, BIRC5, and CXCL13 using cBioportal to explore the genomic alterations in ovarian cancer. Among the three ovarian cancer studies, alterations ranging from 6.54-14.92% were found for the gene sets submitted for analysis (Fig. 3c); the individual frequencies of alteration in PBK, BIRC5, and CXCL13 were shown in Fig. 3d respectively. In order to deepen our understanding on the potential function of PBK, BIRC5, and CXCL13, we investigated the biological processes of the top 50 genes coexpressed with them. Regarding PBK and BIRC5, the mainly enriched biological processes were GO:0007067 (mitotic nuclear division), GO:0051301 (cell division), and GO:0007062 (sister chromatid cohesion) (Supplementary Fig. 4a-b). As for CXCL13, the mainly enriched biological processes were GO:0006955 (immune response), GO:0042110 (T cell activation), GO:0031295 (T cell co-stimulation) ( Supplementary Fig. 4c).

Results of GSEA and GSVA
In order to further understand the potential molecular mechanisms, we performed GSEA and GSVA to explore the pathways enriched in the hub gene high expression group. In the BIRC5 high expression group, the results of GSEA revealed that the top three enriched KEGG pathways were DNA replication, RNA polymerase, and homologous recombination (Fig. 4a). In the CXCL13 high expression group, the results of GSEA revealed that the top three enriched KEGG pathways were primary immunode ciency, intestinal immune network for IgA production, and cytosolic DNA sensing pathway (Fig. 4b). As for the PBK high expression group, the results of GSEA revealed that the top three enriched KEGG pathways were citrate cycle (TCA cycle), RNA degradation, and oxidative phosphorylation (Fig. 4c). Furthermore, GSVA further revealed the signi cant association between BIRC5, PBK, CXCL13 and proliferative process (Fig. 4d-f).

Discussion
Ovarian cancer comprises a histologically and genetically broad range of tumors, including those of epithelial, sex cord-stromal and germ cell origin [5]. In this study, we identi ed potential gene targets for OC by comparing normal ovarian tissues and OC tissues using bioinformatic analyses on the basis of three pro le datasets (GSE14407, GSE36668, and GSE26712). We identi ed 247 up-regulated and 539 down-regulated DEGs, which we then inspected with GO function and KEGG pathway enrichment analyses. The results of the enrichment analysis revealed that the genes were mainly enriched in the various types of cancer-related functions and pathways. In order to identify the protein interactions and key modules, we uploaded the DEGs to the STRING website. A PPI network that contained 779 nodes/genes and 1548 edges was constructed. The rst key module was chosen from the PPI network: it contained 28 nodes/genes and 356 edges. The second key module contained 24 nodes/genes and 144 edges. In order to identify the prognosis value of the common hub genes from the above two gene modules, we performed the PFS and OS analyses using KM plotter and GEPIA database respectively; we con rmed the transcription level of the hub genes using the GEPIA database. Then, we identi ed CXCR4, BIRC5, CXCL13, and PBK for the follow-up analysis; we also found that CXCR4, BIRC5, CXCL13, and PBK were strongly positive in the ovarian cancer data set of the Oncomine database. However, CXCR4 was excluded by HPA database analysis because of the lack of immunohistochemical staining data, so we performed the downstream analysis for PBK, BIRC5, and CXCL13. Based on the cBioportal database, we found that the genomic alteration of the real hub genes ranged from 6.54-14.92% in the three ovarian cancer datasets. To clarify the biological role of PBK, BIRC5, and CXCL13, we selected top 50 genes coexpressed with them and performed Go function analysis to explore the potential biological behavior related to these hub genes. Regarding the co-expressed genes of PBK and BIRC5, the biological processes were enriched in the mitotic nuclear division, cell division, and sister chromatid cohesion. Regarding coexpressed genes of CXCL13, the biological processes were mainly enriched in the immune response, T cell activation, T cell costimulation. The results of GSEA and GSVA indicated that many cell cycle-related pathways were enriched in the high-expression groups of the real hub genes, suggesting their contribution to ovarian carcinoma cell proliferation. Moreover, our results also revealed that CXCL13 could be related to the immune regulation in ovarian cancer microenvironment.
In total, we performed comprehensive bioinformatic analysis to identify three gene signatures that were related to the survival and prognosis of ovarian cancer. BIRC5 encodes the protein survivin, which is a member of the apoptosis inhibitor family. Survivin is highly expressed in a variety of cancer tissues compared to normal tissues. It was reported that expression level of survivin (BIRC5) gene is signi cantly increased in primary prostate cancers and metastases, but there is no difference between recurrent and non-recurrent prostate cancer [7]. The chemokine ligand 13 (CXCL13) and its chemokine receptor 5 (CXCR5) play fundamental roles in in ammatory, infectious and immune responses. And the CXCL13/CXCR5 signaling axis is a crucial role in the cancer cell growth, survival, invasion, and metastatic dissemination [8]. Wei et al. reported that intra tumoral CXCL13 expression can be used as an independent prognostic marker for patients after gastric cancer resection, and that the expression level of CXCL13 may appear as a predictive biomarker of response to postoperative chemotherapy in these patients [9].
PDZ-binding kinase (PBK) promotes malignant tumor progression in various cancers such as colorectal and oral cancer. Su et al. found that PBK/TOPK expression detected by the immunohistochemical staining was signi cantly associated with prognosis in colorectal cancer patients based on a retrospective analysis [10]. Chang et al. found that PBK was highly expressed in oral cancer patients, and that high PBK/TOPK expression could be a favorable prognosis marker for these patients [11]. Our study further identi ed that PBK could be a potential biomarker in OC; however, the results need to be further validated by molecular and cellular experiments.

Conclusion
In conclusion, we identi ed three potential therapeutic targets for ovarian cancer: BIRC5, CXCL13, and PBK. These genes have been previously reported to be involved in many pathways associated with tumorigenesis, but the link with ovarian cancer have not been completely identi ed. All of the candidate DEGs that are ovarian cancer-related genes should be further con rmed via molecular biological experiments. Therefore, we will further investigate the biological function of PBK, BIRC5, and CXCL13 in ovarian cancer by cytology experiments.

Microarray data preprocessing and analysis
We downloaded the microarray les GSE14407, GSE36668, and GSE26712 from the public GEO database [12,13](https://www.ncbi.nlm.nih.gOV/geo/) which included 12 OC tissues and 12 normal ovary tissues, 4 OC tissues and 4 normal ovary tissues, and 185 OC tissues and 10 normal ovary tissues, respectively. Because there is no gene symbol in the list of GSE14407, GSE36668, and GSE26712 datasets, we converted the probe names to the gene symbol according to the Whole Human Genome Microarray 4 × 44K G4112F (Probe Name Version). However, a large number of probes can correspond to the same gene symbol, so we used the p-value as the criterion for the downstream differential gene expression analysis. The probes with the most signi cant p-values were chosen to correspond to DEGs (Fig. 1).

Data processing of DEGs
We merged the GSE14407, GSE36668, and GSE26712 datasets and performed batch normalization using the surrogate variable analysis (sva) and limma packages [14][15][16][17]. Then we further used the limma R package to identify the DEGs in the normal ovarian tissue and paired tumor tissue. Genes that met the following criteria were chosen as DEGs [18]: |log2-fold change| ≥ 1.0 and adjusted p-values < 0.05 (moderated t-statistics corrected by Benjamini and Hochberg's method).

GO function and KEGG pathway enrichment analyses
We performed GO function and KEGG pathway enrichment analyses for the DEGs. GO function can reveal the possible mechanisms involving these DEGs in the three aspects of biological function, including the biological process (BP), molecular function (MF), and cellular component (CC) [19]. KEGG pathway enrichment analysis can provide the known signal metabolic pathway in which these DEGS are involved. We used the DAVID website to perform the GO function and KEGG pathway enrichment analyses for DEGs according to the p-value < 0.05 criteria. The DAVID website (https://david.ncifcrf.gov/) is a webaccessible program that provides comprehensive function enrichment tools to understand the biological meanings for genes.

PPI networks and module analysis
We used the Search Tool for the Retrieval of Interacting Genes (STRING) to nd the key proteins and construct PPI networks. The STRING database contains more than 24.6 million proteins and 2 billion interactions from 5,090 organisms [20]. We uploaded the DEGs to the STRING database and set the interaction score ≥ 0.9 (the highest con dence) as the threshold. We used the Cytoscape software to visualize the PPI networks [21,22]. We investigated the key modules that were constructed by Molecular Complex Detection (MCODE) in Cytoscape software. The analysis parameters k-score ≥ 8 and p < 0.05 were set as the signi cant thresholds. We performed the GO function and KEGG pathway enrichment analyses for the DEGs in the key modules.

Validation of hub genes
In order to validate the reliability of the hub genes, we uploaded said genes into the Gene Expression Pro ling Interactive Analysis (GEPIA) database [23]. We identi ed gene expression with |log fold change| > 1.0 and p-value < 0.001 as different expression genes. In this study, the effect of the hub genes on overall survival (OS) and progression-free survival (PFS) was evaluated combining the GEPIA and Kaplan-Meier plotter databases, respectively. The OS and PFS were determined by splitting ovarian cancer patients into two groups based on the median expression of hub genes (high vs. low expression) and assessing with log-rank test.
Oncomine database and The Human Protein Atlas database analyses Oncomine database (http://www.oncomine.org) [24]is a publicly accessible, online cancer gene information mining platform for the purpose of cancer genome expression analysis. To date, the Oncomine database has collected 715 datasets and 86,733 cancer and normal tissue sample data. This database was used to identify the different gene expression and perform clinical and pathological analyses for various types of cancer. In this study, to investigate the correlation between the expression of the hub genes and tumorigenesis, we evaluated the expression level of the hub genes in common cancer types using the Oncomine database. We further analyzed the protein expression and distribution of the hub genes in ovarian cancer tissues and normal tissues by The Human Protein Atlas (HPA) database.
Exploring cancer genomic data in TCGA The cBioPortal for Cancer Genomics (http://cbioportal.org) [25]is a cancer genome database that helps facilitate research from multidimensional cancer genomic data analysis. cBioportal comprises a collection of more than 200 studies, including all TCGA projects and published studies from the literature.
In this research, we explored the genetic alterations and co-expression of hub genes. In order to investigate the potential biological functions of real hub genes, we further performed GO function enrichment analysis for the top 50 co-expressed genes respectively.
Gene set enrichment analysis (GSEA) and gene set variation analysis (GSVA) We utilized the GSEA software to perform GSEA analysis of the real hub genes with merged dataset (GSE14407, GSE26712, and GSE36668). Then, we used the "GSVA" R package to further explore the pathways most signi cantly correlated with these hub genes [26]. In this study, we divided the merged datasets (GSE14407, GSE36668, and GSE26712) into two groups according to the median expression level of real hub genes. Terms with normalized enrichment score ≥ 1 and false discovery rate < 0.25 were identi ed in the GSEA. We considered terms with a p value 0.001 statistically signi cant in the GSVA.

Construction of the miRNA-mRNA network
The miRDB database (http://www.mirdb.org/) [27,28]is an online platform for miRNA target prediction and functional annotation. Targets in the miRDB were based on the bioinformatic tool MirTarget that predicts miRNA targets with machine learning methods. In this study, we used miRDB to predict the miRNA associated with our hub gene targets. Next, the miRNA-mRNA regulatory networks were visualized via Cytoscape.

Declarations
Ethics approval and consent to participate Not applicable