Combinatorial Bioinformatics Analysis Reveals Novel Biomarkers for Improved Ovarian Cancer Prognosis

Background: Given the high incidence and lethality of ovarian cancer (OC) among women, it is imperative to investigate potential biomarkers of prognostic and therapeutic significance. The objective of this study was to identify significant differentially expressed genes (DEGs) associated with poor prognosis and to explore their underlying mechanisms. Methods: We acquired three microarray datasets (GSE14407, GSE36668 and GSE18520) from the public GEO database. We compared a total of 73 cancerous and 26 normal samples originating from ovarian tissues. GEO2R and Venn diagram tools were used to obtain DEGs, followed by gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis via the Database for Annotation, Visualization and Integrated Discovery (DAVID). Subsequently, a protein-protein interaction (PPI) network was constructed and visualized in Cytoscape. Results: Among the three analyzed datasets, a total of 232 DEGs were common. The 108 upregulated genes were significantly enriched in cell adhesion, cellular response to interleukin-1, positive regulation of transcription from DNA/RNA, and transcription; in extracellular matrix/region, anchored membrane component, cell junction and Golgi membrane; and in sequence-specific DNA binding, transcription factor activity, RNA polymerase II regulatory region, and DNA binding. The PPI network analysis via the MCODE plug-in revealed a total of 14 upregulated genes. Kaplan-Meier plotter analysis revealed that 9 genes were associated with significantly worse survival among OC patients while 4 genes exhibited no significant effect. Gene Expression Profiling Interactive Analysis (GEPIA) showed that 13 DEGs had significantly higher expression in the ovarian cancerous tissues compared to the normal ones. Repeated KEGG analysis showed that 11 genes (CDC6, CCNE1, BUB1B, CCNB2, BUB1, SFN, TTK, CDC20, PTTG1, CDK1 and CDKN2A) were mainly associated with the cell cycle while 2 genes (SFN and RRM2) were related to the p53 signaling pathway.
Conclusion: Our findings identify potential upregulated DEGs


software. Afterwards, the Database for Annotation, Visualization and Integrated Discovery (DAVID) was employed to analyze the common DEGs, including their roles in biological process (BP), cellular component (CC), molecular function (MF) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Furthermore, to identify the core genes, we performed an integrated analysis using a protein-protein interaction (PPI) network combined with the Cytoscape Molecular Complex Detection (MCODE) app. Prognostic information for the candidate DEGs (significance threshold p < 0.05) was obtained from the online database Kaplan Meier Plotter. Finally, the expression of the DEGs in normal and cancerous ovarian tissues was validated (p < 0.05) using Gene Expression Profiling Interactive Analysis (GEPIA). Taken together, the initial analysis revealed only 26 candidate genes. Repeated KEGG analysis of these genes showed that 11 DEGs (CDC6, CCNE1, BUB1B, CCNB2, BUB1, SFN, TTK, CDC20, PTTG1, CDK1 and CDKN2A) were enriched in the cell cycle pathway while 2 DEGs (SFN and RRM2) were significantly enriched in the p53 signaling pathway. In conclusion, our combinatorial bioinformatics analysis reveals potential OC biomarker genes which could be useful for targeted and effective prognosis among OC patients.

Identification of DEGs in OC
The data from three expression profiles (GSE14407, GSE36668 and GSE18520) consisted of 12 OC vs 12 normal tissues, 8 OC vs 4 normal tissues and 53 OC vs 10 normal tissue samples, respectively. All microarray datasets were based on platform GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array) (Table 1). The Venn diagram-based results revealed that a total of 232 DEGs were common among the three datasets. Among these genes, 108 were upregulated (log FC > 0) while 124 genes were downregulated (log FC < 0) in the OC tissues compared to the normal ones (Fig. 1 & Table 2).

DEGs functional enrichment and KEGG analysis
Next, all 232 DEGs were subjected to GO analysis by DAVID. The results indicated that for BP, the upregulated genes were mainly enriched in cell adhesion, cellular response to interleukin-1, positive regulation of transcription from DNA/RNA, and transcription. For CC, these genes were enriched in extracellular matrix/region, anchored membrane component, cell junction and Golgi membrane. As for MF, the upregulated genes were mainly enriched in sequence-specific DNA binding, transcription factor activity, RNA polymerase II regulatory region, and DNA binding.
On the other hand, GO analysis for downregulated genes showed that for BP, these DEGs were enriched in mitotic nuclear division, cell division, mitotic cell cycle/cell cycle transition, protein catabolic process and microtubule-based movement. Further, for CC, the downregulated genes exhibited enrichment in anaphase-promoting complex, kinesin complex, microtubule, bicellular tight junction and integral membrane component. Finally, for MF, these DEGs were mainly enriched in ATP binding, microtubule motor activity/binding, protein serine/threonine kinase activity, serine-type endopeptidase activity and G-protein coupled receptor activity (Table 3).   Fig. 2A).
Next, to find highly interconnected regions (clusters), we used the MCODE app in Cytoscape, which revealed a cluster consisting of 14 genes (Fig. 2B).

Analysis of candidate genes by Kaplan Meier plotter and GEPIA
The survival data of the 14 candidate genes were acquired using KM-plotter (https://kmplot.com/analysis). The results showed that 9 genes were associated with significantly (p < 0.05) worse survival while the remaining 5 did not show a significant effect on survival (p > 0.05) (Fig. 3). In the next step, we compared the mRNA expression levels between normal and OC tissues via GEPIA. We found that 13 of the 14 analyzed genes had significantly higher expression (p < 0.05) in OC tissues compared to the normal tissues (Fig. 4).

Validation of KEGG pathway enrichment results
To gain further insights into possible pathways of 13 DEGs with higher expression, we re-analyzed them using  Table 5) while two (SFN and RRM2) were significantly enriched (p = 1.3E-8) in the p53 signaling pathway (Fig. 6 & Table 5).

Discussion
The present study represents a focused effort to identify potential OC-related biomarkers that could be useful in effective OC prognosis. We used integrated bioinformatics tools to analyze three microarray datasets (GSE14407, GSE36668 and GSE18520) and processed the data from a total of 73 cancerous (OC) and 26 normal ovarian tissue samples. The analysis with GEO2R revealed a total of 232 common DEGs (p-value < 0.05 and |logFC| > 2), including 108 upregulated (log FC > 0) and 124 downregulated (log FC < 0) genes.
Further, GO and pathway enrichment analysis via DAVID showed that for BP, the upregulated genes were involved in cell adhesion, cellular response to interleukin-1, positive regulation of transcription from DNA/RNA, and transcription. For CC, these genes were enriched in extracellular matrix/region, anchored membrane component, cell junction and Golgi membrane. As for MF, the upregulated genes were mainly enriched in sequence-specific DNA binding, transcription factor activity, RNA polymerase II regulatory region, and DNA binding. The KEGG pathway analysis showed that 26 genes were mainly involved in cell cycle, p53 signaling and oocyte meiosis pathways. Re-analysis of these candidate genes showed that 13 DEGs had significantly higher expression in OC tissues compared to the normal tissues. Of these, 11 genes (CDC6, CCNE1, BUB1B, CCNB2, BUB1, SFN, TTK, CDC20, PTTG1, CDK1 and CDKN2A) were found to be significantly enriched (p < 0.05) in the cell cycle pathway, while 2 genes (SFN and RRM2) were significantly enriched (p < 0.05) in the p53 signaling pathway. Our findings reveal new potential candidate genes that could further be targeted for improved ovarian cancer prognosis.
Cell division cycle 6 (CDC6) is a novel initiator of DNA replication and plays a vital role in the initiation and regulation of the cell cycle [8]. Several studies have reported that CDC6 plays a key role in human cancers including squamous cell carcinoma (SCC) of the head and neck, nasopharyngeal carcinoma and lung cancer [9][10][11]. Remarkably, significantly higher expression of CDC6 has been reported in epithelial ovarian cancer (EOC) tissues compared to normal ones [12]. The prognostic significance of this gene in OC and other forms of cancer [13] makes it an important target for therapeutic strategies.
Furthermore, our results revealed additional genes, namely cyclin-dependent kinase 1 (CDK1), cyclin E1 (CCNE1) and cyclin-dependent kinase inhibitor 2A (CDKN2A), that exhibited significantly higher expression in the OC tissues. A recent study by Yunoki and coworkers demonstrated that these three genes were significantly upregulated in sebaceous gland carcinoma (SGC) of the eyelid [14]. Interestingly, CDKN2A is among the most widely studied genes for its tumor suppressive activity. Also, any mutation in this gene or disruption of its functional regulation is frequently associated with different types of human cancers [15,16].
BUB1 is a serine/threonine kinase that binds centromeres during mitosis. Studies have demonstrated that upregulation of BUB1 is associated with various types of human cancers and with their clinical prognosis [17]. Another study reported that a higher percentage of BUB1 protein positivity denotes an advanced stage and higher degree of differentiation in endometrial carcinoma patients [18]. On the other hand, BUB1B, a mammalian homolog of yeast Mad3, has been shown to promote tumor proliferation and is related to worse survival rates in different forms of cancer including breast, colorectal, prostate and gastric cancer [19][20][21].
Remarkably, higher expression of stratifin (SFN) has been designated as a universal abnormality and is associated with progression of lung adenocarcinoma [22]. In human ovarian cancer, the prognostic importance of SFN has been demonstrated earlier. For example, regarding the clinical significance of SFN mRNA, a study has shown that SFN is a cell cycle-related checkpoint gene that is associated with oncogenesis.
Higher expression of SFN was observed in different cells of OC patients, and it was reported that higher expression of SFN is associated with age and cancer levels [23]. In short, SFN plays a vital role in the regulation of the cell cycle and in OC pathogenesis [24].
RRM2, or ribonucleotide reductase subunit M2, maps to chromosome 2p25, a region that lacks structural variations in cervical cancer samples [25]. In addition to being a potential indicator of poor prognosis [26][27][28][29][30], overexpression of RRM2 has frequently been observed in various forms of cancer including gastric, lung, adrenocortical and nasopharyngeal cancer and neuroblastoma [31][32][33][34]. In our analysis, RRM2 was associated with the p53 signaling pathway. Given the well-known role of p53 signaling pathways in cancers [35], RRM2 could be a potential prognostic biomarker in OC.
Several studies have demonstrated that the variable expression of the above-described genes is associated with different forms of cancer (including OC), and these genes could therefore be potential candidates for improved OC prognosis. While our study provides useful information on potential OC biomarkers, these findings should be validated experimentally in future work.

Conclusion
Taken together, our comparative bioinformatics analysis of normal and OC tissues revealed a total of 12 DEGs (CDC6, CCNE1, BUB1B, CCNB2, BUB1, SFN, TTK, CDC20, PTTG1, CDK1, RRM2 and CDKN2A) that were significantly enriched in the cell cycle and p53 signaling pathways. As supported by several studies, these genes could be further targeted for their OC prognostic value. The findings of our study will be useful to investigate the pathogenesis of OC and to develop better prognostic approaches with improved therapeutic significance.

Acquisition of microarray expression datasets
In this study, we obtained three datasets (GSE14407, GSE36668 and GSE18520) from the free, public GEO database of NCBI (https://www.ncbi.nlm.nih.gov/geo/). A total of 73 OC samples were compared with 26 normal ovarian samples originating from all datasets. Detailed information on the annotation platform and the number and type of analyzed samples is given in Table 1.

Data processing and identification of DEGs
GEO2R is a web-based interactive tool that can efficiently compare two groups of samples obtained under the same experimental conditions [36]. The comparative analysis of all datasets was performed in GEO2R; subsequently, the results were exported in TXT format and processed via Microsoft Excel. We used Venn, an online tool, to identify the common DEGs. The DEGs with log FC > 0 were considered upregulated genes, while the DEGs with log FC < 0 were designated as downregulated genes.
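The filtering and intersection steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the pipeline used in the study: it assumes each GEO2R export has already been parsed into a mapping from gene symbol to (p-value, log FC), it applies the p < 0.05 and |log FC| > 2 cut-offs reported in the Discussion, and the function names, the placeholder genes (GENE_X, GENE_Y) and all numeric values are hypothetical.

```python
def select_degs(table, p_cutoff=0.05, lfc_cutoff=2.0):
    """Return {gene: logFC} for genes passing both cut-offs."""
    return {
        gene: lfc
        for gene, (p, lfc) in table.items()
        if p < p_cutoff and abs(lfc) > lfc_cutoff
    }


def common_degs(tables):
    """Intersect the DEG sets of all datasets and split them by direction."""
    selected = [select_degs(t) for t in tables]
    shared = set(selected[0])
    for s in selected[1:]:
        shared &= set(s)
    # Keep only genes whose sign of logFC agrees in every dataset.
    up = sorted(g for g in shared if all(s[g] > 0 for s in selected))
    down = sorted(g for g in shared if all(s[g] < 0 for s in selected))
    return up, down


# Toy values (hypothetical, not taken from the GEO datasets).
gse_a = {"CDC6": (0.001, 2.8), "SFN": (0.01, 3.1),
         "GENE_X": (0.2, 4.0), "GENE_Y": (0.001, -2.5)}
gse_b = {"CDC6": (0.004, 2.2), "SFN": (0.03, 2.6),
         "GENE_X": (0.01, 3.0), "GENE_Y": (0.002, -3.0)}
gse_c = {"CDC6": (0.02, 3.5), "SFN": (0.2, 2.9),
         "GENE_Y": (0.01, -2.1)}

up, down = common_degs([gse_a, gse_b, gse_c])
print("upregulated:", up)
print("downregulated:", down)
```

In this toy input a gene is retained only if it passes both cut-offs in all three tables, which mirrors the role of the Venn diagram in the analysis.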
GO functional analysis [37] and KEGG pathway enrichment analysis [38] were performed via DAVID (https://david.ncifcrf.gov/tools.jsp) to predict the potential functions of the candidate DEGs. A cut-off value of p < 0.05 was used to identify significant BP, CC and MF terms for the DEGs.
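To make the p < 0.05 enrichment cut-off concrete, the sketch below computes a one-sided hypergeometric p-value for the overlap between a gene list and a pathway. It is a simplified stand-in: DAVID reports a modified Fisher exact p-value (the EASE score), which is more conservative than the plain hypergeometric tail shown here. The function name and all counts in the example (background size, pathway size) are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of at least k pathway genes in a list of n genes,
    drawn from a background of N genes of which K belong to the pathway."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 11 of 13 listed genes fall in a pathway of 120 genes,
# against a background of 20000 annotated genes.
p_value = hypergeom_enrichment_p(k=11, n=13, K=120, N=20000)
print(f"enrichment p-value: {p_value:.3g}")
print("significant at 0.05:", p_value < 0.05)
```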

Construction of PPI network and module analysis
To evaluate the PPI information of the target DEGs, the online Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) was used (https://string-db.org/) with a confidence score > 0.4. Subsequently, we used the MCODE app in Cytoscape [39] to identify the PPI network modules (degree cutoff = 2, max depth = 100, k-core = 2 and node score cutoff = 0.2).
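Two of the parameters above, the confidence score > 0.4 and k-core = 2, can be illustrated with a short Python sketch. It is not a reimplementation of MCODE, which additionally weights nodes by local network density and grows clusters outward from seed nodes; it only shows how low-confidence edges are discarded and how a k-core is extracted. The function names, the placeholder genes (GENE_W, GENE_Z) and the edge scores are hypothetical.

```python
from collections import defaultdict


def build_network(edges, min_score=0.4):
    """Adjacency sets for interactions above the confidence cut-off."""
    adj = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def k_core(adj, k=2):
    """Iteratively drop nodes with fewer than k neighbours."""
    core = {node: set(nbrs) for node, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in core.items() if len(nbrs) < k]:
            for nbr in core.pop(node):
                core[nbr].discard(node)
            changed = True
    return set(core)


# Toy interactions with hypothetical confidence scores.
edges = [
    ("CDK1", "CCNB2", 0.99),
    ("CDK1", "CDC20", 0.95),
    ("CCNB2", "CDC20", 0.90),
    ("CDC20", "BUB1", 0.80),
    ("BUB1", "GENE_Z", 0.30),   # below the 0.4 cut-off, ignored
    ("TTK", "GENE_W", 0.70),
]

network = build_network(edges)
core_genes = k_core(network, k=2)
print(sorted(core_genes))
```

In this toy network, BUB1, TTK and GENE_W each keep only one neighbour after filtering, so they fall outside the 2-core.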

Survival analysis and validation of expression of the candidate genes
To assess the effect of multiple genes on survival based on GEO data, we used a well-known web-based tool, Kaplan Meier-plotter [40]. The hazard ratio (HR) with 95% confidence interval (CI) and the log-rank P value were calculated and displayed on the plot. Furthermore, the validation of candidate DEGs was performed via the GEPIA web tool, which is based on RNA-seq expression data from thousands of samples from the TCGA and GTEx projects [41].
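The survival curves that such a tool displays are based on the Kaplan-Meier product-limit estimator, which can be sketched in a few lines of Python. This is an illustration only: it does not compute the hazard ratio or the log-rank test, and the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time of each patient
    events : 1 if the event (death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up (months) for a small high-expression group.
high_times = [5, 8, 12, 20]
high_events = [1, 1, 0, 1]
for t, s in kaplan_meier(high_times, high_events):
    print(f"t = {t:>2} months, S(t) = {s:.2f}")
```

Censored patients (event = 0) reduce the number at risk without lowering the survival estimate, which is why the patient censored at 12 months does not add a step to the curve.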

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Figure 3
The online tool KM-plotter was used to acquire the prognostic information of 13 core genes. A total of 9 genes exhibited a significantly (p < 0.05) worse survival rate while 4 genes did not show a significant difference in survival (p > 0.05).

Figure 4
The comparative analysis of gene expression between normal and OC tissues was performed using the online web resource GEPIA. A total of 13 genes were significantly upregulated in the OC tissues compared to the normal ovarian tissues (*p < 0.05). The grey color denotes normal tissues and the red color represents the tumor tissues.

Figure 5
Repeated KEGG pathway analysis of 14 significantly upregulated DEGs from OC tissues. A total of 11 genes were significantly enriched in the cell cycle pathway in the G1, S, G2 and M phases. Ink4a and ARF represent CDKN2A, CycE means CCNE1, Mps1 denotes TTK, and CycB means CCNB2.