Identification of hub genes and a prognostic model associated with ovarian cancer

Background Ovarian cancer (OC) is one of the most lethal gynecological cancers worldwide. The pathogenesis of the disease and outcomes prediction of ovarian cancer patients remain largely unclear. This study aimed to explore the key genes and biological pathways in ovarian carcinoma development, as well as construct a prognostic model to predict patients’ overall survival. Results We identified 164 up-regulated and 80 down-regulated differentially-expressed genes (DEGs) associated with ovarian cancer. GO term enrichment showed DEGs mainly correlated with spindle microtubes. For KEGG pathways, cell cycle was mostly enriched for the DEGs. The PPI network yielded 238 nodes and 1284 edges. Top three modules and 10 hub genes were further filtered and analyzed. Three candidiate drugs targeting for therapy were also selected. Thirteen OS-related genes were selected and an eight-mRNA model was present to stratify patients into high- and low-risk groups with significantly different survival. Conclusions The identified DEGs and biological pathways may provide new perspective on the pathogenesis and treatments of OC. The identified 8-mRNA signature has significant clinical implication for outcome prediction and tailored therapy guidance for OC patients. diagnoses and personalized treatment in ovarian cancer.

Background Ovarian cancer (OC) is one of the most lethal gynecological cancers worldwide. The pathogenesis of the disease and outcomes prediction of ovarian cancer patients remain largely unclear. This study aimed to explore the key genes and biological pathways in ovarian carcinoma development, as well as construct a prognostic model to predict patients' overall survival. Results We identified 164 up-regulated and 80 down-regulated differentially-expressed genes (DEGs) associated with ovarian cancer. GO term enrichment showed DEGs mainly correlated with spindle microtubes.
For KEGG pathways, cell cycle was mostly enriched for the DEGs. The PPI network yielded 238 nodes and 1284 edges. Top three modules and 10 hub genes were further filtered and analyzed. Three candidiate drugs targeting for therapy were also selected. Thirteen OS-related genes were selected and an eight-mRNA model was present to stratify patients into high-and low-risk groups with significantly different survival. Conclusions The identified DEGs and biological pathways may provide new perspective on the pathogenesis and treatments of OC. The identified 8-mRNA signature has significant clinical implication for outcome prediction and tailored therapy guidance for OC patients.

Background
Ovarian cancer (OC) is the most lethal malignant diseases in the female reproductive system, with over 200,000 new cases and 150,000 deaths happen each year worldwide [1]. Epithelial ovarian cancer accounts for 80-95% of ovarian malignancies, listed as the most common histological type.
Since the ovaries locate in the deep pelvis, with mere symptoms emerging at the beginning of ovarian morbid change, the early detection for the malignancy is truly difficult. Hence, when ovarian cancer is detected, the patient is usually at an advanced stage, with invasion and metastasis accompanied [2].
For patients in the early stage, the 5-year survival rate can reach 85-90%, whereas, for advancedstage patients, the number is mere ~ 20% [3]. Therefore, it is imperative to explore the molecular mechanisms of malignant biological behavior of ovarian cancer cells and to develop more reliable molecular markers for predicting recurrence and evaluating prognosis, further guiding clinicians for therapy.
At present, various high-throughput microarrays and next-generation sequence genomic datasets, which were deposited in the Gene Expression Omnibus (GEO) [4] and The Cancer Genome Atlas (TCGA) databases, have been widely analyzed for identifying differentially expressed genes (DEGs), which could serve as candidate diagnostic or prognostic markers, further effectively improving our understanding of the disease from genetic perspective. Whereas, since the existence of tissue or sample heterogeneity in each independent experiment, as well as the discrepancy of different data processing methods and technology platforms, the DEGs identified from a single-cohort study may generate false positives. Herein, the Robust Rank Aggregation (RRA) method, which analyzes the significant probability of all elements by a probabilistic model, is developed to identify statistically significant genes from multiple datasets and provide more accurate and valuable information for clinical use far beyond one gene list [5]. To date, a bunch of novel prognostic markers has been discovered to potentially improve the efficacy of diagnosis and prognosis of OC [6]. However, these identified markers were only effective for partial stages or grades and were difficult to apply widely.
Hence, a prognostic model, which is based on signature gene expression level, with high discriminating power to effectively assist prognosis prediction for each patient is required in clinical practice.
In this study, we downloaded six primary microarray datasets from the GEO database which contained a total of 265 samples, with 201 ovarian cancer samples and 64 normal samples. The geneset and relative clinical information on ovary tissues of OC patients and healthy females from TCGA and GETx portal were also downloaded. Integrated DEGs between cancerous and normal ovarian samples were screened using the "limma" R package and RRA method. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment of DEGs were performed for next-step functional analysis. The STRING and the Connectivity Map (CMap)online database was then used to analyze the association of DEGs and explore the molecular mechanisms as well as drugs involved in tumorigenesis. Through survival analysis, prognostic mRNAs were also selected. By performing Cox regression analysis, we identified an eight-mRNA signature model with the ability to predict the prognosis of OC patients and independent from clinical factors. Our study provides reliable molecular markers and prognostic models for early detection and outcome prediction, as well as effective drug targets for treating ovarian cancer.

The DEGs among six GEO microarray datasets
The distribution of gene expression from each microarray dataset was presented in volcano plots ( Figure 1A-F), and the top 100 significantly up-and down-regulated genes were displayed in the heatmaps (Figure 2A-F). Through RRA analysis of 6 expression microarrays, we identified 605 DEGs, which consist of 301 up-regulated and 304 down-regulated genes, and displayed the top 20 dysregulated genes by "pheatmap" R package in Figure 3. Next, we analyzed the expression profiles of TCGA and GETx, getting 2253 dysregulated genes. Intriguingly, when combined these 2253 DEGs with the 605 differentially expressed genes from GEO datasets, we found that 244 genes were commonly dysregulated in these two databases, with 164 up-regulated ( Figure 4A) and 80 downregulated genes ( Figure 4B).

GO term and KEGG pathway enrichment analysis of DEGs
To study the potential biological function of the 244 DEGs, we performed biological pathway analysis and identified significantly enriched pathways via Enrichr web tools. In GO term ( Figure 5A), for the BP group, the DEGs were mostly enriched in "regulation of attachment of spindle microtubes to kinetochore", "cellular response to laminar fluid shear stress" and "microtubule cytoskeleton organization involved in mitosis". In MF group, the dysregulated genes were highly correlated to "microtubule-binding", "microtubule motor activity" and "tubulin-binding". As for CC group, the DEGs were closely related to "condensed nuclear chromosome kinetochore" and "mitotic spindle". KEGG pathway analysis showed 244 DEGs highly enriched in "cell cycle" and "Alanine, aspartate, and glutamate metabolism" ( Figure 5B).

PPI network construction and modules analysis
Using the STRING database and Cytoscape software, a totally 244 DEGs were mapped into the PPI network, which included 238 nodes and 1284 edges ( Figure 6A). The PPI enrichment p-value were < 1.0e-16. The top three key modules (Figure 6b-d) within PPI network were then selected (Module 1, MCODE score= 43.391; Module 2, MCODE score= 4.8; Module 3, MCODE score= 3.667) and the biological function of Module 1, which consisted of 47 nodes and 998 edges, was further analyzed. GO analysis indicated that Module1 was mainly associated with "regulation of attachment of spindle microtubules to kinetochore", "condensed nuclear chromosome kinetochore" and "microtubule motor activity". KEGG analysis showed that "cell cycle" and "Oocyte meiosis" were the most highly enriched pathways ( Figure S1).

The screen of Hub genes and their characteristics
The top ten hub genes with the highest degree of connectivity were CDC45, CDK1, TOP2A, CDC20, CCNB1, CEP55, UBE2C, HMMR, FOXM1, and TPX2 ( Figure 6E). The co-expression analysis results of the hub genes demonstrated that these genes actively interacted with each other (Supplementary 2B). Besides, we established the interaction network of ten hub genes with their related genes and explored the biological role ( Figure S2A, C-E) of the network by FunRich. The gene alteration type and frequency, as well as the 50 most frequently altered neighbor genes were also exhibited ( Figure 7).
Gene alteration frequency of 10 hub genes among 606 TCGA ovarian cancer samples was beyond 50%, with most genes showed amplified and multiple altered ( Figure 7A-B). The top 3 most frequent altered genes were FOXM1, CDC20 and CCNB1, with FOXM1 and CDC20 largely amplified while CCNB1 deep deleted. Through analysis of ovarian cancer patients' gene-set from TCGA, we found that CCNB1, UBE2C, CDK1, CEP55 as well as FOXM1 expressed significantly higher in high-grade tumors and predicted worse outcomes since patients overexpressed above genes owned lower overall survival (OS) and disease-free survival (DFS) rates ( Figure 8). The Oncomine database showed results from various studies were consistent to our finding ( Figure S3). The HPA website also demonstrate that proteins translated by such 5 hub genes were overexpressed in ovarian cancer tissues ( Figure   S4). HMMR and TPX2 were also negatively correlated to patients' prognosis while no expression difference was observed in diverse tumor grades and CDC20 was positively associated with tumor grade but not correlated to patients' outcomes.

Related small molecule drugs screening
In total, 244 DEGs were analyzed in CMap to screen small molecule drugs, and the candidate molecules with top ten connectivity score are listed in Table 1. Five of these molecules showed a negative correlation and suggested potential in clinical applications. Among them, Trichostatin A, pyrvinium and 8-azaguanine showed a significantly negative correlation and the stuctures of such candidate molecule drugs was found in the PubChem database and shown in Figure S5.

Construction of prognostic model and evaluation of its predictive ability
Univariate Cox regression analysis revealed that 13 of 240 DEGs were significantly correlated to patients' OS in the training cohort. (Table 2) The 13 OS-related genes were listed as follows: CCND1, SYNE4, CCDC80, TMC4, MCC, FOXQ1, KRTCAP3, CXCR4, IL4I1, DEFB1, CSGALNACT1, KLHL14 and MCUR1. Through LASSO Cox regression, we narrowed the number of 13 prognosis-associated genes into twelve according to the minimum criteria ( Figure 9). Next, based on the multivariate Cox model, analysis showed that the overall survival time of the lower-risk group was significantly longer than the high-risk group (P =1.147e-07) ( Figure 10E). ROC curve analysis revealed the area under the ROC curve (AUC) of the prognostic model was 0.815 ( Figure 10D). Meanwhile, the risk scores ( Figure 10A) of ovarian cancer patients in the training group were ranked for displaying their distribution and the survival status ( Figure 10B) was marked on the dot plot. The expression pattern of 8 prognostic mRNAs between high and low-risk groups was also shown in the heatmap ( Figure 10C). Univariate and multivariate Cox regression analysis concerning the risk score and clinical factors showed that the prognostic model was able to serve as an independent prognostic indicator ( Figure 11A-B). ROC curve analysis also showed that the AUC value of the model was 0.820, much significantly higher than the tumor stage (AUC= 0.542), grade (AUC= 0.574) and patients' age (AUC= 0.701) ( Figure 11C).
Interestingly, when combined the risk score with clinical factors, the ROC curve of combination model was much higher than each alone( Figure 11D).
As for the testing cohort, we divided the group into 78 high-risk and 108 low-risk individuals based on the training cohort' cut-off risk score. The outcomes of low-and high-risk group patients of the testing cohort were also measured and the overall survival time of the high-risk group was significantly shorter than the lower-risk group (P =1.721e-02) ( Figure 12E). The area under the ROC curve (AUC) of the prognostic model was 0.641 ( Figure 12D). The risk scores distribution ( Figure 12A) and survival status ( Figure 12B) of ovarian cancer patients, as well as the 8-prognostic gene expression heatmap ( Figure 12C) in the testing group were also present. Meanwhile, the independency of the prognostic model was confirmed in testing cohort since univariate and multivariate cox regression analysis showed the model correctly predicted high-or low-risk factor group patients' outcomes without relying on any clinical factors ( Figure 13A-B). ROC curve analysis showed that the prognostic model exhibited better sensitivity and specificity when compared to tumor stage, grade and patients' age for the AUC value of the model was much higher than latter ( Figure 13C). In accordance with results from training cohort, the combination of risk score and clinical factors showed better OS prediction capability ( Figure 13D).

The Prognostic Signature correlating to immune cells infiltration
Through TIMER web-tool, we analyzed the relative gene expression levels of six types of immune cells for each patient and found that genes concerning macrophage fraction were expressing significantly higher in the high-risk group (P<0.05) compared to the low-risk group in training cohort ( Figure 14).
Interestingly, same result was also observed in the testing cohort ( Figure 15).

Discussion
In this study, we used the RRA methods to jointly analyzed six GEO ovarian cancer microarrays which contain 201 ovarian cancer and 64 normal samples, identifying 605 DEGs and overlapped them with dysregulated genes of OC cohort from TCGA and GETx portal, finally getting 164 up-regulated and 80 down-regulated genes.
Functional analysis showed that 244 DEGs were significantly enriched in the cell division cycle, to be clear, in the process of the mitotic spindle. Spindle microtubules have been proved to play crucial role in physiological and pathological processes. As for cell division, only when all chromosomes linked to spindle microtubules through kinetochores and the spindle assembly checkpoint is satisfied, this process could step to anaphase [7]. Suraokar et.al found that the mitotic spindle assembly checkpoint and microtubule network were significantly altered in malignant pleural mesothelioma (MPM) while PPI network construction of 244 DEGs includes 238 nodes and 1284 edges, among which we identified 3 key modules. Interestingly, the top1 module was also highly associated with spindle microtubules and chromosome kinetochore, confirming the role of cell cycle in OC pathogenesis. The top ten hub genes from the PPI network were also recognized, which are CDC45, CDK1, TOP2A, CDC20, CCNB1, CEP55, UBE2C, HMMR, FOXM1, and TPX2. Among them, CCNB1, UBE2C, CDK1, CEP55 as well as FOXM1 were found overexpressed in high-grade tumors and predicted worse outcomes. Besides, FOXM1, CDC20 and CCNB1were the most frequent altered genes. These genes were reported to closely associated with the BRCA1/2 mutation process of ovarian cancer. It has been reported that females with BRCA1 or BRCA2 mutations were much more susceptible to get ovarian cancer, , while miR-490-3P could reduce CDK1 expression, impeding ovarian cancer cell proliferation [15] and Alsterpaullone could effectively reverse the drug-resistant trend [16]. Through searching CMap, we found Trichostatin A, pyrvinium and 8-azaguanine negatively correlated to the genomic-wide changes of OC. Trichostatin A has been proved to enhance the apoptotic potential of Palladium nanoparticles and increased the therapeutic potential in cervical cancer [17]. Likewise, cotreatment with BEZ235 and Trichostatin A enhanced autophagic cell death via up-regulating LC3B-II and Beclin-1 expression, finally exerting anti-tumor activity in breast cancer [18]. Pyrvinium was found to inhibit cell autophagy and promote cancer cell death. The combination of pyrvinium with autophagy stimuli improves its toxicity against cancer cells [19]. 8-azaguanine has also been used for treatment of various carcinoma, sarcoma, osteogenic sarcoma, lymphosarcoma and melanoma [20].
Hence, except for cisplatin or PARPi in treating ovarian cancer, such small molecules may also reverse the malignant phenotypes of OC and serve as potential drugs for therapy.
By performing Univariate and multivariate Cox regression analysis, as well as LASSO regression methods for 244 DEGs, we developed an eight-mRNA model that could classify OC patients into the high-and low-risk group with significantly different overall survival. We explored the regulatory mechanism of eight-mRNAs in the signature by searching the published article, the majority of which were reported to be associated with tumorigenesis and tumor proliferation. DEFB1 is commonly considered as a single copy gene that encodes beta-defensin 1 (BD-1), a member of the host defense peptide group. In human cancers, BD-1 is proposed to inhibit cell growth and promote apoptosis, acting as a tumor suppressor [21,22] [31,32]. Note that the role of TMC4 in ovarian cancer pathogenesis has not been studied. This may offer a new direction for TMC4 in ovarian cancer research. More recently, the prognostic value of mRNA-related signature has been reported in several studies [33,34]. However, current traditional clinical risk factors and clinical models have limited success in predicting OC patients' outcomes due to the molecular heterogeneity and false-positive rate. Our LASSO regression model results with independent validation suggested that the combination of eight mRNA has good robustness and reproducibility in predicting prognosis for OC patients independent from traditional clinical risk factors, with the area under the ROC curve (AUC) marked 0.815, significantly higher than tumor stage, grade, and patients' age.
To investigate the biological function of various types of immune cells regarding ovarian cancer, we explored the relative gene of each immune cell type and found that macrophage part expressed expressively higher in high-risk group in both training and testing cohort, pointing out the oncogenic-role of macrophages in ovarian cancer development. Macrophage, as a type of immune-related cells, has already been considered to be closely associated with the malignant biological behavior of various cancers. M2 macrophage-like tumor-associated macrophages (TAMs) secreted EGF and then activated EGFR on tumor cells, further upregulating VEGF/VEGF-R signaling in surrounding tumor cells to finally mediate ovarian cancer cell proliferation and migration [35]. The exosomal miR-223 derived from macrophages under hypoxia condition reduced PTEN expression and led to increased PI3K/AKT signal activation, consequently mediated the drug resistance of EOC cells [36]. Hence, the potential therapeutic tools targeting macrophages may provide new perspective into ovarian cancer treatment.
There are some highlights of our study. First of all, from the RRA integrated approach, we jointly explored six ovarian cancer datasets in GEO databases and TCGA OC patients' data matrix, finding some interesting niche factors and unique modules that were not seen earlier. Second, this study discovered a multitude of differentially expressed genes and hub genes between ovarian cancer and normal tissues, as well as the mutation condition of these genes. This information summarizes the genetic-level changes during the pathophysiological process of ovarian cancer and provides possible target molecules for further research. Third, the prognostic model in our study can effectively predict OC patients' outcomes, which provide a new method to help gynecologists evaluate patients' prognosis in clinical practices.
However, some aspects of our study required improvement. First, our research was completely based on public data analysis, additional experimental studies are needed to explore the detailed molecular mechanism regarding DEGs and pathways, as well as the eight-mRNA prognostic model. Second, candidate drugs targeting hub genes and immune-related cells are needed to explore and clinical trials are also needed to verify whether the hub genes can be targeted to truly exert therapeutic effects and whether the prognostic model effectively predicts patients' outcomes for OC. That said, with the ever-increasing accessibility and volume of genomic data from clinical patients and the continued development of technologies and algorithms, the bioinformatic analysis will further promote the progress of accurate diagnoses and personalized treatment in ovarian cancer.

Conclusion
In summary, in this study, we identified 244 genes commonly dysregulated in ovarian cancer, with 164 up-regulated and 80 down-regulated genes. The most enriched biological pathways regarding DEGs were cell-cycle related processes. From the PPI network concerning all DEGs which comprised 238 nodes and 1284 edges, we seek out top3 hub modules and top10 hub modules. In addition, we found three molecular drugs may target ovarian cancer for therapy. We also performed cox and lasso regression to present and validate a robust prognostic model aggregating eight signature genes.
Using this model, we could further distinguish patients with an elevated risk of mortality independent of other clinical factors, which may help us to improve our understanding of underlying mechanisms involved in OC and guide for diagnosing and prognosis prediction, as well as rational therapy in clinical practice.

Data collection
Through searching on the GEO Repository with "ovarian cancer", we downloaded the gene expression (STRING) is a database that encompasses the interaction information between known proteins and potentially interacted proteins [43]. In order to explore the correlations between the DEGs, we used the STRING database to construct a PPI network and visualize our results by Cytoscape software [44].
Confidence score > 0.4 was set as significant. Molecular Complex Detection (MCODE) was utilized to select hub modules of PPI networks in Cytoscape [45]. We set the degree cut-off = 2, node score cutoff = 0.2, k-score = 2, and max. Depth = 100 as the criterion. Then, the significant modules were performed by go and KEGG analysis. Top 10 genes were defined according to the high degree of connectivity in string network. [46]. The co-expression analysis of 10 hub genes was performed by STRING, either.

Validation of the hub genes
We downloaded the raw gene-set of ovarian cancer patients from TCGA to explore the expression differences of hub genes in low and high-grade tumor tissues of ovarian cancer and draw the survival plot using Kaplan Meier plotter web-tool [47]. The gene and protein expression level of grade-related hub genes were then confirmed by Oncomine and The Human Protein Atlas(HPA)database [48,49].
The Meanwhile, we explored the genetic alteration information of the selected 10 hub genes in ovarian cancer patients by plug-in cBioPortal (cBio Cancer Genomics Portal) tool, which deposits the genomics profiles of various cancer types and provides analysis and visualization of the genomics datasets [50].

Identification of candidate small molecule drugs
The Connectivity Map (CMap) database was able to predict potential drugs which might reverse, or induce the biological state encoded in certain gene expression signatures in OC [51]. The 244 DEGs from our study were used to query the CMap database. The enrichment scores which represent the similarity were calculated (ranging from − 1 to 1). The positive connectivity score means an inducing influence on the input signature, whereas drugs with negative connectivity score present reversion impact on the characteristic in human cell lines and are considered as candidate therapeutic molecules. After sorting all instances, the connectivity score of various instances was filtered by p value < 0.05. Next, we investigate the structures of these candidate molecular drugs from the Pubchem database (https://pubchem.ncbi.nlm.nih.gov/).

Establishment and evaluation of the prognostic model
The 374 ovarian cancer patients from the TCGA project were randomly classified into the training cohort (n = 188) and the testing cohort (n = 186). OS-related genes were determined by performing univariate Cox regression analysis in the training cohort with the "Survival" R package and further selected for the next-step screen. Least Absolute Shrinkage and Selection Operator (LASSO) is a parameter selection algorithm which shrinks all high-dimensional regression coefficients and generates the penalty regularization parameter λ via the cross-validation routine by "glmnet" R package. To select the optimal prognostic mRNAs, we adopted LASSO regression among the selected candidate genes and further perform multivariate Cox proportional hazards regression to evaluate their independent prognostic values. The risk-score model for predicting outcomes of ovarian cancer patients was the sum of each optimal prognostic mRNA expression level multiplying relative regression coefficient weight calculated from the multivariate Cox regression model.
All training cohort patients were classified into high-and low-risk groups according to the median risk score. The Kaplan-Meier curves of two diverse groups were plotted using "survfit" function and the receiver operating characteristic (ROC) curve was unfolded for OS prediction to estimate the sensitivity and specificity of the prognostic model. Cox multivariate analysis was also performed to exam whether the risk score was independent of the clinical characters, such as age, tumor stage, and grade. Next, we used the testing group to check the efficacy of the prognostic risk model. Each individual in the testing cohort was also categorized as high-risk or low-risk case by comparing the patient's risk score with the cut-off value calculated from the training cohort. Kaplan-Meier curve analysis, time-dependent ROC analysis, and cox multivariate analysis were performed, either.

Searching tumor-infiltrating immune cells associated with patients' prognostic signatures
The TIMER web-tool allows for systematical evaluations of the relationship between the six immune

Ethics approval and consent to participate
Not applicable.

Consent for publication
Written informed consent for publication was obtained from all participants.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interest
The authors declare that they have no competing interests.

Funding
This work was supported by the National Nature Science Foundation of China (81872119) and Jiangsu province medical innovation team (CXTDA2017008).