Identification of Molecular Markers Associated With Ovarian Cancer Prognosis Using Bioinformatics Analysis

Background Ovarian cancer is associated with a high mortality rate worldwide. However, the pathogenesis, clinicopathological features, and genetic mechanisms of ovarian cancer are still unclear, and it is essential to identify prognostic markers for its clinical diagnosis and treatment. Here, we utilized bioinformatic analysis to identify potential genes involved in mediating BRCA1 expression to elucidate the potential mechanisms underlying ovarian cancer. Methods Gene expression profiling (GSE14407) was performed to identify differentially expressed genes (DEGs) and analyze the weighted gene co-expression network. We selected the key module that was significantly associated with BRCA1 expression and performed gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses for genes in the hub module. We then screened the hub genes utilizing the Search Tool for the Retrieval of Interacting Genes Database (STRING) and Molecular Complex Detection (MCODE) plug-in in Cytoscape. We validated gene expression levels through The Cancer Genome Atlas and GTEx databases for hub genes and screened genes that were related to overall survival in patients with ovarian cancer using the Kaplan-Meier plotter database. Results In total, 3124 DEGs were detected, including 433 upregulated genes and 2691 downregulated genes. We selected the brown module, which was the most significantly associated with BRCA1 expression. GO analysis showed that the hub module genes were significantly enriched in biological processes, including the mitotic cell cycle process, chromosome segregation, and cell division. KEGG analysis showed that the hub module genes were particularly enriched in the cell cycle, p53 signaling pathway, and small cell lung cancer. We selected 30 hub genes from the protein-protein interaction network, which had 88 nodes and 721 edges. Further analyses identified PBK as a prognosis-3


Abstract
Background Ovarian cancer is associated with a high mortality rate worldwide. However, the pathogenesis, clinicopathological features, and genetic mechanisms of ovarian cancer are still unclear, and it is essential to identify prognostic markers for its clinical diagnosis and treatment. Here, we utilized bioinformatic analysis to identify potential genes involved in mediating BRCA1 expression to elucidate the potential mechanisms underlying ovarian cancer.
Methods Gene expression profiling (GSE14407) was performed to identify differentially expressed genes (DEGs) and analyze the weighted gene co-expression network. We selected the key module that was significantly associated with BRCA1 expression and performed gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses for genes in the hub module. We then screened the hub genes utilizing the Search Tool for the Retrieval of Interacting Genes Database (STRING) and Molecular Complex Detection (MCODE) plug-in in Cytoscape. We validated gene expression levels through The Cancer Genome Atlas and GTEx databases for hub genes and screened genes that were related to overall survival in patients with ovarian cancer using the Kaplan-Meier plotter database.
Results In total, 3124 DEGs were detected, including 433 upregulated genes and 2691 downregulated genes. We selected the brown module, which was the most significantly associated with BRCA1 expression. GO analysis showed that the hub module genes were significantly enriched in biological processes, including the mitotic cell cycle process, chromosome segregation, and cell division. KEGG analysis showed that the hub module genes were particularly enriched in the cell cycle, p53 signaling pathway, and small cell lung cancer. We selected 30 hub genes from the protein-protein interaction network, which had 88 nodes and 721 edges. Further analyses identified PBK as a prognosis-3 associated hub gene. Notably, PBK expression was significantly increased in ovarian cancer tissues, as demonstrated by immunohistochemistry analysis using samples from the Human Protein Atlas database.
Conclusion PBK was found to be associated with overall survival in patients with ovarian cancer. Our results provide insights into our understanding of the molecular mechanisms and molecular diagnosis of ovarian cancer.

Background
Ovarian cancer is a common gynecological tumor that is the fifth leading cause of death among women worldwide [1]. Notably, ovarian cancer is associated with a high mortality rate because patients experience no or nonspecific symptoms during the early stages of the disease, resulting in delayed diagnosis. More than 75% of patients with ovarian cancer are diagnosed at an advanced stage, limiting treatment efficacy [2,3]. Therefore, early screening and diagnosis of patients with ovarian cancer is essential.
At least 5-10% of ovarian cancer cases exhibit a genetic predisposition to the disease, and most are carriers of breast cancer susceptibility gene (BRCA) 1 or BRCA2 mutations [4]. The risk of developing ovarian cancer varies considerably in women with BRCA1/2 mutations, i.e., 39-58.9% for BRCA1 and 11-34.5% for BRCA2. Thus, women with BRCA1 mutations have a higher risk of developing ovarian cancer [5][6][7]. However, previous studies have shown that cancer is caused by combinations of several genes and pathways, and not just by a single gene.
Accordingly, in this study, we obtained mRNA expression profiles from the Gene Expression Omnibus (GEO) and identified differently expressed genes (DEGs) in ovarian cancer. Furthermore, function annotations and signal pathway enrichment were performed to identify key module genes and construct protein-protein interaction networks for elucidation of novel biomarkers in ovarian cancer.

Identification of DEGs from the GEO database
We chose the ovarian cancer dataset GSE14407 from the GEO database. GSE14407, which was based on the GPL570 platform (HG-U133_Plus_2; Affymetrix Human Genome U133 Plus 2.0 Array), contained 12 paired normal ovarian epithelia and serous papillary ovarian cancer samples. We converted probe names to gene symbols according to the Whole Human Genome Microarray 4x44K G4112F (Probe Name Version). However, many probes can correspond to the same gene symbol, and we used the p value as the criteria for downstream DEG analysis. The probes with the most significant p value were chosen as DEGs. Then, we used the limma package in R [8][9][10] to identify DEGs in normal ovary tissues and paired tumor tissue. Genes that met the following criteria were chosen as DEGs: |log2 fold change| ≥ 1.0 and adjusted p value < 0.05.

Weighted gene co-expression network construction
Expression data for DEGs (3125 genes) were applied to identify co-expression gene modules constructed by weighted gene co-expression network analysis (WGCNA) [11]. We set the soft threshold power as 14, which was the lowest power based on scale-free topology [12]. A topological overlap matrix (TOM) was calculated by adjacency transformation, and the value (1 -TOM) was designated as the distance for identification of hierarchical clustering genes and modules. The minimum module size was set to 100.

Module clinical feature associations
In order to identify modules that were significantly associated with the designated clinical trait (BRCA1 expression level), we selected the module with most significant p value form the heat map of the module/trait relationships for downstream analysis.

Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)
pathway enrichment analyses 5 GO analysis was used to identify the mechanisms affected by the module genes. KEGG pathway analysis was used to identify known metabolic pathways affected by the DEGs.
We used the Database for Annotation, Visualization and Integrated Discovery (DAVID) website to perform the GO function and KEGG pathway enrichment analyses of DEGs; results with p values of less than 0.05 were considered significant [13,14]. We plotted the results using a bar map for the top 10 significant biological processes, cellular components, and molecular functions using the ggplot2.R package based on the p value.
The top five significant pathways were visualized using the GOplot.R package.

Establishment of a protein-protein interaction (PPI) network
We used the Search Tool for the Retrieval of Interacting Genes (STRING) to identify key proteins and construct PPI networks [15]. We uploaded the module genes to the STRING database and set the threshold for the interaction score at greater than or equal to 0.9 (the highest confidence). Additionally, we used Cytoscape software to visualize the PPI networks [16,17] and to analyze key modules constructed by the Molecular Complex Detection (MCODE) function. The k score ≥10 was taken as the criteria to define a module.

Validation of candidate genes
Hub genes from the MCODE analysis were chosen as potential biomarkers for deep analysis and validation. We validate the expression levels of hub genes via the GEPIA website (http://gepia.cancer-pku.cn/) using the following criteria as the cutoff: p < 0.01 and |log 2 FC| ≥1 [18]. We further carried out overall survival analysis for hub genes using the Kaplan-Meier plotter database (http://kmplot.com).

Oncomine database extraction
The Oncomine database (http://www.oncomine.org) [19] is the world's largest oncogenechip database integrated data-mining platform and contains 715 gene expression datasets and 86,733 cancer and paired normal sample sequencing data. Here, the Oncomine 6 database was applied for analysis of DEGs and clinicopathological features for common cancer types.

The Human Protein Atlas analysis
The Human Protein Atlas database (https://www.proteinatlas.org) [20] provides information on the tissue and cellular distribution of 24,000 human proteins based on immunohistochemical analysis. Here, we used the Human Protein Atlas database to evaluate the proteome of patients with ovarian cancer.

Bioinformatics analysis
Ovarian cancer methylation data were downloaded from Xena

Identification of DEGs
In the study, we carried out DEG analysis of normal ovary tissue and paired tumor tissues from the GSE14407 dataset. We identified upregulated and downregulated genes between the two sample types and found a total of 3124 genes that met the cutoff criteria (433 upregulated genes and 2691 downregulated genes; Figure 1A). The DEG heatmap is shown in Figure 1B.

WGCNA
The expression values of 3124 DEGs from the 12 paired normal and cancer tissues were used to construct the co-expression module using the WGCNA tool. One of the most important steps was choosing the appropriate soft-thresholding power that affected the independence and average connectivity of gene co-expression modules. When the power value was selected as 14, the independence was equal to 0.8, and the average connectivity degree of modules was higher ( Figure 2A). As shown in Figure 2B, a power value of 14 was selected to produce different co-expression modules.

Modules corresponding to clinical traits
We identified the co-expression modules with the most significant associations based on correlations between the module eigengene and extra traits (e.g., BRCA1 expression). The result indicated that the brown module (183 genes) was the most positively associated with BRCA1 expression ( Figure 3A). Additionally, we found that the brown modules and  Figure 4C. For pathway enrichment analysis, the module genes were enriched for genes involved in the cell cycle and p53 signaling pathway ( Figure 4D).

PPI network construction
After PPI network construction, we found that the network (score > 0.900) had 88 nodes and 721 edges ( Figure 5A). The most significant module consisted of 30 nodes and 398 edges ( Figure 5B). This result suggested that the identified 30 hub genes could play critical roles in ovarian cancer.

Validation of hub genes
In order to validate the gene expression levels of a large number of samples to assess whether gene expression was concordant between samples from the GSE14407 and OV (TCGA/GTEx) datasets, we uploaded genes in the module to the GEPIA website. We found that the expression levels of 28 of 30 genes in the OV dataset (from TGCA/GTEx) were consistent with gene expression changes in the GSE14407 dataset. The gene expression plots of these genes are shown in Figure S1. We then further explored whether these genes were involved in overall survival; the survival prognosis forest map is shown in Figure 6, and the survival curves related to these genes are shown in Figure S2.

Analysis of the PBK gene in the Oncomine and Human Protein Atlas databases
In this study, Oncomine analysis revealed that PBK expression was significantly different in 66 tumor studies (upregulated in 55 studies, and downregulated in 11 studies; Figure   7A). Additionally, we further showed that PBK expression was upregulated in all ovarian 9 cancer samples ( Figure 7B-D). To increase the credibility of the results, we obtained immunohistochemistry data from the Human Protein Atlas database ( Figure 8A-B). Despite the limitations of the use of database results, our findings showed that PBK expression differed between tumor and normal samples.

Molecular mechanisms of PBK expression in ovarian cancer
The methylation data were downloaded from Xena (https://xenabrowser.net/datapages/).
The results revealed that the PBK expression level was negatively correlated with the methylation level and that hypomethylation resulted in high expression of PBK ( Figure 9A).
Somatic cell mutation data were downloaded from TCGA database, and patients were divided into high and low expression groups according to the median expression of PBK. In both high expression and low expression groups, TP53, CSMD3, and TTN mutations were significantly enriched ( Figure 9B-C). We investigated the association between copy number variations and PBK expression levels according to the cBioportal database (https://www.cbioportal.org/). According to copy number variations, the samples were divided into four groups: deep deletion, shallow deletion, gain, and amplification. The results showed that deep deletion and shallow deletion of PBK were associated with low expression levels, whereas gain and amplification were associated with high expression levels ( Figure 9D). Additionally, GO function and KEGG pathway analyses revealed that genes associated with PBK expression were involved in biological processes, such as cell cycle process, nuclear division, DNA metabolic process, and DNA replication ( Figure 9E), and molecular functions, such as ATP binding, DNA-dependent ATPase activity, and DNA binding ( Figure 9F). The results of KEGG analysis indicated that the pathways were mainly involved in the p53 signaling pathway, small cell lung cancer, prostate cancer( Figure 9G).

Discussion
Despite advances in medical technology in recent years, the mortality rate of patients with ovarian cancer has not significantly changed, and ovarian cancer remains the leading cause of gynecological tumor-related deaths. Because ovarian cancer tends to metastasize and exhibits tumor heterogeneity, most patients are diagnosed only at an advanced stage.
Therefore, identification of more reliable biomarkers and exploration of the underlying molecular mechanisms of ovarian cancer will improve the overall survival and prognosis of patients with ovarian cancer.
BRCA1 is a tumor-suppressor gene that contributes to DNA repair and transcriptional regulation in response to DNA damage. Carriers with mutations in the BRCA1 gene are at high risk of breast cancer, ovarian cancer, pancreatic cancer, and prostatic cancer [21].
The specific mechanism through which mutations in BRCA1 can result in increased susceptibility to ovarian cancer are still unclear, although it is expected that many genes may play important roles in this process.
In this study, we identified 3124 DEGs between normal tissues and ovarian cancer tissues and showed that the genes in modules identified by WGCNA were involved in specific biological process, cellular components, and molecular functions. Further pathway enrichment analysis showed that the module genes were enriched in cell cycle and the p53 signaling pathway. From the results of enrichment analysis, we found that these genes were involved in regulating the biological behaviors of ovarian cancer cells and affected the occurrence and development of ovarian cancer.
To further find key genes that played important roles in the occurrence and development of ovarian cancer, we construct PPI networks and screened the key networks. From this analysis, we identified PBK as contributing the overall survival in patients with ovarian cancer. PBK is a serine-threonine kinase that regulates cell cycle-related processes, including cell growth, immune responses, and DNA damage repair, and can accelerate tumorigenesis in various types of cancer, including gastric cancer, oral cancer, prostate cancer, and cervical cancer [22]. Subsequent analysis showed that PBK was strongly associated with ovarian cancer and that the expression of PBK in ovarian cancer tissues was significantly higher than that in normal tissues. Overall, these findings indicated that PBK may contribute to clinical prognosis in patients with ovarian cancer.
PBK has been studied as potential biomarker in various types of cancers, including gastric cancer, colorectal cancer, lung adenocarcinoma, hepatocellular carcinoma, and breast cancer. Overexpression of PBK protein is frequently detected in gastric cell lines and primary gastric tumor samples and is related to gastric tumor occurrence, metastasis, and invasion. Ohashi et al. found that knockdown of PBK in PBK-overexpressing gastric cell lines inhibits cell proliferation by activating p53 in a TP53 mutation-dependent manner and inhibits cell migration/invasion by upregulating phosphatase and tensin homolog (PTEN) in a TP53 mutation-independent manner [23]. Additionally, Su et al. found that the 5-year survival rate was higher in patients with high PBK expression than in patients with low PBK expression in a study of colorectal cancer [22]. Lei et al. also reported that PBK overexpression affected the survival times of patients with lung adenocarcinoma [24], and Shih et al. found that PBK promoted cell migration by activating the phosphatidylinositol 3-kinase/PTEN/AKT signaling pathway in lung cancer [25]. Moreover, Yang et al. showed that PBK can be a potential diagnostic biomarker and therapeutic target for patients with hepatocellular carcinoma and can promote cell migration and invasion by activating the ETV4-uPAR signaling pathway [26]. Finally, Park et al. found that PBK acted as a cancer/testis antigen with oncogenic activity in breast cancer [27].
To further explore the molecular mechanisms underlying the action of PBK in ovarian cancer, we analyzed somatic cell mutation data and found that TP53, CSMD3, and TTN mutations were significantly enriched in both high and low expression groups.
Additionally, the expression levels of PBK were found to be positively associated with copy number variations, suggesting that samples with copy number amplification may have high PBK expression. DNA methylation is a critical type of DNA modification occurring at the epigenetic level and is modulated by DNA methyltransferases [28]. During cancer development, abnormal methylation often occurs, resulting in abnormal expression of multiple key genes [29]. In this study, hypomethylation of PBK resulted in higher expression of PBK and could be related to poor prognosis in patients with ovarian cancer, suggesting that PBK may act as a biomarker and therapeutic target in ovarian cancer. To identify the molecular function of PBK expression in ovarian cancer, the first 500 genes significantly associated with PBK were selected from the cBioportal database for further analysis. The GO function analysis indicated that these genes were involved in cell cycle process, nuclear division, DNA metabolic process, and DNA replication. The molecular functions of these genes were mainly associated with ATP binding, DNA-dependent ATPase activity, and DNA binding. The pathways analysis were related to the p53 signaling pathway, small cell lung cancer, prostate cancer. However, further studies are necessary to fully elucidate the molecular functions of PBK in ovarian cancer.

Conclusion
In summary, we used DEG analysis and WGCNA to identify key module genes closely associated with clinical traits (e.g., BRCA1 expression). Further analysis showed that the PBK gene was associated with prognosis in patients with ovarian cancer. Bioinformatics analysis was used to elucidate the potential molecular mechanisms through which PBK contributed to ovarian cancer. In the future, more in-depth studies are needed to confirm association between PBK expression and ovarian cancer.