Prediction and Analysis of Hub Genes in Ovarian Cancer Based on Network Analysis

Objective The aim of the present study was to identify the interactions among transcription factors (TFs), microRNAs (miRNAs) and mRNAs in ovarian serous cystadenocarcinoma (OV) to investigate the mechanisms. Method: Gene expression data were obtained from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were screened between tumor and nontumor samples using the online tool GEO2R based on the limma package of R language. The coexpression between genes was identied using the weighted gene corepression analysis (WGCNA) package of R language to construct a coding-noncoding network. Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were conducted for the genes in the coexpression network. The expression and prognostic value of the hub genes in coexpression were further validated in The Cancer Genome Atlas (TCGA) dataset. The interactions among miRNAs, mRNAs and TFs were predicted using miRGen and mirWalk to construct the TF-miRNA-mRNA regulatory network. The blue module was identied as signicantly associated with OV, and a coexpression network were constructed through WGCNA analysis. The expression and prognostic value of the hub genes in the blue module were validated in the TCGA dataset. A total of 164 common differentially expressed miRNAs (DEMs) were screened from 2 miRNA datasets. A TF-miRNA-mRNA regulatory network composed of 2 TFs, 6 miRNAs and 199 mRNAs was constructed. QRT-PCR was also conducted to identify the expression of the 6 key miRNAs (hsa-mir-193b, hsa-mir-296, hsa-mir-29b-2, hsa-mir-29c, hsa-mir-34a and hsa-mir-421) in ovarian serous cystadenocarcinoma. We constructed a validated TF-miRNA-mRNA network in OV, and TFs might regulate mRNAs by regulating miRNAs to


Introduction
Ovarian cancer is a malignant tumor with one of the highest mortality rates in the female reproductive system, second only to cervical cancer and endometrial cancer [1][2][3]. If ovarian cancer has invaded the pelvic cavity or is accompanied by abdominal distension or abdominal pain, it indicates that the lesion has shifted with regard to the clinical stages. The main clinical treatment for ovarian cancer is cell reduction surgery and combination chemotherapy with platinum and paclitaxel. In patients with ovarian cancer, although combined with cisplatin chemotherapy, the rst sensitivity rate is more than 80%, and platinum resistance is prone to occur [4][5][6]. Most patients eventually relapse and die because of resistance to chemotherapy. Therefore, reducing the recurrence rate of multidrug resistance reversal in ovarian cancer chemotherapy, especially the mechanism and pathway of reversing platinum resistance, has become an important research topic.
MicroRNAs (miRNAs) are a class of endogenous small non-single-stranded RNAs composed of [19][20][21][22][23] nucleotides that regulate cell proliferation, apoptosis, metabolism, gene expression and other physiological and pathological biological processes through posttranscription mechanisms [7][8][9]. Recent studies have proven that miRNAs can regulate approximately 1/3 of the protein-coding genes, and abnormal expression of miRNAs appears in several cancers [10][11][12][13]. A transcription factor (TF) is a protein that controls the transcription process by binding to speci c sequences in a gene regulatory region [14]. TFs can not only directly regulate the expression of mRNAs but also that of miRNAs. Numerous studies have revealed that TFs regulate gene expression by interacting with miRNAs. For instance, Chandra et al. found that two TFs, ZEB1 and ZEB2, could contribute to epithelial-mesenchymal transition (EMT) through interaction with miR-101-1 [15]. In addition, Hua et al. demonstrated that ETS1 could inhibit cell proliferation, invasion and migration by regulating miR-139-5p in hepatocellular carcinoma (HCC) [16]. Thus, TF deregulation may result in the deregulation of miRNAs or mRNAs and induce the appearance of some physical abnormalities. TF-miRNA-mRNA regulation loops may play critical roles, and the TFs, miRNAs, and mRNAs in those loops may serve as important targets for the treatment of ovarian cancer.
Over the past years, microarray technology has been widely applied to cancer research to obtain genetic alterations during ovarian cancer tumorigenesis [17,18]. Bioinformatics methods are necessary to process a great deal of data generated by microarray technology. The National Center of Biotechnology Information (NCBI) Gene Expression Omnibus (GEO, http://www.ncbi. nlm.nih.gov/geo) is a public repository of microarray and next-generation sequencing data storage, which is freely available to users.
In this study, we downloaded 2 original microarray datasets, which were published in NCBI-GEO.
Differentially expressed miRNAs between tumor samples and normal samples were screened using the R software limma package [19]. We performed gene function analysis, including Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Then, we constructed the regulatory network by combining the coexpression network with miRNA-mRNA and TF-miRNA interactions to reveal the potential molecular differences between tumor samples and normal samples. In conclusion, we constructed and analyzed the regulatory networks and possible key signaling pathways using bioinformatics tools to study different molecular mechanisms of ovarian cancer.

Method
Gene datasets and clinical information A series of matrix les for the GSE4122 dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE4122) and the GSE47841 dataset [20] were obtained from NCBI-GEO. The platform used in the GSE4122 dataset was the GPL201 (HG-Focus) Affymetrix Human HG-Focus Target Array. A total of 67 samples were included in the GSE4122 dataset and divided into 4 groups (benign, malignant, borderline malignant and normal human ovaries). However, the sample size of the borderline malignant group which has only 3 samples is too small. Considering the bias of WGCNA algorithm, borderline malignant group was wiped out in the further analysis [21]. Therefore, 64 samples were divided into three groups based on pathology (benign, malignant and normal human ovaries) for the further analysis. Two miRNA datasets were included in the present study and the data search stage was shown in Figure 1. The platform used in the GSE47841 dataset was the GPL14613 (miRNA-2) Affymetrix Multispecies miRNA-2 Array. As described in the original study of the GSE47841 dataset, 30 samples were divided into three groups: high-grade serous ovarian carcinoma (HGSC), ovarian surface epithelium (OSE) and clear cell ovarian carcinoma. In the present study, only 12 HGSC and 9 OSE samples were included. The platform used in the GSE131790 dataset was the GPL8786 (miRNA-1) Affymetrix Multispecies miRNA-1 Array. As described in the original study of the GSE131790 dataset, 25 samples were divided into two groups: 19 HGSC samples and 6 normal samples.

Weighted gene corepression analysis
Sample clustering was performed to demonstrate the relationship between clinical traits and the expression pro le. After the raw data of the GSE4122 dataset were preprocessed, weighted gene corepression analyses (WGCNA) were performed according to the previously described algorithm to identify signi cant gene modules [22]. The probe set was rst ltered based on the variance of expression value across all samples. Probe sets with duplicated gene symbols were also removed based on expression variance. The R package WGCNA [23] was applied for this analysis. Brie y, Pearson's correlation coe cients were calculated for selected genes in a pairwise manner yielding a similarity matrix (Sij). The soft threshold (power) was set as 7. The matrix was transformed into an adjacency matrix (aij) using a power function using the formula aij = Power (Sij, β) ≡ |Sij| β. Average linkage hierarchical clustering was then performed to identify modules of densely interconnected genes. Network interconnectedness was measured by calculating the topological overlap using the TOMdist function with a signed TOM-Type. Average hierarchical clustering was performed to group the genes based on the topological overlap dissimilarity measure (1-TOM) of their connection strengths. Then, the module that was most related to the disease phenotype was selected for further analysis. The coexpression with a weight score > 0.05 in the module was ltered. Finally, the coexpression network was visualized by using Cytoscape (www.cytoscape.org; version 3.4.2) and analyzed by computing the topological properties of the network, including the network distribution [24]. The subnetwork was then extracted from the entire PPI network by using the MCODE app according to the previously described algorithm [25].
Gene function analysis GO enrichment analysis and KEGG pathway enrichment analysis of mRNAs was implemented with DAVID (https://david.ncifcrf.gov/). In short, the gene identi ers were rst converted into their H. sapiens Entrez gene IDs based on the latest database. If multiple identi ers corresponded to the same Entrez gene ID, they were treated as a single Entrez gene ID in downstream analyses. For each given list of genes, pathway and process enrichment analysis was performed with the following ontology sources: KEGG pathway and GO biological processes (BP), cellular component (CC) and molecular function (MF). The species was limited to H. sapiens, and all genes in the genome were used as the enrichment backgrounds. More speci cally, the p-value was calculated based on the accumulative hypergeometric distribution, and the q-value was calculated using the Benjamini-Hochberg procedure to account for multiple testing [26]. The Kappa score was used as a measure of similarity when performing hierarchical clustering on the enriched terms, and then subtrees with similarity > 0.3 were considered a cluster. The most statistically signi cant term within a cluster was selected as the one representing the cluster.

Validation in TCGA datasets
The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov) is a platform for researchers to download free public datasets [27]. In the present study, the expression of hub genes selected from the GSE4122 dataset was veri ed in TCGA using the online tool GEPIA (http://gepia.cancer-pku.cn) to improve the reliability of the analysis. A total of 426 normal samples and 88 tumor samples of ovarian serous cystadenocarcinoma (OV) in TCGA datasets were included in the present study. Box and whisker plots were used to show the expression levels of the hub genes. Fold change (FC) > 2 and p-value < 0.05 were used as cut-offs. The prognostic value of the hub genes was further con rmed by Kaplan-Meier survival analysis based on the clinical information from TCGA datasets using GEPIA. Based on gene expression values, the group cut-off was the quartile.

Identi cation of differentially expressed genes
In the present study, GEO2R was used to lter differentially expressed miRNAs between HGSC samples and OSE samples in the GSE47841 dataset. A FC greater than 2 and a P-value less than 0.05 were considered as the cut-off criteria, and probes were ltered without a corresponding gene symbol.
Construction of the TF-miRNA-mRNA network The TFCheckpoint database [28] was used to nd the transcription factors in differentially expressed genes identi ed in the GSE4122 dataset. The interactions between TFs and miRNAs were predicted by using the miRGen v.3 database [29]. The interactions between the differentially expressed miRNAs and differentially expressed mRNAs were predicted using miRWalk, TargetScan [30] and miRDB [31], and a score ≥ 0.95 was considered the cut-off criterion. Cytoscape software (version 3.40) was used to visualize the regulatory network.

Patient samples
A total of 5 ovarian serous cystadenocarcinoma specimens and 5 nontumor ovarian cancer epithelial tissues were collected from the First A liated Hospital of Soochow University (Suzhou, China). The average age of the patients was 45±5.5 years. Biopsy samples were collected from May 2017 to May 2018. Written informed consent was obtained from each patient, and the experimental protocols were approved by the Institutional Review Board of Soochow University. Each biopsy sample was divided into two sections. One section was submitted for routine histological diagnosis.

RNA extraction and reverse transcription-quantitative polymerase chain reaction (RT-qPCR) analysis
First, total RNA of tissues was extracted by fully grinding the sample with liquid nitrogen. Then, 1 ml of TRIzol (Zymo Research) solution was added, mixed, and placed at room temperature for 5 min to fully extract the total RNA. Then, samples were centrifuged at 14,000 g at 4℃ for 15 min, and the RNA, which was separated into the upper water phase layer, was transferred to another new RNase-free EP tube. Later, isopropanol was added to the samples, gently and thoroughly mixed, and allowed to stand at room temperature for 10 min. Next, the samples were centrifuged at 14,000 g for 10 min at 4℃, and the RNA precipitate was collected and washed twice with 75% ethanol. According to the amount of precipitate, an appropriate amount of DEPC water was added to dissolve the precipitate.

Results
Construction of the coexpression network By using the R package WGCNA, we generated 15 modules after ltering the probe sets with no signi cant variance of expression value across all samples ( Figure 2). All uncorrelated genes were assigned to a gray module. Out of 15 modules, we identi ed the blue module to be most signi cantly associated with benign human ovaries (r=0.3, P-value=0.01). A coexpression network composed of 207 nodes and 4085 edges of the blue module was constructed ( Figure 3A). RRM2 had the most degrees (degrees = 151). A subnetwork composed of 75 nodes and 2198 edges was extracted from the global network ( Figure 3B).
GO and KEGG pathway enrichment analysis of the genes in the coexpression network GO and KEGG enrichment analyses were performed for the genes in the blue module to investigate the biological function of these genes. In GO and KEGG enrichment analysis, all the results were ranked by enrichment score (-log (P-value)). As shown in Figure 4, in the KEGG pathway enrichment analysis of the genes in the blue module, proteasome, oxidative phosphorylation and biosynthesis of antibiotics were the top 3 enriched pathways. In biological process analysis, anaphase-promoting complex-dependent catabolic process, negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle and proteasome-mediated ubiquitin-dependent protein catabolic process were the top 3 enriched terms.

Validation in the TCGA dataset
The hub genes selected from the coexpression network were further validated in the TCGA dataset. First, the expression was validated, and then the value of prognosis was validated through survival analysis.

Identi cation of differentially expressed miRNAs in the GSE47841 dataset
With a threshold of P-value < 0.05 and |log2 (FC)| > 1, DEMs in the GSE47841 dataset were identi ed. A total of 200 DEMs including 139 downregulated miRNAs and 61 upregulated miRNAs between tumor samples and normal samples were screened ( Figure 6A). With the same threshold, a total of 210 DEMs including 144 downregulated miRNAs and 76 upregulated miRNAs between tumor samples and normal samples were screened in GSE131790 dataset ( Figure 6B). After the identi ed DEMs from the two datasets were intersected, 164 DEMs were selected for the further study.
Construction of the TF-miRNA-mRNA regulatory network To identify how these differentially expressed miRNAs function in ovarian cancer, TF-miRNA and miRNA-mRNA interactions were predicted using online prediction tools separately. First, TFs were screened from the genes in the blue module. TF-miRNA and miRNA-mRNA interactions were predicted using online prediction tools separately. HIF1A, HLTF, HOXB7, PLSCR1, STAT1, SUB1, TARDBP, TFAP2A and YY1 were TFs screened from the blue module and 6 TF-miRNA (2 TFs and 6 miRNAs) interactions were screened in the blue module. Then, the targets of these 6 miRNAs were predicted to intersect with the genes in the blue module. A total of 439 mRNA-miRNA interactions were identi ed. Finally, a TF-miRNA-mRNA regulatory network was constructed ( Figure 7A).

Validation of differentially expressed genes by RT-qPCR
The present study aimed to determine whether the differentially expressed miRNAs identi ed in the microarray analysis were upregulated or downregulated in clinical patients with ovarian serous cystadenocarcinoma. OV specimens and nontumor epithelial tissues were obtained, and the differential miRNAs were validated with RT-qPCR ( Figure 7B). The experimental results showed that hsa-mir-193b, hsa-mir-296, hsa-mir-29b-2, hsa-mir-29c, hsa-mir-34a and hsa-mir-421 expression was signi cantly downregulated, which was consistent with the microarray analysis results.

Discussion
Ovarian cancer is one of the most common malignant tumors in gynecology. Because it is located in the deep pelvic cavity, there are no obvious clinical symptoms in the early stage [32]. There is no effective method of early screening. Most patients are diagnosed late in the process and miss the optimal timing of treatment, which causes great harm to women's physical and mental health [33]. Currently, the most widely used treatment is ideal cytoreductive surgery combined with paclitaxel and platinum-based combination chemotherapy. Early chemotherapy effects are obvious in most patients, but almost all patients will experience recurrence of the disease. One of the most important reasons is the emergence of chemotherapy resistance. Because of this, although the incidence of ovarian cancer in our country has tended to be stable in recent years, the mortality rate has shown an upward trend year by year [34]. Therefore, chemotherapy resistance in ovarian cancer has become one of the focuses of ovarian cancer research.
In the present study, an ovarian cancer related coexpression network were constructed, and hub genes were identi ed from the networks. Among these genes, several genes have been identi ed as ovarian cancer-associated genes, such as STAT1 [35] and HOXB7 [36]. However, little is known about the function of differentially expressed miRNAs in ovarian cancer. Based on the TF-miRNA-mRNA regulatory network analysis, hsa-mir-193b, hsa-mir-296, hsa-mir-29b-2, hsa-mir-29c, hsa-mir-34a and hsa-mir-421 were screened in the present study. Among these miRNAs, hsa-mir-29b-2 has the most targets. In several cancers, such as lung cancer [37] and giant cell tumor of bone [38], hsa-mir-29b-2 was considered a tumor promoter [39]. However, it is interesting that hsa-mir-29b-2 plays an opposite role in ovarian cancer. Hsa-mir-29b-2 was proven to be able to promote epithelial mesenchymal transition (EMT) and was associated with better prognosis in ovarian cancer [40].
Several studies have revealed that hsa-mir-29c plays an essential role in the development of ovarian cancer. Hsa-mir-29c can inhibit apoptosis of lung cancer cells by activating the Wnt/β-catenin pathway [41] and suppressing PPARγ/VEGF-A/BCL-2 [42]. Hsa-mir-29c also plays a driving role in metastasis in Ewing sarcoma [43], and validation of this role in ovarian cancer is still needed.
Hsa-mir-193b inhibits tumor development in pancreatic cancer [44,45]. However, studies about hsa-mir-193b are quite few, and the role of hsa-mir-193b in ovarian cancer is totally unknown. Although we have revealed that low expression of hsa-mir-193b was correlated with poor prognosis, validation in a larger number of samples and research on miR-1181-related mechanisms are needed.
Nuclear elongation factors can be divided into EEF1 and eEF2. EEF1 mediates the binding of aminoacyl tRNA to ribosomes. It has four subunits, namely, EF-1 α, EF-1 β 1, EF-1 β and EF-1 γ [46]. In April 1995, the IUBMB Committee renamed them eEF1A, eEF1b α, eEF1b β and eEF1b γ. EEF1A is not only involved in signal transduction related to translation control but is also involved in cell growth, stress response and signal transduction related to motility. However, few studies have been conducted on eEF1F [47].
Zmpste24 is a zinc metalloproteinase in mice and is homologous with the human Face-1 gene and widely distributed in mammalian tissues. Mutations of zmpste24 and prelamin A lead to a series of diseases called laminopathy [48]. Navarro et al. [49] reported that nine newborn babies all suffered from severe restrictive skin disease (RD) due to the synthesis disorder of lamin A caused by mutations of the zmpste24 gene. However, research on tumors is very limited, but it is worth further study. The protein family of microchromosome maintenance (MCM) is an important part of the DNA prereplication complex, which plays an important role in the process of DNA replication initiation and DNA damage repair. MCM protein family members also play an important role in transcriptional regulation, chromatin remodeling and checkpoint response [50,51]. In ovarian cancer, the Kruskal-Wallis test showed that there was a correlation between the increased expression of MCM3 and the increased degree of histologic malignancy, and there was a positive correlation between the expression of MCM3 and MCM7. In addition, the expression of MCM2 and MCM5 was also related to the tumor stage [52]. Because of these correlations, MCM protein should be considered a marker of proliferation of ovarian cancer, which is consistent with our results.
In conclusion, we found several weighted genes involved in the development and progression of ovarian cancer through bioinformatics analysis and survival analysis. However, further experimental veri cation is still needed to con rm the potential effects of the weighted genes in ovarian cancer.

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
Authors' contributions YC designed the study and wrote the manuscript. JW and YZ performed all of the experiments. YL conducted the statistical analysis. All authors read and approved the nal manuscript.