Exploration of differential methylation-driven genes to identify key epigenetic genes for prognosis of ovarian serous cystadenocarcinoma

Background: Ovarian cancer is the leading cause of gynecological cancer-related mortality. The majority of patients have poor prognosis due to the lack of available biomarkers capable of predicting prognosis accurately and providing novel targeted therapy options. Methods: The methylation array data, gene expression prole and clinical information of 351 ovarian serous cystadenocarcinoma (OSC) samples as well as the methylation array data of another 10 normal ovarian epithelial tissues were accessed from The Cancer Genome Atlas (TCGA) and served as training dataset. An algorithm, MethylMix, was on the basis of a β mixture model to compare the DNA methylation status in tumor to normal DNA methylation status and then screened out hypo and hypermethylated genes of OSC, which were transcriptionally predictive. Gene Ontology (GO) enrichment and ConsensusPathDB pathway analysis of differential methylation-driven genes were further performed. A linear risk model was constructed through the univariate and multivariate COX regression with Akaike Information Criterion (AIC). Datasets of OSC from Gene Expression Omnibus (GEO) database were served as validation datasets to determine the prognostic value of the risk model. The Kaplan-Meier analysis was then used to evaluate the ability of the risk model to distinguish the survival in training dataset or validation dataset, even in subgroups of patients with different FIGO stage, histologic grade, residual tumor after surgery, and anatomic neoplasm subdivision. The receiver operating characteristic (ROC) curve was also applied to determine the value of risk model on predicting the prognostic survival of patients as well as compared with other 15 known prognostic biomarkers in OSC. Results: Totally 171 differential methylation-driven genes were identied between OSC and normal samples. Five of them, UBB, PLAT, TMOD1, KCNJ11, and CDSN, were selected to construct the prognostic risk model. The Kaplan-Meier analysis demonstrated that the survival rate in low-risk cohort was signicantly higher than that in high-risk cohort in both training dataset and validation dataset (P<0.0001 for training dataset; P<0.05 for


Background
In the female reproductive system, ovarian cancer is the second most common cancer but the most lethal cancer of gynecological malignancy (1). Epidemiological statistics indicate that 23% of gynecological malignancies originate from the ovary, but about 47% of all deaths from gynecological malignancies due to ovarian cancer (2). Ovarian serous cystadenocarcinoma (OSC) accounts for approximately 90% of all pathologic subtypes of ovarian cancer, which means that on average 9 out of every 10 patients diagnosed with ovarian cancer are OSC (3). Though heterogeneity exists between tumors, OSC patients are usually treated with standard therapy consisting of tumor rebulking operation and platinum-based chemotherapy (4,5). Despite there have been some advances in treatment of this disease in the last few decades, gain in average survival time is limited. The ve-year survival rate for ovarian cancer is just 45% (6). A large part of the causes of poor prognosis lies in a lack of available prognosis indicators which could help physician classify patients according to their degree of risk so as to provide personalized treatments. In addition, conventional treatment patterns usually cause resistance to chemotherapy resulting in tumor recurrence and leading poor prognosis (4). Therefore, it is imperative to identify differential methylation-driven genes in OSC to reveal potential therapeutic targets and construct a highly speci c, sensitive, and independent predictive prognosis model by bioinformatics analysis.
DNA methylation, as the core element of epigenetic modi cation, participates in cellular differentiation, maintaining genomic stability, genomic imprinting, and many other processes (7,8). DNA methylation has been one of the most researched epigenetic aberrations in many cancer forms. Besides gene mutation, hypo and hypermethylation of genes was found as an important mechanism underlying oncogenesis (9). It has been found that abnormal DNA methylation occurs commonly in ovarian cancer (10). In recent years, DNA methylation mechanism in the tumorigenesis and different phenotypes of ovarian cancer has become the focus of great attention. The Methylation levels of an increasing number of genes have been found to repress gene expression and have been shown to be strongly associated with OSC's diagnosis and prognosis (11). For instance, methylation of leukocyte BRCA1 promoter region was found as a potential risk factor for carcinogenesis of ovarian cancer (especially high grade serous ovarian cancer) and this aberrant methylation could even be detected in both newborns and adults (12). Bonito et al. showed that low methylation level of CpG island in gene MSX1 is related to platinum resistance and shorter OS (overall survival) (13). In addition, Xiao et al. revealed that there was a signi cant association between aberrant methylation of p16 INK4a promoter and ovarian cancer risk using a xed-effects model (14). Thus, the using of DNA methylation as potential biomarkers holds great promise in indicating prognosis of patients with ovarian cancer for improving the survival rate with only a small number of specimens to meet the need of methylation analysis and is superior to other markers in many aspects, including relative stability both in vivo and vitro (15). What's more, the frequent epigenetic alterations with biological signi cance may provide relevant targets for innovative therapy to improve the dilemma of chemotherapeutic resistance in the current traditional treatment patterns.
Herein, we integrated and analyzed the quantitative data of gene expression and DNA methylation pro le of OSC patients provided by The Cancer Genome Atlas (TCGA) database to e ciently identify methylation-driven genes with MethylMix. MethylMix, an algorithm built-in R, screens disease speci c DNA methylation-driven genes based on a β mixture model (16). The univariate and multivariate COX regression was then used to assess whether these genes had any relevance to survival of patients with OSC thus constructing a novel prognostic risk model. In addition, the Kaplan-Meier analysis and receiver operating characteristic (ROC) curve validated the potential clinical signi cance of the model.

Data of samples acquisition and preprocessing
Complete DNA methylation data, gene expression pro le and clinical information of OSC samples and the DNA methylation data of normal ovarian epithelial samples were downloaded from the TCGA database. In this study, only OSC samples with DNA methylation data, gene expression pro le and clinical survival information were selected out as training dataset. Total 351 OSC samples were enrolled while another 10 normal ovarian epithelial tissues with DNA methylation data were selected as control. And another 2 independent datasets of OSC obtained from Gene Expression Omnibus (GEO) database with accession number GSE43265 and GSE25033 were selected as validation datasets. The DNA methylation data were all acquired based on GPL8490 (Illumina HumanMethylation27 BeadChip) and expressed as β values. The methylation data was normalized by the 'normalizeBetweenArrays' function in LIMMA package and the gene expression data was normalized as TPM (Transcripts Per kilobase Million) form. The data obtained from TCGA and GEO are open to public, so there is no requirement for local ethics committees to review it.

Exploration of differentially methylated genes
MethylMix package (http://www.bioconductor.org/packages/release/bioc/html/MethylMix.html) was used to identify the difference of DNA methylation status between tumors and normal samples on a basis of constructing a β mixture model (FDR<0.05) and calculating the Pearson's correlation coe cient between the gene methylation level and its expression ( lter condition was Cor<-0.3, P<0.05).

Enrichment analysis of gene set
The Database for Annotation, Visualization and Integrated Discovery (DAVID) v.6.8 online tool was used to perform the (Gene Ontology) GO enrichment analysis of differential methylation genes (17) and the gure of enrichment results was plotted as bubble plots using ggplot2 R package. The differentially methylated genes were then analyzed using ConsensusPathDB, an online open platform integrates information on protein interactions, genetic interaction signaling, cellular metabolism, gene regulation, and drug-target interactions in human based on 32 public databases. (18). The over-representation analysis of ConsensusPathDB was used, and P<0.01 was set as the criterion.

Statistical analyses
All the statistical analysis in this study were performed using R statistical package (R version 3.5.3). Univariate and multivariate COX regression analysis were performed using Survival package. And the variables in multivariate COX regression were selected out based on the Akaike Information Criterion (AIC) by using MASS package so as to construct risk model with well goodness of t. A risk score could be calculated via the formula consist of the sum of DNA methylation level of the methylation-driven genes weighted by the coe cient in multivariate Cox regression analysis for advanced application: Finally, a risk score formula was extracted and it could contribute to the prediction of the survival. The risk score of every patient was calculated with this formula and helped us divide patients into different cohort using the median as a threshold value. The Kaplan-Meier estimator with Mantel-Haenszel log-rank test was applied to determine difference of the survival between the two cohorts using Survminer package. Furthermore, the ROC analysis was used to estimate the sensitivity and speci city of risk model in predicting the prognosis (OS=3 years) as well as compared with other biomarkers using timeROC package. Area Under Curve (AUC) as well as its 95% con dence interval (CI) was calculated and Z test was used to validate the difference of risk model and other biomarkers.

Results
The clinical characteristics of selected patients from TCGA and GEO database 351 patients with methylation data, gene expression data, and complete clinical information were selected from the TCGA as training dataset and another 2 independent datasets of OSC from the GEO database were served as validation datasets (see 'Methods' section). All samples were pathologically and clinical diagnosed with OSC. The patients were divided into stage I, II, III, and IV according to FIGO staging system while they were also assigned to G2, G3, and G4 as the tumor histologic grade. Anatomic neoplasm subdivisions were based on different locations of the surgical lesion including left, right, and bilateral. According to the diameter of tumor residual lesions, the patients were divided into no macroscopic disease, 1-10 mm, 11-20 mm, and >20 mm groups. The clinical characteristics of samples are summarized in Table 1.

Screening of differentially methylation-driven genes
Through the comparison of normalized methylation data of 351 OSC samples and 10 normal samples extracted from TCGA based on MethylMix package, a β mixture model was constructed. Subsequently, Mann-Whitney U test for methylation gene and correlation test between methylation level and gene expression were calculated (FDR<0.05, Cor<-0.3, P<0.05). 171 methylation-driven genes were identi ed in result (see Additional le 2) and the heatmap of differential methylation-driven genes was plotted ( Figure  1). Among the 171 differentially methylated genes, 113 genes had higher methylation levels in tumors than normal tissues while 58 genes had lower methylation levels. The rst 10 differentially methylated genes with lowest FDR were selected to plot the distribution map based on β mixture model ( Figure 2) and correlation scatter diagram (Figure 3).

Functional and pathway enrichment analysis of methylation-driven genes
To further explore the potential mechanism underlying oncogenesis of OSC, functional enrichment analysis and pathway analysis of the 171-differential methylation-driven genes were performed using DAVID and ConsensusPathDB (see 'Methods' section). As the result of GO enrichment analysis, these genes were concentrated in 40 biological processes, 14 molecular functions, and 11 cellular components ( Figure 4A, Additional le 3). In biological process term, these genes mainly involve in ammatory response, positive regulation of transcription, DNA-templated, positive regulation of NF-κB transcription factor activity and others. In molecular function term, the genes were mainly enriched in transcriptional activator activity, RNA polymerase II core promoter proximal region sequence-speci c binding. And in cellular component term, extracellular exosome was enriched the most signi cantly. Pathway analysis on ConsensusPathDB showed that the genes were mainly involved in Interleukin-18 signaling, Cytokine Signaling in Immune system, and Interleukin-1 family signaling ( Figure 4B, Additional le 4).

Construction and validation of risk assessment model based on different regrouping methods
The univariate COX proportional hazard analysis was rst carried out for the 171-differential methylationdriven genes in TCGA training dataset and β values of DNA methylation were served as variables to identify the survival related methylation-driven genes. As a result, 9 methylated genes were screened out (P<0.05; Additional le 5). Further, the 9 candidate genes were analyzed using the multivariate COX regression based on Akaike Information Criterion (AIC). Consequently, ve methylation-driven genes were singled out as key variables and statistical test on the COX regression coe cient showed that the 5 genes were signi cantly related to the survival (P<0.05, Additional le 1: Table S1 and Additional le 5). Then DNA methylation level of the 5 genes were weighted by the regression coe cient and a risk formula was nally obtained: Risk Score=1.209 × (methylation level of UBB) -1.3560 × (methylation level of PLAT) + 2.7718 × (methylation level of TMOD1) + 1.4498 × (methylation level of KCNJ11) -0.7739 × (methylation level of CDSN). The risk score of 351 tumor samples were then calculated with this formula and the median value of risk score (Median=2.567) was set as a group cutoff condition. Consequently, 351 OSC samples were divided into a high-risk cohort (n=175) and a low-risk cohort (n=176). Meanwhile, we further analyzed the methylation levels as well as the expression of these ve key epigenetic genes in the two cohorts ( Figure 5A and 5B). Signi cantly higher methylation levels of PLAT and CDSN while signi cantly lower methylation levels of other three driven genes were observed in high-risk patients than low-risk patients, which were consistent with the results of COX regression. Based on this grouping method, the Kaplan-Meier survival analysis of high-risk cohort and low-risk cohort showed that the survival rate of the low-risk cohort was signi cantly higher than that of the high-risk cohort at all time points ( Figure 5C). To evaluate the validity of this risk model, a completely independent dataset of 31 OSC patients from GEO database was included analysis and we then strati ed the 31 OSC patients into high-risk (n = 14) and low-risk group (n = 17) with the same cutoff condition as previous. The Kaplan-Meier analysis demonstrated that the OSC patients in low-risk group had a signi cant better overall survival than in high-risk group (P<0.05 with log-rank test; Figure 6A). Further, to con rm that whether the risk model could indicate prognosis independently, the univariate and multivariate COX regression analysis of risk model, age at diagnosis, as well as histological grade were conducted, respectively. As the results showed in Figure 6C and 6D, our risk model as an independently prognostic factor exhibited marginally signi cance in both univariate and multivariate COX regression analysis. In addition, we also detected the capacity of this model to distinguish the progression-free survival (PFS) of patients in another OSC dataset of 20 patients from GEO database. We used the same formula and cutoff value to calculated risk score of 20 OSC patients and separated them into high-risk (n = 10) and low-risk group (n = 10). The Kaplan-Meier curve showed that there was no signi cant difference on PFS between high-risk and low-risk group (P=0.73 with log-rank test; Figure 6B), which may be affected by the sample content. It was known to all that patients with OSC in advanced stage (Stage III and IV) had shorter survival time than in early stage (Stage I and II) (11). However, irrespective of the effect of the degree of disease progression, the survival rate was still lower in high-risk cohort obviously in either patients of early stage or advanced stage (Additional le 1: Supplementary Figure 1). The result suggested that the novel risk model was independent of FIGO staging system. We further determined the clinical effect of the risk model in patients with different histologic grade, including G2 and G3, which had enough samples. The Kaplan-Meier analysis demonstrated that the survival rate of high-risk cohort was signi cantly lower than that of low-risk cohort in both G2 and G3 groups (Additional le 1: Supplementary Figure 2). As for the anatomic neoplasm subdivisions, left side only and right side only were merged as unilateral group, while the opposite is the bilateral group. Similarly, we also observed signi cant differences in survival between the high-risk and low-risk cohorts (Additional le 1: Supplementary Figure 3). In addition, after classi cation of patients according to the size of the residual tumor lesions, the result of the Kaplan-Meier analysis in each group was also signi cant (Additional le 1: Supplementary Figure 4).

The receiver operating characteristic (ROC) analysis of risk model compared with other known prognosispredict markers
To further estimate the sensitivity and speci city of the risk assessment model consisting of 5 key epigenetic genes in predicting survival, the time-dependent ROC curve was plotted and the AUC of the risk model was 0.71 (95%CI: 0.64-0.78) at 3 years of OS. We also performed the ROC analysis in groups of different FIGO stage, histologic grade, residual tumor after surgery, and anatomic neoplasm subdivision (Additional le 1: Supplementary Figure 1-4). All the results indicated that the risk model based on ve key epigenetic genes had satis ed performance in predicting clinical survival. In addition, we chose several genes that had been identi ed as prognosis predictors in previous researches to verify whether the novel risk model had ascendancy in the predictive performance. For instance, methylation of BRCA1 promoter region and overexpression of Capn4 predict poor ovarian cancer prognosis (19,20). LncRNA HOTAIR was determined as an independent prognostic biomarker of OS (21). The AUC of these genes and risk scoring model were gured out and shown in Figure 5D and Additional le 1: Table S2. And Z test was used to examine whether the risk model had higher predictive value than these genes. The nal results showed that our model was superior to many other biomarkers in reliability and stability Discussion Among female malignant tumors of the reproductive system, ovarian cancer has a high degree of aggressiveness and a poor prognosis. The rst of the important reasons is that the internal molecular mechanism of the development of ovarian cancer has not been fully clari ed. The second is due to the lack of highly sensitive and speci c prognostic markers (22). Therefore, in-depth study of the molecular pathogenesis of OSC and identi cation of prognostic biomarkers are urgently needed, which will greatly help improve the quality of life and prognosis of OSC patients. DNA methylation modi cation, as an important epigenetic mechanism regulating gene transcription, is considered to have signi cant role in the carcinogenesis and have broad application prospects in indicating the prognosis or early diagnosis of tumor patients because of its advantages in convenience and sensitivity of detection, and tumor speci city (15,23). In recent years, many studies have shown that some molecular markers can predict the clinical prognosis of tumor patients well. For instance, methylation states of PITX2 (the homeodomain transcription factor) has high con dence in the prediction of the distant recurrence risk in lymph node-negative breast cancer patients treated with tamoxifen (24). The upregulation of HDAC1 (histone deacetylases 1) has been correlated with poorer prognosis of adenocarcinoma of the lung (25). However, the limitations of these studies are that the independence of the molecular markers for the prognostic evaluation of patients has not been veri ed, and the detection of single factors often has lower sensitivity and speci city. In addition, though Guo et al. (26) have established a ve-DNA methylation signature act as a novel prognostic biomarker using data from TCGA database, they did not consider the methylation status of normal ovarian epithelial tissues and the possible over tting in the process of building a multi-factor model. In present study, a novel computational framework for identifying differentially methylation-driven genes by integrating the transcriptomes data of cancer samples and the genome-wide methylation data of cancer samples as well as normal samples was adopted, and then we performed univariate and multivariate COX regression analysis base on the Akaike Information Criterion (AIC) to screen parameters of the model that best interpreted the data. After analysis, a risk prediction model composed of ve key epigenetic genes was nally obtained. And the Kaplan-Meier analysis further con rmed the potential predictive value of the risk model both in the training and validation datasets. This risk model still performs well in distinguishing between high-risk and low-risk cohorts in different FIGO stage, histologic grade, residual tumor disease, and anatomic neoplasm subdivisions as well as the results of ROC analysis showed that the sensitivity and speci city of the risk model were higher than other known molecular markers. This indicates that the risk score is an independent indictor of patient prognosis, and the combination of ve genes provides better predictive potential.
Previous studies have shown that the ve key epigenetic genes mentioned above may play regulative role in cancer. UBB, the gene encodes ubiquitin B, is a vital element in targeting cellular proteins for degradation by the 26S proteasome. It has been revealed that the transcription suppression of UBB occurs in a large number of patients with cancers derived from female reproductive system, which is a speci c change in cancer subtypes and patients with high expression of UBB tended to have better outcomes than those with low UBB expression in high grade serous ovarian cancer (27). PLAT (tissue plasminogen activator) has been identi ed as an important role in cancer invasion and metastasis, even in angiogenesis, vascular remodeling involved in tumor progression (28). In ovarian cancer, PLAT may be closely related to the prognosis of ovarian cancer. High plasma concentration of tissue plasminogen activator (tPA) before surgery is determined as an independent biomarker for shorter OS in patients with ovarian cancer and the expression of PLAT in plasma before the start of rst-line chemotherapy could be used as an indicator to predict survival of patients with ovarian cancer (29). TMOD1 has been proved previously that TMOD1 elevation could enhance the malignant phenotype of TNBC via inducing expression of MMP13 (30). Through GWAS, SNP within KCNJ11 was associated with colorectal cancer risk (31) and it was con rmed that the expression of KCNJ11 could promote tumor progression of hepatocellular carcinoma with interaction of lactate dehydrogenase A (32). And CDSN encodes a protein found in corneodesmosomes, which localizes human epidermis and other corni ed squamous epithelia. Though the effect of CDSN gene on tumor biological behavior is still unclear, CDSN is worthy of investigation due to its tissue speci city and special function as a cellular component.
In the GO enrichment of biological process and the pathway analysis, it was found that the differentially methylated genes were highly associated with 'in ammatory response' and 'Interleukin-18 signaling'.
Interleukin-18 (IL-18) is an immune-enhancing cytokine, which induces CD8 + T cells and natural killer (NK) cells to produce IFN-γ, stimulates Th1 cell polarization, and mediates a speci c immune response to tumor (33). IL-18 had been used as a new molecule for targeted therapy to enhance anti-tumor immunity, but ended in failure in phase II clinical trials (34). Researchers had found that many clinical patients developed acquired resistance after multiple IL-18 treatments, and the production of large amounts of IL-18 binding protein (IL-18BP) was also observed in these patients (35). IL-18BP is an endogenous negative regulator of Interleukin-18 signaling and it can bind to mature IL-8 and block the signal transduction downstream of IL-18 (36). Therefore, IL-18BP was regarded as a novel secreted immune checkpoint and advanced study in IL-18 as a potential immunotherapy target had showed us through protein directed evolution in vitro, a new IL-18 (Mutant) with lower a nity to IL-18BP exhibited better pharmacological activity than wild-type IL-18 (37). Evidences had shown that IL-18BP expression is increased in epithelial ovarian cancer (EOC) compared with normal tissues via the analysis of gene expression pro les, which may contribute the ability of EOC tumors to escape from monitoring of the immune system (38). However, the results of genome-wide methylation level analysis showed that the methylation level of IL-18BP gene in tumors was higher than that of normal controls. It indicates that the aberrant expression of the IL-18BP in EOC may be due to post-transcriptional regulation or modi cation and other molecular mechanisms, which requires more studies in the future.

Conclusion
In our study, through joint analysis of genomic methylation data and genome-wide expression pro les of 351 patients with OSC, we identi ed 171 methylation-driven genes using the MethylMix algorithm. COX proportional hazard analysis was further performed to select genes related closely to the survival of patients and construct the prognostic risk model consisting ve aberrant methylation-driven genes, UBB, PLAT, TMOD1, KCNJ11, and CDSN. And its predictive effect was con rmed in patients of different FIGO stage, histologic grade, residual tumor after surgery, and anatomic neoplasm subdivision. Therefore, this ve-gene risk model may have great potential as an independent prognostic biomarker in predicting patient survival and provide a basis of personalized treatment for OSC patients. In addition, the functional mechanism of the ve methylation-driven genes in the carcinogenesis should be examined in further researches for potential therapeutic targets. Even though further experiments are needed, our ndings provide a bioinformatic basis for guiding subsequent OSC treatment.