Development and validation of a DNA methylation signature to predict the progression-free survival in patients with advanced-stage ovarian cancer


Background: Ovarian cancer (OC) is a disease characterized by late-stage presentation and poor prognosis. Our aim was to identify a DNA methylation signature for predicting progression-free survival (PFS) of patients with advanced-stage OC. Methods: A bioinformatics analysis was performed to identify methylation sites that are relevant to PFS and develop a DNA methylation signature. A total of 501 patients with advanced-stage OC who were identified from The Cancer Genome Atlas (TCGA) were enrolled as a training cohort, and 108 patients with advanced-stage OC from GSE65820 were used as a validation cohort. Results: A DNA methylation signature was constructed on the basis of five methylation sites. We found that patients with OC in the training and validation cohorts could be stratified, based on the DNA methylation signature, into high- and low-risk groups with distinct prognosis. Different cancer-related pathways were enriched between these two groups. Finally, a nomogram that integrated the methylation signature risk score and clinical stage was established, and the nomogram performed well. Conclusions: The DNA methylation signature provides a promising prognostic biomarker for patients with advanced-stage OC and may help to optimize clinical management.


Background
Ovarian cancer (OC) is the most common cancer of the female reproductive system [1]. Due to a lack of typical symptoms, patients with OC are usually diagnosed at a late stage [2]. The survival of patients with advanced-stage OC (stages III or IV) is poor: the ten-year survival rate of patients with OC is 15% in the advanced stage, and 55% in the early stage [3]. About 20% of patients with advanced-stage OC survive for more than ten years, following standard treatment [3], and the mortality of patients with advanced-stage OC after ten years is close to that of the general population. Identifying patients with advanced-stage OC who survive beyond 10 years through patient stratification is beneficial to clinical management. Surgery and subsequent platinum or taxane chemotherapy are standard therapies for patients with advanced-stage OC [4,5]. It has been reported that 34% of patients with advanced-stage OC who were treated with surgery and chemotherapy underwent progression and recurrence within 12 months [6]. Therefore, an effort to determine effective biomarkers to identify patients at high risk of progression and recurrence is necessary.
DNA methylation is a common epigenetic modification that involves the addition of a methyl group to a DNA molecule. Aberrant DNA methylation can silence tumor suppressor genes or activate oncogenes and affect cellular processes, such as cell cycle, DNA repair, and cell apoptosis [7]. Aberrant DNA methylation of a single gene has been extensively identified in OC. For instance, hypermethylation of tumor suppressors BRCA1 and RASSF1A was significantly higher in OC tissues than in normal tissues [8].
Hypermethylation of BRCA1 and RASSF1A causes downregulation of genes involved in the cell cycle, thus promoting OC [8]. Numerous studies have reported that DNA methylation can be a promising biomarker in cancer. DNA methylation of NKD1, RUNX3, and ZNF671 has been reported to predict progression-free survival (PFS) in patients with OC [9-11]. Furthermore, a previous study reported that a DNA methylation signature can predict overall survival in patients with OC [12]. However, that study did not explore the correlation between DNA methylation status and PFS in OC.
Therefore, the DNA methylation profiles of tumor samples from patients with advanced-stage OC were obtained from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). A prognostic signature on the basis of DNA methylation sites was constructed and verified using bioinformatics methods and was used to predict the 3-year, 5-year, and 10-year PFS of patients with advanced-stage OC.

Results
Constructing and evaluating the five-CpG site signature
The clinical characteristics of patients with advanced-stage OC in the training and validation cohorts are summarized in Table S1. In the training cohort, after filtering low-quality probes, 501 samples with 18,914 CpG sites were analyzed by univariate Cox regression analysis, and the results indicated that 1340 CpG sites were associated with PFS (P < 0.05). After LASSO regression analysis, five CpG sites were selected as candidate CpG sites (Figure S1A and B). Finally, these five methylation sites (cg02218324, cg04586563, cg07172280, cg14378057, and cg26803305) were used to construct a multivariate Cox regression model (Figure S1C). The risk score was calculated as follows: DNA methylation signature risk score = −0.7337 × β-value of cg02218324 + 1.6231 × β-value of cg04586563 − 5.1071 × β-value of cg07172280 − 2.2472 × β-value of cg14378057 + 0.8637 × β-value of cg26803305. The genes corresponding to the five methylation sites (cg02218324, cg04586563, cg07172280, cg14378057, and cg26803305) were RSPH6A (radial spoke head 6 homolog A), DIP2C (disco interacting protein 2 homolog C), DPP7 (dipeptidyl peptidase 7), BICRAL (BRD4 interacting chromatin remodeling complex associated protein like), and SLC14A1 (solute carrier family 14 member 1). The information on the five CpG sites is displayed in Table 1. The associations between the methylation levels of the five CpG sites and the expression levels of the corresponding genes were measured by Pearson correlation analysis. We found that higher methylation levels of cg02218324 and cg26803305 were correlated with lower expression levels of RSPH6A and SLC14A1 (R < −0.10, P < 0.05, Figure 1). Patients with advanced-stage OC were grouped into high- and low-risk groups. The Kaplan-Meier curves demonstrated that low-risk patients had longer 5-year and 10-year PFS than high-risk patients (P < 0.0001; Figure 2A).
The 3-year area under the curve (AUC), 5-year AUC, and 10-year AUC were 0.722, 0.77, and 0.854, respectively ( Figure 2B).
Patients were ranked according to their risk scores (Figure 2C), and the dot plot represents their progression status (Figure 2D). A heatmap of methylation profiles of the five CpG sites is shown in Figure 2E. Moreover, the methylation status of the five CpG sites in the high- and low-risk groups in the training cohort was assessed. The high-risk group presented relatively higher methylation levels of cg26803305, while relatively higher methylation levels of cg02218324, cg07172280, and cg14378057 were identified in the low-risk group (Figure S2, all P < 0.01). The GSE65820 validation dataset was used to evaluate the robustness and effectiveness of the five-methylation site signature. The 5-year PFS of low-risk patients was better than that of high-risk patients (Figure 3A, P = 0.029). The AUCs of the signature at 3, 5, and 10 years were 0.623, 0.67, and 0.688, respectively (Figure 3B). The distribution of methylation signature risk score, progression status, and heatmap of methylation profiles of the five CpG sites in the validation dataset are displayed in Figure 3C-E.
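The risk score formula reported above is a weighted sum of the five β-values, and it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients are those given in the text, whereas the function name `risk_score` and the example β-values are hypothetical and do not come from the study data.

```python
# Coefficients of the five-CpG signature, as reported in the text.
COEFFICIENTS = {
    "cg02218324": -0.7337,
    "cg04586563": 1.6231,
    "cg07172280": -5.1071,
    "cg14378057": -2.2472,
    "cg26803305": 0.8637,
}


def risk_score(beta_values):
    """Return the weighted sum of the β-values of the five CpG sites.

    `beta_values` maps CpG site identifiers to β-values in [0, 1].
    """
    missing = sorted(set(COEFFICIENTS) - set(beta_values))
    if missing:
        raise KeyError(f"missing beta-values for: {', '.join(missing)}")
    return sum(coef * beta_values[site] for site, coef in COEFFICIENTS.items())


# Hypothetical patient (illustrative β-values, not taken from TCGA or GEO).
example_patient = {
    "cg02218324": 0.62,
    "cg04586563": 0.35,
    "cg07172280": 0.08,
    "cg14378057": 0.11,
    "cg26803305": 0.47,
}
example_score = risk_score(example_patient)
```

Because three of the five coefficients are negative, higher methylation at cg02218324, cg07172280, and cg14378057 lowers the score, which is consistent with the higher methylation of these sites observed in the low-risk group.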

Nomogram development
Several clinicopathological factors, including age, clinical stage, histological grade, and methylation signature risk score, were included in univariate (Figure 4A) and multivariate Cox regression analyses (Figure 4B). The calculated risk score was an independent prognostic marker for 5-year PFS in the training cohort (P = 0.001, Figure 4B). To predict PFS in patients with advanced-stage OC quantitatively and more precisely, a nomogram was developed (Figure 4C) that integrated the DNA methylation signature risk score and clinical stage, both of which correlated with PFS in the univariate analysis (Figure 4A). The calibration curves for the 5- and 10-year PFS rates demonstrated good performance of the nomogram in the training cohort (Figure 4D).

DNA methylation signature performance among patient subgroups
In addition, to evaluate the effectiveness of this signature among patients in different groups, a subgroup analysis was conducted. Patients were classified into two groups according to age: < 60 years and ≥ 60 years. High-risk patients had significantly reduced 5-year PFS compared with low-risk patients in both the < 60 group (P < 0.001, Figure 5A) and the ≥ 60 group (P < 0.0001, Figure 5B). In addition, high-risk patients with Grade 1/2 (P = 0.026, Figure 5C) and Grade 3/4 (P < 0.0001, Figure 5D) OC also exhibited significantly decreased 5-year PFS rates relative to low-risk patients. The methylation signature performance for patients with stage III and IV OC was also assessed. Similarly, high-risk patients exhibited a reduced survival rate compared to low-risk patients (P < 0.01, Figure 5E and F).

Validation of the five CpG sites in blood samples from OC patients
To investigate whether the methylation levels of these five sites that we observed in tumor tissue might be present in blood, DNA methylation data in whole blood samples from patients with OC and healthy females were collected from GSE19711. We found that the healthy group presented relatively higher methylation levels of cg02218324, cg04586563, and cg07172280, while relatively higher methylation levels of cg26803305 were identified in patients with OC (Figure 6).

Signaling pathways implicated in the high- and low-risk groups
Differences in pathway activities between high- and low-risk patients with OC were scored using gene set variation analysis (GSVA). Hedgehog signaling, epithelial-mesenchymal transition (EMT), angiogenesis, KRAS signaling, hypoxia, TNF-α signaling via NF-κB, Notch signaling, IL-6/JAK/STAT3 signaling, and TGFβ signaling were significantly activated in the high-risk group (Figure 7). Unfolded protein response, E2F targets, mTORC1 signaling, DNA repair, G2/M checkpoint, MYC targets, and interferon-alpha response were significantly activated in the low-risk group (Figure 7).

Discussion
OC is a complex disease with different histological and molecular patterns, and these features cause inconsistent results when applying a single class of drugs. Risk stratification and characterization of the potential mechanisms related to high- and low-risk groups are crucial for our understanding of advanced-stage OC. A thorough identification of the molecular mechanisms related to each group will facilitate the appropriate use of targeted therapies, immunotherapies, epigenetic therapies, and combination therapies. Thus, more emphasis should be placed on genomics, epigenomics, and the analysis of patient tumor samples before treatment regimens. A few DNA methylation-related prognostic signatures have been reported in cancer, including melanoma [13], breast cancer [14], and lung adenocarcinoma [15]. These studies suggest that DNA methylation-related prognostic signatures can provide promising cancer biomarkers.
In this study, a prognostic signature based on five CpG sites was constructed and predicted PFS for patients with advanced-stage OC. The genes corresponding to the five CpG sites (cg02218324, cg04586563, cg07172280, cg14378057, and cg26803305) were five protein-coding genes: RSPH6A, DIP2C, DPP7, BICRAL, and SLC14A1. Multiple studies have reported that hypermethylation causes lower gene expression, while hypomethylation causes higher expression. Thus, we explored the association of methylation level and gene expression at these five CpG sites. Inverse correlations were found only between the methylation levels of cg02218324 and cg26803305 and the expression levels of the corresponding genes. These findings indicated that methylation levels do not necessarily cause the expected gene expression changes, and that other factors affect gene expression.
DNA methylation signatures derived directly from blood would facilitate the prediction of OC patient survival. We found that these five CpG sites could be detected in the blood samples from patients with OC. Although we did not explore the correlation between methylation levels of these five CpG sites in the peripheral blood and the prognosis of patients with OC due to a lack of survival information, these results suggest that the five CpG sites provide potential prognostic biomarkers in both tumor tissue and body fluids.
A previous study reported that loss of DIP2C causes DNA methylation and gene expression changes and EMT in human colorectal cancer cells [16]. Furthermore, SLC14A1 has been reported to inhibit colony formation in lung cancer cell lines and have tumor-suppressor activity [17]. However, to date, these five genes and five CpG sites have not been reported in OC. Further functional studies will be necessary to clarify the biological and pathological implications of methylation in these genes and its relationship to OC progression.
The methylation signature investigated in this study was shown to function as an independent prognostic marker in the multivariate Cox model. Moreover, although patients with OC were regrouped according to different clinicopathological variables, the subgroup analyses demonstrated that the signature still provided independent predictive ability. Furthermore, high- and low-risk patients with OC who were stratified by the methylation signature presented significant differences in cancer-related pathways. TGFβ signaling, hedgehog signaling, Notch signaling, and EMT were significantly activated in the high-risk group. Furthermore, MYC targets, E2F targets, G2/M checkpoint, and DNA repair were significantly activated in the low-risk group. EMT is a key step in OC metastasis [18], and TGF-β signaling has been reported to promote EMT and metastasis in advanced OC [19]. Furthermore, TGF-β signaling causes global changes in DNA methylation during the EMT in OC [20]. Notch and TGFβ signaling can form a positive regulatory loop and cooperatively regulate EMT in OC cells [21]. In addition, hedgehog signaling has been reported to promote EMT in OC, through the PI3K/AKT pathway [22]. Furthermore, MYC expression has been reported to influence patient responsiveness to platinum chemotherapy and the prognosis of patients with OC [23]. E2F targets and G2/M checkpoint pathways are important cell cycle signaling pathways that are crucial to OC progression [24,25]. These signaling pathways may play important roles in OC tumorigenesis and progression, and the GSVA results were in accordance with those of previous studies. We reasoned that aberrant DNA methylation can cause gene expression changes in these signaling pathways and affect cellular processes, such as cell cycle, DNA repair, and EMT, to promote tumorigenesis. In summary, our DNA methylation signature has potential functional relevance in predicting altered activities in these cancer-related pathways.
Taken together, our results suggest that this methylation signature provides a promising biomarker for patients with advanced-stage OC. In addition, the signature may improve the clinical management of patients with OC. This study has a few limitations. First, the signature was developed based on public databases and retrospective cohort studies. To apply the five-CpG site signature in clinical practice, we will need to improve its predictive power and confirm the reliability of the signature in a prospective study.
Thus, a future prospective study of a cohort of patients with OC is necessary to verify our findings, and further functional research will be necessary to explore the roles of the five CpG sites.
The DNA methylation signature that was developed in this study provides a promising prognostic marker for patients with advanced-stage OC and may help optimize their clinical management.

Methods
Patients with OC from TCGA and GEO
TCGA DNA methylation data of samples from patients with stage III-IV OC were obtained from UCSC Xena (https://xena.ucsc.edu/). The GSE65820 dataset, including DNA methylation profiles and clinical information, was downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). DNA methylation profiles in whole blood samples of patients with OC and healthy females were collected from GSE19711. The platforms of the three datasets were based on the Infinium HumanMethylation27 BeadChip (Illumina Inc., CA, USA) or the Illumina Human Methylation 450 BeadChip (Illumina Inc., CA, USA), and the genomic annotation was on the basis of GRCh38. DNA methylation levels were calculated as M/(M + U + 100), where M and U are the methylated and unmethylated signal intensities, and represented as β-values ranging from 0 to 1 [26]. The gene expression profiles and clinical information of patients with stage III-IV OC were obtained from the TCGA data portal (https://portal.gdc.cancer.gov/). Patients from the TCGA cohort with PFS time, complete clinical information (age, clinical stage, and histological grade), and DNA methylation profiles were screened to explore the association between DNA methylation site β-values and PFS in advanced-stage OC. In addition, 108 patients with advanced-stage OC from the GSE65820 dataset were analyzed as an external validation cohort.
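As a small worked illustration of the β-value definition M/(M + U + 100) given above, the following Python sketch computes β from a pair of signal intensities. The function name `beta_value` and the example intensities are hypothetical; the offset of 100 is the constant stated in the text.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Compute a β-value as M / (M + U + offset).

    `methylated` (M) and `unmethylated` (U) are non-negative signal
    intensities; the offset keeps the ratio stable when both are small.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("signal intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Hypothetical intensities for one probe in one sample.
example_beta = beta_value(methylated=4500.0, unmethylated=1400.0)
```

With the offset in the denominator, β is 0 when no methylated signal is present and stays strictly below 1 even for a fully methylated probe.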

Data preprocessing
The ChAMP R package [27] was used to remove low-quality probes according to the following criteria: (1) CpG site methylation undetected in any sample, (2) single-nucleotide polymorphisms at the assayed CpG dinucleotide [28], and (3) location in sex chromosomes [29].

Construction and validation of DNA methylation signature
First, a univariate Cox regression analysis was performed in the training cohort to identify prognostic methylation sites (P < 0.05). Then, by using the "glmnet" R package, the LASSO method was applied to decrease the number of prognostic methylation sites [30]. Finally, a multivariate Cox regression analysis was performed to establish an optimal predictive model. The risk score for every OC patient in the training and external validation cohorts was calculated. Patients with advanced-stage OC were grouped into high- and low-risk groups, with the median risk score as the cutoff. Using the "survival" R package, a survival analysis was performed to compare the differences in PFS between the two groups [31]. By using the "timeROC" R package, we calculated the AUC to measure the prognostic capability of the signature [32].
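The grouping and survival-comparison steps described above can be sketched in plain Python. This is a minimal illustration, not the analysis code of the study, which relied on the R packages named in the text. The function names and toy data are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption, since the text does not state how ties were handled.

```python
from statistics import median


def split_by_median(scores):
    """Label each risk score 'high' or 'low' using the median as cutoff.

    Assumption: scores equal to the median are labeled 'low'.
    """
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]


def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (time, survival) pairs.

    `times` are follow-up times; `events` are 1 for progression and
    0 for censoring. One pair is returned per distinct event time.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        progressed = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - progressed / at_risk
        curve.append((t, survival))
    return curve


# Toy data (hypothetical): risk scores, PFS times in months, event flags.
scores = [0.1, 0.5, 0.9, 1.2]
groups = split_by_median(scores)
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
```

In a full analysis, a curve would be estimated separately for each risk group and the two curves compared with a log-rank test, which is what the "survival" R package provides.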

Nomogram construction
To predict the PFS of patients with OC using a quantitative tool, we developed a nomogram. Univariate and multivariate Cox analyses were performed, based on DNA methylation signature risk score and other variables.

GSVA
Pathway analyses were performed on the hallmark gene sets, which were acquired from the Molecular Signatures Database [33]. To explore enriched signaling pathways in high- and low-risk groups, we applied GSVA in the "GSVA" R package using standard settings.

[Figure caption] Boxplots of β-values in ovarian cancer patients and healthy females in the GSE19711 dataset. The Wilcoxon rank test was used to determine the differences between the two groups.