Whole genome bisulfite sequencing (WGBS) and transcriptomic profiling in matched primary and recurrent high grade serous ovarian cancer
We used WGBS to perform whole genome methylation profiling and RNA sequencing (RNA-seq) for whole transcriptome profiling of 62 fresh frozen matched primary and recurrent tumor tissues from 28 women diagnosed with stage III/IV HGSOC. Clinical features of the patients and their tumors are given in Additional File 1, Supplementary Table 1 and illustrated in Figure 1A and B. Eleven of these patients carried a pathogenic germline mutation in the BRCA1 and/or BRCA2 genes. The remaining 17 patients were confirmed non-BRCA1/2 germline mutation carriers (Additional File 1, Supplementary Table 1). All patients received similar first line treatments comprising optimal debulking surgery followed by combination chemotherapy with a platinum agent. Time to first recurrence and median survival times were significantly greater in BRCA1/2 carriers compared to BRCA1/2 non-carriers (2768 days vs 1678 days respectively, P-value=0.0056) (Figure 1B, 1C).
For each primary and recurrent tumor specimen, we generated a bisulfite converted library and sequenced to a depth of ≥30x, generating approximately 400 million read pairs per library, with a bisulfite conversion rate greater than 99%. After removing CpGs covered by fewer than 5 reads, we obtained on average 24.2 million CpGs per tissue specimen (range 13.1 - 26.5 million), with an average of 24.6 million CpGs in primary tumors (15.9 - 26.5 million) and 23.8 million CpGs in recurrent tumors (13.1 - 26.4 million). RNA was extracted from the same tissue samples and sequenced. We generated a mean of 335 million reads from each library. Samples with less than 90 percent of reads mapped to the correct strand of the reference genome were excluded. After removing very low expressed transcripts (<1 transcript per million (TPM)) and transcripts in ‘blacklist’ regions (25), we obtained 91,411 transcripts from 33,969 genes, coding and non-coding (Figure 1D).
Hypervariability in the landscape of partially methylated domains (PMDs) drives tumor heterogeneity between patients
Genome-wide maps of all CpG sites using supervised (Figure 2A) and unsupervised analysis (Additional File 2, Supplementary Figure 1) indicated widespread heterogeneity between tumors from different patients. Unsupervised clustering in each of the four sample groups (primary, recurrent, BRCA1/2 carrier and BRCA1/2 non-carrier) showed that methylation profiles in primary and recurrent tumors consistently clustered according to patient status rather than their sample group classifier, except that methylation profiles also clustered by BRCA1/2 mutation status (Additional File 2, Supplementary Figure 2A-D).
We attributed this heterogeneity to a global loss of methylation within large genomic blocks (partially methylated domains or PMDs)(Figure 3A)(18, 26, 27). The fraction of the genome covered by PMDs varied from 1-58% in this set of tumors, with an average of 29% of the genome covered by PMDs per tumor (Additional File 1, Supplementary Table 2). There was no consistent pattern in the PMD architecture across this series of tumors. The most frequently observed PMDs were detected in 56/62 tumors, representing less than 0.03% of PMDs characterized in this set of tumors (Figure 3B). We identified a set of ‘common’ PMDs that were shared in more than 9 tumor specimens (ovcaPMDs), determined by the first inflection point of the bimodal distribution seen by plotting PMD frequency (Figure 3B; Additional File 1, Supplementary Table 3). Principal Component (PC) analysis using the top 10,000 most variable CpGs genome-wide (Figure 3C, left) showed that PC1 accounted for 48% of the total variance.
Hypervariable methylation patterns due to PMDs can be reproduced using only methylation values from soloWCGW CpGs, which are flanked by an A or T on both sides (palindromic) and reside alone within a window of +/- 35bp (18, 26). Thus, the average soloWCGW CpG methylation was calculated to represent the PMD signature within a maximal set of PMDs that included both cell-type ‘invariant’ common PMDs (26) and ovcaPMDs. PC1 was highly correlated with soloWCGW PMD methylation (Pearson correlation = 0.74, P-value = 4.5x10−12, Figure 3C, left). When we removed CpGs located within common+ovcaPMDs and re-performed PC analysis the variance accounted for by PC1 was reduced from 48–10%, and the overall association with soloWCGW PMD methylation was greatly reduced (Pearson correlation = 0.31, P-value = 0.01, Figure 3C, right), demonstrating that methylation in PMD regions contributes significantly to the heterogeneity across these tumors.
We also examined the effects of PMD masking by performing pairwise comparisons of soloWCGW methylation difference and Euclidean distance (calculated based on the 10,000 most variable CpGs) between each tumor (Figure 3D). The correlation between delta soloWCGW and pairwise Euclidean distance was significantly attenuated after PMD masking (Pearson correlation=0.65 and Pearson correlation = 0.17, respectively), reinforcing the observation that hypervariability at PMDs across the cohort contributes significantly to the observed heterogeneity.
CpG islands (CGIs) located within PMDs were hypermethylated compared to CGIs outside PMDs (Additional File 2, Supplementary Figure 3A). We extended this analysis to include other functional genomic elements. We observed increased levels of methylation at promoters but not CGI shores, gene bodies or intergenic regions within PMDs (Additional File 2, Supplementary Figure 3B). Whole transcriptomic profiling in the same tumor specimens was used to evaluate correlations for genes within common+ovcaPMDs. Genes within these PMDs were expressed at a significantly lower level than genes outside PMDs, with the lowest expression for genes inside the most common PMDs (P-value <2 x 10−16, linear regression; Figure 4A, left panel). Also, there was more variability in the expression of genes within PMDs than for genes outside PMDs (P-value <2 x 10−16, linear regression; Figure 4A, right panel). These findings were similar in tumors from different sample groups (BRCA1/2 carrier and non-carriers, and from primary and recurrent tumors) (Additional File 2, Supplementary Figure 4A, B) and consistent with previously reported relationships between methylation and gene expression in PMDs (18).
We next examined the frequency of known or predicted tumor suppressor genes (TSGs) (28) located in common+ovcaPMDs. Seventeen percent of known TSGs (41/248) were located in these PMDs, of which only 17 TSGs (7%) are located in low-frequency PMDs (hypergeometric test P-value = 4.17 x 10−17; Figure 4B). TSGs known to be associated with ovarian cancer development were excluded from PMDs (P-value = 0.002). Seventy-one percent of the cancer genome atlas project (TCGA) signature genes (P-value = 2.75 x 10−18) that define the four molecular subtypes of HGSOC (8) and 73% of pan-cancer oncogenes (28) were located outside PMDs (P-value = 1.94 x 10−9 ; Figure 4B). Taken together, these findings suggest that PMDs are depleted at genes important in cell identity and function. Gene Set Enrichment Analysis (GSEA) for genes located within PMDs shows that these genes are enriched in the Rig-I-like receptor signaling and ERBB signaling pathways (Additional File 1, Supplementary Table 4), similar to previously reported findings in breast cancer (18) and human neuron cells (29).
We masked the genome for PMD methylation to remove the influence of highly variable PMDs and then reanalyzed the data. Hierarchical clustering and PC analysis for the 10,000 most variable CpGs showed that recurrent tumors continued to cluster most closely with the matched primary tumor from the same patient rather than by primary, recurrent or BRCA1/2 mutation status (Additional File 2, Supplementary Figure 5A, 5B). None of the 10,000 most variable CpGs are known methylation QTLs in normal or cancer tissues, indicating this clustering was not driven by germline genetic variation (30, 31). To quantify these clusters we calculated the Euclidean distances from each primary and recurrent tumor pair from the same patient (intra-patient distances) and the distances of all pairs from different patients (inter-patient distances). Intra-patient distances were significantly smaller than inter-patient differences in both BRCA1/2 carriers and non-carriers (P-values = 7.16 x 10−7 and 1.41 x 10−3 respectively; Figure 4C). For further validation of this finding, we evaluated heterogeneity associated with whole transcriptome data. Similarly, intra-patient Euclidean distances were significantly shorter than inter-patient distances in both BRCA1/2 carriers and non-carriers (P-values = 9.67 x 10−5 and 6.70 x 10−4 respectively; Figure 4C). These observations are unlikely to be the result of residual PMD hypomethylation differences that remain after masking, but more the epigenetic landscape of recurrent tumors defined by inter-patient heterogeneity.
HGSOCs maintain methylation and gene expression programs in primary tumors progressing to chemoresistance
We used metilene to perform paired analysis of the primary and recurrent tumors from each patient (32) to identify changes in methylation associated with recurrence after a diagnosis of primary HGSOC. We identified 15,082 DMRs with >10% change in methylation and Q<0.1 across all primary and recurrent tumor pairs, but found no DMRs meeting a genome-wide corrected significance threshold of Q<0.05. We found significantly more DMRs (>10% change, Q<0.1) in primary versus recurrent tumors from BRCA1/2 non-carriers (11,205 DMRs in 16 tumor pairs, average of 659 DMRs per case comparison) compared to tumors from BRCA1/2 carriers (3,877 significant DMRs in 11 tumor pairs, average of 388 DMRs per case comparison) (P-value = 0.004). We restricted these analyses to 1,785 DMRs that were shared between primary and recurrent tumors from 2 or more patients (Additional File 1, Supplementary Table 5). We calculated the change in methylation at each CpG site within these DMRs between primary and recurrent tumors from each patient. Of these DMRs, 558/1785 (31%) were discordantly methylated (in other words, the same region is observed as hyper- and hypo-methylated in different patients) (Additional File 2, Supplementary Figure 6); but hierarchical clustering analysis failed to identify any clinical or molecular features that correlate with consistent methylation changes at these regions. Taken together, these data indicate that the methylation landscapes of recurrent HGSOCs are largely conserved after chemotherapy and disease recurrence, and do not acquire common somatic methylation changes that may be drivers of chemoresistance and tumor recurrence.
Whole transcriptome data identified 99 genes that were differentially expressed between all primary and recurrent tumors (adjusted P-value <0.05), of which 37 genes had lower expression and 62 genes higher expression in recurrent compared to primary tumors (Additional File 1, Supplementary Table 6). Differential expression for twenty of these genes was directionally consistent with differential changes in methylation (Additional File 1, Supplementary Table 7). We compared the DEGs we identified with 1,836 known/characterized tumor suppressor genes and oncogenes associated with pan-cancer development and molecular signature genes for different HGSOC subtypes described by TCGA (8). Two genes– the oncogenes CCNE1 and DDX5 – were differentially expressed between primary and recurrent tumors. CCNE1, which is significantly amplified and highly expressed in aggressive HGSOC cases with poor clinical outcome (33), was overexpressed in recurrent tumors (P=3.65x10−6). DDX5 was overexpressed in recurrent tumors and located within 10 genes of a DMR that is hypomethylated, although at ~270kb from the gene (Additional File 1, Supplementary Table 7).
Hypomethylation increases expression of immune-related genes and identifies putative drivers of disease recurrence in HGSOCs from BRCA1/2 carriers
We identified 135 significant DMRs in primary-recurrent HGSOCs between BRCA1/2 carriers and non-carriers (Q-value < 0.05)(Figure 5A; Additional File 1, Supplementary Table 8), with a trend for hypermethylation at these DMRs in tumors from BRCA1/2 non-carriers and genome-wide hypermethylation in PMD regions in primary tumors from BRCA1/2 non-carriers (P-value = 0.0011) (Additional File 2, Supplementary Figure 7). PCA analysis also suggested a trend for tumors to cluster together based on BRCA1/2 carrier status (Figure 5B). Annotation of DMRs hypermethylated in BRCA1/2 non-carrier tumors showed these regions were depleted in CpG islands near transcription start sites and active regulatory regions marked by H3K27ac in the secretory epithelial fallopian tube cell line FT246 (a model for the precursor cell type for HGSOC) indicated by at least a 15% reduction in methylation over background (Additional File 2, Supplementary Figure 8). This depletion indicates that while heterogenous, differential methylation between BRCA1/2 carrier and non-carrier tumors is focused at active regulatory regions in relevant cell types and CpG islands near transcription start sites, which play an important role in the regulation of gene expression.
We performed a replication analysis of the 135 DMRs using previously published array-based differential probe methylation data generated using Illumina 450k array analysis of tumors from 20 BRCA1/2 carriers and 60 non-carriers (7) (q < 0.05, Additional File 1, Supplementary Table 9). For this analysis, we masked probe locations that overlapped common+ovca PMDs, reducing total probe count on the array to 310,968 probes. These probes were intersected with our DMRs, and overlapping probes were used to identify DMRs between BRCA1/2 carriers and non-carriers using the tool bumphunter (34), which requires two probes within 300bp to have a significant change in methylation in the same direction of one another to be considered a DMR candidate region. After removing probes based on these criteria that did not overlap DMRs, replication data were available for 31/135 DMRs. We did not identify significant differentially methylated regions using bumphunter at any of these 31 DMRs. A similar replication analysis was not possible using TCGA dataset (613 HGSOCs including 52 BRCA1/2 and 561 non-BRCA1/2 tumors) because these tumors were analyzed using the Illumina 27K methylation array for which there is very low concordance (one CpG probe, cg21557231) with our WGBS data.
PCA analysis based on differential gene expression (DEG) analysis also identified clusters associated with BRCA1/2 mutation status (Figure 5C). We identified 3,341 DEGs in tumors from BRCA1/2 carriers compared to non-carriers (adjusted P-value <0.05) (Additional File 1, Supplementary Table 10), of which 1,760 genes were up-regulated and 1,581 genes were down-regulated (Figure 5D). This is consistent with DMR analysis where we found a greater proportion of hypomethylated CpGs in tumors from BRCA1/2 carriers compared to non-carriers. Up-regulated genes in BRCA1/2 associated tumors were significantly enriched in immune related pathways (Q<0.05) including autoimmune diseases, infection response and antigen processing and presentation (Figure 5E), even though there were no clear differences in immune cell infiltration in BRCA1/2 tumors compared to non-BRCA1/2 tumors (consesusTME scores P-value = 0.97) (Additional File 2, Supplementary Figure 9). Down-regulated genes in tumors from BRCA1/2 carriers were most significantly enriched in pathways that maintain stemness and cell differentiation, including the hippo signaling pathway (adjusted P-value <0.05) (Figure 5E).
We connected changes in the methylation of regulatory elements to changes in gene expression. One thousand seven hundred and eighty two genes were located within 10 genes up and down-stream of the 135 DMRs (Additional File 1, Supplementary Table 11), including 94 gene promoters that were within 2kb of a DMR. We compared this list with the 3,341 DEGs identified in tumors of BRCA1/2 carriers and non-carriers and identified 68 instances (includes some duplicate genes) where DMRs showed either positive or inverse correlation between methylation and expression of the nearby gene (Additional File 1, Supplementary Table 11). This corresponded to 37 unique regions. These DMRs varied in length from 100bp to 8kb with the number of promoters within 2 kb of a single DMR ranging between one and six. Twenty-five unique genes were hypermethylated and down-regulated in BRCA1/2 carriers while 41 unique genes hypomethylated and up-regulated in tumors from BRCA1/2 carriers. (Additional File 1, Supplementary Table 11). We adapted the software tool ELMER (35, 36) to correlate methylation values in DMRs with the expression of nearby genes, identifying three genes - FGF18, CDK2AP1, and NAGLU - where methylation and expression were positively correlated (q<0.05, r= 0.46, 0.44, and 0.48 respectively) (Figure 5F). Positive correlation between methylation and expression can indicate that a gene’s function is affected by gene body methylation or possibly methylation of a nearby insulator (37), although paradoxically increased expression may also arise from a fully methylated promoter (38).