Altered H3K4 methylation at promoters upon loss of Ash2l
The loss of Ash2l in both hematopoietic and MEF cells inhibits proliferation. At the molecular level, a reduction of H3K4 methylation and altered gene expression were observed [15,18]. In MEF cells, this correlates with the induction of senescence. To further evaluate H3K4 methylation, we performed chromatin immunoprecipitation combined with next-generation sequencing (ChIP-seq) of two pairs of Ash2l KO and WT immortalized MEF cells (i.e. iMEF1 and 2; for details see [18]) at day 5 of HOT treatment, which results in deletion of exon 4 of Ash2l and loss of Ash2l protein. Genome-wide, 22344 H3K4me3-modified and 122781 H3K4me1-modified regions were identified. A large number of H3K4me3-marked chromatin sites (12799 common to KO1 and KO2 cells) lost signal upon HOT treatment of KO cells (log2FC > 0.58, i.e. an approximately 1.5-fold change; signals of > 20 reads) (Fig. 1a and Supplementary Table S1; also available in GEO under accession number GSE205232). The vast majority of these sites were associated with promoters (±3000 bp of the transcriptional start site (TSS)). Roughly a third of the 37205 promoters analyzed showed loss of H3K4me3 (Fig. 1b and c and Supplementary Fig. S1a), consistent with the decrease in global H3K4me3 [18]. The decrease in H3K4me3 was most pronounced in regions with intermediate levels of this modification, as depicted in MA plots (Supplementary Fig. S1b).
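The thresholds used to call sites with lost signal can be sketched as a simple filter. This is a minimal illustration, not the pipeline used in the study: the `Region` record, the pseudocount, and the reading of "> 20 reads" as applying to the larger of the two signals are assumptions. Note that log2(1.5) ≈ 0.585, so a cutoff of 0.58 corresponds to roughly a 1.5-fold change.

```python
import math
from dataclasses import dataclass

# log2(1.5) is about 0.585, so this cutoff corresponds to roughly a 1.5-fold change
LOG2FC_CUTOFF = 0.58
MIN_READS = 20


@dataclass
class Region:
    """Hypothetical record of normalized read counts for one ChIP-seq region."""
    name: str
    reads_untreated: float
    reads_hot: float


def log2_fold_loss(before: float, after: float, pseudocount: float = 1.0) -> float:
    """log2(before/after); positive values mean signal was lost after treatment."""
    return math.log2((before + pseudocount) / (after + pseudocount))


def sites_with_lost_signal(regions, cutoff=LOG2FC_CUTOFF, min_reads=MIN_READS):
    """Names of regions that pass the read filter and lose signal beyond the cutoff."""
    hits = []
    for r in regions:
        # Assumption: the read filter applies to the larger of the two signals
        if max(r.reads_untreated, r.reads_hot) <= min_reads:
            continue
        if log2_fold_loss(r.reads_untreated, r.reads_hot) > cutoff:
            hits.append(r.name)
    return hits


def common_sites(ko1_regions, ko2_regions):
    """Sites called in both KO lines, mirroring 'common to KO1 and KO2'."""
    return set(sites_with_lost_signal(ko1_regions)) & set(sites_with_lost_signal(ko2_regions))
```

Requiring a call in both KO lines, as in `common_sites`, trades sensitivity for robustness against line-specific noise.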
H3K4me1 marks enhancers [11,12]. Of the large number of H3K4me1-modified regions, fewer than 600 lost signal (Fig. 1b), consistent with the small decrease in global H3K4me1 [18]. One possibility is that, in the absence of Ash2l, KMT2 enzymes retain mono-methyltransferase activity. In vitro studies suggest that at least some KMT2 complexes containing WDR5 and RBBP5 mono- and di-methylate H3K4, while the addition of Ash2l promotes tri-methylation and stimulates overall activity [67-69]. However, our Rbbp5 immunoprecipitates did not contain methyltransferase activity in the absence of Ash2l [18]. Alternatively, H3K4me1 might be sufficiently stable over the course of the experiment to prevent loss of signal. We noticed that some H3K4me1-marked sites gained signal (Fig. 1a; 8052 common to KO1 and KO2; log2FC > 0.58; signals of > 20 reads). Most of these sites also showed a decrease in H3K4me3 and were linked to promoters (Fig. 1b and c and Supplementary Fig. S1a-c).
The changes in H3K4 methylation described above are illustrated by IGV browser tracks of the Cdh3 locus, which lost H3K4me3 and gained H3K4me1 in its promoter region (Supplementary Fig. S1c). These effects were validated by ChIP-qPCR for Cdh3 and the promoters of several additional genes. Cdh3 and Flywch2 were downregulated in response to Ash2l loss, while the expression of Hsp90b1 was unchanged in the RNA-seq experiments (summarized in Supplementary Fig. S2a) [18]. All three genes lost H3K4me3 and H3K4me2 in their promoter regions (Fig. 1d); however, the increase in H3K4me1 was less obvious. Olig1, Olfr456 and Cdh17 were not or only weakly expressed in WT and untreated KO cells [18] and showed no H3K4me3 in their promoter regions in the ChIP-seq data sets of KO1 and KO2 iMEFs (Supplementary Table S1). This was corroborated by ChIP-qPCR, which showed low levels of all three H3K4 methylation states (Fig. 1d). Of note, in HOT-treated KO1 and KO2 cells the expression of Olfr456 and Cdh17, but not of Olig1, was upregulated, that of Cdh17 only in RT-qPCR measurements [18]. Thus, these findings support the concept that H3K4me3 correlates with gene expression and that H3K4 methylation at promoters is broadly affected by loss of Ash2l. In contrast, the H3K4me1 pattern was remarkably stable, with an increase in regions that carried H3K4me3, suggesting that the loss of tri- and di-methylation resulted in an increase in mono-methylation.
Gene repression correlates with loss of H3K4me3
We evaluated the correlation between changes in H3K4 methylation patterns and gene expression. Promoters were grouped according to low, medium and high H3K4me3 signals (see Materials and Methods for details). The fold reduction of H3K4me3 was smallest in the high group (Fig. 1e, left panel, and Supplementary Fig. S1d). Nevertheless, these promoters showed the largest increase in H3K4me1 (Fig. 1e, middle panel, and Supplementary Fig. S1d), supporting the notion that the loss of both H3K4me3 and H3K4me2 resulted in an increase in H3K4me1, particularly at promoters with very high H3K4me3. H3K4me1 may then persist, as this modification appears to be rather stable (Fig. 1a and b). Moreover, genes associated with H3K4me3-high promoters showed the smallest decrease in expression, while genes with H3K4me3-medium and H3K4me3-low promoters were downregulated more strongly (Fig. 1e, right panel, and Supplementary Fig. S1d). One interpretation is that, after 5 days of HOT treatment, H3K4me3-high promoters still retain sufficient H3K4me3 to be efficiently transcribed, and that a certain H3K4me3 threshold is required to maintain promoter accessibility and thus allow transcription.
This is consistent with promoters of downregulated genes showing the largest decrease in H3K4me3, whereas the decrease was smaller at promoters of the few upregulated genes (Fig. 1f and Supplementary Fig. S1e). At present, it is unclear whether the increased RNA levels of these genes result from enhanced transcription or from RNA stabilization as a secondary consequence of the overall repression of gene transcription. Further evaluation may require a system that allows short-term regulation of Ash2l to study more direct effects of its loss.
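The actual criteria for the low, medium and high H3K4me3 groups are given in the Methods. As a hedged illustration only, the sketch below splits promoters into tertiles of their H3K4me3 signal and reports the median change per group, the kind of per-group summary shown in Fig. 1e. Tertile cut points and the function names are assumptions.

```python
from statistics import median, quantiles


def group_by_tertile(signals: dict[str, float]) -> dict[str, str]:
    """Assign each promoter to 'low', 'medium' or 'high' by tertiles of its signal."""
    # quantiles(..., n=3) returns the two cut points separating the tertiles
    low_cut, high_cut = quantiles(signals.values(), n=3)
    groups = {}
    for promoter, value in signals.items():
        if value <= low_cut:
            groups[promoter] = "low"
        elif value <= high_cut:
            groups[promoter] = "medium"
        else:
            groups[promoter] = "high"
    return groups


def median_change_per_group(groups: dict[str, str], changes: dict[str, float]) -> dict[str, float]:
    """Median of a per-promoter change (e.g. log2FC of H3K4me1 or of expression) per group."""
    per_group: dict[str, list[float]] = {}
    for promoter, group in groups.items():
        per_group.setdefault(group, []).append(changes[promoter])
    return {group: median(values) for group, values in per_group.items()}
```

Using the same grouping for H3K4me3, H3K4me1 and expression changes keeps the three panels directly comparable.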
GC-rich promoters are sensitive to loss of H3K4me3
Two major types of promoters have been classified according to either a focused or a dispersed TSS [70,71]. The former is typically characterized by a TATA box as a core promoter element. The latter is associated with CpG islands (CGIs) and is thus enriched for GC-rich binding sites. These include the GC box, originally defined as an SP1 binding site [72], and more general sites for SP as well as Krüppel-like factors (KLF) [73-76]. We compared the presence of TATA and GC boxes in the promoters of up- and downregulated genes. Promoters of downregulated genes were enriched for GC boxes, while TATA boxes were underrepresented (Fig. 2a-c) [64]. Consistent with these findings, GC-rich binding sites for SP and KLF transcription factors were also enriched (Fig. 2b and Supplementary Table S2; also available in GEO under accession number GSE205232). For example, 76% and 57% of downregulated genes in KO1 and KO2 cells, respectively, possess Klf4 and SP1 binding sites within their promoter-proximal regions, supporting the conclusion that GC-rich promoters are preferentially downregulated (Supplementary Table S2). Like SP and KLF sites, CTCF and CTCFL consensus sites, which also have a high GC content, were enriched (Fig. 2b and c). We note that CTCFL is not expressed in our MEF cells according to the RNA-seq data [18], consistent with its very low expression in normal somatic cells [49,51]. Many CTCF and CTCFL binding sites overlap with H3K4me3-marked sites and thus most likely represent promoters [77,78]. Together, GC-rich binding sites were preferentially associated with promoters characterized by high and medium H3K4me3 (Fig. 2c). In addition, binding sites for AP1 factors were enriched for upregulated genes (Fig. 2b and Supplementary Table S2).
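The comparison of GC and TATA box occurrence can be illustrated with an exact-consensus scan on both DNA strands. This is a simplification for illustration: the consensus strings (GGGGCGGGG for the SP1 GC box, TATAWAW for the TATA box) and the helper names are assumptions, and motif enrichment analyses such as the one reported here typically rely on position weight matrices and a background model rather than exact string matches.

```python
import re

# Simplified consensus patterns (illustrative; W = A or T)
MOTIFS = {
    "GC_box": re.compile("GGGGCGGGG"),
    "TATA_box": re.compile("TATA[AT]A[AT]"),
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_presence(seq: str) -> dict[str, bool]:
    """Whether each consensus motif occurs on either strand of the sequence."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return {name: any(p.search(s) for s in strands) for name, p in MOTIFS.items()}


def fraction_with_motif(promoters: dict[str, str], motif: str) -> float:
    """Fraction of promoter sequences that contain the given motif."""
    hits = sum(motif_presence(s)[motif] for s in promoters.values())
    return hits / len(promoters)
```

Comparing `fraction_with_motif` between promoters of down- and upregulated genes (and a background set) gives a crude version of the enrichment shown in Fig. 2.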
Finally, in support of the association of GC boxes with repressed genes, the majority of downregulated genes are controlled by CGI promoters, while only a few CGIs are linked to upregulated genes (Fig. 2d). Together, these findings suggest that the consequences of a loss of Ash2l, and thus of H3K4 methyltransferase activity, are particularly pronounced at CGI promoters.
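How CGI promoters were defined is not detailed in this section. As an assumption for illustration, the sketch below applies the classic Gardiner-Garden and Frommer criteria (length ≥ 200 bp, GC content ≥ 50%, observed/expected CpG ratio ≥ 0.6) to a single sequence; the function names are hypothetical, and genome-wide CGI annotations are usually taken from precomputed tracks rather than recomputed this way.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CG dinucleotides cannot overlap, so count() is exact
    gc_fraction = (c + g) / n
    # Observed/expected CpG = CpG count * length / (C count * G count)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq: str, min_len: int = 200, min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply length, GC-content and CpG observed/expected thresholds."""
    if len(seq) < min_len:
        return False
    gc, obs_exp = cpg_stats(seq)
    return gc >= min_gc and obs_exp >= min_obs_exp
```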
Ash2l loss affects promoter associated histone H3 loading and histone marks
H3K4me3 correlates with promoter accessibility and transcription [5,7]. Thus, loss of H3K4me3 may result in more compacted chromatin at promoters. To assess the level of histone H3 at promoters by ChIP-qPCR, we chose three genes with strong CGI promoters (Rab27a, Atp9a and Mapk12) that lost H3K4me3 upon Ash2l KO (Fig. 2e), together with the six genes analyzed above (Fig. 1d) (for a summary of changes in H3K4me3 and expression upon HOT treatment, see Supplementary Fig. S2a). The H3 ChIP signal increased at all nine promoters upon Ash2l loss (Fig. 2f and Supplementary Fig. S2b). In addition, we observed a decrease in H3K27ac and an increase in H3K27me3 at the majority of the promoters (Fig. 2e and Supplementary Fig. S2c). In support of increased chromatin compaction, H3K9ac was decreased (Supplementary Fig. S2c). Somewhat unexpectedly, H3K9me3, which is linked to compacted chromatin and enriched at some repressed promoters [79], showed a trend towards reduced signals at the promoters analyzed (Supplementary Fig. S2c). Finally, H3K79me3, which is enriched at actively transcribed genes, and H4K20me2, which is associated with DNA repair, were largely unchanged at the evaluated promoters (Supplementary Fig. S2c) [80,81]. The impact on H3K27 modification may relate to reports that KMT2 complexes associate with KDM6/UTX enzymes, which demethylate H3K27, and with CBP/p300, which acetylate H3K27 [20,82-84], in line with the strong interplay between H3K4 and H3K27 marks [9]. Together, these findings suggest that the loss of Ash2l results in increased chromatin compaction at promoters and a shift from activating to repressive chromatin marks, which is particularly evident at CGI promoters.
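ChIP-qPCR signals such as those for H3 and its modifications are often expressed as percent of input. The normalization used in this study is not stated in this section, so the sketch below, based on the widely used formula % input = 100 × 2^(Ct_input,adjusted − Ct_IP), with the input Ct adjusted by log2 of the input dilution, is an assumption; the function names are hypothetical.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent-of-input for one ChIP-qPCR target.

    input_fraction is the share of chromatin kept as input (e.g. 0.01 for 1%).
    The input Ct is adjusted to 100% by subtracting log2(1 / input_fraction),
    assuming ~100% amplification efficiency (one doubling per cycle).
    """
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


def ko_vs_wt_ratio(ko_percent: float, wt_percent: float) -> float:
    """Fold change of a ChIP signal in KO relative to WT (>1 means increased)."""
    return ko_percent / wt_percent
```

An increased H3 signal at a promoter would then appear as `ko_vs_wt_ratio(...) > 1` for that amplicon.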
Increased chromatin compaction upon loss of Ash2l
To further evaluate possible chromatin compaction upon Ash2l loss, we performed ATAC-seq experiments at day 7 of HOT treatment. These revealed the expected pattern of nucleosome-free regions, mono-nucleosomes, di-nucleosomes and larger fragments (Supplementary Fig. S3a). Sites with significantly changed accessibility upon loss of Ash2l (q < 0.05; log2FC > 0.40), 15087 with gained and 11961 with lost accessibility, were analyzed with respect to their genomic location (Supplementary Table S3; also available in GEO under accession number GSE205230). We compared the accessibility of promoter regions (±3 kb) with that of intra- and intergenic regions. Gained accessibility was found preferentially in intra- and intergenic regions (Fig. 3a). Considering that the analyzed 6000 bp windows of 37205 promoters cover roughly 8.3% of the murine genome, the gained sites were slightly underrepresented at promoters (4.6% of all gained sites, assuming one site per 6 kb fragment). In contrast, lost accessibility was predominantly found near promoters (34.6% of all lost sites), i.e. 4.5-fold more frequently than expected, suggesting that promoter regions were preferentially compacted upon Ash2l loss (Fig. 3a and b). Although promoters of transcribed genes are thought to be more accessible than those of silent genes, only a few studies have provided evidence for a link to H3K4me3. In two distinct experimental systems, murine myogenesis and Xenopus embryogenesis, H3K4me3 signals correlated with accessibility measured by ATAC-seq, but because the histone mark was not manipulated, functional links were not established [85,86]. Our findings therefore suggest that the loss of H3K4me3 compromises promoter accessibility. We note that the time frame of our experimental system is rather long; conclusions about potentially direct consequences will only become possible once short-term regulation of this histone mark is achievable.
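The expected-versus-observed comparison above amounts to simple arithmetic: the share of the genome covered by promoter windows gives the expected fraction of randomly placed sites, and the ratio of observed to expected fractions gives the enrichment. The sketch below assumes a mouse genome size of about 2.7 Gb and ignores overlaps between neighboring promoter windows; both choices shift the exact numbers, so the results should be read as approximate. The names are hypothetical.

```python
# Assumed approximate size of the mouse reference genome (illustrative value)
MOUSE_GENOME_BP = 2.7e9


def promoter_fraction(n_promoters: int, window_bp: int, genome_bp: float = MOUSE_GENOME_BP) -> float:
    """Fraction of the genome covered by promoter windows (overlaps ignored)."""
    return n_promoters * window_bp / genome_bp


def enrichment(observed_fraction: float, expected_fraction: float) -> float:
    """Observed/expected ratio: >1 means overrepresented, <1 underrepresented."""
    return observed_fraction / expected_fraction


# 37205 promoters with +/-3 kb windows, as described in the text
expected = promoter_fraction(37205, 6000)
```

With these assumptions, `expected` comes out at about 0.083, in line with the 8.3% stated above; passing the observed fraction of gained or lost sites at promoters to `enrichment` then yields the corresponding over- or underrepresentation.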
Altered accessibility was most pronounced just upstream of the TSS, a region that is typically nucleosome-depleted when genes are transcribed [87-89]. We therefore asked whether an increase in mono-nucleosomes close to the TSS could be detected when a smaller region encompassing ±600 bp was evaluated (Supplementary Fig. S3b). Overall accessibility in this small chromatin window was reduced, but we did not observe a significant increase in positioned nucleosomes at or just upstream of the TSS. We then analyzed the promoters of downregulated genes, which might be affected more strongly; however, the effect of Ash2l loss was similar, with a decrease in overall accessibility (Supplementary Fig. S3b). Further comparison of the different data sets showed that chromatin regions that lost H3K4me3 became compacted (Fig. 3c). Finally, chromatin compaction was most prominent at promoters of downregulated genes (Fig. 3d). Together, the increased compaction at promoters upon Ash2l loss was consistent with the increase in H3 signals and thus likely due to increased nucleosome loading. However, a well-positioned nucleosome just upstream of the TSS [87], which we expected to produce a distinct pattern of ATAC-seq signals, could not be visualized. Whether this is due to incompletely established changes in chromatin organization at the chosen time point and/or to variability in the position of the postulated upstream nucleosome relative to the TSS remains to be determined.
To evaluate whether the observed alterations in DNA accessibility were associated with distinct DNA motifs, the ATAC fragments were screened for transcription factor (TF) binding sites. A few motifs were strongly linked to altered accessibility (Fig. 3e). For further analysis, we concentrated on motifs that showed significantly changed activity upon Ash2l loss (p < 0.05) and for which at least 1000 binding sites were present in our ATAC-seq data set. At this stringency, we identified 8 TF binding motifs that gained and 9 that lost occupancy (Fig. 3e and Supplementary Table S4; also available in GEO under accession number GSE205230). Of the TF motifs that significantly gained binding activity, CTCF sites were affected most profoundly. CTCF binds to GC-rich sequences, which are associated with downregulated genes (Fig. 2), and has major functions as a transcriptional regulator and in higher-order chromatin organization [49,50]. Alterations of activity were identified for 14132 sites in the ATAC-seq data set (Fig. 3f and Supplementary Table S4). Overall, higher sequence coverage was observed on both sides of CTCF consensus DNA binding sequences (Fig. 3f). For comparison, ATF7 consensus sites (increased binding) and NFYA and Dux consensus sites (decreased binding) are displayed; these showed only weakly altered protection compared with CTCF (Fig. 3e and f). Moreover, analysis of the regions neighboring the CTCF consensus motif suggested that the positioning of both the -1 and +1 nucleosomes was enhanced (Fig. 3f). Well-positioned nucleosomes flanking CTCF sites have been noted previously [90-93]. Together, these analyses suggest that the altered chromatin accessibility was linked to relatively few known TF binding motifs.
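The coverage profiles around CTCF motifs can be thought of as an average of per-base accessibility over all motif occurrences, aligned on the motif center. The sketch below shows only that averaging step on a single coverage track; dedicated footprinting tools additionally correct for Tn5 insertion bias and compare conditions statistically. The function names and the input layout (one coverage list per chromosome, integer motif centers) are assumptions.

```python
def aggregate_profile(coverage: list[float], centers: list[int], flank: int) -> list[float]:
    """Mean coverage at offsets -flank..+flank around motif centers.

    Motifs whose window would run past either end of the track are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for c in centers:
        if c - flank < 0 or c + flank >= len(coverage):
            continue
        for k in range(width):
            totals[k] += coverage[c - flank + k]
        used += 1
    return [t / used for t in totals] if used else totals


def profile_difference(ko_profile: list[float], wt_profile: list[float]) -> list[float]:
    """Per-offset KO minus WT; positive values mean higher coverage after Ash2l loss."""
    return [ko - wt for ko, wt in zip(ko_profile, wt_profile)]
```

Positive values of `profile_difference` on both sides of the motif center would correspond to the increased flanking coverage described for CTCF sites.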
Binding of CTCF to core promoters is reduced upon Ash2l loss
Because of the effects related to CTCF binding site motifs in our ATAC-seq data, we performed CTCF ChIP-seq experiments in replicates on control cells and cells treated with HOT for 7 days. We identified a total of 101513 binding sites (Fig. 4a and Supplementary Table S5; also available in GEO under accession number GSE205231), which is of the same order of magnitude as reported by others. For example, when the CTCF occupancy landscape was determined in 40 different human cell lines, an average of 61944 sites per cell line and a total of 107295 sites across the cell lines were detected [94]. Moreover, 2-3-fold more CTCF sites have been reported in murine than in human cells [95]. Of the sites with altered binding upon Ash2l knockout (q < 0.05; log2FC > 1), binding was lost at 719 and gained at 1682 sites (Supplementary Table S5). Notably, most of the lost sites were located in promoter regions (TSS ±3000 bp) (Fig. 4b). When we further subdivided the ±3000 bp window, lost binding sites were enriched close to the TSS within ±1000 bp, and their number decreased with increasing distance from the TSS, consistent with the ATAC-seq data (Fig. 4c). Compared with a random distribution of changed CTCF binding sites, lost CTCF binding sites were 10.2- and 19.6-fold enriched in the ±3000 bp and ±1000 bp promoter windows, respectively. Thus, the loss of CTCF binding at promoters was even more pronounced than the loss of accessibility measured by ATAC-seq (see above). For verification, differential CTCF occupancy at different genomic locations in response to Ash2l loss, as determined by ChIP-seq, was measured in independent ChIP-qPCR experiments (Fig. 4d and Supplementary Fig. S4a). At seven distinct loci (two unaffected, two with increased and three with reduced CTCF binding in the ChIP-seq data set), the alterations were reproducible.
Our findings are consistent with previous reports that CTCF binding competes with a fragile nucleosome close to the TSS [96,97] and that occupancy of promoter-linked CTCFL sites is negatively correlated with H3 loading [78].
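Subdividing the promoter window by distance to the TSS, as in Fig. 4c, can be sketched by finding each peak's nearest TSS and counting peaks per distance bin. This is a minimal single-chromosome illustration with hypothetical function names and 1 kb bins; a genome-wide analysis would repeat it per chromosome and, for enrichment values, compare the counts with those expected for randomly placed sites.

```python
from bisect import bisect_left


def nearest_tss_distance(pos: int, tss_sorted: list[int]) -> int:
    """Distance from a peak position to the closest TSS (tss_sorted must be sorted)."""
    if not tss_sorted:
        raise ValueError("at least one TSS is required")
    i = bisect_left(tss_sorted, pos)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(abs(tss_sorted[i] - pos))  # first TSS at or after the peak
    if i > 0:
        candidates.append(abs(tss_sorted[i - 1] - pos))  # last TSS before the peak
    return min(candidates)


def bin_by_tss_distance(peaks: list[int], tss: list[int], edges=(1000, 2000, 3000)) -> dict[str, int]:
    """Count peaks per distance bin; peaks beyond the last edge are 'distal'."""
    tss_sorted = sorted(tss)
    labels, lower = [], 0
    for edge in edges:
        labels.append(f"{lower}-{edge} bp")
        lower = edge
    counts = dict.fromkeys(labels, 0)
    counts["distal"] = 0
    for p in peaks:
        d = nearest_tss_distance(p, tss_sorted)
        for label, edge in zip(labels, edges):
            if d <= edge:
                counts[label] += 1
                break
        else:
            counts["distal"] += 1
    return counts
```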
Next, we compared the CTCF binding sites gained or lost upon Ash2l depletion with the sets of up- and downregulated genes [18]. Although a small number of downregulated genes lost CTCF binding in their promoter regions, the majority of lost CTCF sites were not associated with promoters of significantly downregulated genes (Supplementary Fig. S4b). This suggests that the loss of CTCF binding at promoters is unlikely to play a major direct role in gene repression upon Ash2l loss. The overlap between gained CTCF peaks and upregulated genes was also minimal (Supplementary Fig. S4b). Thus, the upregulated genes were also unlikely to be major targets of CTCF. Furthermore, we compared our CTCF ChIP-seq data set with annotated enhancers in MEF cells [65]. Notably, CTCF binding was decreased at enhancers (Supplementary Fig. S4c). Although the number of significantly altered CTCF binding sites was low, their reorganization may affect the clustering of transcriptional regulators and thereby modulate gene expression [98].
Because CTCF binding sites are associated with topologically associating domain (TAD) boundaries [49], we compared our CTCF ChIP-seq data set with TAD boundaries determined in mouse embryonic stem cells (mESCs) [60], as no annotated TAD boundary positions were available for MEFs. This comparison therefore has to be interpreted with caution. Both gained and lost CTCF peaks were associated with potential TADs in MEF cells (Supplementary Fig. S4d): 13% of the gained peaks and 30% of the lost peaks were TAD-associated. This suggests that higher-order chromatin organization was affected upon Ash2l loss. Considering that 15% of CTCF binding sites reside at TAD boundaries [60,99], these numbers are compatible with this interpretation. Together, these findings suggest that altered CTCF binding sites are linked to chromatin organization, and may thus affect gene expression indirectly, rather than to regulatory functions proximal to promoters.
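The overlap percentages above reduce to counting peaks that intersect at least one reference interval. The sketch below assumes half-open [start, end) coordinates on a single chromosome and non-overlapping, pre-merged reference intervals; the function name is hypothetical, and in practice interval tools such as bedtools perform this intersection genome-wide.

```python
from bisect import bisect_left


def fraction_overlapping(peaks: list[tuple[int, int]], intervals: list[tuple[int, int]]) -> float:
    """Fraction of peaks overlapping at least one reference interval.

    Coordinates are half-open [start, end) on one chromosome; intervals must not
    overlap each other, so their ends are sorted along with their starts.
    """
    intervals = sorted(intervals)
    starts = [s for s, _ in intervals]
    hits = 0
    for p_start, p_end in peaks:
        i = bisect_left(starts, p_end)  # intervals[:i] start before the peak ends
        # The last such interval has the largest end; it overlaps if it ends after the peak starts
        if i > 0 and intervals[i - 1][1] > p_start:
            hits += 1
    return hits / len(peaks) if peaks else 0.0
```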
The role of H3K4me3 in reorganizing active CTCF binding sites in Ash2l-KO MEF cells
To further compare the different data sets, we used the 1682 CTCF binding sites that gained binding in response to Ash2l loss in the ChIP-seq experiments and asked how this increased binding affected the neighboring chromatin. We observed increased accessibility around these CTCF binding sites (Fig. 5a), both for sites near promoters (TSS ±3000 bp) and for intragenic and intergenic sites. For lost CTCF binding sites, reduced accessibility was noted in promoter regions (Fig. 5b), consistent with the overall decrease in promoter accessibility. Similar tendencies were noted for lost sites in intragenic but not in intergenic regions, although the number of affected sites was small in both. Finally, we compared the lost and gained CTCF sites with respect to colocalized H3K4me3 signals. As the lost sites are predominantly near promoters (Fig. 4b), we expected a decrease in H3K4me3, which was indeed observed (Fig. 5c). In contrast, the gained sites, which are predominantly intra- and intergenic, showed very low H3K4me3 signals that did not change upon Ash2l loss (Fig. 5c). These findings suggest that, after loss of Ash2l and H3K4me3, CTCF dissociates from core promoter regions and may redistribute to more accessible intergenic sites. Whether this is a direct consequence of H3K4me3 depletion and chromatin compaction needs to be further investigated.