KMT2D preferentially binds mRNAs of the genes it regulates, suggesting a role in RNA processing

Abstract Histone lysine methyltransferases (HKMTs) perform vital roles in cellular life by controlling gene expression programs through the posttranslational modification of histone tails. Since many of them are intimately involved in the development of different diseases, including several cancers, understanding the molecular mechanisms that control their target recognition and activity is vital for the treatment and prevention of such conditions. RNA binding has been shown to be an important regulatory factor in the function of several HKMTs, such as the yeast Set1 and the human Ezh2. Moreover, many HKMTs are capable of RNA binding in the absence of a canonical RNA binding domain. Here, we explored the RNA binding capacity of KMT2D, one of the major H3K4 monomethyl transferases in enhancers, using RNA immunoprecipitation followed by sequencing. We identified a broad range of coding and non‐coding RNAs associated with KMT2D and confirmed their binding through RNA immunoprecipitation and quantitative PCR. We also showed that a separated RNA binding region within KMT2D is capable of binding a similar RNA pool, but differences in the binding specificity indicate the existence of other regulatory elements in the sequence of KMT2D. Analysis of the bound mRNAs revealed that KMT2D preferentially binds co‐transcriptionally to the mRNAs of the genes under its control, while also interacting with super enhancer‐ and splicing‐related non‐coding RNAs. These observations, together with the nuclear colocalization of KMT2D with differentially phosphorylated forms of RNA Polymerase II suggest a so far unexplored role of KMT2D in the RNA processing of the nascent transcripts.


| INTRODUCTION
Epigenetic modifications are crucial for precise spatiotemporal regulation of all gene expression programs and manifest in several molecular mechanisms. One of these is the histone lysine methylation mediated by histone lysine methyltransferase (HKMT) enzymes. The addition of a methyl mark to lysine recruits transcription factors to their specific sites of action or directly modifies chromatin structure to alter gene expression. In mammals, the KMT2 (or MLL, mixed lineage leukemia) family of lysine methyltransferases consists of six enzymes: KMT2A-D, KMT2F and KMT2G (Poreba et al., 2022). Members of this family are responsible for the mono- and dimethylation on enhancers, and tri-methylation on active promoters of histone H3 lysine 4 (H3K4) (Froimchuk et al., 2017). KMT2 family members act as parts of large protein complexes together with the obligatory core components of the COMPASS complex: WDR5, RbBP5, ASH2L, and DPY30; the absence of KMT2 proteins destabilizes the complex and leads to the disintegration of the core components (Jang et al., 2017). In yeast, all H3K4 methylation events are carried out by a single Set1 protein, while in Drosophila, there are three SET domain-containing proteins responsible for H3K4 methylation: Trx (Trithorax), Trr (Trithorax related) and dSet1 (Herz et al., 2012). In mammals, two orthologs are found for each Drosophila H3K4 methyltransferase: KMT2A and B (MLL1/2) are related to Trx, KMT2C and D (MLL3/4) to Trr and KMT2F and G (SETD1A/B) to dSet1 (Mohan et al., 2011). These pairs have largely, but not completely overlapping functions; for example, KMT2D knockout results in embryonic lethality in mice, while KMT2C null mice can be born, but are not viable (Sugeedha et al., 2021). The unique functions of KMT2D are underlined by the genetic disorders related to the mutations of the protein. Kabuki syndrome is largely caused by the truncation of KMT2D (van Laarhoven et al., 2015), while a similar, but clinically and genetically different condition, the Exon38/39 syndrome, originates from point mutations in a conserved segment of the protein (Cuvertino et al., 2020). KMT2D also contains a long glutamine-rich low complexity region (polyQ), which is not found in any other KMT2 family member. This segment was shown to drive liquid-liquid phase separation (LLPS) in vitro and in cells (Fasciani et al., 2020), and appears to be indispensable for the correct localization of KMT2D to mediator condensates. The results confirmed that the polyQ region was also important for the assembly of transcriptional condensates and for maintaining a proper nuclear architecture (Fasciani et al., 2020). These observations accentuate the importance of protein regions outside the enzymatic domain of histone methyltransferases and raise the possibility of important regulatory functions embedded in these segments. One such function can be the capability to form interactions with various RNA molecules that may serve as molecular scaffolds or assist in the targeting of the client proteins (Jang et al., 2017).
Accumulated scientific evidence underlines the involvement of long non-coding RNAs (lncRNAs) in the regulation of chromatin structure and epigenetic programming (Bartonicek et al., 2016). These RNA molecules are longer than 200 nt and do not have protein-coding capacity (Niknafs et al., 2015). Various experimental results show direct interactions of lncRNAs and chromatin modifier complexes with a detectable physiological outcome (Balas & Johnson, 2018; Cifuentes-Rojas et al., 2014; Heery et al., 2017; Mirzaei et al., 2022).
The connection between H3K4 methylation and RNA binding is further supported by the observation that yeast Set1, the only H3K4 methyltransferase encoded in the yeast genome, is capable of functionally relevant RNA binding (Luciano et al., 2017; Trésaugues et al., 2006).
KMT2D, with an amino acid sequence of over 5500 residues, is the largest member of the KMT2 family and contains long disordered regions of poorly characterized functions (Lazar et al., 2016). Bioinformatics analysis indicated the presence of several putative RNA binding regions within these segments, two of which showed RNA interaction in vitro (Szabó et al., 2018).
Based on these results and the preceding examples of the importance of RNA interactions in other HKMT proteins, we asked whether KMT2D interacts with RNAs within cells. To this end, we performed RNA immunoprecipitation followed by sequencing with the endogenous KMT2D and a separated RNA binding region (RBR-polyQ) and identified several interacting RNAs, which showed a significant overlap with the genes regulated by KMT2D. Further investigations revealed the possible existence of a so far undiscovered connection between KMT2D and RNA processing.

| RESULTS
As our previous observations suggested that RNA binding regions exist in the sequence of KMT2D (Szabó et al., 2018), we aimed at confirming the RNA interaction capacity in cells and identifying the possible RNA interactors of the protein. RNA immunoprecipitation followed by sequencing (RIP-Seq) revealed that KMT2D indeed binds a high number of transcripts in the nucleus (Tables S1 and S2). The initial list of pulled down RNAs contained 10,427 mRNAs and 635 non-coding RNAs that showed higher FPKMs in the KMT2D pulldown sample compared with the negative control in any of the three independent samples. Out of these, 5289 mRNAs and 132 lncRNAs were found in at least two datasets, while 1318 mRNAs and 26 lncRNAs were present in all three replicates (Figure S1). About 75% of the positive hits in each experiment were also represented in at least one other independent sample. Detailed bioinformatics analyses were carried out with the dataset containing RNAs present with higher FPKMs than the negative control in all replicates, unless otherwise indicated.
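The replicate filtering described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function names and the toy transcript identifiers are hypothetical, and each input set is assumed to hold the RNAs whose FPKM in the KMT2D pulldown exceeded the negative control in one replicate.

```python
from collections import Counter

def replicate_support(replicates):
    """Count, for each RNA, how many replicate hit lists contain it.

    `replicates` is a list of sets; each set is assumed to hold the RNAs
    enriched over the negative control in one RIP-Seq replicate.
    """
    counts = Counter()
    for hits in replicates:
        counts.update(set(hits))  # set() guards against duplicated IDs
    return counts

def hits_with_support(replicates, min_replicates):
    """Return the RNAs found in at least `min_replicates` replicates."""
    counts = replicate_support(replicates)
    return {rna for rna, n in counts.items() if n >= min_replicates}

# Toy data (hypothetical identifiers, not taken from Tables S1 and S2).
rep1 = {"RNA_A", "RNA_B", "RNA_C"}
rep2 = {"RNA_A", "RNA_B", "RNA_D"}
rep3 = {"RNA_A", "RNA_E"}

in_two_or_more = hits_with_support([rep1, rep2, rep3], 2)
high_confidence = hits_with_support([rep1, rep2, rep3], 3)
```

With three replicates, a threshold of 3 corresponds to the high-confidence list used for most analyses, while a threshold of 2 corresponds to the more permissive list.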
As a verification of the RIP results, we selected 45 RNAs, both coding and non-coding, from the transcripts identified in the RIP-Seq samples and performed RIP-qPCR experiments. Contrary to the bioinformatics analysis, here we chose RNAs based on either their average enrichment or possible functional relevance, and we included three additional RNAs that were not pulled down by KMT2D in any of the replicates, as negative controls. As indicated in Figure 1a,b (and Figure S2), the majority of the RNAs showed at least a 1.3-fold enrichment over the negative control antibody, clearly confirming the broad-range RNA binding of KMT2D. As expected, the negative control RNAs were not significantly enriched in the pull-down samples (Figure 1a). The other RNAs below the 1.3-fold enrichment threshold were BATF3 (Basic leucine zipper transcriptional factor ATF-like 3) and MBNL1 (Muscleblind-like protein 1) for mRNAs and SCARNA6 (small Cajal body-specific RNA 6) among the ncRNAs. The fold-enrichment values determined in the qPCR experiments for the bound RNAs were in good correlation with the fold-enrichments in FPKMs in the NGS data, corroborating the reliability of the results. SCARNA6 was found only in one RIP replicate, so it may indeed have been a false positive hit, but BATF3 and MBNL1 mRNAs showed a reasonable enrichment in three and two RIP samples, respectively. This underlines the need for further confirmation of the direct interactions of RNAs identified in the RIP screens.
To achieve a better understanding of the role of RNA binding in the function of KMT2D, we wanted to determine if there is a systematic enrichment of the mRNAs identified. To this aim, we compared the known list of genes under the regulation of KMT2D and the mRNAs we identified as intracellular partners to see if they significantly overlap. As a first step, we collected the KMT2D-regulated genes from three independent publications (Dawkins et al., 2016; Lin-Shiao et al., 2018; Ortega-Molina et al., 2015) and from these, we created a nonredundant dataset containing 2174 genes. This dataset shared 266 instances with our high-confidence KMT2D-interacting mRNA list, which suggests that genes under the control of KMT2D may be enriched in our RIP dataset. To confirm that this overlap is not accidental, we estimated the number of shared instances expected by chance. We generated 200 random datasets from the UniProt (UniProt Consortium, 2023) database, each of which had 1316 randomly selected human protein entries (the same number as the high-confidence mRNAs bound by KMT2D), and we calculated the overlap between the random datasets and the KMT2D-regulated genes (for further details see Materials and Methods). According to the results shown in Figure 1c, the average overlap between the random datasets and the KMT2D-regulated genes was around 105 genes and the maximum was 130 instances. Compared with these numbers, the observed overlap of 266 genes is much higher, indicating that the detected overlap is not a random occurrence and that KMT2D preferentially binds to the mRNAs of the genes it regulates. A similar rate of overrepresentation was found for mRNAs present in at least two biological replicates of the RIP samples (Figure S3), confirming the nonrandom distribution of the RIP datasets. Notably, 13 of 15 genes individually confirmed in the previous publications (Lin-Shiao et al., 2018; Ortega-Molina et al., 2015) were pulled down by KMT2D in at least one of our RIP experiments, with IKBKB (Ortega-Molina et al., 2015) and MINK1 (Lin-Shiao et al., 2018) enriched over the negative control in all of them (Supplementary Methods Table 1).
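The random-overlap control described above can be illustrated with a short resampling sketch. It is a simplified stand-in for the procedure in Materials and Methods, not a reproduction of it: the function names, the toy gene universe and the seed are hypothetical, and the empirical p-value helper is an addition of this sketch (the text itself reports only the mean and maximum of the random overlaps).

```python
import random

def random_overlaps(universe, regulated, sample_size, n_draws, seed=0):
    """Overlap sizes between `regulated` and random samples of `universe`.

    Each draw picks `sample_size` distinct entries without replacement,
    mimicking a random dataset of the same size as the bound-mRNA list.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    pool = sorted(universe)            # stable order before sampling
    regulated = set(regulated)
    overlaps = []
    for _ in range(n_draws):
        draw = rng.sample(pool, sample_size)
        overlaps.append(len(regulated.intersection(draw)))
    return overlaps

def empirical_p_value(observed, overlaps):
    """Share of random draws with overlap >= observed (+1 correction)."""
    exceed = sum(1 for x in overlaps if x >= observed)
    return (exceed + 1) / (len(overlaps) + 1)

# Toy example: 1000 hypothetical genes, 100 of them "regulated".
universe = [f"GENE_{i}" for i in range(1000)]
regulated = universe[:100]
overlaps = random_overlaps(universe, regulated, sample_size=50, n_draws=200)

mean_overlap = sum(overlaps) / len(overlaps)
max_overlap = max(overlaps)
```

In the setting of the text, `universe` would be the human UniProt entries, `regulated` the 2174 KMT2D-regulated genes, `sample_size` the size of the high-confidence list and `n_draws` 200; the observed overlap of 266 would then be compared with `mean_overlap` and `max_overlap`.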
Functional categorization revealed that RNA processing and RNA metabolism, as well as protein modification and chromatin binding, are significantly enriched in the KMT2D-bound mRNAs (Figure 1d). Of these categories, only the "Protein modifying enzyme" is overrepresented within the KMT2D-regulated genes, which may mean that the RNA binding is only relevant in a subset of genes, but it can also arise from the specific gene expression differences between the cellular systems studied. All of the previously published experiments were performed in different cellular environments from the one in this study, such as B cell lymphoma (Ortega-Molina et al., 2015), undifferentiated epidermal keratinocytes (Lin-Shiao et al., 2018) and pancreatic ductal adenocarcinoma (Dawkins et al., 2016), allowing for large variations in the expressed genes.
To confirm that the pulled down RNAs were transcribed from genes that were indeed under the control of KMT2D in our system, we silenced the expression of KMT2D in HEK293T cells and determined the expression levels of the genes that showed a high enrichment in the KMT2D RIP-qPCR experiments. Since the generally used housekeeping gene actin (ACTB) showed unstable expression upon the silencing of KMT2D (Figure 1e), we opted to apply combined internal controls, based on methodologies designed to tackle this problem (Liu et al., 2022; Liu et al., 2023; Tsaur et al., 2013; Valente et al., 2009), using the average of the combined expressions of GAPDH and SCARNA6 as reference to minimize the experimental error (Figure S4A,B). We chose these two genes, as our analyses indicated that they were the most stable pair in the siRNA-transfected cells (for further details, see Section 5). KMT2D silencing resulted in a reduction of its mRNA level by 70% (Figures 1e and S4C). Along with this change, we detected a decrease in the mRNA levels of most of the tested genes (Figure 1e), showing that these are indeed under the control of KMT2D in our system. The known KMT2D-target genes, NEAT1, CCNL1 and CCND1 (Dawkins et al., 2016; Lin-Shiao et al., 2018; Ortega-Molina et al., 2015) were all significantly downregulated upon the silencing of KMT2D, but it is important to note that the expression of RNA Polymerase II was not affected, although its RNA was consistently enriched in the RIP-Seq results (Supplementary Methods Table 1). Two out of the three negative control RNAs also remained unchanged upon KMT2D silencing and only CLDN7 showed a relatively mild decrease in expression. This may be an indirect effect, as none of these RNAs are listed as known KMT2D targets (Dawkins et al., 2016; Lin-Shiao et al., 2018; Ortega-Molina et al., 2015). The identified uneven alterations in gene expression levels indicate that although there is a detectable overlap between the KMT2D-regulated genes and the bound RNAs, this is not a total correspondence and RNA binding alone does not grant KMT2D a direct influence on gene expression.
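To make the idea of a combined internal control concrete, the sketch below normalizes a target gene to the average of two reference genes using the common 2^-ΔΔCt calculation. This is an assumption-laden illustration: the exact normalization used by the authors is described in their methods and may differ, the function name and all Ct values are hypothetical, and an amplification efficiency of about 100% is assumed for every primer pair.

```python
def relative_expression(ct_target_treated, ct_refs_treated,
                        ct_target_control, ct_refs_control):
    """Fold change of a target gene by the 2^-ddCt method.

    The reference Ct values are averaged first, so that the reference
    genes act as a single combined internal control.
    Assumes ~100% amplification efficiency for all primer pairs.
    """
    ref_treated = sum(ct_refs_treated) / len(ct_refs_treated)
    ref_control = sum(ct_refs_control) / len(ct_refs_control)
    d_ct_treated = ct_target_treated - ref_treated   # dCt, silenced cells
    d_ct_control = ct_target_control - ref_control   # dCt, control cells
    return 2.0 ** -(d_ct_treated - d_ct_control)     # 2^-ddCt

# Hypothetical Ct values; the reference lists stand for GAPDH and SCARNA6.
fold = relative_expression(
    ct_target_treated=26.0, ct_refs_treated=[18.0, 22.0],
    ct_target_control=24.0, ct_refs_control=[18.0, 22.0],
)
```

In this toy case the target needs two more cycles relative to the combined reference after silencing, which corresponds to a fold change of 0.25. Averaging Ct values is equivalent to taking the geometric mean of the reference expression levels, which is one reason a pair of stable references can dampen the error of a single unstable one.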
In line with previous observations (Lin-Shiao et al., 2018; Ortega-Molina et al., 2015), H3K4 monomethylation did not significantly change upon KMT2D silencing after 48 h (Figure S5). However, as the loss of KMT2D manifested in the local depletion of the H3K4 mono- and dimethyl marks in the published work (Ortega-Molina et al., 2015), it is possible that these changes occurred here as well. This may originate directly from the lower level of active KMT2D in the cells, or indirectly, from the destabilization of the COMPASS complex due to the absence of KMT2D (Jang et al., 2017), and is less likely connected to the RNA binding capacity of the protein. Therefore, these results confirm only the overlap between the bound RNAs and the KMT2D-regulated genes and not the importance of the RNA binding itself.
The observations summarized above show that KMT2D appears to be closely associated with a wide range of RNAs. However, KMT2D is known to function as a part of the COMPASS complex, a large, multisubunit protein assembly that contains other components already implicated in RNA binding (Yang et al., 2014). To rule out the possibility that the identified RNAs were pulled down by other COMPASS complex components, we expressed a fragment of KMT2D that is predicted to have RNA-binding capability and has previously been shown to bind RNA in vitro (Szabó et al., 2018). Since the polyQ segment that follows this RNA binding region is important to direct the localization of KMT2D to super enhancer condensates (while also driving the formation of transcriptional condensates) (Fasciani et al., 2020), we expressed our putative RNA binding region together with the polyQ fraction (RBR-polyQ, aa3500-4010) (Figure 2a) (Ren et al., 2009) and performed RIP experiments with this construct as well.
RIP-Seq results returned 4941 mRNAs and 164 noncoding RNAs that were pulled down by the RBR-polyQ (Tables S1 and S2). Out of these, 763 mRNAs and 11 ncRNAs were shared with the high-confidence list of RNAs bound by the full length KMT2D (Figure S6) and only 660 mRNAs and 53 ncRNAs were not found to be enriched in any of the KMT2D RIP-Seq samples (Tables S1 and S2). These results indicate that the RBR-polyQ region is capable of binding RNAs on its own, and that it shares a relevant target landscape with the full length protein. Nevertheless, the differences seen also underline the possible regulatory functions conveyed by other factors either in KMT2D itself, or its interacting partners. Since the RBR-polyQ region does not harbor the segments required to form interactions with other COMPASS complex components, possible differences in the localization of the two proteins also need to be taken into account.
As with the full length protein, we wanted to verify the RNA binding of RBR-polyQ by RIP-qPCR experiments. The results presented in Figures 2b and S2 confirmed that the RBR-polyQ region is able to pull down the same set of RNAs as the full length protein with an enrichment over the negative control above the two-fold threshold. It also appears to have a somewhat reduced specificity, as RNAs not bound significantly by KMT2D, like RBM4B and BATF3, both showed significant enrichments in the RBR-polyQ RIP-qPCRs. This corroborates the observations of the RIP-Seq data and suggests the existence of other regulatory elements within KMT2D that modulate the specificity and the strength of the RNA binding. To address the importance of the RNA binding region and separate the effects of RNA binding and the polyQ segment, we designed two additional constructs, one that lacks the polyQ region (aa 3500-3630 with 14 consecutive glutamines between aa 3600-3613 deleted, termed further on as ΔQ) and one that is devoid of the RBR (aa 3600-4010, termed polyQ) (Figure 2a). In vitro RNA binding results (Figure S7A) confirmed that the ΔQ variant had comparable RNA binding capacity to the WT RBR (for the in vitro RNA binding studies, we used the same construct as in our previously published work; Szabó et al., 2018). The RIP-qPCR results (Figure S7B) indicate that ΔQ has a measurable RNA binding capacity in the cells, while the polyQ alone does not bind RNAs above the negative control, indicating that the RNA recognition is mediated by the RBR region and does not depend on the presence of the polyQ segment.
Surprisingly, the expression of the RBR-polyQ resulted in a strong downregulation of several genes (Figure 2c), which even exceeded the effect of KMT2D silencing. This was an unexpected result, as these experiments did not involve the deliberate modification of KMT2D levels. Nevertheless, the mRNA level of KMT2D was robustly reduced in the RBR-polyQ-transfected cells (Figure 2c), to an extent comparable to the siKMT2D treatment (Figure 1e). Together with this downregulation, the mRNA level of RNA Polymerase II (RPOL2) was similarly diminished (Figure 2c), unlike in the case of siKMT2D (Figure 1e). This combined reduction in the amount of KMT2D and RPOL2 could explain the harsh effect of RBR-polyQ in terms of the mRNA expressions. It also indicates that the RBR-polyQ region possesses unique features that allow it to strongly interfere with RNA expression. To dissect whether these features are connected to the RNA binding or the properties of the polyQ region, we tested the effects of the two separate segments in the same setup. Strikingly, the ΔQ construct demonstrated a similar, but milder effect than RBR-polyQ, with strong downregulation of KMT2D and RPOL2 levels, but a less pronounced decrease in others (Figures 2c and S7C). In contrast, expression of the polyQ segment alone resulted in a marked overexpression of practically all mRNAs examined (Figure 2c), which might be connected to its role in the formation of transcriptional condensates (Fasciani et al., 2020). The observed differences between the effects of the various constructs highlight the importance of the RNA binding region and indicate that the downregulation induced by the RBR-polyQ and ΔQ constructs is not exclusively a consequence of the reduced levels of KMT2D. Nevertheless, they also emphasize the differences in the behavior of RBR-polyQ and KMT2D, underlining the importance of other regions in KMT2D.
An RNA-binding function of KMT2D has not been suggested yet, therefore the physiological role of this phenomenon is still largely unclear. Nevertheless, previous studies revealed that RNA binding can have dual roles in the function of HKMTs. In the case of Ezh2, RNA binding is mostly suggested to involve lncRNAs and to regulate the correct localization of the protein (Davidovich & Cech, 2015). On the other hand, yeast Set1 was shown to interact with both coding and non-coding RNAs and it was also proven that the binding can occur cotranscriptionally (Luciano et al., 2017), suggesting an involvement of Set1 in the processing of nascent transcripts. In order to differentiate between these two possibilities, we mapped the reads in the NGS data on the gene structures of the identified transcripts. This revealed that several reads were localized to intronic regions of the RNAs (Figure 3), indicating that RNA binding by KMT2D can occur cotranscriptionally. Binding of KMT2D to the CHASERR lncRNA further corroborates this notion, as this RNA has been shown to interact with nascent RNAs to activate gene expression (Antonov & Medvedeva, 2020).
Cotranscriptional binding of nascent RNAs may indicate that KMT2D plays a role in RNA processing. This would involve an intimate relationship with splicing-related nuclear bodies and the presence of the associated RNAs in the RIP dataset. A closer inspection of the high-confidence RIP dataset shows that several spliceosomal RNAs and Cajal-body related RNAs are indeed abundantly found in our dataset (Table S4). NEAT1, the structural RNA of the paraspeckles, also appears to bind to KMT2D; although it is not part of the high-confidence dataset, our RIP-qPCR experiments confirmed their interaction. Since literature data generated so far do not support the direct localization of KMT2D to paraspeckles (Nakagawa et al., 2018), we cannot exclude the possibility of this interaction happening before the formation of paraspeckles, especially since KMT2D also appears to regulate the expression of NEAT1 (Lin-Shiao et al., 2018). While the NEAT1 gene does not contain introns (Smith et al., 2022), the gene still gives rise to two isoforms, one short, polyadenylated transcript, NEAT1_1, and, when the polyadenylation site is occluded, a longer one, NEAT1_2, which forms the core of paraspeckles (Sunwoo et al., 2009). Mapping the reads in the NGS data on the NEAT1 gene (Figure 3h) indicates that KMT2D pulls down the long, paraspeckle-related form of the RNA. Whether it can also bind to the short, polyadenylated isoform remains to be determined through targeted experiments.
KMT2D has been shown to localize to mediator condensates and super-enhancers (Alam et al., 2020; Fasciani et al., 2020; Lai et al., 2017) and we could also confirm the interaction of KMT2D and Med1 by coimmunoprecipitation (Figure S8). This latter localization is corroborated by the presence of super-enhancer associated lncRNAs (Soibam, 2017) in the RNA interactome of the protein (Table S5). These RNAs, like GATA2-AS1 and OIP5-AS1, showed a high enrichment in the RIP-qPCR experiments with both the full-length KMT2D and the RBR-polyQ region, indicating stable binding (Figures 1b and S1B).
The presence of KMT2D is also necessary for the correct positioning of RNA polymerase II (RPOL2) at enhancers (Dorighi et al., 2017) and RPOL2 has been shown to colocalize with KMT2D in nuclear condensates (Fasciani et al., 2020). As it is known that different functions of RPOL2 are regulated by differential phosphorylation events on its disordered C-terminal domain (CTD), we asked whether KMT2D has a preferential association with RPOL2 phosphorylated on serine 2 (pSer2), related to transcription elongation (Bowman & Kelly, 2014), or on serine 5 (pSer5), involved in mRNA splicing (Nojima et al., 2018). As expected, KMT2D showed a non-uniform staining within the nuclei of HEK293T cells, with localized regions of condensation (Figure 4a). Colocalization with either phosphorylated form of RPOL2 was also clear (Figure 4a, white arrowheads). Colocalization pixel intensity analysis for randomly, but not blindly selected regions of interest across the nuclei revealed a significant difference in the colocalization of KMT2D with RPOL2 pSer2 and RPOL2 pSer5, with the latter showing a higher level of overlap (Figure 4a, box plot). This observation is consistent with the previous results suggesting an involvement of KMT2D in mRNA processing. KMT2D also appears to be colocalized with the splicing factor SC-35 in HCT116 cells (Figure S9A), a further indication of its possible connection to splicing.
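A pixel-intensity colocalization comparison of this kind can be sketched as follows. The text does not state which colocalization metric was used, so the Pearson correlation coefficient here is an assumption chosen for illustration; the function name and the intensity values are hypothetical, and each list stands for the pixel intensities of one channel inside a single region of interest.

```python
import math

def pearson_colocalization(channel_a, channel_b):
    """Pearson correlation of paired pixel intensities from two channels.

    Both inputs are flat sequences covering the same pixels of one region
    of interest; values near 1 indicate strong colocalization.
    """
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have equal length of at least 2")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no correlation")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical intensities for one region of interest.
kmt2d = [10, 20, 30, 40]
pser5 = [12, 22, 31, 41]   # tracks the KMT2D signal closely
pser2 = [40, 10, 30, 20]   # poorly related to the KMT2D signal

r_pser5 = pearson_colocalization(kmt2d, pser5)
r_pser2 = pearson_colocalization(kmt2d, pser2)
```

Computing one coefficient per region of interest for each antibody pair would give two distributions that can be compared in a box plot and tested for a difference, in the spirit of Figure 4a.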
Since the polyQ region of KMT2D has been shown to drive the phase separation of the protein (Fasciani et al., 2020), we examined if the localization of the RBR-polyQ was similar to that of the wild type protein. As shown in Figure 4b, the empty vector showed a rather uniform, diffuse signal, present both in the cytoplasm and the nucleus. Also, transfection with the empty vector did not have any effect on the distribution and localization of the endogenous KMT2D (Figure 4b). Because of the even distribution of the vector, a large overlap of the pixel intensities of KMT2D and the fluorescent signal from the m-cerulean could be detected (Figure 4b, upper box plot). In contrast to the empty vector, RBR-polyQ showed a clear condensation pattern that was similar to the distribution of KMT2D (compare Figure 4a,b, middle panel). This localization pattern appears to require the presence of both the RBR and the polyQ segments, as the separated constructs demonstrated clearly distinct behavior in the cells (Figure S9B). The ΔQ construct looks similar to the empty vector, clearly lacking the capacity to form condensates. This is expected, as it was extensively proven that the polyQ region is indispensable for phase separation of KMT2D (Fasciani et al., 2020). Nevertheless, the polyQ segment alone also showed altered localization, appearing in larger, droplet-like condensations, in line with its inherent phase separation capacity. The size of the observed polyQ condensates was about 2 μm in diameter, comparable to the polyQ condensates observed earlier (Fasciani et al., 2020). The amount and distribution of the endogenous KMT2D was perturbed upon the expression of RBR-polyQ: a reduced level of condensation as well as a reduced protein level could be observed (Figures 4b and S10A). Moreover, when pixel intensities within the condensates were calculated, it became evident that KMT2D was excluded from the RBR-polyQ condensates (Figure 4b, lower bar graph).
RBR-polyQ also had a marked effect on the distribution of both RPOL2 pSer2 and pSer5. Confocal images presented in Figure 4c show that both RPOL2 pSer2 and pSer5 appear to be more diffuse than in the untransfected cells (Figure 4a) and the empty vector controls (Figure 4c). It is also clear that both RPOL2 forms show reduced colocalization with the RBR-polyQ condensates compared with the native protein (compare the graphs in Figure 4a,c). This is in line with the observation that KMT2D is necessary for the localization of RPOL2 to enhancer regions (Dorighi et al., 2017) and further confirms that the full length KMT2D is prevented from localizing to the condensates formed by RBR-polyQ. Our analysis also revealed that RPOL2 pSer5 is more severely affected by RBR-polyQ expression than RPOL2 pSer2 (Figure 4c, box plots), suggesting that RNA processing may be more dependent on the presence of a full length KMT2D than transcription elongation. Together with these changes, we observed an approximately 30% decrease in RPOL2 protein level in the RBR-polyQ-transfected cells (Figure S10B), which is in line with the reduction of RPOL2 mRNA detected earlier (Figure 2c). Compared with the global decline of RPOL2 levels, the amount of RPOL2 pSer2 and pSer5 was even more reduced, with only around 40% remaining signal (Figure S10C), providing further background for the strong influence of RBR-polyQ on gene expression.

| DISCUSSION
RNA binding by HKMTs is not unprecedented in the literature (Bure et al., 2022), with Ezh2 being the most well-characterized example (Mirzaei et al., 2022). Although sporadic evidence indicates that several other HKMTs are also able to interact with specific RNAs (Hu et al., 2019; Pan et al., 2018; Wang et al., 2019; Wang et al., 2021), the generality of the phenomenon is still unclear.
Even though it does not have a canonical RNA binding domain, our results confirm that KMT2D belongs to the group of RNA-binding HKMTs. According to our RIP-Seq data, similarly to Ezh2 and Set1, KMT2D has a broad RNA interactome, implying a relatively promiscuous binding with low sequence specificity, but unlike Ezh2, KMT2D appears to preferentially bind mRNAs of the genes under its control. In this respect, its RNA binding profile more closely resembles the RNA interactome of Set1 (Luciano et al., 2017), its evolutionary ancestor.
Functional relevance of the RNA binding can manifest in two major, but not mutually exclusive manners. One, just like in the case of Ezh2 (Mirzaei et al., 2022), RNA interaction can play a role in the targeting and precise positioning of the methyltransferase on the chromatin. Two, the interaction may have a direct role in the life of RNAs, either in their processing, maturation, or transport.
Our comprehensive approach to identify the RNA interactome of KMT2D allows us to assess both possibilities.

| Targeting and localization of KMT2D
Targeting of HKMT proteins through RNA binding is generally achieved by the interaction with lncRNAs (Statello et al., 2021). In our study the most relevant RNAs in this sense are the lncRNAs associated with super-enhancers. We could identify 19 super-enhancer associated antisense RNAs (Soibam, 2017) in our RIP-Seq experiments and we confirmed binding to KMT2D with RIP-qPCR in the case of two. Both OIP5-AS1 and GATA2-AS1 showed significant enrichment over the negative control in the RIP-qPCR experiments, supporting the results of the RIP-Seq experiments. It has already been established that KMT2D is necessary for super-enhancer formation (Alam et al., 2020), but our results showing that super-enhancer associated RNAs can make direct contact with KMT2D add further details to the mechanism of super-enhancer formation. Through the interaction with the super-enhancer lncRNAs, KMT2D can be retained in the structures for extended periods without the necessity to remain fixed on the chromatin. Given the low abundance of the protein within the nucleus (van Nuland et al., 2013), this mechanism may ensure the availability of KMT2D in the proper localization and its concentrated presence during gene activation (Figure 5a). The phase separation capability of the glutamine-rich region of KMT2D (Fasciani et al., 2020) can even contribute to the formation of the super-enhancer structure. It is also worth noting that KMT2D itself appears to be necessary for enhancer RNA synthesis (Dorighi et al., 2017).
While the localization of KMT2D to super-enhancers could influence the expression of several genes at the same time, more precise, fine-tuned targeting through RNA binding can also be envisioned. Several of the identified lncRNA partners of KMT2D, like HAGLR (Sun et al., 2020), FENDRR (Munteanu et al., 2021), SNHG-15 (Olatubosun et al., 2022), and NORAD (Lee et al., 2016) have been implicated in different cancers through suggested involvement in the regulation of gene expression. In this scenario, direct interaction of these lncRNAs and KMT2D may result in the recruitment of KMT2D to specific genomic targets, inducing the expression of selected genes (Figure 5b). In an alternative scenario, binding to lncRNAs can have the opposite effect, sequestering KMT2D and thus achieving the downregulation of specific target genes (Figure 5c), through a mechanism similar to the one described for the GAS5 lncRNA (Kino et al., 2010). Cancer-related overexpression or downregulation of these targeting RNAs may result in dysregulated gene expression and tumor progression, much like in the case of MALAT1 (Kim et al., 2017) and HOTAIR (Battistelli et al., 2017). NORAD represents a specific case among the bound lncRNAs of KMT2D, as it appears to be able to regulate KMT2D expression itself (Chen et al., 2022). The fact that these two molecules can also interact with each other indicates the existence of a finely regulated feedback of their expression. As NORAD has also been implicated in several cancers (Soghli et al., 2021) and plays a critical role in genome stability (Elguindy & Mendell, 2021; Lee et al., 2016; Munschauer et al., 2018), the details of this regulation merit further investigation.
Another specific example is NEAT1, the architectural RNA of the paraspeckles (Nakagawa et al., 2018).Our RIP-seq and RIP-qPCR experiments confirmed consistent binding of this RNA to KMT2D, but there is no evidence of KMT2D participation in the paraspeckle proteome in the literature (Wang & Chen, 2020).On the other hand, NEAT1 expression is also controlled by KMT2D (Lin-Shiao et al., 2018), suggesting that binding occurs cotranscriptionally and may have a function in the processing of the RNA.

| Involvement in mRNA processing
The large overlap between the genes under the control of KMT2D and the mRNAs bound by the protein, together with the fact that binding appears to occur cotranscriptionally, raises the possibility that KMT2D is involved in mRNA processing. In fact, this has already been established for Set1, the yeast orthologue of KMT2 proteins (Luciano et al., 2017). In that case, similarly to KMT2D, cotranscriptional RNA binding in parallel with the presence of spliceosomal snRNAs in the RNA interactome indicated a possible role for the protein in RNA splicing (Figure 5d). This putative role for KMT2D in pre-mRNA splicing is supported by the colocalization of the protein with RNA Polymerase II phosphorylated on serine 5, which is involved in cotranscriptional splicing through physical interaction with the spliceosome (Nojima et al., 2018). Colocalization of KMT2D and the splicing factor SC-35 (Figure S9) further supports this possibility.
Moreover, Luco et al. (2010) determined that specific histone modifications were indicative of cell type-specific alternative splicing events in several genes. One of the significantly altered histone modifications was the H3K4me1 mark, suggesting a possible role for KMT2D in the regulation of the process. The studied alternative exons were located in the FGFR2, TPM1, and TPM2 genes, all of which show binding to KMT2D at the mRNA level (Supplementary Table S1). The generally accepted explanation for the crosstalk between histone methylation and mRNA splicing is the recruitment of splicing factors to the modified histones through chromatin binding proteins (Naftelberg et al., 2015; Yearim et al., 2015). Our observations raise the possibility of a further layer of regulation, where direct binding of the nascent RNA to the histone modifier can mask an alternative splicing site or, in another scenario, stabilize the RNA in a conformation more accessible to splicing factors.
Similarly to yeast Set1, the RNA interactome of KMT2D also contains several splicing-related scaRNAs and snoRNAs, as well as spliceosomal snRNAs (Table S4), suggesting the localization of KMT2D to the membraneless nuclear organelles responsible for these processes (Figure 5e). However, these interactions may also result from indirect binding, that is, the pulled-down mRNAs may already be in the process of maturation. Based on our results alone, this possibility cannot be rejected, and binding of spliceosomal snRNAs directly to KMT2D needs to be confirmed independently.
Since KMT2D is not localized to the nucleolus, it is unlikely that the snoRNAs are bound by KMT2D at their site of function. Interestingly, most of the snoRNAs in our RIP datasets belong to the orphan category, with no known rRNA targets (Dupuis-Sandoval et al., 2015), suggesting that KMT2D-interacting snoRNAs have roles predominantly unrelated to ribosome biogenesis. A group of five interacting snoRNAs (SNORA73B, SNORA40, SNORA74A, SNORD15B, and SNORD97) has been shown to be involved in the regulation of chromatin structure, being necessary for the maintenance of open euchromatin (Schubert et al., 2012). Their interaction with KMT2D may be involved in this role. It is also important to note that none of these snoRNAs appear to interact with the RBR-polyQ region, indicating that their binding mechanism is probably different from that of the majority of the other KMT2D-interacting RNAs.

| The role of RBR-polyQ
Our results indicate that the RBR-polyQ region of KMT2D itself is capable of stable RNA binding and that several of its RNA partners are shared with the native protein. Nevertheless, the high number of bound RNAs and the differences in the RIP-qPCR results indicated that RBR-polyQ has reduced binding specificity compared with KMT2D. For example, BATF3 was highly enriched in the RBR-polyQ RIP-qPCR (Figure S2), contrary to the non-significant enrichment in the KMT2D pulldowns (Figure 1a). Moreover, even the RNAs highly enriched in the KMT2D RIP-qPCR samples produced higher enrichment in the RBR-polyQ RIP-qPCR, strongly suggesting that other regions of KMT2D, or its associated protein partners, also participate in the regulation of RNA binding and that the identified RNA binding region determines specificity only partially. It has to be mentioned that bioinformatics predictions indicate the presence of several other putative RNA recognition sequences in KMT2D, one of which has already been shown to interact with RNAs in vitro (Szabó et al., 2018). This delineates a scheme where multiple cooperating RNA binding surfaces achieve specific and regulated interaction between KMT2D and RNAs, something that has been proven for Ezh2 (Szabó et al., 2022).
When interpreting our results, we must also take into consideration the effect of RNA binding on the phase separation of KMT2D. Fasciani et al. (2020) showed that the expression of a truncated KMT2D variant lacking the polyQ region led to diminished condensation of transcriptional cofactors, like MED1 and BRD4, and concluded that KMT2D facilitates the assembly of chromatin clusters through liquid-liquid phase separation driven by the polyQ tract of the protein. Interestingly, our observations indicate that while the polyQ segment alone enhances gene expression, the presence of the RBR region severely disrupts the expression of the bound RNAs (Figure 2c). This phenomenon merits further investigation, but is probably related to the inhibitory effect of the RBR region on KMT2D-containing condensate formation.

| CONCLUSION
Summarizing our results, it is apparent that KMT2D is able to bind an abundant set of coding and non-coding RNAs, and there seems to be a significant overlap between the bound RNAs and the genes directly under the control of KMT2D. This observation, taken together with the fact that the RNAs appear to be bound co-transcriptionally, suggests that KMT2D might be involved in the processing of the nascent RNA molecules.
A further important role of the RNA binding can be the direction and retaining of KMT2D in super-enhancer condensates, which is supported by the presence of a large number of super-enhancer lncRNAs in the RNA pool of KMT2D and the localization and effect of RBR-polyQ.
A third functional relevance of RNA binding can be the direct targeting of the enzyme to specific genomic loci, resulting in the selective regulation of individual genes.
Underlining the physiological relevance of our results is the recent discovery of a set of disease-related point mutations localizing exclusively to the RNA binding region studied here (Baldridge et al., 2020; Cuvertino et al., 2020; Stadelmaier et al., 2021). All of these mutations result in severe developmental disorders that are different from Kabuki syndrome. Importantly, H3K4me levels appeared to be unaffected by the expression of the mutant KMT2D, indicating that other, non-canonical functions of the protein are affected. The demonstration of the in vivo RNA binding capacity and the possible involvement of KMT2D in RNA maturation offers a possible explanation of the disease mechanism.

| Nuclear extraction for RNA immunoprecipitation
HEK293T cells were grown to 80%-90% confluency on a TC-100 mm plate. After washing twice with ice-cold phosphate-buffered saline (PBS), cells were UV crosslinked once at 150 mJ/cm² on ice using a CL-1000 UV crosslinker, then harvested by scraping in 5 mL of ice-cold PBS. Collected cells were lysed in 1 mL per 10 × 10⁶ cells of cytoplasmic extraction buffer (20 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl₂, 0.5 mM DTT, 0.05% NP-40, 1× protease inhibitor cocktail [Sigma, cat no.: S8820]) for 15 min on ice with frequent mixing, then spun at 3000g for 5 min at 4 °C. The supernatant was discarded, and the nuclei were washed once with 1 mL of ice-cold PBS to remove cytoplasmic contaminants. Nuclei were lysed in 200 μL per 10 × 10⁶ cells of nuclear extraction buffer (50 mM Tris-HCl pH 7.4, 150 mM NaCl, 1% NP-40, 5 mM EDTA, 1 mM EGTA, 1 mM PMSF, 50 mM NaF, 1× protease inhibitor cocktail, 4 U/mL RNase inhibitor [Sigma, cat no.: 3335402001]) for 30 min on ice with frequent vortexing. The nuclear lysate was centrifuged at 16,000g for 10 min at 4 °C. The supernatant was diluted with 800 μL of dilution buffer (DEPC-treated PBS pH 7.4 and 6.5% glycerol). The diluted nuclear lysate was pre-cleared with 60 μL of protein A/G coated magnetic bead slurry (ThermoFisher, cat no.: 88802) for 2 h at 4 °C on end-over-end rotation. Beads were removed on a magnetic rack and the cleared lysate was transferred to a new 1.5 mL microfuge tube for RNA immunoprecipitation.

| RNA immunoprecipitation
Exactly 250 μL of diluted nuclear lysate per RNA immunoprecipitation (RIP) reaction was incubated overnight at 4 °C with 6 μg of either anti-KMT2D or RIP negative control IgG from the same species (see the list of antibodies used in Supplementary Methods Table 1) on end-over-end rotation. An aliquot of 25 μL of the diluted nuclear lysate was kept as 10% input at −80 °C for RIP-qPCR. On the next day, 12 μL of protein A/G coated magnetic bead slurry (ThermoFisher, cat no.: 88802) or anti-FLAG conjugated M2 magnetic beads (Millipore, cat no.: M8823) per IP were washed three times with 200 μL of bead washing buffer (DEPC-treated PBS pH 7.4, 0.5% Tween-20). Nuclear lysate-antibody mixtures were incubated with washed beads for 2 h on slow end-over-end rotation at 4 °C. Ribonucleoprotein (RNP)-bound magnetic beads were briefly spun at 3000 rpm and separated on a magnetic rack for 5 min at 4 °C. The supernatants were discarded and beads were washed three times with 200 μL of low-salt wash buffer (150 mM NaCl, 20 mM Tris-HCl pH 8, 2 mM EDTA, 1% Triton X-100, 1× protease inhibitor cocktail [Sigma, cat no.: S8820], 80 U/mL Protector RNase Inhibitor [Sigma, cat no.: 3335402001]). To avoid the elution of non-specific protein complexes bound to the microfuge tube wall, beads were transferred to new pre-chilled microfuge tubes and washed again three times with bead washing buffer. Bound RNAs were eluted with 200 μL of IP elution buffer (100 mM Tris pH 8, 10 mM EDTA, 1% SDS, 0.2 μg/μL Proteinase K), and 10% input samples were brought up to a final volume of 200 μL with the same elution buffer. The elution was performed at 60 °C for 30 min with frequent vortexing, then samples were cooled on ice for 5 min. After a brief spin at 3000 rpm, the eluates of the RIP reactions were transferred to new microfuge tubes and taken forward to RNA purification.

| RNA purification
Immunoprecipitated RNAs with their corresponding 10% inputs, and total RNAs, were extracted according to the Imprint® RNA Immunoprecipitation protocol (Sigma, cat no.: RIP) or following the manufacturer's instructions for the Direct-Zol RNA Miniprep kit (Zymo Research, cat no.: R2050). Depending on the downstream applications, purified RIP RNAs were either pelleted for next-generation sequencing or re-suspended/eluted in 20 μL of nuclease-free water for cDNA synthesis followed by RIP-qPCR, and total RNAs in 20 μL of nuclease-free water were prepared for cDNA synthesis.

| RNA sequencing data analysis
We prepared three biological replicates for the RIP samples with the KMT2D and a negative control antibody and one experiment with the RBR-polyQ region and a negative control antibody. After rRNA depletion, 1 μg of RNA from each sample was used for the RIP-sequencing protocol on an Illumina NovaSeq 6000 high-throughput NGS platform (Novogene, Cambridge, UK).
NGS results were analyzed as follows. After trimming the adaptors and the low-quality sequences with the trimming tool Trimmomatic (Bolger et al., 2014), the obtained 150 bp paired-end reads were aligned to the GRCh38 human transcriptome using HISAT2 (Kim et al., 2019). Aligned reads were manipulated (converted to BAM format, sorted, and indexed) with samtools (Li et al., 2009). Transcriptome assembly and differential expression analysis were done using the Cufflinks program package (cufflinks, cuffmerge, cuffquant, and cuffdiff) (Trapnell et al., 2012). Differences between the identified transcripts were calculated based on the differences in FPKM (fragments per kilobase of exon per million mapped fragments) numbers of the KMT2D and the negative control samples. We considered a gene a positive hit if it had a higher FPKM value in the RIP sample than in the negative control.
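The hit-calling criterion described above can be illustrated with a minimal Python sketch. It uses the plain textbook definition of FPKM and a simple per-gene comparison; it is not the Cufflinks estimation procedure, and the function names, the example numbers, and the choice to treat genes absent from the control as having an FPKM of 0 are assumptions made for illustration only.

```python
from typing import Dict, List


def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments (plain definition)."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and total mapped fragments must be positive")
    # fragments / (length in kb) / (mapped fragments in millions)
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def positive_hits(rip_fpkm: Dict[str, float], control_fpkm: Dict[str, float]) -> List[str]:
    """Return genes whose FPKM in the RIP sample exceeds that in the negative control.

    Assumption: a gene missing from the control is treated as FPKM = 0.
    """
    return sorted(
        gene for gene, value in rip_fpkm.items()
        if value > control_fpkm.get(gene, 0.0)
    )


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the study.
    print(fpkm(fragments=500, exon_length_bp=2000, total_mapped_fragments=20_000_000))
    rip = {"geneA": 12.5, "geneB": 3.0, "geneC": 0.8}
    control = {"geneA": 4.0, "geneB": 3.0}
    print(positive_hits(rip, control))
```

Note that a strict "greater than" comparison is used, so genes with identical FPKM values in the two samples are not called as hits.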

| Randomized datasets and comparison with the literature data
For the gene enrichment analysis we created three datasets: the UniProt set, the Literature set and the Experimental set.
The UniProt set had 26,819 entries, containing all human primary gene names from the UniProt database (Release 2023_01) (UniProt Consortium, 2023). The Literature set consisted of the genes regulated by KMT2D, based on information found in the literature (Dawkins et al., 2016; Lin-Shiao et al., 2018; Ortega-Molina et al., 2015), and had 2174 unique genes that were also present in the UniProt list. The Experimental set was based on the high-confidence KMT2D-binding mRNA dataset (1318 genes), but two hits had to be excluded because they were not found in the UniProt dataset, resulting in 1316 unique gene names.
Next, we generated 200 different gene lists from the UniProt list, each with 1316 randomly selected entries (corresponding to the size of the Experimental set). We then calculated the distribution of the overlapping hits between these 200 randomly generated lists and the KMT2D-regulated genes and checked the overlap of the KMT2D-regulated genes and our identified mRNAs relative to this distribution. PANTHER protein class analysis was performed using the PANTHER Classification System online tool (Mi et al., 2019; Thomas et al., 2022).
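The randomization procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script used in the study: the function names, the fixed random seed, and the summary statistics are hypothetical choices. For orientation, with the set sizes given above the overlap expected by chance is about 1316 × 2174 / 26,819 ≈ 107 genes.

```python
import random
from typing import Dict, Iterable, List


def overlap_distribution(
    universe: Iterable[str],
    regulated: Iterable[str],
    sample_size: int,
    n_lists: int = 200,
    seed: int = 0,
) -> List[int]:
    """Overlap sizes between the regulated gene set and random gene lists.

    Each random list holds `sample_size` genes drawn without replacement
    from `universe`. The seed is an assumption added for reproducibility.
    """
    rng = random.Random(seed)
    pool = sorted(set(universe))  # sorted so results do not depend on set ordering
    regulated_set = set(regulated)
    return [
        len(regulated_set.intersection(rng.sample(pool, sample_size)))
        for _ in range(n_lists)
    ]


def empirical_summary(random_overlaps: List[int], observed_overlap: int) -> Dict[str, float]:
    """Place the observed overlap relative to the randomized distribution."""
    n = len(random_overlaps)
    at_least_observed = sum(1 for x in random_overlaps if x >= observed_overlap)
    return {
        "mean_random_overlap": sum(random_overlaps) / n,
        "max_random_overlap": float(max(random_overlaps)),
        "fraction_random_ge_observed": at_least_observed / n,
    }


if __name__ == "__main__":
    # Toy universe; the study used 26,819 UniProt gene names, 2174 regulated
    # genes, and lists of 1316 entries.
    genes = [f"gene{i}" for i in range(1000)]
    regulated = genes[:100]
    overlaps = overlap_distribution(genes, regulated, sample_size=50)
    print(empirical_summary(overlaps, observed_overlap=20))
```

With only 200 random lists, the empirical fraction cannot resolve values below 1/200, so it is best read as a coarse indication of where the observed overlap falls rather than as a precise P value.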

| cDNA synthesis
For each cDNA synthesis reaction, equal volumes of RIP RNAs and their corresponding 10% input, or equal amounts of total RNA (0.8-1 μg/reaction), were reverse transcribed according to the manufacturer's instructions for the SuperScript™ III Reverse Transcriptase kit (ThermoFisher, cat no.: 18080044). Reactions were briefly vortexed and spun down for 5 s. The following cycling conditions were applied: incubation at 25 °C for 5 min, followed by 50 min at 50 °C, then 10 min at 55 °C. Finally, the reactions were heat inactivated at 70 °C for 15 min.

| RT-qPCR
RT-qPCR reaction mixes were prepared as follows: 10 μL TaqMan™ Fast Advanced Master Mix (ThermoFisher, cat no.: 4444557), 1 μL TaqMan probe (for the list of probes see Supplementary Methods Table 2), and 7 μL nuclease-free water. The reaction mixture was loaded at 18 μL/well into the corresponding wells of a 96-well RT-qPCR plate (ThermoFisher, cat no.: 4346907 or cat no.: N8010560), and 2 μL of cDNA was added per reaction. Using QuantStudio™ 5 or 6 Pro Real-Time qPCR (Thermo Fisher Scientific) machines, the fast comparative amplification cycling setup was applied for 40 cycles. To calculate the yield (% input) and specificity (fold enrichment) of each probe in specific RIP reactions compared with the negative control RIP, cycle threshold values were analyzed to obtain ΔΔCt according to the method published by Marmisolle et al. (2018).
The percentage of remaining expression of KMT2D target RNAs was calculated according to the equation 2^(−ΔΔCt) × 100, with SCARNA6 and GAPDH used as combined internal control genes.
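As a worked illustration of the remaining-expression calculation, the following Python sketch applies the standard comparative Ct convention. It assumes that the two internal controls are combined by taking the arithmetic mean of their Ct values, which is one common choice and may differ from the exact procedure used here; the function names and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(ct_target: float, ct_references: Sequence[float]) -> float:
    """Ct of the target minus the mean Ct of the internal controls (assumed combination)."""
    return ct_target - mean(ct_references)


def remaining_expression_percent(
    ct_target_treated: float,
    ct_refs_treated: Sequence[float],
    ct_target_control: float,
    ct_refs_control: Sequence[float],
) -> float:
    """Remaining expression as 2^(-ddCt) x 100, relative to the control sample."""
    ddct = delta_ct(ct_target_treated, ct_refs_treated) - delta_ct(
        ct_target_control, ct_refs_control
    )
    return 2 ** (-ddct) * 100


if __name__ == "__main__":
    # Hypothetical Ct values: reference Cts are listed as [SCARNA6, GAPDH].
    # The target Ct rises by one cycle after knockdown, so expression halves.
    print(remaining_expression_percent(26.0, [20.0, 22.0], 25.0, [20.0, 22.0]))  # 50.0
```

Because each cycle corresponds to a twofold change under the assumption of perfect amplification efficiency, a ΔΔCt of 1 corresponds to 50% remaining expression and a ΔΔCt of 0 to 100%.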

| Plasmids and DNA constructs
The RBR-polyQ sequence of KMT2D was amplified from reverse-transcribed total RNA extracted from HEK293T cells. The amplified RBR-polyQ sequence was cloned between the BamHI and NotI sites in an N-terminal mCerulean vector, which carries a FLAG tag on the C terminus (a gift from Dr. Attila Reményi). The map of the expression vector with the cloned RBR-polyQ sequence is presented in Supplementary Methods Figure S1.
The polyQ region alone was amplified from the mCerulean-RBR-polyQ-FLAG expression vector, then subcloned into the empty vector between the BamHI and XhoI restriction sites.
The deltaQ sequence was amplified from the pET22b-RBRΔQ cloning vector (Szabó et al., 2018), then subcloned into the empty mCerulean expression vector between the BamHI and XhoI restriction sites.
The primer sequences used for the preparation of the constructs are listed in Supplementary Methods Table 4.

| Cell transfection
HEK293T cells were seeded either at a density of 10 × 10³ cells/cm² on Ø12 mm pre-cleaned and sterilized round coverslips (VWR, cat no.: 630-2190) in 24-well plates (if immunostaining was planned) or at 7 to 8 × 10³ cells/cm² in 6-well plates (if RNA immunoprecipitation or total RNA extraction was planned) and grown in complete DMEM medium for 2 days in a humidified atmosphere at 37 °C and 5% CO₂. On the day of transfection, seeded cells were washed once with pre-warmed serum-free DMEM and re-incubated in the same medium at 37 °C and 5% CO₂. Meanwhile, the DNA-transfection reagent complex was prepared with Lipofectamine™ 2000 (Invitrogen, cat no.: 11668027) in serum-free DMEM according to the manufacturer's instructions with some modifications. The siRNAs (Supplementary Methods Table 3) at 15 pmol/cm² or plasmid DNAs (mCerulean-RBR-polyQ/WT-FLAG expression vector or empty vector) at 250-800 ng/cm² were diluted in 125 μL of serum-free DMEM medium. After 5 min incubation at room temperature (RT), diluted siRNAs or plasmid DNAs were gently mixed with diluted Lipofectamine reagent and incubated at RT for 20 min. The resulting mixture was slowly dropped over the starved cells to a final volume of 300 μL or 1 mL, depending on the well size. Transfection proceeded for 5 h in a humidified atmosphere at 37 °C and 5% CO₂, then pre-warmed complete DMEM without antibiotics was added to the transfected cells to a final concentration of 5% fetal bovine serum, and the cells were incubated overnight in a humidified atmosphere at 37 °C and 5% CO₂. On the next day, transfected cells were either subjected to immunostaining or 1 mL of pre-warmed complete DMEM was added to the wells, and the cells were incubated again overnight at 37 °C and 5% CO₂. Two days post-transfection, transfected cells were washed once with ice-cold PBS and harvested for nuclear extraction followed by RNA immunoprecipitation (RIP) or total RNA extraction.

| Cell sorting
Transfected cells expressing the mCerulean fluorescent protein tag were sorted on a BD FACSAria III (BD Biosciences, San Jose, CA) cell sorter, using violet laser (405 nm) excitation and 525/50 nm emission.

| Co-immunoprecipitation
Co-immunoprecipitation (Co-IP) was performed in the same way as the nuclear extractions and RIPs described above, with some modifications. About 3 × 10⁷ cells per Co-IP were cross-linked with 1 mL of 0.75% formaldehyde for 8 min at RT and quenched with 125 mM glycine for 5 min at RT, then 10 min on ice. Cells were lysed in 1 mL (per 10 × 10⁶ cells) of cytoplasmic extraction buffer. The nuclear pellets were resuspended in 750 μL (per 3 × 10⁷ cells) of nuclear extraction buffer without RNase inhibitor for 15 min on ice with frequent vortexing, then diluted with an equal volume of dilution buffer (PBS pH 7.4, 10% glycerol). Using a Diagenode SA BIORUPTOR Plus, the nuclei (500 μL per 1.5 mL microfuge tube) were sonicated at high amplitude (level 6) for 20 cycles (30 s pulse, 24 s rest). The nuclear lysate was centrifuged at 16,000g for 10 min at 4 °C. Supernatants were mixed, and equal volumes corresponding to 3 × 10⁷ nuclei (approximately 1 mg of nuclear protein) were used for each Co-IP. Nuclear lysates of about 2 to 2.5 × 10⁶ nuclei in 25 μL were kept at −80 °C as Co-IP input. Co-IP reactions were pre-cleared with protein A/G coated magnetic beads (ThermoFisher, cat no.: 88802) for 1 h at RT on end-over-end rotation, then the beads were discarded. For each IP reaction, 6 μg of anti-target or negative control IgG of the same species was used (for a list of antibodies, see Supplementary Methods Table 1). After overnight incubation at 4 °C, the nuclear lysate-antibody complex was incubated with 30 μL of protein A/G coated magnetic beads for 1 h at RT on end-over-end rotation. Once the Co-IP was done, beads were successively washed two times with low-salt wash buffer, two times with high-salt wash buffer (500 mM NaCl, 20 mM Tris-HCl pH 8, 2 mM EDTA, 1% Triton X-100, 1× protease inhibitor cocktail, 80 U/mL Protector RNase Inhibitor), and one time with TE buffer (15 mM Tris-HCl pH 7.4, 5 mM EDTA). Protein complexes were eluted with 25 μL of 1× Laemmli buffer containing 25 mM dithiothreitol (DTT), and input samples were mixed with 7 μL of 4× Laemmli buffer containing 100 mM DTT. All samples were denatured at 95 °C for 10 min. After cooling down at RT, magnetic beads were pulled aside on a magnetic rack and the supernatants were taken forward to Western blotting.

| Western blot
Proteins were separated on a 1 mm thick 4%-20% gradient Tris-Acetate gel. Exactly 6 μL of protein ladder (Thermo Scientific™, cat no.: 26619) was used per lane. Electrophoresis was run at 180 V for 1 h. Separated proteins were transferred onto a PVDF membrane (BioRAD, cat no.: 10026933) according to the instructions of the Trans-Blot Turbo Transfer System manual (BioRAD, cat no.: 1704150) using 1× turbo transfer buffer (BioRAD, cat no.: 10026938) containing 20% methanol. After transfer, membranes were washed twice with Tris-buffered saline containing 0.1% Tween-20 (TBST), then blocked for 1 h at RT in 5% nonfat skim milk dissolved in TBST. Blocked membranes were washed twice in TBST and probed overnight at 4 °C with primary antibodies (Supplementary Methods Table 1) diluted in TBST. Unbound antibodies were then washed out three times with 15 mL of TBST, each for 10 min with shaking at RT. Washed membranes were incubated for 1 h at RT with horseradish peroxidase-conjugated secondary antibodies (Supplementary Methods Table 1) diluted in TBST. Unbound antibodies were washed out three times with 15 mL of TBST, each for 10 min with shaking at RT. Finally, membranes were subjected to chemiluminescence detection using a BioRad ChemiDoc MP imaging system.

| In vitro RNA binding
ΔQ protein expression, purification, and in vitro RNA binding studies using a Microscale Thermophoresis system (Monolith NT.115 from NanoTemper Technologies, München, Germany) were performed as described previously (Szabó et al., 2018). Standard treated Monolith capillaries (NanoTemper, cat no.: MO-K002) were used for the measurements. Instrument settings are presented in Table 1.
Cy5-labeled RNA concentrations were set to give an initial raw fluorescence between 300 and 500 counts (5 nM for NEAT1_2 and 25 nM for SCARNA7). All experiments were performed at room temperature in DEPC-treated assay buffer (50 mM Tris pH 7.5, 150 mM KCl, 2.5 mM MgCl₂, 0.05% NP-40, 1 mM DTT). Normalized fluorescence values measured 1.25 s after turning on the IR laser were used as T-jump values.

| Immunofluorescence staining
For immunofluorescence (IFS) assays, cells were seeded on pre-cleaned and sterilized 12 mm round coverslips at 3 × 10⁴ cells per well in 24-well plates. At the desired confluency, cells were fixed with 4% paraformaldehyde/PBS for 10 min at RT. After three successive washes with 0.1% Tween-20/PBS (PBS + T), each for 5 min, cells were permeabilized with 0.5% Triton X-100/PBS for 10 min at RT, then blocked with blocking solution (PBS/2% BSA/5% fetal bovine serum/0.2% porcine skin gelatin/0.1% Triton X-100) for 1 h at RT, followed by overnight incubation at 4 °C with primary antibodies diluted in the blocking solution (Supplementary Methods Table 1). Primary antibodies were washed out four to six times with PBS + T, and cells were incubated for 1 h at RT with secondary antibodies diluted in the blocking solution. DAPI (Sigma-Aldrich, cat no.: MBD0015) was used for nuclear staining. After four successive washes with PBS + T and two washes with sterile distilled water, coverslips were mounted on microscope slides using Fluoromount-G (Thermofisher, cat no.: 00-4958-02) and kept at 4 °C until IFS detection.

| Confocal microscopy
Colocalization was assessed with an LSM 710 inverted confocal microscope (Carl Zeiss, Oberkochen, Germany) with a Plan-Apochromat 63×/1.40 Oil DIC M27 objective. Images were recorded with the ZEN software (Carl Zeiss, Oberkochen, Germany). Using constant image acquisition settings, images were acquired from random fields and replicates. All images were processed with Carl Zeiss ZEN 2012 Blue edition software.

| Colocalization intensity analysis
Images were analyzed with Fiji, an enhanced distribution of ImageJ. ImageJ bundled with 64-bit Java 1.8.0_172 was downloaded from https://imagej.nih.gov/ij/download.html, and BIOP JACoP, a Fiji plugin for colocalization intensity analysis, was downloaded from https://biop.epfl.ch/Fiji-Update/. This plugin implements pixel intensity correlation analysis based on the methods of Pearson, Manders, Li, and others. After background subtraction, two-channel images were analyzed based on Pearson's correlation, which measures the strength of the linear relationship between the pixel intensities of the two channels and ranges from −1 to +1. Here, a value of −1 indicates a perfect inverse correlation (mutual exclusion of the two signals), 0 indicates no colocalization, and +1 indicates complete colocalization.
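For readers unfamiliar with the measure, Pearson's correlation coefficient between two channels can be computed from paired pixel intensities as in the Python sketch below. This shows the textbook definition only; it is not the JACoP implementation, and the function name and toy pixel values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(channel_a: Sequence[float], channel_b: Sequence[float]) -> float:
    """Pearson's correlation coefficient of paired pixel intensities.

    Both sequences must list the same pixels (for example, those inside one
    ROI) in the same order, after background subtraction.
    """
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Toy intensities: the second channel is a scaled copy of the first,
    # so the coefficient is at its maximum.
    print(pearson_r([10, 20, 30, 40], [5, 10, 15, 20]))
```

The coefficient is insensitive to linear rescaling of either channel, which is why it is usually computed after background subtraction and within a gated region rather than over the whole field.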

| qPCR experiments
Experiments were carried out on two to four independent samples, and the fold enrichment of immunoprecipitated RNAs was calculated as 2^(−ΔΔCt [KMT2D RIP/NS Rb IgG RIP]). Statistical analysis was performed using GraphPad Prism 8.0.1 (multiple t-test).
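The yield (% input) and fold enrichment calculations can be sketched in Python as follows. This is a minimal illustration of the commonly used ΔΔCt convention for RIP-qPCR with a 10% input aliquot, not a reproduction of the published method cited in the Methods; the function names and Ct values are hypothetical.

```python
import math


def adjusted_input_ct(ct_input: float, input_fraction: float = 0.10) -> float:
    """Shift the Ct of a partial input to the value expected for 100% input."""
    return ct_input - math.log2(1.0 / input_fraction)


def percent_input(ct_ip: float, ct_input: float, input_fraction: float = 0.10) -> float:
    """Yield of an IP expressed as a percentage of the total input."""
    return 100.0 * 2 ** (adjusted_input_ct(ct_input, input_fraction) - ct_ip)


def fold_enrichment(
    ct_rip: float,
    ct_igg: float,
    ct_input_rip: float,
    ct_input_igg: float,
    input_fraction: float = 0.10,
) -> float:
    """Specificity as 2^(-ddCt) of the specific RIP over the negative control IgG RIP."""
    d_rip = ct_rip - adjusted_input_ct(ct_input_rip, input_fraction)
    d_igg = ct_igg - adjusted_input_ct(ct_input_igg, input_fraction)
    return 2 ** (-(d_rip - d_igg))


if __name__ == "__main__":
    # Hypothetical Ct values: the specific RIP crosses the threshold
    # three cycles earlier than the IgG control.
    print(percent_input(ct_ip=25.0, ct_input=25.0))
    print(fold_enrichment(ct_rip=24.0, ct_igg=27.0, ct_input_rip=22.0, ct_input_igg=22.0))
```

When both RIP reactions share the same input, the input terms cancel and the fold enrichment reduces to 2 raised to the Ct difference between the IgG control and the specific RIP.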

| Colocalization experiments
All experiments were carried out on at least three independent biological replicates, and distinct samples were selected for relative measurements. For quantification of colocalization intensity, all images were taken randomly from the same sample, and relative quantifications were reproduced in independent biological replicates to confirm similar results. The whole nucleus or selected protein condensates were gated as regions of interest (ROI) for the measurement of Pearson's coefficient. Using GraphPad Prism 8, statistical P values were calculated by either multiple or unpaired two-tailed Student's t-test, and P < 0.05 and P < 0.0001 are marked by * and ****, respectively.

FIGURE 1 Evaluation of the RNA binding of KMT2D. (a) Enrichment of mRNAs and (b) lncRNAs in the KMT2D RIP samples over the negative control antibodies. RT-qPCR analysis was performed for the quantification of the RNAs in the RIP samples from HEK293T cells. Orange coloring represents the enrichment of the RNAs in the KMT2D RIP samples over the negative controls (black). Mean values and standard errors of two to four independent experiments are shown. The details of the replicate numbers can be found in Table S3. Results for experiments with one replicate are shown in Figure S2. * and **** indicate significant differences at adjusted P < 0.05 and P < 0.0001, respectively. (c) Enrichment of genes under the regulation of KMT2D in the RIP high-confidence dataset over random datasets. Gray columns represent the overlaps of the KMT2D-regulated genes with randomized datasets from the UniProt database; the red triangle indicates the number of genes shared with the high-confidence RIP dataset. The green dotted line shows the moving average of the data. Note that the bin sizes are increased from 5 to 20 above 140 for better visibility. (d) PANTHER Protein Class categorization of the mRNAs bound to KMT2D. Protein classes enriched significantly (FDR-adjusted P-value < 0.05) are detailed in the second pie chart. (e) Impact of KMT2D downregulation on the level of KMT2D-partner RNAs in HEK293T cells. Differences from the SCARNA6 and GAPDH combined internal controls are shown. Mean values and standard errors of three independent experiments are shown. * and **** indicate significant differences from control samples at P < 0.05 and P < 0.0001, respectively.

TABLE 1 Instrument settings for the MST measurements. Columns: LED power (%), MST power (%), Before MST (s), MST on (s), After MST (s), Delay (s).

FIGURE 2 Evaluation of the RNA binding of the RBR-polyQ. (a) Domain architecture of KMT2D (gray) and the schematic representation of the different constructs used. (b) Enrichment of mRNAs and lncRNAs in the RBR-polyQ RIP samples over the empty vector. RT-qPCR analysis was performed for the quantification of the RNAs in the RIP samples from HEK293T cells. Purple coloring represents the enrichment of the RNAs in the RBR-polyQ RIP samples and black coloring represents the enrichment of the RNAs in the negative controls. Mean values and standard errors of two independent experiments are shown. (c) Impact of RBR-polyQ (purple), ΔQ (blue), or polyQ (red) overexpression on the level of KMT2D partner RNAs in HEK293T cells. The average of SCARNA6 and GAPDH combined expression was used as internal control to analyze the data. Mean values and standard errors of the differences from the empty vector from three independent experiments are shown. * and **** indicate significant differences at adjusted P < 0.05 and P < 0.0001, respectively.

FIGURE 4 Intranuclear localization of KMT2D and RBR-polyQ. (a) Representative images of the distribution of RPOL 2 pSer2 (upper row) or pSer5 (lower row) and KMT2D condensates; scale bars: 2 μm. The graph represents the colocalization intensity between RPOL 2 pSer2 or pSer5 and KMT2D condensates as determined by Pearson's correlation coefficient. (b) Representative images of the distribution of the endogenous KMT2D in empty vector (upper row) or RBR-polyQ (middle row) transfected cells. An enlarged picture of a gated nucleus is shown in the bottom row; scale bars: 2 μm. Colocalization intensities between KMT2D and the empty vector or RBR-polyQ were determined using Pearson's correlation coefficient, gating either the whole nucleus (upper graph) or randomly selected regions of interest (ROIs) in RBR-polyQ transfected nuclei (lower graph). (c) Representative images of the distribution of RPOL 2 pSer2 in empty vector (upper row) or RBR-polyQ (bottom row) transfected cells. The graph represents Pearson's correlation coefficient of the colocalization intensity between RPOL 2 pSer2 and RBR-polyQ from randomly but not blindly selected ROIs. (d) Representative images of the distribution of RPOL 2 pSer5 in empty vector (upper row) or RBR-polyQ (bottom row) transfected cells. The graph represents Pearson's correlation coefficient of the colocalization intensity between RPOL 2 pSer5 and RBR-polyQ from randomly selected ROIs. The number of analyzed nuclei is represented by N on each graph, while the number of selected ROIs picked from different nuclei is represented as n. Scale bars, 2 μm. An unpaired two-tailed Student's t-test was applied for all statistical analyses, and values of P < 0.05 and P < 0.0001 are marked by * and ****, respectively.

FIGURE 5 Schematic representation of the possible roles of RNA binding in the regulation of the activity of KMT2D (dark red). (a) Localization to super-enhancers through binding to super-enhancer RNAs (SE-RNAs). (b) Targeting to specific genomic loci through the binding of lncRNAs. (c) Sequestration of KMT2D by lncRNAs, preventing histone methylation. (d) Involvement in mRNA processing through binding nascent mRNAs. (e) Localization to membraneless organelles (binding partners like scaRNAs and snoRNAs).