Transcription directionality is licensed by Integrator at active human promoters

A universal characteristic of eukaryotic transcription is that the promoter recruits RNAPII to produce both precursor mRNAs (pre-mRNAs) and short, unstable promoter upstream transcripts (PROMPTs) in the opposite direction. However, how the transcription machinery senses the correct direction to produce pre-mRNAs is largely unknown. Here, using multiple acute auxin-inducible degradation (AID) systems, we show that rapid depletion of an RNAPII-binding protein complex, Integrator, results in robust PROMPT accumulation throughout the genome. Interestingly, the accumulation of PROMPTs is compensated by a reduction of pre-mRNA transcripts in actively transcribed genes. Consistently, Integrator depletion alters the distribution of polymerase between the sense and antisense directions, which is marked by an increased RNAPII-CTD Tyr1 phosphorylation (Tyr1P) level at PROMPT regions and a restrained Ser2 phosphorylation (Ser2P) level at transcription start sites (TSSs). Mechanistically, the endonuclease activity of Integrator is critical to suppress PROMPT production in a sequence-independent manner. During this step, the endonuclease activity can be inhibited by U1 signals on nascent antisense transcripts through an interaction between U1 snRNA and Integrator, suggesting that a U1-Integrator axis governs the direction of gene transcription.


Introduction
Transcription initiation in eukaryotes is a highly orchestrated process that requires hundreds of protein factors to establish the proper start of gene expression. Active promoters typically harbor two distinct preinitiation complexes (PICs) assembled in opposite orientations for divergent transcription [1][2][3].
Although bidirectional transcription is known to localize to a nucleosome-free region with two adjacent PICs, the relationship between the two PICs is poorly understood 8,10,11. Various studies have shown that the U1-PAS (U1-mediated polyadenylation sites) axis at active promoters acts as a cis-element to determine the directionality of transcription 12,13. Specifically, the low occurrence of U1 snRNP sites and the high abundance of polyadenylation signal (PAS) sites upstream of sense transcription start sites (TSSs) attenuate PROMPT transcription, thus guaranteeing transcription in the sense direction 12.
However, thus far, little is known about how this unbalanced bi-directional transcription event is precisely initialized and mechanistically conducted in eukaryotes.
In addition to its endonuclease function, recent structural studies demonstrated that the Integrator complex contains two different enzymatic modules 24,25. Along with the well-known endonuclease module composed of the INTS11, INTS9, and INTS4 subunits, it also contains a phosphatase module including protein phosphatase 2A (PP2A) and 2C subunits associated with the INTS6 and INTS8 subunits 24,25. These two functional modules are placed on opposite sides of the complex, and the whole Integrator complex resembles a soft hat sitting on top of RNA polymerase II 25. The function of the phosphatase module has not been fully uncovered yet, but it has been proposed to be associated with the phosphorylation dynamics of the C-terminal domain of RPB1 (RNAPII-CTD) during the pause-release and elongation stages of transcription [26][27][28][29].
Due to the complexity of the Integrator structure, its function during transcription is still largely under debate. Since its discovery, two distinct roles have been proposed: facilitating transcription or attenuating gene expression [17][18][19],26,27,[30][31][32]. Analyses based on the length of genes or specific environmental responses further complicate the determination of its function during different stages of transcription 18,[30][31][32][33][34][35][36]. However, most previous studies on Integrator relied on traditional RNA interference tools, which generally need 48-60 hours for depletion. Therefore, these observations could mix direct evidence with nonessential or indirect outputs 37.
To avoid unwanted secondary effects, we utilized the acute auxin-inducible degradation (AID) system, in which depletion occurs within 1-2 hours, to analyze the direct function of Integrator 38,39. Consistent with previous studies, we detected a massive PROMPT accumulation at most active human genes after an acute, 1-hour depletion of Integrator 19,40,41. Meanwhile, through multiple transcription dynamics studies, we observed a genome-wide attenuation of pre-mRNA synthesis in the sense direction after Integrator depletion, suggesting a counterbalance between sense and antisense transcription. To further address the specificity and mechanism, our dominant negative assay of the Integrator endonuclease subunit showed that transcription directionality specifically requires the endonuclease function of Integrator on PROMPTs. Furthermore, by mimicking Integrator cleavage with gapmer antisense oligonucleotides (ASOs) or introducing U1 sites into the antisense region, we reveal a connection between U1 snRNA and PROMPTs that inhibits Integrator cleavage to determine transcription directionality. Together, our results suggest that the U1-Integrator axis is critical for the rapid degradation of PROMPTs and facilitates transcription of actively transcribed genes.

Acute depletion of Integrator induces abundant PROMPTs
To accurately assess the function of Integrator, we utilized the mini-Auxin Inducible Degron (mAID) system to rapidly deplete Integrator subunits 38,39,42. Specifically, the mAID tag was inserted at the C terminus of the endogenous INTS11 or INTS9 locus in the HCT116:TIR1 or the F74G mutant cell line to avoid affecting the expression levels of the endogenous proteins (Fig. 1a, Extended Data Fig. 1a). After a 15-minute treatment with 500 µM IAA (indole-3-acetic acid, for INTS11-AID) or one hour with 10 µM 5-Ph-IAA (for INTS9-AID2), the targeted proteins were nearly undetectable (Fig. 1b). After depletion of the target protein, we performed chromatin-associated RNA-seq (ChrRNA-seq) to monitor the ongoing transcriptome changes 20,43.
As expected, upon Integrator subunit depletion, abundant transcript signals accumulated downstream of the annotated 3' ends of RNU11 and RNU12, two UsnRNA genes whose 3' ends are known to be specifically processed by Integrator (Extended Data Fig. 1b, c) 14. To our surprise, we noticed dramatic accumulation of PROMPTs at 8,377 gene loci in both cell lines (Fig. 1c). Moreover, the PROMPTs with over two-fold accumulation upon INTS11 or INTS9 depletion largely overlapped (Fig. 1d) and correlated well with each other (Extended Data Fig. 1e). Since PROMPTs are not annotated in the human genome (GRCh38/hg38), we developed an algorithm (PROMPT-Finder; see Methods for details) to define the range of PROMPTs in the human genome based on ChrRNA-seq results. Briefly, a PROMPT region is defined as a genomic section with significant enrichment of antisense transcripts (> 1 kb) initiated immediately upstream of annotated TSSs (< 500 bp). In addition, a given PROMPT region should not overlap with any active annotated gene in the same direction (details in the Methods section). Within the identified 8,377 gene loci in INTS11-AID cells, 7,070 gene loci harbor PROMPTs and show significant accumulation of PROMPTs (> 2-fold, FDR < 0.05) upon INTS11 depletion (Fig. 1e-f). Furthermore, under INTS9 depletion, PROMPTs also accumulated at these 7,070 gene loci (Extended Data Fig. 1d, f). The overall PROMPT accumulation in INTS11 and INTS9 depletions correlated well with each other, suggesting that acute disruption of the Integrator endonuclease module induces PROMPT accumulation in the antisense direction. The differences in PROMPT numbers and levels called by PROMPT-Finder between INTS11-AID and INTS9-AID cells are likely due to the incomplete depletion of INTS9 34,44. Among all 7,070 gene loci that accumulate PROMPTs in the antisense direction, we noticed a broad range of dysregulation of their pre-mRNAs on the sense strand (Fig. 1f). We next split all 7,070 gene loci into ten quantiles according to their pre-mRNA transcript level in control samples (Fig. 1g, pre-mRNA).
Among them, the top 90% of gene loci show various levels of pre-mRNA reduction after the acute depletion of INTS11. Meanwhile, for the bottom 10% of genes, representing the lowest expression levels, both pre-mRNA and PROMPT levels are dramatically increased (Fig. 1g). This increase in transcription activity after Integrator depletion has also been reported in a recent publication 29. To accurately evaluate sense and antisense transcription for actively transcribed genes, we selected the top 30% of highly expressed genes for further analysis (2,121 gene loci).
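The ranking step described here can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in the study: the function names `split_into_quantiles` and `top_fraction`, the dictionary input, and the rounding rule are all assumptions made for the example.

```python
def split_into_quantiles(expression, n_bins=10):
    """Assign each locus to a bin from 1 (highest expression) to n_bins (lowest).

    `expression` maps a locus name to its control pre-mRNA level.
    The names and the binning rule are hypothetical, for illustration only.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    n = len(ranked)
    return {locus: (rank * n_bins) // n + 1 for rank, locus in enumerate(ranked)}


def top_fraction(expression, fraction=0.3):
    """Return the most highly expressed loci, highest first."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    keep = round(len(ranked) * fraction)
    return ranked[:keep]


# Toy input: 7,070 loci with made-up expression values.
toy_expression = {f"locus{i}": float(i) for i in range(7070)}
bins = split_into_quantiles(toy_expression)
selected = top_fraction(toy_expression)
print(len(selected))
```

With 7,070 loci, a 30% cut keeps 2,121 loci, which matches the number of highly expressed gene loci carried forward in the text.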
To confirm that the accumulation of PROMPTs is indeed caused by the depletion of INTS11, we exchanged the media containing IAA with fresh media, which should restore the level of INTS11 protein (Fig. 1h). As expected, after 7 hours of restoration, the 3' end extensions of UsnRNAs (RNU11 and RNU12) were properly processed through the recovery of the cleavage function of Integrator (Extended Data Fig. 1b). Meanwhile, the PROMPTs diminished upon the restoration of INTS11 (Fig. 1i, PROMPTs marked in red). As specific examples, we chose the MYC gene because it encodes one of the key transcription factors in cancer, and the BMP4 gene for its importance in signal transduction and embryogenesis 45,46. Consistently, massive accumulation of PROMPTs at the MYC and BMP4 loci can be observed after 1 hour of IAA treatment. In total, all 2,121 PROMPT-harboring gene loci show significant fluctuation of PROMPTs during the knockdown and restoration of INTS11 (Fig. 1j-k). Interestingly, for actively transcribed genes, the increased amount of PROMPTs after INTS11 depletion correlates with a decreased amount of pre-mRNA. Yet, the total amount of transcripts at each locus remains largely the same (Fig. 1k and 1l). A similar exchange between the levels of pre-mRNAs and PROMPTs can also be observed under INTS9 depletion (Fig. 1k, Extended Data Fig. 2d-g). To rule out off-target effects of the IAA treatment, we performed rescue experiments with ectopic expression of INTS11. The gene expression profiles confirmed that PROMPT induction depends on INTS11 depletion (Extended Data Fig. 2a-c). Together, our results suggest that the acute depletion of Integrator leads to strong accumulation of PROMPTs at transcriptionally active genes.
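The positional criteria given for a PROMPT region in this section (antisense signal longer than 1 kb, initiating within 500 bp upstream of an annotated TSS, and not overlapping an active annotated gene in the same direction) can be written as a simple filter. The sketch below is not PROMPT-Finder itself: the `Region` type, the function names, the half-open coordinate convention, and the reading of "> 1 kb" as a region length are assumptions, and the statistical test for significant enrichment is left out.

```python
from dataclasses import dataclass


@dataclass
class Region:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a, b):
    """True if two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_prompt_candidate(region, tss, gene_strand, active_genes,
                        min_length=1000, max_tss_gap=500):
    """Apply the positional PROMPT criteria to one candidate region.

    `tss` is the annotated TSS coordinate of the host gene on region.chrom.
    All names and defaults here are hypothetical illustrations.
    """
    # The candidate must be antisense to the host gene.
    if region.strand == gene_strand:
        return False
    # The antisense signal must span more than 1 kb.
    if region.end - region.start <= min_length:
        return False
    # It must initiate upstream of, and within 500 bp of, the TSS.
    if gene_strand == "+":
        gap = tss - region.end
    else:
        gap = region.start - tss
    if not 0 <= gap < max_tss_gap:
        return False
    # It must not overlap an active annotated gene in the same direction.
    return not any(g.strand == region.strand and overlaps(region, g)
                   for g in active_genes)


candidate = Region("chr1", 8500, 9900, "-")
print(is_prompt_candidate(candidate, 10000, "+", []))
```

Keeping the positional filter separate from the enrichment test makes it easy to see which candidates are rejected for geometric reasons alone.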

Pre-mRNAs and PROMPTs are counterbalanced in highly expressed genes and induced genes
During the depletion and restoration of INTS11, we noticed that at the MYC and BMP4 loci, pre-mRNA transcripts were attenuated along with the increased level of PROMPTs, suggesting that pre-mRNA and PROMPT transcription might counterbalance each other (Fig. 1l). For most active promoters in eukaryotes, the basal transcription machinery is believed to be inherently directional, and the divergent promoter region typically harbors two distinct promoters with inverted orientations 1,3,47. The assembly of the two separate transcription pre-initiation complexes (PICs) is assumed to be correlated 1,48. Our results above prompted us to examine whether Integrator plays a key role in monitoring transcription direction at active gene loci.
Since the ChrRNA-seq performed above did not include a spike-in control, to carefully examine the role of Integrator in the dynamics of nascent RNA synthesis, we performed 4sU-seq (with spike-in control) to quantify newly transcribed RNAs (containing 4-thiouridine) and transient transcriptome sequencing (TT-seq) to profile the transient transcriptome 49 (Extended Data Fig. 4a-b). In both cases, we observed dramatic induction of PROMPT transcription at the MYC, BMP4, XPO1 and EP300 loci, while transcription in the sense direction was largely reduced (Fig. 2a, Extended Data Fig. 4d, h and i). Detailed analysis of the same 2,121 highly transcribed genes suggests that Integrator depletion disrupts the distribution of transcription at the promoter-proximal stage (Fig. 2b-d, TT-seq; Extended Data Fig. 4e-f, 4sU-seq). Moreover, after comparing PROMPT and gene body signals from our TT-seq results, we noticed that the total amount of newly synthesized transcripts remains largely the same at all of these loci (Fig. 2e). A similar result was also obtained using published Integrator depletion data (Fig. 2f; Extended Data Fig. 4c) 29.
In addition to coding genes, we analyzed our TT-seq results for enhancer RNAs. Based on the RNAPII peaks at intergenic regions, we identified 3,259 regions as enhancer loci (details in the Methods section). Similar to previously published results, enhancer transcripts accumulated after Integrator depletion 20. In contrast to protein-coding genes, nearly all enhancer loci show no strand preference (Fig. 2g-h). Instead, primary transcripts accumulated equally in both transcription directions (Fig. 2g, h bottom panel; Extended Data Fig. 3d-e). Similar to the lowly expressed genes in the ChrRNA-seq results (Fig. 1g), transcription levels at most enhancer regions are quite low (Extended Data Fig. 3d-e). This observation may suggest that transcription speed or capacity could contribute to the selection of transcription direction in eukaryotic gene expression.
To gain further insight into how the transcriptional apparatus is engaged at actively transcribed genes, we applied epidermal growth factor (EGF) or interferon beta (IFN-β) induction to our Integrator AID system. With these systems, we can evaluate the function of Integrator in the initiation status of immediate-early genes (IEGs) upon stimulation 1 (Extended Data Fig. 5a, g). As expected, shortly after adding EGF or IFN-β, 1,026 and 84 IEGs, respectively, which harbor detectable PROMPTs under the INTS11 depletion condition, were significantly induced (Extended Data Fig. 5d, h). In contrast, upon stimulation after INTS11 depletion, the induction of IEGs was dramatically attenuated, along with significant accumulation of PROMPTs (KLF2 and NR4A1 for EGF induction, Fig. 2i; DDX58 and IFIT1 for the IFN-β response, Extended Data Fig. 5i). These alternating responses were further confirmed by RT-qPCR for NR4A1 and IFIT1 pre-mRNA transcripts, as well as for their PROMPTs (Extended Data Fig. 5f and j). In the analyses of the 1,026 EGF-induced IEGs, the counterbalance effect between PROMPTs and pre-mRNAs could also be observed (Fig. 2j and Extended Data Fig. 5e). Meanwhile, the total amount of highly induced nascent transcripts shows no change before and after the depletion (Fig. 2k and Extended Data Fig. 5k-l). Together, our results suggest that the depletion of Integrator could lead to rapid changes in the distribution of transcription between PROMPTs and pre-mRNAs at actively transcribed genes.
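The counterbalance described in this section amounts to three comparisons per locus: the change in PROMPT signal, the change in pre-mRNA signal, and the change in their sum. A minimal Python sketch of that bookkeeping follows. The function names, the pseudocount, and the read counts are assumptions chosen for illustration; they are not values or code from the study.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated over control, with a pseudocount to avoid log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def counterbalance_summary(prompt_ctrl, prompt_dep, pre_ctrl, pre_dep):
    """Compare antisense, sense and total signal at one locus.

    A counterbalanced locus shows a positive PROMPT change, a negative
    pre-mRNA change and a total change close to zero.
    """
    return {
        "prompt": log2_fold_change(prompt_dep, prompt_ctrl),
        "pre_mrna": log2_fold_change(pre_dep, pre_ctrl),
        "total": log2_fold_change(prompt_dep + pre_dep, prompt_ctrl + pre_ctrl),
    }


# Made-up counts: 200 units of signal move from the sense to the antisense side.
summary = counterbalance_summary(prompt_ctrl=50, prompt_dep=250,
                                 pre_ctrl=950, pre_dep=750)
print(summary)
```

In this toy case the PROMPT signal rises, the pre-mRNA signal falls, and the total is unchanged, which is the pattern the text reports for highly expressed and induced genes.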

Integrator affects RNAPII status and CTD phosphorylation dynamics
Given that Integrator is tightly associated with the RNAPII-CTD throughout the whole transcription process 19,26,30,31, we performed RNAPII chromatin immunoprecipitation sequencing (ChIP-seq) to measure how RNAPII is engaged at promoter regions during transcription initiation. Upon depletion, RNAPII accumulated at regions upstream of the TSSs of the MYC, BMP4, EP300 and EZH2 genes, along with decreased levels of RNAPII at the TSSs (Fig. 3a and Extended Data Fig. 6c). For the same 2,121 highly expressed genes, our heatmap analysis shows clear accumulation of RNAPII occupancy upstream of TSS regions and reduction at the TSSs upon depletion (Fig. 3b). Meanwhile, a similar redistribution of RNAPII occupancy could also be detected at the IEGs with EGF induction after INTS11 depletion (Extended Data Fig. 6f-h).
It is well known that the phosphorylation dynamics at the heptad repeats of the RNAPII-CTD orchestrate different stages of transcription 50. Previous reports suggest that Tyr1P might be associated with antisense transcription and function at the transcription initiation stage 51,52. Moreover, a recent structural study of Integrator indicates that Tyr1P is adjacent to the RNAPII-Integrator interface 25. Interestingly, our Tyr1P ChIP-seq shows a dramatic enrichment of CTD-Tyr1 phosphorylation at PROMPT regions and an obvious reduction at the TSSs of the MYC and BMP4 loci (Fig. 3a, middle panel). Genome-wide analyses also reveal that the changes in Tyr1P engagement after depletion are similar to those of total RNAPII at PROMPT regions and TSSs (Fig. 3d, left panel), indicating that Tyr1P dynamics might be tightly associated with Integrator to regulate PROMPTs during the promoter-proximal stage of transcription (Fig. 3c, left panel).
We next tested several other phosphorylation forms of the RNAPII-CTD, among which we observed a noticeable accumulation of Ser2P at the TSSs of the MYC and BMP4 loci (Fig. 3a, bottom panel). Interestingly, the amount of Ser2P showed a remarkable increase at the TSSs of transcriptionally active genes but was only slightly enriched at enhancer TSSs after depletion (Fig. 3c and 3d, right panel; Extended Data Fig. 7f, right panel). Consistent with previous observations [29][30][31], our results suggest that functional disruption of the endonuclease module of Integrator might facilitate the promoter-proximal pausing of RNAPII (Fig. 3h). Together with the total amount of polymerase remaining roughly the same before and after Integrator depletion (Fig. 3i), our results indicate that Integrator likely controls divergent transcription by restraining RNAPII Tyr1 phosphorylation at the initiation stage of transcription, as well as transcription pause-release via RNAPII Ser2 phosphorylation.
Similar changes in Tyr1P and Ser2P were also observed at enhancers and super enhancers after Integrator depletion (Extended Data Fig. 6e, Extended Data Fig. 7f). Meanwhile, within the 2,121 highly expressed gene loci or the 1,026 EGF-induced gene loci, the combined RNAPII signal for sense and antisense remains at a similar level (Fig. 3g and Extended Data Fig. 6h, gray bars). Together with the western blot of RNAPII and the TT-seq for newly synthesized transcripts, our results suggest that Integrator depletion might not affect RNAPII loading on the genome, but rather disrupts the distribution of bi-directional transcription at actively transcribed genes.

The endonuclease activity of Integrator is critical for bidirectionality
Integrator is known to possess endonuclease activity toward small nuclear RNAs, viral miRNAs and enhancer RNAs 14,20,23. Given that INTS11 functions as an endonuclease and regulates genome-wide non-productive transcription 19, we speculated that Integrator might directly cleave PROMPTs to destabilize the antisense transcripts. Previous studies have shown that a single point mutation (E203Q) in the catalytic domain of INTS11 impairs the processing of UsnRNAs and enhancer RNAs 14,20. We thus performed a dominant negative assay by expressing INTS11 or its catalytic-dead E203Q mutant in INTS11-depleted cells to investigate whether the endonuclease function of Integrator is necessary for the regulation of PROMPT transcription (Fig. 4a). Indeed, ectopic expression of INTS11 substantially reduced the 3'-end accumulation at the UsnRNA genes (RNU11 and RNU12) and the PROMPT production of the MYC and CCND1 genes after depletion of the endogenous protein (Fig. 4b, Extended Data Fig. 8a). Notably, the accumulation of PROMPTs caused by INTS11 depletion cannot be suppressed by overexpression of the E203Q mutant, suggesting that the endonuclease activity of Integrator is required for PROMPT processing on chromatin (Fig. 4b).
To confirm that the endonuclease activity toward PROMPTs is critical for transcription directionality, we designed chemically modified gapmer antisense oligonucleotides (ASOs), which can hybridize with nascent transcripts on chromatin in a sequence-specific manner and cleave the nascent RNA via an RNase H1 mechanism [53][54][55]. We assumed that PROMPT cleavage by gapmer ASOs could mimic the Integrator endonuclease function without affecting the other functions of the Integrator complex (Fig. 4c). Upon INTS11 depletion, PROMPT accumulation and pre-mRNA reduction at the MYC, RBM14 and SRRT gene loci were measured by RT-qPCR (Fig. 4d and Extended Data Fig. 8b, first two columns for each panel). Meanwhile, after treatment with sequence-specific gapmer ASOs, a reduction of PROMPTs and an increased level of pre-mRNAs were observed (Fig. 4d and Extended Data Fig. 8b, ASO+). These results indicate that targeted ASO cleavage of PROMPTs indeed mimics Integrator cleavage in maintaining the transcriptional balance between the sense and antisense strands. More importantly, because the ASO approach only mimics the endonuclease function of Integrator on the antisense strand, without affecting transcription on the sense strand or interrupting other functions of Integrator, it suggests that PROMPT cleavage is critical for the preference of transcriptional direction for the sense over the antisense strand in eukaryotic cells.
In humans, INTS11 is also termed CPSF73L, due to its highly homologous protein sequence with CPSF73, which is known to cleave the 3' end of pre-mRNAs and some PROMPTs 19,56,57. To distinguish the endonuclease functions of INTS11 and CPSF73, we created a CPSF73-AID2 cell line, which showed a dramatic decrease in CPSF73 level after 1 hour of 5-Ph-IAA treatment (Extended Data Fig. 8c). Upon CPSF73 depletion, we observed significant extension at the 3' end of the MYC, ACTB and EGFR pre-mRNAs (Extended Data Fig. 8e-f). However, for all 3,847 genes that have at least two-fold accumulation of 3'-end pre-mRNA signal (Extended Data Fig. 8g), we observed no significant enrichment of PROMPTs, although these genes have considerable amounts of PROMPTs in INTS11-depleted cells (Fig. 4e-g, Extended Data Fig. 8h). These data suggest that Integrator and CPSF73 might have distinct preferences in processing RNAs during transcription 19.
Taken together, our results suggest that there is a balance between sense and antisense strand transcription at actively transcribed gene loci. Under normal conditions, antisense strand transcription is likely suppressed through rapid cleavage of PROMPTs by Integrator, followed by degradation through an exosome-dependent RNA degradation mechanism 5,58. Without the cleavage function of Integrator on the antisense strand, transcription could be activated on the antisense strand, which leads to PROMPT accumulation and pre-mRNA reduction at actively transcribed gene loci.

Integrator coordinates with upstream U1 signal for transcription directionality
Given the well-recognized U1-PAS theory 12,13, we next asked whether Integrator could act as the trans-factor that connects U1 snRNP and its U1 sites (the cis-element) on the nascent transcript to produce the unbalanced sense/antisense transcription. We thus carried out a genome-wide search for the 1st U1 snRNP sites near TSSs using de novo motif analysis as described before 12. We grouped genes according to the number of predicted strong or medium U1 sites in the 1 kb window upstream of the TSSs (Fig. 5a and Extended Data Fig. 9a-b). Notably, under normal conditions, the amount of PROMPT transcripts is generally higher if the PROMPT regions harbor two or more U1 binding sites in our ChrRNA-seq data sets (Fig. 5b, left panel). After INTS11 depletion, the induction of PROMPTs is much weaker for these genes with multiple U1 sites (Fig. 5b, right panel), suggesting that U1 sites may counteract Integrator to facilitate PROMPT production.
Apart from the number of U1 sites, the distance between the U1 site and the TSS could also affect the productivity of PROMPTs. After categorizing the 1st U1 sites in a 2 kb window upstream of the TSSs (Fig. 5c), our data suggest that the further the 1st U1 site is from the TSS in the PROMPT region, the fewer PROMPTs are generated under native conditions (Fig. 5d, left panel). Meanwhile, upon INTS11 depletion, more PROMPTs were produced, and they were less controlled by the presence of U1 sites (Fig. 5d, right panel). Combining the number of U1 sites with the position of the 1st U1 site in the PROMPT regions, we speculate that U1 sites might inhibit Integrator cleavage to allow productive transcription on the antisense strand. Indeed, compared with the sense direction, which harbors a large number of U1 sites, there are far fewer U1 sites distributed in the antisense direction (Extended Data Fig. 9a).
A recent publication shows that U1 snRNP enhances transcription in the pre-mRNA direction 59, suggesting that U1 sites could also contribute to PROMPT transcription. Based on the number and distance of U1 sites on the antisense strand (in a 2 kb window upstream of the TSS), we calculated a U1 score for every identified PROMPT, which should reflect the contribution of U1 sites to PROMPT production (Fig. 5e, details in Methods). We found that the higher the U1 score of a specific PROMPT region, the higher the level of antisense transcription and the more PROMPTs accumulated in our ChrRNA-seq data sets (Fig. 5f, left panel). Interestingly, upon INTS11 depletion, there are fewer changes for the PROMPTs possessing a high U1 score, suggesting that they are less regulated by Integrator cleavage activity (Fig. 5f, right panel). This observation is reflected not only in our ChrRNA-seq data but also in our TT-seq results (Fig. 5g).
To validate our analysis, we selected two gene loci with high U1 scores and two loci with low U1 scores (Fig. 5e). For the MYC gene locus, which lacks or has few U1 sites in the antisense direction, our multiple sequencing results suggest that Integrator could rapidly cleave PROMPTs and guide the RNAPII machinery toward the pre-mRNA direction (Fig. 5h, left panel). Upon Integrator depletion, transcription was induced on the antisense strand and attenuated on the sense strand (+ IAA lanes). The balance of transcription at such gene loci is strongly regulated by Integrator cleavage. In contrast, for the CCNY gene locus, which has a high U1 score in the antisense direction, Integrator cleavage is inhibited by the U1 sites on the antisense strand (Fig. 5h, right panel). Consistently, after Integrator depletion, the bi-directional transcription pattern remains largely the same.
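The exact U1 score formula is given in the Methods and is not reproduced in this section. To make the idea concrete, the sketch below shows one hypothetical way to combine the number of U1 sites and their distance from the TSS into a single value: each site in the 2 kb window contributes a weight that falls linearly with distance. The function name `u1_score` and the linear weighting are assumptions for illustration and should not be read as the scoring used in the study.

```python
def u1_score(site_distances, window=2000):
    """Hypothetical U1 score for one PROMPT region.

    `site_distances` lists the distance (bp) of each predicted U1 site from
    the TSS on the antisense strand. Sites outside the window are ignored.
    Each remaining site adds a weight between 0 and 1: closer sites count more.
    """
    return sum(1.0 - d / window for d in site_distances if 0 <= d < window)


# More sites and closer sites both raise the score.
print(u1_score([1000]))
print(u1_score([500, 1500]))
print(u1_score([100, 300, 600]))
```

Any scoring of this kind should increase with site number and decrease with site distance, in line with the two trends reported in Fig. 5b and Fig. 5d.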

U1 snRNA together with Integrator mediate transcription dynamics
To further understand the connection between Integrator and U1 snRNA during transcription, as in the previous analysis, we compared all the predicted U1 sites at the 2,121 actively transcribed gene loci with those at the 3,259 enhancer regions (Fig. 6a). These results showed an obvious bias toward the pre-mRNA side for the active genes (red line) and no difference for the enhancer regions (blue line). In our TT-seq results, there is a shift of transcription direction for active genes after Integrator depletion (Fig. 6b, red line). However, such a shift is absent at enhancer regions, which exhibit elevated nascent RNA synthesis on both sides (Fig. 6b, blue line). Based on our analyses, enhancer RNA transcription is directly regulated by Integrator cleavage, which fits with previously published results 20.
Moreover, after we split all 7,070 transcribed genes by their read numbers in our TT-seq results, the transcription levels of eRNAs and PROMPTs are quite low (Fig. 6c, top panel). Meanwhile, upon Integrator depletion, the regulation of lowly expressed transcripts is quite different from that of highly transcribed genes (Fig. 6c, bottom panel, red bars vs. blue bars). Thus, Integrator-dependent regulation might act in a dose-dependent manner (Fig. 6c, bottom panel).
To directly link U1 snRNA with Integrator, we next performed RNA-IP by pulling down ectopic FLAG-INTS1, using FLAG-CPSF73 as the negative control (Fig. 6d). Interestingly, we observed a strong and specific interaction of RNU1 (U1 snRNA) with INTS1, in contrast to CPSF73, which has little interaction with RNU1 (Fig. 6e). Moreover, INTS1 interacts specifically with U1 snRNA but not with U2 snRNA (Fig. 6e), suggesting that this U1-Integrator interaction is not part of the processing steps of UsnRNA maturation 14.
To further measure the importance of the U1-Integrator interaction, especially for transcription directionality, we inserted either a DNA fragment containing 3 x U1 site sequences or a natural sequence with multiple U1 sites (from the PROMPT region of the Cttn gene locus) into the PROMPT region of the FUS gene (Fig. 6f, Extended Data Fig. 9d-e). After Integrator depletion, significant accumulation of PROMPTs can be detected at the PROMPT region in both cases (Fig. 6g, panel + IAA in PROMPT section). Meanwhile, a reduction of FUS pre-mRNA could simultaneously be observed (Fig. 6g, panel + IAA in pre-mRNA section). Interestingly, after the insertion of 3 x U1 sites, in both cases significant amounts of PROMPTs accumulated, with a reduction of nascent transcripts in the sense direction (Fig. 6g, panel + U1). Moreover, there was no significant inhibition of the cleavage response in either the PROMPT or the pre-mRNA direction after Integrator depletion (Fig. 6g, panel + U1 + IAA). Taken together, these results suggest that during transcription, U1 signals on PROMPTs could function together with the U1-Integrator interaction to inhibit Integrator cleavage and influence bi-directional transcription.

Discussion
Bi-directional transcription has been well-documented at the promoters of mammalian protein-coding genes, but little is known about the regulatory bases that guide transcription to the sense direction and refrain the antisense transcription.By applying the rapid depletion systems for multiple subunits of Integrator, we show a clear interplay of transcription for both strands at active gene promoters through the endonuclease function of the Integrator complex.This transcriptional interplay mainly depends on the local transcription activity and U1 snRNA binding sites around the TSS region.Similar with previous studies, we nd Integrator cleaves enhancer RNAs and other lowly expressed genes 19,20,40 .As most of the enhancer sites lack of U1 sites, the Integrator cleavage cannot be constrained by U1 snRNA.For the actively transcribed gene loci, U1 snRNA binding sites surrounding TSSs exhibit an uneven distribution, which is far more abundant at sense than antisense direction (Fig. 6h).In this case, PROMPTs could be e ciently cleaved by Integrator and rapidly degraded through the RNA exosome mechanism 5,19,58 .
Moreover, because Integrator cleavage can be inhibited by U1 sites on the sense strand, the entire transcription machinery on the gene locus favors the transcription to the pre-mRNA side.During this step, the U1 snRNA reverse-complementarily binding with nascent transcripts to inhibit Integrator cleavage is the critical step to perform the directional selection locally.Taken together, we propose a U1-Integrator model to govern the transcription bi-directionality for actively transcribed genes (Fig. 6h).
Untill now, the function of Integrator for the early steps of transcription was still under debate 15,26 .The cleavage activity of Integrator for non-polyadenylated RNAs has been well documented to have functions on RNA maturation, biogenesis and transcription activation 14,20,31 .However, a genetic repressor screen has suggested that the complex may have the function to abrogate transcription [17][18][19] .Meanwhile, an association of Integrator with non-productive transcription was reported within ~ 3kb downstream of promoter-proximal regions in a sequence-independent manner 19 .As Integrator has multiple different functional modules and participates at different stages of transcription, rapid and clean tools are needed to dissect its function 26 .Along this line, a recent paper also suggests that the elongation velocity at protein-coding genes shows broadly decreased productive elongation after acute INTS11 depletion 60 .Moreover, by expression of a truncated form of INTS8 in an acute INTS8 depletion system, a distinct activation of transcription has been reported for the function of the phosphatase module, suggesting different enzymatic modules in the Integrator complex may present different functions during transcription 29 .Consistent with this notion, our results are focused on the endonuclease function of Integrator complex.It seems that the rapid degradation of PROMPTs is initiated through Integrator cleavage.Moreover, through multiple structure and mechanism studies, it suggests that Integrator may serve as a scaffold complex, working together with other transcription factors or RNAs, for the contextbased access control on chromatin.The detailed picture of how the Integrator complex involved into different steps of transcription still needed to be addressed.
In eukaryotes, it is known that the abundance of U1 snRNP is much higher than that of the other snRNPs.
A "telescripting" mechanism has been proposed by the Dreyfuss laboratory to suppress premature termination on the pre-mRNA strand [61][62][63]. Consistent with the telescripting mechanism, our data support that the presence of U1 snRNAs on chromatin and their pairing with pre-mRNAs are critical for the control of transcription. Moreover, our results further suggest that pairing between U1 snRNAs and nascent transcripts could inhibit the Integrator cleavage that leads to premature termination. Importantly, our data indicate that this inhibition is crucial for the natural choice of transcription direction. In addition, somatic mutations in the 5' splice site binding region of U1 snRNA have been identified in different cancer types 64,65. As one of the most abundant noncoding RNAs in eukaryotic cells, U1 snRNA has been shown to regulate long noncoding RNA (lncRNA) retention in the nucleus, even though the mechanism is not clear 24. Similar to blocking the 5' end of U1 snRNA, INTS11 depletion in our experiments reduces the transcription of active genes, suggesting a counterbalance model between the endonuclease activity of Integrator and U1 snRNA or U1 snRNP during transcription. As Integrator and U1 snRNA were discovered to be biochemically associated with the RNAPII complex, our results suggest that the U1-Integrator axis may serve as a decisive factor for the rapid removal of transcription from the antisense strand, which determines the proper direction for pre-mRNA transcription in eukaryotic cells.

Declarations Methods
Cell culture and generation of AID cell lines
HCT116 cell lines were cultured in DMEM (Thermo Fisher Scientific) supplemented with 10% (v/v) FBS and 2 mM L-glutamine at 37 °C. To generate HCT116-AID cell lines, HCT116-OsTIR1 cells in a 10 cm dish were transfected with 4.8 µg of guide RNA plasmid (based on pX330-U6-Chimeric_BB-CBh-hSpCas9 (Addgene #42230)) and 3.6 µg of donor plasmids (pMK289 (Addgene #72827) and pMK290 (Addgene #72828)) using the Calcium Phosphate Cell Transfection Kit (Beyotime, C0508). Eight hours after transfection, the medium was replaced with fresh medium and cells were selected with 500 µg/mL neomycin (Biofroxx, 1150GR005) and 100 µg/mL hygromycin (Sino Biological Inc, 50708-mccH). After 10-15 days of selection, individual clones were isolated and screened by genomic DNA PCR with the corresponding primer sets (details in Supplementary Table 1). Correct clones were further confirmed by western blot with the corresponding antibodies. Before the experiments, indole-3-acetic acid (IAA; Phytotech, I364) was dissolved in DMSO to prepare a 500 mM stock solution.
The chromatin-associated RNA was isolated with the standard Trizol protocol (Invitrogen, cat. no. 15596018). Genomic DNA was removed following the protocol of DNase I treatment (Thermo, EN0521).
Chromatin immunoprecipitation sequencing (ChIP-seq) and data processing
About 1 × 10^7 cells were cross-linked with 1% formaldehyde for 10 min and quenched in 125 mM glycine for 5 min at room temperature. After washing twice with ice-cold PBS, the cells were re-suspended in cold ChIP buffer (150 mM NaCl, 1% Triton X-100, 0.7% SDS, 500 mM dithiothreitol, 10 mM Tris-HCl and 5 mM EDTA with fresh protease inhibitors) and kept on ice for 20 min. Chromatin shearing was performed using a Covaris ME220 ultrasonic generator. After clearance of the sonicated chromatin by centrifugation at 14,000 rpm for 10 min, chromatin fragments were immunoprecipitated overnight at 4 °C with 2-4 µg of the appropriate antibodies and 30 µL of Dyna Protein A or G beads (Invitrogen, 11204D or 11202D). The next day, beads were washed twice with cold Mixed Micelle Buffer (150 mM NaCl, 1% Triton X-100, 0.2% SDS, 20 mM Tris-HCl, 5 mM EDTA, and 65% sucrose), twice with cold Buffer 500 (500 mM NaCl, 1% Triton X-100, 0.1% Na deoxycholate, 25 mM HEPES, 10 mM Tris-HCl, and 1 mM EDTA), twice with cold LiCl/detergent Buffer (250 mM LiCl, 0.5% Na deoxycholate, 0.5% NP-40, 10 mM Tris-HCl, and 1 mM EDTA) and once with cold 1× TE buffer. The immunoprecipitated chromatin fragments were eluted with 1× TE buffer containing 1% SDS and incubated overnight at 65 °C to reverse crosslinks. The chromatin fragments were then treated with 0.5 mg/mL proteinase K for 3 h. DNA was purified by phenol/chloroform extraction and precipitated with isopropanol and Glyco-Blue (Invitrogen, AM9516). DNA libraries were constructed with the VAHTS Universal DNA Library Prep Kit for MGI (Vazyme, NDM607-02) and sequenced on an MGI 2000 instrument (MGI-SEQ).
Raw reads were processed as described above and mapped to the human genome and the Drosophila genome (UCSC dm6) using STAR 67 with the parameter "--outFilterMultimapNmax 1" to remove multi-mapped reads. Reads with low mapping quality (MAPQ lower than 30) and duplicate reads were further removed from BAM files with SAMtools 68. The number of spike-in dm6 reads counted by SAMtools 68 was used to calculate the normalization factor alpha = 1e6/dm6_count. BigWig files were generated from merged BAM files with deepTools 71 and normalized with the spike-in scaling factors. Gene expression quantification was performed with featureCounts 69. Read counts were normalized by both the spike-in scaling factors and gene length.
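The spike-in scaling described above can be sketched as follows. This is a minimal illustration, assuming the dm6 read count has already been obtained (for example with SAMtools); the function names and the per-kilobase length convention are assumptions made for the example and are not taken from the original pipeline.

```python
def spike_in_factor(dm6_count: int) -> float:
    """Normalization factor alpha = 1e6 / dm6_count, as stated in the text."""
    if dm6_count <= 0:
        raise ValueError("dm6_count must be a positive read count")
    return 1e6 / dm6_count


def normalize_count(raw_count: float, dm6_count: int, region_length_bp: int) -> float:
    """Scale a raw read count by the spike-in factor and by region length.

    Dividing by length in kilobases is an assumption of this sketch; the text
    only states that counts were normalized by spike-in factor and gene length.
    """
    alpha = spike_in_factor(dm6_count)
    return raw_count * alpha / (region_length_bp / 1000.0)


# Hypothetical numbers: 400 reads on a 2 kb gene, 2 million spike-in reads.
print(normalize_count(400, 2_000_000, 2000))
```

A sample with more spike-in reads receives a smaller alpha, so its signal is scaled down relative to a sample with fewer spike-in reads.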

Lentiviral transduction
Lentiviral expression plasmids encoding INTS11 or INTS11 (E203Q) were co-transfected with two helper plasmids (psPAX2 and pMD2.G) into HEK293T cells using Lipofectamine 2000 (Invitrogen, 11668019). The culture medium was replaced with fresh medium, and viral supernatants were collected twice, at 24 hours and 48 hours after transfection. The HCT116-AID cells were infected with virus for 70 hours and harvested 60 hours after the IAA treatment. The efficiency of protein expression was measured by western blot with the appropriate antibodies, and transcription products were measured by quantitative RT-PCR. All antibodies and PCR primer sequences are listed in Supplementary Table 1.

Antisense oligonucleotide transfection
The HCT116-AID cell line was cultured in 5% CO2 at 37 °C. When cell density reached 80-90%, 100 nM gapmer ASO was transfected into cells by the calcium phosphate transfection method. At 6-8 hours after transfection, the medium was replaced with fresh medium and IAA was added at the same time. At 18 hours after transfection, Trizol (Invitrogen, cat. no. 15596018) was used to extract total RNA for RT-qPCR. The ASOs used in this study to cleave the PROMPTs of MYC, SRRT and RBM14 are 20 nucleotides long with a standard sandwich structure (10 unmodified deoxynucleotides flanked by 5 MOE-modified ribonucleotides, with a phosphorothioate backbone) 76. ASOs were solubilized in DNase-/RNase-free water.
The sequence of the MYC ASO is 5′-TACTGCTACGGAGGAGCAGC-3′.
The sequence of the RBM14 ASO is 5′-AATTAATGGCACGAGGGCTT-3′.
The sequence of the SRRT ASO is 5′-TGTGCCTGGCCCTAAATATT-3′.
The bold letters represent MOE-modified bases.
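To make the gapmer layout concrete, the short sketch below splits each 20-nt ASO into its two wings and central gap. It assumes the symmetric 5-10-5 arrangement implied by the description (5 MOE-modified positions on each side of 10 unmodified deoxynucleotides); the function name is illustrative.

```python
def split_gapmer(sequence: str, wing: int = 5, gap: int = 10):
    """Split an ASO into (5' wing, central DNA gap, 3' wing).

    The symmetric wing/gap/wing layout is an assumption of this sketch.
    """
    seq = sequence.strip().upper()
    if len(seq) != 2 * wing + gap:
        raise ValueError(f"expected {2 * wing + gap} nt, got {len(seq)}")
    return seq[:wing], seq[wing:wing + gap], seq[wing + gap:]


# Sequences as listed in the text (5' to 3').
ASOS = {
    "MYC": "TACTGCTACGGAGGAGCAGC",
    "RBM14": "AATTAATGGCACGAGGGCTT",
    "SRRT": "TGTGCCTGGCCCTAAATATT",
}

for name, seq in ASOS.items():
    five_wing, dna_gap, three_wing = split_gapmer(seq)
    print(f"{name}: {five_wing}-{dna_gap}-{three_wing}")
```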

RNA immunoprecipitation (RIP)
About 1 × 10^6 cells were transfected with pCMV2-INTS1-FLAG and pCMV2-CPSF73-FLAG for 36 h and lysed with lysis buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.1% Triton, 1 mM EDTA). The RIP experiment did not include any crosslinking steps. After centrifugation at 14,000 rpm for 10 min, the supernatant was incubated with 50 µL Anti-FLAG® M2 Magnetic Beads (Sigma-Aldrich) for 2 h at 4 °C. The beads were washed three times with ice-cold lysis buffer and twice with ice-cold PBS. Immunoprecipitated RNA was extracted with Trizol reagent (Invitrogen, cat. no. 15596018) and used for qPCR assays.

Identification of active promoters and enhancers
Active promoters were defined as the 500 bp regions immediately upstream of a TSS (transcription start site) that overlap with peaks called from RNAP II ChIP-seq. To identify active enhancers, we first selected genomic regions that contain RNAP II peaks and are at least 10 kb away from any annotated gene. Next, we used ROSE (v0.1) 77,78 to identify active enhancers and super-enhancers from those regions.
Most BED files were processed using BEDtools 79.
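The promoter definition above can be illustrated with a small strand-aware sketch. It assumes 0-based, half-open coordinates (BED style) and a list of RNAP II peaks on the same chromosome; all function names are hypothetical, and in practice an interval tool such as BEDtools would perform the overlap step.

```python
def promoter_window(tss: int, strand: str, size: int = 500):
    """Return the (start, end) window immediately upstream of a TSS.

    Coordinates are 0-based and half-open; the TSS base itself is excluded.
    """
    if strand == "+":
        return max(0, tss - size), tss
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher coordinates.
        return tss + 1, tss + 1 + size
    raise ValueError("strand must be '+' or '-'")


def overlaps(a, b) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def is_active_promoter(tss: int, strand: str, peaks) -> bool:
    """A promoter counts as active if its upstream window overlaps any peak."""
    window = promoter_window(tss, strand)
    return any(overlaps(window, peak) for peak in peaks)


# Hypothetical peaks on one chromosome.
peaks = [(900, 1100), (5000, 5300)]
print(is_active_promoter(1000, "+", peaks))
print(is_active_promoter(3000, "+", peaks))
```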

PROMPT-Finder
First, we defined background areas as intergenic regions that lie 20 kb upstream or 10 kb downstream of annotated genes (UCSC hg38). To generate the empirical distribution of chrRNA-seq background signals, we randomly selected 10,000 windows (200 bp) from the background areas of each chromosome and calculated the chrRNA-seq density of each window, yielding an empirical distribution function for each chromosome. Next, we used a sliding window (200 bp in length, 10 bp steps) to scan across the genome. The chrRNA-seq signal of each sliding window was evaluated against the empirical distribution function of the corresponding chromosome (e.g., chromosome 1). The probability of each window was then adjusted for the false discovery rate (FDR), and windows with FDR > 0.05 were removed. Remaining windows in the upstream antisense region of active promoters (50 kb upstream and 2 kb downstream of the TSS) were merged if the gap between windows was less than 400 bp. For a given PROMPT region, any portion overlapping a downstream active gene in the same direction was truncated. PROMPT regions shorter than 1 kb were eliminated. We next estimated the differential expression of these PROMPT regions between treatment and control with featureCounts (v2.0.1) 69 and DESeq2 (v1.34.0) 70. PROMPT regions were defined as those with FDR < 0.05 and FC > 2.
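The core steps of this procedure (empirical p-value per window, FDR adjustment, and merging of nearby windows) can be sketched as below. This is a simplified illustration and not the PROMPT-Finder implementation: the one-sided empirical p-value with a +1 pseudocount and the Benjamini-Hochberg adjustment are assumptions, since the text does not specify which estimators were used, and the function names are hypothetical.

```python
from bisect import bisect_left


def empirical_pvalue(background_sorted, value):
    """One-sided empirical p-value: share of background windows whose density
    is >= value. The +1 pseudocount avoids p = 0 (an assumption here)."""
    n = len(background_sorted)
    n_greater_equal = n - bisect_left(background_sorted, value)
    return (n_greater_equal + 1) / (n + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def merge_windows(windows, max_gap=400):
    """Merge (start, end) windows whose gap is smaller than max_gap."""
    merged = []
    for start, end in sorted(windows):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(w) for w in merged]


# Toy example: three significant windows, the first two 300 bp apart.
print(merge_windows([(0, 200), (500, 700), (1200, 1400)]))
```

In a full pipeline the merged regions would then be trimmed against downstream genes, filtered by length, and passed to differential expression testing, as described in the text.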

Transcript activity and RNAP II loading balance analysis
Reads in the identified promoter regions and genes were quantified using featureCounts 69. ChrRNA-seq read counts were normalized by library size and region (gene or PROMPT) length. TT-seq read counts were normalized by scaling factors and region length. As chrRNA-seq and TT-seq are strand-specific, the total read counts were calculated as the sum of reads in the PROMPT and gene regions. The total read counts in ChIP-seq were calculated from the end of the PROMPT region to the end of the gene. Then, the +IAA and CTRL samples were compared to identify changes in RNAP II loading and transcription activity in the PROMPT, gene (pre-mRNA), and total regions.
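For the strand-specific assays, the comparison between +IAA and CTRL can be sketched as a log2 fold change for the PROMPT, the gene, and their sum. This is an illustrative sketch only: the dictionary layout, the function name, and the pseudocount are assumptions, and the ChIP-seq total (which the text computes over one continuous interval) is not covered.

```python
import math


def loading_changes(ctrl, iaa, pseudocount=1.0):
    """log2(+IAA / CTRL) for PROMPT, gene, and total normalized counts.

    `ctrl` and `iaa` are dicts with keys 'prompt' and 'gene' holding
    already-normalized counts; the total is their sum. The pseudocount
    is an assumption of this sketch to avoid division by zero.
    """
    changes = {}
    for label in ("prompt", "gene", "total"):
        if label == "total":
            c = ctrl["prompt"] + ctrl["gene"]
            t = iaa["prompt"] + iaa["gene"]
        else:
            c, t = ctrl[label], iaa[label]
        changes[label] = math.log2((t + pseudocount) / (c + pseudocount))
    return changes


# Hypothetical locus: PROMPT signal rises while gene signal falls.
print(loading_changes({"prompt": 1.0, "gene": 7.0}, {"prompt": 3.0, "gene": 3.0}))
```

A positive PROMPT value together with a negative gene value and a near-zero total would correspond to a shift of signal between the two directions rather than a global gain or loss.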

U1 site prediction
Prediction of U1 snRNA recognition sites was performed as described 80. The 5' splice site motif was calculated from the known intron 5' sites (3 nt in the exon and 6 nt in the intron) of the human genome (UCSC hg38). The 5' splice site motif was used by FIMO 81 to search for significant matches (P < 0.01).
Matches were then scored by the maximum entropy model 82. The maximum entropy score was also calculated for all annotated 5' splice sites to classify the predicted sites. Sites with scores larger than the median of annotated 5' splice sites were classified as strong. Sites with scores lower than the median but higher than the first quartile were classified as medium.
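The score-based classification can be sketched as follows, assuming maximum entropy scores are already available for both the annotated and the predicted sites. The quartile convention (Python's default "exclusive" method), the handling of a score exactly equal to the median, and the label "other" for sites below the first quartile are assumptions of this sketch; the text does not specify them.

```python
import statistics


def score_thresholds(annotated_scores):
    """Return (first quartile, median) of annotated 5' splice site scores."""
    q1 = statistics.quantiles(annotated_scores, n=4)[0]
    median = statistics.median(annotated_scores)
    return q1, median


def classify_site(score, q1, median):
    """Classify a predicted U1 site by its maximum entropy score."""
    if score > median:
        return "strong"
    if score > q1:
        # A score equal to the median lands here (assumption).
        return "medium"
    return "other"  # not named in the text; label chosen for illustration


# Hypothetical annotated scores.
annotated = [2, 4, 6, 8, 10, 12, 14]
q1, median = score_thresholds(annotated)
for s in (9, 6, 3):
    print(s, classify_site(s, q1, median))
```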
Classification of genes by predicted U1 site
For U1 score calculation, genes were removed if their upstream region overlapped with the putative promoter of annotated genes. For the remaining genes, we considered only predicted U1 sites located within 2 kb upstream in the antisense direction and calculated the distance between the first U1 site and the TSS. For estimation of U1 distance in the antisense direction, genes were classified as "0-0.5kb", "0.5-1kb" and "1-2kb" by the