Revealing eRNA interactions: TF dependency and convergent cooperativity

Enhancer RNAs (eRNAs) are non-coding RNAs produced from transcriptional enhancers that are highly correlated with their activities. Using capped nascent RNA sequencing (PRO-cap) dataset in human lymphoblastoid cell lines across individuals, we identified inter-individual variation of expression in over 80 thousand transcribed transcriptional regulatory elements (tTREs), in both enhancers and promoters. Co-expression analysis of eRNAs from tTREs across individuals revealed how enhancers interact with each other and with promoters. Mid-to-long range interactions showed distance-dependent decay, which was modified by TF occupancy. In particular, we found a class of ‘bivalent’ TFs, including Cohesin, which both facilitates and insulates the interaction between enhancers and/or promoters depending on the topology. In short ranges, we observed strand specific interactions between nearby eRNAs in both convergent or divergent orientations. Our finding supports a cooperative convergent eRNA model, which is compatible with eRNA remodeling neighboring enhancers rather than interfering with each other. Therefore, our approach to infer functional interactions from co-expression analyses provided novel insights into the principles of enhancer interactions depending on the distance, orientation, and the binding landscapes of TFs.


Introduction
Regulation of transcription is achieved mainly through the binding of transcription factors (TFs) at transcription regulatory elements (TREs), such as promoters and enhancers [1][2][3][4] . Genes are expressed from promoters, where regulatory signals are integrated from proximal and distal enhancers to determine the amount of RNA product. Enhancers act as distant regulatory elements for promoters through speci c TF binding, while at the same time produce bidirectional enhancer RNAs (eRNAs). Regulatory networks by enhancers are key to most cellular processes, including development, cell-type differentiation, and stress response, whereas their misregulation could cause disease. Numerous studies revealed that a large majority of disease associated genetic variations affect TREs [5][6][7][8][9][10][11][12][13][14] . Understanding TRE networks requires knowledge of how TREs interact with one another, and the mechanisms through which this regulation is achieved.
The bidirectional production of RNA from TREs, including both enhancers and promoters, is one of the most important hallmarks of regulatory activity in vertebrates, as transcribed enhancers are most active [1][2][3][4] . Consequently, relying on RNA products to identify TREs emerged as an e cient approach. For instance, the FANTOM5 consortium used Cap Analysis of Gene Expression (CAGE) to create an atlas of enhancer activity across numerous cell types and tissues 6 . While CAGE is a simple and powerful method to quantify transcription initiation at genes, it is less well suited for the quanti cation of transcriptional activity at enhancers, which produce particularly unstable eRNAs. Sequencing methods that capture nascent RNA, such as global Precision nuclear Run-On sequencing with 5´-capped (m 7 G) RNA enrichment (GRO-cap or PRO-cap), measure transcriptional activity directly and are, therefore, better suited to the quanti cation of TRE activity 2,15 .
After identifying and quantifying active TREs, which TREs control gene expression and how they do so becomes an eminent question. Systematic analysis of expression variation can reveal these targets and mechanisms. Co-expression network uses variation in expression across different samples to elucidate regulatory circuits 16 . By coupling co-expression analysis to TF binding pro les, the mechanisms of the regulatory circuits can be characterized. While physical interaction between TREs, charted through chromatin conformation assays 18-21 , remains the gold standard for mapping chromatin interaction networks, such interactions are not always readily detectable. Functional interactions identi ed based on TRE co-expression can, therefore, complement physical interaction maps to ll gaps in the regulatory network.
This study leverages the variation in transcription initiation at transcribed TREs (tTREs), majority of which are eRNAs 19 . We investigate the interactions between tTREs measured by PRO-cap in lymphoblastoid cell lines (LCLs) from ~ 70 individuals. We use co-variation between tTREs as an indicator of functional interactions between sites and explore how those interactions are globally affected by various transcription factors. We identi ed thousands of putative interactions and found global signatures of either facilitation or inhibition of interactions for multiple TFs. We also explored interactions between the different strands of neighboring tTREs and found evidence of cooperativity at sites with converging polymerases, discovering new rules of eRNA interactions and their potential roles.

Results
Co-expression of tTREs as an indicator of functional interaction While approaches such as physical interaction maps have de ned and provided insight into the principles of cis-regulatory modules 18-21 , we leverage our previously published PRO-cap datasets from ~70 YRI LCLs to study functional interactions between tTREs using a tTRE co-expression analysis 17 . This dataset includes 87,826 tTREs, of which about 1/5 correspond to promoters and the remainder are de ned as putative enhancers. These were identi ed based on bidirectional transcription of a pair of nascent RNAs within 300 bps of each other. Of these, about 40% are variably expressed across the individuals. As a measure of co-expression, we used Pearson's correlation coe cients for linear regressions across individuals of the PRO-cap signals between pairs of variably transcribed tTREs in short range near SLFN5 gene promoter and in long range at BCL2 super-enhancer locus (Figs. 1a, 1b). The example at the BCL2 locus illustrates how this analysis allows us to visualize which TREs are likely to have functional interactions, both within clusters of tTREs and between such clusters.
We explored whether the co-expression reveals the principles of functional interactions between tTREs, such as the rule of distance. When we bin the correlation coe cients based on the distance between tTREs, covariation of the PRO-cap levels decreases as the distance increases (Fig. 2a, Supplementary   Figs. 1a, 1b). The distribution of the coe cients reaches background levels (comparable to interchromosomal interactions) by 1 Mb (Supplementary Figs. 1c, 1d), and the majority of the coe cients above background are for tTRE pairs within 200 kb ( Supplementary Fig. 1a). This distance effect is unlikely to be bias from linkage disequilibrium (LD) of the genotypes at tTRE pairs, since the YRI population is known to have LDs smaller than 5 kb 22 and a similar degree of distance decay is observed using only the tTRE pairs with independent genotypes (genotype correlation <0.05, Supplementary Fig.  1e).
We also explored how covariation between variably transcribed tTREs and mRNA levels varied based on the distance between the tTRE and gene promoters or mRNA TSSs (Fig. 2b, Supplementary Figs. 1f, 1g).
The tTREs located upstream of the TSSs have a covariation pattern that gradually decreases upon distance. Interestingly, coe cient distribution to the downstream tTREs seem to drop faster with increasing distance. This is consistent with a recent report that intragenic enhancers are less activating for their host gene, and can even attenuate expression 23 .
TF binding sites at or between tTREs are associated with differences in tTRE co-expression Binding of transcription factors can modify interactions between TREs. Insulator proteins such as CTCF can disconnect the communication between two regions 24 and transcriptional co-activators such as P300 can bridge TREs to their targets 25 . We tested these using published chromatin immunoprecipitation sequencing (ChIP-seq) datasets from ENCODE's Factorbook repository 26 in a representative LCL (GM12878) to determine the effect of TF binding on the global tTRE interaction trends.
To examine the insulating effect of CTCF on functional interactions, we separated tTRE pairs into "no intersection" and "intersection" categories based on the number of CTCF ChIP-seq peaks 27 between them and plotted the distribution of correlation coe cients as a function of distance (Fig. 3a, Supplementary   Fig. 2a). The presence of CTCF sites between tTREs reduced co-expression overall and within each distance bin. We also looked at the activating effect of P300 on co-expression, comparing tTRE pairs that are occupied by P300 26 to ones where P300 is not bound. As expected, P300-bound tTRE pairs tend to have higher levels of co-expression than the unoccupied pairs ( Fig. 3b, Supplementary Fig. 2a). These examples show that our co-expression analysis is able to detect the expected effects of TF binding, both when the TF is intersecting and when it is occupying the tTREs.
We expanded our analysis to all 60 TFs for which ChIP-seq data were available from the ENCODE repository 26 . To generate a metric of how TF binding modi es the distance-dependent decay of coexpression, we focused on tTRE interactions in the top 5 th percentile of the correlation coe cient distribution at each distance bin, and plotted the top 5 th percentile as a function of the distance. We then calculated the difference in the area under the curve (∆AUC) between "occupancy"/"no occupancy" and "intersection"/"no intersection" tTRE pairs. We compared the ∆AUC to permuted background ∆AUC distributions to evaluate the signi cance of the difference (Supplementary Figs. 2b, 2c).
The TF analysis reveals three broad categories of TFs: insulating, activating, and bivalent (Fig. 3c,   Supplementary Fig. 2d). Insulating TFs, such as CTCF, reduce tTRE co-expression both when intersecting and occupying tTREs. The opposite is true for activating TFs, such as P300. Interestingly, the activating category contains many TFs that are immune or B-cell speci c (Fig. 3c, red), whereas the insulating category contains general factors associated with strong promoters (Fig. 3c, blue). Bivalent TFs, such as the cohesin subunits RAD21 and SMC3, enhance covariation when occupying the tTREs, but repress covariation when intersecting tTREs. Another bivalent factor, FOXM1, that controls cell cycle progression is also known to function as both repressor and activator depending on the chromatin context 28 .
Strand-speci c covariation at closely-spaced tTREs supports a cooperative model for convergent transcription At close distances, RNA polymerases at one tTRE can potentially affect the neighboring tTRE in either a cooperative or inhibitory manner (Fig 4a). Recent works have suggested that convergent transcription near promoters and intragenic enhancers attenuate transcription from the gene through polymerase interference 23,29 . Others have shown that transcription read-through leads to increased chromatin accessibility [30][31][32][33] and, therefore, cooperativity. It is also possible that transcription at one site may increase the local concentration of RNA polymerase and TFs, leading to rapid recycling of polymerase from one tTRE to the next.
The strand-speci city of PRO-cap allows us to determine which model is more prevalent. If convergent transcription were inhibitory, the converging strands of two adjacent tTREs (upstream plus and downstream minus) would show lower overall correlation than the diverging strands (upstream minus and downstream plus) (Fig. 4a). We performed a strand-speci c regression for all adjacent tTRE pairs and compared the distribution of the orientation-speci c correlation coe cients. For closely-spaced tTREs (250 bp -1kb apart), the divergent strand pairs are signi cantly less correlated than the convergent or the same strand (sense) pairs, even when the difference in the relative distance between the strands are corrected (Fig. 4b, Supplementary Figs. 3a-c). This pattern is no longer evident for tTREs that are far enough apart to make direct read-through unlikely (> 5kb) ( Supplementary Fig. 3c). Also, if the enhancer pairs are interlaced by another enhancer in between, the convergent correlation pattern disappears (Fig  4c). These ndings are more consistent with the model that convergent transcription is cooperative than inhibitory (Fig. 5).

Discussion
We explored the activity, architecture, and functional interactions of eRNA transcribing tTREs using the variation in transcription initiation across human LCLs. The results reveal the principles of functional interactions between tTREs dependent on distance, TF binding, and direction of transcription.
Identi cation of transcriptional regulatory elements (TRE) based on capped nascent RNA sequencing provides a direct measure of transcriptional activity and, therefore, shows higher sensitivity. A direct measure of transcriptional activity is important, as non-productive transcripts such as eRNAs and uaRNAs are rapidly degraded in the nucleus. Other transcription-based approaches, such as CAGE and nuclear short RNA analysis, are impeded by this instability. As we showed, CAGE performs well in identifying promoters but less e ciently detects enhancers than PRO-cap. Additionally, by focusing on bidirectional eRNA transcription start sites (TSS) from PRO-cap data enabled us to lter out spurious transcription from only one strand. Taken together, we propose that PRO-cap provides a high quality set of tTREs.
Bidirectional transcription has been widely accepted as a general architecture of TREs. Our previous analysis of transcription initiation (tiQTL) and directional initiation quantitative trait loci (diQTL) implies that both the central TF binding and the core RNA polymerase initiating regions must be taken into consideration when assessing the potential impact of non-coding genetic variants 17 . Our model suggests a single TF binding site between the eRNA TSSs as a universal architecture, whereas other models have proposed two TF binding sites for promoter TREs, one for each strand 34 . Our interpretation is that two adjacent highly-directional tTREs form a larger nucleosome free region, where convergent strands are below the detection limit, and identi ed as a single TRE.
Beyond understanding individual TREs, we explored how TREs interact with one another to form cisregulatory networks. Physical interaction maps have been the standard for identifying direct interactions.
However, co-expression network approaches to identify functional interactions, though indirect, provide independent evidence for interactions and a good complement to physical maps. We were able to capture mid-range interactions (within 200 kb), which were dependent on distance and insulator/co-activator binding in a manner consistent with current knowledge. Interestingly, factors associated with strong promoters were insulating, indicating that strong promoters may dominate functional interactions and repress looping across them. This may be analogous to chromosomal boundaries in Drosophila that are formed by paused RNA polymerases 35 .
While most mid-to long-range interactions take place through DNA looping, the correlations we observed between close-range clustered tTREs also suggest a transcription-dependent mechanism (Fig. 5). Previous evidence indicated that polymerase collision leads to transcription termination between highly expressed intragenic enhancers and their host genes 23 (Fig. 5, top). However, most tTREs are less active, making simultaneous transcription and polymerase collisions rare. Instead, our strand-speci c coexpression analysis suggests cooperativity rather than inhibition between convergent transcription in clustered tTREs (Fig. 5, middle/bottom). Perhaps transcription from a tTRE remodels the chromatin architecture in the neighboring region to increase accessibility (Fig. 5, middle). Another possibility is direct recycling of polymerase through termination and re-initiation at a neighboring tTRE, which could further be tested with a large-scale transcription-termination analysis (Fig. 5, bottom).
The identi cation of transcribed TREs using capped-nascent-RNA sequencing will be a powerful resource to the study of gene regulation. Although the identi cation of individual cis-regulatory networks may be limited due to the scale of our study, the principles and insights learned herein will help in predicting the regulatory targets of TREs and understanding their biological consequences.

Methods
Co-expression analysis of tTREs. We used the variably expressed tTREs (n=29,694; promoter -4,006; enhancer -25,688 using the CAGE based criteria), and calculated correlation coe cients of the 75 individual normalized readcounts (67 individuals + 8 replicates) between two tTREs within 5 Mb distance (2,249,839 pairs). For the distance analysis, we binned the correlation coe cients by the distance between 2 tTREs. The bins are generated based on xed distance intervals up to 1,024 kb, or a xed number of tTRE pairs (1,000 pairs per bin) with variable distance intervals. With the variable interval bins, we generated plots for the 5th percentile, median and the 95th percentiles of the correlation coe cient distributions within each bin of 1000 pairs along the distance (Supplementary Figure 1a, 1b). To compare the tTRE correlations independent of the genotype correlations in the population structure (linkage disequilibrium), we used genotypes at the SNPs nearest to the TSSs with the minor allele frequency greater than 0.05, and selected tTRE pairs within 1 Mb distance whose associated genotypes are not correlated (|correlation coe cient of the genotypes| < 0.05, n=129,660 pairs; Supplementary Figure 1f).
Co-expression analysis between tTRE nascent transcription and mRNA expression. We used the RNA-seq expression data from Pickrell et al. 36  Analysis of the ENCODE factor dependencies in tTRE co-expression. The top 5 percentile of the correlation coe cients in the variable interval bins was used as an indicator of tTRE co-expression. This correlation coe cient percentile serves as a function of the distance, and we refer to it as the correlation decay plot. To test whether the abundance of ENCODE FACTORBOOK binding sites -located either between the tTREs or at the tTREs -affect the correlation decay plots, we calculated the difference between the area under the correlation decay curves (∆AUC) in the 0 -200 kb distance range. The signi cance of the ∆AUC was estimated by comparing to the background distribution of ∆AUC which was generated by random shu ing of the factor binding sites. We calculated the background ∆AUC distributions by 100 permutations of 5,000 -70,000 randomly shu ed ENCODE factor binding sites in GM12878. The background ∆AUC distributions followed a normal distribution and were dependent on the number of binding sites. We approximated the expected mean and standard deviation (sd) of the ∆AUC dependent on the actual number of the speci c factor's binding sites, and used it to generate a z-score and p-value of the ∆AUC between correlation decay curves enriched or depleted with the factor binding sites. For example, CTCF contains 41,465 ChIP-seq binding sites 27 , and the ∆AUC = +27.8 between more than 2 CTCF sites or less than 1 site intersecting a pair of tTREs. The background distribution of ∆AUC with 40,000 sites intersecting tTREs is +6.41 ± 2.56 (mean ± sd), and we obtained the z-score = +8.34 and p = 7.26 x 10 -17 for CTCF intersections affecting correlation decay plots. The intersection and the occupancy scores of all other 76 factors available for GM12878 in FACTORBOOK were calculated in this way. Only factors with su cient number of binding sites in each category were reported. These p-values and z-scores were used in hierarchical clustering and t-stochastic neighbor embedding (tSNE) analysis to cluster TFs.
Strand-speci c co-expression analysis for tTRE nascent transcription at adjacent sites For each tTRE, the nearest neighboring tTRE downstream was identi ed and a linear regression between their strands was computed across the samples according to the following categories: Sense (plus-plus, minus-minus), antisense convergent (plus-minus), and antisense divergent (minus-plus). Pairs of tTREs were binned based on the distance between them and the distributions of Pearson correlation coe cients were compared between the categories. Pairs within 250 bp were not included to avoid any possibility of the same read being counted in both tTREs.

Additional Information
The authors declare no competing interests.

Data Availability statement
The data can be accessed through the GEO (GSE110638). The datasets generated and analyzed during the current study are available in the Github repository, [https://github.com/sl2665/procap-network].   Transcription factor binding sites are associated with changes in tTRE co-expression. (a) Intersection by insulator decreases tTRE co-expression. Top: Diagram of CTCF bound to intersecting regions between two tTREs. Bottom: Distributions of correlation coe cients for tTRE pairs that are intersected by 2 or more CTCF sites and 1 or less, binned by distances (left) or total (right). *** indicates P < 1 x 10 -3 by Wilcoxon rank sum test. (b) Occupancy by co-activator increases tTRE co-expression. Top: Diagram of P300 occupying a tTRE. Bottom: Correlation coe cients for tTRE pairs that are occupied by P300 or not occupied, binned by distances (left) or total (right). *** indicates P < 1 x 10 -3 , and ** indicates P < 1 x 10 -2 by Wilcoxon rank sum test. (c) The effect of TF intersection and occupancy on tTRE interactions for 61 TFs. Heatmap indicates the -log 10 (FDR) for AUC between with and without the TF (Supplementary Figs   2b, 2c). The TFs are ordered by minimum variance hierarchical clustering. Comparison of the co-expression between neighboring and interlacing tTRE pairs to test the direct neighborhood interaction over distance effect (left). LOESS ts of the correlation coe cients in neighboring and interlacing tTRE pairs (right two panels). Note that neighboring pairs have higher correlation than interlacing pairs.

Figure 5
Covariation analysis supports a cooperative rather than inhibitory model for convergent transcription.Diagram suggesting three models of the transcriptional consequences of convergent transcription -collision, remodeling, recycling. Two adjacent enhancers (TREs) are engaged with bidirectional divergent RNA polymerase II (coffee bean shaped lled with red color) transcribing sense (red lines) and antisense (blue lines) nascent RNAs (see Discussion).