Bimodal sequence composition underlies functional duality of transcribed enhancers

Background: Recent studies have drawn attention to transcribed enhancers (trEs) as important regulatory elements of gene expression; however, their characteristics and mechanisms of action remain poorly understood. Results: We profiled the characteristics of trEs and obtained insights into their mechanisms of action. We found that trEs harbor functional duality related to bimodal sequence composition. TrEs are composed of nonoverlapping cores and flanking regions (flanks): cores function as regular enhancers, while flanks transcribe enhancer RNAs (eRNAs) that can potentially regulate the expression of their target genes in trans . Cores are evolutionarily conserved and compact, while flanks are significantly longer. We observed that approximately 25% of eRNAs transcribed from the flanks likely contribute to trans DNA:RNA triple helix formation, while another 10% likely employ classical mechanisms of RNA-based regulation. We found that the majority of human enhancers are not transcribed, and trEs are strikingly different from regular enhancers in their functional characteristics. In addition, we found evidence for trEs exhibiting functional duality in regulatory locus encapsulation (RLE), effectively providing localized control over the spread of gene expression upregulation by trE cores and other locus enhancers. Conclusions:

. In 2010, genome-wide evidence of transcription at enhancers was presented simultaneously in papers by Kim et al. (4) and de Santa et al. (4,5). The former study reported that approximately 25% of enhancers are transcribed and named those enhancer transcripts "enhancer RNAs" (eRNAs). Synthesis of eRNAs is initiated near enhancer centers, where the CREB-binding protein (CBP) and RNA polymerase II (RNA Pol II), are bound and proceed bi-directionally extending to the flanks.
Changes in eRNA expression levels at enhancers are strongly correlated with changes in mRNA expression at nearby genes (4). At the same time, de Santa and colleagues found that 70% of intergenic RNA Pol II peaks are associated with genomic regions with a canonical chromatin signature of enhancers and that enhancer transcription is regulated in response to endotoxin stimulation. Their data demonstrate that a large proportion of intergenic RNA Pol II transcription sites can be ascribed to cis-regulatory genomic regions (5). Furthermore, de Santa et al. pointed out the possibility that a part of the eRNA transcript may be nonfunctional and may only reflect noise generated by random collisions of the transcriptional machinery with the genome (5).
In 2014, the Functional Annotation of the Mammalian Genome (FANTOM) consortium generated a database of transcribed enhancers in the human genome (6) based on bidirectional transcription at enhancers. They identified 43,011 transcribed enhancers using cap analysis of gene expression (CAGE) (7) and showed that enhancer activity can be detected through the presence of bidirectional capped transcripts. They identified enhancers in 432 primary cells, 135 tissues, and 241 cell line samples from human and found that enhancers share properties with CpG-poor messenger RNA (mRNA) promoters but produce bidirectional and relatively short unspliced RNAs and that the production of eRNA transcripts is related to enhancer activity (6). In 2019, Hirabayashi et al. developed a robust method, called native elongating transcript-CAGE (NET-CAGE), that sensitively detects 5′-ends of nascent RNAs, including unstable transcripts such as enhancer-derived RNAs, and expanded the FANTOM database of transcribed enhancers to approximately 80,000 sequences (8).
Recently, an increasing number of studies have provided evidence of the functionality of trEs.
Schaukowitch et al. found that eRNAs facilitate the transition of paused RNA Pol II into productive elongation by acting as decoys for the negative elongation factor (NELF) complex. They also found that the enhancer-promoter interaction was unaffected by eRNA knockdown (9). Allo et al. discovered that 80% of protein argonaute-1 (AGO1) clusters, required for RNA-mediated interference (RNAi), are associated with cell-type-specific trEs, with 73% overlapping active enhancers. They observed that the association is mainly mediated by long intronic eRNAs (10).
A different perspective of the function of trEs was introduced by Sigova et al., who proposed that the transcription factor (TF) Ying-Yang 1 (YY1) binds to both gene regulatory elements and their associated RNA species across the genome where eRNAs produce a positive feedback loop that contributes to the stability of gene expression programs (11). In 2017, Cinghu et al. reported that intronic eRNAs interfere and attenuate host gene transcription during productive elongation (12).
Later, Jiao et al. identified an eRNA of a super enhancer of the heparanase (HPSE) gene that is able to bind to heterogeneous nuclear ribonucleoprotein U (hnRNPU) to facilitate its interaction with E1A binding protein p300 (p300), which results in chromatin looping between the super enhancer and the HPSE promoter (13). There is also an increasing number of experimentally verified cases of trans-regulation by eRNAs. Tsai et al. showed that an enhancer region of the master regulator myogenic differentiation (MyoD) gene on chromosome 7 in mice gives rise to an eRNA that binds a component of a cohesion complex and is recruited to the myogenin locus on chromosome 1 to affect its expression in trans (14). They observed that the functional depletion of eRNAs reduces chromatin accessibility, prevents myogenin activation, and hinders muscle differentiation. Another case is presented by Alvares-Dominguez et al., who reported a long ncRNA (lncRNA), Bloodlinc, transcribed from a super enhancer of the erythroid membrane transporter SLC4A1/BAND3, that diffuses to trans-chromosomal loci to bind critical regulators of terminal erythropoiesis (15). They found that inhibition of Bloodlinc compromises the terminal erythropoiesis program.
The clinical importance of eRNAs has been under a spotlight in recent years. Studies show that eRNAs participate in several types of cancer, namely, bladder cancer (16,17), castrationresistant prostate cancer (18), leukemia (19), head and neck cancer (20), basal cell carcinoma (20), colorectal cancer (21), and in cancer invasion and metastasis (13) (22) and p53 tumor suppressor regulation (23). There is also interest in the therapeutic prospects of trEs (24) (25) with some examples of successful attempts to regulate the expression of p53-dependent genes and to modulate the cell cycle with short interfering or silencing RNAs (siRNAs) against eRNA (26).
We developed a computational approach to dissect the characteristics and mechanisms of action of trEs. We found that trEs harbor a bimodal sequence composition defining their functional duality, in which their cores function as regular enhancers while their flanks produce eRNAs that 5 have a propensity to regulate the expression of their target genes in either cis or trans.
Furthermore, eRNAs are bound by RNA binding proteins (RBPs) and contain DNA-binding domains (DBDs) that can potentially target genes via formation of triple helices with DNA sequences. We describe a case study of a trE within the locus of CD40 that highlights a multigene regulatory activity of this regulatory element that consists of an upregulating activity in cis and downregulating activity in trans.

TrEs harbor enhancer-like epigenetic marks at cores and promoter-like marks at flanks
Using the set of 85,786 trEs, we divided each regulatory element into three segments, named reverse flank, core, and forward flank ( We also observed elevated levels of H3K27me3 and H3K9me3 at the flanks (Figure 2i,j) at trEs, which are considered marks of bivalency related to low transcription levels (29). This is in agreement with trEs harboring low levels of transcription compared to promoters where the number of detected gene transcripts is tenfolds larger than the number of detected eRNAs (6).
Since the flanks share epigenetic modifications characteristic of promoters, we then verified that trEs are not promoters of unannotated genes or alternative promoters. To do this, we overlapped trEs with the histone modification H4K20me1 associated with transcriptional activation of genes (30). We found that this mark is depleted at flanks (Figure 2h), suggesting that transcription at intergenic trEs is not associated with transcription of promoters of unannotated genes. To reduce the possibility that trEs were unannotated TSSs of gene isoforms, we removed those in the proximity of the TSSs of genes (±2 kb).
Overall, we observed that cores of trEs harbor enhancer-like epigenetic marks while flanks harbor promoter-like marks, suggesting distinct regulatory regions with potentially different functionality.
The duality of the regulatory code of the cores and flanks is also reflected in their evolutionary sequence conservation. While both cores and flanks are evolving under stronger negative selection than regular enhancers, the evolutionary conservation significantly spikes at the cores (p-value=1.65e-32, Student's t-test, Figure 3). This phenomenon could be explained if one assumes that the cores have two functions, i.e., enhancer-like regulation of their target genes as well as upregulation of the eRNA transcription from the flanks.

Cores of trEs are enriched in the binding of TFs commonly found at enhancers
We compared the enrichment of 132 TFs binding profiles in trEs and enhancers from the HepG2 cell-line using the ENCODE ChIP-Seq datasets (31,32) (Figure 4a,b). We observed that trE cores and regular enhancers, but not trE flanks, are enriched in the binding sites of key liver TFs, including: the hepatocyte nuclear factor 4 alpha (HNF4A), a hub TF associated with hepatocellular Overall, we found that enrichment of the liver-specific TFs at the trE cores is similar to regular enhancers, thus providing additional evidence that the cores of trEs function as enhancers.
The analysis of computationally predicted TFBSs (see Methods) revealed a trend similar to the TF ChIP-seq profiles and also demonstrated that for some TFs binding is stronger at the trE cores as compared with the regular enhancers (Supplementary Figure 3).

TrEs and enhancers show similar interactions in 3D space
Chromatin looping brings enhancers in proximity to their target promoters, and the 3D structure of the chromatin is commonly characterized using Hi-C (36). We found no significant difference in the number of chromatin contacts originating from trEs and regular enhancers compared to background that consisted of random regions of the human genome matched by length and GC content (Figure 5a), suggesting that trEs contact their target genes as often as regular enhancers and their function is not limited to eRNA production and eRNA-based regulation. The Hi-C results are supported by similar profiles of chromatin remodeling protein binding signals at regular enhancers and trEs (Figure 5b).

TrE flanks produce eRNAs bound by RBPs
In recent years experiments have shown that eRNAs mediate recruitment of RBPs to regulate transcription of their target genes (10,14,15,(37)(38)(39). We observed that RBPs binding is enriched at the flanks of trEs (Figure 6a To gain additional evidence of the RBP binding to trE eRNAs, we computed the enrichment of RBP narrow peaks in HepG2 using 83 available enhanced crosslinking and immunoprecipitation (eCLIP) datasets from the ENCODE project (eCLIP is a recently developed technology that improves specificity in discovery of authentic binding sites (40)). We found that 43% of trEs feature transcription of eRNAs with significant binding of at least one in 83 RBPs (3 times more compared to background, p-value=4.09e-78, Fisher's exact test)(see Methods). We also observed that the binding is significantly localized to trE flanks but not to cores (p-value=0.004, Fisher's exact test).
These results suggest widespread binding of RBPs to eRNAs.

Trans-acting trE eRNAs
Previous research has shown experimental cases of gene regulation by eRNAs-RBPs, associations that can be categorized as trans-mode eRNAs (14,15). The mechanism of action of trans-mode eRNAs consist in the regulation of transcription of target genes located on different chromosomes relative to the chromosome harboring the locus for the transcribed eRNAs. To study trans-mode eRNAs, we first annotated eRNAs using several ncRNA databases.
Using Gencode v28 (41), we found that 1.4% (108/7613) of trEs have pseudogenes transcribed from their flanks in the HepG2 cell line (6.8 times more than expected by chance; p-value=4.7e-33, Fisher's exact test). In contrast, there were only 0.1% (39/30080) of HepG2 regular enhancers that feature transcription of pseudogenes. We also found that the preferential directionality of trEs agreed with the direction of transcription of the overlapping pseudogenes in 64% of cases compared to 35% in enhancers (p-value=0.002, Fisher's exact test). Although the enrichment of pseudogenes at the trE flanks was significant, the very small overall fraction of trEs containing pseudogenes advocates for largely non-genic transcription originating from the trE flanks.
Another potential mechanism of regulation by trans-mode eRNAs consists in the binding of antisense eRNAs (asRNAs) to their target mRNAs to block translation. We found that ~5% (380/7613) of trEs harbor eRNAs that likely function as asRNAs. Previous research reports an antisense ncRNA that represses transcription of the lymphoid enhancer-binding factor 1 (LEF1) gene by recruiting Polycomb repressive complex 2 (PRC2), which trimethylates H3K27 (44). We observed that ~75% (285/380) of these eRNAs are associated with mRNAs transcribed from loci in a different chromosome (in trans) while ~25% associated with mRNAs transcribed from loci in the same chromosome (in cis) (p-value=1.8e-11), suggesting the potential of eRNAs to regulate transcription of their target genes in trans.
Next, we assessed the possibility that trans-mode eRNAs might regulate the transcription of target genes by direct binding to DNA to form triplex target DNA sites (TTSs) (Supplementary Figure 4).
We found that eRNAs are enriched for DNA binding domains (2.5-fold compared with background, p-value<2.34e-11, Fisher exact test) and up to 25% of eRNAs could be attributed to regulation through RNA:DNA triple helical structure formation.
In summary, we were able to identify some evidence of trans regulatory activity associated with trE flanks, but there is no single mode of trans action that could be attributed to the majority of trE eRNAs given currently available data.

Flanks of trEs are enriched in trans-eQTLs and the cores of trEs are enriched in cis-eQTLs
We hypothesized that if trans-mode eRNAs are functional and, indeed, act in trans to regulate their target genes, we should observe an enrichment of trans-eQTLs at the flanks of trEs. We found a significant 2.5-fold enrichment of trans-eQTLs at the trE flanks (p-value<2.77e-6, Fisher exact test; see Methods). We also observed a significantly higher enrichment of trans-eQTLs at the trE flanks as compared with the trE cores (Figure 7a). This difference between the eQTL density at the trE flanks and cores grows with the increase in the distance to the target gene ( Figure 7b). For example, the first row corresponds to the density of eQTLs that are associated with genes located 200,000 bp or more (p-values: *<0.001 **<0.0001 Fisher's exact test). As the distance to the target gene rises to 800,000 bp, there are virtually no trE core eQTLs, and almost all eQTLs are at the trE flanks, which indicates that the true trans regulatory activity of trE is almost exclusively associated with the trE flanks. Finally, the directionality of the trans regulatory eQTL effect indicates that the target genes of trE flank eRNAs are predominantly downregulated ( Figure   7c).
The eQTL enrichment is different for cis eQTLs. At the <10 kb distance to the target gene, there are slightly more core than flank trE eQTLs, and the regulatory directionality corresponds to upregulation of gene expression. This indicates that trEs behave as classical enhancers at the shorter distance and that the trE flanks might be contributing to enhancer formation despite the lack of active TF binding to these outer parts of trEs.

TrE functional duality in regulatory locus encapsulation
Our results indicate that the core and flanking parts of a trE likely regulate different genes, i.e., a nearby gene is regulated by the trE core and a distant gene is regulated by an eRNA transcribed from one of the trE flanks, possibly in trans. As the eRNA regulatory effect is likely opposite in directionality to the upregulation by a trE core enhancer, we considered a trE regulatory mechanism, in which the trE core enhancer upregulates a small number of nearby genes while its eRNAs are used to limit the propagation of the upregulation to the surrounding genes. We named this mechanism regulatory locus encapsulation (RLE). Using thyroid eQTL data, we found evidence of an RLE in the locus of the Coiled-Coil Domain Containing 57 (CCDC57) gene ( Figure   8). CCDC57 is surrounded by the Fatty Acid Synthase (FASN) gene downstream and the Solute Carrier Family 16 Member 3 (SLC16A3) gene upstream. While CCDC57 and SLC16A3 are highly expressed in thyroid and pancreas, the expression of FASN is almost absent in these two tissues (Figure 8a,b,c). FASN, which plays critical role in de novo synthesis of fatty acids, is critically important for proper liver function and is highly expressed in liver (45) (Figure 8a). However, FASN is linked to dysregulation of thyroid and is strongly associated with hypothyroidism (45) at the same time. In particular, FASN overexpression is a marker of papillary thyroid carcinoma (PTC) (46), and inhibition of FASN in PTC has been proposed as a clinical target for this cancer (46). We found two eQTL SNPs in the core and a flank of the CCDC57 trE. The flanking eQTL SNP (rs7406326) is linked to the more distal FASN located 102 kb downstream. Its G>A mutation is associated with a significant 5.2-fold increase in the expression of FASN (Figure 8d), indicating that the native trE eRNA had been silencing FASN expression. The core eQTL SNP (rs8079572) is linked to SLC16A3 located 27 kb upstream, and its G>T mutation is associated with a decrease in expression of SCL16A3 (Figure 8e), which is consistent with an upregulatory enhancer function of this trE. In summary, these results indicate that the CCDC57 trE upregulates gene expression of two proximal genes, i.e., CCD57 and SLC16A3, in thyroid and pancreas, whereas the CCDC57 trE eRNA silences the expression of the flanking, but distal, FASN in these two tissues and that this RLE is critical for proper thyroid function.

Discussion
An increasing number of reports indicate a significant role for eRNAs in gene expression regulation in healthy and diseased states and suggest their clinical utility in eRNA-targeted therapy (16, 17, 20-22, 24, 47-49). However, trE characterization (50)(51)(52) and the mechanisms of action of trE eRNAs (53, 54) remain unclear. Here we developed an integrative approach to dissect the characteristics of trEs and investigated the duality of function of these regulatory elements.
We observed that trEs harbor a bimodal sequence composition, which is distinct from regular enhancers. TrEs consist of three distinct components: a central core and two flanks. The core shares functional characteristics and the TF binding capability of a regular enhancer, while the flanks are selectively used for eRNA transcription. Cores are much more compact than regular enhancers and evolve under stronger negative selection than regular enhancers.
Several groups have argued that eRNA production may be an incidental byproduct of a regulatory mechanism where the act of transcription, and not the transcripts, is the important regulatory function (55). To evaluate this hypothesis, we first tested if the direction of transcription of trEs agrees with the direction of transcription of their target genes. We observed that intronic trEs, but not intergenic trEs, follow the direction of transcription of their target genes in 60% of cases (Supplementary Figure 1a). This suggests that transcription of eRNAs at intronic trEs may be the result of gene transcription promoting inner eRNA transcription. Second, Cinghu et al. reported that transcription at intragenic enhancers interferes with, and attenuates, host gene transcription during productive elongation (12). We then tested this scenario by comparing the expression of target genes with intronic trEs in two groups. The first group consisted of intronic trEs with opposite directionality to that of their host genes, while the second group consisted of those trEs with the same directionality as that of their host genes (Supplementary Figure 1b,c). We found no significant difference in gene expression suggesting that the previously proposed regulatory mechanism may not have a genome-wide effect.
We observed that binding of eRNAs by RBPs is widespread and tissue-specific, which agrees with previous reports of extensive interaction of eRNAs with RBPs (9,68). Based on the spatial interaction of eRNAs with RBPs and their target genes, they can be categorized as either cis-or trans-mode eRNAs. Cis-mode eRNAs act in the proximity of their target promoters.
For example, we observed that an eRNA can potentially regulate PRC2 occupancy at a promoter through the competitive binding of its core component RBP SUZ12 (Supplementary Figure 5).
However, this regulation is only possible when gene transcription is comparable to trE transcription. In this scenario an increase of transcription at the trE can potentially prevent SUZ12 from forming the repressive complex and can allow an increase of gene transcription (Supplementary Figure 5c). If the model is correct, we should also observe a positive correlation between transcription at the trE and transcription at the gene. Furthermore, the correlation should increase for higher values of SUZ12 at the trE as measured by ChIP-Seq (Supplementary Figure   5d). There are similar examples of this mode of action cited above that show that eRNAs facilitate the transition of paused RNA Pol II into productive elongation by acting as a decoy for the negative elongation factor (NELF) complex (9). Also, Sigova et al. reported that some DNA binding TFs also bind RNA generated at promoters and enhancers, which suggests a modest but important effect that contributes to the stability of gene expression programs (11). However, one limitation of this mode of action is the finding that co-transcription of eRNAs and mRNAs rarely occurs within closed enhancer-promoter loops (54), which also suggests, again, that this regulatory mechanism may not have a genome-wide effect.
On the other hand, the trans-mode eRNA mechanism commonly consists of the binding of RBPs to eRNAs in an effort to regulate their target genes in trans. We found that about 5% of flanks can transcribe eRNAs previously categorized as lncRNAs or pseudogenes and another 5% of flanks transcribe eRNAs that can be potentially bind asRNAs to target mRNAs. Previous reports have shown, for example, that an antisense ncRNA represses transcription of the LEF1 gene by recruiting PRC2. However, since asRNAs consist of repetitive elements and the genome is full of repetitive sequences, it is difficult to assess the true gene targets of eRNAs that function as asRNAs.
We also observed that eRNAs can potentially identify their target genes by RNA-DNA triplex interactions to regulate their transcription in trans. The implications of this trans-mode of action is that a subset of eRNAs harbors two types of sites. One is the site for binding RBPs, while the second consists of a DBD that potentially allows the eRNA to identify its target gene in trans. We found that, overall, approximately 25% of eRNAs can potentially interact with genes in a tissue through significant RNA-DNA triplexes, suggesting that triplex formation may be an important mechanism of eRNA regulation.
As additional evidence that eRNAs function to regulate target genes in trans, we found that flanks are enriched for trans eQTLs compared to cores and regular enhancers. We also observed that the primary effect of trans regulation by eRNAs is downregulation of target genes. However, we have to consider that the number of trans eQTLs that overlap flanks of trEs is very limited (<1% of trEs in B-cells), although this may just reflect the finite size of currently available data sets of trans eQTLs.
In addition to the classical trans regulatory activity of trE eRNA, we observed evidence of trEs contributing to RLE, where the role of a trE eRNA is to effectively shield the bystander genes surrounding the genes regulated by the trE core from the propagation of potentially undesirable upregulatory activity of this core.
Taken together, our results show that trEs play an important role in the precise orchestration of gene regulation in the human genome, accurately balancing the positive regulatory activity of trE cores with the negative regulatory activity of eRNAs transcribed from trE flanks.

Enhancers and trEs
To define enhancers, we used the histone modifications H3K27ac and H3K4me1 coupled with For transcribed enhancers (trEs), we used 85,786 predicted trEs identified by bidirectional transcription using NET-CAGE across a panel of tissues (8), a robust method that sensitively detects 5′-ends of nascent RNAs, coupled with the FANTOM5 atlas of transcribed enhancers (6). We obtained 7,613 HepG2 trEs, 4,777 thyroid trEs, and 6,686 GM12878 trEs.

Segmentation, binning, and orientation of trEs
Based on transcription properties, we divided each trE into three segments: a reverse flank, a core, and a forward flank (Figure 1b). The segment locations are based on the positions of the leftmost/rightmost clusters of CAGE tags for reverse and forward flanks, respectively.
Next, since flanks are enriched for transcription initiation clusters within 1 kb of the core, we extended the flanks to 1,000 bp and binned them into 20 bins of 50 bp each (Figure 1c). Cores had an average length of 150 bp and were divided into 10 bins. Next, we oriented each trE so that the strongest transcription (the leading flank) was on the right side (Figure 1d). For this, we first assigned preferential directionality of a trE by comparing the number of CAGE tags (7) in the two flanks. Second, we reoriented all reverse directionality cases so that the strongest transcription pointed to the right. Using a 1.5x-fold difference in the number of CAGE tags between two flanks as a threshold of strong directionality, we observed that 89% of trEs have strong preferential directionality and 11% of trEs are bidirectional.

Chromatin immunoprecipitation sequencing (ChIP-seq) signals (histone marks, TFs) of trEs, enhancers, and promoters
To compare ChIP-seq and RNA-seq signals of trEs and enhancers, we calculated the average signal value S at each bin I (Si) with respect to the background. The background consisted of regions of open chromatin in the tissue/cell line of study as marked by DHSs. The statistical significances of differences in signal vs. background, trEs vs. enhancers, and cores vs. flanks signal were calculated using the Student's t-test (56).
Hierarchically-clustered heatmaps were generated using the Seaborn visualization library based on matplotlib (57).

Sequence conservation of trEs, enhancers and promoters
To evaluate sequence conservation of trEs, we used phylogenetic p-values for multiple alignments of 100 vertebrate genomes (58,59). For this, we took advantage of the PhyloP scores available from the University of California, Santa Cruz (UCSC) Genome Browser (60) and computed the average value per bin with respect to the background. The background consisted of open chromatin regions marked by DHSs (length matched).

RBP enrichment in trEs and eRNAs
To calculate the enrichment of RBPs in flanks of transcribed enhancers, we calculated the average signal value S at each bin I (Si), with respect to the background. RBP signals were obtained from the 32 ENCODE ChIP-Seq datasets (31,32 (31,32) and compare to cores. eCLIP is a technique that allows robust transcriptome-wide discovery of RNA-binding protein binding sites and enable large scale and robust profiling, with amplification and sample requirements similar to those of ChIP-Seq (40).

TFBSs at trEs
To evaluate the enrichment of TFBSs at trEs and to compare them with enhancers, we used the Multiple Em for Motif Elicitation (MEME) library of TFBSs (61). We mapped TFBSs within trEs and enhancers and calculated the TFBS-enrichment p-values using the Poisson distribution with Bonferroni correction for multiple testing. The controls consisted of random regions matched for length and repeat content.

Gene transcription and eRNA directionality
To check if there is a trend of eRNA orientation "towards" the nearest transcription start site (TSS), we first calculated the directionality score (D) for each trE D=(nF-nR)/(nF+nR) where nR iis the number of tags overlapping the reverse flank and nF is the number of tags overlapping the forward flank. Next, we associated each trE with its nearest gene and determined if the gene orientation was the same or opposite to D (Supplementary Figure 1a). We then compared the level of expression of the genes hosting intronic trEs that either match or oppose the transcriptional directionality of their host genes. A scatter plot of directionality vs. gene expression was generated using matplotlib (57).

eRNA-DNA triplex formation
To investigate the formation of eRNA-DNA triplexes, we first selected the GM12878 trEs that have a trans expression quantitative trait locus (eQTL) association. For this, we used a set of ~60,000 significant trans-eQTL associations (FDR<0.05) from the eQTLGen Consortium (62). Next, we used the Triplex Domain Finder (TDF) within the Regulatory Genomics Toolbox (RGT) (63). TDF characterizes the triplex-forming potential between RNA and DNA regions (63). This method tests whether a particular eRNA region is likely to form a DBD using its RNA sequence homology to the potential target DNA regions (±100 kb from center of gene body).

Analysis of trans-eQTLs at trEs
To evaluate the enrichment of trans-eQTLs at the flanks of trEs compared to the cores and to regular enhancers, we first downloaded the significant trans-eQTLs from the eQTLGen Consortium (62). After that, we calculated the density of trans-eQTLs at the cores and flanks with respect to the background, which consisted of random DNA regions matched by length and GC content.