LiRIP-seq profiles RNA-RNA interactome in living cells
To systematically profile the RNA-RNA interactome in live bacteria, we developed a highly streamlined in vivo approach, LiRIP-seq (ligation in vivo followed by RNA immunoprecipitation and sequencing) (Fig. 1A). LiRIP-seq pulse-expresses T4 RNA ligase 1 (t4rnl1) from an inducible pBAD promoter (Figs. 1A and S1), which enables proximity ligation of sRNAs to their interaction partners within living cells. Hfq-bound ligation products (RNA chimeras) are then enriched by Hfq-coIP and analyzed by RNA-seq (Chao et al., 2012). Expression of T4 RNA ligase was induced for only 30 min to minimize non-specific ligation and secondary effects on Salmonella growth (Fig. S1C).
To validate the feasibility of the approach, we used RT-PCR to test Hfq-coIP samples for in vivo ligation products of several known sRNA-target pairs, such as ArcZ-flhD (De Lay and Gottesman, 2012) and CyaR-ompX (Papenfort et al., 2008), and successfully detected them (Fig. S1D-F). In important control experiments, no chimeras were detected in the absence of T4 RNA ligase or in the control IP (untagged WT) samples. We also did not detect chimeras for non-target pairs (e.g. ArcZ-ompX, which are not predicted to base-pair). Together, these results indicate the high sensitivity and specificity of our in vivo ligation and capture strategy.
Genome-wide RNA-seq analysis of the Hfq-coIP samples fully recapitulated the ArcZ-flhD ligation products in the form of chimeric reads (Fig. 1B-C). Systematic analysis of all sequencing reads (chimeras and non-chimeric singletons) confirmed a strong enrichment of known Hfq-associated sRNAs in Hfq-coIP samples relative to the untagged control coIP library (Fig. 1C). It also revealed a 10-fold increase in the number of chimeric reads when T4 RNA ligase was expressed (Hfq + T4 vs. Hfq + EV) (Fig. 1D). Moreover, whereas most chimeras detected in the no-ligase control (Hfq + EV) were ligation products of abundant rRNAs and tRNAs, a large number of ‘informative’ non-rRNA/tRNA chimeras (8000 chimeras/million reads) were detected in the Hfq + T4 samples (Fig. 1C-D). Further analysis of the significant chimeras (S-chimeras; Fisher’s exact test, p < 0.05) confirmed that most sRNAs are ligated to the 5’ UTRs and CDSs of mRNAs (Fig. 1E), consistent with the established mode of sRNA action (Hör et al., 2020). Within these S-chimeras, nearly all mRNA 5’ UTR and CDS fragments are located at the 5’ end of the chimera (“RNA1”), and over 90% of sRNA fragments are located at the 3’ end (“RNA2”) (Fig. 1F). This directionality indicates that sRNA 5’ ends are prone to in vivo ligation, whereas sRNA 3’ ends are protected by Hfq from fortuitous ligation. These results support the high fidelity of in vivo proximity ligation on Hfq in live bacteria.
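Significant chimeras are called with Fisher’s exact test. The snippet below is a minimal sketch of a one-sided test for a single RNA pair. It assumes a RIL-seq-style 2 × 2 table of chimeric-read counts (Melamed et al., 2016) with four cells: chimeras joining RNA1 and RNA2, chimeras of RNA1 with other partners, chimeras of RNA2 with other partners, and all remaining chimeras. The function name and example counts are hypothetical, and the LiRIP-seq pipeline may build the table differently.

```python
from math import comb

def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    a: chimeric reads joining RNA1 and RNA2
    b: chimeric reads joining RNA1 to any other partner
    c: chimeric reads joining RNA2 to any other partner
    d: all remaining chimeric reads
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b        # chimeras that contain RNA1
    col1 = a + c        # chimeras that contain RNA2
    n = a + b + c + d   # all chimeras in the library
    # Sum the probabilities of all tables at least as extreme as the observed one.
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)

# Hypothetical counts for one candidate pair.
p = fisher_greater(12, 30, 8, 5000)
is_s_chimera = p < 0.05
```

Because the margins are fixed, the p-value is the upper tail of a hypergeometric distribution; where SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` should return the same value.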
Global RNA-RNA interaction network across Salmonella growth
Having established the LiRIP-seq approach, we next analyzed the global RNA interactome at three different stages of Salmonella growth in LB medium. At OD600 of 0.5, 2.0 and 2.0 + 3h, we induced the expression of T4 RNA ligase for 30 min and then pulled down Hfq and its bound RNAs (Fig. 2A). Deep sequencing of the RNA samples (12 samples, in duplicate) yielded ~153 million high-quality mappable reads (Table S4), providing ample coverage of the Salmonella genome with high reproducibility between replicates (Fig. S2A-B). LiRIP-seq strongly enriched the class of Hfq-associated sRNAs (Fig. 2B, Fig. S2C), which exhibited a dynamic profile over growth, consistent with our previous report (Chao et al., 2012). We further observed a strong enrichment of sRNA chimeras at all three growth stages (Fig. 2B), covering ~90% of known Hfq-dependent sRNAs. Using a stringent cutoff (chimeric reads > 10 and P < 0.05, Fisher’s exact test (Melamed et al., 2016)), LiRIP-seq identified 436, 855 and 1705 statistically significant RNA-RNA interactions at OD 0.5, 2.0 and 2.0 + 3h, respectively (Fig. 2C). Nearly 30% of these interactions were detected in more than one growth condition. Interestingly, the number of interactions increased towards stationary phase (Fig. 2C-E), accompanied by the appearance of stress-induced sRNAs and their abundant interactions with target genes (Fig. 2B). For example, SdsR had 10-fold more interactions in stationary phase (OD 2.0 + 3h) than in logarithmic phase (OD 0.5) (Fig. 2D-E); at this stage, SdsR is activated by RpoS and accounts for ~20% of all sRNA singleton reads (Fig. 2B). Similarly, the stress-related sRNA RprA, an activator of RpoS (Majdalani et al., 2002), showed 20-fold more interactions in stationary phase, including the known RprA-rpoS interaction (Table S5).
Therefore, LiRIP-seq analysis established a comprehensive and dynamic sRNA-target interaction network in vivo during Salmonella growth.
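The read-count and p-value cutoff, together with the cross-condition comparison, can be sketched as follows. The sketch assumes that each growth condition yields records of (RNA1, RNA2, chimeric reads, p-value). It also assumes that “detected in more than one growth condition” is computed over the union of distinct interactions. The record layout, function names and toy identifiers are hypothetical.

```python
from collections import Counter

def significant_pairs(records, min_reads=10, alpha=0.05):
    """Return the (RNA1, RNA2) pairs with more than `min_reads` chimeric reads and p < alpha.

    records: iterable of (rna1, rna2, chimeric_reads, p_value) tuples.
    """
    return {(r1, r2) for r1, r2, reads, p in records if reads > min_reads and p < alpha}

def fraction_shared(condition_sets):
    """Fraction of distinct interactions (union over conditions) found in >1 condition."""
    counts = Counter(pair for pairs in condition_sets for pair in pairs)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c > 1) / len(counts)

# Toy records for three growth conditions (identifiers are placeholders).
od05 = significant_pairs([("m1", "sA", 15, 0.01), ("m2", "sA", 8, 0.001), ("m3", "sB", 40, 0.2)])
od2 = significant_pairs([("m1", "sA", 20, 0.001), ("m4", "sB", 12, 0.03)])
od2_3h = significant_pairs([("m4", "sB", 50, 1e-5), ("m5", "sC", 11, 0.04)])
shared = fraction_shared([od05, od2, od2_3h])
```

Note that the read cutoff is strict (`reads > min_reads`), so a pair with exactly 10 chimeric reads is discarded.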
LiRIP-seq detects targets for both processed and primary sRNAs with high accuracy
Comparing our in vivo data with the available RIL-seq dataset from Salmonella at OD 2.0 (Matera et al., 2022) showed that LiRIP-seq performed as well as or better than RIL-seq, with a greatly streamlined workflow. Using the top 10 sRNAs as a practical benchmark, LiRIP-seq identified more interactions in most cases (Fig. 3A). In total, it recapitulated nearly half of the interactions detected by RIL-seq (203) and identified 1214 new interactions. To further examine the reliability of these interactions, we analyzed the S-chimeric reads for sequence motifs. This revealed a polyU motif in RNA2 (Fig. 3B), likely a signature of the Rho-independent terminators at sRNA 3’ ends (Fig. S3-S4). Meta-analyses of RNA1 fragments identified a number of highly significant motifs (Fig. 3C-F, Fig. S5). Strikingly, these motifs are found in nearly all captured target mRNAs of the respective sRNAs and show substantial complementarity to the conserved seed regions of their cognate sRNAs. For example, we identified two motifs complementary to the two seeds (R1 and R2) of the GcvB sRNA (Fig. 3F) (Miyakoshi et al., 2022; Sharma et al., 2007). These data demonstrate that LiRIP-seq is highly effective at discovering true sRNA-target interactions in vivo.
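Complementarity between a target-derived motif and an sRNA seed can be checked by scanning the target for windows that pair antiparallel with the seed. This is a minimal sketch that assumes uninterrupted pairing over the full seed length, with optional G-U wobble pairs. It is not the motif-discovery procedure behind Fig. 3C-F, and the seed and target sequences are hypothetical toy examples, not the GcvB R1/R2 sequences.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def can_pair(x: str, y: str, allow_wobble: bool = True) -> bool:
    """True if nucleotides x and y can base-pair."""
    return (x, y) in WATSON_CRICK or (allow_wobble and (x, y) in WOBBLE)

def find_seed_sites(target: str, seed: str, allow_wobble: bool = True) -> list:
    """Return 0-based start positions in `target` (5'->3') of windows that pair
    antiparallel with every nucleotide of `seed` (5'->3')."""
    target = target.upper().replace("T", "U")
    seed = seed.upper().replace("T", "U")
    k = len(seed)
    hits = []
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        # Antiparallel duplex: the first target base pairs with the last seed base.
        if all(can_pair(window[j], seed[k - 1 - j], allow_wobble) for j in range(k)):
            hits.append(i)
    return hits

# Hypothetical 7-nt seed and toy target fragment.
hits = find_seed_sites("GGACACAACGG", "GUUGUGU")
```

Allowing G-U wobble is a deliberate choice here because sRNA-mRNA duplexes frequently contain such pairs; setting `allow_wobble=False` restricts the scan to Watson-Crick pairs only.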
In addition to these sRNA-mRNA interactions, we identified almost 100 sRNA-sRNA chimeras involving many potential RNA sponges (Table S5). For instance, the documented sponge interaction between the ArcZ and CyaR sRNAs (Kim and Lee, 2020) was the most abundant among ArcZ S-chimeras in our dataset. We also captured the classical ChiX-chbBC sponge pair (Figueroa-Bossi et al., 2009), as well as the OppX-MicF sponge complex, which was recently shown to adjust envelope porosity to transport capacity (Matera et al., 2022).
In live cells, the performance of LiRIP-seq depends on the in vivo accessibility of sRNA 5’ ends for proximity ligation (Fig. 1A), since we did not introduce any nuclease trimming or end-repair steps. Intriguingly, we observed a strong enrichment of processed sRNAs over primary sRNAs in S-chimeras (Fig. 3G), indicating that processed sRNAs carrying a 5’-monophosphate (5’P) are more prone to ligation. This is expected because T4 RNA ligase 1 joins a 5’-monophosphate to a 3’-hydroxyl end, whereas primary transcripts carry a 5’-triphosphate. Indeed, the 5’ ends of processed sRNAs such as CpxQ and ArcZ are readily captured as RNA2 in S-chimeras and account for a large number of chimeric reads (Fig. 3H). Based on this unique feature, LiRIP-seq may help identify novel 3’UTR-processed sRNAs. In contrast, primary sRNAs such as Spot42 and GcvB are involved in chimera formation at internal seed regions (Fig. 3H), perhaps during the coupled decay of sRNA-target pairs (Prévost et al., 2011). Taken together, our data establish that transient expression of T4 RNA ligase in vivo mediates rapid ligation of the 5’P ends of sRNAs to their binding partners on Hfq.
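Whether an sRNA fragment in RNA2 was ligated via its annotated 5’ end or via an internal position can be classified from the fragment’s 5’ coordinate. This is a minimal sketch that assumes genomic 5’-end coordinates for both the fragment and the annotated sRNA, plus a hypothetical tolerance of 2 nt around the annotated 5’ end. The function names and coordinates are illustrative only.

```python
from collections import Counter

def rna2_offset(fragment_5p: int, srna_5p: int, strand: str) -> int:
    """Distance (nt) from the annotated sRNA 5' end to the fragment 5' end, measured
    along the transcript (positive = downstream of the annotated 5' end)."""
    if strand == "+":
        return fragment_5p - srna_5p
    if strand == "-":
        return srna_5p - fragment_5p
    raise ValueError(f"unknown strand: {strand!r}")

def classify_junction(fragment_5p, srna_5p, srna_length, strand, tolerance=2):
    """Label an RNA2 fragment as starting at the sRNA 5' end, internally, or outside it."""
    offset = rna2_offset(fragment_5p, srna_5p, strand)
    if abs(offset) <= tolerance:
        return "5' end"      # consistent with ligation of the (processed) sRNA 5' end
    if 0 < offset < srna_length:
        return "internal"    # e.g. ligation within a seed region
    return "outside"

# Hypothetical RNA2 fragments of one 80-nt sRNA annotated at position 1000 on the + strand.
fragments = [1000, 1001, 1000, 1035, 1200]
tally = Counter(classify_junction(f, 1000, 80, "+") for f in fragments)
```

Summing such tallies per sRNA would give a simple profile of 5’-end versus internal ligation, in the spirit of the processed-versus-primary comparison above.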
LiRIP-seq identifies the porin mRNA ompD as a key regulatory hub
Focusing on the target genes, our inspection of RNA1 fragments in chimeric reads identified several potential key regulatory hubs in Salmonella that are targeted by multiple sRNAs. Figure 4A depicts Salmonella mRNAs that may interact with four or more sRNAs based on our data. These mRNAs include the prominent regulatory hub rpoS, whose expression is activated by three sRNAs (ArcZ, RprA and DsrA; Sedlyarova et al., 2016), all of which were captured as S-chimeras in our dataset (Table S5).
Among the most-targeted mRNAs are ompD and ompC, both encoding abundant porins of the Salmonella outer membrane. The ompD mRNA is predicted to interact with as many as 13 sRNA candidates (Fig. 4A-B), only two of which are established regulators of ompD: the global OMP repressor RybB (Papenfort et al., 2006) and the pathogenicity island-encoded sRNA InvR (Pfeiffer et al., 2007). Twelve of these sRNAs are predicted to base-pair with the 5’ UTR or early CDS of the ompD mRNA (Fig. 4B), whereas ArcZ is predicted to interact within the coding region (Fig. S6). To validate these interactions and their regulatory functions, we cloned RybB (as a control) and 11 other sRNAs into pZE12 vectors and constitutively expressed them in WT Salmonella. Strikingly, 8 of the 11 sRNAs strongly inhibited OmpD expression (Fig. 4C), and several sRNAs also repressed OmpC and OmpA to varying extents. These data not only suggest that ompD is one of the largest regulatory hubs in bacteria (regulated by > 10 sRNAs), but also showcase the reliability and robustness of LiRIP-seq analysis.
A novel 3’UTR-derived sRNA regulator FadZ
LiRIP-seq data suggest that several of the novel OmpD regulators are processed sRNAs, from which we selected STnc790 (renamed FadZ) for detailed characterization. FadZ was initially described as a primary sRNA candidate in Salmonella by differential RNA-seq (Kröger et al., 2012, 2013). It is located within the 196-nt-long 3’ UTR of the fadBA mRNA, which encodes enzymes involved in fatty acid oxidation and metabolism (Fig. 5A). Our LiRIP-seq data show that Hfq pulled down only a short 3’-terminal fragment of this mRNA (Fig. 5B), the region with the highest sequence conservation among Enterobacteriaceae species including E. coli and Yersinia (Fig. 5C-D). On Northern blots, FadZ accumulated as a very short species of only ~40 nt in both Salmonella and E. coli (Fig. 5E, Fig. S7). As expected for an Hfq-associated sRNA, FadZ expression was abolished in a Salmonella hfq-deletion strain but unaffected in a strain lacking the second global RNA chaperone ProQ (Fig. S7C).
FadZ became undetectable at the non-permissive temperature (44°C) in an RNase E temperature-sensitive strain (rneTS) (Fig. 5F), suggesting that it is a 3’UTR-derived sRNA whose production depends on RNase E processing. Consistent with this result, the 5’ sequence of FadZ matches the consensus motif for RNase E cleavage (Fig. 5D) (Chao et al., 2017). Indeed, mutating three conserved uridines disrupted cleavage and FadZ production (Fig. S7E), confirming that FadZ is generated by RNase E cleavage. Altogether, these data demonstrate that FadZ is a short, Hfq-dependent sRNA cleaved off from the conserved region of the fadBA 3’ UTR.
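Candidate RNase E cleavage sites can be located with a simple pattern scan. The sketch assumes the 5’-RN↓WUU-3’ consensus (R = A/G, N = any nucleotide, W = A/U, ↓ = cleavage site), which is our reading of Chao et al., 2017. The test sequences are hypothetical, and a pattern match alone does not predict cleavage in vivo.

```python
import re

# Assumed RNase E consensus 5'-RN|WUU-3' ("|" marks the cut); the zero-width
# lookahead lets overlapping candidate sites be reported.
RNASE_E_SITE = re.compile(r"(?=[AG][ACGU][AU]UU)")

def candidate_cut_sites(seq: str) -> list:
    """Return 0-based indices of the first nucleotide 3' of each candidate cut."""
    seq = seq.upper().replace("T", "U")
    # The cut falls between N (match start + 1) and W (match start + 2).
    return [m.start() + 2 for m in RNASE_E_SITE.finditer(seq)]

wild_type = candidate_cut_sites("GCAUUAGG")   # toy sequence carrying one motif
mutant = candidate_cut_sites("GCACCAGG")      # same sequence with the uridines mutated
```

The toy mutant mirrors the logic of the uridine substitutions in Fig. S7E: replacing the uridines downstream of the cut removes the consensus match.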
FadZ is part of an incoherent feed-forward loop in fatty acid metabolism
To investigate the physiological function of FadZ, we first sought to identify the upstream signals that activate its expression. Because the parental fadBA mRNA is repressed by the transcriptional regulator FadR and derepressed by the addition of certain fatty acids (DiRusso et al., 1992), the 3’UTR-derived FadZ may be under similar transcriptional control. Indeed, FadZ levels were elevated in a mutant lacking the fadR gene (Fig. 6A). Whereas FadZ was not detectable in minimal medium containing glucose as the sole carbon source, it was strongly induced upon supplementation with long-chain (Ole, oleic acid, C18:1) as well as medium-chain (Oct, octanoic acid, C8:0) fatty acids. These data confirm that FadZ, like its parental fadBA mRNA, is induced by the availability of fatty acids. Under these conditions, OmpD and OmpC expression was completely repressed by FadZ overexpression (Fig. 6B), suggesting that the sRNA may shut down the expression of these abundant porins during fatty acid metabolism.
Additional transcription factors may control the expression of fadBA and FadZ, since FadZ also accumulates during Salmonella growth in LB. Screening a small panel of regulators identified CRP as an upstream activator of FadZ expression (Fig. 6C-D): both FadZ expression and the levels of a fadBA-lacZ transcriptional reporter were extremely low in a Δcrp mutant. CRP has also been suggested to regulate ompD expression (Santiviago et al., 2003). Using a transcriptional ompD-lacZ reporter fusion, we confirmed that CRP activates transcription from the ompD promoter (Fig. 6E, Fig. S7B). These findings indicate that ompD and its sRNA repressor FadZ are both activated by the same upstream transcriptional regulator, CRP, thus forming a type-1 incoherent feed-forward loop (Fig. 6F). Finally, we observed a clear growth defect when FadZ was constitutively expressed in medium containing oleic acid, and an even more pronounced defect in the Δcrp mutant (Fig. 6G-H, Fig. S7F), highlighting the importance of this feed-forward loop in fatty acid metabolism.
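The behaviour expected from a type-1 incoherent feed-forward loop can be illustrated with the standard textbook ODE model. In this minimal sketch, X (CRP) switches on at t = 0 and activates both Y (FadZ) and Z (ompD), and Y represses Z through a Hill function. All parameters are dimensionless and hypothetical. The model ignores FadR, fatty acids and the post-transcriptional nature of FadZ action, and it is not fitted to any data in Fig. 6.

```python
def simulate_i1ffl(t_end=20.0, dt=0.01, K=0.3, n=4):
    """Forward-Euler integration of a generic type-1 incoherent feed-forward loop.

    X is held fully on from t = 0. Y (FadZ-like repressor) and Z (ompD-like target)
    are both produced at rate 1 and decay at rate 1; Z production is additionally
    scaled by Hill-type repression by Y with threshold K and cooperativity n.
    Returns the trajectory of Z, one value per time step.
    """
    y = 0.0
    z = 0.0
    trajectory = []
    for _ in range(int(round(t_end / dt))):
        dy = 1.0 - y                              # X-driven production of Y minus decay
        dz = 1.0 / (1.0 + (y / K) ** n) - z       # X-driven, Y-repressed production of Z minus decay
        y += dt * dy
        z += dt * dz
        trajectory.append(z)
    return trajectory

z_traj = simulate_i1ffl()
peak, final = max(z_traj), z_traj[-1]
```

With these parameters, Z rises quickly, peaks while Y is still accumulating, and then relaxes to a much lower steady state. This transient pulse is the characteristic output of the motif.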