Sequencing technologies and methodologies are improving constantly, allowing investigations of features of both genome and transcriptome from different perspectives. In this study, we used state-of-the-art methods, such as cap-seq and direct RNA-seq, to obtain a genome-wide view of the transcriptional units and to evaluate RNA modifications in L. monocytogenes. Moreover, previous Illumina RNA-seq reads [13] were re-investigated with the aim of identifying RNA editing sites in L. monocytogenes.
TSSs of L. monocytogenes EGD-e and non-pathogenic L. innocua have been studied previously [21]. Here, we evaluated a new strain and possible alternative usage of TSSs when cells were treated with HPP, causing extreme stress. We report, for the first time, identification of TSSs in L. monocytogenes by the cap-seq approach. To compare our RO15 TSS prediction results and previous studies’ results on strain EGD-e [21], we lifted the identified EGD-e TSSs to the RO15 genome. Most TSSs could be lifted to the RO15 genome, but only 57% of the lifted TSSs (876/1531) were in the same position as our TSS prediction using cap-seq. To understand such differences, we obtained the EGD-e reads that were used previously [21] and mapped the RO15 genome. Comparison of the EGD-e data [21] with our cap-seq reads indicated that EGD-e reads [21] are noisier than our cap-seq, which may cause false-positives. Moreover, these two studies ([21] and this study) use different sequencing, mapping, and TSS prediction methods, which might explain discrepant results. One might also speculate that TSSs across the genomes may differ between strains due to a large difference between strains such as a different clonal complex and multilocus sequence type.
Subsequently, we focused on the group of genes found to be relevant for the recovery process after HPP treatment [13], including prophages, anti-CRISPR genes, ethanolamine utilization, and cobalamin synthesis. We looked at TSSs in these regions more closely and compared control and treated samples’ TSS predictions. The only clear difference between control and treated samples TSS predictions was found in the prophage regions. Here, two TSSs were predicted in control samples, and one TSS was predicted in treated samples. Therefore, we speculated that prophages can change TSS usage possibly due to stress-induced mechanisms.
Previously, we detected acrIIA1 and acrIIA2 anti-CRISPR genes within strain RO15 genome using homology-based prediction based on known anti-CRISPR genes [12]. It is possible that new candidate anti-CRISPR genes can be identified using the 'guilt-by-association' approach [39]. Based on our TSS prediction, anti-CRISPR genes (acrIIA1 and acrIIA2) were found in an operon consisting of four genes. Moreover, long RNA-seq reads also supported the predicted four-gene operon structure. We believe that the rest of the genes in the operon might be acrIIA3 and acrIIA4, which were defined before in L. monocytogenes [40] but without any sequence homology. A four-gene operon can be identified in EGD-e, which suggests that in addition to lmo2274, lmo2275, and lmo2276 the gene lmo2273 might be an acr family gene without homology to know family members.
In Listeria, sRNAs play a role in several processes including pathogenicity and host interaction [41]. In total, 81 sRNA were predicted in L. monocytogenes RO15 using our data. Previously, similar methods (cap-seq and ANNOgesic analysis) were employed for Bacteroides thetaiotaomicron [18], and the study successfully validated several predicted sRNAs using Northern blot assays [18]. Therefore, we expected to obtain accurate predictions from the ANNOgesic sRNA predictions. Within the predicted 81 sRNA L. monocytogenes RO15, six of them were identified as asRNA. However, identified asRNAs and corresponding genes were not anticorrelated based on the log2fold change (Figure S2). It is possible that the expression of asRNAs is difficult to identify with RNA-seq, and therefore, we cannot see a clear anticorrelation. We predicted 199 antisense TSS, but only six asRNA were identified by ANNOgesic in L. monocytogenes RO15. Previously, nine asRNA were verified in L. monocytogenes EGD-e using earlier TSS sequencing methods and Northern blot assays [21]. We mapped these nine asRNA [21] to L. monocytogenes RO15 and observed that antisense TSS were predicted at the start of only two out of the nine verified asRNAs. These two strains (EGD-e and RO15) may harbour different asRNAs.
The median 5’ UTR length was calculated as 33 bases in L. monocytogenes RO15. Previously, it was predicted that both L. monocytogenes EGD-e and L. innocua BUG499 also have a median length of 33 nucleotides 5’ UTR [21] and B. thetaiotaomicron a median length of 32 nucleotides 5’ UTR [18]. Moreover, a very similar 5’ UTR length distribution was also predicted in E. coli and Klebsiella pneumoniae [42]. Thus, our prediction of 5’ UTR length of L. monocytogenes RO15 is in line with published data for L. monocytogenes and other bacteria. It should be also noted that variation in median 5’ UTR length was reported in some species such as median 5’ UTR length of 27 nucleotides in cyanobacterium Prochlorococcus MED4 [43] and 42 nucleotides in cyanobacterium Synechocystis sp. PCC6803 [44].
The ANNOgesic “circrna” module predicted 10 circ-RNAs in L. monocytogenes RO15. Interestingly, all 10 predicted circ-RNAs were on the positive strand. Transcriptional analysis indicated that circ-RNAs are differentially expressed after HPP. The function of circ-RNA in bacteria is unknown, but it has been shown that bacteria can translate circ-RNA [45].
Prokaryotic promoter regions generally include − 35 and − 10 elements, which affect promoter activity [46]. Promoter motif analysis indicated that only the TATAAT-like − 10 region could be clearly identified in L. monocytogenes RO15, while no clear − 35 consensus motifs were found. However, promoters of some genes appear to have a -35 motif starting with “TTG” similar to the “TTGACA” -35 region of E. coli [47]. Since no clear − 35 motif was predicted, we believe that several genes in L. monocytogenes RO15 lack a -35 element in the promoter region. In addition, we observed an extended − 10 element “TG” (also known as -15 element) [48] in RO15, which might compensate for the lack of a -35 element [49].
In re-analysing our published RNA-seq data [13], we observed RNA variants (A to G, or A to I) compared with the genome sequence for several genes in both RO15 and ScottA, especially after pressure treatment. A to I editing in the hokB transcript was shown to increase toxicity of E. coli [24]. More recently, it has been shown that A to I RNA editing in the fliC transcript increases tolerance of P. putida to oxidative stress [25]. In our study, as RNA editing was mostly seen in HPP-treated samples, we hypothesize that RNA editing may be related to HPP stress response of L. monocytogenes. Therefore, RNA edited transcripts might have an important role in HPP stress in L. monocytogenes, and these genes can be a target for future research. In total, 12 genes were RNA edited in both strains. Especially the gene hpf encoding a ribosome hibernation-promoting factor drew our attention since the difference between treated samples and control samples increased over time. The gene hpf has an important role in Listeria. When Listeria encounters a stress condition, hpf converts translationally active 70S into inactive 100S ribosomes, which contributes to stationary-phase survival [50]. During HPP recovery L. monocytogenes might improve efficiency of ribosome hibernation by editing the hpf transcript. It has been also reported that the hpf gene can be found in phages [51]. However, in both RO15 and ScottA, the hpf gene was not present in any prophage region.
Direct RNA sequencing is a novel method that creates long reads from native RNA, which allow us to predict full-length transcripts and RNA modifications [28]. For direct RNA sequencing, we selected four samples in total from both L. monocytogenes RO15 and ScottA strains, which were used in our previous HPP experiment [13]. Samples were selected based on earlier observations of abundant differentially expressed genes and high numbers of RNA editing events. Direct RNA sequencing allowed us to predict transcripts, providing a general view of gene structures and operons. The number of predicted transcripts was lower than the number of genes in both strains, due to operon structures. However, some regions of the genome did not have sufficient coverage to predict a transcript from the reads. Hence, transcriptional inactivity was another reason for the lower number of predicted transcripts.
Analysing the RNA modifications using direct RNA-seq reads indicated that a GATGA-like motif is used in both RO15 and ScottA for RNA modification. RNA modification can regulate virulence and resistance in pathogens [52]. To our knowledge, RNA modification of whole transcripts in bacteria has not been reported to date; therefore, this is the first proposed whole transcript-based RNA modification motif in bacteria.
No RNA modification signal was observed in RNA editing sites. This might indicate that RNA modifications are caused by a different mechanism than RNA editing. Moreover, we have not observed partial variants in the RNA editing sites using the direct RNA-seq reads. Therefore, we believe that direct RNA-seq is not suitable for investigating A to I RNA editing sites. RNA editing site prediction may require other sequencing methods such as Illumina.
In summary, we identified several features of the transcriptional unit landscape of L. monocytogenes, including transcription start sites and terminators, promoters, operon, RNA transcripts, RNA editing events, and RNA modifications. We compared EGD-e [21] and RO15 TSSs. More experimental work is needed to fully elucidate these findings linked to functionalities of L. monocytogenes and the relevance of our observation of growth and possible virulence of these bacteria.