Analysis of Messenger RNA Secondary Structures in Rhodobacter sphaeroides

doi:10.21203/rs.3.rs-36110/v1

Research article

This work is licensed under a CC BY 4.0 License

Version 1

Background: The Shine-Dalgarno (SD) sequence, when present, is known to promote translation initiation in a bacterial cell. However, the thermodynamic stability of the messenger RNA (mRNA) through its secondary structures has an inhibitory effect on the efficiency of translation. This poses the question of whether bacterial mRNAs with SD have low secondary structure formation or not.

Results: About 3500 protein-coding genes in Rhodobacter sphaeroides were analyzed and a sliding window analysis of the last 100 nucleotides of the 5’ UTR and the first 100 nucleotides of ORFs was performed using RNAfold, a software for RNA secondary structure analysis. It was shown that mRNAs with SD are less stable than those without SD for genes located on the primary chromosome, but not for the plasmid encoded genes. Furthermore, mRNA stability is similar for genes within each chromosome except those encoded by the accessory chromosome (second chromosome).

Conclusions: The results highlight the possible contribution of other factors, such as replicon-specific nucleotide composition (GC content), codon bias, and protein stability, in determining the efficiency of translation initiation in both SD-dependent and SD-independent translation systems.

Epigenetics & Genomics

Keywords: Shine-Dalgarno sequence; Secondary structure; messenger RNA; Translation initiation

Translation initiation, a rate-limiting step in protein biosynthesis, involves the recognition, attachment and adaptation of the mRNA to the 30S subunit of the ribosome [1]. Messenger RNA recognition is facilitated by the non-random distribution of purines about 5–10 nucleotides upstream of the start codon [2–3]. This purine-rich sequence (typically 3–6 nucleotides long), known as the Shine-Dalgarno (SD) sequence, is complementary to a conserved region at the 3’ end of the 16S rRNA located in the platform of the 30S subunit [4–5]. Complementary base pairing between the 16S rRNA and the mRNA attaches the mRNA to the 30S platform. Trans-acting initiation factors (IF1, IF2, and IF3) and ribosomal proteins mediate this attachment to the small subunit of the ribosome and help to unfold the mRNA for its accommodation in the channel of the ribosome. Although mutations in SD have been shown to alter protein expression levels up to 250-fold, SD itself is not obligatory for translation of some genes, e.g., rpsA in Escherichia coli [5]. In some cases where there is no complementarity between the 16S rRNA and the sequence upstream of the mRNA start site, ribosomal protein S1 has been shown to interact with AU-rich regions to facilitate translation initiation [3].

A recent study in Lactococcus lactis revealed that in cases where mRNA-16S rRNA and/or mRNA-ribosomal protein interaction is absent, mRNA stability, or the lack thereof, contributes significantly to translation initiation efficiency [2, 6]. Hence, analyzing mRNA secondary structure is critical to understanding translation initiation, as the formation of highly stable hairpin structures around a start codon could not only occlude translation from that codon, but also drive translation from a weaker downstream start codon with less secondary structure interference [7–11]. Since SD serves as a recognition signal for the selection of the right reading frame for translation, this sequence is expected to be somewhat sensitive to secondary structure formation.

A study of mRNA stability across alphaproteobacterial, gammaproteobacterial, cyanobacterial, plastid, metazoan mitochondrial, fungal mitochondrial and plant mitochondrial genomes was previously performed [12]. The results from 5000 randomly sampled genes in each group revealed that, on average, mRNAs without SD have less secondary structure than mRNAs with SD in organisms where SD-dependent and SD-independent translation coexist [12]. Furthermore, in these organisms, mRNAs with and without SD generally have minimal secondary structure around the start codon, compared to the regions upstream and downstream of the start codon. The secondary structure analysis was based on predicting the minimum free energy (MFE) of mRNAs with the RNAfold function in the publicly available Vienna package [13].

The objective of this study was to assess the influence of SD on the secondary structure of mRNAs in Rhodobacter sphaeroides. Two hypotheses were tested: 1) mRNAs with SD and mRNAs without SD retain similar stability, and 2) secondary structure around the start codon is minimized for mRNAs with SD and not for mRNAs without SD.

In general, let P denote a secondary structure (a set of non-overlapping base pairs) of a given sequence S of length L. The probability Pr(P | S) of each structure follows a Boltzmann distribution, and the number of possible configurations grows approximately as $1.8^{L}$ [14–15]. The probability of a base pair (i, j) for S is then given by:

$\Pr\left[(i,j) \mid S\right] := \sum_{P \ni (i,j)} \Pr\left[P \mid S\right]$
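The sum over structures can be made concrete with a small sketch. The code below enumerates every non-crossing structure of a very short sequence and accumulates Boltzmann weights per base pair. It is only an illustration: the energy model (a flat −1 kcal/mol per base pair), the allowed pairs, the minimum loop size of three and all function names are assumptions made here, not the Turner parameters or the RNAfold implementation used in this study, and brute-force enumeration is feasible only for toy sequences.

```python
import math
from collections import defaultdict

# Assumed toy settings (not the Turner nearest-neighbor model).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3                      # at least three unpaired bases inside a hairpin
RT = 0.0019872 * 310.15           # gas constant (kcal/mol/K) times 37 degrees C in kelvin


def structures(seq, i, j):
    """Yield every non-crossing set of base pairs on seq[i..j] (inclusive)."""
    if j - i < MIN_LOOP + 1:      # interval too short (or empty) to hold a pair
        yield frozenset()
        return
    # Case 1: position i is unpaired.
    yield from structures(seq, i + 1, j)
    # Case 2: position i pairs with some k; the pair splits the interval in two.
    for k in range(i + MIN_LOOP + 1, j + 1):
        if (seq[i], seq[k]) in PAIRS:
            for inner in structures(seq, i + 1, k - 1):
                for outer in structures(seq, k + 1, j):
                    yield inner | outer | {(i, k)}


def pair_probabilities(seq, energy_per_pair=-1.0):
    """Return Pr[(i, j) | S] for every pair occurring in at least one structure."""
    partition = 0.0
    weight_by_pair = defaultdict(float)
    for structure in structures(seq, 0, len(seq) - 1):
        energy = energy_per_pair * len(structure)   # toy energy of this structure
        weight = math.exp(-energy / RT)             # Boltzmann weight
        partition += weight
        for pair in structure:
            weight_by_pair[pair] += weight
    return {pair: w / partition for pair, w in weight_by_pair.items()}
```

For the hairpin-like toy sequence `GAAAC`, only two structures exist (unfolded, or the single G-C pair between positions 0 and 4), so the pair probability reduces to w / (1 + w) with w the Boltzmann weight of the paired structure.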

Minimum free energy (MFE) was used as a measure of secondary structure formation as previously described [12, 16]. The MFE value is computed by adding up the energy contributions of pairs of consecutive base pairs according to nearest-neighbor-pairing rules [16–17]. The RNAfold function in Matlab Bioinformatics Toolbox [18–19], which implements the Turner energy model [16, 20–21], was used to compute the MFE values in this study. The RNAfold function in Matlab incorporates some sequence-dependent adjustments in thermodynamic parameters to improve free energy minimization for RNA structure prediction [18].

This revised function makes better use of sequence knowledge when computing the MFE, and a low (more negative) MFE value for an input sequence indicates that the sequence forms a stable structure [22–24]. Furthermore, less secondary structure around the start codon and the SD of mRNAs would suggest that both the accessibility of the start codon and the exposure of the SD sequence for complementary pairing with 16S rRNA might be necessary for efficient translation initiation [8, 11, 25].

Chromosomal Genes with SD have mRNAs with less secondary structure

Figures 1 and 2 show the distribution of MFE values obtained for genes located in chromosome 1 and chromosome 2, respectively. MFE values for genes with SD are significantly different from values for genes without SD (P < 0.001) in both chromosomes. Furthermore, comparing genes with SD to those without SD (bin 0) reveals a nonrandom distribution of median MFE values for these genes, wherein the median MFE value for bin 0 is lower than the medians for most of the other bins. This suggests that mRNAs with SD form less stable secondary structures than those without SD for genes located in chromosomes 1 and 2.

The role of SD is independent of mRNA secondary structure in plasmid-encoded genes

Figure 3 shows the combined distribution of MFE values for genes located in the plasmids. An MFE value comparison parallel to that of the two chromosomes reveals a random distribution of medians for all the bins, including bin 0. Moreover, MFE values for all genes are not significantly different from each other for Plasmid A (p = 0.088), Plasmid B (p = 0.148), Plasmid C (p = 0.341), Plasmid D (p = 0.186) and Plasmid E (p = 0.644). This suggests that in plasmids there is no apparent difference in stability between mRNAs with SD and those lacking SD, which may indicate less efficient translation of transcripts for plasmid genes compared to those of chromosomal genes. However, since these endogenous plasmids exist in multiple copies in the cell, loss of translation efficiency may be compensated by the overabundance of transcripts available for protein synthesis [32].

Impact of SD on mRNA stability around the start codon is influenced by intrinsic genome composition

Sliding window analysis also revealed that mRNAs with SD are less stable than those without SD for genes on chromosomes 1 and 2 (Figs. 4 and 5), refuting the first hypothesis that mRNAs with SD and mRNAs without SD retain similar stability. No statistically significant difference is seen for the plasmids, however, especially for regions upstream of the start codon. Furthermore, a pronounced maximum in mRNA instability around the start codon is seen only for genes located in chromosome 2 (Fig. 5). A similar RNA stability is maintained for genes in chromosome 1 and plasmids. The variability seen in free energy for genes in Fig. 6 results from the high standard deviation in means for some of the plasmids. Nonetheless, the second hypothesis, that secondary structure around the start codon is minimized for mRNAs with SD and not for mRNAs without SD, is refuted. This indicates that SD is only sensitive to secondary structure formation globally on the mRNA, and that the influence of SD on mRNA free energy is organism-specific and possibly influenced by intrinsic genome composition.

Even though one would expect a less stable initiation region on the mRNA, it is possible that mRNA instability has an adverse effect on translation, as it reduces the half-life of the mRNA [33]. Therefore, a tradeoff between mRNA stability and start codon accessibility might come into play, especially for essential genes on chromosome 1 that are retained mostly in a single copy. This highlights the possible contribution of factors other than the presence of SD and mRNA stability (for start codon accessibility), such as protein stability, codon bias and GC content, in determining translation efficiency [9, 34–36].

In summary, our work on R. sphaeroides has shown a possible underlying influence of organism specificity on mRNA stability in SD-dependent and SD-independent translation systems. In R. sphaeroides, chromosomes 1 and 2, which mostly exist in single copies, contain less stable mRNAs in the SD-dependent initiation system, on the premise that the presence of SD implies its use in driving translation of the mRNA. This is not the case for the plasmids, which exist in multiple copies and for which mRNA stability is not significantly different between SD-dependent and SD-independent translation systems. Further analyses of mRNA stability around the start codon also show replicon-specific formation of secondary structure for both mRNAs with SD and those without SD. Future efforts could, therefore, be directed at elucidating the effects of intrinsic genomic features other than the presence of SD and mRNA secondary structure in order to assess the efficiency of translation in bacteria.

A total of 3579 protein-coding gene sequences of R. sphaeroides were sampled from the National Center for Biotechnology Information (NCBI) database, and then analyzed using the Bayesian estimation method below [26–27].

$(\#\,\text{operons})^{1.09} = \#\,\text{genes}$
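Read literally, the relation above can be inverted to give a rough operon count for a given number of genes. The sketch below does only that arithmetic; the function name is hypothetical, and the source does not state that the authors applied the relation in exactly this way.

```python
def estimated_operons(n_genes: int, exponent: float = 1.09) -> float:
    """Invert (#operons)^exponent = #genes to estimate the number of operons."""
    return n_genes ** (1.0 / exponent)


# 3579 is the number of protein-coding genes sampled in this study.
print(round(estimated_operons(3579)))
```

Because the exponent is close to one, the estimate scales almost linearly with the gene count.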

About 70% of these genes were predicted to be organized in operons. The 27 SD motifs (Table 1) classified by Prodigal [28] were searched for, and only 19 ribosomal binding site (RBS) motifs were identified in the R. sphaeroides genome.

Table 1

RBS Motifs classified by Prodigal.
Bin Number	RBS Motif	RBS Spacer
-1	Non-SD RBS	Variable
0	No consensus RBS	None
1	GGA, GAG, AGG	3–4 bp
2	GGA, GAG, AGG, AGxAG, GGxGG	13–15 bp
3	AGGA, GGAG, GAGG, AGxAGG, AGGxGG	13–15 bp
4	AGxAG	11–12 bp
5	AGxAG	3–4 bp
6	GGA, GAG, AGG	11–12 bp
7	GGxGG	11–12 bp
8	GGxGG	3–4 bp
9	AGxAG	5–10 bp
10	AGGAG, GGAGG, AGGAGG	13–15 bp
11	AGGA, GGAG, GAGG	3–4 bp
12	AGGA, GGAG, GAGG	11–12 bp
13	GGA, GAG, AGG	5–10 bp
14	GGxGG	5–10 bp
15	AGGA	5–10 bp
16	GGAG, GAGG	5–10 bp
17	AGxAGG, AGGxGG	11–12 bp
18	AGxAGG, AGGxGG	3–4 bp
19	AGxAGG, AGGxGG	5–10 bp
20	AGGAG, GGAGG	11–12 bp
21	AGGAG	3–4 bp
22	AGGAG	5–10 bp
23	GGAGG	3–4 bp
24	GGAGG	5–10 bp
25	AGGAGG	11–12 bp
26	AGGAGG	3–4 bp
27	AGGAGG	5–10 bp
Table was adapted with permission from Hyatt et al., 2010 [28], under license http://creativecommons.org/licenses/by/2.0/legalcode. Changes were made to the original table to incorporate depiction of Non-SD RBS as Bin −1. An 'x' in the middle of a motif indicates that a mismatch is allowed. The rightmost column shows the spacer distance allowed between the translation start and the motif. The bin number in the leftmost column indicates the initial "score" assigned by Prodigal to the RBS motif in the first iteration.
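To illustrate how a motif and spacer range from Table 1 might be matched against an upstream region, a minimal sketch follows. It is not Prodigal's scoring procedure: the function names are hypothetical, the spacer is assumed here to be the number of nucleotides between the end of the motif and the start codon, and an 'x' is treated as a position that accepts any nucleotide.

```python
import re
from typing import List


def motif_to_regex(motif: str) -> str:
    """Translate a Table 1 motif into a regex; 'x' accepts any nucleotide."""
    return motif.replace("x", "[ACGT]")


def sd_spacers(upstream: str, motif: str, min_spacer: int, max_spacer: int) -> List[int]:
    """Return spacer lengths of all motif hits within the allowed spacer range.

    `upstream` is the DNA sequence ending immediately before the start codon.
    """
    # A lookahead lets overlapping occurrences of the motif be reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    hits = []
    for match in pattern.finditer(upstream):
        motif_end = match.start() + len(motif)
        spacer = len(upstream) - motif_end   # nucleotides between motif and start codon
        if min_spacer <= spacer <= max_spacer:
            hits.append(spacer)
    return hits


# Toy upstream region: AGGAGG followed by seven nucleotides before the start codon.
toy_upstream = "CCCCAGGAGGCCCCCCC"
print(sd_spacers(toy_upstream, "AGGAGG", 5, 10))
```

A gene with no hit for any motif and spacer combination would correspond to bin 0 in this simplified reading of the table.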

No known alternative or Non-SD binding motifs were found in the search scheme. Sequence patterns with a random distribution of nucleotides (no consensus SD) were represented as bin 0. Experimentally validated MFE values at T = 37 °C were obtained for each input DNA sequence from the NCBI database, using the RNAfold algorithm [13, 22–23] in Matlab Bioinformatics Toolbox [18–19]. These energy values were computed by dynamic programming, and the corresponding mRNA structures and mountain plots were deduced. Genes were separated by location on a specific chromosome or plasmid, and a sliding window analysis (window of 50 nucleotides) was performed on a 200-nucleotide region of the mRNA, from −100 to +100 relative to the start codon. The following constraints were implemented in the probabilistic determination of base pairing: 1) one nucleotide can be paired to at most one other nucleotide; 2) the smallest number of unpaired nucleotides in a loop is three [29]. Although these requirements may not be biologically relevant (indicative of the formation of pseudoknots), they make identification of secondary structure more realistic and probable [30–31].
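The two pairing constraints and the sliding window can be sketched with a much simpler dynamic program than the one used in this study. The code below maximizes the number of base pairs (a Nussinov-style recursion) instead of minimizing Turner free energy, so it is a stand-in for illustration only. The function names, the set of allowed pairs and the window step of one nucleotide are assumptions; the source states only the window size (50) and the region (−100 to +100).

```python
from typing import List, Tuple

# Assumed pairing rules for an RNA alphabet (Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3   # constraint 2: at least three unpaired nucleotides in a loop


def max_base_pairs(seq: str) -> int:
    """Maximum number of nested base pairs; each nucleotide pairs at most once."""
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]        # best[i][j]: optimum for seq[i..j]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]            # position i left unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:  # position i paired with k
                    inside = best[i + 1][k - 1]
                    outside = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, 1 + inside + outside)
            best[i][j] = score
    return best[0][n - 1]


def sliding_windows(region: str, window: int = 50, step: int = 1,
                    upstream_len: int = 100) -> List[Tuple[int, str]]:
    """Return (position relative to the start codon, window sequence) pairs."""
    return [(start - upstream_len, region[start:start + window])
            for start in range(0, len(region) - window + 1, step)]


def pairing_profile(region: str, window: int = 50, step: int = 1) -> List[Tuple[int, int]]:
    """Score every window of the region with the simplified pairing model."""
    return [(pos, max_base_pairs(sub)) for pos, sub in sliding_windows(region, window, step)]
```

With a 200-nucleotide region, a 50-nucleotide window and a step of one, this yields 151 windows whose first positions run from −100 to +50. Replacing `max_base_pairs` with an MFE calculation would give the kind of free-energy profile described above.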

A Kruskal-Wallis rank sum test, with adjustments for tied ranks, was performed to evaluate statistically significant differences in the distribution of MFE values within each of the two chromosomes and five plasmids. Post-hoc analyses were completed using the Mann-Whitney and Kruskal-Wallis tests to evaluate any pair-wise differences, with Bonferroni correction for Type-1 error, among different regions upstream and downstream of the start codon.
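For readers unfamiliar with the test, the tie-corrected Kruskal-Wallis H statistic and the Bonferroni-adjusted significance threshold can be written out in a few lines. This is a plain-Python sketch with hypothetical function names, not the statistical software used in this study, and it stops at the H statistic (the p-value would come from a chi-square distribution with one fewer degree of freedom than the number of groups).

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Rank values from 1, giving tied values their average rank.

    Also returns the size of each group of tied values.
    """
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    tie_sizes = []
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1          # average of 1-based ranks pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(end - pos + 1)
        pos = end + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with the standard correction for tied ranks.

    Undefined (division by zero) if every pooled value is identical.
    """
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    total, offset = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        total += rank_sum * rank_sum / len(group)
        offset += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons
```

In this setting each group would hold the MFE values of one RBS bin (or one region around the start codon), and each pair-wise post-hoc comparison would be judged against the threshold returned by `bonferroni_alpha`.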

SD: Shine-Dalgarno; MFE: Minimum Free Energy; RBS: Ribosomal Binding Site(s); Non-SD: Alternative motif(s) (putative ribosomal binding sites other than SD).

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Availability of data and materials

The RefSeq and GenBank data for R. sphaeroides 2.4.1 that support the findings of this study are available in the NCBI database: https://www.ncbi.nlm.nih.gov/genome/509?genome_assembly_id=167318

Competing interests

The authors declare that they have no competing interests

Funding

The authors received no specific funding for this work

Authors’ contributions

DO read the literature, organized the data, generated figures, performed statistical tests, and drafted manuscript. HC generated MFE values from Matlab bioinformatics toolbox, provided much insight on how to best graph/display data, and helped to draft the manuscript. MC proposed the study, guided the researchers performing the data mining and analyses, provided much insight on how to best display data and interpret outcomes, and helped to draft the manuscript. All authors have read and approved the final manuscript.

Authors’ Information

DO was a graduate student in the Department of Biological Sciences at Sam Houston State University when this work was initiated, and is currently a Research Coordinator at The University of Calgary Cumming School of Medicine. HC is an Associate Professor in the Department of Computer Science at Sam Houston State University. MC is a Professor in the Department of Biological Sciences at Sam Houston State University.

Acknowledgements

We would like to thank the Department of Biological Sciences at Sam Houston State University for their support during the completion of this project.

Nivinskas R, Malys N, Klausa V, Vaiškunaite R, Gineikiene E. Post-transcriptional control of bacteriophage T4 gene 25 expression: mRNA secondary structure that enhances translational initiation. Journal of molecular biology. 1999;288(3):291–304.
Picard F, Milhem H, Loubière P, Laurent B, Cocaign-Bousquet M, Girbal L. Bacterial translational regulations: high diversity between all mRNAs and major role in gene expression. BMC Genom. 2012;13(1):528.
Zheng X, Hu G, She Z, Zhu H. Leaderless genes in bacteria: clue to the evolution of translation initiation mechanisms in prokaryotes. BMC Genom. 2011;12(1):361.
Simonetti A, Marzi S, Jenner L, Myasnikov A, Romby P, Yusupova G, et al. A structural view of translation initiation in bacteria. Cell Mol Life Sci. 2009;66(3):423–36.
Skorski P, Leroy P, Fayet O, Dreyfus M, Hermann-Le Denmat S. The highly efficient translation initiation region from the Escherichia coli rpsA gene lacks a Shine-Dalgarno element. J Bacteriol. 2006;188(17):6277–85.
Mironov A, Kister A. RNA secondary structure formation during transcription. Journal of Biomolecular Structure Dynamics. 1986;4(1):1–9.
Gu W, Zhou T, Wilke C. A Universal Trend of Reduced mRNA Stability near the Translation-Initiation Site in Prokaryotes and Eukaryotes. PLoS Comput Biol. 2010;6(2):e1000664.
Jacobs E, Mills J, Janitz M. The Role of RNA Structure in Posttranscriptional Regulation of Gene Expression. Journal of Genetics Genomics. 2012;39(10):535–43.
Katz L, Burge CB. Widespread Selection for Local RNA Secondary Structure in Coding Regions of Bacterial Genes. Genome Res. 2003;13(9):2042–51.
Nomura M, Ohsuye K, Mizuno A, Sakuragawa Y, Tanaka S. Influence of messenger RNA secondary structure on translation efficiency. In: Nucleic Acids Symposium Series. 1983;15:173–176.
Ringnér M, Krogh M. Folding free energies of 5′-UTRs impact post-transcriptional regulation on a genomic scale in yeast. PLoS Comput Biol. 2005;1(7):e72.
Scharff LB, Childs L, Walther D, Bock R. Local absence of secondary structure permits translation of mRNAs that lack ribosome-binding sites. PLoS Genet. 2011;7(6):e1002155.
Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie/Chemical Monthly. 1994;125(2):167–88.
Ding Y, Chan CY, Lawrence CE. RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble. RNA. 2005;11(8):1157–66.
Martin J. Describing the Structural Diversity within an RNA’s Ensemble. Entropy. 2014;16(3):1331–48.
Mathews D, Turner D. Prediction of RNA secondary structure by free energy minimization. Curr Opin Struct Biol. 2006;16(3):270–8.
Doshi KJ, Cannone JJ, Cobaugh CW, Gutell RR. Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinform. 2004;5(1):1.
Wuchty S, Fontana W, Hofacker I, Schuster P. Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers. 1999;49(2):145–65.
Mathews DH, Sabina J, Zuker M, Turner DH. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. Journal of molecular biology. 1999;288(5):911–40.
Freyhult E. New techniques for analysing RNA structure. Doctoral dissertation, Uppsala University. 2004. Available from: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=1467BCE65591E341BEF2BC59FA767432?doi=10.1.1.66.1531&rep=rep1&type=pdf.
Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci USA. 2004;101(19):7287–92.
Gruber A, Lorenz R, Bernhart S, Neubock R, Hofacker I. The Vienna RNA Websuite. Nucleic Acids Res. 2008;36(Web Server):W70–4.
Hofacker I. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31(13):3429–31.
Steffen P, Voss B, Rehmsmeier M, Reeder J, Giegerich R. RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics. 2005;22(4):500–3.
Clote P, Ferré F, Kranakis E, Krizanc D. Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. Rna. 2005;11(5):578–91.
Ding Y, Lawrence CE. A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res. 2003;31(24):7280–301.
Price MN, Huang KH, Alm EJ, Arkin AP. A novel method for accurate operon predictions in all sequenced prokaryotes. Nucleic Acids Res. 2005;33(3):880–92.
Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119.
Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic acids research. 2003;31(13):3406–15.
Puton T, Kozlowski LP, Rother KM, Bujnicki JM. CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction. Nucleic acids research. 2013;41(7):4307–23.
Ren J, Rastegari B, Condon A, Hoos HH. HotKnots: Heuristic prediction of RNA secondary structures including pseudoknots. Rna. 2005;11(10):1494–504.
Million-Weaver S, Camps M. Mechanisms of plasmid segregation: have multicopy plasmids been overlooked? Plasmid. 2014;75:27–36.
Lenz G, Doron-Faigenboim A, Ron E, Tuller T, Gophna U. Sequence Features of E. coli mRNAs Affect Their Degradation. PLoS ONE. 2011;6(12):e28544.
Molina N, van Nimwegen E. Universal patterns of purifying selection at noncoding positions in bacteria. Genome Res. 2007;18(1):148–60.
Tuller T, Waldman Y, Kupiec M, Ruppin E. Translation efficiency is determined by both codon bias and folding energy. Proceedings of the National Academy of Sciences. 2010;107(8):3645–3650.
Zur H, Tuller T. Strong association between mRNA folding strength and protein abundance in S. cerevisiae. EMBO Rep. 2012;13(3):272–7.
