Analysis of Messenger RNA Secondary Structures in Rhodobacter sphaeroides

Background: The Shine-Dalgarno (SD) sequence, when present, is known to promote translation initiation in a bacterial cell. However, the thermodynamic stability of the messenger RNA (mRNA) through its secondary structures has an inhibitory effect on the efficiency of translation. This poses the question of whether bacterial mRNAs with SD have low secondary structure formation or not. Results: About 3500 protein-coding genes in Rhodobacter sphaeroides were analyzed and a sliding window analysis of the last 100 nucleotides of the 5' UTR and the first 100 nucleotides of ORFs was performed using RNAfold, a software for RNA secondary structure analysis. It was shown that mRNAs with SD are less stable than those without SD for genes located on the primary chromosome, but not for the plasmid encoded genes. Furthermore, mRNA stability is similar for genes within each chromosome except those encoded by the accessory chromosome (second chromosome). Conclusions: Results highlight the possible contribution of other factors like replicon-specific nucleotide composition (GC content), codon bias, and protein stability in determining the efficiency of translation initiation in both SD-dependent and SD-independent translation systems.


Introduction
Translation initiation, a rate-limiting step in protein biosynthesis, involves the recognition, attachment and adaptation of the mRNA to the 30S subunit of the ribosome [1]. Messenger RNA recognition is facilitated by the non-random distribution of purines about 5-10 nucleotides upstream of the start codon [2][3]. This purine-rich sequence (typically 3-6 nucleotides long), known as the Shine-Dalgarno (SD) sequence, is also complementary to a conserved region at the 3' end of the 16S rRNA located in the platform of the 30S subunit [4][5]. By complementary base pairing between the 16S rRNA and mRNA, the mRNA is attached to the 30S platform. Trans-acting initiation factors (IF1, IF2, and IF3) and ribosomal proteins mediate this attachment to the small subunit of the ribosome and help to unfold the mRNA for its accommodation in the channel of the ribosome. Although mutations in SD have been shown to alter protein expression levels up to 250-fold, SD itself is not obligatory for translation of some genes, e.g., rpsA in Escherichia coli [5]. In some of the cases where there is no complementarity between 16S rRNA and the sequence upstream of the mRNA start site, it has been shown that ribosomal protein S1 interacts with AU-rich regions to facilitate translation initiation [3].
A recent study in Lactococcus lactis revealed that in cases where mRNA-16S rRNA and/or mRNA-ribosomal protein interaction is absent, mRNA stability, or the lack thereof, contributes significantly to translation initiation efficiency [2,6]. Hence, analyzing mRNA secondary structure is critical in understanding translation initiation, as the formation of highly stable hairpin structures around a start codon could not only occlude translation from that codon, but also drive translation from a weaker downstream start codon with less secondary structure interference [7][8][9][10][11]. Since SD serves as a recognition signal for the selection of the right reading frame for translation, it is expected that this sequence is somewhat sensitive to secondary structure formation.
A study of mRNA stability across alphaproteobacterial, gammaproteobacterial, cyanobacterial, plastid, metazoan mitochondrial, fungal mitochondrial and plant mitochondrial genomes was previously performed [12]. The results for 5000 randomly sampled genes from each group revealed that, on average, mRNAs without SD have less secondary structure than mRNAs with SD in organisms where SD-dependent and SD-independent translation coexist [12]. Furthermore, in these organisms, mRNAs with and without SD generally have minimal secondary structure around the start codon, compared to the regions upstream and downstream of the start codon. The secondary structure analysis was based on predicting the minimum free energy (MFE) of mRNAs with the RNAfold function in the publicly available Vienna package [13].
The objective of this study was to assess the influence of SD on the secondary structure of mRNAs in Rhodobacter sphaeroides. Two hypotheses were tested: 1) mRNAs with SD and mRNAs without SD retain similar stability, and 2) secondary structure around the start codon is minimized for mRNAs with SD and not for mRNAs without SD.
Generally, given the set of non-overlapping secondary structures, P, for a given sequence, S, of length L, Pr(P | S) follows a Boltzmann distribution, and the number of possible configurations is approximately $1.8^{L}$ [14][15]. The probability of a base pair (i, j) for S is then given by:
$$\Pr\big((i,j) \mid S\big) = \sum_{P \ni (i,j)} \Pr(P \mid S) \qquad (1)$$
Minimum free energy (MFE) was used as a measure of secondary structure formation as previously described [12,16]. The MFE value is computed by adding up the energy contributions of pairs of consecutive base pairs according to nearest-neighbor pairing rules [16][17]. The RNAfold function in the MATLAB Bioinformatics Toolbox [18][19], which implements the Turner energy model [16,20,21], was used to compute the MFE values in this study. The RNAfold function in MATLAB incorporates some sequence-dependent adjustments in thermodynamic parameters to improve free energy minimization for RNA structure prediction [18].
This revised function performs better sequence knowledge-based computations of MFE, and a low MFE value for an input sequence indicates that the sequence is stable [22][23][24]. Furthermore, less secondary structure around the start codon and the SD of mRNAs would suggest that both the accessibility of the start codon and the exposure of the SD sequence for complementary pairing with 16S rRNA might be necessary for efficient translation initiation [8,11,25].
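The relationship between a Boltzmann-distributed ensemble of structures and the probability of a single base pair can be illustrated with a small Python sketch. It is not the RNAfold implementation: the three structures and their free energies below are hypothetical toy values chosen only for illustration, and the function names are our own. The probability of a pair is obtained by summing the Boltzmann probabilities of every structure in the ensemble that contains it.

```python
import math

# Gas constant in kcal/(mol*K) times temperature in kelvin (37 degrees C).
RT = 0.0019872 * 310.15


def boltzmann_probabilities(energies):
    """Convert free energies (kcal/mol) into Boltzmann probabilities."""
    weights = [math.exp(-e / RT) for e in energies]
    partition = sum(weights)  # partition function Z
    return [w / partition for w in weights]


def base_pair_probability(structures, energies, pair):
    """Sum the probabilities of all structures that contain `pair`."""
    probs = boltzmann_probabilities(energies)
    return sum(p for s, p in zip(structures, probs) if pair in s)


# Hypothetical toy ensemble for a 10-nt sequence: each structure is a set of
# (i, j) base pairs, and each energy is an illustrative value, not a
# Turner-model result.
structures = [
    frozenset(),                   # fully unpaired chain
    frozenset({(0, 9), (1, 8)}),   # two stacked pairs
    frozenset({(1, 8), (2, 7)}),   # an alternative two-pair helix
]
energies = [0.0, -2.0, -1.5]

probs = boltzmann_probabilities(energies)
mfe_index = energies.index(min(energies))  # the MFE structure is the most probable
p_1_8 = base_pair_probability(structures, energies, (1, 8))
p_0_9 = base_pair_probability(structures, energies, (0, 9))
```

In this toy ensemble the pair (1, 8) appears in two structures and is therefore more probable than (0, 9), even though (0, 9) belongs to the lowest-energy structure. This is why ensemble-based probabilities can carry information that a single MFE structure does not.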

Results And Discussion
Chromosomal Genes with SD have mRNAs with less secondary structure
Figures 1 and 2 show the distribution of MFE values obtained for genes located in chromosome 1 and chromosome 2, respectively. MFE values for genes with SD are significantly different from values for genes without SD (P < 0.001) in both chromosomes. Furthermore, comparing genes with SD to those without SD (bin 0) reveals a nonrandom distribution of median MFE values for these genes, wherein the median MFE value for bin 0 is lower than the medians for most of the other bin numbers. This suggests that mRNAs with SD form less stable secondary structures in comparison to those without SD for genes located in chromosomes 1 and 2.
The role of SD is independent of mRNA secondary structure in plasmid-encoded genes
Figure 3 shows the combined distribution of MFE values for genes located in the plasmids. Performing an MFE value comparison parallel to that of the two chromosomes reveals a random distribution of medians for all the bins, including bin 0. Moreover, MFE values for genes with and without SD are not significantly different from each other for Plasmid A (p = 0.088), Plasmid B (p = 0.148), Plasmid C (p = 0.341), Plasmid D (p = 0.186) and Plasmid E (p = 0.644). This suggests that in plasmids, there is no apparent difference in stability between mRNAs with SD and those lacking SD. This indicates less efficient translation of transcripts for plasmid genes compared to those of chromosomal genes. However, since these endogenous plasmids exist in multiple copies in the cell, loss of translation efficiency may be compensated by the overabundance of transcripts available for protein synthesis [32].
Impact of SD on mRNA stability around the start codon is influenced by intrinsic genome composition
Sliding window analysis also revealed that mRNAs with SD are less stable than those without SD for genes on chromosomes 1 and 2 (Figs. 4 and 5), refuting the first hypothesis that mRNAs with SD and mRNAs without SD retain similar stability. No statistically significant difference is seen for the plasmids, however, especially for regions upstream of the start codon. Furthermore, a pronounced maximum of mRNA instability around the start codon is only seen for genes located in chromosome 2 (Fig. 5). A similar RNA stability is maintained for genes in chromosome 1 and the plasmids. The variability seen in free energy for genes in Fig. 6 results from the high standard deviation in means for some of the plasmids. Nonetheless, the second hypothesis, that secondary structure around the start codon is minimized for mRNAs with SD and not for mRNAs without SD, is refuted. This indicates that SD is only sensitive to secondary structure formation globally on the mRNA, and that the influence of SD on mRNA free energy is organism-specific, and possibly influenced by intrinsic genome composition.
Even though one would expect a less stable initiation region on the mRNA, it is possible that mRNA instability has an adverse effect on translation as it reduces the half-life of the mRNA [33]. Therefore, a tradeoff between mRNA stability and start codon accessibility might come into play, especially for essential genes on chromosome 1 that are retained mostly in a single copy. This then highlights the possibility of the contribution of factors, other than the presence of SD and mRNA stability (for start codon accessibility), like protein stability, codon bias and GC content, in determining translation efficiency [9, 34-36].

Conclusion
In summary, our work on R. sphaeroides has shown a possible underlying influence of organism specificity on mRNA stability in SD-dependent and SD-independent translation systems. In R. sphaeroides, chromosomes 1 and 2, which mostly exist in single copies, contain less stable mRNAs in the SD-dependent initiation system, on the premise that the presence of SD implies its use in driving translation of the mRNA. This is not the case for the plasmids, which exist in multiple copies and in which mRNA stability is not significantly different between the SD-dependent and SD-independent translation systems. Further analyses of mRNA stability around the start codon also show replicon-specific formation of secondary structure for both mRNAs with SD and those without SD. Future efforts could, therefore, be directed at elucidating the effects of intrinsic genomic features other than the presence of SD and mRNA secondary structure in order to assess the efficiency of translation in bacteria.

Materials And Methods
A total of 3579 protein-coding gene sequences of R. sphaeroides were sampled from the National Center for Biotechnology Information (NCBI) database, and then analyzed using the Bayesian estimation method below [26][27].
$$(\#\text{operons})^{1.09} = \#\text{genes} \qquad (2)$$
About 70% of these genes were predicted to be organized in operons. The 27 SD motifs (Table 1) classified by Prodigal [28] were searched for, and only 19 ribosomal binding site (RBS) motifs were identified in the R. sphaeroides genome. In Table 1, an 'x' in the middle of a motif indicates that a mismatch is allowed. The rightmost column shows the spacer distance allowed between the translation start and the motif. The bin number in the leftmost column indicates the initial "score" assigned by Prodigal to the RBS motif in the first iteration.
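A motif search of this kind can be sketched in Python as follows. This is a minimal illustration, not Prodigal's implementation. The function names, the example sequences, the spacer range of 5 to 10 nucleotides, and the definition of the spacer as the number of nucleotides between the 3' end of the motif and the start codon are all assumptions made for the example. An 'x' in a motif is treated as a position where any nucleotide is accepted.

```python
import re


def motif_to_regex(motif):
    """Compile an SD motif into a regex; 'x' matches any single nucleotide."""
    return re.compile("".join("[ACGT]" if c == "x" else c for c in motif))


def find_sd_spacer(upstream, motif, min_spacer, max_spacer):
    """Return the spacer length at which `motif` occurs, or None.

    `upstream` is the DNA sequence ending immediately before the start codon.
    The spacer is assumed to be the number of nucleotides between the 3' end
    of the motif and the start codon.
    """
    pattern = motif_to_regex(motif)
    for spacer in range(min_spacer, max_spacer + 1):
        end = len(upstream) - spacer
        start = end - len(motif)
        if start < 0:
            break  # upstream region too short for larger spacers
        if pattern.fullmatch(upstream[start:end]):
            return spacer
    return None


# Hypothetical upstream regions (5' to 3'), each ending just before the start codon.
with_sd = "TTTAGGAGGTCAACC"
with_mismatch = "TTTAGGCGGTCAACC"
without_sd = "TTTTTTTTTTTTTTT"

exact_hit = find_sd_spacer(with_sd, "GGAGG", 5, 10)
mismatch_hit = find_sd_spacer(with_mismatch, "GGxGG", 5, 10)
no_hit = find_sd_spacer(without_sd, "GGAGG", 5, 10)  # would fall into bin 0
```

A gene whose upstream region matches none of the motifs within the allowed spacer range returns `None` for every motif and would be assigned to bin 0 in the scheme described here.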
No known alternative or non-SD binding motifs were found in the search scheme. Sequence patterns with a random distribution of nucleotides (no consensus SD) were represented as bin 0. Experimentally validated MFE values at T = 37 °C were obtained for each input DNA sequence from the NCBI database, using the RNAfold algorithm [13,22,23] in the MATLAB Bioinformatics Toolbox [18][19]. These energy values were computed by applying dynamic programming, and the corresponding mRNA structures and mountain plots were deduced. Genes were separated based on their location on a specific chromosome or plasmid, and a sliding window analysis (with a window spanning 50 nucleotides) was performed on the 200-nucleotide region from -100 to +100 on the mRNA. The following constraints were implemented in the probabilistic determination of base pairing: 1) one nucleotide can be paired to at most one other nucleotide; 2) the smallest number of unpaired nucleotides in a loop is three [29]. Although these requirements may not be biologically relevant (indicative of the formation of pseudoknots), they make identification of secondary structure more realistic and probable [30][31].

Figure 4
Sliding window analysis of mRNA stability for genes in Chromosome 1. The arrow indicates the position of the start codon. A cartoon depiction of secondary structure formation for mRNAs with SD is also shown on the graph.

Figure 5
Sliding window analysis of mRNA stability for genes in Chromosome 2. The arrow indicates the position of the start codon. A cartoon depiction of secondary structure formation for mRNAs with SD is also shown on the graph.
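
The sliding window procedure and the two base-pairing constraints described in Materials and Methods can be sketched in Python. This is a simplified illustration under stated assumptions, not the method used in the study. The score is the maximum number of nested canonical base pairs found by a Nussinov-style dynamic program, used here as a stand-in for the Turner-model MFE computed by RNAfold. The step size of one nucleotide and all function names are assumptions. The dynamic program does enforce both constraints: each nucleotide pairs at most once, and every hairpin loop contains at least three unpaired nucleotides.

```python
# Watson-Crick and G-U wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (a crude stand-in for MFE)."""
    seq = seq.upper().replace("T", "U")  # accept DNA input
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: k pairs with j
                if (seq[k], seq[j]) in CANONICAL:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


def sliding_window_scores(region, window=50, step=1, offset=-100, score=max_base_pairs):
    """Score every window; positions are relative to the start codon.

    `offset` is the coordinate of the first nucleotide of `region`
    (-100 for a region spanning -100 to +100).
    """
    return [
        (start + offset, score(region[start:start + window]))
        for start in range(0, len(region) - window + 1, step)
    ]


# A short hairpin: three G-C pairs closing a three-nucleotide loop.
hairpin_pairs = max_base_pairs("GGGAAACCC")

# A hypothetical 200-nt region with no possible pairs, to show the window count.
profile = sliding_window_scores("A" * 200)
```

To reproduce an analysis closer to the one described, the `score` argument could be replaced with a function that returns the MFE of each window from a thermodynamic folding tool; a more negative MFE then corresponds to a more stable window.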