Identifying putative cis-acting elements within the 5’ UTRs of P. falciparum that differ between genes with high and low TE
To identify putative cis-acting sequences that regulate TE in P. falciparum, the ribosome profiling and mRNA sequencing data generated by Caro and Ahyong et al. [18] were re-analyzed by comparing the 5’ UTR sequences of genes in the bottom 10% and top 10% of TEs during the late trophozoite stage (Figure 1A). Features within the 5’ UTRs were quantified, and the distributions from each set were compared. While the distributions of 5’ UTR length were not statistically distinct (K.S. test p=0.10, Supplemental Figure 1A), the distributions of uAUG frequency differed significantly and appeared distinctly separated when normalized to 5’ UTR length, with lower TE genes tending to contain more uAUGs (K.S. test p=3.36 × 10^-9 and p=1.1 × 10^-21, respectively) (Supplemental Figure 1B, Figure 1B). This trend appeared to be most distinct closest to the protein coding region (Figure 1C).
Additionally, the distributions of GC content differed significantly between 5’ UTRs with low and high TE (K.S. test p=2.24 × 10^-5) (Figure 1D). The positional effect followed a similar trend, with repressed genes on average having a higher GC content, especially near the translational start site (Figure 1E). Together, this retrospective bioinformatic analysis suggested that these two features should be further investigated for their role in influencing TE, with particular attention placed on the sequence region proximal to the translation start site.
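As an illustration of the feature comparison described above, the following is a minimal sketch and not the pipeline used in the original analysis. The function names `utr_features` and `ks_statistic` and the toy sequence are hypothetical. The sketch counts uAUGs, normalizes them to 5’ UTR length, computes GC fraction, and returns only the two-sample K.S. statistic D (the largest gap between the two empirical cumulative distributions); it does not compute a p-value.

```python
from bisect import bisect_right


def utr_features(utr):
    """Return length, uAUG count, uAUGs per nucleotide, and GC fraction."""
    seq = utr.upper().replace("T", "U")
    if not seq:
        raise ValueError("empty 5' UTR")
    # Count every AUG triplet, in any frame, upstream of the main start codon.
    n_uaug = sum(1 for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG")
    n_gc = seq.count("G") + seq.count("C")
    return {
        "length": len(seq),
        "uaug": n_uaug,
        "uaug_per_nt": n_uaug / len(seq),
        "gc": n_gc / len(seq),
    }


def ks_statistic(sample_a, sample_b):
    """Two-sample K.S. statistic D: max distance between the two ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Toy 20-nt sequence (hypothetical, not taken from the data set).
toy = utr_features("AAUGAAAUGCAAAAAAAAAA")

# Toy GC fractions for two gene sets (hypothetical values).
d_stat = ks_statistic([0.1, 0.2, 0.3, 0.4], [0.3, 0.4, 0.5, 0.6])
```

In practice, the per-gene values returned by `utr_features` for the bottom 10% and top 10% TE sets would be passed to the K.S. comparison feature by feature.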
Evaluating P. falciparum and human K562 in vitro translation assays for measuring the effect of 5’ UTRs on TE
To investigate the role of cis-acting elements within 5’ UTRs, an in vitro translation assay previously developed for identifying translation inhibitors against P. falciparum [5, 10] was adapted using both P. falciparum W2 and H. sapiens K562 cellular extracts. To validate and optimize the platform for this purpose, two mRNAs transcribed in late trophozoites with significantly different TEs were identified, PF3D7_1411400 (a plastid replication-repair enzyme) representing a translationally repressed mRNA from the bottom 10% of TEs and PF3D7_1428300 (a proliferation-associated protein) representing a high translation mRNA from the top 10% of TEs. These two genes were chosen for their relatively similar 5’ UTR lengths and other properties (Figure 2A and B). The full length 5’ UTRs of both genes (Figure 2A) were cloned into a reporter construct driving expression of a luciferase enzyme and were evaluated for their effect on TE.
The 5’ UTR of PF3D7_1411400 is 730 nucleotides long, contains 15 uAUGs (13 form uORFs), and is 11.0% GC (Figure 2B). Using the data of Caro et al. [18], the RNA abundance was measured to be 63.66 reads per million and the log2(TE) was -1.94. The 5’ UTR of PF3D7_1428300 is 775 nucleotides long, contains 10 uAUGs (all of which form uORFs), and is 9.3% GC (Figure 2B). The abundance for the RNA was measured to be 522.93 reads per million and the log2(TE) was 1.75. Thus, the TE of the active gene is 12.2-fold higher than that of the repressed gene by ribosome profiling. In the P. falciparum in vitro translation assay, which effectively removes any influence from differential expression levels, the signal produced by the activating 5’ UTR was 24.5-fold higher than the signal from the repressive 5’ UTR (Figure 2B). In the K562 in vitro translation assay, the 5’ UTR from the active gene also out-performed that of the repressed gene by 5.3-fold (Figure 2C). Both in vitro translation assays recapitulated the difference in TE that was observed in vivo, albeit with different absolute magnitudes.
As noted above, the 5’ UTR analysis of the ribosome profiling data suggested that differences between high and low TE 5’ UTRs appeared to be exaggerated closer to the translation start site. To investigate this while reducing the search space for cis-acting elements, each of the 5’ UTRs was progressively trimmed from the 5’ end (Figure 2C). In P. falciparum lysates, shortening the activating 5’ UTR to 549 nucleotides increased translation 4.2-fold, and reducing the UTR to 130 nucleotides further increased translation 1.9-fold, for a 7.9-fold total increase. Reducing the repressive 5’ UTR to 339 nucleotides similarly increased translation 3.15-fold, but further reduction to 130 nucleotides resulted in no additional increases in P. falciparum. Similarly, in human K562 lysates, trimming of the 5’ UTRs resulted in an overall increase in translation for both 5’ UTRs and increased the TE differential between the two (Figure 2B).
While trimming both 5’ UTRs increased their respective translation, the differential between the activating and repressive UTRs was magnified. At 130 nucleotides, the activating 5’ UTR outperformed the repressive 5’ UTR by 64-fold (Figure 2B), which had the added benefit of increasing the dynamic range between constructs. Henceforth, the minimal 130 nucleotide sequences were used as the platform for further dissection of cis-acting sequences, and all subsequent 5’ UTRs evaluated were 130 nucleotides. The activating 130 nucleotide 5’ UTR derived from PF3D7_1428300 is denoted as A[WT] and the repressive 130 nucleotide 5’ UTR from PF3D7_1411400 is denoted as R[WT]. Reflective of the distinct distributions in uAUG abundance and GC abundance, R[WT] is 16.9% GC and contains four uAUGs, numbered 1-4 based on distance from the translation start site. uAUGs 1 and 2 do not form uORFs and are in the +1-frame relative to the reporter gene, starting at -13 and -22 nucleotides, while uAUGs 3 and 4 both form uORFs at -66 and -101 nucleotides. A[WT] is 7.7% GC and contains no upstream “AUG”s (Figure 2D).
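The position, frame, and uORF status assigned to each uAUG can be illustrated with a short scan. This is a sketch under stated assumptions, not the annotation procedure of the study: the function `classify_uaugs` and the toy sequence are hypothetical, the frame is taken as the distance to the main start codon modulo 3 (a convention inferred from the reported positions, since -13 and -22 both leave a remainder of 1), and a uAUG is called uORF-forming only if an in-frame stop codon lies entirely within the 5’ UTR.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def classify_uaugs(utr):
    """List position, frame, and uORF status for every uAUG in a 5' UTR.

    Positions are negative distances from the A of the uAUG to the A of the
    main start codon (the last UTR nucleotide is -1).
    """
    seq = utr.upper().replace("T", "U")
    n = len(seq)
    results = []
    for i in range(n - 2):
        if seq[i:i + 3] != "AUG":
            continue
        dist = n - i
        # Walk codon by codon; only stops fully inside the UTR count here.
        has_stop = any(
            seq[j:j + 3] in STOP_CODONS for j in range(i + 3, n - 2, 3)
        )
        results.append({"position": -dist, "frame": dist % 3, "uorf": has_stop})
    return results


# Toy 28-nt UTR (hypothetical): one uAUG with an in-frame stop, one without.
toy_calls = classify_uaugs("CAUGAAAUAGCCCCCAUGCCCCCCCCCC")
for call in toy_calls:
    print(call)
```

Under this convention, a uAUG without an in-frame stop in the UTR and with a nonzero frame, such as the one at -13 in the toy sequence, would read into the reporter coding region out of frame.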
All the RNAs used herein were capped using Vaccinia Capping Enzyme (NEB M2080S). To verify that both lysates were sensitive to capping, capped and uncapped versions of the full length 5’ UTRs and the 130 nucleotide 5’ UTRs were compared (Supplemental Figure 2). Both lysates were sensitive to capping, with capped RNAs generally generating more luminescence (up to a 21.7-fold increase in P. falciparum and a 7.1-fold increase in K562 with full length PF3D7_1428300), especially in P. falciparum lysates. Additionally, in K562 lysates, uncapped RNAs with the full length 5’ UTRs generated a more variable signal than capped RNAs. To promote scanning initiation, increase luminescence signal, and reduce noise, all further experiments in this study utilized capped RNA.
Measurement of both independent and combined effects of uAUGs on translational repression
The combined effect of the four uAUGs in R[WT] was first evaluated by mutating all four to “AUC”, denoted R[Δ1Δ2Δ3Δ4]. Conversion of all four alleviated repression by over 1000% in P. falciparum and by 337% in human lysates (Figure 3A). If each uAUG contributed equally toward repression, the expected result of maintaining any single uAUG would be a consistent relief from repression relative to R[WT]. However, individually maintaining each of the four uAUGs yielded significantly different degrees of translation (Figure 3B), ranging from a modest 2-fold increase with uAUG-3 alone (R[Δ1Δ2Δ4]) to a nearly 10-fold increase with uAUG-1 alone (R[Δ2Δ3Δ4]), indicating unequal contributions towards the overall level of repression. For K562 extracts, the results were similar, although uAUG-2 alone (R[Δ1Δ3Δ4]) was the most repressive of the set, even more so than the wild-type construct. Since uAUG-4 forms a uORF whose stop site overlaps with uAUG-3 and was eliminated by making uAUG-3 into “AUC”, uAUG-4 with a restored uORF was also evaluated (R[Δ1Δ2Δ3-uORF restored]). With the uORF restored, uAUG-4 conferred minimal or no translational repression. These data demonstrate that each of the individual uAUGs in isolation possesses a different repressive activity with respect to translation.
To further evaluate the repressive effects of uAUGs in a novel context, the four uAUGs from R[WT] were placed into A[WT] at the matching positions (Supplemental Figure 3). As expected, in P. falciparum, when all four uAUGs were present (A[+1:+2:+3:+4]), translation was repressed 2.9-fold. Additionally, each uAUG individually repressed translation between 1.5-fold and 2.9-fold when the other positions were mutated to “AUC” (Supplemental Figure 3). The results in K562 followed the same trends as in P. falciparum.
To explore potential interactions between uAUGs, pairwise combinations of the uAUGs in R[WT] were evaluated (Figure 3C). If uAUGs possess independent repressive potentials that do not affect each other, the repression by any two uAUGs would be the product of their respective potentials. For example, the two furthest uAUGs, uAUG-1 and uAUG-4, yielded 37% and 73% of the maximum translation of the derepressed construct R[Δ1Δ2Δ3Δ4] in P. falciparum lysates. Thus, if acting independently, the predicted yield for a 5’ UTR containing both uAUGs would equal 0.37 * 0.73, or 27%, of the maximum signal. The measured signal for this combination (R[Δ2Δ3]) was extremely close to the predicted value, 28.6%, suggesting that these two elements act independently and proportionately on translation. Evaluation of the remaining pairs of uAUGs revealed some notable combinations that likely highlight interacting pairs (Supplemental Figure 4). Of note, the predicted combination of uAUG-3 and uAUG-4 (R[Δ1Δ2]) in P. falciparum underestimates the measured amount of translation (11% predicted versus 19% measured), suggesting an interaction between uAUG-4 and uAUG-3, which, as noted previously, marks the end of the uORF formed by uAUG-4. For K562 lysates, constructs containing uAUG-2 differ most from their predicted values, indicating this element may be uniquely sensitive to the presence of the other uAUGs.
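The independence test described above amounts to multiplying the fractional yields of the single-uAUG constructs and comparing the product with the measured yield of the pair. The sketch below reproduces that arithmetic using only the values reported in the text; the function name `predicted_independent` and the variable names are illustrative. For the uAUG-3 and uAUG-4 pair, only the reported predicted and measured values are compared, since the single-construct fractions are not restated here.

```python
from math import prod


def predicted_independent(fractions):
    """Expected fraction of maximum translation if each uAUG acts independently."""
    return prod(fractions)


# Fractions of the derepressed R[Δ1Δ2Δ3Δ4] signal in P. falciparum lysates.
single = {"uAUG-1": 0.37, "uAUG-4": 0.73}

predicted_1_4 = predicted_independent(single.values())  # 0.37 * 0.73
measured_1_4 = 0.286                                    # R[Δ2Δ3]
ratio_1_4 = measured_1_4 / predicted_1_4

# uAUG-3 with uAUG-4 (R[Δ1Δ2]): reported predicted and measured values.
ratio_3_4 = 0.19 / 0.11

print(f"uAUG-1 + uAUG-4: predicted {predicted_1_4:.1%}, measured/predicted {ratio_1_4:.2f}")
print(f"uAUG-3 + uAUG-4: measured/predicted {ratio_3_4:.2f}")
```

A measured-to-predicted ratio close to 1 is consistent with independent action, while a ratio well above or below 1 flags a candidate interaction between the two elements.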
Having examined all pair-wise combinations of the four uAUGs, each three-way combination was then evaluated (Figure 3D). Unlike the broad range of differing repressive activities observed for individual and pairwise uAUGs, trios of uAUGs all repressed translation to a similar or greater degree than R[WT]. Together these data indicated that uAUGs in isolation independently confer varying levels of repression; however, multiple uAUGs may combine to produce a concerted effect that was not predicted by their individual contributions.
Investigating the effect of position and termination status on uAUG repression
Each of the uAUGs in R[WT] is distinct with respect to its Kozak context, its position relative to the translation start site, and its termination status. Previous work describing the Kozak context for P. falciparum suggests a string of adenosine bases preceding the start site is most commonly observed [28, 33]. To assess the effects of uAUG positionality while maintaining a common Kozak context, a cassette composed of the -3 to +9 sequence from uAUG-3 was individually placed at five equally spaced positions within R[Δ1Δ2Δ3Δ4], beginning at -14 nucleotides from the reporter protein coding region (Figure 4). All cassettes were inserted in the +2 frame such that if translation initiated at these sites, no reporter should be translated in-frame. Two versions of the cassette were created, one maintaining the termination with a stop codon at the end of the cassette and one without (Figure 4A/B). For the five constructs containing a non-terminating uAUG, all potential stop sites preceding the protein coding region in-frame with the 5’ most cassette were eliminated, and the effect of these mutations alone in the presence of uAUG-3 (R[Δ1Δ2Δ4]*) was evaluated (Supplemental Figure 5A).
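The cassette layout can be checked with simple arithmetic. The sketch below is illustrative only: the 27-nucleotide spacing is an assumption inferred from the reported -14, -41, and -122 positions together with the statement that the five positions are equally spaced, and the frame is taken as the distance to the reporter start codon modulo 3. Whether the coordinate marks the start of the cassette or the A of its uAUG is not specified here; because the cassette carries three nucleotides before the AUG, both readings give the same frame.

```python
def cassette_positions(first=-14, step=-27, count=5):
    """Equally spaced insertion sites, listed from the start codon outward."""
    return [first + k * step for k in range(count)]


def frame(position):
    """Frame relative to the reporter: distance to the start codon modulo 3."""
    return abs(position) % 3


positions = cassette_positions()
frames = [frame(p) for p in positions]

for p, f in zip(positions, frames):
    print(f"cassette at {p}: frame +{f}")
```

Because the assumed spacing is a multiple of three, every cassette falls in the same frame, so any difference in repression between sites reflects position rather than frame.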
Except for the -122 position, where the uAUG is 11 nucleotides from the 5’ cap, all cassette placements resulted in repression comparable to R[Δ1Δ2Δ4] (Figure 4C). Of note, the cassettes placed nearest to the 5’ cap had little effect on translation in either P. falciparum or K562 lysates (1.2-fold and 1.3-fold repression, respectively). For P. falciparum, unlike the relative consistency of repression produced by uORF placement, the non-terminating uAUG cassette yielded a position-dependent trend in repression. As the uAUG moved closer to the translation start site, the repressive strength increased until maximum repression was achieved when the cassette was placed -41 nucleotides from the translation start site (Figure 4C). In comparison, K562 lysates also yielded peak repression at the -41 position, but the patterns of repression induced by the uORF and uAUG cassettes were more similar to each other and to the trend observed for uAUG cassettes in P. falciparum. These experiments indicate that in both P. falciparum and K562 lysates, the position of uAUGs contributes in part to downstream repression; however, termination status may also impact this effect, at least in the case of P. falciparum.
Evaluating the effect of GC content on TE
One distinguishing feature of the P. falciparum genome is an extreme bias in nucleotide content, especially within the intergenic regions that are ~90% AT [34]. As noted in Figure 1D and 1E, there is a significant difference in the distributions of GC content between the 5’ UTRs of genes with high and low TE, with repressed genes exhibiting a higher GC bias. These differences are evident within A[WT] and R[WT], which possess 7.7% GC and 16.9% GC, respectively. This GC bias is intensified in the 60 nucleotides closest to the translation start, with A[WT] containing only 1.7% GC and R[WT] containing 15% GC (Figure 2D). To investigate the impact of GC content in the context of these two constructs, substitutions were systematically introduced into the proximal region of A[WT] to increase the GC content from 1.7% to a maximum of 30% GC (Figure 5A). Substitutions were maintained between constructs, no upstream “AUG”s were introduced, and significant secondary structure was avoided (Supplemental Figure 5). In P. falciparum lysates, between 1.7% and 20% GC there was no change in TE, while at 30% GC translation was repressed 1.5-fold (Figure 5A). The repressive effect of the high GC content was 1.3-fold in human K562 lysates.
The converse experiment of reducing the GC content of R[WT] was also carried out. The GC content in the last 60 nucleotides of R[Δ1Δ2Δ3Δ4] was reduced to 5% by eliminating all GC content between 4 and 60 nucleotides from the translation start site and to 0% by removing all GC (Figure 5B). A maximum translation increase of approximately 2-fold was observed relative to R[Δ1Δ2Δ3Δ4], indicating a modest but measurable impact in this context. These results were mirrored in K562 lysates (Figure 5B). Together, the results of manipulating the GC content of the last 60 nucleotides of the 5’ UTR suggest that the impact on translation is subtle but sensitive to the overall context.
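To put the reported percentages in terms of nucleotides, the sketch below computes GC fraction over a start-proximal window and converts a percentage back to the G/C count it implies. The helper names and the toy sequence are hypothetical, and the counts are arithmetic implications that assume each percentage was computed over exactly 60 or 130 nucleotides; they are not values stated in the study.

```python
def gc_fraction(seq, window=None):
    """GC fraction of a sequence, or of its last `window` nucleotides."""
    s = seq.upper()
    if window is not None:
        s = s[-window:]
    return (s.count("G") + s.count("C")) / len(s)


def implied_gc_count(percent, window):
    """Number of G/C nucleotides implied by a GC percentage over a window."""
    return round(percent / 100 * window)


# Toy 130-nt sequence (hypothetical) with a single G inside the last 60 nt.
toy_utr = "A" * 70 + "G" + "A" * 59
toy_proximal = gc_fraction(toy_utr, window=60)
toy_overall = gc_fraction(toy_utr)

# Counts implied by the reported percentages under the stated assumption.
implied = {
    "A[WT] proximal 60 nt (1.7%)": implied_gc_count(1.7, 60),
    "R[WT] proximal 60 nt (15%)": implied_gc_count(15, 60),
    "maximum substituted A[WT] (30%)": implied_gc_count(30, 60),
    "reduced R construct (5%)": implied_gc_count(5, 60),
    "A[WT] full 130 nt (7.7%)": implied_gc_count(7.7, 130),
    "R[WT] full 130 nt (16.9%)": implied_gc_count(16.9, 130),
}
for label, count in implied.items():
    print(f"{label}: about {count} G/C")
```

Read this way, the 5% construct would retain about three G/C nucleotides, which is consistent with G/C being removed only between 4 and 60 nucleotides from the start site.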
Identifying additional cis-acting regulatory regions within R[WT] and A[WT]
In addition to the study of specific elements predicted to impact TE, a series of systematic sequence swaps was investigated, in which regions from both the 5’ and 3’ ends of R[WT] and A[WT] were exchanged. Beginning with the 3’ end of the 5’ UTR, 20, 40, and 60 nucleotides were exchanged between R[WT] and A[WT] (Figure 6A and 6B). In the case of A[WT], introducing more sequence from R[WT] severely impacted TE. While some of this impact was anticipated due to the introduction of uAUG-1 and uAUG-2, additional decreases in translation were observed with sequence beyond these elements (A[60nt 3’ R]). Furthermore, the added impact beyond the introduction of uAUGs was observed only with P. falciparum lysates. For the converse experiments, exchange of sequence from A[WT] into R[WT] at the 3’ end resulted in increased translation (11.7-fold). This increase in translation was in part expected due to the elimination of uAUG-1 and uAUG-2; however, the magnitude of the effect is greater than predicted from the experiments shown in Figure 3C. The effect in human K562 lysates was markedly less, with a maximum difference of 1.4-fold.
Sequence exchanges at the 5’ end were similarly carried out using 10, 20, and 30 nucleotide swaps between A[WT] and R[Δ1:Δ2:Δ3:Δ4]. The latter construct was chosen over R[WT] to assess the impact in the absence of uAUGs. For P. falciparum, exchanging the first 10 nucleotides of R[Δ1:Δ2:Δ3:Δ4] into A[WT] repressed translation 2.6-fold, with a final 3.7-fold repression upon exchanging 30 nucleotides (Figure 6C). In contrast, exchanging the first 10 and 20 nucleotides of A[WT] into R[Δ1:Δ2:Δ3:Δ4] activated translation up to 1.9-fold, while exchanging 30 nucleotides activated translation 3.5-fold. Note that the level of translation achieved in this latter construct matches the output of A[WT], demonstrating that in the absence of uAUGs, exchanging the sequence elements within the first 30 nucleotides of the 5’ end of the 5’ UTR was sufficient to render A[WT] and R[Δ1:Δ2:Δ3:Δ4] approximately equivalent (Figure 6D).