A novel structural mechanism of ribosomal stop codon readthrough in VEGF-A

The process of ribosomal recoding is generally regulated by an autonomous mRNA signal downstream of stop codons. While structural studies have provided mechanistic insights into viral systems, no such studies exist in mammalian systems. Here we define a novel structural mechanism for the VEGF-A readthrough system and show that regulation is multifaceted and complex, requiring a multipartite set of RNA elements located at long distances that interact with each other and with hnRNP A2/B1 to synergistically enhance readthrough levels. The Ax-element downstream of the stop codon adopts a unique multistem (SL-Ax1-3) architecture: SL-Ax1 interacts with hnRNP A2/B1, while SL-Ax2 interacts with an RNA element (SL-Au1) located ~500 nt upstream at the start of the coding sequence. SL-Au1 also independently binds to hnRNP A2/B1, which manipulates an equilibrium between alternate structures, from a sequestered bulge towards one that allows for the long-range interaction with SL-Ax2. Overall, our study not only highlights the significance of structural organization of elements within the coding sequence of mRNA, but also provides a functional relevance of the closed-loop mRNA organization in non-canonical translation and suggests complex mechanisms allow for finer integration of many signals for a required output.

to replicate the same readthrough capability of the Ax-element 25. Here we show that efficient readthrough in VEGF-Ax mRNA requires full-length mRNA, with multiple cis-acting signals in the coding region working synergistically with each other and with the trans-acting hnRNP A2/B1 to confer readthrough activity.


Introduction
Recoding during protein translation, either via programmed ribosomal frameshifting (PRF) or stop-codon readthrough (PRT), allows for production of extended proteins or polyproteins by bypassing stop codons. While these mechanisms are widely used by RNA viruses to densely pack information into their small genomes and to maintain relative protein levels 1, they have also been discovered in cellular systems, including bacteria, yeast, Drosophila, and humans 1,2, to regulate a wide range of functions from transcriptional regulation to signal transduction and sub-cellular localization 3-5. In viruses, ribosomal recoding is essential for maintaining critical relative ratios of proteins, for example structural (Gag) and enzymatic (Pol)

To characterize the VEGF-A system, we used a dual luciferase reporter assay in which various sequences of interest were cloned between renilla and firefly luciferases to determine stop codon readthrough efficiency 26. First, the natural leakiness of the UGA stop codon was determined to be ~0.05%, which is in agreement with previous studies 27 (Fig. 1b). Second, we tested the full bicistronic coding sequence of the VEGF-A mRNA (VEGF-A+Ax), where we observed a ~40-fold increase in readthrough levels to ~2.2%, a level similar to many functionally active recoding systems 1. To ensure that the observed increase over the natural leakiness of the UGA stop codon was robust and specific, we tested Annexin A2 mRNA (Fig. 1b), a gene with no known readthrough activity. This construct also only allowed for ~0.05% basal leakiness of the stop codon, demonstrating that VEGF-A mRNA indeed has the propensity for stimulating readthrough events. Third, we tested just the 63-nucleotide Ax-element downstream of the stop codon that was previously identified as the signal responsible for readthrough activity 11. This region only led to a 4-fold increase over natural leakiness with a 0.2% readthrough activity (Fig. 1b), which is similar to the results obtained by Loughran et al. 25. Taken together, these data indicated that sequences upstream in the coding region influence the process of readthrough in VEGF-Ax, and that the Ax-element alone is only marginally active for readthrough.
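The fold increases quoted above follow from the reported percentages. The short Python sketch below reproduces that arithmetic. It assumes the common dual-luciferase normalization, in which the firefly/renilla ratio of a test construct is divided by that of an in-frame control lacking the stop codon; this normalization and the function names are illustrative assumptions and are not taken from the methods of this study.

```python
def readthrough_percent(fluc_test, rluc_test, fluc_ctrl, rluc_ctrl):
    """Readthrough (%) as the test FLuc/RLuc ratio relative to an in-frame control.

    Assumed normalization (hypothetical here): the control construct has no
    stop codon between the two luciferases and defines 100% readthrough.
    """
    return 100.0 * (fluc_test / rluc_test) / (fluc_ctrl / rluc_ctrl)


def fold_over_basal(readthrough_pct, basal_pct):
    """Fold increase of a construct over the basal leakiness of the stop codon."""
    return readthrough_pct / basal_pct


# Percentages reported in the text.
BASAL_UGA = 0.05    # natural leakiness of the UGA stop codon (%)
FULL_LENGTH = 2.2   # full VEGF-A+Ax coding sequence (%)
AX_ONLY = 0.2       # 63-nucleotide Ax-element alone (%)

print(round(fold_over_basal(FULL_LENGTH, BASAL_UGA)))  # 44
print(round(fold_over_basal(AX_ONLY, BASAL_UGA)))      # 4
```

The value of 44 for the full-length construct is consistent with the ~40-fold increase stated in the text, given that both percentages are approximate.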

In an effort to identify factors responsible for the above observations, we first probed for the presence of potential local enhancers by adding 50, 100 or 200 nucleotides upstream of the Ax-element (Extended Data Fig. 1a). This did not increase the readthrough levels to that of the whole coding sequence, implying that there are no local enhancer sequences for the Ax-element, and that sequences far upstream of the stop codon enhance readthrough. We then created truncated constructs in which nucleotides were removed from the 5' end of the mRNA. Surprisingly, deletion of the first 100 nucleotides dropped the readthrough activity by about half (~0.94% readthrough), with a further drop to Ax-only levels at the 250-nucleotide deletion (Fig. 1b). Interestingly, sequence analysis of the upstream region within the first 100 nt revealed another potential hnRNP A2/B1 binding sequence (A93GGAGG). Altogether, this set of data demonstrates that the full, bicistronic VEGF-A mRNA is necessary for efficient readthrough to occur and that the Ax-element is incapable of causing significant readthrough without the presence of multiple signals in the coding region: one within the first 100 nucleotides with a potential to bind hnRNP A2/B1 (SL-Au1), and the second between nucleotides 200 and 250 (Au2).

To gain structural insights into the readthrough process, we chemically probed 28 the bicistronic native mRNA from the cells used for the luciferase assay by DMS-MaPseq. Overall, the Ax-element folds into three short stem loops (SL-Ax1-3) with five- and four-nucleotide linker sequences in between them, respectively. This architecture is unusual in that most recoding signals are made up of pseudoknots or single stem loops (Fig. 1c and Extended Data Fig. 1b). In addition, the first seven nucleotides of the Ax-element form an additional stem loop with ten nucleotides upstream of the stop codon. SL-Ax1 contains the previously identified hnRNP A2/B1 binding site with one of the consensus A588GG motifs predicted to be positioned in the loop 11.

DMS mapping of the 5' end of the mRNA showed that only residues 43-102 give rise to ensembles that converge into a defined fold, which we term SL-Au1. SL-Au1 was predicted to form a long stem loop capped with a CCA triloop and has four short helices (1a-1d) interspersed with bulges (Fig. 1c and Extended Data Fig. 1c). In some structures, helix 1c either folded into a stem with a register shift (helix 1c') or did not form at all and instead configured into a large bulge, suggesting that helix 1c could possibly sample alternate arrangements (Extended Data Fig. 1c). The potential hnRNP A2/B1 binding sequence spans both helix 1a and 1b at the 3' end of SL-

We next synthesized the individual RNA domains by in vitro transcription. Given that we observed an effect of the Au1 region on stop codon readthrough, we tested whether it is able to interact directly with the Ax-element. To do so, we performed Isothermal Titration Calorimetry (ITC) to check for potential long-range interactions between the elements. While titration of SL-Au1 into SL-Ax3 did not give rise to any binding interactions, titration of SL-Ax2 into SL-Au1 gave rise to specific enthalpically driven heats of binding between the two domains (Kd = 0.95 ± 0.53 µM) at a 1:1 stoichiometry (n = 0.93 ± 0.1) (Fig. 1d)
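To put the reported dissociation constant on an energy scale, the following minimal sketch converts Kd into a standard binding free energy using ΔG = RT ln(Kd). The temperature of 298.15 K is an assumption, since the ITC temperature is not given in this passage, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temp_kelvin=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R*T*ln(Kd / 1 M). The default temperature is an assumption;
    the ITC temperature is not stated in this passage.
    """
    return R_KCAL * temp_kelvin * math.log(kd_molar)


kd = 0.95e-6  # reported Kd for SL-Ax2 titrated into SL-Au1, in mol/L
print(f"{binding_free_energy(kd):.1f} kcal/mol")  # -8.2 kcal/mol
```

Because the reported uncertainty on Kd is large (± 0.53 µM), the free energy obtained this way should be read as an estimate under the assumed temperature.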

We then performed structural analysis by Nuclear Magnetic Resonance (NMR) to understand the individual motifs of the Ax-element (Extended Data Table 1). In SL-Ax1, the 5-bp

As predicted, SL-Ax2 forms a four-base-pair stem capped with a U607CGGG pentaloop. The first two residues of the pentaloop, U607 and C608, continue to stack on the 5' strand of the stem base pairs, after which the chain turns, with the following three residues G609GG showing continuously stacking NOEs with the 3' strand of the helix (Fig. 1g and Extended Data Fig. 2b). As in the GNRA-type fold, the nucleobase of G611 stacks on the ribose of A612 leading to an

To aid in assignments of the relatively large Au1 construct and to unambiguously corroborate the presence of the equilibrium structure predicted in the SL-Au1 element, we synthesized smaller segments of Au1 (1a-1b, 1b-1c, 1b-1c', 1c'-1d, 1c-1d and 1d) along with the full-length construct (Extended Data Fig. 3-6). Indeed, our data show that SL-Au1 forms a single stem with a CCA loop and is interspersed with bulges that divide the stem into four short helices, 1a-1d (Extended Data Fig. 3). Furthermore, as predicted, two sets of signals were assigned for the

The following three cytosine residues show a continuous NOE walk. With only partial stacking between C84 and A85, the triplet Cs are extruded in a way that seems poised to make a long-range interaction with SL-Ax2, as indicated by the ITC data (Fig. 1d, 2d).

We next wanted to test if the previously identified hnRNP A2/B1 binding sites (A584GG and A588GG in SL-Ax1) and the new potential sites (A93GG and A96GG in helix 1a and 1b, respectively) 11 interacted with hnRNP A2/B1. It was previously reported that mutating the A588GG hnRNP A2/B1 binding site led to decreased readthrough levels 11. Similar to previous hnRNP binding studies 32-36, we used an hnRNP construct that consisted of the two RNA-recognition motifs (RRMs), but lacked the aggregation-prone C-terminal domain, previously shown to be involved in oligomerization and nuclear localization of the protein 37-39 (Fig. 3a).

Titration of hnRNP A2/B1 into SL-Ax1 and SL-Au1 by ITC gave data that fit well with a single-site with SL-Au1 (∆H -3.6 ± 1.7 kcal/mol and ∆S 17.6 ± 5.7 cal/mol/K) was entropically driven, the latter suggesting rearrangement of RNA structure upon protein binding.
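The description of the SL-Au1 interaction as entropically driven can be checked against the reported values with ΔG = ΔH - TΔS. The sketch below does this under an assumed temperature of 298.15 K, which is not stated in this passage; the function name is illustrative.

```python
def gibbs_terms(delta_h_kcal, delta_s_cal_per_k, temp_kelvin=298.15):
    """Return (-T*dS, dG) in kcal/mol from ITC enthalpy and entropy.

    delta_s is given in cal/mol/K, so it is divided by 1000 to match kcal.
    The temperature is an assumed value, not one reported in this passage.
    """
    minus_t_ds = -temp_kelvin * delta_s_cal_per_k / 1000.0
    delta_g = delta_h_kcal + minus_t_ds
    return minus_t_ds, delta_g


# Values reported for hnRNP A2/B1 binding to SL-Au1.
DELTA_H = -3.6   # kcal/mol
DELTA_S = 17.6   # cal/mol/K

minus_t_ds, delta_g = gibbs_terms(DELTA_H, DELTA_S)
print(f"-TdS = {minus_t_ds:.2f} kcal/mol, dG = {delta_g:.2f} kcal/mol")
# -TdS = -5.25 kcal/mol, dG = -8.85 kcal/mol

# The entropic term exceeds the enthalpy in magnitude.
print(abs(minus_t_ds) > abs(DELTA_H))  # True
```

Under this assumption the entropic term contributes more than half of the binding free energy, which matches the qualitative statement in the text.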

Since hnRNP A2/B1 bound both SL-Au1 and SL-Ax1, and since it is able to dimerize through its C-terminal domain 38, we tested if it is able to mediate the long-range interaction between SL-Au1 and SL-Ax1 via its dimerization domain. Using an in vitro rabbit reticulocyte lysate (RRL) system, we tested the effects of full-length hnRNP A2/B1 and hnRNP A2/B1 lacking a C-terminal domain (ΔCTD) on VEGF-A readthrough efficiencies (Fig. 3c). Recoding was replicated in this assay, with baseline readthrough values of ~4%, which is slightly higher than those in cells. Such differences have previously been reported 40, and generally arise from tighter regulation in cells. By adding either full-length hnRNP A2/B1 or hnRNP A2/B1-ΔCTD to VEGF-A mRNA, we were able to increase readthrough by a factor of ~2.5 (Fig. 3c) to 9.6%, and further additions of hnRNP A2/B1 reproducibly reduced readthrough. These results not only confirm the stimulatory effect of hnRNP A2/B1 binding, but also point to other hnRNP A2/B1 driven mechanisms to balance required readthrough levels. Importantly, there was no observable difference in readthrough levels between the full-length and dimerization-deficient hnRNP A2/B1 constructs, indicating that hnRNP A2/B1 dimerization is not required for readthrough activity, and that another mechanism must be responsible for mediating the observed long-range interactions. As a control, using the Murine Leukemia Virus (MLV) readthrough system resulted in constant readthrough levels, showing that the effects of hnRNP A2/B1 are specific to VEGF-A (Fig. 3c).

To determine the exact binding site within SL-Au1, we performed a protein titration on

Based on recent findings that it is the RRM1 that recognizes the AGG sequence 36, we used a shortened hnRNP construct to determine the mode of interaction with SL-Ax1. Binding between RRM1 and SL-Ax1 resulted in perturbations of specific residues in the loop, with G590 giving clear intermolecular NOEs to an aromatic residue of RRM1, thus confirming the A588GG sequence as the binding site (Fig. 3e). The second potential A584GG binding motif remains sequestered within the stem and is unaffected by protein binding; in fact, the majority of the SL-Ax1

The various elements synergistically regulate recoding efficiencies
Given both the unusual multi-stem nature of the Ax-element and the potentially unique functions of the individual stems, we wanted to first determine their contributions to recoding. We created constructs in which the SL-Ax2 and SL-Ax3 elements were removed (Fig. 4a). We also checked DMS-MaPseq reactivities to ensure that in either deletion, the fold of SL-Ax1 is not perturbed (Extended Data Fig. 7a). Deletion of SL-Ax3 alone and of both SL-Ax2 and SL-Ax3 led to significant reductions in readthrough levels of 40% and 70%, respectively, implying that the entirety of the Ax-element is required to maximize readthrough levels (Fig. 4b). Interestingly, removing the entire Ax-element decreased readthrough levels by almost 80% to a level of 0.5% (Fig. 4b).
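The percentage reductions above can be related to absolute readthrough levels with simple arithmetic. The sketch below assumes that the reductions are expressed relative to the ~2.2% readthrough reported earlier for the full VEGF-A+Ax construct; that reference value and the function names are assumptions made for illustration.

```python
def percent_reduction(reference_pct, observed_pct):
    """Percent decrease of an observed readthrough level relative to a reference."""
    return 100.0 * (reference_pct - observed_pct) / reference_pct


def remaining_after(reference_pct, reduction_pct):
    """Readthrough level (%) left after a given percent reduction."""
    return reference_pct * (1.0 - reduction_pct / 100.0)


# Assumed reference: readthrough (%) of the full VEGF-A+Ax construct.
WILD_TYPE = 2.2

# Removing the whole Ax-element leaves 0.5% readthrough.
print(round(percent_reduction(WILD_TYPE, 0.5)))   # 77

# Implied absolute levels for the 40% and 70% reductions.
print(round(remaining_after(WILD_TYPE, 40), 2))   # 1.32
print(round(remaining_after(WILD_TYPE, 70), 2))   # 0.66
```

A drop from 2.2% to 0.5% corresponds to a reduction of about 77%, in line with the "almost 80%" stated in the text.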

While we do not yet understand the structural mechanism by which SL-Ax3 influences readthrough, our understanding of the long-range interaction between SL-Ax2 and SL-Au1 allows us to test for its contribution. We thus created mutants in which the C82CC involved in the long-range interaction in SL-Au1 was sequestered, either by changing these residues to Gs along with mutating G61 to C, or by mutating the opposing C62C residues to guanosines. Both of these give rise to three contiguous G-C base pairs, which should preclude inter-domain interaction of this bulge, and led to appreciable decreases in readthrough of ~40% and ~50%, respectively (Fig. 4c). Similarly, disruption of the docking G609GG motif in SL-Ax2 by replacing the pentaloop with a GAAA tetraloop led to an equivalent 40% decrease in readthrough activity (Fig. 4c).

Next, we wanted to test the contributions of protein binding motifs to recoding efficiencies. We first tested protein binding efficiency of the previously published dinucleotide A587A to U587U mutation in SL-Ax1 11. ITC experiments on an SL-Ax1 A587A:UU mutant yielded a ~1.5-fold decrease in binding when compared to wild-type SL-Ax1 (Extended Data Fig. 7b). This correlates with an observed ~50% reduction in readthrough levels (Fig. 4c). Similarly, for experiments on an SL-Au1 A92A:UU mutant, we observed a ~2-fold increase in Kd, or 2-fold decrease in binding affinity (Extended Data Fig. 7c), which correlates with a ~40% decrease in readthrough levels for this mutation (Fig. 4c), suggesting a strong interplay between hnRNP A2/B1 binding and readthrough of the VEGF-Ax system. Interestingly, a double mutant in which either single mutant (Fig. 4c). This epistatic behavior is a strong indication of direct influence of the two binding events on local RNA structure, and consequently their long-range interactions.
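One way to reason about epistasis between the two binding-site mutations is to compare a measured double mutant against the reduction expected if the single-mutant effects were independent. The sketch below computes that expectation with a multiplicative null model on the remaining activity. This null model is an illustrative assumption and not the authors' stated analysis, and the function name is hypothetical.

```python
def expected_double_mutant(reductions_pct):
    """Expected % reduction if single-mutant effects combine independently.

    Independence is modelled multiplicatively on the remaining activity;
    this null model is an illustrative assumption, not the authors' analysis.
    """
    remaining = 1.0
    for reduction in reductions_pct:
        remaining *= 1.0 - reduction / 100.0
    return 100.0 * (1.0 - remaining)


# Approximate single-mutant reductions reported in the text:
# ~50% for SL-Ax1 A587A:UU and ~40% for SL-Au1 A92A:UU.
print(round(expected_double_mutant([50, 40])))  # 70
```

A measured double-mutant reduction that departs clearly from this value would suggest that the two binding events do not act independently.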

Altogether, our study allows us to start putting together a mechanism by which readthrough frequency, and hence the relative proportions of VEGF-A (angiogenic) to VEGF-Ax (less angiogenic) isoforms, may be maintained (Fig. 5). Our model suggests that regulation

(Thermo Scientific) and allowed to bind for 2 hours on a rocker at 4°C. Following binding, beads were washed twice with 50 mL ice cold hnRNP wash buffer (50 mM Phosphate pH 7.5, 500 mM NaCl, 1 mM DTT). Subsequently, protein-bound beads were loaded onto a column and washed overnight with 2 L hnRNP wash buffer using a pump at a flow rate of 1.8 mL/minute at 4°C.
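As a quick consistency check on the overnight column wash in the purification protocol, the duration implied by the stated buffer volume and flow rate can be computed directly. This is a minimal sketch, and the function name is illustrative.

```python
def wash_duration_hours(volume_l, flow_ml_per_min):
    """Time in hours needed to pump a buffer volume at a constant flow rate."""
    return volume_l * 1000.0 / flow_ml_per_min / 60.0


# 2 L of wash buffer at 1.8 mL/minute, as stated in the protocol.
print(f"{wash_duration_hours(2, 1.8):.1f} h")  # 18.5 h
```

A duration of roughly 18.5 hours is consistent with the description of the wash as overnight.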


For truncated products, the maximal extent of the VEGF-A mRNA is indicated underneath the schematic.

The complete imino proton NMR assignments (280 K) for SL-Au1 in the 1c conformation were obtained. Imino assignments are colored to match the schematic of the secondary structure, which was subdivided into helices 1a (blue), 1b (red), 1c (grey), and 1d (green).

Figure 1
Error bars indicate standard error (n=3) and statistical tests were performed using a 2-tailed t-test (* p < 0.05, ** p < 0.005). (c) The secondary structure reveals an extended stem-loop for SL-Au1, including an alternate conformation (bases shown in orange and blue for simplicity), as well as shorter stem-loops SL-Ax1-3 within the Ax-element. hnRNP binding sites are indicated in purple, stop codons are shown in red, and bases involved in the long-range interaction are shown in green. (d) ITC data show that SL-Au1 is able to specifically interact with SL-Ax2 (black squares), but not with SL-Ax3 (white squares). Tertiary folds are shown for SL-Au1 (PDB 7KUB) (e), SL-Ax1 (PDB 7KUC) (f), and SL-Ax2 (PDB 7KUD) (g) with the same base coloring as in (c). (h) A model was created showing the docking interaction between SL-Au1 and SL-Ax2.

Figure 2
Structural features of SL-Au1. (a) Molecular detail of the C72CA triloop that caps SL-Au1. (b) Helix 1c of SL-Au1 contains 2 tandem C-C non-canonical base pairs. (c) SL-Au1 is able to form a triple base-pair interaction between C89 and the A92-U54 base pair. (d) Helix 1c' assumes a kinked conformation and is flanked by an A56CCA bulge at the junction with helix 1b (e) and an A81CCCA bulge at the junction with helix 1d (f). (g) The density, as determined by SAXS, of free SL-Au1 in the 1c' conformation (blue) shows the bend described above. It also overlays with a model envelope (grey) of the SL-Au1:SL-Ax2 long-range interaction complex. Base coloring is as in Figure 1.

Figure 3
ITC binding studies of RRM1/2 to SL-Au1 (left) or SL-Ax1 (right) show tight and specific binding. Representative curves of 2 trials are shown. (c) Rabbit reticulocyte lysate assays to which either full length hnRNP A2/B1 (dark grey) mRNA or RRM1/2 (light grey) was added show no difference in readthrough activity for an MLV control construct (top) or a VEGF-A construct (bottom). Error bars indicate standard error (n=3) and statistical tests were performed using a 2-tailed t-test (* p < 0.05). (d) Titration of RRM1/2 (black, 0 molar equivalents; blue, 0.3 molar equivalents; pink, 1 molar equivalent) into SL-Au1 at 280 K shows the shifting of several 1H-1H NMR peaks of base pairs disrupted by protein binding. (e) Titration of RRM1 into SL-Ax1 at 311 K results in 1H-1H NOE cross-peaks between G590 and aromatic ring hydrogens of the protein.

Figure 4
readthrough. (c) Each tested mutation led to a decrease in readthrough levels, highlighting the importance of each region in mediating ribosomal readthrough. Error bars indicate standard error (n=3) and statistical tests were performed using a 2-tailed t-test (* p < 0.05, ** p < 0.005).

Figure 5
RNA alternate conformations mechanism of VEGF-A readthrough. A mechanistic model of the VEGF-A coding mRNA shows the formation of a linear SL-Au1 element at the 5' end of the RNA and three stem loops (SL-Ax1-3) at the 3' end of the RNA in the absence of hnRNP A2/B1. In the presence of hnRNP A2/B1, SL-Au1 undergoes a register shift, exposing three cytosine residues. These three cytosine residues are able to partake in long-range Watson-Crick base pairing with SL-Ax2, thereby promoting translational stop-codon readthrough.