Structures of SL2 and SL3 RNA elements. Previous studies13–19 have established the secondary structures of the SARS-CoV-2 SL2 and SL3 RNA elements and their ability to fold independently. As shown in Figs. 1B and C, both SL2 and SL3 adopt a hairpin structure with a flanking tail at the 5’- and/or 3’-end. We then employed the IsRNA2 model31 to predict the 3D structures of the SL2 and SL3 RNA elements. Our previous work indicated that the coarse-grained IsRNA2 model enables de novo modeling of RNA 3D structures with performance comparable to that of atomic models but at much lower computational cost31. For the SL2 element, the predicted 3D structure adopts nearly the same global fold as the NMR solution structure of SARS-CoV SL253 (PDB id: 2l6i), which has the same sequence as SARS-CoV-2 SL2, and the heavy-atom RMSD between the two structures is 2.2 Å (see Fig. 1B). This agreement further supports the capability of the IsRNA2 model in RNA 3D structure prediction. For the SL3 segment, the predicted 3D structure indicates that five nucleotides (nts) in the hairpin loop (A68, A69, A70, C71, and G72) tend to form consecutive base stacks, while the 5-nt terminal loop at the 3’-end is somewhat flexible. Recently, the 3D structure of the SL3 RNA element was also predicted by FARFAR254 and all-atom MD simulation55. Our IsRNA2-predicted structure shares a similar fold with the FARFAR2 prediction, with a heavy-atom RMSD of 1.4 Å, but differs slightly from the all-atom MD prediction in the hairpin loop region (see Figure S3).
TIA1 RRM23 mainly binds to SL3 RNA element. RRMs are the most abundant RNA binding motif and are well known for their ability to bind single-stranded oligonucleotide stretches56 (linear stretches or loop regions), most commonly 3 to 5 nt long. The RRMs of a single protein can contribute differently to the overall RNA binding, in terms of both affinity and specificity. For the TIA1 protein, early experiments22,24 indicated that its preferred targets are U-rich sequences and that this preference is predominantly directed by RRM2. RRM1 is thought to have little intrinsic RNA binding affinity and to contribute only trivially to RNA binding in the context of RRM1,2,3, while RRM2,3 may bind cooperatively to pyrimidine-rich RNA sequences23. To explore the interactions between the SARS-CoV-2 SL2/3 RNA elements and the human TIA1 protein in detail, we purified recombinant TIA1 (Figure S4) and performed EMSA experiments. As expected, compared with the TIA1 RRM1-3 protein (1-274aa), the TIA1 RRM23 truncation (93-274aa, Fig. 1A) shows similar binding ability with both the SL2 + SL3 and U1123 (positive control) RNA sequences (see Figs. 1D and S5). In contrast, the negative control oligoC (C21) RNA segment displays only a trivial binding ability to TIA1 (Figure S6), which agrees with the previous study26. For simplicity, we focus on the TIA1 RRM23 truncation in the following sections and refer to it as the TIA1 protein.
To further dissect the major binding site of TIA1, we measured the dissociation constants (Kd) of TIA1 with SL2 + SL3, SL2 alone, and SL3 alone from EMSA experiments. The Kd of TIA1 with full SL2 + SL3 is 0.43 ± 0.05 µM, which is consistent with published Kd values of TIA1 with other oligonucleotide substrates23,24 (Figs. 1D and S7). For the individual SL RNA elements, the Kd of TIA1 with SL3 is slightly higher (Kd = 0.78 ± 0.09 µM), whereas that with SL2 is much higher, at 2.47 ± 0.37 µM (Figs. 1E and F). In other words, SL2 + SL3 and the isolated SL3 element bind TIA1 with comparable affinity, while SL2 binds markedly more weakly than either. The differences in sequence and in hairpin loop length (5-nt in SL2 vs. 7-nt in SL3, Figs. 1B and 1C) may account for the weaker binding of SL2 relative to SL3. Overall, the SARS-CoV-2 SL3 RNA element serves as the major binding site for the human TIA1 protein.
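As an illustration of how a Kd can be extracted from EMSA band quantification, the sketch below fits a one-site binding isotherm, fraction bound = [P] / (Kd + [P]), by least squares. This is a minimal sketch under stated assumptions, not the fitting procedure used in this work: the one-site model, the protein concentration series, and the function names `fraction_bound` and `fit_kd` are all hypothetical, and the "measured" fractions are synthetic, noise-free values generated from the reported SL3 Kd of 0.78 µM.

```python
import math

def fraction_bound(protein_uM, kd_uM):
    """One-site isotherm; assumes RNA concentration is far below Kd."""
    return protein_uM / (kd_uM + protein_uM)

def fit_kd(protein_uM, bound, lo=1e-3, hi=1e3, iters=200):
    """Least-squares Kd estimate via golden-section search on log10(Kd)."""
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((fraction_bound(p, kd) - f) ** 2
                   for p, f in zip(protein_uM, bound))

    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)

# Hypothetical titration series (µM); not the concentrations used in the paper.
protein = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
# Synthetic, noise-free data generated from the reported SL3 Kd (0.78 µM).
observed = [fraction_bound(p, 0.78) for p in protein]

kd_est = fit_kd(protein, observed)
print(f"fitted Kd = {kd_est:.3f} uM")
```

With real gel data, the fractions would carry noise and replicate measurements would be needed to obtain an uncertainty such as the ± values quoted above.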
Both hairpin and 3’-terminal loops of SL3 element are essential for TIA1 binding. To determine the binding mode of the SL3 RNA element with the human TIA1 protein, we measured the binding affinities of various mutated and truncated SL3 variants. The SARS-CoV-2 SL3 RNA element folds into a stem-loop structure that contains a 7-nt hairpin loop (HP), a stem consisting of 4 base pairs, and a 5-nt 3’-terminal loop (TL) (Fig. 2A). Since an RRM commonly recognizes 3- to 5-nt stretches of single-stranded oligonucleotide, the roles of the hairpin and terminal loops in TIA1 binding were evaluated individually. First, when the 7-nt hairpin loop was truncated to a 3-nt loop (named SL3-HP3, Fig. 2B), the binding affinity of TIA1 decreased significantly (Kd = 3.87 ± 1.06 µM), demonstrating that the hairpin loop is required for TIA1 binding. Moreover, the 7-nt hairpin loop was mutated to all cytosines (SL3-C7) or all uridines (SL3-U7) to check the sequence preference. EMSA results showed that, relative to wild-type SL3, the SL3-C7 variant has a lower binding affinity for TIA1 (Kd = 1.95 ± 0.44 µM) whereas the SL3-U7 variant has a higher one (Kd = 0.54 ± 0.03 µM) (Fig. 1F, 2C and 2D). This result is consistent with the binding character of TIA1, which prefers U-rich elements22–24 and disfavors an all-C loop26 (Figure S6).
Subsequently, the role of the 5-nt 3’-terminal loop in TIA1 binding was assessed. As shown in Fig. 2E, deletion of that terminal loop from SL3 (SL3-TL1) clearly decreases the binding affinity with TIA1 (Kd = 2.86 ± 0.55 µM), indicating that the 5-nt terminal loop is also essential for TIA1 binding. Considering that three successive uridines (U76, U77, and U78) are present in the terminal loop and that RRM2 has a strong U-rich preference22–24, we speculated that SL3 interacts with TIA1 RRM2 through those three uridines. To test this assumption, we replaced those three Us with three Cs (SL3-TL-C3) or three Gs (SL3-TL-G3) and repeated the measurements of their binding ability with TIA1. As expected, both the SL3-TL-C3 and SL3-TL-G3 variants have a lower binding affinity than WT SL3 (Fig. 1F, 2F and 2G). Taken together, both the 7-nt hairpin loop and the 5-nt 3’-terminal loop of the SL3 RNA element interact with the TIA1 protein, and the putative binding mode may be that the hairpin loop binds RRM3 while the 3’-terminal loop interacts with RRM2.
In addition, the impact of stem stability on binding was explored. The stem of SL3 consists of two middle A-U base pairs and two terminal G-C base pairs. Because a G-C base pair is more stable than an A-U one, substituting those two A-U base pairs with G-C ones (SL3-GC, see Fig. 2H) should increase the stability of SL3. Indeed, the folding free energy change predicted by RNAStructure57 becomes more favorable, going from −2.1 kcal/mol (SL3) to −6.6 kcal/mol (SL3-GC). Intriguingly, the higher stability of the stem inhibits TIA1 binding and results in a weaker binding affinity (Kd = 5.65 ± 1.70 µM, Fig. 2H). Therefore, a relatively loose stem structure is required for the SL3 RNA element to bind the TIA1 protein, probably to facilitate the RNA structural rearrangement that accompanies TIA1 binding to both sites (hairpin and 3’-terminal loops) of SL3.
Putative 3D model for SL3 and TIA1 complex. Based on the above knowledge, a 3D binding model for the SARS-CoV-2 SL3 RNA element with the human TIA1 protein was constructed computationally through a template-based approach and MD simulations. The entire computational modeling procedure contains four steps, and the details are given in the Method and Materials section. Apart from the most probable conformation displayed in Fig. 3, other possible 3D models of the binding complex are displayed in Figure S8. The stability of the putative 3D binding model was validated by three independent 1-µs MD simulations, in which stable heavy-atom RMSD values (~2.2 Å) are observed throughout all three simulations (Figure S2). Furthermore, binding interfaces similar to those in the selected templates24,45 remained intact after the long MD simulations (Figure S9). In contrast, the binding model for the all-C loop variant of SL3, which has a much weaker binding ability to the TIA1 protein26, is largely unstable under the same simulation conditions (see Figure S10). In agreement with the previous study24, SL3 binding induces a compact domain arrangement for the TIA1 protein, which is highly flexible in its apo state23, and the RRM2 and RRM3 domains cooperate in binding to the SL3 RNA element (Fig. 3A). For the SL3 element, compared to the free state (Fig. 1C), binding to TIA1 markedly stretches both the hairpin and 3’-terminal loops, and the heavy-atom RMSD between the predicted free and bound structures is 6.0 Å (see Figure S11). Thus, SL3 binding by TIA1 causes notable structural changes in both the protein domain arrangement and the RNA 3D structure.
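For readers less familiar with the heavy-atom RMSD values quoted above, the following minimal sketch shows the underlying arithmetic: the root of the mean squared distance between matched atoms. It assumes the two coordinate sets are already superposed and listed in the same atom order; the superposition step itself (for example, a Kabsch fit) is omitted, and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples.

    Assumes the structures are already superposed and that atoms
    are listed in the same order in both lists.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy coordinates in angstroms (hypothetical, not taken from any model).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
# Rigidly shift every atom by 2.2 angstroms along x.
shifted = [(x + 2.2, y, z) for (x, y, z) in reference]

print(f"RMSD = {rmsd(reference, shifted):.2f} A")
```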
Since the β-sheet surfaces of the TIA1 RRM2 and RRM3 serve as the classical oligonucleotide binding interface58, many positively charged residues, such as Arg125, Lys136, and Arg167 in RRM2 and Arg233, Lys238, and Lys274 in RRM3, reside on the sheet surfaces to accommodate the negatively charged RNA backbone. For the base moieties in the binding interface, the main interactions include aromatic stacking, specific hydrogen bonds (H-bonds), and hydrophobic interactions (Fig. 3). Specifically, four conserved aromatic residues (Phe98 and Phe140 in RRM2 and Tyr206 and Phe242 in RRM324) located in the RNP-2 and RNP-1 sub-motifs of the RRMs interact directly with RNA bases U77, U78, U67, and A68, respectively, through π–π stacking. Additionally, U67 forms two H-bonds via its N3-H3 and O2 atoms with Trp272 and Lys274, respectively (see Fig. 3B), and an acyclic stacking interaction between U67 and Lys274 is also present. For the other nucleotides in the hairpin loop region, A68 and A69 make extensive vdW interactions with residues from TIA1 RRM3 (Figs. 3C and D). In the 3’-terminal loop, U76 forms three H-bonds through atoms O2, N3-H3, and O4 (Fig. 3E), U77 forms two H-bonds via atoms N3-H3 and O4 (Fig. 3F), and U78 forms one H-bond through atom O4 (Fig. 3G). It should be noted that the N3-H3 and O4 atoms involved in the above H-bonds are unique to uracil and cannot be replaced by other RNA bases, which may help explain the U-rich preference of TIA1 binding.
Influence of single nucleotide mutations on SL3 binding. The putative 3D structure of the SL3 and TIA1 binding complex proposed above suggests that six nucleotides (U67, A68, A69, U76, U77, and U78) play important roles in directing SL3 binding to the TIA1 protein. Therefore, exhaustive mutations were introduced at each of those six nucleotides, and the relative binding free energy changes (\(\varDelta \varDelta {G}_{calc}\)) were estimated through FEP calculations. FEP is currently regarded as the most rigorous and reliable method for estimating binding affinity changes, and it has achieved high accuracy, as compared with experiments, in characterizing vital residues and their mutational effects for many protein-protein, protein-ligand, and protein-DNA bindings47–49,59–61. As shown in Fig. 4A, mutations at the U67 site are all adverse, e.g., \(\varDelta \varDelta {G}_{calc}=3.17\pm 0.48\) kcal/mol for U67C and \(\varDelta \varDelta {G}_{calc}=2.82\pm 0.50\) kcal/mol for U67G, and are predicted to decrease the binding affinity. Likewise, mutations of nucleotides U76, U77, and U78 are all unfavorable, especially for U77, where the least binding free energy change is \(\varDelta \varDelta {G}_{calc}=4.97\pm 1.00\) kcal/mol (U77G mutation). The uracil-specific H-bonds described above may partially account for these unfavorable changes, including U67@N3-H3…Trp272@O (Fig. 3B), U76@N3-H3…Asp101@Oδ1 (Fig. 3E), Asn169@Nδ2-Hδ22…U77@O4 (Fig. 3F), and so on. These results reaffirm the U-rich binding preference of the TIA1 protein. On the other hand, the impacts of mutations at the A68 and A69 sites are diverse. Although mutations A68G, A69C, and A69G are predicted to be adverse (\(\varDelta \varDelta {G}_{calc}\ge 1.34\) kcal/mol), A68C (\(\varDelta \varDelta {G}_{calc}=-0.17\pm 0.47\) kcal/mol) and A69U (\(\varDelta \varDelta {G}_{calc}=0.17\pm 1.08\) kcal/mol) have negligible influence on binding affinity.
Furthermore, the binding free energy change for the A68U mutation is \(\varDelta \varDelta {G}_{calc}=-1.64\pm 0.68\) kcal/mol, which indicates an enhanced binding affinity between the SL3 A68U variant and the TIA1 protein.
To further validate the influence of single nucleotide mutations indicated by the FEP approach, the binding affinities (Kd) for some representative mutations were measured by EMSA experiments (see Figure S12). For comparison, the experimental binding free energy changes are derived as \(\varDelta \varDelta {G}_{exp}=-{k}_{B}T\ln\left({K}_{d}^{wt}/{K}_{d}^{mut}\right)\), where \({K}_{d}^{wt}\) and \({K}_{d}^{mut}\) are the dissociation constants for the wild-type and mutated SL3 elements, respectively, \({k}_{B}\) is the Boltzmann constant, and \(T=300\) K. For nine selected mutations (at least one for each of the aforementioned six key sites for SL3 binding), a strong correlation with Pearson coefficient \(R=0.86\) is observed between the experimental (\(\varDelta \varDelta {G}_{exp}\)) and calculated (\(\varDelta \varDelta {G}_{calc}\)) binding free energy changes (see Fig. 4B). The magnitude of the binding affinity changes is generally larger in the FEP calculations, probably owing to imperfect force field parameters. Nevertheless, this agreement indicates that the six selected nucleotides (U67, A68, A69, U76, U77, and U78) are indeed important for SL3 binding by the TIA1 protein and indirectly supports the reliability of the 3D binding model proposed in Fig. 3.
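The conversion from dissociation constants to an experimental binding free energy change can be sketched as follows. This is a minimal illustration of the expression for \(\varDelta \varDelta {G}_{exp}\) given above, using T = 300 K and Kd values quoted elsewhere in the text (wild-type SL3, the A68U mutant, and the SL3-U7 and SL3-C7 hairpin variants); the function name `ddg_exp` is hypothetical, and the constant is the Boltzmann constant expressed per mole (the molar gas constant) so that the result is in kcal/mol. Applying the formula to the hairpin variants is an illustration only, since the text uses it for single nucleotide mutations.

```python
import math

# Boltzmann constant per mole (gas constant), in kcal/(mol*K).
K_B_KCAL = 0.0019872041
TEMPERATURE_K = 300.0  # temperature stated in the text

def ddg_exp(kd_wt, kd_mut, temperature=TEMPERATURE_K):
    """ddG_exp = -kB*T*ln(Kd_wt / Kd_mut); positive means weaker binding."""
    return -K_B_KCAL * temperature * math.log(kd_wt / kd_mut)

KD_WT_UM = 0.78  # wild-type SL3, µM

# Kd values (µM) quoted in the text for selected variants.
kd_variants_uM = {
    "A68U": 0.46,
    "SL3-U7": 0.54,
    "SL3-C7": 1.95,
}

for name, kd in kd_variants_uM.items():
    print(f"{name}: ddG_exp = {ddg_exp(KD_WT_UM, kd):+.2f} kcal/mol")
```

Only the ratio of the two Kd values enters the formula, so the concentration unit cancels as long as both constants use the same one.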
In particular, both the FEP calculation and the subsequent experiment indicate that the SL3 A68U variant has a higher binding affinity for the TIA1 protein (Kd = 0.46 ± 0.05 µM) than the wild type (Kd = 0.78 ± 0.09 µM). Inspection of the binding conformations before and after the FEP calculations reveals that an extra H-bond (Arg233@Nη1-Hη11…U68@O2) is formed between the SL3 A68U variant and TIA1 RRM3 (Fig. 4C), which is in good agreement with the calculated binding free energy change \(\varDelta \varDelta {G}_{calc}=-1.64\pm 0.68\) kcal/mol. Additional MD simulations confirmed that this newly formed H-bond is very stable under the current simulation conditions (see Figure S13). Considering the potentially vital function of SL3-TIA1 binding in SARS-CoV-2 replication, A68U can be treated as a possible variant of concern for COVID-19.
Interactions between TIA1 protein and SL3 RNA elements are common for betacoronavirus genomes. Apart from SARS-CoV-2, other members of the betacoronaviruses also cause illness in humans and animals, including the SARS coronavirus that caused the SARS outbreak in 200362 and the Middle East respiratory syndrome (MERS) coronavirus that triggered the MERS outbreak in 201263. In addition to the identical SL3 shared by the SARS-CoV-2 and SARS genomes, multiple sequence alignment4,11,12 indicates that SL3 RNA elements are well conserved among different species within the genus Betacoronavirus (Fig. 5A). More specifically, the two binding cores for the TIA1 protein identified in the SARS-CoV-2 SL3 RNA element are highly conserved, namely, the 5’-U[A/U]A-3’ and 5’-UU[U/A]-3’ segments located before and after the transcriptional regulatory sequence64 (TRS), respectively. We therefore hypothesized that the TIA1 protein could interact with other SL3 RNA elements within betacoronavirus genomes. To test this hypothesis, the binding abilities of two representative SL3 RNA elements from other betacoronaviruses to the human TIA1 protein were studied by EMSA experiments. As expected, high binding affinities, with Kd = 0.41 ± 0.02 µM (Rousettus bat coronavirus HKU9) and Kd = 0.30 ± 0.02 µM (MERS), were observed for those two SL3 RNA elements (Figs. 5B and C). Furthermore, consistent with the higher binding affinity of the A68U SL3 variant of SARS-CoV-2 (Fig. 4), six of the ten betacoronavirus genomes considered have a uridine at the corresponding position and only three have an adenine, which results in the 5’-U[U/A]A-3’ binding motif preceding the TRS (Fig. 5A). Overall, interactions between the SL3 RNA elements of betacoronavirus genomes and the human TIA1 protein are common. We speculate that this viral RNA-host protein interaction plays an indispensable role in the life cycle of betacoronaviruses.
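The two conserved binding cores can be located in an RNA sequence with simple pattern matching, as sketched below. This is an illustrative sketch rather than the alignment procedure used in this work. The test segment covers nucleotides 67 to 78 and uses only the identities stated in the text (U67, A68, A69, A70, C71, G72, U76, U77, U78); positions 73 to 75 are not named in this section and are left as N placeholders. The names `MOTIF_5P`, `MOTIF_3P`, and `find_motif` are hypothetical.

```python
import re

# Binding cores quoted in the text, written as regular expressions.
MOTIF_5P = re.compile(r"U[AU]A")  # 5'-U[A/U]A-3', before the TRS
MOTIF_3P = re.compile(r"UU[UA]")  # 5'-UU[U/A]-3', after the TRS

def find_motif(pattern, rna, first_position=1):
    """Return start positions of all non-overlapping matches.

    Positions are numbered so that the first character of `rna`
    has index `first_position`.
    """
    return [m.start() + first_position for m in pattern.finditer(rna.upper())]

# Nucleotides 67-78 of SARS-CoV-2 SL3 as far as stated in the text;
# positions 73-75 are not given here and are kept as 'N'.
SEGMENT_67_78 = "UAAACGNNNUUU"
# Same segment carrying the A68U substitution discussed above.
SEGMENT_A68U = "UUAACGNNNUUU"

print("wild type 5' core starts at:", find_motif(MOTIF_5P, SEGMENT_67_78, 67))
print("wild type 3' core starts at:", find_motif(MOTIF_3P, SEGMENT_67_78, 67))
print("A68U 5' core starts at:", find_motif(MOTIF_5P, SEGMENT_A68U, 67))
```

Note that the A68U segment also begins with UUA, which satisfies the second pattern as well, so a genome-wide scan would need the position relative to the TRS to tell the two cores apart.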