Design of the platform for the large-scale analysis of small molecule–RNA interactions
Regarding the first RNA structure library for the analysis (Library-1), we designed 1824 RNA structural motifs by extracting the terminal loops of human pre-miRNAs and adding several repetitive and control sequences.30 Five different barcodes were allocated to each motif structure so that outliers arising from non-specific binding to the barcode sequences could be excluded. The small molecule was then immobilized onto beads via biotin–streptavidin interactions (Fig. 1a). We performed the pull-down by mixing the RNA structure library with the immobilized small molecule, followed by washing and elution steps to collect the bound RNAs. The pulled-down RNAs were quantified on a DNA barcode microarray to obtain the fluorescence intensity of each RNA structure. After background subtraction using streptavidin control samples without a conjugated ligand, these fluorescence intensities correlate with binding affinities.30
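As an illustration of how five barcode readings per motif can be collapsed into one outlier-resistant signal, the following minimal sketch subtracts the no-ligand control barcode by barcode and then takes the median. The text does not state which aggregation rule was used, so the median, the function name `motif_signal`, and all intensity values are assumptions made for illustration only.

```python
from statistics import median


def motif_signal(ligand_intensities, control_intensities):
    """Combine the barcode replicates of one motif into a single signal.

    ligand_intensities:  fluorescence of each barcode in the ligand pull-down
    control_intensities: fluorescence of the same barcodes in the
                         no-ligand streptavidin control
    """
    if len(ligand_intensities) != len(control_intensities):
        raise ValueError("each barcode needs a ligand and a control reading")
    # Background subtraction, barcode by barcode.
    corrected = [lig - ctl
                 for lig, ctl in zip(ligand_intensities, control_intensities)]
    # Assumption: the median is used so that a single barcode that binds
    # non-specifically cannot dominate the motif-level signal.
    return median(corrected)


# Five hypothetical barcodes for one motif; the fourth behaves as an outlier.
ligand = [1200.0, 1150.0, 1300.0, 5200.0, 1250.0]
control = [200.0, 150.0, 250.0, 300.0, 200.0]
signal = motif_signal(ligand, control)
```

With these invented readings the outlying fourth barcode has no influence on the motif-level value, which is the behavior the five-barcode design is meant to provide.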
In this study, we selected G-clamp and thiazole orange (TO) derivatives as the binding molecules (Fig. 1). G-clamp can recognize an unpaired guanine base in RNA loop structures by forming four hydrogen bonds (Fig. 1b).31–33 G-clamp was used to validate our system because it binds strongly to a wide range of RNAs. The TO derivatives TO-PRO-1 and TO-PRO-3, in contrast, are known as fluorescent light-up probes for imaging and fluorescent indicator displacement (FID) assays (Fig. 1c).34–38 FID is a high-throughput method for identifying novel RNA-binding molecules.39–45 For example, TO-PRO-3, a deep-red fluorescent indicator, was used in an FID assay to screen for compounds that bind to the bacterial A-site, influenza A virus RNA, and G4 DNA.37,38,46 However, information on which RNA sequences these fluorescent indicators bind is still limited. We reasoned that determining the RNA binding profiles of such conventionally used indicators would expand the repertoire of target RNA sequences that can be used in FID assays. Based on the structure of TO-PRO-1, we designed the N3-modified derivatives TO–N3 and TO–N3-2, which differ in linker position (Fig. 1d). Similarly, we designed TO-3–N3 and TO-3–N3-2. The N3-modified G-clamp–N3, TO–N3, and TO-3–N3 were synthesized using N3–PEG3–NH2 as an N3 linker after preparing the carboxylic acid intermediates (Schemes S1–S3), whereas TO–N3-2 and TO-3–N3-2 were synthesized using N3–PEG4–NHS ester as an N3 linker after preparing the amine intermediates (Schemes S4 and S5).47,48 These N3-modified molecules were conjugated to biotin via a strain-promoted azide–alkyne cycloaddition (SPAAC) with DBCO–biotin (Figs. 1a, S1, and S2)49,50 and used for the experiments without further purification.
Large-scale analysis of the interaction of G-clamp–N3 with Library-1
First, we ranked the RNA motifs from Library-1 based on their G-clamp binding (ranking list S1). To understand the binding properties of G-clamp, the numbers of bases in the single-stranded (ss) and double-stranded (ds) RNA regions were investigated using the predicted secondary structures of the pre-miRNA loops (Fig. 2). Regarding ssRNA, the G count of high-ranking RNAs (1–360) was significantly higher than that of all the pre-miRNAs in Library-1. In contrast, the G count of the low-ranking RNAs (1441–1800) was significantly lower than that of all the examined pre-miRNAs. The C counts showed the opposite trend: those of the high- and low-ranking RNAs were lower and higher, respectively, than those of all the pre-miRNAs in Library-1. The U count of the high-ranking RNAs was lower than that of all the pre-miRNAs, and the A count of ssRNA was not significantly different among the rank sections. Regarding dsRNA, the four bases exhibited smaller differences among the ranks than in ssRNA. The C and U counts were inversely related to the G count, as C and U in the ssRNA region can form base pairs with the neighboring G bases. Furthermore, the percentage of unpaired G bases highlighted a selectivity for unpaired G (Figure S3). Five or more unpaired Gs were mainly observed in high-ranking RNAs (1–180), and the percentage decreased gradually as the rank decreased. In contrast, few RNAs with no or only a single unpaired G were observed in the high-ranking group, and the percentage gradually increased as the rank decreased. These results are consistent with the fact that G-clamp mostly recognizes G bases in ssRNA regions.32
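The ss/ds base counts described above can be derived from a sequence and its predicted secondary structure. The sketch below assumes the structure is available in dot-bracket notation ("." for unpaired, "(" and ")" for paired positions); the function name and the example hairpin are hypothetical, and the structure-prediction step itself is not shown.

```python
from collections import Counter


def count_bases_by_pairing(sequence, dot_bracket):
    """Count A/C/G/U separately in unpaired (ss) and paired (ds) positions."""
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    unpaired = Counter()
    paired = Counter()
    for base, symbol in zip(sequence.upper(), dot_bracket):
        if symbol == ".":
            unpaired[base] += 1
        elif symbol in "()":
            paired[base] += 1
        else:
            raise ValueError(f"unexpected structure symbol: {symbol!r}")
    return unpaired, paired


# Hypothetical hairpin: a three-base-pair stem closing a five-nucleotide loop.
seq = "GCGAUGGUCGC"
struct = "(((.....)))"
ss, ds = count_bases_by_pairing(seq, struct)
unpaired_g = ss["G"]  # number of unpaired G bases in this motif
```

Applying such a count to every motif and comparing rank sections against the whole library gives the kind of per-base comparison summarized in the text.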
Next, to validate our screening platform for RNA structures, we selected 17 sequences from the high-affinity (top 100), intermediate-affinity (101–1000), and low-affinity (1001–1824) groups and measured their apparent dissociation constants (KDapp) by fluorescence titration (Figure S4). The RNA motifs with three base pairs of a common stem (5′-AGC-motif-GCU-3′) were used to measure KDapp. A histogram of Z-scores and the correlation between the Z-scores and KDapp values are shown in Figs. 3a and 3b and Table S1. The minimum free energy structures of the selected RNAs are shown in Figs. 3c and S5. The rank 1 and rank 2 RNAs (Fig. 3c, top) contained unpaired guanine bases in their loop structures and exhibited strong G-clamp binding (KDapp = 0.024 and 0.022 µM, respectively). For the rank 1 RNA (hsa-mir-4520-1 loop), we performed a G mutation assay using two G-mutated hsa-mir-4520-1 loops (mir-4520-1-mutG2A and -mutG7A). Although mutG2A exhibited strong binding (KDapp = 0.011 µM) similar to the wild type, mutG7A exhibited weaker binding (KDapp = 15 µM). The double mutant mutG2,7A also exhibited weaker binding (KDapp = 3.7 µM) than the wild type, indicating that G7 contributes to the strong interaction with G-clamp. To examine the selectivity for G7, molecular modeling of the complex between mir-4520-1 and G-clamp–N3 was performed using RNAComposer51,52 and MacroModel (Fig. 3d). When G-clamp is bound to G7 by four hydrogen bonds, it can interact with neighboring bases. We considered that these interactions, such as stacking with the CG base pair at the top of the stem, would facilitate strong binding in addition to the formation of the four hydrogen bonds, indicating that G-clamp does not recognize every G in the loop but rather specific Gs. The high number of G bases in the ssRNA region of high-ranking RNAs probably increases the likelihood that a G base capable of binding G-clamp strongly is present.
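To make the notion of an apparent dissociation constant from a titration concrete, the following sketch fits a 1:1 binding model to a normalized titration curve by a simple grid search. The binding model, the assumption that RNA is titrated into a fixed ligand concentration, the normalization to fraction bound, and every number below are illustrative assumptions; the fitting procedure actually used for the KDapp values is not described in this section.

```python
import math


def fraction_bound(rna_total, ligand_total, kd):
    """Fraction of ligand in a 1:1 complex, accounting for ligand depletion."""
    b = rna_total + ligand_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * rna_total * ligand_total)) / 2.0
    return complex_conc / ligand_total


def fit_kd(rna_concs, observed_fractions, ligand_total, kd_grid):
    """Return the grid value of Kd with the smallest sum of squared errors."""
    def sse(kd):
        return sum((fraction_bound(r, ligand_total, kd) - f) ** 2
                   for r, f in zip(rna_concs, observed_fractions))
    return min(kd_grid, key=sse)


# Synthetic, noise-free titration generated with Kd = 1.0 (invented, in µM).
ligand_total = 0.1
rna_concs = [0.0, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
observed = [fraction_bound(r, ligand_total, 1.0) for r in rna_concs]

# Log-spaced candidate Kd values from 0.001 to 100 µM.
kd_grid = [10 ** (k / 20) for k in range(-60, 41)]
kd_app = fit_kd(rna_concs, observed, ligand_total, kd_grid)
```

A grid search is used here only to keep the example dependency-free; a nonlinear least-squares routine would normally replace it, and a curve that never approaches saturation within the tested range can only yield a lower bound such as "> 40 µM".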
In the high-affinity group, two of the selected RNA motifs contained the G4 structure. The KDapp values of the hsa-mir-6850 loop (rank 28) and G4_(GGGU)6 (rank 38) were 0.19 and 0.15 µM, respectively. In the intermediate-affinity group, even though hsa-mir-548ba (rank 522) has a loop similar to that of hsa-mir-4520-1, its KDapp value (10 µM) was much higher. Comparing the modeled structures of hsa-mir-4520-1 and hsa-mir-548ba (Figure S6) revealed that G-clamp–N3 cannot interact with adjacent bases when it forms hydrogen bonds with a G base on the loop structure of hsa-mir-548ba. In the low-affinity group, the loops without any G bases, such as hsa-mir-4773-1 (rank 1192), hsa-mir-4282 (rank 1775), and the common stem sequence with four Us in the hairpin loop, exhibited weak binding (KDapp > 40 µM; Figures S4 and S5). Within the group of selected RNAs, only (CUG)16 (rank 43) deviated from our expectations in the fluorescence titration experiment (Fig. 3b, green color). Overall, we observed a good correlation between the Z-scores and the measured KDapp values (Fig. 3b, Spearman’s correlation coefficient: −0.86); excluding (CUG)16 strengthened the correlation further (−0.95). The G4 structures, which are susceptible to bias when using sequencing-based methods, were also evaluated and ranked. These results indicate that our system for the large-scale analysis of RNA structure libraries can provide accurate assessments of small molecule–RNA interactions.
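For readers who wish to reproduce this kind of rank-based comparison, the sketch below computes Spearman's correlation coefficient as the Pearson correlation of average ranks. The Z-score and KDapp values in the example are invented and are not taken from Table S1.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: a higher Z-score should pair with a lower KDapp (µM).
z_scores = [3.1, 2.4, 1.0, 0.2, -0.5]
kd_values = [0.05, 0.5, 12.0, 6.0, 45.0]
rho = spearman(z_scores, kd_values)
```

A strongly negative coefficient is the expected outcome when the screen is accurate, because a high Z-score (strong pull-down) should coincide with a small dissociation constant.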
Large-scale analysis of the interaction of the thiazole orange derivatives with Library-2
Next, we investigated the binding of different RNA motifs to the TO derivatives using our second RNA structure library, Library-2 (ranking lists S2–S5). Library-2 contains 3000 RNA structural motifs that were designed by extracting the terminal loops of human pre-miRNAs, along with SARS-CoV-2 and influenza A virus RNAs and several repetitive and control sequences. Compared with the G-clamp binding profile, TO and TO-3 exhibited distinct profiles (Fig. 4a), although the binding profiles of the TO derivatives correlated significantly with one another (Fig. 4b). As expected, these data indicate that the TO derivatives share similar selectivities that differ from that of G-clamp. The correlation coefficient between TO–N3 and TO–N3-2, which have different linker positions (r = 0.78), was lower than that between TO–N3 and TO-3–N3, which share the same linker position (r = 0.91), suggesting that the linker position affects the binding profile (Fig. 4b). The high-affinity group of RNAs for the TO derivatives was mainly populated with G4 RNAs. The kernel density estimation of the Z-scores of the TO derivatives indicated significant enrichment of the G4 control RNAs (Figure S7).
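The profile comparison above relies on per-motif Z-scores and on correlation coefficients between ligands. The following sketch assumes the conventional definition of a Z-score (signal minus library mean, divided by the library standard deviation) and a Pearson coefficient between two profiles; the exact normalization is not spelled out in this section, and the two five-motif profiles are hypothetical.

```python
from statistics import mean, stdev


def z_scores(signals):
    """Standardize motif signals against the mean and SD of the library."""
    mu = mean(signals)
    sigma = stdev(signals)  # sample standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all signals are identical")
    return [(s - mu) / sigma for s in signals]


def pearson_r(x, y):
    """Pearson correlation coefficient between two binding profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# Hypothetical background-corrected signals of five motifs for two ligands.
profile_a = [1.0, 2.0, 3.0, 4.0, 5.0]
profile_b = [1.0, 3.0, 2.0, 5.0, 4.0]

z_a = z_scores(profile_a)
z_b = z_scores(profile_b)
r = pearson_r(z_a, z_b)
```

Because Z-scoring is a linear transformation, the Pearson coefficient is the same whether it is computed on raw signals or on Z-scores; the standardization matters mainly for ranking motifs on a common scale.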
To understand the binding properties of the TO derivatives, the numbers of bases in the ssRNA and dsRNA regions were quantified using the predicted secondary structures of the pre-miRNA loops, as in the G-clamp analysis (Fig. 4c). For ssRNA, the G count of the high-ranking RNAs (1–360) was significantly higher than that of all the pre-miRNAs in Library-2. In contrast, the ssRNA counts of the other bases were not significantly different among the different ranks. Regarding dsRNA, the G and C counts of the high-ranking RNAs (1–360), as well as the A and U counts of the low-ranking RNAs (1441–1800), were significantly higher than those of all the pre-miRNAs. The count tendencies of TO-3–N3 and TO–N3 were similar. Overall, these results suggest that the TO derivatives prefer G-rich ssRNA and G/C-rich rigid stem structures, such as hsa-mir-5091 and -4437 (Fig. 4d). Regarding ssRNA, we further examined the total number of nucleotides in the internal and hairpin loops (Fig. 4e). Although high-ranking RNAs exhibited more G and A bases in their internal loops, the hairpin loops of high-ranking RNAs showed a preference only for G. These results suggest that the TO derivatives prefer G/A bases in internal loops and G-rich hairpin loops. A likely explanation is that internal loops comprising G/A bases create a binding pocket that is well suited to intercalation, whereas G-rich hairpins may form G4-like structures. To confirm the preference of the TO derivatives for internal loops comprising G/A bases, we compared the KDapp values of hsa-mir-4437 and its internal loop (AGG to UCC) mutant, mir-4437-mut (Figs. 4d and S8).
Although the KDapp values of TO–N3 and TO-3–N3 for the wild-type hsa-mir-4437 loop were relatively low (4.4 and 11 µM, respectively), the KDapp values for mir-4437-mut were much higher (> 40 µM), suggesting that, at least for the hsa-mir-4437 loop, the G/A bases in the internal loop are crucial to the strong binding of the TO derivatives.
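The distinction between internal and hairpin loops used in this analysis can be recovered from a dot-bracket structure. The sketch below is one possible way to do so, assuming a single stem-loop motif: an unpaired run whose two flanking bases pair with each other is treated as a hairpin loop, any other enclosed run (including bulges) is counted as internal, and runs outside all base pairs are counted as external. The function names and the example motif are hypothetical.

```python
from collections import Counter


def pair_table(dot_bracket):
    """Map each paired position to the index of its partner."""
    partner = [None] * len(dot_bracket)
    stack = []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected structure symbol: {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def loop_base_counts(sequence, dot_bracket):
    """Count bases in hairpin, internal (incl. bulge), and external runs."""
    partner = pair_table(dot_bracket)
    counts = {"hairpin": Counter(), "internal": Counter(), "external": Counter()}
    n, depth, i = len(dot_bracket), 0, 0
    while i < n:
        symbol = dot_bracket[i]
        if symbol != ".":
            depth += 1 if symbol == "(" else -1
            i += 1
            continue
        j = i
        while j + 1 < n and dot_bracket[j + 1] == ".":
            j += 1
        if depth == 0:
            kind = "external"      # not enclosed by any base pair
        elif partner[i - 1] == j + 1:
            kind = "hairpin"       # flanking bases pair with each other
        else:
            kind = "internal"      # internal loop or bulge
        counts[kind].update(sequence[i:j + 1].upper())
        i = j + 1
    return counts


# Hypothetical motif: two stems separated by an internal loop, capped by a hairpin.
seq = "GCAGCGAAAGCGAGC"
struct = "((.((....))..))"
loops = loop_base_counts(seq, struct)
```

Under this simplification a multiloop segment would also be labeled internal, which is acceptable for short terminal-loop motifs but would need refinement for larger structures.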
To further validate the binding profiles of the TO derivatives that were generated by our screening platform, the KDapp values of TO–N3 and TO-3–N3 interacting with 11 RNAs were measured by fluorescence titration (Figures S9 and S10 and Table S2). For the high-ranking RNAs (top 100), the KDapp values correlated well with the Z-scores of TO–N3, with a Spearman correlation coefficient of −0.93 (Fig. 5a). In contrast, no strong binding was observed for the low-ranking RNAs (KDapp > 40 µM). Similarly, the KDapp values of TO-3–N3 correlated well with the Z-scores of TO-3–N3 for the high-ranking RNAs (top 100), with a coefficient of −0.96 (Fig. 5b). These results confirm that our system can provide accurate assessments of different binding modes of ligands and structured RNAs containing G4 structures.
Additionally, we extended this analysis to the commercially available indicators, TO-PRO-1 and TO-PRO-3, by measuring their KDapp values for 16 selected RNAs (pre-miRNAs, G4 RNAs, and virus RNAs) and calculating the correlations with the Z-scores of TO–N3 and TO-3–N3, respectively (Figures S11–S13 and Tables S3 and S4). For TO-PRO-1, the KDapp values correlated weakly with the Z-scores of TO–N3 (r = −0.60) and more strongly with those of TO–N3-2 (r = −0.71), indicating that the binding profile of TO–N3-2 may reflect TO-PRO-1 binding to various RNA motifs more accurately (Fig. 5a). For TO-PRO-3, there were significant correlations between the KDapp values and the Z-scores of both TO-3–N3 (r = −0.89) and TO-3–N3-2 (r = −0.90) (Fig. 5b). Taken together, these binding profiles will aid the selection of suitable combinations of target RNAs and fluorescent indicators for FID assays.
Screening of the novel RNA-binding molecules by fluorescent indicator displacement assay using TO-PRO-1 and TO-PRO-3
Based on the binding profiles of the TO derivatives, we selected intermediate-affinity-ranked combinations of the indicators and disease-related human pre-miRNAs previously observed to be dysregulated in several tumors, hsa-mir-221, -191, and -21, for the FID assay (Fig. 6).53–55 As a high-rank G4 RNA control, hsa-mir-6850 was selected. Additionally, as low-rank controls, the hairpin loop motifs from SARS-CoV-2 RNA (SARS-low) and hsa-mir-374a were selected. The predicted RNA secondary structures are shown in Fig. 6b, and the KDapp values of TO-PRO-1 and TO-PRO-3 for these target and control RNAs are listed. The signal-to-background (S/B) ratios of TO-PRO-1 and TO-PRO-3 for these RNAs are summarized in Fig. 6c. The S/B ratios of the low-rank RNAs were significantly lower than the others. A low S/B ratio is not favorable for performing an accurate FID assay. To identify the small molecules that bind to the target human pre-miRNAs listed above, we employed FID to screen a chemical library comprising 118 oxidation–reduction compounds (Targetmol). The fluorescence emission of TOs depends on RNA binding: free TOs exhibit low fluorescence, but the intensity increases upon RNA binding. Thus, the fluorescence emission of TOs decreases when a test compound interacts with a target RNA at the same site as the fluorescent indicator, which identifies the compound as a hit (Fig. 6a). Through this screen, we identified four hit compounds that disrupted TO–RNA interactions (Figs. 6d and S14). Although three of these compounds, baicalein (Bai), myricetin (Myr), and chelerythrine chloride (Che), were hits in the assay with TO-PRO-1, Bai did not meet our selection criteria when TO-PRO-3 was used as the indicator; instead, AS 602801 (AS) emerged as a hit compound.
This is probably because TO-PRO-3 differs from TO-PRO-1 in size and/or fluorescence properties, indicating that diverse fluorescent indicators should be included to avoid false negatives and false positives. Regarding the hit compounds, Myr56–58 and Che59–61 have been reported as DNA or RNA binders, whereas AS has not.
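The readout logic of the FID assay described above can be sketched as follows: the S/B ratio compares indicator fluorescence with and without RNA, and a test compound is scored by how much of that fluorescence window it removes. The selection criteria used in this study are not given in this section, so the 50% threshold, the function names, and the fluorescence values below are placeholders rather than the actual settings.

```python
def signal_to_background(f_bound, f_free):
    """Ratio of indicator fluorescence with RNA to that of free indicator."""
    if f_free <= 0:
        raise ValueError("free-indicator fluorescence must be positive")
    return f_bound / f_free


def percent_displacement(f_with_compound, f_bound, f_free):
    """Share of the RNA-dependent fluorescence window lost to a compound."""
    window = f_bound - f_free
    if window <= 0:
        raise ValueError("indicator shows no light-up on this RNA")
    return 100.0 * (f_bound - f_with_compound) / window


def is_hit(f_with_compound, f_bound, f_free, threshold=50.0):
    """Flag a compound whose displacement reaches a hypothetical threshold."""
    return percent_displacement(f_with_compound, f_bound, f_free) >= threshold


# Invented plate-reader values in arbitrary fluorescence units.
f_free = 100.0    # indicator alone
f_bound = 1100.0  # indicator plus RNA
f_test = 500.0    # indicator plus RNA plus test compound

sb_ratio = signal_to_background(f_bound, f_free)
displacement = percent_displacement(f_test, f_bound, f_free)
hit = is_hit(f_test, f_bound, f_free)
```

The window in the denominator shows why a low S/B ratio is unfavorable: when bound and free fluorescence are close, small measurement errors translate into large swings in the apparent displacement.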
The RNA binding of the four hit compounds was validated by measuring their KDapp values by fluorescence titration. These experiments revealed that Bai exhibits weak RNA binding (KDapp > 40 µM), indicating that it is a false-positive compound for targeting disease-related human pre-miRNAs when using TO-PRO-1. The structurally similar flavonoid Myr exhibited moderate binding (KDapp = 16–25 µM) to the target RNAs, in agreement with the indicator results (Figures S15 and S16). Unexpectedly, Myr bound strongly to hsa-mir-6850, which forms a G4 structure, although it was not identified as a hit compound when TO-PRO-3 was used. This suggests that Myr and TO-PRO-3 might have different binding sites. With the low-rank RNAs, Myr exhibited weak RNA binding (KDapp > 40 µM) even though the indicator assays gave positive results. Moreover, we observed that Che bound to all the RNAs (KDapp = 2.6–16 µM), although the indicator assays gave negative results for the low-rank RNAs (Figs. 6d and S17). Overall, as predicted, unreliable results were obtained when low-rank RNAs were used. The precision of the assay data across the investigated RNAs worsened as the RNA rank decreased (Figure S18), suggesting that our binding profiles offer insight into the selection of suitable RNA targets for indicators in FID assays.
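One way to quantify the reliability discussed above is to compare FID calls with titration results and compute the precision, that is, the fraction of FID hits that were confirmed as binders. The sketch below assumes this standard definition; how the precision in Figure S18 was calculated is not stated here, and the example calls are invented.

```python
def precision(fid_hit, validated_binder):
    """Fraction of FID hits that titration confirmed as true binders.

    fid_hit:          list of booleans, True if the FID assay called a hit
    validated_binder: list of booleans, True if titration confirmed binding
    Returns None when the assay called no hits at all.
    """
    if len(fid_hit) != len(validated_binder):
        raise ValueError("both lists must describe the same compounds")
    true_pos = sum(1 for h, b in zip(fid_hit, validated_binder) if h and b)
    false_pos = sum(1 for h, b in zip(fid_hit, validated_binder) if h and not b)
    if true_pos + false_pos == 0:
        return None
    return true_pos / (true_pos + false_pos)


# Invented calls for four compounds tested against a single RNA.
fid_calls = [True, True, True, False]
titration_calls = [True, False, True, True]
p = precision(fid_calls, titration_calls)
```

In this invented case two of three FID hits are confirmed, and the missed binder in the last position would lower recall rather than precision.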
In the fluorescence spectra of Che, two major peaks were observed at 420 and 550 nm (Figs. 7a and S17). Under aqueous conditions, Che forms an OH adduct that emits a strong fluorescence signal at 420 nm when the reaction is at equilibrium.62,63 The intensity of this 420 nm peak increased dramatically as we shifted the experimental conditions from pH 5 to pH 8, indicating that the addition of OH is favored under weakly alkaline conditions (Figure S19). Although the fluorescence intensity of the OH-adduct peak at 420 nm decreased after RNA addition, the 550 nm peak increased. This is likely because RNA binding protected Che from hydrolytic attack, shifting the reaction equilibrium toward Che. Finally, we observed AS binding to hsa-mir-191, -21, and -6850 (KDapp = 14, 20, and 4.5 µM, respectively). Interestingly, this compound exhibited strong light-up properties (Figs. 7b and S20): although free AS exhibited almost no fluorescence (Φfree = 0.00063), strong fluorescence was observed after RNA binding (Φbound = 0.054). The methine tautomer64 likely contributes to this light-up property. TO-PRO-1 could not detect the RNA binding of this compound because the strong light-up emission of AS, which falls in a similar wavelength range, interfered with the detection of the fluorescence originating from TO-PRO-1. These characteristics make AS an interesting seed compound for developing novel RNA binders and fluorescence probes.
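The size of the light-up effect of AS follows directly from the two quantum yields reported above. The short calculation below uses only those two values; the variable names are illustrative.

```python
# Quantum yields of AS reported in the text.
phi_free = 0.00063   # free compound
phi_bound = 0.054    # after RNA binding

# Fold enhancement of the fluorescence quantum yield upon RNA binding.
light_up_fold = phi_bound / phi_free
print(f"Quantum yield enhancement: about {light_up_fold:.0f}-fold")
```

An enhancement of this magnitude, in a wavelength range similar to that of TO-PRO-1, is consistent with the interference described for the TO-PRO-1 readout.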