When compared to the genomes of other RNA viruses, coronaviruses have been found to possess largest genome sizes. They are capable of establishing reservoirs in both human and zoonotic populations, enabling their transmission and circulation among a range of animal hosts, including bats, pangolins, civets, cats, mice, pigs, whales, dogs, and raccoons [24]. Till date, SARS-CoV-2 is regarded as the most lethal among the family of coronaviruses. The genome of SARS-CoV-2 consists of fourteen Open Reading Frames (ORFs), which encode for 16 non-structural proteins, 4 structural proteins and 11 accessory factors [7, 8]. The 2 major polyproteins, ORF1a and ORF1ab, are present in SARS-CoV-2 proteome, which are to produce individual replicase complex nonstructural proteins. These nonstructural proteins (NSP1-16), play a crucial role in regulating early transcription and facilitating genome replication [9]. NSP3, NSP4, and NSP6 are collectively involved in the formation and assembly of double membrane vesicles (DMVs) within the Golgi apparatus of the host. These DMVs provide a site for the anchorage of viral replication complexes, which facilitate viral genome replication and the production of progeny virions within the host cell upon their release, thus enabling further infection [10]. On the other hand, accessory genes of SARS-CoV-2 also play a major role in regulating replication and contribute in the pathogenicity of virus. Previous studies reported deletion of accessory genes ORF3a, 3b, 5a and 5b from avian coronavirus and observed resultant mutated virus exhibiting reduced pathogenicity [25]. Thus, targeting accessory genes of SARS-CoV-2 can be an effective strategy for therapeutic purposes. For this study, the sequences of non-structural (NSP3, 4, and 6) and accessory genes (ORF3a, 6, 7a, 8, and 10) were utilized to predict short interfering RNA molecules that could potentially interfere with SARS-CoV-2.
Sohrab et al., 2022 predicted 4 siRNAs for targeting the receptor binding domain (RBD-S) of SARS-CoV-2 using an in silico pipeline. They found no cytotoxicity in the Vero E6 cell line based experimental evaluation of the predicted siRNAs and one out of four siRNAs showed better antiviral activity based on qPCR Ct value [38]. In another study by Sohrab et al., 2021, they identified 7 efficient siRNA molecules for targeting ORF1ab of MERS-CoV using siDirect 2.0 and their designed siRNAs showed no cytotoxic effects in Vero cells (ATCC CCL-81) at different concentrations. They identified 2 out of 5 siRNAs for the inhibition of viral replication more efficiently on the basis of real-time PCR [39]. Perez-Mendez et al., 2021 also targeted the 5´ UTR region of Zika virus via an siRNA designed in silico. A significant reduction in cycle thresholds was found in C6/36 cells when transfection with 1 and 2 µg of the synthesized siRNA was done in infected cells at an MOI of 0.001 for one hour (p < 0.05) [23]. ElHefnawi et al., 2016 also predicted 2 siRNAs against 5´ NTR of Hepatitis C virus. Both of the siRNAs (HCV353 and HCV258) showed efficient inhibition of HCV replication mechanism at low concentrations. Moreover, both siRNAs suppressed the replication of HCV genotype 4 isolates derived from infected Huh-7 cells efficiently. The long-term treatment of HCV replicon cells also did not lead to the emergence of escape mutant viruses which ensured the sustained effectiveness of the antiviral therapy over an extended time period [19]. We developed a novel in silico pipeline for predicting and validating siRNA molecules that combines multiple effective in silico methods used in the previous studies [38, 39, 23, 19], which demonstrated successful inhibition of viral replication in vitro. This innovative pipeline confidently aims to identify and validate siRNAs with the potential to inhibit viral replication in in vitro experiments (Table 4).
Table 4
Comparative analysis of in silico pipelines from the previous antiviral siRNA studies and their in vitro implementation results.
Targeted viruses and their genomic region | In silico siRNA designed pipeline | Total designed siRNAs | In vitro implementation and results |
Sequence retrieval from repositories & databases | Multiple Sequence Alignment | Consensus sequence generation | Phylogenetic analysis | siRNA prediction using siDirect 2.0 | Binary validation using siRNApred | i-Score and s-Biopredsi scores calculation | Thermo-dynamic stability | Secondary structures prediction & MFE calculation using RNAStructure | Tertiary structure Prediction using RNA Composer | Molprobity validation | Off-target minimization |
Receptor-binding domain (RBD-S) of SARS-CoV-2[38] | ✓a | ✘ | ✘ | ✘ | ✓ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | 4 | No cytotoxicity for tested siRNAs was found in Vero E6 cells based on experimental evaluation and analysis of generated results. Following strict selection and scoring criteria, a better antiviral efficiency was observed in 1 out of 4 siRNAs based on q-real-time PCR Ct value. |
ORF1ab of MERS-CoV[39] | ✓a | ✓ | ✘ | ✘ | ✘ | ✘ | ✓ | ✘ | ✘ | ✘ | ✘ | ✘ | 7 | siRNAs showed no cytotoxic effects at various concentrations in Vero cells (ATCC CCL-81). On the basis of real-time PCR results, two of the designed siRNAs were found to inhibit viral replication more efficiently as compared to the other five. |
5´ UTR of Zika virus[23] | ✓a | ✓ | ✓ | ✓ | ✓ | ✘ | ✘ | ✘ | ✓ | ✓ | ✓ | ✓ | 1 | Significant reduction in cycle thresholds was observed in C6/36 cells upon transfection with 1 and 2 µg of designed siRNA after being infected with Zika virus at an MOI of 0.001 for one hour (p < 0.05). |
5´ NTR of Hepatitis C virus[19] | ✓b | ✓ | ✓ | ✘ | ✘ | ✘ | ✓ | ✘ | ✘ | ✘ | ✘ | ✓ | 2 | Both of the designed siRNAs (HCV353 and HCV258) demonstrated efficient inhibition of HCV replication mechanism at low concentrations. Moreover, both siRNAs suppressed the replication of HCV genotype 4 isolates derived from infected Huh-7 cells efficiently. Also the long-term treatment of HCV replicon cells did not result in the emergence of escape mutant viruses. |
a Retrieved from NCBI b Retrieved from HCV LANL |
Multiple sequence alignment of selected gene sequences was performed for the conservation analysis. The sequences of NSP3, NSP4, and NSP6 showed high levels of conservation among the 100 selected sequences of each gene in the circulating strains of SARS-CoV-2 across the globe, from year 2019 to 2023. It was also observed that the NSP3 sequences exhibited a substantial frequency of mutations. Our observation is consistent with a previous study on conservation and mutational analysis of nonstructural genes of SARS-CoV-2, on the basis of geographic distribution by Anand et al., 2021, in which some of the highly mutating positions in NSP3 were reported as “hotspot zones” [26]. In another conservation and phylogenetic analysis by Fiaz et al., 2021, NSP3 was reported as the most variable nonstructural gene [27]. Among our target sequences, a point mutation was observed in NSP3 sequence of a Japanese strain (accession number = OQ504245.1) showing Guanine in place of highly conserved Adenine residues at position 1140. Another mutation was observed in NSP4 sequences, showing Thymine in place of conserved Cytosine residues at position 732, in a strain from Switzerland (accession number = OQ050229.1). Our conservation analysis of accessory genes also revealed a high level of conservation among the selected sequences. In a previous conservation analysis of accessory proteins of SARS-CoV-2, Li et al., 2020 reported diverse mutations disseminated within ORF3a and ORF8 [28].
The phylogenetic analyses demonstrated variability across various geographic regions and revealed multiple clades with distinct clusters. In phylogenetic tree constructed for NSP3 sequences, the clusters A, C, D, and J showed a uniform distribution of Asian and European sequences predominantly. Among other obtained clusters, NSP3 sequences of Pakistani strains from years 2022 and 2023 fall in clusters D, F, and I with Asian, European, and African sequences. Overall, phylogenetic analysis of NSP3 sequences revealed highest rate of variations. In a previous genomic and epidemiological study, Lamptey et al., 2021 also performed phylogenetic analyses of nonstructural proteins of SARS-CoV-2 and found that NSP3 sequences contained most variants [29]. The phylogenetic analysis of NSP4 also revealed the same distribution pattern of sequences from different continents across various obtained clusters. Predominantly, most of the European sequences were found in cluster A (n = 16/30) along with Asian sequences (n = 8/30). Cluster B contained sequences from New Zealand strains of 2021 and 2022 sharing close relatedness with US strains. Asian strains were found to be predominant in clusters F and G also, along with a uniform distribution of sequences from Europe and other continents. The Pakistani sequences of NSP4 fell in clusters A and G sharing close relatedness with European, Asian, and US sequences. The phylogenetic analysis of NSP6 gene from circulating strains across the globe revealed a uniform distribution of sequences throughout the phylogenetic tree. The phylogenetic analysis of accessory gene revealed high levels of conservation and the sequences were uniformly distributed throughout the respective clusters. Further sequence logo analyses were performed and consensus sequences were obtained using WebLogo application and Jalview program respectively.
Short interfering RNAs are small (21 to 25 nt) RNA molecules that do not encode for proteins and have the ability to bind to complementary messenger RNA sequences. At post-transcriptional level, they can prevent the mRNA from being translated into proteins, thereby negatively regulating the expression of the target gene. An siRNA requires a high degree of complementarity between the guide strand of the siRNA and its specific target mRNA. Since the discovery of siRNA therapy, significant advancements have been made in investigating the potential of small interfering RNA (siRNA) as a therapeutic approach for targeting genes of various viruses including Zika virus [23], Hepatitis C virus [19], Nipah virus [22], Influenza A virus [21], MERS-CoV [30], and SARS-CoV-2 [16, 17]. The web-based siDirect 2.0 server employs a highly efficient algorithm and combined rational rules of Ui-Tei along with Reynolds + Amarzguioui for the prediction of functional siRNAs with minimal off-target effects. The Tm value of 21.5°C can be used as a threshold to distinguish the seed sequences with minimized off-target effects from those that are likely to have off-target binding effects. Primarily, a total of 41, 12, and 3 siRNA molecules were predicted against NSP3, NSP4, and NSP6 genes respectively and 7, 1, 2, 4, and 1 siRNAs were predicted for targeting regions of ORF3a, ORF6, ORF7a, ORF8, and ORF10 and further comprehensive analyses were performed, taking into consideration various filters to evaluate their effectiveness.
The GC content of siRNA-target duplexes is one of the significant parameters that may affect the efficacy of siRNA. A higher GC content may lead to the formation of secondary structures like hairpins and stems, which can ultimately lead to reduced accessibility of siRNA to its mRNA target. A lower GC content may result in an unstable duplex formation reducing the gene silencing efficiency. Therefore, in our study, an optimal GC content range of 31.6 to 57.0% was set to design efficient siRNAs. The predicted siRNA sequences were screened against the Main21 dataset of siRNAPred server using binary pattern [31]. Based on the highest binary scores (≥ 0.9), a total of 12, 2, and 1 siRNAs for NSP3, NSP4, and NSP6 respectively, and in case accessory genes, a total of 3, 1, 1, 3, and 1 siRNAs against ORF3a, 6, 7a, 8, and 10 respectively, were selected for the additional assessment. Further scoring of siRNA molecules was performed using i-Score Designer server that employs several 1st and 2nd generation algorithms [32]. The i-Scores (≥ 65) and s-Biopredsi scores (< 1) were calculated for evaluation of specificity of predicted siRNA sequences. In the heat capacity plots, Cp is represented as a function of temperature, referred to as TmCp. Whereas, Tm (Conc) represents the point at which concentrations of the siRNA-duplexes reach ½ of their maximum value. The melting temperatures TmCp and Tm were calculated using DINAMelt server [33]. In case of non-structural genes, the TmCp values ranged from 81.0 to 85.8°C whereas the Tm (conc) values ranged from 79.6 to 84.5°C. For accessory genes, the TmCp values ranged from 81.3 to 86.2°C whereas the Tm (conc) values ranged from 80.2 to 84.6°C. For the visualization of folding and binding patterns along with their corresponding minimum free energy values, RNA structure program was utilized. The secondary structures of guide strands of siRNA molecules were predicted using MaxExpect algorithm and their minimum free energy values ranged from 1.5 to 1.8 kcal/mol for NSPs and 1.4 to 1.9 kcal/mol for accessory genes. According to Hasan et al., 2021, positive MFE value indicates better siRNA molecules, as chances of folding are rare among them [17]. The secondary structures of target-siRNA duplexes were also predicted using RNA DuplexFold algorithm and the free energy of hybridization with target sequences of predicted potential siRNAs were − 34.8, -33.9, -35.0, -31.4, -31.5, -32.5, -31.9, -34.2, -36.8, and − 34.8 kcal/mol respectively. On the other hand, for accessory genes, the hybridization of siRNA-target mRNA duplex along with minimum free energy (MFE) for binding of both strands were − 32.9, -35.7, -32.2 kcal/mol for ORF 3a, -29.9 for ORF6, -33.2 kcal/mol for ORF7a, -34.0, -33.9, -32.2 kcal/mol for ORF 8, and − 32.5 kcal/mol for ORF10. The MFE for an siRNA targeting ORF8 was found to be lower than the threshold value (1.5°C), therefore, it was excluded from further analyses. Similarly, Minimum free energy of binding for siRNA targeting ORF6 was found to be greater than cutoff value (-30°C), thus, it was also excluded from further assessments. Next, we predicted the tertiary structures of 17 siRNA molecules using RNAComposer web server. The chemical structure of RNA backbone is rotameric, and there is a probability of getting wrong nucleic acid geometry [34]. Therefore, to validate the three–dimensional structures and nucleic acid geometry of our modelled siRNAs, we screened them, using the MolProbity server.
siRNA enters the cell and come in contact with RNAi silencing machinery referred to as RNA induced silencing complex (RISC). Guide strand then attaches itself with this complex leaving the passenger strand, which is then removed. It causes the attachment of this complex with a protein namely, argonaute thereby activating the complex. Guide strand directs this complex with its target mRNA sequence and binding occurs. Out of 21 nucleotides of siRNA, 19 of them acts as recognition factor for the silencing of gene by its breakdown [35]. The nucleotides present at position 2–8 are termed as seed region which should not be complementary to any nontargeted mRNA sequence to prevent off target effects [36]. Therefore, finally we performed nucleotide BLAST (BLASTn) against human genomic plus transcript database for investigation of any off-target effects and found no significant E-values.
In a previous study conducted by Saadat et al., 2022, a total of 133 siRNA molecules were predicted against a number of targeted proteins including non-structural and structural proteins and the 5′ and 3′ UTR sequences of SARS-CoV-2 [37]. They have reported 45 siRNA molecules for targeting NSP3/PLpro using siDirect 2.0, however, no siRNA candidate shared sequence similarity with our predicted siRNAs. In another study, Hasan et al. 2021 reported a total of 10 siRNA molecules, predicted against ORF1ab of SARS-CoV-2 using the same tool [17]. Our study, on the other hand, was focused on predicting siRNAs for NSP3, NSP4 and NSP6 of SARS-CoV-2 and identified 10 potential siRNA molecules. Among these, 3 siRNA molecules targeting NSP3 (siRNA no. 1, 2, and 3) were found to have complete sequence similarity with the siRNAs predicted by Hasan et al. 2021, thus validating our findings. Additional in vitro and in vivo experiments are needed to validate the effectiveness and role of the predicted siRNAs in suppressing NSP3, 4, and 6 along with accessory genes for inhibiting the double membrane vesicle formation and replicative pathway of SARS-CoV-2.