Description of a Novel Frameshifting Site in the 5’UTR of SARS-CoV-2 as a Potential Drug Target

SARS-CoV-2 is an enveloped, positive-sense, single-stranded RNA coronavirus that causes COVID-19, whose ongoing outbreak has caused a large number of deaths throughout the world. The aim of this work was to scan the SARS-CoV-2 genome in search of new therapeutic targets. We found a sequence in the 5’UTR (NC_045512:74-130) consisting of a typical slippery heptamer next to a structured region that may cause frameshifting. The potential biological value of this region is supported by its similarity with other coronaviruses related to SARS-CoV and by its sequence conservation among SARS-CoV-2 isolates. We predicted the secondary structure of the region with several bioinformatic tools and chose a probable secondary structure from which to build a 3D reconstruction of the structured segment. We carried out virtual docking on the 3D structure to look for a binding site and then for drug ligands from a database of lead compounds. Several molecules that could probably be administered as oral drugs show promising binding affinity within the structured region, so it may be possible to interfere with the potential regulatory role of our sequence of interest.

On March 11th, 2020, the World Health Organization (WHO) declared a pandemic of COVID-19, a clinical disease (primarily pneumonia and gastroenteritis) caused by the SARS-CoV-2 virus. As of mid-September 2020, the pandemic has caused nearly a million deaths worldwide. SARS-CoV-2 belongs to the Coronaviridae family and is related to SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), with 79% and 50% genomic similarity, respectively. SARS-CoV caused an epidemic outbreak in 2003 and MERS-CoV an outbreak in 2012 [4]. All three belong to the genus Betacoronavirus. Coronaviruses cause zoonotic infections, so they may pass from one host species to another through small changes in their genome. SARS-CoV-2 shows high genetic similarity (more than 85%) to a group of viruses known as SARS-related coronaviruses (SARSr-CoV), which have been isolated from animal hosts including Hipposideros bats and pangolins (Manis javanica). These species appear to be candidate intermediate hosts for SARS-CoV-2 [29] [9]. These viruses have a positive-sense single-stranded RNA genome and use programmed -1 ribosomal frameshifting (-1 PRF) to direct the synthesis of immediate early proteins that prepare the infected cell for takeover by the virus. Frameshifting is an efficient mechanism for translating a single genomic sequence into two different proteins by shifting the reading frame by one nucleotide (backward or forward) while the mRNA is engaged with the ribosome [10]. A typical frameshifting signal has two essential elements. The first is a characteristic heptanucleotide called the 'slippery' sequence, at which the ribosome-bound tRNAs slip into the -1 frame. The second is an adjacent mRNA secondary structure that stimulates this slippage. The spacer between these two elements is typically shorter than twelve nucleotides.
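
The two-element signal described above can be illustrated with a short Python sketch. It assumes the commonly cited X XXY YYZ consensus for -1 slippery heptamers: three identical nucleotides, then AAA or UUU, then A, C or U. The exact rules applied by KnotInFrame may differ. The function names and the `structure_starts` argument (0-based positions where a predicted stem begins) are hypothetical.

```python
import re

# Assumed consensus X XXY YYZ: X = any nucleotide (three times), Y = A or U
# (three times), Z = A, C or U. The lookahead also reports overlapping candidates.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2([AU])\3\3[ACU]))")


def find_slippery_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, heptamer) for every candidate slippery site."""
    rna = seq.upper().replace("T", "U")  # accept DNA input as well
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(rna)]


def sites_with_downstream_structure(
    seq: str, structure_starts: list[int], max_spacer: int = 12
) -> list[tuple[int, str]]:
    """Keep heptamers followed by a structure start after a spacer shorter than max_spacer nt."""
    hits = []
    for start, heptamer in find_slippery_sites(seq):
        end = start + 7  # first position after the heptamer
        if any(0 <= s - end < max_spacer for s in structure_starts):
            hits.append((start, heptamer))
    return hits


if __name__ == "__main__":
    # Toy sequence carrying UUUAAAC, the heptamer at the ORF1a/ORF1b junction.
    print(find_slippery_sites("GCUUUAAACGG"))  # [(2, 'UUUAAAC')]
```

A pattern scan like this finds only candidate heptamers. Whether a stimulatory structure actually follows them is the part that requires structure prediction.
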
Often the secondary structure is more complex than a simple stem-loop formed between palindromic sequences and extends into a pseudoknot [2]. Structurally, a pseudoknot forms when a single-stranded region in the loop of a hairpin base-pairs with a stretch of complementary nucleotides elsewhere in the RNA chain. A set of bioinformatic tools has been developed to predict these structures [13]. The mechanism of action of pseudoknots is not completely understood; some authors have suggested that it is linked to the helicase activity of the ribosome. When pseudoknots are located in coding regions, they modulate the elongation and termination steps of translation: the ribosome can switch from the zero reading frame to the -1 frame, and translation continues in the new frame. When pseudoknots are in non-coding regions, they regulate the initiation of protein synthesis and template recognition by the viral replicase, guiding viral replication and packaging [7]. All coronaviruses have been reported to use programmed -1 ribosomal frameshifting to control the expression of their proteins. In 2005, Plant et al. [19] identified a three-stemmed mRNA pseudoknot that induces efficient -1 ribosomal frameshifting in the SARS-CoV genome. Through this mechanism, the virus may produce a fusion protein spanning the overlapping ORF1a and ORF1b regions. This element directs synthesis of the ORF1ab polyprotein, which is involved in ablating the host cellular innate immune response.
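
The definition of a pseudoknot given above can be made operational. In extended dot-bracket notation, a pseudoknot shows up as two base pairs (i, j) and (k, l) that cross, i < k < j < l. The Python sketch below parses such a string and tests for crossing pairs. It assumes the common convention that pseudoknotted pairs are written with a second bracket type such as `[]`. The function names are hypothetical.

```python
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSE_TO_OPEN = {c: o for o, c in OPEN_TO_CLOSE.items()}


def parse_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Return 0-based base pairs (i, j), i < j; any other character is unpaired."""
    stacks: dict[str, list[int]] = {o: [] for o in OPEN_TO_CLOSE}
    pairs = set()
    for pos, ch in enumerate(structure):
        if ch in OPEN_TO_CLOSE:
            stacks[ch].append(pos)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.add((stack.pop(), pos))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def has_pseudoknot(structure: str) -> bool:
    """True if two base pairs cross, i.e. i < k < j < l."""
    pairs = sorted(parse_dot_bracket(structure))
    for idx, (i, j) in enumerate(pairs):
        for k, l in pairs[idx + 1:]:
            if k > j:
                break  # pairs are sorted by i, so no later pair can cross (i, j)
            if k < j < l:
                return True
    return False


print(has_pseudoknot("((..[[..))..]]"))  # True: pair (1, 8) crosses (5, 12)
print(has_pseudoknot("((..))..[[..]]"))  # False: the two stems are side by side
```
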

Mutations affecting this structure decreased the rate of -1 PRF and had deleterious effects on virus propagation. Recently, Kelly et al. [11] described the same pseudoknot in SARS-CoV-2 and demonstrated frameshifting. This area is highly conserved between SARS-CoV and SARS-CoV-2: there is only a single nucleotide difference, a C to A substitution at position 13,533.
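
Reporting a single substitution such as the C to A change at position 13,533 amounts to walking two aligned sequences column by column. The sketch below is a hypothetical helper that reports substitutions in reference coordinates and skips gap columns. The toy sequences in the example are invented for illustration and are not the real genomic context.

```python
def substitutions(ref_aln: str, query_aln: str, offset: int = 1) -> list[str]:
    """List substitutions between two aligned sequences, e.g. 'C13533A'.

    Positions follow the reference numbering, starting at `offset` for the first
    reference nucleotide. Gap columns ('-') in the reference do not advance the
    counter, and columns with a gap in either sequence are not reported.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    changes = []
    ref_pos = offset - 1
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r != "-" and q != "-" and r != q:
            changes.append(f"{r}{ref_pos}{q}")
    return changes


# Toy alignment whose first column is numbered 13,530 in the reference.
print(substitutions("ACGCT", "ACGAT", offset=13530))  # ['C13533A']
```
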

Frameshifting regions could be used as targets to fight viral infection [11]. Since early studies, point mutations in the slippery sequence have been shown to have an important effect on viral replication [19]. They would therefore also be interesting targets when engineering an attenuated virus for vaccine development. The inhibition of these regions by peptide-conjugated antisense oligomers was studied by Neuman et al. [16]. After several passages in viral culture, virions escaped the inhibition of replication but were attenuated. Rangan et al. [23] described highly structured areas of RNA that might be less accessible to complementary oligomers. These convoluted areas would, however, provide small binding sites for conventional drug molecules. Scanning for both structure and sequence conservation may therefore be an appropriate way to find therapeutic targets. Previous studies using in silico methods found drug-like molecules that would inhibit SARS-CoV replication by acting on the frameshifting region at the overlap between ORF1a and ORF1b [17]. One of these molecules has been shown to affect replication of SARS-CoV-2 [11]. In this work, we scanned the SARS-CoV-2 genome in search of novel regions likely to be critical for virus replication, focusing on frameshifting predictors. We explored the likely biological relevance of this feature by studying its sequence conservation. We then assessed its suitability as a potential drug target by analyzing its structural properties and predicting drug docking.

SARS-CoV-2 (NC_045512) [30] was performed using the KnotInFrame tool [27]. The predicted secondary structure was further inspected with IPknot [25] and RNAfold from the ViennaRNA package [15]. The secondary structure of the segment of interest, in dot-bracket notation, was chosen by inspecting the overall conformation of the 5'UTR and ensuring that the slippery sequence was included.
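
Choosing a segment from a whole-5'UTR prediction raises a practical point: base pairs that link the segment to nucleotides outside it cannot be kept. The Python sketch below cuts a window out of a global dot-bracket string, using 1-based inclusive coordinates as in NC_045512:74-130. It keeps only the pairs closed within the window and checks that a given slippery heptamer lies inside it. The function names are hypothetical. The sketch assumes the input uses only `()` and `[]` pairs.

```python
def _pairs(structure: str) -> list[tuple[int, int]]:
    """0-based base pairs for the '()' and '[]' layers of a dot-bracket string."""
    stacks: dict[str, list[int]] = {"(": [], "[": []}
    close_to_open = {")": "(", "]": "["}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in close_to_open:
            pairs.append((stacks[close_to_open[ch]].pop(), pos))
    return pairs


def window_substructure(structure: str, start: int, end: int) -> str:
    """Dot-bracket for positions start..end (1-based, inclusive).

    Pairs with both partners inside the window are kept; pairs reaching
    outside it become unpaired ('.'), so the fragment is balanced on its own.
    """
    lo, hi = start - 1, end  # Python slice bounds
    window = ["."] * (hi - lo)
    for i, j in _pairs(structure):
        if lo <= i and j < hi:
            window[i - lo] = structure[i]
            window[j - lo] = structure[j]
    return "".join(window)


def window_covers(start: int, end: int, site_start: int, site_len: int = 7) -> bool:
    """True if a site beginning at site_start (1-based) lies fully inside start..end."""
    return start <= site_start and site_start + site_len - 1 <= end


print(window_substructure("((..((...))..))", 5, 11))  # ((...))
print(window_substructure("((..((...))..))", 1, 12))  # ....((...)).
```

The second call shows why the choice of window matters. The outer stem is lost because its closing partners fall outside the window.

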
The likelihood of the secondary structure was assessed with mfold by computing the minimum free energy (MFE) of a large number of random sequences from the SARS-CoV-2 genome with the same length as the sequence of interest. This yielded an empirical MFE distribution against which to assess how dominant the proposed structure would be [8].

hosted in humans and other species were selected based on subjective criteria regarding variability and relevant facts, to build a cladogram. The genomes were downloaded from GenBank and aligned with Clustal Omega [26] using the default parameters. The cladogram was constructed by maximum likelihood with FastTree [21] under a GTR model of nucleotide evolution. The ggtree package [31] was used in R [22] to plot the cladogram and the multiple sequence alignment (MSA).
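
The empirical comparison described above reduces to two numbers. The first is the fraction of random windows that fold at least as stably as the segment of interest; the second is a z-score. In the Python sketch below, the MFE values themselves are assumed to come from a folding tool such as mfold, since folding is not reimplemented. The helper names are hypothetical. The MFE numbers in the example are toy values, not results of this work.

```python
import random
import statistics


def random_windows(genome: str, length: int, n: int, seed: int = 0) -> list[str]:
    """Sample n windows of the given length at random positions of the genome."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [genome[s:s + length] for s in starts]


def mfe_significance(observed: float, background: list[float]) -> tuple[float, float]:
    """One-sided empirical p-value and z-score of an observed MFE.

    More negative MFE means a more stable fold, so the p-value counts background
    windows at least as stable as the observed one (with a +1 correction).
    """
    k = sum(1 for m in background if m <= observed)
    p = (k + 1) / (len(background) + 1)
    z = (observed - statistics.mean(background)) / statistics.stdev(background)
    return p, z


p, z = mfe_significance(-20.0, [-5.0, -10.0, -15.0, -25.0])
print(round(p, 2), round(z, 2))  # 0.4 -0.73
```

For the segment studied here, the windows would be 57 nt long (positions 74-130). A realistic background would contain many more than four windows.
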

In addition to this alignment of SARS-CoV-2 with another 20 coronavirus genomes, the sequence of interest was examined with ViroBLAST [6]. This tool provides a blastn [1] alignment against a comprehensive database of all types of virus, which allowed us to detect any incidental homology with other viruses. Secondly, we evaluated the conservation of the sequence of interest among SARS-CoV-2 isolates from different geographic locations since the onset of the pandemic, taking advantage of the rapid deposition of genomes in the GISAID database. We filtered the genomes in the database to retain only high-quality records (length greater than 29,000 nt and a low number of undetermined positions). The number of strains with variant sites was assessed by blastn [1], making the distinction of variants at the whole 5'UTR region

Upon consideration of different alternatives, the structure of the sequence of interest in dot-bracket notation and the underlying nucleotide sequence were imported into RNAComposer [20] to obtain a 3D structure prediction in .pdb format. The .pdb file was used as input for the virtual scan for active sites. This task was carried out using the AutoDock Tools suite [28]. This suite comprises the AutoGrid and AutoLigand

KnotInFrame provides the nucleotides and position of the slippery sequence and the nearby pseudoknot. The predicted stability of each predicted structure is indicated by the MFE value in the rightmost column. A more negative MFE represents a more stable and likely structure. This value is mainly dependent on the length of the

The sequence spanning positions 74-130 was used to proceed with the analysis, so that it would include the slippery region, the stem-loop structure and the pseudoknot. The graphical representation of that secondary structure produced with the VARNA software [5] is shown in Figure 4.
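
The quality filter applied to GISAID genomes can be sketched in Python as follows. The text specifies only a length above 29,000 nt and a low number of undetermined positions. The 1% cap on undetermined characters (anything other than A, C, G, T or U) is therefore an assumed, illustrative threshold. The function names are hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record ID (first word of the header) to its uppercase sequence."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v).upper() for k, v in records.items()}


def high_quality(records: dict[str, str], min_length: int = 29_000,
                 max_undetermined: float = 0.01) -> dict[str, str]:
    """Keep genomes longer than min_length with few undetermined positions."""
    kept = {}
    for name, seq in records.items():
        if len(seq) <= min_length:
            continue
        # Assumption: any non-ACGTU character (N, IUPAC ambiguity codes) counts as undetermined.
        undetermined = sum(1 for base in seq if base not in "ACGTU")
        if undetermined / len(seq) <= max_undetermined:
            kept[name] = seq
    return kept
```

The threshold is expressed as a fraction rather than an absolute count so that it does not depend on genome length.
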

Conservation of the sequence of interest in relation to a set of other coronaviruses is shown in Figure 6. A multiple sequence alignment of the 5'UTR is shown on the right side, with a zoomed-in view of the coordinates around the sequence of interest. All the SARS-CoV-2 isolates were identical over the sequence of interest. This includes samples from human patients in distant places (MT370831, New York; and LC542809, Japan) and isolates from animals suspected of having been infected by humans (MT396266, farm mink; and MT365033, zoo tiger). A single-nucleotide difference was found in a bat sequence (MT996532) and then the differences

Querying the ViroBLAST database with the sequence of interest yielded 10 hits. The search parameters were kept at their default values except for the word length, which was reduced from ten to seven to increase the likelihood of matching slippery sequences.
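
The reason for lowering the word length can be illustrated with a small Python sketch. BLAST-type nucleotide searches start alignments only from exact shared words of the chosen length. A match limited to a 7-nt motif therefore produces no seed at word length ten. The sequences below are invented for illustration, and the helper name is hypothetical.

```python
def shared_words(query: str, subject: str, word_size: int) -> set[str]:
    """Exact words (k-mers) of length word_size present in both sequences."""
    def words(seq: str) -> set[str]:
        return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}
    return words(query) & words(subject)


query = "GCGCUUUAAACCGCG"    # toy query carrying the heptamer UUUAAAC
subject = "AUAUUUUAAACAUAU"  # toy subject sharing only that heptamer
print(shared_words(query, subject, 7))   # {'UUUAAAC'}
print(shared_words(query, subject, 10))  # set()
```

The trade-off is that shorter words produce more chance seeds, which is why the default is larger.
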

The ten hits can be grouped into two sources. A group of sequences with consecutive GenBank accession numbers from HQ890526 to HQ890531.1 came from a

compounds. Because more negative MFE values mean higher binding affinity, the lead compounds were ranked by this value (Table 3). The number of hydrogen bond donors and acceptors and the molecular weight in g/mol are annotation data from PubChem.

These data indicate how likely a compound is to be usable as an oral drug [14]. The 3 was consistent with an orally feasible drug.

shown in Figure 7 and the drug compound set NCI Diversity Set II are summarized in Table 3.

Table 3. Results of docking of lead compounds from NCI Diversity Set II against the predicted active site in the sequence of interest.

The red area on the left shows the slippery sequence. The active ligand site holds one of the best matches, NSC308835/PubChem 328761 (see Table 3), in its docked position.

This work shows a previously unnoticed feature of the SARS-CoV-2 genome, which is likely to play an important biological role given the remarkable conservation of its sequence. The proximity of the slippery sequence to a likely stable pseudoknot suggests that this may be a frameshifting site. It would be in addition to the overlapping region of ORF1a and ORF1b, where frameshifting has been proven for SARS-CoV [19] and also shown for SARS-CoV-2 [11]. We have focused on a different, previously unnoticed region in the 5'UTR. The fact that no protein can be linked to this sequence may argue against a frameshifting role like that at the overlap between ORF1a and ORF1b. Supporting a role for the 5'UTR, Zhou et al. [32] demonstrated that different natural deletions in the 5'UTR of FMDV (foot-and-mouth disease virus) markedly affected the pathogenicity and species tropism of the virus. Frameshifting linked to the 5'UTR has been described in HIV-1 [3]. In that case, the structure next to the slippery sequence is a stem-loop without additional pseudoknotting.

Another important aim of this work is to consider this structured RNA area as a useful target for feasible drug intervention. Puzzlingly, a possible drug against the pseudoknot involved in frameshifting between ORF1a and ORF1b in SARS-CoV was described but never progressed to actual use in health care [18] [17]. This was probably because the discovery came long after the 2003 SARS-CoV outbreak. The same molecule that was found to inhibit viral replication of SARS-CoV appears to be effective against SARS-CoV-2 [11]. Rangan et al. [23] performed a wide analysis of the SARS-CoV-2 and SARSr-CoV genomes and classified multiple regions in terms of conservation and RNA structure. In agreement with our approach, they consider that structured regions would be ideal targets for small drug molecules. In Table 2 (eighth row) of their article, they describe, among others, sequence 40:157 of NC_045512.2 as highly conserved and structured. We reproduce in 1 their proposal of structure for our region of interest.

There is a clear disagreement between the secondary structures. It arises not only from the use of different tools (Rosetta [23], RNAfold and IPknot) but also from whether the input to these tools was a large region, such as the whole 5'UTR, or just a fragment. The differences between 3 and 2 show that disagreement exists even for the overall 5'UTR structure. We decided to work with the local structure of the segment of interest as predicted by IPknot.
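
The disagreement between predicted structures can be quantified with the base-pair distance, the number of base pairs present in one structure but not in the other. The Python sketch below computes it from two dot-bracket strings of the same sequence. The function names are hypothetical, and the structures in the example are toy strings, not the predictions discussed here.

```python
OPEN = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSE = {c: o for o, c in OPEN.items()}


def pairs_from_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """0-based base pairs of an (extended) dot-bracket string."""
    stacks: dict[str, list[int]] = {o: [] for o in OPEN}
    pairs = set()
    for pos, ch in enumerate(structure):
        if ch in OPEN:
            stacks[ch].append(pos)
        elif ch in CLOSE:
            pairs.add((stacks[CLOSE[ch]].pop(), pos))
    return pairs


def bp_distance(a: str, b: str) -> int:
    """Size of the symmetric difference between the two base-pair sets."""
    if len(a) != len(b):
        raise ValueError("structures must describe the same sequence")
    return len(pairs_from_dot_bracket(a) ^ pairs_from_dot_bracket(b))


print(bp_distance("((..[[..))..]]", "((......))...."))  # 2
```

A distance of zero means identical structures. Comparing a fragment prediction with a whole-5'UTR prediction first requires trimming both to the same coordinates.
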

The alignment of our sequence of interest against the ViroBLAST database showed that the sequence may have been close to SARS-CoV as described in 2003 [24]. We found two groups of highly similar sequences. One occurs in the wild, in isolates from bats collected in the following years. The other consists of laboratory-derived strains created as a mouse-adapted model from the Urbani strain of SARS-CoV. Although these findings fall outside the scope of this work, they support the role of bats as intermediate hosts between SARS-CoV and SARS-CoV-2. They also support the possibility that some unrecognized strain variants of SARS-CoV carry relevant features such as the sequence we describe.

The primary limitation of our work is that it was restricted to in silico research.

This shortcoming is likely more relevant when it comes to determining the three-dimensional structure of RNA and its subsequent docking. Crystal structure determination, 3D prediction and drug docking have been developed primarily for proteins rather than RNA. One of the features of RNA that makes docking difficult is its flexibility.

However, the successful discovery of ligands against the SARS-CoV pseudoknot from a computed 3D structure has been described before [18] [17]. Clinical evidence of pharmacological action against RNA viral genomes has been obtained with drugs such as sofosbuvir (trade name Sovaldi) against hepatitis C virus (HCV). These drugs are described as nucleotide analogues. They bind to the target region as a complementary sequence would, but they differ from short chains of nucleotides in that they resist degradation by nucleolytic enzymes.

Our screening for drug ligands was exploratory, as it was limited to 1507 compounds from the NCI Diversity Set II. In Table 3, several compounds have an MFE lower than -10 kcal/mol, which suggests that a search against a larger catalog would yield multiple candidates. As pointed out in the limitations, we do not provide further evidence of the effect of a drug binding the pseudoknot. Park et al. did so for SARS-CoV [18], and recently Kelly et al. did so for SARS-CoV-2 [11]. Several compounds in Table 3 would meet the criteria for orally useful drugs according to Lipinski's rule of five [14]. We highlight NSC 308835, ranked second by MFE, because it meets all of Lipinski's criteria. The next-best compound does not meet the criterion that molecular weight should be less than 500 g/mol, though only by a small margin. This does not preclude oral activity: NSC61610 was given orally, once a day, to mice in an experimental model of H1N1 influenza infection [12]. The treated mice had lower mortality, and their response was better than with Tamiflu after the sixth day of infection. However, that mechanism of action is unrelated to interactions with viral RNA, as NSC61610 acts as a modulator of the immune response.
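
The ranking and oral-drug screen discussed above can be sketched in Python. The sketch checks only the three Lipinski criteria whose inputs are listed in the text: at most 5 hydrogen bond donors, at most 10 acceptors, and molecular weight below 500 g/mol. The logP criterion of the rule of five is omitted because it is not among the annotation fields mentioned. Class and function names are hypothetical. The compounds in the example are invented, not entries of Table 3.

```python
from dataclasses import dataclass


@dataclass
class DockedCompound:
    name: str
    energy: float         # docking score in kcal/mol; more negative = stronger predicted binding
    hbond_donors: int
    hbond_acceptors: int
    mol_weight: float     # g/mol


def lipinski_failures(c: DockedCompound) -> list[str]:
    """Lipinski criteria (logP excluded) that the compound fails."""
    failures = []
    if c.hbond_donors > 5:
        failures.append("H-bond donors > 5")
    if c.hbond_acceptors > 10:
        failures.append("H-bond acceptors > 10")
    if c.mol_weight >= 500:
        failures.append("molecular weight >= 500 g/mol")
    return failures


def rank_by_energy(compounds: list[DockedCompound]) -> list[DockedCompound]:
    """Most negative docking energy (highest predicted affinity) first."""
    return sorted(compounds, key=lambda c: c.energy)


hypothetical = [
    DockedCompound("cmpd_A", -9.1, 2, 6, 310.0),
    DockedCompound("cmpd_B", -11.4, 3, 8, 512.6),
    DockedCompound("cmpd_C", -10.2, 1, 5, 288.3),
]
for c in rank_by_energy(hypothetical):
    print(c.name, c.energy, lipinski_failures(c) or "passes")
```

Ranking and filtering are kept separate so that a compound narrowly failing one criterion, like the next-best compound above, stays visible rather than being silently dropped.
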

In conclusion, we claim to have found a relevant sequence in the 5'UTR region of SARS-CoV-2. It shows fairly suggestive traits of an important role, either through frameshifting or through another mechanism. Its remarkable conservation among current SARS-CoV-2 isolates strongly supports a biological role for this sequence. Our analysis of the druggability of this sequence is weakened by the inconsistent predictions of bioinformatic tools. It is, however, very likely that a strong structure of