In silico analysis of RNA-dependent RNA polymerase of the SARS-CoV-2 and potentiality of the pre-existing drugs

Background: The alarming increase in the number of SARS-CoV-2 cases worldwide urgently demands far-reaching, effective strategies to win the battle against emerging as well as re-emerging diseases. Many research laboratories are conducting clinical trials with different drugs, some of which have appeared notably effective against this pandemic. Our aim is to investigate the potential of pre-existing drugs and to gain a clear understanding of their effects on the RNA-dependent RNA polymerase (RdRp) of SARS-CoV-2. Methods: Multiple sequence alignment (MSA) along with molecular phylogeny analysis was performed using homologous sequences to identify the mutations within RdRp and its evolutionary relationships. Based on the published literature, we chose eight drug molecules: Ribavirin, Tenofovir, Sofosbuvir, IDX-184, YAK, Setrobuvir, Remdesivir and Galidesivir. A series of molecular docking studies between the template RNA and the RdRp of SARS-CoV-2 was performed in the absence or presence of those drugs and of the cofactors nsp7 and nsp8. Results: The MSA identified 13 exclusive mutations within the RdRp of SARS-CoV-2, and the phylogeny reveals its close relation to Bat coronavirus RaTG13. The interaction affinities and interacting residues obtained from the systematic molecular docking study led to the conclusion that the chosen drugs can be only partially effective against this pandemic. Conclusion: Therefore, it is necessary to design inhibitors that act specifically against nsp12 and lower the binding affinity of nsp12 towards the RNA template. In this article we additionally focus on the nsp7-8 hexadecameric complex and suggest a list of amino acid residues of nsp12 and the nsp7-8 complex for developing far-reaching, effective drugs.


Introduction
Since its outbreak in December 2019 in Wuhan, Hubei Province, China, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2; formerly known as 2019 novel coronavirus or 2019-nCoV) had infected more than seventy lakh (7 million) people worldwide as of 18 July 2020, leading to more than 6.7 lakh (670,000) deaths across the globe (Source: https://www.worldometers.info/coronavirus/). Although the number of cases has recently declined in China, an exponential increase in the number of cases has been detected in countries such as the United States of America, Spain, Italy, France, the United Kingdom, Germany, India, South Korea and Iran. At present, the worst affected country is the United States of America, followed by Spain (Source: https://www.worldometers.info/coronavirus/). In many countries these numbers are underestimated, mainly due to the lack of screening and testing facilities. The most common symptoms of SARS-CoV-2 infection include fever, chills and shortness of breath, and the most affected organ is the lungs. Recently, the Centers for Disease Control and Prevention added new symptoms that can be linked with this viral infection. These include muscle pain, sore throat, loss of taste or smell, headaches and influenza-like symptoms, which often lead to the development of acute respiratory distress syndrome (ARDS) or even multiple organ failure [1][2][3][4][5][6].
SARS-CoV-2 belongs to a group of enveloped, positive-sense ssRNA viruses (60-140 nm in diameter) known as coronaviruses. These viruses earned their name from the characteristic crown-like projections on their surfaces [7][8][9]. To date, only four genera of these viruses have been identified, namely α, β, γ and δ. Among the human coronaviruses, HCoV-229E and NL63 fall under the α category, and MERS-CoV, SARS-CoV, HCoV-OC43 and HCoV-HKU1 fall under the β category [10][11][12][13]. Fourteen functional ORFs are predicted within the SARS-CoV genome, which include the replicase and protease genes and the spike (S), envelope (E), membrane (M) and nucleocapsid (N) genes (order of appearance: 5′ to 3′) [10,14]. The replicase gene (ORF1a) and protease gene (ORF1b) are known to encode polyprotein 1a (pp1a) and polyprotein 1ab (pp1ab), which are further processed by the papain-like protease (PLpro) and the chymotrypsin-like protease (3CLpro) to yield sixteen individual non-structural proteins (nsps). The pp1ab is thought to be translated through a frameshifting mechanism [10,15]. RACE-PCR and northern blotting techniques identified the leader sequence and the transcription regulatory sequence at the 5′ UTR and the highly conserved s2m motif at the 3′ UTR [16]. The trimeric spike glycoprotein (S protein) plays the most important role in coronavirus infection: it is the target of neutralizing antibodies, binds to the receptors and mediates membrane fusion, which finally leads to the entry of the viral particles. The S proteins of coronaviruses remain trimers in their pre-fusion and post-fusion states and hence can be categorized as typical class I viral fusion proteins. Protease cleavage is required to activate their fusion potential. For SARS-CoV and MERS-CoV, a two-step protease cleavage model has been proposed, which includes a priming cleavage between S1 and S2 and an activating cleavage at the S2′ site.
Coronaviruses may enter the target cells through the plasma membrane or via endocytosis, depending on the types of proteases available within the cell. Several proteases, such as human airway trypsin-like protease (HAT), transmembrane protease serine 2 and 4 (TMPRSS-2 and TMPRSS-4), trypsin and cathepsins, are known to cleave the S proteins of coronaviruses. Either the N-terminal or the C-terminal domain can serve as the receptor-binding domain, depending on the type of virus. Most coronaviruses use their C-terminal domain to bind to the receptors [17][18]. It has also been reported that SARS-CoV-2 uses human angiotensin-converting enzyme 2 (hACE2) as the receptor to enter human cells [19]. The replication of viral RNA from an RNA template is catalyzed by a special class of enzymes known as RNA-dependent RNA polymerases (RdRp or RNA replicase). Besides playing an important role in the early stages of evolution, this class of enzymes forms an integral part of the life cycle of RNA-containing viruses with no DNA stage. Although RdRps share structural similarities with DNA-dependent DNA polymerases and reverse transcriptases, their error rate during transcription is much higher. These higher error rates lead to genomic variation within viral populations. Through RNA recombination, however, viruses can repair these mutations, and genetic rearrangements may help them to acquire new genes, which may provide a selective advantage to the viral populations [20]. In general, viruses with segmented genomes favor reassortment of RNA segments. In contrast, RNA viruses with non-segmented genomes undergo recombination between two genomes, which involves the polymerase jumping between templates during RNA synthesis.
This RNA recombination mechanism can be considered similar to the generation of defective interfering RNA (a very common phenomenon among RNA viruses), except that RNA recombination has been found in only a limited number of viral species [21]. Only a few members of the picornaviruses and coronaviruses have been reported to undergo RNA recombination [22,23]. Several plant viruses, such as Brome Mosaic Virus and Cowpea Chlorotic Mottle Virus, have also been reported to undergo RNA recombination, but only in rare situations [24,25,21]. The viral replication and transcription processes of CoVs are mainly facilitated by a set of non-structural proteins (cleavage products of the viral polyproteins) and the RNA-dependent RNA polymerase (RdRp). Non-structural proteins 7 and 8 (nsp7 and nsp8) act as cofactors for the COVID-19 RdRp (also known as nsp12) and play an important role in the replication and transcription cycle of these viruses [26]. In brief, the structure of the SARS-CoV-2 RdRp consists of a polymerase domain (RdRp domain) and a unique N-terminal domain that adopts an architecture similar to the nidovirus RdRp-associated nucleotidyltransferase (NiRAN) [27,28]. In addition, the cryo-EM map revealed that an interface domain connects the polymerase domain and the NiRAN domain. Further details on the structure of the COVID-19 viral RNA-dependent RNA polymerase are provided in the article by Gao et al., 2020 [28]. Several recently published research papers provide a clear picture of the molecular pathogenesis of SARS-CoV-2 [10][29][30][31].
The alarming increase in the number of SARS-CoV-2 cases worldwide urgently demands far-reaching, effective strategies, including the development of effective vaccines and drugs, to win the battle against emerging as well as re-emerging diseases. The aim of our study is to investigate the potential of pre-existing drugs and to gain a clear understanding of their effects on the RNA-dependent RNA polymerase of SARS-CoV-2.

Retrieval of the sequences
The Zhang Lab database (freely accessible from https://zhanglab.ccmb.med.umich.edu/COVID-19/) was searched to identify the amino acid sequence of the RNA-directed RNA polymerase (RdRp) of SARS-CoV-2. Its homologous sequences were then obtained from a BLASTP 2.2.32 search (developed and maintained by the National Center for Biotechnology Information; freely accessible from https://blast.ncbi.nlm.nih.gov/Blast.cgi), with the retrieved RdRp protein sequence as the query [32]. In total, the ORF1ab polyproteins from SARS-CoV-2 and 28 different organisms (selection criterion: e-value < 0.01 in the BLASTP search) were collected and saved in FASTA format for further processing.
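The e-value selection criterion can be illustrated with a minimal sketch. It assumes that the BLASTP hits were exported in the standard tabular layout (`-outfmt 6`, in which the e-value is the eleventh column); the export format, the hit identifiers and all numeric values below are hypothetical and are not taken from our search.

```python
import csv
import io

EVALUE_CUTOFF = 0.01  # selection criterion stated in the text

# Hypothetical hits in BLAST tabular layout (-outfmt 6):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EXAMPLE_HITS = (
    "query_rdrp\thit_A\t96.5\t932\t33\t0\t1\t932\t4393\t5324\t0.0\t1890\n"
    "query_rdrp\thit_B\t41.2\t310\t170\t6\t400\t705\t4800\t5105\t2e-60\t215\n"
    "query_rdrp\thit_C\t28.0\t50\t34\t1\t10\t59\t100\t149\t3.5\t30.1\n"
)


def filter_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return (subject_id, evalue) pairs whose e-value is strictly below the cutoff."""
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        subject_id, evalue = row[1], float(row[10])
        if evalue < cutoff:
            kept.append((subject_id, evalue))
    return kept


selected = filter_hits(EXAMPLE_HITS)
for subject_id, evalue in selected:
    print(subject_id, evalue)
```

In this toy input, only the hit with an e-value of 3.5 is discarded; the retained identifiers would then be used to fetch the corresponding sequences in FASTA format.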

Multiple Sequence Alignment and Phylogenetic Analysis
Identification of the corresponding homologous regions among a set of input sequences provides useful information about the biological relationships among the sequences of interest.

Phylogenetic analysis
Viral genomes can be differentiated from other replicons by their survival strategies. A viral genome encodes all the information necessary to maintain and complete its infectious stages within its host, and viruses have a tripartite survival strategy. Viruses also differ from cellular organisms in their inability to maintain and replicate themselves independently [42]. In the 21st century, with increasing globalization, pathogens have also evolved to cope with new environments. Moreover, the evolution of host immune responses induces selection pressures, which result in unequal survival abilities even among concurrent strains. These developments have increased the scope of a new branch of science called "phylodynamics". Phylodynamics deals with the ways in which the phylogenetic properties and the epidemic dynamics of viruses are interconnected [43,44].
We conducted phylogenetic and sequence alignment analyses to investigate the relationships between the taxa. The organisms that form each sub-cluster within a cluster are likely to share homology, whereas differences in branch length among the taxa within a cluster, or among the taxa of different clusters, indicate differences in the mutations acquired during the course of evolution. The greater the branch length, the greater the number of mutations accumulated by the organism. In all cases, we found a very close relation between Bat coronavirus RaTG13 and SARS-CoV-2. A similar relationship has been found by various researchers, which supports the conclusion that SARS-CoV-2 is genetically very similar to RaTG13 (isolated from a bat in Yunnan in 2013) [45,46]. From the phylogenetic tree based on the entire length of the ORF1ab polyprotein, we observed that SARS-CoV-2 is homologous to Bat coronavirus RaTG13. Moreover, in this tree the branch length for Bat coronavirus RaTG13 is slightly greater than that for SARS-CoV-2, whereas in the other phylogenetic trees the branch length pattern for the two taxa is reversed (Figure 1 (A-C)). This suggests that, along the length of the ORF1ab polyprotein, the number of acquired mutations is slightly higher in Bat coronavirus RaTG13 than in SARS-CoV-2.
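The link between accumulated substitutions and sequence divergence can be illustrated with a simple proportion-of-differences (p-distance) calculation on aligned sequences. This is only a sketch: branch lengths in a phylogenetic tree are normally estimated under a substitution model, the p-distance is a cruder proxy that we use here purely for illustration, and the taxon names and toy sequences below are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


# Hypothetical toy alignment (not real coronavirus sequences).
toy_alignment = {
    "taxon_1": "MKVLDE-TIS",
    "taxon_2": "MKVLDEATIS",
    "taxon_3": "MRILEEATVS",
}

names = sorted(toy_alignment)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = p_distance(toy_alignment[first], toy_alignment[second])
        print(f"{first} vs {second}: {d:.3f}")
```

Taxa that differ at fewer aligned sites obtain a smaller distance and would be expected to cluster together with short branches, which is the pattern described above for Bat coronavirus RaTG13 and SARS-CoV-2.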
The multiple sequence alignment showed amino acid differences at 12 positions within the region of ORF1ab that corresponds to RdRp. At site 4495 (corresponding to the 90th amino acid of RdRp), valine is the most frequent amino acid. In some cases, however, valine has been replaced by leucine (Bat_coronavirus_RaTG13/1-7095 and SARS-CoV-2) or by isoleucine. In both cases the substitution is within the same amino acid group, i.e., neutral and non-polar. Site 4497 (corresponding to the 92nd amino acid residue of RdRp) is almost always occupied by glutamate (negatively charged, polar and hydrophilic), except in Bat_SARS-like_coronavirus/1-7092, where glutamate is replaced by glycine (non-polar, aliphatic). Moreover, in Bat_coronavirus_RaTG13/1-7095 and SARS-CoV-2, glutamate is replaced by aspartate. Both glutamate and aspartate belong to the same group. In some cases, we also found glutamate substituted by asparagine, a neutral, polar amino acid.
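Whether a substitution stays within the same amino acid group can be checked with a short sketch. The four-group classification below (non-polar, polar uncharged, negatively charged, positively charged) is one common textbook scheme and is an assumption of this example; other schemes place residues such as glycine, cysteine, tyrosine or histidine differently.

```python
# One common textbook grouping of the 20 standard amino acids (an assumption).
GROUPS = {
    "nonpolar": set("GAVLIMPFW"),
    "polar_uncharged": set("STCYNQ"),
    "negative": set("DE"),
    "positive": set("KRH"),
}


def group_of(residue):
    """Return the group name for a one-letter amino acid code."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_conservative(original, replacement):
    """True if both residues fall in the same physico-chemical group."""
    return group_of(original) == group_of(replacement)


# Substitutions described in the text for sites 4495 and 4497.
substitutions = [("V", "L"), ("V", "I"), ("E", "D"), ("E", "G"), ("E", "N")]
report = {f"{a}->{b}": is_conservative(a, b) for a, b in substitutions}
for change, conservative in report.items():
    print(change, "conservative" if conservative else "non-conservative")
```

Under this grouping, the valine-to-leucine, valine-to-isoleucine and glutamate-to-aspartate changes are conservative, whereas the glutamate-to-glycine and glutamate-to-asparagine changes cross group boundaries.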
Sites 4560 (corresponding to the 155th amino acid of RdRp) and 4576 (corresponding to the 171st amino acid of RdRp) are mostly occupied by aspartic acid (a negatively charged, hydrophilic amino acid) and isoleucine (non-polar), respectively, with some exceptions. Site 4594 (corresponding to the 184th amino acid of RdRp) is occupied by glutamine (polar, uncharged), except in some Bat_SARS_coronaviruses, where it is replaced by arginine, a charged and polar amino acid residue. Sites 4564 (corresponding to the 249th amino acid of RdRp) and 4671 (corresponding to the 266th amino acid of RdRp) are mostly occupied by isoleucine (non-polar, aliphatic), with some exceptions. Sites 4698 (corresponding to the 293rd amino acid of RdRp), 5016 (corresponding to the 611th amino acid of RdRp), 5042 (corresponding to the 637th amino acid of RdRp), 5048 (corresponding to the 643rd amino acid of RdRp) and 5212 (corresponding to the 804th amino acid of RdRp) are mostly occupied by threonine, threonine, valine, serine and lysine, respectively, with some exceptions. The detailed analysis of these mutated sites is shown in Figure 1(D). The variability of the amino acid characters within the polymerase domain between SARS-CoV-2 and its SARS counterparts may be the reason for the efficiency of the SARS-CoV-2 RdRp.
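The correspondence between ORF1ab sites and RdRp residue numbers can be sketched as a constant-offset conversion. The offset of 4405 used below is an assumption inferred from the pair 4495 and 90; it is not stated explicitly in the text, and gaps in the alignment could shift the numbering locally. Pairs that do not fit the assumed offset are only flagged for checking against the alignment and are not corrected.

```python
ORF1AB_TO_RDRP_OFFSET = 4405  # assumption inferred from the pair (4495, 90)

# (ORF1ab site, RdRp residue number) pairs exactly as given in the text.
STATED_PAIRS = [
    (4495, 90), (4497, 92), (4560, 155), (4576, 171),
    (4594, 184), (4564, 249), (4671, 266), (4698, 293),
    (5016, 611), (5042, 637), (5048, 643), (5212, 804),
]


def to_rdrp_position(orf1ab_site, offset=ORF1AB_TO_RDRP_OFFSET):
    """Convert an ORF1ab site to an RdRp residue number under a constant offset."""
    return orf1ab_site - offset


fits_offset = [pair for pair in STATED_PAIRS if to_rdrp_position(pair[0]) == pair[1]]
to_check = [pair for pair in STATED_PAIRS if to_rdrp_position(pair[0]) != pair[1]]

print("consistent with the assumed offset:", fits_offset)
print("to be checked against the alignment:", to_check)
```

A lookup of this kind makes it easy to move between the polyprotein numbering used in the alignment and the nsp12 numbering used in the docking results.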

Molecular docking studies
We defined the aim of our study on the basis of our analysis of several published reports. To fulfill this aim, we conducted multiple molecular docking studies with SARS-CoV-2-nsp12. The polymerase activity of SARS-CoV-2-nsp12 is activated by nsp7 and nsp8. To approximate the in vivo situation, we conducted molecular docking to investigate the RdRp activity of SARS-CoV-2-nsp12 in complex with the nsp7-8 hexadecamer. Our results showed that, within the SARS-CoV-2-nsp7-8-12 complex, the RNA template binds to the nsp7-8 complex with greater affinity than to SARS-CoV-2-nsp12. Previous reports showed that nsp7 takes part in polymerase activity and that nsp8 also possesses non-canonical RdRp activity [47]. This leads to the conclusion that nsp7 and nsp8 within this complex must carry an RNA-binding domain. The NTP entrance channel within nsp12 (formed by hydrophilic residues such as Lys545, Arg553 and Arg555 of Motif F) facilitates the entry of the incoming NTPs [48]. After the initial binding of the template or parental RNA to nsp7, the RNA is expected to mediate its entry into the active site of the nsp12 polymerase domain (formed by Motif A and Motif C), where synthesis of the new RNA strand takes place [28]. Moreover, the nsp7-8 complex has been reported to interact with nsp12 at Thr409, Lys411, Trp509, Gly510, Gly897 and Met899 of nsp12, which fall within the polymerase domain of the latter. Therefore, it may also be concluded that the nsp7-8 complex occupies the nsp12 polymerase domain, as a result of which the viral polymerase domain becomes unavailable to many drugs. The binding affinity of nsp12 alone for RNA is slightly greater (-328.84 with an RMSD value of 133.19) than the binding affinity of the nsp7-8-12 complex for the same RNA template (-317.09 with an RMSD value of 83.68), but the lower RMSD value indicates that the nsp7-8-12 complex has a better conformation for binding the template RNA.
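The comparison of the two docking results can be written out as a small sketch. The scores and RMSD values are those quoted above; the class and variable names are hypothetical, and the example assumes, as in the text, that a more negative score is read as stronger predicted binding and that a lower RMSD is read as a better conformation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DockingResult:
    receptor: str
    score: float  # docking score as reported; more negative is read as stronger binding
    rmsd: float   # RMSD value reported alongside the score


results = [
    DockingResult("nsp12", -328.84, 133.19),
    DockingResult("nsp7-8-12", -317.09, 83.68),
]

strongest = min(results, key=lambda r: r.score)    # most negative score
lowest_rmsd = min(results, key=lambda r: r.rmsd)   # smallest RMSD
score_gap = round(abs(results[0].score - results[1].score), 2)

print("most negative score:", strongest.receptor)
print("lowest RMSD:", lowest_rmsd.receptor)
print("difference in score:", score_gap)
```

The two criteria point to different receptors here, which is why the score and the RMSD are considered together in the interpretation above.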
Subsequent molecular docking studies were conducted to investigate the interactions between nsp12 and the chosen drugs (which might act as potential inhibitors of nsp12). These studies examined how the binding between nsp12 and the RNA template is altered when the drug is introduced to nsp12 prior to RNA binding. Our results showed that most of the drugs bind to residues within the polymerase domain of nsp12. IDX184 forms hydrogen bonds with Thr817, Leu819, Tyr831 and His872 of the nsp12 polymerase domain (as indicated by Chimera and PLIP), hydrophobic interactions with Lys807 and Tyr831, and pi stacking with Tyr831 (as indicated by PLIP). Ribavirin forms multiple hydrogen bonds with Tyr530 and Val535 of the nsp12 polymerase domain, as indicated by Chimera and PLIP, but no pi stacking or salt bridge interactions (numerous hydrogen bonds are also found outside the polymerase domain of nsp12). Sofosbuvir forms hydrogen bonds with Arg914 and Tyr915 of the nsp12 polymerase domain, as indicated by Chimera and PLIP; PLIP indicates additional hydrogen bonds between Sofosbuvir and Tyr915 and Glu919 of the polymerase domain. Sofosbuvir also forms multiple hydrophobic interactions with Tyr595, Tyr903 and Tyr915, pi stacking interactions with Tyr595 and a salt bridge with Arg914. The exceptions are Galidesivir and Tenofovir. Galidesivir forms hydrogen bonds with Asn52, Arg116, Lys121, Tyr217 and Asp218 of nsp12 and a pi stacking interaction with Tyr217, as indicated by PLIP, and no salt bridge interaction is found in this case. Tenofovir forms hydrogen bonds with Thr120 and Thr123 of nsp12, as indicated by both Chimera and PLIP, and with Cys53 and Cys54 of nsp12, as indicated by Chimera.
The Chimera results also indicated that, upon introduction of the chosen drugs (except Setrobuvir) to nsp12, the binding affinity between nsp12 and the RNA template is lowered only to a negligible extent, and the template can still form multiple hydrogen bonds with different amino acid sites within the nsp12 polymerase domain (Lys500, Ser501, Arg569, Lys577, Tyr689 and Tyr903). In the case of Setrobuvir, the binding affinity between nsp12 and the template RNA is also lowered (to a negligible extent), but, unlike with the other drugs, the template RNA forms hydrogen bonds with amino acid sites of nsp12 that are mostly positioned outside its polymerase domain (Tyr38, Asp40, Lys41, Thr76, Ser78, Asn79, Gly220 and Asp221). The molecular docking results also show that when Sofosbuvir binds to nsp12, it increases the binding affinity of nsp12 towards the RNA template. Moreover, the template RNA can still establish multiple contacts with the polymerase domain of nsp12. This preliminarily suggests that Sofosbuvir can no longer be treated as a potential inhibitor of nsp12.
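Whether a contact residue lies inside or outside the polymerase domain can be checked with a short sketch. The domain boundaries below (NiRAN 60-249, interface 250-365, polymerase 367-920) are not stated in this article; they are an assumption taken from the structural description by Gao et al. [28], and the function names are hypothetical.

```python
import re

# Assumed nsp12 domain boundaries (residue numbers), after Gao et al. [28].
DOMAINS = [
    ("NiRAN", 60, 249),
    ("interface", 250, 365),
    ("polymerase", 367, 920),
]


def residue_number(label):
    """Extract the residue number from a label such as 'Lys500'."""
    match = re.fullmatch(r"[A-Za-z]{3}(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    return int(match.group(1))


def domain_of(label):
    """Return the assumed domain of a residue, or 'unassigned' if outside all ranges."""
    number = residue_number(label)
    for name, start, end in DOMAINS:
        if start <= number <= end:
            return name
    return "unassigned"


# RNA-contact residues listed in the text.
contacts_other_drugs = ["Lys500", "Ser501", "Arg569", "Lys577", "Tyr689", "Tyr903"]
contacts_setrobuvir = ["Tyr38", "Asp40", "Lys41", "Thr76", "Ser78", "Asn79", "Gly220", "Asp221"]

for label in contacts_other_drugs + contacts_setrobuvir:
    print(label, domain_of(label))
```

Under these assumed boundaries, every contact listed for the other drugs falls within the polymerase domain, whereas none of the contacts listed for Setrobuvir does.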
Keeping the in vivo situation in mind, we also conducted another series of molecular docking experiments with our drugs of interest to investigate whether these drugs affect the RNA-binding ability of the nsp7-8-12 complex upon binding to it. Our results clearly indicated that whenever the nsp7-8-12 complex forms, the polymerase domain of nsp12 is protected by the cofactors and, as a result, the drugs are unable to access the nsp12 polymerase domain. Additionally, the template RNA is still able to bind to the nsp7 and nsp8 chains within the nsp7-8-12 complex through hydrogen bonds with Lys27, Arg21 and Glu23 of nsp7 and Arg75 and Gln73 of nsp8, as indicated by Chimera. PLIP also showed that the template RNA forms hydrogen bonds with Arg21 and Ser26 of nsp7 and Gln73, Arg75 and Arg80 of nsp8 within the nsp7-8-12 complex. None of these drugs inhibits the binding of the template RNA to the nsp7 and nsp8 chains within the nsp7-8-12 complex. Meanwhile, when Sofosbuvir was docked against the nsp7-8-12 complex and RNA was then docked against the resulting nsp7-8-12-Sofosbuvir complex, Sofosbuvir was observed to increase the binding affinity of the nsp7-8-12 complex for the RNA template, which indicates that this drug can no longer be treated as an inhibitor of nsp12.
Lastly, we performed a final series of docking studies in which nsp12 was complexed with each of the eight drugs individually, then allowed to interact with nsp7-8 and ultimately docked against the template RNA. We found that the RNA template is still able to bind to the nsp7 and nsp8 chains of the nsp7-8-12 complex, although the binding sites may vary for each of the selected drugs. In addition, the template RNA makes one contact at the Ala406 site of the nsp12 polymerase domain when Remdesivir is introduced to nsp12. The template RNA is also able to bind to the polymerase domain of nsp12 through hydrogen bonds with Thr402, Asn403, Gln408 and Gly670 when Remdesivir is introduced to nsp12 and the latter is then complexed with nsp7-8 prior to RNA binding, which allows polymerization by nsp12. Similarly, when Sofosbuvir is introduced to nsp12 before the formation of the nsp7-8-12 complex, the RNA template forms hydrogen bonds with the Sofosbuvir-bound nsp12 at Asp499, Lys500, Ser501, Tyr521, Arg569, Lys577, Asp684, Tyr689, Ser814, Arg836 and Tyr903 of the nsp12 polymerase domain, as indicated by Chimera and PLIP, and salt bridge interactions are observed between the RNA template and nsp12 at Arg513, Lys593 and Arg836 of nsp12. As a result, these drugs do not seem to alter the binding affinity of the template RNA for nsp12 when the latter forms a complex with its cofactor, the nsp7-8 complex. The results of the docking experiments are shown in Figures 2-4 and Supplementary Table 1a-f.

Phylogenetic analysis:
From the phylogenetic analysis based on the complete length of ORF1ab or on the mutable sites only, it is clear that SARS-CoV-2 is evolutionarily close to Bat coronavirus RaTG13. The branch lengths of the phylogenetic tree based on the full length of ORF1ab show that the total number of acquired mutations in Bat coronavirus RaTG13 is slightly higher than in SARS-CoV-2, and the occurrence of identical amino acids at the equivalent mutable positions along the entire length of ORF1ab in Bat coronavirus RaTG13 and SARS-CoV-2 reflects the evolutionary relatedness of these two taxa.
Upon considering the twenty-five selected mutable positions along the length of ORF1ab corresponding to the viral RdRp, we found that the total number of acquired mutations is highest in SARS-CoV-2, although the evolutionary relationship between Bat coronavirus RaTG13 and SARS-CoV-2 remains the same and out of these mutations observed along the length of RdRp, and amino acid changes within the

Molecular docking studies:
The series of molecular docking experiments clearly indicates that whenever the nsp7-8-12 complex forms, the polymerase domain of nsp12 is protected by the cofactors and, as a result, the drugs are unable to access the nsp12 polymerase domain. Additionally, the template RNA is still able to bind to the nsp7 and nsp8 chains within the nsp7-8-12 complex through hydrogen bonds with Lys27, Arg21 and Glu23 of nsp7 and Arg75 and Gln73 of nsp8, as indicated by Chimera (Figures 2-4). PLIP also showed that the template RNA forms hydrogen bonds with Arg21 and Ser26 of nsp7 and Gln73, Arg75 and Arg80 of nsp8 within the nsp7-8-12 complex (Supplementary Table 1a-1f). None of these drugs inhibits the binding of the template RNA to the nsp7 and nsp8 chains within the nsp7-8-12 complex. Meanwhile, when Sofosbuvir was docked against the nsp7-8-12 complex and RNA was then docked against the resulting nsp12-7-8-Sofosbuvir complex, Sofosbuvir was observed to increase the binding affinity of the nsp12-7-8 complex for the RNA template, which indicates that this drug can no longer be treated as an inhibitor of nsp12.

Conclusion
From the above systematic studies, it is clear that the RdRp of SARS-CoV-2 is highly mutable in comparison with its homologous sequences and is evolutionarily close to that of Bat coronavirus RaTG13, which might also be of interest for drug design. Furthermore, we conclude that the drugs already treated as potential inhibitors of nsp12 do not lower the binding affinity of RNA for nsp12 to a significant level. These drugs seem to inhibit nsp12 only partially, and only when nsp12 does not form a complex with its cofactors. Therefore, it is necessary to design inhibitors that act specifically against nsp12 and are able to lower the binding affinity of the RNA template for nsp12. In addition, specific drug molecules have to be designed against the nsp7-8 hexadecameric complex that are able to bind to the residues of the nsp7-8 complex through which it binds to nsp12. We found that Asn69 of nsp7 interacts with Gly897 of nsp12, Cys72 of nsp7 interacts with Gly510 of nsp12, and Glu73 of nsp7 interacts with Arg513 of nsp12. Also, Arg96, Arg111, Asn136 and Met174 of nsp8 interact with Trp509, Leu900, Lys411 and Thr409 of nsp12, respectively, during the formation of the nsp7-8-12 complex. Blocking these sites of the nsp7-8 complex with specifically designed drug molecules would therefore alter the binding affinity of the nsp7-8 complex for nsp12. Designing specific drugs that bind to the sites of the nsp7-8-12 complex found to be responsible for RNA binding (Gln69 of nsp8, Trp29 of nsp7 and Arg21 of nsp7) may also prove fruitful in blocking the polymerase activity of the nsp7-8-12 complex.

Declarations
Compliance with Ethical Standards: This article does not contain any studies with animals performed by any of the authors.
Molecular docking between the nsp7-8-12 complex and RNA, considering that nsp12 first interacts with the drug in the successive interactions.