Decoding the structure of RNA-dependent RNA-polymerase (RdRp), understanding the ancestral relationship and dispersion pattern of 2019 Wuhan Coronavirus

Most recently, an outbreak of severe pneumonia caused by infection with 2019-nCoV, a novel coronavirus first identified in Wuhan, China, has posed a serious threat to public health. Upon infecting host cells, coronaviruses assemble a multi-subunit RNA-synthesis complex of viral non-structural proteins (nsp) responsible for the replication and transcription of the viral genome. Understanding the role of nsp12, the RNA-dependent RNA polymerase (RdRp), and how to inhibit it is therefore essential. Since no crystallographic structure of the 2019-nCoV RdRp is available, we present here the 3-dimensional structure of the 2019-nCoV nsp12 polymerase obtained using a computational approach. nsp12 of 2019-nCoV possesses an architecture common to all viral polymerases as well as a large N-terminal extension. This structure illuminates the assembly of the coronavirus core RNA-synthesis machinery, provides key insights into nsp12 polymerase catalysis and fidelity, and acts as a template for the design of novel antiviral therapeutics. An experimental structure could reveal the organization in greater detail. Furthermore, the ancestral state reconstruction suggests the possible evolution of nCoV in Wuhan, China, and its dispersal to the USA. Our analyses also suggest the possible dispersal of nCoV from the USA and Shenzhen back to Wuhan. This knowledge of the 3D architecture of 2019-nCoV nsp12, its ancestral relationships, and its dispersion pattern could help to design effective therapeutic candidates against the coronaviruses and robust preventive measures.


Introduction
The viruses of the family Coronaviridae are now notorious for their disease-causing capabilities in birds, humans and other mammals. The corona virion, typically composed of RNA enclosed in an envelope bearing glycoprotein spikes, is capable of infecting a broad range of hosts, including humans. As the number and diversity of variants in this family have increased, coronaviruses have been classified on the basis of similarity into four genera, designated alpha (α), beta (β), gamma (γ) and delta (δ) [1]. So far, the β coronaviruses (CVs) are known to cause infections in humans, including common colds, that primarily affect the respiratory system. Bats are associated with CV pandemics in the human population: they harbor the virus and are believed to be immune to the viral infection itself, which promotes the mutations that are crucial for CV pathogenicity [2]. The spike glycoprotein (S), which gives the virus its corona-like appearance, is vital for pathogenicity; it helps the virus attach to host cell surface receptors and also delimits the host range of the CVs [3].
The CVs genome, ranging from 27 to 32 kilobases, is a positive-sense single-stranded RNA (+ssRNA) that encodes ORF1a and ORF1b, the polyproteins involved in RNA polymerization (RNA-dependent RNA polymerase, RdRp) and in the modulation of host responses [4,5]. The zoonotic strains in this family that cause fatal disease are the severe acute respiratory syndrome (SARS) and the Middle East respiratory syndrome (MERS) coronaviruses [6]. Additionally, four more strains (229E, HKU1, NL63 and OC43) are reported to cause disease in humans, mainly common colds in individuals with immunodeficiency [4].
The 2019 novel coronavirus (2019-nCoV), which emerged in Wuhan in 2019, is a bat-derived member of the family Coronaviridae that has gained the capability of transmission from animals to humans and from human to human; this made 2019-nCoV so lethal and caused a global emergency [7]. The 2019-nCoV is an enveloped RNA virus with distinctive corona-like protein spikes (usually about nine to twelve nanometers) capable of attaching to host cells. The 2019-nCoV can cause "novel coronavirus-infected pneumonia" (NCIP), a disease of the lower respiratory tract with common cold-like symptoms, fever and chest congestion leading to difficulty in breathing [8].
The 2019-nCoV genome has 86.9% similarity to the genomes of bat SARS-like CVs and was classified as a distinct subclade in the subgenus Sarbecovirus, with the typical β-CVs genome organization [8].
The non-structural proteins (nsp) 1 to 16 of CoVs have vital roles in replication, although the functions of certain nsps remain elusive. The structural proteins are indispensable for viral assembly and infection; the spike (S) protein shows distinctive variations and helps in attachment to host cell surface proteins [10,11]. The M protein, which has transmembrane domains, binds to the nucleocapsid and shapes the virion [12,13]. The E protein is indispensable for viral pathogenesis and is responsible for virion assembly and budding [14,15]. The N protein comprises two domains capable of binding the viral genome and the nsp3 protein, triggering the replicase-transcriptase complex and viral genome encapsulation [16][17][18].
To elucidate the 3D structure of 2019-nCoV nsp12, we applied a computational modeling approach that gives insight into its domain architecture. Classifying the RdRp domain into fingers, palm, and thumb and decoding the conserved motifs in the N-terminal and RdRp domains provides better insight into the catalytic mechanism. Furthermore, we examined the ancestral relationships of the virus and its dispersion pattern. This knowledge of the 3D architecture of 2019-nCoV nsp12, its ancestral relationships and its dispersion pattern could help to design effective therapeutic candidates against the coronaviruses and robust preventive measures.

Materials and Methods
The amino acid sequence of the orf1ab polyprotein of the recent 2019-nCoV was retrieved from NCBI (https://www.ncbi.nlm.nih.gov/) [19] using accession number QHD43415.1. The RNA-dependent RNA polymerase (RdRp, nsp12) was extracted from the polyprotein using information from UniProt (https://www.uniprot.org/) [20]. The sequence was submitted to the Robetta server (http://new.robetta.org/) [21] for comparative modeling. Sequences submitted to the server are parsed into putative domains for structure prediction, and structural models are produced using either comparative modeling or de novo prediction methods. If BLAST, PSI-BLAST, FFAS03, or 3D-Jury finds a match to a protein of known structure, that structure is used as a template for comparative modeling. If no match is found, the structure is predicted using the Rosetta de novo fragment insertion protocol.
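The extraction of a mature protein from a polyprotein can be illustrated with a minimal Python sketch. It assumes the polyprotein is available as FASTA text and that the segment boundaries are known as 1-based, inclusive residue numbers; the function names, the toy sequence and the coordinates below are hypothetical placeholders and are not the actual nsp12 boundaries.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record ID to sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record ID.
            header = line[1:].split()[0]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def extract_segment(polyprotein, start, end):
    """Return residues start..end (1-based, inclusive) of a polyprotein."""
    if not 1 <= start <= end <= len(polyprotein):
        raise ValueError("coordinates fall outside the polyprotein")
    return polyprotein[start - 1:end]


# Toy record and coordinates, for illustration only.
toy_fasta = ">toy_polyprotein example\nMESLVPGFNE\nKTHVQLSLPV\n"
toy_sequence = read_fasta(toy_fasta)["toy_polyprotein"]
toy_segment = extract_segment(toy_sequence, 5, 10)
```

In practice the start and end positions would be taken from the chain annotation of the corresponding database entry rather than typed by hand.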
For structural validation, Ramachandran plot [22] servers were used to quantify the quality of the predicted models. Based on the validation scores, the top model was selected for further analysis.
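A Ramachandran plot is built from the backbone dihedral angles φ and ψ of each residue. The following minimal sketch, which is not the validation servers' implementation, shows how these angles can be computed from atomic coordinates: φ is the dihedral C(i-1)-N-CA-C and ψ is the dihedral N-CA-C-N(i+1). The function names and the toy coordinates are assumptions made for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0:
        raise ValueError("the two central points coincide")
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Return (phi, psi) in degrees for one residue from backbone atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy coordinates (not taken from the model), for illustration only.
phi, psi = phi_psi((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0))
```

Plotting the (φ, ψ) pairs of all residues and counting how many fall in the favoured regions gives the kind of quality score used to rank models.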
PyMOL [23] was used for structure visualization and analysis, including highlighting the domain architecture and conserved motifs. The electrostatic potential was calculated with the APBS [24] module in PyMOL. Multiple sequence alignment was performed using Clustal Omega [25].

Phylogeny-based historical biogeography of Coronavirus:
Information about the distribution areas and the sequences (obtained from databases in this study) is essential for inferring the historical biogeography of Coronavirus. Keeping the recent outbreak in mind, we divided the distribution range into 9 areas based on the availability of sequences. The

Structure Prediction and Validation:
The Robetta comparative modeling approach was used to model the 3D structure of the RNA-dependent RNA polymerase (RdRp) of the recent 2019-nCoV. The amino acid sequence of RdRp was extracted from the orf1ab polyprotein deposited in NCBI under accession number QHD43415.1. Five different models were generated from the amino acid sequence. The models generated by the Robetta server are given in Figure 1. All the models were subjected to structural validation using Ramachandran plot analysis.
The analysis from all the servers revealed that model 1 is the best model, and it was selected for further analyses. The initial analysis suggested that the 2019-nCoV RdRp sequence possesses 95.77% sequence identity with SARS-CoV nsp12 (PDB ID: 6NUR) and 86.27% sequence identity with 6NUS [27]. All the validation scores are given in Table 1. The superimposed structures of model 1 with 6NUR and 6NUS are given in Figure 2.
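Percent sequence identity depends on how the aligned positions are counted. The sketch below shows one common convention, assumed here for illustration and not necessarily the one behind the reported values: identical residues divided by the number of aligned columns in which neither sequence has a gap. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 4 identical residues out of 5 gap-free columns.
toy_identity = percent_identity("MKV-LA", "MKVQLS")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so the denominator should be stated whenever identity values are compared.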

Domains Architecture of RdRp:
The

RdRp (2019-nCoV) possesses two highly conserved metal-binding sites
Previous studies reported that RdRp enzymes have two metal-binding sites, each coordinated by four residues. The coordinating residues are histidines and cysteines. In the previously reported cryo-EM structure, the first site is coordinated by His295, Cys301, Cys306, and Cys310, while the second is in the fingers domain and is coordinated by Cys487, His642, Cys645, and Cys646. In our model, the same residues coordinate the two metal-binding sites. All eight metal-coordinating amino acids are reported to be highly conserved across RdRps. Both metal-binding sites are distal to the known active sites as well as to the protein-protein and protein-RNA interaction sites. Thus, rather than being directly involved in enzymatic activity, these ions are expected to be structural components of the folded protein. The presence of intrinsic zinc ions in nsp12 is reminiscent of the bound zinc atoms in coronavirus nsp3, nsp10, nsp13, and nsp14 and points to a common use of zinc ions in folding the proteins involved in viral replication. The two metal-binding sites are shown in Figure 4.

Electrostatic Potential and conservation analysis of RdRp:
As shown in Figure 5(A), the outer surface of the predicted model carries a mostly negative electrostatic potential. In contrast, a strong positive electrostatic potential was observed at the nucleotide triphosphate (NTP) binding site and the polymerase RNA template site. The RNA exit tunnel is comparatively neutral, and a relatively neutral electrostatic potential can also be observed at the nsp7 and nsp8 binding sites.
Furthermore, sequence conservation analysis using sequences from the coronavirus family reveals that the NTP tunnel, template entry, and primer exit tunnels are the most highly conserved surfaces on nsp12. The polymerase active site is also highly conserved (Figure 5(B)).
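The scoring scheme behind the conservation analysis is not specified here. As a hedged illustration of how per-position conservation can be derived from a multiple sequence alignment, the sketch below uses a Shannon-entropy score normalized by the maximum entropy of 20 amino acids; this scheme, the function name and the toy alignment are assumptions and not the method used in this study.

```python
import math
from collections import Counter

# Maximum possible entropy (in bits) for the 20 standard amino acids.
MAX_ENTROPY = math.log2(20)


def conservation_scores(alignment, gap="-"):
    """Return one score per column: 1.0 is fully conserved, lower is variable.

    Columns made up only of gaps receive None.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and equal in length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(None)
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values())
        scores.append(1.0 - entropy / MAX_ENTROPY)
    return scores


# Toy alignment: the first two columns are invariant, the third is variable.
toy_scores = conservation_scores(["SDD", "SDE", "SDA", "SDK"])
```

Scores of this kind can be mapped onto the structure, for example as a per-residue colour scale, to highlight conserved surfaces such as tunnels and active sites.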
A previous study reported that the nidovirus-unique N-terminal extension of nsp12 also has a conserved surface, which may reflect an interface site for the N-terminal disordered domain of nsp12.

The N-terminal extension conserved motifs
The 2019-nCoV nsp12 is ~931 amino acids long, which is in distinction to the polymerases of the

nsp12 RNA-dependent RNA polymerase (RdRp) domain & Catalytic mechanism
The RdRp region has been reported to have a shape like a right hand, with subdomains that include fingers, palm and thumb. Herein, we defined these subdomains with their respective residues

These ancestral reconstructions suggest three centers of diversity and expansion in coronaviruses: Wuhan is the primary center of diversity, whereas Shenzhen and the USA are the other two centers (Figure 8 and Table 3). Dispersal events occurred from Wuhan to Thailand, the USA, Shenzhen, Shanghai and Beijing. From the USA, dispersal took place to areas B, C, D, E and I (Europe). The dispersal from Shenzhen (C) is unique because very remote dispersals to Australia (A), Japan (G), the USA (H) and Europe (I) occurred via this area, as shown in Figure 9B.
As far as the recent new coronaviruses are concerned, they are nested within the bat SARS-like coronaviruses and probably evolved in the Wuhan (E) region. An early expansion occurred from Wuhan to the USA and Shenzhen. From the USA, recent dispersals are identified toward Wuhan and Shenzhen. Most of the long-distance dispersal events in 2019-nCoV took place from Shenzhen to the USA, Australia, Finland and Japan.

Discussion
Here we described the in-silico characterization of the 2019-nCoV RdRp (nsp12) and its interaction with nsp8 and nsp7 to initiate RNA synthesis and polymerization of the 2019-nCoV genome. A comparative modeling approach using Robetta was used to predict protein models from the 2019-nCoV sequence retrieved from the NCBI database. Five models were predicted; after structure validation using online servers and Ramachandran plots, the best-predicted model was selected and its sequence identity was compared with that of an experimentally determined structure of a similar protein. Further, the domain architecture of RdRp was characterized and divided into fingers, palm and thumb subdomains, and its interaction and binding with RNA were examined to describe the protein-protein and protein-RNA interactions. The position, binding, structural role and association of the zinc ions in the protein structure are also described. Computational methods are of great importance in determining the structure and function of proteins, drug binding, exploring resistance mechanisms and biocatalysis [28][29][30].
Our analysis is in accordance with the previously reported, nearly identical SARS-CoV protein, in which nsp12 interacts with the CoV RNA and the required stability is provided by other nsps, including nsp7 and nsp8. Further, nsp12 is also involved in template recognition. In addition to providing stability, nsp8 in SARS-CoV also plays a role in polymerization. Biochemical studies have also demonstrated the de novo synthesis ability of the nsp12, nsp8 and nsp7 complex. Given the high sequence identity with the previously reported SARS-CoV protein, our model is expected to work in the same way owing to the high amino acid conservation. The highlighted residues, important domains and conserved motifs will help to identify potent inhibitors and to control emerging infections related to the Coronaviridae family. The

Conclusion
In conclusion, this study decoded the important domains and motifs of the RNA-dependent RNA polymerase, which is important for viral replication, and provides a basis for designing novel potent inhibitors targeting the RdRp region. Furthermore, we examined the ancestral relationships of the virus and its dispersion pattern. Thus, this study is a significant consideration for future strategies against outbreaks caused by such viruses.

Conflict of Interest
The authors declare no conflict of interest.

Figure 1
Structure of RNA-dependent RNA polymerase of 2019-nCoV. The Robetta server generated five models.

Figure 2
The superimposed structure of model 1 (yellow) with 6NUR (green) (left) and 6NUS (cyan) (right). The RMSDs of the two superimpositions were 0.235 Å and 0.258 Å, respectively.

Figure 4
The figure shows the two metal-binding sites on the protein structure. Each metal is coordinated by four histidine and cysteine residues. The zinc ions are shown as blue spheres.