An Update on the Origin of SARS-CoV-2: Though Closest Identity, Bat (RaTG13) and Pangolin Derived Coronaviruses Varied in the Critical Binding Site and O-Linked Glycan Residues

Background The initial cases of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) occurred in Wuhan, China, in December 2019, and the virus had swept the world by 13 May 2020, with 4,179,479 active cases and 287,525 deaths across 215 countries, areas or territories. This strongly implies that global transmission occurred before the lockdown of China. However, the initial source and transmission routes of SARS-CoV-2 remain obscure and controversial. Research data suggest that bat (RaTG13) and pangolin-carried CoV were the proximal source of SARS-CoV-2. Methods In this study, we performed a systematic phylogenetic analysis of the Coronavirinae subfamily along with wild-type human SARS-CoV, MERS-CoV, and SARS-CoV-2 strains. The key residues of the receptor-binding domain (RBD), the O-linked glycan residues and angiotensin-converting enzyme 2 (ACE2) were compared. Results SARS-CoV-2 strains clustered with RaTG13 (97.41% identity), Pangolin-CoV (92.22% identity) and Bat-SL-CoV (80.36% identity), forming a new clade 2 in lineage B of beta-CoV. Alignment of the RBD residues that contact ACE2 showed that the SARS-CoV-2 strain sequences were 100% identical to each other but varied significantly from RaTG13 and pangolin-CoV. SARS-CoV-2 has a polybasic cleavage site with an inserted sequence of PRRA compared to RaTG13 and only PRR compared to pangolin. Only the serine (Ser) O-linked glycan residue was seen in pangolin, whereas both threonine (Thr) and serine (Ser) O-linked glycan residues were seen in RaTG13. Conclusion Although pangolin (Manis javanica) and bat (Rhinolophus affinis) related CoV are proximal to SARS-CoV-2, a detailed study is needed to confirm this.


Phylogenetic analysis and Protein sequences alignment:
For phylogenetic analysis, the full-length S protein sequences of SARS-CoV-2 isolates from 11 countries were compared with those of SARS-CoV, MERS-CoV, bat-CoV (RaTG13), Pangolin-CoV, bat-SL-CoV and previously published representative viruses of the Coronavirinae subfamily using the BLAST-EXPLORER program, which applies the neighbor-joining method with 1000 bootstrap replicates [13]. The resulting dendrograms were used to verify previously proposed genera assignments and to identify areas for clarification. Alignments of the RBD and O-linked glycan residue sequences of the SARS-CoV-2 strains, RaTG13, Pangolin-CoV, bat-SL-CoV and SARS-CoV were analyzed with MEGA-10 [14].
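The bootstrap step named above can be illustrated with a minimal Python sketch. It is not the BLAST-EXPLORER pipeline and it does not build a neighbor-joining tree. It only shows how alignment columns are resampled with replacement and how a simple p-distance matrix is recomputed for each replicate. The toy alignment, the function names, the use of p-distance and the "closer than the outgroup" comparison are all assumptions made for illustration; a real analysis would count how often a clade appears in trees rebuilt from the replicates.

```python
import random

def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0

def distance_matrix(alignment):
    """Pairwise p-distances keyed by (name, name)."""
    names = list(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n]) for m in names for n in names}

def bootstrap_replicate(alignment, rng):
    """One bootstrap replicate: sample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def support_for_pair(alignment, first, second, outgroup, replicates=1000, seed=0):
    """Fraction of replicates in which `first` is closer to `second` than to `outgroup`.

    This is a simplified stand-in for clade support, not a tree-based value.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        d = distance_matrix(bootstrap_replicate(alignment, rng))
        if d[(first, second)] < d[(first, outgroup)]:
            hits += 1
    return hits / replicates

# Hypothetical toy alignment, not real spike protein sequences.
toy = {
    "isolate_1": "MFVFLVLLPL",
    "isolate_2": "MFVFLVLLPV",
    "outgroup":  "MFIFLLFLTL",
}

support = support_for_pair(toy, "isolate_1", "isolate_2", "outgroup")
print(f"support for (isolate_1, isolate_2): {support:.3f}")
```

With only ten columns the support value is noisy; full-length S protein alignments give the resampling far more sites to draw from.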

Results:
Efforts to identify the reservoir of human CoV led to the discovery of diverse CoV that are genetically closely related. For the first time, we have constructed an S protein sequence-based phylogenetic tree with all the known Coronavirinae subfamily viruses, to better understand the current clustering of SARS-CoV-2, and classified them into the genera alpha, beta, gamma and delta CoV. To cross-check the viruses proximal to SARS-CoV-2, we chose the wild-type human CoV spike protein sequence for comparison with all species of CoV, along with the recently documented closest CoV (RaTG13 and Pangolin-CoV) [Figure 1]. The S protein sequences of the eleven isolates were nearly identical, with sequence identity above 99.70%, indicative of a very recent emergence into the human population; this is why we selected those eleven isolates rather than the mutated and variant strains being updated globally. The phylogenetic analysis showed that the eleven SARS-CoV-2 isolates clustered closely with their nearest neighbor RaTG13 (97.41% identity), pangolin-carried CoV (92.22% identity) and bat-SL-CoV (80.36% identity). Together these form a new clade 2 in lineage B of beta-CoV, while SARS-CoV (Urbani), which emerged in 2003, forms clade 1.
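As a rough illustration of how an identity figure such as "above 99.70%" can be derived, the sketch below computes pairwise percent identity over pre-aligned sequences and reports the lowest value in a group. The sequences are hypothetical toy strings, and the choice of denominator (all columns except those that are gaps in both sequences) is an assumption; alignment tools differ on this point, so values may not match those reported by a given program.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent of aligned columns with identical residues.

    Columns that are gaps in both sequences are ignored; a gap opposite
    a residue counts as a mismatch (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)

def min_pairwise_identity(alignment):
    """Lowest percent identity over all pairs in a dict of aligned sequences."""
    return min(
        percent_identity(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    )

# Hypothetical toy isolates (20 columns each), not real S protein sequences.
isolates = {
    "iso_a": "ACDEFGHIKLMNPQRSTVWY",
    "iso_b": "ACDEFGHIKLMNPQRSTVWF",  # one substitution in the last column
    "iso_c": "ACDEFGHIKLMNPQRSTVWY",
}

print(min_pairwise_identity(isolates))  # 95.0
```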

Discussion:
CoV are enveloped viruses with a non-segmented, positive-sense RNA genome ranging from 26 to 32 kilobases in length [21], and are divided into four genera: alpha, beta, delta and gamma. Evolutionary analyses have shown that bats, civets, camels, murine, canine, bovine and equine species and rodents are the gene sources of most alpha-CoV and beta-CoV, while avian species, whales and porcine species are the gene sources of most delta-CoV and gamma-CoV [22,23]. Prior to December 2019, six CoV had been described as pathogenic to humans [24]. In this study, for the first time we have constructed the phylogenetic tree with all the species of [...] the Wuhan seafood market, Huang C, et al. reported a total of 41 patients, of whom 14 were not related to the seafood market, and no trace of bats has been found, so the exact place of origin needs to be studied in detail [27]. Subsequently, Zhou P, et al. from the Wuhan Institute of Virology (Zheng Li Shi lab) showed that SARS-CoV-2 was highly similar throughout the genome to RaTG13, with an overall genome sequence identity of 96.2% and 93.1% nucleotide identity in the S gene. However, the authors did not mention when RaTG13 was sequenced, and RNA-dependent RNA polymerase (RdRp) data were not shown for comparison with SARS-CoV-2 [9]. RaTG13 was isolated from the bat (Rhinolophus affinis) on 24 July 2013 by the Zheng Li Shi group, and it is unclear why the sequence was not submitted until 27 Jan 2020, although it is proximal to bat-SL-CoV (accession numbers: AVP78042.1, AVP78031.1 and ACU31051.1) (Supplement-3). SARS-CoV (Rs806/2006) (accession number: ACU31051.1) has already been examined for intraspecies diversity and its implications for the origin of SARS coronaviruses in humans [28]. Hence, a detailed investigation of the RaTG13 isolate and its origin is needed.
Scientists have reported that genetic sequences of viruses isolated from pangolins are 99% similar to those of the COVID-19 strains [7,8,10,29]. Lam TT, et al. identified two sub-lineages of SARS-CoV-2-related CoV in Malayan pangolins, one of which exhibits strong similarity to SARS-CoV-2 in the RBD [30]. Zhang C, et al. assembled a draft genome of the SARS-CoV-2 using the metagenomic samples from the lung of Manis javanica, showing an overall coverage of 73% of COVID-19 strains with 91% sequence identity [31]. However, Li X, et al. concluded that the human SARS-CoV-2 virus did not come directly from pangolins, based on a unique peptide (PRRA) insertion seen in the human SARS-CoV-2 virus but not in pangolin-carried CoV [32]. Also, a study demonstrated that SARS-CoV-2 is not a purposefully manipulated virus, because its high-affinity binding to human ACE2, polybasic cleavage site and three adjacent predicted O-linked glycans are unique to SARS-CoV-2 and were not previously seen in lineage B beta-CoV [11]. Hence, we compared RaTG13 and pangolin-CoV with SARS-CoV-2 to provide an update and a better understanding.
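The PRRA insertion argument rests on reading a gapped pairwise alignment: residues in one sequence that sit opposite a run of gaps in the other are reported as an insertion. The sketch below shows one way to extract such runs, and to test for a minimal furin-like R-X-X-R motif. The two fragments are short illustrative strings written so that the reference lacks the four residues described in the text; they are assumptions for demonstration, not the deposited records, and the function names are hypothetical.

```python
import re

def insertions(ref_aln, query_aln):
    """List (ref_residues_before, inserted_residues) for each run of gaps in the reference."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("sequences must be aligned to the same length")
    found = []
    ref_pos = 0   # number of reference residues seen so far
    run = ""      # query residues collected opposite reference gaps
    for r, q in zip(ref_aln, query_aln):
        if r == "-" and q != "-":
            run += q
            continue
        if run:
            found.append((ref_pos, run))
            run = ""
        if r != "-":
            ref_pos += 1
    if run:
        found.append((ref_pos, run))
    return found

def has_minimal_furin_motif(seq):
    """True if the ungapped sequence contains R-X-X-R (any two residues between arginines)."""
    return re.search(r"R..R", seq.replace("-", "")) is not None

# Illustrative aligned fragments around a cleavage site (assumed, not database records).
reference = "QTQTNS----RSVASQ"
query     = "QTQTNSPRRARSVASQ"

print(insertions(reference, query))        # [(6, 'PRRA')]
print(has_minimal_furin_motif(query))      # True
print(has_minimal_furin_motif(reference))  # False
```

The R-X-X-R pattern is only the minimal consensus; a motif match on its own does not show that a site is actually cleaved.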
The RBD of the S protein in SARS-CoV-2 binds strongly to human, pangolin and bat angiotensin-converting enzyme 2 (ACE2) receptors [19,20,33]. Studies have confirmed that the S protein of SARS-CoV-2 uses ACE2, found in the lower respiratory tract of humans [1,9] and in certain other species (pangolin, civet, swine, cow, buffalo, goat, cat, sheep and pigeon), as its cellular entry receptor [34,35]. Liu Z, et al. indicated that, other than pangolins and snakes, turtles may act as potential intermediate hosts transmitting SARS-CoV-2 to humans, based on the key amino acid interactions between the RBD and ACE2 [36]. Choudhury A, et al. showed that SARS-CoV-2 is close to bat-CoV and binds strongly to the ACE2 receptor protein of both human and bat origin, and that TLR4 is most likely to be involved in recognizing molecular patterns from SARS-CoV-2 to induce inflammatory responses [37]. Data from one study support a natural origin of SARS-CoV-2, likely derived from bats and possibly transferred to pangolins before spreading to humans, and indicate that it is not an artificial CoV, including the chimeric SL-SHC014-MA15 [38]. Another study proposes that a unique cleavage motif promoting SARS-CoV-2 infection in humans may be under strong selective pressure, given that replication in permissive Vero-E6 cells leads to the loss of this adaptive function [39]. Overall, we demonstrate that the key residues of the RBD (455, 486, 493, 494, 501 and 505) and the polybasic cleavage sites vary significantly; these need to be studied in detail for a better understanding of cross-species transmission. A PubMed search showed that only three bat (Rhinolophus affinis) and five pangolin CoV sequences were available, and more CoV isolates are needed to verify the origin of RaTG13.
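Comparing the six RBD positions listed above across viruses requires mapping reference residue numbers onto alignment columns, because gaps shift the column index. The following sketch shows that mapping on synthetic data. The sequences are built programmatically from placeholder letters and carry no biological meaning; `build_toy`, the start offset of 450 and the inserted gap are assumptions chosen only to exercise the code.

```python
KEY_POSITIONS = (455, 486, 493, 494, 501, 505)  # residue numbers named in the text

def column_index(ref_aln, position, start=1):
    """Alignment column that holds reference residue number `position`.

    `start` is the residue number of the first non-gap reference character.
    """
    number = start - 1
    for col, residue in enumerate(ref_aln):
        if residue == "-":
            continue
        number += 1
        if number == position:
            return col
    raise ValueError(f"position {position} is outside the aligned reference")

def compare_sites(ref_aln, other_aln, positions, start=1):
    """Map each position to (reference residue, other residue) at the same column."""
    if len(ref_aln) != len(other_aln):
        raise ValueError("sequences must be aligned to the same length")
    report = {}
    for pos in positions:
        col = column_index(ref_aln, pos, start)
        report[pos] = (ref_aln[col], other_aln[col])
    return report

def build_toy(start, length, residues, background="G"):
    """Hypothetical helper: a filler sequence with chosen letters at chosen positions."""
    seq = [background] * length
    for pos, aa in residues.items():
        seq[pos - start] = aa
    return "".join(seq)

START = 450  # assumed numbering of the first residue in the toy fragment
ref = build_toy(START, 60, dict(zip(KEY_POSITIONS, "AAAAAA")))
other = build_toy(START, 60, dict(zip(KEY_POSITIONS, "AVAAVA")))

# Simulate a one-residue insertion in `other`, which opens a gap in the reference.
ref_aln = ref[:10] + "-" + ref[10:]
other_aln = other[:10] + "T" + other[10:]

report = compare_sites(ref_aln, other_aln, KEY_POSITIONS, start=START)
differing = [pos for pos, (a, b) in report.items() if a != b]
print(differing)  # [486, 501]
```

Reading columns directly by residue number, without this mapping, would compare the wrong sites for every position after the gap.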

Conclusion:
Although RaTG13 and pangolin-derived CoV are very proximal to SARS-CoV-2, the key receptor-binding and O-linked glycan residues vary significantly, except in one Malayan pangolin isolate (PRJNA573298), which has 100% identity. The polybasic cleavage site (PRRA insertion) was absent in RaTG13 and pangolin (PRJNA573298), whereas it is only PRR in other pangolin isolates, with unique amino acid changes within. Thus, animal studies and the isolation of CoV from pangolin (Manis javanica) and bat (Rhinolophus affinis) are necessary to help understand the origin and intermediate transmission of SARS-CoV-2.

The author received no specific grant from any funding agency.

Conflicts of interest:
The author declares that there are no conflicts of interest.
Ethical approval and Consent for publication: NA

Figure 1 Phylogenetic analysis of S protein of SARS-CoV-2 strains and representative viruses of the Coronavirinae subfamily. The first reported SARS-CoV-2 isolates from each country clustered closely with RaTG13 (97.41% identity), Pangolin-CoV (92.22% identity) and bat-SL-CoV (80.36% identity), forming a new clade 2 in lineage B of beta-CoV.