Sequence Conservation Analysis and Screening of Ayurvedic compounds against SARS-CoV-2 Spike protein

Recent infections caused by the novel coronavirus (SARS-CoV-2) have led to global panic and mortality. Here, we analyzed the spike (S) protein of this virus using bioinformatics tools. We aimed to determine relative changes among different coronavirus species over the past two decades and to understand the conservation of the S-protein. Representative sequences of coronaviruses were collected from humans and other animals between 2000 and 2020. Evolutionary analyses found that the S-protein did not evolve overnight but rather continuously over time. Virtual screening of the S-protein against a phytochemical database of Ayurvedic medicinal compounds (n = 2,103) identified S-protein inhibitors. The top-ranked compounds were Gingerol (listed in the database as 4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols), 1-(5-Butyltetrahydro-2-furanyl)-2-hexacosanone, and Ginsenoyne N, a ginseng compound that stimulates caspase-3, caspase-8, and the immune system. Gingerol is found in fresh ginger and has a reputation as a potent antiviral. These compounds might prove useful for designing drugs against COVID-19.


Introduction
The number of global infections due to the novel coronavirus (SARS-CoV-2) exceeded 22,262,946, with more than 784,107 deaths, as of August 20, 2020, and these figures continue to change ("Coronavirus (COVID-19) -Google News," n.d.). The virus was initially transmitted to humans from other animals and then spread via person-to-person contact. It is not considered airborne; aerosol and contact-based transmission are the usual modes of spread. The mechanism behind its rapid transmission, and why the same viral strain causes death in some people but only mild symptoms in others, remain obscure. Coronaviruses (CoVs) belong to the subfamily Coronavirinae, which together with Torovirinae is grouped into the family Coronaviridae (Belouzard, Millet, Licitra, & Whittaker, 2012). CoV infections have recurred from time to time across various geographic regions; CoVs are responsible for almost 30% of all respiratory infections in humans and other animals and cause great economic loss. Alpha and beta CoVs mostly target human hosts, while the remaining genera are the gamma and delta CoVs (https://talk.ictvonline.org/taxonomy/). The CoV RNA genome is among the largest of known RNA viruses, and CoVs have high zoonotic potential for recombination and for infecting new animal hosts, including humans (Lai, Shih, Ko, Tang, & Hsueh, 2020). Viral sequences are under continuous pressure to break through host barriers. Frequent interaction between humans and other animals provides an extensive trial-and-error environment in which viral pathogens can cross from one host to another, resulting in the emergence of new pathogens (Dolja & Koonin, 2018).
The evolutionary basis of CoVs has made devastating comebacks possible. For example, the severe acute respiratory syndrome coronavirus (SARS-CoV) outbreak of 2002 (Drosten et al., 2003; Holmes & Rambaut, 2004) infected over 8,000 people, with varying morbidity and mortality rates (Guarner, 2020). A new strain that emerged in the Middle East in 2012, Middle East respiratory syndrome coronavirus (MERS-CoV), killed over 780 people and is thought to have arisen through human contact with dromedary camels and their products and by-products (Reusken et al., 2013). In 2013, porcine epidemic diarrhea coronavirus (PEDV), with a 100% fatality rate, killed 10% of the total pig population in the USA (Mole, 2013; Chen et al., 2014). Within a decade of the PEDV epidemic, coronaviruses re-emerged in the form of SARS-CoV-2, which causes COVID-19 with pneumonia-like symptoms. Such situations are becoming increasingly difficult for authorities to manage. The epicenter of this virus is thought to be an animal market in Wuhan (Chang, Lu, Chen, Jin, & Yang, 2012). SARS-CoV-2 has a ~29 kb genome with a GC content of 38%, and its RNA encodes various proteins, including the structural spike (S), membrane (M), and envelope (E) proteins (Enjuanes, Almazán, Sola, & Zuñiga, 2006; Fehr & Perlman, 2015). The CoV S-protein is a glycoprotein that mediates pathogenicity in hosts by interacting with various cellular receptors and enabling cell invasion (Kwak, Song, Lee, & Schiefelbein, 2015). S-proteins vary among viral types, ranging from 1,160 to 1,400 amino acids, and facilitate viral entry into cells by interacting with various receptors (Belouzard et al., 2012). S-proteins consist of an N-terminal domain, an S1 receptor-binding region, and a C-terminal S2 domain. The S2 domain mediates fusion of the viral envelope with the host cell membrane (Bosch, van der Zee, de Haan, & Rottier, 2003; Millet & Whittaker, 2015).
In some viruses, the protein is cleaved during viral maturation and exocytosis, which gives rise to various differences among CoV isolates. The S-protein is a class I fusion protein with an α-helical structure that forms coiled-coils similar to those of the influenza hemagglutinin protein (HA) (Bosch et al., 2003; Xu et al., 2004).
Receptor-mediated intracellular entry varies among viral strains. Aminopeptidase N (APN) is used as a receptor by various α-CoVs, HCoV-NL63 and SARS-CoV interact with angiotensin-converting enzyme 2 (ACE2) to mediate cellular entry, and dipeptidyl peptidase 4 (DPP4) confers invasive capability on MERS-CoV (Fehr & Perlman, 2015). Here, we used a species-based analysis to examine the evolutionary changes in the S-protein that allow the virus to penetrate host barriers and cause infection. S-protein modifications and other mutations drive protein evolution that helps viruses to invade new species; the viruses then undergo further modifications over several years that lead to entry into, and pathogenesis in, further new hosts. In this study, we used an in silico virtual screening approach to find potential drug candidates among natural compounds, screening a database of Ayurvedic phytochemicals to identify inhibitors that might act against the S-protein. Our findings will increase understanding of the mechanisms of CoV infection and may lead to therapeutic interventions against it.

Conservation analysis
Homologous sequences and conservation profiles of the S-protein were analyzed using S-protein sequences from major human and animal viral species, obtained from the Virus Pathogen Database and Analysis Resource (ViPR) https://www.viprbrc.org/brc/home.spg?decorator=vipr. Sequences collected annually between 2000 and 2020 were filtered to remove short sequences. Sequences from several species were then selected by year and aligned pairwise using the ClustalW function of MEGA-X software (Kumar, Stecher, Li, Knyaz, & Tamura, 2018). Phylogenetic analyses were conducted using the maximum likelihood method. Accession numbers and aligned sequence data are provided in Supplementary File 1. Structural conservation analysis was performed using ConSurf (Ashkenazy, Erez, Martz, Pupko, & Ben-Tal, 2010), with multiple sequences aligned using ClustalW. Homologues were collected from UniRef90 using the HMMER algorithm (E-value, 0.0001; maximum identity, 95%; minimum identity, 35%; iterations, 1). Conservation scores were calculated using maximum likelihood and Bayesian methods.
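ConSurf derives conservation scores from per-site evolutionary rates (maximum likelihood or Bayesian, as stated above). As a much simpler illustration of what a per-position conservation profile looks like, the sketch below uses the Shannon entropy of each alignment column (0 bits means a fully conserved position); this is a swapped-in proxy, not the ConSurf method. It also shows the short-sequence filter, where the 1,000-residue cut-off is a hypothetical value chosen only because S-proteins span 1,160 to 1,400 amino acids.

```python
from __future__ import annotations

import math
from collections import Counter


def filter_short(sequences: list[str], min_length: int = 1000) -> list[str]:
    """Drop sequences shorter than min_length (hypothetical cut-off, not from the paper)."""
    return [s for s in sequences if len(s) >= min_length]


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy in bits of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r != gap]
    if not residues:
        return float("nan")  # all-gap column: conservation is undefined
    n = len(residues)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())


def conservation_profile(alignment: list[str], gap: str = "-") -> list[float]:
    """Per-column entropy of an MSA given as equal-length strings (0 = fully conserved)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment), gap) for i in range(length)]


if __name__ == "__main__":
    toy = ["MFVFL", "MFVFL", "MFIFV"]  # toy alignment, not real S-protein data
    print([round(h, 3) for h in conservation_profile(toy)])  # [0.0, 0.0, 0.918, 0.0, 0.918]
```

Entropy ignores substitution similarity (an I/V swap counts the same as an I/D swap), which is one reason rate-based tools such as ConSurf are preferred for real analyses.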

S-protein retrieval
The SARS-CoV-2 S-protein structure (PDB ID 6VYB; resolution, 3.20 Å) was retrieved from the Protein Data Bank (PDB), and initial preprocessing was done using AutoDock Tools (Morris et al., 2009).
Pockets were predicted using DoGSiteScorer (Volkamer, Kuhn, Rippmann, & Rarey, 2012) for all chains in the open conformation and for a single chain. Pockets formed jointly by multiple chains and pockets formed by an individual chain were analyzed to find differences in the ligand-accommodating pockets they offer. Pockets were also compared among the SARS-CoV-1, SARS-CoV-2, and MERS-CoV S-protein structures, retrieved from the RCSB PDB under IDs 5X58, 6VYB, and 5X5C, respectively.
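Comparing single-chain and multi-chain pockets requires a single-chain input file. A minimal sketch of that step, assuming a standard fixed-column PDB-format file (chain ID in column 22), is shown below; the function names are hypothetical, and in practice a structure library or a tool such as AutoDock Tools would normally be used.

```python
from __future__ import annotations

from collections import defaultdict


def split_chains(pdb_text: str) -> dict[str, list[str]]:
    """Group ATOM/HETATM records of a PDB-format text by chain ID (column 22)."""
    chains: dict[str, list[str]] = defaultdict(list)
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: chain ID at index 21, residue number at
        # indices 22-25, insertion code at index 26.
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 27:
            chains[line[21]].append(line)
    return dict(chains)


def residue_count(records: list[str]) -> int:
    """Count distinct residues, keyed by (chain ID, residue number, insertion code)."""
    return len({(r[21], r[22:26], r[26]) for r in records})


def chain_to_pdb(records: list[str]) -> str:
    """Serialize one chain's records as a minimal PDB text for single-chain analysis."""
    return "\n".join(records + ["TER", "END"]) + "\n"
```

For example, `split_chains(open("6VYB.pdb").read())["A"]` would give chain A, assuming a local file of that name and that chain A is the chain of interest. HETATM records (glycans, ions) are kept here and may need removing before pocket prediction.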

Ayurvedic medicine database screening against target S-protein
A total of 2,103 compounds were retrieved from the Ayurvedic medicine database (http://www.way2drug.com/). The compounds were screened for drug-like features relative to the reference drugs and tested against Lipinski's rule of five (Lipinski, 2004). Molecular Operating Environment (MOE v2019.0102) was used for docking and visualization. A reference scale was also produced to compare drug efficacy, with Ritonavir and Remdesivir as standards. Docking was carried out using the following parameters: placement = Triangle Matcher; rescoring 1 = London dG; refinement = Force Field; rescoring 2 = Affinity dG. Of the 2,103 compounds, around 500 were retained for docking after filtering on criteria such as Lipinski's drug-likeness, and the three top-ranked compounds were selected for further analysis. Selection was based on S-scores and root-mean-square deviation (RMSD) values. By default, MOE's built-in scoring function calculates a binding score (S value), which reflects the affinity of the ligand for the receptor, whereas RMSD compares the docked conformation with a reference conformation. The top selected compounds had higher S-values and lower RMSD values than the reference drugs and could therefore be developed as potential S-protein inhibitors (Tahir ul Qamar et al., 2016). ADMET analysis was performed using admetSAR 2.0 (http://lmmd.ecust.edu.cn/admetsar2/).
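The RMSD used to compare poses is RMSD = sqrt((1/N) Σ ||x_i − y_i||²) over N matched atoms. Because a docked pose and its reference are usually expressed in the same receptor frame, the sketch below computes it in place, without superposition. It assumes the atoms of both poses are listed in the same order and ignores symmetry-equivalent atoms, which dedicated docking software may handle differently; it is an illustration, not MOE's implementation.

```python
from __future__ import annotations

import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(docked: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """In-place RMSD (no superposition) between two poses with atoms in the same order."""
    if not docked or len(docked) != len(reference):
        raise ValueError("poses must be non-empty and have the same number of atoms")
    # Sum of squared coordinate differences over all atoms and all three axes.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(docked, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(docked))
```

Shifting every atom of a pose by 1 Å along one axis, for instance, gives an RMSD of exactly 1 Å.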

Conservation and phylogenetic evolution
In the present study, we analyzed the S-protein of SARS-CoV-2, which recently emerged as a new threat and rapidly became a global pandemic. The GC content of the SARS-CoV-2 S gene was 37.39%. GC content between 30% and 70% is generally considered acceptable; values outside this range can adversely affect transcriptional and translational efficiency. We analyzed a total of 114 amino acid sequences to infer year-by-year evolution. To infer evolutionary relationships with other species while allowing for the possibility of recombination, all species were included in a maximum likelihood analysis, with amino acid substitutions modeled using the JTT matrix-based model (Jones et al., 1992). Initial trees for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model, and the tree with the highest log likelihood (-100343.12) was selected (Fig. 1). Our findings indicate that SARS-CoV-2 evolved from bat coronaviruses and that it has evolved continuously. We also postulate that other recombinations are underway that might result in a new species.
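Two quantities in this paragraph can be illustrated directly: GC content, and the pairwise distances that seed the tree search. The sketch below computes GC content over unambiguous bases and a gap-excluding p-distance with a Poisson correction, d = -ln(1 - p). The Poisson correction is a deliberately simpler stand-in for the JTT model used in the analysis; it is shown only to make the idea of a corrected pairwise distance concrete.

```python
from __future__ import annotations

import math


def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous nucleotides (A, C, G, T/U)."""
    valid = [b for b in seq.upper() if b in "ACGTU"]
    if not valid:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    return 100.0 * sum(1 for b in valid if b in "GC") / len(valid)


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance d = -ln(1 - p); undefined for p >= 1."""
    if p >= 1.0:
        raise ValueError("p-distance too large for Poisson correction")
    return -math.log(1.0 - p)
```

For example, "ACDE" vs "ACDF" differ at one of four sites, so p = 0.25 and the corrected distance is about 0.288. JTT additionally weights substitutions by amino-acid exchangeability, which this sketch does not.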
SARS-CoV-2 S-protein potential drug pockets
Pocket analysis was performed to evaluate potential drug-binding sites and their volumes; for volume calculation, the protein coordinate file was submitted to the DoGSiteScorer web server (Volkamer et al., 2012). The server predicted a series of pockets, each described by several druggability parameters. The druggability score ranges from 0 to 1, and a pocket with a higher score is considered more druggable than the others. The pocket with the highest drug score (0.85) had a volume of 895.01 Å³ for the single chain, whereas multiple chains together formed a pocket with a volume of 4155.19 Å³ (Fig. 2). The combined pocket involves similar regions, with minor variations, in SARS-CoV-1 and MERS-CoV, and MERS-CoV shares a similar highest-scoring drug pocket. In contrast, the highest-scoring pocket of SARS-CoV-1 differs from those of SARS-CoV-2 and MERS-CoV. SARS-CoV-1 forms a channel-like pocket with a volume of 10587.05 Å³, the largest docking pocket; structurally, it resembles multiple Y shapes connected by S-shaped tunnels (Fig. 3). These differences in the ratio of pocket volume to surface area indicate that ligand interactions are likely to differ among the three viruses.
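The selection rule described above (prefer the pocket with the highest druggability score) can be sketched as follows. The `Pocket` record and the pocket labels are hypothetical and do not reproduce DoGSiteScorer's actual output format; only the score range (0 to 1, higher is more druggable) comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pocket:
    name: str          # hypothetical label, e.g. "P_0"
    drug_score: float  # druggability score in [0, 1]; higher = more druggable
    volume: float      # pocket volume in cubic angstroms


def rank_pockets(pockets: list[Pocket]) -> list[Pocket]:
    """Sort pockets from most to least druggable."""
    return sorted(pockets, key=lambda p: p.drug_score, reverse=True)


def best_pocket(pockets: list[Pocket]) -> Pocket:
    """Return the pocket with the highest druggability score."""
    if not pockets:
        raise ValueError("no pockets supplied")
    return max(pockets, key=lambda p: p.drug_score)
```

Using the reported volumes, the multi-chain pocket (4155.19 Å³) is about 4.6 times the volume of the best single-chain pocket (895.01 Å³).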

Virtual screening using Ayurvedic database
A virtual screening approach allows thousands of compounds to be screened with a small investment of time and helps in the discovery of novel drug compounds. Using known active drugs as references makes the approach more powerful, because druggable hits can be compared with already established drugs. In this case, compounds with one hydrophobic feature, one aromatic ring, one H-bond acceptor, and one H-bond donor were selected and given preference over others. Compounds were further screened using Lipinski's rule of five, and their number was reduced further by selecting only compounds that fulfilled at least three of its criteria. Lipinski's rule defines four criteria for a drug-like compound, each involving a multiple of five: log P ≤ 5, molecular weight ≤ 500 Da, no more than 5 H-bond donors, and no more than 10 H-bond acceptors. These properties are important for human pharmacokinetics (Lipinski, 2004; Yang et al., 2018). The best compound hits were energy-minimized with the MOE minimization algorithm using the MMFF94x force field and a gradient of 0.05. number of atoms, number of rings, H-bond acceptors, and H-bond donors) and negative for Ames mutagenicity.
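The relaxed Lipinski filter described above (keep compounds satisfying at least three of the four criteria) is sketched below. It assumes the descriptors have already been computed by a cheminformatics tool such as MOE; the `Descriptors` record and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    name: str
    mol_weight: float  # Da
    log_p: float       # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int


def rules_satisfied(d: Descriptors) -> int:
    """Number of Lipinski criteria (out of four) that the compound meets."""
    return sum((
        d.mol_weight <= 500,
        d.log_p <= 5,
        d.h_donors <= 5,
        d.h_acceptors <= 10,
    ))


def drug_like(compounds: list[Descriptors], min_rules: int = 3) -> list[Descriptors]:
    """Keep compounds meeting at least min_rules criteria (3 of 4, as in the text)."""
    return [c for c in compounds if rules_satisfied(c) >= min_rules]
```

Setting `min_rules=4` gives the strict rule of five; the threshold of three matches the screening described in this section.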
Absorption and distribution directly influence the metabolism, elimination, and overall effectiveness of a drug over its life cycle. Oral bioavailability is one key factor, and the blood-brain barrier (BBB), an endothelial cell barrier that prevents drugs from entering the brain, is among the most important factors governing drug distribution (Alavijeh, Chishty, Qaiser, & Palmer, 2005; Thomas et al., 2006). The drug-like compounds were positive for BBB penetration, bioavailability, human intestinal absorption (HIA), renal organic cation transporter (ROCT), Caco-2 permeability, and P-glycoprotein substrate. The top-hit compounds were non-toxic, non-inhibitors of CYP enzymes, non-carcinogenic, and non-mutagenic, and could serve as novel drug compounds potentially active against the S-protein of SARS-CoV-2. The three compounds, 4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols, 1-(5-Butyltetrahydro-2-furanyl)-2-hexacosanone, and Ginsenoyne N, were all within the acceptable range of ADMET parameters (Table 2).

Selection of drugs based on protein-ligand interactions
All library compounds showed some binding to the protein, but not every binding compound would act as a drug. To select potential drugs, the strength of binding was therefore analyzed for all interacting compounds. Binding strength was taken as directly proportional to the S-score, and compounds were ranked from highest to lowest S-score. This ranking was combined with a reference-based approach, in which well-known effective drugs served as standards for comparing the interactions of our drug-like compounds. Our top compound, 4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols, had an S-value of 6.8, whereas Remdesivir and Ritonavir had S-values of 6.4 and 5.4, respectively. Surface analysis showed that the top compound was deeply embedded in the binding pocket with good stability (Fig. 5).
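The ranking-plus-reference comparison can be sketched as below, using the S-values reported in this section. It assumes S-values are compared as the positive magnitudes quoted here (higher = stronger binding); raw MOE docking scores are typically negative, so the sort order would need flipping if those were used directly. Function names are hypothetical.

```python
from __future__ import annotations


def rank_by_s_value(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort compounds from highest to lowest S-value (higher = stronger binding here)."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def outranked_references(s_value: float, references: dict[str, float]) -> list[str]:
    """Names of reference drugs whose S-value is lower than the candidate's."""
    return [name for name, ref in references.items() if s_value > ref]


if __name__ == "__main__":
    references = {"Remdesivir": 6.4, "Ritonavir": 5.4}
    candidates = {"4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols": 6.8}
    for name, s in rank_by_s_value({**candidates, **references}):
        tag = "reference" if name in references else "candidate"
        print(f"{s:4.1f}  {tag:9}  {name}")
    print(outranked_references(6.8, references))
```

A candidate is reported as outperforming a standard only when its S-value strictly exceeds that standard's value.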
4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols, or simply Gingerol, is a phenolic phytochemical commonly found in fresh ginger that activates spice receptors. Gingerol is related to capsaicin and piperine, alkaloid compounds with different bioactivity pathways (Beltrán et al., 2013). Ginger has been used since ancient times in Chinese, Ayurvedic, and Tibb-Unani herbal medicine. It has been used nonspecifically to treat various unrelated conditions, including arthritis, inflammation, stomach problems such as diarrhea, fever, and parasitic infections, and its effectiveness has been specifically associated with Gingerol (Ali, Blunden, Tanira, & Nemmar, 2008). Gingerol has also been associated, through various pathways, with antiviral activity against SARS-CoV-1, human immunodeficiency virus (HIV), Ebola virus, and influenza A virus (Mbadiko et al., 2020). Together, these previous studies suggest that Gingerol could be an effective and comparatively safe drug option for treating COVID-19 patients, as it has low toxicity and has been used for centuries in medicinal traditions throughout the world.

Conclusion
In this study, the Ayurvedic medicine database was virtually screened for potential drug-like compounds that could inhibit SARS-CoV-2 binding to the host receptor. After screening a library of more than 2,000 compounds, 500 were chosen for docking based on various parameters. Three compounds ranked as top inhibitors of the SARS-CoV-2 S-protein, with stronger interactions than the standard drugs, and were further analyzed for ADMET properties. According to our findings, 4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols is a potential drug candidate for SARS-CoV-2, with the S-protein as its target. It should be evaluated in vitro and in vivo and added to the drug development pipeline in the near future.
Figure 1. Evolution of SARS-CoV-2. Phylogenetic tree of various sequences shows the avian origin of SARS-CoV-2.
Protein-drug interaction: the protein (purple) with the deeply embedded drug (orange), showing good interaction between protein and ligand and proper fitting in the binding pocket.