Conservation analysis and screening of Ayurvedic compounds against SARS-CoV-2 Spike protein


Recent infections caused by the novel coronavirus (SARS-CoV-2) have led to global panic and mortality. Here, we analyzed the spike (S) protein of this virus using bioinformatics tools. We aimed to determine relative changes among different coronavirus species over the past two decades and to understand the conservation of the S-protein. Representative sequences of coronaviruses were collected from humans and other animals between 2000 and 2020. Evolutionary analyses found that the S-protein did not evolve overnight, but rather continuously over time. We performed post-translational modification (PTM) analysis using online tools, and virtual screening of the S-protein against a phytochemical database of Ayurvedic medicinal compounds (n = 2103) identified candidate S-protein inhibitors. The top-ranked compounds were Gingerol (IUPAC name: 4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols), 1-(5-Butyltetrahydro-2-furanyl)-2-hexacosanone, and Ginsenoyne N from ginseng, which stimulates caspase-3, caspase-8, and the immune system. Gingerol is found in fresh ginger and has a reputation as a potent antiviral. These compounds might prove useful for designing drugs against COVID-19.

Receptor-mediated intracellular entry varies according to viral strain. Aminopeptidase N (APN) is used as a receptor by various α-CoVs, whereas HCoV-NL63 and SARS-CoV interact with angiotensin-converting enzyme 2 (ACE2) to mediate cellular entry, and dipeptidyl-peptidase 4 (DPP4) confers invasive capability upon MERS-CoV (Fehr & Perlman, 2015). Here, we used species-based analysis to examine the evolutionary changes in the S-protein that allow the virus to penetrate host barriers and cause infection. S-protein modifications and other mutations lead to protein evolution that helps viruses invade new species. Viruses then undergo further modifications over several years that lead to entry into, and pathogenesis in, additional new hosts. In this study, we used an in silico virtual screening approach, screening a database of Ayurvedic phytochemicals to identify natural compounds that might act against the S-protein. Our findings will increase understanding of the mechanisms of CoV infection and possibly lead to therapeutic interventions against it.

Conservation analysis
Homologous sequences of the S-protein and their conservation profiles were analyzed using S-protein sequences from the major human and animal coronavirus species, obtained from the Viral Pathogen Resource (ViPR; https://www.viprbrc.org/brc/home.spg?decorator=vipr). Sequences collected annually between 2000 and 2020 were filtered to remove short sequences. Thereafter, sequences from several species were selected by year and aligned using the ClustalW function of MEGA-X software (Kumar, Stecher, Li, Knyaz, & Tamura, 2018). Phylogenetic analyses were conducted using the maximum likelihood method. Accession numbers and aligned sequence data are provided in Supplementary File 1. Structural conservation analysis was done using ConSurf (Ashkenazy, Erez, Martz, Pupko, & Ben-Tal, 2010), with multiple sequences aligned using ClustalW. Homologues were collected from UNIREF90 using the HMMER algorithm (E-value, 0.0001; maximum identity, 95%; minimum identity, 35%; iterations, 1). Conservation scores were calculated using maximum likelihood and Bayesian methods.
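The length-filtering step described above can be illustrated with a minimal sketch. The source does not state the cutoff or the tooling used, so the function names, the toy sequences, and the `min_length` value below are assumptions for illustration only.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_short(records, min_length):
    """Keep only records whose sequence has at least min_length residues."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Toy input: headers, sequences, and the cutoff are hypothetical.
example = """>seq_A|2003
MFIFLLFLTLTSGSDLDRCTTFDDVQAP
>seq_B|2012
MIHSVF
>seq_C|2020
MFVFLVLLPLVSSQCVNLTTRTQLPPAY
"""
kept = drop_short(parse_fasta(example), min_length=20)
print([h for h, _ in kept])
```

In practice the cutoff would be chosen relative to the full-length S-protein so that partial entries are excluded before alignment.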

S-protein retrieval
The SARS-CoV-2 S-protein structure (PDB ID 6VYB; resolution, 3.20 Å) was retrieved from the Protein Data Bank (PDB), and initial preprocessing was done using the AutoDock Tools utility (Morris et al., 2009).
Pockets were predicted with DoGSiteScorer (Volkamer, Kuhn, Rippmann, & Rarey, 2012) both for all chains together in the open conformation and for a single chain. Pockets formed jointly by multiple chains and pockets formed by an individual chain were analyzed to find differences in the ligand-accommodating pockets they offer. Pockets were also compared among the SARS-CoV-1, SARS-CoV-2, and MERS-CoV structures, which were retrieved from RCSB under PDB IDs 5X58, 6VYB, and 5X5C, respectively.

Ayurvedic medicine database screening against target S-protein
About 2,103 compounds were retrieved from the Ayurvedic medicine database available at http://www.way2drug.com/. Compounds were screened for drug-like features and tested against Lipinski's rule of five (Lipinski, 2004). Molecular Operating Environment (MOE v2019.0102) was used for docking and visualization. A reference scale was also produced in order to compare drug efficacy; the standards were Ritonavir and Remdesivir. Docking was carried out using the following parameters: placement = triangle matcher, rescoring 1 = London dG, refinement = force field, rescoring 2 = affinity dG. Out of 2,103 compounds, around 500 were retained for docking after filtering on various criteria such as Lipinski's drug-likeness, and the three top-ranked compounds were selected for further analysis. Selection was based on S-scores and root-mean-square deviation (RMSD) values. MOE uses a built-in function that by default calculates the binding energy (S-value), which indicates the affinity of the ligand for the receptor. The RMSD value compares each docked conformation against a reference conformation.
The top selected compounds had higher S-values and lower RMSD scores than the reference drugs and could therefore be developed as potential inhibitors of the S-protein (Tahir ul Qamar et al., 2016). ADMET analysis was performed using admetSAR 2.0 (http://lmmd.ecust.edu.cn/admetsar2/).
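The selection rule stated above (a higher S-value and a lower RMSD than the reference drugs, then the top three) can be sketched as follows. This is not the MOE workflow itself: the `DockingResult` type, the `select_hits` helper, and every number below are hypothetical placeholders, and the sketch follows the text's convention that a higher S-value means stronger binding.

```python
from dataclasses import dataclass


@dataclass
class DockingResult:
    name: str
    s_value: float  # docking score; higher = stronger, following the text's convention
    rmsd: float     # RMSD of the docked pose in angstroms


def select_hits(candidates, references, top_n=3):
    """Return up to top_n candidates that beat every reference on both criteria."""
    best_ref_s = max(r.s_value for r in references)
    best_ref_rmsd = min(r.rmsd for r in references)
    passing = [
        c for c in candidates
        if c.s_value > best_ref_s and c.rmsd < best_ref_rmsd
    ]
    # Rank by S-value (descending), breaking ties by RMSD (ascending).
    passing.sort(key=lambda c: (-c.s_value, c.rmsd))
    return passing[:top_n]


# Placeholder values only; these are not results from the study.
refs = [DockingResult("ref_A", 6.4, 1.8), DockingResult("ref_B", 5.4, 2.0)]
cands = [
    DockingResult("cand_1", 7.0, 1.2),
    DockingResult("cand_2", 6.0, 0.9),
    DockingResult("cand_3", 6.9, 2.5),
    DockingResult("cand_4", 6.6, 1.0),
]
hits = select_hits(cands, refs)
print([h.name for h in hits])
```

Requiring a candidate to beat the best reference on both criteria is the strictest reading of the text; a looser variant could compare against each reference separately.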

Conservation and phylogenetic evolution
In the present study, we analyzed the S-protein of SARS-CoV-2, which recently emerged as a new threat and rapidly became a global pandemic. The GC content of the SARS-CoV-2 S gene was 37.39%. The ideal range of GC content is between 30% and 70%; peaks outside this range adversely affect transcriptional and translational efficiency. We analyzed a total of 114 amino acid sequences to infer annual evolution. To suggest evolutionary relationships with various other species while considering the possibility of recombination, all species were included in this analysis using the maximum likelihood method, and amino acid substitutions were corrected using the JTT matrix-based substitution model (Jones et al., 1992). Initial trees for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model, and the final tree was the one with the highest log likelihood (-100343.12) (Fig. 1). Our findings indicated that SARS-CoV-2 evolved from bat coronaviruses and that it has evolved continuously. We also postulate that other recombinations are underway that might result in a new species.
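As an illustration of the GC-content check mentioned above, a minimal sketch is given below. The function names are hypothetical, the 30% to 70% bounds are taken from the text, and the toy sequences are not the S gene.

```python
def gc_content(seq):
    """Return the GC percentage of a nucleotide sequence, ignoring ambiguous bases."""
    counted = [b for b in seq.upper() if b in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no unambiguous nucleotides")
    gc = sum(1 for b in counted if b in "GC")
    return 100.0 * gc / len(counted)


def windowed_gc(seq, window, step=1):
    """GC percentage in sliding windows, useful for spotting local peaks."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def in_preferred_range(percent, low=30.0, high=70.0):
    """True if a GC percentage lies within the range quoted in the text."""
    return low <= percent <= high


print(gc_content("ATGCATGC"))          # whole-sequence GC percentage
print(windowed_gc("GGCCAATT", 4, 4))   # two non-overlapping windows
print(in_preferred_range(37.39))       # value reported for the S gene
```

The windowed variant is included because the text refers to peaks outside the range, which implies a local rather than a whole-gene measurement; the window size used by the authors is not stated.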

SARS-CoV-2 S-protein potential drug pockets
Pocket analysis was performed to evaluate potential drug-binding sites and their volumes; for the volume calculation, the protein coordinates file was submitted to the online web server DoGSiteScorer (Volkamer et al., 2012). The server predicted a series of pockets, each described by various druggability parameters. The druggability score ranges from 0 to 1, and a pocket with a higher score is considered more druggable than the others. The pocket with the highest druggability score (0.85) had a volume of 895.01 Å³ for a single chain, while multiple chains together formed a pocket with a volume of 4155.19 Å³ (Fig. 2). The combined pocket involves similar regions, with minor variation, in SARS-CoV-2 and MERS-CoV, which share similarity in their highest-scoring potential drug pocket, whereas the highest-scoring binding pocket of SARS-CoV-1 differs from those of SARS-CoV-2 and MERS-CoV. SARS-CoV-1 forms a channel pocket with a volume of 10587.05 Å³, the largest docking pocket; structurally it resembles multiple Y shapes connected by S-shaped tunnels (Fig. 3). This difference in the ratio of pocket volume to surface indicates that ligand interactions would differ.
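To put the reported pocket volumes side by side, the short sketch below computes their ratios. Only the three volumes come from the text; the dictionary keys are descriptive labels chosen here, and the unit is assumed to be cubic angstroms.

```python
# Pocket volumes as reported in the text (assumed to be in cubic angstroms).
volumes = {
    "SARS-CoV-2 single chain": 895.01,
    "SARS-CoV-2 multiple chains": 4155.19,
    "SARS-CoV-1 channel pocket": 10587.05,
}


def volume_ratio(larger, smaller):
    """Ratio of two pocket volumes, larger over smaller."""
    return volumes[larger] / volumes[smaller]


multi_vs_single = volume_ratio("SARS-CoV-2 multiple chains", "SARS-CoV-2 single chain")
cov1_vs_multi = volume_ratio("SARS-CoV-1 channel pocket", "SARS-CoV-2 multiple chains")

print(f"multi-chain / single-chain: {multi_vs_single:.2f}")
print(f"SARS-CoV-1 channel / SARS-CoV-2 multi-chain: {cov1_vs_multi:.2f}")
```

The multi-chain pocket is roughly 4.6 times the single-chain pocket, and the SARS-CoV-1 channel pocket is roughly 2.5 times larger again.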

Virtual screening based on pharmacophore using Ayurvedic database
A virtual screening approach offers the advantage of screening thousands of compounds in little time and helps in the discovery of novel drug compounds. Comparing the best hits against reference drugs makes the approach more powerful, because druggable compounds can be compared with already known active drugs. In this case, compounds with one hydrophobic, one aromatic, one H-bond acceptor, and one H-bond donor feature were selected and given preference over others. Compounds were further screened using Lipinski's rule of five, and their number was reduced by selecting only compounds fulfilling at least three rules. Lipinski's rule sets criteria for a compound to be drug-like: logP < 5, molecular weight < 500 Da, no more than 5 H-bond donors, and no more than 10 H-bond acceptors. These properties are important for human pharmacokinetics (Lipinski, 2004; Yang et al., 2018). The best compound hits were energy-minimized with the MOE minimization algorithm using the MMFF94x force field and a gradient of 0.05.
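The Lipinski filtering step can be sketched as below, assuming the commonly cited thresholds (molecular weight < 500 Da, logP < 5, at most 5 H-bond donors, at most 10 H-bond acceptors) and the "at least three rules" cutoff mentioned in the text. The `Compound` type, the helper names, and the descriptor values are hypothetical; in a real workflow the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float  # molecular weight in Da
    log_p: float       # octanol-water partition coefficient
    h_donors: int      # number of H-bond donors
    h_acceptors: int   # number of H-bond acceptors


def lipinski_rules_met(c):
    """Count how many of the four Lipinski criteria a compound satisfies."""
    return sum([
        c.mol_weight < 500,
        c.log_p < 5,
        c.h_donors <= 5,
        c.h_acceptors <= 10,
    ])


def is_drug_like(c, min_rules=3):
    """Apply the 'at least three rules' cutoff described in the text."""
    return lipinski_rules_met(c) >= min_rules


# Made-up descriptor values for illustration only.
library = [
    Compound("toy_small", 300.0, 2.5, 2, 4),
    Compound("toy_lipophilic", 450.0, 6.2, 1, 3),
    Compound("toy_large", 820.0, 7.1, 8, 14),
]
retained = [c.name for c in library if is_drug_like(c)]
print(retained)
```

Allowing one violation keeps moderately lipophilic or heavy natural products in the library, which is why `toy_lipophilic` passes despite its logP.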

Molecular Docking
Molecular docking is widely accepted and of great importance in drug discovery. Recent studies show that the ACE2-binding sites are well conserved: about 8 of 14 are highly conserved. Binding interactions of the best S-protein hits (Table 1) were evaluated using the MOE Ligplot algorithm. The compounds bound near the binding sites of the SARS-CoV-2 S-protein; the compound with the highest S-value bound ASN331, THR333, SER530, and GLN580 and was considered a good inhibitory compound. Three compounds showing good features among all hits were selected for ADMET analysis, and their druggability profiles were generated from six physicochemical properties (molecular weight, AlogP, number of atoms, number of rings, H-bond acceptors, and H-bond donors); they were negative for Ames mutagenicity.

Administration and distribution directly influence the metabolism and elimination of a drug, and thus its overall life cycle and effectiveness. Oral bioavailability and blood-brain barrier (BBB) penetration are the most important factors influencing drug distribution; the BBB is an endothelial cell barrier that prevents drugs from entering the brain (Alavijeh, Chishty, Qaiser, & Palmer, 2005; Thomas et al., 2006). The drug-like compounds were positive for BBB penetration, bioavailability, HIA (human intestinal absorption), ROCT (renal organic cation transporter), Caco-2 permeability, and P-glycoprotein substrate status. The top hits were non-toxic, non-inhibitors of CYP enzymes, non-carcinogenic, and non-mutagenic. The selected compounds could serve as novel drug compounds potentially active against the S-protein of SARS-CoV-2. The three compounds, 4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols, 1-(5-Butyltetrahydro-2-furanyl)-2-hexacosanone, and Ginsenoyne N, were all within the acceptable range of ADMET parameters (Table 2).

Selection of drug based on protein-ligand interaction
All library compounds showed some binding to the protein, but not every binding compound will act as a drug. To select potential drugs, binding strength was therefore analyzed for all interacting compounds. Binding strength was taken to be directly proportional to the S-score, and compounds were ranked from highest to lowest. This approach was combined with a reference-based approach, using well-known effective drugs as standards against which to compare the interactions of our potential drug-like compounds.
Our top compound, 4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols, showed an S-value of 6.8, while Remdesivir and Ritonavir showed S-values of 6.4 and 5.4, respectively. Surface-based analysis of the top compound showed that it was well embedded in the binding pocket with good stability (Figure 5).
4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols, or simply Gingerol, is a phenolic phytochemical that is commonly found in fresh ginger and activates spice receptors. Gingerol belongs to the same family of compounds as capsaicin and piperine, which are alkaloids with different bioactivity pathways (Beltrán et al., 2013). Ginger has been used since ancient times in various Chinese, Ayurvedic, and Tibb-Unani herbal medicines. It has been used nonspecifically to treat various unrelated conditions, including arthritis, inflammation, stomach problems such as diarrhea, fever, and parasitic infections, and Gingerol has been specifically associated with this effectiveness.

Conclusion
In this study, an Ayurvedic medicine database was virtually screened for potential drug-like compounds that could inhibit SARS-CoV-2 binding to the host receptor. After screening a library of more than 2000 compounds, 500 were chosen for docking based on various parameters. Three compounds were further analyzed for ADMET, as they ranked as the top inhibitors of the SARS-CoV-2 S-protein, with stronger interactions than the standard drugs. According to our findings, 4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols is a potential drug for SARS-CoV-2, with the S-protein as its target. It needs to be evaluated in vitro and in vivo and added to the drug development pipeline in the near future.

2D plot of protein-ligand interactions with 4'-Me ether, 3,5-di-Ac 3,5-di-Gingerdiols (A), the top hit and the drug with the highest S-value, followed by 1-(5-Butyltetrahydro-2-furanyl)-2-hexacosanone (B) as second; Ginsenoyne N had the third-highest S-value.

Figure 5
The protein-drug interaction shows the drug (orange) deeply embedded in the protein (purple), indicating a good interaction between protein and ligand with a proper fit in the binding pocket.
