Computational Approach for Covalent Lead Identification against Spike Glycoprotein of SARS-CoV-2

Background: The global outbreak of coronavirus disease 2019 (COVID-19) led researchers to investigate various active compounds that can inhibit the replication of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2). The present work aims to evaluate, through a virtual screening and docking approach, small covalent synthetic molecules that can efficiently inhibit the spike glycoprotein of SARS-CoV-2.
Methods: We retrieved around 50,000 small covalent synthetic molecules from the CAS database (a division of the American Chemical Society). These molecules were first evaluated by ADMET screening. Lipinski's Rule of Five (RO5) was also applied to determine whether each compound met the criteria for good bioavailability. Further selection was made through virtual screening using BIOVIA Discovery Studio. The top hits were then compared via a docking approach based on the binding energy (kcal/mol) calculated using docking engines such as the AutoDock Vina plugin and PatchDock. Finally, the top five molecules were compared for binding efficiency with reference drugs such as Favipiravir, Chloroquine, Ribavirin, and Hydroxychloroquine (approved by the FDA), and molecules with better binding affinity than the reference drugs were selected.
Results: In the first tier of selection, 215 molecules satisfied all the necessary conditions of RO5 and ADMET. Of these 215 molecules, only 203 were structurally stable enough to undergo the second tier of target-based virtual screening. Based on the LibDock scores generated by virtual screening, the five molecules with the highest LibDock scores were selected. Molecular docking of these five compounds revealed compound 2 (3-ethyl-5-propyladamantan-1-amine) to have the best binding energy. Furthermore, comparison of its binding affinity with that of the reported drugs showed 3-ethyl-5-propyladamantan-1-amine to be the most promising ligand: it forms hydrogen bond interactions with amino acid residues of the protein, which provide greater stability in the docked region and a more favorable binding energy than the reference molecule. Moreover, compound 2 also has high oral bioavailability, is non-mutagenic and non-toxic, and follows all RO5 criteria.
Conclusion: Thus, it has potential as an antiviral covalent synthetic molecule that may prevent the replication of spike protein. These findings are only a preliminary selection intended to facilitate subsequent in vitro and in vivo studies.


Introduction
SARS-CoV-2 is the seventh coronavirus known to infect humans [1]. Coronaviruses are spherical particles consisting of an envelope studded with membrane (M) proteins, spike (S) proteins, and envelope (E) proteins. These particles are approximately 80 to 160 nm in diameter [2]. The coronavirus genome is a single positive-stranded RNA of 26 kb to 32 kb in length, the largest known genome of any RNA virus to date [3]. The genome has a cap at the 5' terminus, a polyadenylate tail at the 3' terminus, and six open reading frames (ORFs). Of these six ORFs, the first, located near the 5' end of the RNA, encodes 16 non-structural proteins that are involved in replication and viral transcription; the remaining ORFs encode the four main structural proteins (spike, envelope, membrane, and nucleocapsid) and eight additional proteins (ORF14, p6, 3a, 3b, 7a, 7b, 8b, and 9b) that are involved in the assembly of these particles [4].
Coronaviruses can cause several severe ailments in the digestive, nervous, and respiratory systems of animals and humans. These viruses belong to the Coronavirinae subfamily of the Coronaviridae family. Within the envelope, the genomic RNA and the phosphorylated nucleocapsid proteins form a nucleocapsid with a spiral structure [2]. Several studies have suggested that SARS-CoV-2 has a more stable genome than SARS-CoV. Overall, 149 mutational sites have appeared in SARS-CoV-2. SARS-CoV-2 is further categorized into two main subtypes, L and S. The L type is the more widespread and has many mutational sites [5]. SARS-CoV-2 enters human cells after binding to the ACE2 receptor [6].
In 2003, Guangdong Province saw the first occurrence of coronavirus disease. That virus is believed to have originated in bats and to have been transmitted to humans through civets, which are thought to be an intermediate host [7]. By the end of 2019, the first outbreak caused by SARS-CoV-2 occurred in Wuhan, Hubei [8]. Some studies have also suggested that SARS-CoV-2 might have originated from the recombination of a Bat-CoV-RaTG13-like virus with a Pangolin-CoV-like virus [9]. The World Health Organization (WHO) named the causative agent of the disease 2019-nCoV [10]. According to WHO statistics of May 2021, 216 countries were affected by coronavirus disease, with 153 187 889 confirmed cases and 3 209 109 deaths [11].
SARS-CoV-2 infection is transmitted through several routes: close contact, aerosol droplets in enclosed areas, urine, and possibly mother-to-child transmission (yet to be established). SARS-CoV-2 can also survive in feces. In experiments, transgenic mice expressing human ACE2 became infected after intragastric administration of SARS-CoV-2, indicating fecal-oral transmission [12,13]. The emergence of coronaviruses poses a significant threat to health and well-being globally. Generally, people between 30 and 69 years of age are more susceptible to COVID-19, while the infection rate is low in children [14].
Earlier studies had suggested a basic reproduction number (R0) of 2.9 [15,16]; however, in the latest report from the Chinese Center for Disease Control and Prevention and WHO, R0 was roughly estimated to be 2.68 for SARS-CoV-2 [17]. The spike protein, which mediates entry into the host cell, is the principal target of antibodies; it forms homotrimers that protrude from the surface of the virus. The membrane spike protein consists of two subunits, S1 and S2. The S1 subunit binds to the host cell's receptor (ACE2), and the S2 subunit mediates fusion of the cellular and viral membranes [18].
In previous studies on the development of antiviral drugs against species of the Coronaviridae family, many RNA-dependent RNA polymerase (RdRp) inhibitors did not prove to be very specific [19]. They have low potency and also cause side effects in infected patients. Although Remdesivir, an RdRp inhibitor, was identified as one of the most promising inhibitors against the coronavirus, it is not effective in every case [20]. The antimalarial drug Chloroquine has also been effective in treating coronavirus-infected patients but, like Remdesivir, it has its disadvantages [21]. Thus, to date, there are no specific clinically approved drugs for inhibiting SARS-CoV-2. Covalent synthetic organic molecules with adequate safety profiles and efficacy have proved to be promising inhibitors for treating several diseases [22]. All this available information strongly points to the design and development of new covalent molecules as therapeutic agents against SARS-CoV-2.
Target-based virtual screening approaches require the 3D structure of the target protein, onto which individual compounds are docked to calculate their binding energy scores using a series of scoring functions. Screening and docking drug compounds against target proteins can help find a potent therapeutic agent, with additional optimization of the molecules to finalize the drug candidate [23].
In the hope of designing a novel drug candidate against SARS-CoV-2, we used a computational approach to accelerate the identification of potential leads that could serve as therapeutic inhibitors for the treatment of this viral infection. The present study includes virtual screening, molecular docking, and comparative docking of the novel screened candidate against different antiviral drugs reported as reference inhibitors of SARS-CoV-2 targets. The compounds were screened based on their binding affinity for the target along with their ADME (Absorption, Distribution, Metabolism, and Excretion) profiles and other physicochemical properties. The selection was further supported by the non-mutagenicity and non-toxicity of the compound.

Materials And Methods
All the computational studies were performed on an HP Laptop-15 running Windows 10, with a 250 GB SSD and an Intel Core i5 (8th generation) processor.

Target retrieval, quality check, and processing
The cryo-EM structure of the SARS-CoV-2 spike glycoprotein in complex with inhibitor NAG (PDB ID: 6VXX) in the closed state, reported by Walls et al., was retrieved from the Protein Data Bank [24]. The quality of the protein structure was analyzed using ERRAT [25] and PROCHECK [26]. The ERRAT and PROCHECK servers report the overall quality of a protein structure through scores and plots. This 3D macromolecular structure was of Homo sapiens origin. Before this structure was used for further computational studies, it was processed by removing all the nonstandard residues, including water molecules, followed by the addition of hydrogen atoms and charges. BioVia Discovery Studio v 20 was used for target preparation [24].

Retrieval of Compound Library
A total of 49,430 antiviral inhibitors was retrieved from the CAS database (a division of the American Chemical Society) [27]. In addition, some reported inhibitors of SARS-CoV-2 were collected from PubChem (https://pubchem.ncbi.nlm.nih.gov/) [28]. To meet the file-format requirements of the different software packages, the 2D coordinates of the ligands were converted to three-dimensional (3D) coordinates using Open Babel [29].
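The 2D-to-3D conversion step can be sketched as a small Python wrapper around the Open Babel command-line tool. This is only an illustration, not the authors' script: it assumes the `obabel` executable is installed and on the PATH, and the file names are hypothetical. Open Babel infers the input and output formats from the file extensions.

```python
import subprocess


def build_obabel_command(input_path, output_path):
    """Return an Open Babel command that writes 3D coordinates to output_path."""
    # -O names the output file; --gen3d asks Open Babel to generate 3D coordinates.
    return ["obabel", input_path, "-O", output_path, "--gen3d"]


def convert_to_3d(input_path, output_path):
    """Run the conversion (requires the obabel executable on PATH)."""
    subprocess.run(build_obabel_command(input_path, output_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the command that is built.
    print(" ".join(build_obabel_command("ligands.smi", "ligands.sdf")))
```

Building the command separately from running it makes it easy to inspect or log the exact call before a large library is processed.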

Lipinski Rule of Five and ADMET Profiling
Before proceeding to molecular docking, the compound libraries underwent the first tier of screening according to the Lipinski Rule of Five and different ADME (Absorption, Distribution, Metabolism, Excretion) parameters. The screening also included non-toxicity and non-mutagenicity as criteria. This screening was performed using the pkCSM online server (http://biosig.unimelb.edu.au/pkcsm/prediction) [30]. All the compound files were uploaded in SMILES format to the pkCSM server, which evaluated different pharmacokinetic properties that are important for rational drug development.
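As an illustration of the Rule of Five tier, the following minimal sketch counts violations from precomputed descriptors. It is not the pkCSM implementation; the class name, field names, and example values are hypothetical. The default of zero allowed violations reflects the requirement here that compounds satisfy all RO5 conditions, whereas the rule as commonly stated tolerates one violation.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (field names are hypothetical)."""
    mol_weight: float      # molecular weight in Da
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float           # octanol-water partition coefficient


def ro5_violations(d):
    """Count how many of the four Rule of Five limits the compound exceeds."""
    checks = [
        d.mol_weight > 500,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
        d.log_p > 5,
    ]
    return sum(checks)


def passes_ro5(d, max_violations=0):
    """True if the compound has at most max_violations RO5 violations."""
    return ro5_violations(d) <= max_violations


if __name__ == "__main__":
    # Illustrative values only, not data from this study.
    small = Descriptors(mol_weight=221.4, h_bond_donors=1, h_bond_acceptors=1, log_p=3.5)
    large = Descriptors(mol_weight=650.0, h_bond_donors=6, h_bond_acceptors=12, log_p=5.6)
    print(passes_ro5(small), ro5_violations(large))
```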

Virtual screening
The selected molecules then underwent virtual screening against the chosen target (6VXX) using Biovia Discovery Studio 3.5 software to find potent SARS-CoV-2 inhibitors. Biovia Discovery Studio is software, used here under an academic license, for docking and simulating small and large molecule systems. The software generated several conformers for each compound, and the top 5 compounds were selected in this second tier of screening based on the best LibDock scores.
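The selection logic of this tier (keep the best-scoring conformer of each compound, then take the top five) can be sketched as follows. This is a minimal illustration in Python rather than the Discovery Studio workflow itself; the function names, compound names, and scores are hypothetical, and higher LibDock scores are treated as better.

```python
def best_score_per_compound(poses):
    """Keep the highest score seen for each compound across its conformers."""
    best = {}
    for name, score in poses:
        if name not in best or score > best[name]:
            best[name] = score
    return best


def top_hits(poses, n=5):
    """Return the n compounds with the highest best-conformer score."""
    best = best_score_per_compound(poses)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical (compound, score) pairs, one entry per docked conformer.
    poses = [("cmpd_a", 90.0), ("cmpd_a", 101.5), ("cmpd_b", 95.2), ("cmpd_c", 80.1)]
    for name, score in top_hits(poses, n=2):
        print(name, score)
```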

Molecular Docking
Docking studies were performed to calculate the binding energy and to prepare topology parameters and primary coordinates for molecular dynamics simulation studies. The screened compounds underwent molecular docking using offline software, namely the AutoDock Vina plugin in Chimera [31] and AutoDock 4.2 [32]. In addition, an online tool, PatchDock [33], was used to calculate the binding energy. Since PatchDock and the AutoDock Vina plugin use different procedures to find the binding energy of the inhibitors, using both methods improves the credibility of the docking results. After this third tier of screening, the best-docked compound was selected based on the binding energy scores (kcal/mol).
Chimera is freely available offline software that offers advanced visualization, usability, and performance. First, we pre-processed the target file containing chain A in "pdb" format through the Dock Prep option by removing inhibitor NAG and water molecules and adding hydrogens and charges. Then the ligand files (reported drugs and top 5 screened molecules) were uploaded in sdf format. Next, we formed the grid box by setting x=200, y=213, and z=198. Finally, both structures were docked, and the binding energy was calculated (kcal/mol).

Autodock 4.2 (with Vina plugin)
The selected compounds and the target containing chain A were generated in 3D coordinates and saved in pdbqt format. The grid parameters of the receptor, with x=200, y=213, and z=198, were generated using the AutoDock 4.2 tools. Docking was run with an in-house script that incorporates the Vina program, and the results were analyzed. We retrieved the binding scores (kcal/mol) to rank the top docked molecules.
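The in-house script is not reproduced in this paper. As a hedged sketch of how the grid values might be passed to Vina, the Python helper below writes a Vina-style configuration file. Treating x=200, y=213, z=198 as the box centre and using a 40 × 40 × 40 box (the dimensions mentioned later in the discussion) are assumptions made for illustration, as are the function name and file names.

```python
def vina_config(receptor, ligand, center, size, out="docked.pdbqt"):
    """Build the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed interpretation: (200, 213, 198) is the box centre, 40 x 40 x 40 the size.
    text = vina_config("target_chainA.pdbqt", "ligand.pdbqt", (200, 213, 198), (40, 40, 40))
    with open("vina_config.txt", "w") as handle:
        handle.write(text)
```

A file written this way would typically be passed to the Vina executable through its `--config` option.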

Patch Dock
We uploaded the target and the compounds to be docked in "pdb" format to the PatchDock server (a geometry-based online docking tool) and, keeping all parameters at their defaults, analyzed the docked complex. The results were assessed based on the best atomic contact energy (ACE) in kcal/mol. The whole docking process was carried out as blind docking.

Visualization of 2D interaction diagram
The 2D interactions of the best-docked compound with different amino acids of the target, such as hydrogen bonds and hydrophobic interactions like Pi-Sigma, were also analyzed using the 2D interaction pose option of Biovia Discovery Studio 3.5.

Retrieval of Target Protein structure
The 3D structure of the SARS-CoV-2 spike glycoprotein was retrieved from RCSB PDB, as shown in Fig. 1. This spike glycoprotein structure has a sequence length of 1281 aa with no mutations. It consists of chains A, B, and C and is classified as a viral protein, with a resolution of 2.80 Å. The NAG molecule is associated with this PDB structure as a heteroatom to provide stability.

Validation of Target protein structure
After retrieving the 3D protein structure in PDB format, we validated the structure using the ERRAT and PROCHECK servers. ERRAT is an online server that assesses a structure based on the statistics of non-bonded interactions between different types of atoms. The overall quality factor of our protein structure is 84.658 (Fig. 2), which is acceptable. Next, Ramachandran plot analysis of the target protein was carried out through PROCHECK (an online server). The submitted 3D structure revealed that 90% of the protein residues are in the most favorable regions, with 9.6% of residues in additional allowed regions and 0.1% in generously allowed regions. Only 0.2% of residues are present in the disallowed region.
The main chain stereochemical parameters of our protein structure showed 90% of residues lying in the favorable region, with a zeta angle standard deviation of 1.3, a hydrogen bond energy standard deviation of 0.7, and an overall G factor of 0.2. For the side chains, the Chi-1 gauche minus standard deviation is 6.0, the Chi-1 trans standard deviation is 8.4, the Chi-1 gauche plus standard deviation is 7.2, and the Chi-2 trans standard deviation is 8.6, as shown in Table 1a, b, and c. Fig. 3a, 3b, and 3c show the retrieved plots.

Lipinski Rule of Five and ADMET screening
We subjected the compound libraries retrieved from the CAS database to pharmacophore descriptor and ADMET screening, in which 215 molecules satisfied all the necessary conditions of RO5 and ADMET. These compounds also fulfilled the criteria of being non-toxic and non-mutagenic, as provided in Supplementary file 1. The Lipinski RO5 and ADMET values of the top docked ligand are shown in Table 2.

Structure-based Virtual Screening
After completing the first tier of screening, we submitted the selected compounds for virtual screening against the target protein structure using Biovia Discovery Studio. The protein structure was preprocessed by adding all missing side chains and removing the heteroatoms and water molecules present. Hydrogen atoms were then added to the target protein structure. Of the 215 molecules screened, only 203 were structurally stable enough to undergo the second tier of screening. All these compounds were prepared by converting them from 2D to 3D structures and adding hydrogen atoms. High-throughput screening was then performed, and the binding scores of these compounds against the target were extracted as LibDock scores (Supplementary file 2).
Molecular docking of four reported drugs was also performed against the target protein using the AutoDock Vina plugin in Chimera and the PatchDock server (Table 4). We selected the drug with the most negative binding score (kcal/mol) for further comparative study.

Visualization of docked compounds
The top docked compounds (from the CAS database and the reference molecules) were visualized in their respective active sites on the target, as shown in Fig. 5a and 5b. Their hydrogen bond and hydrophobic interactions with different amino acid residues of the protein were also visualized (Fig. 6a and 6b). Van der Waals interaction is the primary form of interaction in the complex. These interactions on the spike glycoprotein indicate that the compound has several binding regions. The interaction data serve only to explain and support the stability of the docked complex.

Comparison with the reference drug
We compared the binding efficacy of 3-ethyl-5-propyladamantan-1-amine with that of four reported drugs (taken as reference molecules). The results showed that 3-ethyl-5-propyladamantan-1-amine has better binding efficiency than the other reported drugs.
A comparative study was performed between 3-ethyl-5-propyladamantan-1-amine from the CAS database and Chloroquine (the top hit among the reference molecules) based on docking score and hydrogen bond interactions with amino acid residues of the protein, as shown in Table 5. This comparison helps identify the most potent drug against the target spike glycoprotein.

SARS-CoV-2 belongs to the Coronaviridae family of viruses and invades human cells via ACE2, a transmembrane protein expressed on the surface of alveolar cells of the lungs. The binding of the viral spike (S) protein to the ACE2 receptor on alveoli facilitates the entry of the virus into host cells and leads to the infectious disease COVID-19. To date, there are no recorded or approved drugs that can help ameliorate coronavirus infection. Scientists are therefore making constant efforts to improve vaccine effectiveness and to find the most potent drug. In the present study, high-throughput screening of around 50 000 small synthetic molecules was performed using a virtual screening and molecular docking strategy. The study suggested 3-ethyl-5-propyladamantan-1-amine as a potent drug for inhibiting spike protein interaction with ACE2 receptors.
In computational pharmacology, drug repurposing is a powerful approach for finding new uses for existing drugs. In silico analysis accelerates the search for desirable medicines among large numbers of compounds, a search that would otherwise be exhausting, time-consuming, and burdensome if carried out in a wet lab. In this study, around 50 000 small synthetic antiviral compounds were assessed for their potency against the spike glycoprotein (S) of SARS-CoV-2. To identify inhibitors of the spike protein-ACE2 interaction, we coupled docking with ADMET screening and virtual screening. The S protein densely populates the viral envelope and plays the leading role in pathogenesis by facilitating viral entry into host cells. Therefore, the spike protein is the main target for drug design and repurposing.

Validation of target structure
First, we validated the spike glycoprotein structure through the Ramachandran plot, which revealed that 90% of the residues lie in the most favorable region. Judged against the PROCHECK reference analysis of 118 structures with a resolution of at least 2.0 Å, this indicates that the target protein is of good quality. The stereochemical parameters of the target protein's main chains and side chains are also favorable, which implies that the structure is acceptable for further studies.

Drug Likeness screening
K. Zia et al. used the CAS database, consisting of approximately 50,000 compounds with known or potential antiviral activity, for their docking studies against coronavirus [34]. One method for assessing drug-likeness is the Rule of Five (RO5). According to RO5, active compounds violating more than one of the established criteria are unlikely to be suitable for oral administration [35]. Thus, before proceeding to docking studies, Lipinski's RO5 was applied to the compound library. The compounds used for virtual screening were first screened using RO5, so that the selected compounds have a molecular weight of no more than 500 Da, no more than five hydrogen bond donors, no more than ten hydrogen bond acceptors, and a ClogP no greater than 5. Evaluating drug-likeness and determining different chemical and physical properties is needed to establish whether an active compound is suitable for oral consumption in humans [36]. It also helps predict the success or failure of an active molecule having the biological or pharmacological activities needed for drug development. The RO5 also suggests that a compound with two or more violations is likely to have lower permeability or solubility [37].
Moreover, these compounds were also selected based on ADMET and mutagenicity using the online tool pkCSM. For absorption, a predicted Caco-2 permeability value >0.9 indicates high permeability, and compounds with a human intestinal absorption value <30% were removed because they are poorly absorbed. For distribution, the blood-brain barrier (BBB) parameter was used: compounds with log BB >0.3 cross the BBB, and those with log BB <-1 are poorly distributed to the brain. For metabolism, inhibition of cytochrome P450 enzymes, which affects drug metabolism, was also considered in the screening. For excretion, renal clearance was included. Non-toxicity (through the LD50 value), non-hepatotoxicity, and non-mutagenicity were also included as criteria for selecting the compounds [38]. The compounds that fulfilled mainly the non-toxicity and non-mutagenicity criteria were considered further for virtual screening and docking studies.
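The thresholds quoted above can be expressed as a small filter. The sketch below is only an illustration in Python, not pkCSM code: the function names and record keys are hypothetical, the example values are made up, and only the numeric cut-offs (0.9, 30%, 0.3, and -1) come from the text.

```python
def has_high_caco2_permeability(caco2):
    """Caco-2 permeability above 0.9 is read as high permeability."""
    return caco2 > 0.9


def classify_bbb(log_bb):
    """Classify blood-brain barrier distribution from the predicted log BB."""
    if log_bb > 0.3:
        return "crosses the blood-brain barrier"
    if log_bb < -1.0:
        return "poorly distributed to the brain"
    return "intermediate"


def passes_admet(record):
    """Keep compounds that are adequately absorbed, non-mutagenic, and non-hepatotoxic."""
    return (
        record["intestinal_absorption_pct"] >= 30.0
        and not record["ames_mutagenic"]
        and not record["hepatotoxic"]
    )


if __name__ == "__main__":
    # Hypothetical prediction record; keys and values are illustrative only.
    record = {
        "caco2": 1.2,
        "intestinal_absorption_pct": 92.0,
        "log_bb": 0.5,
        "ames_mutagenic": False,
        "hepatotoxic": False,
    }
    print(has_high_caco2_permeability(record["caco2"]))
    print(classify_bbb(record["log_bb"]))
    print(passes_admet(record))
```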

Compound Library screening
The validated spike glycoprotein structure was used as the 3D model for structure-based virtual screening of the screened compound library from the CAS database, which consists of small synthetic covalent molecules. After the first-tier screening, this library contained 215 molecules, which were subjected to virtual screening using the LibDock algorithm of the DS platform to identify potent hit compounds. Compounds were ranked based on binding score, rigidity, planarity, and hydrogen bond interactions [39]. The best poses of the docked molecules were then selected for further study based on their binding modes, LibDock scores, and molecular interactions with active regions of the target protein.

Molecular docking process
To further validate the findings of the virtual screening, docking was carried out for the top five screened molecules, with the active region specified as x: 200, y: 213, and z: 118 and a grid box of 40 × 40 × 40. Binding energy was used as the parameter for validation. The AutoDock Vina plugin reports a binding energy for each pose together with RMSD values. The larger the RMSD value, the higher the deviation, which suggests a greater error in the predicted protein-ligand interaction [40]. The different docking runs also yielded binding affinity values for the protein-ligand interactions. Generally, the docked ligand and the amino acid residues of the protein form different types of interactions, such as hydrogen bonds, hydrophobic interactions, and van der Waals interactions [41].
The S protein on the surface of the virus binds very efficiently to ACE2 on the surface of human cells, which makes SARS-CoV-2 easily transmissible. Thus, the spike protein is a classic target for drug development. Coutard et al. proposed searching for furin inhibitors because the sequence of the S protein has a particular furin-like cleavage site [38]. The present study, which predicts the inhibition of the S protein by the top 5 screened organic covalent compounds, indicates that 3-ethyl-5-propyladamantan-1-amine has a better binding pose than the other compounds.
A hydrogen bond is an attractive interaction between a hydrogen atom (H) covalently bonded to an electronegative atom such as oxygen (O), nitrogen (N), or fluorine (F) and another electronegative atom [42]. Hydrophobic interactions tend to cluster nonpolar groups in the inner globular regions of the protein, away from water molecules. Different forms of hydrophobic interactions, such as Pi-Sigma and Alkyl/Pi-Alkyl interactions, were identified [43]. In the current docking study, the best-selected ligand forms several hydrogen bonds with multiple amino acid residues: the 2D interaction diagram of 3-ethyl-5-propyladamantan-1-amine shows hydrogen bond interactions with ASP-A:867 and HIS-A:1058. Pi-Alkyl interactions were also observed with PRO-A:863, ILE-A:870, and PRO-A:1057, and a sigma-alkyl interaction with PRO-A:1057.
Some reported drugs, namely Favipiravir, Chloroquine, Ribavirin, and Hydroxychloroquine, were also included in the present docking studies, and Chloroquine, which had the highest binding affinity among them, was selected as the reference drug for further comparative analysis. Its interaction diagram reveals an unfavorable bump with LYS-A:1038 and VAL-A:1040. Van der Waals interaction was also observed with GLY-A:908, GLY-A:

Comparative study between best docked and reference molecules
A comparison between the top docked reported drug and the top hit lead from the CAS database was also carried out, which revealed that 3-ethyl-5-propyladamantan-1-amine has good oral bioavailability and is non-mutagenic and non-toxic. It has a better binding affinity than the reference molecule: -6.8 kcal/mol with the AutoDock Vina plugin in Chimera and -241.08 kcal/mol with the PatchDock server. According to C. N. Pace et al., hydrogen bond formation with amino acid residues strengthens binding, leading to a lower energy score and a more stable complex [44].
Thus, the 2D interaction diagram revealed that 3-ethyl-5-propyladamantan-1-amine forms hydrogen bonds with different amino acid residues of the target protein, and no unfavorable bumps were observed. This indicates that it is more stable and flexible and has a higher binding affinity for the target protein structure, making it a promising lead candidate against the spike glycoprotein.

Conclusion
Our work revealed that 3-ethyl-5-propyladamantan-1-amine had the best binding free energy with the S glycoprotein of SARS-CoV-2. Although the molecular docking results for the reported drugs indicated that Chloroquine had a better docking score than the others, the comparative study showed 3-ethyl-5-propyladamantan-1-amine to be the most promising ligand because it is non-mutagenic and non-toxic. In addition, it has more hydrogen bond interactions with amino acid residues of the protein, which provide stability in the docked region of the protein together with a good binding energy. Compound 2 also has high oral bioavailability and follows all RO5 criteria. Thus, it has potential as an antiviral covalent synthetic molecule that may prevent the replication of spike protein. These findings are only a preliminary selection intended to facilitate upcoming in vitro and in vivo tests (clinical trials in animal or human models).

Overall protein structure quality factor validation through graphical representation using ERRAT software, with Residue # represented on the x axis and error value represented on the y axis.
Figure 3B: Validation of protein main chain parameters through graphical representation using PROCHECK server.
Figure 3C: Validation through plots representing side chain parameters of protein structure using PROCHECK server.

Figure 4
Histogram showing the molecular docking scores of the top five selected compounds. Docking scores of these compounds were generated using AutoDock Vina through the Chimera plugin, AutoDock Vina through the AutoDock plugin, and the PatchDock server. Compounds (in SMILES format) are represented on the x axis and docking score on the y axis.

Supplementary Files
This is a list of supplementary files associated with this preprint. Tables.docx