Natural compounds as potential inhibitors of novel coronavirus (COVID-19) main protease: An in silico study

COVID-19 pandemic has now expanded over 213 nations across the world. To date, there is no specic medication available for SARS CoV-2 infection. The main protease (M pro ) of SARS CoV-2 plays a crucial role in viral replication and transcription and thereby considered as an attractive drug target for the inhibition of COVID-19,. Natural compounds are widely recognised as valuabe source of antiviral drugs due to their structural diversity and safety. In the current study, we have screened twenty natural compounds having antiviral properties to discover the potential inhibitor molecules against M pro of COVID-19. Systematic molecular docking analysis was conducted using AuroDock 4.2 to determine the binding anities and interactions between natural compounds and the M pro . Out of twenty molecules, four natural metabolites namely, amentoavone, guggulsterone, puerarin, and piperine were found to have strong interaction with M pro of COVID-19 based on the docking analysis. These selected natural compounds were further validated using combination of molecular dynamic simulations and molecular mechanic/generalized/Born/Poisson-Boltzmann surface area (MM/G/P/BSA) free energy calculations. During MD simulations, all four natural compounds bound to M pro on 50ns and MM/G/P/BSA free energy calculations showed that all four shortlisted ligands have stable and favourable energies causing strong binding with binding site of M pro protein. These four natural compounds have passed the Absorption, Distribution, Metabolism, and Excretion (ADME) property as well as Lipinski’s rule of ve. Our promising ndings based on in-silico studies warrant further clinical trials in order to use these natural compounds as potential inhibitors of M pro protein of COVID. investigated as a potential target to inhibit previous coronavirus infections also like SARS and MERS (Jo et al., 2020). This study aimed to screen the natural compounds based on their pharmacokinetic properties, drug likeness and ability to specically bind to the active sites of SARS-CoV–2 main protease so that these leads can be proposed as potential inhibitor to check the virus replication cycle. Lopinavir and Ritonavir are well known protease inhibitor of HIV (Israr et al., 2011). Both drugs were also recommended as repurposed drug in the treatment of SARS and Middle East respiratory syndrome (MERS) (Chu et al., 2004). Therefore, in this study we have taken these drugs as standard reference drugs to compare the ecacy of the binding of our selected compounds. In our in-silico prediction experiment, none of the selected compound showed hepatotoxicity and cytotoxicity except Pectolinarin which showed potential cytotoxicity. The compounds which were found to potentially inhibit the viral protease based on the binding energy were Amentoavone, Guggulsterone, Puerarin, Piperine, Maslinic acid, Apigenin, Epigallocatechin, Daidzein, Xanthohumol, Resveratrol, Luteolin, Cyanidin–3-o-galactoside, Pectolinarin, Herbacetin, Rhoifolin, Ganomycin B, Phloretin, and Crocetin. Among these Amentoavone and Guggulsterone were the top two leads showing lowest binding energy and satisfying our studied parameters. Therefore, we propose that these natural compounds may further be validated as potential inhibitors of COVID–19 main protease M pro . Our promising ndings based on preliminary an in-silico analysis could become a basis for further studies at in-vitro and in-vivo levels in order to use these compounds as potential inhibitors of SARS-CoV–2 protease.


Introduction
Coronavirus disease  is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) which primarily affects the lungs and shows certain types of pneumonia-like symptoms (Huang et al., 2020;Kumar et al., 2020). SARS-CoV-2 is a novel strain of coronavirus, rst time emerged in December 2019, during an outbreak in Wuhan, China, and subsequently expanded to all over the world in a very short period of time (World Health Organization, 2020;Hendaus, 2020). The outbreak was declared a Public Health Emergency of International Concern by WHO on 30 January 2020 (WHO, 2020). As on June 02, 2020, this contagious disease has led to over 6, 140, 934 con rmed cases and 373, 548 fatalities (https://covid19.who.int/). To date, there is no speci c treatment for this ongoing COVID-19 pandemic. Some preliminary study results investigated potential drug combination of Lopinavir and Ritonavir to treat COVID-19 infected patients, which were earlier used in human immunode ciency virus (HIV) and SARS CoV or Middle East respiratory syndrome (MERS) coronavirus patients (Lu, 2020;Chu et al., 2004). The drugs which can speci cally target the virus replication cycle and subsequent infection are urgently required to develop effective antiviral therapies as early as possible. Natural compounds due to the presence of enormous structural and chemical diversity, availability of more chiral centers, and relative biosafety are considered as an excellent source of drugs in several diseases including viral infections. This is further strengthened by the fact that around 45% of today's bestselling drugs have either originated from natural products or their derivatives (Lahlou, 2013). Natural compounds possesses antiviral property could become a valuable resource in this regard.  crystallized the COVID-19 main protease (M pro ), which has been structured and repositioned in the Protein Data Bank (PDB) and is publically accessible.
SARS-CoV-2 main protease (M pro ) is reported to play an inevitable role in virus replication and transcription, suggesting it to be a promising target for inhibition of the SARS-CoV-2 cycle (Boopathi et al., 2020;Lahlou, 2013;Xu et al., 2002). Keeping this in mind, in this study we have selected several natural compounds based on extensive literature (Zakaryan et al., 2017;Thayil et al., 2016;Jo et al., 2020).
In the present study, we screened and explored the potential of selected natural compounds to inhibit the M pro of COVID-19 using molecular docking, followed by MD simulations, MM/G/P/BSA free energy calculations, ADME, drug-likeness, target speci c binding, and toxicity analysis validation.

Material And Methods
A ow chart of pipeline used in present study is summarized in Figure 1.

Literature survey and ligands selection
An extensive literature survey was conducted to select the natural compounds having antiviral properties from different medicinal plants using PubMed and Google scholar platforms. Based on the literature survey, total twenty natural compounds were selected, and their chemical structures were extracted from PubChem (Kim et al., 2016) repository in SDF format. List of all selected natural compounds along with their corresponding chemical Ids, 2D and 3D structures is presented in Table 1. In order to prepare the ligands to perform molecular docking, hydrogen atoms were added followed by PDB structure generation by OPENBABEL program (O'Boyle et al., 2008). Further, all the molecules were allowed for the energy minimization and optimization using universal force eld at 200 descent steepest algorithm of OPENBABEL available in PyRx (https://pyrx.sourceforge.io/) and converted in pdbqt format.

Preparation of protease
The 3D coordinates of main M pro of SARS-CoV-2 was obtained from the RCSB-PDB repository with PDB ID 6LU7 (Jin et al., 2020). In order to prepare the macromolecule for docking, water and other nonspeci c molecules were removed by using UCSF CHIMERA (Pettersen et al., 2004). For protein protonation maintaining cellular pH, polar hydrogen atoms were added to the 3D structure model of M pro . The structure optimization and energy minimization were performed by using SPDB viewer (Guex & Peitsch, 1997). While, clean geometry module embedded in Discovery Studio package was utilized for the side chain angles correction.

Molecular docking
To identify new potential inhibitors against the M pro of SARS-CoV-2, the site-speci c docking-screening of all selected natural compounds were carried out by AutoDock 4.2 (Forli et al., 2012). The box dimensions were kept as 70 Å × 70 Å × 70 Å with total of 50 genetic algorithm run. Other docking parameters were set as default. During performance of molecular docking, the amino acid residues including Thr25, Thr26, Gly143, Ser144, His163, His164, and Glu166 were utilized as the binding pocket sites. Other docking parameters of AutoDock 4.2 were set as default. The protein-ligand interactions were further rendered with the Maestro and Discovery studio programs.

ADME compound screening
An in-silico tool for analysis of absorption, distribution, metabolism, and excretion (ADME) was used to screen the above-mentioned compounds which could be bioactive via oral administration. Drug-like properties were calculated using Lipinski's rule of ve using SWISSADME prediction (http://www.swissadme.ch/) (Lipinski et al., 2012) (Giménez et al., 2010) . .

Target prediction
Molecular Target studies are important to nd the macromolecular targets of bioactive small molecules. This is useful to understand the molecular mechanisms underlying a given phenotype or bioactivity, to rationalize possible side-effects, to predict off-targets (Enmozhi et al., 2020). In this direction, SwissTarget Prediction tool (https://www.swisstargetprediction.ch) was used (Daina et al., 2019). Canonical smile for Amento avone and Guggulsterone was entered and was analyzed.

Molecular dynamics (MD) simulations
The four representative docking complexes of ligands with M pro including amento avone, guggulsterone, puerarin, and piperine were used for further re nement using MD simulations analysis. MD simulation studies were carried out to nd out the stability and exibility of the natural compounds-M pro complexes on 50ns. The method used for the MD simulations of natural compounds-M pro complexes remains same as earlier described in recent studies (Gajula et al., 2016;Jee et al., 2018;Kumar et al., 2020). All simulations of representative natural compounds-M pro complexes were conducted with the fruitful utilization of GROMOS96 43a1 force eld available in GROMACS 5.1.4 suite (Van Der Spoel et al., 2005). Topology les for ligand molecules were created by using PRODRG server (Schüttelkopf & Van Aalten, 2004). The prepared protein complexes were solvated in a cubic box of edge length 10 nm along with SPC water molecules. In order to maintain the system neutrality, adequate numbers of ions were added. To remove the clashes between atoms, system energy minimization calculations were applied with the convergence criterion of 1000 kJ/mol/nm. Long-range interaction electrostatics (Abraham & Gready, 2011) was handled by using PME. For both van der Waals and Coulombic interactions, a cut-off radius of 9 Å was utilized.
Equilibration was completed in two different phases. The solvent and ion molecules were kept unrestrained in rst stage, while in the second stage the restraint weight from the protein and protein-ligand complexes was gradually declined, in NPT ensemble. LINCS constraints were applied to all bonds involving hydrogen atoms (Hess et al., 1997). The temperature and pressure of the system were kept at 300 K and 1 atm respectively by using Berendsen's temperature and Parrinello-Rahman pressure coupling respectively (Berendsen et al., 1995). The production simulation was initiated from the velocity and coordinates obtained after the last step of the equilibration step. All the systems were simulated for 200 ns and snapshots were taken at every 2 ps interval.

MM/PBSA free energy calculations
The calculations of binding energy of the M pro -ligand complexes were calculated by using MM/PBSA (Molecular Mechanics Poisson Boltzmann Surface Area) method. While calculations of MM-PBSA the polar part of the solvation energy was calculated by using the linear relation to the solvent accessible surface area. The g_mmpbsa module available in GROMACS was applied for the determination of different components of the binding free energy of complexes (Kumari and Kumar, 2014). Only the last 10 ns of data were utilized for the MM-PBSA analysis to considering the convergence issue associated with the calculations. In present study, entropy calculations were not calculated as they may change the numerical values of the binding free energy reported for the molecules. In the MM-PBSA calculation, the binding free energy between M pro and a ligand was calculated using following equations:

Toxicity analysis
Toxicity analysis of selected natural compounds was done by the ProTox-II web server (Banerjee et al., 2018). ProTox-II is a kind of virtual lab which integrates several parameters like molecular similarity, fragment propensities and most frequent features. It predicts various toxicity endpoints and incorporates a total of 33 models for the prediction of various toxicity aspects of small molecules.

Similar FDA approved drug compound search with SWISS similarity
The compounds which were giving the best binding energy among the selected natural compounds were checked for similarity, if any, with FDA approved drugs using SWISS similarity tool (http://www.swisssimilarity.ch) (Zoete et al., 2016).

Computational facility details
The MD simulations and corresponding energy calculations were carried out on HP Gen7 server with 48 Core AMD processors and 32GB of RAM.

Results
3.1 Determination of Active Sites: Table 1 shows the structure and amino acids found in the active site pockets of 6LU7. 6LU7 is the main protease (Mpro) found in COVID-19, which has been structured and repositioned in PDB and can be accessed by the public, as of early February 2020.
3.2 ADME (Absorption, distribution, metabolism, and excretion): ADME properties were found by obtaining the canonical smiles from PubChem. These smiles were used to identify ADME properties using SWISS ADME. Then compounds were analyzed on various parameters like lipophilicity, molecular weight, hydrogen-bond donors, hydrogen-bond acceptors, Clog P-value, Ghosh violations, Lipinski violations, etc. Ligands/natural compounds have been selected based on adherence to soft or classical Lipinski's rule of ve. The selected ligands that did not incur more than 2 violations of Lipinski's rule were further used in molecular docking experiments with the target protein. The drug scanning results (Table 2) show that most of the tested compounds in this study was accepted by Lipinski's rule of ve. These compounds were selected for docking to nd their binding a nity with COVID19 main protease Mpro.
List of compounds with suitable ADME properties given below: 00The target prediction analysis was displayed for our two best compounds, Amento avone and Guggulsterone on the web page with the following observations the top 15 of the results were given as a pie-chart ( Figure 2). The pie chart for Amento avone predicts 20% of Family AG protein-coupled receptor, 13.3% Kinase, 13.3% of Enzymes, 13.3% of unclassi ed protein, 6.7% of Phosphatase, 6.7% of protease, 6.7% of Oxidoreductase, 6.7% of primary active transporter, 6.7% of Secreted protein, 6.7% of Ligand-gated ion channel. The pie chart for Guggulsterone predicts 40% of Nuclear Receptors, 13.3% of CytochromE P450, 13.3% of Secreted protein, 13.3% of Oxidoreductase, 6.7% of Membrane receptors, 6.7% of Fatty acid-binding protein family, 6.7% of Enzymes. The output table consisting of Target, Common Name, Uniprot ID, ChEMBL-ID, Target Class, Probability, and Known actives in 2D/3D are given in the Supplementary material. The possible sites of the target which the compound may bind to are mostly the targets which are predicted by the software and the probability score for Amento avone and Guggulsterone are obtained from 1.0 to 0.0868 & 1.0 to 0.101672 respectively. This makes an inference that the small compound may have high target attraction towards the speci c binding site it is directed to.

Molecular docking
Molecular docking is an extensively used in-silico way to predict protein-ligand interaction. To perform the docking analysis, the structures and amino acids found in the active site pockets of 6LU7. 6LU7 is the main protease (M pro ) found in COVID-19, which has been structured and repositioned in PDB databank.
Thereafter, Ligand-protein docking was performed, and the interactions were determined based on the binding a nity of our compounds. Each individual analysis gave positive results, suggesting that the selected natural compounds may directly inhibit COVID-19 main protease M pro . The 14 selected natural compounds were docked with COVID-19 main protease M pro along with the standard ritonavir and lopinavir to compare the results. Further, like previous other ndings, our results also indicated a good binding a nity of ritonavir and lopinavir to the COVID-19 main protease M pro .The results obtained are as follows: Due to technical limitations, Table 2 cannot be displayed in the text. Please nd Table 2 in the supplemental le section. Table 2. Shows the molecular docking analysis results for selected natural compounds against COVID-19 main protease M pro (PDB-6LU7). Figure 3 and Figure 4 can be found in the gures section.
Due to technical limitations, Table 3 cannot be displayed in the text. Please nd Table 3 in the supplemental le section. The binding residues and their chains were identi ed from the protein-ligand complex as shown in the above images.

Molecular dynamics (MD) simulations
To further investigate the molecular docking results, the top four natural compound complexes namely, amento avone, guggulsterone, puerarin, and piperine were subjected to 50ns MD simulations. The conformational stability and exibility of the complexes have been analyzed by using various parameters such as root mean square deviation (RMSD), root mean square uctuation (RMSF), solvent accessible surface area (SASA), radius of gyration (Rg), and binding a nity of phytomolecule complexes by using mmpbsa and hydrogen bond formation ability. The RMSD is a commonly used similarity tool to measure the conformational perturbation during the simulation of macromolecule structures. RMSD of the Cα Atoms related to the stability of the complexes. The time dependent RMSD from the initial stage of the simulation to 50 ns simulation. The RMSD of the backbone of these 4 complexes lies between 0.231-0.50 nm, which stabilizes at the 35 ns whereas the RMSD of ligands ranged from 0.35-0.96 nm (Figure 5a). RMSD of the protein backbone of all system was small and comparable, which may conclude that the binding of ligands does not lead to the con rmation perturbation during the simulation (Figure 5b). During MD simulations, RMSF de ne the residual exibility from the average position. The RMSF of the protein ranged from 0.2-0.4 nm of all systems (Figure 5c). Some amino acid shows the high-intensity pick, which may represent a loop region. The presence of low-intensity pick revealed that binding of the phytomolecules does not affect the stability of the structural region of the enzyme.
In MD simulations, Rg determines the compactness of protein, induced by the movement of a ligand. The lower the exibility of the Rg during the simulation associated with the structural stability of the protein.
The Rg values of all phytochemical's complexes were lies 2.20-2.05 nm (Figure 5d). The Rg values of all four phytochemicals complexes support their consensus architecture as well as size. The SASA associated with the exposures of the hydrophobic residue during the simulation. SASA plays a principal role in the van der interaction. The SASA values of all systems were lies between 125-150 nm 2 . SASA con rmations showed that the binding of ligand molecules does not affect the overall folding of the protein (Figure 6a).
In a complex protein and ligand, hydrogen bonding plays a critical role to determine the strength of interaction. During the simulation time, several hydrogen bonds formed between the donor and the acceptor group (Figure 6c). Two hydrogen bonds consistently formed during the time of simulation (Figure 6b). Over all observations indicated that all four complexes are stable during simulation.  -173.54511.759 -9.3734.129 50.5737.610 -13.9431.293 -146.287 11.205 Puerarin -180.78720.912 -82.40516.508 148.20017.298 -16.6391.417 -131.63120.483 Table 4. Binding free energy calculation of four stable complexes during simulation

Toxicity analysis
In-silico toxicities of selected natural compounds were predicted by using ProTox-II. As shown in Table 4, ProTox-II toxicity prediction was done to check the safety of the compounds based on two major toxicity end points, hepatotoxicity & cytotoxicity. According to the toxicity analysis, none of the selected natural compounds showed potential hepatotoxicity or cytotoxicity except Pectolinarin which showed potential cytotoxicity.
Due to technical limitations, Table 5 cannot be displayed in the text. Please nd Table 5 in the supplemental le section. of Guggulsterone which are predicted by the software and the probability score for obtained from 0.995 to 0.009. This makes an inference that that these compounds could be very important and unique with pharmaceutical perspectives and need to be explored at in vitro and subsequent pre-clinical and clinical trials.
Due to technical limitations, Table 6 cannot be displayed in the text. Please nd Table 6 in the supplemental le section.  (Jin et al., 2020). This protease is considered an attractive target as it is essential for virus functionality, replication, and entry competence. The main protease M pro has been investigated as a potential target to inhibit previous coronavirus infections also like SARS and MERS (Jo et al., 2020). This study aimed to screen the natural compounds based on their pharmacokinetic properties, drug likeness and ability to speci cally bind to the active sites of SARS-CoV-2 main protease so that these leads can be proposed as potential inhibitor to check the virus replication cycle. Lopinavir and Ritonavir are well known protease inhibitor of HIV (Israr et al., 2011). Both drugs were also recommended as repurposed drug in the treatment of SARS and Middle East respiratory syndrome (MERS) (Chu et al., 2004). Therefore, in this study we have taken these drugs as standard reference drugs to compare the e cacy of the binding of our selected compounds. In our in-silico prediction experiment, none of the selected compound showed hepatotoxicity and cytotoxicity except Pectolinarin which showed potential cytotoxicity. The compounds which were found to potentially inhibit the viral protease based on the binding energy were Amento avone, Guggulsterone, Puerarin, Piperine, Maslinic acid, Apigenin, Epigallocatechin, Daidzein, Xanthohumol, Resveratrol, Luteolin, Cyanidin-3-o-galactoside, Pectolinarin, Herbacetin, Rhoifolin, Ganomycin B, Phloretin, and Crocetin. Among these Amento avone and Guggulsterone were the top two leads showing lowest binding energy and satisfying our studied parameters.
Therefore, we propose that these natural compounds may further be validated as potential inhibitors of COVID-19 main protease M pro . Our promising ndings based on preliminary an in-silico analysis could become a basis for further studies at in-vitro and in-vivo levels in order to use these compounds as potential inhibitors of SARS-CoV-2 protease.

Declarations
Con ict of interest    Histogram showing molecular docking results between COVID-19 main protease Mpro (PDB-6LU7) and selected natural compounds (the binding energy value ΔG is shown in minus kcal/mol), *Reference compounds.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. table6.pdf table5.pdf table3.pdf table2.pdf