Elucidation of the inhibitory activity of plant-derived SARS-CoV inhibitors and their potential as SARS-CoV-2 inhibitors

Abstract Several drugs are now being tested as possible therapies due to the necessity of treating SARS-CoV-2 infection. Although approved vaccines bring much hope, a vaccination program covering the entire global population will take a very long time, making the development of effective antiviral drugs an effective solution for the immediate treatment of this dangerous infection. Previous studies found that three natural compounds, namely, tannic acid, 3-isotheaflavin-3-gallate and theaflavin-3,3-digallate, are effective proteinase (3CLpro) inhibitors of SARS-CoV (IC50 <10 µM). Based on this information and due to the high sequence identity between SARS-CoV and SARS-CoV-2 3CLpro, these three compounds could be candidate inhibitors of SARS-CoV-2 3CLpro. This paper explores the structural and energetic features that guided the molecular recognition of these three compounds for dimeric SARS-CoV-2 and SARS-CoV 3CLpro, the functional state of this enzyme, using docking and MD simulations with the molecular mechanics-generalized-born surface area (MMGBSA) approach. Energetic analysis demonstrated that the three compounds reached good affinities in both systems in the following order: tannic acid > 3-isotheaflavin-3-gallate > theaflavin-3,3-digallate. This tendency is in line with that experimentally reported between these ligands and SARS-CoV 3CLpro. Therefore, tannic acid may have clinical usefulness against COVID-19 by acting as a potent inhibitor of SARS-CoV-2 3CLpro. Communicated by Ramaswamy H. Sarma


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the 7 th coronavirus species discovered that infects humans, is the causative agent of the ongoing COVID-19 viral pandemic Guan et al., 2020;. The virus was first discovered in Wuhan in late 2019, causing a pneumonia-like outbreak that quickly spread worldwide (Ludwig & Zarbock, 2020). In March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic due to its extreme outbreak (Brinks & Ibert, 2020) and notified global authorities to take emergency measures. While many countries have successfully combated the pandemic and have been declared COVID-19 free, the pandemic situation has relapsed in several others. To date, approximately 2.68 million lives have been lost globally to COVID-19, with 121.58 million people still infected [https://www. worldometers.info/coronavirus/]; the global COVID-19 situation seems hardly optimistic overall. The virus was initially reported to cause fever, coughing, sneezing, breathing difficulties in noncritical cases, pneumonia, and multiple organ failure, leading to death in severe cases . Recent studies indicate the possibility of viral infection also causing kidney dysfunction and myocardial injury (Kwong et al., 2018;Nguyen et al., 2016). While SARS-CoV-2 is the third coronavirus that has reached epidemic status after 2002 SARS and 2012 MERS , it is by far the deadliest of the three and the only one to spread on a global scale in such a short period (Petrosillo et al., 2020;Xie & Chen, 2020).
SARS-CoV-2 is an enveloped, positive-sense, singlestranded RNA virus of the genus Betacoronavirus in the family Coronaviridae, which belongs to the order Nidovirales. It has been suggested that SARS-CoV-2 followed an evolutionary transmission cascade similar to that of SARS-CoV and MERS-CoV, both of which have zoonotic origins, with their natural originators being bats (Benvenuto et al., 2020;Zhou et al., 2020). It has also been confirmed that SARS-CoV-2 shares $80% sequence identity with SARS-CoV (Ceraolo & Giorgi, 2020). While most coronaviruses generally contain six open reading frames (ORFs), SARS-CoV-2 contains 14, among which ORF 1a/b plays the most substantial role in viral proliferation Gordon et al., 2020;Masters, 2006). ORF 1a/b translates into two overlapping polyproteins, pp1a and pp1ab, which are cleaved by the main protease 3CL pro and papain-like protease PL pro enzymes, into 16 nonstructural proteins, including a major protein for viral reproduction such as RdRp Khailany et al., 2020;Masters, 2006;Ziebuhr et al., 1997Ziebuhr et al., , 2000. It has been discovered that the PL pro enzyme also recognizes the C-terminal sequence of ubiquitin (Baez-Santos et al., 2015), but the 3CL pro enzyme exclusively cleaves polypeptide sequences after a glutamine residue (Zhang, Lin, Sun, et al., 2020). The rest of the genome sequence translates into structural proteins, which include the spike glycoprotein (S), an envelope protein (E), the membrane protein (M), and the nucleocapsid phosphoprotein (N). The spike glycoprotein (S) recognizes the human angiotensin-converting enzyme-2 (ACE-2) receptor, which makes it indispensable for viral propagation Woo et al., 2010).
Since the discovery of SARS-CoV-2 in late 2019, scientists have developed various methods to alleviate the severity of the resulting disease and minimize the spread of the infection; after the declaration of a pandemic by the WHO, a global effort emerged for the rapid development of vaccines and specific antiviral treatments. Scientists have vested much effort in developing new antiviral drugs, with an influential group of researchers focusing on drug repurposing, as this method is faster than developing novel medicines. Among the antiviral drug targets that have been studied against coronaviruses, 3CL pro , PL pro , RdRp, and spike glycoprotein S have been treated as significant drug targets for the antiviral treatment of diseases, as they play crucial roles in viral proliferation and infection (Hilgenfeld, 2014;Ibrahim, Abdelrahman, Hussien et al., 2020;Ibrahim, Abdeljawaad, Abdelrahman, et al., 2020;Zhang, Lin, Sun, et al., 2020, Zhang, Lin, Kusov, et al., 2020. The deubiquitinase nature of PL pro makes substrate-derived inhibitors of PL pro also inhibit host-cell deubiquitinases, making drug development targeting PL pro arduous (Baez- Santos et al., 2015). Several FDAapproved RdRP inhibitor drugs, including remdesivir, favipiravir, sofosbuvir, ribavirin, lopinavir, ritonavir, tenofovir, and galidesivir, are effective against a broad range of RNA viruses, including past coronaviruses, and have been tested against SARS-CoV-2 for potential antiviral treatment. So far, only remdesivir has shown a reduction in the recovery period; however, it has shown zero impact on mortality (Beigel et al., 2020;Li & De Clercq, 2020;Mitj a & Clotet, 2020;Sinha & Balayla, 2020;Yavuz & € Unal, 2020). A recent computational study opted for an alternative pathway using structural analogs of FDA-approved RdRP inhibitor drugs. While the result of this analysis was shown to be optimistic, the computational nature of the study makes the possibility of developing effective antiviral drugs uncertain (Hasan et al., 2021). 3CL pro of SARS-CoV-2 proteolytically cleaves the pp1a and pp1ab polyproteins from ORF a/b into functional proteins, a critical step during viral replication, representing an essential target for decreasing the impact of COVID-19 (Zhang, Lin, Sun, et al., 2020). The alignment of 3CLpro of SARS-CoV-2 and 3CL pro of SARS-CoV showed that these proteins share up to 95% sequence identity, indicating that SARS-CoV 3CL pro inhibitors may function similarly against SARS-CoV-2. Different theoretical studies repurposed from 3CL pro of SARS-CoV have been published to identify new inhibitors of monomeric (Andrianov et al., 2020;Jimenez-Alberto et al., 2020;Li & De Clercq, 2020) and dimeric 3CL pro of SARS-CoV-2 (Bello, 2020;Bello et al., 2020), with the latter being the conformation of the active enzyme (Graziano et al., 2006). Taking advantage of this information, the present study assessed three naturally available plant-derived SARS-CoV 3CL pro inhibitors, namely, tannic acid, 3-isotheaflavin-3gallate, and theaflavin-3,3-digallate, which have been previously confirmed to have potent SARS-CoV 3CL pro inhibitor activity (Chen et al., 2005); these inhibitors were virtually docked against both SARS-CoV and SARS-CoV-2 using molecular docking analysis, protein-ligand interactions, molecular dynamic simulations, and free energy calculations to predict their potential for antiviral treatment of SARS-CoV-2 using nutraceuticals.

Molecular docking
The three compounds were docked on dimeric SARS-CoV-2 3CL pro and SARS-CoV 3CL pro using the software AutoDock Tools 1.5.6 and AutoDock 4.2 (Morris et al., 2009). Hydrogen atoms were added to the ligands and protein atoms, and Kollman and Gasteiger partial charges were given for the receptor and ligand, respectively. The grid box was placed on the binding site of each monomeric subunit of dimeric SARS-CoV-2 3CL pro and SARS-CoV 3CL pro with grid xyz points of 70 Â 70 Â 70 Å and a grid space of 0.375 Å. Ligand location was assessed using a Lamarckian genetic algorithm as employed elsewhere (Yadava et al., 2013;Yadava, 2018). The docking runs were set to 100, and the population in the Lamarckian genetic algorithm was 150. The 100 docked poses were clustered into groups with RMSD values lower than 1.0 Å. The protein-ligand complex with the lowest binding free energy was designated as the initial conformer to start MD simulations. The docking was validated by replicating the binding mode of inhibitor N3 and TG-0205221 on SARS-CoV-2 3CL pro (PDB ID: 6LU7) and SARS-CoV 3CL pro (PDB ID: 2GX4) with RMSD values lower than 1.0 Å ( Figure 1S, supplementary material).

MD Simulations
MD simulations were performed with AMBER16 software (Case et al., 2005) using the tall atom ff14SB force field (Duan et al., 2003). The three compounds' force fields were determined using AM1-BCC atomic charges and the general Amber force field (GAFF) (Wang et al., 2004). Each system obtained through docking was neutralized with 0.10 M NaCl and then solvated in a dodecadic box of 12.0 Å using the TIP3P water model (Jorgensen et al., 1983). The minimization and equilibration of solvated and neutralized systems consisted of the following steps: minimization through 1000 steps using the steepest descent method and 3000 steps using the conjugate gradient method. Afterward, the complexes were heated through 200 ps, and then, the density was equilibrated through 200 ps; finally, the systems were equilibrated with 600 ps of constant pressure equilibration at 310 K. MD simulations were run for 100 ns with triplicate experiments under an NPT ensemble at 310 K. The electrostatic forces were described by the PME method (Darden et al., 1993). A 10 Å cutoff was selected for the van der Waals interactions. The SHAKE algorithm (Van Gunsteren Berendsen, 1977) was employed to constrain bond lengths at their equilibrium values. Temperature and pressure were preserved using the weak-coupling algorithm (Berendsen et al., 1984). The MD results were analyzed using AmberTools16, whereas the images were built using Maestro Schr€ odinger version 10.5 (Schr€ odinger, 2016). Hydrogen bonds (H-bonds) were identified based on the following criteria: a maximum distance of 3.5 Å, a minimum donor angle of 120 , and a minimum acceptor angle of 90.0 . Representative protein-ligand complexes were obtained using a cluster analysis employing the kclust algorithm present in the MMTSB toolset (http://www.mmtsb.org/software/ mmtsbtoolset.html),

Binding free energy and per-residue decomposition calculations
The MMGBSA (Miller et al., 2012, Gohlke & Case, 2004 approach was used to determine the binding free energy (DG bind ) values for the complexes and determine the perresidue decomposition energy. Five hundred snapshots at time intervals of 100 ps were taken over the equilibrated time (last 50 ns), representing the simulation time where the average energies converged ( Figure 2S, supplementary material). Before the analysis, all counterions and water molecules were removed, and a salt concentration of 0.10 M was considered with the implicit solvation model (Onufriev et al., 2004). The DG bind calculation and per-residue decomposition analysis were performed as described elsewhere (Bello & Garc ıa-Hernandez, 2014), and the DG bind values signify the average values of three experiments.

Docking interactions between TA and 3CL pro of SARS-CoV-2 and SARS-CoV
The tannic acid (TA)-docked complex with SARS-CoV-2 3CL pro (SARS-CoV-2 3CL pro /TA) and SARS-CoV 3CL pro (SARS-CoV 3CL pro /TA) showed docking scores of À9.48 and À9.60 kcal, respectively (Table 1S, supplementary material). The SARS-CoV-2 3CL pro /TA complex showed a total of 11 H-bonds in 8 residues of subunit 1 ( Figure 3SA, supplementary material). Among these residues, four were similar to those in the complex between TA and SARS-CoV 3CL pro (SARS-CoV 3CL pro /TA) ( Figure 4SA, supplementary material). Via H-bonding interactions of Glu166, residues Phe140, Gly143, Glu166, and Gln189 of subunit 1 differed between the SARS-CoV 3CL pro /TA complex and SARS-CoV-2 3CL pro /TA complex, as Glu166 had a total of 5 H-bonding interactions ( Figure 5S, supplementary material) with TA in the SARS-CoV 3CL pro /TA complex but 3 H-bonds in the SARS-CoV-2 3CL pro /TA complex. Additionally, both complexes had a common H-bonding interaction with residue 46, which was a Ser residue in the SARS-CoV-2 3CL pro /TA complex but an Ala residue in the SARS-CoV 3CL pro /TA complex. In the case of subunit 2 of the SARS-CoV-2 3CL pro /TA complex ( Figure 3SB, supplementary material), a total of 11 H-bonds were formed by nine residues; among these, two residues were similar to the SARS-CoV 3CL pro /TA complex ( Figure 4SB, supplementary material). Ser139 and Gln189, through H-bonding interactions with Gln189, differed between the SARS-CoV 3CL pro /TA complex and SARS-CoV-2 3CL pro /TA complex, as Gln189 formed a total of 2 H-bonding interactions with TA in the SARS-CoV-2 3CL pro /TA complex but only one H-bond in the SARS-CoV 3CL pro /TA complex. Subunit 1 of the SARS-CoV-2 3CL pro /TA complex had a total of 16 polar contacts with TA, and 11 of these contacts held similar positions in the SARS-CoV 3CL pro /TA complex with residues Thr25, Ser144, His163, His164, His172, Gln189, Thr190, and Gln192 of subunit 1 and Ser1, Asn214 and Gln299 of subunit 2. Subunit 2 of the SARS-CoV-2 3CL pro /TA complex also had 16 polar contacts with TA; eight residues, namely, Asn214 in subunit 1 and Thr25, His41, Ser139, Asn142, Ser144, Gln189, and Gln192 in subunit 2, were found to be shared. Both complexes also had a common polar residue, no. 169 of chain B, although in the SARS-CoV-2 3CL pro / TA complex, it was a His residue rather than Thr169 in the SARS-CoV 3CL pro /TA complex. Additionally, subunit 1 of the SARS-CoV-2 3CL pro /TA complex was observed to have a total of 14 hydrophobic contacts with TA; among them, ten contacts were shown to be similar to the SARS-CoV 3CL pro /TA complex, including Met49, Phe140, Leu141, Cys145, Met165, Leu167, Pro168 and Ala191 of chain A and Ile213 and Cys300 of chain B. Subunit 2 of the SARS-CoV-2 3CL pro /TA complex formed a total of 22 hydrophobic contacts with TA; among them, 10 contacts were shown to be similar in the SARS-CoV 3CL pro /TA complex, including Phe3 and Ile213 of subunit 1 and Leu27, Met49, Phe140, Leu141, Cys145, Met165, Leu67, and Ala191 of subunit 2. From a comparative observation, TA as a ligand showed a considerably higher binding affinity for the SARS-CoV-2 3CL pro enzyme than for the SARS-CoV 3CL pro enzyme.

Convergence of MD simulations
Root mean squared deviation (RMSD) and radius of gyration (Rg) analysis showed that bound SARS-CoV-2 3CL pro and SARS-CoV 3CL pro reached equilibrium between 10 and 30 ns ( Figures 8S and 9S, supplementary material). Therefore, the first 30 ns were removed from the 100 ns simulation for further analysis.

Interactions between TA and 3CL pro of SARS-CoV-2 and SARS-CoV
The complex of TA docked with 3CL pro of SARS-CoV-2 showed a docking score of À9.44 kcal in each subunit.
Notably, 3CL pro of SARS-CoV-2 was shown to have a total of 17 H-bonds in the subunit (Figure 1), three of which were from side chains, with the others from residue backbones; among these, 5 H-bonds were in positions similar to those in the SARS-CoV 3CL pro /TA complex ( Figure 2): Cys145, Glu166, Arg188 and Thr190 of subunit 1 and Ser1 of subunit 2. The H-bonding interactions of Glu166 and Thr190 differed between the SARS-CoV 3CL pro /TA complex and the SARS-CoV-2 3CL pro /TA complex, as Glu166 had a total of 3 H- bonding interactions ( Figure 10SA-B, supplementary material) with TA in the SARS-CoV 3CL pro /TA complex. In contrast, only one H-bond ( Figure 11SA-B, supplementary material) was formed in the SARS-CoV-2 3CL pro /TA complex, and Thr190 had a total of 2 H-bonding interactions with TA in the SARS-CoV-2 3CL pro /TA complex, while only one H-bond formed in the SARS-CoV-2 3CL pro /TA complex. In the case of subunit 2 of the SARS-CoV-2 3CL pro /TA complex (Figure 1), a total of 15 H-bonds (3 of which were H-bonded side chains) were formed, while the SARS-CoV 3CL pro /TA complex (Figure 2) formed only eight H-bonds (5 of which were H-bonded side chains). Similarities were noticed in only 2 locations: Cys145 of subunit 2 (H-bonded backbone) and Asn214 of subunit 1 (H-bonded side chain). Another notable difference between the SARS-CoV-2 3CL pro / TA and SARS-CoV 3CL pro /TA complexes was that in the SARS-CoV 3CL pro /TA complex, there were 2 double H-bonding interactions (H-bonded side chain) with TA by Glu47 and Glu166 ( Figure 10SC-D, supplementary material) of subunit 2, which were absent in the SARS-CoV-2 3CL pro /TA complex, whereas a triple H-bonding interaction (H-bonded backbone) was observed with His164 of subunit 2. Subunit 1 of the SARS-CoV-2 3CL pro /TA complex had a total of 12 polar contacts with TA, and 7 of these contacts held similar positions in the SARS-CoV 3CL pro /TA complex: residues His41, Asn142, Ser144, His163, Gln189, and Thr190 of chain A and Ser1 of subunit 2. Subunit 2 of the SARS-CoV-2 3CL pro /TA complex had a total of 14 polar contacts with TA, while the SARS-CoV 3CL pro /TA complex had only 7 polar contacts, with 5 residues, namely, Asn214 of chain A and His41, Asn142, Ser144, and Gln189 of subunit 2, found in common. Analysis of the hydrophobic interactions showed that subunit 1 of the SARS-CoV-2 3CL pro /TA complex had a total of 16 hydrophobic contacts with TA. Among these contacts, seven were shown to be similar to those in the SARS-CoV 3CL pro /TA complex, namely, Leu167, Pro168, Met49, Met165, Cys145, and Phe140 of chain A and Cys300 of subunit 2. Subunit 2 of the SARS-CoV-2 3CL pro /TA complex formed a total of 17 hydrophobic contacts with TA; six contacts were shown to be similar to those in the SARS-CoV 3CL pro /TA complex, namely, Val303 of subunit 1 and Cys44, Met49, Leu50, Phe140 and Met165 of subunit 2. From a comparative observation, TA as a ligand showed a considerably higher binding stability with the SARS-CoV-2 3CL pro enzyme than the SARS-CoV 3CL pro enzyme.
In subunit 2 of the SARS-CoV-2 3CL pro /TF3 complex ( Figure  3B), 4 H-bonding interactions (backbones) were observed, of which two residues, His164 and Glu166 ( Figure 11SC-D, supplementary material), were similar to those in the SARS-CoV 3CL pro /TF3 complex ( Figure 3D), which only formed 2 Hbonds in total. However, it must be noted that Glu166 of the SARS-CoV 3CL pro /TF3 complex formed a side-chain H-bond ( Figure 10SG-H, supplementary material), unlike the Hbonded backbone of the residue from the SARS-CoV-2 3CL pro /TF3 complex. SARS-CoV-2 3CL pro /TF3 complex subunit 1 formed eight polar contacts with TF3, of which four were common to the SARS-CoV 3CL pro /TF3 complex, namely, Glu189, His41, Asn142, and His163, and subunit 2 formed 5 polar contacts with TF3 and compared to 7 polar contacts in SARS-CoV 3CL pro /TF3 complex subunit 2, where two residues were found to be similar: His41 and His164. Subunit 1 of the SARS-CoV-2 3CL pro /TF3 complex formed 4 hydrophobic contacts with TF3, of which three residues were common to those in SARS-CoV 3CL pro /TF3 complex subunit 1: Leu27, Cys145, and Met165. On the other hand, subunit 2 of the SARS-CoV-2 3CL pro /TF3 complex formed seven hydrophobic contacts; when compared with the SARS-CoV 3CL pro /TF3 complex, three residues identical to those in subunit 1 were found to be in common: Leu27, Cys145, and Met165.

Binding free energy calculations
Differences in the binding free energy (DG bind ) for the complexes between ligands and the SARS-CoV-2 3CL pro and SARS-CoV 3CL pro systems were estimated using the MMGBSA approach, indicating that all of the complexes are energetically favorable and guided mainly through van der Waals energy (DE vdw ) and the nonpolar free energy of desolvation (DG npol,sol ). Table 1 shows that the ligand reaches a higher affinity in most cases for one of the subunits of the dimeric SARS-CoV-2 3CL pro and SARS-CoV 3CL pro systems, which is consistent with the differences found in the map of interactions observed through structural analyses (Figures 1-4) and with previous reports where this energetic behavior has been observed (Bello, 2020;Bello et al., 2020). The average DG bind values for the ligands coupled at the first and second subunits of SARS-CoV-2 3CL pro show that TA reaches the highest affinity, followed by TF2B and TF3. Similarly, average DG bind values for the ligands bound at the two subunits showed that TA binds with the highest affinity to SARS-CoV-2 3CL pro , followed by TF2B and TF3. Interestingly, this tendency is similar to that experimentally reported between these ligands and SARS-CoV-2 3CL pro , showing IC 50 values of 3, 7, and 9.5 mM for TA, TF2B, and TF3, respectively (Chen et al., 2005). Comparative analysis of the DG bind values of TF2B and TF3 with SARS-CoV-2 3CL pro indicated an affinity similar to that observed for darunavir and indinavir . Additionally, TA showed a higher association than saquinavir, a potent inhibitor of SARS-CoV-2 3CL pro previously identified through MMGBSA methods . Comparisons of the affinity of the three compounds for the SARS-CoV-2 3CL pro and SARS-CoV 3CL pro systems showed that they exhibit a higher affinity for SARS-CoV-2 3CL pro than SARS-CoV 3CL pro . Based on this result, it is evident that TA can be proposed as an anti-COVID-19 clinical drug. TA is an FDA-approved drug used in the treatment of cold sores, diaper rash, and poison ivy. It is also taken by mouth and used directly for bleeding, chronic diarrhea, bloody urine, dysentery, and cancer (https://go.drugbank. com/drugs/DB09372).
In general, this analysis takes into consideration the participation of His41 and Cys145, two conserved residues (Nukoolkarn et al., 2008), in molecular recognition and highlights the relevance of other residues (Met49, Asn142, His163, Met165, Glu166, Asp187, and Gln189) in the stabilization of ligands; these residues are similar to those previously observed for ligand stabilization in SARS-CoV-2 3CL pro and SARS-CoV 3CL pro systems (Bello, 2020;Bello et al., 2020).

Conclusion
In this contribution, we first performed docking studies of three plant-derived compounds previously reported to be SARS-CoV 3CL pro inhibitors on dimeric SARS-CoV-2 3CL pro and SARS-CoV 3CL pro , after which 100-ns MD simulations combined with the MMGBSA approach were employed to compare the results. Our results showed that the binding affinity of the three natural compounds in complex with SARS-CoV 3CL pro reproduced the experimental affinity tendency, in which tannic acid showed the highest association. Comparing the binding affinity of the three compounds between and SARS-CoV-2 3CL pro and SARS-CoV 3CL pro revealed that the compounds exhibited a higher affinity for SARS-CoV-2 3CL pro than SARS-CoV 3CL pro , suggesting that these three compounds may have potential as inhibitors of SARS-CoV-2 3CL pro . In addition, per-residue free energy decomposition allowed identification of hot-spot residues (His41, Met49, Cys145, Asn142, His163, Met165, Glu166, Asp187, and Gln189), which contributed significantly to the total binding affinity. Of these residues, His41 and Cys145 are conserved residues that are considered necessary for ligand binding.

Disclosure statement
No potential conflict of interest was reported by the authors.