Rational approaches to discover SARS-CoV-2/ACE2 interaction inhibitors: from pharmacophore-based virtual screening to molecular dynamics


 The lack of effective treatment remains the biggest bottleneck in combating the current novel coronavirus (COVID-19) pandemic. Drug discovery is a complex and time-consuming process, while various in silico methods dramatically reduce the time and cost of this process. The infection begins with the binding of the receptor-binding domain (RBD) of Spike (S) glycoprotein from SARS-CoV-2 to the host angiotensin-converting enzyme (ACE2) receptor. We, therefore, present computational screening methods aimed to identify inhibitors potentially able to target RBD. For this purpose, pharmacophore mapping, molecular docking, and molecular dynamics (MD) simulations were performed to screen potential candidates against ACE2/SARS-CoV-2. In particular, pharmacophore-based virtual screening used over a hundred million compounds to construct protein-protein interactions (PPIs) inhibitor library. In conclusion, PubChem-84280085 compound is suggested as a potential PPIs inhibitor for preventing SARS-CoV-2 RBD/ACE2 interactions.


Introduction
The world is currently facing the challenge of the COVID-19 outbreak associated with a novel strain of coronavirus. In the last two decades, two outbreaks have also been caused by coronaviruses, severe acute respiratory syndrome (SARS) and the Middle East respiratory syndrome (MERS), occurring in 2002 and 2012, respectively [1]. The recent outbreak, severe acute respiratory syndrome (SARS-CoV-2), was first identified in China and spread so rapidly around the world that the World Health Organization (WHO) declared COVID-19 as a pandemic on March 11, 2020 [2]. The prevalence of COVID-19 and the mortality rate from respiratory disease and multiple organ damage have not yet been controlled. To date (December 8, 2020), one year after the start of the pandemic, more than 65.8 million infections and over 1.5 million deaths have been reported worldwide, and the numbers are increasing daily [3]. COVID-19 is currently a global concern for humanity, and it is unclear how long this situation will continue. The pandemic also has tremendous economic damage in most countries around the world [4]. For these reasons, the researchers' efforts are globally trying to discover effective drugs and vaccines to tackle the SARS-CoV-2 infection [5].
The coronavirus genome encodes four structural proteins: spike (S) protein, membrane (M) protein, envelope (E) protein, and nucleocapsid (N) protein [6]. The S protein, located in the outer envelope, promotes the interaction and entry of SARS-CoV-2 into the host cell. The closed state and the open state are two different structures for the S protein [7]. In the closed S state, the three recognition motifs at the interface between protomers are buried. As a result, the open state is necessary for interacting with ACE2 at the host cell surface, facilitating membrane fusion and viral entry [8] (Figure 1).
The spike protein consists of two subunits: S1 contains the receptor-binding domain (RBD), which serves as a receptor-binding region, and S2, which includes the fusion machinery and is responsible for membrane fusion [9]. The S1 region consists of the N-terminal domain (NTD) and C-terminal domain (CTD) [10]. The first entry step for the SARS-CoV-2 life cycle is binding to the host cellular membrane protein, angiotensin-converting enzyme 2 (ACE2), via CTD [10,11].
In SARS-CoV-2-CTD, atomic information shows that a series of hydrophilic residues form a strong network of hydrogen bonds and salt bridge interaction substitutions at the binding interface [12]. These protein-protein interactions (PPIs) play a critical role in the virus-receptor interactions [12,13]. Some PPI inhibitors have entered clinical trials in recent years, and some have been licensed for marketing, showing that PPI inhibitors have a wide perspective. In the anti-COVID-19 drug development context, targeting the PPI interface and blocking coronavirus-host receptor interactions via the discovery of PPI inhibitors could be useful to fight the current coronavirus epidemic. As some specific monoclonal antibodies (e.g., CR3022) have been developed to effectively inhibit the viral invasion by blocking the interactions between RBD domain and ACE2.
Since discovering and developing a new drug is a very lengthy and costly process, a combination of computational methods is an excellent replacement to identify potential drug candidates from the large compound libraries [14]. Such in silico approaches are more rational, cost-effective, and time-saving [15]. In this regard, many attempts have recently been made to design new drugs against SARS-CoV-2 infection using computational methods [9,16,17]. These approaches including pharmacophore-based virtual screening, molecular docking, and molecular dynamics (MD) simulation. Pharmacophore-based virtual screening over the chemical libraries is an exciting technique that has been successfully applied for supporting in-silico hit discovery, hit-to-lead expansion, and lead optimization [18]. A pharmacophore leads to the specific molecular identification of ligands by biological macromolecules with specific selectivity [19].
In the present study, we designed computational screening methods to identify potential compounds as inhibitors of SARS-CoV-2 and ACE2 in three main stages. Firstly, we applied a rational approach to mapping the pharmacophoric features of the hot spot regions in the PPI interface between SARS-CoV-2 and ACE2. Also, pharmacophore-based virtual screening was performed on over a hundred million compounds to construct the PPI inhibitor library against ACE2/SARS-CoV-2 RBD. Secondly, the library was subjected to molecular docking to select hit compounds as potential inhibitors. Finally, the selected hit compounds were screened by molecular dynamics simulations and binding free energy calculation to confirm the docked protein-ligand complex's stability and affinity.

Structural analysis of Spike/ACE2 interaction
To investigate the interface between the SARS-CoV-2 RBD and ACE2, the complex crystal structure of ACE2/SARS-CoV-2 RBD (PDB id: 6M0J) was taken from the Protein Data Bank.
Moreover, BIOVIA Discovery Studio 2020 software was employed to analyze intramolecular interactions, such as hydrogen bonding, ionic interactions, and hydrophobic contacts of PPI interactions [20].

Pharmacophore query construction
The pharmacophore query for the ACE2-SARS-CoV-2 RBD complex interfaces was built using the PocketQuery feature generation PPI target-based capability. PocketQuery is an online platform that can use machine learning techniques to identify hotspot amino acids and drug binding sites [21]. PocketQuery first predicts and rates the interface residues in terms of druggability for a query PPI complex. A machine learning technique, the support vector machine (SVM), predicts the druggability of a residue, the change of solvent accessible surface (SASA) area upon complexation (ΔSASA), percentage of the total possible SASA(ΔSASA%), estimation of the change of free energy of an alanine mutation (Rosetta Energy (ΔΔG)), the change of free energy of a residue upon complexation (FastContact Energy (ΔG)), a sequence conservation score, and an evolutionary rate of the residue [22]. The following search filters were applied at this point to find druggable amino acid clusters on the spike: average ΔSASA > 44.6233 average ΔSASA% > 39.6, Cluster Score > 0.55, and Cluster Residue = 3. The clusters were selected based on the highest Cluster Score and the maximum binding site coverage.

Pharmacophore-based virtual screening
Selected clusters were used as a pharmacophore model to conduct a 3D pharmacophore search by the pharmit server [23], which offers both pharmacophore and molecular shape search methods and a rating of results by energy minimization. Since the Pharmit server presented many molecules in the outcome, the resulting hits were refined by applying various filters, such as total hits, hits per conformer, molecular weights, number of rotatable bonds, log P, polar surface area (PSA), number of aromatic rings and RMSD.

Docking-based virtual screening
Hits obtained from the pharmacophore search collected and duplicate compounds were removed by the Babylon tools. The 3D structure of the SARS-CoV-2 spike was selected as the target protein and acquired from Protein Data Bank (PDB Code 6M0J). Ligands and protein were prepared using AutoDock Tools and docked with AutoDock Vina software [24]. Docking was done using a 50 A˚ × 800 A˚ × 50 A˚ binding site grid box centered on the position of Gly493, with the exhaustiveness parameter of 32.

Molecular dynamics simulation
MD simulations were done via GROMACS 5.1.4 package [25] using gromos 54a7 force fields and SP3 water model [26]. It was used the ATB server to prepare the coordinates and topology of ligands [27]. The system was neutralized with the appropriate amounts of chloride and sodium ions. Periodic Boundary Condition (PBC) was applied along every simulation box axis in each simulation system [28]. The LINCS algorithms constrained all covalent bonds. A short-range electrostatic interaction in addition to a 1.2 nm distance cutoff for the van der Waals interaction [29] and the Particle Mesh Ewald (PME) algorithm calculated the long-range electrostatic interaction. The steepest descent algorithm fulfilled the energy minimization of all systems, then the NVT ensemble for 500 ps equilibrated all the systems. Afterward, the NPT ensemble progressively directed the equilibration of each system and the Nose-Hoover algorithm temperature [30] preserved and the temperature at 310 K. During the NPT equilibration, the Parrinello-Rahman barostat [31] maintained the pressures at 1 bar. MD simulation was completed for the complexes in 100 ns.

Binding free-energy analyses
Upon performing MD simulations, Gromacs utilities examined and evaluated the results of every trajectory. The nonpolar and polar interactions between the RBD and ligands can be explained via binding free-energy calculation. Through exercising the MM-PBSA method, the binding free-energy was calculated with the g_mmpbsa tool [32].
The total amount of binding free-energy (∆G) is realized by adding up the nonpolar interaction free energy (∆G nonpolar ) and the polar interaction free energy (∆G polar ) that can be explained as follows: ∆G total = ∆G nonpolar (∆G nonpolar solvation energy + ∆G vdWvan der Waals energy ) + ∆G polar (∆G polar solvation energy +∆G electrostatic energy )

Analysis of the interface between the SARS-CoV-2 RBD and ACE2
As illustrated in Figure 2, the core structure of SARS-CoV-2 RBD consists of a twisted fivestranded antiparallel β-sheets (β1, β2, β3, β4, and β7) with short α helices connected by flexible loops. In addition, between the β4 and β7 strands in the core, there is an extended insertion containing short β-sheets (β5 and β6), α-helices (α6 and α7), and loops (A-B-C) (Figure 2a). This groove-shaped extended insertion is the receptor-binding motif (RBM) that slides and interacts with the claw-like structure of the ACE2 receptor protease domain (PD). ACE2 binds to RBD through N-terminal α1 and α2 antiparallel helices as well as the β3-β4 loop (Figure 2b). In the contact area, a total of 29 residues, including 16 residues of RBM in contact with 13 residues of ACE2, are involved in their interactions. This contact surface can be divided into three regions (Error! Reference source not found.): 1-At the C-terminus of α1 helix (Asp38, Tyr41, and Gln42) and β3-β4 loop (Gly354 and Lys353) from the AEC2 form a network of H-bonds with loop A (Gly446, Tyr449,) loop C (Gly496, Gln498, Thr500, Asn501, Gly502,) and short α7 helix (Tyr505) of the RBD.

Pharmacophore-based virtual screening
Clusters obtained from PocketQuery were used as a query for pharmacophore-based virtual screening by Pharmit. Virtual screening for each cluster was performed on six public databases: PubChem [33], ChEMBL [34], Zinc [35], Mcule [36], ChemSpace [34], and MolPort [37]. During the virtual search, the Lipinski Rule of five was applied. After the pharmacophore search's minimization stage, hits were selected based on the RMSD less than three and the binding energy less than -6 KJ. The number of obtained hits for all clusters was 358 compounds. After removing duplicate compounds, 350 unique hits were identified. The number of obtained hits for each cluster is shown in Supplementary Table S2.

Docking-based virtual screening
The best hits obtained by pharmacophore-based screening were further screened using molecular docking. Molecular docking is a well-known method to predict the binding-conformation of ligands to the appropriate target binding site [38]. The docking processes were carried out using autodock vina software to dock all 350 compounds. After molecular docking, the top-ranked 50 molecules were selected for manual observation according to docking scores and binding mods.
Following manual selection, the ten molecules with a high docking score, low RMSD, and suitable binding mode were selected as potential active molecules for molecular dynamics (Supplementary Figure S2). Docking results revealed that the top 10 compounds interacted with key residues of the contact surface of RBD (RBM) and exhibited a high affinity between -7 and −7.9 kcal/mol. Details of the top 10 compounds, including Docking scores, minimized scores, minimized RMSD, and interactions, are presented in Table 2.

RMSD and RMSF
To recognize the complexes' stability and flexibility, RMSD was employed. The RMSD of complexes is shown in Figure 3 between residue 473 and 493 in all complexes as an exposed loop domain tends to be highly flexible, leading to structural instability during MD simulation [40]. This high flexibility seems to make it difficult for binding inhibitors as well as antibodies in this region.

The number of hydrogen bonds and the free binding energy
The number of hydrogen bonds and the free binding energy between the protein and the ligands play a key role in stabilizing the protein/ligand complexes [41]. The average number of H-bonds between RBD and the ligands at 310 K has been calculated that shown in Figure 5. The compounds, including PubChem-89448287, ZINC000014748612, PubChem-136107209, and PubChem-84280085, formed the most significant number of hydrogen bonds with an average of 2.5, 2, 1.7, and 1.6, respectively important for the binding affinity. Meanwhile, ΔEele made a significant contribution to binding free energy. In contrast, polar solvation energy was found to be unfavorable.

Discussion
In the process of finding PPI inhibitors, combining pharmacophore-based virtual screening with docking is helpful to understand how these ligands bind to the PPI interface and to predict the affinity using a scoring function [42]. In addition, molecular dynamics and MM-PBSA can be used to optimize the structures of the final complexes from docking, calculate more detailed interaction energies, and provide information about the ligand-binding mechanism and binding free energy calculations [43,44]. In order to discover inhibitors of the spike-ACE2 interaction interface, we established the workflow combining pharmacophore virtual screening, docking, molecular dynamics, and mmpbs free energy calculation techniques. Firstly, hot spot regions in the PPI interface between SARS-CoV-2 and ACE2 were investigated. Our analysis detected 29 amino acids that were involved in interactions between ACE2 and the RBD that were divided into three parts ( Figure 2). Secondly, Pocket Query was employed to map the pharmacophore features onto the PPI interface's amino acids. Therefore, six pharmacophore models were obtained based on the score, maximum coverage of the complex contact surface, and the presence of hotspot amino acids (Table 1). Following this, for each model, the 3D pharmacophore search was performed using the Pharmit server on 6 large databases over a hundred million compounds. Subsequently, 356 hits were obtained from Pharmit server by applying various filtering, and ranking criteria, including drug-like properties, Lipinski's "rule of five", the energy minimization, and RMSD described in detail in the methods section. After that, the hits that match a well-defined pharmacophore were analyzed through molecular docking by autodock vina. The top ten molecules with high docking scores, low RMSD, and suitable binding mode were selected as potential active compounds for molecular dynamics analysis and MMPBSA Free energy calculation (Table 3). Consequently, four compounds, including PubChem-84280085, PubChem-89448287, PubChem-136654447, and PubChem-117741619 with high total binding free energy, chose to analyze the binding mode of interactions ( Figure 6 ).

Conclusion
In the current study, pharmacophore virtual screening, docking, molecular dynamics and MM-PBSA approaches were utilized to discover inhibitors of the SARS-CoV-2 spike-ACE2 interaction. Taking together, high binding energy, a large number of bonds with hot spot amino acids, maximum coverage of hot spot area, as well as the structural stability of the complex during MD simulation, PubChem-84280085 compound is proposed as a potential PPI inhibitor for preventing interactions between SARS-CoV-2 RBD and ACE2. However, further investigations are needed to confirm these findings.

Ethical Declaration/Conflict of Interest
This work does not require any ethical statement, and the authors declare no competing interests. S.Gh supervised the study; All authors read and approved the manuscript.  . Spike protein is composed of S1 subunit (which includes the N-terminal domain (NTD) and the receptor-binding domain (RBD) ) and S2 subunits. With a magnification of the RBD-ACE2 binding interface. ACE2 is colored red, SARS-CoV-2 RBD core is colored blue.  a. Schematic of SARS-CoV-2 RBD structure. Different structures are shown by different colors. The core structure of SARS-CoV-2 RBD consists of a twisted five-stranded antiparallel β-sheets (β1, β2, β3, β4, and β7) with short α helices connected by flexible loops. receptor-binding motif (RBM), an extended insertion containing short β-sheets (β5 and β6), α-helices (α6 and α7), and loops (A-B-C), located between the β4 and β7 strands in the core.

M.Y and
b. Overall structure of the SARS-CoV-2 RBD (green and gray) bound to ACE2(purple). The ACE2 binds to RBD through N-terminal α1 and α2 antiparallel helices as well as the β3-β4 loop.
c. Ribbon representation of the RBD-ACE2 binding interface. ACE2 is shown in blue. This contact surface can be divided into three regions: Region one includes residues in C-terminus of α1 and β3-β4 loop from the AEC2 and loop A, loop C, and short α7 residues of the RBD. Region two: includes residues in the middle α1 helix from AEC2 and β4 strands and β5 strands residues from the RBD. Region three: includes residues in N-terminus of α1 and C-terminus α2 helix of ACE2 and loop B residues from the RBD.          Table 2  Table 3. Ranked calculated binding free energies using MM/PBSA for the ten protein-ligand systems (the values presented are in kcal/mol). The terms consist of van der Waals(∆ vdW), electrostatic(ΔEele), polar solvation(ΔGPB), solvent-accessible surface area (ΔGSA), and binding energies(ΔGbinding).