Membrane-Mediated SARS-CoV-2 Host Cell Entry: Potential Inhibitory Roles of Terpenoids In Silico

Targeting viral cell entry proteins is an emerging therapeutic strategy for inhibiting the first stage of SARS-CoV-2 infection. In this study, 106 bioactive terpenoids from African medicinal plants were screened by molecular docking against human angiotensin-converting enzyme 2 (hACE2), human transmembrane protease serine 2 (TMPRSS2) and the spike (S) proteins of SARS-CoV-2, SARS-CoV and MERS-CoV. In silico ADMET and drug-likeness prediction, molecular dynamics simulation (MDS), binding free energy calculations and clustering analysis of the MDS trajectories were performed on the top docked compounds for each target. The results revealed eight terpenoids with high binding tendencies to the catalytic residues of the different targets. The pentacyclic terpenoids 24-methylene cycloartenol and isoiguesterin interacted with the hACE2 binding hotspots for the SARS-CoV-2 spike protein. 11-hydroxy-2-(3,4-dihydroxybenzoyloxy)-abieta-5,7,9(11),13-tetraene-12-one, 11-hydroxy-2-(4-hydroxybenzoyloxy)-abieta-5,7,9(11),13-tetraene-12-one and other abietane diterpenes interacted strongly with the S1-specificity pocket of TMPRSS2. 3-benzoylhosloppone and cucurbitacin interacted with the receptor-binding domain (RBD) and the S2 subunit of the SARS-CoV-2 spike protein, respectively. The terpenoids predicted to be druggable and ADMET-favourable formed structurally stable complexes in the simulated dynamics environment. These terpenoids provide core structures that can be exploited for further lead optimization to design drugs against SARS-CoV-2 cell entry, subject to further in vitro and in vivo studies.

Cell entry of coronaviruses depends on a fine interplay between the viral membrane spike (S) proteins and host cell membrane proteins, the most important of which are angiotensin-converting enzyme 2 (ACE2) and the serine protease transmembrane protease serine 2 (TMPRSS2) [10]. The S-protein comprises two subunits: S1, which contains the receptor-binding domain (RBD), and S2, which mediates fusion of the viral and host cell membranes. SARS-CoV-2 relies on host ACE2 for entry and on TMPRSS2 for S-protein priming. Upon binding of the S-protein to the host receptor through the RBD in the S1 subunit, the S2 subunit mediates fusion of the viral envelope with the host membrane [11]. Although the overall sequence similarity between the S-proteins of SARS-CoV-2 and SARS-CoV is approximately 76%, the affinity of the SARS-CoV-2 S-RBD for ACE2 is approximately ten times higher than that of the SARS-CoV RBD [11][12][13]. This molecular interaction is responsible for regulating both the cross-species and the higher human-to-human transmission of SARS-CoV-2 [14,15]. Therefore, these protein effectors of viral attachment, membrane fusion and cell entry are emerging targets for the development of entry inhibitors, antibodies and vaccines [14].
The use of phytomedicines as cheap alternatives to combat viral diseases and other infections forms an integral component of African cultural practices and is hence a prominent feature in Africa [16][17][18][19][20]. Terpenoids are a well-known class of phytochemicals of tremendous pharmaceutical value because of their broad-spectrum utility in medicine [21,22]. Screening a database of phytochemicals from indigenous African medicinal plants may help identify terpenoids with therapeutic potential against the novel COVID-19 pandemic. Therefore, this study explores computational screening of terpenoids from indigenous African medicinal plants as potential inhibitors of the emerging protein targets responsible for coronavirus cell entry and subsequent infection.
Subsequently, non-polar hydrogens were merged while polar hydrogens were added to each protein. The same scheme was repeated for each protein, which was thereafter saved in the dockable pdbqt format for molecular docking.

Ligand preparation
One hundred and six (106) bioactive terpenoids from African medicinal plants were compiled based on a literature search. Structure Data Format (SDF) files of the reference inhibitors (S1: MLN-4760; S2: camostat; and S3: nelfinavir mesylate) and of the 106 bioactive terpenoids derived from African plants were retrieved from the PubChem database (www.pubchem.ncbi.nlm.nih.gov) and converted to the mol2 chemical format using Open Babel [27]. Compounds that were not available in the database were drawn with ChemDraw version 19 and converted to the mol2 chemical format. Polar hydrogen charges of the Gasteiger type were assigned, the nonpolar hydrogens were merged with the carbons, and the internal degrees of freedom and torsions were set to zero. The protein and ligand molecules were further converted to the dockable pdbqt format using AutoDockTools.
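As an illustration of the SDF-to-mol2 conversion step, the following minimal sketch builds an Open Babel command line for one ligand file. The function name, the output directory and the example file name are hypothetical and are not taken from the study; only the basic `obabel <input> -O <output>` form of the Open Babel command is assumed.

```python
from pathlib import Path


def obabel_command(sdf_path: str, out_dir: str = "ligands_mol2") -> list[str]:
    """Build an Open Babel command converting one SDF file to mol2.

    The output directory name is a hypothetical choice for this sketch.
    """
    out_path = Path(out_dir) / (Path(sdf_path).stem + ".mol2")
    # obabel infers the input and output formats from the file extensions.
    return ["obabel", sdf_path, "-O", str(out_path)]


# Example: command for a hypothetical file holding the reference inhibitor.
cmd = obabel_command("MLN-4760.sdf")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where Open Babel is installed.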

Molecular docking
Molecular docking was performed to evaluate the binding energy and to provide initial coordinates and topology parameters for the MD simulations. Virtual screening against the human enzymes and the active regions of the coronavirus spike proteins, and determination of binding affinities, were carried out using AutoDock Vina [28]; the binding scores from Vina were further validated with BINDSURF [29]. Docking of the bioactive terpenoids and reference compounds against human ACE2, human TMPRSS2 and the SARS-CoV-2 spike protein was performed with AutoDock Vina using a very extended grid (60 Å × 60 Å × 60 Å) that enclosed the whole macromolecule, so as to reveal all possible interaction sites, with an exhaustiveness value of 8. The pdbqt files of each protein and terpenoid were supplied to the respective input fields of AutoDock Vina and BINDSURF (the online tool) to run the docking. The compounds were then ranked by their binding affinity scores. The molecular interactions between the proteins and the selected compounds with higher binding affinity were viewed with Discovery Studio Visualizer version 16.
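The docking settings described above can be captured in an AutoDock Vina configuration file. The sketch below writes such a file with the 60 Å × 60 Å × 60 Å search box and exhaustiveness of 8 stated in the text; the grid centre, the file names and the helper name are assumptions for illustration, since the study does not report them.

```python
def vina_config(receptor, ligand, center, size=(60.0, 60.0, 60.0), exhaustiveness=8):
    """Return the text of an AutoDock Vina config file for a blind-docking box.

    `center` is a hypothetical (x, y, z) grid centre in Angstrom; the box
    size and exhaustiveness default to the values reported in the text.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


# Example with placeholder file names and a placeholder grid centre.
print(vina_config("ace2.pdbqt", "terpenoid.pdbqt", center=(0.0, 0.0, 0.0)))
```

Saving the returned text to a file and running `vina --config <file>` would then start one docking run per receptor-ligand pair.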

Molecular Dynamics Simulation
Molecular dynamics simulations were carried out on selected compounds to evaluate their binding interactions with their target proteins. The protein-ligand complexes of the top docked terpenoids with the SARS-CoV-2 spike (S) protein, human angiotensin-converting enzyme 2 (ACE2) and transmembrane protease serine 2 (TMPRSS2) from the AutoDock Vina docking step were used as input for molecular dynamics simulation (MDS) with the NAMD software [30]. The files necessary for MDS were generated using the CHARMM-GUI webserver [31,32]. For each complex, the system was minimized for 10000 steps and then a production run of 100 ns was performed. The temperature was set to 310 K and the salt concentration to the physiological value of 0.154 M NaCl. Afterwards, the backbone root mean square deviation (RMSD), per-residue root mean square fluctuation (RMSF), radius of gyration (RoG) and solvent accessible surface area (SASA) were calculated using VMD Tk console scripts [33].
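To make the trajectory metrics concrete, the sketch below computes RoG, RMSD and RMSF from plain coordinate lists. It is an illustrative stand-in written in Python, not the VMD Tk scripts used in the study; it assumes the frames have already been superimposed on a reference, and all function names are hypothetical.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame (list of (x, y, z))."""
    total = sum(masses)
    com = [sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3)]
    weighted_sq = sum(
        m * sum((c[i] - com[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total)


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized frames, with no superposition step."""
    sq = sum(
        sum((a[i] - b[i]) ** 2 for i in range(3))
        for a, b in zip(coords_a, coords_b)
    )
    return math.sqrt(sq / len(coords_a))


def rmsf(frames):
    """Per-atom RMSF about each atom's mean position over all frames."""
    n_frames = len(frames)
    result = []
    for j in range(len(frames[0])):
        mean = [sum(f[j][i] for f in frames) / n_frames for i in range(3)]
        msd = sum(
            sum((f[j][i] - mean[i]) ** 2 for i in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result
```

In practice the backbone atoms would be selected and aligned to the first frame before calling `rmsd` or `rmsf`, so that overall rotation and translation do not inflate the values.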

Binding Free Energy calculation and Clustering analysis
Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) [34] calculations were performed using AmberTools 20 [35] on the results from MDS. TTClust version 4.7.2 was used to cluster each trajectory automatically according to the elbow method and to produce a representative structure for each cluster [36].
These representative conformations were analysed using the Protein-Ligand Interaction Profiler (PLIP) to identify the interacting amino acids and the types of interactions [37].

Drug-likeness and ADMET studies
The top terpenoids that demonstrated the highest binding affinity for ACE2, TMPRSS2 and the active regions of the SARS-CoV-2 spike protein were evaluated against several drug-likeness descriptors that an orally bioactive drug should satisfy [38,39]. The Absorption, Distribution, Metabolism, Excretion and Toxicity (ADMET) properties were predicted using the admetSAR webserver [40].
The SDF files and SMILES of the compounds were downloaded from the PubChem database to calculate ADMET properties using default parameters.

The docking analysis revealed that the reference inhibitor of the human ACE2 protein (MLN-4760) had a binding energy of -7.7 kcal/mol, while camostat, an inhibitor of TMPRSS2, had a binding energy of -7.6 kcal/mol, as represented in figure 3. It was further observed that the topmost docked terpenoids to ACE2 had higher binding affinity for the S proteins of SARS-CoV and MERS-CoV than for that of SARS-CoV-2. More than 10 terpenoids had higher binding affinity than the 3 inhibitors used in this study (table S1, supplementary material). The top 20 docked compounds to the SARS-CoV-2 S protein had higher binding affinity than nelfinavir mesylate (table S2, supplementary material).
From the binding scores generated by the terpenoids interacting with the ACE2 and TMPRSS2 proteins, the 2 best docked terpenoids with the highest binding affinity were 24-methylene cycloartenol and isoiguesterin, with binding energies of -9.7 and -9.5 kcal/mol, respectively. The 2 best docked terpenoids to the SARS-CoV-2 S protein were 3-benzoylhosloppone and cucurbitacin, with binding energies of -9.4 and -9.3 kcal/mol, respectively. 3-benzoylhosloppone had the highest binding affinity for the SARS-CoV-2 S protein and the second highest binding affinity for the MERS-CoV S protein (Figure 3).
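The ranking step can be illustrated with a short sketch that keeps only compounds scoring better (more negative) than a reference inhibitor and sorts them from strongest to weakest. The scores are the ones quoted in the text, on the assumption that the two terpenoid values refer to ACE2; the function name is hypothetical.

```python
def rank_hits(scores, reference):
    """Return (name, score) pairs that beat the reference, best first.

    `scores` maps compound names to docking scores in kcal/mol, where a
    more negative score means a stronger predicted binding.
    """
    ref_score = scores[reference]
    hits = [
        (name, score)
        for name, score in scores.items()
        if name != reference and score < ref_score
    ]
    return sorted(hits, key=lambda item: item[1])


# Scores quoted in the text (assumed here to be the ACE2 values).
ace2_scores = {
    "MLN-4760": -7.7,
    "isoiguesterin": -9.5,
    "24-methylene cycloartenol": -9.7,
}
for name, score in rank_hits(ace2_scores, reference="MLN-4760"):
    print(f"{name}: {score} kcal/mol")
```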

Amino acid interaction of selected terpenoids with target proteins.
The amino acid interactions of the human target proteins (ACE2 and TMPRSS2) with the reference inhibitors and the plant-derived terpenoids that demonstrated the highest binding tendencies are represented in table 1. In the same way, the amino acid residues of the coronavirus S proteins that interacted with the reference inhibitors and the terpenoids with the highest binding affinity are shown in table 2. The residues of ACE2 and TMPRSS2 interacted with the respective ligand groups mainly through hydrophobic interactions and H-bonds. A few H-bonds shorter than 3.40 Å were observed with the coronavirus S proteins (table 4).
MLN-4760 was docked into the N-terminus- and zinc-containing subdomain I of ACE2 (figure 4a). MLN-4760 exhibited several types of hydrophobic interactions (Pi-Sigma, Pi-Pi T-Shaped, Pi-Alkyl and Alkyl) with TYR 510, PHE 504, MET 360, LYS 363 and CYS 344; salt-bridge and attractive-charge interactions with ARG 514, ARG 518 and ARG 278; and hydrogen bonds to TYR 515, THR 371, PRO 346 and ARG 273 (figure 4a).
24-methylene cycloartenol, the best docked terpenoid, was docked into the C-terminus-containing subdomain II of ACE2 but interacted with residues different from those involved with N-acetyl-D-glucosamine (figure 4b). 24-methylene cycloartenol interacted via H-bonds with TRP 163, SER 170 and TYR 497. Pi-Alkyl interactions were also observed with TYR 613, PRO 492 and VAL 491. Isoiguesterin interacted via H-bonds with ASP 350, TYR 385 and ASN 394. Pi-Alkyl and Alkyl interactions were observed with ALA 99, PHE 40 and PHE 390, and with LEU 73 and TRP 69, respectively, in a binding pattern similar to that of MLN-4760 (figure 4c).
Camostat was docked into the S1-specificity pocket of TMPRSS2 (figure 5a). It interacted via conventional H-bonds with five amino acid residues (ARG 41, SER 195, TRP 215, ALA 190 and ASP 189) and via a carbon-hydrogen bond with GLN 192 of TMPRSS2. Moving toward the guanidine group, the conventional H-bonds were formed in this order: ARG 41 with the first ester bond, SER 195 with the second ester bond, and the last three residues with the amidino nitrogen of the guanidine group. The phenyl ring was responsible for the carbon-hydrogen bond with GLN 192 (figure 5a). The abietane diterpenes T3 and T4 were docked into the S1-specificity pocket of TMPRSS2 in a binding pattern similar to that of camostat (figure 5b & 5c). The only difference observed between the binding patterns of T3 and T4 was an additional H-bond between T3 and ARG 41 (figure 5b).
Nelfinavir mesylate, an inhibitor of the SARS-CoV and MERS-CoV S proteins, interacted in its best docked conformation with the S protein of SARS-CoV-2 in a different manner. Nelfinavir mesylate was docked into the S2 subunit of the SARS-CoV S protein (figure 7a). The same inhibitor was docked into the N-terminal domain (NTD) region of the S1 subunit of the SARS-CoV-2 and MERS-CoV S proteins (figure 6a & 8a).
3-benzoylhosloppone, which had the highest binding affinity for the SARS-CoV-2 S protein, interacted via an H-bond with THR 547, an Alkyl interaction with PHE 541 and Pi-Alkyl interactions with PRO 589 and LEU 546. The region of interaction lay between the CTD and SD1 regions of the S1 subunit of the SARS-CoV-2 S protein. Cucurbitacin B was docked to the S2 subunit of the SARS-CoV-2 S protein but interacted with different amino acid residues.
Cucurbitacin B interacted with the protein via H-bonds with ARG 1091, ASN 914, THR 912 and GLN 1113; a Pi-Sigma bond with PHE 1121; and Alkyl interactions with ILE 1114 and GLY 1124 (figure 6c). Pi-Sigma bonds were observed with the remaining amino acid residues (table 4; figure 8a & 8b). 3-benzoylhosloppone interacted via a Pi-Sigma interaction with PHE 341 of the NTD, Pi-Pi stacking with MET 698 of SD2, a Pi-Alkyl interaction with LYS 689 of SD2, and Alkyl interactions with LEU 344 and ILE 337 of the NTD of the S1 subunit (figure 8c).

Energy profile of best docked terpenoids to respective proteins
The overall energy profiles of the terpenoid-protein complexes in the selected clusters with the best docked poses are shown in figures 9-11.

Molecular Dynamics Simulation
Four compounds, namely camostat, 11-hydroxy-2-(3,4-dihydroxybenzoyloxy)-abieta-5,7,9(11),13-tetraene-12-one (T3), 24-methylene cycloartenol and 3-benzoylhosloppone, were analysed for their interactions with the SARS-CoV-2 spike glycoprotein (S protein), angiotensin-converting enzyme 2 (ACE2) and transmembrane protease serine 2 (TMPRSS2). Molecular dynamics simulation was performed on each protein-ligand complex and the trajectories were analysed. The radius of gyration (RoG), root mean square deviation (RMSD), root mean square fluctuation (RMSF) and solvent accessible surface area (SASA) were calculated for each trajectory and are shown in figure 11. There was no observed difference between the RoG of the TMPRSS2_camostat and TMPRSS2_T3 complexes, while that of ACE2 was larger and that of the S protein was the largest because it is a monomer of the SARS-CoV-2 spike trimer. Each RoG fluctuated about a roughly constant value. The RMSD values of the TMPRSS2_T3, ACE2_24-methylene cycloartenol, TMPRSS2_camostat and S protein_3-benzoylhosloppone complexes were around 2.13 Å, 3.6 Å, 2.14 Å and 16.78 Å, respectively, while the RMSF values of the same complexes fluctuated around 0.68 Å, 1.29 Å, 0.73 Å and 7.36 Å, respectively. The TMPRSS2_T3, ACE2_24-methylene cycloartenol and TMPRSS2_camostat complexes showed a spike at the end of their RMSF profiles, indicating motion of the termini. The spikes at the start and in the middle of the RMSF profile of the ACE2_24-methylene cycloartenol complex, between amino acids 265 and 443, and the spikes in the S protein_3-benzoylhosloppone complex correspond to loops in the respective proteins (Figure 12).
The SASA values were nearly stable for each complex but differed between complexes. The Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) algorithm in AmberTools 20 was used to calculate the ligand binding free energy. All frames (~1000 frames) were used in this calculation for each protein-drug complex. Figure 14 shows the binding affinity in kcal/mol from the MM/GBSA analysis, with the standard deviation as error bars, for each protein-drug complex. The best (most negative) binding affinity was that of TMPRSS2_camostat (-53.5059 kcal/mol), which indicates strong binding between them. Table 3 shows the number of clusters, the representative frame produced for each trajectory, and the interaction types identified with the PLIP webserver. Hydrophobic interactions, H-bonds, salt bridges, pi-cation and pi-stacking were the types of interactions found by the PLIP webserver. Most of the complexes had H-bonds and hydrophobic interactions, with TMPRSS2_camostat having the largest number of bonds in each cluster compared to the other complexes. Figure 15 shows the cluster representatives of the protein-ligand complexes and the mode of interaction in the enlarged part of each image. Images were generated using PyMOL version 2.2.2.
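The way a trajectory-averaged MM/GBSA value and its error bar arise can be sketched as follows, using the standard end-point relation ΔG_bind = G_complex − G_receptor − G_ligand evaluated per frame. This is an illustrative calculation and not the AmberTools workflow itself; the function name and the energies in the example are hypothetical.

```python
from statistics import mean, stdev


def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Mean and sample standard deviation of per-frame binding energies.

    Each argument is a list with one free energy (kcal/mol) per frame.
    """
    per_frame = [c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand)]
    return mean(per_frame), stdev(per_frame)


# Three hypothetical frames; real runs in the text used about 1000 frames.
avg, sd = binding_free_energy(
    g_complex=[-110.0, -112.0, -108.0],
    g_receptor=[-50.0, -50.0, -50.0],
    g_ligand=[-10.0, -10.0, -10.0],
)
print(f"dG_bind = {avg:.2f} +/- {sd:.2f} kcal/mol")
```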

Drug likeness and Pharmacokinetic properties of selected terpenoids.
The results generated from the Lipinski and ADMET filtering analyses are represented in table 4 and figure S1 (supplementary file).
Four terpenoids, T1, T3, T5 and T6, fulfilled the requirements of Lipinski's rule of five, with correspondingly favourable predicted ADMET parameters.
The in silico predicted pharmacokinetic and ADMET properties from the filtering analyses suggested that T1, T3, T5 and T6 have a high probability of absorption and subcellular distribution, and low toxicity. Although the pharmacokinetic analysis indicated that T1 was less soluble (Table 4), the ADME/tox analysis indicated high aqueous solubility, high human intestinal absorption and low acute oral toxicity with a good bioavailability score, as also exhibited by T3, T5 and T6 (Table 4).
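For readers unfamiliar with the rule of five, the sketch below applies the standard Lipinski thresholds (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 H-bond donors and at most 10 H-bond acceptors). Allowing one violation is the common convention and is an assumption here, since the exact cut-off applied in the study is not stated; the descriptor values in the example are hypothetical.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Return a list describing each rule-of-five criterion that is violated."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if logp > 5:
        violations.append("logP > 5")
    if h_donors > 5:
        violations.append("H-bond donors > 5")
    if h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_rule_of_five(mol_weight, logp, h_donors, h_acceptors, max_violations=1):
    """True when the compound has no more than `max_violations` violations."""
    found = lipinski_violations(mol_weight, logp, h_donors, h_acceptors)
    return len(found) <= max_violations


# Hypothetical descriptor values for one compound.
print(passes_rule_of_five(mol_weight=440.7, logp=5.6, h_donors=1, h_acceptors=2))
```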

Discussion
Interference with the proteins that mediate viral attachment, membrane fusion and cell entry of coronaviruses is an emerging therapeutic strategy for preventing COVID-19 infection [10,41]. This principle was earlier demonstrated with HIV [42,43] and SARS-CoV [44]. Earlier screening and prospecting of therapeutic phytocompounds have been reported for both SARS-CoV and MERS-CoV [45][46][47][48]. Cell-based assays have shown the antiviral potential of specific plant terpenoids against Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) [45,49]. This study was therefore undertaken to identify potential inhibitors of membrane-mediated SARS-CoV-2 entry proteins from the class of plant-derived terpenoids. Specifically, two triterpenes, 24-methylene cycloartenol and isoiguesterin, were found to target ACE2 as well as the host-virus interface (the S-protein-ACE2 receptor complex). These compounds interacted with adjacent residues in the conserved domain, suggesting an ability to bind and block interactions of the hotspot 31 residues. The residues near lysine 31 and tyrosine 41, and residues 82-84 and 353-357, in human ACE2 are important for the binding of the coronavirus S-protein [50]. Hotspot 31 comprises a salt bridge between Lys31 and Glu35, and hotspot 353 comprises a salt bridge between Lys353 and Asp38; both are buried in a hydrophobic environment, so interaction within this region is suggested to affect the binding of its substrate [51]. The abietane diterpenes 11-hydroxy-2-(3,4-dihydroxybenzoyloxy)-abieta-5,7,9(11),13-tetraene-12-one (T3) and 11-hydroxy-2-(4-hydroxybenzoyloxy)-abieta-5,7,9(11),13-tetraene-12-one (T4) showed the strongest interaction with TMPRSS2. In a binding pattern similar to that of camostat, these compounds fitted into the S1-specificity pocket. They interacted with residues ALA 190, ASP 189 and GLN 192, which are known to be among the amino acids at the base of the pocket.
ASP 189 at the bottom of the pocket is known to determine the specificity of the S1 pocket for the basic residues Arg and Lys at position P1 of the substrate [24]. The results showed that the hydroxybenzoyloxy moiety of the terpenoids (T3 and T4) was responsible for at least 75% of the H-bonds with the protein. It was further observed that, just as with benzamidine (the native ligand) and camostat, the hydroxybenzoyloxy moiety of the two terpenoids pointed with its hydroxyl group towards the carboxylate group of ASP 189, forming strong H-bonds with ASP 189 and other residues in the pocket. For camostat, the phenylguanidine moiety pointed into the hydrophobic pocket with the negatively charged ASP 189 at its bottom. Unlike the H-bonds formed by the amidino nitrogens of the phenylguanidine and of benzamidine, the H-bonds in T3 and T4 were formed mainly with the hydroxyl and carboxylate groups. A striking similarity was that the ester bond linking the phenylguanidine moiety of camostat, and the hydroxybenzoyloxy moiety of T3 and T4, to the remaining structural unit of each compound formed strong H-bonds with the same residue, SER 195. The phenyl group of the hydroxybenzoyloxy moiety of T3 and T4 further formed hydrophobic interactions with CYS 119 and CYS 219, just as the peptide planes of the bonds between Trp215-Gly216 and Cys191-Gln192 sandwich the phenyl ring of benzamidine [24,52]. The additional hydrophobic interactions of T3 and T4 may have been responsible for their higher binding affinities compared with camostat and benzamidine. Furthermore, while the hydroxybenzoyloxy moiety was directed toward the hydrophobic cleft created by ASP 189, the abietane aglycone interacted with the imidazole ring of HIS 57 of the S2 pocket, which lies next to the S1 pocket, and with ARG 41 (in the case of T4); both residues are outside the hydrophobic cleft. An interaction similar to the latter was observed with camostat.
The strong similarity in binding pattern, together with a much stronger binding affinity than camostat and benzamidine, indicates that T3, T4 and other abietane diterpenes, especially those with a hydroxybenzoyloxy moiety attached to the abietane aglycone, are potential inhibitors of TMPRSS2 and could thus prevent some coronaviruses from entering host cells [24]. It is known that, like SARS-CoV, the SARS-CoV-2 S protein recognizes and binds the host cell receptor angiotensin-converting enzyme 2 (ACE2), and that transmembrane protease serine 2 (TMPRSS2) activates the S protein to facilitate viral fusion and entry into cells [9]. It is important to note that the serine protease inhibitor camostat mesylate, which blocks the activity of TMPRSS2 [53], has been approved in Japan for human use. Related compounds with antiviral activity have potential as anti-SARS-CoV-2 agents [54]. Some abietane terpenoids have also been identified as exhibiting in vitro anti-SARS-CoV activity [45]. This corroborates the result of our study, which shows that abietane diterpenes exhibit a wide spectrum and multiplicity of protein binding and may thereby specifically execute a complete blockage of viral entry.
With regard to the coronavirus S-proteins, two compounds, 3-benzoylhosloppone and cucurbitacin B, were of utmost interest. While 3-benzoylhosloppone interacted with amino acid residues of the RBD and SD1 regions of the S1 subunit, cucurbitacin B was docked into the S2 subunit of the SARS-CoV-2 S protein. The former subunit is responsible for receptor recognition while the latter mediates fusion of the viral membrane with the host cell membrane [55]. These terpenoids may prevent interaction of the spike protein with its host cell receptor, thereby preventing entry of the virus into the host cell. 3-benzoylhosloppone has been reported for its antimalarial property, while cucurbitacin B is an anticancer agent [56,57]. The MDS analysis showed that the complexes of the top docked terpenoids with their target proteins were stable and could therefore be subjected to experimental processes in further studies. From the Lipinski, pharmacokinetic and ADMET filtering analyses, we identified four druggable and non-toxic natural terpenoids that exhibited strong binding tendencies to the various protein targets that mediate coronavirus-host cell entry. The results of the predicted filtering analyses of the four compounds showed parameters that suggest favourable in silico ADMET and pharmacokinetic properties. The terpenoids showed a high probability of human intestinal absorption. They were also predicted to be non-substrates of permeability glycoprotein (P-gp) [58] and to be capable of crossing the blood-brain barrier (BBB). SARS-CoV-2 has been reported to infect the brain, indicating its ability to cross the BBB [59]; compounds that can cross the BBB may therefore be beneficial for overall viral clearance. The four terpenoids did not show any indication of mutagenicity in silico and thus may not cause genetic mutations. The compounds did not display inhibitory potential against the various cytochrome P450 enzymes and thus may not adversely affect phase I drug metabolism in the liver.
These terpenoids are therefore considered as potential drug candidates.
Since the identified lead compounds showed drug-likeness and low toxicity, as indicated by the in silico pharmacokinetically relevant molecular descriptors, they are postulated as potential inhibitors that can be considered for further in vitro and in vivo studies towards developing entry inhibitors against the ongoing coronavirus pandemic.

Table 3: Number of clusters produced by TTClust, the representative frame for each protein-ligand complex, and the interactions between the ligand and the protein identified by the PLIP webserver for that frame.

Declarations
Amino acid residues were represented by single letter code. Bold amino acids are common in each protein-drug complex