DOI: https://doi.org/10.21203/rs.3.rs-37557/v1
In this work, we aimed to identify potential SARS-COV2 virus Mpro protease inhibitors using molecular docking. As a validation step of our docking approach, we used 7 known Mpro inhibitors and predicted their binding energies. We found a very good correlation (R = -0.86) between the binding energies and the pIC50 (-LogIC50) values. We then undertook virtual screening of 50753 heterocyclic molecules from the PubChem Database. The screened molecules were first filtered out using Lipinski, Veber and Ghose rules resulting in 21 142 which was submitted to the docking phase. The docked compounds were ranked according to their predicted binding energy to the protease and a threshold of -9.0 Kcal/mol was applied resulting in a set of 2711 hits. These hits were split into different groups according to their chemical class using a rule-based classification approach. The best compound from each of the most populated classes were subjected to binding mode analysis which led to ligand-receptor interaction maps and suggested some medicinal chemistry-oriented modifications to further optimize the potency of the obtained hits. Predicted IC50 of the hit molecules ranged from 0.85 to 0.43 µM against SARS-COV2 virus Mpro protease.
Since the onset at the end of 2019 of infection with the SARS-COV2 virus to the present day (May 26th, 2020) [1, 2], the pandemic situation has resulted in 5,370,375 confirmed cases and 344,454 overall deaths in 195 countries and territories. In Europe, 2,025,176 confirmed cases have been reported including 173,454 death. The countries most affected by the pandemic are the United States, Brazil, Spain, Italy, United Kingdom, India and Germany [3]. The USA remains the country most affected by the coronavirus with nearly a third of Covid–19 contaminations and almost a quarter of the dead (99,805 deaths).
The aggressive nature of this viral infection has pushed the scientific community into a race against the clock to find remedies and solutions to control its spread around the world. One of the solutions implemented very quickly by some countries and belatedly by others is the containment of the population combined with the wearing of protective masks. These solutions have made it possible to reduce the number of new cases significantly in certain countries, but the risk of a new wave of contamination remained real. In addition to the health consequences, the pandemic has caused a slowdown in economic activity and stunted growth judged by economic indicators with an unprecedented decrease in industrial production. This impact could exceed $4.1 Trillion [4].
Faced with this global health crisis, pharmaceutical players were working hard to develop therapeutic solutions (vaccines or drugs). Chloroquine was the first drug to be clinically tested against SARS-COV–2 [5]. The pioneering team in this therapeutic fight against the SARS-COV2 virus is that of a French researcher at Aix-Marseille Hospital in France. Their study showed a synergistic effect in vitro between chloroquine (primarily used to prevent and treat malaria) and azythromycin [6] (an antibiotic used for the treatment of a number of bacterial infections) against the SARS-COV2 virus in concentrations compatible with those obtained in the human lung [7]. Other existing drugs were tested against the virus such as Remdesivir and proved efficacy in treating patients carrying COVID–19 disease [8].
Target based drug discovery approach combined to molecular modeling and medicinal chemistry proved to be a highly efficient strategy to quickly develop therapeutics against human diseases [9–11]. With the genomic characterization of the SARS-COV2 virus and the elucidation of the 3D structures of certain of its proteins such as the main protease [12, 13], a door opened for the use of the target-based approach drug discovery to identify therapeutic agents against this virus [14]. Recent works using computational approaches such as structure-based drug design, molecular modeling and virtual screening assisted in the identification of potential inhibitors from different sources (existing drugs, natural compounds, chemical synthesis) against some key targets of SARS-COV2 virus [15, 16].
In this work, following target-based drug discovery approach, we undertook the virtual screening of heterocyclic compounds against the main protease of the SARS-COV2 virus. The potential inhibitors were grouped into chemical classes and each class was characterized by its mode of binding in the active site of the protease. This work opens interesting perspectives for medicinal chemistry combined with the structured-based drug design to optimize the activity of the selected hits from this study and to synthesize and evaluate the activity of new anti-SARS-COV2 main protease inhibitors.
2.1. Chemical drawing and classification. All molecular and Markush structures were designed by MarvinSketch [35]. The chemical classification of the molecules was performed using ClassyFire engine [36].
2.2 Ligand Data preparation. Data on the inhibition of main protease Mpro of SARS-COV2 virus of 7 molecules was obtained from of the work of Jin, Z. et al. (2019) [13]. 50753 2D structures of heterocyclic molecules were download from the PubChem Database [37]. The salts forms of the molecules as well as the mixtures were processed and the largest fragment in each molecule was kept. The strong acids were deprotonated, the strong bases were protonated, and explicit hydrogen were added to the structures. The resulting set of molecules was subjected to sequential filtering steps including the Lipinski rules [38] (MW < = 500, logP < = 5, HBA < = 10; HBD < = 5), Veber’s rules [39] (Rotatable Bonds ≤ 10; Polar surface Area ≤140 Å2) and Ghose rules [40] (40 < = Molar refractivity < = 130; 20 < = Atom Count < = 70). The filtered set of heterocyclic molecules as well as the known inhibitors were converted to 3D structure (when keeping the original stereochemistry information) with an energy minimization step using openBabel Software [41].
2.3. Molecular Docking process. The 3D structure coordinates of the SARS-COV2 main protases Mpro (PDB code: 5RGI) was obtained from the Protein Data Bank [42, 43]. The molecular docking was performed by AutoDock Vina package [44]. AutoDock MGTOOLS program was used to prepare the active site of the receptor [44]. All the water molecules as well the ligand structure co-crystallized with the protein were removed. Nonpolar hydrogens were added to the protein and a grid box was constructed around the protease active with a size of 20 Å, 16 Å et 20 Å in X, Y and Z directions with a central point defined at 9, 0,1 and 21 three-dimensional coordinates. The 3D structure of the protease was then saved into a PDBQT file format. All the ligand 3D structures were prepared for docking using a python script provided with AuDock Vina package and which produced the PDBQT files needed for the docking step. All the other AutoDock Vina parameters were kept at their default values.
2.4. Visualization. This step was performed using PyMol softaware [45]. AutoDock Vina generates 9 poses for each docked ligand. An inhouse python script was used to extract the pose with the best score which expresses the binding energy of the ligand in Kcal/mol as unit. Th generation and visualization of 2D ligand-receptor interactions maps was performed by Discovery Studio Vizualizer [46].
As a validation step of the docking approach, knwon inhibitors were docked into the active site of Mpro protease. This step concerned 7 inhibitors taken from the work of Jin, Z. et al. (2019) [13] (Table 1). Carmofur, anti-cancer agent, is a pyrimidine derivatives obtained from fluorouracil (chemotherap agent) that can be administered orally [17]. Ebselen, a benzoisoselenazolone derivative, is a non-toxic seleno-organic drug with antiinflammatory, antiatherosclerotic and cytoprotective properties [18, 19]. Tideglusib, a thiodiazole derivative, is a potent, selective and irreversible inhibitor of glycogen synthase kinase 3 (GSK–3) [20]. TDZD–8, an other thiodiazole derivative is also an inhibitor of GSK3 kinase [21]. PX–12, a sulfides derivative, is an irreversible inhibitor the redox protein thioredoxin [22], which has been associated with cancer and tumor growth. Disulfuram is a carbamate derivative used as an alcohol deterrent and inhibitor of aldehyde dehydrogenase [23]. Shikonin, a quinone derivative, is an apoptotic agent active in a variety of cancer cell lines but not in normal cell lines [24].
Due to technical limitations, Table 1 is provided in the Supplementary Files section.
Molecular Docking of these known inhibitors showed docking scores ranging from –7.9 Kcal/mol to –1.7 Kcal/mol. When comparing the pIC50 (-Log IC50) values and the docking scores, we obtained a good anti-correlation corefficient (R) of –0.86 and a regression determination coefficient R2 fo 0.74 (Figure 1).
Before molecular docking step, the set of heterocyclic compounds were subjected to a filtering step using Lipinski’s rule of five. This first filering kept 34783 out of 50753 compounds. The application of the Veber rules resulted into 32669 compounds and the Ghose rules reduced the set of compounds to 21 142. This final set of compouds was then submitted to the docking phase into the Mpro protease active site.
The filtered set of heterocyclic compounds were virtually screened within the Mpro protease active site prepared using AutoDock Vina. The program predicts the best poses that a ligand may have and its binding energy (docking score) to the receptor. The docked compounds were then ranked according to their score of binding energies. In order to look for compounds potentially more active than Ebselen (Score = –7.9 Kcal/mol, pC50 = 6.17) and Tideglusib (Score = –7.9 Kcal/mol, pIC50 = 5.81), we set a threshold for the docking binding energy at –9.0 Kcal/Mol. Using this threshold, we selected 2711 compounds whose binding energy was less or equal to −9.0 kcal/mol (Supplementary Table S1). This set is referred to as HitSet.
The distributions of the four Lipinsk’s rules with the selected set of compounds were quite symmetrical except for the hydrogen bonds donor property where the distribution was skewed at the left. In fact, 80% of the molecules have less or equal to two hydrogen bond donors (Figure 2).
The compounds of the HitSet were submitted to the classification engine Classyfire. This web application uses a rule-based classification approach that relies on a comprehensible, comprehensive, and computable chemical taxonomy. As a result, we obtained 162 different chemical classes (Supplementary Materials Table S1). Nine chemical classes had more than 70 compounds: Benzene and substituted derivatives, quinolines and derivatives, indoles and derivatives, benzopyrans, diazanaphthalenes, diazinanes, carboxylic acids and derivatives, azoles and organo-oxygens. We have analyzed the members of each chemical classes and selected few representatives with the best docking scores for analyzing their binding mode inside the Mpro protease active site. We have also analyzed some exceptional natural molecules that were not within the HitSet due to the noncompliance with Ghose rules but presented interesting predicted binding energies
In the first step of the docking procedure, we have docked the 7 known inhibitors into the Mpro active site protease. The existence of the good correlation (R = –0,86) between the docking scores and the IC50 values of the known inhibitors suggested that our docking could be used to predict the binding mode of potential inhibitors and their IC50 for Mpro protease. The visual inspection of the most active inhibitor ebselen (IC50 = 0,26 μM) docked into the protease active site revealed that this compound established π-anion interaction with Glu–166, π-sigma interaction with Met–165, van der Waals (VDW) contacts with His–41 and His–163 and weak hydrogen bond (HB) contact with His–163 (Figure 3).
From each highly populated chemical class obtained from the HiTSet, we analyzed the lead compound showing the highest docking score. For these compounds, we have elucidated and discussed their binding mode into Mpro protease active site and predicted their binding energy as well as their IC50 (Table 2).
This chemical class is the largest in terms of number of compounds. The most active compound holds a fluorobenzene moiety and presented a docking score of –10,6 Kcal/mol which corresponded to a predicted IC50 of 0.6 μM derived from the simple linear regression we have obtained before. This compound, referred to as HBF–0259, was reported as an inhibitor of hepatitis B virus surface antigen (HBsAg) secretion [25]. The visual inspection of its binding mode to Mpro protease revealed multiple contacts to the residues of the active site of types, HB (Leu–141, Gly–143, Ser–144, Cys–145), VDW (Met–49, His–163), and π-sigma (Leu–27, His–41, Met–165) (Figure 4-a). Further optimization of the potency of this compound could be carried out through diverse substitutions on the phenyl groups attached to the pyrimidine core. The substitutions could target new interactions with other amino acids in the active site for enhanced affinity to the protease.
The best compound from this class, a triazine derivative, resulted in a docking score of –10.8 Kcal/mol which corresponds to a predicted IC50 of 0.57 µM. This compound, named Capmatinib, is a potent inhibitor of c-Met kinase with antineoplastic activity [26]. It showed dense network of contacts within the Mpro protease active site including HB (Ser–144, Thr–25), VDW (Cys–44, Gln–189, Ser–46, His–172, His–163, Leu–141, Leu–27), π-Alkyl (Met–49, Cys–145, Thr–26) and π-π T-shaped to His–41 (Figure 4-b). Different potency optimization possibilities are available on the quinoline moiety as well as on the Fluro-phenyl moiety.
Table 2. Predicted most active compounds from the HitSet chemical classes.
Compound |
Chemical Class |
Direct Parent Class |
Predicted Binding Energy (Kcal/mol) |
Stdev Binding Energy (Kcal/mol) |
Predicted IC50 (µM) |
HBF-0259 |
Benzene and substituted derivatives |
Fluorobenzenes |
-10,60 |
1,10 |
0.60 |
Capmatinib |
Quinolines and derivatives |
Quinolines and derivatives |
-10.80 |
1.17 |
0.57 |
Arcyroxocin B |
Indoles and derivatives |
Hydroxyindoles |
-10.80 |
0.72 |
0.57 |
DR396 |
Benzopyrans |
Xanthenes |
-11.30 |
0.52 |
0.51 |
Pazinaclone |
Carboxylic acids and derivatives |
Beta amino acids and derivatives |
-11.50 |
1.03 |
0.49 |
Fumiquinazoline C |
Diazanaphthalenes |
Quinazolines |
-11.60 |
0.95 |
0.48 |
57 |
Diazinanes |
N-arylpiperazines |
-10.30 |
0.67 |
0.64 |
SR-3029 |
Azoles |
Phenylimidazoles |
-11.20 |
0.76 |
0.52 |
AMG-579 |
Organooxygen compounds |
Aryl-phenylketones |
-10.50 |
0.49 |
0.61 |
Silydianin |
Flavonoids |
Flavones |
-11.1 |
0.69 |
0.53 |
In this chemical class, the best two compounds (Score = –10.9 Kcal/mol) are straurosporine like compounds which are potent non-specific inhibitors of kinases but present high toxicity. For this reason, we choose to analyse the binding mode of the third best compound presenting a docking score of –10.8 Kcal/mol (predicted IC50 = 0.57 uM) and which is referred to as Arcyroxocin B. This alkaloid was isolated from two fungal species Arcyria denudata and Arcyria obvelata and showed cytotoxicity against Jurkat cells [27], an immortalized line of human T lymphocyte cells that are used to study acute T cell leukemia. When docked into the active site of Mpro protease, Arcyroxocin B atoms established multiple HB contacts (Cys–145, Gly–143, Ser–144, Leu–141, His–163, Glu–166), VDW interactions (His–41, Phe–140, Met–165), and π-Alkyl interaction with Cys–145 (Figure 4-c). The potency of this compound can be further optimized through substitutions on the phenyl groups of the two indole moieties.
Two compounds from this chemical class, fluorescein 5-maleimide and DR396 a benzoic acid derivative compound, presented docking scores of –11.3 Kcal/mol corresponding to predicted IC50 of 0.5 µM. We choose to study the binding mode of DR396 since it has a reported biological activity as an apoptotic DNase γ inhibitor [28]. In fact, DR396 showed several interactions with the protein amino acids of type HB (Tht–190, Gln–192), Halogen (Glu–166, Phe–140, Gly–143, Asn–142), VDW (Arg–188, Leu–167, Ser–144, Cys–145, His–41, Met–49, Gln–189), and π-interactions (Met–165) (Figure 4-d). Chemical diversifications for optimizing the potency of this compounds could take place at different positions of the molecule on the benzopyran core as well as on the phenyl group of the benzoic acid.
In this chemical class of selected hits, the best compound showed a docking score of –11.5 Kcal/mol and a predicted IC50 of 0,48 µM. This compound referred to as Pazinaclone (DN–2327), is an anxiolytic drug [29]. The evaluation of the binding mode of pazinaclone in the Mpro protease active site showed that it is interacting with several protein amino acids through HB (Gly–143, Cys–145, Ser–144, Asn–145, Glu–166), VDW (Leu–27, Leu–141, His–163, His–164), Alkyl interaction with Met–165, and π-interactions with His–41 (Figure 4-e). For optimization purposes, chemistry could take place with diverse substitutions on the naphtyiridine group.
Two compounds from this chemical class, fumiquinazoline C and H, presented a docking score of—11,6 Kcal/mol corresponding to predicted IC50 of 0.47 µM. These two molecules are produced by Aspergillus fumigatus, a marine fungus that makes a series of fumiquinazoline peptidyl alkaloids [30]. As the compounds are very similar, we reported only the binding of fumiquinazoline C whose derivatives were reported to have antitumor activity [38]. Fumiquinazoline C predicted binding mode revealed that the atoms of this compounds are in close contact with different protein residues through HB (His–41, His–163, Glu–166, Cys–145), VDW interactions (Leu–141, His–172, His–163, His–164, Met–165), and π-interactions (pi-Alkyl with Pro–168 and Cys–145, π-π T-Shaped with His–41) (Figure 4-f).
The best compound in terms of docking score from this class shows predicted values for binding energy of –10.3 Kcal/mol and an IC50 of 0.63 uM. It was reported to have an antagonist activity against 5-HT receptor [31]. Compound 57 presented a high density of interactions within the Mpro protease active site. In fact, it was able to make HB contacts to four amino acids (Gly–143, His–163, Leu–141, Ser–144), VDW interactions four amino acids (Met–49, Phe–140, His–164, Met–165), and π-interactions to three amino acids (Glu–166, His–41, Cys–145) (Figure 4-g). In terms of possibilities of chemistry in order to optimize its potency, substitutions are possible in the piperazine group, as well as the naphthalene group and the indazole core.
The azoles chemical class showed compound SR–3029 with the highest docking score of –11.2 Kcal/mol and a predicted IC50 of 0.52 µM. SR–3029 was reported to be highly selective casein kinase 1 Delta /1 Epsilon inhibitor with potent antiproliferative properties [32]. The docked pose of SR–3029 in the Mpro protease active site showed some interactions of types fluorin-HB (Cys–44, Glu–166, Phe–140), multiples VDW interactions (Met–49, His–41, Leu–27, Thr–25, Thr–45, His–172), and π-interactions (Cys–145, Gly–143, Leu–141, Ser–144) (Figure 4-h). Possible structural diversifications could be done on fluoro-phenyl group or the non-substitued positions on the benzodiazole core.
The best compound of this chemical class is AMG–579 which showed a docking score of –10,5 Kcal/mol and a predicted IC50 of 0,6 µM. AMG–579 is a potent, selective, and efficacious inhibitor of phosphodiesterase 10A [33]. AMG–579 pose in the Mpro active site showed different interactions with the protein amino acids: HB (Thr–26, Ser–144), VDW (Leu–27, His–163, Leu–141, Asn–142), π-interactions (pi-Alkyl to Pro–168 and Cys–145), and alkyl interaction with Cys–145 (Figure 4-i). This molecule can be further optimized for its potency by chemical modifications on the amid moiety, the phenyl group from the benzodiazole and the central phenyl group.
Few compounds showed good docking scores but did not obey the different filtering rules such as Lipinski’s rules. We choose to select the best of these compounds to study its binding mode to Mpro protease. One of these compounds is silydianin which showed a docking score of –11,1 Kcal/mol and a predicted IC50 of 0,53 µM. Silydianin, an isomer of Silymarin extracted from the plant Silybium marinum, can control the main symptoms of asthma through modulating immune system responses [34]. Silydianin pose in the Mpro active site showed different interactions with the protein amino acids: HB (Thr–190, Gln–192, His–164, Ser–144, Leu–141, Phe–140, Glu–166), VDW (Met–49, His–163, Cys–145, His–41, Arg–188), π-interactions (π -Sigma to Gln–189, π-Alkyl to Pro–168), and alkyl interaction with Cys–168 (Figure 4-j).
Overall, and for all the analyzed hits from the different chemical classes, it appeared that some amino acids within the Mpro active site were the key residues entering in close interactions with the different ligands. In fact, four amino acids were involved in HB contacts with ligands very frequently. These amino acids concerned Ser–144, Cys–145, Glu–166, Gly–143. They were involved in HB interactions with at least 4 ligands. Four amino acids established VDW interaction with at least 4 analyzed ligands (His–163, Leu–141, Leu–27, Met–49). These VDW interactions are the predominant type of interactions that the majority of ligands have established with the protein. Three amino acids showed π interactions with at least 3 ligands (Cys–145, His–41, Met–165). These amino acids could be targeted during the design a new of optimized Mpro inhibitors.
As we have considered in our binding mode analysis only one representative from each chemical, other compounds from the same class could investigated also for experimental inhibition screening against the Mpro protease. This will provide interesting data to generate structure-activity relationships maps to be used for optimizing the activity of the molecules of interest.
In this work, a virtual screening strategy was successfully used to predict the binding energies and the IC50 of known Mpro protease inhibitors. It was also applied to find potential inhibitors from heterocyclic compounds to this viral protease and to identify their interaction maps with the protease active site. To the best of our knowledge, this is the first report using this approach combined to chemical analysis of the ligands to identify key interactions between the ligand and the Mpro protease and deliver guidelines of potential medicinal chemistry-oriented modifications of the compounds in order to optimize their potency.
The different hits from this study were categorized in chemical classes and the most active compounds from these classes showed predicted IC50 ranging from 0.85 to 0.43 µM. This set of potential Mpro protease inhibitors came from different sources either natural compounds such capmatinib, arcyroxocin B, and fumiquinazoline C or from organic chemistry synthesis such as pazinaclone, compound 57, SR–3029, AMG–579, HBF–0259, DR396. Most of these identified compounds had various reported biological activities. This set of compounds and their homologous ones in the chemical series will be good candidates for experimental screening against SARS-COV2 Mpro main protease. The screening of the analogues of the hit compounds will certainly provide data to generate structure-activity relationships maps to be used for medicinal chemistry investigations.
Availability of data and materials: The following are available online at https://1drv.ms/x/s!AnTvRYBsyeSEg2VIq4T_77P9_CR0?e = 8qqVyt, Table S1: List of selected compounds from molecular docking with their PubChem ID, chemical class, direct parent class, predicted binding energy and its standard deviation and predicted IC50 for SARS-COV2 virus Mpro protease.
Competing Interest: The authors declare no competing interest.
Funding: This work did not receive any funding.
Author Contributions: Conceptualization, A. Y.; methodology, M. S., R. B., A. Y.; validation, M. S., A. Y.; writing—original draft preparation, A. Y., R. B., M. S.; writing—review and editing, M. S., R. B., A. Y.; visualization. All authors have read and agreed to the published version of the manuscript.