Virtual Screening of Heterocyclic Molecules to Identify Potential SARS-COV2 virus Mpro Protease Inhibitors for Further Medicinal Chemistry design

In this work, we aimed to identify potential SARS-COV2 virus M pro protease inhibitors using molecular docking. As a validation step of our docking approach, we used 7 known M pro inhibitors and predicted their binding energies. We found a very good correlation (R = -0.86) between the binding energies and the pIC 50 (-LogIC 50 ) values. We then undertook virtual screening of 50753 heterocyclic molecules from the PubChem Database. The screened molecules were rst ltered out using Lipinski, Veber and Ghose rules resulting in 21 142 which was submitted to the docking phase. The docked compounds were ranked according to their predicted binding energy to the protease and a threshold of -9.0 Kcal/mol was applied resulting in a set of 2711 hits. These hits were split into different groups according to their chemical class using a rule-based classication approach. The best compound from each of the most populated classes were subjected to binding mode analysis which led to ligand-receptor interaction maps and suggested some medicinal chemistry-oriented modications to further optimize the potency of the obtained hits. Predicted IC 50 of the hit molecules ranged from 0.85 to 0.43 µM against SARS-COV2 virus M pro protease.


Introduction
Since the onset at the end of 2019 of infection with the SARS-COV2 virus to the present day (May 26 th , 2020) [1,2], the pandemic situation has resulted in 5,370,375 con rmed cases and 344,454 overall deaths in 195 countries and territories. In Europe, 2,025,176 con rmed cases have been reported The aggressive nature of this viral infection has pushed the scienti c community into a race against the clock to nd remedies and solutions to control its spread around the world. One of the solutions implemented very quickly by some countries and belatedly by others is the containment of the population combined with the wearing of protective masks. These solutions have made it possible to reduce the number of new cases signi cantly in certain countries, but the risk of a new wave of contamination remained real. In addition to the health consequences, the pandemic has caused a slowdown in economic activity and stunted growth judged by economic indicators with an unprecedented decrease in industrial production. This impact could exceed $4.1 Trillion [4].
Faced with this global health crisis, pharmaceutical players were working hard to develop therapeutic solutions (vaccines or drugs). Chloroquine was the rst drug to be clinically tested against SARS-COV-2 [5]. The pioneering team in this therapeutic ght against the SARS-COV2 virus is that of a French researcher at Aix-Marseille Hospital in France. Their study showed a synergistic effect in vitro between chloroquine (primarily used to prevent and treat malaria) and azythromycin [6] (an antibiotic used for the treatment of a number of bacterial infections) against the SARS-COV2 virus in concentrations compatible with those obtained in the human lung [7]. Other existing drugs were tested against the virus such as Remdesivir and proved e cacy in treating patients carrying COVID-19 disease [8].
Target based drug discovery approach combined to molecular modeling and medicinal chemistry proved to be a highly e cient strategy to quickly develop therapeutics against human diseases [9][10][11]. With the genomic characterization of the SARS-COV2 virus and the elucidation of the 3D structures of certain of its proteins such as the main protease [12,13], a door opened for the use of the target-based approach drug discovery to identify therapeutic agents against this virus [14]. Recent works using computational approaches such as structure-based drug design, molecular modeling and virtual screening assisted in the identi cation of potential inhibitors from different sources (existing drugs, natural compounds, chemical synthesis) against some key targets of SARS-COV2 virus [15,16].
In this work, following target-based drug discovery approach, we undertook the virtual screening of heterocyclic compounds against the main protease of the SARS-COV2 virus. The potential inhibitors were grouped into chemical classes and each class was characterized by its mode of binding in the active site of the protease. This work opens interesting perspectives for medicinal chemistry combined with the structured-based drug design to optimize the activity of the selected hits from this study and to synthesize and evaluate the activity of new anti-SARS-COV2 main protease inhibitors.

Materials And Methods
2.1. Chemical drawing and classi cation. All molecular and Markush structures were designed by MarvinSketch [35]. The chemical classi cation of the molecules was performed using ClassyFire engine [36].  [13]. 50753 2D structures of heterocyclic molecules were download from the PubChem Database [37]. The salts forms of the molecules as well as the mixtures were processed and the largest fragment in each molecule was kept. The strong acids were deprotonated, the strong bases were protonated, and explicit hydrogen were added to the structures. The resulting set of molecules was subjected to sequential ltering steps including the Lipinski rules [38] (MW < = 500, logP < = 5, HBA < = 10; HBD < = 5), Veber's rules [39] (Rotatable Bonds ≤ 10; Polar surface Area ≤140 Å 2 ) and Ghose rules [40] (40 < = Molar refractivity < = 130; 20 < = Atom Count < = 70). The ltered set of heterocyclic molecules as well as the known inhibitors were converted to 3D structure (when keeping the original stereochemistry information) with an energy minimization step using openBabel Software [41].
2.3. Molecular Docking process. The 3D structure coordinates of the SARS-COV2 main protases M pro (PDB code: 5RGI) was obtained from the Protein Data Bank [42,43]. The molecular docking was performed by AutoDock Vina package [44]. AutoDock MGTOOLS program was used to prepare the active site of the receptor [44]. All the water molecules as well the ligand structure co-crystallized with the protein were removed. Nonpolar hydrogens were added to the protein and a grid box was constructed around the protease active with a size of 20 Å, 16 Å et 20 Å in X, Y and Z directions with a central point de ned at 9, 0,1 and 21 three-dimensional coordinates. The 3D structure of the protease was then saved into a PDBQT le format. All the ligand 3D structures were prepared for docking using a python script provided with AuDock Vina package and which produced the PDBQT les needed for the docking step. All the other AutoDock Vina parameters were kept at their default values.
2.4. Visualization. This step was performed using PyMol softaware [45]. AutoDock Vina generates 9 poses for each docked ligand. An inhouse python script was used to extract the pose with the best score which expresses the binding energy of the ligand in Kcal/mol as unit. Th generation and visualization of 2D ligand-receptor interactions maps was performed by Discovery Studio Vizualizer [46].

Prediction of binding a nitis of known inhibitors
As a validation step of the docking approach, knwon inhibitors were docked into the active site of M pro protease. This step concerned 7 inhibitors taken from the work of Jin, Z. et al. (2019) [13] (Table 1). Carmofur, anti-cancer agent, is a pyrimidine derivatives obtained from uorouracil (chemotherap agent) that can be administered orally [17]. Ebselen, a benzoisoselenazolone derivative, is a non-toxic selenoorganic drug with antiin ammatory, antiatherosclerotic and cytoprotective properties [18,19]. Tideglusib, a thiodiazole derivative, is a potent, selective and irreversible inhibitor of glycogen synthase kinase 3 (GSK-3) [20]. TDZD-8, an other thiodiazole derivative is also an inhibitor of GSK3 kinase [21]. PX-12, a sul des derivative, is an irreversible inhibitor the redox protein thioredoxin [22], which has been associated with cancer and tumor growth. Disulfuram is a carbamate derivative used as an alcohol deterrent and inhibitor of aldehyde dehydrogenase [23]. Shikonin, a quinone derivative, is an apoptotic agent active in a variety of cancer cell lines but not in normal cell lines [24].
Due to technical limitations, Table 1 is provided in the Supplementary Files section.
Molecular Docking of these known inhibitors showed docking scores ranging from -7.9 Kcal/mol to -1.7 Kcal/mol. When comparing the pIC 50 (-Log IC 50 ) values and the docking scores, we obtained a good anticorrelation core cient (R) of -0.86 and a regression determination coe cient R 2 fo 0.74 ( Figure 1).

Drug likeness of ligand molecules
Before molecular docking step, the set of heterocyclic compounds were subjected to a ltering step using Lipinski's rule of ve. This rst lering kept 34783 out of 50753 compounds. The application of the Veber rules resulted into 32669 compounds and the Ghose rules reduced the set of compounds to 21 142. This nal set of compouds was then submitted to the docking phase into the M pro protease active site.

Virtual screening of heterocyclic compounds
The ltered set of heterocyclic compounds were virtually screened within the M pro protease active site prepared using AutoDock Vina. The program predicts the best poses that a ligand may have and its binding energy (docking score) to the receptor. The docked compounds were then ranked according to their score of binding energies. In order to look for compounds potentially more active than Ebselen (Score = -7.9 Kcal/mol, pC 50 = 6.17) and Tideglusib (Score = -7.9 Kcal/mol, pIC 50 = 5.81), we set a threshold for the docking binding energy at -9.0 Kcal/Mol. Using this threshold, we selected 2711 compounds whose binding energy was less or equal to −9.0 kcal/mol (Supplementary Table S1). This set is referred to as HitSet.
The distributions of the four Lipinsk's rules with the selected set of compounds were quite symmetrical except for the hydrogen bonds donor property where the distribution was skewed at the left. In fact, 80% of the molecules have less or equal to two hydrogen bond donors ( Figure 2).

Compounds classi cation
The compounds of the HitSet were submitted to the classi cation engine Classy re. This web application uses a rule-based classi cation approach that relies on a comprehensible, comprehensive, and computable chemical taxonomy. As a result, we obtained 162 different chemical classes (Supplementary   Materials Table S1). Nine chemical classes had more than 70 compounds: Benzene and substituted derivatives, quinolines and derivatives, indoles and derivatives, benzopyrans, diazanaphthalenes, diazinanes, carboxylic acids and derivatives, azoles and organo-oxygens. We have analyzed the members of each chemical classes and selected few representatives with the best docking scores for analyzing their binding mode inside the M pro protease active site. We have also analyzed some exceptional natural molecules that were not within the HitSet due to the noncompliance with Ghose rules but presented interesting predicted binding energies 4. Discussion

Prediction of binding energy of Heterocyclic compounds
From each highly populated chemical class obtained from the HiTSet, we analyzed the lead compound showing the highest docking score. For these compounds, we have elucidated and discussed their binding mode into M pro protease active site and predicted their binding energy as well as their IC 50 (Table   2).

Benzene and substituted derivatives
This chemical class is the largest in terms of number of compounds. The most active compound holds a uorobenzene moiety and presented a docking score of -10,6 Kcal/mol which corresponded to a predicted IC 50 of 0.6 μM derived from the simple linear regression we have obtained before. This compound, referred to as HBF-0259, was reported as an inhibitor of hepatitis B virus surface antigen (HBsAg) secretion [25]. The visual inspection of its binding mode to M pro protease revealed multiple

Indoles and derivatives
In this chemical class, the best two compounds (Score = -10.9 Kcal/mol) are straurosporine like compounds which are potent non-speci c inhibitors of kinases but present high toxicity. For this reason, we choose to analyse the binding mode of the third best compound presenting a docking score of -10.8 Kcal/mol (predicted IC 50 = 0.57 uM) and which is referred to as Arcyroxocin B. This alkaloid was isolated from two fungal species Arcyria denudata and Arcyria obvelata and showed cytotoxicity against Jurkat cells [27], an immortalized line of human T lymphocyte cells that are used to study acute T cell leukemia.  (Figure 4-c). The potency of this compound can be further optimized through substitutions on the phenyl groups of the two indole moieties.

Benzopyrans
Two compounds from this chemical class, uorescein 5-maleimide and DR396 a benzoic acid derivative compound, presented docking scores of -11.3 Kcal/mol corresponding to predicted IC 50 of 0.5 µM. We choose to study the binding mode of DR396 since it has a reported biological activity as an apoptotic DNase γ inhibitor [28]. In fact, DR396 showed several interactions with the protein amino acids of type HB

Carboxylic acids and derivatives
In this chemical class of selected hits, the best compound showed a docking score of -11.5 Kcal/mol and a predicted IC50 of 0,48 µM. This compound referred to as Pazinaclone (DN-2327), is an anxiolytic drug [29].

Diazanaphthalenes
Two compounds from this chemical class, fumiquinazoline C and H, presented a docking score of-11,6 Kcal/mol corresponding to predicted IC 50 of 0.47 µM. These two molecules are produced by Aspergillus fumigatus, a marine fungus that makes a series of fumiquinazoline peptidyl alkaloids [30]. As the compounds are very similar, we reported only the binding of fumiquinazoline C whose derivatives were reported to have antitumor activity [38]. Fumiquinazoline C predicted binding mode revealed that the atoms of this compounds are in close contact with different protein residues through HB (His-41, His-163, Glu-166, Cys-145), VDW interactions (Leu-141, His-172, His-163, His-164, Met-165), and πinteractions (pi-Alkyl with Pro-168 and Cys-145, π-π T-Shaped with His-41) (Figure 4-f).

Diazinanes
The best compound in terms of docking score from this class shows predicted values for binding energy of -10.3 Kcal/mol and an IC 50 of 0.63 uM. It was reported to have an antagonist activity against 5-HT receptor [31]. Compound 57 presented a high density of interactions within the M pro protease active site. In fact, it was able to make HB contacts to four amino acids (Gly-143, His-163, Leu-141, Ser-144), VDW interactions four amino acids (Met-49, Phe-140, His-164, Met-165), and π-interactions to three amino acids (Glu-166, His-41, Cys-145) (Figure 4-g). In terms of possibilities of chemistry in order to optimize its potency, substitutions are possible in the piperazine group, as well as the naphthalene group and the indazole core.

Azoles
The azoles chemical class showed compound SR-3029 with the highest docking score of -11.2 Kcal/mol and a predicted IC 50 of 0.52 µM. SR-3029 was reported to be highly selective casein kinase 1 Delta /1 Epsilon inhibitor with potent antiproliferative properties [32].

Exceptional chemical class
Few compounds showed good docking scores but did not obey the different ltering rules such as Lipinski's rules. We choose to select the best of these compounds to study its binding mode to M pro protease. One of these compounds is silydianin which showed a docking score of -11,1 Kcal/mol and a predicted IC50 of 0,53 µM. Silydianin, an isomer of Silymarin extracted from the plant Silybium marinum, can control the main symptoms of asthma through modulating immune system responses [34].
Overall, and for all the analyzed hits from the different chemical classes, it appeared that some amino acids within the M pro active site were the key residues entering in close interactions with the different ligands. In fact, four amino acids were involved in HB contacts with ligands very frequently. As we have considered in our binding mode analysis only one representative from each chemical, other compounds from the same class could investigated also for experimental inhibition screening against the M pro protease. This will provide interesting data to generate structure-activity relationships maps to be used for optimizing the activity of the molecules of interest.

Conclusions
In this work, a virtual screening strategy was successfully used to predict the binding energies and the IC 50 of known M pro protease inhibitors. It was also applied to nd potential inhibitors from heterocyclic compounds to this viral protease and to identify their interaction maps with the protease active site. To the best of our knowledge, this is the rst report using this approach combined to chemical analysis of the ligands to identify key interactions between the ligand and the M pro protease and deliver guidelines of potential medicinal chemistry-oriented modi cations of the compounds in order to optimize their potency.
The different hits from this study were categorized in chemical classes and the most active compounds from these classes showed predicted IC 50 ranging from 0.85 to 0.43 µM. This set of potential M pro protease inhibitors came from different sources either natural compounds such capmatinib, arcyroxocin B, and fumiquinazoline C or from organic chemistry synthesis such as pazinaclone, compound 57, SR-3029, AMG-579, HBF-0259, DR396. Most of these identi ed compounds had various reported biological activities. This set of compounds and their homologous ones in the chemical series will be good candidates for experimental screening against SARS-COV2 M pro main protease. The screening of the analogues of the hit compounds will certainly provide data to generate structure-activity relationships maps to be used for medicinal chemistry investigations.

Declarations
Availability of data and materials: The following are available online at https://1drv.ms/x/s!AnTvRYBsyeSEg2VIq4T_77P9_CR0?e = 8qqVyt, Table S1: List of selected compounds from molecular docking with their PubChem ID, chemical class, direct parent class, predicted binding energy and its standard deviation and predicted IC 50 for SARS-COV2 virus M pro protease.