In silico discovery of potential azole-containing mPGES-1 inhibitors by virtual screening, pharmacophore modeling and molecular dynamics simulations

Inhibition of microsomal prostaglandin E2 synthase-1 (mPGES-1) is promising for designing novel nonsteroidal anti-inflammatory drugs, as they lack side-effects associated with inhibition of cyclooxygenase enzymes. Azole compounds are nitrogen-containing heterocycles and have a wide use in medicine and are considered as promising compounds in medicinal chemistry. Various computer-aided drug design strategies are incorporated in this study. Structure-based virtual screening was performed employing various docking programs. Receiver operator characteristic (ROC) curves were used to evaluate the selectivity of each program. Furthermore, scoring power of Autodock4 and Autodock Vina was assessed by Pearson’s correlation coefficients. Pharmacophore models were generated and Güner-Henry score of the best model was calculated as 0.89. Binding modes of the final 10 azole compounds were analyzed and further investigation of the best binding (− 8.38 kcal/mol) compound was performed using molecular dynamics simulation, revealing that furazan1224 (ZINC001142847306) occupied the binding site of the substrate, prostaglandin H2 (PGH2) and remained stable for 100 ns. Continuous hydrogen bonds and hydrophobic interactions with amino acids in the active site supported the stability of furazan1224 throughout the trajectory. Pharmacokinetic profile showed that furazan1224 lacks the risks of inhibiting cytochrome P450 3A4 enzyme and central nervous system-related side-effects.


Introduction
Nonsteroidal anti-inflammatory drugs (NSAIDs) are very popular today as inflammatory responses are very common conditions. Compared to corticosteroid-based therapy, NSAIDs have significantly lower risks and side effects [1][2][3]. However, cardiovascular and gastrointestinal (GI) complications may develop due to NSAID use [4][5][6]. Since most NSAIDs are COX (cyclooxygenase-1, cyclooxygenase-2) inhibitors, new targets have been explored for inhibition. Prostaglandin E 2 (PGE 2 ) is one of the main inflammatory mediators produced in the arachidonic acid cascade and synthesized by cytosolic prostaglandin E 2 synthase (cPGES) and microsomal prostaglandin E 2 synthases (mPGES-1 and mPGES-2). mPGES-2 and cPGES enzymes are not inducible, whereas mPGES-1 is known to release PGE 2 consequent to an inflammatory signal. Inhibition of mPGES-1 is therefore promising for designing novel NSAIDs without the side-effects which have been linked to COX-inhibition. Moreover, mPGES-1 has been reported to play important roles in carcinogenesis, angiogenesis, migration, invasion, and apoptosis. There are also studies demonstrating that increased amount of PGE 2 is linked with many pathological conditions like gastric ulcer, GI cancer, colon, prostate, lung cancer, glioma and neuroblastoma, as well as non-carcinogenic pathologies like Alzheimer's disease, Parkinson's disease, and multiple sclerosis [7].
Being a group of nitrogen-containing heterocycles, azole compounds have a wide variety of use. They are effective against mycosis for both humans and animals; therefore, they are used in many anti-fungal medications including veterinary medicines [8]. Being electron-rich heterocycles, imidazole and benzimidazole easily accept and donate protons. They also form various weak physical bonds. These features allow them bind with therapeutic targets which brings about pharmacological activity [9][10][11]. Besides having medicinal potential against many biological targets [12,13], azole derivatives are also considered promising for mPGES-1 inhibition, e.g., 2-aminothiazole derivatives were found to have inhibitory activity against PGE 2 production [14]. Some triazole and oxadiazole derivatives have also been reported to inhibit mPGES-1 in cell-free assay models [15,16].
Virtual screening (VS) is a cheminformatics tool to discover new molecules, usually expected to bind with a certain target, e.g., enzyme, from a large compound set [17][18][19]. The main goal of VS is to identify compounds with a specific therapeutic potential very fast so as to filter down for biological assays. VS can be performed with two approaches; ligand-based and structure-based. Ligand-based methods exploit the similarity of compounds with the known bioactive compound. Structurebased methods rely on the cavity of the target receptor and the interactions a compound can form within. In the structurebased virtual screening (SBVS), docking stands as a very important tool to predict the affinity of a molecule to another, namely a macromolecule (target). Although molecular docking applications are very useful at identifying hit compounds, they have weaknesses as well as strengths. For example, a scoring function (an essential component of docking applications) might be very good at discarding true negatives (inactive molecules), but it might not be successful at ranking binding affinities, hence the need for incorporating various scoring functions. Pharmacophore models are another computer-aided drug design (CADD) tools that are created according to the knowledge derived from structure-activity relationships and structural properties of active analogs, therefore they are better at distinguishing inactive ligands. A pharmacophore can be determined by direct method which uses receptor-ligand complexes or indirect method which makes use of the information that a collection of ligands known to interact with a certain receptor [20]. Molecular dynamics (MD) simulations offer a visualization of stable binding of the ligand in the active site of the macromolecule. This allows researchers to estimate ligand binding free energies [21]. In the present study, a CADD protocol incorporating VS, pharmacophore modelling, and MD was followed. Binding mode of the best candidate, obtained after filtering ~ 1,000,000 azole compounds using various methods, was closely investigated by means of MD simulations.

Library construction and filtering
A compound library was constructed with 1,011,637 azole compounds retrieved from ZINC database [22]. Hydrogens were added and three-dimensional structures were generated with Babel [23]. The library was "cleaned" by using SMARTS (SMiles ARbitary Target Specification) patterns implemented in Discovery Studio Client 2016 [24]. Those patterns, which typically indicate unwanted chemical groups in a desired drug candidate, are provided in the Supplementary Material Table S1. Next, the entire library was filtered according to Lipinski's rule of five [25]. Briefly, the rule states that the molecular weight of an oral drug should be lower than 500 Da, the number of hydrogen bond acceptors should be at most 10, the number of hydrogen bond donors should be at most 5, and log P should be lower than 5, allowing only one violation. Therefore, filtered ligands were decreased to 971,040.

Design of the workflow and evaluation of selectivity of the scoring functions
Design of the workflow in VS is critical as capabilities of computational tools are different. For the purpose of discarding inactive compounds early, a scoring function that is good at distinguishing between actives and inactives (selectivity) would be the best option for the first stage of screening. rDock is known to eliminate inactive molecules at an early stage [26]. However, scoring functions are target-dependent. For this reason, 3 different docking programs were evaluated for their use at a specific stage. Receiver operator characteristic (ROC) curve for each program was obtained and area under the curve (AUC) was calculated. Toward this end, 50 known active compounds were downloaded from cHEMBL database and 36 decoys for each ligand [27], 1800 decoys were retrieved from DUD-E database [28] (A database of useful decoys: Enhanced, http:// dude. docki ng. org/). rDock was selected as the first screening tool according to the AUC profile. Moreover, Autodock Vina was used for the second screening. Next, pharmacophore-based screening was performed to identify potential mPGES-1 inhibitors. Finally, Autodock4 was used to rank compounds according to their binding affinities before MD simulations. The workflow followed in this study is demonstrated in Fig. 1.

Screening with rDock
Prior to screening with rDock [26], the cavity definition was made by reference molecule method as described in the manual. Toward this end, structure of a known inhibitor [29] with a high in vitro inhibition activity (0.241 µM) was used (Fig. 2). Crystal structure of mPGES-1 (PDB ID: 4AL0) was downloaded from PDB Database [30] and prepared in Discovery Studio Client 2016.
The inhibitor was docked using rDock to the crystal structure of mPGES-1 enzyme in the presence of glutathione to ensure a minimized ligand-receptor interaction energy.
Considering the size of the chemical space (971,040 compounds), high-throughput virtual screening (HTVS) protocol for rDock was followed. Toward this end, the first docking with all compounds were performed with 5 runs. Later, ligands exhibiting binding score of − 10 or below were subjected to further docking with 10 runs. Next, for ligands which scored below − 20 were filtered for further docking simulations with Autodock Vina. Filtering at every step was performed according to SCORE.INTER, which refers to the ligand's binding energy.

Further structure-based virtual screening with Autodock Vina
According to ROC profile, Autodock Vina ranks the second best at distinguishing active and inactive compounds. Furthermore, the scoring function of Vina is known for its speed and accuracy [31]. For this reason, screening was pursued with Autodock Vina. Filtered ligands according to their scores in rDock were further subjected to docking with Autodock Vina. Prior to this, tautomers and protonation states were generated using Prepare Ligands in Discovery Studio Client 2016. Ligands which raised problems of generating 3D coordinates, bad valences or being duplicates were eliminated. Consequently, 44,366 compounds were subjected to molecular docking simulations with Vina. A grid box was prepared with center coordinates of x: 11.749, y: − 12.53, z: − 12.147 and dimensions of x: 60 Å, y: 60 Å, and z: 60 Å. The center was chosen at the sulfur atom of GSH.

Pharmacophore generation
Remaining compounds in the library were screened with pharmacophore to recover more drug-like compounds. In order to generate a pharmacophore model, a training set comprising known active mPGES-1 inhibitors was prepared. Toward this end, 20 molecules were retrieved from the literature and cHEMBL database. The entire set was docked to the crystal structure of mPGES-1 enzyme with Autodock Vina to determine their binding affinities (Table 1).
Next, the training set was subjected to Common feature pharmacophore generation protocol in Discovery Studio Client 2016. Ten hypotheses were generated by HipHop algorithm. Conformation generation was selected BEST, with an energy threshold of 20 kcal/mol and maximum conformation of 250. Maximum interference distance was set to 2.97 and the number of maximum pharmacophores to be generated was set to 10.

Validation of the pharmacophore model
Each generated pharmacophore model was validated by using a test set including 380 decoys and 20 compounds that are known to be active mPGES-1 inhibitors (the training set). Decoys were downloaded from DUD-E database (A database of useful decoys: Enhanced, http:// dude. docki ng. org/). The quality of the pharmacophore often refers to its power to differentiate between inactive compounds and active compounds. Towards   this end, the test set was screened against each pharmacophore and several statistical parameters such as percentage yield of actives, ratio of actives and enrichment factor (EF) were estimated. Güner-Henry score [32], which determines the quality of a pharmacophore model, was also calculated.
where %A is the ratio of actives in the entire hit list, %Ya is the fraction of active hits relative to total hits, Ht refers to total hits acquired during database search, Ha refers to active compounds in the hit list, D is the total number of compounds, and A is the number of active compounds in the test set.

Screening the ligand library with pharmacophore model
After validating each pharmacophore model, hypothesis 01 was selected to further screen the ligand library. Upon filtering with this model, 48 compounds remained in the library. Ten compounds having the lowest Autodock Vina scores and fit value ≥ 3 were selected for further investigation.

Optimization with quantum mechanical methods
Selected compounds were subjected to geometry optimization using Gaussian [33]. M062X method and 6-31G(d,p) basis set were selected for the calculations. All calculations were performed on Linux platform.

Evaluation of scoring power and docking with Autodock4
Experimental data (IC 50 , nM) of 50 known active compounds (see Design of the workflow and evaluation of selectivity of the scoring functions) were obtained from cHEMBL database plotted against the highest score estimated for each compound. Pearson's correlation coefficient was calculated for each docking program and Autodock4 was found to be better at ranking binding affinities of ligands. Optimized structures were subjected to docking using Autodock4 [34]. Toward this end, the crystal structure of mPGES-1 enzyme was cleaned and prepared. Grid box was defined as described in Further structure-based virtual screening with Autodock Vina section and Autogrid4 was employed for grid calculation. Conformational space was searched by Lamarckian genetic algorithm [35], which was implemented in AutoDock4. Maximum number of energy evaluations was set to 2,500,000 and 100 runs were performed for each ligand. All calculations were performed on Linux platform.

Molecular dynamics simulation
The azole derivative with the highest affinity in complex with mPGES-1 enzyme was selected for MD simulations. mPGES-1 enzyme was defined with CHARMM36 force-field while GSH and ligand were parameterized by using CGenFF server (https:// cgenff. umary land. edu). The complex (enzyme with GSH and ligand) was solvated in TIP3P water [36] using a periodic cubic box and was minimized by using steepest descent algorithm with GROMACS 5.1.4 package [37]. Minimization was complete as the maximum force of 10 kJ/mol was reached. Next, the system was subjected to NVT equilibration for 100 ps with leap-frog integrator, V-rescale thermostat [38] and T = 300 K. This was followed by NPT equilibration to equilibrate volume for 100 ps, also with leap-frog integrator, V-rescale thermostat, and T = 300 K. Finally, the equilibrated complex was submitted to MD simulations for 100 ns. For both stages of equilibration and MD simulation, LINCS algorithm [39] was used for constraining bond lengths and Particle Mesh Ewald (PME) Fig. 3 ROC curves and calculated AUC values for each docking program [40] was employed for long-range electrostatic interactions (with a cut-off of 1.2 nm). V-rescale algorithm was selected for temperature coupling and Parrinello-Rahman barostat [41] for pressure coupling. The simulation time-step was set to 2 fs and data were recorded every 10 ps. Clustering of conformations was performed based on root-mean-square deviation using gmx cluster tool. GROMOS clustering algorithm [42] and a cutoff of 0.16 nm was selected. Optimal cut-off value was chosen according to the plot provided in the Supplementary Material Fig. S1. Central complex structure was generated from the most populated cluster.

ROC curves
Receiver operating characteristic (ROC) curve is a recognized method to evaluate the ability of a test to distinguish between two sets of data [43]. In the field of drug design, this approach is used to discriminate "active" and "inactive" compounds in virtual screening. Eliminating inactive compounds from a large set of compounds at an early stage in VS is critical. Therefore, decision on the right scoring function is beneficial for a successful screening. ROC curves were generated for three different docking programs; rDock, Autodock Vina, and Auto-dock4 (Fig. 3). Sensitivity was plotted against 1-specificity for each threshold of the score estimated by the scoring function. Sensitivity can be described as the ratio of true positives to the sum of true positive and false negative, and specificity can be described as the ratio of true negatives to the sum of true negatives and false positives. Area under the curve (AUC) values were calculated for each curve. An AUC value of 0.5 indicates a test that selects randomly, whereas an AUC value of 1 (highest possible value) indicates an ideal test for discriminating two different populations. According to AUC values calculated for each program, rDock is better at discriminating actives and inactives than the other two.

Evaluation of scoring power
A scoring function with a high ability to determine the correct binding pose does not ensure its performance on ranking the binding affinity [44]. For this reason, Pearson's correlation coefficients (r P ) were calculated using experimental binding affinities (IC 50 ) to evaluate the scoring power of Autodock4 and Autodock Vina ( Table 2). A value of r P close to 1 indicates a very strong correlation between experimental data and the docking scores, whereas r P being close to 0 shows a poor correlation. Both programs showed a fair correlation while Autodock4 was found to be superior at ranking binding affinities for mPGES-1 enzyme.

Common feature pharmacophore model
Ten best pharmacophore hypotheses were generated after common feature generation was employed for the training set, scores ranging between 78.870 and 65.864 (Table 3). Hypotheses can be grouped as HDAA (01-07) and HAD (08-10).

Pharmacophore validation
Each generated hypothesis was investigated by using certain statistical parameters such as the number of total hits (Ht), the number of active hits (Ha), enrichment factor (EF), % yield of actives, and % ratio of actives. Güner-Henry scoring method was used to determine the discriminating power of a pharmacophore model. Table 4 shows these parameters for each hypothesis.
According to Table 4, hypotheses 01 and 07 identify all of the 20 active compounds. However, as Ht also has a high value, hypothesis 07 presents poor selectivity. A GH score higher than 0.7 indicates a very good and reliable model [45,46]. Hypothesis 05 exhibits a sufficient GH score; nevertheless, 16 active compounds mapped to the model out of 20 lowered the selectivity. Hypothesis 01, displaying an excellent GH score of 0.89 suggests that this is a reliable model for identifying active compounds in a database. Therefore, hypothesis 01 was chosen as the best pharmacophore model. Figure 4 shows the hypothesis model 01 with its features: hydrophobic point (H), hydrogen bond acceptor (A), and hydrogen bond donor (D).

Screening the ligand library with pharmacophore model
After validating each pharmacophore model, hypothesis 01 was selected to screen the ligand library. Ten compounds with fit values ≥ 3 and having the lowest Autodock Vina scores were selected for Autodock4 docking.

Autodock4 results
Remaining ten compounds after pharmacophore screening were subjected to geometry optimization and docking with Autodock4. Binding energies and calculated inhibition constants are given in Table 5.

Docking and binding mode analysis
Detailed interaction types and distances are provided in the Supplementary Material Table S2. The most commonly observed binding mode seems to take advantage of surrounding hydrogen bond donors in the catalytic channel.
A more detailed look to the binding mode and interaction types reveals that this binding mode, which the final ten compounds share (Supplementary Material, Fig. S2), is in the substrate-binding region. This is supported by the mechanism suggested by Sjögren et al. which requires GSH, Arg126, and Ser127 in close contact with the substrate,  [47]. Additionally, Jegerschold et al. proposed that GSH and Arg126 are directly involved in the mechanism [48]. Therefore, it is important to note these close interactions between the cofactor GSH, Arg126, Ser127, and the inhibitor. Since furazan1224 forms all the critical interactions for the enzyme reaction and binds with the lowest energy (− 8.38 kcal/mol), this inhibitor was used to demonstrate interactions in the cavity of mPGES-1 (Fig. 5). The reader is referred to Figs. S3-S11 for interactions of other inhibitors. Furazan1224 forms hydrogen bonds with Arg126, Ser127, Leu121, and Arg70. Guanidine groups of Arginine residues are mostly responsible for the interactions. Furthermore, aromatic rings of the ligand make π-cation interactions with the guanidine groups of Arg126 and Arg67. Ala123 forms π-alkyl interactions with furazan1224.

Molecular dynamics simulation
The strongest-binding azole compound (furazan1224) was chosen for 100 ns molecular dynamics simulation as the other azole derivatives share the same binding mode but differ in affinity. Root mean square deviation of the ligand is shown in Fig. 7, which reveals the stability of furazan1224 throughout the trajectory. RMSD of the protein and GSH are shown in Figs. S12 and S13. The central complex structure ( Fig. 6) of the most populated cluster was defined as to represent the dominant conformation during the simulation. Crystal structure of the enzyme is also provided in Fig. 6. It can be seen that conformation of GSH has changed during the simulation. The central structure has the smallest average RMSD from all other structures in the same cluster. An RMSD cut-off of 0.16 nm yielded 19 clusters with only 2 one-member clusters. Output files of clusters calculated with various cut-off values are provided in the Supplementary Material for interested readers (clusters.log.rar). Figure S14 displays new amino acids contributing to the stability of furazan1224 during the simulation (Lys120, Arg122) while some amino acids observed in the docking complex are absent (Arg70, Ala123, Ser127). However, contacts with Arg67, Leu121, and Arg126 remain throughout the trajectory. Stability of the ligand is supported by hydrogen bonding profile involving certain amino acids in the binding site (Figs. 8, 10, and 11). Being the cofactor of the enzyme, GSH plays a key role in the enzymatic reaction in concord with Arg126 and Ser127. Arg126 is very important as it has a key role in the final step of the enzymatic reaction which gives PGE 2 [47] and is in close contact with GSH and furazan1224 in the docking complex (Fig. 5). Amide oxygen of the ligand (O2) makes very close and stable contact with the nearest hydrogen of the guanidine group of Arg126 (Fig. 8). It is often expected to see shifts in hydrogen bond distances when there is another hydrogen bond donor nearby, due to rotation of single bonds. However, electron delocalization constraints this movement, giving a very stable hydrogen bond between HH11 and O2 (Fig. 9). Therefore, the distance between HH12 and O2 remains at a certain range throughout the trajectory (Fig. 8).
Another close contact of furazan1224 was observed with Leu121. Two hydrogen bonds are observed which seem consistent throughout the trajectory (Fig. 10). However, H1 of furazan1224 makes more consistent hydrogen bond with the amide oxygen of Leu121, with respect to H4. 3D representation of these contacts is given in Fig. S15.
Arg122 is observed to contribute for stabilization of furazan1224 throughout the simulation although it is absent in the interaction map of the docking complex (Fig. 5). Figure 11 shows stable hydrogen bond interaction between the amide oxygen (O) of Arg122 and the hydrogen (H4) belonging to the furazan ring of furazan1224. Neighboring hydrogen (H1) does not make very close contact with oxygen of Arg122, but forms continuous hydrogen bond with   (Fig. 10). Furazan1224 was observed to form a consistent hydrophobic interaction as well, as shown in Fig. 12. It was revealed that the phenol ring of the ligand and the alkyl chain of Lys120 formed π-alkyl interaction (Fig. S16). These stable hydrogen bonds, steady distances and hydrophobic interaction between the ligand and amino acids support furazan1224 being very stable during the simulation.
On the other hand, rest of the final set of compounds (Table 5) did not show a consistent behavior regarding their position in the enzyme during the simulation. Figure 13 shows RMSD plot of furazan3510 and furazan856 together with the active compound furazan1224.
It is interesting to note that, furazan3510 has a similar binding energy in comparison with furazan1224 with a Fit Value of 3.00 (Table 5). This means that the binding strength according to docking scores does not necessarily ensure a steady binding during a simulation. Furthermore, furazan856, which gave an even higher binding energy but a better Fit Value (3.21), was observed to leap from its binding pocket noticeably early in the simulation (18 ns). Snapshots of furazan3510 and furazan856 are provided in Supplementary Material Fig. S17.

Pharmacokinetics
Since the compound library was constructed according to Lipinski's rule of five, all molecules in the final set comply with the rule. However, certain pharmacokinetic properties were not considered during library construction and were calculated by using SwissADME (http:// www. swiss adme. ch). The results are given in Table 6. P-glycoprotein (P-gp) is an ATP-dependent transmembrane protein that is a member of ABC transporter family. It has a crucial role in the absorption and penetration of drugs through blood-brain barrier (BBB) [49]. Furazan1224 was predicted as a P-gp substrate; however, its GI absorption was predicted high and it does not appear BBB permeable. For drugs designed for peripheral action, when central nervous system (CNS) is not their primary site of action, BBB permeability can cause undesirable CNSrelated side-effects [50]. The estimated pharmacokinetic profile of the selected compounds showed that none of them has BBB permeability.
Cytochrome P450 (CYP) isoforms are very important for drug metabolism in humans. Inhibition of these CYPs may lead to serious clinical consequences like drug-drug interactions. Especially, inhibition of CYP3A4 can lead to serious adverse effects as nearly half of the drugs in the market are metabolized by this enzyme [51]. Of the compounds selected by pharmacophore analysis and molecular docking simulations, only furazan1199 was estimated to inhibit CYP3A4 isozyme. The best inhibitor (furazan1224) identified by virtual screening appears not to inhibit CYP3A4 showing that it does not have a potential to elevate blood levels of drugs metabolized by this enzyme.

Conclusion
Various CADD methods were pursued to identify novel azole compounds that would inhibit human mPGES-1 enzyme. Docking studies provided filtering down a large molecule set to only the best-binding compounds. The choice of the scoring function at the former and succeeding stages of VS was made according to the ROC results and calculated Pearson's correlation coefficients. Pharmacophore models were successfully constructed. The best hypothesis (Güner-Henry score: 0.89) was used to identify compounds that would possess similar features with known mPGES-1 inhibitors. Further docking studies and binding mode analysis showed that furazan1224, similar to nine other compounds, occupied the binding site of the substrate PGH 2 , and took advantage of both hydrogen bond donors and acceptors in the mPGES-1 cavity. Molecular dynamics simulations also demonstrated that furazan1224 occupied the binding site of PGH 2 , and remained stable for 100 ns. Hydrogen bonds analysis supports this as stable hydrogen bonds interactions were observed with amino acids in the binding site, including Leu121 and Arg126. Hydrophobic interaction with Lys120 also supported the steady binding pose of furazan1224 throughout the simulation. Furazan3510 and furazan856, which also showed good binding upon docking, and satisfactory Fit Values, did not display steady binding poses in the mPGES-1 enzyme throughout the simulation. Calculated pharmacokinetic properties show that only furazan1199 among the final azole compounds inhibits the CYP3A4 isozyme, which is a critical enzyme for metabolizing most drugs. Furazan1224 (ZINC001142847306) is promising as a potential mPGES-1 inhibitor regarding its stability in the enzyme cavity and considerably safe pharmacokinetic profile.