Performance of Virtual Screening Methods for the Large-scale Kinase Test Set
The different VS methods (TS-ensECBS model, ligand-based shape similarity score, receptor-based pharmacophore model, and molecular docking with either flexible or rigid sidechains) were commonly applied to the kinase test set and their prediction accuracy was compared using AUC values in a PR curve as shown in Fig. 1. The TS-ensECBS model was a machine learning-based method trained using precompiled protein-ligand interaction data. In cases of structural similarity between chemicals, it was calculated through the molecular information obtained from chemical structures and easily applicable to VS using the similarity score to known active molecules. In contrast, the receptor-based pharmacophore model was a structure-based method that extracted a set of critical steric and electrostatic features from a known protein-ligand complex structure. Molecular docking only required knowledge of the receptor structure without any prior binding site information and the calculated binding energy was used to score chemical compounds. The systematic comparison of these methods not only provided a performance estimation of the individual methods but also revealed how to combine them for improving VS procedures. The detailed steps about combination of effective VS procedure can be found in ‘Discussion’ part.
Selection of Kinases for the Blind Virtual Screening Test
For further performance validation, a small number of kinase targets were selected to find novel inhibitory molecules from completely unseen chemical databases. We focused on three kinases (MEK1, EPHB4, and WEE1) that were selected by the following criteria based on the screening results of the test set shown in Fig. 1; a TS-ensECBS model AUC value greater than 0.8 (expecting high prediction accuracy), a number of known binding chemicals for the targets greater than 10 (ensuring sufficient training data), an output TS-ensECBS score for the positive chemicals higher than 0.8, and average score differences between positive and negative chemicals over 0.4 (expecting high discriminative power). These criteria were defined to select kinases with reliable, high predictive performance models likely to have high accuracy in the blind test. The PR AUC values of TS-ensECBS models were 0.93, 0.92, and 0.89 for MEK1, EPHB4, and WEE1, respectively. Similarly, the corresponding AUC values of the pharmacophore models were 0.68, 0.61, and 0.92, respectively. Hence, pharmaDB was used along with TS-ensECBS to further screen the unknown databases.
Virtual Screening for the Three Selected Kinases
To find promising candidates in the chemical databases, we prioritized the chemical compounds by TS-ensECBS or receptor-based pharmacophore score. First, chemical compounds were only selected by TS-ensECBS score (cutoff 0.7), which resulted in 34, 53, and 238 for MEK1, EPHB4, and WEE1, respectively. Of these, the top 10 individual hits by TS-ensECBS and receptor-based pharmacophore score were shortlisted for each kinase target. Notably, only six molecules had a pharmacophore fit score above zero for MEK1. From those constricted hits, we excluded molecules that were out of stock, duplicates obtained as repetitive hits in more than one method, and those already reported as inhibitors for MEK1, EPHB4, and WEE1 kinases or closely associated kinases. To further compare PharmaDB and TS-ensECBS methods, the chemical databases were also later screened using only the PharmaDB receptor-based pharmacophore models. The PharmaDB models stored in PDB IDs 3v01, 4aw5, and 1 × 8b were used to select the candidate molecules using the pharmacophore fit score. The experimentally tested molecules had diverse score ranges for both TS-ensECBS and pharmacophore fit score (Fig. 2).
In Vitro Kinase Binding Assay
An in vitro binding assay was used to test the final candidate molecules for binding specificity and binding affinity. As a positive control, hypothemycin, a predicted but known inhibitor of MEK1, was also included to perform the experimental assay. Out of 32 molecules tested through the in vitro binding assay with 10 µM concentration, SEW05801, 18864755, XBX00307, and CD09763 were successfully found to be potential inhibitors of MEK1 and GK03499 and GK03503 were found to be dual inhibitors for MEK1 and EPHB4 (Figs. 2 and 3). The percentage of a kinase in solid support (that confirms the competition between a test molecule and an immobilized molecule for binding to a kinase) varied between 5.9% and 32% (Fig. 2 and Table 1). For the eight active molecules (Fig. 3), we also calculated the dissociation constant (SI Figs. 1 and 2) and the values were between 1500 nM and 8400 nM (Table 1). Contrastingly, no active molecules were found for WEE1 that can be seen along with kinase diversity in SI Fig. 3. Out of eight active molecules, only three had low to moderate pharmacophore scores from PharmaDB screening (0.15, 0.44, and 0.52; Table 1). Nineteen molecules had pharmacophore fit scores above zero but only three of them were active in the competitive binding assay. Hence, receptor-based pharmacophore models may not be suitable for final scoring use in VS.
Table 1
Competitive binding assay hits. Compounds identified in this study. * represents dual inhibitors of MEK1 and EPHB4.
Compound | Target | TS-ensECBS score | Pharmacophore fit score | Percent of control (POC) | Kd (nM) | Rigid docking (kcal/mol) | Flexible docking (kcal/mol) | Shape similarity |
SEW05801 | MEK1 | 0.87 | 0 | 5.9 | 1500 | -8.4 | -10.4 | 0.47 |
18864755 | MEK1 | 0.98 | 0 | 8.1 | 2900 | -9.7 | -9.8 | 0.52 |
XBX00307 | MEK1 | 0.97 | 0.15 | 18 | 3700 | -8.6 | -9.6 | 0.59 |
CD09763 | MEK1 | 0.97 | 0 | 25 | 3800 | -6.3 | -7 | 0.43 |
GK03499* | MEK1 | 0.9 | 0 | 23 | 4900 | -8.2 | -9.8 | 0.76 |
GK03503* | MEK1 | 0.94 | 0 | 31 | 8400 | -8.2 | -9.8 | 0.85 |
GK03503* | EPHB4 | 0.86 | 0.52 | 7.9 | 1600 | -8.2 | -9 | 0.72 |
GK03499* | EPHB4 | 0.9 | 0.44 | 32 | 3300 | -8.5 | -9.5 | 0.67 |
In contrast, the TS-ensECBS model consistently showed high performance in selecting potential inhibitors for both MEK1 and EPHB4. All active molecules showed high TS-ensECBS scores (above 0.87), whereas candidate molecules selected only using pharmacophore fit scores were all inactive (Fig. 2). Molecules with both high TS-ensECBS and pharmacophore scores were likely active (Fig. 2b). Taken together, the results suggested that the TS-ensECBS model could serve as a promising primary VS tool.
Secondary 3D-QSAR Pharmacophore Model for Complementing the TS-ensECBS Model
Despite the high accuracy of the TS-ensECBS model, the in vitro binding assay showed that many of the candidate molecules with a TS-ensECBS score of 0.7 or higher were inactive, suggesting the possibility for further improvement. To complement the TS-ensECBS model and refine the output scores, we constructed a secondary 3D-QSAR model using only the in vitro binding assay data. The structure-based 3D-QSAR model was relatively free from the overfitting problem that occurs in most machine-learning methods, and more importantly, provided the 3D molecular features necessary for optimal target-binding.
Six reliable 3D-QSAR models built using the competitive binding assay results were finally chosen for MEK1 and EPHB4. The 3D-QSAR pharmacophore models were generated via the Catalyst Hypogen algorithm that used the data (that is the training set, conformational models, pharmacophore features, parameters, and so on) to generate predictive pharmacophores (Fig. 4). All pharmacophores (maximum five features) among the two most active molecules were identified and stored by Hypogen. Pharmacophores that fit the remaining active molecules were retained. Any pharmacophore that matched more than half of the compounds identified as inactive was removed. The highest-scoring unique pharmacophores were exported after optimization. The values of regression analysis by Hypogen, along with the chosen hypotheses, are shown in SI Fig. 4.
The Fisher randomization option in DS 2018 was used to ensure that the generated hypotheses are acceptable. All the generated models had a statistical significance of 90% or higher, which suggested that the 3D-QSAR models were trustworthy (Fig. 5).
In MEK1 models built based on POC (MEK1_POC1) and Kd (MEK1_Kd1) values, one HBD, one HBA, and one RA were predicted to be common active features among the six active molecules (Fig. 4). The statistical significance for MEK1 models generated based on POC (MEK1_POC1) and Kd (MEK1_Kd1) values were 90% and 95%, respectively (Fig. 4). For the hypothesis using POC values, the EPBH4_POC1 model consisted of two RA features and one HBA feature and its significance was 90% (Fig. 4). In the case of EPHB4 models with Kd values taken as input, the significance was 95% for the three selected hypotheses (Fig. 4). EPHB4_POC1 and EPHB4_Kd3 models had two RA features and one HBA feature while EPHB4_Kd1 and EPHB4_Kd2 models had one HBA, one RA, and one HY feature.
General Applicability of the Secondary 3D-QSAR Models
Notably, the 3D-QSAR models were not just limited to our experimental data set but were generally applicable to a larger data set without an overfitting problem because the prediction results for the unseen large test set from the kinases still showed a strong discriminative capacity for known binders (Fig. 5a and b). Specifically, the 3D-QSAR models that were built with the MEK1 and EPHB4 inhibitors identified in this study were employed to score the MEK1 and EPHB4 test set molecules used to validate the structure-based methods (Fig. 5a and b). The fit scores of positives in MEK1 and EPHB4 test set molecules for 3D-QSAR models were clearly discriminative against the negatives (Fig. 5a and b). The highest pharmacophore fit score of both positive and negative molecules from any of the selected model was used for calculation. Out of ten MEK1 positive molecules tested, eight had a 0.86 or higher pharmacophore fit score. Similarly, all nine EPHB4 active molecules had a 0.93 or higher pharmacophore fit score. The total number of molecules used to as assess the performance of MEK1 and EPHB4 3DQSAR models were 51 and 49, respectively.
Combined Use of 3D-QSAR and Chemical Binding Similarity
Thanks to the reliability and generality of the 3D-QSAR models, we compared the predicted values of 3D-QSAR models and the chemical binding similarity scores for the MEK1 and EPHB4 inhibitors found in this study to check the possibility of their combined use (Fig. 5c and d). The competitive binding assay results from 13 MEK1 and 12 EPHB4 candidates were used to show the kinase inhibiting properties of molecules. The lowest predicted activity log value from any of the selected 3D-QSAR models for the screened molecules was used for this comparison. The results suggested that the combined use of TS-ensECBS models with the 3D-QSAR model clearly separated the active molecules from the inactive molecules. Accordingly, this strategy for generating a secondary 3D-QSAR model would be useful for further inhibitor screening or in structure-activity relationship studies aiming for new molecules when used along with the ECBS method. As discussed in the individual validation results of both the TS-ensECBS and 3D-QSAR methods in earlier sections, these methods were also individually effective as VS tools against unknown molecules.
Molecular Docking to Predict the Binding Pose of Active Molecules
Although molecular docking showed poor performance in the VS test, it was useful to predict the binding mode and present the atomic details of protein-ligand interaction when the binding pair was fixed. When the native ligands were docked to the target proteins for a docking pose prediction test, the ligand binding complexes were very close to the corresponding x-ray structures with RMSD values of 1.58 Å and 1.23 Å for MEK1 and EPHB4, respectively. The results suggested the potential use of molecular docking to predict binding modes, even though it could not be used as a primary VS tool based on its binding energy score. The molecules with the highest binding affinity value from the competitive binding assay were chosen to predict the binding models. The binding models of MEK1-SEW05801 and EPHB4-GK03503 complexes are shown in Fig. 6.
In the MEK1 crystal structure, the ligand was involved in bidentate interactions with Ser212 and formed a halogen bond with Val127. In the case of the EPHB4 complex, Asp234 and Met172 formed hydrogen bond interactions with the ligand in addition to several hydrophobic interactions. Binding modes of identified inhibitors in this study suggested that they may not share any conserved interactions with their respective co-crystallized ligands. Their predicted interaction details are discussed below. When molecular docking and 3D-QSAR models for MEK1 were compared, the hydrophobic interactions were disclosed through an RA feature that was similar in all protein-ligand interactions. Specifically, XBX00307 and 18864755 inhibitors were assumed to interact with Val22 and Leu14 of MEK1. The residues Gln93 and Met159 in MEK1 may play a role in binding with XBX00307 and SEW05801, respectively. Interaction with Ile81 may be necessary for GK03503 and GK03409 to bind in the MEK1 cavity. The Leu155 residue in MEK1 may play an essential role in binding to CD09763. For EPHB4 binding, interactions with Ile15, Ala39, Leu141, and Asp152 may be significant for GK03499 and GK03503.