In Silico Screening for Novel Tyrosine Kinase Inhibitors With Oxindole Scaffold as Anticancer Agents: Design, QSAR Analysis, Molecular Docking and ADMET Studies

Recently, Anti-cancer targeting drugs are directed against specic molecules and signaling pathways. These targeting agents have reasonable specicity, ecacy, and less side effects. Tyrosine kinases, which play an essential role in growth factor signaling regulation, are signicant targets in this type of therapy. Synthesized numerous tyrosine kinase inhibitors (TKIs), such as substituted indolin-2-ones, are effective as anti-tumor and anti-leukemia agents. In this study, a series of novel substituted indolin-2-ones were studied as kinase inhibitor analogs through quantitative structure-activity relationship (QSAR) analysis.Two chemometrics methods, such as multiple linear regression (MLR) and partial least squares combined with genetic algorithm for variable selection (GA-PLS), were employed to establish relationships between structural characteristics and kinase inhibitory activity of oxindole analogs. The GA-PLS was developed as the best predictor and validated QSAR model. The data set compounds were also studied by molecular docking to investigate their binding mechanism in the active site of tyrosine kinase enzymes. According to the information obtained from QSAR models and molecular docking analysis, 44 new potent lead compounds with novel structural features were introduced. Molecular docking, drug-likeness rules, ADMET analysis, bioavailability, toxicity prediction, and target identication were carried out on the newly designed oxindoles to elucidate the fundamental structural properties that affect their inhibitory activity. The results of our study could provide signicant insight for future design and development of novel tyrosine kinase inhibitors.


Introduction
Targeted anticancer therapy has been one of the top developments in medicine in recent years [1]. Today, protein kinases are one of the signi cant drug targets in the eld of chemotherapeutic antitumor drugs [2]. In recent years, more than 20 kinase-targeted drugs have been applied in chemotherapy, and hundreds of drug candidates are in pre-clinical and clinical trials [3]. Tyrosine kinase receptors play an essential role in cellular signal transduction pathways, modulating cellular responses to an external stimuli and affecting a variety of cellular processes including cell cycle progression, migration, cell metabolism, survival, proliferation and cell differentiation [4]. Due to their high expression in most cancer cells and their important role in modulating growth factor signaling in the cancer cell cycle, tyrosine kinases have been considered as one of the potential targets in the targeted therapy of cancers [2,5]. Two tyrosine kinase receptors, vascular endothelial growth factor receptor1 (VEGFR1) and vascular endothelial growth factor receptor 2 (VEGFR2 or KDR), regulate the function of endothelial cells [6,7]. Small molecule tyrosine kinase inhibitors (TKIs) have been successfully entering the drug market as selective anticancer agents with good potency, e cacy and low side effects [8]. Additionally, TKIs have good safety pro les and can be easily combined with other forms of chemotherapy or radiation therapy [9]. In recent years, several small molecules acting on tyrosine kinases such as Sorafenib, Sunitinib, Axitinib, and Regorafenib have been developed and approved for clinical use [10][11][12][13][14]. Many oxindoles or oxindole-like compounds have been documented in the literature as receptor tyrosine kinase (RTK) inhibitors. Among them, the scaffold of sunitinib has been extensively studied and the structure-activity relationships (SAR) of many of the indolin-2-one derivatives have been described [15][16][17]. As shown in Figure 1, Sunitinib structure does not have the phenyl group on its alkene substitution and because of that Sunitinib is inactive against FGFR1 [18].
Regarding such signi cant inhibitory activities of sunitinib on KDR and FGFR in the presence and absence of the phenyl ring substitution, there have been many efforts for developing SAR of sunitinib and its similar structures by changing the substituents of rings A, B, and C. From published SAR studies on indolin-2-ones, modi cations at the C-5 position of C-ring resulted compounds with different kinase inhibition pro les against VEGFR2 enzyme [18,19]. Therefore, substitution at this position can be used to improve the pharmacological characteristics of indolin-2-one derivatives. In this study, various substituents have been applied at different positions of rings A, B, and C of indolin-2-ones derivatives to improve their kinase inhibition spectrum.
In silico procedures have become a crucial part of drug discovery processes such as identify, re-design, and evaluate new candidate inhibitors [20]. Molecular modeling studies such as quantitative structure-activity relationship studies (QSARs) and molecular docking have been traditionally used as lead optimization approaches in the drug design eld. Recently, some novel oxindole derivatives have been reported as selective inhibitors of VEGFR2 with an excellent binding a nity [18,19]. Accordingly, this encouraged us to systematically probe the relationship between their structures and their inhibitory activities.
In this research, a novel series of oxindole analogs with broad spectrum kinase inhibitory activity was selected to perform a QSAR analysis [18]. Two different methods were applied to model the relationship between the structural properties and biological activity of the studied compounds: (1) multiple linear regression (MLR), and (2) partial least squared combined with genetic algorithm for variable selection (GA-PLS). The best model was obtained from GA-PLS, which is a linear six-parameter model. Additionally, the binding a nity, ligand orientation, and protein-ligand interactions of the designed compounds were revealed using the molecular docking technique. Drug-likeness rules like Lipinski's rule of ve, bioavailability, toxicology prediction, and target identi cation can provide useful guidelines in early stage drug discovery. Accordingly, in this study, the useful drug-likeness descriptors were also calculated for selected ligands. Finally, from the nal selected compounds, some novel potent hit compounds were chosen for future study. Hence, the results of the present study might provide helpful guidelines for the future design of potent tyrosine kinase inhibitors as potential anticancer agents.

Data set preparation
In this study, a data set of 44 oxindole derivatives was collected from the literature (Kim MH et al.) to perform this study [18]. The chemical structures and kinase inhibitory activity (IC 50 ) of these 44 molecules against KDR are presented in Table 1. The IC 50 values of the compounds were converted into the corresponding pIC 50 (−log IC 50 ) values and were used to develop QSAR models. To perform the QSAR studies, the whole data set (44 molecules) was randomly divided into a training set (35 molecules, 80%) for model generation and a test set (9 molecules, 20%) for external evaluation of the generated models.

Descriptor preparation and Optimization methods
The 2D structures of all studied compounds were sketched employing ChemBio Draw Ultra 12.0 [20]. The energy minimization was carried out using molecular mechanics (MM + ) followed by quantum-based semiempirical (AM1) method employing Hyperchem 8.0 software [21]. The software Gaussian 98, Hyperchem 8.0, and Dragon (version 5.5) were used to calculate molecular descriptors [22]. For example, the highest occupied molecular orbital energy (E HOMO ), the lowest unoccupied molecular orbital energy (E LUMU ), as well as the total dipole moment (µ) were calculated by Gaussian 98. Furthermore, some chemical parameters including molecular volume (V), hydrophobicity (logP), molecular surface area (SA), molecular polarizability (MP), hydration energy (HE) were computed employing Hyperchem 8.0. Dragon software provides a large domain of various descriptors such as constitutional, topological, empirical, 2D autocorrelation, geometrical, aromaticity index, atom-centered fragments, functional group counts, topological charge index, molecular properties and charge descriptors for each molecule [22]. Moreover, according to the equations proposed by Thanikaivelan et al., hardness, softness, electronegativity, and electrophilicity were calculated [23]. Some of these descriptors are depicted in Table S1 in the supplementary le.

Molecular docking procedure
Molecular docking analysis was carried out using an in-house batch script (DOCKFACE) to run AutoDock 4.2 and Vina in parallel mode automatically [24]. MGLTools 1.5.6 was used to convert the output structures to PDBQT [25]. The 3D crystal structure of EphB4, a surrogate receptor tyrosine kinase (PDB ID:4AW5), were obtained from the protein data bank (http://www.rcsb.org/) [12]. Regarding the preparation of the protein, cocrystallized ligand and water molecules were removed from the crystal structure using Accelrys DS Visualizer 4. Then, missing hydrogens were added, nonpolar hydrogen atoms were merged, Kollman charges, and AD4 atom types were assigned using AutoDock Tools [26]. Subsequently, the ligand was prepared by addition of Gasteiger partial chargers and merging the nonpolar hydrogen atoms. The grid box of dimensions 45×45×45 A° with a grid spacing of 0.375 A° was generated by the auxiliary program AutoGrid 4.0 using the x, y, and z coordinates of the active site. The Lamarckian Genetic Algorithm (LGA) was applied to perform the docking calculations [27]. For the LGA method, the number of runs was set to 100 and the number of population was set to 150. The binding visualization and further ligand-receptor interactions were performed using PyMOL, Autodock tool program and VMD software [28,29].
The internal validation of the docking protocol was performed by self-docking of the crystallized ligand to the binding pocket of the receptor. The root-mean-square deviation (RMSD) value was calculated between the docked and crystalized ligand to evaluate the accuracy of the docking protocol. The RMSD value between the crystalized and redocked conformations was less than 2 Å.

QSAR model development
Two different regression methods were developed and used as QSAR models: (1) simple multiple linear regression with stepwise variable selection (MLR) and (2) Genetic algorithm-partial least squares (GA-PLS). These well-known methods are commonly used in QSAR analysis [30]. In QSAR procedure, various approaches can be used to obtain many descriptors, but selecting the best of them is crucial because these descriptors must contain all the essential structural details of compounds. Here, using the SPSS statistics 18.0, stepwise selection and variable elimination were used to obtain a nal descriptor set for QSAR modeling.

Model validation
Model validation is a critical stage in the development of a reliable model. Leave-one-out (LOO) cross-validation (CV) is a powerful tool in the GA-PLS and MLR methods that consider the potential and predictive ability of the model. If the following conditions are satis ed, the QSAR model is called predictive: R 2 >0.6, Q 2 >0.6 and R 2 pred > 0.5 (30). The higher R 2 and Q 2 values indicate the better prediction ability and more robustness of the model [31]. In this study, both GA-PLS and MLR methods were validated by the Leave-one-out cross-validation approach to check their predictive ability using MATLAB 2013 software.

Applicability domain
The applicability domain (AD) is a principle key in creating the QSAR model, and it must be quanti ed before predicting a collection of molecules. Therefore, the predictions of only those compounds that fall into the applicability domain of the model might be considered reliable. The leverage values for each compound were used to evaluate the applicability domain. Furthermore, the Williams plot was used to show standardized residuals versus leverage values (h). According to the Williams plot, the obtained GA-PLS and MLR models were reliable if the leverage value (h) of each compound in the data set was lower than the warning leverage value (h*). In other words, prediction must be considered unreliable for compounds with a high leverage value (h > h*). The warning leverage value (h*) is generally xed at 3(k + 1) ⁄ n, where k is the number of the predictor variables and n is the number of training set compounds) [32][33][34].

In Silico Drug-Likeness, Bioavailability, Toxicology Prediction and Target identi cation
Nowadays, the assessments of pharmacokinetics and bioavailability become more important in the drug discovery process. Hence, in this work, the ADME (absorption, distribution, metabolism, and excretion) properties, drug-likeness (Lipinski's rule of ve), and oral bioavailability of the 10 best-predicted compounds were calculated and compared with those of sunitinib as a control drug. The drug-likeness properties and ADME pro les were calculated through the SwissADME web tool (http://www.swissadme.ch/) [35]. The human intestinal absorption (HIA) and blood-brain barrier penetration (BBB) properties of ADME were predicted using BOILED-Egg graphical model in the SwissADME web tool [35]. Additionally, the toxicity which is one of the key ingredients in medicine was predicted by the admetSAR online tool that is available at http://lmmd.ecust.edu.cn/admetsar2 [36,37].
Target identi cation is an essential step in screening the best and effective compounds in the modern drug design process. We used SwissADME online tool for target prediction of selected compounds which is accessible through the http://www.swisstargetprediction.ch/ server [38].

GA-PLS method
In the GA-PLS method, unlike MLR analysis, PLS analysis ignores the multicollinearity problem in the descriptors. In GA-PLS modeling, a genetic algorithm was used to nd the most helpful set of descriptors. It is an essential step in QSAR analysis because the selection of relevant descriptors directly relates to the predicted inhibitory activity values. In our work, the GA-PLS method was able to select the best descriptors for building the QSAR model. The details of the descriptors and the statistical results of the model obtained from GA-PLS method are shown in Table 3. As can be seen, a combination of 2D autocorrelation (ATS1m, GATS5v), information content index (IC1), topological index (MAXDN), geometrical descriptor (SPAM) and quantum chemical descriptor (softness), selected by GA-PLS approach, were responsible for the kinase inhibitory activity of oxindole analogs.
The obtained GA-PLS model showed very high statistical quality parameters (i.e., R 2 value of 0.8284 and Q 2 value of 0.7175). The predictive ability of the obtained model was evaluated by ve external test set molecules.
The predictive R 2 value for the test set was found to be 0.79, which indicated the great predictive ability of the model. A detailed statistical results of the two generated models by GA-PLS and MLR methods are shown in Table S2. Comparison of statistical results of these two models indicates the higher predictability and accuracy of the GA-PLS model to the MLR model.

Application domain of the selected GA-PLS model
The William plot for the applicability domain of GA-PLS model is displayed in Figure 3. This plot was acquired using the leverage approach to de ne the applicability domain of the model. As shown in Figure 3, all values of the training set and test set compounds were lower than the warning h* value of 0.6, indicating the good predicting ability of the nal model. Hence, new compounds with enhanced tyrosine kinase inhibitory activity could be designed using GA-PLS model.

Docking Studies
With the aim of exploring the possible binding modes of TKIs to the tyrosine kinase active site and obtaining their crucial binding interactions, 44 TKIs were docked into the active site of the tyrosine kinase enzyme (PDB ID: 4AW5). Before docking, the reliability of the docking method and parameters was evaluated by redocking the original co-crystallized ligand into the binding pocket.
As shown in Figure 4A, the best redocked pose and the original structure of the co-crystallized ligand were superimposed well with each other. Additionally, the RMSD value between them was 0.33 Å, con rming the reliability and accuracy of the docking method employed in this study. An H-bond was found between the oxindole carbonyl group of co-crystallized ligand and Gly98 residue ( Figure 4B). Hydrogen bond and electrostatic interactions were also found between the piperidine ring and Glu140 and Asp234, respectively.
All TKIs were then docked into the active site applying the same docking parameters. The ranking order of the inhibitory activity (pIC 50 values) were found to be in accordance with the docking scores ( Figure 5), which further indicated the reliability of the employed docking algorithm. The estimated free binding energy values (ΔG bind ; kcal/mol) and the possible binding mode of this compound with the key amino acid residues at the active site of the enzyme are summarized in Tables 1, S5 and S6.
As can be seen from Table1, the studied molecules were classi ed into four different classes based on their structures. Accordingly, their docking results have changed with these structural changes. The binding energy values (ΔG bind ) for the best-docked poses of the class 1 (6a-6w), class 2 (7a-7j), class 3 (8a-8e) and class 4 (8f-8k) compounds were within the range of -7.34 to -10.53 kcal/mol, -9.29 to -10.77 kcal/mol, -7.53 to -11.05 kcal/mol and -9.63 to -10.46 kcal/mol, respectively. Based on the results, class 2 and 3 compounds possessed the lowest docking binding energies. Therefore, the docking results of compounds 7g, 7j, 8b, and 8e with the lowest binding energy among the 44 compounds were chosen for further binding mode analysis between protein and ligands.
It was noted that the nitrogen and oxygen atoms of oxindole core and the nitrogen atom of benzimidazole group of all four compounds formed hydrogen bonding contacts with Glu170, Phe171 and Met172, respectively, which indicated the remarkable contribution of these interactions for the binding a nity (Figures 6 and S1). For compounds 7g and 7j, other CH---O interactions were observed between the sidechain group on the nitrogen atom of piperidine ring and the oxygen atom of Arg220, Asn221 and Asp234. Additionally, the piperidine ring in the three studied compounds 7g, 7j, and 8b formed CH---O interactions with Ser233.

Design of novel potent compounds
According to the information obtained from the developed QSAR models (MLR and GA-PLS) and the binding mechanisms of molecular docking, 44 compounds were proposed as novel tyrosine kinase inhibitors. The main strategy in the design of these novel inhibitors was the skeleton modi cation of the most potent compounds (8a-8e) and (8f-8k) via bioisosterism principle. The inhibitory activities of the newly designed compounds were predicted by the derived MLR and GA-PLS models and the results are listed in Table S3. From the designed compounds, molecules 9i, 9j, 9k, 9m, 10j, 11b, 11e, 11h, 11k, and 11m showed equal to or higher predicted pIC 50 values than compounds (8a-8e) and (8f-8k) (Figure 7).
To explore the binding modes of the novel designed compounds, they were docked into the binding site of tyrosine kinase enzyme. The docking score of all newly proposed compounds is shown in Table S3 as well as the binding mode details of 12 of them are listed in Table S4. Based on their structures, the designed compounds can be divided into three groups (9a-9p), (10a-10k), and (11a-11m). Hence, the docking results were affected by structural changes in these compounds. As expected, compounds 11a-11m showed better docking scores than those of other designed compounds. Moreover, all proposed compounds of group 11a-11m showed lower binding energy values than that of sunitinib as the control drug and some of them displayed equal and even better binding energy values than that of co-crystalized ligand.
The binding interactions of Sunitinib and new compounds 9i, 9j, 9m, 11b, 11e, 11j, and 11k in the active pocket of tyrosine kinase are shown in Figures 8 and S2-S4. The docking results suggested that most of the designed compounds had a common binding mode similar to that of the most potent compounds (8a-8e) and (8f-8k). As shown in Figure 8A, there is a CH---O interaction between Sunitinib and Glu170, further indicating the crucial role of Glu170 in stabilizing ligands in the active site of enzyme. The designed compound 9i formed two H-bond interactions with Glu170 and Met172. Similar to compounds 7a-7j, the piperidine ring and the side chain group on the nitrogen atom of the piperidine ring of compound 9i participated in CH---O interactions with Arg220 and Asn221. The docking ndings also indicated that, in addition to H-bonding, hydrophobic interactions with nonpolar amino acids and arene-H interactions play a signi cant role in the stability of the ligand-protein complex.
To investigate the speci city of the newly proposed compounds toward the tyrosine kinase enzyme, the target identi cation was carried out using SwissADME online tool. Table S5 summarizes the results of this prediction.
Among these new structures, compounds 9j, 9k, 9m, 10j, 11b, and 11e demonstrated the highest probability to act as kinase inhibitors with similarity of 93.3%, 66.7%, 80%, 60%, 66.7% and 60% to previously reported kinase inhibitors, respectively. It was notable that SwissADME showed 80% similarity with sunitinib as a control drug, which could indicate the acceptable speci city of our proposed compounds.

In Silico Drug-Likeness, Bioavailability, Toxicology Prediction and, Synthetic accessibility
The druggability of the newly designed compounds was analyzed by assessing their physiochemical properties and by applying Lipinski's rule of ve using SwissADME web tool and compared with that of sunitinib (Table 4). It could be seen that most of the selected compounds successfully passed Lipinski's ve rules. Compounds 9i, 11h, 11k, and 11m had one violation of Lipinski's rule of ve, and that was their molecular weight.  Table 5 shows the physiochemical properties of the selected newly designed compounds. As shown, most of the compounds have moderate water solubility and some of them are poorly soluble in water. Although all compounds except compound 11k displayed high gastrointestinal absorption, most did not show brain penetration. Half of the compounds depicted that they could act as cytochrome CYP2D6 inhibitors like the control drug, indicating that these compounds might not easily undergo oxidation and hydroxylation in the phase-1 metabolism. The synthetic accessibility of all new compounds were lower than 5, suggesting that they were relatively easy to be synthesized. 1-log S scale: insoluble < -10 < poorly < -6 < poorly < moderately < -4 < soluble < -2 very < 0 < highly 2-range: 1= very easy, 10= very hard Bioavailability radar plots depicted the oral bioavailability of the selected compounds ( Figure 9). Orally bioavailable compounds were predicted to be 9i, 9j, 10j, 11b, 11e, 11h, and 11m. compounds 9k and 9m showed one off-shoot relative to insaturation (INSATU), and compound 11k showed four off-shoot relatives to insaturation (INSATU), insolubility (INSOLU), size and Lipophilicity (LIPO) indicated that they might have poor physicochemical properties for oral bioavailability.
The toxicity risk calculation is critical in the early drug design phase for screening the safest and most effective drugs or compounds. The toxicity of 10 selected compounds was predicted using the admetSAR online tool [37]. Table S6 shows the toxicity prediction results. None of the compounds showed carcinogenicity, eye corrosion, or eye irritation. Compound 9m was predicted to cause Ames mutagenesis. All Compounds and control drug (Sunitinib) were determined to be hepatotoxic, however, the toxicity range of the selected compounds was better than that of Sunitinib. According to the US EPA classi cation, all compounds were predicted to have class III acute oral toxicity, which means that their LD 50 values are greater than 500 mg/kg but less than 5000 mg/kg. These computational pharmacokinetics and bioavailability information could improve the success rate of newly proposed compounds.

Conclusion
In this work, a molecular modeling study on a series of oxindole derivatives has been carried out employing 2D-QSAR, molecular docking and ADMET property analysis to investigate the structural requirements for inhibition of tyrosine kinase enzymes. Two different QSAR modeling methods, MLR and GA-PLS, were used to develop QSAR models by 35 molecules as training set and 9 molecules as the test set to validate these models. The QSAR model obtained from GA-PLS approach was superior to the MLR version and had more powerful predictive ability. From the molecular docking, the H-bond interactions with residues Glu170, Phe171, and Met172 contributed mainly to the inhibitory activity of the studied compounds. Furthermore, residues Arg220, Asn221 played an important role in stabilizing inhibitors in the tyrosine kinase active site. Based on information derived from QSAR models and molecular docking studies, 44 novel potential inhibitors with improved theoretical inhibitory activity were designed by modi cation of the oxindole scaffold. According to their predicted pIC 50 values and docking scores, ten newly designed compounds 9i, 9j, 9k, 9m, 10j, 11b, 11e, 11h, 11k, and, 11m were selected and studied through molecular docking and their ADMET properties were investigated comprehensively. The docking results further veri ed the crucial role of Glu170, Phe171, Met172, Arg220, and Asn221 in binding interactions. Furthermore, ADMET studies suggested that the six proposed compounds 9i, 9j, 9m, 11b, 11e, and 11h had favorable drug-like properties, pharmacokinetic parameters and toxicological pro les as novel tyrosine kinase inhibitors. These selected compounds, in particular, proved to be a helpful starting point for the development of new synthetic oxindole derivatives with improved inhibitory activity. Overall, these results might provide and insight into some instructions for the rational design of more potent tyrosine kinase inhibitors.     Structures of the compounds 9i, 9j, 9k, 9m, 10j, 11b, 11e, 11h, 11k, and 11m.   BOILED-Egg plot of 10 selected designed compounds and the control drug Sunitinib. Spots on the yellow area of the plot represent passive diffusion molecules through the BBB. In contrast, points in the white space of the plot are predicted to be passively absorbed by the gastrointestinal tract (HIA). Blue dots (PGP+) indicate the molecules expected to be e uated from the central nervous system (CNS) by P-glycoprotein. In contrast, the red ones (PGP-) show the molecules predicted not to be e uated from the CNS by P-glycoprotein [37].

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. SupportingInformation.docx