Classification of FLT3 Inhibitors and SAR Analysis by Machine Learning Methods

doi:10.21203/rs.3.rs-2459483/v1

FMS-like tyrosine kinase 3 (FLT3) is a type III receptor tyrosine kinase and an important target for anticancer therapy. In this work, we conducted a structure-activity relationship (SAR) study on 3867 FLT3 inhibitors that we collected. MACCS, ECFP4, and topological torsion (TT) fingerprints were used to represent the inhibitors in the dataset. A total of 36 classification models were built with support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and deep neural network (DNN) algorithms. Model 3D_3, built with a DNN on TT fingerprints, performed best on the test set, with the highest prediction accuracy (85.83%) and Matthews correlation coefficient (MCC, 0.72), and also performed well on the external test set.

In addition, we clustered the 3867 inhibitors into 11 subsets with the K-Means algorithm to characterize the structural features of the reported FLT3 inhibitors. Finally, we analyzed the SAR of FLT3 inhibitors with the RF algorithm based on ECFP4 fingerprints. The results showed that 2-aminopyrimidine, 1-ethylpiperidine, 2,4-bis(methylamino)pyrimidine, amino-aromatic heterocycles, [(2E)-but-2-enyl]dimethylamine, but-2-enyl, and alkynyl were typical fragments of highly active inhibitors. Moreover, the scaffolds of three subsets, Subset_A (Subset 4), Subset_B, and Subset_C, were significantly associated with FLT3 inhibitory activity.

Deep neural networks (DNN)

FMS-like tyrosine kinase 3 (FLT3)

Molecular modeling

Structure-activity relationship

Substructure analysis

Acute myeloid leukemia (AML) is a hematologic malignancy that encompasses all non-lymphoid acute leukemias [1] and accounts for at least 3% of all human cancers. The disease arises from the clonal malignant proliferation of primitive myeloid cells, which leads to dysfunction of the bone marrow hematopoietic system, aberrant growth and proliferation of myeloid precursor cells, and a partial block in cellular differentiation [2].

FMS-like tyrosine kinase 3 (FLT3) is a type III receptor tyrosine kinase; this family also includes the FMS, KIT, and platelet-derived growth factor receptor (PDGFR) kinases [3]. FLT3 regulates the survival, proliferation, and differentiation of hematopoietic stem/progenitor cells [4].

Many clinical studies have shown that FLT3 is highly expressed in more than 70% of AML patients and that FLT3 mutations occur in approximately 30% of patients. These mutations include internal tandem duplications (ITD) and point mutations in the tyrosine kinase domain (TKD): ITD mutations occur in approximately 25% of AML patients and TKD mutations in approximately 7–10% [5, 6]. These mutations promote leukemic cell growth and survival.

Therefore, FLT3 is currently an important target for the treatment of AML. Midostaurin [7–9] and Gilteritinib [10–12] were approved by both the Food and Drug Administration (FDA) and European Medicines Agency (EMA) for AML indications in 2017 and 2019, respectively. In 2019, Quizartinib [13, 14] was approved in Japan for the treatment of adult patients with relapsed/refractory FLT3-ITD acute myeloid leukemia. In addition, other inhibitors targeting FLT3 have been studied or are currently in clinical trials as potential therapeutic agents for AML [15–19] (as shown in Table 1).

According to their specificity for FLT3, these inhibitors can be divided into three generations. The first-generation inhibitors are nonselective, multitarget inhibitors such as Sunitinib [20–22], Sorafenib [23–25], Midostaurin [7–9], and Lestaurtinib [26–29]. The second-generation inhibitors, such as Quizartinib [13, 14], Gilteritinib [10–12], and Crenolanib [30, 31], have higher selectivity and activity and less toxicity from off-target effects than the first-generation inhibitors [32]. However, patients treated with these inhibitors have been found to develop resistance soon after starting treatment [33]. Therefore, new-generation inhibitors have been developed to overcome these resistance mechanisms.

At present, research on small-molecule FLT3 inhibitors mainly follows two directions: (1) continued work on wild-type FLT3, exploring small-molecule inhibitors with novel scaffolds independent of specific mutations [34, 35]; and (2) development of new-generation inhibitors, multitarget inhibitors, and combination therapies with FLT3 inhibitors to address resistance mechanisms [17, 36].

Structure-activity relationship (SAR) analysis is a ligand-based drug design approach. It uses statistical or machine learning models to relate compound structure to activity and to predict the biological activity of new compounds. In the past 10 years, several groups have carried out quantitative structure-activity relationship (QSAR) studies on small sets of FLT3 inhibitors. Kuei et al. [37] built a bilayer 3D-QSAR model; the q², r², and predicted r² of its comparative molecular similarity indices analysis (CoMSIA) model were 0.54, 0.97, and 0.76, respectively. Rajiv et al. [38] built 3D-QSAR models from two classes of compounds (2-acylaminothiophene-3-carboxamide derivatives and 4-amino-6-(piperazin-1-yl)pyrimidine-5-carbaldehyde oxime derivatives); the Pearson R, q², and r² of the derived model were 0.8912, 0.7471, and 0.9154, respectively. Ghosh et al. [39] developed ligand-based comparative molecular field analysis (CoMFA) and CoMSIA models from 40 pyrimidine-4,6-diamine-based compounds; the q², r², and q²_F3 were 0.802, 0.983, and 0.698 for the CoMFA model and 0.725, 0.965, and 0.668 for the CoMSIA model.

As seen above, most previous studies focused on a particular class of derivatives or a specific scaffold and did not cover a large chemical space. Moreover, most of them were quantitative predictions of activity. In this work, we therefore collected 3867 structurally diverse FLT3 inhibitors to construct classification models that predict whether a compound is highly or weakly active against FLT3. Thirty-six classification models were built on randomly divided datasets using three types of fingerprints (MACCS, ECFP4, and TT) and four machine learning algorithms (support vector machine, random forest, eXtreme Gradient Boosting, and deep neural networks). An external test set was used to verify the performance of the classification models. We also identified structural characteristics of highly active inhibitors by K-Means clustering and by analyzing ECFP4 fingerprint importance with the RF algorithm.

2.1 Dataset

We collected inhibitors from ChEMBL [40] and Reaxys [41]. After cross-checking against 297 publications and 86 patents (References S1-S362 in Supplementary Material 3), we obtained 3867 inhibitors with biological activity values (IC₅₀) against wild-type FLT3. All IC₅₀ values were measured in in vitro enzymatic assays and ranged from 0.101 to 322,000 nM. The 1734 inhibitors with IC₅₀ below 50 nM were regarded as highly active and labeled ‘1’, and the 1684 inhibitors with IC₅₀ above 100 nM were regarded as weakly active and labeled ‘0’. Inhibitors with IC₅₀ between 50 and 100 nM were removed to minimize the influence of boundary effects and experimental error on predictions [42, 43]. The dataset is provided in the file ‘Supplementary Material 1 – database.csv’ in Supplementary Material 1.
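The activity labelling rule above can be written as a small function. This is a minimal sketch assuming IC₅₀ values in nM. The text does not say how compounds with IC₅₀ exactly equal to 50 or 100 nM were handled, so here they are treated as part of the excluded boundary zone (an assumption). The names `label_ic50` and `build_labels` are hypothetical.

```python
from typing import List, Optional, Tuple

# Activity cut-offs from the text, in nM.
HIGH_CUTOFF_NM = 50.0   # IC50 below this -> highly active (label 1)
LOW_CUTOFF_NM = 100.0   # IC50 above this -> weakly active (label 0)


def label_ic50(ic50_nm: float) -> Optional[int]:
    """Return 1 (highly active), 0 (weakly active), or None (boundary zone, excluded)."""
    if ic50_nm < HIGH_CUTOFF_NM:
        return 1
    if ic50_nm > LOW_CUTOFF_NM:
        return 0
    # Assumption: 50 nM <= IC50 <= 100 nM, including the exact cut-offs, is discarded.
    return None


def build_labels(records: List[Tuple[str, float]]) -> List[Tuple[str, int]]:
    """Keep (compound_id, label) pairs for compounds outside the boundary zone."""
    labelled = []
    for compound_id, ic50_nm in records:
        label = label_ic50(ic50_nm)
        if label is not None:
            labelled.append((compound_id, label))
    return labelled
```

For example, `build_labels([("cpd_1", 3.2), ("cpd_2", 72.0), ("cpd_3", 850.0)])` (hypothetical compound IDs) keeps `("cpd_1", 1)` and `("cpd_3", 0)` and drops the 72 nM compound.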

To evaluate the applicability of the models objectively, we extracted 321 inhibitors (References S1-S41 of Supplementary Material 3) from the dataset as the external test set (176 highly active inhibitors, 145 weakly active inhibitors). The external test set was independent of the training set and the test set.

We randomly divided the remaining 3097 inhibitors into a training set and a test set three times, at a ratio of approximately 2:1 (2074:1023). The resulting pairs were named training set 1 and test set 1, training set 2 and test set 2, and training set 3 and test set 3. Each training set contained 2074 inhibitors (1043 highly active, 1031 weakly active), and each test set contained 1023 inhibitors (515 highly active, 508 weakly active). The training sets, test sets, and external test set are shown in Table 2.
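The reported set sizes are consistent with a stratified split that holds out about one-third of the compounds. ceil(0.33 × 3097) = 1023 test compounds, allocated in proportion to the 1558 highly and 1539 weakly active compounds left after removing the external test set, gives exactly 515 and 508 test compounds. The sketch below implements such a split with the standard library. The `test_size=0.33` value and the proportional allocation rule are assumptions inferred from the counts, not settings reported by the authors, and Python's `random` module will not reproduce the authors' actual partitions for seeds 1, 10, and 111.

```python
import math
import random


def stratified_split(labels, test_size=0.33, seed=1):
    """Split compound indices into (train, test), preserving class proportions.

    The test set has ceil(test_size * n) members; each class receives a
    proportional quota, and leftover slots go to the classes with the
    largest fractional remainders.
    """
    n = len(labels)
    n_test = math.ceil(test_size * n)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    classes = sorted(by_class)
    # Integer arithmetic avoids floating-point rounding in the per-class quotas.
    quotas = {c: len(by_class[c]) * n_test // n for c in classes}
    remainders = {c: len(by_class[c]) * n_test % n for c in classes}
    leftover = n_test - sum(quotas.values())
    for c in sorted(classes, key=lambda c: remainders[c], reverse=True)[:leftover]:
        quotas[c] += 1

    rng = random.Random(seed)
    train, test = [], []
    for c in classes:
        members = list(by_class[c])
        rng.shuffle(members)  # random selection within each class
        test.extend(members[:quotas[c]])
        train.extend(members[quotas[c]:])
    return sorted(train), sorted(test)
```

Calling it with 1558 labels of 1 and 1539 labels of 0 yields a 2074/1023 split with 515 highly active compounds in the test set, for any seed.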

Table 2

The training set, test set, and external test set
Split seed^a	Dataset size	Training set	Highly active	Weakly active	Test set	Highly active	Weakly active	External test set: highly active	External test set: weakly active
1	3418	Training set 1	1043	1031	Test set 1	515	508	176	145
10	3418	Training set 2	1043	1031	Test set 2	515	508
111	3418	Training set 3	1043	1031	Test set 3	515	508
^a Random seed of each of the three random training/test partitions; the same external test set was used for all three

2.2 Molecular Representations

We used MACCS structural keys [44] (166 bits), Topological Torsion fingerprints [45] (TT, target size = 4, 1024 bits), and ECFP4 [46] (radius = 2, 1024 bits) to represent the inhibitors. All fingerprints were calculated with the RDKit Python toolkit [47], and each fingerprint type was used as a complete set. To remove uninformative bits and reduce the risk of overfitting, bits with variance < 0.01 were removed before model building.

After this filtering, 128 MACCS bits, 750 ECFP4 bits, and 370 TT bits were kept and used as inputs to the machine learning models. The selected fingerprints are provided in the file ‘Supplementary Material 2 – fingerprints.csv’ in Supplementary Material 2.
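The variance filter is simple for binary fingerprints: a bit that is on in a fraction p of the compounds has (population) variance p(1 − p), so a threshold of 0.01 keeps only bits whose on-frequency lies roughly between 1% and 99%. The sketch below assumes fingerprints stored as 0/1 lists and uses the population variance; both are assumptions, and the function names are hypothetical.

```python
def variance_filter(fingerprints, threshold=0.01):
    """Return the indices of bit columns whose variance is at least `threshold`.

    fingerprints: list of equal-length 0/1 lists, one per compound.
    For a binary column with on-bit frequency p, the population variance is p * (1 - p).
    """
    n = len(fingerprints)
    kept = []
    for j in range(len(fingerprints[0])):
        p = sum(fp[j] for fp in fingerprints) / n
        if p * (1.0 - p) >= threshold:
            kept.append(j)
    return kept


def apply_filter(fingerprints, kept):
    """Project every fingerprint onto the retained bit columns."""
    return [[fp[j] for j in kept] for fp in fingerprints]
```

Columns that are always off or always on have zero variance and are dropped first; in practice the same filter is available as scikit-learn's `VarianceThreshold`.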

2.3 Model Evaluation

Accuracy (Q), sensitivity (SE), specificity (SP), and the Matthews correlation coefficient (MCC) were calculated to evaluate model performance, as defined in Eqs. (1)–(4).

$$\text{Q} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \tag{1}$$

$$\text{MCC} = \frac{\text{TP} \times \text{TN} - \text{FP} \times \text{FN}}{\sqrt{(\text{TP} + \text{FP})(\text{TP} + \text{FN})(\text{TN} + \text{FP})(\text{TN} + \text{FN})}} \tag{2}$$

$$\text{SE} = \frac{\text{TP}}{\text{TP} + \text{FN}} \tag{3}$$

$$\text{SP} = \frac{\text{TN}}{\text{TN} + \text{FP}} \tag{4}$$

Here TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. The receiver operating characteristic (ROC) curve and the area under it (AUC) were also used to evaluate the robustness and generalization ability of the models; a larger AUC indicates better discrimination between highly and weakly active inhibitors.
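The four metrics and the AUC translate directly into code. The sketch below computes the confusion counts, Q, SE, SP, and MCC, and the AUC via its rank interpretation (the probability that a randomly chosen highly active compound scores higher than a randomly chosen weakly active one, counting ties as one-half). It is a plain-Python illustration, not the authors' implementation; returning MCC = 0 when a marginal count is zero is a convention we assume.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for 0/1 labels, with 1 = highly active."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy (Q), sensitivity (SE), specificity (SP), and MCC."""
    q = (tp + tn) / (tp + tn + fp + fn)
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Q": q, "SE": se, "SP": sp, "MCC": mcc}


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for s_pos in pos:
        for s_neg in neg:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

As a check, back-calculating the confusion counts from the external-test-set results of Model 3A_1 in Table 5 (93.75% of 176 highly active and 87.59% of 145 weakly active compounds correct, i.e. TP = 165, FN = 11, TN = 127, FP = 18) and passing them to `classification_metrics` gives Q ≈ 90.97% and MCC ≈ 0.82, matching the table.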

2.4 Machine Learning Methods and Parameter Optimization

Machine Learning Methods

Support vector machine (SVM) [48], random forest (RF) [49], eXtreme Gradient Boosting (XGBoost) [50], and deep neural networks (DNN) [51] were used to build models. SVM, RF, and XGBoost were implemented with the scikit-learn Python toolkit [52], and DNN was implemented with PyTorch [53].

SVM seeks the decision boundary that separates the two classes with the largest margin, i.e., the boundary farthest from the nearest training samples of each class. The penalty parameter C and the RBF kernel parameter gamma were optimized.
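For reference, the standard soft-margin SVM formulation makes the roles of the two tuned hyperparameters explicit: C weights the penalty on the margin violations ξᵢ, and gamma sets the width of the RBF kernel. The notation (labels yᵢ ∈ {−1, +1}, feature map φ) is the textbook form and is not taken from the original text.

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i\left(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \;\; \xi_i \ge 0$$

$$K(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^{\top}\phi(\mathbf{x}') = \exp\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right)$$

A larger C penalizes training errors more heavily (a narrower margin and a higher risk of overfitting), and a larger gamma makes each training sample's influence more local.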

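RF is an ensemble of many decision trees, each built on a random sample of the training data with random feature subsets so that the trees are weakly correlated; the trees' predictions are then aggregated, e.g., by majority vote. The number of decision trees (n_estimators), the number of features considered when searching for the best split (max_features), and the maximum number of leaf nodes per tree (max_leaf_nodes) were optimized.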
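To illustrate the two sources of randomness (bootstrap sampling of compounds and random feature subsets) and how n_estimators and max_features enter, the sketch below builds a deliberately simplified "forest" of depth-one trees (decision stumps) on binary fingerprint bits and predicts by majority vote. It is not scikit-learn's `RandomForestClassifier`, which grows deeper trees (limited here by max_leaf_nodes), samples features at every split, and averages class probabilities; all function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def majority(labels, default):
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, feature_ids):
    """Choose the bit (among feature_ids) whose split minimizes weighted Gini impurity.

    Returns (bit index, prediction if the bit is on, prediction if the bit is off).
    """
    overall = majority(y, 0)
    best = None
    for j in feature_ids:
        on = [yi for xi, yi in zip(X, y) if xi[j]]
        off = [yi for xi, yi in zip(X, y) if not xi[j]]
        impurity = (len(on) * gini(on) + len(off) * gini(off)) / len(y)
        if best is None or impurity < best[0]:
            best = (impurity, j, majority(on, overall), majority(off, overall))
    return best[1:]


def fit_forest(X, y, n_estimators=25, max_features=2, seed=0):
    """Each stump sees a bootstrap sample of compounds and a random subset of bits."""
    rng = random.Random(seed)
    n, n_bits = len(X), len(X[0])
    forest = []
    for _ in range(n_estimators):
        rows = [rng.randrange(n) for _ in range(n)]      # sample compounds with replacement
        feats = rng.sample(range(n_bits), max_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict(forest, x):
    """Majority vote over all stumps."""
    votes = [on if x[j] else off for j, on, off in forest]
    return Counter(votes).most_common(1)[0][0]
```

Using an odd n_estimators avoids tied votes between the two classes.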

XGBoost is an implementation of gradient-boosted decision trees (GBDT) that adds a penalty term to the objective function to limit model complexity. The learning rate, the maximum depth of each tree (max_depth), the number of trees (n_estimators), the fraction of training samples drawn for each tree (subsample), and the fraction of features sampled for each tree (colsample_bytree) were optimized.
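For reference, the regularized objective of the original XGBoost formulation shows where the complexity penalty enters. Here l is a differentiable loss (for binary classification, typically the logistic loss), f_k is the k-th tree, T is its number of leaves, w is its vector of leaf weights, and γ and λ are regularization parameters that are not among the hyperparameters listed above. Trees are added one at a time, each scaled by the learning rate η.

$$\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2}$$

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta\, f_t(\mathbf{x}_i)$$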

A DNN consists of an input layer, hidden layers, and an output layer, each containing several neurons. Neurons within the same layer are not connected to each other, whereas each neuron is connected to every neuron of the previous layer (a fully connected, feed-forward architecture). The principle is shown in Fig. 1.
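The fully connected structure can be illustrated with a plain-Python forward pass. This sketch assumes ReLU hidden activations (the activation used in our models, as described under Parameter Optimization) and a single sigmoid output unit giving the probability of the highly active class. The output layer, the omission of batch normalization, and all names are illustrative assumptions rather than the trained architecture.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron receives every input neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, layers):
    """Propagate one fingerprint vector through the hidden layers and a single output unit.

    layers: list of (weights, biases); weights has shape [n_out][n_in].
    The last entry is the output layer with exactly one neuron.
    """
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    out_w, out_b = layers[-1]
    logit = dense(h, out_w, out_b)[0]
    return sigmoid(logit)  # probability of the "highly active" class
```

In a framework such as PyTorch the same structure is a stack of linear layers; the plain-Python version only makes the all-to-all connections between consecutive layers explicit.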

Parameter Optimization

Grid search [54] with 5-fold cross-validation [55] on the training set was used to select the optimal hyperparameters of the SVM, RF, and XGBoost models. The 5-fold cross-validation was repeated 10 times, and the 10 resulting mean scores (measured by MCC) were averaged; the hyperparameter combination with the highest average MCC was taken as optimal.
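The selection procedure (grid search, 5-fold cross-validation repeated 10 times, MCC as the score) can be sketched without any library. The folds here are stratified so that each contains both classes; whether the original folds were stratified is not stated, so this is an assumption, as are the names `repeated_cv_grid_search` and `fit_predict`. In scikit-learn the same idea is usually expressed with `GridSearchCV`, a `RepeatedStratifiedKFold(n_splits=5, n_repeats=10)` splitter, and `scoring="matthews_corrcoef"`.

```python
import itertools
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient (0.0 when any marginal count is zero)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def stratified_folds(y, k, rng):
    """Shuffle each class separately and deal its members round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[j % k].append(idx)
    return folds


def repeated_cv_grid_search(X, y, param_grid, fit_predict, k=5, repeats=10, seed=0):
    """Return (best_params, best_score) over every combination in param_grid.

    fit_predict(X_train, y_train, X_test, **params) -> predicted 0/1 labels.
    Each repeat reshuffles the folds; a repeat's score is the mean MCC over
    its k folds, and the final score is the mean over all repeats.
    """
    names = sorted(param_grid)
    best_params, best_score = None, -math.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        rng = random.Random(seed)  # identical folds for every parameter combination
        repeat_means = []
        for _ in range(repeats):
            fold_scores = []
            for test_idx in stratified_folds(y, k, rng):
                held_out = set(test_idx)
                train_idx = [i for i in range(len(y)) if i not in held_out]
                y_pred = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                                     [X[i] for i in test_idx], **params)
                fold_scores.append(mcc([y[i] for i in test_idx], y_pred))
            repeat_means.append(sum(fold_scores) / k)
        score = sum(repeat_means) / repeats
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Reusing the same seed for every combination means all candidates are compared on identical folds, so differences in score reflect the hyperparameters rather than the partitioning.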

Because training a neural network is time-consuming, we tuned the DNN hyperparameters in a fixed sequence to reduce the workload. The following hyperparameters were optimized: (i) the number of hidden layers; (ii) the number of neurons in each hidden layer; (iii) the optimizer; (iv) batch_size; (v) learning_rate; and (vi) weight_decay, a penalty that shrinks the network weights to prevent overfitting. Between consecutive hidden layers, we also added a batch normalization (BN) [56] layer and a ReLU activation function. BN can substantially improve the predictive performance of the model and helps reduce overfitting, and the ReLU activation function mitigates the vanishing-gradient problem, making training faster. The ranges tried for all hyperparameters are given in Table S1 of Supplementary Material 3.
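Tuning the DNN hyperparameters in a fixed sequence, rather than over the full grid, reduces the number of training runs from the product of the candidate-list lengths to their sum. A minimal sketch of this one-at-a-time (coordinate-wise) search follows. The `evaluate` callback, the parameter names, and the candidate values are hypothetical, and earlier choices are not revisited, so the result can miss interactions between hyperparameters.

```python
def sequential_tuning(order, search_space, defaults, evaluate):
    """Tune hyperparameters one at a time in a fixed order.

    order: parameter names in the sequence they are tuned.
    search_space: dict mapping each name to its candidate values.
    defaults: starting configuration (must contain every name in `order`).
    evaluate: callable(config) -> validation score, higher is better.
    Each parameter is fixed at its best value before the next one is tuned.
    """
    config = dict(defaults)
    history = []
    for name in order:
        best_value, best_score = config[name], None
        for value in search_space[name]:
            trial = dict(config, **{name: value})
            score = evaluate(trial)  # e.g. validation MCC of a trained network
            history.append((name, value, score))
            if best_score is None or score > best_score:
                best_value, best_score = value, score
        config[name] = best_value
    return config, history
```

For instance, with four candidate numbers of hidden layers and three learning rates, the sequential search trains 7 networks instead of the 12 a full grid would need, and the saving grows quickly as more hyperparameters are added.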

3.1 Classification models

3.1.1 Diversity of dataset

We evaluated the diversity of the inhibitors in the dataset to characterize the applicability domain of the models. Based on the 166-bit MACCS fingerprints, we calculated the pairwise Tanimoto coefficients (Tc) among the 3097 compounds used for modeling; lower Tc values indicate greater diversity. The frequency distribution of the Tanimoto coefficients is shown in Fig. 2. The mean Tc was 0.502, and 94.41% of the compound pairs had a similarity below 0.7, indicating that the dataset is chemically diverse.
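The diversity statistics can be computed from fingerprints represented as sets of on-bit indices. This is a minimal sketch: for the 3097 modeling compounds it evaluates 4,794,156 pairs, which is slow in pure Python, and RDKit's `DataStructs.BulkTanimotoSimilarity` is the usual faster route. Treating two empty fingerprints as identical (Tc = 1) is our own convention, and the function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def diversity_summary(fingerprints, cutoff=0.7):
    """Mean pairwise Tc and the fraction of pairs with Tc below `cutoff`."""
    sims = [tanimoto(a, b) for a, b in combinations(fingerprints, 2)]
    mean_tc = sum(sims) / len(sims)
    frac_below = sum(s < cutoff for s in sims) / len(sims)
    return mean_tc, frac_below
```

For example, `tanimoto({1, 2, 3}, {2, 3, 4})` is 2/4 = 0.5, since the two fingerprints share two of the four bits set in either.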

3.1.2 External test set distribution

To evaluate the independence of the external test set, we used a self-organizing map (SOM) to cluster the entire dataset (training, test, and external test sets) by compound structure, as shown in Fig. 3. The SOM had a 59×39 grid of neurons and was trained on the 166-bit MACCS fingerprints. Red grids contained only compounds from the training and test sets, green grids contained only compounds from the external test set, and black grids contained both training/test compounds and external test compounds.

In total, 103 compounds fell into 67 green grids, 2542 compounds into 596 red grids, and 774 compounds into 95 black grids; of the latter, 219 came from the external test set and 555 from the training and test sets. Thus, about one-third of the external test set (103 compounds) occupied grids containing no training or test compounds and showed some structural novelty relative to the training and test sets, so the external test set is suitable for validating the models.
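Once each compound has been mapped to its best-matching SOM neuron, the red/green/black classification of Fig. 3 is a simple bookkeeping step. The sketch assumes the mapping is already available as (cell, source) pairs, with source either "model" (training or test set) or "external"; the SOM training itself is not shown, and the function names are hypothetical.

```python
from collections import defaultdict


def categorize_cells(assignments):
    """Colour SOM cells by the origin of the compounds mapped to them.

    assignments: list of (cell, source) pairs, where cell is a (row, col)
    tuple and source is "model" (training/test sets) or "external".
    Returns {colour: (n_cells, n_compounds)} with red = model compounds only,
    green = external compounds only, black = both.
    """
    cells = defaultdict(lambda: {"model": 0, "external": 0})
    for cell, source in assignments:
        cells[cell][source] += 1
    summary = {"red": [0, 0], "green": [0, 0], "black": [0, 0]}
    for counts in cells.values():
        if counts["model"] and counts["external"]:
            colour = "black"
        elif counts["external"]:
            colour = "green"
        else:
            colour = "red"
        summary[colour][0] += 1
        summary[colour][1] += counts["model"] + counts["external"]
    return {colour: tuple(v) for colour, v in summary.items()}


def novel_external_fraction(assignments):
    """Fraction of external compounds that land in external-only (green) cells."""
    model_cells = {cell for cell, source in assignments if source == "model"}
    external = [cell for cell, source in assignments if source == "external"]
    return sum(cell not in model_cells for cell in external) / len(external)
```

With the counts reported above, 103 of the 321 external compounds lie in green cells, a fraction of about 0.32.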

3.1.3 Performance of classification models

In this study, MACCS, ECFP4, and TT fingerprints were used to represent the inhibitors. The MACCS fingerprint is a 166-bit binary string in which each bit corresponds to a predefined structural feature. ECFP4 and TT are circular and path-based fingerprints, respectively, each computed as a 1024-bit binary string. Because these fingerprints encode different structural information, the performance of the classification models built on them also differs.

Three random divisions produced three parallel pairs of training and test sets. Four machine learning algorithms (SVM, RF, XGBoost, and DNN) were applied to each pair, so 12 models (3 splits × 4 algorithms) were built for each fingerprint type. The optimal hyperparameters of each model are given in Tables S2-S5 of Supplementary Material 3. Models 1A-1D were based on MACCS fingerprints, Models 2A-2D on ECFP4 fingerprints, and Models 3A-3D on TT fingerprints. The average performance of the models over the three training sets and three test sets is summarized in Table 3.

Table 3

Performance of 36 classification models based on the 128 MACCS fingerprints, 750 ECFP4 fingerprints, and 370 TT fingerprints
Model	Training set/test set^a	Algorithm	Training Q^b	Training MCC^c	5-CV^d	Test Q	Test SE^e	Test SP^f	Test MCC
MACCS (128 bits)
Model 1A^g	2074/1023	SVM	0.957 ± 0.004	0.91 ± 0.006	0.814 ± 0.003	0.828 ± 0.006	0.812 ± 0.002	0.844 ± 0.013	0.66 ± 0.015
Model 1B^g	2074/1023	RF	0.955 ± 0.003	0.91 ± 0.006	0.819 ± 0.001	0.830 ± 0.004	0.821 ± 0.014	0.840 ± 0.016	0.66 ± 0.006
Model 1C^g	2074/1023	XGBoost	0.966 ± 0.007	0.93 ± 0.015	0.816 ± 0.000	0.827 ± 0.005	0.820 ± 0.006	0.835 ± 0.005	0.65 ± 0.012
Model 1D^g	2074/1023	DNN	0.970 ± 0.002	0.94 ± 0.000	0.819 ± 0.002	0.833 ± 0.008	0.823 ± 0.012	0.843 ± 0.003	0.66 ± 0.015
ECFP4 (750 bits)
Model 2A^g	2074/1023	SVM	0.976 ± 0.003	0.95 ± 0.006	0.848 ± 0.004	0.856 ± 0.002	0.832 ± 0.007	0.881 ± 0.006	0.71 ± 0.006
Model 2B^g	2074/1023	RF	0.958 ± 0.007	0.85 ± 0.110	0.834 ± 0.003	0.852 ± 0.005	0.821 ± 0.014	0.883 ± 0.010	0.71 ± 0.012
Model 2C^g	2074/1023	XGBoost	0.979 ± 0.014	0.96 ± 0.032	0.840 ± 0.002	0.855 ± 0.002	0.838 ± 0.005	0.871 ± 0.001	0.71 ± 0.006
Model 2D^g	2074/1023	DNN	0.983 ± 0.001	0.97 ± 0.006	0.842 ± 0.006	0.862 ± 0.003	0.879 ± 0.012	0.845 ± 0.008	0.72 ± 0.006
TT (370 bits)
Model 3A^g	2074/1023	SVM	0.953 ± 0.005	0.91 ± 0.010	0.842 ± 0.004	0.852 ± 0.002	0.830 ± 0.015	0.875 ± 0.011	0.70 ± 0.006
Model 3B^g	2074/1023	RF	0.963 ± 0.006	0.92 ± 0.015	0.838 ± 0.005	0.851 ± 0.005	0.830 ± 0.020	0.872 ± 0.019	0.70 ± 0.012
Model 3C^g	2074/1023	XGBoost	0.955 ± 0.013	0.91 ± 0.030	0.835 ± 0.004	0.848 ± 0.006	0.835 ± 0.017	0.861 ± 0.007	0.69 ± 0.012
Model 3D^g	2074/1023	DNN	0.936 ± 0.005	0.87 ± 0.010	0.841 ± 0.006	0.860 ± 0.002	0.888 ± 0.010	0.832 ± 0.011	0.72 ± 0.000
^a The number of compounds in the training set and test set;
^b Accuracy;
^c Matthews correlation coefficient;
^d Average accuracy of five-fold cross-validation;
^e Sensitivity;
^f Specificity;
^g Mean value and standard deviation of three parallel modeling results

The performance of the models based on MACCS fingerprints is summarized in Table S6 of Supplementary Material 3. Models 1A_1-1A_3 (Model 1A_1, Model 1A_2, and Model 1A_3) were the three models built with SVM, Models 1B_1-1B_3 were built with RF, Models 1C_1-1C_3 with XGBoost, and Models 1D_1-1D_3 with DNN. The standard deviations of the models built with the four algorithms were all small, so the algorithms can be compared directly by prediction accuracy (Q) and MCC. The 12 models built with the four algorithms performed comparably, indicating that all four algorithms were suitable for these datasets. All MACCS-based models had test-set MCCs above 0.64, showing that the 12 models had good predictive ability. Model 1D_1, built with DNN, performed best, with a prediction accuracy of 84.16% and an MCC of 0.68 on the test set. The ROC curve of Model 1D_1 is shown in Fig. S1.

The performance of the models built on ECFP4 fingerprints is shown in Table S7 of Supplementary Material 3. All 12 models had test-set MCCs above 0.68, so these models outperformed the MACCS-based models. The 12 models built with the four algorithms again performed comparably, with the DNN models slightly better (average test-set accuracy of 0.862 and average MCC of 0.72). The ROC curves of Models 2D_1, 2D_2, and 2D_3 are shown in Fig. S2.

The performance of the models built on TT fingerprints, shown in Table S8 of Supplementary Material 3, was comparable to that of the ECFP4-based models, and both outperformed the MACCS-based models. The DNN models performed best among these 12 models, with an average test-set accuracy of 0.860 and an average MCC of 0.72. The ROC curves of Models 3D_1, 3D_2, and 3D_3 are shown in Fig. S3.

On our training and test sets, the DNN algorithm therefore appeared slightly more suitable for classifying the inhibitors than the other three algorithms. Within each of the 36 models, sensitivity (SE) and specificity (SP) were comparable, indicating that the models recognized highly and weakly active inhibitors about equally well, without obvious bias toward either class.

3.1.4 Prediction of the external test set

The external test set was used to validate the models' performance. The performance of each model on the external test set is given in Table S9 of Supplementary Material 3, and Table 4 shows the average performance of the 36 models on the external test set.

Table 4

Performance of 36 classification models on the external test set with 321 compounds
Model	Algorithm	Q^a	SE^b	SP^c	MCC^d
MACCS (128)
Model 1A^e	SVM	0.673 ± 0.078	0.612 ± 0.105	0.747 ± 0.057	0.36 ± 0.148
Model 1B^e	RF	0.610 ± 0.090	0.600 ± 0.151	0.621 ± 0.024	0.22 ± 0.167
Model 1C^e	XGBoost	0.575 ± 0.078	0.579 ± 0.128	0.525 ± 0.069	0.15 ± 0.151
Model 1D^e	DNN	0.549 ± 0.029	0.579 ± 0.097	0.525 ± 0.116	0.10 ± 0.045
ECFP4 (750)
Model 2A^e	SVM	0.801 ± 0.033	0.780 ± 0.085	0.825 ± 0.059	0.61 ± 0.059
Model 2B^e	RF	0.790 ± 0.039	0.803 ± 0.023	0.775 ± 0.064	0.58 ± 0.081
Model 2C^e	XGBoost	0.720 ± 0.070	0.678 ± 0.077	0.770 ± 0.063	0.44 ± 0.139
Model 2D^e	DNN	0.707 ± 0.057	0.779 ± 0.086	0.648 ± 0.065	0.43 ± 0.115
TT (370)
Model 3A^e	SVM	0.846 ± 0.066	0.805 ± 0.143	0.897 ± 0.030	0.71 ± 0.109
Model 3B^e	RF	0.828 ± 0.018	0.867 ± 0.012	0.779 ± 0.048	0.65 ± 0.036
Model 3C^e	XGBoost	0.846 ± 0.021	0.864 ± 0.006	0.825 ± 0.054	0.69 ± 0.042
Model 3D^e	DNN	0.807 ± 0.073	0.701 ± 0.098	0.894 ± 0.090	0.62 ± 0.157
^a Accuracy;
^b Sensitivity;
^c Specificity;
^d Matthews correlation coefficient;
^e Mean value and standard deviation of three parallel modeling results

The MACCS-based models again performed worst of the three fingerprint types. Although the TT-based models performed comparably to the ECFP4-based models on the test set, they were clearly better on the external test set.

Interestingly, the DNN models performed best on the test set but not on the external test set. This suggests that the DNN models fitted the compounds in the training sets more closely and generalized less well to the external compounds. The models with the best generalization ability were built with SVM, followed by those built with RF and XGBoost.

Finally, we chose four optimal models that performed well both on the test set and the external test set, as shown in Table 5. They were all built on TT fingerprints. The ROC curves of Model 3A_1, Model 3A_3, Model 3C_2, and Model 3D_3 are shown in Fig S4.

Table 5

Performance of the four optimal models based on the test set and external test set
Model	Algorithm	Training set/test set^a	Test Q^b (%)	Test SE^c (%)	Test SP^d (%)	Test MCC^e	External Q (%)	External SE (%)	External SP (%)	External MCC
Model 3A_1	SVM	Training set 1/test set 1	85.43	84.66	86.22	0.71	90.97	93.75	87.59	0.82
Model 3A_3	SVM	Training set 3/test set 3	85.14	81.94	88.39	0.70	85.05	82.39	88.28	0.70
Model 3C_2	XGBoost	Training set 2/test set 2	84.95	83.50	86.42	0.70	85.36	86.36	84.14	0.70
Model 3D_3	DNN	Training set 3/test set 3	85.83	89.76	81.94	0.72	86.92	81.40	91.50	0.74
^a The training/test sets constructed by randomly splitting the dataset with seeds 1, 10, and 111;
^b Accuracy;
^c Sensitivity;
^d Specificity;
^e Matthews correlation coefficient

3.1.5 Molecular fingerprints analysis

The RF algorithm assigns an importance score, here reported as information gain (IG), to each feature. To explore the structural information captured by the models, we calculated the IG values of the ECFP4 bits in our RF models (Fig. S5 of Supplementary Material 3). The top 16 ECFP4 keys (IG > 0.08) and the substructures they represent are shown in Table 6. We also compared the proportions of highly active and weakly active inhibitors containing each of these 16 keys. Fragments common in highly active inhibitors were 2-aminopyrimidine, 1-ethylpiperidine, 2,4-bis(methylamino)pyrimidine, amino-aromatic heterocycles, [(2E)-but-2-enyl]dimethylamine, but-2-enyl, and alkynyl.
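Table 6 ranks bits by RF importance. scikit-learn's `RandomForestClassifier` exposes impurity-based importances as `feature_importances_`; with `criterion="entropy"` these are built from the information gain of the splits on each feature, weighted by node size and normalized. The simpler, model-free quantities below (the information gain of a single split on one bit over the whole dataset, and the difference in the bit's frequency between highly and weakly active inhibitors) are a sketch for screening fragments in the same spirit; they are not the importances reported in Fig. S5, and the function names are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def information_gain(bit_column, labels):
    """IG = H(Y) - H(Y | bit) for a single binary fingerprint bit."""
    on = [y for b, y in zip(bit_column, labels) if b]
    off = [y for b, y in zip(bit_column, labels) if not b]
    n = len(labels)
    conditional = len(on) / n * entropy(on) + len(off) / n * entropy(off)
    return entropy(labels) - conditional


def proportion_difference(bit_column, labels):
    """Share of highly active compounds carrying the bit minus share of weakly active ones."""
    high = [b for b, y in zip(bit_column, labels) if y == 1]
    low = [b for b, y in zip(bit_column, labels) if y == 0]
    return sum(high) / len(high) - sum(low) / len(low)
```

A bit present only in highly active compounds has a proportion difference near +1, while one enriched in weakly active compounds is negative; the IG alone does not distinguish these two cases, which is why both numbers are useful.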

In addition, each estimator in the RF is a decision tree. By following the branches of a decision tree (the dendrogram in Fig. 4), we combined ECFP4 keys that occur together on a path into larger scaffolds correlated with biological activity. These structures are shown in Table 7.

In Subset_A, there were 384 inhibitors containing both the ECFP_459 key and ECFP_926 key, and 380 of them were highly active. The scaffolds of these inhibitors contained 5-ethynyl-2-(phenylamino)-4-(propylamino)pyrimidine moiety, N-methyl-4-(phenylamino)-2-(propylamino)benzamide moiety, and (2E)-4-(dimethylamino)-N-methylbut-2-enamide moiety. These structures also appeared in Table 6 as highly active fragments.

Subset_B contained 102 inhibitors with the ECFP_459 and ECFP_302 keys but without the ECFP_926 key; 93 of them were highly active. The common structure of these molecules was the 9-(4-methylcyclohexyl)-2-(5,6,7,8-tetrahydropyrido[4,3-b]pyridin-2-ylamino)pyrido[4',3':4,5]pyrrolo[2,3-d]pyrimidine moiety, as in the FLT3 inhibitor AMG-925.

Subset_C contained 76 inhibitors with both the ECFP_844 and ECFP_773 keys but without the ECFP_459 key; 71 of them were weakly active. Most of these molecules had a 1-{[4-(pyrimidin-4-yloxy)phenyl]amino}-N-[3-(trifluoromethyl)phenyl]methanamide moiety as their scaffold.
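The three subsets are defined by the presence and absence of particular ECFP4 keys along a decision-tree path, which can be expressed as set operations. The sketch assumes fingerprints stored as sets of the key identifiers used in Table 7 (ECFP_459 → 459, and so on); the helper names are hypothetical. Applied to the full dataset, it would return the quantities reported above, such as 384 compounds with 380 highly active for Subset_A.

```python
def select_subset(fingerprints, labels, required, excluded=()):
    """Count compounds carrying every key in `required` and none in `excluded`.

    fingerprints: list of sets of ECFP4 key identifiers (e.g. 459 for ECFP_459).
    labels: 0/1 activity labels (1 = highly active).
    Returns (number of matching compounds, number of them that are highly active).
    """
    required, excluded = set(required), set(excluded)
    hits = [y for fp, y in zip(fingerprints, labels)
            if required <= fp and not (excluded & fp)]
    return len(hits), sum(hits)


# Decision-path rules from the text: (keys that must be present, keys that must be absent)
RULES = {
    "Subset_A": ({459, 926}, set()),
    "Subset_B": ({459, 302}, {926}),
    "Subset_C": ({844, 773}, {459}),
}


def summarize_rules(fingerprints, labels, rules=RULES):
    """Apply every rule and return {subset name: (n_compounds, n_highly_active)}."""
    return {name: select_subset(fingerprints, labels, req, exc)
            for name, (req, exc) in rules.items()}
```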

The substructures in Subset_A, Subset_B, and Subset_C also correspond to fragments among the top 16 ECFP4 keys in Table 6, which supports their use in exploring the structure-activity relationship of FLT3 inhibitors.

3.2 Clustering and analysis

To summarize the scaffolds of the reported FLT3 inhibitors, we performed structure-based clustering. Because the fingerprints cannot adequately characterize macrocyclic structures, all macrocyclic compounds, defined as compounds with more than 10 heavy atoms in the largest ring, were set aside as a separate class (Subset 11 in Table 8). We then used t-distributed stochastic neighbor embedding (t-SNE) [57] to reduce the 1024-bit ECFP4 fingerprints of the remaining 3264 inhibitors to two dimensions, which served as the input for K-Means [58]. The inhibitors were divided into 10 subsets, which were clearly separated (Fig. 5). The main scaffolds of each subset, and typical ECFP4 fragments whose frequencies differ strongly between highly and weakly active inhibitors, are shown in Table 8.
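The clustering step applied K-Means to two-dimensional t-SNE coordinates. A minimal Lloyd's-algorithm sketch for that step follows. It assumes the t-SNE embedding has already been computed (for example with scikit-learn's `sklearn.manifold.TSNE`), uses a single random initialization, and is not the authors' code; practical implementations usually run several initializations and keep the solution with the lowest within-cluster sum of squares.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm for 2-D points such as t-SNE coordinates.

    Returns (labels, centroids). Centroids are initialised at k randomly
    chosen input points (a single initialisation).
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels, centroids
```

In the setting described above, `points` would be the 3264 two-dimensional t-SNE coordinates and `k` would be 10.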

The compounds in Subset 1 were oxindole or pyrrole-2,5-dione derivatives. The oxindole or pyrrole-2,5-dione moiety was attached at one end to a pyrrolyl or bicyclic aromatic heterocyclic moiety through a cyclopropane, amino, or methylene linker, and at the other end to an amide or urea moiety. Subset 1 also contained indene-carbazole or indole-carbazole derivatives, such as the marketed drug Midostaurin. Pyridinyl, pyrimidinyl, methylpiperazinyl, thienyl, and phenyl groups frequently appeared in the side chains of highly active inhibitors, whereas the 1,2,3-trimethoxybenzene moiety frequently appeared in the side chains of weakly active inhibitors.

The compounds in Subset 2 were mainly amide or urea derivatives. The amide or urea moiety was attached at one end to an aromatic heterocyclic moiety through an aliphatic heterocyclic ring and at the other end directly to an aromatic heterocyclic ring. Compounds containing an oxazolidin-2-one moiety, oxyethyl, or phenoxy in the side chain had a higher proportion of highly active inhibitors.

In Subset 3, a 4,5,6,7-tetrahydrobenzo[b]thiophene, pyrrolo[3,2-d]pyrimidine, or quinazoline moiety was attached to a nitrogen-containing aromatic heterocyclic ring through an oxygen atom. Among them, the 4,5,6,7-tetrahydrobenzo[b]thiophene derivatives were mostly weakly active. The 1,2-dimethoxybenzene and amide moieties appeared mostly in weakly active inhibitors, whereas the chlorobenzene and (trifluoromethyl)benzene moieties appeared mostly in highly active inhibitors.

Subset 4 contained 441 inhibitors, of which 439 were highly active and 2 weakly active. This indicates that compounds with this series of scaffolds are likely to be highly active FLT3 inhibitors, consistent with the conclusion drawn for Subset_A. We divided each compound in Subset 4 into three parts: left scaffold, linker, and right scaffold. The left scaffolds were mainly aminopyrimidine moieties linked to an aromatic heterocyclic ring. The linkers were mainly alkynyl or amide moieties. The right scaffolds were mostly long unsaturated aliphatic chains similar to the chain in FF-10101 [59]. The side chains were mainly 1-methyl-4λ²-piperazine, (2-fluorophenyl)oxy, and pyridinyl groups.

Subset 5 comprised 9H-pyrido[4',3':4,5]pyrrolo[2,3-d]pyrimidine, benzenesulfonamide, 4,7-dihydro-3H-pyrrolo[2,3-d]pyrimidine, and 6-(1H-pyrazol-4-yl)-2H-indazole derivatives. The tricyclic structure found in Subset_B also appeared in Subset 5. Chlorine-substituted compounds were mostly weakly active, whereas piperidinyl and pyridinyl frequently appeared in the side chains of highly active inhibitors.

Subset 6 contained amide derivatives, but with scaffolds different from those in Subset 2. The molecular scaffolds were mainly the N-(1H-pyrazol-5-yl)benzamide moiety and the 2-(formylamino)-4,5,6,7-tetrahydrocyclohexa[1,2-b]thiophene-3-carboxamide moiety. Compounds containing (trifluoromethyl)benzene, 2-(2-(λ¹-azanyl)ethyl)pyridine, 4-propylpiperidine, or N,4,5-trimethylisoxazol-3-amine moieties in the side chain were mostly highly active inhibitors.

The molecular scaffolds of Subset 7 were mainly the 2-[(1E)-2-phenylvinyl]quinazolin-4-amine moiety, the N-phenyl-1H-indazole-3-carboxamide moiety, and tricyclic heterocycles containing nitrogen or sulfur. The 1-methyl-1H-pyrazole, 2-(pyridin-4-yl)pyrazine, 4-ethylpiperidine, and 2-vinylphenol moieties, as well as oxygen-containing aliphatic rings, appeared mostly in highly active inhibitors.

The compounds in Subset 8 were mainly urea derivatives, with the urea moiety linked to a phenyl group at one end and to a phenyl or isoxazolyl group at the other. The weakly active scaffold of Subset_C also appeared in Subset 8. Trifluoromethyl, 1-ethyl-4λ²-piperazine, and 1-chloro-4-methylbenzene moieties frequently appeared in the side chains of highly active inhibitors, whereas amino-substituted pyrimidine derivatives in this subset were frequently weakly active.

The compounds in Subset 9 were mainly pyrrolo[2,3-b]pyridine derivatives, in which this bicyclic ring was attached to a pyridinyl or pyrimidinyl group through a carbonyl or methylene linker. Chlorine and the 1-methyl-1H-pyrazole moiety appeared in some highly active inhibitors. Pyrimidinyl and the 4-(2-(methyl-azanyl)ethyl)morpholine moiety also frequently appeared in highly active inhibitors, whereas pyridinyl frequently appeared in weakly active inhibitors.

Subset 10 mainly comprised fused heterocyclic compounds whose scaffold consisted of an amino group linked at both ends to pyrimidinyl, pyridinyl, phenyl, or other nitrogen-containing aromatic heterocycles. Compounds containing the 1-methyl-1H-pyrazole or trifluoroacetamide moiety in the side chain were mostly highly active, while butyl and acrylamide moieties frequently appeared in weakly active inhibitors.

Subset 11 contained all the macrocyclic compounds, each of which had more than 10 heavy atoms in its largest ring. This subset contained 154 inhibitors (64 highly active and 90 weakly active). Piperidinyl, piperazinyl, and 1-(2-methoxyethyl)pyrrolidine moieties appeared mostly in highly active inhibitors.
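The ring-size criterion for Subset 11 can be checked programmatically. The following is a minimal sketch, assuming RDKit (cited in the references) and its default ring perception; the authors' exact tool and ring definition are not stated in this section, and the function names `largest_ring_size` and `is_macrocycle` are hypothetical.

```python
from rdkit import Chem


def largest_ring_size(smiles: str) -> int:
    """Return the number of atoms in the largest ring RDKit perceives (0 if acyclic)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    # AtomRings() returns one tuple of atom indices per perceived ring.
    # Hydrogens are implicit after parsing, so every ring atom here is a heavy atom.
    ring_sizes = [len(ring) for ring in mol.GetRingInfo().AtomRings()]
    return max(ring_sizes, default=0)


def is_macrocycle(smiles: str, min_ring_atoms: int = 11) -> bool:
    # "More than 10 heavy atoms in the largest ring" means at least 11 ring atoms.
    return largest_ring_size(smiles) >= min_ring_atoms
```

For example, a 12-membered carbocycle such as `"C1" + "C" * 10 + "C1"` is flagged as macrocyclic, while benzene (`"c1ccccc1"`) is not. Ring perception for heavily fused or bridged systems can differ between toolkits, so results near the threshold may not match the paper's assignments exactly.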

In this work, we collected 3867 FLT3 inhibitors with IC₅₀ values. A total of 36 classification models were built by combining MACCS, ECFP4, and TT fingerprints with the SVM, RF, XGBoost, and DNN algorithms. The models based on ECFP4 and TT fingerprints performed better than those based on MACCS fingerprints, and the models based on TT fingerprints showed excellent prediction and generalization ability. The SVM, RF, XGBoost, and DNN models performed comparably on the test set, while the SVM models performed slightly better on the external test set. Models 3A_1, 3A_3, 3C_2, and 3D_3 performed well on both the test set and the external test set. Model 3D_3, built with DNN and TT fingerprints, performed best on the test set, with the highest accuracy (85.83%) and Matthews correlation coefficient (MCC, 0.72), and also performed well on the external test set.
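For reference, the accuracy and MCC reported above follow from the confusion-matrix counts of a binary (highly vs. weakly active) classifier: MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes both; the counts in the usage note are hypothetical and are not taken from the paper's confusion matrices.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal sum is zero (the denominator vanishes),
    a common convention for this undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

With hypothetical counts TP = TN = 45 and FP = FN = 5, accuracy is 0.9 and MCC is (2025 − 25) / 2500 = 0.8. Unlike accuracy, MCC stays informative when the highly and weakly active classes are imbalanced, which is why it is reported alongside accuracy.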

Then, we calculated the importance of the ECFP4 fingerprint features with the RF algorithm. The results showed that 2-aminopyrimidine, 1-ethylpiperidine, 2,4-bis(methylamino)pyrimidine, amino-aromatic heterocycle, [(2E)-but-2-enyl]dimethylamine, but-2-enyl, and alkynyl are typical fragments among highly active inhibitors.
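Ranking fingerprint bits by random-forest importance can be sketched as follows. This assumes scikit-learn's impurity-based `feature_importances_`; the paper's exact importance measure and RF hyperparameters are not given in this section, and `rank_bits_by_importance` is a hypothetical helper. Mapping top-ranked bits back to substructures such as 2-aminopyrimidine would additionally require the fingerprint's bit-to-atom-environment information, which is omitted here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_bits_by_importance(X, y, top_n=10, n_estimators=200, random_state=0):
    """Fit a random forest on a binary fingerprint matrix X (n_molecules x n_bits)
    with class labels y, and return (bit_index, importance) pairs sorted by
    decreasing impurity-based importance."""
    rf = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    rf.fit(X, y)
    importances = rf.feature_importances_
    # argsort is ascending, so reverse it to put the most important bits first.
    order = np.argsort(importances)[::-1][:top_n]
    return [(int(i), float(importances[i])) for i in order]
```

On synthetic data where the label simply copies bit 7 of a random 64-bit fingerprint, bit 7 should come out on top. Impurity-based importances can favour frequently set bits, so permutation importance is a reasonable cross-check on real fingerprints.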

By combining the dendrogram generated by the DT algorithm with K-means clustering, we found that the substructures in Subset_A (Subset 4), Subset_B, and Subset_C were significantly related to inhibitory activity. Two scaffolds are characteristic of highly active FLT3 inhibitors: a pyrimidinamine moiety linked to an aliphatic chain ending with (2E)-4-(dimethylamino)-N-methyl-N-[1-(methylamino)-1-oxoprop-2-yl]but-2-enamide, and 9-(4-methylcyclohexyl)-2-(5,6,7,8-tetrahydropyrido[4,3-b]pyridin-2-ylamino)pyrido[4',3':4,5]pyrrolo[2,3-d]pyrimidine. In contrast, 1-{[4-(pyrimidin-4-yloxy)phenyl]amino}-N-[3-(trifluoromethyl)phenyl]methanamide is likely to be a scaffold of weakly active FLT3 inhibitors.
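The K-means step can be illustrated with Lloyd's algorithm, which alternates between assigning each molecule to its nearest centroid and moving each centroid to the mean of its members. This plain-Python version is a sketch only: it assumes numeric feature vectors (e.g. fingerprint bits), squared Euclidean distance, and a naive first-k initialisation, none of which is specified in this section. Library implementations such as scikit-learn's `KMeans` use k-means++ initialisation and multiple restarts instead.

```python
from typing import List, Optional, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[Optional[List[int]], List[List[float]]]:
    """Cluster points into k groups; return (labels, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simplified initialisation: use the first k points as starting centroids.
    centroids = [[float(v) for v in p] for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)` places the first two points in one cluster and the last two in the other. Because the result depends on initialisation, a real analysis would compare several runs or use k-means++.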

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the ‘Chemical Grid Project’ of Beijing University of Chemical Technology. We thank Molecular Networks GmbH, Erlangen, Germany, for providing the program SONNIA.

Supplementary Materials

Supplementary Material	Description
Supplementary Material 1	Dataset of 3867 inhibitors and their biological activity values (IC₅₀) against wild-type FLT3
Supplementary Material 2	The fingerprints for the machine learning models
Supplementary Material 3	Supplementary tables and graphs

Ke YY, Singh VK, Coumar MS et al. (2015) Homology modeling of DFG-in FMS-like tyrosine kinase 3 (FLT3) and structure-based virtual screening for inhibitor identification. Scientific Reports 5:11702. https://doi.org/10.1038/srep11702
Döhner H, Weisdorf DJ, Bloomfield CD (2015) Acute myeloid leukemia. New England Journal of Medicine 373:1136-1152. https://doi.org/10.1056/NEJMra1406184
van der Geer P, Hunter T, Lindberg RA (1994) Receptor protein-tyrosine kinases and their signal transduction pathways. Annual Review of Cell Biology 10:251-337. https://doi.org/10.1146/annurev.cb.10.110194.001343
Maroc N, Rottapel R, Rosnet O et al. (1993) Biochemical characterization and analysis of the transforming potential of the FLT3/FLK2 receptor tyrosine kinase. Oncogene 8:909-918.
Gilliland DG, Griffin JD (2002) The roles of FLT3 in hematopoiesis and leukemia. Blood 100:1532-1542. https://doi.org/10.1182/blood-2002-02-0492
Smith CC, Wang Q, Chin CS et al. (2012) Validation of ITD mutations in FLT3 as a therapeutic target in human acute myeloid leukemia. Nature 485:260-263. https://doi.org/10.1038/nature11016
Fabbro D, Buchdunger E, Wood J et al. (1999) Inhibitors of Protein Kinases: CGP 41251, a Protein Kinase Inhibitor with Potential as an Anticancer Agent. Pharmacology & Therapeutics 82:293-301. https://doi.org/10.1016/S0163-7258(99)00005-4
Barry EV, Clark JJ, Cools J et al. (2007) Uniform sensitivity of FLT3 activation loop mutants to the tyrosine kinase inhibitor midostaurin. Blood 110:4476-4479. https://doi.org/10.1182/blood-2007-07-101238
Stone RM, Mandrekar SJ, Sanford BL et al. (2017) Midostaurin plus Chemotherapy for Acute Myeloid Leukemia with a FLT3 Mutation. The New England Journal of Medicine 377:454-464. https://doi.org/10.1056/NEJMoa1614359
Lee LY, Hernandez D, Rajkhowa T et al. (2017) Preclinical studies of gilteritinib, a next-generation FLT3 inhibitor. Blood 129:257-260. https://doi.org/10.1182/blood-2016-10-745133
Mori M, Kaneko N, Ueno Y et al. (2017) Gilteritinib, a FLT3/AXL inhibitor, shows antileukemic activity in mouse models of FLT3 mutated acute myeloid leukemia. Investigational New Drugs 35:556-565. https://doi.org/10.1007/s10637-017-0470-z
Perl AE, Martinelli G, Cortes JE et al. (2019) Gilteritinib or Chemotherapy for Relapsed or Refractory FLT3-Mutated AML. The New England Journal of Medicine 381:1728-1740. https://doi.org/10.1056/NEJMoa1902688
Zarrinkar PP, Gunawardane RN, Cramer MD et al. (2009) AC220 is a uniquely potent and selective inhibitor of FLT3 for the treatment of acute myeloid leukemia (AML). Blood 114:2984-2992. https://doi.org/10.1182/blood-2009-05-222034
Cortes JE, Khaled S, Martinelli G et al. (2019) Quizartinib versus salvage chemotherapy in relapsed or refractory FLT3-ITD acute myeloid leukaemia (QuANTUM-R): a multicentre, randomised, controlled, open-label, phase 3 trial. The Lancet Oncology 20:984-997. https://doi.org/10.1016/s1470-2045(19)30150-0
Ahn J-S, Kim H-J (2022) FLT3 mutations in acute myeloid leukemia: a review focusing on clinically applicable drugs. Blood Research 57:32-36. https://doi.org/10.5045/br.2022.2022017
Zhong Y, Qiu R-Z, Sun S-L et al. (2020) Small-Molecule Fms-like Tyrosine Kinase 3 Inhibitors: An Attractive and Efficient Method for the Treatment of Acute Myeloid Leukemia. Journal of Medicinal Chemistry 63:12403-12428. https://doi.org/10.1021/acs.jmedchem.0c00696
Zhao JC, Agarwal S, Ahmad H et al. (2022) A review of FLT3 inhibitors in acute myeloid leukemia. Blood Reviews 52:100905. https://doi.org/10.1016/j.blre.2021.100905
Tong L, Li X, Hu Y et al. (2020) Recent advances in FLT3 inhibitors for acute myeloid leukemia. Future Medicinal Chemistry 12:961-981. https://doi.org/10.4155/fmc-2019-0365
Solana-Altabella A, Ballesta-López O, Megías-Vericat JE et al. (2022) Emerging FLT3 inhibitors for the treatment of acute myeloid leukemia. Expert Opinion on Emerging Drugs 27:1-18. https://doi.org/10.1080/14728214.2021.2009800
O'Farrell AM, Abrams TJ, Yuen HA et al. (2003) SU11248 is a novel FLT3 tyrosine kinase inhibitor with potent activity in vitro and in vivo. Blood 101:3597-3605.
Yee KWH, Schittenhelm M, O'Farrell A-M et al. (2004) Synergistic effect of SU11248 with cytarabine or daunorubicin on FLT3 ITD–positive leukemic cells. Blood 104:4202-4209. https://doi.org/10.1182/blood-2003-10-3381
Fiedler W, Kayser S, Kebenko M et al. (2015) A phase I/II study of sunitinib and intensive chemotherapy in patients over 60 years of age with acute myeloid leukaemia and activating FLT3 mutations. British Journal of Haematology 169:694-700. https://doi.org/10.1111/bjh.13353
Brose MS, Nutting CM, Jarzab B et al. (2014) Sorafenib in radioactive iodine-refractory, locally advanced or metastatic differentiated thyroid cancer: a randomised, double-blind, phase 3 trial. Lancet 384:319-328. https://doi.org/10.1016/s0140-6736(14)60421-9
Ravandi F, Alattar ML, Grunwald MR et al. (2013) Phase 2 study of azacytidine plus sorafenib in patients with acute myeloid leukemia and FLT-3 internal tandem duplication mutation. Blood 121:4655-4662. https://doi.org/10.1182/blood-2013-01-480228
Wilhelm SM, Carter C, Tang L et al. (2004) BAY 43-9006 exhibits broad spectrum oral antitumor activity and targets the RAF/MEK/ERK pathway and receptor tyrosine kinases involved in tumor progression and angiogenesis. Cancer Research 64:7099-7109. https://doi.org/10.1158/0008-5472.Can-04-1443
Knapper S, Mills KI, Gilkes AF et al. (2006) The effects of lestaurtinib (CEP701) and PKC412 on primary AML blasts: the induction of cytotoxicity varies with dependence on FLT3 signaling in both FLT3-mutated and wild-type cases. Blood 108:3494-3503. https://doi.org/10.1182/blood-2006-04-015487
Levis M, Brown P, Smith BD et al. (2006) Plasma inhibitory activity (PIA): a pharmacodynamic assay reveals insights into the basis for cytotoxic response to FLT3 inhibitors. Blood 108:3477-3483. https://doi.org/10.1182/blood-2006-04-015743
Hexner EO, Mascarenhas J, Prchal J et al. (2015) Phase I dose escalation study of lestaurtinib in patients with myelofibrosis. Leukemia & Lymphoma 56:2543-2551. https://doi.org/10.3109/10428194.2014.1001986
Shabbir M, Stuart R (2010) Lestaurtinib, a multitargeted tyrosine kinase inhibitor: from bench to bedside. Expert Opinion on Investigational Drugs 19:427-436. https://doi.org/10.1517/13543781003598862
Zimmerman EI, Turner DC, Buaboonnam J et al. (2013) Crenolanib is active against models of drug-resistant FLT3-ITD-positive acute myeloid leukemia. Blood 122:3607-3615. https://doi.org/10.1182/blood-2013-07-513044
Galanis A, Ma H, Rajkhowa T et al. (2014) Crenolanib is a potent inhibitor of FLT3 with activity against resistance-conferring point mutants. Blood 123:94-100. https://doi.org/10.1182/blood-2013-10-529313
Larrosa-Garcia M, Baer MR (2017) FLT3 Inhibitors in Acute Myeloid Leukemia: Current Status and Future Directions. Molecular Cancer Therapeutics 16:991-1001. https://doi.org/10.1158/1535-7163.Mct-16-0876
Pratz KW, Levis M (2017) How I treat FLT3-mutated AML. Blood 129:565-571. https://doi.org/10.1182/blood-2016-09-693648
Yuan X, Chen Y, Zhang W et al. (2019) Identification of Pyrrolo[2,3-d]pyrimidine-Based Derivatives as Potent and Orally Effective Fms-like Tyrosine Receptor Kinase 3 (FLT3) Inhibitors for Treating Acute Myelogenous Leukemia. Journal of Medicinal Chemistry 62:4158-4173. https://doi.org/10.1021/acs.jmedchem.9b00223
Im D, Jun J, Baek J et al. (2022) Rational design and synthesis of 2-(1H-indazol-6-yl)-1H-benzo[d]imidazole derivatives as inhibitors targeting FMS-like tyrosine kinase 3 (FLT3) and its mutants. Journal of Enzyme Inhibition and Medicinal Chemistry 37:472-486. https://doi.org/10.1080/14756366.2021.2020772
Wang Z, Cai J, Cheng J et al. (2021) FLT3 Inhibitors in Acute Myeloid Leukemia: Challenges and Recent Developments in Overcoming Resistance. Journal of Medicinal Chemistry 64:2878-2900. https://doi.org/10.1021/acs.jmedchem.0c01851
Shih K-C, Lin C-Y, Chi H-C et al. (2012) Design of Novel FLT-3 Inhibitors Based on Dual-Layer 3D-QSAR Model and Fragment-Based Compounds in Silico. Journal of Chemical Information and Modeling 52:146-155. https://doi.org/10.1021/ci200434f
Kar RK, Suryadevara P, Roushan R et al. (2012) Quantifying the structural requirements for designing newer FLT3 inhibitors. Medicinal Chemistry 8:913-927. https://doi.org/10.2174/157340612802084153
Ghosh S, Keretsu S, Cho SJ (2021) Molecular Modeling Studies of N-phenylpyrimidine-4-amine Derivatives for Inhibiting FMS-like Tyrosine Kinase-3. International Journal of Molecular Sciences 22:12511. https://doi.org/10.3390/ijms222212511
ChEMBL. ChEMBL is part of the ELIXIR infrastructure. https://www.ebi.ac.uk/chembl/. Accessed 15 September 2022
Reaxys. https://www.reaxys.com. Accessed 15 September 2022
Rodríguez-Pérez R, Bajorath J (2019) Multitask Machine Learning for Classifying Highly and Weakly Potent Kinase Inhibitors. ACS Omega 4:4367-4375. https://doi.org/10.1021/acsomega.9b00298
Huo D, Wang H, Qin Z et al. (2021) Building 2D classification models and 3D CoMSIA models on small-molecule inhibitors of both wild-type and T790M/L858R double-mutant EGFR. Molecular Diversity https://doi.org/10.1007/s11030-021-10300-9
Durant J, Leland B, Henry D et al. (2002) Reoptimization of MDL Keys for Use in Drug Discovery. Journal of Chemical Information and Computer Sciences 42:1273-1280. https://doi.org/10.1021/ci010132r
Nilakantan R, Bauman N, Dixon JS et al. (1987) Topological torsion: a new molecular descriptor for SAR applications. Comparison with other descriptors. Journal of Chemical Information and Computer Sciences 27:82-85. https://doi.org/10.1021/ci00054a008
Rogers D, Hahn M (2010) Extended-Connectivity Fingerprints. Journal of Chemical Information and Modeling 50:742-754. https://doi.org/10.1021/ci100050t
GmbH TI. RDKit: Open-Source Cheminformatics Software. http://www.rdkit.org/. Accessed 1 March 2022
Cortes C, Vapnik V (1995) Support-Vector Networks. Machine Learning 20:273-297. https://doi.org/10.1023/A:1022627411411
Breiman L (2001) Random Forests. Machine Learning 45:5-32. https://doi.org/10.1023/A:1010933404324
Sheridan RP, Wang WM, Liaw A et al. (2016) Extreme Gradient Boosting as a Method for Quantitative Structure-Activity Relationships. Journal of Chemical Information and Modeling 56:2353-2360. https://doi.org/10.1021/acs.jcim.6b00591
Xu Y, Ma J, Liaw A et al. (2017) Demystifying Multitask Deep Neural Networks for Quantitative Structure–Activity Relationships. Journal of Chemical Information and Modeling 57:2490-2504. https://doi.org/10.1021/acs.jcim.7b00087
scikit-learn. scikit-learn: Machine Learning in Python. http://scikit-learn.org/stable/. Accessed 15 September 2022
Pytorch. https://pytorch.org. Accessed 15 September 2022
Mete M, Sakoglu U, Spence JS et al. (2016) Successful classification of cocaine dependence using brain imaging: a generalizable machine learning approach. BMC Bioinformatics 17:357. https://doi.org/10.1186/s12859-016-1218-z
Krstajic D, Buturovic LJ, Leahy DE et al. (2014) Cross-validation pitfalls when selecting and assessing regression and classification models. Journal of Cheminformatics 6:10. https://doi.org/10.1186/1758-2946-6-10
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on International Conference on Machine Learning 37:448–456.
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9:2579-2605.
Žalik KR (2008) An efficient k′-means clustering algorithm. Pattern Recognition Letters 29:1385-1391. https://doi.org/10.1016/j.patrec.2008.02.014
Yamaura T, Nakatani T, Uda K et al. (2018) A novel irreversible FLT3 inhibitor, FF-10101, shows excellent efficacy against AML cells with FLT3 mutations. Blood 131:426-438. https://doi.org/10.1182/blood-2017-05-786657

Tables 1, 6, 7, and 8 are available in the Supplementary Files section.

No competing interests reported.

GraphicalAbstract.png
Description: We built 36 classification models and clustered 3867 FLT3 inhibitors, and performed SAR analysis based on the models.
SupplementaryMaterial1database.csv
SupplementaryMaterial2fingerprints.csv
SupplementaryMaterial3.pdf
Table1.docx
Table6.docx
Table7.docx
Table8.docx

Status: Journal Publication

Version 1