Compound-target interactions can be accurately predicted from integrated features
Our first aim in this study was to construct a predictive model that accurately differentiates compound-target interactions with strong binding affinity from those with weak binding affinity. To represent compound-target interactions, we used a chemogenomics framework. In brief, an interaction is represented by simultaneously considering the structural content of the compound and of the protein. Each interaction sample (positive or negative) is thus characterized by a fixed-dimensional vector that combines the structural descriptors of the related compound and protein. Each descriptor can be regarded as a separate coordinate spanning a multidimensional space, and in this sense a compound-target interaction is a point in that space.
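As a minimal sketch of this representation, the snippet below builds one fixed-length vector per compound-target pair. It assumes that "combining" means simple concatenation of the compound and protein descriptors, which the text does not state explicitly; the function name `pair_vector` and the toy descriptor values are hypothetical.

```python
from typing import List, Sequence


def pair_vector(compound_features: Sequence[float],
                protein_features: Sequence[float]) -> List[float]:
    """Concatenate compound and protein descriptors into one vector.

    Assumption: the integrated representation is a plain concatenation.
    """
    return list(compound_features) + list(protein_features)


# Toy descriptors (hypothetical values; real descriptors are much longer).
compound = [1, 0, 1, 1]          # stand-in for a compound fingerprint
protein = [0.12, 0.40, 0.05]     # stand-in for a protein descriptor

x = pair_vector(compound, protein)
print(len(x))  # every pair built this way has the same dimensionality
```

Because every compound contributes the same number of compound descriptors and every protein the same number of protein descriptors, all interaction samples share one feature space regardless of which pair they describe.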
Firstly, the classification performance of compound-target interactions was evaluated. The statistics of the predictions on the stratified 10-fold CV are summarized in the “Integrated” rows of Table 1. The ROC curves are shown in Fig. 3.
Table 1
Statistical results of the models derived from different descriptors (integrated or separated groups) on the stratified 10-fold CV.
Group | Descriptors | Pair-split ACC | Pair-split SE | Pair-split SP | Pair-split AUC | Compound-split ACC | Compound-split SE | Compound-split SP | Compound-split AUC |
Integrated | ECFP4-ProA | 83.96 ± 0.12 | 85.74 ± 0.11 | 82.00 ± 0.17 | 92.67 | 83.58 ± 0.06 | 85.35 ± 0.09 | 81.65 ± 0.09 | 92.33 |
Integrated | ECFP4-ProB | 83.99 ± 0.16 | 85.87 ± 0.12 | 81.91 ± 0.22 | 92.68 | 83.55 ± 0.05 | 85.41 ± 0.08 | 81.51 ± 0.06 | 92.30 |
Integrated | Mol2D-ProA | 82.11 ± 0.09 | 84.68 ± 0.10 | 79.29 ± 0.11 | 90.86 | 81.47 ± 0.05 | 83.95 ± 0.08 | 78.69 ± 0.03 | 90.24 |
Integrated | Mol2D-ProB | 82.17 ± 0.13 | 84.85 ± 0.10 | 79.23 ± 0.18 | 90.93 | 81.59 ± 0.07 | 84.17 ± 0.07 | 78.75 ± 0.11 | 90.21 |
Integrated | MACCS-ProA | 82.89 ± 0.25 | 85.00 ± 0.21 | 80.53 ± 0.34 | 92.04 | 82.24 ± 0.05 | 84.23 ± 0.08 | 80.04 ± 0.08 | 91.53 |
Integrated | MACCS-ProB | 82.83 ± 0.27 | 85.02 ± 0.20 | 80.40 ± 0.33 | 92.02 | 82.09 ± 0.07 | 84.16 ± 0.11 | 79.81 ± 0.09 | 91.30 |
Separated | ECFP4 | 74.99 ± 0.03 | 76.21 ± 0.04 | 73.66 ± 0.09 | 85.08 | 75.28 ± 0.09 | 77.53 ± 0.12 | 72.76 ± 0.18 | 84.65 |
Separated | Mol2D | 74.09 ± 0.06 | 76.73 ± 0.07 | 71.17 ± 0.07 | 82.88 | 73.61 ± 0.07 | 76.99 ± 0.13 | 69.85 ± 0.16 | 83.30 |
Separated | MACCS | 72.83 ± 0.07 | 75.30 ± 0.09 | 70.12 ± 0.08 | 83.18 | 72.94 ± 0.08 | 76.45 ± 0.08 | 69.03 ± 0.18 | 82.33 |
Separated | ProA | 66.20 ± 0.01 | 72.51 ± 0.09 | 59.22 ± 0.13 | 72.57 | 66.21 ± 0.03 | 72.41 ± 0.20 | 59.34 ± 0.22 | 72.53 |
Separated | ProB | 66.21 ± 0.03 | 72.53 ± 0.08 | 59.20 ± 0.13 | 72.57 | 66.21 ± 0.03 | 72.44 ± 0.12 | 59.34 ± 0.15 | 72.53 |
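For readers less familiar with the column headings, the following sketch computes the three threshold-based statistics from confusion-matrix counts. It assumes that ACC, SE and SP denote accuracy, sensitivity and specificity in their usual sense; the counts are invented for illustration and are not taken from this study.

```python
from typing import Tuple


def classification_stats(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)  # fraction of all pairs classified correctly
    se = tp / (tp + fn)                    # fraction of strong binders recovered
    sp = tn / (tn + fp)                    # fraction of weak binders rejected
    return acc, se, sp


# Hypothetical counts for 100 positive and 100 negative pairs.
acc, se, sp = classification_stats(tp=85, fn=15, tn=80, fp=20)
print(f"ACC={acc:.3f} SE={se:.3f} SP={sp:.3f}")
```

AUC, in contrast, is computed from the ranking of scores rather than from a single cutoff, which may explain why it is reported without a standard deviation of the same kind in the table.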
As shown in Table 1, both the pair-split and compound-split models performed well, with every integrated model reaching an average ACC above 81% and an average AUC above 90%, and the low standard deviations obtained from the 50 repetitions show that the predictive performance is robust. These results indicate that our models, built with the six integrated descriptor groups and the XGBoost algorithm, can effectively distinguish compound-target interactions with strong binding affinity from those with weak binding affinity. As expected, the compound-split validation performed slightly worse than the pair-split validation (e.g., Mol2D-ProA model ACC: 81.47 vs. 82.11), since the two strategies simulate different situations that real predictions may encounter. The compound-split validation corresponds to predicting targets for entirely ‘new’ compounds, whereas the pair-split validation additionally includes ‘known’ compounds whose other compound-target interactions in the training set may provide clues for similar targets that also bind these compounds. In Table 1, the integrated descriptor groups rank ECFP4 > MACCS > Mol2D on both ACC and AUC, while the ProA and ProB variants of each group perform almost identically. The models using the ECFP4-based descriptors yielded the best performance (e.g., ECFP4-ProA: ACC = 83.96% and AUC = 92.67% for the pair-split validation).
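The difference between the two validation strategies can be illustrated with a small sketch. This is not the authors' splitting code: the function names are hypothetical, the fold assignment is a simple shuffled round-robin, and the stratification by class label used in the study is omitted for brevity.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (compound id, target id)


def pair_split(pairs: List[Pair], n_folds: int, seed: int = 0) -> List[int]:
    """Assign every pair to a fold independently of its compound."""
    rng = random.Random(seed)
    order = list(range(len(pairs)))
    rng.shuffle(order)
    folds = [0] * len(pairs)
    for position, index in enumerate(order):
        folds[index] = position % n_folds
    return folds


def compound_split(pairs: List[Pair], n_folds: int, seed: int = 0) -> List[int]:
    """Assign folds per compound, so all pairs of a compound share one fold."""
    rng = random.Random(seed)
    compounds = sorted({compound for compound, _ in pairs})
    rng.shuffle(compounds)
    fold_of = {compound: i % n_folds for i, compound in enumerate(compounds)}
    return [fold_of[compound] for compound, _ in pairs]


pairs = [("c1", "t1"), ("c1", "t2"), ("c2", "t1"),
         ("c3", "t3"), ("c3", "t1"), ("c4", "t2")]
folds_pair = pair_split(pairs, n_folds=2)
folds_compound = compound_split(pairs, n_folds=2)
print(folds_pair, folds_compound)
```

Under `compound_split`, a held-out compound never appears in the training folds, which mimics prediction for a new compound; under `pair_split`, other interactions of the same compound may remain in the training data.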
The chemogenomic approach, which aims to integrate the chemical space with the genomic space, proved helpful for representing compound-target associations. A distinctive feature of our approach is that information from compounds and targets is integrated to represent these associations. We assume that compound-target interactions are determined by the structural features of compounds and targets, which together constitute a pharmacological space. To test this assumption, we rebuilt our models using structural information from a single space only (i.e., the chemical space or the genomic space); that is, the models were constructed using only the compound features or only the protein features. The statistics of these models on the stratified 10-fold CV are summarized in the “Separated” rows of Table 1, and their ROC curves are shown in Fig. 3.
As can be seen from Table 1, the models built on compound features alone or protein features alone gave noticeably inferior predictions (ACC of 66.20–75.28% versus 81.47–83.99% for the integrated models). This comparison indicates that structural information from compounds and targets contributes cooperatively to the discrimination of compound-target associations. Somewhat surprisingly, the comparison also shows that the compound features are more predictive than the target-protein features (ACC of about 73–75% versus about 66%).
The ensemble model performs better than individual models
Because the different descriptor groups have different strengths in compound-target interaction prediction, we attempted to improve prediction performance by combining them. We built three types of ensemble models: averaging the predictions of the six individual models (Average)[69], taking the maximum value given by the six individual models (Maximum), and obtaining new scores using the stacked models reported by Nicholas (Stacked)[70]. The performance statistics are summarized in Table 2 (where the Average ensemble is listed as “Mean”) and the ROC curves are shown in Fig. 4. The Average ensemble yielded better predictive ability than any individual model: relative to the six integrated models in Table 1, ACC improved by about 0.8–2.9 percentage points and AUC by about 0.1–2.2 percentage points. It appears to capture the relationship between compound-target interaction patterns and the interaction endpoint more efficiently than any individual model. Therefore, the Average ensemble was used as the final model and applied in the subsequent analysis.
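The Average and Maximum ensembles can be sketched as simple element-wise reductions over the six individual models' scores. The function name and the toy scores below are hypothetical, and the Stacked variant is not shown because its details are not given in this section.

```python
from statistics import mean
from typing import List


def ensemble_scores(per_model_scores: List[List[float]], how: str = "average") -> List[float]:
    """Combine scores from several models, one combined score per pair.

    per_model_scores[m][i] is the score of model m for interaction pair i.
    """
    per_pair = list(zip(*per_model_scores))  # regroup scores by pair
    if how == "average":
        return [mean(scores) for scores in per_pair]
    if how == "maximum":
        return [max(scores) for scores in per_pair]
    raise ValueError(f"unknown ensemble type: {how}")


# Hypothetical scores from six models for three compound-target pairs.
scores = [
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.4],
    [0.7, 0.3, 0.5],
    [0.9, 0.2, 0.7],
    [0.6, 0.1, 0.3],
    [0.9, 0.3, 0.5],
]
average = ensemble_scores(scores, "average")
maximum = ensemble_scores(scores, "maximum")
print(average, maximum)
```

Taking the maximum can only raise a pair's score relative to the average, so under any fixed decision cutoff more pairs are called positive; this is one plausible reading of the higher SE and lower SP of the Maximum ensemble in Table 2.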
Table 2
Prediction results of different ensemble models on the stratified 10-fold CV.
Methods | Pair-split ACC | Pair-split SE | Pair-split SP | Pair-split AUC | Compound-split ACC | Compound-split SE | Compound-split SP | Compound-split AUC |
Mean | 84.83 ± 0.16 | 86.96 ± 0.10 | 82.44 ± 0.24 | 92.84 | 84.41 ± 0.03 | 86.5 ± 0.04 | 82.13 ± 0.04 | 92.41 |
Maximum | 80.73 ± 0.19 | 94.53 ± 0.08 | 65.39 ± 0.32 | 92.50 | 79.97 ± 0.05 | 94.63 ± 0.05 | 63.71 ± 0.08 | 92.01 |
Stacked | 83.80 ± 0.23 | 85.07 ± 0.20 | 82.37 ± 0.30 | 91.53 | 82.93 ± 0.08 | 84.33 ± 0.08 | 81.40 ± 0.11 | 91.50 |
Evaluation of the target prediction performance of the ensemble model
Having established good classification performance for compound-target interactions, we next evaluated the target prediction performance of the ensemble model, which is the focus of this study: we sought to verify whether our method can be extended to target prediction. For each query compound, the ensemble model outputs a vector of 859 compound-target interaction scores, and the targets with the highest scores are taken as the predicted targets. Target prediction performance was therefore assessed using the recall rate, namely the fraction of known targets identified in the top k of the prediction list. Recall necessarily improves as the number of selected targets increases. However, if the number of selected targets is high, more targets must be tested experimentally and the model becomes less efficient to apply. Conversely, if the number of selected targets is low, many truly active targets may be missed. For practicality, approximately the top 1–10 of the 859 targets are proposed as candidate targets.
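The recall rate described above can be sketched as follows. The data structures and identifiers are hypothetical, and the sketch assumes that known targets are pooled over all compounds before taking the fraction.

```python
from typing import Dict, Set


def recall_at_k(scores_by_compound: Dict[str, Dict[str, float]],
                known_targets: Dict[str, Set[str]],
                k: int) -> float:
    """Fraction of known compound-target pairs whose target is ranked in the top k."""
    hits = 0
    total = 0
    for compound, scores in scores_by_compound.items():
        # Rank targets by descending interaction score and keep the first k.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        known = known_targets[compound]
        hits += len(known & set(top_k))
        total += len(known)
    return hits / total


# Toy example with 4 targets instead of 859 (hypothetical scores).
scores = {
    "cpd1": {"T1": 0.9, "T2": 0.4, "T3": 0.7, "T4": 0.1},
    "cpd2": {"T1": 0.2, "T2": 0.8, "T3": 0.3, "T4": 0.6},
}
known = {"cpd1": {"T1", "T2"}, "cpd2": {"T4"}}

for k in (1, 2, 3):
    print(k, recall_at_k(scores, known, k))
```

In this toy case the recall rises from one of three known pairs at k = 1 to all three at k = 3, which illustrates the trade-off between recall and the number of targets that would need experimental testing.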
The results on the stratified 10-fold CV are shown in Table 3. The average top-1 and top-10 recall rates for the pair-split validation datasets were 28.54% and 59.50%, respectively, meaning that 28.54% and 59.50% of known targets were ranked within the top 1 and top 10 of the list by our model. Given that predictions were made among 859 possible human targets, these recall rates correspond to approximately 245-fold (28.54%/(1/859)) and 51-fold (59.50%/(10/859)) enrichment over random picking, respectively.[30] For the compound-split validation datasets, the average top-1 and top-10 recall rates were 26.78% and 57.96%, respectively, corresponding to approximately 230-fold (26.78%/(1/859)) and 50-fold (57.96%/(10/859)) enrichment.[30] In addition, the correctly predicted targets were evenly distributed across target classes, indicating that prediction performance is not biased toward particular classes. These results indicate that our ensemble model based on the chemogenomic approach can push true targets toward the top of the ranking list and thereby help narrow down the potential targets to be tested.
Table 3
Recall rates of the ensemble model measured on the stratified 10-fold CV datasets.
Validation | Top1 | Top3 | Top5 | Top7 | Top9 | Top10 |
Pair-split | 28.54 ± 1.22 | 43.92 ± 0.56 | 50.63 ± 0.46 | 55.00 ± 0.35 | 58.18 ± 0.28 | 59.50 ± 0.26 |
Compound-split | 26.78 ± 0.12 | 42.80 ± 0.23 | 49.42 ± 0.22 | 53.59 ± 0.17 | 56.69 ± 0.17 | 57.96 ± 0.14 |
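The fold-enrichment figures quoted in the text follow directly from the recall rates in Table 3. The sketch below reproduces the arithmetic, taking the expected recall of random picking to be k/859; the function name is hypothetical.

```python
def enrichment_over_random(recall: float, k: int, n_targets: int = 859) -> float:
    """Observed top-k recall divided by the recall expected from random picking (k / n_targets)."""
    return recall / (k / n_targets)


# Recall rates from Table 3, expressed as fractions.
table3 = {
    ("pair-split", 1): 0.2854,
    ("pair-split", 10): 0.5950,
    ("compound-split", 1): 0.2678,
    ("compound-split", 10): 0.5796,
}

for (split, k), recall in table3.items():
    fold = enrichment_over_random(recall, k)
    print(f"{split:15s} top-{k:<2d} enrichment: {fold:.0f}-fold")
```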
To further verify whether the ensemble model had better target prediction performance than the individual models based on the various integrated descriptor groups, the prediction abilities of the individual models were recorded and compared with that of the ensemble model. For the compound-split validation, the average top-k recall rates of the different models on the stratified 10-fold CV datasets are plotted in Fig. 5. As shown in Fig. 5, the individual models performed much worse than the ensemble model. The top-1 recall of each individual model was below 20%, and in some cases below 10%, whereas that of the ensemble model was 26.78%. The top-10 recall of each individual model was below 40%, whereas that of the ensemble model was 57.96%. The recall values of the models in decreasing order were: Ensemble (Average) >> ECFP4-ProA > ECFP4-ProB > Mol2D-ProA > Mol2D-ProB > MACCS-ProB > MACCS-ProA, which further illustrates the robustness and predictive power of the ensemble model based on the chemogenomic approach for target prediction.
Target prediction performance for external test sets
To validate the generalization ability of our ensemble model on external test data, we collected nonduplicated compound-target interactions with Ki less than 100 nM from the PDSP Ki database (Psychoactive Drug Screening Program Ki Database)[71] and the NPASS database (Natural Product Activity & Species Source Database)[72]. After compound filtering and preprocessing, we finally obtained 442 compounds with 778 compound-target interactions from the PDSP Ki database and 122 compounds with 181 compound-target interactions from the NPASS database. The two test datasets include 94 and 113 proteins, respectively.
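The selection step can be sketched as a filter on Ki followed by de-duplication of compound-target pairs. The record layout, function name and example values are hypothetical, and how the authors treated repeated measurements with conflicting Ki values is not stated; this sketch keeps a pair if any of its measurements is below the cutoff.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, float]  # (compound id, target id, Ki in nM)


def select_strong_binders(records: Iterable[Record],
                          ki_cutoff_nm: float = 100.0) -> List[Tuple[str, str]]:
    """Keep unique compound-target pairs with at least one Ki below the cutoff."""
    seen = set()
    kept = []
    for compound, target, ki_nm in records:
        pair = (compound, target)
        if ki_nm < ki_cutoff_nm and pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


# Hypothetical records, not taken from PDSP Ki or NPASS.
records = [
    ("cpdA", "T1", 12.0),
    ("cpdA", "T1", 30.0),    # duplicate pair, dropped
    ("cpdA", "T2", 250.0),   # Ki above the cutoff, dropped
    ("cpdB", "T3", 99.9),
    ("cpdB", "T4", 100.0),   # not strictly below 100 nM, dropped
]
interactions = select_strong_binders(records)
print(interactions)
```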
The target prediction results are shown in Table 4. For the compounds from the PDSP Ki database, 147 of the 778 known compound-target interactions had the target ranked at the top of the predicted target list, a recall rate of 18.89%. For the NPASS dataset, the top-1 recall rate was 8.84%, i.e., 16 of 181 interactions were recovered at top-1. The performance gap between the two datasets might be explained by the fact that the model was trained on datasets mostly composed of synthetic compounds and therefore learned less about natural products. Nevertheless, for both the PDSP Ki and NPASS datasets, more than 45% of targets were enriched in the top 10 of the predicted ranking list (recall rates of 53.34% and 45.30%, respectively). Although performance on these external datasets was lower than on the stratified 10-fold CV, it highlights the capability of our model to enrich active targets for different sets of compounds, even natural products.
Table 4
The target prediction results of the external test sets.
Top k threshold | PDSP Ki (778) Count | PDSP Ki (778) Recall (%) | NPASS (181) Count | NPASS (181) Recall (%) |
Top1 | 147 | 18.89 | 16 | 8.84 |
Top3 | 257 | 33.03 | 49 | 27.07 |
Top5 | 322 | 41.39 | 68 | 37.57 |
Top7 | 363 | 46.66 | 72 | 39.78 |
Top10 | 415 | 53.34 | 82 | 45.30 |
Comparison with alternative approaches
Our model was compared with several state-of-the-art target prediction tools, including SwissTargetPrediction (the updated 2019 version)[30], HitPickV2[73], PPB2[74], PPB[75] and TargetNet[36]. The comparison dataset was the validation data from SwissTargetPrediction, containing 500 ligands annotated as direct binders with high activity (Ki, KD, IC50 or EC50 < 1 nM) and associated with 1,061 ligand-target interactions. To avoid potential bias, ligands from this dataset that were present in our training data were first removed and the model was rebuilt. The recall rate defined in this study was used to compare our ensemble model with the four web tools (HitPickV2, PPB2, PPB and TargetNet), whereas the metric reported for SwissTargetPrediction, i.e., the fraction of compounds for which at least one known target was identified in the top-1 or top-15 of the prediction list, was used to compare our model with SwissTargetPrediction. The comparison results are listed in Table 5.
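The compound-level metric used for the SwissTargetPrediction comparison can be sketched as follows; the function name, identifiers and ranked lists are hypothetical.

```python
from typing import Dict, List, Set


def hit_fraction_at_k(ranked_by_compound: Dict[str, List[str]],
                      known_targets: Dict[str, Set[str]],
                      k: int) -> float:
    """Fraction of compounds with at least one known target among their top-k predictions."""
    n_hit = sum(
        1
        for compound, ranked in ranked_by_compound.items()
        if known_targets[compound] & set(ranked[:k])
    )
    return n_hit / len(ranked_by_compound)


# Toy ranked prediction lists (best first) and known targets.
ranked = {
    "cpd1": ["T1", "T3", "T2", "T4"],
    "cpd2": ["T2", "T4", "T3", "T1"],
}
known = {"cpd1": {"T1", "T2"}, "cpd2": {"T4"}}

print(hit_fraction_at_k(ranked, known, 1))
print(hit_fraction_at_k(ranked, known, 2))
```

Because a compound counts as a hit as soon as any one of its known targets is retrieved, this metric is generally higher than the interaction-level recall rate for the same predictions, so the two sets of values in Table 5 should not be compared with each other directly.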
The comparison with HitPickV2, PPB2, PPB and TargetNet showed that our ensemble model achieved the highest top-1 recall rate of all methods, outperforming the popular HitPickV2 (Recall: 26.96% vs. 24.69%) and the PPB2 method NN(ECfp4) + NB(ECfp4) (Recall: 26.96% vs. 14.89%). For the top-10 predictions, our model performed better than all other models except PPB2 NN(ECfp4) + NB(ECfp4) (Recall: 63.99% vs. 64.75%). These results are very encouraging, especially since it is not clear whether the tested interaction pairs were used in the construction of the other models.
Table 5
Comparison results with alternative state-of-the-art prediction methods.
TopK | HitPickV2 (a) | PPB2 NB(ECfp4) (a) | PPB2 NN(ECfp4)+NB(ECfp4) (a) | PPB2 NN(ECfp4) (a) | PPB2 NN(MQN)+NB(ECfp4) (a) | PPB2 NN(MQN) (a) | PPB2 NN(Xfp)+NB(ECfp4) (a) | PPB2 NN(Xfp) (a) | PPB (a) | TargetNet (a) | Our model (a) | Our model (b) | SwissTargetPrediction (b) |
1 | 24.69 | 16.49 | 14.89 | 16.59 | 21.87 | 10.65 | 21.49 | 16.49 | 5.18 | 23.20 | 26.96 | 57.00 | 28.00 |
3 | 56.74 | 35.06 | 52.31 | 52.88 | 52.40 | 22.43 | 52.40 | 30.91 | 18.85 | 41.85 | 56.36 | - | - |
5 | 58.43 | 47.03 | 60.92 | 57.96 | 57.21 | 26.96 | 60.30 | 35.34 | 25.82 | 46.37 | 59.33 | - | - |
7 | 60.82 | 53.35 | 62.76 | 61.29 | 60.04 | 30.16 | 61.30 | 39.21 | 29.78 | 48.91 | 60.89 | - | - |
10 | 62.20 | 60.98 | 64.75 | 63.62 | 63.05 | 34.68 | 62.58 | 45.62 | 34.40 | 50.99 | 63.99 | - | - |
15 | - | - | - | - | - | - | - | - | - | - | - | 76.00 | 72.00 |
(a) Recall rate defined in our article (%); (b) the fraction of compounds for which at least one known target was identified in the top-1 or top-15 of the prediction lists (%).
The comparison with SwissTargetPrediction showed that for 360 molecules (72%), at least one experimentally known target was found among the top-15 predictions of SwissTargetPrediction, whereas for 379 molecules (76%), at least one experimentally known target was found among the top-15 predictions of our method. More importantly, our model detected at least one known target at the top-1 prediction for 57% of ligands, compared with 28% for SwissTargetPrediction. These results support our ensemble model as a powerful target prediction engine for enriching active targets that bind strongly to query compounds. It is expected to help narrow down the set of potential targets to be tested experimentally and to be of interest to the wider scientific community.