The predictive capability of the developed QSAR models was evaluated using the following statistical parameters: r2 – correlation coefficient, q2 – cross-validated correlation coefficient, CCC – concordance correlation coefficient, IIC – index of ideality of correlation, s – standard error of estimation, MAE – mean absolute error, and F – Fisher ratio. Their numerical values, presented in Table 1, were used to assess the quality of the conformation-independent QSAR models obtained with the Monte Carlo optimization method. The abbreviation Av denotes the average value of each statistical parameter over three independent Monte Carlo optimization runs. The calculated values show that the Monte Carlo optimization method yielded QSAR models with good reproducibility and high predictive potential. The metrics also indicate that the best QSAR model for antiMES activity was obtained for the third split, with T = 4 and Nepoch = 11. According to the applied applicability domain (AD) methodology, all molecules were within the calculated AD, and no outliers were detected. Figure 1 shows the best developed QSAR model (the highest obtained r2 value) from the best Monte Carlo optimization run for each of the three splits, together with the differences between the experimental and calculated pED50 values for the molecules in both the training and the test sets.
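Several of the Table 1 statistics can be computed directly from paired observed and calculated pED50 values. The sketch below assumes textbook definitions for a model with a single descriptor (r2 as the squared Pearson coefficient, s = √(SSres/(n − 2)), F = r2(n − 2)/(1 − r2)); the function name `fit_statistics` and the toy data are hypothetical, this is not the authors' implementation, and q2 is omitted because it requires refitting during cross-validation.

```python
import math


def fit_statistics(observed, calculated):
    """r2, s, MAE and F for a one-descriptor model (textbook definitions, assumed)."""
    n = len(observed)
    if n != len(calculated) or n < 3:
        raise ValueError("need at least three paired values")
    mean_obs = sum(observed) / n
    mean_calc = sum(calculated) / n
    s_xy = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(observed, calculated))
    s_xx = sum((o - mean_obs) ** 2 for o in observed)
    s_yy = sum((c - mean_calc) ** 2 for c in calculated)
    r2 = s_xy ** 2 / (s_xx * s_yy)  # squared Pearson correlation coefficient
    residuals = [o - c for o, c in zip(observed, calculated)]
    s = math.sqrt(sum(d ** 2 for d in residuals) / (n - 2))  # standard error of estimation
    mae = sum(abs(d) for d in residuals) / n  # mean absolute error
    f = math.inf if r2 >= 1 else r2 * (n - 2) / (1 - r2)  # Fisher ratio, one descriptor
    return {"r2": r2, "s": s, "MAE": mae, "F": f}


# Hypothetical observed/calculated pED50 values, not taken from Table 1
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
calc = [1.1, 1.9, 3.2, 3.8, 5.0]
stats = fit_statistics(obs, calc)
print({name: round(value, 4) for name, value in stats.items()})
```

The same function can be applied separately to the training and test sets, which is how Table 1 reports each parameter twice.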
Table 1
The statistical quality of the developed QSAR models for antiMES activity
| Split | Run | r2 (training) | q2 (training) | CCC (training) | IIC (training) | s (training) | MAE (training) | F (training) | r2 (test) | q2 (test) | CCC (test) | IIC (test) | s (test) | MAE (test) | F (test) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Split 1 | 1 run | 0.9439 | 0.9712 | 0.9310 | 0.9377 | 0.156 | 0.125 | 758 | 0.9196 | 0.9563 | 0.9589 | 0.8945 | 0.198 | 0.143 | 137 |
| | 2 run | 0.9341 | 0.9659 | 0.9262 | 0.9271 | 0.169 | 0.137 | 638 | 0.9199 | 0.9535 | 0.9591 | 0.8916 | 0.200 | 0.145 | 138 |
| | 3 run | 0.9378 | 0.9679 | 0.9280 | 0.9308 | 0.164 | 0.129 | 679 | 0.9190 | 0.9411 | 0.9586 | 0.8984 | 0.215 | 0.163 | 136 |
| | Av | 0.9386 | 0.9683 | 0.9284 | 0.9319 | 0.163 | 0.130 | 692 | 0.9195 | 0.9503 | 0.9589 | 0.8948 | 0.204 | 0.150 | 137 |
| Split 2 | 1 run | 0.9183 | 0.9574 | 0.9583 | 0.9082 | 0.189 | 0.157 | 494 | 0.9337 | 0.9656 | 0.9654 | 0.9148 | 0.185 | 0.123 | 183 |
| | 2 run | 0.9357 | 0.9668 | 0.8867 | 0.9282 | 0.168 | 0.141 | 640 | 0.9101 | 0.9537 | 0.9534 | 0.8744 | 0.214 | 0.150 | 132 |
| | 3 run | 0.9303 | 0.9639 | 0.9102 | 0.9227 | 0.175 | 0.145 | 587 | 0.9114 | 0.9544 | 0.9545 | 0.8724 | 0.212 | 0.144 | 134 |
| | Av | 0.9281 | 0.9627 | 0.9184 | 0.9197 | 0.177 | 0.148 | 574 | 0.9184 | 0.9579 | 0.9578 | 0.8872 | 0.204 | 0.139 | 150 |
| Split 3 | 1 run | 0.9663 | 0.9829 | 0.9420 | 0.9636 | 0.125 | 0.102 | 1292 | 0.9345 | 0.9304 | 0.9666 | 0.9101 | 0.276 | 0.214 | 171 |
| | 2 run | 0.9767 | 0.9882 | 0.9471 | 0.9750 | 0.104 | 0.083 | 1885 | 0.9280 | 0.9254 | 0.9633 | 0.9034 | 0.283 | 0.249 | 155 |
| | 3 run | 0.9747 | 0.9872 | 0.7974 | 0.9726 | 0.109 | 0.085 | 1736 | 0.9299 | 0.9309 | 0.9643 | 0.9068 | 0.270 | 0.234 | 159 |
| | Av | 0.9726 | 0.9861 | 0.8955 | 0.9704 | 0.113 | 0.090 | 1638 | 0.9308 | 0.9289 | 0.9647 | 0.9068 | 0.276 | 0.232 | 162 |
r2 – Correlation coefficient
q2 – Cross-validated correlation coefficient
CCC – Concordance correlation coefficient
IIC – Index of ideality of correlation
s – Standard error of estimation
MAE – Mean absolute error
F – Fisher ratio
Av – Average value for statistical parameters obtained from three independent Monte Carlo optimization runs
[Figure 1]
All of the developed QSAR models exhibited high reproducibility according to the obtained concordance correlation coefficient (CCC). The MAE-based criteria provided further validation, classifying all the QSAR models as GOOD. The robustness of the developed models was assessed by Y-randomization (with the Y values scrambled in 1000 trials over ten separate runs), and the results presented in Table 2 show that the models are not the product of chance correlation. Finally, model adequacy was assessed with the index of ideality of correlation (IIC), and the obtained values indicate that all the developed QSAR models have high predictive potential.
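The three checks in the paragraph above can be sketched as follows. CCC uses Lin's definition; IIC follows the form commonly used for CORAL models, r × min(MAE−, MAE+)/max(MAE−, MAE+), where MAE− and MAE+ are the mean absolute residuals (observed minus calculated) of the negative and non-negative residuals and r is the unsquared correlation coefficient. The MAE-based classification uses the thresholds commonly attributed to Roy and co-workers (GOOD when MAE ≤ 0.1 × training-set range and MAE + 3σ ≤ 0.2 × range; BAD when MAE > 0.15 × range or MAE + 3σ > 0.25 × range; otherwise MODERATE). These thresholds are an assumption that should be checked against the original source, the preliminary removal of the 5% largest residuals used in that procedure is omitted, and all function names and data are hypothetical.

```python
import math
import statistics


def _centred_sums(observed, calculated):
    """Means and centred sums of products shared by CCC and r."""
    n = len(observed)
    mo, mc = sum(observed) / n, sum(calculated) / n
    s_xy = sum((o - mo) * (c - mc) for o, c in zip(observed, calculated))
    s_xx = sum((o - mo) ** 2 for o in observed)
    s_yy = sum((c - mc) ** 2 for c in calculated)
    return n, mo, mc, s_xy, s_xx, s_yy


def ccc(observed, calculated):
    """Lin's concordance correlation coefficient."""
    n, mo, mc, s_xy, s_xx, s_yy = _centred_sums(observed, calculated)
    return 2 * s_xy / (s_xx + s_yy + n * (mo - mc) ** 2)


def iic(observed, calculated):
    """IIC = r * min(MAE-, MAE+) / max(MAE-, MAE+), split by the sign of obs - calc."""
    _, _, _, s_xy, s_xx, s_yy = _centred_sums(observed, calculated)
    r = s_xy / math.sqrt(s_xx * s_yy)
    deltas = [o - c for o, c in zip(observed, calculated)]
    neg = [-d for d in deltas if d < 0]
    pos = [d for d in deltas if d >= 0]
    if not neg or not pos:
        raise ValueError("IIC needs both negative and non-negative residuals")
    mae_neg, mae_pos = sum(neg) / len(neg), sum(pos) / len(pos)
    return r * min(mae_neg, mae_pos) / max(mae_neg, mae_pos)


def mae_quality(observed, calculated, training_range):
    """GOOD / MODERATE / BAD from MAE and MAE + 3*sd of absolute errors (assumed thresholds)."""
    errors = [abs(o - c) for o, c in zip(observed, calculated)]
    mae = sum(errors) / len(errors)
    spread = mae + 3 * statistics.stdev(errors)
    if mae <= 0.1 * training_range and spread <= 0.2 * training_range:
        return "GOOD"
    if mae > 0.15 * training_range or spread > 0.25 * training_range:
        return "BAD"
    return "MODERATE"


# Hypothetical data, not taken from Table 1
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
calc = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(ccc(obs, calc), 4), round(iic(obs, calc), 4), mae_quality(obs, calc, 4.0))
```

IIC penalizes models whose positive and negative residuals are unbalanced in size, which r2 alone does not detect; this is why it is used here as a final adequacy check.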
Table 2
Y-randomization of the best QSAR model (best optimization run) for three independent splits
| | Split 1 training | Split 1 test | Split 2 training | Split 2 test | Split 3 training | Split 3 test |
|---|---|---|---|---|---|---|
| 0 | 0.9341 | 0.9199 | 0.9183 | 0.9337 | 0.9663 | 0.9345 |
| 1 | 0.0002 | 0.2483 | 0.0545 | 0.0644 | 0.0325 | 0.2481 |
| 2 | 0.0935 | 0 | 0.0017 | 0.1105 | 0.0214 | 0.0009 |
| 3 | 0.0002 | 0.058 | 0.0031 | 0.0479 | 0.0128 | 0.2371 |
| 4 | 0.0140 | 0.0447 | 0.0547 | 0.0737 | 0.0995 | 0.0043 |
| 5 | 0.0054 | 0.0863 | 0.0047 | 0.0203 | 0.0116 | 0.0108 |
| 6 | 0.0102 | 0.051 | 0.0037 | 0.0181 | 0.0028 | 0.0415 |
| 7 | 0.1258 | 0 | 0.0935 | 0.1775 | 0.0068 | 0.0066 |
| 8 | 0.0130 | 0.0039 | 0 | 0.2512 | 0.0167 | 0.0842 |
| 9 | 0.0115 | 0.0148 | 0.0136 | 0.2168 | 0.0022 | 0.0057 |
| 10 | 0.0023 | 0.2767 | 0.0074 | 0.1475 | 0.0183 | 0.0286 |
| Rr2 | 0.0276 | 0.0784 | 0.0237 | 0.1128 | 0.0225 | 0.0668 |
| cRp2 | 0.9202 | 0.8799 | 0.9064 | 0.8755 | 0.9550 | 0.9005 |
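$cR_p^2 = R\sqrt{R^2 - R_r^2}$, where $R^2$ is the r2 of the non-randomized model (row 0), $R$ is its square root and $R_r^2$ is the average r2 of the ten randomized models (row Rr2); $cR_p^2$ should be > 0.5 [32]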
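The footnote formula can be checked against the split 1 columns of Table 2. The sketch below averages the ten randomized r2 values and applies cRp2 = R × √(R2 − Rr2) with R = √R2; the function name `c_rp2` is hypothetical, and the printed values match the split 1 entries of Table 2 to within rounding of the tabulated inputs.

```python
import math

# Table 2, split 1: r2 of the non-randomized model (row 0) and of randomization runs 1-10
SPLIT1 = {
    "training": (0.9341, [0.0002, 0.0935, 0.0002, 0.0140, 0.0054,
                          0.0102, 0.1258, 0.0130, 0.0115, 0.0023]),
    "test": (0.9199, [0.2483, 0.0, 0.058, 0.0447, 0.0863,
                      0.051, 0.0, 0.0039, 0.0148, 0.2767]),
}


def c_rp2(r2_model, r2_randomized):
    """Return (mean randomized r2, cRp2) with cRp2 = R * sqrt(R2 - Rr2) and R = sqrt(R2)."""
    rr2 = sum(r2_randomized) / len(r2_randomized)
    return rr2, math.sqrt(r2_model) * math.sqrt(max(r2_model - rr2, 0.0))


for subset, (r2, runs) in SPLIT1.items():
    rr2, crp2 = c_rp2(r2, runs)
    print(f"{subset}: Rr2 = {rr2:.4f}, cRp2 = {crp2:.4f}, passes 0.5 threshold: {crp2 > 0.5}")
```

Because the randomized r2 values are small, cRp2 stays close to R2 itself; a value near 0.5 or below would suggest that much of the apparent fit could arise by chance.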
The best QSAR models for antiMES activity, selected for each split on the basis of the test set r2, are given in Eqs. 6–8.
Split 1: pED50 = 1.2600(± 0.0231) + 0.0360(± 0.0003)×DCW(2,14) (6)
Split 2: pED50 = 1.6112(± 0.0246) + 0.0239(± 0.0002)×DCW(4,6) (7)
Split 3: pED50 = 1.0330(± 0.0102) + 0.0342(± 0.0001)×DCW(4,11) (8)
Eqs. 6–8 show that the preferred T and Nepoch values for antiMES activity are 2 and 14 for split 1, 4 and 6 for split 2, and 4 and 11 for split 3.
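Eqs. 6–8 are simple linear functions of DCW(T, Nepoch), so they can be evaluated or inverted directly. The sketch below uses only the intercepts and slopes of Eqs. 6–8 (their ± uncertainties are ignored); the DCW input is a hypothetical value chosen for illustration, and the helper names are likewise hypothetical.

```python
# Intercepts and slopes of Eqs. 6-8; DCW is computed with the (T, Nepoch) of each split
EQUATIONS = {
    1: {"T": 2, "Nepoch": 14, "intercept": 1.2600, "slope": 0.0360},
    2: {"T": 4, "Nepoch": 6, "intercept": 1.6112, "slope": 0.0239},
    3: {"T": 4, "Nepoch": 11, "intercept": 1.0330, "slope": 0.0342},
}


def predict_ped50(split, dcw):
    """Calculated pED50 = intercept + slope * DCW(T, Nepoch) for the given split."""
    eq = EQUATIONS[split]
    return eq["intercept"] + eq["slope"] * dcw


def dcw_for_target(split, target_ped50):
    """Invert the linear model: the DCW needed to reach a target pED50."""
    eq = EQUATIONS[split]
    return (target_ped50 - eq["intercept"]) / eq["slope"]


# Hypothetical DCW value, for illustration only
for split in EQUATIONS:
    print(split, round(predict_ped50(split, 100.0), 4))
```

Since all three slopes are positive, any increase in DCW raises the calculated pED50, which is why fragments with positive correlation weights are the ones sought in the design step.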
One of the chief goals of this research was to identify molecular fragments, represented by SMILES notation-based optimal descriptors, with a positive or negative influence on the examined activity [34, 35, 47–50]. The full list of molecular descriptors, which are based on both the molecular graph and the SMILES notation, is given in Table S2 (Supplementary Material). Table 3 shows an example of the calculation of a molecule's summarized correlation weight (DCW) and the studied activity (pED50); the molecular graph-based descriptors were excluded there to simplify interpretation. Figure 2 summarizes the computer-aided design (CAD) of compounds with higher or lower activity, another key goal of this research. Applying the conformation-independent QSAR results in the CAD process led to the design of nine novel potential inhibitors, whose structures are presented in Fig. 2.
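How a DCW value is assembled, and how fragments are sorted into positive and negative contributors, can be sketched under the conventions commonly used with CORAL-type optimal descriptors: SMILES attributes found in fewer than T training SMILES are treated as rare and blocked (correlation weight set to zero), DCW is the sum of the correlation weights over all attribute occurrences in a SMILES, and a fragment is a promoter of increase (or decrease) when its correlation weight is positive (or negative) in every optimization run. These conventions are assumptions here, and the attribute labels and weights below are hypothetical.

```python
from collections import Counter


def active_attributes(training_attributes, threshold):
    """Attributes present in at least `threshold` training molecules; rarer ones are blocked."""
    counts = Counter(a for attrs in training_attributes for a in set(attrs))
    return {a for a, n in counts.items() if n >= threshold}


def dcw(attributes, weights, active):
    """Sum of correlation weights over every occurrence of an active attribute."""
    return sum(weights.get(a, 0.0) for a in attributes if a in active)


def classify(weights_per_run):
    """Label each attribute by the sign of its correlation weight across all runs."""
    labels = {}
    for attr, cws in weights_per_run.items():
        if all(cw > 0 for cw in cws):
            labels[attr] = "promoter of increase"
        elif all(cw < 0 for cw in cws):
            labels[attr] = "promoter of decrease"
        else:
            labels[attr] = "undefined"
    return labels


# Hypothetical attributes and weights, for illustration only
training = [["C", "N"], ["C", "O"], ["C", "N", "F"]]
active = active_attributes(training, threshold=2)  # "O" and "F" occur once, so they are blocked
weights = {"C": 1.5, "N": 2.0, "O": 3.0, "F": 0.7}
print(dcw(["C", "C", "N", "O"], weights, active))
print(classify({"C": [0.5, 0.7, 0.2], "N": [-0.1, -0.3, -0.2], "O": [0.4, -0.1, 0.3]}))
```

Requiring the same sign in every run is a conservative choice: a fragment whose weight changes sign between runs cannot be interpreted reliably.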
[Figure 2]
Molecule A was selected as the template molecule because it is one of the least chemically explored molecules. Figure 2 highlights the part of molecule A that is favourable for chemical modification with SMILES notation fragments that, according to the conformation-independent QSAR studies, have a positive effect on the studied activity. Table 4 lists all the designed molecules together with their calculated pED50 values.
Table 4
The list of all the designed molecules with their SMILES notation and calculated activities
| Molecule | SMILES notation | pED50 |
|---|---|---|
| A | Nc1ncnc2c1ncn2Cc1ccccc1 | 4.1415 |
| A1 | Cc1ccc(cc1)Cn1cnc2c1ncnc2N | 4.1856 |
| A2 | CCc1ccc(cc1)Cn1cnc2c1ncnc2N | 4.2763 |
| A3 | CC(c1ccc(cc1)Cn1cnc2c1ncnc2N)C | 4.5860 |
| A4 | Nc1ccc(cc1)Cn1cnc2c1ncnc2N | 4.4017 |
| A5 | CNc1ccc(cc1)Cn1cnc2c1ncnc2N | 4.4697 |
| A6 | CN(c1ccc(cc1)Cn1cnc2c1ncnc2N)C | 4.6939 |
| A7 | Oc1ccc(cc1)Cn1cnc2c1ncnc2N | 4.1865 |
| A8 | COc1ccc(cc1)Cn1cnc2c1ncnc2N | 4.2237 |
| A9 | Fc1ccc(cc1)Cn1cnc2c1ncnc2N | 4.1983 |
On the basis of the QSAR modelling results, the SMILES notation descriptors associated with molecular fragments that have a positive impact on pED50 for the studied activity are the following: “C............” – a carbon atom or methyl group, a fragment with a positive effect whose addition increased the calculated pED50 value of molecule A1 compared with template molecule A; “C...C.......” – two connected carbon atoms or an ethyl group, a fragment with a positive effect whose addition increased the calculated pED50 value of molecule A2 compared with the template molecule; “C…(…c…”, “C…(…”, “c...1...(...” and “c...C.......” – descriptors associated with the addition of at least one methyl group to the benzene ring, which leads to branching of the molecule (molecule A3); “N...........” – a nitrogen atom, “N...C.......” – an amino group, “N…c…” – a nitrogen atom bonded to benzene, and “C…N…(…” – branching on the amino group, fragments found in molecules A4, A5 and A6, which carry a primary, secondary and tertiary amino group, respectively, all with higher calculated pED50 values than template molecule A; “O...........” – an oxygen atom, together with “O...c...1...”, “c...O.......” and “c...O...C...” – fragments indicating that a hydroxyl or methoxy group is bonded to benzene; molecules A7 and A8 carry a hydroxyl and a methoxy group, respectively, in the para position, and their calculated pED50 values are higher than that of molecule A; “F..........”, “HALO10000000” and “++++F–N===” – SMILES descriptors indicating the presence of a fluorine atom, as well as a combination of fluorine and nitrogen atoms, and “c...F......” – a SMILES descriptor indicating that the fluorine atom is bound to the benzene ring; molecule A9 carries a fluorine atom in the para position, and its calculated pED50 value is higher than that of molecule A.
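The comparisons above can be summarized by ranking the designed molecules by their change in calculated pED50 relative to template A. The sketch below uses only the values in Table 4; the function name `gains_over_template` is hypothetical.

```python
# Calculated pED50 values from Table 4
CALCULATED_PED50 = {
    "A": 4.1415, "A1": 4.1856, "A2": 4.2763, "A3": 4.5860, "A4": 4.4017,
    "A5": 4.4697, "A6": 4.6939, "A7": 4.1865, "A8": 4.2237, "A9": 4.1983,
}


def gains_over_template(values, template="A"):
    """Change in calculated pED50 of each designed molecule versus the template, largest first."""
    base = values[template]
    deltas = [(name, round(v - base, 4)) for name, v in values.items() if name != template]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


for name, delta in gains_over_template(CALCULATED_PED50):
    print(f"{name}: {delta:+.4f}")
```

Every designed molecule has a positive difference, and the largest gains belong to the tertiary amine A6, the branched A3 and the secondary amine A5.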
Molecular docking studies were conducted on all the designed molecules and on template molecule A with the studied target, in order to evaluate the predictability of the developed QSAR models and further validate them. Table 5 lists the numerical values of all the calculated scoring functions. Different scoring functions reflect different ligand-amino acid interactions, so all of them need to be taken into account when inhibitory potency is assessed. The MolDock and Rerank scores indicate that molecules A6 (the lowest MolDock score) and A5 (the lowest Rerank score) potentially possess the highest inhibitory activity against the studied target, which agrees with the QSAR modelling results. Template molecule A has the least favourable (least negative) MolDock and Rerank scores, which also correlates well with the QSAR modelling results. Detailed definitions of the other scoring functions, and of the type of impact they may have on inhibitory activity, are given in the literature [34, 51]; their values could be used to perform similar analyses. The figures in the Supplementary Information show all the interactions between the amino acids of the voltage-gated sodium channel active site and the selected molecules, together with 2D representations of the hydrogen bonds and the hydrophilic and hydrophobic interactions within the binding pocket. Figure 3 shows the best-calculated poses of all the designed molecules within the active site of the human voltage-gated sodium channel (brain isoform).
Table 5
Score values (kcal/mol) for all computer-aided designed compounds
| Molecule | HBond | NoHBond | Steric | VdW | Energy | MolDock Score | Rerank Score |
|---|---|---|---|---|---|---|---|
| A | -6.73837 | -6.81011 | -94.4714 | -23.5086 | -93.8656 | -95.6522 | -69.0745 |
| A1 | -4.67594 | -5.87713 | -98.0504 | -15.3582 | -94.2286 | -97.2073 | -69.9425 |
| A2 | -5.57927 | -6.41380 | -102.057 | -16.1269 | -100.603 | -102.689 | -74.6128 |
| A3 | -5.25352 | -6.33121 | -106.478 | -12.9107 | -105.119 | -107.437 | -75.1959 |
| A4 | -10.0777 | -10.3034 | -96.2437 | -22.5016 | -100.270 | -101.815 | -78.7717 |
| A5 | -9.4656 | -10.4304 | -101.065 | -24.8663 | -104.107 | -106.331 | -82.3626 |
| A6 | -4.89265 | -6.69276 | -106.761 | 2.68538 | -113.635 | -114.811 | -75.2484 |
| A7 | -7.9973 | -8.71823 | -96.8022 | -23.2328 | -98.1076 | -100.258 | -77.3707 |
| A8 | -5.21338 | -6.15364 | -101.998 | -17.6925 | -98.5957 | -100.900 | -74.2967 |
| A9 | -4.55001 | -6.17604 | -99.1365 | -16.7948 | -95.4206 | -98.0054 | -71.3171 |
[Figure 3]
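Ranking the molecules by the two main docking scores in Table 5 makes the comparison with the QSAR results explicit. The sketch below assumes, as is usual for MolDock and Rerank scores, that a lower (more negative) score is more favourable; the helper name `rank_by` is hypothetical.

```python
# MolDock and Rerank scores from Table 5, as (MolDock, Rerank)
SCORES = {
    "A": (-95.6522, -69.0745), "A1": (-97.2073, -69.9425), "A2": (-102.689, -74.6128),
    "A3": (-107.437, -75.1959), "A4": (-101.815, -78.7717), "A5": (-106.331, -82.3626),
    "A6": (-114.811, -75.2484), "A7": (-100.258, -77.3707), "A8": (-100.900, -74.2967),
    "A9": (-98.0054, -71.3171),
}


def rank_by(scores, index):
    """Molecule names from most to least favourable score (most negative first)."""
    return sorted(scores, key=lambda name: scores[name][index])


moldock_rank = rank_by(SCORES, 0)
rerank_rank = rank_by(SCORES, 1)
print("MolDock:", moldock_rank)
print("Rerank:", rerank_rank)
```

The two scores do not produce identical orderings, which illustrates why the text recommends considering several scoring functions together.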
Ensuring that new compounds possess the physical and chemical features required of potential therapeutics is one of the first steps in early drug development. To be classified as drug-like, a molecule needs to exhibit the following key features: good absorption/permeation, gastrointestinal absorption and brain penetration, oral bioavailability, and efficient binding to receptors/channels. These features can be predicted from the molecular structure. The SwissADME web service was used to calculate the physicochemical descriptors and to predict the pharmacokinetic properties, medicinal chemistry friendliness, ADME parameters and drug-likeness of the designed molecules in order to support drug discovery [51]. The results, presented in Table S3 (Supplementary Material), indicate that all the designed molecules have high drug-likeness.