A multiple linear regression (MLR) analysis was performed using Statistical Data Miner [14] on the training set compounds to establish a correlation between activity and various descriptors of the compounds. The most significant correlation obtained is given by eq. (1).
$$\mathrm{pIC_{50}} = -3.3304 + 0.2102\,(\pm 0.1842)\,\mathrm{RDF115p} + 1.3421\,(\pm 0.7593)\,\mathrm{E2s} + 3.4518\,(\pm 0.8113)\,\mathrm{R1i} \qquad (1)$$
$n = 40$, $r^2 = 0.7249$, $r^2_{cv} = 0.655$, $r^2_{pred} = 0.652$, $s = 0.343$, $F = 31.617$
In eq. (1), n is the number of data points used in the correlation, $r^2$ is the square of the correlation coefficient, $r^2_{cv}$ is the square of the cross-validated correlation coefficient obtained by the leave-one-out (LOO) jackknife procedure, and $r^2_{pred}$ is the square of the correlation coefficient obtained for the test set compounds, used to judge the external validity of the correlation.
Values of $r^2_{cv}$ and $r^2_{pred}$ are calculated according to eqs. (2) and (3), respectively. In eq. (2), obsd is the observed activity of a compound in the training set; in eq. (3), it is the observed activity of a compound in the test set. Similarly, pred in eq. (2) is the activity predicted for a training set compound in the LOO jackknife procedure, and in eq. (3) it is the activity predicted for a test set compound by the model obtained from the training set. In both equations, av,obsd is the average observed activity of the training set compounds.
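$$r^2_{cv} = 1 - \frac{\sum (\mathrm{obsd} - \mathrm{pred})^2}{\sum (\mathrm{obsd} - \mathrm{av,obsd})^2} \qquad (2)$$
$$r^2_{pred} = 1 - \frac{\sum (\mathrm{obsd} - \mathrm{pred})^2}{\sum (\mathrm{obsd} - \mathrm{av,obsd})^2} \qquad (3)$$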
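Equations (2) and (3) share one form and differ only in which compounds are summed over, so a single helper can cover both. The following is a minimal Python sketch, not the procedure implemented in the software cited above. For brevity it assumes a one-descriptor least-squares model; the function names and the toy data are illustrative and are not values from this study.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (one descriptor, for brevity)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loo_predictions(x, y):
    """Leave-one-out: predict each compound from a model fitted without it."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return preds


def r2_validation(obsd, pred, av_obsd_train):
    """1 - sum(obsd - pred)^2 / sum(obsd - av,obsd)^2, as in eqs. (2) and (3)."""
    press = sum((o - p) ** 2 for o, p in zip(obsd, pred))
    ss = sum((o - av_obsd_train) ** 2 for o in obsd)
    return 1.0 - press / ss


# Toy data, made up for illustration (NOT taken from Table 1).
x_train = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
y_train = [4.9, 5.3, 5.6, 6.2, 6.4, 7.0]
x_test = [0.30, 0.60]
y_test = [5.5, 6.1]

# av,obsd is the training-set average in both equations.
av_obsd = sum(y_train) / len(y_train)

# Eq. (2): training compounds with their LOO predictions.
r2cv = r2_validation(y_train, loo_predictions(x_train, y_train), av_obsd)

# Eq. (3): test compounds predicted by the model fitted on the full training set.
a, b = fit_line(x_train, y_train)
r2pred = r2_validation(y_test, [a + b * xi for xi in x_test], av_obsd)

print(f"r2cv = {r2cv:.3f}, r2pred = {r2pred:.3f}")
```

Passing the training-set average explicitly keeps the denominator of eq. (3) tied to the training set, as the definitions above require.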
The correlation is considered valid and to have good internal predictive ability if $r^2_{cv} > 0.60$. Similarly, the external predictive ability of the model is considered good if $r^2_{pred} > 0.5$. By both criteria, the correlation expressed by eq. (1) is valid. Of the remaining two statistical parameters, s is the standard deviation and F is the Fisher ratio between the variances of calculated and observed activities. The figures in parentheses with the ± sign are the 95% confidence intervals.
The F-value given in parentheses refers to the standard F value at the 99% level. A higher value of F indicates a good correlation. All the descriptors used in this correlation are also found to be significant: if they are removed one by one, the quality of the correlation ($r^2$ and $r^2_{cv}$) drops appreciably, as shown by eqs. (4) and (5).
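$$\mathrm{pIC_{50}} = -3.628 + 1.525\,(\pm 0.783)\,\mathrm{E2s} + 3.558\,(\pm 0.851)\,\mathrm{R1i} \qquad (4)$$
$n = 40$, $r^2 = 0.684$, $r^2_{cv} = 0.575$, $r^2_{pred} = 0.639$, $s = 0.363$, $F = 40.034$
$$\mathrm{pIC_{50}} = -2.406 + 3.347\,(\pm 0.992)\,\mathrm{R1i} \qquad (5)$$
$n = 40$, $r^2 = 0.551$, $r^2_{cv} = 0.497$, $r^2_{pred} = 0.543$, $s = 0.426$, $F = 46.663$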
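As a consistency check, the F values reported with eqs. (1), (4) and (5) can be recomputed from n, r² and the number of descriptors k using the standard relation F = (r²/k) / ((1 − r²)/(n − k − 1)) for a least-squares model with an intercept. The sketch below assumes that this is the definition of F used here; the function name is illustrative.

```python
def f_statistic(r2, n, k):
    """Overall F ratio of a least-squares regression with intercept.

    r2: squared correlation coefficient, n: data points, k: descriptors.
    """
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# (r2, n, k) as reported with each equation.
models = {
    "eq. (1)": (0.7249, 40, 3),
    "eq. (4)": (0.684, 40, 2),
    "eq. (5)": (0.551, 40, 1),
}

for name, (r2, n, k) in models.items():
    print(name, round(f_statistic(r2, n, k), 2))
```

With the rounded r² values quoted above, this gives roughly 31.6, 40.0 and 46.6, close to the reported 31.617, 40.034 and 46.663. F rises as descriptors are removed because the numerator has fewer degrees of freedom, even though r² falls.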
Thus, the above results show that eq. (1) expresses a noteworthy correlation between the inhibitory activity and the structural descriptors of the compounds. Although the correlation offers no mechanistic insight, it has good predictive ability. A plot of predicted versus observed activities for both the training and test sets (Fig. 1) further supports this: except for one or two points, all points lie near the straight line. Using this MLR model [eq. (1)], we predicted the activities of some new compounds, shown in Table 2, each of which has a higher activity value than any compound in the existing series (Table 1).
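Predicting the activity of a new compound with eq. (1) amounts to evaluating the fitted linear expression at that compound's descriptor values. A minimal sketch follows; the coefficients are those of eq. (1), while the function name and the descriptor values in the example call are hypothetical and do not correspond to any compound in Table 2.

```python
def predict_pic50(rdf115p, e2s, r1i):
    """Evaluate eq. (1) for one compound's descriptor values."""
    return -3.3304 + 0.2102 * rdf115p + 1.3421 * e2s + 3.4518 * r1i


# Hypothetical descriptor values, for illustration only.
print(round(predict_pic50(rdf115p=2.0, e2s=0.5, r1i=2.0), 3))
```

The confidence intervals on the coefficients are not propagated here, so the result is a point estimate only.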
Docking Analysis
Molecular docking was performed on the predicted compounds in Table 2 using LeadIT FlexX software to determine the binding modes of these compounds. The potency of a molecule is determined by its ability to interact with an enzyme. Molecular docking requires the crystal structure of the target enzyme, which can be retrieved from the RCSB Protein Data Bank. We selected the enzyme with PDB entry code 1a4g (https://www.PDB.Org) [15].
The compounds listed in Table 2 were docked into this enzyme, and the docking results are shown in Table 3. The docking analysis was carried out for all of the predicted compounds.
Here we illustrate only compound 1 (Fig. 2), which has the highest predicted activity, and compound 10 (Fig. 3), which has the highest docking score (Table 3), to show the best possible interactions between the inhibitors and the enzyme. Figs. 2 and 3 show that the predicted compounds interact well with the enzyme. They all form hydrogen bonds and undergo steric interactions, in which several moieties of the compounds are surrounded by different active clefts of the enzyme. The penetration of any moiety of an inhibitor into a cavity of the enzyme depends on its flexibility. All these steric interactions might involve dispersion interactions, which are a type of electronic interaction.
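The choice of compound 10 as the best-docked compound can be reproduced by ranking the scores reported in Table 3. The sketch below assumes, as is usual for this kind of docking score, that a more negative value indicates a more favourable predicted binding; the variable names are illustrative.

```python
# Docking scores from Table 3, keyed by compound number.
scores = {
    1: -27.3170, 2: -27.9540, 3: -27.1239, 4: -20.5896, 5: -19.5077,
    6: -18.6483, 7: -23.0806, 8: -24.5201, 9: -21.3755, 10: -28.1891,
}

# Sort compound numbers from most negative (assumed best) to least negative.
ranked = sorted(scores, key=scores.get)

for compound in ranked:
    print(compound, scores[compound])
```

Under this assumption the ranking starts with compounds 10, 2 and 1, which matches the compound singled out in the text for its docking score.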
Table 3: Molecular Docking Results of the Predicted Molecules

| Compd No. | No. of H-bonds | H-bonds | H-bond lengths | Score |
|---|---|---|---|---|
| 1 | 5 | O(11)-LYS44; O(27)-ASN150; H(38)-MET96; H(49)-GLY27; H(51)-THR23 | 4.70; 4.61; 3.35; 4.47; 4.38 | -27.3170 |
| 2 | 5 | O(11)-LYS44; O(27)-ASN150; H(39)-MET96; H(50)-GLY27; H(52)-THR23 | 4.70; 4.60; 3.35; 4.34; 4.36 | -27.9540 |
| 3 | 4 | O(25)-ASP103; H(52)-ASP103; H(52)-ASP103; H(53)-LEU21 | 3.51; 8.16; 3.36; 4.43 | -27.1239 |
| 4 | 10 | N(9)-ARG118; O(11)-TYR406; O(14)-ARG292; O(14)-ARG292; O(14)-ARG371; O(25)-ASN294; O(25)-ARG292; H(35)-GLU119; H(46)-GLU276; H(57)-GLU276 | 0.51; 2.51; 2.85; 3.96; 4.70; 2.61; 0.32; 4.70; 0.21; 0.43 | -20.5896 |
| 5 | 5 | O(11)-ARG371; O(11)-ARG118; O(14)-GLY348; O(25)-ARG152; H(48)-GLU152 | 4.19; 4.7; 4.28; 4.38; 8.3 | -19.5077 |
| 6 | 5 | O(11)-ARG118; O(11)-ARG371; O(14)-GLY348; O(25)-ARG152; H(49)-GLU119 | 4.7; 4.21; 4.28; 4.37; 8.3 | -18.6483 |
| 7 | 9 | O(14)-ARG292; O(14)-ARG292; O(14)-ARG371; O(25)-ARG152; O(27)-ARG156; H(39)-ASP151; H(51)-GLU119; H(53)-ASP178; H(72)-GLU119 | 2.36; 3.17; 4.7; 4.7; 3.03; 2.73; 8.3; 1.25; 0.13 | -23.0806 |
| 8 | 7 | O(14)-ARG371; O(14)-ARG118; O(25)-ARG152; O(27)-ARG156; H(39)-ASP151; H(51)-GLU119; H(53)-TRP178 | 2.34; 4.7; 4.7; 3.21; 1.28; 8.3; 1.88 | -24.5201 |
| 9 | 5 | O(11)-ARG118; O(11)-ARG371; O(14)-GLY348; O(25)-ARG152; H(50)-GLU119 | 4.7; 2.97; 3.41; 0.26; 8.3 | -21.3755 |
| 10 | 6 | O(14)-ASP103; N(29)-THR23; H(39)-ASP103; H(39)-ASP103; H(51)-GLU149; H(57)-THR23 | 4.54; 3.74; 2.27; 0.59; 4.7; 4.24 | -28.1891 |
Table 4: Pharmacokinetic Properties of the Predicted Compounds of Table 2

| Total Molweight | cLogP | cLogS | H-Acceptors | H-Donors |
|---|---|---|---|---|
| 490.971 | 1.5781 | -4.484 | 10 | 4 |
| 513.061 | 2.7389 | -4.805 | 10 | 4 |
| 527.088 | 3.1933 | -5.075 | 10 | 4 |
| 527.088 | 3.1933 | -5.075 | 10 | 4 |
| 561.105 | 3.1769 | -5.508 | 10 | 4 |
| 567.153 | 3.5335 | -5.969 | 10 | 4 |
| 533.051 | 2.3632 | -4.86 | 10 | 4 |
| 540.954 | -0.4975 | -4.065 | 12 | 5 |
| 521.001 | -0.325 | -4.084 | 12 | 5 |
| 529.064 | 0.3814 | -4.135 | 12 | 5 |
| 591.135 | 1.2738 | -5.108 | 12 | 5 |
Pharmacokinetic studies
The pharmacokinetic properties of the predicted compounds were obtained using DataWarrior software [16], and the results are shown in Table 4. These properties include molecular weight (MW), cLogP, number of hydrogen bond acceptors (HA), and number of hydrogen bond donors (HD). According to Lipinski's rule of five, compounds with MW < 500 and cLogP < 5 should have good absorption and penetration capacity.
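The rule-of-five screen can be applied row by row to the values in Table 4. The sketch below is illustrative only: besides the MW and cLogP limits quoted above, it assumes the commonly stated limits of at most 10 hydrogen bond acceptors and at most 5 hydrogen bond donors, which are not spelled out in this text. The class and function names are hypothetical.

```python
from typing import NamedTuple


class Props(NamedTuple):
    mw: float
    clogp: float
    h_acceptors: int
    h_donors: int


def lipinski_violations(p: Props) -> int:
    """Count rule-of-five violations (thresholds as commonly stated)."""
    return sum([
        p.mw > 500,          # molecular weight
        p.clogp > 5,         # lipophilicity
        p.h_acceptors > 10,  # hydrogen bond acceptors (assumed limit)
        p.h_donors > 5,      # hydrogen bond donors (assumed limit)
    ])


# Rows of Table 4 in the order listed (cLogS is not part of the rule).
table4 = [
    Props(490.971, 1.5781, 10, 4),
    Props(513.061, 2.7389, 10, 4),
    Props(527.088, 3.1933, 10, 4),
    Props(527.088, 3.1933, 10, 4),
    Props(561.105, 3.1769, 10, 4),
    Props(567.153, 3.5335, 10, 4),
    Props(533.051, 2.3632, 10, 4),
    Props(540.954, -0.4975, 12, 5),
    Props(521.001, -0.325, 12, 5),
    Props(529.064, 0.3814, 12, 5),
    Props(591.135, 1.2738, 12, 5),
]

for row in table4:
    print(row.mw, lipinski_violations(row))
```

Counting violations rather than returning a single pass/fail flag keeps the per-criterion information visible for each compound.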