3.1 Homology modelling
Resistance D1 protein (gi|30413115|gb|AAP33145.1) was taken as a query sequence to find out the most suitable template and it has shown significant alignment with chain-A of the photosynthetic oxygen evolving center (1S5L) having a score: 503, Query coverage: 99%, identity: 90% and E value: 1e-178 and resolution 3.5Å. Cyanobacterial D1 protein (pdb|3PRQ) complex with terbutryn at the QB binding site has also shown significant alignment; i.e. query coverage of 100%, identity 90%, similarity 95.2%, and E-value 1e-178 without any gap in the alignment and resolution 3.2 Å (Fig. 2). The crystal structure of Cyanobacterial PS-II consists of 20 polypeptides along with 88 ligands and 10 cofactors [38] and it was selected as a template protein for modelling. As a result, 10 models were built and selected based on their discrete optimized protein energy (DOPE) scores and root mean square deviation (RMSDs). A short 1000 and 500 step of the steepest descent and conjugate gradient minimization have been performed respectively to minimize the strain of modelled D1 protein Fig. 2.
The modelled D1 protein has shown phi and psi torsion angle of 93.6% amino acid residues in core regions and 6.4% amino acid residues in the allowed region, whereas the template structure showed 82.3% amino acid residues in the core region and 17.0% amino acid residues in the allowed region (Table S1 and Figure S1). The quality of the model was also validated by the ProSA server, a web server for protein structural analysis. Moreover, The ProSA Z-score was − 4.0 which indicates the overall quality and measures the deviation from the total energy of the structure with respect to an energy distribution derived from random conformations. A negative PROSA score indicates the correctness of the modelled protein. Verify 3D was used for validation of the modelled protein and it showed an average 3D-1D profile score − 0.09 and 0.52. TM-align server was used to further validate the modelled protein which showed TM score 0.99125 (0.5 < TM Score > 1, indicates about the same fold), suggesting the modelled protein to be of good quality, as shown in Fig. 3. Thus, results suggest that the comparative modelled structure of D1 protein is of good quality, and could be used for further studies. The homology modeling was performed to get the blueprint of the binding mode of herbicide terbutryn complexed with D1 protein (PDB ID3PRQ/4V82) which is precisely located at the QB site of D1 protein for the design and development of herbicides. The target template identity is ~ 90% without any gap so the model structure could be considered as equivalent to crystallographic structure except few side chains [39].
In a previous study, our group has built a chimeric homology model of the PS-II reaction center of P. minor. The D1 protein is truncated at both N and C terminal, so both missing ends were concatenated with ref sequence of T. aestivum's D1 protein. Sequence for D2 protein was taken from its nearest homolog i.e. T. aestivum (from Ref Seq Database). The build reaction center was having one β-carotene, one bicarbonate ion, five chlorophyll-a, two pheophytin-a, two plastoquinone) and cofactors (iron and OEC) along with D1 and D2 protein. To check the integrity and stability of the build chimeric model it was subjected for MD simulation, and we reported that the complete reaction center deviates by 1.7175 Å RMSD from its equivalent bacterial reaction center i.e. template; PDB ID 2AXT [13].
3.3 Designing of new analogs at the binding site of D1 protein
The de novo drug design program Ligbuilder v2 was used to design the new analogs of selected ligands iteratively using an inbuilt library of organic fragments. Overall 28 ligands, i.e. 16, 7, and 5 ligands from class C1, C2, and C3 were set for the de novo synthesis (Table S3). Ligand detection mode of the CAVITY module was used to detect the binding site of the protein and the best binding site was decided based on ligandability (Fig. 5). These were compared with ref_lig binding sites in D1 protein. The automatic extractor module was used which has extracted 71, 29, and 19 seeds from C1, C2 and C3 class respectively (Table S3). Extracted seed molecules were initially used for growing to design new ligands molecules by the BUILD module. BUILD module uses a pocket and grid file generated by the CAVITY module which acts as pharmacophore for de novo synthesis. Here, molecules were synthesized considering the Lipinski rule of five, chemscore, binding affinity, synthesizability, and toxicity parameter for all molecules as well as the molecules were represented with the route of synthesis with cost index. As a result, 282 ligands were synthesized which belongs to C1 (169), C2 (72), and C3 (41) classes.
3.5 Docking
The ELC and DLC hits were docked at the center coordinates (X: 20.63, Y: 65.50, and Z: 35.64) of the binding site of modelled D1 protein using AutoDock4.2 as well as the Molegro tool. The criteria chosen to select the best pose of each hit were AutoDock term (Binding energy, Ligand efficiency, and inhibition constant) and Molegro term (MolDock score, H-bond score and rerank score). One hundred poses per ligand were generated and clustered with an RMSD tolerance limit of 1.5 Å. Binding poses of all hits of the ELC and DLC were selected from the most populated cluster and a summary of these hits are tabulated in Table S5 and Table S6. Thirteen hits form ELC and eleven hits from DLC were selected from each cluster of the ELC and DLC respectively after employing the docking score filter. Three amino acids, i.e. His177, Phe227, and Phe236 in binding sites play a crucial role in interaction with hits. Docking results have shown that ref_lig has two H-bonds and one pi-pi interaction in the binding site of protein with ligand efficiency, -0.25, and inhibition constant 1.08 mM (Table S4 & Figure S2).
The ref_lig docking scores were set as criteria to select the best hits from each cluster, as a result, thirteen and eleven hits selected from ELC and DLC clusters. In addition to ref_lig based criteria, two or more H-bonds and one pi-pi interaction with better binding energy, ligand efficiency, and inhibition constant were employed as selection criteria to select the molecule (Table S5-S6 & Figure S3-S6).
These lead molecules were selected through the rigorous procedure and several attributes were analyzed. It is found that among the extracted leads, most of them have one ring common, having at least one heteroatom with molecular weight, donor site, and acceptor site in the range of 140 to 315, 1 to 4, and 4 to 8 respectively.
3.6 Molecular Dynamics Simulation
To find the binding affinity and binding mode of ligand, the rigid docking followed by flexible docking was applied, but it does not properly sample the conformational space that could lead to a variety of different binding modes of the ligands [40]. The goals of these simulations were to observe the structural behaviour of protein (flexibility of the D1 protein with ref_lig, ELC, and DLC leads) and ligands (the behaviour of leads inside the binding site of D1 protein). MD simulation was performed for each lead protein complex for 10ns using Gromacs4.6.5 in an environment of a lipid bilayer embedded in a periodic box solvated with water shown in Fig. 6. PS-II complex has 20 peptides and 95 Co-factors (PDB ID 4V82). D1 protein complexes with at least one β-carotene, one pheophytin, two α-chlorophylls, two plastoquinone, at least two Fe2 + ions, and one bicarbonate. The PS-II reaction center proteins and associated molecules of the complex in a tight-knit providing stability to each other, and having interlinked functions. In our previous attempt, a chimeric D1/D2 protein was constructed from its nearest homolog and subjected to MD simulation. The chimeric D1/D2 protein has shown RMSD of 1.7175 Å from its templates protein (Singh et al 2012). Therefore, we have refrained from doing any long simulation with the D1 subunit in this study. RMSD curves of the protein backbone; energy profile and H-bond profile of each protein-lead complexes were calculated and compared with the ref_lig protein complex. Variation in the total energy plot of the ref_lig protein complex has shown that the overall energy tends to stabilize after 3ns of simulation (Fig. 6B), which confers the average energy remains constant, thus suggesting structural stabilization. The total energy patterns were similar for all ELC and DLC lead protein complexes, (data are not shown).
Throughout the simulation period, no significant fluctuations were observed in the backbone of protein with ref_lig. It implies that the binding of ref_lig at the binding site of protein is not only stable and strong but also does not perturb the protein backbone stability. The binding stability of the ELC leads (ELC2, ELC3, ELC4, ELC5, ELC6, ELC10, ELC11, and ELC12) and DLC leads (DLC4, DLC6, DLC7, and DLC8) in the binding site of the protein which, have shown comparable fluctuation with ref_lig protein complex throughout the simulation in the size of the noticeable window of 0.30–0.40 nm Fig. 7. It ensures and suggests that these leads are well accommodated inside the binding site throughout the simulation. H-bonds at the binding site of the ref_lig protein complex were determined which revealed that it encompassed four H-bonds throughout the simulation period. The amino acid residues involved in the key H-bond formation at the binding site were Ala225, Ser226, and Phe227. The succession of frames for ELC and DLC leads inside the binding site of the protein have shown that ELC leads ELC2, ELC3, ELC5, ELC7, ELC8, ELC11, and ELC13 and DLC leads DLC3, DLC4, DLC6, DLC8, and DLC11 comprises three to four H-bonds throughout the simulation as shown in Fig. 8. The amino acid residues involved in the key H-bonds in the binding site were His177, Ala225, Ser226, Phe227, Asn229, and Ser232 throughout the simulation. It seems that these leads are also interacting in a similar fashion as ref_lig; which confirms the accommodation of these leads in the protein binding site because ref_lig fitted well into the binding site of protein over the simulated time.
3.6.1 Amino acid fluctuation analysis
The root mean square fluctuation (RMSF) analysis revealed the possible movements of amino acid residues Phe201, Phe222, Tyr224, and Phe227 (inside the binding pocket) and Arg26, Gln127, Arg187 (out the binding site) in the ref_lig ligand-protein complex (Fig. 9). In the case of selected ELC and DLC lead complexes, RMSF appears to be almost similar to the ref_lig except for the other small fluctuations. This signified that protein-ELC and protein-DLC lead complexes are also stable like the ref_lig-protein complex. The amino acid residues inside the binding sites have shown slightly less fluctuation because of the formation of a pi-pi interaction between the leads and the aromatic ring of Tyr224 and Phe227. It might be fluctuating to stabilize the protein lead interaction. Moreover, residues of the binding site showed lower fluctuation, revealing to be a more stable protein-ligand system.
3.6.2 Analysis of hydrogen bond occupancy
The MD trajectories were analyzed for the H-bonds distance and the H-bond occupancy. H-Bond distance is determined by the bond length (< 3.5 Å) and the H-bond occupancy is defined as the time percent existence of a specific H-bond during the simulation. The H-bonds predicted for the ref_lig, ELC, and DLC lead and amino acid residues of the binding site of protein were observed consistently for the major part of trajectory with high occupancy for all the proposed lead molecules. H-bond interactions were categorized as strong (> 60%), medium (40–60%), and weak (< 40%) based on H-bond occupancy with ref_lig and the binding site of the protein. Four H-bond were found between ref_lig and protein, i.e. [email protected]@ref_lig, [email protected]@ref_lig, [email protected]@ref_lig, and [email protected]@ref_lig with occupancy 63.0%, 59.9%, 55.2%, and 60.3%, respectively, and the distance of H-bonds was fluctuating in windows of 1Å to 3.5Å. Occupancies of the H-bonds between protein, ELC, and DLC lead molecules showed a diverse range of fluctuation which is comparable to ref_lig. ELC leads ELC2, ELC3, and ELC8 have shown four H-bonds, where two H-bonds are having occupancy > 70 % and another two H-bonds having occupancy 70 to 30%. ELC4 and ELC7 showed three stable H-bonds with significant occupancy > 40%. DLC leads DLC3, DLC6 and DLC9 showed four H-bonds occupancy > 40%. DLC4 and DLC8 ligands also have shown two stable H-bonds with significant occupancy. Other ELC and DLC lead molecules (except above described ELC and DLC lead molecules) have not shown significant stable H-bond as compared to ref_lig. The occupancies of the H-bonds analysis also showed that amino acid residues His177, Ala225, Ser226, Phe227, and Asn229 were involved in the H-bonding with ref_lig, ELC and DLC lead molecules. However, the residues Ser226 and Phe227 were obtained as key residues to hold the molecules at the binding site as these residues showed higher occupancy in most of the ELC and DLC lead molecules.
3.6.3 Free energy calculations
Binding free energy of D1 protein complexed with ref_lig, ELC, and DLC molecules was determined by the MM/PBSA method. In the case of ref_lig, van der Waals interactions (∆Gvdw), showed the most significant contributions among the van der Waals interactions (∆Gvdw), the electrostatic energy (∆Gele), and the nonpolar solvation free energy (∆Gnonpol, soil). The results reveal that leads from ELC class (ELC1, ELC2, ELC4, ELC5, ELC8, ELC11, ELC12, and ELC13) and lead from the DLC class (DLC1, DLC3, DLC4, and DLC6) displayed higher binding free energy compared to ref_lig binding free energy (-104.627 +/- 10.792kJ/Mol). The order of binding free energy of ELC's and DLC's is ELC4 > ELC 11 > ELC 8 > ELC 13 > ELC 5 > ELC 12 > ELC1 > ELC2 and DLC6 > DLC4 > DLC1 > DLC3 respectively. Binding energies of the ELC and DLC lead molecules are shown in Table 1, which reveals that ∆Gvdw has the most significant contribution to the binding energy.
Table 1
The Table shows the van der Waals interaction energy (∆EvdW), electrostatic energy (ΔEele), polar solvation energy (ΔGpolar), nonpolar solvation energy (ΔGnonpolar), and binding energy (ΔGbinding) of ref_lig molecule, ELC lead, and DLC lead protein complexes. The energy values are average values in KJ/Mol which were determined by the MM/PBSA method.
System
|
a∆EvdW
|
a∆Eelec
|
a∆Gpolar
|
a∆Gnonpolar
|
a∆Gbinding
|
Ref_lig
|
Ref_lig
|
-136.282 +/- 10.140
|
-8.184 +/- 3.246
|
53.919 +/- 6.532
|
-14.080 +/- 0.933
|
-104.627 +/- 10.792
|
ELC lead
|
ELC1
|
-136.168 +/- 18.364
|
-10.486 +/- 10.291
|
52.763 +/- 14.004
|
-12.773 +/- 1.491
|
-106.664 +/- 16.450
|
ELC2
|
-133.086 +/- 10.140
|
-63.940 +/- 7.822
|
105.192 +/- 6.745
|
-13.402 +/- 0.798
|
-105.236 +/- 8.860
|
ELC3
|
-135.888 +/- 11.678
|
-23.117 +/- 10.308
|
79.837 +/- 10.864
|
-13.025 +/- 1.037
|
-92.192 +/- 9.491
|
ELC4
|
-226.920 +/- 13.782
|
-30.143 +/- 10.856
|
92.789 +/- 10.406
|
-20.306 +/- 1.053
|
-184.580 +/- 13.217
|
ELC5
|
-152.496 +/- 9.783
|
-8.146 +/- 2.641
|
53.635 +/- 5.890
|
-16.407 +/- 0.979
|
-123.415 +/- 9.441
|
ELC6
|
-79.543 +/- 9.136
|
-11.230 +/- 3.373
|
36.082 +/- 5.416
|
-7.967 +/- 1.059
|
-62.658 +/- 8.925
|
ELC7
|
-90.852 +/- 12.170
|
-30.849 +/- 6.863
|
77.723 +/- 8.950
|
-11.725 +/- 1.164
|
-55.703 +/- 10.491
|
ELC8
|
-172.742 +/- 13.993
|
-43.575 +/- 7.894
|
90.215 +/- 8.879
|
-15.768 +/- 1.331
|
-141.870 +/- 13.809
|
ELC9
|
-127.335 +/- 14.787
|
-27.564 +/- 8.311
|
66.847 +/- 11.559
|
-14.443 +/- 1.138
|
-102.495 +/- 11.241
|
ELC10
|
-108.888 +/- 22.820
|
-3.367 +/- 3.659
|
28.827 +/- 10.771
|
-12.242 +/- 2.191
|
-95.670 +/- 16.687
|
ELC11
|
-193.160 +/- 25.566
|
-1.984 +/- 1.230
|
40.587 +/- 8.122
|
-17.862 +/- 1.535
|
-172.420 +/- 21.020
|
ELC12
|
-140.765 +/- 9.307
|
-5.079 +/- 2.853
|
48.274 +/- 5.784
|
-14.356 +/- 0.834
|
-111.926 +/- 8.968
|
ELC13
|
-156.923 +/- 15.543
|
-34.140 +/- 6.858
|
75.406 +/- 8.382
|
-15.46 +/- 0.866
|
-131.119 +/- 13.870
|
DLC lead
|
DLC1
|
-135.061 +/- 13.621
|
6.168 +/- 6.265
|
31.185 +/- 9.587
|
-13.956 +/- 1.233
|
-111.665 +/- 12.895
|
DLC2
|
-107.463 +/- 11.009
|
-12.515 +/- 5.824
|
47.658 +/- 6.065
|
-11.334 +/- 0.920
|
-83.655 +/- 10.542
|
DLC3
|
-154.59 +/- 12.638
|
-28.974 +/- 7.750
|
87.996 +/- 7.825
|
-13.844 +/- 0.978
|
-109.416 +/- 11.518
|
DLC4
|
-162.907 +/- 12.942
|
-21.123 +/- 13.218
|
73.483 +/- 10.909
|
-16.567 +/- 1.199
|
-127.114 +/- 13.063
|
DLC5
|
-119.804 +/- 15.573
|
-6.646 +/- 3.462
|
39.220 +/- 8.657
|
-13.798 +/- 1.281
|
-101.029 +/- 12.107
|
DLC6
|
-192.460 +/- 15.952
|
-37.893 +/- 7.908
|
97.723 +/- 11.232
|
-17.244 +/- 0.945
|
-149.874 +/- 13.212
|
DLC7
|
-122.524 +/- 8.754
|
-16.470 +/- 8.328
|
75.856 +/- 9.176
|
-14.270 +/- 1.021
|
-77.407 +/- 9.632
|
DLC8
|
-90.167 +/- 14.000
|
-25.311 +/- 8.323
|
59.914 +/- 12.091
|
-10.921 +/- 1.050
|
-66.486 +/- 10.326
|
DLC9
|
-127.765 +/- 16.923
|
-25.236 +/- 19.172
|
71.181 +/- 16.018
|
-13.815 +/- 1.181
|
-95.635 +/- 12.928
|
DLC10
|
-110.780 +/- 8.874
|
-27.165 +/- 11.678
|
78.280 +/- 18.602
|
-11.361 +/- 0.621
|
-71.026 +/- 10.219
|
DLC11
|
-108.314 +/- 9.016
|
-12.881 +/- 8.995
|
53.002 +/- 6.521
|
-10.721 +/- 0.591
|
-78.915 +/- 8.438
|
From Table 1, it can be seen that ELC1, ELC13, DLC1, and DLC3 showed better binding affinity than ref_lig. However, these lead molecules are discarded in the broad-spectrum purpose as either the number of H-bonds was decreased from the initial structure or the H-bond occupancy was obtained lower than the other lead molecules. Therefore, the lead molecules ELC2, ELC4, ELC5, ELC8, ELC11, ELC12, DLC4, and DLC6 have been selected. These selected ELC and DLC lead molecules and the other known herbicides for D1 protein of PS-II protein were undertaken to predict the physiochemical properties (molecular weight, logP, AlogP, number of hydrogen donors and acceptors, total polar surface area (TPSA), number of rotatable bonds, number of acidic groups, number of rigid bonds, number of aromatic rings and number of hydrogen bonds) (Fig. 11, Table S7).
From the data presented in Fig. 11 Table S7, it is remarkable that the properties of all the proposed molecules are comparable to known PS-II herbicides and having no toxic functional groups. Therefore, proposed molecules (ELC2, ELC4, ELC5, ELC8, ELC11, ELC12, DLC4, and DLC6) considered as potential lead molecules for the experimental testing and further authentication (structures are shown in Fig. 12); ZINC, PubChem ID, and IUPAC name are shown in Table S8. However, only four molecules (ELC8, ELC5, ELC11, and ELC12) were synthesized among the proposed eight molecules. Other molecules were not synthesizable due to the high polarity of intermediates, thereby increasing the cost of synthesis which would not be viable as an herbicide.
Table 2
Synthesized molecules for laboratory testing.
SI. no.
|
Prioritized small molecules
|
Code
|
PubChem/ZINC ID.
|
1.
|
4-chloro-2,6-bis[(5-chloro-2-hydroxyphenyl)methyl]phenol
|
ELC11
|
PubChem ID.- 62616
ZINC ID.- 1679491
|
2.
|
1-benzyl-3-(5-tert-butyl-1,2-oxazol-3-yl)-1-methylurea
|
ELC8
|
PubChem ID.- 20330566
ZINC ID.- 59961084
|
3.
|
6-ethoxy-4-N-(2-morpholin-4-ylethyl)-2-N-propan-2-yl-1,3,5-triazine-2,4-diamine
|
ELC5
|
PubChem ID.- 767097
ZINC ID.- 19520234
|
4.
|
2-[[4-chloro-6-(2-hydroxyethylamino)-1,3,5-triazin-2-yl]amino]-2-methylpropan-1-ol
|
ELC12
|
PubChem ID.- 22333149
ZINC ID.- 6721886
|
3.7. Whole-plant bioassay
3.7.a. Testing the bioactivity of synthesized molecules
The tabulated data of untreated control, treated control with isoproturon (ref, and treated with test molecules (Table 2) were compared to interpret the effect of test molecules against P. minor and wheat with respect to the solvent (2% DMSO ) and distilled water. Non-treated seedlings have a healthy growth rate as compared to treated seedlings. Besides late germination in treated seeds, yellowing of leaflets of treated plantlets has been observed after one week of treatment. Progression in the yellowing of leaflets has been observed in the treated plantlets, reflecting the visible chlorosis in treated plantlets which could be an indirect reflection of hindrance in photosynthesis thereby loss in biomass accumulation. At the end of the experiment, biomass accumulation in plantlets has been tabulated. Biomass accumulation has been compared among the test molecules and isoproturon (ref), to determine the most efficient molecule among the four test molecules graph has been plotted (Fig. 13, Table S9). From the graph Fig. 13, it has been recorded that there is no effect of either control isoproturon or test molecules upon wheat plantlets. However, it has been observed that there is a selective effect of test molecules upon P. minor, with different activity range i.e. ELC5 > ELC12 = ELC8 > ELC11. The efficacy and selectivity of ELC5 (6-ethoxy-4-N-(2-morpholin-4-ylethyl)-2-N-propan-2-yl-1,3,5-triazine-2,4-diamine) is comparable as isoproturon. Thus, it has been further tested and evaluated to determine the GR50 of ELC5 molecule.
3.7.b. Testing the efficacy and sensitivity of ELC5 molecule in laboratory-controlled condition
The ELC5 has been further tested to evaluate its activity at different concentrations (0-600 µM at a regular interval of 40 µM), to determine its GR50. The plot (Table S10) decipher that there is no effect of either isoproturon or test molecule ELC5 upon wheat at any concentration, thus promising the selective activity of ELC5 against P. minor only. The tabulated data of biomass has been statistically analyzed using a descriptive statistics (Confidence level: 95%) approach to verify the truthfulness of the results. Upon further evaluation of tabulated data (Table S10), it has been interpreted that upon the treatment of plantlets with ELC5 there is ~ 50% reduction in biomass of P. minor plantlets at the concentration of ~ 40 µM (Fig. 14).
ELC5 (6-ethoxy-4-N-(2-morpholin-4-ylethyl)-2-N-propan-2-yl-1,3,5-triazine-2,4-diamine) molecule has shown comparable activity to isoproturon a reference molecule against P. minor under laboratory-controlled conditions. It could be further taken for pot assay and field assay (pilot-scale synthesis and testing for various parameters) to test its effectiveness, efficacy, selectivity, and sensitivity in a natural uncontrolled environment. The outcome of the assays could suggest whether the molecule is ready to use or need some chemical modification and optimization.