Similarity calculation
Our study attempts to perform a similarity search by extrapolating the structural information of reference compounds with reported inhibitory activity and presume structurally similar compounds from the diverse natural compound dataset [38]. Thus, the structural relatedness of each compound in the NPACT library was estimated using the Tanimoto coefficient. The Tc values of the library compounds varied from 0.21 to 0.51. The literature evidence illustrates that the identification of compounds with a similarity range above 0.25 would have potential activity similar to the reference structure [39]. Thus, molecules possessing a threshold Tc value ≥ 0.31(average value) were scrutinized for further process. Finally, a total of 617 molecules that exhibited satisfactory similarity metrics to enasidenib were used for the virtual screening protocol.
Virtual Screening
The natural compounds from the NPACT database with Tc greater than 0.31 were reclaimed and pre-processed by employing the LigPrep module of the Schrödinger suite. Prior to molecular docking, the energy grid was generated by enclosing the conserved residues of mIDH2 protein reported in the literature. For instance, the conserved residues incorporated in the grid include Val161, Trp164, Val294, Val297, Leu298, Trp306, Glu316, Ile319, Leu320 and Gly323 [40, 41]. The grid coordinates (-32.31, -14.10, -27.02) encompassing these residues serve as the binding site for the library compounds during the course of molecular docking. The three-tier docking protocol (HTVS, SP and XP) from the Glide suite of Maestro interface was utilized for molecular docking. Each docking mode employs different scoring functions to screen and rank the pose of ligand molecules based on their binding affinity towards the target receptor. Glide XP mode incorporates a sophisticated scoring function that distinguishes the effective binder from non-binders [42]. The XP GScore of all the natural compounds ranged from -8.36 kcal/mol to -0.02 kcal/mol representing its binding potential against mIDH2 protein.
MM-GBSA Calculations
The Prime MM-GBSA module was used as a post docking validation tool that reconfirms the binding affinity of each ligand molecule towards the target protein. The post viewer file from XP docking was used as a query for free energy calculations. The binding free energies of all the docked compounds vary from -60.48 kcal/mol to -8.00 kcal/mol. It is to note that, only 96 molecules exhibited the lowest binding energy in comparison with the reference molecule (-39.12 kcal/mol). Moreover, various interactions such as lipophilicity, covalent bond, van der Waals interactions, non-polar interactions, etc., culminate the relative binding energy of each complex. The results from MM-GBSA analysis highlight that van der Waals and lipophilic energies of the ligand molecules contribute majorly to the binding free energy and thereby facilitating strong binding in the active site of the target protein. A total of 42 natural compounds were observed to exhibit favourable van der Waals and lipophilic energies. In addition, the contribution of covalent interactions was also found to be significant. Therefore, compounds with lower covalent energies were scrutinized for lead optimization. At last, the screening resulted in only 6 compounds namely NPACT00954, NPACT01533, NPACT00031, NPACT00995, NPACT01511 and NPACT00957 that are thermostable with the aid of the van der Waals energy, lipophilic energy and covalent bond interactions (Table 1).
Secondary Site Mutations
In order to attain the goal of identifying novel inhibitors against resistant mIDH2, the scrutinized natural compounds were docked against the second site mutations located at residue glutamine 316 and isoleucine 319 respectively. Initially, the crystal structure of the target protein with R140Q mutant was co-mutated with either Q316E or I319M. The ERRAT scores for both the mutants were found to be 96.5 which emphasizes the high structural quality. The energy grids were generated by enclosing the active site residues and each mutated structure (mIDH2R140Q/Q316E and mIDH2R140Q/I319M) along with the screened hit molecules were subjected to hierarchical docking modes. Further, the docking scores from glide were reaccredited using prime MMGBSA free energy calculations. From Table 2, it is evident that all the 6 molecules possess better binding affinity and binding energies against the mIDH2 variants.
Random Forest-Based Scoring Function
The Random Forest-based scoring function implicitly recognizes the binding of protein-ligand complexes [43]. This scoring function achieves high performance by incorporating more than 15,000 actives and 8,90,000 inactive for training with more than 100 diverse target sets. Of note, the RF-Score was observed to outperform in many instances [44]. Thus, in the present study, all the lead molecules along with the reference were rescored against the primary and secondary mutant structures using RF-Score version v1-3. Interestingly, all 8 molecules showed a better RF score when compared to the reference compound (Table 3).
ADMET Screening
The pharmacological characteristics of lead molecules play a vital role in drug design and development. Of note, several potent lead molecules that express good inhibitory potential fail to satisfy the ADMET properties in clinical trials [45, 46]. Thus, in the present study, efforts were taken to extensively analyze the drug-likeliness of the scrutinized hit molecules using the QikProp program of the Maestro interface. The results of the ADMET analysis are tabulated in Table 4. The percent of human oral absorption of the lead molecules such as NPACT00954, NPACT00123, NPACT01511 and NPACT00957 was found to be greater than 80%. The Lipinski rule comprises the following descriptors namely Molecular weight (<500), octanol/water coefficient (QPlogPo/W < 5), hydrogen bond donor (HBD ≤ 5) and acceptor (HBA≤ 10) were examined. The screened hit molecules namely NPACT01533, NPACT00031, NPACT00995, NPACT01511 and NPACT00957 showed 2 violations in the Lipinski rule. Moreover, it is worth mentioning that the therapeutic efficacy of enasidenib was questioned because of its deprived brain penetrating potential (-0.467) and toxicity endpoints. Amongst the 6 molecules, only NPACT00954 was predicted to have a positive CNS and logBB score of 2 and 2.016 respectively. On top of ADME screening, the toxicity analysis was performed using ProTox II online server to examine the toxicologic endpoints such as cytotoxicity, carcinogenicity, hepatotoxicity and immunotoxicity [30]. The lead molecule (NPACT000954) did not show toxicological effects on the above-mentioned endpoints whereas enasidenib was reported to exhibit immunotoxicity (Fig 1). Finally, the biological activity of the lead molecules was predicted by evaluating the probability of actives and inactives using the PASS server [31]. The hit compound NPACT00954 was observed to show antineoplastic activity against brain tumors (Table 5).
Interaction Profiling
The binding chemistry of the reference and hit complex in the active site of IDH2 mutational variants was analyzed using a protein-ligand interaction profiler (PLIP) [47]. Fig 2-4 represents the interaction profiling of enasidenib and NPACT00954 complexed with three mutational variants. The IDH2R140Q/enasidenib was observed to exhibit its binding affinity through hydrophobic and hydrogen bond interactions. For instance, the reference compound was able to maintain hydrogen bond interactions with residues Gln 316 and Val 294 highlighted in blue-colored lines. The dashed grey lines in Fig 2 depict the hydrophobic interactions, in which enasidenib was found to form four hydrophobic interactions with Val 294, Tyr 311, Val 315 and Ile 319 residues of mIDH2 protein. Interestingly, the hit compound was able to maintain seven hydrophobic interactions with Leu 160, Val 161, Ile 290, Val 294, Tyr 311, Val 315 and Ile 319 residues of the mIDH2R140Q variant. In the case of IDH2Q316E mutation, two hydrogen bond formation and 2 hydrophobic interactions were found with Phe 192, Val 209 and Gln 178, Met 219 residues respectively (Fig 3). Whilst NPACT00954 was consistently having seven hydrophobic interactions with the residues of Phe 196, Trp 207, Lys 251, Leu 255, Tyr 258, Ile 290 and Tyr 311. For IDH2I319M mutation the reference was capable of forming three hydrogen bonds and 2 hydrophobic interactions. Interestingly, the hit compound was observed to exhibit six hydrophobic interactions with Met 194, Trp 207, Tyr 222, Leu 255, Tyr 258 and Tyr 311 which is highlighted in Fig 4. Of note, the decreased hydrogen and hydrophobic interactions of enasidenib with conserved residues of target protein were due to the steric effect caused by co-mutation. The increased hydrophobic interactions of the hit molecule with IDH2 mutational variants might contribute to its high inhibitory effect. In essence, some of these interactions were established with the conserved residues of the target protein, some unique residues such as Trp 207, Leu 255, Tyr 258 and Tyr 311 showed persistent interactions upon NPACT00954 binding [42].
Scaffold Analysis
Shreds of literature evidence highlight that the existence of mild polar groups such as trifluoromethyl pyridine and 2-methyl 2-propanol contributes to the overall binding potency of enasidenib [13]. The structural representation of both the enasidenib and NPACT00954 are depicted in Fig 5 The screened hit NPACT00954 (Squalene) is a natural compound commonly found in olive oil and shark liver oil. An extensive study on squalene highlights its multifunctional role as an antioxidant, anti-infectant, detoxifier, anticancer and skin hydrating agent [48]. In particular, there is an emergence in the use of squalene as supportive therapy for several tumor types including skin, lung and breast cancers [48- 50]. It is believed that the existence of triterpene moiety in the lead compound (NPACT00954) could contribute to its enhanced anti-tumor activity. Of note, the triterpene moiety was reported to possess anti-tumor activity against glioblastomas [51].
Molecular Dynamic Simulation
Conformational Stability Analysis
The binding of enasidenib and squalene in the catalytic pocket of the target can result in conformational variation in mIDH2 structure [52]. Estimation of the RMSD of each complex could serve as an important attribute to characterize the stability of the complex. Consistently, the black trend line represents the results of the enasidenib complex and the red line represents the squalene complex. The results of the RMSD plot depicted in Fig 6 suggest that the binding of enasidenib showed less deviation in the RMSD until 30 ns. The complex attains the highest deviation of ~0.64 nm between 30 to 50ns and the plateau decreased and maintained a stable deviation of ~0.55 nm till the end of the 100 ns simulation period. In contrast, the binding of squalene resulted in lower RMSD deviation than the reference complex until the 10 ns simulation and maintained a stable plateau till 30 ns. Within the time frame of 30 to 100 ns, the hit complex was found to deviate between ~0.53 and ~0.56 nm respectively. Of note, it is worth mentioning that the hit complex exhibited a lower deviation between 30 to 50 ns, whereas the reference complex showed a maximum deviation within the stipulate time boundary. At the end of the 100 ns simulation, the mean RMSD of enasidenib and squalene complex was found to be 0.56 nm and 0.57 nm respectively. Thus, from the RMSD plot, we suggest that the orientation of binding for both the complex was similar, highlighting their stable conformation.
Residual flexibility analysis
The atomistic fluctuations of each residue were assessed to gain insights into the overall flexibility of the mIDH2 structure upon ligand binding. The residual fluctuations of active site residues from mIDH2 protein were depicted in Table 6. It is clear from the table that amongst 51 active site residues 66% of active site residues in the mIDH2 bound squalene complex showed lesser residual fluctuation. It is worth mentioning that, the conserved residues namely Lys 199 and Asp 202 were found to be highly fluctuating (> 0.2 nm) when bound to the enasidenib complex. On the other hand, the squalene bound complex showed lesser fluctuation (< 0.12 nm) when bound to the target protein (Fig 7). We believe that the decrease in the fluctuation of the residues from the hit complex depicts their involvement in the binding of the squalene compound. Altogether from the RMSF plot, we understand that the residual fluctuation of the mIDH2-squalene system was much lower when compared to the reference complex and thus emphasizing its structural stability.
Compactness Evaluation
The radius of gyration directly correlated to the volume of the tertiary structure of the target protein. The Rg value of the reference and the hit complex was calculated using the gmx gyrate tool from gromacs. The Rg trend of the reference complex falls in the range of 2.09 nm to 2.34 nm whereas, for the hit complex, the range varies from 2.14 nm to 2.33 nm. The average Rg for enasidenib and squalene was found to be 2.09 nm and 2.14 nm respectively. The stable protein-ligand complex is supposed to possess a higher Rg value due to less tight packing. The results from the radius of gyration were found to be in agreement with the literature evidence [53]. Thus, it is clear from Fig 8 that the squalene complex is more stable than the reference complex.
SASA
The exposure of the side chain residues around the solvent surface was estimated using the gmx sasa tool of gromacs software. Fig 9 illustrates the SASA plot of the reference and the hit complexes. From the figure, we infer that the enasidenib and squalene compound bound complexes were following a similar SASA trend. The average SASA value of enasidenib and squalene systems was found to be 225.37 nm2 and 224.39 nm2 respectively. In addition, the free energy of solvation of both the complexes was measured. The solvation-free energy of mIDH2-enasidenib and mIDH2-squalene complexes were observed to be 1508.74 kJ/mol/nm2 and 1502.172 kJ/mol/nm2 respectively. The decrease in the SASA value of the mIDH2-squalene complex emphasizes the stability of the system upon ligand binding.
Essential Dynamics
The essential dynamics of enasidenib and squalene complexes reflect the concerted motion of protein structure upon ligand binding [54]. In the present investigation, the dynamic movements of mIDH2 protein were evaluated using covar and anaeig utilities of the gromacs tool. The overall motility of each system was measured by summing the eigenvalues that ultimately depict the flexibility of the complexes [55]. The initial eigenvectors describe the concerted motion of mIDH2 within the conformational subspace. The trace covariance matrix encompassing eigenvalues of mIDH2-enasidenib and mIDH2-squalene was observed to be 13.75 nm2 and 13.68 nm2 respectively. The lower trace value of squalene, when compared to the enasidenib complex, emphasizes the decrease in random fluctuations of protein upon ligand binding. Fig 10 illustrates the 2d projection of enasidenib and squalene trajectories along the conformational phase space. It is evident from the figure that both complexes occupied similar conformational phase space. The atomistic density distribution plot of enasidenib and squalene complex were depicted in Figs 10 b and 10 c respectively. The results from the distribution plot highlight that there is a significant difference in the total movement of each complex [56]. For instance, the difference in the movement of atoms would arise due to the stable binding of squalene with the target protein. In addition, the color code in the distribution plot illustrates the movement of atoms along the phase space. The blue-colored region denotes the anti-correlated movement of atoms and the red colour region depicts the correlated movements. From Figs 10 b and c, we infer that the mIDH2-enasidenib complex showed a strong anticorrelated motion whereas the squalene complex exhibited a restricted movement with net correlated motion highlighting the structural compactness.
Furthermore, to evaluate the reproducibility of the MD results, both the systems were subjected to 3 replicas of 100ns simulations per system under random initial velocities. The parameters such as RMSD, RMSF, Rg and SASA were analyzed for each replication which is highlighted in Fig. S1-S4 respectively. From the figures, we infer that each replicate showed only marginal deviation from the initial simulation. These marginal differences in the trend between the replicas might be results of under-sampling or exploration of local neighborhood space rather than global solution space [57]. For instance, the RMSD values of both the reference and hit complexes lies between ~0.56 nm and ~0.57 nm in the initial run. While the replicates exhibited the RMSD from ~0.46 nm to ~0. 55 nm during the second and third replicas of the simulation. It is worth noting that other parameters such as RMSF, Rg and SASA also showed similar patterns during the simulation. Indeed, all these 3 replica results highlight the reproducibility and significance of the simulation results.