Exploration of natural product database for the identification of potent inhibitor against IDH2 mutational variants for glioma therapy

Mutation in isocitrate dehydrogenase 2 (mIDH2) is an oncogenic driver prevalently reported in various cancer types including gliomas. To date, enasidenib is the only FDA-approved drug widely used as a mIDH2 (R140Q) inhibitor. However, dose-limiting toxicity and modest brain penetrating capability restrict its use as a plausible mIDH2 inhibitor. Furthermore, secondary site mutations (Q316E and I319M) were identified in patients with enasidenib treatments resulting in acquired therapeutic resistance. Hence, in the present investigation, we aimed to identify novel and potent drug-like compounds to overcome the existing drawbacks using an integrated in-silico strategy. A sum of 1574 natural compounds from the naturally occurring plant-based anti-cancerous compound activity target (NPACT) database was proclaimed and subjected to molecular docking. The binding affinities of the resultant natural compounds were rescored using MM-GBSA scoring functions. The resultant lead molecules were subjected to anticancer activity prediction using the machine-learning model. Furthermore, the toxicity and drug-likeliness of the lead compounds were investigated using ADMET properties. Eventually, the integrated in silico approach resulted in a lead molecule, namely squalene (NPACT00954) against mIDH2 protein. The screened compound was subjected to mutational analysis accomplishing second-site mutations. Interestingly, squalene exhibited appreciable binding affinity alongside good brain penetrating potential than enasidenib. Indeed, the reproducibility and significance of our results are examined by running 3 replicas of 100-ns simulations per system using the random initial velocities of the atoms generated by Maxwell distribution at a given temperature. Thus, we hypothesize from our results that further optimization of squalene could be beneficial for the treatment and management of glioma in the near future.


Introduction
Gliomas are the most established primary tumors of the central nervous system with poor prognoses [1]. On the basis of tumor cell origin, gliomas are classified into astrocytomas, oligodendrogliomas, and ependymomas. Whilst the histopathological classification, grades the aggressiveness of the tumor from I to IV in which grades I and II are lowgrade gliomas (LGG) and the other two types are considered high-grade gliomas (HGG). In 2016, the World Health Organization (WHO) attempts to classify CNS tumors by combining both molecular markers and histopathologic analysis. The isocitrate dehydrogenases (IDH) mutational status remains the predominant choice for the abovementioned classification. [2]. Of note, > 80% of WHO grade II and III patients were reported with IDH mutations and 73% of grade IV (secondary glioblastoma) cases with IDH mutation [3,4]. Even in primary glioblastomas, clinical cases are reported to have IDH mutation [4]. It is worth mentioning that this reclassification of tumors based on IDH mutations improves the prediction of gliomas with increased accuracy.
Isocitrate dehydrogenase is the driver enzyme that participates in major cellular and metabolic pathways such as the Krebs cycle, lipogenesis, redox regulation, and glutamine metabolism [5,6]. Primarily, IDH proteins catalyze the conversion of isocitrate to α-ketoglutarate via an oxidative decarboxylation reaction that reduces NADP to NADPH [7]. Of note, the neomorphic mutations in IDH protein acquire gain of activity that promotes the pathogenesis of numerous cancer types including glioma (80%) by excessive production of oncometabolite, 2-hydroxyglutarate (2-HG). The missense mutations associated with tumor progression tend to replace the highly polar arginine residues with low-polarity amino acids [8]. Thus, the mutated amino acid possesses decreased binding affinity for isocitrate and necessitates the demand for NADPH. The commonly identified mutations of IDH2 protein are R140Q or R172K. The shreds of literature evidence highlight that the IDH mutations were recognized in the early stage of glioma pathogenesis and persist throughout the cancer progression [9,10]. The treatment regimens for glioma were limited to surgical resection in combination with chemotherapy or radiation therapy [11].
In view of discovering potential antagonists against the target protein IDH2, few studies were accomplished using in vitro and in silico strategies. For instance, AGI-6780 is one of the early reported mutated isocitrate dehydroge-nase2 (mIDH2) inhibitors that target acute myeloid leukemia (AML) cells with R140Q mutations [12]. This study also reported to reduce the activity of both intra-and extracellular 2-HG levels in the in vitro experiments. However, the ineffective target selectivity of AGI-6780 remains a major limitation for its use in therapeutic applications. Furthermore, enasidenib (AG-221) was identified as the effective and highly selective inhibitor against IDH2 (R140Q) with an IC 50 value of 100 nM. Subsequently, in 2017, the US Food and Drug Administration (FDA) approved AG-221 as a plausible inhibitor against relapsed or refractory AML possessing IDH2 mutation [13]. Despite its effective inhibition, few adverse side effects such as nausea, differentiation syndrome, and leukocytosis were encountered by some patients in phase trials [14]. Shreds of literature evidence highlight that AML patients develop therapeutic resistance against enasidenib due to the emergence of the secondary mutations (Q316E and I319M) in the allosteric site of mIDH2 protein [15][16][17]. Thus, identifying novel and potent inhibitors against both the primary and secondary mutants of IDH2 protein would be of immense significance. Hence, in the present study, the effort has been made to deduce the highly selective and potent mIDH2 protein inhibitors for glioma treatment.

Dataset preparation
The 3D X-ray diffraction structure of mutated (R140Q) isocitrate dehydrogenase 2 protein with PDB code: 6VFZ was reclaimed from Protein Data Bank [18]. The structure possessed the resolution and R-free value of 1.99 Å and 0.209 respectively, which emphasize the good quality of the protein structure [19]. On contrary, the secondary mutational structures of the mIDH2 protein were generated by mutating the positions of glutamine 316 and isoleucine 319 to glutamic acid and methionine respectively. The reliability of the structures and the non-bonded interactions within the atoms were statistically evaluated using the ERRAT web server [20]. As per the literature evidence, the protein structure with an ERRAT score > 50 was considered an efficient model [21]. Initially, the crude protein structure of mIDH2 was refined using the protein preparation wizard of Schrödinger software. Furthermore, the structure of the known inhibitor, enasidenib, was procured from the PubChem database and subjected to structural optimization using the LigPrep module. During the structural refinement, the hydrogen bonds and formal charges were added, and non-interacting water molecules were removed. Finally, the bond order and tautomerization state of protein and ligand molecules were minimized using the OPLS3e force field [22]. The spatial data file (SDF) of 1574 small molecules from the naturally occurring plant-based anti-cancerous compound activity target (NPACT) database was extracted and utilized as the library for the virtual screening process.

Molecular docking
The molecular docking-based virtual screening was carried out to filter the compounds that have effective binding characteristics at the allosteric site of the mIDH2 protein.
Firstly, the cubic energy grid was generated around the crucial binding site residues of the target protein for accurate docking. Furthermore, the prepared compounds from the NPACT library were subjected to extra precision (XP) docking facilitated by the glide module of the Schrödinger suite. The XP docking achieves the advantage of employing a special scoring function that recognizes the structural scaffold crucial for target binding [23]. Thus, compounds with the lowest XP GScore were identified and compared with known inhibitors (enasidenib) for screening the hit molecules.

Binding free energy calculations
The binding free energies of protein-ligand complexes were estimated with the aid of the prime module of Schrödinger software as a post-docking validation tool. The literature evidence highlights the predicted binding calculations from the prime molecular mechanics; the generalized born model and solvent accessibility (MM-GBSA) module correlate well with experimental values [24,25]. For instance, the MM-GBSA calculations in the prime module include VSGB 2.0 polar solvation model, OPLS force field and van der Waals, and solvent accessible surface area for nonpolar expression [26]. Here, the pose viewer file from Glide XP docking was used as a query to estimate the binding free energies of each docked complex. The MMGBSA calculations were estimated using the following equation.
where ΔE MM represents the energy difference between the protein-ligand complex and the energy summation of ligand-free protein and liganded protein, ΔG solv represents the energy difference in solvation energy of protein-ligand complex and the energy summation of ligand-unbound protein and ligand-bound protein and ΔG SA represents the energy difference in the surface area for the protein-ligand complex and the energy summation of surface area for ligand-free protein and liganded protein.

In silicobiological activity prediction
The biological anticancer activity of the reference and the screened lead compounds was examined using the pdCSMcancer tool. It is worth noting that the comprehensive anticancer bioactivity prediction platform, pdCSM-cancer, was trained and validated with the aid of clinical data of over 18,000 compounds, on 9 tumor types and 74 distinct cancer cell lines [27]. The graph-based signature of the small molecules was utilized to predict the biological activity against numerous cancer cell lines. The predicted anticancer activity of the small molecules was quantified in terms of growth inhibitory concertation (GI50%).

Drug-likeliness screening
A potent lead compound should possess satisfactory drug-like characteristics such as absorption, distribution, metabolism, and excretion (ADME) that are essential to overcome the failure in the drug development process. Thus, the pharmacokinetic and pharmacodynamic properties of the lead molecules were examined using the QikProp module of the Schrödinger suite to screen the druggability of the lead molecules. For instance, the QikProp module is equipped with a diverse set of descriptors, in which a few important descriptors such as Lipinski's rule of five (Ro5), Jorgensen rule of three (Ro3), central nervous system (CNS) activity, blood-brain-barrier (LogBB), and human oral absorption percentage were accounted for screening. Furthermore, the toxicity profile of the hit compounds was examined using ProTox II online prediction tool [28]. Finally, the biological activity of the hit molecules is speculated using in silico PASS prediction tool [29].

Molecular dynamics simulation
The MD simulations for the enasidenib and hit complex were carried out using Gromacs version 2020.3. The GRO-MOS96 54A7 and PRODRG servers were utilized to generate the topology of the target protein and the hit compounds respectively [30,31]. The GROMOS Force-Field Parameter Set 54A7 has been reported to enhance the maintenance of the initial protein structure and thereby improve the stability of the secondary structure of the protein system. The results were found to be in agreement with experimental data [32]. Furthermore, the systems were placed in a dodecahedron box with a minimum 1-nm distance between the solute and the edge of the box. Furthermore, the complexes were solvated using a simple point charge (SPC) water model. The systems were neutralized by adding two chlorine ions using the genion utility of the gromacs tool. The particle mesh Ewald method (PME) and LINCS algorithm were used to estimate the long-range electrostatic interactions and the length constraints of hydrogen bonds [33,34]. The short-range cutoffs for the complex systems were found to be 1.2 nm. In addition, the energy minimization was carried out using the steepest descent integrator. The positional restrain was carried out for each system using the NVT and NPT ensemble equilibration process at 300 K temperature and 1 bar pressure respectively for 100 ps [35]. Finally, the equilibrated systems were subjected to 100-ns MD simulation with a 2-fs time interval. To reassure the results of MD simulations, 3 replicas of the reference and hit complexes were sampled for 100 ns each at varying initial atomic velocities. The initial velocities of the atoms were randomly generated on the basis of Maxwell distribution at the known temperature using the random seed (gen_vel = yes and gen_seed = − 1) [36]. The resultant trajectories were plotted using a grace tool to visualize various parameters such as root mean square deviation (RMSD), root mean square fluctuation (RMSF), and radius of gyration (Rg), and solvent accessible surface area (SASA) generated using inbuilt gromacs utilities.

Virtual screening
The natural compounds from the NPACT database were reclaimed and pre-processed by employing the LigPrep module of the Schrödinger suite. Prior to molecular docking, the energy grid was generated by enclosing the conserved residues of the mIDH2 protein reported in the literature. For instance, the conserved residues incorporated in the grid include Val161, Trp164, Val294, Val297, Leu298, Trp306, Glu316, Ile319, Leu320, and Gly323 [37][38][39]. The grid coordinates (− 32.31, − 14.10, − 27.02) encompassing these residues serve as the binding site for the library compounds during the course of molecular docking. Initially, the docked pose of the reference was superimposed with the available structure in PDB (PDB code: 5I96) to verify the reproducibility of the crystallographic binding pose. The conformational orientation of docked complex was verified using the RMSD value. It is inferred that the RMSD value of the superimposition was found to be 1.01 Å. This result was considered a docking success due to the pose RMSD value of less than 2.0 Å from the crystallographic reference [40]. Subsequently, the entire set of 1574 compounds was docked using the abovementioned parameters. The XP GScore of all the natural compounds ranged from − 8.54 to − 0.02 kcal/ mol representing its binding potential against mIDH2 protein (Fig. S1).

MM-GBSA calculations
The Prime MM-GBSA module was used as a post-docking validation tool that reconfirms the binding affinity of each ligand molecule towards the target protein. The post viewer file from XP docking was used as a query for free energy calculations. The binding free energies of all the docked compounds vary from − 93.86 to − 8.00 kcal/mol (Fig. S2). It is noted that a total of 515 molecules exhibited the lowest binding energy in comparison with the reference molecule (− 39.12 kcal/mol). Moreover, various interactions such as lipophilicity, covalent bond, van der Waals interactions, and non-polar interactions culminate in the relative binding energy of each complex. The results from MM-GBSA analysis highlight that van der Waals and lipophilic energies of the ligand molecules contribute majorly to the binding free energy and thereby facilitating strong binding in the active site of the target protein (Table 1).

Secondary site mutations
In order to attain the goal of identifying novel inhibitors against resistant mIDH2 variants, the scrutinized natural compounds were docked against the second site mutations located at residue glutamine 316 and isoleucine 319 respectively. Initially, the crystal structure of the target protein with R140Q mutant was co-mutated with either Q316E or I319M. The ERRAT scores for both mutants were found to be 96.5 which emphasizes the high structural quality. The energy grids were generated by enclosing the active site residues and each mutated structure (mIDH2 R140Q/Q316E and mIDH2 R140Q/I319M ) along with the screened hit molecules were subjected to hierarchical docking modes. Furthermore, the docking scores from glide were reaccredited using prime MMGBSA free energy calculations. Collectively, a total of 61 compounds displayed better ΔG bind against all the investigated mutant structures (Table 2). Subsequently, pdCSMcancer was incorporated to predict the growth inhibition concentration (GI50%), distribution of compounds, and mIDH2 variants.

pdCSM-based activity prediction
In the present study, the biological activity of the reference and the hit molecule was predicted using a machine learning-based online tool called pdCSM-cancer [27]. The GI50% values of the reference and hit molecule against glioma specific cell line (U251) were found to be 4.74 and 4.56 respectively. It is evident from the results that 19 molecules were found to be active against the U251 glioma cell line as it has GI50% > 5 [41]. In particular, 11 compounds showed better GI50% than enasidenib (Table 3). These compounds were further examined based on ADMET screening and molecular dynamics simulations studies.

ADMET screening
The pharmacological characteristics of lead molecules play a vital role in drug design and development. Of note, several potent lead molecules that express good inhibitory potential fail to satisfy the ADMET properties in clinical trials [42,43]. Thus, in the present study, efforts were taken to extensively analyze the drug-likeliness of the scrutinized hit molecules using the QikProp program of the Maestro interface.
The results of the ADMET analysis are tabulated in Table 4. The percent of human oral absorption of the lead molecules such as NPACT00164, NPACT00185, NPACT00551, NPACT00954, NPACT01432, and NPACT01479 was found to be greater than 80%. Moreover, it is worth mentioning that the therapeutic efficacy of enasidenib was questioned because of its deprived brain penetrating potential (− 0.467) and toxicity endpoints. Amongst the 6 molecules, only three compounds, namely NPACT00164, NPACT00551, and NPACT00954, were predicted to have a positive CNS and optimum logBB score. On top of ADME screening, the toxicity analysis was performed using ProTox II online server to examine the toxicologic endpoints such as cytotoxicity, carcinogenicity, hepatotoxicity, and immunotoxicity [28]. Only the lead molecule (NPACT000954) did not show toxicological effects on the above-mentioned endpoints whereas enasidenib was reported to exhibit immunotoxicity (Fig. 1). Finally, the biological activity of the lead molecules was predicted by evaluating the probability of actives (Pa) and inactives (Pi) using the PASS server [29]. The hit compound NPACT00954 was observed to exhibit antineoplastic activity against brain tumor cell lines with a higher probability of an active (Pa = 0.454) than inactive (Pi = 0.05) score.

Interaction profiling
The binding chemistry of the reference and hit complex in the active site of IDH2 mutational variants was analyzed using a protein-ligand interaction profiler (PLIP) [44]. Figure 2 represents the interaction profiling of enasidenib and NPACT00954 complexed with three mutational variants. The IDH2 R140Q /enasidenib was observed to exhibit its binding affinity through hydrophobic and hydrogen bond interactions. For instance, the reference compound was able to maintain hydrogen bond interactions with residues Gln 316 and Val 294 highlighted in blue-colored lines. The dashed grey lines in Fig. 2 (Fig. 2). Whilst NPACT00954 was consistently having seven  Fig. 2. Of note, the decreased hydrogen and hydrophobic interactions of enasidenib with conserved residues of target protein were due to the steric effect caused by co-mutation. The increased hydrophobic interactions of the hit molecule with IDH2 mutational variants might contribute to its high inhibitory effect. In essence, some of these interactions were established with the conserved residues of the target protein, and some unique residues such as Trp 207, Leu 255, Tyr 258, and Tyr 311 showed persistent interactions upon NPACT00954 binding [38]. These results reiterate the contributions of the weaker interactions in the ligand binding. For instance, shreds of literature highlight that weak cation pi interactions and hydrophobic interactions play a crucial role in stabilizing and selective binding of ligand molecules towards the target protein. In particular, the study by Gromiha et al. emphasize the contribution of cation-pi interactions in the stable binding of DNA binding proteins [45]. In another study, Patil et al. reported that the increased hydrophobic interactions in the active site of c-Src and c-Abl proteins enhance the stability and binding affinity of the protein-ligand complex [46]. Based on this evidence, we believe that the existence of hydrophobic interactions in the complex structure could significantly contribute to the selective and enhanced binding of the hit molecule.

Scaffold analysis
Shreds of literature evidence highlight that the existence of mild polar groups such as trifluoromethyl pyridine and 2-methyl 2-propanol contributes to the overall binding potency of enasidenib [13]. The structural representation of both the enasidenib and NPACT00954 is depicted in Fig. 3 The screened hit NPACT00954 (Squalene) is a natural compound commonly found in olive oil and shark liver oil. An extensive study on squalene highlights its multifunctional role as an antioxidant, anti-infectant, detoxifier, anticancer, and skin-hydrating agent [47]. In particular, there is an emergence in the use of squalene as supportive therapy for several tumor types including skin, lung, and breast cancers [47][48][49]. It is believed that the existence of triterpene moiety in the lead compound (NPACT00954) could contribute to its enhanced anti-tumor activity. Of note, the triterpene moiety was reported to possess anti-tumor activity against glioblastomas [50]. Indeed, collective evidence from our results highlights that squalene has been utilized to prevent or arrest tumor growth in conjunction with anticancer drugs. The results of our study correlate well with various experimental tumor model studies reported in the literature [51]. Squalene achieves its mechanism of action through the inhibition of the lipid metabolic pathway, a plausible molecular target for the management of different cancers [52][53][54][55]. Precisely, squalene inhibits cancer cell proliferation by decreasing farnesyl pyrophosphate levels, thereby reducing its availability for prenylation, which is required for the activation of an oncogene, Ras [56]. Through isoprenylation, the ras and other important proteins are localized to the cell membrane via farnesylpyrophosphate. This process plays a crucial role in the progression of the neoplastic cells, without which the ras oncogene finds it difficult to survive.
Squalene has extended its role in minimizing certain tissue damages, in addition to enhanced cytotoxicity in cancerous cells [57]. It is to highlight that squalene in combination with chemotherapy agents can potentiate cytotoxic activity against cancer in humans. For instance, experimental evidence proves that the combination of squalene with nitrosourea in the murine model potentiated the efficacy of nitrosourea against leukemia [58]. Furthermore, the combinatorial effect of squalene with chemotherapeutic drugs such as 5-fluorouracil, Adriamycin, cisplatin, bleomycin, and cyclophosphamide has increased the antitumor activity and cytotoxicity. Amongst the other chemotherapeutic drugs, bleomycin and adriamycin were found to exhibit increased antitumor activity against cancer cells [59]. All this evidence recognizes squalene to exhibit significant antioxidant and anticancer properties and could be used as a supportive medicine in combination with enasidenib.

Conformational stability analysis
The binding of enasidenib and squalene in the catalytic pocket of the target can result in conformational variation in the mIDH2 structure [60]. Estimation of the RMSD of each complex could serve as an important attribute to characterize the stability of the complex. Consistently, the black trend line represents the results of the enasidenib complex and the red line represents the squalene complex. The results of the RMSD plot depicted in Fig. 4 suggest that the binding of enasidenib showed less deviation in the RMSD until 30 ns. The complex attains the highest deviation of ~ 0.64 nm between 30 and 50 ns and the plateau decreased and maintained a stable deviation of ~ 0.55 nm till the end of the 100-ns simulation period. In contrast, the binding of squalene resulted in lower RMSD deviation than the reference complex until the 10-ns simulation and maintained a stable plateau till 30 ns. Within the time frame of 30 to 100 ns, the hit complex was found to deviate between ~ 0.53 and ~ 0.56 nm respectively. Of note, it is worth mentioning that the hit complex exhibited a lower deviation between 30 and 50 ns, whereas the reference complex showed a maximum deviation within the stipulate time boundary. At the end of the 100-ns simulation, the mean RMSD of enasidenib and squalene complex was found to be 0.56 nm and 0.57 nm respectively. Thus, from the RMSD plot, we suggest that the orientation of binding for both complexes was similar, highlighting their stable conformation.
The trajectories of reference and the hit-mIDH2 complex from the MD simulation were utilized to examine the RMSD of the ligand. The result is shown in Fig. 4b. It is evident from the figure that the RMSD trend of enasidenib was found to fluctuate between 0.13 and 0.4 nm until the mentioning that NPACT00954 equilibrated and maintained a stable RMSD value of 0.4 nm from 40 to 100 ns with a negligible variation at 55 ns. The ligand RMSD of the enasidenib and the hit compound was found to be 0.33 nm and 0.39 nm, respectively, at the end of the 100-ns simulation period. Overall, from the RMSD plot, we infer the hit compound depicts stable binding with mIDH2 protein than the reference compound.

Residual flexibility analysis
The atomistic fluctuations of each residue were assessed to gain insights into the overall flexibility of the mIDH2 structure upon ligand binding. The residual fluctuations of active site residues from mIDH2 protein are depicted in Table S1. It is clear from the table that amongst 51 active site residues, 66% of active site residues in the mIDH2 bound squalene complex showed lesser residual fluctuation. It is worth mentioning that the conserved residues, namely Lys 199 and Asp 202, were found to be highly fluctuating (> 0.2 nm) when bound to the enasidenib complex. On the other hand, the squalene-bound complex showed lesser fluctuation (< 0.12 nm) when bound to the target protein (Fig. S3). We believe that the decrease in the fluctuation of the residues from the hit complex depicts their involvement in the binding of the squalene compound. Altogether from the RMSF plot, we understand that the residual fluctuation of the mIDH2-squalene system was much lower when Fig. 2 The interaction scheme of a enasidenib and b NPACT00954 within the active site of mIDH2 R140Q , mIDH2 Q316E , and mIDH2 I319M respectively Fig. 3 The 2D structure of a enasidenib and b NPACT00954 highlighted with functional key scaffolds compared to the reference complex and thus emphasizing its structural stability.

Compactness evaluation
The radius of gyration is directly correlated to the volume of the tertiary structure of the target protein. The Rg value of the reference and the hit complex was calculated using the gmx gyrate tool from gromacs. The Rg trend of the reference complex falls in the range of 2.09 to 2.34 nm, whereas for the hit complex, the range varies from 2.14 to 2.33 nm. The average Rg for enasidenib and squalene was found to be 2.09 nm and 2.14 nm respectively. The stable protein-ligand complex is supposed to possess a higher Rg value due to less tight packing. The results from the radius of gyration were found to be in agreement with the literature evidence [61].
Thus, it is clear from Fig. S4 that the squalene complex is more stable than the reference complex.

SASA
The exposure of the side chain residues around the solvent surface was estimated using the gmx sasa tool of gromacs software. Figure S5 illustrates the SASA plot of the reference and the hit complexes. From the figure, we infer that the enasidenib and squalene-bound complexes were following a similar SASA trend. The average SASA value of enasidenib and squalene systems was found to be 225.37 nm 2 and 224.39 nm 2 respectively. In addition, the free energy of solvation of both complexes was measured. The solvation-free energy of mIDH2-enasidenib and mIDH2squalene complexes were observed to be 1508.74 kJ/mol/ nm 2 and 1502.172 kJ/mol/nm 2 respectively. The decrease in the SASA value of the mIDH2-squalene complex emphasizes the stability of the system upon ligand binding.

Essential dynamics
The essential dynamics of enasidenib and squalene complexes reflect the concerted motion of protein structure upon ligand binding [62]. In the present investigation, the dynamic movements of mIDH2 protein were evaluated using covar and anaeig utilities of the gromacs tool. The overall motility of each system was measured by summing the eigenvalues that ultimately depict the flexibility of the complexes [63]. The initial eigenvectors describe the concerted motion of mIDH2 within the conformational subspace. The trace covariance matrix encompassing eigenvalues of mIDH2enasidenib and mIDH2-squalene was observed to be 13.75 nm 2 and 13.68 nm 2 respectively. The lower trace value of squalene, when compared to the enasidenib complex, emphasizes the decrease in random fluctuations of protein upon ligand binding. Figure S6 illustrates the 2d projection of enasidenib and squalene trajectories along the conformational phase space. It is evident from the figure that both complexes occupied similar conformational phase space. The atomistic density distribution plot of enasidenib and squalene complex were depicted in Figs. S6 b and c respectively. The results from the distribution plot highlight that there is a significant difference in the total movement of each complex [64]. For instance, the difference in the movement of atoms would arise due to the stable binding of squalene with the target protein. In addition, the color code in the distribution plot illustrates the movement of atoms along the phase space. The blue-colored region denotes the anti-correlated movement of atoms and the red color region depicts the correlated movements. From Fig. S6 b and c, we infer that the mIDH2-enasidenib complex showed a strong anticorrelated motion whereas the squalene complex exhibited a Furthermore, to evaluate the reproducibility of the MD results, both the systems were subjected to 3 replicas of 100-ns simulations per system under random initial velocities. The parameters such as RMSD, RMSF, Rg, and SASA were analyzed for each replication which is highlighted in Figs. S7, S8, S9, and S10 respectively. From the figures, we infer that each replicate showed only marginal deviation from the initial simulation. These marginal differences in the trend between the replicas might be a result of under-sampling or exploration of local neighborhood space rather than global solution space [65]. For instance, the RMSD values of both the reference and hit complexes lie between ~ 0.56 and ~ 0.57 nm in the initial run. Whilst the replicates exhibited the RMSD from ~ 0.46 to ~ 0. 55 nm during the second and third replicas of the simulation. It is worth noting that other parameters such as RMSF, Rg, and SASA also showed similar patterns during the simulation. Indeed, all these 3 replica results highlight the reproducibility and significance of the simulation results.

Conclusions
Our present study focuses on exploring the feasible and potent drug-like candidates against mIDH2 and its associated second-site mutational variants using an integrated virtual screening approach. Initially, structural intricate and binding poses of the target protein upon ligand binding were analyzed using molecular docking and prime MM-GBSA calculations. The increased van der Waals energy and lipophilic contributions observed in this analysis highlight the existence of hydrophobic interactions in the binding of the hit compounds. Furthermore, the results from secondary mutational site docking reveal the efficacy of the hit compounds towards the diverse mutational variants. In addition, the docking accuracy of each mutational variant was reassured using binding free energy calculations, which resulted in 61 lead compounds with better binding energies. The resultant lead molecules were then scrutinized based on the machine learning-based pd-CSM-cancer tool. In addition, the bioavailability and toxicity profile of the resultant 11 compounds were examined using QikProp and Pro-Tox II modules. Interestingly, only one compound, namely NPACT00954 (Squalene), exhibited better pharmacokinetic properties with an enhanced positive log BB value of 2.01. The scrutinized hit along with the reference was subjected to three replica sampling per system of 100 ns in which each system was assigned with different initial velocities in accordance with Maxwell distribution. Altogether, the results from the MD simulation emphasize the stable binding of squalene towards mIDH2 protein. Importantly, the literature evidence also highlights that the triterpene scaffold of Squalene was reported to possess anti-cancer activity. It is evident from the results that squalene is a potential inhibitor of mIDH2 which binds stably when compared to the known inhibitor enasidenib. As there is currently no experimental evidence of squalene mIDH2 inhibitory activity, this is merely something worth exploring further to confirm the findings of our study. If confirmed, further optimization of squalene as a lead compound would still be necessary before it could potentially be used as a mIDH2 inhibitor.