The workflow for this study which included Activity models, Pharmacophore modeling, Molecular docking, Molecular dynamics (MD) simulations, Binding free energies calculation and cell assay is shown in Fig. 1.
2.1 Data Preparation
For the screening database, the specs database(http://www.specs.net) containing 212,736 compounds were first filtered by Lipinski and Veber rules and 75671 compounds obeyed the Lipinski's cut-off values. Subsequently, the remaining molecules were prepared with LigPrep module provided in the Maestro 12.1 (www.schrodinger.com) using the OPLS_2005 force filed and the protonation states were generated with Epik at PH of 7.4. The prepared database was used to generate 3D conformational set for each molecule using CAESAR in Discovery Studio 4.0 (http://www.accelrys.com) and saved them as Database 1.
For the known ligand database, in order to have a reliable and representative set of active ligands with wide structural diversity, we collected known ligands from literature and patent[11, 22–31]. In this study, compounds whose IC50 values (here reported as pIC50) were determined by the same procedure (fluorescence polarization assay) were considered. Moreover, compounds with defined chirality only were selected in order to have a univocal match between structure and activity. The final the known ligand database was constituted by 48 compounds with pIC50 values in the range 4.0 − 10.0 and sufficient structural heterogeneity. The structures of the collected known ligands are shown in Fig. 2.
For the known ligand database, the molecules in the database need to be properly prepared. Therefore, prepared the structures by washing, adding hydrogen atoms, and energy minimizing in MOE (2019.01). In Database Wash, set the Protonation Field to Dominant (Dominant protonation state at specified pH) and set the Coordinates Field to Rebuild 3D. Other settings with default values. After preparation, the molecules were stored in the mol file with properly protonation. The known ligand data set after preparation was named KLD.
2.2 Preparation of the 3D-Structure of HDM2
Induced-fit theory emphasizes that the ligand can induce the conformational change of the receptor during the binding process. Therefore, it is very important to select a good receptor structure. On this basis, HDM2 protein with a PDB ID of 5OC8 (https://www.rcsb.org/structure/5OC8)  was selected as the receptor of this study according to the good resolution. The crystal structure resolution is 1.56 Å and binding of protein HDM2 with small molecule ligand NVP-HDM2, which is a second generation p53–HDM2 inhibitor that had entered phase I clinical trial. Then Discovery Studio 4.0 (DS 4.0) was employed to the prepare protein by adding missing residues, hydrogen atoms as well as deleting water molecules, and then the structure was minimized by using SYBYL-X software (version 2.0) with AMBER99SB force field.
2.3 Activity Models
Active models such as QSAR and fingerprint were constructed based on known compounds can be used to filter the initial database.
2.3.1 QSAR Modeling
The QSARs are statistical models that attempt to correlate the structure and experimental properties (activity, selectivity, toxicity, etc.) with computed properties such as molecular descriptors and fingerprints. The QuaSAR suite of applications in MOE is used to build numerical models of the data for prediction and interpretation purposes. For instance, if we get a series of ligands with known activity, we can build a numerical models from the training set to predict the activity of molecules not in the training set.
In this study, the QSAR modeling calculations were performed by using QuaSAR Model in MOE. First, QSAR models used three molecular descriptors to calculate the molecular descriptors for 48 compounds with already known activity in KLD. Then, the AutoQSAR function was employed for the model building using the Partial Least Squares (PLS) regression models. PLS is often used in the main regression analyses[35, 36]. A reasonable QSAR model requires R2 > 0.8 and XR2 > 0.5. If the model was unreasonable, we need to remove outliers from the database, then rebuilt and cross-validated the model. Finally, the optimized QSAR model was used to filter the database.
2.3.2 Fingerprint Models
A vector of Fingerprint keys contains information on the presence or absence of certain properties or molecular features. According to the design philosophy, different fingerprint schemes emphasize different molecular attributes. In MOE, the fundamental idea of the Fingerprint Model is to encapsulate certain properties directly or indirectly in the fingerprint and then use the fingerprint as a surrogate for the chemical structure. Then comparisons between molecules are reduced to comparing sets of features and measuring the degree of overlapping of the sets.
MOE-Fingerprints can be used for: (i) Fingerprint Clustering to a group of molecules based on the similarity of their fingerprint. (ii) Fingerprint Models to study Quantitative Structure-Activity Relationships using fingerprint descriptions of the molecules. (iii) Similarity Searching to find molecules with similar fingerprints. In this study, we used the fingerprints of a set of active compounds to search databases for similar active compounds by building a fingerprint model.
We built a fingerprint model with KLD mentioned above. The code FP: GpiDAPH3 refers to the Pharmacophore Graph Triangle (GpiDAPH3) fingerprints, and these are three-point pharmacophore fingerprints calculated from the 2D-molecular graph. Therefore, we selected the FP:GpiDAPH3 fingerprint model. Set fingerprint score to Maximum (Maximum of similarities scores) and saved the fingerprint model.
2.4 Pharmacophore Modeling
MOE's pharmacophore modeling tools determine the pharmacophoric feature characterizes and their spatial arrangement in 3D that are essential to the binding of ligands to their receptors and the drug activity of the ligands. Pharmacophore models can be generated from the structural data of protein-ligand complexes as well as from ligands when no receptor information is available and from the receptor structure when no ligands are available. The generated models can be used to screen virtual compound libraries for potentially active molecules.
In this study, we used Ligand-based and Complex-based Pharmacophore Modeling for virtual screening.
2.4.1 Ligand-Based Pharmacophore
The common feature pharmacophores were performed by using the pharmacophore Elucidation module in MOE. The HDM2 inhibitor dataset KLD contains 48 known active compounds, was divided into high active compounds(pIC50 ≥ 8) and low active compounds(pIC50 < 8). At first, Pharmacophore Elucidation was employed to identify common features necessary for active HDM2 inhibitors. Set conformation to Clustered to reduce the total number of conformations, the conformation method was set to Conformation Import, the Activity Field was set to pIC50, the value was set to 8, and the other parameters of Pharmacophore Elucidation were set to default values. Then, the selected pharmacophore model with the highest alignment score and manually add ligand shape constraints. Finally, the pharmacophore model was saved for subsequent database screening.
The Protein-Ligand Interaction Fingerprints (PLIF) tool was used for summarizing the interactions between ligands and proteins using a fingerprint scheme. The interaction between the receptor and the ligand can be observed more intuitively, and the pharmacophore obtained in this way is more reliable. First, selected five crystal structures (PDB ID:4HG7, 4QO4, 4WT2, 4ZYF, 5OC8), which the crystal structure resolution was less than 2.0 Å and binding with good HDM2 inhibitor, imported five HDM2 complex structures into MOE, performed QuickPrep, unchecking the Refine checkbox, using default values for the rest of the settings. The QuickPrep application can correct problems in the protein- ligand complex, adding hydrogen and assigning the protonation states. Tethers were then added to the receptor, heavy atoms and distant atoms are fixed.
The Protein Structure Superposition application was used to superpose the set of 3D protein structures to maximize the spatial overlap of a common atom subset, highlighting the regions of conserved structure and modified regions in the protein collection. It can also be used to overlay and study active sites of related proteins. Finally, the PLIFs were generated and analyzed, and a Pharmacophore Query was generated.
2.5 Molecular Docking
Docking is a computer simulation technique used to predict the binding mode of two interacting chemical species. Dock application from MOE can be used to search for favorable binding modes between ligands and macromolecular targets, which is usually a protein. After the docking is completed, for each ligand, a number of poses are generated and scored. The final highest-scoring poses, along with their scores and conformation energies, were recorded to a database where they are ready for further analysis.
The molecular docking in this study was performed using the crystal structure of HDM2 (PDB ID: 5OC8). First, the structure was prepared using the Protonate 3D module in the MOE to add hydrogens to the entire system and distribute ionization states. The structure was then optimized using energy minimization. Then, the active site of the receptor was checked using SiteView in MOE. The docking site was selected as the ligand-binding site in 5OC8, the initial scoring methodology selected London dG, poses were set to 5, the final scoring methodology selected GBVI/WSA dG, poses were set to 1, while initial settings were adopted for the other conditions. The docking simulation output used score S while considering the binding energy.
Some potential inhibitors with diverse skeletons were selected manually after molecular docking, according to the principle of structural diversity.
2.6 Molecular Dynamics Simulations
Molecular dynamics can dynamically analyze microscopic interactions between compounds from the atomic scale. By analyzing the trajectory of the balanced system, a large amount of kinetic data can be analyzed to predict the macroscopic properties of matter.
The potential inhibitors bound to the HDM2 models were investigated by MD simulations for 100 ns with GROMACS 5.2.0. In this study, the size of the octahedral solvent box matches the size of the protein, and the edge distance is 1 nm. The corresponding Cl− are added to the solvation system to neutralize the system. Before the molecular dynamics simulations, the energy of the protein-ligand complex is minimized, so that the potential energy (Epot) of the system is as small as possible.
During the first stage, a simulation was carried out for 200ps at a temperature of 300K in the Nose-Hoover thermostat (NVT) (constant number of particles, volume, and temperature) ensemble with solute heavy atoms restrained by a force constant of 100 kcal· mol− 1·Å−1. Then the system was kept to 300K during a 200ps MD simulation in the NPT ensemble, with the same force constant applied to the solute atoms. Finally, in the production phase, the relaxed systems were simulated without restrained under NPT ensemble conditions for 100ns. Within this simulation time, the total energy and RMSD of the protein backbone reached a plateau, and the systems were considered equilibrated and suitable for statistical analysis. The Nose-Hoover thermostat (NVT) was utilized to maintain a constant temperature in all simulations, and the Martyna-Tobias-Klein method was used to control the pressure.
2.7 Binding Free Energies Calculated By MM/PBSA
In this study, the molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) method was used to estimate the binding free energy between protein HDM2 and potential inhibitors and used the g_mmpbsa tool of GROMACS 5.0.2. In MM/PBSA calculation, the solvent is regarded as a uniform continuous medium, and based on the force field and implicit continuous medium model, the dynamic equilibrium trajectory of the protein-ligand complex is sampled, and the free energies of the complex, protein and ligand are calculated respectively.
2.8 Cell assay
MTS is analog of MTT, the full name is 3-(4,5-dimethylthiazol-2-yl)-5(3-carboxymethoxyphenyl)-2-(4-sulfopheny)-2H-tetrazolium, which is a yellow color dye. Succinate dehydrogenase in the mitochondria of living cells can metabolize and reduce MTS to generate soluble formazan compounds. The content of formazan can be measured at 490nm with a microplate reader. Under normal circumstances, the amount of formazan produced is proportional to the number of viable cells, so the number of viable cells can be inferred from the optical density OD value.
First, inoculate cells, prepare a single cell suspension with RMPI1640 medium containing 10% fetal bovine serum, and inoculate 5000 cells per well into a 96-well plate, with a volume of 100 µl per well, and the cells are inoculated and cultured 12–24 hours in advance. Then, the solution of the compound to be tested was added, the compound was dissolved in DMSO, and the compound was re-screened at the concentrations of 40 µM, 8 µM, 1.6 µM, 0.32 µM, 0.064 µM, and the final volume of each well was 200 µl, and each treatment had 3 replicate wells. After culturing at 37℃ for 48 hours, the adherent cells discarded the medium in the wells, and added 20 µl of MTS solution and 100 µl of culture medium to each well; set up 3 blank duplicate wells, and continue to incubate for 2–4 hours, the light absorption value was measured after allowing the reaction to proceed sufficiently. Finally, the wavelength of 492 nm was selected, and the light absorption value of each well was read with multi-function microplate reader (MULTISKAN FC), and the result was recorded to calculate the IC50 value of the compound.