WaterSwap Analysis, a Computation-based Method for the Discovery of Effective and Stable Binding Compounds for Mutant EGFR Inhibition


 The epidermal growth factor receptor of the tyrosine kinase family has been largely targeted in mutations associated with non-small cell lung cancer. EGFR inhibitors have been produced that bind allosterically to the C797S mutant EGFR enzyme. Here, the Waterswap tool has been used for the interpretation and visualization of trajectories of mutant EGFR-ligand complexes. Virtual screening of the generated compounds has been carried out along with its molecular docking and ADMET analysis. Out of the generated library of compounds, the top 15 have been selected. Waterswap calculated the binding free energies of the compounds and thermodynamic properties of the enumerated compounds were compared with that of standard EAI045. It was observed that styrylquinoline stabilized better than EAI045. The results show that Waterswap analysis offers a promising new path in the hunt for improved tools for analyzing and visualizing molecular driving forces in protein-ligand complex simulations.


Introduction
Cancer is the second biggest cause of death in the world, and it is one of the primary causes of morbidity and mortality. In 2015, cancer claimed the lives of 8.8 million people. Cancer is responsible for about one-sixth of all fatalities worldwide [1]. Despite decades of attempts to nd and develop small molecule anticancer medications, the demand for novel antitumor medicines with enhanced tumour e ciency, selectivity, and safety remains critical [2,3].
One of the most signi cant enzymes in the tyrosine kinase family is the Epidermal Growth Factor Receptor. EGFR is a transmembrane receptor that is found on the cell surface. The EGFR enzyme is classi ed as (ErbB1, HER1), ErbB2 (HER2, neu in rodents), ErbB3 (HER3), and ErbB4 (HER4) based on structural similarities [4]. EGFR and its members have become a popular target for anticancer therapy, particularly in the treatment of non-small cell lung cancer [5]. Activating mutations in the EGFR gene had a signi cant impact on non-small cell lung cancer therapy approaches (NSCLC). The recent identi cation of novel EGFR inhibitors was critical in overcoming such mutations [6].
Drugs' a nity for their targets is described by thermodynamic characteristics. As a result, accurate thermodynamic property calculations have long been a goal of computational chemistry. Hence, accurate thermodynamic property calculations have long been a computational chemistry goal. In the industry, study of solvation patterns surrounding pharmaceuticals and binding sites has become a major goal for molecular modelling [7][8][9]. According to an estimate, hydrophobic surfaces cover 75% of surface drug sites [10], implying that the hydrophobic effect is one of the most energetic contributions to ligand binding and recognition.
Individual water molecules that serve a crucial stabilising role, for example, by building bridges between the ligand and the receptor, have been shown by high-resolution co-crystallized structures of protein-ligand complexes [11,12]. Improved binding a nity has also been linked to a network of H-bond-enriched water molecules around the ligand [13][14][15].
The water-swap reaction coordinate was described by Woods et al. as a new reaction coordinate (WSRC) [16]. The ligand and water are exchanged from being bound to the protein to being solvated by a periodic box of bulk water in this approach. Because no molecules are moved into a vacuum, the enormous decoupling free energy associated with double decoupling procedures is avoided. Furthermore, a new approach for restricting the identity of solvent clusters within bulk has been established, allowing the swap to be accomplished without the requirement for any restrictions on the ligand or swapped water, and this strategy applies to both buried and solvent-exposed binding sites [16]. The concept behind the water-swap reaction coordinate is that it permits the identity of water clusters to be limited during simulation. During a simulation, this identity constraint [17] works by modifying how water molecules are labeled. We used computation-based methods such as virtual screening, molecular docking, and WaterSwap analysis to nd an effective compound with stable binding and mutant EGFR inhibition effect in this study.

Insilico methodology
ChemT tool was used for library generation and enumeration. The molecular docking tool, AutoDock Vina, was used for docking studies of all compounds on the EGFR enzyme. SwissADME tool was used to determine ADMET studies. Waterswap study was performed on trial version of Cresset's Flare software.

ChemT and Library Enumeration
ChemT, a Biochemcore-based software, was used to enumerate the library. In Marvin Sketch, the R-group position on the template styrylquinoline. The styrylquinoline was then assigned to the identi ed R-groups through the enumeration model of the software. EAI045 input determined the standard template. Additionally, few available pharmacokinetic properties such as molecular weight, H-bond donors, and Lipinski rule of ve was set for designing drug-like compounds [18].

Molecular Docking
To investigate the binding pocket, residue interactions, and predicted binding a nity of the enumerated compounds, molecular docking protocol was carried out using AutoDock tools. Through Chimera modeler, the protein was fetched from the open-source PDB libraries. Explicit hydrogen bonds were added followed by the removal of water molecules. Using Gastegier charges, ions were removed and nally, the protein was minimized using the Amber94 force eld. The total number of loops was set to 1000 to ensure maximum minimization of the protein. The minimization was terminated when the energy converged or the root mean square deviation (RMSD) reached a maximum cut-off of 0.30 Å. Ramachandran plot was set to check the total number of disallowed residues. Through AutoDock 4.0 GUI, the minimized PDB was converted to a PDBQT le followed by generation of the grid. To perform exible docking, torsions were chosen in the selected residues and saved as ex.PDBQT. The con guration le, which was used to outline the grid area as well as the protein and ligand input, was de ned and saved as a text document. All the ligands were prepared using Gastegier charges and saved as a PDBQT le. The les saved in the fold were called through the command line under the vina environment to start the docking process. The results for each input ligand were saved as a text le consisting of docking scores and RMSD values of each ligand conformation which was used for the analysis of data. The output le was then converted into Maestro readable extension for image saving [19][20][21][22].

WaterSwap methodology
The PDB: 5D41 was downloaded initially and full preparation on proteins and ligands was carried out. Under are protein preparation wizard, at pH 7.0 small side chains movement were allowed, atoms were removed from residues with incomplete backbone, along with lling of 1 and 2 residue gaps at 6.00 A° active site size. Then the protein was minimized under the normal calculation method with a gradient cutoff of 0.200 kcal/mol/A with 1300 maximum iterations.
The ligands were loaded in the sdf format for docking. Docking was performed under the normal calculation method under Autodock exible docking protocol. The ligands to dock were already selected, the template ligand was de ned, the co-crystallized ligand was selected for docking grid preparation, the minimized protein was selected and docking calculation was started [16,23].
For WaterSwap calculation, the protein along with docked ligand was prepared again and minimized as per the above method.
The calculation method was kept custom and no changes were made. The small molecule force eld was kept as AMBER with GAFF version with AM1-BCC as the charge method. The solvate box buffer was kept at 10.0 A. Open MM platform was GPU, minimized energy tolerance as 0.25 (kcal/mol), the system was equilibrated for 100 ps. For Waterswap, the iterations were kept as 1000 -1300 (1300 for docked ligand), maximum threads were kept on Auto, with Water monitor distance as 7.00 A, the protein was selected, the ligand was selected along with the proper chain, and the calculation was started.

Result And Discussion
Computational Methods

Database building and Library Screening
The most potent substituted styrylquinoline with the highest antitumour activity was taken from the literature [24], and by modi cation, at R 1 position a library of around 2000 compounds was generated virtually. Based on the obtained tness score using ChemT, the top 1000 hits were screened for further pre-ADMET analysis. All 1000 hits were screened through Lipinski's Rule of 5 and around 100 hits were chosen for further study tting with the criteria. In the virtual screening work ow protocol, all 100 hits were subjected to a exible docking approach using AutoDock. A score cutoff of -9.20 kcal/mol for docking discarded lower-scoring compounds selecting 15 highest scoring compounds. Top 15 highest scoring, best pocket tting, and possessing similar binding interactions with that of standard EAI045, novel and potent compounds were obtained through virtual screening and were selected for further studies.

Molecular Docking
Molecular docking was carried out on mutant EGFR enzyme (PDB ID: 5D41) to check the binding pattern and interactions. It was seen that all 15 molecules docked correctly into the allosteric binding pocket of the protein with common interacting amino acid residues LYS745, LEU788, ASP855 and PHE856. From Figures 1a and 1b, it was observed that all compounds t into the hydrophobic pocket of the protein interacting with the common amino acid residues. The combined docking scores along with structures are shown in Table 1. ADMET studies SwissADME website (http://www.swissadme.ch/index.php) was used to quantify the physicochemical and pharmacokinetic properties of the synthesized derivatives with respect to EAI045. Table 2 shows the analysis of the pharmacokinetic parameters needed for ADMET study of compounds KSK (1-15). The drug must pass Lipinski's rule to show good oral absorption in humans [25,26]. All 15 compounds do not show Lipinski's rule violation, passing the criteria. The MLOGP values were also within the acceptable range, showing the compounds' correct drug absorption and distribution within the body. Almost all the compounds show good GI absorption and some of them were bloodbrain barrier permeant. Through bioavailability scores, it was indicated that a potent compound is available at the site of action.
Overall, the compounds showed satisfactory pharmacokinetic parameters range de ned for human use along with good docking scores [27][28][29].

Waterswap
When dealing with the cavitation and large-value problem of double decoupling, we used Waterswap, an absolute binding free energy method. Continuum solvent methods do not include the molecular detail of protein-water, ligand-water, and proteinwater-ligand interaction interactions because they use an explicit model of water [23].
It uses a single simulation to replace the ligand-bound protein with bulk water molecules of the same shape and volume as shown in Figure 2. An increase in protein-ligand interactions occurs when the protein's resulting cavity is lled with a cluster of water molecules. With only one reaction coordinate, it's possible to get a more accurate free energy value because it's not the product of two huge numbers [30].
Moreover, the bene cial waters around the protein-ligand complex are also identi ed to obtain a stable pocket. These bene cial waters also become a potential bridge in creating interactions between likely residues and ligand. Both proteins favourable and ligand favourable waters are important to identify to assess the pocket interactions and their stability [31]. Two different types of colours are given for the protein depending on its preference i.e., the protein present in green colour states that it prefers ligand, and protein present in red colour states that it prefers water as indicated in Figure 3. The water box identi ed by the system further provides the required binding free energy for the protein and ligand [16].
To analyze Waterswap studies, binding free energy (∆G), shown in Figure        Interpretation of which protein residues prefer binding to water or ligand. Green represents ligand favourable and red represents water favourable.

Figure 4
Page 10/10 Swapping of Protein:Water to Protein:Ligand.

Figure 5
The relative binding free energy graph of standard EAI045 and Styrylquinoline.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.