Virtual screening of the potential hCA XII inhibitors based on the pharmacophore modelling, molecular docking, MMGBSA and molecular dynamics simulations studies


 Human carbonic anhydrase XII (hCA XII) has gained therapeutic value in the field of medicinal chemistry as a pharmacological target and biomarker for different types of cancer. Despite the fact that interaction features of hCAs inhibitors with the catalytic site of the enzyme are well described, lack in the selectivity of the traditional hCA inhibitors is an urgent issue. This study was devoted to the identification of novel potential hCA XII inhibitors using comprehensive set of computational approaches, including generation of structure-and ligand-based pharmacophore models, molecular docking, MMGBSA calculations and molecular dynamics simulations. As the results of the study multiple potential hCA inhibitors were identified. Along with the identified derivatives of the classical hCA inhibitors (have sulfonamide motifs), several compounds with alternative chemical scaffolds, that can be considered and further investigated as potential non-classical hCA XII inhibitors, were identified.


Introduction
Carbonic anhydrases (CAs, EC 4.2.1.1) are zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide into bicarbonate and a proton 1 . Sixteen CA isoforms are found in humans and all vary in kinetic properties, subcellular localization and distribution to the tissues. Many essential physiological processes such as pH regulation, electrolyte secretion, respiration, bone resorption and tumorigenesis are involved in the various hCAs, so irregular levels and/or activities of these enzymes are typically associated with many diseases, such as glaucoma, neurological diseases, osteoporosis and metabolic disorders 2 . In general, CA inhibitors (CAIs) are compounds equipped with an effective zinc-binding group (ZBG) capable of chelating the prosthetic zinc ion placed inside the hCA binding site that is essential for these enzymes' catalytic action. Sulfonamide moieties or related structural motifs represent the most common ZBGs shared by typical CAIs (such as sulfamides and sulfamates). These groups are especially effective in endowing high-affinity small-molecule ligands with hCAs, as they enable not only the proper coordination of the catalytic zinc ion, but also the forming of associations of H-bonds with key protein residues located in the zinc-binding cavity area. In particular, the most significant groups of CAIs are these molecules, and multiple ligands with high inhibitory potency have been recorded to date 3 . However, because of the high amino acid conservation found at the level of their catalytic site and adjacent regions in the various hCA isoforms, these ligands also present the downside of being insufficiently selective against particular hCAs that are therapeutically interesting. In particular, in the field of medicinal chemistry, hCA IX and hCA XII have gained special interest, as these isoforms have been designated as biomarkers and pharmacological targets for different types of cancer. As a result, the inhibition of these two enzymes is currently the subject of a vast field of study 4 . Notwithstanding, the development of hCA IX/XII selectivity inhibitors over hCA I/II, which are ubiquitously distributed and involved in key physiological processes, is still a difficult challenge, although some examples of selective ligands have been reported 5,6 . Non-classical CAIs are an especially useful resource in this sense for finding isoform specificity and avoiding possible off-target events, side effects and or adverse reactions (such as sulfur allergies) associated with the use of sulfurized ligands 7 . This study was devoted to identification of novel potential hCA XII inhibitors using computational approaches: pharmacophore modeling (structure-, and ligand-based), molecular docking, MMGBSA and molecular dynamics simulations.

Materials and Methods
Preparation of the pharmacophore validation datasets 3D structures of the 2500 compounds with Ki < 10 against hCA XII enzyme were downloaded from PubChem 8 and ChEMBL 9 databases. 100 of them were used for generation of 5000 decoys using a Database of Useful Decoys: Enhanced (DUD-E) 10 and were not included in the validation datasets. Two validation datasets were prepared and each of them consisted of 5% active compounds and 95% decoys. First validation dataset included active top 135 active compounds based on the Ki values. Second dataset included randomly picked 135 active compounds. 3D structures of decoys were generated by Open Babel v3.1.1 11 software based on the smiles obtained by using DUD-E. Hydrogens were added to all compounds using Open Babel software. The iConBest method, implemented in LigandsScout software v4.4, was used for the generation of the conformations for all compounds. Structure-based approach pharmacophore generation Both, structure-based and ligand-based pharmacophore models were generated using LigandScout v4.4 12 . For structure-based pharmacophore generation, 22 crystal structures of the hcAXII in complexes with inhibitors (resolution < 2 Å) available in the Protein Data Bank (PDB) 13 . Several approaches for the structure-based pharmacophore model generation were tested: 1) based on the separate crystal structures, 2) based on the combination of separate crystal structures, 3) merged and 4) shared pharmacophores. Best pharmacophore model was generated using crystal structure of the catalytic domain of human carbonic anhydrase isozyme XII with benzenesulfonamide derivative (PDB ID : 4QJW with resolution 1.55 Å, chain C). All features that characterize interactions with the water molecules and hydrophobic interactions were removed from the obtained pharmacophore model. Also exclusion volumes coat feature was added.
Ligand-based pharmacophore generation Clusterization of the top 135 active compounds with the best Ki values was performed using pharmacophore alignment score as similarity measure. Maximum number of the conformations and cluster distance values were set to 3 and 0.4, respectively. The average method was used for cluster distance calculations. Based on the clusterization of the 135 active compounds 14 clusters were identified and representative compounds of those clusters were used for the creation of ligand-based pharmacophore models (Table S1). The best ligand-based pharmacophore model was obtained based on the biggest cluster. "H bond donor" feature was removed from the selected ligand-based pharmacophore model. Validation metrics ROC AUC -represents the area value under the ROC (receiver operating characteristic). Values range from 0 to 1, where 1 is perfect classification and 0.5 and below is random classification. ROC, widely used to evaluate virtual screening and pharmacophore modeling methods 14,15 , is defined as a graphical representation of the test sensitivity in relation to its specificity or false-positive rate. The AUC is the probability of active compounds being ranked earlier than decoy compounds. The classifier "precision" represents the share of true positives (TP) among all hits (TP/(TP+FP, where FP is false positive compounds). The classifier "specificity" represents the ratio of the active compounds found in the list of compounds identified as "true positives" (TP/A, where A is active compounds). TP/(TP+FP)). "Specificity" classifier is the ratio of true negative compounds to all compounds in the database, excluding active ones. Enrichment Factor (EF) measures the fraction of active compounds found in a specific percentage, solving the problem of comparing the results for datasets with different active/inactive compound ratios 16 . The EF for 1, 5, 10 and 100 % was calculated for the share of true positives among the molecules identified as hit compounds using generated pharmacophore models. Virtual screening ZINCPharmer 17 was used for filtering of the ZINC purchasable (21,777,093 compounds) and ZINC natural products and natural derivatives (197, 488 compounds) datasets using generated structure-based and ligand-based pharmacophore models. Identified hit compounds were used for the molecular docking against the active site of the hCA XII enzyme (PDB ID 4QJW). AutoDock Tools 18 was used for the estimation of the grid box (< 27 Å), calculation of the protein's and compounds' charges and the addition of the polar hydrogens. Virtual screening was performed using AutoDock Vina software 19 . AutoDock Vina was regarded as highly efficient software for molecular docking and virtual screening procedures based on the recent benchmark studies among both academic and commercial software 20 . AutoDock Vina uses "Iterated Local Search global optimizer" similar to that by Abagyan et al and Broyden-Fletcher-Goldfarb-Shanno (BFGS), which is quasi-Newton method for the local optimization as a search algorithm and hybrid scoring function (empirical + knowledge-based function) inspired in the X-Score function 19 . Standard, recommended by the developers, parameters were used for the virtual screening procedure.

Re-scoring using MMGBSA method
The algorithm for the molecular mechanics-generalized Born surface area (MMGBSA) calculations is based on a freely-available AmberTools suite 21 . The algorithm can be described in three stages, 1) receptor and ligands parametrization 2) minimization and 3) MM/GBSA and MM/PBSA calculations. At the first stage, the ff14SB force field 22 is used to describe protein parameters, and General Amber Force Field (GAFF) 23 with AM1-BCC charge model 24 is used for small molecule parametrization. Next, the algorithm prepares necessary input files (coordinates and topologies) with mbondi3 radii using LEaP. Minimization is performed in generalized Born implicit solvent models (igb=8) using the sander engine. Finally, for the free energy calculations, the algorithm uses the MMPBSA.py program 25 for the MMGBSA calculations. The algorithm is implemented in a bash script and can be run in parallel in most Linux distributions without any additional libraries for parallelization. The full code is available at the following link: https://github.com/sahakyanhk/iPBSA

Molecular Dynamics simulations
The molecular dynamics simulations were carried out using AMBER20 with ff14SB force field for protein and GAFF for the ligand parameterization according to the AM1-BCC scheme order to calculate the atomic point charges. Minimized conformations from the previous step (re-scoring of docked complexes with MMGBSA) were used as starting positions for corresponding simulations. The complexes were solvated with TIP3P water and Na + /Clions at 150 mM concentration 26 . The Monte Carlo barostat 27 with reference pressure at 1 bar and Langevin thermostat 28 with collision frequency (gamma_ln) 2 ps -1 to keep the temperature at 310.15 K were used. Particle Mesh Ewald (PME) with electrostatic interactions cut off at 1.0 nm was used for the long-range electrostatic interactions. Bonds involving hydrogen were constrained using the SHAKE algorithm and 2fs integration step was used 29 . Finally, for every simulation, binding free energies were re-calculated using the same MMGBSA method and MMPBSA.py program, using 250 snapshots with equal intervals collected from the last 20 ns of every trajectory for each simulation.

Results and Discussion
Ligand-based pharmacophore generation As a result of clusterizations, 14 clusters have been obtained. The best ligand-based pharmacophore model was obtained based on the biggest cluster ( fig. 1). "H bond donor" feature was removed from the pharmacophore. Therefore, the final ligand-based pharmacophore model included three "H-bond (acceptor)" features, two "H-bond (donor)" features, one "hydrophobic" and "aromatic" features of the LigandScout v4.4.   . 2).

Figure 2. Validation of the generated ligand-based pharmacophore on the two datasets (A and B)
Identified hit compounds out of 2635 total compounds (135 actives, 2500 decoys).

Receptor-ligand pharmacophore generation (Structure-based approach)
Structure-based pharmacophore model generated using crystal structure of catalytic domain of human carbonic anhydrase isozyme XII with benzenesulfonamide derivative (PDB ID: 4QJW) demonstrated LigandScout's binding affinity score of -17.18, which includes interaction and desolvation energies. The selected structure-based pharmacophore model included two "H-bond (acceptor)" that are important for the interaction of the ligand with ASN64 and THR198 of the active site of hCA XII enzyme, and two "Hbond (donor) feature that are important for interaction with PRO200 and GLU104 ( fig. 3).  Filtration of zinc database using generated pharmacophore models Both selected best ligand-based and structure-based pharmacophore models were applied using ZINCPharmer, separately, to filter ZINC datasets of purchasable and natural and derivatives compounds datasets (total of 21.970.000 compounds). As the result 6357 and 193 hit compounds were identified from purchasable and natural (including derivatives) datasets, respectively, using a structure-based pharmacophore model. Other 8415 and 1514 hit compounds were identified using a ligand-based pharmacophore model. All of those compounds have been passed to molecular docking against the active site of the hCA XII enzyme.

Molecular Docking
As the result of the molecular docking of compounds, obtained by application of the ligand-based pharmacophore, against ligand binding site of the hCA XII enzyme, 1361 "purchasable" and 264 natural derivatives compounds with similar or higher docking scores compared to the reference molecule (cocrystallized ligand of 4QJW, benzenesulfonamide derivative: -8 kJ/mol Vina core) were identified (Table  S2). As the result of the molecular docking of compounds, obtained by application of the structure-based pharmacophore, against ligand binding site of the hCA XII enzyme, 1027 "purchasable" and 17 natural derivatives compounds with similar or higher docking scores compared to the reference molecule (cocrystallized ligand of 4QJW, benzenesulfonamide derivative: -8 kJ/mol Vina core) were identified ( Table  S3). All of the compounds with similar or higher docking scores in comparison to reference ligand, were passed to the additional stages of minimization and re-scoring using MM-GBSA method. MMGBSA re-scoring results As the results of the minimization procedures and following calculations of the binding energies of the complexes of the identified compounds with hCA XII, obtained as the result of performed molecular docking only 38 compounds from the "purchasable" (Table 1) and 2 from "natural and derivatives" (Table  2) datasets demonstrated similar or higher binding energy values than reference ligand. 24 out of aforementioned 38 compounds were obtained with the use of structure-based pharmacophore and the rest derivatives" dataset were obtained with the use of a ligand-based pharmacophore model. Table1. Binding energies of the top "purchasable" compounds after re-scoring using MM-GBSA In order to analyse chemical diversity of identified compounds, an agglomerative hierarchical clustering method based on the "ECFP" fingerprint was performed ( fig. 5). As a result of the clusterization of the identified 38 compounds, 18 clusters were obtained. As expected, most of the identified compounds ( fig.  5, clusters 1-10) have sulfonamide moieties or related structural motifs, that are widely known to promote small-molecules interaction with hCAs active site. However, several of the identified compounds ( fig. 5, clusters [11][12][13][14][15][16][17][18] have different from sulfonamide chemical moieties. Those compounds include cyclohexanecaboxiamide, propanamide, acetamide, cyclopropanecarbohydrazide, thiadiazolidine, cyclopropane, carbohydrazide, chromen-7, tetrahydrofuran derivatives and analogues. Remarkably one of those compounds, identified with the use of ligand-based pharmacophore (ZINC82980951), showed lowest binding energy among all compounds as the result of MM-GBSA rescoring (Table 1). Thus, compounds that are different from the traditional hCA XII inhibitors (which include sulfonamide chemical group) are of special interest for further analysis and investigation as potential alternative hCA XII inhibitors.
In the case of the natural and natural-derivatives compounds only two compounds (ZINC49181869 and ZINC49181861, Table 2), which are stereoisomers of the same oxychromen-2-one derivative, showed higher binding energies than reference ligand.  Table 2. Binding energies of the top "natural and derivatives" compounds after re-scoring using MM-GBSA.

Molecular dynamics
From a list of compounds identified from the "purchasable" dataset, two compounds with lowest binding energy (one identified using structure-based pharmacophore (ZINC66466630) and another one using ligand-based pharmacophore model (ZINC82980951) were selected for additional MD simulations. In the case of the "natural and derivatives" dataset, two compounds with the lowest binding energies identified using structure-based (ZINC70704873) and ligand-based pharmacophore (ZINC49181869) models were also selected for MD simulations. Based on the results of conducted 100 ns MD simulation of the reference compound (co-crystallized ligand of 4QJW structure), it was stable during the whole simulation and has maintained conformation close to its initial crystal state (fluctuations around 0.05 nm, fig. 6). Figure 6. RMSD values of the selected four compounds and reference ligand.
From the four tested compounds, only ZINC49181869 was stable during the whole simulation and also maintained conformation close to the one predicted by molecular docking fluctuations around 0.05 nm. Compound ZINC70704873 stabilized after ~55 ns of the simulation and maintains its conformation with fluctuations around ~ 0.05 nm. ZINC66466630 stabilized after ~70 ns and maintains its conformation with fluctuations around ~ 0.025nm. ZINC82980951 stabilized after ~50 ns and maintains its conformation with fluctuations around ~ 0.1 nm. In its stable conformation, during molecular dynamics simulation, the reference ligand has five hydrogen bonds with the following amino acid residues of active site of the hCA XII enzyme: HIE 117, HID 91, GLN 89, ASN 64, THR 198. In the presence of the reference ligand, HID 93, HIE 117 and THR 198 also have hydrogen bonds with Zn ion (fig. 7).

Conclusion
The hCA XII enzyme has gained special interest in the drug design and discovery field, as a biomarker and pharmacological target for different types of cancer. Given the medical relevance, the discovery and development of selective inhibitors for hCA XII have been under continuous efforts. However typical hCAs binding ligands, that have sulfonamide functional groups, have insufficient selectivity against particular hCAs that are therapeutically interesting, such as hCA XII.
In this study, ligand-and structure-based pharmacophore modeling approaches were combined with molecular docking, MMGBSA, molecular dynamic simulations and applied with the purpose of finding new potential hCA XII inhibitors from a vast number commercially available natural, natural derivatives and synthetic compounds. As the result of the study, a number of new compounds with typical sulfonamide functional group were identified. Number of identified compounds have non-classical chemical scaffolds and groups and are of special interest as alternative potential hcA XII inhibitors and matter of further investigations.