2.1 Database construction
As an important prerequisite for structure-based kinase profiling, the kinase-ligand complex-focused database KinLigDB was established through the following steps. The keyword “Kinase” was first searched in the Protein Data Bank (PDB, http://www.rcsb.org/) with the organism restricted to “Homo sapiens”, which returned a total of 6,492 human kinase structure entries (as of June 30, 2019); the coordinates of all these structures were downloaded from the PDB. An in-house program was then used to pick out the kinase-ligand complex structures and simultaneously separate the protein and ligand coordinates. The resulting complex structures were further checked and corrected by manual inspection, and the associated information, including kinase name, gene name, alias name, kinase group, kinase family, mutation, downstream molecules, and associated diseases, was comprehensively collected from the PDB, KinBase (http://kinase.com/kinbase/), UniProt (https://www.uniprot.org/), KEGG (https://www.genome.jp/kegg/), TTD (http://bidd.nus.edu.sg/group/cjttd/), and references therein. The information associated with kinase-ligand interactions, including the binding site type, ligand type, key residues, and binding data, was further curated and/or collected from references. In the end, a total of 4,219 complex structures were retained in the KinLigDB database.
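The in-house program used for this step is not described here, so the following is only a minimal sketch of how protein and ligand coordinates might be separated from a PDB-format file. It assumes that protein atoms are ATOM records and that ligand atoms are HETATM records whose residue name is not in a small exclusion list; the function names and the exclusion list are illustrative assumptions, not part of KinLigDB.

```python
from typing import List, Tuple

# Hetero residues that are NOT treated as ligands. This list is an
# illustrative assumption, not the filter used to build KinLigDB.
NON_LIGAND_RESIDUES = {"HOH", "SO4", "PO4", "GOL", "EDO", "CL", "NA", "MG"}


def split_complex(pdb_text: str) -> Tuple[List[str], List[str]]:
    """Split PDB-format text into (protein_lines, ligand_lines)."""
    protein_lines: List[str] = []
    ligand_lines: List[str] = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ATOM":
            protein_lines.append(line)
        elif record == "HETATM":
            residue_name = line[17:20].strip()  # PDB columns 18-20
            if residue_name not in NON_LIGAND_RESIDUES:
                ligand_lines.append(line)
    return protein_lines, ligand_lines


def is_kinase_ligand_complex(pdb_text: str) -> bool:
    """True if the entry has both protein atoms and candidate ligand atoms."""
    protein_lines, ligand_lines = split_complex(pdb_text)
    return bool(protein_lines) and bool(ligand_lines)
```

A filter this simple misclassifies some entries; for example, modified residues such as phosphorylated amino acids are also stored as HETATM records. This is consistent with the manual inspection step described above.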
The AutoDock Vina program [31] was used as the docking engine for binding pose prediction. All the protein structures were prepared by assigning Gasteiger-Marsili charges and adding polar hydrogens using AutoDockTools and were then saved in pdbqt format. The binding site information, including the grid center, grid size, and number of docking poses, was generated for each complex structure using an in-house program and stored in a configuration file (conf format). The interaction fingerprinting (IFP) method [30] was used to characterize the key kinase-ligand interaction features, namely hydrogen-bond acceptor, hydrogen-bond donor, negatively charged center, positively charged center, hydrophobic interaction, and face-to-face and edge-to-face π−π stacking interactions. For each kinase-ligand complex, a specific IFP mode was generated and saved in ifp format. All the generated IFP modes are used as reference modes in the subsequent kinase profiling.
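As an illustration of the configuration step, the sketch below derives a docking box from the coordinates of a co-crystallized ligand and writes it in the `key = value` layout that AutoDock Vina reads from a configuration file. How the in-house program actually defines the grid is not stated; centering the box on the ligand's bounding box and adding a fixed padding of 8 Å are assumptions made here, as are the function names and the default number of poses.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def docking_box(ligand_coords: Sequence[Vec3], padding: float = 8.0) -> Tuple[Vec3, Vec3]:
    """Return (center, size) of a box enclosing the ligand plus padding (in Å).

    The padding value is an assumption, not a KinLigDB parameter.
    """
    xs, ys, zs = zip(*ligand_coords)
    center = tuple((min(a) + max(a)) / 2.0 for a in (xs, ys, zs))
    size = tuple((max(a) - min(a)) + padding for a in (xs, ys, zs))
    return center, size


def vina_conf(receptor: str, center: Vec3, size: Vec3, num_modes: int = 9) -> str:
    """Format the box as text for an AutoDock Vina configuration file."""
    lines = [f"receptor = {receptor}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value:.3f}")
    lines.append(f"num_modes = {num_modes}")
    return "\n".join(lines) + "\n"
```

For example, `vina_conf("kinase.pdbqt", *docking_box(coords))` returns text that could be saved to a file and passed to Vina with its `--config` option; the receptor file name here is a placeholder.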
2.2 Structure-based kinase profiling approach
The structure-based kinase profiling approach behind the ProfKin web server integrates molecular docking and interaction fingerprinting methods, as briefly described below: (i) the molecule of interest (MOI) submitted by the user through the web interface is automatically processed with AutoDockTools, which defines the rotatable bonds, assigns partial charges and polar hydrogens, and converts the molecule into pdbqt format; (ii) the prepared MOI is docked into each binding site in KinLigDB by calling AutoDock Vina, and top-ranked docking poses are generated for each binding site; (iii) for each docking pose, the IFP mode is generated using the same method as described above for the reference IFP modes [30]; (iv) the similarity score between the docking pose IFP and the reference IFP modes is calculated as described previously [30]; (v) a comprehensive index (Cvalue), integrating the advantages of the docking and IFP similarity scores, is finally calculated and used for kinase ranking and profiling. Because molecular docking and IFP are complementary methods, their combination is expected to yield improved prediction results.
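Steps (iv) and (v) can be sketched as follows. Neither the IFP similarity measure of [30] nor the Cvalue formula is given in this section, so the code substitutes stand-ins: the Tanimoto coefficient on bit strings as the similarity, and a hypothetical `placeholder_index` that multiplies the similarity by the negated docking score. Both are assumptions chosen only to show the flow of the ranking; they are not the published ProfKin definitions.

```python
from typing import List, Tuple


def tanimoto(pose_ifp: str, reference_ifp: str) -> float:
    """Tanimoto coefficient between two equal-length '0'/'1' bit strings."""
    if len(pose_ifp) != len(reference_ifp):
        raise ValueError("fingerprints must have the same length")
    pairs = list(zip(pose_ifp, reference_ifp))
    both = sum(1 for a, b in pairs if a == "1" and b == "1")
    either = sum(1 for a, b in pairs if a == "1" or b == "1")
    return both / either if either else 0.0


def placeholder_index(docking_score: float, similarity: float) -> float:
    """Hypothetical combined index; NOT the published Cvalue formula.

    Vina scores are in kcal/mol and more negative is better, so the sign
    is flipped to make larger values better.
    """
    return -docking_score * similarity


def rank_sites(results: List[Tuple[str, float, str, str]]) -> List[Tuple[str, float, float]]:
    """Rank (site, docking_score, pose_ifp, reference_ifp) records.

    Returns (site, similarity, index) tuples, best index first.
    """
    scored = []
    for site, docking_score, pose_ifp, reference_ifp in results:
        similarity = tanimoto(pose_ifp, reference_ifp)
        scored.append((site, similarity, placeholder_index(docking_score, similarity)))
    return sorted(scored, key=lambda record: record[2], reverse=True)
```

With made-up inputs, a site with a slightly weaker docking score but an identical fingerprint can outrank a site with a better docking score and a dissimilar fingerprint, which is the kind of complementarity the combined index is meant to capture.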
2.3 Website development
The ProfKin web server (http://www.lilab-ecust.cn/profkin/) provides two main functions: a searchable archive of kinase details and structure-based kinase profiling (Figure 1). It runs on a Linux system with Apache as the HTTP server. The web interfaces are implemented in PHP and JavaScript, which control the display behavior of the web pages and respond to user operations. The backend was developed in the Python programming language, with a MySQL database for storing the kinase annotations and the task details. A phylogenetic tree showing the distribution of the ranked kinases is generated for each task with the Kinome Render tool [32]. The superimpositions of docking poses and reference ligands can be visualized and analyzed with NGL Viewer [33], a JavaScript-based web applet. The website requires a browser supporting HTML5 and ES6 and works well on most mainstream browsers, such as Chrome and other Chromium-based browsers, Opera, Firefox, Edge, IE11, and Safari.
3 Database statistics and access
The current version of KinLigDB contains 4,219 curated kinase-ligand complex structures of 297 human kinases covering 106 kinase families; most of them are related to human diseases. About 75% of these kinases have at least two structures with different ligands, and 92 kinases have ≥10 complex structures, such as CDK2 (357 entries), MAPK14 (204 entries), PIM1 (141 entries), CHK1 (130 entries), and EGFR (104 entries). Most ligands are observed to bind in the kinase active sites and act as inhibitors, whereas some ligands bind adjacent to the active sites or at allosteric binding sites to specifically modulate (inhibit, activate, or enhance) the kinase catalytic activity. A total of 396 binding sites were found and defined for the kinases in the database. Notably, 73 kinases have two or more different binding sites; for example, NTRK1 has four kinds of binding sites, including the type I and type II inhibitor binding sites and two distinct allosteric binding sites [34-36]. According to the binding features, eight types of ligands were annotated: type I inhibitor (3,167 entries), type II inhibitor (283 entries), type III inhibitor (23 entries), type IV inhibitor (9 entries), competitive inhibitor (554 entries), covalent inhibitor (31 entries), activator (49 entries), and allosteric ligand (103 entries). A total of 2,805 kinase-ligand complexes were annotated with binding data.
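As a quick consistency check on these statistics, the eight ligand-type counts can be summed and compared with the database size. The short script below uses only the numbers quoted above; the variable and function names are illustrative.

```python
from typing import Dict

# Ligand-type counts as reported in the text.
LIGAND_TYPE_COUNTS: Dict[str, int] = {
    "type I inhibitor": 3167,
    "type II inhibitor": 283,
    "type III inhibitor": 23,
    "type IV inhibitor": 9,
    "competitive inhibitor": 554,
    "covalent inhibitor": 31,
    "activator": 49,
    "allosteric ligand": 103,
}

TOTAL_COMPLEXES = 4219        # complex structures in KinLigDB
WITH_BINDING_DATA = 2805      # complexes annotated with binding data


def total_entries(counts: Dict[str, int]) -> int:
    """Sum of entries over all ligand types."""
    return sum(counts.values())


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each ligand type, in percent of all entries."""
    total = total_entries(counts)
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    # The eight categories account for every complex exactly once.
    print(total_entries(LIGAND_TYPE_COUNTS) == TOTAL_COMPLEXES)
    for name, share in percentages(LIGAND_TYPE_COUNTS).items():
        print(f"{name}: {share:.1f}%")
    print(f"with binding data: {100.0 * WITH_BINDING_DATA / TOTAL_COMPLEXES:.1f}%")
```

The counts add up to exactly 4,219, so each complex is assigned to one ligand type, with type I inhibitors making up roughly three quarters of the entries.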
4 Database search
The database search module enables users to search and browse all of the data without any prerequisite knowledge or experience. Users can retrieve all the kinase-ligand complex entries and associated information via basic annotations, such as PDB code, kinase name, kinase family, kinase group, ligand type, binding site type, downstream molecule, and relevant disease (Figure 2A). For example, searching with the kinase group “Atypical” returns a list of 303 atypical kinase structure entries (Figure 2B); users can select all or part of these entries via the select boxes in the first column and use them as a database subset on the kinase profiling webpage (Figure 2C). Users can also click on a PDB code to access the detailed information page. The linked page mainly includes kinase information (e.g. kinase family/group, mutations, kinase alias, downstream molecules, and associated diseases; Figure 2D) and ligand information (e.g. ligand structure, ligand SMILES, ligand type, binding pocket, key residues, and binding data; Figure 2E). All the ready-to-use coordinates of kinases and ligands and their associated information can be downloaded via the ‘DOWNLOAD’ webpage.
5 Kinase profiling
This module enables users to perform kinase profiling prediction for small molecules of interest. Users can upload the query molecule as a mol2 or sdf file, sketch a chemical structure online [37], or input a standard SMILES string (Figure 3A). Users can select the entire KinLigDB database or part of it for kinase profiling; for example, one kinase group or family can be selected as a database subset (Figure 3B). The advanced options, including the cutoffs for IFP similarity and Cvalue, can be set by users to meet specific requirements (Figure 3C). Once all the necessary parameters are given, clicking on the ‘Submit’ tab starts the computation job, and the system sends the job ID to the email address provided by the user (Figure 3D). A job run against the entire database usually takes 30-40 hours because a series of molecular docking calculations is performed; the time cost depends on the database size and the complexity of the query molecule (particularly the number of rotatable bonds). The web server informs users via email when the job is finished. Users can also check the job progress using the job ID. A help document with more details is provided on the kinase profiling webpage.
As an example, a kinase profiling job was run for the compound (Z)-2',3-dioxo-[2,3'-biindolinylidene]-5'-sulfonic acid, an indirubin derivative that is a potent CDK2 inhibitor [38]. The profiling results can be visualized and downloaded on the webpage (Figure 4A). The 100 top-ranked kinases for the compound are shown graphically with additional classification attributes and a phylogenetic tree; a second phylogenetic tree containing all ranked kinases is also provided, and users can switch freely between the two (Figure 4B). In addition to CDK2, which was ranked among the top 3 (Figure 4C), the compound is observed to fit well into the binding sites of multiple kinases, such as TYK2, AurA, and JAK3 (Figure 4C); for example, the compound likely has a binding mode similar to that of a potent TYK2 inhibitor, although their chemical structures are clearly different (Figure 4D-E). In addition, users can click on the kinase name of any record in the result list to start a KinLigDB search on this kinase.
(E) A view of the superimposition of the top-ranked docking pose with the reference kinase ligand; users can drag or zoom the molecules for more views.