ProfKin: A Comprehensive Web Server for Structure-based Kinase Selectivity Profiling

Protein kinases are central mediators of signal-transduction cascades and attractive drug targets for therapeutic intervention. Because kinases are structurally and mechanistically related to each other, kinase inhibitor selectivity is often investigated by kinase profiling and is considered an important index in drug discovery. We here describe ProfKin, a versatile web server for structure-based kinase selectivity profiling, which is built on a kinase-ligand focused database (KinLigDB). The database provides ready-to-use 3D coordinates of 4,219 kinase-ligand complex structures covering 297 human kinases, together with associated information including binding site type, binding ligand type, interaction fingerprints, downstream molecules and related human diseases. The web server predicts possible binding modes for the query molecule, prioritizes those binding modes using an interaction fingerprint analysis method, and returns a list of kinases ranked by a comprehensive index. Users can freely select all or part of the KinLigDB database, e.g. by subfamily or binding site type, to customize the profiling contents. The superimpositions of the predicted binding poses of the query molecule with reference binding modes can be visually inspected on the website. For each top-ranked kinase, the additional classification attributes and the phylogenetic tree are given simultaneously.


Introduction
Human protein kinases represent one of the largest enzyme families and are functionally integral to signal transduction. Aberrant kinase activity is an important contributor to various human disorders, in particular those involving proliferative or inflammatory responses, such as cancer, psoriasis, rheumatoid arthritis, and neurological diseases [1][2][3][4][5]. Small-molecule inhibitors targeting kinases have great therapeutic potential, which has been repeatedly demonstrated in the clinic over the past decade [6]. By June 2020, a total of 61 kinase inhibitors had been approved by the United States Food and Drug Administration, and a large number of kinase inhibitors are currently in preclinical and clinical development [7][8][9]. Despite the unparalleled success already achieved in kinase-targeted drug discovery, it is still highly desirable to develop more potent and selective kinase inhibitors, particularly for unexploited kinases, which can provide useful chemical tools for target validation as well as drug candidates for therapeutic intervention.
The aim of obtaining potent and selective kinase inhibitors is complicated by structural similarity in the kinase active sites. Kinase selectivity profiling is an efficient strategy for kinase inhibitor discovery: it interrogates query compounds against hundreds of kinases in a single screen, so that inhibitor potency and selectivity are determined simultaneously [10]. In addition, kinase selectivity profiling can be used for drug repositioning and polypharmacology [1,11]. To date, a number of experimental methods have been established for kinase selectivity profiling, typically based on kinase catalytic activity or competition binding assays with isolated/purified kinases [12,13]; recently reported methods enabling direct assessment of kinase-inhibitor occupancy in live cells are also of great interest [14,15]. With the increasing number of kinase inhibitors and kinase-inhibitor complex structures, computational methods have gradually been applied to this task [11,16-19]; they are complementary to experimental methods, which are usually resource-intensive and time-consuming.
Although several established target prediction methods could be used for kinase selectivity profiling [11,20-29], a versatile web server specialized for kinase inhibitor selectivity prediction by structural informatics has been lacking. We hence provide ProfKin, a web server for structure-based kinase selectivity profiling, which is built on our kinase-ligand complex focused structural and information database (KinLigDB). This database contains 4,219 manually curated kinase-ligand complex structures, corresponding to 396 binding sites of 297 human kinases and covering 106 kinase families. It also includes kinase- and ligand-associated information, in particular binding site type, binding ligand type, kinase-ligand interaction fingerprints, downstream molecules and related human diseases, which are not comprehensively assembled in other related structural databases. For the query molecule, ProfKin predicts its possible binding modes with each binding site in the KinLigDB database, compares the predicted binding modes with the respective reference binding modes on the basis of key interaction features via a weighted interaction fingerprinting method [30], and outputs the top-ranked kinases according to an integrative index of docking and fingerprint similarity scores. The database and prediction results can be freely accessed and downloaded. ProfKin is expected to serve as a useful tool for exploiting the potential of kinase selectivity profiling in kinase-targeted lead and drug discovery.

Database construction
As an important prerequisite for structure-based kinase profiling, the kinase-ligand complex-focused database KinLigDB was established through the following steps. The keyword "Kinase" was first searched in the Protein Data Bank (PDB, http://www.rcsb.org/) with the restriction of "Homo sapiens", which led to a total of 6,492 human kinase structure entries (as of June 30, 2019); all of these structure coordinates were downloaded from the PDB. An in-house program was then used to pick out the kinase-ligand complex structures and simultaneously separate the protein and ligand coordinates. The resulting complex structures were further checked and corrected by manual inspection. For these structures, information including kinase name, gene name, alias name, kinase group, kinase family, mutation, downstream molecules, and associated diseases was comprehensively collected from the PDB, KinBase (http://kinase.com/kinbase/), Uniprot (https://www.uniprot.org/), KEGG (https://www.genome.jp/kegg/), TTD (http://bidd.nus.edu.sg/group/cjttd/) and references therein. The kinase-ligand interaction information, including the binding site type, ligand type, key residues, and binding data, was further curated and/or collected from references. In the end, a total of 4,219 complex structures were retained in the KinLigDB database.
The AutoDock Vina program [31] was used as the docking engine for binding pose prediction. All the protein structures were prepared by assigning Gasteiger-Marsili charges and adding polar hydrogens using AutoDockTools and then saved in pdbqt format. The binding site information, including the grid center, grid size, and number of docking poses, was generated for each complex structure using an in-house program and stored in a configuration file (conf format). The interaction fingerprinting (IFP) method [30] was used to characterize the key kinase-ligand interaction features involving hydrogen-bond acceptor, hydrogen-bond donor, negatively charged center, positively charged center, hydrophobic interactions, and face-to-face and edge-to-face π−π stacking interactions. For each kinase-ligand complex, a specific IFP mode was generated and saved in ifp format. All the generated IFP modes are used as reference modes for later kinase profiling.
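As an illustration of the configuration step, the sketch below derives a docking box from the coordinates of a co-crystallized ligand and formats it as an AutoDock Vina configuration file. The in-house program is not described in the text, so the box construction (ligand bounding box plus a fixed padding), the padding value, the receptor file name and the example coordinates are all assumptions; only the configuration keys (receptor, center_x/y/z, size_x/y/z, num_modes) are standard Vina options.

```python
def grid_box(ligand_coords, padding=8.0):
    """Return (center, size) of an axis-aligned box around the ligand atoms.

    The padding (in angstroms) is an assumed value, not one taken from ProfKin.
    """
    xs, ys, zs = zip(*ligand_coords)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + padding for lo, hi in zip(mins, maxs))
    return center, size


def vina_config(receptor, center, size, num_modes=9):
    """Format the binding site definition as AutoDock Vina config text."""
    lines = [f"receptor = {receptor}"]
    lines += [f"center_{axis} = {value:.3f}" for axis, value in zip("xyz", center)]
    lines += [f"size_{axis} = {value:.3f}" for axis, value in zip("xyz", size)]
    lines.append(f"num_modes = {num_modes}")
    return "\n".join(lines) + "\n"


# Hypothetical ligand heavy-atom coordinates (angstroms) and receptor file name.
coords = [(10.0, 4.0, -2.0), (14.0, 6.0, 0.0), (12.0, 8.0, 2.0)]
center, size = grid_box(coords)
conf_text = vina_config("example_protein.pdbqt", center, size)
print(conf_text)
```

Storing one such file per complex keeps the docking step uniform: the same Vina call can be repeated over every binding site with only the configuration file changing.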

Structure-based kinase profiling approach
The structure-based kinase profiling approach behind the ProfKin web server integrates molecular docking and interaction fingerprinting, as briefly described below: (i) the molecule of interest (MOI) submitted by the user through the web interface is automatically processed with AutoDockTools: rotatable bonds are defined, partial charges and polar hydrogens are assigned, and the molecule is converted into pdbqt format; (ii) the prepared MOI is docked into each binding site in KinLigDB by calling AutoDock Vina, and top-ranked docking poses are generated for each binding site; (iii) for each docking pose, the IFP mode is generated using the same method as described above for the reference IFP modes [30]; (iv) the similarity score between the docking pose IFP and the reference IFP modes is calculated as described previously [30]; (v) a comprehensive index (Cvalue), integrating the advantages of docking and IFP similarity scores, is finally calculated and used for kinase ranking and profiling. Because molecular docking and IFP analysis are complementary, their combination is likely to improve the prediction results.
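Steps (iv) and (v) can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the weighting scheme of the IFP method [30] or the formula for Cvalue, so the per-interaction weights, the weighted Tanimoto similarity, the blending parameter alpha and the score scaling are hypothetical stand-ins, and the residue labels are made up for the example.

```python
# Hypothetical weights per interaction type; the published IFP method [30]
# defines its own scheme, which is not reproduced here.
WEIGHTS = {
    "hb_acceptor": 2.0,
    "hb_donor": 2.0,
    "neg_charge": 2.0,
    "pos_charge": 2.0,
    "hydrophobic": 1.0,
    "pi_face_to_face": 1.0,
    "pi_edge_to_face": 1.0,
}


def weighted_tanimoto(pose_ifp, ref_ifp, weights=WEIGHTS):
    """Weighted Tanimoto similarity of two sets of (residue, interaction) pairs."""
    shared = sum(weights[kind] for _, kind in pose_ifp & ref_ifp)
    total = sum(weights[kind] for _, kind in pose_ifp | ref_ifp)
    return shared / total if total else 0.0


def combined_index(vina_score, ifp_similarity, alpha=0.5, score_floor=-12.0):
    """Illustrative stand-in for Cvalue (the actual formula is not given)."""
    # Vina scores are in kcal/mol and more negative is better; map to [0, 1].
    docking_term = min(max(vina_score / score_floor, 0.0), 1.0)
    return alpha * docking_term + (1.0 - alpha) * ifp_similarity


# Illustrative reference and docking-pose fingerprints.
reference = {("LEU83", "hb_acceptor"), ("LEU83", "hb_donor"), ("PHE80", "hydrophobic")}
pose = {("LEU83", "hb_acceptor"), ("PHE80", "hydrophobic"), ("LYS33", "pos_charge")}

similarity = weighted_tanimoto(pose, reference)
index = combined_index(-9.0, similarity)

# Rank hypothetical per-kinase results by the combined index, best first.
results = {"kinase_A": index, "kinase_B": combined_index(-7.5, 0.2)}
ranking = sorted(results, key=results.get, reverse=True)
print(round(similarity, 3), round(index, 3), ranking)
```

The point of blending the two terms is that a pose with a favorable docking score but few of the reference interactions is ranked below a pose that both scores well and reproduces the key contacts.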

Website development
The ProfKin web server (http://www.lilab-ecust.cn/profkin/) has two main functions: providing searchable archives of kinase details and performing structure-based kinase profiling (Figure 1). It runs on a Linux system with Apache as the HTTP server. The web interfaces are implemented in PHP and JavaScript, which control the display behavior of the web pages and respond to user operations. The backend was developed in Python, with a MySQL database for storing the kinase annotations and the task details. For each task, a phylogenetic tree showing the distribution of the ranked kinases is generated using the Kinome Render tool [32]. The superimpositions of docking poses and reference ligands can be visualized and analyzed with NGL Viewer, a JavaScript-based web applet [33]. The website requires a browser supporting HTML5 and ES6, and works well on most mainstream browsers, such as Chrome/Chromium-based browsers, Opera, Firefox, Edge, IE11, and Safari.

Database statistics and access
The current version of KinLigDB contains 4,219 curated kinase-ligand complex structures of 297 human kinases covering 106 kinase families; most of them are related to human diseases. About 75% of these kinases have at least two structures with different ligands, and 92 kinases have ≥10 complex structures, such as CDK2 (357 entries), MAPK14 (204 entries), PIM1 (141 entries), CHK1 (130 entries), and EGFR (104 entries). Most ligands are observed to bind in the kinase active sites and act as inhibitors, while some ligands bind adjacent to the active sites or at allosteric binding sites to specifically modulate (inhibit, activate, or enhance) the kinase catalytic activity. A total of 396 binding sites were found and defined for the kinases in the database. Notably, 73 kinases have two or more different binding sites; for example, NTRK1 has four kinds of binding sites, including type I and type II inhibitor binding sites and two distinct allosteric binding sites [34][35][36]. According to the binding features, eight types of ligands were annotated: type I inhibitor (3,167 entries), type II inhibitor (283 entries), type III inhibitor (23 entries), type IV inhibitor (9 entries), competitive inhibitor (554 entries), covalent inhibitor (31 entries), activator (49 entries), and allosteric ligand (103 entries). A total of 2,805 kinase-ligand complexes were annotated with binding data.
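As a quick consistency check on these statistics, the eight ligand-type counts can be tallied and expressed as shares of the database. The counts are taken from the paragraph above; the dictionary layout and variable names are only illustrative.

```python
# Entry counts per annotated ligand type, as reported for KinLigDB.
ligand_types = {
    "type I inhibitor": 3167,
    "type II inhibitor": 283,
    "type III inhibitor": 23,
    "type IV inhibitor": 9,
    "competitive inhibitor": 554,
    "covalent inhibitor": 31,
    "activator": 49,
    "allosteric ligand": 103,
}

total = sum(ligand_types.values())
shares = {name: count / total for name, count in ligand_types.items()}
with_binding_data = 2805 / total  # fraction of complexes annotated with binding data

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"total entries: {total}; with binding data: {with_binding_data:.1%}")
```

The counts sum to the 4,219 complexes in the database, so each entry carries exactly one ligand-type annotation, and type I inhibitors make up roughly three quarters of the entries.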

Database search
The database search module enables users to search and browse all of the data without any prerequisite knowledge or experience. Users can retrieve all the kinase-ligand complex entries and associated information via basic annotations, such as PDB code, kinase name, kinase family, kinase group, ligand type, binding site type, downstream molecule, and relevant disease (Figure 2A). For example, searching with the kinase group "Atypical" returns a list of 303 atypical kinase structure entries (Figure 2B); users can select all or part of these entries via the first-column select boxes to form a subset database that is linked to the kinase profiling webpage (Figure 2C). Users can also click on a PDB code to access the detailed information page. The linked page mainly includes kinase information (e.g. kinase family/group, mutations, kinase alias, downstream molecules, and associated diseases; Figure 2D) and ligand information (e.g. ligand structure, ligand SMILES, ligand type, binding pocket, key residue, and binding data; Figure 2E). All the ready-to-use coordinates of kinases and ligands and their associated information can be downloaded via the 'DOWNLOAD' webpage.

Kinase profiling
This module enables users to perform kinase profiling prediction for small molecules of interest. Users can upload the query molecule as a mol2 or sdf file, sketch a chemical structure online [37], or input a standard SMILES string (Figure 3A). Users can select all or part of the KinLigDB database for kinase profiling; for example, they can select one kinase group or family as a database subset (Figure 3B). Advanced options, including the cutoffs for IFP similarity and Cvalue, can be set by users to meet specific requirements (Figure 3C). Once all the necessary parameters are given, clicking on the 'Submit' tab starts the computation job, and the system sends the job id to the email address provided by the user (Figure 3D). A job run against the entire database usually takes 30-40 hours because a series of molecular docking calculations is performed; the time cost depends on the database size and the complexity of the query molecule (particularly the number of rotatable bonds). The web server informs users via email when the job is finished. Users can also check the job schedule/progress using the job id. A help document with more details is provided on the kinase profiling webpage.
As an example, a kinase profiling job was run for the compound (Z)-2',3-dioxo-[2,3'-biindolinylidene]-5'-sulfonic acid, an indirubin derivative that is a potent CDK2 inhibitor [38]. The profiling results can be visualized and downloaded on the webpage (Figure 4A). The 100 top-ranked kinases for the compound are shown graphically with the additional classification attributes and a phylogenetic tree; an additional phylogenetic tree containing all ranked kinases is also given, and users can switch between the two trees (Figure 4B). In addition to CDK2, which was ranked in the top 3 (Figure 4C), the compound is observed to fit well into the binding sites of multiple kinases, such as TYK2, AurA, and JAK3 (Figure 4C); for example, the compound likely has a binding mode similar to that of a potent TYK2 inhibitor, although their chemical structures are clearly different (Figure 4D-E). In addition, the user can click on the kinase name of any record in the result list to start a KinLigDB search on this kinase.

Conclusion
This work provides the ProfKin web server as a platform for efficiently analyzing potential binding kinases for molecules of interest guided by structural informatics, with the aim of assisting inhibitor development and drug discovery targeting clinically relevant kinases. An important feature of ProfKin is the comparative analysis of predicted binding poses with reference binding modes through the weighted interaction fingerprint method, which is suitable for structurally similar ligands and is also useful for identifying similar binding modes of structurally different ligands that are not easily identified by ligand similarity methods. The manually curated structural and information database is also provided and could be used directly for developing other kinase profiling methods or platforms. The web server and database are freely accessible for non-commercial users at http://www.lilab-ecust.cn/profkin/. We sincerely welcome support and advice from users to improve ProfKin's usefulness.

Figure 1. The main features of ProfKin.