Since the SpaceDock method heavily relies on the possibility to accurately dock chemical reagents, we first investigate the best docking prototols for the latter task by setting-up a dedicated benchmarking study. We then describe how chemical reagents are annotated by reactive groups and organic chemistry reactions, to define a chemical space of 53.5 billion synthesizable compounds. Last we present two concrete application of the SpaceDock workflow to two receptors of pharmaceutical interest.
Setting up the conditions for accurate docking of chemical reagents
To evaluate the feasibility of the SpaceDock approach, we first needed to set-up an archive of reference 3D structures for protein-bound chemical reagents. Since experimental data for such a dataset are missing, we fragmented in 3D space drug-like ligands from known protein-ligand X-ray structures (sc-PDB dataset)31 using a set of 12 common organic chemistry reactions, then added the 3D atomic coordinates of the missing reactive moieties (e.g. boronic acid, halide; Supplementary Fig. 1) and last created on-the-fly "surrogate X-ray poses" for the corresponding reagents expected to yield the parent ligands with the above-described reactions. The final archive of 5,845 reagents was selected after appropriate filtering (Supplementary Table S1) and exhibit 13 chemical functions with a prevalence of reactive groups (e.g. amines, aryl halides, boronic acids) reflecting the frequent usage of simple organic chemistry reactions in drug discovery.32 With a set of reference reagents in hand, we could next verify whether state-of-the-art docking algorithms were able to reproduce the surrogate X-ray poses. Five algorithms relying on different principles (FlexX:33 incremental construction, GOLD:34 genetic algorithm, PLANTS:35 ant colony optimization, RDPSOVina:36 random drift particle swarm optimization, Surflex:37 surface-based molecular similarity) were used for that purpose. Since the SpaceDock strategy just need a single pair of complementary reagents to be properly docked to reconstitute a full ligand, the docking performance was measured by computing the root-mean square deviation (rmsd) of the pose found to be the closest (best pose) to that of the surrogate X-ray structure (Fig. 1). All docking tools exhibit an excellent docking performance with 70-80% of chemical reagents being docked within 2 Å rmsd accuracy (Fig. 1A). Up to 70% of very high-quality poses (rmsd < 1 Å) could be generated by the apparently best docking/scoring scheme (GOLD docking, PLP scoring; Fig. 1A). The observed docking accuracy is therefore independent on the chosen docking algorithm, and remains in agreement with docking benchmarks on low molecular weight fragments.38, 39 Since rmsd is a global measure that does not take into account whether key protein-reagent interactions are verified or not, we additionally computed the similarity of protein-reagent interaction fingerprints (IFPs)40 between docked and surrogate X-ray poses. Again, an excellent performance could be noticed using this orthogonal quality descriptor, with 75-85 % of chemical reagents for which the IFP similarity to the X-ray pose is deemed acceptable (Tc-IFP > 0.60;40 Fig. 1B). To ascertain that all chemical functions are equally suitable for docking, the same analysis was repeated for each of the 13 chemical groups (Fig. 1C) present in our library, focusing on the best docking strategy (GOLD docking, PLP scoring). Reassuringly, the docking performance appears to be relatively independent on the chemical function of the reagent (Fig. 1C) as well as on the target protein family (Fig. 1D).
Defining a readily-accessible ultra-large chemical space from simple organic chemistry reactions
Starting from the pioneering work of Hartenfeller et al.,26 we selected 36 robust, stereo- and regioselective organic chemistry reactions to define a chemical space of 5.5 billion compounds readily accessible in one or two synthesis steps (Supplementary Table S2, Supplementary Fig. 2). Contrarily to previous similar approaches26, 41, 42, chemical reagents were here carefully chosen from specific SMARTS strings in a list of 145,705 commercial chemical reagents contributing to Enamine's REAL space43 of 36 billion compounds. Moreover, possible side reactions affecting synthesis yields were minored by selecting reagents that are monofunctional for a particular chemical function (e.g. monocarboxylic acid), and lacking additional chemical functions (e.g. nucleophilic groups for an electrophilic reactant) that would decrease the reaction yield (Supplementary Table S2). Altogether, 134,331 commercial reactants could be unambiguously annotated by reaction type, reactant role and reactive atoms yielding a total of 713,155 atomic tags (Fig. 2). Conversion in 3D atomic coordinates provided a total of 176,824 ready-to-dock unique reagents, ionized at pH 7, including stereoisomers for reactants bearing up to two undefined chiral centers.
Retrospective chemical space docking of 97 million compounds for human estrogen receptor beta agonists.
For a first proof-of-concept, we selected as a target the activated form of the human estrogen receptor beta (ERβ) for the following two reasons: (i) the ligand-binding cavity is nicely druggable with a good hydrophobicity/hydrophilicity balance, (ii) the receptor has been co-crystallized with many high-affinity low molecular-weight agonists, notably compounds sharing a 2-aryl-benzoxazole scaffold44 whose one-step synthesis from 2-aminophenols and benzaldehydes is one of the 36 reactions that we have encoded. To avoid a possible chemotype bias, we selected an X-ray receptor structure co-crystallized with genistein (PDB 1QKM), a non-benzoxazole high-affinity agonist used from hereon as "reference ligand" (Fig. 3A) and asked whether we could recover a "ground truth" benzoxazole agonist (WAY-338, Fig. 3A) or any close analog, by first docking the necessary reactants (2-aminophenols, benzaldehydes) and then enabling the benzoxazole ring formation within the protein binding site. To this end, 145 commercial 2-aminophenols and 3,874 benzaldehydes were generated in 3D and docked into the 1QKM structure, in order to explore a combinatorial space of 561,730 possible benzoxazoles. Since the later space is small, we additionally considered a much larger space of 97 million sulfonamide decoys synthesizable from 1,275 sulfonyl chlorides and 76,758 amines, thereby strongly minoring the benzoxazole space (0.57%) in the full chemical space to scan. After docking all reagents necessary to mine both chemical spaces according to the previously found best protocol (GOLD docking, PLP scoring), a series of filters of increasing complexity (Table 1) was iteratively passed to a decreasing number of possible solutions, first starting with pairs of potentially reacting reagent poses, then with successfully enumerated ligand poses, and last with quality checked redocking poses.
Table 1 | Incremental series of filters applied to prioritize SpaceDock hits
Filter
|
Type
|
Criteria
|
Applies to
|
Software used
|
1
|
Geometry
|
Distances, angles, clashes
|
Pair of reactant poses
|
this work
|
2
|
Interaction
|
Interaction fingerprint similarity to reference
|
Pair of reactant poses
|
IChem45
|
3
|
Energy, geometry
|
Rmsd of refined pose to non-refined pose
|
Fully enumerated ligand
|
Szybki46
Surflex-Dock37
|
4
|
Interaction, structure
|
Interaction fingerprint similarity (IFP) to reference
Number of stereocenters Number of rotatable bonds
Drug-likeness
|
Fully enumerated ligand
|
IChem45
Filter46
|
5
|
Redocking
|
Rmsd to energy-minimized SpaceDock pose
IFP similarity to energy-minimized SpaceDock pose
|
Docking poses
|
GOLD34
Surflex-Dock37
IChem45
|
6
|
Quality check
|
Number of strained torsions Local and global strain energy
Number of unsatisfied H-bond donors and acceptors, number of unsatisfied ionic bonds
|
Docking poses
|
Torsion_analyzer47 Freeform46
this work
|
7
|
Final selection
|
Duplicates removal
HYDEscore
|
Docking poses
|
This work
Hydescorer48
|
The SpaceDock flowchart is displayed Fig. 3. In a first step, pure chemical and topological filters (Supplementary Figs. 3, 4) are passed to all docking poses of possible reactant pairs to quickly remove impossible reactions (filter #1). To stay on a safe side, we only considered pairs of bound reactants exhibiting a total interaction fingerprint (IFP) similarity40 to the genistein X-ray pose above an acceptable threshold40 (IFP ≥ 0.60 considering all non-bonded interactions, IFP ≥ 0.50 considering polar interactions only; filter #2). The 821,702 remaining pairs of reactants were then converted, in the protein 3D space, into the corresponding benzoxazoles and sulfonamides, respectively and the fully enumerated ligands were quickly minimized in the protein binding site. Only 539,906 poses deviated less than 1.0 Å rmsd from the non-refined poses after energy refinement (filter #3). The remaining minimized poses were filtered again according to IFP similarity of the genistein X-ray pose (IFP ≥ 0.60 considering all non-bonded interactions, IFP ≥ 0.60 considering polar interactions only; filter #4). Compounds with more than 2 stereocenters and 8 rotatable bonds were removed at this stage, leaving 49,569 poses for further processing. To ensure that the selected SpaceDock poses might be recovered by classical docking, all remaining hits were redocked to the ERβ structure, as previously done for the reagents. Only 121,470 poses close to the corresponding energy-minimized SpaceDock poses (rmsd ≤2.0 Å; IFP ≥ 0.60 considering all non-bonded interactions, IFP ≥ 0.60 considering polar interactions only) were retained (filter #5). A quality check of remaining poses (filter #6) was next applied to remove unlikely solutions (≥ 1 strained torsion, local strain energy > 4 kcal/mol, global strain energy > 8 kcal/mol, no unsatisfied ionic bond , >2 unsatisfied h-bond donors, > 4 unsatisfied h-bond acceptors).22, 49 The number of plausible solutions (7,712) being still important, a custom filter was finally applied to keep only poses anchored at both sides of the binding pocket (H-bond either Glu305 or Arg346, and to His475), as seen for all potent ERβ agonists (recall genistein X-ray pose, Fig. 3A). The final hit list comprises 102 poses from 64 unique ligands (filter #7), including 54 benzoxazoles and 10 sulfonamides (Fig. 3B, Supplementary Table S3) ranked by decreasing full IFP similarity to the reference ligand, then by decreasing polar IFP similarity, and last by increasing absolute binding free energy predicted by the HYDE scoring function.48
Despite in minority in the initial space (0.57%), it is reassuring that the ground truth chemotype was considerably enriched (84 %) in the final hit list. Inspecting the structures and binding poses of the hits, we observed that SpaceDock was indeed able to recover, among the top-ranked hits, the ground truth ligand (rank #9), a known ERβ agonist ChEMBL18767350 (IC50 = 50 nM, rank #25) and 52 other 2-arylbenzoxazoles, with almost perfect binding modes (rmsd =1.15 Å for the ground-truth ligand, Fig. 3C). About half of the hits (30 out 64; all from the benzoxazole space), were considered chemically similar to existing ERβ ligands (Supplementary Fig. 5), evidencing that SpaceDock can propose both known ligands (or very close analogs thereof) and new chemical entities. However, only a lower number of compounds (17, out of which 10 share the sulfonamide space) strictly intersected Enamine REAL space (Supplementary Fig. 5). This observation does not preclude for their synthesizability but just illustrates that these hits, despite the commercial availability of their starting building blocks, cannot be obtained within the scope of 167 parallel synthesis protocols defining REAL space.
From this preliminary proof-of-concept, it appears that the herein presented method is able to perform a complex organic chemistry reaction (ring cyclisation) from suitably posed and chemically compatible chemical reagents, under the 3D constraints of the target's structure, to generate and prioritize fully enumerated ligands for meaningful reasons. We therefore decided to apply SpaceDock to a prospective screening of a much larger chemical space.
Prospective chemical space docking of 670 million compounds for human dopamine D3 receptor antagonists.
We next applied the method to a much larger chemical space of 670 million carboxamides targeting the human dopamine D3 receptor (DRD3). Since the only available high-resolution DRD3 receptor structure (PDB 3PBL) has been obtained in complex with the antagonist eticlopride (Fig. 4A),51 the latter orthomethoxybenzamide (OMB) ligand was used as both reference and ground-truth ligand to recover. Commercially available carboxylic acids and primary/secondary amines (Supplementary Table 2) were first filtered to remove reagents that, upon amide bond formation, would lead to non-drug-like ligands (Supplementary Table 4), thereby keeping 19,887 acids and 33,726 amines (in 3D coordinates) to explore a chemical space of 670 million carboxamides (Fig. 4B). The resulting 53,613 chemical reagents were then docked to the eticlopride-free DRD3 structure using GOLD docking and PLP scoring, as previously described. Since 20 poses were saved for each reactant, a total of 268 billion (19,887*20*33,726*20) possible reactions were passed to the SpaceDock flowchart (Fig. 4B), removing first impossible amide bond formation according to geometrical criteria (Supplementary Fig.6) while keeping only amine poses exhibiting the crucial ionic bond to the key Asp110 residue51 (filter #1, Fig. 4B), then retaining pair of reactant poses for which the IFP similarity to the reference ligand is higher than 0.60 for all interactions and 0.50 for polar interactions only (filter #2).40 A total of 24,674,693 reactions were conducted in silico to generate the corresponding carboxamides inside the receptor pocket, that were later energy-minimized. Keeping only minimized poses that did not deviate much from the initial pose (rmsd < 1.0 Å) afforded 15,120,198 plausible solutions (filter #3, Fig. 4B). At this stage, hits bearing a cis-amide bond or more than 2 chiral centers or more than 9 rotatable bonds were removed to keep only drug-like compounds. The resulting number of hits being still very high, we pruned the hit list by keeping only minimized poses with a high full IFP similarity to the reference ligand (IFP similarity > 0.60) while exhibiting a perfect IFP similarity to eticlopride (IFP =1) with respect to polar interactions (H-bond and ionic bond to Asp110). This filter (filter #4, Fig. 4B) yielded to 518,306 SpaceDock poses (corresponding to 500,041 unique compounds) that had to be confirmed by full atomistic docking (GOLD docking, PLP scoring, 20 poses saved) of the corresponding ligands and comparison with the minimized SpaceDock poses. Only docking poses verifying the following three criteria (rmsd ≤2.0 Å & IFP_full ≥ 0.60 & IFP_polar =1) were retained, leaving 712,120 good docking poses (filter #5, Fig. 4B) for sanity check (no strained torsion, local strain energy ≤ 4 kcal/mol, global strain energy ≤ 8 kcal/mol, no unsatisfied ionic bond, ≤ 2 unsatisfied h-bond donors, ≤ 4 unsatisfied h-bond acceptors, filter #6, Fig. 4B). The number of remaining poses being still important (97,096), a custom filter (not implemented by default, Table 1) was added to remove poses for compounds with no aromatic ring (always present in known DRD3 antagonists),52 exhibiting a predicted absolute binding free energy (HYDEscore) lower than 30 kJ/mol and further restricting the deviation to the original SpaceDock poses (rmsd ≤1.0 Å & IFP_full ≥ 0.75). A reasonable number of 757 docking poses from 315 unique ligands (filter #7, Fig. 4B) defined the final hit list. Compounds were ranked by decreasing full IFP similarity to the reference ligand, then by decreasing polar IFP similarity, and last by increasing HYDE binding free energy (Supplementary Table 5).
As for the first attempt on ERβ ligands, we first check whether the ground-truth ligand and its corresponding OMB scaffold were present in the list. Indeed, 15 OMBs including eticlopride (rank 30) were part of the list with binding poses very similar to that observed for the reference ligand (rmsd of eticlopride = 0.73 Å, Fig. 4C). Interestingly, 300 additional hits not sharing the OBM scaffold were prioritized with poses and protein-ligand interaction patterns quite close to that seen for eticlopride (Fig. 4D). Most ligands were scaffold hops for which the orthomethoxybenzamide has been replaced by a bicyclic heteroaryl-amide, connected by 2-3 carbon atoms to a basic amine. By comparison to the ERβ hit list, the DRD3 hits deviate more from known ChEMBL ligands (24% considered as chemically similar) but are more easily obtainable in REAL space (53% being directly purchasable, and additional 38% being very close to REAL space compounds; Supplementary Fig. 7). 16 chemically diverse and representative hits were directly purchased at Enamine, out of which 15 could be synthesized in six weeks (5 mg quantity, > 90% purity) and further tested for binding to human DRD3 (Fig. 5).
Out of the tested 15 compounds, ten exhibited detectable binding (> 20% inhibition) to the DRD3 receptor at the single concentration of 10 µM (Fig. 5). The six strongest binders (#1, #25, #66, #107, #142, #161) were selected for dose-curve responses for inhibition constants (Ki) determination (Fig. 5, Supplementary Fig. 8). Three of them (#1, #66, #142) exhibited Ki values in the 300-400 nM range, the three others at 1.4-1.6 µM. The remarkable hit rates (66% at 10 µM, 20% at 500 nM) are in line with previous observations from docking ultra-large libraries,10, 11 and suggests that SpaceDock competes rather well with much more demanding full atomistic docking when screening large chemical spaces.
Interestingly, novel heteroamatic-carboxamide scaffolds were disclosed for 4 of the strong binders (#66, #107, #142 and #161) that could not be found in any of 6,714 dopamine DRD2/DRD3 ligands from ChEMBL (Table 2). SpaceDock proposals should still be considered as primary hits. As such, their potency is lower than that of the closest dopamine D2/D3 antagonists from ChEMBL, albeit with a higher ligand efficiency.