Our study commenced with a crowd-sourcing approach wherein each member of our lab collected all published and patented synthetic routes for one of the drugs in Figure 1. The routes were then encoded as simplified molecular-input line-entry system (SMILES) strings. From this dataset we built an interactive route visualizer, freely available at http://covidroutes.cernaklab.com (21), to facilitate review of existing routes. For each target, the concatenated list of starting material SMILES from its known routes was used as an exclusion criterion in the retrosynthetic search, which allowed us to rapidly navigate to novel starting materials. Each search returned 50 route proposals, and the user-defined search heuristic was generally set to minimize starting material cost. A single search heuristic worked for most targets, but occasionally the preference for minimal-cost reagents produced routes with more reaction steps than desired. In these cases, the search heuristic was modified by relaxing the preference for reagent cost and increasing the software's beam search width. Predicted routes were manually reviewed for step count, synthetic feasibility, and ease of executing the proposed reactions on multikilogram scale, for instance by favoring routes that minimized the use of cryogenic cooling or pyrophoric reagents. The final heuristic used for each target is given in the Supporting Information.
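The exclusion-and-ranking logic described above can be sketched in a few lines. This is a minimal illustration, not SYNTHIA™'s internal implementation: RouteProposal, build_exclusion_set, and filter_and_rank are hypothetical names, route cost is collapsed into a single illustrative number, and SMILES are matched as exact strings, which assumes every string has first been canonicalized (for example with a cheminformatics toolkit such as RDKit).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteProposal:
    """Hypothetical record for one predicted route (not a SYNTHIA data format)."""
    name: str
    starting_materials: tuple[str, ...]  # SMILES of purchased inputs
    cost: float                          # illustrative total starting-material cost
    steps: int


def build_exclusion_set(published_routes: list[list[str]]) -> set[str]:
    """Concatenate starting-material SMILES from every collected route for one target."""
    return {smiles for route in published_routes for smiles in route}


def filter_and_rank(proposals: list[RouteProposal], excluded: set[str],
                    top_n: int = 50) -> list[RouteProposal]:
    """Drop routes that reuse an excluded starting material, then rank by cost, then steps.

    Assumes all SMILES were canonicalized beforehand; exact string matching
    would otherwise miss equivalent molecules written differently.
    """
    novel = [p for p in proposals
             if not any(sm in excluded for sm in p.starting_materials)]
    return sorted(novel, key=lambda p: (p.cost, p.steps))[:top_n]


# Toy data for illustration only.
known = build_exclusion_set([["CCOC(=O)CC(C)=O", "Nc1ccccc1"]])
candidates = [
    RouteProposal("route_1", ("CCOC(=O)CC(C)=O",), 1.0, 4),
    RouteProposal("route_2", ("CCc1ccc(N)cc1",), 2.0, 5),
]
print([p.name for p in filter_and_rank(candidates, known)])  # ['route_2']
```

Sorting on (cost, steps) loosely mirrors the default preference for cheap starting materials; relaxing that preference, as was done for some targets, would roughly correspond to giving step count more weight in the sort key.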
Among the small molecules under consideration (Figure 2), we focused on remdesivir (1), umifenovir (2), bromhexine (3), galidesivir (4), ritonavir (5), cobicistat (6), ribavirin (7), camostat (8), darunavir (9), nelfinavir (10), favipiravir (11), and baricitinib (12). In most cases, the proposed route has the same number of steps as the established routes, or fewer, and begins from distinct starting materials. Our analysis yielded alternative starting material proposals for 1–12, which could relieve pressure on the fine chemical supply chain. Using galidesivir (4) as an example (Figure 3), the software proposed a sequence hinging on a trans-hydroiodination of alkyne 13, an Evans alkylation to form 18, an Ullmann coupling to form 19, and an enantioselective Heck coupling to give 22. The software proposed that the latter reaction mixture could be treated in situ with hydrochloric acid to remove the Boc protecting groups in a one-pot operation. Dihydroxylation of 22 would complete the synthesis of 4. Exemplary starting materials 23–25 were excluded from the search because their SMILES strings appear in published routes to 4. The algorithm successfully navigated around five established pyrrolopyrimidine starting materials to arrive at 21, which is cost-competitive with the established nucleobase sources; for instance, the 7-des-bromo analog of 21, 4-chloro-5H-pyrrolo[3,2-d]pyrimidine, is listed at $9.90/g, while a 4-OH analog of 21, 7-bromo-5H-pyrrolo[3,2-d]pyrimidin-4-ol, which is used in the reported synthesis of 4, is listed at $280/g (22). On production scale, all starting material pricing would likely be set by competitive bidding, but in any event the high list price of 21 is comparable to those of starting material analogs currently described in patents.
The proposed use of an Evans auxiliary to produce 18 highlights the software's preference for robust chemistry, but this step could likely be replaced with a catalytic protocol to avoid auxiliary use in large-scale production if needed. Indeed, the overall route proposes a variety of catalytic operations. For the production of 1–12 on large scale, routes could be found that minimized the use of cryogenic conditions, pyrophoric reagents, or expensive catalysts, which were the main biases imposed in our manual review of answer sets beyond route length and starting material cost. For 1, novel starting materials were identified but the route bore high similarity to known routes, mirroring the automated retrosynthesis findings of the Grzybowski lab for this target (10). While the software was challenged by esoteric functionalities such as the chiral phosphorus atom of 1, the predicted route to 4 discussed above, and those to 2 and 3 discussed below, represent typical outputs.
Umifenovir (2) is an antiviral drug developed to combat influenza infections; its use against SARS-CoV-1 made it an attractive synthetic target for this study. 2 is believed to inhibit viral entry into human cells, and it has been used in many clinical trials as an investigational COVID-19 therapy (9). Although 2 shows promising in vitro activity against the novel coronavirus, recent clinical results suggest limited efficacy for 2 against COVID-19 in humans (23). Using the search criteria described above, we arrived at a series of routes to 2 based on the oxidative cyclization of an aniline with a β-ketoester (24). Because it is a commodity chemical, ethyl acetoacetate (27) was not included in the exclusion criteria of our heuristic and appears as a starting material here, although it has been used previously in the synthesis of 2. Starting materials 28–30, among several others (see Supporting Information), were excluded. A key theme that separated the predicted routes from the established indole-forming routes, and enabled the use of distinct starting materials, was the incorporation of a Baeyer-Villiger oxidation that uses an acetyl group as a surrogate for the requisite hydroxyl group at C1. We found this Baeyer-Villiger proposal to be a surprising solution. Another proposal that was non-obvious to us was the C–H oxidation of an ethyl group, where C–H functionalization logic (25-28) reduces the cost of the starting materials. The software proposed an inefficient three-step sequence to N-methylate the indole, whereas published syntheses of 2 report N-methylation directly from the indole with methyl iodide and sodium hydride. We opted to use this one-step precedent instead of the software's three-step proposal.
In another search, a proposed sequence to 2 began with a pre-installed halogen coupling handle, rather than a C–H bond, to enable a related indole formation, and invoked a different C–H functionalization logic via a Bamberger rearrangement. As described below, four predicted routes to 2 were reduced to experimental practice with only minor modifications to the reaction conditions and sequences proposed by the software.
To experimentally validate routes to 2 (Figure 4), we first investigated the proposed indole formation from 1-(4-aminophenyl)ethan-1-one (26, $1.15/g) and ethyl acetoacetate (27, $0.03/g) under oxidative reaction conditions (Figure 4, route A). Pretreatment of 26 and 27 with 1 mol% indium(III) bromide to form 31 was followed by oxidative cyclization to form 32. While the published reaction conditions for the suggested reaction (24) did provide the desired indole 32, the yield was only 20%. Using magnesium sulfate to promote the formation of 31 improved the yield of 32 to 47%; as described below, other implementations of this reaction gave much higher yields. N-Methylation proceeded smoothly to give 33 in 99% yield. The Baeyer-Villiger oxidation with mCPBA, however, gave a mixture of oxidation products. Unstable products, which we believe arise from oxidation of the indole double bond, accounted for the bulk of the reacted material, and only traces of 34 were isolated. While formation of 34 was accurately predicted, the subtle electronic interplay that governs the preference for the desired Baeyer-Villiger oxidation over the undesired Prilezhaev oxidation could not be resolved by the software, and the best modified conditions we found yielded small amounts of 34 as a mixture with undesired oxidation products. A literature search on related indoles revealed that the α-chloroketone 35 should be a viable Baeyer-Villiger substrate (29), with the chloro group acting as a directing group that favors oxidation of the ketone. We thus modified the route and, indeed, chlorination of 33 led to 35, which underwent selective Baeyer-Villiger oxidation to give 36. Subsequent bromination produced 37, which underwent thioetherification with 38 and in situ saponification to give 39. Here, the route intercepts known syntheses of 2, which is obtained from 39 by alkylation with 40 (30).
All intermediates predicted by the software were observed, but a modification to incorporate a chlorine directing group was necessary to achieve usable selectivity in the formation of 36. This change led us to demonstrate the bromination of 36 to produce 37 rather than brominating 34, although the selective bromination of 34 en route to 39 is a known reaction (31).
The output of a SYNTHIA™ search is a ranked list of route proposals, and several other computed routes to 2 were also experimentally vetted. One route, based on a variation of the same indole-formation and Baeyer-Villiger sequence described above, proposed a benzylic C–H oxidation of indole 43 (Figure 4, route B). The indole synthesis was more productive with 41 than with 26, giving 42 in 79% yield, and methylation gave 43 in 92% yield. SYNTHIA™ predicts reaction sequences and recommends corresponding reaction conditions based on those reported in the source literature. While these recommendations work well for most substrates, exact recipes for specific substrates may require user direction. The software-recommended conditions for the C–H oxidation of 43, Oxone® with potassium bromide (32), were unsuccessful in our hands. An experimental survey of oxidants revealed that the recently disclosed Baran-Roček oxidation (33, 34) could selectively oxidize C14 in 62% yield, thus intercepting the previous route to 2. Although the chromium waste it generates makes this oxidation viable only on small scale, the result validates the proposed C–H functionalization hypothesis.
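As a rough check on route B, the reported step yields can be combined into an overall yield for the three steps from 41 to the C14-oxidized intermediate. This sketch assumes that isolated step yields simply multiply, which ignores differences in scale or purity between runs; overall_yield is an illustrative helper name.

```python
from math import prod


def overall_yield(step_yields: list[float]) -> float:
    """Overall yield of a linear sequence: the product of its fractional step yields."""
    return prod(step_yields)


# Route B step yields reported in the text, treated as independent and multiplicative:
# indole formation to 42, N-methylation to 43, Baran-Roček C-H oxidation at C14.
route_b = [0.79, 0.92, 0.62]
print(f"{overall_yield(route_b):.1%}")  # 45.1%
```

Under that assumption, roughly 45% of 41 would reach the point where route B joins the previous route.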
The direct installation of the chloromethyl ketone via a Friedel-Crafts acylation would provide a concise alternative route to 2. Indeed, the software had proposed a route that used a Friedel-Crafts acylation (Figure 4, route C). This route was intriguing in that it began from 45, an exceptionally cheap starting material. While the software proposed a Friedel-Crafts acylation with acetyl chloride, we modified the route to use chloroacetyl chloride (48, $0.13/g) instead, thus installing the chloride directing group in a single step. Experimentally, oxidative indole coupling to form 46, followed by methylation to form 47, proceeded smoothly. Friedel-Crafts acylation of 47 with 48 under the influence of aluminum(III) chloride gave 35 and intercepted the other routes. The 2:1 regioselectivity of the acylation would require optimization for production on scale; aside from this reaction, the regioselectivity for the desired isomers was excellent in the other C–H functionalization events. While C–H functionalization logic is not new, it has been broadly adopted in synthesis only in recent years. We were therefore surprised that the software frequently proposed C–H functionalization reactions, and we suspect this outcome results from the preference for low-cost starting materials in our heuristic, with C–H bonds in many instances being cheaper than other functionalities. The Friedel-Crafts acylation route described here functionalizes six C–H bonds over seven reactions to convert 27 and 45 into 2.
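The practical ceiling imposed by the 2:1 regioselectivity can be made explicit. Assuming the 2:1 ratio favors 35, the fraction of acylated product that is 35 is fixed at two thirds, so even at full conversion of 47 and perfect recovery its yield cannot exceed about 67%. Here f_35 (the fraction of acylated product that is 35), Y_35 (its isolated yield), and c (the conversion of 47) are symbols introduced only for this illustration.

```latex
f_{35} = \frac{2}{2+1} = \frac{2}{3} \approx 0.67,
\qquad
Y_{35} \le c \, f_{35} \le 0.67
```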
We next employed a different tactic. Most routes to 2 hinge on a Nenitzescu indole synthesis (30) between 1,4-benzoquinone and a β-aminocrotonic ester (21). Indeed, the Nenitzescu reaction using known starting materials featured among the proposals in our query results when default search criteria were used, so the keyword “Nenitzescu” was used as an exclusion criterion. This heuristic did not employ a SMILES exclusion criterion, so starting material 52 was used even though this chemical has appeared in a prior synthesis of 2. The results of this search led to yet another form of C–H functionalization logic, a Bamberger rearrangement to install the C1 hydroxyl (Figure 4, route D). SYNTHIA™ proposed 4-bromo-2-chloro-1-nitrobenzene as a starting material. In our hands, the requisite indole coupling on the chloride gave only traces of 39, so we ultimately modified the starting material to 2,4-dibromo-1-nitrobenzene (49). This modification allowed the indole coupling to proceed, as discussed below, with the added benefit that 49 is cheaper than the corresponding chloride. In practice, 49 was reduced to the hydroxylamine and treated with aqueous trifluoroacetic acid to effect the Bamberger rearrangement, yielding 50, which was methylated to give 51. Copper-catalyzed coupling of 51 with 53, itself obtained through the union of 52 and 38, produced 39 in 66% yield when 54 was used as a ligand. These conditions were the result of a rapid optimization campaign using high-throughput experimentation (see Supporting Information). Subsequent alkylation of 39 with 40 produced 2. Because this route maximizes convergency, its longest linear sequence is just five steps.
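The five-step longest linear sequence of route D can be checked by treating the route as a directed graph and taking the deepest path from any purchased starting material to 2. In this sketch, ROUTE_D is a hypothetical encoding assembled from the description above; "49-NHOH" is an illustrative label for the unnumbered hydroxylamine, and counting the nitro reduction and the Bamberger rearrangement as two separate steps is our assumption.

```python
from functools import lru_cache

# Hypothetical encoding of route D built from the text: each product maps to the
# precursors consumed in the single step that forms it. Compounds that never
# appear as keys are treated as purchased starting materials. "49-NHOH" labels
# the unnumbered hydroxylamine; counting its formation as a separate step is an assumption.
ROUTE_D = {
    "49-NHOH": ("49",),       # reduction of nitroarene 49
    "50": ("49-NHOH",),       # Bamberger rearrangement (aqueous TFA)
    "51": ("50",),            # methylation
    "53": ("52", "38"),       # union of 52 and 38
    "39": ("51", "53"),       # Cu-catalyzed indole coupling, ligand 54
    "2": ("39", "40"),        # alkylation with 40
}


def longest_linear_sequence(route: dict[str, tuple[str, ...]], target: str) -> int:
    """Number of steps on the deepest path from any starting material to target."""

    @lru_cache(maxsize=None)
    def depth(compound: str) -> int:
        if compound not in route:  # purchased starting material: zero steps
            return 0
        return 1 + max(depth(p) for p in route[compound])

    return depth(target)


def total_steps(route: dict[str, tuple[str, ...]]) -> int:
    """Each product in the map is formed in exactly one step."""
    return len(route)


print(longest_linear_sequence(ROUTE_D, "2"), total_steps(ROUTE_D))  # 5 6
```

Under this encoding the route has six steps in total, but the 52 + 38 branch runs in parallel with the steps from 49 to 51, so only five lie on the longest path; convergence shortens the longest linear sequence without reducing the total step count.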
Finally, we examined 3 (Figure 5), a transmembrane protease, serine 2 (TMPRSS2) inhibitor that was being investigated in five clinical trials for COVID-19. A SYNTHIA™ search provided new reaction sequences of comparable length to known routes, identifying 55 as a novel starting material (21) by navigating around known starting materials 56–59 and others. The predicted route invoked a C–H oxidation of the benzylic methyl group, presumably to reach cheaper starting materials, which readied 60 for a reductive amination with 61. The proposed route completed the synthesis of 3 by N-methylation of 62 with 63. We considered instead that 3 could be synthesized from 2,4,6-tribromoaniline (64, $0.51/g), which is used in the textile industry and readily available, and N,N-dimethylcyclohexylamine (65, $0.10/g), a commodity chemical used in oil refining, via the direct C–H functionalization recently reported by Shirakawa (35). While this manually designed route does not serve as a test of the software's capability per se, our motivation here was largely to do what we could as synthetic chemists to support production of a potentially beneficial medicine during a pandemic. The key reaction was added to the SYNTHIA™ database so that it would appear as a general solution in subsequent searches, and indeed this route came up as a top hit in a subsequent search for 3. To experimentally realize this one-step route, we found that 64 could be heated in excess 65 in the presence of tert-butylperoxide (35) to produce 3 in 41% yield. Further optimization of the reaction conditions, to improve the yield and ease of purification of 3 and to address the hazard of using a peroxide on large scale, would be needed for commercial production (36). Nonetheless, this strategic disconnection reduces 3, in a single step, to starting materials that are considerably cheaper than those in commercial use.
Disconnection into affordable starting materials was achieved for twelve drugs through the merger of crowd-sourcing and retrosynthetic software. Exploring the combinatorial explosion of routes toward twelve distinct synthetic targets to arrive at distinct and affordable starting materials was a data-handling challenge that could only be met with automated retrosynthesis. Four predicted routes to 2 were experimentally validated, as was one route to 3 that was manually designed but added to the software's database for future use. Our work was performed over nine weeks in Spring 2020 against the backdrop of a developing pandemic. While full process development would require a longer timeline (for instance, reagents such as peroxides would likely be replaced for production on scale), our results show that automated retrosynthetic predictions can rapidly de-risk the route scouting process of translational synthesis.