NMR-Guided Directed Evolution

Directed evolution can rapidly achieve dramatic improvements in the properties of a protein or bestow entirely new functions on it. We have discovered a strong correlation between the probability of finding a productive mutation at a particular position of a protein and the chemical shift perturbation observed in nuclear magnetic resonance (NMR) spectra upon addition of an inhibitor of the chemical reaction it promotes. In a proof-of-concept study, we converted myoglobin, a non-enzymatic protein, into the most active Kemp eliminase reported to date using only three mutations. The observed levels of catalytic efficiency are on par with those shown by natural enzymes. This simple approach, which requires no a priori structural or bioinformatic knowledge, is widely applicable and will unleash the full potential of directed evolution.

Experimentally, such an environment can be evaluated using NMR, which provides residue-level information under catalytic conditions without the need for full structural characterization. In a conformational ensemble, residues that require substantial reorganization to adopt, or to increase the population of, a specific rotamer that supports the transition state should experience a large change in their NMR chemical shift upon addition of the corresponding transition state analog (usually a competitive inhibitor). Thus, analysis of the chemical shift perturbation (CSP) upon inhibitor titration may help identify mutagenic hot spots in the protein structure, both near and far from the active site.
Kemp elimination (Fig. 1) is a well-established and benchmarked model reaction for testing protein design and evolution methodologies 17-25, so we set out to explore whether an NMR-guided approach can be successfully used to evolve a Kemp eliminase. Inspired by the recent discovery of redox-mediated Kemp elimination promoted by cytochrome P450 26, we sought to use a non-enzymatic heme protein as a starting point for the evolution. To keep the test of the approach unbiased, we chose not to perform any computational pre-selection of possible candidates, but rather focused on the simplest proteins.
Myoglobin (Mb), arguably the best-characterized heme protein, adopts catalytic functionalities upon replacement of the distal histidine His64 27, which in the native protein controls oxygen binding and slows heme oxidation. Mb-H64V has been extensively studied before 28, so we experimentally tested this mutant for its ability to promote Kemp elimination. In the reduced form, Mb-H64V demonstrated a catalytic efficiency of 104 M⁻¹ s⁻¹ at pH 8.0, presenting itself as a promising candidate for NMR-guided directed evolution (Table 1). Despite the paramagnetism and high helical content of the reduced protein, a nearly full backbone assignment was possible, which enabled us to perform a CSP study using 6-nitrobenzotriazole (6-NBT), an inhibitor of Kemp elimination (Fig. 1). The data show 15 hot spots, defined as regions containing residues with a CSP Z-score above 1, dispersed around the protein, both near to and away from the heme cofactor (Fig. 2a,d). Next, we prepared saturation mutagenesis libraries at all positions with Z > 1 and their immediate neighbors (except for the proximal His93, which was not considered because it is absolutely required for binding the heme cofactor). Crude lysate screening of the saturation mutagenesis libraries showed hits in all hot spots. Purification of the identified proteins confirmed the screening results in all cases (all showing large increases in catalytic efficiency, ranging from 2.3-fold to 93-fold, with an average of 21-fold) except one (Mb-H64V/Q152M), for which we were unable to produce enough soluble protein for kinetic characterization. Nine of the 19 identified productive mutations were located away from the active site (Fig. 2d).
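The hot-spot selection described above (Z > 1, plus immediate neighbors, minus His93) can be sketched in a few lines. This is only an illustration under stated assumptions: the combined ¹H/¹⁵N CSP formula and its 0.14 nitrogen weighting are a common convention rather than something specified in this text, the Z-score uses the population standard deviation, and the function names and example values are hypothetical.

```python
import math
import statistics


def combined_csp(delta_h_ppm, delta_n_ppm, n_scale=0.14):
    """Combine 1H and 15N shift changes into one CSP value (ppm).

    ASSUMPTION: the weighted Euclidean form and n_scale=0.14 are a common
    convention, not taken from the source text.
    """
    return math.sqrt(delta_h_ppm ** 2 + (n_scale * delta_n_ppm) ** 2)


def hot_spot_positions(csp_by_residue, z_cutoff=1.0, excluded=()):
    """Return residues with CSP Z-score above the cutoff, together with
    their immediate sequence neighbours, minus any excluded positions."""
    values = list(csp_by_residue.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # ASSUMPTION: population standard deviation
    if sd == 0:
        return []
    hits = {res for res, csp in csp_by_residue.items()
            if (csp - mean) / sd > z_cutoff}
    selected = set()
    for res in hits:
        # Neighbours are taken by residue number; callers should drop
        # numbers that fall outside the actual sequence.
        selected.update((res - 1, res, res + 1))
    return sorted(selected - set(excluded))


# Hypothetical CSP values (ppm) for illustration only.
example = {10: 0.01, 11: 0.01, 12: 0.01, 13: 0.01, 14: 0.01,
           15: 0.01, 16: 0.01, 17: 0.01, 18: 0.01, 92: 0.20}
print(hot_spot_positions(example, excluded=(93,)))
```

Excluding position 93 mirrors the treatment of the proximal histidine in the text; any other position that must stay fixed can be passed the same way.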
Saturation mutagenesis performed at 12 randomly selected positions with small CSP yielded no hits (Fig. 2a, blue asterisks). In a subsequent non-exhaustive gene shuffling experiment, we found that L29I, H64G and V68A can be productively combined with positive synergy (the observed rate is 3-fold higher than expected from the independent contributions of all mutations), a trait quite uncommon in traditional directed evolution experiments. The resulting triple mutant Mb-L29I/H64G/V68A, named FerrElCat for FERRous Kemp ELimination CATalyst, showed remarkable Kemp elimination activity, with a catalytic efficiency of 2,796,000 M⁻¹ s⁻¹ at pH 8 (Table 1). This level of catalytic efficiency is more than an order of magnitude higher than that of the most active reported Kemp eliminase, HG3.17, evolved in 17 rounds of directed evolution 19, and is on par with the levels shown by natural enzymes for the reactions they evolved to catalyze. Importantly, the NMR-guided approach yields mutants with high k_cat values (1,398 s⁻¹ for FerrElCat), a trait that is often hard to achieve in traditional approaches to directed evolution, where high levels of catalytic efficiency are often achieved by lowering K_M. FerrElCat is capable of at least 10,000 turnovers before showing signs of product inhibition (Extended Data Fig. 1). The unprecedented, experimentally guided ca. 27,000-fold improvement in catalytic efficiency (Extended Data Fig. 2) over the starting design in directed evolution of a catalyst for an unnatural reaction was obtained with only three mutations in a non-enzymatic protein (Fig. 2c). The crystal structure of FerrElCat shows remarkable similarity to the starting point of the evolution 29 (RMSD of 0.16 Å, Fig. 2e). While we were unable to obtain a crystal structure of FerrElCat with an inhibitor, docking studies (Fig. 2e) show that directed evolution results in the creation of a tight binding pocket that brings the substrate into proximity to the heme iron.
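The kinetic figures quoted above can be cross-checked with simple arithmetic. The sketch below recomputes the fold improvement from the two catalytic efficiencies given in the text and derives an implied K_M from k_cat and k_cat/K_M. The K_M value is not reported in this section; it is a derived quantity that assumes simple Michaelis-Menten behavior, and the function names are illustrative.

```python
def fold_improvement(evolved_efficiency, starting_efficiency):
    """Ratio of two catalytic efficiencies (same units)."""
    return evolved_efficiency / starting_efficiency


def implied_km(kcat, kcat_over_km):
    """K_M (M) implied by k_cat (s^-1) and k_cat/K_M (M^-1 s^-1).

    ASSUMPTION: simple Michaelis-Menten kinetics; this value is derived
    here, not quoted from the source.
    """
    return kcat / kcat_over_km


MB_H64V_EFFICIENCY = 104.0          # M^-1 s^-1, from the text
FERRELCAT_EFFICIENCY = 2_796_000.0  # M^-1 s^-1, from the text
FERRELCAT_KCAT = 1398.0             # s^-1, from the text

fold = fold_improvement(FERRELCAT_EFFICIENCY, MB_H64V_EFFICIENCY)
km_molar = implied_km(FERRELCAT_KCAT, FERRELCAT_EFFICIENCY)

print(f"fold improvement: {fold:,.0f}")
print(f"implied K_M: {km_molar * 1e3:.2f} mM")
```

The ratio comes out close to 26,900, consistent with the "ca. 27,000-fold" figure, and the implied K_M is about 0.5 mM.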
Strikingly, we were unable to dock either 5-NBI or 6-NBT into the crystal structure of Mb-H64V, which lacks a sufficiently large binding pocket (Fig. 2d). Yet, CSP analysis clearly shows association of the inhibitor with the protein in the place of the distal histidine, highlighting the power of NMR to easily identify productive arrangements of molecules that may not be apparent in modelling based on static crystal structures.
In conclusion, we have discovered a strong correlation between the degree of NMR CSP of backbone amide resonances in ¹⁵N-¹H HSQC spectra of enzymes upon addition of an inhibitor and the probability of finding a beneficial mutation in the vicinity of that residue. The chemical shift perturbation maps are highly sensitive to minor changes in protein sequences, and pinpoint areas likely to affect catalytic activity, even if located far from the active site. In a proof-of-concept study, we converted myoglobin, a non-enzymatic oxygen storage protein, into a highly efficient Kemp eliminase using only three mutations. To our knowledge, this represents the first example of an experimental approach to guide directed evolution that does not rely on a priori structural or bioinformatic analyses, and only requires reliable backbone amide assignments and an appropriate inhibitor. Such NMR data can usually be easily obtained for soluble, folded proteins with fewer than about 300 residues, a criterion that is met by many enzymes selected for directed evolution. Given the simplicity of this experimental approach, we expect it to be widely applicable to other proteins and to unleash the full potential of directed evolution to rapidly create new enzymes for practically important chemical transformations. These results also highlight the power of the minimalist approach to the design of protein catalysts 30, which allows for quick and inexpensive identification of starting points for subsequent directed evolution without detailed consideration of the reaction mechanism or extensive computation, and instead exploits the incredible plasticity of proteins to adopt new functions. Last, but not least, our results contribute to the ongoing debate about the role of dynamics in enzymatic catalysis 11-15 by prospectively validating the importance of conformational selection in protein evolution. This opens the path to new high-value fundamental studies of enzymatic function and evolution.

Reduction and concentration determination of myoglobin variants
For standardization of dithionite, 20-30 mg of solid dithionite (Riedel-de Haen, Germany) as well as potassium ferricyanide (Sigma) were brought into the glovebox. Both solid reagents were dissolved in 1 mL of degassed MilliQ water to prepare stock solutions. The dithionite stock was further diluted 20-fold.
Next, two 1 mL solutions were prepared. In the first one, the potassium ferricyanide stock was diluted 100-fold. In the second one, a 1:1 mixture of the ferricyanide stock and the 20-fold diluted dithionite solution was prepared, with subsequent dilution of each of them by 100-fold. The absorbance of both solutions was measured at λ_max = 420 nm using a UV-Vis diode array spectrophotometer (Agilent 8453).
The reducing equivalence of the 20-fold diluted dithionite solution was calculated from the difference in absorbance between the two solutions using an extinction coefficient of 1,020 M⁻¹ cm⁻¹ at 420 nm. Protein  Table 2.
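The absorbance-difference calculation used for dithionite standardization can be sketched as a Beer-Lambert conversion. The extinction coefficient is taken from the text. The 1 cm path length and the 100-fold factor used to scale the cuvette concentration back to the 20-fold diluted dithionite solution are assumptions about the dilution bookkeeping, and the function name and example absorbances are hypothetical.

```python
EPSILON_420 = 1020.0  # M^-1 cm^-1 at 420 nm, from the text


def reducing_equivalents(a_ferricyanide, a_mixture, dilution_factor=100.0,
                         path_cm=1.0, epsilon=EPSILON_420):
    """Estimate reducing equivalents (M) in the diluted dithionite solution.

    a_ferricyanide: absorbance of the diluted ferricyanide-only solution
    a_mixture:      absorbance of the diluted ferricyanide/dithionite mixture
    ASSUMPTIONS: 1 cm path length, and both components ending up diluted
    by `dilution_factor` in the cuvette.
    """
    delta_a = a_ferricyanide - a_mixture
    if delta_a < 0:
        raise ValueError("mixture absorbs more than the ferricyanide reference")
    # Beer-Lambert: ferricyanide consumed in the cuvette, in mol/L.
    consumed_in_cuvette = delta_a / (epsilon * path_cm)
    # Scale back to the solution that was added before dilution.
    return consumed_in_cuvette * dilution_factor


# Hypothetical absorbance readings, for illustration only.
print(reducing_equivalents(1.02, 0.51))
```

Keeping the dilution factor and path length as explicit parameters makes it easy to adapt the calculation if the actual dilution scheme differs from the one assumed here.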

Library design
The gene encoding sperm whale myoglobin was cloned into pET-28a(+) (Novagen) with simultaneous introduction of the H64V mutation using standard protocols. Site-specific saturation mutagenesis targeting a defined site was achieved using a megaprimer PCR protocol 33 with primer sets (Integrated DNA Technologies) that overlapped on the 5' terminus of the randomized position, together with a flanking primer (T7 forward or T7 reverse), as appropriate (NNK primer sequences are shown in Supplementary Table 1). The size of the PCR product was verified using agarose gel electrophoresis. The DNA sample was digested with DpnI (New England Biolabs) at 37 °C for 10-12 h to eliminate the parental clone. The digested sample was transformed into E. coli NEB5α cells (New England Biolabs) and subsequently plated on a Luria Bertani (LB) agar plate containing kanamycin (50 µg/mL). After incubation at 37 °C for 10-12 h, colonies obtained from the plate were allowed to grow in LB with kanamycin at 37 °C for 5-6 h. Cells were harvested, and plasmids were extracted using a DNA extraction kit (Monarch, New England Biolabs). Library quality was confirmed by Sanger sequencing analysis (Genewiz, Inc.) (Extended Data Fig. 6).
The Kemp substrate was docked into FerrElCat using AutoDock Vina 8 and a described protocol 9. The imidazole molecule associated with the heme in the crystal structure was removed prior to docking.
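For the NNK saturation mutagenesis libraries described in this section, it can help to see what an NNK codon encodes and roughly how many clones are needed to sample one position. The sketch below enumerates the 32 NNK codons using the standard genetic code and applies the usual oversampling estimate for a 95% chance of seeing any given codon. The 95% target and the assumption of equally likely codons are illustrative choices; the number of clones actually screened is not stated here.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def nnk_codons():
    """All NNK codons: N = any base, K = G or T."""
    return [a + b + c for a in BASES for b in BASES for c in "GT"]


def clones_for_coverage(n_variants, coverage=0.95):
    """Clones needed so that a given variant is sampled at least once with
    probability `coverage`, ASSUMING all variants are equally likely."""
    return math.ceil(math.log(1 - coverage) / math.log(1 - 1 / n_variants))


codons = nnk_codons()
encoded = [CODON_TABLE[c] for c in codons]
print(len(codons), "codons,",
      len(set(encoded) - {"*"}), "amino acids,",
      encoded.count("*"), "stop codon(s)")
print("clones for 95% coverage:", clones_for_coverage(len(codons)))
```

Real libraries deviate from the equal-probability assumption because of primer synthesis and PCR bias, which is one reason to confirm library quality by sequencing, as done here.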

Circular dichroism (CD) spectroscopy
All CD spectra of myoglobin variants were recorded using a Jasco J-715 CD spectrometer in continuous mode with 1 nm bandwidth, 2 nm data pitch, and a scan rate of 50 nm/min with 8 s averaging time. The final spectra represent a buffer-subtracted average of three runs. The CD spectra of non-reduced proteins in the far-UV region (200-260 nm) were collected using a quartz cuvette with 1 mm pathlength, while for the Soret band region (390-470 nm) a quartz cuvette of 1 cm pathlength was used. The spectra in the Soret band region (390-470 nm) were obtained to determine mean residue ellipticity (MRE) values, assuming the protein binds heme in a 1:1 ratio. Protein stocks were diluted in 2 mM TRIS (pH 8.0) to 5 µM and the spectra were recorded for the oxidized protein. For the analysis of the reduced protein, the stock was diluted in the same way as for the oxidized sample and ten equivalents of sodium dithionite were added to the protein inside the glovebox. The concentration of the reduced protein was calculated based on the Soret band maxima using the corresponding extinction coefficients. Sample absorbance never exceeded 2 at any wavelength.
The mean residue ellipticity (MRE, deg·cm²·dmol⁻¹) values were calculated using the equation MRE = θ/(10 × c × l × N), where θ (mdeg) is the ellipticity, l (cm) is the pathlength of the cuvette, c (M) is the protein concentration and N is the number of residues (Extended Data Fig. 7).
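The MRE equation above translates directly into a small helper. The sketch uses the units given in the text (θ in mdeg, c in M, l in cm). The function name is illustrative, and the example ellipticity and residue count are hypothetical placeholders, not measured values.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """MRE in deg*cm^2*dmol^-1 from MRE = theta / (10 * c * l * N)."""
    if conc_molar <= 0 or path_cm <= 0 or n_residues <= 0:
        raise ValueError("concentration, path length and residue count "
                         "must be positive")
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


# Hypothetical reading: 5 uM protein in a 1 mm (0.1 cm) cuvette, as in the
# far-UV setup described above; theta and n_residues are placeholders.
mre = mean_residue_ellipticity(theta_mdeg=-15.0, conc_molar=5e-6,
                               path_cm=0.1, n_residues=150)
print(f"{mre:.0f} deg*cm^2*dmol^-1")
```

Because the path length differs between the far-UV (1 mm) and Soret (1 cm) measurements, passing it explicitly avoids a tenfold error when switching cuvettes.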

Data availability
The crystal structure of FerrElCat (Mb-L29I/H64G/V68A) was deposited in the Worldwide Protein Data Bank (wwPDB) under the accession number 7VUC.
