Development of a high throughput method to obtain a high-quality enantioselectivity dataset
Enantiomeric excess (e.e.) is a commonly used parameter to gauge the enantioselectivity of given enzymes. However, when dealing with hydrolases that exhibit low enantioselectivity, particularly towards nearly symmetric substrates, e.e. proves to be an ineffective parameter as its values fluctuate during reaction44. To address this limitation and devise a high-throughput method for obtaining a high-quality dataset of enantioselectivity, we adopted the ratio of initial reaction rates between (R)- and (S)-CHCE, referred to as the apparent enantioselectivity (Eapp). This approach relies on the “real substrate” and accurately mirrors the actual reaction dynamics (Figure 2A). The initial reaction rates towards (R)- and (S)-CHCE were determined by coupling the hydrolytic reaction with a chromogenic reaction catalyzed by alcohol dehydrogenase (ADH). An ideal ADH for this method should possess the following characteristics: 1) NADP+-dependent instead of NAD+-dependent to eliminate background interference; 2) high catalytic efficiency (kcat/KM), even at lower ethanol concentrations. Consequently, genome mining and rational mutagenesis were employed to identify an NADP+-dependent ADH with high kcat/KM.
At high ethanol concentration (800 mM), NADP+-dependent ADH6 proved the most efficient, displaying a specific activity as high as 421 U·g–1 (Figure 2B), while at lower ethanol concentration (2 mM), NADP+-dependent ADH10 from Kluveromyces polysporus exhibited the highest specific activity of 9.34 U·g–1. However, the activity of ADH10 was low and required high loading, which was deemed unsatisfactory for developing a high-throughput method to determine Eapp. To address this limitation, ethanol was docked into the active center of ADH10 (PDB: 5Z2X), and residues surrounding ethanol were identified (Figure 2C). In response to the insufficient activity, rational mutagenesis was undertaken to manipulate the hydrophobic interaction and steric hindrance between the methyl group of ethanol and ADH10. Mutations were introduced, turning them into Cys, Phe, Leu, Ile, and Val. Five single mutants, including V84L, V84I, F197L, F197V, and F197I, were obtained, displaying higher activity towards ethanol (Figure 2D). Among these, F197V proved to be the most efficient, with a specific activity of 17.1 U·g–1. Consequently, double mutants ADH10V84L/F197V and ADH10V84I/F197V were constructed, with ADH10V84L/F197V displaying a synergistic effect, exhibiting a specific activity of 70.4 U·g–1, approximately 7.6-fold higher than ADH10. Kinetic parameters analysis revealed that ADH10V84L/F197V not only demonstrated increased ethanol binding affinity, with a KM value of 2.34 mM, much lower than 55.9 mM of WT, but also enhanced catalytic activity, with a kcat of 5.68 min–1. The kcat/KM of ADH10V84L/F197V was significantly increased to 2.43 min–1·mM–1 from 0.07 min–1·mM–1 of WT. The impressive performance of ADH10V84L/F197V encouraged further optimization of conditions for high-throughput determination of Eapp of AcEst1.
Various factors including ADH dosage, NADP+ concentration, pH, and CHCE concentrations, were systematically investigated. In Figure 2E, the changes in A340 were similar at ADH dosages higher than 7.5 U·mL–1. With an increase in NADP+ concentrations from 2 to 5 mM, the slopes of A340 increased accordingly until exceeding 4.0 mM. The effect of CHCE concentrations was also explored, and the slopes of A340 increased steadily until reaching 5.0 mM CHCE. Although a further increase in CHCE concentration would result in a higher A340 slope, the low solubility of CHCE might interfere with the coupled reaction. Consequently, an ADH dosage of 7.5 U·mL–1, NADP+ concentration of 4 mM, and CHCE concentration of 5.0 mM were adopted as the optimum conditions for the high-throughput method. Subsequently, the feasibility of this method was evaluated using the entire plate of AcEst1. The average Eapp of WT was 4.1 ± 0.3, with a CV of less than 8%. This high-throughput method, based on a “real substrate,” was developed to obtain a high-quality dataset on the enantioselectivity of AcEst1. Importantly, this method can be extended to characterize other hydrolases with ethanol as the product.
Development of ML predictors for enantioselectivity of AcEst1 toward CHCE
To gather a diverse and high-quality dataset on the enantioselectivity of AcEst1 mutants for constructing an ML predictor, 20 unconservative residues located in the first and second layers surrounding the catalytic S201 were identified for saturation mutagenesis (Figure 3A). Degenerative codons of 22c-trick45 including NDT (A/T/C/G, A/T/G, T), VHG (A/C/G, A/T/C, G), and TGG were utilized at molar ratio of 12:9:1 to develop the unbiased library, and a 3-fold excess of mutants was evaluated at each position to ensure the quality. A total of 1920 mutants were subjected to determining Eapp values using the high-throughput method. Mutants from F66, F78, I82, V133, Y228, V230, D253, V254, V257, and L297 exhibited significantly enhanced Eapp values, while mutants from T248, L249, V254, T258, L297, and N329 displayed decreased Eapp values (Figure 3B). Among all the single mutants, V257M and L297F had Eapp values of 9.84 and 0.77, respectively, ranking as the highest and lowest records. After removing redundancy, a high-quality dataset containing 760 out of 1920 mutants was obtained for training ML models.
While ML is recognized for its potential in advancing directed evolution, its effectiveness is often hindered by incompatible descriptors or features46. In this study, biochemical features, including volume (feature #1), hydrophobicity (feature #2), hydrophilicity (feature #3), electrostatic (feature #4), hydrogen bond (feature #5), π-π interaction (feature #6), and distance to catalytic residue (feature #7), based on 1-D sequence from biochemical considerations, were introduced in this study (Figure 3C). The use of atom numbers or SMILES format of amino acids as one feature is common. However, this feature is inaccurate for describing the properties of residues, as there is no linear relationship between atom number and spatial volume. For instance, Phe and Lys have the same spatial volume but different molecular weights. Therefore, spatial volume was selected as a feature. The hydrophobicity of the active center is crucial for enzyme performance. Both the hydrophobicity index47 denoting hydrophobicity (feature #2) and the Kyte-Doolittle hydropathy scale48 denoting hydrophilicity (feature #3) were introduced as features to describe the hydrophobic interactions. The isoelectric point of an amino acid (feature #4) was selected to describe electrostatic properties. Hydrogen bonding was described by the number of electronegative atoms, such as O and N (feature #5). Stacking interactions, including π-π or alkyl-π interactions, were represented by the number of double bonds (feature #6). Moreover, the distance to nucleophilic S201 was also introduced as feature #7. These features from experimental and biochemical considerations were first introduced in the ML model, providing a robust foundation for training the ML predictor.
Various ML algorithms have been employed to address biocatalytic-related challenges, including Kernel Ridge Regression (KRR), Gaussian Process Regression (GPR), Gradient Boosting Regression Tree (GBRT), Random Forest Regression (RFR), Support Vector Regression (SVR), and Bayesian Ridge Regression (BRR)35,49. However, no single algorithm is universally optimal for all tasks. Consequently, we assessed the performance of six regression models in correlating the enantioselectivity of AcEst1 with seven features, namely GBRT, GPR, KRR, RFR, SVR, and BRR. The dataset was randomly divided into a training set (90%) and a testing set (10%), and hyperparameters were tuned for each model. Subsequently, the learning process was executed using the high-quality dataset and the aforementioned models. Based on the regression results, GBRT outperformed GPR, KRR, RFR, SVR, and BRR. The Pearson correlation (R2) between predicted and experimental Eapp values of the GBRT model reached a high value of 0.93, with a Root Mean Square Error (RMSE) of 0.119 (Figure 3D). Landscape analysis revealed a smooth distribution of data, indicating the excellent performance of GBRT. KRR, SVR, and BRR were ineffective in predicting the Eapp of AcEst1, exhibiting lower R2 values (below 0.55), and almost all the Eapp values were underestimated. RFR showed better performance than GPR, with higher R2 and lower RMSE. However, RFR’s ability to predict mutants with elevated Eapp was less robust than GBRT and GPR. The performance of the GBRT model aligned with data-driven protein engineering for the activity and stereoselectivity of transaminase, achieving an R2 of 0.80341. The higher R2 in our model can be attributed to the high-quality dataset and incorporation of biochemical features. Attempts to reduce the number of features and retrain the GBRT model resulted in decreased correlation.
To enhance the accuracy of the trained GBRT predictor, we systematically combined advantageous single mutants to generate double mutants. All single mutants with increased or decreased Eapp values were paired, resulting in various double mutants. Notably, V257M/Y228M stood out with the highest Eapp value of 13.8, while L297F/L249A displayed the lowest Eapp value of 0.36. The incorporation of these double mutants further enriched the dataset used for retraining the GBRT model. The GBRT predictor was then retrained using both the single and double mutants (Figure 4A). The retrained GBRT predictor demonstrated exceptional performance, achieving a Pearson correlation of 0.97 with an RMSE of 0.105, indicative of a robust regression. The remarkable performance of the trained GBRT predictor instilled confidence in its application to guide combinatorial mutagenesis for the stereodivergent evolution of AcEst1.
ML guided stereodivergent evolution of AcEst1
Considering that V257M and L297F were the single mutants with the highest and lowest Eapp values, they were selected as starting points for the stereodivergent evolution of (R)-selective and (S)-selective AcEst1 mutants. Consequently, V257M and L297F were designated as DR1 and DS1, respectively. For the (R)-selective evolution, we predicted the Eapp values of all double mutants starting with DR1 and incorporating other mutations (I82M, V133S, Y228M, V230W, D253H, V254Q, V257M, and L297I) with increased Eapp (Figure 4B). DR1/Y228M (DR2) emerged as the most (R)-selective, consistent with experimental results. Subsequently, we predicted the Eapp values of triple mutants starting from DR2, leading to DR2/I82M (DR3) with an impressive Eapp of 18. We further forecasted the Eapp values of quadruple mutants based on DR3, including DR3/V133S, DR3/D253H, DR3/V254Q, and DR3/L297I. V230W was excluded due to the lack of a synergistic effect. As depicted in Figure 4B, the predicted Eapp values of quadruple mutants were all higher than the corresponding triple, double, and single mutants, suggesting a synergistic effect among them. However, none of the quadruple mutants exhibited predicted Eapp values surpassing DR3, which can be attributed to the limited contributions of V133S, D253H, V254Q, and L297I. To validate the accuracy of the GBRT-predicted results, we experimentally constructed DR1, DR2, and DR3, determining their enantioselectivity values (E value) through resolution reactions. The E values progressively increased from 7.3 for WT to 40.1 for DR1, 59.1 for DR2, and 103 for DR3 (Figure 4D). Quadruple mutants, such as DR3/V254Q and DR3/L297I, were also experimentally constructed; however, their E values were lower than DR3, aligning with the prediction result. The newly developed DR3 thus represents a promising biocatalyst with excellent enantioselectivity in the resolution of near-symmetric CHCE.
The successful guidance of (R)-selective evolution further motivated us to assess the applicability of the trained GBRT predictor in (S)-selective evolution. To depict the evolution pathway more clearly, the reciprocal of the predicted Eapp ((S)-CHCE/(R)-CHCE) was adopted for (S)-selective evolution (Figure 4C). Among all the single mutants, only L297F exhibited an Eapp value lower than 1.0, indicating reversed (S)-selectivity. Therefore, L297 was designated as DS1 and served as the starting point for (S)-selective evolution. All the single mutants with Eapp values lower than WT, including L247F, T248W, L249A/I, V254K, T258A, and N329F, were added to DS1 to form double mutants. Among all the double mutants, the reciprocal of the predicted Eapp value for DS1/L249A was 2.8, ranking the highest. Consequently, DS1/L249A was designated as DS2a and subjected to triple mutation. Other mutations were iteratively added to DS2a to predict their Eapp values. DS2a/T248W displayed a reciprocal predicted Eapp of 3.0, slightly higher than DS2a, and was regarded as DS3 for further combination. The quadruple mutant DS3/T258A (DS4) exhibited increased enantioselectivity and was combined with other mutations to form quintuple mutants. DS4/V254K showed the highest (S)-selectivity among all the quintuable mutants and was regarded as DS5. Furthermore, L247F and N329F were introduced in DS5, resulting in DS5/N329F (DS6) with a reciprocal of predicted Eapp value as high as 5.9 toward (S)-CHCE, even higher than the 4.1 of WT towards (R)-CHCE, implying the concrete reversal of the enantioselectivity of AcEst1 from (R)-selective to (S)-selective.
Although the single mutant L247F exhibited decreased enantioselectivity compared to WT, its contribution to (S)-selectivity was limited. The reciprocal predicted Eapp values of different combinatorial mutants containing L247F were overall increased while locally decreased (Figure 4C), suggesting the existence of an antagonistic effect. The introduction of L247F in DS6 resulted in decreased enantioselectivity. DS1-DS6 predicted by the GBRT model were experimentally constructed and evaluated in the resolution of rac-CHCE. The E value of DS1 was –1.6, reversed from the 7.3 of WT. Since DS2a (DS1/L249A) and DS2b (DS1/L249I) had similar predicted Eapp values, both were investigated, and DS2a displayed a lower E-value of –4.2. From DS3 to DS6, the E value further decreased from –5.7 to –11, significantly lower than WT AcEst1. Stereodivergent evolution of AcEst1 toward nearly symmetric esters has been achieved using this newly trained GBRT predictor.
Characterization of stereocomplementary AcEst1 mutants
To gain a deeper understanding of enantioselectivity manipulation, kinetic parameters of AcEst1 mutants toward (R)- and (S)-CHCE were meticulously characterized (Table 1). The WT AcEst1 exhibited kcat/KM values toward (R)- and (S)-CHCE of 156 and 13 s–1∙mM–1, respectively, with a resulting calculated ratio of 12.0. Notably, the (R)-selective DR3 mutant displayed kcat/KM values of 9.82 and 0.08 s–1∙mM–1 toward (R)- and (S)-CHCE, respectively, leading to a drastically increased ratio of 123, approximately 10.3-fold higher than WT, aligning with the observed E value. DR3 exhibited kcat values toward (R)- and (S)-CHCE of 60.6 and 0.66 s–1, respectively, with a ratio of 92, significantly surpassing the 6.2 of WT. In terms of kinetic dynamics, the substantial decrease in the kcat value of DR3 toward (S)-CHCE is pivotal for its heightened enantioselectivity. Deconvolution analysis indicated elevated kcat/KM ratios for Y257M and Y228M (59.0 and 32.9, respectively) compared to WT, suggesting their significant contributions to enhanced enantioselectivity. I82M exhibited a modest increase in the kcat/KM ratio (13.0).
Kinetic parameter and deconvolution analysis of the (S)-selective DS6 were also conducted. The kcat/KM value of DS6 toward (R)-CHCE significantly decreased to 0.74 s–1∙mM–1 from the 156 s–1∙mM–1 of WT, indicating a substantial reduction in catalytic efficiency toward (R)-CHCE. Meanwhile, the kcat/KM value of DS6 toward (S)-CHCE was 4.00 s–1∙mM–1, with a kcat/KM ratio of (S)-CHBE to (R)-CHCE of 5.41, around 67.6-fold higher than that of WT, in agreement with the reversed E value of DS6. Deconvolution analysis revealed that L297F exhibited the highest kcat/KM ratio of 1.74. While L297F maintained the same kcat/KM value of 13.3 s–1∙mM–1 toward (S)-CHCE as WT, it drastically decreased the kcat/KM value to 7.65 s–1∙mM–1 toward (R)-CHCE. Additionally, L249A also displayed a higher kcat/KM ratio of 1.08, about 13.5-fold higher than WT. In contrast to L297F and L249A, single mutants T248W, T258A, and V254K exhibited increased catalytic efficiency toward (S)-CHCE, surpassing the increment toward (R)-CHCE. T248W, T258A, and V254K contributed not only to the increased (S)-selectivity but also to the catalytic efficiency of DS6. T248W and N329F were beneficial for the enhanced binding affinity of DS6 toward (S)-CHCE. From the perspective of kinetic dynamics, the significantly decreased catalytic efficiency toward (R)-CHCE and enhanced binding affinity toward (S)-CHCE are both crucial for the (S)-selectivity of DS6.
Application potential of stereocomplementary DR3 and DS6 in resolving near-symmetric esters
The potential application of stereocomplementary DR3 and DS6 in the synthesis of chiral CHCA was thoroughly investigated. To qualify as promising biocatalysts with industrial relevance, substrate loading should exceed 100 g·L–1, and the e.e. value should surpass 99% within 24 hours. After optimization, a substantial 1.0 M rac-CHCE (154 g·L–1) could be enantioselectively hydrolyzed into (S)-CHCE by DR3 (Figure 5A). By the 8.0-hour mark, the e.e.s reached >99% (S) at a conversion ratio of 53% and e.e.p of 88%. (S)-CHCE was isolated from the reaction system with a yield of 45.7%. Similarly, DS6 was applied in the resolution of 1.0 M rac-CHCE (154 g·L–1) for the synthesis of (R)-CHCE (Figure 5B). At 18 hours, e.e.s value reached >–99% (R), with a conversion ratio of 74% and e.e.p of –41%. (R)-CHCE was isolated with a yield of 23.3%. Notably, the enantioselective resolution of nearly symmetric CHCE for the synthesis of both enantiomers of chiral CHCE was achieved at 1.0 M for the first time, employing stereocomplementary DR3 and DS6.
The application potential of stereocomplementary DR3 and DS6 was further examined in the enantioselective resolution of esters with a nearly symmetric structure (Table 2). Initially, methyl and isopropyl cyclohex-3-ene-1-carboxylate (S2 and S3) could be enantioselectively hydrolyzed by DR3 and DS6. The E values of DR3 towards S2 and S3 were 92 and 110, respectively, dramatically higher than 5.6 and 8.3, consistent with CHCE. Furthermore, with the increase of alcohol groups from methyl to ethyl and isopropyl, the E values of DR3 increased accordingly, implying that a larger volume of alcohol groups is favorable for high enantioselectivity. DS6 exhibited reversed (S)-selectivity towards S2 and S3 with E values of –7.0 and 12, respectively. DR3 and DS6 demonstrated opposite priorities in the enantioselective accommodation of CHCA esters.
Moreover, other esters with a nearly symmetric structure, specifically oxyheterocyclic esters, were also evaluated by DR3 and DS6 (Table 2). WT AcEst1 displayed a relatively higher activity towards S4 and S6 with odd-member rings than S5 and S7 with even-member rings. The highest E value was 12 towards S7. DR3 could enantioselectively hydrolyze (R)-esters with higher E values than WT. The highest E value of DR3 was also observed with S7. Conversely, DS6 could enantioselective hydrolyze (S)-esters with reversed E values. The lowest E value was observed with S5 by DS6. All of the above results demonstrate that stereocomplementary DR3 and DS6 can be applied in the enantioselective hydrolysis of other esters containing a nearly symmetric structure.
Molecular mechanism of stereoselectivity control according to MD and QM/MM calculation
In-depth insights into the molecular mechanism of carboxylesterase in the enantioselectivity resolution of nearly symmetric esters were sought by attempting to resolve the crystal structure of sterecomplementary DR3 and DS6. Despite optimization of crystallization conditions and different truncation attempts, crystals with the desired quality for X-ray crystallography were not obtained. Consequently, structural models of DR3 and DS6 were constructed using AlphaFold2, a widely accepted software for building reliable protein structures based on artificial intelligence. The structural models of WT, DR3, and DS6 were minimized and subjected to multiple 100-ns MD simulations. The RMSD achieved stability at about 5 ns, ranging from 1.5 to 2.5 Å.
Near-attack conformation (NAC) analyses were conducted to gain insights into the transition state of nucleophilic attack50. According to the transition state, the angle among O of S201, carbonyl C, and carbonyl O of the substrate (θ1: ∠OG-C7-O2) should be within 90° ± 15°, and the nucleophilic attack distance between O of S201 and carbonyl C of the substrate (d1: distOG-C7) should be less than 3.4 Å. As illustrated in Figure 6, the percentages of conformations satisfying NAC parameters of WT toward (R)- and (S)-CHCE were 38.7% and 14.9%, respectively (Figure 6A&6B), consistent with the (R)-preference of WT AcEst1. However, the NAC percentage of 14.9% in WT and (S)-CHCE hinted that (S)-CHCE could also be accommodated by WT.
For DR3, the NAC percentage of (R)-CHCE was 39.9%, significantly higher than the 5.6% of (S)-CHCE (Figure 6C&6D), indicating that hydrolysis of (R)-CHCE is more favorable than (S)-CHCE in DR3. Regarding DS6, the NAC percentage of (S)-CHCE was 29.4%, much higher than the 12.9% of (R)-CHCE, proving that (S)-CHCE is more preferable than (R)-CHCE in DS6 (Figure 6E&6F). Binding free energy was also analyzed employing the MMPB/GBSA method51. The free energy difference between (R)- and (S)-CHCE of DR3 was –3.7 kcal·mol–1, higher than the –3.0 kcal·mol–1 of WT. For DS6, the free energy difference between (R)- and (S)-CHCE was 3.3 kcal·mol–1, suggesting that (S)-CHCE was preferable.
Representative conformations with the highest distribution ratio were extracted from MD simulations and are presented in Figure 7. In the case of WT AcEst1, the distance between the O atom of catalytic S201 and the carbonyl C (d1) of (R)-CHCE measured 3.1 Å, while d1 of (S)-CHCE was 3.9 Å (Figure 7A&7B). Turning to DR3, the d1 of (R)-CHCE decreased to 2.5 Å, and the d1 of (S)-CHCE increased to 4.0 Å (Figure 7C&7D). The mutation of V257 into M257 introduced an alkyl-π interaction with the double bond of (R)-CHCE. For DS6, the d1 of (S)-CHCE decreased to 2.6 Å, while the d1 of (R)-CHCE remained the same as in WT (Figure 7E&7F). A favorable π-π interaction between F297 and (S)-CHCE was also observed, providing evidence for the increased catalytic efficiency of DS6 toward (S)-CHCE.
The free energy barriers (DG‡) for (R)- and (S)-CHCE of WT, DR3, and DS6 were calculated using QM/MM52. The free energy difference between (R)- and (S)-CHCE (DDG‡) of WT was calculated to be –1.1 kcal·mol–1, favoring the hydrolysis of (R)-CHCE. In contrast, the DDG‡ values of DR3 and DS6 were –2.7 and 0.9 kcal·mol–1, respectively. The significant decrease in DDG‡ of DR3 and the increase for DS6 were consistent with their performance in the enantioselectivity resolution of CHCE. These findings provide molecular insights into the manipulation of enantioselectivity of stereocomplementary DR3 and DS6 in the accommodation and resolution of CHCE with a nearly symmetric structure.