Targeting turnover numbers for engineering applications
The integration of enzymatic constraints into ecGEMs [15] facilitates the development of our optimization-based approach, termed Overcoming Kinetic Obstacles (OKO), that predicts metabolic engineering strategies targeting turnover numbers. OKO manipulates the turnover numbers of enzymes under the assumption that the abundances of enzymes in the ecGEMs of wild type and the engineered organism are not significantly changed. This setting is particularly relevant for in vivo metabolic engineering strategies, since it does not require tinkering with the underlying transcriptional and (post)translational regulatory machinery of cell factories.
OKO is formulated as a two-step approach: In the first step, we determine the maximum product yield in the wild-type model at optimal growth rate under the set of constraints imposed by ecGEMs. Next, we identify the protein allocation by minimizing the enzyme usage (see Fig. 1A, Methods). In the second step, with the obtained enzyme abundances in the wild-type model, we constrain the abundance ranges in the model of the engineered strain, to ensure no significant change in comparison to the wild type strain (Methods). By introducing a binary variable for every turnover number and a tuneable parameter, we keep track of whether or not a turnover number is considered changed. Another tuneable parameter is in turn used to define the admissible range for the modified turnover number and guarantees that its value remains within this range (Fig. 1B, Methods). The second step of OKO then minimizes the number of significantly changed turnover numbers while ensuring that a desired level of chemical production is achieved at a factor of optimal growth without affecting protein abundance compared to the wild type (Fig. 1C, Methods).
Application of OKO to two cell factories
Modification of enzyme turnover numbers enhances the capacity of wild type cell factories and marks a departure from typical engineering strategies that rely on manipulation of enzyme abundance in prioritizing product formation at the expense of biomass growth. As a result, we expect that by using OKO, the optimal growth rate in in the engineered model may remain almost unchanged in comparison to that of the wild-type, thus breaking the trade-off between product production and growth (or biomass formation) [41].
Here, we applied the proposed computational approach, OKO, to the ecYeastGEM_v8.3.4 [30] of the yeast Saccharomyces cerevisiae and the eciML1515 [31] of the bacterium Escherichia coli as cell factories. We aimed to determine modification strategies that rely on turnover numbers for the overproduction of multiple native compounds in these hosts. The ecYeastGEM v8.3.4 model contains 1148 enzymes, of which 968 are linked to \({k}_{cat}\) values, resulting in a total of 4261 available \({k}_{cat}\) values that can be modified. The eciML1515 model contains 1505 enzymes with 3392 available \({k}_{cat}\) values linked to 1259 enzymes in the model.
Domenzain et al. [42] proposed a computational method to predict optimal combinations of gene engineering targets and applied it to the ecYeastGEM_v8.3.4 model to increase the production of 102 different chemicals. Their method involves scoring genes within the model to find the magnitude and directionality of genetic modifications, and then discards genes that encode enzymes that are unfavorable for the production of the target metabolite. Finally, the approach identifies the minimal set of modifications for the transition of cells from biomass formation to metabolic production. Of the chemicals studied in [42], 49 compounds, including amino acids, are native metabolites in Saccharomyces cerevisiae. Mapping of these native compounds to metabolites in eciML1515 using KEGG IDs, CHEBI IDs and chemical formulae resulted in 41 metabolites to which the OKO method was applied (see Supplementary Table 1).
We first calculated the maximum production of each native compound at optimal growth rates, i.e. 0.38 per hour for ecYeastGEM_v8.3.4 and 0.57 per hour for eciML1515. Enzyme abundances were then calculated by using the first step of OKO (see Methods). To demonstrate that the resulting enzyme abundances can be used as a reference point, we conducted a robustness analysis relying on tailored variability analysis that specifies admissible ranges for enzyme abundance (see Methods). The results show that, on average, the logarithmic ratio of the maximum and minimum values for each enzyme abundance at the maximum production of the different chemicals of interest is 0.015 for the ecYeastGEM_v8.3.4 model and varies from \(6\times {10}^{-4}\) to 0.011 for the eciML1515 model (see Supplementary Tables 1 and 2, respectively). This demonstrates that the estimated maximum and minimum abundance values for every enzyme are very close to each other, allowing the usage of the estimated enzyme abundances in the remaining computational steps of OKO.
We then used OKO to search for strategies with the minimum number of changes in \({k}_{cat}\) values that at least double the compound production while maintaining almost the same growth rates as in the wild type. The minimum number of modified \({k}_{cat}\) values for overproduction of the chemicals of interest ranges from 7 to 338 for the ecYeastGEM_v8.3.4 model and from 4 to 39 for the eciML1515 model (see Supplementary Tables 1 and 2, respectively). The two compounds with the least number of modifications required are choline and palmitoleate in the ecYeastGEM_v8.3.4 model, with seven modified turnover numbers, and proline and serine in the eciML1515 model, with four and five modified turnover numbers, respectively (see Supplementary Table 3 for the involved enzymes).
Since our goal is to double the production of a chemical of interest in comparison to the wild type, while ensuring certain growth, one may expect that only allowing for an increase in turnover numbers would suffice to address these objectives. Interestingly, however, the increase in production of the considered compounds does not necessarily entail only increases in the turnover numbers. We found that in the ecYeastGEM v8.3.4 model on average 35.5% of the enzymes require an increase in their \({k}_{cat}\) values, while in the eciML1515 model, this number is 38.5% to double the production of the selected compounds. Further, since in metabolic engineering it is easier to knock out an enzyme than to reduce its turnover number via enzyme engineering, we also identified the cases where the reduction in turnover number could be replaced by knocking out the enzyme. To the end, we inspected if an enzyme requiring a reduction in \({k}_{cat}\) value was associated with a flux of zero (\(<{10}^{-6} mmol {gDW}^{-1}{h}^{-1}\)) in the engineered model. On average, 53.9% of the reductions in the ecYeastGEM_v8.3.4 model and 12.9% in the eciML1515 model fell into this category. This finding indicated that decrease in turnover number, without blocking of reactions, are required to simultaneously achieve different objectives, including growth and production of the chemical of interest. Indeed, the growth rates of the engineered ecYeastGEM_v8.3.4 and eciML1515 models are on average 88% of the optimal growth rates of the wild-type models, indicating that the increase in the production of compounds following the proposed strategies does come at some, but not drastic cost of growth.
Promiscuous enzymes have been deemed reservoirs of functions that can be used in metabolic engineering, especially if particular catalytic activities are modified in a desired direction [2]. Hence, we also investigated whether enzymes targeted for modification of their turnover number were more likely to be promiscuous. We found, on average, 26.5% and 24.3% of the enzymes in the proposed strategies for the ecYeastGEM_v8.3.4 and eciML1515 models, respectively, were exhibited promiscuous functions.
OKO identifies strategies for overproduction of amino acids
Razaghi-Moghadam et al. [11] investigated the participation of enzymes in different reactions and the inclusion of complex gene-protein-reaction rules, and showed that complex rules are present in metabolic models to a considerable extent. Therefore, strategies that build on existing enzymes and pathways often lead to conflicts in the manipulation of enzyme abundances. For example, using their proposed approach, it was found that there is no feasible strategy at the native enzyme level to increase production of almost all amino acids in E. coli and S. cerevisiae. This finding motivated us to further investigate the performance of OKO to increase the production of amino acids in these cell factories. To this end, we first reviewed the primary results of OKO, which indicated the minimum number of modified turnover numbers for amino acid overproduction. This number ranges from 9 to 20 for the ecYeastGEM_v8.3.4 model and from 4 to 25 for the eciML1515 model (Fig. 2A). We found that, on average, 35% and 37.4% of the proposed enzymes in the ecYeastGEM_v8.3.4 and eciML1515 models, respectively, required an increase in their \({k}_{cat}\) values.
Furthermore, we observed that the proposed amino acid modification strategies involve a total of 83 and 80 enzymes in the ecYeastGEM v8.3.4 and eciML1515 models, respectively. Of these, 15 and 17 enzymes are required for the overproduction of at least three amino acids, respectively (see Figs. 2B and 2C). Moreover, and most importantly, the relative changes for the \({k}_{cat}\) values are small, reaching the defined limits (either \({\left({\beta }_{2}\right)}^{-1}\) or \({\beta }_{2}\)) only 0.6% and 7.6% of the time for the ecYeastGEM v8.3.4 and eciML1515 models, respectively (see Supplementary Tables 4 and 5).
In addition, as shown in Figs. 2B and 2C, for each enzyme included in the proposed amino acid modification strategies, the log-fold changes are predominantly distributed to one side of zero, indicating that the proposed modifications are biased towards either increasing or decreasing the \({k}_{cat}\) values to increase the production of each amino acid. Furthermore, the boxplots in Figs. 2B and 2C show a small spread, indicating the uniformity of the estimated \({k}_{cat}\) values for each enzyme. Therefore, in vivo enzyme engineering following the proposed strategy for overproduction of amino acids is expected to benefit the increase of production of other amino acids that also relies on that enzyme.
As depicted in Fig. 2B and Supplementary Table 4, the eciML1515 model demonstrates two exceptions to the consistency of change directions. The \({\text{k}}_{\text{c}\text{a}\text{t}}\) value of dye-decolorizing peroxidase YfeX enzyme (Uniprot ID P76536) in the ferrochelatase reaction is initially 25999.42 (\({\text{s}}^{-1}\)) in the eciML1515 model. For the overproduction of alanine, aspartate, cysteine, glutamine, glycine, threonine, tryptophan, and tyrosine, this value must be reduced by a factor of 0.7. Conversely, overproduction of proline requires a 10-fold increase in the \({\text{k}}_{\text{c}\text{a}\text{t}}\) value. In addition, pyruvate kinase II (Uniprot ID P21599) with an initial \({\text{k}}_{\text{c}\text{a}\text{t}}\) value of 3204.01 (\({\text{s}}^{-1}\)) in various pyruvate kinase reactions of the eciML1515 model needs to be increased for arginine and isoleucine overproduction and decreased for tyrosine overproduction. In the proposed strategy for arginine overproduction, the \({\text{k}}_{\text{c}\text{a}\text{t}}\) value of pyruvate kinase II needs to be increased 10-fold in two reactions and 2.5-fold in another reaction.
In the ecYeastGEM v8.3.4 model, as shown in Fig. 2C and detailed in Supplementary Table 5, the estimated changes in \({\text{k}}_{\text{c}\text{a}\text{t}}\) values for two enzymes show greater variability. The isocitrate dehydrogenase [NADP] cytoplasmic (IDH) enzyme (Uniprot ID P41939) has a \({\text{k}}_{\text{c}\text{a}\text{t}}\) value of 58.30 (\({\text{s}}^{-1}\)) in the isocitrate dehydrogenase (NADP) reaction. This value needs to be increased by a factor of 1.3 for the overproduction of glutamine and proline, and by a factor of 10 for the overproduction of glutamate. In addition, the homoserine kinase (HK) (HSK) enzyme (Uniprot ID P17423), which has a \({\text{k}}_{\text{c}\text{a}\text{t}}\) value of 40.80 (\({\text{s}}^{-1}\)) in the homoserine kinase reaction, needs to be increased 1.6-fold for glycine and threonine overproduction, whereas only a 1.01-fold increase is suggested for isoleucine overproduction. These examples demonstrate that while changes in some turnover numbers can generally accommodate usage of the engineered enzymes in the implementation of engineering strategies for different compounds, there are few, notable exceptions.
Setting the ground for experimental validation
Metabolic engineering approaches that aim to use enzymes with the optimized \({k}_{cat}\) values can leverage either existing nonnative enzymes or engineered enzymes with modified substrate specificity. Methods that rely on existing enzymes have the flexibility to reuse the enzyme repository in nature and facilitate the creation of synthetic pathways for engineering purposes. This motivated us to investigate whether the identified \({k}_{cat}\) values in our engineering strategies are close to the turnover numbers of these enzymes in other organisms.
Enzyme databases such as BRENDA [17] and SABIO-RK [18] contain extensive repositories of \({k}_{cat}\) values. However, there is still a lack of high-throughput techniques for measuring \({k}_{cat}\) values, resulting in a notable disparity in the number of \({k}_{cat}\) values available compared to the vast number of organisms and metabolic enzymes in existence [28]. To fill this gap, deep learning and machine learning approaches have been proposed to predict the \({k}_{cat}\) values for different enzymes in different organisms, and have shown high performance [19–21]. To investigate whether the \({k}_{cat}\) values estimated by OKO could be found for the respective enzymes in other organisms, we used the predicted \({k}_{cat}\) values from the DLKcat approach [19]. DLKcat includes the prediction of \({k}_{cat}\) values for 651 metabolic enzymes in 343 organisms (see Supplementary Table 6).
Having mapped the two sets of \({k}_{cat}\) values from ecGEMs and the DLKcat results (see Methods), we examined each enzyme with an estimated \({k}_{cat}\) value from OKO to determine whether it was included in the list of enzymes with a predicted \({k}_{cat}\) value from the DLKcat approach. If so, we further identified the closest \({k}_{cat}\) value to the estimated one across all organisms. In the proposed amino acid modification strategies, we identified 248 and 236 estimated \({k}_{cat}\) values for the ecYeastGEM_v8.3.4 and eciML1515 models, corresponding to 83 and 80 different enzymes, respectively. As shown in Table 1, only 55.2% and 24.2% of these enzymes were included in the list of enzymes with predicted \({k}_{cat}\) values by the DLKcat approach. Furthermore, the average distance of the closest \({k}_{cat}\) values to the estimated values across all organisms was 503.15 (\({s}^{-1}\)) and 32188.07 (\({s}^{-1}\)) for the ecYeastGEM_v8.3.4 and eciML1515 models, respectively. These findings indicated that the enzymes with \({k}_{cat}\) values estimated by the OKO approach were rarely found in other organisms and could not be directly used for in vivo engineering.
Table 1
Deep-learning-driven identification of suitable turnover numbers from other organisms. Shown is the number of enzyme-reaction pairs in the proposed strategies for which turnover numbers can be predicted by DLKcat results. The table also includes the average distance of the \({k}_{cat}\) values predicted by DLKcat and the modifications resulting from OKO and OKO+.
| OKO | OKO+ |
| E. coli | S. cerevisiae | E. coli | S. cerevisiae |
Found enzyme-reaction pairs | 57 | 137 | 354 | 817 |
Missing enzyme-reaction pairs | 179 | 111 | 0 | 0 |
Average distance (\({s}^{-1}\)) | 32188.07 | 503.15 | 2.35 | 4.92 |
Refinement of OKO enhances its applicability
As it is more convenient to construct synthetic pathways using existing enzymes, we also proposed a refinement of OKO, termed OKO+. In OKO+, for each enzyme-reaction pair with an available \({k}_{cat}\) value in an ecGEM, we first extract the minimum and maximum of the predicted \({k}_{cat}\) values in all other organisms using the mapping established between ecGEM and DLKcat results. In total, there are 4261 and 3392 enzyme-reaction pairs with available \({k}_{cat}\) values in the ecYeastGEM_v8.3.4 and eciML1515 models, respectively, of which 1676 and 469 respected values are found in the predicted \({k}_{cat}\) values from the DLKcat approach (see Supplementary Table 7). To ensure that only enzymes with known turnover numbers are included, we introduce two additional sets of constraints in OKO+: (1) for the enzyme-reaction pairs with available turnover estimates from DLKcat or other resources, the predicted \({k}_{cat}\) should remain either unchanged or within the range extracted from the values obtained from DLKcat, and (2) the enzyme-reaction pairs with available \({k}_{cat}\) values in the ecGEM, for which values from DLKcat (from other organisms) are not available, are not allowed to be modified.
The introduction of these two additional sets of constraints resulted in infeasibilities to identify engineering strategies that yield the desired levels of product formation and growth. Therefore, in OKO+, we also allow for variation in enzyme abundance, modelled by introduction of a binary variable for each enzyme abundance indicating if it is significantly altered (with respect to a tunable parameter). This approach then minimizes the weighted average of the number of modified turnover numbers and the number of modified abundances, with preference for the former (see Methods).
By deploying OKO+ with the same models we found that the minimum number of modified \({k}_{cat}\) values for overproduction of the chemicals of interest ranges from 0 to 85 for the ecYeastGEM_v8.3.4 model and from 0 to 44 for the eciML1515 model (see Supplementary Tables 1 and 2), while also requiring changes in enzyme abundance. The ecYeastGEM_v8.3.4 model requires different number of changes in enzyme abundance, ranging from 2 to 240 modifications, while this number varies from 3 to 286 for the eciML1515 model. On average, 48.8% and 77% of the enzymes in the proposed strategies for the ecYeastGEM_v8.3.4 and eciML1515 models were promiscuous enzymes, respectively.
In addition, for each \({k}_{cat}\) value estimated by OKO+ in amino acid modification strategies, we identified the closest \({k}_{cat}\) value across all organisms. Clearly, due to the constraints imposed in OKO+, all enzymes with newly estimated \({k}_{cat}\) values were included in the list of enzymes with predicted \({k}_{cat}\) values from the DLKcat approach. The average distance from the estimated values to their closest counterparts across all organisms was 2.35 for the ecYeastGEM_v8.3.4 model and 4.92 for the eciML1515 model (see Table 1). The small average distance value indicates that OKO+ successfully identified a suitable enzyme for another organism with closely matching activity.
Inspection of the proposed strategies to increase the production of amino acids showed that the existing nonnative enzymes with \({k}_{cat}\) values closest to the predicted by OKO+ originate from different organisms (see Supplementary Tables 1, 2 and 8). In total, 103 different organisms contribute to the proposed modification strategies for the ecYeastGEM_v8.3.4 model and 53 for the eciML1515 model (see Supplementary Table 8). Each organism contributes a different number of enzymes, ranging from 1 to 50 for the proposed modifications in the ecYeastGEM_v8.3.4 model and from 1 to 29 for the eciML1515 model.
To further analyze the applicability of the strategies resulting from OKO+, we constructed two distinct ecGEMs by substituting the \({k}_{cat}\) values in the original ecGEMs with the values estimated by OKO+, as well as the closest values from other organisms. We then assessed the performance of the strategies with the so-modified turnover numbers in producing the compounds of interest. In each model constructed and for each compound of interest, the biomass level was set to match that of the proposed engineering strategy for the overproducing that chemical. In addition, the number of modifications to the enzyme abundance was constrained by the number of modifications in the corresponding proposed engineering strategy. Clearly, this evaluation is only applicable in cases where at least one modification to the enzyme activity was proposed by OKO+ and this is for 19 amino acids in the ecYeastGEM_v8.3.4 model and 12 in the eciML1515 model. With these constraints, we calculated the maximum production of each chemical of interest in the two constructed ecGEMs and determined their logarithmic ratio (\(\text{l}\text{o}\text{g}\left(\frac{{v}_{max}^{\text{D}\text{L}\text{K}\text{c}\text{a}\text{t}}}{{v}_{max}^{{\text{O}\text{K}\text{O}}^{+}}}\right)\)) (see Supplementary Tables 1 and 2). In the majority of cases the two constructed ecGEMs yielded the same levels of product formation. Small differences in product formation levels were only observed for glutamine in the ecYeastGEM_v8.3.4 model and for arginine and methionine in the eciML1515 model, with the smallest logarithmic ratio being \(-1.3\times {10}^{-2}\).
Feasibility of strategies designed by employing OKO+
OKO+ was employed to design engineering strategies for the overproduction of 49 compounds in the ecYeastGEM_v8.3.4 model and 41 compounds in the eciML1515 model. The total number of modifications, including enzyme abundance and turnover numbers, varied from 2 to 293 in the ecYeastGEM_v8.3.4 model and from 3 to 316 in the eciML1515 model (see Supplementary Tables 1 and 2). Further analysis focused on the engineering strategies of OKO+ that are more feasible to implement, specifically those with a total number of modifications less than or equal to seven. Six such compounds were identified in the ecYeastGEM_v8.3.4 model and five in the eciML1515 model (see Table 2 and Fig. 3).
It is noteworthy that all modification strategies proposed by OKO+ to overproduce these eleven compounds involved an increase in enzyme abundance. Furthermore, the proposed engineering strategies do not only involve modification of enzyme abundance, but also adjustment of enzyme turnover numbers. For instance, the overproduction of isoamylol requires a 1.4-fold increase in the \({\text{k}}_{\text{c}\text{a}\text{t}}\) value of the 3-isopropylmalate dehydrogenase enzyme (Uniprot ID P04173), with the closest comparable value found in Candida rhagii. Similarly, for the overproduction of isoleucine, a 1.2-fold increase in the \({\text{k}}_{\text{c}\text{a}\text{t}}\) value of homoserine dehydrogenase (Uniprot ID P31116) is required, and the closest identified value comes from Blastobotrys raffinofermentans.
In the proposed strategy for spermine overproduction, a \(1.5\times {10}^{-4}\)-fold reduction in the \({k}_{cat}\) value of fumarate reductase 2 (Uniprot ID P21375) was proposed, with the closest corresponding value identified in Ogataea trehaloabstinens. Finally, for (S)-malate overproduction, the proposed strategy involved a 1.6-fold increase in the \({\text{k}}_{\text{c}\text{a}\text{t}}\) value of pyruvate carboxylase 1 (Uniprot ID P11154), with the closest comparable value coming from Lachancea waltii.
Additionally, for increasing homoserine production in the eciML1515 model, OKO+ proposed an engineering strategy involving a 1.3-fold increase in the \({\text{k}}_{\text{c}\text{a}\text{t}}\) value of aspartate-semialdehyde dehydrogenase (Uniprot ID P0A9Q9) and a 2.9-fold increase in the \({\text{k}}_{\text{c}\text{a}\text{t}}\) value of the bifunctional aspartokinase/homoserine dehydrogenase 1 (Uniprot ID P00561). The closest comparable \({k}_{cat}\) values to these proposed modifications are identified in Saturnispora zaruensis and B. anomalus, respectively. These results jointly indicate that feasible engineering strategies can be obtained when combining overexpression of enzymes with enzymes engineered towards an increase in turnover numbers.
Table 2
Proposed metabolic engineering strategies with feasible implementation. Shown are the compounds for which the proposed engineering strategies involve a total of seven or fewer modifications. The number of modifications to the turnover number and to the enzyme abundance are included.
| | modifications of turnover number | Modifications of enzyme abundance |
E. coli | Aspartate | 0 | 5 |
Citrulline | 0 | 7 |
Glycine | 0 | 3 |
Homoserine | 2 | 3 |
| (S)- malate | 0 | 6 |
| Acetate | 0 | 6 |
S. cerevisiae | Isoamylol | 1 | 4 |
Isoleucine | 1 | 6 |
Lysine | 0 | 2 |
Spermine | 1 | 5 |
(S)-malate | 1 | 2 |