In Silico Design for Systems-Based Metabolic Engineering In Escherichia Coli for The Bioconversion of Aerobic Glycerol to Succinic Acid

Glycerol has become an interesting carbon source for industrial processes as consequence of the biodiesel business growth since it has shown promising results in terms of biomass/substrate yields. Selecting the appropriate metabolic targets to build ecient cell factories and maximize the desired chemical production in as little time as possible is a major challenge in industrial biotechnology. The engineering of microbial metabolism following rational design has been widely studied. However, it is a cost-, time-, and laborious-intensive process because of the cell network complexity; thus, to be procient is needed known in advance the effects of gene deletions. An in silico experiment was performed to model and understand the effects of metabolic engineering over the metabolism by transcriptomics data integration. In this study, systems-based metabolic engineering to predict the metabolic engineering targets was used in order to increase the bioconversion of glycerol to succinic acid by Escherichia coli. Transcriptomics analysis suggest insights of how increase the glycerol utilization of the cell for further design ecient cell factories. Three models were used; an E. coli core model, a model obtained after the integration of transcriptomics data obtained from E. coli growing in an optimized culture media, and a third one obtained after integration of transcriptomics data obtained from E. coli after adaptive laboratory evolution experiments. A total of 2402 strains were obtained from these three models. Fumarase and pyruvate dehydrogenase were frequently predicted in all the models, suggesting that these reactions are essential to increasing succinic acid production from glycerol. Finally, using ux balance analysis results for all the mutants predicted, a machine learning method was developed to predict new mutants as well as to propose optimal metabolic engineering targets and mutants based on the measurement of importance of each knockout’s (feature’s) contribution. revealed These data provide Our


Abstract Background
Glycerol has become an interesting carbon source for industrial processes as consequence of the biodiesel business growth since it has shown promising results in terms of biomass/substrate yields. Selecting the appropriate metabolic targets to build e cient cell factories and maximize the desired chemical production in as little time as possible is a major challenge in industrial biotechnology. The engineering of microbial metabolism following rational design has been widely studied. However, it is a cost-, time-, and laborious-intensive process because of the cell network complexity; thus, to be pro cient is needed known in advance the effects of gene deletions.

Results
An in silico experiment was performed to model and understand the effects of metabolic engineering over the metabolism by transcriptomics data integration. In this study, systems-based metabolic engineering to predict the metabolic engineering targets was used in order to increase the bioconversion of glycerol to succinic acid by Escherichia coli. Transcriptomics analysis suggest insights of how increase the glycerol utilization of the cell for further design e cient cell factories. Three models were used; an E. coli core model, a model obtained after the integration of transcriptomics data obtained from E. coli growing in an optimized culture media, and a third one obtained after integration of transcriptomics data obtained from E. coli after adaptive laboratory evolution experiments. A total of 2402 strains were obtained from these three models. Fumarase and pyruvate dehydrogenase were frequently predicted in all the models, suggesting that these reactions are essential to increasing succinic acid production from glycerol. Finally, using ux balance analysis results for all the mutants predicted, a machine learning method was developed to predict new mutants as well as to propose optimal metabolic engineering targets and mutants based on the measurement of importance of each knockout's (feature's) contribution.

Conclusions
The combination of transcriptome, systems metabolic modeling, and machine learning analyses revealed versatile molecular mechanisms involved in the utilization of glycerol. These data provide a platform to improve the prediction of metabolic engineering targets to design e cient cell factories. Our results may also work a guide platform for the selection/engineering of microorganisms for production of interesting chemical compounds.

Background
Shifting from petrochemical sources to renewable, abundant, and inexpensive feedstocks to obtain valuable chemicals has become a promising goal for the chemical industry [1]. Biodiesel industry has increased in the last years by using renewable raw materials, but it generates large amounts of glycerol becoming it a burden. The bioconversion of glycerol is a potential route to increasing the use bio-based succinic acid industry, a critical building-block chemical with an attractive market. The availability of three pathways for succinic acid production ( Fig. 1) [2], the adaptability to different environments, and the accessibility of metabolic engineering and omics tools make Escherichia coli an attractive cell factory. However, some challenges, such as low growth rate and yield, the use of rich medium, the generation of by-products, and various anaerobic requirements, need to be overcome for bio-based succinic acid production, considering cost-effective issues compared to the petroleum-based approach.
The main goal of using microbial cell factories is design cheap and high-yield biotechnology-based conversion processes. A signi cant problem to be solved is how to enhance cell growth while using its capabilities to obtain a high yield target chemical product. A classical approach for that is adaptive laboratory evolution -ALE, which is based on the selection of microorganisms with superior production capability after random mutagenesis screening. Another approach to strain improvement is metabolic engineering, which uses genetic manipulation to optimize the production of desired compounds. Metabolic engineering select targets that increases the productivity based on the rationality of trial-and error development cycles and an understanding of the routes playing a signi cant role in the synthesis. Strain design with this method has been extensively applied to use and/or produce interesting compounds [2][3][4][5][6], including bio-based organic acids by substrate transport enhancement, gene overexpression, and deletion [6][7][8][9][10][11]. However, making the strain industrially competitive requires much time, effort, and high cost [12].
Once DNA was discovered in the last century, a new approach called metabolic network modeling for the study of cellular metabolism was developed [13]. It allows to determine how several pathways in a cell can interact, as well as to elucidate basic microbial processes [14]. The rst genome-scale metabolic network was described in 1999 and in 2002 was reported the use of metabolic modeling to analyze recombinant pathways [15]. Ever since, several models have been developed with signi cant accuracy and useful predictions [16] that can be used to guide experimental studies [13,17]. COnstraints-Based Reconstruction and Analysis -COBRA -methods make possible to predict, given a cellular objective function, attractive targets to increase or maximize biochemical yields, and to determine perturbations after genetic manipulations of the cell [18,19]. OptKnock, OptStrain, OptForce, and OptReg are some COBRA methods developed to predict metabolic engineering targets for cell optimization by using Gene-Protein-Reaction -GPR-relationship [17,[20][21][22][23].
OptKnock applies a Flux Balance Analysis -FBA-approach for simulating genome-scale metabolic models. It assumes that each organism's metabolic network has been tuned through evolution for some objective function, be it a maximal growth rate or energy e ciency (e.g., minimal ATP utilization). While this assumption may be valid for wild-type (WT) organisms that have evolved over many hundreds or thousands of generations, it may be less appropriate for engineered mutants (KO) because they have been engineered in a controlled environment and unexposed to the same evolutionary forces. Hypothesizing that mutant organisms are unable to immediately adapt their metabolic network to achieve the WT objective function, computational tools such as Minimization of metabolic adjustment (MOMA) was developed [24]. This approach is mathematically formalized as a quadratic programming (QP) problem, nding a suboptimal ux pro le that is a minimal Euclidean distance from the WT (WT-FBA) and the genetically perturbed (KO-FBA) organisms. FBA combined with MOMA evaluation after OptKnock prediction could provide more accurate prediction of the immediate metabolic response to KO than FBA does on its own. However, there could exists a large list of knockouts combinations that could be obtained when computational tools are used and select which test in lab can be laborious.
Several approaches to optimize cell factories have been developed, but conventional and computational approaches have not always been successful due to unexpected changes in the cell where exits an intracellular complex interconnected network of genes, proteins, and reactions. Systems metabolic engineering has emerged as an approach that integrates metabolic engineering and combined metabolic and 'omics' network models. This could be bene cial for genome-scale modeling because reduce the solution space and generates accurate predictions [12,[25][26][27][28]. Mainly considering that under certain environmental conditions there are a limited number of reactions which are active according to transcriptional responses and other regulation phenomena to provide bene cial improvements for the cell bioconversion process [29][30][31][32][33][34][35].
In this study, systems metabolic engineering for overproduction of succinic acid from glycerol in E. coli was used through the integration of transcriptomics data to metabolic models and classi cation trees analysis using random forest to classify gene targets predicted by OptKnock. Our strategy took advantage of transcriptomics data obtained from an evolved E. coli in glycerol, and an optimized culture media. These data were subsequently integrated into a metabolic network models to predict targets using OptKnock. Predicted combinations were then evaluated using FBA, ux variability analysis, and MOMA to determine the effects of gene-reaction knockout in the cell. Finally, predicted target reactions were evaluated using random forest to determine the importance of each target using the succinic acid production, growth rate, and Euclidian distance between the WT strain and each mutant as response variables.

Results
Transcriptional response of E. coli for aerobic glycerol consumption A cell is considered as a complex system where a large number of processes are carried out. These processes then involve an interaction between, genes, transcripts, proteins, metabolites, reaction, among others [12,36,37]. Metabolic models are reconstructed by using the genome information; however, it is well known that metabolism is given by environmental conditions by passing through a cell regulation process. This causes that some genes can be turned on and off under certain conditions. To determine which reactions are active to obtain high accurate models two transcriptomic pro le was obtained from an ALE experiment and an optimized culture medium.
Differentially Expressed Genes (DEGs) were determined using the deseq2 statistical package after ltering out low count reads with an average value of < 100. Signi cantly DEGs were de ned as those whose abundance had at least a log2-fold change [(log2 FC) >|2|] between the reference condition (glycerol-based medium) and a chosen experimental condition (optimized culture medium and evolved strain) at a false discovery rate (FDR)-corrected P value of < 0.05. Relevant genes with log2 FC >|1| for glycerol metabolism or under the same regulon were taken into account. Figure 2 shows the distribution of DEGs using a Log2 FC ≥|2| for one strain growing in the optimized culture medium and one evolved strain growing in the same optimized medium. This analysis determined that 478 genes were differentially expressed, with 222 genes down-regulated and 256 up-regulated for the optimized medium, and 431 DEGs for the evolved strain, which 223 genes were down-regulated and 208 genes were upregulated. When comparing DEGs in the optimized medium and those in the evolved strain, 59 down-regulated genes were found to be unique in the evolved strain and 58 unique genes for optimized medium. In this context, 47 and 95 up-regulated genes were found to be unique in the evolved strain and the optimized medium, respectively (Fig. 2).
DEGs were classi ed into the following three groups using GO analysis: biological processes, molecular functions, and cellular components. The shared down-regulated genes predominantly included those involved in metabolic process (cellular, organic substances, nitrogen compounds, and primary metabolic processes), chemicals, stress and stimulus responses, and heterocyclic compound systems. Between down-regulated genes, we found phoB and phoR, which are involved in phosphorous uptake and metabolism since, under excess phosphorous, PhoR inactivates phoB [38]. Figure 3 shows the level 2 GO terms for unique down-and up-regulated genes in both conditions using blast2GO [39]. The 117 unique down-regulated genes at Log2 FC ≥|2| and an adjusted p-value of ≤0.05 were classi ed into 15 functional groups. Two GO terms, signaling and locomotion, were only present for the evolved strain, and one GO term, multi-organism processes, was only present for the optimized culture condition in down-regulated genes (Fig. 3A).
GO analysis revealed that shared up-regulated DEGs (Fig. 2) are involved mostly in the metabolic process (51%), including GO terms such as cellular, organic substances, primary, and nitrogen compounds process. 11% of upregulated genes were associated with biosynthetic processes and the establishment of localization. The main GO terms for molecular functions were those involved in binding activity (66%), counting ions, heterocyclic compounds, organic cyclic compounds, small molecules, and protein binding, followed by transferase activity (10%) and transmembrane transporter activity (9%). About 42% of the DEGs categorized in cellular functions were implicated in membrane GO terms, with 17% in the cell periphery and 16% in the cytoplasm.
Glycerol metabolism in E. coli is mediated by glp operons. In consequence, transcriptomic analysis shows shared up-regulation of glpBCFKQTX genes. The changes in bacterial gene expression in response to glycerol utilization are summarized in Table 1. During glycerol utilization, GlpF permease facilitates glycerol entry into E. coli for further transformation into glycerol-3-phosphate (Gly-3-P) by GlpK under aerobic conditions. Comparing glpK expression with the values obtained for other genes in the glp regulon showed that glpK was one of the most highly expressed genes. However, a difference of ~ 1 log2 FC between the evolved strain and the optimized culture condition was exhibited in the glpFKX operon ( Table 1). As a consequence of the regulatory network, an increase in the expression of glpX was detected (2.76 Log2 FC), which is part of the glpFKX operon, and works as an alternative fructose-1,6-bisphosphatase involved in gluconeogenesis by catalyzing the hydrolysis of fructose 1,6bisphosphate to fructose 6-phosphate [40]. Overexpression of glpX has been shown to increase hydrogen production [41]. Additionally, transcriptomic analysis showed up-regulation of both avin oxidases, glpD and glpABC. The electron-transport chains of E. coli are composed of many different dehydrogenases and terminal reductases. Glycerol metabolism in E. coli uses oxygen as the main electron acceptor but it could employ also fumarate under anaerobic conditions by encoding a fumarate reductase complex under anaerobic conditions [42,43]. Table 1 shows Log2 FC for fumA and frdABCD genes in E. coli. The fumA gene was encoded for abundant fumarase, predominantly expressed in the optimized culture medium (1.55 Log2 FC), but not for the evolved strain (-0.53 Log2 FC). FumA have been reported to be predominant expressed under aerobic conditions [44]. Under aerobic conditions, the catalysis of succinate to fumarate interconversion are mediated by the succinate dehydrogenase complex encoded by sdhABCD [43]. However, in this study sdhABCD genes were not found to be differentially expressed in any of the culture conditions. Interestingly, among upregulated genes in the adapted strain, a difference of ~ 1 log2 FC in the expression of the fumarate reductase genes (frdABCD), which is used in anaerobic growth, was observed over the optimized culture condition.
The maltose operon of E. coli consists of genes that encode proteins involved in the uptake and metabolism of maltose and maltodextrins. These genes have been found to be highly associated with up-regulation under glycerol utilization as a carbon source, and changes in the level of glpK transcription had a signi cant effect on malT transcription [45]. In this study, malEFKMTPQ genes were shown to be up regulated in both conditions. For malT, the log2 FC was more highly expressed in the optimized culture condition than in the evolved strain. The same behavior was observed for glpK. Thus, high expression of this regulon in this study could be presumably linked to the high expression of the glpK gene since they showed similar log2 FC.
As a result of glycerol metabolism, acetate is mainly generated. In our analysis, the phosphate acetyltransferase encoded by pta, which catalyzes the reversible conversion between acetyl-CoA and acetylphosphate [46,47], was found to be up regulated (~ 2.30 Log2 FC). Also, the atpABCDEFGH genes have a role in the generation of ATP from ADP and phosphate. These genes were observed to be up-regulated, with similar log2 FC, except for atpA, which had a difference of around 1 log2 FC in the optimized culture medium with respect to the evolved strain.
Predicting potential metabolic engineering targets for succinic acid overproduction Genome-scale metabolic (GEMs) models are de ned as a complete set of reactions involved in cell metabolism, given by genome annotation, regardless of whether the annotated metabolic genes are expressed in a given environment. This assumption could be correct in genome-scale models because core models represent the central metabolism, but the full potential of GEMs remains largely unexploited [48]. To avoid this situation and to evaluate the effects of use a core or a large model to predict metabolic engineering targets three models were used: a core model (ECC2) and two models obtained after the integration of transcriptomics data that can help to elucidate the actual state of the metabolic network in vivo for further metabolic engineering.

Metabolic model reconstruction and transcriptomics integration
For the integration process, a reconstruction of the metabolic model for E. coli ATCC 8739 was carried out based on the iEcolC 1368 [49], iEC1349_Crooks [50], and iML1515 models [51]. Extensive manual curation was conducted to ll pathway gaps. Transport and exchange reactions were added or eliminated, enabling nutrient uptake and byproduct secretions. Finally, the resulting model was designated iTA1338, and it involved 2032 metabolites, 2804 reactions, and 1338 genes (Supplementary File S1). After that, using GIMME, it was constructed context-speci c metabolic networks departing from the iTA1338 model for two types of strains: (1) WT E. coli ATCC 8739 growing in an optimized culture medium (iTA818) (Supplementary le S1) and (2) E. coli ATCC 8739 strains evolved to grow on glycerol (iTA821) (Supplementary le S1), manual curation was carried for iTA821 model based on GapFind and GapFill results. Figure 4 illustrates the number of reactions obtained for each model after transcriptomic integration. The same growth rate was observed after integration; however, ux distribution in 24 reactions was exhibited (Fig. 4B). The reactions only present in iTA821 are mainly associated with the transport inner membrane (14). Other unique reactions in iTA821 were mapped to be linked to the citric acid cycle, cofactor and prosthetic group biosynthesis, glutamate metabolism, inorganic ion transport and metabolism, the nucleotide salvage pathway, oxidative phosphorylation, and pyruvate metabolism, among others. Unique reactions of iTA818 were mainly associated with transport, including the transport outer membrane porin (218), transport inner membrane (50), and transport outer membrane (15), followed by cell envelope biosynthesis (37), the nucleotide salvage pathway (24), glycerophospholipid metabolism (14), alternate carbon metabolism (12), and cofactor and prosthetic group biosynthesis (7), among others.
In silico systems metabolic engineering targets prediction To design E. coli strains that overproduce succinic acid from glycerol, OptKnock was used [22]. Before predicting the reaction target to overproduce succinic acid, both metabolic networks were pre-processed. The goal of preprocessing was to obtain a smaller set of selected reactions that could serve as valid targets for gene knockouts. First, all reactions displaying maximum and minimum uxes equal to zero were removed from the set of potential reactions to be knocked out. Next, all reactions that had been experimentally found to be essential for growth were removed from consideration [52]. Also, the reactions that were found to be computationally essential were not considered, as well as non-gene-associated reactions.
Ten OptKnock rounds of mutant prediction were carried out. In each round, the set of reactions was set up to 1, 2, 3, … 10, and 100 mutants were requested per round. ECC2, iTA818, and iTA821 models were used to predict mutants of succinic acid overproducers. 811, 806, and 785 possible mutants were obtained from ECC2, iTA818, and iTA821 model, respectively. Figure 5 describes the frequency of the reactions predicted in all the possible mutants. It can be seen that 30 reactions were above the average of the frequency. Reactions acetate kinase (ACKr), fructose 6phosphate aldolase (F6PA), fumarase (FUM), pyruvate dehydrogenase (PDH), pyruvate formate lyase (PFL), phosphotransacetylase (PTAr), succinate dehydrogenase (SUCDi), triose-phosphate isomerase (TPI), glycerol-3phosphate dehydrogenase-NADP (G3PD2), and glycerol dehydrogenase (GLYCDx) were frequently predicted for all models. It is important to mention that G6PDH2r, LDH_D, PGL, and POX were not predicted to be part of models iTA818 and iTA821 after integration.
Interestingly, in the complete set of reactions predicted, PDH was the most frequent target reaction, followed by FUM in all models (Fig. 5) and minimal variations in the knockout frequency were observed for these reactions. Figure 5A shows A cluster analysis between the reaction frequency for each k deletion showed that elimination of acetate, formate, and lactate by-products mediated by POX, PFL, and LDH_D are highly related to PDH and FUM deletion (Fig. 6). This phenomenon, probably due to PDH deletion results in reduced conversion of pyruvate to acetyl-CoA, which is the main substrate in ACKr and PTAr reaction to generate acetate (Fig. 1), a competitive by-product on succinate production [46]. Then, if PDH deletion is not carried out, ACKr and PTAr knockouts would become essential to increasing succinate production, as well as minimizing costs in the separation process [53,54].
Since metabolic manipulation of a cell results in a stress process, negative impact of deletions on the maximum growth rate can be observed. To determine the effects of reaction knockouts over the cell, FBA was carried out and Euclidean distance was calculated for each mutant predicted. Figure 7 illustrates the relationship between the number of knockouts, succinic acid production, and growth rate using FBA and the Euclidian distance between the WT and mutant strains using MOMA. It can be seen that the number of reactions knocked is highly related to high succinic acid production rates due to the elimination of competitive by-products, such as acetate, formate, and lactate, requiring at least 3-4 deletions. The highest succinate production (~ 8.5 mmol/gDW/h) was observed in mutants predicted in ECC2 when 9 or 10 reactions were deleted. However, this implies a substantial reduction in the growth rate to ~ 4% compared to the WT strain. Thus, selecting these mutants is unrealistic for the industrial production of succinic acid. The same behavior in the reduction of the growth rate was observed for those mutants that required more than 6 deletions in mutants predicted in iTA818 and iTA821. In contrast, a considerable reduction in the growth rate (28% of the WT growth rate) as well as an increasing succinic acid production rate (around 30% more than those with 9-10 knockouts) for those mutants with 6 knockouts was observed. In addition, it was observed that there is no direct correlation, in the same magnitude for all the mutants, between the Euclidean distance and the numbers of knockouts in each mutant. However, Fig. 7 shows ampli cations in the Euclidian distance between the WT and the mutants when succinate production and knockout numbers increases and growth rate decreases.

Identi cation of critical metabolic targets and potential mutants
OptKnock results are a large list of a knockouts combinations where maximum product synthesis occurs at maximum growth rate reachable [22]. However, it has been observed that the optimal solution of the target given by OptKnock is not necessarily growth coupled and some mutants predicted does not increase the product target. In consequence, select a mutant to be tested in the lab could be really di cult and probably result in a laborious process. Assuming that each mutant product growth "coupled" predicted will result in a successful biological production, this mutants can ensure high productivity over time and initially solve this situation [55]. To identify growth-coupled production solutions, a Cobra Toolbox function was used verifying the minimum and maximum production rate given a set of reactions to be knocked. As a result, when the maximum growth rate is achieved the same minimum and maximum ux for the desired product should be obtained. 1799 mutants were predicted to be growth coupled, 539 to be growth-coupled non-unique (maximum ux -minimum ux > 0.1), and 64 mutants were categorized as not growth-coupled (maximum ux < 0.1). For the mutants categorized as growth-coupled nonunique, an FBA was carried out to predict the succinic acid production rate (Fig. 7), where 279 mutants were predicted to have a difference between the maximum production rate predicted by the function and FBA < 2, resulting in 2078 in silico mutants that overproduce succinic acid.
In order to lter and select potential mutants to be tested in the lab, a random forest model to predict the importance of each reaction knockout was developed based on the OptKnock predictions. Each possible combination of reactions using binary values that increase the succinic acid production was associated with the ux of the extracellular succinic acid and biomass reaction obtained by FBA and the Euclidian distance obtained by MOMA. The data set was divided into two groups, 70% for training and 30% for the test. Following feature selection and cross validation, a robust model that associated any combination of 58 reaction variables to a predicted growth rate and succinic acid production ratio was obtained. A measure of importance the contribution of each feature to the random forest model is shown in Fig. 8 indicated by IncNodePurity. This model exhibited a mean square error (MSE) value when using the reaction ux of EX_succ_e ux obtained by FBA as variable response of 0.293. For the growth rate (biomass reaction) as a response variable, the MSE value was 0.0002. Finally, when the Euclidian distance for each mutant was used as the response variable, the MSE value was 9175.158, indicating that the Euclidian distance is not a good response variable to predict cell behavior when using the random forest model. Moreover, this result allows use machine learning models to predict a largest number of mutants than those obtained by OptKnock in terms of growth rate and succinic acid production since OptKnock is high time consuming. Figure 8A shows that PFL, LDH_D, GLYCDx, G3PD2, PDH, and POX are the most important reactions to increase the amount of succinic acid. These reactions are mainly associated with the GldA-DhaKLM fermentative route and the Gly-3-P route (Fig. 1) in glycerol utilization [46], as well as acetyl-CoA generation given by the PDH knockout. In around 24% of the mutants predicted, a combination of GLYCDx and G3PD2 reactions was found to increase the succinic acid production. However, POX and LDH_D reactions were not present in iTA818 and iTA821 models, and PDH, G3PD2, and PFL were also found to be the most important reactions, predicted to have an effect over the growth rate (Fig. 8B).
The pyruvate dehydrogenase complex is a critical connection point between glycolysis and the TCA cycle, both of which function during aerobic respiration through catalyzing the conversion of pyruvate to acetyl coenzyme A (acetyl-CoA) [56]. PDH deactivation results in PFL carrying the ux from pyruvate to acetyl-CoA [57]. Simple reaction knockouts show that PDH deletion results in a growth rate reduction of ~ 5%. Additionally, 5 reactions (FUM, GAPD, PGK, PGM, and TPI) were predicted to have the most signi cant reduction (8-10%) in the growth rate during glycerol utilization. Of those reactions, only FUM has a signi cant positive effect over the succinate production when this deletion is carried out alone. However, in mutants in which both FUM and PDH were predicted (59.45%), TPI appeared in around 12.60% (Fig. 5). Then, the deletion of genes associated with TPI in addition to FUM and PDH reactions could negatively affect the growth rate.

Discussion
Glycerol metabolism in E. coli have been described in the literature. However, cell changes are carried out as response to stress situations. In this study, two conditions were tested for transcription response in E. coli for further integration to metabolic network modeling. Gene expression-wide analyses reveal how cell have the ability to avoid glycerol toxicity increasing the consumption. The most striking response to glycerol consumption and the possible mechanism to optimize succinic acid production from glycerol was revealed by the combination of transcriptome, metabolic modeling, and machine learning analyses.
After glycerol incorporation in the cell mediated by GlpF, glycerol can be metabolized through two pathways. The rst is mediated by the glycerol kinase GlpK through phosphorylation of glycerol to Gly-3-P, followed by GlpD activity under aerobic conditions, leading to dihydroxyacetone phosphate -DHAP- (Fig. 1). The alternative pathway consists of an oxidation step by glycerol dehydrogenase (GldA) to yield dihydroxyacetone (DHA), followed by phosphorylation by DHA kinase (DhaK) to give DHAP, as well. In this study, over expression of glpK was observed in both conditions, with a difference of around 20%. This result is not surprising since the GlpK-mediated reaction is the rate-limiting step in glycerol utilization [58]. However, it has been observed that under the optimized culture conditions the glycerol utilization rate is higher than the evolved conditions, suggesting that should exits other mechanism in the cell to enhance glycerol utilization. Gly-3-P is the rst intermediate between the glycerol pathway and the TCA cycle, as well as the biosynthesis and catabolism of lipids; however, accumulation of Gly-3-P can become toxic. Thus, it is carefully regulated [40]. The export of Gly-3-P could be mediated by phoE and ompF membrane porins; however, down-regulation of phoE (-8.67 and − 9.04 Log2 FC for the optimized culture and the evolved strain, respectively) and up-regulation of ompF (Log2 FC 2.43) in the optimized culture suggest that it could play an essential role in E. coli ATCC 8739 glycerol metabolism at high uptake rates avoiding toxicity.
The marked up-regulation of glpQ (5.351 and 5.597 Log2 FC for the optimized culture and evolved strain, respectively), which catalyzes the hydrolysis of glycerol-phosphodiesters to an alcohol plus Gly-3-P together with ompF could explain the partially higher transcript abundance of glpT since the externally generated (or supplied) Gly-3-P activates GlpT [59,60]. This protein exchanges Gly-3-P for phosphate, avoiding the toxicity of both Gly-3-P and the inorganic phosphates [40]. As result, and considering that phosphate is necessary to increase glycerol utilization, autoregulation of the PhoB/PhoR two-component regulatory system need to be down expressed. Down regulation of PhoB/PhoR was observed in this study which could explain the achievement of optimal density [61], as well as contribute to the regulation of glycerol phosphate metabolism [62].
The transcriptional analysis also identi ed the differential expression of both avin oxidases, glpD and glpABC. Once Gly-3-P is in the cytoplasm, it is oxidized to dihydroxyacetone phosphate by one of two avin-dependent oxidases encoded by glpD or glpABC genes under aerobic or anaerobic conditions, respectively [40,46]. In the presence of oxygen or nitrates, GlpD transfers electrons to the respective terminal oxidizes. In contrast, under anaerobic conditions the GlpABC system transfers the electrons to fumarate or nitrates [63]. GlpD up-regulation was expected since culture conditions were under aerobic conditions, but higher expression of the glpABC system was surprising. Over expression of glpABC under aerobic conditions could be elucidated because of the activation of fumarate reductase enzymes (Table 1) in the evolved strain as results of high cell densities during the ALE process. However, in glycerol fermentation studies, the ΔfrdA mutant has been shown to be bene cial for glycerol fermentation because it prevents the negative impact of hydrogen by maintaining suitable redox conditions [64]. Moreover, its activity could be supported by sdhABCD since they are structurally and functionally homologous [65]. Therefore, we hypothesized that frdABCD up-regulation could be the reason why enhancement in the glycerol utilization was not observed in the evolved strain, even when an optimized culture medium was employed.
The insights on the molecular adaptive responses of E. coli to glycerol consumption revealed by the transcriptional datasets identi ed a marked hdeAB up-regulation only in the evolved strain. This is attractive since HdeAB are periplasmic proteins that play a role in optimal protection at low pH [66,67]. Therefore, differences in hdeAB upregulation in the evolved strain and the optimized culture medium probably occur because acetate is the main product in glycerol utilization and under ALE conditions the pH was not controlled. Moreover, the addition of a phosphate buffer system using the salts Na 2 HPO 4 and KH 2 PO 4 provides the culture medium used directly for the optimized condition with a buffering capacity (Ng, 2018).
It was observed that the main and preferable route for glycerol consumption is the pathway mediated by GlpK since this gene was highly overexpressed in high glycerol consumption cultures. Moreover, glpK deletion has also been observed to be essential for glycerol utilization as the sole carbon source [69]. Then, deletion of this gene could result in a non-effective bioconversion process. As a result, this gene should not be taken into account for engineered E. coli strains using glycerol as the carbon source even when the GLYK reaction was repeatedly predicted to be knocked by OptKnock in ECC2 and iTA821 since two pathways for glycerol utilization in E. coli exist.
Based on OptKnock and random forest model predictions, four critical control points, glycolysis, pyruvate metabolism, the pentose phosphate pathway, and the TCA cycle, are associated with the overproduction of succinic acid. FUM and SUCDi appear to be the most signi cant keys in the TCA cycle for succinate overexpression. Results of this study suggest that they are mutually exclusive. Parallelly, the knockout of byproducts such as acetate, formate, and lactate by deleting POX, ACKr, PTAr, PFL, and LDH_D were highly predicted to be knocked out. Those results are interesting since one of the bottlenecks for industrial production of bio-based products is the elimination of by-products, which could facilitate the recovery and puri cation process. This results and those obtained in the transcriptional responses suggest that deletion of the pta need to be, almost as mandatory, carried out since acetate production become a competitive pathway in glycerol metabolism for succinic acid production [68].
The pyruvate dehydrogenase complex is a critical connection point between glycolysis and the TCA cycle, both of which function during aerobic respiration through catalyzing the conversion of pyruvate to acetyl coenzyme A (acetyl-CoA) [56]. PDH deactivation results in PFL carrying the ux from pyruvate to acetyl-CoA [57]. Simple reaction knockouts show that PDH deletion results in a growth rate reduction of ~ 5%. Additionally, 5 reactions (FUM, GAPD, PGK, PGM, and TPI) were predicted to have the most signi cant reduction (8-10%) in the growth rate during glycerol utilization. Of those reactions, only FUM has a signi cant positive effect over the succinate production when this deletion is carried out alone. These results indicate that those mutants predicted by OptKnock where FUM and PDH are predicted are potential mutants to be tested in the lab because it have been observed that a low growth rate could negatively affect the pro tability of industrial bio-based production products [2,70].. However, in mutants in which both FUM and PDH were predicted (59.45%), TPI appeared in around 12.60% (Fig. 5). Then, the deletion of genes associated with TPI in addition to FUM and PDH reactions could negatively affect the growth rate. This because in the absence of TpiA, DHAP is converted to methylglyoxal, which, even at sub-millimolar concentrations, is a toxic compound [40]. DHAP is the results of the alternative pathway on glycerol metabolization consistent of an oxidation step by glycerol dehydrogenase (GldA). DHAP must be transformed into the general glycolytic pathway through isomerization by triosephosphate isomerase (TpiA) as glyceraldehyde-3phosphate (GA3P). Therefore, deletion of tpiA could result in growth inhibition and cell death in the presence of glycerol as the only carbon source [69]. However, since FBA is not able to capture regulation, this situation could not be predicted by OptKnock.
Finally, computational models suggest that deletions of just 6-7 reaction knockouts are bene cial for industrial production since the growth rate does not decrease extremely. It is important to consider that a similar succinate production could be achieved if 6-8 reactions are knocked for all models. An assumption using optimization methods to predict cell capabilities is that the cell could quickly adjust the metabolism to maximize growth under certain conditions. This a rmation could be true for WT strains because FBA predicts an optimal condition. However, in metabolically engineered strains, the cell attempts to compensate the genetic changes carried out by the fewest changes in gene regulation until it achieves an optimal state that could be predicted using FBA [72]. Then, FBA in engineered strains predicts a long-term evolved state. Thus, an alternative to evaluate unevolved mutants is the MOMA method [24]. MOMA solves this problem by nding the solution that is most similar to the WT state (maximization of WT growth rate). Figure 7 shows a jump in the Euclidian distance between the WT and mutant strains when succinate production increases. This result could imply that after genetic manipulation microbial cell factories requires to be evolutionary engineered. ALE studies have shown to provide the cell with the ability to growth under selection pressure to go up from a sub-optimal state to the optimal growth rate predicted using in silico models [73]. Moreover, since OptKnock seeks to maximize the ux of a target chemical while maximizing the growth rate, our predictions could be bene cial for further ALE experiments because microbial cell factories have naturally evolved to maximize the growth rate. Thus, the succinic acid production rate would increase as biomass formation increases [55] by using ALE rounds after metabolically engineering cells [71].

Conclusions
By adopting tools from various disciplines, computational methods for systems metabolic engineering have been developed in order to understand the cell behavior and how level systems (RNA, proteins, metabolites, etc.) can interact inside the cell for industrial purposes. In the same way, E. coli has been extensively studied to become it a cell factory for the production of useful bio-based chemicals and materials through its native capabilities. However, there are some challenges that still need to be overcome.
This study proposes that computational tools have the potential to accelerate the optimization of cell factories by identifying metabolic engineering targets (genes/reactions) and not just by predicting mutants that may be biologically unviable. Therefore, systems metabolic engineering allows reduce time in rational strain design as well as to guide the selection of metabolic engineering targets based on the cell behavior under experimental conditions. At the same time, departing from traditional computational tools new methods such as machine learning could be proposed as an interesting alternative for the reduction of computational demand. However, these techniques are dependent on the level of completeness and accuracy of the metabolic model considered which could be improved by using omics data.

Methods
Strains and culture conditions E. coli ATCC 8739 was obtained from the American Type Culture Collection (ATCC). A glycerol-based medium containing the following components (per liter) was used as the reference culture condition: 5 g of yeast extract, 2.

Strains and culture conditions for adaptive laboratory evolution
To adapt E. coli ATCC 8739 on glycerol, E. coli was continuously sub-cultured in Luria-Bertani (LB) medium (5 g/L of NaCl, 10 g/L of tryptone, and 5 g/L of yeast extract) at 72 h per round until the 3rd round, where each round corresponds to one transfer to the same media. After every 3 rounds, the concentration of tryptone was decreased in 1 g/L until reaching 0 g/L of tryptone. Then, 10 rounds were carried out. A 50 mL culture was carried out in a 250 mL, non-ba ed, conical Erlenmeyer ask and cultivated aerobically at 37 °C and 200 rpm. Then, 1 mL of the culture after each round was used as the inoculum for the next culture. Cells at the end of each ALE were kept at − 80ºC and used for further evaluation of the growth and glycerol uptake, as well as transcriptomics analysis.

Differential expression analysis
RNA seq was carried out in triplicate for all conditions. For the evolved strain, the culture conditions for RNA-seq were the same as that for the optimized culture medium condition. To harvest the cells for total RNA puri cation, culture sample was rst treated with RNAprotect Bacteria Reagent (Cat No./ID: 76506), and enzymatic lysis and Proteinase K digestion of the bacteria were carried out following the manufacturer's protocol. Then, the Qiagen RNeasy Mini kit (Cat No./ID: 74104), following the manufacturer's protocol, was used to obtain the total RNA for further analysis. Each sample was treated with DNase following the protocol in order to remove the DNA. The samples were sent to commercial RNA-seq services for further sample processing and sequencing (Genewiz, South Plain eld, NJ).
Clean, raw data was obtained by removing the reads containing adapters using trimmomatic. The sequence RefSeq: NC_CP010468 was employed for mapping. RNA reads were mapped using the software bowtie2, and featureCounts was employed for read counts. SARTools (Statistical Analysis of RNA-Seq data Tools) [74] was used for statistical RNA-Seq analysis. Differentially expressed genes (DEGs) were identi ed using the DESeq2 R Package. The functional classi cation of the DEGs was performed using the gene ontology (GO) analysis by Blast2Go [39]. The data discussed in this publication has been deposited in NCBI's Gene Expression Omnibus [75] and are accessible through GEO Series accession number GSE140847.

Genome-scale metabolic network reconstruction
In order to obtain metabolic engineering targets to overproduce succinic acid from glycerol, the following two E. coli models were used: EColiCore2 (ECC2) (data under peer review) and iTA1338 for E. coli ATCC 8739 (Supplementary File S1). Gene associations for both models were modi ed to ECOLC_RS number based on the sequence RefSeq: NC_CP010468 to facilitate the integration of transcriptomics data. Extensive manual curation was conducted, including (i) adding/eliminating transport reactions and extracellular metabolites and (ii) lling pathway gaps. Gap Find and Gap Filling, two optimization problems that search for root metabolite problems that are not connected in the network and that solve them, were used to ll gaps in iTA1338, including biomass reaction BIOMASS_Ec_iML1515_WT_75p37M (Supplementary File S1). All optimization problems were solved using the COBRA Toolbox v.3.0 [76].

Transcriptomics integration and metabolic engineering target prediction
Gene inactivity moderated by the metabolism and expression (GIMME) [77] method was used to integrate transcriptome data with the E. coli metabolic model. This method then minimized the usage of low-expression reactions while keeping the objective (e.g., biomass) above a certain value. Expressed genes were considered according to their expression level with Log 2 Fold Change (FC) ≥ |1|. Next, according to the gene-protein reaction rules and the de ned gene expression states, a speci c activity state for each reaction was identi ed. Finally, a speci c context model was obtained from the transcriptomic data. Metabolic engineering targets were obtained using OptKnock. The maximum uptake rate of glycerol was set to 13.3 mmol/gDW h − 1 . The GIMME and OptKnock methods were conducted using COBRA Toolbox v.3.0 [76] in MATLAB 2017b and Gurobi 8.0.1.
Machine learning to determine potential metabolic engineering targets Random forest models are supervised machine learning approaches, which have the advantage of giving a summary of the importance of each variable. This approach is based on a randomized variable selection process. An estimation of variable importance is provided by IncNodePurity, which measures the decrease in tree node purity that results from all splits of a given variable over all trees [78]. For interpretation purposes, this measure can be used to rank variables by the strength of their relation to the response variable [78]. A matrix of binary values was built from m mutant predicted and n reactions in the set of possible reactions to be knocked-out. In this matrix, one represents the presence of one speci c reaction to be deleted in the mutant and zero absence in the combination of reactions to be deleted in the mutant. The matrix was partitioned into training and test sets; the training set was used to build a random forest model to predict succinic acid production, growth rate, or the growth rate Euclidean distance between the mutant and WT strains as response variables. For the training set, succinic acid production, growth rate variable response was initially predicted using FBA, and the growth rate Euclidean distance between the mutant and WT strains was predicted using MOMA. Next, the model performance was assessed using the testing set. Finally, we used the random forest to determine the importance of each target reaction over the three evaluated response variables.
Declarations Funding This research was results of the Gobernación del Cesar program of Science, Technology, and Innovation for higher education by PhD scholarships.