Genome-Scale Metabolic Model-Based Engineering of Escherichia coli Enhances Recombinant Single-Chain Antibody Fragment Production

Escherichia coli is an attractive and cost-effective cell factory for producing recombinant proteins such as single-chain variable fragments (scFvs). AntiEpEX-scFv is a small antibody fragment that has received considerable attention for targeting the epithelial cell adhesion molecule (EpCAM). EpCAM is one of the first discovered cancer-associated biomarkers and is highly expressed on various types of solid tumors. Here, a genome-scale metabolic model-guided engineering strategy was used to identify gene targets for improved antiEpEX-scFv production. Flux balance analysis and the FVSEOF algorithm identified several potential genetic targets in the glucose import system and the pentose phosphate (PP) pathway that were predicted to improve scFv yield. Among the predicted targets, the glk gene encoding glucokinase was selected for overexpression in the parent strain Escherichia coli BW25113 (DE3). Owing to metabolic burden, recombinant scFv expression caused a marked decrease in the maximum specific growth rate of the transformed strain. Overexpressing glk, which presumably increases carbon flux through the PP pathway, restored the growth capacity of the recombinant E. coli strain. Moreover, glk overexpression increased scFv production: the titer of antiEpEX-scFv reached 235.41 ± 9.53 µg/mL (0.428 g/g DCW) in the engineered strain, compared with 110.236 ± 7.68 µg/mL (0.202 g/g DCW) in the parent strain. The model-based prediction was thus experimentally validated. This approach can be considered for improving the production of other recombinant proteins.


Introduction
Nowadays, the majority of clinically valuable proteins are recombinant, and their production is directly influenced by the metabolism of the producing cell factory. Bacteria and yeasts are two of the most well-established microbial cell factories, characterized by high-yield production of recombinant proteins (Ferrer-Miralles and Villaverde 2013). Despite continuous efforts, barriers to the overexpression of recombinant proteins remain. The metabolic burden observed in producing cells reduces biomass yield as part of a stress reaction triggered by protein overexpression, which in turn lowers recombinant protein productivity. To circumvent these challenges, metabolic engineering is being successfully employed in developing industrial cell factories (Fernández-Cabezón and Nikel 2020). Using this approach, specific biochemical reactions can be modified by deletion and/or amplification of specific genes, leading to rearrangement of regulatory circuits (Herrgård et al. 2006). These modifications improve strain productivity by changing the flux through some metabolic reactions. Based on this strategy, a significant improvement in the production rate of numerous heterologous products has been reported. For example, heterologous lycopene was produced at up to 102 mg/L in an engineered Escherichia coli strain. However, rational metabolic engineering requires a systematic understanding of metabolic pathways and their regulation mechanisms.
Systems metabolic engineering strategies that use omics data have been developed to guide metabolic engineering. Novel approaches such as genome-scale metabolic model (GEMM)-guided engineering have evolved from the enormous advances in systems biology. Genome-scale modeling can quantitatively predict cellular behavior at the system level and recommend genetic manipulations that balance target protein overexpression against biomass production, so that strains with both high growth and high protein productivity can be obtained. High accuracy in predicting cell phenotype and savings in the laboratory resources and time needed to develop productive strains are the most important advantages of model-guided metabolic engineering (Orth et al. 2011). Flux variability analysis (FVA) or flux balance analysis (FBA) can be applied to the E. coli system model, with maximum biomass as the objective, to identify potential genes for overexpression or down-regulation. FVA yields a flux variability range for each reaction, whereas FBA yields a single flux distribution. Various algorithms, such as FSEOF (flux scanning based on enforced objective flux) (Choi et al. 2010) as an FBA-based method, and FVSEOF (flux variability scanning based on enforced objective flux) (Park et al. 2012) and OptForce (Ranganathan et al. 2010) as two FVA-based methods, have been employed in metabolic engineering to determine appropriate genes to be amplified in order to improve industrial strains. In this study, an scFv (single-chain variable fragment) antibody against EpEX (the EpCAM extracellular domain), which has drawn great attention in biomedicine for dual therapeutic and diagnostic applications (Eyvazi et al. 2018), was considered as a model protein to be overproduced in E. coli BW25113 (DE3). The FVSEOF method was used to determine appropriate gene modifications to increase flux towards antiEpEX-scFv overproduction.
Based on the results of the in silico FVSEOF analysis, the glk gene was selected for overexpression. Experiments were then performed to evaluate the effect of glk overexpression on antiEpEX-scFv overproduction.

Metabolic modelling and target gene prediction
The iJO1366 metabolic model of E. coli (Orth et al. 2011) was used in the COBRA Toolbox v2.0 (Schellenberger et al. 2011) under MATLAB 2014b (MathWorks, USA) with GLPK as the solver. The metabolic reaction for antiEpEX-scFv production was added to iJO1366 according to the previously described method (under review). FBA was used to determine the maximum theoretical production rate of antiEpEX-scFv (0.034 mg gDCW⁻¹ h⁻¹). However, this value can be achieved only when biomass production falls to zero, so it cannot be used as a realistic constraint for further analysis. The FVSEOF method was therefore applied with antiEpEX-scFv production constrained to the range 0 to 0.017 mg gDCW⁻¹ h⁻¹ (half of the theoretical maximum) while maximizing biomass. FVSEOF discretizes this range into five consecutive FVAs and computes the reaction fluxes in the model with biomass production as the objective function.
Comparing the results of the FVAs reveals how each reaction flux varies as antiEpEX-scFv production is enforced. Most reactions in the model remained unchanged or changed by less than 0.1 in flux rate, and these were ignored. Reactions with significantly ascending fluxes are favorable candidates for experimental investigation because of their positive effect on antiEpEX-scFv production.
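The screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the COBRA Toolbox workflow used in the study: the five enforced levels are assumed to be evenly spaced and to include both ends of the range, each reaction is represented by a single flux value per step rather than by its FVA minimum and maximum, and the reaction identifiers and flux values are hypothetical.

```python
def enforced_flux_levels(max_flux, steps):
    """Evenly spaced enforced product fluxes from 0 to max_flux (inclusive)."""
    return [max_flux * i / (steps - 1) for i in range(steps)]


def ascending_reactions(flux_profiles, min_change=0.1):
    """Return reactions whose flux rises across the enforced steps.

    flux_profiles maps a reaction id to its flux at each enforced step.
    Reactions whose total change is below min_change are ignored.
    """
    selected = []
    for rxn, fluxes in flux_profiles.items():
        total_change = fluxes[-1] - fluxes[0]
        non_decreasing = all(b >= a for a, b in zip(fluxes, fluxes[1:]))
        if non_decreasing and total_change >= min_change:
            selected.append(rxn)
    return sorted(selected)


# Five enforced antiEpEX-scFv production levels between 0 and 0.017.
levels = enforced_flux_levels(0.017, 5)

# Hypothetical reaction ids and flux values, for illustration only
# (these are not results from the study).
example_profiles = {
    "RXN_A": [1.0, 1.4, 1.9, 2.5, 3.0],       # rises steadily: candidate
    "RXN_B": [5.0, 4.6, 4.1, 3.7, 3.2],       # falls: not an overexpression target
    "RXN_C": [8.00, 8.01, 8.02, 8.03, 8.04],  # change below 0.1: ignored
}
targets = ascending_reactions(example_profiles)
```

With these made-up profiles only RXN_A passes the filter, which mirrors the idea that reactions whose flux must rise as product formation is enforced are the overexpression candidates.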
Bacterial strains, plasmids, and cultivation conditions
University of Medical Sciences, Tehran, Iran) was utilized as a co-expression vector. The plasmid pETDuet-antiEpEX-scFv was previously constructed in our lab (Behravan and Hashemi 2021). Pfu DNA polymerase was obtained from mxcell. T4 DNA ligase, protein molecular weight markers and restriction enzymes were obtained from Thermo Fisher Scientific (USA). DNA fragments were purified from agarose gel using a gel extraction kit (Roche Diagnostics GmbH, Germany). Two culture media were used. M9 minimal medium contained (per liter) 0.5 g of NaCl, 6 g of Na2HPO4, 3 g of KH2PO4 and 1 g of NH4Cl, supplemented with 5 g/L glucose, 2 mM MgSO4, 0.01 mM FeCl3, 0.1 mM CaCl2, and 0.1 mL of 1000x trace metal elements (Teknova, USA). LB medium was composed of (per liter) 10 g of tryptone, 5 g of yeast extract, and 10 g of NaCl. All other chemicals were purchased from Merck in analytical grade.

Construction of recombinant plasmid
The glk gene was amplified from the genome of E. coli BW25113 (DE3) using the primers glk-F, 5′-CCGGAATTCTGAAGAATGACAAAGTATGC-3′ (containing the EcoRI site GAATTC) and glk-R, 5′-AAACTGCAGCCCGATATAAAAGGAAGGAT-3′ (containing the PstI site CTGCAG). Cycling conditions were 94°C for 3 min, followed by 30 cycles of 94°C for 30 s, 56°C for 35 s and 72°C for 1 min 30 s, and a final cycle of 72°C for 10 min. To generate plasmid pETDuet-glk, the PCR product was digested with the restriction enzymes PstI and EcoRI and then ligated into pETDuet-1 treated with the same two enzymes. The expression plasmid pETDuet-glk-antiEpEX-scFv was constructed by digesting the pGH vector carrying the antiEpEX-scFv gene with XhoI and NdeI to obtain the gene with a C-terminal hexa-histidine tag, which was then ligated into pETDuet-glk treated with NdeI/XhoI. Restriction enzyme digestion and sequencing were used to confirm the constructs, which were transformed into chemically competent E. coli BW25113 (DE3) cells for recombinant protein expression.
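As a quick check of the primer design, the restriction sites can be located programmatically. The following is a minimal sketch; the helper name find_sites is hypothetical, while the primer sequences are those given above and the recognition sequences are the standard ones for EcoRI (GAATTC) and PstI (CTGCAG).

```python
RESTRICTION_SITES = {"EcoRI": "GAATTC", "PstI": "CTGCAG"}

PRIMERS = {
    "glk-F": "CCGGAATTCTGAAGAATGACAAAGTATGC",
    "glk-R": "AAACTGCAGCCCGATATAAAAGGAAGGAT",
}


def find_sites(sequence, sites=RESTRICTION_SITES):
    """Map each enzyme to the 0-based start positions of its site in sequence."""
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start)
            start = sequence.find(motif, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


# Each primer should carry exactly one of the two cloning sites.
site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}
```

In both primers the site starts at position 3 (0-based), so each site is preceded by three extra 5′ bases.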

Expression of antiEpEX-scFv
For antiEpEX-scFv expression, a single colony of BW25113 (DE3) harboring pETDuet-glk-antiEpEX-scFv was inoculated into 5 mL of LB medium supplemented with an appropriate amount of ampicillin and incubated for 18 h at 37°C with shaking (200 rpm). After centrifugation (6000×g for 5 min at 4°C), the pellet was resuspended in 100 mL of M9 minimal medium containing ampicillin (50 mg/mL). When the cell density reached an OD600 of 0.8, expression of antiEpEX-scFv was induced with 0.8 mM IPTG at 37°C. Cells were harvested after 24 h by centrifugation (10000×g, 10 min, 4°C). For initial determination of protein expression, the cell pellets were suspended in 30 mL of lysis buffer containing 50 mM Tris pH 7.5, 1 mM EDTA, 1 mg/mL lysozyme, 150 mM NaCl and 1% Triton X-100, and sonicated for 30 min (20 s ON, 10 s OFF at 400 W). After centrifugation of the cell lysate (10000×g for 30 min at 4°C), protein samples were electrophoresed on a 15% SDS-PAGE gel and visualized using Coomassie Brilliant Blue G-250 dye. For western blot analysis, the proteins were electro-transferred from the gel onto a polyvinylidene difluoride (PVDF) membrane using a wet Trans-Blot system (Bio-Rad, USA). The membrane was blocked in 5% nonfat milk for 1 hour, washed three times with TBST and then incubated with His-tag antibody (Sigma, UK) overnight. After washing again, the membrane was incubated with HRP-conjugated anti-mouse immunoglobulin (Sigma, UK) as the secondary antibody for two hours and then developed with a solution of 3,3′-diaminobenzidine (DAB) (Sigma, UK). The recombinant antiEpEX-scFv was purified on a Ni-NTA affinity chromatography column under denaturing conditions according to the manufacturer's protocol (Qiagen, Netherlands). The concentration of the purified protein was measured with the bicinchoninic acid (BCA) assay (Takara BCA Protein Assay Kit, Takara, Japan).

Growth profile and glucose analysis
To investigate the cell growth profile, optical density at 600 nm was determined every hour using a spectrophotometer (E-Chrome Tech, Taipei, Taiwan). The growth rate was calculated from the logarithmic derivative of the optical density curve. To determine the glucose concentration, one milliliter of culture broth was sampled at 1-hour intervals. The supernatant was collected following centrifugation at 10000×g (10 min). The concentration of glucose was measured using a commercial enzymatic kit (Megazyme, Wicklow, Ireland).
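The growth-rate calculation can be illustrated as follows. This is a sketch that assumes the logarithmic derivative is approximated by the slope of ln(OD600) between consecutive hourly readings; the text does not state the fitting window, and the OD values below are hypothetical.

```python
import math


def specific_growth_rates(times_h, od600):
    """Slope of ln(OD600) between consecutive time points (per hour)."""
    rates = []
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        rates.append((math.log(od600[i]) - math.log(od600[i - 1])) / dt)
    return rates


def max_specific_growth_rate(times_h, od600):
    """Largest interval-wise specific growth rate, taken here as mu_max."""
    return max(specific_growth_rates(times_h, od600))


# Hypothetical hourly OD600 readings, for illustration only.
times = [0, 1, 2, 3, 4]
ods = [0.05, 0.08, 0.16, 0.30, 0.40]
mu_max = max_specific_growth_rate(times, ods)
```

Here the steepest interval is the doubling between hours 1 and 2, so mu_max equals ln 2 per hour. With noisy data, a linear regression of ln(OD600) over several points in the exponential phase would be a more robust choice than point-to-point slopes.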

Real-time PCR analysis
The relative expression of glk as the target gene was compared between E. coli BW25113/Duet-glk-scFv, the wild-type strain E. coli BW25113 and E. coli BW25113/Duet-scFv using RT-qPCR. The E. coli strains were cultured in 50 mL of M9 medium and induced with 0.8 mM IPTG at OD600 = 0.8. After 3 hours, samples were collected and diluted to OD600 = 0.4. Total RNA was extracted from the bacterial cells using Trizol reagent (Ambion) according to the manufacturer's protocol. The purity and quantity of the isolated RNA were measured with a Synergy HTX multimode reader (BioTek), and the RNA was stored at -80°C for further use. cDNA was synthesized with a cDNA synthesis kit (YT450; Yekta Tajhiz Azma) according to the manufacturer's instructions. Primers were selected from the literature and assessed for GC content, specificity, secondary structures, and amplicon size. The primer sequences, synthesized by Metabion, are presented in Table 1. A StepOne Real-Time PCR System (Applied Biosystems) was used for SYBR Green qPCR reactions in 48-well optical reaction plates. cDNA (0.5 ng/reaction) was used as the template for qPCR reactions with 5 µL of SYBR Green PCR Master Mix (2×) (YT2551; Yekta Tajhiz Azma) and primers at 10 µM final concentration. The thermal program was as follows: 95°C for 30 s, followed by 40 cycles of 95°C for 5 s and 60°C for 30 s. The PCR reactions were performed in three technical replicates. The 2^(-ΔΔCt) method was used to evaluate relative gene expression against the reference gene.
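The 2^(-ΔΔCt) calculation can be sketched as follows. The function names and the Ct values are hypothetical, and the sketch assumes that technical replicates are averaged before the ΔCt values are computed.

```python
def mean(values):
    """Arithmetic mean of the technical replicates."""
    return sum(values) / len(values)


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene by the 2^(-ddCt) method.

    Each argument is a list of Ct values from technical replicates.
    """
    delta_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    ddct = delta_sample - delta_control
    return 2 ** (-ddct)


# Hypothetical Ct values (three technical replicates each), illustration only.
fold = relative_expression(
    ct_target_sample=[18.0, 18.1, 17.9],   # target gene, engineered strain
    ct_ref_sample=[15.0, 15.0, 15.0],      # reference gene, engineered strain
    ct_target_control=[22.0, 21.9, 22.1],  # target gene, control strain
    ct_ref_control=[15.0, 15.0, 15.0],     # reference gene, control strain
)
```

With these made-up numbers the target gene crosses the threshold four cycles earlier in the sample than in the control after normalization, which corresponds to a 16-fold higher relative expression.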

Results
Prediction of overexpression targets
The E. coli GEMM iJO1366 was employed to predict metabolic engineering targets that can improve antiEpEX-scFv production. FVSEOF predicted ten metabolic reactions whose overexpression would enhance antiEpEX-scFv productivity. The genes suggested by the FVSEOF analysis for overexpression were two genes related to glucose import into the cell (galP and glk), four genes of the pentose phosphate pathway (PPP) (zwf, rpe, pgl, gnd), two genes related to the folate biosynthesis pathway (focA and purU), and two genes of the alternative carbon metabolism subsystem (xylA, mak). These reactions are illustrated in the metabolic map in Fig. 1.
A previous transcriptome analysis showed that the glk gene was upregulated during recombinant protein production in E. coli (Oh and Liao 2000). Therefore, the glk gene was selected in this study as the target gene for overexpression.

Evaluation of glk transcription level
To validate the overexpression of glk, a real-time PCR experiment was performed. After 3 h of induction, relative quantification of the glk transcript revealed that glk was upregulated in E. coli BW25113/Duet-glk-scFv by 14.78-fold and 925.56-fold in comparison with E. coli BW25113/Duet-scFv and the parent strain, respectively.

The effect of glk overexpression on antiEpEX-scFv production
The glucokinase gene was amplified from the E. coli BW25113 (DE3) genome by PCR, yielding a 966 bp glk gene that encodes 321 amino acids with a molecular weight of about 35 kDa. The glk and antiEpEX-scFv coding sequences were inserted into the first and second multiple cloning sites (MCS) of pETDuet-1, respectively, to generate the plasmid pETDuet-glk-antiEpEX-scFv, as shown in Fig. 2. The constructed pETDuet-glk-antiEpEX-scFv plasmid was confirmed by restriction enzyme digestion (Fig. 3) and sequencing and transformed into E. coli BW25113 (DE3). Recombinant E. coli BW25113/Duet-glk-scFv and E. coli BW25113/Duet-scFv were cultured in M9 minimal medium containing a suitable antibiotic to an OD600 of 0.8 and induced with 0.8 mM IPTG for 24 h. SDS-PAGE analysis showed two separate protein bands with molecular weights of about 35 kDa (glk) and about 29 kDa (antiEpEX-scFv) (Fig. 4a). The expressed scFv protein was confirmed by western blot analysis using an antibody against the C-terminal histidine tag (6xHis-tag) (Fig. 4b). Recombinant antiEpEX-scFv protein was purified by affinity chromatography on a Ni-NTA matrix (Fig. 4c). BCA analysis was used to calculate the recombinant protein concentration. According to the concentration of the purified antiEpEX-scFv, the glk-overexpressing strain showed an increased antiEpEX-scFv titer (235.41 ± 9.53 µg/mL; 0.428 g/g DCW), approximately 2.135-fold that of the strain without glk overexpression (110.236 ± 7.68 µg/mL; 0.202 g/g DCW), after 24 h of post-induction cultivation (Fig. 4b). These results suggest that altering glucose metabolism by glk overexpression can improve antiEpEX-scFv production.
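The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only the numbers reported in the text; the variable names are illustrative, and the assumption that the 966 bp length includes the stop codon is ours.

```python
# Reported titers (µg/mL) and specific yields (g/g DCW) from the text.
titer_glk = 235.41    # strain overexpressing glk
titer_ctrl = 110.236  # strain without glk overexpression
yield_glk = 0.428
yield_ctrl = 0.202

# Fold changes between the two strains.
titer_fold = titer_glk / titer_ctrl
yield_fold = yield_glk / yield_ctrl

# Gene length to protein length, assuming the 966 bp include the stop codon.
gene_length_bp = 966
codons = gene_length_bp // 3
residues = codons - 1  # the stop codon does not encode an amino acid
```

The titer ratio comes out at about 2.14 and the specific-yield ratio at about 2.12, both in line with the roughly twofold improvement reported, and 966 bp corresponds to 322 codons, that is, 321 residues plus a stop codon.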

Growth and glucose consumption profiles
To examine how co-expression of glk with antiEpEX-scFv affects the bacterial growth rate and glucose consumption rate, the wild-type strain (BW25113) and the recombinant strains E. coli BW25113/Duet-glk-scFv and E. coli BW25113/Duet-scFv were cultured in M9 minimal medium containing 5 or 10 g/L glucose at 37°C. IPTG was added to a final concentration of 0.8 mM at OD600 = 0.8 to induce protein expression. To obtain the growth profile, OD at 600 nm was measured for 24 h at 1-hour intervals. Each measurement was performed in duplicate.
In M9 minimal medium containing 5 g/L glucose (Fig. 5a and 5b), all strains grew logarithmically as long as glucose was available and entered the stationary phase when glucose was depleted (Fig. 5a). As shown in Fig. 5a, the maximum specific growth rate of recombinant E. coli BW25113/Duet-scFv and BW25113/Duet-glk-scFv (µmax = 0.462 ± 0.034 and µmax = 0.552 ± 0.003, respectively) was lower than that of the parent strain (µmax = 0.637 ± 0.013). Since recombinant protein production and cell growth share some common precursors, the increase in protein expression at the expense of cell density may be due to a redirection of intracellular fluxes from biomass precursors towards protein synthesis in the recombinant strains.
However, recombinant E. coli BW25113/Duet-glk-scFv had a greater specific growth rate than E. coli BW25113/Duet-scFv (Table 2), especially when more glucose was available in the medium. As shown in Fig. 5c and presented in Table 2, µmax was much higher for E. coli BW25113/Duet-glk-scFv (0.81 ± 0.043) than for E. coli BW25113/Duet-scFv (0.592 ± 0.003) and the wild-type strain (0.729 ± 0.022). As expected, the maximum specific growth rate of all strains in M9 minimal medium containing 10 g/L glucose was higher than that in the medium supplemented with 5 g/L glucose. As shown in Fig. 5b, the parental strain and E. coli BW25113/Duet-glk-scFv required 10 hours to consume the glucose completely, while E. coli BW25113/Duet-scFv needed two more hours to consume all the glucose in the medium. An interesting result from Fig. 5d is that the time required for complete glucose consumption was similar for the parental strain and E. coli BW25113/Duet-glk-scFv (about 24 h), while E. coli BW25113/Duet-scFv did not consume all of the available glucose in the medium within 24 h. This confirmed that the overexpressed glk gene functioned well in E. coli BW25113/Duet-glk-scFv. In conclusion, overexpressing the glk gene considerably enhanced both the growth rate and the glucose consumption rate.

Discussion
The development of hosts that have desirable metabolic phenotypes and the ability to produce heterologous products is an important issue in microbial metabolic engineering. Using various algorithms, GEMM-based approaches have enabled scientists to identify gene deletion or overexpression targets for developing cell factories. For example, using MOMA simulations, L-valine biosynthesis was successfully improved in an engineered E. coli strain (Park et al. 2007). Also, amplification of the idi gene selected by FSEOF, together with the dxs gene, led to lycopene overproduction (Choi et al. 2010). However, predicting metabolic phenotypes after gene deletion is much simpler than after gene amplification. The metabolic flux corresponding to a deleted gene can be assumed to be zero, whereas, owing to the complex regulation of the metabolic network, the fluxes corresponding to amplified genes do not necessarily increase. Moreover, the size of the increase in metabolic flux resulting from gene amplification is difficult to predict. In this study, in order to increase the flux towards antiEpEX-scFv overproduction, the glk gene was selected for amplification from among the several targets predicted by FVSEOF.
According to our results, recombinant expression of scFv and glk decreased the maximum specific growth rate of the recombinant strains compared with the parent strain. A decrease in growth rate is commonly observed in bacteria transformed with multicopy plasmids to produce a recombinant protein. Plasmid DNA replication and the synthesis and translation of plasmid-encoded mRNA frequently place a metabolic burden on the engineered strains that usually results in growth retardation (Flores et al. 2004). This metabolic burden may be due to the inability of the cell to supply the extra energy and building blocks required for plasmid replication and the expression of foreign multicopy genes (Li and Rinas 2020). However, a significant increase in µmax was observed, from 0.592 ± 0.003 in BW25113/Duet-scFv to 0.81 ± 0.043 in BW25113/Duet-glk-scFv, when the expression of the glk gene was increased, which was comparable to the wild-type strain (0.729 ± 0.022). The glk gene encodes the enzyme glucokinase, which catalyzes the ATP-dependent phosphorylation of glucose imported by GalP. glk overexpression probably compensates for the special metabolic demands of the engineered strains by increasing the carbon flux into the PP pathway. The PP pathway, which is closely interconnected with glycolysis, normally provides some of the building blocks required for the biosynthesis of histidine, nucleotides and aromatic amino acids, e.g., erythrose-4-phosphate (E4P) and ribose-5-phosphate (R5P) (Stincone et al. 2015). NADPH, the reducing power for biosynthetic reactions, is also generated in its oxidative branch (Christodoulou et al. 2018). In a similar study, engineering of the pentose phosphate pathway reduced the metabolic load caused by recombinant protein production (Flores et al. 2004). Moreover, in the current study this approach had a significant positive effect on the productivity of the scFv-producing strains.
The glk-overexpressing strain produced an scFv titer approximately 2.135-fold that of the strain without glk overexpression. The metabolic engineering target predicted in our study was therefore validated by the observed improvement in scFv production.
The integration of computational methods and omics data at the systems level can further empower metabolic engineering. Using DNA microarrays, Oh et al. revealed that overproduction of recombinant non-toxic LuxA could lead to the downregulation of the ppc, fba, gnd, and atpA genes as well as the upregulation of heat shock and glk genes in E. coli strains including JM109, MC4100, and VJS676A. Based on the transcriptome profile obtained in that study, glucose kinase, rather than the phosphotransferase system (PTS), might have the major role in providing glucose-6-phosphate under protein-overproducing conditions in E. coli cells (Oh and Liao 2000). In addition, overexpression of recombinant proteins was shown to induce heat shock genes and a rapid stress response. Interestingly, glk has been reported to play an essential role in bacterial stress responses. Although this gene plays a minor role in glucose metabolism, under stress conditions such as heterologous protein expression or growth in acidic conditions this glycolytic enzyme is required to supply a sufficient level of glucose-6-phosphate (Arora and Pedersen 1995; Zhang et al. 2020). Thus, glk appears to be a suitable target for overexpression to increase recombinant protein productivity, which is consistent with our results.
Here, the GEMM-guided metabolic engineering strategy was used to improve scFv production in Escherichia coli BW25113 (DE3). The engineered strain with glk overexpression successfully increased scFv production. The titer of antiEpEX-scFv reached 235.41 ± 9.53 µg/mL (0.428 g/g DCW) in the engineered strain, compared with 110.236 ± 7.68 µg/mL (0.202 g/g DCW) in the parent strain. Our method for the production of scFv is a successful example that can be considered for improving the production of other recombinant proteins.

Figure 1
Overexpression targets illustrated in the metabolic map. The predicted targets for overexpression are indicated by their bold gene names. (PPP: pentose phosphate pathway, fru: fructose, glc: glucose, g6p: glucose-6-phosphate, f6p: fructose-6-phosphate, pgl: 6-phosphogluconolactone, Ru5P:

containing glucose as a carbon source. When OD600 reached 0.8, cells were induced for 24 h using 0.8 mM IPTG. The experiments were performed in duplicate. The protein bands corresponding to antiEpEX-scFv and glk are indicated by arrows.

Figure 5
Growth profiles and glucose consumption rates. When OD600 reached 0.8, cells were induced for 24 h using 0.8 mM IPTG. Growth and glucose consumption profiles of the wild type (BW25113) and the recombinant strains in M9 medium supplemented with 5 g/L glucose (A, B) and 10 g/L glucose (C, D).
Error bars represent the standard deviation of two experimental replicates. All graphs were drawn using GraphPad Prism 8 software. Data are presented as mean ± SD, n = 2.