Process Evaluation of Recombinant Chitin Deacetylase Expression in E. coli Rosetta pLysS Cells using Statistical Design of Experiments


Chitin is a natural polymer of N-acetylglucosamine units, extracted mainly from seafood waste. It remains underexplored because of its crystalline structure, and its commercial applicability could be improved by making it soluble. One route to decreasing this crystallinity is the conversion of chitin to chitosan via deacetylation. The industrial production of chitosan relies on chemical methods, which leave an environmental footprint. A greener alternative is to produce chitosan using chitin deacetylases (CDA). Enzymatically converted chitosan with known characteristics has a wide range of applications, notably in the biomedical field. In the present paper, we report heterologous expression of a CDA from a marine moneran, Bacillus aryabhattai B8W22. The process and nutritional conditions were optimized for submerged fermentation of E. coli Rosetta pLysS expressing the recombinant CDA using design of experiments tools. The use of a central composite design (CCD) resulted in a ~2.39-fold increase in the total activity of expressed CDA, with process conditions of induction temperature at 22 °C, agitation at 120 rpm, and 30 h of fermentation. The nutritional conditions required for the optimized expression were 0.061% glucose and 1% lactose in the media. These optimal growth conditions could enable cost-effective large-scale production of the lesser-explored moneran deacetylase, opening a greener route to biomedical-grade chitosan.

NdeI and XhoI, ligated using a DNA ligase enzyme. The successful cloning was confirmed by colony PCR with gene-specific primers, restriction digestion, and sequencing with the T7 universal primer.

Extracellular expression of BaCDA
The pET-22b BaCDA vector was transformed into E. coli Rosetta pLysS cells by the calcium chloride method to express the BaCDA gene. The initial expression was carried out in TB media (composition % (w/v): 1.2, tryptone; 2.4, yeast extract; 1X TB salt, and 0.6% glycerol) containing 0.05% (w/v) glucose and 0.2% (w/v) lactose at 16°C with an agitation rate of 180 rpm until growth reached the stationary phase (Studier 2005). The expression was auto-induced by including lactose and glucose in the media as inducer and repressor, respectively. To enhance the expression and enzyme activity using the statistical approach (CCD), three physical factors (induction temperature, agitation rate, and induction time) and two nutritional factors (glucose and lactose concentration) were chosen.

Optimization and experimental design
In this study, a CCD was constructed using the statistical optimization tool MINITAB 17.0 (Trial Version) to design the experiments and fit the second-order polynomial model. Five factors at five levels (-2, -1, 0, +1, +2) with six replicates at the center point, requiring 32 experimental runs, were used. The five factors, namely induction temperature, agitation rate, incubation time, glucose concentration, and lactose concentration, were selected for optimizing the expression of recombinant chitin deacetylase on the basis of our preliminary studies. Table 1 lists the factors and their levels in both coded and uncoded terms, whereas Table 2 gives the experimental design matrix, levels, and factors in coded and uncoded units.

Table 1 The factors and their levels in coded and uncoded terms used in experimental design to estimate the expression of recombinant chitin deacetylase in E. coli Rosetta pLysS cells.

Further, to understand the impact of factor interactions, a quadratic model was established to correlate the total activity and expression of recombinant chitin deacetylase in E. coli Rosetta pLysS cells with the factors (Equation 1):

$$Z = \phi_0 + \sum_{k} \phi_k y_k + \sum_{k} \phi_{kk} y_k^2 + \sum_{k<l} \phi_{kl} y_k y_l \tag{1}$$

where Z is the total activity of recombinant chitin deacetylase (predicted response); y_k and y_l are the independent factors; ϕ_0 is the intercept term; ϕ_k, the linear effect; ϕ_kk, the squared effect; and ϕ_kl, the interaction effect. Regression Equation 1 was fitted and analyzed using MINITAB 17.0.
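The run count described above (five factors, five coded levels, six center replicates, 32 runs) is consistent with a half-fraction central composite design. The following is a minimal sketch of how such a coded design matrix could be generated; the generator E = ABCD for the fractional block and the function name `half_fraction_ccd` are assumptions for illustration, since the source only states that MINITAB 17.0 produced the design.

```python
from itertools import product

# Factor labels follow the text: A temperature, B agitation, C time,
# D glucose, E lactose. All values below are in coded units.
FACTORS = ["A", "B", "C", "D", "E"]
ALPHA = 2.0      # axial distance, matching the -2/+2 levels in the text
N_CENTER = 6     # center-point replicates stated in the text


def half_fraction_ccd(alpha=ALPHA, n_center=N_CENTER):
    """Return the coded runs of a five-factor half-fraction CCD."""
    runs = []
    # 2^(5-1) factorial block; E = A*B*C*D is an ASSUMED generator.
    for a, b, c, d in product((-1.0, 1.0), repeat=4):
        runs.append((a, b, c, d, a * b * c * d))
    # Axial (star) points: one factor at +/-alpha, the others at 0.
    for i in range(len(FACTORS)):
        for sign in (-1.0, 1.0):
            point = [0.0] * len(FACTORS)
            point[i] = sign * alpha
            runs.append(tuple(point))
    # Replicated center points.
    runs.extend([(0.0,) * len(FACTORS)] * n_center)
    return runs


design = half_fraction_ccd()
levels = sorted({x for run in design for x in run})
print(len(design))  # 16 factorial + 10 axial + 6 center runs
print(levels)       # distinct coded levels used by the design
```

The run order here is not randomized, and the actual order and generator used in Table 2 may differ.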

Validation of model
The optimized process parameters obtained for the expression of CDA in E. coli Rosetta pLysS cells were validated using the response optimizer tool available in MINITAB 17.0 (Trial Version). The experiments were carried out in triplicate, and the experimental total activity was compared with the predicted total activity under optimized conditions.

Analytical Methods
Expression, biomass, and protein quantification
The expression level was analyzed by running 12% SDS-PAGE followed by quantification with ImageJ software. The biomass was determined at the end of each experimental run by weighing the cell pellet. To each gram of cell pellet, 5 mL of lysis buffer (50 mM Tris-HCl, 300 mM NaCl, 10 mM imidazole) was added, and the soluble protein was released by disrupting the cell pellet by sonication for 10 cycles with a pulse of 10 s on and 10 s off at 60% amplitude. The lysate was collected by centrifuging at 8000 rpm for 10 min at 4°C. The protein was quantified by Bradford's assay method. The BaCDA activity was also determined in the lysate using an acetate assay kit.

Enzyme activity assay
The chitin deacetylase activity was determined using an acetate assay kit as per the user manual. Briefly, ethylene glycol chitin (EGC; 1 mg/mL) was used as the substrate (Schomburg and Salzmann 1991). In a 100 µL reaction, 40 µL of the substrate and 20 µL of lysate were incubated in the presence of 40 µL of 50 mM Tris-HCl buffer (pH 7) for one hour at 30°C with mixing at 800 rpm. After incubation, the reaction was stopped by separating the enzyme using a 3 kDa spin column. 10 µL of the reaction mixture was used for the assay. One unit of the enzyme is defined as the activity that released 1 µmol of acetate from the substrate per microliter of enzyme per minute. The enzyme activity assay was carried out in triplicate, and the respective enzyme activity was calculated accordingly.

Fermentation kinetics of lactose induction
To investigate the point of lactose induction and the start of expression, activity profiling was performed. The expression was repeated at the optimized conditions. During the expression, glucose concentration (using a glucose estimation kit), biomass (by weighing the cell pellet), and enzyme activity (using the acetate assay kit) were determined. The kinetics of glucose concentration, biomass yield, and enzyme activity were evaluated.

Isolation, screening, and identification
The isolation of the microbes from the Arabian Sea sediment yielded fifteen bacterial colonies and one fungal colony. The bacterial colonies were subjected to four rounds of purification on colloidal chitin agar plates to obtain a single isolated colony.

Gene identification and cloning
A putative polysaccharide deacetylase gene from Bacillus megaterium with Gene ID: NZ_CP009920.1 was taken as a reference gene. The gene was probed in the Bacillus aryabhattai whole genome with Genome ID: NZ_JYOO01000001.1 using genome BLAST server. The result yielded a 100% query coverage with a 97% match. The sequence was annotated in the NCBI gene databank in the third-party section of the DDBJ/ENA/GenBank databases with accession number TPA: BK010747 (Pawaskar et al. 2021).
Using the designed primers, BaCDA (~765 bp) was amplified at optimized PCR conditions. The genetic code of BaCDA was affirmed by Sanger sequencing employing the T7 promoter and the terminator region of the vector. After confirmation, the vector construct was transformed into E. coli Rosetta pLysS cells for expression.

Extracellular expression of BaCDA
In TB basal media, glucose was depleted completely within 24 h of fermentation. This was followed by a second lag phase of 4 h; lactose consumption and auto-induction then began in the second log phase, which lasted an additional 20 h. The maximum biomass yield and total CDA activity were found to be 22.26 ± 0.98 g/L and 84.67 ± 0.56 U/L, respectively, in the stationary phase at 52 h of fermentation (Pawaskar et al. 2021).
Process optimization of expression of extracellular recombinant enzyme chitin deacetylase in E. coli using central composite design (CCD)
In the current study, five factors, namely induction temperature (A), agitation rate (B), induction time (C), glucose concentration (D), and lactose concentration (E), were considered (Table 1). Accordingly, A, B, C, D, and E were taken as exogenous (independent) factors, while the total activity of the expressed extracellular recombinant chitin deacetylase was chosen as the endogenous factor (response). The expression of extracellular recombinant chitin deacetylase in E. coli Rosetta pLysS cells is quantified and represented as "total activity (U/L)" of recombinant chitin deacetylase (Table 2). The biomass yield, expression of CDA, total protein content, and specific activity for all the experimental runs are shown in Table 3. The expression was estimated by SDS-PAGE and quantified using ImageJ software. The total activity at each condition was determined using the acetate assay and is presented along with the SDS-PAGE in Fig. 1.

Table 3 The results of biomass yield, expression of CDA, total protein content, and the specific activity of all the experimental runs given by central composite design.

The results of the analysis of variance (ANOVA) of the full model (Equation 2) are given in Table 4. The predicted values of the total activity were given by the software based on a modified regression model (Equation 3) and are presented in Table 5. In addition, the normality assumption is fulfilled, as the residuals are normally distributed, i.e., the data points lie close to the straight line in the normal probability plot (Fig. 2), indicating the capability of the model to optimize the expression of recombinant chitin deacetylase. Hence, the modified model (Equation 3) can be applied to find the optimal levels and their design space for the process.
The impact of interactions among the independent factors was visualized by the two-dimensional contour plots for the expression of recombinant chitin deacetylase (Fig. 3).

Validation of model
The response optimizer tool in MINITAB 17.0 (Trial Version) was used to solve the reduced regression model (Equation 3) and to find the optimal conditions for enhanced total activity of recombinant chitin deacetylase and its expression in E. coli Rosetta pLysS. The model (Fig. 3) was validated at the optimal process conditions, i.e., induction temperature, 22°C; agitation rate, 128 rpm; induction time, 30 h; glucose concentration, 0.058% (w/v); and lactose concentration, 1% (w/v). The optimal values (Table 6 and Fig. 4) of all five factors except factor E (lactose concentration) lay within the selected factor levels, and the predicted and experimental total activities of recombinant chitin deacetylase at these optimal conditions were 190.85 U/L and 202.39 ± 0.31 U/L, respectively (Table 6). The experiment was performed in triplicate, and the value is reported as the mean with standard deviation.
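The agreement between the predicted and experimental responses can be expressed as a relative deviation. The short sketch below uses only the two totals reported above; the helper name `percent_deviation` is illustrative and not part of the original analysis.

```python
# Values reported for the validation run at the optimal conditions.
predicted = 190.85      # U/L, response optimizer prediction
experimental = 202.39   # U/L, mean of the triplicate experiments


def percent_deviation(observed, expected):
    """Relative deviation of the observed value from the expected one, in %."""
    return 100.0 * (observed - expected) / expected


deviation = percent_deviation(experimental, predicted)
print(f"{deviation:.2f}%")  # experimental exceeds prediction by about 6%
```

Under this definition the experimental activity is roughly 6% above the model prediction.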
The optimal lactose concentration given by the response optimizer tool was 1%, which corresponds to the +2 level in the model. Therefore, the impact of higher lactose concentrations (1-2.5% (w/v)) on expression and total enzyme activity was studied. The total enzyme activity was 201.840 ± 1.92 U/L, 201.900 ± 1.95 U/L, 202.186 ± 1.59 U/L, and 202.173 ± 2.09 U/L with 1%, 1.5%, 2%, and 2.5% lactose, respectively. There was no significant difference in the total activity, and the SDS-PAGE analysis showed that the expression level was the same at all lactose concentrations (Fig. 5).

Fermentation kinetics of lactose induction
To investigate lactose induction, expression profiling was performed. The biomass, glucose concentration in the media, and enzyme activity were determined during the expression at the optimized conditions. The growth of E. coli showed a diauxic pattern due to the inclusion of glucose and lactose in the media. The first log phase was observed until 12 h, at which point the glucose was completely consumed. The expression of CDA and the production of the by-product allo-lactose were initiated in the second log phase after a 4 h lag, i.e., at the 16th hour. The maximum biomass yield and CDA activity of 17.53 ± 0.07 g/L and 202.39 ± 0.31 U/L, respectively, were found at 30 h of fermentation (Fig. 6).
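The improvement over the unoptimized TB run (84.67 U/L at 52 h, with glucose depleted by 24 h) can be checked with simple arithmetic. The sketch below recomputes the fold increase and the post-induction durations from the reported values; the volumetric productivity figures are a derived illustration and are not reported in the source.

```python
# Reported values: unoptimized TB run versus optimized run.
baseline_activity = 84.67     # U/L at 52 h (unoptimized)
optimized_activity = 202.39   # U/L at 30 h (optimized)
baseline_hours, optimized_hours = 52, 30

fold_increase = optimized_activity / baseline_activity
hours_saved = baseline_hours - optimized_hours

# Post-induction duration: total time minus the hour at which lactose-driven
# expression begins (16 h optimized; 24 h, the glucose depletion point,
# for the unoptimized run).
post_induction_optimized = optimized_hours - 16
post_induction_baseline = baseline_hours - 24

# Derived metric (NOT reported in the source): activity per hour of run time.
productivity_baseline = baseline_activity / baseline_hours
productivity_optimized = optimized_activity / optimized_hours

print(f"fold increase: {fold_increase:.2f}")
print(f"hours saved: {hours_saved}")
print(f"post-induction: {post_induction_optimized} h vs {post_induction_baseline} h")
print(f"productivity: {productivity_optimized:.2f} vs {productivity_baseline:.2f} U/L/h")
```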

Discussion
Chitin, the second most abundant polymer after cellulose, is extracted on an industrial scale from seafood waste (Yadav et al. 2019). The commercial applicability of chitin is restricted by its crystalline structure. This limitation is addressed by deacetylating chitin into chitosan, which increases its amorphous nature (Jayakumar et al. 2010). On a commercial scale, a chemical route is used; however, the reproducibility of the product is a major concern, in addition to environmental concerns. The greener route using enzymes is thus being explored. As the main source of chitin is the sea, the unexplored marine ecosystem holds a plethora of enzymes with unique physicochemical properties. This has led to increased research for novel chitin  (Table 3 and Fig. 1). A similar trend was observed with the agitation speed, which improves the oxygen transfer rate and thereby the growth of E. coli (Rosano and Ceccarelli 2014). The agitation speed also affects the growth of E. coli during expression. Therefore, a compromise was reached at 128 rpm, which was optimal for growth and soluble expression of BaCDA; a further increase had negative effects on the expression. A similar observation with the agitation rate has been reported by Shahzadi Table 2). The total activity varied from 1.27 to 199.72 U/L, indicating the influence of the factors and their levels on the expression of BaCDA in E. coli Rosetta pLysS cells (Table 2). A variation of only ± 9.71% was observed in the total activity between experimental and predicted values (Table 2), indicating the accuracy of experimentation.
The adequacy and fitness of the model were evaluated by analysis of variance (ANOVA) for the experimental design used (Table 5). The high value of R² suggests a higher significance of the model (Selvaraj et al. 2021). The small difference of 0.0431 between the adjusted R² (0.8785) and the R² value (0.9216) confirms the data accuracy. The model Equation 2 was highly significant, with an F-value of 21.37 as shown by Fisher's F-test, along with a very low probability value (P model > F = 0.000), which was significant at the 95% confidence level. The model F-value was calculated using the formula: The factors are said to be significant only if the probability value of the F-statistic is less than 0.05. For the proposed model (Equation 3), the terms D, E, A², B², C², D², E², AB, and AC were significant at 95% confidence (P < 0.05) (Table 5). The "Lack of Fit F-value" was 8.24, which indicates its insignificance and has a 14% chance to be significant (Table 5). The insignificant F-value of "Lack of Fit" represents the fitness of the experimental data to the model. Therefore, based on the above statistics, it can be concluded that glucose concentration (D) and lactose concentration (E) play a vital role in the expression of recombinant chitin deacetylase in E. coli. In this study, the physical environmental conditions, namely induction temperature (A), agitation rate (B), and induction time (C), did not show significance as main effects in the expression of recombinant chitin deacetylase (Table 5), whereas the interactions between A and B and between A and C were highly significant (Table 5). The results are in good agreement with the general expectation that a model F-value higher than the lack-of-fit F-value, together with high R² values (> 0.70), indicates that the model fits the data well.
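The reported R², adjusted R², and model F-value can be cross-checked against one another using the standard regression identities for n runs and p model terms (excluding the intercept). The sketch below is an illustration under assumptions: n = 32 is taken from the design, while p is not stated in the text and is inferred here by searching for the value that best reproduces the reported adjusted R².

```python
n_runs = 32                  # experimental runs in the CCD
r2 = 0.9216                  # reported R^2
r2_adj_reported = 0.8785     # reported adjusted R^2


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p terms (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def model_f(r2, n, p):
    """Overall model F-statistic expressed through R^2."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# ASSUMPTION: p is inferred, not taken from the source. A full quadratic
# model in five factors has at most 20 terms, so search 1..20.
p_inferred = min(
    range(1, 21),
    key=lambda p: abs(adjusted_r2(r2, n_runs, p) - r2_adj_reported),
)

print(f"R2 - adjusted R2 = {r2 - r2_adj_reported:.4f}")
print(f"inferred number of model terms: {p_inferred}")
print(f"adjusted R2 at that p: {adjusted_r2(r2, n_runs, p_inferred):.4f}")
print(f"model F at that p: {model_f(r2, n_runs, p_inferred):.2f}")
```

With the inferred term count, both the adjusted R² and the F-value computed from R² agree with the reported figures, which suggests the three statistics are mutually consistent; the actual number of terms retained in the fitted model should be confirmed from the ANOVA table.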
Further, to validate the polynomial regression model (Equation 3), experiments were carried out at the optimal conditions in triplicate as given in Table 6

The kinetics of the expression profiling was studied at the optimized process conditions. The glucose concentration, biomass yield, and total enzyme activity were evaluated. As glucose is the simplest carbon source, E. coli utilizes glucose first for its initial growth. Once the glucose in the media is exhausted, E. coli starts utilizing lactose. In this study, glucose (0.061%) was exhausted in 16 h of fermentation, leading to the onset of the diauxic shift. There was a second lag phase lasting 4 h during this sugar utilization transition. The expression of BaCDA increased strongly during this diauxic shift. The culture reached saturation at 30 h, recording maxima in the total enzyme activity (202.39 ± 0.31 U/L) and biomass (17.53 ± 0.07 g/L). The total activity after optimization had increased ~2.39-fold. The total fermentation duration was also reduced from the earlier recorded 52 h to 30 h, making the process more economical. Many groups have worked on process optimization with lactose as the inducer using CCD as the statistical tool (Table 7). The fold improvement after the statistical intervention varied from 2. with the study on Ficin (Sattari et al. 2020). Along the same lines, we could upregulate the production of BaCDA ~2.39-fold. In the present study, we relied on the diauxic growth of E. coli rather than adding the inducer at a set absorbance value at 600 nm. This auto-induction approach helps to reduce the chances of contamination associated with the sampling required for regular absorbance monitoring. Even though lactose was used as an inducer, most of the studies optimized the OD 600 for induction (Table 7). In this study, on the other hand, we chose to auto-induce by including lactose in the media along with glucose as a repressor.
Table 7 Literature on lactose-induced expression optimization using the CCD model.

Conclusion
The central composite design was useful for modelling the effects of bioprocess and nutritional parameters on CDA production by E. coli Rosetta pLysS cells in submerged fermentation. This also enabled us to identify the optimal conditions for CDA expression. The optimized conditions for the submerged fermentation included an induction temperature of 22°C and an agitation speed of 128 rpm, with incubation for 30 h without pH control. The media included 0.058% glucose and 1% lactose. The optimized parameters were validated by comparing the theoretical value with the experimental responses. At optimized conditions, the total enzyme activity yield was enhanced ~2.39-fold, and the fermentation duration was reduced from 52 h to 30 h.
The investigation of induction kinetics showed that the expression of BaCDA initiates after the 16th hour, on utilization of lactose as a carbon source by E. coli cells. Therefore, the post-induction duration was only 14 h, whereas in the unoptimized condition it was 28 h. The findings could be used in scaling up the expression of BaCDA with minimum culturing time to obtain a high yield, thereby making the fermentation process cost-effective. Therefore, the optimized process conditions could be scaled up for a higher yield of CDA, which can be utilized for pharmaceutical-grade chitosan production.

Fig. 2 The normal probability plot. The plot indicates the capability of the model to optimize the expression of recombinant chitin deacetylase.

Fig. 4 The optimization plot presenting optimal values for the increased total activity of recombinant enzyme chitin deacetylase and their expression in E. coli Rosetta pLysS.