Development of Process and Data Centric Inference System for Enhanced Production of L-Asparaginase from Halotolerant Bacillus licheniformis PPD37

The present study aims to bioengineer medium components using data-centric and process-centric approaches for enhanced production of L-asparaginase, an important biological molecule, by the halotolerant Bacillus licheniformis PPD37 strain. To achieve this, significant medium components were first screened, followed by optimisation of a combination of medium components and culture conditions: L-asparagine, MgSO4, NaCl, pH, and temperature. The optimisation study was carried out using response surface methodology (RSM), a process-centric approach, and an artificial neural network (ANN), a data-centric approach. Production improved from 2.86 U/mL to 17.089 U/mL, an approximately sixfold increase over the unoptimised L-asparaginase production. On comparing the RSM and ANN models for optimised L-asparaginase production based on the R² value, mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute deviation (MAD), the ANN model emerged as the superior one. As this is, to the best of the authors' knowledge, the first report on the development of an inference system using RSM and ANN models for enhanced L-asparaginase production by a halotolerant bacterium, this study could lead to more in-depth and large-scale L-asparaginase production.


Introduction
L-Asparaginase or L-asparagine amidohydrolase (EC 3.5.1.1) catalyses the deamidation of L-asparagine into L-aspartic acid and ammonia. This property of L-asparaginase has established its place as one of the most valuable biological macromolecules in the medical field [...] that ANN is able to enhance the accuracy of the non-linear input-output relationship. In contrast to other computational and physics-based numerical modelling, ANN requires training and calibration of the input variables. Thus, ANN infers patterns on the basis of a learning regime, using input and output data without assuming or recognising their nature and interrelations. Apart from that, there are many algorithms in ANN which offer flexibility for any additional limitations which may arise during training and calibration of the input [27]. Moreover, no specific design is required, as decision-making and drawing of conclusions are human-intuitive [28]. Polynomial properties and regression-based experimental modelling make ANN more suitable for designing a bioprocess model than other process-centric approaches [29]. ANN can predict all kinds of non-linear functions, including quadratic ones, while RSM is useful only for quadratic approximations [30].
Thus, the current study deals with the development of an inference system combining process-centric and data-centric approaches, i.e. RSM and ANN, for optimisation of L-asparaginase production: first by designing and performing RSM, and then by using the experimental data obtained from RSM to build an effective ANN model with high accuracy. In addition, this study also emphasises their sensitivity analysis and their usefulness in optimisation of the process through validation of the approaches. This work on model fabrication, prediction, and successful application of traditional and non-linear models for L-asparaginase production by a halotolerant bacterium is one of the first reports of its kind, to the best of the authors' knowledge.

Isolation and Screening of L-Asparaginase Producing Halotolerant Bacteria
Halotolerant Bacillus licheniformis PPD37, isolated from the salterns of Sultanpura, Dandi, situated on the coastline of the Arabian Sea, Gujarat, India, was selected for the present study. B. licheniformis PPD37 was cultured on HEM (halophilic enrichment medium) modified from Kumar et al. (2012) at 37 °C [31] and was found to have the highest L-asparaginase production out of the 51 total isolates (data not shown). Hence, it was used for subsequent optimisation of medium components for enhanced production of L-asparaginase. Primary screening for L-asparaginase was done using a rapid plate assay, and the enzyme activity was determined by the nesslerization method [32,33].

Screening of MM9 Medium Components by Plackett-Burman Design (PB Design)
Screening of significant medium components of MM9 medium (inducing medium for L-asparaginase production) was performed based on the Plackett-Burman (PB) design. A total of seven media components (L-asparagine, Na2HPO4, NaCl, MgSO4, KH2PO4, glucose, and CaCl2) were investigated. All the factors were examined at lower levels (−1) and higher levels (+1). The seven variables were screened in 13 experimental runs as shown in Table 1. One hundred millilitres of MM9 medium was prepared in a 250-mL Erlenmeyer flask and cultured at 37 °C, 120 rpm for 72 h. L-Asparaginase production (U/mL) was estimated every 24 h by determining the amount of ammonia liberated through the nesslerization method [33]. The experiments were performed in duplicate, and the average value of L-asparaginase activity was accepted as the response (Y). Media components with p value < 0.05 were considered significant. The statistical software Minitab v. 19.0 was utilised for experimental design and analysis.
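As an illustration of the two-level screening layout described above, the following is a minimal sketch of a standard 12-run Plackett-Burman base design truncated to seven factors. It is not the Minitab design used in the study: the function name `plackett_burman_12` is hypothetical, the generator row is the commonly tabulated one for 12 runs, and how the 13 runs of Table 1 map onto this base design is not assumed here.

```python
# Commonly tabulated generator row for the 12-run Plackett-Burman design
# (an assumption of this sketch; the study's Minitab design may be ordered differently).
GENERATOR_12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12(n_factors):
    """Return a 12-run, two-level design as rows of -1/+1 coded levels."""
    if not 1 <= n_factors <= 11:
        raise ValueError("a 12-run design screens between 1 and 11 factors")
    rows = []
    for shift in range(11):
        # Cyclic right shift of the generator row by `shift` positions.
        row = GENERATOR_12[-shift:] + GENERATOR_12[:-shift]
        rows.append(row[:n_factors])
    # The final run sets every factor to its lower level.
    rows.append([-1] * n_factors)
    return rows


# Seven columns, one per MM9 component (the column order is illustrative only).
design = plackett_burman_12(7)
```

Each column is balanced (six runs at +1 and six at −1) and every pair of columns is orthogonal, which is what allows the main effect of each component to be estimated independently from so few runs.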

L-Asparaginase Production with Screened Parameters by RSM Using CCD
RSM is a statistical method that optimises a process dependent on several variables by predicting the ideal conditions that provide maximum yield with a minimum number of experiments. Central composite design (CCD), used in the present study, is a type of RSM design that fits a second-order polynomial model. The five variables selected for this experiment included three screened MM9 medium components (L-asparagine, NaCl, and MgSO4), as determined by PBD, and two physical conditions, pH and temperature. These variables were chosen to analyse their combined effect on L-asparaginase production and how the interaction between chemical and physical variables impacts the production process [34][35][36]. All the input variables were examined at five levels, coded as −2, −1, 0, +1, and +2 for the lowest, low, medium, high, and highest levels, respectively. As shown in Table 2, a total of 52 experimental runs (comprising 32 cube points, 10 centre points in cube, and 10 axial points) were designed using Minitab v. 19. The experimental runs were carried out in duplicate in 250-mL Erlenmeyer flasks containing 100 mL of MM9 medium for 72 h, and the L-asparaginase activity (Y) was recorded every 24 h.
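The run count quoted above can be checked by enumerating the coded design points. The sketch below is only an illustration under stated assumptions: the function name `central_composite_design` is hypothetical, the axial distance of 2 is taken from the coded levels −2 and +2, and the run order differs from the randomised order Minitab would produce.

```python
from itertools import product


def central_composite_design(n_factors=5, alpha=2.0, n_center=10):
    """Enumerate coded CCD points: full-factorial cube, axial, and centre runs."""
    # Cube points: every combination of -1 and +1 (2**n_factors runs).
    cube = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]
    # Axial points: one factor at -alpha or +alpha, all others at the centre.
    axial = []
    for i in range(n_factors):
        for level in (-alpha, alpha):
            point = [0.0] * n_factors
            point[i] = level
            axial.append(point)
    # Replicated centre points, all factors at the coded level 0.
    center = [[0.0] * n_factors for _ in range(n_center)]
    return cube + axial + center


runs = central_composite_design()
print(len(runs))  # 32 cube + 10 axial + 10 centre points
```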
A second-order quadratic model was developed to predict the influence of all variables on L-asparaginase production. The response surface quadratic equation adopted for the five variables in the present case was:

$$Y = b_0 + \sum_{i=1}^{5} b_i X_i + \sum_{i=1}^{5} b_{ii} X_i^2 + \sum_{i=1}^{4} \sum_{j=i+1}^{5} b_{ij} X_i X_j \tag{1}$$

where $Y$ is the desired response in terms of L-asparaginase production; $b_0$ is the constant or intercept; $b_i$, $b_{ii}$, and $b_{ij}$ are the linear, quadratic, and interaction regression coefficients to be estimated; and $X_1$, $X_2$, $X_3$, $X_4$, and $X_5$ are the coded values of the variables which were significant for the system.
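To make the structure of a full second-order model in five coded variables concrete, the sketch below expands one design point into its model terms and evaluates a prediction. The function names and the ordering of terms are assumptions chosen for illustration; the coefficients themselves would come from the regression and are not reproduced here.

```python
from itertools import combinations


def quadratic_terms(x):
    """Expand coded variables into intercept, linear, square, and two-way interaction terms."""
    linear = list(x)
    squares = [v * v for v in x]
    interactions = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return [1.0] + linear + squares + interactions


def predict_response(coefficients, x):
    """Evaluate the second-order model for one point; coefficient order matches quadratic_terms."""
    terms = quadratic_terms(x)
    if len(coefficients) != len(terms):
        raise ValueError("expected %d coefficients" % len(terms))
    return sum(c * t for c, t in zip(coefficients, terms))


# For five variables: 1 intercept + 5 linear + 5 square + 10 interaction terms.
print(len(quadratic_terms([0.0] * 5)))
```

At the centre point, where every coded variable is 0, the prediction reduces to the intercept, which is a convenient sanity check on any fitted coefficient vector.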
Minitab v. 19.0 was used for calculation of second-order polynomial coefficients, analysis of variance (ANOVA), and regression analysis [37]. Fisher's F-test at the 95% confidence level was conducted to examine the significance of the model, and the goodness-of-fit of the second-order polynomial equation was determined by the R² value. Contour plots and 3D graphs were used to visualise the interaction between significant factors and their effect on the production of L-asparaginase.
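The goodness-of-fit statistics mentioned above can be sketched as follows, using the standard textbook definitions of R² and adjusted R². The function names are hypothetical, and taking n = 52 runs with p = 20 model terms (excluding the intercept) is an assumption about how the full quadratic model was fitted.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_runs, n_terms):
    """Penalise R² for the number of model terms (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)


# Assumed inputs: 52 CCD runs and the 20 non-intercept terms of a full quadratic model.
print(round(adjusted_r_squared(0.9404, 52, 20), 3))
```

Under these assumptions, an R² of 0.9404 corresponds to an adjusted R² of roughly 0.90, which is in line with the value reported for the CCD model.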

Artificial Neural Network (ANN)
Table 2 Experimental matrix for central composite design (CCD) using RSM and ANN for cultural parameters with actual and predicted L-asparaginase activity (U/mL)

ANN is a type of statistical data modelling tool that can mimic different attributes of biological information processing and could be helpful in the optimisation process. For the development of an ANN model, the selection of a suitable learning method plays a crucial role as it reduces the error function and helps in improving the model [38]. The optimum conditions for L-asparaginase production were predicted by creating a linear feed forward ANN model using MATLAB R2018a software [39][40][41]. The feed forward model, also known as a multilayer perceptron (MLP), has a gradient descent backpropagation (BP) learning algorithm and has been used extensively for biological applications [23,40,42]. The BP algorithm performs this task by minimising the error through weight changes that are proportional to the negative error gradient.
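The gradient descent backpropagation idea described for the feed forward model can be sketched in a few lines. This is not the MATLAB model of the study: it is a minimal pure-Python illustration that assumes one sigmoid hidden layer, a linear output neuron, a fixed learning rate, and a small synthetic dataset; all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_mlp(n_in, n_hidden, rng):
    """Small random weights for an n_in -> n_hidden -> 1 network."""
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return hidden activations and the linear output for one input vector."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    return h, sum(w * hj for w, hj in zip(net["W2"], h)) + net["b2"]


def train_step(net, x, target, lr):
    """One update: each weight moves in proportion to the negative error gradient."""
    h, y = forward(net, x)
    err = y - target  # derivative of 0.5 * (y - target)**2 with respect to y
    for j, hj in enumerate(h):
        # Error signal reaching hidden neuron j (computed before W2[j] is changed).
        delta_j = err * net["W2"][j] * hj * (1.0 - hj)
        net["W2"][j] -= lr * err * hj
        for i, xi in enumerate(x):
            net["W1"][j][i] -= lr * delta_j * xi
        net["b1"][j] -= lr * delta_j
    net["b2"] -= lr * err


def mean_squared_error(net, data):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in data) / len(data)


def demo(seed=0, n_hidden=4, epochs=200, lr=0.05):
    """Train on synthetic (made-up) data and return the MSE before and after training."""
    rng = random.Random(seed)
    data = []
    for _ in range(30):
        x = [rng.random() for _ in range(5)]
        data.append((x, 2.0 + x[0] - 0.5 * x[1] + x[2] * x[3]))
    net = init_mlp(5, n_hidden, rng)
    before = mean_squared_error(net, data)
    for _ in range(epochs):
        for x, t in data:
            train_step(net, x, t, lr)
    return before, mean_squared_error(net, data)
```

Calling `demo()` returns the mean squared error before and after training, so the effect of the weight updates can be inspected directly.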

Optimisation of Network Configuration
In an optimum, non-linear multilayer perceptron (MLP), two outer layers exist, one input and one output, with one or more hidden layers between them. Each layer has a precise number of artificial neurons, and their estimation is dependent on the input/output vectors [23]. Determining the number of neurons in the hidden layer (Nh) is a crucial task for the optimisation of the ANN model and requires a trial-and-error-based approach. Even though there is no particular formula for this purpose, the problem can be resolved by applying these rules of thumb: (a) the number of hidden neurons (Nh) must be between I and 2I + 1 and (b) it should not be less than the maximum of I/3 and O, where I is the number of inputs and O is the number of outputs [43]. Fewer neurons in the hidden layer are preferable as they generally generalise better and are less prone to overfitting [44]. Since, in the present study, I = 5 and O = 1, the number of hidden neurons was found to be ≥ 5/3 ≈ 1.67 and ≤ 5*4 = 20, implying that Nh lies in the range 1 ≤ Nh ≤ 20. The trial-and-error approach was employed to finally determine Nh based on the optimisation of neurons using MATLAB 2018a.
Since a hybridised approach was taken for optimisation, the experimental data obtained from RSM were used as inputs to construct the ANN model. Except for the simulated data, the experimental dataset used for RSM-CCD analysis was reused by the ANN model to reproduce the L-asparaginase activity (U/mL) prediction (Table 2). In order to train the ANN model, the dataset was divided into three subsets: (1) training, (2) validation, and (3) testing, having 70%, 15%, and 15% of weightage, respectively. A predictive ANN model comprising five parameters (L-asparagine, MgSO4, NaCl, pH, and temperature) as inputs and 20 neurons was constructed to obtain the output activity (U/mL), as illustrated by Fig. 1. A dataset with different variables has multiple measurement units and is often weighed down by measurement errors and noise, which may negatively influence ANN training. By using an activation function such as the logistic sigmoid, the raw dataset is standardised and converted into a non-dimensional form with a uniform range of variability [24].
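The 70/15/15 division and the conversion of inputs to a common non-dimensional range can be sketched as below. This is only an illustration: the random shuffling, the rounding of subset sizes, and the use of min-max scaling are assumptions of the sketch, not details confirmed for the MATLAB workflow, and the function names are hypothetical.

```python
import random


def split_indices(n_runs, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle run indices and divide them into training, validation, and testing subsets."""
    idx = list(range(n_runs))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n_runs)
    n_val = round(fractions[1] * n_runs)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # whatever remains goes to testing
    return train, val, test


def min_max_scale(column):
    """Rescale one input variable to the range [0, 1] (assumed normalisation)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


train, val, test = split_indices(52)
print(len(train), len(val), len(test))
```

With 52 runs, this split yields 36 training, 8 validation, and 8 testing runs.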

Determination of Prediction Potential of RSM and ANN
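Several error functions and the R² value were employed to ascertain the predictive ability of the RSM and ANN models. During ANN training, the weights underwent appropriate adjustments to minimise the error functions. The most extensively utilised functions applied in the study were mean squared error (MSE), root mean squared error (RMSE), mean absolute deviation (MAD), and mean absolute percentage error (MAPE). Microsoft Excel 2016 was used to calculate these errors by applying the following equations:

$$\text{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(A_t - F_t\right)^2 \tag{2}$$

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(A_t - F_t\right)^2} \tag{3}$$

$$\text{MAD} = \frac{1}{n}\sum_{t=1}^{n}\left|A_t - F_t\right| \tag{4}$$

$$\text{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right| \tag{5}$$

where $A_t$ refers to the actual L-asparaginase activity (U/mL), $F_t$ is the L-asparaginase activity predicted by RSM or ANN, and $n$ is the number of runs used for the experiment.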
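The four error functions named above can be computed as in the following sketch, which uses their standard definitions with actual activities A_t and predicted activities F_t. The study used Microsoft Excel for this step; the function names and the two-point example values are illustrative assumptions, not data from Table 2.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(a - f) for a, f in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, predicted)) / len(actual)


# Made-up values for illustration only.
actual = [10.0, 20.0]
predicted = [12.0, 18.0]
print(mse(actual, predicted), rmse(actual, predicted),
      mad(actual, predicted), mape(actual, predicted))
```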

Screening of Medium Components by Plackett-Burman Design (PBD)
The PB design comprised 13 experimental runs used to observe the effect of the seven MM9 medium components. The components of MM9 medium, i.e. Na2HPO4, KH2PO4, L-asparagine, NaCl, MgSO4, glucose, and CaCl2, were screened for production of L-asparaginase using Bacillus licheniformis PPD37. Table 1 presents the observed activity for every experimental run. The highest activities, ranging from 3.62 to 10.84 U/mL across the runs, were observed after 48 h of incubation, compared to 2.86 U/mL obtained during the unoptimised run. This indicates a clear impact of the medium components on L-asparaginase production. Vala et al. (2018) observed maximum L-asparaginase production from the fungus Aspergillus niger on the 6th day of incubation after optimising all the physico-chemical conditions. Thus, the shorter incubation time provides a very good opportunity for large-scale production of this industrially important enzyme. Three media components, L-asparagine, NaCl, and MgSO4, were deemed to be significant factors based on their effect, p value (< 0.05), and standard error (Table 3). By performing ANOVA with a 95% confidence limit, the p values of L-asparagine, NaCl, and MgSO4 were calculated to be < 0.0001, 0.023, and 0.003, respectively. These medium components were used for the subsequent bioengineering for enhancement of L-asparaginase production through CCD-RSM.

Fig. 1 Schematic representation of ANN model employed with input, hidden, and output layers

CCD-RSM Modelling
Optimisation of the three screened medium components (L-asparagine, NaCl, and MgSO4) and two physical conditions (pH and temperature) by RSM using the CCD-derived model predicted maximum L-asparaginase activity after 48 h of incubation. This is in accordance with the previous unoptimised run and the Plackett-Burman screening, where the highest activity was observed after 48 h and then plateaued or decreased after 72 h. According to Table 2, the predicted values of L-asparaginase production closely matched the experimentally determined values.
The highest L-asparaginase activity of 17.089 U/mL (Table 2) was recorded after 48 h. As revealed in Table 2, the maximum activity was obtained with experimental run 16, where the values for the five variables were as follows: L-asparagine, 2.0 g; MgSO4, 0.2 g; NaCl, 0.8 g; pH, 7.5; and temperature, 37 °C. The experimental and predicted values for activity are nearly identical, as shown in Table 2. To verify the efficiency of the optimal run conditions, the process was repeated, and the L-asparaginase activity was determined. After 48 h of incubation under the same conditions as experimental run 16, the L-asparaginase activity of the crude extract was recorded to be 17.061 U/mL. B. licheniformis has been used for optimisation of L-asparaginase production by Mahajan et al. (2012), who obtained an enzyme activity of 32.26 U/mL. Even though the L-asparaginase activity they observed was higher than in our study, the specific activity of their crude extract was reported to be 23.1 U/mg, while in our case it was 650 U/mg [45,46]. Alrumman et al. (2019) have also carried out L-asparaginase production using marine B. licheniformis, and the maximum enzyme production achieved was 8.1 U/mL for free cell culture and 11.66 U/mL for immobilised cells, with a highest specific activity of 36.08 U/mg [47] after purification.
Abdelrazek et al. (2019) obtained 7.95 IU/mL of L-asparaginase by RSM with a Box-Behnken design using B. licheniformis, which is lower than in this study [48]. Compared to previous studies, the specific activity demonstrated by this strain of B. licheniformis was considerably higher, which could prove advantageous for future applications. Venil and Lakshmanaperumalsamy (2008) have reported that addition of organic nitrogen sources to the medium could induce the production of L-asparaginase [49], whereas an inorganic nitrogen source reduces L-asparaginase production significantly. L-Asparagine is the sole source of nitrogen in MM9 medium, and it can work as an inducer for the production of L-asparaginase. Therefore, optimisation of the asparagine concentration may have significant effects on enzyme production. This is also supported by Sudhir et al. (2012) and Amena et al. (2010), who reported optimised production of L-asparaginase at 1% and 0.5% concentration, respectively [50,51]. A positive effect of pH on enzyme production was observed in the present study. Usually, enzyme activity increases at higher pH, but as this L-asparaginase was produced by marine bacteria, it may be possible that they are not susceptible at very high pH [52]. Therefore, the higher production of L-asparaginase in the culture at pH 7.5 suggested a positive impact of pH on L-asparaginase production after 48 h.
ANOVA and regression analysis of the CCD are presented in Tables 4 and 5. On the basis of significant p values (< 0.05), L-asparagine, NaCl, temperature, L-asparagine*L-asparagine, NaCl*NaCl, pH*pH, temperature*temperature, and L-asparagine*NaCl were considered to be significant. Since the model has a p value < 0.05 for the linear, square, and 2-way interaction terms, it was considered precise, significant, and reproducible. The actual R² value (0.9404) was in reasonable agreement with the adjusted R² (0.90) and predicted R² (0.80). The R² value of 0.9404 indicates that 94.04% of the variability of the response is explained by the model. After optimisation through CCD-RSM, L-asparaginase activity increased approximately 6 times, from 2.86 to 17.089 U/mL. The second-order response surface model equation fitted for L-asparaginase activity is as follows:

$$\begin{aligned}
\text{Activity} ={} & -157.9 + 10.38\,\text{L-asparagine} - 48.4\,\text{MgSO}_4 + 4.60\,\text{NaCl} + 26.08\,\text{pH} \;[\ldots]\; \text{L-asparagine} \times \text{temperature} \\
& - 6.73\,\text{MgSO}_4 \times \text{NaCl} + 5.61\,\text{MgSO}_4 \times \text{pH} - 0.420\,\text{MgSO}_4 \times \text{temperature} \\
& + 0.117\,\text{NaCl} \times \text{pH} - 0.0146\,\text{NaCl} \times \text{temperature} + 0.0220\,\text{pH} \times \text{temperature}
\end{aligned} \tag{6}$$

Figure 2 represents the response surface of interactions between the five variables visualised through 3D graphs and contour plots. The graphs show the relationship between two variables and their effect on L-asparaginase activity after 48 h of incubation. In the 3D graphs, the x- and y-axes represent the variables, while the z-axis corresponds to the response, i.e. activity. Contour plots are the geometric illustrations of 3D graphs on a two-dimensional scale, and the highest activity is depicted by the smallest curve or circle of the plot.

ANN Modelling Prediction
Input neuron network topology was utilised to construct an ANN architecture for the optimisation of L-asparaginase production. Out of the numerous ANN topologies trained, the one with the optimal number of neurons in the hidden layer was selected based on the root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute deviation (MAD). The predictive ANN model constructed was trained by the BP algorithm, since better transformation behaviour can be achieved through it. The BP algorithm uses second-order derivatives of the RMSE between the desired output and the actual output. For this study, the optimum ANN model topology (Fig. 1) has five input variables, one hidden layer with 20 neurons, and one output layer (5-9-1). As illustrated in Table 2, the predicted value of L-asparaginase activity for the ANN model is similar to that of RSM. However, the higher value of R² (0.9846) along with the lower value of RMSE (0.322) establishes ANN as the preferable model for predicting L-asparaginase production conditions. Such results can be explained by the ability of ANN to provide global optimisation, as training of the neurons was repeated several times for different physicochemical variables. Repeated training could be helpful in providing global optimisation of the variables [23]. However, one disadvantage of ANN is the requirement of high-quality data for training. In this context, the experimental dataset has to be normalised to avoid measurement errors and noise created by variables with different measurement units. These factors may exert a negative impact on the operation of ANN training algorithms.

Optimisation of the Number of Hidden Neurons
Accurate prediction of the optimal conditions is dependent on the number of neurons present in the hidden layer. If the network topology is minimal, i.e. having few neurons, the network is unable to train properly and learn. Thus, optimisation of the neurons was employed in order to determine the ideal number of neurons giving the best predictivity and accuracy. The optimum number of neurons was selected on the basis of model performance, including the values of R², RMSE, MAPE, and MAD. Figure 1 illustrates the optimal structure of the feed forward neural network model. As depicted in Fig. 3, the 11 neurons demonstrated the best prediction capability and high accuracy for L-asparaginase production. The results obtained give a high value of R² (0.9846) together with reduced RMSE (0.322), MAPE (1.519), and MAD (1.558) values, thereby suggesting the significance of the ANN model. The obtained values further strengthen the capability of the model for predicting the optimal number of neurons that yields maximum L-asparaginase production.

Training, Validation, and Testing of the Model
For the development of the model, the input data were compiled into three different subsets: training (70%), validation (15%), and testing (15%). Here, Fig. 4a, b, and c illustrate the ANN model having improved R² values in all three subsets: training (0.9824), validation (0.9953), and testing (0.9876). Figure 4d shows that the overall model was a better fit to the linear equation, with an R² value of 0.9846, higher than that of RSM (0.9404). The ANN model thus developed was able to accurately simulate L-asparaginase production (target) and reproduce the experimental output with superior precision. A multilayered feed forward ANN trained with the BP algorithm and showing substantial R² values can be utilised to obtain L-asparaginase production (target) with accurate results.

External Validation of RSM and ANN Models
A prediction model is assessed based on its competency to predict the outcome (response) with high precision when new observations with the same distribution as that of the training data are used. The performance of the constructed ANN prediction model was evaluated on the basis of explained variation (R²) and on the probability of overfitting (goodness-of-fit test). Overfitting may be avoided if the number of neurons is in the range of 8 to 12 [53]. Actual and predicted outputs of both RSM and ANN were considered for carrying out the external validation of the developed RSM and ANN models [54]. The results (Fig. 5) showed that the ANN model displayed less overfitting in comparison with the RSM model, thereby implying that the ANN model is statistically more reliable and suitable.

Prediction Potential
Values of R², RMSE, MAPE, and MAD for the RSM and ANN models were calculated and compared to determine the model with superior predictive capability. Table 3 represents the experimental and predicted values achieved by the RSM and ANN models. As seen in [...] respectively. The ANN model established itself as superior to RSM, in every regard, for the production of L-asparaginase. While RSM utilises the quadratic equation to generalise data, the superior prediction potential exhibited by ANN can be attributed to its utilisation of non-linear polynomial functions. Superiority of the ANN model over RSM as a modelling tool for bioprocess optimisation has been recorded several times; however, only a limited number of reports have been found on the application of ANN for L-asparaginase production up till now.

Sensitivity Analysis
The study of the effect of medium components on L-asparaginase production is very important, especially the interaction between medium components. This can be studied more meticulously using ANN than RSM. In RSM, the contribution of the various medium components can be measured by coefficients. Although interpretation of an ANN model can be difficult due to its 'black box' nature, the 'Perturb' method can give insights into the effectiveness of an individual parameter or interactions among parameters [55,56]. Therefore, the performance of the medium components was investigated by the optimal ANN model with 20 neurons in the hidden layer.
In RSM, out of the five variables, L-asparagine is the most influential factor for L-asparaginase production. L-asparagine*MgSO4, MgSO4*pH, and NaCl*pH had significant interactions, as indicated by their coefficient values (0.190, 0.105, and 0.0829), suggesting their influential role in L-asparaginase production. On the other hand, ANN can be useful for obtaining insights into the interactions between medium components due to its inherent nature. Figure 6 indicates the rate of response and the effect of this change on the output. The influence of a variable can be measured based on the slope and range of the change in response. As the value of the slope increases, the influence of that individual variable also increases. It can be observed from Fig. 6 that the slopes of asparagine (1.562) and temperature (0.716) suggest a greater influence on L-asparaginase production compared to the other parameters, viz. pH (−0.215), NaCl (−0.370), and MgSO4 (−0.081). Interestingly, these results are also supported by the RSM quadratic equation. Thus, it can be concluded that ANN is equally efficient in sensitivity analysis.
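The slope-and-range idea behind a perturbation-style sensitivity analysis can be sketched as follows. This is an assumed, simplified reading of the 'Perturb' method: one input is varied over a set of levels while the others are held at a baseline, and the response slope is estimated with a least-squares line. The function names are hypothetical, and the toy model stands in for a trained ANN.

```python
def fitted_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def perturb_sensitivity(predict, baseline, index, levels):
    """Vary one input over `levels`, keep the rest at `baseline`, return (slope, range)."""
    responses = []
    for level in levels:
        x = list(baseline)
        x[index] = level
        responses.append(predict(x))
    return fitted_slope(levels, responses), max(responses) - min(responses)


def toy_model(x):
    """Made-up stand-in for a trained model: the response rises with x[0], falls with x[1]."""
    return 1.0 + 2.0 * x[0] - 3.0 * x[1]


coded_levels = [-2.0, -1.0, 0.0, 1.0, 2.0]
for i in range(2):
    print(i, perturb_sensitivity(toy_model, [0.0, 0.0], i, coded_levels))
```

A larger absolute slope or a wider response range for a given input indicates a stronger influence of that input on the predicted activity, and the sign of the slope gives the direction of the effect.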
In this case, the low generalisation capability and erratic prediction of ANN can be improved through intense calibration to generate accurate predictions. Since the ANN model is difficult to interpret due to its 'black box' nature, one needs to analyse the noise generated through the weights to gain a full understanding of the model. On the other hand, RSM could be useful for reducing the number of experimental runs or trials when compared to other traditional approaches such as one factor at a time. Even though RSM is the most commonly used method for optimisation of bioprocesses, its use is limited to quadratic non-linear correlations only. Therefore, a hybrid model that combines both RSM and ANN could help in achieving successful optimisation.

Conclusion
In the present study, optimisation of physico-chemical conditions for enhanced L-asparaginase production by halotolerant Bacillus licheniformis PPD37 was carried out using two inference systems, RSM and ANN. RSM proved useful in determining the optimal conditions for L-asparaginase production. The highest activity of 17.089 U/mL was observed at the optimal conditions of L-asparagine (2.0 g), MgSO4 (0.2 g), NaCl (0.8 g), pH (7.5), and temperature (37 °C). A substantial rise of about 6 times over the unoptimised production was obtained by following the RSM model. Even though the predicted values for L-asparaginase activity given by RSM and ANN are within a similar range (16.875 for RSM, 16.946 for ANN), ANN has proven to be better at prediction than RSM in this regard. ANN has an overall R² value of 0.9846, which is higher than the 0.9404 of RSM, as well as lower values of the other error functions. ANN has also proven to be equally useful for sensitivity analysis and has consistently performed better in every aspect. Therefore, ANN could potentially be a suitable alternative to RSM for the optimisation process. Thus, the study may lead to more thorough exploration and research on the previously limited subject of L-asparaginase production from halotolerant bacteria and to improvement in large-scale production using such models. The study could also be helpful in designing experiments for enhanced production of other important biological molecules, which could be useful to the scientific community around the globe.