Mathematical Model of Physiological and Biochemical Indexes to Plant Growth–Defense Tradeoff in Salvia Miltiorrhiza

Background Enzyme activities play a very important role in metabolism. Carbon (C) and nitrogen (N) are the two most basic elements for plant growth and development, and their mutual coupling makes the C:N ratio an important index for exploring plant element allocation and adaptation strategies. Although the activities of key enzymes in carbon and nitrogen metabolism and of defense enzymes are often used as indexes of the physiological and biochemical characteristics of plants, their relationship with biomass is still poorly understood. In this paper, under controlled conditions, the biomass and 18 physiological and biochemical indexes were obtained from 24 experimental groups of regenerated Salvia miltiorrhiza seedlings inoculated with 9 endophytic fungal strains. Results The data were analyzed by descriptive statistical analysis, Lasso variable screening and MLP neural network regression. The results show that many physiological and biochemical indexes are related to biomass, and that glutamine synthetase (GS), glutamate synthase (GLS), glutamate dehydrogenase (GDH), peroxidase (POD), catalase (CAT) and soluble protein are the key factors affecting the biomass synthesis of Salvia miltiorrhiza. Conclusion This paper discusses the relationship between physiological and biochemical indexes and biomass in a comprehensive and systematic way within the framework of "Build-Design-Calculate-Test". Through a rigorous logical reasoning process, the factors affecting the growth of Salvia miltiorrhiza are selected and a mathematical model is established. The approach also provides a powerful tool for the comprehensive and systematic study of plant growth and the synthesis of effective components.


Background
Carbon (C) and nitrogen (N) are the primary elements involved in the growth and development of plants [1]. Plants assimilate C and N at a certain ratio through C and N metabolism, and also consume part of the C and N in the defense process. Plants maintain a dynamic balance between growth and defense, and optimize the allocation of resources for survival under constant biotic and abiotic stresses [2].
In the process of carbon sequestration, plants initially produce 3-phosphoglyceric acid (PGA), which is later converted into various sugars. The contents of chlorophyll, soluble sugar and starch in leaves, as well as the activities of sucrose phosphate synthase (SPS) and sucrose synthase (SS), are important indexes for exploring plant carbon metabolism. Plant nitrogen metabolism is mainly a process of converting inorganic nitrogen compounds absorbed from the environment into amino acids, proteins and other organic nitrogen compounds. Generally, the content of soluble protein and the activities of glutamine synthetase (GS), glutamate dehydrogenase (GDH), nitrate reductase (NR), glutamate synthase (GOGAT), nitrite reductase (NiR) and glutamate synthase (GLS) are considered important for nitrogen metabolism. Likewise, the activities of superoxide dismutase (SOD), peroxidase (POD), catalase (CAT) and phenylalanine ammonia-lyase (PAL), together with the contents of proline (Pro) and malondialdehyde (MDA), are used to characterize plant defense responses. Therefore, the activities of key enzymes in C and N metabolism and of defense enzymes have become important indexes for exploring plant growth and ecological adaptation strategies [3].
Salvia miltiorrhiza (S. miltiorrhiza) is a perennial medicinal plant in the Labiatae family. Its root is widely used in the treatment of various diseases, especially coronary heart disease and cerebrovascular disease [4,5,6,7]. Analyzing the relationship between physiological and biochemical indexes and biomass accumulation is therefore of great significance for the cultivation and production of S. miltiorrhiza. Under controlled experimental conditions, 24 experimental groups were established by inoculating seedlings with 9 strains of endophytic fungi and their combinations. After 30 days of cultivation in culture bottles, data were collected on biomass, the main enzyme activities of carbon and nitrogen metabolism, defense enzyme activities and chlorophyll content. The relationship between biomass and the physiological and biochemical indexes was then analyzed with descriptive statistical methods such as analysis of variance and correlation analysis. Finally, on this basis, we used the Lasso variable screening method to identify the key factors affecting biomass synthesis, established a regression model between biomass and these factors with an MLP neural network, and tested the model by comparing predicted values with actual values.

Selection of physiological and biochemical indexes
The contents of chlorophyll, soluble sugar and starch in leaves, as well as the activities of sucrose phosphate synthase (SPS) and sucrose synthase (SS), are important indexes for exploring biomass and carbon metabolism [8]. Soluble sugar not only provides energy and metabolic intermediates for plant growth, but also regulates osmotic pressure and plays a role in the response to drought stress. In nitrogen metabolism, GS, GOGAT and GDH play different roles in NH4+ assimilation, the main pathway of amino acid and protein synthesis. GS and GOGAT promote the synthesis of amino acids. GDH inhibits the synthesis of amino acids, but the synergistic effect of GDH and GS can also promote it [9,10,11]. GS and GOGAT therefore have a positive effect on biomass synthesis, while GDH has a two-way effect. NR, NiR and GLS can improve the synthesis of amino acids through their interaction [11,12,13], so NR, NiR and GLS play a positive role in biomass synthesis. All of the above physiological and biochemical indexes have direct effects on biomass synthesis.
Many soluble proteins are important components of enzymes in plants and are involved in the regulation of various physiological and biochemical metabolic processes [14]. Soluble protein content is also an indicator of whether plants are under heavy metal stress [15].
Plant defense improves the ability of plants to adapt to environmental changes and survive. The antioxidant enzymes SOD, POD and CAT, together with Pro, significantly increase plant growth, biomass, chlorophyll content and gas exchange characteristics by increasing antioxidant activity. However, PAL and MDA reduce the activity of antioxidant enzymes and inhibit plant growth through oxidative stress [16,17,18,19]. SOD, POD, CAT and Pro thus have a positive indirect effect, while PAL and MDA have a negative one. Because these indicators are often used to explore plant growth and defense processes, a structure chart of the mechanism of biomass accumulation is given in Figure 2.

Strains tested
Nine endophytic fungi were isolated from the roots of S. miltiorrhiza and shown to be non-pathogenic.

Determination of physiological and biochemical indexes
The activities of SS, SPS, NR, GS and GOGAT were determined in enzyme solution prepared from young leaves of S. miltiorrhiza [20,21,22,23]. The contents of reducing sugar and soluble sugar were estimated by the 3,5-dinitrosalicylic acid method and the anthrone method, respectively [23,24,25]. The soluble protein content was assessed by the Coomassie brilliant blue method [26]. Chlorophyll content was determined by spectrophotometry [27]. The activities of SOD, POD, CAT and PAL and the contents of MDA and Pro in S. miltiorrhiza leaves were measured according to the literature [25,28,29].

Determination of net increased biomass
The S. miltiorrhiza seedlings were taken out and washed carefully to remove impurities attached to the roots. After the surface moisture was absorbed with absorbent paper, each seedling was weighed, and its weight at transplanting was subtracted to obtain the net biomass. Dry weight was obtained after each sample was baked at 105 °C for 12 hours and cooled.

Descriptive statistics of physiological and biochemical indexes of S. miltiorrhiza
This paper assumes that the experimental data of the physiological and biochemical indexes are normally distributed. Descriptive statistical methods, including box plots and correlation plots, are then applied to the experimental data. The results show whether any data fall outside the normal range and why, how strongly each physiological and biochemical index is linearly correlated with S. miltiorrhiza biomass, and what form the relationship between each index and biomass takes.
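The box-plot check for values beyond the normal range can be illustrated with a minimal Python sketch. It assumes the common 1.5 × IQR whisker rule and an interpolated quartile convention; the function name `boxplot_outliers` and the readings are hypothetical and are not taken from the experimental data.

```python
from statistics import quantiles

def boxplot_outliers(values, whisker=1.5):
    """Return the values lying beyond the box-plot whiskers (1.5 * IQR rule)."""
    # Quartile convention is an assumption; other software may use hinges instead.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical readings for one index; the last value is deliberately extreme.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
outliers = boxplot_outliers(readings)
print(outliers)
```

A value flagged in this way is only a candidate for inspection; whether it reflects measurement error or genuine plant-to-plant variation has to be decided from the experiment itself.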

Systematic analysis based on Lasso algorithm
Based on the above analysis, the linear regression function is defined as

$$y_i = \sum_{j} \beta_j x_i^{(j)} + \varepsilon_i,$$

where $x_i^{(j)}$ represents the jth physiological and biochemical index of the ith sample, $y_i$ represents the biomass of S. miltiorrhiza of the ith sample, and $\varepsilon_i$ represents the error term. Because the indexes $x^{(j)}$, j = 1, ..., 9, are multicollinear, the Lasso method is used to filter variables. The Lasso constructs a penalty function that compresses some regression coefficients, which yields a more refined model. Here, the $l_1$-penalty is used for the regularized parameter estimate [30], defined as

$$\hat{\beta} = \arg\min_{\beta} \left\{ \frac{1}{2N} \lVert Y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \right\},$$

where N represents the number of samples, $Y = (y_1, y_2, ..., y_9)^T$ represents the biomass, $X = (X_1, X_2, ..., X_9)^T$ represents the physiological and biochemical indexes, $\beta$ represents the regression coefficients, and $\lambda \geq 0$ represents the penalty parameter.
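How the $l_1$-penalty compresses some coefficients exactly to zero can be seen in a small Python sketch. It is an illustration only, not the R code used for the analysis: it assumes the objective is scaled as (1/(2N))·||Y − Xβ||² + λ·||β||₁, that the columns are already standardized and the response centered, and it solves the problem by cyclic coordinate descent. The names `soft_threshold` and `lasso_coordinate_descent` and the toy data are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2N)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b because b starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm_sq = sum(v * v for v in col) / n
            if norm_sq == 0.0:
                continue  # a constant-zero column carries no information
            # Covariance of column j with the residual that excludes b_j's own contribution.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam) / norm_sq
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_bj
    return beta

# Hypothetical toy data: three orthogonal predictors, response driven mainly by the first.
X_toy = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y_toy = [2.1, -1.9, 1.9, -2.1]  # 2.0 * column 0 + 0.1 * column 1
beta_hat = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
print(beta_hat)
```

In this toy case the weak second predictor and the irrelevant third predictor receive a coefficient of exactly zero, while the first is shrunk from 2.0 toward zero by λ. This is the variable-screening behavior the text relies on.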

Regression analysis based on MLP neural network
An MLP neural network is a multi-layer, fully connected neural network. Its basic structure includes an input layer, hidden layers and an output layer, and it represents nonlinear structure among variables by increasing the number of layers and nodes. Each layer is connected to the next by a linear weighted sum, and the value of the next layer is obtained by applying the activation function. The principle is shown in Figure 3, where $x_j$ represents the input of node j of the input layer, j = 1, ..., n; $w_{ij}$ represents the weight between node i of the hidden layer and node j of the input layer; $v_{jk}$ represents the weight between node j of the hidden layer and node k of the output layer, k = 1, ..., q; and y represents the output of node k of the output layer.
The connection mode of the kth layer is

$$f_i^k = \sum_{j} w_{ij}^k x_j^k + \theta_i^k, \qquad g_i^k = \varphi(f_i^k),$$

where $x_j^k$ represents the input value of the kth layer; $w_{ij}^k$ represents the weights; $\theta_i^k$ represents the bias; $g_i^k$ represents the output value; $\varphi(\cdot)$ represents the activation function, which is non-linear; and $f_i^k$ represents the input value of the kth hidden layer.
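The layer-by-layer computation described here, a weighted sum plus bias followed by an activation function, can be sketched in a few lines of Python. The logistic activation, the linear output node, the helper names and all numerical weights are assumptions chosen for illustration; the text only states that the activation is non-linear.

```python
from math import exp

def sigmoid(z):
    """Logistic activation (an assumed choice of non-linear function)."""
    return 1.0 / (1.0 + exp(-z))

def identity(z):
    """Linear output, as is usual for a regression target."""
    return z

def layer_forward(inputs, weights, biases, activation):
    """Compute f_i = sum_j w_ij * x_j + theta_i, then g_i = activation(f_i)."""
    pre_activation = [
        sum(w * x for w, x in zip(w_row, inputs)) + b
        for w_row, b in zip(weights, biases)
    ]
    return [activation(f) for f in pre_activation]

def mlp_forward(x, layers):
    """Feed x through each (weights, biases, activation) layer in turn."""
    for weights, biases, activation in layers:
        x = layer_forward(x, weights, biases, activation)
    return x

# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
hidden = ([[0.5, -0.25], [0.0, 0.0]], [0.0, 0.0], sigmoid)
output = ([[2.0, 4.0]], [-1.0], identity)
prediction = mlp_forward([1.0, 2.0], [hidden, output])
print(prediction)
```

Both hidden nodes receive a weighted sum of zero in this example, so each outputs 0.5, and the output node combines them linearly.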
Training a multi-layer perceptron is the process of continuously adjusting the weights of the weighted links until the network fits the input-output relationship of the training data well. The weight update formula is

$$w_{ij}^{k+1} = w_{ij}^{k} + \beta \left( y_i^k - \hat{y}_i^k \right) x_{ij}^k,$$

where $w_{ij}^k$ represents the weight of the jth input link after the kth cycle; $\beta$ represents the learning rate; and $x_{ij}^k$ represents the jth attribute value of training sample $x_i$. The new weight $w_{ij}^{k+1}$ is equal to the old weight $w_{ij}^k$ plus a term proportional to the prediction error $(y_i^k - \hat{y}_i^k)$. The hidden layers are handled in the same way as the input layer, so the details are not repeated here. The MLP neural network adjusts its weights according to the mean square error, defined as

$$MSE = \frac{1}{n} \sum_{l=1}^{n} (z_l - o_l)^2,$$

where $MSE$ represents the mean square error; $z_l$ represents the actual values of the biomass; $o_l$ represents the forecast values of the biomass; and $n$ represents the number of samples.
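The error-proportional update and the mean square error can be made concrete with a short Python sketch. It trains a single linear unit only, which is far simpler than a full MLP, and it is not the training algorithm of the neuralnet package, which is understood to default to resilient backpropagation. The learning rate, epoch count, function names and data are all hypothetical.

```python
def mse(actual, predicted):
    """Mean of squared differences between actual and forecast values."""
    return sum((z - o) ** 2 for z, o in zip(actual, predicted)) / len(actual)

def train_linear_unit(samples, targets, rate=0.1, epochs=200):
    """Repeatedly apply w_j <- w_j + rate * (y - y_hat) * x_j to one linear unit."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            y_hat = sum(w * xj for w, xj in zip(weights, x))
            error = y - y_hat  # prediction error drives the size of the update
            weights = [w + rate * error * xj for w, xj in zip(weights, x)]
    return weights

# Hypothetical noise-free data generated from y = 1 + 2 * x.
# The leading 1.0 in each sample plays the role of the bias input.
samples = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
targets = [1.0, 3.0, 5.0, 7.0]
weights = train_linear_unit(samples, targets)
fitted = [sum(w * xj for w, xj in zip(weights, x)) for x in samples]
print(weights, mse(targets, fitted))
```

Because the toy data are exactly linear, the weights approach 1 and 2 and the mean square error approaches zero; with real measurements the error would level off at a positive value.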

Data analysis
All data were analyzed and all charts were plotted in R 4.0.3 with RStudio. The lars package was used for Lasso variable screening, and the neuralnet package was used for MLP neural network regression.

Analysis of variation of various physiological and biochemical indexes
The dependent variable is S. miltiorrhiza biomass (y), and the independent variables are the physiological and biochemical indexes, such as chlorophyll ($x_1$). Because the units and magnitudes of the physiological and biochemical indexes differ greatly, the indexes are standardized by the function

$$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_j}{s_j},$$

where $x_{ij}$ is the value of the jth physiological and biochemical index of the ith plant of S. miltiorrhiza, $\bar{x}_j$ is the average of this index, and $s_j$ is its standard deviation. Some indexes, such as SS ($x_2$) and GOGAT ($x_6$), vary widely between individual plants (Figure 4), indicating that they lack consensus in the biomass synthesis of S. miltiorrhiza and that there is uncertainty among the indexes.
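The standardization step can be sketched in Python as follows. The sketch assumes that the standard deviation is the sample standard deviation (divisor n − 1) and that no index is constant across plants; the function name `standardize_columns` and the measurements are hypothetical.

```python
from statistics import mean, stdev

def standardize_columns(rows):
    """Z-score every column: (value - column mean) / column sample standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # assumes each column actually varies
    return [
        [(x - m) / s for x, m, s in zip(row, means, sds)]
        for row in rows
    ]

# Hypothetical measurements: 4 plants (rows) x 2 indexes (columns) on very different scales.
raw = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 400.0]]
scaled = standardize_columns(raw)
for row in scaled:
    print(row)
```

After this transformation every index has mean 0 and standard deviation 1, so indexes measured in different units contribute on a comparable scale to the later Lasso and MLP steps.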

Correlation analysis between physiological and biochemical indexes and biomass of S. miltiorrhiza
In Figure 5, a correlation coefficient whose absolute value is greater than 0.5 appears as a circle whose area is greater than that of a semicircle. On this basis, the biomass of S. miltiorrhiza (y) is positively correlated with chlorophyll ($x_1$) and CAT ($x_{15}$), while it is negatively correlated with GLS ($x_{10}$) and POD ($x_{13}$). The results show that the decomposition of chlorophyll promoted biomass synthesis, whereas GLS consumed soluble protein and POD consumed sucrose and soluble protein, thus inhibiting biomass synthesis. Although CAT also consumed sucrose and soluble protein, it is closely related to POD ($x_{13}$). In addition, correlations exist among the physiological and biochemical indexes themselves, for example those involving chlorophyll ($x_1$), NR ($x_8$) and CAT ($x_{15}$): positive correlations are found with SPS ($x_3$), NiR ($x_9$) and POD ($x_{13}$), negative correlations with NiR ($x_9$), GLS ($x_{10}$) and POD ($x_{13}$), and a positive correlation with CAT ($x_{15}$). These results indicate that there is multicollinearity among the physiological and biochemical indexes, so linear regression cannot be carried out directly.
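The screening of pairs whose correlation coefficient exceeds 0.5 in absolute value can be illustrated with a minimal Python sketch. It assumes the Pearson coefficient is the one plotted; the helper names and the toy columns are hypothetical and do not reproduce the measured data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def strong_pairs(columns, threshold=0.5):
    """Return (name_a, name_b, r) for every pair of columns with |r| > threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs

# Hypothetical toy columns, not the measurements from the experiment.
toy = {
    "y":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "x1": [2.1, 3.9, 6.2, 8.0, 9.8],    # rises with y
    "x2": [5.0, 4.1, 2.9, 2.2, 1.0],    # falls as y rises
    "x3": [1.0, -1.0, 0.0, -1.0, 1.0],  # essentially unrelated to y
}
flagged = strong_pairs(toy)
for a, b, r in flagged:
    print(a, b, round(r, 3))
```

Strong correlations between predictors, such as x1 and x2 in this toy case, are the signature of multicollinearity that motivates variable screening before regression.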

Relationship between physiological and biochemical indexes and biomass of S. miltiorrhiza
Figure 6 shows that the relationship between S. miltiorrhiza biomass and the various physiological and biochemical indexes is not a simple increase or decrease. There is a certain fluctuation: as each index increases, S. miltiorrhiza biomass both rises and falls. The results show that biomass cannot be expressed by a single enzyme activity or index, nor by all physiological and biochemical indexes together, but by a combination of some effective indexes. Therefore, a systematic analysis is necessary. In Figure 6, the horizontal axis is $x_i$, the ith physiological and biochemical index, and the vertical axis is y, the biomass of S. miltiorrhiza.

Key physiological and biochemical indexes affecting biomass synthesis of S. miltiorrhiza
We use the glmnet package in R to fit the Lasso model with the default parameters, and tune the model with the function cv.glmnet(), whose main purpose here is to determine the step at which the optimal solution is reached. The optimal λ value is obtained at the 5th step, and the corresponding non-zero variables are shown in Table 1. We conclude that GS, GDH and CAT are positively correlated with biomass, whereas GLS, POD and soluble protein are negatively correlated with biomass. The results show that GS and GDH at the seedling stage favor the synthesis of protein in cells, with amino acids as the main substrate, and coordinate carbon metabolism. This is in line with the fast nitrogen assimilation in the early stage of plant growth, which lays the foundation for high-speed carbon assimilation in the later growth stage.
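The choice of λ by cross-validation can be summarized by the usual K-fold criterion. The block below is a sketch of that criterion rather than a statement of the exact settings used in this analysis; the fold count $K$, the fold index sets $F_k$ and the coefficients $\hat{\beta}^{(-k)}(\lambda)$ fitted without fold $k$ are notation introduced here for illustration.

```latex
\mathrm{CV}(\lambda)
  = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|F_k|}
    \sum_{i \in F_k} \left( y_i - x_i^{\top} \hat{\beta}^{(-k)}(\lambda) \right)^2,
\qquad
\hat{\lambda} = \arg\min_{\lambda} \mathrm{CV}(\lambda)
```

With its default settings, cv.glmnet() is understood to use 10 folds and to report both the minimizing value (lambda.min) and a more conservative one-standard-error value (lambda.1se). Which of the two corresponds to the 5th step is not stated in the text.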
However, GLS, POD and soluble protein are negatively correlated with biomass formation. The results show that soluble proteins consume amino acids needed for biomass synthesis, and that carbon assimilation products are stored as starch in leaves, which is not conducive to leaf growth. GLS and POD reduce H2O2, convert carbohydrate into lignin and strengthen the physical defense barrier of the plant, but consume assimilates needed for plant growth. CAT specifically removes H2O2, reduces its toxicity and maintains the stability of the cell membrane, and the biomass of S. miltiorrhiza is positively correlated with CAT. At the same time, GLS, POD and CAT are negatively correlated (see Figure 4), indicating that these enzymes use H2O2 as a reaction substrate and therefore both compete and coordinate with one another.

Prediction and test of biomass
We then use GS, GLS, GDH, soluble protein, POD and CAT as the independent variables and biomass (y) as the dependent variable for regression analysis. The neuralnet package in R is used to fit the MLP neural network. We randomly select 70 percent of the data for training and 30 percent for verification. The verification results are shown in Figure 9.
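The random 70/30 division can be sketched in Python as follows. The sketch assumes the split is made over the 24 experimental groups and that a fixed seed is used for reproducibility; neither detail is stated in the text, and the function name `train_test_split` is hypothetical.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle row indices with a fixed seed and cut them into training and test parts."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_train = round(train_fraction * len(rows))
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test

# Hypothetical stand-in for the data: one label per experimental group.
groups = list(range(24))
train, test = train_test_split(groups)
print(len(train), len(test))
```

With 24 rows this yields 17 training rows and 7 test rows. A test set this small gives only a rough estimate of prediction error, so the agreement between predicted and actual values should be read with that in mind.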

Discussion
Based on the principles of metabolic engineering, the relationship between physiological and biochemical indexes and biomass is systematically discussed within the framework of "Build-Design-Calculate-Test". In terms of influencing factors, the number of physiological and biochemical indexes studied in this paper is increased from fewer than 10 to 18. In terms of research methods, this paper not only carries out box-plot analysis and correlation analysis on the physiological and biochemical indexes and biomass, but also uses Lasso regression to identify the key factors affecting the biomass synthesis of S. miltiorrhiza: GS, GLS, GDH, POD, CAT and soluble protein. An MLP neural network is used to establish the functional relationship between S. miltiorrhiza biomass and these key factors. The data are divided into a training set and a test set, and the validity of the established model is verified.
Interestingly, the relationships among physiological and biochemical indexes in the mechanism of biomass synthesis are consistent with the results of the mathematical model analysis. This indicates that biomass is mainly related to the nitrogen utilization and antioxidant system of S. miltiorrhiza. The results show that increasing the activities of GS, GDH and CAT and decreasing the activities of GLS and POD are beneficial for increasing crop yield. At the same time, these indexes can be used to monitor the recovery of S. miltiorrhiza. In addition, the results of this study can guide the artificial cultivation of S. miltiorrhiza.

Conclusion
The research framework used in this paper provides a rigorous logical reasoning process covering the selection of influencing factors, the selection of key factors, and the establishment and verification of the model. This method is applicable not only to plant metabolic engineering, but also to phenomena with similar mechanistic characteristics, such as the relationship between plant activity and the soil environment, and the self-organization of microbial communities. In addition, with the wider application of machine learning and deep learning algorithms, such as the Lasso regression and MLP neural network used in this paper, powerful tools become available for the comprehensive and systematic study of plant growth and the synthesis of active components, and models selected in combination with plant mechanisms can be explained in more detail and with greater accuracy.