Prediction of Needle Chlorophyll Model with Different Leaf Ages Based on BP Neural Network and PLSR

Background: To explore how chlorophyll content varies among needles of different leaf ages in Picea koraiensis Nakai of different specifications, this study compared the accuracy of two modeling methods, the BP neural network and partial least squares regression (PLSR), in predicting needle chlorophyll. The effects of different spectral pre-processing and characteristic band selection methods on model accuracy were tested, and the optimal combination model was selected to predict forest growth status and community structure productivity from the physiological and biochemical characteristics of needles at different leaf ages. Results: 1) spectral pre-processing could avoid systematic errors and eliminate background values; 2) the needle chlorophyll models fitted separately for each leaf age were much more accurate than the mixed-needle chlorophyll model, verifying that needle chlorophyll at different leaf ages could better estimate the annual growth and reflect the growth status of Picea koraiensis Nakai; 3) the BP neural network model was significantly more accurate than the PLSR model, with its R² above 0.95 and the validation set's R² above 0.86; and 4) across the spectral pre-processing, variable selection, PLSR and BP neural network models, fitting accuracy by leaf age followed the order triennial needles > annual and biennial needles. Conclusions: The BP neural network method was more accurate than the PLSR method in predicting pigment content. Fitting pigment models to finely classified needles improved model accuracy, which provides a basis and theoretical support for future models that combine remote sensing technology with stoichiometry methods.


Background
The eighth national forest resource inventory showed that China's plantation area of 69.33 million hm² was the largest in the world, while its afforestation area and accumulation of coniferous pure forest accounted for 73.07% and 74.58% of the world's total plantation area and total accumulation, respectively [1]. Picea koraiensis Nakai is the main conifer species for afforestation and timber in Northeast China, and the cultivation and management of Picea koraiensis Nakai plantations have been included in the National "Thirteenth Five-Year Plan" Key R&D Program [2]. Coniferous stands often suffer from an unreasonable allocation of spatial and non-spatial structures, which forces trees to compete for spatial niches and results in unhealthy stands and unstable structural attributes. Leaf pigment content, one of the indicators for evaluating plant and forest health, has been identified as a trait associated with ecosystem structure and its relationship to biodiversity [3]. Pigment content data reflect the status of plant growth, such as nutrient stress, photosynthetic capacity and the senescence process, and therefore help in understanding the biogeochemistry and nutrient cycles of ecosystems [4][5]. At present, the traditional methods for determining plant physiological and biochemical indicators are extremely time consuming, labor-intensive and even environmentally destructive. An important goal of forestry research is to quickly monitor growth status, content of inclusions, health evaluation and other information related to large-area stand structure, and to provide a scientific basis and theoretical support for decision-making in forestry production management.
With the evolution of sensor technology, hyperspectral remote sensing can easily obtain the reflectance information of different targets, and its fine spectral resolution can fully capture subtle changes and differences in vegetation spectra [6]. Wen et al. [7] confirmed that a BP neural network model based on band depth analysis can improve the accuracy of spectral estimates of pigment content in rice leaves. Yuling et al. [8] showed that the partial least squares regression (PLSR) technique was superior to the stepwise multiple linear regression (SMLR) technique in the accuracy of models predicting soil heavy metals. In addition, variable selection using the squared multiple correlation (SMC) method can improve PLSR model accuracy [9]. Garhwal et al. [10] used the Variable Importance in Projection (VIP) method for band optimization to reveal the sensitive bands of diseased potatoes and improve recognition accuracy.
Rapid non-destructive measurement with hyperspectral technology is favored by many scholars, but there are few reports on the application of spectral technology to predict the pigment content of needles at different leaf ages of high-throughput conifers in China, combined with different spectral pre-processing and characteristic band selection to improve the accuracy of the model. The use of artificial neural network models for parameter inversion is also at an early stage of research in China [11]. Therefore, in this study, based on the measured needle chlorophyll content and needle spectral reflectance of annual, biennial and triennial needles of Picea koraiensis Nakai, the construction of different specifications of needle chlorophyll models of different leaf ages of Picea koraiensis Nakai involves the use of

Chlorophyll Content Statistics and Spectral Preprocessing Analysis
The statistical values of chlorophyll content are shown in Table 1. For Picea koraiensis Nakai of the same ground diameter, chlorophyll content by leaf age followed the order triennial needles > biennial needles > annual needles. At the same leaf age, the chlorophyll content of Picea koraiensis Nakai with a ground diameter of 25-30 cm was greater, and its dispersion smaller, than that of Picea koraiensis Nakai with a ground diameter of 20-25 cm.
When the Specim hyperspectral camera is used to collect needle reflectance, ambient noise, light scattering and diffuse reflection affect the spectral data. Pre-processing the spectral data can therefore avoid the errors caused by noise and baseline translation [12]. Although the spectral reflectance of Picea koraiensis Nakai needles of different sizes did not change significantly at the same leaf age, the spectral reflectance of needles of different leaf ages differed significantly at the same size (p<0.05) (Fig. 2). In the SG-Raw and MSC spectra, the reflectance of annual needles in the 513-616 nm and 720-988 nm bands was higher than that of biennial, triennial and mixed Picea koraiensis Nakai needles. In addition, extreme points formed near 711 nm and 690 nm in the SG-FD and SG-SD spectra, respectively, owing to the strong absorption of chlorophyll in different bands. These findings are consistent with results in the existing literature [13].

Response of Different Leaf Age Needle Pigment Models to Different Spectral Pretreatments
Five spectral pre-processing methods were applied to the PLSR model of needle chlorophyll at different leaf ages (Fig. 3). The fitting accuracy of the model for Picea koraiensis Nakai with a ground diameter of 25-30 cm was generally higher than that for Picea koraiensis Nakai with a ground diameter of 20-25 cm, and the fitting accuracy for triennial needles was higher than that for annual and biennial needles. This may be related to the fact that the greater the leaf age and the stronger the conversion ability, the higher the pigment content [14]. However, the fitting accuracy of the mixed Picea koraiensis Nakai needle model was much lower than that of the models for needles of specific specifications and leaf ages (Fig. 3). Therefore, constructing pigment content models for finely classified needles can improve the robustness of the model.

Influence of Feature Band Extraction on Model Performance
To obtain a more robust yet simplified PLSR model with stronger predictive ability, the SMC and VIP variable selection methods were used to extract the characteristic bands and eliminate useless or nonlinear variables. After variable selection treatment, some models showed an increasing accuracy trend (Fig. 4), and the number of model components decreased, which was consistent with the results of Yanjie L [15] . Compared with the optimal models of annual, biennial, triennial needles and mixed needles that had been pretreated with SMC, and VIP treatments (Fig. 5), the models obtained after VIP treatment of annual and triennial needles of

Construction of BP Neural Network Model
The BP neural network is trained by forward signal transmission and back error propagation mode, and the optimal weight value and threshold value are calculated to fit the model. In

Discussion
Compared with conventional chemical methods, spectroscopy is faster, uses fewer chemical reagents, is easier to use and is non-destructive. Traditional spectral analysis generally uses full-spectrum modeling [16]. However, the full spectrum contains a good deal of useless information and interfering variables, which not only increase the model's complexity but also reduce its prediction performance. After spectral pre-processing in this test, the model accuracy of SG-FD, SG-SD, SNV and MSC was greater than that of SG-Raw. This is generally consistent with the results of Jie L et al. [17], who combined SNV and FD to eliminate interference from factors such as background noise and baseline drift and to highlight the effective information of the spectrum, thereby improving the model's discrimination accuracy. Building on spectral pre-processing, SMC and VIP variable selection greatly improved the prediction accuracy of the Picea koraiensis Nakai pigment model and reduced the RMSE value, which is consistent with the results of the Xiangzhong [18] trial. Although some scholars have applied neural networks in spectral analysis modeling and discriminant analysis to improve the prediction accuracy and fine classification of plant physiological and biochemical index models [19], [20], [21], there are few comparative studies on the accuracy of BP neural network models and PLSR models. This experiment verified that the BP neural network model is more accurate than the PLSR model, which is generally consistent with the results of Jingyang et al. [22], who found that the nonlinear mapping of the BP neural network model treats variables better than the linear relationship of the multiple linear regression model.
The needle pigment content models fitted for specific specifications and leaf ages fit much better than the mixed Picea koraiensis Nakai model, which may be related to the dispersion of pigment content and indicates that fitting models to finely classified needles helps improve model robustness. Whether for the spectral pre-processing model, the variable selection model, the PLSR model or the BP neural network model, the fitting accuracy of the chlorophyll model of triennial needles was always higher than that of annual and biennial needles. Yingzi et al. [23] showed that, within a certain range, the greater the leaf age, the higher the pigment content. Analysis of chlorophyll content at different leaf ages in Picea koraiensis Nakai revealed that the dispersion of chlorophyll content decreased with increasing leaf age (p<0.05), which may directly affect the fitting accuracy of the model. Therefore, the high fitting accuracy of the triennial needle model may be closely related to the level and dispersion of pigment content. In summary, fitting models to finely classified needles helps improve the robustness of the model.

Conclusion
Both the PLSR and BP neural network models can rapidly predict the chlorophyll content of Picea koraiensis Nakai needles, but the BP neural network model has higher accuracy and a lower RMSE.
(1) Spectral pre-processing can avoid systematic errors and eliminate background values, but it has little effect on improving model accuracy. After the PLSR model was processed by SMC and VIP variable selection, the model accuracy was greatly improved and the RMSE value was reduced.
(2) Compared with the PLSR model, the BP neural network model had higher accuracy, with R² above 0.95 and the validation set R² above 0.86.

Determination of Conifer Pigment Content
We took 0.3 g of Picea koraiensis Nakai needles, cut them into pieces, placed them in a mortar with a small amount of quartz sand and 95% ethanol, and ground them into a homogenate. The homogenate was transferred to a test tube and brought to a constant volume of 10 ml with 95% ethanol. The tubes were then sealed and left to soak in the dark for 24 h. After filtration, the absorbance of the chlorophyll extract was measured at 649 nm and 665 nm with an Agilent Cary60 UV-Visible spectrophotometer.
The formula for calculating the first derivative of the spectrum is
$$R'_{\lambda_n} = \frac{R_{\lambda_{n+1}} - R_{\lambda_n}}{\Delta\lambda} \tag{1}$$
In Equation (1), $R'_{\lambda_n}$ is the first derivative of the spectrum in the band from n to n + 1, and $R_{\lambda_{n+1}}$ and $R_{\lambda_n}$ are the original spectral reflectance values at n + 1 and n, respectively.
The formula for calculating the second derivative of the spectrum is
$$R''_{\lambda_n} = \frac{R'_{\lambda_{n+1}} - R'_{\lambda_n}}{\Delta\lambda} \tag{2}$$
In Equation (2), $R''_{\lambda_n}$ is the second derivative of the spectrum in the band from n to n + 1, and $R'_{\lambda_{n+1}}$ and $R'_{\lambda_n}$ are the first-derivative values at n + 1 and n, respectively. $\Delta\lambda$ denotes the interval from wavelength $\lambda_{n-1}$ to $\lambda_n$. Differential processing can eliminate the influence of system error and background noise on spectral values.
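The two derivative transforms can be illustrated with a minimal Python sketch. It assumes evenly spaced bands and plain forward differences; the function names and the reflectance values are hypothetical, and the study itself worked in R with Savitzky-Golay filtering applied to the spectra, so this is an illustration rather than the authors' code.

```python
def first_derivative(reflectance, delta_lambda):
    """Forward difference (R[n+1] - R[n]) / delta_lambda for each band n."""
    return [
        (reflectance[n + 1] - reflectance[n]) / delta_lambda
        for n in range(len(reflectance) - 1)
    ]


def second_derivative(reflectance, delta_lambda):
    """Forward difference applied to the first-derivative spectrum."""
    first = first_derivative(reflectance, delta_lambda)
    return first_derivative(first, delta_lambda)


# Hypothetical reflectance values sampled every 2 nm (illustrative only).
spectrum = [0.10, 0.14, 0.22, 0.34, 0.50]
fd = first_derivative(spectrum, 2.0)   # one value fewer than the input
sd = second_derivative(spectrum, 2.0)  # two values fewer than the input
```

Each differencing step shortens the spectrum by one band, which is one reason smoothing filters that also return derivatives are often preferred in practice.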
The formula for calculating SNV is
$$x_{i,\mathrm{SNV}} = \frac{x_i - \bar{x}_i}{s_i} \tag{3}$$
In Equation (3), $x_i$ is the spectral reflectance of the ith observation, $\bar{x}_i$ is the mean, and $s_i$ is the standard deviation. SNV standardizes each spectrum and removes the effects of dimension and of the magnitude of the variable's own variation, that is, the size of the value [24].
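As a hedged illustration of the SNV transform, the following Python sketch centers one spectrum on its own mean and scales it by its own standard deviation. The function name and the example values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: (x - mean) / standard deviation, per spectrum."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (assumption: n - 1 form)
    return [(x - m) / s for x in spectrum]


# Hypothetical reflectance values for a single needle spectrum.
spectrum = [0.2, 0.4, 0.6, 0.8]
corrected = snv(spectrum)
```

After the transform every spectrum has zero mean and unit standard deviation, so differences in overall brightness between samples no longer dominate the model.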
The formula for calculating MSC is
$$x_i = a_i + m_i x_r + e_i \tag{4}$$
In Equation (4), $x_i$ is the spectral reflectance of the ith observed value, $x_r$ is the average of all spectral data, taken as the ideal spectrum, $a_i$ and $m_i$ are constant terms (the intercept and slope of the regression of $x_i$ on $x_r$), and $e_i$ is the residual. The corrected spectrum is $(x_i - a_i)/m_i$. MSC removes the spectral differences caused by different scattering levels and uses the ideal spectrum to correct baseline translation and offset in the spectral data.
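The MSC correction can be sketched in Python as follows. This is a minimal illustration under the usual formulation, in which each spectrum is regressed on the mean spectrum by ordinary least squares and then corrected with the fitted intercept and slope; the helper names and the example spectra are hypothetical and are not taken from the study.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + m * x; returns (a, m)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n_bands = len(spectra[0])
    # The reference ("ideal") spectrum is the band-wise mean of all spectra.
    reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    corrected = []
    for s in spectra:
        a, m = fit_line(reference, s)  # intercept and slope for this spectrum
        corrected.append([(v - a) / m for v in s])
    return corrected, reference


# Two hypothetical spectra that differ only by an offset and a scale factor.
base = [0.10, 0.20, 0.40, 0.30]
spectra = [[0.05 + 1.2 * v for v in base], [-0.02 + 0.8 * v for v in base]]
corrected, reference = msc(spectra)
```

Because the two example spectra differ from each other only by offset and scale, both collapse onto the reference spectrum after correction, which is the behaviour MSC is designed to produce for pure scatter effects.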

Partial Least Squares Regression
PLSR combines the measured individual chemical components with reflectance spectra for model calibration [25], and finds the best-matching function of the data by minimizing the sum of squared errors [26]. PLSR is widely used in statistical modeling and analysis.
PLSR takes m independent variables ($x_1, x_2, x_3, \ldots, x_m$) and q dependent variables ($y_1, y_2, y_3, \ldots, y_q$) and normalizes them. The first components $t_1$ and $u_1$ are then extracted as linear combinations of these variables to establish the initial regression equation on $t_1$, Equation (5). In Equation (5), the coefficient of $t_1$ is the parameter vector of the independent variable, and $f_1$ is an n × q residual matrix. If the first component extracted cannot meet the model accuracy requirements, the above steps are repeated to extract further components from the independent variables ($t_2, t_3, t_4, \ldots, t_k$).
Substituting Equation (6) into Equation (7) gives the final PLSR equation.
To avoid the problems of low accuracy and poor robustness in the fitted PLSR model, variable selection was applied to screen feature bands and improve model performance. SMC can reduce the impact of unrelated variables on the model and highlight related variables to improve model accuracy [27]. VIP is used for band optimization [28].
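The component-by-component extraction described in this section can be illustrated with a small Python sketch of single-response PLS (the NIPALS form of PLS1). Several points are assumptions made for brevity: there is one dependent variable rather than q, the data are mean-centered but not scaled, and the function name and toy data are hypothetical. It does not reproduce the R pls package used in the study, and it omits the SMC and VIP selection steps.

```python
def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns (fitted values, residuals) for the training data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    fitted = [y_mean] * n
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the current residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate X and y, then accumulate this component's contribution.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        fitted = [fitted[i] + q * t[i] for i in range(n)]
    return fitted, f


# Toy data (hypothetical): y = 1 + 2*x1 - x2 with no noise.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1, 4, 3, 6, 5]
fit1, res1 = pls1_fit(X, y, 1)
fit2, res2 = pls1_fit(X, y, 2)
sse1 = sum(r * r for r in res1)
sse2 = sum(r * r for r in res2)
```

With one component the toy response is only partly explained; adding the second component drives the residual to zero, mirroring the rule of extracting further components until the accuracy requirement is met.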

BP Neural Network Algorithm
The BP neural network algorithm can perform complex nonlinear mapping, handle problems such as nonlinear function approximation, and has self-learning and generalization abilities [29]. It takes the squared network error as the objective function and uses the gradient descent method, adjusting the network weights and thresholds through back propagation to minimize the objective function. The BP neural network, shown in Fig. 1, consists of the forward transmission of signals and the back propagation of errors. The forward transmission passes through three layers, and the neurons in each layer affect only the state of the next layer. If the output of the trained neural network does not match the expected output value, the error is propagated backward and the weights and thresholds are adjusted layer by layer until the expected error range is reached.
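The forward pass and back propagation of error described above can be sketched as a three-layer network in plain Python. The layer sizes, learning rate, number of epochs, sigmoid hidden units, linear output and toy data are all assumptions chosen for illustration; the study fitted its networks in R, so this is not the authors' configuration.

```python
import math
import random


def train_bp(inputs, targets, n_hidden=3, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer network by per-sample gradient descent.

    Returns the mean squared error recorded in each epoch.
    """
    rng = random.Random(seed)
    n_in = len(inputs[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden  # hidden-layer thresholds (biases)
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0               # output threshold (bias)
    history = []
    for _ in range(epochs):
        sse = 0.0
        for x, y in zip(inputs, targets):
            # Forward transmission: input -> sigmoid hidden layer -> linear output.
            h = [
                1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])))
                for j in range(n_hidden)
            ]
            out = sum(w2[j] * h[j] for j in range(n_hidden)) + b2
            err = out - y
            sse += err * err
            # Back propagation: gradients of 0.5 * err**2, applied layer by layer.
            for j in range(n_hidden):
                grad_h = err * w2[j] * h[j] * (1.0 - h[j])  # uses w2 before its update
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
            b2 -= lr * err
        history.append(sse / len(inputs))
    return history


# Toy nonlinear target (hypothetical): y = x**2 on [0, 1].
inputs = [[0.0], [0.25], [0.5], [0.75], [1.0]]
targets = [v[0] ** 2 for v in inputs]
history = train_bp(inputs, targets)
```

Plotting `history` would show the error falling as the weights and thresholds are adjusted; in practice training stops once the error enters the expected range or a maximum number of iterations is reached.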

Data Analysis
R was used for data collation, model building and plotting. The prospectr package [30] was used for spectral processing and Kennard-Stone (KS) sample division [31], and the spectral curves were processed with Savitzky-Golay filtering [32] using a window of 11; the pls package [33] was used for leave-one-out cross-validation and PLSR model fitting; the plsvartr package [34] was used to select SMC and VIP variables for the PLSR model and to extract the important variables affecting the model; the grid, MASS and neuralnet packages [35] were used for BP neural network model fitting; and the ggplot2 package was used for chart drawing.
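For readers unfamiliar with KS sample division, the following Python sketch shows the basic Kennard-Stone selection rule under simple assumptions: Euclidean distance, at least two samples requested, and a hypothetical function name and toy data. It is an illustration of the algorithm only and does not reproduce the prospectr implementation used in the study.

```python
def kennard_stone(samples, n_select):
    """Return (selected, remaining) sample indices chosen by Kennard-Stone."""

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    n = len(samples)
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist(samples[ij[0]], samples[ij[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select and remaining:
        # Add the sample whose nearest already-selected neighbour is farthest away.
        nxt = max(
            remaining,
            key=lambda i: min(dist(samples[i], samples[s]) for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Toy one-dimensional "spectra" (hypothetical values).
samples = [[0.0], [1.0], [2.0], [3.0], [10.0]]
calibration, validation = kennard_stone(samples, 3)
```

The selected indices form the calibration set and the rest form the validation set, so the calibration samples span the spectral space as evenly as possible.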