Comparison of the Prediction of Winter Wheat Leaf Water Content by Using New Hyperspectral Index and Machine Learning Models

In this study, hyperspectral technology was used to establish an inversion model of winter wheat leaf water content, to provide a technical reference for precision irrigation of winter wheat. In a field experiment conducted over two consecutive years, seven wheat varieties were grown under different irrigation times. Canopy spectral reflectance and leaf water content (LWC) of winter wheat were collected. Five modeling methods, namely spectral index, partial least squares regression (PLSR), random forest (RF), extreme random tree (ERT) and k-nearest neighbor (KNN), were used to construct LWC estimation models. The results showed that canopy spectral reflectance increased with irrigation times, especially in the near-infrared band. For LWC, the newly developed difference spectral index DVI (R1185, R1307) predicted better than the existing spectral indices, with an R² of 0.78. Because of the large amount of hyperspectral data, the correlation coefficient method (CA) and x-loading weight (x-Lw) were used to select the water characteristic bands from the full band. The accuracy of the models based on the characteristic bands was not significantly lower than that of the full-band models. Among these models, the ERT-x-Lw model performed best (R² and RMSE of 0.88 and 1.81 for calibration, and 0.84 and 1.62 for validation, respectively). In addition, the LWC estimation model constructed by ERT-x-Lw was more accurate than that of DVI (R1185, R1307). The results provide a technical reference and basis for crop water monitoring and diagnosis under similar production conditions.


Background
Wheat is the main food crop in North China. Because precipitation does not match crop water demand during the growing period, reasonable irrigation has become a necessary condition for high wheat yield (Wang et al., 2017). As an important component of plant canopy structure and other biochemical changes, leaf water content is an important indicator that reflects crop water status and indirectly reflects the surplus and deficit of soil water (José R et al., 2018; William T et al., 2000). Therefore, leaf water content can be used as an important reference index for irrigation decisions (Wang et al., 2001).
Hyperspectral remote sensing technology is fast, economical, and non-destructive. It can be used to monitor the growth of crops by obtaining reflectance information from plants. Thus, constructing diagnosis models of water status with hyperspectral remote sensing technology is of great significance for precision irrigation and water-saving irrigation. At present, hyperspectral remote sensing technology has been widely used in crop water monitoring. In some studies, the relationship between wheat leaf water content and hyperspectral reflectance was analyzed, and spectral indices were used to estimate leaf water content. For example, the leaf water content of rice, peanut, soybean, and wheat can be well predicted by the ratio WI/NDVI, where WI = R900/R970 and NDVI = (R900 − R680)/(R900 + R680) (Inoue, 1993). Zhao
Considering the canopy reflectance of ten wheat varieties under different irrigation treatments from 2018 to 2020, five different models were established separately to provide a reference for hyperspectral monitoring of winter wheat leaf moisture under similar production conditions in the future. The aims of this study were to: (i) analyze the effect of different irrigation times on wheat leaf water content and spectral reflectance; (ii) evaluate the performance of newly developed spectral indices of leaf water in comparison with existing spectral indices; and (iii) evaluate and compare different model algorithms for leaf water status.
Table 1.
in the 350-2500 nm range were collected. To minimize the effects of sky and field conditions, spectral measurements were obtained from 10 sites in each plot and averaged into a single spectral sample. For each experiment, measurements were obtained on several dates that reflected the major growth stages of wheat.

Determination of leaf water
After the canopy spectrum was measured, the wheat plants at the corresponding points were collected and all the leaves were removed. The moisture content of the wheat leaves was determined by the oven-drying method.
The fresh weight of the leaves was measured with an analytical balance (accuracy of 0.01 g). The leaves were put into bags and sterilized at 105 ℃ for 30 min. They were then dried at 80 ℃ to constant weight, and the dry weight was measured. The LWC was calculated using the following formula:
$$\mathrm{LWC} = \frac{m_f - m_d}{m_f} \times 100\% \tag{1}$$
where LWC is the leaf water content, %; m_f is the fresh weight of the leaves, g; and m_d is the dry weight of the leaves, g.
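The calculation can be illustrated with a minimal Python sketch. It assumes that formula (1) expresses water content on a fresh-weight basis, i.e. the water lost on drying divided by the fresh weight; the function name and the example weights are hypothetical and are not taken from the experiment.

```python
def leaf_water_content(fresh_g, dry_g):
    """Leaf water content in percent, on a fresh-weight basis (assumed form of formula 1)."""
    if fresh_g <= 0 or dry_g < 0 or dry_g > fresh_g:
        raise ValueError("expected fresh weight > 0 and 0 <= dry weight <= fresh weight")
    return (fresh_g - dry_g) / fresh_g * 100.0


# Hypothetical sample: 12.50 g fresh weight, 3.75 g after drying to constant weight.
print(round(leaf_water_content(12.50, 3.75), 2))
```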

Data analysis and Utilization
During the two years of the experiment, a total of 252 wheat samples were collected by the researchers.
The test data in 2018-2019 are used as modeling samples (n = 126) and the test data from 2019-2020 are used as validation samples (n = 126). The statistical characteristics of leaf water content in each sample set are shown in Table 2.

Characteristic band screening
Hyperspectral reflectance in the 350-2500 nm range was collected in the experiment. Because of atmospheric noise in the spectrometer readings at 1350-1400 nm, 1800-1950 nm and 2450-2500 nm, the spectral bands in these ranges were removed, leaving a total of 1901 spectral bands. Using all the spectral bands as model input would cause the "curse of dimensionality". Therefore, the correlation coefficient method and the loading weight method were used to screen the characteristic bands.
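The band removal can be sketched as a boolean mask over the wavelength axis. The sketch assumes 1 nm sampling from 350 to 2500 nm (2151 bands) and, because the endpoint convention is not stated in the text, treats each noisy range as excluding its lower endpoint; under that assumption 250 bands are dropped and 1901 remain, which matches the count reported above. The variable names are hypothetical.

```python
import numpy as np

# Assumed 1 nm sampling: 350, 351, ..., 2500 nm (2151 bands).
wavelengths = np.arange(350, 2501)
noisy_ranges = [(1350, 1400), (1800, 1950), (2450, 2500)]

keep = np.ones(wavelengths.shape, dtype=bool)
for lo, hi in noisy_ranges:
    # Assumption: drop bands with lo < wavelength <= hi.
    keep &= ~((wavelengths > lo) & (wavelengths <= hi))

clean_wavelengths = wavelengths[keep]
print(clean_wavelengths.size)

# A spectra matrix of shape (n_samples, 2151) would be reduced the same way:
# clean_spectra = spectra[:, keep]
```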
(1) Correlation coefficient (CA) This method determines the characteristic bands according to the correlation coefficient between each spectral band and the target parameter. The correlation between leaf water content and canopy reflectance under different irrigation treatments was analyzed. The characteristic wavelengths were determined by selecting the maximum absolute values of the correlation coefficient and the positions of its peaks and troughs.
(2) x-Loading weight (x-Lw) The loading weights of a PLSR model indicate how much each independent variable contributes to explaining the dependent variable, which is useful for the rapid screening of characteristic bands. In this study, the peaks and troughs of the loading weights were selected as the characteristic bands.
The feature bands selected by the above two methods were used as independent variables to construct inversion models of winter wheat leaf water content with PLSR, RF, ERT and KNN, respectively.
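Both screening ideas can be sketched with NumPy. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the spectra are mean-centered but not scaled, only the first PLS component's loading weight vector is computed (the study inspects several components), and all function names are hypothetical.

```python
import numpy as np

def band_correlations(X, y):
    """Pearson correlation between each band (column of X) and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

def first_loading_weight(X, y):
    """Unit-norm loading weight vector of the first PLS1 component."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

def peaks_and_troughs(curve):
    """Indices of interior local maxima and minima of a 1-D curve."""
    d = np.sign(np.diff(curve))
    return np.where(d[:-1] * d[1:] < 0)[0] + 1

# Synthetic stand-in for 126 calibration spectra with 50 bands.
rng = np.random.default_rng(0)
X = rng.normal(size=(126, 50))
y = 2.0 * X[:, 10] - X[:, 30] + rng.normal(scale=0.1, size=126)

r = band_correlations(X, y)
w = first_loading_weight(X, y)
print(int(np.argmax(np.abs(r))), peaks_and_troughs(w)[:5])
```

On real spectra, adjacent bands are strongly correlated, so the correlation and loading weight curves are smooth and their peaks and troughs are far fewer than in this random example.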

Modeling method
(1) Spectral index
Spectral indices related to crop water status that have been proposed in existing research were selected, as shown in Table 3, and their relationship with leaf water content was analyzed. In order to obtain better spectral parameters, the normalized difference vegetation index (NDVI), the ratio vegetation index (RVI) and the difference vegetation index (DVI) were calculated for all band combinations in the range of 350-2500 nm using the following formulas, and their relationship with leaf water content was analyzed to determine the optimal spectral parameters for estimating leaf water content:
$$\mathrm{NDVI} = \frac{R_{\lambda_1} - R_{\lambda_2}}{R_{\lambda_1} + R_{\lambda_2}}, \qquad \mathrm{RVI} = \frac{R_{\lambda_1}}{R_{\lambda_2}}, \qquad \mathrm{DVI} = R_{\lambda_1} - R_{\lambda_2}$$
where R_{λ1} and R_{λ2} are the reflectance of any two bands in the range of 350-2500 nm.
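The band-pair search can be sketched as follows, assuming the usual two-band definitions NDVI = (R1 − R2)/(R1 + R2), RVI = R1/R2 and DVI = R1 − R2, and using the squared Pearson correlation with LWC as the score. The data, the four-band wavelength grid and the function names are hypothetical. For DVI and NDVI, swapping the two bands only flips the sign, so each unordered pair is evaluated once; RVI would need both orders.

```python
import numpy as np

def ndvi(r1, r2):
    return (r1 - r2) / (r1 + r2)

def rvi(r1, r2):
    return r1 / r2

def dvi(r1, r2):
    return r1 - r2

def best_band_pair(R, lwc, wavelengths, index_fn=dvi):
    """Exhaustive search for the band pair whose index has the highest R² with LWC."""
    best = (-1.0, None, None)
    n_bands = R.shape[1]
    for i in range(n_bands):
        for j in range(i + 1, n_bands):
            index = index_fn(R[:, i], R[:, j])
            r = np.corrcoef(index, lwc)[0, 1]
            if r * r > best[0]:
                best = (float(r * r), int(wavelengths[i]), int(wavelengths[j]))
    return best

# Synthetic example in which LWC depends exactly on one band difference.
rng = np.random.default_rng(1)
wavelengths = np.array([1000, 1185, 1307, 1500])
R = rng.uniform(0.2, 0.6, size=(30, 4))
lwc = 70.0 + 50.0 * (R[:, 1] - R[:, 2])

print(best_band_pair(R, lwc, wavelengths))
```

With 1901 bands the full search covers roughly 1.8 million pairs per index, so a vectorized or coarser-grid version would be preferable in practice.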
(2) Partial Least-Squares Regression (PLSR) Partial least squares regression (PLSR) can effectively establish a regression model when the independent variables are strongly multicollinear. During modeling, principal component analysis can be used to judge whether the independent variables significantly improve the prediction ability. Therefore, this method can handle the multicollinearity among multiple hyperspectral features (Shao J et al., 1993). In this study, the PLSR model was built in R version 3.5.3 using the "pls" package, and leave-one-out ("LOO") cross-validation was used to determine the number of components.
(3) Random Forest (RF) Random forest can establish the relationship between multiple independent variables and a dependent variable, improves the prediction accuracy of the model by combining many decision trees, and makes full use of the sample data (Victor et al., 2014). This technique was applied in Python 2.7, with the number of trees (ntree) set to 500.

(4) Extreme Random Trees (ERT)
Extreme random tree is a top-down method that is very similar to random forest but differs from it in two respects: firstly, it does not use bootstrap sampling with replacement but uses the original training samples directly, in order to reduce bias; secondly, it chooses the split values completely at random when splitting the nodes of each decision tree. As a result, its variance is smaller and its predictions are more stable than those of random forest (Uddin et al., 2015). This technique was applied in Python 2.7.
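The two differences from random forest can be seen in a short scikit-learn sketch. The library choice is an assumption, since the text only states that Python was used; the data are synthetic, and using 500 trees for ERT mirrors the random forest setting rather than a value reported for ERT. The sketch targets Python 3.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Synthetic stand-in for 28 selected bands; two of them drive the response.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(120, 28))
y = 70.0 + 10.0 * (X[:, 3] - X[:, 10]) + rng.normal(0.0, 0.5, size=120)
X_cal, y_cal, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

# Random forest: each tree sees a bootstrap sample and searches for the best threshold.
rf = RandomForestRegressor(n_estimators=500, bootstrap=True, random_state=0)
# Extra trees: each tree sees the full sample and draws split thresholds at random.
ert = ExtraTreesRegressor(n_estimators=500, bootstrap=False, random_state=0)

for name, model in (("RF", rf), ("ERT", ert)):
    model.fit(X_cal, y_cal)
    print(name, round(model.score(X_val, y_val), 3))
```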
(5) k-nearest neighbor (KNN) The k-nearest neighbor method was proposed by Cover and Hart. It is a classification algorithm based on the proximity of similar samples in the pattern space (Cover T et al., 1967). Euclidean distance is used to measure the similarity between samples: the larger the distance, the lower the similarity. In this study, the k-nearest neighbor algorithm was run in Python 2.7. Cross-validation was used to determine the value of K, giving K = 3.
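For a continuous target such as LWC, a nearest-neighbor prediction can be sketched as the mean of the K closest training samples under Euclidean distance. Averaging with equal weights is an assumption here, as the text does not state how the neighbors are combined; the function name and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=3):
    """Predict each new sample as the mean target of its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    # Pairwise Euclidean distances, shape (n_new, n_train).
    diff = X_new[:, None, :] - X_train[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

# Toy example with a single feature: the three nearest neighbours of 1.0 are 1, 0 and 2.
X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [1.0, 2.0, 3.0, 100.0]
print(knn_predict(X_train, y_train, [[1.0]], k=3))
```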

Model validation
The coefficient of determination (R²), root mean square error (RMSE) and residual predictive deviation (RPD) were used to evaluate the accuracy of the models. A larger R² and a smaller RMSE indicate better prediction accuracy. An RPD > 1.4 indicates that the prediction ability of the model is acceptable and the model can be applied.
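The three statistics can be computed as in the sketch below. It assumes R² is defined as one minus the ratio of residual to total sum of squares and RPD as the sample standard deviation of the observed values divided by the RMSE; the text does not give the exact formulas, and some studies instead report the squared correlation or use the population standard deviation. The function name and the numbers are hypothetical.

```python
import numpy as np

def evaluate(y_obs, y_pred):
    """Return (R², RMSE, RPD) for observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_obs - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y_obs - y_obs.mean()) ** 2))
    rpd = float(y_obs.std(ddof=1) / rmse)  # assumed definition: SD of observations / RMSE
    return r2, rmse, rpd

r2, rmse, rpd = evaluate([70.0, 72.0, 74.0, 76.0], [71.0, 71.0, 75.0, 75.0])
print(round(r2, 3), round(rmse, 3), round(rpd, 3))
```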

Effects of irrigation times on leaf water content (LWC) and canopy spectral reflectance of wheat
To illustrate the effects of different irrigation times on LWC and canopy spectra, the data from experiment 1 were taken as an example. As shown in Fig. 1, the LWC of all varieties first increased and then decreased as growth advanced, reaching its maximum 10 days after jointing water (Mar. 30). At the early growth stage, there was little difference in LWC among irrigation treatments. However, LWC decreased rapidly in the late growth period, and the difference among treatments increased gradually. The LWC in w0 was the lowest, significantly lower than that in w2 and w1. The results showed that filling water could delay the senescence of leaves and prolong their green retention.
Taking Luomai No.27 as an example, under the same irrigation times, the canopy reflectance of wheat before jointing water was slightly lower than that at 10 days after jointing water (Mar. 30). However, canopy reflectance at 10 days after jointing water was higher than that at 10 days after filling water. Therefore, the canopy reflectance first increased and then decreased as growth progressed. With the increase of irrigation times, the canopy reflectance in the visible region (350-750 nm) did not change significantly, whereas that in the near-infrared region increased. The main reason is that crop growth accelerates once irrigation supplies enough water, so that leaf area index, biomass and canopy spectral reflectance increase. This was more obvious after filling water, with w2 > w1 > w0. Under different irrigation times, the variation of canopy spectral reflectance was basically the same among wheat varieties.

Correlation between LWC and spectral index of wheat leaves
Under different irrigation treatments, the correlation between LWC and existing water-related vegetation indices, such as Ratio Index, NDWI, MSI, MDVI, hNDVI, NDII, WI, SRWI, WI/hNDVI and FD730−955, was analyzed in order to find the best spectral index for estimating the water content of wheat leaves. In addition, the relationship between LWC and NDVI, RVI and DVI was analyzed using the spectral reflectance of the samples measured in 2018-2019, and the sensitive band ranges with larger R² values were determined. Among the NDVI, RVI and DVI band combinations, some "hot spots" with high correlation between LWC and the index were identified (Fig. 3). The highest R² value was then extracted from the hot spot area. The NDVI, RVI, and DVI consisting of 1185 nm and 1307 nm performed best for LWC.
Based on the correlation comparison between 13 spectral indices and LWC (Fig. 4)

Extraction of characteristic bands of LWC
The correlation between LWC and the original spectral reflectance (350-2500 nm) under different irrigation times is shown in Fig. 6. The results showed that the correlation coefficient of LWC ranged from − 0.
The number of principal components was determined according to the contribution rate of the PLSR model and the RMSEP. With three principal components, the RMSEP was 2.34% and 96.12% of the variance was explained (88.86%, 6.03% and 1.23% by the individual components, respectively), as shown in Fig. 7A. Therefore, the best characteristic bands were determined from the peaks and troughs of the loading weights of the three principal components (Fig. 7B). According to the above analysis, the optimal bands are: 588 nm, 663 nm, 674 nm, 680 nm, 700 nm, 763 nm, 777 nm, 783 nm, 808 nm, 816 nm, 970 nm, 977 nm, 984 nm, 1070 nm, 1072 nm, 1156 nm, 1205 nm, 1246 nm, 1264 nm, 1402 nm, 1445 nm, 1456 nm, 1660 nm, 1678 nm, 1957 nm, 1702 nm, 2221 nm and 2252 nm.

Comparison of LWC inversion models constructed by different methods
The independent variables were the 1901 bands (full band in 350-2500 nm), the 100 bands screened by the correlation coefficient method (CA) and the 28 bands screened by x-loading weight (x-Lw), respectively, and LWC was the dependent variable. PLSR, KNN, RF and ERT models were established for each input set (Table 4). Compared with the full-band models, the accuracy of the models built with the feature bands extracted by CA and x-Lw as independent variables was not significantly reduced, while the smaller number of input variables improved the efficiency of the models. The R², RMSE and RPD of modeling and validation were considered together. For the bands selected by CA, the models ranked as follows: PLSR-CA > ERT-CA > RF-CA > KNN-CA. For the 28 characteristic bands selected by the x-Lw method, the models ranked as follows: ERT-x-Lw > PLSR-x-Lw > RF-x-Lw > KNN-x-Lw. Compared with the CA method, the number of input variables was reduced by 98.63% by the x-Lw method, which significantly improved the modeling efficiency. Among the 12 models constructed by different methods, the ERT-x-Lw model showed a higher R² value and a lower RMSE than the others. In the modeling set, R², RMSE, and RPD were 0.88, 1.46, and 3.37, respectively; in the validation set, R², RMSE, and RPD were 0.84, 1.62, and 2.39, respectively (Fig. 8).

Discussion
The growth and development of wheat can be directly affected by water deficit. As an important indicator of wheat growth, leaf water content can be monitored by hyperspectral technology. In this study, the spectral reflectance of the wheat canopy first increased and then decreased as the growth period advanced. The canopy reflectance was lower before jointing water, increased 10 days after jointing water, and decreased significantly 10 days after filling water. This may be due to the rapid growth of wheat biomass and leaf area at the jointing stage. In the jointing stage, timely irrigation can
In addition, different irrigation times had a significant effect on canopy spectral reflectance. There was no significant difference between the w0 treatment and the w1 and w2 treatments before and after jointing water.
Especially in the near-infrared region (750-1350 nm), canopy reflectance increased significantly with irrigation times after jointing water, which was due to the increase of wheat plant height, chlorophyll content and net photosynthetic rate. Moreover, the canopy reflectance of w2 was significantly higher than that of w1 and w0. This indicates that irrigation at the filling stage can delay leaf senescence and prolong photosynthesis after flowering; without irrigation, wheat plants grow short, the leaves turn yellow and wither early, leaf water content decreases and the cell structure changes, which eventually leads to reduced wheat yield (Guo et al., 2016).
The researchers believe that the change of canopy reflectance is caused by the change of LWC. Therefore, the model is constructed by the characteristic response band of leaf water content, which can and Wu et al. (2009). Furthermore, the two bands selected in this paper lie in the water-sensitive near-infrared region (Ranjan R et al., 2017). The resulting index is superior to the water spectral parameters constructed in previous studies.
The inclusion of too many latent variables leads to over-fitting (Ecarnot et al., 2013). To improve the modeling accuracy, machine learning and other methods have also been used to model and analyze the moisture index of wheat. In previous work, spectral indices highly correlated with leaf water content were selected by grey correlation analysis and used as independent variables in PLSR and BP neural network models to predict wheat leaf water content, with R² of 0.72 and 0.
and prediction determination coefficient R² were 0.88 and 0.84, respectively, and the RMSE were 1.46 and 1.62, respectively, which were better than those of the PLSR, RF and KNN models. The reason is that ERT has better generalization ability and more stable performance (Randal S. et al., 2017). In contrast, KNN breaks the continuity of the spectral bands because it learns and predicts according to the distances between samples (Wu et al., 2017). It is suggested that extreme random tree (ERT) may be a reliable modeling method.

Conclusion
Based on a two-year field experiment with different irrigation treatments, the effect of irrigation times on canopy reflectance spectra was studied, and five modeling methods were compared. The results showed that irrigation at the jointing and filling stages could increase leaf water content, leaf area and biomass, delay plant senescence and increase canopy reflectance. The new index DVI (R1185, R1307) can be used to estimate the LWC of winter wheat. The extreme random tree model based on the x-loading weight method was more accurate than the other models, with coefficients of determination of 0.88 and 0.84 for modeling and prediction, respectively. Thus, both models can be used to estimate the water content of wheat leaves effectively.
Declarations