In this study, the spectra of winter wheat grain showed distinct protein-related absorption peaks at 851, 1443, 1458, 1476, and 2246 nm. The original grain spectrum was positively correlated with protein content, most strongly at 851–1476 nm, whereas the original powder spectrum was negatively correlated with protein content at 714–1154 nm and positively correlated at 1407–2500 nm. The characteristic bands for grain protein extracted by SPA were mainly distributed in 350–450 nm, 900–1160 nm, 1300–1500 nm, and 1901–2100 nm, and those for powder protein were mainly distributed in 330–430 nm, 550–600 nm, 1300–1400 nm, and 1990–2050 nm. The basic unit of protein is the amino acid, which is mainly composed of C, H, O, and N, so the information provided by hyperspectral reflectance comes mainly from the overtone absorption of C-H, O-H, and N-H groups; the region around 800 nm is related to the third harmonic of C-H and N-H 22. In addition, the 1200–1500 nm region may be related to C-H triple-frequency absorption and O-H stretching vibration, and absorption around 2000 nm is associated with combination bands of N-H stretching vibration 23,24. In summary, the spectral regions of 350–430 nm, 851–1154 nm, 1300–1476 nm, and 1990–2050 nm were closely related to winter wheat protein.
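The band-wise correlation analysis described above can be illustrated with a minimal sketch. It assumes the Pearson correlation coefficient is computed independently at each band between reflectance and measured protein content; the function names `pearson` and `band_correlations` and all data values are hypothetical and are not taken from this study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant band carries no correlation information
    return cov / sqrt(var_x * var_y)

def band_correlations(spectra, protein):
    """Correlate each band (column of `spectra`) with protein content.

    spectra: one reflectance list per sample, all of the same length.
    protein: one measured protein value per sample.
    """
    return [pearson(list(band), protein) for band in zip(*spectra)]

# Hypothetical data: 4 samples x 3 bands (illustrative values only).
protein = [11.0, 12.5, 13.0, 14.2]
spectra = [[0.1 + 0.02 * p, 0.9 - 0.03 * p, 0.5] for p in protein]
r = band_correlations(spectra, protein)
```

In this toy data set the first band rises with protein content, the second falls, and the third is constant, so the sketch returns a positive, a negative, and a zero correlation respectively. Bands with a large absolute correlation would be candidates for the sensitive regions listed above.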
We found that models built from the hyperspectral reflectance of the powder samples were more accurate than those built from the seed samples, and that the correlation between the powder spectral data and protein content was higher. The reason is that seed and powder samples differ in particle size, so their spectral reflectance differs markedly, and protein content is itself measured on the powder 10,25; the correlation and model accuracy were therefore higher for powder than for seeds. However, preparing powder samples destroys the seeds and seed coats, so when the powder model is of similar accuracy to the seed model, it is more practical to use intact seeds. In this study, the difference in validation-set R² between SG-BPNN in the powder state and SG-SVM in the seed state was 0.025; considering practical value, it is also feasible to use the seed-state model for monitoring the protein content of wheat. Nevertheless, the mechanism by which sample treatment affects the accuracy of spectral prediction of quality, and the explanation for such subtle differences, remain to be further investigated.
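For reference, the validation-set R² used in such comparisons can be sketched as follows. This assumes R² is the coefficient of determination, 1 − SS_res/SS_tot, which may differ from the exact definition used in this study; the function name `r_squared` and the data are hypothetical.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured protein contents (%) for a small validation set.
measured = [11.0, 12.0, 13.0, 14.0]

r2_perfect = r_squared(measured, measured)       # exact predictions
r2_mean = r_squared(measured, [12.5] * 4)        # always predicting the mean
r2_close = r_squared(measured, [11.2, 11.9, 13.1, 13.8])
```

Two models can then be compared by the difference between their validation-set values, as done above for the powder-state and seed-state models.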
Additionally, correlation analysis between the pretreated spectra and protein content showed that FD gave the highest correlation 26. It has been reported that FD can remove linear and near-linear components from the original spectrum, highlighting the rates of increase and decrease of spectral reflectance. It can also capture the inflection points and extreme points of the original spectral curve and accurately locate the peak and valley features of protein absorption 27. Furthermore, FD can separate the absorption characteristics and trends of the protein spectrum in the infrared region, achieving better prediction results than the original spectrum 28,29. However, the optimal model in this paper is SG-BPNN based on characteristic bands in the powder state, and its preprocessing method is SG rather than FD. Several factors may explain this. The influence of the seed coat on spectral information is greatly reduced in the powder state 30–32. In addition, BPNN has a strong nonlinear fitting ability and can effectively use rich data sets to simulate the complex relationships among variables, greatly improving model accuracy 33,34. A model trained on characteristic bands is also not trained on invalid information, which improves recognition accuracy 35–38. Therefore, the combination of sample state, band selection, and model algorithm may have a great impact on the accuracy of the model.
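The two pretreatments discussed above can be sketched in a few lines. This is a minimal illustration, not the processing used in this study: FD is approximated by finite differences, and SG smoothing uses the standard 5-point quadratic Savitzky-Golay coefficients (−3, 12, 17, 12, −3)/35. The window size, the function names `first_derivative` and `sg_smooth`, and all data are assumptions.

```python
def first_derivative(wavelengths, reflectance):
    """Finite-difference first derivative (central inside, one-sided at ends)."""
    n = len(reflectance)
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((reflectance[hi] - reflectance[lo])
                     / (wavelengths[hi] - wavelengths[lo]))
    return deriv

# Standard 5-point quadratic Savitzky-Golay smoothing weights (divide by 35).
SG_COEFFS = (-3, 12, 17, 12, -3)

def sg_smooth(reflectance):
    """5-point SG smoothing; the two points at each edge are left unchanged."""
    smoothed = list(reflectance)
    for i in range(2, len(reflectance) - 2):
        window = reflectance[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / 35.0
    return smoothed

# Hypothetical spectrum with a purely linear trend: FD reduces it to a constant.
wavelengths = [400.0 + i for i in range(6)]
linear = [0.1 + 0.002 * w for w in wavelengths]
fd = first_derivative(wavelengths, linear)

# A quadratic curve passes through SG smoothing unchanged,
# while an isolated spike is damped.
quadratic = [float(x * x) for x in range(7)]
smoothed = sg_smooth(quadratic)
spike = sg_smooth([0.0, 0.0, 1.0, 0.0, 0.0])
```

The sketch shows the contrast drawn above: FD turns a linear baseline into a constant offset and so emphasizes changes in slope, whereas SG suppresses point-to-point noise while preserving the overall shape of the curve.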