2.1 Materials
The Red Globe grapes used in this test were purchased from local vineyards in Wuhan. Ten bunches of similar quality and position were randomly selected from ten vines, and fresh, intact berries of similar grade were picked from the outside, middle, top and tip of each bunch. The samples were picked on the day of the test and stored for 12 hours in a constant temperature and humidity chamber set at (22 ± 1) °C and 65% relative humidity. Hyperspectral information was collected three times from each sample, once in each of three placement modes (stalk side up, stalk side down and horizontal), and the data from the three modes were averaged to obtain an average spectrum. Because previous studies have shown that the averaged spectra performed significantly better than the spectra from any single placement mode (Li et al. 2019), the average spectra of the region of interest were extracted directly as the original spectra for analysis in this paper. A total of 360 hyperspectral images of Red Globe grape samples were obtained.
2.2 Acquisition of hyperspectral image information
The hyperspectral system used in this experiment consists of a CCD camera (Hamamatsu, Japan), an imaging spectrometer (Spectral Imaging Ltd., Finland), four 50 W halogen lamps (Beijing Jorhanguang Instruments Co., Ltd.), lenses and a mobile platform (Beijing Jorhanguang Instruments Co., Ltd.). The system acquired spectra in the wavelength range of 391–1043 nm with a resolution of 2.8 nm. For the experiments, the speed of the moving stage was set to 1.8 mm/s, the exposure time of the camera was 0.15 s, and the distance between the sample stage and the lens was 420 mm. The acquisition system is shown in Fig. 1.
The hyperspectral system was first warmed up for one hour. To eliminate interference from the instrument itself and from external noise, black-and-white calibration was performed before the test in three steps. In the first step, a standard white plate (rectangular Teflon plate) was placed on the mobile platform to obtain the white reference image IW; in the second step, the lens was covered to obtain the all-black calibration image ID; in the third step, the lens was uncovered and the sample was placed on the mobile platform to obtain the original hyperspectral diffuse reflectance image IR. The calibrated image R of the sample was then obtained according to Eq. (1) (Gao et al. 2019).
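Eq. (1) is not reproduced in this section. The following minimal sketch assumes the commonly used black-and-white correction R = (IR − ID) / (IW − ID), applied pixel by pixel and band by band. The function name, the `eps` guard and the synthetic intensity values are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark, eps=1e-9):
    """Relative reflectance, assuming R = (I_R - I_D) / (I_W - I_D)."""
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = white - dark
    # Guard against division by zero where white and dark references coincide.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom

# Synthetic 2 x 2 pixel, 3-band cubes (values are illustrative only).
dark = np.full((2, 2, 3), 100.0)
white = np.full((2, 2, 3), 4100.0)
raw = np.full((2, 2, 3), 2100.0)
R = calibrate_reflectance(raw, white, dark)
```

With these synthetic values every pixel lies halfway between the dark and white references, so the corrected reflectance is 0.5 throughout.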
2.3 Hyperspectral image extraction of a single sample
Each hyperspectral acquisition captured a total of 15 Red Globe grape berries. Each berry was segmented out as a region of interest (ROI) so that the spectral information of a single berry could be extracted, and the average spectrum of each ROI was taken as the original spectral information of that sample. The noisy bands at both ends of the original spectrum were removed, and the wavelengths from 450 to 1000 nm (439 wavelengths in total) were selected for modeling. As can be seen in Fig. 2, the difference in reflectance between the background and the grape region is large at 726.6 nm, so the grayscale image at 726.6 nm was selected for extracting the sample region. The extraction process was as follows. The original color image of the hyperspectral data is shown in Fig. 3(A). The grayscale image at 726.6 nm was first extracted (Fig. 3(B)) and converted into a binary image using the Otsu thresholding algorithm. Noise in the binary image was then removed by median filtering, and the mask template in Fig. 3(C) was obtained by an image erosion operation. After masking, the images of the individual berries were numbered according to their area to obtain Fig. 3(D). Finally, the spectral information of each berry region was extracted according to its label number and averaged to obtain the average spectrum (Gao et al. 2019).
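The segmentation chain described above (Otsu threshold, median filter, erosion, labeling, per-region averaging) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the 3 × 3 median window, the default erosion structuring element, the largest-first ordering of regions and the function names are all assumed, and the cube is taken to be a NumPy array of shape (rows, columns, bands).

```python
import numpy as np
from scipy import ndimage

def otsu_threshold(gray, nbins=256):
    """Threshold that maximizes the between-class variance (Otsu)."""
    hist, edges = np.histogram(gray.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist).astype(float)      # pixels at or below each bin
    w1 = w0[-1] - w0                        # pixels above each bin
    cum = np.cumsum(hist * centers)
    m0 = cum / np.maximum(w0, 1)            # mean of the lower class
    m1 = (cum[-1] - cum) / np.maximum(w1, 1)  # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    return centers[np.argmax(between)]

def roi_mean_spectra(cube, band_index):
    """Mean spectrum of each segmented region, largest region first."""
    gray = cube[:, :, band_index]
    mask = gray > otsu_threshold(gray)
    # Median filtering removes isolated noise pixels from the binary image.
    mask = ndimage.median_filter(mask.astype(np.uint8), size=3) > 0
    # Erosion trims region borders, which helps separate touching objects.
    mask = ndimage.binary_erosion(mask)
    labels, n_regions = ndimage.label(mask)
    areas = np.bincount(labels.ravel())[1:]   # pixel count per label 1..n
    order = np.argsort(areas)[::-1] + 1       # label ids sorted by area
    return np.array([cube[labels == lab].mean(axis=0) for lab in order])
```

In this study the band passed as `band_index` would be the one closest to 726.6 nm, where the contrast between grapes and background is largest.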
2.4 Extraction of image feature parameters
In this experiment, the gray-level co-occurrence matrix (GLCM) was used to obtain the image feature parameters of the hyperspectral images, and principal component analysis (PCA) was used to reduce the dimensionality of the obtained image information.
GLCM algorithm: The gray-level co-occurrence matrix is robust and widely used for extracting image texture features (Li et al. 2019). Because texture arises from gray-level patterns that repeat across spatial locations, pixels at a given spatial separation in a grayscale image are correlated in gray level. The GLCM captures this relationship and represents comprehensive texture information such as the direction, magnitude and spacing of changes in image gray level.
PCA algorithm: Principal component analysis (Guo et al. 2020) projects the data from a high-dimensional space to a low-dimensional space along the directions of maximum variance, yielding mutually uncorrelated principal components. The algorithm removes a large amount of redundant information while retaining, as far as possible, the features that characterize the original information.
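To make the co-occurrence idea concrete, the sketch below builds a symmetric, normalized GLCM for a single non-negative pixel offset and derives four of the texture descriptors from it. The number of gray levels, the offset, the function names and the exact descriptor formulas (for example, energy taken as the sum of squared entries) are assumptions chosen for illustration; the paper does not state which definitions were used.

```python
import numpy as np

def quantize(gray, levels):
    """Map intensities to integer gray levels 0..levels-1."""
    g = np.asarray(gray, dtype=float)
    span = g.max() - g.min()
    if span == 0:
        return np.zeros(g.shape, dtype=int)
    return np.minimum(((g - g.min()) / span * levels).astype(int), levels - 1)

def glcm(gray, levels=8, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix for offset (dy, dx) >= 0."""
    q = quantize(gray, levels)
    h, w = q.shape
    ref = q[0:h - dy, 0:w - dx]   # reference pixels
    nbr = q[dy:h, dx:w]           # neighbors at the chosen offset
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1)  # count level pairs
    P = P + P.T                   # count each pair in both directions
    return P / P.sum()

def glcm_features(P):
    """Contrast, energy, homogeneity and entropy of a normalized GLCM."""
    i, j = np.indices(P.shape)
    nz = P > 0
    return {
        "contrast": float(((i - j) ** 2 * P).sum()),
        "energy": float((P ** 2).sum()),
        "homogeneity": float((P / (1.0 + (i - j) ** 2)).sum()),
        "entropy": float(-(P[nz] * np.log2(P[nz])).sum()),
    }

# Small 4-level test image, horizontal neighbor offset.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
P = glcm(image, levels=4, dx=1, dy=0)
features = glcm_features(P)
```

Repeating the computation for several offsets and directions and averaging the descriptors is a common way to reduce direction dependence, although the text does not say whether that was done here.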
Color images can be represented in the RGB and Lab color spaces, each of which carries distinct feature information. In this paper, six channels (R, G, B, L, a, b) were extracted as the color feature parameters of the image. Texture information characterizes the surface of an object. Ten texture descriptors (contrast, mean, energy, entropy, correlation, homogeneity, smoothness, third-order moment, gray-level standard deviation and consistency) were collected as the texture features of the image. Combining the two groups gives a total of 16 image features for each Red Globe grape image. Because the extracted features differ in scale, they were normalized before PCA dimensionality reduction. As can be seen in Fig. 4, the cumulative contribution rate of the first five principal components obtained from the 16 image features reaches 99.992%, which fully meets the modeling requirements. To improve the computational speed and reliability of the model, the first five principal components were used for modeling.
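The normalization and PCA step can be sketched as below. The sketch assumes z-score standardization of each feature (the text does not say which normalization was used) and computes the components through a singular value decomposition; the function name and the random stand-in data are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components):
    """Standardize the columns of X, then return PCA scores and the
    cumulative contribution rate of the principal components."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # leave constant features unscaled
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal directions, ordered by explained variance.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = Z @ Vt[:n_components].T
    return scores, np.cumsum(explained)

# Stand-in for a (samples x 16 features) matrix; real features would be
# the six color channels and ten texture descriptors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 3))
X = latent @ rng.normal(size=(3, 16)) + 0.01 * rng.normal(size=(40, 16))
scores, cumulative = pca_scores(X, n_components=5)
```

The entry `cumulative[4]` corresponds to the cumulative contribution rate of the first five components, the quantity reported as 99.992% for the grape image features.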
2.5 Extraction of spectral feature parameters
2.5.1 Partitioning of the sample set
KS algorithm: The Kennard-Stone (KS) algorithm treats all samples as candidates for the correction set and selects samples from them one at a time. Its advantage is that samples with large differences in spectral data are selected for the correction set, while the remaining samples are assigned to the prediction set, which improves the stability and prediction accuracy of the model.
A total of 360 Red Globe grape samples were collected in the experiment and divided by the KS algorithm into 270 correction set samples and 90 prediction set samples, a ratio of 3:1. As can be seen from Table 1, the total acid (TA) content ranged from 2.254 to 37.663 g/kg, with standard deviations of 9.663 and 11.210 g/kg for the correction and prediction sets, respectively.
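A minimal sketch of Kennard-Stone selection is given below, assuming Euclidean distance between spectra and the usual initialization with the two most distant samples. The function name and the toy data are illustrative.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return (selected, remaining) sample indices by Kennard-Stone."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    dist = np.sqrt(np.maximum(d2, 0.0))
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select and remaining:
        # Distance from each candidate to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Pick the candidate that is farthest from the selected set.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Toy one-feature data set: five samples, three go to the correction set.
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [5.0]])
correction_idx, prediction_idx = kennard_stone(X_toy, n_select=3)
```

For the split used in this paper the call would take the 360 spectra and `n_select=270`, leaving 90 samples for the prediction set.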
Table 1
Data for the sample set divided using the KS algorithm
Sample set | Number | TA max. value (g/kg) | TA min. value (g/kg) | TA mean value (g/kg) | TA standard deviation (g/kg) |
Correction set | 270 | 37.663 | 2.320 | 11.283 | 9.663 |
Prediction set | 90 | 35.415 | 2.254 | 6.575 | 11.210 |
2.5.2 Pre-processing of raw spectra
Spectral pre-processing is required to effectively eliminate the effects of instrument noise, dark current and other factors on the test results. The raw spectra (RAW) are shown in Fig. 5. This paper compared several pre-processing methods: Standard Normal Variate transformation (SNV), Savitzky-Golay convolution smoothing (S-G), Multiplicative Scatter Correction (MSC), Moving Average smoothing (MA) and Normalization (Nor). The results are shown in Table 2. Among the LSSVM models built after each pre-processing method, the one built on MSC-processed spectra had the highest correlation coefficients for the correction set (Rc = 0.9847) and the prediction set (Rp = 0.9841) and the lowest root mean square errors for both sets (RMSEC = 1.5022, RMSEP = 1.1978). The spectra pre-processed by MSC were therefore selected for feature wavelength extraction and modeling.
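Two of the compared methods, SNV and MSC, can be sketched in a few lines. The sketch assumes spectra stored as rows of a NumPy array and, for MSC, uses the mean spectrum as the reference unless another one is supplied; these choices and the function names are assumptions, since the text does not give implementation details.

```python
import numpy as np

def snv(spectra):
    """Center and scale each spectrum by its own mean and standard deviation."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def msc(spectra, reference=None):
    """Regress each spectrum on a reference spectrum and remove the fitted
    additive (intercept) and multiplicative (slope) scatter effects."""
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for k, row in enumerate(X):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ slope * ref + intercept
        corrected[k] = (row - intercept) / slope
    return corrected

# Synthetic spectra that differ only by scatter-like scale and offset.
base = np.linspace(0.2, 0.8, 50) ** 2
X_demo = np.array([1.0 * base + 0.00,
                   1.3 * base + 0.05,
                   0.7 * base - 0.02])
X_msc = msc(X_demo)
X_snv = snv(X_demo)
```

Because the synthetic spectra differ only in scale and offset, MSC maps all of them onto the mean spectrum; real spectra would retain their chemical differences after correction.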
Table 2
LSSVM prediction models built using different pre-processing methods
Index | Pre-processing algorithm | γ | σ² | Rc (correction set) | RMSEC (correction set) | Rp (prediction set) | RMSEP (prediction set) | RPD |
TA | RAW | 14801.6 | 19576.8 | 0.9764 | 1.7868 | 0.9673 | 1.9571 | 5.3678 |
TA | SNV | 5435.9 | 9292.0 | 0.9768 | 1.7789 | 0.9708 | 1.9498 | 5.3883 |
TA | S-G | 8787.4 | 14592.9 | 0.9732 | 2.3320 | 0.9796 | 1.9391 | 4.8971 |
TA | MSC | 15652.8 | 6187.7 | 0.9847 | 1.5022 | 0.9841 | 1.1978 | 5.4138 |
TA | MA | 843.0 | 10110.1 | 0.9685 | 2.6269 | 0.9564 | 3.1221 | 3.0904 |
TA | Nor | 1823.8 | 5428.8 | 0.9769 | 1.7809 | 0.9707 | 1.9885 | 5.0944 |
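The evaluation indices in Table 2 can be computed from measured and predicted values as sketched below. The paper does not define RPD in this section; the sketch assumes the common definition, the standard deviation of the measured values divided by the root mean square error. The function name and the example numbers are illustrative.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Correlation coefficient, RMSE and RPD (assumed to be SD / RMSE)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    rpd = y_true.std(ddof=1) / rmse
    return r, rmse, rpd

# Illustrative measured and predicted total acid values (g/kg).
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
r, rmse, rpd = regression_metrics(measured, predicted)
```

Applied to the correction set the first two values correspond to Rc and RMSEC, and applied to the prediction set they correspond to Rp and RMSEP.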
2.5.3 Extraction of spectral feature wavelengths
Because the experimentally acquired hyperspectral reflectance images contain a large number of wavebands, with strong correlation and redundant information between them, spectral feature wavelengths were extracted to improve the prediction speed and accuracy of the model. Feature wavelengths were extracted from the pre-processed spectra using the Competitive Adaptive Reweighted Sampling (CARS) algorithm, the Successive Projections Algorithm (SPA) and the Uninformative Variable Elimination (UVE) algorithm, respectively. The process was as follows:
CARS algorithm: In this study, the number of Monte Carlo sampling runs was set to 50 and 5-fold cross-validation was used. As can be seen in Fig. 6, the RMSECV reached its minimum at the 28th sampling run, which is marked by the vertical line in Fig. 6; the regression coefficients of the variables at that run are those at the position of the vertical line.
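One ingredient of CARS that is easy to illustrate is the exponentially decreasing function that fixes how many variables survive each sampling run. The sketch below assumes the commonly cited form r_i = a·exp(−k·i) with a = (p/2)^(1/(N−1)) and k = ln(p/2)/(N−1), where p is the number of wavelengths and N the number of runs. It covers only this retention schedule, not the PLS modeling, adaptive reweighted sampling or cross-validation steps, and the function name is hypothetical.

```python
import numpy as np

def cars_retention_schedule(n_variables, n_runs):
    """Number of variables kept after each sampling run, assuming the
    exponentially decreasing function r_i = a * exp(-k * i)."""
    a = (n_variables / 2.0) ** (1.0 / (n_runs - 1))
    k = np.log(n_variables / 2.0) / (n_runs - 1)
    runs = np.arange(1, n_runs + 1)
    ratio = a * np.exp(-k * runs)           # fraction of variables retained
    return np.round(ratio * n_variables).astype(int)

# 439 wavelengths and 50 sampling runs, as used in this study.
kept = cars_retention_schedule(n_variables=439, n_runs=50)
```

Under this assumed schedule all 439 wavelengths are kept in the first run and only two remain in the last, so the run with the minimum RMSECV determines how aggressive the final selection is.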
SPA algorithm: The number of selected wavelength variables was allowed to range from 5 to 35, and the final number of feature variables was determined from the variation of RMSEC. The final number of feature wavelengths extracted by CARS-SPA was 10. The locations of the selected feature wavelengths in the original spectrum are shown in Fig. 7.
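The projection step at the core of SPA can be sketched as follows. The sketch builds one chain of variables from a given starting column by repeatedly projecting the remaining columns onto the subspace orthogonal to the last selected one and taking the column with the largest remaining norm. It omits the search over starting columns and the RMSEC-based choice of how many variables to keep; the function name and the toy matrix are illustrative.

```python
import numpy as np

def spa_chain(X, start, n_select):
    """Greedy chain of column indices with minimal collinearity."""
    P = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = P[:, selected[-1]].copy()
        # Remove from every column its component along the last selected column.
        P = P - np.outer(v, v @ P) / (v @ v)
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0              # never re-select a column
        selected.append(int(np.argmax(norms)))
    return selected

# Toy matrix: column 1 is almost a copy of column 0, so it should be skipped.
X_toy = np.array([[3.0, 2.9, 0.0, 0.0],
                  [0.0, 0.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
chain = spa_chain(X_toy, start=0, n_select=3)
```

In a full SPA run one such chain would be built for each starting wavelength and each candidate chain length, and the subset giving the lowest error would be kept.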
UVE algorithm: UVE was applied to extract valid information from the spectral data. Noise variables equal in number to the spectral variables were added to the spectral matrix, and the rejection threshold was set to 99% of the maximum absolute stability value of the noise variables. In Fig. 8, the yellow and red curves are the stability values of the spectral and noise variables, respectively; variables whose stability lies outside the two horizontal dashed lines (±45.54) carry useful information and were retained. The selection results are shown in Fig. 8.
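The thresholding rule can be sketched as below. The sketch assumes that a matrix of regression coefficients is already available, with one row per resampled model (for example from leave-one-out cross-validation) and the spectral variables in the first columns followed by the noise variables. Stability is taken as the mean of a coefficient divided by its standard deviation. How the coefficients are obtained is outside this sketch, and the function name and numbers are hypothetical.

```python
import numpy as np

def uve_select(coefs, n_spectral, cutoff_fraction=0.99):
    """Keep spectral variables whose |stability| exceeds a fraction of the
    largest |stability| found among the added noise variables."""
    B = np.asarray(coefs, dtype=float)
    stability = B.mean(axis=0) / B.std(axis=0, ddof=1)
    cutoff = cutoff_fraction * np.abs(stability[n_spectral:]).max()
    keep = np.flatnonzero(np.abs(stability[:n_spectral]) > cutoff)
    return keep, stability, cutoff

# Four resampled models, two spectral variables followed by two noise variables.
coefs = np.array([[5.0,  0.1,  0.2,  0.3],
                  [5.1, -0.1, -0.1,  0.1],
                  [4.9,  0.1,  0.1, -0.1],
                  [5.0, -0.1, -0.2,  0.1]])
keep, stability, cutoff = uve_select(coefs, n_spectral=2)
```

Here only the first spectral variable has a consistently large coefficient, so it is the only one retained; the value ±45.54 reported above plays the role of `cutoff` for the grape spectra.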
2.6 Determination of total acid in red globe grape samples
After the hyperspectral data had been collected, each sample was squeezed into grape juice and its total acid content was determined.