Chemical characteristics
The glycine content differed significantly among the selected beef cuts (Fig. 3). It was highest in LD and significantly lower in HQ (p<0.05) and FL (p<0.01). This may be related to the catabolism of free amino acids. When the energy supplied by free amino acids exceeds the energy required by the body, they enter the tricarboxylic acid cycle and carbohydrate metabolism, producing oxaloacetate (4C), citric acid (6C), cis-aconitic acid (6C), α-ketoglutaric acid (5C), fumaric acid (4C) and other intermediates. Free amino acids in HQ and FL are broken down for energy during exercise, whereas LD moves less, so more of its free amino acids participate in synthesis. This is consistent with the results for fatty acid content in Tan mutton [19].
The test samples (n = 360) were divided into calibration and prediction sets (Table 3). The best results were obtained with the SPXY method: the calibration and prediction sets had similar data distributions, indicating that glycine was distributed uniformly among the different muscles in this sample set.
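As a rough illustration of how an SPXY-style partition works, the sketch below selects calibration samples by their joint distance in spectral (X) and reference-value (y) space, following the usual Kennard-Stone selection rule. It is a minimal sketch and not the implementation used in this study; the function name `spxy_split` and its arguments are hypothetical.

```python
import math


def spxy_split(X, y, n_cal):
    """Return (calibration_indices, prediction_indices) by SPXY-style selection.

    X: list of spectra (lists of floats); y: list of reference values.
    Hypothetical helper for illustration only.
    """
    n = len(X)
    if not 2 <= n_cal <= n:
        raise ValueError("n_cal must be between 2 and the number of samples")
    # Pairwise distances in spectral space and in reference-value space.
    dx = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    dy = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]
    max_dx = max(max(row) for row in dx) or 1.0
    max_dy = max(max(row) for row in dy) or 1.0
    # Joint distance: each space is scaled by its maximum so both weigh equally.
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)]
         for i in range(n)]
    # Start from the two most distant samples.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda pair: d[pair[0]][pair[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    # Kennard-Stone step: add the sample farthest from its nearest selected one.
    while len(selected) < n_cal:
        best = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining
```

Because samples are picked to span both the spectral and the glycine range, the calibration set tends to cover the prediction set, which is in line with the similar distributions reported in Table 3.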
Spectral features
The average spectral curves of the three beef cuts in the 900–1700 nm range show that LD tended to have higher spectral reflectance than HQ and FL, while the overall trends of the three cuts were roughly the same (Fig. 4). This is similar to results obtained with near-infrared spectroscopy [24]. In the range of 1050–1400 nm, the spectral reflectance values of the three cuts differed. Beyond 1400 nm, the reflectance values were lower than 0.1 because of strong water absorption in the muscle [25]. Peaks (Fig. 4a) and valleys (Fig. 4b) occurred throughout the reflectance curve, mainly because chemical bonds absorb energy at specific wavelengths, with the spectral intensity determined by changes in the molecular dipole moment [26]. The band in the range of 1400–1700 nm was the water absorption band of the second and first overtones of the O–H bond, indicating that water accounted for the largest proportion of the beef muscle and absorbed most of the near-infrared radiation. Overtones of O–H, C–H, and N–H appeared at 1050 nm, 1200 nm, and 1400 nm, respectively.
Prediction models using full spectral range
Normalize, SNV, Detrend, Baseline, and MSC were used to preprocess the data, and PLSR models of the pretreated spectral reflectance were established to predict glycine content. Comparison of the RCV2/RP2 and RMSEC/RMSEP values of the PLSR models showed that data preprocessing gave good results (Fig. 5).
Baseline was found to be the best preprocessing method for the glycine prediction models in beef samples. The other preprocessing methods also improved the results, indicating that most preprocessing methods benefited the spectral data. Thus, selecting the right pretreatment method may increase the reliability of the prediction model. The spectral curves produced by the other preprocessing methods were similar to those of the Baseline method (Fig. 6).
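Two of the pretreatments named above, SNV and MSC, can be sketched as follows. This is a minimal sketch under common textbook definitions (SNV with the sample standard deviation, MSC regressed against the mean spectrum), which may differ in detail from the software used in this study; the function names are hypothetical, and constant spectra are assumed not to occur.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    mu, sd = mean(spectrum), stdev(spectrum)
    return [(v - mu) / sd for v in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n_bands = len(spectra[0])
    ref = [mean(s[k] for s in spectra) for k in range(n_bands)]
    ref_mu = mean(ref)
    ref_ss = sum((r - ref_mu) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mu = mean(s)
        # Least-squares fit of s = offset + slope * ref.
        slope = sum((r - ref_mu) * (v - s_mu) for r, v in zip(ref, s)) / ref_ss
        offset = s_mu - slope * ref_mu
        corrected.append([(v - offset) / slope for v in s])
    return corrected
```

Both functions remove additive and multiplicative scatter effects, so spectra that differ only by such effects collapse onto a common curve.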
Selection of effective wavelengths
In this study, iVISSA, CARS and UVE were used to extract effective feature bands from the full band range, and the best-performing feature wavelength extraction method was then selected through modeling.
The iVISSA algorithm extracted characteristic wavelengths through two iterative cycles of global and local search and finally obtained a set of optimized effective wavelengths. The iVISSA parameters were set as follows: the maximum number of principal components was 20, half-fold cross-validation was adopted, "Center" was selected as the data processing method, the number of binary matrix samples was 500, and the optimal subset ratio was 0.1. During the iterations, the sampling weight of each wavelength varied with the number of iterations and ranged from 0 to 1 (Fig. 7a). After 10 iterations, the sampling weights were basically stable, and a total of 46 characteristic wavelengths were extracted. The 46 characteristic wavelengths selected by the iVISSA method were widely and uniformly distributed across the whole waveband, indicating that the iVISSA algorithm preserved the synergistic effect between wavelengths well (Fig. 7b).
When the CARS method was used to extract characteristic wavelengths, the variable subset with the smallest RMSECV value was selected as the optimal subset. During the running process, the number of variables decreased exponentially with the number of runs (Fig. 8a). RMSECV first decreased slowly, reached a minimum at the 275th sampling run, and then increased as the number of retained variables fell and some valid information was removed, reducing model accuracy (Fig. 8b). The regression coefficient paths differed between runs, reflecting the changes in the regression coefficients (Fig. 8c). The vertical line marked the minimum RMSECV value, which was 220 runs. The 19 extracted characteristic wavelengths were mainly distributed between 1100–1165 nm and 1200–1400 nm. The selected characteristic wavelengths were few but uniformly distributed, and thus preserved some of the synergistic effects between wavelengths (Fig. 8d).
The UVE method preserves informative variables, reduces the dimensionality of the bands, and improves the prediction accuracy of the model. To the left of the vertical line were the 233 wavelength variables under full-spectrum conditions, and to the right were 117 random variables (Fig. 9a). Two horizontal dashed lines separated useful variables (outside the dashed lines) from useless variables (inside the dashed lines). The 84 characteristic wavelengths extracted by the UVE method were evenly distributed, mostly in the wavelength range, and the combined effect of wavelengths was observed (Fig. 9b).
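The exponential decrease in the number of variables can be illustrated with the exponentially decreasing function commonly described for CARS, in which the retained ratio at run i is a·exp(−k·i), chosen so that all variables are kept in the first run and two in the last. The sketch below covers only this schedule, not the adaptive reweighted sampling or PLS steps; the function name and the number of runs used in the example are assumptions, not values from this study.

```python
import math


def cars_retained_counts(n_vars, n_runs):
    """Number of variables kept at each run under an exponentially decreasing ratio."""
    if n_vars < 2 or n_runs < 2:
        raise ValueError("need at least 2 variables and 2 runs")
    # Constants chosen so the ratio is 1 at run 1 and 2 / n_vars at the last run.
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = math.log(n_vars / 2) / (n_runs - 1)
    return [max(2, round(n_vars * a * math.exp(-k * i)))
            for i in range(1, n_runs + 1)]


# Example with the 233 full-spectrum variables and a hypothetical 50 runs.
counts = cars_retained_counts(233, 50)
```

The schedule removes many variables in the early runs and few in the late runs, which matches the fast-then-slow decline shown for the variable count.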
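The two horizontal dashed lines correspond to a stability cutoff derived from the appended random variables. A minimal sketch of that thresholding step follows, assuming the regression coefficients from repeated resampled fits are already available and that the cutoff is the largest absolute stability among the random variables (some implementations use a fraction of it). The function name and input layout are hypothetical.

```python
from statistics import mean, stdev


def uve_select(coef_runs, n_real):
    """Return indices of real variables whose stability exceeds the noise cutoff.

    coef_runs: one coefficient vector per resampling run; the first n_real
    entries belong to real wavelengths, the rest to appended random variables.
    """
    n_total = len(coef_runs[0])
    stability = []
    for j in range(n_total):
        column = [run[j] for run in coef_runs]
        # Stability: mean coefficient divided by its standard deviation.
        stability.append(mean(column) / stdev(column))
    # Cutoff: the most "stable" random variable defines the noise level.
    cutoff = max(abs(s) for s in stability[n_real:])
    return [j for j in range(n_real) if abs(stability[j]) > cutoff]
```

Variables whose stability lies between the positive and negative cutoff behave no better than noise and are discarded.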
Prediction models using effective wavelengths
The three models, PLSR, LSSVM, and SFCN, built on feature wavelengths extracted by the different algorithms could all effectively predict glycine content in beef samples, and the models for all sets of characteristic wavelengths showed good results (Table 4). In the full-band modeling of beef glycine content, the prediction set performance of the PLSR method was 0.0481 higher than that of the LSSVM method, indicating that the linear PLSR model performed better than the nonlinear LSSVM model and that the linear relationship between glycine content and spectral absorption in beef samples was stronger than the nonlinear relationship. The prediction set performance of the SFCN method was 0.0423 higher than that of the PLSR method, indicating that SFCN had more potential for predicting glycine content than the traditional models.
The number of bands retained by all feature wavelength processing methods ranged from 67.18% to 92.57%, which could avoid data redundancy and eliminate useless information. Models built on characteristic wavelengths extracted by the different methods performed better than the full-band model. The model built after extracting characteristic wavelengths with the UVE method achieved the best results, with RP2 being 0.0298 higher than that of the full-band model, indicating that extracting characteristic wavelengths retained useful information and improved model accuracy. Jiangbo et al. studied the hardness of different pear varieties based on Vis-NIR spectroscopy [27] and obtained an optimal UVE-SPA-LSSVM multi-variety hardness prediction model, with RP2, RMSEP and RPDP values of 0.94, 0.91, and 2.93, respectively. Our results likewise suggested that the characteristic wavelengths selected by UVE were more accurate and effective. In contrast, among the three feature wavelength extraction methods, CARS performed slightly worse, which was related to the smaller number of feature bands extracted and the elimination of part of the effective information.
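For reference, the three evaluation indices used to compare the models can be computed as sketched below. The sketch assumes the coefficient of determination form of R² and the ratio of the reference standard deviation to RMSE for RPD; some studies instead report the squared correlation coefficient, so the definitions here are an assumption rather than those confirmed for this work.

```python
import math
from statistics import mean, stdev


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and RPD for paired reference and predicted values."""
    n = len(y_true)
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1 - ss_res / ss_tot,        # coefficient of determination
        "rmse": rmse,                     # root mean square error
        "rpd": stdev(y_true) / rmse,      # residual predictive deviation
    }
```

Applied to the prediction set, these give RP2, RMSEP and RPDP; applied to the calibration or cross-validation results, they give the corresponding calibration indices.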
Prediction models using combinations of spectral and textural features
The choice of the optimal grayscale image had a critical influence on the optimal texture features. Principal component analysis was performed on the beef sample images, and the cumulative contribution rate of the first three principal component images was 92.12% (Fig. 10). PC1, PC2 and PC3 explained 89.67%, 3.69% and 0.76% of the image information, respectively, suggesting that the texture information of the first principal component could represent the sample. Therefore, the GLCM method was used to extract the texture features contrast (CONT), homogeneity (HOM), energy (ENG), and correlation (CORR) in four directions (0°, 45°, 90°, and 135°) from the PC1 images. In the Pearson correlation plot of the texture parameters and glycine content, positive correlations are shown in red and negative correlations in blue; the darker the color, the stronger the correlation. HOM and ENG were positively correlated, and CONT and CORR negatively correlated, with glycine (GLY) content (Fig. 11). The correlation coefficients were numerically similar, with the highest correlation, 0.47, between HOM and glycine content. This indicated that there was a certain correlation between texture information and glycine content, which is worthy of further study.
PLSR, LSSVM and SFCN models were established based on the fusion of the texture information extracted from the PC1 image with the optimal spectral values (Table 5). Compared with the optimal wavelength model, the RP2 values of the PLSR and LSSVM prediction models increased by 0.04 and 0.05, respectively, indicating that the linear and nonlinear relationships between glycine content and the fused information were enhanced to different degrees. The SFCN model established by fusing spectral and textural information had the best performance (RP2 = 0.9005, RMSEP = 0.3075, RPDP = 0.2688). Its RP2 was 0.0303 and 0.0666 higher than those of the other two modeling methods on the fused data, indicating that the SFCN method retained better predictive ability after the textural and spectral information of beef glycine was fused. In summary, the fusion of textural and spectral information can describe beef quality information more effectively. This may be because the texture image of beef muscle reflects the arrangement or density of muscle fibers, which is related to the amino acid content of beef.
At present, deep learning models commonly outperform traditional modeling methods. In red meat recognition and identification, a deep 3D-CNN model outperformed PLS-DA and SVM models, and the highest recognition accuracy reached 98.6% across different spectral systems [28]. In a graphical framework built for visualizing salmon fillet classification, the best model, built from CNN feature descriptors, outperformed HOG features and achieved a high correct classification rate (CCR) of 0.927 for cross-validation and 0.916 for the external test data set, showing that CNN was more effective and suitable than traditional methods for fish muscle opening detection to ensure product quality [29]. Possible reasons for these results are as follows: the SFCN prediction model established by the deep learning method performed better than the traditional modeling methods; the spectral values measured by different spectral systems may have affected model training and fitting; and differences in breeds and muscle types may have led to different results.
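The four GLCM descriptors can be computed from a quantized grayscale image as in the minimal sketch below. It assumes a pixel distance of 1, a symmetric and normalized co-occurrence matrix, and energy defined as the square root of the angular second moment; these settings and the function name are assumptions, since the study does not state them. The image is assumed to be already quantized to integer gray levels and not constant.

```python
import math

# Row/column offsets of the neighbouring pixel for each direction (distance 1).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm_features(image, levels, angle):
    """Return CONT, HOM, ENG and CORR for one direction of a gray-level image."""
    dr, dc = OFFSETS[angle]
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both orders: symmetric matrix
    total = sum(map(sum, counts))
    cells = [(i, j, counts[i][j] / total)
             for i in range(levels) for j in range(levels)]
    mu_i = sum(i * p for i, j, p in cells)
    mu_j = sum(j * p for i, j, p in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p for i, j, p in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p for i, j, p in cells))
    return {
        "CONT": sum((i - j) ** 2 * p for i, j, p in cells),
        "HOM": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "ENG": math.sqrt(sum(p * p for i, j, p in cells)),
        "CORR": sum((i - mu_i) * (j - mu_j) * p for i, j, p in cells)
                / (sd_i * sd_j),
    }
```

Calling the function once per direction (0, 45, 90, 135) on a PC1 image yields the 16 texture values that can then be correlated with glycine content.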
Visualization of glycine content prediction
On the basis of the model fusing optimal spectral and textural information, the specific model coefficients were applied by taking the dot product of the coefficients and the spectrum of each pixel to obtain the predicted value of that pixel in the image. Displaying the predicted values of all pixels gave a visualization of the distribution of glycine content in the beef samples (Fig. 12). The color bar showed the different concentration values; the glycine concentration ranged from 1.30 g/100 g to 4.97 g/100 g, and the color was not uniform, which may be caused by the uneven distribution of glycine. Some of the abrupt blue and red areas may be caused by fine fascia and adipose tissue that were not completely removed from the sample or by water loss during the experiment. After pseudo-color treatment, the glycine content in different areas of the samples can be expressed clearly and intuitively. Compared with traditional prediction methods, this can help consumers judge the quality of beef quickly, directly, and non-destructively. It can be said that these studies were effective and necessary.
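The pixel-wise prediction step can be sketched as a dot product between a coefficient vector and the spectrum of each pixel, followed by scaling to the 0–1 range that a pseudo-color map expects. This minimal sketch assumes a linear model with an intercept; the function names, the cube layout (rows × columns × bands), and the linear form are illustrative assumptions and do not reproduce the fused SFCN model.

```python
def predict_map(cube, coefficients, intercept=0.0):
    """Return a 2D map of predicted values, one per pixel spectrum."""
    n_bands = len(coefficients)
    result = []
    for row in cube:
        out_row = []
        for pixel in row:
            if len(pixel) != n_bands:
                raise ValueError("pixel spectrum and coefficients differ in length")
            # Dot product of model coefficients and the pixel spectrum.
            out_row.append(intercept + sum(b * v for b, v in zip(coefficients, pixel)))
        result.append(out_row)
    return result


def scale_for_colormap(value_map):
    """Rescale predicted values to [0, 1] for display with a pseudo-color map."""
    flat = [v for row in value_map for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a uniform map
    return [[(v - lo) / span for v in row] for row in value_map]
```

The scaled map can be passed to any plotting tool with a color bar labeled in g/100 g to produce a distribution image of the kind shown in Fig. 12.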