3.1 Analysis of sample Raman spectra
Fig. 1 shows the Raman spectra recorded from goat milk, adulterated goat milk and cow milk. The Raman peaks of the three sample types are so similar that they cannot be distinguished visually. The main Raman peaks of the milk samples lie in the range of 715 cm-1 to 1812 cm-1, and we assigned them with reference to the literature. The strongest peak in the Raman spectra of the three milk types, at approximately 1441 cm-1, is related to the -CH2 scissoring vibration. The breathing mode of ring compounds appears near 1004 cm-1 (El-Abassy et al. 2011). The Raman shift in the region between 1064 and 1081 cm-1 indicates the presence of fatty acids (Gallier et al. 2011). The peak at 1120 cm-1 corresponds to ν(C-C) stretching modes of saturated fatty acids (Amjad et al. 2018). The band at 1266 cm-1 is reported to be due to =C-H symmetric rocking, and the band around 1304 cm-1 may represent the -CH2 in-plane twist. The spectra also show major peaks at around 1656 cm-1, assigned to C=C cis double bond stretching, and at around 1750 cm-1, assigned to C=O ester-carbonyl stretching. The main Raman peak assignments are listed in Table 1. In particular, the bands at 1064-1081 cm-1, 1120 cm-1, 1304 cm-1 and 1750 cm-1 correspond to fat, the bands at 1004 cm-1 and 1656 cm-1 correspond to protein, and the band at 1441 cm-1 is attributed to fat and carbohydrate molecules (Gallier et al. 2011; McGoverin et al. 2010; Yaman 2020). This analysis shows that the Raman spectra of the milk samples are composed of fat, protein and carbohydrate peaks, with the fat and protein peaks predominant.
The similarity between the Raman spectra of goat milk and cow milk may be due to their similar chemical composition, especially the macromolecular compounds (fat, protein).
Table 1 Main Raman peak assignments of milk samples
Raman shift (cm-1) | Peak assignment
1004 | The breathing mode of the ring compounds
1064-1081 | C-C stretching, symmetric phosphoryl stretching
1120 | ν(C-C) stretching modes
1266 | =C-H symmetric rocking
1304 | -CH2 in-plane twist
1441 | -CH2 scissoring vibration
1656 | C=C cis double bond stretching
1750 | C=O ester-carbonyl stretching
However, it should be noted that there are still differences in the composition and structure of protein and fat between goat milk and cow milk (Turkmen 2017). One significant difference lies in the lipids. Fat globules in both goat milk and cow milk are approximately 1-10 µm in size, but about 60% of the fat globules in cow milk are smaller than 5 µm, compared with more than 80% in goat milk. The literature reports that different fat globule sizes lead to differences in Raman spectra (Gallier et al. 2011; Silanikove et al. 2010).
Another compositional difference between goat milk and cow milk is protein. Milk protein is divided into casein (αs1-casein, αs2-casein, β-casein and κ-casein) and whey protein (β-lactoglobulin and α-lactalbumin) (Saeys et al. 2005). Some studies have shown that the difference in protein composition between goat milk and cow milk lies mainly in the casein fraction; the casein composition and content of goat milk and cow milk are shown in Table 2 and Fig. 2. The major casein component in goat milk is β-casein (54.8%), whereas the major casein component in cow milk is αs1-casein (38.0%). αs1-Casein is also one of the caseins with the greatest concentration difference between the two milks, accounting for 38.0% of total casein in cow milk compared with 5.6% in goat milk. The β-casein content also differs noticeably: as can be seen from Fig. 2, β-casein accounts for only 36.0% of total casein in cow milk, but for as much as 54.8% in goat milk (Park 2006). Unfortunately, these compositional differences do not allow us to distinguish the Raman spectra of goat milk and cow milk by visual observation alone, let alone to distinguish goat milk from goat milk mixed with cow milk. To identify cow milk, goat milk and adulterated goat milk, Raman spectroscopy must therefore be combined with chemometric methods.
Table 2 Comparison of the composition and content of casein in cow milk and goat milk
Furthermore, it can be seen from Fig. 1 that repeated Raman spectra of goat milk (or of cow milk) do not coincide completely. These subtle differences may come from the random error of the system and from the inhomogeneity of the sample, with the latter possibly accounting for a considerable portion of the error. Since the laser spot is only about 1-2 µm across, comparable to or smaller than the fat globules (approximately 1-10 µm), the milk is not homogeneous within this limited spot area. To overcome the inhomogeneity of milk, multi-point spectral averaging of the same sample during identification and quantitation experiments might yield better results.
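The multi-point averaging described above can be sketched as follows. This is a minimal illustration with NumPy, not the authors' processing code: the function name `average_by_sample`, the sample labels and the toy intensities are all hypothetical, and it assumes each spectrum is stored as one row together with the ID of the sample it was measured on.

```python
import numpy as np

def average_by_sample(spectra, sample_ids):
    """Average all spectra that share a sample ID into one spectrum per sample."""
    spectra = np.asarray(spectra, dtype=float)
    sample_ids = np.asarray(sample_ids)
    unique_ids = np.unique(sample_ids)  # sorted unique sample labels
    averaged = np.vstack(
        [spectra[sample_ids == sid].mean(axis=0) for sid in unique_ids]
    )
    return unique_ids, averaged

# Toy data (hypothetical): two samples, two measurement points each,
# two Raman shifts per spectrum.
spectra = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
ids, mean_spectra = average_by_sample(
    spectra, ["goat_1", "goat_1", "cow_1", "cow_1"]
)
```

Each row of `mean_spectra` is then used as the single spectrum representing that sample in the later PCA and PLSR steps.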
3.2 Results of PCA analysis
Scatter plots of the PCA scores were drawn to visualize the differences between samples. In this experiment, each Raman spectrum (1023 variables) represents a sample, so each sample is a point in a 1023-dimensional space, in which sample differences cannot be observed directly. By extracting principal components, PCA reduces the high-dimensional sample data to three or fewer dimensions, where the samples can be shown in scatter plots.
Firstly, to investigate the effect of multi-point averaging of the same sample on the exploration of differences between cow and goat milk, we performed PCA on the averaged and the unaveraged data separately. As can be seen in Fig. 3, when unaveraged sample data were used, the differences between cow milk and goat milk were still not significant even though the three PCs with the highest cumulative explained variance were selected; the intra-group differences for samples of the same kind were large, even greater than the differences between groups. Subsequently, averaged data from three kinds of samples (cow milk, goat milk and 50% adulterated goat milk) were used for PCA. As can be seen from Fig. 4a, when the first two PCs (PC1, PC2) were selected, cow milk and goat milk showed significant differences in this two-dimensional space, and the samples were more tightly clustered than the unaveraged ones. This indicates that cow milk and goat milk do differ and that multi-point spectral averaging indeed reduced the error caused by sample inhomogeneity. Nevertheless, adulterated goat milk was not separated from the other two categories in the two-dimensional space composed of PC1 and PC2, although, according to Fig. 4b, these two PCs explained 85.4% of the total variance, which meets the general CPV (cumulative percent variance) requirement for PCA (Hu et al. 2019). Therefore, we selected more PCs (PC1, PC2, PC3), for which the cumulative variance explained reached 91.8%, meaning that these PCs already reflected most of the basic characteristics of the sample data. Fig. 4c shows that in the three-dimensional space composed of PC1, PC2 and PC3 the three types of samples were clearly separated from each other, particularly along the PC1 and PC3 axes. Indeed, as shown in Fig. 4d, the three types of samples were completely separated in the two-dimensional space composed of PC1 and PC3, even though the total variance explained by PC1 and PC3 was below the 70-85% generally required of the CPV for PCA. According to these results, there are differences not only between cow milk and goat milk, but also between each of them and adulterated goat milk, which is the basis for the quantitative experiments.
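The PCA step discussed above (projecting 1023-variable spectra onto a few PCs and checking the cumulative percent variance) can be illustrated with a short sketch. It assumes mean-centred PCA computed by singular value decomposition; the paper does not state which software or preprocessing was used, and the function name `pca_scores` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """Return PC scores, per-PC explained variance ratio and cumulative ratio (CPV)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)                  # variance explained by each PC
    scores = U[:, :n_components] * s[:n_components]
    return scores, ratio[:n_components], np.cumsum(ratio)[:n_components]

# Synthetic stand-in for averaged spectra: 3 classes x 10 samples x 1023 variables.
rng = np.random.default_rng(0)
group_means = rng.normal(size=(3, 1023))         # one synthetic mean spectrum per class
labels = np.repeat([0, 1, 2], 10)
X = group_means[labels] + 0.1 * rng.normal(size=(30, 1023))

scores, explained, cpv = pca_scores(X, n_components=3)
# scores[:, [0, 2]] gives the PC1/PC3 coordinates for a plot like Fig. 4d;
# cpv[1] and cpv[2] are the cumulative variance explained by 2 and 3 PCs.
```

Plotting any pair of score columns against each other, coloured by `labels`, gives the kind of scatter plot used to judge class separation.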
3.3 Results of PLS regression (PLSR)
To verify the importance of multi-point averaging for quantitation, models were built separately with the unaveraged and the averaged spectral data. To quantify the amount of adulteration, a partial least squares regression (PLSR) model over the full spectral range was established, using 80% of the samples as the training set. The remaining 20% of the samples were then used to validate the prediction performance of the established PLSR model. In addition, to select the appropriate number of PLS components, we followed the RMSE criterion: the number of PLS components that gives the smallest RMSE is taken as optimal (Li et al. 2021).
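The RMSE criterion for choosing the number of PLS components can be sketched as below. This is an illustrative NumPy implementation of single-response PLS (PLS1, NIPALS form), not the authors' software. It assumes the RMSE is evaluated on a held-out 20% of the samples, whereas the paper does not say whether a hold-out set or cross-validation was used for this step. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; return (regression coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        qk = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - qk * t                # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def select_components(X_train, y_train, X_val, y_val, max_components):
    """Return the component count with the smallest validation RMSE, plus the RMSE curve."""
    rmse_curve = []
    for k in range(1, max_components + 1):
        resid = y_val - pls1_predict(pls1_fit(X_train, y_train, k), X_val)
        rmse_curve.append(float(np.sqrt(np.mean(resid**2))))
    return int(np.argmin(rmse_curve)) + 1, rmse_curve

# Synthetic stand-in data: 50 "spectra" of 40 variables driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3))
X = latent @ rng.normal(size=(3, 40)) + 0.05 * rng.normal(size=(50, 40))
y = latent @ np.array([10.0, -5.0, 2.0]) + 50.0   # e.g. adulteration level in %

idx = rng.permutation(50)
train, val = idx[:40], idx[40:]                   # 80% / 20% split
best_k, rmse_curve = select_components(X[train], y[train], X[val], y[val], 8)
```

Plotting `rmse_curve` against the number of components gives a curve of the kind shown in Fig. 5, and `best_k` marks its minimum.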
As shown in Fig. 5, the RMSE is at its minimum when the number of PLS components is 11, so 11 components were selected. The PLSR model established with averaged sample data shows high prediction accuracy and good model stability. Fig. 6 and Fig. 7 show the evaluation results for the training (calibration) set and the test (prediction) set, respectively. As can be seen from Table 3, the model built with averaged spectral data (prediction-set RMSE = 3.82%, R2 = 0.9781) shows higher accuracy and lower prediction error than the model built with unaveraged spectral data (prediction-set RMSE = 6.32%, R2 = 0.9480). These results make clear that averaging multi-point spectra of the same sample is more suitable for goat milk quantification. Consequently, the model built with averaged data was adopted, and its performance is evaluated in detail below.
Table 3 Comparison of results for averaging or non-averaging of multi-point spectra
Data processing method | RMSEtrain | RMSEtest | R2train | R2test
Multi-point spectral non-averaging | 5.31% | 6.32% | 0.9621 | 0.9480
Multi-point spectral averaging | 2.82% | 3.82% | 0.9897 | 0.9781
We compared the R2 and RMSE of the training (calibration) set and the test (prediction) set and found that the quantitative model for adulterated goat milk struck a good balance between fitting and prediction, with R2 = 0.9897 and RMSE = 2.82% for calibration and R2 = 0.9781 and RMSE = 3.82% for prediction. In the plot of predicted against true values, the sample points lie close to the line Y = X, indicating that the predicted results are close to the actual values and that there is a good linear relationship between them. Besides, the RPD value is 6.8, which is greater than 3, indicating that the model has excellent prediction ability. These results imply that the model is stable, neither under-fitted nor over-fitted, and can be applied to quantify adulteration in goat milk.
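For reference, the three figures of merit used above can be computed as in the following sketch. The paper does not give its formulas, so the definitions here are assumptions: RMSE as the root mean squared residual, R2 as one minus the ratio of residual to total sum of squares, and RPD as the standard deviation of the reference values divided by the RMSE of prediction. The function names and toy values are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rpd(y_true, y_pred, ddof=1):
    """Ratio of the SD of the reference values to the prediction RMSE (assumed definition)."""
    return float(np.std(np.asarray(y_true, float), ddof=ddof) / rmse(y_true, y_pred))

# Toy adulteration levels in % (hypothetical values, not the paper's data)
y_true = [0.0, 10.0, 20.0, 30.0, 40.0]
y_pred = [1.0, 9.0, 21.0, 29.0, 41.0]

print(rmse(y_true, y_pred))          # 1.0
print(r2(y_true, y_pred))
print(rpd(y_true, y_pred, ddof=0))
```

With the population standard deviation (`ddof=0`), these definitions give RPD = 1/sqrt(1 - R2). Under that assumption, the reported prediction R2 of 0.9781 corresponds to an RPD of about 6.8, which matches the value quoted in the text.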
To further understand this quantitative model, we analyzed the PLS regression coefficients. The regression coefficient plot shows visually how much each independent variable (each Raman shift in the sample spectrum) contributes to explaining y (the adulteration concentration). In other words, a Raman shift with a larger regression coefficient (in absolute value) contributes more to explaining the dependent variable y, and one with a smaller coefficient contributes less (Šestak et al. 2022). The PLSR coefficient plot in Fig. 8 shows that the fat and protein peaks (fat peaks: 1304 cm-1, 1120 cm-1, 1064-1081 cm-1; protein peaks: 1004 cm-1, 1656 cm-1) play a dominant role in predicting the concentration of cow milk in goat milk, as indicated by the large absolute values of the coefficients at these peaks (McGoverin et al. 2010). This shows that the differences in fat and protein (composition and structure) between cow milk and goat milk are the main factor in the quantitative analysis of cow milk in goat milk. Raman spectroscopy combined with chemometrics thus provides further confirmation of earlier research findings on the differences between goat and cow milk, and it offers another research direction for subsequent studies of goat milk adulteration (Turkmen 2017).
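Reading the coefficient plot amounts to ranking Raman shifts by the absolute value of their regression coefficient. A minimal sketch of that ranking is given below; the function name `top_contributing_shifts` and the coefficient values are hypothetical placeholders chosen for illustration, not values taken from Fig. 8.

```python
import numpy as np

def top_contributing_shifts(raman_shifts, coefficients, n_top=5):
    """Return (shift, coefficient) pairs sorted by decreasing |coefficient|."""
    raman_shifts = np.asarray(raman_shifts, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    order = np.argsort(np.abs(coefficients))[::-1][:n_top]
    return [(float(raman_shifts[i]), float(coefficients[i])) for i in order]

# Hypothetical coefficients at five of the assigned band positions (cm-1)
shifts = [1004, 1120, 1304, 1441, 1656]
coefs = [0.2, -0.9, 0.5, 0.05, -0.4]

ranked = top_contributing_shifts(shifts, coefs, n_top=3)
# ranked == [(1120.0, -0.9), (1304.0, 0.5), (1656.0, -0.4)]
```

The sign of a coefficient only gives the direction of the association with adulteration level; the ranking by absolute value is what identifies the most influential bands.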