3.1. Correlation analysis between spectral reflectance and oleic acid
The original and first derivative spectral reflectance of B. napus with different oleic acid content at the seedling, bolting, and flowering stages were correlated with the corresponding seed oleic acid content (Fig. 2). Figure 2A shows that, at the seedling stage, the correlation between spectral reflectance and seed oleic acid content was significant in many wavebands and extremely significant in some, such as 418–495 nm and 2050–2382 nm. In contrast, the correlation at the bolting stage was significant only at 1868 nm, and no significant correlation was found at the flowering stage.
The correlation coefficient between the first derivative spectrum and seed oleic acid content at the three growth stages was slightly higher than that of the original spectrum in some wavebands (Fig. 2B). However, the correlation coefficient varied widely and was unstable across wavelengths. More wavebands were significant at the seedling stage than at the other growth stages. Therefore, based on the correlation results for both the original and first derivative spectral reflectance, the seedling stage was selected for subsequent modeling and analysis.
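The band-by-band correlation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the data are synthetic, the first derivative is approximated with a finite difference (`np.gradient`), and the function names `first_derivative` and `bandwise_correlation` are hypothetical.

```python
import numpy as np

def first_derivative(reflectance, wavelengths):
    """Finite-difference first derivative of each spectrum (rows = samples)."""
    return np.gradient(reflectance, wavelengths, axis=1)

def bandwise_correlation(spectra, oleic):
    """Pearson r between every waveband (column) and seed oleic acid content."""
    xc = spectra - spectra.mean(axis=0)
    yc = oleic - oleic.mean()
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (xc * yc[:, None]).sum(axis=0) / denom

# Synthetic stand-in data (assumption): 40 samples, 350-2500 nm in 1 nm steps.
rng = np.random.default_rng(0)
wavelengths = np.arange(350.0, 2501.0)
oleic = rng.uniform(56.0, 85.0, size=40)
reflectance = rng.uniform(0.05, 0.6, size=(40, wavelengths.size))
# Make one band depend on oleic acid so that it stands out in the result.
reflectance[:, 100] = 0.2 + 0.004 * oleic + rng.normal(0.0, 0.01, size=40)

r_orig = bandwise_correlation(reflectance, oleic)
r_deriv = bandwise_correlation(first_derivative(reflectance, wavelengths), oleic)
```

Each entry of `r_orig` and `r_deriv` corresponds to one wavelength, so the two arrays play the role of the curves in Fig. 2A and Fig. 2B; significance thresholds would then be applied according to the sample size.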
3.2. Spectral characteristics analysis of rape at different growth stages
The original and first derivative spectral reflectance curves of high oleic acid B. napus corresponding to the maximum and minimum seed oleic acid content at different growth stages are shown in Fig. 3. The reflectance curve of B. napus follows the typical trend of plant leaves, with a marked "double peak" feature, i.e., reflection peaks appear near 1119 nm and 1305 nm, and an absorption valley appears near 1209 nm. Furthermore, the original spectral reflectance curves for the maximum and minimum oleic acid values follow almost the same trend. At 350–700 nm and 800–1800 nm, the reflectivity of low oleic acid rape is generally higher than that of high oleic acid rape, while the opposite holds at 1900–2500 nm. Between 700 nm and 800 nm, the reflectivity of high and low oleic acid rape is almost the same, so this range was excluded from the subsequent band selection.
3.3. Estimation model of oleic acid based on characteristic wavelength
The characteristic wavelengths useful for estimating oleic acid content, i.e., those with significant correlation in the original spectrum and the first derivative of high oleic acid rapeseed seedlings, were selected for subsequent analysis (Table 1). In the univariate model based on the original spectrum, only the 2283 nm band had a coefficient of determination R² (0.59) greater than 0.55. In the univariate model based on the first derivative, only the 929 nm band had an R² (0.69) greater than 0.55. Finally, in the binary model combining the original spectrum and the first derivative, the R² of the 929 nm, 1121 nm, and 2283 nm bands (0.78, 0.58, and 0.59, respectively) was greater than 0.55. Therefore, these characteristic wavelength models can be used for the early prediction of oleic acid.
Table 1
Estimation of oleic acid content in B. napus seeds based on the full wavelength of the seedling stage
| Characteristic wavelength (nm) | Original spectrum | First derivative | Original spectrum and first derivative |
| --- | --- | --- | --- |
| 436 | y = -312.42x + 90.79; R² = 0.32 | y = 1.18×10⁵x + 87.51; R² = 0.15 | y = -418.52x1 - 71864.34x2 + 86.65; R² = 0.34 |
| 664 | y = -385.28x + 91.31; R² = 0.29 | y = 76.35x + 68.04; R² = 0.01 | y = -402.86x1 - 13637.08x2 + 84.45; R² = 0.31 |
| 929 | y = 88.52x + 28.59; R² = 0.17 | y = 5.59×10³x + 75.56; R² = 0.69* | y = 64.57x1 + 5299.48x2 + 46.42; R² = 0.78* |
| 1121 | y = 101.72x + 23.48; R² = 0.23 | y = 1.21×10⁴x + 72.75; R² = 0.34 | y = 102.73x1 + 12222.2x2 + 27.82; R² = 0.58* |
| 1358 | y = 161.67x + 18.38; R² = 0.45 | y = -741.09x + 68.03; R² = 0.08 | y = 156.87x1 - 594.55x2 + 19.87; R² = 0.51 |
| 2283 | y = 293.78x + 47.57; R² = 0.59* | y = -7.18×10³x + 65.63; R² = 0.17 | y = 281.03x1 - 1246.04x2 + 48.04; R² = 0.59* |
Note: x corresponds to the original spectrum or first derivative; y corresponds to the content of oleic acid in seeds; x1 corresponds to the original spectrum; x2 corresponds to the first derivative.
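As an illustration of how the univariate and binary models in Table 1 might be fitted, the sketch below uses ordinary least squares on synthetic data. The fitting procedure itself is an assumption (the section reports only the resulting equations), `fit_linear` is a hypothetical helper, and the 929 nm coefficients are borrowed from Table 1 only to generate plausible stand-in values.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with intercept; returns (coefficients, intercept, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    A = np.column_stack([X, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return beta[:-1], beta[-1], r2

# Synthetic stand-in data (assumption), not the measured spectra.
rng = np.random.default_rng(1)
x1 = rng.uniform(0.3, 0.5, size=60)      # reflectance at one wavelength
x2 = rng.uniform(-2e-3, 2e-3, size=60)   # first derivative at the same wavelength
y = 64.57 * x1 + 5299.48 * x2 + 46.42 + rng.normal(0.0, 1.5, size=60)

_, _, r2_orig = fit_linear(x1, y)                                    # y = a*x + b
_, _, r2_deriv = fit_linear(x2, y)                                   # y = a*x + b
coef, intercept, r2_both = fit_linear(np.column_stack([x1, x2]), y)  # y = a*x1 + b*x2 + c
```

Because the binary model contains each univariate model as a special case, its R² on the fitting data is never lower than either univariate R², which matches the pattern in the last column of Table 1.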
3.4. Correlation analysis between the spectral index and oleic acid
Fig. 4 shows the coefficient of determination (R²) between seed oleic acid content and the spectral indices built from any two wavelengths of the original spectral reflectance of rape leaves at the seedling stage. The correlations of NDSI and RSI with seed oleic acid content are similar, i.e., a wavelength combination with a high R² for the normalized index also has a high R² for the ratio index. Moreover, the wavelength combinations with a large R² fall mostly in the near-infrared region, which indicates that near-infrared wavelengths perform better in estimating the oleic acid content of seeds.
Among the spectral indices constructed from any two wavelengths, the five wavelength combinations with the highest coefficient of determination with seed oleic acid content were selected for each index type (NDSI, DSI, and RSI), and a linear regression model based on each optimal spectral index was established. The model accuracy was externally tested using the validation set data (Table 2). Table 2 shows that among the 15 selected wavelength combinations, the R²C of 11 combinations was higher than 0.8; the exceptions were NDSI (699,688), NDSI (817,816), NDSI (1831,436), and RSI (629,628). Among those 11 combinations, the RMSEC of DSI (855,837), DSI (1130,1126), and RSI (1358,703) was higher than 1, i.e., their prediction accuracy was poor, so they were eliminated. Among the eight remaining models, the RPD of NDSI (793,792) and DSI (1358,710) was lower than 1.5; thus, they have low predictive power and were eliminated. Finally, NDSI (915,913), DSI (793,792), DSI (915,913), RSI (764,754), RSI (817,816), and RSI (915,913) were selected for subsequent optimization and verification.
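The two-band index maps of Fig. 4 can be sketched as below. The index definitions are assumed to be the conventional ones, NDSI = (Ri - Rj)/(Ri + Rj), RSI = Ri/Rj, and DSI = Ri - Rj, since this section does not restate them; the data are synthetic and `index_r2_map` is a hypothetical name. Under these definitions NDSI = (RSI - 1)/(RSI + 1), a monotonic function of RSI, which is one plausible reason why the two maps look alike.

```python
import numpy as np

def index_r2_map(reflectance, oleic, kind):
    """R^2 between a two-band index and oleic acid content for every band pair (i, j)."""
    ri = reflectance[:, :, None]   # samples x bands x 1
    rj = reflectance[:, None, :]   # samples x 1 x bands
    if kind == "NDSI":
        idx = (ri - rj) / (ri + rj)
    elif kind == "RSI":
        idx = ri / rj
    elif kind == "DSI":
        idx = ri - rj
    else:
        raise ValueError(f"unknown index type: {kind}")
    ic = idx - idx.mean(axis=0)
    yc = oleic - oleic.mean()
    cov = np.einsum("sij,s->ij", ic, yc)
    var = (ic ** 2).sum(axis=0) * (yc ** 2).sum()
    # The diagonal (i == j) is a constant index, so its R^2 is undefined (NaN).
    with np.errstate(invalid="ignore", divide="ignore"):
        return cov ** 2 / var

# Synthetic stand-in data (assumption): 30 samples, 40 bands.
rng = np.random.default_rng(2)
oleic = rng.uniform(56.0, 85.0, size=30)
reflectance = rng.uniform(0.1, 0.6, size=(30, 40))

r2_ndsi = index_r2_map(reflectance, oleic, "NDSI")
r2_rsi = index_r2_map(reflectance, oleic, "RSI")
r2_dsi = index_r2_map(reflectance, oleic, "DSI")

# Band pair with the highest R^2 for NDSI (indices into the band axis).
best_i, best_j = np.unravel_index(np.nanargmax(r2_ndsi), r2_ndsi.shape)
```

For full-resolution spectra the same computation would normally be done in chunks of band pairs to limit memory use.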
Table 2
Estimation of the oleic acid content of B. napus based on the spectral index
| Spectral index | Estimation model | R²C | RMSEC (%) | R²V | RMSEV (%) | RPD |
| --- | --- | --- | --- | --- | --- | --- |
| NDSI (699,688) | y = 13.03×10¹x + 24.49 | 0.37 | 4.36 | 0.22 | 5.12 | 0.83 |
| NDSI (793,792) | y = -63.42×10²x + 77.22 | 0.85 | 0.16 | 0.83 | 0.17 | 0.06 |
| NDSI (817,816) | y = -34.52×10²x + 67.30 | 0.29 | 0.17 | 0.86 | 0.19 | 0.13 |
| NDSI (915,913) | y = -26.94×10³x + 61.14 | 0.87 | NA | 0.80 | NA | 2.29 |
| NDSI (1831,436) | y = 36.47x + 66.50 | 0.77 | 9.43 | 0.31 | 2.14 | 1.20 |
| DSI (793,792) | y = -69.11×10³x + 77.05 | 0.84 | 0.01 | 0.88 | NA | 2.80 |
| DSI (855,837) | y = -97.04×10²x + 69.42 | 0.86 | 7.51 | 0.81 | 7.50 | 0.01 |
| DSI (915,913) | y = -28.03×10³x + 61.40 | 0.87 | NA | 0.80 | NA | 2.24 |
| DSI (1130,1126) | y = -65.42×10²x + 44.64 | 0.85 | 3.57 | 0.57 | 0.72 | 0.13 |
| DSI (1358,710) | y = 28.63×10¹x + 51.16 | 0.83 | 0.74 | 0.62 | 1.76 | 1.49 |
| RSI (629,628) | y = -20.39×10³x + 20431 | 0.78 | 1.04 | 0.53 | 0.04 | 1.01 |
| RSI (764,754) | y = -87.91×10²x + 972.33 | 0.84 | 0.33 | 0.80 | 0.37 | 2.20 |
| RSI (817,816) | y = -14.20×10³x + 14278 | 0.87 | 0.02 | 0.86 | 0.02 | 2.69 |
| RSI (915,913) | y = -13.47×10³x + 13535 | 0.87 | 0.03 | 0.80 | 0.03 | 1.81 |
| RSI (1358,703) | y = 44.01x - 5.93 | 0.81 | 8.13 | 0.51 | 4.53 | 1.37 |
Note: x corresponds to the spectral index; y corresponds to the content of oleic acid in seeds; NA indicates values below 0.01.
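The three-stage screening described in Section 3.4 can be reproduced directly from the Table 2 values. The sketch below is only an illustration of that filtering logic; the tuple layout and the name `screen_models` are hypothetical, and NA entries are entered as 0.0 because the table note states that NA marks values below 0.01.

```python
# (spectral index, R2C, RMSEC, RPD) copied from Table 2; NA is entered as 0.0.
TABLE2 = [
    ("NDSI (699,688)", 0.37, 4.36, 0.83),
    ("NDSI (793,792)", 0.85, 0.16, 0.06),
    ("NDSI (817,816)", 0.29, 0.17, 0.13),
    ("NDSI (915,913)", 0.87, 0.0, 2.29),
    ("NDSI (1831,436)", 0.77, 9.43, 1.20),
    ("DSI (793,792)", 0.84, 0.01, 2.80),
    ("DSI (855,837)", 0.86, 7.51, 0.01),
    ("DSI (915,913)", 0.87, 0.0, 2.24),
    ("DSI (1130,1126)", 0.85, 3.57, 0.13),
    ("DSI (1358,710)", 0.83, 0.74, 1.49),
    ("RSI (629,628)", 0.78, 1.04, 1.01),
    ("RSI (764,754)", 0.84, 0.33, 2.20),
    ("RSI (817,816)", 0.87, 0.02, 2.69),
    ("RSI (915,913)", 0.87, 0.03, 1.81),
    ("RSI (1358,703)", 0.81, 8.13, 1.37),
]

def screen_models(rows, min_r2c=0.8, max_rmsec=1.0, min_rpd=1.5):
    """Apply the three screening criteria from the text in sequence."""
    stage1 = [r for r in rows if r[1] > min_r2c]       # calibration R^2 above 0.8
    stage2 = [r for r in stage1 if r[2] <= max_rmsec]  # drop RMSEC higher than 1
    stage3 = [r for r in stage2 if r[3] >= min_rpd]    # drop RPD lower than 1.5
    return stage1, stage2, stage3

stage1, stage2, stage3 = screen_models(TABLE2)
selected = [name for name, *_ in stage3]
```

With the tabulated values the stages keep 11, 8, and 6 models, and `selected` contains the six indices named in the text.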
3.5. Model optimization and verification
To further reduce the error of the above formulas and obtain a more accurate prediction model, the characteristic parameters corresponding to oleic acid contents of 56%–85%, at intervals of 1%, were substituted into the above 11 formulas as independent variables. After the fitted values were obtained, the characteristic parameter and estimation model giving the minimum difference from the actual oleic acid content at each percentage point were selected as the final model for that point. The results are shown in Table 3.
Table 3
The best simulation formula corresponding to different oleic acid content with the minimum difference value
| Oleic acid content (%) | Characteristic parameter | Estimation model | Minimum difference value (%) |
| --- | --- | --- | --- |
| 56 | DSI (1358,710) | y = 28.63×10¹x + 51.16 | 0.21 |
| 57 | NDSI (699,688) | y = 13.03×10¹x + 24.49 | 0.16 |
| 58 | DSI (793,792) | y = -69.11×10³x + 77.05 | 0.06 |
| 59 | DSI (915,913) | y = -28.03×10³x + 61.40 | 0.05 |
| 60 | DSI (855,837) | y = -97.04×10²x + 69.42 | 0.38 |
| 61 | DSI (915,913) | y = -28.03×10³x + 61.40 | 0.39 |
| 62 | 2283 | y = 293.78x + 47.57 | 0.91 |
| 63 | DSI (915,913) | y = -28.03×10³x + 61.40 | 0.69 |
| 64 | 929' | y = 5.59×10³x + 75.56 | 0.28 |
| 65 | DSI (793,792) | y = -69.11×10³x + 77.05 | 0.21 |
| 66 | NDSI (699,688) | y = 13.03×10¹x + 24.49 | 0.23 |
| 67 | DSI (1358,710) | y = 28.63×10¹x + 51.16 | 1.26 |
| 68 | RSI (1358,703) | y = 44.01x - 5.93 | 0.17 |
| 69 | NDSI (699,688) | y = 13.03×10¹x + 24.49 | 0.12 |
| 70 | DSI (1358,710) | y = 28.63×10¹x + 51.16 | 0.27 |
| 71 | DSI (793,792) | y = -69.11×10³x + 77.05 | 0.32 |
| 72 | DSI (793,792) | y = -69.11×10³x + 77.05 | 0.49 |
| 73 | DSI (915,913) | y = -28.03×10³x + 61.40 | 0.16 |
| 74 | RSI (1358,703) | y = 44.01x - 5.93 | 0.06 |
| 75 | DSI (855,837) | y = -97.04×10²x + 69.42 | 0.29 |
| 76 | DSI (915,913) | y = -28.03×10³x + 61.40 | 0.05 |
| 77 | DSI (855,837) | y = -97.04×10²x + 69.42 | 0.40 |
| 78 | NDSI (699,688) | y = 13.03×10¹x + 24.49 | 0.24 |
| 79 | DSI (915,913) | y = -28.03×10³x + 61.40 | 0.15 |
| 80 | DSI (793,792) | y = -69.11×10³x + 77.05 | 0.92 |
| 81 | DSI (915,913) | y = -28.03×10³x + 61.40 | 0.75 |
| 82 | RSI (1358,703) | y = 44.01x - 5.93 | 0.35 |
| 83 | DSI (1358,710) | y = 28.63×10¹x + 51.16 | 0.96 |
| 84 | DSI (855,837) | y = -97.04×10²x + 69.42 | 0.87 |
| 85 | NDSI (699,688) | y = 13.03×10¹x + 24.49 | 0.66 |
Note: y is the predicted oleic acid value; x is the corresponding characteristic parameter; x1 is the reflectivity at the corresponding characteristic wavelength; x2 is the first derivative of the reflectivity at that wavelength; the prime (') denotes the first derivative at the characteristic wavelength.
The complete workflow used to optimize and verify the models has seven steps: 1) data acquisition; 2) calculation of the 11 characteristic parameters related to early prediction; 3) substitution of the parameters into the 25 formulas for 56%–85% in Table 3; 4) calculation of the 25 oleic acid predictions; 5) comparison of the 25 predictions with the nominal oleic acid content of each formula to verify whether they are consistent; 6) if the comparison in step 5 is consistent, the result is output; if not, step 5 is repeated; 7) the output value is taken as the best prediction of oleic acid content. The specific application method of this model is shown in Fig. 5.
3.6. Method and accuracy test of model application
To verify the above method and further test the model's accuracy, we measured the seedling spectra of 50 materials with different oleic acid levels planted in Liuyang the following year. The predicted and measured oleic acid contents are plotted against the 1:1 line in Fig. 6. In addition, the RMSE relative to the ideal prediction linear model (y = x) was 0.792, indicating that the model is reliable. Thus, it can be applied to the early screening of high oleic acid lines in rapeseed breeding.
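The RMSE relative to the 1:1 line is the root mean square of the differences between predicted and measured values. A minimal sketch, using three made-up value pairs rather than the 50 validation materials, and a hypothetical function name:

```python
import numpy as np

def rmse_against_identity(measured, predicted):
    """RMSE of predictions relative to the ideal line y = x."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - measured) ** 2)))

# Made-up values for illustration only (oleic acid content, %).
measured = [60.0, 70.0, 80.0]
predicted = [61.0, 69.0, 80.0]
rmse = rmse_against_identity(measured, predicted)
```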