Hyperspectral imaging of dry, homogenized material is a widely used and well-established technique, but its application to fresh plant material is not yet a standard analytical procedure [26, 27]. Consequently, the correlation between hyperspectral measurements and wet lab results for starch reported in this study was clearly lower than for NIRS measurements on dried plant material, where coefficients of determination (R²) reached 0.99 for the nitrogen and starch content of cotton leaves [19]. One problem with fresh leaf material is its water content. Water strongly absorbs infrared radiation, with predominant absorption bands in the region between 1300 nm and 2500 nm, where wavelengths important for prediction were located in this study [26]. It is therefore likely that water absorption masked the absorption bands of starch molecules, impairing the prediction of starch content [26, 27]. Starch absorption characteristics can be obscured not only by water absorption, but also by the cell structure of fresh plants, which scatters light as it passes through multiple air and water boundaries. Furthermore, the distribution of starch in fresh leaves is not uniform with respect to the organization of cells and organelles [26]. The problems associated with predicting starch content in fresh leaves might be reduced if spectral data are preprocessed [35]. Indeed, preprocessing of the spectra considerably improved predictive accuracy compared to unprocessed reflectance spectra (Additional file 3: Fig. S2) by removing systematic variation in the spectra, such as light scattering, and thereby increasing the signal-to-noise ratio [35].
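To illustrate how preprocessing can remove scatter-related variation, the following minimal sketch applies standard normal variate (SNV) scaling to a single spectrum. SNV is assumed here only as one common scatter correction; this section does not state which preprocessing steps were used, and the function name `snv` and all reflectance values are invented for illustration.

```python
import statistics

def snv(spectrum):
    """Standard normal variate: center one spectrum and scale it to unit standard deviation."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation across wavelengths
    return [(value - mean) / sd for value in spectrum]

# Invented reflectance values at four wavelengths (not data from this study).
leaf = [0.20, 0.40, 0.60, 0.50]
# The same spectrum with a hypothetical multiplicative scatter effect (x2) and an offset (+0.1).
leaf_scattered = [2.0 * value + 0.1 for value in leaf]

corrected = snv(leaf)
corrected_scattered = snv(leaf_scattered)
# After SNV the two spectra coincide up to floating-point rounding,
# because SNV is invariant to a positive scaling and a constant offset.
```

Because SNV works on each spectrum separately, it removes multiplicative and additive differences between samples, but it cannot separate overlapping absorption bands such as those of water and starch.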
Total starch concentration of the plant material in the training set ranged from 0.2% to 12% for plants harvested at ED and from 0.01% to 5% for plants harvested at EN. The starch concentrations of the training set were slightly lower than those of the test set. The total starch concentrations were substantially lower than those published by Ruckle et al. [8], where leaf starch concentration ranged from 6% to 35% for ED harvested plants. This difference was most likely due to different growing conditions, since light and temperature have a strong impact on starch accumulation.
Many studies have shown that starch content depends strongly on the diurnal cycle [3, 5, 37, 38]. A more than 3-fold difference in starch content was observed between plants harvested at ED and plants harvested at EN (Fig. 4). Mean genotypic differences for the training set ranged from 31.6 to 59.7 mg g−1 DW for the ED harvested plants (Fig. 5) and from 2.2 to 37.6 mg g−1 DW for the EN harvested plants (data not shown), showing high variation within genotypes. The best PLSR training model explained 56% of the measured starch variation with an RMSE of 17 mg g−1 DW. Consequently, differences between harvest time points and differences between extreme genotypes can be predicted with our model. Nevertheless, the 56% of starch variation explained by the model is lower than that found by Shorten et al. [39], who used hyperspectral imaging systems (550 nm to 1700 nm) to estimate more than ten different quality compounds in perennial ryegrass (Lolium perenne L.). Low and high weight sugars were estimated separately, and the best model prediction for the high weight sugars using PLS regression resulted in an R² of 0.68 and an RMSE of 19.9 mg g−1. Using two-thirds of the data for calibration and the remaining data for validation resulted in a slightly lower model performance (R² = 0.63 and RMSE of 21.6 mg g−1) [39]. Wavelength selection based on variable importance in the projection (VIP) did not considerably improve model performance. This is in contrast to comparable studies, where selecting important wavelengths improved model accuracy and reduced the redundancy effects of wavelengths that had low weight in the model [35, 36]. Our model results indicate that the selected regions were not sufficient to explain the data with equal effectiveness and that many features are important for prediction. The wavelengths near 550 nm, 770 nm, 850 nm and 1920 nm, and those from 1650 nm to 1850 nm, had a high model contribution for estimating starch content (VIP assessment, Additional file 5: Fig. S4).
Starch absorbance in fresh leaves was previously shown to be associated with wavelengths in the regions of 556 nm, 702 nm, 1300 nm and 1960 nm [28]. These absorption features partly corresponded to the VIP patterns across wavelengths. However, many additional wavelength ranges had high model importance for starch prediction.
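For readers unfamiliar with VIP scores, the sketch below shows one common way to compute them for a single-response PLS model fitted with the NIPALS algorithm. It is a generic illustration under stated assumptions, not the implementation used in this study: the function name `pls1_vip`, the number of components, and the synthetic data (six hypothetical "wavelengths", of which only the third drives the response) are all invented.

```python
import math
import random

def pls1_vip(X, y, n_components):
    """Fit PLS1 (NIPALS) on mean-centered data and return one VIP score per variable."""
    n, p = len(X), len(X[0])
    # Center the predictors and the response.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [value - y_mean for value in y]

    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: covariance of each residual variable with the residual response, unit length.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Scores, X loadings and y loading for this component.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)  # response sum of squares explained by this component

    total = sum(explained)
    return [
        math.sqrt(p * sum(ss * w[j] ** 2 for ss, w in zip(explained, weights)) / total)
        for j in range(p)
    ]

# Synthetic example (assumption): 40 samples, 6 variables, response driven by variable index 2.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(6)] for _ in range(40)]
y = [3.0 * row[2] + random.gauss(0.0, 0.1) for row in X]
vip = pls1_vip(X, y, n_components=2)
```

With unit-length weight vectors, the squared VIP scores sum to the number of variables, which is why a threshold of 1 is often used as a rule of thumb for selecting important wavelengths. When many wavelengths carry VIP scores close to this threshold, restricting the model to a few selected regions can discard useful information.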
A model built from a single set of training observations is often not adequate to predict an independent data set [32, 33]. If a model is tested on the same data that were used to fit it, performance is often overestimated [32]. Our study showed that cross-validated training resulted in substantial overfitting, and high starch contents in particular were underestimated (Fig. 7). The independent second dataset from a second experiment (test set) allowed us to further validate model performance, in addition to cross-validation during training. As expected, the test prediction resulted in a 1.7-fold increase in RMSE (Fig. 8). Moreover, models including only a subset of wavelengths were validated on the test set, resulting in even lower predictive performance (Table 1). The VIP analysis of the two independent sets (training and test) indicated that some important wavelength regions occurred in both sets, but with different VIP magnitudes (Additional files 5, 6: Figs. S4, S5). Further, the training model had important features between 500 nm and 750 nm, whereas the re-calibrated test model had important wavelengths below 500 nm. These differences in VIP magnitudes and the additional regions relevant for prediction partly explain the poorer prediction performance on the test set when applying the training model. Although two of the three genotypes from the test set were included in the training set, the spectra and models had only limited generalization capacity for starch content.
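The error measures used to compare training and test performance can be sketched as follows. The snippet computes RMSE and mean bias for a set of predictions and shows that the squared RMSE splits into a squared bias term and the variance of the errors around that bias. All values are invented for illustration (nominally in mg g−1 DW) and are not results from this study; the function names are likewise hypothetical.

```python
import math

def bias(observed, predicted):
    """Mean error (predicted minus observed); negative values indicate underestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean squared error of the predictions."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def sde(observed, predicted):
    """Standard deviation of the errors around the mean bias (population form)."""
    b = bias(observed, predicted)
    return math.sqrt(sum((p - o - b) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Invented starch values: the hypothetical model underestimates the highest contents.
observed = [20.0, 45.0, 60.0, 90.0]
predicted = [25.0, 44.0, 52.0, 70.0]

test_bias = bias(observed, predicted)
test_rmse = rmse(observed, predicted)
test_sde = sde(observed, predicted)
# Decomposition: RMSE^2 = bias^2 + SDE^2
decomposition_gap = test_rmse ** 2 - (test_bias ** 2 + test_sde ** 2)
```

This decomposition shows why a recalibration can remove a large systematic bias while changing RMSE only slightly when the random component of the error dominates.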
Recalibration using only test data led to only a slight decrease in RMSE compared to test prediction, but substantially reduced bias. Thus, a new calibration may be needed for each independent trial, or the current red clover starch spectral library needs to be augmented with more measurements from different independent trials with both genotypic and phenotypic variance in starch. Various environmental growth conditions influence starch accumulation and can thereby mask genotypic effects [38]. Hence, we suggest follow-up research to verify whether a separate spectral model is required for each independent trial to improve predictive accuracy under substantial genotype × environment interaction. Despite the relatively low prediction accuracy, the performance of the best PLSR training model was sufficient to detect differences between red clover genotypes with very high or very low starch content. Therefore, the method developed has the potential to contribute to the breeding of high energy red clover cultivars.
The success of breeding forage crops with increased energy content was previously demonstrated by breeding perennial ryegrass cultivars with high levels of water-soluble carbohydrates (WSC). These WSC cultivars can substantially increase animal performance and nitrogen use efficiency in pasture-based animal production systems [40]. Red clover and ryegrasses are often cultivated in mixtures, not only because of their attractive diet composition, but also because of the transfer of N between species. In addition, grass-clover mixtures require fewer pesticide and herbicide applications and protect soils against erosion [41, 42]. Therefore, high starch red clover cultivars in mixtures with high WSC ryegrasses appear to be a particularly promising option, which brings us one step closer to an environmentally sustainable feed production that meets the high energy requirements of modern livestock production.