Near infrared spectroscopy (NIRS) based high-throughput online assay for key cell wall features that determine sugarcane bagasse digestibility

Background: Improving sugarcane cell wall structure is a promising strategy to enhance bagasse digestibility and thereby the crop's prospects as a bioenergy feedstock. In this context, cellulose crystallinity (CrI) and lignin are the key parameters that influence saccharification efficiency. Therefore, this study was conducted to develop a high-throughput assay for online characterization of these cell wall features in sugarcane.
Results: A total of 838 different sugarcane genotypes were collected at different growth stages during 2018 and 2019. A continuous variation in the near-infrared spectra was observed among the sugarcane samples. Owing to the significant diversity of cell wall features in the sampled population, seven high-quality calibration models were developed through online NIRS calibration. All of the generated equations displayed coefficient of determination (R²) values higher than 0.8 and ratio performance deviation (RPD) values over 2.0 in calibration, internal cross validation, and external validation. In particular, the equations for CrI and total lignin content exhibited RPD values as high as 2.56 and 2.55, respectively, indicating their excellent prediction capacity. Furthermore, an offline NIRS assay was also performed. Comparable calibration was observed between the offline and online NIRS analyses, suggesting that both strategies are applicable for estimating cell wall characteristics. Nevertheless, as the online NIRS assay offers greater advantages for large-scale screening, it may be the better option for high-throughput prediction of cell wall features.
Conclusions: This study, as a first attempt, explored an online NIRS assay for high-throughput assessment of key sugarcane cell wall attributes in terms of CrI, lignin content, and its proportion in sugarcane. Consistent and precise calibration results were obtained in NIRS modeling, suggesting that this strategy is a reliable approach for large-scale screening of promising sugarcane germplasm for cell wall structure improvement and beyond.


Background
Bioethanol has been recognized as a significant clean fuel for reducing carbon debt. In particular, cellulosic ethanol derived from lignocellulosic feedstock has received more attention because it does not compete with food production or occupy land otherwise used for this purpose [1]. In this regard, bagasse, a major byproduct of sugarcane crushing during juice extraction, shows great advantages for second-generation biofuel production [2]. However, due to cell wall recalcitrance to hydrolysis, the cost-effectiveness of cellulosic ethanol production from sugarcane remains in question [3].
Therefore, screening germplasm for optimal cell wall features is important for using sugarcane as a biofuel crop.
In general, the plant cell wall is mainly composed of three types of polymers, i.e., cellulose, hemicellulose, and lignin. These polymers form a complex network structure that impedes cell wall digestibility [4]. In particular, the properties of cellulose and lignin are closely related to cell wall recalcitrance [5-8]. For instance, cellulose is a polymer composed of glucose units linked via β-1,4-glycosidic bonds. The cellulose crystallinity index (CrI), which is characterized by X-ray scattering from crystalline and amorphous regions [9,10], is a key parameter that defines the hindrance to cell wall saccharification [5,8,11,12]. Lignin, in contrast, is a hydrophobic polymer composed of phenylpropane units that often associates tightly with hemicellulose to form a lignin-carbohydrate complex (LCC). This complex blocks the cellulose surface and hinders cellulose accessibility [13]. Therefore, lignin is also a major factor affecting cell wall saccharification [6,14,15]. Against this backdrop, screening germplasm resources for lower cellulose CrI and lignin content can play a significant role in reducing cell wall recalcitrance. To carry out large-scale screening studies for this purpose, a high-throughput assay for determining the cell wall characteristics is needed.
Near-infrared spectroscopy (NIRS) is a rapid and non-destructive analytical tool that is widely employed for high-throughput analysis of biomass quantity or quality for biofuel production [16]. It has been used for characterization of cell wall polymer features [17-20], analysis of biomass saccharification efficiency [17,18,21], and prediction of ethanol production via yeast fermentation [22-25]. Notably, in sugarcane, some studies have also applied NIRS for determining cell wall components or predicting digestibility [26-29]. In one such effort, Caliari et al. [30] explored the NIRS assay for estimating cellulose crystallinity index. However, most such studies used an offline calibration strategy that necessitates certain time-consuming steps before NIRS scanning. Hence, its application is limited when analyzing the large number of samples generally required in crop improvement programs.
This study was initiated to develop a high-throughput online NIRS assay for characterizing key cell wall features in sugarcane bagasse. Hundreds of samples were collected from the germplasm population.
Based on the standard laboratory analytical methods for cell wall features and the online system for near-infrared spectroscopy, a reliable online NIRS assay was developed for analyzing lignin content and CrI. Thus, this study provides a precise and high-throughput approach for large-scale screening and selection of optimal germplasm for reducing cell wall recalcitrance, to target genetic improvement of sugarcane for low-cost bioethanol production.

Results
Near-infrared spectroscopy-based characterization of sugarcane population
A total of 838 germplasm samples, brought in six different batches, were used for NIRS modeling. While analyzing each lot of samples, the NIRS data were immediately collected on a purpose-built online system. The continuously collected spectral reflectance values were averaged for NIRS calibration. As shown in Fig. 1A, the near-infrared spectral reflectance values of all samples fluctuated within the normal range, indicating the diverse nature of these samples. To characterize the structure of the sampled population, principal component analysis (PCA) was carried out on the recorded near-infrared spectral values. In PCA, new orthogonal variables were generated from the original spectral values. The first thirteen principal components (PCs), which explained 99.81% of the variation (Fig. 1B), were selected to characterize the population structure. A large variation of the sample population was observed within the selected PCs, especially the first five PCs (Fig. 1C). Finally, to view the population structure, the first three PCs were used for a 3D plot of the sample distribution. As shown in Fig. 1D, although sugarcane samples of different genotypes were collected in different batches, no discriminable distribution was observed among them, suggesting that these samples could be used for global NIRS modeling.
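To illustrate the PCA step described here, the following is a minimal sketch in plain Python. It is not the OPUS procedure used in this study: it extracts the leading components of a small, made-up spectral matrix by power iteration with deflation and reports the cumulative fraction of variance explained (the quantity reported as 99.81% for thirteen PCs). All function names and the toy data are assumptions for illustration.

```python
import math
import random


def centered(X):
    """Subtract the column means from a list-of-rows matrix."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of already-centered data."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
            for i in range(p)]


def matvec(C, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(cij * vj for cij, vj in zip(row, v)) for row in C]


def leading_eigenpair(C, iters=1000, seed=0):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in C]
    for _ in range(iters):
        w = matvec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:  # no variance left to extract
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(C, v)))
    return lam, v


def explained_variance(X, n_components):
    """Cumulative fraction of total variance explained by the leading PCs."""
    C = covariance(centered(X))
    p = len(C)
    total = sum(C[i][i] for i in range(p))
    cumulative, running = [], 0.0
    for _ in range(n_components):
        lam, v = leading_eigenpair(C)
        running += lam / total
        cumulative.append(running)
        # Deflate: remove the component just found before the next pass.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return cumulative


# Toy "spectra": 6 samples x 4 spectral points, dominated by one common trend.
toy = [[1.0 + 0.5 * s + 0.01 * ((s * k) % 3) for k in range(4)] for s in range(6)]
cum = explained_variance(toy, 2)
print(cum)
```

In practice a library routine based on singular value decomposition would be preferable for full spectra with more than a thousand points; the power-iteration version is used here only to keep the sketch dependency-free.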
Diversity of cell wall features in collected sugarcane population
The X-ray diffraction (XRD) method was applied for cellulose CrI determination. The maximum and the minimum diffraction were separately observed in the 2θ region ranging from 15° to 25°, allowing for a standard calculation of CrI. Varying diffraction values were observed in the sample population (Fig. 2A), reflecting the diversity of the genotypes. The maximum and the minimum diffraction values were used for cellulose CrI calculation. The CrI values of the sugarcane population ranged from 21% to 56% (Fig. 2B). These results were comparable with previous reports for sugarcane and Miscanthus [8,30]. The statistical distribution showed that cellulose CrI was normally distributed in the analyzed sugarcane population (Fig. 2C). The diversity of the CrI values indicated a large variation in cellulose-related features among the samples.
Lignin mass content (% dry mass) was analyzed through two-step acid hydrolysis combined with ashing. Using this standard process, the acid soluble lignin (ASL), acid insoluble lignin (AIL), and the total of the two were determined. The ASL content (% dry mass) varied from 1.2% to 2.6%, while the AIL ranged from 9.2% to 25.3%. Moreover, a large variation in total lignin content (% dry mass), which ranged from 10.9% to 27.0%, was observed (Fig. 2D & Table S1). The lignin content is closely related to the cell wall network structure, which significantly impacts lignocellulose digestibility. Therefore, this study also estimated the lignin proportion in the sugarcane cell wall. The lignin proportion in the cell wall exhibited a greater variation than that observed on a clean dry mass basis, especially for the ASL values, which showed the highest coefficient of variation (CV) of 0.19 (Fig. 2F & Table S1). The total lignin content (% cell wall) ranged from 24.3% to 56.2%, reflecting a variation of the cell wall structure in the collected sugarcane population. Furthermore, frequency distribution analysis of lignin content in terms of clean dry mass and proportion was performed to determine the population structure. A normal distribution was observed for all of the parameters under consideration (Fig. 2E & G), supporting a reliable NIRS calibration.
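For reference, the coefficient of variation quoted here is the standard deviation divided by the mean. The sketch below assumes the sample standard deviation (n - 1 denominator), which the text does not specify, and uses made-up values rather than data from Table S1.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)


# Made-up ASL proportions (% cell wall), for illustration only.
asl_toy = [2.0, 2.5, 3.0, 3.5, 4.0]
cv_toy = coefficient_of_variation(asl_toy)
print(round(cv_toy, 3))
```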

Characterization of calibration and validation sets
For NIRS modeling and the corresponding performance evaluation, the samples were divided into two sets: one was used for NIRS calibration and the other for external validation. For cellulose CrI modeling, a total of 120 samples were randomly selected from the sample population to build an external validation set, and the remaining 718 samples formed the calibration set (Fig. 3A). Similarly, a total of 679 samples were used for lignin content (% dry mass) modeling, 565 of which were used for calibration and 114 for external validation (Fig. 3B). For lignin proportion, 446 and 117 samples were analyzed for calibration and equation evaluation, respectively (Fig. 3C). Moreover, to compare the calibration and validation sets, a frequency distribution analysis was carried out for the cell wall features in both sets. Notably, all of them were comparable and showed a similar normal distribution (Fig. 3A, B & C). Hence, these comparable data sets allowed reliable NIRS modeling and evaluation.
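The random partition described here can be sketched as follows. The function name, the fixed seed, and the use of integer sample identifiers are assumptions for illustration; the study does not state how the random selection was implemented.

```python
import random


def split_calibration_validation(sample_ids, n_validation, seed=42):
    """Randomly hold out n_validation samples; the rest form the calibration set."""
    rng = random.Random(seed)
    validation = set(rng.sample(sample_ids, n_validation))
    calibration = [s for s in sample_ids if s not in validation]
    return calibration, sorted(validation)


ids = list(range(838))  # 838 samples, as in the CrI data set
cal, val = split_calibration_validation(ids, 120)
print(len(cal), len(val))  # 718 120
```

Comparing the frequency distributions of the two sets afterwards, as done in Fig. 3, is a useful check that a random draw has not produced a validation set covering a narrower range than the calibration set.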
Online NIRS modeling
The partial least squares (PLS) regression method implemented in the OPUS software was used to perform NIRS modeling. Dozens of parameter combinations, in terms of wavelength range selection and spectral pretreatment, were tested to obtain calibration equations in the PLS analysis. Internal cross validation or external validation was performed to evaluate the performance of the equations, and the best equations were then selected according to their performance in validation.
The calibration results showed that all of the equations produced for the two cell wall features exhibited high coefficient of determination (R²) values, i.e., over 0.80, except for the AIL proportion, which showed an R² value of 0.78 (Fig. 4). The equation for total lignin content in dry biomass demonstrated the best fit, with the maximum R² value of 0.91 (Fig. 4B). The strong correlations between the fitted and the reference values during calibration therefore indicated a high prediction capacity of the obtained equations.
In addition, the samples from the external validation sets were used in an independent validation assay to evaluate the prediction performance of the obtained models. A correlation analysis between the predicted values and the measured values was carried out, and the root mean square error of external validation (RMSEP) and ratio performance deviation (RPD) were additionally calculated. The results suggested that all of the equations exhibited a high correlation between the predicted and the true values.
The coefficient of determination of external validation (R²ev) ranged from 0.75 to 0.81 (Fig. 4). The AIL proportion showed an R²ev value of 0.75, which was consistent with the calibration results. Notably, all of the equations obtained RPD values higher than 2.0 during external validation, suggesting their excellent prediction performance.
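The validation statistics discussed here can be computed from paired measured and predicted values as in the sketch below. It assumes the common definitions R² = 1 - SSE/SST, RMSEP = sqrt(SSE/n), and RPD = SD of the reference values / RMSEP; OPUS may apply bias or degrees-of-freedom corrections that differ slightly. The numbers are made up.

```python
import math
import statistics


def validation_metrics(measured, predicted):
    """Return (R2, RMSEP, RPD) for paired reference and predicted values."""
    n = len(measured)
    residual_ss = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    mean_m = statistics.mean(measured)
    total_ss = sum((m - mean_m) ** 2 for m in measured)
    r2 = 1.0 - residual_ss / total_ss
    rmsep = math.sqrt(residual_ss / n)
    # RPD: spread of the reference data relative to the prediction error.
    rpd = statistics.stdev(measured) / rmsep
    return r2, rmsep, rpd


# Made-up CrI values (%), for illustration only.
measured = [30.0, 35.0, 40.0, 45.0, 50.0]
predicted = [31.0, 34.0, 41.0, 44.0, 51.0]
r2, rmsep, rpd = validation_metrics(measured, predicted)
print(r2, rmsep, rpd)
```

Under these definitions the two statistics are linked: for large n, RPD is roughly 1/sqrt(1 - R²), so an R² of 0.80 corresponds to an RPD of about 2.2, in line with the range of values reported in this study.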
Finally, to achieve better performance of the equations for cell wall feature prediction, samples in the external validation set were combined with the calibration set for global NIRS modeling. As more samples were added, a wider variation of cell wall features was observed in the new integrated calibration sets (Table S2). As expected, most of the equations demonstrated a substantial improvement in prediction capacity. In detail, the equation for cellulose CrI prediction showed the greatest improvement, as its R² value rose from 0.81 to 0.88 (Fig. 5A). For lignin content and proportion prediction, the AIL equations showed the greatest improvement (Fig. 5B & 5C). All of the new equations obtained from this analysis exhibited a high correlation between the fitted and the measured values, indicating excellent fitting during calibration. Furthermore, internal cross validation was performed to evaluate the prediction capacity of the equations. During cross validation, the calibration set was randomly partitioned into several groups, and the samples in each group were validated using a calibration equation developed on the other samples. The results showed that all of the generated equations exhibited high R²cv and RPD values; in particular, the RPD values ranged from 2.21 to 2.56 (Fig. 5D & 5C). The equations for lignin content (% dry mass) and cellulose CrI displayed the highest R²cv value of 0.85 (Fig. 5E), consistent with the calibration results. Notably, the AIL proportion showed consistent and high R² and R²cv values of 0.80 in calibration and validation, illustrating its robust prediction capacity (Fig. 5F). Taken together, all of the new equations demonstrated good R², R²cv, and RPD values in calibration and internal cross validation. Hence, the generated equations could be applied for determining the cell wall features.
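The internal cross validation procedure described here (random groups, each predicted by a model built on the remaining samples) can be sketched as below. To stay self-contained, the sketch uses a one-predictor least-squares line as a stand-in for the PLS model; the number of groups, the seed, and all names are assumptions, since the text does not give them.

```python
import random


def fit_line(x, y):
    """Least-squares slope and intercept for one predictor (stand-in for PLS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_validated_predictions(x, y, k=5, seed=1):
    """Predict every sample from a model fitted without its own group."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal random groups
    preds = [None] * len(x)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = slope * x[i] + intercept
    return preds


# Toy data: y is exactly 2x + 1, so held-out predictions should be exact.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
preds = cross_validated_predictions(x, y)
print(max(abs(p - t) for p, t in zip(preds, y)))
```

R²cv, RMSECV, and RPD are then computed from these held-out predictions in the same way as for an external validation set.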

Discussion
The plant cell wall structure governs biomass digestibility. In particular, lignin is one of the key factors that hinder lignocellulose saccharification [4-7]. It is a hydrophobic polymer that can adsorb cellulase and hinder accessibility to cellulose [31,32]. Thus, high lignin content indicates low cell wall digestibility. Moreover, lignin is composed of different monomers, and the variations in these monomers dictate the lignin structure. Different monomers often show dissimilar cross-link patterns in the interaction between lignin and hemicellulose; therefore, the lignin structure plays a crucial role in defining the cell wall polymer network. For instance, the syringyl/guaiacyl (S/G) ratio has been identified as one of the most important factors that negatively affect cell wall saccharification in switchgrass [33]. Thus, the lignin structure appears to be a key factor influencing lignocellulose digestibility. With this background in view, six equations were generated in this study for lignin content determination. Three of the equations were developed for predicting lignin mass content (% dry biomass), whereas the other three were designed for characterizing the lignin proportion (% cell wall). These specialized NIRS models could be applied for multi-purpose lignin content determination.
Cellulose is a polymer that can be saccharified for bioethanol production through fermentation. Cellulose CrI and the degree of polymerization (DP) are the main features that define the cellulose structure. Both of these traits are negatively correlated with saccharification efficiency and hinder the utilization of cellulose in second-generation ethanol production [8]. Interestingly, samples with high DP generally exhibit high CrI as well and show low biomass digestibility [8,12]. Therefore, either of these two features can be employed for cellulose structure characterization. In this study, NIRS calibration was carried out only for cellulose CrI and lignin content; nevertheless, the CrI models could be applied for precise prediction of cellulose features.
To reduce cell wall recalcitrance, attempts have been made to modify cell wall structure by reducing cellulose crystallinity and lignin content in sugarcane [34-37] and other energy plants [12,38-41]. Transgenic plants engineered for the desired variations in these characteristics have shown significant improvement in cell wall saccharification. Therefore, these cell wall features should be traits of interest for energy cane breeding. Association studies through large-scale phenotypic and genotypic analyses have emerged as a promising strategy for crop improvement. However, due to the lack of effective high-throughput phenotyping methods, it is difficult to obtain accurate phenotypic data.
Therefore, here, we report an online NIRS assay for high-throughput screening on the basis of cell wall features. The produced equations showed high R²/R²cv/R²ev values in calibration, internal cross validation, and external validation, suggesting their high-quality performance. Hence, they could be used for high-throughput phenotyping or screening of optimal germplasm for energy cane breeding.
Offline NIRS calibration was also conducted for cell wall feature prediction. For this purpose, after online NIRS scanning, the shredded samples were collected, dried, and subjected to offline NIRS calibration. The results showed some obvious differences between the offline near-infrared spectra and the online ones (Fig. 5A); yet, a continuous variation was evident (Fig. 5B). As in the online analysis, partial least squares (PLS) regression was applied for offline NIRS modeling. In the PLS analysis, high R² and RPD values of calibration were obtained for both cellulose CrI and lignin content. All of the equations exhibited a high linear correlation between the predicted and the reference values in internal cross validation (Fig. 5C-E). Notably, the offline NIRS models achieved better prediction performance than those reported previously [26,30], which could be attributed to the large population of diverse samples employed for NIRS modeling in this study.
For precise evaluation of crop genotypes and reliable germplasm selection, samples should be analyzed as soon as possible after collection. The huge number of samples in such screening work necessitates the use of appropriate high-throughput techniques. Offline NIRS calibration uses ground dry samples.
The additional steps of grinding and drying are time-consuming and labor-intensive and thus largely limit the use of offline NIRS strategies. Therefore, we carried out a comparison between the offline and online analyses. Most of the online equations showed performance comparable to that of the offline ones, while some of them even showed higher R² and RPD values in calibration and validation (Fig. 5 & Fig. 6). Thus, both NIRS strategies could be used for determining the cell wall features. For large-scale screening, the online NIRS assay is more advantageous and convenient. Hence, this strategy can be used as a convenient tool for screening sugarcane germplasm.
Since lignin and cellulose features play a critical role in cell wall recalcitrance, this study explored both offline and online NIRS modeling for predicting these cell wall features. Future research may investigate NIRS models for the contents of lignin monomers and their ratios. All of the equations produced in this research exhibited high prediction performance, suggesting their excellent potential for use in germplasm screening. In particular, the online calibration models developed in this study, owing to the significant advantages of their protocol, show excellent prospects for high-throughput screening of large-scale samples for energy cane breeding and germplasm selection.

Conclusions
This study developed an online NIRS assay for high-throughput analysis of key cell wall features in terms of cellulose crystallinity, lignin content, and its proportion in sugarcane. Consistent and precise calibration was obtained in NIRS modeling; suggesting this strategy as a reliable approach for large-scale screening of optimal sugarcane germplasm. The results of the study can be employed for high-throughput screening jobs for bioenergy production from sugarcane.

Near-infrared spectral data collection
The six selected stalks of each genotype were shredded using a DM540 (IRBI Machines & Equipment Ltd, Brazil). The shredded fresh samples were immediately blended and transferred for NIRS scanning by the CPS (Cane Presentation System, Bruker Optik GmbH, Germany). Near-infrared spectral data of fresh samples were simultaneously collected through the MATRIX-F (Bruker Optik GmbH, Germany) online system. Following the online NIR spectral data collection, the shredded samples were dried at 60 °C and then at 100 °C for 1 h after inactivation. Each sample was ground through a 40-mesh screen and stored in a dry container until use. A MATRIX-F equipped with a Q413 sensor head was used for offline NIR spectral data collection.
Full-band scanning mode, with wavenumbers ranging from 4000 to 10000 cm⁻¹ in 4 cm⁻¹ steps, was employed for collecting online and offline spectral data. The spectral absorbance values were recorded as log(1/R), where R is the sample reflectance. The obtained reflectance values were then averaged for further analysis.
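As a small worked example of the quantities in this paragraph, the sketch below builds the wavenumber grid, averages repeated scans, and converts reflectance to absorbance. It assumes a base-10 logarithm, that the 4 cm⁻¹ step is the spacing of the recorded data points, and that scans are averaged before conversion; none of these details is stated explicitly in the text, and the reflectance values are made up.

```python
import math


def wavenumber_grid(start=4000, stop=10000, step=4):
    """Wavenumbers (cm^-1) for a full-band scan, both endpoints included."""
    return list(range(start, stop + 1, step))


def average_spectra(spectra):
    """Point-by-point mean of repeated scans of the same sample."""
    n = len(spectra)
    return [sum(col) / n for col in zip(*spectra)]


def absorbance(reflectance):
    """Convert reflectance R (0 < R <= 1) to absorbance log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]


grid = wavenumber_grid()
print(len(grid))  # 1501 points from 4000 to 10000 cm^-1

# Two made-up scans of one sample at three spectral points.
scans = [[0.10, 0.50, 1.00], [0.10, 0.50, 1.00]]
mean_abs = absorbance(average_spectra(scans))
print(mean_abs)
```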
Lignin content determination
Two-step acid hydrolysis was used for determining the lignin content according to the analytical procedure of the National Renewable Energy Laboratory [42]. Briefly, 0.50 g of ground dry sample was extracted using benzene-ethanol (2:1, v/v) in a Soxhlet apparatus for 4 h and then hydrolyzed using 10.0 mL of 67% (v/v) H₂SO₄ (at 25 °C for 90 min with gentle shaking at 115 rpm). After hydrolysis, the acid solution was diluted to 3.97% (w/w) with distilled water and heated at 115 °C for 60 min. The autoclaved hydrolysis solution was filtered through a filtering crucible. The filtrate was brought to a fixed volume of 250 mL and read at 205 nm by UV spectroscopy to estimate acid soluble lignin. The remaining residues were ashed in a muffle furnace at 575 °C ± 25 °C for 4 h to determine the acid insoluble lignin. All experiments were conducted in triplicate.
Lignocellulose crystallinity index determination
The X-ray diffraction (XRD) method was used to determine the lignocellulose crystallinity index (CrI) as described by Zhang et al. [8]. In detail, approximately 0.3 g of ground dry sample was extracted using 10 mL of distilled water to remove the soluble sugars. The residues were subsequently extracted using chloroform-methanol (1:1, v/v), methanol, and acetone, and then dried under vacuum. The remaining residues were classified as crude cell walls and were examined by XRD.
A Rigaku-D/MAX 2500V instrument (Ultima III, Japan) was employed for XRD analysis. The crude cell wall powder was laid on the glass holder and investigated under plateau conditions. For this analysis, Ni-filtered Cu-Kα radiation (λ = 0.154056 nm) generated at 40 [...] represents both crystalline and amorphous materials, while Iam denotes amorphous material [43].
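The CrI equation itself did not survive in this passage. The surviving definitions (one intensity representing crystalline plus amorphous material, and Iam representing amorphous material) match the widely used peak-height method, CrI = 100 × (I200 - Iam) / I200, so the sketch below assumes that formula. The choice of the 15° to 25° window follows the Results section; taking Iam as the minimum on the low-angle side of the main peak, the function names, and the toy diffraction pattern are all assumptions for illustration.

```python
def segal_cri(i_200, i_am):
    """Peak-height crystallinity index (%): 100 * (I200 - Iam) / I200."""
    if i_200 <= 0:
        raise ValueError("I200 must be positive")
    return 100.0 * (i_200 - i_am) / i_200


def cri_from_pattern(two_theta, intensity, window=(15.0, 25.0)):
    """Pick I200 (maximum) and Iam (minimum below the peak angle) in the window."""
    pts = [(t, i) for t, i in zip(two_theta, intensity)
           if window[0] <= t <= window[1]]
    peak_angle, i_200 = max(pts, key=lambda p: p[1])
    # Amorphous reference: lowest intensity on the low-angle side of the peak.
    i_am = min(i for t, i in pts if t < peak_angle)
    return segal_cri(i_200, i_am)


# Made-up diffraction pattern (2-theta in degrees, arbitrary intensity units).
two_theta = [15.0, 16.5, 18.5, 20.0, 22.5, 24.0, 25.0]
intensity = [600.0, 650.0, 500.0, 800.0, 1000.0, 700.0, 400.0]
cri_toy = cri_from_pattern(two_theta, intensity)
print(cri_toy)  # 50.0
```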

NIRS data processing and calibration
The OPUS spectroscopy software (version 7.8, Bruker Optik GmbH, Germany) was used for data processing and NIRS calibration. To address the problems associated with overlapping peaks and baseline drift, pretreatment and wavelength range selection of the raw spectral data were performed before calibration. Several spectral pretreatment methods available in the OPUS software were used, namely constant offset elimination (COE), straight line subtraction (SSL), standard normal variate (SNV), min-max normalization (MMN), multiplicative scattering correction (MSC), first derivative (FD), second derivative (SED), a combination of first derivative and straight line subtraction (FD + SSL), a combination of first derivative and standard normal variate (FD + SNV), and a combination of first derivative and multiplicative scattering correction (FD + MSC). The NIR spectra were divided into multiple intervals and then reassembled to obtain the optimal spectral region for calibration. A principal component analysis (PCA) was carried out to characterize the structure of the spectral population, and the GH outlier (GH > 3.0) samples were eliminated. Moreover, partial least squares (PLS) regression was performed to produce calibration equations. Internal cross validation and external validation were carried out to test the performance of the generated equations. The best equations were selected according to a high coefficient of determination of calibration/internal cross validation/external validation (R²/R²cv/R²ev), a low root mean square error of calibration/internal cross validation/external validation (RMSEC/RMSECV/RMSEP), and high ratio performance deviation (RPD) values [17,21].
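Three of the pretreatments listed here can be sketched in a few lines. These are textbook versions written for illustration, not the OPUS implementations: the derivative is a plain finite difference without the smoothing a commercial package would normally apply, SNV uses the sample standard deviation, and applying the derivative before SNV in the combined FD + SNV treatment is an assumed order. The spectrum is made up.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: center and scale a spectrum by its own mean and SD."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mean) / sd for v in spectrum]


def first_derivative(spectrum, step=1.0):
    """Finite-difference first derivative (no smoothing); output is one point shorter."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]


def min_max_normalize(spectrum):
    """Rescale a spectrum linearly so that it spans 0 to 1."""
    lo, hi = min(spectrum), max(spectrum)
    return [(v - lo) / (hi - lo) for v in spectrum]


# Made-up absorbance values at five consecutive spectral points.
raw = [0.42, 0.45, 0.51, 0.48, 0.44]
fd_snv = snv(first_derivative(raw))   # FD + SNV, derivative applied first
mmn = min_max_normalize(raw)
print(fd_snv)
print(mmn)
```

Each pretreatment removes a different nuisance effect (baseline offset, scatter-related scaling, overlapping bands), which is why many combinations are typically screened and the one giving the best validation statistics is retained.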

Declarations
Authors' contributions
XL and FM completed the major experiments and analyzed the data. CL, MW, YZ, YS, MA and PL participated in the determination of lignin content and the lignocellulose crystallinity index analysis. JH and MZ designed the project, supervised the experiments, interpreted the data, and finalized the manuscript. MTK revised the manuscript. All authors read and approved the final manuscript.