Fast Determination of the Rubber Content in Taraxacum kok-saghyz Fresh Biomass Using Portable Near-Infrared Spectroscopy and Pyrolysis–Gas Chromatography

Taraxacum kok-saghyz (TKS) is a plant native to the Tianshan valley on the border between China and Kazakhstan. TKS rubber is a good substitute for natural rubber. TKS breeding requires screening for high-yielding varieties, so rapid determination of rubber content is essential. Conventional analytical methods cannot meet practical needs for real-time testing at low cost. Near-infrared (NIR) spectroscopy, which has developed rapidly in industrial process analysis in recent years, is a green detection technology offering fast measurement, low cost and no sample loss. This research aims to develop a portable, non-destructive near-infrared spectroscopic scheme to evaluate the natural rubber content of fresh TKS roots. Pyrolysis gas chromatography (Py-GC) was chosen as the reference method for developing the NIR prediction model. A total of 208 fresh TKS root samples were collected from the Inner Mongolia Autonomous Region of China, and near-infrared spectra were acquired for all samples. Two-thirds of the samples were randomly selected as the calibration set and the remaining one-third as the validation set. Using the partial least squares method, a good NIR prediction model for rubber content was established at 1080–1800 nm, with a ratio of performance to deviation (RPD) of 5.54 and a coefficient of determination (R²) of 0.95. This study showed that portable near-infrared spectroscopy could readily be used for large-scale screening of TKS plants in farmland, and could greatly facilitate TKS germplasm preservation, high-yield cultivation, environmentally friendly, high-efficiency and low-cost rubber extraction, and the overall advancement of the natural rubber industry.


Introduction
Natural rubber (NR) is a vital commodity in the world market because its excellent physical properties cannot be replicated or surpassed by synthetic rubber. In the past decade, world demand for natural rubber has increased dramatically. Alternative sources of natural rubber will not only increase the supply of rubber, but also provide resource protection for the countries engaged in planting. More than 2500 plant species can synthesize natural rubber, most of which produce small quantities of low-molecular-weight rubber. Only Hevea brasiliensis (the para rubber tree), Parthenium argentatum and Taraxacum kok-saghyz produce rubber with a molecular weight over 1000 kDa and therefore have practical value [1]. However, Hevea brasiliensis is easily affected by leaf blight and may face potential threats. Several countries have therefore successively initiated research and development on rubber-producing crops as alternatives to Hevea brasiliensis [2].
Taraxacum kok-saghyz (TKS) is a rubber-producing plant native to the mountains of Kazakhstan and the Tianshan Valley-Tex River Basin in Xinjiang, China. The TKS plant, inherently rich in natural rubber, is an important industrial crop [3]. The mass fraction of rubber in the TKS taproot ranges from 2.89% to 27.89% [4], and TKS rubber is of good quality, with a molecular weight of more than 1,000,000 g/mol [5].
Ying Chen and Shun-Kai Gao have contributed equally to this work.

In addition, TKS is easy to grow: it grows well in both cold and temperate regions and resists bacterial infections and pest invasions. Therefore, TKS is regarded as an ideal alternative rubber-producing crop to Hevea brasiliensis.
The determination of rubber content is crucial for developing alternative rubber plants for the natural rubber industry. TKS breeding requires screening for high-yielding varieties, so rapid determination of rubber content is essential. Over the years, a variety of techniques have been used to determine rubber content, such as gravimetry [6], gel permeation chromatography (GPC) [7], infrared spectroscopy (IRS) [8], near-infrared spectroscopy (NIRS) [9] and mass spectrometry (MS) [10,11]. However, quantification is usually time-consuming, which slows the improvement of rubber quality and yield through breeding. Several quantitative procedures have been discussed for measuring the rubber content in plants [12,13]. In most cases, the protocols comprise two successive steps: extraction of the rubber, then quantification of the components by gravimetry, chromatography or a spectral technique [14-20]. For example, quantitative Soxhlet extraction has been used as the main technique for rubber quantitation [21]. In this method, plant materials are first ground into small particles, and the dry-milled materials are put into cellulose extraction thimbles. The samples are extracted with acetone for 4 h to remove resins, and then with hexane for another 4 h to obtain the rubber. The collected extracts are evaporated, leaving a dry film of rubber, which is weighed to determine the rubber content gravimetrically [12]. This method requires time-consuming purification and extraction steps and consumes large amounts of organic solvents. Moreover, rubber is easily degraded in boiling acetone, which affects the quality of the rubber.
Pyrolysis gas chromatography (Py-GC) has been used to determine the rubber content of plant-derived natural rubber, e.g. from Hevea brasiliensis and Eucommia ulmoides, and of synthetic rubber [22]. It is among the more accurate methods available so far for determining rubber content indirectly. The method is fast, does not require complicated sample processing or rubber extraction steps, and does not use any organic reagents. The rubber content of a plant sample can be obtained in about 40 min on average, and both dry and wet samples can be measured.
Near-infrared spectroscopy (NIRS) uses electromagnetic radiation between the visible (Vis) and mid-infrared (MIR) regions. The near-infrared (NIR) region, defined as 780-2526 nm by the American Society for Testing and Materials (ASTM), was the first invisible region discovered in the absorption spectrum. NIR spectra originate from the combination and overtone absorption bands of vibrations of hydrogen-containing groups (-OH, -NH, -CH) in organic molecules. Scanning the near-infrared spectrum of a sample therefore yields rich characteristic information on these groups, which makes modeling and sample analysis by near-infrared spectroscopy convenient, rapid, efficient, accurate, cost-effective, non-destructive and environmentally friendly. For these reasons, the technology is favored by a growing number of natural rubber researchers [23].
The main purpose of this study is to develop a facile near-infrared spectroscopic method to determine the rubber content of fresh Taraxacum kok-saghyz roots. Different varieties of TKS samples were collected, and the values measured by pyrolysis gas chromatography were taken as the reference data for the NIR model. A quantitative prediction model for rubber content was then established by multivariate regression analysis. The prediction accuracy and reproducibility of the model were evaluated, and a near-infrared spectral model suitable for fresh TKS roots was explored.

Biomass Samples
The samples used in this study were collected from the TKS plantation base in Duolun County, Inner Mongolia Autonomous Region, China. A total of 50 fresh TKS plants with entire roots, of different varieties, growth stages and moisture contents, were collected in 2019 and 2020.

Experimental Protocol
The NIR spectral measurements were carried out with a portable self-assembled spectrograph NQ512-2.5 (OceanOptics, USA). To obtain clean roots for spectroscopic analysis, the collected TKS roots were washed to remove external impurities and sludge. The washed fresh roots were wiped with tissue paper and weighed to obtain the mass of fresh TKS before dehydration. To reduce the error caused by moisture fluctuation, each plant sample was weighed again before being irradiated with the near-infrared spectrometer, to ensure that the mass difference caused by water loss was between 8% and 10%. The roots of each TKS plant were divided into the main root and lateral roots; the main root and two to three lateral roots were irradiated at defined positions (Fig. 1) (main root: 10-20 mm under the leaf; lateral roots: 5-15 mm under the main root junction). The irradiated position on each plant was marked to facilitate subsequent determination of the reference rubber content by pyrolysis gas chromatography. A total of 208 TKS root samples were collected.
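The water-loss check described above is simple arithmetic. A minimal sketch follows, assuming the 8-10% window is expressed relative to the first (fresh) weighing; the function names and the example masses are hypothetical and are not taken from the study.

```python
def water_loss_percent(mass_fresh_g, mass_before_scan_g):
    """Mass lost between the first weighing and the scan, as % of the fresh mass.

    Assumption: the loss is referenced to the fresh mass, which the text
    does not state explicitly.
    """
    if mass_fresh_g <= 0:
        raise ValueError("fresh mass must be positive")
    return 100.0 * (mass_fresh_g - mass_before_scan_g) / mass_fresh_g


def within_window(loss_percent, low=8.0, high=10.0):
    """True if the water loss falls inside the 8-10% window (bounds inclusive)."""
    return low <= loss_percent <= high


# Hypothetical example: a root weighing 12.50 g fresh and 11.40 g before scanning.
loss = water_loss_percent(12.50, 11.40)
print(round(loss, 2), within_window(loss))
```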

Pyrolysis Gas Chromatography
Pyrolysis gas chromatography was used as the reference method to measure the rubber content of TKS for the development of the NIR prediction model. The samples were pretreated by freeze-drying and grinding. The fresh TKS root samples whose near-infrared spectra had been collected were placed in a vacuum freeze dryer. After freeze-drying for 2 days, the root samples were ground into a uniform fine powder. The ground samples were then stored in sealed bags, labeled and placed in a desiccator.
Next, the pyrolysis gas chromatography measurement was performed. An electronic balance with an accuracy of 0.01 mg was used to weigh and record the tare weight of the pyrolysis sample cup. Ca. 5-10 mg of the powdered sample was weighed into the sample cup, and the mass before pyrolysis was recorded. An Agilent 6890 N gas chromatograph equipped with a PY UA-5 capillary column (30.0 m × 250 μm × 0.25 μm) and a thermal conductivity detector (TCD) was used for the experiment.
Thermal pyrolysis conditions: pyrolysis temperature 550 °C, pyrolysis time 0.1 s. Gas chromatographic conditions: injection port temperature 250 °C; split ratio 50:1; carrier gas high-purity nitrogen; column temperature program starting at 40 °C and held for 5 min, then increased to 130 °C at 10 °C/min and held for 10 min, and finally increased to 280 °C at 10 °C/min and held for 20 min.
After sample preparation, and once the pyrolysis-gas chromatography instrument had been warmed up and adjusted, the sample cup was placed into the inlet of the instrument. The gaseous small-molecule pyrolysis products of the sample were swept into the gas chromatograph with nitrogen through the pyrolysis injection needle, and the chromatogram was recorded. The tared sample mass and the corresponding peak area of pyrolyzed limonene were substituted into our previously reported standard curve y = 343,269.93x + 1,520,710.36 [24] to obtain the reference value of TKS rubber content.
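The column temperature program above can be summed step by step. The sketch below does only that arithmetic; the function name and the tuple layout are illustrative assumptions, and the result covers the oven program alone, not pyrolysis, sample handling or cool-down.

```python
def program_minutes(initial_temp_c, initial_hold_min, ramps):
    """Total oven-program time in minutes.

    ramps is a list of (rate in °C/min, target temperature in °C, hold in min).
    """
    total = initial_hold_min
    temp = initial_temp_c
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold  # ramp time plus hold time
        temp = target
    return total


# Values taken from the chromatographic conditions stated above.
RAMPS = [
    (10.0, 130.0, 10.0),  # 40 -> 130 °C at 10 °C/min, hold 10 min
    (10.0, 280.0, 20.0),  # 130 -> 280 °C at 10 °C/min, hold 20 min
]
total = program_minutes(40.0, 5.0, RAMPS)
print(total)  # 5 + 9 + 10 + 15 + 20 = 59.0
```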

Near-Infrared Spectroscopy
A portable near-infrared spectrometer self-assembled in the laboratory (Fig. 2) was used to collect the near-infrared reflectance spectra of TKS plant samples. The spectrometer consists of a tungsten halogen light source, a grating, a detector and a reflective optical fiber. The optical resolution is 6.25 nm, and data can be collected over the spectral range of 900-2500 nm. To avoid direct contact, the reflective fiber probe was held 5 mm away from the sample, which kept the probe clean and protected it from scratches. In subsequent experiments, to reduce light scattering, a handheld gun-like scanner was specially designed for the probe, and the probe was retracted 5 mm into the scanner head. Before each sample was scanned, a polytetrafluoroethylene reference plate was used to determine the absolute reflectance baseline. To ensure measurement accuracy, the absolute reflectance reference was recalibrated every 15 min during spectrum acquisition.

Statistical Analysis
The chemometrics software package Thermo GRAMS was used to develop the rubber content prediction models: the GRAMS AI module was used to preprocess the spectra, the GRAMS IQ module to build the model, and the IQ Predict module to predict and evaluate the model. The data matrix required to establish a rubber prediction model correlates the reference rubber content value with the corresponding near-infrared spectrum. Mean centering was used to subtract the average spectrum from each individual spectrum, which eliminates data offset and makes the characteristic differences between sample spectra more obvious. The standard normal variate (SNV) transformation was applied to correct the spectra mathematically, reducing the influence of solid particle size, surface scattering and optical path changes on the near-infrared spectrum. A second derivative with a smoothing window of 12 data points and a cubic polynomial was then used to eliminate interference from the baseline and other backgrounds, resolve overlapping peaks, and improve resolution and sensitivity [25]. For fresh roots, the water bands were removed at wavelength intervals of 1350-1450 nm and 1800-1950 nm. A calibration equation was developed using partial least squares (PLS) regression. For cross-validation, the set of NIRS measurements was divided into 25 subsets of randomly selected measurements. The calibration set included 140 measurements and the validation set 68 measurements (208 measurements in total).

Fig. 1 Irradiation site of near-infrared spectrometer

Fig. 2 Laboratory self-assembled portable near-infrared spectrometer
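The preprocessing steps named in this section (mean centering, SNV and a Savitzky-Golay second derivative) can be sketched in plain Python as follows. This is an illustration only, not the GRAMS AI implementation: the function names are hypothetical, and the derivative uses the standard 5-point quadratic/cubic Savitzky-Golay coefficients for brevity, whereas the study reports a 12-point smoothing window.

```python
import math


def mean_center(spectra):
    """Subtract the average spectrum (column means) from every spectrum."""
    n = len(spectra)
    avg = [sum(col) / n for col in zip(*spectra)]
    return [[v - a for v, a in zip(row, avg)] for row in spectra]


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale by its own SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


# 5-point Savitzky-Golay second-derivative weights (quadratic/cubic fit),
# to be divided by 7 and by the squared point spacing.
SG5_D2 = (2, -1, -2, -1, 2)


def sg_second_derivative(values, step=1.0):
    """Second derivative at interior points; two points are lost at each end."""
    out = []
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        out.append(sum(c * v for c, v in zip(SG5_D2, window)) / (7.0 * step ** 2))
    return out


# Sanity check on a parabola, whose second derivative is 2 everywhere.
print(sg_second_derivative([x * x for x in range(7)]))
print(snv([1.0, 2.0, 3.0]))
```

A polynomial fit of degree three reproduces quadratics and cubics exactly, which is why the parabola check returns a constant value.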

Analysis of the TKS Rubber Content in Samples by the Reference Method (Py-GC)
Each of the 208 samples was prepared in accordance with the procedure detailed in Sect. 2. The ground dry-root powder samples were directly subjected to pyrolysis gas chromatography. After pyrolysis, the measured peak area (x) of the pyrolysis product at the designated time and the injected sample mass were substituted into the standard curve equation y = 343,269.93x + 1,520,710.36 to calculate the rubber content of the sample (y). The ratio of the measured rubber content to the injected sample mass gave the mass fraction (percentage) of rubber. Table 1 shows the descriptive statistics of moisture and NR content determined by pyrolysis gas chromatography for the 208 samples. The moisture content was calculated on a wet basis. The average moisture content of all samples was 71.58%, and the standard deviation was 1.72%.
The rubber content of all samples ranged from 0.2% to 16.72%, with an average value of 5.00% and a standard deviation of 2.64%. Therefore, these 208 samples covered a suitably wide range of variation in these biomass components for agronomic and breeding research [26].

Estimation of Natural Rubber Content
After preprocessing, the spectral data were imported into the GRAMS IQ chemometrics software together with the reference values. A randomly selected two-thirds of the samples were used as the calibration dataset, and the remaining one-third as the validation dataset. Partial least squares regression (PLSR) was used to establish the relationship between the spectral data and the sample composition data. Using cross-validation, a few outliers were deleted from the dataset, and the remaining data were used for model development to predict unknown composition values from the spectral data. This process was then repeated over the entire dataset to test prediction accuracy, so that the coefficient of determination (R²) of the model and the standard error of cross-validation could be calculated. The criteria for the best model are as follows: the smaller the standard error of prediction (SEP), the more precise the result; for the same standard deviation (SD) of the validation set, the larger the R², the higher the accuracy; and for the same concentration range, the larger the ratio of performance to deviation (RPD), the higher the accuracy [25].
In this study, the Savitzky-Golay convolution second-derivative (S-G 2nd) pretreatment was used to develop the PLSR rubber prediction model. The total sample size was 208, from which 4 abnormal data points were eliminated; two-thirds of the samples were selected as the calibration set and one-third as the validation set. In addition, a further 20 samples were measured as the prediction set. Table 2 shows the basic information of the TKS samples. The cross-validation and validation results were similar, and the validation result was likewise similar to the calibration result. Cross-validation and validation relate to the performance of the model when estimating a single value. The Rc² of the obtained model was 0.95, the standard error of calibration (SEC) for rubber was 0.47, and the standard error of cross-validation (SECV) was 0.42. The validation set data were used to evaluate the calibration model. The SEC of the rubber model was close to the SECV, indicating that the calibration and validation sets had similar distributions and similar prediction precision. IQ Predict was used to predict rubber content from the spectral data of the prediction set, and the predictions were compared with the corresponding reference values to calculate the SEP, RPDp and Rp². The calculation verified that the Rc² value was 0.95, indicating that the model was able to predict rubber content. The RPDc of the model was 5.54. Table 3 shows the statistical properties of the NIRS model used to estimate rubber content. According to the evaluation criteria, the prediction of the model was accurate and could be used for quantification [27]. Figure 3 shows the NR model obtained.
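To make the regression step concrete, the following is a minimal single-response PLS (PLS1, NIPALS-style) sketch in plain Python. It is not the GRAMS IQ algorithm, whose internals are not described here; the function names, the toy data and the number of components are assumptions chosen for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X, y, n_components):
    """Fit PLS1 by successive deflation; returns (x_mean, y_mean, components)."""
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centred spectra
    f = [v - y_mean for v in y]                              # centred reference
    comps = []
    for _ in range(n_components):
        w = [dot(col, f) for col in zip(*E)]       # weights: covariance with y
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:                           # nothing left to model
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]             # scores
        tt = dot(t, t)
        load = [dot(col, t) / tt for col in zip(*E)]  # X loadings
        q = dot(f, t) / tt                         # y loading
        E = [[v - ti * pj for v, pj in zip(row, load)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def predict_pls1(model, x):
    """Predict one sample by repeating the same deflation on the new spectrum."""
    x_mean, y_mean, comps = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [v - t * pj for v, pj in zip(e, load)]
    return y_hat


# Toy data (not from the study): y = 2*x1 - x2 + 3, noise-free.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 7.0]]
y = [3.0, 6.0, 5.0, 8.0, 6.0]
model = fit_pls1(X, y, n_components=2)
print(round(predict_pls1(model, [6.0, 2.0]), 6))
```

With as many components as the rank of the centred X, PLS1 coincides with ordinary least squares, so the noise-free toy relation is recovered exactly; with real spectra the number of components would be chosen by cross-validation.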
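The figures of merit quoted above can be computed from paired reference and predicted values. The sketch below uses common chemometric conventions as assumptions, since the exact formulas used by the software are not given here: a bias-corrected standard error with n - 1 in the denominator, R² as one minus the ratio of residual to total sum of squares, and RPD as the SD of the reference values divided by the standard error. The data are invented for illustration.

```python
import math


def bias(reference, predicted):
    """Mean of (predicted - reference)."""
    return sum(p - r for r, p in zip(reference, predicted)) / len(reference)


def standard_error(reference, predicted):
    """Bias-corrected standard error (SEC or SEP, depending on the data set)."""
    b = bias(reference, predicted)
    n = len(reference)
    return math.sqrt(sum((p - r - b) ** 2 for r, p in zip(reference, predicted)) / (n - 1))


def r_squared(reference, predicted):
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def sample_sd(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of reference / standard error."""
    return sample_sd(reference) / standard_error(reference, predicted)


# Invented rubber contents in %, not data from the study.
ref = [2.0, 4.0, 6.0, 8.0]
pred = [2.5, 3.5, 6.5, 7.5]
print(bias(ref, pred), round(r_squared(ref, pred), 2), round(rpd(ref, pred), 2))
```

As a rough plausibility check on the reported figures, dividing the overall SD of the reference values (2.64%) by the SEC (0.47) gives about 5.6, close to the reported RPDc of 5.54; the SD of the calibration set alone is not stated here, so the comparison is only approximate.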
Fresh roots contain a large proportion of water, and the longer they are exposed to the environment, the more water they lose. Therefore, to ensure the robustness of the experiment, the influence of moisture content on the near-infrared spectrum must be minimized. To reduce the error caused by moisture, the surface soil of each plant was washed off with clean water immediately after harvesting; the samples were wiped dry and weighed, and weighed again before the near-infrared spectra were collected to ensure that the mass difference due to water loss was maintained at ca. 8%. We also selected bands free of water peaks to establish the model. However, there is still room to further reduce the differences caused by moisture.
Another source of error in collecting NIR spectra of fresh roots was the irradiation position. As shown in Fig. 4, the distribution of the laticifers is easy to see once the TKS root is cut crosswise. The laticifers of TKS are arranged closely in a ring between the epidermis and the xylem [28], and the laticifer cells in the laticifers are where the latex is produced. Because the laticifers are fairly evenly distributed in the roots, the error could be greatly reduced as long as each plant was irradiated at the defined positions (main root: 10-20 mm under the leaf; lateral roots: 5-15 mm under the main root junction) to acquire the near-infrared spectrum.

Interpretation of NIRS Spectra
From the raw spectral data, the absorption peaks associated with rubber can be roughly located at 1100-1300 nm, 1600-1800 nm and 2100-2300 nm, and the water bands at 1400-1600 nm and 1900-2200 nm (Fig. 5).
Because the combination-band region of the near-infrared spectrum contains many overlapping absorptions, treatment methods such as derivative spectra were used to enhance the absorption peaks of the −CH, −CH2 and −CH3 functional groups in isoprene and so establish the correlation between natural rubber concentration and spectral absorbance. There are also rubber absorption peaks in the range of 2200-2400 nm, but the spectral noise in this region is relatively strong, so this region was removed when modeling. The PLSR model established in the range of 1080-1800 nm predicted the rubber content better.
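Selecting the modeling range can be expressed as a simple wavelength mask. The sketch below keeps 1080-1800 nm and drops the water intervals given in the Statistical Analysis section (1350-1450 nm and 1800-1950 nm); combining the two in this way, and treating all interval bounds as inclusive, are assumptions, and the function name and wavelength grid are hypothetical.

```python
def select_bands(wavelengths_nm, spectrum,
                 keep=(1080.0, 1800.0),
                 exclude=((1350.0, 1450.0), (1800.0, 1950.0))):
    """Return the (wavelengths, values) inside `keep` and outside every `exclude` band.

    All bounds are treated as inclusive, so a point at exactly 1800 nm is dropped
    because it also belongs to the 1800-1950 nm water band.
    """
    kept_w, kept_v = [], []
    for w, v in zip(wavelengths_nm, spectrum):
        in_range = keep[0] <= w <= keep[1]
        in_water = any(lo <= w <= hi for lo, hi in exclude)
        if in_range and not in_water:
            kept_w.append(w)
            kept_v.append(v)
    return kept_w, kept_v


# Hypothetical coarse grid covering the instrument range of 900-2500 nm.
grid = [900.0 + 100.0 * i for i in range(17)]
values = [0.0] * len(grid)  # placeholder reflectance values
kept_w, kept_v = select_bands(grid, values)
print(kept_w)
```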

Prediction by NIRS of Rubber Content Values of Fresh TKS
IQ Predict was used to predict rubber content from the spectral data with the calibration model, and the predictions were compared with the corresponding reference values. Twenty unknown fresh TKS root samples were predicted using the NIR model, and the results are tabulated in Table 3. The rubber content values of TKS obtained by NIRS are similar to those obtained by the traditional solvent extraction method [30,31], but NIRS is faster, more convenient, more environmentally friendly and more economical. The table shows that the bias between the model predictions and the Py-GC results is almost zero, and that the rubber content predicted by the NIR model is close to that measured by the reference method. This indicates that the predictions of the NIR model are relatively accurate and that the model can be used for large-scale testing of real samples if more samples are collected.

Conclusion
In this study, fresh plant roots were used directly as the object of near-infrared spectroscopy. Keeping sample preparation simple greatly reduces human error during operation, which helps improve the accuracy of the near-infrared spectroscopy model. The results (Rc² = 0.95, RPDc = 5.54) are satisfactory and can meet the needs of practical applications.
The standard deviation of the Py-GC reference values used to calibrate the NIRS method was 2.64%. For the characterization of macro or even trace amounts of natural rubber, pyrolysis gas chromatography is probably the most accurate method to date. In this study, the method was combined with near-infrared spectroscopy, and reliable, consistent results were obtained, allowing the content of natural TKS rubber to be characterized accurately and quickly.
The comprehensive development and utilization of TKS will be the development direction of this crop [32]. In the future, near-infrared spectroscopy technology will also [...] In addition, although this spectral model for field measurement can be used to estimate the rubber content of a single plant, it is even better suited to evaluating the average NR content of plants across an entire field to monitor the impact of planting parameters. For practical reasons, this research was mostly conducted in the laboratory. The next step will be actual measurements in the field to provide breeders, agronomists and farmers with a simple, high-throughput tool for rubber content determination.