Terahertz time-domain spectroscopy as a novel tool for crystallographic analysis in cellulose: the potentiality of being a new standard for evaluating crystallinity

Given that terahertz (THz) radiation responds to intermolecular forces such as hydrogen bonds, THz time-domain spectroscopy (THz-TDS) has expanded possibilities in cellulose research. In this study, THz-TDS was used to investigate the crystallinity of three types of cellulose-based materials. Microcrystalline cellulose (MCC) and wood were ball milled for different durations, and pseudo-wood was a mixture of MCC and lignin in different mass fractions. All the samples showed peaks at 3.04 THz in the THz mass absorption coefficient spectra. Further, the spectra from 2.79 THz to 3.32 THz were cut out and detrended by subtraction of a baseline. The integrated intensity of the detrended spectra showed a correlation with the mass fraction of lignin of the pseudo-wood samples and with the ball-milling time of the MCC and wood samples. The correlation was similar to that of the crystallinity index calculated from X-ray powder diffraction. Moreover, the original wood sample without ball milling had an integrated intensity that was about 30% that of the original MCC sample, matching the cellulose concentration of the wood (about 30% to 40%). We normalized the integrated intensity from 2.79 THz to 3.32 THz to the range of 0 to 1 by a min–max algorithm and proposed a new “index” for evaluating crystallinity.


Introduction
Cellulose is a macromolecular polysaccharide composed of glucose units, where the β(1→4)-D-glucose units are linked to a linear chain by hydrogen bonds (Updegraff 1969) and the cellulose chains further are linked to a sheet structure. The abundant intra- and inter-chain hydrogen bonds in cellulose make its structure relatively stable (Nishiyama et al. 2002; Langan et al. 2005). The sheets of cellulose are spontaneously and orderly packed by hydrophobic forces such as van der Waals forces, and further aggregated into larger microfibrils (Nishiyama 2009). The degree of order of cellulose crystals can be evaluated in general by using the crystallinity index (CrI), which is an important property of cellulose, and measured by X-ray powder diffraction (XRD). Cellulose is abundantly contained in the cell walls of plants and algae and plays a role in supporting cell structure. Wood is a typical natural cellulose-based material with a wide application and long history. The CrI of cellulose reflects the physical and chemical properties of wood to a certain extent. The tensile strength, Young's modulus, density, thermal stability, and dimensional stability of wood all increase as the cellulose CrI increases (Density et al. 1975; Thygesen et al. 2005; Poletto et al. 2012), whereas the swelling and reaction speed of enzymatic conversion processes decrease (Jörgensen et al. 1950; Hall et al. 2010). This condition means that crystalline cellulose is one of the main rate-limiting barriers for the efficient transformation and utilization of plant biomass (Himmel et al. 2007). Therefore, the crystallinity of cellulose must be investigated.
In addition to XRD, the methods for investigating the crystallinity of cellulose include solid-state carbon-13 nuclear magnetic resonance (13C NMR), cross polarization/magic angle spinning (CP/MAS), infrared spectroscopy (IR), near-infrared spectroscopy (NIR), and Raman spectroscopy (Teeäär et al. 1987; Kataoka and Kondo 1998; Inagaki et al. 2010; Kim et al. 2013). Among these methods, XRD is the most widely used and is considered a standard method for evaluating crystallinity in most cases, given that its pattern reflects the arrangement of atoms and it has a rigorous theoretical calculation formula. However, the CrI obtained by XRD differs significantly depending on the selected calculation method. The most used methods for determining the CrI from XRD are the Segal peak height method (Segal et al. 1959), the amorphous subtraction method (Thygesen et al. 2005), and the peak deconvolution method (Hermans and Weidinger 1948). The minimum diffraction intensity between the (200) and (110) crystalline diffraction peaks usually appears at about 18°, and Segal et al. attributed this intensity to the amorphous fraction. However, the diffraction intensity at this position results in part from the overlap of adjacent peaks; thus, the reliability of the first two methods is questionable (French and Santiago Cintrón 2013; French 2014; Ling et al. 2019). Therefore, the peak deconvolution method has been the most discussed in recent studies, although many researchers still use the Segal method. For the deconvolution process, the most used peak profiles are Gaussian, Lorentzian, Voigt, and pseudo-Voigt (Hindeleh and Johnson 1972; de Keijser et al. 1983; Wada et al. 1997). However, the accurate expression of amorphous curves using only such functions is difficult (del Cerro et al. 2020). Yao et al. reported an amorphous curve fitted using Fourier series modeling, which gave a result that was consistent with the measured amorphous cellulose pattern (Yao et al. 2020).
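As a point of reference for the Segal peak height method mentioned above, the following minimal sketch computes a CrI from two intensities, using the commonly cited form CrI = (I_200 - I_am) / I_200 × 100. The function name and the example intensities are hypothetical and are not taken from this study.

```python
def segal_cri(i_200: float, i_am: float) -> float:
    """Segal peak-height crystallinity index in percent.

    i_200: maximum intensity of the (200) crystalline peak.
    i_am:  minimum intensity between the (110) and (200) peaks
           (around 18 degrees 2-theta), attributed to amorphous material.
    """
    if i_200 <= 0:
        raise ValueError("i_200 must be positive")
    return (i_200 - i_am) / i_200 * 100.0


if __name__ == "__main__":
    # Hypothetical intensities in arbitrary units, for illustration only.
    print(f"Segal CrI: {segal_cri(1000.0, 200.0):.1f} %")
```

Because only two intensity readings enter the calculation, any overlap of adjacent peaks at the position of the minimum feeds directly into the result, which is the weakness of this method noted above.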
The terahertz (THz) band has received considerable attention; this band is located in the far-infrared region, at the boundary between light and radio waves, with a frequency range of 0.1 THz to 10 THz, corresponding to wavelengths of 3 mm to 0.03 mm. The earliest commercially employed application of this band was THz time-domain spectroscopy (THz-TDS). The THz band responds to the intermolecular vibration and rotation of numerous biological macromolecules; thus, the THz band has been used in biology (Plusquellic et al. 2007; Markelz 2008), biomaterials (Inagaki et al. 2014a, b; Wang et al. 2019), hydrodynamics (Braly et al. 2000), and medicine studies (Xie et al. 2014). Given the response of THz radiation to phonons in the crystal lattice, THz has been used in several cases to study crystalline cellulose; in one case, the CrI of microcrystalline cellulose (MCC) was determined by THz-TDS combined with a partial least squares (PLS) model (Vieira and Pasquini 2014), and in another, the capability of THz-TDS to distinguish between the cellulose crystalline forms Iα and Iβ was demonstrated (Wang et al. 2020).
This study is a follow-up of previous research (Wang et al. 2020), in which we continue to explore the possibilities of THz-TDS in cellulose research. THz-TDS was used to investigate cellulose-based materials, including MCC and wood, both ball milled for different durations, and pseudo-wood, which was a mixture of MCC and lignin. All materials showed a peak at 3.04 THz in the mass absorption coefficient spectra, and the peak intensity decreased as the crystallinity was decreased by ball milling. Further, the mass absorption coefficient spectra from 2.79 THz to 3.32 THz were cut out and detrended by subtraction of a baseline. The integrated intensity of the detrended spectra showed a correlation with the mass fraction of lignin of the pseudo-wood and with the ball-milling time of MCC and wood. Unlike the determination of the CrI of cellulose by deconvolution of the XRD pattern, the processing of THz mass absorption coefficient spectra was relatively simple. Moreover, the integrated intensity from 2.79 THz to 3.32 THz was normalized to the range of 0 to 1 by a min-max algorithm, and it can be regarded as a new “index” for the evaluation of cellulose crystallinity.

Sample preparation
To obtain cellulose and wood powder with different CrI, the MCC powder (EMD Millipore 1.02331.0500) was ball milled with a benchtop ball mill (AV-2, Asahi Rika Factory. Ltd) at 200 rpm using ceramic spheres and jars for 0, 12, 24, 48, 72, and 144 h. An air-dried softwood, Hinoki cypress (Chamaecyparis obtusa), was first crushed by a rotary crusher and sieved with 10 US standard mesh to collect wood powder with a particle size of 200 μm. The wood powder was then ball milled for 0, 6, 12, 24, 32, 48, 72, and 144 h using the same procedure as for the MCC powder.
The pseudo-wood powder was a mixture of MCC and organic-solvent lignin powder (Guangzhou Yinnovator Bio-tech Co., Ltd), where the lignin powder was extracted from Eucalyptus (Eucalyptus) by a high-concentration glacial acetic acid (> 70%). To obtain different CrIs, we used the following mass fractions of lignin in the powder: 0%, 25%, 50%, 75%, and 100%. The 100% organic-solvent lignin powder can be considered as a totally amorphous powder.
The prepared powders with different CrIs were weighed out in portions of 0.075 g using an electronic balance (± 0.0001 g). All the powders were compressed into tablets with a diameter of 14 mm and a thickness of approximately 0.35 mm using a hand-pressing tableting kit (IMC-180C, Imoto Machinery Co., LTD). Three tablets were prepared for each powder. The thickness of these tablet samples was measured using a micrometer (± 0.001 mm).

XRD and THz-TDS measurement
The XRD measurements of all the tablet samples were performed with Cu-Kα radiation (λ = 0.1542 nm) using an X-ray diffractometer (Ultima IV, Rigaku) at a voltage of 40 kV and a current of 40 mA. Diffractograms were recorded from 5° to 40°, where the scan range included the peak of the main crystal lattice of cellulose and wood. The scan speed was set to 5° min⁻¹, and the sampling step was 0.05°. The background diffractogram was obtained from an empty sample holder.
The THz transmission spectra of all the tablet samples were measured by using a Tera Prospector-Kit model (NIPPO PRECISION Co., Ltd.), and the reference signals were obtained by measuring air before and after each sample measurement. The THz beam was horizontally polarized with a bandwidth from about 0.1 THz to 4.00 THz, and the spectral resolution was 0.02 THz, corresponding to the inverse of the temporal scan range (50 ps). The diameter of the THz beam spot on the sample was around 3 mm. Each measurement was recorded by averaging 100 scans to improve the signal-to-noise ratio. For reproducibility, all measurements were conducted three times. To avoid the influence of THz absorption by water vapor on the measurement, we placed the whole THz optical system in an almost-closed acrylic box that was continuously filled with dry air to ensure the stability of humidity. All samples were placed in the box for 24 h before measurement to equilibrate with the ambient humidity. Dry air was supplied to the box continuously from the time the samples were placed in it until all the THz measurements were completed.
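The relation between the temporal scan range and the spectral resolution stated above can be checked with a short calculation. This is only an illustrative sketch; the function name is hypothetical.

```python
def spectral_resolution_thz(scan_range_ps: float) -> float:
    """Return the frequency resolution in THz for a time window in ps.

    The reciprocal of one picosecond is one terahertz, so no unit
    conversion factor is needed.
    """
    if scan_range_ps <= 0:
        raise ValueError("scan_range_ps must be positive")
    return 1.0 / scan_range_ps


if __name__ == "__main__":
    # 50 ps is the temporal scan range reported in the text.
    print(f"{spectral_resolution_thz(50.0):.2f} THz")
```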

Calculation of CrI with XRD pattern
The original XRD patterns of the pseudo-wood, MCC, and wood samples were cut out to a scattering range from 10° to 40°, as shown in Fig. 1a-c, respectively. Gradient colors were used to express changes in the ball-milling time of the samples and in the mass fraction of lignin. Background subtraction and baseline correction were performed on all original XRD patterns before further calculation. The background pattern was obtained as mentioned in the experimental section. The baseline was fitted as a first-order polynomial after background subtraction; the process is shown in Fig. 2a.
After background subtraction and baseline correction, the corrected XRD patterns can be considered as composites of an amorphous intensity curve and five main crystalline peaks for the Iβ type cellulose, a dominant type in vascular plants such as wood. The five main crystalline peaks had Miller indices of (1-10), (110), (102), (200), and (004). The amorphous intensity curve was determined by two different methods (Park et al. 2010), which will be discussed in detail later.
Fig. 1 Original XRD patterns of a pseudo-wood with a decreased mass fraction of organic-solvent lignin, b MCC, and c wood samples
Fig. 2 Curve-fitting process of a pseudo-wood with 50% lignin. a Background subtraction and baseline correction, b deconvolution of peaks by using a pseudo-Voigt profile only, and c deconvolution of peaks in which the crystalline peaks were fitted by using a pseudo-Voigt profile and the amorphous curve was fitted by a 7th-order Fourier series model
The deconvolution of all the crystalline peaks was carried out with a curve-fitting process using a pseudo-Voigt profile, which is a linear combination of a Gaussian curve I_G(2θ) and a Lorentzian curve I_L(2θ) (de Keijser et al. 1983; Wada et al. 1997). In these curves, I_max is the intensity of the peak, 2θ_max is the peak position, and β is the full width at half maximum (FWHM).
The positions (2θ) (004), respectively. The other parameters, namely the FWHM, the peak intensity, and the coefficient μ, were all determined by the curve-fitting process.
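The pseudo-Voigt profile described above can be sketched as follows, using the standard FWHM-parameterized Gaussian and Lorentzian forms. The weighting convention is an assumption: here the mixing coefficient multiplies the Lorentzian part and its complement multiplies the Gaussian part, and the opposite convention is equally common. All function and parameter names are hypothetical.

```python
import math


def gaussian(two_theta, i_max, center, fwhm):
    """Gaussian peak with height i_max and full width at half maximum fwhm."""
    return i_max * math.exp(-4.0 * math.log(2.0) * ((two_theta - center) / fwhm) ** 2)


def lorentzian(two_theta, i_max, center, fwhm):
    """Lorentzian peak with height i_max and full width at half maximum fwhm."""
    return i_max / (1.0 + 4.0 * ((two_theta - center) / fwhm) ** 2)


def pseudo_voigt(two_theta, i_max, center, fwhm, mu):
    """Linear combination of the two peak shapes.

    ASSUMPTION: mu weights the Lorentzian and (1 - mu) the Gaussian;
    the source text does not state which convention was used.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie between 0 and 1")
    return (mu * lorentzian(two_theta, i_max, center, fwhm)
            + (1.0 - mu) * gaussian(two_theta, i_max, center, fwhm))
```

Both component curves share the same height, position, and FWHM, so the combined profile passes through half of its maximum at half the FWHM from the center for any value of the mixing coefficient.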
In this study, the amorphous intensity curve was first fitted as a pseudo-Voigt profile, as shown in Fig. 2b, by the same fitting process as that of the crystalline peaks. Given that recent research showed that the maximum of the amorphous intensity curve lies slightly above 20°, the peak position (2θ) of the amorphous intensity curve was fixed at 20.6° (Yao et al. 2020). This position is very close to that of the (102) crystalline peak, and the influence of this proximity will be discussed in detail later.
Yao et al. reported the Fourier series modeling of the amorphous intensity curve, and this method was also used in this study (Yao et al. 2020). The amorphous intensity curve was determined by fitting the averaged XRD pattern of three 100% organic-solvent lignin samples using a 7th-order Fourier series, as shown in Fig. 2c. The obtained Fourier series model can be considered as the basis function of the amorphous intensity curve. The XRD patterns of the other samples differed only in intensity, which can be adjusted by multiplying this Fourier series by a coefficient k in the deconvolution of the XRD patterns. In the Fourier series, a_0, a_i, b_i, and w are all constants determined automatically by the fitting process. Supplementary Information provides the details of the fitting of the Fourier series of the 100% organic-solvent lignin samples. The deconvolution results were evaluated with the coefficient of determination R², calculated from X_i, the intensity after background subtraction and baseline correction, Y_i, the fitted intensity, and X̄, the average of the intensities X_i. Table 1 summarizes the calculated R² of the pseudo-wood fitted by the two different amorphous intensity curves, and Supplementary Information summarizes the R² values of the other samples. As shown in Table 1, the curves fitted by both methods showed good R², which indicates that the curve-fitting of the two methods was credible.
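The scaled Fourier series model and the coefficient of determination described above can be sketched as follows. The series is written in the common form a_0 + Σ [a_i cos(i·w·x) + b_i sin(i·w·x)], which is an assumption about the exact model used, and R² is taken as 1 minus the ratio of the residual to the total sum of squares. Function names and all numerical values are hypothetical.

```python
import math


def fourier_series(x, a0, a, b, w):
    """Evaluate a0 + sum_i [a_i cos(i w x) + b_i sin(i w x)], i = 1..len(a)."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    return a0 + sum(
        a_i * math.cos(i * w * x) + b_i * math.sin(i * w * x)
        for i, (a_i, b_i) in enumerate(zip(a, b), start=1)
    )


def amorphous_intensity(x, k, a0, a, b, w):
    """Basis Fourier series scaled by the intensity coefficient k."""
    return k * fourier_series(x, a0, a, b, w)


def r_squared(observed, fitted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((x - y) ** 2 for x, y in zip(observed, fitted))
    ss_tot = sum((x - mean_obs) ** 2 for x in observed)
    return 1.0 - ss_res / ss_tot
```

Passing seven coefficients in each of `a` and `b` gives a 7th-order series. Because the shape constants are fixed once from the fully amorphous sample, only the single scale factor k remains free for the amorphous contribution when the other samples are deconvoluted.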
Once the peaks were deconvoluted by the curve-fitting process, the CrI was determined from S_Cr and S_Am, which are the sum of the integrated intensities of the five crystalline peaks and the integrated intensity of the amorphous intensity curve, respectively. As shown in Fig. 2b, c, the amorphous intensity curves fitted with a pseudo-Voigt profile and with a Fourier series gave different deconvolution results. Most evidently, the (102) peak in Fig. 2b was almost invisible: because the position of the (102) peak was fixed at 20.5° and that of the amorphous peak at 20.6°, the contribution of the (102) peak was almost systematically ignored in the fitting process. On the other hand, the (102) peak can be clearly observed in Fig. 2c. Thus, on average, the CrI calculated with a pseudo-Voigt amorphous intensity curve was smaller than that calculated with a Fourier series amorphous intensity curve. Figure 3 shows the CrI calculated by the two methods. In the figure, the amorphous curve for (a) pseudo-wood, (b) MCC, and (c) wood was fitted by a pseudo-Voigt profile, whereas that for (d) pseudo-wood, (e) MCC, and (f) wood was fitted by a Fourier series. As shown in Fig. 3, the two fittings gave similar crystallinities. For the pseudo-wood, the crystallinity decreased from about 85% to about 5% with the increase in the mass fraction of lignin. For MCC and wood, the crystallinity decreased from about 90% to 60% and from about 80% to 40%, respectively, with the increase in ball-milling time.
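The CrI calculation from the deconvoluted areas can be sketched as below. The ratio of the crystalline area to the total area, expressed in percent, is the form usually used with peak deconvolution and is assumed here; the function name and the example areas are hypothetical.

```python
def crystallinity_index(crystalline_areas, amorphous_area):
    """CrI in percent from deconvoluted integrated intensities.

    crystalline_areas: integrated intensities of the crystalline peaks
                       (five peaks for cellulose I-beta in the text).
    amorphous_area:    integrated intensity of the amorphous curve.
    ASSUMPTION: CrI = S_Cr / (S_Cr + S_Am) * 100.
    """
    s_cr = sum(crystalline_areas)
    total = s_cr + amorphous_area
    if total <= 0:
        raise ValueError("total integrated intensity must be positive")
    return s_cr / total * 100.0


if __name__ == "__main__":
    # Hypothetical areas in arbitrary units, for illustration only.
    print(f"CrI: {crystallinity_index([10.0, 20.0, 5.0, 60.0, 5.0], 25.0):.1f} %")
```

Under this form, any crystalline intensity that the fit assigns to the amorphous curve, such as that of a peak lying close to the amorphous maximum, lowers the numerator and raises the amorphous term at the same time.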
As shown in Fig. 3a, d, for the pseudo-wood, the samples with a 25% mass fraction of organic-solvent lignin showed a crystallinity similar to that of the samples without organic-solvent lignin (100% MCC samples). A non-linear correlation was observed between the CrI and the mass fraction of lignin, where the CrI decreased sharply when the mass fraction of lignin was over 50%. Regardless of the fitting method used to obtain the curve, the sample that had 100% organic-solvent lignin, which was supposed to have no crystalline region, still showed a crystallinity of about 5%. The crystallinity of MCC and wood showed a similar downtrend. In general, the concentration of cellulose in different species of wood is about 30% to 40% (Pettersen 1984). However, in this study, for most of the ball-milling times, the CrI of wood was about 10% lower than that of MCC. For the same type of samples, the changes in the CrI calculated from the XRD patterns were easy to observe and evaluate. However, although the cellulose concentration of wood was about 30% to 40%, the original wood samples still showed a CrI relatively similar to that of the original MCC samples. Thus, accurate comparison of the differences in CrIs between various types of samples using the same standard can be a challenge. Table 2 summarizes the detailed crystallinity of the pseudo-wood samples calculated from the two curve-fittings and the crystallinity calculated from the THz mass absorption coefficient spectrum, which will be discussed in detail in the next section. Supplementary Information summarizes the crystallinities of other samples.
Fig. 3 CrI calculated from two different curve-fitting processes of the amorphous curve, where a-c with circle markers were fitted by a pseudo-Voigt profile, and d-f with diamond markers were fitted by a 7th-order Fourier series

Evaluation of CrI by THz mass absorption coefficient spectra
The measured THz time-domain signal was Fourier transformed into the frequency domain. The samples used in this study were all “optically thick samples,” for which the backward and forward reflections in the sample (Fabry-Pérot effect) can be ignored (Duvillaret et al. 1996). To further correct for the possible inhomogeneity of the hand-made samples, the mass absorption coefficient α, which also accounts for the influence of density, was calculated from the sample density ρ, the sample thickness L, and the photon intensities R_r and R_s of the reference and the measured sample, respectively (Kore and Pawar 2014). Figure 4a-c shows the original THz mass absorption coefficient spectra from 0.2 THz to 4 THz of the pseudo-wood, MCC, and wood, respectively. The same gradient colors as those in Fig. 1 were used. The calculated mass absorption coefficient spectra were all first corrected for baseline fluctuations with the standard normal variate (SNV) algorithm and then smoothed by the application of a Savitzky-Golay filter with a second-order polynomial and fifteen smoothing points, as shown in Fig. 4d-f. The pseudo-wood and MCC samples showed absorption peaks at around 2.1 THz, and these peaks gradually decreased with the increase in the mass fraction of lignin or the ball-milling time, whereas no evident absorption peaks of the wood samples were observed at this frequency position. Previous research found that the absorption peaks around 2.1 THz correlate with crystalline structures, where the Iβ-dominant and Iα-rich types showed absorption peaks at 2.11 and 2.38 THz, respectively. Furthermore, the intensity of these peaks was correlated with the fraction of Iα: the larger the fraction of Iα, the stronger the absorption at 2.38 THz, and the smaller the fraction of Iα, the stronger the absorption at 2.11 THz (Wang et al. 2020).
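The calculation of the mass absorption coefficient and the SNV correction described above can be sketched as follows. The Beer-Lambert form α = ln(R_r / R_s) / (ρL) is an assumption based on the quantities named in the text, the SNV step uses the sample standard deviation, and all function names are hypothetical. The Savitzky-Golay smoothing step is not reproduced here; one existing implementation is `scipy.signal.savgol_filter`, which takes a window length and a polynomial order.

```python
import math
import statistics


def mass_absorption_coefficient(ref, sample, density_g_cm3, thickness_cm):
    """Mass absorption coefficient (cm^2/g) at each frequency point.

    ASSUMPTION: alpha = ln(R_r / R_s) / (rho * L), with R_r and R_s the
    reference and sample intensities at the same frequency.
    """
    factor = 1.0 / (density_g_cm3 * thickness_cm)
    return [factor * math.log(r / s) for r, s in zip(ref, sample)]


def snv(spectrum):
    """Standard normal variate: center one spectrum and scale it by its own
    standard deviation (sample standard deviation, n - 1 denominator)."""
    mean = statistics.mean(spectrum)
    std = statistics.stdev(spectrum)
    return [(v - mean) / std for v in spectrum]
```

Dividing by the product of density and thickness amounts to dividing by the mass per unit area of the tablet, which is why this quantity is less sensitive to small thickness differences between hand-pressed tablets of equal mass.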
Similar to the case of the wood samples, the absorption peaks of the pseudo-wood samples with a mass fraction of organic-solvent lignin over 50% were difficult to observe at around 2.1 THz. Given that the cellulose concentration of the wood samples was about 30%, the absence of an evident absorption peak at around 2.1 THz, such as that of the MCC samples, might have been caused by this low cellulose concentration. The thickness of the cellulose samples used in the previous research was about 0.5 mm to 1 mm (Wang et al. 2020), which was thicker than that of the samples in this study. Given that the absorption around 3 THz is correlated with the CrI of cellulose (Vieira and Pasquini 2014), the samples were made thin in this study to improve the signal-to-noise ratio and to investigate further the changing absorption peaks around 3 THz. However, this condition may make the small absorption peaks around 2 THz difficult to observe. As shown in Fig. 4a-f, all types of samples exhibited a relatively strong absorption at 3.04 THz without a peak shift. The mass absorption coefficient spectra of wood showed a relatively small intensity compared with those of the other samples. To further study this absorption peak, we cut out the mass absorption coefficient spectra in the range of 2.79 THz to 3.32 THz and then subtracted a baseline that was fitted as a first-order polynomial. Figure 4g-i show the baseline For the pseudo-wood samples, as calculated from the XRD patterns, the integrated intensity of the mass absorption coefficient spectra from 2.79 THz to 3.32 THz showed a linear correlation with the mass fractions of lignin. Given that the mass fraction of lignin was adjusted to increase linearly (+25% each time) during the sample preparation process, the correlation obtained from the THz mass absorption coefficient spectra is more reasonable than that obtained from the XRD patterns.
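The cutting, detrending, and integration steps described above can be sketched as follows. The text states only that the baseline was a first-order polynomial; this sketch assumes the straight line joining the two end points of the window and uses trapezoidal integration, both of which are illustrative choices. The function name is hypothetical, and the frequencies are assumed to be in ascending order.

```python
def integrated_peak_intensity(freqs_thz, alpha, f_lo=2.79, f_hi=3.32):
    """Integrated intensity of the detrended spectrum inside [f_lo, f_hi]."""
    # Keep only the points inside the analysis window.
    window = [(f, a) for f, a in zip(freqs_thz, alpha) if f_lo <= f <= f_hi]
    if len(window) < 2:
        raise ValueError("need at least two points inside the window")

    # ASSUMPTION: the baseline is the straight line through the window end points.
    (f0, a0), (f1, a1) = window[0], window[-1]
    slope = (a1 - a0) / (f1 - f0)
    freqs = [f for f, _ in window]
    detrended = [a - (a0 + slope * (f - f0)) for f, a in window]

    # Trapezoidal integration of the detrended spectrum.
    return sum(
        0.5 * (detrended[i] + detrended[i + 1]) * (freqs[i + 1] - freqs[i])
        for i in range(len(freqs) - 1)
    )
```

With this choice of baseline, a spectrum that is purely linear across the window integrates to zero, so only absorption that rises above the local trend contributes to the integrated intensity.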
The correlations between the ball-milling time and the integrated intensity of the MCC and wood samples showed a trend similar to that of the crystallinity obtained from the XRD patterns. However, the values of the 144 h ball-milled MCC samples decreased by about 50% compared with those of the original MCC samples, whereas the CrI of the 144 h ball-milled MCC samples calculated from XRD patterns decreased by about 25%. Furthermore, the original wood samples without ball milling showed relatively small values compared with the original MCC samples. By contrast, the CrI of the original wood samples calculated from the XRD pattern showed no evident difference from that of the original MCC samples. The integrated intensities of the original wood samples were about 30% of those of the original MCC samples, thus matching the approximate concentration of cellulose in wood (about 30%). Table 2 summarizes the detailed integrated intensities of the pseudo-wood samples and the CrI calculated by the two curve-fittings from the XRD patterns.
The integrated intensities of all samples were normalized to the range of 0 to 1 by the min-max algorithm, and the results are presented in Fig. 5d-f. After this process, the normalized value can be considered as an “index” that can be used to evaluate crystallinity. Unlike the CrI, which is a relative value calculated from XRD patterns, the above results suggest that the THz mass absorption coefficient spectra may reveal the absolute mass fractions of crystalline cellulose in all the samples used in this study. This assumption rests on the observation that the peaks of the mass absorption coefficient in the THz region did not shift with changes in the type of sample, the ball-milling time, or the mass fraction of lignin; only the intensity changed. Moreover, the THz spectra contain rich physical and chemical information, such as the phonon frequency of crystal lattices, which provides a possibility for the comparison of the mass fraction of crystalline cellulose between different cellulose-based materials. However, the energy corresponding to 3 THz is 0.012 eV, which is relatively small compared with the stabilization energy of hydrogen bonding in cellulose Iβ, for example, the intrasheet O6-H…O3 H-bonds with a stabilization energy of about 0.87 eV (Parthasarathi et al. 2011). Some studies have simulated the THz vibrational properties of cellulose Iβ using density-functional theory (Peccianti et al. 2017); however, the assignment of the absorption peak at 3.04 THz at the molecular level still needs further research.
Fig. 5 Integrated intensity calculated from the detrended mass absorption coefficient spectra from 2.79 THz to 3.32 THz of a pseudo-wood, b MCC, and c wood samples. d-f Normalized integrated intensity by min-max algorithm
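The min-max normalization used to define the index can be sketched as follows. The function name and the example values are hypothetical and are not measured data.

```python
def min_max_normalize(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is undefined")
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Hypothetical integrated intensities in arbitrary units.
    print(min_max_normalize([2.0, 4.0, 6.0]))
```

The resulting index depends on which samples define the minimum and the maximum, so values are comparable between data sets only when the same pair of end-point samples is used.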

Conclusions
In this study, pseudo-wood, MCC, and wood samples were adjusted to different cellulose CrIs by changing the mass fraction of lignin of the pseudo-wood samples and the ball-milling time of the MCC and wood samples. The measured XRD patterns of all samples were deconvoluted by two different curve-fitting processes, in which one used a pseudo-Voigt profile and the other used a 7th-order Fourier series to represent the amorphous intensity curve. Both methods gave good fitting results, and the calculated CrIs were close. All samples showed peaks at 3.04 THz in the THz mass absorption coefficient spectra. The THz mass absorption coefficient spectra were pretreated by SNV to correct baseline fluctuations, smoothed by the application of a Savitzky-Golay filter with fifteen points, and then detrended by subtraction of the baseline from 2.79 THz to 3.32 THz. The integrated intensities showed a changing trend similar to that of the CrIs obtained from XRD patterns. However, evident differences were observed. For the pseudo-wood, the correlation between the CrI and the mass fraction of lignin was non-linear, whereas the integrated intensity of the detrended THz mass absorption coefficient spectra decreased linearly with the increase in the mass fraction of lignin. The original wood sample without ball milling had an integrated intensity that was about 30% of that of the original MCC sample, matching the cellulose concentration of wood. By contrast, the CrI of the original wood sample calculated from XRD patterns showed a value similar to that of the original MCC sample. Thus, based on the above results, THz mass absorption coefficient spectra may be used to evaluate the mass fraction of crystalline cellulose. Furthermore, after normalization of the integrated intensity by a min-max algorithm to the range of 0 to 1, the original MCC sample without lignin and the 100% lignin sample can be considered as a pair of standard samples corresponding to 1 and 0, respectively.
All cellulose crystallinities can be evaluated within this range. However, the assignment of the absorption peak of cellulose at 3.04 THz is still uncertain. For the deconvolution of XRD patterns, numerous optional functions are available, such as the pseudo-Voigt profile and Fourier series used in this study, and the choice among them relies on the experience of the analyst. By contrast, the processing of THz mass absorption coefficient spectra is simple, and all cellulose-based materials showed peaks at 3.04 THz without a peak shift. The measurement of the THz signal is quick, simple, and safe and requires no sample pretreatment. Thus, THz-TDS has the potential to become a new standard for the evaluation of cellulose crystallinity.

Declarations
Conflict of interest None.
Ethical standards This study complies with ethical standards; it does not involve human participants or animals, and there are no potential conflicts of interest.