Physical properties and measurements of hair
The hair samples collected from different species were analyzed by several measurement techniques: solid-state NMR, TD-NMR, FT-IR, and TG-DTA. For solid-state NMR, 1H wide-line (anisotropic) spectra, 1H MAS (isotropic) spectra, and 13C CP-MAS spectra were recorded. Fig. 1 shows the averaged data of each measurement. The 1H wide-line spectra exhibited the line shape typical of solid samples, broadened over a range of about 100 ppm by dipolar interactions with various orientations (Fig. 1a and S1a). A relatively narrow peak was also observed around 0 ppm. These line shapes indicated that the hair samples contained components with different molecular mobilities, or different degrees of anisotropic interaction9. In contrast, the 1H MAS spectra showed characteristic peaks within a narrower spectral region, because MAS averages out the anisotropic interactions and leaves the isotropic ones (Fig. 1b). Several sharp peaks from 0.8 to 2.3 ppm were ascribed to lipids37,38. The lipid peaks were most distinct in cat hairs and hardly observed in pig hairs (Fig. S2a). The relatively broad peak at around 2.8–7.0 ppm encompasses the Hα signals of amino acids, which originate mainly from keratins37,39. In addition, the broad line shape extending from −5 to 14 ppm may represent highly anisotropic, rigid components such as structured keratins. The 13C CP-MAS spectra exhibited distinctive peaks of side-chain aliphatic carbons (10–40 ppm), Cα methine carbons of amino acids (45–60 ppm), aromatic carbons (115–158 ppm), and carbonyl carbons (165–178 ppm) (Fig. 1c)3,22,37,40–42. Among the hair types, pig hairs showed a relatively high intensity of the carbonyl carbons assignable to the α-helix form around 176 ppm (Fig. S3a)3,22,40,41. The TD-NMR signals decayed rapidly at early times and then decreased gradually to zero (Fig. 1d). These decay curves demonstrated the presence of components with different relaxation rates, that is, different mobilities, in hairs43.
This TD-NMR behavior was consistent with the line shapes of the 1H wide-line NMR spectra. The TD-NMR curves showed a similar tendency within each species, although substantial donor-dependent differences were also present (Fig. S4a). The FT-IR spectra likewise showed characteristic absorption peaks of proteins and lipids (Fig. 1e). The Amide A, Amide I, Amide II, and Amide III peaks of proteins were observed around 3277, 1634, 1516, and 1234 cm−1, respectively44–46. The methyl and methylene stretching bands at 2958 and 2850 cm−1 were representative of lipids47. The hairs of each species showed similar spectral patterns with intensity variations, particularly in the lipid peaks (Fig. S5a). TG-DTA provided DTG curves of the hair samples (Fig. 1f). The mass loss below 100°C represented the removal of free water48–50. The distinct mass loss up to around 240°C was attributed to pyrolysis of the cortex, in accordance with previous reports5,25. After the cortex has pyrolyzed, the remaining cuticle forms “micro-tubes” emptied of cortical material. The subsequent mass loss should correspond to decomposition of these micro-tubes, possibly preceded by the elimination of bound water48,51. Carbonization of the remaining constituents then proceeded until the end temperature of 500°C. Human hairs showed slightly higher DTG values at around 240°C–260°C than the other species (Fig. S6a).
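A DTG curve is the first derivative of the TG mass curve with respect to temperature. The sketch below computes it numerically with NumPy, assuming a TG record sampled as mass percent versus temperature in °C. The function name `dtg_curve`, the sign convention (positive for mass loss; some instruments report the opposite sign), and the synthetic two-step TG curve are illustrative assumptions, not the instrument's own processing.

```python
import numpy as np

def dtg_curve(temperature_c, mass_percent):
    """Rate of mass loss, -dm/dT, in % per degree C (positive when mass is lost).

    np.gradient uses central differences in the interior and one-sided
    differences at both ends, and accepts non-uniform temperature spacing.
    """
    t = np.asarray(temperature_c, dtype=float)
    m = np.asarray(mass_percent, dtype=float)
    return -np.gradient(m, t)

# Synthetic TG curve (hypothetical): a 10% free-water step near 80 degC
# and a 40% decomposition step near 250 degC.
temps = np.linspace(30.0, 500.0, 471)
mass = (100.0
        - 10.0 / (1.0 + np.exp(-(temps - 80.0) / 8.0))
        - 40.0 / (1.0 + np.exp(-(temps - 250.0) / 10.0)))
dtg = dtg_curve(temps, mass)
fastest_loss_c = temps[np.argmax(dtg)]  # temperature of the fastest mass loss
```

Peaks in such a curve mark the temperatures at which mass is lost fastest, which is how the water-removal and pyrolysis events above appear as distinct DTG features.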
The hair samples were also examined with a tensile tester to evaluate the following physical properties: breaking force, elastic modulus, extension, and yield strength. The physical property values are plotted in Fig. S7. Breaking force was high for pig hairs (median of 6.02 N) and relatively low for cat hairs (median of 0.21 N) (Fig. S7a), correlating well with hair diameter (Fig. S7e). Meanwhile, cow hairs showed a relatively high elastic modulus (median of 4.6 GPa) (Fig. S7b), and human hairs showed slightly higher extension (median of 65%) (Fig. S7c) than the other hair types. Yield strength was relatively low for cat (99 MPa) and human hairs (105 MPa) and high for cow hairs (177 MPa) (Fig. S7d). Because these properties depend on both species and individual donor, the collected hair samples covered a wide range of physical property values.
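To make the tensile quantities concrete, the following sketch derives breaking force, elastic modulus, and extension from a single force–elongation record. It assumes a circular fiber cross-section computed from the measured diameter, and takes the modulus as the slope of a straight-line fit (Hooke's law) over an initial strain window. The helper name `tensile_properties`, the 2% window, breakage at the force maximum, and the synthetic data are all hypothetical. Yield strength is omitted because its determination depends on the analysis settings of the instrument.

```python
import numpy as np

def tensile_properties(force_n, elongation_mm, gauge_length_mm, diameter_um,
                       linear_strain_max=0.02):
    """Hypothetical helper: tensile properties of one fiber from a force-elongation curve."""
    force = np.asarray(force_n, dtype=float)
    strain = np.asarray(elongation_mm, dtype=float) / gauge_length_mm
    # Assumption: circular cross-section, area = pi * d^2 / 4 (in m^2).
    area_m2 = np.pi * (diameter_um * 1e-6) ** 2 / 4.0
    stress_pa = force / area_m2
    # Hooke's law region: fit stress = E * strain + c over small strains only.
    mask = strain <= linear_strain_max
    modulus_pa, _intercept = np.polyfit(strain[mask], stress_pa[mask], 1)
    i_break = int(np.argmax(force))  # assumption: breakage at the force maximum
    return {
        "breaking_force_n": float(force[i_break]),
        "elastic_modulus_gpa": float(modulus_pa / 1e9),
        "extension_percent": float(strain[i_break] * 100.0),
    }

# Synthetic record (hypothetical): E = 4 GPa up to 2% strain, then a soft plateau.
strain = np.linspace(0.0, 0.5, 501)
stress = np.where(strain <= 0.02, 4e9 * strain, 8e7 + 1e8 * (strain - 0.02))
diameter_um, gauge_mm = 80.0, 20.0
area = np.pi * (diameter_um * 1e-6) ** 2 / 4.0
props = tensile_properties(stress * area, strain * gauge_mm, gauge_mm, diameter_um)
```

Because force scales with cross-sectional area for a given stress, this calculation also illustrates why breaking force tracks hair diameter so closely.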
Generation of measurement descriptors
The measurement data of hairs were converted into “measurement descriptors” by data processing, including spectral differentiation, binning, dimension reduction by PCA, and curve deconvolution (Fig. 2). Second-order differentiation was applied to the 1H wide-line, 1H MAS, and 13C CP-MAS NMR spectra, the FT-IR spectra, and the DTG curves to enhance the features of the profiles. Differentiation also corrects baseline offsets and linear drift. Binning calculated the average value within defined regions of each profile so that the characteristic peaks were represented as resolved variables. Dimension reduction extracted sets of correlated variables that represent the features of the data efficiently. Curve deconvolution of the 1H wide-line NMR spectra and the TD-NMR decay curves separated the mixed signals into a small number of components by function fitting. Schematics of the generated bins and deconvoluted components are shown, along with the respective measurement results, in Fig. S1–S6. In total, 902 descriptors were generated; all are detailed in Table S1.
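Second-order differentiation followed by binning, the route behind most of the descriptors discussed later, can be sketched as below. This is a minimal NumPy version under stated assumptions. The derivative is taken by repeated central differences (`np.gradient`), whereas smoothing filters such as Savitzky–Golay are often preferred for noisy spectra. The bin edges and the descriptor names built from a prefix and a bin number (mimicking labels such as “dtg.2der.36”) are hypothetical; the actual bin definitions are those in Table S1.

```python
import numpy as np

def second_derivative(x, y):
    """Second derivative by two passes of central differences.

    A constant offset or a linear baseline drift has zero second derivative,
    so both are removed by this step.
    """
    return np.gradient(np.gradient(y, x), x)

def bin_profile(x, y, edges):
    """Average y within each half-open window [edges[i], edges[i+1])."""
    x, y = np.asarray(x), np.asarray(y)
    idx = np.digitize(x, edges) - 1
    return np.array([y[idx == i].mean() for i in range(len(edges) - 1)])

def make_descriptors(prefix, x, y, edges):
    """Hypothetical naming: '<prefix>.<bin number>' with bins numbered from 1."""
    values = bin_profile(x, y, edges)
    return {f"{prefix}.{i + 1}": v for i, v in enumerate(values)}

# Usage on a synthetic profile: one Gaussian peak plus a sloping baseline.
x = np.linspace(0.0, 100.0, 1001)
peak = np.exp(-((x - 50.0) / 5.0) ** 2)
d2_with_baseline = second_derivative(x, peak + 3.0 + 0.02 * x)
d2_peak_only = second_derivative(x, peak)
descriptors = make_descriptors("demo.2der", x, d2_with_baseline,
                               np.arange(0.0, 101.0, 10.0))
```

The two second-derivative profiles agree to rounding error, which is the baseline-correction property noted in the text. Binning then reduces each profile to a handful of region averages.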
To obtain an overview of the relationship between the generated descriptors and the physical properties, CCorA was conducted. CCorA determines pairs of linear combinations of the variables in two datasets (here, physical properties and measurement descriptors) such that the correlation between them is maximized52. CCorA results were obtained for the descriptor set of each measurement (Fig. S8) and for the combined set (Fig. 3). In all plots, breaking force had a relatively large score (~1) on the first, most dominant canonical axis, indicating that breaking force was well explained by the prepared descriptors. In contrast, elastic modulus, extension, and yield strength were expressed mainly on the second canonical axis in most plots. Elastic modulus and yield strength were plotted close to each other, showing that these two properties have similar correlations with the measured data. Extension, on the other hand, was plotted on the opposite side, indicating a different and distinctive correlation with the measurements (Fig. 3). The relative contributions of the individual measurements to the physical properties were difficult to compare from these CCorA results. However, small scores did flag some less promising descriptor sets: those of the 1H wide-line NMR spectra for extension (Fig. S8a) and those of the 1H MAS NMR spectra for elastic modulus (Fig. S8b).
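The core computation of CCorA can be sketched in NumPy as follows. After centering, orthonormal bases of the two data blocks are obtained by QR decomposition, and the singular values of the product of these bases are the canonical correlations. The function name is hypothetical, the software used in the study is not specified, and we assume that the plotted scores are structure correlations, that is, correlations between each physical property and the canonical variates. This plain form requires more samples than variables in each block, so a set of several hundred descriptors would first need dimension reduction or regularization. The signs of canonical axes are arbitrary.

```python
import numpy as np

def ccora(X, Y):
    """Canonical correlation analysis via QR + SVD (NumPy only).

    Assumes n_samples > n_columns for both X and Y and full column rank.
    Returns the canonical correlations and the correlations between each
    Y variable and the Y canonical variates (structure coefficients).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)          # orthonormal basis of the X column space
    Qy, _ = np.linalg.qr(Yc)          # orthonormal basis of the Y column space
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    y_variates = Qy @ Vt.T            # Y canonical variates (centred, unit norm)
    y_unit = Yc / np.linalg.norm(Yc, axis=0)
    structure = y_unit.T @ y_variates # (n_Y_vars, n_axes) correlations
    return np.clip(s, 0.0, 1.0), structure

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))         # stand-in for a small descriptor block
y0 = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.5]) + 0.05 * rng.normal(size=200)
y1 = rng.normal(size=200)             # a property unrelated to X
corrs, structure = ccora(X, np.column_stack([y0, y1]))
```

In this synthetic case, the property that is well explained by X loads strongly on the first canonical axis, analogous to the behavior of breaking force described above.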
Prediction of physical properties by measurement descriptors
The measurement descriptors were further associated with the physical properties by building prediction models with RF and PLSR, which are nonlinear and linear algorithms, respectively. Each physical property was predicted using the descriptors generated from each individual measurement or from all measurements combined. The models were validated by 10-fold CV, and their prediction accuracies were evaluated by the coefficient of determination (R²) and NRMSE. The 10-fold CV was repeated 100 times to obtain the averages and standard deviations of R² and NRMSE, which are summarized in Table S2. Consistent with the CCorA results, breaking force was predicted accurately, with a high R² of ~0.913. The descriptors from the 1H MAS and 13C CP-MAS NMR spectra gave better prediction accuracies for breaking force (R² ~0.882 and 0.885, respectively) than those from the other measurements. The descriptors from the FT-IR spectra and DTG curves stood out for predicting elastic modulus (R² ~0.251 and 0.292, respectively) and yield strength (R² ~0.296 and 0.409, respectively), although these accuracies were modest. The prediction accuracies for extension were very poor, with R² near zero; among these models, the descriptor sets from the FT-IR spectra (R² ~0.170), DTG curves (R² ~0.098), and 13C CP-MAS NMR spectra (R² ~0.127) performed relatively better. The combined descriptor set from all measurements was expected to give superior predictions by drawing on multiple types of measured information. However, its prediction accuracies were comparable to or slightly poorer than those obtained with the descriptors of individual measurements. This result suggests that insignificant explanatory variables (i.e., measurement descriptors) hindered efficient prediction and made integrative interpretation difficult.
Therefore, descriptor selection was required to identify the measured information that contributes substantially to the physical properties.
Selection and interpretation of measurement descriptors
Integrative interpretation of information from multiple measurements of hair is the aim of the present study. Thus, the prediction models for physical properties were subsequently refined by selecting contributive descriptors from all 902 generated. The importance of each descriptor was evaluated for the RF and PLSR models via 100 repeats of 10-fold CV. The number of descriptors used for model building was then reduced stepwise: at each step, the 90% of descriptors with the highest importance values were carried over to the next step. The prediction accuracies (R² and NRMSE) of the RF and PLSR models built at each step are shown for the respective physical properties in Fig. 4a–d and Fig. S9a–d, respectively. As a general trend, starting from 902 descriptors, R² first increased (and NRMSE decreased) until reaching a maximum, which presumably corresponds to the elimination of insignificant descriptors. Further reduction of the number of descriptors decreased R², indicating that contributive descriptors were being excluded. Accordingly, the descriptor set with the highest R² was taken as the best in each selection series. Figure 4e–h and Fig. S9e–h plot the physical property values predicted with the best descriptor sets against the observed values. Descriptor selection improved the prediction accuracies, markedly so for elastic modulus, extension, and yield strength (Table 1). For example, R² increased from 0.913 to 0.925 for the RF model of breaking force, from 0.204 to 0.546 for the RF model of elastic modulus, from 0.233 to 0.527 for the PLSR model of extension, and from 0.336 to 0.606 for the PLSR model of yield strength. These results demonstrate the efficacy of this importance-based strategy for descriptor selection.
Table 1
Prediction accuracies of physical properties using the sets of measurement descriptors selected as the best.a
Physical property | RF: No. of selected descriptors | RF: R² | RF: NRMSE | PLSR: No. of selected descriptors | PLSR: R² | PLSR: NRMSE
Breaking force | 255 | 0.925 ± 0.005 | 0.310 ± 0.009 | 349 | 0.919 ± 0.005 | 0.299 ± 0.010
Elastic modulus | 8 | 0.546 ± 0.073 | 0.335 ± 0.024 | 7 | 0.500 ± 0.038 | 0.351 ± 0.014
Extension | 34 | 0.506 ± 0.027 | 0.287 ± 0.005 | 150 | 0.527 ± 0.021 | 0.270 ± 0.006
Yield strength | 12 | 0.593 ± 0.030 | 0.338 ± 0.012 | 16 | 0.606 ± 0.022 | 0.329 ± 0.009
a Each value was evaluated by 100 repeats of 10-fold CV (average ± standard deviation).
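The stepwise selection can be sketched for the RF case as follows. This assumes impurity-based importances (`feature_importances_`) averaged over the folds of a single 10-fold CV, whereas the study used 100 repeats; how importance was defined for PLSR (e.g., VIP scores or regression coefficients) is not specified here. The rounding of the 90% cut, the stopping size, and all function names are hypothetical choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

def cv_r2_and_importance(X, y, seed=0, n_splits=10, n_estimators=100):
    """Out-of-fold R2 and fold-averaged RF feature importances."""
    y_pred = np.empty(len(y))
    importance = np.zeros(X.shape[1])
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        rf.fit(X[train], y[train])
        y_pred[test] = rf.predict(X[test])
        importance += rf.feature_importances_ / n_splits
    return r2_score(y, y_pred), importance

def stepwise_selection(X, y, keep_fraction=0.9, min_features=2, **cv_kwargs):
    """Keep the top `keep_fraction` of descriptors per step; return the best step."""
    selected = np.arange(X.shape[1])
    history = []
    while True:
        r2, importance = cv_r2_and_importance(X[:, selected], y, **cv_kwargs)
        history.append((r2, selected))
        # Drop at least one descriptor per step so the loop always terminates.
        n_keep = min(int(len(selected) * keep_fraction), len(selected) - 1)
        if n_keep < min_features:
            break
        ranked = np.argsort(importance)[::-1]  # most important first
        selected = selected[ranked[:n_keep]]
    best_r2, best_set = max(history, key=lambda item: item[0])
    return best_r2, best_set, history

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 15))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=60)
best_r2, best_set, history = stepwise_selection(X, y, n_estimators=50)
```

Recording every step and keeping the maximum reproduces the rise-then-fall pattern of R² described above: the best step is chosen after the whole series has been evaluated.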
For each physical property, the RF and PLSR models shared several descriptors among their 20 most important ones. Such common descriptors should be especially useful for interpreting the associations with the physical properties.
For breaking force, several descriptors of the 1H MAS NMR spectra were selected around 3.1–3.9 ppm and 5.6–6.8 ppm, which flank the peak involving amino acid Hα (blue arrows in Fig. 6a). These signals could be attributed to proteins with strong anisotropic dipolar coupling and, thus, low mobility. Additionally, the descriptor selected from the 13C CP-MAS spectra (“cpmas.95”) corresponds to carbonyl carbons in the α-helix form around 176 ppm (Fig. 6b)3,22,40,41. The α-helix and coiled-coil structures of crystalline fibrous keratins are characteristic of the cortex. Therefore, the fraction of rigid α-keratin bundles in the cortex, as well as the hair diameter, appears to be linked to the tensile resistance of hair. This result also demonstrates that the measurement descriptors successfully represented the secondary structure and mobility of keratins. The descriptors of the 1H wide-line NMR spectra and TD-NMR were also expected to reflect molecular mobility; however, they were rarely selected. This suggests that the descriptors of the 1H MAS and 13C CP-MAS NMR spectra were more effective because their signals are well resolved in the spectra and can thus be associated with specific molecular components.
The distinctive descriptors selected for elastic modulus were “dtg.2der.36” and “dtg.2der.37” of the DTG curves and “ftir.2der.51” of the FT-IR spectra. “dtg.2der.36” and “dtg.2der.37” correspond to the 265°C–276°C range of the second-derivative DTG curves (orange arrows in Fig. 6c). This temperature region could be associated with decomposition of the cuticle, specifically of the micro-tubes that remain after the cortex has decomposed5,25. Cow hairs, which had a high elastic modulus, showed high (positive) values for these descriptors, indicating a relatively slow rate of mass loss in this temperature region. Moreover, “ftir.2der.51” indicates Amide I absorption at 1631–1649 cm−1, which is assignable to random coil structure (Fig. 6d)44–46,53–55. Because the FT-IR ATR technique probes only the sample surface, to a depth of several micrometers, “ftir.2der.51” presumably corresponds to amorphous keratins in the cuticle. Meanwhile, the crystalline fibrous keratins in the cortex remain α-helical during the small elongation (from zero to several percent) over which elastic modulus is evaluated according to Hooke’s law8,56–58. Therefore, we assumed that elastic modulus depends on the amount of disulfide links or the entanglement of amorphous keratins in the cuticle rather than on the cortex.
Extension was associated with several descriptors of the DTG curves (“dtg.21,” “dtg.22,” and “dtg.45”) (Fig. 6c), the FT-IR spectra (Fig. 6d), and the 13C CP-MAS NMR spectra (“cpmas.66”) (Fig. 6b). The DTG range covered by “dtg.21” and “dtg.22” (244°C–263°C) is possibly related to the loss of bound water in the cuticle. “ftir.2der.52” and “ftir.2der.31” represent the Amide I and Amide III peaks at 1651–1669 and 1246–1264 cm−1, respectively (pink arrows in Fig. 6d). These regions are assignable to β-turn or random coil structures44–46,53–55,59. Extension of hair also reportedly increases with humidity8,58. Thus, the selected descriptors suggest that the unorganized amorphous keratins in the cuticle provide accessibility to water, which in turn enhances hair extension. The other regions selected on the FT-IR spectra were 2763–2781 cm−1 (“ftir.2der.60”), 822–839 cm−1 (“ftir.2der.9”), and 783–800 cm−1 (“ftir.2der.7”). Although they are difficult to assign, these descriptors possibly represent hydrophilic groups (e.g., C–O and N–H) in proteins that are related to water association. “cpmas.66” is a signal around 124 ppm in the 13C CP-MAS NMR spectra, which may arise from hydrophilic aromatic amino acids such as tyrosine. “dtg.45” is also difficult to interpret but could represent carbonization of heat-resistant components.
Lastly, yield strength depended largely on the descriptors of the DTG curves (Fig. 6c) and the FT-IR spectra (Fig. 6d). Some of the selected descriptors (i.e., “dtg.2der.36,” “dtg.2der.37,” and “ftir.2der.51”) were shared with elastic modulus, consistent with the CCorA results (Fig. 3 and Fig. S8). “dtg.22” and “dtg.23” represent the 254°C–273°C range of the DTG curve, which almost covers the region of “dtg.2der.36” and “dtg.2der.37.” “dtg.31”–“dtg.34,” covering the 344°C–383°C range, were distinctive for yield strength (green arrows in Fig. 6c); their values were higher for cat hair and lower for cow and pig hairs. These descriptors presumably reflect highly heat-resistant components in the cuticle layers, which may make hair brittle.
The prediction accuracies for elastic modulus, extension, and yield strength were lower than those for breaking force (Table 1). This result indicates that additional information is needed to describe these three properties sufficiently. Errors in the evaluated physical property values, which were not considered in model building, may also have limited the prediction accuracies. Nevertheless, the measurement descriptors and the selection strategy demonstrated in the present study provided insight into the relationships between the measurements and the respective physical properties. Other selected descriptors not discussed above may further support the interpretation of the physical properties and merit detailed investigation in the future.
Regarding data processing, differentiation effectively enhanced the features of overlapping or broad signals, particularly in the 1H MAS NMR spectra and the DTG curves. Moreover, the descriptors selected above were mostly generated by binning rather than by dimension reduction or curve deconvolution. This is because binning compresses the measured information in a way that is specific to particular molecular structures, dynamics, and experimental events. Alternative methods for dimension reduction and deconvolution also exist (e.g., independent component analysis60 and nonnegative matrix factorization61). Further investigation of data-processing techniques would contribute to developing descriptors that capture compositional information more efficiently.
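As an example of the alternative decompositions mentioned above, nonnegative matrix factorization can be applied to a matrix of nonnegative profiles (samples × spectral or temperature points) with scikit-learn's `NMF`. This factorizes the matrix into nonnegative sample weights, which could serve as descriptors, and nonnegative component profiles. The sketch below uses synthetic two-peak profiles; the number of components, the initialization, and the iteration limit are illustrative assumptions, not settings from this study. Second-derivative profiles contain negative values, so NMF would be applied to raw, nonnegative profiles instead. scikit-learn's `FastICA` offers the analogous entry point for independent component analysis.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic nonnegative profiles: mixtures of two peaks (hypothetical data).
grid = np.linspace(200.0, 500.0, 301)
peak_a = np.exp(-((grid - 260.0) / 15.0) ** 2)
peak_b = np.exp(-((grid - 360.0) / 25.0) ** 2)
rng = np.random.default_rng(0)
weights = rng.uniform(0.1, 1.0, size=(30, 2))      # 30 samples
profiles = weights @ np.vstack([peak_a, peak_b])   # shape (30, 301)

model = NMF(n_components=2, init="nndsvda", max_iter=2000, random_state=0)
sample_scores = model.fit_transform(profiles)      # candidate descriptors, (30, 2)
components = model.components_                     # component profiles, (2, 301)
rel_error = (np.linalg.norm(profiles - sample_scores @ components)
             / np.linalg.norm(profiles))
```

Unlike PCA loadings, the nonnegative components here resemble physically interpretable peaks, which is the main attraction of NMF for spectral or thermal profiles.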