Results of portable GC-MS
In this experiment, portable GC-MS was used to analyze CGEH and CGEG, and the results showed that the spectra of samples from each origin were basically consistent. Comparison of the spectra of CGEH and CGEG (see Fig. 1) showed that more components were detected in CGEH. The highest peaks of the samples had similar retention times, appearing at about 3.1 minutes, and the corresponding substance was limonene. However, the intensity of this peak was significantly higher in CGEH than in CGEG, from which it could be inferred that the limonene content of CGEH was higher than that of CGEG. After 4 minutes of retention time, (Z)-5-[(1R,3R,6S)-2,3-dimethyl-3-tricyclo[2.2.1.02,6]heptanyl]-2-methylpent-2-en-1-ol was detected in CGEH but not in CGEG. Qualitative analysis of the data (see Table 2) showed that five substances were detected only in CGEH: (3E)-3,7-dimethylocta-1,3,6-triene, 3,7,7-trimethylbicyclo[4.1.0]hept-3-ene, (1S,5S)-6,6-dimethyl-2-methylidenebicyclo[3.1.1]heptane, methyl octadeca-2,5-diynoate, and (Z)-5-[(1R,3R,6S)-2,3-dimethyl-3-tricyclo[2.2.1.02,6]heptanyl]-2-methylpent-2-en-1-ol. This suggested that these substances could be used as differential compounds to distinguish CGEH from CGEG.
Table 2
The Volatile Components of Citri Grandis Exocarpium
Category | Name | CAS | RT/min | Peak area |
CGEH | 7-methyl-3-methylideneocta-1,6-diene | 123-35-3 | 3.163 | 134523 |
CGEH | 1-methyl-2-propan-2-ylbenzene | 527-84-4 | 3.311 | 13813 |
CGEH | 1-methyl-4-prop-1-en-2-ylcyclohexene | 138-86-3 | 3.328 | 493213 |
CGEH | (3E)-3,7-dimethylocta-1,3,6-triene | 13877-91-3 | 3.368 | 10057 |
CGEH | 3,7,7-trimethylbicyclo[4.1.0]hept-3-ene | 13466-78-9 | 3.420 | 56508 |
CGEH | (1S,5S)-6,6-dimethyl-2-methylidenebicyclo[3.1.1]heptane | 18172-67-3 | 3.466 | 10796 |
CGEH | methyl octadeca-2,5-diynoate | 57156-91-9 | 3.509 | 27755 |
CGEH | (Z)-5-[(1R,3R,6S)-2,3-dimethyl-3-tricyclo[2.2.1.02,6]heptanyl]-2-methylpent-2-en-1-ol | 115-71-9 | 4.098 | 12801 |
CGEG | 7-methyl-3-methylideneocta-1,6-diene | 123-35-3 | 3.159 | 5703 |
CGEG | 1-methyl-2-propan-2-ylbenzene | 527-84-4 | 3.305 | 5912 |
CGEG | 1-methyl-4-prop-1-en-2-ylcyclohexene | 138-86-3 | 3.324 | 180133 |
CGEG | 1-methyl-2-prop-1-en-2-ylbenzene | 7399-49-7 | 3.507 | 7247 |
Results of GC×GC-TOF MS
A total of 304 compounds were identified, of which 261 were reported for the first time. There were 223 volatile substances in CGEH, among which olefins had the highest content. Across the 6 batches, olefins accounted for 25.56%-47.55% (average 43.09%), alcohols for 16.05%-27.96% (average 19.04%), and aromatics for 15.44%-19.27% (average 18.19%). A total of 252 volatile substances were detected in CGEG, among which olefins were also the most abundant. Across the 6 batches, olefins accounted for 43.07%-45.62% (average 44.4%), aromatics for 21.44%-22.49% (average 22.1%), and alcohols for 11.65%-12.68% (average 12.25%). The volatile components are listed in Table S1. Compared with the literature [12–15], 50 volatile compounds, such as 6-methylhept-5-en-2-one (1), 1,3,3-trimethylbicyclo[2.2.1]heptan-2-ol (2), and 2-methylhexanoic acid (3), were detected for the first time.
Comparison of the GC×GC-TOF MS chromatograms showed that the components in CGEH and CGEG were similar and that olefins made up the highest proportion in the samples, which was consistent with the portable GC-MS results. However, the GC×GC-TOF MS results also showed that the five substances detected only in CGEH by portable GC-MS were present in CGEG as well. This indicated that portable GC-MS could be used only to distinguish CGEH from CGEG, not to find marker compounds.
The chromatograms and Venn diagram of CGEH and CGEG are shown in Fig. 2 and Fig. 3. A total of 52 compounds detected in CGEH were not detected in CGEG, and 81 compounds were unique to CGEG. Many compounds with medicinal functions were identified. (4R)-1-methyl-4-prop-1-en-2-ylcyclohexene (4), which has anti-asthmatic, antitussive and phlegm-reducing effects and acts against degenerative inflammation, is used clinically to treat cholecystitis [16, 17]. The average content of (4R)-1-methyl-4-prop-1-en-2-ylcyclohexene (4) was 8.58% in CGEH and only 5.64% in CGEG, so the better clinical performance of CGEH, such as in relieving cough, may result from its higher content of this compound. The content of (1S,8aR)-4,7-dimethyl-1-propan-2-yl-1,2,3,5,6,8a-hexahydronaphthalene (5), an effective component for relieving cough and reducing phlegm, was about 3.3% in both CGEH and CGEG. 2,6,6-trimethylbicyclo[3.1.1]hept-2-ene (6) has antitussive and expectorant effects [16, 18]. (1R,4E,9S)-4,11,11-trimethyl-8-methylidenebicyclo[7.2.0]undec-4-ene (7), which has a strong, pungent clove odor, is used for anti-inflammatory purposes [19], pain relief, treatment of paralysis, warming the body, and relief of gastritis. In addition, it has a soothing effect on the skin and tissues [20]. 2,3,5-trimethylpyrazine (8), found in CGEG, is a high-grade spice with a strong aroma of roasted peanuts or potatoes and shows acute toxicity (mouse oral LD50: 806 mg/kg), which may contribute to the lower efficacy of CGEG compared with CGEH [21].
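The Venn diagram counts can be cross-checked against the totals reported above with simple inclusion-exclusion. The short sketch below assumes that the 304 identified compounds are the union of the CGEH and CGEG compound sets; the variable names are illustrative only.

```python
# Counts reported in the text (GC×GC-TOF MS results).
total_compounds = 304  # assumed to be the union of both origins
n_cgeh = 223           # volatile substances detected in CGEH
n_cgeg = 252           # volatile substances detected in CGEG

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
shared = n_cgeh + n_cgeg - total_compounds
only_cgeh = n_cgeh - shared
only_cgeg = n_cgeg - shared

print(shared, only_cgeh, only_cgeg)  # 171 52 81
```

Under this assumption, 171 compounds are shared, which reproduces the 52 compounds found only in CGEH and the 81 found only in CGEG.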
Compared with traditional GC-MS [22–25], GC×GC-TOF MS has better separation ability, which solves the problem of co-elution of components and provides more comprehensive, higher-quality information on complex aroma components. As seen in Fig. 4, only one chromatographic peak was observed near 35.5 min in the one-dimensional chromatogram, whereas three peaks, (1,7,7-trimethyl-2-bicyclo[2.2.1]heptanyl) acetate, 1,3,3-trimethylbicyclo[2.2.1]heptan-2-ol, and 1-ethenyl-1-methyl-2,4-bis(prop-1-en-2-yl)cyclohexane, were observed at the same retention time in the two-dimensional chromatogram. Thus, three aroma components were separated and identified with GC×GC-TOF MS instead of one with GC-MS.
Chemometric Analysis
Chemometrics is a branch of chemistry arising from the intersection of chemistry, statistics, mathematics, and computer science; it is applied to optimize the chemical measurement process and to extract as much useful information as possible from chemical measurement data [26, 27]. Principal component analysis (PCA) is an unsupervised multivariate statistical method that can distinguish samples while retaining the original information in the data to the greatest extent [28]. It is also a common and effective way to compress information: a large number of variables are compressed into a few new dummy variables, which simplifies the problem. It is based on the variable covariance matrix computed from multiple samples and uses eigendecomposition to obtain virtual principal components that can replace the original variables. The new variables, called principal component scores, are uncorrelated with each other, which eliminates the correlation and information redundancy in the original data [29].
As shown in Fig. 5, every sample includes many indicators, and each indicator represents a dimension, so each sample is a multidimensional vector. Viewed from three-dimensional space, the data therefore look like a high-dimensional data cloud. PCA completes the dimension reduction by projection, which makes it possible to establish a suitable model for interpretation and prediction. The PCA score plot shows the projection onto the plane formed by the two directions PC1 (first principal component) and PC2 (second principal component). The directions PC1 and PC2 are virtual rather than specific variables, and all variables contribute to them [30].
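The PCA procedure described above (covariance matrix, eigendecomposition, projection onto PC1 and PC2) can be sketched in a few lines of NumPy. This is a minimal illustration, not the software used in this study; the function name `pca_scores` and the random 12 × 5 data matrix are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # mean-center each variable
    cov = np.cov(Xc, rowvar=False)          # variable covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]    # directions PC1, PC2, ...
    scores = Xc @ loadings                  # coordinates in the score plot
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, loadings, explained

# Hypothetical data: 12 samples x 5 variables, first six samples shifted
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 5))
X_demo[:6] += 3.0
scores, loadings, explained = pca_scores(X_demo)
```

Each row of `scores` gives the position of one sample in the PC1/PC2 plane, and the score columns are uncorrelated by construction.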
Orthogonal partial least squares discriminant analysis (OPLS-DA) is a supervised multivariate statistical method that tends to extract the variable information that is beneficial to sample classification, which greatly reduces interference from system noise and improves classification efficiency [31].
The variable importance in the projection (VIP) value of an independent variable reflects the role of that variable in predicting the dependent variable. The larger the VIP value of an independent variable, the more important its role in predicting the dependent variable; here, it reflects the importance of the compound.
The formula of VIP of the jth (j = 1,...,p) compound is as follows:
$$\mathrm{VIP}_{j}=\sqrt{\frac{p\sum_{a=1}^{A}\left(q_{a}^{2}\,t_{a}^{T}t_{a}\right)\left(w_{ja}/\lVert w_{a}\rVert\right)^{2}}{\sum_{a=1}^{A}q_{a}^{2}\,t_{a}^{T}t_{a}}}$$
In the formula, p is the number of compounds, \(w_{a}\) and \(t_{a}\) are the ath columns of the model coefficient matrix W and the score matrix T, respectively, and \(q_{a}\) is the ath element of the matrix Q. \(w_{ja}\) is the weight of the jth compound in \(w_{a}\), and \(t_{a}\) is a linear combination of all compounds that can be used directly to predict the sample property matrix Y. The jth compound therefore explains the sample property matrix Y through the first A latent variables.
In the formula, \(q_{a}^{2}t_{a}^{T}t_{a}\) reflects the ability of \(t_{a}\) to explain the sample property matrix Y, so a larger value of \(q_{a}^{2}t_{a}^{T}t_{a}\) indicates that \(t_{a}\) is more important for predicting Y. If \(w_{ja}\) is also large, the jth compound plays a more important role in the calculation of \(t_{a}\), which means that the jth compound is more important for the prediction of the model.
Because the mean of the squared VIP values over all compounds is 1, some scholars have proposed using the VIP value as an importance index for differential compounds, so as to screen out characteristic compounds and eliminate unimportant ones [32].
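The VIP definition above can be illustrated with a short NumPy sketch. It assumes the standard form of the formula, in which the squared normalized weight \((w_{ja}/\lVert w_{a}\rVert)^{2}\) is weighted by \(q_{a}^{2}t_{a}^{T}t_{a}\); the function name `vip_scores` and the random matrices are hypothetical and are not taken from the fitted model in this study.

```python
import numpy as np

def vip_scores(W, T, q):
    """VIP of each of p variables.

    W: weights, shape (p, A); T: scores, shape (n, A); q: Y-loadings, shape (A,).
    """
    W = np.asarray(W, dtype=float)
    T = np.asarray(T, dtype=float)
    q = np.asarray(q, dtype=float).ravel()
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)                     # q_a^2 t_a^T t_a, per component
    w_norm_sq = (W / np.linalg.norm(W, axis=0))**2       # (w_ja / ||w_a||)^2
    return np.sqrt(p * (w_norm_sq @ ss) / ss.sum())

# Hypothetical model matrices: p = 8 compounds, n = 12 samples, A = 3 components
rng = np.random.default_rng(1)
W_demo = rng.normal(size=(8, 3))
T_demo = rng.normal(size=(12, 3))
q_demo = rng.normal(size=3)
vip = vip_scores(W_demo, T_demo, q_demo)
print(np.mean(vip**2))  # equals 1 up to floating-point rounding
```

The final line checks the property used for screening: the mean of the squared VIP values is 1, so compounds with VIP above 1 contribute more than average.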
To identify the characteristic components that differ between CGEH and CGEG, PCA was performed on the components detected in the twelve samples, and the results are shown in Fig. 4. There was basically no difference in the composition of Citri Grandis Exocarpium from different batches of the same origin, although the relative content of the components fluctuated somewhat between batches. However, there were obvious differences between the volatile components of CGEH and CGEG.
OPLS-DA was applied to extract the VIP values of the model, and the compounds with |p(corr)| > 0.9, |p| > 0.06 and VIP > 1.1 were selected (Table 3); these can help distinguish CGEH from other samples. In a 200-permutation test, the simulated R2 values were greater than the simulated Q2 values, and the intercept of the Q2 regression line was −0.298, less than 0.652 (see Fig. 7), indicating that the model is not overfitted.
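The selection rule stated above combines three thresholds. A minimal sketch of such a filter is given below; the compound names and numerical values are hypothetical placeholders, not data from this study.

```python
# Hypothetical model output: one record per compound
candidates = [
    {"name": "compound A", "p": 0.08, "p_corr": 0.95, "vip": 1.42},
    {"name": "compound B", "p": -0.07, "p_corr": -0.93, "vip": 1.20},
    {"name": "compound C", "p": 0.10, "p_corr": 0.70, "vip": 1.50},  # fails |p(corr)| > 0.9
    {"name": "compound D", "p": 0.02, "p_corr": 0.97, "vip": 1.30},  # fails |p| > 0.06
    {"name": "compound E", "p": 0.09, "p_corr": 0.96, "vip": 0.90},  # fails VIP > 1.1
]

def select_markers(rows, p_corr_min=0.9, p_min=0.06, vip_min=1.1):
    """Keep compounds that pass all three thresholds given in the text."""
    return [
        r["name"]
        for r in rows
        if abs(r["p_corr"]) > p_corr_min
        and abs(r["p"]) > p_min
        and r["vip"] > vip_min
    ]

selected = select_markers(candidates)
print(selected)  # ['compound A', 'compound B']
```

Absolute values are taken for p and p(corr) so that compounds enriched in either group can pass the filter.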
Table 3
Specific Compounds in CGEH
Peak number | Names | CAS | VIP value |
1 | 1,2-diethyl-4-phenylbenzene | 61141-66-0 | 1.56 |
2 | 1-methyl-4-prop-1-en-2-ylcyclohexane-1,2-diol | 1946-00-5 | 1.50 |
3 | dodecan-1-ol | 112-53-8 | 1.42 |
4 | (E)-2-phenylbut-2-enal | 4411-89-6 | 1.37 |
5 | 3,8-dimethylundecane | 17301-30-3 | 1.33 |
6 | tetradecanal | 124-25-4 | 1.27 |
7 | 2-methyltridecane | 1560-96-9 | 1.26 |
8 | 2-methyl-2H-furan-5-one | 591-11-7 | 1.22 |
9 | 2-[2-[2-(2-hydroxyethoxy)ethoxy]ethoxy]ethanol | 112-60-7 | 1.22 |
10 | (1S,3S,5S)-4-methylidene-1-propan-2-ylbicyclo[3.1.0]hexan-3-ol | 3310-02-9 | 1.10 |
The permutation plot helps to assess the risk that the current model is spurious, that is, that the model fits the training set well but does not predict Y well for new observations. The idea of this validation is to compare the goodness of fit (R2 and Q2) of the original model with the goodness of fit of several models based on data in which the order of the Y-observations has been randomly permuted while the X-matrix has been kept intact.
The plot shows, for a selected Y-variable, on the vertical axis the values of R2 and Q2 for the original model (far to the right) and of the Y-permuted models further to the left. The horizontal axis shows the correlation between the permuted Y-vectors and the original Y-vector for the selected Y. The original Y has the correlation 1.0 with itself, defining the high point on the horizontal axis.
The plot (Fig. 7) strongly indicates that the original model is valid. The criteria for validity are that all blue Q2-values to the left are lower than the original points to the right, or that the blue regression line of the Q2-points intersects the vertical axis (on the left) at or below zero. The R2-values always show some degree of optimism. However, when all green R2-values to the left are lower than the original point to the right, this is also an indication of the validity of the original model.
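The permutation procedure described above can be sketched as follows. This is only an illustration under stated assumptions: ordinary least squares is used as a simple stand-in for OPLS-DA, Q2 is computed by leave-one-out cross-validation, the absolute correlation is used for the horizontal axis, and the data are synthetic. None of the function names come from the software used in this study.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Least-squares model with intercept (stand-in for OPLS-DA)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def r2_q2(X, y):
    """R2 from the full fit, Q2 from leave-one-out cross-validation."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - np.sum((y - fit_predict(X, y, X)) ** 2) / ss_tot
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        pred = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    return r2, 1 - press / ss_tot

def permutation_test(X, y, n_perm=200, seed=0):
    """Permute Y, keep X intact, and regress R2 and Q2 on the correlation."""
    rng = np.random.default_rng(seed)
    r2_0, q2_0 = r2_q2(X, y)
    corr, r2s, q2s = [1.0], [r2_0], [q2_0]  # original model at correlation 1.0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        # Assumption: absolute correlation, so the axis runs from 0 to 1
        corr.append(abs(np.corrcoef(y, y_perm)[0, 1]))
        r2, q2 = r2_q2(X, y_perm)
        r2s.append(r2)
        q2s.append(q2)
    r2_int = np.polyfit(corr, r2s, 1)[1]  # intercept of the R2 regression line
    q2_int = np.polyfit(corr, q2s, 1)[1]  # intercept of the Q2 regression line
    return (r2_0, q2_0), (r2_int, q2_int)

# Synthetic data: 12 samples in two classes, one discriminating variable
rng = np.random.default_rng(42)
y_demo = np.array([0.0] * 6 + [1.0] * 6)
X_demo = rng.normal(size=(12, 2))
X_demo[:, 0] += 10.0 * y_demo
(r2_0, q2_0), (r2_int, q2_int) = permutation_test(X_demo, y_demo)
```

For a model that captures real class structure, the Q2 intercept falls well below the Q2 of the original model, which is the pattern the validity criteria above look for.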