Distinction Between Natural and Artificial Agarwood Based on Electronic Nose Data

Background: Agarwood is widely used as a traditional medicine all over the world. Distinguishing the quality of natural agarwood from that of artificial agarwood is an active research topic. An important sensory characteristic of agarwood is its incense smoke, and analysis of incense smoke has been used to evaluate agarwood quality since ancient times. The aim of this study is to establish a rapid detection method that uses an electronic nose (E-nose) system to distinguish between natural and artificial agarwood. Results: The incense smoke of 45 natural and artificial agarwood samples was analyzed by E-nose, and principal component analysis (PCA) was employed to cluster the E-nose data. The chemical markers that could be used to distinguish between artificial and natural agarwood were identified by GC-MS combined with information value and a decision tree algorithm. The results showed that the smellprints of artificial agarwood contained more peaks, while those of natural agarwood had higher response intensities. The compounds that differed between the two types of agarwood were three sesquiterpenes and six chromone derivatives. The decision tree algorithm further showed that 6-hydroxy-2-(2-phenylethyl)chromone was the chemical marker that could be used to distinguish between artificial and natural agarwood. Nootkatone and 2-(2-phenylethyl)chromone were the chemical markers that may have contributed to the clustering of the E-nose data; these two compounds can be used to evaluate the incense smoke of agarwood. Conclusion: We demonstrated that the developed E-nose-based method could rapidly distinguish between the incense smoke of artificial and natural agarwood; this method could be applied to evaluate the quality of agarwood in the future.


Background
Agarwood is a traditional medicinal material consisting of the xylem and resin of Aquilaria and Gyrinops plant species. Five other plant genera, namely Aetoxylon, Enkleia, Gonystylus, Phaleria and Wikstroemia, have also been reported to produce agarwood [1]. Aquilaria, a major source of agarwood, is widely distributed in South and South-east Asia. Agarwood is used as a traditional medicine in China and South-east Asian countries, and as a high-grade perfume or artware in other parts of the world [1][2][3]. Agarwood is formed when Aquilaria trees are injured, and there are two types of agarwood: natural agarwood and artificial agarwood. The formation of natural agarwood is random and infrequent; approximately 7-10% of trees produce natural agarwood as a result of natural factors, such as fungi or wounding caused by wind, lightning strikes, or the gnawing of ants or other insects [4]. These formation processes are very slow and usually take decades to complete; for this reason, and because of over-exploitation, the production of natural agarwood cannot meet worldwide market demand [5]. To address the sustainable use of agarwood, cultivation of Aquilaria trees is currently being promoted in many countries, such as China, Indonesia, Laos, Malaysia, Cambodia, Myanmar, Thailand and Vietnam [6], and all Aquilaria spp. have been listed as endangered plants and included in Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). Another approach to this issue is the production of artificial agarwood from Aquilaria trees. Various methods have been used to produce artificial agarwood, such as burning, cutting, holing, fungal pathogen infection, and treatment with chemicals and fungal pathogens [7][8]. These methods can produce high-quality artificial agarwood that can replace natural agarwood and can thus partly relieve the pressure caused by high worldwide market demand.
The main compounds in agarwood are sesquiterpenes and chromone derivatives [2][9][10]; their compositions in natural and artificial agarwood differ greatly. Many studies have been conducted to discriminate between the qualities of natural and artificial agarwood. Gao et al. [11] studied the quality of agarwood using gas chromatography-mass spectrometry (GC-MS) fingerprint analysis and multivariate analysis, and identified 22 metabolic markers that can be applied to evaluate the qualities of natural and artificial agarwood. Li et al. [12] characterized fourteen chromone derivatives using UPLC-ESI-QTOF-MS and multivariate statistical methods, and presented results that may be useful for distinguishing between natural and artificial agarwood. Espinoza et al. [13] employed direct analysis in real time (DART™) and time-of-flight mass spectrometry (TOF-MS) to distinguish between natural and artificial agarwood based on the composition of 2-(2-phenylethyl)chromone derivatives. Ismail et al. [14] established PLS-DA and Random Forests classification models using 1H-NMR-based metabolomics and employed them to discriminate between gaharu of different grades.
Conventional techniques currently used to analyze the chemical components of agarwood include GC-MS [11][15][16][17], LC-MS [18][19][20] and UPLC-MS [12]. These techniques, however, cannot achieve rapid and large-scale characterization of agarwood because they require large equipment and complicated pre-processing. Thus, it is necessary to establish a method that can rapidly distinguish natural agarwood from artificial agarwood. One candidate is the analysis of the incense ingredients of agarwood. Some studies have used headspace gas chromatography-mass spectrometry (HS-GC-MS) to analyze the incense ingredients of agarwood [21][22], which can be used to evaluate the quality of agarwood. Another promising approach is the use of electronic nose (E-nose) systems, which are fast and sensitive, require simple pre-processing and operation steps, and can provide overall information on the volatile components in samples. E-nose systems combine a gas sensor array with global selectivity and chemometric model-based signal analysis to convert changes in odor into electrical signals [23][24]. In recent years, E-nose systems have been widely applied to discriminate between odors in many fields, such as agricultural production [25][26], food quality detection [27][28], disease diagnosis [29][30], environmental monitoring [31][32], and chemical safety [33][34].
Incense smoke is an important sensory characteristic of agarwood, and analysis of the incense smoke of agarwood has been used as a traditional method for evaluating agarwood quality since ancient times. A method that can rapidly distinguish natural from artificial agarwood is therefore needed. In this study, two types of agarwood samples, natural and artificial agarwood, were collected. The smellprints of the agarwood samples were acquired by an E-nose system and were then clustered using principal component analysis (PCA). The clustering results were then verified by GC-MS and analyzed using information value and a decision tree algorithm. The developed approach would lay a foundation for future studies on the evaluation of agarwood quality.

Materials and chemicals
Forty-five agarwood samples were collected and analyzed. Thirty-one samples were natural agarwood originating from Vietnam, Malaysia, Indonesia, Myanmar, China (Hong Kong, Guangxi, Guangdong, Hainan) and unknown places, and fourteen samples were artificial agarwood produced from five-year-old mature A. sinensis trees. The trees were planted in the A. sinensis experimental base located in Xinyi, Guangdong, China (22°21′ N, 110°21′ E, elevation 119 m), and were identified as A. sinensis by Prof. Yan Hanjing (College of Traditional Medicine, Guangdong Pharmaceutical University). The formation of resin in A. sinensis trees was induced by three methods: physical damage, chemical stimulation, and chemical plus fungal stimulation. Physical damage was carried out by burning. Chemical stimulation was carried out using acetic acid and salicylic acid. Chemical plus fungal stimulation was carried out using Fusarium sp. A2, Nigrospora oryzae A8, and Botryosphaeria rhodina A13 (Wang et al., 2009), which were provided and identified by Associate Prof. Li Haohua (Guangdong Institute of Microbiology). Salicylic acid plus the fungal liquid fermentation product was slowly injected into the xylem of the trees according to the method described by Gao et al. [11]. All artificial agarwood samples were harvested after more than 12 months of agarwood induction. Information on all agarwood samples is provided in the Supplementary material.
Chloroform (analytical grade) was purchased from Guangzhou Chemical Reagent Factory (Guangdong Province, China). Alkane standards (C10-C31) were purchased from AccuStandard Inc. (USA).

Sample preparation for E-nose analysis
All agarwood samples were dried at room temperature, ground into powder and then passed through a 60-mesh sieve. The agarwood powder (10 mg) was weighed and placed in a 20-mL headspace bottle. The bottle was sealed and incubated at 100 ℃ for 30 min; the incense smoke of agarwood generated in the bottle was then used for E-nose analysis.

Apparatus and conditions of E-nose analysis
E-nose analysis was conducted using an ultra-fast gas phase electronic nose, Heracles II (Alpha MOS, France), equipped with an MXT-5 nonpolar metallic capillary column ( ), an MXT-1701 polar metallic capillary column ( ), a trap and a flame ionization detector.
The column temperatures were programmed as follows: an initial temperature of 50 ℃ was held for 2 s, ramped to 80 ℃ at a rate of 1 ℃/s and held for 5 s, and finally ramped to 250 ℃ at a rate of 3 ℃/s and held for 21 s.
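As a quick check on the length of one E-nose run, the column program above can be summed segment by segment. The following is a minimal sketch, not part of the instrument software; the function name and the tuple layout are illustrative assumptions, and only the temperatures, rates and hold times come from the text.

```python
def program_duration(initial_temp, initial_hold, segments):
    """Total duration of a temperature program.

    segments: list of (target_temp, ramp_rate, hold_time) tuples, with
    ramp_rate in degrees per unit time (hypothetical layout for this sketch).
    """
    total = initial_hold
    current = initial_temp
    for target, rate, hold in segments:
        # Time spent ramping to the target, plus the hold at the target.
        total += abs(target - current) / rate + hold
        current = target
    return total


# E-nose column program from the text (temperatures in degrees C, times in s).
enose_seconds = program_duration(
    initial_temp=50,
    initial_hold=2,
    segments=[(80, 1, 5), (250, 3, 21)],
)
print(f"{enose_seconds:.1f} s")
```

Under these values the chromatographic program takes a little under two minutes per sample, excluding the 30 min incubation.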

Processing of E-nose data
To generate the smellprints of the agarwood samples, the E-nose data on the incense smoke of the 45 agarwood samples were analyzed and processed. The areas of a subset of the chromatographic peaks were used as variables in PCA, which was performed using AlphaSoft statistical software 14.3 (Alpha MOS, France), to obtain the clustering result for the 45 agarwood samples.
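The PCA in this study was run in AlphaSoft, whose internals are not described here. As a generic illustration of what such an analysis computes, the sketch below derives the share of variance carried by the leading principal components from a samples-by-peaks table of peak areas. The function name, the power-iteration approach and the example numbers are all assumptions made for illustration.

```python
import math


def pca_explained_variance(rows, n_components=2, iters=500):
    """Explained-variance ratios of the leading principal components.

    rows: list of samples, each a list of peak areas (hypothetical input).
    Uses the sample covariance matrix and power iteration with deflation.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    total_var = sum(cov[i][i] for i in range(p))
    ratios = []
    for _component in range(n_components):
        # Asymmetric start vector to avoid being orthogonal to the answer.
        v = [1.0 + i for i in range(p)]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        vnorm = math.sqrt(sum(x * x for x in v))
        v = [x / vnorm for x in v]
        # Rayleigh quotient gives the eigenvalue (variance along v).
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
        ratios.append(eigval / total_var)
        # Deflate: remove the component just found from the covariance.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return ratios


# Made-up peak areas for four samples and three peaks.
example = [[1, 2.0, 0.5], [2, 4.1, 0.4], [3, 5.9, 0.6], [4, 8.2, 0.5]]
ratios = pca_explained_variance(example)
print([round(r, 4) for r in ratios])
```

In practice a numerical library would be used instead of hand-written power iteration; the pure-Python version is kept only so the sketch has no dependencies.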

Sample preparation for GC-MS analysis
All agarwood samples were dried at room temperature, ground and then passed through a 60-mesh sieve. The agarwood powder (0.5 g) was extracted with 10 mL of chloroform at room temperature for 24 h. The solvent was evaporated in a water bath at 70 ℃ until a viscous semi-solid was obtained. The semi-solid was then reconstituted in 1 mL of chloroform in an airtight sealed vial and stored in darkness at 4 ℃.

Apparatus and conditions of GC-MS analysis
GC-MS analysis was performed using a GCMS QP-2010E (Shimadzu) equipped with an Rtx-5MS fused silica capillary column (30 m × 0.25 mm I.D., 0.25 μm film thickness; Restek Corp., Bellefonte, USA). Helium was used as the carrier gas at a flow rate of 1 mL/min. The injection volume was 1 μL, the split ratio was 1:30 and the injector temperature was 260 ℃. The oven temperature was programmed as follows: an initial temperature of 90 ℃ was held for 4 min; increased at a rate of 2.5 ℃/min to 160 ℃ and held for 5 min; increased at a rate of 0.3 ℃/min to 180 ℃ and held for 5 min; increased at a rate of 2.0 ℃/min to 200 ℃; and finally increased at a rate of 1 ℃/min to 230 ℃ and held for 120 min. The mass spectra were recorded at an ionization energy of 70 eV over the m/z range of 50 to 500.

Processing of GC-MS data
The GC-MS data files of all 45 samples were converted into NetCDF files using Shimadzu GCMS Postrun Analysis software. Mass ion detection, retention time correction, mass ion alignment, annotation of each mass ion with a label (mass ion-retention time) and calculation of the mass ion intensity values in the NetCDF files were carried out using the XCMS package (R-gui 3.3.1). The XCMS data set containing the labels and intensity values of the mass ions was saved in "csv" format.
Information value can be used to quantify the importance of variables in binary classification. In this study, the 45 samples were divided into two groups based on the E-nose clustering result, and the information value of each mass ion in the XCMS data set was calculated using the Scorecard package (R-gui 3.6.1). The mass ions from the XCMS data set were clustered using the ClustOfVar package (R-gui 3.6.1), and the distinct mass ion with the maximum information value in each cluster was selected. The chromatographic peaks of the selected mass ions were identified based on their retention times in AMDIS. The mass spectral fragmentation patterns were compared with those stored in the NIST Mass Spectral Library (NIST05) to identify the compounds that were significantly different. The retention index was calculated using a series of n-alkanes (C10-C31).
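To make the information value concrete, the sketch below computes it for a single numeric variable under the common definition IV = Σ (p_i − q_i) · ln(p_i / q_i), where p_i and q_i are the fractions of each class falling in bin i. This is a generic illustration and not the Scorecard package's implementation; the equal-width binning, the smoothing constant `eps` and the example intensities are assumptions.

```python
import math


def information_value(values, labels, n_bins=3, eps=0.5):
    """Information value (IV) of one numeric variable for a binary label.

    labels: 1 for one class (e.g. artificial), 0 for the other.
    Equal-width bins and the smoothing count eps (which avoids log(0))
    are illustrative choices, not taken from the study.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    pos = [0] * n_bins
    neg = [0] * n_bins
    for x, y in zip(values, labels):
        b = min(int((x - lo) / width), n_bins - 1)
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    n_pos, n_neg = sum(pos), sum(neg)
    iv = 0.0
    for b in range(n_bins):
        p = (pos[b] + eps) / (n_pos + eps * n_bins)
        q = (neg[b] + eps) / (n_neg + eps * n_bins)
        iv += (p - q) * math.log(p / q)
    return iv


# Made-up intensities: one ion separates the groups, the other does not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
separating_ion = [9, 8, 9.5, 8.5, 1, 2, 1.5, 2.5]
uninformative_ion = [1, 9, 2, 8, 1, 9, 2, 8]
print(information_value(separating_ion, labels))
print(information_value(uninformative_ion, labels))
```

A mass ion whose intensity distribution differs strongly between the two E-nose groups receives a large information value, while one with identical distributions receives a value of zero, which is why the maximum within each variable cluster is a reasonable screening criterion.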
A decision tree algorithm was used to explain and verify the clustering result of the E-nose data. The classification rule was obtained from the percentage peak areas of the significantly different compounds using the decision tree algorithm. A receiver operating characteristic (ROC) curve was used to verify the validity of the tree model.
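The area under the ROC curve (AUC) summarizes how well a model's scores rank one class above the other. A minimal sketch of the rank-based definition is given below; the function name and the example scores are hypothetical, and the study's own ROC analysis may have been computed differently.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half. labels: 1 = positive class, 0 = negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up predicted probabilities of the positive class.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # perfect ranking
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # one positive ranked low
```

An AUC of 1.0 means every positive sample scores above every negative sample, and 0.5 corresponds to a ranking no better than chance.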

E-nose analysis
The 3D score scatter plot of the PCA results is shown in Fig. 1. Three principal components, PC1, PC2 and PC3, were obtained, with contribution rates of 66.468%, 17.266% and 13.702%, respectively. The total contribution rate was as high as 97.436%, indicating that the PCA model of the E-nose data captured most of the information in the 45 samples. As shown in the 3D plot, the 45 samples were clearly separated into two clusters (group 1 and group 2): the samples in group 1 were artificial agarwood, and those in group 2 were natural agarwood. This result suggests that artificial and natural agarwood could be discriminated based on their E-nose data. Further analysis of the overlapping smellprints of each group (Fig. 2) showed that the smellprints of artificial agarwood contained a higher number of peaks than those of natural agarwood. By contrast, the response intensity of the smellprints of natural agarwood was higher than that of artificial agarwood. These results indicated that the compositions of the incense smoke of artificial and natural agarwood were different.
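The total contribution rate is simply the running sum of the individual rates, which can be checked in a few lines. This is only an arithmetic check on the values quoted above; the variable names are illustrative.

```python
from itertools import accumulate

# Contribution rates of PC1, PC2 and PC3 (%), as reported in the text.
contribution = [66.468, 17.266, 13.702]

# Running total after each component, rounded to the reported precision.
cumulative = [round(c, 3) for c in accumulate(contribution)]
print(cumulative)
```

The first two components alone already account for more than 80% of the variance, so a 2D score plot would be expected to show much of the same separation.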

GC-MS analysis
Screening and identification of distinct metabolites
GC-MS analysis was used to study the differences between the chemotypes of artificial and natural agarwood in the 45 samples, and to describe and verify the clustering of the E-nose data. A total of 995 aligned GC-MS mass ions were found, and their intensity values were assigned by the XCMS package and recorded as the XCMS data set.
The 45 agarwood samples were separated into two clusters according to the E-nose clustering result; 881 aligned mass ions in the XCMS data set were screened, their information values were calculated with the Scorecard package, and they were clustered with the ClustOfVar package. The clustering diagram of the aligned mass ions is shown in Fig. 3. According to the diagram, 17 clusters were created when the cluster height was set to 5; the distinct mass ions with the maximum information value in each cluster can be found in Supplementary Table A. The metabolites with distinct mass ions were identified by comparing their mass spectra and retention indices with those of commercial standards. As shown in Table 1, three sesquiterpenes and six chromone derivatives were identified, among which nootkatone, verrucarol and velleral have been reported in previous studies on agarwood.
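Retention indices for temperature-programmed GC are commonly obtained by linear interpolation between the two n-alkanes that bracket a peak: RI = 100 · [n + (t_x − t_n) / (t_{n+1} − t_n)]. The text does not state which formula was used, so the sketch below assumes this linear form; the function name and the alkane retention times are made up for illustration.

```python
def linear_retention_index(rt, alkane_rts):
    """Linear retention index of a peak from bracketing n-alkanes.

    alkane_rts: dict mapping carbon number -> retention time, in the same
    units as rt (hypothetical values in the example below).
    """
    carbons = sorted(alkane_rts)
    for n, n_next in zip(carbons, carbons[1:]):
        t_n, t_next = alkane_rts[n], alkane_rts[n_next]
        if t_n <= rt <= t_next:
            fraction = (rt - t_n) / (t_next - t_n)
            return 100 * (n + (n_next - n) * fraction)
    raise ValueError("retention time outside the alkane ladder")


# Made-up retention times (min) for three consecutive n-alkanes.
alkanes = {14: 20.0, 15: 25.0, 16: 31.0}
print(linear_retention_index(22.5, alkanes))
print(linear_retention_index(28.0, alkanes))
```

A computed index can then be compared with a reference value for the candidate compound, alongside the mass spectral match.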

Chemotypes of agarwood
The incense smoke of agarwood consists of many compounds mixed in different proportions, and a chemotype describes the types and contents of the compounds in such medicinal materials. The chemotypes of the artificial and natural agarwood samples were described by the relative contents of the nine significantly different compounds. As depicted in Fig. 4, the chemotypes of the two clusters were rather different: verrucarol, velleral, 6-hydroxy-2-

Decision tree algorithm of agarwood
A decision tree algorithm is a classification technique that can be used to formulate a classification rule based on the clustering result and the data set of samples. In this study, the relative contents of the nine significantly different compounds in the agarwood data set were analyzed by the decision tree algorithm. The 45 samples were separated into two clusters based on their E-nose data, and 70% of the data in the agarwood data set was assigned to the training group, whereas 30% was assigned to the test group. According to the decision tree shown in Fig. 6, 6-hydroxy-2-(2-phenylethyl)chromone (C5), which had a higher content in artificial agarwood than the other compounds, was a marker compound that could separate the two sample clusters. The confusion matrix of the training and test groups (Supplementary Table A.3) further illustrated the classification process of the decision tree algorithm. Twenty-one natural agarwood samples and 11 artificial agarwood samples were entered into the training group, of which 19 natural agarwood samples and 10 artificial agarwood samples were correctly classified by the decision tree algorithm. In addition, 10 natural agarwood samples and 3 artificial agarwood samples were entered into the test group, of which 9 natural agarwood samples and 3 artificial agarwood samples were correctly classified by the decision tree algorithm.
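A decision tree that separates two groups with a single compound reduces to finding one threshold on that compound's relative content. The sketch below shows how such a threshold can be chosen by minimizing the weighted Gini impurity, as in CART-style trees. The splitting criterion used in the study is not stated, so the criterion, the function names and the example contents are all assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best single split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up relative contents (%) of a marker compound; 1 = artificial, 0 = natural.
marker_content = [0.2, 0.5, 0.4, 3.1, 2.8, 3.6]
group = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(marker_content, group)
print(threshold, impurity)
```

With real data the split is rarely perfect, which is why the resulting rule is evaluated on a held-out test group.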
Several performance indexes were also employed to evaluate the performance of the decision tree model (Table 2 and Fig. 7). Accuracy, precision and recall reflect the classification ability of a machine learning model. According to the results, the accuracy, precision and recall of the decision tree model were high in both the training and test groups. Additionally, the F1-score and AUC of the training and test groups, which are important indicators of the reliability of a machine learning model, were high. These results show that the decision tree model has good classification ability and that 6-hydroxy-2-(2-phenylethyl)chromone (C5) is the chemical marker that could be used to distinguish between artificial and natural agarwood.
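The performance indexes follow directly from the confusion matrix counts reported for the training and test groups. The sketch below recomputes them under two assumptions that the text does not state: artificial agarwood is treated as the positive class, and every misclassified sample was assigned to the other class. The values in Table 2 may therefore differ if a different convention was used.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1 from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts from the text, with "artificial" as the positive class (assumption).
# Training: 10 of 11 artificial and 19 of 21 natural samples correct.
train = classification_metrics(tp=10, fn=1, fp=2, tn=19)
# Test: 3 of 3 artificial and 9 of 10 natural samples correct.
test = classification_metrics(tp=3, fn=0, fp=1, tn=9)

for name, result in (("training", train), ("test", test)):
    print(name, {k: round(v, 3) for k, v in result.items()})
```

Because the test group contains only three artificial samples, a single misclassification shifts precision and recall considerably, so these figures are best read together with the AUC.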

Discussion
Analysis of the incense smoke produced by agarwood is a traditional method for evaluating agarwood quality, and the E-nose system can visualize the incense smoke as smellprints. In this study, we observed that the smellprints of the incense smoke of artificial and natural agarwood were significantly different. Whereas the smellprints of artificial agarwood contained more peaks, those of natural agarwood had a higher response. This result suggests that the primary chemical compositions of the incense smoke from the two types of agarwood are different. The chemical composition of the incense smoke of the artificial agarwood was more complex than that of the natural agarwood, likely due to the different artificial methods used to stimulate agarwood formation. However, the contents of some volatile compounds in the incense smoke of the natural agarwood were higher than those in the artificial agarwood. The GC-MS analysis further revealed the difference between the chemical compositions of the incense smoke of artificial and natural agarwood, and the E-nose system, which is simple and fast, could generate smellprints that can be used to distinguish between natural and artificial agarwood. Overall, we demonstrated that the E-nose system was successfully applied to analyze the incense smoke of agarwood.
The clustering of the E-nose data of the 45 agarwood samples was further analyzed by GC-MS combined with information value (a parameter that can screen discriminating variables in binary classification) and a decision tree algorithm. According to the results, three sesquiterpenes and six chromone derivatives were identified in this study. The chemotype analysis further showed that the main compounds in the artificial and natural agarwood were different: 2-(2-phenylethyl)chromone derivatives were the main compounds in the artificial agarwood, while sesquiterpenes and 2-(2-phenylethyl)chromone derivatives were the main compounds in the natural agarwood. The chemical composition of the incense smoke produced by agarwood was also found to be affected by the chemical composition of the agarwood. Kao et al. [21] reported that the two main chemical classes in the incense smoke of agarwood are sesquiterpenes (38%) and 2-(2-phenylethyl)chromone (18%). Our previous study also showed that the content of sesquiterpenes in natural agarwood was higher than that in artificial agarwood [11], and similar results were obtained in this study. The E-nose system is usually used to detect compounds with low boiling points, and our results suggest that sesquiterpenes may be the compounds mainly detected by the E-nose system. Thus, the high response intensity of the smellprints of natural agarwood in our study may be due to its higher content of sesquiterpenes. Meanwhile, nootkatone and 2-(2-phenylethyl)chromone have also been detected by HS-GC-MS in the incense smoke of agarwood [21][22], and the sample pre-processing for the E-nose system in our study was consistent with that for HS-GC-MS. This suggests that nootkatone and 2-(2-phenylethyl)chromone may also be detected by the E-nose system and may contribute to the clustering results. We also found that nootkatone and 2-(2-phenylethyl)chromone had higher contents in the natural agarwood than in the artificial agarwood. Therefore, nootkatone and 2-(2-phenylethyl)chromone can be used as chemical markers in the evaluation of the incense smoke of agarwood. The results from the decision tree algorithm showed that the content of 6-hydroxy-2-(2-phenylethyl)chromone could be used to distinguish between artificial and natural agarwood. Some studies have also shown that the contents of chromone derivatives in natural agarwood are significantly different from those in artificial agarwood [11][12][21], which is consistent with our results. However, the presence of 6-hydroxy-2-(2-phenylethyl)chromone in the incense smoke of agarwood has not yet been reported. These results indicate that chromone derivatives other than 2-(2-phenylethyl)chromone may be difficult to detect with the E-nose system.
Analysis of the incense smoke of agarwood has not been widely used for the evaluation of agarwood quality, and most previous studies used solvent extracts of agarwood as analytes. One of the main applications of agarwood is the production of incense smoke; in this study, the smoke was detected by the E-nose system and used to distinguish between natural and artificial agarwood. Therefore, the analysis of incense smoke is a meaningful method for evaluating the quality of agarwood, and the E-nose system is a new technique that may have promising applications in agarwood research.

Conclusions
In conclusion, the E-nose system was used to distinguish between the incense smoke of natural and artificial agarwood. GC-MS analysis combined with information value and a decision tree algorithm was further employed to identify the compounds that differed. Based on the GC-MS analysis, 6-hydroxy-2-(2-phenylethyl)chromone was found to be the chemical marker that could be used to distinguish between artificial and natural agarwood. Additionally, nootkatone and 2-(2-phenylethyl)chromone were identified as the chemical markers contributing to the clustering of the E-nose data. These two compounds can be used to evaluate the incense smoke of agarwood. Overall, this work presents an E-nose-based approach for distinguishing between natural and artificial agarwood; it could also be applied to evaluate the quality of agarwood.