Study of the dynamic changes in the chemical constituents of Soapberry (Sapindus mukorossi Gaertn.) pericarp during fruit development by a non-targeted metabolomics approach

Soapberry (Sapindus mukorossi Gaertn.) is a multi-functional tree that is widely used in daily chemicals, biomedicine, biomass energy and landscaping. The pericarp of soapberry can be used as medicine or detergent. However, there has been no systematic study of the chemical constituents of soapberry pericarp during fruit development, and the dynamic changes of these constituents are far from clear. In this study, we applied a non-targeted metabolomics approach using an ultra-high performance liquid chromatography-Q Exactive HF hybrid quadrupole-Orbitrap mass spectrometer (UHPLC-QE-HF-MS) to comprehensively profile the variations of metabolites in soapberry pericarp at eight fruit development stages. The metabolome coverage of UHPLC-QE-HF-MS on a HILIC column was higher than that on a C18 column. A total of 111 metabolites were putatively identified, and these metabolites showed three accumulation patterns (pre-accumulation, mid-accumulation and post-accumulation) during fruit development. Twenty-five of these 111 metabolites (including amino acids and their derivatives, flavonoids, organic acids, fatty acids, nucleotides and their derivatives, alkaloids, carbohydrates, terpenoids, vitamins and phosphorylated intermediates) were present at significantly different levels between two adjacent stages. These differential metabolites were involved in 13 KEGG pathways, of which 5 (flavonoid biosynthesis; histidine metabolism; aminoacyl-tRNA biosynthesis; flavone and flavonol biosynthesis; and phenylalanine, tyrosine and tryptophan biosynthesis) were the most relevant. The S8 stage (fruit ripening stage) is the most suitable stage for harvesting fruit to utilize the pericarp, because the accumulation of many bioactive and valuable metabolites (e.g., furamizole, alpha-tocopherol quinone, sucrose) in the pericarp was highest at this stage.
To the best of our knowledge, this is the first time that the metabolome of soapberry pericarp has been profiled during the whole of fruit development.


Background
Metabolomics/metabonomics aims to study the components and dynamic changes of all small-molecule metabolites (MW < 1000) in an organism or tissue, or even in a single cell 1. Within this field, non-targeted metabolomics aims to profile all the compounds that can be detected. Differential metabolites are screened through multivariate statistical analysis, and pathway analysis is carried out to reveal the underlying physiological mechanisms 2. Liquid chromatography-mass spectrometry (LC-MS) is the most common analytical platform in metabolomics. It is suitable for analyzing compounds with poor stability and low volatility in complex samples because it requires little sample pretreatment and no chemical derivatization. Moreover, metabolites of different polarity can be analyzed by using different chromatographic columns 3.
In recent studies, metabolomics has been used to analyze the metabolic characteristics of different growth or processing periods. For example, Toffali et al. 4 generated a detailed picture of the changing metabolic profiles during late berry development in the important Italian grapevine cultivar Corvina, based on liquid chromatography-electrospray ionization-mass spectrometry (LC-ESI-MS) analysis on an analytical Alltima HP C18 column. The changes in the constituents of black tea 5 and oolong tea 6,7 during their manufacturing processes were also studied by LC-MS on a C18 or T3 column.
Additionally, non-targeted LC-MS is widely used in biomarker discovery and disease diagnosis 8,9. Moreover, non-targeted metabolomics analysis based on LC-MS combined with gas chromatography-time-of-flight mass spectrometry (GC-MS) has also been used to characterize biochemical variation within species and the relationship between metabolites and phenotypes or genotypes 10. To our knowledge, most of the chromatographic columns used in non-targeted LC-MS studies are C18 or T3 columns, and the use of hydrophilic interaction liquid chromatography (HILIC) columns is still rare. However, the application of HILIC in bioanalysis has increased gradually because of its advantages for separating polar compounds 11.
Soapberry (Sapindus mukorossi Gaertn., Sapindaceae, Sapindus) is an economically important tree with multiple uses, and is widely used in toiletries, biomedicine, biomass energy and landscaping. It is mainly distributed south of the Yangtze River in China, and in the Indochina Peninsula, India and Japan 12. In addition, soapberry is a traditional medicinal plant in China. According to the Compendium of Materia Medica, an ancient Chinese pharmaceutical book, its pericarp can be used to wash the hair and face to cure dandruff and freckles 13. Modern pharmacological studies have also shown that the pericarp of soapberry has anti-inflammatory, anti-tumor, anti-bacterial, anti-viral, hepatoprotective, insecticidal and other bioactivities [14][15][16][17][18]. Besides, soapberry pericarp is rich in saponins, which have good nonionic surface activity, high foaming capacity and strong decontamination ability, and can replace petrochemical raw materials in detergent production 19. Previous studies reported that the main chemical constituents of the soapberry pericarp are terpenoids (especially triterpenoid saponins and sesquiterpenoid glycosides), phenylpropanoids, steroids and saccharides 15,[20][21][22][23].
At present, research on soapberry pericarp mainly focuses on the saponin components and their bioactivity. A non-targeted metabolomics study of the soapberry pericarp at different fruit development stages has not yet been reported.
The objectives of this study were: 1) to comprehensively describe the dynamic changes of the soapberry pericarp metabolites at eight fruit development stages via non-targeted metabolomics using an ultra-high performance liquid chromatography-Q Exactive HF hybrid quadrupole-Orbitrap mass spectrometer (UHPLC-QE-HF-MS) on an Accucore Vanquish C18 column and a SeQuant ZIC pHILIC column; 2) to identify the differential metabolites between each pair of adjacent stages; and 3) to analyze the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of these differential metabolites. To the best of our knowledge, this is the first time that the metabolome of soapberry pericarp has been profiled during the whole of fruit development. The results could offer valuable information for the harvesting, processing and application of soapberry pericarp, and for understanding the biosynthesis of its main metabolites.
Plant materials. Three ten-year-old selected soapberry trees (average tree height 6.5 m, average DBH 13.5 cm, and average annual output 20 kg) were used. The trees were cultivated in an orchard located in Jianning County, Fujian Province, China (latitude 26°49′ N, longitude 116°52′ E, altitude 300 m above sea level). The soapberry fruits were sampled from June to November 2018 at eight time points, corresponding to the initial fruit stage (S1, 15 days after flowering), cotyledon growth stage (S2, 45 days after flowering), fruit expanding stage (S3 and S4, 75 and 90 days after flowering, respectively), seed hard shell stage (S5, 105 days after flowering), fruit turning color stage (S6, 120 days after flowering), fruit near ripening stage (S7, 135 days after flowering) and fruit ripening stage (S8, 150 days after flowering) (Fig. 1). Three biological replicates were taken at each time point, resulting in a total of 24 samples. Fruit was randomly picked from the east, south, west and north sides of the middle and upper parts of the crown of the trees, and the number of fruits picked was determined by their size. Each biological replicate comprised fruit randomly drawn from a pool of soapberry fruit collected from each tree. The pericarp and the seed were separated, and the pericarps were immediately frozen in liquid nitrogen and stored at -80 ℃. The pericarps were dried using a vacuum freeze dryer (LGJ-10, Beijing Songyuan Huaxing Biotechnology Co., Ltd., Beijing, China). The dried pericarps were then ground into a uniform powder with a ball mill (MM400, Retsch, Germany).
conditions were maintained for 5 min to equilibrate the column. The flow rate was 320 μL/min and the injection volume was set to 2 μL. To avoid possible bias, the sequence of injections was randomized, with a QC run after the injection of every 8 samples used for normalization in quantitative data analysis. The conditions for mass spectrometry have been previously described 24.
In brief, nitrogen was used as the sheath, auxiliary and sweep gas, set at 50, 8 and 1 U, respectively. Data were acquired with the following settings: resolving power, 120,000 (at m/z 200); automatic gain control target, 3×10^6 ions; maximum injection time, 100 ms; scan range, 70-1050 m/z; spray voltage, 3.50 kV; and capillary temperature, 275 °C. ESI−/+ data-dependent MS/MS spectra were generated for the QC samples and used for identification purposes. MS/MS data were acquired with a full scan followed by top 15 MS/MS scans with the following settings: resolving power, 15,000 (at m/z 200); automatic gain control target, 1×10^5 ions; maximum injection time, 50 ms; isolation window, 0.4 m/z; and NCE, 20, 30 and 40. The acquired raw files were processed using Compound Discoverer 2.1. An untargeted metabolomics workflow with putative identification through an in-house mass list and the ChemSpider and mzCloud databases was used for processing the raw data and for compound annotation. The software parameters for alignment were 5 ppm mass tolerance for the adaptive curve model and 0.5 min maximum shift for alignment. The software parameters for detecting unknown compounds were 5 ppm mass tolerance for detection, 30% intensity tolerance, a signal-to-noise threshold of 3, and a minimum peak intensity of 2×10^6.
(4) Data analysis. A three-dimensional data matrix, including the metabolite name (putatively identified by UHPLC-QE-HF-MS), sample information (three biological replicates for each sample), and raw abundance (peak area for each putatively identified metabolite), was generated (Supplemental file 1). Raw data were subjected to three normalization steps: normalization by median, log transformation, and Pareto scaling. SIMCA 14.1 software (Umetrics, Umeå, Sweden) was used to perform unsupervised principal component analysis (PCA) and supervised orthogonal projection to latent structures-discriminant analysis (OPLS-DA).
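The three normalization steps named in the data analysis paragraph (normalization by median, log transformation, Pareto scaling) can be illustrated with a minimal sketch. It assumes a samples-by-metabolites table of strictly positive peak areas. The function names, the choice of a base-2 logarithm and the toy values are illustrative assumptions and are not taken from this study.

```python
import math
import statistics


def median_normalize(rows):
    """Divide every value in a sample (row) by that sample's median peak area."""
    out = []
    for row in rows:
        med = statistics.median(row)
        out.append([v / med for v in row])
    return out


def log_transform(rows):
    """Log2-transform every value (assumes strictly positive inputs)."""
    return [[math.log2(v) for v in row] for row in rows]


def pareto_scale(rows):
    """Mean-center each metabolite (column), then divide by sqrt(std dev)."""
    scaled_cols = []
    for col in zip(*rows):
        mean = statistics.fmean(col)
        sd = statistics.stdev(col)
        denom = math.sqrt(sd) if sd > 0 else 1.0  # constant columns stay at zero
        scaled_cols.append([(v - mean) / denom for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def normalize(rows):
    """Apply the three steps in the order listed in the text."""
    return pareto_scale(log_transform(median_normalize(rows)))


# Hypothetical peak areas: 3 samples (rows) x 3 metabolites (columns)
peak_areas = [
    [1.0e6, 4.0e6, 2.0e6],
    [2.0e6, 8.0e6, 5.0e6],
    [0.5e6, 2.5e6, 1.0e6],
]
normalized = normalize(peak_areas)
```

Pareto scaling divides by the square root of the standard deviation rather than the standard deviation itself, so large peaks are damped less aggressively than with unit-variance scaling.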
The R statistical environment was used to perform univariate analysis (fold change analysis, t-tests) and K-means clustering analysis (Pearson correlation distances, Ward.D clustering algorithm). Time-series clustering analysis was performed using the Mfuzz package to study the temporal expression patterns of the differential metabolites. The differential metabolites were uploaded to the MetaboAnalyst 4.0 platform (http://www.metaboanalyst.ca/) for Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathway analysis.
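As an illustration of the univariate step, the sketch below computes a fold change and a t statistic for one metabolite between two adjacent stages. The replicate values are invented. The text does not state which t-test variant or which thresholds were applied, so the Welch form used here is an assumption.

```python
import math
import statistics


def fold_change(later, earlier):
    """Ratio of the mean abundance in the later stage to that in the earlier stage."""
    return statistics.fmean(later) / statistics.fmean(earlier)


def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    n_a, n_b = len(a), len(b)
    se2 = var_a / n_a + var_b / n_b
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
    df = se2 ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical peak areas for one metabolite, three biological replicates each
s6 = [900.0, 1000.0, 1100.0]
s7 = [90.0, 100.0, 110.0]

fc = fold_change(s7, s6)      # below 1 means lower abundance at S7
t_stat, df = welch_t(s6, s7)
```

A p-value would then be read from the t distribution with `df` degrees of freedom, which needs a statistics library and is left out of this sketch.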

Results
Metabolic profiles. UHPLC-QE-HF-MS-based untargeted metabolomic analyses were performed to profile the metabolites present in the pericarp of soapberry fruit at 8 different ripening stages. After preprocessing, 1790 features were extracted from the UHPLC-QE-HF-MS data on the C18 column in negative mode, and 13000 features (5000 in negative mode and 8000 in positive mode) were extracted on the HILIC column. No metabolites could be extracted from the chromatogram obtained on the C18 column in positive mode. By comparison with the in-house database and the online databases, as well as the literature and commercial standards, 56 metabolites were putatively identified in negative ion mode on the C18 column, of which 19 were putatively annotated and 37 were assigned a putative chemical formula only; 265 metabolites were putatively identified in negative ion mode on the HILIC column, of which 88 were putatively annotated and 177 were assigned a putative chemical formula only; and 53 metabolites were putatively identified in positive ion mode on the HILIC column, of which 28 were putatively annotated and 25 were assigned a putative chemical formula only. After excluding the metabolites common to more than one data set, a total of 111 metabolites were putatively annotated (Supplemental file 2: Table S1), including 34 amino acids and their derivatives, 12 organic acids, 10 fatty acids, 9 amines, 6 flavonoids, 6 nucleotides and their derivatives, 5 alkaloids, 4 carbohydrates, 4 terpenoids, 3 vitamins, 3 phosphorylated intermediates, 2 phenylpropanoids, and 13 other metabolites.
(1) Principal component analysis. We performed unsupervised principal component analysis (PCA) to assess the variation in the 111 metabolites detected across the 24 pericarp samples from soapberry fruit at 8 different development stages. In the PCA score plot (Fig. 2), the QC samples clustered tightly, indicating that the experimental method was reliable and the instrument was stable. In addition, all 24 samples were within the 95% confidence region, and the eight groups were clearly separated in the PCA score plot, indicating distinct metabolome patterns among them. Furthermore, clear stepwise alterations of the soapberry pericarp metabolome were observed during fruit development from S1 to S8. Stage S1 was on the right side of the first principal component and the other 7 stages were all on the other side, suggesting that the largest metabolic differences were between S1 and the following 7 stages.
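For readers unfamiliar with PCA score plots, the sketch below shows how the first principal component and its sample scores can be obtained from a small, already scaled data table. It uses power iteration on the covariance matrix purely for illustration. The analysis reported here was run in SIMCA, and the function name and toy data are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loading vector, fraction of variance explained, sample scores) for PC1."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Sample covariance matrix (p x p)
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Power iteration: repeated multiplication converges to the dominant
    # eigenvector, provided the start vector is not orthogonal to it.
    vec = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(
        vec[i] * cov[i][j] * vec[j] for i in range(p) for j in range(p)
    )
    explained = eigenvalue / sum(cov[i][i] for i in range(p))
    scores = [sum(c[j] * vec[j] for j in range(p)) for c in centered]
    return vec, explained, scores


# Hypothetical scaled abundances: 4 samples x 2 metabolites
samples = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]]
loading, explained, scores = first_principal_component(samples)
```

Samples whose scores lie far apart along this axis, as S1 does relative to the later stages in Fig. 2, differ most in the metabolites that carry large loadings.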
(2) Clustering analysis. The relative content of each metabolite was reflected by its abundance. The abundance of the 111 putatively annotated metabolites was analyzed by hierarchical clustering with a heat map. According to the differences in metabolite abundance, the stages could be divided into two clusters: stages S1 and S2 were classified into one cluster and the others (stages S3-S8) into another (Fig. 3). Stages S3-S8 were further divided clearly into two clusters, S3-S5 and S6-S8. The metabolites in the heat map were also divided into two clusters according to their changes during fruit development. Cluster 1 included 38 metabolites (9 amino acids and their derivatives, 5 organic acids, 8 fatty acids, 1 amine, 3 flavonoids, 2 nucleotides and their derivatives, 3 terpenoids, 1 vitamin, 2 phosphorylated intermediates, and 4 other metabolites), whose levels were highest at stage S1 and then gradually decreased. Cluster 2 could be subdivided into two subgroups. Subgroup 1 contained 39 metabolites (9 amino acids and their derivatives, 5 organic acids, 2 fatty acids, 1 amine, 2 flavonoids, 5 alkaloids, 3 carbohydrates, 2 vitamins, 1 phosphorylated intermediate, 2 phenylpropanoids, and 7 other metabolites), which accumulated gradually during fruit development and reached their highest levels at stage S8. Subgroup 2 contained 34 metabolites (16 amino acids and their derivatives, 2 organic acids, 7 amines, 1 flavonoid, 4 nucleotides and their derivatives, 1 carbohydrate, 1 terpenoid and 2 other metabolites), whose relative contents were highest at stage S4.
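One simple way to picture these groupings is to label each metabolite by the stage at which its mean abundance peaks. The sketch below does only that and is not the hierarchical clustering used for Fig. 3. The stage boundaries (S1-S2, S3-S5, S6-S8), the pattern labels attached to them, the function name and the abundance values are illustrative assumptions.

```python
STAGES = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]


def accumulation_pattern(stage_means):
    """Label a metabolite by where its mean abundance peaks across S1-S8."""
    if len(stage_means) != len(STAGES):
        raise ValueError("expected one mean abundance per stage (S1-S8)")
    peak_index = max(range(len(STAGES)), key=lambda i: stage_means[i])
    if peak_index <= 1:        # peak at S1 or S2
        pattern = "pre-accumulation"
    elif peak_index <= 4:      # peak at S3, S4 or S5
        pattern = "mid-accumulation"
    else:                      # peak at S6, S7 or S8
        pattern = "post-accumulation"
    return STAGES[peak_index], pattern


# Hypothetical relative abundances (one value per stage)
profiles = {
    "metabolite_A": [9.0, 7.5, 6.0, 5.0, 4.0, 3.0, 2.5, 2.0],
    "metabolite_B": [2.0, 3.0, 5.0, 8.0, 6.0, 4.0, 3.0, 2.5],
    "metabolite_C": [1.0, 1.2, 1.5, 2.0, 3.0, 4.5, 6.0, 9.0],
}
labels = {name: accumulation_pattern(p) for name, p in profiles.items()}
```

A peak-based rule ignores the overall shape of a profile, which is one reason correlation-based clustering is usually preferred for real data.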
(3) OPLS-DA analysis. PCA allows for an analysis of which factors are responsible for the largest part of the variation of the data. In contrast, supervised models can be used to understand whether samples can be separated using a specific factor. Therefore, we divided the samples into seven groups according to stages: S1-S2, S2-S3, S3-S4, S4-S5, S5-S6, S6-S7, S7-S8. Next, supervised OPLS-DA models were built.
Figures of merit of the internal cross-validation are shown in Supplemental file 2: Table S3. R²X, R²Y and Q² of the fitting equations were all greater than 0.5, and the differences among the parameters were small (Supplemental file 2: Table S3), indicating that the fitting equations were reliable. According to the OPLS-DA score plots (Supplemental file 2: Fig. S1 A-G), all samples in the different groups were located within the 95% confidence regions and could be distinguished by the OPLS-DA models, which indicated that there were significant differences between groups. The samples in the same group clustered together, showing that the samples had good repeatability. The OPLS-DA permutation test plots (Supplemental file 2: Fig. S2 A-G) showed that all permuted Q² points were lower than the original Q² point on the right, that the Q² intercept was less than 0, and that the regression lines of R² and Q² crossed the abscissa or lay below 0, which demonstrated that these OPLS-DA models were reliable and effective 25.
KEGG pathway analysis of differential metabolites. To better elucidate the biological functions of the differential metabolites, a pathway analysis was performed by comparing the metabolites with the KEGG reference pathways. As expected, these metabolites were involved in 13 different pathways (Supplemental file 2: Table S3). Fig. 5 shows the influence factors of the metabolic pathways. According to the impact or -log(P) value, the most relevant pathways were selected, including flavonoid biosynthesis; histidine metabolism; aminoacyl-tRNA biosynthesis; flavone and flavonol biosynthesis; and phenylalanine, tyrosine and tryptophan biosynthesis.
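To make the -log(P) axis concrete, the sketch below computes an over-representation P value for a single pathway with a hypergeometric test, which is one common choice for this kind of analysis. The text does not state which enrichment test or logarithm base was used, so both are assumptions here, as are the function name and all counts.

```python
import math


def hypergeom_enrichment_p(total, in_pathway, selected, hits):
    """P(X >= hits) when `selected` compounds are drawn from `total` compounds,
    of which `in_pathway` belong to the pathway of interest."""
    upper = min(in_pathway, selected)
    favourable = sum(
        math.comb(in_pathway, k) * math.comb(total - in_pathway, selected - k)
        for k in range(hits, upper + 1)
    )
    return favourable / math.comb(total, selected)


# Hypothetical counts: 1000 reference compounds, 40 in the pathway,
# 25 differential metabolites submitted, 3 of them mapped to the pathway.
p_value = hypergeom_enrichment_p(total=1000, in_pathway=40, selected=25, hits=3)
neg_log_p = -math.log(p_value)  # natural log; the base is an assumption
```

A pathway with few members but several hits gives a small P value and therefore a large -log(P), which is why small pathways can rank highly in plots such as Fig. 5.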

Discussion
Metabonomic profiling of soapberry pericarp. LC-MS is currently the most widely used means of determining metabolic phenotypes via both untargeted and targeted analysis 26. Metabolome profiling studies are often performed using reversed phase (RP) chromatography (particularly C18) because of its robust and reproducible separation characteristics and its coverage of a wide range of metabolites. Nevertheless, most biological matrices contain many polar metabolites, which cannot be retained on RP stationary phases. Accordingly, hydrophilic interaction liquid chromatography (HILIC) has recently emerged as the best LC approach for complementing RP chromatography in the field of metabolomics for the separation of polar compounds 27,28. RP-based methods are used for medium-polarity to nonpolar metabolites, while HILIC is employed for the more polar metabolites that are not well retained in RP systems 29. In recent years, applications of HILIC in non-targeted metabolomics have been increasing 30,31. In the present study, we conducted a non-targeted metabolomic study based on UHPLC-QE-HF-MS with both a C18 column and a HILIC column on the pericarp of soapberry fruit at 8 different ripening stages. After preprocessing, 1790 features in negative mode were extracted on the C18 column, and 13000 features (5000 in negative mode and 8000 in positive mode) were extracted on the HILIC column, which showed that the coverage of metabolites with the HILIC column was higher than that with the C18 column. This result suggested that the HILIC column was better suited than the C18 column to non-targeted metabolomics analysis of the soapberry pericarp. This may be because the pericarp contains more polar compounds, as we putatively identified 34 amino acids and their derivatives, 12 organic acids, 10 fatty acids, 9 amines, 6 flavonoids, 6 nucleotides and their derivatives, 5 alkaloids, 4 carbohydrates, 4 terpenoids, and 21 other metabolites.
Besides, HILIC can be conveniently coupled to MS, especially in the electrospray ionization (ESI) mode, and HILIC mobile phases are highly compatible with ESI and give high sensitivity 11. Previous studies demonstrated that there are many triterpenoid saponins in the pericarp of soapberry 15,17,32. Unfortunately, we did not find triterpenoid saponins in our non-targeted metabolomics data. This may be because there are few reports about soapberry saponins, MS spectral information is lacking in the various online databases, and authentic standards are not commercially available. In addition, differences in extraction methods may also explain the absence of saponins. In fact, none of the currently available separation modes in LC can cover all the types of metabolites encountered in metabotyping studies 25. In the future, more comprehensive metabolome coverage might be obtained by using the serial coupling of RPLC and HILIC to study the non-targeted metabolomics of soapberry pericarp, and by continuously optimizing sample preparation, chromatographic conditions and MS parameters 26,29,33.
Our results indicated that, in both the PCA score plot and the OPLS-DA score plots, the pericarps at different stages of soapberry fruit development were clearly separated into groups, which demonstrated that the data from different stages were significantly different and that the metabolic profile changed during fruit development. The metabolites showed three main patterns of accumulation in the pericarp over the 8 development stages. Thirty-eight metabolites belonged to the pre-accumulation pattern, including 9 amino acids and their derivatives, 5 organic acids, 8 fatty acids, 3 flavonoids, 3 terpenoids, 2 nucleotides and their derivatives, 2 phosphorylated intermediates, and 6 other metabolites. These metabolites were highest in S1 or S2 and then decreased gradually with fruit development. The pericarp at stages S3-S5 contained higher concentrations of the metabolites belonging to the mid-accumulation pattern (16 amino acids and their derivatives, 2 organic acids, 7 amines, 4 nucleotides and their derivatives, and 5 other metabolites). The other 39 metabolites (9 amino acids and their derivatives, 5 organic acids, 2 fatty acids, 2 flavonoids, 5 alkaloids, 3 carbohydrates, 2 vitamins, 2 phenylpropanoids, and 9 other metabolites) followed the post-accumulation pattern, increasing gradually and reaching their highest levels at stage S8. These patterns suggest a close relationship between metabolites and fruit growth, and further studies are needed to elucidate this relationship.
Biological analysis of differential metabolites. Our non-targeted metabolomic analysis revealed 25 differential metabolites in the soapberry pericarp during fruit development, including 8 amino acids and their derivatives, 3 flavonoids, 2 organic acids, 2 fatty acids, 2 nucleotides and their derivatives, 1 alkaloid, 2 carbohydrates, 2 terpenoids, 1 vitamin, 1 phosphorylated intermediate and 1 other metabolite.
Surprisingly, the number of differential metabolites was greatest between S6 and S7 (12), and smallest between S4 and S5 and between S7 and S8 (1). Among these 12 differential metabolites, 10 were downregulated, and the maximum fold change was 353.18 (uridine-5'-diphosphate glucose). This indicated that the period from the fruit turning color stage (S6) to the fruit near ripening stage (S7) is the key period for changes in the metabolites of the soapberry pericarp during fruit development.
In this study, the pathway analysis of the differential metabolites using MetaboAnalyst linked them to 13 metabolic and biosynthesis pathways, and the most relevant pathways were flavonoid biosynthesis; histidine metabolism; aminoacyl-tRNA biosynthesis; flavone and flavonol biosynthesis; and phenylalanine, tyrosine and tryptophan biosynthesis. Three differential metabolites (i.e., quercitrin, leucodelphinidin, rutin) were involved in flavonoid biosynthesis and in flavone and flavonol biosynthesis. Quercitrin is a quercetin O-glycoside in which quercetin is substituted by an alpha-L-rhamnosyl moiety at position 3 via a glycosidic linkage. It is an antioxidant, an antileishmanial agent, and a plant metabolite. We showed that its content was high in the initial pericarps, decreased during stages S2-S3, reached its maximum at stage S4, and then decreased rapidly and remained low during stages S5-S8. The other flavonoids, rutin and leucodelphinidin, also changed during fruit development. Rutin accumulation was highest at stage S1 and then declined continually. Conversely, the level of leucodelphinidin remained stably low during stages S1-S7 and rose dramatically at stage S8. Rutin, a metabolite and an antioxidant, is known to have a variety of biological activities, including antiallergic, anti-inflammatory, antiproliferative, and anticarcinogenic properties. Leucodelphinidin is a plant metabolite. Previously, researchers have studied the flavonoids from the leaves and stem bark of soapberry 34,35. Here, we also found several flavonoids in the soapberry pericarp, and three of them changed markedly during fruit development, suggesting that flavonoids play an important role in soapberry. On this basis, the distribution, components, bioactivity and biosynthesis of soapberry flavonoids will be investigated in future work.
In the current study, we obtained 8 differential amino acids and their derivatives. Among them, DL-histidine is involved in the histidine metabolism and aminoacyl-tRNA biosynthesis pathways. Alpha-aminoadipic acid is a metabolite in the principal biochemical pathway of lysine, and can antagonize neuroexcitatory activity modulated by the glutamate receptor N-methyl-D-aspartate. Additionally, other amino acids and their derivatives, namely N-acetyl-L-glutamine, N-undecanoylglycine and L-leucyl-L-valine, act as metabolites. 3-Hydroxy-L-glutamic acid is a non-proteinogenic L-alpha-amino acid (https://pubchem.ncbi.nlm.nih.gov/).
The results of the present experiment showed that 3-dehydroquinic acid and protocatechuic acid participate in the pathway of phenylalanine, tyrosine and tryptophan biosynthesis. The level of protocatechuic acid decreased gradually during stages S1-S8. Recent studies indicated that protocatechuic acid could be used as a protective agent against cardiovascular diseases and neoplasms. The mechanism of its action is mostly associated with antioxidant activity 36.
Some terpenoids, such as agnuside and betulin, were also found in our study. The relative content of agnuside showed a decrease-increase-decrease trend and was highest at stage S6. Agnuside has anti-arthritic activity 37. In contrast, betulin showed an increase-decrease-increase trend and had higher contents at stages S3 and S4. Betulin exhibits a wide spectrum of biological and pharmacological properties, such as anti-HIV, anti-inflammatory, and anti-cancer activity 38. This triterpene is also a precursor of the triterpenoid saponins of soapberry. For example, a lupane-type triterpenoid saponin isolated from the soapberry pericarp by Hu et al. 17, betulinic acid 3-O-β-D-xylopyranosyl-(1→3)-α-L-rhamnopyranosyl-(1→2)-α-L-arabinopyranoside, is derived from betulin. Although triterpenoid saponins were not identified in this study, targeted metabolomics might be used to study them in the future.
We found that uridine-5'-diphosphate glucose, 8-hydroxyhexadecanedioic acid, pinolenic acid and glucose 1-phosphate were highest at stage S1 and then gradually decreased with fruit development. Uridine-5'-diphosphate glucose, a nucleotide derivative, acts intracellularly as an important intermediate in several different metabolic pathways and biosynthetic reactions, including the biosynthesis of polysaccharides such as starch and glycogen, lipopolysaccharides, and glycosphingolipids 39. 8-Hydroxyhexadecanedioic acid is a constituent of various plant cutins. Pinolenic acid is an octadecatrienoic acid, which has a role as a plant metabolite and an antineoplastic agent. Glucose 1-phosphate is a fundamental metabolite (https://pubchem.ncbi.nlm.nih.gov/).
In contrast, several bioactive and valuable metabolites (e.g., furamizole, alpha-tocopherol quinone and sucrose) accumulated and remained at higher levels in the late stage of fruit ripening. Furamizole is a nitrofuran derivative that has strong antibacterial activity 40,41. Alpha-tocopherol quinone, a vitamin E oxidation product, can decrease androgen receptor protein and transcript levels in prostate cancer cells 29. Sucrose is widespread in the seeds, leaves, fruit, flowers and roots of plants, where it functions as an energy store for metabolism and as a carbon source for biosynthesis. In addition to its use as a sweetener, sucrose is used in food products as a preservative, antioxidant, moisture control agent, stabilizer and thickening agent.
We also found some harmful compounds in our metabolome data, such as 5'-N-ethylcarboxamidoadenosine, methyldopa, nitrilotriacetic acid and vorinostat, which are health hazards if swallowed. However, these compounds also have biological and pharmacological activities. For instance, 5'-N-ethylcarboxamidoadenosine is an antineoplastic agent and a vasodilator agent; methyldopa and vorinostat have antihypertensive activity and antineoplastic activity, respectively. In addition, there were L-(+)-tartaric acid and 2-acrylamido-2-methyl-1-propane sulfonic acid in the pericarp of soapberry, which are corrosive. L-(+)-tartaric acid is often used as a food additive serving as an antioxidant. In high doses, however, this agent acts as a muscle toxin by inhibiting the production of malic acid, which could cause paralysis and possibly death. 2-Acrylamido-2-methyl-1-propane sulfonic acid is approved by the Food and Drug Administration for use in polymer components of food-contact paper and board adhesives, but it can also cause severe skin burns and eye damage if used improperly (https://pubchem.ncbi.nlm.nih.gov/). These results verified the theory that "the pericarp of soapberry can be used as medicine, but have a small toxicity on people" recorded in the Compendium of Materia Medica, but the specific mechanism needs further study. Therefore, these factors should be taken into account in the future when using the pericarp.
According to our investigation of fruit morphology, the soapberry fruit grows rapidly in the early stages and reaches its maximum size at the fruit turning color stage (S6), when the pericarp volume is also at its largest 42. Although many bioactive metabolites accumulated in the pericarp at the initial fruit stage (S1) and the fruit expanding stage (S4), the first five stages were not suitable for fruit harvesting when the maximization of economic benefits is considered. In the later stages of fruit development, the levels of several bioactive metabolites were low at stage S7 but rose rapidly at stage S8, so stage S8 was the most suitable stage for fruit harvesting, and the pericarp collected at this time had the highest utilization value.