3.1. Optimization and validation of the quantitative analytical method
Since ginsenosides had higher sensitivity and clearer mass spectra in the negative ion mode, data collected in the negative ion mode were used for the component detection and characterization (Huang X et al. 2019; Yang WZ et al. 2020). The composition of the mobile phase was investigated for improving the analyte ionization. Most of the standards contained abundant deprotonated molecular ions when the mobile phase consisted of acetonitrile and water. The water-phase additive acetic acid not only improved the LC separation but also helped to form [M + CH3COO]− ions which were helpful to identify the precursor ions of the ginsenosides. After different concentrations of acetic acid were investigated, water containing 0.01% acetic acid was used as the optimal mobile phase.
For the precision test, the peak areas of 20 ginsenosides in real samples were analyzed at 0, 12 and 24 h. The RSD values of the retention times and the peak areas of the 20 ginsenosides were less than 3.2% and 9.1%, respectively (Fig. 2A). The repeatability of the assay was confirmed by extracting and analyzing five replicates of the same sample as described above. The RSD values of the peak areas of the 20 ginsenosides were all less than 5.5%.
Recovery was used to evaluate the accuracy of the method. A known amount of six ginsenoside standards was added to a fixed amount of sample, and the mixture was extracted and analyzed using the method above. The test was performed in triplicate. The developed method had good accuracy, with overall recoveries of 92.3–97.1% and RSD values of 5.1–10.2% (supplemental Table 2). These results indicated that the LC-MS method was precise and accurate for the quantitative determination of ginsenosides in ginseng samples.
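The precision and recovery figures above can be illustrated with a minimal sketch. It assumes the conventional definitions (RSD as sample standard deviation divided by the mean, and spike recovery as the measured increase divided by the amount added); the text does not state the exact formulas used, and the function names and numbers below are hypothetical.

```python
import statistics

def rsd_percent(values):
    """Relative standard deviation (%), using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

def recovery_percent(found_spiked, found_unspiked, amount_added):
    """Spike recovery (%): measured increase relative to the amount added."""
    return (found_spiked - found_unspiked) / amount_added * 100.0

# Hypothetical peak areas of one analyte measured at 0, 12 and 24 h.
areas = [98.0, 100.0, 102.0]
print(round(rsd_percent(areas), 2))

# Hypothetical amounts (same units): 10.0 found before spiking,
# 10.0 added, 19.5 found after spiking.
print(round(recovery_percent(19.5, 10.0, 10.0), 1))
```

An analyte would pass the criteria reported above if its RSD and recovery fall within the stated ranges.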
3.2. Identity assignment and confirmation of the ginsenosides
We used the reference ginsenosides to optimize the mass spectrometric conditions and to obtain the fragmentation pathways of ginsenosides (Huang X et al. 2019). In the MS scans of the reference standards, the usual precursor ions of ginsenosides were [M-H]− and [M + CH3COO]−. The negative-mode MS/MS spectra of the precursor ion [M-H]− exhibited a fragmentation pattern corresponding to the successive loss of glycosidic units until the formation of [aglycon-H]− ions. The sugar moieties were readily elucidated from the neutral losses: a mass difference of 162.0547 Da indicated a glucosyl (Glc) group, 132.0431 Da a pentosyl group [arabinose (Ara) or xylose (Xyl)], 146.0421 Da a rhamnosyl (Rha) group, and 176.0340 Da a glucuronyl (Glu A) group. Figure 2B shows a representative example illustrating the fragmentation pathways of Rg3, F1 and Ro.
The full-scan MS data of ginsenoside Rg3 showed [M-H]− at m/z 783.4749 and the adduct ion [M + CH3COO]− at m/z 829.5981, indicating that the molecular formula was C42H72O13. Its characteristic MS/MS pattern contained the fragment ion at m/z 459.3858, which indicated that this compound belonged to the protopanaxadiol (PPD) group. The fragment ions originated from cleavage of the glycosidic bonds, which produced peaks at m/z 621.4400 for [M-H-Glc]−, m/z 459.3859 for [M-H-2Glc]−, m/z 179.0565 for [Glc-H]−, and m/z 161.0457 for [Glc-H2O]− (Fig. 2B).
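The neutral-loss reasoning can be sketched in a few lines of Python. This is an illustration only: the loss masses are the values quoted in the text, the Rg3 fragment series is taken from the paragraph above, and the 0.05 Da matching tolerance and the function name are assumptions rather than parameters reported in this study.

```python
# Neutral-loss masses (Da) as quoted in the text.
NEUTRAL_LOSSES = {
    "Glc": 162.0547,
    "Ara/Xyl": 132.0431,
    "Rha": 146.0421,
    "Glu A": 176.0340,
}

def assign_losses(fragment_mz, tol=0.05):
    """Label each successive m/z difference with the closest sugar loss.

    fragment_mz must be ordered from precursor to aglycone (descending m/z).
    A step with no loss within `tol` Da (assumed tolerance) is labeled None.
    """
    labels = []
    for higher, lower in zip(fragment_mz, fragment_mz[1:]):
        delta = higher - lower
        name, mass = min(NEUTRAL_LOSSES.items(), key=lambda kv: abs(kv[1] - delta))
        labels.append(name if abs(mass - delta) <= tol else None)
    return labels

# [M-H]-, [M-H-Glc]- and [M-H-2Glc]- of ginsenoside Rg3 from the text.
rg3_series = [783.4749, 621.4400, 459.3859]
print(assign_losses(rg3_series))
```

With this tolerance, both steps of the Rg3 series are assigned to Glc, in line with the two successive glucosyl losses described above.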
The ginsenosides were identified and confirmed by the strategies shown in Fig. 1, and all of the possible ginsenosides are summarized in supplemental Table 3. Fifteen components were unambiguously authenticated as ginsenosides Rb1,2 and 3, Rc, Rd, Re, F1,2 and 11, Ra3, Rg1,2 and 3 and 5, Rh1 and Ro by comparing the retention times, m/z values and fragment ions with those of the reference compounds. The other components were tentatively identified by analyzing the accurate mass, isotopic ratio patterns and specific MS/MS fragment ions based on published data from known ginsenosides (Wang HP et al. 2016; Xu XF et al. 2016). It should be noted that isomers which had the same aglycone and sugar moiety while exhibiting the same fragmentation pathway could not be unambiguously identified.
3.3. Constituents analysis of ginseng samples
In our study, 81 ginsenosides were identified in the three types of ginseng (supplemental Table 3). As illustrated in Fig. 3A, 2, 2 and 6 ginsenosides were found only in WG, RG and AG, respectively. As shown in Fig. 3B, ginsenoside Rs5 (C44H72O13, RT 6.6 min), for example, was present only in AG, whereas ginsenoside F2 (C42H72O13, RT 5.4 min) and ginsenoside Rk3 (C36H60O8, RT 9.9 min) were found only in RG. However, more than two-thirds of the ginsenosides (59 of 81) were shared by WG, RG and AG. These results indicated that the ginsenoside compositions differed little among the three types. In the following analysis, we focused on the 59 shared ginsenosides to find the differences among WG, RG and AG.
3.4. Multivariate statistical analysis of the shared ginsenosides
The following multivariate statistical analysis was based on the 16 × 59 data matrix (samples × analytes). Figure 4A shows the hierarchical cluster analysis (HCA) results for sample clustering based on all 59 ginsenosides. The relative distances reflect the correlation between samples: a pair of samples with a smaller relative distance is more similar than a pair with a larger distance (Xue J et al. 2011). Two major branches separated the 16 samples into two groups. The first branch (Group 1/3) included the AG samples, and the second branch (Group 2/3) included the RG and WG samples. This division indicated that the AG samples were clearly different from the RG and WG samples according to the analysis of the 59 shared ginsenosides. Moreover, with decreasing relative distances (and increasing correlations), the samples in the second branch were further subdivided into two groups that corresponded to RG and WG. To increase the comparability among samples, the peak areas of each sample were normalized to the peak area of ginsenoside Ro, since the peak areas of ginsenoside Ro were not statistically different among the AG, RG and WG samples (Fig. S2). Figure 4B shows the HCA for sample clustering after the normalization, in which the differences among AG, RG and WG are more clearly visible. Partial least squares discriminant analysis (PLS-DA) also showed that the normalized data more readily yielded a satisfactory categorization of the samples (Fig. S3). PLS-DA provided a 100% success rate in predicting the variety. The results indicated that the 59 ginsenosides in the ginseng samples could be used as indicators for determining the ginseng variety.
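The normalization step and a correlation-based sample distance can be sketched as follows. This is a minimal illustration with NumPy on a toy matrix; the column index of the reference analyte, the choice of 1 − Pearson correlation as the distance, and the function names are assumptions, since the exact distance metric and linkage used for Fig. 4 are not stated here.

```python
import numpy as np

def normalize_by_reference(areas, ref_index):
    """Divide each sample's peak areas by that sample's reference-analyte area."""
    areas = np.asarray(areas, dtype=float)
    return areas / areas[:, [ref_index]]

def correlation_distance(matrix):
    """Pairwise distance between samples (rows): 1 - Pearson correlation."""
    return 1.0 - np.corrcoef(matrix)

# Toy data: 3 samples x 4 analytes; the last column plays the role of Ro.
areas = np.array([
    [200.0, 40.0, 10.0, 100.0],
    [400.0, 80.0, 20.0, 200.0],  # same profile as sample 0 at twice the loading
    [50.0, 90.0, 60.0, 100.0],   # different profile
])

normalized = normalize_by_reference(areas, ref_index=3)
dist = correlation_distance(normalized)
print(np.round(dist, 3))
```

After normalization, samples 0 and 1 become identical, so differences in overall signal intensity no longer contribute to the comparison. A distance matrix of this kind could then be passed to any hierarchical clustering routine.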
Although these groupings are useful for qualitative interpretation, a limitation of HCA is that the dendrograms cannot be used to determine which markers cause the major differences between samples (Mercier SM et al. 2013; Rathore AS et al. 2014). A principal component analysis (PCA) was therefore performed for a more rigorous interpretation of the datasets, since PCA has an advantage over HCA used alone (Sleighter RL et al. 2010).
Figures 5A and 5B are the PCA score plots generated from the peak areas of all 59 ginsenosides without and with normalization. By construction, the first principal component (PC1) explains the maximum variance in the data, and the second principal component (PC2) explains the maximum remaining variance in a direction orthogonal to PC1 (Chen Y et al. 2015; Valsalan J et al. 2020). The two top-ranking PCs, PC1 and PC2, described 41% and 30% of the total variability in the original observations, respectively, and together accounted for 71% of the total variance (Valsalan J et al. 2020). In supplemental Table 4, the variable loadings showed that ginsenosides Ra3, F11 and Re primarily formed PC1. PC2 was related to malonyl ginsenoside Rb2, acetyl ginsenoside Rg1, ginsenoside Rg4, ginsenoside Rs3, etc. PC3 was not prominent, as it explained only 8% of the total variance, and its inclusion would provide little additional information. As has been shown in other PCA studies with large labeled datasets (Palanisamy SK et al. 2017; Li P et al. 2018), and similar to the results of the HCA, the distances between samples in the PCA score plot reflected the similarities/differences between samples (Mercier SM et al. 2013). Since PC1 (41%) explained the largest share of the total variance, a given distance along the PC1 axis indicated a greater difference between samples than the same distance along the PC2 axis. Therefore, the AG samples, which were vertically separated, were more distinct from the horizontally separated WG and RG sample clusters. The PCA clustering (Fig. 5) was highly consistent with the previous HCA clustering (Fig. 4), which further validated the statistical results (Chen Y et al. 2015).
Therefore, the high consistency between the HCA clustering, which was generated from 100% of the original variance, and the PCA clustering, which accounted for 71% of the total variance, indicated that PC1 and PC2 were sufficient to provide a trustworthy linear model. It further validated the advantage of PCA for studying the significant markers (Sleighter RL et al. 2010).
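How the percentage of variance explained by each PC is obtained can be sketched with a singular value decomposition. The example below is a minimal illustration on random data of the same shape as the study matrix (16 × 59); it assumes mean-centering without further scaling, which may differ from the preprocessing actually used, and the function name is hypothetical.

```python
import numpy as np

def pca_explained_variance(X):
    """PCA by SVD of the mean-centered matrix (rows = samples).

    Returns the scores, the loadings (one row per PC) and the fraction
    of total variance explained by each PC.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2 / (X.shape[0] - 1)
    ratio = variances / variances.sum()
    scores = U * s
    return scores, Vt, ratio

# Random stand-in for a 16 x 59 (samples x analytes) peak-area matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 59))

scores, loadings, ratio = pca_explained_variance(X)
# Cumulative share of variance captured by the first two PCs.
print(round(float(ratio[:2].sum()), 3))
```

The first two columns of `scores` give the coordinates used in a score plot, and the first two rows of `loadings` give the coordinates used in a loading plot.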
Biplots were created by combining the PCA loading plots (Fig. S4) with the score plots to account for correlations between sample groups and individual markers. In Fig. S4, each point represents one ginsenoside, with its loading values plotted on the PC1 and PC2 axes. The differences and/or similarities among the markers were shown in the score scatter plot (Abdelhafez OH et al. 2020). A key contribution loading value of 1.0 was chosen as the threshold to distinguish significant from non-significant markers for further analysis.
For the AG samples, most of the markers were found in Q2 of the loading plot (Fig. S4) which corresponded with the AG sample cluster in the score plot (Fig. 5A). Thirty-one markers, including their names and m/z values, are listed in supplemental Table 5. In our study, it was found that American ginseng contained little ginsenoside Rf and higher levels of ginsenosides F11, Re and Rd. These results were consistent with those of previous reports (Li W et al. 2000) and proved that our data analysis processing was robust and reliable. These distinctive ginsenosides are related to the therapeutic implication of AG for neurodegenerative diseases associated with neuroinflammation (Wang X et al. 2014).
For the WG samples, most of the markers were found in Q4 of the loading plot (Fig. S4), and 13 of these markers (including ginsenosides Rg1, Rb2, acetyl ginsenoside Rg1, etc.) are listed in supplemental Table 5. Higher levels of ginsenosides Rg1 and Rb2 were found in WG samples (Fig. 6). The level of ginsenoside Rg1, which has pharmacological use through producing weak stimulation to the central nervous system, indicated that WG is more “warm” than AG (Harkey MR et al. 2001).
For the RG samples, most of the markers were found in Q3 of the loading plot (Fig. S4), and 17 of these markers (including ginsenosides Rg3, Rg5, Rs3 and malonyl ginsenosides Rb1, Rb2, etc.) are listed in supplemental Table 5. The content of ginsenoside Rg3 was the highest in RG among the three types of ginseng (Fig. 6): the content of ginsenoside Rg3 in red ginseng was approximately 3-fold that in American ginseng and white ginseng. This was consistent with previous reports that the amounts of ginsenosides Rg3 and Rg5 increased after the hot steaming process (Park EH et al. 2014). Compared with Asian white ginseng, red ginseng has stronger anticancer activities (Wong AS et al. 2015) due to the changes in these ginsenosides.
Moreover, we could use the ratios of some ginsenosides to easily illustrate the differences among AG, RG and WG. For example, we determined the contents of ginsenosides Re, Rg1, Rg3 and Ro, then calculated the ratios of Re/Ro, Rg1/Ro and Rg3/Ro. The maximum values of Re/Ro (0.600), Rg1/Ro (0.033) and Rg3/Ro (0.046) were obtained in AG, WG and RG, respectively (Fig. S5). Based on our multivariate analysis results, other ratios of ginsenosides could also be suitable to distinguish the three types of ginseng.
3.5. Logistic regression analysis of the shared ginsenosides
Extracting principal components (PCs) by directly projecting the data with transformation matrices can map samples away from their true locations in the low-dimensional feature subspace if some elements of the samples are perturbed (Mi JX et al. 2019). Because of this weakness of PCA, logistic regression was introduced in this study as a classification model to further increase the accuracy of the results.
Using the peak areas of the 59 metabolites shared by the three types of ginseng as independent variables, logistic regression was carried out, and the results are shown in supplemental Table 6. Taking WG as the control, the multiple regression calculation gave the classification equation for WG and AG:
Species = −0.895 + ∑(β × Area)
where β is the regression coefficient corresponding to each marker and Area is the chromatographic peak area of that marker.
Taking WG as the control, the classification equation for WG and RG was obtained:
Species = −1.719 + ∑(β × Area)
with β and Area defined as above.
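How an equation of this form is evaluated for one sample can be sketched as follows. The intercept −0.895 is the WG versus AG value given above; the β values and peak areas are hypothetical placeholders (the fitted coefficients are in supplemental Table 6), and reading the Species score as a log-odds value that is mapped to a probability by the logistic function is an assumption based on the standard form of binary logistic regression.

```python
import math

def linear_predictor(intercept, betas, areas):
    """Species score: intercept + sum(beta * Area) over the markers in `betas`."""
    return intercept + sum(betas[name] * areas[name] for name in betas)

def logistic(z):
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical coefficients and peak areas for two markers of one sample.
betas = {"Re": 0.002, "Rg1": -0.001}
areas = {"Re": 1000.0, "Rg1": 500.0}

score = linear_predictor(-0.895, betas, areas)
probability = logistic(score)
print(round(score, 3), round(probability, 3))
```

Under this reading, a sample is assigned to the non-control class when the probability exceeds a chosen cut-off, commonly 0.5, and to the control class (WG) otherwise.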
Five ginsenosides were randomly selected as independent variables, and binary logistic regression was performed. After regression analysis, the AG-WG, AG-RG and WG-RG pairs could be clearly distinguished. In the AG-WG classification, the positive judgment probability was 100% for both species = AG and species = WG, and the classification effect was significant. In the AG-RG classification, the positive judgment probability for species = AG was 100% and that for species = RG was 85.7%, giving a high total positive judgment probability (94.4%). In the WG-RG classification, the positive judgment probability for species = RG was 100% and that for species = WG was 80%, and the total positive judgment rate was 91.7%. The significance of this classification was lower than that of the first two cases (supplemental Table 7). That is, the difference between AG and WG/RG was greater than that between WG and RG. The results of the regression equations were consistent with the aforementioned results, so our analysis is credible.
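The relation between per-class and total positive judgment rates can be illustrated with a short calculation. The sample counts below (7 RG and 5 WG samples, with one WG sample misclassified) are hypothetical values chosen only because they reproduce the WG-RG percentages quoted above; the actual counts are not given in this section.

```python
def classification_rates(correct, total):
    """Per-class and overall correct-classification rates in percent."""
    per_class = {name: 100.0 * correct[name] / total[name] for name in total}
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    return per_class, overall

# Hypothetical counts for the WG-RG comparison (not reported in the text).
total = {"RG": 7, "WG": 5}
correct = {"RG": 7, "WG": 4}

per_class, overall = classification_rates(correct, total)
print(per_class, round(overall, 1))
```

The total rate is weighted by class size, so it lies between the two per-class rates and is closer to the rate of the larger class.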
However, it should be noted that only the ginsenosides with higher responses in the negative ESI mode were measured in this study. There are many ginsenosides with low content that should be further studied. In addition, future work is also needed for the identification of the unknown ginsenosides found in this paper.