3.1 Development of UPLC-sMRM method
3.1.1 Construction of sMRM strategy
The UPLC-Q-TOF-MS data were first used to identify the compounds in AR. A total of 137 compounds were identified by comparison with mass spectrometric data reported previously (Li 2021; Zhang et al. 2015; Huang et al. 2009; Tan et al. 2012). A further 29 compounds, including 20 isoflavones, 4 isoflavans, 3 pterocarpines and 2 flavones, were deduced from their mass fragmentation patterns. Thus, a total of 166 compounds (81 in positive ion mode and 85 in negative ion mode) were identified (Table S2) and used for the subsequent exploration of MRM transitions.
The peak lists and the MGF files acquired at different CEs in the positive and negative ion modes, generated by Onemap, were imported separately into the MRM-Ion Pair Finder package. The resulting MRM transition lists covered 63 compounds in positive ion mode and 56 in negative ion mode, each with its best ion pair and best CE. The ion pairs and CEs of the remaining 47 compounds were then optimized manually using Sciex OS software: for each compound, 1–3 characteristic product ions of higher intensity were selected (100 ion pairs), and the CE of each ion pair was optimized according to peak shape and intensity. A product ion was accepted only if its signal intensity exceeded 100 and its m/z was at least 14 Da lower than that of the precursor ion. The optimum ion pair for each compound was then chosen by directly comparing the responses of the candidate transitions in the sMRM list. For example, the extracted ion chromatograms (EICs) of astrapterocarpan-3-O-Glc obtained with different MRM transitions are shown in Fig. 1A; the ion pair 463.16/167.07 was clearly superior to 463.16/301.10.
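The two screening rules for product ions (intensity above 100 and a precursor-to-product mass difference of at least 14 Da), together with keeping the best CE for each product ion, can be sketched in Python as below. This is a minimal illustration under those stated rules, not the MRM-Ion Pair Finder code; the `ProductIon` record, the `best_transitions` helper and all intensities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProductIon:
    mz: float         # product-ion m/z
    intensity: float  # signal intensity at the collision energy below
    ce: float         # collision energy (V)


def best_transitions(precursor_mz, ions, min_intensity=100.0, min_loss=14.0, top_n=3):
    """Return up to top_n (precursor, product, CE) transitions, most intense first.

    A product ion is kept only if its intensity exceeds min_intensity and it lies
    at least min_loss Da below the precursor; for each product m/z, the CE that
    gave the highest intensity is retained.
    """
    best = {}
    for ion in ions:
        if ion.intensity <= min_intensity or precursor_mz - ion.mz < min_loss:
            continue
        current = best.get(ion.mz)
        if current is None or ion.intensity > current.intensity:
            best[ion.mz] = ion
    ranked = sorted(best.values(), key=lambda ion: ion.intensity, reverse=True)
    return [(precursor_mz, ion.mz, ion.ce) for ion in ranked[:top_n]]


# Hypothetical scan data for a precursor at m/z 463.16
ions = [
    ProductIon(167.07, 5200, 30),
    ProductIon(167.07, 6100, 40),
    ProductIon(301.10, 1500, 25),
    ProductIon(455.00, 9000, 10),  # only ~8 Da below the precursor: rejected
    ProductIon(120.00, 60, 50),    # intensity not above 100: rejected
]
print(best_transitions(463.16, ions))
```

With these made-up values, m/z 167.07 is ranked first at its best CE (40 V), and only the manual comparison of EIC responses described above would confirm which transition is actually superior in the sample matrix.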
After the ion pairs and best CEs had been obtained for the 166 compounds, the DP was further optimized in MRM mode. Each ion pair was acquired at DPs of 20, 40, 60 and 80 V, and the DP giving the highest response for each compound was taken as the optimal DP. For example, the peak shape and intensity of calycosin-7-O-Glc differed markedly across DPs, and the optimum DP was 40 V (Fig. 1B). Following a previous study, the intensity of each ion pair was required to exceed the limit of quantitation. In addition, 4 × 10⁶ cps was used as the upper intensity threshold for each analyte to avoid saturating the electron multiplier downstream of the Q3 chamber (Yu et al. 2020).
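The DP step and the intensity window described above amount to an argmax over the four DP levels followed by a range check. A minimal sketch, assuming the LOQ is expressed as an intensity threshold in cps (the text states it only as a limit of quantitation) and that the response at each DP is a single peak intensity; `select_dp` and the numbers are hypothetical.

```python
DP_LEVELS = (20, 40, 60, 80)  # declustering potentials tested (V)
UPPER_CPS = 4e6               # upper threshold to avoid detector saturation


def select_dp(responses, loq_cps, upper_cps=UPPER_CPS):
    """Pick the DP with the highest response and check the intensity window.

    responses maps each DP (V) to the peak intensity (cps) of one ion pair.
    loq_cps is a hypothetical intensity equivalent of the limit of quantitation.
    Returns (optimal DP, True if loq_cps < intensity <= upper_cps).
    """
    missing = set(DP_LEVELS) - set(responses)
    if missing:
        raise ValueError(f"no response recorded for DP {sorted(missing)}")
    dp = max(DP_LEVELS, key=lambda level: responses[level])
    intensity = responses[dp]
    return dp, loq_cps < intensity <= upper_cps


# Hypothetical responses for one ion pair
print(select_dp({20: 1.2e5, 40: 3.5e5, 60: 2.8e5, 80: 1.1e5}, loq_cps=1e3))
```

Returning a flag instead of silently dropping an out-of-window transition keeps the decision (for example, choosing a weaker ion pair for a saturating analyte) visible; how such cases were handled is not stated in the text.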
After ion-pair screening, a total of 129 compounds (64 in positive ion mode and 65 in negative ion mode) were retained for subsequent MRM detection. The compound ID, MRM ion pair, RT, optimal CE and optimal DP of each compound were imported into the sMRM monitoring list (Tables S3 and S4). The resulting ion-pair chromatograms from UPLC-sMRM analysis in the positive and negative ion modes are shown in Fig. 2.
3.1.2 Selection of IS compounds
The selection of suitable IS compounds is important for effective pseudo-targeted metabolomics analysis. Quercitrin, hyperoside and hesperidin were considered as candidate IS compounds for flavonoids, and ginsenoside Rg1, saikosaponin A and saikosaponin D for triterpenoid saponins. Taking into account RT, separation from each analyte and response, hyperoside and saikosaponin A were selected as the IS in positive ionization mode, and hyperoside in negative ionization mode.
3.2 Validation of the pseudo-targeted metabolomics approach
3.2.1 Method validation
After data acquisition, MultiQuant software was used to extract the peaks and integrate their areas. The quasi-content of each compound was calculated from the corresponding regression calibration curve. The linear range, precision, repeatability and stability of the developed pseudo-targeted metabolomics method were then assessed for method validation.
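Computing a quasi-content from a calibration curve can be sketched as an ordinary least-squares fit of IS-normalized peak area against the dilution level of the CQC sample, followed by back-calculation. This assumes an unweighted linear model and expresses quasi-content as a fold of the CQC; the text does not specify the weighting or units, and `fit_line` and `quasi_content` are hypothetical helpers.

```python
from math import sqrt


def fit_line(x, y):
    """Unweighted least-squares fit y = slope * x + intercept; also returns Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def quasi_content(area, is_area, slope, intercept):
    """Back-calculate the level (fold of CQC) from analyte and IS peak areas."""
    ratio = area / is_area  # IS-normalized response
    return (ratio - intercept) / slope


# Hypothetical calibration series: CQC dilution levels and IS-normalized areas
levels = [1 / 2500, 1 / 500, 1 / 100, 1 / 20, 1 / 4, 1.0]
ratios = [2.0 * c + 0.001 for c in levels]
slope, intercept, r = fit_line(levels, ratios)
qc_level = quasi_content(5010.0, 10000.0, slope, intercept)
print(round(slope, 6), round(intercept, 6), round(r, 6), round(qc_level, 6))
```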
In positive ion mode, 57 of the 64 targeted analytes showed satisfactory linearity (r > 0.995) over the concentration range of 1/2500 to 1 fold of the CQC (Table S5, Fig. S2). For intra-day precision, 52, 54 and 57 of these 57 analytes had RSD values below 15% at the low, medium and high concentration levels, respectively; for inter-day precision, the corresponding numbers were 52, 53 and 53. The repeatability RSDs of all 57 analytes were below 15%, as were the RSDs for stability over 72 h. In negative ion mode, 57 of the 65 analytes showed satisfactory linearity (r > 0.995) over the same concentration range, and the RSDs for intra- and inter-day precision, repeatability and stability over 72 h also met the requirements of pseudo-targeted metabolomics analysis (Table S5, Fig. S3).
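The acceptance criteria above (r > 0.995, and RSD below 15% for each precision, repeatability and stability set) can be expressed as a simple filter. A sketch, assuming RSD is the sample standard deviation divided by the mean, in percent; `passes_validation` and the replicate values are hypothetical.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    return 100.0 * stdev(values) / mean(values)


def passes_validation(r, replicate_sets, r_min=0.995, rsd_max=15.0):
    """True if linearity and every replicate set meet the acceptance limits."""
    return r > r_min and all(rsd_percent(v) < rsd_max for v in replicate_sets)


# Hypothetical replicate quasi-contents for one analyte
intra_day = [9.0, 10.0, 11.0]     # RSD = 10%
inter_day = [100.0, 102.0, 98.0]  # RSD = 2%
print(rsd_percent(intra_day), passes_validation(0.998, [intra_day, inter_day]))
```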
Thus, a total of 114 compounds with good linearity, sensitivity and reproducibility were selected for relative quantification, including 50 isoflavones, 10 flavones, 11 isoflavans, 10 pterocarpines, 10 triterpenoid saponins and 23 other compounds. The chemical structures of 20 isoflavones, 6 flavones, 5 isoflavans, 6 pterocarpines and 8 triterpenoid saponins are shown in Fig. 3.
3.2.2 Method validation with standards
A mixed solution of 11 reference compounds was used to further validate the established method. Six of them were analyzed in positive ion mode: calycosin-7-O-Glc, ononin, astrapterocarpan-3-O-Glc, and astragalosides I, II and IV. As shown in Table S6, the correlation coefficients of the linear regressions between concentration and the corresponding IS-normalized peak area exceeded 0.995 for all six reference compounds. The LOQs, defined as the concentrations giving a signal-to-noise ratio (S/N) of 10, ranged from 0.0038 to 0.0696 µg/mL. The intra-day precision RSDs of the six standards ranged from 0.59 to 4.84%, and the inter-day precision RSDs from 1.57 to 4.93%. The other five reference compounds, calycosin, formononetin, isomucronulatol, isomucronulatol-7-O-Glc and astrapterocarpan, were analyzed in negative ion mode; they also showed good linearity, with intra- and inter-day precision RSDs below 5% (Table S6). Thus, the validation with 11 standards further confirmed that the developed pseudo-targeted metabolomics method was robust for the relative quantification of the selected analytes.
3.3 Pseudo-targeted Metabolomics Analysis
The validated method was applied to 41 batches of AR samples, and the quasi-content of each analyte and the accurate content of the 11 compounds with reference standards were calculated from the corresponding regression calibration curves (Tables S7 and S8). A QC sample was injected after every 8 samples, and all 6 QC samples fell within ±2 SD, indicating that the analytical method was stable (Fig. S4). For the 11 analytes with standards, the quasi-contents correlated strongly with the accurate contents (Fig. S5), further demonstrating the reliability of the method.
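The QC check can be sketched as control limits of mean ± 2 SD computed from the QC injections themselves, and the agreement between quasi-contents and accurate contents as a Pearson correlation. Both are assumptions about how Fig. S4 and Fig. S5 were built (the text does not say which QC summary value was plotted or which correlation statistic was used); the function names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def qc_within_limits(qc_values, k=2.0):
    """Flag each QC value as inside/outside mean +/- k*SD of the QC set."""
    m, s = mean(qc_values), stdev(qc_values)
    lower, upper = m - k * s, m + k * s
    return [lower <= v <= upper for v in qc_values], (lower, upper)


def pearson_r(x, y):
    """Pearson correlation, e.g. between quasi-contents and accurate contents."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical summary values for six QC injections
flags, limits = qc_within_limits([100.0, 102.0, 98.0, 101.0, 99.0, 100.0])
print(all(flags), limits)
```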
The quasi-contents of metabolites varied markedly among the AR samples. Sample 22 had the highest total content across all structural types, and sample 2 the lowest (Fig. 4A). The proportions of the structural classes were relatively stable across samples, with isoflavones ranging from 36.9 to 50.6% and flavones from 6.3 to 11.9% (Fig. 4B). A radar chart was also generated to show the distribution of the main metabolite classes in the different AR samples (Fig. 4C). Isoflavones were the most abundant class in all AR samples, while isoflavans and pterocarpines were also present at relatively high levels but varied among samples.
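The class proportions in Fig. 4B amount to summing quasi-contents within each structural class and dividing by the sample total. A sketch, assuming quasi-contents are summed directly across compounds as for the totals in Fig. 4A; the compound names and values below are placeholders.

```python
from collections import defaultdict


def class_proportions(sample, compound_class):
    """Percentage of a sample's total quasi-content contributed by each class.

    sample: compound -> quasi-content; compound_class: compound -> structural class.
    """
    totals = defaultdict(float)
    for compound, value in sample.items():
        totals[compound_class[compound]] += value
    grand_total = sum(totals.values())
    return {cls: 100.0 * value / grand_total for cls, value in totals.items()}


# Placeholder data for one sample
classes = {"cmpd_a": "isoflavone", "cmpd_b": "isoflavone",
           "cmpd_c": "flavone", "cmpd_d": "isoflavan"}
sample = {"cmpd_a": 3.0, "cmpd_b": 1.0, "cmpd_c": 1.0, "cmpd_d": 3.0}
print(class_proportions(sample, classes))
```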
3.4 Chemical comparison of AR from different growth patterns and discovery of chemical markers for the discriminant model
To compare the chemical composition of AR from the two growth patterns, PCA was first performed (Fig. 5A). Although the two kinds of AR samples were not completely separated, chemical differences did exist between AR-W and AR-C. The quasi-contents were also used to generate a heatmap with HCA, with samples on the X-axis and features on the Y-axis; the difference between AR-W and AR-C was likewise evident in the heatmap, in agreement with the PCA (Fig. 5B). A volcano plot was then generated, and with the criteria of P < 0.05 and fold change > 1.5 or < 0.67, 26 differential compounds were identified: 9 isoflavones, 3 isoflavans, 4 flavones, 5 pterocarpines, 3 triterpenoid saponins and 2 amino acids (Table S9). Of these, 22 compounds were higher in AR-W and 4 were higher in AR-C (Fig. 5C).
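The volcano-plot criteria can be written as a small classification rule. This sketch assumes fold change is the ratio of mean quasi-content in AR-W to that in AR-C and takes the P value as an input, because the text does not name the statistical test; `fold_change` and `volcano_call` are hypothetical helpers.

```python
from statistics import mean


def fold_change(ar_w, ar_c):
    """Ratio of mean quasi-content in AR-W to that in AR-C."""
    return mean(ar_w) / mean(ar_c)


def volcano_call(fc, p_value, p_max=0.05, fc_up=1.5, fc_down=0.67):
    """Label a compound 'higher in AR-W', 'higher in AR-C' or 'not differential'."""
    if p_value >= p_max:
        return "not differential"
    if fc > fc_up:
        return "higher in AR-W"
    if fc < fc_down:
        return "higher in AR-C"
    return "not differential"


print(volcano_call(fold_change([2.0, 4.0], [1.0, 2.0]), 0.01))
print(volcano_call(2.0, 0.0563))  # P >= 0.05, so not differential regardless of fold change
```

The second call uses the P value reported for 4-hydroxybenzoic acid (with an illustrative fold change), which is why a compound can be picked by the Lasso step yet fall outside the 26 differential compounds.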
Lasso regression was used to screen chemical markers for the discriminant model from the 114 semi-quantified compounds, with the optimal λ chosen as the value giving the minimum mean squared error in 10-fold cross-validation. The variables with non-zero regression coefficients in the optimal model were then fitted to a logistic regression model. Five marker compounds were screened out (Table 1): naringenin, neocomplanoside, leucine, vesticarpan-O-Glc-Mal and 4-hydroxybenzoic acid. All except 4-hydroxybenzoic acid were among the 26 differential compounds identified above; 4-hydroxybenzoic acid was not in that set because its P value (0.0563) exceeded 0.05.
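The Lasso screening step can be sketched with coordinate descent and k-fold cross-validation in plain Python. This is an illustrative stand-in, not the software the authors used: it assumes a squared-error (Gaussian) Lasso on standardized quasi-contents with the class coded 1 for AR-W and 0 for AR-C, and it omits the follow-up logistic regression. All function names, the λ grid and the synthetic data are hypothetical.

```python
import random
from statistics import mean, pstdev


def standardize(X):
    """Scale each column (compound) to zero mean and unit population SD."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*X)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def soft_threshold(rho, lam):
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=500, tol=1e-9):
    """Minimize (1/2n)||y - b0 - X beta||^2 + lam*||beta||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    col_means = [mean(col) for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, col_means)] for row in X]  # centre columns
    y_mean = mean(y)
    resid = [yi - y_mean for yi in y]  # residuals while beta = 0
    scale = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if scale[j] == 0:
                continue
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / scale[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * step
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    intercept = y_mean - sum(m * b for m, b in zip(col_means, beta))
    return intercept, beta


def cv_lambda(X, y, lambdas, k=10, seed=0):
    """Return (lambda, MSE) with the smallest k-fold cross-validated mean squared error."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best = (None, float("inf"))
    for lam in lambdas:
        sq_err = 0.0
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            b0, beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            for i in test:
                pred = b0 + sum(b * x for b, x in zip(beta, X[i]))
                sq_err += (y[i] - pred) ** 2
        mse = sq_err / len(y)  # every sample is held out exactly once
        if mse < best[1]:
            best = (lam, mse)
    return best


# Synthetic data: 40 samples, 4 compounds; only compound 0 tracks the class label
rng = random.Random(1)
y = [1.0] * 20 + [0.0] * 20  # 1 = AR-W, 0 = AR-C (hypothetical coding)
X = [[yi + rng.gauss(0, 0.1)] + [rng.gauss(0, 1) for _ in range(3)] for yi in y]
Xs = standardize(X)
lam, mse = cv_lambda(Xs, y, lambdas=[0.3, 0.1, 0.03, 0.01])
_, beta = lasso_fit(Xs, y, lam)
markers = [j for j, b in enumerate(beta) if b != 0.0]
print(lam, markers)
```

The indices in `markers` play the role of the compounds with non-zero coefficients; in the study those compounds were then passed to a logistic regression, which is not reproduced here.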
Table 1
Regression model parameters for 5 marker compounds
Compound name | Effect | Std. error | P (AR-W vs AR-C) |
naringenin | -2.22E-02 | 1.65E-02 | 0.00517 |
neocomplanoside | 2.01E-02 | 1.92E-02 | 0.00015 |
leucine | -1.70E-02 | 2.43E-02 | 0.03273 |
vesticarpan-O-Glc-Mal | 3.91E-02 | 3.69E-02 | 0.00004 |
4-Hydroxybenzoic acid | -1.22E-02 | 1.75E-02 | 0.05634 |
Unsupervised clustering, ROC analysis, and supervised random forest and support vector machine (SVM) models were then used to test the classification power of the five markers. As shown in Fig. 6A, hierarchical clustering separated the AR-W and AR-C samples well in the heatmap, and the separation was improved compared with Fig. 5B. In random forest classification with 500 trees, the five markers achieved a classification accuracy of 100% (Fig. 6B), and the SVM analysis gave an accuracy of 95.12% (Fig. 6C). ROC analysis was further performed for the five marker metabolites individually and in combination; the area under the curve (AUC) of the combination (0.971) was greater than those of the individual compounds (0.756–0.851) (Fig. 6D). Taken together, these models demonstrated the strong discriminant ability of the five marker compounds.
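The AUC values compared above can be computed without drawing the curve, as the probability that a randomly chosen AR-W sample scores higher than a randomly chosen AR-C sample (the Mann–Whitney formulation, with ties counted as one half). A sketch; for the marker combination, the score would be a fitted probability from the logistic model, which is an assumption about how Fig. 6D was produced. The scores below are made up.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical marker scores (e.g. a quasi-content or a fitted probability)
ar_w = [0.9, 0.8, 0.75, 0.6]
ar_c = [0.7, 0.3, 0.2, 0.1]
print(roc_auc(ar_w, ar_c))
```

The pairwise loop is quadratic in sample size, which is negligible for 41 samples and avoids any ambiguity about threshold placement.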
3.5 Co-expression modules construction based on WGCNA
WGCNA is a tool for identifying sets of metabolites whose levels change in a highly coordinated manner; metabolites with similar profiles are clustered into the same module (Ning et al. 2022). The quasi-content data of the 114 compounds were subjected to WGCNA to further explore the relationships among them. Four co-expression modules were identified, each containing 8 to 39 metabolites, and metabolites not assigned to any module are shown in gray (Fig. 7A). The metabolites in each module are detailed in Table S10. The turquoise module contained 39 metabolites, comprising 22 isoflavones, 2 flavones, 6 isoflavans, 6 pterocarpines and 3 triterpenoid saponins.
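The core of WGCNA can be sketched as an unsigned adjacency a_ij = |cor(x_i, x_j)|^β between metabolite profiles, followed by the topological overlap matrix (TOM), whose dissimilarity 1 - TOM is then clustered into modules. The soft-thresholding power β = 6 is a common default and an assumption here (the text does not report the value used), and the hierarchical clustering and dynamic tree cut that assign module colours are not shown. This is an illustration, not the WGCNA R package.

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb) if saa and sbb else 0.0


def adjacency(profiles, beta=6):
    """Unsigned soft-threshold adjacency |r|^beta between metabolite profiles (diagonal 0)."""
    n = len(profiles)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = A[j][i] = abs(pearson(profiles[i], profiles[j])) ** beta
    return A


def topological_overlap(A):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with l_ij = sum_u a_iu a_uj."""
    n = len(A)
    k = [sum(row) for row in A]  # connectivity of each metabolite
    T = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(A[i][u] * A[u][j] for u in range(n))  # zero diagonal excludes u = i, j
            T[i][j] = T[j][i] = (shared + A[i][j]) / (min(k[i], k[j]) + 1.0 - A[i][j])
    return T


# Hypothetical quasi-content profiles of three metabolites across five samples
profiles = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10.5], [5, 1, 4, 2, 3]]
tom = topological_overlap(adjacency(profiles))
print([[round(v, 3) for v in row] for row in tom])
```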
Each co-expression module was then associated with the two groups of AR by Pearson correlation analysis (Fig. 7B). The turquoise module was positively correlated with AR-W (r = 0.38, p < 0.05). Notably, 21 of the 26 differential compounds screened above were enriched in this module, and all of them were higher in AR-W, in agreement with the multivariate analysis. The other 18 compounds in the turquoise module were also higher in AR-W, although the differences were not significant. The brown module was positively correlated with AR-C (r = 0.45, p < 0.05), and 2 differential compounds that were significantly higher in AR-C were enriched in this module. The yellow and blue modules showed no significant correlation with either AR-W or AR-C, and no differential compounds were enriched in them.
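Relating a module to the growth pattern can be sketched as: summarize the module by its eigengene (the first principal component of its standardized metabolite profiles, sign-aligned with the average profile) and correlate it with a 0/1 group indicator. The coding AR-W = 1, AR-C = 0 and the power-iteration PCA are assumptions for illustration, and the p value reported in the text is not computed here.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore(v):
    m, s = mean(v), pstdev(v)
    return [(x - m) / s for x in v] if s else [0.0] * len(v)


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb) if saa and sbb else 0.0


def module_eigengene(module_profiles, n_iter=500):
    """First principal component of the module's standardized profiles, one value per sample."""
    Z = [zscore(p) for p in module_profiles]  # metabolites x samples
    m, n = len(Z), len(Z[0])
    # metabolite-by-metabolite cross-product matrix Z Z^T
    C = [[sum(Z[a][s] * Z[b][s] for s in range(n)) for b in range(m)] for a in range(m)]
    v = [1.0] * m
    for _ in range(n_iter):  # power iteration for the leading eigenvector
        w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eig = [sum(Z[a][s] * v[a] for a in range(m)) for s in range(n)]
    average = [mean(Z[a][s] for a in range(m)) for s in range(n)]
    if sum(e * g for e, g in zip(eig, average)) < 0:  # align sign with mean profile
        eig = [-e for e in eig]
    return eig


# Hypothetical module: three metabolites, three AR-W then three AR-C samples
group = [1, 1, 1, 0, 0, 0]  # 1 = AR-W, 0 = AR-C
module = [[5, 6, 7, 1, 2, 1.5], [10, 11, 12, 2, 3, 2.5], [4, 5, 5.5, 1, 1.2, 1]]
r_module = pearson(module_eigengene(module), group)
print(round(r_module, 3))
```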
3.6 Analysis of malonyl-substituted flavonoids
Malonyl glycosides are natural derivatives reported in many plant species, and their content may be associated with high antioxidant properties (Zhang et al. 2020; Zheng et al. 2019). In AR, malonyl groups are usually attached to flavonoid glycosides, and 12 malonyl-substituted flavonoids were among the 114 semi-quantified compounds. Interestingly, their quasi-contents were relatively higher in AR-W than in AR-C, with significant differences for 8 of them (Fig. 8).
3.7 Variation Coefficients of metabolites in AR-W
To quantitatively evaluate the degree of variation of each analyte across the AR-W samples, variation coefficients were calculated by two different methods (V1 and V2) (Yu et al. 2020). The top 20 components were selected for each method, and 17 components overlapped, comprising 7 isoflavones, 2 isoflavans, 2 flavones, 3 triterpenoid saponins, 1 amino acid and 2 organic acids (Table 2). The quasi-contents of these 17 compounds varied greatly: the Cmax/Cmin ratios of the other 16 compounds all exceeded 20, and those of IF12 and IF6 exceeded 500. Dihydroxy-trimethoxyisoflavan was not detected in two samples (2 and 31), and no Cmax/Cmin ratio is given for it in Table 2.
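The Cmax/Cmin column of Table 2, together with a conventional coefficient of variation, can be computed per compound as below. V1 and V2 follow Yu et al. (2020) and are not reproduced here; the SD/mean value returned by the hypothetical `variation_summary` helper is not claimed to equal either of them. Non-detects are assumed to be recorded as 0, in which case the ratio is left undefined, as for dihydroxy-trimethoxyisoflavan.

```python
from statistics import mean, stdev


def variation_summary(values):
    """Return (Cmax/Cmin, SD/mean) for one compound's quasi-contents across samples.

    Cmax/Cmin is None when the compound was not detected (recorded as 0) in some sample.
    """
    cmax, cmin = max(values), min(values)
    ratio = cmax / cmin if cmin > 0 else None
    cv = stdev(values) / mean(values)
    return ratio, cv


print(variation_summary([1.0, 2.0, 4.0, 20.0]))
print(variation_summary([0.0, 3.0, 5.0]))
```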
Table 2
Compounds with large variation
compound name | V1 | V2 | Cmax/Cmin | type |
dihydroxy-trimethoxyisoflavan | 25.82 | 2.08 | - a | isoflavan |
IF12 b | 7.02 | 1.11 | 2148.19 | isoflavone |
IF6 | 4.82 | 0.97 | 568.45 | isoflavone |
3´,5´-Dihydroxy-4´-methoxyisoflavone | 7.59 | 1.09 | 96.16 | isoflavone |
cycloaraloside F | 5.64 | 0.90 | 55.30 | triterpenoid saponin |
astragaloside V | 5.71 | 0.85 | 51.98 | triterpenoid saponin |
IF7 | 4.57 | 0.80 | 36.15 | isoflavone |
trimethoxy-isoflavone | 5.42 | 0.88 | 35.42 | isoflavone |
sieberoside II | 5.55 | 0.86 | 34.56 | triterpenoid saponin |
complanatoside | 4.16 | 0.82 | 34.38 | flavone |
tryptophan | 5.55 | 0.96 | 30.78 | amino acid |
4-Hydroxybenzoic acid | 4.42 | 0.80 | 30.08 | organic acid |
rhamnocitrin-Hex | 4.41 | 0.80 | 28.87 | flavone |
pratensein | 5.38 | 0.91 | 27.87 | isoflavone |
salicylic acid | 7.79 | 1.04 | 27.05 | organic acid |
isomucronulatol | 5.08 | 0.80 | 21.40 | isoflavan |
formononetin | 4.21 | 0.87 | 20.60 | isoflavone |
a -, not tested; b IFn, derivatives of isoflavones.
3.8 Biosynthesis pathway network of flavonoids in plants
Based on the biosynthetic pathway of flavonoids in plants, we mapped a flavonoid biosynthesis network for AR. Most of the flavonoids detected in this study were included, and they could be classified into four categories: isoflavones, isoflavans, flavones and pterocarpines (Fig. 9). The precursor of these flavonoids is naringenin, a dihydroflavonoid (flavanone) biosynthesized from one molecule of 4-coumaroyl-CoA and three molecules of malonyl-CoA. As a major intermediate in flavonoid biosynthesis, naringenin undergoes a variety of reactions that produce different flavonoid subclasses: flavones are formed by flavone synthase (FNS), flavonols by flavanone-3β-hydroxylase (F3H) and flavonol synthase (FLS), and isoflavones by isoflavone synthase (IFS). Isoflavans and pterocarpines are in turn produced from isoflavones. These compounds are further modified by various hydroxylases, methyltransferases, reductases and glycosyltransferases in AR to form structurally diverse flavonoids (Lin et al. 2022; Zhang et al. 2022; Pandey et al. 2016).
PAL: phenylalanine ammonia-lyase; C4H: cinnamate-4-hydroxylase; 4CL: 4-coumarate:coenzyme A ligase; CHS: chalcone synthase; CHI: chalcone isomerase; FLS: flavonol synthase; F3H: flavanone-3β-hydroxylase; FNS: flavone synthase; IFS: isoflavone synthase; IOMT: isoflavone O-methyltransferase; I3′H: isoflavone 3′-hydroxylase; UFGT: formononetin-7-O-glucosyltransferase; UCGT: calycosin-7-O-glucosyltransferase; PTS: pterocarpan synthase.
According to the pseudo-targeted relative quantification, most of the isoflavans and pterocarpines were higher in AR-W than in AR-C, possibly owing to higher expression of the related biosynthetic genes. Most of the isoflavones were also higher in AR-W, which may be related to higher expression of isoflavone biosynthetic enzymes in AR-W. Likewise, all the malonyl-substituted compounds were higher in AR-W, possibly because of higher malonyltransferase expression. However, transcriptomic evidence is needed to confirm the differential expression of flavonoid biosynthesis-related genes between the two growth patterns of AR.