A Large-Scale Detection, Identification and Quantification of Target Metabolites using dMRM-MS Combined with Transcriptome of Two Rheum Species Focused on Anthraquinone and Flavonids Biosynthesis


 Background

Rheum emodi ( R. emodi ) is a perennial herb as well as an important medicinal plant. The main active compounds in Rhubarb are anthraquinones, but also contains flavonoids. As a class of important active components in rhubarb, anthraquinones have many pharmacological effects. However, due to the lack of rhubarb genome, it is difficult to mine functional genes involved in the biosynthesis of anthraquinones in rhubarb.

Results

Based on the a large-scale quantification of target metabolites combined with the transcriptome data of Rheum Palmatum L . (RPL) and Rheum Officinale Bails (ROB), in this study we aimed to firstly mine out anthrquinones and other pharmacological metabolites in rhubarb, then search genes which may related to the biosynthesis of anthraquinones and other metabolites. Via dynamic-multiple reaction monitoring (dMRM) of triple quadrupole mass spectrometry (QqQ-MS) technical means, 62 compounds including 21 anthraquinones, 17 flavonoids, 6 stilbenes, 12 gallate esters, 3 tannins, and 3 others were simultaneous qualitatively and quantitatively assessed. Furthermore, a total of 180,689 transcripts and 90,242 unigenes were generated by the de novo transcriptome assembly. Down-stream bioinformatic analyses annotated the unigenes with BLAST against seven public databases including KEGG and KOG. GO and KEGG pathway analyses further characterized the unigenes, resulting in 21,691 unigenes with metabolic pathway annotations. There were 121 genes annotated in the flavonoid pathways, and 38 of them showed a significant difference in expression level between the RPL and ROB. According to the difference in the content of emodin and physcion in the two kinds of rhubarb and the expression level of key enzyme genes, we further search 7 differentially expressed genes annotated with the activity of caffeoyl coenzyme methyltransferase. RT-qPCR results showed that the expression of the cluster-14354.38156 in the ROB was approximately 14-fold higher than that observed in the RPL, which may participate in the process of O-methylation from emodin to physcion.

Conclusion

Combining metabolome and transcriptome analysis, a key enzyme gene was identified. This study supplied a useful resource for further study of metabolite in rhubarb.

4 the difference in the content of emodin and physcion in the two kinds of rhubarb and the expression level of key enzyme genes, we further search 7 differentially expressed genes annotated with the activity of caffeoyl coenzyme methyltransferase. RT-qPCR results showed that the expression of the cluster-14354.38156 in the ROB was approximately 14-fold higher than that observed in the RPL, which may participate in the process of O-methylation from emodin to physcion.

Conclusion
Combining metabolome and transcriptome analysis, a key enzyme gene was identified. This study supplied a useful resource for further study of metabolite in rhubarb.

Background
Rheum emodi ( R. emodi), also called "DaHuang", polygonaceae, is a species of rhubarb rich repository of pharmaceutically important secondary metabolite constituents of the anthraquinones group [1]. In ancient China and Western countries, rhubarb has mainly been used to clear away heat and assist with defecatione. China has been the major global producer of rhubarb. In addition, it has been a great demand for rhubarb in Europe owing to the local eating habits.
Rhubarb has been used as an effective, short-lived, and painless cathartic with important therapeutic effects [2]. The usage of the roots and rhizomes of rhubarb reports about the differences of the pharmaceutically metabolites between them [3]. However, the main active compounds in Rhubarb are anthraquinones [4], but also contains flavonoids.
The main anthraquinone derivatives in rhubarb are mostly emodin anthraquinone.
Currently, there are about more than 30 anthraquinone compounds in rhubarb, including emodin, aloe emodin and physcion [5]. As to the same skeleton construction, different structural modifications endow metabolites with different pharmacological effects [6]. Anthraquinone compounds in rhubarb belong to aromatic polyketone, which is obtained by the pathway of acetic acid biosynthesis.
In the initial synthesis stage, acetyl coenzyme-A was taken as the initial unit and malonyl coenzyme-A as the extension unit, and the benzene ring structure was formed under the action of polyketide synthase [7]. Under the catalysis of plant specific type III polyketo compound synthase, the precursor of free anthraquinone -anthrone, was generated through hydroxy aldehyde reaction, enolization reaction, oxidation reaction and decarboxylation reaction. Then free anthraquinone was generated into tuberculosis type anthraquinone under the action of glycosyltransferase [8,9]. Some studies have shown that methyltransferase catalyzes the hydroxyl methylation of emodin 3 to form physcion [10]. Similar to this pathway is the biosynthesis of flavonoids, the initial synthesis of flavonoids was also conducted under the action of polyketide synthase with malonyl coenzyme-A as the extension unit [11][12][13][14]. Then, chalcone, a precursor of flavonoids, was synthesized by plant specific type III polyketide synthase through oxidation reaction, hydroxy aldehyde reaction, synthesis and isomerism reaction. The free flavonoids were produced from chalcone catalyzed by chalcone isomerase, and then flavonoid glycosides were further produced by glycosyltransferase. 6 Although biosynthesis of anthraquinones is aim of some former studies in rhubarb [11,[15][16][17], the biosynthesis pathway of anthraquinones is not as clear as that of flavonoids. And the results from investigation of the metabolic pathway of anthraquinones are also limited in the currently available databases. The mechanism of skeleton formation of anthraquinones remains to be explored [16,18]. Therefore, the key enzyme genes involved in the biosynthesis of anthraquinones were explored by combining the biosynthesis information of flavonoids. Besides, due to the lack of rhubarb genome, it is difficult to search for functional genes. In the post-genomic data era, integrated analysis of comprehensive gene expression (transcriptome) and metabolic profiling (metabolomics) data has successfully applied in plant functional research [19,20].
Thus, in this study, we supplied both metabolome and transcriptome data of Rheum Palmatum L. (RPL) and Rheum Officinale Bails (ROB) as resources for future research. At the same time, we searched the genes that might be involved in the biosynthesis of anthraquinones.

Mass Conditions
The metabolic analysis of Rheum Officinales Bails (ROB) and Rheum Palmatum L.
(RPL) leaves were optimized by comparing the performance of several candidate elution systems, based on previous references data [21]. The acidity of the elution system is the most important factor for the analysis of metabolism. Therefore, different elution acidities with 0.1%, 0.2%, 0.5%, and 0.8% formic acid were evaluated by comparing the obtained target compounds and resolution of all the metabolism. Finally, 0.2% formic acid exhibited the best performance. The 7 optimized HPLC and electrospray ionization-QqQ-MS (ESI-QqQ-MS) conditions were as follows: the solvent system, eluent A was milli-Q water containing 0.2% formic acid, and eluent B which was acetonitrile. All samples were analyzed using an Eclipse-Plus C 18 column (2.1×50 mm, 1.8 μm) at 30 o C with a linear elution gradient protocol of 0-1.5 min, 95% B, 1.5-3 min, 5-24% B, 3-5min, 24-25% B, 5-11min, 25-66% B; 11-16 min, 66% B; 16-17 min, and 66-100% B, with a flow rate of 0.2 mL/min. Under these conditions, the metabolites of rhubarb leaves were completely eluted with resolving powder best target (Additional file 1).

Identification of rhubarb leaves metabolites based on ESI-Q-TOF-MS/MS
The identification of metabolites in conjunction with the ESI-Q-TOF-MS, both positive and negative ESI modes, were used to determine fragment ion information of metabolites. Information including mass spectra (in PI, NI, NI-MS 2 , and PI-MS 2 ), ultraviolet-vis spectra and retention time on C 18  The contents of aloe emodin, rhein, and emodin-O-(6-O-acetyl)-glucoside in RPL leaves were significantly higher than those observed in the ROB leaves. In addition, the contents of chrysophanol, emodin and methyl ether were significantly higher than those reported in ROB. Moreover, the contents of chrysophanol-(6-O-acetyl)glucoside and 43 other substances in the RPL were significantly lower than those measured in the ROB.
The contents of 8-O-methyl chrysophanol, chrysophanol, and physcion in the RPL were significantly lower than those observed in the ROB. Furthermore, the contents of three free anthraquinones in the RPL were significantly higher than those reported in ROB. Of the 14 binding anthraquinones, the content in the RPL was significantly lower than that detected in ROB, whereas the content of the other six in the ROB was significantly higher than in the RPL.
Principal component analysis was used to examine the quantitative data of the anthraquinone compounds in two kinds of rhubarb. The variance contribution rate of the first eigen-value was 92.61%, the second eigen-value was 3.62%, and the accumulative contribution rate of the first two factors was 96.23%. The six replicates of ROB and RPL leaves were clustered on the left and right sides of PC1, respectively, indicating a good correlation between biological duplication and a significant difference between the two kinds of rhubarb ( Figure 2A). There was a significant difference in the content of anthraquinones between the two rhubarb species ( Figure 2B).

Method validation
The calibration curves showed good linearity for all the available standards. The standard solutions were detected through chromatography until the signal-to-noise ratio corresponded to 3 and 10. The corresponding concentrations at these signalto-noise ratios were defined as the limit of detection and limit of quantitation, respectively. The precision of metabolic quantification was studied by examining the repeatability and intermediate precision for all the compounds.

Sequencing and de novo transcriptome assembly
RNA was sequenced using Illumina paired-end sequencing technology. A total of 193,873,248 Illumina raw reads were obtained ( Table 2). After removing adaptor sequences, ambiguous nucleotides and low-quality sequences, approximately 185 million clean reads were kept for the subsequent analysis. These reads were then assembled de novo using Trinity

Annotation of unigenes
To access the quality of these assembled unigenes, a homology search in Nr database was conducted for all unigenes using the BLAST program [24] and a cutoff E-value <10 − 5 . The best aligning results were selected to annotate the unigenes.
The results showed that 56% of the aligned sequences (50,558) exhibited significant homology with entries in the Nr database (E-value < 10 − 5 ), while > 75% of these matched sequences showed an E-value < 10 -15 (Additional file 3). Based on the BLAST similarity distribution, 32.5% of the aligned sequences exhibited alignment identities > 80% (Additional file 3). We screened these 90,242 non-redundant unigenes for similarity in seven public databases (i.e., Nr, Nt, Swiss-Prot, KEGG, GO, KOG, and Pfam). The annotation results showed that 56.05% (50,588 unigenes) had hits in the Nr database, followed by the Swiss-Prot database (45.15%, 40,753 unigenes). In total, there were 57,239 unigenes (63.42%) annotated in at least one of these seven databases, with 8,819 unigenes (9.77%) appearing in all seven databases including annotated in Nr, Nt, KEGG Orthology, SwissProt, Pfam, GO, Databases, at least one database, as well as total unigens.

Identification of DEGs
To reveal the molecular mechanism involved in flavone, flavanol, and flavonoid biosynthesis, the DEGs between ROB and RPL were identified. Pearson's correlation 12 coefficient (r 2 ) between samples was relatively high in ROB_1 vs ROB_2 (r 2 = 0.880) and RPL_1 vs RPL_2 (r 2 = 0.867), whereas it was low in ROB vs. RPL (from 0.527 to 0.576). Thus, we expected to obtain a large number of DEGs between ROB and RPL.
Read counts of every gene obtained through Hi-seq in all four RNA-sequencing samples (two biological replicates in ROB and RPL, respectively) were normalized and analyzed via DE-Seq package, finally 19,242 genes were detected as significantly differentially expressed between ROB and RPL.

Combined analysis of RNA-sequencing and metabolomics data
Emodin was methylated by methyl-transferase, followed by methoxy to form physcion. According to the results of the metabolomics analysis, the content of emodin in ROB is slightly lower than that observed in RPL. In contrast, the content of physcion in ROB was approximately 2.4-fold higher than that reported in RPL ( Figure 4B). Since the amount of physcion is different in ROB and RPL, we then We used RT-qPCR to quantify the four genes obtained from the above analysis. The 13 results showed that the expression of the cluster-14354.38156 in the ROB was approximately 14-fold higher than that observed in the RPL. This finding was consistent with the difference detected by RNA-seq gene expression. Therefore, cluster-14354.38156 may regulate the metabolic reaction of emodin 3 methylation to physcion ( Figure 4D).

Sample preparation and extraction
Accurately weighed about 100 g rhein fresh leaf crushed using a mixer mill with a zirconia bead for 1.5 min at 30 Hz, then extracted overweight at 4 o C with 1.0 ml pure methanol (or 70% aqueous methanol) containing 0.1 ppm lidocaine for lipidsolubility extracts were absorbed and 0.4 ml of each extract was mixed and filtrated (SCAA-104, 0.22 µm pore size) before LC-MS analysis.

ESI-Q-TRAP-MS/MS
A triple quadrupole-linear ion trap mass spectrometer (Q-trap), Q-TRAP 5500 LC-MS/MS system, equipped with an ESI-Turbo Ion-Spray interface, operating in a positive and negative mode by Analyst 1.6.2 software (AB Sciex). The ESI source operation parameters were as followed: source temperature 500 o C, ion spray voltage (IS) 5500 V; ion source gas I (GS I), gas II (GS II), collision gas (CAD), curtain gas (CUG) was set at 50, 50, and 30 psi, respectively. Collision energy was also set at 20, 30 and 40 eV. Instrument running and mass calibration were performed with 1, 10 and 100 mol/L polypropylene glycol solutions in trap and LIP modes, respectively. The MIM scan which served as a survey scan to trigger information-dependent acquisition (IDA) of EPI scan model, MIM-EPI was carried out to screen metabolites ions isolated in Q1 mininal CE (5 eV). The corresponding Q3 at the same metabolites were monitored through two ways. The firstly, a modified MIM-EPI scan was adapted, in which Q1 and Q3 were set from 50.1 to 1000.0 Da, and the mass step was 0.2 Da. At the same time, the other target-scan was also acquired for paired Q1 and Q3. We monitored each MIM-EPI experiment with 60 MIM transitions, product ions of each metabolites ion were scanned from 50.1 to 1000.0 Da in Q3, and the total cycle time for one scan was approximately 2.0 s.

QQQ-Experiment
The sample extracts were analyzed using UPLC-ESI-QQQ-MS (Agilent 1290 and 6460 triple quadrupole mass spectrometry series, Agilent Corporation, CA, USA). The

RNA extraction, library construction and sequencing
Fresh leaves of the two cultivars were individually ground to powder in liquid nitrogen, each sample with two biological replicates. Total RNA was isolated using the RNA prep Pure Kit (For Plant TIANGEN Biotech, Beijing, China), and RNA degradation and contamination were determined using 1% agarose gels. RNA purity was verified using the Nano-Photometer® spectrophotometer (Implen, CA, USA).

Sequence reads mapping, assembly and annotation
Raw data in FASTQ format were firstly processed through in-house Perl scripts.
Reads containing adapter sequences, Ns, and low-quality reads were excluded from the raw data. The remaining reads were kept as clean data. Subsequently, the Q20, 20 Q30 and GC-content of the clean data were calculated. All the downstream analyses were based on clean data with high quality.
The FASTQ files from all four samples were pooled for the subsequent assembly.
Transcriptome assembly was accomplished using the Trinity software assembly [22] with all parameters being set as default. All assembled contigs were then clustered to unigenes.

Differential expression analysis of unigenes
The levels of gene expression in each sample were estimated by RSEM [19]. Clean data were mapped back onto the assembled transcriptome and a read count for each gene was obtained using Hi-sequencing. Differential expression analyses of two conditions/groups were performed using the DE-Seq R package (1.10.1) [13].
We used count data generated by RSEM as input, two groups, i.e. two ROB samples and two RPL samples, were normalized and further used for differentially expression analysis by DE-seq data set from Matrix and DEseq functions in DE-Seq package.
DE-Seq provided statistical routines for determining differential gene expression using a model based on the negative binomial distribution. The resulting P values were adjusted and genes with an adjusted P-value < 0.05 (according to DE-Seq) 21 were assigned as differentially expressed.

Combined analysis based on metabolome and the transcriptome
We aimed to explore key enzyme genes involved in the biosynthesis pathway of anthraquinone. Since biosynthesis pathway of anthraquinone is poorly annotated, we searched the differentially expressed genes (DEGs) which involved in the flavonoid biosynthesis, which is well annotated. Then, according to the content difference of physcion in ROB and RPL, we picked up DEGs annotated as caffeoyl coenzyme methyltransferase as possible candidate of key enzyme involved in anthraquinone biosynthesis pathway. Since we have just two samples each group in DE analysis, we further used RT-qPCR to validate related DEGs. The primers were designed with Primer 5 software using the gene sequence obtained by RNAsequencing (Additional file 4). The primer design was based on common PCR primer design; however, the optimal length of the RT-qPCR primers was 80-180 bp, and the annealing temperature of the primers was 55-58 o C.
The RT-qPCR method is based on the specification of Trans-Start Green qPCR SuperMix UDG kit purchased from BeiJing all Gold Biotech Co. Ltd (Additional file 5).
The template was diluted to approximately 1μg/μL. The qPCR reaction program

Declarations 22
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The datasets during and/or analysed during the current study available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
Our research was supported by "the Fundamental Research Funds for the Central public welfare research institutes" (No. ZZ13-YQ-057). The funding organization had no role in the design of the study, data collection, analysis, decision to publish the data, or preparation of the manuscript.

Authors contributions
SC conceived and designed the experiments; JL and LL performed the experiments and contributed to the acquisition of data; SC, JL and HG analyzed the data; SC and AL wrote the paper. All authors have revised the manuscript critically for important intellectual content and given final approval of the version to be published. SC are responsible for the integrity of the work as a whole. Notes: a: ROB, Rheum palmatum L.; b: RPL, Rheum palmatum L. Figure 1 The biosynthetic metabolic network of target metabolites in rhubarb has been constructed.