Metabolomics analysis
Qualitative and quantitative analysis of metabolites
We detected a total of 1254 metabolites from the three tissues of Cudrania roots (CR), stems (CB), and leaves (CL), and all metabolites were annotated and classified into eight categories by the KEGG database: steroids, lipids, antibiotics, vitamins and cofactors, nucleic acids, peptides, carbohydrates, and organic acids. The main metabolites are DL-Arginine、L-Aspartic acid、6-Methylnicotinamide、Trigonelline、D-Saccharic acid、D-(-)-Salicin、D-(+)-Mannose、Asparagine、2-Oxoglutaric acid、Choline、a-Lactose、Parishin C、Parishin E、Sucrose、Plantagoside、D-(-)-Quinic acid、Citric acid、Fumaric acid、DL-Malic acid、DL-Stachydrine. Heat map analysis of the major metabolites of the three tissues showed that we found that the abundance of metabolites varied greatly. This also suggests that there are different medicinal values among the roots, stems and leaves of the Cudrania plant. It will provide a basis for the development and utilization of Cudrania plants (Fig. 1).
Principal Component Analysis (PCA)
The pre-processed LC-MS data were subjected to unsupervised PCA analysis to obtain the spatial distribution of roots, stems and leaves of the pre-processed LC-MS data were subjected to unsupervised PCA analysis to obtain the spatial distribution of roots, stems and leaves of Cudrania. As shown in the Fig. 2A, replicate samples from the same site were closer to each other, while there was a clear separation between samples from the Cudrania root, stem, and leaf groups. It was shown that PCA analysis had a good differentiation between roots, stems and leaves of Cudrania, indicating that there were significant differences in metabolome composition and content. PC1 explained 41.3% of the total variables and PC2 explained 25.3% of the total variables, with a cumulative contribution of 66.6%, which explains the overall variables well and is consistent with the results of the heat map analysis.
PLS-DA analysis
By PLS-DA model analysis, the root, stem, and leaf groups of Cudrania can be completely separated, and the three tissue samples are distributed in different confidence intervals. The cross-validation parameters of the model R2X = 0.856, R2Y = 1.000, Q2 = 0.992, and all three parameters are greater than 0.5, saying that the model has a good degree of interpretability and predictability (Fig. 2B). In order to verify whether the constructed model is overfitting, we can use the Permutation Test, OPLSDA's Permutation Test selects prediction accuracy as the test statistic. The test statistic is observed to be on the right side of the random distribution (observations are significantly larger than random values), Q2 = 0.981, with a p-value of 0.01, which is less than 0.05, suggesting that our PLS-DA discriminant model is significant and not overfitted. Three groups of roots, stems and leaves could be significantly differentiated, and there were also significant differences in metabolites between groups (Fig. 2C).
Differential metabolite analysis
Based on multivariate analysis of the VIP values of the first two principal components of the PLS-DA model, combined with univariate analysis of the multiplicity of differences (fold change, FC) and q-value values, the intersection of VIP ≥ 1.0, fold change ≥ 1.2 or ≤ 0.8333; and q-value < 0.05 was taken to obtain the shared ion, i.e., the difference ions, based on the VIP and FC values, the volcano diagram of differential metabolites was derived. The screening revealed a total of 649 differential metabolites compared to roots and stems, 742 differential metabolites compared to roots and leaves, and 611 differential metabolites compared to stems and leaves. It can be seen that the metabolite compositions of the various medicinal parts of Cudrania are very rich, and it is difficult to carry out a comprehensive study on the metabolites of Cudrania plants by traditional methods. Upon further analysis of the metabolites among the three parts in roots, stems and leaves, a total of 504 major differential metabolites were identified.
We made box plots for the top 25 ranked metabolomes, and the differential metabolites that accumulated more in roots were 2-oxo-2H-chromene-3-carboxylic acid, Gomisin N, 2-Methylbutyric acid, D-Mannitol 1-phosphate, 6,7-Dimethoxy-4-Methylcoumarin, Ciliatine, Linderane, Tricin O-malonylhexoside, Fargesin, Bergenin, Methyl cinnamate, Sodium Houttuyfonate, Ketoisophorone, 1-(1,8-dihydroxy-3,6-dimethyl-2-naphthyl)ethan-1-one, Flemiphilippinin A, 7-Demethylsuberosin, Coumarin-3-carboxylic acid, mainly flavonoids, polyphenols and coumarins. The differential metabolites that accumulated more in the stems were 2-Methoxy-4-vinylphenol. The differential metabolites that accumulate more in leaves were (1H-indol-3-yl)-3-[4-(trifluoromethyl)phenyl]acrylonitrile, Smyrindioloside, 1,11-Undecanedicarboxylic acid, 3-Butylidenephthalide, Miyabenol C, 1-(2-carboxy-2-methylpropyl)-4,6-dimethylbenzoic acid(Fig. 3).
Repeat relevance assessment
Correlation analysis is an effective method to identify metabolic similarities between different tissues. In our study, Pearson analysis was performed to find out the linear relationship between 100 differentially expressed metabolites and the results were displayed in a correlation matrix. By comparing the metabolite correlation analysis spectra for each group, we can observe if the metabolite correlation changes between groups. In this study, 577 metabolites showed significant correlations (p < 0.001). We found that among the top 5 metabolites with the highest correlations, organic acids and their derivatives, and peptides occupied a larger proportion relative to other biological classes, which may indicate that they can significantly influence the metabolism of other substances (Fig. 4). By analyzing the interactions between differential metabolites, we revealed the main relationships between anabolic and catabolic metabolism of metabolites as well as functional correlations between different compounds.
Metabolic pathway analysis
The purpose of enrichment analysis is to search for biological pathways that play a key role in a particular biological process, so as to reveal and understand the basic molecular mechanism of the biological process. For metabolomic data, before enrichment analysis, we need to select some metabolites to focus on, these metabolites usually need to be metabolites that are significantly different between subgroups (T-test, p < 0.05) and the metabolites need to be in the KEGG Compound Reaction Network (KEGG Compound Reaction Network, which is embedded in the KEGG Pathway). By enrichment analysis, the top 50 metabolic pathways with significant enrichment of differential metabolites were found, mainly including Degradation of flavonoids, Flavonoid biosynthesis, Nucleotide metabolism, and Phenylalanine metabolism, Biosynihesis of amino acids, Alanine, aspartate and glutamate metabolism are pathways that may play important roles in biology. Even if the metabolites of interest are significantly enriched in a pathway, we still don't know if these metabolites play a key role in this metabolic pathway, and topological analysis is able to calculate the magnitude of the role of the metabolite of interest in the metabolic pathway (measured by Impact). The metabolic pathways in the blue region are the prominent metabolic pathways in the ORA enrichment analysis, and the vertical coordinates demonstrate the Impact of these metabolic pathways in the topology analysis. We found that Degradation of flavonoids, Flavonoid biosynthesis, and Nucleotide metabolism were the pathways with more significant differences between the three tissues (Fig. 5).
Transcriptomics Analysis
QC analysis and functional database annotation
In order to further explore the molecular mechanisms underlying the differential effects of metabolism among different tissues of Cudrania, three materials, CB, CL, and CR, were analyzed by transcriptome sequencing. A total of 12 cDNA libraries were constructed and sequenced, and 77.69 Gb of clean data were obtained. The clean data for each sample amounted to 6.10 Gb, and the percentage of Q30 nucleobases was 93.08% or higher, indicating the high quality of the transcriptome data. Comparing the clean reads of each sample with the reference genome sequence of "Shuchazao", the comparative efficiency was 82.91–84.84%. PCA was then performed based on the FPKM values of each gene expressed in the samples. The results of the principal component analysis of the three samples reproduced well, with significant differences among roots, stems, and leaves of Cudrania, and gene expression among different tissues was distinguished from each other, indicating that gene expression was significantly different among different tissues.
Of the 245205 genes obtained, only 44206 genes, or 18.03%, were simultaneously annotated to the seven major databases. In addition, a total of 175885 unigenes were annotated in the seven major databases, accounting for 71.73% of the total number of genes (Fig. 6A). Among them, the largest number of genes was annotated to the NR database, which was as high as 165859 genes, accounting for 67.64% of the total number of genes. The number of genes annotated to the NT database was 123,360 (50.31%). The number of genes annotated to the KEGG database was 78,952 (32.2%). The number of genes annotated to the SwissProt database was 123568 (50.39%). The number of genes annotated to the Uniprot database was 123,401 (50.33%). The number of genes annotated to the GO database was 81129 (33.09%). The proportion of genes annotated to the KOG database was 134824 (54.98%). Five databases, namely, NR, NT, KOG, SwissProt, and Uniprot, were selected for Venn diagrams (Fig. 6B).
Correlation check of gene expression levels and differential gene analysis
The correlation of gene expression levels between samples is an important indicator to test the reliability of the experiment and whether the sample selection is reasonable; the closer the correlation coefficient is to 1, the higher the similarity of expression patterns between samples. If there are biological replicates in the sample, the correlation coefficient between the biological replicates is usually high. The Encode program recommends that the squared Pearson correlation coefficient (R2) be greater than 0.92 (under ideal sampling and experimental conditions). For specific experimental manipulations, we require that the R2 between biological replicate samples be at least greater than 0.8. Intra- and inter-group correlation coefficients were calculated based on the TPM values of all genes in the sample and plotted as heat maps (Fig. 7A). It can be found that the correlation coefficients of each of the three groups, CB, CL and CR, are close to 1, indicating that the selected samples are reasonable and reliable, which is an important guarantee for the results of the differential gene analysis. A two-by-two comparative analysis of transcriptome data from different tissues of Cudrania revealed that the highest number of differentially expressed Unigenes (DEGs) were found in different tissues stem, leaf, and compared with root (Fig. 7B) Among them, stems were compared with leaves, of which 6741 were up-regulated and 11545 were down-regulated. Stems were compared with roots, of which 11,544 were up-regulated and 18,152 were down-regulated. Roots were compared with leaves, of which 12067 were up-regulated and 12748 were down-regulated. (Fig. 7C/D/E) Transcriptome data of different tissue samples A total of 245205 DEGs were obtained, with an error rate (FDR) of < 0.01. Overall gene expression heatmap analysis and hierarchical clustering analysis of DEGs from different tissues revealed that the stem transcript profiles of Cudrania were clustered into a single unit, while the roots and leaves could be clustered into a single unit, which indicated that Cudrania stem transcript profiles were tissue-specific. The gene expression trends of different tissues could be roughly clustered into eight categories (Fig. 7F)
GO enrichment analysis of differential genes
A total of 175,885 Unigenes (about 71.73%) could be compared to the homologous sequences in the database by Unigenes data comparison. In order to further explore the potential molecular mechanisms of the accumulation of active constituents in roots of Cudrania, GO enrichment analysis of DEGs from different tissues was performed. 81,129 DEGs were enriched in 56 GO terms, categorized into 3 sections: molecular functions, biological processes and cellular components. Where leaves were not found to function as cellular components compared to roots, these terms, mainly annotated by DEGs, are closely related to metabolic regulation in Cudrania, suggesting that differences in metabolites between different tissues of Cudrania are transcriptionally regulated(Fig. 8).
KEGG enrichment analysis of differential genes
To further clarify the differential metabolic pathways, metabolic pathway enrichment analysis was performed on the differential genes based on KEGG enrichment analysis. The top 20 pathways with the highest enrichment degree are shown in Fig. 10. The transcriptome data were compared separately among different tissues of roots, stems and leaves, and a total of 78,952 differential genes were screened. KEGG enrichment analysis revealed that DEGs produced in stems compared to leaves were enriched in Photosynthesis - antenna proteins, Zeatin biosynthesis, Flavone and flavonol biosynthesis and Stilbenoid, diarylheptanoid and gingerol biosynthesis pathways. DEGs produced by stems compared to roots were enriched in Monoterpenoid biosynthesis, Photosynthesis - antenna proteins, Glucosinolate biosynthesis and Biosynthesis of secondary metabolites -unclassified pathways. DEGs produced by leaves and roots were enriched in Monoterpenoid biosynthesis, Valine, leucine and isoleucine biosynthesis, Photosynthesis-antenna proteins, Zeatin biosynthesis, Flavone and flavonol biosynthesis, Linoleic acid metabolism and Diterpenoid biosynthesis pathways.
Differential Gene Protein Interaction Network Analysis
We mainly apply the interactions in the STRING protein interactions database, and for the species included in the database, we directly extract the interactions of the target gene set (e.g., differential gene list) from the database to construct the network; for the species not included in the database, we firstly apply blastx (with the value set to 1e-10) to compare the target gene set sequences onto the protein sequences of the reference species included in the STRING database, and construct the interactions network using the protein interactions of the reference species. The size of a node in an interworking network graph is proportional to the degree of the node, i.e., the more edges are connected to the node, the larger its degree is, and the larger the node is, and these nodes are likely to be in a more central position in the network. The colour of the node is related to the clustering coefficient of the node, the gradient of the colour from green to red corresponds to the value of the clustering coefficient from low to high (Fig. 11A).
SSR Analysis
Simple Sequence Repeat (SSR) refers to a class of short tandem repeat sequences in which the repeat unit lengths and base compositions are basically the same, and only a few genes show minor differences. Eukaryotic genomes contain a large number of tandem repeats with very high repeat frequency and low sequence complexity, and the mutation status of these tandem repeats varies greatly among different species, different loci of the same species, or different alleles of the same locus. We used MISA software version 1.0 with default parameters; the corresponding minimum number of repeats for each unit size is: 1–10, 2–6, 3–5, 4–5, 5–5, 6 − 5 (e.g., 1–10, for mononucleotide repeats, the number of repeats is at least 10 to be detected; 2–6, for dinucleotide repeats, the minimum number of repeats is 6) (Fig. 11B).
Joint metabolomics and transcriptomics analysis
Flavonoid biosynthesis - Reference pathway
Flavonoids are a major class of plant secondary metabolites that serves a multitude of functions including pigments and antioxidant activity [20]. Flavonoids are synthesized from phenylpropanoid derivatives by condensation with malonyl-CoA. For example, condensation of p-coumaroyl-CoA (C6-C3) with three malonyl-CoA (C3) molecules results in naringenin chalcone with a diphenylpropane (C6-C3-C6) unit, which is converted to naringenin with the flavone (2-phenylchromen-4-one) backbone by conjugate ring closure (Fig. 12). These and further modifications yield a variety of structural forms including chalcones, flavanones, dihyroflavonols, and flavans, anthocyanins, flavones and flavonols, and isoflavonoids [21–22].
In order to gain a comprehensive understanding of the biosynthesis of flavonoids, we first investigated the biosynthesis of flavonoids. This includes the and downstream metabolites of flavonoid biosynthesis. As shown in the Fig. 13, compared with other synthesis pathways we noticed that the metabolites of Naringin, Naringenin, Pelargorudin, Dihydroquercetin, (+)-Catechin, and Dihydromyricetin were relatively higher in roots, suggesting that flavonoid biosynthesis in roots upstream may be more active [23]. The metabolites with higher content in stems were Phloretin, Phloritin, Pinocembin, Hesperetin, Neohesperidin, Luteolin, Quercetin, etc., while the metabolites with higher content in leaves were only Apigenin and Kaempferol.
The metabolism of flavonoids involves two secondary metabolite pathways [24], the phenylpropane metabolism pathway (Ko00940) and the flavonoid metabolism pathway (Ko00941, Ko00942, Ko00943, Ko00944). Based on the KEGG differential gene enrichment results, we found significant differences in both flavonoid and flavonol biosynthesis (Ko00944) pathways in the comparison of different tissues and organs. Cytochrome P450 monooxygenase (CYP) is the largest enzyme family in plants, which is widely involved in many monooxygenation reactions such as hydroxylation, epoxidation, and demethylation, and plays a crucial role in plant growth, development, and defense [25–26]. In this biosynthetic pathway, we found that CYP75B1, an enzyme gene of the CYP75B subfamily, was significantly up-regulated, indicating that roots and stems contain more flavonoids compared to leaves. One of the studies showed that the CYP75B enzyme efficiently catalyzed the production of naringenin, which is consistent with the results of metabolomics differential metabolite analysis [27]. The expression of flavonoid-3-O-glucoside glucosyltransferase (K22794, FG3) was significantly down-regulated in roots and stems, suggesting that the weak expression of this gene promotes naringenin and quercetin expression [28].
Nucleotide Metabolism in Relation to Photosynthesis
A significant enrichment of the photosynthesis-haptoglobin synthesis pathway was found in a two-by-two comparison of KEGG data from root, stem, and leaf transcriptomes. This belongs to the metabolic pathway of photosynthesis. LHC family genes are mainly involved in the capture and transfer of light energy, with different expression patterns in organs such as roots, stems, and leaves, with the highest expression in leaves. Other studies suggest that LHC family genes may play a key role in drought stress tolerance [29–30]. LHC genes were significantly up-regulated in leaves and stems compared to roots, and significantly down-regulated in roots compared to leaves and stems, with Lhca5 being an uncommon gene (Fig. 14).
Significant differences in metabolic pathways of nucleotides were found in KEGG enrichment analysis of metabolomics across combinations. Nucleotides are important biomolecules in organisms and consist of a five-carbon sugar, a phosphate group and a nitrogen base. It has important structural properties and biological functions in living organisms [31]. ATP is the most important energy molecule in the cell, releasing energy through water interpretation of phosphate bonds, providing support for plant growth and photosynthetic processes [32]. Nucleotides accumulate metabolites significantly in leaves, which can lead to an increase in chlorophyll a, b and carotenoids in leaves, and also promotes an increase in chlorophyll a/b values, which improves the photosynthetic efficiency of leaves. Consistent with the results of enrichment analysis of metabolomics.