Changes of polysaccharide contents at different leaf stages
To compare the changes of polysaccharide content at different leaf development stages, the polysaccharide contents of leaves at four different developmental stages (F1, F2, F3, and F4) are shown in Fig. 2. The polysaccharide content differed significantly among the four different leaf stages. With the development of leaves, the content of polysaccharides increased first and the highest content was found at the F3 stage (16.373 mg/g), which was followed by a decrease. The content of polysaccharides in the leaves at the F4 stage was slightly lower than that at the F3 stage (no significant difference).
Sequencing and assembly
A total of 703862064 entries of sequencing data, including 541694618 raw reads and 499710194 clean reads were obtained. The average Q20 and Q30 values were 98.69% and 95.23%, respectively. The average GC content was 53.08% (Table 1). A total of 574747 transcripts were generated, where the shortest transcript was 201 bp and the longest transcript was 76464 bp. The total length of the transcript was 289576116 bp, and the average length was 503.83 bp. The N50 and N90 were 2269 and 240 bp, respectively. A total of 296593 unigenes were assembled, where the longest transcript and the shortest transcript had the same length. 61444 (20.72%) of the unigenes were ≥ 500 bp and 19068 (6.43%) were ≥ 1000 bp. The total length of the unigenes was 127566956 bp, and their average length was 430.11 bp (see Table 2).
Functional annotation and classification
Of these unigenes, 173809 unigenes were annotated to the NR database, accounting for 58.60% of the total. A total of 118930 (40.10%) unigenes were annotated to the Swissprot database. The number of unigenes annotated to NT, PFAM, GO, KOG, CDD, and TrEMBL databases were 104770 (35.32%), 57649 (19.44%), 145508 (49.06%), 64021 (21.59%), 89954 (30.33%), and 118930 (40.10%), respectively, and 191829 (64.68%) sequences were annotated to at least one database (Table 3).
The GO database was used to annotate unigenes and classify their possible functions. A total of 145508 (49.06%) unigenes were assigned to 71 functional groups. Among these, a total of 22 functional groups were associated with “cell components”, 22 functional groups were associated with “molecular function”, and 27 functional groups were associated with “biological processes”. The “binding”, “catalytic activity”, and “transporter activity” were the largest subgroups of molecular functional classifications. The unigenes associated with cellular processes, metabolic processes, and single-organism processes had a higher proportion in the biological process category. In the cellular component category, the proportion of unigenes associated with “cell”, “cell parts”, and “organelles” were high (Fig. 3).
Analysis based on the KEGG pathway provides hypothetical functionality for unigenes. A total of 14997 unigenes were assigned to the following four KEGG biochemical pathways: “metabolism” (7884, 52.57%), “genetic information processing” (4074, 27.17%), “cellular processes” (1588, 10.59%), and “environmental information processing” (1451, 9.67%) (Fig. 4). Among these, unigenes were divided into 23 subcategories in four biochemical pathways. The main subcategories were “translation” (12.92%, 1937), followed by “signal transduction” (9.21%, 1381), "carbohydrate metabolism” (1339, 8.93%), “folding, sorting and degradation” (1133, 7.55%), and “overview” (1084, 7.23%).
Differentially expressed gene analysis
A total of 880 DEGs were identified in leaves between F1 and F2 stage and 429 up-regulated and 451 down-regulated DEGs in F2 stage. A total of 690 DEGs were identified in leaves between F2 and F3 stages and 630 up-regulated and 60 down-regulated DEGs in F3 stage. A total of 3649 DEGs were identified in leaves between F3 and F4 stages and 1833 up-regulated and 1816 down-regulated DEGs in F4 stage (Fig. 5a). A Venn plot shows the overlap of DEGs between different groups. Many unigenes were differentially expressed at only one or two adjacent stages, and several unigenes were differentially expressed among the three groups (Fig. 5b, 5c).
DEGs were screened and their GO distribution was studied to elucidate functional differences between leaf samples at different developmental stages (Fig. 6). The results showed that the degrees of enrichment of “cell” and “cell part” were the highest between F1 and F2, both of which included down-regulation and up-regulation, and belonged to the cell composition category. Other highly enriched DEGs were associated with “metabolic processes” and “cell processes”. The DEGs of “organelles”, “cells”, and “cell parts” were up-regulated between F2 and F3, all of which belonged to cellular components. Other significantly up-regulated DEGs included “cell processes” and “metabolic processes”, and down-regulated DEGs included “binding” and “catalytic activity”. The DEGs of “cell part”, “cell”, and “organelle” were up-regulated between F3 and F4, and the DEGs of “cell process” and “metabolic process” were down-regulated. Other highly enriched unigenes were associated with “binding” and “catalytic activity”.
All enrichment pathways between DEGs and annotated unigenes were found via KEGG enrichment analysis based on the KEGG pathway database (Fig. 7). The most significant enrichment pathway between F1 and F2 was the “fatty acid degradation”, which involved up-regulation of DEGs, followed by a second significantly enriched pathway “monoterpene biosynthesis”, which also included up-regulation of DEGs. “Ribosome” and “spliceosome” were the most significantly enriched pathways between F2 and F3 stages, both of which were down-regulated, and the number of involved DEGs was maximal. Many unigenes were differentially expressed between F3 stage and F4 stage (Fig. 7), in which the “glyoxylate and dicarboxylate metabolism” and “glycine, serine, and threonine metabolism” pathways were significantly enriched. Although most DEGs were down-regulated in “carbon metabolism”, this enrichment was not significant.
Identification of glycosyltransferase genes
1333 unigenes related to the carbohydrate metabolism were identified, including 470 glycosyltransferase genes, 556 glycoside hydrolases, 173 carbohydrate esterases, 99 carbohydrate-binding modules, and 35 polysaccharide lyases (Fig. 8).
Glycosyltransferases (EC 2.4.x.y), which are enzymes that synthesize oligosaccharides, polysaccharides, and glycoconjugates, were dominant in unigenes that were associated with the carbohydrate metabolism. The results showed that 470 GTs were divided into 57 GT families (Suppl. Table S2). The families with the largest number of unigenes in the GT family were GT1 (68, 14.47%), and following GT2 (53, 11.28%), and GT47 (41, 8.72%). C. paliurus lacks 49 GTs families, including GT6, GT11, GT12, GT18, and GT23 families.
DEG related to glycosyltransferases genes
Forty DEGs associated with GT belong to 13 GT families. Only one gene was down-regulated between F1 stage and F2 stage, and two genes were down-regulated between F2 stage and F3 stage. In contrast, 23 genes were down-regulated and 15 genes were up-regulated between F3 stage and F4 stage, respectively (Table 4). In the GT1 family, seven DEGs were up-regulated and eight DEGs were down-regulated, while in the GT2 family, one and three DEGs were up- and down-regulated, respectively (Table 5).
Furthermore, a correlation analysis was conducted between the gene expression of the polysaccharide synthesis pathway and the polysaccharide content (Table 6). One gene was significantly positively correlated to polysaccharide content, which encodes UDP-glucose 4-epimerase, while six genes were significantly negatively correlated to polysaccharide content.
Putative pathway for C. paliurus polysaccharide
Based on the above transcriptome analysis results and amino sugar and nucleotide sugar metabolism KEGG pathways, a pathway map for the biosynthesis of C. paliurus polysaccharide was proposed (Fig. 9). In this pathway, the synthesis of C. paliurus polysaccharide is mainly divided into three steps: first, formation of UDP-Glucose and fructose; second, synthesis of multiple NDP-sugars; third, formation of polysaccharide under the action of different GTs. Each enzyme in the pathway was associated with several unigenes (Table S3), indicating that each enzyme in the pathway was associated with multiple coding genes. The most single genes (12 genes) were annotated as UDP-glucuronate decarboxylase (EC: 4.1.1.35), mannose-1-phosphate guanylyltransferase (EC: 2.7.7.13), and fructokinase (EC: 2.7.1.4), while the second largest number of unigenes (eight genes) were annotated as UDP-glucose 4-epimerase (EC: 5.1.3.2), and phosphomannomutase (EC: 5.4.2.8) encoding genes. The third largest number of unigenes (five genes) were annotated as UDP-glucose 6-dehydrogenase (EC: 1.1.1.22), UDP-glucose 4,6-dehydratase (EC: 4.2.1.76), and UDP-glucuronate 4-epimerase (EC: 5.1.3.6) encoding genes. Furthermore, four unigenes were annotated as encoding 3,5-epimerase/4-reductase (EC: 5.1.3.-), 4-reductase (EC: 1.1.1.-), and mannose-6-phosphate isomerase (EC: 5.3.1.8). Two unigenes were annotated as encoding xylan 1,4-beta-xylosidase (EC: 5.1.3.5). The expression levels of enzymes were calculated based on the average TPM values of the unigenes.
Correlation analysis between transcription factors and putative pathways of C. paliurus polysaccharide
Transcription factors play an important role in the regulation of the biosynthesis activity of plant polysaccharides and other secondary metabolic pathways. To identify transcription factors that potentially affect polysaccharide synthesis, correlation analysis was performed with differentially expressed transcription factors and enzyme genes encoding putative polysaccharide biosynthetic pathways. A total of 150 transcription factors from DEGs that were expressed during at least one developmental stage were identified and grouped into 38 transcription factor families (Table S4). The majority of these transcription factors encoded by the DEGs were members of the AP2/ERF family (21, 14%), followed by the C2H2 family (14, 9.33%), the MYB family (12, 8%), the C2C2-GATA family (10, 6.67%), the GRAS family (9, 6%), and the zf-HD family (7, 4.67%). Most of these transcription factors were either positively or negatively correlated with the enzyme gene, while 19 transcription factors did not correlate to the enzyme gene, including the OFP family and the C2C2-YABBY family.
qRT-PCR analysis
The abundance of gene transcripts of enzymes related to C. paliurus polysaccharide biosynthesis were determined via qRT-PCR analysis. Based on the difference in expression levels of genes at four different developmental stages, 10 genes were screened for qRT-PCR (Fig. 10). All of these 10 selected genes had similar expression patterns to RNA-seq data. Therefore, the data obtained in this study can be used as a tool to study the biosynthesis and metabolism-related genes of C. paliurus.