Temporal expression of polysaccharide biosynthesis genes in the leaves of Cyclocarya paliurus at different developmental stages

Background: Cyclocarya paliurus (Batal.) Iljinskaja is a common endemic tree species. The leaves of C. paliurus are used as a Chinese medicine and the main active components are polysaccharides. However, the temporal pattern of polysaccharide synthesis at different leaf developmental stages has not been reported to date. Results: With the development of leaves, the content of polysaccharides increased first and the highest content was found at the F3 stage (the third larger full expanded leaf). A total of 499710194 clean reads were obtained using C. paliurus genomic data and were assembled into 296593 unigenes. Among 4708 identified DEGs, 429 DEGs were up-regulated and 451 DEGs were down-regulated from F1 stage (the smallest full expanded leaf) to F2 stage (the second larger full expanded leaf), 630 DEGs were up-regulated and 60 DEGs were down-regulated from F2 stage to F3 stage, and 1833 up-regulated and 1816 down-regulated DEGs from F3 stage to F4 stage. Forty DEGs associated with GT belong to 13 GT families. Among them, only one gene was down-regulated from F1 stage to F2 stage, two genes were down-regulated from F2 to F3 stages, and 23 genes were down-regulated and 15 genes were up-regulated from F3 stage to F4 stage, respectively. A significant correlation exists between the five unigenes and the polysaccharide content. UDP-glucose 4-epimerase gene was significantly positively correlated with the polysaccharide content. A pathway map for the biosynthesis of C. paliurus polysaccharide was proposed. Among 150 transcription factors identified from DEGs, the majority was members of the AP2/ERF family (21, 14%), followed by the C2H2 family (14, 9.33%), the MYB family (12, 8%), the C2C2-GATA family (10, 6.67%), the GRAS family (9, 6%), and the zf-HD family (7, 4.67%). Conclusions: These results identified genes involved in the biosynthesis of Cyclocarya paliurus polysaccharides Two expression the polysaccharide content. UDP-glucose 4-epimerase gene was significantly positively correlated with the polysaccharide content. A pathway map for the biosynthesis of C. paliurus polysaccharide was proposed. A total of 150 transcription factors from the DEGs were identified via coexpression analysis of transcription factors and structural genes and grouped into 38 transcription factor families. This study provides information for the screening of polysaccharide biosynthesis related genes and identifies the mechanism underlying polysaccharide biosynthesis in C. paliurus

To compare the changes of polysaccharide content at different leaf development stages, the polysaccharide contents of leaves at four different developmental stages (F1, F2, F3, and F4) are shown in Fig. 2. The polysaccharide content differed significantly among the four different leaf stages.
With the development of leaves, the content of polysaccharides increased first and the highest content was found at the F3 stage (16.373 mg/g), which was followed by a decrease. The content of polysaccharides in the leaves at the F4 stage was slightly lower than that at the F3 stage (no significant difference).

Sequencing and assembly
A total of 703862064 entries of sequencing data, including 541694618 raw reads and 499710194 clean reads were obtained. The average Q20 and Q30 values were 98.69% and 95.23%, respectively.
The average GC content was 53.08% (Table 1). A total of 574747 transcripts were generated, where the shortest transcript was 201 bp and the longest transcript was 76464 bp. The total length of the transcript was 289576116 bp, and the average length was 503.83 bp. The N50 and N90 were 2269 and 240 bp, respectively. A total of 296593 unigenes were assembled, where the longest transcript and the shortest transcript had the same length. 61444 (20.72%) of the unigenes were ≥ 500 bp and 19068 (6.43%) were ≥ 1000 bp. The total length of the unigenes was 127566956 bp, and their average length was 430.11 bp (see Table 2).
The GO database was used to annotate unigenes and classify their possible functions. A total of 145508 (49.06%) unigenes were assigned to 71 functional groups. Among these, a total of 22 functional groups were associated with "cell components", 22 functional groups were associated with "molecular function", and 27 functional groups were associated with "biological processes". The "binding", "catalytic activity", and "transporter activity" were the largest subgroups of molecular functional classifications. The unigenes associated with cellular processes, metabolic processes, and single-organism processes had a higher proportion in the biological process category. In the cellular component category, the proportion of unigenes associated with "cell", "cell parts", and "organelles" were high (Fig. 3).

Differentially expressed gene analysis
A total of 880 DEGs were identified in leaves between F1 and F2 stage and 429 up-regulated and 451 down-regulated DEGs in F2 stage. A total of 690 DEGs were identified in leaves between F2 and F3 stages and 630 up-regulated and 60 down-regulated DEGs in F3 stage. A total of 3649 DEGs were identified in leaves between F3 and F4 stages and 1833 up-regulated and 1816 down-regulated DEGs in F4 stage (Fig. 5a). A Venn plot shows the overlap of DEGs between different groups. Many unigenes were differentially expressed at only one or two adjacent stages, and several unigenes were differentially expressed among the three groups (Fig. 5b, 5c).
DEGs were screened and their GO distribution was studied to elucidate functional differences between leaf samples at different developmental stages (Fig. 6). The results showed that the degrees of enrichment of "cell" and "cell part" were the highest between F1 and F2, both of which included down-regulation and up-regulation, and belonged to the cell composition category. Other highly enriched DEGs were associated with "metabolic processes" and "cell processes". The DEGs of "organelles", "cells", and "cell parts" were up-regulated between F2 and F3, all of which belonged to cellular components. Other significantly up-regulated DEGs included "cell processes" and "metabolic processes", and down-regulated DEGs included "binding" and "catalytic activity". The DEGs of "cell part", "cell", and "organelle" were up-regulated between F3 and F4, and the DEGs of "cell process" and "metabolic process" were down-regulated. Other highly enriched unigenes were associated with "binding" and "catalytic activity".
All enrichment pathways between DEGs and annotated unigenes were found via KEGG enrichment analysis based on the KEGG pathway database (Fig. 7). The most significant enrichment pathway between F1 and F2 was the "fatty acid degradation", which involved up-regulation of DEGs, followed by a second significantly enriched pathway "monoterpene biosynthesis", which also included upregulation of DEGs. "Ribosome" and "spliceosome" were the most significantly enriched pathways between F2 and F3 stages, both of which were down-regulated, and the number of involved DEGs was maximal. Many unigenes were differentially expressed between F3 stage and F4 stage (Fig. 7), in which the "glyoxylate and dicarboxylate metabolism" and "glycine, serine, and threonine metabolism" pathways were significantly enriched. Although most DEGs were down-regulated in "carbon metabolism", this enrichment was not significant.

DEG related to glycosyltransferases genes
Forty DEGs associated with GT belong to 13 GT families. Only one gene was down-regulated between F1 stage and F2 stage, and two genes were down-regulated between F2 stage and F3 stage. In contrast, 23 genes were down-regulated and 15 genes were up-regulated between F3 stage and F4 stage, respectively (Table 4). In the GT1 family, seven DEGs were up-regulated and eight DEGs were down-regulated, while in the GT2 family, one and three DEGs were up-and down-regulated, respectively (Table 5).
Furthermore, a correlation analysis was conducted between the gene expression of the polysaccharide synthesis pathway and the polysaccharide content (Table 6). One gene was significantly positively correlated to polysaccharide content, which encodes UDP-glucose 4-epimerase, while six genes were significantly negatively correlated to polysaccharide content.

Putative pathway for C. paliurus polysaccharide
Based on the above transcriptome analysis results and amino sugar and nucleotide sugar metabolism KEGG pathways, a pathway map for the biosynthesis of C. paliurus polysaccharide was proposed ( Fig.   9). In this pathway, the synthesis of C. paliurus polysaccharide is mainly divided into three steps: first, formation of UDP-Glucose and fructose; second, synthesis of multiple NDP-sugars; third, formation of polysaccharide under the action of different GTs. Each enzyme in the pathway was associated with several unigenes (Table S3), indicating that each enzyme in the pathway was associated with multiple coding genes. The most single genes (12 genes) were annotated as UDP-glucuronate decarboxylase  (Table S4). The majority of these transcription factors encoded by the DEGs were members of the AP2/ERF family (21, 14%), followed by the C2H2 family (14,9.33%), the MYB family (12, 8%), the C2C2-GATA family (10, 6.67%), the GRAS family (9, 6%), and the zf-HD family (7, 4.67%). Most of these transcription factors were either positively or negatively correlated with the enzyme gene, while 19 transcription factors did not correlate to the enzyme gene, including the OFP family and the C2C2-YABBY family.

qRT-PCR analysis
The abundance of gene transcripts of enzymes related to C. paliurus polysaccharide biosynthesis were determined via qRT-PCR analysis. Based on the difference in expression levels of genes at four different developmental stages, 10 genes were screened for qRT-PCR (Fig. 10). All of these 10 selected genes had similar expression patterns to RNA-seq data. Therefore, the data obtained in this study can be used as a tool to study the biosynthesis and metabolism-related genes of C. paliurus.

Discussion
Polysaccharides are the most important active components of C. paliurus leaves. This study found that the concentration of polysaccharides in old C. paliurus leaves (F3 and F4 stages) were significantly higher than those in young leaves (F1 and F2 stages). Robakidze and Bobkova (2003) reported that the water-soluble polysaccharide content was higher in older needles of Picea obovata [24]. Wang et al. (2001) reported that the water-soluble polysaccharide content in sixth-class tea leaves (fully developed leaves) was twice as high as that of first-class tea leaves (young leaves), which was consistent with the changes of the polysaccharide content in C. paliurus [25]. Compared to stage F3, the polysaccharide content in the leaves of C. paliurus at the F4 stage were slightly lower, which may be related to the plant entering the flowering stage, where flower bud differentiation led to the consumption of leaf nutrition [26].
A total of 4708 DEGs were identified, of which 429 were up-regulated and 451 were down-regulated unigenes between F1 and F2 stages, while 630 DEGs were up-regulated and 60 DEGs were downregulated between F2 and F3 stages. Notably, a total of 3649 DEGs (1833 up-regulated and 1816 down-regulated) were identified between F3 and F4 stages, which was the highest number of DEGs at all stages, indicating that these unigenes may play a key role in leaf development. In the results of GO classification annotation, most of the DEGs between the four leaf-developmental stages were annotated to "cells", "cell parts", and "organelles". In the present study, a total of 41 glycosyltransferases showed differences at four leaf-development stages, and the largest number of DEGs was found between F3 and F4 stages, including 23 downregulated genes and 15 up-regulated genes (Table 4) (Table S4). Zhou et al. (2009) reported that MYB58 and MYB63 are transcriptional regulators that specifically activate lignin biosynthetic genes during secondary wall formation in Arabidopsis thaliana [41]. Zhong and Ye (2009) reported that the eucalyptus transcription factor EgMYB2 and the pine transcription factor PtMYB4 are likely Arabidopsis MYB46 orthologs that are involved in the regulation of the entire secondary wall biosynthetic program [42]. The maize ZmMYB42 R2R3-MYB factor is involved in the phenylpropanoid pathway and cell wall structure and composition. Furthermore, this transcription factor also affects the cell wall structure and degradability, and its polysaccharide composition [43]. In the present study, the expression of 12 MYB transcription factors was closely related to the expression of polysaccharide biosynthesis genes, and the positively related transcription factors may promote the biosynthesis of polysaccharides. Among them, only one gene was down-regulated from F1 stage to F2 stage, two genes were downregulated from F2 to F3 stages, and 23 genes were down-regulated and 15 genes were up-regulated from F3 stage to F4 stage, respectively. A significant correlation exists between the five unigenes and the polysaccharide content. UDP-glucose 4-epimerase gene was significantly positively correlated with the polysaccharide content. A pathway map for the biosynthesis of C. paliurus polysaccharide was proposed. A total of 150 transcription factors from the DEGs were identified via coexpression analysis of transcription factors and structural genes and grouped into 38 transcription factor families.
This study provides information for the screening of polysaccharide biosynthesis related genes and identifies the mechanism underlying polysaccharide biosynthesis in C. paliurus.

Plant materials
C. paliurus leaves at four different developmental stages (Fig. 1) [47]. The F1 stage was the smallest fully expanded leaf, while the F4 stage was the largest full leaf enlargement and full leaf thickness. F2 and F3 refer to the second and third developmental stages between the smallest and largest leaves. Each sample was collected from three plants to obtain three biological replicates for RNA-seq analyses. All samples were immediately frozen in liquid nitrogen after collection and stored at −80 °C until further analysis.

Extraction of polysaccharides and content determination
The leaves were dried at 50 °C for 96 h until a constant weight was obtained. The leaves were ground into a fine powder and stored at room temperature prior to analysis. The polysaccharide of C. paliurus leaves was extracted according to a previously reported method [48] with modification. 2 g leaf powder was extracted with 80 ml of 80% ethanol at 90 °C for 60 min to remove most pigments, small molecular sugars, and impurities. Insoluble residues were dried and then extracted twice with 80 ml distilled water at 90 °C for 60 min. The extracts were filtered and the filtrate was obtained after centrifugation at 4500 × g for 15 min.
The Water-soluble polysaccharide content was measured using the phenol-sulphuric acid colorimetric method [49]. The absorbance was measured at 490 nm. The concentration of the water-soluble polysaccharide was quantitatively determined via the calibration curve and glucose was used as standard.

RNA isolation, cDNA library preparation, and sequencing
The total RNA was extracted according to the manufacturer's protocol, using the Total RNA Extractor

Transcriptome assembly and gene annotation
The raw data or raw reads of high-throughput sequencing, including gene ID and sequence, were samples. Genes were considered as significantly differentially expressed at a q-value < 0.001 and |FoldChange| > 2. When the normalized expression of a gene was zero between two samples, its expression value was adjusted to 0.01 (since 0 cannot be plotted on a log plot). If the normalized expression of a specific gene in two libraries was lower than 1, further differential expression analysis was conducted without this gene.
Functional enrichment analyses including GO and KEGG were performed to identify which DEGs were significantly enriched in GO terms or metabolic pathways. DEGs were mapped to the GO terms (biological functions) in the database, the number of genes was calculated in every term, and a hypergeometric test was performed to identify significantly enriched GO terms within the gene list out of the background of the reference gene list. KEGG pathway analysis identified significantly enriched metabolic pathways or signal transduction pathways enriched in DEGs in comparison to a reference gene background, using the hypergeometric test. GO terms and KEGG pathway with false discovery rate (q-value) < 0.05 were considered as significantly altered.
Ten glycosyltransferase genes that are involved in polysaccharide biosynthesis were selected for quantitative real-time PCR (qRT-PCR) verification. The reverse transcriptase reaction of DNase-treated RNA was conducted using HiScript II reverse transcriptase according to the manufacturer's instructions (Vazyme, Nanjing, China). Specific primers for GT genes were designed using Primer Premier 5.0 (Suppl. Table S1). The constitutively expressed gene (the β-Actin-1 gene) was used as internal control [51]. qRT-PCR was performed using CFX Connect (BioRAD, USA). The relative expression of genes was calculated by the 2 −∆∆Ct method. All qRT-PCRs were performed in three biological and three technical replicates.

Correlation analysis of polysaccharide biosynthesis related genes and transcription factors
Important genes related to polysaccharide biosynthesis and transcription factors that co-expressed with these genes were identified via correlation analysis. Unigenes with a TPM value ≥ 5 in at least one of the four stages of leaf development were selected to exclude false positives. Co-expression analysis was conducted using the "CORREL" function in Microsoft Excel 2010 and the results were independently verified using the SPSS 19.0 statistics software (IBM Corp, Armonk, NY, USA). The correlation was considered significant when the correlation coefficient values exceeded 0.95.

Statistical analysis of the data
The data are shown as means ± standard deviation (SD). One-way analysis of variance (ANOVA) was used to evaluate the significance at the 0.05 level.    Additional Files Table S1 qRT-PCR primers used in the study Table S2 The category and number of GTs family in Cyclocarya paliurus    KEGG pathway categories of Cyclocarya paliurus unigenes. The results were summarized as four main categories: metabolism, genetic information processing, cellular processes, and environmental information processing.

Figure 9
Putative biosynthesis pathway map and unigene expression pattern of Cyclocarya paliurus polysaccharide. The pathway of C. paliurus polysaccharide biosynthesis was constructed based on KEGG analysis. The full names of enzymes (with EC number) are provided in Table   S3. The average expression level of the enzyme-encoding unigenes at different leaf stages is indicated with a heat map.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.
Supplement materials.pdf