De novo Assembly and Transcriptome Analysis of Polygonatum odoratum (Mill.) Druce : Detection of Potential Genes Involved in Polysaccharide Synthesis

Background: Polygonatum odoratum (Mill.) Druceis a well-known traditional Chinese herb. Polysaccharide is one of the main bioactive components from Polygonatum odoratum, with broad pharmacological effects, including improving immunity, and is used in the treatment of rheumatic heart disease, cardiovascular disease, and diabetes. Results: This study identifies potential genes and transcription factors (TFs) that regulate polysaccharide synthesis in Polygonatum odoratum by using RNA sequencing data from leaf, stem and rhizome tissues. A total of 112,443 unigenes were de novo assembled, and 76,714 were annotated in public databases. Differentially expressed gene analysis showed that most upregulated and uniquely expressed unigenes were enriched in rhizome tissue compared with leaf or stem tissue. UV-spectrophotometry results showed that polysaccharide content was the greatest in the rhizome tissue. Additionally, 2,865 unigenes relevant to TF families were predicted, including 73 involved in polysaccharide synthesis. A few key enzyme genes were also verified by quantitative real-time PCR (qRT-PCR). Seven β-fructofuranosidases with different amino acid sequences showed similar spatial structures and all had well-conserved catalytic triads. Conclusion: This study substantially enlarges the public transcriptome datasets of this species, and provides insight into the detection of novel genes involed polysaccharide and other secondary metabolite synthesis. results quality of assembly and the reliability of our transcriptome database. Analysis of the DEGs of KEGG pathways indicated that 38,085 DEGs were mapped to metabolic pathways in rhizome versus leaf tissue and rhizome versus stem tissue, which are mainly enriched in “starch and sucrose metabolism”, “plant hormone signal transduction”, and “phenylpropanoid biosynthesis”. Of which, 276 DEGs were up-regulated in starch and sucrose metabolism in rhizome versus leaf tissue, while 123 DEGs were upregulated in starch and sucrose metabolism in rhizome versus stem tissue. These results support the use of P. odoratum rhizomes as a Chinese medicinal material at the genetic level. KEGG annotations revealed 18 key enzymes responsible for polysaccharide biosynthesis, including sacA, HK, scrK, MPI, PMM, GMPP, GMDS, TSTA3, GPI, pgm, UGP2, GALE, UGDH, AXS, UXE, UGE, RHM, and UER1. The unigenes that encode sacA, UGE, and UGP2 enzymes showed higher expression in rhizome than leaf or stem tissue.

several NDP-sugar interconversion enzymes (NSEs) catalyze the conversion of either UDP-Glc or GDP-Man to other NDP sugars [18]. Finally, the NDP-sugars are used in polysaccharide biosynthesis through various glycosyltransferases (GTs) [19,20]. To date, there have been no comprehensive reports on the genes involved in polysaccharide metabolism in P. odoratum, and no reports exist on their expression patterns in different tissues.
RNA-sequencing can provide comprehensive information for gene expression to improve our understanding of gene regulation and metabolic pathways [21]. Some  identified. However, genomic information for P. odoratum has not yet been reported, which hinders further research on this species.
In this study, we performed deep de novo transcriptome analysis on different tissues of P. odoratum using RNA sequencing. The numbers of potential genes that participate in polysaccharide pathways are identified in our transcriptome results. The transcriptome data from P. odoratum will be an important resource to investigate polysaccharide biosynthesis and other metabolic pathways in plants.

Plant material and RNA extraction.
Whole Polygonatum odoratum plants (identified by Professor Qingshan Yang, Anhui University of Chinese Medicine) were harvested from the Anhui University of Chinese Medicine herb garden with permission of managers and Professional. Fresh plants was washed with sterile distilled water several times and wiped clean with filter paper.
The tissues (leaf, rhizome and stem) were separately collected and placed in a 50 mL 5 centrifuge tube, and quickly frozen in liquid nitrogen , and then stored in a refrigerator at −80 ° C for RNA extraction. The stems, leaves and rhizomes selected from three replicates were pooled together. Total RNA from the plants was extracted using the RNA Plant Kit (Aidlab Biotech, Beijing, China) according to the manufacturer's protocol.

Determination of total polysaccharides content
The dried P. odoratum samples (rhizomes, stems and leaves) were used to detect the content of polysaccharides with the Phenol-sulfuric acid method [27]. Dried powders (0.2 g) from each sample were mixed with 100mL distilled water and then extracted twice at 85 •C (1 h each time) in boiling distilled water. After that, precipitate after adding 95% ethanol (10 mL) and centrifuging was collected and dissolved with distilled water (50 mL), and 4% phenol (1 mL) and sulfuric acid (7 mL) were added for determine the absorbance.
The absorbance of three tissues were measured in 490nm using UV-spectrophotometry (JASCO company, Japan). The standard curve of the relationship between the concentration and the absorbance was drawed through a standard of anhydrous glucose (Supplementary Image S1). The yield (%) of total polysaccharides was calculated using the equation: cDNA library construction and sequencing (mRNA-Seq).
Following the total RNA extraction, The extracted RNA was checked using a NanoDrop 2000 (Thermo, CA, USA), and the RNA quality was evaluated using an Agilent 2100 BioAnalyzer to ensure that the RNA Integrity Number (RIN) values were above 7.0. mRNA was enriched using Oligo (dT). mRNA was split into small fragments using a fragmentation buffer, and cDNA synthesis was performed with random primers. PCR amplification was prepared for library construction according to manufacturer' instructions. In detail, each mRNA samples were then used for first-strand cDNA synthesis using reverse transcriptase and random hexamer primers, while second-strand cDNA was synthesized with RNase H Transcriptome de novo assembly and unigene functional annotation.
After sequencing, raw data were received, and low quality reads (above 50% of bases with Q-value 20) were removed to obtain clean reads [29]. De novo assembly of the transcriptome was accomplished from clean reads with Trinity v2.0. 6 [30]. The contigs were connected using Trinity, and obtain sequences termed as unigenes. The Unigenes were divided into two classes, including clusters and singletons. The prefix of clusters was CL with the cluster id behind it, The prefix of singletons was unigene.
The unigenes were used for BLAST and annotation against protein databases, including Nt (NCBI nucleotide sequences), Nr (NCBI non-redundant protein sequences), KOG (Clusters of euKaryotic Orthologous Groups), KEGG (Kyoto Encyclopedia of Genes and Genome), SwissProt (A manually annotated and reviewed protein sequence database). With nr annotation, the GO(Gene Ontology) annotation of unigenes was obtained using the Blast2GO program [31]. With the help of the KEGG database, we can further excavate genes' biological complex behaviors, and pathway annotation for unigenes were obtained by KEGG [32].

Analysis of differentially expressed genes(DEGs).
Each unigene was normalized into FPKM (Fragments Per Kb per Million fragments) values.
The unigenes abundance was calculated by the ratio of FPKM values (|log2Ratio|). To analyze the differentially expressed genes (DEGs), clean reads to assembled unigenes were mapped using Bowtie2(version 2.2.5). DEGs were identified by PossionDis method, which was performed as described: The parameters of PossionDis was set as |Fold Change |>2.0-fold (|log 2 FC| >1) with a false discovery rate (FDR)<0.001 [33]. For functional annotation clustering of DEGs, GO analysis was performed using GO-Term Finder software.
DEGs were mapped to the terms of KEGG database for the pathway enrichment analysis.

Identification of transcriptome factor (TF)
TF, short for Transcription Factor, is a protein that binds to specific DNA sequences, thereby controlling the rate of transcription of genetic information from DNA to messenger RNA. To analyze TFs of P. odoratum, we tested the ORFs of unigenes and then aligned the ORF to the transcription factor protein domain using hmmsearch. Unigenes were annotated by using PlantTFDB(Plant transcription factor database) [34]. The unigenes encoding the transcription factors were identified by comparison with Pfam23.0 using the hmmsearch program [35].

qRT-PCR Analysis of key genes in polysaccharide biosynthesis.
The qRT-PCR was conducted on a Real-time Thermal Cycler 5100 PCR System using GoTaq 8 qPCR Master Mix PCR kit Promega . Specific primers were designed using Primer Premier 5.0 (Additional files 12 : Table 4). Total RNA was extracted from the leaves, stems, and rhizomes, and reverse-transcribed to cDNA with TransScript All-in-One cDNA supermix for qPCR (TransGen), following the manufacturer's instructions. All PCR conditions were performed as follows: 95 °C for 2 min, 40 cycles of 95°C for 15 s and 60°C for 1 min, and all qRT-PCR were repeated in three biological and three technical replications.
Additionally, the actin gene (CL2254.Contig1) of P. odoratum was used as a reference. The relative expression levels of the selected genes were determined using the equation 2ˆ(-ΔΔCt) method [36].

Analysis of the structure characteristics of sacA.
Total of 7 complete amino acid sequences were selected from 20 unigenes encode sacA (β-fructofuranosidase). The alignment of 7 sacA amino acid sequences was performed using DNAMAN software. 3D structures model of sacA proteins was analyzed using the SWISS-MODEL (https://swissmodel.expasy.org/) and PyMOL software [37].

Total polysaccharides content in P. odoratum samples
Total polysaccharide was extracted from the dried rhizomes, stems and leaves of P.
odoratum. Polysaccharide content was the highest in the rhizomes (3.11%) and the lowest in the leaves (1.15%) (Additional files 1 : Figure 1).

RNA-seq and de novo transcriptome assembly.
The number of raw reads for P. odoratum for the samples of leaf, stem and rhizome tissue was 120,415,666, 117,871,048 and 120,302,266, respectively; these were generated using a BGISEQ-500 high-throughput sequencing platform. The Q30 of each sample was greater than 89.33%, and the GC content of each sample was approximately 43.57%. After thorough quality control and filtering, transcriptomes were generated and a total of 112,443 unigenes were assembled and clustered using Trinity software. The average length of unigene was 1240 bp and N50 was 1937 bp. Of these unigenes, 47.35% (53,237) were longer than 1000 bp, while 66.81% (75,126) were longer than 500 bp (Additional files 2 : Figure 2). The quality of the assembled transcripts was assessed using a single copy orthologous database (BUSCO), and 97% of the unigenes were perfectly matched in the BUSCO database (Additional files 3 : Figure 3).

Overview of expression
In each tissue, the expression value of all transcripts (FPKM > 1) were counted, and a total of 41,811, 44,472 and 46,349 unigenes were expressed in leaf, rhizome and stem tissues, respectively ( Fig. 2A). The overall level of gene expression was higher in rhizome tissue than in leaf and stem tissues (Fig. 2B).
Validation and analysis of differentially expressed genes (DEGs) and specifc expressed genes in P. odoratum tissues.
In P. odoratum tissues, 78,288 shared unigenes were identified, of which 3,760 unigenes were found to be involved in polysaccharide biosynthesis. Based on the FPKM values of all unigenes, 4,374, 3,343, and 3,109 unigenes showed unique expression in leaf, rhizome and stem tissues, respectively (Fig. 5A). Among these unique expression unigenes, 128, 82, and 79 unigenes are involved in carbohydrate metabolism, respectively (Additional files 6 : Figure 6).  (Additional files 7 : Figure 7). Whereas comparison of the rhizome tissue with stem and leaf tissues showed that 16,655 DEGs, including 7,647 coupregulated unigenes and 7,797 co-downregulated ones in rhizome tissue compared to stem and leaf tissues (Fig. 5B). Among the up-regulated and down-regulated genes, we focused more on those up-regulated unigenes involved in the pathway of biosynthesis. 7610 specifc up-regulated unigenes were identified in rhizome tissue compared to stem 13 and leaf tissues with log2 (fold changes) > 1, and these unigenes were further estimated by KEGG classification and GO enrichment. Among these up-regulated unigenes, 394 unigenes involved in carbohydrate metabolism were identified by KEGG classification, while 42 unigenes were enriched to DNA binding transcription factor activity by GO enrichment (Fig. 5C,D).

Validation of key enzyme genes using qRT-PCR
UGE, UGP2, GMPP and sacA genes were tested for their expression level using qRT-PCR assays. The relative expression level of UGE, UGP2 and sacA was obtained for rhizome, stem and leaf tissues, and rhizome tissue showed higher expression than stem or leaf tissues. The relative expression of the GMPP gene was greater in leaf tissue than stem or rhizome tissue, which is consistent with our transcriptional data (Fig. 7).

The structural characteristics of sacA involved in polysaccharide biosynthesis.
15 β-fructofuranosidase (sacA) is the first key enzyme in the polysaccharide synthesis pathway, which catalyzes the conversion of sucrose to glucose 6-phosphate (Glc-6P) and Fructose . The alignment of 7 sacA amino acid sequences revealed that the sequence identity is not high (59.57%), but the 7 sacAs showed similar spatial structures, and all had three well-conserved motifs. A 3D structural model of sacA (CL7969. Contig2) was constructed based on the crystal structure of 6-SST/6-SFT from Pachysandra terminalis (PDB ID: 3UGF) using SWISS-MODEL (https://swissmodel.expasy.org/) and PyMOL software. The spatial structure model consisted of one β-propeller domain and one β-sheet domain, which are connected by an α-helix. The β-propeller domain has five radially oriented blades (Fig. 8A, blades I-V, colored in orange, cyan, magenta, green, and yellow, respectively), each with one α-helix at the N-terminus and C-terminus. The β-sheet domain consists of two six-stranded antiparallel beta-sheets (Fig. 8A, colored in blue), forming a sandwich-like fold. A well-conserved catalytic triad is located in the deep central pocket of the beta-propeller domain (D95, D224 and E282) (Fig. 8C).

Discussion
Although polysaccharides are the major bioactive components of P. odoratum, with significant medicinal value, the molecular mechanisms behind their synthesis are still unknown. To further identify the genes encoding key enzymes and TFs modulating polysaccharide synthesis, we constructed a comprehensive genomic library of P. odoratum leaf, stem and rhizome tissues. This is the first report about transcriptome study of P. odoratum, providing adequate references to study on other plants with close relationship to P. odoratum. A total of 112,443 unigenes were assembled in our datasets, with an N50 of 1937 bp. 47.35% unigenes (53,237) were longer than 1000 bp, while 66.81% unigenes (75,126) were longer than 500 bp. Compared with transcript datasets reported for other medicinal plants, P. odoratum has more unigenes than Polygonatum sibiricum (76,717) [38] and Platycodon grandiflorum (34,053) [39], and the average unigene length (1937 bp) is longer than those from Polygonatum sibiricum (900 bp), Pinellia ternata (750 bp) [40], and Platycodon grandiflorum (1102 bp). These results strongly indicated a high quality of assembly and the reliability of our transcriptome database. The seven sacAs with different amino acid sequences (identity 59.57%) showed similar spatial structures and all had well-conserved motifs. The active site of sacA is located in the deep central pocket of the β-propeller domain. A catalytic triad was identified as D95, D224 and E282 (part of the FMSDPSG motif, FRDP motif and WECTD motif, respectively) by superposition and comparison with the active site of a plant fructan biosynthesis enzyme from Pachysandra terminalis [44] (Fig. 8B, shown as red spheres). These results indicate that the gene cluster of sacA may play roles in regulating polysaccharide biosynthesis in different tissues of P. odoratum rhizomes. The characterization of these unigenes can be used to gain deeper knowledge on the molecular mechanisms of polysaccharide biosynthesis. In the future research, we will identify the function of sacA and other candidate genes.
By analysis of TFs, 2,865 candidate genes were predicted using the PlantTFDB database,  [45], and C2H2 zinc finger transcription factors play an important role in plant tolerance to various environmental stresses such as drought, cold, osmotic stress, wounding, and mechanical loading [46,47]. A total of 343 genes related to the MYB family and 76 genes related to the C2H2 family were examined in our transcriptional data. These results strongly indicated that these genes play critical roles in regulating the polysaccharide content. This is the first study of large-scale transcriptome sequencing and analysis in P. odoratum, and these data provide a comprehensive genetic resource that will enable improvements in understanding how polysaccharide biosynthesis and accumulation are regulated at the 18 molecular level.

Conclusions
In this study, the high-quality transcriptome sequencing data of leaf, root, and rhizome tissues of P. odoratum were obtained and the results of functional annotation, and expression profile were presented. Particularly, 18 key enzyme involved in the biosynthesis of polysaccharide were identified, and differentially expressed genes were analyzed. A few key enzyme genes validated using qRT-PCR were well in accordance with the expression data obtained by RNA-Seq. These studies will not only address the molecular mechanisms of polysaccharide biosynthesis in P. odoratum, but also provide valuable information for our complete understanding of the biosynthesis pathway on this species.  Tables   Table 1 Summary statistics of annotations Figure 1 Unigene functional annotation.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download.