Total polysaccharides content in P. odoratum samples
Total polysaccharide was extracted from the dried rhizomes, stems and leaves of P. odoratum. Polysaccharide content was the highest in the rhizomes (3.11%) and the lowest in the leaves (1.15%) (Additional files 1 : Figure 1).
RNA-seq and de novo transcriptome assembly.
The number of raw reads for P. odoratum for the samples of leaf, stem and rhizome tissue was 120,415,666, 117,871,048 and 120,302,266, respectively; these were generated using a BGISEQ-500 high-throughput sequencing platform. The Q30 of each sample was greater than 89.33%, and the GC content of each sample was approximately 43.57%. After thorough quality control and filtering, transcriptomes were generated and a total of 112,443 unigenes were assembled and clustered using Trinity software. The average length of unigene was 1240 bp and N50 was 1937 bp. Of these unigenes, 47.35% (53,237) were longer than 1000 bp, while 66.81% (75,126) were longer than 500 bp (Additional files 2 : Figure 2). The quality of the assembled transcripts was assessed using a single copy orthologous database (BUSCO), and 97% of the unigenes were perfectly matched in the BUSCO database (Additional files 3 : Figure 3).
Unigenes functional annotation.
In order to predict and classify possible functions of unigenes, 112,443 unigenes were annotated by NR, NT, Swissprot, KEGG, Pfam, KOG, and GO databases. In total, 76,714 (68.22%) unigenes were mapped to least one public database. 72,709 (NR: 64.66%), 56,712 (NT: 50.44%), 55,223 (Swissprot: 49.11%), 56,755 (KEGG: 50.47%), 17,752 (GO: 15.79%), 58,103 (KOG: 51.67%), and 53,845 (Pfam: 47.89%) unigenes were annotated (Table 1). A total of 10,877 (9.67%) unigenes were annotated together by five databases (Fig. 1A). The majority of mapped unigenes revealed the closest homology was with those from Asparagus officinalis (62.71%) in the NR database. The second highest homology was with Elaeis guineensis (9.16%). Phoenix dactylifera (5.66%), Ananas comosus (2.13%), and Musa acuminata subsp. malaccensis (2.02 %) were other homologous species (Fig. 1B). In each tissue, the expression values of all transcripts (FPKM > 1) were counted, and a total of 59,271, 52,184 and 45,893 unigenes were expressed in leaf, stem and rhizome tissues, respectively.
GO classification showed that 17,752 unigenes (15.79%) were divided into three categories: molecular function, cellular component, and biological processes. A detailed analysis of the biological processes group showed ‘cellular process’ (5,209 genes), ‘biological regulation’ (1,765 genes), ‘localization’ (1,260 genes), ‘metabolic process’ (1,161 genes), and ‘cellular component organization or biogenesis’ (1158 genes) were highly represented. In relation to molecular function, ‘binding’ (8,257 genes) and ‘catalytic activity’ (7,963 genes) were the top two GO terms. In the cellular component group, the top three GO terms were related to ‘cell’ (5,313 genes), ‘membrane part’ (5,259 genes), and ‘organelle part’ (2,301 genes) (Additional files 4 : Figure 4). For KOG analysis results, 58,103 unigenes (51.67%) were grouped into 25 functional categories: The top two categories were “general function prediction only” (12,145 genes), and “signal transduction mechanisms” (6,994 genes).
Overview of expression
In each tissue, the expression value of all transcripts (FPKM > 1) were counted, and a total of 41,811, 44,472 and 46,349 unigenes were expressed in leaf, rhizome and stem tissues, respectively (Fig. 2A). The overall level of gene expression was higher in rhizome tissue than in leaf and stem tissues (Fig. 2B).
Characterization of functional genes involved in polysaccharide biosynthesis via KEGG pathway analysis.
KEGG classification resulted in 56,755 unigenes (50.47%) being mapped into 20 pathways, and the most represented KEGG function category was global and overview maps (12,994 genes), which was followed by carbohydrate metabolism (4,829 genes), translation (4,537 genes), folding, sorting and degradation (3,572 genes), and transcription (3,238 genes) (Additional files 5 : Figure 5). The “carbohydrate metabolism” subcategory contained 15 pathways, including starch and sucrose metabolism (ko00500), amino sugar and nucleotide sugar metabolism (ko00520), glycolysis / gluconeogenesis (ko00010), citrate cycle (ko00020), pentose phosphate pathway (ko00030), pentose and glucuronate interconversions ( ko00040), fructose and mannose metabolism (ko00051), galactose metabolism (ko00052), ascorbate and aldarate metabolism (ko00053), inositol phosphate metabolism (ko00562), pyruvate metabolism (ko00620), glyoxylate and dicarboxylate metabolism (ko00630), propanoate metabolism (ko00640), butanoate metabolism (ko00650), and c5-branched dibasic acid metabolism (ko00660) (Fig. 3). Among these pathways, we annotated 1108 unigenes involved in starch and sucrose metabolism and 942 unigenes involved in amino sugar and nucleotide sugar metabolism. Based on bioinformatics analysis, 211 unigenes encoding the enzymes involved in polysaccharide biosynthesis were identified, including beta-fructofuranosidase (sacA), hexokinase (HK), fructokinase (scrK), mannose-6-phosphate isomerase (MPI), phosphomannomutase (PMM), mannose-1-phosphate guanylyltransferase (GMPP), GDP-mannose 4,6-dehydratase (GMDS), GDP-L-fucose synthase (TSTA3), glucose-6-phosphate isomerase (GPI), phosphoglucomutase (pgm), UTP-glucose-1-phosphate uridylyltransferase (UGP2), UDP-glucose 4-epimerase (GALE), UDP-glucose 6-dehydrogenase (UGDH), UDP-apiose/xylose synthase (AXS), UDP-arabinose 4-epimerase (UXE), UDP-glucose 4,6-dehydratase (RHM) and 3,5-epimerase/4-reductase (UER1) (Table 2). In the P. odoratum transcriptome, 7 subclasses of NDP-sugar interconversion enzymes (NSEs) were identified, namely RHM (17 unigenes), UER1 (2 unigenes), GMDS (6 unigenes), GALE (34 unigenes), UGDH (12 unigenes), UGE (1 unigenes) and UXE (5 unigenes). Based on our results that identified constituents and key enzymes in carbohydrate metabolism, we outlined potential biosynthetic pathways for polysaccharide formation (Fig. 4). We obtained key enzymes unigenes that were predicted to be the sacA, HK, scrK, MPI, PMM, GMPP, GMDS, TSTA3, GPI, pgm, UGP2, GALE, UGDH, AXS, UXE, UGE, RHM, and UER1 may involve in polysaccharide biosynthesis.
Validation and analysis of differentially expressed genes (DEGs) and specifc expressed genes in P. odoratum tissues.
In P. odoratum tissues, 78,288 shared unigenes were identified, of which 3,760 unigenes were found to be involved in polysaccharide biosynthesis. Based on the FPKM values of all unigenes, 4,374, 3,343, and 3,109 unigenes showed unique expression in leaf, rhizome and stem tissues, respectively (Fig. 5A). Among these unique expression unigenes, 128, 82, and 79 unigenes are involved in carbohydrate metabolism, respectively (Additional files 6 : Figure 6).
Based on the KEGG enrichment analysis of DEGs, 38,085 unigenes(22,825 up-regulated unigenes and 15,260 down-regulated unigenes) were identified as significant DEGs in rhizome tissue compared to leaf tissue, including 810 unigenes involved in phenylpropanoid biosynthesis, 566 unigenes involved in starch and sucrose metabolism, 147 unigenes involved in carotenoid biosynthesis and 133 unigenes involved in flavonoid biosynthesis. Comparison of rhizome tissue and stem tissue revealed 23,016 DEGs (9,701 up-regulated unigenes and 13,315 down-regulated unigenes), including 557 unigenes involved in phenylpropanoid biosynthesis, 380 unigenes involved in starch and sucrose metabolism, 107 unigenes involved in flavonoid biosynthesis and 100 unigenes involved in carotenoid biosynthesis(Additional files 7 : Figure 7). Whereas comparison of the rhizome tissue with stem and leaf tissues showed that 16,655 DEGs, including 7,647 co-upregulated unigenes and 7,797 co-downregulated ones in rhizome tissue compared to stem and leaf tissues (Fig. 5B). Among the up-regulated and down-regulated genes, we focused more on those up-regulated unigenes involved in the pathway of biosynthesis. 7610 specifc up-regulated unigenes were identified in rhizome tissue compared to stem and leaf tissues with log2 (fold changes) > 1, and these unigenes were further estimated by KEGG classification and GO enrichment. Among these up-regulated unigenes, 394 unigenes involved in carbohydrate metabolism were identified by KEGG classification, while 42 unigenes were enriched to DNA binding transcription factor activity by GO enrichment(Fig. 5C,D).
The “carbohydrate metabolism” classification was enriched in DEGs. Of which, 2367 unigenes were up-regulated in rhizome tissue compared to leaf tissue, while 1821 unigenes were up-regulated in rhizome tissue compared to stem tissue (Additional files 9 : Table 1). The “Biosynthesis of other secondary metabolites” classification was enriched in DEGs. Of which, 1031 unigenes were also up-regulated in rhizome tissue compared to leaf tissue, while 809 unigenes were up-regulated in rhizome tissue compared to stem tissue (Additional files 10 : Table 2). In this study, these up-regulated unigenes in rhizome tissue were the candidate genes that can enhance the availability of the precursor for biosynthesis of polysaccharide, and have the potential to improve the production of polysaccharide.
Analysis of transcriptome factors involved in polysaccharide biosynthesis and other secondary metabolites
Transcription factors (TFs) participate in a variety of developmental and physiological roles in plants. An increasing number of TFs have been isolated and characterized for several plant secondary metabolic pathways. From our P.odoratum database, 2,865 unigenes were annotated in the PlantTFDB and assigned to 58 TF families. Among these TF families, the most abundant one was the MYB family (343 unigenes), followed by bHLH (223 unigenes), WRKY (188 unigenes), AP2-EREBP (180 unigenes), NAC (149 unigenes), bHLH (150 unigenes), C3H (143 unigenes), FAR (135 unigenes), C2H2 (76 unigenes), and Trihelix (69 unigenes) (Fig. 6A). Through pathway classifications of all TF families, 73 unigenes involved in carbohydrate metabolism were identified, including MYB (6 unigenes), C2H2 (41 unigenes), Trihelix (7 unigenes), bHLH (4 unigenes), FAR1 (2 unigenes), and GeBD (2 unigenes) (Fig. 6B). Meanwhile, 49 unigenes involved in biosynthesis of other secondary metabolites were identified, including MYB (15 unigenes), Trihelix (8 unigenes), C2C2-Dof (5 unigenes), LoB (4 unigenes), C2H2 (3 unigenes), and BES1 (2 unigenes) (Fig. 6C). These unigenes have been identified in the regulation of polysaccharide biosynthesis pathways (Additional files 11 : Table 3).
As shown in the heat map of hierarchical clustering(Additional files 8 : Figure 8), the expression level of 73 transcriptome factors involved in carbohydrate metabolism was higher in rhizome tissue compared to leaf and stem tissues, including 47 upregulated unigenes in rhizome vs leaf tissue and 42 up-regulated unigenes in rhizome vs stem tissue. The expression level of 49 transcriptome factors involved in the biosynthesis of other secondary metabolites was higher in rhizome tissue compared to leaf and stem tissues, of which 37 up-regulated unigenes in rhizome vs leaf tissue and 27 up-regulated unigenes in rhizome vs stem tissue.
Validation of key enzyme genes using qRT-PCR
UGE, UGP2, GMPP and sacA genes were tested for their expression level using qRT-PCR assays. The relative expression level of UGE, UGP2 and sacA was obtained for rhizome, stem and leaf tissues, and rhizome tissue showed higher expression than stem or leaf tissues. The relative expression of the GMPP gene was greater in leaf tissue than stem or rhizome tissue, which is consistent with our transcriptional data (Fig. 7).
The structural characteristics of sacA involved in polysaccharide biosynthesis.
β-fructofuranosidase (sacA) is the first key enzyme in the polysaccharide synthesis pathway, which catalyzes the conversion of sucrose to glucose 6-phosphate (Glc-6P) and Fructose . The alignment of 7 sacA amino acid sequences revealed that the sequence identity is not high (59.57%), but the 7 sacAs showed similar spatial structures, and all had three well-conserved motifs. A 3D structural model of sacA (CL7969. Contig2) was constructed based on the crystal structure of 6-SST/6-SFT from Pachysandra terminalis (PDB ID: 3UGF) using SWISS-MODEL (https://swissmodel.expasy.org/) and PyMOL software. The spatial structure model consisted of one β-propeller domain and one β-sheet domain, which are connected by an α-helix. The β-propeller domain has five radially oriented blades (Fig. 8A, blades I–V, colored in orange, cyan, magenta, green, and yellow, respectively), each with one α-helix at the N-terminus and C-terminus. The β-sheet domain consists of two six-stranded antiparallel beta-sheets (Fig. 8A, colored in blue), forming a sandwich-like fold. A well-conserved catalytic triad is located in the deep central pocket of the beta-propeller domain (D95, D224 and E282) (Fig. 8C).