Analyses of TP and CGA compositions
The search for CGA-related candidate genes and miRNAs was initiated by profiling two CGA-producing leafy sweet potato genotypes (E: EC16; F: Fushu No.7-6) at two stages (S1: 65 days after planting; S2: 85 days after planting) using Folin-Ciocalteu and HPLC methods following Xu et al. [12]. As illustrated in Fig. 2a, the TPCs of E were ~1.6 and ~1.7 fold those of F at S1 and S2, respectively. In Fig. 2b and c, at S1, the contents of CQA, 3,4-diCQA, 3,5-diCQA, and 4,5-diCQA in E were ~2.4, ~1.7, ~3.0, and ~2.4 fold those in F; at S2, the contents of CQA, 3,4-diCQA, 3,5-diCQA, and 4,5-diCQA in E were ~11.7, ~7.7, ~32.5 and ~24.7 fold those in F. Overall, the TP and CGA contents differed significantly between genotypes at each stage and between stages of each genotype (T-TEST, P<0.01). Obviously, within the same management condition, the TP and CGA contents of E were significantly higher than F; S1 notably higher than S2.
Transcriptome sequencing and analyses
The RNA-seq reads for two genotypes at two stages (three replicates) included 1675.7 million reads, with individual libraries containing 128.4 to 185.7 million reads (Table 1). Reads from each sample were mapped to the reference genome [36] using HISAT [37]. BLAST mapping [38] revealed 29834 (91.84%) genes with homology to protein sequences in the Nr database. The expression level distributions of expressed genes were shown in Additional file 12: Fig. S1a. Correlation analysis showed that L10 revealed low correlation to the other double replicated samples, and therefore were removed from further DE mRNA and miRNA analysis (Additional file 12: Fig. S1b).
Differentially expressed gene analyses and annotation
The TP and CGA compounds increased across the four pairwise comparisons (FS2 vs. FS1; ES2 vs. ES1; FS2 vs. ES2; FS1 vs. ES1), thus co-regulated DEGs across the comparisons indicated the pivotal steps in the pathway of CGA biosynthesis. In total, 6961 DEGs were found across the four comparisons (Additional file 1: Table S1). The number of DEGs ranged from 1315 (690 upregulated; 625 downregulated) for FS1 vs. ES1 to 4482 (2196 upregulated; 2286 downregulated) for FS2 vs. FS1 (Fig. 3a). A total of 1685 and 711 DEGs exhibited common expression patterns between FS2 vs. FS1 and ES2 vs. ES1, between FS1 vs. ES1 and FS2 vs. ES2; an overlap of 63 common DEGs were found across the four comparisons (Fig. 3b). In comparisons of stage-specific and genotype-specific groups, 2333 common DEGs that were identified at least in two comparisons were then considered for further analysis.
To functionally characterize expression genes, firstly, we used the BLAST algorithm to annotate 6961 DEGs based on the eggNOG, KEGG, Pfam, GO, Nr and Swiss-Prot databases. As a result, 4426, 2289, 2678, 1273 DEGs were annotated for FS2 vs. FS1, ES2 vs. ES1, FS2 vs. ES2, FS1 vs. ES1; detailed annotation information was provided in Additional file 2: Table S2. Out of these DEGs, 2333 common DEGs were assigned to 47 GO terms in Additional file 3: Table S3, including the biological process (20), molecular function (14) and cellular component (13) categories. GO enrichment analysis of common DEGs revealed that catalytic activity (GO:0003824), oxidation-reduction process (GO:0055114), oxidoreductase activity (GO:0016491) ranked in the top 20 most significant enriched terms in Fig 4a. Furthermore, the pathway analysis of common DEGs was carried out to understand the molecular mechanism using KEGG database. The DEGs were found to represent 288 pathways (Additional file 4: Table S4). The enrichment analysis suggested that photosynthesis-antenna proteins (map00196), starch and sucrose metabolism (map00500), drug metabolism-cytochrome P450 (map00982) and phenylpropanoid biosynthesis (map00940) were among the most enriched pathways (Fig. 4b). A total of 134 transcription factors (TFs) belonging to 26 families were identified differentially expressed. Among them, C2C2 (18), AP2/ERF (16), MYB-related (11), bHLH (11) were the most overrepresented TF families (Fig. 4c).
Metabolic pathway and gene analyses during CGA accumulation
To provide a global view of leafy sweet potato secondary metabolism, common DEGs with different map ids were further submitted for analysis via the online Interactive Pathway (ipath) explorer v2 (Fig. 5a) [39]. The metabolic pathways such as pentose phosphate pathway (Fig. 5b), phenylalanine biosynthetic pathway (Fig. 5c), CGA biosynthetic pathway (Fig. 5d) and flavonoid biosynthesis showed enhanced, which were in accordance with the results of GO analysis. As pentose phosphate metabolism, phenylalanine biosynthetic pathway and CGA biosynthetic pathway were vital steps for CGA biosynthesis, genes involving in these pathways were fully illustrated.
In this work, genes participating in CGA biosynthesis showed special expression patterns. For the common expression pattern, two synonymous DEGs of G6PDH (itb02g05910; itb03g00300) which encoded glucose-6-phosphate 1-dehydrogenase were upregulated, providing sufficient NADPH and phosphoenolpyruvis acid (PEP) for shikimic acid pathway. DHQ/SDH (itb14g20920) which catalyzed the biosynthesis of phenylalanine was upregulated as well. HCT that had three synonymous DEGs (itb03g29460; itb01g04710; itb01g04740) expressed differentially across the four comparisons. Yet itb03g29460 was the most expressed one, which was downregulated in ES2 vs. ES1 and FS2 vs. FS1. CCR, which encoded cinnamoyl-CoA reductase, had 4 synonymous DEGs (itb09g17150; itb09g17200; itb02g23900; itb07g23820). They were downregulated in different comparisons. CCoAOMT that encoded caffeoyl-CoA O-methyltransferase had three synonymous DEGs (itb12g05230; itb01g21750; itb12g20360), all of which were downregulated. In this study, PAL (itb09g15750) was found to be only upregulated in the comparison FS2 vs. ES2. C3’H (p-coumarate 3′-hydroxylase), another name CYP 98A3 (itb01g24570) was upregulated in the comparison FS2 vs. ES2. The schematic of metabolic data related to leafy sweet potato CGA accumulation was briefly illustrated in Fig. 6.
High-throughput small RNA sequencing
The small RNA sequencing resulted in 248.6 million clean reads, with 14.4 to 30.0 million reads per library. Reads with length > 17 nt and < 33 nt were kept, following by the removal of ribosomal RNA (rRNA), transfer RNA (tRNA), small nucleolar RNA (snoRNA), and repetitive sequences (Table 1). The length distribution patterns of the sRNAs were similar in the eleven libraries. They ranged from 18 to 30 nt, of which 24 nt were the most abundant size (Fig. 7a). In order to identify known miRNAs, the filtered reads were searched against the miRNAs in miRBase. A total of 149 known miRNAs were obtained. As some of the known miRNAs were aligned with more than one pre-miRNAs, the detailed information of all aligned miRNAs was listed in Table S5. The length distribution of known miRNAs exhibited a peak at 21 nt (Fig. 7b), similar to the results reported in previous research in sweet potato and other species [32-35]. Reads that could not be mapped to miRBase were subjected to novel miRNA predication by miRDeep2 and the most length distribution of novel miRNA was 24 nt following by 21 nt (Fig. 7c) [40]. A total of 22 novel miRNAs were identified and listed in Table S5. The negative folding free energies of the hairpin structures of novel miRNAs ranged from -68.37 to -26.52 kcal mol-1 with an average of -43.47 kcal mol-1. The minimal folding free energy index (MFEI) of novel miRNAs ranged from 0.86 to 1.69 with an average of 1.13.
DE miRNA expression during CGA accumulation
miRNAs were considered as DE miRNAs if they had absolute values of log2 (Fold Change) ≥1 and FDR (False discovery rate) ≤0.05. A total of 9, 7, 18 and 9 miRNAs were identified as DE miRNAs across the four comparisons and 5 miRNAs were common differently expressed ones (Additional file 6: Table S6). The majority of DE miRNAs showed a trend of downregulation during CGA accumulation (Fig. 7d). miR156, miR166, miR167 and miR858 were found in different comparisons, which had been reported to be involved in phenylpropanoid pathway [41].
Target predication via in silico and degradome approaches
To explore the function of miRNAs, computational program was performed to predict their target genes. All identified 171 miRNAs were predicated to have 1799 targets via TargetFinder software with the score value < 4 [42]. The annotations for the 1799 miRNA targets were based on the GO, KEGG, eggNOG, Nr, Swiss-Prot and Pfam databases (Additional file 7: Table S7). The targets were uniformly assigned to 20 biological processes, 14 cellular components and 11 molecular functions. The most abundant 20 GO terms were demonstrated in Fig. 8a. The significant enriched GO terms like lignin catabolic process (GO:0046274), phenylpropanoid catabolic process (GO:0046271), lignin metabolic process (GO:0009808) and phenylpropanoid metabolic process (GO:0009698) were listed in Additional file 8: Table S8 and they were all involved in CGA accumulation pathway. Furthermore, KEGG annotation was carried out to explore the pathways in which the identified targets were involved. A total of 220 pathways were identified indicating the highly diverse functions of the targets. Phenylpropanoid biosynthesis (map00940) which was CGA accumulation related pathway were among the most 20 abundant pathways (Fig. 8b).
Using degradome sequencing, a total of 21.94 Mb clean tags and 7,892,630 cluster tags were obtained. The cluster tags were aligned to the transcriptome and Rfam database for cleavage site analysis. After processing and analysis with the software CleaveLand [43], 158 miRNA-mRNA pairs were totally identified (Additional file 9: Table S9). Target analysis showed that many cleaved-target genes by miRNAs were TF genes, including AP2/ERF, bZIP, TCP, MYB, SPL, etc. Some miRNAs had more than one target genes, like miR530a targeted microtubule-associated protein 70-1-like and bHLH130-like genes. On the contrary, same gene can be targeted by more than one miRNA, for instance, miR394c and miR384-5p shared the same target F-box. TFs such as AP2/ERF (itb14g16290) and SPL (itb01g24030) predicated in silico were validated in degradome sequencing.
Correlation analysis of DE miRNA expression profiles and their target genes
The expression of both DE miRNAs (from small RNA-seq) and their target genes (from RNA-seq) were integrated to infer the mediatory role of miRNAs during CGA biosynthesis. Coherent interactions were the ones in which the expression of miRNA was upregulated when the expression of target mRNA was downregulated, and vice versa. In this study, the spearman mathematical method with the criterions of index <= -0.8 and P value <=0.05 was employed. As a result, the correlation analysis of DE miRNA and their target mRNA expression profiles identified a total of 1462, 66, 319 and 19 miRNA-mRNA interaction pairs across four comparisons (FS2 vs. FS1; ES2 vs. ES1; FS2 vs. ES2; FS1 vs. ES1) (Additional file 10: Table S10). From these coherent pairs, miR156g-5p, miR156k-5p, miR156i-5p, miR156h-5p, miR156j-5p, miR156e in FS2 vs. FS1 which belonged to miR156 family were downregulated and their target DHQ/SDH (itb14g20920) was upregulated with the correlation rate of -0.9. These results were in accordance with quantitative reverse transcription-polymerase chain reaction (qRT-PCR) result. In addition, through associate analysis miR156g-5p, miR156k-5p, miR156i-5p, miR156h-5p, miR156j-5p, miR156 were found downregulated and their target gene SPL (itb02g07930) was upregulated in FS2 vs. FS1, the negative correlation of which was -0.9.
Validation of differential gene and miRNA expression
qRT-PCR analysis was carried out to validate the expression patterns of genes and miRNAs obtained from the RNA and small RNA sequencing. The expression of enzyme encoding-genes (HCT, CCoAOMT, CCR ) in the CGA biosynthetic pathway, two synonymous G6PD genes (itb03g00300, itb02g05910) in the pentose pathway and one phenylalanine biosynthesis-related gene DHQ/SDH were validated via qRT-PCR (Fig. 9; Additional file 11: Table S11), and the results were consistent with that of the mRNA sequencing, except the deviations of CCR in FS2 vs. FS1 and ES2 vs. ES1. As thirty genes were investigated in qRT-PCR experiments (data not shown), CCR was the only one with deviations and also the results turned out the same after the qRT-PCRs of CCR were repeated three times, so it was considered to be downregulated in this study. In addition, six miRNAs, namely, Nov-m2294-5p, Nov-m3917-3p, Nov-m4613-3p, sly-miR168a-5p, stu-miR156e and tcc-miR530a were validated by qRT-PCR as well (Fig. 9; Additional file 11: Table S11). Similar expression trends (upregulated or downregulated) were observed between the qRT-PCR analysis and the sRNA sequencing results.