Analyses of TP and CGA compositions
The search for CGA-related candidate genes and miRNAs was initiated by profiling two CGA-producing leafy sweet potato genotypes (E: EC16; F: Fushu No.7-6) at two stages (S1: 65 days after planting; S2: 85 days after planting) using Folin-Ciocalteu and HPLC methods following Xu et al. . As illustrated in Fig. 2a , the TPCs of E were ~1.6 and ~1.7 fold those of F at S1 and S2, respectively. In Fig. 2b and c, at S1, the contents of CQA, 3,4-diCQA, 3,5-diCQA, and 4,5-diCQA in E were ~2.4, ~1.7, ~3.0, and ~2.4 fold those in F; at S2, the contents of CQA, 3,4-diCQA, 3,5-diCQA, and 4,5-diCQA in E were ~11.7, ~7.7, ~32.5 and ~24.7 fold those in F. Overall, the TP and CGA contents differed significantly between genotypes at each stage and between stages of each genotype (T-TEST, P<0.01). Obviously, within the same management condition, the TP and CGA contents of E were significantly higher than F; S1 notably higher than S2.
Transcriptome sequencing and analyses
The RNA-seq reads for two genotypes at two stages (three replicates) included 1675.7 million reads, with individual libraries containing 128.4 to 185.7 million reads (Table 1). Reads from each sample were mapped to the reference genome  using HISAT . BLAST mapping  revealed 29834 (91.84%) genes with homology to protein sequences in the Nr database. The expressional level distributions of expressed genes were shown in Fig. 3a. Correlation analysis showed that L10 revealed low coorelation to the other double replicated samples, and therefore were removed from further DE mRNA analysis (Fig. 3b).
Differentilally expressed gene analyses and annotation
The TP and CGA compounds increased across the four pairwise combinations (FS2 vs. FS1; ES2 vs. ES1; FS2 vs. ES2; FS1 vs. ES1), thus co-regulated DEGs across the combinations indicated the pivotal steps in the pathway of CGA biosynthesis. In total, 6961 DEGs were found across the four combinations (Additional file 1: Table S1). The number of DEGs ranged from 1315 (690 upregulated; 625 downregulated) for FS1 vs. ES1 to 4482 (2196 upregulated; 2286 downregulated) for FS2 vs. FS1 (Fig. 4a). A total of 1685 and 711 DEGs exhibited common expression patterns between FS2 vs. FS1 and ES2 vs. ES1, between FS1 vs. ES1 and FS2 vs. ES2; an overlap of 63 common DEGs were found across the four combinations (Fig. 4b). In comparison of stage-specific and genotype-specific groups, 2333 common expressed DEGs were at least in two comparisons and then therefore considered for further analysis.
To functionally characterize expression genes, firstly, we used the BLAST algorithm to annotate 6961 DEGs based on the COG, KEGG, Pfam, GO, Nr and Swiss-Prot databases. As a result, 4426, 2289, 2678, 1273 DEGs were annotated for FS2 vs. FS1, ES2 vs. ES1, FS2 vs. ES2, FS1 vs. ES1; detailed annotation information was provided in Additional file 2: Table S2. Out of these DEGs, 2333 common expressed ones were assigned to 47 GO terms in Additional file 3: Table S3. It was found that GO terms for DEGs were uniformly assigned to each of the biological process (20), molecular function (14) and cellular component (13) categories, the most abundant 20 terms were demonstrated in Fig. 5a. GO enrichment analysis of common expressed DEGs revealed that photosynthesis and light harvesting (GO:0009765), catalytic activity (GO:0003824), oxidation-reduction process (GO:0055114), oxidoreductase activity (GO:0016491) ranked in the top 20 most significant enriched terms (Additional file 4: Table S4). Further, the pathway analysis of common expressed DEGs was carried out to understand the molecular mechanism using KEGG database. The DEGs were found to represent 288 pathways (Additional file 5: Table S5). The enrichment analysis suggested that photosynthesis-antenna proteins (map00196), starch and sucrose metabolism (map00500), drug metabolism-cytochrome P450 (map00982) and phenylpropanoid biosynthesis (map00940) were among the most enriched pathways (Fig. 5b). A total of 134 transcription factors (TFs) belonging to 26 families were identified differentially expressed. Among them, C2C2 (18), AP2/ERF (16), MYB-related (11), bHLH (11) were the most overrepresented TF families (Fig. 5c).
Metabolic pathway and gene analysis during CGA accumulation
To provide a global view of leafy sweet potato secondary metabolism, common expressed genes with different map ids were further submitted for analysis via the online Interactive Pathway (ipath) explorer v2 (Fig. 6) . The red lines in Fig. 6 indicated that genes involving in various pathways were enhanced. The metabolic rate of pathways such as pentose phosphate pathway (Fig. 6a), phenylalanine biosynthesis (Fig.6b), CGA biosynthetic pathway (Fig. 6c), flavonoid biosynthesis, carotenoid biosynthesis and brassinosteroid biosynthesis showed enhanced, which were in accordance with the results of GO analysis. As pentose phosphate metabolism, phenylalanine biosynthesis and CGA biosynthetic pathway were vital steps for CGA biosynthesis, genes invloving in these pathways were fully illustrated.
In this work, genes participating in CGA biosynthesis showed special expression pattern. From the common expression pattern, two synonymous DEGs of G6PDH (itb02g05910; itb03g00300) which encoded glucose-6-phosphate 1-dehydrogenase were upregulated, providing sufficient NADPH and phosphoenolpyruvis acid (PEP) for shikimic acid pathway. DHQ/SDH (itb14g20920) leading to the accumulation of phenylalanine was upregulated as well. HCT which had three synonymous DEGs (itb03g29460; itb01g04710; itb01g04740) expressed differentially across the four combinations. Yet itb03g29460 was the most expressed HCT gene, which was downregulated in ES2 vs. ES1 and FS2 vs. FS1. CCR, which encoded cinnamoyl-CoA reductase, had 4 synonymous DEGs (itb09g17150; itb09g17200; itb02g23900; itb07g23820). They were downregulated in different combinations. CCoAOMT encoding caffeoyl-CoA O-methyltransferase had three synonymous DEGs (itb12g05230; itb01g21750; itb12g20360), all of which were downregulated. In this study, PAL (itb09g15750) was found to be only upregulated in the combination FS2 vs. ES2. C3’H (p-coumarate 3′-hydroxylase), another name CYP 98A3 (itb01g24570) was upregulated in the combination FS2 vs. ES2. The schematic of metabolic data related to leafy sweet potato CGA accumulation was briefly illustrated in Fig. 7.
High-throughput small RNA sequencing
The small RNA sequencing resulted in 248.6 million clean reads, with 14.4 to 30.0 million reads per library. Reads with length > 17 nt and < 33 nt were kept, following by the removal of ribosomal RNA (rRNA), transfer RNA (tRNA), small nucleolar RNA (snoRNA), and repetitive sequences (Table 1). The length distribution patterns of the sRNAs were similar in the eleven libraries. They ranged from 18 to 30 nt, of which 24 nt were the most abundant size (Fig. 8a). In order to identify known miRNAs, the filtered reads were searched against the miRNAs from miRBase. A total of 149 known miRNAs were obtained. As some of the known miRNAs were aligned with more than one pre-miRNAs, the detailed information of all aligned miRNAs was listed in Table S6. The length distribution of known miRNAs exhibited a peak at 21 nt, similar to the results reported in previous research in sweet potato and other species (Fig. 8b) [32-35]. Reads that could not be mapped to miRBase were subjected to novel miRNA predication by miRDeep2 and the most length distribution of novel miRNA was 24 nt following by 21 nt (Fig. 8c) . A total of 22 novel miRNAs were identified and listed in Table S6. The negative folding free energies of the hairpin structures of novel miRNAs ranged from -68.37 to -26.52 kcal mol-1 with an average of -43.47 kcal mol-1. The minimal folding free energy index (MFEI) of novel miRNAs ranged from 0.86 to 1.69 with an averge of 1.13.
DE miRNA expression during CGA accumulation
miRNAs were considered as DE miRNAs if they had absolute values of log2 (Fold Change) ≥1 and FDR (False discovery rate) ≤0.05. A total of 9, 7, 18 and 9 miRNAs were identified as DE miRNAs across the four combinations and 5 miRNAs were common expressed ones (Additional file 7: Table S7). The majority of DE miRNAs showed a trend of downregulation during CGA accumulation (Fig. 8d). miR156, miR166, miR167 and miR858 were found in different combinations, which had been reported to be involved in phenylpropanoid pathway .
Target predication via in silico and degradome approaches
To explore the function of miRNAs, computational program was performed to predict their target genes. All identified 171 miRNAs were predicated to have 1799 targets via TargetFinder software with the score value < 4 . The annotations for the 1799 miRNA targets were based on the GO, KEGG, COG, Nr, Swiss-Prot and Pfam databases (Additional file 8: Table S8). The targets were uniformly assigned to 20 biological processes, 14 cellular components and 11 molecular functions. The most abundant 20 GO terms was demonstrated in Fig. 9a. The significant enriched GO terms like lignin catabolic process (GO:0046274), phenylpropanoid catabolic process (GO:0046271), lignin metabolic process (GO:0009808) and phenylpropanoid metabolic process (GO:0009698) were listed in Additional file 9: Table S9 and they were all involved in CGA accumulation pathway. Further, KEGG annotation was carried out to explore the pathways in which the identified targets were involved. A total of 220 pathways were identified indicating the highly diverse functions of the targets. Phenylpropanoid biosynthesis (map00940) which was CGA accumulation related pathway were among the most 20 abundant pathways (Fig. 9b).
Using degradome sequencing, a total of 21.94 Mb clean tags and 7,892,630 cluster tags were obtained. The cluster tags were aligned to the transcriptome and Rfam database for cleavage site analysis. After processing and analysis with CleaveLand , 158 miRNA-mRNA pairs were totally identified (Additional file 10: Table S10). Target analysis showed that many cleaved-target genes by miRNAs were TF genes, including AP2/ERF, bZIP, TCP, MYB, SPL, etc. Some miRNAs had more than one target genes, like miR530a targeted microtubule-associated protein 70-1-like and bHLH130-like genes. On the contrary, same gene can be targeted by more than one miRNAs, for instance, miR394c and miR384-5p shared the same target F-box. TFs such as AP2/ERF (itb14g16290) and SPL (itb01g24030) predicated in silico were validated in degradome sequencing.
Correlation analysis of DE miRNA expression profiles and their target genes
The expression of both DE miRNAs (from small RNA-seq) and their target genes (from RNA-seq) were integrated to infer the mediatory role of miRNAs during CGA biosynthesis. Coherent interactions were the ones in which the expression of miRNA was upregulated when the expression of target mRNA was downregulated, and vice versa. In this study, the spearman mathematical method with the criterions of index <= -0.8 and P value <=0.05 was employed. As a result, the correlation analysis of DE miRNA and their target mRNA expression profiles identified a total of 1462, 66, 319 and 19 miRNA-mRNA interaction pairs across four combinations (FS2 vs. FS1; ES2 vs. ES1; FS2 vs. ES2; FS1 vs. ES1) (Additional file: Table S11). From these coherent pairs, MIR156 family of miR156g-5p, miR156k-5p, miR156i-5p, miR156h-5p, miR156j-5p, miR156e in FS2 vs. FS1 were downregulated and their target DHQ/SDH (itb14g20920) was upregulated with the correlation rate of -0.9. This results were in accordance with quantitative reverse transcription-polymerase chain reaction (qRT-PCR) result. In addition, through associate analysis miR156g-5p, miR156k-5p, miR156i-5p, miR156h-5p, miR156j-5p, miR156 were found downregulated and their target gene SPL (itb02g07930) was upregulated in FS2 vs. FS1, the negative correlation of which was -0.9.
Validation of differential gene and miRNA expression
qRT-PCR analysis was carried out to validate the expression patterns of genes and miRNAs obtained from the RNA and small RNA sequencing. The expression of enzyme encoding-genes (HCT, CCoAOMT, CCR ) in the CGA biosynthetic pathway, two synonymous G6PD genes (itb03g00300, itb02g05910) in the pentose pathway and one phenylalanine biosynthesis-related gene DHQ/SDH were validated via qRT-PCR (Fig. 10; Additional file 12: Table S12), the qRT-PCR results were consistant with the mRNA sequencing reults, except the CCR in FS2 vs. FS1 was not so significant as sequencing result. In addition, six miRNAs, namely, Nov-m2294-5p, Nov-m3917-3p, Nov-m4613-3p, sly-miR168a-5p, stu-miR156e and tcc-miR530a were validated by qRT-PCR as well (Fig. 10; Additional file 12: Table S12). Similar expression trends (upregulated or downregulated) were observed between the qRT-PCR analysis and the and sRNA sequencing results.