Sequencing of sRNAs and the transcriptome
Six sRNA libraries were generated to identify miRNAs involved in the fruit development of the pitaya. The Illumina sequencing data of sRNAs from the 19th, 25th and 29th day after flowering (DAF) showed that 24 nt sRNAs are the most abundant, followed by 21 nt sRNAs (Fig. 1). A total of 82,318,241 reads were obtained from the datasets. After removal of the adaptor, insert, polyA and short RNAs of <18 nt in length, 62,729,725 (76.20%) valid reads were obtained, including the rRNA, tRNA, snRNA, snoRNA and some other Rfam RNA (Table 1). 11,455,464, 10,976,049, 12,337,213, 8,608,731, 9,222,106 and 10,130,162 unique matches for the sRNAs were identified as valid reads in Hp19d_1, Hp19d_2, Hp25d_1, Hp25d_2, Hp29d_1 and Hp29d_2, respectively.
Three transcriptome libraries of pitaya pulps were constructed to identify the target genes of miRNAs. 7.46 Gb, 9.10 Gb and 10.23 Gb nucleotide data (Q20 < 20) were obtained from Hp29d, Hp25d and Hp29d libraries, respectively. All screened reads were de novo assembled into 68,505 transcripts with an N50 of 1,700 bp. And then, these transcripts were assembled into 39,737 unique sequences with an average length of 879 bp. The length distributions of those transcripts and unigenes were shown in Fig. 2. The distribution of assembled transcripts and genes with different GC contents in the transcriptome datasets were summarized in Fig. 3. The majority of transcripts and genes were in the range of 35%–50% in GC contents.
Identification of conserved miRNAs
A total of 126 conserved miRNAs were identified based on BLAST searches and sequence analyses in comparison of sRNA sequences with known mature plant miRNAs in miRBase (Table S1). Among them, 95 known miRNAs belonging to 53 families were obtained. The number of miRNA members for each family varied from 1 (for MIR535) to 7 (for MIR482) (Fig. 4). MIR160, MIR156, MIR159, MIR166, MIR164 and MIR396 families consisted of 2, 3, 4, 4, 5 and 6 members, respectively. The higher expression levels of the miRNA, the more copies of the miRNAs were sequenced. Among the 126 conserved miRNAs, four miRNAs named as sly-miR164a-3p_1ss12TC, ppe-miR535a, ptc-miR164a and mtr-miR396b-5p had the most reads, reaching up to 10,000 (Table S1). Additionally, nine miRNAs (ama-miR396-5p_1ss4CA, gma-miR396a-5p_L+1_1ss22GT, mtr-miR167b-5p, aly-miR168a-5p, peu-MIR2916-p5_1ss5AG, peu-MIR2916-p3_1ss8TC, osa-miR529b_R-1_1ss8AG, gma-miR6300, and gma-miR393a_R+1) were sequenced in thousands. While another six miRNAs (nta-miR6147, stu-miR7122-5p_1ss18TG, mdm-miR403a_R+1_2ss20CT21GT, sly-miR403-5p_L-2_1ss15TC, osa-miR319a-3p.2- 3p_1ss9AG, and gma-miR172b-5p_L-2R+2) had only one read sequenced at one stage. Changes in expression levels of miRNAs during fruit development of pitaya possibly reflected their potential functional differences.
Identification of novel miRNAs
Novel miRNAs were predicted using the sRNA valid reads based on structure and expression criteria [43]. 41 novel candidate miRNAs with a clear precursor including stem-loop secondary structure were identified (Table S2). The length of novel miRNAs was between 20-25 nt. The most abundant sequences (49%) were 21 nt-length, and followed by 24 nt-length (29%), which was consistent with typical length distribution of mature miRNAs. The abundance of known miRNAs was higher than that of putative novel miRNAs, except PC-5p-192_7269, PC-5p-2548_357, PC-5p-1835_491 and PC-3p-3086_289, which possessed more than 100 normalized reads (Table S1 and S2).
Analyses of Differentially Expressed miRNAs
High-throughput sequencing (HTS) was performed to explore the expression level changes of miRNAs involved in betalain biosynthesis of ‘Guanhuahong’ pitaya. Only miRNAs with expression values at p<0.05 were considered to be significantly regulated. In Hp29d/Hp25d/Hp19d, 33 known miRNAs and five novel miRNAs were found to be differentially expressed (Table 2).
In different pulp coloration stages of ‘Guanhuahong’ pitaya, the expression levels of mtr-miR396b-3p_L+2R-2, ppe-miR535a, PC-5p-2975_303 and PC-3p-53338_18 at the red pulp stages (25th and 29th DAP) were significantly higher than that of the white pulp stage (19th DAP) (Table 2). Twelve differentially expressed miRNAs i.e. aly-miR390a-3p, nta-miR6149a, zma-miR159c-3p_L+2R-1, lus-miR530a_R+1_1ss20TG, gma-miR6300, aly-miR390a-5p, nta-miR6020b, gma-miR394a-5p_1ss2TG, osa-miR5072_L-4_1ss12CT, PC-5p-23845_39, gma-miR171c-3p_1ss19AC and stu-miR482a-5p_2ss3AT12AG showed a lower expression level at the red pulp stages (25th and 29th DAP) compared to a relatively higher expression level at white pulp stage (19th DAP). Expression levels of ssp-miR156_L+1R-1_1ss4AC, aly-miR157d-5p_L+1_1ss4AC, ptc-miR164a, sly-miR164a-3p_1ss12TC and ppe-miR399a increased gradually from 19th DAF to 29th DAF in ‘Guanhuahong’ pitaya. However, the expression levels of ssl-miR171b_1ss21TC, mdm-miR408a_1ss1AT, cme-miR393a_R+1, gma-miR398c, ath-miR8175_L+4, aly-miR398a-3p, aly-miR168a-3p_L+3_1ss19CT, osa-miR529b_R-1_1ss8AG, zma-miR398a-3p_R-2, gma-miR172a_1ss1AC, nta-miR159 and lus-miR397a_L-1R+2 decreased gradually from 19th DAF to 29th DAF during fruit maturation of ‘Guanhuahong’ pitaya.
Bioinformatics of transcriptome analyses
All 39,737 unique sequences were annotated according to BLAST (cut-off E-value ≤10−5) searches of Nr, Swiss-Prot Protein, Pfam, Gene Ontology (GO), euKaryotic Ortholog Groups (KOG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Table 3). Of these, 21,783 unique sequences (54.8%) could be annotated in the Nr database, while 7,966 unigenes (20.1%) were annotated using the KEGG databases.
The GO, KEGG and KOG databases were used to classify the functions of the predicted pitaya pulp unigenes. 12,559 unigenes were classified into three main categories: ‘biological process’, ‘cellular component’, and ‘molecular function’. More than 150 unigenes involved in the annotation components were integrated (Fig. 5). As for the ‘molecular function’ category, the largest number of unigenes was annotated as ‘ATP binding’, while the major groups for the ‘cellular component’ category were ‘integral to membrane’ (2,878 unigenes, 23%), ‘nucleus’ (2,211 unigenes, 18%) and ‘plasma membrane’ (1,646 unigenes, 13%). In the category of ‘biological process’, ‘regulation of transcription, DNA-dependent’ (729 unigenes, 5.8%) and ‘transcription, DNA-dependent’ (706 unigenes, 5.6%) were the two most abundant subcategories.
7,822 unigenes were mapped onto KEGG pathways. The highest number of unigenes was carbohydrate metabolism (796 unigenes), followed by energy metabolism (596 unigenes), amino acid metabolism (595 unigenes) and translation (510 unigenes) (Fig. 6). Differential expression genes (DEGs) in the Hp25d-VS-Hp19d, Hp29d-VS-Hp19d, and Hp29d-VS-Hp25d were analyzed. In the Hp25d-VS-Hp19d, 24, 20, 16 and 13 unique sequences that significantly differentially expressed from phenylpropanoid, stilbenoid, diarylheptanoid and gingerol biosynthesis, DNA replication and flavonoid biosynthesis pathway were detected (p<0.0001), respectively (Fig. 7A). In Hp29d-VS-Hp19d, 140 genes were found to be significantly differentially expressed (p<0.0001), including biosynthesis of phenylpropanoid, flavonoid, stilbenoid, diarylheptanoid and gingerol, metabolism of phenylalanine, alpha-Linolenic acid, starch and sucrose, xenobiotics by Cyt P450, as well as drug metabolism-Cyt P450 (Fig. 7B). The expression levels of DEGs among different groups in one pathway showed different scales of change. 13 DEGs were detected from the flavonoid biosynthesis pathway in the Hp25d-VS-Hp19d and Hp29d-VS-Hp19d, respectively. However, no DEGs related to the flavonoid biosynthesis pathway were found in Hp29d-VS-Hp25d (Fig. 7C). The KOG database is used to study the classification and evolutionary rates of orthologous proteins. As shown in Fig. 8, group R (general function prediction only), group O (posttranslational modification, protein turnover, chaperones) and group T (signal transduction mechanisms) are the three most abundant groups in pitaya dataset, suggesting that pitaya fruit development involved in a large number of transcriptional and posttranslational regulation of gene expression and function.
DEGs from three development stages (Hp25d-VS-Hp19d, Hp29d-VS-Hp19d, and Hp29d-VS-Hp25d) were evaluated by pairwise comparisons using the expression fold (|log2fold change|≥1) and values p<0.05 as the thresholds (Figure 9). In the pairwise comparisons between any two stages, 3,988 genes were found to be significantly differentially expressed. The highest amount of DEGs was obtained between the Hp19d and Hp29d libraries, including 1,486 down-regulated and 733 up-regulated (Fig. 9A). The lowest number of DEGs (1835) was detected between the Hp25d and Hp29d libraries (987 down-regulated and 848 up-regulated). followed by the Hp19d and Hp25d libraries (1,961 DEGs, 1,274 down-regulated and 687 up-regulated). Among those DEGs, 124 genes were significantly differentially expressed in all three fruit development stages of pitaya (Fig. 9B).
Verification of the accuracy of the sRNAome and RNA-Seq data
To verify the reliability of the sRNAome and RNA-Seq results, the expression of fifteen miRNAs and fifteen genes related to betalain biosynthesis were analyzed by RT-qPCR. The IDs, reads per kilobase of exon model per million mapped read (RPKM) value, and primers for the 30 transcripts were shown in Table S1, S2, S4, S5, and S7, respectively. The overall correlation coefficient of 0.717** and 0.719** were obtained by linear regression [(Q-PCR value) = a (sRNAome or RNA-Seq value) + b] analysis (Fig. 10), respectively, indicating that the results of sRNAome and transcriptome analyses were consistent with those of RT-qPCR. Those results suggested that sRNAome and RNA-Seq data can be used for subsequent experiments.
Prediction of targets for differentially expressed miRNAs
To investigate the functions of miRNAs in betalain biosynthesis of pitaya, it is crucial to predict their target genes. Target genes of differentially-expressed miRNAs were predicted by Target Finder software. Transcripts of transcriptome were identified as possible target genes for the majority of miRNAs. A total of 124 target mRNAs, 22 target mRNAs for conserved miRNAs and novel miRNAs were obtained, respectively (Table S8). More than one target gene was predicted for most miRNAs. miRNAs have the potential to regulate targets belonging to certain gene families with different biological functions. For example, the predicted target genes of osa-miR529b_R-1_1ss8AG were found to be involved in SPL6 (squamosa promoter-binding-like protein 6), histone-lysine N-methyltransferase and ubiquitin-protein ligase. Six targets may be involved in betalain biosynthesis. Among them, comp29967_c0 (Cyt P450 83B1), comp234190_c0 (Cyt P450 71A8), comp28219_c0 (protein-tyrosine sulfotransferase), comp27657_c0 (WD and tetratricopeptide repeats protein 1) and comp25650_c0 (SPL) were respectively targeted by PC-5p-192_7269, nta-miR6020b, PC-5p-23845_39, ptc-miR164a and ssp-miR156_L+1R-1_1ss4AC while comp25631_c0 (SPL6) was co-targeted by ssp-miR156_L+1R-1_1ss4AC, aly-miR157d-5p_L+1_1ss4AC and osa-miR529b_R-1_1ss8AG.
Tissue-specific analyses of differentially expressed miRNAs
miRNA preferentially expressed in specific tissues might provide clues to its physiological functions. Thirty differentially expressed miRNAs were analyzed their functions by RT-qPCR using ten different tissues from ‘Guanhuabai’ and ‘Guanhuahong’ pitayas. As shown in Figure 11, the 30 differentially expression miRNAs showed different expression levels in those tissues. Gma-miR172a_1ss1AC, gma-miR394a-5p_1ss2TG, lus-miR530a_R+1_1ss20TG, nta-miR6020b, lus-miR397a_L-1R+2, ttu-miR160_1ss21AG and zma-miR398a-3p_R-2 displayed similar expression patterns. However, ssp-miR156_L+1R-1_1ss4AC strongly expressed in pitaya roots and fruits. Aly-miR398a-3p strongly expressed in pitaya stamens compared to weak expression in the other tissues. PC-5p-2975_303 and gma-miR171c-3p_1ss19AC strongly expressed in petals but moderately or weakly expressed in the other tissues. Ptc-miR164a and sly-miR164a-3p_1ss12TC displayed expression in all tissues and the highest expression level was detected in pitaya fruits. Mtr-miR396b-3p_L+2R-2 and osa-miR5072_L-4_1ss12CT had higher expression in petals of ‘Guanhuabai’ pitaya, but ssl-miR171b_1ss21TC and PC-5p-192_7269 showed higher expression in petals of ‘Guanhuahong’ pitaya. PC-3p-53338_18, PC-5p-1835_491, PC-5p-23845_39, cme-miR393a_R+1, aly-miR390a-5p, nta-miR159 and mdm-miR408a_1ss1AT had preferential expression in petals and receptacles of ‘Guanhuahong’ pitaya. Aly-miR157d-5p_L+1_1ss4AC and gma-miR6300 were highly expressed in ‘Guanhuahong’ and ‘Guanhuabai’ pitayas, respectively. Results from RT-qPCR indicated that expression patterns of the 30 pitaya miRNAs had tissue- and/or cultivar- characteristics in terms of different expression levels.
Validation of miRNAs and target genes related to belatain biosynthesis
Fourteen target genes predicted from 11 miRNAs were assayed by RT-qPCR. The target genes from the ‘Guanhuahong’ and ‘Guanhuabai’ pitayas had the same sequences (Fig. S2). Sixteen targets showed decreased or increased expression trends along with the increased or decreased expression of the miRNA, suggesting that they might be actively cleaved by miRNAs (Fig. 12). For example, comp36993_c0_seq3 (Fig. 12A9 and B9) and comp35191_c0 (Fig. 12A10 and B10) targeted by aau-miR160_L-4R+1, comp15143_c0 (Fig. 12A5), comp24362_c0 (Fig. 12A6) and comp403340_c0 (Fig. 12A7) targeted by mdm-miR858, and comp234190_c0 (Fig. 12A2) targeted by nta-miR6020b showed increasing at first and decreasing thereafter while aau-miR160_L-4R+1, mdm-miR858 and nta-miR6020b showed decreasing at first and increasing thereafter at all pulp coloration stages of pitaya. comp25631_c0 (Fig. 12A15) targeted by ssp-miR156_L+1R-1_1ss4AC showed decreasing while ssp-miR156_L+1R-1_1ss4AC showed increasing at all pulp coloration stages of pitaya. However, some targeted genes and their miRNAs such as comp28219_c0 (Fig. 12B11) targeted by PC-5p-23845_39 in ‘Guanhuahong’ pitaya and comp25631_c0 (Fig. 12A13) targeted by osa-miR529b_R-1_1ss8AG in ‘Guanhuabai’ pitaya had similar expression patterns at all pulp stages of pitaya.
5′RACE analyses were further verified the fourteen candidate targets. The comp25631_c0 (SPL6), comp24967_c0 (TF TT2), comp15143_c0 (TF MYB12), comp24362_c0 (anthocyanin regulatory C1 protein), comp36993_c0_seq3 (Hpcyt P450-like3), comp29967_c0 (cyt P450 83B1) and comp28219_c0 (protein-tyrosine sulfotransferase) were confirmed to be cleaved by their corresponding miRNAs (Fig. 13). The cleavage sites of PC-5p-192_7269 on comp29967_c0 and PC-5p-23845_39 on comp28219_c0 were both occurred at the 10th nucleotide from the 5′-end of miRNAs in the binding region. The cleavage frequency of PC-5p-192_7269 on comp29967_c0 and PC-5p-23845_39 on comp28219_c0 was up to 10/10 in ‘Guanhuabai’ and in ‘Guanhuahong’ pitayas, respectively. This finding confirmed that PC-5p-192_7269 and PC-5p-23845_39 can guide the cleavage of the mRNA of comp29967_c0 and comp28219_c0, respectively. Cleavage occurred mostly at the 10th nucleotide from the 5′-end in the binding sites. However, comp24362_c0 occurred at the 9th nucleotide in ‘Guanhuahong’ pitaya, which may be due to the wide range of miRNA cutting mRNA caused by siRNA interference. Besides, the same gene from ‘Guanhuahong’ pitaya is different from ‘Guanhuabai’ pitaya (Fig. 13), which may be responsible for the different pulp colors of the two pitayas.
The relationship between miRNAs and their target genes
A total of 5 miRNAs and 7 corresponding target genes i.e. aau-miR160_L-4R+1-comp35191_c0, nta-miR6020b-comp234190_c0, PC-5p-192_7269- comp29967_c0, PC-5p-23845_39-comp28219_c0, mdm-miR858-comp15143_c0, mdm-miR858-comp24362_c0 and mdm-miR858-comp403340_c0 were verified their interaction in tobacco transient expression system (Fig. 14). This interaction could affect miRNA processing resulting in higher precursor accumulation and reduced mature miRNA in pitaya, and further regulate betalain biosynthesis of pitaya.