2.1 Comparative transcriptome analysis among different color varieties of R. simsii
2.1.1 Statistical analysis of RNA-seq
Nine cDNA libraries with three repetitions for each variety were sequenced (Figure 1). Total reads were 42,265,228-78,751,278 for ‘Red variety’, 50,007,378-58,400,834 for ‘Pink variety’, and 62,882,854-74,057,210 for ‘Crimson variety’, respectively (Table 1). Clean reads were obtained through removing adaptors, low quality and ambiguous sequences. The rates of clean reads were all higher than 99.902%. In particular, more than 79.25% clean reads could be mapped to the reference genome, in which 3.56%-3.86% could be multiple mapped. Furthermore, the Q20 values were all more than 97.49%, and GC contents ranged within 46.30%-46.81% (Table 1). Numbers of InDel and SNP markers were 16,196-20,856 and 403,918-508,043, respectively. Exon skipping was the main alternative splicing events, followed by alternative 3' splice site and alternative 5' splice site, while mutually exclusive exon was the least (Table 2). In total, a set of 29,429 transcripts were obtained and assembled into 97,856 unigenes (72,632,054 bp) with an N50 value of 1,223 bp. The average lengths of clean reads and unigenes were 785 bp and 726 bp, respectively. Expression level of unigenes were mainly 0.1-3.75, as the ratio ranged within 29.55%-31.46% (Table 3).
2.1.2 Analysis of differentially expressed mRNAs between three varieties
After data filtering (P value<1E-3 , FDR<1E-3, and Fold Chang>2), 3,129, 5,755, and 5,295 DEGs were identified through comparative transcriptome analysis between ‘Red variety’ and ‘Pink variety’, ‘Red variety’ and ‘Crimson variety’, as well as ‘Pink variety’ and ‘Crimson variety’ (Figure 2). Compared with ‘Red variety’, 1,507 DEGs were up-regulated and 1,622 DEGs were down-regulated in ‘Pink variety’ (Table S1). In comparison with ‘Red variety’, 2,148 and 3,607 DEGs were up-regulated and down-regulated in ‘Crimson variety’, respectively (Table S1). Compared with ‘Pink variety’, expression of 2,089 GEGs were enhanced, while the expression of 3,206 DEGs were down-regulated in ‘Crimson variety’ (Table S1). In these three pairs of comparisons, typical DEGs were involved in “catalytic activity” (GO:0003824), “binding” (GO:0005488), “metabolic process” (GO:0008152), “cellular process” (GO:0009987) (Figure 3). Furthermore, “metabolic pathways” (map01100), “biosynthesis of secondary metabolites” (map01110), “plant-pathogen interaction” (map04626), and “phenylpropanoid biosynthesis” (map00940) were the main differentially enriched pathways among three R. simsii varieties (Figure 4). Statistical analysis was performed through calculating Pearson’s correlation coefficient (r) value based on DEGs, and replicates in each samples showed high correlation (Figure 5). Compared with ‘Crimson variety’, ‘Red variety’ and ‘Pink variety’ had more association. Moreover, gene cluster inferred that DEGs could be clustered into two clades (Figure 5). Good correlation (R2=0.8723) was observed between the RNA-seq data and qRT-PCR data.Negative correlation was observed between the expression levels of miRNAs and corresponding target genes.
2.2 Small RNA meta-analysis among different color varieties of R. simsii
2.2.1 Statistical analysis of small RNA seqeucing
Removing low quality reads and adaptors, 15,836,081-20,526,460, 14,793,802-17,324,699, and 11,947,123-15,428,901 clean reads (≧18nt) were obtained for ‘Red variety’, ‘Pink variety’, and ‘Crimson variety’, respectively (Table 5). In particular, the Q20 and Q30 values were all above 98.94%. and 96.82%, respectively. GC contents varied from 55.34% (‘Pink variety’-2) to 56.9% (‘Crimson variety’-3). In relation to R. simsii genome, the percentage rate of mapped reads ranged from 67.68% (‘Crimson variety’-1) to 89.89% (‘Red variety’-3) (Table 4). Moreover, similar length distributions existed in three R. simisii varieties (Figure S1). Among 18-36 nt small RNAs, the majority ranged from 21 to 24 nt in length. In particular, the most abundant sRNAs were 24 nt, followed by 21 nt and 22 nt in all three varieties. The rRNAs (853,418-1,354,362), snoRNA (25,279-35,637), snRNA (1,494-3,088), sRNA (17,499-26,859), and tRNA (3,238-14,346) were annotated through searching against GenBank, tRNAdb, Rfam, Repbase, and Silva databases (Table 5).
2.2.2 Identification of miRNAs among three R. simsii varieties
Unannotated RNAs were subjected for miRNA identification. Totally, 151 known miRNAs belonging to 55 families were screened, and miR156 family was the largest represented family with 14 members, followed by miR166 (10 members), miR169 (8 members), and miR157 (7 members) (Table S2). Most of the conserved miRNA families had only one member. Bases at 1th and 24th had obvious uracil (U) base preference for known miRNA (Figure 6). In particular, 102 miRNAs were all expressed in all three verities (Table S3). In addition, 9, 10, and 6 miRNAs were owned together between ‘Red variety’ and ‘Pink variety’, ‘Red variety’ and ‘Crimson variety’, as well as ‘Pink variety’ and ‘Crimson variety’ (Table S3). Furthermore, 5, 9, and 10 miRNAs were unique to ‘Red variety’, ‘Pink variety’, and ‘Crimson variety’, respectively.
The miRNA sequences, meeting threshold of miRDeep2 analysis but possessing no known homologous miRNA gene families in miRBase, were used to predict novel miRNAs (total score > 0). In particular, 62 novel mature miRNA sequences were identified for R. simsii varieties (Table S4). In total, four length types of miRNAs were predicted, including 21 nt (44, 70.97%), 22 nt (14, 22.58%), 20 nt (3, 4.84%), and 24 nt (1, 1.61%). For these novel miRNA, 1th and 24th bases had U base preference, while G base preference existed at 23th base position (Figure 7).
2.2.3 Analysis of differentially expressed miRNAs between three varieties
Based on fold change (≥1 or ≤-1) and P-value (<0.05) criteria, 53 miRNAs were significantly differently expressed among three varieties (Figure 8). Compared with ‘Red variety’, l0 miRNAs were up-regulated and 10 miRNAs were down-regulated in ‘Pink variety’ (Table S5A); while the downregulated and upregulated miRNAs were 21 and 13 in ‘Crimson variety’, respectively (Table S5B). In related to ‘Pink variety’, the expression level of 17 miRNAs were downregulated, while 7 miRNAs were upregulated in ‘Crimson variety’ (Table S5C). For deeply clarifying molecular mechanism of flower color variation in three R. simsii varieties, expression patterns of miRNAs and corresponding target genes were analyzed through cluster analysis. Compared between ‘Red variety’ and ‘Pink variety’, novel_miR15, ath-miR167d, ath-miR162b-5p, and ath-miR166b-5p were highly expressed in the ‘Red variety’ (Figure 9A). In related to ‘Red variety’, ath-miR390a-3p, ath-miR156j, ath-miR2111a-5p, and ath-miR169b-5p had higher expression levels in ‘Crimson variety’ (Figure 9B). Compared with ‘Crimson variety’, ath-miR399a, ath-miR167d, novel_miR49, and ath-miR170-5p were highly expressed in ‘Pink variety’ (Figure 9C).
2.2.4 Target gene prediction and enrichment analysis
In order to investigate the functions of miRNAs in blooming flower of R. simsii, target genes of these differentially expressed miRNAs were predicted with transcriptome data as target database. Totally, 2,889 miRNA target site (386 target genes) were obtained. Among the miRNA-target mRNA pairs, 79.97% miRNA could target mutiple unigenes, and the numbers ranging from 1 to 40 (Table S6). Good correlation (R2=0.8687) was obtained between real-time PCR amplification and sequencing data, which further confirmed high reliability of RNA-seq and miRNA-seq in this research (Figure S2). The miRNAs negatively regulated expression of target mRNAs in R. simsii varieties.
Compared between ‘Red variety’ and ‘Pink variety’, 46 GO terms were enriched, which could be further categorized into three categories: biological processes (17), cellular components (3), and molecular functions (26) (Table S7A). For biologic process, the target genes were mainly involved in “meristem maintenance” (GO:0010073), “meristem development” (GO:0048507), “protein glycosylation” (GO:0006486), and “protein ubiquitination” (GO:0016567). In cellular component category, target genes were tightly associated with “nucleus” (GO:0005634) and“extracellular space” (GO:0005615). In molecular function, most target genes were mainly related to “lipid binding” (GO:0008289) and “GTP binding” (GO:0005525). Moreover, 16 pathways have been identified, and the significant pathways were “Monobactam biosynthesis” (map00261), “Selenocompound metabolism” (map00450), “Sulfur metabolism” (map00920), “Purine metabolism” (map00230), “Plant-pathogen interaction” (map04626), and “MAPK signaling pathway-plant” (map04016) (Figure 10A).
Compared between ‘Red variety’ and ‘Crimson variety’, a total of 72 GO terms were enriched, containing biological processes (25), cellular components (10), and molecular functions (37) (Table S7B). Among GO terms related to biologic process, target genes were mainly involved in “carbohydrate metabolic process” (GO:0005975), “glutathione metabolic process” (GO:0006749), and “DNA repair”(G0:0006281). In cellular component category, target genes were mainly associated with “nucleus” (GO:0005634) and “cCAAT-binding factor complex” (GO:0016602). Considering molecular function category, target genes were mainly related to “hemebinding” (GO:0020037), “glutathione transferase activity” (GO:0004364), as well as “hydrolase activity, hydrolyzing O-glycosyl compounds” (GO:0004553). Among the 28 enriched pathways, “Plant-pathogen interaction” (map04626), “MAPK signaling pathway-plant” (map04016), “Glutathione metabolism” (map00480), and “Phenylpropanoid biosynthesis” (map00940) were most enriched (Figure 10B).
Among the 53 GO terms enriched between ‘Pink variety’ and ‘Crimson variety’, 19, 4, and 30 terms were associated with biological processes, cellular components, and molecular functions, respectively (Table S7C). In the category of biological process, main target genes were associated with “signal transduction” (GO:0007165), “protein glycosylation” (GO:0006486), “carbohydrate metabolic process” (GO:0005975), and “translation” (GO:0006412). In cellular component category, target genes were mainly regarding to “nucleus” (GO:0005634) and “extracellular space” (GO:0005615). For the molecular function category, target genes were mainly related to “ATP hydrolysis activity” (GO:0016887), “lipid binding” (GO:0008289) and “glycosyltransferase activity” (GO:0016757). In total, 19 KEGG pathways have been enriched, mainly including “Plant-pathogen interaction” (map04626), “MAPK signaling pathway-plant” (map04016), and “Starch and sucrose metabolism” (map00500) (Figure 10C).
2.3 Integrative analysis of RNA-seq and miRNAomics
2.3.1 MiRNA-mRNA interaction network analysis
Three top significant enriched terms were screened between ‘Red variety’ and ‘Pink variety’: “DNA repair” (GO:0006281) was under biologic process category; “GTPase activity” (GO:0003924) and “GTP binding” (GO:0005525) were under molecular function (Figure S3). Compared between ‘Red variety’ and ‘Crimson variety’, “hydrogen peroxide catabolic process” (GO:0042744) and “response to oxidative stress” (GO:0006979) were associated with biologic process; “extracellular space” (GO:0005615) was significantly involved in cellular component; 4 terms under molecular function category were found, consisting of “hexosyltransferase activity” (GO:0016758), “UDP-glycosyltransferase activity” (GO:0008194),“serine-type endopeptidase inhibitor activity” (GO:0004867), and“peroxidase activity” (GO:0004601) (Figure S4). Compared between ‘Pink variety’ and ‘Crimson variety’, “extracellular space” (GO:0005615) and “nucleus” (GO:0005634) were enriched for the cellular component category; “glycosyltransferase activity” (GO:0016757) and “serine-type endopeptidase inhibitor activity” (GO:0004867) were enriched for molecular function category (Figure S5).
2.3.2 Anthocyanin synthesis related miRNA-mRNA regulatory network
Synthesis of three major anthocyanins, including pelargonidin, cyanidin, and delphinidin, were detected in R. simsii flowers. Particularly, 5 homologous of PAL genes, 13 homologous of CHS gene, 3 homologous of flavonoid 3-hydroxylase (F3H) genes, 5 homologous of flavonoid 3’,5’-hydroxylase (F3’,5’H) genes, 2 homologous of flavonoid 3'-monooxygenase, 3 homologous of flavonoid O-methyltransferase, 1 homologous of ANS genes, 1 homologous of anthocyanidin reductase (ANR), 3homologous of flavonoid O-methyltransferase (CROMT2), 6 homologous of flavonol synthase, and 4 homologous of DFR genesinvolving in anthocyanin synthesis were identified (Table S8). Additionally, 3 homologous of DELLA genes in related to GA pathway were screened. For the photoperiod pathway, 2 transcripts of phytochrome A (PHYA) genes, 1 transcripts of phytochrome B (PHYB) gene, 2 homologous of FLOWERING LOCUS T (FT) genes were screened. In regard to transcription factors, 138 homologous of MYB genes, 2 homologous of WD40 protein genes, 7 homologous of WRKY genes (WRKY1, WRKY2, WRKY22, and WRKY23), 8 homologous of transcription factor MYC2, and 24 homologous of MADS-box transcription factor were screened (Table S8). Moreover, 5 homologous of cytochrome CYP81E genes were found. These transcription factors might contribute a lot to the synthesis of colored anthocyanins.
Based on the integrative analysis of RNA-seq and microRNAs, ath-miR5658 could affect the synthesis of pelargonidin, cyanidin, and delphinidin through downregulating expression of anthocyanidin 3-O-glucosyltransferase in the anthocyanin synthesis process (Figure 11). Particularly, ath-miR868-3p could regulate isoflavonoid biosynthesis through downregulating the expression of CYP81E1/E7, which encoded isoflavone/4'-methoxyisoflavone 2'-hydroxylase. During the process of isoflavonoid biosynthesis, CYP81E1/E7 are vital in catalyzing the synthesis of 2’-hydroxyformononetin, 2’-hydroxydaidzein, and 2’,7-dihydroxy-4’,5’- methylenedioxy-isoflavonefrom daidzein; while converting genistein into 2’-hydroxygenistein and 2’-hyroxybiochanin A (Figure 12). For the flavonoid biosynthesis, naringenin could be converted into eriodictyol by flavonoid 3',5'-hydroxylase. Then, flavonoid 3',5'-hydroxylaseconverted eriodictyol into dihydrotricetin. In particular, the expression of flavonoid 3',5'-hydroxylase could be regulated by ath-miR156g (Figure 13). Furthermore, the conversion of pinobanksin to galangin was catalyzed by flavonol synthase, which could be regulated by ath-miR829-5p in flavonoid biosynthesis process. In related to carotenoid, flavone and flavonol biosynthesis, no miRNA was screened to regulate corresponding mRNA expression.