Quantification of anthocyanidins
We quantified six anthocyanidins (delphinidin, cyanidin, pelargonidin, peonidin, malvidin, and petunidin) known to be involved in color development. Two high contents of malvidin and petunidin were detected in C-S4, the contents of which were 7.0 ug/g fresh weight (FW) and 2.5 ug/g FW, respectively. Otherwise, no color anthocyanidins were detected in the cream flowers of M-S4 (Fig. 1b).
Sequencing and analysis of the floral transcriptome using the PacBio Iso-Seq platform
To identify transcripts that are as long as possible, the transcriptome of the mixed sample from different tissues of C (see Methods for details) were sequenced by the Iso-Seq system, yielding 14.33 million subreads. After the quality control of Isoseq, 140,995 isoforms were obtained, of which 16,340 were high-quality isoforms with an accuracy higher than 99%. A total of 98.52% of the corrected isoforms were mapped to the Medicago genome (M. truncatula Mt4.0v2) using GMAP, and TOFU processing yielded 33,908 non-redundant isoforms (Table 1). The non-redundant transcript isoforms were used in subsequent analyses.
We compared the 33,908 isoforms against the Medicago genome set (Mt4.0v2), and 7784 (23%) new isoforms of annotated genes (ratio coverage < 50%) were obtained using MatchAnnot software (https://github.com/TomSkelly/MatchAnnot), and 513 novel isoforms were obtained that did not overlap with any annotated genes. To determine if the 513 novel isoforms were present in other plants, we conducted BLASTX searches against Swiss-Prot (E-value ≤ e-10, see Methods). In total, 309 (60.23%) of these isoforms were annotated in the Swiss-Prot database, and the remaining isoforms were unannotated (Table S2).
We used AStalavista to determine the number of transcripts generated by each of the five main types of alternative splicing (IR (intron retention), ES (exon skipping/inclusion), A3SS (alternative 3ˊ splice sites), A5SS (alternative 5ˊ splice sites) and MXE (mutually exclusive exons)). Among all alternative splicing events, IR predominated, accounting for 27.5% of alternative splicing transcripts (Fig. 2). MXE were being the least, accounting for 1.9% of alternative splicing transcripts (Fig. 2).
By filtering and excluding transcripts with an ORF of more than 300 bp, 143 lncRNAs were finally obtained. The length of the lncRNAs varied from 202 bp to 2,733 bp, with the majority (72%) having a length ≤ 700 bp. The mean length was 682 bp, which was much shorter than the mean length of all 33,908 isoforms (2,146 bp).
Sequencing and analysis of the floral transcriptome using the Illumina platform
For performance comparison and validation purposes, we also independently generated standard short read RNA-Seq data on the Illumina HiSeq™ XTen sequencing platform. Four floral organs from different developmental stages were sampled from both varieties. To this end, identification of DEGs from different floral organs could contribute to the understanding of the differential control of flower pigmentation. RNA-Seq analysis was performed on the samples described above with three biological replicates for each.
When compared to the PacBio transcript isoforms by BLASTN (coverage ≥ 0.85, e-value ≤ 1e-20, pairwise identity ≥ 90%, min bit score ≥ 100), 36% of the transcript contigs (29,662 contigs) exhibited similarity to 99% of the PacBio transcript isoforms (33,518 isoforms). There were 64% of the transcript contigs (53,870) and 1% of PacBio transcript isoforms (381 isoforms) that were unique to each of the datasets (Fig. 3).
Transcripts with normalized reads lower than 0.5 FPKM were removed from the analysis. In total, 28,365, 28,242, 28,088, and 28,185 transcripts were found to be expressed in C-S1, C-S2, C-S3, and C-S4, respectively. Similarly, 27,810, 27,726, 27,711, and 27,878 transcripts were identified in the samples from the respective stages of M. The numbers of expressed transcripts distributed in the 0.5-1 FPKM range, 1-10 FPKM range and ≥ 10 FPKM range are indicated in Fig. 4a.
Principal component analysis (PCA) revealed that the 24 samples could be clearly assigned to eight groups as C-S1, C-S2, C-S3 C-S4, M-S1, M-S2, M-S3 and M-S4 (Fig. 4b). The samples of C and M from the same stage exhibited a distant clustering relationship, suggesting that the overall transcriptome profile is evidently different for C and M at each developmental stage (Fig. 4b).
DEGs during the flower developments of alfalfa materials with purple and cream flower
The differences in gene expression were analyzed by comparing the four different floral development stages, using the thresholds of false discovery rate (FDR) value < 0.05 and fold change > 2. In total, 2,591, 1,925 and 3,771 DEGs were identified between C-S2 vs C-S1, C-S3 vs C-S2, C-S4 vs C-S3, respectively (Fig. 5a). Similarly, 3,282, 1,490 and 3,868 DEGs were identified between M-S2 vs M-S1, M-S3 vs M-S2, M-S4 vs M-S3, respectively (Fig. 5b). Contrasting S2 with S1, the down-regulated unigenes of C and M were similar to the up-regulated unigenes. Differently, the up-regulated unigenes were dominant between S3 vs S2, as well as between S4 vs S3 in both C and M.
In order to analyze the flower color formation differences in C and M, we compared the DEGs of C and M in the same flower development stage. In total, 4,052, 4,355, 3,293, and 4,181 DEGs were identified between M-S1 vs C-S1, M-S2 vs C-S2, M-S3 vs C-S3, and M-S4 vs C-S4, respectively. Furthermore, 1,693, 1,707, 1,511, and 2,092 DEGs were up-regulated, respectively (Fig. 6).
To identify the metabolic pathways related to flavonoid biosynthesis that were enriched, an analysis of KEGG pathway was conducted by comparing different flowering stages in C and M. With the flower blooming, the enriched pathways related to flavonoid biosynthesis increased evidently. Especially, between M-S4 vs C-S4, flavone and flavonol biosynthesis (ko00944), flavonoid biosynthesis (ko00941) and phenylpropanoid biosynthesis (ko00940) were enriched on the top 5 KEGG pathways (Figure S1), implying the crucial flower color formation stage.
Transcriptional profiles of the genes related to flavonoid biosynthesis
To determine the key genes involved in flavonoid biosynthesis, the genes with FPKM values lower than 5 were excluded. Phenylalanine ammonia-lyase (PAL, 15 isoforms), 4-coumarate: coenzyme A ligase (4CL, 27 isoforms), CHS (15 isoforms), chalcone isomerase (CHI, 3 isoforms), flavanone 3-hydroxylase (F3H) / flavonol synthesis (FLS) (3 isoforms), flavonoid 3′-monooxygenase (F3′H, 5 isoforms), F3′5′H (1 isoform), dihydroflavonol 4-reductase (DFR, 5 isoforms), anthocyanidin synthase (ANS, 4 isoforms), and UDP-glucose: flavonoid 3-O-glucosyltransferase (UFGT, 23 isoforms) were identified (Table S3). An expression heatmap was constructed based on the expression of these identified DEGs of the flavonoid pathway (Fig. 7). It was found that 101 isoforms, encoding 11 enzymes, showed large changes during flower development in both C and M.
Among these DEGs, most PAL genes showed down-regulated expression changes in C, but up-regulated expression patterns in M. In general, the FPKM values of many PALs were significantly higher in C than M (Fig. 7). It is possible that these PALs may be crucial in the formation of flower colors. Most genes encoding 4CLs, CHSs, CHIs, FLS/F3Hs, F3’Hs, F3’5’Hs, ANSs, and UFGTs exhibited similar expression patterns in both C and M with flower blooming However, the FPKM values differed greatly between C and M, indicating differential expression abundance in C and M. Additionally, we found 4 DFRs with different expression changes in C and M (particularly DFR1 and DFR2), the FPKM values of which were evidently higher in C than M, implying their potential functions in color formation in different flowers (Fig. 7).
Gene co-expression network analysis based on flower pigments
To reveal the regulatory network correlated with the changes in the successive developmental stages across the two varieties, we constructed the co-expression modules analysis by WGCNA (Fig. 8). Co-expression networks were constructed on the basis of pairwise correlations of gene expression across all samples. Modules were defined as clusters of highly interconnected genes, and genes within the same cluster have high correlation coefficients among them. From WGCNA, 18 co-expression modules were constructed, of which, the grey 60 module was the largest module, consisting of 2,520 unigenes, whereas the darkseagreen 4 module was the smallest, consisting of only 56 unigenes. The distribution of isoforms in each module (labeled with different colors) and module-trait correlation relationships is shown in Fig. 9. A number of modules displayed a close relationship with different stages.
The most important modules of our concern were the modules enriched in the C or M group, especially in S4 of C and M, which could help to distinguish the flower color phenotype. The modules of interest were thus selected according to the criteria |r| > 0.5 and P < 0.05, and were further annotated by KEGG and GO analysis. The module of skyblue 3 displayed a close relationship with M-S4. In the skyblue 3 module, many pathways related to color formation were enriched (P < 0.01). Among them, flavonoid biosynthesis (ko00941) and phenylpropanoid biosynthesis (ko00940) were the top 2 pathways (Table S4). Furthermore, the modules of bisque 4 and turquoise exhibited a close relationship with M or C, the enriched pathways (P < 0.01) of which were summarized in Table S4.
Candidates responsible for the loss of purple color in alfalfa with cream-colored flower
The expression patterns of 23 candidate genes according to the closed modules are indicated in Table 2. In summary, all 9 PALs were down-regulated during the flower ripening process in C, while in M-S4, they remained stable or declined initially and then increased. Additionally, their relative expression levels in S1-S3 of C were significantly higher than in M. Importantly, PAL6 and PAL9 were identified as candidate hub genes for the module of bisque 4. 4CL18 and 4CL22 were enriched in the module of bisque 4, and 4CL18 was identified as a candidate hub gene for this module. The much higher expression levels of 4CL18 in S1-S3 of C, which were evidently higher than M, were suggestive of a particularly important role for 4CL18 in the pathway. 4 CHSs were enriched in the module of skyblue 3, in which, CHS2, CHS4, and CHS8 were identified as candidate hub genes. They possessed the same expression changes in different stages of C and M, and in the M-S4, the relative expression levels of CHS2, CHS4, and CHS8 were 2.1-, 1.3-, and 2.5-fold higher than in C-S4. We also searched 3 CHRs enriched in these important modules, and found that the expression change patterns of CHR1, CHR2, and CHR3 were consistent with the enriched CHSs. Furthermore, F3’H4, DFR1, DFR2, UFGT22, and UFGT23 were enriched in these modules. In S1 and S2, the expression levels of F3’H4 were 1.2- and 2.0- fold higher in C than in M. With flower development in C, DFR1 was up-regulated and peaked at S3, however, DFR1 exhibited almost no expression in M. DFR2 was up-regulated and peaked at S3 in C, however, it exhibited low expression abundance and remained stable in M. The expression levels of DFR1 and DFR2 were evidently higher in all of the stages of C than M. Higher expression levels in C were also found in UFGT22 and UFGT23 (Table 2).
To further confirm these results and verify the expression of the above genes in the C and M, RT-qPCR was performed to analyze the expression patterns of 12 genes (Fig. 10). Most genes exhibited similar expression patterns between the RT-qPCR and RNA-Seq data, which confirmed the reliability of the RNA-Seq data.