Sequencing and assembly
The C. morifolium ‘ZY’ plants analyzed in this study produced both normal and mutant capitula (Figure 1). The normal capitula were composed of many rounds of ray florets with purple corollas and normally developed pistils. In contrast, the mutant capitula consisted of many rounds of mutant ray florets with green corollas as well as vegetative buds instead of pistils. We analyzed C. morifolium ‘ZY’ normal and mutant capitula, leaves, stems, and roots using PacBio sequencing, after which the normal and mutant capitula were separately analyzed using Illumina paired-end sequencing technology. The resulting sequences were assembled into 130,097 unigenes with an N50 of 3,013 bp and an average length of 2,510 bp (Table 1).
Gene annotation and functional classification
A total of 118,589 unigenes were annotated following a BLAST search of four databases [non-redundant (nr) protein database, Swiss-Prot, EuKaryotic Orthologous Groups (KOG), and Kyoto Encyclopedia of Genes and Genomes (KEGG)], leaving 11,508 (8.85%) unannotated unigenes. A total of 118,043, 101,048, 87,630, and 54,245 unigenes were annotated on the basis of searches of the nr, Swiss-Prot, KOG, and KEGG databases, respectively. Moreover, the Gene Ontology (GO) database was used for the functional annotation and analysis of genes, which were divided into the following three main categories: molecular function, cellular component, and biological process. Specifically, 36,144 unigenes were classified into 47 functional categories, including 19, 17, and 11 in the biological process, cellular component, and molecular function categories, respectively. The predominant biological process, molecular function, and cellular component GO terms among the genes were ‘metabolic process’ (20,871), ‘catalytic activity’ (22,818), and ‘cell’ (11,887), respectively. This implied that numerous metabolic activities were activated during the development of chrysanthemum capitula in a process regulated by the combined effects of the proteins encoded by these diverse genes. Additionally, a substantial proportion of the genes were annotated with the ‘cellular process’, ‘binding’, and ‘cell part’ GO terms, whereas ‘locomotion’, ‘transcription factor activity, protein binding’, and ‘extracellular matrix component’ were relatively uncommon GO terms (Figure 2).
The KOG database is usually used to identify orthologous and paralogous proteins. Additionally, JGI-predicted genes may be identified according to KOG classifications or IDs. The annotated sequences were used as queries to screen the KOG database to assess the completeness of our transcriptome library and the reliability of our annotation process. Of 118,043 nr hits, 87,630 sequences were assigned KOG classifications. Among the 25 KOG categories, the cluster for ‘general function prediction only’ (28,904, 32.98%) represented the largest group, followed by ‘signal transduction mechanisms’ (23,030, 26.68%) and ‘posttranslational modification, protein turnover, chaperones’ (19,436, 22.18%). Conversely, the ‘defense mechanisms’ (673, 0.77%), ‘extracellular structures’ (454, 0.52%), and ‘cell motility’ (170, 0.19%) clusters were the smallest groups (Figure 3).
To further evaluate the chrysanthemum transcriptome, all unigenes were aligned with the sequences in the KEGG database using the BLASTx algorithm (E-value < 10−5). As a collection of manually drawn pathway maps, KEGG pathways present the networks of molecular interactions in cells and particular organisms. Of the 118,043 unigenes, 31,062 had significant matches with at least one KEGG pathway in the database and were assigned to 133 KEGG pathways in total (Table 2). The most represented pathways were ‘metabolic pathways’ (12,473 members) and ‘biosynthesis of secondary metabolites’ (6,980 members), followed by ‘biosynthesis of antibiotics’ (3,268 members), ‘microbial metabolism in diverse environments’ (2,777 members), and ‘carbon metabolism’ (2,089 members). Additionally, 1,339 unigenes were associated with the ‘plant hormone signal transduction’ pathway.
Alternatively spliced unigenes
The long PacBio sequencing reads can provide extensive information about alternative splicing. In this study, 27,975 unigenes had two or more alternatively spliced isoforms, 15,074 had three or more distinct isoforms, and 10,909 had four or more distinct isoforms (Figure 4A). Seven alternative splicing types were identified based on a SUPPA analysis, including exon skipping (938, 5.5%), alternative 5′ splice site (3,044, 17.8%), alternative 3′ splice site (3,336, 19.5%), mutually exclusive exon (305, 1.8%), retained intron (8,705 51%), alternative first exon (646, 3.8%), and alternative last exon (108, 0.6%). Therefore, retained intron, alternative 3′ splice site, and alternative 5′ splice site were the main alternative splicing types (Figure 4B).
Comparison of the transcriptomes of normal and mutant capitula
Unigenes common to normal and mutant capitula
A total of 124,284 unigenes were shared by the normal and mutant capitula (Figure 5A). In contrast, 3,269 and 955 unigenes were specifically expressed in the normal and mutant capitula, respectively.
Genes differentially expressed between mutant and normal capitula
The transcriptomes of the normal and mutant capitula were compared, and the reads were mapped to the reference transcriptome. A total of 35,419 DEGs (8,232 up-regulated and 27,187 down-regulated in the mutant capitula relative to the corresponding levels in the normal capitula) were identified between the normal and mutant capitula (Figure 5B). The correlation coefficient for the gene expression levels in the normal and mutant capitula was 0.8897, which was determined using an algorithm developed from the correlation scatter plot.
A total of 131 DEGs were specifically expressed in the mutant capitula, including TCP1 and AP2/ERF domain-containing genes. Conversely, 2,132 DEGs were specifically expressed in the normal capitula, including some important transcription factor genes (MYB, GRAS, and BTF3 genes), ubiquitin-conjugating enzyme genes, zinc finger protein genes, and many unannotated genes. These genes may have important functions in developing chrysanthemum flowers, especially during the pistil determination and development stage. The production of normal capitula composed of ray florets with normally developed pistils and purple corollas and mutant capitula containing ray florets with green corollas and vegetative buds may be due to significant differences in the expression of these genes. Details regarding the annotation of the DEGs specifically expressed in the mutant and normal capitula are provided in Additional files 2 and 3, respectively.
The GO and KEGG pathway enrichment analyses of the DEGs uncovered differences in biological processes and pathways between the mutant and normal capitula. The expression levels of 256 genes annotated with the ‘reproduction’ GO term (GO:0000003) in the biological process category were all considerably lower in the mutant capitula than in the normal capitula. Of these genes, 11 were specifically expressed in the normal capitula, including WD40 and UBA1-like protein-encoding genes. These genes may play important roles in the regulatory pathways related to chrysanthemum reproduction (Additional file 4).
A total of 6,733, 7,216, and 3,879 DEGs were enriched in the biological process, molecular function, and cellular component categories, respectively (Additional files 5–7). In the biological process category, the main terms were ‘metabolic process’ (GO:0008152, 5,128 DEGs), ‘cellular process’ (GO:0009987, 4,758 DEGs), and ‘single-organism process’ (GO:0044699, 4,017 DEGs). In the molecular function category, the most represented terms were ‘catalytic activity’ (GO: GO:0003824, 5,765 DEGs), ‘binding’ (GO:0005488, 3,903 DEGs), and ‘organic cyclic compound binding’ (GO:0097159, 2,386 DEGs). Finally, in the cellular component category, the most common terms were ‘cell’ (GO:0005623, 2,633 DEGs), ‘cell part’ (GO:0044464, 2,630 DEGs), and ‘intracellular’ (GO:0005622, 2,490 DEGs). Thus, the physiological and biochemical activities involved in metabolic, cellular, and single-organism processes differed between the mutant and normal capitula. In total, 16,342 down-regulated and 5,485 up-regulated DEGs in the mutant capitula relative to the corresponding levels in the normal capitula were enriched in many KEGG pathways (Additional files 8 and 9). Interestingly, all of the DEGs enriched in the ‘brassinosteroid biosynthesis’ (ko00905) and ‘plant hormone signal transduction’ (ko04075) KEGG pathways were expressed at lower levels in the mutant capitula than in the normal capitula, implying that plant hormone signal transduction activities were suppressed in the mutant capitula. The enriched GO terms and KEGG pathways are listed in Additional files 5–9.
Important transcription factors differentially expressed between mutant and normal capitula
A total of 3,921 important transcription factor genes from 52 classes were detected, among which 963 from the following transcription factor families were substantially differentially expressed between the normal and mutant capitula: AP2 (14 members), ARF (35 members), B3 (41 members), BBR-BPC (3 members), BES1 (8 members), bHLH (75 members), bZIP (22 members), C2H2 (84 members), C3H (51 members), CAMTA (2 members), CO-like (5 members), DBB (11 members), Dof (9 members), E2F/DP (4 members), ERF (98 members), FAR1 (15 members), G2-like (39 members), GATA (8 members), GeBP (1 member), GRAS (29 members), GRF (1 member), HB-other (6 members), HB-PHD (1 member), HD-ZIP (51 members), HSF (20 members), LBD (7 members), LSD (1 member), MIKC (12 members), M-type (9 members), MYB (80 members), NAC (25 members), NF-X1 (12 members), NF-YA (3 members), NF-YB (8 members), NF-YC (5 members), Nin-like (40 members), S1Fa-like (6 members), SBP (13 members), SRS (2 members), TALE (17 members), TCP (3 members), Trihelix (22 members), WRKY (56 members), YABBY (1 member), and ZF-HD (8 members). More specifically, the ERF, C2H2, MYB, bHLH, and WRKY transcription factor families respectively had 98, 84, 80, 75, and 56 members with expression levels that were extremely different between the normal and mutant capitula. Additionally, some important transcription factor genes were expressed only in the normal capitula, including 36 C2H2 genes, 6 bZIP genes, 5 bHLH genes, 5 MYB genes, 4 HB-other genes, 2 C3H genes, 2 E2F/DP genes, 2 GATA genes, 1 ERF gene, 1 HSF gene, 1 NF-X1 gene, 1 TALE gene, and 1 Trihelix gene.
These results suggest that many transcription factors are important for floral development, but the functions of some transcription factors in developing flowers remain to be investigated. The important transcription factor genes substantially differentially expressed between the normal and mutant capitula may be crucial for the phenotypic variations between the normal and mutant capitula. These genes are presented in Additional file 10.
Identification and expression analysis of genes involved in the photoperiod and GA pathways in chrysanthemum
As a typical short-day plant, chrysanthemum can flower in response to a single short day. Homologs of the important regulators of the photoperiod pathway in chrysanthemum were identified. Molecular genetic studies have identified many genes required for responses to the day length, with some encoding important regulators of flowering, whereas other genes encode components of the light signal transduction pathways or pathways involved in circadian signaling, including PHYTOCHROME (PHY), CRYPTOCHROME (CRY), LATE GIGANTEA (GI), and FKF1 (Flavin binding, Kelch repeat, F-box protein 1). In this study, many genes identified based on the transcriptome sequences were revealed as homologs of photoreceptor and circadian clock components involved in the photoperiod pathway (Figure 6A). On the basis of the protein annotations of the mutant and normal capitula transcriptome sequences, many genes were identified, including several CRY1 and CRY2 homologs as well as homologs of PHYA, PHYB, FKF1, LHY, EFL1, EFL3, EFL4, TOC1, and GI. Moreover, homologs were detected for CONSTANS (CO), which is critical for the photoperiod response, and for FT (Flowering Locus T), which is targeted by CO. Many MADS-box genes are important for promoting floral meristem identity, including SHORT VEGETATIVE PHASE (SVP), SUPPRESSOR OF CONSTANS1 (SOC1), and APETALA1 (AP1). PISTILLATA (PI) is a floral organ identity gene that specifies petal and stamen identities in the A. thaliana flower [20] . Additionally, AGAMOUS (AG) interacts with LEAFY (LFY) and TERMINAL FLOWER1 (TFL1) to maintain the identity of an existing floral meristem [21]. We identified homologs of these MADS-box genes. APETALA2 (AP2) encodes an important promoter of floral meristem identity. Two AP2 homologs were identified. LEAFY (LFY), which is vital for the regulation of floral meristem identity, is initially expressed very early throughout the presumptive floral meristem. We did not detect the expression of LFY homologs in the mutant and normal ‘ZY’ capitula, probably because these genes were no longer expressed at the full-bloom stage of the capitula. The SOC1 homolog identified in this study encodes an upstream regulator of LFY expression. Interestingly, its expression level was significantly higher in the mutant capitula than in the normal capitula. As an A-class-like gene, AP1 expression is directly activated by LFY [22, 23]. The AP1 homologs identified in this study were all more highly expressed in the mutant capitula than in the normal capitula.
Another A-class gene, AP2, is not a MADS-box gene and it encodes a transcription factor in the AP2/EREBP family. Two AP2 homologs were identified, both of which were more highly expressed in the normal capitula than in the mutant capitula. Most core eudicot species include three distinct B-class gene lineages: PI, euAP3, and TM6; however, TM6-like genes seem to have been lost in Arabidopsis and Antirrhinum species [24]. In the current study, we identified PI and AP3 homologs, but the expression of the TM6-like genes was undetectable. In contrast, TM6-like gene expression was detected in chrysanthemums in earlier studies [25, 26]. We also identified homologs of the C-class gene AG and E-like MADS-box genes in this study. The A-, B-, C-, and E-like genes were all expressed in the mutant capitula, which lacked normal stamens and pistils. Details regarding the annotation of the important genes involved in the photoperiod pathway are provided in Additional file 11.
A comparison of the expression of the detected MADS-box genes between the mutant and normal capitula revealed that the expression levels of many AP1, SEP, and AGL homologs were slightly higher in the mutant capitula than in the normal capitula. In contrast, the AP3 and PI homologs were expressed at lower levels in the mutant capitula than in the normal capitula, with AP3 homolog expression in the mutant capitula less than half of that in the normal capitula (Figure 6B). This finding may provide researchers with an important clue regarding the molecular mechanism underlying the phenotypic variations between normal and mutant capitula. Details regarding the annotation of the MADS-box genes are provided in Additional file 12.
Previous research proved that GAs, sugars and light help regulate various pathways required to complete the flower development process [27]. The circadian clock is affected by GA signaling, which is controlled by the transcriptional regulation of two GAINSENSITIVE DWARF1 (GID1) GA receptor genes (GID1a and GID1b) in A. thaliana [28]. Earlier studies demonstrated that GA promotes A. thaliana petal, stamen and anther development by inhibiting the function of the DELLA proteins encoded by REPRESSOR OF ga1-3 (RGA), GA-INSENSITIVE (GAI), RGA-LIKE1 (RGL1), RGL2 and RGL3. The GID1a, GID1b and GID1c genes of A. thaliana have been identified [29]. The expression of GASA genes is up-regulated by GA and down-regulated by the DELLA proteins GAI and RGA, which are involved in stem elongation or floral development [30]. In this study, we identified homologs of DELLA protein-encoding genes (RGA, GAI and RGL) as well as GID1 (GID1a, GID1b and GID1c) and GASA (GASA10 and GASA14) genes. Most of the RGA, GAI, RGL1, RGL2 and RGL3 homolog expression levels were significantly higher in the mutant capitula than in the normal capitula. Additionally, the homologs of GA receptor genes (GID1a, GID1b and GID1c) and GA-regulated protein-encoding genes (GASA10 and GASA14) were expressed at lower levels in the mutant capitula than in the normal capitula (Figure 6C). Therefore, the GA signaling pathway was probably suppressed in the mutant capitula. Details regarding the annotation of the important genes involved in the GA pathway are provided in Additional file 13.
Identification and analysis of important regulatory and functional genes in the anthocyanin biosynthesis pathway and the pigments in the corollas of chrysanthemum florets
The MYB-bHLH-WD40 (MBW) activator complexes modulate the expression of downstream genes required for flavonoid biosynthesis in plants. These complexes are composed of R2R3 MYB transcription factors (MYB), the basic helix-loop-helix (bHLH) transcription factors [e.g., Glabra 3 (GL3), Transparent Testa 8 (TT8), and Enhancer of Glabra3 (EGL3)], and the WD40-repeat protein TRANSPARENT TESTA GLABRA1 (TTG1) [31]. The MBW activator complexes directly mediate the expression of late anthocyanin biosynthetic genes, including chalcone isomerase (CHI), flavonoid 3′-hydroxylase (F3′H), dihydroflavonol reductase (DFR), and anthocyanin synthase (ANS) genes, leading to the accumulation of anthocyanins [32].
To explore the molecular basis of the flower color differences between the normal and mutant capitula, we identified and analyzed the expression of genes encoding the R2R3 MYB, bHLH and WD40-repeat proteins, including the homologs of MYB113, MYB114, MYB305, MYB46, Glabra 2 (GL2), Transparent Testa 12 (TT12) and TTG1. The expression levels of most of the R2R3 MYB genes were significantly down-regulated in the mutant capitula, with some genes not expressed at all (Figure 7A, Additional file 14). Similarly, the expression levels of the bHLH and WD40-repeat protein genes (GL2, TT12, and TTG1) were also considerably down-regulated in the mutant capitula. Hua Li et al. suggested that MdMYB8 contributes to the regulation of flavonoid biosynthesis, with the overexpression of MdMYB8 promoting flavonol biosynthesis in crabapple [33]. In this study, four MYB8-like genes (MYB8Cm1, MYB8Cm2, MYB8Cm3, and MYB8Cm4) were not expressed in the mutant capitula lacking anthocyanins; the lack of expression was confirmed by quantitative real-time PCR (qRT-PCR). Flavonol is an upstream substrate for anthocyanin biosynthesis. Therefore, this result implied these four MYB8-like genes may encode important regulators of anthocyanin biosynthesis in the corollas of chrysanthemum florets.
The key functional genes in the anthocyanin biosynthesis pathway, including genes encoding chalcone synthase, chalcone flavonone isomerase, flavanone 3-hydroxylase, flavonoid 3′-hydroxylase, dihydroflavonol 4-reductase, glucosyltransferase, 3-O-glucoside-6″-O-malonyltransferase, acyltransferase, and anthocyanidin synthase, were identified based on the transcriptome (Figure 7B). Most of the late anthocyanin biosynthetic genes were expressed at lower levels in the mutant capitula than in the normal capitula (Figure 7A).
Interestingly, all four DFR homologs and one CHI homolog were expressed at extremely low levels (close to 0) in the mutant capitula. A qRT-PCR analysis indicated that the CHI homolog was not expressed in the mutant capitula, but was expressed in the normal capitula. Information regarding the annotation of these genes in the anthocyanin biosynthesis pathway is provided in Additional file 14.
The pigment types and contents determine the diversity in flower colors. A qualitative analysis of the pigments in the florets of normal and mutant capitula was performed using an HPLC system. Anthocyanins were detected in the florets of normal capitula, but not in the florets of mutant capitula (Figure 8B). The detected anthocyanins were mainly delphinidin, cyanidin, petunidin, pelargonidin, peonidin, and malvidin (Figure 8A). Accordingly, the down-regulated expression of genes encoding MBW activator complex components inhibited the expression of late anthocyanin biosynthetic genes, including CHI and DFR, leading to an anthocyanin deficiency in the mutant capitula.
Verification of gene expression profiles in qRT-PCR assays
To further verify the expression profiles of the unigenes revealed following the Illumina sequencing analyses, 16 unigenes were selected for a qRT-PCR analysis of the mutant and normal capitula originally used for the RNA-seq experiment. Four MYB-like genes (MYB8Cm1, MYB8Cm2, MYB8Cm3, and MYB8Cm4) and one CHI gene (CHICZ-1) were selected because of their important regulatory effects on anthocyanin biosynthesis. The other analyzed genes were selected randomly, including one COP1 gene (COP1CZ), one EF2 gene (EF2CZ) specifically expressed in the normal capitula, one histone H2B gene (HIS2BCZ), one catalase gene (CAT3CZ), one photosystem II 5 kDa protein gene (PSBTCZ), one annexin D8 gene (ANN2CZ), one arginine kinase gene (ARGKCZ), and four unigenes encoding uncharacterized proteins (UnknownCZ1, UnknownCZ2, UnknownCZ3, and UnknownCZ4). The resulting data for all 16 genes were consistent with the sequencing data (Additional file 15, Figure 9).