Morphological analysis
There were no significant differences in head characteristics among the Ogura CMS line T54C, the DGMS line T54M and their maintainer line T54S (the inbred line), but there were some other differences in maturity days, flowering time, production and seed characters, which are described in detail in Table 1. Our analysis of the agronomic traits of broccoli at harvest revealed that the Ogura-CMS broccoli line showed stronger performance in head weight and head width than the DGMS line or the maintainer line (Table 1). The DGMS line had longer production (days to maturation and flowering) than the Ogura-CMS and inbred lines. In contrast, the inbred line presented a higher seed yield and seed germination rate than the DGMS and Ogura-CMS lines. The Ogura-CMS line showed the lowest seed yield and germination rate.
Transcriptome analysis
Six cDNA libraries from broccoli florets obtained at harvest times from the T54C, T54M and T54S lines were subjected to Illumina sequencing with two biological replicates for each sample. After filtering of invalid reads, 159,424,568 clean reads and 47,827,370,400 clean bases (47.8 Gb) were obtained (Table 2). The Q20 and Q30 percentages were 98.12–98.41% and 95.06–95.62%, respectively, with GC percentages of 45.08–46.80% (Table 2).
Genomic Characteristics and-Nucleotide Polymorphism (SNP) Distribution
As shown in Table 2, the average amounts of clean bases data for T54C, T54M and T54S were 8.77 Gb, 7.91 Gb and 7.25 Gb, respectively. The average GC percentages of T54C, T54M and T54S were 46.77%, 45.82% and 46.45%, respectively. There were no obvious differences in numbers of mapped reads. In addition, the alternative spicing pattern distributions of the samples were shown in Fig. 1. The T54C vs T54M comparison presented more alternative splicing of five patterns than the T54C vs T54S and T54M vs T54S comparisons, and the T54C vs T54S and T54M vs T54S comparisons showed similar alternative splicing patterns with regard to alternative 3' splice sites (A3SSs), alternative 5' splice sites (A5SSs), mutual exclusive exons (MEXs), mutual intron retention (IR) and exon skipping (ES).
As shown in Table 3, we found that eight regions of the genome contained SNP. The T54S line showed a higher percentage of SNP distribution in exonic regions (54.39%) than the T54C (52.35%) and T54M (52.79%) lines but a lower percentage in intergenic regions (15.85%) than the T54C (16.96%) and T54M (17.25%) lines. The remaining regions of SNP distribution showed no clear differences.
Gene optimization and correlation test
A total of 5754 optimized regions involving 3872 unigenes were predicted using the gene structure optimization method (Fig. 2; Table S1). Approximately 13 mitochondrial unigenes (chrMT) were optimized (Table S1).
To study the reliability the results and differences among the samples, correlation tests were performed for each sample comparison based on the log2 (FPKM) values of gene expression, as shown in Table 4. All the coefficients of correlation for the pairs of samples ranged from 0.913 to 0.973, with the Pearson’s R2 test values ranging from 0.867 to 0.973 (R2 > 0.8). The coefficients of correlation among replicate T54C, T54M and T54S samples were 0.965 (R2 = 0.943), 0.960 (R2 = 0.925) and 0.966 (R2 = 0.919), respectively. The coefficients of correlation between T54C and T54M, T54C and T54S, and T54M and T54S ranged from 0.867 to 0.918, 0.867 to 0.905, and 0.894 to 0.973, respectively. Therefore, the inbred line and the DGMS line are more closely related than the inbred line and the Ogura-CMS line.
Diffential genes expression analysis and DEG clustering
To explore the DEGs, the gene expression variations were analyzed in three comparisons T54C vs T54M, T54C vs T54S and T54M vs T54S (Fig. 3A, B). In the T54C line compared with the T54M line, 505 and 1109 genes were up- and downregulated, respectively. In the T54C line compared with the T54S line, 585 and 1073 genes were up- and downregulated, respectively. In the T54M line compared with the T54S line, 469 and 534 genes were up- and downregulated, respectively. The DEG cluster analysis (Fig. 4) revealed that the T54C line showed similar up- and downregulated genes in comparisons with the T54M and T54S lines, but the T54M vs T54S comparison presented very different numbers and types, as shown in Fig. 3B.
Analysis of the Ogura-CMS, DGMS and inbred lines
To better understand the function and biological processes associated with the DEGs, GO term (p < 0.01) and KEGG pathway (p < 0.01) analyses were performed for the DEGs between T54M and T54S, T54C and T54S, and T54C and T54M. The GO analysis classified the DEGs from these comparisons into 60 GO categories were separated into three main groups: the biological process (BP) group, the molecular function (MF) group and the cellular component (CC) group (Fig. 5). With regard to the T54M vs T54S comparison, GO analysis revealed that 332 and 331 upregulated genes corresponded to the intracellular and intracellular parts terms, respectively, both belonging to the CC group. However, other genes associated with terms in the CC group, including those associated with membrane (302), cell periphery (225), plasma membrane (182) and cellular region (104) terms, as well as 228 genes in the BP group were downregulated in the T54M line (Fig. 5A). With regard to the DEGs between T54C and T54S, GO analysis revealed that upregulated genes were associated with the response to stimulus (389), response to stress (271), or response to chemical (255) terms in the BP group, or with the nucleus (419) terms in the CC group. The organism metabolic process (270), extracellular region part (421), intrinsic component of membrane (300), and cell periphery (232) terms were associated with downregulated genes (Fig. 5B). In the T54C vs T54M comparison, more upregulated genes were associated with the cell (423) and cell part (423) terms in the CC group than with terms in the BP and MF groups, and the extracellular region (122) term in the CC group was associated with more downregulated genes than other terms (Fig. 5C).
The KEGG pathway results were comprehensively evaluated and filtered (p < 0.01), and there were eight terms strongly related to plant development. Specifically, photosynthesis-antenna proteins (7 upregulated genes), plant hormone signal transduction (41 upregulated genes), phenylpropanoid biosynthesis (39 downregulated genes), adenosine triphosphate (ATP)-binding cassette (ABC) transporters (4 up- and 5 downregulated genes), fatty acid elongation (5 upregulated genes), fatty acid metabolism (25 downregulated genes), flavonoid biosynthesis (15 downregulated genes) and polycyclic aromatic hydrocarbon degradation (8 upregulated genes) terms were distributed among different samples (Table 5). But there needs further validation of these genes in broccoli.
Identification of functional modules in PPI networks
The Protein-Protein Interactions Network (PPI-Net) was developed in 2011 [25, 26]. To better understand the interactions of the DEGs, PPI networks were constructed using Cytoscape software. As shown in Fig. 6, unlike the T54M vs T54 comparison, which showed a radiating PPI network, the T54C vs T54S comparison and T54C vs T54M comparison showed branching PPI network. These networks accurately reflected the different backgrounds of the Ogura-CMS, DGMS and inbred lines of broccoli. The results also suggested the existence of a close relationship between the DGMS and inbred lines of broccoli but a distant relationship between both of these lines and the Ogura-CMS line. In the PPI network for T54C and T54S, the most highly upregulated genes were LOC106311788 (auxin-responsive protein IAA3-like) and LOC106339384 (E3 ubiquitin protein ligase DRIP2), and the most highly downregulated genes were LOC106343730 (auxin-responsive protein IAA7) and LOC106331616 (auxin-responsive protein IAA7-like). In the PPI network for T54C and T54M, the most highly upregulated genes were LOC106312313 (uncharacterized protein At5g05190), LOC106327001 (auxin response factor 7) and LOC106317959 (auxin response factor 7-like), and the most highly downregulated gene was LOC106304816 (SKP1-like protein 14). In the PPI network for T54M and T54S, the most highly upregulated gene was LOC106298512, and the clusters of downregulated genes with higher degrees were LOC106322852 (protein TRANSPARENT TESTA 12-like), LOC106325277 (ribonuclease 1), LOC106295628 (protein TRANSPARENT TESTA 12-like) and LOC106340694 (protein ORGAN SIZE RELATED 1).
Special DEG selection and qRT-PCR validation
To better understand and explain the different agronomic traits of the broccoli lines T54S, T54M and T54C during developmental stages in the spring and autumn of 2016, fourteen DEGs were selected and their expression was validated by qRT-PCR. These genes were associated with photosynthesis-antenna proteins, plant hormone signal transduction, phenylpropanoid biosynthesis, ABC transporters, fatty acid elongation, fatty acid metabolism, flavonoid biosynthesis and polycyclic aromatic hydrocarbon degradation. One additional gene associated with each term was targeted, and the transcriptome data and qRT-PCR results showed similar patterns for the fourteen DEGs (Figure S2).