Phenotypic variations between two B. rapa accessions
The two B. rapa accessions were remarkably different in both SC, SZ and OC after harvest (Fig. 1a,b,c; Additional file 1: Table S1). The SC of B. rapa accession SWUK4 was yellow, whereas that of accession SWUK3 was black (Fig. 1a). The final SZ of SWUK4 seed (mean ± standard error of the longest dimension, 2.42 ± 0.05 mm) was significantly higher than that of SWUK3 (1.23 ± 0.01 mm), representing an approximately two-fold difference. And the fresh seeds of SWUK4 were visibly larger than those of SWUK3 at all seven sampling stages (Fig. 1b). Moreover, near-infrared reflection spectroscopy (NIRS) assays indicated that mature seed OC was significantly higher in SWUK4 (42.15±1.97%) than in SWUK3 (38.93±1.44%) (Fig. 1c). Gas chromatography (GC) measurements also identified differences in the abundances of the FA components (palmitic acid, stearic acid, oleic acid, linoleic acid, and linolenic acid) between the two accessions (Additional file 1: Table S1).
The speed of seed development is an important factor affecting the seed-filling process in B. rapa. The SZ of accession SWUK3 showed the greatest increase at 21–28 days after pollination (DAP) (0.75 ± 0.004 mm), followed by 7–14 DAP (0.4 ± 0.006 mm), whereas that of SWUK4 varied most at 21–28 DAP (0.94 ± 0.009 mm), followed by 28–35 DAP (0.63 ± 0.002 mm) (Fig. 1d), suggesting that the critical stage for SZ formation was 21–28 DAP in both B. rapa accessions, and the following stage, 28–35 DAP, was also quite important for the larger-seed accession. In addition, the two accessions showed the greatest increase in seed OC at 28–35 DAP (SWUK3: 18.02 ± 1.49%, SWUK4: 19.24 ± 2.84%) (Fig. 1e). The seed OC of SWUK4 increased greater than that of SWUK3 in all 7 developmental stages, especially 21–49 DAP (Fig. 1e). These results suggested that the critical stage for seed OC formation might be 28–35 DAP in B. rapa.
The silique length (SL) and thousand-seed weight (TSW) of the two B. rapa accessions were also significantly different. The SL of SWUK4 (40.90 ± 1.14 mm) was 1.6-fold longer than that of SWUK3 (25.00 ± 0.36 mm) (Additional file 1: Table S1), and the TSW of SWUK4 (7.06 ± 0.14 g) was approximate 2.9-fold greater than that of SWUK3 (2.42 ± 0.05 g) (Additional file 1: Table S1).
Transcriptome sequencing analysis
To generate transcriptional networks and identify key regulatory genes regulating SZ, SC, and OC in B. rapa, we collected 42 seed samples from two distinct accessions at seven seed developmental stages. Transcriptome sequencing generated a total of 320 Gb of raw data, including 1,076.28 million reads, averaging 7.62 G and 25.63 million reads per sample (Additional file 1: Table S2). Filtering of low-quality and contaminant reads left 1,034 million clean reads, with an average of 24.64 million reads per sample. The G+C content of all samples varied from 44.39% to 51.51%, and the range of Q30 was from 91.12% to 94.28%. After mapping, three samples (one sampled from SWUK3 at 14 DAP, the rest sampled from SWUK4 at 14 and 21 DAP) were removed because they had unique mapping rates of <70%. This resulted in an average unique mapping rate of 87.27% (Additional file 1: Table S2), indicating that the quality of our transcriptome sequencing reads was good enough for further regulatory network construction and identification of DEGs.
We eliminated one sample from SWUK4 at 49 DAP from further analysis because the correlation coefficient calculated for two of its biological replicates was only ~0.62, suggesting that this sample may have been contaminated. The remaining 38 samples all showed high reproducibility (r2 > 0.9) among biological replicates at each seed sampling stage (Additional file 2: Fig. S1). The correlations among the samples harvested at different seed developmental stages were much lower than those among biological replicates. Furthermore, the samples could be divided into two distinct groups in both B. rapa accessions—samples collected at 7, 14, and 21 DAP grouped together, whereas those from the other four time points formed a second group—suggesting that there might be a clear transition event at 21–28 DAP during seed development in B. rapa (Additional file 2: Fig. S1b,d).
Identification of DEGs
Through comparing the same seven seed developmental stages between the two accessions and the adjacent stages in each accession, a total of 28,954 unique DEGs were identified (Fig. 2, Additional file 1: Table S3). In the former of comparison, the number of DEGs rose over the course of seed development, suggesting that the variation in seed transcriptome between the two accessions gradually increased from seed germination to maturation (Fig. 2). The highest number of DEGs, 14,725, was observed at 49 DAP, and the number of downregulated DEGs was higher than that of upregulated DEGs at six stages, with the exception being 49 DAP (Fig. 2). In the later comparison, both two accessions had the highest number of DEGs at 21 DAP vs. 28 DAP (SWUK3: 12,468, SWUK4: 8,040) (Fig. 2, Additional file 1: Table S3), and the least number of DEGs at 14 DAP vs. 21 DAP (SWUK3: 19,06, SWUK4: 1,797) (Fig. 2, Additional file 1: Table S3). These results suggested that the 21–28 DAP might be the most important seed developmental stage in B. rapa, duo to the greatest number of DEGs. According to the transcriptome comparison between samples at adjacent developmental stages, the change of DEG number showed similar trend in two B. rapa accessions (Fig. 2), suggesting that there was not obvious difference of the seed developmental stage between the two accessions.
An analysis of TFs, transcription repressors (TRs), and kinases using the online program iTAK identified a total of 3,053 TFs, 602 TRs, and 1,500 kinase genes in B. rapa (Additional file 1: Table S4). Among the DEGs, we detected 2,261 unique differentially expressed TFs, indicating that the transcription of TFs (P = 2.48E-12) was regulated to a greater degree during seed development than was that of other genes, based on a chi-square test of the ratio between differentially expressed TFs and DEGs versus that between all TFs and all B. rapa genes. Among the TFs in B. rapa, the AP2/ERF-ERF, MYB, and bHLH families seem to play relatively important roles in seed development, as 169, 136, and 130 members of these families, respectively, were differentially expressed between the two accessions. Similar to TFs, the expression of 889 unique kinase genes and 355 unique TRs was also readily reprogrammed during seed development, but the significance levels (P = 6.33E-5 for kinase genes and P = 0.04 for TRs) determined using chi-square tests were lower than those for TFs.
Gene ontology (GO) enrichment analyses of DEGs
GO enrichment analysis revealed that the genes upregulated at 35, 42, and 49 DAP were significantly enriched in the term "fatty acid biosynthetic process" (GO:0006633). Given that OC differed significantly between the two B. rapa accessions, we deduce that transcriptional variation of these genes may play a crucial role in FA accumulation in B. rapa. Another group of upregulated genes may result in larger seeds through mechanisms involving the cell cycle regulation and CK signaling pathway, as they showed enrichment in "response to cytokinin stimulus" (GO:0009735), "cell cycle" (GO:0007049), "cell division" (GO:0051301), "M phase" (GO:0000279), "M phase of mitotic cell cycle" (GO:0000087), and "regulation of cyclin-dependent protein kinase activity" (GO:0000079) at the middle to late stages of seed development. Among the downregulated genes, important functions related to SC could be identified (Additional file 1: Table S5,6). The GO enrichment analyses showed that "phenylpropanoid biosynthetic process" (GO:0009699) and "flavonoid biosynthesis process" (GO:0009813) were enriched at all stages of seed development according to both methods, except at 49 DAP in the GO enrichment analysis. These results indicated that a reduction of pigment biosynthesis and accumulation may be the cause for the yellow seeds of B. rapa accession SWUK4 (Additional file 1: Table S5).
GO enrichment analysis of DEGs obtained from comparison of the adjacent stages revealed that the up-regulation genes of 21 DAP vs. 28 DAP and 28 DAP vs. 35 DAP in both two accessions were significantly enriched in GO term "fatty acid biosynthetic process" (GO:0006636), "lipid biosynthetic process" (GO:0008610), and "cellular lipid metabolic process" (GO:0044255), suggesting that 21-28 DAP and 28-35 DAP may be the key stages of seed OC information in B. rapa. Genes down-regulated in 7 DAP vs. 14 DAP of SWUK3 and up-regulated in 21 DAP vs. 28 DAP of SWUK4 were significantly enriched in GO terms "cell cycle" (GO:0007049) and "cell division" (GO:0051301), implying that the key stages for SZ increase were different between the two B. rapa accessions. (Additional file 1: Table S6).
Identification of expression patterns in DEGs by K-means clustering
To identify DEGs with similar expression patterns during seed development, we performed two independent K-means clustering tests for 28,954 DEGs in two accessions (Fig. 3a,b), and generated 12 optimal clusters for each accession. We identified several clusters with similar expression patterns between the two accessions, such as cluster 5 of SWUK3 and cluster 12 of SWUK4, both of which showed gene transcription levels that increased continuously with seed development (Fig. 3a,b). GO and KEGG enrichment analyses revealed that these genes were enriched in lipid storage, seed oil body biogenesis, and related pathways, suggesting that the expression levels of genes related to lipid biosynthesis elevated since 21 DAP in SWUK3, and 14 DAP in SWUK4, and both peaked at 35 DAP, which was consistent with the GO results of DEGs obtained from comparison of the adjacent stages in each accession.
A comparison of gene clusters in the two accessions revealed an interesting result in cluster 2 of SWUK3 and cluster 4 in SWUK4. Although the two clusters have different expression patterns, they were significantly enriched in similar functions involved in regulating the SZ, including cell cycle, cell division, and mitosis (Fig. 3c,d). In Arabidopsis, a total of 59 genes have been identified as core genes regulating the cell cycle and cell division. Based on a BLASTP analysis, we identified 101 homologs of these genes in B. rapa (Additional file 1: Table S7). Among these, the majority of positive regulatory genes showed similar patterns upon K-means clustering: they were highly transcribed at 7 and 28 DAP in SWUK3, but at 21–35 DAP in SWUK4 (Fig. 4e). Notably, the expression patterns were consistent with the respective phenotypic variations of SZ in the two accessions (Fig. 1d). Therefore, duration of cell cycle-specific gene expression at different stages may contribute to the difference of B. rapa seed SZ, and merit further investigation.
Construction of co-expression networks
In crops, complex traits are generally regulated by several transcriptional networks. To identify the co-expression networks associated with our target traits, we used the R WGCNA software based on the fragments per kilobase per million mapped (FPKM) expression matrix and phenotypic data of six traits: developmental stage, SZ, SZ increase, seed yellowness, OC and OC increase. The sample clustering and correlation coefficients revealed strong repeatability among the biological replicates, and no outliers needed to be removed (Additional file 2: Fig. S2a). The results were also in agreement with those from the calculation of correlation coefficients, indicating that the seed samples from 7–21 DAP were grouped together, whereas the samples from the remaining stages formed a second group, again suggesting that seed transcriptome reprogramming is closely related to seed developmental stage, but weakly affected by accession.
In the WGCNA pipeline, pickSoftThreshold calculation revealed that the optimal soft threshold was 18, where the fitting curve approached 0.9 (Additional file 2: Fig. S2b). Then, we used the automatic blockwiseModules network construction approach to identify co-expression modules (Additional file 2: Fig. S2c). This allows the visualization of modules by color scheme, showing genes that are highly correlated in the same color, and genes that are weakly correlated in different colors (Additional file 2: Fig. S2d). The module construction process demonstrated that our functional color modules were clearly divided. After merging of modules with similar expression pattern, this process produced a total of 15 color modules, each composed of genes with similar expression patterns over time (Additional file 2: Fig. S3a).
As shown in Additional file 2: Fig. S3a, the abovementioned six traits (developmental stage, SZ, SZ increase, seed yellowness, OC and OC increase) showed significant correlation with different modules. After enrichment analysis of GO for each color module (Additional file 1: Table S8), the genes of the module defined as MEgreen, which had the highest correlation with SZ increase (r2 = 0.73, P = 2E-7), were significantly enriched in "cell cycle" (GO: 0007049), "cell division" (GO: 0051301), "regulation of cell cycle" (GO: 0051726), and "DNA replication" (GO: 006260) (Additional file 2: Fig. S3b). This indicated that the genes related to cell cycle participated in the positive regulation of seed growth rate. The functions of genes in the MEsalmon color module, which showed a significant negative correlation with seed coat transparency (r2 = –0.69, P = 1E-6), were enriched in the terms "flavonoid metabolic process" (GO: 0009812), "flavonoid biosynthetic process" (GO: 0009813), "phenylpropanoid metabolic process" (GO: 0009698), and "phenylpropanoid biosynthetic process" (GO: 0009699) (Additional file 2: Fig. S3c). This revealed that flavonoids negatively regulate yellow seed coat, such as that found in SWUK4. Furthermore, the MEbrown module, which is closely correlated with SZ (r2 = 0.7, P = 1E-6) and OC increase (r2 = 0.7, P = 1E-6), was enriched in "photosynthesis" (GO: 015979), "photosynthesis, light harvesting" (0009765), "lipid biosynthetic process" (GO: 0008610), and "fatty acid biosynthetic process" (GO: 0006633) (Additional file 2: Fig. S3d). This demonstrated that lipid synthesis occurs in the late stages of seed development in B. rapa, and requires photosynthesis to provide necessary materials and energy.
Co-expression modules regulating SZ and SC
Co-expression network construction indicated that a SZ increase was most highly correlated with the MEgreen module (r2 = 0.73, P = 2E-7) (Additional file 2: Fig. S3a). The expression profiles of a large number of genes were highly correlated with both the module eigengene (average expression profile of module genes) and SZ in the MEgreen module (Fig. 4a). GO enrichment analysis indicated that the MEgreen module genes were significantly enriched in "cell cycle" (GO:0007049) and "cell division" (GO:0051301) (Additional file 2: Fig. S3b). A heatmap based on the expression levels of the MEgreen module genes revealed that the module eigengene of the MEgreen module was similar to the average expression profiles of cluster 3 in SWUK3 and cluster 4 in SWUK4 from the K-means clustering results (Fig. 3a,b, Fig. 4b), suggesting that the SZ of B. rapa was regulated mainly by genes related to the cell cycle and cell division.
To identify hub genes in our modules of interest, we first assessed gene connectivity (Kwithin) on the basis of the absolute value of Pearson’s correlations. We then considered genes with the top 30% Kwithin in each module as hub genes of those modules. In the 800 genes of the MEgreen module, the Kwithin values ranged from 29.01 to 285.97. Based on the iTAK results, a total of 21 TFs, 17 TRs, and 36 kinase genes were identified in this module (Additional file 2: Fig. S4a). Two TF genes with high Kwithin values, Bra.A05TSO1 (BraA05g024430, Kwithin = 217.73) and Bra.A09GRAS (BraA09g015380, Kwithin = 201.97), were selected as hub genes of the MEgreen module. Their Arabidopsis orthologs are AtTSO1 (AT3G22780) and AtSCL28 (AT1G63100, a TF of the GRAS family), respectively. In the primary Bra.A05TSO1 network, Bra.A05TSO1 was directly co-expressed with 147 genes, including 2 TFs (belonging to the E2F and B3 families, respectively), 7 TRs, and 11 kinase genes (Fig. 4c). The primary network of Bra.A09GRAS contained 248 co-expressed genes, including 5 TFs (2 B3, 1 GARP and 1 TUB family), 9 TRs, and 15 kinase genes (Additional file 2: Fig. S4b).
Another module of interest was MEsalmon, which showed a significant negative correlation with seed yellowness (r2 = –0.69, P = 1E-6) (Additional file 2: Fig. S3a). The genes in this module were enriched in the GO terms "flavonoid biosynthetic process" (GO:0009813), "regulation of flavonoid biosynthetic process" (GO:0009962), and "phenylpropanoid biosynthetic process" (GO:0009699) (Additional file 2: Fig. S3c). The gene expression levels in this module were clearly lower in SWUK4 than in SWUK3 at most of the sampling stages (Fig. 5a). These results suggest that the formation of yellow seeds in SWUK4 is most likely due to weak expression or silencing of flavonoid biosynthesis pathway genes. The co-expression network of the MEsalmon module comprised 91 genes, including 5 TFs, 1 TR, and 2 kinase genes (Fig. 5b). The Kwithin values of these genes varied from 6.17 to 37.62. Two well-known phenylpropanoid biosynthetic pathway TF genes, Bra.A05TT2 (BraA05g008220, R2R3-MYB, Kwithin = 30.23) and Bra.A09TT8 (BraA09g028560, bHLH, Kwithin = 27.54), were identified as hub genes underlying SC formation (Fig. 5c, d).
MEbrown module had the highest correlation coefficient with OC increase (r2 = 0.7, P = 1E-6) (Additional file 2: Fig. S3a). Genes in this module were enriched in the GO terms "lipid biosynthetic process" (GO:0008610), "fatty acid biosynthetic process" (GO:0006633), and "lipid metabolic process" (GO:0006629) (Additional file 2: Fig. S3d). Most of the genes in this module were most expressed at 28 and 35 DAP in the two accessions, but they had higher expression level in late seed development of SWUK4 (high OC) than SWUK3 (low OC) (Additional file 2: Fig. S5a). It indicates that the critical period for seed oil synthesis in B. rapa is 28-35 DAP. The co-expression network of the MEbrown module comprised 2,268 genes, including 209 TFs, 21 TRs, and 79 kinase genes. The Kwithin values of them from 6.33 to 511.45. Three TFs with the highest Kwithin values, Bra.A03GRF5 (BraA03g036220, Kwithin = 433.30), Bra.A09WRI1 (BraA09g045300, Kwithin = 432.83), and Bra.A06FUS3 (BraA06g038070, Kwithin = 414.13), were selected as hub genes of the MEbrown (Additional file 2: Fig. S5b,c,d).
Role of flavonoid pathway genes in SC formation
Because the abovementioned analysis suggested that the differences in SC and OC between the two B. rapa accessions might be caused by flavonoid pathway and FA biosynthesis genes, we performed a further expression comparison analysis of these two types of genes. All expression values were subjected to log2(FPKM+1) transformation and Z-score normalization.
For pathways involved in the formation of compounds contributing to SC (Fig. 6), there were 43 homologous genes in B. rapa (Additional file 1: Table S9). At the beginning of the pathway, the generation of p-cinnamoyl-CoA requires the three enzymes PAL, CH4, and 4CL, encoded by 14 genes in B. rapa (Additional file 1: Table S9). All of these genes showed significantly differential expression between the two accessions at one or more stages of seed development. Their expression patterns were similar, with expression increasing over time and being higher in the black-seeded accession SWUK3 at almost all stages. In the first step of flavonoid biosynthesis, the formation of p-cinnamoyl-CoA is catalyzed by CHS to produce the chalcone compound naringenin, which is then converted to its isomer naringenin in a reaction catalyzed by CHI. Next, F3H and F3¢H catalyze reactions converting naringenin to dihydrokaempferol and then eriodictyol. Of the 12 genes encoding these four enzymes in B. rapa, only two (one CHI and one F3H gene) were not differentially expressed between the two accessions.
The downstream portion of flavonoid biosynthesis is divided into two branches. In one branch, dihydrokaempferol and eriodictyol are catalyzed by FLS to generate dihydrokaempferol and dihydroquercetin. Of the five genes encoding this enzyme, only one gene was not differentially transcribed between accessions. In the other branch of the pathway, dihydrokaempferol and dihydroquercetin are catalyzed to become (–)-epiafazelechin and (–)-epicatechin by the enzymes DFR, LDOX, and BAN, which are encoded by five B. rapa genes. Notably, four of these genes were silenced and the fifth showed dramatically decreased transcription in the yellow-seeded accession SWUK4 at all seed developmental stages, indicating that the mutation of epicatechin branch genes may block pigment accumulation in B. rapa seeds.
Role of genes involved in lipid biosynthesis and storage in OC increase
In plants, lipid biosynthesis and accumulation in seeds can be divided into four steps (Fig. 7). (1) Pyruvate and other substances form C16-C18 FAs in the plastid. (2) FAs are transported into the cytoplasm, where they undergo elongation and desaturation of their carbon chains. (3) Various FAs and glycerol are catalyzed to synthesize triacylglycerols (TAG) and store them in the seed oil body, after which some are (4) degraded by products of the GDSL-type Seed Fatty Acid Reducer (SFAR) genes.
In the first step, genes shared similar expression patterns in both accessions, being more highly expressed at 28–49 DAP than at 7–21 DAP (Fig. 7). However, most genes encoding proteins involved in this step were generally more highly expressed in the higher-OC accession SWUK4 at each developmental stage, and could be found in cluster 12 of SWUK3 and cluster 5 of SWUK4 in K-means clustering, indicating that oil synthesis in B. rapa seed is mainly initiated at a later stage of seed development. In the second step, the four genes encoding the relevant catalytic enzymes (3-Ketoacyl-ACP Synthase II (KAS II), Stearoylacyl Carrier Protein Desaturase (SAD), Fatty Acid Export 1 (FAX1), and Long Chain Acyl-CoA Synthetase 9 (LACS9)) were more highly expressed in SWUK4 (Fig. 7), suggesting that SWUK4 seeds have a larger acyl-CoA pool than SWUK3 seeds. Differential expression of genes encoding Lysophosphatidylcholine Acyltransferase (LPCAT) and FA Desaturase 2/3 (FAD2/3) between the two accessions may cause the significant variation in FA components, including oleic acid, linoleic acid, and linolenic acid, in mature seeds detected by GC analysis (Additional file 1: Table S1). In the third step, TAG is formed in the endoplasmic reticulum catalyzed by four enzymes, glycerol-3-phosphate acyltransferase 9 (GPAT9), 1-acylglycerol-3-phosphate acyltransferase (LPAAT), phosphatidic acid phosphatase (PAP), and diacylglycerol acyltransferase (DGAT) [26]. The majority of genes encoding these four enzymes were more highly expressed in SWUK4 than in SWUK3 at stages 28–49 DAP, implying that greater TAG biosynthesis and storage in seed oil body might be the cause for the higher OC in SWUK4. In the fourth step, SFAR genes negatively regulate FA storage and seed oil body size, leading to FA degradation. B. rapa has 12 SFAR genes, but only two of them showed significant upregulation in the higher-OC accession SWUK4 at late sampling stages, suggesting that this accession does not adopt the strategy of negatively regulating SFAR activity to increase OC. In addition, the overexpression of SFARs lead to that increase the content of C18:1 and C20:1, and reduce the proportion of C18:2 and C18:3 [27], which explained the reason that SWUK4 has the higher content of oleic acid and lower content of linoleic acid and linolenic acid in its mature seeds (Additional file 1: Table S1). However, whether the OC of accession SWUK4 could be increased by reducing FA degradation remains to be established.
qRT-PCR validation
The relative expression levels of the 15 key DEGs at the seven seed developmental stages were analyzed by qRT-PCR to assess the accuracy of the RNA-seq results. The relative expression patterns of DEGs tested were positively correlated with the fold change variations obtained from the RNA-seq results (Fig. 8). The correlation coefficients between qRT-PCR and RNA-seq for the 15 DEGs ranged from 0.75 to 0.92. Further comparison showed that all the lower correlation coefficients between the two approaches belonged to three flavonoid pathway genes, Bra.A01TT18, Bra.A01ANR, and Bra.A09TT3. Due to the silencing of the three genes in the yellow-seeded B. rapa accession, their FPKM values were zero, but had to be set to 0.001 for fold change calculation, leading to lower correlation coefficients. However, the remaining 12 genes associated with FA metabolism and cell cycle progression all showed better correlation than the flavonoid pathway genes, demonstrating the reliability and accuracy of our RNA-seq results.