Data filtering and short reads assembly
We selected one vital seed development stage to compare the color difference between the Wuqi mustard and the Wugong mustard. At 30 DAP, large differences of the seed color between the two parents were observed (Supplemental Fig. 1). During this stage, the seeds of the Wuqi mustard were yellow, while those of the Wugong mustard began to turn brown. Therefore, the seeds at 30 DAP were prepared for BSR-Seq in Brassica juncea (B.juncea). Total 41.62-53.90 million pairs of 150 bp raw reads were generated from BSR-Seq. After the data filtering, 40.79-52.82million clean reads remained. Because ribosome contamination affects subsequent analysis, we removed the reads could be mapped to the Ribosomal RNA . The remained reads were used to map to the reference genome (Ensembl release 36 bra), and the Cufflinks Software was used for reference annotation based transcripts assemble.
Gene expression in developing seeds at 30 DAP
Based on RSEM software, the FPKM value was calculated between BB bulk and BY bulk. A total of 169 differentially expressed genes (DEGs) were identified (|log2 FlodChange| > 1, P<0.05), out of which, 157 were significantly up-regulated and 12 were significantly down-regulated in BY seeds compared to BB seeds at 30 DAP. KEGG functional annotation indicated that 17 (10.83%), 13 (8.28%) and 9 (5.73%) of the up-regulated genes were related to flavonoid biosynthesis, phenylpropanoid biosynthesis and phenylalanine, respectively, and no down-regulated genes were related to these three pathways, which is consistent with the previous study that yellow seed coat color is related to flavonoid biosynthesis in rapeseed[21]. Therefore, the DEGs involved in flavonoid biosynthesis were selected as the potential genes in the regulation of seed color, including BjBAN, BjDFR, BjLDOX, BjTT5, BjC4H, BjTT8, BjTT4, BjPLA, and BjF3H (Table 1). Obviously, the expression of the up-regulated DEGs in flavonoid biosynthesis expressed in the BB bulk were significantly higher than that in the BY bulk (Table 1).
Discovery of polymorphic SNPs
Reads from the pooled BB and BY bulks were extracted separately for SNP discovery. A total of 6825 polymorphic SNPs between the pooled BB and BY samples were identified (Supplemental Table 1). A total of 118 SNPs were mapped to scaffolds, and the remaining 6707 SNPs were in 10 chromosomes. The number of SNPs on chromosomes varied with the length of the chromosomes. The number of SNPs on chromosome A09 and A04 was the highest and the lowest, respectively. The mapping of the candidate genes was carried out using 6825 SNPs through BSR-Seq. The linkage probability of each SNP was plotted against its physical coordinate in the Brassica rapa (B. rapa) reference genome (Chiifu-401). Each of the SNP that had a high probability of being linked to the candidate gene clustered on chromosome A09 (Fig. 1). Therefore, the flavonoid biosynthesis DEGs on the A09 chromosome were selected as the candidate genes, including BjDFR, BjF3H, BjTT5, BjTT4 and BjTT8.
Differential expression patterns of candidate genes in B. juncea
The majority of the flavonoid biosynthesis DEGs on the A09 chromosome had different expression patterns, and the expression level of the genes was significantly different between the yellow and brown seeded lines. BjF3H and BjDFR had similar expression patterns in the developing seeds whose expression peaked at 9 DAP. The expression of the BjTT5 gene peaked at 23 DAP. The expression of BjTT4 and BjTT8 peaked at different stages in the yellow and brown-seeded parents, their expression peaked at 16 DAP in the Wugong mustard, and peaked at 23 and 9 DAP in the Wuqi mustard, respectively. The specific information is shown in Fig. 2.
Analysis of candidate genes in Brassica juncea (B. juncea)
Differences in the content of the seed coat color related indexes in different seed developmental stages were observed. During the development stages of seeds, the flavonoid (except 9DAP and 16DAP), anthocyanidin, melanin and total phenol contents of Wugong mustard were significantly higher than those in Wuqi mustard (p<0.01). The content of flavonoids, melanin and total phenol in both Wuqi mustard and Wugong mustard steadily increased along the seed development stage, with the maximum value detected at 45 DAP. However, the synthesis of these indexes occurred earlier in Wugong mustard than in Wuqi mustard. The difference of flavonoid and anthocyanin content in the two parents peaked at 38 DAP and 30 DAP, respectively. And the difference of melanin and total phenol between the two parents peaked at 45 DAP. The detailed information was shown in Table 2.
As shown in Table 3, the expression of BjTT5 and BjF3H had good correlation with the seed coat color related indexes in Wugong mustard and Wuqi mustard. For BjTT5, the expression showed significant or extremely significant positively correlated with the content of anthocyanins and total phenols in the two parents with a correlation coefficient of 0.817, 0.939, 0.848 and 0.811 respectively. For BjF3H, the expression was significantly negatively correlated with the content of total phenol in Wugong mustard with a correlation coefficient of -0.874. In Wuqi mustard, the expression of BjF3H was also negatively correlated with the content of the four seed coat color related indexes with a correlation coefficient of -0.900, -0.827, -0.836, and -0.878, respectively (Table 3).
In order to determine which one is the most likely candidate gene, twenty pairs of IP primers based on the sequences of chromosome 5 in Arabidopsis were designed, among which, two IP primers, IP-at3g51220 and IP-at3g51280, successfully amplified polymorphic bands between the two parents. Thus, these two primers were used to screen the small population and the entire BC9 population. As a result, all of them showed polymorphism, and no recombinants were identified, which co-segregated with the yellow seed coat color genes. As these two IP markers were from chromosome 5 of Arabidopsis, and close to F3H (AT3G51240), combined with the results of correlation between the expression of the key genes involved in flavonoid biosynthesis and seed coat color related indexes in the two parents, it was possible that BjF3H was the candidate gene controlling the yellow seed coat color.
Candidate gene cloning
BjF3H was amplified in the two parents, and the two genes were named as Wuqi-F3H and Wugong-F3H with the full-length of 3161bp and 3211bp respectively, containing three exons and two introns. Two CDS sequences in Wuqi mustard and one CDS sequence in Wugong mustard were obtained, named Wuqi-F3H.1, Wuqi-F3H.2 and Wugong-F3H, separately. The ORF sequences were 1077bp, 1071 bp and 1077 bp, encoded protein of 358, 356 and 358 amino acids.
The isoelectric points of Wuqi-F3H.1, Wuqi-F3H.2 and Wugong-F3H were 5.45, 6.73, 5.45, the molecular weights were 40.15kD, 39.86kD, 40.10kDa, respectively. Seven single nucleotide differences between Wuqi-F3H.1 and Wugong-F3H were detected, resulting in one amino acid changes (254, F/V) (Supplemental Fig. 2). Two typical domains, DIOX_N super family domain and 2OG-FeII_Oxy super family domain, were identified between residues 38-154 and 193-294, respectively (Fig. 3), these amino acid changes were identified in the conserved domain between two parents.
BjF3H gene cloning in Brassica napus (B. napus), Brassica juncea (B. juncea) and Brassica rapa (B. rapa)
We cloned F3H sequence from 2 yellow Brassica rapa (B. rapa) genotypes,2 brown Brassica rapa (B. rapa) genotypes, 3 yellow Brassica napus (B. napus) genotypes, 3 brown Brassica napus (B. napus) genotypes, 3 yellow Brassica juncea (B. juncea) genotypes and 3 brown Brassica juncea (B. juncea) genotypes. The sequencing results showed that two copies were obtained in the Brassica napus (B. napus) lines BnB1, BnB2, BnY1, BnY2 and BnY3, and one copy was cloned in Brassica napus (B. napus) line BnB3. However, only one copy was obtained in all Brassica juncea (B. juncea) lines. Two copies were cloned in BrY1 and BrY2 of Brassica rapa (B. rapa), and a copy was cloned in BrB1 and BrB2 of Brassica rapa (B. rapa). The ORF length of these copies was 1077 bp, encoding a protein of 358 amino acids (Supplemental Table 2).
The results of SNP analysis showed that 42, 8 and 16 SNPs were detected in Brassica napus (B. napus), Brassica juncea (B. juncea) and Brassica rapa (B. rapa), respectively, resulting in nine (7: T/N, 29: V/A, 66: N/S, 161: N/S, 188: N/S, 242: K/E, 324: K/R, 331: I/L and 341: K/Q), one (254: V/F) and one (254: V/F) amino acids changes, respectively. The four Brassica rapa (B. rapa) lines, six Brassica juncea (B. juncea) lines and the two parents (Wuqi mustard and Wugong mustard) had the same mutation sites of amino acids (254: V/F) (Supplemental Table 3).
Phylogenetic tree analysis
The BLAST tool developed by BRAD (http://brassicadb.org/brad/blastPage.php) was used to conduct the phylogenetic analysis. Bra036828 from Brassica rapa (B. rapa) was clustered together with Wugong-F3H; GSBRNA2T00129251001 from Brassica napus (B. napus) was clustered together with Wuqi-F3H.1; Bol044664 from Brassica oleracea (B. oleracea) was clustered together with Wuqi-F3H.1 (Fig 4). All the genes are involved in flavonoid synthesis. Therefore, it is presumed that the candidate gene can respond to seed coat color. We also found F3H from Brassica napus (B. napus) was clustered together with F3H from Brassica rapa (B. rapa), and most of F3H from Brassica napus (B. napus) were clustered (Fig. 4), indicating that the genetic relationship of Brassica rapa (B. rapa) and Brassica napus (B. napus) is closer than that of Brassica rapa (B. rapa) and Brassica juncea (B. juncea) in China. A sequence of 1500 bp upstream of the ATG of Wuqi-F3H and Wugong-F3H was cloned using the primer F3H-Up and analyzed for cis-acting elements using Plant CARE. It showed that a number cis-acting elements were identified, including AT-rich, ATBP-1, 5UTR Py-rich stretch, MYB binding site, MBS, ACE, GCN4_motif and TGA, among which, the auxin-responsive element TGA was specific for the upstream of Wugong-F3H gene sequence (Table 4).