A 24,482-bp Deletion Increases Seed Weight Through Multiple Pathways in Rapeseed (Brassica napus L.)

Exploration of the genes controlling seed weight is critical to improve crop yield and understand the mechanisms underlying seed formation in rapeseed (Brassica napus L.). We previously identified the quantitative trait locus (QTL) qSW.C9, for the thousand-seed weight (TSW) trait, in a doubled haploid population constructed from F1 hybrids between the parental accessions HZ396 and Y106. Here, we confirmed the phenotypic effects associated with qSW.C9 in BC3F2 populations and fine-mapped the candidate causal locus to a 266-kb interval. Sequence and expression analyses revealed that a 24,482-bp deletion in HZ396 containing six predicted genes most likely underlies qSW.C9. Differential gene expression analysis and cytological observations suggested that qSW.C9 affects both cell proliferation and cell expansion through multiple signaling pathways. After genotyping a rapeseed diversity panel to define its haplotype structure, we suggest that the selection of germplasm carrying two specific markers may be effective in improving seed weight in rapeseed. This study provides a solid foundation for the identification of the causal gene of qSW.C9 and offers an attractive target for breeding higher-yielding rapeseed.
We fine-mapped a major QTL for seed weight in rapeseed, and a 24,482-bp deletion might be responsible for it through multiple pathways.


Introduction
Rapeseed (Brassica napus L.) is the second-highest-producing oil crop worldwide (USDA ERS, 2020). Rapeseed provides edible oil for the human diet and is also a promising alternative source of protein for animal feed (Basunanda et al. 2010; Fattahi et al. 2018). Increasing rapeseed yield has long been a primary breeding objective in many countries. It is an especially timely breeding goal in China because of the gradual reduction of arable land and the consequences of rapeseed multiple-purpose development as a means to meet the increasing demand for edible oil (Sun et al. 2018). In rapeseed, yield is determined by the planting density and seed gain per plant, the latter being a reflection of the number of siliques per plant, number of seeds per silique (NSS), and thousand-seed weight (TSW) (Fan et al. 2010). High-density planting is already prevalent in many rapeseed-growing regions, making it relatively challenging to increase yield by modulating the number of siliques per plant or the plant density. However, seed weight and number of seeds per silique are potentially amenable to genetic improvement and thus are promising avenues to explore, which will be accelerated by the characterization of genes associated with seed development.
Quantitative trait locus (QTL) mapping studies for seed weight in rapeseed, using linkage analysis or genome-wide association (Cai et al. 2014; Chen et al. 2007; Fattahi et al. 2018; Fu et al. 2015; Li et al. 2014; Lu et al. 2017), have identified over 168 QTLs regulating seed weight in various populations (Raboanatahiry et al. 2018). Although several QTLs have been fine-mapped to smaller regions (Wang et al. 2020a), only two causal loci have been cloned and shown to influence seed weight: the cytochrome P450 gene BnaA9.CYP78A9 and AUXIN RESPONSE FACTOR18 (ARF18) (Liu et al. 2015; Shi et al. 2019). Along with SUPPRESSOR WITH MORPHOGENETIC EFFECTS ON GENITALIA7 (BnaC9.SMG7b) (Li et al. 2015), which controls the number of seeds per silique, they comprise the short list of genes for yield traits cloned in rapeseed using classical genetics approaches.
The many QTLs yet to be cloned represent a vast untapped resource to dissect the mechanisms of seed weight regulation in rapeseed. Given how few genes have been cloned, fine-mapping of important QTLs and cloning their causal genes are particularly critical steps.
Many genes with specific roles in regulating seed size have been cloned in Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa), among which some have been classified as belonging to major regulatory pathways based on the type of protein they encode or their associated functions (Li et al. 2019b). The close genetic relationship between rapeseed and Arabidopsis can be exploited by characterizing functionally important genes in Arabidopsis and then validating the function of their homologs in rapeseed. This strategy has identified several genes with functions in seed development, although these genes may also affect additional traits. For instance, DA1 encodes an E3 ubiquitin ligase and the mutant da1-1 results in large seeds in Arabidopsis (Du et al. 2014; Li et al. 2008; Xia et al. 2013). Reduced expression of the rapeseed ortholog BnDA1 can also increase seed size in transgenic rapeseed plants, demonstrating that BnDA1 has the same function in rapeseed as DA1 in Arabidopsis. In addition, ENHANCER OF DA1-1 (EOD3, also known as the cytochrome P450 gene CYP78A6) was identified as an enhancer of the da1-1 mutant and shown to positively regulate seed size in Arabidopsis. The function of the rapeseed homologs was tested via genome editing to inactivate all four EOD3 homologs: the resulting mutant plants showed a reduced thousand-seed weight (TSW) and silique length (SL) and an increased number of seeds per silique (NSS) (Khan et al. 2021). However, no other genes controlling yield-related traits have been characterized in rapeseed, possibly due to differences among Arabidopsis, rice, and rapeseed, whose polyploid genome may hinder the functional validation of potentially redundant genes.
Thus, it is clear that homology alone is insufficient to identify genes with functions in seed development; however, we reasoned that predictive power may be improved by combining genetics and genomics with transcriptome analysis.
Indeed, gene functions are often determined by their spatial and temporal expression patterns, as seen with genes regulating seed weight (Liu et al. 2020b; Song et al. 2019). Differential transcriptome analysis has provided new directions of research for the identification of genes involved in a given pathway or process. For example, the rapeseed gene UBIQUITIN-PROTEIN LIGASE3 (BnaUPL3.C03) regulates seed weight and was discovered by transcriptome association analysis, as its expression is negatively correlated with seed weight. Further analysis showed that BnaUPL3.C03 affects seed weight by mediating the degradation of LEAFY COTYLEDON2 (LEC2), a key transcription factor affecting seed ripening (Miller et al. 2019). The identification of BnaUPL3.C03 provides a proof of concept for the identification of functionally important genes by exploiting transcriptomic data as a complementary or alternative approach to traditional map-based cloning and reverse genetics strategies.
We previously reported that the three QTLs qSS.C9, qSL.C9, and qSW.C9, which respectively affect NSS, silique length, and TSW, are located within the same interval in linkage group (chromosome) C09 (Zhang et al. 2012;Zhang et al. 2011). We later cloned qSS.C9, which is predicted to encode a small protein of 119 amino acids with a key role in the formation of functional megaspores (Li et al. 2015). In this study, we established that qSW.C9 co-segregates with a 24,482-bp deletion removing six predicted genes, two of which encode an E3 ubiquitin ligase and a cytochrome P450 family protein and are likely candidates for the causal locus underlying this QTL. We also characterized the cytological components that associate with seed size and revealed a series of known genes that may be involved in seed size regulation in rapeseed. Finally, haplotype analysis showed that the HZ396 haplotype, representing a germplasm with large seeds, is present at low frequency in a rapeseed diversity panel and may have potential applications in breeding. This study thus lays the foundation for better understanding the mechanism of seed development and provides a new resource to improve rapeseed yield.

Plant materials
Genetic materials were derived primarily from a cross between the high thousand-seed weight (TSW) inbred line HZ396 (recipient parent) and the low TSW inbred line Y106 (donor parent), as previously reported (Li et al. 2015; Zhang et al. 2011). Two BC3F2 segregating populations were used to validate the linkage between the TSW phenotype and the qSW.C9 QTL. Recombinant lines screened from the BC4F2 population were used for fine-mapping, using the phenotype of their progeny. A near-isogenic line, NIL(Y106), was randomly selected in the BC5F2 population and allowed to self for more than three generations.
For most experiments, as well as for sequence and expression analysis, we used the popular inbred line ZS11, whose genome sequence is available (Song et al. 2020).

Plant growth conditions and trait evaluation
All plant materials were sown at the experimental station of Huazhong Agricultural University during normal growing seasons. Each row consisted of 10-12 plants, with distances of 20 cm between individuals and 25 cm between rows. During the growing period, conventional management was conducted according to local planting practices. Two BC3F2 populations were sown in autumn 2016, with 373 and 355 individuals, respectively. During the same growth period, the progeny of 21 recombinant lines were sown in a randomized order, with two rows per line. The two parents and NIL(Y106) were sown and grown over 4 years from 2016 to 2019, with at least six rows sown per germplasm. ZS11 was sown in autumn 2017 and used to sample the seed growth period in the spring of the following year.
For all trait evaluations, individual plants with the same growth status and free of disease were selected for phenotypic analysis. After plants reached maturity, they were air-dried. Ten well-developed pods in the same position about midway along the main stem were used to measure the number of seeds per pod. Then, the seeds from the main stem and the first lateral branch were used to measure the thousand-seed weight (TSW), by weighing 500-1,000 seeds and then converting the weight into TSW as described (Wang et al. 2020a). Data collection, analysis, and visualization were performed in Microsoft Excel 2016 and GraphPad Prism 8 (https://www.graphpad-prism.cn/).
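The conversion from a weighed sample of 500-1,000 seeds to TSW can be illustrated with a minimal sketch. It assumes simple proportional scaling (sample weight divided by seed count, times 1,000); the exact procedure of Wang et al. (2020a) is not reproduced here, and the function name and example values are hypothetical.

```python
def thousand_seed_weight(sample_weight_g: float, seed_count: int) -> float:
    """Scale the weight of a counted seed sample to 1,000 seeds (in grams).

    Assumption: plain proportional scaling; not taken from the cited protocol.
    """
    if seed_count <= 0:
        raise ValueError("seed_count must be positive")
    return sample_weight_g / seed_count * 1000.0


# Hypothetical sample: 650 seeds weighing 2.6 g in total.
tsw = thousand_seed_weight(2.6, 650)
print(f"TSW = {tsw:.2f} g")
```

Counting the seeds actually weighed, rather than assuming a fixed number, keeps the estimate comparable between plants that yield different sample sizes.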

Genomic DNA extraction and genotyping
Young leaves or cotyledons were used to extract genomic DNA by a modified CTAB method. Most of the markers used during fine-mapping in this study were sequence-characterized amplified region (SCAR) markers, with the exception of several simple-sequence repeat (SSR) markers, whose names have the prefix "SR." Depending on the marker, products were separated by agarose gel or polyacrylamide gel electrophoresis.

Sequence and candidate gene analysis
The bacterial artificial chromosome (BAC) clone HBnB016G24 was re-sequenced by next-generation sequencing (NGS) to provide a precise reference.
The software Geneious 4.8.3 was used for sequence analysis and alignments. IGV 2.8.6 (http://www.igv.org/) was used to visualize the reads generated by NGS. IBS 1.0.3 (http://ibs.biocuckoo.org/) was used to draw the sketch maps of gene distributions.

Total RNA extraction and RT-qPCR analysis
For expression analysis, samples were obtained from ZS11 at nine stages of seed development and subjected to RNA-seq analysis. The nine stages were as follows: 0-6 mm pistils, 1 DAP (days after pollination) pistils, 3 DAP pistils, 6 DAP seeds, 15 DAP seeds, 21 DAP seeds, 28 DAP seeds, 38 qPCR. Plant tissues were harvested in a nuclease-free environment, quickly frozen in liquid nitrogen, and then stored at -80°C. Total RNA was extracted with TRIzol reagent (Invitrogen). First-strand cDNA synthesis was initiated with 2 μg total RNA with GoScript™ Reverse Transcriptase (Promega, USA). qPCR was performed with the GoTaq qPCR Mix (Promega, USA) on a Bio-Rad CFX96 Real-time System (Bio-Rad). The 2^(-ΔΔCt) method was employed to calculate the relative expression levels of genes of interest based on three biological samples and three technical replicates per sample (Livak and Schmittgen 2001). BnACTIN2 (BnaC03G0430900ZS) was used as the internal control for normalization. Data collection, analysis, and visualization were performed in Microsoft Excel 2016 and GraphPad Prism 8.
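As an illustration of the 2^(-ΔΔCt) calculation (Livak and Schmittgen 2001), the sketch below averages technical replicates and normalizes a target gene to a reference gene in a test sample relative to a calibrator sample. The function names and Ct values are hypothetical, and equal amplification efficiency for both genes is assumed.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(sample_target, sample_ref, calib_target, calib_ref):
    """Fold change of the target gene in the sample versus the calibrator."""
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(calib_target, calib_ref)
    return 2.0 ** (-ddct)


# Hypothetical Ct values, three technical replicates each.
fold = relative_expression(
    sample_target=[24.0, 24.0, 24.0],  # gene of interest, test sample
    sample_ref=[18.0, 18.0, 18.0],     # reference gene, test sample
    calib_target=[26.0, 26.0, 26.0],   # gene of interest, calibrator
    calib_ref=[18.0, 18.0, 18.0],      # reference gene, calibrator
)
print(fold)
```

In this hypothetical case the target reaches threshold two cycles earlier in the test sample after normalization, which corresponds to a four-fold higher relative expression.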

Transcriptome analysis
Transcriptome deep sequencing (RNA-seq) was performed on an Illumina HiSeq platform with three biological replicates per sample. Gene expression levels are reported as trimmed mean of M-values (TMM). To screen for differentially expressed genes (DEGs), Rsubread (DOI: 10.18129/B9.bioc.Rsubread) was used to count the reads associated with each gene to generate an input file compatible with DESeq2 (DOI: 10.18129/B9.bioc.DESeq2) for differential expression analysis. An adjusted p-value < 0.05 and an absolute value of log2 ratio > 1 were applied as thresholds to select DEGs. Gene Ontology (GO) enrichment analysis was performed with clusterProfiler (DOI: 10.18129/B9.bioc.clusterProfiler).
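The DEG thresholds stated above (adjusted p-value < 0.05 and |log2 ratio| > 1) can be expressed as a small filter. This is only a sketch applied to an already computed results table; the record type, gene identifiers, and values are hypothetical and do not come from the study, and the differential expression test itself (DESeq2) is not reimplemented.

```python
from typing import Iterable, List, NamedTuple, Tuple


class GeneResult(NamedTuple):
    gene_id: str
    log2_fold_change: float
    padj: float  # adjusted p-value; may be NaN when no test was made


def select_degs(
    results: Iterable[GeneResult],
    padj_cutoff: float = 0.05,
    lfc_cutoff: float = 1.0,
) -> Tuple[List[str], List[str]]:
    """Return (upregulated, downregulated) gene IDs passing both thresholds."""
    up, down = [], []
    for r in results:
        # NaN comparisons are False, so untested genes are skipped.
        if r.padj < padj_cutoff and abs(r.log2_fold_change) > lfc_cutoff:
            (up if r.log2_fold_change > 0 else down).append(r.gene_id)
    return up, down


# Hypothetical results table.
table = [
    GeneResult("geneA", 2.3, 0.001),
    GeneResult("geneB", -1.5, 0.01),
    GeneResult("geneC", 0.8, 0.0001),   # significant but small effect
    GeneResult("geneD", 3.0, 0.2),      # large effect but not significant
    GeneResult("geneE", -2.0, float("nan")),
]
up, down = select_degs(table)
print(up, down)
```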

Cytological observations and analysis
Mature dry seeds of HZ396 and NIL(Y106) were germinated on wet absorbent paper. The hypocotyls of 10-day-old seedlings were fixed and embedded in paraffin to visualize cells in cross sections after toluidine blue staining. Slices were observed on a Nikon Eclipse E100 microscope and images collected with CaseViewer (https://www.3dhistech.com/). ImageJ (https://imagej.net/Welcome) was then used to count the number of cells and measure their corresponding area. For each line, data from six or seven separate slices were collected for analysis.
The epidermal cells of mature dry seeds of HZ396 and NIL(Y106) were observed by scanning electron microscopy (SEM). The images were collected at 400× magnification; ImageJ was then used to count the number of cells and measure their corresponding area. Nine separate seeds were selected from each material for observation and statistical analysis. Data collection, analysis, and visualization were performed in Microsoft Excel 2016 and GraphPad Prism 8.

Haplotype analysis
A diversity panel of 505 inbred lines was used for haplotype analysis with three years of phenotypic data on TSW (Tang et al. 2020). Genomic DNA from these lines was used as a template to genotype the SCAR markers XH14 and STC9-164 by agarose gel electrophoresis. The XH14 marker is dominant and is therefore scored as presence or absence. The STC9-164 marker results in two bands during amplification, with the smaller band being used to determine the genotype. The combination of these two markers defined four haplotypes. One-way ANOVA was performed to determine statistical significance of the differences for TSW between the HZ396 haplotype and the other haplotypes. Data collection, analysis, and visualization were performed in Microsoft Excel 2016 and GraphPad Prism 8.

qSW.C9 behaves as a single locus in the BC3F2 population
To verify the function of qSW.C9, we generated and phenotyped two BC3F2 populations derived from a cross between the inbred lines HZ396 and Y106, with 203 (group A) and 157 (group B) effective individuals. The parents exhibited a significant difference in TSW, with a value of 4.91±0.39 g for HZ396 and 3.56±0.19 g for Y106 (Fig. 1a, b). TSW values from group A ranged from 2.66 to 4.92 g; a chi-squared test demonstrated that the low TSW phenotype segregated in a Mendelian fashion, as the segregation ratio was 3:1 (Table S1). The TSW phenotype co-segregated with the codominant marker SCC9-136 and the dominant marker STC9-164 in group A (Fig. 1c, e). In addition, plants heterozygous or homozygous for the Y106 allele at these two markers displayed similar and low TSW values (Fig. 1d, f), results that were recapitulated in group B (Fig. S1). These results indicated that qSW.C9 is closely linked to markers SCC9-136 and STC9-164 and that the HZ396 allele at this QTL behaves as a single dominant gene that regulates rapeseed TSW.
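A chi-squared goodness-of-fit test against a 3:1 ratio, of the kind referred to above, can be sketched as follows. The counts are hypothetical (the observed values are in Table S1, which is not reproduced here); the critical value 3.841 is the standard chi-squared threshold for one degree of freedom at α = 0.05.

```python
CHI2_CRITICAL_DF1_ALPHA_0_05 = 3.841  # chi-squared critical value, df = 1


def chi_square_3_to_1(n_major: int, n_minor: int) -> float:
    """Chi-squared statistic for observed counts against a 3:1 expectation."""
    total = n_major + n_minor
    expected = (0.75 * total, 0.25 * total)
    observed = (n_major, n_minor)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_3_to_1(n_major: int, n_minor: int) -> bool:
    """True if the 3:1 hypothesis is not rejected at alpha = 0.05."""
    return chi_square_3_to_1(n_major, n_minor) < CHI2_CRITICAL_DF1_ALPHA_0_05


# Hypothetical counts: 152 low-TSW and 48 high-TSW plants.
stat = chi_square_3_to_1(152, 48)
print(round(stat, 4), fits_3_to_1(152, 48))
```

A statistic below the critical value means the data are compatible with single-locus Mendelian segregation; it does not by itself prove it.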

Fine-mapping of qSW.C9
To fine-map the qSW.C9 QTL and clone the underlying locus, we characterized 19 recombinant lines with the high TSW phenotype in the BC3F2 population using markers SCC9-136 and SRC9-21 (Fig. S2). We then genotyped 21 representative recombinants between the same markers with seven markers spanning the interval and also determined TSW values of their progeny (BC4F3 population) in 2017. Based on these data, we narrowed down the mapping interval to between markers SRC9-298 and SRC9-397. The size of the corresponding genomic region is about 261 kb in the rapeseed Darmor-bzh reference genome but appears to be 310 kb in the ZS11 reference genome, which is of better quality. To gain a more accurate genomic sequence for the interval, we sequenced the BAC clone HBnB016G24 and compared the aligned sequence to the ZS11 reference genome in ~150-kb fragments: sequence identity was as high as 99.9% (Fig. S3a). With this new reference sequence, we designed two linked simple-sequence repeat (SSR) markers (XHSRC9-69 and XHSRC9-52) and a single-nucleotide polymorphism (SNP) marker (SNP5479) between SRC9-298 and SRC9-397 (Fig. 2a). Although we genotyped 118 individuals with the two SSR markers, we did not identify any new recombination break points. We also genotyped 21 recombinant lines with SNP5479 (Fig. S4), allowing us to anchor the mapping interval to between markers SNP5479 and SRC9-397 (Fig. 2b), or a 266-kb genomic fragment in the ZS11 genome.

Analysis of genes within the mapping interval
The 266-kb region contained 39 genes in the ZS11 reference genome (Fig. 2c), including several sets of duplicated or triplicated homologous genes. To explore the genomic differences between the HZ396 and Y106 parental lines, we sequenced the HZ396 and Y106 genomes, assembled the reads over the mapping interval, and compared the assembly to ZS11. The Y106 genomic sequence over the mapping interval was largely congruent with that of ZS11, whereas HZ396 carried a large deletion, in addition to other variations (Fig. S3b). To determine the size and delineate the break points of the deletion in the HZ396 background, we designed PCR primers annealing to either side of the presumptive deletion interval. One primer pair amplified a 5-kb fragment specifically in HZ396 but not in Y106, where the predicted PCR product would be over 25 kb (Fig. S3c). This 5-kb fragment from HZ396 aligned perfectly to the ZS11 reference but also revealed a deletion of 24,482 bp (Fig. 2d and Fig. S3d). We validated the size and position of the deletion by using a set of nine specific markers spanning the deletion interval (Fig. S3e). Outside the deletion, resequencing data for HZ396, Y106, and NIL(Y106) showed no genomic variation in the coding regions of the 33 genes present in the 266-kb interval, with the exception of BnaC09G0547900ZS and BnaC09G0548700ZS. BnaC09G0547900ZS had one SNP in an exon that leads to a synonymous mutation, while BnaC09G0548700ZS carried an SNP in an intron (Table 1). This result indicated that there were no differences between HZ396 and Y106 in regard to protein-coding genes in the mapping interval aside from those included in the deletion.
We then turned to transcriptome deep sequencing (RNA-seq) to measure expression levels of candidate genes by collecting samples for nine stages of seed development from ZS11 plants. Surprisingly, only 26 genes were expressed during seed development, while the remaining 13 genes were not expressed at any stage and are therefore unlikely to be the causal gene for qSW.C9 (Fig. S5). The 15 DAP stage appeared to be a useful sampling time point, as more genes within the mapping interval were expressed then than at any other stage. We therefore focused on RNA-seq data from HZ396 and NIL(Y106) seeds at 15 DAP. Compared to NIL(Y106), 424 genes were significantly upregulated and 609 genes were significantly downregulated in HZ396 (Fig. S5a). We validated the RNA-seq data for 15 randomly selected DEGs by RT-qPCR, confirming the reliability of our RNA-seq results (Fig. S5b). None of the genes within the mapping interval showed significant differences in expression between HZ396 and NIL(Y106) (Table 1 and Fig. S5c), pointing to the 24,482-bp deletion and the six corresponding genes as most likely to be causal for qSW.C9.
Within the deletion interval, two genes caught our attention. BnaC09G0551200ZS encodes a cytochrome P450 protein (CYP81K2), whose function is not known. Notably, many cytochrome P450 genes regulate seed size and yield in Arabidopsis and other crops (Eriksson et al. 2010; Fang et al. 2012; Khan et al. 2021; Shi et al. 2019; Xu et al. 2015; Zhao et al. 2016). In addition, CYP81K2 is the only gene with no nearby homolog within the mapping interval, making it a prime candidate for qSW.C9. The gene pair BnaC09G0551500ZS and BnaC09G0551600ZS (described as BnaC09G0551500ZS/BnaC09G0551600ZS) together encode a RING-type E3 ubiquitin ligase that is homologous to Arabidopsis JAV1-ASSOCIATED UBIQUITIN LIGASE1 (JUL1). JUL1 is reported to be involved in abscisic acid (ABA) and jasmonic acid (JA) signaling, especially during abiotic stress (Ali et al. 2019; Yu et al. 2020a). A comparable case is SALT- AND DROUGHT-INDUCED REALLY INTERESTING NEW GENE FINGER1 (SDIR1), a stress-related gene whose homolog is related to seed weight in wheat (Triticum aestivum) (Wang et al. 2020b; Gao et al. 2011; Liu et al. 2013; Zhang et al. 2008).

qSW.C9 controls seed size through multiple signaling pathways
To determine the molecular mechanism(s) by which qSW.C9 regulates seed size, we subjected the DEGs between HZ396 and NIL(Y106) to Gene Ontology (GO) analysis. This analysis revealed 95 enriched GO terms across the three categories biological process (BP), cellular component (CC), and molecular function (MF) (Table S2). Most enriched GO terms were related to ion homeostasis and transmembrane transport. The constituent genes were associated with well-known pathways that regulate seed size or yield: sucrose transport, cytochrome P450 genes, and leucine-rich repeat (LRR)-related genes (Fig. 3a). We validated the RNA-seq results by RT-qPCR for nine genes related to seed size (Fig. 3b).
The GO analysis and DEGs allowed us to create a list of genes that may be involved in the regulation of seed weight in rapeseed. This list included several published genes linked to seed size, yield, and fruit size, as well as others probably encoding seed size regulatory factors, such as proteins involved in cell division and expansion, transcription factors, phytohormone signaling proteins, and transporters (Table 2). This analysis suggested that the difference in seed size between HZ396 and NIL(Y106) may be regulated by multiple signaling pathways.

qSW.C9 controls seed size by regulating both cell expansion and proliferation
Seed size may be regulated by cell expansion, cell proliferation, or both. To gain a cellular-level understanding of the difference in seed size between HZ396 and NIL(Y106) (Fig. 4a, c), we examined cells from the seed coat of mature HZ396 and NIL(Y106) seeds by scanning electron microscopy (SEM). In addition to a difference in cell morphology, the cells also showed a slight but significant difference in size between HZ396 and NIL(Y106), as determined by their surface area (Fig. 4b, d). The cell number in HZ396 was 34.2% greater than in NIL(Y106), while the cell size was 7.8% greater than in NIL(Y106) (Table S3). These results suggested that qSW.C9 regulates both cell number and cell size. We extended our analysis to paraffin cross sections of hypocotyls (Fig. 4e). Again, we observed significant differences in the number of cells in the epidermis and the inner three layers of the hypocotyl (Fig. 4f), as well as in the size of cells (Fig. 4g). These results indicated that qSW.C9 probably controls seed size by regulating both cell expansion and proliferation.

Haplotype analysis of a rapeseed diversity panel
To explore the effect of qSW.C9 in natural populations, we collected 505 accessions with TSW data and determined their genotype at qSW.C9 using the specific markers XH14 and STC9-164. XH14 is a marker designed to amplify BnaC09G0551500ZS/BnaC09G0551600ZS, while STC9-164 is specific for BnaC09G0551300ZS (Fig. S6). Both markers are dominant, as they will either amplify the gene (as in Y106) or yield no amplicon (as in HZ396). These two markers broadly classify our collection into four possible haplotypes at or near qSW.C9 (Fig. 5a). Haplotype A (positive for both markers, like Y106) consisted of 478 accessions; their TSW values varied from 1.95 to 5.46 g, with a mean of 3.47 g. Haplotype D (negative for both markers, like HZ396) consisted of 12 accessions, with TSW values ranging from 2.16 to 4.42 g and a mean of 3.59 g. Haplotype B (positive for STC9-164 only) comprised 12 accessions; the TSW values varied from 2.59 to 3.64 g with a mean of 3.14 g. We identified no germplasm with haplotype C. Although mean TSW across haplotype D increased by 0.12 g over that of haplotype A, this difference was not significant, as determined by one-way ANOVA. By contrast, mean TSW for haplotype D was significantly higher than that in haplotype B, as mean TSW increased by 0.45 g (Fig. 5b). These results indicated that haplotype D is a rare haplotype that offers the largest TSW values, which will be of great potential use to improve seed weight in rapeseed. The genotyping markers XH14 and STC9-164 will provide an effective means to select haplotype D during the breeding process.
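The two-marker classification can be sketched as a simple presence/absence lookup followed by per-haplotype averaging. Haplotypes A, B, and D follow the definitions given above; assigning haplotype C to "positive for XH14 only" is an assumption made here by elimination, since the text does not define it. The accession names and TSW values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def classify_haplotype(xh14_present: bool, stc9_164_present: bool) -> str:
    """Map PCR presence/absence calls for the two markers to a haplotype."""
    if xh14_present and stc9_164_present:
        return "A"  # both markers amplify, like Y106
    if stc9_164_present:
        return "B"  # STC9-164 only
    if xh14_present:
        return "C"  # XH14 only (assumed definition, not stated in the text)
    return "D"      # neither marker amplifies, like HZ396


def mean_tsw_by_haplotype(records):
    """records: iterable of (accession, xh14_present, stc9_164_present, tsw_g)."""
    groups = defaultdict(list)
    for _, xh14, stc9, tsw in records:
        groups[classify_haplotype(xh14, stc9)].append(tsw)
    return {hap: mean(values) for hap, values in groups.items()}


# Hypothetical accessions and TSW values (g).
panel = [
    ("acc1", True, True, 3.0),
    ("acc2", True, True, 4.0),
    ("acc3", False, True, 3.0),
    ("acc4", False, False, 4.5),
]
print(mean_tsw_by_haplotype(panel))
```

Because both markers are scored as presence or absence, a failed PCR is indistinguishable from a true deletion in this scheme, so a positive control amplicon would be advisable in practice.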

Discussion
We previously showed that the QTL for the TSW trait qSW.C9 co-located with the QTL qSS.C9 for NSS using linkage analysis in a doubled haploid population (Zhang et al. 2012). We later cloned the causal gene for qSS.C9, BnaC9.SMG7b (Li et al. 2015). Here, we fine-mapped qSW.C9 to a 266-kb region on chromosome C09 and discovered a 24,482-bp deletion that is likely responsible for qSW.C9. Notably, this mapping interval contains BnaC9.SMG7b, indicating the strong linkage between TSW and NSS. Though the co-segregation of the TSW and NSS phenotypes also raises a conundrum regarding whether BnaC9.SMG7b regulates both NSS and TSW, giving rise to more seeds per silique but with small size, we think it more likely that qSW.C9 corresponds to a candidate gene other than BnaC9.SMG7b itself for two reasons. First, when considering the co-regulation between yield-related traits, SL and NSS are generally regulated in the same direction, whereas the relationship between TSW and NSS is not always consistent (Hussain et al. 2020). Second, no seed size-related phenotypes have been reported in smg7 mutants (Bulankova et al. 2010; Kerenyi et al. 2013; Lee et al. 2020; Raxwal et al. 2020; Riehs-Kearnan et al. 2012).
Aside from BnaC9.SMG7b, there are five other candidate genes in the 24,482-bp deletion region. In Arabidopsis and other crops, there are no reports clearly showing any of these genes to be related to yield traits. However, the E3 ubiquitin ligase encoded by BnaC09G0551500ZS/BnaC09G0551600ZS also warrants some attention. The ubiquitin-proteasome system plays an essential role in almost all aspects of biology (Moon et al. 2004). Notably, a related pathway controlling seed size has already been established in Arabidopsis and rice (Li and Li 2014; Xu and Xue 2019). RING-type E3 ligases, such as Arabidopsis DA2 and rice Grain Width2 (GW2), play an important role in regulating seed size; their homologs in wheat (Triticum aestivum) and maize (Zea mays) appear to have the same function (Lee et al. 2018; Sestili et al. 2019; Simmonds et al. 2016; Wang et al. 2018; Xie et al. 2018; Zhang et al. 2018; Zhao et al. 2015). Here, BnaC09G0551500ZS/BnaC09G0551600ZS encode a RING-type E3 ubiquitin ligase that is homologous to Arabidopsis JUL1, which has reported roles in abiotic stress, ABA, and JA signaling (Ali et al. 2019; Yu et al. 2020a). Notably, the wheat homolog of the RING-type E3 ligase-encoding Arabidopsis gene SALT- AND DROUGHT-INDUCED REALLY INTERESTING NEW GENE FINGER1 (SDIR1) is a negative regulator of seed size (Wang et al. 2020b), while SDIR1 is reported to be involved in salt tolerance and drought resistance (Gao et al. 2011; Liu et al. 2013; Zhang et al. 2008). This observation supports the hypothesis that BnaC09G0551500ZS/BnaC09G0551600ZS might be causal for qSW.C9.
Because the seed coat determines the space available for the embryo and endosperm to develop, seed size is usually affected by the seed coat (Li et al. 2019b). Examining seed coat cells will therefore help us better understand seed formation. SEM has been effectively applied in Arabidopsis and many crops to reveal the morphology of cells at the surface of the seed coat (Lyu et al. 2020; Xu et al. 2018; Yang et al. 2019).
Although SEM has been used to observe rapeseed silique epidermal cells (Shi et al. 2019), the direct observation of cells from the seed coat is typically carried out via serial sectioning (Li et al. 2019a). We demonstrated here that SEM can provide a detailed view of seed coat cells. We observed significant differences in cell size and number between the parental line HZ396 and NIL(Y106), indicating that qSW.C9 regulates both cell proliferation and cell expansion. Most genes regulate seed size by affecting either cell division or cell expansion, but rarely both; examples include Arabidopsis EOD3 (Fang et al. 2012), rice Grain Size5 (GS5) (Li et al. 2011), and rice DENSE AND ERECT PANICLE2 (DEP2) (Abe et al. 2010). We observed that the expression of BnEOD3 in NIL(Y106) is six times higher than in HZ396, indicating that the germplasm with low BnEOD3 expression also exhibited larger seeds, which is not consistent with the reported phenotypes associated with loss of EOD3 in Arabidopsis or rapeseed (Khan et al. 2021). This difference suggests that BnEOD3 may regulate cell proliferation and cell expansion through a novel regulatory mechanism in our materials.
We identified several previously reported genes regulating seed size among our DEGs. Some, such as CYP78A5/KLU (Eriksson et al. 2010), SUCROSE-PROTON SYMPORTER2 (SUC2) (Wang et al. 2015), and JASMONATE METHYL TRANSFERASE (JMT) (Kim et al. 2009), had homologs in Arabidopsis whose functions are opposite to those reported here. However, the expression of the rapeseed homologs of HISTIDINE PHOSPHOTRANSFER PROTEIN4 (AHP4) (Hutchison et al. 2006) and C-TERMINALLY ENCODED PEPTIDE RECEPTOR1 (CEPR1, also named XYLEM INTERMIXED WITH PHLOEM1 [XIP1]) (Taleski et al. 2020) was consistent with the functions reported in Arabidopsis. The mechanism underlying the regulation of seed weight and size may thus be more complex in rapeseed than in Arabidopsis, but our RNA-seq and GO analyses have provided new directions to help dissect this critical regulatory network.
As the qSW.C9 QTL was associated with yield, we aimed to develop genotyping markers that might be useful for marker-assisted breeding by screening germplasm resources. Although a single marker often is sufficient to distinguish different genotypes, in other cases multiple markers may be necessary to identify favorable alleles (Liu et al. 2020a; Wang et al. 2020b). Here, we identified a 24,482-bp deletion in HZ396 compared to Y106 and developed two specific markers for the deletion region. We then identified rapeseed accessions with the HZ396 haplotype (PCR-negative for both markers) and observed that this rare haplotype was associated with the highest TSW values. Thus, our identification of the qSW.C9 QTL should have considerable utility for future rapeseed breeding and improvement.

Declarations
Author Contributions
XZ conducted most experiments, including fine-mapping, sequence analysis, cytological observations, and haplotype analysis. QH and FL participated in phenotypic and genotypic analyses of BC3F2 populations and recombinant lines. ML and XL participated in sequence analysis. PW participated in RNA-seq data analysis. XZ wrote the original draft. ZW, LW, and DH were involved in reviewing and editing the manuscript. DH and GY designed and supervised the project. All authors read and contributed to the revision of the manuscript.
Zhao M, Gu Y, He L, Chen Q, He C (2015) Sequence and expression variations suggest an adaptive role for the DA1-like gene family in the evolution of soybeans. BMC Plant Biol 15:120

Tables
Table 1 Predicted genes within the 266-kb region and differences in sequence and expression levels
b Difference in expression levels between HZ396 and NIL(Y106). ND, no difference; NE, no expression; --, not clear.