Genome-wide association study identifies candidate genes and favorable haplotypes for seed yield in Brassica napus

Oilseed rape (Brassica napus L.) is one of the most essential oil crops. Genetic improvement of seed yield (SY) is a major aim of B. napus breeding. Several studies have been reported on the genetic mechanisms of SY of B. napus. Here, a genome-wide association study (GWAS) of SY was conducted using a panel of 403 natural accessions of B. napus, with more than five million high-quality single-nucleotide polymorphisms (SNPs). A total of 1773 significant SNPs were detected associated with SY, and 783 significant SNPs were co-located with previously reported QTLs. The lead SNPs chrA01__8920351 and chrA02__4555979 were jointly detected in Trial 2_2 and Trial 2_mean value, and in Trial 1_2 and Trial 1_mean value, respectively. Subsequently, two candidate genes of BnaA01g17200D and BnaA02g08680D were identified through combining transcriptome, candidate gene association analysis, and haplotype analysis. BnaA09g10430D detected through lead SNP chrA09__5160639 was associated with SY of B. napus. Our results provide valuable information for studying the genetic control of seed yield in B. napus and valuable genes, haplotypes, and cultivars resources for the breeding of high seed yield B. napus cultivars.


Introduction
Oilseed rape (Brassica napus L.) is the second most important oil crop after soybean in the world (Zhang et al. 2022). Increasing seed yield (SY) of B. napus can meet the growing global demand for oil and protein (Zhao et al. 2016). Studying the genetic basis of SY in B. napus and mining candidate genes associated with SY has become particularly important (Raboanatahiry et al. 2018;Deng et al. 2019). In recent years, linkage mapping analysis has been widely applied to the study of the genetic basis of SY in B. napus (Shi et al. Abstract Oilseed rape (Brassica napus L.) is one of the most essential oil crops. Genetic improvement of seed yield (SY) is a major aim of B. napus breeding. Several studies have been reported on the genetic mechanisms of SY of B. napus. Here, a genome-wide association study (GWAS) of SY was conducted using a panel of 403 natural accessions of B. napus, with more than five million high-quality single-nucleotide polymorphisms (SNPs). A total of 1773 significant SNPs were detected associated with SY, and 783 significant SNPs were co-located with previously reported QTLs. The lead SNPs chrA01__8920351 and chrA02__4555979 were jointly detected in 1 3 Vol:. (1234567890) 2009; Zhou et al. 2014;Zhao et al. 2016;Chao et al. 2019). Two hundred twenty-six quantitative trait loci (QTLs) have been identified for yield and yield-related traits (biomass yield, thousand-seed weight, plant height, first effective branch height, first effective branch number, length of main inflorescence, and pod number of main inflorescence) in eight environments (Zhao et al. 2016). Among them, eighteen QTLs were detected for SY, explaining 5.63-20.91% of the phenotypic variation (PVEs) (Zhao et al. 2016). Besides that, a total of 870 QTLs were identified for nine yield related traits, of which nearly 10% (85) were identified for SY, which can explain 3.1-33.4% of the PVEs (Shi et al. 2009).
Compared with linkage analysis, GWAS has the advantages of high positioning accuracy without constructing mapping population Liu et al. 2022). GWAS has been widely applied to the genetic dissection of seed (grain) yield in different crops, such as rice (Zhong et al. 2021), maize (Ma and Cao 2021), oil palm (Kalyana et al. 2021), wheat (Garcia et al. 2019), soybean (Hu et al. 2020), and cotton (Zhu et al. 2021). In rice, seventy-four quantitative trait nucleotides (QTN) were significant associated with grain yield and yield related traits, such as grain length (GL), grain width (GW), grain thickness (GT), thousandseed weight (TSW), and yield per plant (YYP), respectively, and 17 QTNs for YYP explained 0.28-6.99% of the PVEs (Zhong et al. 2021). The association panel of maize consisting of 309 cultivars was used to perform GWAS for eight grain yield traits (Ma and Cao 2021). Only one SNP was identified for grain yield per plant with a PVE of 5.92% (Ma and Cao 2021). In recent years, GWAS has also been widely used in the genetic analysis of complex agronomic traits in B. napus, including harvest index (Lu et al. 2017), plant height (Sun et al. 2016), branch number (Zheng et al. 2017), branch angle (Shen et al. 2018), and seed yield (Pal et al. 2021). A total of 16, 18, 27, and 18 SNPs were identified to be significantly associated for siliqua length, number of seeds per siliqua, thousand-seed weight and SY, respectively (Pal et al. 2021). In addition, six candidate genes for SY were identified on chromosomes A02 and C02 (Pal et al. 2021). However, the genotypes used in above GWAS were all based on 60 K SNP chips.
With the continuous development of sequencing technology, genetic markers developed by wholegenome re-sequencing have been widely used in B. napus GWAS analysis .
In this study, we investigated the SY in an association panel of B. napus across 2 years. GWAS of SY was performed using more than five million SNPs by whole-genome resequencing. A total of 1773 significant SNPs were identified to be associated with SY and three candidate genes were identified. In addition, three haplotypes of BnaA-01g17200Hap1 (TTAT), BnaA02g08680Hap1 (GGCT), and BnaA09g10430Hap2 (TGA CAT CCCCG) associated with high SY were revealed, which provided the molecular basis for subsequent molecular marker-assisted breeding.

Plant materials and field trials
A total of 403 B. napus accessions were used in this study. Detailed information on them and their genotypes are described in our previous study . All accessions were planted in sandy loam soil at Meichuan Town, Wuxue City, Hubei Province, China (115.55°E, 29.85°N) during B. napus growing seasons from 2018 to 2019 (Trial 1) and from 2019 to 2020 (Trial 2). Each accession had four rows and each plot had eight plants in each row. After harvest, six plants in each plot were selected to measure SY. The mean value, maximum, minimum, and coefficient of variation were calculated using Excel 2007. Trial 1_mean value and Trial 2_mean value were calculated with the phenotypic values of three replications in Trial 1 and Trial 2, respectively (Supplementary Table 1). The R language was used to calculate the correlation coefficients between trials.
Genome-wide association analysis and candidate gene identification More than 5 million high-quality SNPs of the association panel were derived from previous study Liu et al. 2021). Tassel 5.0 software was used for GWAS analysis by general linear models (GLM) and mixed linear models (MLM) (Bradbury et al. 2007). The threshold for this population was 6.25 × 10 −07 ). The analysis of non-synonymous and synonymous SNPs was performed based on gene models from the version 5 annotation on Genoscope using SnpEff (Cingolani et al. 2012). GGplot2 (https:// cran.r-proje ct. org/ web/ packa ges/ ggplo t2/ index. html) and CMplot (https:// github. com/ YinLi Lin/ CMplot) software were used to draw Manhattan plot and quantile-quantile plot, respectively. The population could be divided into five subgroups based on the cross-validation errors and the r pairwise relative kinship was close to 0 . The LD value of the association panel is 300 kb ; thus, the genes located within LD decay (300 kb) value up-/downstream of the peak SNPs were considered as candidate genes. The genotypes of BnaA01g17200D, BnaA02g08680D, and BnaA09g10430D in the association panel of B. napus were obtained by VCFtools software (Danecek et al. 2011). Candidate gene association study of BnaA01g17200D, BnaA02g08680D, and BnaA09g10430D was performed with Tassel 5.0 software (Bradbury et al. 2007). The SNP markers from 2 kb upstream of the gene to termination codon were used to conduct association analysis with the SY of the association panel of B. napus.

Candidate gene identification and haplotype analysis
The published genome of "Darmor-bzh" was used as a reference genome to annotate the candidate genes by BLAST analysis using gene sequence from Arabidopsis and rice (Chalhoub et al. 2014). The best hits for B. napus were considered to be the homologous gene. To detect candidate genes for SY, all genes located in the 300 kb up-/downstream of candidate lead SNP were selected and common differentially expressed genes were predicted to be candidate genes combined with the transcriptome data of four high and four low-yield B. napus cultivars (Lu et al. 2017). The annotation of candidate genes was obtained from TAIR10 (https:// www. arabi dopsis. org/). HaploView.4.2 software was used to conduct haplotype analysis (Barrett et al. 2005). Haplotypes containing at least 10 B. napus accessions were used for further comparative analysis, and Student's t-test was used to compare the differences in SY among the haplotypes.

Phenotypic variation for SY of an association panel of B. napus
SY ranged from 1.51 to 25.07 g/plant (16.60-fold) in Trial 1 and from 2.18 to 35.88 g/plant (16.45fold) in Trial 2 (Table 1). A wide range of variations were detected, with the coefficient of variation (CV) varying from 33.71 in Trial 2 to 46.12% in Trial 1 ( Table 1). SY in both trials showed an approximate normal distribution ( Supplementary Fig. 1). The correlation coefficient (r) of SY between Trial 1 and Trial 2 was 0.41 (Table 1, Supplementary Fig. 2).

GWAS of SY of B. napus
GLM and MLM approaches were used to perform GWAS analysis to reveal SNPs significantly associated with SY of B. napus (Supplementary Figs. 3,4,. A total of 1773 SNPs were identified to be significantly associated with SY across two trials by GLM approach (P < 6.25 × 10 −07 ) (Supplementary Table 2). The PVE of 1773 SNPs ranged from 5.57 to 12.73% (Supplementary Table 2 Candidate gene association analysis and haplotype analysis of BnaA01g17200D and BnaA02g08680D The SNP of chrA01__8920351 co-located on chromosome A01 was significantly associated with SY in both Trial 2_2 and Trial 2_mean value (Fig. 1a,  Table 5). Previously, transcriptome sequencing was performed on extremely high-and low-yield B. napus cultivars to mine yield-related candidate genes (Lu et al. 2017).
We compared the GWAS results in this study with previous transcriptome data, and results showed that BnaA01g17200D was the only one significantly differentially expressed gene among the 87 candidate genes on A01 chromosome ( Table 2, Supplementary  Table 5). The expression level of BnaA01g17200D in bud on the primary branch of high-yield cultivars was significantly higher than that of low-yield cultivars ( Table 2, Supplementary Table 5). We  Table 6). Four SNPs in BnaA01g17200D were detected to be significantly associated with SY (P < 0.001) ( Table 3). Further analysis of the four significant SNPs showed that T allele of chrA01__9042710, T allele of chrA01__9042762, A allele of chrA01__9044014, and T allele of chrA01__9044072 were high-seed yield alleles (Fig. 1b-f). Based on the above four SNPs, two major haplotypes were detected, and cultivars with BnaA01g17200Hap1 (TTAT) had much higher SY than cultivars with BnaA01g17200Hap2 (CAGC) (Fig. 1g).
The SNP of chrA02__4555979 on chromosome A02 was identified in Trial 1_2 and Trial 1_mean value (Fig. 2a, Supplementary Table 2, 4). Based on the LD decay (300 kb), 108 genes were selected within the 300 kb up-/downstream of the chrA02__4555979 (Supplementary Table 6). The expression of BnaA02g08680D was 6.14-fold between high-yield B. napus cultivars and low-yield B. napus cultivars in bud on the main inflorescence (Lu et al. 2017, Table 2). Ten SNPs were extracted within the coding region and promoters (upstream, 2000 bp) of BnaA02g08680D and four SNPs in Bna-A02g08680D were significantly associated with the SY, and G allele of chrA02__4290037, G allele of chrA02__4290087, C allele of chrA02__4290091, and T allele of chrA02__4290092 were the high-seed yield alleles (Fig. 2b-f, Supplementary Fig. 6). The above significant SNPs revealed two major haplotypes, and cultivars harboring BnaA02g08680Hap1 (GGCT) had significantly greater seed yield than cultivars harboring BnaA02g08680Hap2 (AATA) (Fig. 2g).
Comparison of the significant SNPs associated with SY with QTLs for SY Based on the Darmor-bzh reference genome, a total of 783 SNPs were co-located with the intervals of 47 QTLs for SY in the previous linkage analysis of the B. napus segregation populations (Bouchet et al. 2014;Cai et al. 2016;Luo et al. 2017;Supplementary Table 8). A significant SNP (chrA09__5160639) Fig. 2 The co-localized locus and haplotypes on chromosome A02 associated with SY of B. napus. a Manhattan plot of co-localized locus for SY in Trial 1_2 and Trial 1_mean value. b Candidate gene association study of BnaA02g08680D with SY. Association of the four alleles in chrA02__4290037 (c), chrA02__4290087 (d), chrA02__4290091 (e), and chrA02__4290092 (f) with SY, respectively. g Two haplotypes of BnaA02g08680D. The number of inbred lines harboring the corresponding allele is shown in brackets at the bottom identified in Trial 2_3 was co-located with the interval of "qSY38" QTL and a gene (BnaA09g10430D) reported to control SY was within the interval of this QTL (Luo et al. 2017, Supplementary Table 8).
OsNPF7.2, the orthologue of BnaA09g10430D, regulates tiller number and grain yield in rice . Candidate gene association analysis of BnaA09g10430D with SNP from the 2-kb promoter and the entire coding region showed that 19 SNPs were significantly associated with SY, and had strong LD with each other (Fig. 3b, Table 3, Supplementary Table 6). Among them, 7 were located in the promoter, 4 in the exon and 8 in the intron region (Table 3). Notably, chrA09__5322341 (C/T), chrA09__5323050 (A/G), chrA09__5323064 (T/A), and chrA09__5323081 (A/C) were located in the exon region of the gene BnaA09g10430D and resulted in amino acid changes from arginine to cysteine, isoleucine to valine, leucine to terminator and glutamine to histidine, respectively (Table 3). Two major haplotype groups were obtained based on the above 11 significant SNPs (Fig. 3c). Among them, BnaA09g10430Hap2 (TGA CAT CCCCG) had significantly higher SY than BnaA09g10430Hap1 (CAT ATC TGATA); therefore, BnaA09g10430Hap2 was considered as the favorable haplotype (Fig. 3c). In addition, a total of 67 B. napus cultivars containing these three favorable haplotypes had higher SY than other B. napus cultivars (Fig. 4).

Discussion
Effects of growth environment and GWAS methods on the identification of SNPs associated with SY SY of B. napus is greatly affected by the growth environments (Raman et al. 2022). As the cultivars used in this study are too many (403 B. napus accessions), three contiguous fields were used, and each replication is an independent experimental field in each year. Although the soil physical and chemical properties in the field trials are very similar, some parameters, such as soil-available P and total of K, are much different in three fields . The basic soil fertility could affect the segregation in the seed yield in the association panel and finally affect the number of significant SNPs identified in different replications (Supplementary Table 2). In order to obtain the reliable b Candidate gene association study of Bna-A09g10430D with SY. c Two haplotypes of BnaA09g10430D. The number of inbred lines harboring the corresponding allele is shown in brackets at the bottom SNPs, the seed yields in Trial1_1, Trial1_2, Trial1_3, Trial1_mean value, Trial2_1, Trial2_2, Trial2_3, and Trial2_mean value were all used to identify the colocated SNPs, respectively (Supplementary Table 4). Moreover, there were only 16 SNPs detected by MLM approach in this study, and it is likely that some key SNPs significantly associated with SY were missed by the MLM approach (Supplementary Table 3). For example, many SNPs co-located with the interval of QTLs for SY in previous reports were not identified by MLM approach such as chrA05__11888567 and chrC08__33928511 (Bouchet et al. 2014;Cai et al. 2016;Luo et al. 2017;Pal et al. 2021 Pal et al. (2021), the natural population (403 B. napus accessions) panel in this study has more extensive phenotypic variation. In addition, more than 5 million SNPs by whole genome-wide re-sequencing were used for GWAS analysis in this study Liu et al. 2021). Compared with previous studies using 60 K SNP chip as genotype for GWAS of SY of B. napus, more significant SNPs related to SY were identified in this study (Supplementary Table 2-3).
Understanding the genetic variation of SY and SY-related traits is helpful for improving crop yield. GWAS of SY-related traits including PH (Sun et al. 2016), BN (Zheng et al. 2017;Liu et al. 2021), PB , biological yield (Xiao et al. 2021), and SY in B. napus (Pal et al. 2021) has been conducted in recent years. For instance, a panel of 476 inbred lines of B. napus was genotyped using 19,167 high-quality SNPs, and a total of 48 significant SNPs were identified to be associated with plant height (Sun et al. 2016). With a panel of 331 B. napus accessions and 24,508 SNPs, 27 loci were significantly associated with silique number . In another study, a panel of 505 inbred lines of B. napus was genotyped using more than ten million SNPs, and a total of 88 significant loci were identified to be significantly associated with biological yield (Xiao et al. 2021). Chloroplast membrane proteinlocated FATTY ACID EXPORTER 1-1 (FAX1-1) is identified as a candidate gene for the significant SNPs associated with biological yield through the combination of GWAS with transcriptome sequencing (Xiao et al. 2021). In this study, a total of 103 SNPs associated with SY were co-located with previously reported SNPs and QTLs for seed number per pod (49), pod number (21), and TSW (33) (Shi et al. 2015;Lu et al. 2017;Khan et al. 2019;Pal et al. 2021;Supplementary Table 9). In addition, two SNPs (C03.27718223 and C02.22109942) associated with SY identified by Pal et al. (2021) were also identified in our study (Supplementary Table 8).
These repeatedly identified significant SNPs should be focused on in the further study.
In this study, a total of 1773 SNPs were detected for SY and 783 SNPs were co-located with QTLs for SY and SY-related QTLs by linkage analysis reported in previous studies (Bouchet et al. 2014;Luo et al. 2017;Cai et al. 2016;Pal et al. 2021) (Supplementary Table 8). ChrA09__5160639 was co-located with QTL "qSY38" (Luo et al. 2017) and BnaA09g10430D (BnaA09.NPF8.5) was considered to be the most likely candidate gene. BnaA09.NPF8.5 had an MFS (Major facilitator superfamily) domain-contain protein. In rice, NPF7.2 enhanced influx of nitrate and nitrate concentration and improved root length, root weight, and fresh weight, thereby increasing tiller number and grain yield ).

The candidate genes for SY in B. napus
In this study, 9.53% (169/1773) of the significant SNPs were identified in more than one repetition (Supplementary Table 4). For example, chrA02__4555979 was identified simultaneously in Trial 1_2 and Trial 1_mean value. BnaA02g08680D was located 269 kb upstream of chrA02__4438592. The expression of BnaA02g08680D in bud on the main inflorescence of high-yield cultivars was significantly higher than that of low-yield cultivars (Table 2). In addition, chrA01__8920351 was detected simultaneously in Trial 2_2 and Trial 2_mean value (Supplementary Table 2). BnaA01g17200D was located 121 kb downstream of chrA01__893123. The expression level of BnaA01g17200D in bud on the primary branch of high-yield cultivars was 6 times higher than that in low-yield cultivars (Table 2). In Arabidopsis, ATSESA1 (SEED STORAGE ALBUMIN 1), the orthologue of BnaA01g17200D, regulates seed protein accumulation to influence seed growth (Hu et al. 2021). In addition, some genes which have been reported to improve SY were also identified in our GWAS (Supplementary Table 10). The mediator subunit, OsMED15 is important for flowering, seed size and weight in rice (Dwivedi et al. 2019). The significant SNP chrA05__18407128 on A05 was identified to be located downstream 101 kb of B. napus BnaA05.TCP4 (TCP family transcription factor 4), which is the homologous of OsMED15. Optimal flowering time is associated with high SY, and AtTCP4 regulates flowering time in Arabidopsis (Schiessl et al. 2015;Kubota et al. 2017). BnaC08g36050D, a member of the AUX1 LAX family of auxin influx carriers, was located within the interval of SNP chrC08__33833529. AT2G21050, the homologous of BnaC08g36050D in Arabidopsis, has been reported to increase xylem area and enhance SY (Cabello and Chan 2019). These reported SY related genes detected in this study further prove the accuracy of our GWAS results.
Favorable alleles, haplotypes, and high-seed yield breeding in B. napus Traditional breeding methods are difficult to meet the demand for food and edible oil caused by the growing population (Ma and Cao 2021). Mining superior alleles and haplotypes for molecular marker-assisted breeding has gradually become a new method for breeders (Ma and Cao 2021). Candidate gene association study can detect genetic variants within genes that are significantly associated with phenotypes. Genetic variations are located in the promoter and coding regions, which may lead to the difference of gene expression level and protein sequences among cultivars, respectively. In this study, we conducted candidate gene association study on three candidate genes (BnaA01g17200D, BnaA02g08680D, and BnaA09g10430D) and identified several genetic variations significantly related to SY within the genes. Four significant SNPs (chrA01__9042710, chrA01__9042762, chrA01__9044014, and chrA01__9044072) in BnaA01g17200D and four SNPs (chrA02__4290037, chrA02__4290087, chrA02__4290091, and chrA02__4290092) in BnaA02g08680D were all located in the promoter. Seven, four, and eight significant SNPs associated with SY were located in the promoter, the exon, and the intron of BnaA09g10430D, respectively; and the SNPs located in the exon caused the changes of amino acids among different haplotypes (Table 3). Three high-yield haplotypes (BnaA01g17200Hap1, BnaA02g08680Hap1, and BnaA09g10430Hap2) were identified based on the above significant SNPs (Figs. 1, 2, 3). SYs of 67 B. napus cultivars harboring these three favorable haplotypes were significantly higher than those of other cultivars in 2-year field experiments (Fig. 4). These favorable alleles and haplotypes will be valuable for the cultivation of highyield B. napus cultivars.

Conclusion
SY of B. napus is a complex quantitative trait controlled by multiple genes. A total of 1773 SNPs were identified to be significantly associated with seed yield across two years, and 783 of them were co-located with previous reported QTLs. Candidate gene association and haplotype analysis of BnaA01g17200D, BnaA02g08680D, and Bna-A09g10430D revealed several high-yield haplotypes. These significant SNP loci, candidate genes, favorable alleles, and haplotypes will be useful for the breeding of high-yield B. napus cultivars.