Genome-wide association study identifies new loci for 1000-seed weight in Brassica napus

Oilseed rape (Brassica napus L.) is an important oilseed crop worldwide and 1000-seed weight (SW) is the important determinant of seed yield of B. napus. To elucidate the genetic mechanism of SW and mining candidate genes, a panel of 403 diverse B. napus accessions was screened in a genome-wide association study (GWAS) using 1.60 million single-nucleotide polymorphisms (SNPs). This study identified 340 SNPs significantly associated with SW by general linear model (GLM) and mixed linear model (MLM). Through GWAS combined with transcriptome data, two significantly differentially expressed genes were identified as candidate genes (BnaA02g06870D and BnaC06g28920D). Candidate gene association analysis and haplotype analysis showed that the inbred lines carrying ACCC at BnaA02g06870Hap1 and TTGG at BnaC06g28920Hap1 had greater SW than lines carrying other haplotype alleles. Candidate genes and favourable haplotypes identified in this study will be useful for large-seed breeding of B. napus.


142
Page 2 of 15 Vol:. (1234567890) (Dwivedi et al. 2019), OsMKKK10 (Xu et al. 2018), OsMKK4 (Xu et al. 2018), and OsMAPK6 in rice (Xu et al. 2018); GmCYP78A72 and GmBZR1 in soybean (Zhao et al. 2016a;Lu et al. 2017b); and in oilseed rape (Liu et al. 2015). However, most genes associated with SW have not been excavated due to its complexity of genetic mechanism . Linkage mapping analysis has been widely used to study the genetic basis of SW in B. napus (Shi et al. 2009;Zhao et al. 2016b;Wang et al. 2020). One hundred and fifty-nine quantitative trait loci (QTLs) are significantly associated with SW in B. napus on the analysis of two (TNDH and RC-F2) populations in ten natural environments, and one major QTL was detected in ten environments (Shi et al. 2009). In addition, twenty-one significant QTLs are associated with seed yield and yield-related traits from three different trials (Zhao et al. 2016b). Among them, nine QTLs were associated with SW, explaining 4.58-19.62% of the phenotypic variation (PVE) (Zhao et al. 2016b). Recently, a significant QTL (cqSW.A03-2) associated with SW is detected across multiple environments, explaining 8.46-13.70% of the PVE . The subsequent candidate gene association analysis and gene expression analysis show that BnaA03g37960D is the most likely candidate gene for the significant QTL .
The physical intervals for QTLs are usually very large, which makes it difficult to identify candidate genes (Xiao et al. 2017;Liu et al. 2022). So far, only two genes (BnaA09.ARF18 and BnaA9.CYP78A9) controlling SW have been cloned in B. napus using linkage analysis (Liu et al. 2015;Shi et al. 2019). Compared with QTL mapping, GWAS take advantage of PVE and historical recombination in natural populations, which greatly improves the efficiency of gene mapping (Nordborg and Weigel 2008). GWAS has become a common method to analyze the genetic structure of complex agronomic traits, and it has also been widely used in mining genes associated with seed (grain) weight, such as in rice (Tao et al. 2019;Niu et al. 2021), maize , soybean (Qi et al. 2020;Zhang et al. 2021) and oilseed rape (Li et al. 2014a;Lu et al. 2017a). With the maturity of technology and the decline of sequencing cost, researchers have developed single-nucleotide polymorphisms (SNPs) markers through whole-genome resequencing technology in B. napus Lu et al. 2019;Tang et al. 2021). Some traits previously analyzed by GWAS using 60 K SNP chip as genotype have been re-analyzed, and some new loci and candidate genes have been excavated, such as, flowering time , glucosinolate content Tan et al. 2022) and seed oil content (Xiao et al. , 2021Tang et al. 2021). In this study, a diverse panel of 403 B. napus accessions were scored for SW and 340 significant SNPs were identified by GWAS. Two genes, BnaA02g06870D and BnaC06g28920D, were identified as candidate genes, and the favourable haplotypes (BnaA02g-06870Hap1 and BnaC06g28920Hap1) were revealed for breeding large-seed B. napus cultivars. These results will contribute to improve our understanding of the genetic mechanism of SW of B. napus and breeding of large-seed B. napus cultivars.

Plant materials and field trails
Detailed information on the 403 diverse B. napus accessions and genotypes are described in our previous study ). All B. napus accessions were grown in the field with three replications at Meichuan Town, Wuxue city, Hubei province, China (115·55° E, 29·85 N) from 2018 to 2019 (Trial 1) and from 2019 to 2020 (Trial 2). Each accession was grown in a plot with four rows, and each plot had eight plants in each row, with a distance of 25 cm between plants in each row and 25 cm between rows. After harvest, six plants for each repeat of each accession were selected to investigate SW.
Genome-wide association analysis and candidate gene identification More than 10 million high-quality SNPs of the association panel were derived from a previous study and after filtering the SNPs with minor-allele frequency (MAF) > 0.05 and missing rate < 0.2, a total of 1.60 million SNPs were used for GWAS Liu et al. 2021). Genome-wide association analysis for SW was carried out using general linear model (GLM) and mixed linear model (MLM) by Tassel 5.0 software (Bradbury et al. 2007). The significant P-value thresholds for the association panel was 6.25 × 10 −07 (1/1600000). GGplot2 (https:// Page 3 of 15 142 Vol.: (0123456789) cran.r-proje ct. org/ web/ packa ges/ ggplo t2/ index. html) software was used to draw Manhattan plot, and CMplot software was used to draw Quantile-Quantile plot (https:// github. com/ YinLi Lin/ CMplot). The LD decay, population structure and kinship of this association panel have been reported in previous studies ). The genes located within LD decay value (273 kb) upstream and downstream of the peak SNPs were considered as candidate genes. The genotypes of BnaA02g06870D and BnaC06g28920D in the association panel of B. napus were obtained by VCFtools software (Danecek et al. 2011). Candidate gene association analysis of BnaA02g06870D and BnaC06g28920D were performed with Tassel 5.0 software (Bradbury et al. 2007). The SNP markers from 2 kb upstream of the gene to termination codon were used to conduct association analysis with the SW of the association panel of B. napus.

Haplotype analysis
HaploView.4.2 software was used to conduct haplotype analysis (Barrett et al. 2005). Haplotypes containing at least 10 B. napus accessions were used for further comparative analysis, and Student's t-test was used to compare the differences in SW among the haplotypes.

Statistical analysis of phenotypic data
The mean value, maximum, minimum and coefficient of variation were calculated using Excel 2007. The R language (3.5.3) was used to calculate the correlation coefficients between trials by pairwise method.

Results
Phenotypic variation for SW of an association panel of B. napus SW was investigated for the association panel of 403 B. napus accessions in 2018-2019 (Trial 1) and 2019-2020 (Trial 2) field trials. Extensive phenotypic variations for SW were observed in the association panel of B. napus ( Fig. S1; Tables 1, S1). For example, SW ranged from 3.30 to 4.60 g/ 1000 seeds (1.4-fold) in Trial 1 and from 3.00 to 5.08 g/ 1000 seeds (1.69-fold) in Trial 2 ( Table 1).
The correlation coefficient (r) of SW between Trial 1 and Trial 2 were 0.62, which showed that the phenotype had good repeatability in two trials (Fig.  S2).

Genome-wide association study of SW of B. napus
We performed GWAS with GLM and MLM approaches to identify SNPs associated with SW in B. napus. A total of 340 SNPs were significantly associated with SW across two trials (P < 6.25 × 10 −07 ) ( Fig. 1; Tables S2, S3). Among the 340 SNPs, 180 were identified in Trial 1, 16 in Trial 2, and 144 were identified by the mean values of two trials ( Fig. 1; Tables S2, S3). The GLM analysis detected a total of 340 SNPs significantly associated with SW, distributed on 18 of the 19 B. napus chromosomes (excluding A08). Chromosomes C06 and A02 had the largest number of significant SNPs (62) and the second largest number of significant SNPs (61 SNPs), respectively (Table S2). Since MLM considers both population structure and kinship, only 3 SNPs were detected by MLM, which distributed on chromosomes A02, C06 and C07 of B. napus, explaining the PVE of 9.72-10.25% ( Fig. 1D-F; Table S3). These three SNPs were detected simultaneously by GLM and MLM (Tables S2, S3). Additionally, 10.30% (35/340) of the significant SNPs were identified in more than one trial (including the mean value of two trials), which showed high reliability ( Fig. 1, Table S2). For example, the significant SNP marker (chrA02__3219827) was detected by Trial 1, Trial 2 and the mean value of two trials, which could explain 12.81% of the PVE of SW ( Fig. 1A-C; Table S2). The significant SNP marker (chrC06__29452686) was detected by Trial 1, Trial 2 and the mean value of two trials, which could explain 9.28% of the PVE of SW ( Fig. 1A-C; Table S2).

Comparison of significant SNPs by GWAS with reported QTLs for SW in B. napus
Based on the Darmor-bzh reference genome, we analysed the co-localization of the significant SNPs detected in our study and the QTLs detected by previous studies (Li et al. 2014b;Fu et al. 2015;Liu et al. 2015;Wang et al. 2020). Nine significant SNPs detected by GWAS co-localized with the intervals of the QTLs for SW in a previous linkage analyses, including seven SNPs on A03, and two SNPs on A09 (Table 2). For example, the SNP 'chrA03__18945635' on chromosome A03 was detected for SW in Trial 1, and explained 8.83% of the PVE, which co-located with the reported QTL (cqSW. A03-2) ( Table 2). The SNP 'chrA09__27535146' on chromosome A09 was detected for SW with the mean value of the two trials, and explained 9.34% of the PVE (Table 2), which co-located with the reported QTL (cqSW.A09-3) ( Table 2).  The SNP of chrA02__3219827 on chromosome A02 was associated with SW in both Trial 1, Trial 2 and the mean value of two trials ( Fig. 1A-C; Table S2). The LD decay of chromosome A02 in this association panel was 174 kb ). Based on the LD decay, 174 kb up/down-stream of the significant SNP (chrA02__3219827) was selected to identify candidate genes on A02 and 82 candidate genes were detected (Table S4). Previously, a transcriptome analysis of the immature seeds of two B. napus lines with extremely different SW has been conducted to identify the candidate genes related to SW (Geng et al. 2018). In this study, a comparison of our GWAS and transcriptome sequencing results by Geng et al. (2018) revealed 2 common genes (BnaA02g06330D and BnaA02g06870D) on A02 chromosome (Table 3). In the seeds of large-seed line, the expression levels of BnaA02g06330D and BnaA02g06870D were 4.18 and 3.02 fold higher than those of smallseed line, respectively (Table 3). Twelve SNPs were located within the 2 kb promoter region and the entire coding region of BnaA02g06870D (Tables  S4, S5). However, no SNP was identified within the corresponding region of BnaA02g06330D (Table 3). We performed candidate gene association analysis of BnaA02g06870D, and four SNPs in BnaA02g06870D were significantly associated with SW in Trial 1, Trial 2 and the mean value of two trials ( Fig. 2A-C). Further analysis demonstrated that the A allele of chrA02__3257485, C allele of chrA02__3257503, C allele of chrA02__3257506 and C allele of chrA02__3257518 were the large-seed alleles (Fig. 3). Two major haplotypes were detected, and cultivars with BnaA02g06870Hap1 (ACCC) had much higher SW than cultivars with BnaA02g-06870Hap2 (TTTT), in Trial 1, Trial 2 and the mean value of two trials. Thus BnaA02g06870Hap1 was confirmed as the favorable haplotype ( Fig. 2D-F). The confidence region significantly associated with SW in both Trial 1, Trial 2 and the mean value of two trials, on chromosome C06 ranged from 29.47 to 31.25 Mb (Fig. 1A-C; Table S2). The lead SNP 'chrC06__30203895' on chromosome C06 was detected for SW in Trial 1, and explained 11.30% of the PVE (Table S2). In previous study, the LD decay of C06 chromosome in this association panel was 229 kb . Based on the LD decay, 229 kb upstream and downstream regions of the significant SNP (chrC06__30203895) were selected and found to contain 81 genes (Table S6). One significantly differentially expressed gene (BnaC06g28920D) was detected by combining GWAS and the transcriptome data by Geng et al. (2018) (Table 3). The expression levels of BnaC06g28920D in the seeds of large-seed line were 3.04 fold higher than that of small-seed line (Table 3). A total of twenty SNPs were located within the 2 kb promoter region and the entire coding region of BnaC06g28920D (Table 4). Among them, five SNPs in BnaC06g28920D were detected to be significantly associated with the SW in Trial 1, six in Trial 2 and seven with the mean value of two trials by candidate gene association analysis ( Fig. 4A-C). Notably, chrC06__30094959 (T/C) were located in the exon region of the gene BnaC06g28920D and resulted in amino acid changes from isoleucine to threonine; chrC06__30095553 (T/C) were located in the exon region of the gene BnaC06g28920D and resulted in amino acid changes from leucine to proline; chrC06__30095570 (G/T) were located in the exon region of the gene BnaC06g28920D and resulted in amino acid changes from valine to leucine; and chrC06__30096038 (G/C) were located in the exon region of the gene BnaC06g28920D and resulted in amino acid changes from valine to leucine (Table 4). We observed that the T allele of chrC06__30094959, T allele of chrC06__30095553, G allele of chrC06__30095570, and G allele of chrC06__30096038 were large-seed alleles (Fig. 5). These four significant SNPs revealed four haplotypes, and BnaC06g28920Hap1 (TTGG) had significantly greater SW value than other haplotypes in Trial 1, Trial 2 and the mean value of two trials. BnaC06g28920Hap1 was a favorable haplotype ( Fig. 4D-F). A total of 44 B. napus cultivars in the association panel contained these eight favorable alleles, and had higher SW than B. napus cultivars with other alleles (Fig. 6).

Discussion
Overlapped loci for SW between this study and previous studies SW is an important yield related trait and breeding large seed B. napus is a long-term goal of breeders (Fan et al. 2010;Wang et al. 2020). Mining genes controlling SW and finding excellent haplotypes associated with SW can promote molecular marker assisted breeding of B. napus. A panel of 472 inbred lines of B. napus were genotyped using 24,256 SNPs, and two SNPs were associated with SW, which distributed on A07 and A09 chromosomes, and explained 4.90% and 13.87% of the PVE, respectively (Li et al. 2014a). In another study, nine SNPs were significantly associated with SW on A02, A03, A08, A09, A10, C03, C04 and C09 chromosomes by using a panel of 520 B. napus accessions and 31,839 SNPs (Lu et al. 2017a). In addition, seven SNPs were significantly associated with SW with 157 B. napus cultivars and 690,953 SNPs ). In  our study, a panel of 403 diverse B. napus accessions was screened in a GWAS using more than one million SNPs and identified 340 significant SNPs associated with SW (Tables S2, S3). Twenty-two significant SNPs in this study were adjacent to previously published significant loci for SW, which increased the credibility of these SNPs (Table S7). For example, chrA09__33133565 explained 7.78% of PVE of SW, which was adjacent to the significant SNP (Bn-A09-p30654305) located in Li et al. (2014a) ( Table S7). Moreover, chrA02__3573374, a lead SNP on the A02 chromosome was detected by SW in Trial 1, which was adjacent to the significant SNP (rs4600) located in Lu et al. (2017a) (Table S7). In addition to GWAS method, some QTLs related to SW were also mined by linkage mapping. The intervals of a major QTL controlling SW of B. napus were reported overlapped in different studies (Basunanda et al. 2010;Yang et al. 2012;Li et al. 2014b;Fu et al. 2015;Liu et al. 2015;Dong et al. 2018) (Table 2; Table S7). Subsequent studies show that BnaARF18 in this interval regulates cell growth in the silique wall by acting via an auxin-response pathway and finally changed the SW (Liu et al. 2015). In this study, SNP chrA09__27915575 explained 9.34% of PVE of SW, which was co-located with the significant QTL of cqSW.A09 (Tables 2, S7). The mean value of different replicates have been widely used in GWAS of Brassica napus, such as plant height and primary branch number , harvest index (Lu et al. 2016) and cadmium accumulation . In this study, the average SW of two trials were used to perform GWAS analysis in order to reduce the impact of phenotypic variations on the results of GWAS analysis. The lead SNPs of chrA02__3219827 and chrC06__29452686 were both associated with SW in Trial 1, Trial 2 and the mean value of two trials ( Fig. 1A-C; Table S2). These SNPs showed higher reliability, and they were newly found important loci for SW, which were not detected in previous studies.
Causal genes associated with the significant loci for SW The LD decay values in several GWAS studies on SW of B. napus were 1.2 Mb (472 inbred lines and 24,256 SNPs), 0.8 Mb (520 inbred lines and 31,839 SNPs) and 0.6 Mb (520 inbred lines and 690,953 SNPs), respectively (Li et al. 2014a;Lu et al. 2017a;Dong et al. 2018). Large LD decay in B. napus indicates an associated locus contains more genes, and thus it is difficult to pinpoint the causal genes associated with the significant loci. Although the LD decay (237 kb) value in this association panel is much smaller than that in previous studies, however, there are still dozens of genes in the interval, so it is also difficult to determine the real candidate genes. For example, there were 82 candidate genes within the range of LD decay value upstream and downstream the lead significant SNP (chrA02__3219827) on the A02 chromosome (Table S4).
GWAS combined with transcriptome analysis has become a common combinatorial analysis for mining candidate genes in B. napus in recent years (Lu et al. 2016;Wei et al. 2019;Xiao et al. 2019). For example, Xiao et al (2019) combined GWAS and transcriptome to mine seed oil content-related candidate genes and identified seven candidate genes ). In addition, a GWAS combined with transcriptome identified 33 significantly differentially expressed candidate genes located within the confidence intervals of significant SNPs for harvest index (Lu et al. 2016). In this study, the reported transcriptome data of the immature seeds of two B. napus lines with extremely different SW (Geng et al. 2018) were employed, and BnaA02g06870D and BnaC06g28920D were predicted to be candidate genes.
Favorable alleles, haplotypes and large-seed breeding Candidate gene association study can mine the genetic variation significantly related to phenotype in candidate genes Zheng et al. 2017;Yu et al. 2019;Wang et al. 2020). For example, the favorable alleles of candidate gene 'BnaA03g37960D' for SW were identified in B. napus and SW of cultivars with two different SNPs (SA03_18861910 and SA03_18862302) are significantly different . In another study, the significant SNP Bn-A02-p9505646 (G/A) of the GG allele showed the largest contribution (PVE ~ 9.79%) to plant height of B. napus, and the plant height of plants with GG allele are far higher than those with AA allele. The significant SNP Bn-A02-p9505646 (C/A) of the CC allele showed the largest contribution (PVE ∼9.87%) to branch initation height, measuring 11.4 cm more than those of the AA allele (Zheng et al. 2017). In this study, we performed candidate gene association analysis on two candidate genes (BnaA02g06870D and BnaC06g28920D) and mined a total of eight significant SNPs (Figs. 2, 4). The SW of cultivars with favorable alleles was higher than that of other cultivars (Figs. 3, 5, 6). The genetic variation in promoter and 5' untranslated region (5'UTR) may cause the difference of candidate gene expression among cultivars and the genetic variation in exon region may cause the change of amino acids among cultivars, which can affect the function of the candidate gene protein (Farashi et al. 2019). In this study, four SNPs (chrA02__3257485, chrA02__3257503, chrA02__3257506 and chrA02__3257518) significantly related to seed weight were detected in candidate gene Bna-A02g06870D, which were located in the promoter region of BnaA02g06870D ( Fig. 2; Table S5). In another candidate gene BnaC06g28920D, four SNP loci (chrC06__30094959, chrC06__30095553, chrC06__30095570 and chrC06__30096038) were significantly related to SW ( Fig. 4; Table 4). These SNPs were located in the exon region and caused amino acid changes among cultivars ( Fig. 4; Table 4). Subsequently, the favorable haplotypes of Bna-A02g06870D and BnaC06g28920D were determined as ACCC and TTGG by haplotype analysis, respectively. The cultivars carrying ACCC at BnaA02g-06870Hap1 and TTGG at BnaC06g28920Hap1 had greater SW than other cultivars (Figs. 2D-F, 4D-F). The favorable haplotypes ACCC and TTGG identified in this study can be used for molecular markerassisted breeding of large seed in B. napus.
In conclusion, SW of B. napus is a complex quantitative trait controlled by multiple genes. A total of 340 SNPs were identified to be significantly associated with SW across two years, and thirty-five of them were detected simultaneously for different years. Two new candidate genes for SW were identified (BnaA02g06870D and BnaC06g28920D) and the discovery of the large seed haplotype of ACCC at BnaA02g06870Hap1, and TTGG at BnaC06g-28920Hap1 could enable the accurate selection of B. napus cultivars with large seed.