Analysis of the variation in EAC of seeds in two genetic populations
To investigate the variation in erucic acid content (EAC) among natural germplasm populations of rapeseed, the EAC of seeds was measured in two populations in consecutive years, namely population 2022 (P1) and population 2023 (P2). The larger population (P1, 2022) (Table S1) consisted of 932 accessions and a smaller population (P2, 2023) comprised 284 accessions (Table S2). The selection of the smaller population aimed to maintain genetic diversity of the larger population while allowing for better control of field experiment’s scale. A clustering plot (Fig. 1a) revealed that the 284 accessions in P2 were representatively distributed across various clusters in P1. The proportions of three ecological types (winter, semi-winter, and spring) were relatively consistent in both populations, with each type accounting for 66.2%, 14.6%, and 19.2% in P1 and 63.0%, 17.3%, and 19.7% in P2 (Fig. 1b, c).
Based on the EAC of seeds, all accessions were categorized into three groups: high erucic acid (HEAC) with EAC > 20%, moderate erucic acid (MEAC) with 5% < EAC < 20%, and low erucic acid (LEAC) with EAC < 5%. According to this classification, the proportions of HEAC, MEAC, and LEAC accessions in P1 were 33.9% (HEAC), 9.6% (MEAC), and 56.5% (LEAC), while in P2, the proportions were 39.4% (HEAC), 7.8% (MEAC), and 52.8% (LEAC). The relative proportions of HEAC, MEAC, and LEAC accessions were similar in both populations (Table S3).
Furthermore, Fig. 1d visually represents the distribution of accessions in P2 (gray dots) and P1 (red dots) on a principal component analysis plot. Genotypic analysis revealed that the accessions in P2 included 4.31 million single nucleotide polymorphisms (SNPs), accounting for 95% of the total SNPs in P1, respectively (Fig. 1d). Table S3 summarizes statistical parameters of EAC variation in the two genetic populations.
Selective signal analysis and genome-wide-association study on EAC in seeds
All current LEAC parental lines can be traced back to ‘Liho’, a fodder rapeseed cultivar originally from South America. The breeding for LEA genotype has significantly reduced the genetic diversity of rapeseed (Brassica napus). To investigate the artificial selection signatures left by LEA breeding, we performed selective signal analysis (SSA) on 200 HEAC and 200 LEAC accessions, aiming to understand the genetic footprints left during the process of LEA breeding. Using 4,436,952 high-quality single nucleotide polymorphisms (SNPs), we calculated the FST values between the HEA and LEA accessions at the global genome level. As a result, we identified a total of 66,517 significant SNPs and mapped 1,365 candidate genes associated with the selection signals between the LEA and HEA groups (Fig. 2a). The identification and functions of these genes can be found in Table S4. Notably, among these footprint genes resulting from LEA selection, there were 67 putative transcription factors (TFs), including ethylene-responsive factors (ERF8, RAP2, ERF6, ERF1A, ERF043, CRF4), GATA family genes (GATA29, GATA26, GATA3, GATA5), v-Myb avian myeloblastosis viral oncogene homolog (MYB) family genes (MYB39, MYB3R, MYB86, MYB73, MYB98, MYB85, MYB121, MYB34), basic leucine zipper (bZIP) proteins (ATB2/bZIP11, bZIP70, bZIP2), and basic helix-loop-helix (bHLH) proteins (bHLH63, bHLH27, bHLH126, bHLH99, bHLH162-like).These putative TFs have multiple copies in the two subgenomes of Brassica napus.
In addition to the SSA, we conducted a genome-wide association study (GWAS) to identify genes closely associated with EAC in rapeseed seeds. The GWAS was performed using data collected from P1 and P2. By analyzing the GWAS results, we identified 2148 SNPs with a -log10P value greater than 6.6 in both populations. Detailed information about SNP positions identified both in P1 and P2 can be found in Table S5. The majority of these SNPs, more than 99.9%, were located on chromosomes A08 (Chr.A08) and C03 (Chr.C03). Specifically, approximately 33.8% of the significant SNPs were located on Chr.A08, while 66.1% were on Chr.C03. The remaining 3 significant SNPs were dispersed across other chromosomes (C06, C07, C08) (Fig. 2b; Table S5). To avoid false positives, we eliminated 3 independent SNPs in chromosomes C06, C07 and C08 (Table S6). We further searched the 75 kb upstream and downstream regions closely linked to these SNPs for the candidate genes responsible for EAC variations, and identified 654 genes within these regions. Among them, 382 genes were located on Chr.A08, while 272 genes were located on Chr.C03. The annotations of these genes can be found in Table S6. The identified genes include those directly involved in FA and, more specifically, EA biosynthesis pathways. Examples are Arabidopsis orthologues of FAE1 ortholog (BnaC03G0745900ZS), 3-ketoacyl-CoA synthase 17 (BnaC03G0746000ZS), methylcrotonoyl-CoA carboxylase (BnaC03G0749800ZS), various TFs, such as ERFs (BnaA08G0105200ZS, BnaA08G0105300ZS, BnaA08G0144100ZS, BnaC03G0746800ZS), MYBs (BnaA08G0106500ZS, BnaA08G0111700ZS, BnaA08G0121500ZS, BnaA08G0144500ZS, BnaC03G0669700ZS, BnaC03G0723300ZS, BnaC03G0754500ZS), WRKYs (BnaA08G0108000ZS, BnaA08G0130000ZS, BnaA08G0149300ZS), GATAs (BnaC03G0744600ZS, BnaA08G0105900ZS, BnaA08G0133800ZS), bHLHs (BnaC03G0745700ZS, BnaA08G0134600ZS, BnaA08G0139400ZS, BnaC03G0745700ZS), bZIPs (BnaA08G0134300ZS, BnaC03G0741900ZS, BnaC03G0745300ZS, BnaC03G0741900ZS), and DOFs (BnaA08G0119900ZS, BnaA08G0122700ZS, BnaA08G0122800ZS, BnaA08G0123000ZS, BnaC03G0725300ZS, BnaC03G0725400ZS) .
To identify the genes resulting from both the SSA and GWAS, we conducted a cross-analysis as shown in a Venn diagram (Fig. 2c). A total of 240 genes were identified in both studies. Notably, these genes included FAE1 and the TF families, such as ERFs, GATAs, MYBs, bZIPs, bHLHs, and DOFs (Table S7).
Analyses for DEGs in HEAC seeds between the developmental stages 20 and 40 DAF
To gain further insights into the molecular mechanisms underlying EA accumulation in developing rapeseed seeds, we performed RNA-seq analyses to identify differentially expressed genes (DEGs) in typical HEAC seeds at two developmental stages, 20 DAF and 40 DAF. Figure 3a illustrates a rapid increase in EA accumulation within a short span of 20 days between 20 and 40 DAF. By comparing the DEGs between these two time points, we identified a total of 6,271 up-regulated genes and 5,745 down-regulated genes (|log2(fold change)| > 2, q-value < 0.05) at 40 DAF compared to 20 DAF (Fig. 3b). Cluster analysis revealed distinct differences in gene expression patterns between the 20 and 40 DAF time points (Fig. 3c). The identification and predicted functions of these genes can be found in Table S8.
Gene Ontology (GO) enrichment analysis was performed to further understand the biological functions of the DEGs. The enriched categories among the up-regulated genes (URGs) included 32 genes involved in FA metabolic processes, 22 genes related to lipid storage regulation, 18 genes involved in lipid droplet organization, and 369 genes positively regulating transcription. For instance, the URGs involved in FA metabolic processes include FA metabolic processes, lipid storage regulation, lipid droplet organization, etc. On the other hand, the enriched down-regulated genes (DRGs) included 214 genes responsive to cadmium ion, 212 genes involved in fungus defense responses, 150 genes responding to heat stress, as well as other genes involved in cellular and essential macromolecular proliferation activities such as mitotic cell cycle, DNA replication, and protein refolding, among others.
Analyses for DEGs in 40 DAF seeds between HEA and LEA accessions
Meanwhile, we performed a comparison of DEGs in developing seeds at 40 DAF between typical LEAC accessions (R4451, R4472, and R4845) and HEAC accessions (R4442, R4298, and R4707). The LEA and HEA accessions exhibited more than a 35-fold difference in EAC (Fig. S2a). We identified 198 URGs and 219 DRGs (|log2(fold change)| > 1, q-value < 0.05) in the HEA accessions compared to the LEA accessions (Fig. S2b). The gene expression patterns clearly distinguished between the LEA and HEA accessions (Fig. S2c). The identification and annotations of the DEGs between the HEA and LEA accessions are provided in Table S9.
GO enrichment analysis revealed that the URGs in the HEA accessions and LEA accessions included 1 gene involved in L-lysine catabolic process to acetyl-CoA, 1 gene in ATP biosynthetic process, and 3 genes involved in jasmonic acid mediated signaling pathway. On the other hand, the DRGs in the HEA accessions included 9 genes responding to cold, 3 genes associated with seed development, 2 genes responding regulation of reactive oxygen species, as well as other genes involved in cellular activities such as the cellular respiration, DNA replication, and protein storage vacuole organization (Fig. S2d).
To further investigate the overlap between the DEGs identified in the two RNA-seq experiments, we conducted a cross-analysis and listed the overlapping URGs and DRGs in Table S10.
Identification of allelic variation of candidate genes and development of markers for distinguishing HEA and LEA accessions
Comprehensively considered GWAS, SSA and two sets of DEGs, we recommended a gene pool consisting of 23 genes that may regulate EA biosynthesis (Table 1). These genes include orthologues of FAE1, Methylcrotonoyl-CoA carboxylase beta chain (MCCB), Dof zinc finger protein DOF4.4 (DOF4.4), GATA transcription factor 3 (GATA3), Arginine Decarboxylase 2 (ADC2), and others. To investigate the genetic differences between HEA and LEA accessions, we chose two candidate genes, BnaA08.FAE1 and BnaC03.MCCB, as representative examples for comparing the distribution patterns of single nucleotide polymorphisms (SNPs) between 200 HEA and 200 LEA accessions. The aim of this comparison was to justify the selection of molecular markers for distinguishing HEA and LEA accessions.
Table 1
Candidate genes regulating the synthesis of erucic acid in seeds recommended by integrative analyses of GWAS and FST and DEGs
Arabidopsis orthologue | ID of Brassica napus |
Arginine Decarboxylase 2 (ADC2) | BnaA08G0133500ZS, BnaC03G0743900ZS |
An ortholog of the human BREAST CANCER SUSCEPTIBILITY 1 (BRCA1) | BnaA08G0122500ZS, BnaC03G0724200ZS |
Cationic Amino Acid Transporter 1 (CAT1) | BnaA08G0122000ZS, BnaC03G0723700ZS |
Cystathionine beta-Synthase domain-containing protein (CBSX2) | BnaA08G0137900ZS, BnaC03G0749100ZS |
Caffeoyl-CoA O-methyltransferase 1 (CCOAOMT1) | BnaC03G0749700ZS |
Cyclin-dependent kinases regulatory subunit 2 (CKS2) | BnaA08G0136900ZS |
Coenzyme Q-binding protein (COQ10) | BnaA08G0106000ZS |
DOF zinc finger protein DOF4.4 | BnaA08G0122800ZS, BnaC03G0724300ZS、BnaC03G0724600ZS, BnaC03G0724700ZS |
3-ketoacyl-CoA synthase 18 (FAE1) | BnaA08G0134700ZS |
GATA transcription factor 3 (GATA3) | BnaA08G0133800ZS, BnaC03G0744600ZS |
Guanine nucleotide-binding protein subunit beta (GB1) | BnaA08G0135000ZS |
Methylcrotonoyl-CoA carboxylase beta chain, mitochondrial (MCCB) | BnaC03G0749800ZS |
DNA mismatch repair protein MSH4 | BnaA08G0104800ZS |
GPN-loop GTPase (QQT2) | BnaC03G0725900ZS |
Serine carboxypeptidase-like 43 (SCPL43) | BnaA08G0144800ZS |
SWI/SNF complex subunit (SWI3D) | BnaA08G0135300ZS |
Disease resistance protein (TAO1) | BnaC03G0727800ZS, BnaC03G0727900ZS |
UDP-glycosyltransferase 73B1 (UGT73B1) | BnaA08G0137300ZS, BnaC03G0748400ZS |
Zinc induced facilitator-like 2 | BnaA08G0104700ZS |
Ribosomal protein L19 family protein | BnaA08G0105800ZS |
WAT1-related protein | BnaA08G0113700ZS |
Transmembrane protein, putative (DUF677) | BnaA08G0136100ZS |
Late embryogenesis abundant protein (LEA) family protein | BnaA08G0123100ZS, BnaC03G0724800ZS |
BnaA08.FAE1 consists of only one exon, and a total of 18 SNPs were identified in the 3 Kb 5' regulatory region of the gene. Figure 4a shows the position of the SNPs on Chr.A08 and the location of BnaA08.FAE1, ranging from 18,615,052 Bp to 18,622,753 Bp. Figure 4b visually displays the different patterns of SNP distribution in the 5' regulatory region of BnaA08.FAE1. As we used the genome of the LEA cultivar ‘ZS11’ as the reference genome, the majority of SNPs in the HEA group had alternative alleles compared to the reference genome, while the majority of SNPs in the LEA group had alleles identical to the reference genome. We classified the SNP combinations into six haplotypes, namely FAE1_Hap A to Hap F. The statistical probabilities for LEA were 4.7%, 0%, 1.5%, 16.7%, 93%, and 0% for FAE1_Hap A, FAE1_Hap B, FAE1_Hap C, FAE1_Hap D, FAE1_Hap E, and FAE1_Hap F, respectively (Fig. 4c and Table 2).
Table 2
The haplotypes of SNPs related to candidate genes
Haplotype | Haplotype Name | related gene |
C_T_A_G_G_G_A_A_C_T_G_G_A_A_G_A_T_C_G_C_G_G_G_G_T_G_A_A_C_T_G_G_T_C_C_T_C_T_G | FAE1_Hap A | BnaA08.FAE1 |
G_C_T_A_T_T_G_T_A_G_G_A_G_G_A_G_C_T_A_T_G_G_G_G_T_T_A_A_C_C_T_T_C_G_A_C_A_A_A | FAE1_Hap B |
C_T_A_G_G_G_A_A_C_T_G_G_A_A_G_A_T_C_G_C_G_G_G_G_T_T_G_G_C_T_G_G_T_C_C_T_C_T_G | FAE1_Hap C |
G_C_A_G_T_T_G_T_A_G_A_A_G_G_A_G_C_T_A_T_G_G_G_G_T_G_A_A_G_T_T_T_C_G_A_C_A_T_G | FAE1_Hap D |
C_T_A_G_G_G_A_A_C_T_G_G_A_A_G_A_T_C_G_C_G_G_G_G_T_T_A_A_C_T_G_G_T_C_C_T_C_T_G | FAE1_Hap E |
C_T_A_G_G_G_A_A_C_T_G_G_A_A_G_A_T_C_G_C_G_G_G_G_T_G_G_G_C_T_G_G_T_C_C_T_C_T_G | FAE1_Hap F |
A_C_C_G_C_G_T_A_C_T_A_C_A_G_G_C_G_C_A | MCCB_Hap A | BnaC03.MCCB |
G_C_C_G_C_G_T_A_C_T_A_C_A_G_G_C_G_C_A | MCCB_Hap B |
A_A_G_T_A_A_A_T_G_G_T_G_T_A_A_T_C_G_C | MCCB_Hap C |
G_A_G_T_A_A_A_T_G_G_T_G_T_A_A_T_C_G_C | MCCB_Hap D |
BnaC03.MCCB consists of 10 exons and 9 introns, and a total of 12 SNPs were identified within the 3 Kb 5' regulatory region and 6 SNPs in the exon region. Figure 4d shows the location of BnaC03.MCCB, ranging from 72,690,869bp to 72,699,720 Bp, and the positions of the SNPs on Chr.C03. Among the 6 SNPs in the exon region, 3 resulted in nonsense nucleotide changes, 2 led to sense nucleotide changes, and 1 caused a frameshift. Figure 4e visually presents the different patterns of SNP distribution in BnaC03.MCCB between the HEA and LEA accessions. The majority of SNPs in the HEA group had alternative alleles compared to the reference genome, while the majority of SNPs in the LEA group had alleles identical to the reference genome. We classified the SNP combinations into four haplotypes, namely MCCB_Hap A to Hap D. The statistical probabilities for LEA were 21.0%, 8.3%, 94.5%, and 100% for MCCB_Hap A, MCCB_Hap B, MCCB_Hap C, and MCCB_Hap D, respectively (Fig. 4f and Table 2). The allelic combinations for each haplotype of both FAE1 and MCCB orthologues are provided in Table 2.
The SNPs shown in Fig. 4b and 4e can be used to develop breeding markers such as Cleaved Amplified Polymorphic Sequence (CAPS) (Fig. S3, Table S11). Here, we focused on the SNPs in the 5' regulatory region upstream of BnaC03.MCCB (Fig. 5). The restrictive enzyme Xba1 was able to digest the PCR product at the position of Chr.C03_72696864 in LEA accessions, but not at that position in HEA accessions (Fig. 5a and Table S12). This resulted in different fragment sizes as displayed on the gel during electrophoresis (Fig. 5b).