Genomic and transcriptome analyses reveal the molecular basis for erucic acid biosynthesis in seeds of rapeseed (Brassica napus)

doi:10.21203/rs.3.rs-3901677/v1

Research Article

This work is licensed under a CC BY 4.0 License

Erucic acid (EA) is an important quality trait in rapeseed: oil with low EA content (LEAC) is recognized as a healthy edible oil, whereas high-EA oil has industrial value. Despite its importance, the consequences of intensive selection for the LEAC genotype and the genes associated with EA regulation remain largely unknown. Here, we employed selective signal analyses (SSA), a genome-wide association study (GWAS), and transcriptome analyses to enhance our understanding of the molecular basis of EA regulation. Our investigation revealed the genetic footprints resulting from LEAC selection in germplasm populations, highlighting genetic regions for enriching diversity. Through GWAS, we identified 654 genes, including enzymes involved in fatty acid biosynthesis and various transcription factors, that were significantly associated with EAC variation. By combining SSA, GWAS, and transcriptome analyses, we recommend a subset of 23 genes that may have a significant impact on EAC in seeds. Example genes such as Fatty Acid Elongation 1 and Methylcrotonoyl-CoA Carboxylase Beta Chain were selected to illustrate the SNP distributions, the haplotypes for EAC phenotypes, and the development of molecular markers to distinguish LEAC and HEAC genotypes. These findings provide insights into the mechanism of EA regulation and shed light on the manipulation of the genes regulating EA biosynthesis.

genome-wide-association study

selective sweep

transcriptome analysis

erucic acid

CAPS markers

Brassica napus

Through comprehensive genomic and transcriptomic analyses, we propose a group of 23 genes as potential regulators of erucic acid content (EAC) in rapeseed seeds. We selected example genes to demonstrate the distribution of single nucleotide polymorphisms (SNPs), haplotypes associated with EAC phenotypes, and the creation of molecular markers differentiating low EAC and high EAC genotypes.

Rapeseed (Brassica napus) is a significant global source of edible oil. Traditional rapeseed oil contains high levels of erucic acid (EA), a monounsaturated fatty acid with a carbon chain length of 22. In the 1960s, Canada initiated the breeding of rapeseed varieties with reduced EA and glucosinolate content (Gupta and Pratap 2007). At the same time, the concentration of 18-carbon chain fatty acids (FAs) with high nutritional value, such as oleic acid (C18:1), linoleic acid (C18:2), and linolenic acid (C18:3), increased significantly (Cui et al. 2021). This advancement led to rapeseed oil being acknowledged as a healthy edible oil worldwide. Presently, officially registered rapeseed varieties in major producing countries must adhere to low EA requirements (EA content below 1%). From a health standpoint, EA is considered undesirable in edible oil (Knutsen et al. 2016). Nevertheless, EA holds industrial value as a raw material. It is used extensively in the production of lubricants, plasticizers, and surfactants, as well as in cosmetics, paints, coatings, and the synthesis of various chemicals, including pharmaceutical intermediates and agricultural products (Kaur et al. 2020). Whether the objective is to breed low EA (LEA) or high EA (HEA) rapeseed varieties, a comprehensive understanding of the molecular mechanisms governing EA synthesis in seeds is an essential prerequisite.

Erucic acid (EA), a 22-carbon chain fatty acid, is synthesized through a series of reactions. The process starts in plastids, where 18-carbon chain fatty acids (FAs) are formed and then transferred to the cytosol. In the cytosol, these FAs undergo further elongation to become EA or longer FAs (Haslam et al. 2013; Wang et al. 2022). The primary carbon source for FA synthesis is sucrose, which is converted to pyruvic acid through glycolysis. Pyruvic acid is then transformed into acetyl-CoA by the pyruvate dehydrogenase complex (PDH), serving as a precursor for FAs. The next step involves the synthesis of malonyl-CoA, catalyzed by the enzyme acetyl-CoA carboxylase (ACCase). The malonyl group is transferred from CoA to acyl carrier protein (ACP). Acetyl-CoA and malonyl-ACP enter the fatty acid synthesis complex (FAS) separately and go through a series of reactions, including condensation, reduction, and dehydration. Various enzymes facilitate these reactions, with 3-ketoacyl-ACP synthase III (KAS III) catalyzing the formation of C4:0-ACP. Continuing this cycle, 3-ketoacyl-ACP synthase I (KAS I) facilitates the synthesis of C16:0-ACP, with two carbons added per cycle. C18:0-ACP is then elongated from C16:0-ACP by 3-ketoacyl-ACP synthase II (KAS II). C18:0-ACP undergoes desaturation to form C18:1-ACP, catalyzed by stearoyl-ACP desaturase. C18:1-ACP is released from FAS by acyl-ACP thioesterases (Fat A/B). Free FAs are activated to acyl-CoA by a long-chain acyl-CoA synthetase (LACS) and transported to the endoplasmic reticulum (ER), where the FA chain undergoes desaturation and elongation (Ohlrogge and Browse 1995; Harwood 2005; Li-Beisson et al. 2013). Additionally, oleic acid (C18:1) can be converted to other fatty acids like linoleic acid (C18:2) and linolenic acid (C18:3) through the action of specific desaturase enzymes.
The synthesis of EA, a very long-chain fatty acid (VLCFA), occurs at the ER membrane through a complex of enzymes known as the FA elongation enzyme complex. This complex includes 3-ketoacyl-CoA synthase (KCS), 3-ketoacyl-CoA reductase (KCR), 3-hydroxyacyl-CoA dehydratase (HCD), and trans-2,3-enoyl-CoA reductase (ECR). The rate-limiting enzyme in the initial step of the FA elongation reaction is 3-ketoacyl-CoA synthase (KCS), encoded by Fatty Acid Elongase 1 (FAE1). Finally, EA is assembled and stored as triglycerides (TAGs) with the involvement of various enzymes (Ohlrogge and Browse 1995; Chen et al. 2011; Haslam and Kunst 2013; Li-Beisson et al. 2013).

The pathway described above starts from the precursor acetyl-CoA, proceeds through the formation of 18-carbon FAs and their elongation to EA in the endoplasmic reticulum (ER), and ends with the assembly of EA into triglycerides (TAGs) for storage. Any enzyme involved along this pathway can significantly affect the content of EA in oilseeds. Consequently, the regulation of EA content in oilseeds can be addressed from various perspectives: (1) modulating the efficiency of carbon chain elongation by either enhancing or suppressing the activity of FAE1 (Shi et al. 2017; Liu et al. 2022); (2) enhancing or impeding the efficiency of EA to TAG assembly by concurrently activating or inhibiting enzymes involved in TAG synthesis, such as Lysophosphatidic Acid Acyltransferase (LPAAT) and Diacylglycerol O-Acyltransferase (DGAT), along with FAE1 (Bates et al. 2009; Haslam and Kunst 2013; Liu et al. 2022); and (3) activating or inhibiting enzymes that compete with FAE1 for substrates, such as FAD2 and FAD3 (Browse and Somerville 1991; Li-Beisson et al. 2013). In addition to these more direct influencing factors, numerous indirect factors can affect the EA content in oilseeds. These factors include soil nutrient conditions, climate conditions, planting density, and even the plant architecture of different varieties (Khan et al. 2018; Davoudi et al. 2019; Wang et al. 2022). Considering these factors, it becomes evident that the EA content in oilseeds is a quantitative trait regulated by multiple genes. However, beyond the enzymes involved in the biosynthetic pathway of EA, our understanding of other genes remains limited.

Erucic acid is a trait that has undergone strong selection, as all current LEA rapeseed varieties can be traced back to the utilization of Liho, a forage rapeseed that served as a parental line with zero EA content (Stefansson et al. 1961). This intensive selection for LEA genotypes has resulted in evident selective sweeps, which involve a reduction or elimination of genetic variation in the vicinity of specific DNA mutations, within rapeseed genetic populations. Selective sweeps can be identified by measuring linkage disequilibrium (LD), which reflects the over-representation of particular haplotypes in a population (Slatkin 2008; Wu et al. 2019). Additionally, genome-wide association studies (GWAS) offer an effective approach for identifying quantitative trait loci (QTL) that control an agronomic trait (Hu et al. 2022; Tang et al. 2021; Wu et al. 2019).

In this study, we conducted selective signal analyses, GWAS, and transcriptome analysis to enhance our understanding of the molecular mechanisms underlying EA content (EAC) in rapeseed seeds. Our objective was to identify the genetic footprints left by intensive LEAC selection, the genes associated with EA regulation, and significant single nucleotide polymorphisms (SNPs) that could serve as molecular markers capable of distinguishing between LEA and HEA genotypes. We focused on identifying genes involved in the EA biosynthetic pathway, as well as those that exert an influence beyond the pathway.

Plant materials and growth conditions. In a previous study, we collected and resequenced a total of 932 rapeseed accessions from 39 countries worldwide (Wu et al. 2019). These accessions comprised 617 winter-type, 136 semi-winter-type, and 179 spring-type varieties. This diverse and extensive genetic population, referred to as P1, was utilized for GWAS in the experimental year 2022. The accessions were sown in September 2021 and harvested in May or June 2022. A core collection that retained more than 96.5% of the genetic diversity of P1 based on SNPs was then established, and GWAS was conducted again on this collection in the experimental year 2023. This core collection, denoted as P2, was sown in September 2022 and harvested in May or June 2023 (Fig. 1a, b, c). The ID numbers of the accessions included in P1 and P2 and their EA categories can be found in Tables S1 and S2, respectively. The P1 and P2 populations were grown at the Agricultural Experimental Station of Jiaxing Academy of Agricultural Sciences, located at 120°E, 30°N. Each plot had an area of 1.2 m × 1.2 m, and 12 plants were transplanted into each plot, with three replicates. Field management followed standard practices used in the lower reaches of the Yangtze River region.

Erucic acid content measurement. The seeds harvested from P1 and P2 in two successive years were air-dried to remove excess moisture. The EAC was quantified using near-infrared spectroscopy (ANTARIS II, Thermo Scientific™, USA). Based on the EAC data obtained in the experimental year 2022, three representative LEAC accessions (R4472, R4845, R4451) and three typical HEAC accessions (R4422, R4298, R4707) were selected, and their developing seeds were measured at 40 days after flowering (40 DAF) by gas chromatography (GC-2014, Shimadzu, Japan). In addition, the EAC of the three HEAC accessions was measured at both 20 days after flowering (20 DAF) and 40 DAF by gas chromatography.

Genome-wide-association study. The SNPs utilized for GWAS were obtained from the resequencing data of 932 accessions (Wu et al. 2019). The raw reads were aligned against the Zhong Shuang 11 (ZS11) reference genome (ZS11.v0, available at https://yanglab.hzau.edu.cn/BnIR/accessionsinfo?id=ZS11.v0) (Song et al. 2020). SNPs with a missing rate exceeding 10% or a minor allele frequency (MAF) lower than 5% were filtered out using PLINK 1.9 (Purcell et al. 2007). Subsequently, 4,436,952 and 4,312,418 high-quality SNPs were retained for the P1 and P2 accessions, respectively. The annotation of these SNPs was performed using SnpEff (https://pcingola.github.io/SnpEff/).
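The filtering criteria above can be illustrated with a minimal Python sketch. It is not the PLINK implementation used in the study: the function name, the 0/1/2 genotype coding (with None for a missing call), and the toy data are all assumptions made for illustration.

```python
from typing import List, Optional

# Assumed coding: 0, 1, 2 = copies of the alternative allele; None = missing call.
Genotype = Optional[int]


def passes_filters(calls: List[Genotype],
                   max_missing: float = 0.10,
                   min_maf: float = 0.05) -> bool:
    """Return True if a SNP passes the missing-rate and MAF filters."""
    observed = [g for g in calls if g is not None]
    missing_rate = 1.0 - len(observed) / len(calls)
    if not observed or missing_rate > max_missing:
        return False
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf


# Hypothetical toy data: each entry is one SNP scored in ten accessions.
snps = {
    "snp_common": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0],
    "snp_rare": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "snp_missing": [0, 1, None, None, 2, 1, 0, 1, 2, 0],
}
kept = [name for name, calls in snps.items() if passes_filters(calls)]
print(kept)
```

In this toy example only the first SNP is retained: the second is monomorphic, and the third has two missing calls out of ten.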

To evaluate the population structure, the Admixture software (Alexander et al. 2009) was employed, and the optimal grouping was determined based on cross-validation error rates. Visualization of the group structure was achieved using the R package Pophelper (Francis 2017) (Fig. S1). Principal component analysis was conducted using GCTA software (Yang et al. 2011) (Fig. 1d). GWAS was executed through the 'BnaGWAS' web portal (https://bnapus-zju.com/gwas/) utilizing EMMAX (Efficient Mixed-Model Association eXpedited). To identify high-quality SNPs associated with traits, the genome-wide significance threshold was determined using the formula P = 1/n (where n is the number of SNPs) (Yan et al. 2020). Accordingly, a threshold of −log₁₀(P) > 6.6 was applied for the P1 (4,436,952 SNPs) and P2 (4,312,418 SNPs) populations. Candidate genes were searched within 75 kb sequence regions adjacent to significantly associated SNPs.
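The threshold follows directly from the formula P = 1/n. The short Python sketch below reproduces the arithmetic for the two SNP counts reported above; the function name is an assumption introduced for illustration.

```python
import math


def significance_threshold(n_snps: int) -> float:
    """Return -log10(P) for the genome-wide cutoff P = 1/n."""
    return -math.log10(1.0 / n_snps)


# SNP counts reported for the two populations.
thresholds = {
    "P1": significance_threshold(4_436_952),
    "P2": significance_threshold(4_312_418),
}
for population, value in thresholds.items():
    print(population, round(value, 2))
```

Both values lie between 6.6 and 6.7, which is consistent with the single cutoff of 6.6 applied to both populations.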

Genome-wide selective sweep analysis. Based on EAC, we selected two subpopulations from the 932 accessions: 200 LEAC and 200 HEAC accessions. These two subpopulations represented the extreme phenotypes, each comprising about 20% of the total accessions. To assess the differentiation between the HEAC and LEAC subpopulations, we calculated the F_ST values and genetic polymorphism (π) values using vcftools v1.17 (http://vcftools.sourceforge.net/) (Danecek et al. 2011) with a sliding window size of 100 kb and a step size of 5 kb. SNPs with π values exceeding the 95% threshold and F_ST values with significance levels above 0.05 were deemed significantly differentiated between the two populations. Furthermore, we focused on genes that included significant SNPs located within the 3 kb upstream region or within introns, considering them as candidate genes for further analysis.
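The window layout and the percentile-style cutoff can be sketched in Python as follows. This is a simplified illustration, not the vcftools procedure: the helper names, the 1-based inclusive coordinates, the reading of the cutoff as "top 5% of windows", and the toy F_ST values are assumptions.

```python
from typing import Iterator, List, Tuple


def sliding_windows(chrom_length: int,
                    size: int = 100_000,
                    step: int = 5_000) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) windows along a chromosome."""
    start = 1
    while start <= chrom_length:
        yield start, min(start + size - 1, chrom_length)
        start += step


def upper_cutoff(values: List[float], top_fraction: float = 0.05) -> float:
    """Return the smallest value that still belongs to the top fraction."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return ranked[n_top - 1]


windows = list(sliding_windows(chrom_length=250_000))

# Hypothetical per-window F_ST values, for illustration only.
fst_per_window = [0.01 * i for i in range(100)]
cutoff = upper_cutoff(fst_per_window)
outliers = [v for v in fst_per_window if v >= cutoff]
print(len(windows), len(outliers))
```

Because the step (5 kb) is much smaller than the window (100 kb), neighbouring windows overlap heavily, which smooths the statistic along the chromosome at the cost of strongly correlated window values.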

Transcriptome analysis. For RNA-seq analysis, we selected three HEAC and three LEAC accessions. It was crucial to ensure strict bagging and self-pollination of these selected materials. Harvesting of siliques was performed at two time points, namely 20 days after flowering (20 DAF) and 40 days after flowering (40 DAF), which corresponds to the developmental stages when EA rapidly accumulates in the seeds. The seeds were delicately extracted from the siliques and promptly stored in liquid nitrogen to maintain RNA integrity.

RNA isolation, library preparation, and paired-end sequencing were carried out by OE Biotech Corporation (Shanghai, China) on the MGI DNBSEQ-T7 platform (Shenzhen, China). To obtain high-quality clean reads, the raw sequencing data were preprocessed using the fastp tool (Chen et al. 2018). Subsequently, these clean reads were aligned to the ZS11.v0 reference genome using the HISAT2 alignment tool (Bolger et al. 2014; Kim et al. 2015). Expression levels were determined for each sample using the featureCounts software (Liao et al. 2014). Differential expression analysis was conducted using the DESeq2 package in R (Anders et al. 2015) to identify genes that displayed significant differential expression between the high and low EAC groups.

Gene ontology enrichment analysis. To perform the Gene Ontology (GO) enrichment analysis, all proteins from the B. napus ZS11.v0 reference genome were compared against the Arabidopsis proteome (TAIR10) using the BLASTP program (Basic Local Alignment Search Tool for Protein) available at https://www.ncbi.nlm.nih.gov/. A stringent e-value cut-off of 1e-5 was applied during the comparison. Based on the alignment results, the GO terms of B. napus genes were assigned according to the best hit genes in Arabidopsis. This approach allowed for the annotation of B. napus genes based on the functional annotations of their closest homologs in Arabidopsis. For GO enrichment analysis, we utilized the R package clusterProfiler (Wu et al. 2021). The resulting p-values from the analysis were adjusted using the Benjamini–Hochberg method to account for multiple testing. This adjustment approach allowed us to identify GO terms that were significantly over-represented in the gene set of interest, with a false discovery rate (FDR) threshold set to less than 0.05. By applying this stringent criterion, we aimed to ensure the robustness and reliability of the identified enriched GO terms.
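For readers unfamiliar with the Benjamini–Hochberg adjustment, the following minimal Python sketch shows the standard step-up procedure on a handful of made-up p-values. It is independent of clusterProfiler; the function name and the example values are assumptions.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never decrease with increasing rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five GO terms.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(adj_p) if q < 0.05]
print(adj_p, significant)
```

Note that the third and fourth terms have raw p-values below 0.05 but are no longer significant after adjustment, which is the purpose of the correction.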

Analyses of genomic sequences. To investigate the distribution of genetic variations, we used the B. napus variation database BnVIR (http://yanglab.hzau.edu.cn/BnVIR) to extract SNPs and InDels within the genomic regions of the candidate genes (Yang et al. 2022). To ensure data quality and reliability, we selected high-quality SNPs with minor allele frequencies exceeding 5% across the 932 accessions of the previous study (Wu et al. 2019). These selected variants were then used to construct haplotypes representing different allelic combinations within the candidate genes. To examine the allele and haplotype distribution patterns in different ecotypes, we included only rapeseed accessions with complete SNP information, without any missing data points. Analyzing the distribution of alleles and haplotypes among these accessions provided insights into the genetic diversity and variation within different ecotypes of rapeseed.
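Haplotype construction of this kind can be sketched in a few lines of Python. The sketch assumes that a haplotype is simply the ordered string of alleles at the selected SNPs joined by underscores, as in the notation of Table 2, and that accessions with any missing call are excluded; the accession names and alleles are made up.

```python
from collections import Counter
from typing import Dict, List, Optional


def build_haplotype(alleles: List[Optional[str]]) -> Optional[str]:
    """Join alleles with '_'; return None if any call is missing."""
    if any(allele is None for allele in alleles):
        return None
    return "_".join(allele for allele in alleles if allele is not None)


# Hypothetical accessions scored at three SNPs (None = missing call).
accessions: Dict[str, List[Optional[str]]] = {
    "acc1": ["A", "C", "C"],
    "acc2": ["G", "C", "C"],
    "acc3": ["A", "C", "C"],
    "acc4": ["A", None, "C"],
}
haplotypes = {name: build_haplotype(calls) for name, calls in accessions.items()}
counts = Counter(h for h in haplotypes.values() if h is not None)
print(counts)
```

Grouping the resulting haplotype strings by phenotype class then gives the share of low-EA accessions per haplotype.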

Development of breeding markers. To develop CAPS (Cleaved Amplified Polymorphic Sequence) markers, we conducted a targeted search for significantly associated SNPs within the upstream 3 kb region, coding sequences, and introns of the candidate genes. Subsequently, significance tests were used to identify SNPs whose allele distributions differed significantly between the groups of interest. The DNA sequences corresponding to the candidate SNP markers were obtained from the BnIR database (https://yanglab.hzau.edu.cn/BnIR). To identify SNPs that alter restriction enzyme recognition sites, and therefore potentially change digestion patterns, we used the SnapGene software (https://www.snapgene.com/). To validate the candidate SNP markers, we conducted enzyme digestion experiments. The primers for PCR amplification of the regions containing the SNP markers can be found in Fig. 5a. PCR with these primers followed by enzyme digestion verified the CAPS markers, providing a reliable method for genotyping the variation associated with the target genes.
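The core check behind a CAPS marker, whether a SNP creates or abolishes a restriction site, can be sketched in Python as below. This is not the SnapGene workflow used in the study. The XbaI recognition sequence (TCTAGA) is standard, but the helper names and the short reference sequence are hypothetical.

```python
from typing import List

XBAI_SITE = "TCTAGA"  # XbaI recognition sequence (palindromic, so one strand suffices)


def find_sites(sequence: str, site: str = XBAI_SITE) -> List[int]:
    """Return the 0-based start positions of every occurrence of `site`."""
    sequence = sequence.upper()
    width = len(site)
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == site]


def is_caps_candidate(ref_seq: str, snp_index: int, alt_allele: str) -> bool:
    """True if substituting the alternative allele changes the site pattern."""
    alt_seq = ref_seq[:snp_index] + alt_allele + ref_seq[snp_index + 1:]
    return find_sites(ref_seq) != find_sites(alt_seq)


# Hypothetical amplicon with one XbaI site at positions 6-11.
reference = "GGATCCTCTAGAAGCTTG"
inside_site = is_caps_candidate(reference, snp_index=9, alt_allele="G")
outside_site = is_caps_candidate(reference, snp_index=2, alt_allele="C")
print(inside_site, outside_site)
```

A SNP inside the recognition sequence changes the digestion pattern and is a marker candidate, whereas a SNP elsewhere in the amplicon is not (unless it happens to create a new site).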

Analysis of the variation in EAC of seeds in two genetic populations

To investigate the variation in erucic acid content (EAC) among natural germplasm populations of rapeseed, the EAC of seeds was measured in two populations in consecutive years, namely population 2022 (P1) and population 2023 (P2). The larger population (P1, 2022) (Table S1) consisted of 932 accessions, and the smaller population (P2, 2023) comprised 284 accessions (Table S2). The smaller population was selected to maintain the genetic diversity of the larger population while keeping the scale of the field experiment manageable. A clustering plot (Fig. 1a) revealed that the 284 accessions in P2 were representatively distributed across the various clusters in P1. The proportions of the three ecological types (winter, semi-winter, and spring) were relatively consistent in both populations: 66.2%, 14.6%, and 19.2% in P1 and 63.0%, 17.3%, and 19.7% in P2, respectively (Fig. 1b, c).

Based on the EAC of seeds, all accessions were categorized into three groups: high erucic acid (HEAC) with EAC > 20%, moderate erucic acid (MEAC) with 5% < EAC < 20%, and low erucic acid (LEAC) with EAC < 5%. According to this classification, the proportions of HEAC, MEAC, and LEAC accessions in P1 were 33.9% (HEAC), 9.6% (MEAC), and 56.5% (LEAC), while in P2, the proportions were 39.4% (HEAC), 7.8% (MEAC), and 52.8% (LEAC). The relative proportions of HEAC, MEAC, and LEAC accessions were similar in both populations (Table S3).
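The grouping rule can be written as a small Python function. The text does not say how values of exactly 5% or 20% are treated, so the sketch assumes they fall into the moderate group; the function name and the example values are also assumptions.

```python
from collections import Counter
from typing import List


def classify_eac(eac_percent: float) -> str:
    """Assign an accession to HEAC, MEAC, or LEAC by seed EAC (%)."""
    if eac_percent > 20.0:
        return "HEAC"
    if eac_percent < 5.0:
        return "LEAC"
    # Assumption: boundary values (exactly 5% or 20%) are treated as MEAC.
    return "MEAC"


# Hypothetical EAC measurements (%), for illustration only.
measurements: List[float] = [0.3, 1.2, 4.9, 12.0, 35.5, 48.1, 0.8, 41.0]
groups = Counter(classify_eac(value) for value in measurements)
proportions = {name: count / len(measurements) for name, count in groups.items()}
print(proportions)
```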

Furthermore, Fig. 1d shows the distribution of accessions in P2 (gray dots) and P1 (red dots) on a principal component analysis plot. Genotypic analysis revealed that the accessions in P2 included 4.31 million single nucleotide polymorphisms (SNPs), accounting for 95% of the total SNPs in P1 (Fig. 1d). Table S3 summarizes statistical parameters of EAC variation in the two genetic populations.

Selective signal analysis and genome-wide-association study on EAC in seeds

All current LEAC parental lines can be traced back to ‘Liho’, a fodder rapeseed cultivar originally from South America. Breeding for the LEA genotype has significantly reduced the genetic diversity of rapeseed (Brassica napus). To investigate the artificial selection signatures left by LEA breeding, we performed selective signal analysis (SSA) on 200 HEAC and 200 LEAC accessions. Using 4,436,952 high-quality single nucleotide polymorphisms (SNPs), we calculated the F_ST values between the HEA and LEA accessions across the whole genome. As a result, we identified a total of 66,517 significant SNPs and mapped 1,365 candidate genes associated with the selection signals between the LEA and HEA groups (Fig. 2a). The identification and functions of these genes can be found in Table S4. Notably, among these footprint genes resulting from LEA selection, there were 67 putative transcription factors (TFs), including ethylene-responsive factors (ERF8, RAP2, ERF6, ERF1A, ERF043, CRF4), GATA family genes (GATA29, GATA26, GATA3, GATA5), v-Myb avian myeloblastosis viral oncogene homolog (MYB) family genes (MYB39, MYB3R, MYB86, MYB73, MYB98, MYB85, MYB121, MYB34), basic leucine zipper (bZIP) proteins (ATB2/bZIP11, bZIP70, bZIP2), and basic helix-loop-helix (bHLH) proteins (bHLH63, bHLH27, bHLH126, bHLH99, bHLH162-like). These putative TFs have multiple copies in the two subgenomes of Brassica napus.

In addition to the SSA, we conducted a genome-wide association study (GWAS) to identify genes closely associated with EAC in rapeseed seeds. The GWAS was performed using data collected from P1 and P2. By analyzing the GWAS results, we identified 2148 SNPs with a −log₁₀(P) value greater than 6.6 in both populations. Detailed information about the positions of the SNPs identified in both P1 and P2 can be found in Table S5. Nearly all of these SNPs (about 99.9%) were located on chromosomes A08 (Chr.A08) and C03 (Chr.C03): approximately 33.8% of the significant SNPs were located on Chr.A08 and 66.1% on Chr.C03. The remaining 3 significant SNPs were dispersed across other chromosomes (C06, C07, C08) (Fig. 2b; Table S5). To avoid false positives, we excluded these 3 isolated SNPs on chromosomes C06, C07, and C08 from further analysis (Table S6). We then searched the 75 kb regions upstream and downstream of the remaining SNPs for candidate genes responsible for EAC variation and identified 654 genes within these regions. Among them, 382 genes were located on Chr.A08, while 272 genes were located on Chr.C03. The annotations of these genes can be found in Table S6. The identified genes include those directly involved in FA and, more specifically, EA biosynthesis pathways.
Examples are orthologues of Arabidopsis FAE1 (BnaC03G0745900ZS), 3-ketoacyl-CoA synthase 17 (BnaC03G0746000ZS), and methylcrotonoyl-CoA carboxylase (BnaC03G0749800ZS), as well as various TFs, such as ERFs (BnaA08G0105200ZS, BnaA08G0105300ZS, BnaA08G0144100ZS, BnaC03G0746800ZS), MYBs (BnaA08G0106500ZS, BnaA08G0111700ZS, BnaA08G0121500ZS, BnaA08G0144500ZS, BnaC03G0669700ZS, BnaC03G0723300ZS, BnaC03G0754500ZS), WRKYs (BnaA08G0108000ZS, BnaA08G0130000ZS, BnaA08G0149300ZS), GATAs (BnaC03G0744600ZS, BnaA08G0105900ZS, BnaA08G0133800ZS), bHLHs (BnaC03G0745700ZS, BnaA08G0134600ZS, BnaA08G0139400ZS), bZIPs (BnaA08G0134300ZS, BnaC03G0741900ZS, BnaC03G0745300ZS), and DOFs (BnaA08G0119900ZS, BnaA08G0122700ZS, BnaA08G0122800ZS, BnaA08G0123000ZS, BnaC03G0725300ZS, BnaC03G0725400ZS).
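The candidate-gene search amounts to an interval overlap test: a gene is retained if any part of it lies within 75 kb of a significant SNP on the same chromosome. The Python sketch below illustrates this under that reading; the gene records, coordinates, and names are hypothetical and are not taken from the ZS11 annotation.

```python
from typing import List, NamedTuple, Set, Tuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int
    end: int


def genes_near_snps(genes: List[Gene],
                    snps: List[Tuple[str, int]],
                    flank: int = 75_000) -> Set[str]:
    """Return IDs of genes overlapping a +/- `flank` bp window around any SNP."""
    hits: Set[str] = set()
    for chrom, position in snps:
        window_start, window_end = position - flank, position + flank
        for gene in genes:
            overlaps = gene.start <= window_end and gene.end >= window_start
            if gene.chrom == chrom and overlaps:
                hits.add(gene.gene_id)
    return hits


# Hypothetical genes and significant SNPs, for illustration only.
genes = [
    Gene("geneA", "A08", 1_000_000, 1_002_500),
    Gene("geneB", "A08", 1_080_000, 1_083_000),
    Gene("geneC", "A08", 1_200_000, 1_204_000),
    Gene("geneD", "C03", 1_010_000, 1_012_000),
]
significant_snps = [("A08", 1_010_000)]
candidates = genes_near_snps(genes, significant_snps)
print(sorted(candidates))
```

The nested loop is adequate for a few thousand SNPs and genes; for larger inputs an interval tree or sorted sweep would be the usual choice.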

To identify the genes detected by both the SSA and GWAS, we conducted a cross-analysis, shown as a Venn diagram (Fig. 2c). A total of 240 genes were identified in both analyses. Notably, these genes included FAE1 and members of TF families such as ERFs, GATAs, MYBs, bZIPs, bHLHs, and DOFs (Table S7).

Analyses for DEGs in HEAC seeds between the developmental stages 20 and 40 DAF

To gain further insights into the molecular mechanisms underlying EA accumulation in developing rapeseed seeds, we performed RNA-seq analyses to identify differentially expressed genes (DEGs) in typical HEAC seeds at two developmental stages, 20 DAF and 40 DAF. Figure 3a illustrates a rapid increase in EA accumulation within a short span of 20 days between 20 and 40 DAF. By comparing the DEGs between these two time points, we identified a total of 6,271 up-regulated genes and 5,745 down-regulated genes (|log₂(fold change)| > 2, q-value < 0.05) at 40 DAF compared to 20 DAF (Fig. 3b). Cluster analysis revealed distinct differences in gene expression patterns between the 20 and 40 DAF time points (Fig. 3c). The identification and predicted functions of these genes can be found in Table S8.
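The DEG criteria stated above can be expressed as a simple filter. The Python sketch below applies them to made-up results; the function name, gene labels, and values are assumptions, and the statistics themselves would come from DESeq2 in the actual analysis.

```python
from typing import Dict, Tuple


def classify_deg(log2_fold_change: float,
                 q_value: float,
                 min_abs_log2_fc: float = 2.0,
                 max_q: float = 0.05) -> str:
    """Label a gene as 'up', 'down', or 'not_significant'."""
    if q_value >= max_q or abs(log2_fold_change) <= min_abs_log2_fc:
        return "not_significant"
    return "up" if log2_fold_change > 0 else "down"


# Hypothetical (log2 fold change, q-value) pairs, 40 DAF relative to 20 DAF.
results: Dict[str, Tuple[float, float]] = {
    "gene1": (3.4, 0.001),
    "gene2": (-2.7, 0.010),
    "gene3": (1.5, 0.0001),
    "gene4": (4.0, 0.20),
}
labels = {gene: classify_deg(fc, q) for gene, (fc, q) in results.items()}
print(labels)
```

Passing `min_abs_log2_fc=1.0` gives the less stringent cutoff used for the comparison between HEA and LEA accessions.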

Gene Ontology (GO) enrichment analysis was performed to further understand the biological functions of the DEGs. The enriched categories among the up-regulated genes (URGs) included 32 genes involved in FA metabolic processes, 22 genes related to lipid storage regulation, 18 genes involved in lipid droplet organization, and 369 genes positively regulating transcription. The enriched categories among the down-regulated genes (DRGs) included 214 genes responsive to cadmium ion, 212 genes involved in defense responses to fungi, and 150 genes responding to heat stress, as well as genes involved in cell proliferation and essential macromolecular processes such as the mitotic cell cycle, DNA replication, and protein refolding.

Analyses for DEGs in 40 DAF seeds between HEA and LEA accessions

Meanwhile, we performed a comparison of DEGs in developing seeds at 40 DAF between typical LEAC accessions (R4451, R4472, and R4845) and HEAC accessions (R4442, R4298, and R4707). The LEA and HEA accessions exhibited more than a 35-fold difference in EAC (Fig. S2a). We identified 198 URGs and 219 DRGs (|log₂(fold change)| > 1, q-value < 0.05) in the HEA accessions compared to the LEA accessions (Fig. S2b). The gene expression patterns clearly distinguished between the LEA and HEA accessions (Fig. S2c). The identification and annotations of the DEGs between the HEA and LEA accessions are provided in Table S9.

GO enrichment analysis revealed that the URGs in the HEA accessions relative to the LEA accessions included 1 gene involved in the L-lysine catabolic process to acetyl-CoA, 1 gene in the ATP biosynthetic process, and 3 genes involved in the jasmonic acid mediated signaling pathway. On the other hand, the DRGs in the HEA accessions included 9 genes responding to cold, 3 genes associated with seed development, and 2 genes associated with the regulation of reactive oxygen species, as well as other genes involved in cellular activities such as cellular respiration, DNA replication, and protein storage vacuole organization (Fig. S2d).

To further investigate the overlap between the DEGs identified in the two RNA-seq experiments, we conducted a cross-analysis and listed the overlapping URGs and DRGs in Table S10.

Identification of allelic variation of candidate genes and development of markers for distinguishing HEA and LEA accessions

Considering the GWAS, the SSA, and the two sets of DEGs together, we recommend a pool of 23 genes that may regulate EA biosynthesis (Table 1). These genes include orthologues of FAE1, Methylcrotonoyl-CoA carboxylase beta chain (MCCB), Dof zinc finger protein DOF4.4 (DOF4.4), GATA transcription factor 3 (GATA3), Arginine Decarboxylase 2 (ADC2), and others. To investigate the genetic differences between HEA and LEA accessions, we chose two candidate genes, BnaA08.FAE1 and BnaC03.MCCB, as representative examples for comparing the distribution patterns of single nucleotide polymorphisms (SNPs) between 200 HEA and 200 LEA accessions. The aim of this comparison was to justify the selection of molecular markers for distinguishing HEA and LEA accessions.

Table 1

Candidate genes regulating the synthesis of erucic acid in seeds recommended by integrative analyses of GWAS and F_ST and DEGs
Arabidopsis orthologue	ID of Brassica napus
Arginine Decarboxylase 2 (ADC2)	BnaA08G0133500ZS, BnaC03G0743900ZS
An ortholog of the human BREAST CANCER SUSCEPTIBILITY 1 (BRCA1)	BnaA08G0122500ZS, BnaC03G0724200ZS
Cationic Amino Acid Transporter 1 (CAT1)	BnaA08G0122000ZS, BnaC03G0723700ZS
Cystathionine beta-Synthase domain-containing protein (CBSX2)	BnaA08G0137900ZS, BnaC03G0749100ZS
Caffeoyl-CoA O-methyltransferase 1 (CCOAOMT1)	BnaC03G0749700ZS
Cyclin-dependent kinases regulatory subunit 2 (CKS2)	BnaA08G0136900ZS
Coenzyme Q-binding protein (COQ10)	BnaA08G0106000ZS
DOF zinc finger protein DOF4.4	BnaA08G0122800ZS, BnaC03G0724300ZS, BnaC03G0724600ZS, BnaC03G0724700ZS
3-ketoacyl-CoA synthase 18 (FAE1)	BnaA08G0134700ZS
GATA transcription factor 3 (GATA3)	BnaA08G0133800ZS, BnaC03G0744600ZS
Guanine nucleotide-binding protein subunit beta (GB1)	BnaA08G0135000ZS
Methylcrotonoyl-CoA carboxylase beta chain, mitochondrial (MCCB)	BnaC03G0749800ZS
DNA mismatch repair protein MSH4	BnaA08G0104800ZS
GPN-loop GTPase (QQT2)	BnaC03G0725900ZS
Serine carboxypeptidase-like 43 (SCPL43)	BnaA08G0144800ZS
SWI/SNF complex subunit (SWI3D)	BnaA08G0135300ZS
Disease resistance protein (TAO1)	BnaC03G0727800ZS, BnaC03G0727900ZS
UDP-glycosyltransferase 73B1 (UGT73B1)	BnaA08G0137300ZS, BnaC03G0748400ZS
Zinc induced facilitator-like 2	BnaA08G0104700ZS
Ribosomal protein L19 family protein	BnaA08G0105800ZS
WAT1-related protein	BnaA08G0113700ZS
Transmembrane protein, putative (DUF677)	BnaA08G0136100ZS
Late embryogenesis abundant protein (LEA) family protein	BnaA08G0123100ZS, BnaC03G0724800ZS

BnaA08.FAE1 consists of only one exon, and a total of 18 SNPs were identified in the 3 kb 5' regulatory region of the gene. Figure 4a shows the positions of the SNPs on Chr.A08 and the location of BnaA08.FAE1, ranging from 18,615,052 bp to 18,622,753 bp. Figure 4b displays the different patterns of SNP distribution in the 5' regulatory region of BnaA08.FAE1. As we used the genome of the LEA cultivar ‘ZS11’ as the reference genome, the majority of SNPs in the HEA group had alternative alleles compared to the reference genome, while the majority of SNPs in the LEA group had alleles identical to the reference genome. We classified the SNP combinations into six haplotypes, namely FAE1_Hap A to Hap F. The proportions of LEA accessions among the carriers of each haplotype were 4.7%, 0%, 1.5%, 16.7%, 93%, and 0% for FAE1_Hap A, FAE1_Hap B, FAE1_Hap C, FAE1_Hap D, FAE1_Hap E, and FAE1_Hap F, respectively (Fig. 4c and Table 2).

Table 2

The haplotypes of SNPs related to candidate genes
Haplotype	Haplotype name	Related gene
C_T_A_G_G_G_A_A_C_T_G_G_A_A_G_A_T_C_G_C_G_G_G_G_T_G_A_A_C_T_G_G_T_C_C_T_C_T_G	FAE1_Hap A	BnaA08.FAE1
G_C_T_A_T_T_G_T_A_G_G_A_G_G_A_G_C_T_A_T_G_G_G_G_T_T_A_A_C_C_T_T_C_G_A_C_A_A_A	FAE1_Hap B
C_T_A_G_G_G_A_A_C_T_G_G_A_A_G_A_T_C_G_C_G_G_G_G_T_T_G_G_C_T_G_G_T_C_C_T_C_T_G	FAE1_Hap C
G_C_A_G_T_T_G_T_A_G_A_A_G_G_A_G_C_T_A_T_G_G_G_G_T_G_A_A_G_T_T_T_C_G_A_C_A_T_G	FAE1_Hap D
C_T_A_G_G_G_A_A_C_T_G_G_A_A_G_A_T_C_G_C_G_G_G_G_T_T_A_A_C_T_G_G_T_C_C_T_C_T_G	FAE1_Hap E
C_T_A_G_G_G_A_A_C_T_G_G_A_A_G_A_T_C_G_C_G_G_G_G_T_G_G_G_C_T_G_G_T_C_C_T_C_T_G	FAE1_Hap F
A_C_C_G_C_G_T_A_C_T_A_C_A_G_G_C_G_C_A	MCCB_Hap A	BnaC03.MCCB
G_C_C_G_C_G_T_A_C_T_A_C_A_G_G_C_G_C_A	MCCB_Hap B
A_A_G_T_A_A_A_T_G_G_T_G_T_A_A_T_C_G_C	MCCB_Hap C
G_A_G_T_A_A_A_T_G_G_T_G_T_A_A_T_C_G_C	MCCB_Hap D
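
The allele strings in Table 2 can be compared position by position to see where two haplotypes diverge. The following is a minimal sketch, not part of the original analysis: the function names are illustrative, positions are counted in the order the alleles are listed in Table 2 (their genomic coordinates are not given here), and only four of the ten haplotypes are included for brevity.

```python
# Allele strings copied from Table 2 (underscore-separated, one allele per SNP).
HAPLOTYPES = {
    "FAE1_Hap A": "C_T_A_G_G_G_A_A_C_T_G_G_A_A_G_A_T_C_G_C_G_G_G_G_T_G_A_A_C_T_G_G_T_C_C_T_C_T_G",
    "FAE1_Hap E": "C_T_A_G_G_G_A_A_C_T_G_G_A_A_G_A_T_C_G_C_G_G_G_G_T_T_A_A_C_T_G_G_T_C_C_T_C_T_G",
    "MCCB_Hap A": "A_C_C_G_C_G_T_A_C_T_A_C_A_G_G_C_G_C_A",
    "MCCB_Hap C": "A_A_G_T_A_A_A_T_G_G_T_G_T_A_A_T_C_G_C",
}


def alleles(haplotype: str) -> list[str]:
    """Split an underscore-separated haplotype string into its alleles."""
    return haplotype.split("_")


def differing_positions(name_a: str, name_b: str) -> list[int]:
    """Return 1-based positions (Table 2 order) where two haplotypes differ."""
    a, b = alleles(HAPLOTYPES[name_a]), alleles(HAPLOTYPES[name_b])
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same number of SNPs")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


fae1_diff = differing_positions("FAE1_Hap A", "FAE1_Hap E")
mccb_diff = differing_positions("MCCB_Hap A", "MCCB_Hap C")
print("FAE1 Hap A vs Hap E:", fae1_diff)
print("MCCB Hap A vs Hap C:", mccb_diff)
```

With the strings as listed, FAE1_Hap A and FAE1_Hap E differ at a single listed position, whereas MCCB_Hap A and MCCB_Hap C differ at every listed position except the first.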

BnaC03.MCCB consists of 10 exons and 9 introns, and a total of 12 SNPs were identified within the 3-kb 5' regulatory region and 6 SNPs in the exon region. Figure 4d shows the location of BnaC03.MCCB, ranging from 72,690,869 bp to 72,699,720 bp, and the positions of the SNPs on Chr.C03. Among the 6 SNPs in the exon region, 3 resulted in nonsense nucleotide changes, 2 led to sense nucleotide changes, and 1 caused a frameshift. Figure 4e presents the different patterns of SNP distribution in BnaC03.MCCB between the HEA and LEA accessions. The majority of SNPs in the HEA group carried alternative alleles compared to the reference genome, while the majority of SNPs in the LEA group carried alleles identical to the reference genome. We classified the SNP combinations into four haplotypes, namely MCCB_Hap A to MCCB_Hap D. The statistical probabilities for LEA were 21.0%, 8.3%, 94.5%, and 100% for MCCB_Hap A, MCCB_Hap B, MCCB_Hap C, and MCCB_Hap D, respectively (Fig. 4f and Table 2). The allelic combinations for each haplotype of both FAE1 and MCCB orthologues are provided in Table 2.
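
The per-haplotype percentages reported above can be read as the share of accessions carrying a given haplotype that show the LEA phenotype. The sketch below illustrates that arithmetic under this assumed interpretation. The function name and the toy records are hypothetical and do not reproduce the study's accession data.

```python
from collections import Counter


def lea_share_by_haplotype(records):
    """Return {haplotype: percent of its accessions that are LEA}.

    `records` is an iterable of (haplotype, phenotype) pairs, where the
    phenotype is either "LEA" or "HEA".
    """
    totals = Counter()
    lea = Counter()
    for haplotype, phenotype in records:
        totals[haplotype] += 1
        if phenotype == "LEA":
            lea[haplotype] += 1
    return {hap: 100.0 * lea[hap] / totals[hap] for hap in totals}


# Hypothetical toy data: 4 accessions carry "Hap X", 4 carry "Hap Y".
records = (
    [("Hap X", "LEA")] * 3
    + [("Hap X", "HEA")] * 1
    + [("Hap Y", "HEA")] * 4
)
shares = lea_share_by_haplotype(records)
for hap, pct in shares.items():
    print(f"{hap}: {pct:.1f}% LEA")
```

A haplotype with a share close to 100% or 0% is the kind that would be informative for distinguishing LEA from HEA genotypes.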

The SNPs shown in Fig. 4b and 4e can be used to develop breeding markers such as Cleaved Amplified Polymorphic Sequence (CAPS) markers (Fig. S3, Table S11). Here, we focused on the SNPs in the 5' regulatory region upstream of BnaC03.MCCB (Fig. 5). The restriction enzyme XbaI digested the PCR product at position Chr.C03_72696864 in LEA accessions, but not in HEA accessions (Fig. 5a and Table S12). This resulted in different fragment sizes on the gel after electrophoresis (Fig. 5b).
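
The logic of a CAPS marker can be illustrated with a small in-silico digest: an amplicon carrying the recognition site is cut into two fragments, while an amplicon in which a SNP disrupts the site stays intact. The sketch below assumes the standard XbaI recognition sequence TCTAGA (cut after the first T). The two amplicon sequences are hypothetical placeholders, not the real BnaC03.MCCB PCR products, and which allele carries the site is assumed here only for illustration.

```python
XBAI_SITE = "TCTAGA"  # XbaI recognition sequence (palindromic)
CUT_OFFSET = 1        # XbaI cuts between the first T and the following C


def digest(amplicon: str, site: str = XBAI_SITE, cut_offset: int = CUT_OFFSET) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of `site`."""
    seq = amplicon.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical 50-bp amplicons differing by one base inside the site.
lea_amplicon = "ACGT" * 5 + "TCTAGA" + "GGCA" * 6  # site intact: digested
hea_amplicon = "ACGT" * 5 + "TCTGGA" + "GGCA" * 6  # site disrupted: uncut

lea_fragments = digest(lea_amplicon)
hea_fragments = digest(hea_amplicon)
print("LEA-type fragments (bp):", lea_fragments)
print("HEA-type fragments (bp):", hea_fragments)
```

Two bands versus one band on the gel is what allows the two genotypes to be told apart. Because the XbaI site is palindromic, scanning one strand is sufficient.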

Erucic acid (EA) is a crucial quality trait in rapeseed, and it is essential to understand the molecular mechanisms that regulate its formation in seeds. The LEAC variety was first bred in Canada in the 1960s and has since become dominant in major rapeseed cultivars worldwide within a relatively short period (Gupta and Pratap 2007). However, the intensive selection for LEA has resulted in a reduced genetic diversity in rapeseed, compromising its ability to adapt to environmental stresses. To investigate the genetic consequences of LEA selection in the rapeseed germplasm population, we employed SSA to identify selective signals between the HEAC and LEAC subpopulations. SSA is a powerful tool that can detect genomic regions subject to selective forces, providing insights into the genetic basis of phenotypic variations, identification of genes under selection, and the development of molecular markers for marker-assisted breeding programs (Chen et al. 2010). SSA has proven its utility in diverse fields such as evolutionary biology (Qiu et al. 2015), population genetics (Wei et al. 2021), and plant breeding (Li et al. 2023). In this study, we discovered genetic footprints resulting from LEA breeding, including several genes involved in FA biosynthesis pathways and 31 putative transcription factors (TFs) such as ethylene-responsive factors (ERFs), GATA family genes, MYB genes, bZIP, and bHLH proteins (Fig. 2b, Table S5). These footprint genes may be associated with the reduced adaptability and stress tolerance observed in many LEA cultivars (Jakoby et al. 2002; Li et al. 2017; Sun et al. 2018; Feng et al. 2020; Zhu et al. 2020). Notably, the TFs aforementioned play crucial roles in stress responses, and their differential expression or activity can impact a plant's ability to cope with various environmental challenges (Sun et al. 2018; Javed et al. 2020). 
However, it is important to acknowledge that the majority of LEA accessions in our study also exhibited a low glucosinolate genotype (double-low quality). As a result, it is unclear whether the diminished environmental adaptability of the double low genotype is solely attributable to the selection for LEA or if it is influenced by the consequences stemming from low glucosinolate selection. Previous studies have indicated that glucosinolates contribute to plant defense against pathogens and herbivores by producing toxic compounds upon tissue damage. Glucosinolates also interact with other defense pathways, such as jasmonic acid and salicylic acid signaling, which can influence overall stress tolerance and response in plants (Mitreiter and Gigolashvili 2021). Nonetheless, considering the fixed linkage allelic changes presented in Table S5, there is potential to explore the enrichment of genetic diversity within the LEA genotype as a viable strategy. In our previous studies, SSA has proven effective in analyzing the formation of Brassica napus ecotypes (specifically winter, semi-winter, and spring ecotypes) (Wu et al. 2019; Xu et al. 2023) and the natural evolution of leaf wax coverage during domestication in specific areas (Long et al. 2023).

Furthermore, we conducted a GWAS to identify genes associated with EAC variation. Two different-sized GWAS populations (P1 and P2) were utilized over two consecutive years. Although P2 was much smaller than P1, it allowed for a controlled scale of field experiment while retaining over 96.5% of the SNPs observed in P1. In comparison with previous GWAS studies in Brassica napus (Lu et al. 2019; Wu et al. 2019; Wang et al. 2021), our approach demonstrated the feasibility of using a much smaller population (P2) for GWAS. Despite its reduced size, P2 still exhibited a high number of significant SNPs (4,312,418), providing the necessary statistical power and enabling the identification of tightly associated genes (Table S6). GWAS serves as a powerful tool for identifying genetic variations associated with important agronomic traits by analyzing a large number of genetic markers across the entire genome of a given population. One significant advantage of GWAS is its ability to detect genetic variations without prior knowledge of gene function, allowing for an unbiased approach. This approach facilitates the discovery of new genetic associations, including rare or novel variants that may have significant effects on traits (Liu et al. 2016; Wei et al. 2019). Scalability is another advantage of GWAS, as it can analyze large populations, such as the 932 accessions in this study and the 991 accessions in our previous research (Wu et al. 2019). This scalability enhances the statistical power and increases the reliability of the findings. In our previous studies, GWAS has played a pivotal role in identifying genes that regulate various traits in rapeseed, such as flowering time (Wu et al. 2019), leaf trichome density (Xuan et al. 2020), seed oil content (Qu et al. 2017; Wang et al. 2021), drought tolerance (Zhu et al. 2020), tocopherol content and composition (Huang et al. 2023), leaf wax thickness (Long et al. 2023), shade tolerance (Li et al. 2023), and petal size (Wang et al. 2023). Understanding the genetic basis of these traits aids in identifying candidate genes and pathways, providing insights into the underlying biological mechanisms and facilitating gene manipulation for crop improvement.

However, it is important to distinguish between the concepts of "association" and "regulation." Association refers to a statistical correlation between a gene and the expression of a trait, which does not necessarily imply a direct regulatory role of the gene in that trait. Regulation, on the other hand, indicates that a gene influences the formation and expression of a trait. For a gene to have a regulatory role in a specific trait, it must first be expressed in the relevant tissues. In this study, EA is synthesized in developing seeds, which means that genes directly or indirectly involved in regulating EA synthesis should be expressed in the developing seeds. Given this consideration, we conducted transcriptome analyses between two developmental time points, before and after rapid accumulation of EA in HEAC genotypes (DEG analysis 1) (Fig. 3, Table S9). We also performed DEG analysis between LEAC and HEAC seeds at 40 DAF (DEG analysis 2) (Fig. S2, Table S10). Additionally, we conducted an enrichment analysis on these DEGs. Our results showed that the enriched DEGs in both analyses largely belonged to the same categories. However, the number of DEGs in each category was higher in DEG analysis 1 than in DEG analysis 2. The enriched upregulated genes (URGs) were primarily associated with categories such as FA metabolic processes, regulation of lipid storage, lipid droplet organization, and positive regulation of transcription. Conversely, the enriched downregulated genes (DRGs) were associated with categories such as cadmium ion response, defense against fungi, heat stress response, and fundamental cellular activities including the mitotic cell cycle, DNA replication, and protein refolding (Fig. 3d, Fig. S2d). The increased EA synthesis may be a consequence of the URGs, considering that EA belongs to the class of very long-chain fatty acids (VLCFAs). The elevated EA synthesis might directly result from increased expression of genes such as FAE1 and other FA synthases. On the other hand, the increased expression of genes involved in lipid storage regulation and lipid droplet organization may itself be a result of greater EA biosynthesis.

By combining SSA, GWAS, and DEG analysis, we have identified a list of 20 candidate genes that may directly or indirectly regulate EA synthesis or closely associate with EAC in rapeseed seeds (Table 1). Due to length constraints, we discuss only a subset of these genes to explore the potential mechanisms underlying their impact on EA synthesis. (1) Fatty Acid Elongation 1 (FAE1): FAE1 is a well-known enzyme that catalyzes the rate-limiting step of FA elongation and has been extensively studied for its role in determining EAC in seeds (Shi et al. 2017; Liu et al. 2022; Wang et al. 2022). (2) Methylcrotonoyl-CoA carboxylase beta chain (MCCB): MCCB is the beta chain of the MCC enzyme complex, which catalyzes the carboxylation of methylcrotonyl-CoA during mitochondrial leucine catabolism (Ding et al. 2012). Intermediates or products of this pathway might contribute to EA synthesis. (3) UDP-glycosyltransferase 73B1 (UGT73B1): UGT73B1 is an enzyme belonging to the UDP-glycosyltransferase family, involved in the glucosylation of various substrates, including plant secondary metabolites (Rehman et al. 2018). UGT73B1 might catalyze the glucosylation of EA or its precursors, potentially affecting its content and stability in seeds. (4) Caffeoyl-CoA O-methyltransferase 1 (CCoAOMT1): CCoAOMT1 is an enzyme involved in lignin biosynthesis (Walker et al. 2016). Its activity could divert metabolic intermediates away from FA biosynthesis towards lignin biosynthesis, potentially limiting the availability of substrates for EA synthesis. (5) Cyclin-dependent Kinases Regulatory Subunit 2 (CKS2): CKS2 is a regulatory subunit involved in controlling cell cycle progression in coordination with cyclin-dependent kinases (CDKs) (Boruc et al. 2010). CKS2 might play a role in seed development and maturation processes that influence lipid biosynthesis. Altered CKS2 activity or expression could disrupt the coordination and resource allocation for EA synthesis. Changes in the activity of CKS2 and its associated CDKs could lead to metabolic trade-offs, redirecting resources away from lipid biosynthesis, including EA, towards supporting cell cycle progression or other cellular processes.
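
The idea of combining the three lines of evidence can be sketched as a set operation. The exact combination rule is not spelled out here, so the sketch assumes one plausible reading of the argument above: a candidate must be associated with EAC by GWAS and differentially expressed in developing seeds, with SSA support recorded as an additional annotation. The function name and the gene identifiers are hypothetical placeholders.

```python
def shortlist(gwas_genes, deg_genes, ssa_genes=frozenset()):
    """Keep GWAS-associated genes that are also DEGs in developing seeds.

    Returns {gene: has_ssa_support}, sorted by gene identifier.
    Assumed rule for illustration only: GWAS AND DEG, annotated with SSA.
    """
    kept = set(gwas_genes) & set(deg_genes)
    return {gene: gene in ssa_genes for gene in sorted(kept)}


# Hypothetical gene identifiers, not taken from the study's tables.
gwas = {"gene1", "gene2", "gene3"}
deg = {"gene2", "gene3", "gene4"}
ssa = {"gene2"}

candidates = shortlist(gwas, deg, ssa)
for gene, in_sweep in candidates.items():
    print(gene, "(also in a selective-sweep region)" if in_sweep else "")
```

Requiring expression evidence in this way reflects the distinction drawn earlier between association and regulation: a gene that is statistically associated with EAC but not expressed in developing seeds is dropped.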

These speculative explanations require further investigation to determine the specific details of the underlying mechanisms connecting the candidate genes listed in Table 1 and EAC in seeds. Additional studies are necessary to explore the interactions, signaling pathways, and potential regulatory networks involving these genes and EA metabolism in seeds.

In summary, our study has revealed the genetic footprints resulting from artificial selection for LEA in germplasm populations of Brassica napus. These findings suggest specific genetic regions as targets for enriching genetic diversity. Through GWAS combined with transcriptome analysis, we have recommended a set of 20 genes, some of which could be used for the development of molecular markers differentiating HEAC and LEAC genotypes and the identification of recommended haplotypes for HEAC and/or LEAC genotypes. Our study provides preliminary insights into potential genes involved in regulating EA biosynthesis and the probable mechanisms underlying such regulation.

Author contribution statement

LJ conceived the experiments. SX carried out the field experiments and data analyses. SC, JC, TY, MT, RW, SH participated in field experiments in multiple locations and years. SX and LJ wrote the manuscript.

Competing interests

The authors have no relevant financial or nonfinancial interests to disclose.

Funding

This work was financially supported by STI 2030 – Major Projects (2023ZD04008), and Natural Science Foundation of China (code nos. 31971817, 32301756).

Acknowledgements

We thank Dr. Ulrike Lohwasser from Leibniz Institute of Plant Genetics and Crop Plant Research, Germany, for providing a part of the rapeseed accessions for this study, and Mr. Rui Sun from Agricultural Experiment Station of Zhejiang University for the management of field experiments.

Data availability

The supporting data for the figures and tables are available in Supplementary Figures S1–S3 and Supplementary Tables S1–S12. The raw reads of the rapeseed accessions have been deposited in the public database of the National Center for Biotechnology Information under SRP155312 (https://www.ncbi.nlm.nih.gov/sra/SRP155312) and in that of the China National Center for Bioinformation (NGDC) (https://ngdc.cncb.ac.cn/gsa/browse/CRA001854).

Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19:1655–1664.
Anders S, Pyl PT, Huber W (2015) HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31:166–169.
Bates PD, Durrett TP, Ohlrogge JB, Pollard M (2009) Analysis of Acyl Fluxes through Multiple Pathways of Triacylglycerol Synthesis in Developing Soybean Embryos. Plant Physiol 150:55–72.
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.
Browse J, Somerville C (1991) Glycerolipid Synthesis: Biochemistry and Regulation. Annu Rev Plant Physiol Plant Mol Biol 42:467–506.
Boruc J, Mylle E, Duda M, De Clercq R, Rombauts S, Geelen D, Hilson P, Inzé D, Van Damme D, Russinova E (2010) Systematic Localization of the Arabidopsis Core Cell Cycle Proteins Reveals Novel Cell Division Complexes. Plant Physiol 152:553–565.
Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective sweeps. Genome Res 20:393–402.
Chen JM, Qi WC, Wang SY, Guan RZ, Zhang HS (2011) Correlation of Kennedy pathway efficiency with seed oil content of canola (Brassica napus L.) lines. Can. J. Plant Sci 91:251–259.
Chen S, Zhou Y, Chen Y, Gu J (2018) fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890.
Cui Y, Zeng X, Xiong Q, Wei D, Liao J, Xu Y, Chen G, Zhou Y, Dong H, Wan H, Liu Z, Li J, Guo L, Jung C, He Y, Qian W (2021) Combining quantitative trait locus and co-expression analysis allowed identification of new candidates for oil accumulation in rapeseed. J Exp Bot. 72:1649–1660.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group (2011) The variant call format and VCFtools. Bioinformatics 27:2156–2158.
Davoudi A, Mirshekari B, Shirani-Rad A, Farahvash F, Rashidi V (2019) Effect of selenium foliar application on oil yield, fatty acid composition and glucosinolate content of rapeseed cultivars under late-season thermal stress. OCL 26:43.
Ding G, Che P, Ilarslan H, Wurtele ES, Nikolau BJ (2012) Genetic dissection of methylcrotonyl CoA carboxylase indicates a complex role for mitochondrial leucine catabolism during seed development and germination. Plant J. 70:562–577.
Feng K, Hou XL, Xing GM, Liu JX, Duan AQ, Xu ZS, Li MY, Zhuang J, Xiong AS (2020) Advances in AP2/ERF super-family transcription factors in plant. Crit Rev Biotechnol 40:750–776.
Francis RM (2017) pophelper: An R package and web app to analyse and visualize population structure. Mol Ecol Resour 17:27–32.
Gupta SK, Pratap A (2007) History, origin, and evolution. In Advances in Botanical Research (Academic Press), pp. 1–20.
Harwood JL (2005) “Fatty acid biosynthesis,” in Plant Lipids: Biology, Utilization and Manipulation, ed. D. J. Murphy (Oxford: Blackwell), 27–66.
Haslam TM, Kunst L (2013) Extending the story of very-long-chain fatty acid elongation. Plant Sci 210:93–107.
Hu J, Chen B, Zhao J, Zhang F, Xie T, Xu K, Gao G, Yan G, Li H, Li L, Ji G, An H, Li H, Huang Q, Zhang M, Wu J, Song W, Zhang X, Luo Y, Chris Pires J, Batley J, Tian S, Wu X (2022) Genomic selection and genetic architecture of agronomic traits during modern rapeseed breeding. Nat Genet 54:694–704.
Huang Q, Lu L, Xu Y, Tu M, Chen X, Jiang L (2023) Genotypic variation of tocopherol content in a representative genetic population and genome-wide association study on tocopherol in rapeseed (Brassica napus) Mol Breed 43:50.
Jakoby M, Weisshaar B, Dröge-Laser W, Vicente-Carbajosa J, Tiedemann J, Kroj T, Parcy F (2002) bZIP transcription factors in Arabidopsis. Trends Plant Sci 7:106–111.
Javed T, Shabbir R, Ali A, Afzal I, Zaheer U, Gao SJ (2020) Transcription Factors in Plant Stress Responses: Challenges and Potential for Sugarcane Improvement. Plants 9:491.
Kaur H, Wang L, Stawniak N, Sloan R, van Erp H, Eastmond P, Bancroft I (2020) The impact of reducing fatty acid desaturation on the composition and thermal stability of rapeseed oil. Plant Biotechnol J 18:983–991.
Khan S, Anwar S, Kuai J, Noman A, Shahid M, Din M, Ali A, Zhou G (2018) Alteration in yield and oil quality traits of winter rapeseed by lodging at different planting density and nitrogen rates. Sci Rep 8:634
Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12:357-360.
Knutsen HK, Alexander J, Barregård L, Bignami M, Brüschweiler B, Ceccatelli S, Dinovi M, Edler L, Grasl-Kraupp B, Hogstrand C, Hoogenboom L (Ron), Nebbia CS, Oswald I, Petersen A, Rose M, Roudot A-C, Schwerdtle T, Vollmer G, Wallace H, Cottrill B, Dogliotti E, Laakso J, Metzler M, Velasco L, Baert K, Ruiz JAG, Varga E, Dörr B, Sousa R and Vleminckx C (2016) Erucic acid in feed and food. EFSA J. 14: e04593.
Li-Beisson Y, Shorrosh B, Beisson F, Andersson MX, Arondel V, Bates PD, et al. (2013) Acyl-lipid metabolism. Arabidopsis Book 11: e0161.
Li D, Jin C, Duan S, Zhu Y, Qi S, Liu K, Gao C, Ma H, Zhang M, Liao Y, Chen M (2017) MYB89 Transcription Factor Represses Seed Oil Accumulation. Plant Physiol 173:1211–1225.
Li P, Xiao L, Du Q, Quan M, Song Y, He Y, Huang W, Xie J, Lv C, Wang D, Zhou J, Li L, Liu Q, El-Kassaby YA, Zhang D (2023) Genomic insights into selection for heterozygous alleles and woody traits in Populus tomentosa. Plant Biotechnol J 21:2002–2018.
Li Y, Guo Y, Cao Y, Xia P, Xu D, Sun N, Jiang L, Dong J (2023) Temporal control of the Aux/IAA genes BnIAA32 and BnIAA34 mediates Brassica napus dual shade responses. J Integr Plant Biol 13582.
Liao Y, Smyth GK, Shi W (2014) featureCounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30:923–930.
Liu X, Huang M, Fan B, Buckler ES, Zhang Z (2016) Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies. PLOS Genet 12: e1005767.
Liu Y, Du Z, Lin S, Li H, Lu S, Guo L, Tang S (2022) CRISPR/Cas9-Targeted Mutagenesis of BnaFAE1 Genes Confers Low-Erucic Acid in Brassica napus. Front Plant Sci 13:848723.
Long Z, Tu M, Xu Y, Pak H, Zhu Y, Dong J, Lu Y, Jiang L (2023) Genome-wide-association study and transcriptome analysis reveal the genetic basis controlling the formation of leaf wax in Brassica napus. J Exp Bot 74:2726–2739.
Lu K, Wei L, Li X, Wang Y, Wu J, Liu M, Zhang C, Chen Z, Xiao Z, Jian H, Cheng F, Zhang K, Du H, Cheng X, Qu C, Qian W, Liu L, Wang R, Zou Q, Ying J, Xu X, Mei J, Liang Y, Chai YR, Tang Z, Wan H, Ni Y, He Y, Lin N, Fan Y, Sun W, Li NN, Zhou G, Zheng H, Wang X, Paterson AH, Li J (2019) Whole-genome resequencing reveals Brassica napus origin and genetic loci involved in its improvement. Nat Commun 10:1154.
Mitreiter S, Gigolashvili T (2021) Regulation of glucosinolate biosynthesis. J Exp Bot 72:70–91.
Ohlrogge J, Browse J (1995) Lipid biosynthesis. Plant Cell 7:957–970.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC (2007) PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet 81:559–575.
Qiu Q, Wang L, Wang K, Yang Y, Ma T, Wang Z, Zhang X, Ni Z, Hou F, Long R, Abbott R, Lenstra J, Liu J (2015) Yak whole-genome resequencing reveals domestication signatures and prehistoric population expansions. Nat Commun 6:10283
Qu C, Jia L, Fu F, Zhao H, Lu K, Wei L, Xu X, Liang Y, Li S, Wang R, Li J (2017) Genome-wide association mapping and Identification of candidate genes for fatty acid composition in Brassica napus L. using SNP markers. BMC Genomics 18:232.
Rehman HM, Nawaz MA, Shah ZH, Ludwig-Müller J, Chung G, Ahmad MQ, Yang SH, Lee SI (2018) Comparative genomic and transcriptomic analyses of Family-1 UDP glycosyltransferase in three Brassica species and Arabidopsis indicates stress-responsive regulation. Sci Rep 8:1875
Shi J, Lang C, Wang F, Wu X, Liu R, Zheng T, Zhang D, Chen J, Wu G (2017) Depressed expression of FAE1 and FAD2 genes modifies fatty acid profiles and storage compounds accumulation in Brassica napus seeds. Plant Sci 263:177–182.
Slatkin M (2008) Linkage disequilibrium—Understanding the evolutionary past and mapping the medical future. Nat Rev Genet 9:477–485.
Song JM, Guan Z, Hu J, Guo C, Yang Z, Wang S, Liu D, Wang B, Lu S, Zhou R, Xie WZ, Cheng Y, Zhang Y, Liu K, Yang QY, Chen LL, Guo L (2020) Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus. Nat Plants 6:34–45.
Stefansson BR, Hougen FW, Downey RK (1961) Note on the isolation of rape plants with seed oil free from erucic acid. Can J Plant Sci 41:218–219.
Sun X, Wang Y, Sui N (2018) Transcriptional regulation of bHLH during plant response to stress. Biochem Biophys Res Commun 503:397–401.
Tang S, Zhao H, Lu S, Yu L, Zhang G, Zhang Y, Yang QY, Zhou Y, Wang X, Ma W, Xie W, Guo L (2021) Genome- and transcriptome-wide association studies provide insights into the genetic basis of natural variation of seed oil content in Brassica napus. Mol Plant 14:470–487.
Walker AM, Sattler SA, Regner M, Jones JP, Ralph J, Vermerris W, Sattler SE, Kang C (2016) The Structure and Catalytic Mechanism of Sorghum bicolor Caffeoyl-CoA O-Methyltransferase. Plant Physiol 172:78–92.
Wang H, Wang Q, Pak H, Yan T, Chen M, Chen X, Wu D, Jiang L (2021) Genome-wide association study reveals a patatin-like lipase relating to the reduction of seed oil content in Brassica napus. BMC Plant Biol 21:6.
Wang P, Xiong X, Zhang X, Wu G, Liu F (2022) A Review of Erucic Acid Production in Brassicaceae Oilseeds: Progress and Prospects for the Genetic Engineering of High and Low-Erucic Acid Rapeseeds (Brassica napus). Front Plant Sci 13:899076.
Wang R, Li Y, Xu S, Huang Q, Tu M, Zhu Y, Cen H, Dong J, Jiang L, Yao X (2023) Genome-wide association study reveals the genetic basis for petal-size formation in rapeseed (Brassica napus) and CRISPR-Cas9-mediated mutagenesis of BnFHY3 for petal-size reduction. Plant J 16609.
Wei D, Cui Y, Mei J, Qian L, Lu K, Wang ZM, Li J, Tang Q, Qian W (2019) Genome-wide identification of loci affecting seed glucosinolate contents in Brassica napus L. J Integr Plant Biol 61:611–623.
Wei T, van Treuren R, Liu X, Zhang Z, Chen J, Liu Y, Dong S, Sun P, Yang T, Lan T, Wang X, Xiong Z, Liu Y, Wei J, Lu H, Han S, Chen JC, Ni X, Wang J, Yang H, Xu X, Kuang H, van Hintum T, Liu X, Liu H (2021) Whole-genome resequencing of 445 Lactuca accessions reveals the domestication history of cultivated lettuce. Nat Genet 53:752–760.
Wu D, Liang Z, Yan T, Xu Y, Xuan L, Tang J, Zhou G, Lohwasser U, Hua S, Wang H, Chen X, Wang Q, Zhu L, Maodzeka A, Hussain N, Li Z, Li X, Shamsi IH, Jilani G, Wu L, Zheng H, Zhang G, Chalhoub B, Shen L, Yu H, Jiang L (2019) Whole-genome resequencing of a worldwide collection of rapeseed accessions reveals the genetic basis of ecotype divergence. Mol Plant 12:30–43.
Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T, Zhou L, Tang W, Zhan L, Fu X, Liu S, Bo X, Yu G (2021) clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation 2:100141.
Xu Y, Kong XD, Guo Y, Wang RS, Yao XT, Chen XY, Yan T, Wu DZ, Lu YH, Dong J, Zhu Y, Chen MX, Cen HY, Jiang L (2023) Structural variations and environmental specificities of flowering time-related genes in Brassica napus. Theor Appl Genet 136:42.
Xuan L, Yan T, Lu L, Zhao X, Wu D, Hua S, Jiang L (2020) Genome-wide association study reveals new genes involved in leaf trichome formation in polyploid oilseed rape (Brassica napus L.). Plant Cell Environ 43:675–691.
Yan T, Wang Q, Maodzeka A, Wu D, Jiang L (2020) BnaSNPDB: an interactive web portal for the efficient retrieval and analysis of SNPs among 1,007 rapeseed accessions. Comput Struct Biotechnol. J 18:2766–2773.
Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: A tool for genome-wide complex trait analysis. Am J Hum Genet 88:76–82.
Yang Z, Liang C, Wei L, Wang S, Yin F, Liu D, Guo L, Zhou Y, Yang QY (2022) BnVIR: Bridging the genotype-phenotype gap to accelerate mining of candidate variations underlying agronomic traits in Brassica napus. Mol Plant 15:779–782.
Yang Z, Wang S, Wei L, Huang Y, Liu D, Jia Y, Luo C, Lin Y, Liang C, Hu Y, Dai C, Guo L, Zhou Y, Yang QY (2023) BnIR: A multi-omics database with various tools for Brassica napus research and breeding. Mol Plant 16:775–789.
Zhu W, Guo Y, Chen Y, Wu D, Jiang L (2020) Genome-wide identification, phylogenetic and expression pattern analysis of GATA family genes in Brassica napus. BMC Plant Biol 20:543


Reviewers agreed at journal
23 Feb, 2024
Reviewers invited by journal
05 Feb, 2024
Editor assigned by journal
27 Jan, 2024
First submitted to journal
25 Jan, 2024
