Construction of a Quantitative Genomic Map Gives Insight on Rapeseed Genetic Characteristics and Direction for Future Breeding


 Background

Rapeseed is the second most important oil crop in the world. Improving seed yield and seed oil content are the two main highlights for researches. Unfortunately, rapeseed development is frequently affected by different diseases. Extensive researches have been made through many years to develop elite cultivars which could produce high oil, high yield or be resistant to disease. QTL analysis has been one of the most important strategy in genetic deciphering of agronomic characteristics. In order to comprehend the distribution of these QTLs and to uncover the key regions that could simultaneously control multiple traits, 4555 QTLs that have been identified during the last 25 years were aligned in one unique map, and a quantitative genomic map which involved 128 traits from 79 populations developed in 12 countries was constructed.
Results

The present study revealed 517 regions of overlapping QTLs which harbored 2744 candidate genes and might affect multiple traits, simultaneously. They could be selected to customize super-rapeseed cultivars, including the cultivars which produce high oil content for biofuels production. The gene ontology and the interaction network of those candidates revealed genes which highly interacted with the other genes and might have strong influence on them. The expression and structure of these candidate genes were compared in eight rapeseed accessions and revealed genes of similar structure which were expressed differently.
Conclusion

The present study enriches our knowledge on rapeseed genome characteristic and diversity, and it also provided indications for rapeseed molecular breeding improvement in the future.

QTLs investigation and discovery of related candidate genes can be done together, this strategy helps to comprehend the authority of these genes over traits [11,21]. Identi cation of candidate genes implies the detection of important genes for agricultural and economic quantitative traits. Candidate genes are present within QTL region and they are responsible for phenotype variation [22]. The effect of these genes on variation of phenotype could be elucidated through investigation on the gene arrangement and interaction of loci affecting this variation [11]. This technique has already been used to identify potential candidate genes in B. napus Darmor-bzh, for instance, Chao et al. used this technique to identify potential candidate genes for seed oil content trait, and found 448 genes underlying 41 oil content QTLs [16]. More, 76 candidate genes were found for 57 QTLs for oil content and fatty acids [19], and 147 candidate genes were discovered in a region of 131 yield QTLs which were overlapping [20]. Candidate genes can be manipulated to get the most bene cial gene combination to get maximum pro t, especially those genes which were found in region of overlapping QTLs involving many traits. Additionally, the release of various B. napus genome sequences: Darmor-bzh (3), Tapidor [23], Zhongshuang11 -ZS11 [24], Gangan, Shengli, Zheyou7, Quinta, No2127, Westar and 1689 other accessions [25] represent a precious resource which will have a tremendous impact on understanding rapeseed accessions diversity, notably the structural variation of regions which are associated with agronomic traits.
In our previous study, overlapping QTLs for single trait were detected (e.g. oil content or seed yield). In the current study, the purpose was to construct a quantitative genomic map and to detect genomic regions that might control multiple traits associated with seed, yield, hormones level and diseases, simultaneously, and the related candidate genes were also identi ed. Moreover, structural variation and gene expression level of those candidate genes were studied in eight different accessions. Consequently, the ultimate objectives were: (1) to build a quantitative genomic map of QTLs associated to agronomic and disease related traits in order to display overlapping regions with multiple traits; (2) to reveal the candidate genes within those regions of overlapping QTLs for the purpose of nding genes that might have pleiotropic effects on seed composition, seed yield, hormones and disease; (3) to identify the homologous of these candidate genes in eight rapeseed accessions and to analyze the gene expression and the structure variation. The present study would enhance the knowledge of rapeseed genome characteristic and diversity, the ndings can be used to develop molecular markers associated with the studied traits, and also can provide some guidance for molecular design for breeding. Identi ed candidate genes might be used to target for genomics-based improvement and better seed yield, seed composition and disease resistance in the future.

Results
A quantitative genomic map of QTLs controlling seed yield, seed components, hormone level and disease related traits 4555 QTLs of 128 agronomic and disease related traits, developed from 79 different populations of three different ecotypes and grown in 12 different countries (Additional le 3: Table S1) were gathered and combined together in one unique map (Fig. 1). A total distance of 978.4Mb on the physical map of Darmor-bzh was covered ( Fig. 2A, Additional le 1: Figure S1, Additional le 4: Table S2). Further observation revealed that 2695 and 1860 QTLs were located on A and C genome, respectively. A3 and C3 chromosomes contained the highest number of QTLs, with 430 and 399 QTLs, respectively, whereas A10 and C4 chromosomes had the longest coverage distance (159.33Mb and 122.42Mb, respectively). A9, A3 and C3 chromosomes contained the highest number of traits (80, 71 and 69 traits, respectively) ( Fig. 2A).
Obviously, most QTLs for seed components, seed yield, hormones level and disease related traits were found on A genome rather than in C genome.
It is crucial to locate regions of the genome where multiple traits overlapped the most. Thereby, the abovementioned 128 traits were subdivided into ve categories: abiotic factor (A), biotic factor (B), hormones level (H), seed components (S) and yield related traits (Y). The total number of QTLs in each category were 349 (A), 334 (B), 42 (H), 1392 (S) and 2438 (Y). Each region on Darmor-bzh genome was carefully observed in order to detect regions where QTLs of more than one category of trait could overlap, i.e. regions with two, three, four or ve categories of traits, which were present in one region, simultaneously.
A total of 517 regions which hosted overlapping QTLs were observed (Fig. 2B, Additional le 1: Figure  S1). The region of overlapping QTLs on each chromosome, the number of QTLs and the categories of traits are summarized on Additional le 5: Mb, respectively). Then, the most abundant and the most overlapping categories of traits were S and Y categories, they were found in 403 among the 517 regions of overlapping QTLs detected in this study.
The H category of trait was found rarely in overlapping region since the identi ed QTLs in early published papers were few (42 QTLs), so far, this H category were found in 39 among the 517 regions of overlapping QTLs of this study. Otherwise, regions of overlapping QTLs which involved one environment or one population were observed. No speci ed region was found exclusively for one population. Also, only speci ed regions for China were found in 11 areas of the genome: four areas on C3 (36. 94 Table S3). For instance, the region on A8 (16.87-17.37Mb) had four QTLs (3S, 1Y), which were all found with Chinese experimental eld. Ultimately, the rapeseed genome had been nely dissected to unveil regions that harbored multiple traits, simultaneously. It would be crucial to couple those ndings with the identi cation of genes that were located within those regions to understand the in uence of those genes over those traits.
Candidate genes identi ed within regions of overlapping QTLs, and their interaction network Totally, 3181 genes which are associated to oil biosynthesis, yield and disease related traits were aligned to the physical map of Darmor-bzh, and a total of 2744 candidate genes were found within overlapping QTLs of two to ve categories of traits (Fig. 3A, Additional le 7: Table S5). A total number of genes of 26 (1%), 729 (47%), 1291 (27%) and 700 (25%) were found for ve, four, three and two categories of traits, respectively (Fig. 3B). involving two overlapping QTLs (1 A and 1 B). In assumption from those ndings, important genes which were located within regions of overlapping QTL with multiple traits were identi ed. They might have in uence on more than one category of traits, and they could be selected according to the desired improvement of two or multiple traits.
Interaction network analysis of the 2744 candidate genes were made with their 1555 orthologous genes in A. thaliana because B. napus is not available on String database. Gene ontology (GO) analysis indicated that the 1555 genes could be classi ed into 16 categories, according to Panther GO-slim biological process's classi cation (Additional le 8: Table S6) . Those genes belonged to the GO categories of metabolic process (KASI and AGL20), multicellular organismal process (CO) and other category (AP2, FT, AUX1, KASIII, MCAT, PHYA, COP1 and ACP4). Those most in uential genes had function related to oil biosynthesis and yield related traits: ACP, MCAT, KASI and KASIII are both plastidial genes which are involved in fatty acid biosynthesis. ACP act as carrier of acyl intermediates, then MCAT catalyzes the synthesis of malonyl-ACP and CoA from malonyl-CoA and ACP. KASI and KASIII both act on fatty acid elongation. Then, the other seven in uential genes were related to yield traits: AGL20 promote owering and in orescence meristem in Arabidopsis, AP2 are involved in oral organ speci cation, in oral meristem establishment and in ovule and seed coat development. AUX1 are auxin transporters and have in uence on lateral root initiation and positioning, CO regulate owering during long days, COP1 act as suppressors of photomorphogenesis and stimulate skotomorphogenesis in the dark, FT also promote owering as AGL20, and PHYA regulate photomorphogenesis. Those most in uential genes had different functions and were involved in different metabolism pathways, yet they might have higher effect over other genes, this might indicate that the simultaneous control of multiple categories of traits might be affected at different path of metabolisms.
Gene expression and structural variation of candidate genes in eight rapeseed accessions.
One copy of each homologous of the 11 most in uential genes of the interaction network in B. napus was selected for the current gene expression and structural variation analyses. Additionally, the 26 candidate genes which were found within QTLs of ve categories (A, B, H, S and Y) were also added to the analyses. Thus, a total of 37 genes of Darmor-bzh genome were searched in eight rapeseed accessions by using of BnPIR database [25], including two winter-types (Quinta and Tapidor), two springtypes (Westar and No2127), and four semi-winter types (ZS11, Zheyou7, Shengli and Gangan) (Additional le 9: Table S7 Table S8). The expression of those six genes is illustrated on Fig. 6 and nucleotide sequence identity is shown on Additional le 11: Table S9. ACP and AGL20 were among the above mentioned most in uential genes in the interaction network and their roles were cited earlier. Besides, the four remaining genes were located in regions of overlapping QTLs of ve categories of traits: FRB12 encodes a translation initiation factor which is involved in apoptosis in response to infection from Pseudomonas syringae, LTA2 is involved in embryo formation and development, PP2A modulates protein phosphoregulation in several cellular processes such as growth and developmental processes, hormone and environmental responses, and RLP46 are trans-membrane receptors which are involved in cellular signaling.
ACP4 gene expression showed dissimilarity at T1, with the highest expression in Gangan, followed by No2127, Westar, Zheyou7 and ZS11, while expression of ACP4 in Quinta, Shengli and Tapidor were lower, the other stages displayed comparable pro le.  . However, they shared 100% of identity between them, and with PP2A of other genome, except with Westar PP2A which shared 99% identity with them. Westar PP2A expression level remained low from stage T1 to T4. At last, a peak of high expression level was noticeable in RLP46 of Tapidor and Quinta at T3, while those of Westar, Zheyou7 and ZS11 peaked at T2. Tapidor and Quinta RLP46 shared 100% identity between them, and with Gangan, Westar and ZS11, and 99% with No2127, Shengli and Zheyou7. It was noticeable than in genes Zheyou7 and ZS11 always had the same expression level, even when the nucleotide sequence identity were less than 100%. Those ndings suggested that similar gene sequence might display different expression pro le, and reversely, different gene sequence might show similar expression pro le.

Discussion
The current study aimed to combine QTLs for seed component, seed yield, hormones and disease related traits, which were detected in previous studies, in one physical map in rapeseed, to identify the related candidate genes and to analyze their expression and structural variation in different rapeseed accessions. The same strategy was used to nd region that might control multiple traits of one category, i.e seed oil [19] and seed yield [20]. In those two studies, some regions were suggested to possibly contribute to the improvement of one trait or multiple traits of one category, and some regions were supposed to be stable for one given environment. For example, a region on A1 (2.50-2.99Mb) had overlapping QTLs for plant height, which were from two populations developed in China (Tapidor×Ningyou7 and Express617×V8), so this region might affect the plant height and it is a stable region for Chinese environment [20]. More, QTLs for C16:0, C18:0, C18:1, C18:2, C20:0, and C22:1 were overlapping on C3 (53.75 Mb to 58.29 Mb), thus, this region might control those six traits, simultaneously [19]. The current study was made in higher level because it was not limited to one category of trait as the previous studies, but ve categories which involved all studied traits in rapeseed.
Actually, increasing seed oil and seed yield are among the main focus of researcher on rapeseed, in order to cater the increasing demand of oil. However, usage of rapeseed is not limited as a biomass for oil, but also for multiple purposes, such as protein, carbohydrate and vitamins sources, and many more [8].
Despite the effort in improving seed components and seed yield, rapeseed crops are under attack from various diseases that resulted in huge crop loss. For example, Leptosphaeria maculans causes blackleg disease [26], which has created an economic loss of $900 million per growing season in the world [27]. Despite the fact that resistant cultivars have been developed and cultivated since year 1990s in Canada [28], which has decreased the yield loss of 1% [29]. Abiotic stresses has also caused about 50% yield reduction in major crops [30]. Actually, extensive researches are still undertaken to take total control of those biotic and abiotic disease, via selective breeding. Otherwise, phytohormones play important roles in plant growth and development, such as IAA [31], but also on plant adaptation to assure survival face to the environment uctuation. ABA respond to both of biotic and abiotic stresses [32], which have in uence on one another [33]. Those phytohormones support agronomic trait improvement and response to disease. Therefore, all the ve categories of traits analyzed in the current study are correlated and are pivotal for rapeseed crop improvement.

Dissection of rapeseed genome revealed regions controlling multiple traits
The current study is the rst study to gather all the QTLs of important agronomic and disease related traits discovered in B. napus over 25 years, in order to construct a quantitative genomic map, which is crucial to uncover similarity and difference in QTLs detected from different populations and environments, but also to reveal the regions that might control multiple bene cial traits, simultaneously.
It was obvious that most of the QTLs were found on A rather than C genome (2695 vs 1860 QTLs, respectively). Selection has played important role in improving B. napus. It was reported that C genome rather than A genome, contained extended breeding regions (51.15 Mb on C vs 16.80 Mb on A) which might contribute more to alleles producing elite traits [34]. However, a recent investigation on the origin of B. napus and the genetic loci that contributed to its improvement had revealed that parallel selection of A and C genome had let to seed quality improvement in B. napus [35]. In fact, A genome speci c-selection greatly enhanced disease resistance, oil accumulation and environment adaptation of B. napus during its rst stage of improvement, while C genome had improved developmental traits. This might explain the fact that most of the QTLs of studied traits in the current study could be found on A genome. Particularly, for Asian B. napus varieties, it was reported that they have experienced strong arti cial selection from A genome which contributed to their adaptation following their introduction from Europe [36].
Apart from that, 517 regions were found with overlapping QTLs involving at least two categories of traits. Those regions might be suitable for selection to improve two or more desired traits, simultaneously, for example, to improve both of abiotic stress response, seed component and seed yield (A4:3.07-4.11 Mb).
Several studies have already investigated on co-location of QTLs from different category of traits [37,38,39,40,41,42,43,44,45,46,47,48]. Those studies demonstrated the importance of analyzing multiple traits, at the same time, to target the loci for breeding cultivars with the most advantageous pro le. For example, a study which focused on owering time (FT) and Sclerotinia stem rot resistance (SSR) reported that early FT might increased susceptibility to S. sclerotiorum, and regions of co-location of FT and SSR resistance traits were found which were crucial for breeding early maturing and SSR resistance cultivars. More, four co-localized QTL hotspots of SSR resistance and FT on A2 (0-7.7 Mb), A3 (0.8-7.5 Mb), C2 (0-15.2 Mb), C6 (20.2-36.6 Mb), which were consensual with previous studies [47]. In the current study, QTL of SSR and DIF (FT) were also co-localized in those regions.
Particularly, seed components and seed yield traits often overlapped in this study. In earlier studies, yield traits such as owering time, morphology of root and plant growth environment could affect seed quality traits such as erucic acid, oil, protein and glucosinolate contents [13,49,50,51,52]. In oil crops, QTLs that could have in uence on both seed quality and yield traits had already been discovered in several studies, in a positive or negative way. For instance, oil and protein contents were positively correlated with seed weight in eleven Brassica carinata lines developed in Canada [53]. Zhao et al. found evidence of positive correlation between oil content and seeds per silique while evaluating 282 DH lines from cross between Sollux and Gaoyou (B. napus), and developed in Germany and China [54]. In a study performed by Chen et al. in a DH population derived from cross between high and low oil content B. napus and developed in Canada, oil content and owering time were negatively correlated [55]. In that study, QTLs for oil content, owering time and seed yield were co-localized on a small region of LG7 where alleles of low oil content, early owering time and higher seed yield were found together. However, QTLs for high oil content and early owering time were found in co-location on LG2.
Overlapping QTLs of multiple traits might happen when gene alteration frequencies at nearly linked loci occurs, but also, it might be caused by pleiotropic effect when an appropriate substitution of genes occurs [56]. Also, pleiotropy or/and linked genes might have caused this phenomenon.
Region on the chromosome which was exclusively for QTLs from one population was not found, and those exclusively for one environment was for China only ( xed environment for China). This indicated that those QTLs remained unchanged, despite the variation of population and environment.
Identi ed candidates genes might be pleiotropic or linked genes?
In previous studies, 110 genes in Arabidopsis were identi ed to be involved in oil formation [57] and 439 homologous genes were found in B. napus [19]. More, 425 yield genes which were related to branch number, owering time, maturity time, plant height, pod number, seed number, seed weight, and seed yield in Arabidopsis were identi ed [58] and 1398 homologous genes were detected in B. napus [20]. Dolatabadian et al. found 1344 resistance genes in B. napus [59]. Those genes had relationship with the current studied traits, thus, they were selected to uncover the candidate genes .
A total of 2744 genes related to oil, yield and resistance were found within overlapping QTLs involving two to ve categories of traits. As mentioned above, overlapping QTLs might be caused by pleotropy or linked genes. Pleiotropy is when one gene can control multiple unrelated phenotypic traits [60, 61, 62, 63].
Pleiotropy is largely distributed due to biochemical and developmental systems and it affects development and evolution, and creates correlations between genes and phenotype, and it affects selection and impose the accessibility of the evolution extent [64, 65]. The pleiotropic organization of traits (dominant or epistatic) can be modi ed by selection and inbreeding [66, 67, 68]. Linked genes are genes located closely to each other on the same chromosome and are inherited together during meiosis.
Genes might separately control different phenotype but are found closely located on the same region of a chromosome.
Candidates found within region of overlapping QTLs with ve categories of traits attracted more our attention, since they might be more in uential than the others over multiple traits. A total of 26 candidates were found on ve regions distributed on A6, A9 and C3 chromosomes, and they would be the most recommended in this study, for selection to target multiple traits simultaneously (Fig. 7). They belonged to different families and might have different distinct roles, but the way they act to in uence each other or to affect the traits still need a deep investigation. Functional investigation of each gene would be indispensable to comprehend their in uence over the studied traits, and would reveal whether they were pleiotropic genes or linked genes.
Gene interaction network revealed that 11 genes might have more in uence over the other genes. Genes are responsible for genetic variation of traits [22], and structure and dynamism of genetic regulatory network have impact on quantitative traits [69]. In this study, KAS, ACP, AUX1, CO, FT, PHYA, AGL20 were also identi ed as most in uential genes in our previous studies [19,20]. Despite the number of genes identi ed in this study was far larger than those of the previous study, and genes function were also broader, those eight genes of different functions still had higher in uence over the other genes, indicating that simultaneous control of multiple traits might be affected at different metabolism pathway.
Expression and structural variation analysis of genes in eight rapeseed varieties showed that some genes which had 100% sequence similarity displayed different expression pro le, and some other genes with different sequence showed similar expression pro le. Epigenetic factors might be responsible for this phenomenon, i.e. alteration in gene expression while preserving the primary DNA sequence or genotype [70,71]. Epigenetic mechanisms include DNA methylation which commonly induces gene silencing by blocking transcription binding site, histone modi cation which alters chromatin structure and accessibility of genes for transcription, and non-coding RNA-associated gene silencing which targets mRNA transcripts for destruction induce and preserve epigenetic change [72,73]. Even if epigenetic change is natural and regular, it can also be in uenced by environmental factors [74,75]. In the case of our study, the eight accessions were produced with different genetic and environmental backgrounds, thus difference in gene expression even with similar sequence was expected.

Breeding a super-rapeseed cultivar that meets expectations
The current study uncovered regions, with two, three, four or ve categories of traits which can be chosen and used for marker assisted selection, to produce a customized rapeseed cultivar with desired traits. For instance, stresses imposed by heat is detrimental for seed yield and quality (reviewed by Sehgal et al. [76]). In order to control these traits at once, the region on A3 (11.40-12.47 Mb) could be selected for nemapping, since it contained overlapping QTLs for heat, seed yield and seed composition. Candidate genes included in this region could be cloned and validated through functional analysis, in order to understand the related molecular mechanism.
Another innovation of the current study is the usage of the rapeseed pan-genome of BnPIR to compare gene expression and gene structure of candidate genes. This strategy aimed to comprehend how same genes of different accessions would be expressed, and how their structures are different. This might serve later to explain their functions. Since numerous rapeseed accessions have been sequenced, performing the same study as our current study is now feasible in those other accessions. It would enhance our understanding of rapeseed genome variation.
Besides, compared to rice, rapeseed breeding program needs more effort and innovation. Until now, rapeseed research focuses on QTLs and the studied traits were repetitive. However, rice breeding program already focuses on QTG (quantitative trait gene) and QTN (quantitative trait nucleotide) for improvement of this crop [77]. This effort was made in order to further close the gap between genomic studies and practical breeding, and to facilitate the localization of causative variants of all known traits. A collection of rice varieties with those variation was made and a genome navigation system was established for breeding. Thus, research on rapeseed should switch progressively into those QTG and QTN analyses. Actually, multiple rapeseed accessions are also available and a collection of variation should be implanted for breeding.
Finally, the current study has enhanced our knowledge on rapeseed genome characteristics and diversity.
Co-localized QTLs might have ally or antagonistic effect. For the usage in practical breeding, identi cation of the most favorable alleles combinations which will produce maximum pro ts is still crucial.

Alignment of QTLs on the physical map of Darmor-bzh
Extensive literature inquiry allowed to identify more than 350 papers which reported on GWAS and QTLs analyses in B. napus over the last 25 years (1995-2020). They were manually sorted according to data availability. Research articles with missing information were removed (absence of anking markers, marker sequence or physical position of QTLs on Darmor-bzh). QTLs/GWAS with just one anking marker were kept and given an area of 1cM from the unique marker as loci. 4555 QTLs for seed, yield, hormones and disease related traits were collected from 145 research articles, involving 79 different populations of three different ecotypes and grown in 12 different countries (Table S1). They were aligned in the physical map of Darmor-bzh. Location of QTLs anking markers on the physical map was detected via e-PCR [78,79], and method of alignment was as similar as our previous studies [19,20]. The map was build using Circos software [80].

Identi cation of candidate genes
Genes in B. napus which were reported in three different literatures were selected as basis for the identi cation of candidate genes in the current study: 439 genes were related to oil formation [19], 1398 genes were related to yield traits [20], and 1344 genes were resistance genes [59]. They were aligned into the physical map of Darmor-bzh, and the genes located within overlapping QTLs were identi ed as candidates for the traits.

Construction of gene interaction network
The gene interaction network was predicted using STRING (http://string-db.org/) [80]. Orthologous genes in A. thaliana were used to perform the analysis, because B. napus is still not available on STRING database, and A. thaliana genes have more accurate known functions rather than those of B. rapa and B. oleracea. The genes were clustered using Panther GO-slim biological process (http://pantherdb.org/) [82], and the interaction was visualized with Cytoscape_V3.6.0 [83].
Gene expression and structural variation analyses of candidate genes Identi cation of homologous candidate genes in Gangan, No2127, Quinta, Shengli, Tapidor, Westar, Zheyou7 and ZS11 was made with blast tool in BnPIR database (http://cbi.hzau.edu.cn/bnapus/) [25], using Darmor-bzh gene sequences as probe. In sillico gene expression analysis was made using "gene expression" tool of BnPIR. The identity percentage between gene sequences was calculated using Vector NTI Advance 11.5. Different colors represent different trait. QTLs were arranged from inner to outer circle according to their apparition on the physical map of Darmor-bzh. The map was built using Circos software.   Candidate genes interaction network. The interaction analysis was made with orthologous A. thaliana genes by using STRING (http://string-db.org/) and visualized with Cytoscape_V3.8.2. 1271 nodes and 10101 edges are shown. Eleven categories of genes are displayed according to their GO term enrichment.