Analysis of Genomic Variation in Different Brassica napus Synthetic Allopolyploids

Allopolyploidy is an evolutionarily and mechanistically intriguing process that involves reconciling two or more sets of diverged genomes and their regulatory interactions, resulting in new phenotypes. In this study, we explored genomic variation in eight F2 synthetic B. napus plants using whole-genome sequencing. We found genetic variation in the F2 generation: some variants were shared by all F2 plants, whereas a small number of mutations appeared in only a single F2 plant. Copy number variation (CNV) analysis showed that the A genome mostly underwent copy-number loss, whereas the C genome mostly underwent copy-number gain. In addition, interchromosomal translocations (CTX) occurred in the F2 generation, and their number differed among plants. These results indicate that the F2 generation exhibited genetic variation and that the eight plants differed from one another, which may lay a molecular basis for the unique field performance of the offspring. This study provides a new perspective on genomic variation and trait segregation in the early stages of allopolyploid formation.


Introduction
Polyploidy refers to the condition in which the chromosome numbers of genetically related organisms are multiples of one another [1]. Current evidence suggests that polyploidy is a frequent rather than accidental event, and taxonomists believe that many polyploids arose through multiple independent polyploidization events [2]. The high frequency of unreduced gamete formation in natural angiosperm populations [3] provides a basis for polyploid formation: through hybridization and chromosome doubling it gives rise to allopolyploids (involving interspecific hybridization) and autopolyploids, both of which contribute greatly to angiosperm species diversity [4]. Newly formed polyploids undergo genetic variation and changes in gene expression at an early stage in response to the merger of multiple genomes. These responses take many forms, including chromosomal rearrangement, loss of homologous genes, and changes in gene expression [5-11].
A polyploid is not simply the additive sum of its diploid parents; its gene expression combines additive and non-additive effects of the parental genomes [12,13]. The resulting phenotypic diversity and wide range of variation can give polyploid species strong adaptability.
Based on the "triangle of U" theory (U 1935) and Brassica genome sequencing data, Brassica napus is generally believed to have arisen about 7,500 years ago through natural hybridization between B. rapa and B. oleracea followed by chromosome doubling [14]. Hybrid progeny can also be obtained relatively easily by embryo rescue using B. rapa and B. oleracea as parents. Early generations of synthetic AACC allotetraploids show abundant trait variation, for example in flower size, flowering time, waxiness, and leaf shape and size [15,16]. Genetic and epigenetic changes during allopolyploidization alter gene expression, which in turn gives rise to new phenotypes [15,17-20].
The genome shock caused by hybridization and polyploidization can induce mutations at the genetic level in hybrid progeny, and the resulting genetic variation can directly or indirectly produce new phenotypes and increase the adaptability of the progeny. Rousseau-Gueutin et al. (2017) used the Brassica 60K Infinium SNP array to investigate SNP variation in 30 newly synthesized AACC allotetraploids and found that their genomes were extensively shuffled after allopolyploidization, with only 8.5% and 3.5% of the C and A genomes, respectively, lost over the generations examined. The identified deletions occurred mainly in distal chromosome regions, and the C genome showed a greater degree of variation than the A genome [21]. Wang et al. (2013) resequenced rice introgression lines derived from hybridization and found that introgressive hybridization caused widespread changes in the rice genome, some of which led to important new phenotypes [22].
Little research has addressed genomic variation in newly synthesized AACC allotetraploids. In this study, the two parents of the Chinese cabbage × Chinese kale cross and eight of its F2 plants were resequenced to analyze genomic variation among newly synthesized allopolyploid plants with different traits. We found genetic variation in the F2 generation: some variants were shared by all F2 plants, whereas a small number of mutations appeared in only a single F2 plant. CNV analysis showed that the A genome mostly underwent copy-number loss, whereas the C genome mostly underwent copy-number gain. In addition, interchromosomal translocations (CTX) occurred in the F2 generation, and their number differed among plants. These results indicate that the F2 generation exhibited genetic variation and that the eight plants differed from one another, which may lay a molecular basis for the unique field performance of the offspring.

Plant materials
For this study, we used 10 accessions: the female parent Cai-Xin, the male parent Chinese kale, and eight F2 synthetic allopolyploids (Fig. 1). The materials are the same as those used in our previous studies [23,24].

Whole-genome sequencing
Young leaves (5 cm in length) adjacent to the bud were collected, frozen in liquid nitrogen, and stored at -80°C until extraction. DNA was extracted using the CTAB method. Unamplified, high-molecular-weight, RNase-treated genomic DNA (4-6 μg) was used for whole-genome sequencing (WGS). Libraries were prepared with the TruSeq DNA prep kit and sequenced by Novogene (Beijing, China) on an Illumina HiSeq 2000 to obtain 30× coverage from 2 × 150-bp paired-end reads.
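The 30× target links read count, read length, and genome size through the standard Lander-Waterman estimate, coverage = (number of reads × read length) / genome size. The minimal Python sketch below shows that arithmetic for 2 × 150-bp pairs; the 1 Gb genome size in the example is a hypothetical value chosen for round numbers, not a figure from this study.

```python
import math


def read_pairs_for_coverage(target_coverage: float, genome_size_bp: int,
                            read_length_bp: int = 150) -> int:
    """Estimate paired-end read pairs needed for a target mean coverage.

    Each pair contributes 2 * read_length_bp bases. Duplicates, trimmed
    bases, and unmappable reads are ignored, so this is a lower bound.
    """
    bases_needed = target_coverage * genome_size_bp
    return math.ceil(bases_needed / (2 * read_length_bp))


def mean_coverage(read_pairs: int, genome_size_bp: int,
                  read_length_bp: int = 150) -> float:
    """Inverse calculation: mean coverage from a given number of read pairs."""
    return read_pairs * 2 * read_length_bp / genome_size_bp


# Hypothetical 1 Gb genome, used only to illustrate the arithmetic.
pairs = read_pairs_for_coverage(30, 1_000_000_000)
print(pairs)                                # 100000000
print(mean_coverage(pairs, 1_000_000_000))  # 30.0
```

In practice the delivered clean data usually exceed this lower bound, because filtering and duplicate removal discard part of the raw output.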
Analysis of whole-genome sequencing data
Raw reads were filtered to remove adapter sequences, undetermined (N) bases, and bases of very low sequencing quality, and the resulting clean data were used for all analyses. The B. rapa and B. oleracea genomes were merged into a single reference for the eight F2 plants, and the F2 reads were aligned to this merged reference with BWA. Reads from P1 (Cai-Xin) were aligned to the B. rapa genome, and reads from P2 (Chinese kale) to the B. oleracea genome. SNPs and InDels were called with samtools, copy number variation (CNV) was detected with CNVkit [25], and structural variation (SV) was detected with BreakDancer [26].
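When two assemblies are merged into one reference, sequence names must stay unique, and it helps if each name still records which subgenome it came from. The sketch below is one way to do this under stated assumptions: plain-text FASTA input and hypothetical "Br_"/"Bo_" name prefixes, neither of which is described in the original methods. The merged file would then be indexed (for example with `bwa index`) before alignment.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical prefixes marking the source genome of each sequence.
PREFIXES: Dict[str, str] = {"A": "Br_", "C": "Bo_"}


def prefix_fasta(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield FASTA lines with `prefix` prepended to every sequence name."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def merge_references(rapa_lines: Iterable[str],
                     oleracea_lines: Iterable[str]) -> List[str]:
    """Concatenate B. rapa and B. oleracea FASTA lines into one reference."""
    merged = list(prefix_fasta(rapa_lines, PREFIXES["A"]))
    merged += list(prefix_fasta(oleracea_lines, PREFIXES["C"]))
    names = [l[1:].split()[0] for l in merged if l.startswith(">")]
    if len(names) != len(set(names)):
        raise ValueError("duplicate sequence names after merging")
    return merged


def subgenome_of(seq_name: str) -> str:
    """Map a merged-reference sequence name back to the A or C subgenome."""
    for genome, prefix in PREFIXES.items():
        if seq_name.startswith(prefix):
            return genome
    raise ValueError(f"unrecognized sequence name: {seq_name}")


# Toy input; real use would pass open file handles instead of lists.
merged = merge_references([">A01\n", "ACGT\n"], [">C01\n", "GGCC\n"])
print("".join(merged), end="")
```

Keeping the source tag in the sequence name means later per-subgenome summaries (for example of CNV calls) need only a string check on the chromosome field.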

Data accessibility statement
The resequencing data will be deposited in the GenBank database after the article is published.

Results
Acquisition of clean read data
The statistics of clean reads obtained by filtering out low-quality whole-genome sequencing data are shown in Table 1. The parents and the 8 F2 plants were each sequenced to 30× coverage. The combined data of the two parents are similar in amount to the data of each F2 plant, which provides a sound basis for subsequent analysis. Genomic variation in the F2 plants was analyzed by comparing the F2 data with the combined parental data.

SNP and InDel analysis
SNP and InDel analysis with samtools showed that the number of SNPs in the F2 plants was higher than the combined number in the two parents, whereas the number of InDels was similar to the parental total (Fig 2). Most F2 variants were located in intergenic regions, with a small number in introns and exons. F2 plants had more heterozygous (ALT) and homozygous genotypes than their parents (Fig 2). In addition, the genome-wide distribution of variants (Fig 3) showed frequent mutations in all 8 F2 plants: some were shared by all progeny (red rectangles), whereas a few appeared in only one F2 plant (purple circle). We speculate that these plant-specific variants are related to differential gene expression among individual plants, which in turn affects trait performance.
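Separating variants shared by all F2 plants from those found in only one plant reduces to set operations on per-plant variant calls. The minimal sketch below assumes each variant is keyed by (chromosome, position, ref, alt) and that calls have already been parsed from the samtools output; the plant names and toy variants are hypothetical. It also shows a simple classification of a VCF-style GT field into heterozygous and homozygous genotypes.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def shared_and_private(calls: Dict[str, Set[Variant]]):
    """Return variants present in every plant, and per plant the variants
    that no other plant carries."""
    shared = set.intersection(*calls.values())
    counts = Counter(v for variants in calls.values() for v in variants)
    private = {plant: {v for v in variants if counts[v] == 1}
               for plant, variants in calls.items()}
    return shared, private


def genotype_class(gt: str) -> str:
    """Classify a VCF GT string such as '0/1' or '1|1'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "heterozygous"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


# Hypothetical calls for two of the eight plants.
calls = {
    "AACC1": {("A01", 100, "A", "G"), ("C03", 500, "T", "C")},
    "AACC2": {("A01", 100, "A", "G"), ("C05", 42, "G", "A")},
}
shared, private = shared_and_private(calls)
print(shared)            # {('A01', 100, 'A', 'G')}
print(private["AACC1"])  # {('C03', 500, 'T', 'C')}
```

Keying variants by position and alleles, rather than position alone, keeps two different alternative alleles at the same site from being counted as one shared variant.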

CTX and CNV analysis
Structural variation (SV) was detected with BreakDancer, which reports deletions (DEL), inversions (INV), insertions (INS), intrachromosomal translocations (ITX), and interchromosomal translocations (CTX). The offspring carried a large number of interchromosomal translocations, consistent with previous chromosome observations [27]. Interchromosomal translocations generate genomic variation, which in turn leads to new phenotypes and to trait segregation in the field among newly synthesized AACC allotetraploids. The number of CTX events also differed among individual plants: 46,782, 45,033, 43,651, 45,546, 44,469, 46,006, 44,705, and 46,329 for AACC1 to AACC8, respectively (Fig 4). AACC1 had the most CTX events and AACC3 the fewest, consistent with previous gene expression and small RNA expression results [28]. We speculate that the genetic differences specific to each of the 8 plants may provide the molecular basis for their unique field traits.
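Per-plant CTX counts like those above come from tallying SV calls by type. The sketch below assumes BreakDancer-style tab-separated output in which header lines start with "#" and the seventh column holds the SV type; the column index is an assumption exposed as a parameter, so check it against the header of your BreakDancer version. The dictionary at the end reproduces the CTX counts reported in the text, assuming they are listed in AACC1 to AACC8 order, as the highest and lowest plants named in the text imply.

```python
from collections import Counter
from typing import Iterable

SV_TYPES = ("DEL", "INS", "INV", "ITX", "CTX")


def count_sv_types(lines: Iterable[str], type_column: int = 6) -> Counter:
    """Count SV calls by type in BreakDancer-style tab-separated output.

    Lines starting with '#' are treated as headers; `type_column` is the
    0-based index of the SV type field (assumed to be the 7th column).
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        sv_type = fields[type_column]
        if sv_type in SV_TYPES:
            counts[sv_type] += 1
    return counts


# CTX counts per plant as reported in the text (Fig 4).
CTX_COUNTS = {
    "AACC1": 46782, "AACC2": 45033, "AACC3": 43651, "AACC4": 45546,
    "AACC5": 44469, "AACC6": 46006, "AACC7": 44705, "AACC8": 46329,
}
most = max(CTX_COUNTS, key=CTX_COUNTS.get)
least = min(CTX_COUNTS, key=CTX_COUNTS.get)
print(most, least)                               # AACC1 AACC3
print(CTX_COUNTS[most] - CTX_COUNTS[least])      # 3131
```

The spread between the highest and lowest plant (3,131 events) is small relative to the totals, so comparisons among plants depend on consistent sequencing depth and filtering across samples.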
Copy number variation (CNV) was detected with CNVkit. In all 8 F2 plants, the A genome mostly underwent copy-number loss, whereas the C genome mostly underwent copy-number gain (Fig 4). These results are consistent with the conclusions of our previous transcriptome analysis of differentially expressed genes and activated/silenced genes (unpublished), indicating that the two genomes responded differently to whole-genome duplication (WGD) in the synthetic AACC allotetraploids.
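A per-subgenome loss/gain summary like the one described above can be sketched from segment-level CNV output. The minimal example below assumes CNVkit-style tab-separated segments with `chromosome`, `start`, `end`, and `log2` columns, assumes A-genome chromosome names begin with "A" and C-genome names with "C", and uses hypothetical log2 cutoffs of ±0.3 (not CNVkit defaults and not values from this study). The 1 kb minimum length follows the CNV definition given in the Discussion.

```python
import csv
import io
from collections import defaultdict

LOSS_LOG2 = -0.3   # hypothetical cutoff for copy-number loss
GAIN_LOG2 = 0.3    # hypothetical cutoff for copy-number gain
MIN_LEN_BP = 1000  # CNVs are defined as fragments >= 1 kb


def subgenome(chrom: str) -> str:
    """Assumed naming: A-genome chromosomes start with 'A', C-genome with 'C'."""
    return chrom[0] if chrom[:1] in ("A", "C") else "other"


def summarize_cnv(cns_text: str) -> dict:
    """Total bp called as loss or gain per subgenome from segment rows."""
    totals = defaultdict(lambda: {"loss": 0, "gain": 0})
    for row in csv.DictReader(io.StringIO(cns_text), delimiter="\t"):
        length = int(row["end"]) - int(row["start"])
        if length < MIN_LEN_BP:
            continue  # too short to count as a CNV
        log2 = float(row["log2"])
        if log2 <= LOSS_LOG2:
            kind = "loss"
        elif log2 >= GAIN_LOG2:
            kind = "gain"
        else:
            continue  # near copy-neutral
        totals[subgenome(row["chromosome"])][kind] += length
    return dict(totals)


# Hypothetical segments for one plant.
cns = ("chromosome\tstart\tend\tgene\tlog2\n"
       "A01\t0\t50000\t-\t-0.8\n"
       "A02\t0\t500\t-\t-1.0\n"
       "C01\t1000\t31000\t-\t0.6\n"
       "C02\t0\t20000\t-\t0.05\n")
print(summarize_cnv(cns))
```

Summing affected base pairs, rather than counting segments, keeps a few long events from being outweighed by many short ones when comparing the two subgenomes.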

Discussion
Genomic variation of synthetic AACC allotetraploids
In this study, bioinformatic analysis showed that the hybrid offspring carried many interchromosomal translocations, consistent with previous conclusions. Copy number variations (CNVs) are complex variants arising from the insertion, amplification, or deletion of DNA fragments ≥ 1 kb relative to the reference genome sequence. Our CNV analysis showed that the A genome mostly underwent copy-number loss whereas the C genome mostly underwent copy-number gain, indicating that the two genomes responded differently to WGD. Moreover, both the number of CTX events and the CNV profiles differed among individual plants, consistent with previous transcriptome and small RNA expression analyses [28]. We speculate that plant-specific variation lays a molecular foundation for the unique field traits of each plant. Although the small number of plants makes it difficult to associate genetic variation, small RNA changes, and gene expression with specific traits, this study is a necessary first step. The population should be expanded in future work to link variation with traits, providing a substantial theoretical basis for breeding.

Conclusion
This study explored genomic variation in eight F2 synthetic B. napus plants using resequencing. The results showed that the F2 generation exhibited genetic variation and that the eight plants differed from one another, which may lay a molecular basis for the unique field performance of the offspring. This study provides a new perspective on genomic variation and trait segregation in the early stages of allopolyploid formation.

Figure 3. Variation distribution of parents and offspring. The outermost circle is the parent, and the inner circles are AACC1 to AACC8.