Acquisition of clean read data
Table 1 summarizes the clean reads obtained from whole-genome sequencing after low-quality data were filtered out. The parents and the 8 F2 plants were each sequenced to 30× depth. The combined data of the two parents are similar in size to the data of each of the 8 F2 plants, which lays the foundation for the subsequent analyses. The data of each F2 plant were compared with the combined parental data to analyze genomic variation in the F2 generation.
Table 1 Number of clean reads from parents and offspring
| Sample | AACC1 | AACC2 | AACC3 | AACC4 | AACC5 | AACC6 | AACC7 | AACC8 | P2 | P1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Clean reads | 3.29E+10 | 3.19E+10 | 3.54E+10 | 3.00E+10 | 3.66E+10 | 3.40E+10 | 4.27E+10 | 3.60E+10 | 1.81E+10 | 1.35E+10 |
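The comparison between the parental sum and the F2 plants can be checked directly against Table 1. The Python sketch below transcribes the Table 1 values, adds the two parents together and expresses each F2 plant relative to that sum. It also includes a hypothetical mean-depth helper (total sequenced bases divided by genome size), since a figure such as 30× is a depth estimate of this kind; the helper needs base counts, so read counts would first have to be multiplied by read length, and the genome size is an input you would have to supply.

```python
# Values transcribed from Table 1 (row "clean reads").
CLEAN_DATA = {
    "AACC1": 3.29e10, "AACC2": 3.19e10, "AACC3": 3.54e10, "AACC4": 3.00e10,
    "AACC5": 3.66e10, "AACC6": 3.40e10, "AACC7": 4.27e10, "AACC8": 3.60e10,
    "P2": 1.81e10, "P1": 1.35e10,
}


def parental_sum(data: dict[str, float]) -> float:
    """Combined data of the two parents (P1 + P2)."""
    return data["P1"] + data["P2"]


def f2_to_parent_ratios(data: dict[str, float]) -> dict[str, float]:
    """Data of each F2 plant divided by the combined parental data."""
    total = parental_sum(data)
    return {name: value / total for name, value in data.items() if name.startswith("AACC")}


def mean_depth(total_bases: float, genome_size_bp: float) -> float:
    """Hypothetical helper: mean coverage depth = sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


if __name__ == "__main__":
    print(f"P1 + P2 = {parental_sum(CLEAN_DATA):.3g}")
    for name, ratio in f2_to_parent_ratios(CLEAN_DATA).items():
        print(f"{name}: {ratio:.2f} x parental sum")
```

With these values the parental sum is 3.16E+10, and the F2 plants range from about 0.95 (AACC4) to 1.35 (AACC7) times that sum.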
Sequence alignment
The clean reads of the parents and F2 plants were aligned to the reference genome. The mapping rate exceeded 90% for both parents and all F2 plants (Table 2), indicating that most of the data were usable. The proportion of properly paired reads was lower, ranging from 83.03% (AACC5) to 94.02% (P2).
Table 2 Mapping rates of clean reads
| Sample | P1 | P2 | AACC1 | AACC2 | AACC3 | AACC4 | AACC5 | AACC6 | AACC7 | AACC8 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mapped | 98.11% | 99.02% | 98.63% | 98.67% | 98.70% | 97.92% | 90.85% | 98.73% | 96.26% | 98.71% |
| Properly paired | 93.16% | 94.02% | 91.29% | 90.68% | 90.68% | 90.62% | 83.03% | 91.08% | 88.22% | 91.36% |
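The text does not state which tool produced the "mapped" and "properly paired" rates; both labels match lines of `samtools flagstat` output, so the Python sketch below assumes that source. It extracts the two percentages from flagstat text and, using the Table 2 values, lists the samples whose rates fall below the 90% cut-off quoted in the text. The function and variable names are illustrative.

```python
import re

# Matches the QC-passed count and percentage on the "mapped" and
# "properly paired" lines of `samtools flagstat` text output, e.g.
#   "9811 + 0 mapped (98.11% : N/A)"
# "primary mapped" lines written by newer samtools versions do not match.
FLAGSTAT_RATE = re.compile(r"^\d+ \+ \d+ (mapped|properly paired) \(([\d.]+)%")


def parse_flagstat(text: str) -> dict[str, float]:
    """Return {"mapped": pct, "properly paired": pct} from flagstat text."""
    rates = {}
    for line in text.splitlines():
        match = FLAGSTAT_RATE.match(line.strip())
        if match:
            rates[match.group(1)] = float(match.group(2))
    return rates


# Percentages transcribed from Table 2: sample -> (mapped, properly paired).
TABLE2 = {
    "P1": (98.11, 93.16), "P2": (99.02, 94.02),
    "AACC1": (98.63, 91.29), "AACC2": (98.67, 90.68),
    "AACC3": (98.70, 90.68), "AACC4": (97.92, 90.62),
    "AACC5": (90.85, 83.03), "AACC6": (98.73, 91.08),
    "AACC7": (96.26, 88.22), "AACC8": (98.71, 91.36),
}


def below_cutoff(table: dict[str, tuple[float, float]], column: int, cutoff: float = 90.0) -> list[str]:
    """Samples whose rate in `column` (0 = mapped, 1 = properly paired) is below `cutoff`."""
    return sorted(name for name, rates in table.items() if rates[column] < cutoff)
```

For Table 2, `below_cutoff(TABLE2, 0)` returns an empty list (every sample maps above 90%), while `below_cutoff(TABLE2, 1)` returns `['AACC5', 'AACC7']`, the two plants with the lowest properly paired rates.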
SNP and InDel analysis
SNPs and InDels were called with samtools. The number of SNPs in the F2 plants was higher than the sum for the two parents, whereas the number of InDels in the F2 plants was similar to the parental sum (Fig 2). Most F2 variants were located in intergenic regions, and a small number were located in introns and exons. The F2 plants carried more heterozygous alternative (ALT) genotypes and more homozygous genotypes than their parents (Fig 2). In addition, the genome-wide distribution of variants showed frequent mutations in all 8 F2 plants: some mutations were shared by all progeny (red rectangles), whereas a few occurred in only a single F2 plant (purple circles) (Fig 3). This phenomenon is speculated to be related to differential gene expression among individual plants, which in turn affects trait performance.
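SNP/InDel totals and genotype classes of the kind compared above can be tallied from a VCF file produced by a samtools-based calling workflow. The Python sketch below is illustrative only and is not the authors' pipeline: it assumes a tab-separated VCF with a `GT` field, treats any REF/ALT length difference as an InDel, and does not handle symbolic alleles such as `<DEL>` or `*`.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Simplified: InDel if any ALT allele differs in length from REF, SNP if all are single bases."""
    alts = alt.split(",")
    if any(len(a) != len(ref) for a in alts):
        return "InDel"
    return "SNP" if len(ref) == 1 else "MNP"


def classify_genotype(gt: str) -> str:
    """Label a GT value such as 0/1, 1|1 or ./. as het, hom_ref, hom_alt or missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def summarize_vcf(lines, sample_index: int = 0) -> tuple[Counter, Counter]:
    """Count variant classes and genotype classes for one sample column."""
    variant_counts, genotype_counts = Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        variant_counts[classify_variant(fields[3], fields[4])] += 1
        fmt_keys = fields[8].split(":")
        sample_values = fields[9 + sample_index].split(":")
        genotype_counts[classify_genotype(sample_values[fmt_keys.index("GT")])] += 1
    return variant_counts, genotype_counts
```

To compare an F2 plant with the parental sum, one could summarize each file separately and add the two parental `Counter` objects with `+`.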
CTX and CNV analysis
Structural variation (SV) was detected with BreakDancer. SV types include deletion (DEL), inversion (INV), insertion (INS), intrachromosomal translocation (ITX) and interchromosomal translocation (CTX). A large number of CTX events were found in the offspring, consistent with previous chromosome observations [27]. Inter-chromosomal translocations generate genomic variation, which in turn leads to new phenotypes and to trait segregation in the field in the newly synthesized AACC heterotetraploid hybrids. The number of CTX events also differed among individual plants: 46,782, 45,033, 43,651, 45,546, 44,469, 46,006, 44,705 and 46,329 in AACC1 to AACC8, respectively (Fig 4). AACC1 had the most CTX events and AACC3 the fewest, which is consistent with previous gene expression and small RNA expression results [28]. It is speculated that these plant-specific genetic differences may provide the molecular basis for the distinct field traits of the 8 plants.
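BreakDancer writes one structural variant per line. The Python sketch below assumes its usual tab-separated layout, in which the seventh column holds the SV type (DEL, INS, INV, ITX or CTX), and tallies SV types per file. It then applies the same max/min comparison to the per-plant CTX counts reported above. The column index and function name are assumptions made for illustration.

```python
from collections import Counter

# Assumed BreakDancer column layout:
# Chr1 Pos1 Orientation1 Chr2 Pos2 Orientation2 Type Size Score num_Reads ...
TYPE_COLUMN = 6  # zero-based index of the "Type" column


def count_sv_types(lines) -> Counter:
    """Count SV calls per type (DEL, INS, INV, ITX, CTX) in BreakDancer-style output."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        counts[fields[TYPE_COLUMN]] += 1
    return counts


# Per-plant CTX counts as reported in the text (Fig 4).
CTX_COUNTS = {
    "AACC1": 46782, "AACC2": 45033, "AACC3": 43651, "AACC4": 45546,
    "AACC5": 44469, "AACC6": 46006, "AACC7": 44705, "AACC8": 46329,
}

most_ctx = max(CTX_COUNTS, key=CTX_COUNTS.get)   # plant with the most CTX events
least_ctx = min(CTX_COUNTS, key=CTX_COUNTS.get)  # plant with the fewest CTX events
```

Here `most_ctx` is `AACC1` and `least_ctx` is `AACC3`, matching the text; the difference between them is 3,131 CTX events.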
Copy number variation (CNV) was detected with CNVkit. In the 8 F2 plants, copy number losses occurred mostly in the AA genome, whereas the CC genome was largely retained (Fig 4). These results are consistent with the conclusions of a previous transcriptome analysis of differentially expressed genes and activated/silenced genes (unpublished), indicating that the two genomes responded differently to whole-genome duplication (WGD) in the synthetic AACC heterotetraploid hybrids.
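CNVkit reports segmented log2 copy ratios, for example in tab-separated `.cns` files with `chromosome`, `start`, `end` and `log2` columns. The Python sketch below shows one simple way to summarize such segments by subgenome for an AA-versus-CC comparison. The ±0.3 log2 cut-offs and the rule that A- and C-subgenome chromosomes are named with an "A" or "C" prefix (e.g. A01, C09) are hypothetical; CNVkit's own `call` command is the usual route to integer copy-number calls, and this is not the authors' procedure.

```python
import csv
import io

LOSS_LOG2 = -0.3  # hypothetical cut-off for a copy-number loss
GAIN_LOG2 = 0.3   # hypothetical cut-off for a copy-number gain


def subgenome(chrom: str) -> str:
    """Hypothetical naming rule: 'A01'/'chrA01' -> 'A', 'C09' -> 'C', anything else -> 'other'."""
    name = chrom.removeprefix("chr").upper()
    return name[0] if name[:1] in ("A", "C") else "other"


def cnv_state(log2: float) -> str:
    """Classify a segment's log2 copy ratio with the hypothetical cut-offs above."""
    if log2 <= LOSS_LOG2:
        return "loss"
    if log2 >= GAIN_LOG2:
        return "gain"
    return "neutral"


def summarize_segments(cns_text: str) -> dict[str, dict[str, int]]:
    """Sum segment lengths (bp) per subgenome and CNV state from tab-separated segment text."""
    summary: dict[str, dict[str, int]] = {}
    for row in csv.DictReader(io.StringIO(cns_text), delimiter="\t"):
        group = summary.setdefault(
            subgenome(row["chromosome"]), {"loss": 0, "neutral": 0, "gain": 0}
        )
        group[cnv_state(float(row["log2"]))] += int(row["end"]) - int(row["start"])
    return summary
```

Comparing the `loss` totals of the A and C groups gives a length-weighted version of the AA-versus-CC contrast described above.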