Transcriptome sequencing and reads mapping
High-throughput transcriptome sequencing was used to study and compare the transcript differences in natural allotetraploid B. napus relative to its diploid progenitors. The RNA samples from stems, leaves, flowers, and siliques of B. napus and its diploid progenitors (Figure 1) were subjected to paired-end RNA sequencing, each with three biological replicates. After filtering and quality control of the raw reads, a total of 1529.93 million (M) clean reads from 36 RNA libraries were obtained (approximately 42.5 M reads per library, Table 1). An average of 85.5%, 63.9%, and 63.7% of the reads from the samples of B. rapa, B. oleracea, and B. napus were uniquely mapped to the A genome [47], C genome [46], and the integrated A-C genome, respectively (Table 1). Gene expression correlations between the three biological replicates was high, and the Pearson correlation coefficient (R) between them mostly exceeded 0.9 (Figure 2). The transcripts per million reads (TPM) method were used to normalize the gene expression levels in our study. The specific statistics of expressed gene numbers in all samples were shown in Table 2. In total, 41,914, 32,204 and 73,012 genes were detected to express in the four tissues of B. rapa, B. oleracea and B. napus, respectively. Among the 73,012 genes expressed in B. napus, 40,831 genes were derived from the A subgenome, and 32,181 genes from the C subgenome.
Differentially expressed genes (DEGs) between B. napus and its diploid progenitors
To study the differences of gene expression between natural allotetraploid B. napus (AnAnCnCn) and its diploid progenitors (ArAr and CoCo), all DEGs in stems, leaves, flowers, and siliques were identified using DESeq2, with and padj ≤ 0.001. Compared with diploid progenitors B. rapa, a total of 17,463 DEGs were identified in four tissues, including 10,084 in stems, 6,614 in leaves, 8,557 in flowers and 8,246 in siliques (Figure 3). While compared with diploid progenitors B. oleracea, 11,930 DEGs were identified in four tissues, including 5,233 in stems, 5,025 in leaves, 6,708 in flowers and 5,122 in siliques (Figure 3). In total, the DEGs between allotetraploid B. napus and B. rapa were approximately 1.5 times of that between B. napus and B. oleracea, among these, the most different tissue was stems and the DEGs were about 1.9 times. In allotetraploid B. napus, more DEGs from both A and C subgenomes were down-regulated relative to those in its two diploid progenitors, and an average of 53% (9,266 of 17,463) and 52.9% (6,312 of 11,930) of DEGs in the A and C subgenomes were down-regulated, respectively (Figure 2).
Functional classifications of the DEGs
To further explore the gene functional differences between B. napus and its two diploid progenitors in selected four tissues, all genes from A and C genomes were functionally annotated using eggNOG-mapper (Huerta-Cepas et al. 2017). A total of 91% (42,103 of 46,250) and 91.8% (42,017 of 45,758) genes from A and C genomes, were annotated, respectively. Moreover, 54.8% (23,083 of 42,103) and 57.4% (24,102 of 42,017) genes from A and C genomes were annotated to at least one GO term, respectively. Then, the GO functional categories of all DEGs between B. napus and its diploid progenitors were investigated, using the GO functional classification analysis (WEGO). A total of 55 enriched GO terms were identified among DEGs, including three categories: biological process (31 GO terms), molecular function (8 GO terms), and cellular component (16 GO terms) (Figure 4). In the DEGs between B. napus and the diploid progenitors B. rapa, there were four secondary dominant groups, including growth, membrane-enclosed lumen, membrane part, and organelle part (Figure 4). While in the DEGs between B. napus and B. oleracea, there were seven secondary dominant groups, such as immune system process, response to stimulus, extracellular region, and organelle part (Figure 4).
KEGG analysis of the DEGs
To identify the metabolic or signal transduction pathways involved in DEGs, all DEGs were mapped to reference pathways in the KEGG database. A total of 12 pathways were significantly enriched (qvalue ≤ 0.05) in DEGs between B. napus and its diploid progenitors B. rapa, such as photosynthesis (ko00195), pentose phosphate pathway (ko00030), and circadian rhythm-plant (ko04712). While, only 4 pathways were significantly enriched in DEGs between B. napus and B. oleracea, including plant-pathogen interaction (ko04626), photosynthesis (ko00195), photosynthesis-antenna proteins (ko00196), and carbon fixation in photosynthetic organisms (ko00710). Three pathways related to photosynthesis (ko00195, ko00196, ko00710) were enriched in both two comparison groups (AnAnCnCn vs ArAr and AnAnCnCn vs CoCo). Statistics shown that these three pathways were down-regulated in all four tissues of B. napus relative to its diploid progenitors, suggesting that the photosynthesis of natural B. napus was not as active as that of its two diploid progenitors. It was obvious that DEGs between B. napus and its diploid progenitors were involved in many plant physiological processes, which can provide an evidence for the morphological and physiological differences between them. The specific statistics of KEGG enrichment in every comparison groups were shown in Table S1. Moreover, the sum of the FPKM values of the differential genes involved in each KEGG pathway was calculated, and the top 5 up- and down-regulated pathways were shown in Table S2.
Homoeolog expression bias in natural allotetraploid B. napus
Previous studies have shown that the duplicated gene pairs in allotetraploids might display homoeolog expression bias, where bias refers to the preferential and high expression of one homoeolog relative to the other homoeolog [14, 48-50]. To study the homoeolog expression bias in the natural allotetraploid B. napus, the expression levels of 27609 homologous gene pairs from B. rapa and B. oleracea were analyzed. These homologous gene pairs were obtained using a perl script. Then, DESeq2 was used to analyze whether there were expression differences between these gene pairs. Homologous gene pairs that met the condition of and padj ≤ 0.001 were considered to be differentially expressed gene pairs. Compared with the diploid progenitors, the homologous gene pairs between the two subgenomes of B. napus were divided into three categories, including the parental condition, no bias, and novel bias in progeny (Figure 5). As shown in Figure 5, the overwhelming majority of gene pairs (an average of 86.7%) from the two subgenomes of natural allotetraploid B. napus maintained their expression pattern in two diploid progenitors, and this feature was most obvious in leaves (92%) and least obvious in flowers (82.2%). Moreover, an average of 4% gene pairs that already had expression bias in the two diploid progenitors reverted to no bias expression in B. napus, and only 0.8% homologous gene pairs in leaves of B. napus had this reversion (Figure 5). In addition, an average of 9.2% homologous gene pairs displayed novel bias in B. napus and this phenomenon was most common in flowers (13.6%). It was worth noting that an extremely unbalanced biased expression was observed in the natural allotetraploid B. napus, which had a preference toward the A-subgenome (A-bias vs C-bias = 78.1% vs 15.4%, Figure 5).
Expression level dominance (ELD) in the natural allotetraploid B. napus
In addition to homoeolog expression bias in gene pairs, ELD has been frequently described in studies of allopolyploids. Homoeolog expression bias mainly focused on the relative expression levels of the individual homologs, whereas ELD primarily focused on the total expression levels of homologous gene pairs in allopolyploids compared to their relative expression levels in its two parents [14, 48, 50-52]. To study additivity, transgressive expression and ELD in the four tissues of the natural allotetraploid B. napus, the homologous gene pairs were classified into 12 categories by comparing the total expression levels of the gene pairs in B. napus relative to its two diploid progenitors [48]. Overall, an average of 48% homologous gene pairs exhibited additivity expression (categories I and XII), the remaining 29.7% and 22.3% of gene pairs showed ELD (categories II, XI, IV and IX) and transgressive expression (categories III, VII, X, V, VI and VIII) respectively in natural allotetraploid B. napus (Figure 6). More A-expression level dominance (ELD-A) homologous gene pairs (categories IV and IX, an average of 15.7%) were observed in B. napus than C-expression level dominance (ELD-C) gene pairs (categories II and XI, an average of 14%, Figure 6). Therefore, the expression of gene pairs in the natural allotetraploid B. napus displayed ELD bias toward B. rapa. In addition, more gene pairs showed obvious transgressive up-regulation expression (categories V, VI and VIII, an average of 17.6%) rather than down-regulation (categories III, VII and X, an average of 4.7%, Figure 5).
Homologous gene pairs that exhibited additivity expression were the most in leaves (68.9%), and the least in flowers (34%). Gene pairs showed more ELD-A than ELD-C in stems and siliques (13% ELD-A vs 10.8% ELD-C in stems and 25% ELD-A vs 17.6% ELD-C in siliques); while gene pairs displayed more ELD-C than ELD-A in leaves and flowers (3.6% ELD-A vs 5.8% ELD-C in leaves and 21.1% ELD-A vs 22% ELD-C in flowers, Figure 6). Tissues have more ELD-A genes (stems and siliques) were indeed closer to B. rapa in morphology (Figure 1) in B. napus, and it is speculated that ELD-A genes were involved in the regulation of plant tissue morphogenesis. The GO enrichment analysis of ELD-A/-C genes in each tissue was shown in Figure S1 and the results indicated that they were mainly enriched in two part (cellular component and biological process). In addition, the proportion of gene pair with transgressive up-regulation expression was higher than that of gene pair with down-regulation in all four tissues of B. napus.
The relationship between individual homoeolog expression level and ELD
Individual homoeolog expression levels in four tissues of B. napus relative to it diploid progenitors were investigated to explain the ELD phenomenon. More modifications were observed in the A homoeolog (88, 16, 293, and 169 genes in stems, leaves, flowers, and siliques, respectively) than the C homoeolog (58, 13, 185, and 125 genes) (Table 3). The dominant progenitor has a higher expression level than the non-dominant progenitor in categories II and IV (Figure 6), and these could be explained by the up-regulation of at least one homolog from dominant or non-dominant progenitor (Figure 7A & 7C, Table 3). In contrast, the dominant progenitor has a lower expression level than the non-dominant progenitor in categories XI and IX (Figure 6), and these could be explained by the down-regulation of at least one homolog from dominant or non-dominant progenitor (Figure 7B & 7D, Table 3). Statistics showed that the up-/down-regulation of homologs from non-dominant progenitor always exceeded the homologs from dominant progenitor, except for the category IX in flowers (Table 3).