Transcriptome assembly using reference genome
Transcriptome assemblies of RNA sequences from Pekin (Anas platyrhynchos), Muscovy (Cairina moschata), Hinny and Mule ducks were performed as in a previous study [21] by a reference-based approach using the common duck BGI_duck_1.0 assembly as reference genome. Heterologous mapping rate of Muscovy and to a lesser extend of Hinny and Mule transcriptomes (40%, 58%, 59%, respectively) were lower than homologous mapping rate of Pekin transcriptome (71%) on this common duck reference genome, clearly indicating an important bias in heterologous mapping when compared to homologous mapping. Direct mapping of reads on reference genome produced similar results (data not shown). This mapping bias prompted us to test the relevance of a de novo approach.
De novo transcriptome assemblies
After adapters removing, quality trimming and filtering of reads, 545 691 171, 504 591 419, 534 967 150 and 627 575 995 pairs were conserved for Pekin, Muscovy, Hinny and Mule ducks, respectively. Subsequently, 37 073 089, 36 822 075, 43 650 600 and 46 866 094 pairs were selected during normalization. They were assembled de novo using Trinity in four independent “single genetic type” transcriptomes, in a “mixed parental” transcriptome (with Pekin and Muscovy reads) and in a “mixed hybrids” transcriptome (with Hinny and Mule reads). A mixed hybrids transcriptome was also assembled using Oases for comparison. As shown in Table 1, single parental species Pekin or Muscovy transcriptome assemblies with Trinity were of higher quality when compared to single hybrid Hinny or Mule transcriptome assemblies with greater average lengths, higher N50 values and lower numbers of transcripts. The quality of the mixed parental (Pekin+Muscovy) transcriptome assembly was also lower, similar to those of single hybrids transcriptome assemblies. The quality of the mixed hybrids (Hinny+Mule) transcriptome assembly was even lower. When mixed hybrids transcriptome assembly was conducted with Oases, very long transcripts with a great N50 value were produced suspected to be chimeras.
Table 1: De novo assemblies of transcriptomes using Trinity and Oases
Transcriptome
|
Pekin
|
Muscovy
|
Hinny
|
Mule
|
Pekin + Muscovy
|
Hinny + Mule
|
Hinny + Mule
|
Assembler
|
Trinity
|
Trinity
|
Trinity
|
Trinity
|
Trinity
|
Trinity
|
Oases
|
Transcripts
|
491 089
|
481 855
|
631 041
|
661 646
|
808 211
|
1 388 585
|
1 328 985
|
Average length
|
1033
|
1134
|
825
|
838
|
833
|
626
|
2532
|
N50
|
2330
|
2694
|
1441
|
1464
|
1406
|
858
|
5014
|
Pseudo-alignments of sample reads on these de novo transcriptome assemblies and quantification were performed using Kallisto (Fig. 1). Mapping rates of sample reads on single parental species Pekin and Muscovy Trinity assemblies were better than those on hybrids Mule and Hinny assemblies (85.63 ± 1,10 %, 85.15 ± 2.27%, 77.94 ± 2.56 % and 78.82 ± 2,27 %, respectively) (Fig. 1A). Mapping rates on single species transcriptomes were globally better than those on mixed Pekin+Muscovy and Hinny+Mule assemblies (74.24 ± 2.14 and 75.59 ± 1.79), except for Oases assembly that displayed a similar rate but a greater variability across samples (83.43 ± 2.78 %) (Fig. 1B).
Unique meta-transcriptomes were finally assembled with DRAP using single genetic type Pekin, Muscovy, Mule and Hinny assemblies and after filtering at 4 different FPKM thresholds. This filtering on FPKM values resulted in a reduction of transcript number by removing lowly expressed transcripts. For example, 175 572 sequences were conserved after filtering of FPKM > 1 and 64 862 with FPKM > 3, representing 28 6879 845 and 127 395 003 residues, respectively. We chose to keep the transcriptome assembly produced by DRAP after FPKM > 1 filtering for further analyses. This assembly was of high quality, with a N50 value of 2807 bp. Pseudo-alignments of reads on this meta-transcriptome and quantification were performed using Kallisto. As shown on Fig. 2, pseudo-alignment rates of sample reads on DRAP meta-assembly were very similar to pseudo-alignment rates on Pekin Trinity assembly, with low “heterologous biases” and variability across samples (84.56 ± 0.98 % and 85.63 ± 1.10, respectively).
Annotation completeness in terms of gene content of de novo transcriptome assembled with DRAP was assessed with BUSCO for the presence/absence of the conserved eukaryotic single copy orthologous genes. To refer to our previous work [21], completeness of transcriptome assembly using mapping approach with Cufflinks and BGI_duck_1.0 as reference genome was also assessed. As shown in Table 2, DRAP de novo approach produced more orthologues (97.1 % completeness), provided a more complete (0 % missing) and less fragmented (only 3%) catalog of orthologues when compared to the mapping approach using Cufflinks (81.5 %, 7.3 % and 11.2 %, respectively). Same BUSCO analyses were also conducted on BGI_duck_1.0 genome assembly and on two more recent genomes assembled at the chromosome level and expected to be more complete, i.e. CAU_duck1.0 (GCA_002743455.1) and ASM874695v1 (GCA_008746955.1) (Table 2). These three genomes contain more complete and single copy orthologous genes (81.2%, 82.8% and 90.1%, respectively) than de novo transcriptome assembly with DRAP (59.1%) but less complete orthologues in total (82.9 %, 83.5% and 90.8% versus 97.1%) with more genes missing (7.2 %, 7.3% and 6.9% versus 0.0%) suggesting that our RNA sequencing and de novo transcriptome assembly allowed identifying new genes that are missing in genome assemblies.
Table 2: Assembly assessments with BUSCO
Categories
|
DRAP
|
Cufflinks
|
Reference genome
|
BGI_duck_1.0
|
CAU_duck1.0
|
ASM874695v1
|
Complete
|
97.1%
|
81.5%
|
82.9%
|
83.5%
|
90.8%
|
Complete and single-copy
|
59.1%
|
42.9%
|
81.2%
|
82.8%
|
90.1%
|
Complete and duplicated
|
38.0%
|
38.6%
|
1.7%
|
0,70%
|
0.7%
|
Fragmented
|
3.0%
|
11.2%
|
9.9%
|
9.2%
|
2.3%
|
Missing
|
0.0%
|
7.3%
|
7.2%
|
7.3%
|
6.9%
|
Gene expression analyses
Gene expressions after DRAP de novo approach were analyzed with edgeR. Significant (p-value < 0.01) differentially expressed genes (fold change ≥ 2) were determined. In total, 13898 different genes were found up- or down-regulated by overfeeding in the four genetic types (Table 3). Much less differentially expressed genes (DEG) were identified in Pekin than in other genetic types (3749, versus 6167, 8920 and 7696, respectively). Among all 13898 DEG, 903 were identified in the four genetic types. Similar results were found when analyses were conducted with DESeq2 (data not shown).
Table 3: Differentially expressed genes using DRAP de novo approach
DEG
|
Pekin
|
Muscovy
|
Mule
|
Hinny
|
common
|
up
|
2281
|
3450
|
4907
|
3901
|
539
|
down
|
1468
|
2717
|
4013
|
3795
|
364
|
all
|
3749
|
6167
|
8920
|
7696
|
903
|
Expression levels of DEG were analyzed by principal component analysis (PCA) and hierarchical clustering (HC) to cluster samples according to similarities in gene expression. As shown in Fig. 3A, the first principal component (PC 1) of PCA summarized 50% of the whole variability and discriminated samples according to genetic type, pure species being extreme and hybrids intermediate. The second principal component (PC2) summarized 16% of the whole variability and discriminated samples according to feeding. The cluster corresponding to overfed Pekin ducks appeared more dispersed than the other clusters. Two clusters were defined for Mule and Hinny samples according only to feeding (ad libitum and overfed) without any distinction according to genetic type. When expressions of DEG were analyzed by HC (Fig. 3B), two clusters were first defined: one including all Pekin samples, whatever the feeding was, and the other including all other samples. This second cluster included two groups. One corresponded to ad libitum and overfed Muscovy samples. The second included two groups: on one hand ad libitum and on the other hand overfed mule and hinny hybrid samples as was observed in PCA. Finally, 4 different clusters were defined again according first to genetic type: a Pekin cluster more distant from the 3 others and including ad libitum and overfed samples; a Muscovy cluster also including ad libitum and overfed samples; a fed ad libitum Hinny and Mule cluster; and an overfed Hinny and Mule cluster. As observed in PCA, these two latter clusters suggest that Hinnies and Mules in the same feeding status cannot be distinguished according to differential gene expression.
The presence of down- and up-regulated DEG in the four genetic types was shown with Venn diagrams (Fig. 4A). Few down- and up-regulated DEG (364 and 539, respectively) were identified in the four genetic types. However, fold change in expressions of these common DEG were very different according to genetic type (Fig. 4B). DEG expression levels in Pekin were more different from the three other genetic types with less down- and up-regulated DEG visualized, suggesting that DEG responses to overfeeding are less important in Pekin ducks. Conversely, many DEG were not found in each of the 4 genetic types or specifically found in one genetic type only (821, 2171, 1281, and 2125 in Pekin, Muscovy, Hinny and Mule ducks, respectively) indicating genetic type effects (Fig. 4A). Some examples taken at random of such feeding and genetic type effects interactions are shown in additional file 1. They illustrate different types of response to overfeeding (fold change and/or up and down regulation) between the 4 genetic types. Hybrids shared the highest number of DEG with 2677 down- and 3061 up-regulated DEG found in both mule and hinny ducks. Only two genes were found with interaction, i.e. down-regulated in one hybrid and up-regulated in the other hybrid. These results indicate that few differences were observed in response to overfeeding between the two hybrids or in other words few genetic type effects and therefore few interactions with feeding effect. For comparison, less down- and up-regulated DEG were found in both mule and Muscovy ducks (1640 and 1936, respectively), even less in both mule and Pekin ducks (956 and 1611, respectively) and only 415 down- and 598 up- regulated DEG in both Pekin and Muscovy ducks indicating more differences in feeding effect between the two duck “pure” species.
Functional enrichments
Functional enrichments in GOBP associated to DEG identified after DRAP de novo method were analyzed and compared to those identified after reference based method with BGI_duck_1.0 genome as conducted previously [21]. Many GOBP enrichments (535) were found whatever the method used (additional files 2 and 3A). However, some differences were observed between methods as evidenced with enrichment annotation profiles (Fig. 5). Enrichments were less important with de novo method. Difference between Pekin and the 3 other genetic types was more important with de novo method and many GOBP enrichments found with reference based method (744) were not found with DRAP method (additional file 3C). Interestingly, many GOBP were found enriched with DRAP method (1257), but most of them corresponded to few DEG (additional file 3B) and/or were found in 1 or 2 genotypes only (additional file 2).
As expected, most of the enriched biological processes found with the 2 methods were related to metabolism of lipids but also to many other processes related to mitosis and cell cycle, transmembrane transport, ion homeostasis and inflammatory response to cite the more enriched (additional file 2). To better characterize lipid metabolism, we focused on GOBP terms including lipid or fatty acid words. As shown in Fig. 6, metabolism and regulation of fatty acids and neutral lipids, lipid signaling and response to lipid biological processes were enriched in Muscovy ducks and the two hybrids and much less or not in Pekin ducks. Again, difference between Pekin and the 3 other genetic types was more important with de novo method.