The aim of this study was to analyze and compare gene expressions in four different duck genetic types, common Pekin and Muscovy duck species and their reciprocal mule and hinny hybrids using a relevant approach. As mentioned by Moreton et al. [14], choice for using reference-based or de novo transcriptome assembly approach for gene expression analyses is generally based on the question whether or not a reference genome is available. Reference-based strategy involves mapping of RNA sequences on a reference genome to assemble transcripts and count reads. This strategy is generally conducted when reference genome in the species of interest or in a closely related species is available. For examples, this strategy was conducted in sparrows using zebra finch reference genome [19] or in red deer using cow reference genome [20]. Otherwise, when a reference genome from the species or a closely related species is not available, de novo assembly approach is generally recommended and many examples can be found in the literature. However, when gene expression from different species are to be analyzed and compared, the question of strategy is more complex and cannot be taken up with the question of availability of a reference-genome for one of these species. Few relevant results for such a situation have been published. Reference-based strategy was applied for transcriptome analysis of horse, donkey and their hybrids using the horse reference genome [18]. Conversely, de novo assembly approach was conducted for transcriptome analyses in interspecific cyprinidae [23] and brassica [24] hybrids. In support of these studies, neither strategy seemed a priori better than the other. Therefore, we decided to test both approach in our study.
Reference-based approach was first conducted using the common duck (Anas platyrhynchos) genome as reference for the four genetic types. As indicated, mapping rates were very different when homologous and heterologous mapping were considered (71% with Pekin duck reads and 41% with Muscovy duck reads). In the study of Wang et al. [18], homologous mapping rate of horse reads on horse reference genome (61.70%) was very similar to heterologous mapping rate of donkey reads on horse reference genome (62.07%). The reason of this difference of heterologous mapping rate between equids and ducks was not directly investigated but we can speculate that it is partly due to the difference of evolution distance between equids and duck species (estimated time according to http://www.timetree.org/, [43], 7.7 and 22.8 MYA, respectively), but also to different mapping parameters in the two studies. In our study, these parameters were set by defaults and were not optimized for heterologous mapping. As described in a previous study [28], this bias of mapping was not a problem to analyze differences in gene response to feeding in ducks from the same genetic type. It was however too important for gene expression comparisons between the four genetic types and to analyze genetic effect in gene expression.
Therefore, de novo transcriptome assembly approaches were then conducted. Many tools are available for de novo transcriptome assembly, with different efficiencies according to species assembled [44]. We chose to use and compare two of them, Trinity [21, 29] and Oases [22]. As shown in Table 1 and Fig. 1A, Trinity “single” transcriptome assemblies from “pure” Pekin or Muscovy duck sequences were very similar and better in terms of length and alignments of reads than assemblies from hinny and mule hybrid sequences. This result is probably due to the fact that in hybrids transcripts are expressed from two different genomes making assembly less efficient according to the presence of some polymorphisms in transcripts from the same orthologs between the two species. This was also observed when assemblies were performed with sequences from two different species (Fig. 1B). We have also shown that more reads were aligned on these de novo assembled transcriptomes than on common duck reference genome. Furthermore, homologous mapping of Pekin reads on de novo Pekin transcriptome assembly (around 85%) was more efficient than that on reference genome (71%). We can speculate that some reads correspond to genes that are missing in the reference genome, corresponding to new or misassembled genes, or to genes that are in unknown regions, i.e. unassembled and unassigned regions. Thus, these reads are not mapped on reference genome but are assembled together in transcripts by de novo approach and map on theses transcripts.
As previously said, mapping of reads on transcriptomes assembled from hybrids or two genetic types were less efficient, whatever the assembler Trinity or Oases used, indicating that these tools are not well suited for “complex” transcriptome assembly. Interestingly, Cabau et al. have described a de novo RNA-Seq Assembly Pipeline (DRAP) expected to improve Trinity and Oases RNA-Seq assemblies [27]. This prompted us to test DRAP, not only to improve “single” genetic type assemblies but also to perform a meta-transcriptome assembly of transcriptomes from the four duck genetic types. As shown in Fig. 2, this meta-transcriptome assembly with DRAP was very efficient, with a mapping rate of sample reads similar to the best rates previously observed, i.e. homologous mapping rate of Pekin or Muscovy duck reads on de novo assembled Pekin or Muscovy transcriptomes. Further analysis with BUSCO [33] confirmed the quality of this meta-transcriptome assembled with DRAP (Table 2). As stated on the dedicated website, BUSCO provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs. In our study, more than 97% of single-copy orthologs were assembled by DRAP as complete (i.e. full expected transcript length), 3% were fragmented (assembled with a shorter length than expected) and finally no (0%) were missing. With reference-based approach using Cufflinks, only 81.5% single-copy orthologs were assembled to full length, 11.2% were fragmented and 7.3% were missing. This clearly indicates that de novo assembly of meta-transcriptome using DRAP is of higher quality when compared to assembly using reference-based approach. Again, this is probably due to the missing of genes in the reference genome. Moreover, when BUSCO was applied to reference genome, proportions of complete and missing orthologs were similar (82.9% and 7.2%, respectively) to those found after reference-based approach for transcriptome assembly using Cufflinks. It is interesting to note that lower proportions of complete and single copy orthologs were found in DRAP and Cufflinks assembled transcriptomes (59.1% and 42.9%) than in reference genome (81.2%). The fact that most of orthologs were complete and as expected in a single copy in the genome and not in transcriptomes whatever the approach used is probably the result of the presence of only one isoform for one orthologs in the genome (by definition because BUSCO applies only on single-copy orthologs in related species) but of multiple alternative transcripts from one single-copy orthologs making assembly in one transcript more tricky. This clearly indicates that development of methods to improve transcriptome assembly and reduce duplication and more generally redundancy remains a concern today. Nevertheless, our results show that the de novo approach with DRAP is more relevant than the reference-based approach to assemble a meta-transcriptome from reads of the four duck genetic types.
Gene expression analyses and identification of differentially expressed genes was then conducted using this meta-transcriptome as reference, assuming that mapping bias were substantially reduced when compared to our previous study using a reference based approach [28] and allowed us to analyze differentially expressed genes within and between genetic types. Many genes were found up-and down-regulated by overfeeding and some of them (903) were found in the four genetic types. These results indicate some similarities but also some genetic type-specific responses to overfeeding. This was further confirmed by principal component analyses and hierarchical clustering. Different conclusions can be drawn from these results. First, responses to overfeeding in hinny and mule hybrids were very similar and indicate that expectation that hinnies are not used for fatty liver production due to a different and lower response to overfeeding is wrong. In fact, the reason essentially lies in the difficulty to produce viable hinnies due to a lower efficiency of artificial insemination of Muscovy females and their lower fertility when compared to Pekin females. Second important conclusion is that many genes are differentially expressed in the four genetic types. This suggests that response to overfeeding and consequently development of hepatic steatosis is a complex trait involving many genes and functions and is not well described by the expression of few candidate genes [6–9], even if expressions of these genes are well correlated to fatty liver weight [6]. Third, although some differentially expressed genes were found in the four genetic types, some other were not found in Pekin ducks, making this species less similar and more distant to the three other genetic types. Furthermore, individual variability of responses in Pekin ducks was very important making some overfed Pekin samples near to Pekin ducks fed ad libitum samples. This is directly linked to the greater variability and lower ability of Pekin ducks to produce fatty liver in response to overfeeding when compared to hinny and mule hybrids and Muscovy ducks. To conclude on this point, response to overfeeding in ducks is very complex, different in some extent from a genetic type to the other and involves many genes suggesting that genetic improvement of duck selection for fatty liver production should be undertaken at the whole genome level.