The evolutionary arms race between hosts and parasites is of key importance for maintaining species diversity and community composition. However, the pace of evolutionary change in host-parasite systems is modulated not only by the co-interacting communities, but also by common components of their extrinsic environment [1–3]. The role of the environment in shaping host-parasite interactions nevertheless remains much less understood [4]. The advancement of next-generation sequencing (NGS) technologies provides the opportunity to expand our understanding of such complex interactions at an unprecedented speed [5, 6].
During the last decade, high-throughput RNA sequencing (RNA-seq) has been increasingly used to explore infection-, disease- and stress-related changes in host gene expression. Gene expression analyses at the whole-transcriptome level have also shed light on fundamental aspects of host and parasite biology [e.g., 7, 8] and host-parasite interactions [9]. With increased sequencing depth of mixed host and parasite transcripts (i.e., dual RNA-seq), it is possible to observe gene expression changes in the interacting taxa simultaneously [10–12]. In addition, novel insights into the dynamics of host-parasite interactions at the molecular level are increasingly gained by analysing sequence data that were traditionally deemed to be of no value and hence excluded [13]. For example, a typical bioinformatics analysis pipeline involves a step where reads from DNA or RNA sequencing are aligned to the genome of the target species; those that do not align are simply discarded. This principle is integrated into the majority of existing pipelines because unmapped reads could originate from library contamination and sequencing errors. As such, much effort has been put towards sorting out this type of nuisance information [13, 14]. However, there is growing awareness that some of the unmapped reads could actually harbour novel genetic and ecological information. Thus far, unmapped reads from RNA- or DNA-seq data have been used to discover symbionts, pathogens, and undescribed features of the target species genome, such as highly divergent regions or insertions relative to the reference genome, that would otherwise have been missed [13, 15–18].
Given that parasite and pathogen RNA typically represents only a tiny proportion of the total RNA of the host, very deep sequencing is necessary to obtain a comprehensive understanding of pathogen transcriptomes and genetic diversity. This means that untargeted sequencing of the host transcriptome rarely provides enough power for analyses of pathogen community composition. As an alternative to sequencing whole transcriptomes and genomes, targeted amplicon-based high-throughput sequencing, known as metabarcoding, has become an essential tool for monitoring biodiversity [19, 20] and is also increasingly used for understanding parasite diversity in host tissues and environmental samples [e.g., 21, 22]. Community metabarcoding is a sensitive technique that allows detection of rare and cryptic species and species associations [23, 24] as well as analyses of within-species genetic variability and population structuring [25].
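The alignment step described above, in which reads that fail to align to the target genome are separated from those that do, can be illustrated with a minimal sketch. It assumes alignments stored as SAM text, where bit 0x4 of the FLAG field marks a read as unmapped; the function name and the toy records are hypothetical and are not taken from any pipeline used in this study.

```python
# Minimal sketch (illustrative only): partition SAM records into mapped and
# unmapped reads instead of discarding the unmapped ones.
# Assumption: input is SAM text; FLAG bit 0x4 means "read unmapped".

UNMAPPED_FLAG = 0x4


def split_sam_records(lines):
    """Return (mapped, unmapped) lists of read names from SAM text lines."""
    mapped, unmapped = [], []
    for line in lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG:
            unmapped.append(name)  # candidate non-host reads, kept for screening
        else:
            mapped.append(name)    # reads assigned to the target genome
    return mapped, unmapped


# Hypothetical toy records: read2 carries FLAG 4 (unmapped).
example = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\tIIIII",
    "read3\t16\tchr1\t250\t60\t5M\t*\t0\t0\tGGCAT\tIIIII",
]
mapped, unmapped = split_sam_records(example)
print("mapped:", mapped)
print("unmapped:", unmapped)
```

In practice, the retained unmapped reads would then be compared against reference sequences of candidate symbionts or parasites; that downstream step is outside the scope of this sketch.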
Diplostomidae is a geographically widespread group of trematode parasites with a complex life cycle that includes two intermediate hosts, lymnaeid snails and fishes, while a piscivorous bird usually serves as the definitive host. After infecting and completing their development in the snail, metacercariae enter fish eye structures and sometimes neural tissues, which may lead to changes in host behaviour that may reduce the general condition of the fish [26, but see 27]. Diplostomidae species are extremely difficult to distinguish morphologically, and each fish may be infected by hundreds of parasites. As a result, estimating species diversity, community composition, host-parasite interactions and the effects of environmental factors is challenging in Diplostomidae [28–30]. While molecular approaches, and especially COI fragment-based species identification via Sanger sequencing [31], have advanced the field tremendously by revealing hidden species diversity, most studies have focused on describing species from single fluke isolates [30, 32, 33]. However, single-fluke sequencing is suboptimal for characterizing community composition and intraspecific genetic diversity. Massively parallel sequencing of whole-tissue extracts from the host therefore represents a potentially powerful strategy to improve throughput and efficiency and to characterize both inter- and intraspecific diversity of parasites [29]. Different NGS approaches potentially provide complementary information; however, few studies to date have successfully combined multiple massively parallel sequencing methods to further the understanding of host-parasite-environment interactions.
Here, we describe how initial transcriptome screening of fish eyes, in which we used both host-specific and unmapped RNA-seq reads, prompted a novel hypothesis that humic-associated differences among lakes affect the prevalence of Diplostomidae eye parasites in Eurasian perch (Perca fluviatilis). In particular, by building on RNA-seq read data and expanding upon previous work on eye parasites in perch [34], we hypothesized that an elevated content of humic substances (often measured as dissolved organic carbon (DOC) concentrations and spectral parameters of the water) would have a negative effect on the abundance of gastropods, the intermediate hosts of eye flukes. Because high humic content is also associated with increased acidity of the water, we expected it to reduce the availability of calcium needed for shell building and/or to decrease the light available to the aquatic plants that serve as an important food source for gastropods [35, 36]. We tested the potential link between humic substances and the occurrence of Diplostomidae eye parasites by conducting extensive molecular screening of eye flukes and developing a targeted metabarcoding approach to efficiently screen intra- and interspecific genetic diversity of parasites from host eye tissue. Our work demonstrates how the integrated use of NGS approaches can lead to the discovery of novel host-parasite-environment interactions and provide unprecedented power to characterize the molecular diversity of cryptic parasites.