The increasing release of chemicals from agricultural, industrial, and domestic sources over the last few decades has led to significant contamination of aquatic ecosystems. The freshwater compartment is particularly vulnerable to such anthropogenic impacts, and large-scale monitoring programs, such as the Water Framework Directive (WFD) (1), have been established to assess the resulting degradation. Biofilms (communities of organisms attached to surfaces) are one of the biological compartments recognized by the WFD as a target for freshwater quality assessment because of their rapid responses to environmental changes, their rapid growth rates, and the physiological variety of their constituent organisms (2). Microalgae are the dominant members of these communities and play a vital role as primary producers. They are sensitive to many environmental variables (e.g., salinity, pH, nutrient concentrations) and have traditionally been used to classify water bodies based on the autecological preferences of the taxa that make up the community (3).
Gathering information on microalgal community composition and abundance is, however, a challenging task that requires taxonomic expertise and specialized tools. These challenges apply particularly to diatoms, a group of microalgae that is present in all types of waters and is highly sensitive to eutrophication/organic pollution gradients (4). Conventionally, diatom taxonomists identify and count several hundred diatom valves in biofilm samples using light microscopy. Specialized tools that integrate sampling devices, image analysis technologies, and machine learning algorithms have also been developed for microalgae, such as ZOOSCAN (5), VPR (6), and FlowCam (7). However, these tools provide limited taxonomic resolution when morphological differences are subtle. Moreover, recent molecular phylogenetic studies have shown that many diatom morphospecies comprise several evolutionary lineages likely corresponding to species-level differentiation (8–11). It is essential to recognize these "cryptic" species because their ecological niches may differ even when they live in sympatry (12, 13), or because different localities may harbor different proportions of morphs with varying ecological tolerances (14). In addition, the indices based on diatom community metrics that are currently used for freshwater quality monitoring in European countries (15–17) require the identification and quantification of hundreds of species, including morphologically similar and/or phylogenetically close taxa.
Molecular methods can overcome some of these challenges. For example, qPCR and ddPCR methods have been developed to assess the abundance and distribution of sub-populations of plankton in environmental samples (13, 18–20). However, these methods require a priori information on the target gene of the focal populations and are limited to surveys targeting certain species or genera, which hampers the scalability that is crucial for environmental assessments. DNA metabarcoding can overcome this issue, and several studies comparing metabarcoding with microscopy methods have been published in the last decade (10, 21–27). In general, metabarcoding has proven to be a valuable tool for detecting rare species and overall changes in community composition. However, several issues have been highlighted that hinder reliable abundance estimates, including (i) reference database incompleteness, (ii) lack of resolution of phylogenetic markers, (iii) cryptic diversity, and (iv) gene copy number variation (28). Moreover, the correlations of gene copy numbers and genome sizes with the biovolumes of different species need to be considered for reliable estimates (13). Therefore, morphological assessment continues to play a central role despite the many advantages offered by these more recently developed molecular approaches (29–31). Several additional high-throughput sequencing (HTS)-based methods have been used in recent years to quantify species abundances in plant mixtures, including genome skimming and multispecies genotyping by sequencing (msGBS) (32–34). These methods could have great value in identifying and quantifying microalgae in environmental samples, as they scale to large numbers of species and provide sufficient taxonomic resolution.
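The effect of gene copy number variation on read-based abundance estimates, noted above as issue (iv), can be illustrated with a minimal sketch. The taxon names, read counts, and copy numbers below are hypothetical values chosen for illustration only; they are not data from this study, and the simple division by copy number is an assumed, idealized correction rather than the procedure used by any of the cited methods.

```python
def relative_abundance(counts):
    """Convert raw counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: value / total for taxon, value in counts.items()}


def correct_for_copy_number(read_counts, copies_per_cell):
    """Divide each taxon's reads by its marker copy number, then renormalize."""
    per_cell = {
        taxon: read_counts[taxon] / copies_per_cell[taxon]
        for taxon in read_counts
    }
    return relative_abundance(per_cell)


# Hypothetical mock mixture: equal cell numbers of two taxa, but taxon_B
# carries four copies of the marker gene per cell and taxon_A only one.
read_counts = {"taxon_A": 200, "taxon_B": 800}
copies_per_cell = {"taxon_A": 1, "taxon_B": 4}

naive = relative_abundance(read_counts)
corrected = correct_for_copy_number(read_counts, copies_per_cell)

print("naive:", naive)
print("corrected:", corrected)
```

In this toy case the uncorrected read proportions overstate the taxon with more marker copies, whereas the corrected proportions reflect the equal cell numbers. In practice, copy numbers are rarely known for every taxon, which is one reason such corrections are difficult to apply to environmental samples.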
Nitzschia palea is one such widespread bioindicator species complex, with several morphological variants described from either organic- and metal-polluted habitats or clean and only slightly polluted habitats (2, 17, 35, 36). Morphological differences between these varieties are very subtle, and their differentiation using light microscopy is impossible (37, 38). Ciftci et al. (39) revealed several evolutionary lineages in this species complex, with recent gene flow between clades with different morphologies and a resulting putative hybrid. Given that N. palea plays a prominent role in biological monitoring for the WFD, being able to rapidly and quantitatively distinguish among its different lineages is of crucial importance.
In this study, we aimed to evaluate a genome-based quantification method for estimating relative abundances within the N. palea species complex for application in biomonitoring. We applied msGBS to mock mixtures prepared from six strains belonging to different N. palea lineages in order to (i) resolve closely related taxa within a non-model diatom species, and (ii) compare its quantification accuracy with that of traditional light microscopic surveys.