Study Area
The Conwy Catchment is a 678 km2 river drainage in north Wales that encompasses a wide range of habitats including forest, moorland, agriculture, light urbanization, and acid grasslands (Fig. 1). The area experiences rapid climatic shifts, particularly during winter months due to its mountainous terrain and porous rock foundations, which facilitates flash flooding, making it susceptible to periodic disturbance 54. The Conwy Catchment area exhibits four distinct seasons that correspond to the expected life cycles of EPT and Chironomidae larval emergences throughout the year.
Sampling
Fourteen headwater sites across 5 landuse types (acid grassland, agriculture, forest, urban, moorland) were sampled once per season (spring, summer, fall, winter), during 2017 (5 landuse types x14 sites x4 seasons). Headwater sites were selected to ensure local landuse types were not influenced by other landuse type via downstream transport62,63. In total 168 eDNA samples and 56 traditional kick-net samples were taken over the course of the study. Sampling for each season occurred over two consecutive days. For all sampling events, streams were sampled for both eDNA and macroinvertebrate community composition during the same day. Water samples for eDNA analysis (1 L) were collected in triplicates from each stream with plastic bottles that had been cleaned prior using a 10% bleach solution (soaked for 1 hour). These were then filtered through 0.22 μm SterivexTM filter units (EMD Millipore Corporation, Billerica, USA) using a Geopump TM Series II peristaltic pump (Geotech, Denver, USA). As filters would occasionally experience reduced filtration efficiency for different sites, or seasons, due to stream sediment loading, we would continue to run the pump for each sample until at least 500ml was filtered, to avoid potential downstream variation in sampling 64, which we previously showed was not an issue in this experimental setup 53. The filters were immediately preserved in 500 μl ATL lysis buffer (Qiagen, Venlo, The Netherlands) and stored in coolers during same-day transit to the laboratory, then stored at 0 °C, for further processing. Macroinvertebrate communities were sampled using a standardized 3-minute kick sampling protocol, with a 500 μm mesh gauge kick/hand net. Kicknet sampling occurred after eDNA sampling to ensure disturbance of the site from kicknetting would not influence the eDNA signal. Both bank margins and riffle habitats were sampled during this timed sampling period. Macroinvertebrates were preserved in absolute ethanol (99.8%; VWR International, Lutterworth, UK) on collection. Upon return to the laboratory the macroinvertebrates were cleared of other collected material and identified to the lowest practical taxonomic per protocol65. Environmental (abiotic) data including pH, conductivity, depth, moss coverage, algal coverage, plant coverage boulder coverage, gravel coverage and sand coverage, were collected for each site following UK Environment Agency site assessment protocols65. Three field blanks consisting of deionized water were prepared in Nalgene sampling bottles and were kept with the other sampling gear throughout each sampling event. For each seasonal sampling event we incorporated three field blanks into the sampling process, processing one blank every 4th sampling site, resulting in a total of 12 blanks for the study. Field blanks were 1L in volume and treated the same as standard samples.
Extraction and Sequencing
We followed unidirectional lab practices from field, to extraction to library preparation by using designated extraction (PCR free) and library preparation rooms. DNA was extracted from the filters using a modified QIAGEN DNA blood and tissue extraction protocol 66. In short, 70 μl proteinase K was added directly to the filters and incubated at 58 °C overnight in a rotating hybridization chamber. Then, the lysate was extracted and the full volume was filtered through a spin column tube, after which point the standard extraction protocol was continued. Extracts (final volume 50 μl) were then cleaned for impurities using QIAGEN Power Clean kit and frozen at -20°C for subsequent analyses. Sequencing libraries were created using a two-step protocol (see Bista et al. 2017), using matching dual end index tags (IDT) and the following COI gene region primers for the first round of PCR (PCR1): m1COIintF (5′-GGWACWGGWTGAACWGTWTAYCCYCC-3′) and jgHCO2198 (5′-TAIACYTCIGGRTGICCRAARAAYCA-3′) 67. Libraries were created at Bangor with the assistance of a Gilson pipette max liquid handler before being shipped to University of Birmingham’s Genomic sequencing facility for quality control and sequencing. Round 1 amplification (PCR1) with the COI primers was performed in triplicates, which were then cleaned for primer dimers using magnetic beads (Beckman coulter), pooled and index labeled during Round 2 PCR step (PCR2), which were cleaned again using magnetic beads. Unique dual paired end-indices were designed and purchased from Integrated DNA Technologies, to complement the Illumina P5/P7 sequence adapters. PCR1 utilized Thermo Scientific's Ampli-gold mastermix due to the high number of inosine in the COI primer pair, and for PCR2 we utilized New England Biolab's Q5 mastermix. All PCR1 and PCR2 reactions were run in 25 μL volumes. PCR1 amplicons were generated using a reaction mix of 12.5 μL mastermix, 2 μL DNA template, 1 μL of each primer and 8 μL nuclease free water and amplified using an initial 95 °C for 5 min then 25 cycles of 95 °C for 30 s, 54 °C for 30 s and 72 °C for 60 s followed by a 72 °C final annealing for 10 min. PCR2 amplicons were generated using a reaction mix of 12.5 μL mastermix, 2 μL DNA template, 1 μL of each primer and 8 μL nuclease free water and amplified using an initial 98 °C for 30 s then 15 cycles of 98 °C for 10 s, 55 °C for 30 s and 72 °C for 30 s followed by a 72 °C final annealing for 10 min. The, PCR2 amplicons were purified using High Prep PCR magnetic beads (Auto Q Biosciences) and quantified using a 200 pro plate reader (TECAN) with the Qubit dsDNA HS kit (Invitrogen). The final amplicons were pooled in equimolar quantities (at a final concentration of 12 pmol) using a Biomek FXp liquid handling robot (Beckman Coulter). Pool molarity was confirmed using a HS D1000 Tapestation ScreenTape (Agilent). Sequencing was performed on an Illumina HiSeq platform 250bp Paired-End, with an intended coverage of 100,000 reads per sample.
Bioinformatics
Bioinformatic processing up to taxonomic assignment was performed by University of Birmingham. In short, per base quality trimming was performed on demultiplexed reads using SolexaQA++ v.3.1.7.1 (Cox, Peterson, & Biggs, 2010) and paired end reads were merged using Flash v.1.2.11 68, with default parameters. Primer sequences were removed with TagCleaner v.0.16 69 allowing up to 3 mismatches per primer sequence. Only sequences with both forward and reverse primers were retained for further analyses. Amplicon sequence variants (ASVs) were obtained via Usearch at 97% similarity threshold, and denoising with the -unoise3 algorithm. Chimeras were removed as part of the –unoise3 algorithm70. Taxonomy to the genus level was assigned to representative ASV with BLAST against ASVs using the non-redundant nucleotide database of NCBI, using the default settings71.
Metacommunity analyses
All statistical analyses were performed using R version 3.6.1 72. Sequence reads were rarified for each set of replicates to the lowest replicate level. Mean number of reads for each ASV were calculated across the sample replicates before being matched to their taxonomic identifier. ASVs that were not identifiable to the genus level, or to a functional group (below), were not included in subsequent analyses. Genera richness was calculated as the number of unique genera per site. We further divided richness into unique EPT genera for EPT richness, and Chironomidae richness as the number of unique Chironomidae genera. Functional richness was calculated as the partition of unique functional groups per sample, following the partition of functional groups in Moog (2017). In short, Moog (2017) provides a catalogue of 3 296 metazoa species that form the basis of ecological status assessment for many European environmental agencies. The functional scores are assigned to each taxa based on a ten point partitioning to reflect the variation in functionality within taxa, meaning the functional scores is a score of the function and not simply a reassignment of the taxonomic identification. These groups reflect the functional feeding groups, divided into 8 categories, including; grazer/scrapers, xylophagous, shredders, gatherers/collectors, active filter feeders, passive filter feeders, predators and other. We refined these groups to shredders, grazers, and collectors whereby collectors were the summation of gather/collectors and filter feeding groups to simplify the functional groups to those used across wider studies. To assess community dynamics between sites we calculated the nestedness and turnover components of beta-diversity following Baselga (2010)14, whereby beta-diversity was calculated as βsor (1), Turnover as βsim (2) and βnes as βsor minus βsim (3)14. All mathematical formula follow the nomenclature of Baselga (2010), with a being the number of genera common to both sites, b is the number of genera occurring in the first site but not the second and c is the number of genera occurring in the second site but not the first.
We used generalized least squares (gls), as implemented using the gls function in the nlme package73, to assess the statistical relationships between (1) community, (2) EPT, (3) Chironomidae, and (4) functional richness (each as a separate response variable and independent statistical test) and all two-way interactions of the explanatory variables, including sampling method (eDNA or traditional), season (spring, summer, fall and winter) and landuse gradient (see below for description). Generalized least squares is an extension of linear regression that allows for variance structuring of variables to account for suspected correlation between residuals. Here, we specifically used the gls framework to account for potential spatial autocorrelation between sites by included a variance structure using the distance matrix for the sampling sites73. We further used gls to assess the statistical relationships between nestedness and turnover between communities against a set of explanatory variables, including the landuse gradient, sampling method, and season, including all possible two-way interactions. Backward model selection was performed to find the most parsimonious model using Akaike information criterion (AIC) to determine the best model fit 74. Model assumptions, including normality and heteroscedasticity using model diagnostic plots were implemented in R. The landscape gradient was calculated as the first principal component of a PCA derived from non-covarying environmental variables (normalized and centered prior to PCA analysis), where non-covariance was assessed via visualization of the pairwise correlation between all measured environmental variables (Supplementary Figure 1) (Borcard, Gillet, & Legendre, 2011). The first axes of the PCA, which was used as the derived environmental gradient, accounted for 67.47% of the observed variation with environmental loadings consisting of pH (0.027), moss coverage (-0.449), depth (-0.297) and boulder coverage (-0.842).
Data availability
Data including amplicon sequence data, ASV matrices and environmental data associated with the manuscript will be deposited on figshare.
Code availability
The analyses utilized standard language coding commands for the bioinformatics and data analyses as described in the methods.