Sampling sites
The study includes two commercial-size Swiss RAS farms (A and B) with distinct ownership and operational management procedures. Farm A breeds perch (Perca fluviatilis), raising offspring from egg to approximately 15g and features several life-stage-specific circuits with independent filtration systems. Fish are moved to the next circuit when they reach a certain cutoff weight. After each move, a stringent disinfection regimen is applied. First, the biofilter is disconnected from the circuit to protect the microbial community from disinfection solutions. Next, the tanks are emptied, followed by a four-step cleaning regimen, 1) a high-power jet wash with hot water, 2) brushing down the tank walls and floor with soap, 3) a static acid-base treatment of the tanks and pipes with neutralizing steps in between, and 4) spraying the tanks with alcohol. The tanks are then dried entirely before refilling and restocking with the next batch of fish. Farm A uses multiple feed brands depending on the life stage of the fish (Bernaqua, BioMar, and Alltech Coppens).
Farm B is situated > 100 km from Farm A in a different catchment. Farm B raises two fish species: perch, obtained from Farm A at around 15g, and pike-perch (Sander lucioperca), obtained at the fingerling stage from another provider. Both species are raised to slaughter weight within a single circuit in concrete tanks. Cleaning regimens are applied once a tank is emptied. However, because of grading and moving the fish into new tanks, which might already be occupied, there is no strict cleaning disinfection timeline. The disinfection protocol consists of emptying tanks, washing the tank with high-pressure hot water, spraying Virkon S as a disinfection solution, refilling with water, and then stocking with the next batch of fish. Farm B uses Alltech Coppens Supreme of varying pellet sizes according to fish size. Both farms use agitated biofilters with floating plastic biofilter carriers to supply the necessary surface area to foster microbial communities. Farm identities and locations are confidential.
Sample types
Three sample types were collected: tank biofilm, tank water, and biofilter water (Fig. 1A). First, biofilm samples were collected with a sterile, single-use foam swab (Merck – product was discontinued) by rubbing one side of the swab back and forth approximately ten times across a ~ 10x10 cm area of the tank wall about 6 cm below surface water level and repeating the procedure on the same area with the other side of the swab. After swabbing, the swab was placed into a 2 ml Eppendorf tube, the stick was broken off, and the closed 2 ml tube was placed on wet ice. Biofilm replicates were taken with an approximately 2 cm gap between them. Next, using a sterile 500 ml plastic beaker, we collected 500 ml of water from the same tank where the tank biofilm sample was collected, followed by on-site filtering of 120 ml of water using a 60 ml sterile, single-use syringe (Faust) and a 0.22 um mixed cellulose filter (Millipore, Merck) contained in a Whatman 47 mm plastic filter holder (Whatman, Merck). Replicates were taken from the same beaker, thoroughly mixing the water before the replicate was taken. After filtration, the filters were placed in a 2ml Eppendorf tube and stored on ice. Finally, biofilter water samples were collected, with a new sterile beaker, in the same way and from the same circuit as tank water and biofilm samples. All samples were transported back to the Institute for Fish and Wildlife Health, University Bern, on wet ice and stored at -80°C until further processing.
Sampling scheme
Sampling aimed to maximize insights into differences and similarities between replicates, sample types, time points, analysis methods, within-farm compartments, and farms. In Farm A, two sampling events on different dates occurred in the H circuit that houses 12-15g perch. The first sampling event took place on June 25th, 2020 and consisted of collecting tank wall biofilm (samples 4–6), tank water from the same tank as the biofilm (samples 7–9), and biofilter water (samples 10–12). This sampling took place less than a week after the last cleaning of the tanks. A second sampling took place on November 4th, 2020 and involved the collection of tank wall biofilm (samples 1–3), from a second tank within the H circuit, several weeks after the last cleaning of the tank. In farm B, sampling took place on November 23rd, 2020, that consisted of collecting tank wall biofilm from two tanks (samples 13–15 (tank 1) and 16–18 (tank 2), within the main circuit, E. Two negative control samples were taken at the June 25th, 2020, sampling event but not sequenced. An overview of all samples is provided in Additional file 1. The negative water control was filtered the same way as the on-site water samples, using distilled water instead of system water. The negative swab sample consisted of unpacking a swab on-site and placing it into a 2 ml tube without swabbing a surface.
DNA extraction
Three DNA extraction methods were tested on pre-trial water and swab samples for optimal and consistent DNA yield and quality (Fig. 1C) because suboptimal lysis conditions can introduce stochastic bias against gram-positive bacteria because of their thick outer wall. Tests included 1) the Purelink Microbiome DNA Purification Kit (Thermofisher), which is optimized for microorganism lysis, 2) the DNeasy PowerWater Kit (Qiagen), which is optimized for the isolation of genomic DNA from filtered water samples, and 3) phenol-chloroform extraction, which has been shown to produce high DNA yield from environmental samples [29]. The PowerWater kit produced inconsistent yields (results not shown), whereas, the Phenol-Chloroform appoarch produced higher DNA yield, but was countered by pheno carry-over and low DNA purity. The Purelink Microbiome kit consistently produced the highest quality and yield and was subsequently used for the study. Before extraction, frozen filters were crushed in a 2ml Eppendorf tube with sterile 1,000 ml pipette tips. Lysis buffer was added, and bead-beating was performed in a TissueLyser set to full speed for 10 minutes per the manufacturer's instructions.
Sequencing
Short amplicon
The performance of four amplicon-based 16S-targeting approaches was compared regarding amplification, read quality, taxonomic, and biological conclusions (Fig. 1D). Three amplicons designed for short-read Illumina sequencing included 16S variable regions V4 (primers 515F + 806R, hereafter referenced as "Earth"; [30]), V3-4 (primers 341F + 805R, hereafter referenced as "Miseq"; [31], and V1-3 (primers 27F [32] + 534R [33]; hereafter referenced as "27F_534R"; Table 1). One amplicon designed for long-read Pac-Bio sequencing with the Shoreline StrainID kit included 16S, ITS, and 600 bp of the 23S gene [34]; Table 1). The Shoreline Complete StrainID kit uses a patented StrainID primer set.
Table 1
Primers used in this study. FW: forward orientation. REV: reverse orientation. bp: base pairs.
Targeted region
|
Shorthand
|
FW primer
|
FW primer sequence
|
FW primer length
|
FW melting temperatures
|
RV primer sequence
|
REV primer length
|
REV melting temperatures
|
Amplicon length
|
Primer target species
|
Primer reference
|
16S:
V3-4
|
MiSeq
|
341F
|
CCTACGGGNGGCWGCAG
|
17 bp
|
58-60 °C
|
GACTACHVGGGTATCTAAT
|
19 bp
|
52-60 °C
|
~465 bp
|
Bacteria
|
Klindworth et al., 2013
|
16S:
V4
|
Earth
|
515F
|
GTGYCAGCMGCCCGCGGTA
|
19 bp
|
64-68 °C
|
GGACTACNVGGGTWTCTAA
|
19 bp
|
54-58 °C
|
~300 bp
|
Bacteria & Archaea
|
Caporaso et al., 2012
|
16S:
V1-3
|
-
|
27F
|
AGAGTTGATCCTGGCTCAG
|
19 bp
|
58 °C
|
ATTACCGCGGCTGCTGG
|
17 bp
|
56 °C
|
~500 bp
|
Bacteria
|
Weisburg et al., 1991; Muyzer et al., 1996
|
16S-ITS-23S
|
Pac-Bio
|
-
|
proprietory
|
-
|
-
|
proprietory
|
-
|
-
|
~2500 bp
|
Bacteria
|
Shoreline StrainID
|
Optimal amplification conditions suitable for all three short amplicons were determined by gradient PCR and reducing cycle number as much as possible. The PCR included 12.5 µl of KAPA HiFi HotStart Ready Mix (Roche, Switzerland), 5 µl of each primer (0.2 µM stock concentration), and 12.5 ng of DNA plus water to a total volume of 25 ul. 14, 16, 18, 20, 22, and 25 PCR cycles were tested at annealing temperatures between 54 and 58°C for all three primer pairs. Based on agarose gel electrophoresis evaluations of amplification success, the following protocol was derived: initial denaturation at 95°C for 3 min, 20 cycles (denaturation at 95°C for 30s, annealing at 55°C for 30s, and extension at 72°C for 20s), and final elongation at 72°C for 5 min. In addition to all samples, four positive controls (Zymobiomics microbial community standard (Zymo Research)) were amplified with this protocol. Samples 19–21 were introduced at this step and are technical PCR-level replicates of sample 2 amplified with Earth primers. Notably, during the sequencing run sample 19 failed to be sequenced.
The preparation of 16S rRNA gene amplicons for the Illumina MiSeq System was designed and performed at the Next Generation Sequencing Platform, University of Bern, according to the "16S Metagenomic Sequencing Library Preparation" protocol (Illumina, art #15044223 Rev. B). The quantity and quality of the cleaned amplicons were assessed using a Thermo Fisher Scientific Qubit 4.0 fluorometer with the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Q32854) and an Agilent Fragment Analyzer (Agilent) with an HS NGS Fragment Kit (Agilent, DNF-474), respectively. Next, the index PCR step was performed as in the protocol except using IDT for Illumina DNA/RNA UD Indexes Set A (Illumina, 20027213), MyFi Mix (BIOLINE, BIO-25050) and the inclusion of a no template control (NTC). The amplicon libraries were assessed for quantity and quality as described above using fluorometry and capillary electrophoresis. The remainder of the protocol was followed, except that the library pool was spiked with 10% PhiX Control v3 (Illumina, FC-110-3001) to compensate for reduced sequence diversity. Finally, the library was sequenced at 2x300 bp using a MiSeq Reagent Kit v3, 600 cycles (Illumina, MS-102-3003) on the MiSeq sequencing instrument. The run was assessed using Illumina Sequencing Analysis Viewer 2.4.7. We used Illumina bcl2fastq conversion software v2.20 to demultiplex the library samples and convert generated base call files into FASTQ files. All steps beyond the first PCR step were performed at the Next Generation Sequencing Platform, University of Bern. Short-read sequencing, before filtering, resulted in a total of 4,808,910 (27F_534R), 5,306,621 (Earth), and 5,149,263 (MiSeq) reads. Read numbers at all filtering steps are available in Additional File 1.
Raw data from Illumina amplicon sequencing are from the SRA NBI databank (accession codes between XXX – XXX).
Long amplicon
Long amplicon Pac-Bio sequencing was performed at the Next Generation Sequencing Platform, University of Bern. The quantity and quality of the extracted DNA were assessed using a Thermo Fisher Scientific Qubit 4.0 fluorometer with the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Q32854) and an Agilent Femto Pulse system with an Ultra Sensitivity NGS kit (Agilent, FP‑1101), respectively. The DNA was then amplified using dual-unique barcoded primers targeting 16S-ITS-23S, using the StrainID kit from Shoreline Biome using strain ID Set Z, Barcodes T1-T16 (Shoreline Biome, STRAIN-Z-SLB). This approach involves a single-step PCR, consisting of primers containing the barcode and target-specific primer, generating amplicons ready for SMRTbell template prep and subsequent sequencing on the Pac-Bio Sequel System. The protocol from input DNA to SMRT sequencing was followed according to the Shoreline Wave for Pac-Bio Technical Manual, following all parameters for the Strain ID workflow. As well as the input DNA of interest, a no template control (NTC), and two community controls (ZymoBIOMICS Microbial Community DNA Standard and ZymoBIOMICS Microbial Community DNA Standard II (Log Distribution) (Zymo Research, D6305 and D6311, respectively) were included. The generated library was SMRT sequenced using a Sequel binding plate 3.0, sequel sequencing plate 3.0 with a 10h movie time on a Pac-Bio Sequel system on their own SMRT cell 1M v3. The library was loaded at 9pM and generated 15 Gb and 284,296 HiFi reads.
Raw data from Pac-Bio amplicon sequencing are from the SRA NBI databank (accession codes between SRR18029933-SRR18029941).
Shotgun metagenomics
Illumina shotgun metagenomics sequencing was performed at the Next Generation Sequencing Platform, University of Bern. The extracted DNA was assessed for quantity, purity, and length using a Thermo Fisher Scientific Qubit 4.0 fluorometer with the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Q32854), a DeNovix DS-11 FX spectrophotometer, and an Agilent FEMTO Pulse System with a Genomic DNA 165 kb Kit (Agilent, FP-1002-0275), respectively. Sequencing libraries were made using an Illumina DNA Prep Library Kit (Illumina, 20018705) in combination with IDT for Illumina DNA/RNA UD Indexes Set B, Tagmentation (Illumina, 20027214) according to the Illumina DNA Prep Reference Guide (Illumina, 10000000254 16v09). Six PCR cycles were employed to amplify 30ng of tagemented DNA. Pooled DNA libraries were sequenced paired-end on a NovaSeq 6000 SP Reagent Kit v1.5 (300 cycles; Illumina, 20028400) on an Illumina NovaSeq 6000 instrument. The run produced, on average, 159 million reads/sample. The quality of the sequencing run was assessed using Illumina Sequencing Analysis Viewer (Illumina version 2.4.7) and all base call files were demultiplexed and converted into FASTQ files using Illumina bcl2fastq conversion software v2.20.
Raw data from Illumina shotgun metagenomics sequencing are from the SRA NBI databank (project ID: PRJNA757614; accession codes between xxx-xxx).
Read processing
Short amplicon
Illumina short-reads were processed with the DADA2 (Divisive Amplicon Denoising Algorithm 2) [35] pipeline. The DADA2 pipeline includes the inspection of read quality, quality filtering and trimming of reads, dereplication and error rate learning, sample inference for the determination of true sequence variants, merging of reads, construction of sequence table, removal of chimeric reads, and taxonomic assignment. Each primer dataset (Earth, MiSeq, 27F_534R) was first run independently through the DADA2 pipeline, then the Earth and MiSeq datasets were combined and processed identically ("Combined dataset"). The reads were truncated based on the forward and reverse primer read length (Earth forward primer: trimLeft – 19; Earth reverse primer: trimRight 20; MiSeq forward primer: trimLeft – 20; MiSeq reverse primer: trimRight 90). For the 27F_534R library, the reads were truncated with trimLeft (forward primer: 20, 17) and trimRight (Reverse primers: 30, 100). All other filterAndTrim parameters were left at the default values. Reads from the individual and combined libraries were merged, while primer pair 27F_534R was concatenated because too many basepairs were trimmed for a successful merging. For the remove bimera denova step, the minfoldParentOverAbundance parameter was set to 5 for individual datasets and 4 for the combined dataset. After filtering the datasets retained the following about of reads 3,195,326 (65.8% average) for the 27F_534R dataset, 4,453,843 (83.8%) for the Earth dataset, 4,549,777 (85.5%) for the Earth combined dataset, 4,212,922 (81.7%) for the MiSeq dataset, and 4,216,287 (81.8%) for the MiSeq combined dataset (Additional File 2). Mock community reads were removed from the total amount of reads for each dataset. Additionally, reads from the technical samples (e.g., 20 and 21) were removed for the Earth dataset.
Individual datasets were used to quantify individual primer pair read quality, while the Combined dataset was used to quantify alpha and beta diversity, technical replication reproducibility, and MDS analysis. Sequencing quality was analyzed using the percentage of reads with a Phred score equal to or larger than 30 for each sample type and primer. Microbial taxonomic alpha-diversity (intra-sample) was calculated using Richness and Shannon indices as implemented in the R package phylsoseq [36]. Species beta-diversity (inter-sample) was estimated using the Bray-Curtis dissimilarity metric, while the dissimilarity between groups was visually assessed with multidimensional scaling (MDS) plots.
Long amplicon
Pac-Bio Shoreline long reads were demultiplexed without primer trimming, palindromes were removed, and reads with lengths smaller than 200 basepairs were filtered out using the SBAnalyzer software (Shoreline Biome).
Shotgun metagenomics
Illumina shotgun metagenomics reads were of high quality, and no filtering was required.
Taxonomic assignment
Short amplicon
Short-read data was assigned to taxonomic units with the SILVA v.138 gene reference database. After DADA2 filtering, the Earth dataset contained 10,941 ASVs, with 196 ASVs assigned to the mock community sample and 3 ASVs assigned to both the mock community and samples. Of the 10,742 ASVs found within the samples, 10,501 were assigned to Bacteria, 14 to Archaea, 57 to Eukaryota, and 2,170 could not be assigned. For the MiSeq dataset, 6,101 ASVs remained, with 22 ASVs assigned to the mock community. Of the 6,079 ASVs found within samples, 6,072 were assigned Bacteria, 2 to Archaea, 2 to Eukaryota, and three could not be assigned. For the combined dataset, 18,075 ASVs remained, with 234 ASVs assigned to the mock community and five ASVs assigned to both the mock community sample and samples. Of the 17,836 ASVs found within the sample, 17,588 were assigned to Bacteria, 16 to Archaea, 58 to Eukaryota, and 174 could not be assigned (Additional File 2).
Sample data were managed using the R package phyloseq (v1.30.0) (McMurdie and Holmes, 2013), and plots were generated using the R package ggplot2 (v.2.2.1) [37].
Long amplicon
Long read data were taxonomically assigned with the Athena database v2.2, resulting in 99.3% of reads successfully classified (196,749 reads). An abundance table, a taxonomic classification list for each species, and a list of samples assigned to each read were created (Additional File 3). The initial goal was to compare output after running short- and long-reads through the DADA2 pipeline. However, we obtained low read depths of the samples due to the mock community sample vastly outnumbering the samples during sequencing, which made this approach futile. Therefore, the abundance table was analyzed manually for the spatial distribution of species.
Shotgun metagenomics
The raw reads of the metagenomics samples were classified according to their taxonomy using kraken2 [38]. This software classifies reads according to their best matching location in the taxonomic tree. Bracken was used to estimate the species abundance [39], using the taxonomy labels assigned by kraken2 to estimate the number of reads originating from each species present in the sample.
Data analysis
Short amplicon
Read quality was assessed based on the percentage of reads with a Phred score greater than 30 for each primer.
Microbial taxonomic alpha-diversity (intra-sample) was evaluated with the Richness and Shannon indices implemented in the microbiome R package [40]. Species beta-diversity (inter-sample) was estimated with Bray-Curtis distances, using the ordinate function in the phyloseq package, to understand similarities and differences in community composition independent of primer choice, within-farm compartments, farm identity, and time point in the production cycle. The dissimilarity between samples was assessed by multidimensional scaling (MDS). Statistical variation in microbial alpha-diversity and taxa composition was calculated using a two-way analysis of variance (ANOVA), followed by Post-hoc Tukey tests.
Community composition was analyzed between primers, replicates, sample types, and farms by comparing the relative abundance of the top 9 phyla, all other phyla, and NAs. Furthermore, a PERMANOVA was conducted to determine whether biofilm communities differed significantly between farms.
Finally, ASV enrichments were analyzed with a PERMANOVA non-parametric multivariate test using the adonis function in the R package vegan (v.2.5.7) [41] to determine which ASVs were significantly enriched between tank samples of farm A and between farms. The coefficients of the top 20 enriched ASVs were plotted.
All analyses were completed in RStudio 1.4.1717 [42].
Long amplicon
The ten most abundant species were identified for each sample type per farm based on the total number of reads after both replicate reads were summed together. Abundance was compiled and plotted for these species to understand spatial and abundance distribution across sample types and farms. Markedly, some replicates have less than ten dots because the top species was only detected in one replicate.
Shotgun metagenomics
Phyla with at least 0.5% or more of the total reads were retained to analyze the overall community composition as determined by shotgun sequencing. A Sankey plot using the R network3D v.0.4 package [43] was plotted to compare the community composition across the domains. In addition, relative abundance bar graphs were plotted to quantify community composition variance at the replicate, sample type, and within-farm compartments.
All figures were prepared for publication using Adobe Illustrator 2021.