Fish rearing conditions and disease phenotype
The experimental trial was performed at the LetSea land facility (Bjørn, Dønna, Norway) in a seawater based flow-through system heated by heat pump and aerated. Juvenile salmon (Salmo salar) were obtained from the commercial hatchery Grytåga Settefisk AS (Vefsn, Norway). Fish were approximately one-year-old and vaccinated with ALPHA JECT micro® 6 (PHARMAQ), a vaccine that protects against furunculosis, vibriosis, cold water vibriosis, winter sore, and infectious pancreatic necrosis. Salmon were initially kept at 12 °C in 24 ppt salinity before being stabilized at 14 °C. The fish were acclimatized whilst being fed commercial feed. Fish were then transferred to 12 replicate different tanks (capacity = 2000L) containing saltwater (33-34ppt of salinity) directly pumped from the sea and subjected to UV treatment for sterilization. Each tank contained between 200 and 300 fish. After the transfer into pure saltwater, a subset of the fish in each tank unexpectedly started to show disease symptoms, specifically the development of large skin ulcers (Additional File 1 - Supplementary Fig. 1). Some fish showed a more severe ulcer phenotype while others seemed in overall good health condition with no external signs of disease. Fish with ulcers were then considered sick, while fish with no visible wounds were scored as “healthy”. The lack of visible wounds is not per se proof of the absence of the causal pathogen but it indicates that these fish were at least at a less progressed stage of the disease and that they might have been able to resist the pathogen for a longer period. As such, the fish scored as healthy still serve as a useful reference group of more resilient fish for comparison with fish clearly affected by the pathogen or at later stages of the infection.
Diagnostic analysis of disease
To identify the causative agent of the ulcerative disease, a culture based bacteriological analysis from both wound and kidney swabs samples was performed by Vaxxinova Norway AS. Two different kinds of culture media were used: marine agar medium and blood agar with 2% NaCl medium. Sequencing of the V1-V2 hypervariable region of the 16S rRNA gene (primers: 27F AGAGTTTGATCCTGGCTCAG; 519R GWATTACCGCGGCKGCTG [33]) was performed for the identification of the bacterial species.
Water disinfection treatment with formalin
To disinfect the entire water system, formalin was applied (aqueous solution of formaldehyde stabilized with methanol), [31], [32]. For the treatment we used 1L of formaldehyde 38% (38 mg/ml) every 4000L of water (0.5L per tank). Formalin was left in the water to act for 30 minutes before reopening the water flow-through. The treatment included two separate disinfection procedures carried out with a period of four days in between. Each day before treatment the fish were left starving. Samples were collected both before and after the complete formalin treatment.
Distal gut content samples collection
Two samplings were performed at nine days distance in May 2019, one sampling before and one sampling after formalin treatment. A total number of 80 fish were sampled from different tanks. Of these, 40 salmon were sampled before formalin treatment and 40 after treatment. In both cases, 20 healthy and 20 sick fish were randomly picked across replicate tanks (Fig. 1).
Fish were euthanized using an overdose of Finquel MS-222 (Tricaine Methanesulfonate). Within ca 10 min after the euthanization, the abdominal cavity was opened at the ventral midline, and the intestine was aseptically removed. For each salmon, samples from both the distal gut content and the distal gut mucosa were collected (Fig. 1), using scalpels disinfected with bleach 10% and ethanol 70%. Samples were stored in 1.5 ml sterile Lysing matrix E tubes (MP Biomedicals™) containing 1x DNA/RNA Shield™ buffer (Zymo Research). Tubes were kept at room temperature for the transport to the laboratory and then transferred to a -20 °C freezer until DNA extraction.
Length (cm), weight (g) and gutted weight (g) were measured for each fish to investigate possible correlations between the gut microbiota composition and a high growth performance profile of the fish. Information on the presence or absence of visible wounds were recorded and then utilized to assess the health status of each individual fish. The presence of visible wounds was used as a phenotypic marker for the presence of the disease. Fish without visible wounds were considered as in overall good health with no, or little impact, by the pathogen (see above).
Three binary variables can be recognized in our experimental design: 1) healthy vs sick fish, 2) before vs after formalin treatment and 3) type of sample (distal gut content vs distal gut mucosa), for a total of eight (2³) groups, each one constituted of 20 samples (Fig. 1 and Additional File 2). The above-mentioned groups were used in the subsequent analysis and are hereafter referred to with their acronyms as presented in Fig. 1.
DNA extraction and quality control
DNA extraction and purification from both distal gut content and distal gut mucosa samples was performed with an in-house extraction protocol as described in the Additional File 3. DNA concentration was assessed with Qubit fluorometric 3.0 quantification (Thermo-Fisher Scientific), following the manufacturer's recommendations.
Real-time PCR (qPCR) was performed on all extracts prior to metabarcoding, to optimise the subsequent metabarcoding process. Specifically, all DNA extracts were pre-screened using SYBR Green qPCR [34] with both primer sets to I) screen for contamination in extraction negatives, II) identify the potential presence of PCR inhibitors, and III) optimise the cycles needed for metabarcoding PCRs. qPCRs were performed in 21 µl reactions containing 2 µl DNA template, 9.5 µl of AccuPrime SuperMix II (Invitrogen), 6.5 µl ddH20, 0.5 µM 16S forward primer, 0.5 µM 16S-reverse primer and 1 µl of SYBR Green/ROX solution (Invitrogen). qPCR amplifications were performed on an Mx3005 qPCR machine (Agilent Technologies) with the following cycling conditions: 95 °C for 10 min, followed by 40 cycles of 95 °C for 30 s, 55 °C for 20 s, and 68 °C for 40 s. Using serial dilutions of the DNA template (1:10 and 1:20) we tested for the presence of contaminants (e.g. excess of host DNA), responsible for PCR inhibition [35]. For some samples only the dilutions amplified. In those cases, the 1:10 dilution was selected for the subsequent PCR (Additional File 4). Negative controls were included in every qPCR reaction.
The Ct values obtained from the qPCR (Additional File 5) were also used to inspect relative differences in the microbial biomass of different samples or groups of samples, with the assumption that the 16S rRNA gene can serve as a proxy for the actual abundance of microorganisms. We accounted for a potential bias introduced by differences in the 16S rRNA gene copy number among OTUs (see below).
Metabarcoding and sequencing
For 16S rRNA gene profiling, the following primers were used: 341 F (5'-CCTAYGGGRBGCASCAG-3') and 806 R (5'-GGACTACNNGGGTATCTAAT-3'), to amplify the V3-V4 region of the 16S rRNA gene [36]. PCR was performed in 0.2 ml PCR tubes using the same reaction composition used for qPCR with the exclusion of the SYBR green and ROX dyes. Tagged primers were used in different combinations to allow multiplexing of samples during sequencing. According to the qPCR results, some distal gut content samples were subjected to 30 PCR cycles of amplification while the others got 35 PCR cycles. For the distal gut mucosa samples, some were subjected to 35 PCR cycles while others were given 40 PCR cycles (Additional File 4). Samples from the gut content amplified better than those from the mucosa (see Additional File 5). PCR was performed in triplicates under the following conditions: denaturation at 95 °C for 5 min followed by the determined number of cycles of denaturation at 95 °C for 15 s, annealing at 55 °C for 20 s and extension at 68 °C for 40 s. After the completion of the cycles, samples were left at 68 °C for 10 minutes for final extension and then cooled to 4 °C. Two PCR blanks were included in all PCR reactions with ultrapure water replacing the DNA template. To reduce the risk of contamination, pre- and post-PCR products were handled in two different laboratories designated for pre-PCR setup and post-PCR processing, respectively and PCR master mix solutions were prepared in a designated DNA template-free laboratory. PCR products were visualized using 2% agarose gel electrophoresis (GE) to check the amplification products quality and amount. All the samples in one PCR replicate have been pooled prior to library preparation. To reduce bias introduced by differential amplification between samples, PCR products were pooled at approximately equimolar ratios determined by gel band strength on the agarose gel. Extraction and PCR blanks were included in the pools for downstream quality filtering, but in a non-equimolar fashion to avoid excessive dilution. PCR replicates were purified through SPRI bead purification [37], with a beads-to-sample ratio of 1X, two washing steps in 0.5 ml of ethanol 80% and elution in 35 µl of EB Elution Buffer (10 mM Tris-HCl). DNA concentration measurement was performed with Qubit 3.0 (Thermo-Fisher Scientific), following the manufacturer's recommendations. We generated following the Tagsteady protocol [38] seven sequencing libraries, including three PCR replicates from the distal gut content samples, three PCR replicates from the distal gut mucosa samples and one library blank made of ultrapure water. Tagsteady is a PCR-free Illumina library preparation protocol specifically developed for metabarcoding studies to avoid false assignment of sequences to samples [34]. Indexed library quantification was performed using NEBNext® Library Quant Kit for Illumina® (NEB, New England Biolabs), following the manufacturer's recommendations. 300 bp paired-end sequencing of the amplified V3-V4 region of the 16S rRNA gene was performed at the Danish National High-Throughput Sequencing Center, University of Copenhagen, Denmark, using an Illumina MiSeq platform with reagent kit v3, 600 cycles.
Bioinformatic data processing
Raw reads were quality filtered and de-multiplexed prior to downstream analyses. Read quality was initially checked with FastQC [39] (Additional File 6). Sequences were subjected to trimming with AdapterRemoval [40]. Adapters were removed, together with low-quality bases (minquality = 28). Only sequences with a minimum length of 100 bp were retained. AdapterRemoval was also used to merge overlapping paired-end sequences, obtaining in this way sequences for the entire 16S rRNA gene V3-V4 region covered by the primers. Reads within each amplicon library were demultiplexed and filtered using Begum (https://github.com/shyamsg/Begum) a modified version of DAMe [41]. Singletons were removed and only sequences present in at least two out of three PCR replicates were maintained for downstream analyses. Merged read pairs were further filtered for their length, conserving only sequences with a length between 380 and 480 bp, discarding the unmerged reads.
Remaining sequences were then used to detect OTUs. OTU clustering was performed with SUMACLUST, using 97% similarity threshold [42]. The use of a higher clustering threshold didn’t affect the results and conclusion of the present study, however since it ensures an improved resolution, it’s utilization should always be taken into consideration [43]. Begum was used to convert the sequences in a suitable format for the clustering and to generate an OTU abundance table from the SUMACLUST output (Additional File 7). After the OTU table was generated, sequences were blasted using QIIME (version 1.9.1) against the NCBI nucleotide (nt) database for taxonomy assignment (Additional File 8).
Contaminants identification and removal
We identified and removed sequences originating from putative contaminants. To identify contaminants, all the DNA extraction and PCR amplification negative controls were included in the sequencing. The compositional profile of the negative controls was used to identify contaminants (Additional File 1 - Supplementary Fig. 2). To avoid removal of genuine OTUs present in the negative controls as a consequence of cross-contamination, all OTUs present in the negative controls were further investigated through BLAST search and, in some cases, phylogenetic analysis. Among all OTUs detected in the negative controls, only OTU2 (an unknown Mycoplasma genus) could be clearly assigned to fish gut or fish-related environments and was therefore retained. All other OTUs were treated as contaminants and filtered out.
For the unknown Mycoplasma genus a maximum likelihood (ML) phylogenetic tree was built (Additional File 1 - Supplementary Fig. 3). The sequence of the unknown Mycoplasma was blasted against NCBI database with blastx. A total of 28 sequences retrieved from GenBank were selected to be included in the tree, including 15 sequences obtained from fish gut or fish-related environments (such as fish farm sediments). Mycoplasma mobile, a fish gill pathogen that showed to diverge from all the other sequences, was included as an outgroup. The 29 sequences were aligned with MUSCLE [51] and the multiple alignment was used to build a ML neighbor-joining (NJ) phylogenetic tree with MEGA5 [52]. A model selection test was performed with MEGA5 to test for the optimal substitution model. The General Time Reversible (GTR) substitution model retrieved the lowest BIC scores (Bayesian Information Criterion) and was therefore considered the model that better described the substitution pattern. The ML-NJ-phylogenetic tree was then constructed using the GTR substitution model and 1000 bootstrap tests.
To further support the non-biological origin of the removed OTUs we utilized patterns of co-occurrence between the OTUs (Additional File 1 - Supplementary Fig. 4), their prevalence in low biomass samples (Additional File 1 - Supplementary Fig. 5) and support from previous literature on reagent contamination [44]–[46]. Remarkably, almost all the OTUs found in the negative controls corresponded to well-known reagent contaminants. Spearman correlation coefficient between OTUs (after rarefaction) was calculated in R (version 3.6.3) [47], [48] and plotted as a heatmap with the corrplot R package [49] (Additional File 1 - Supplementary Fig. 4). The qPCR based Ct values for each sample (Additional File 5) were incorporated in the analysis as a relative proxy for sample biomass to support identified contaminants. Since reagent contaminants are known to mostly affect samples with low microbial biomass [44]–[46], [50] we visualized this correlation by plotting the contaminants relative abundances (calculated as the sum of all the putative contaminants relative abundances) against samples Ct values using the “ggscatter” function (ggplot2 R package) [51]. We included OTUs recognized as genuine, OTU1 (Aliivibrio sp.) and OTU2 (Mycoplasma sp.), for comparison (Additional File 1 - Supplementary Fig. 5).
Data normalization and diversity analysis
After contaminants removal, also mitochondrial and chloroplast sequences were manually removed. Samples were then normalized by sub-sampling to a depth of 4,000 reads. All samples with less than 4,000 reads were discarded. In this way, 47 samples out of 160 were excluded. The remaining 113 samples were used for the subsequent analysis. Diversity analyses were conducted, using the hilldiv R-package [52] in Rstudio version 1.2.5033 [47], [48].
Microbial composition analysis
Stacked bar plots representing the microbial composition of the different groups or samples were generated using phyloseq [53], vegan [54] and ggplot2 [51] R packages. We used the gplots package “heatmap.2” function [55] and RColorBrewer [56] to create a heatmap representing OTU abundances among all eight groups. A heatmap dendrogram representing beta diversity among groups was calculated using Jaccard distance metric in vegan. We used Wilcoxon rank-sum test to assess differences in relative abundance of OTUs among groups of samples.
Correlation between microbiota and fish growth performances
To investigate potential correlations between the microbiota and fish growth performances we calculated the Spearman’s correlation coefficients between fish weight and the relative abundance of the most abundant OTUs (Aliivibrio sp. and Mycoplasma sp.). Spearman correlations were calculated and plotted using ggpubr [57] and ggplot2 R packages. Condition factor K (K value), a normalization of fish weight according to its length, was calculated for every fish (with parameter N = 5 as suited for salmonids) [58], and included in the analysis.
Fish were then grouped according to Aliivibrio sp. and Mycoplasma sp. relative abundances in their microbiota. Salmon with a percentage of Aliivibrio sp. higher than 80% (25 fish in total) were pooled together in one group and salmon with a percentage of Mycoplasma sp. higher than 80% (37 fish in total) were pooled in another group. All the other samples were discarded. After testing for prerequisites, we used a Welch's t-test to test for differences in the mean fish weight between the two groups.
Comparison of relative microbial biomass
To reveal variations in microbial biomass among groups we relied on qPCR Ct values. Ct values are inversely proportional to the amount of target DNA in the sample, meaning that in microbial metabarcoding studies the Ct value increases as the amount of microbial biomass in the sample decreases. The assumption is that the 16S rRNA gene can serve as a proxy for the actual abundance of microorganisms. A major deviation from this assumption can be introduced by differences in the 16S rRNA gene copy number among microorganisms. We used qPCR Ct values as a proxy for the samples microbial biomass and tested for differences in the qPCR Ct values among groups with an ANOVA coupled with a Tukey’s HSD post-hoc test for pairwise comparisons in R (version 3.6.3) [47]. To ensure that the observed differences were not an experimental artifact, we checked that our conclusions were robust even when accounting for differences in 16S rRNA gene copy number between Mycoplasma sp. and Aliivibrio sp., the two OTUs dominating our dataset (see below).
We utilized the information on EzBioCloud database (https://www.ezbiocloud.net/) (Accessed 28 July 2020) [59] to estimate the 16S rRNA gene copy number in Aliivibrio sp. and Mycoplasma sp. Median values for both genera were selected as an approximation of the real copy number value of our OTUs. Aliivibrio genus has a median value of 12, while Mycoplasma genus of 2. We then calculated a Ct value correction coefficient as log2(12/2) and, for each sample, we multiplied this value for the Aliivibrio sp. relative abundance and added the result to the sample Ct value:
Corrected Ct value = Ct value + (log2(a/m) * Aliivibrio relative abundance)
a = Aliivibrio genus median 16S rRNA gene copy number
m = Mycoplasma genus median 16S rRNA gene copy number
This correction does not account for changes in the OTUs relative abundances as a consequence of the different copy number, which, if included, would reduce the bias by reducing the Aliivibrio sp. relative abundance. Therefore, the applied correction has to be intended as a conservative approach to test the robustness of the Ct values based observations and demonstrate they are not a 16 rRNA gene copy number artifact.
Data and scripts used for the analysis are available at the GitHub repository: https://github.com/DavideBozzi/Bozzi_et_al_2020_analysis.