Fish husbandry and harvest
Fish were produced at the NCCCWA as a part of our ongoing selective breeding program. Briefly, full-sibling families (ARS-FY-H = 99; ARS-FY-L = 23) were produced from single-sire × single-dam mating events. Each family was reared separately from hatch through approximately 30 g (4 months post-hatch) when 8 fish per family were anesthetized (100 mg/L tricaine methanesulfonate, M-222) and uniquely tagged by inserting a passive integrated transponder (Avid Identification Systems Inc., Norco, CA) into the peritoneal cavity. After tagging, fish were comingled and reared in a single, 1,800-L tank that also housed contemporary fish (n = 118) from the ARS-FY-C line. The tank received identical water from a partial re-use system and water temperature was maintained at approximately 13 degrees C for the entirety of the grow-out period. Fish were split at random into a total of two 1,800-L and two 3,800-L tanks as they grew to maintain biomass densities below 100 kg/cubic meter. At approximately 13 months of age, fish used in the current study were split into three replicate 800-L tanks (n = 46 fish per tank). One week before harvest, the fish were split at random into four 800-L tanks (34–35 fish per tank) to allow the harvest of two complete tanks on each of two successive days and thus minimize netting-associated stress associated with harvesting of a partial tank. Fish were fed a commercial diet (Finfish G, 42% protein, 16% fat; Ziegler Bros Inc., Gardners, PA) using automatic feeders (Arvotec, Huutokoski, Finland) that provided feed at a daily ration that was considered as slightly below satiation.
At approximately 15 months post-hatch, fish were euthanized using an overdose of anesthetic (300 mg/L MS-222) and processed for analysis of the muscle yield trait. Fish were not fed the day prior to and the day of harvest. Twenty families were pre-selected from the ARS-FY-H and ARS-FY-L lines (40 families total within four tanks; n = 1 fish per family) for fecal collection at harvest; selection was based on divergent mid-parent breeding values and to maximize genetic diversity within each line. Due to a mortality of a ARS-FY-H fish, 39 fecal samples were collected for this study, 19 from the ARS-FY-H genetic line (11 and 8 samples from the first and second harvest dates, respectively) and 20 representing the ARS-FY-L line (13 and 7 samples from the first and second harvest dates, respectively). Fecal samples were stripped manually into sterile Eppendorf tubes (Eppendorf, Hauppauge, NY), then stored in a -80 °C freezer until analysis. Fish were eviscerated and the carcasses were placed on ice and held in a 4 °C refrigerator overnight for analysis of muscle yield the following day.
DNA extraction, library preparation and sequencing
To extract DNA, fecal samples from 19 ARS-FY-H and 20 ARS-FY-L fish were subjected to DNA isolation using a Promega Maxwell DNA Isolation Kit (Promega Corporation, Madison, WI), as we previously described [25] with a minor modification where 20 µL of lysozyme was added in samples to facilitate cell wall lysis. Briefly, 200 mg of fecal sample was added to a microtube containing 160 µL of incubation buffer, 20 µL proteinase k solution, and 20 µL lysozyme. The mixture was incubated at 70 °C overnight, and after incubation, 400 µL of lysis buffer was added to the mixture and the sample was vortexed briefly. The samples were then subjected to the Maxwell 16 Automated DNA purification machine and the DNA was collected in a 50-µL elution buffer.
Library preparations and sequencing were done based on 16srDNA sequencing strategy using the Illumina 16S Metagenomic Sequencing Library Preparation Guide. Briefly, 10 µM of 515F and 10 µM of 806R primers amplifying V4 regions were used to target 16srRNA gene using McLAB HiFi master mix using polymerase chain reaction (PCR). The final PCR reaction consisted of 12.5 µL 2x HiFi, 1 µL of 10 µM 515F primer and 1 µL of 10 µM 806R primer, 5 µL DNA and 5.5 µL sterile nuclease-free water. The PCR product was then subjected to size selection using a magnetic bead capture kit (Ampure; Agencourt). After the first PCR clean up, dual indexed primers were used to amplify the V4 region as described by Kozich et al. [44]. After indexing, samples were again size selected using a magnetic bead capture kit (Ampure; Agencourt). PCR products were quantified after amplification and indexing using a Qubit fluorometer (Invitrogen, Carlsbard, CA) and fragment size (approximately 450 bp) was visualized on a 1.5% gel electrophoresis stained with SYBER safe, then samples were normalized to 4 nM. Samples were loaded onto an Illumina MiSeq flow cell and sequencing was done using 250 bp-paired end sequencing using a 500 cycle V2 reagent cartridge (Illumina, Inc., San Diego, CA) according to the manufacturer’s instructions (Miseq System Guide) [45].
Bioinformatics Analysis
A total of 28,518,046 paired-end raw sequences were obtained during the Miseq run. Sequencing data were analyzed using Mothur (v.1.40.2, www.mothur.org) according to the Mothur Illumina Miseq standard operating procedure (SOP) [44, 46] with several modifications. After forming contigs, the total number of sequences was 11,020,368, and pcr.seqs command was used to trim primers and adaptors to the V4 region. The median length of the sequences was determined as 253 by using the summary.seqs command [47]. Screen.seqs command was used to remove sequences with length > 254 bp and < 251 bp containing homopolymers of > 8, and with ambiguous base calls. The split.abund command was used to keep sequences with more than two reads [48]. The SILVA v123 database [49] was used to align the sequences and sequences that failed to align, or classified as Archaea, chloroplast, eukaryotic mitochondrial, or unknown sequences were excluded from the analysis. Chimeric sequences were detected by chimera.vsearch and removed from the analysis. The remaining sequences were clustered using cluster.split [50] at a threshold of > 97% sequence similarity. Operational Taxonomic Units (OTUs) with relative abundance < 10 across all samples were removed from the analysis by using the remove.rare command [51, 52]. The final data set was subsampled at 2420 sequences to normalize the data set for statistical analyses. DNA extraction and library preparation blanks were included during sequencing and bioinformatics, and all OTUs within these samples were removed from the final analysis. The code used during bioinformatics analysis, taxonomy file, and shared file are all included in additional files 2, 3, and 4, respectively.
Statistical analysis
To test for the statistically significant differences of the muscle yield between the two groups, a one-way Mann-Whitney U test (Prism, GraphPad Software, Inc., La Jolla, CA) was performed. Statistical analyses (Alpha diversity, Beta diversity, microbial functional profiling pathways) were performed in R version 3.5.2 using the packages vegan [53] plyr [54] dplyr [55], ggplot2 [56], lmerTest [57], pheatmap, MuMIn [58], lme4 [59], Tax4Fun [60], DEseq2 [61], rcompanion [62], grid [63], and TidyVerse [64].
Alpha and Beta diversity analysis of fecal samples between high- and low- muscle yield genetic lines
Sixteen fecal samples (that passed QC during bioinformatics analysis) from the ARS-FY-H and 18 fecal samples (that passed QC) from ARS-FY-L were used for this analysis. A Tukey’s ladder of power transformation was performed to fit inverse Simpson values to a Gaussian distribution. Alpha diversity between the genetic lines was compared using a linear mixed-effect model (LMMs) with the genetic line as a fixed effect and harvest day set as a random effect (package lme4) [65].
Beta diversity was calculated to test if the gut microbiota was predictive of muscle yield genetic lines. To do this, a Bray-Curtis dissimilarity matrix was generated using the vegdist function in the Vegan package [66]. The betadisper function in Vegan was used to test for the homogeneity of multivariate dispersion between gut assemblages from ARS-FY-H and ARS-FY-L genetic lines. The metaMDS function in Vegan was used to generate non-metric multidimensional scaling ordination (nMDS) values, which were then plotted using ggplot2 [67]. The adonis function in Vegan was used to perform PERMANOVA on Bray-Curtis dissimilarity values to determine if the genetic line was predictive of gut assemblages while controlling for a harvest day effect (harvest day as strata, 999 permutations). An indicator species analysis was performed in Mothur to determine the microbial assemblages that were explanatory of muscle-yield genetic lines [29]. In addition, to determine the significant differences in relative abundances of taxa between the genetic lines, Kruskal test was performed. The nMDS ordination showed a pattern suggesting a ‘harvest day’ based effect; therefore, we subset our samples into two data frames based on independent harvest days. Both data frames had nearly equal numbers of gut microbial samples from the two genetic lines (10 ARS-FY-H and 11 ARS-FY-L - in harvest day 1, and 6 ARS-FY-H and 7 ARS-FY-L- in harvest day 2). Separate Bray-Curtis dissimilarity matrices were generated for each data frame, followed by nMDS ordination values calculated and plotted in ggplot2. PERMANOVA was used to test for differences in microbial assemblages with genetic line set as a fixed effect.
Functional annotation of 16srRNA sequence data
To investigate the microbial functional and metabolic capacities of the microbial assemblies, phylotype based OTU clustering and classification was performed using the phylotype command in Mothur. The shared file was then converted to the biome format using the make.biom command in Mothur. The Tax4FUN package in R was used to predict the microbial functional and metabolic capacities by linking 16srRNA gene-based taxonomic profiles to KEGG metabolic reference profile references [60]. The normalized KEGG pathway output was used to investigate the enrichment of microbial pathways between the genetic line samples using DESeq2. Important pathways associated with host-microbiome interactions and with an average FTU score of 0.55 and an adjusted p-value less than 0.001 were selected for heatmap visualization using the pheatmap R package[68]. The R code used during the analysis have been included in Additional file 5, and statistical results for all analysis are included in Additional file 6.