We analyzed the microbiome of 54 surface sediment, 54 seston and 121 mussel gut samples collected from 18 sites located in the Tennessee and Mobile River Basins, USA (Table S1) through amplicon-based sequencing of the v4 region of the 16S rRNA gene. Sequence processing was performed through Mothur and DADA2, before performing rarefaction, if any, and computing diversity indices.
Structure and coverage of 16S rRNA datasets ASV and OTU based analyses resulted in significantly different numbers of biological units in non-rarefied datasets, with fewer ASVs (mean 303 ± 215 per sample) than 97% similarity defined OTUs (1,386 ± 1,026), and 99% similarity defined OTUs (1,708 ± 1,319) (Wilcoxon signed-rank tests, p < 0.001 in all comparisons). The abundance distribution of biological units was more even in the ASV dataset than in 99%-OTU dataset and 97%-OTU datasets (Fig. 1; Wilcoxon signed-rank tests p < 0.001 in all comparisons). This pattern was apparent across all sample types, although evenness was highest in the sediment, intermediate in seston and lowest in host-associated communities (Fig. 1; Kruskal-Wallis and post-hoc associated pairwise tests, p < 0.001 for all comparisons). Chao’s coverage, which assesses the completeness of community sampling, was approximately 100% in all ASV-based datasets, and was significantly lower in OTU-based datasets (Wilcoxon signed-rank tests, p < 0.001 in all comparisons, Figure-S1). For the OTU based datasets, Chao’s coverage index was the highest in the mussel gut microbiome samples (mean of 97 ± 3% in mussel microbiome), and lower in the seston (81 ± 7%) and sediment communities (72 ± 13%).
Figure 1: Effect of sequence processing methodology on the rank abundance of biological units. The relative abundance of ASVs (obtained from DADA2) and OTUs defined by 97% or 99% similarity (obtained from MOTHUR). Units were sorted according to their rank of abundance and represented separately for each community type (a: mussel gut microbiome, b: sediment, c: seston). Inserts focus on the 500 most abundant biological units. The evenness of the relative abundance of the biological units were computed using Bulla’s O and are displayed on each plot.
Alpha-diversity metric
All alpha-diversity metrics were distinct between sequence processing methods ( ASVs vs. OTUs vs.), at every rarefaction level (Wilcoxon signed-rank tests p < 0.001 in all comparisons, Fig. 2, Figure-S1). ASV-derived richness was roughly twice as low as 99%-OTU and 97%-OTU richness, which were closer to each other (99%-OTU richness was 1.1 times higher than 97%-OTU richness). The difference between ASV- and OTU-derived richness was even higher using iChao1 (4.8–5.3 higher in 97%- and 99%-OTUs datasets than in ASVs, respectively, Figure-S1) and ACE (3.5–4.1 times respectively, Figure-S1). However, differences between the approaches were lower in the case of the Shannon alpha-diversity index (1.4 and 1.7 times respectively, Fig. 2).
Rarefaction decreased all alpha-diversity estimates (Fig. 2 & Figure-S1; Wilcoxon signed-rank tests p < 0.001 in all comparisons). This decrease was lower for the ASV-based datasets than for those derived from OTUss (Figure-S2). For example, after rarefaction to 2,000 sequences per sample, the decrease in observed richness compared to before rarefaction was 63 ± 17% for 99%-OTUs, 59 ± 17% for 97%-OTUs, but only 22 ± 15% for ASVs (Figure-S2). The post rarefaction decline varied across indices, being the greatest for observed richness, ACE and iChao1 (respectively, 48 ± 25%, 43 ± 24% and 40 ± 24% across all datasets), and lowest for Shannon alpha-diversity (21 ± 17%).
Figure 2: Effect of sequence processing methodology on alpha-diversity metrics for microbiome analyses. Measure of taxonomic richness (expressed as number of OTUs or ASVs and Shannon alpha-diversity depending on the methodology used, namely ASV, 97%-OTU and 99%-OTU, at 3 different levels of rarefaction (1,000; 2,000 and 3,000 sequences per sample) within each sample type studied (a-c). To ease the visualization of differences across methods and rarefaction levels, the y-axis of plots has been log transformed. The effect of methodological choices on other alpha-diversity metrics is available in Figure S1.
The choice of ASV or OTU-based analysis also influenced how samples ranked in terms of alpha diversity. Correlations in sample ranking between ASVs and OTUs approaches were the lowest for richness indices that correct for sampling bias (iChao1 and ACE; Spearman’s r = 0.61–0.84 depending on index, rarefaction level and OTU threshold), intermediate for observed richness (0.80–0.85), and highest for Shannon diversity (0.90–0.97) (Table 1). Rarefaction increased the strength of these correlations, but only minimally (Table 1).
Table 1
Correlation between alpha-diversity metrics between OTU- and ASV-based datasets accross every rarefaction level tested. Correlations were tested using Spearman’s signed rank tests tests (all P values were < 0.001). Correlations showing poorer agreement between OTUs and ASVs are noted, i.e. <0.9 (bold) or < 0.8 (bold and underlined).
Diversity index | Rarefaction | ASVs vs. 97%-OTUs | ASVs vs. 99%-OTUs |
Richness | No rarefaction | 0.80 | 0.81 |
3,000 sequences | 0.81 | 0.81 |
2, 000 sequences | 0.85 | 0.85 |
1,000 sequences | 0.90 | 0.91 |
Shannon | No rarefaction | 0.95 | 0.96 |
3,000 sequences | 0.96 | 0.96 |
2, 000 sequences | 0.96 | 0.96 |
1,000 sequences | 0.96 | 0.97 |
iChao1 | No rarefaction | 0.62 | 0.61 |
3,000 sequences | 0.65 | 0.66 |
2, 000 sequences | 0.70 | 0.69 |
1,000 sequences | 0.76 | 0.78 |
ACE | No rarefaction | 0.70 | 0.69 |
3,000 sequences | 0.73 | 0.73 |
2, 000 sequences | 0.77 | 0.77 |
1,000 sequences | 0.84 | 0.84 |
In terms of sample type, observed richness and Shannon diversity were highest in the sediment, intermediate in the seston, and lowest in mussel host-associated communities, regardless of sequence processing method or rarefaction level (Kruskal-Wallis and associated pairwise post-hoc tests, p < 0.05 in all cases and for all comparisons, Fig. 2). However, more subtle biological effects, like that of river or site, varied with the sequence processing method and the alpha-diversity metric, with an intensity that was dependent on the type of the community (Fig. 3). While differences in Shannon diversity between rivers were consistent regardless of sequence processing method and sample type, patterns in observed richness were more variable. For sediment samples, richness patterns were generally stable to shifting from ASV to OTU-based approaches, with no significant differences in sediment microbiome richness between the six rivers. However, for seston and mussel host communities, the selection of an ASV or OTU-approach influenced the outcome as to whether any rivers were significantly different in richness or not (Fig. 3).
Figure 3: Comparison of observed richness and Shannon alpha-diversity across sampling rivers and sequence processing methods. For each community type, (a) mussel microbiome, (b) sediment and (c) Seston, the average alpha-diversity value per river were represented by the size and color intensity of the corresponding circle. Letters representing whether diversity values of different rivers were significantly different or not, were displayed (Kruskal-Wallis’ pairwise post-hoc tests, P < 0.05). For instance, ASV richness of mussel microbiome sampled on Bear Creek (‘c’) was significantly lower fromthan the one of mussels collected on river Buttahatchee (‘a’), but did not differ from the ones collected on Paint Rock (‘abc’). To facilitate comparison across panels, all metrics were scaled between 0 and 1.
Beta-diversity indices
Beta-diversity metrics computed with ASVs and OTUs were highly correlated (Spearman’s correlation values > 0.9), with the exception of U-Unifrac (< 0.81, Table 2). Rarefaction slightly increased agreement between ASV and OTU based beta-diversity metrics, for both OTU thresholds (Table 2). Rarefaction only had a low effect on Bray-Curtis and Jaccard indices, with a correlation > 0.9 between rarefied and unrarefied data and no effect on W-Unifrac (r = 1; Fig. 4 and Figure-S3). Rarefaction did, however, have a great influence on U-Unifrac values computed on OTU datasets (r = 0.82–0.87, Figure-S3).
Table 2
Correlation between beta-diversity metrics between OTU- and ASV-based datasets across every rarefaction level tested. Correlations were assessed using Mantel tests based on Spearman’s correlation coefficient, which is indicated for each correlation (all P values were < 0.001). Correlations < 0.9 are highlighted in bold, showing poorer agreement between OTUs and ASVs.
Beta-diversity index | Rarefaction level | 97%-OTUs vs. ASVs | 99%-OTUs vs. ASV |
Jaccard | No rarefaction | 0.91 | 0.96 |
3,000 | 0.93 | 0.96 |
2,000 | 0.94 | 0.97 |
1,000 | 0.94 | 0.97 |
Bray-Curtis | No rarefaction | 0.92 | 0.96 |
3,000 | 0.93 | 0.96 |
2,000 | 0.94 | 0.97 |
1,000 | 0.94 | 0.97 |
U-Unifrac | No rarefaction | 0.80 | 0.82 |
3,000 | 0.83 | 0.89 |
2,000 | 0.84 | 0.90 |
1,000 | 0.86 | 0.91 |
W-Unifrac | No rarefaction | 0.93 | 0.95 |
3,000 | 0.94 | 0.95 |
2,000 | 0.94 | 0.95 |
1,000 | 0.93 | 0.95 |
Figure 4: Change in Bray-Curtis and W-Unifrac beta-diversity after rarefaction to 2,000 sequences per sample within the ASV-, 97%- and 99%-OTUs datasets across all sediment, seston and mussel microbiomes. The agreement of ranks of beta-diversities before and after rarefaction were assessed using a Mantel test based on Spearman’s coefficient of correlation, which result is indicated over each plot. The diagonal is marked by the red line. The same correlations on other beta-diversity metrics (Jaccard and U-Unifrac) is available in Figure-S3.
The selection of beta-diversity index had the strongest impact on detecting a biological signal (i.e. type of community, sampling river and sampling site). Random Forests tests estimated that the choice of beta-diversity index had a 7-15x higher contribution in detecting a biological signal than the choice of ASV vs. OTU pipeline, and both had greater contributions than rarefaction (Figure-S4). For instance, PERMANOVA R-square values varied by a six-fold factor across different beta-diversity indices within the same dataset and biological factor (Figure-S5), while the choice of ASVs vs. OTUs dataset only varied up to 3-fold, with all else being equal.
W-Unifrac was the most reliable index for detecting biological signals across different community types (PERMANOVA’s R2 = 0.42 ± 0.0), for detecting the river effect (R2 = 0.32 ± 0.14), and for the site effect (R2 = 0.59 ± 0.24) (Figure-S5). The second most efficient index was Bray-Curtis, with significant signals in the same factors (R2 = 0.24 ± 0.03, 0.34 ± 0.14, and 0.57 ± 0.21, respectively). Bray-Curtis surpassed W-Unifrac for detecting the effect of river in ASV and 99%-OTU datasets (Figure-S5).
97%-OTUs were slightly better at detecting a difference of structure across community types (PERMANOVA’s R2 = 0.26 ± 0.10 vs. ASVs R2 = 0.24 ± 0.11 across all indices and rarefaction levels, Wilcoxon signed-rank test, p < 0.05, Fig. 5). In contrast, when detecting river and site effects, while ASVs, 97%-OTUs and 99%-OTUs gave similar detection ranges in sediment and mussel microbiomes (River effect R2 = 0.24 ± 0.04 and 0.16 ± 0.04, respectively), ASVs performed better within seston communities (River effect R2 = 0.40 ± 0.12 in OTUs vs. 0.47 ± 0.12 in ASVs, Fig. 5). Usage of 99%-OTUs did not improve the detection of any biological effect compared to 97%-OTUs (Wilcoxon signed-rank test between 97%- and 99%-OTU datasets, p > 0.05).
Figure 5: Quality of detection of biological signal across sequence processing methods and rarefaction levels. The intensity of the effect of (a) community type, (b) sampling river and (c) collection site on community structure was assessed using separated PERMANOVAs for each factor, and each combination of sequence processing method, rarefaction level and index of dissimilarity. Here results are aggregated for all dissimilarity indices computed for this study (Jaccard, Bray-Curtis, and U- and W-Unifrac); comparison of the efficiency of the different indices for detecting the same biological effects is provided in Figure-S4.
Composition
While Gammaproteobacteria (12.0 ± 9.0%), Cyanobacteria (11.1 ± 12.4%), Alphaproteobacteria (10.0 ± 8.0%) and Planctomycetes (15.7 ± 15.3%) were the main classes identified in all datasets and rarefaction levels used (Fig. 6), several major classes of Bacteria had their relative abundance computed on OTUs poorly correlated with abundance computed on ASVs. These classes included Bacilli (Spearman’s r = 0.34 ± 0.21), Clostridia (r = 0.66 ± 0.28), Bacteroidia (0.75 ± 0.13), and Actinobacteria (0.81 ± 0.17). Other major classes presented a higher reproducibility across methods (0.83 ± 0.09). The class Verrucomicrobiae, which was dominant in OTU datasets (8.7 ± 14% of relative abundance), was detected at much lower level within ASVs, where it wasn’t among the most common classes (1.7 ± 0.6%) (Fig. 6). Conversely, the ASV pipeline characterized Mollicutes within the mussel gut microbiome (13.4 ± 13%), which remained completely undetected in the OTU datasets (Fig. 6).
Figure 6: Composition of mussel gut microbiome, sediment and seston communities found using OTU-based and ASV-based methodologies. (a) The relative abundance of the 11 most common bacterial classes in datasets rarefied to 2,000 sequences/sample. (b) Heatmap representing agreement of the 20 most detected bacterial genera across methodologies and rarefaction levels. Numbers represent the percentage of those 20 genera in common across the compared treatments (’shared’); color intensity represents overall Spearman’s correlation coefficient of the shared genera between 0treatments.
Differences across methods were more pronounced at the genus level, with only ~ 50% of the most common 30 genera shared between ASV and OTU-based datasets (excluding genera belonging to unclassified families), and correlations of the relative abundance of such shared genera generally below 0.9 (Spearman’s correlation Fig. 6; Supplementary Table 2). The 30 most detected genera unique to ASV datasets included Methyloglobus, Escherichia/Shigella and Clostridium sensus stricto (Supplementary Table 2). Compared to the differences between ASV and OTU methodology, rarefaction and OTU similarity threshold had only a minor impact on overall class- or genus-level classification, with, for instance, a correlation of > 0.9 of abundance of the major classes and genera between unrarefied and 2,000 sequence datasets, and between 99%- and 97% OTUs. As previously observed, with alpha- and beta-diversity indices, rarefaction had a lower impact on ASV- than OTU-based datasets, with all most detected genera shared before and after rarefaction (Fig. 6).