Behavior of Hill-based dissimilarity indices and the qRC null model
Count tables from microbial community surveys typically consist of a few highly abundant OTUs/ASVs and many low-abundant ones. Using a highly simplified count table (Table S1, supplementary material), we demonstrate how the Hill-based dissimilarity indices behave in comparison to the Jaccard and Bray-Curtis indices, which are more commonly used in microbial community studies.
First, let us consider the situation when samples have equal richness, i.e. the same numbers of detected species (Fig. 1A). Four samples (S0, S1, S2, S3) each have 2 abundant, 4 intermediate, and 8 rare species. Samples S0 and S1 share 1 abundant, 2 intermediate, and 4 rare species. As expected, the Hill-based dissimilarity (qd) between S0 and S1 is 0.5 for all values of q. Sample S0 and S2 share half of the rare and intermediate species, but none of the abundant species and consequently qd goes towards 1 as q increases. Samples S0 and S3 share all intermediate species, but only 1 of the abundant and 1 of the rare, and consequently we see a valley in the qd vs q curve. When the samples have the same species abundance distributions, the Bray-Curtis dissimilarity is identical to 1d. However, for S4, which has the same richness as S0 but a different species abundance distribution, the Bray-Curtis index is different from 1d.
Second, let us consider the situation when samples have unequal richness (Fig. 1B). Samples S5-S7 have only two species each. In S5, those two species are the same as the most abundant ones in sample S0 and consequently, qd decreases with increasing q. In S6, the two species are the same as two intermediates in S0 and we can see a valley in the curve. In S7, the two species are the same as two rare ones in S0 and the dissimilarity increases with q. The Bray-Curtis index shows a different behavior. For S0-S5, Bray-Curtis is equivalent to Hill-based dissimilarity with a low diversity order (q) of 0.52 and for S0-S6 and S0-S7 it is equivalent to diversity orders (q) much higher than 2.
Using the qRC null model, we can compare the observed dissimilarity between two samples to the expected dissimilarity if the two sampled communities had been randomly assembled from a regional species pool. In Fig. 1C-D, the sample pair S0-S3 is used as an example. For values of q close to 0, the observed dissimilarity is higher than the null expectation and consequently 0RC is 1. For higher values of q, the observed dissimilarity is close the null expectation and consequently the qRC values are intermediate, i.e. neither close to 0 or 1 (Fig. 1D). For this theoretical example, it means that if we consider the common species (q≈1), the observed dissimilarity could be explained by random assembly of the two communities from the regional species pool but if we give equal weight to all species (q≈0), the observed dissimilarity is higher than we can expect from a random assembly process.
Effects of bioinformatics choices on count tables
Samples collected from two experiments (AGS and MFC) were sequenced in two separate sequencing runs. The sequences were processed using DADA2 version 1.10 [36], Deblur version 1.04 [37], USEARCH version 10 [38], and Mothur version 1.41 [39] with various settings, resulting in 11 count tables for each experiment (Text S3, supplementary material). In USEARCH, either the UNOISE pipeline was used to generate ASVs or UPARSE was used to generate OTUs with a 97% OTU-clustering threshold.
Fig. 2 shows information about the generated count tables, including how the sequences were processed and the numbers of reads and OTUs/ASVs. Sequence reads were processed sample-by-sample (separate) or by pooling the reads in all samples (pooled). Relaxed quality filtering thresholds resulting in a higher number of OTUs/ASVs (high) or stringent settings resulting in fewer OTUs/ASVs (low) were used in some of the pipelines. Deblur resulted in the lowest number of ASVs/OTUs in both the AGS- and MFC experiments and Mothur resulted in the highest. There was a large span in the number of inferred OTUs/ASVs for different pipelines. In the AGS experiment the number ranged from 690 to 4055 and in the MFC experiment the span was 1800 to 6457. During analysis of dissimilarities, all count tables were rarefied (without replacement) to the read count in the sample with the lowest number of reads. This resulted in count tables with 223 692 to 321 060 reads per sample in the AGS data set and 14 896 to 35 680 in the MFC data set (Fig. S1, supplementary material). The order of magnitude difference in reads per sample for the two data sets was caused by a higher number of total reads, a lower number of samples, and a more even sequencing depth in the AGS data set. Rarefaction curves for each count table are shown in Fig. S2-S3 (supplementary material). For count tables with stringent quality filtering thresholds and relatively few OTUs/ASVs, the rarefaction curves reach a horizontal asymptote whereas increasing numbers of OTUs/ASVs with increasing sampling depth is observed for some count tables.
For each data set, the DADA2 and UNOISE tables having the highest ASV counts were used to generate a consensus table, which only contained ASVs detected by both pipelines. A function for generating a consensus table from an unlimited number of count tables was implemented in qdiv and is described in Text S4 (supplementary material). For the AGS data set, the consensus table was inferred from count tables #3 and #6. They together had 2041 unique ASVs, but only 919 of these were found in both and used in the consensus table. For the MFC data set, count tables #1 and #6 were used as input for the consensus table. They together had 5952 unique ASVs out of which 2489 SVs were found in both and used in the consensus table. For both data sets, most of the reads were associated with the ASVs that were kept in the consensus tables (99.7% for the AGS data and 97.1% for the MFC data).
Assessing choice of pipeline and dissimilarity index by analysis of replicates
Both the AGS and MFC samples contained microbial community replicates, which means that DNA was extracted in parallel from six aliquots of biomass collected from the same microbial community (e.g. the same AGS reactor or the same MFC biofilm). The MFC samples also contained one set of technical replicates, which in this study means that the same DNA extract was processed in six separate PCR reactions followed by sequencing of the six separate PCR products.
The diversity order (q) of the dissimilarity index had a strong effect on the dissimilarity between replicates. The highest dissimilarity was observed for incidence-based indices (0d and Jaccard). The dissimilarity decreased with increasing diversity order. Overall, the technical replicates had lower dissimilarity than the community replicates (Fig. 3A). The difference between technical- and community replicates was statistically significant for 6 out of 12 count tables for 0d and 9 out of 12 tables for 1d (p < 0.05, Welch’s anova with Holm-Bonferroni correction for multiple comparisons), but not for any of the count tables at 2d. Different count tables showed different dissimilarity between community replicates. For the AGS samples, the Mothur table (#11) showed significantly higher dissimilarity than all the other tables for 0d, and the UPARSE separate table (#10) showed significantly higher dissimilarity than all the others for 1d (p < 0.05). For 2d, there was no significant difference between the tables (Fig. 3B-D). The consensus table had lower dissimilarity between replicates than the two count tables used to generate the consensus table. In the AGS data set, the 0d between replicates dropped from 0.19±0.03 and 0.14±0.01 with count tables #3 and #6, to 0.12±0.01 with the consensus table (p < 0.05). The dissimilarity between replicates for both the AGS- and MFC data sets and several dissimilarity indices and pipelines are shown in Fig. S4-S5 (supplementary material).
Dissimilarity matrices generated with different pipelines and indices
Having seen differences in the number of OTUs/ASVs and read counts (Fig. 2), as well as in the dissimilarity between replicates for count tables generated with different bioinformatics pipelines (Fig. 3), we asked if dissimilarity matrices calculated from the count tables would show the same patterns in the data. Matrices of pairwise dissimilarities between samples are typically used to explore microbial communities. First, we investigated the similarity of dissimilarity matrices generated using different indices and count tables. The matrices were compared using the Mantel test with the Pearson correlation coefficient (r) or the Spearman rank correlation coefficient (ρ) as test statistic. For all pairwise comparisons between dissimilarity matrices, there was a statistically significant correlation (p=0.001, 999 permutations), and the correlation coefficients ranged from 0.70 to 1.00. However, although the dissimilarity matrices were highly correlated with each other, there were some differences. To visualize these differences, a principal coordinate analysis (PCoA) of the dissimilarities (measured as 1-r) between dissimilarity matrices was carried out (Fig. 4). Each point in Fig. 4 represents a dissimilarity matrix calculated from a rarefied count table. The matrices tended to separate by index. Incidence-based indices (0d and Jaccard) were more scattered and clearly separated from relative-abundance based indices (1d, 2d, Bray-Curtis). For the AGS samples, the count tables generated by Mothur also separated from the other count tables for the incidence-based indices.
Resolution power of different count tables and dissimilarity indices
The ability of different dissimilarity indices and count tables to distinguish between sample groups in the experimental data was also tested. The AGS data set was more challenging than the MFC data set because most taxa were shared between different samples. Therefore, the AGS data set with the three sample categories, the inoculum, reactor 1 (R1), and reactor 2 (R2), was used in the analysis. The F-statistic is the ratio of between-group variability and within-group variability. Dissimilarity matrices resulting in the calculation of a high F-statistic are thus better at resolving differences between sample groups. Fig. 5 shows that dissimilarity matrices generated with the 1d and 2d indices resulted in higher F-statistic than those generated with the Bray-Curtis index, which in turn resulted in higher F-statistic than those generated with the incidence-based indices. High dissimilarity between replicates, which was observed for the incidence-based indices (Fig. 3), would result in lower F-statistic. Despite large differences in the F-statistic, statistically significant separation between the three sample groups was found with all count tables and dissimilarity indices (permanova, p=0.001, 999 permutations). An example of a PCoA showing the separation between sample groups is shown in Fig. S6 and F-statistics for all pipelines are shown Fig. S7 (supplementary material).
Effects of pipelines and indices on hypothesis testing
AGS experiment
In the AGS experiment, we hypothesized that R1 and R2 diverged from the inoculum to the same extent after 150 days of operation since they were operated under identical condition. Thus, the dissimilarity between the inoculum and R1 should be the same as between the inoculum and R2. The results based on two count tables are shown in Fig. 6. For high diversity orders (q ≥ 0.4), both show larger dissimilarity between the inoculum and R2 than between the inoculum and R1. For low diversity order, count table #3 (Fig. 6A) still shows the same results whereas count table #7 (Fig. 6B) shows higher dissimilarity between the inoculum and R1. The results from all count tables with Hill-based, Jaccard, and Bray-Curtis dissimilarity indices are shown in Fig. S8-S9 (supplementary material). All count tables showed that if q ≥0.6, the inoculum had significantly higher dissimilarity with R2 than with R1 (p < 0.05, Welch’s anova). The Bray-Curtis dissimilarity showed the same. However, the results varied with the Jaccard index and with q<0.6d.
MFC experiment
In the MFC experiment, we compared microbial communities of biofilms growing on anodes (conductive surface functioning as electron acceptor) with biofilms growing on non-conductive porous separators. We hypothesized that biofilms growing on conductive and non-conductive surfaces would be more dissimilar to each other in the acetate-fed MFC than in the glucose-fed MFC. Glucose is a fermentable substrate and fermentative microorganisms should be able to grow anywhere within the MFCs, leading to a more homogenous microbial community structure. Acetate, on the other hand, is non-fermentable and the microbial communities in an acetate-fed MFC are therefore dependent on electron acceptor availability. On the anode surface, the anode serves as electron acceptor while in other locations within the MFCs, the microorganisms must use soluble compounds such as oxygen diffusing in through the gas-diffusion cathode. Microbial communities in different locations of the acetate-fed MFCs should therefore have different metabolisms, which likely leads to higher dissimilarity than between communities within the glucose-fed MFCs which, at least partly, could have the same metabolism, namely fermentation [28].
For high diversity orders, (q > 0.8), most count tables showed higher dissimilarity in the acetate-fed MFC than in the glucose-fed MFC. For low diversity orders, the glucose-fed MFC had higher dissimilarity (Fig. S10-S11, supplementary material). For the MFC data set, count tables generated with different bioinformatics pipelines showed quite different results. Particularly tables #1 and #2 generated with DADA2 run in pooled mode had much lower difference between the two types of MFCs than the other count tables. An example is shown in Fig. 7.
Null model
Null models were used to aid in the interpretation of dissimilarity values. An example from the AGS experiment is shown in Fig. 8A-C. The dissimilarity between the inoculum and R1 is not significantly different from the null distribution at any diversity order and consequently qRC is close to 0.5. For the inoculum and R2, the observed dissimilarity is higher than the null expectation, showing a higher degree of compositional changes in the microbial community.
For the MFC data set, we choose to run the null model for two count tables because the dissimilarity between the two types of biofilms in the acetate-fed MFC had a different shape when the results were based on count tables #1 and #2 compared to the other count tables (Fig. S10, supplementary material). The results from the null model analysis are shown in Fig. 8D-I. At a diversity order of 0, the observed dissimilarity is similar to the null expectation and consequently qRC is close to 0.5. This indicates that if we only care about detection of ASVs, there is a random distribution between the two biofilm communities. With increasing emphasis on relative abundance, the dissimilarity between biofilm types is higher than the null distribution. For the acetate-fed MFCs, the qRC values are close to 1, which means significant compositional differences between the two communities. For the glucose-fed MFCs, the qRC again drops to lower values at a diversity order above 1. This means that some of the most abundant ASVs are shared between biofilms growing on conductive and non-conductive surfaces. This indeed turned out to be the case with a Trichococcus sp. being highly abundant in both biofilm communities, likely carrying out fermentation in both places [28].