Background: High-throughput amplicon sequencing of marker genes, such as the 16S rRNA gene in Bacteria and Archaea, provides a wealth of information about the composition of microbial communities. To quantify differences between samples and draw conclusions about factors affecting community assembly, dissimilarity indices are typically used. However, results are subject to several biases and data interpretation can be challenging. The Jaccard and Bray-Curtis indices, which are often used to quantify taxonomic dissimilarity, are not necessarily the most logical choices. Instead, we argue that Hill-based indices, which make it possible to systematically investigate the impact of relative abundance on dissimilarity, should be used for robust analysis of data. In combination with a null model, mechanisms of microbial community assembly can be analyzed. Here, we also introduce a new software, qdiv, which enables rapid calculations of Hill-based dissimilarity indices in combination with null models.
Results: Using amplicon sequencing data from two experimental systems, aerobic granular sludge (AGS) reactors and microbial fuel cells (MFC), we show that the choice of dissimilarity index can have considerable impact on results and conclusions. High dissimilarity between replicates because of random sampling effects make incidence-based indices less suited for identifying differences between groups of samples. Determining a consensus table based on count tables generated with different bioinformatic pipelines reduced the number of low-abundant, potentially spurious amplicon sequence variants (ASVs) in the data sets, which led to lower dissimilarity between replicates. Analysis with a combination of Hill-based indices and a null model allowed us to show that different ecological mechanisms acted on different fractions of the microbial communities in the experimental systems.
Conclusions: Hill-based indices provide a rational framework for analysis of dissimilarity between microbial community samples. In combination with a null model, the effects of deterministic and stochastic community assembly factors on taxa of different relative abundances can be systematically investigated. Calculations of Hill-based dissimilarity indices in combination with a null model can be done in qdiv, which is freely available as a Python package (https://github.com/omvatten/qdiv). In qdiv, a consensus table can also be determined from several count tables generated with different bioinformatic pipelines.