Our comparisons revealed 16 phyla shared among all technologies and another 13 phyla found with only one or two of the technologies; all of the latter belonged to the rare biosphere, i.e., taxa with low read abundances (Fig. 1A, B, C, Table S1). An earlier study showed that less than 50% of phyla were detected with metagenomes in comparison to amplicons (Tessler et al. 2017), while other studies demonstrated the exact opposite (Poretsky et al. 2014; Guo et al. 2016; Tessler et al. 2017). However, comparing short reads to public databases is problematic; e.g., 150-bp reads cover only approximately 15% of the average prokaryotic gene (Xu et al. 2006), leading to inherent biases in taxonomic calling. Interestingly, the relative abundance of individual phyla varied greatly between the technologies, suggesting biases arising either from PCR amplification or from measuring only active community members in metatranscriptomes (Fig. 1A, B, C). Ordination analyses coupled to Mantel tests based on Bray-Curtis dissimilarities and Spearman rank correlations clearly showed a greater similarity in taxon composition between metatranscriptomes and metagenomes (rs = 0.4122, p = 0.001) and between metatranscriptomes and amplicons (rs = 0.1928, p = 0.018) than between metagenomes and amplicons (rs = 0.04281, p = 0.309) (Fig. 1D). The main differences between metagenomes and amplicon sequencing data were attributable to different abundances of phyla that recruited low to average numbers of reads, as reported previously (Poretsky et al. 2014). Proportional over- or underrepresentation of phyla in amplicons relative to metagenomes was analyzed using pairwise t-tests on paired samples. The analysis revealed a proportionally greater detection of Bacteroidetes, Alphaproteobacteria and Cyanobacteria and a lower detection of Betaproteobacteria, Planctomycetes and Actinobacteria in amplicons (Supplemental Material 1).
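To illustrate the Mantel comparison reported above, the following is a minimal Python sketch, assuming a one-sided permutation test on Spearman correlations of Bray-Curtis dissimilarities. It is not the code used in this study; all function names and the toy counts are hypothetical and only show the mechanics of the test.

```python
import random
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def distance_matrix(samples):
    """Square Bray-Curtis matrix for a list of per-sample abundance vectors."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]


def ranks(values):
    """Average ranks (tied values share their mean rank), as used by Spearman."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mantel(d1, d2, permutations=999, seed=0):
    """Mantel statistic and one-sided permutation p-value (assumed design)."""
    n = len(d1)
    pairs = list(combinations(range(n), 2))  # upper triangle only
    v1 = [d1[i][j] for i, j in pairs]
    observed = spearman(v1, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    idx = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(idx)  # permute rows and columns of d2 together
        permuted = [d2[idx[i]][idx[j]] for i, j in pairs]
        if spearman(v1, permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Invented phylum counts for five samples, profiled with two technologies.
    metagenome = [[40, 10, 5], [35, 12, 8], [5, 30, 20], [8, 28, 25], [20, 20, 20]]
    amplicon = [[50, 5, 2], [30, 20, 5], [10, 25, 30], [4, 35, 20], [25, 15, 18]]
    r, p = mantel(distance_matrix(metagenome), distance_matrix(amplicon))
    print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Rows and columns are permuted jointly because the entries of a distance matrix are not independent, which is why an ordinary correlation test on the matrix entries would not be valid here.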
While relative abundances of assembled metagenomes have been shown to significantly correlate with quantitative digital droplet PCR measurements (Probst et al. 2018), amplicon data suffer from primer bias (Probst et al. 2015; Starke and Morais 2019), amplification biases (Acinas et al. 2005), chimeras (Haas et al. 2011) and variable numbers of 16S rRNA gene copies per genome (Farrelly et al. 1995). In direct abundance correlations, only the low-abundance Gemmatimonadetes and the fairly highly abundant Betaproteobacteria showed agreement between the two technologies, i.e., amplicons and metagenomes (Figures S1, S2). To overcome potential biases in amplicon data in silico, we corrected the amplicon dataset for 16S rRNA gene copy number and for the specific primer bias by using the weighted primer score. As evidenced in Figure S3 and Table S2, correcting relative abundances by gene copy number and weighted primer score (Figures S4, S5) did not result in a significant shift of community composition. Thus, we conclude that either amplification biases or the suitability of the chosen hypervariable region for taxonomic calling (Yang et al. 2016) are the major components distorting the community profile derived from amplicon data.
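The gene copy number correction mentioned above can be sketched as follows. This is a minimal Python illustration under the assumption that read counts are divided by a per-taxon 16S rRNA gene copy number and then renormalized; the function name, the taxa and all numbers are hypothetical. The weighted primer score step is left out because its exact form is not given in the text.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Return copy-number-corrected relative abundances per taxon.

    read_counts:  dict taxon -> amplicon read count
    copy_numbers: dict taxon -> assumed 16S rRNA gene copies per genome
    """
    # Taxa with many 16S copies per genome are overcounted by amplicons,
    # so each count is scaled down by its copy number.
    adjusted = {
        taxon: count / copy_numbers[taxon]
        for taxon, count in read_counts.items()
    }
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("no reads left after correction")
    # Renormalize so the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in adjusted.items()}


if __name__ == "__main__":
    # Invented example: equal read counts, different copy numbers.
    counts = {"taxon_A": 100, "taxon_B": 100}
    copies = {"taxon_A": 1, "taxon_B": 4}
    for taxon, share in correct_for_copy_number(counts, copies).items():
        print(f"{taxon}: {share:.2f}")
```

Because the correction only rescales each taxon by a constant, it can change relative abundances but cannot recover taxa that the primers failed to amplify.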
The active community reflected by metatranscriptomes was significantly similar to communities assessed with metagenomes (Fig. 1), although the metatranscriptomic data were based only on transcribed 16S rRNA genes. Relating the relative abundance of a given taxon detected with metatranscriptomes to its abundance in DNA-based methods (amplicon or metagenome data) can result in meaningful ecological statements regarding its activity at the time of sampling. We found large differences in the estimated proportion of the active population in the sampled ecosystems depending on whether metagenomic or amplicon data were used. Of the twelve most abundant phyla, only Actinobacteria, Cyanobacteria and Gemmatimonadetes had similar proportions for metatranscriptomes with metagenomes and for metatranscriptomes with amplicons, while the percentage of the population that was active for the nine other phyla (including Verrucomicrobia) was inconclusive (Figures S1, S2). For instance, metatranscriptomics coupled to metagenomics of Alphaproteobacteria, which are important for nitrogen cycling in lakes (Newton et al. 2011), suggests that the majority of the population was inactive. In contrast, using amplicon data as the baseline, a high percentage of the population was transcribing 16S rRNAs. These results suggest that the choice of DNA profiling method heavily influences inferences about the active members of a microbiome.
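The dependence of the inferred activity on the DNA baseline can be made concrete with a small Python sketch. It assumes, purely for illustration, that activity is expressed as the ratio of a taxon's relative abundance in the metatranscriptome to its relative abundance in the DNA-based profile; this ratio, the function names and the counts are hypothetical and not necessarily the measure used in this study.

```python
def relative_abundance(counts):
    """Convert raw counts per taxon to fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty profile")
    return {taxon: count / total for taxon, count in counts.items()}


def activity_ratio(rna_counts, dna_counts):
    """RNA:DNA ratio of relative abundances for taxa present in both profiles."""
    rna = relative_abundance(rna_counts)
    dna = relative_abundance(dna_counts)
    # Taxa absent from the DNA baseline are skipped to avoid division by zero.
    return {t: rna[t] / dna[t] for t in rna if dna.get(t, 0) > 0}


if __name__ == "__main__":
    # Invented counts: one RNA profile, two different DNA baselines.
    metatranscriptome = {"phylum_X": 50, "phylum_Y": 50}
    metagenome = {"phylum_X": 80, "phylum_Y": 20}
    amplicon = {"phylum_X": 20, "phylum_Y": 80}
    print("vs metagenome:", activity_ratio(metatranscriptome, metagenome))
    print("vs amplicon:  ", activity_ratio(metatranscriptome, amplicon))
```

In this toy case the same transcript profile yields a ratio below 1 for phylum_X against the metagenome and above 1 against the amplicon data, which mirrors the kind of contradiction described for Alphaproteobacteria.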
To obtain a broader picture of the impact of the sequencing strategies on ecological conclusions, we compared the three datasets using several methods from community ecology, i.e., RDA, Mantel tests and BioEnv (Fig. 2, Supplemental Material 2). In line with Crump et al. (2007) and Souffreau et al. (2015), we assume that physico-chemical factors have a great effect on community composition. Therefore, we expect that a high proportion of variance is explained by physico-chemical factors. In general, the Mantel tests and BioEnv analysis (Fig. 2, Supplemental Material 2) revealed a strong link of the physico-chemical matrix with the metagenomic and metatranscriptomic datasets, while the amplicon dataset showed a negative correlation in the Mantel test and a positive correlation in the BioEnv analysis. The RDA conducted with amplicon community data as the response matrix and environmental data as explanatory variables revealed that only 15% of the variance could be explained (Figs. 2, S6). For the metagenomic and metatranscriptomic communities, 43% and 51% of the total variance were explained by physico-chemical data, respectively. The significant factors for the metatranscriptomic dataset were total phosphorus (TP), dissolved phosphorus (DP), elevation and potassium (K), while the metagenomic dataset was, in addition to these factors, significantly correlated with temperature and pH. The only significant factors in the amplicon dataset were temperature and pH.
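The "proportion of variance explained" reported for the RDA can be illustrated with a deliberately reduced Python sketch. It assumes a single explanatory variable and untransformed abundances, in which case the constrained fraction equals the summed sum of squares of the per-taxon linear fits divided by the total sum of squares. The analyses above use several environmental variables, so this only shows the principle; the function name and data are hypothetical.

```python
def explained_fraction(community, predictor):
    """Fraction of community variance explained by one environmental variable.

    community: list of samples, each a list of taxon abundances
    predictor: one environmental value per sample (e.g. pH)
    """
    n = len(predictor)
    mean_x = sum(predictor) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    if sxx == 0:
        raise ValueError("predictor is constant")
    total = 0.0
    constrained = 0.0
    for column in zip(*community):  # iterate over taxa
        mean_y = sum(column) / n
        syy = sum((y - mean_y) ** 2 for y in column)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, column))
        total += syy
        # Sum of squares of the fitted values of a simple linear regression.
        constrained += sxy ** 2 / sxx
    if total == 0:
        raise ValueError("community matrix has no variance")
    return constrained / total


if __name__ == "__main__":
    # Invented data: four samples, two taxa, one environmental gradient.
    ph = [6.5, 7.0, 7.5, 8.0]
    abundances = [[12, 30], [15, 22], [21, 25], [24, 14]]
    print(f"explained: {explained_fraction(abundances, ph):.2f}")
```

The value returned is an unadjusted fraction; with several explanatory variables and few samples, an adjusted statistic is usually the safer quantity to compare across datasets.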