Overview of Metagenome Assembly and Designation of 13C-Enrichment
Shotgun metagenomic sequencing of 13C-labeled DNA recovered a total of 1.1 billion reads after quality filtering, estimated by nonpareil  to cover 80% of genomic diversity in the DNA pool. Metagenome assembly produced a total of 356,131 contigs greater than 2.5 Kb, amounting to a total length of 1.8 Gb (~ 230 genomes of 8 Mb). The degree of 13C-enrichment was estimated for each contig by comparing the CsCl gradient profile to simulated natural abundance profiles (Figure S2). More than half of all contigs were designated either strongly or weakly 13C-enriched (total length = 921 Mb), while the remainder were from genomes of abundant soil taxa that lacked evidence of 13C-labelling (766 Mb). Contigs clustered into coherent sets according to pentanucleotide frequency which grouped by patterns of 13C-labelling and taxonomy (Figure 1ab), and not average G+C content (Figure 1c). The Random Forest model used to predict enrichment status had an overall accuracy of 89.1% with high sensitivity and specificity for both strongly enriched (98.3% and 86.4%, respectively) and unenriched contigs (100% and 93%) (details in Supplementary methods; Table S4).
A total of 47, 2 and 46 phylobins greater than 1 Mb were produced from strongly, weakly and unenriched contig sets, respectively (Table S5). Of the 95 total phylobins, 38 were deemed high quality (>75% completeness) and were divided into four categories based on enrichment status and cellulolytic potential (inferred from the presence of endoglucanases): strongly 13C-enriched and cellulolytic (nstrong = 12), strongly 13C-enriched and non-cellulolytic (nstrong = 8), weakly 13C-enriched and non-cellulolytic (nweak = 2), and unenriched (n = 16). Each phylobin represented a genomic-ecological unit, rather than an individual genome, encompassing genomes from, at most, four to seven genera for enriched and unenriched phylobins, respectively, based on the diversity of assembled full-length 16S rRNA genes (Figure 2). Both phylobins and representative reference genomes from 13C-enriched cellulolytic taxa were larger (µPhyBin = 24.7 Mb and µRep = 7.1 Mb) than those from 13C-enriched non-cellulolytic taxa (µPhyBin = 12.5 Mb and µRep = 5.2 Mb; Wilcoxon test, p ≤ 0.05) and unenriched taxa (µPhyBin = 11.4 Mb and µRep = 5.0 Mb; p < 0.01). Analysis of single-copy genes and single nucleotide polymorphisms per single-copy gene indicated that the large size of phylobins resulted from natural pangenomic diversity (intra-species) and inclusion of genome fragments from closely related taxa (inter-species diversity; details in Supplementary Methods).
The Structure of a Cellulose Economy
Taxa designated as strongly 13C-enriched and cellulolytic (i.e. encoding endoglucanases) represented the greatest proportion of unassembled SSU gene fragments in metagenomes. The most abundant were classified to well-known genera of cellulolytic soil organisms, including Cellvibrio, Herpetosiphon (Chloroflexi), and members of the fungal order Sordariales (predominantly Chaetomium), as well as lesser-known cellulolytic genera, such as Devosia and Sphingomonas (Figure 2). Peptides from these five taxa were also abundant within the total metaproteome (ntotal = 90,557 peptides; 33,765 unique proteins) occupying in the following percentages of total peptides: Rhizobiales (6.9%), Cellvibrionales (2.6%), Sphingomonadales (2.6%), Herpetosiphon (1.0%), and Sordariales (0.5%). Endoglucanases were detected in contigs classified to 22 of the 30 genera designated as strongly 13C-enriched, consistent with their presence in reference genomes (Figure 2, Table S6). The 22 genera designated strongly 13C-enriched and cellulolytic comprised 29% of the total SSU rRNA gene fragments. In contrast, the seven genera designated as strongly 13C-enriched and non-cellulolytic (see Figure 2) comprised 2.3% of recovered SSU rRNA reads. These putative non-cellulolytic taxa included only one reference genome encoding an endoglucanase.
A diverse set of endoglucanases were recovered from phylobins, revealing a snapshot of the functional diversity of cellulolytic populations. A total of 426 unique endoglucanase homologs were identified (at a > 80% identity threshold) belonging to 39 different CAZy families/sub-families. Eighty-two of these endoglucanases were present within gene clusters that contained a carbohydrate-binding module. A total of 54 peptides in the metaproteome matched endoglucanases; the most abundant was a GH9 from Cellvibrio (Figure 3). The second and third most abundant endoglucanases in the metaproteome were encoded by fungi (GH131 and GH7). Overall, most endoglucanases in the metaproteome were encoded by fungi (57%), which was disproportionate to the total relative abundance of fungal peptides (1.2%) in the whole metaproteome.
Evidence for Metabolic Dependency and Competitive Exclusion
To evaluate potential interactions among 13C-labeled taxa, we assessed the degree of auxotrophy (as an indicator of metabolic dependency) and presence of SM-encoding genes (antibiotic-based competition) in phylobins and their representative genomes. No phylobin or genome was fully prototrophic or auxotrophic for all biosynthetic pathways evaluated (n = 32), with the average phylobin being auxotrophic for 6 amino acids, 1 cofactor and 2 vitamins and the average representative genome auxotrophic for 5 amino acids, 1 cofactor and 1 vitamin. The extent of auxotrophy did not differ significantly between 13C-enriched cellulolytic and 13C-enriched non-cellulolytic phylobins or representative genomes (Figure 4) but did vary among dominant taxa within each group.
The most prototrophic representative genomes were Cellvibrio (31/32 pathways detected; genome size = 5.2 Mb; designated: 13C-enriched, cellulolytic), Devosia (31/32; 4.2 Mb; 13C-enr. cellulolytic) and Leptothrix (31/32; 4.9 Mb; unenr. non-cellulolytic) (see Table S7). The most auxotrophic representative genomes were Planctomyces (13/32; 3.2 Mb; 13C-enr. non-cellulolytic), Nannocystis (16/32; 12.1 Mb; 13C-enr. non-cellulolytic) and Vampirovibrio (16/32; 3.0 Mb; 13C-enr. non-cellulolytic). These trends were consistent in phylobins, where Cellvibrionales (30/32; ranked 1st in terms of biosynthetic capacity among the 38 phylobins examined) and Rhizobiales (28/32; ranked 3rd) were among the most prototrophic, while Planctomycetales (24/32; 20th), Vampirovibrionales (15/32; 34th), Chthoniobacterales (10/32; 37th; 13C-enr. non-cellulolytic) and Chloroflexales (9/32; 38th, 13C-enr. cellulolytic) were among the most auxotrophic. Overall, representative genomes from the phylum Actinobacteria were significantly more auxotrophic than Proteobacteria (µactino = 24.2 vs. µproteo = 25.3; Wilcoxon test, p = 0.04) driven by largely by Alphaproteobacteria (µalpha = 26.2, p = 0.003; Figure S3). This trend was consistent, but not significant, in phylobins (µactino = 23.8 versus µalpha = 26.0). Representative genomes for Actinobacteria and Alphaproteobacteria did not significantly differ in size (µactino = 5.8 Mb vs. µalpha = 5.3 Mb; Wilcoxon test, p = 0.46) or completeness (µ = 99.6% in both; p = 0.71).
The number of SM genes encoded in 13C-enriched cellulolytic phylobins (µrep = 14.0; µPhyBin = 40.0) was significantly higher than in 13C-enriched non-cellulolytic (µrep = 5.8; µPhyBin = 12.4; Wilcoxon test, p < 0.01) or unenriched phylobins (µrep = 4.8; µPhyBin = 10.0). The trend remained after normalizing to total phylobin or reference genome size: 2.2 read counts per million (rcpm) versus 1.9 rcpm and 1.3 rcpm, and 1.6 rcpm versus 1.0 rcpm and 1.0 rcpm, respectively. The genomes encoding the greatest number of SMs were Sporocytophaga, several Actinobacteria (Streptomyces, Lentzea, Dactylosporangium and Kitasatospora) and Cellvibrio (Table S7). Genes encoding type 1 polyketide synthases were consistently more abundant in 13C-enriched cellulolytic taxa than the other two groups (Figure S4). Non-ribosomal peptide synthetases and bacteriocins were more frequently encoded in both 13C-enriched groups, but only peptides matching cellulolytic taxa were present in the metaproteome (Figure S4). The metaproteome was dominated by terpene synthases from Actinobacteria, bacteriocins from Cellvibrio and non-ribosomal peptide synthetases from Sordariales. In contrast to trends in auxotrophy, representative genomes of 13C-enriched cellulolytic Actinobacteria encoded significantly higher numbers of SMs (µ = 20.0; n=6; p=0.03) than the cellulolytic Alphaproteobacteria (µ=5.3; n=6).
The orders Planctomycetales and Sphingomonadales were represented by independent phylobins that were weakly 13C-enriched, alongside those that were strongly 13C-enriched and unenriched (Figure 2). Only the strongly 13C-enriched Sphingomonadales phylobin encoded endoglucanases and also more SMs (predominantly bacteriocins) than the weakly 13C-enriched and unenriched phylobins (2.1 rcpm, 0.4 rcpm and 1.3 rcpm, respectively). Both 13C-enriched Sphingomonadales phylobins shared the same pattern of auxotrophy (Figure S5). No Planctomycetales phylobin encoded endoglucanase, nor a substantial number of SMs.
Comparison of Cellulolytic and Hydrolytic Potential
The functional gene content of representative genomes explained substantial variation in enrichment status (Figure 5). The trend was driven primarily by the relative abundance of glycosyl hydrolases (GH), which were 1.5- to 3-fold higher (after normalization for genome size) in 13C-enriched cellulolytic phylobins and corresponding reference genomes, respectively. This trend was evident in all gene families associated with lignocellulose degradation (GH, CBMs, AA and PL), which collectively explained 63% of variation in community functional composition along NMDS1 (Figure 5a; Table S8b). The genomes also separated along the secondary axis (NMDS2) defined by genome size, and peptidase and motility gene content, which explained 16.3, 16%, and 19% of variation, respectively (Table S8b). Degree of auxotrophy did not correlate with variation in functional gene content in either representative genomes or phylobins (Table S8a). In addition, the relative abundance of biomass-degrading enzymes (e.g. peptidases and nucleases) did not differ with respect to degree of 13C enrichment or cellulolytic capacity, either in phylobins or representative genomes. In contrast, broad differences in functional gene categories were observed between Actinobacteria and Proteobacteria (Figure 5a).
Temporal Dynamics in Cellulose Economy
Early and late-stage colonizers of cellulose were identified according to genome-based predictions of growth rate (Table S7). Taxa designated as 13C-enriched and cellulolytic were predicted to have faster generation times based on phylobins (3.0 hr) and representative genomes (3.1 hr) compared to 13C-enriched non-cellulolytic taxa (5.7 hr and 5.6 hr, respectively; Figure 6a), though these differences were not significant (Kruskal-Wallis, pphybin = 0.33 and prep = 0.52). However, genomes from 13C-enriched non-cellulolytic taxa exhibited a bimodal distribution (Figure 6a) and the set of genomes with slower generation times (generation time > 5 hr) were significantly more auxotrophic (µslow = 15.8/32 prototrophies) than taxa with faster predicted generation times (< 3 hr; µfast = 23.8/32; Wilcoxon test, p = 0.05). The same trend was not apparent in phylobins, though predictions were unobtainable for two of the most auxotrophic bins (Vampirovibrionales and Chthoniobacterales).
The genome-based characterizations of early and late-stage colonizers were consistent with temporal patterns of taxa observed in time-course amplicon sequencing data. The highly prototrophic taxa Cellvibrio and Devosia, increased in relative abundance earliest, peaking in 13C-enrichment at days 7 to 14 and declining by day 30 (Figure 6b; Figure S6a). Chaetomium were also early colonizers, showing 13C-enrichment by day 7 (Figure S7). In contrast, the relative abundance of Actinobacteria was less dynamic, and these organisms tended to become labeled on, or after, day 14. Taxa predicted to be slowest growing, and identified as 13C-enriched, non-cellulolytic, and highly auxotrophic (Planctomyces, Sphingomonas and members of Verrucomicrobia), began to increase in relative abundance only after day 14 and were maximally 13C-enriched on day 30 (Figure 6c; Figure S6c).
Surface Adhesion and Surface Motility
Phylobins from cellulolytic taxa were more likely to encode the capacity for surface adhesion and/or surface motility than other groups, including twitch motility, pili systems and fimbriae (Figure S8a). Surface attachment proteins were abundant in reference genomes from 13C-enriched taxa (both cellulolytic and non-cellulolytic), and in phylobins classified as Rhizobiales and Caulobacterales (Figure S8b). Adhesion proteins used in gliding motility (aglZ and sprB) were present in reference genomes of cellulolytic taxa but absent from phylobins.