Public metagenomes selection and data processing
To get an overview of the AD microbiome, 18 experiments published between 2014 and 2019 were selected. These include 134 samples, some of them representing biological replicates (Fig. 1). Only experiments performed using Illumina sequencing technology have been considered in the present study, in order to facilitate the assembly and binning process. Among these datasets, both laboratory-scale- and full-scale biogas plants fed with a range of different substrates were considered, thus, the outcomes of the work reflect a broad spectrum of the microbiomes residing in such engineering systems. Most of the samples were collected from reactors operated in Denmark (68%), while others derived from Germany (9%), Canada (7%), Japan (7%), Spain (4%), Sweden (3%) and China (2%) (Additional File 1). Most samples were collected from laboratory-scale biogas reactors and batch tests, while other samples were obtained from 23 full-scale biogas plants located in Europe.
Microbial composition was initially determined considering unassembled reads, and this highlighted marked differences between samples, which were classified into 35 groups (details reported in Additional File 2). This microbial diversity is also clearly evident in Fig. 2, where different samples are connected with arcs having different colors depending on the fraction of common species.
A subsequent binning approach was independently performed on each assembly of the 35 groups, leading to a total of 5,194 MAGs (Table 1). Data regarding metagenomic assemblies and number of MAGs collected from the binning process are reported in detail in Additional File 3. Those MAGs featuring completenesses (Cp) lower than 50% and/or contamination rates (Ct) higher than 10% were discarded. The remaining MAGs were de-replicated by means of the genome-aggregate ANI value reducing the number down to 1,635 unique “species” (Table 1; Fig. 3; Additional File 4). By considering all 134 samples, on average 89% of the reads were consistently aligned on the 1,635 MAGs, suggesting that the obtained dataset captured much of the available sequencing information. Results obtained were quite similar when only the HQ MAGs were selected. The degree of novelty of our study was determined performing a comparison with MAGs previously recovered from the AD environment [11,33,34] (https://biogasmicrobiome.com/). Our study showed an improvement in the quality (increased Cp and/or reduced Ct) of 75% of the MAGs already present in public repositories, and added 1228 “new species”, consistently improving the entire biogas microbiome (Additional File 5).
Structure of the microbial community
The analyses performed using MiGA estimated that a relevant fraction of the genomes belong to taxonomic groups for which genomes of type material are not present in the NCBI genome database. More specifically, 0.2% of MAGs cannot be assigned to known phyla, 11.6% to known classes, 69.7% to orders, 71.3% to families, 92.1% to genera and 95.2% to species. This evidenced that the present genome-centric investigation allowed to fill-in a notable gap in the knowledge of the AD microbial community. A dedicated project was established to allow the recovery of both genome sequences of MAGs and their taxonomic assignment “http://microbial-genomes.org/projects/biogasmicrobiome”.
In addition, to determine the taxonomic position of the MAGs, a procedure based on four different evidences was used (Additional File 2). Only 69 out of 1,635 MAGs were assigned to known species based on ANI comparison performed considering the genomes deposited in NCBI (https://www.ncbi.nlm.nih.gov/genome/microbes/) (Additional File 4). Furthermore, the vast majority of obtained MAGs (1,574) were assigned to the domain Bacteria and only 61 to Archaea, and distributed over 55 different phyla as reported in Fig. 4. However, our data are similar to those previously obtained using marker gene based analysis , in fact, the vast majority of species were classified as belonging to the phylum Firmicutes (790 MAGs), followed by Proteobacteria (137 MAGs) and Bacteroidetes (126 MAGs). The bacterial phylum Firmicutes, which is the most abundant taxon within the biogas microbiome, varied between 1.3% and 99.9% of the microbial community (Additional File 2, Figure S1 and Additional File 6). In almost 40% of all samples analyzed, Firmicutes was not the dominant taxon, but Bacteroidetes, Coprothermobacter, Actinobacteria, Thermotogae and Chloroflexi become prevalent reaching up to 85% relative abundance within the microbiome. Interestingly, in reactors where none of the previously mentioned taxa were dominant, microbial species belonging to candidate phyla radiation (CPR) and to other candidate taxa reached high relative abundances, as was the case for Candidatus Cloacimonetes (15.7%), Ca. Fermentibacteria (16.4%), Ca. Roizmanbacteria (19%) and Ca. Saccharibacteria (16.4%) (Additional File 6). The high relative abundance of yet-uncultivated taxa suggests that they may play an important role in the microbial community. Some species associated to CPR were identified by our study and were tentatively assigned to Saccharibacteria (8 MAGs) and Dojkabacteria (8 MAGs), Microgenomates (1 MAG) and Peregrinibacteria (1 MAG).
Regarding the methanogenic community, it was shown that the AD microbiome is almost exclusively represented by phylum Euryarchaeota (53 MAGs).
Influence of environmental conditions on the microbiome composition
It was shown that the applied environmental conditions (e.g. temperature), or the design of the reactors (e.g. biofilm), greatly determine the microbial diversity and properties of this ecosystem. For instance, the “Bacteria/Archaea” ratio, which has a median value of ~14, was highly variable (Additional File 2, Figure S2). Beside the acidogenic reactors, where the methanogenic process was undetectable (i.e. “LSBR-DSAc-preH2” and “LSBR-DSAc-postH2”), it was concluded that in 7.7% of all samples archaeal abundance was lower than 1% and consequently “Bacteria/Archaea” ratio exceeded 100. However, Archaea were predominant in several reactors analyzed in this study and in 3% of all samples, their abundance exceeded that of Bacteria, with a ratio of ~0.5 in a biofilm sample collected from a reactor fed with acetate (“LSBR-D200-DNA-BF”). Acetate is a very important “methanogenic substrate” and it can be directly converted to methane by acetotrophic Archaea.Thus, a dominance of Archaea in the microbial community is a reasonable finding, as evidenced in some samples of the present study. A complex combination of factors, such as the presence of biofilm, is probably contributing to this unbalanced proportion of the “Bacteria/Archaea” ratio. Considering only biogas plants, the ratio is kept within a narrower range, but still it is very flexible (from 470 in Nysted to 3.4 in Vilasana) (Figure S2).
Furthermore, we calculated the variation in abundance for each MAG across the AD samples, along with their taxonomic assignment. The number of MAGs in each sample was estimated considering as “present” those with abundances higher than 0.001%. This analysis revealed that the microbial community composition was highly variable depending on the origin of each AD sample as a consequence of the reactor operation, performance, and influent feedstock (Fig. 1, Fig. 2 and Additional File 2- Figure S3). The number of detectable species in the microbiome ranged between 79 (Fisher alpha diversity 4.4) and 1213 (Fisher alpha diversity 133.8) (Additional File 7). According to previous findings [6,9], thermophilic reactors have a lower number of species than mesophilic (p-value<0.001). Among the thermophilic reactors in this study, those characterized by a very high number of species were fed with manure or a mixture of manure and agricultural feedstocks, while those having fewer species were fed with simplified substrates such as cheese whey, acetate or glucose (p-value<0.001). This suggests that the AD process can be supported by less than 100 species when the feedstock is mainly consisting of a single compound. On the contrary, degradation of complex substrates (such as sewage sludge or manure) requires the cooperation of a large cohort of microbes including more than 1,000 species. Analysis of the MAGs shared among different samples (Fig. 2) revealed that thermophilic reactors tend to share more species than mesophilic systems, which could be due to the selective pressure imposed by the high growth temperature. Despite feedstock is the primary determinant of community structure, it was previously demonstrated that the initial inoculum plays a major role, lasting for months even after feed changes . Additionally, feedstock contributes to the community composition in terms of immigrant microbes, which are partially involved in shaping the final microbiome.
Cluster analysis was performed both at individual MAG abundance level and at sample level (Additional File 2, Figure S3) in order to verify MAGs and samples having similar abundance profiles, respectively. This allowed the assignment of MAGs to two main groups: “G1” includes mostly Chloroflexi and Bacteroidetes, while “G2” includes mostly Firmicutes. Sample clustering revealed three main groups, “C1” including reactors fed with sewage sludge, “C2” those fed with “simplified substrates” and “C3” fed with manure only. A similar classification is shown in Fig. 1, indicating that the temperature and feeding substrate were the main driving forces of the AD microbiome diversification [3,35,37,38]. Furthermore, the Principal Coordinates Analysis (PCoA) performed considering the microbiome composition originating from different AD environments revealed a clear separation of samples in three groups, one formed by thermophilic reactors fed with a mixture of carbohydrates and LCFA, one formed by thermophilic reactors fed with acetate and lactose and the third one represented by mesophilic samples (Additional File 2, Figure S4 A-C). This is in agreement with previous findings [3,4] showing mostly specialized microbial communities depending on the temperature regime. The high heterogeneity in metadata accompanying the experiments evidenced the importance of establishing common guidelines regarding the parameters that have to be recorded during AD process. These standards will simplify the comparison among projects and will allow the correlation between metadata and microbial composition.
Considering a concept of “core microbiome”, meaning that some species are present in the anaerobic digestion microcosm independently of the applied environmental parameters, we identified only few MAGs in multiple samples (Additional File 2, Figure S3; Additional File 8). By considering the highly abundant MAGs (more than 1% relative abundance), only 25 were present in more than 10% of the samples, while 1246 were considered as low abundant (lower than 1%) (Additional File 2, Figure S5). Among the 25 abundant MAGs, four methanogenic Archaea were identified, namely the Candidatus Methanoculleus thermohydrogenotrophicum AS20ysBPTH_159, Methanosarcina thermophila AS02xzSISU_89, Methanothrix soehngenii AS27yjCOA_157 and Methanoculleus thermophilus AS20ysBPTH_14. The remaining 21 MAGs were assigned to the phyla Firmicutes (14 MAGs), Bacteroidetes (2 MAGs), Synergistetes (2 MAGs), Thermotogae (1 MAG) and Coprothermobacterota (1 MAG). Interestingly, Defluviitoga tunisiensis AS05jafATM_34, one out of seven MAGs of the phylum Thermotogae identified in this study, was present at high abundance (average 2.1%; maximum 58.9%). Widespread identification of this species in reactors suggests its central role in thermophilic AD system possibly associated to specific metabolic potential related to saccharide, polyol, lipid transport systems (Additional File 9) and hydrogen production . Analysis of the low abundant MAGs (threshold 0.001%), revealed that 94% of these taxa were present in more than 10% of the samples, and the phyla statistically over-represented in this group were Chloroflexi, Elusimicrobia, Firmicutes and Plantomycetes (p<0.01). This finding indicates that many MAGs are widespread in the global AD microbiome, but they are present at very low relative abundances. Differently from other ecological niches (e.g. human gut) a “core microbiome” present in all the reactors was not clearly identified. However, the existence of distinct core microbiomes characterizing groups of reactors with similar characteristics (e.g. feedstock or temperature) is more realistic, as also previously hypothesized .
Functional analysis of the microbiome
Metabolic pathway reconstruction and biological role interpretation of 1401 HQ and MHQ MAGs were performed by applying a collection of functional units, called KEGG modules. Analysis was performed on 610 modules, and identified that 76.2% of them are “complete” in at least one MAG, 10.1% have at best one block missing (1 bm) and 2.5% have at best two blocks missing (2 bm). In the following sections, only complete and “1 bm” modules will be considered. Modules distribution and completeness indicated that a very low number of them are widespread in MAGs, while the majority has a scattered distribution in terms of presence/absence (Fig. 5). Additionally, the association of many modules with some specific taxa is remarkable; in fact, a strong correlation between the clustering based on modules presence/absence and MAGs taxonomic assignment was found (Fig. 5; Additional File 10).
Main functions within the anaerobic digestion food-chain
Initial evaluation was focused on the identification of MAGs having a specific KEGG module. Considering both the complete and “1 bm” modules, only 15 “core modules” have been identified in more than 90% of the HQ-MHQ MAGs. These include for example “C1-unit interconversion”, “PRPP biosynthesis”, “glycolysis, core module involving three-carbon compounds”. Other 223 “soft core modules” were present in 10% to 90% of the HQ-MHQ MAGs. Finally, 289 “shell modules” have been identified in less than 10% of MAGs, including those associated with “methanogenesis”, “reductive citrate cycle” and “Wood Ljungdahl (WL)-pathway”. The high fraction of “soft core” and “shell” modules revealed a highly specialized microbial community, with a small number of species performing crucial functions such as methanogenesis. Results obtained revealed the presence of a small fraction of “multifunctional MAGs” (~1.6%) with more than 180 modules encoded. These microbes are mainly associated to specific taxa, and considering the HQ-MHQ MAGs, they represent 8.6% of the Proteobacteria, 14.3% of the Chloroflexi, 7.7% of the Planctomycetes. Thus, the AD microbiome typically comprises “oligofunctional” MAGs, which are characterized by the presence of less than 80 modules. Taxonomic distribution of the 89 HQ “oligofunctional” MAGs demonstrated that they were phyla-specific, representing 91.7% of the HQ Tenericutes, 32.2% of the HQ Euryarcheota and 19.7% of the HQ Bacteroidetes.
Carbon fixation and methanogenesis
Particular attention was given to the modules associated with “methane metabolism”, and especially to the conversion of different substrates (carbon dioxide, acetate, methylamines and methanol) into methane. These modules were identified with different frequencies in the AD microbiome. Carbon dioxide reduction was identified in 29 MAGs, acetate conversion in 25 MAGs, methanol reduction in 40 MAGs and methylamine-methane conversion in 17 MAGs.
Apart from the fundamental role of methanogenesis in the AD system, the conversion of acetate, carbon dioxide and hydrogen can follow different pathways and can be strongly influenced by the environmental conditions. Practically, these flows are of particular interest for applying recent technologies, such as biomethanisation or bioaugmentation. Considering the modules associated with carbon fixation, those encountered more frequently were the phosphate acetyltransferase-acetate kinase pathway (acetyl-CoA => acetate) identified in 1,155 MAGs (82.4%) with 988 MAGs encoding the complete module, the reductive acetyl-CoA pathway (also called Wood-Ljungdahl pathway) identified in 86 MAGs (5.8%) with 52 encoding the complete module, and the reductive pentose phosphate cycle (ribulose-5P => glyceraldehyde-3P) identified in 128 MAGs (9.1%) with 42 encoding the complete module. The W-L pathway is present only in 0.49% of the microbial genomes deposited in the KEGG database; notably, this pathway is proven to be more common among the members of the AD microbiome. The taxonomic distribution of the 86 MAGs encoding the W-L pathway is mainly restricted to Firmicutes (75.6%), followed by Chloroflexi (9.3%), Proteobacteria (7%), Euryarchaeota (3.4%) and Actinobacteria (2.3%). Functional activity and syntrophic association with methanogens was previously reported for some of these species (e.g. Tepidanaerobacter syntrophicus, Syntrophorhabdus aromaticivorans and Desulfitobacterium dehalogenans) [40–42] . However, the vast majority was not previously characterized at the genome level, suggesting that potential syntrophic acetate oxidizer (SAO) or acetogenic metabolism are present in many unknown species. Most of the MAGs encoding the W-L pathway (putative SAO bacteria or acetogens) are rare in the microbiome and on average they do not exceed 1% of relative abundance. However, under certain conditions they can become dominant, as for example Firmicutes sp. AS4GglBPBL_6 (24.8% relative abundance in the Fangel biogas plant), Firmicutes sp. AS02xzSISU_21 (32% in reactor fed with avicel) and Firmicutes sp. AS4KglBPMA_3 (12% in the Nysted biogas plant). This piece of information is quite useful for the design of bioaugmentation strategies targeting biogas reactors that are fed with nitrogen/ammonia rich substrates. Interestingly, the Fangel biogas plant showed a high total ammonia level during the sampling process (4.2 g/L)  (Additional File 1). This indicates that, despite SAO bacteria are usually present at low abundance, environmental parameters of the reactors can strongly influence their abundance and probably their activity. More specifically, high acetate concentrations can disturb acetoclastic methanogenesis leading to a shift towards SAO process coupled with hydrogenotrophic methanogenesis. Despite it is hard to classify the species mentioned above as SAO or acetogens, this result can provide a more accurate evaluation of the fraction of bacteria involved in acetate conversion and may support the delineation of a more accurate mathematical model for the AD process.
Relative abundance of KEGG modules
Considering the relative percentage of HQ MAGs in each condition, along with the completeness of KEGG modules, it was possible to estimate the relative abundance of each module in all samples (Additional File 11). Although measurements at the RNA/protein level are needed to have direct information on pathways activity, it is evident that different samples have highly variable representation of crucial KEGG modules (Fig. 6). It is noteworthy that the relative abundance of MAGs potentially associated to the hydrogenotrophic and acetoclastic methanogenesis is highly variable among samples. Particularly, in biogas plants characterized by low TAN (1.9-2 mg/L) (e.g. “BP-Gimenells” and “BP-LaLlagosta”), acetoclastic methanogenesis is favored and the ratio acetoclastic/hydrogenotrophic is 0.94 and 0.99, while in biogas plants where TAN is high (4-7 mg/L) (e.g. “BP-Vilasana”, “BP-Torregrossa” and “BP-Fangel”) the ratio acetoclastic/hydrogenotrophic is 0.16, 0.21, 0.02. Analyzing reactors where ammonia levels were reported, it was indeed found a significant correlation (R2 0.62, p 9.3 E-5) between ammonia concentration and the “acetoclastic/hydrogenotrophic” ratio. Additionally, there is a high level of acetoclastic methanogenesis in reactors fed exclusively with acetate, such as “LSBR-D122-DNA-BF-Rep1”, “LSBR-D200-DNA-BF-Rep1” and “LSBR-R3-acetate”. Relative abundance of the methanogenic modules was found to be highly different among samples considered. As expected, it was close to zero in acidogenic reactors (pH<5, “LSBR-DSAc-preH2” and “LSBR-DSAc-postH2”) and very high in reactors with acetate as feeding substrate (e.g. “LSBR-D200-DNA-BF” or “LSBR-R1-acetate”). The high abundance of methanogenic modules in the latter reactors can be correlated with the direct use of the substrate by acetoclastic methanogens, with a parallel reduction of the species encoding the W-L pathway.
Polysaccharides degrading functions
Cellulosic biomass in AD is represented by agricultural residues and dedicated energy crops, and is the most abundant carbon source . In order to find the species involved in complex carbohydrate decomposition, MAGs featuring high enrichments in CAZymes (p<1*e-5) have been selected for further analysis (Additional File 12). Globally, 490 HQ MAGs (35% of the total) are enriched in one or more CAZymes classes, evidencing that polysaccharide degradation is one of the most widespread functional activities in the AD system. Although polysaccharide degraders are frequently associated to Firmicutes (246 MAGs) and Bacteroidetes (68 MAGs), many other phyla were found to be enriched, and an involvement in polysaccharide degradation can be hypothesized for members of other taxa. For example, all MAGs belonging to the Candidatus Hydrogenedentes, the Armatimonadetes, 90% of the Fibrobacteres, 93% of the Lentisphaerae and 85% of the Planctomycetes are potentially involved in this process. Some members of the CPR taxa are also predicted as associated to carbohydrate degradation, such as Candidatus Dojkabacteria .
A tentative estimation of the relative impact of the polysaccharide degradation process in different samples (Fig. 6 C) was obtained by considering the relative abundance of MAGs encoding genes for a specific function (e.g. “cohesin”, “dockerin”, or “Carbohydrate Esterases”). A few samples are dominated by polysaccharide hydrolyzing MAGs, (e.g. “LSBR-R1-avicel”), most probably because they were fed with substrates rich in cellulose, while generally the fraction is lower than 2%, particularly in biogas plants (Fig. 6 C). This indicates that, despite the number of MAGs involved in polysaccharide degradation is high, the relative abundance of most species is low. This can be due to the presence of relative minor players in terms of abundance, but having high transcriptional activity; if they are highly active, they can enhance or trigger the metabolic processes of dominant members. However, this needs additional verification to be demonstrated.MAGs replication index
Analysis of MAGs provides insights into the genetic composition of non-cultivable biogas community members and enhances our understanding of their contribution to the AD process. Such analysis is able to provide knowledge related to the replication capacity of certain biogas-producing members. Although the results obtained have to be considered with caution, bacterial replication index offers information on the growth dynamics and life cycles of microbial species, which in turn can be an indicator of community composition and the in situ activity of different species within the sub-communities.
To determine the replication index of MAGs across multiple samples, the sequencing coverage resulting from bi-directional genome replication was used to calculate the index of replication (iRep) . In total, 2,741 measurements were obtained for 538 MAGs (Additional File 13). Considering the median iRep values determined in all different samples for each MAG, it was obvious that nearly 90% of species showed similar values between 1.1 and 2, and only 10% had values between 2 and ~4 and can be considered as “fast growing”. Among the fast growing species, there are microbial members of the poorly characterized phylum Atribacteria ( Atribacteria sp. AS08sgBPME_53, iRep 2.9), and the candidate syntrophic species Defluviitoga tunisiensis AS05jafATM_34 (iRep 2.53) . Results were obtained for 28 phyla evidencing that Tenericutes, Spirochaetes, Atribacteria, Thermotogae, Synergistetes, and Coprothermobacterota have on average high median iRep values (iRep 1.66, 1.77, 2.12, 2.53, 2.13, 2.99, respectively) (p-values 8.63E-10, 2.52E-04, 7.59E-04, 2.61E-05, 2.22E-11, 0.016), while Euryarchaeota and Acidobacteria have low values (1.37 and 1.41) (p-values 7.02E-05 and not statistically significant NSS, respectively) (Fig. 7A). Euryarchaeota species having multiple replication origins were 18 and have been excluded from the analysis (Supplementary file 2), however results should be treated with caution. MAGs belonging to the phyla Bacteroidetes and Firmicutes have similar (and low) median iRep values (both 1.52) except some outliers. Otherwise, iRep values assigned to Synergistetes and Coprothermobacterota are distributed over a wide range, but on average are higher than that of other phyla (2.12 and 2.99) (Fig. 7). The limited growth rate of some taxa, such as Acidobacteria, was also previously reported  and it was speculated that this property hampered their isolation. The high iRep values measured here for some known species also suggest that their isolation may be easier as previously assumed .
Finally, Euryarchaeota replication index was calculated (~1.52 on average) for 8 MAGs having different abilities in substrate utilization. Interestingly, while M. soehngenii was previously defined as a slow-growing methanogen specialized in acetate utilization , 7 out of 9 iRep results obtained for M. soehngenii AS21ysBPME_11 are higher than 2, while all the other Archaea had values between 1.2 and 2 (Fig. 7B). Findings reported for AS21ysBPME_11 indicate that, in a complex microbiome, growth rates can be very different compared to those determined for isolated species under laboratory conditions, possibly because of cooperative/syntrophic associations with other microbes, or difficulties in identifying the appropriate growth medium.
Our findings also suggest that duplication rates are dependent on metabolic properties of MAGs. Calculation of iRep values performed independently for MAGs encoding different KEGG modules evidenced that MAGs involved in polysaccharide degradation have quite low iRep values; this is more evident for microbes growing attached to plant material with cohesin/dockerin domains (iRep 1.41) (p-value 0.024). These species represent the so-called slow-growing cellulolytic microflora . Species involved in “carbon fixation” (e.g. “reductive citrate cycle” or “W-L pathway”) have higher values (iRep 1.40; 1.53) (p-values 1.44E-08 and NSS, respectively). Additionally, iRep values were obtained for poorly characterized taxa such as Atribacteria and Candidatus Fermentibacteria (Fig. 7A), suggesting that most of the species are slow-growing members of the AD system, but with some exceptions such as Atribacteria sp. AS08sgBPME_53.
Availability of iRep values for a large number of species, and their association with functional roles of microbes can provide an estimate of the growth dynamics of species involved in particular steps of the AD food chain. Since nowadays mathematical models of the AD system are based on growth rates measured for a limited number of species, information obtained from iRep can provide a more generalized representation of microbial dynamics which can be included in simulations, reinforcing their predictive efficiency.