Digestate samples and DNA isolation
We isolated DNA from digestates retrieved from 17 biogas plants, two wastewater treatment plants (WWTP) and one laboratory-scale reactor, all located in either Sweden (SW) or Germany (GER). These AD processes were chosen as they use common feedstocks/substrates such as organic household waste, slaughterhouse waste, manure, sewage sludge and agricultural products (Table 1). Thirteen operate under mesophilic temperature (37-42°C), three under hyper-mesophilic temperature (44°C) and four under thermophilic temperature (48-52°C) (Table 1). The digestate samples, hereafter referred to simply as samples, were taken directly from the methanogenic reactors without any further storage. All German and one Swedish sludge sample(s) were pre-frozen for transportation.
The DNA isolation protocol described in this work was extensively pre-tested and optimised on sample 01-SW before being applied to the other 19 reactor samples. Sample 01-SW was from a laboratory-scale reactor mimicking the operation of a large-scale biogas plant. The protocol was optimised for the largest possible DNA fragment length and high DNA purity, for optimal nanopore activity and sequencing yield. The full protocol, with each step described in detail, is available online under at protocols.io [19]. The protocol includes all necessary and optimised steps in order to improve DNA yield and length, with brief explanations. The most significant improvements in overall DNA yield and length are shown in Figure 1.
In brief, we first compared the reactor sample with controlled-growth culture samples using the DNA isolation kit from the FastDNA Spin Kit for Soil (MP Biomedicals) according to the manufacturer. As expected, we observed more digested DNA in the sludge samples, with significantly lower DNA length to yield ratio than in the cultures (Figure 1A). To improve overall DNA yield and fragment length, we modified the washing steps and the total amount of sludge sample used. Increasing the amount of sludge used slightly increased the total DNA yield. Performing an additional humic acid (HA) washing step improved the yield and reduced the number of smaller DNA fragments. Both steps combined yielded the best overall results (Figure 1B). More additional washing steps with the default buffer did not improve the results further (data not shown).
The reactor samples contained large amounts of degraded DNA, due to the high biological activity in AD. These small fragments had a negative impact on the overall sequencing performance and the total sequencing yield. Centrifugation of the samples and replacing the supernatant with distilled water significantly reduced the amount of degraded DNA (Figure 1C). DNase treatment before isolation resulted in reduced smear, but in one case also reduced the overall yield during DNA isolation (sample lane 4 in Figure 1C). Thus, DNase treatment seems unsuitable for achieving a stable and reproducible protocol in this case.
Another influencing factor was the bead beating settings. We achieved the best results using a force of 6 m/s once for 20 s with a FastPrep -24™ Classic Instrument (MP Biomedicals™) (Figure 1D). We strongly recommend re-evaluating the force if another device is used for performing beat beating. To remove small DNA fragments (<1000 bp) produced during bead beating, the DNA sample was cleaned up using magnetic beads (0.35:1 ratio of beads to sample). This step also removed other impurities that may remain after DNA isolation.
Figure 1: Effect on DNA quality of different steps during isolation. White boxes indicate the chosen approach for the DNA isolation protocol. A) Default manufacturer’s protocol applied to (left) sludge and (right) controlled growth cultures. B) DNA yield/length after introduction of a humic acid removal wash (HA-wash) step. Arrows indicate use of 0.4 mL sludge instead of 0.2 mL. C) Pre-preparation of sludge and effect on DNA yield/length. (supern.: supernatant removed and replaced with water; DNase: sample pretreated with DNase; control: no pre-preparation; both: supernatant replaced with water and incubated with DNase). D) Impact of different bead beating settings on DNA yield/fragment length.
Sequencing yield and quality of metagenomic DNA
The optimised DNA isolation protocol [19] was applied to all 20 digestate samples (Table 1), using a MinION device (Oxford Nanopore Technologies) with a single flow cell for each DNA sample. We initially used the LSK-108 Kit, but changed later to the LSK-109 Kit as we observed better sequencing performance. Figure 2 summarises the sequencing results and the quality scores obtained.
Figure 2: Summary of all 20 sequence runs of biogas reactor samples using the MinION device. Different colours (red/blue) indicate properties that influenced quality parameters.
We observed a slight decrease in N50 and median read length for pre-frozen samples compared with fresh samples. Sequencing runs via LSK-108 Kits were also performed using flow cells stored for two months or longer, which is likely the main reason for the decrease in median Q-score. Total sequencing yield varied between 18.2 and 28.5 Gb for most samples and, across all samples, we achieved a median of 20.5 Gb. Total sequencing yield was below 10 Gb for only two of the 20 samples (03-SW and 04-SW). Thus the DNA isolation protocol presented here yields sufficient DNA of appropriate quality for nanopore sequencing and an acceptable number of long reads in all cases tested.
Abundance estimation and taxonomic read classification
Classifying as many reads as possible is essential to achieve the best characterisation and abundance estimates for the microbial community. To this end, we chose centrifuging for the taxonomic classification, because it allows for smaller kmer size to consider the higher error profile of long reads [20]. Moreover, the availability of pre-built databases and less RAM consumption in comparison with other classifiers made it a favourable choice. We also applied three different taxonomic indices: the default NCBI RefSeq supplied by the centrifuge manufacturer (https://ccb.jhu.edu/software/centrifuge/manual.shtml); the GTDB index (GTDB version r89, de-replicated to 54k genomes) from Méric et al. [21]; and an GTDB index based on 24,706 bacterial and archaeal representative species from Parks et al. [22]. We used Illumina reads (paired-end 150 bp) for each reactor to compare classification capability. Figure 3 gives an overview of the proportion of reads classified for each taxonomic index at phylum, genus and species level.
Figure 3: Proportion of taxonomically classified reads of each reactor sample at phylum, genus and species level. The four columns for each reactor represent different read classification approaches. The first bar represents Illumina reads using the GTDB-Méric index, for direct comparison against nanopore sequencing.
In all cases, we achieved a dramatic increase (range 1.8- to 13-fold) in the proportion of classified reads when using the GTDB index rather than the RefSeq index. In particular, improvements were obtained for samples 05‑GER, 04‑SW, 04‑GER and 03‑GER (11.5- to 13.2-fold increase in classified reads). We did not observe any noticeable decrease in classified reads when comparing phylum to genus or species level. The average proportion of classified reads was 8.8% of all reads on genus level using the RefSeq database and 42.11% using the ‘GTDB-Méric’ index. These results correspond to a median 4.8-fold increase in classified reads across all samples. For the reactors with a 10-fold or higher increase, more meaningful observation of the actual bacterial/archaeal composition was possible. Based on these results, all samples were subjected to further abundance analysis based on the ‘GTDB-Méric’ index, with abundance calculated based on the proportion of classified taxonomic reads in all classified reads. The results obtained at phylum level are shown in Figure 4 and the results at genus level in Supplementary Figure S1.
Figure 4: Summary of abundance for samples from all 20 reactors, calculated based on all classified reads. Only phyla with at least 0.1% abundance are shown (circles). The size and colour of each circle correspond to the abundance of each phylum in each reactor. Taxonomic names are based on GTDB version r89 (https://gtdb.ecogenomic.org), which includes “placeholder” names like Firmicutes_A. DTU030 corresponds to the NCBI phylum Firmicutes. UBP3 corresponds to “bacteria” classification via NCBI. The reactor samples are described in detail in Table 1.
In general, we identified a wide range of different phyla, especially for Firmicutes, due to the more phylogenetically coherent taxonomic definitions by GTDB. This increased taxonomic granularity at phylum level provided better resolution, especially for Firmicutes, thus improving tracking of changes. As known for biogas processes, Bacteroidota (NCBI: Bacteroidetes) and Firmicutes_G and A (NCBI: Firmicutes) were the dominant phyla in most samples [23]. Thermotogota, a phylum frequently found in systems operating at higher temperatures [23], was highly abundant in three of the four thermophilic reactors (02‑SW, 09‑SW and 08‑GER). Euryarchaeota, including methanogens, represented less than 10% of the total classified reads, as frequently described previously for biogas processes [23].
Samples 07-GER and 09-GER represented reactors operated in parallel by the same facility using the same operational parameters and feedstock. Both reactors had very similar microbial composition and structure on both phylum and genus level, but with slight abundance variations (e.g. some very low-abundance organisms were not common to both reactors). Some small abundance variations also occurred at genus level (Supplementary Figure S1).