We characterized the fecal microbiome of a healthy dog using long-read metagenomics with Nanopore sequencing. An overview of the complete experimental design is presented in Fig. 1. We obtained a total of 16.94 million reads (36.05 Gb) from two runs, corresponding to the high-molecular-weight (HMW) and non-HMW DNA extractions.
After high-accuracy basecalling and error correction, we applied several metagenomics assembly strategies and retrieved eight single-contig high-quality MAGs (HQ MAGs), which were > 90% complete with < 5% contamination and contained most ribosomal genes and tRNAs, and three medium-quality MAGs (MQ MAGs). We further corrected the HQ MAGs for frameshift errors and compared them at the functional level with those previously identified in other gastrointestinal catalogs.
HMW sequencing produced 5.81 million reads with N50 of 4,369 bp and a median length of 2,312 bp (total throughput: 18.76 Gb), whereas non-HMW produced 11.13 million reads with N50 of 2,102 bp and a median length of 1,093 bp (total throughput: 17.29 Gb).
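The N50 and median read lengths reported above can be computed from a list of read lengths. The following is a minimal sketch of that calculation, not the tool used in the study; the function name `n50` and the toy read lengths are illustrative assumptions.

```python
from statistics import median


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy read lengths in bp (not data from this study).
toy_reads = [1000, 2000, 3000, 4000, 5000, 6000]
toy_n50 = n50(toy_reads)
toy_median = median(toy_reads)
print(f"N50 = {toy_n50} bp, median = {toy_median} bp")
```

Because the N50 is weighted by bases rather than by read count, it is larger than the median whenever a few long reads carry a large share of the throughput, as in both runs above.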
We taxonomically classified all the uncorrected raw reads with Kraken2 and found 81.8% of classified reads in HMW vs. 70.8% in non-HMW. More than 99% of total reads corresponded to Bacteria. The most abundant phylum was Bacteroidetes (~ 80% of total reads), followed by Firmicutes (12.5% in HMW vs. 8.9% in non-HMW), Proteobacteria (~ 5%), and Fusobacteria (1.9% in HMW vs. 3.9% in non-HMW). At the genus level, this dog fecal microbiome was rich in Prevotella (> 50%) and Bacteroides (> 20%). It also contained Fusobacterium, Megamonas, Sutterella, and other fecal-related genera, each representing less than 5% of the total bacterial composition (Additional File 2).
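Classified fractions and phylum-level percentages of this kind can be read from a Kraken2 standard report, which has six tab-separated columns (percentage, clade reads, taxon reads, rank code, taxid, name). The sketch below assumes that report layout; the function name and the toy report are hypothetical and do not reproduce the values of this study.

```python
def summarize_kraken2_report(report_text, rank="P"):
    """Parse a Kraken2 standard report and return the classified percentage
    plus the per-taxon percentages at one rank code (P = phylum)."""
    unclassified_pct = 0.0
    by_rank = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        pct, _clade, _taxon, rank_code, _taxid, name = fields[:6]
        if rank_code == "U":
            unclassified_pct = float(pct)
        elif rank_code == rank:
            # Names are indented with spaces to show the taxonomy depth.
            by_rank[name.strip()] = float(pct)
    return 100.0 - unclassified_pct, by_rank


# Hypothetical toy report (counts and percentages are made up).
toy_report = "\n".join([
    "20.00\t200\t200\tU\t0\tunclassified",
    "80.00\t800\t5\tR\t1\troot",
    "79.00\t790\t10\tD\t2\t    Bacteria",
    "60.00\t600\t10\tP\t976\t      Bacteroidetes",
    "15.00\t150\t10\tP\t1239\t      Firmicutes",
])
classified_pct, phyla = summarize_kraken2_report(toy_report)
print(classified_pct, phyla)
```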
The metagenomics assembly of the HMW DNA dataset was more contiguous, with fewer and longer contigs than the non-HMW DNA assembly (contigs: 1,898 vs. 2,944; N50: 187,680 vs. 94,109 bp) (Additional File 3). Moreover, the HMW DNA metagenomics assembly retrieved six HQ MAGs, whereas only one HQ MAG was retrieved from the non-HMW DNA assembly (Fig. 2 and Additional File 3).
In summary, HMW DNA extraction improved the taxonomic classification of the raw unassembled reads (fewer unclassified reads), the metagenomics assembly contiguity, and the retrieval of longer and circular contigs (potential HQ MAGs). Thus, HMW DNA extraction is the preferred choice to recover HQ MAGs directly from complex metagenomics samples.
Metagenomics assembly with different subsets followed by frameshift-aware correction retrieved eight high-quality MAGs
To ensure the highest coverage and consensus accuracies for the retrieved MAGs, we further merged and assembled the HMW and the non-HMW datasets (100% dataset; 16.94 million reads, 36.05 Gb). As we aimed to retrieve the maximum number of HQ MAGs, we performed extra metagenomics assemblies using 75% and 50% data subsets from that merged dataset (Additional File 3).
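The text does not state how the 75% and 50% subsets were drawn. As an illustration only, the sketch below assumes a uniform random draw of reads without replacement with a fixed seed; the function name `subsample` and the read identifiers are hypothetical.

```python
import random


def subsample(read_ids, fraction, seed=0):
    """Return a random subset holding `fraction` of the reads.

    Assumption: reads are drawn uniformly without replacement; the
    source does not specify the subsampling procedure.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = round(len(read_ids) * fraction)
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return rng.sample(read_ids, k)


# Hypothetical read identifiers standing in for the merged dataset.
toy_ids = [f"read_{i}" for i in range(100)]
subset_75 = subsample(toy_ids, 0.75)
subset_50 = subsample(toy_ids, 0.50)
print(len(subset_75), len(subset_50))
```

Assembling subsets lowers the coverage of very abundant genomes, which is one plausible reason why a subset can resolve a MAG that the full dataset does not.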
After assigning taxonomy and comparing the assemblies, we identified eleven non-redundant MAGs: eight HQ MAGs and three MQ MAGs (Table 1). Compared with the HMW assembly, the 100% data assembly (the HMW and the non-HMW datasets together) yielded two new MQ MAGs. Moreover, two MQ MAGs from the HMW and 100% datasets were recovered as HQ MAGs from the 75% dataset. No single assembly retrieved all eight HQ MAGs.
Table 1
High-quality (HQ) and medium-quality (mq) single-contig MAGs retrieved in each metagenome assembly. Taxonomy was assigned using the GTDB database release 95. Q is the MAG quality. Cov. is the coverage reported by Flye. *Blautia_A sp900541345 and g__Sutterella were HQ MAGs after correction of the indels.
Taxonomy (GTDB) | HMW data: Q | HMW data: Cov. | 100% data: Q | 100% data: Cov. | 75% data: Q | 75% data: Cov. | 50% data: Q | 50% data: Cov. |
HQ MAG | | | | | | | | |
g__Succinivibrio | HQ | 47X | HQ | 101X | mq | 82X | HQ | 50X |
g__Sutterella* | mq | 95X | mq | 159X | HQ | 123X | mq | 87/80X |
Prevotellamassilia sp900541335 | HQ | 394X | HQ | 577X | HQ | 430X | HQ | 282X |
Phascolarctobacterium sp900544885 | HQ | 87X | HQ | 205X | HQ | 155X | mq | 98X |
Catenibacterium sp000437715 | HQ | 13X | mq | 24X | HQ | 17X | mq | 11X |
Blautia_A sp003287895 | - | - | mq | 38X | HQ | 31X | mq | 18X |
Enterococcus_B hirae | HQ | 17X | HQ | 42X | HQ | 31X | HQ | 22X |
Blautia_A sp900541345* | HQ | 44X | - | - | mq | 45X | - | - |
MQ MAG | | | | | | | | |
Phocaeicola plebeius | mq | 126X | mq | 234X | mq | 168X | - | - |
g__Bacteroides | mq | 206X | mq | 368X | mq | 282X | mq | 196X |
g__Phocaeicola | - | - | mq | 271X | - | - | - | - |
For each HQ MAG, we selected the representative with the highest coverage, and consequently the highest consensus accuracy, for further analyses. We performed an extra step of frameshift-aware correction to reduce insertions and deletions (indels), which are the most abundant nanopore sequencing error type. The frameshift correction resulted in fewer predicted coding sequences (CDS) (Fig. 3 and Additional File 4). This correction step transformed two MQ MAGs into HQ MAGs: Blautia sp900541345 from the HMW-only assembly (from 84.99% to 93.86% completeness) and the Sutterella MAG from the 75% assembly (from 84.88% to 95.49% completeness) (Fig. 3). For the other HQ MAGs, completeness remained constant or increased after the frameshift correction, except for one contig (Enterococcus hirae, 42X coverage), whose completeness decreased from 99.69% to 99.13% after the indel correction. The effect of the frameshift correction was more evident in contigs with low coverage than in those with high coverage.
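The selection rule described above (for each taxon, keep the HQ assembly with the highest coverage) can be applied directly to the values in Table 1. The sketch below is one reading of that rule, not the authors' script; the dictionary layout and function name are assumptions, and the quality labels are those shown in Table 1.

```python
# Quality and Flye coverage per assembly, transcribed from Table 1.
# The 50% entry for g__Sutterella is reported as "87/80X" and is omitted:
# it is mq, so it cannot be selected either way.
TABLE1 = {
    "g__Succinivibrio": {"HMW": ("HQ", 47), "100%": ("HQ", 101), "75%": ("mq", 82), "50%": ("HQ", 50)},
    "g__Sutterella": {"HMW": ("mq", 95), "100%": ("mq", 159), "75%": ("HQ", 123)},
    "Prevotellamassilia sp900541335": {"HMW": ("HQ", 394), "100%": ("HQ", 577), "75%": ("HQ", 430), "50%": ("HQ", 282)},
    "Phascolarctobacterium sp900544885": {"HMW": ("HQ", 87), "100%": ("HQ", 205), "75%": ("HQ", 155), "50%": ("mq", 98)},
    "Catenibacterium sp000437715": {"HMW": ("HQ", 13), "100%": ("mq", 24), "75%": ("HQ", 17), "50%": ("mq", 11)},
    "Blautia_A sp003287895": {"100%": ("mq", 38), "75%": ("HQ", 31), "50%": ("mq", 18)},
    "Enterococcus_B hirae": {"HMW": ("HQ", 17), "100%": ("HQ", 42), "75%": ("HQ", 31), "50%": ("HQ", 22)},
    "Blautia_A sp900541345": {"HMW": ("HQ", 44), "75%": ("mq", 45)},
}


def pick_representatives(table):
    """For each taxon, return (assembly, coverage) of the HQ MAG
    with the highest coverage; taxa without any HQ MAG are skipped."""
    best = {}
    for taxon, assemblies in table.items():
        hq = [(cov, name) for name, (quality, cov) in assemblies.items() if quality == "HQ"]
        if hq:
            cov, name = max(hq)
            best[taxon] = (name, cov)
    return best


representatives = pick_representatives(TABLE1)
for taxon, (assembly, cov) in representatives.items():
    print(f"{taxon}: {assembly} assembly, {cov}X")
```

Under this reading, the representatives come from three different assemblies (HMW, 100%, and 75%), which illustrates why no single assembly was sufficient.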
High-quality MAGs of the canine fecal microbiome improved previous genome assemblies
From a single canine fecal sample, we obtained eight HQ MAGs that were single-contig, > 90% complete with < 5% contamination, and contained most ribosomal genes and tRNAs (Table 2). Thus, they represent HQ MAGs without gaps or unplaced scaffolds, according to the MIMAG criteria [32]. We used GTDB-tk to assign the taxonomy and assess the potential novelty. The ANI values serve to identify potential novel taxa (genomes sharing > 95% ANI are considered the same species [36, 48]).
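The quality thresholds above can be expressed as a small decision function. This is a minimal sketch of the MIMAG draft-quality tiers (high quality: > 90% completeness, < 5% contamination, 5S, 16S, and 23S rRNA genes, and at least 18 tRNAs; medium quality: at least 50% completeness and < 10% contamination). The function name is hypothetical, and the contamination value and rRNA presence flags in the example are assumptions, since they are not reported in this section.

```python
def mimag_quality(completeness, contamination, has_5s, has_16s, has_23s, n_trnas):
    """Classify a MAG into a MIMAG-style quality tier.

    Anything that is neither high nor medium quality is reported
    as "low" here for simplicity.
    """
    has_all_rrnas = has_5s and has_16s and has_23s
    if completeness > 90 and contamination < 5 and has_all_rrnas and n_trnas >= 18:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


# Sutterella MAG from the 75% assembly, before and after frameshift correction.
# Completeness and tRNA count come from the text and Table 2; the 1.0%
# contamination and the rRNA flags are hypothetical placeholders.
before = mimag_quality(84.88, 1.0, True, True, True, 67)
after = mimag_quality(95.49, 1.0, True, True, True, 67)
print(before, "->", after)
```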
Table 2
Summary of genome statistics for the HQ MAGs compared with their representatives in public datasets. Completeness (% Compl.) values come from CheckM; tRNA and rRNA counts come from PROKKA. MAGs in public databases with > 95% ANI represent the same species. Ref: reference complete genome. *Although Succinivibrio and Sutterella were potential novel species according to GTDB, we found a single MAG with > 95% ANI in the animal gut metagenome catalog and the UHGG catalog, respectively.
HQ MAG | Length (Mbp) | % Compl. | tRNAs | rRNAs | Contiguity level |
Succinivibrio sp.* | 2.04 | 98.68 | 77 | 22 | Complete |
> 95% ANI to Succinivibrio MAG in dog GI | 1.74 | 97.5 | 32 | 0 | 185 contigs |
Sutterella sp.* | 2.7 | 95.49 | 67 | 18 | Complete |
> 95% ANI to Sutterella MAG in human GI | 1.14 | 78.72 | 37 | 0 | 24 contigs |
Prevotellamassilia sp900541335 | 2.72 | 97.65 | 72 | 21 | Complete |
> 95% ANI to GCA_900541335.1 | 2.42 | 96.13 | 16 | 0 | 95 contigs |
Phascolarctobacterium sp900544885 | 2.09 | 99.85 | 58 | 15 | Complete |
> 95% ANI to GCA_900544885.1 | 1.75 | 98.65 | 18 | 1 | 87 contigs |
Catenibacterium sp000437715 | 2.53 | 98.5 | 76 | 21 | Complete |
> 95% ANI to GCF_004168205.1 | 2.54 | 100 | 20 | 2 | 212 contigs |
Blautia sp900541345 | 2.44 | 93.86 | 53 | 18 | Complete |
> 95% ANI to GCA_900541345.1 | 2.69 | 95.85 | 16 | 0 | 160 contigs |
Enterococcus_B hirae | 2.78 | 99.13 | 69 | 18 | Complete |
Ref: GCF_000271405.2 | 2.83 | 99.63 | 71 | 18 | Complete |
Blautia sp003287895 (Blautia argi) | 2.96 | 92.78 | 58 | 10 | Complete |
Ref: GCF_003287895.1 | 3.3 | 97.64 | 57 | 14 | Complete |
Although Sutterella and Succinivibrio were considered novel by GTDB-tk, we found one MAG for each in human and dog GI datasets, respectively, that presented > 95% ANI to our HQ MAGs. Similarly, the Prevotellamassilia sp900541335, Phascolarctobacterium sp900544885, Catenibacterium sp000437715, and Blautia sp900541345 HQ MAGs represented bacterial species previously retrieved from metagenomes. In contrast, the Enterococcus_B hirae and Blautia sp003287895 HQ MAGs represented bacterial species that have complete reference genomes. In fact, Blautia sp003287895 (proposed name Blautia argi) was first isolated and characterized from dog feces [49]. The Enterococcus_B hirae and Blautia sp003287895 HQ MAGs were aligned against their respective reference genomes to validate the results (Additional File 5).
To conclude, six out of eight HQ MAGs represented bacterial species that lack a complete reference genome. Their current representatives are MAGs retrieved with short-read data, which are therefore highly fragmented and contain only a few ribosomal genes, if any (Table 2).
Screening of previous microbiome studies revealed the first potential genome assembly for Sutterella stercoricanis
We assessed the prevalence of the HQ MAGs retrieved in the present study among several GI microbiome surveys, either using whole-genome data (metagenome surveys) or the 16S rRNA genes data (amplicon surveys).
On the one hand, we assessed the prevalence of our HQ MAGs in the human [44] and animal [10] gastrointestinal metagenome catalogs (Table 3). Some of the bacterial species represented by the HQ MAGs from this study seem to be more canid-specific (Blautia_A sp900541345, Phascolarctobacterium sp900544885, Prevotellamassilia sp900541335, and Succinivibrio), whereas others are more broadly distributed among animal microbiomes (Catenibacterium sp000437715, Enterococcus_B hirae, Blautia sp003287895, and Sutterella).
On the other hand, we took advantage of the fact that long-read sequencing allows retrieving complete ribosomal genes, which are universal taxonomic markers for Bacteria. Because most microbiome studies use the 16S rRNA gene as a genetic marker, we extracted the 16S rRNA genes of the HQ MAGs to link them to 16S rRNA gene-based microbiome studies (Fig. 4 and Additional File 6). We found that the Sutterella HQ MAG is potentially the first high-quality genome assembly for Sutterella stercoricanis, since its 16S rRNA genes presented identities > 98% with the previously reported 16S rRNA gene reference (NR_025600.1) (Fig. 4). S. stercoricanis was first isolated from the feces of a healthy dog and was characterized using microbiological methods and 16S rRNA gene sequencing [50].
For the other five HQ MAGs without a reference genome, their 16S rRNA genes were closely related to others previously identified in the distal gut microbiome of wolves [51] (Succinivibrio HQ MAG and Prevotellamassilia HQ MAG), the canine intestinal microbiome [52] (Phascolarctobacterium HQ MAG), and the human GI microbiome [53] (Catenibacterium and Blautia sp900541345 HQ MAGs) (Additional File 6).
Table 3
Prevalence of the bacterial species identified in public microbiome surveys. For human-derived MAGs, the Unified Human Gut Genome database was used [44]. For animal-derived MAGs, the animal gut metagenome catalog [10] was used. If no MAG belonged to that bacterial species, we further screened GTDB [34]. For further detail on 16S rRNA gene phylogenies, see Additional File 6.
HQ MAG | Dog | Human | Other animals | Closest 16S | Main host |
Blautia_A sp900541345 | 35 | 1 | 0 | Human gut | Dog |
Phascolarctobacterium sp900544885 | 12 | 1 | 0 | Dog gut | Dog |
Prevotellamassilia sp900541335 | 7 | 1 | 0 | Wolves’ gut | Canids |
g__Succinivibrio | 1 | 0 | 0 | Wolves’ gut | Canids |
Catenibacterium sp000437715 | 27 | 691 | 2 | Human gut | Human, animal |
Enterococcus_B hirae | 1 | 35 | 3 | Multiple | Human, animal |
Blautia sp003287895 | 1 | 6 | 1 | Dog gut | Human, animal |
g__Sutterella | 0 | 1 | 0 | Multiple carnivora | Human, animal |
Finally, we performed a pangenome analysis of the HQ MAGs from our study and other genomes from the same bacterial species inhabiting different hosts to assess functional and genomic similarities (Additional File 7). We included only those species for which more than 10 representative genomes were available: Blautia_A sp900541345 (Additional File 7A), Catenibacterium sp000437715 (Additional File 7B), Enterococcus_B hirae (Additional File 7C), and Phascolarctobacterium sp900544885 (Additional File 7D). Based on the ANI values, the HQ MAGs clustered with dog MAGs for Blautia, with a human MAG for Phascolarctobacterium, and with MAGs from mixed host origins for Catenibacterium and Enterococcus hirae (Additional File 7). The proportion of gene clusters belonging to the accessory genome was highest for Catenibacterium (84%), compared to Enterococcus hirae (66%), Phascolarctobacterium sp900544885 (60%), and Blautia_A sp900541345 (50%). Altogether, these results are consistent with the observation that Catenibacterium and Enterococcus hirae seem to be more broadly distributed among different hosts (Table 3).
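The accessory-genome percentages above are proportions of gene clusters. As an illustration, the sketch below assumes a strict definition in which a cluster is core only if it is present in every genome and accessory otherwise; pangenome tools may use softer thresholds, and the source does not state which was applied. The function name and the toy presence table are hypothetical.

```python
def accessory_fraction(presence):
    """Return the fraction of gene clusters that are not present in all genomes.

    `presence` maps a gene-cluster id to the set of genome ids carrying it.
    Assumption: core = present in every genome; everything else is accessory.
    """
    if not presence:
        raise ValueError("presence must not be empty")
    all_genomes = set().union(*presence.values())
    n_core = sum(1 for genomes in presence.values() if genomes == all_genomes)
    return (len(presence) - n_core) / len(presence)


# Hypothetical toy pangenome: five gene clusters across three genomes.
toy_presence = {
    "gc1": {"mag_dog", "mag_human", "hq_mag"},
    "gc2": {"mag_dog", "mag_human", "hq_mag"},
    "gc3": {"mag_dog", "hq_mag"},
    "gc4": {"mag_human"},
    "gc5": {"hq_mag"},
}
toy_accessory = accessory_fraction(toy_presence)
print(f"accessory fraction: {toy_accessory:.0%}")
```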
Long reads provide genomic context and enable capturing mobilome functions and antimicrobial resistance genes
Long reads make it possible to retrieve complete genes and their genomic context within a single read. Therefore, both the mobile genetic elements and the antimicrobial resistance genes assemble easily within the correct MAG.
We compared the functional potential of each HQ MAG to previously published MAGs from the same bacterial species found in the GI microbiome of dogs, humans, or other animals (Fig. 5). The main difference between the long-read HQ MAGs and other genomes from the same species in the public databases is the overrepresentation of the COG category corresponding to Mobilome, except for Blautia argi and Enterococcus hirae, both of which have a reference genome in the database (Fig. 5B). In contrast to the MAGs from the UHGG and the animal gut metagenome catalogs, which were obtained exclusively from short reads, the long-read metagenomic approach can retrieve mobile genetic elements and assemble them into the proper contig.
Finally, we further characterized the HQ MAGs to assess their potential antimicrobial resistance. Tetracycline resistance genes were detected in Enterococcus hirae (tetM gene), Catenibacterium sp000437715 (tetM gene), Blautia sp900541345 (tet(O) gene), and Blautia sp003287895 (tet(32) and tet(40) genes). Moreover, Enterococcus hirae also harbored the aac(6')-Iid gene, which confers resistance to aminoglycosides. The Prevotellamassilia HQ MAG harbored the Mef(En2) gene, which encodes an efflux pump that exports macrolides. The Phascolarctobacterium HQ MAG harbored two copies of the lnu(C) gene, which confers resistance to lincosamides. Each lnu(C) gene was located in an ISSag10 mobile element, allowing it to transpose. The Succinivibrio and Sutterella HQ MAGs did not contain any antimicrobial resistance genes.
As an example of the potential of long reads for providing genomic context, we identified that the tetM gene in Enterococcus hirae was located in a region identified as a conjugative element (Tn916) integrated into the chromosome. This region encoded a transposase, a type 4 secretion system (T4SS), a type 4 coupling protein, an oriT, and a relaxase (Additional File 8).
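On a single-contig MAG, checking whether a resistance gene lies inside a mobile element reduces to comparing coordinate intervals. The sketch below illustrates that idea; it is not the annotation pipeline used in the study, and the `Feature` type, the function name, and all coordinates are hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 1-based start on the contig
    end: int    # inclusive end on the contig


def genes_within_elements(genes, elements):
    """Map each gene to the first mobile element that fully contains it.

    Assumption: all features lie on the same single-contig MAG,
    so coordinates are directly comparable.
    """
    hits = {}
    for gene in genes:
        for element in elements:
            if element.start <= gene.start and gene.end <= element.end:
                hits[gene.name] = element.name
                break
    return hits


# Hypothetical coordinates, for illustration only.
toy_elements = [Feature("Tn916", 10_000, 28_000)]
toy_genes = [
    Feature("tetM", 20_000, 21_920),
    Feature("aac(6')-Iid", 500_000, 500_550),
]
toy_hits = genes_within_elements(toy_genes, toy_elements)
print(toy_hits)
```

With fragmented short-read assemblies, the gene and the element often end up on different contigs, so this kind of containment check is only possible when both assemble into the same contig.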