We applied long-read nanopore sequencing to the fecal microbiome of a healthy dog. At the technical level, we compared the assembly results obtained with an HMW DNA extraction vs. a non-HMW one from the same fecal sample. Finally, using different metagenomics datasets, we retrieved and characterized eight high-quality single-contig draft metagenome-assembled genomes (HQ MAGs) according to the MIMAG criteria (completeness >90%, contamination <5%, and presence of rRNA and tRNA genes) and three medium-quality draft metagenome-assembled genomes (MQ MAGs; completeness >50%, contamination <10%) [31].
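The quality tiers above can be written as a small decision rule. The sketch below uses only the thresholds quoted in this section; the function name, the boolean flags for rRNA/tRNA presence, and the "LQ" fallback label are illustrative assumptions, and the contamination value in the usage lines is hypothetical.

```python
def classify_mag(completeness, contamination, has_rrna, has_trna):
    """Assign a quality tier using the thresholds quoted in the text.

    completeness and contamination are percentages (e.g. from CheckM).
    'LQ' is a label assumed here for anything below the MQ thresholds.
    """
    if completeness > 90 and contamination < 5 and has_rrna and has_trna:
        return "HQ"
    if completeness > 50 and contamination < 10:
        return "MQ"
    return "LQ"


# Completeness values are those reported for the Blautia MAG before and
# after indel correction; the 1.0% contamination is a hypothetical value.
print(classify_mag(84.99, 1.0, True, True))  # MQ
print(classify_mag(93.86, 1.0, True, True))  # HQ
```

Note that a MAG with high completeness but no detected rRNA genes falls back to MQ under this rule, which is why short-read MAGs lacking ribosomal genes cannot reach the HQ tier.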
HMW vs. non-HMW DNA: raw reads and metagenome assembly
HMW sequencing produced 5.81 million reads with N50 of 4,369 bp and a median length of 2,312 bp (total throughput: 18.76 Gb), whereas non-HMW produced 11.13 million reads with N50 of 2,102 bp and a median length of 1,093 bp (total throughput: 17.29 Gb).
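For readers less familiar with the N50 statistic, the following minimal sketch shows how it is computed from a list of read (or contig) lengths. The toy lengths are invented for illustration; only the throughput and read counts in the last lines come from the text.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Toy example (hypothetical lengths, in bp).
toy_lengths = [8000, 5000, 4000, 3000, 2000, 1000, 1000]
print(n50(toy_lengths))  # 5000

# Mean read length implied by the reported throughput and read counts.
mean_hmw = 18.76e9 / 5.81e6       # roughly 3.2 kb
mean_non_hmw = 17.29e9 / 11.13e6  # roughly 1.6 kb
print(round(mean_hmw), round(mean_non_hmw))
```

Because N50 is weighted by bases rather than by read count, it sits above the median read length in both datasets, as reported above.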
We taxonomically classified the uncorrected raw reads with Kraken2 and found 81.8% of classified reads in HMW vs. 70.8% in non-HMW. More than 99% of total reads corresponded to Bacteria. The most abundant phylum was Bacteroidetes (~80% of total reads), followed in abundance by Firmicutes (12.5% in HMW vs. 8.9% in non-HMW), Proteobacteria (~5%), and Fusobacteria (1.9% in HMW vs. 3.9% in non-HMW). At the genus level, this dog fecal microbiome is rich in Prevotella (>50%) and Bacteroides (>20%). It also contains Fusobacterium, Megamonas, Sutterella, and other fecal-related genera, each representing less than 5% of the total bacterial composition (Supplementary Figure S1).
The metagenome assembly was more contiguous, with fewer and longer contigs, when using HMW DNA reads rather than non-HMW DNA reads (nº of contigs: 1,898 vs. 2,944; N50: 187,680 vs. 94,109 bp). Moreover, the HMW DNA assembly retrieved three circular contigs, which could represent complete closed MAGs, compared with only one circular contig in the non-HMW DNA assembly (Figure 1).
So, HMW DNA extraction improved the taxonomic classification of the raw unassembled reads (fewer unclassified reads), the contiguity of the metagenome assembly, and the retrieval of longer and circular contigs (potential HQ MAGs).
Metagenome assemblies, frameshift-aware correction, and retrieval of HQ and MQ MAGs
For the in-depth analyses, we assembled both the HMW-only dataset and the merged HMW and non-HMW dataset (100% dataset; 16.94 million reads, 36.05 Gb) to ensure the highest coverage and consensus accuracy. As we aimed to retrieve the maximum number of HQ MAGs, we performed extra metagenome assemblies using 75% and 50% data subsets from that merged dataset (Table 1).
The number of contigs ranged from 1,898 with the HMW dataset to 2,639 with the full merged dataset. N50 ranged from 187,680 bp (HMW dataset) to 149,125 bp (50% subset), and mean coverage ranged from 138X (100% dataset) to 95X (50% subset). The largest contig, of 2.95 Mbp, was retrieved when using 75% of the data.
Table 1. Flye assembly summary statistics and the final number of HQ and MQ MAGs for each metagenome assembly. HQ: high-quality; MQ: medium-quality.
| | HMW data | 100% data | 75% data | 50% data |
|---|---|---|---|---|
| Total length (bp) | 125,567,322 | 141,997,441 | 131,702,503 | 119,187,600 |
| Contigs | 1,898 | 2,639 | 2,259 | 1,901 |
| Contigs N50 (bp) | 187,680 | 150,083 | 162,895 | 149,125 |
| Largest contig (bp) | 2,751,144 | 2,769,659 | 2,950,218 | 2,846,287 |
| Mean coverage | 104X | 138X | 119X | 95X |
| nº of HQ MAGs | 6 | 4 | 6 | 3 |
| nº of MQ MAGs | 3 | 6 | 4 | 5 |
After assigning taxonomy and comparing among assemblies, we identified a total of eight different HQ MAGs, and three different MQ MAGs (Table 2). The different datasets retrieved redundant MAGs but with different degrees of quality. None of the performed assemblies alone retrieved all the HQ MAGs.
Table 2. High-quality (HQ) and medium-quality (MQ) single-contig MAGs retrieved in each metagenome assembly. Taxonomy assigned using the GTDB database release 95. Cov. is the coverage from Flye. *Blautia_A sp900541345 and g__Sutterella reached HQ status after indel correction.
| Taxonomy (GTDB) | HMW data: MAG quality | HMW data: Cov. | 100% data: MAG quality | 100% data: Cov. | 75% data: MAG quality | 75% data: Cov. | 50% data: MAG quality | 50% data: Cov. |
|---|---|---|---|---|---|---|---|---|
| HQ MAG | | | | | | | | |
| Prevotellamassilia sp900541335 | HQ | 394X | HQ | 577X | HQ | 430X | HQ | 282X |
| Phascolarctobacterium sp900544885 | HQ | 87X | HQ | 205X | HQ | 155X | MQ | 98X |
| Catenibacterium sp000437715 | HQ | 13X | MQ | 24X | HQ | 17X | MQ | 11X |
| Enterococcus_B hirae | HQ | 17X | HQ | 42X | HQ | 31X | HQ | 22X |
| Blautia_A sp900541345* | HQ | 44X | - | - | MQ | 45X | - | - |
| Blautia_A sp003287895 | - | - | MQ | 38X | HQ | 31X | MQ | 18X |
| g__Succinivibrio | HQ | 47X | HQ | 101X | MQ | 82X | HQ | 50X |
| g__Sutterella* | MQ | 95X | MQ | 159X | HQ | 123X | MQ | 87/80X |
| MQ MAG | | | | | | | | |
| Phocaeicola plebeius | MQ | 126X | MQ | 234X | MQ | 168X | - | - |
| g__Bacteroides | MQ | 206X | MQ | 368X | MQ | 282X | MQ | 196X |
| g__Phocaeicola | - | - | MQ | 271X | - | - | - | - |
The eight HQ MAGs corresponded to the genera Prevotellamassilia, Phascolarctobacterium, Catenibacterium, Enterococcus, Succinivibrio, Blautia (two species), and Sutterella (Table 2). The HMW dataset and the 75% subset assemblies each recovered six of the eight HQ MAGs. Four of them were shared by both assemblies and corresponded to Prevotellamassilia sp900541335, Phascolarctobacterium_A sp900544885, Catenibacterium sp000437715, and Enterococcus_B hirae. The remaining two from the HMW dataset were g__Succinivibrio (recovered as HQ in all the datasets except the 75% subset) and Blautia_A sp900541345 (recovered after frameshift correction). Finally, the remaining two from the 75% subset were Blautia_A sp003287895 and g__Sutterella (recovered after frameshift correction).
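The bookkeeping behind these counts can be reproduced from the quality calls in Table 2. The sketch below transcribes only the HQ/MQ labels (coverage is omitted) and is an illustrative tally, not the procedure used in the study; the variable names are assumptions.

```python
ASSEMBLIES = ["HMW", "100%", "75%", "50%"]

# Quality call per assembly, in the order of ASSEMBLIES (None = not recovered).
# Labels transcribed from Table 2.
QUALITY = {
    "Prevotellamassilia sp900541335": ["HQ", "HQ", "HQ", "HQ"],
    "Phascolarctobacterium sp900544885": ["HQ", "HQ", "HQ", "MQ"],
    "Catenibacterium sp000437715": ["HQ", "MQ", "HQ", "MQ"],
    "Enterococcus_B hirae": ["HQ", "HQ", "HQ", "HQ"],
    "Blautia_A sp900541345": ["HQ", None, "MQ", None],
    "Blautia_A sp003287895": [None, "MQ", "HQ", "MQ"],
    "g__Succinivibrio": ["HQ", "HQ", "MQ", "HQ"],
    "g__Sutterella": ["MQ", "MQ", "HQ", "MQ"],
    "Phocaeicola plebeius": ["MQ", "MQ", "MQ", None],
    "g__Bacteroides": ["MQ", "MQ", "MQ", "MQ"],
    "g__Phocaeicola": [None, "MQ", None, None],
}

# Taxa called HQ in each assembly.
hq_by_assembly = {
    name: {taxon for taxon, calls in QUALITY.items() if calls[i] == "HQ"}
    for i, name in enumerate(ASSEMBLIES)
}

# Taxa that reach HQ in at least one assembly, and taxa that never do.
all_hq = set().union(*hq_by_assembly.values())
mq_only = set(QUALITY) - all_hq

for name in ASSEMBLIES:
    missing = sorted(all_hq - hq_by_assembly[name])
    print(f"{name}: {len(hq_by_assembly[name])} HQ MAGs, missing {missing}")
print(len(all_hq), "distinct HQ MAGs;", len(mq_only), "MQ-only MAGs")
```

The tally gives 6, 4, 6, and 3 HQ MAGs per assembly, matching Table 1, and shows that every assembly misses at least two of the eight HQ MAGs.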
For each HQ MAG, we chose the representative with the highest coverage (and, consequently, the highest consensus accuracy) to continue the analysis. We performed an extra correction step to reduce insertions and deletions (indels), the most abundant error type in nanopore sequencing. The indel correction reduced the frameshift errors and, consequently, the number of predicted coding sequences (CDS) (Supplementary Figure S2). This correction step transformed two MQ MAGs into HQ MAGs: the Blautia sp900541345 MAG in the HMW-only assembly (from MQ MAG with 84.99% completeness to HQ MAG with 93.86% completeness) and the Sutterella MAG in the 75% assembly (from MQ MAG with 84.88% completeness to HQ MAG with 95.49% completeness). For the other HQ MAGs, completeness remained constant or increased after applying the indel correction, except for one of the contigs (Enterococcus hirae, 47X coverage; completeness of 99.69% to 99.13% after the indel correction). The effect of indel correction was more evident in contigs with low coverage than in those with high coverage.
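To illustrate why uncorrected indels inflate the number of predicted CDS, the toy example below translates a short made-up gene before and after a single-base deletion. The sequence is hypothetical and the helper is a minimal sketch using the standard genetic code, not the correction tool used in this study.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_until_stop(dna):
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical 21-nt gene and the same gene with one base deleted.
true_gene = "ATGAAATTGACCGGTCATTAA"
read_with_deletion = true_gene[:3] + true_gene[4:]

print(translate_until_stop(true_gene))           # MKLTGH
print(translate_until_stop(read_with_deletion))  # MN
```

The deletion shifts the reading frame and introduces a premature stop codon, so a gene predictor would report a truncated CDS (and often a second fragment downstream) instead of one intact gene. Marker-gene-based completeness estimates drop for the same reason, which is consistent with the completeness gains seen after correction.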
Characterization of the HQ MAGs of the canine fecal microbiome
The eight HQ MAGs obtained are single-contig and represent complete HQ draft MAGs, without gaps or unplaced scaffolds (Table 3). GTDB-tk uses average nucleotide identity (ANI) values to identify potential novel taxa (>95% ANI is considered as the same species [40,45]). Of the eight HQ MAGs, two corresponded to potentially new species (Succinivibrio sp. and Sutterella sp.); four represented the first contiguous draft genome assembly for their genus (Prevotellamassilia, Phascolarctobacterium, and Catenibacterium HQ MAGs) or their species (Blautia sp900541345 HQ MAG); and the remaining two corresponded to species with complete reference genomes (Enterococcus_B hirae and Blautia sp003287895).
Table 3. High-quality MAGs comparison to references. Completeness (% Compl.) values come from CheckM; tRNAs and rRNA from PROKKA; genome reference (Ref.) for the bacterial species from GTDB-tk.
| HQ MAG | Length (Mbp) | % Compl. | tRNAs | rRNAs | Contiguity level |
|---|---|---|---|---|---|
| Succinivibrio sp. | 2.04 | 98.68 | 77 | 22 | Complete - new sp |
| Succinivibrio genus | 1.38 - 3.96 | 51.33 - 100 | 10 - 66 | 0 - 24 | 10 - 320 scaffolds |
| Sutterella sp. | 2.70 | 95.49 | 67 | 18 | Complete |
| Sutterella genus | 2.28 - 2.99 | 74.48 - 100 | 15 - 67 | 0 - 24 | 1 - 298 scaffolds |
| Prevotellamassilia sp900541335 | 2.72 | 97.65 | 72 | 21 | Complete |
| Ref: GCA_900541335.1 | 2.42 | 96.13 | 16 | 0 | 95 contigs |
| Phascolarctobacterium_sp900544885 | 2.09 | 99.85 | 58 | 15 | Complete |
| Ref: GCA_900544885.1 | 1.75 | 98.65 | 18 | 1 | 87 contigs |
| Catenibacterium sp000437715 | 2.53 | 98.50 | 76 | 21 | Complete |
| Ref: GCF_004168205.1 | 2.54 | 100 | 20 | 2 | 212 contigs |
| Blautia sp900541345 | 2.44 | 93.86 | 53 | 18 | Complete |
| Ref: GCA_900541345.1 | 2.69 | 95.85 | 16 | 0 | 160 contigs |
| Enterococcus_B hirae | 2.78 | 99.13 | 69 | 18 | Complete |
| Ref: GCF_000271405.2 | 2.83 | 99.63 | 71 | 18 | Complete |
| Blautia sp003287895 | 2.96 | 92.78 | 58 | 10 | Complete |
| Ref: GCF_003287895.1 | 3.30 | 97.64 | 57 | 14 | Complete |
Potential novel Succinivibrio species
Succinivibrio HQ MAG represents a new Succinivibrio species without any described representative, as confirmed by an ANI of 80% to its closest genome assembly GCA_900552905.1 (<80% to Succinivibrio dextrinosolvens representatives). Moreover, all the Succinivibrio genome assemblies in NCBI are fragmented (‘contig’ or ‘scaffold’ level). So, this is the first contiguous assembly for the Succinivibrio genus. In GTDB taxonomy, several genome assemblies from the Succinatimonas genus and others are re-classified as Succinivibrio, so we included representatives of these genera in the phylogenetic tree (Figure 2).
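The species-level call rests on the 95% ANI convention quoted earlier in this section. As a minimal sketch of that rule (the function and constant names are illustrative assumptions; GTDB-tk's actual decision logic involves additional criteria not described here):

```python
SPECIES_ANI_THRESHOLD = 95.0  # % ANI; above this, genomes are treated as the same species


def is_potential_novel_species(ani_to_closest_reference):
    """True when the closest reference falls at or below the species ANI threshold."""
    return ani_to_closest_reference <= SPECIES_ANI_THRESHOLD


# ANI of ~80% to the closest assembly, as reported for the Succinivibrio HQ MAG.
print(is_potential_novel_species(80.0))  # True
```

An ANI of about 80% is far below the species boundary, which supports treating this MAG as a new species within the genus rather than a divergent strain of a described one.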
Further genome characterization detected a total of 22 ribosomal genes. Among these, its seven 16S rRNA genes presented the highest identity to uncultured bacterium clone CL_F_057 (GenBank: FJ978526.1) (Supplementary Figure S3), previously identified in wolves’ distal gut microbiome [46]. The Succinivibrio HQ MAG did not harbor antimicrobial resistance genes.
Potential genome for Sutterella stercoricanis
The Sutterella HQ MAG is probably the genome assembly for Sutterella stercoricanis. Because the whole-genome sequence of this species is absent from public databases, the assignment relies on identities >98% with the previously reported 16S rRNA gene reference (Figure 3A). Sutterella stercoricanis was first isolated from the feces of a healthy dog and was characterized using microbiological methods and 16S rRNA gene sequencing (NR_025600.1) [47].
Here, we retrieved a potential complete genome assembly for Sutterella stercoricanis in a single-contig HQ MAG. The Sutterella HQ MAG is 2.70 Mbp and contains 18 ribosomal genes, including nine 16S rRNA and nine 23S rRNA genes (Prokka did not predict 5S rRNA genes). Moreover, the number of tRNAs detected is concordant with that of other complete Sutterella genomes (Table 3). The closest genome assemblies, including a representative of Sutterella wadsworthensis, presented ANI values around 80% (Figure 3B). No antimicrobial resistance genes were identified within this HQ MAG.
Single-contig HQ MAGs for Prevotellamassilia, Phascolarctobacterium, Catenibacterium and Blautia sp900541345
Prevotellamassilia sp900541335, Phascolarctobacterium_A sp900544885, Catenibacterium sp000437715, and Blautia sp900541345 HQ MAGs are draft genomes with high completeness values that improve the contiguity of previous assemblies of their respective bacterial species. The species representative genomes in GTDB are also MAGs, obtained from human gastrointestinal or fecal microbiomes using short-read technologies. Consequently, they are highly fragmented and fail to recover all ribosomal genes and transfer RNAs (Table 3).
Moreover, the Prevotellamassilia, Phascolarctobacterium, and Catenibacterium HQ MAGs are the first single-contig representatives for their genera, since all the other assemblies of these genera are fragmented (‘scaffold’ or ‘contig’ level).
Their 16S rRNA genes were close to others previously identified in wolves’ distal gut microbiome [46] (Prevotellamassilia HQ MAG), canine intestinal microbiome [48] (Phascolarctobacterium HQ MAG), and human GI microbiome [49] (Catenibacterium and Blautia sp900541345 HQ MAGs) (Supplementary Figure S3).
We further characterized the HQ MAGs to assess their potential antimicrobial resistance. The Prevotellamassilia HQ MAG harbored the Mef(En2) gene, which encodes an efflux pump that exports macrolides. The Phascolarctobacterium HQ MAG harbored two copies of the lnu(C) gene, conferring resistance to lincosamide. Each lnu(C) gene was located in an ISSag10 mobile element, allowing it to transpose. The Catenibacterium HQ MAG harbored tet(M) and the Blautia sp900541345 HQ MAG harbored tet(O). Both genes confer resistance to tetracycline.
Known genomes from metagenomes: Enterococcus hirae and Blautia argii
The HQ MAGs representing known genomes were Enterococcus hirae and Blautia sp003287895 (Figure 4). The latter has the proposed name Blautia argii and was first isolated and characterized from dog feces [50]. Both representative genomes in GTDB are already complete reference genomes.
The Enterococcus hirae HQ MAG presented a genome size similar to its reference and the same number of rRNA genes. It harbored aac(6’)-Iid and tet(M) genes, conferring resistance to aminoglycosides and tetracycline, respectively. Specifically, the tet(M) gene was in a region identified as a conjugative element (Tn916) integrated into the chromosome. This region encoded a transposase, type 4 secretion system (T4SS), type 4 coupling protein, oriT, and relaxase (Supplementary Figure S4).
The Blautia HQ MAG presented a smaller genome size than its reference genome (2,959,590 bp vs. 3,297,975 bp). When aligning both genomes, we observed some gaps in our HQ MAG that account for this difference (Figure 4). Moreover, the completeness of this HQ MAG was the lowest (92.78%) among all the HQ MAGs retrieved. Further MAG characterization identified 5 rrn operons (10 ribosomal genes, since Prokka missed five 5S rRNA genes), which coincided with the reference. Moreover, the Blautia HQ MAG harbored tet(32) and tet(40) genes, conferring resistance to tetracycline.
Overview of the MQ MAGs
Apart from the HQ MAGs, we identified three MQ MAGs (>50% completeness and <10% contamination). They corresponded to Phocaeicola plebeius and potentially novel species from Phocaeicola and Bacteroides genera (Table 2).
The closest genome reference for the Phocaeicola plebeius (previously named Bacteroides plebeius) MQ MAGs was GCF_000187895.1. In the 100% dataset, the Phocaeicola plebeius MQ MAG was a longer (2.37 Mb) and more contiguous assembly than the reference (single-contig vs. 19 scaffolds), but with lower completeness (74.81% vs. 99.25%). This MQ MAG lacks the Mef(En2) gene, which confers resistance to macrolides and was found in the 75% MQ MAG.
The longest Phocaeicola spp. MQ MAG (2.56 Mb, 75% subset assembly) had a completeness of 83.39% and 0.19% contamination. It also harbored the Mef(En2) gene, conferring resistance to macrolides. Finally, the longest Bacteroides MQ MAG (2 Mb, 100% dataset assembly) had a completeness of 61.66% and 0.37% contamination. It harbored the CfxA2 gene, conferring resistance to beta-lactams.