Long-Read Metagenomics to Retrieve High-Quality Metagenome-Assembled Genomes from Canine Feces

Background. Metagenomics is a powerful and rapidly developing approach that provides new biological insights into the microbes inhabiting underexplored environments, such as the canine fecal microbiome. We investigate long-read metagenomics with Nanopore sequencing to profile the fecal microbiome and to retrieve high-quality metagenome-assembled genomes (HQ MAGs) from a healthy dog. Results. More than 99% of total classified reads corresponded to Bacteria. The most abundant phylum was Bacteroidetes (~80% of total reads), followed by Firmicutes, Proteobacteria, and Fusobacteria. Prevotella (>50%) and Bacteroides (>20%) were the most abundant genera, followed by Fusobacterium, Megamonas, Sutterella, and other fecal-related genera (each representing <5% of the total bacterial composition). We retrieved eight single-contig HQ MAGs and three medium-quality MAGs after combining several metagenome dataset assemblies. The HQ MAGs corresponded to the Succinivibrio, Sutterella, Prevotellamassilia, Phascolarctobacterium, Enterococcus, Blautia, and Catenibacterium genera. Succinivibrio HQ MAG represents a novel candidate bacterial species. Sutterella HQ MAG is potentially the first reported genome assembly for Sutterella stercoricanis, as assigned by 16S rRNA gene similarity. Prevotellamassilia, Phascolarctobacterium, Catenibacterium, and Blautia sp900541345 HQ MAGs improved the contiguity of previously reported genome assemblies in their respective genera, and increased the number of rRNA and tRNA genes recovered. Finally, Enterococcus hirae and Blautia sp003287895 HQ MAGs represented species that already have a complete reference genome. At the technical level, we demonstrated that a high-molecular weight DNA extraction improved the taxonomic classification of the raw unassembled reads, the metagenomics assembly contiguity, and the retrieval of longer and circular contigs, which are potential HQ MAGs.


Background
Metagenomics is a powerful and rapidly developing approach that can be used to unravel uncultured microbial diversity and expand the tree of life, as well as to give new biological insights into the microbes inhabiting underexplored environments [1]. Metagenomics applied to both the canine gastrointestinal (GI) and the fecal microbiomes provides information on health and disease and some clues on how to prevent or treat specific pathologies.
Previous studies reported similarities between the canine and human GI microbiomes (see [2][3][4][5] for extensive reviews). Several GI diseases are associated with an altered GI microbiome, which in turn can be modulated by diet and dietary supplements (such as pre- and probiotics). Beyond the veterinary interest itself, dogs are considered closer models to humans than other animal models for GI microbiome studies [6,7].
Microbiome studies are either marker-specific (e.g., 16S rRNA gene for Bacteria) or based on whole metagenome sequencing [8]. The canine GI microbiome studies published to date (August 2020) use next-generation sequencing (short-read sequencing) or earlier technologies and are mostly amplicon-based (16S rRNA gene). To date, only two studies have used shotgun metagenomics with short-read sequencing to further characterize the whole microbial community and the gene content in dog feces [7,9].
The application of long-read sequencing to metagenomics enables the retrieval of metagenome-assembled genomes (MAGs) with high completeness. The most recent strategy in long-read metagenomics uses the long reads to obtain the raw metagenome assembly, ensuring the greatest contiguity of MAGs, and short reads to polish and improve the overall accuracy. This strategy was applied to assess the human GI microbiome [10], as well as other communities such as mock communities [11], cow rumen [12], natural whey starter cultures [13], and wastewater [14]. Some authors suggest that we may overcome the need for short reads to polish long-read data either by using correction software, such as frameshift-aware correction [15], or by ultra-deep coverage of the genomes [11].
In our previous work, we used long-read metagenomics to profile the canine fecal microbiome taxonomically and reach species identification. Despite using a low-depth sequencing approach, we were able to assemble a circular contig corresponding to an uncultured CrAssphage [16].
Here, using nanopore long-read metagenomics, we aim to unravel potential new bacterial diversity from the feces of a healthy dog. We assembled and characterized high-quality MAGs and identified their antimicrobial resistance genes to gain new biological insights on the dog fecal metagenome.

DNA extraction and long-read sequencing
Our study focuses on the analysis of a single fecal sample of a healthy dog. A fresh sample was collected and stored at -80 °C until further processing.
We used two different kits from Zymobiomics (Zymo Research) for DNA extraction: the Quick-DNA HMW MagBead for High-Molecular Weight DNA (without bead-beating) and the DNA Miniprep Kit, which is the standard microbiome DNA extraction with bead-beating. Throughout the manuscript, we refer to these as HMW-DNA extraction and non-HMW DNA extraction, respectively.
Each DNA extraction was sequenced in a single MinION Flowcell R9.4.1 using MinION™ (Oxford Nanopore Technologies). The Ligation Sequencing Kit 1D (SQK-LSK109; Oxford Nanopore Technologies) was used to prepare both libraries. For non-HMW DNA, we followed the manufacturer's protocol. For the HMW-DNA, we tuned a few parameters: i) at the DNA repair and end-prep step, we incubated at 20 °C for 20 minutes and 65 °C for 20 minutes; ii) we extended rotator mixer (Hula mixer) times to 10 minutes; iii) we extended elution time after AMPure XP beads to 10 minutes; iv) the final incubation with elution buffer was performed at 37 °C for 15 minutes (as recommended for HMW DNA).

Raw reads: pre-processing, quality control and taxonomic analyses
Raw fast5 files were basecalled using Guppy 3.4.5 (Oxford Nanopore Technologies) with high accuracy basecalling mode (dna_r9.4.1_450bps_hac.cfg). During the basecalling, the reads with a mean quality score (Q-score) lower than 7 were discarded.
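To make the basecalling filter concrete, the following minimal sketch reads the threshold of 7 as a mean Phred quality score, which is an assumption about the scale used by the basecaller. It only illustrates the standard Phred relationship between a quality score and the expected per-base accuracy; the function names are hypothetical and are not part of Guppy.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into the per-base error probability.

    Uses the standard definition Q = -10 * log10(P_error).
    """
    return 10 ** (-q / 10)


def phred_to_accuracy(q: float) -> float:
    """Expected per-base accuracy (1 - P_error) for a Phred score Q."""
    return 1.0 - phred_to_error_probability(q)


if __name__ == "__main__":
    # Assumed threshold from the text, interpreted as a Phred score.
    threshold = 7
    print(f"Q{threshold}: expected accuracy {phred_to_accuracy(threshold):.3f}")
```

Under this reading, a threshold of 7 corresponds to an expected read accuracy of roughly 80%.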
To obtain the first taxonomic assignment directly from the raw reads, we processed the data using Kraken2 2.0.8 [17] with the maxikraken2 database (Loman Lab, from March 2019) that includes all the genomes from RefSeq. We visualized Kraken2 reports using Sankey diagrams with pavian 1.0.0 R package [18].
We used NanoPlot 1.28 to obtain the run summary statistics [19], Porechop 0.2.4 [20] for adapter trimming, NanoFilt 2.6.0 [21] to discard reads shorter than 1,000 bp, and different modules of seqkit 0.11.0 [22] to manipulate fastq and fasta files during the whole analysis.
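The length filter described above can be illustrated with a short sketch. This is not the NanoFilt implementation; it is a plain-Python approximation that assumes well-formed four-line FASTQ records, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# A FASTQ record as (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from an iterable of FASTQ lines.

    Assumes every record spans exactly four lines (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def filter_by_length(
    records: Iterable[FastqRecord], min_length: int = 1000
) -> Iterator[FastqRecord]:
    """Keep reads of at least min_length bases, discarding shorter ones."""
    for header, seq, qual in records:
        if len(seq) >= min_length:
            yield header, seq, qual
```

A typical use would be to open a FASTQ file, pass the file handle to `parse_fastq`, and write out the records returned by `filter_by_length`.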

Metagenomics assembly and polishing
Before proceeding with the metagenomics assembly, we performed an error-correction step of the raw nanopore reads using canu 2.0 [23].
We polished the Flye assembly with one round of medaka 1.0.1, including all the raw fastq files as input [26]. The next step for the HQ MAGs was to correct the frameshift errors, as described in [15], using Diamond 0.9.32 [27] and MEGAN-LR 6.19.1 [28]. We used ideel [29] to visualize the number of truncated ORFs.
To assess the quality of the MAGs, we used CheckM 1.1.1 [30] to retrieve completeness and contamination. MAGs can be classified as: high-quality, with >90% completeness, <5% contamination, and presence of rRNA genes and tRNAs; medium-quality, with >50% completeness and <10% contamination; and low-quality, the remaining ones [31].
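The quality tiers above translate directly into a small decision rule. The sketch below encodes the thresholds exactly as stated in the text; the function name and its boolean arguments for rRNA and tRNA presence are illustrative assumptions, and the completeness and contamination values would come from a tool such as CheckM.

```python
def classify_mag(
    completeness: float,
    contamination: float,
    has_rrna_genes: bool,
    has_trnas: bool,
) -> str:
    """Assign a MAG quality tier from completeness and contamination (in %).

    High-quality also requires the presence of rRNA genes and tRNAs.
    """
    if (
        completeness > 90
        and contamination < 5
        and has_rrna_genes
        and has_trnas
    ):
        return "high-quality"
    if completeness > 50 and contamination < 10:
        return "medium-quality"
    return "low-quality"


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise each tier.
    print(classify_mag(95.0, 1.0, True, True))
    print(classify_mag(95.0, 1.0, False, True))
    print(classify_mag(40.0, 1.0, True, True))
```

Note that a MAG meeting the completeness and contamination cut-offs for the high-quality tier but lacking rRNA genes or tRNAs falls into the medium-quality tier under this rule.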

Characterization of the high-quality MAGs
GTDB-tk 1.3.0 [32] with GTDB taxonomy release 95 [33] was used to assess the novelty and the taxonomy of HQ MAGs. We used PROKKA 1.13.4 to annotate the MAGs [34].
For the novel HQ MAGs, we used GtoTree 1.4.15 [42] to build a de novo phylogenetic tree including the HQ MAG; the GTDB entries classified as the same genus; other NCBI assemblies of the same genus not included in GTDB; and a genome of a related taxon as an outgroup. We visualized the tree with iTOL 5.5.1 [35]. Abricate 0.9.8 [36] was used to detect antimicrobial resistance genes using the CARD database [37].
OriTfinder was used to identify the origin of transfer (oriT) and conjugative machinery of mobile genetic elements [38] and SnapGene Viewer 5.0.7 [39] to visualize the results.
We used FastANI 1.3 [40] to confirm a potentially new species by determining the average nucleotide identity (ANI) between the most related genomes. One-to-one whole genome alignments were performed and visualized with dot plots using Mummer 4.0 [41].
We extracted the 16S rRNA genes from the HQ MAGs before the frameshift correction step using ANVIO 6.1 [42]. The 16S rRNA genes were analyzed using the MOLE-BLAST tool on the NCBI website [43] to obtain a phylogenetic tree. Mafft [44] on the EBI website was used to align 16S rRNA gene sequences from Sutterella HQ MAG and obtain an identity matrix.
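As an illustration of what an identity matrix over aligned 16S rRNA gene sequences contains, the sketch below computes pairwise percent identity from sequences that are already aligned to the same length. It is not the calculation performed by the EBI service; in particular, the choice to skip columns where both sequences have a gap and to count a gap against a base as a mismatch is an assumption, and tools differ on this point. All names are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences carry a gap ('-') are ignored; a gap
    opposite a base counts as a mismatch (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def identity_matrix(aligned: Dict[str, str]) -> List[List[float]]:
    """Square matrix of pairwise identities, in the insertion order of keys."""
    names = list(aligned)
    return [
        [percent_identity(aligned[i], aligned[j]) for j in names]
        for i in names
    ]
```

Given a dictionary mapping sequence names to aligned sequences, `identity_matrix` returns one row per sequence, with 100.0 on the diagonal.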

Results
We applied long-read nanopore sequencing to the fecal microbiome of a healthy dog. At the technical level, we compared the assembly results when choosing a HMW DNA extraction vs. a non-HMW one from the same fecal sample. Finally, using different metagenomics datasets, we retrieved and characterized eight high-quality single-contig draft metagenome-assembled genomes (HQ-MAGs) considering MIMAGs criteria (completeness >90%, contamination <5%, and presence of rRNA and tRNA genes) and three medium-quality draft metagenome-assembled genomes (MQ-MAGs; completeness >50%, contamination <10%) [31].

HMW vs. non-HMW DNA: raw reads and metagenome assembly
HMW sequencing produced 5.81 million reads with N50 of 4,369 bp and a median length of 2,312 bp (total throughput: 18.76 Gb), whereas non-HMW produced 11.13 million reads with N50 of 2,102 bp and a median length of 1,093 bp (total throughput: 17.29 Gb).
Moreover, the fecal microbiome also contains Fusobacterium, Megamonas, Sutterella, and other fecal-related genera, each representing less than 5% of the total bacterial composition (Supplementary Figure S1).
The metagenomics assembly was more contiguous, with fewer and longer contigs, when using HMW-DNA reads rather than non-HMW DNA reads (number of contigs: 1,898 vs. 2,944; N50: 187,680 vs. 94,109 bp). Moreover, the HMW-DNA metagenomics assembly retrieved three circular contigs, which could represent complete closed MAGs, compared with only one circular contig in the non-HMW DNA assembly (Figure 1). Thus, HMW DNA extraction improved the taxonomic classification of the raw unassembled reads (fewer unclassified reads), the metagenomics assembly contiguity, and the retrieval of longer and circular contigs (potential HQ MAGs).
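Because N50 is the contiguity metric used for both the reads and the assemblies, a minimal sketch of its usual definition may help: the length L such that sequences of length L or longer contain at least half of the total bases. The function name is hypothetical, and tools such as NanoPlot or assembly reporters may handle edge cases differently.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read or contig lengths.

    Lengths are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("cannot compute N50 of an empty collection")


if __name__ == "__main__":
    # Toy contig lengths, not data from this study.
    print(n50([2, 3, 4, 5, 6]))
```

With the toy lengths above, the two longest contigs (6 and 5) already cover more than half of the 20 total bases, so the N50 is 5.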

Metagenome assemblies, frameshift-aware correction, and retrieval of HQ and MQ MAGs
For the in-depth analyses, we assembled both the HMW only dataset and the HMW and non-HMW merged datasets (100% dataset; 16.94 million reads, 36.05 Gb) to ensure the highest coverage and consensus accuracies. As we aimed to retrieve the maximum number of HQ MAGs, we performed extra metagenomics assemblies using 75% and 50% data subsets from that merged dataset (Table 1).
The number of contigs ranged from 1,898 with the HMW dataset to 2,639 when analyzing all the merged data together. N50 ranged from 187,680 bp (HMW dataset) to 149,125 bp (50% subset), and mean coverage ranged from 138X (100% dataset) to 95X (50% subset). The largest contig of 2.95 Mbp was retrieved when using 75% of the data. After assigning taxonomy and comparing among assemblies, we identified a total of eight different HQ MAGs and three different MQ MAGs (Table 2). The different datasets retrieved redundant MAGs but with different degrees of quality. None of the performed assemblies alone retrieved all the HQ MAGs.
The eight HQ MAGs corresponded to the genera Prevotellamassilia, Phascolarctobacterium, Catenibacterium, Enterococcus, Succinivibrio, Blautia, and Sutterella (Table 2). The HMW dataset and the 75% subset assemblies each recovered six out of the eight HQ MAGs. Four of them were redundant and corresponded to Prevotellamassilia sp900541335, Phascolarctobacterium_A sp900544885, Catenibacterium sp000437715, and Enterococcus_B hirae. The remaining two from the HMW dataset were g__Succinivibrio (found in all the datasets except for the 75% subset) and Blautia_A sp900541345 (recovered after frameshift correction). Finally, the remaining two from the 75% subset were Blautia_A sp003287895 and g__Sutterella* (recovered after frameshift correction).
For each HQ MAG, we chose the representative with the highest coverage, and consequently the highest consensus accuracy, to continue the analysis. We performed an extra correction step to reduce the insertions and deletions (indels), the most abundant error type in nanopore sequencing. The indel correction reduced the frameshift errors and, consequently, the number of predicted coding sequences (CDS) (Supplementary Figure S2).

Potential novel Succinivibrio species
Succinivibrio HQ MAG represents a new Succinivibrio species without any described representative, as confirmed by an ANI of 80% to its closest genome assembly GCA_900552905.1 (<80% to Succinivibrio dextrinosolvens representatives). Moreover, all the Succinivibrio genome assemblies in NCBI are fragmented ('contig' or 'scaffold' level). Thus, this is the first contiguous assembly for the Succinivibrio genus. In GTDB taxonomy, several genome assemblies from the Succinatimonas genus and others are reclassified as Succinivibrio, so we included representatives of these genera in the phylogenetic tree (Figure 2).

Potential genome for Sutterella stercoricanis
Sutterella HQ MAG is probably the genome assembly for Sutterella stercoricanis, as suggested by identities >98% with the previously reported 16S rRNA gene reference (Figure 3A), since its whole-genome sequence is absent from the public databases. Sutterella stercoricanis was first isolated from the feces of a healthy dog and was characterized using microbiological methods and 16S rRNA gene sequencing (NR_025600.1) [47].
Here, we retrieved a potential complete genome assembly for Sutterella stercoricanis in a single-contig HQ MAG. Sutterella HQ MAG is 2.70 Mbp and contains 18 ribosomal genes, including nine 16S rRNA and nine 23S rRNA genes (Prokka did not predict 5S rRNA genes). Moreover, the number of tRNAs detected is consistent with that of other complete Sutterella species (Table 3). The closest genome assemblies, including a representative of Sutterella wadsworthensis, presented ANI values around 80% (Figure 3B). No antimicrobial resistance genes were identified within this HQ MAG.

Single-contig HQ MAGs for Prevotellamassilia, Phascolarctobacterium, Catenibacterium and Blautia sp900541345
Prevotellamassilia sp900541335, Phascolarctobacterium_A sp900544885, Catenibacterium sp000437715, and Blautia sp900541345 HQ MAGs are draft genomes with high completeness values that improve the contiguity of previous assemblies of their respective bacterial species. The species representative genomes in GTDB are also MAGs obtained from gastrointestinal or fecal human microbiome and retrieved using short-read technologies. In consequence, they are highly fragmented and fail to recover all ribosomal genes and transfer RNAs (Table 3).
Moreover, Prevotellamassilia, Phascolarctobacterium, and Catenibacterium HQ MAGs are the first single-contig representatives for their genera since all the other assemblies of these genera are fragmented ('scaffold' or 'contig' level).  Figure  S3).
We further characterized the HQ MAGs to assess their potential antimicrobial resistance. Prevotellamassilia HQ MAG harbored the Mef(En2) gene, which encodes an efflux pump that exports macrolides. Phascolarctobacterium HQ MAG harbored two copies of the lnu(C) gene conferring resistance to lincosamide. Each lnu(C) gene was located in an ISSag10 mobile element, allowing it to transpose.

Known genomes from metagenomes: Enterococcus hirae and Blautia argii
The HQ MAGs representing known genomes were Enterococcus hirae and Blautia sp003287895 (Figure 4), the latter with the proposed name Blautia argii, first isolated and characterized from dog feces [50]. Both representative genomes in GTDB are already complete reference genomes.
Enterococcus hirae HQ MAG presented a genome size similar to its reference and the same number of rRNA genes. It harbored aac(6')-Iid and tet(M) genes conferring resistance to aminoglycosides and tetracycline, respectively. Specifically, the tetM gene was in a region identified as a conjugative element (Tn916) integrated into the chromosome. This region encoded a transposase, type 4 secretion system (T4SS), type 4 coupling protein, oriT, and relaxase (Supplementary Figure S4).
Blautia HQ MAG presented a smaller genome size than its reference genome (2,959,590 bp vs. 3,297,975 bp). When aligning both genomes, we observed some gaps in our HQ MAG that account for those differences (Figure 4). Moreover, the completeness of this HQ MAG was the lowest (92.78%) among all the HQ MAGs retrieved. Further MAG characterization identified 5 rrn operons (10 ribosomal genes, since Prokka missed five 5S rRNA genes), which coincided with the reference. Moreover, Blautia HQ MAG harbored tet(32) and tet(40) genes conferring resistance to tetracycline.

Overview of the MQ MAGs
Apart from the HQ MAGs, we identified three MQ MAGs (>50% completeness and <10% contamination).
They corresponded to Phocaeicola plebeius and potentially novel species from Phocaeicola and Bacteroides genera (

Discussion
We applied long-read metagenomics to a fecal sample of a healthy dog and retrieved eight HQ MAGs and three MQ MAGs, all of them single-contigs.
At the technical level, we compared an HMW and a non-HMW DNA extraction to perform long-read metagenomics and confirmed that the HMW DNA extraction was the best choice. For analyses using unassembled raw reads, it improved the taxonomic classification and gave fewer unclassified reads. For metagenomics assembly, it improved the contiguity and increased the retrieval of longer and circular contigs (potential HQ MAGs).
For the subsequent analyses, we used both the HMW data and the whole merged dataset to ensure the highest consensus accuracy. Moreover, we assessed different amounts of total data (75% and 50% data subsets) to retrieve the maximum number of HQ MAGs. None of the performed assemblies alone retrieved the eight HQ MAGs. The HQ MAGs were representatives of the Succinivibrio, Sutterella, Prevotellamassilia, Phascolarctobacterium, Enterococcus, Blautia, and Catenibacterium genera.
Succinivibrio HQ MAG is the first single-contig genome assembled in the genus. It represents a novel candidate bacterial species, with ANI of 80% to its closest genome assembly GCA_900552905.1 (<80% to Succinivibrio dextrinosolvens representatives). Its full-length 16S rRNA genes cluster with the 16S rRNA gene from uncultured bacteria amplified in a study of the wolf GI microbiome [51].
Sutterella HQ MAG is potentially the first reported genome assembly for Sutterella stercoricanis, which was first isolated from the feces of a healthy dog [47]. Since the reference isolate lacks additional genome information to confirm that the Sutterella HQ MAG represents the same species, we compared the full-length 16S rRNA gene sequences to identify the bacterial species. Both the classical threshold of 97% identity and the updated one of 99% identity were met in this case [52]: the nine 16S rRNA genes presented identities from 99.04% to 98.69% against Sutterella stercoricanis 16S ribosomal RNA (NR_025600.1). Whole-genome sequencing of the reference isolate and comparison to the HQ MAG could confirm whether they represent the same species.
Prevotellamassilia, Phascolarctobacterium, Catenibacterium, and Blautia sp900541345 HQ MAGs improved the contiguity of previously reported genome assemblies in their respective genera (single-contig assembly vs. multiple scaffolds), and increased the number of rRNA and tRNA genes recovered. Finally, Enterococcus hirae and Blautia HQ MAGs represented species with complete reference genomes. Blautia HQ MAG was Blautia sp003287895 (proposed species name Blautia argii), a species first isolated from the feces of a mature dog [50].
In the dog GI microbiome, different diets and dietary interventions can modulate their abundances with the aim of promoting gut health [7,57-62]. Moreover, several studies on the dog GI microbiome identified the Blautia genus, among others, as a microbial marker for health and targeted it to assess differences with disease status [63-66]. Thus, in-depth characterization of these genera is highly relevant to defining a healthy GI microbiome in dogs.
Sutterella stercoricanis was isolated from the feces of a healthy dog [47]. However, an increase in the genus Sutterella was associated with detrimental effects rather than health. Dogs with acute hemorrhagic diarrhea presented higher Sutterella abundance [63], and some diets aiming to promote health benefits observed its decrease [67,68]. Further whole-genome sequencing studies are needed to identify the different Sutterella species in dog feces and correlate their abundances with health or disease status.
Finally, Enterococcus hirae is a prevalent Enterococci species of the GI microbiome of healthy dogs. However, Enterococci species usually carry antimicrobial-resistant genes and virulence factors and are potential antimicrobial-resistant genes reservoirs that could be transferred to people [69-73]. Enterococcus HQ MAG harbors aac(6')-Iid gene, which was rst detected in Enterococcus durans and conferred resistance to aminoglycosides [74]. Besides, it harbors a tetM gene within the Tn916 conjugative element, which was rst reported in Enterococcus faecalis [75,76].
Tetracycline resistance genes were found not only in the genome of Enterococcus hirae, but also in Catenibacterium and both Blautia HQ MAGs, and could be linked to a previous antimicrobial exposure that selected the resistant bacteria [77]. Three years before sampling, this dog was treated with doxycycline (a tetracycline-class antibiotic) for 15 days for excess secretion of mucus and saliva.
Whole resistome analyses are needed to determine the AMR genes within the fecal microbiome in healthy dogs and to evaluate all the bacterial species together with their mobile genetic elements that could act as a reservoir for AMR genes.
Although humans and dogs share a similar microbial composition in the GI microbiome [6,7], Succinivibrio, Blautia, and Sutterella HQ MAGs seem to be canine-related fecal bacterial species. This fact highlights the need for building and using niche-specific databases to accurately map and classify new reads from a particular environment, as well as to understand the overall biological significance [12,78].
Apart from eight HQ MAGs, we recovered three different MQ MAGs from potentially new species of the Bacteroides and Phocaeicola genera and Phocaeicola plebeius. Our next step is to apply proximity ligation to link all contigs among them and recover new HQ MAGs and MQ MAGs, as well as to link antimicrobial resistance genes, mobile genetic elements, and bacteriophages to their bacterial host [79].
A limitation of this study is the use of nanopore-only data, since it can compromise the accuracy of the HQ MAGs. To reduce insertion and deletion errors, we applied a frameshift-aware correction step [15] that improved the completeness and reduced the number of CDS. On the other hand, long-read metagenomics improved the contiguity of the MAGs even for reference assemblies. Long-read metagenomic sequencing could be combined with short-read metagenomics data to build comprehensive, curated databases for in-depth characterization of novel bacterial diversity in the canine fecal microbiome.

Conclusions
To conclude, we recovered and characterized eight HQ MAGs and three MQ MAGs from a fecal sample of a healthy dog using long-read metagenomics. Among them were one potential novel species of Succinivibrio and the first genome assembly for Sutterella stercoricanis. Overall, long-read metagenomics allowed us to recover HQ MAGs from a complex microbiome. The high-molecular weight DNA extraction to improve contiguity and the correction of the insertions and deletions to reduce frameshift errors ensured the retrieval of complete single-contig HQ MAGs.

Availability of data and materials: The raw assemblies, the metagenome-assembled genomes, and an overview of the scripts used are available on Zenodo: 10.5281/zenodo.3982645. An overview of the scripts used to analyze the data is in Additional File 5.

Figure 2
Phylogenetic de novo tree for Succinivibrio HQ MAG. The tree included: i) the HQ MAG; ii) the GTDB entries classified as Succinivibrio; iii) extra NCBI assemblies of the same genus not included in GTDB; iv) a genome of a related taxon as an outgroup (for rooting the tree).

Figure 3
Phylogenetic analysis of Sutterella HQ MAG. A) Identity matrix of the nine 16S rRNA genes and the reference NR_025600.1. B) Phylogenetic de novo tree that included the HQ MAG, the GTDB entries classified as the same genus, and a genome of a related taxon as an outgroup (for rooting the tree).