Urine Volume Experiment
Current urobiome studies vary widely in the volume of urine used for profiling microbial communities. Moreover, low biomass samples, like urine, are highly susceptible to contamination by microbes or microbial DNA (hereafter referred to as “contaminants”) that can be introduced during the DNA extraction and sequencing process. As such, in this experiment, we first assessed the relationship between urine sample volume and microbial contaminant load. Contaminants, as identified by decontam (Table S2), were at significantly lower relative abundances in urine samples of greater volume (Fig. 1A, Table S6, p = 0.026, Friedman).
We then evaluated bacterial diversity and composition by urine sample volume. Microbial richness, or the total number of unique ASVs in each sample, increased significantly with sample volume (Fig. 1B, S5, Table S7, p = 0.015, Friedman). Sequencing reads also increased with urine sample volume, although this difference was not significant (Fig. 1C, p = 0.075, Friedman).
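The Friedman test used in these comparisons ranks measurements within each block (here, each dog) and asks whether rank sums differ across treatments (here, sample volumes). The following is a minimal sketch of the test statistic, assuming no tie correction and using made-up richness values rather than study data; the function name and inputs are hypothetical and are not the analysis code used in this study.

```python
def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction).

    blocks: list of rows, one row per block (e.g., one dog); each row holds
    one measurement per treatment (e.g., one richness value per volume).
    """
    n = len(blocks)          # number of blocks
    k = len(blocks[0])       # number of treatments
    rank_sums = [0.0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            # Find the run of tied values and give each the average rank.
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            average_rank = (i + j) / 2 + 1
            for m in range(i, j + 1):
                ranks[order[m]] = average_rank
            i = j + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    total = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * total - 3.0 * n * (k + 1)


# Hypothetical richness values: 3 dogs (rows) x 3 volumes (columns).
toy_richness = [[10, 14, 21], [8, 9, 15], [12, 18, 19]]
statistic = friedman_statistic(toy_richness)
```

The statistic is then compared against a chi-square distribution with k - 1 degrees of freedom to obtain a p-value; with very few blocks, that approximation is rough.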
Bacterial composition, however, did not differ significantly by urine sample volume but did differ significantly between dogs (Fig. 2A, S5, between dogs: p = 0.001, by urine sample volume: p = 0.98, Bray-Curtis, PERMANOVA), indicating that inter-dog differences overwhelmed differences based on sample volume. We next evaluated within-dog microbial composition by sample volume. Within each dog, the 3 mL and 5 mL samples were more consistent in microbial composition, while the 0.1, 0.2, 0.5, and 1 mL samples were more variable (Fig. 2B, S6). Based on this pattern, we grouped 3 mL and 5 mL urine samples into a “High” volume group, and the remaining urine volumes into a “Low” volume group. There was no significant difference in microbial composition between the High and Low groups (p = 0.6, PERMANOVA); however, High volume samples had significantly less variable microbial communities than Low volume samples, indicating that Low volume samples are more subject to stochasticity (Fig. 2C, S6; p = 0.0017, PERMDISP). Based on these results, we proceeded to use 3 mL urine samples for subsequent experiments.
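To illustrate what "more variable" composition means here, the sketch below computes Bray-Curtis dissimilarity between samples and averages it within a group. This is a simplified proxy: PERMDISP itself tests distances to group centroids with permutations, which is not reproduced here. The counts are invented for illustration and the function names are hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def mean_pairwise_dissimilarity(samples):
    """Average Bray-Curtis dissimilarity over all pairs in a group."""
    pairs = list(combinations(samples, 2))
    return sum(bray_curtis(a, b) for a, b in pairs) / len(pairs)


# Hypothetical ASV counts (three taxa) for one dog; not study data.
high_volume = [[50, 30, 20], [48, 32, 20]]
low_volume = [[50, 30, 20], [10, 70, 20], [80, 5, 15]]

high_spread = mean_pairwise_dissimilarity(high_volume)
low_spread = mean_pairwise_dissimilarity(low_volume)
```

In this toy case the low-volume group has the larger average dissimilarity, which is the kind of within-group spread that a dispersion test formalizes.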
Host Depletion – 16S rRNA Gene
Healthy urine contains shed host epithelial cells at a relatively low abundance. However, in the presence of urinary tract disease (e.g., urinary tract infection, bladder cancer, bladder stones), host cell shedding can dramatically increase. There are multiple DNA extraction methods that incorporate host cell / host DNA depletion steps to facilitate microbial DNA recovery. In this experiment, we evaluated how six different extraction methods affected DNA concentrations and microbial community profiles. Extraction methods included: QIAamp BiOstic Bacteremia DNA Kit (Bacteremia); MolYsis Complete5 (Molzym MolYsis); NEBNext Microbiome DNA Enrichment Kit (NEBNext); QIAamp DNA Microbiome Kit (DNA Microbiome); HostZERO Microbial DNA Kit (Zymo HostZERO); and a protocol using light-activated propidium monoazide described in Marotz et al., 2018 [19]. All methods except Bacteremia included host depletion steps. The Bacteremia extraction method was included for reference here because this method has already been validated as an optimal method for profiling canine urine microbial communities [34], and it has been applied across multiple urobiome studies in humans and animals [1, 4]. However, it has not previously been tested against extraction methods that include host depletion steps, which we did here.
We first compared how each extraction method impacted total and bacterial DNA concentrations derived from urine samples. We also compared DNA concentrations in urine samples that were unspiked versus those spiked with host (canine) cells. While healthy mid-stream free-catch urine contains a low abundance of host cells, we opted to spike additional canine cells into urine at biologically relevant concentrations to best assess the host depletion capabilities of each extraction method. In unspiked samples, Bacteremia and NEBNext recovered the greatest total DNA concentrations (host + microbial), although this result was not significant (p = 0.62, Friedman, Fig. 3A). Bacteremia, DNA Microbiome, and Molzym MolYsis recovered more bacterial DNA than propidium monoazide, Zymo HostZERO, and NEBNext; the overall difference among methods was significant, although no pairwise comparisons were (overall p = 0.014, Friedman, Fig. 3B). In spiked urine samples, Bacteremia and NEBNext recovered significantly greater total DNA than all other extraction methods (Fig. 3C, overall p < 0.0001, Friedman, pairwise p between Bacteremia or NEBNext and all other methods < 0.05), while DNA Microbiome recovered the most bacterial DNA, although overall differences in bacterial DNA concentrations by extraction method were only marginally significant (Fig. 3D, overall p = 0.051, Friedman). There was no significant difference in total or bacterial DNA recovery by dog in unspiked or spiked samples (Fig. S7).
We next assessed urine microbial diversity (16S rRNA) of unspiked urine samples by extraction method. Sequencing data from all samples extracted using NEBNext did not pass quality control steps [35] and, as such, were excluded from analysis. Urine microbial diversity varied significantly by extraction method (Fig. 4, S8, Table S8, Microbial richness p = 0.0018, Shannon Entropy p = 0.0091, Friedman). Specifically, urine samples extracted using Bacteremia and DNA Microbiome contained the greatest microbial richness (unique ASVs) and significantly greater microbial richness than samples extracted using Zymo HostZERO (Fig. 4A, Table S8, overall p = 0.0018, pairwise p = 0.0041, Friedman). Samples extracted via Bacteremia, DNA Microbiome, or propidium monoazide also exhibited the greatest microbial diversity (Shannon Entropy), all three showing significantly greater microbial diversity than samples extracted via Molzym MolYsis (Fig. 4B, Table S8, pairwise p = 0.025, 0.028, and 0.017, Friedman, respectively).
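The two alpha-diversity measures reported here capture different things: richness counts the ASVs present, while Shannon entropy also reflects how evenly reads are spread across them. A minimal sketch follows, assuming base-2 logarithms (an assumption; the base used in this study is not stated in this section) and invented counts.

```python
import math


def richness(counts):
    """Number of ASVs with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_entropy(counts, base=2.0):
    """Shannon entropy of a count vector; zero counts are skipped."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log(p, base)
    return entropy


# Hypothetical ASV counts for two samples; not study data.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

even_richness = richness(even_sample)
skewed_richness = richness(skewed_sample)
even_shannon = shannon_entropy(even_sample)
skewed_shannon = shannon_entropy(skewed_sample)
```

Both toy samples have the same richness, but the skewed sample has lower Shannon entropy, which is why a method can rank differently on the two metrics.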
Finally, we assessed urine microbial composition (16S rRNA) of unspiked urine samples by extraction method. Microbial composition (Bray-Curtis) differed significantly by dog but not by extraction method (Fig. 4C, S8, Bray-Curtis, by dog p = 0.001, by extraction method p = 0.92, PERMANOVA). When phylogenetic relatedness of microbes between samples was taken into account (Unweighted UniFrac), composition differed significantly by both extraction method and by dog (Fig. 4D, S8, Unweighted UniFrac, extraction method p = 0.002, dog p = 0.001, PERMANOVA). Urine samples extracted using Bacteremia, DNA Microbiome, and Molzym MolYsis exhibited more similar microbial composition as compared to samples extracted with propidium monoazide or Zymo HostZERO (Fig. 4E, F, G, Table S9, Bray-Curtis p = 0.037, Jaccard p = 0.034, Unweighted UniFrac p = 0.0071, Friedman).
Host Depletion – Shotgun Metagenomics
We next assessed host depletion efficacy of each extraction method using shotgun metagenomic sequencing performed on urine samples spiked with host (canine) cells. Samples averaged 28.2 million paired-end reads per sample (range: 1,399 to 80 million reads, SD: 16.7 million reads). There was no significant difference in the total number of reads obtained per sample by extraction method (Fig. 5A, p = 0.12, Friedman). However, the total number of microbial reads did vary significantly by extraction method (Fig. 5B, p = 0.0039, Friedman), with DNA Microbiome, Molzym MolYsis, and Zymo HostZERO yielding a significantly greater number of microbial reads compared to Bacteremia, which includes no host depletion steps (all pairwise p = 0.01). The proportion of total microbial reads also varied significantly by extraction method, with Molzym MolYsis and Zymo HostZERO yielding the greatest proportion of microbial reads (Fig. 5C, overall p < 0.0001, pairwise p < 0.02, Friedman). In terms of host reads, each method yielded the following (on average): Bacteremia, 82% host reads; DNA Microbiome, 78%; Molzym MolYsis, 29%; propidium monoazide, 81%; Zymo HostZERO, 30%. Finally, we quantified the abundance of contaminant reads by extraction method and found that DNA Microbiome samples contained the lowest abundance of contaminant reads (Fig. 5D, overall p = 0.014, Friedman), although contaminant read abundances varied widely between samples (0-100%).
To determine whether efficacy in host depletion translated to improved capture of the urobiome, we employed MetaPhlAn4 and SingleM, computational tools used for profiling microbial communities from marker genes found in metagenomes. Urine microbial diversity varied significantly by extraction method (Fig. 6A, B, MetaPhlAn4, Observed Species p = 0.011, Shannon entropy p = 0.002, Friedman), with DNA Microbiome yielding the greatest number of observed microbial species and significantly more species than all other extraction methods (all pairwise p = 0.014) except Molzym MolYsis. Urine microbial composition did not differ significantly by extraction method but did differ significantly by dog (Fig. 6C, D, MetaPhlAn4, By extraction method: Jaccard p = 0.67, Bray-Curtis p = 0.96; By dog: Jaccard p = 0.001, Bray-Curtis p = 0.001, PERMANOVA), indicating that interindividual variation overwhelmed microbial community differences due to extraction method. SingleM largely recapitulated the MetaPhlAn4 results (Fig. S9).
We then assessed the viability of performing genome-resolved metagenomics on low biomass urine samples. To do this, we assembled MAGs within each sample (assembly metrics for each sample: Fig. S10). We generated a total of 26 unique MAGs: 11 were bacteria found in the ZymoBIOMICS Gut Microbiome Standard (Table S3), five were derived from urine samples (Fig. 7), and 10 were probable contaminants (Table S5). The five E. coli strains present in the standard assembled into a single MAG. The greatest number of urine-derived MAGs (n = 4) was identified in DNA Microbiome samples, while three or fewer MAGs were identified in all other extraction methods. The total number of MAGs did not vary by extraction method (Fig. S11, p = 0.3, Friedman), although fewer contaminant MAGs arose from DNA Microbiome samples as compared to other extraction methods (Fig. S11, overall p = 0.018, Friedman, no pairwise comparisons significant).
Next, we compared the microbial taxonomic profiles generated by 16S rRNA sequencing, shotgun metagenomic sequencing (MetaPhlAn4), and genome-resolved metagenomics (MAGs) (Fig. 7). Each method is fundamentally different and employs different reference databases for taxonomy assignment. However, all five urine-derived MAGs also appeared in the top twenty most abundant taxa in the shotgun metagenomics and 16S datasets. Notably, Arcanobacterium is not present in the MetaPhlAn4 reference database, but was identified in the shotgun metagenomic data through the SingleM reference database (Fig. S9). Additional top twenty genera common between the metagenomics and 16S datasets include Peptacetobacter/Peptoclostridium spp. and Blautia spp.
Finally, we compared our capture of the ZymoBIOMICS Gut Microbiome Standard community across extraction, sequencing, and bioinformatic methods (Fig. S12). The Standard contained 21 microbial taxa including 18 bacterial strains, 1 archaeon, and 2 microbial eukaryotes at differing and biologically relevant abundances. Amongst the bacterial strains, there were 5 closely related strains of E. coli. In the 16S rRNA dataset, we were able to detect a total of 12/21 taxa, all of which were present at ≥0.1% abundance in the Standard. As expected, we did not detect the 2 microbial eukaryotes (which do not encode a 16S rRNA gene). We were also unable to differentiate the 5 E. coli strains in the Standard as this is not feasible with amplicon sequencing. We also did not detect the 4 taxa found at ≤0.01% abundance in the Standard (Methanobrevibacter smithii, Salmonella enterica, Enterococcus faecalis, Clostridium perfringens). In the shotgun metagenomic data profiled using MetaPhlAn4, we detected a total of 14/21 taxa in the Standard including the 2 microbial eukaryotes. As with 16S rRNA sequencing, we were able to detect all taxa present at ≥0.1% abundance in the Standard and not able to detect the 4 taxa found at ≤0.01% abundance in the Standard. MetaPhlAn4 did not distinguish the 5 E. coli strains. We were further able to assemble a total of 11 MAGs from the shotgun metagenomic data. This included all taxa at ≥1.5% abundance, excluding the eukaryote Candida albicans, which was found at 1.5% abundance but for which we were not able to assemble a MAG. We assembled a single E. coli MAG (rather than the expected 5 unique E. coli strains). The threshold we employed for MAG dereplication (99% ANI) did not allow us to distinguish between the 5 E. coli strains; therefore, as with our 16S rRNA data, we only detected “one” E. coli taxon. A higher ANI threshold (99.9%) and a tool other than dRep would be required for strain differentiation. We were not able to assemble a MAG for M. smithii, which was present at 0.1% abundance and detected in 16S rRNA and shotgun metagenomic sequencing. Across methods (16S rRNA, shotgun metagenomics, MAGs), samples extracted using Bacteremia and DNA Microbiome most closely matched the expected microbial taxonomic composition of the Standard (Fig. S12).
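The effect of the dereplication threshold on strain resolution can be sketched with a toy example. The code below groups genomes whose pairwise ANI meets a threshold using simple single-linkage merging; this is an illustrative simplification, not dRep's actual clustering procedure, and the ANI values and genome names are invented (the true ANI among the Standard's E. coli strains is not reported here).

```python
def cluster_by_ani(names, ani, threshold):
    """Single-linkage grouping: genomes sharing ANI >= threshold are merged."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise ANI values (fractions); not measured in this study.
genomes = ["ecoli_1", "ecoli_2", "ecoli_3", "other_species"]
toy_ani = {
    ("ecoli_1", "ecoli_2"): 0.995,
    ("ecoli_1", "ecoli_3"): 0.993,
    ("ecoli_2", "ecoli_3"): 0.996,
    ("ecoli_1", "other_species"): 0.80,
    ("ecoli_2", "other_species"): 0.80,
    ("ecoli_3", "other_species"): 0.80,
}

at_99 = cluster_by_ani(genomes, toy_ani, 0.99)
at_999 = cluster_by_ani(genomes, toy_ani, 0.999)
```

Under these assumed values, the three E. coli genomes collapse into one cluster at 99% ANI and separate only at 99.9%, mirroring why a single E. coli MAG is expected at the lower threshold.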
Functional Profiling of Urine Microbes
Relatively few studies have performed shotgun metagenomics in urine, and even fewer have generated MAGs [26], which has limited our understanding of the functional potential of the urobiome. In this study, as proof-of-concept, we mined the urine-derived MAGs for key functions. We first identified core metabolic pathways (e.g., glycolysis, citrate cycle) across all MAGs (Fig. S13A). Then we identified pathways associated with carbohydrate, nitrogen, acid, and alcohol metabolism. Specifically, we observed urea utilization in 2 of the MAGs: Staphylococcus pseudintermedius and Bacillus_A cereus (Fig. S13B).
Next, we looked for microbial metabolic pathways associated with environmental chemical metabolism. There are a number of environmental chemicals (e.g., arsenic, polycyclic aromatic hydrocarbons) that have been linked to urinary tract diseases like bladder cancer [63]. The kidney filters many of these toxicants out of the blood and into the urine. Therefore, it is important to understand if and how urine microbes metabolize these chemicals and how that could impact disease risk. As such, we mined the urine MAGs for pathways associated with polycyclic aromatic hydrocarbon (PAH) and long-chain alkane degradation. PAHs and long-chain alkanes are common environmental pollutants produced during the combustion process and found in vehicle exhaust and industrial output [65–67]. We did not identify genes (> 80% gather cutoff) associated with PAH degradation, but we did identify genes for long-chain alkane utilization: ladB (91% of noise cutoff) in Bacillus_A cereus and ladA alpha (97% of trusted cutoff) in Staphylococcus pseudintermedius. Moreover, in B. cereus, we identified a full metabolic pathway starting with an alkanesulfonate monooxygenase (ssuD) that desulfonates organosulfonates to yield sulfite and an aldehyde (Fig. 8A). The presence of this pathway supports the possibility that B. cereus may be capable of utilizing a variety of hydrocarbons as potential carbon sources or electron donors. In S. pseudintermedius, we did not identify a complete metabolic pathway for long-chain alkane degradation, but the presence of alcohol and aldehyde dehydrogenase protein families suggests that long-chain alkanes activated by ladA may be further oxidized by this organism (Fig. 8B). Taken together, these results suggest that urine-derived microbes can metabolize environmental chemicals, and that microbial metabolism merits further investigation in relation to urinary tract disease risk.