This study aimed to bridge a gap in our knowledge of the healthy pediatric urobiome. We prospectively enrolled 50 healthy male infants who underwent urinary catheterization during routine operative circumcision (Figure 1A). Below, we describe the pipeline we established to increase rigor and reproducibility within urobiome research, followed by a description of our findings.
Establishing Methodology for Low Biomass Urine Samples from Infants
The method of sample collection is a key concern in urobiome research. Genital and intestinal contaminants confound urobiome results22. To obtain sterile catheterized samples from healthy infants, we selected the population of infant males undergoing circumcision in the operating room. Informed consent for bladder catheterization was obtained from parents or guardians. We collected urine from 50 male infants following induction of general anesthesia and sterilization of the periurethral area. The median age of the infants was 215 days (~7 months old, Table 1). The average amount of urine collected was 5.81 mL (range 0.4-28 mL). Urine was immediately plated for extended quantitative urine culture (EQUC) as described in the Methods to isolate and identify culturable bacteria. For sequencing, aliquots of the same urine were immediately frozen at -80°C to prevent microbial growth or contamination prior to processing.
We modified existing EQUC urobiome protocols to preserve urine volume for DNA extraction2. We utilized two non-selective agar media (blood agar and Brucella agar) plated in duplicate and incubated under aerobic and anaerobic conditions. Aliquots of the same urine were subjected to DNA extraction using a commercially available kit that utilizes bead beating and DNA binding by magnetic beads for DNA isolation and purification. Isolated DNA was amplified using standardized PCR primers for the V4 region of the 16S rRNA gene. 16S rRNA amplicons were sequenced by Illumina paired-end sequencing.
The urobiome is a low biomass environment. There are myriad potential sources of contamination, a concern that is accentuated for low biomass samples. Contamination can be introduced at any step of sample processing, from sample collection to DNA extraction and amplification to sequencing35–37. We utilized three types of negative controls: 1) DNA extraction blanks; 2) no template PCR amplification blanks; and 3) sampling controls (Figure 1B). Specifically, we included eight DNA extraction blanks which underwent all steps of DNA extraction, PCR amplification, and sequencing. We included four no DNA template blanks during PCR amplification of the V4 region of the 16S rRNA gene. Finally, we included three types of sampling controls (operating theatre saline, mineral oil used for catheter lubrication, and saline flushed through a sterile catheter), four sets of which were collected on separate days. To our knowledge, this is the first urobiome study to report sampling controls collected contemporaneously with urine samples.
We sequenced sampling controls on an Illumina NovaSeq 6000 to add resolution for rare contaminant sequences and because of sequencing equipment availability. Urine samples were sequenced on an Illumina MiSeq. Notably, all samples and controls were processed in the same laboratory using the same reagents. To account for the different read numbers between MiSeq and NovaSeq platforms, we utilized a dilution series of a mock microbial community. Sampling controls had consistently higher 16S rRNA reads than extraction blanks. All sample read counts are shown in Supplementary Table S1. We applied the R package Decontam to remove sequences that were more prevalent in the blank extraction controls or PCR blanks than in the sampling controls. A total of 37 specific genera were retained following filtering using Decontam. The most prevalent genera present in the sampling controls were Campylobacter, Rodentibacter, Mannheimia, and Alloprevotella (Supplementary Table S2). The sampling controls (n=12) clustered distinctly from the subjects’ samples (n=50) (PERMANOVA p=0.001) (Figure 2A). Nonetheless, the number of 16S rRNA reads in sampling controls indicates that sampling materials are a potential source of contamination that must be accounted for within urobiome studies.
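The prevalence comparison underlying this filtering step can be illustrated with a minimal Python sketch. This is not the Decontam implementation: Decontam applies a statistical test to the comparison, whereas the sketch simply flags any taxon that is present in a larger fraction of negative controls than of test samples. The function names, taxon labels, and read counts are hypothetical and chosen only for illustration.

```python
def prevalence(counts_by_sample, taxon):
    """Fraction of samples in which `taxon` has at least one read."""
    present = sum(1 for counts in counts_by_sample if counts.get(taxon, 0) > 0)
    return present / len(counts_by_sample)


def flag_contaminants(negative_controls, test_samples):
    """Return taxa that are more prevalent in negative controls than in test samples.

    Simplification: a plain comparison of prevalences, with no statistical test.
    """
    taxa = set()
    for counts in negative_controls + test_samples:
        taxa.update(counts)
    return sorted(
        t for t in taxa
        if prevalence(negative_controls, t) > prevalence(test_samples, t)
    )


# Hypothetical read counts (taxon -> reads), one dict per library.
blanks = [
    {"TaxonA": 120, "TaxonB": 3},
    {"TaxonA": 95},
    {"TaxonA": 101, "TaxonC": 1},
]
sampling_controls = [
    {"TaxonA": 10, "TaxonB": 400, "TaxonC": 250},
    {"TaxonB": 380, "TaxonC": 90},
    {"TaxonB": 512},
    {"TaxonB": 45, "TaxonC": 77},
]

flagged = flag_contaminants(blanks, sampling_controls)
print(flagged)
```

In this toy input, TaxonA appears in every blank but in only one of four sampling controls, so it is the only taxon flagged.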
Characterizing the Urobiome by Amplicon Sequencing
Prior urobiome studies2,38 have used agarose gel electrophoresis to determine “negative samples” following 16S rRNA amplification (and thus excluded those samples from sequencing). Using the absence of a band on gel electrophoresis to call negative samples has a false negative rate of 30% and should be avoided when sampling low biomass environments39. We therefore subjected all samples to 16S rRNA amplification and sequencing. Every subject’s sample had higher sequencing reads than blank extraction controls. The range of merged non-chimeric 16S rRNA reads in the subjects’ samples from Illumina MiSeq sequencing was 6604-60040 reads compared to 2345-4149 reads in the extraction blank controls (Supplementary Table S1). We used the R package Decontam to identify potential contaminants, defined as sequences more prevalent in the negative controls than in the urine samples. Given the different library sizes and readily apparent compositional differences (Figure 2A), we did not apply Decontam to remove sampling control sequences from subject samples. Next, we filtered out ASVs with less than 1% abundance in the whole subject dataset. This threshold has previously been applied to urobiome datasets7. Supplementary Table S3 includes all taxa from subject urine samples identified by 16S rRNA sequencing with literature citations regarding prior detection in urobiome studies.
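As a rough illustration of the abundance filter, the following Python sketch drops ASVs whose pooled share of reads falls below a threshold. It assumes that "abundance in the whole subject dataset" means an ASV's reads summed over all subject samples divided by the total reads in those samples; other readings (for example, a per-sample threshold) are possible. The function name, ASV labels, and counts are hypothetical.

```python
def filter_low_abundance(asv_table, min_fraction=0.01):
    """Drop ASVs whose pooled relative abundance is below `min_fraction`.

    asv_table maps sample name -> {ASV id -> read count}.
    Assumption: abundance is computed on reads pooled across all samples.
    """
    totals = {}
    for counts in asv_table.values():
        for asv, n in counts.items():
            totals[asv] = totals.get(asv, 0) + n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {sample: {} for sample in asv_table}
    keep = {asv for asv, n in totals.items() if n / grand_total >= min_fraction}
    return {
        sample: {asv: n for asv, n in counts.items() if asv in keep}
        for sample, counts in asv_table.items()
    }


# Hypothetical counts: ASV3 contributes 9 of 2000 pooled reads (0.45%).
table = {
    "subject_01": {"ASV1": 900, "ASV2": 95, "ASV3": 5},
    "subject_02": {"ASV1": 800, "ASV2": 196, "ASV3": 4},
}
filtered = filter_low_abundance(table)
print(filtered)
```

Here ASV3 is removed from both samples, while ASV1 and ASV2 keep their original counts.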
Following the above-described filtering steps, there were 74 unique taxa remaining from the urine samples (Supplementary Table S4). Consistent with prior reports, the phyla Proteobacteria, Firmicutes, Bacteroidetes, and Actinobacteria were frequently detected (Figure 2B)40. The genera shared between subject samples and sampling controls were Campylobacter, Staphylococcus, Nocardiopsis, Halomonas, Saccharopolyspora, Vibrio, Porphyromonas, Rheinheimera, Cloacibacterium, and Anaerobacillus. Urine samples had a median of 41 unique genera (range 32-57). Six genera were detected in all 50 subject urine samples: Staphylococcus, Nocardiopsis, Acinetobacter, Pseudomonas, Corynebacterium, and Nesterenkonia. Three genera were found in 49 of the 50 samples: Aliihoeflea, Saccharopolyspora, and Sphingobacterium. Three genera were found in 48 of the 50 samples: Escherichia-Shigella, Lactobacillus, and Halomonas. The most abundant genera were Nocardiopsis, Staphylococcus, and Escherichia-Shigella (median abundance >5%); five additional genera had median abundance >3%: Lactobacillus, Acinetobacter, Pseudomonas, Prevotella, and Lacibacter.
We sought to investigate whether various subject exposures influenced the diversity of the urobiome. Measures of community diversity are commonly used to summarize information about the richness and distribution of microbial species in the community41. We compared two measures of alpha diversity (Chao1, Shannon) for two subject exposures: mode of birth (vaginal delivery vs. Caesarean section) and prior antibiotic exposure (Figure 2C). While both of these exposures are known to alter the gastrointestinal and skin microbiota of infants42, no significant difference in alpha diversity was detected in the urine samples for either exposure (Figure 2C).
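For readers unfamiliar with the two alpha diversity measures, the sketch below computes them from a vector of per-taxon read counts. It uses the natural-log Shannon index and the bias-corrected form of Chao1, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of taxa observed exactly once and exactly twice. Which variant and software were used for Figure 2C is not stated here, so these choices are assumptions, and the counts are hypothetical.

```python
import math
from collections import Counter


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate.

    observed richness + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are
    the numbers of singleton and doubleton taxa.
    """
    observed = sum(1 for n in counts if n > 0)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical read counts for five taxa in one sample.
sample = [10, 10, 1, 1, 2]
print(shannon(sample), chao1(sample))
```

Shannon diversity rises with both richness and evenness, whereas Chao1 estimates richness alone by using rare taxa to infer how many taxa went unobserved.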
Next, we sought to determine whether specific taxa are influenced by subjects’ exposures. We selected the taxonomic family Lactobacillaceae, which was present in variable amounts in the 50 subjects. Lactobacillaceae are well studied members of the urogenital microbiota, particularly in post-pubescent women43,44. Lactobacillaceae are transferred to infants during vaginal birth, and intestinal abundance of Lactobacillaceae is decreased in infants born by Caesarean section45. We compared Lactobacillaceae abundance between infants born by vaginal birth vs. Caesarean section (Figure 2D). There was no significant difference in Lactobacillaceae abundance between these groups. Together, these data display a detectable and consistent urobiome among infant males. Twelve genera were detected in ≥48 of the 50 urine samples. Early life exposures, such as mode of birth and prior antibiotic exposure, did not significantly influence urobiome composition.
Expanded Quantitative Urine Culture Identifies Culturable Members of the Infant Urobiome
To facilitate future mechanistic studies of interactions between urobiome members and the urothelium or uropathogenic bacteria, we designed an extended quantitative urine culture (EQUC) protocol with the goal of capturing as many bacteria as possible using limited urine volume from infants. Indeed, 32/50 (64%) of urine samples yielded identifiable growth on one or more of the media and conditions. This percentage is consistent with several prior urobiome studies utilizing extended culture across the human lifespan1,2,8,14. Colony identification was performed by matrix-assisted laser desorption/ionization (MALDI) mass spectrometry. Among the 12 sampling controls, only 1 colony grew from extended culture, Cutibacterium acnes, a likely skin contaminant. This suggests that the 16S rRNA reads observed in the sampling controls were due to residual DNA, not viable bacteria.
The species identified by extended culture are listed in Table 2. The range of unique species was 1-5 per urine sample. The most common taxonomic families detected were Actinomycetaceae (n=15), Peptoniphilaceae (n=7), and Enterococcaceae (n=6). The most common species isolated were Actinotignum schaalii (n=9), Enterococcus faecalis (n=6), and Peptoniphilus harei (n=5).
Extended culture and amplicon sequencing are complementary but not strictly equivalent approaches. We created a concordance map that displays which taxonomic families were detected by EQUC, 16S rRNA sequencing, or both (Figure 3). We included families detected in >0.1% relative abundance in the 16S rRNA dataset. There were a total of 49 taxonomic families across the 50 subjects’ urine samples. The family Actinomycetaceae was the most frequently detected family by EQUC and exhibited a high level of concordance with 16S rRNA results.
We inspected the concordance map for taxonomic families disproportionately represented in either EQUC or 16S rRNA results. The families Moraxellaceae, Nocardiopsaceae, and Pseudomonadaceae were detected in all urine samples by 16S rRNA amplicon sequencing, but not by EQUC. Similarly, the physiologically important family Lactobacillaceae was frequently detected by amplicon sequencing but not isolated by EQUC. The families Bifidobacteriaceae, Enterococcaceae, Peptoniphilaceae, and Streptococcaceae were detected >3 times by EQUC but were not present in the 16S rRNA results. These discordances highlight potential limitations of each method and the importance of complementary approaches for sampling the urobiome.
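The logic behind a concordance map can be sketched per sample as a comparison of two sets: families cultured by EQUC and families exceeding the 0.1% relative abundance cutoff in the 16S rRNA data. The Python sketch below is only an illustration of that comparison, not the procedure used to build Figure 3; the function name is hypothetical, and the family assignments and abundance values are invented for the example rather than taken from the study data.

```python
def concordance(equc_families, seq_abundance, min_abundance=0.001):
    """Classify families in one sample as detected by EQUC, 16S, or both.

    equc_families: iterable of family names cultured from the sample.
    seq_abundance: dict of family name -> relative abundance (0 to 1).
    A family counts as sequencing-detected if its abundance exceeds
    `min_abundance` (0.1% by default).
    """
    seq = {fam for fam, a in seq_abundance.items() if a > min_abundance}
    equc = set(equc_families)
    return {
        "both": sorted(equc & seq),
        "equc_only": sorted(equc - seq),
        "seq_only": sorted(seq - equc),
    }


# Invented values for a single hypothetical sample.
cultured = ["Actinomycetaceae", "Enterococcaceae"]
sequenced = {
    "Actinomycetaceae": 0.04,
    "Lactobacillaceae": 0.02,
    "Moraxellaceae": 0.10,
    "Enterococcaceae": 0.0005,  # below the 0.1% cutoff
}
result = concordance(cultured, sequenced)
print(result)
```

Repeating this classification for every sample and tallying the three categories per family gives the kind of summary from which method-specific blind spots become visible.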
Actinotignum schaalii is a Common Culturable Constituent of the Infant Urobiome
Actinotignum schaalii was the most common species identified in our extended culture and exhibited high concordance with the 16S rRNA amplicon sequencing results. Of the 32 urine samples that grew at least one bacterial species, nine (28.1%) grew A. schaalii. A. schaalii (formerly Actinobaculum schaalii) has been detected in numerous urobiome studies to date2,8,14–18,46. Intriguingly, in addition to being reported in this study and others as an asymptomatic colonizer of the urobiome, A. schaalii is also an opportunistic causative agent of urinary tract infections47. Specifically, there is concern about an increasing incidence of A. schaalii urinary tract infections19–21. Given the relatively fastidious growth requirements of A. schaalii, standard clinical microbiological techniques may not detect A. schaalii from urine samples21,48,49, highlighting the need to broaden our understanding of A. schaalii in the urinary tract. To date, genome analysis of A. schaalii has been limited to genome announcements without comprehensive analysis50. To expand our understanding of A. schaalii, we performed whole-genome sequencing on nine separate A. schaalii isolates identified by extended culture of urine from male infants.
We generated high quality whole genome sequences of each A. schaalii isolate, with a mean Q30 sequencing coverage of 275x. The mean genome length was 2,325,278 bp with an average of 1931 coding sequences (CDS). Following annotation of genes with Bakta27, we computed the pangenome with Roary28. The core genome shared by all nine isolates was composed of 831 genes. An additional 2081 genes were found in 2-8 of the isolates. Finally, there were 1626 unique genes found in only one of the nine isolates. We visualized the pangenome and calculated average nucleotide identity (ANI) with anvi’o (Figure 4A). We compared the gene clusters in the core and accessory genomes by annotating clusters by COG category within anvi’o. Overall, 21.9% of the core genome and 55.8% of the accessory genome were classified as general functions (R), unknown functions (S) or unassigned within the COG database (NA) (Figure 4B). The core genome was enriched for genes involved in information processing (DNA replication, transcription, etc.; COG J/K/L/A), cell processing/signaling (COG D/V/T/M/N/O/U), and energy production (COG C). Interestingly, the accessory genome was enriched for genes involved in carbohydrate metabolism (COG G). Intuitively, gene clusters involved in mobile gene transfer (COG X) were elevated in the accessory genome (4.6% vs. 0.06%), consistent with the flexible nature of the accessory genome.
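The three pangenome categories reported above (genes in all nine isolates, in 2-8 isolates, and in a single isolate) partition the pangenome, so the counts sum to 831 + 2081 + 1626 = 4538 genes. The Python sketch below shows how such a partition follows from a gene presence/absence table. It is a simplified illustration and does not reproduce Roary's clustering or its default category thresholds; the function name, gene names, and isolate labels are hypothetical.

```python
def partition_pangenome(presence, n_isolates):
    """Split genes into core, accessory (2 to n-1 isolates), and unique sets.

    presence maps gene name -> collection of isolates carrying that gene.
    """
    core, accessory, unique = [], [], []
    for gene, isolates in presence.items():
        k = len(set(isolates))
        if k == n_isolates:
            core.append(gene)
        elif k == 1:
            unique.append(gene)
        elif k > 1:
            accessory.append(gene)
    return sorted(core), sorted(accessory), sorted(unique)


# Hypothetical presence/absence data for three isolates.
presence = {
    "geneA": {"iso1", "iso2", "iso3"},
    "geneB": {"iso1", "iso2"},
    "geneC": {"iso3"},
    "geneD": {"iso1", "iso2", "iso3"},
}
core, accessory, unique = partition_pangenome(presence, n_isolates=3)
print(len(core), len(accessory), len(unique))
```

With nine isolates the same function would be called with `n_isolates=9`, and the accessory bin would then hold genes present in 2-8 isolates.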
To identify potential determinants of A. schaalii fitness in the urinary tract, we used ABRicate to screen for the presence of antimicrobial resistance genes and known fitness factors, utilizing the ResFinder, MegaRes, and VirulenceFinder databases. Notably, the Actinomycetaceae family is poorly represented in the VFDB and genomic datasets in general51, limiting the identification of putative virulence factors. We also annotated contigs with Bakta, which reduces the number of CDS annotated as hypothetical proteins27. We manually curated potential fitness factors from the Bakta annotations. Seven of the nine isolates encoded ermX, an rRNA methyltransferase conferring resistance to macrolides. All nine isolates encoded the Esx-1 Type VII secretion system and its toxin esxA (Figure 4C). Esx-1 has been most extensively characterized in Mycobacterium tuberculosis, which, like A. schaalii, is a member of the phylum Actinobacteria52,53. EsxA is an anti-eukaryotic membrane-permeabilizing toxin and is required for virulence in M. tuberculosis53.
Metal acquisition and homeostasis are key fitness determinants for microbial-host interactions54,55. All nine isolates contained the enterobactin transporters entS and fepBCDG (Figure 4C). The siderophore enterobactin, an iron-chelating small molecule, is a known fitness factor within the iron-deplete urinary tract56. Analysis of A. schaalii contigs using antiSMASH32,33 to identify biosynthetic gene clusters (BGCs), particularly those responsible for siderophore production, did not reveal any putative BGCs that may produce enterobactin or related molecules (Supplementary Table S5). Systems for the acquisition and metabolism of heme were also ubiquitous in the nine isolates. Specifically, all nine isolates encoded hemQ and hemH (coproheme decarboxylase and ferrochelatase, respectively), which are involved in heme biosynthesis, the heme chaperone hemW, the heme ATPase transporter ccmA, and the heme-degrading monooxygenase hmoA. Furthermore, all nine isolates encoded copper detoxification systems. Copper is toxic to bacteria in high concentrations and is elevated in the urinary tract during infection57,58. Thus, copper detoxification is considered a fitness factor in the urinary tract. All nine isolates encoded copA/Z, a copper exporter and chaperone, respectively, and copR, a copper-responsive transcriptional regulator (Figure 4C). Together, these results indicate that A. schaalii encodes fitness factors known within the phylum Actinobacteria (e.g. EsxA) and within distantly related urinary pathogens (metal acquisition and detoxification).