Pathogen perturbation extends the genomic sampling of the CBA/J microbiome
To examine the microbial community response to Salmonella colonization, 14 CBA/J mice were infected with 10⁹ CFU Salmonella enterica serovar Typhimurium strain 14028, and results were compared with 16 uninoculated control mice sampled at the same time points (n = 30 mice, Fig. 1A). Feces were collected before infection (Day − 1) and in late stages of infection (Days 10 and 11). 16S rRNA microbial community analyses were performed at early and late time points (n = 60), and lipocalin-2, an indicator of enteric inflammation, was measured at late time points in select mice from each treatment group (n = 12). The 60 fecal samples yielded 2,047,287 paired-end 16S rRNA reads, which identified 23,022 unique amplicon sequence variants (ASVs) across inflamed and control treatments (Additional file 1: Data S1). To confirm infection, we established that inoculated mice had a Salmonella relative abundance greater than 25% on Day 11 and significantly higher lipocalin-2 concentrations than control mice on Day 10. From these mice, we selected Day 11 feces from 3 Salmonella-infected and 3 uninfected mice for deep metagenomic sequencing.
The 16S rRNA gene data confirmed that Salmonella infection produced statistically distinct microbial communities by Day 11 after infection (Fig. 1B). An increase in Salmonella enterica relative abundance (≥ 25%) coincided with increased inflammation, evidenced by a 2.5 log-fold rise in lipocalin-2 relative to uninfected mice (Fig. 1B, Additional file 1: Data S1). The microbial communities of inflamed mice differed significantly from those of uninfected control mice at the same time point and from the pre-infection communities of both treatment groups (Fig. 1B, Fig. 1C, Fig. S1). Before infection (Day − 1), mice that were later inoculated with Salmonella and mice that remained uninoculated had indistinguishable fecal microbial communities, indicating that the community differences observed by Day 11 were due to Salmonella infection. As others have reported [13, 27], Salmonella-induced inflammation significantly changed gut microbial diversity: it reduced ASV richness by more than half (76.2%) and decreased Shannon's diversity 2.6-fold. These findings demonstrate that pathogen perturbation changes microbial membership and structure, offering a strategy for differential genomic sampling of the CBA/J gut microbiome.
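The richness and diversity comparisons above rest on two standard community metrics. The sketch below shows how observed ASV richness, Shannon's diversity, and the reported percent and fold reductions can be computed from per-sample ASV counts. It is a minimal illustration under stated assumptions: the natural-log base, the use of unrarefied counts, and the toy samples are hypothetical, since the text does not specify how the metrics were calculated.

```python
import math
from typing import Mapping

def richness(counts: Mapping[str, int]) -> int:
    """Observed richness: number of ASVs with at least one read."""
    return sum(1 for c in counts.values() if c > 0)

def shannon(counts: Mapping[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i); natural log is an assumed base."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

def percent_reduction(before: float, after: float) -> float:
    """Percent decrease from `before` to `after` (e.g. 76.2% fewer ASVs)."""
    return 100.0 * (before - after) / before

def fold_decrease(before: float, after: float) -> float:
    """Ratio before/after (e.g. a 2.6-fold drop in Shannon diversity)."""
    return before / after

# Hypothetical toy samples (ASV -> read count), not data from this study
control = {"asv1": 25, "asv2": 25, "asv3": 25, "asv4": 25}
infected = {"asv1": 97, "asv2": 3, "asv3": 0, "asv4": 0}
print(richness(control), richness(infected))
print(round(fold_decrease(shannon(control), shannon(infected)), 2))
```

Reporting richness loss as a percentage and diversity loss as a fold change, as the text does, keeps each metric on the scale where its change is easiest to read.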
These 16S rRNA analyses revealed that inflamed communities were enriched in members of the Gammaproteobacteria and Bacilli, whereas gut communities of uninoculated mice had higher relative abundances of Bacteroidia, Mollicutes, and Clostridia (Fig. 1C). Among the mice also sampled for metagenomic analysis (n = 6), Alistipes sp. was the most dominant commensal and the most reduced during inflammation, falling from 37.2% to 7.68% relative abundance but notably remaining detectable. Salmonella enterica Typhimurium dominated the Gammaproteobacteria in inflamed communities, reaching relative abundances of up to 94% in infected samples. Certain low-abundance members of the CBA/J microbiome, including some members of Lactobacillus, Enterococcus, and Lachnospiraceae, significantly increased in relative abundance following pathogen treatment (Fig. 1D). Control mouse communities were consistent with prior work showing that the uninfected CBA/J gut community is dominated by Bacteroidetes and Firmicutes, especially Clostridia of the Lachnospiraceae and Ruminococcaceae families [27, 28]. Together, these 16S rRNA gene analyses identified abundant members of both healthy and inflamed CBA/J gut microbiomes that represented microbial genome 'targets' for our database.
Microbial genomic reconstruction from CBA/J mice recovers relevant members sampled in amplicon surveys
To thoroughly catalog the CBA/J gut microbiota, high sequencing depth was required to sequence through Salmonella dominance (25.8–94.2% by amplicon analyses) and recover some of the first genomes from rare but persistent members co-occurring in the pathogen-inflamed gut. We obtained 254.2 Gbp of metagenomic sequencing data (Additional file 2: Data S2) from 6 representative mice (inflamed n = 3, uninfected n = 3, Fig. 1A), 77-fold more sequencing per sample than is typical of murine catalogs (Fig. S2A). We also used iterative, targeted assembly approaches (single-sample, co-assembly, and subtractive assembly) with two different assemblers to enhance genome quality and recovery, especially from less dominant members (Fig. 2A, Fig. S2B). Subtractive and co-assembly methods yielded 259 additional metagenome-assembled genomes (MAGs) beyond those from single-sample assemblies; the distribution of MAGs from each assembler is reported in Fig. 2A. In total, we recovered 2,281 MAGs. Quality assessment identified 504 medium- or high-quality MAGs (MQHQ) with sufficiently low contamination to be included in further analyses. These genomes contained 156,921 unique predicted genes [29, 30] (Fig. 2, Data S3).
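The medium- and high-quality cutoffs are not spelled out in the text. A common convention is the MIMAG standard, in which medium quality means at least 50% completeness with under 10% contamination and high quality means over 90% completeness with under 5% contamination (plus rRNA and tRNA requirements omitted here). The sketch below assumes those thresholds and completeness/contamination estimates from a tool such as CheckM; the `Bin` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Bin:
    name: str
    completeness: float   # percent, e.g. estimated by CheckM (assumption)
    contamination: float  # percent

def mimag_tier(b: Bin) -> str:
    """Completeness/contamination part of MIMAG-style tiers; rRNA/tRNA checks omitted."""
    if b.completeness > 90 and b.contamination < 5:
        return "high"
    if b.completeness >= 50 and b.contamination < 10:
        return "medium"
    return "low"

def select_mqhq(bins: List[Bin]) -> List[Bin]:
    """Keep only medium- and high-quality bins (the MQHQ set)."""
    return [b for b in bins if mimag_tier(b) in ("high", "medium")]

bins = [Bin("a", 95, 2), Bin("b", 70, 8), Bin("c", 60, 12), Bin("d", 40, 1)]
print([b.name for b in select_mqhq(bins)])
```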
Dereplication of our MAGs at 99% identity resulted in 113 medium- and high-quality MAGs (dMQHQ) from both treatments. These MAGs were assigned to 7 phyla: Actinobacteriota (1), Bacteroidota (4), Firmicutes (7), Firmicutes_A (98), Firmicutes_B (1), Proteobacteria (1), and Verrucomicrobiota (1) (Fig. 2D, Additional file 2: Data S2). More than a quarter of the dereplicated MAGs (30 of 113) were assigned to genera, and 98 to species, that GTDB-Tk recognized only by alphanumeric placeholder names, hinting that the novelty sampled here may be undescribed not only in murine but also in larger MAG collections. Reflecting the richness of these samples, the majority of MAGs originated from uninfected mice (59%) and their co-assemblies (35%), while 13% came from inflamed mice. Specifically, Enterococcus_D gallinarum, Erysipelatoclostridium cocleatum, Kineothrix sp000403275, and Lactobacillus_B animalis MAGs were recovered only from inflamed mice, consistent with their 16S rRNA membership (Fig. 1D). This finding shows how perturbation can aid in sampling genomes from conditionally rare members. Extending this resource beyond bacterial genomes, we also reconstructed viral genomes from our CBA/J assemblies, recovering 4,516 viral metagenome-assembled genomes (vMAGs). Of these, 2,351 vMAGs were ≥ 10 kb and were dereplicated into 609 viral genomes (Fig. 2C, Fig. 4D, Additional file 3: Data S4).
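Dereplication at 99% identity groups near-identical genomes and keeps one representative per group. The sketch below is a deliberately simplified greedy version, not dRep's actual two-stage hierarchical procedure: genomes are visited in order of a quality score (only the completeness and contamination terms of a dRep-style score, completeness minus 5 × contamination), and each joins the first existing representative it matches at or above the ANI threshold. Pairwise ANI values are assumed to be precomputed; all names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

def quality_score(completeness: float, contamination: float) -> float:
    # Simplified score: only the completeness and contamination terms
    return completeness - 5.0 * contamination

def dereplicate(genomes: Dict[str, Tuple[float, float]],
                ani: Dict[FrozenSet[str], float],
                threshold: float = 0.99) -> Dict[str, List[str]]:
    """Greedy clustering in which the best-scoring genome seeds each cluster.

    genomes maps name -> (completeness, contamination); ani maps an unordered
    pair of names -> ANI as a fraction (missing pairs are treated as unrelated).
    Returns representative -> cluster members, representative first.
    """
    ordered = sorted(genomes, key=lambda g: quality_score(*genomes[g]), reverse=True)
    clusters: Dict[str, List[str]] = {}
    for g in ordered:
        for rep in clusters:
            if ani.get(frozenset((g, rep)), 0.0) >= threshold:
                clusters[rep].append(g)
                break
        else:
            # No representative is close enough, so g starts a new cluster
            clusters[g] = [g]
    return clusters

genomes = {"m1": (95.0, 1.0), "m2": (80.0, 2.0), "m3": (70.0, 0.5)}
ani = {frozenset(("m1", "m2")): 0.995,
       frozenset(("m1", "m3")): 0.90,
       frozenset(("m2", "m3")): 0.91}
print(dereplicate(genomes, ani))
```

Seeding clusters with the highest-scoring genome ensures that each retained representative is the best-quality member of its group.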
We first sought to verify whether this dereplicated microbial MAG set represented the key members identified in our amplicon-sequenced CBA/J communities from both inflamed (n = 14) and uninfected (n = 16) individuals (Fig. 1B). At the class level, the relative abundance of dMQHQ MAGs closely mirrored the full-community 16S rRNA amplicon data for both uninfected (rho = 0.68) and inflamed (rho = 0.86) mice on Day 11, indicating that the dMQHQ database is representative of untreated and inflamed CBA/J communities (Fig. 3A, Fig. 1C). More specifically, a linear discriminant analysis of MAG relative abundance indicated similar dynamics between our genome and amplicon datasets. For example, Salmonella and Enterococcus_D genomes most strongly discriminated infected communities, while genomes from Alistipes, Duncaniella, and Lachnospiraceae COE1 most strongly discriminated uninfected communities (Fig. 3B). Beyond these genera, the relative abundances of other key taxa were consistent with amplicon sequencing, including the prominence of Akkermansia and Muribaculaceae in uninfected mice and their persistence in infected mice. Lactobacillus genome and ASV relative abundances also increased similarly during infection (Fig. 3C).
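The rho values above are Spearman rank correlations between class-level relative abundances in the MAG and amplicon data. Spearman's rho equals the Pearson correlation of the two rank vectors, with ties given their average rank. The dependency-free sketch below assumes that classes absent from one dataset are assigned zero abundance, a choice the text does not state; in practice `scipy.stats.spearmanr` computes the same statistic.

```python
from typing import Dict, List

def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)

def spearman_by_class(mag_abund: Dict[str, float], asv_abund: Dict[str, float]) -> float:
    """Spearman's rho over the union of classes; missing classes count as 0 (assumption)."""
    classes = sorted(set(mag_abund) | set(asv_abund))
    x = [mag_abund.get(c, 0.0) for c in classes]
    y = [asv_abund.get(c, 0.0) for c in classes]
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical class-level relative abundances
mag = {"Bacilli": 0.10, "Bacteroidia": 0.30, "Clostridia": 0.50}
amp = {"Bacilli": 0.05, "Bacteroidia": 0.20, "Clostridia": 0.60}
print(spearman_by_class(mag, amp))
```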
To link these reconstructed genomes more precisely to the amplicon data, we identified 96 MAGs that contained a partial or full 16S rRNA gene sequence. A pairwise comparison of MAG-derived 16S rRNA sequences with the V4 region sequences of our ASVs identified 33 unique genomes containing sequences matching ASVs in our 16S rRNA dataset. Many MAGs with 16S rRNA matches were among the most enriched taxa, including Lactobacillus johnsonii, Alistipes sp002428825, and Clostridia in order 4C28d-15 (Fig. S3, Additional file 1: Data S1). Together, these findings indicate substantial membership congruence between our MAG database and our amplicon data, demonstrating that inferences made with the CBAJ-DB are relevant to the more broadly sampled amplicon-sequenced gut communities from inflamed and uninfected mice.
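The text does not give the thresholds for matching MAG-derived 16S rRNA genes to ASV V4 sequences. One simple way to picture the comparison is to slide each ASV along each MAG 16S gene and count mismatches in the best ungapped window. The sketch below assumes same-strand sequences, no indels, and a hypothetical 99% identity cutoff; real pipelines would typically use a gapped aligner such as BLASTn or vsearch and check both strands.

```python
from typing import Dict, List, Tuple

def best_hit(asv: str, gene: str) -> Tuple[int, int]:
    """Slide the ASV along the 16S gene (same strand, no gaps).

    Returns (minimum mismatches, offset); offset is -1 if no window was
    better than all-mismatch (including when the gene is shorter than the ASV).
    """
    best = (len(asv), -1)
    for start in range(len(gene) - len(asv) + 1):
        window = gene[start:start + len(asv)]
        mism = sum(1 for a, b in zip(asv, window) if a != b)
        if mism < best[0]:
            best = (mism, start)
            if mism == 0:
                break  # an exact match cannot be improved on
    return best

def match_asvs(asvs: Dict[str, str], mag_16s: Dict[str, str],
               min_identity: float = 0.99) -> Dict[str, List[str]]:
    """Map each MAG to the ASVs whose best ungapped hit meets min_identity."""
    hits: Dict[str, List[str]] = {}
    for mag, gene in mag_16s.items():
        for asv_id, seq in asvs.items():
            mism, offset = best_hit(seq.upper(), gene.upper())
            if offset >= 0 and 1 - mism / len(seq) >= min_identity:
                hits.setdefault(mag, []).append(asv_id)
    return hits

# Hypothetical toy sequences
gene = "TTTTACGTACGTAAAA"
asvs = {"asv1": "ACGTACGT", "asv2": "GGGGGGGG"}
print(match_asvs(asvs, {"magA": gene}))
```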
This CBA/J microbial genomic resource includes mouse- and human-relevant lineages
To determine the relevance of the CBAJ-DB to microbial genomes recovered from other murine models, we compared our MAGs at strain-level identity (> 99% average nucleotide identity, ANI) with similar-quality MAGs from two prevalent mouse gut genome catalogs: (i) the Integrated Mouse Gut Metagenomic Catalog (iMGMC) and (ii) the Mouse Gastrointestinal Bacteria Catalogue (MGBC). Notably, many CBA/J-derived genomes represented unique strains not found in iMGMC, from the classes Bacilli (n = 3), Bacteroidia (n = 2), Clostridia (n = 24), Coriobacteriia (n = 1), and Dehalobacteriia (n = 1), and MAGs from Bacilli (n = 1), Dehalobacteriia (n = 1), and Clostridia (n = 30) were not represented in MGBC (Fig. 4A). Additionally, of the strains sampled in both our dataset and prior curated catalogs, 33 (30 Clostridia, 3 Bacilli) received a higher quality score than their catalog counterparts, indicating the value of these MAGs for advancing knowledge of cultivated and uncultivated genomes in murine models more broadly.
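The catalog comparison above amounts to two questions for each CBA/J MAG: does any catalog genome match it at > 99% ANI, and if so, does the CBA/J genome score higher? The sketch below assumes ANI values and quality scores are precomputed (for example, completeness minus 5 × contamination), and it treats a shared strain as improved only when it outscores every matching catalog genome; both choices are assumptions rather than the authors' documented procedure.

```python
from typing import Dict, List, Tuple

def compare_to_catalog(cbaj: Dict[str, float], catalog: Dict[str, float],
                       ani: Dict[Tuple[str, str], float],
                       threshold: float = 99.0) -> Tuple[List[str], List[str]]:
    """cbaj/catalog map genome -> quality score; ani maps (cbaj_id, catalog_id) -> ANI (%).

    Returns (unique, improved): CBA/J genomes with no catalog genome above the
    threshold, and shared genomes that outscore every matching catalog genome.
    """
    unique: List[str] = []
    improved: List[str] = []
    for g, score in cbaj.items():
        matches = [c for c in catalog if ani.get((g, c), 0.0) > threshold]
        if not matches:
            unique.append(g)
        elif all(score > catalog[c] for c in matches):
            improved.append(g)
    return unique, improved

# Hypothetical scores and ANI values
cbaj = {"A": 85.0, "B": 60.0, "C": 70.0}
catalog = {"X": 75.0, "Y": 65.0}
ani = {("A", "X"): 99.5, ("B", "Y"): 99.2, ("C", "X"): 97.0}
print(compare_to_catalog(cbaj, catalog, ani))
```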
We also compared CBAJ-DB MAGs with genomes derived from human hosts. To analyze shared genera and species, our dMQHQ database was dereplicated together with isolate genomes from the Human Microbiome Project (HMP) (n = 813) and MQHQ MAGs (n = 2,560) from a human cohort (PRJNA725020) (Additional file 4: Data S5) [31]. Akkermansia muciniphila (CBAJDB_482) and Enterococcus_D gallinarum (CBAJDB_497), defining members of the commensal and inflamed gut, respectively, clustered with species previously recovered from human hosts. Recovery of Enterococcus_D gallinarum only from the inflamed CBA/J gut demonstrates the applicability of perturbation techniques for uncovering conditionally rare members. As reported by others [32, 33], similarity between our murine and human gut microbial members was greater at higher taxonomic levels (e.g., genus), with 27 MAGs from Bacilli, Bacteroidia, Clostridia, Coriobacteriia, Gammaproteobacteria, and Verrucomicrobiae sharing similarity with human-derived genomes (Fig. 4A).
We were particularly interested in whether the microbial members recovered from our pathogen-inflamed CBA/J mice were relevant to inflammation in humans. To test this, sequencing reads from the Lloyd-Price et al. cohort [34], comprising 972 inflamed and 365 healthy gut metagenomes, were stringently mapped to the CBAJ-DB MQHQ MAGs (Additional file 5: Data S6). Consistent with their distribution across our treatments, reads from both healthy and inflamed humans mapped to 11 of our Akkermansia muciniphila MAGs, while 3 Enterococcus_D gallinarum MAGs derived only from our inflamed treatments recruited reads from inflamed individuals (Fig. 4B, C). While it can be challenging to extend specific organismal findings from murine to human conditions [22, 32, 33], inferences about critical lineages (e.g., A. muciniphila or E. gallinarum) in our database may have more direct relevance for human applications.
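The mapping thresholds behind "stringently mapped" are not given. The sketch below shows the general pattern of filtering alignments by percent identity and by the fraction of each read that aligns, then counting recruited reads per MAG. The 97% identity and 90% read-coverage cutoffs and the `Alignment` record are hypothetical; with real data these values would be derived from BAM fields such as the CIGAR string and NM tag.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Alignment:
    read_id: str
    mag: str
    matches: int      # aligned bases identical to the reference
    aligned_len: int  # aligned columns
    read_len: int

def passes(a: Alignment, min_identity: float = 0.97, min_coverage: float = 0.9) -> bool:
    """Keep alignments that are both high-identity and cover most of the read."""
    return (a.aligned_len > 0
            and a.matches / a.aligned_len >= min_identity
            and a.aligned_len / a.read_len >= min_coverage)

def reads_per_mag(alns: Iterable[Alignment], **cutoffs: float) -> Counter:
    """Count reads recruited to each MAG after stringent filtering."""
    return Counter(a.mag for a in alns if passes(a, **cutoffs))

# Hypothetical alignments
alns = [Alignment("r1", "akk", 148, 150, 150),   # passes
        Alignment("r2", "akk", 130, 150, 150),   # fails identity
        Alignment("r3", "ent", 100, 100, 150)]   # fails read coverage
print(reads_per_mag(alns))
```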
Salmonella infection and inflammation restructures the metabolic potential of the murine gut microbiome
Because this is one of the first genome-resolved analyses of a pathogen-impacted microbiota, and the first for Salmonella, it offered a new opportunity to assess how functional potential is remodeled during infection. Prior reports indicated that pathogen-induced inflammation creates oxidative conditions that generate terminal electron acceptors such as oxygen, tetrathionate, nitrate, and sulfate [9, 19]. We therefore evaluated the respiratory capacity of inflamed communities and compared it with that of uninflamed communities. In infected communities, Salmonella had the highest mean genome relative abundance and encoded gene sets for respiring oxygen (with both high- and low-affinity oxidases), fumarate, tetrathionate, and trimethylamine N-oxide (TMAO) (Fig. 5A). No organism other than Salmonella could respire with low-affinity oxidases. We infer that Enterococcus and Lactobacillus can reduce low levels of oxygen for detoxification (given the absence of complex I in their electron transport chains), whereas Akkermansia muciniphila and Muribaculaceae likely respire low levels of oxygen using a high-affinity oxidase. Similarly, genes for mitigating oxidative damage (SOD, catalase, thioredoxin reductase) were more enriched in the inflamed community than in the uninfected community. Together, these findings indicate that organisms coexisting with Salmonella in the inflamed gut encode metabolic capabilities to withstand or leverage the oxidative redox conditions caused by inflammation (Fig. 5C). Notably, some members of the uninflamed gut with respiratory potential were not maintained in the inflamed gut (Duncaniella sp., Hungatella_A sp.), indicating that selective forces other than respiratory capacity dictate persistence in response to pathogen colonization (Fig. 5C).
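Calls such as "encoded gene sets for respiring tetrathionate" typically come from checking whether a genome carries the marker genes of a pathway. The sketch below encodes a small, hypothetical rule table using standard gene symbols (cydAB for the high-affinity bd oxidase, cyoABCD for the low-affinity bo3 oxidase, ttrABC, frdAB, torA) and calls a pathway present only when all of its genes are annotated. The gene lists and the all-genes rule are illustrative assumptions, not the annotation criteria used in this study.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical marker-gene rules: a pathway is called present only if every gene is found
RULES: Dict[str, FrozenSet[str]] = {
    "O2_high_affinity_bd": frozenset({"cydA", "cydB"}),
    "O2_low_affinity_bo3": frozenset({"cyoA", "cyoB", "cyoC", "cyoD"}),
    "tetrathionate": frozenset({"ttrA", "ttrB", "ttrC"}),
    "fumarate": frozenset({"frdA", "frdB"}),
    "TMAO": frozenset({"torA"}),
}

def respiratory_capacity(genes: Set[str]) -> Set[str]:
    """Return pathways whose required genes are all present in the annotation set."""
    return {pathway for pathway, required in RULES.items() if required <= genes}

# Hypothetical genome annotation: frdB is missing, so fumarate is not called
genome = {"cydA", "cydB", "ttrA", "ttrB", "ttrC", "frdA"}
print(sorted(respiratory_capacity(genome)))
```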
Prior reports by our team and others demonstrated that butyrate, a key gut short-chain fatty acid (SCFA), decreased 15-fold in the Salmonella-inflamed gut, most likely due to inflammation-induced redox changes that harm members of the class Clostridia [13, 27]. Here we sought to better understand the relationship between taxonomy and SCFA production potential. In uninflamed communities, the most prevalent butyrate-producing bacteria were members of Alistipes (class Bacteroidia) and the Lachnospiraceae (class Clostridia). Interestingly, although the most dominant Clostridia decreased in relative abundance with inflammation, other Clostridia (Lachnospiraceae, Dorea, Faecalicatena) that encoded overlapping butyrate production potential were enriched in their place. For example, a MAG from the genus Dorea was enriched 16-fold and likely contributed most to the stability of butyrate production, whereas the dominant Alistipes MAG (Bacteroidia), whose abundance fell by a third, was not replaced by taxonomically similar members. Together, these data suggest that the decreased butyrate concentrations observed in the CBA/J mouse model during Salmonella infection [27] may be attributable to the reduction in Bacteroidia and less so to Clostridia, a hypothesis that requires further validation using gene expression to track butyrate production and consumption in the inflamed gut.
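The 16-fold enrichment of the Dorea MAG and the one-third reduction of the Alistipes MAG are ratios of relative abundance between conditions. The small sketch below computes fold change and log2 fold change; the pseudocount that guards against zero abundances and the use of per-condition mean abundances are assumptions, since the text does not describe how the ratios were derived.

```python
import math

def fold_change(control: float, inflamed: float, pseudocount: float = 1e-6) -> float:
    """Ratio inflamed/control; the pseudocount (an assumption) avoids division by zero."""
    return (inflamed + pseudocount) / (control + pseudocount)

def log2_fold_change(control: float, inflamed: float, pseudocount: float = 1e-6) -> float:
    """Positive values mean enrichment with inflammation, negative values depletion."""
    return math.log2(fold_change(control, inflamed, pseudocount))

# Hypothetical mean relative abundances (%)
print(fold_change(0.5, 8.0, pseudocount=0.0))    # a 16-fold enrichment
print(fold_change(30.0, 20.0, pseudocount=0.0))  # a one-third reduction
```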
Salmonella-induced inflammation alters carbon usage patterns, with more favorable redox conditions enabling the use of less energetically favorable substrates such as 1,2-propanediol and ethanolamine [35, 36]. Although Salmonella encodes this metabolic capacity, we asked whether any of the other persisting microorganisms could compete for these substrates. Enterococcus_D and multiple Oscillospirales genomes contain genes from the eut cluster for ethanolamine utilization and pdu genes for 1,2-propanediol utilization. These taxa increased in relative abundance with inflammation, particularly Enterococcus_D, one of the most abundant members after Salmonella (expanding to 2.6% of the inflamed community). The polymer utilization profile was also affected by inflammation, as infected communities had greater potential to use alpha-galactan and chitin (Fig. 5). Similarly, the community potential to use the sugars fructose, fucose, and mannose increased with inflammation. Collectively, these data can inform probiotic approaches that control Salmonella abundance through competitive exclusion, using inflammation-resistant strains that target select substrates.
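One way to express "community utilization potential" is the summed relative abundance of genomes encoding a pathway, optionally excluding Salmonella to isolate potential competitors. The sketch below assumes that definition; apart from the 2.6% Enterococcus_D value taken from the text, the abundances and gene assignments in the usage example are hypothetical.

```python
from typing import AbstractSet, Dict, Set

def community_potential(abundance: Dict[str, float],
                        encodes: Dict[str, Set[str]],
                        trait: str,
                        exclude: AbstractSet[str] = frozenset()) -> float:
    """Summed relative abundance of genomes that encode `trait`, skipping `exclude`."""
    return sum(a for g, a in abundance.items()
               if g not in exclude and trait in encodes.get(g, set()))

# Hypothetical inflamed community (fractions of the community)
abundance = {"Salmonella": 0.90, "Enterococcus_D": 0.026,
             "Oscillospirales_1": 0.01, "Lactobacillus": 0.02}
encodes = {"Salmonella": {"eut", "pdu"}, "Enterococcus_D": {"eut", "pdu"},
           "Oscillospirales_1": {"eut"}, "Lactobacillus": set()}
print(community_potential(abundance, encodes, "eut"))
print(community_potential(abundance, encodes, "eut", exclude={"Salmonella"}))
```

Excluding the pathogen separates the potential of would-be competitors from Salmonella's own contribution, which is the quantity relevant to competitive exclusion.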
Next, we quantified genes commonly reported in humans to affect inflammation and examined whether they were depleted in this inflamed mouse model. Consistent with literature reporting that healthy individuals have a greater potential for tryptophan degradation [37–39], we observed the potential for tryptophanase-mediated conversion of tryptophan to indole by members of Bacteroidia, Clostridia, and Verrucomicrobiae in both inflamed and uninfected mice. However, the proportion of Bacteroidia carrying this gene was much lower in inflamed guts (Fig. 6B). In infected mice, representation of the tryptophan-indole/AhR pathway coincided with lower proportions of Verrucomicrobiae and Bacteroidia spp. (Fig. 6). Also, as in human microbiomes, microbial genes responsible for cleaving taurine or glycine from primary bile acids and metabolizing secondary bile acid products (bsh, baiN, baiA, and hdhA) were significantly lower in relative abundance in mice infected with Salmonella (Fig. 6). These data provide promising evidence that functional gene profiles for modulating inflammation may be conserved with humans, supporting CBA/J as a relevant inflammation model.
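A statement such as "the proportion of Bacteroidia with this gene was much lower in inflamed guts" can be read as an abundance-weighted share within a class. The sketch below computes, for each class, the fraction of that class's summed relative abundance contributed by genomes encoding a given gene (tnaA is the standard symbol for tryptophanase). The abundance-weighted interpretation and the toy values are assumptions.

```python
from collections import defaultdict
from typing import Dict, Set

def class_gene_share(abundance: Dict[str, float], taxon: Dict[str, str],
                     genes: Dict[str, Set[str]], gene: str) -> Dict[str, float]:
    """Within each class, the fraction of summed abundance from genomes encoding `gene`."""
    total: Dict[str, float] = defaultdict(float)
    carrying: Dict[str, float] = defaultdict(float)
    for genome, a in abundance.items():
        cls = taxon[genome]
        total[cls] += a
        if gene in genes.get(genome, set()):
            carrying[cls] += a
    return {cls: carrying[cls] / total[cls] for cls in total if total[cls] > 0}

# Hypothetical genomes, classes, and abundances in two conditions
taxon = {"g1": "Bacteroidia", "g2": "Bacteroidia", "g3": "Clostridia"}
genes = {"g1": {"tnaA"}, "g2": set(), "g3": set()}
uninfected = {"g1": 0.30, "g2": 0.10, "g3": 0.20}
inflamed = {"g1": 0.02, "g2": 0.06, "g3": 0.10}
print(class_gene_share(uninfected, taxon, genes, "tnaA"))
print(class_gene_share(inflamed, taxon, genes, "tnaA"))
```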
Viral AMGs contribute to the bacterial community functional potential in CBA/J mice via Firmicutes
In creating the first murine gut viral database, we sought to compare the viral genomic content cataloged here with other mammalian gut systems. Of the 609 dereplicated vMAGs recovered from both treatments, fewer than 1% (n = 3) had taxonomic assignments (Additional file 3: Data S4); all three were assigned to the Caudovirales, in the families Siphoviridae (n = 2) and Myoviridae (n = 1). To perform biogeographic analyses, we collated phage genomes previously reported from mammalian guts [24, 34, 40–42] and clustered them with our mouse-derived vMAGs. Over half of the CBA/J-derived vMAGs (322, 53%) clustered with viruses from at least one other phage gut metagenome study (Fig. 4D). This suggests a potentially cosmopolitan phage seedbank that may be conserved across a wide variety of animals, geographies and, in the case of humans, ethnicities and health statuses. Viral content in the CBAJ-DB may therefore be relevant to other mouse models and to human guts.
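The 53% figure counts CBA/J vMAGs whose viral cluster also contains a genome from at least one other study. Assuming cluster assignments have already been made (the clustering method is not described here), the sketch below computes that count and fraction; the genome, cluster, and study labels in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

def shared_fraction(cluster_of: Dict[str, str], study_of: Dict[str, str],
                    focal: str = "CBA/J") -> Tuple[int, float]:
    """Count focal-study genomes whose cluster also holds a genome from another study."""
    studies_in: Dict[str, Set[str]] = defaultdict(set)
    for g, c in cluster_of.items():
        studies_in[c].add(study_of[g])
    focal_genomes = [g for g in cluster_of if study_of[g] == focal]
    shared = sum(1 for g in focal_genomes if studies_in[cluster_of[g]] - {focal})
    return shared, shared / len(focal_genomes)

# Hypothetical cluster assignments
cluster_of = {"v1": "c1", "v2": "c1", "v3": "c2", "h1": "c1", "p1": "c3"}
study_of = {"v1": "CBA/J", "v2": "CBA/J", "v3": "CBA/J", "h1": "human", "p1": "pig"}
print(shared_fraction(cluster_of, study_of))

# Reported values: 322 of 609 vMAGs
print(round(100 * 322 / 609))
```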
To explore whether viral communities could influence the structure and function of the uninflamed and inflamed microbial communities in the CBAJ-DB, we verified that microbial and viral genome-based ordinations were congruent (Fig. S4). Using conservative computational host prediction, we putatively linked 11.5% of the 609 vMAGs (70 vMAGs) to 43 MAGs encompassing 27 unique taxa (Fig. S4). All putative hosts were members of the Firmicutes, including the Lachnospiraceae, Ruminococcaceae, Oscillospiraceae, Anaerotignaceae, Borkfalkiaceae, and Acutalibacteraceae families. Among the vMAGs with putative hosts, we identified 36 auxiliary metabolic genes (AMGs) with functions including regulation of the TCA cycle (citrate synthase), glycolysis (pyruvate orthophosphate dikinase), phosphate metabolism (PhoH), and oxidative stress response (rubrerythrin). These phage genomes also encoded AMGs for the induction of germination (peptidase A25), spore formation (M50B), the cleavage of amorphous cellulose (GH2), and low-pH resistance (ornithine carbamoyltransferase). Among the putative viral hosts were members of the class Clostridia with some of the largest differences in MAG relative abundance between inflammation states; for example, Dorea and Faecalicatena were enriched in inflamed mice, and Lachnospiraceae COE1 in uninfected mice. Together, these findings indicate that phages may be underappreciated top-down (predation) and bottom-up (resource) controllers of microbiota functionality in the murine gut.
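The host-prediction method is not specified in the text. One common in silico approach, shown here only as an illustration and not as the authors' pipeline, links a phage to a host when a CRISPR spacer from the host genome matches the phage genome on either strand with few mismatches. The one-mismatch cutoff, the sequences, and all function names below are hypothetical; other signals (prophage integration, tRNA matches, k-mer similarity) are often combined in practice.

```python
from typing import Dict, Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def hamming_min(spacer: str, genome: str) -> int:
    """Minimum mismatches of `spacer` against any same-length window of `genome`."""
    k = len(spacer)
    best = k
    for i in range(len(genome) - k + 1):
        d = sum(1 for a, b in zip(spacer, genome[i:i + k]) if a != b)
        best = min(best, d)
    return best

def link_hosts(spacers: Dict[str, Set[str]], vmags: Dict[str, str],
               max_mismatch: int = 1) -> Dict[str, Set[str]]:
    """spacers maps MAG -> CRISPR spacers; vmags maps vMAG -> genome sequence.

    Returns vMAG -> putative host MAGs (a spacer matches on either strand).
    """
    links: Dict[str, Set[str]] = {}
    for v, genome in vmags.items():
        for mag, spacer_set in spacers.items():
            if any(min(hamming_min(s, genome), hamming_min(revcomp(s), genome)) <= max_mismatch
                   for s in spacer_set):
                links.setdefault(v, set()).add(mag)
    return links

# Hypothetical toy data: mag3's spacer matches the reverse strand of vA
vmags = {"vA": "AAAACCCCGGGGTTTTACGTAC"}
spacers = {"mag1": {"CCCCGGGG"}, "mag2": {"TTTTTTTT"}, "mag3": {"GTACGTAA"}}
print(link_hosts(spacers, vmags))
```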