Growing evidence suggests a complex interaction between host genes and the microbiome [26, 29, 56, 57]. Further, studies have revealed combined effects of host genetics and microbiome composition on several phenotypes, including sleep, anxiety, and liver damage [34, 36, 37]. The effects of diet, environment, early exposure to certain microbes, and antibiotic use make it challenging to study these interactions in humans.
We analyzed the bacterial composition of feces collected from 28 genetically diverse strains of CC mice. Despite identical housing, food, and care, fecal microbial communities differed significantly across CC strains, beginning at the phylum level. The relative abundance of the most abundant phylum, Bacteroidetes, varied from 40% to 80% across CC strains. The relative abundance of the next most abundant phylum, Firmicutes, varied from 17% to 58% across CC strains (Table 1). In contrast, Firmicutes was the most abundant phylum in previously published CC strain and CC founder strain studies [34, 58]. Analysis of previously published microbiome data from the Murine Microbiome Database suggests that this difference is attributable to the sample collection site, feces versus cecum [59].
The other eight observed phyla were absent in as few as one strain and up to 26 of the 28 strains. This difference across the CC strains was also noticeable in the alpha diversity metric, Shannon’s index. Variation beginning at such a high taxonomic level is notable and warrants consideration of the microbiome as an important factor when comparing phenotypic outcomes from different inbred strains.
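As a minimal sketch of the alpha diversity metric mentioned above, Shannon's index can be computed from a vector of taxon counts as H' = -Σ p_i ln p_i. The function name and the toy counts are illustrative assumptions, not data or code from this study, and the natural logarithm is assumed (some tools use log base 2).

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total          # relative abundance of this taxon
            h -= p * math.log(p)   # natural log; an assumption, see lead-in
    return h

# Hypothetical phylum-level read counts (not data from this study).
even_sample = [50, 50]          # two equally abundant phyla
skewed_sample = [97, 1, 1, 1]   # one dominant phylum plus rare ones

h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
```

In this toy case the evenly split sample scores higher than the skewed one even though it contains fewer phyla, because the index weights evenness as well as richness.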
Animals within a CC strain tended to have more similar fecal bacterial composition profiles than animals from different CC strains. The PCoA plots for two of the diversity metrics (Bray-Curtis and Unweighted UniFrac) grouped mice better by strain than by any other observed parameter. PERMANOVA and ANOSIM analyses further confirmed significant differences across CC strains. Next, we tested whether these differences were driven by variation among individual animals within a strain. We calculated the pairwise distances between individuals within a strain and between individuals from different strains. For both diversity metrics, the between-strain distances were significantly higher than the within-strain distances. This similarity of bacterial communities within a strain, combined with diversity across strains, makes CC mice a valuable tool for studying the effect of host genetics on the gut microbiome.
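The within-strain versus between-strain comparison described above can be sketched as follows for the Bray-Curtis metric. This is an illustration under stated assumptions: the helper names and the toy abundance vectors are hypothetical, and the significance test applied to the two sets of distances is not shown.

```python
from itertools import combinations

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

def within_between(samples):
    """Split all pairwise distances into within-strain and between-strain lists.

    samples: list of (strain_label, abundance_vector) tuples.
    """
    within, between = [], []
    for (s1, v1), (s2, v2) in combinations(samples, 2):
        d = bray_curtis(v1, v2)
        (within if s1 == s2 else between).append(d)
    return within, between

# Hypothetical genus-level counts for two strains (not data from this study).
samples = [
    ("strainA", [60, 30, 10]),
    ("strainA", [55, 35, 10]),
    ("strainB", [20, 70, 10]),
    ("strainB", [25, 65, 10]),
]
within, between = within_between(samples)
mean_within = sum(within) / len(within)
mean_between = sum(between) / len(between)
```

With these toy vectors the mean between-strain distance exceeds the mean within-strain distance, which is the pattern reported for the CC strains.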
Keeping in mind the limitations of 16S rRNA gene sequencing in resolving species-level data, we sought to identify genetic associations with microbial abundance at the genus level. We calculated broad-sense heritability for the fecal microbiome data. Of the 112 genera that we tested, 107 had positive heritability scores, and more than half had scores higher than 0.25 (Table S2). These values suggest that microbiome composition is a heritable trait, suitable for genetic mapping [60].
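The text does not state which estimator was used for broad-sense heritability. One common choice for inbred strain panels, assumed here purely for illustration, is the one-way ANOVA estimate H² = σ²_G / (σ²_G + σ²_E), where σ²_G = (MS_between − MS_within) / n₀ and σ²_E = MS_within. This estimator can be negative, which would be consistent with some genera not having positive scores. The function name and toy values are hypothetical.

```python
def broad_sense_heritability(groups):
    """ANOVA-based estimate of H^2 from trait values grouped by strain.

    groups: dict mapping strain label -> list of trait values (e.g. abundance).
    Assumed estimator; the study's actual method may differ.
    """
    k = len(groups)
    sizes = [len(v) for v in groups.values()]
    n_total = sum(sizes)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two strains and replicates within strains")

    grand_mean = sum(sum(v) for v in groups.values()) / n_total
    means = {s: sum(v) / len(v) for s, v in groups.items()}

    ss_between = sum(len(v) * (means[s] - grand_mean) ** 2 for s, v in groups.items())
    ss_within = sum((x - means[s]) ** 2 for s, v in groups.items() for x in v)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective number of animals per strain; equals n for balanced designs.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    var_genetic = (ms_between - ms_within) / n0
    total = var_genetic + ms_within
    if total == 0:
        raise ValueError("no variance in the trait")
    return var_genetic / total

# Hypothetical trait values for two strains (not data from this study).
h2_high = broad_sense_heritability({"A": [1.0, 3.0], "B": [5.0, 7.0]})
h2_negative = broad_sense_heritability({"A": [1.0, 7.0], "B": [3.0, 5.0]})
```

In the first toy case the strain means differ strongly relative to the within-strain spread, giving a high estimate; in the second the strain means coincide, and the estimate falls below zero.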
We also employed a machine-learning algorithm to predict the effect of genotype on the microbiome. The Random Forest classifier had an overall accuracy of 100% and a cross-validation score of 0.99 in matching microbiome composition to the correct CC strain. The ability of the classifier to predict the CC strain from genus-level bacterial composition data provided further evidence for the influence of host genetics on microbial abundance.
For successful association and QTL mapping, a phenotype must vary across the CC strains. ANCOM showed significant differences across the strains for the bacterial composition data at the genus level (Fig S4). Using R/qtl2 analysis, we identified 26 significant peaks on 15 different chromosomes for 23 different genera. The heritability scores of all these genera were positive, with an average score of 0.32. The 1.8 peak drop confidence intervals ranged from 0.02 Mb to 49.94 Mb, with a median width of 5.41 Mb. Three QTLs, Micab6, Micab8, and Micab11, obtained from analyzing the composition of the cecal microbiome in CC mice, shared regions with our QTL peaks. Micab6 and Micab8 shared regions with two different QTL from our list for the same class, Clostridia. Micab11, associated with class Clostridia in the cecal microbiome, shared a region with the genus Rikenella from the class Bacteroidia in our fecal microbiome [34].
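To illustrate how a drop-based confidence interval around a QTL peak can be derived, the sketch below walks outward from the peak until the LOD score falls more than a fixed amount below the maximum. It assumes that the "1.8 peak drop" above refers to a LOD drop of 1.8 from the peak, which the text does not spell out. It is a simplified stand-in and not the R/qtl2 routine, whose handling of interval boundaries may differ; the function name and scan values are hypothetical.

```python
def lod_drop_interval(positions_mb, lod_scores, drop):
    """Return (start_mb, end_mb) of the contiguous region around the peak
    where the LOD score stays within `drop` of the maximum."""
    if len(positions_mb) != len(lod_scores) or not lod_scores:
        raise ValueError("positions and LOD scores must be non-empty and aligned")

    peak_idx = max(range(len(lod_scores)), key=lod_scores.__getitem__)
    threshold = lod_scores[peak_idx] - drop

    lo = peak_idx
    while lo > 0 and lod_scores[lo - 1] >= threshold:
        lo -= 1   # extend left while still above the threshold

    hi = peak_idx
    while hi < len(lod_scores) - 1 and lod_scores[hi + 1] >= threshold:
        hi += 1   # extend right while still above the threshold

    return positions_mb[lo], positions_mb[hi]

# Hypothetical scan along one chromosome (not data from this study).
positions_mb = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
lod_scores = [2.0, 6.5, 8.0, 7.0, 5.9, 3.0]

start_mb, end_mb = lod_drop_interval(positions_mb, lod_scores, drop=1.8)
width_mb = end_mb - start_mb
```

A sharply peaked LOD curve yields a narrow interval and a flat one yields a wide interval, which is one reason interval widths can span a range as broad as the one reported above.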
We examined the genes present in each QTL interval using Enrichr and DAVID analysis. Many genes in the QTL we identified have critical roles in important body functions, including metabolism and the immune system (Table S4). Given their location in the intestine and their ability to produce and modify several metabolites, various microbial species have been proposed or used as probiotics to intervene in different conditions, including obesity, diarrhea, diabetes, Clostridium difficile infection, IBD, and neurological diseases [61–66]. However, many of these probiotics have failed to produce the desired results at the population level [67–71]. The interplay between host genetics and microbial composition could explain these differential outcomes between animal models and the general human population.
A decrease in beneficial microbes, enrichment of pathogens, and imbalance in shared response between microbes can alter the outcome of diseases [72]. We sought to identify correlations between pre-infection bacterial composition and the outcome of STm infection. Previous CC studies have successfully employed machine-learning algorithms to predict the outcome of phenotypes including memory, anxiety, and AOM-induced toxicity [35–37]. We employed a similar random forest classifier to predict the CC genotype with respect to microbial composition. For genus-level data, the classifier had an overall accuracy of 90% with an AUROC of 0.93.
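For readers less familiar with the AUROC reported above, it can be read as the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative sample. The sketch below computes it directly from that pairwise definition. The labels and scores are hypothetical and do not come from the classifier described here.

```python
def auroc(labels, scores):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0    # positive ranked above negative
            elif p == n:
                wins += 0.5    # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical predictions for six animals (not data from this study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = auroc(labels, scores)
accuracy = sum((s >= 0.5) == (y == 1) for y, s in zip(labels, scores)) / len(labels)
```

Accuracy depends on the chosen score cutoff (0.5 in this toy case), whereas AUROC summarizes ranking quality across all cutoffs, so the two numbers need not agree.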
The algorithm we used identified several genera that were important in predicting the outcome of infection with STm. Genus Parasutterella was the top feature in helping the algorithm differentiate between animals that remained healthy versus those that became ill after STm infection. To further shortlist genera that were differentially abundant between the sick and healthy groups, we employed ANCOM. This analysis identified Lachnospiraceae UCG-006 along with Parasutterella as significantly different between the two groups.
The Genus Parasutterella is also found in other host species including humans, rats, dogs, pigs, chickens, turkeys, and calves [73]. Changes in the relative abundance of this genus have been reported in several diseases. Parasutterella are increased in submucosal tissues of patients with advanced Crohn’s disease [74] and are associated with pancreatitis in rats [75]. Increased abundance of Parasutterella is also associated with depression and major depressive disorder [76, 77]. Furthermore, increased abundance of Parasutterella has been linked to the genesis and development of irritable bowel syndrome (IBS) and is associated with chronic intestinal inflammation in patients with IBS [78].
Parasutterella produce succinate as a fermentative end product and also alter the production of several microbially derived metabolites involved in bile acid maintenance and in tyrosine, tryptophan, and cholesterol metabolism [73]. In our experiments, Genus Parasutterella was abundant in the fecal microbiota of CC strains that developed clinical signs after infection with STm. The opportunistic pathogens Enterohemorrhagic Escherichia coli and Clostridium difficile sense succinate produced by commensal gut microbiota through a transcriptional regulator, catabolite repressor/activator (cra) [79, 80]. STm also utilizes succinate produced by intestinal commensals [81], and cra gene activation is important for STm pathogenesis [82]. Sensing of succinate results in the activation of STm virulence genes, including the Salmonella pathogenicity island 2 type III secretion system (SPI2 T3SS), leading to increased pathogenicity [83, 84]. Thus, succinate produced by Parasutterella may be sensed by STm, leading to increased virulence and development of severe clinical symptoms in CC strains that harbor this organism in the intestinal tract.
The family Lachnospiraceae is a core member of the gut microbiota in both mice and humans [59, 85]. Despite being one of the main producers of short-chain fatty acids (SCFA) in the intestine and helping with metabolism [86], the role of family Lachnospiraceae is controversial. The relative abundance of multiple genera in this family can both positively and negatively influence several diseases, including obesity, diabetes, IBD, and depressive syndrome [87]. In our study, the relative abundance of Genus Lachnospiraceae UCG-006 is associated with a positive health outcome after STm infection. Studies have correlated increased relative abundance of Genus Lachnospiraceae UCG-006 with positive outcomes in several diseases, including colon cancer, IBD, LPS-induced inflammation, and colitis [88–91]. Butyrate, an SCFA, can be utilized by intestinal epithelial cells as an energy source [92, 93]. Butyrate also reduces the colonization and virulence of several Salmonella sp. including Typhimurium [94–98]. SCFA produced by Lachnospiraceae species may influence the initial intestinal colonization of STm, leading to a positive outcome after infection.
Diversity and homeostasis are key to the proper functioning of the gut microbiome. A shift in either direction can lead to metabolic abnormalities, intestinal inflammation, infection by pathogens, autoimmunity, and neurological diseases. By controlling diet and environment, we identified various regions of the mouse genome that are associated with intestinal colonization by specific microbes. Despite the small sample size, we successfully used machine learning tools to predict sample metadata, including CC strain and the outcome of STm infection, from microbial composition. With publicly available bacterial composition data and advancements in the artificial intelligence field, more machine learning tools can be employed to develop microbes as biomarkers for disease prediction. Future genetic modification and gnotobiotic studies in murine models will be useful for advancing the correlated genes to causal candidates for therapeutic application. A clear understanding of the role of environment, diet, and host genetics will provide a basis for using microbes as a personalized therapeutic tool to prevent, diagnose, and treat various conditions and diseases.