For this study, we selected 31 children recruited into the CDGEMM cohort for whom stool samples were available at birth, 3 months, and 4-6 months (see Figure 1 and Table 1; see Additional File 1 for more detailed metadata). None of these infants consumed solid foods before 6 months, which makes them ideal for studying the effect of genetic and environmental risk factors on the gut microbiota in the absence of gluten as a confounder. Twenty-six of these infants were genetically susceptible to developing CD: 19 were either heterozygous for DQ2 or DQ8 or carried both DQ2 and DQ8 (referred to as “standard genetic risk” hereafter) and seven were homozygous for DQ2 (referred to as “high genetic risk” hereafter). Additionally, 19 infants who were genetically predisposed to CD and had been exposed to at least one environmental risk factor are referred to as “environmentally exposed” infants throughout the rest of the manuscript. The environmental factors that we considered in this study are delivery mode, antibiotic exposure and infant feeding type. Environmentally exposed infants are therefore those who were born via cesarean section, were exposed to antibiotics at or during birth (i.e., antibiotics administered to the mother during delivery), or were not exclusively breastmilk-fed (i.e., formula-fed or both formula- and breastmilk-fed). The choice of these environmental risk factors and their grouping is clinically relevant, since cesarean section delivery is often associated with antibiotic administration at birth and with formula feeding due to delayed breastmilk production. Seven infants who were genetically susceptible and were not exposed to any of these environmental risk factors, i.e., those who were born vaginally, were not exposed to antibiotics at or during delivery and were exclusively breastmilk-fed, are referred to as “environmentally non-exposed” hereafter (see Figure 1).
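The grouping rule described above can be written down as a small illustrative sketch. The `Infant` record and its field names are hypothetical and are not taken from the study's metadata files; the logic only restates the definitions in the text (exposure to at least one of the three risk factors among genetically susceptible infants).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Infant:
    # Hypothetical field names, chosen for illustration only.
    genetically_susceptible: bool
    cesarean: bool
    antibiotics_at_birth: bool
    exclusively_breastfed: bool


def exposure_group(infant: Infant) -> Optional[str]:
    """Return the exposure label used in the text.

    Infants without genetic susceptibility are not assigned to either
    exposure group, so None is returned for them.
    """
    if not infant.genetically_susceptible:
        return None
    # Exposed means at least one of the three environmental risk factors.
    exposed = (
        infant.cesarean
        or infant.antibiotics_at_birth
        or not infant.exclusively_breastfed
    )
    if exposed:
        return "environmentally exposed"
    return "environmentally non-exposed"
```

Under this rule, an infant is labeled non-exposed only when all three conditions hold together: vaginal delivery, no antibiotics at or during delivery, and exclusive breastmilk feeding.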
Collected stool samples underwent shotgun metagenomic sequencing and metabolomic profiling. We analyzed metagenomic sequencing reads (see Methods) to profile microbial taxa at species-level resolution (see Additional File 2; see also Additional File 3 for the taxonomic composition of each sample at the genus and family levels) and functional pathways encoded by metagenomes (see Additional File 4). Although our taxonomic profiling identified non-bacterial species (fungi, viruses, protists), in this paper we focus only on the bacterial species.
Additionally, stool samples underwent metabolomic profiling and were analyzed to identify metabolites present in each stool sample (see Additional File 5). The identified microbial taxa, functional pathways and metabolites were then analyzed to explore how genetic and environmental risk factors influence the development of the gut microbiota as outlined below.
Associations between genetic and environmental risk factors and microbiota features
We used the MaAsLin procedure [34] to investigate how various microbiome features, including microbial species, functional pathways and metabolites, at each time point are associated with genetic risk for developing CD and with three key environmental risk factors: mode of delivery, exposure to antibiotics and infant feeding type (see Figures 2-4).
Genetic risk: We found that both high and standard genetic risk to develop CD are associated with a decreased abundance of several species of Streptococcus and Coprococcus at 4-6 months of age compared to those lacking genetic compatibility (Figure 2; p-value < 0.05). Notably, a decreased abundance of Coprococcus has been previously reported in the gut of individuals who carry a genetic risk to develop autoimmune conditions including CD [35]. Standard and high genetic risk for developing CD are also associated with an increased abundance of Bacteroides and Enterococcus species, respectively, at enrollment compared to no genetic risk. These observations are in agreement with previous studies [25, 26]; however, the previously reported association between genetic risk and an increased abundance of Bifidobacterium or Proteobacteria [25, 26] was not observed here. Among other significant associations, we found a decreased abundance of Veillonella, Parabacteroides and Clostridium perfringens at 4-6 months after birth in infants with high and standard genetic compatibility. This observation is contrary to case-control studies that report an increased abundance of these microbes in autoimmune conditions such as autoimmune liver disease [36], Behçet’s disease [37] and neuromyelitis optica [38].
In addition to associations with microbial species, we found that a high genetic risk of developing CD is associated with a decreased abundance of a number of functional pathways at 4-6 months of age (Figure 3; p-value < 0.05). These pathways include amino acid metabolism, biosynthesis of secondary metabolites and metabolism of cofactors including ubiquinone and other terpenoid-quinone biosynthesis. Furthermore, we identified an association between high genetic risk and a number of metabolites, e.g., an increased abundance of butanoic acid and a decreased abundance of dihydroxyacetone at 3 and 4-6 months of age (Figure 4; p-value < 0.05).
Mode of delivery: We found that cesarean section delivery is associated with a decreased abundance of several species of Bacteroides and Parabacteroides at all time points and with an increased abundance of Enterococcus faecalis (at 3 months after birth) compared to vaginal delivery (Figure 2; p-value < 0.05), in agreement with previous work [23, 39-41]. For example, we found associations between cesarean section delivery and a decreased abundance of the beneficial species Bacteroides vulgatus and Bacteroides dorei. An increased abundance of these species has been reported to lead to a decreased gut microbial production of lipopolysaccharide, which will improve immune function through mechanisms such as major histocompatibility production and T cell activation, among others [42]. Pathway analysis also shows an association between cesarean section delivery and decreased riboflavin metabolism and folate biosynthesis at 4-6 months after birth and an increase in the abundance of glycerolipid metabolism at 3 and 4-6 months (Figure 3; p-value < 0.05). Of note, defects in folate biosynthesis have been linked to an impaired immune response to viral infections and a reduced natural killer cell response, possibly contributing to T1D onset [43]. Finally, metabolite analysis unveiled an association between cesarean section delivery and an increase in the abundance of a number of metabolites such as butanoic acid (at 3 and 4-6 months), glycolic acid, oxalic acid, and hydroxyphenylacetic acid (at 4-6 months) and a decrease in that of valine, serine, and arabinoic acid among others (at 4-6 months) (Figure 4; p-value < 0.05). An increased abundance of hydroxyphenylacetic acid in the serum has been associated with ulcerative colitis in a previous study [44]; however, no clear links between the level of metabolites in the gut and those in the serum have been established yet. Additionally, serine, which is decreased in cesarean section delivery, has been reported to be required for effector T cell expansion and thus for modulating the adaptive immune response [45].
Infant feeding type: We examined three infant feeding types in this study: exclusive breastmilk feeding, exclusive formula feeding and both breastmilk and formula feeding, the last two of which were considered environmental risk factors. Previous work shows an association between infant feeding type and distinct species of Bifidobacterium [23, 46]. Consistent with these reports, we observed that exposure to both breastmilk and formula is associated with a decreased abundance of Bifidobacterium breve (at 4-6 months), while exclusive formula feeding is associated with an increased abundance of Bifidobacterium adolescentis compared to exclusive breastmilk feeding (Figure 2; p-value < 0.05). We also found that exclusive formula feeding is associated with a decreased abundance of Staphylococcus epidermidis (at enrollment), consistent with previous work [47], and with an increased abundance of Ruminococcus gnavus and Lachnospiraceae bacterium (at 3 and 4-6 months), which have been linked to allergic disease [48], diabetes [49] and colonic inflammation [50]. Pathway analysis shows that exposure to formula only or to both breastmilk and formula is associated with an increased abundance of pathways for lipid, amino acid and terpenoid metabolism and xenobiotic degradation, and with a decreased abundance of pathways for carbohydrate and energy metabolism (Figure 3; p-value < 0.05). Additionally, metabolomic analysis uncovered an association between combined breastmilk and formula feeding and a decreased abundance of homoserine, alpha-D-glucopyranoside, and hydrocinnamic acid (at 4-6 months) (Figure 4; p-value < 0.05). Exclusive formula feeding is also associated with an increase in sucrose and threonine and a decrease in oxalic acid and dihydroxyacetone abundances, among others (at 4-6 months).
Antibiotic use: We found an association between antibiotic exposure (as an environmental risk factor) and an increased abundance of Bacteroides thetaiotaomicron (at 4-6 months of age) (Figure 2; p-value < 0.05). This is corroborated by previous work suggesting that this species, which is an important metabolizer of polysaccharides, increases in abundance in response to amoxicillin exposure [51]. Other identified associations for antibiotic exposure that have not been previously reported include an increased abundance of Propionibacterium and Subdoligranulum species and a decreased abundance of Bifidobacterium merycicum and Streptococcus lutetiensis (at 4-6 months). Pathway analysis also revealed an association between antibiotic exposure and a decreased abundance of phenylalanine metabolism and an increased abundance of cyanoamino acid metabolism (3 and 4-6 months) and galactose metabolism (4-6 months) (Figure 3; p-value < 0.05). Analysis of metabolites showed associations between antibiotic exposure and a number of metabolites, including a decreased sucrose abundance (at 4-6 months) (Figure 4; p-value < 0.05).
Changes in the microbiota of environmentally exposed vs. non-exposed infants
Here, we performed a cross-sectional (inter-subject) analysis to explore how various features of the gut microbiota (microbes, pathways and metabolites) differ between genetically predisposed infants who were exposed to at least one of the environmental risk factors noted before (environmentally exposed infants) and those who were not (environmentally non-exposed infants) (Figure 5). This analysis did not identify any microbial species whose abundance is significantly different between the environmentally exposed and non-exposed infants. Pathway analysis, however, revealed that environmentally exposed infants have a higher abundance of pathways for xenobiotic degradation, fatty acid metabolism, and lipid metabolism among others (at enrollment) and of pathways such as toluene and xylene degradation and biphenyl degradation (at 4-6 months) (Figure 5A; p-value < 0.05). Metabolomic analysis identified alterations such as a decreased abundance of homoserine (at enrollment and 3 months) and of 2-ketobutyric acid (at enrollment) as well as an increased abundance of ribose (peak 2) (at 3 and 4-6 months) in environmentally exposed infants compared to non-exposed infants (Figure 5B; p-value < 0.05).
Longitudinal changes in the microbiota of environmentally exposed and non-exposed infants
Given the unique prospective study design of our cohort, we were able to perform a longitudinal (intra-subject) analysis to gain additional insights beyond a cross-sectional analysis by identifying dynamic alterations in the gut microbiota composition, function and metabolome in the first six months after birth. To this end, we explored changes in the microbiota features noted above between all pairs of time points that are observed exclusively in environmentally exposed or exclusively in environmentally non-exposed infants (Figure 6).
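One way to read "observed exclusively in" one group is as a set difference taken per pair of time points. The sketch below illustrates that reading only; it is not the statistical procedure used in the study, and the function names and input structure (sets of feature names already called significant for each group and time-point pair) are assumptions made for illustration.

```python
from itertools import combinations

# The three sampling time points named in the text.
TIME_POINTS = ["enrollment", "3 months", "4-6 months"]


def time_point_pairs(time_points=TIME_POINTS):
    """All unordered pairs of time points, in chronological order."""
    return list(combinations(time_points, 2))


def exclusive_changes(exposed, non_exposed):
    """Keep only changes seen in one group but not the other.

    Both arguments map a (earlier, later) time-point pair to the set of
    feature names whose abundance changed significantly for that group.
    How significance is decided is outside the scope of this sketch.
    """
    result = {}
    for pair in time_point_pairs():
        in_exposed = exposed.get(pair, set())
        in_non_exposed = non_exposed.get(pair, set())
        result[pair] = {
            "exposed_only": in_exposed - in_non_exposed,
            "non_exposed_only": in_non_exposed - in_exposed,
        }
    return result
```

With three time points this yields three comparisons: enrollment vs. 3 months, enrollment vs. 4-6 months, and 3 months vs. 4-6 months. A feature that changes in both groups for a given pair is dropped from both lists.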
In the longitudinal analysis of microbial species, we found that the abundance of a number of species increases over time in the environmentally exposed infants (Figure 6A; p-value < 0.05). For example, the abundance of Anaerostipes caccae monotonically increases during the study period and that of Klebsiella species and Erysipelotrichaceae bacterium increases from enrollment to 4-6 months. Among these, Klebsiella has been associated with the autoimmune condition ankylosing spondylitis [52]. In environmentally non-exposed infants, we observed that the abundance of Bacteroides uniformis monotonically increases during the first 6 months after birth, a pattern that has previously been reported in breastmilk-fed infants [53]. In addition, work in mice found that Bacteroides uniformis improves immune defense mechanisms, which are impaired in obesity, by decreasing TNF-α production and increasing IL-10 production [54]. In our study, we also observed a decrease in the abundance of Veillonella species from enrollment to 4-6 months in non-exposed infants. An increased abundance of Veillonella species has been associated with autoimmune hepatitis [36].
Longitudinal pathway analysis revealed that the abundance of ether lipid metabolism increases from 3 to 4-6 months of age in environmentally exposed infants (Figure 6B; p-value < 0.05). Notably, a decreased abundance of ether lipids in the serum of children with T1D compared to healthy controls has been observed [55], although the relationship between the abundance of microbial pathways for ether lipid metabolism in the gut and the level of ether lipids in the serum is yet to be explored. For the non-exposed infants, we observed a decrease in the abundance of sulfur metabolism and lipoic acid metabolism at 3 and 4-6 months, and of methane metabolism and biotin metabolism at 4-6 months, compared to enrollment. These patterns are consistent with previous reports [34, 56-62]. For example, increased sulfur metabolism is associated with the development of T1D [56] and is linked to IBD [34]. Additionally, lipoic acid is an antioxidant that has been suggested to have beneficial immunomodulatory effects on the innate and adaptive immune systems in autoimmune diseases [57]. Methane has also been shown to have an anti-inflammatory effect, promoting immune tolerance in the intestine when tested in animal models [58, 59]. Furthermore, biotin is known to enhance innate [60] and adaptive immune responses [61], and biotin deficiency has been associated with immune disorders and inflammation [63, 64]. A previous study also found that high-dose biotin may be useful in treating multiple sclerosis [62].
Metabolomic analysis revealed a monotonic increase in erythritol abundance during the study period and a decrease in propionic acid abundance from enrollment to 4-6 months in environmentally exposed infants (Figure 6C; p-value < 0.05). Propionic acid produced in the colon via bacterial fermentation of fiber promotes regulatory T cell generation [65]. Additionally, increased serum levels of erythritol have been associated with central obesity and weight gain [66], though the link between metabolite levels in the gut and those in the serum is not clear. In environmentally non-exposed infants, we observed an increased abundance of uracil, 3-(3-hydroxyphenyl)propionic acid and dihydroxyacetone from enrollment to 4-6 months. Previous work suggests that 3-hydroxyphenylpropionic acid acts as an anti-inflammatory and antioxidant agent [67].
Linking microbial species, pathways and metabolites
To link the microbial species, pathways and metabolites identified in these analyses, we performed a correlation analysis (using Spearman rank correlation) as detailed in Additional File 6, which yielded several significant correlations between these features, as summarized in Additional File 7. For example, exploring the links between pathways and metabolites with altered abundance in the cross-sectional analysis identified positive associations between ribose (peak 2) and both biphenyl degradation and toluene and xylene degradation in the environmentally exposed infants. In addition, association analysis between significant pathways and metabolites in the longitudinal analysis identified a negative association at 3 and 4-6 months between 3-(3-hydroxyphenyl)propionic acid and sulfur, lipoic acid, methane and biotin metabolism in non-exposed infants (Additional File 7).
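For readers unfamiliar with the statistic, Spearman rank correlation is the Pearson correlation computed on ranks, with tied values given their average rank. The following is a minimal, dependency-free sketch of that definition; it is not the analysis code described in Additional File 6, the function names are illustrative, and it does not compute p-values or correct for multiple testing.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks enter the calculation, the coefficient captures monotonic rather than strictly linear relationships, which is a common reason for preferring it with skewed abundance data.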