The gut microbiome of 7-37 month old children from The Gambia shows the development of a distinct non-industrial Prevotella-based trophic network

Distinctive bacterial trophic networks exist in the gut microbiota of individuals in industrialized and non-industrialized countries. To study the development of these networks, we investigated the gut microbiota of 7-37 month old children living in rural Gambia (616 children, 1407 stool samples, stratified by 3-month age groups). We found that child age was the largest discriminating factor between samples, and that anthropometric indices (WAZ, HAZ, and WHZ), collection timepoints, and iron supplementation did not significantly influence the gut microbiome in this data set. Prevotella copri, Faecalibacterium prausnitzii, and Prevotella stercorea were the most abundant species (35%, 11%, and 7%, respectively). Distinct bacterial trophic network clusters were identified, centered around Prevotella stercorea and Faecalibacterium prausnitzii, which were found to develop steadily as the gut microbiome matured. The Prevotella stercorea trophic network cluster is distinct from those found in individuals in industrialized countries and therefore this dataset, set within a critical developmental timeframe, provides an excellent opportunity to understand the influence of a high fiber, low-protein diet on the development of a Prevotella-enriched gut microbiome.

One of the largest characterizations of bacterial species present in stool samples to date came from a study of 531 humans representing healthy Amerindians from the Amazonas of Venezuela, residents of rural Malawian communities, and inhabitants of metropolitan areas in the USA 1. This study was of particular value because it included infants, children, teenagers and adults. Striking differences were observed between USA city residents and rural populations from the Amazonas and Malawi. Species belonging to the genus Prevotella were one of the most dominant differences in the USA versus non-USA comparisons. The importance of Prevotella species as a discriminatory taxon between the gut microbiota of people living in industrialized vs. non-industrialized societies was also highlighted in a study comparing children in West Africa (Burkina Faso) and Europe (Italy) 2. In this study, the Italian children's gut samples were dominated by Bacteroides and the children's gut samples from Burkina Faso were again dominated by Prevotella.

The term enterotype has been proposed to describe distinctive bacteriological patterns of human gut microbiota. Several studies have highlighted distinctive enterotypes 3,4, one dominated by Bacteroides or Clostridiales species, which is present more commonly in industrialized countries (such as America and Europe) and one dominated by Prevotella species, which is more commonly detected in countries with non-industrialized lifestyles 5,6. In the original paper describing enterotypes the human gut microbiome was classified into three groups based on its composition 5. However, the concept of enterotypes has been critically re-appraised in the last few years. For example, a combined analysis of five studies with 747 samples across five continents could not confirm the enterotypes hypothesis as originally proposed 7.
The same group concluded that populations with a Western diet tend to be dominated by Bacteroides and Clostridiales, while rural populations with a high fiber, low-protein diet tend to be dominated by Prevotella. Other groups have also re-analysed the enterotypes hypothesis 8,9 and these detailed cluster and statistical analyses concluded that only two bacterial communities constitute our gut microbiota, one dominated by Prevotella and one by Bacteroides or species of the Firmicutes 9-13.

In the current study we utilised data from an iron intervention trial in The Gambia in West

Nutritional and diet information

The Gambia is a low-income country in West Africa, where food availability and nutritional status in rural areas are poor. In rural areas, food availability and nutritional status are strongly influenced by seasonality, and a chronically marginal diet is exacerbated by a ''hungry season'' (July-September), when food stocks from the previous harvest season are depleted. Infants in rural Gambia are breast-fed to 2 years of age, with fewer than half of infants being exclusively breast-fed to 6 months of age as per WHO recommendation 17. The first foods introduced from 3 months of age are thin gruels made from only cereal, water (occasionally cow milk is added), salt, and sugar, and are of a low energy and fat content. A thicker porridge made from rice and pounded groundnuts is sometimes administered. Cow milk alone is infrequently given to infants < 1 year of age; only 57% of infants receive it more than once a week, although it is provided often to children in the second year of life. From 6 months, infants start to share the family food bowl, the most common meals consisting of boiled rice and a sauce made from groundnuts or leaves. Dried fish may be added to sauces in very small quantities, but fresh fish is not given to infants before 9 months 18.

ALDEx2 test
Multidimensional scaling using Principal Coordinates Analysis (PCoA) did not cluster children differently based on iron supplement treatment and placebo (Sup fig 2.a). The volcano plot identified only one species that was statistically different between the two sample groups (red dot in Sup fig 2.b). The only statistically significant species identified by the Kruskal-Wallis rank test was Megamonas funiformis, which was detected at higher abundance in the treatment group with a false discovery rate (FDR) corrected P value of 0.0099 (Sup fig 2.
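The FDR-corrected P values reported here can be illustrated with a short sketch. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg step-up method is assumed below; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the FDR correction in the text is the Benjamini-Hochberg
    procedure; the source does not name the method explicitly.
    """
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * n / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-taxon p-values (illustration only).
raw = [0.01, 0.04, 0.03, 0.20]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw P = {p:.3f}  FDR-corrected P = {q:.4f}")
```

A taxon would then be reported as differentially abundant only if its corrected value falls below the chosen threshold (0.05 in this manuscript).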

The largest differences were observed between D1 and D85 sampling timepoints for which eight species were statistically significantly different between the two timepoints with FDR corrected P value < 0.05. Six of the eight species increased in abundance from D1 to D85.

3 a to 3). Two of the eight species decreased from D1 to D85 with Bifidobacterium being the most abundant species (4.12%) followed by Allisonella histaminiformans (0.42%) (Supplementary Figure 3 g and h). The alpha diversity indexes for Richness, Chao1, and Fisher's Alpha were also significantly higher in the D85 samples compared to the D1 samples (Supplementary Figure 3 i to l).

Significant differences between the young, middle, and old age group samples

Beta diversity analysis (between group analysis) was conducted to identify whether the microbial composition differs between the young, middle, and old groups. The Bonferroni corrected P-value from the PERMANOVA test between the three different age groups was 0.0003 and the calculated F statistic was between 35 and 125 for the species taxon, between 46 and 163 for the genus taxon, and between 62 and 238 for the family taxon (Supplementary Table 1

expected, the young and old age groups were separated the furthest apart from one another while the middle age group clustered between them. It also showed that the greatest separation was observed at the lowest (species) taxonomic resolution. We also plotted all species, genera, and families with a minimum abundance of 0.5% across all samples to show which taxa were most responsible for the separation. At the species level, these included

To identify changes over time in the whole data set, we split the data into 11 age groups, separated by three-month intervals, and analyzed alpha diversity.
The Fisher's alpha parameter indicated a statistically significant increase in alpha diversity from the youngest age group (7-9 months) to the 34-36 months age group (Kruskal-Wallis P < 0.0001) (Figure 2.a). The Simpson's index was also statistically significantly different between the 11 age groups, but an upwards trend as seen by the Fisher's alpha was not observed (Kruskal-Wallis P = 0.0002) (Figure 2

Alpha diversity was measured by two different diversity indexes and by two different richness indexes. Figure 2.a The Fisher's alpha diversity test, Figure 2.b Simpson's diversity test, Figure 2.c, Chao1 estimated richness, Figure

We used the non-parametric Kruskal-Wallis rank test to identify significantly differentially abundant bacterial taxa in the 11 age groups (3-month intervals) (Supplementary Table 2). From the Kruskal-Wallis rank test we report the uncorrected and the false discovery rate (FDR) corrected P value, as well as the mean and median abundances for the different taxa.

Faecalibacterium prausnitzii (11.4%) and Prevotella stercorea (7.1%), respectively. These three species accounted for a total of 53.6% of all reads (Supplementary Table 2, first three species entries in column "Taxa"). All Prevotella species combined accounted for 27%, 44%, and 48% of all reads, in the young, middle, and old age groups respectively.

The maturation pattern over time of the top ten bacterial taxa, shown either at the species or genus level, with a minimum abundance of 1%, which together cover 73.2% of all bacterial 16S reads, is shown in Figure 3. These ten bacterial groups are from four major bacterial phyla namely,

Paraprevotella xylaniphila (2%); Firmicutes; Faecalibacterium prausnitzii (11.4%), Streptococcus salivarius (1.1%); Actinobacteria; Bifidobacterium (4.1%); and Proteobacteria; Escherichia coli (3.5%), Succinivibrio dextrinosolvens (2.4%), Sutterella wadsworthensis (1.8%).
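As a rough illustration of the richness and diversity indexes used in the alpha diversity analysis above, the sketch below computes observed richness, Simpson's index, and Chao1 from a vector of read counts. It is a minimal sketch under stated assumptions: Simpson's index is taken in its 1 - sum(p_i^2) form and Chao1 in its classic form with the bias-corrected fallback when no doubletons are present, and the exact variants implemented in the analysis software may differ. Fisher's alpha is omitted because it requires a numerical root search. The function names and the example counts are hypothetical.

```python
def richness(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def simpson_index(counts):
    """Simpson's diversity, assumed here to be 1 - sum(p_i^2)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Chao1 estimated richness from singleton and doubleton counts."""
    s_obs = richness(counts)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    if doubletons > 0:
        # Classic estimator: S_obs + F1^2 / (2 * F2)
        return s_obs + singletons ** 2 / (2.0 * doubletons)
    # Bias-corrected form used when there are no doubletons.
    return s_obs + singletons * (singletons - 1) / 2.0


# Hypothetical read counts for one stool sample (illustration only).
sample = [10, 5, 1, 1, 2, 2, 0]
print("Richness:", richness(sample))
print("Simpson:", round(simpson_index(sample), 4))
print("Chao1:", chao1(sample))
```

Richness and Chao1 respond to rare taxa, whereas Simpson's index is weighted towards dominant taxa, which is one plausible reason the two kinds of index can show different trends with age.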
The maturation pattern was investigated with two different normalized datasets.

followed by a steady state and then again, an increase in abundance between the age of 30 months and 36 months followed by a slight drop in abundance after 3 years of age.

in % were calculated from total sum scaling (TSS) normalised absolute read counts which were subsequently transformed using cumulative sum scaling (CSS)+log2.

However, the Faecalibacterium & co. cluster does seem to benefit from a high abundance of P. copri (Figure 6.a) whilst the Prevotella stercorea & co. cluster does not seem to have a distinct interaction with P. copri (Figure 6.b). Several of the species associated with the Prevotella stercorea & co. and the Faecalibacterium & co. clusters seem to be shared between both clusters to some degree, having high positive correlation coefficients with members from both clusters (green and blue clusters in Figure 5.a and b).

The steady build-up of the F. prausnitzii cluster over time appears to mirror the development of a trophic network seen in children in industrialised countries 15, where the build-up of a trophic network of fibre degraders, acetate and lactate producers combined with butyrate producers appears to be essential for a healthy immune system development. However, the addition or expansion of the P. stercorea & co. trophic network in our African cohort (alongside P. copri) likely helps explain why higher levels of short-chain fatty acid production are seen in Africans with non-industrialized gut microbiomes. The apparent positive association of Bifidobacterium with various other bacteria (black box, Figure 5

Figure 6.c represents 73.5% of the variation in the data and indicates the maturation process to be the accumulation of Prevotella copri and both the F. prausnitzii and P. stercorea trophic networks over time.
The build-up of the trophic networks over time is furthermore well described by PC3 in Figure 6.d. PC2 in Figure 6.e indicates how P. copri and the P. stercorea network are negatively correlated with Bacteroides whilst the F. prausnitzii network is positively correlated with Bacteroides. PC4 in Figure 6.f indicates that P. copri and the F. prausnitzii network are positively correlated with one another.

The gut microbiome was stable across Weight-for-Age Z Score (WAZ), Height-for-Age Z Score (HAZ), and Weight-for-Height z-score (WHZ)

Our study population in terms of WAZ, HAZ, and WHZ had a mean Z score of -0.95, -0.85, and -0.77, respectively and a standard deviation between 0.85 and 0.94 for the three Z scores (Table 1)

Table 3, columns WAZ.p, HAZ.p, WHZ.p, yellow box) but none of the associations remained significant after correction for multiple testing by the FDR method (Supplementary Table 3, columns WAZ.p.fdr, HAZ.p.fdr, and WHZ.p.fdr). We also tested for an association between microbial diversity indexes and our three multiple explanatory

analysis the same alpha diversity indexes were used as shown in Figure 1. Fisher's Alpha (Supp Fig 5.a), Richness index (Supp Fig 5.c), and Chao1 (Supp Fig 5.d) were all highly significantly associated with age. However, none of the alpha community diversities (i.e., Fisher's alpha, Simpson's index, Richness, and Chao1) remained significantly associated with WAZ, HAZ, and WHZ after controlling for age (Supplementary Figures 5.a to 5

MLR was used to identify an association between community composition (219/488 bacterial taxa with a minimum abundance of 0.01%) and multiple explanatory variables for age, Weight-for-Age Z Score (WAZ), Height-for-Age Z Score (HAZ), and Weight-for-Height z-score (WHZ).
Bacterial taxa were used as the dependent (response) variables and the clinical factors for age and the scores for WAZ, HAZ, and WHZ were used as independent variables. The table was sorted by the false discovery rate (FDR) corrected p-values for age. The table shows the uncorrected p-values, the coefficient ("c") value and the corrected p-values. The MLR model was performed in Calypso Web Portal (version 8.72). The top 50 taxa are shown. Statistically significantly associated taxa with an uncorrected p-value < 0.05 are highlighted in yellow and statistically significantly associated taxa with an FDR corrected p-value < 0.05 are highlighted in red.

network in Finnish/Estonian children can be said to be slightly more complex and as a whole more abundant in terms of the percentage of reads, and that the F. prausnitzii trophic network in children from The Gambia appears to utilize the metabolic activity of P. copri, a species which independently reaches a very high stable prevalence already at the age of one year. Prevotella copri, the most dominant species in this dataset, is expected to be negatively correlated with most bacterial groups, even in the absence of clear antagonism, when normalizing the abundances of species to 100% of all reads, due to its high abundance

The trophic network centred around Prevotella stercorea is typically not found in gut microbiomes from individuals in industrialized countries. Many of the species associated in this trophic network are not found in microbiota in industrialized countries, or only in low numbers 1,2,15,24. As a result, they have often not been studied sufficiently (or at all) which makes it difficult to understand or explain, with any degree of certainty, what their roles are within this trophic network. The P. stercorea trophic network, as described here in the Gambian context, appears more diverse and self-contained than the F. prausnitzii network as it does not depend on P. copri and is typically strongly negatively correlated with all other main bacterial groups, apart from P. copri and the F. prausnitzii trophic network (Figure 5.b).

known about many of these species, that the assemblage should have a very wide assortment of metabolic capabilities covering everything needed to form a self-contained trophic network, leading (for example) to an enhanced production of short chain fatty acids, as compared to microbiome compositions lacking this trophic network.

A particularly strong and important factor in the composition of the gut microbiome in these children from The Gambia is the clear antagonistic relationship between the Prevotella and Bacteroides genera (Figures 5 and 6). The Prevotella copri and the Prevotella stercorea network are both very strongly negatively correlated with the abundance of Bacteroides (Spearman rho = -0.36 and -0.32, respectively). The F. prausnitzii network however is not negatively correlated with Bacteroides, similar to the association commonly found between F. prausnitzii and Bacteroides in gut microbiome compositions from industrialized countries where it is often found to be positive.

Similar to infant microbiotas in industrialized countries, the Bifidobacterium genus is the most important and dominant group early in infancy. As infants are weaned, the dominance of Bifidobacteria wanes and they are rapidly replaced by bacteria associated with a more adult-like "African" microbiome composition, especially in the case of these Gambian children 2. Bifidobacterium levels at 3 years of age are much lower in children from The Gambia than in European children, possibly because the lactate- and acetate-producing Bifidobacteria become part of the trophic networks found in Europe whereas they are replaced by other species with a similar function (acetate and lactate production) in The Gambia. The higher abundance of Bifidobacterium in European children may also be a result of earlier and longer formula-feeding in those children 25,26.

coli and Sutterella), the Bacteroides genus, and several other infant/baby-associated microorganisms.
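The discussion above notes that a dominant species such as P. copri is expected to correlate negatively with other taxa once abundances are normalised to 100% of all reads, even without true antagonism. The sketch below illustrates this closure effect with total sum scaling (TSS) and a Spearman rank correlation written in plain Python. The counts are hypothetical and chosen so that the two taxa are uncorrelated in absolute terms; the function names are illustrative and do not reproduce the authors' pipeline.

```python
import math


def tss(sample_counts):
    """Total sum scaling: convert counts to proportions of the sample total."""
    total = sum(sample_counts)
    return [c / total for c in sample_counts]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical absolute counts per sample: [dominant taxon, other taxon, rest].
samples = [[100, 50, 10], [200, 52, 20], [300, 49, 15], [400, 51, 12]]

absolute_rho = spearman([s[0] for s in samples], [s[1] for s in samples])
relative = [tss(s) for s in samples]
relative_rho = spearman([r[0] for r in relative], [r[1] for r in relative])

print("Spearman rho on absolute counts:", absolute_rho)
print("Spearman rho after TSS:", relative_rho)
```

In this constructed example the second taxon barely changes in absolute terms, yet its relative abundance falls whenever the dominant taxon rises, so the correlation becomes strongly negative after TSS. Observed negative correlations with a highly abundant species should therefore be interpreted with this effect in mind.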

The high abundance of P. copri in The Gambia cohort emphasises the function of P. copri and its metabolomic pathways in fermenting complex polysaccharides, which is important for people with a high fibre diet. It has also been shown in other studies that Prevotella species play an important role in glucose metabolism improvements after the consumption of barley kernel-based bread 27. Metatranscriptomic RNA sequencing has provided detailed information showing that P. copri encodes all the enzymes necessary for the Embden-Meyerhof-Parnas (EMP) pathway which is the most recognised glycolytic pathway 28 and is based on glycolysis and succinate production from fumarate. In addition, transcriptomic profiling has further revealed that it can degrade pyruvate to acetate and formate 29. Another study looking at sports athletes associated Prevotella with a number of amino acid and carbohydrate metabolism pathways including lysine biosynthesis, alanine, aspartate and glutamate metabolism, and D-glutamine and D-glutamate metabolism. Prevotella is also known to be important in multiple pathways involved in drug metabolism, carbohydrate metabolism, and metabolism of cofactors and vitamins, including vitamin B6 metabolism 30.

In conclusion, we have analysed a stool collection from children between the age of 7 and 40 months of the upper region of The Gambia with a high fiber diet; the largest African paediatric microbiome collection to date. Detailed analysis of the gut microbiome has provided insights into the development of a Prevotella-rich gut microbiome. We observed similar changes to those described in paediatric cohorts from industrialised countries, such as

prausnitzii. However, in addition we were able to describe the steady development of a trophic network centered around P. stercorea and the rapid rise to dominance of P. copri.
The high abundance of Prevotella in non-industrialized countries makes this genus the most discriminating taxon between a healthy high-fiber diet and an unhealthy low-fiber diet seen in industrialized countries. In industrialized countries, the increase of various diseases of affluence, like diabetes, is typically associated with the (partial) loss of the last trophic network (indicated by lower F. prausnitzii numbers) and an increase of Bacteroides and/or Proteobacteria. The parallel loss of P. copri and the P. stercorea trophic network, whose development is described here, indicates the likely importance of the metabolic pathways and capabilities of Prevotella and species associated with them, both for maintaining a higher fermentative capacity (SCFA production), a healthier gut microbiome environment and for keeping opportunistic pathogens at bay.

The study protocol and any subsequent amendments have been reviewed and approved by The Gambia Government/MRC Joint Ethics Committee (reference SCC1489). Clinical Trials