Population Strati cation in the Gut Microbiota of Bali is Associated with Transitional Lifestyle

Clarissa Asha Febinia Eijkman Institute for Molecular Biology https://orcid.org/0000-0002-6918-8529 Safarina G. Malik Eijkman Institute for Molecular Biology Ratna Djuwita Universitas Indonesia I Wayan Weta Universitas Udayana Fakultas Kedokteran Desak Made Wihandani Universitas Udayana Fakultas Kedokteran Rizka Maulida Universitas Indonesia Herawati Sudoyo Eijkman Institute for Molecular Biology Andrew J. Holmes (  andrew.holmes@sydney.edu.au ) University of Sydney https://orcid.org/0000-0002-7406-4387


Introduction
The epidemic of chronic immuno-metabolic diseases, such as obesity and diabetes, presents a major challenge for health systems around the world. Epidemiological studies show that the increased incidence has occurred in association with environmental changes due to modernization, including more sedentary lifestyle and nutritional changes. They further show that for many of these nutrition-related chronic diseases (NRCD), differences in the human gut microbiota are also associated with them. For many NRCD, mechanistic links between microbes and pathological processes have been identi ed in animal models. This complex aetiology is known as dysbiosis -an undesirable state of the body system in which microbiota factors participate in the pathophysiological processes of the host. Two inter-related public health questions thus arise. Do environmental changes drive the microbiota in ways that globally alter the risk of dysbioses within a population? If so, which dimensions of the environment are most signi cant?
Cross-sectional studies of human populations occupying distinct environments have repeatedly found signi cant differences in the human gut microbiota [1][2][3][4][5][6][7][8][9]. It is widely accepted that these differences are potentially relevant to human health, however, neither the environmental factors driving microbiota differences nor their relevance for public health is well understood. A reason for this uncertainty is that human populations vary in multiple ways, including genetic, cultural, and geographic factors, each of which can also interact over time. A good example is the human diet. An individual's food intake is shaped by both cultural and geographic factors that collectively de ne their nutrient environment.
The term nutrition transition has been used to describe the net process of a society moving from a foraging or subsistent agriculture lifestyle to a modern urbanized lifestyle with industrialized food supply chain [10]. Differences in the structure of the gut microbiotas are typically seen in people from societies at different points of the nutrition transition [1][2][3][4][5][6][7][8][9]. However, separating the effects of dietary change from other factors (e.g. socioeconomic status, healthcare, and ethnicity) is di cult. Available data can be roughly categorized into three groups: 1) studies which compared pre-and post-transition societies at a speci c point in time [1-7, 9, 11, 12]; 2) longitudinal study of demographic groups as they immigrate from a pre-transition to post-transition society [8]; and 3) experimental diet studies in individuals [13,14]. A recurrent nding in these cross-sectional studies is that some taxa, notably Prevotella and Bacteroides, are differentially represented in microbiotas of pre-and post-transition populations.
The clustering of human gut microbial communities by Prevotella and Bacteroides was rst reported by Arumugam et al. They coined the term enterotypes -which also included a third grouping based on Ruminococcus -and cautiously proposed that they would have applications in diagnostics [15]. A caveat in de ning enterotypes, which was raised in that study and has been elaborated since by multiple authors, is that it is a complex process in which their observation is a product of analytical methods that are dataset dependent [16,17]. Nevertheless, the Prevotella-type community is repeatedly found to be more prevalent in subsistent societies, and the Bacteroides-type in industrial societies. Such ndings have been used to suggest that diet drives these microbiota variations [5,14,17]; and differences in intake of dietary bre and animal source foods, in particular, have been postulated as key factors [5,11,13,14,18].
While numerous studies have shown that diet can induce a rapid change in the microbiota of individuals [13,14,19], evidence that it drives enterotype formation is lacking. Long-term diet patterns have been associated with enterotypes [14,[20][21][22], but other societal factors in uencing microbial dispersal and community assembly are also associated. These barriers include, but are not limited to, water supply, housing density, birth mode, child-rearing, and infection control in healthcare settings. Effects on dispersal would almost certainly intersect with diet changes to in uence microbiota assembly.
In this study, we aimed to explore the gut microbiota of a demographically narrow group of individuals, raised during a period of rapid economic and nutrition transition [23,24]. In this cohort of 41 young females from Bali, Indonesia, we found two community types in an approximately equal frequency.
Further, we explored the association of these community types with other aspects of microbial structural composition, diet, and obesity. Lastly, we look for patterns of community type distribution across different human populations by conducting a meta-analysis that incorporated publicly available data from 216 individuals from ve populations with distinct socio-cultural and geographical backgrounds. We showed that the Bali cohort has a distinct pattern of community type frequency and that it is not explained by contemporary diet.

Study design
This is a cross-sectional study of the gut microbiota of Bali individuals from Indonesia in comparison to individuals from other populations. The aim is to identify microbiota patterns associated with lifestyle transition. Particularly, we looked for microbiota features in pre-and post-transition societies; and further, explored the association of the differences with nutritional intake and obesity.

Enrolment of Bali individuals
A cross-sectional sampling was performed. Study enrolment for the Bali cohort was conducted on 19-23 January 2015, at the Faculty of Medicine, Udayana University, Denpasar, Bali Province, Indonesia. We recruited all available volunteers in the area at the time of enrolment. We received written and signed informed consent from all participants. A clinician declared the volunteers were not suffering from chronic or infectious diseases at the time of enrolment. The volunteers self-declared that they had not consumed antibiotics, underwent surgery, or became pregnant within 3 months before sample collection.
Anthropometry and diet data collection Age was determined by date of birth. Weight and height were measured without shoes to the nearest 0.5 cm and 0.1 kg, respectively. BMI was calculated as body weight by squared height (kg/m2). Dietary information was collected using 24-hour food recall and food frequency questionnaires. NutriSurvey software (2007 English version of EBISpro, Germany) was used for analysis of nutrient content. The analysis used the Indonesian food database developed by The Southeast Asian Ministers of Education Organization -Regional Tropical Medicine and Public Health Network (SEAMEO-TROPMED) and the Deutsche Gesellschaft für Technische Zusammenarbeit (GTZ) GmbH Federal Republic of Germany [25].

Stool samples
Faecal samples were self-collected by the volunteers in a pre-sterilized pot. Within 4 hours of defecation, the samples were put in -20 o C freezer for 8-48 hours. Frozen samples were packed into aero-thermal insulated containers with freezer packs and delivered to the Eijkman Institute for Molecular Biology in Jakarta within 5 hours and stored in -20 o C freezer on arrival. Within 48 hours of delivery, we extracted metagenomic DNA using the PowerSoil DNA Extraction Kit (MOBIO Laboratories Inc., USA, Catalogue No. 12888-100) following the manufacturer's protocol, but sample homogenization was done for 15 minutes using Vortex Genie-II (MOBIO Laboratories Inc., USA). DNA quality was assessed using NanoDrop® Spectrophotometer (Thermo Fisher Scienti c) and 1% agarose gel electrophoresis. Extractions for low quality DNA samples were repeated. Good quality DNA samples were stored in -20 o C before being sent for sequencing at the Ramaciotti Centre for Genomics at the University of New South Wales, Australia, in May 2015.

Sequence processing
We sequenced the variable region 4 (V4) of 16S rRNA gene from the extracted metagenome, using 515F/806R sequencing primers, as previously described [26]. Paired-end sequence reads were merged in silico using the fastq-join tool (parameters: minimum overlap of 8 bp and maximum difference percentage of 10%) [27]. Merged sequences that contained low-quality reads (Q < 30) were truncated. We selected reads with lengths between 230 bp and 255 bp. Sequence data was imported into Quantitative Insights Into Microbial Ecology (QIIME) version 1.9.1 [28]. Chimera sequences were removed using uchime [29]. Operational Taxonomic Units (OTUs) were picked at 97% sequence similarity according to uclust method. Reads that did not align to the greengenes reference database (version 13.8) were retained as de novo OTUs [30]. Singleton OTUs were removed. We assigned taxonomy using Ribosomal Database Project Naïve Bayesian Classi er (RDP Classi er) at 80% similarity to Greengenes (version 13.8) reference dataset [31].

Diversity analysis
Microbial alpha-diversity was computed using Faith's Phylogenetic Diversity [32], Chao1, and Shannon index. Distances between samples based on the microbiota (beta diversity) were calculated as weighted and unweighted UniFrac distance [33] and analysed using Principal Coordinate Analysis (PCoA) [34]. We performed a clustering analysis on the Bali dataset according to Ward method on the weighted UniFrac distance [35].

Co-Abundance Groups (CAG) analysis
To identify ne-scale taxonomy patterns in the Bali dataset, we performed co-abundance network analysis using genus-level data at the sequencing depth of 90,000 sequences per sample. Only genera with a presence in more than 30% of sample size (12 samples) were used. We assessed pairwise relationships of these genera using Kendall correlation test (t). Positive Kendall scores (0 < t < 1) were taken as co-existing relationships, while negative scores (-1 < t < 0) were taken as contra-existing relationships. The scores were gathered into a matrix of t-scores, mathematically inversed (1 -t), and put through a hierarchical clustering analysis according to Ward's method [10]. We performed empirical clustering con gurations using k = 2 to k = 12 and tested the cluster distinction in these con gurations using Analysis of Similarity (ANOSIM) with 10,000 permutations. The highest con guration which passed the test (p < 0.05) was determined as co-abundance groups (CAG).

Comparative dataset
We downloaded merged and ltered V4 pair-ended sequence of 16S rRNA gene from the Metagenomics RAST server (MG-RAST) public repository [36]. Sample selection used the following inclusion criteria: study design (observational), sample type (stool), analysis region of the 16S rRNA gene (V4), and age group (adults aged 18-60). We obtained samples from Hadza hunter-gatherers, Malawians, Guahibo Amerindians, Italians, and Americans (Project No. mgp401 and mgp7058). Metadata for the reference dataset, including data on geography, age, and sex, were obtained from their original studies [2,3]. OTUs were assigned de novo at 97% sequence similarity. OTUs with < 0.005% presence within a population were removed. After the taxonomy assignment, the OTUs were grouped at species-level (Greengenes Level 7). The resulting table was rare ed to 4,000 sequences per sample and used for calculating intersample weighted UniFrac distance. The Greengenes reference sequence (version 13.8) served as the base for calculating phylogenetic relationships between the species-level OTUs.

Data analyses
All tests and gures were generated in R Project for Statistical Computing version 3.4.1 (www.rstudio.com). Images were generated in RStudio and compiled in Inkscape version 0.48. The normality of data distributions was tested with Shapiro-Wilk test. Differences in continuous variables between sample groups were tested with Student's T test if the distribution was normal, or Wilcoxon's Rank-Sum test if the distribution was skewed. When multiple comparisons were performed, p-values were corrected using the False Discovery Rate (FDR) method [37]. Signi cant correlations are determined as having a p < 0.05, or q < 0.05 if FDR adjustments were performed. Beta diversity data was visualised using Principal Coordinate Analysis (PCoA). Comparison between groups in weighted UniFrac data used ANOSIM. Associations between bacterial genus and beta diversity were analysed using vector regression tting and Permutational Multivariate Analysis of Variance (PERMANOVA) [38]. The projection of Prevotella and Bacteroides onto PCoA used Generalized Additive Model (GAM). Pairwise correlation relationships were measured using the Kendall Rank-Correlation test. Associations between CAGs, BMI, and macronutrient intake were analysed using univariate and multivariate rank-based regression model; and to assess the robustness of multivariate models, we used the reduction in dispersion test as previously described [39]. Models that passed the test (p < 0.05) are considered signi cant.

Demographics and diet of the Bali cohort
We recruited 47 female individuals between 18 to 27 years old in Denpasar City, Indonesia. Faecal samples were obtained from 41 individuals, of whom 36 identi ed themselves as ethnically Balinese, while the other four individuals identi ed as Batak Toba, Javanese, Chinese descend, and a Javanese-Balinese admixture. Four individuals were not born in the area but have lived in Bali for at least 2 years and they were born in cities (Jakarta, Bekasi, and Pangururan). There were six (14%) obese individuals with body mass index (BMI) over 30 kg/m 2 (Table 1).
Food recall and food frequency questionnaires revealed limited dietary variation. Energy intake was low (Mean ± SD = 1,359 ± 599 kcal daily), but all lay within the Acceptable Macronutrient Distribution Ranges (AMDR). All individuals had rice as the staple source of energy. Rice is also the primary source of plantbased protein; followed by tofu and tempeh (Additional File 1: Fig. S1). Meat consumption was generally once or twice daily. Most individuals had either chicken, pork, or sh. Beef intake varied signi cantly, with 6 out of 10 people abstained from it. Three individuals were vegan.

Microbiota of the Bali cohort
We produced over 8.5 Gb pair-ended reads of the 16S rRNA gene V4 region from 41 faecal metagenome extract, as previously described [26]. Quality and chimera ltering of these sequences yielded 6,655,198 pair-joined reads, between 110,358 and 228,531 reads per sample. Clustering at 97% sequence similarity yielded 3,167 non-singleton de novo Operational Taxonomic Units (OTUs). The OTUs were assigned to 13 Phyla, 55 Families, and 99 Genera according to greengenes reference data (version 3.18).
Using alpha diversity metrics as indicators of how community structure variation across the data set, Shannon entropy showed a range of values from 5 to 8 that were stable with sampling depth (Fig. 1a).
Other metrics including richness (Chao1) and phylogenetic distance (Faith's PD) continued to increase as the sequencing depth increased (Additional File: Fig. S2). We normalized the data to 90,000 sequences per sample in further analyses.
Analyses of between-subject diversity revealed a robust separation of the microbiota samples into two groups based on community structure (Additional File1: Fig. S3). There was no microbiota distinction between individuals who are ethnically Balinese and otherwise. The separation into two clusters was strongly associated with relative abundances of the genera Prevotella and Bacteroides (Fig. 1b). The pattern is also observable in linkage analysis of relative abundances at the Family level (Fig. 2). We consider these groups similar to the previously reported Prevotella and Bacteroides enterotypes by Arumugam et al. [15]. Henceforth, we refer to these clusters of Bali microbiotas as Type-P and Type-B communities as recommended by Costea et al. [17]. There was a comparable number of Type-P (n = 18) and Type-B (n = 23) individuals. We con rmed that relative abundances of Prevotella and Bacteroides were signi cantly different (p < 0.001; Wilcoxon test) between groups categorized by community type (Additional File 1: Fig. S4).
To test for other community dimensions that may correlate with Type-P and Type-B individuals, we performed univariate vector regression analyses on genus-level abundance data (comprised of 80 most common OTUs) against the inter-sample weighted UniFrac distance matrix. The result identi ed eight OTUs as signi cant drivers of differences: Bacteroides, Bilophila, Haemophilus, an unclassi ed Mogibacteriaceae OTU, Prevotella, an unclassi ed Prevotellaceae OTU, an unclassi ed Rikenellaceae OTU, and Sutterella (all p < 0.001, Additional File 1: Table S1). To quantify the strength of the association in a multi-variable setting, we performed PERMANOVA (Additional File 1: Table S2). We found that compared to other OTUs, the association with Prevotella was the strongest (R 2 = 0.33, p < 0.0001), followed by Bacteroides (R 2 = 0.16, p < 0.0001). These results collectively suggest that although multiple dimensions of community structure are correlated with the categorization as Type-P or Type-B, Prevotella and Bacteroides correlate the strongest.
Individuals with Type-P communities had lower diversity Diversity has widely been reported as a measure relevant to 'microbiota health'. Although many studies have found signi cant associations between microbiota diversity and health metrics, few patterns that are consistent across disparate human populations have emerged. The ecological signi cance of alpha diversity metrics is through their insight into the partitioning of ecological space between distinct taxa in a single community.
Here we found a consistent association between community type classi cation and alpha diversity metrics, whereby Type-P individuals were signi cantly lower in microbial diversity (Shannon) compared to lean Type-B individuals (p < 0.001, Wilcoxon test, Fig. 1c). Similar results were also seen using Faith's PD, Chao1, and Berger-Parker diversity metrics, albeit at lower statistical support (all p < 0.05, Wilcoxon test, Additional File 1: Fig. S5).
We also tested the relationship between obesity and diversity. Of the six obese individuals in our dataset, ve were observed with Type-P microbiota and only one with Type-B (Fig. 1b). Given the effect of community type on diversity, this small number limits the ability to robustly test the comparison.
Nonetheless, the Type-P individuals obese (n = 5) group had signi cantly lower Shannon diversity compared to the Type-P lean (n = 13) group (p < 0.05, Wilcoxon test). However, the difference was not statistically signi cant when measured using Faith's PD, Chao1, and Berger-Parker metrics (p > 0.05, Wilcoxon test).
Additionally, we observed a lower phylum-level richness in obese individuals regardless of community types (Additional File 1: Fig. S6). Yet, there was no signi cant difference in abundance for Firmicutes, Bacteroidetes, Proteobacteria, and Actinobacteria between the obese (n = 5) and lean (n = 35) group (p > 0.05, Wilcoxon test). Neither do we nd a signi cant difference in the Firmicutes to Bacteroidetes ratio of these individuals (p > 0.05, Wilcoxon test). At the genus-level, 13 taxa were different by obesity (p < 0.05, Wilcoxon test), but these differences were not signi cant after adjustment for multiple comparisons (Additional File 1: Table S3).
Type-P and Type-B microbiotas may re ect distinct assembly processes The Bali microbiotas were further explored for co-abundance relationships, which identi ed ten distinct genus-level co-abundance groups referred to as CAG1 through CAG10 (Fig. 3a, Additional File 1: Fig. S7). Both Bacteroides and Prevotella showed signi cant positive correlations (Kendall t > 0.5) to multiple other taxa forming CAG1 and CAG6, respectively. Using the net relative abundance of the CAGs (determined by summing the abundances of all CAG members in a sample), we observed that the CAGs containing Prevotella (CAG6) and Bacteroides (CAG1) had the highest summed abundances (Fig. 3a). As expected, their distribution across individuals re ected the Type-P and Type-B classi cation (Fig. 3b). Notably, these two CAGs and three others (CAG3, CAG4, and CAG7) were signi cantly different between the two community types (Fig. 3c). These ve groups account for variations in 44 genus-level OTUs, although only 15 OTUs were independently different between Type-P and Type-B group after adjustment for multiple comparisons (Additional File 1: Table S4).
These results collectively suggest that the community type concept re ects fundamental differences in the outcomes of community assembly processes, and it is not simply a product of variance in the two most abundant genera (Prevotella and Bacteroides). Nevertheless, the method to consistently de ne community types is challenging; and we note that if CAG pro les were taken as the main criterion for community type classi cation (rather than the clustering approach used here), then there were 11 individuals in which neither CAG1 nor CAG6 was dominant (i.e. would not classify as Type P or B). For instance, three individuals we categorized as Type-P (BA013, BA028, and BA021) were outliers from the other Type-P's because of their lower CAG6 abundance (< 20%); and one individual we categorized as Type-P (BA019) is potentially a false-negative in that CAG6 had higher abundance than CAG1 (Fig. 3b). We also observed that in four Type-B individuals (BA002, BA006, BA015, and BA027) CAG4 was strongly present (Fig. 3b). The CAG4 is dominated by Ruminococcus, which has been proposed to represent a third enterotype [15,17].
Through multivariate analyses, we found that CAG9 did not show signi cant associations with diet and community types (Additional File 1: Fig. 8); but it did show a signi cant association with obesity (Estimate = 0.52, R 2 = 0.30, p < 0.05). CAG9 association with increased BMI is retained in the all-inclusive model (Estimate = 0.49, R 2 = 0.3, p < 0.05). Across all variations of regression models, we only found an association between higher macronutrient intake with lower CAG1 abundance, but the relationships were linked to community types and weaker in predictive value (Additional File 1: Fig. 8). In a univariate analysis, there was a trend for lower protein intake in Type-P individuals, but the effect was weak, and the relationship is not signi cant (Estimates= -2, R 2 = 0.85, p > 0.05). These results indicate that associations between diet and community types are negligible in our data, but associations between CAGs and physiological states (e.g. obesity) may exist.

Prevotella and Bacteroides gradient across populations
To look for similar patterns in community type distribution in different human populations, we conducted a meta-analysis with data from ve other populations which have different ethnogeographic background. In addition to the 41 Bali, the data comprised of 22 Hadza hunter-gatherers, 21 Malawians, 28 Guahibo Amerindians, 16 Italians, and 129 Americans. The total number of samples is 257 individuals with 413,126,162 sequences -in which we observe 374 assigned taxonomic units (ATU) at species-level (Additional File 1: Table S5).
The result showed a separation of samples by demographic groups (Additional File 1: Fig. S9a). Interestingly, despite their narrow demographic structure (age, gender, and residency), the Bali microbiotas were a highly dispersed group in PCoA (Additional File 1: Fig. S9a). The pairwise inter-individual distance of the Bali was signi cantly higher than Malawi, Hadza, Italian, and USA (Additional File 1: Fig. S9b).
A known caveat for enterotypes is that they are de ned with respect to the dataset that they are part of. Thus, categorizations in different studies are not directly comparable. To address this, we looked at the distribution of Prevotella and Bacteroides across all populations as a proxy indicator for Type-P and Type-B communities. We tested the applicability of this using Generalized Additive Modelling (GAM), in which the relative abundances of Prevotella and Bacteroides were mapped onto the PCoA data of weighted UniFrac distances ( Fig. 4a-b). The rst two principal coordinate axes account for 49.1% of the variance in the data (at 36.2% and 12.9% for PCo Axis 1 and 2, respectively). The results showed a strong t for Prevotella and Bacteroides abundance to PCo Axis 1 and 2, respectively (adjusted R 2 > 0.9, p < 0.001, Additional File 1: Table S6-S7).
When the abundance distribution of Prevotella and Bacteroides was assessed by population, we found that communities with a dominance of Prevotella over Bacteroides were strongly over-represented in the three pre-transition societies (Hadza, Malawian, and Guahibo Amerindian) and the opposite for Americans and Italians (Fig. 4c). Consequently, the mean abundance of these genera was signi cantly different at a population level (Additional File 1: Fig. S10). The Bali cohort was distinctive in that Prevotella or Bacteroides dominance occurred in similar frequency in the population, and thus the mean abundances were similar.

Discussion
Nutrition-related chronic diseases are widely considered to be examples of dysbiosis in which their incidence is strongly shaped by socio-economic in uences on the human diet. In this study, we aim to identify phenomena associated with nutrition transition that may drive microbiota patterns at the level of society. Particularly, we looked for microbiota features distinguishing pre-transition societies from those of post-transition; and further, to explore the dimensions of the environment which are most likely to have driven the difference. We postulated that features discriminating pre-and post-transition should show intermediate values in samples of societies actively undergoing transition. If true, then sampling such active-transition societies is predicted to provide a more useful dynamic response range for the identi cation of environmental drivers.
To test our hypotheses, we focussed on the concept of microbial enterotypes. It has been proposed that categorizing microbiotas into subtypes based on dominant features of their community structure (the enterotype concept) is a potential discriminator between pre-and post-transition societies [17]. Here, we tested this proposition, and its potential correlation to diet and disease, in a cohort of young female from Bali which represents a narrow demographic group from a society that has only very recently undergone transition [23,24,40,41]. The analysis of diet and microbiota characteristics in this distinctive cohort, integrated with a meta-analysis of representative human microbiota datasets, provides new insights into potential drivers of enterotypes and their relevance to public health application.
The enterotypes concept was rst proposed by Arumugam et al (2011) who coined the term to distinguish three clusters found in a multi-country human gut microbiota dataset based on Jensen-Shannon distance [15]. Similar patterns have since been reported in numerous other studies [5,14,16,17]. However, there has been much controversy over how to use the concept of enterotypes as a tool to simplify the description of patterns in the human microbiota. It is evident that enterotypes are not intrinsically discrete biological entities; rather their observation is a property of the methods used to visualize beta diversity in the data [16,17]. Consequently, a robust de nition of enterotypes for the purpose of assessing their representation across populations in meta-analyses is very challenging. In a recent perspective article, Costea et al. proposed a uni ed nomenclature and guidelines for categorization aimed at facilitating the use of the concept for diverse datasets [17]. They proposed the terms ET-P, ET-B, and ET-F for referring to clusters where the attractor for that cluster is the taxon Genus:Prevotella, Genus:Bacteroides, or Class:Firmicutes. In this study, we have largely adhered to those guidelines, although taken a distinct approach to assessing the application of the concept in our meta-analysis using generalized additive models of marker taxa to support the ndings.
In our beta diversity analyses of the Bali cohort, we found strong support for segregation into two clusters, which we refer to as Type-P and Type-B in accord with the proposed enterotype nomenclature. We used rigorous statistical tests to scrutinize these groupings, which showed that Prevotella and Bacteroides were indeed the main drivers of the clustering (although not the sole driver). These two genera have been consistently reported to differ between industrialized and subsistent societies [1,5,7,11,[15][16][17]42]. In our network analysis, co-abundance patterns were generally consistent with the cluster analysis -Prevotella or Bacteroides as the 'hub genus' were signi cantly over-represented in individuals categorised to the designated community type. However, there were samples in which neither Prevotella nor Bacteroides are dominant. Although we did not see a third cluster in the Bali cohort, we note that four individuals in the Type-B cluster were dominated by CAG4, which includes various genera typically associated with the proposed third enterotype (ET-F). Our data are thus consistent with the existence of three similar types of community structure, but their visualization through clustering methods was not possible in the Bali dataset.
We then tested the proposition that the enterotypes would discriminate pre-and post-transition societies; and furthermore, that metrics based on these community types would be related to the stage of nutrition transition in that society. A challenge for this meta-analysis is that both the enterotype assignment and the determination of CAGs share a similar limitation in that they are products of a data-dependent clustering approach. As such, they are not suitable for a meta-analysis across multiple datasets. As a result, we did not see clusters that would permit facile classi cation into two or three community types in our analyses of the combined dataset, nor was this found in a similar study by Gorvitoskaia et al. [5]. As an alternative, we used the hub taxa, Prevotella and Bacteroides, to quantify Bali's position in the multicountry dataset and projected their relative abundance data onto the ordination using regression models. The result showed that Prevotella and Bacteroides abundances were strongly correlated to major differences in microbiotas across six populations.
Our data also showed that Prevotella and Bacteroides abundance in Bali represented an intermediate value if compared to pre-and post-transition societies. Our ndings are consistent with our hypothesis that a nutritional transition is associated with a gradient of diverging communities states (measured here as enterotype and hub taxon incidence) -whereby the Bali cohort broadly occupied an intermediate position between pre-and post-transition societies in the multi-country dataset. These ndings suggest that Bali is undergoing a shift in the meta-microbiota of their society. A plausible explanation for this is that recent industrialization of the food system have driven changes in microbial exposure and diet [23,24,40,41], both of which are known to in uence community assembly.
It has previously been found that enterotypes correlate to diet, including choices between animal or plantbased food [13,14,22]. Due to its association with nutrition, enterotypes has also been associated with increased incidence of cardiometabolic disorders (e.g. obesity, cholesterol) [17,[43][44][45][46]. To investigate the extent to which nutrition may alter the microbiota and affect obesity incidence, we employed various combinations of multivariate regression models on the Bali dataset. Our ndings in these explorative analyses showed a striking lack of signi cant correlation between CAGs or community type to contemporary diet. Neither do we nd associations with carbohydrate and bre source (primarily rice). We found only a weak association between Bacteroides enrichment and higher protein intake (predominantly meat in the Bali dataset).
The lack of signi cant patterns may be due to the limited variety of food items in the Bali cohort, and thus these ndings would require veri cation in a larger sample set. Nevertheless, our data do not support the idea that patterns of community type (or marker taxa) within a society will be re ected by short-term diet observations. This does not exclude effects of long-term diet as a driver microbiota differences across populations. It is possible that different patterns of Prevotella and Bacteroides relative abundance between subsistence and industrialized societies may re ect different levels of bre intake, animal-based food intake, and fat-sugar-enriched processed food products [5,13,14], but the drivers of this difference are likely to be more complex than short-term diet habits.
Although our Bali cohort is all young females, all of them were raised during the period of rapid lifestyle transition. Given that: 1) the timescale over which diet in uences the observation of enterotype-like patterns, or the time point in developmental history when the patterns become xed, is not yet clear; 2) enterotypes can become stable in early life; and 3) trans-generational microbe exposure in uences microbiota diversity; it is possible that differences in their early development may have driven the striking divergence in microbiota pattern. Our data are consistent with the Thai US immigration study, which reported that living in an industrialised society induced a replacement of Prevotella with Bacteroides in the human gut and that it progressed across generations [8]. Recent studies in the Central African Republic (BaAka Pygmies and Bantu populations) and Nepal (Himalayan populations), also showed similar transitional states in the microbiota between rural and urban populations [6,9]. Collectively, these data indicate that changes in human socio-economic conditions over time are re ected in the frequency distribution of Prevotella and Bacteroides across human societies.
Claims for microbiota association with nutrition-related disease began with obesity [44,47]. Although the small size of the obese cohort in this study limits our ability to detect robust differences, our ndings are instructive with regards to the use of simplifying metrics in health associations. Five of the six obese Bali individuals in our study were identi ed with a Type-P community which had lower microbial diversity. But despite this difference in diversity, it is unclear whether the difference is exclusively due to obesity or because the trait is linked to Type-P community. These ndings present a contrast to the prevailing view that microbiotas commonly found in pre-industrial societies, typi ed by Type-P (or ET-P), have higher diversity and lower risk for NRCD [17,[43][44][45][46]. However, we point out that the apparent associations of diversity and obesity with Type-P here could be a product of the categorization into community types.
Those Type-P individuals classi ed as obese in our study were distinguished by higher abundance of CAG9 and it was only this CAG that showed any statistical support in multi-variate models.
Costea et al. have proposed that one means of circumventing the issue in enterotype categorization is by enterotype assignment through relation to a reference dataset [17]. In our view, this approach has potential merit, but it does not address the main limitation of the method. The concept of enterotyping (at least as it has so far been promulgated) is essentially simplifying community structure to one dominant dimension. Therefore, its application to predicting the in uence of the microbiota in individuals (such as disease risk or treatment outcome) will be dependent on the assumption that a signi cant fraction of relevant in uence of the microbiota for that outcome is captured in the set of one-dimensional categories.
Even in our small data set, it is clear that relying solely on enterotype classi cations obscures the multidimensional nature of microbiota contribution to health outcomes [16]. However, our data also show that enterotype markers do have a strong predictive value for some population-level characteristics. We propose that the concept of enterotypes has very limited value in predicting properties of an individual, but that it is useful in predicting properties of a demographic group and thus enterotypes may inform study design.
Together with other ndings, our data highlighted the biological signi cance of enterotypes, particularly their implications toward disease association in future studies. Failure to address population heterogeneity between experimental groups may lead to false discoveries of disease biomarkers, particularly in societies undergoing socio-economic change. As shown in this study, the presence of two distinct community types in the Bali microbiota is a signi cant confounder for identifying obesity markers. We anticipate that microbiota disease association studies will be more robust if consideration is given to the baseline distribution of enterotypes (or proxy taxa) -since the enterotype is predicted to re ect extrinsic and intrinsic factors in uencing community composition in the sampled population. In terms of human societies, these factors can include, but not limited to, ethnicity, cultural restrictions, early-life development, nutritional choices, other demographic factors, disease co-factors, pre-existing community composition, and inter-species relationships within the microbiota.
It has long been recognized that clinical trials require treatment groups to be matched for confounding factors such as age, gender, and socioeconomic status. An emerging model is that the gut microbiota exhibits the property of multi-stability; but in the stable states adopted, a large proportion of microbiota variance that occurs across human societies can be explained by the nutrient environment and interspecies interactions in uencing the assembly. Whilst acknowledging the caveats described above, we propose that the enterotype concept can potentially underpin the development of useful tools to facilitate validation of life-history matched cohorts in studies of NRCD. This may be achievable through relatively simple proxy measures of enterotyping, such as abundance-ubiquity relationships of Prevotella in study cohorts; or simple indices for Prevotella and Bacteroides [5,8]. Such microbiota-matched cohorts could inform the design of clinical trials and ultimately development of precision medicine strategies for obesity and comorbid diseases.

Conclusions
Bali is a society that has recently undergone transitions in their socioeconomic and nutritional landscape.
We analysed the gut microbiota in 41 Bali individuals and found strong segregation into enterotype-like clusters that we refer to as Community Type-P (Prevotella-dominant) and Type-B (Bacteroides-dominant).
Our meta-analysis incorporating data from 5 other distinct populations further showed that this segregation was strongly associated with differences in pre-and post-industrial populations, indicating that Bali is undergoing a shift in the meta-microbiota along with the changes in the society. Further analyses with diet and obesity data showed that the presence of two distinct community types in Bali is a signi cant confounder for identifying health markers, and thus enterotypes has limited value in predicting properties of an individual. Collectively, however, our ndings suggest that microbiota clusters are much more useful in predicting population-level strati cation that may inform demographic heterogeneity in future studies.

Consent for publication
This manuscript contains de-identi ed individual data. We have received written consent for publication of said data from all study participants.

Availability of data and materials
The Bali dataset supporting the conclusions of this article is available in the European Nucleotide Archive repository, project code PRJEB32385. Datasets for other populations are available in the MG-RAST repository, project number mgp401 and mgp7058. The authors declare that all data supporting the conclusions of this article are included within the paper and its additional les. Full accounts of the R scripts used for statistical analyses and data visualisation is available in Additional File 2.