Long-term life history predicts current elderly gut microbiome


 Extensive scientific and clinical microbiome studies have explored contemporary variation and dynamics of the gut microbiome in human health and disease1–4, yet the role of long-term life-history effects is underinvestigated. Here, we analyzed the current microbiome composition in the elderly Bruneck Study cohort (n = 304; age 65–98) with extensive clinical, demographic, lifestyle, and nutritional data collected over the past 26 years. Combined analysis with the Flemish Gut Flora Project cohort (FGFP; n = 2,215; age 18–85) showed community richness increasing during aging linked to increased observation of low-abundance bacteria. Multivariate analysis of historical variables indicated that medication history, historical physical activity, past dietary habits, and specific past laboratory parameters explain a significant fraction of current elderly quantitative microbiome variation, enlarging the explanatory power of contemporary covariates by 33.4%. Prediction of current enterotype by past host variables revealed good levels of predictability (AUC > 0.7) for the Prevotella and dysbiotic Bacteroides 2 enterotypes with information from up to 15 years past. These findings demonstrate long-term life history effects on the microbiota and provide first insights into lifestyle variables and their role in maintaining a healthy gut microbiota in later life.

Microbiota diversity change with age is a debated subject. Prior assessments of alpha-diversity during aging gave con icting results (with both positive 8,9 and negative 10,11 associations) attributed to confounding by residence type 9 and/or frailty 12 . In our combined cohort (age range 18-98, combined n=2,519; relative microbial pro ling (RMP)), we observed a signi cant positive correlation between age and microbial richness (Spearman rho range 0.12 to 0.18, p < 0.05; Figure 1a). As only three subjects were in nursing homes in the combined cohort, the positive correlation observed here is likely independent of residence type. Further analyses of Bray-Curtis dissimilarities found elderly (age≥65) community composition signi cantly different from young adults (18≤age<30) and adults (30≤age<65; pairwise Adonis R 2 [0.003:0.01], FDR = 0.001; Extended Data Fig. 1a and Supplementary Table 1). Taxa increasing with age were mostly from the rare tail of the abundance curve (Extended Data Fig. 1b and Supplementary Tables 2, 3). On the contrary, top-ranked core taxa such as Faecalibacterium, Roseburia, and Anaerostipes were negatively correlated with age in elderly (Spearman rho [-0.28:-0.27], adjusted for gender, BMI, stool moisture, and antibiotics intake, FDR<0.1; Figure 1b and Extended Data Fig. 1c, d). Reduction of these major butyrate-producers might suggest unfavorable metabolic changes in the elderly gut.
We next studied whether the distinct elderly gut microbiome could be due to cumulative exposures to various perturbations. Prior to the determination of long-term life-history effects on the elderly microbiome, we quantitatively investigated contemporary microbial community covariates in the Bruneck Study cohort based on a db-RDA analysis after removing collinear variables (as in 1 ; Supplementary Table  4). Highly ranked contemporary covariates linked with microbial community composition were qualitative and quantitative proxies of transit time like current stool moisture, defecation frequency, hard stools, and obstipation (db-RDA, adjusted R 2 [1.5:2.4%], FDR < 0.1; Figure 2b, and Supplementary Table 5a). These variables were consistently found associated with microbial community composition in different age groups (Extended Data Fig. 2a), and previously identi ed as top covariates in the FGFP cohort 1 . In total, we observed a 7.0% non-redundant cumulative effect size from 11 signi cant contemporary covariates (db-RDA, FDR < 0.1; Supplementary Table 5a) using a forward stepwise redundancy analysis (Supplementary Table 5b), in line with estimates in adult populations 1 . Analysis of combined effect size of signi cant covariates per category revealed that current health parameters (e.g. liver stiffness) and bowel habits (stool moisture, defecations, and hard stools) had similar (redundant) cumulative explanatory power for elderly community variation (12.4 to 12.8% of variation; Supplementary Table 6a). Total additive effect size including further categories (medication, diet, anthropometric feature, and lifestyle) was 16.6%.
Next, we sought to assess the in uence of the extensive array of historical parameters collected during prior Bruneck Study evaluations (1990-2016) on the current microbiome composition. We rst performed a db-RDA as above using the historical parameters of each year as explanatory variables (Supplementary  Table 7a Finally, we combined all signi cant contemporary and historical features (Supplementary Tables 5a, 7a, 8a, and 8b), in one comprehensive db-RDA analysis and found a nal set of 14 variables signi cantly explaining the current microbiome variation, of which three were historical ones from 2010 (gammaglutamyl transferase (GGT), caloric intake, cereal ber score, one was average dumpling intake , and four were duration of beta blocker treatment (1990-2016), long-term changes in non-sport physical activity (2005-2016), hemoglobin (1990-2016), and alanine transaminase (ALT; 2005-2016). All together, they signi cantly increased the nal cumulative non-redundant effect size to 10.4% (likelihood ratio test p < 0.05; Figure 2a and Supplementary Table 8d) and total additive effect size to 25.5% ( Figure  2c and Supplementary Table 6c). Overall, this shows that the inclusion of historical data resulted in a 33.4% increase in non-redundant explanatory power for global microbiota variation.
We further deepened the relationship of these historical variables with the current microbiome by focusing on current absolute taxonomic group abundances as well as community enterotype based on Dirichlet Multinomial Modeling-based clustering previously validated across multiple cohorts [13][14][15] .
Previous studies detected four enterotypes 7 , dominated by either Bacteroides (B1 and B2 enterotypes, with B2 having a.o. lower microbial load and abundance of Faecalibacterium compared to B1) 16 , Prevotella (P), and Ruminococcaceae (R). All four enterotypes were present in the Bruneck cohort. Using the signi cant historical variables identi ed in the above RDA analyses (db-RDA, FDR<0.1; Figure 2a and Supplementary Table 8d), we rst analyzed beta blocker treatment in association with community diversity. By dividing subjects into three groups (chronic (treatment of beta blocker both in 1990 and 2016), current, and none (not medicated in 1990 and 2016), we found that beta blocker treatment was linked to a signi cant compositional shift (beta-diversity; Adonis r 2 = 0.013, p < 0.001; Extended Data   Fig. 2e). Further analysis of speci c taxonomic associations identi ed a list of bacteria more abundant in subjects who did not use beta blockers, which can be potential targets for remediation strategies (Generalized linear model (GLM), FDR<0.1, adjusted for age and stool moisture; Supplementary Table 10) if future studies con rm a causal link for this association.
Analysis of average dumpling intake (1995-2016), a historical covariate with the second-largest effect size, showed that P enterotype has had higher dumpling intake than B1 enterotype (Kruskal-Wallis test, FDR < 0.1; Extended Data Fig. 2f and Supplementary Table 11). This could be explained by the association of the P enterotype with long-term dietary patterns 15 and perhaps impact of traditional dietary intake or -lifestyle on the current microbiome given that they are a key component of traditional food in the Bruneck region.
We next looked at change in non-sport physical activity between the year 2005 and 2016. We rst identi ed taxa associated with both physical activity shifts (i.e., the change from the past to the present) as well as current levels of physical activity. Although no genera were associated with both variables, butyrate-producing bacteria (i.e., Roseburia, Faecalibacterium, and Butyricicoccus) signi cantly increased with long-term physical activity (Spearman rho [0.18:0.21], FDR<0.1, adjusted for age and stool moisture; Supplementary Tables 12-13). The positive in uence of exercise on gut health has gained recent attention with elevated abundances of Roseburia and Faecalibacterium reported in t individuals and those who perform regular exercise 17-20 . In order to study effects of changing physical activity, we clustered subjects into four categories: high activity in the past and at present (cluster 1), high in the past and low at present (cluster 2), low in the past and high at present (cluster 3), low in the past and at present (cluster 4). Interestingly, subjects who have recently increased physical activity as well as those who have consistently maintained high activity exhibited reduced ratio of (dysbiotic) B2 to non-B2 enterotypes. This suggests a bene cial role of physical activity for the healthy elderly gut ecosystem (pairwise Chi-Square test, FDR < 0.1; Figure 2e and Supplementary Table 14a).
Finally, we studied changes in hemoglobin between 1990 and 2016. Analysis of taxonomic association with both current hemoglobin as well as changes showed that another butyrate-producing bacterial genus, Coprococcus, was signi cantly associated with high levels of current hemoglobin as well as hemoglobin increase over time (Spearman rho [0.15:0.21], FDR<0.1, adjusted for age and stool moisture; Figure 2f and Supplementary Table 15-16). The oxygenation-dependent metabolic state of colonocytes was found associated to gut butyrate producers, also linked to outgrowth of opportunistic aerobic species 21 . Therefore, our results suggest that not only current but also historical adequate levels of hemoglobin and oxygen could be of value in retaining a healthy microbiome in later life. Using the previous clustering approach, the B2/non-B2 ratio was signi cantly higher in hemoglobin cluster 2 (highto-low) compared to cluster 4 (low-to-low) (pairwise Chi-Square test, FDR < 0.1; Figure 2g and Supplementary Table 14b) implying that rather a drop of hemoglobin is associated with dysbiosis than having consistently low levels throughout the years. Higher abundance of Coprococcus (butyrate producer) in cluster 4 versus cluster 2 is in line with this nding (GLM, FDR < 0.1, adjusted for age and stool moisture; Supplementary Table 17). As an accurate re ection of anemia, low hemoglobin levels can be a feature of recent major bleedings or surgery or of in ammatory, infectious, or neoplastic disorder 22 . Therefore, a drop in hemoglobin levels could be an indication of many conditions as well as poor health. Analysis of changes in ALT between 2005 and 2016 using the same approaches as above showed that only the current ALT levels were signi cantly associated with Methanobrevibacter but not with the enterotype ratio (Spearman rho = -0.18, FDR<0.1, adjusted for age and stool moisture; Supplementary  Fig. 2g).
Next, we estimated the predictive power of past lifestyle and physiological parameters on current microbiome composition. We investigated both effect sizes of single markers and predictive potential of more complex machine learning-based models. To explore single markers, we rst investigated long-term predictability by focusing on the three signi cant individual historical variables from the year 2010 (db-RDA, FDR < 0.1; Figure 2a and Supplementary Table 8d) and enterotypes, but no ndings emerged (Kruskal-Wallis, p>0.05; Supplementary Table 19). The lack of statistical signi cance for these variables implied a limited predictive power by single parameters. Therefore, we sought to determine the predictive power of life history using a combination of variables to predict current microbiome enterotype, as well as to investigate how far back we can use this combined information. To this aim, we applied a Random Forest classi er with feature selection to predict the current enterotype for each past sampling year based on diet, blood parameters, health, anthropometrics, and lifestyle using only variables that were available across all years for parallel comparison. Models derived from a random training dataset (70% of the total) were applied to test data (the remaining 30%) in order to estimate predictive power and avoid over tting. Models gave fair levels of prediction for the Prevotella (P; for 2000 and 2016) and  Fig. 3c). Interestingly, the prediction variables selected for each sampling year showed distinct patterns for the various enterotypes (Figure 3b and Extended Data Fig. 3d). For example, variables belonging to multiple categories were selected for B2, while for the P enterotype a much larger contribution of dietary variables was visible across the years. Unlike the two other enterotypes, in B2 more variables from the medication class were selected in the three recent time points implying an association with health deterioration (Extended Data Fig. 3e and Supplementary Table 20). Overall, our study provides rst evidence that the current gut microbiome is indeed predictable by past variables. Nonetheless, further validation in independent cohorts with a similar long-term sampling protocol would be warranted to con rm these results.
In conclusion, we show that an individual's life history has long-term effects on the assembly of the gut microbiome. We report, for the rst time, the predictability of the current gut microbiome by historical host parameters along with distinct characteristics of the elderly microbiome using a quantitative approach. In a large-scale, lifespan covering combined cohort (n = 2,519), we con rm a positive correlation between community richness and age. There is compelling evidence that the gut microbiome rapidly expands from birth until early childhood and stabilizes in adulthood 23-25 . Our results indicate that the gut remains dynamic at high ages and is in uenced by the host's life history. Speci cally, we found that changes in an individual's medication history, non-sport physical activity, and hemoglobin levels over time were linked to the individual's current microbiome. Further, we could predict an individual's current enterotype based on host parameters from as early as 15 years prior. Overall, these results suggest that long-term history of host laboratory parameters, medication, diet, and lifestyle can exert signi cant impacts on the current microbiome, highlighting rst key variables that are important for maintaining a healthy gut at a later life stage. individual's physician-con rmed medical history and diseases, food intake, lifestyle, vascular risk factors, medication, and laboratory parameters 6,[26][27][28][29][30] . In the survey area, virtually all inhabitants are referred to one local hospital that closely works together with the general practitioners, which allows retrieval of full medical information. Accordingly, in this study information on clinical diseases (current and past) and morbidities as well as medication does not rely on the subject's self-report but was validated by medical records and based on standard diagnostic criteria.

Main
Dietary intake was evaluated by quinquennial (1995,2000,2005,2010, and 2015) dietician-administered 118-item food-frequency questionnaires (FFQ) based on the gold-standard FFQ by Willett and Stampfer 31 and adapted to the dietary peculiarities in the survey area 26,30 . Dieticians made use of illustrative photos of foods when exploring aphasic patients and of information provided by spouses, caregivers, and nursing homes. For each item in the FFQ, a common unit or portion size was speci ed, and we instructed participants to customize how often on average they had consumed that amount in the past years. The nine response categories ranged from 'never' to 'six or more times a day'. We calculated nutritional intake by assigning a weight proportional to the frequency of use for each food (once per day equals a weight of one), multiplying this weight by the nutrient value for the speci ed size, and summing the contribution of all foods. Reproducibility and validity of the original FFQ are well documented 31 and extend to its application in the Bruneck Study, in which it was compared against nine-day diet records 26,30 . The Alternate Healthy Eating Index (AHEI), a measure of diet quality, signi cantly associated with the risk of major chronic diseases in a large number of studies, was calculated as described previously 33 . We did not consider the 'duration of multivitamin use component' because multivitamin supplementation was almost absent in our cohort.
Accordingly, this index has eight components in our study (vegetable score, fruit score, cereal ber score, alcohol score, meat ratio score, nuts and soy score, trans-fat score, polyunsaturated-to-saturated fatty acids ratio) 33 . Physical activity was quanti ed using the Baecke questionnaire 34 , rated activity intensities according to the compendium of physical activities, and used these data to calculate average metabolicequivalent hours per week (overall and separated in sports and non-sports physical activity). Individuals were coded as current smokers or non-smokers (including former smokers) with assessment of packyears of smoking 29 . Alcohol intake was quanti ed in grams per day. Body mass index was calculated as weight in kilograms divided by height squared in meters. Systolic and diastolic blood pressures were taken after the participant had been sitting for at least ten minutes, and the mean of three independent measurements was calculated. Hypertension was de ned as systolic blood pressure ≥140 mmHg or diastolic blood pressure ≥90 mmHg or the use of antihypertensive drugs. Socioeconomic status was de ned on a three-category scale (low, medium, high) based on information about occupational status and educational level of the person with the highest income in the household. Blood samples were taken in the morning hours after an overnight fast and 12 hours of abstinence from smoking and immediately processed or stored at -70 °C. Diabetes mellitus was diagnosed when fasting plasma glucose exceeded 126 mg/dl or when participants were on anti-diabetic medication. Laboratory parameters were assessed by standard methods in certi ed labs as detailed previously 6,[26][27][28][29][30] . All study participants underwent ultrasound and transient elastography (Fibroscan®, Echosens, France) examination to evaluate hepatic steatosis and liver stiffness. Out of 325, 20 subjects were excluded due to missing data of laboratory parameters, liver stiffness, stool features, and visceral fat thickness (Supplementary Information). Missing data less than 5 was replaced by the cohort mean, otherwise removed throughout the analysis (variables removed: muscle mass (%), metabolic rate, Bristol stool score, and fat mass (kg)). The FGFP cohort used in the present study (n = 2,215) is an expanded version of the rst round of sampling completed in 2014 (n = 1,106) 1 , 35 . FGFP procedures were approved by the medical ethics committee of the University of Brussels-Brussels University Hospital (approval 143201215505, 5/12/2012). A declaration concerning the FGFP privacy policy was submitted to the Belgian Commission for the Protection of Privacy. Written informed consent was obtained from all participants.

DNA extraction and sequencing
Faecal DNA extraction and sequencing were performed as described previously 1 . Brie y, DNA was extracted from 150-200mg of the frozen samples using MagAttract PowerMicrobiome DNA/RNA KF kit (QIAGEN) following the manufacturer's instructions. The V4 region of 16S rRNA genes was ampli ed using the 515F /806R primer pair and puri ed using the QIAquick PCR Puri cation Kit. Sequencing was performed using the Illumina MiSeq platform (MiSeq Reagent Kit v2) and HiSeq 2500 System (151 base pair paired-end) for the Bruneck Study and the FGFP cohorts, respectively.

Microbial load measurement by ow cytometry
Microbial load of the study cohort was measured as described previously 7 . Brie y, 200-250 mg frozen (-80°C) faecal aliquots were diluted in saline solution (0.85% NaCl; VWR International, Germany) and ltered using a sterile syringe lter (pore size of 5 µm; Sartorius Stedim Biotech GmbH, Germany). Next, 1 mL of the microbial cell suspension obtained was stained with 1 µL SYBR Green I (1:100 dilution in DMSO; Thermo Fisher Scienti c, Massachusetts, USA) and incubated for 15 min in the dark at 37°C. The ow cytometry analysis was performed using a C6 Accuri ow cytometer (BD Biosciences, New Jersey, USA) based on Prest et al. [14]. Fluorescence events were monitored using the FL1 533/30 nm and FL3 >670 nm optical detectors. The BD Accuri CFlow software was used to gate and separate the microbial uorescence events on the FL1/FL3 density plot from the faecal sample background. A threshold value of 2000 was applied on the FL1 channel. Based on the exact weight of the aliquots analyzed, cell counts were converted to microbial loads per gram of faecal material.

Relative and quantitative microbiome pro ling
After demultiplexing with LotuS v1.565 36 , fastq sequences were further processed following the DADA2 microbiome pipeline 37 . Brie y, sequence reads were rst ltered and trimmed with the parameters: truncQ=11, truncLen=c(130,200), and trimLeft=c (30,30). Filtered reads were denoised using the DADA2 algorithm, which infers the sequencing errors. After removing chimeras, amplicon sequence variants (ASVs) table was constructed, and taxonomy was assigned using the Ribosomal Database Project (RDP) classi er implemented in DADA2 (RDP trainset 16/release 11.5). To prepare quantitative microbiome pro ling (QMP) table, the relative microbiome pro ling (RMP) taxonomic table were then corrected for copy number and rare ed to even sampling depth -a division of sequencing depth by the cell countsand subsequently multiplied by bacterial cell load to quantify the number of bacteria per gram of faecal sample as previously described in 7 . One subject was further excluded due to low read counts during the data conversion. This way, the sequencing data becomes proportional to the microbial loads in the samples. All analysis was performed based QMP unless otherwise noted. Statistical and microbiome analysis were performed in R (version 3.6.0) 38 using phyloseq 39  and Boruta 48 packages. For the associations of the microbiota with any host parameters, taxa found in less than 20% of the population were excluded for noise reduction and alleviation of multiple testing correction. Comparison of 2 groups was performed using Wilcoxon rank sum test, and Kruskal-Wallis test when analyzing more than 2 groups followed by post-hoc Dunn's test. Count data was analyzed by Fisher's exact test. Taxonomic association with host parameters were determined by partial correlation to adjust for confounders using the R package ppcor 46 . All statistical tests were followed by multiple testing correction using the Benjamini-Hochberg method when testing more than two features.

Elderly microbiome analysis
Community diversity and compositions were evaluated using α-(observed richness, Shannon, and Pielou's index) and β-diversity (principal coordinates analysis on Bray-Curtis dissimilarity) at the genus level using phyloseq 39 and vegan 40 based on RMP. Partial correlation adjusting for gender, BMI, stool moisture, and antibiotic intake was used to analyze the associations of host age with the current microbiota. The correlation analysis was carried out based on centered log-ratio (CLR) transformed taxonomic table to remove compositional artifacts in the RMP data. Prior to the log transformation, zeros were imputed using the minimum relative abundance for each taxon 49 . Data transformation was performed independently for each cohort, and genera commonly found between the cohorts were further analyzed. Core taxa for the Bruneck cohort were de ned as genera prevalent in greater than 95% of the population. As the core taxa are largely in uenced by different sample size, 50 repetitions of subsampling were performed without replacement randomly selecting 15 subjects in each age category to match the small sample size of subject in their nineties (n =11).
Analysis of community variations using the current and past variables Explanatory power of cohort covariates and their combined effect size for the microbial community variation was evaluated as described previously 1 . Brie y, distance-based RDA (db-RDA) was performed on genus level using Bray-Curtis dissimilarity as implemented in vegan 40 . Covariates (FDR < 0.1) found in this step were entered for forward stepwise model selection to measure their cumulative effect sizes. Prior to the analysis, collinearity of variables was checked by Spearman's rank correlation and Wilcoxon rank sum test for continuous and binary variables, respectively. One of the collinear variables was removed based on their representativeness and explanatory power if their effect size was > |0.8| (Supplementary Table 4). To identify variables driving the elderly gut community variation as the host become older, subjects were divided into equal size windows based on their age (n =156). To assess the effect of past events/host parameter shifts on the current microbiome variation, different approaches were performed for continuous and binary variables (infection, medication, and smoking). Taxonomic association analysis after adjusting for age and stool moisture was performed by tting generalized linear model (GLM, link = logit). Beta blocker treatment and hemoglobin clusters were used as binary dependent variables and genera as independent variables. Standardized β coe cients were calculating using the R package lm.beta 45 . The signi cant associations of deconfounded genera with the host parameters were tested by performing likelihood-ratio test. Clustering of subjects was carried out by categorizing them as high or low based on the median values measured in 1990 and 2005 for hemoglobin and non-sport physical activity, respectively. A multiple linear regression was performed on the non-sport physical activity hemoglobin, and alanine transaminase regressing out the effect of age, gender/BMI. Prior to the regression, physical activity was transformed by inverse normal transformation to t normal distribution.

Prediction of the current microbiome based on the life history
In order to construct a prediction model, multinomial random forest was performed using default parameters as implemented in the R package randomForest 47 . The model was trained with randomly selected 70% of the entire dataset and validated using the remaining 30% of the dataset. Variables entered were the ones available across all years from 1995 to 2016. Additional steps of signi cant feature selection for each enterotype were performed to using the Boruta package 48 , which implements the random forest classi er and identi es signi cant features by random shu ing and comparing zscores. Boruta presented the features by 3 categories (i.e., con rmed, tentative, and rejected), and only the con rmed ones were presented in the study. Tenfold cross-validation was used to validate the results obtained from the training and test datasets (R package randomForest 47 , ntree= 500 and default mtry).

Data availability
Raw 16S data is available through managed access at the European Genome/Phenome Archive (https://ega-archive.org) under accession number EGAS00001004453. Derived species abundance counts and transformed microbial trait data can be found in Supplementary Table 21. Bruneck host metadata from this study are available in accordance and in consent with ethical permission through managed access, and organized via Principal Investigator Herbert Tilg, as follows: Upon data request by email to Herbert.tilg@i-med.ac.at the Bruneck data access committee will evaluate access permission, which will be granted upon signature of a data use agreement/material transfer agreement between the governing legal entities.