Captivity systematically alters the composition yet not the diversity of vertebrate gut microbiomes

There is an open debate on whether and how captivity alters the gut microbiota of vertebrates, due to the contrasting results reported in different taxa and the absence of systematic multi-species analyses. We performed a meta-analysis of gut microbiota proles of 322 captive and 322 wild specimens from 24 vertebrate species, including sh, reptiles, amphibians and mammals. We found no evidence that captivity either systematically depletes or increases their gut microbiota. In 88% of the species analysed, although captivity entailed a loss of a fraction of the diversity found in the wild, this was compensated through recruitment of a proportionally similar amount of new taxa only found in captivity. We show such compositional changes can impact evolutionary and ecological inferences that rely on hierarchical clustering-based comparative analyses of gut microbial communities across species. (5 species) exhibited any systematic trend either (Primates: REM dR : SMD = 1.972; t = 2.43, p-value = 0.072, REM dRER : SMD = 1.387; t = 1.900, p-value = 0.131; Cetartiodactylans: REM dR : SMD = 0.440, t = 0.830, p-value = 0.451; REM dRER : SMD = 1.157, t = 1.21, p-value = 0.294). Our results therefore indicate that captivity neither systematically drops, nor increases, the diversity of the gut microbiota, and that increased diversities observed in some species in captivity are not the result of poor study design and limited sample size 6 , but real biological signal. Rather, the effect of captivity on the gut microbiota probably depends on a combination of the intrinsic features of the host, and specic conditions animals are subject to, such as the diet, environment and health treatments 12,13 . This level of detail is not always reported in the literature (but see Supporting Dataset), which complicates to assess its impact on the resulting microbiota proles unless experiments are conducted 13


Introduction
The gastrointestinal tract of most animals on Earth hosts a microbial community 1 , the gut microbiota, that shapes their phenotypes through modulating a range of physiological processes 2 . Due to the plasticity of the gut microbiota 3 , animals living in different environmental conditions often exhibit particular microbial signatures 4 . Many wild animals are reared in captivity for reasons spanning scienti c research, to education, leisure and conservation. There is a general belief that microbial diversity drops under captivity conditions, due to the simpli cation of the environment in which host live. However, evidence supporting the opposite pattern has also been reported in some species 5 , although it has been suggested these observations might be spurious 6 . In consequence, whether consistent patterns of gut microbiota variation exist between wild and captive animals remains unresolved.
Besides, captive animals are often used to address ecological and evolutionary questions of animalmicrobiota interactions; e.g., how the natural history of vertebrates shape their gut microbiota 7 . Marked differences between wild and captive animals could challenge the representativeness of biological results drawn from captive animals 8 . However, the level of distortion the use of captive animals entails for such analyses is yet to be assessed.
To address these questions, we analysed the gut microbiota of comparable wild and captive individuals belonging to 24 vertebrate species, and performed a meta-analysis of the diversity and compositional variation between wild and captive individuals. We also contrasted the data to the expectations of four theoretical models relating to how the gut microbiota may vary when comparing wild and captive animals. Finally, we assessed how ecological and evolutionary inferences can differ depending on the origin of the animal specimens employed to characterise gut microbial communities.

Results And Discussion
After screening 222 publications, we retrieved raw 16S rRNA gene amplicon data and corresponding metadata from 23 publications that met the criteria explained in the Methods section. To increase comparability 9 , i) a common bioinformatic pipeline was used to re-generate the data from raw reads, and ii) sequence data were aggregated to the genus taxonomic level to minimise biases introduced by the distinct 16S rRNA regions targeted 10 . The resulting datasets contained 644 individuals (322 wild and 322 captive; 25.8 individuals/dataset) belonging to 24 animal species ( Fig. 1a-b; SI).
We conducted diversity analyses based on Hill numbers 11 , using two contrasting metrics that account for different components of diversity. The rst one, named dR, only considers a single component of diversity, namely richness (i.e., number of bacteria genera detected). The second one, called dRER (also known as phylogenetic Shannon diversity), incorporates three components of diversity: Richness + Evenness + Regularity. Thus, it does not only consider richness, but also relative abundances of detected genera (evenness) as well as their phylogenetic relationships (regularity) (see SI for further details). Alpha (K-W dR : X 2 = 5.185, df = 25, p-value = 1.126e-93; K-W dRER : X 2 = 3.385e + 02, df = 25, p-value = 1.060e-56) and beta (PMV dR : R 2 = 0.509, p-value = 0.001; PMV dRER : R 2 = 0.683, p-value = 0.001) diversities of the gut microbiotas studied differed across species. Average bacterial richness (number of genera) ranged between 3.7 ± 3.2 and 89.4 ± 16.0 (Fig. 1c). The overall meta-analyses of diversity differences based on the two metrics (e.g., dR in Fig Our results therefore indicate that captivity neither systematically drops, nor increases, the diversity of the gut microbiota, and that increased diversities observed in some species in captivity are not the result of poor study design and limited sample size 6 , but real biological signal. Rather, the effect of captivity on the gut microbiota probably depends on a combination of the intrinsic features of the host, and speci c conditions animals are subject to, such as the diet, environment and health treatments 12,13 . This level of detail is not always reported in the literature (but see Supporting Dataset), which complicates to assess its impact on the resulting microbiota pro les unless dedicated experiments are conducted 13 .
In contrast to diversity, microbiota composition did exhibit signi cant variation between wild and captive individuals within each species. We detected signi cant (PMV dR , p < 0.05) compositional changes in 96% of the species when considering only richness, while signi cant compositional differences (PMV dRER , p < 0.05) dropped to 64% when also considering evenness and regularity (Fig. 1f,g). To better understand this relationship between diversity and compositional changes, each dataset was contrasted to four theoretical models of gut microbiota compositional differences between wild and captive animals ( Fig. 1i). Model A assumed barely no difference between the gut microbiota of both populations. Model B described a scenario in which the gut microbiota of captive animals is a subset of that of wild counterparts. Model C de ned a situation in which captive animals recruit a proportional set of microorganisms that is different from that of wild counterparts, yet maintain a considerable overlap. Finally, Model D described a situation in which the gut microbiotas of both populations are totally different. We found that C was the model that best described 88% of the datasets (Fig. 1i). This pattern has been also reported in studies in which wild animals are brought into captivity and resampled after several months ( 14 . Accordingly, prevalence analyses showed that, on average, 28.9 ± 17.8% of genera were only detected in wild organisms, another 28.8 ± 11.8% were only detected in captive individuals, while 42.3 ± 12.2% were detected in both captive and wild specimens (Fig. 1h). Dissimilarity metrics based on Hill numbers' beta diversity 11 exhibited moderate values (0.41 ± 0.12; 0.20-0.60) when analysing dR (Fig. 1f) and low values (0.13 ± 0.10; 0.02-0.38) when analysing dRER (Fig. 1g). This drop suggests that the turnover of genera majorly happened among bacteria with low relative abundances and with phylogenetically related taxa.
We then assessed the impact of such compositional differences in commonly employed procedures used to compare animal-associated microbiotas when addressing ecological and evolutionary questions.
Overall, we detected a strong signi cant correlation between the gut microbiota dissimilarity across species computed based on captive and wild populations (dR: r = 0.877, t = 31.538, df = 298, p-value < 0.001, Fig. 2c). However, in 30% of the cases, the microbiota of a population from a different species exhibited a more similar composition than the wild or captive counterpart of a particular species. This pattern can also be observed in the dissimilarity ordination plot (Fig. 2a,b). Our results show that such differences can yield diverging topologies in hierarchical clustering analyses (Fig. 2d), which are often used to analyse eco-evolutionary patterns, such as phylosymbiosis 15 . This is particularly relevant when analysing species with shared dietary features and similar hindgut microbial communities, such as primates and artiodactylans (Fig. 2b).

Conclusions
Our study revealed that captivity does not systematically simplify the gut microbiota, yet it does recurrently change its composition. In most cases, captivity entails the loss of some bacteria along with recruitment of new taxa absent in wild animals, which buffers, compensates or even overcomes the loss of diversity. Such a turnover mainly occurs in low-representation bacteria that are replaced by phylogenetically related taxa. These differences can introduce biases in multi-species comparative analyses, which suggests that data from captive animals should be used with caution when addressing evolutionary questions concerning wild organisms.

Literature search and data retrieval
We performed a literature search on the internet (Google Scholar, Web of Science) using keywords including "gut microbiota", "animal microbiome", "microbiome 16S" and similar to retrieve relevant studies. This search yielded 222 articles published between 2014 and 2020. The materials and methods of these articles were analysed to ascertain whether the study met the following criteria: i) all wild and captive samples were processed using identical procedures, ii) compared wild and captive animals were phylogenetically closely related (members of the same species or species complex), iii) captive individuals were born in captivity, or no information was provided about the origin of the captive animals; i.e., wild animals brought into captivity and sampled some time later were excluded, iv) captive animals that underwent a deliberate selection process (e.g. inbred mice or domestic animals) were also excluded for considering them genetically not comparable to the wild counterparts, and v) only datasets with sample sizes over 12 individuals were considered for analysis. Raw data were extracted from the databases and repositories indicated in the articles (accession numbers listed in the "Bioinformatic resources").
Bioinformatic sequencing data processing Data les from the different studies were i) stored at the University of Copenhagen's Electronic Research Data Repository (ERDA), ii) assigned a unique study identi er and iii) re-processed in the Danish National Supercomputer for Life Sciences 'Computerome2' using a new bioinformatic pipeline we developed for processing data with different characteristics, including sequencing mode, read length and 16S rRNA gene fragment. The entire code can be found in the "Bioinformatic resources''. In short, for each individual dataset, we quality-ltered (mean phred score of q=25) and (if necessary) trimmed and merged the paired-end reads based on the sequence overlap using AdapterRemoval2 16 . Primers (if present) were trimmed using Cutadapt 17 , and reads were dereplicated with USEARCH Derep 18 using a relative minimum copy number threshold of 0.01% of the total sequencing depth. Reads were then converted into zero-ratio OTUs using the denoising algorithm UNOISE3 19 , and USEARCH was used to map the reads back to the OTUs and create an OTU table. HS-Blast 20 was used to assign taxonomy against the nonredundant Silva 132 database 21 , and taxonomic assignments were ltered using different identity thresholds for each taxonomic level: 97% for genus-level taxonomy, 95% for family-level taxonomy, 92% for order-level taxonomy and 90% for higher taxonomic levels 22 . To minimise the impact of incorrectly assigned taxa, taxonomic annotations below these identity thresholds were converted into unclassi ed. This pipeline yielded relative read abundances assigned to different taxa for each individual dataset analysed.

Data quality ltering
Individual data les generated by the aforementioned pipeline were aggregated by study and host species into genus-level abundance tables. The two datasets of Sarcophilus harrisii retrieved from two different studies were processed independently. We then proceeded to quality-lter the genus-level abundance tables of each species through ltering individuals by minimum sequencing depth, minimum diversity coverage and taxonomic annotation. Only individual datasets with more than 1000 reads and diversity coverage values over 99% were retained, and nal genus-level abundance tables that contained at least ve animals in each contrasting group were considered for analysis. Read counts assigned to taxonomic groups not belonging to Bacteria or not present in the LTPs132_SSU release of the SILVA Living Tree (https://www.arb-silva.de/projects/living-tree) used for measuring the phylogenetic relationships among bacteria were removed to ensure accurate measurements of phylogenetic diversities. In the cases where one group (either wild or captive) outnumbered the other, samples were randomly selected to ensure even sample sizes.

Diversity analyses
Diversity analyses were carried out in the R statistical environment v.3.6.3 23 based on the Hill numbers framework. The operations explained below were conducted using the R packages hilldiv 24 , vegan 25 , ape 26 , phytools 27 , meta 28 , dmetar 29 and dendextend 30 .

Hill numbers
The Hill numbers framework encompasses the group of diversity measures that quantify diversity in units of equivalent numbers of equally abundant taxa 31,32 -in our context bacteria genera. Hill numbers provide a general statistical framework that is su ciently robust and exible to address a wide range of scienti c questions that molecular ecologists regularly try to answer through measurement, estimation, partitioning and comparison of diversities 33 .
Diversity has multiple components, which can be analysed separately through decomposition of diversity or jointly through speci c diversity metrics. The three main components of diversity are richness, evenness and regularity 34 . Richness is the most intuitive component of diversity, as it only counts whether taxa are present or absent in the system. Evenness quanti es how unbiased a system is in terms or relative abundances of the taxa that comprise it, and the relative weight of abundant and rare taxa can be modulated through the parameter q, also known as the order of diversity. Finally, regularity quanti es the degree of co-relationship among the different taxa, which in our study referred to phylogenetic distances. Hill numbers-based metrics can account for one, some or all components of diversity within a uni ed statistical framework. To obtain a complete vision of the gut microbiome differences between wild and captive animals, we conducted all our diversity and compositional analyses based on two contrasting metrics: the so-called dR, which only accounted for richness (i.e. whether bacteria taxa were present or not, q=0) and the dRER, which considered Richness + Evenness of order of diversity 1 + Regularity (i.e. whether bacteria taxa were present or not, their relative abundance values that proportionately weighs rare and abundant taxa and their phylogenetic relationships -this metric is also known as the phylogenetic Shannon diversity). Further explanations about the use of Hill numbers in molecularly characterised systems can be found elsewhere 11,34,35 .
Phylogenetic tree RER metric requires a phylogenetic tree to compute the dissimilarity values among taxa. As our datasets contained different fragments of the 16S rRNA gene, we were unable to generate a phylogenetic tree from our DNA sequence data. Therefore, we relied on the SILVA Living Tree, and used the LTPs132_SSU release as the reference phylogenetic tree for computing homogeneity values.

Alpha and beta diversity metrics
We computed individual-based diversity metrics using the function hilldiv::hill_div, and obtained average alpha diversity metrics per species, as well as wild and captive populations per species. We also computed beta diversity values for all pairwise combinations of the 644 analysed individuals using the function hilldiv::pair_dis, from which a dissimilarity matrix was derived using the hilldiv::beta_dis function.

Across-species comparisons
We used a Kruskal-Wallis (K-W) test as implemented in the function hilldiv::div_test to ascertain whether the mean diversity values varied across analysed host species, and a PERMANOVA (PMV) test using vegan::adonis function based on the pairwise dissimilarity matrix to test whether host species were compositionally distinct 25 . In both cases, statistical signi cance level was set at a FDR-adjusted p-value of 0.05.

Wild-captive diversity differences meta-analysis
Average alpha diversity metrics of wild and captive populations per species were used to conduct a random-effects-model meta-analysis with raw effect sizes using the function meta::metacont. We used the Sidik-Jonkman estimator for the between-study variance and the Knapp-Hartung-Sidik-Jonkman adjustment method. The overall effect was calculated using Hedge's g (SMD) and its 95% con dence interval and p-value. An identical analysis was performed for the entire dataset and two representative subsets of ve species, containing only datasets derived from primates and cetartiodactylans.
Higgin's & Thompson's I 2 test, Tau-squared T 2 and Cochran's Q were used for quantifying the heterogeneity between the included species. Due to the high heterogeneity found in the study we evaluated whether the between-study heterogeneity was caused by outliers with extreme effect sizes, which could be distorting our overall effect. We de ned an outlier if the species's con dence interval did not overlap with the con dence interval of the pooled effect using dmetar:: nd.outliers function.The function detected three outliers in dR metric (GOGO, PEMA and TUTR) and seven in dRER (RHBR, PYNE, PEMA, TUTR, MOCH, CENI and AIME). Even when these outliers were excluded from the analysis the I 2 heterogeneity value was substantial for dR (I 2 from 79.3 to 71.4%) and moderate for dRER (I 2 from 86.9% to 54.2%) and signi cant for both (Cochran's Q, p-value <0.001). We performed a sensitivity analysis removing the outliers from the meta-analysis, yet the results of the random effects model did not change (R: SMD=0.345, p-value= 0.075; REH: SMD=0.015, p-value= 0.928). We also performed Graphic Display of Heterogeneity (GOSH) plots to explore the patterns of effect sizes and heterogeneity in our data. We used three supervised machine learning (k-means, DBSCAN and the Gaussian Mixture Model) algorithms to detect clusters in the GOSH plot data and determine which studies contribute the most to them automatically using dmetar::gosh.diagnostics function. This analysis detected ve datasets which might potentially contribute to the cluster imbalance, yet even removing these datasets from our study the results of the meta-analysis did not change. Funnel plot and Egger's regression test were used to evaluate publication bias. Egger's test was not signi cant (intercept=0.773, 95%=-2.34, CI=-3.89, t=0.486, p-value=0.632), which means that there is no substantial asymmetry in the Funnel plot that could be caused by publication bias. In light of the above, we decided to include all datasets.

Wild-captive compositional differences
We performed two complementary analyses to study compositional differences between wild and captive animals within each species. On the one hand, we quanti ed compositional differences between wild and captive animals within each species by means of Sørensen-type overlap-complement derived from the beta diversity values between wild and captive populations using the function hilldiv::beta_dis. On the other hand, we tested for compositional differences using PERMANOVA (vegan::adonis) based on the pairwise dissimilarity matrix of individuals within each host species.
Contrast of compositional differences with pre-de ned models We de ned four models that described four different scenarios of gut microbiota variation between wild and captive animals. Each model was characterised as a different combination of percentage of taxa detected only in wild individuals, in both populations and only in captive individuals, respectively. Model A (5%, 95%, 5%) assumed barely no difference between the gut microbiota of wild and captive animals.
Model B (75%, 20%, 5%) described a scenario in which the gut microbiota of captive animals is a subset of that of wild counterparts. Model C (33%, 34%, 33%) de ned a situation in which captive animals recruit a proportional set of microorganisms that is different from that of wild counterparts, yet maintain a considerable overlap. Finally, Model D (45%, 10%, 45%) described a situation in which the gut microbiota of wild and captive animals are totally different. Subsequently, we measured the percentage of taxa detected only in wild individuals, only in captive individuals and in both populations for each host species, and calculated the Euclidean distances between the observed data and the four models. The model with the lowest distance to the observed data was selected as the most representative for each dataset.

Compositional similarities among populations within and between host species
We split the abundance table of each species into two subtables only containing either wild or captive individuals, and these subtables were converted into incidence data, which yielded one incidence-based microbiota pro le representing each origin (wild and captive) per dataset. Pairwise dissimilarities of all dataset*origin combinations were computed using hilldiv::pair_dis. For each of these 50 datasets, we identi ed the closest match and calculated the percentage of cases in which the closest match was not the wild or captive counterpart of each species.
Correlation and topological differences between wild-and captive-data derived host comparisons The pro les generated in the previous step were employed to compute pairwise dissimilarities of the microbiota pro les among species, based solely on wild (25 datasets) and captive (25 datasets) individuals. The correlation between the two dissimilarity matrices was computed in terms of Pearson Product-Moment Correlation test using the function stats::cor.test. For each pairwise comparison, we also calculated the evolutionary distance in terms of millions of years of divergence time. Pairwise dissimilarity matrices derived from wild and captive individuals were subsequently used to generate compositional cladograms based on hierarchical clustering (UPGMA method).

Charts and gures
All charts and gures in the manuscript were originally generated either in R or in Mac OS X Numbers based on csv les created with R scripts. These were subsequently modi ed in Adobe Illustrator to achieve the desired layout without distorting the dimensions of the quantitative elements. Fig. 1d and 1e were created using the function meta::forest, Fig. 2a and 2b using the function hilldiv::dis_nmds and