Gut microbiome of capybara, the Amazon master of the grasses, harbors unprecedented enzymatic strategies for plant glycans breakdown

Background: Plant biomass is a promising feedstock to replace fossil-based products including fuels, chemicals and materials. However, the high resistance of plant biomass to either physicochemical or biological deconstruction has hampered its broad industrial utilization and, consequently, the transition to a sustainable bioeconomy. The gut systems of herbivores are formidable natural bioreactors for lignocellulose breakdown, and the diverse ecological niches where herbivores are found have given rise to a myriad of molecular strategies to cope with the sheer complexity of plant polysaccharides. This study illuminates how the underexplored microbiota of the largest living rodent, the capybara, found in the Pantanal wetlands and the Amazon basin, can efficiently depolymerize and utilize lignocellulosic biomass. Results: Here, we have elucidated the gut microbial structure and composition of the semiaquatic herbivorous capybara through multi-omics approaches. Metabolic reconstruction of this microbiota showed that cellulose degradation is likely performed by Fibrobacter bacteria, whereas hemicelluloses and pectins are processed by a broad arsenal of Carbohydrate-Active enZymes (CAZymes) organized in polysaccharide utilization loci (PULs) identified in the multiple metagenome-assembled genomes from the phylum Bacteroidetes. Furthermore, metabolomics analysis showed short-chain fatty acids as major fermentation products, which are key markers of digestion performance of plant polysaccharides. Exploring the genomic dark matter of this gut microbial community, two novel CAZyme families were unveiled: a glycoside hydrolase family of β-galactosidases (GHXXX) and a carbohydrate-binding module family (CBMXX) involved in xylan binding that establishes an unprecedented three-dimensional fold among CAZyme-associated modules.
Conclusions: Our results reveal how the capybara gut microbiota orchestrates the depolymerization and utilization of dietary plant polysaccharides, representing an untapped reservoir of new and intricate enzymatic strategies to overcome the recalcitrance of plant polysaccharides,

being a formidable plant biomass fermenter, the enzymatic strategies and metabolic pathways employed by its microbial symbiotic community for the breakdown and utilization of recalcitrant dietary fibers remain mostly elusive. In addition, wild capybaras dwelling in the Southeast region of Brazil have incorporated sugarcane in their diet for decades [14], which makes their cecal microbiome an especially attractive system for lignocellulose depolymerization of this industrially relevant feedstock.
To elucidate the enzymatic strategies employed by the Brazilian capybara microbiota for plant cell wall deconstruction, we comprehensively investigated this gut microbial community by combining an integrated multi-omics approach (16S rRNA gene-targeted sequencing, metagenomics, metatranscriptomics and NMR-based metabolomics) with carbohydrate enzymology and X-ray crystallography, which ultimately led to the discovery of two novel CAZy families. These findings highlight the potential of the capybara gut microbiome as a reservoir of unprecedented enzymatic systems for carbohydrate processing, thereby expanding our current understanding of gut microbial strategies to overcome lignocellulose recalcitrance, which might be instrumental in fostering the development of bio-based technologies.
Binning of MG assembled contigs based on tetranucleotide frequency and coverage profile resulted in the reconstruction of 79 unique Metagenome-Assembled Genomes (MAGs) (Suppl. Table S1), of which 24 were considered high quality (completeness >90% and contamination <5%) and 50 medium quality (completeness >50% and contamination <10%), in agreement with the completeness and contamination parameters suggested by Bowers et al. 2017 [17]. Based on the GTDB database, 24 of the recovered MAGs are classified as taxonomic novelties, representing either novel species or genera, which include one from Fibrobacterota (family Fibrobacteraceae), one from Planctomycetota (family Thermoguttaceae), one from Spirochaetota (family Sphaerochaetaceae), eight from Firmicutes (families Erysipelotrichaceae, Lachnospirales, CAG-826 and Oscillospiraceae) and 13 from Bacteroidetes (families Bacteroidaceae, UBA932 and Muribaculaceae) (Suppl. Table S1). Notably, although only two MAGs each from Fusobacteria and Proteobacteria (MAG38 and MAG39, and MAG77 and MAG78, respectively) were recovered from MG, the most abundant OTUs revealed by 16S rRNA gene analysis were classified to these phyla (20% Fusobacteria and 18% Proteobacteria, Suppl. Fig. S1), pointing to an important role of these species in this microbiota.
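The quality tiers described above can be written as a simple rule. The following Python sketch applies only the completeness and contamination thresholds quoted in the text; the `Mag` record and the example values are hypothetical and are not taken from Suppl. Table S1.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    """Hypothetical record for one metagenome-assembled genome."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def quality_tier(mag: Mag) -> str:
    """Assign a tier using only the thresholds quoted in the text."""
    if mag.completeness > 90 and mag.contamination < 5:
        return "high"
    if mag.completeness > 50 and mag.contamination < 10:
        return "medium"
    return "low"


# Illustrative values only, not data from the study.
examples = [
    Mag("bin_A", 96.4, 1.8),
    Mag("bin_B", 72.0, 6.5),
    Mag("bin_C", 48.0, 2.0),
]
tiers = {m.name: quality_tier(m) for m in examples}
print(tiers)
```

Bins that fail both rules fall into a residual "low" tier here, a label introduced for this sketch only.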

Fibrobacteres and Bacteroidetes are main degraders of dietary fibers in capybara gut
In order to understand the ability of capybara to convert plant polysaccharides into free sugars, MG and MT data were investigated to determine the genomic potential of its gut microbiota associated with Carbohydrate-Active enZymes (CAZymes). A total of 6,132 putative CAZyme genes encoding 105 Glycoside Hydrolase (GH), 11 Carbohydrate Esterase (CE), and 10 Polysaccharide Lyase (PL) families were identified, of which 456 genes presented a modular architecture (Suppl. Table S2). The most abundant CAZymes identified are members of the families GH3, GH2 and GH1 (by decreasing abundance) (Fig. 2), which is in agreement with that reported for other host-associated gut microbiomes such as human, swine and cattle rumen [25]. These enzymes encompass diversified activities including β-glucosidase, β-xylosidase, β-galactosidase and β-mannosidase, and are often associated with the final steps in the depolymerization cascade of several plant polysaccharides such as cellulose, heteroxylans, mixed-linkage β-glucans and β-mannans.
In the CAZyme repertoire of this microbiota neither cellulases from families GH6, GH7 and GH48, nor cellulosomes, assessed by the presence of cohesin and dockerin domains associated with cellulases, could be identified in MG or MT datasets. In ruminal anaerobic fungi, these families are found in high abundance, possibly targeting recalcitrant cellulose structures [26]. However, in the capybara gut microbiota assayed here, fungi were detected only at very low abundance (Fig. 1a). This suggests that cellulose degradation in the capybara gut might be mainly accomplished by endo-β-1,4-glucanases (EC 3.2.1.4) from families GH5 (subfamilies GH5_2, GH5_4, GH5_25 and GH5_37), GH8, GH9 and GH45, which were detected either as single domains or in multi-modular protein architectures. Notably, the most expressed genes encoding endo-β-1,4-glucanases belong to families GH5_2, GH8, GH9 and GH45, and were identified in Fibrobacter MAGs (Suppl. Fig. S2 and Suppl. Table S3) that also present a high MT/MG ratio (Fig. 1b), indicating that these bacteria are putatively key players in cellulose degradation in the capybara gut. Fibrobacter succinogenes is known as a highly efficient cellulolytic bacterium in the cow rumen [27] and employs a multi-protein complex to attach to cellulose fibers and cellulases secreted by the T9SS-dependent secretion system for cellulose breakdown [28]. The three Fibrobacter MAGs recovered from the capybara gut microbiome encode cellulases with a T9SS signal sequence as well as proteins for cellulose adhesion including tetratricopeptide, fibro-slime, OmpA and pilin proteins, as reported for F. succinogenes [28]. Furthermore, from the set of 347 proteins observed in the outer membrane vesicles (OMVs) from F. succinogenes [29], we have identified 262 with sequence identity ranging from 30 to 99%.
These observations suggest that typical Fibrobacter mechanisms, fundamentally relying on cell surface adhesion and OMVs, are central for cellulose depolymerization in the capybara gut.
In the multiple recovered Bacteroidetes MAGs, a large number of polysaccharide utilization loci (PULs) and clusters of CAZymes (CCs) were identified (Fig. 3 and Suppl. Table S4), which provide highly diversified capabilities to this microbiota to cope with the chemical and structural complexity of typically abundant hemicelluloses and pectins in gramineous or aquatic plants such as heteroxylans, mixed-linkage β-glucans, β-1,3-glucans, xyloglucans, mannans and homogalacturonans. Notably, the identified PUL targeting mixed-linkage β-glucans (Suppl. Fig. S3) is highly conserved in capybara and human microbiomes, presenting identical gene architecture encompassing one GH16 and two GH3 enzymes, as reported in [30]. PULs targeting heteroxylans and homogalacturonans (Suppl. Fig. S3), common components of gramineous plants such as sugarcane [31], also resemble PULs identified in human gut microbiomes [32], highlighting the significant level of conservation of Bacteroidetes enzymatic systems in omnivores and hindgut herbivores.
Despite the presence of multiple carbohydrate esterases (CEs) (Fig. 3), the lack of auxiliary-active enzymes (AAs) indicates a low capacity of the capybara microbiota to perform plant biomass delignification, as also observed for other monogastric herbivores such as horses [33,34]. As a mechanism to cope with lignin-rich diets, these animals may employ cecotrophy to enhance digestibility and nutrient uptake. In addition, it is noteworthy that many identified PULs only showed similarity with non-experimentally validated PULs and lack a clear substrate target (Suppl. Table S4), which in part could be due to intrinsic limitations of genome reconstruction from metagenomes, but also reflects the variability, heterogeneity and partial knowledge of the structure and composition of the glycans present in the diet of wild capybaras.
Taken together, the CAZyome analysis of the capybara gut microbiota indicates Fibrobacteres as main drivers of cellulose breakdown, whereas the numerous Bacteroidetes PULs/CCs confer on this community a myriad of enzymatic strategies to tackle the complex and diverse hemicelluloses and pectins typically present in gramineous and aquatic plants, major components of the capybara diet.

Global metabolite profiling shows high performance on the conversion of dietary fibers into short-chain fatty acids
Having identified important players in the depolymerization of dietary fibers in the capybara gut, we further investigated the role of these microorganisms in the conversion of free sugars into energy for the host by integrating metabolomics and metabolic reconstruction analysis.
The major fermentation products measured in the capybara gut were short-chain fatty acids (SCFAs), among more than 40 metabolites detected by NMR spectroscopy-based metabolomics (Suppl. Table S5).
The most abundant metabolites observed in cecal and rectal samples were acetate (mean ± SD: 74.83 ± 22.17 and 30.40 ± 22.76 mM, respectively), propionate (31.0 ± 6.67 and 15.98 ± 12.8 mM) and butyrate (23.30 ± 5.63 and 8.35 ± 12.83 mM). These SCFA ratios indicate a forage-based diet and are similar to those seen for ruminants [35,36], supporting a high efficiency of this microbiota in the use of dietary fibers as an energy source.
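To make the SCFA ratios concrete, the short Python sketch below converts the mean concentrations quoted above into molar proportions of the three-acid total. It uses only the reported means (not the per-animal values in Suppl. Table S5), so it is a ratio of means and only an illustration; the function name is introduced here for convenience.

```python
# Mean concentrations (mM) as quoted in the text.
cecal_mM = {"acetate": 74.83, "propionate": 31.0, "butyrate": 23.30}
rectal_mM = {"acetate": 30.40, "propionate": 15.98, "butyrate": 8.35}


def molar_proportions(conc_mM):
    """Return each acid's share of the summed acetate+propionate+butyrate pool."""
    total = sum(conc_mM.values())
    return {acid: value / total for acid, value in conc_mM.items()}


cecal_prop = molar_proportions(cecal_mM)
rectal_prop = molar_proportions(rectal_mM)

for site, props in (("cecum", cecal_prop), ("rectum", rectal_prop)):
    shares = ", ".join(f"{acid} {share:.1%}" for acid, share in props.items())
    print(f"{site}: {shares}")
```

With these means, the cecal acetate:propionate:butyrate proportions come out at roughly 58:24:18.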
Genes related to pyruvate fermentation into acetate were highly abundant in both MG and MT data for cecal and rectal samples, and they are derived from Firmicutes, Bacteroidetes and Fusobacteria (Fig. 4 and Suppl. Fig. S4). Metabolic pathway reconstruction analysis shows that acetate can be putatively produced by any of the bacterial MAGs recovered from the capybara gut microbiome (Fig. 4), which is in agreement with the high abundance of this metabolite in both cecal and rectal samples (Suppl. Table S5). On the other hand, the expression analysis of key genes involved in the butyrate pathway (atoA/D genes) indicates that Firmicutes Ileibacterium sp. MAG6 and Megasphaera sp. MAG33 are likely the major butyrate-producing bacteria in the capybara gut (Suppl. Fig. S5 and Suppl. Table S6). The Bacteroidetes Marinilabiliaceae MAG47 and Fusobacteria MAG38 and MAG39 also have co-localized genes atoA/atoD and ptb/butK, suggesting that they also contribute to butyrate production to some extent (Suppl. Fig. S5 and Suppl. Table S6). The typical genes from acrylate and propanediol pathways involved in propionate production were not identified in the recovered MAGs from capybara gut (Fig. 4), but the mmdA gene, encoding a methylmalonyl-CoA decarboxylase from the succinate pathway, is widespread mainly among Bacteroidetes and was also observed in some Firmicutes and Fusobacteria MAGs (Suppl. Fig. S5 and Suppl. Table S6). Furthermore, the ratio of propionate detected in the capybara gut correlates (R=0.77 and p=0.07) with the relative abundance of Bacteroidetes, supporting the succinate pathway from this phylum as the major source of propionate production in the capybara gut.
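As a rough plausibility check on the reported correlation, the sketch below converts a correlation coefficient into a two-sided p-value through the t statistic. Two things here are assumptions, not statements from the text: that R is a Pearson coefficient, and that n = 6 observations were used (three animals, cecal and rectal samples). The closed-form CDF applies only to Student's t distribution with 4 degrees of freedom, which is what n = 6 implies.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a correlation coefficient r from n paired points."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_cdf_df4(t):
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    u = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(u)) * (1.0 - (t * t / u) / 12.0)


r = 0.77  # value quoted in the text
n = 6     # ASSUMPTION: three animals x two gut sites; not stated in the text
t_stat = t_from_r(r, n)
p_two_sided = 2.0 * (1.0 - t_cdf_df4(abs(t_stat)))
print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.3f}")
```

Under these assumptions the p-value lands close to the reported 0.07, which is consistent with a small sample and explains why the correlation is supportive rather than conclusive.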
Together, these results demonstrate a high SCFA production in the capybara digestive tract, which is a common marker of digestion performance of dietary fibers [37], and therefore reinforce the potential of this microbiota for the breakdown of recalcitrant plant polysaccharides with concomitant production of energetic metabolites for the host.

A new GH family mined from the genomic dark matter of capybara microbiome
Drawing on the results shown herein, the capybara gut microbiome can be an important source of uncharted enzymes involved in plant polysaccharide depolymerization. Moreover, the joint MG and MT analysis of the capybara gut microbiome revealed several expressed genes annotated as hypothetical proteins. Some of these genes are remotely similar to CAZy members, with sequence identity ranging from 10 to 20%, suggesting a potential function in the processing of plant polysaccharides, but requiring further functional investigation (Suppl. Table S7). Aiming to uncover the activity of these proteins, several ORFs were expressed and subjected to biochemical assays employing a diverse set of synthetic, poly- and oligosaccharide substrates (Suppl. Table S8).
One of these proteins (SEQ ID PBMDCECB_44807, named here CapGHXXX) was active on p-nitrophenyl-β-D-galactopyranoside (pNP-β-D-Gal) and kinetic parameters were determined from substrate saturation curves (Table 1 and Suppl. Fig. S6). CapGHXXX orthologues are found in Actinobacteria, Firmicutes, Verrucomicrobia and Bacteroidetes MAGs recovered from diverse sources such as rumen, feces, gut, and oral microbiotas (Suppl. Table S9), with the closest sequence coming from a rumen-derived MAG (UBA2817) from the uncultured RC9 group [38]. Sequence analysis showed that CapGHXXX is distantly related to families GH5 and GH30 (Fig. 5a) and protein threading indicates a TIM-barrel fold (Suppl. Fig. S7), suggesting that this GH family belongs to the clan GH-A. To further explore the GHXXX family, the enzyme CBK67650.1 (SEQ_ID BXY_26070) from B. xylanisolvens, which shares 46% sequence identity with CapGHXXX, was heterologously produced and biochemically characterized (Table 1). This second member also showed β-galactosidase activity, which strengthens, at the biochemical level, the establishment of this new GH family.
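For readers unfamiliar with how kinetic parameters are extracted from a substrate saturation curve, the sketch below fits the Michaelis-Menten equation, v = Vmax[S]/(Km + [S]), using the Hanes-Woolf linearization ([S]/v plotted against [S]). The fitting procedure actually used in the study is not stated in this section, and nonlinear regression is usually preferred for real data; the synthetic data and parameter values below are hypothetical and serve only to show that the procedure recovers known constants.

```python
def hanes_woolf_fit(substrate, rates):
    """Return (Vmax, Km) from a least-squares line through [S]/v versus [S].

    For Michaelis-Menten kinetics, [S]/v = [S]/Vmax + Km/Vmax,
    so slope = 1/Vmax and intercept = Km/Vmax.
    """
    y = [s / v for s, v in zip(substrate, rates)]
    n = len(substrate)
    mean_x = sum(substrate) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in substrate)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(substrate, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free data generated from hypothetical constants.
true_vmax, true_km = 10.0, 2.0
substrate = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
rates = [true_vmax * s / (true_km + s) for s in substrate]

vmax, km = hanes_woolf_fit(substrate, rates)
print(f"Vmax = {vmax:.3f}, Km = {km:.3f}")
```

Dividing Vmax by the enzyme concentration would then give kcat, the constant reported alongside Km in tables of this kind.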
In the Bacteroidota bacterium MAG42 recovered from capybara gut, CapGHXXX is found in a predicted PUL, additionally comprising enzymes from families GH2 and GH78. A similar PUL organization was also predicted in Bacteroidetes sp. 1_1_30 recovered from human gut, which also harbors enzymes from GH36, CE7 and PL8_2 families. It is noteworthy that CapGHXXX is often found fused to a GH36 module or in PULs having GH36 members, as in B. xylanisolvens and Prevotella dentalis, recovered from stool and oral cavity, respectively (Fig. 5b), indicating a synergistic relationship between these families. Moreover, these families are also commonly found along with GH78 α-L-rhamnosidases in the PUL context. In Bacteroidales bacterium UBA2817, a GHXXX member is appended to a GH78 module carrying a CBM67, both targeting rhamnogalacturonans (Fig. 5b). These observations suggest that GHXXX could act on β-linked galactosyl residues in pectic polysaccharides.

Capybara Prevotella sp. MAG57 harbors a novel family of carbohydrate-binding module
Among the MAGs recovered from the capybara gut microbiome, Prevotella sp. MAG57 is the one with the largest number of CAZyme-encoding genes. Phylogenetic and whole-genome analyses show that MAG57 is closely related to other uncultured MAGs taxonomically assigned to the Prevotellaceae family recovered from the UBA project [38], from sheep, elephant and mouse gut microbiomes (Suppl. Fig. S8).
Multiple PULs and CAZyme clusters were identified in MAG57 (Suppl. Table S4) including a gene cluster targeting arabinoxylan (CC102), an abundant hemicellulose in secondary cell walls of sugarcane and other grasses. This cluster encodes two exo-enzymes from families GH43 and GH97, and an unconventional GH10 member with an unknown 45 kDa N-terminal domain (Fig. 6a). Sequence analysis showed that this unusual N-terminal domain is also present in Bacteroidetes MAGs derived from the gut of human, mouse, and elephant (Suppl. Table S10); however, it displays no similarity with any known ancillary domain associated with CAZymes.
Therefore, to evaluate the function of this unconventional GH10 member (CapGH10), the full-length protein, its isolated domains, and the other GH members comprising the CC102 cluster were recombinantly expressed and characterized. The GH97 member (CapGH97) is a calcium-activated α-galactosidase, whereas the GH43 member (CapGH43_12) is a highly active α-L-arabinofuranosidase (Suppl. Figs. S9-S10 and Table 1), two key activities for removing decorations from heteroxylans.
The GH10 domain of CapGH10 exhibits endo-β-1,4-xylanase activity, being active on both xylan and distinct arabinoxylans (Table 1). Kinetic analysis indicates that decorations present in rye arabinoxylan (arabinose/xylose ratio = 40/60) are not detrimental to the catalytic performance, showing similar Km and kcat constants compared to xylan (Table 1 and Suppl. Fig. S11). The Xyn10Z enzyme from Hungateiclostridium thermocellum ATCC 27405, sharing 36% sequence identity with CapGH10, is the closest characterized member so far, with high activity on xylan [39]. The N-terminal region of Xyn10Z encompasses a feruloyl esterase followed by a CBM6 domain, which is not conserved in CapGH10 [40]. The CapGH10 N-terminus showed sequence similarity only with uncharacterized proteins, with the closest homologs mostly presenting a GH10 domain with sequence identity around 37-44%. CapGH10 orthologues, featuring similar domain architecture, were identified in PULs from ruminal Prevotella sp. such as Prevotella sp. BP1-148, Prevotella sp. BP1-145, Prevotellaceae bacterium HUN156 and Prevotellaceae bacterium MN60, also likely targeting xylan-related polysaccharides.
The potential enzymatic activity of the isolated N-terminal domain of CapGH10 was assessed against 30 different substrates including synthetic substrates, oligosaccharides, and polysaccharides (Suppl. Table S8), but no hydrolase, lyase or esterase activity was observed. Typical activities involved in heteroxylan breakdown including endo-β-1,4-xylanase, β-xylosidase, α-L-arabinofuranosidase, α-D-galactosidase, α-D-glucuronidase, 4-O-methyl-glucuronoyl methylesterase, feruloyl esterase and acetyl xylan esterase were assayed by distinct methods without the detection of product formation or substrate consumption. Given the absence of catalytic activity, we then interrogated the capacity of this N-terminal domain to bind potential substrates of its GH10 partner, such as beechwood xylan and arabinoxylans, using affinity gel electrophoresis (AGE). As shown in Fig. 6c, this domain can indeed interact with the substrates of the GH10 domain, suggesting that this N-terminal domain may target the CapGH10 catalytic domain to xylan polysaccharides.
To gain further insights into the potential role of this unconventional N-terminal domain, its crystallographic structure was solved by SeMet phasing at 1.8 Å resolution (Suppl. Table S11). The domain is monomeric in solution (Suppl. Fig. S12) and exhibits a parallel right-handed β-helix fold, consisting of 14 complete helical turns with two main short helices protruding from the β-helix backbone (Fig. 6b). The 14 helical turns are twisted and curved with a calcium ion between the 11th and 12th turns in an octahedral coordination sphere (Fig. 6b). This β-helix fold is observed in the clan GH-N of the GH superfamily, in the carbohydrate esterase CE8 and in several polysaccharide lyase (PL) families; however, structural comparisons with these CAZy families (GH28, GH91, PL6 and CE8) led to high rmsd values (> 3 Å), indicating poor three-dimensional conservation (Suppl. Table S12). Neither the catalytically relevant residues nor the active site topology of these families are conserved in the CapGH10 β-helix domain (Suppl. Fig. S13), supporting that this domain is not catalytically active.
In order to elucidate the molecular determinants for xylan binding observed in AGE experiments, two surface regions populated with aromatic and acidic residues, typical platforms for carbohydrate interaction, were identified and mutated: region I, between turns 1-4, and region II, near turns 6-10 (Fig. 6b and Suppl. Fig. S14). Mutations E247A and E282A, in region II, severely impaired protein stability and led to expression only as inclusion bodies. Mutation D344L (at the calcium-binding site) also affected protein stability, to a lesser extent, but the arabinoxylan/xylan binding capacity was preserved (Suppl. Fig. S15). This result indicates that the calcium ion has a structural relevance rather than a functional role in carbohydrate recognition. Only the mutants Y62A and E82A affected the migration pattern in AGE assays with beechwood xylan and rye arabinoxylan (Fig. 6c). Both residues are located in region I, indicating that this patch plays a role in carbohydrate binding. It is worth mentioning that mutating two aromatic residues located at the region corresponding to the GH28 active site, Y193 and Y279, did not alter carbohydrate binding, in agreement with this region having no functional relevance for the CapGH10 β-helix domain. Combining the biochemical, structural and mutagenesis analyses, we define the CapGH10 β-helix domain as a CBM, thereby establishing a novel structural scaffold in this superfamily and founding the family CBMXX.
Taken together, we conclude that this unprecedented modular endo-β-1,4-xylanase, along with the synergistic activities of other CC107 partners, confers on Prevotella sp. MAG57 the ability to act on complex heteroxylans (Fig. 6d), a key function in the gut microbiome of the capybara, which has grasses as a major component of its diet.

Discussion
The gut microbiota of herbivores is one of the most efficient systems in nature to overcome the recalcitrance of plant cell walls, and the understanding of enzymatic and metabolic mechanisms employed by these microbial communities can provide novel and alternative solutions for the valorization of lignocellulosic materials and open new opportunities for carbohydrate-based biotechnological applications. In this study, we reveal how the gut microbiota of the largest living rodent, the capybara (Hydrochoerus hydrochaeris), also known as the "master of the grasses", can efficiently depolymerize and utilize plant polysaccharides. These semi-aquatic animals are hindgut fermenters found throughout the Pantanal wetlands and the Amazon basin, which have incorporated sugarcane in their diet for decades, raising the possibility that their gut microbiota has been shaped by this biomass of great industrial relevance.
The metagenomics, metatranscriptomics and CAZyme inventory analyses revealed that cellulose degradation in this community is not accomplished by classical mechanisms involving cellobiohydrolases or cellulosomes. Instead, cellulose is likely processed by singular and sophisticated mechanisms featured by Fibrobacteres, including single and multi-modular CAZymes secreted by the T9SS system, CAZyme-rich outer membrane vesicles and lignocellulose adhesion proteins (Fig. 7). The complex and diverse architecture and composition of hemicellulosic and pectic polysaccharides present in gramineous and aquatic plants are tackled by a vast number of CAZymes organized in PULs found in the multiple recovered Bacteroidetes MAGs, which in part resemble those from human gut Bacteroidetes species, such as the PULs for mixed-linkage β-glucans [30] and xyloglucans [41].
Notable was the identification of genes or PULs with remote or no similarity to known CAZy families and systems, which led to the discovery of two new CAZyme families: a high-molecular-weight CBM family involved in xylan recognition (CBMXX) and a GH-A clan family of β-galactosidases (GHXXX). This novel CBM family exhibits an unprecedented structural scaffold in this superfamily, and its molecular architecture consisting of repeating β-helix units could serve as a platform for the rational design and engineering of CBMs. The GHXXX family expands the panel of industrially relevant CAZy families, since β-galactosidases are broadly employed in the food and beverage industries, especially in the processing of dairy-based products.
The CAZyme repertoire of the capybara gut microbiome fully covers the most abundant and recalcitrant polysaccharides present in gramineous and aquatic plants, which also requires efficient metabolic capabilities to further convert these depolymerized polysaccharides into SCFAs, the main energy source of the host. This hypothesis was further validated by global metabolite profiling and metabolic reconstructions, with the three SCFAs (acetate, butyrate and propionate) as the main metabolites, produced mostly by Bacteroidetes, Firmicutes and Fusobacteria (Fig. 7). Similar microbial strategies for SCFA production were also observed in human gut bacteria, highlighting the commonalities between the gut microbiota of omnivores and hindgut fermenters [42]. Taken together, these results shed light on the molecular mechanisms of carbohydrate processing and metabolism by the native capybara gut microbiota.

Conclusions
This work provides an unprecedented and comprehensive understanding of the enzymatic apparatus and metabolic pathways employed by the gut microbiota of the Amazon monogastric semi-aquatic herbivore, the capybara, for the breakdown and utilization of recalcitrant dietary polysaccharides. This microbial community combines the unique cellulolytic machinery featured by Fibrobacteres and the diverse and elaborate PULs found in Bacteroidetes to efficiently depolymerize lignocellulosic biomass. Structural and functional investigation of proteins and PULs identified in the genomic dark matter of this microbiota uncovered two new CAZy families, highlighting its great potential as a source of enzymes for the processing of plant polysaccharides. These findings expand our current understanding of gut microbial strategies to overcome the recalcitrance of lignocellulosic biomass, which might be utilized in biorefineries for the valorization of agroindustrial residues.

Ethics statement
This study was carried out in strict accordance with the Animal Management Rule of the Brazilian Ministry of Environment (Sisbio 59826-1). The samples were obtained from three euthanized animals in Tatuí/São Paulo State, Brazil (September 2017) as a measure of management of Rocky Mountain Spotted Fever (RMSF) hosts. After euthanasia, 20 g of intestinal contents were collected from the cecum and rectum of each animal. All samples were placed in sterile containers and immediately frozen in liquid nitrogen. Samples were kept at −80 °C until processing.

Microbial DNA extraction
Samples of cecal and rectal contents were frozen in liquid nitrogen and pulverized with an oscillating ball mill (MM400, Retsch Inc.). The homogenized samples were used for microbial DNA extraction according to the protocol described in [43], with modifications. Briefly, 0.25 g of sample was transferred to a Lysing Matrix E tube from the FastDNA Spin Kit for Soil (MP Biomedical, Inc.). For cell lysis, 1 mL of RBB+C buffer was added to each sample, followed by homogenization in a FastPrep® FP120 instrument (MP Biomedical, Inc.). Nucleic acids were precipitated by the addition of an ammonium acetate solution (10 M). The samples were incubated on ice for 30 min and then centrifuged at 4 °C for 10 min at 16,000 × g. The nucleic acid pellet was recovered and washed with 70% (v/v) ethanol, followed by drying at room temperature. The nucleic acid pellet was dissolved in 75 μL of autoclaved ultrapure water. RNA was removed by the addition of DNase-free RNase (10 mg/mL). DNA purification was performed using the PowerClean® DNA Clean-Up Kit (Mo Bio Laboratories) according to the manufacturer's protocol. Finally, electrophoresis on a 0.8% agarose gel was used to separate the DNA fragments and to evaluate DNA quality. The DNA solution was stored at −20 °C.

RNA extraction and mRNA enrichment
The samples homogenized with an oscillating ball mill were also used for RNA extraction. Briefly, 500 mg of sample was used for total RNA extraction with Trizol and the FastRNA® Pro Green Kit (MP Biomedicals), according to the manufacturer's instructions. The total RNA samples were treated with the Ribo-Zero™ Magnetic Kit (Epicentre Biotechnologies) to remove ribosomal RNA (rRNA) and enrich mRNA, followed by a 5 min incubation at 50 °C with magnetic beads to remove the hybridized rRNA molecules from the mRNA. Subsequently, the supernatant was purified using an 80% ethanol solution and the resultant RNA was used for sequencing.

Microbial community structure and diversity analysis
Capybara gut microbial community structure and diversity were investigated via high-throughput sequencing of the 16S rRNA gene. The amplification of the 16S rRNA gene V4 region was performed using the 515F (5'-GTGCCAGCMGCCGCGGTAA) and 806R (GGACTACHVGGGTWTCTAAT) primers [44].  Table S13. Prior to OTU clustering, primer sequences were removed, and singletons were discarded. Filtered amplicon reads were denoised (error-corrected) using the UPARSE unoise3 function (parameters: -minsize 8 and alpha 0.2), to likely recover true biological sequences. Prokaryotic taxonomy assignment was performed using the sintax function, as implemented in Usearch v.10, using a sintax_cutoff parameter of 0.8 as the threshold and the RDP database v16 [47]. Further analyses were performed using the phyloseq v1.20 R package in RStudio.
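The two primer sequences quoted above contain IUPAC degenerate bases (M, H, V and W), so each primer name stands for a small pool of oligonucleotides. As a minimal illustration, the Python sketch below enumerates the concrete sequences each degenerate primer represents; the helper name `expand` is introduced here and is not part of any tool cited in the text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(primer):
    """List every non-degenerate sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


fwd_515f = expand("GTGCCAGCMGCCGCGGTAA")
rev_806r = expand("GGACTACHVGGGTWTCTAAT")
print(len(fwd_515f), "variants for 515F;", len(rev_806r), "variants for 806R")
```

Such an expansion can be handy when primer sequences have to be matched exactly by a trimming tool that does not understand ambiguity codes.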

Metagenome and metatranscriptome sequencing
Metagenomic libraries were prepared using the Nextera Library Preparation kit (Illumina Inc.), while metatranscriptomic libraries were prepared using the TruSeq Stranded total RNA Library Prep Kit (Illumina Inc.). Library concentrations were measured by quantitative PCR (qPCR) using the KAPA Library Quantification Kit (Roche Inc.) and assayed for quality using the BioAnalyzer (Agilent Technologies). MG and MT libraries were paired-end sequenced in two runs (2 × 100 bp) on the Illumina HiSeq 2500 platform at the NGS sequencing facility at LNBR/CNPEM (Campinas, Brazil). The cecal and rectal gDNA were homogenized into a single sample and sequenced on a MinION sequencing device from Oxford Nanopore Technologies Inc. to obtain long reads. About 1 µg of ultra-long high-molecular-weight gDNA from the homogenized samples was required for library preparation. The library was prepared with the SQK-LSK109 Kit (Oxford Nanopore Technologies Inc.) following the manufacturer's protocol. The MinION run was performed on an R9 flow cell, generating around 3 Gb of long reads.

Metagenome and metatranscriptome analysis
Metagenome and metatranscriptome raw sequences were quality checked and trimmed as described above. MT reads were also analyzed using SortMeRNA v. 2.0 to remove rRNA reads, and then both MG and MT reads were taxonomically classified using Kaiju v. 1.7.4 with a maximum of 5 mismatches allowed and in greedy mode [48]. For functional analysis, the MG trimmed reads were de novo co-assembled using IDBA_UD (version 1.1.1) with the pre-correction parameter and k-mer size from 20 to 60 [49]. Assembly statistics are described in Suppl. Table S14. The assembled metagenome was binned using CONCOCT v.0.4.0 (parameters: -c 400, -k 4 -l 1000, -r 200 and --no_cov_normalization) [50] and MaxBin 2.0 (parameters: min_contig_length 1000, max_iteration 50, prob_threshold 0.9 and markerset 107) [51] to recover putative genomes from the metagenomic data. The binned genomes were dereplicated to remove redundancies using dRep v. 2.0.5 (parameters: -comp 80 -con 10 -str 100 and -p 10) and analyzed using CheckM v1.0.6 with the lineage_wf workflow [52] to determine the completeness and contamination ratios of these genomes. Long reads were used for MAG scaffolding using SSPACE-long-reads v1.1 (parameters: -k 5, -a 0.7, -x 1, -m 50, -o 20 and -n 1000). Genomes with completeness smaller than 55% and more than 15% contamination rate were discarded. To assign taxonomy to the recovered genomes, the GTDB-tk tool v.1.4 was used with release 95 of the GTDB database [53]. Gene prediction and annotation of both the recovered genomes and the co-assembly were performed using Prokka v.1.11 with the meta parameter [54], and annotation statistics are described in Suppl. Table S14. KEGG pathway and KEGG Orthology (KO) annotations were performed using KOFAM (e-value < 1e-5) [55] and the Functional Ontology Assignments for Metagenomes (FOAM) database (e-value < 1e-5) [56]. CAZyme annotation was performed according to CAZy database pipelines [57].
Furthermore, MG and MT reads were mapped to the whole set of genes recovered from the co-assembled metagenome and to the set of genes recovered from the MAGs using Kallisto v. 0.46.1 with the quant function [58] to estimate the coverage/abundance of protein-coding genes in cecal and rectal samples.
Normalized abundance was estimated from the number of reads per kilobase, scaled per million, and is expressed as transcripts per million (TPM).
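As a worked illustration of this normalization, the Python sketch below computes TPM from raw read counts and gene lengths. Kallisto reports TPM directly and uses effective gene lengths, so this is a simplified illustration rather than the tool's implementation; the function name and the toy counts are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Return TPM values for parallel lists of read counts and gene lengths (bp)."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: reads per kilobase of gene length
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    # Step 2: rescale so that all values in the sample sum to one million
    return [r / total * 1e6 for r in rpk]


# Toy example (not data from this study): three genes in one sample
counts = [100, 300, 600]
lengths = [1000, 1500, 3000]
values = tpm(counts, lengths)
print([round(v, 1) for v in values])
```

Because the values are rescaled within each sample, TPM values always sum to one million, which makes relative abundances comparable between the cecal and rectal samples.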

Phylogenetic analysis and metabolic reconstruction
Phylogenetic analysis of MAG57, reference Bacteroidetes type strains and uncultured Prevotellaceae genomes recovered from the UBA project [38] was performed using 92 concatenated single-copy core genes according to the UBCG method [59]. CAZyme phylogenetic analysis was carried out using the catalytic domain of each family, aligned with MAFFT [60], and maximum likelihood inference in RAxML [61] with 1,000 rapid bootstrap inferences and LG as the substitution model.
To reconstruct the metabolic pathways of each recovered MAG, the annotations obtained from the KOFAM database were filtered to keep only the top 5 hits of each protein with an e-value below the 1e-5 threshold. These filtered annotations were then supplied to the Annotation of Metabolite Origins (AMON) tool [62], which predicts the putative metabolites that each MAG can generate based on its annotated KOs.
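The hit-filtering step can be sketched in Python as follows, assuming the KOFAM results have already been parsed into (protein, KO, e-value) tuples. The tuple layout, the function name and the example records are hypothetical, and ranking the hits by ascending e-value is an assumption about how the "top" hits were chosen.

```python
from collections import defaultdict


def top_hits(hits, max_evalue=1e-5, top_n=5):
    """Keep, per protein, up to top_n KOs with e-value below max_evalue."""
    by_protein = defaultdict(list)
    for protein, ko, evalue in hits:
        if evalue < max_evalue:
            by_protein[protein].append((evalue, ko))
    # Sort each protein's hits by e-value (lowest first) and truncate
    return {
        protein: [ko for _, ko in sorted(scored)[:top_n]]
        for protein, scored in by_protein.items()
    }


# Illustrative records only, not annotations from this study
example_hits = [
    ("prot1", "K01190", 1e-50),
    ("prot1", "K12308", 1e-20),
    ("prot1", "K00001", 1e-3),   # discarded: e-value above the threshold
    ("prot2", "K01181", 1e-8),
]
filtered = top_hits(example_hits)
print(filtered)
```

The resulting protein-to-KO mapping is the kind of per-MAG KO list that would then be passed on for metabolite prediction.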

NMR-based metabolomics
Approximately 30 mg of dry cecal and rectal contents and 300 μL of a 2:1 methanol:chloroform solution were mixed, sonicated for 1 min (4 cycles of 15 s with intervals of 10 s) and placed at 4 °C for 15 min. Next, 300 μL of a 1:1 methanol:ultrapure water solution was added, followed by centrifugation at 16,000 × g and 4 °C for 20 min. The supernatant was transferred to a new tube and dried in a CentriVap Solvent System (Labconco Corporation). Samples were diluted to 630 μL by addition of D2O, followed by 70 μL of sodium phosphate buffer (final concentration 0.1 M) containing dimethyl-silapentane-sulfonate (final concentration 0.5 mM) for NMR chemical shift reference and concentration calibration. The samples were filtered through a syringe filter with a 0.22 µm pore size hydrophilic polyethersulfone (PES) membrane. The final volume of filtrate ranged from 500 to 650 μL. ¹H NMR spectra of samples were acquired using a Varian Inova NMR spectrometer (Agilent Technologies Inc.) equipped with a 5 mm triple resonance cold probe and operating at a ¹H resonance frequency of 599.84 MHz and a constant temperature of 298 K (25 °C). A total of 1024 free induction decays were collected with 32 k data points over a spectral width of 16 ppm. A 1.5-s relaxation delay was incorporated between scans, during which a continual water presaturation radio frequency (RF) field was applied. Spectral phase and baseline corrections, as well as the identification and quantification of metabolites present in samples, were performed using Chenomx NMR Suite 7.6 software (Chenomx Inc.).
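The stock concentrations implied by the stated final concentrations follow from C1·V1 = C2·V2. The Python sketch below assumes a total sample volume of 700 μL (630 μL plus 70 μL of buffer), which is an interpretation of the volumes given above rather than a value stated in the protocol; the function name is hypothetical.

```python
def stock_concentration(final_conc, total_volume_ul, added_volume_ul):
    """Solve C1 * V1 = C2 * V2 for the stock concentration C1."""
    if added_volume_ul <= 0:
        raise ValueError("added_volume_ul must be positive")
    return final_conc * total_volume_ul / added_volume_ul


# Assumption: 630 uL of sample in D2O plus 70 uL of buffer
total_ul = 630.0 + 70.0
phosphate_stock_molar = stock_concentration(0.1, total_ul, 70.0)  # final 0.1 M
dss_stock_millimolar = stock_concentration(0.5, total_ul, 70.0)   # final 0.5 mM
print(f"phosphate stock: {phosphate_stock_molar:.2f} M")
print(f"DSS stock: {dss_stock_millimolar:.2f} mM")
```

Under this assumption the buffer is a tenfold concentrate, so the phosphate and dimethyl-silapentane-sulfonate stocks would be ten times their final concentrations.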

Protein expression and purification
Protein expression and purification were conducted as reported in [63]. Briefly, the E. coli BL21 strain was transformed with target genes subcloned into pET28a in frame with a 6xHis-tag at the N-terminus. LB medium [0.5% (w·v⁻¹) yeast extract, 1% (w·v⁻¹) tryptone, 1% (w·v⁻¹) sodium chloride] was employed for protein expression in the presence of specific antibiotics for transformant selection. Cultures were grown at 37 °C until the OD at 600 nm reached around 0.8, and expression was then induced by the addition of 0.2 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) (Sigma Aldrich). Protein expression was conducted at 18 °C for 16 h and cells were harvested by centrifugation at 5,000 × g.
Proteins were purified by nickel affinity chromatography followed by size exclusion chromatography. Briefly, pelleted cells were resuspended in saline-phosphate buffer (20 mM sodium phosphate, 500 mM NaCl, pH 7.5) supplemented with 5 mM imidazole, 1 mM phenylmethylsulfonyl fluoride (PMSF), 5 mM benzamidine and 0.1 mg·mL⁻¹ lysozyme. Cells were then disrupted by sonication and soluble protein lysates were applied to a 5-mL HiTrap Chelating HP column (GE Healthcare). Target proteins were eluted in an imidazole gradient up to 0.5 M. The 6xHis-tag was cleaved using 1% (w·w⁻¹) trypsin (catalog no. T1426, Sigma Aldrich). Target proteins were further purified by size exclusion chromatography with a HiLoad 16/600 Superdex 75 pg column (GE Healthcare) equilibrated with 20 mM sodium phosphate, 150 mM NaCl, pH 7.5. Purified proteins were evaluated by dynamic light scattering (DLS), and samples with low polydispersity (<20%) were employed in biochemical and biophysical experiments.
