Gut Microbiome of the Amazon Master of the Grasses Harbors Unprecedented Enzymatic Strategies for Plant Glycans Breakdown

Abstract


Introduction
The diverse symbiotic microbiota present in the digestive tract of herbivores has been a rich source of intricate enzymatic mechanisms for lignocellulose deconstruction (1)(2)(3)(4). In particular, the microbiota of foregut (rumen) fermenters has served for decades as a model system (5,6), which led to the discovery of sophisticated systems to degrade complex plant fibers, such as the multienzyme cellulosome complexes from Ruminococcus flavefaciens (7) and the efficient and distinctive cellulose degradation system of Fibrobacter succinogenes (8).
Hindgut fermenters represent another class of herbivores probably as rich as ruminants in enzymatic mechanisms for the breakdown of recalcitrant plant glycans, since they are able to efficiently utilize low-quality forage (9). Similar to foregut fermenters, digestion is accomplished by a symbiotic microbial community, but it occurs in a single, enlarged fermentation chamber (10). These monogastric herbivores comprise a vast range of animals, from massive mammals such as elephants, rhinos and horses to small animals exemplified by rabbits and semiaquatic rodents (11). In addition, they are spread over a myriad of ecological niches on all continents, such as rain forests, savannas, grasslands, swamps, highlands and deserts, suggesting that they have evolved highly specialized molecular strategies to overcome the sheer complexity and diversity of plant glycans in these environments.
The capybara (Hydrochoerus hydrochaeris) is the largest living rodent, found throughout the Pantanal wetlands and the Amazon basin, and is also known as "Master of the grasses" due to its diet. In this animal, fermentation takes place in the cecum, which corresponds to almost three quarters of the digestive tract, reaching a digestive efficiency comparable to that of ruminants (12). Preliminary characterization of the capybara cecum microbiome indicated that the most abundant commensal microbes were from the Firmicutes and Proteobacteria phyla, with an unusually low abundance of Bacteroidetes (13), which is not expected for a typical mammalian hindgut fermenter. This initial study led us to the hypothesis that this gut microbiome may employ diversified metabolic capabilities and enzymes for depolymerization and fermentation of complex dietary fibers to provide host nutrition. Moreover, capybaras dwelling in the Southeast region of Brazil have for decades incorporated the industrially relevant sugarcane in their diet (14), which makes their cecal microbiome an especially attractive system for studying lignocellulose depolymerization.
To address this knowledge gap, we applied an integrated multi-omics approach to reveal the community structure and composition, Carbohydrate-Active enZymes (CAZymes) repertoire and metabolic pathways of the gut microbiota from native capybaras dwelling in the Brazilian Southeast region. Furthermore, by combining carbohydrate enzymology, X-ray crystallography and mutagenesis, two new CAZy families involved in plant glycan deconstruction were discovered, highlighting the potential of the capybara's gut microbiome as a reservoir of unprecedented enzymatic systems for carbohydrate processing.

Results
Taxonomic structure and composition of the capybara gut microbiota
To explore the microbial community structure, membership, and metabolic exchange of the capybara gut microbiome, we collected replicate fresh samples from the cecum and rectum of three wild female animals. We combined several culture-independent omics approaches: 16S rRNA targeted sequencing to assess the community structure; whole shotgun metagenome sequencing (MG) to reveal the community genetic and functional profile; metatranscriptomic RNA sequencing (MT) to determine the expression level of genes; and NMR-based metabolomics to elucidate the small-molecule profile of this specialized community adapted to degrade recalcitrant plant polysaccharides.
Binning of MG assembled contigs based on tetranucleotide frequency and coverage profile resulted in 79 unique Metagenome Assembled Genomes (MAGs), with completeness > 55% and contamination < 15% (Supplementary Table 1); among those, 24 were considered high quality (completeness > 90% and contamination < 5%) and 50 medium quality (completeness > 50% and contamination < 10%), according to parameters suggested by Bowers et al. 2017 (15). Taxonomic classification indicates that 35 of the recovered MAGs belong to the Firmicutes phylum, including six from the Erysipelotrichaceae family and eight from the Lachnospirales family. The second most abundant group was the Bacteroidetes, with 30 MAGs classified in the Bacteroidales order (Supplementary Table 1). Although only two genomes from Fusobacteria and Proteobacteria were recovered from MG, the most abundant OTUs identified by 16S analysis were classified as Fusobacteria and Proteobacteria, with relative abundances of 20% and 18%, respectively (Figure S1), pointing to a central role of those species in this environment.
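The quality tiers quoted above can be expressed as a simple rule. The following is a minimal sketch that uses only the completeness and contamination thresholds given in the text; the `Mag` record, the `quality_tier` function and the example values are hypothetical and are not taken from Supplementary Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mag:
    """Hypothetical record for one metagenome-assembled genome."""
    name: str
    completeness: float   # percent, e.g. as estimated by a binning QC tool
    contamination: float  # percent


def quality_tier(mag: Mag) -> str:
    """Assign a tier using only the two thresholds quoted in the text."""
    if mag.completeness > 90 and mag.contamination < 5:
        return "high"
    if mag.completeness > 50 and mag.contamination < 10:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Illustrative values only; these are not real capybara MAG statistics.
    examples = [
        Mag("example_A", 96.2, 1.1),
        Mag("example_B", 71.5, 6.3),
        Mag("example_C", 56.0, 12.4),
    ]
    for mag in examples:
        print(mag.name, quality_tier(mag))
```

A genome that passes the overall inclusion filter (completeness > 55%, contamination < 15%) can still fall outside both tiers under this rule, as the third example does.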
A dominance of Firmicutes, Proteobacteria, Bacteroidetes and Tenericutes was observed in the hindgut microbiomes of other herbivores such as Castor fiber, Castor canadensis, horse, rabbit and koala (16)(17)(18)(19)(20). Further, microbiota analysis of domesticated herbivores including hindgut fermenters, ruminants and monogastric animals revealed Firmicutes as the dominant phylum (53.11, 63.35 and 52.27%, respectively), followed by Bacteroidetes (31.36, 20.95 and 26.95%, respectively). Although the dominance of Bacteroidetes and Firmicutes is a general feature of mammalian gut microbiomes, the microbiota of native Brazilian capybara differs from that of other hindgut fermenters and ruminants, mainly due to a reduced abundance of Firmicutes (35%) along with a higher abundance of Fusobacteria (15%) and Proteobacteria (8%) (21). The increased presence of Fusobacteria can be associated with the production of butyrate, a short-chain fatty acid that is often the end product of carbohydrate fermentation (22). On the other hand, and in spite of the high-polysaccharide diet, the lower abundance of Firmicutes in the capybara microbiome may point to strategies for lignocellulose utilization distinct from those typically found in other hindgut herbivores and ruminants.

Metabolic profiling indicates high performance in the conversion of dietary fibers
Recalcitrant glycans found in diet components such as cellulose, hemicellulose and pectins are processed via anaerobic microbial fermentation to produce a wide range of metabolites, reflecting the diversity of substrates available in the digestive tract of herbivores, as well as the biochemical potential of the gut microbiota. The major fermentation products detected in the capybara gut by NMR spectroscopy-based metabolomics were short-chain fatty acids (SCFAs) such as acetate, propionate, and butyrate, among more than 40 metabolites measured (Supplementary Table 2). These SCFA ratios indicate a forage-based diet and are similar to those observed for ruminants (23,24).
The MG and MT datasets were analyzed to describe the microorganisms and metabolic pathways associated with fermentation and SCFA production (Supplementary Fig. 2A). Genes related to pyruvate fermentation were highly abundant in both MG and MT data for cecal and rectal samples, and the microbiota related to this pathway was dominated by Firmicutes, Bacteroidetes and Fusobacteria (Supplementary Fig. 2B). Metabolic pathway reconstruction of the 79 unique genomes recovered from the capybara gut microbiome was conducted to further investigate the contribution of individual microorganisms to SCFA production (Fig. 2). This analysis indicates that acetate can theoretically be produced by any of the bacterial genomes recovered from the capybara gut microbiome, in agreement with the high abundance of this metabolite in both cecal and rectal samples (Table S2). Butyrate is known to be produced mainly by Firmicutes. Analysis of the key genes involved in the final steps of this pathway, including the butyryl-CoA:acetate CoA-transferase genes atoA/D and the butK and ptb genes encoding butyrate kinase (EC 2.7.2.7) and phosphotransbutyrylase (EC 2.3.1.19), respectively, showed that the Firmicutes Ileibacterium sp. MAG6 and Megasphaera sp. MAG33 are likely the major butyrate-producing bacteria in the capybara gut, since they present the highest expression of atoA/D genes (Figure S3 and Table S3). Other bacteria, for instance the Bacteroidetes Marinilabiliaceae MAG47 and the Fusobacteria MAG38 and MAG39, also presented co-localized atoA/atoD and ptb/butK genes, suggesting that they may also contribute to butyrate production (Figure S3 and Table S3).
To verify the distribution of the pathways for propionate production within the capybara gut microbiota, key genes from each pathway (acrylate, propanediol or succinate) were analyzed (25). Lactoyl-CoA and propane-1,2-diol, intermediates of the acrylate and propanediol pathways, respectively, were not identified in the metabolic reconstruction of any of the genomes recovered from the capybara gut (Fig. 2). On the other hand, the succinate pathway, assessed by the mmdA gene encoding methylmalonyl-CoA decarboxylase, was widespread mainly among Bacteroidetes, but was also detected in some Firmicutes and Fusobacteria genomes (Figure S3 and Table S3), indicating that the main substrates used by capybara gut microorganisms for propionate production are probably hexoses and pentoses. Furthermore, the proportion of propionate detected in the capybara gut correlates (R = 0.77 and p = 0.07) with the relative abundance of Bacteroidetes, which is consistent with the succinate pathway of this phylum being the major source of propionate in the capybara gut.
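The correlation reported above can be illustrated with a short calculation. This is a minimal sketch assuming a Pearson coefficient, which the text does not specify; the function names and the sample count of six used in the usage lines are assumptions made only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Hypothetical sample count; the text reports only R and p.
    print(t_statistic(0.77, 6))
```

The resulting t statistic would be compared against a Student t distribution with n - 2 degrees of freedom to obtain a p-value, so with few samples even a fairly high R can remain above the conventional 0.05 threshold.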
A few gut microorganisms are known to produce both propionate and butyrate, such as Roseburia inulinivorans, Coprococcus catus and Eubacterium hallii (26,27). Other microorganisms able to produce acetate, butyrate and propionate as metabolic end products are Megasphaera sp. NM10, BL7 and M. elsdenii (28). According to metabolic reconstruction analysis, butyrate and propionate production were predicted to occur concomitantly in 15 genomes (Fig. 2), and Megasphaera sp. MAG33 shares ca. 95% identity with the ruminal M. elsdenii, suggesting similar metabolic capabilities. These observations reinforce the idea that the capybara microbiome is a promising source of novel species with diversified metabolic functions and great potential for the breakdown of dietary structural carbohydrates, as high SCFA production is a common marker of digestion performance on recalcitrant plant fibers (29).

Capybara gut microbiome strategies for the breakdown of dietary polysaccharides
The capacity of the capybara to convert lignocellulosic materials into SCFAs is determined by the genomic potential associated with Carbohydrate-Active enZymes (CAZymes) of the gut microbiota. A total of 6,132 putative CAZyme-encoding genes from 105 Glycoside Hydrolase (GH) and 10 Polysaccharide Lyase (PL) families were identified, of which 456 genes presented a modular architecture (Figure S4 and Table S4). The most abundant CAZymes identified are plant cell wall-degrading enzymes from families GH3, GH2 and GH1 (in decreasing order of abundance), which encompass diversified activities including β-glucosidases, β-xylosidases, β-galactosidases and β-mannosidases, among others. These enzymes are often associated with the later steps in the degradation cascade of several plant polysaccharides such as cellulose, heteroxylans, mixed-linkage β-glucans and β-mannans. Moreover, it has already been reported that these families are highly abundant in several host-associated gut microbiomes such as human, mouse, swine, and cattle rumen (30), probably due to their broad functions.
As sugarcane is part of the diet of capybaras dwelling in the Southeast region of Brazil, it was expected that their microbiota would be able to use the easily metabolizable sugar sucrose. In the capybara gut CAZyme arsenal, invertases from the GH32 family were identified at a proportion of approx. 1.5%, which is similar to that reported for several gut microbiomes from ruminants to humans (30,31). It is worth mentioning that the sequenced genome of the capybara itself has no gene encoding GH32 invertases, which holds for all mammals sequenced to date. Further analysis of the MG and MT datasets revealed a high abundance of GH32 enzymes in MG only (Figure S4), which led us to the hypothesis that, although the gut microbiome has the genomic potential to metabolize sucrose, the capybaras were digesting more recalcitrant components of their diet at the time of sample collection.
One of the main dietary polysaccharides of the capybara is cellulose, which is highly resistant to microbial degradation due to its chemical and structural organization along with numerous intermolecular interactions with a complex matrix of hemicelluloses, pectins and lignin. Neither cellulases from families GH6, GH7 and GH48, nor cellulosomes, assessed by the presence of cohesin and dockerin domains associated with cellulases, could be identified in the capybara gut MG or MT datasets. This suggests that cellulose degradation in the capybara gut may be accomplished by endo-β-1,4-glucanases (EC 3.2.1.4) from families GH5 (subfamilies GH5_2, GH5_4, GH5_25 and GH5_37), GH8, GH9 and GH45, which were detected either as single domains or in multi-modular protein architectures. Interestingly, the most highly expressed genes putatively encoding endo-β-1,4-glucanases detected in the capybara gut microbiome belong to families GH5_2, GH8, GH9 and GH45 and were recovered from Fibrobacter genomes (Figure S5 and Table S5), indicating that these bacteria may be the major contributors to cellulose degradation in the capybara gut. Fibrobacter succinogenes is known as a highly efficient cellulolytic bacterium in the cow rumen (32). It is proposed that F. succinogenes utilizes a multi-protein complex to attach to cellulose fibers and secretes cellulases via the type IX secretion system (T9SS) to enable cellulose breakdown into cellodextrins, which would then be imported into the periplasm for further degradation and utilization (33). The three Fibrobacter genomes recovered from the capybara gut microbiome encode cellulases with a T9SS signal sequence as well as proteins for cellulose adhesion, including tetratricopeptide, fibro-slime, OmpA and pilin proteins, as reported for F. succinogenes (33). Furthermore, from the set of 347 proteins observed in the outer membrane vesicles (OMVs) of F. succinogenes (34), we have identified 262 with sequence identity ranging from 30% to 99%.
These observations suggest that typical Fibrobacter mechanisms, fundamentally relying on cell surface adhesion and OMVs, are central for cellulose degradation in the capybara gut.
Hemicelluloses and pectins are also important polysaccharides in the diet of capybaras, and 30 Bacteroidetes genomes were recovered from the capybara gut microbiota. Bacteroidetes are known to possess highly diversified carbohydrate degradation capabilities, many of them encoded as polysaccharide utilization loci (PULs), which are clusters of genes encoding CAZymes, SusCD-like transporters and regulators. Around 120 predicted PULs and 150 Clusters of CAZymes (CCs) were identified in our Bacteroidetes MAGs (Extended Data Fig. 1 and Table S6) and were compared to literature-derived PULs available in the PULDB database (35). PULs probably involved in the degradation of xylans and arabinoxylans, polysaccharides highly abundant in grasses including sugarcane, were identified in the genomes of B. heparinolyticus MAG61 and Bacteroidota bacterium MAG40 (Fig. 3A), resembling PULs from B. ovatus (36). The strategies for the breakdown of mixed-linkage β-glucans are highly conserved between capybara and human microbiomes, with exactly the same PUL organization encompassing GH16 and GH3 enzymes (Fig. 3A) (37). PULs involved in the degradation of xyloglucan (XyG), a more recalcitrant hemicellulose, were identified in the Bacteroidaceae bacterium MAG53, featuring core hydrolases from families GH5_4, GH31 and GH9 (Fig. 3A). In B. ovatus, the XyG-PUL encodes other enzymes from the GH43, GH3 and GH2 families (38), which were also detected in MAG53, albeit in distinct genomic regions. These enzymes may function as escorts for a complete depolymerization of XyGs, similar to that reported for the saprophyte Cellvibrio japonicus (39). PULs predicted to act on mannose-containing glycans were also identified in the capybara gut microbiome (Fig. 3A), conserving the core genes GH26 (endo-β-1,4-mannanases) and GH130 (β-1,4-mannosylglucose phosphorylases) as described for the human gut bacterium B. fragilis (40).
Furthermore, a set of different PULs putatively enabling the degradation of other polysaccharides, such as starch and pectins, was identified, mainly in Bacteroidaceae genomes (Fig. 3A and Table S6). For instance, PUL54 from Bacteroidaceae bacterium MAG51, involved in the degradation of homogalacturonan, a key component of sugarcane cell wall pectin (41), and comprising enzymes from families GH105, GH43_10 and GH28 (Fig. 3A and Table S6), resembles the corresponding PUL from B. ovatus (36). However, a clear target substrate could not be defined for a large fraction of PULs predicted from the capybara gut microbiome (Table S6), in part due to intrinsic limitations of genome reconstruction from metagenomes, but also reflecting the variability, heterogeneity and insufficient knowledge of the structure and composition of the glycans present in the diet of wild capybaras. Nevertheless, our analyses highlight the importance of the Bacteroidetes phylum in the capybara gut in providing a diverse arsenal of enzymatic systems for the degradation and utilization of the main components of dietary carbohydrates.
Taken together, our results demonstrate that the capybara gut microbiota preferentially exploits a combination of free enzymes (rather than cellulosomes), containing a catalytic module either isolated or appended to CBMs or other catalytic modules, to deconstruct dietary polysaccharides, with biochemical diversity provided by Bacteroidetes PULs/CCs and with the Fibrobacter genus as the workhorse for cellulose breakdown.
A new partner for an old acquaintance in heteroxylan degradation
Among the genomes recovered from the capybara gut microbiome, Prevotella sp. MAG57 is the one with the largest number of CAZyme-encoding genes (Fig. 3B and Table S6). Phylogenetic analysis and whole-genome comparison indicated that MAG57 is closely related to other uncultured genomes from the Prevotella genus recovered from capybara and, through the UBA project (42), from sheep, elephant and mouse gut (Fig. 6A). Regarding sequence-based genomic comparisons, MAG57 has an average nucleotide identity (ANI) of 75%, with an alignment fraction < 60%, to genomes selected across the Bacteroidetes phylum, and thereby most likely corresponds to a novel species (Figure S6B). Many different PULs and CAZyme cluster organizations were identified in MAG57, probably involved in the degradation and utilization of hemicelluloses and pectins (Table S6). In particular, a gene cluster with predicted GH10, GH43 and GH97 members drew our attention as putatively acting on arabinoxylans, an abundant hemicellulose in secondary cell walls of sugarcane and other grasses. Notably, its GH10 member appears to contain an unknown N-terminal domain extension with a predicted mass of approx. 45 kDa (Fig. 4A). Sequence analysis showed that this unusual N-terminal domain is also present in Bacteroidetes species associated with the human, mouse, and elephant gut (Table S7). However, it displays no similarity to domains typically associated with GH10 members, such as the xylan-binding CBM22 and the xylanase-specific CBM9.
To evaluate the function of this unconventional GH10 member (CapGH10), the full-length protein and its domains, along with other GH members of the CC102 cluster, were recombinantly expressed and characterized. The GH97 member (CapGH97) is a calcium-activated α-galactosidase, whereas the GH43 member is a highly active α-L-arabinofuranosidase (Figures S7-S8 and Table 1), two activities critical for removing decorations from heteroxylans. The latter belongs to subfamily GH43_12 and showed low sequence identity to other structurally characterized GH43 members [~34% with Bacteroides ovatus GH43a, PDB 5JOW (43)]. Structural elucidation by SeMet phasing (Table S8) revealed a two-domain architecture with a β-sandwich accessory domain tightly bound to the catalytic domain (Figure S8D). Distinct from all other GH43_12 members structurally characterized so far, in which the β-sandwich domain is composed only of C-terminal β-strands, the GH43_12 structure elucidated herein shows an N-terminal β-strand that integrates with C-terminal β-strands to form the β-sandwich domain (43)(44)(45) (Figure S8D). This indicates a further level of structural complexity within the GH43 family that should be carefully considered when designing constructs and chimeras involving these instrumental enzymes for plant polysaccharide depolymerization. Structural comparisons with other GH43_12 arabinofuranosidases showed a highly conserved active-site pocket, including all residues comprising the −1 subsite, which is in agreement with the specificity and mode of action of CapGH43_12 (Figure S8E-F). The GH10 domain of the CapGH10 protein was shown to be an endo-β-1,4-xylanase active on beechwood xylan and several arabinoxylans, including high-viscosity rye flour arabinoxylan (33 cSt), low-viscosity wheat flour arabinoxylan (8 cSt), acid-debranched wheat arabinoxylan (26% Ara and 22% Ara) and enzyme-debranched wheat arabinoxylan (30% Ara).
Kinetic analyses indicate that the decorations present in rye arabinoxylan (arabinose/xylose ratio = 40/60) are not detrimental to the catalytic performance of the enzyme, which exhibits Km and kcat constants similar to those obtained with xylan (Table 1 and Figure S9). The Xyn10Z enzyme from Hungateiclostridium thermocellum ATCC 27405, which shares 36% sequence identity with CapGH10, is the closest characterized member so far, with high activity on xylan (46). The N-terminal region of Xyn10Z comprises a feruloyl esterase followed by a CBM6 domain, neither of which is present in CapGH10 (47). The N-terminal domain of CapGH10 was assayed against a range of substrates (Table 9), but no hydrolase, lyase or esterase activity was observed. Typical activities involved in heteroxylan breakdown, including endo-β-1,4-xylanase, β-xylosidase, α-L-arabinofuranosidase, α-D-galactosidase, α-D-glucuronidase, 4-O-methyl-glucuronoyl methylesterase, feruloyl esterase and acetyl xylan esterase, were assayed by distinct methods without detection of product formation or substrate consumption. We therefore investigated the capacity of this N-terminal domain to bind potential substrates of its GH10 partner, such as beechwood xylan and arabinoxylans, using affinity gel electrophoresis (AGE). As shown in Fig. 4C, this domain can indeed interact with the substrates of the GH10 domain, suggesting that it may target the CapGH10 catalytic domain to xylan polysaccharides.
To gain further insight into the potential role of this unconventional N-terminal domain, its crystallographic structure was solved by SeMet phasing at 1.8 Å resolution (Table S8). The domain exhibits a parallel right-handed β-helix fold, consisting of 14 complete helical turns with two main short helices protruding from the β-helix backbone (Fig. 4B). The 14 helical turns are twisted and curved, with a calcium ion between the 11th and 12th turns in an octahedral coordination sphere (Fig. 4B). This β-helix fold is observed in the clan GH-N of the GH superfamily, in the carbohydrate esterase family CE8 and in several polysaccharide lyase (PL) families; however, structural comparisons with these CAZy families led to high rmsd values (> 3 Å), indicating poor three-dimensional conservation (Table S10). Despite that, structural superpositions were performed with CAZy families (GH28, GH91, PL6 and CE8) in an attempt to identify similarities between the CapGH10 β-helix domain and the active sites of these enzymes. Neither the catalytically relevant residues nor the active-site topology of these families is conserved in the CapGH10 β-helix domain (Extended Data Fig. 2). Besides the lack of all key catalytic residues, a long loop (G126-K140) in the CapGH10 β-helix domain also partially occludes the region corresponding to the active site in GH28 enzymes (PDB ID 3JUR (48)) (Extended Data Fig. 2A). In comparison to family GH91 (PDB ID 2INU (49)), the two loops critical for catalytic activity, T2 and T3, are absent in the CapGH10 β-helix domain (Extended Data Fig. 2B), and, in comparison to family PL6 (PDB ID 6QPS (50)), the Ca2+-binding site essential for catalytic activity is not present in the CapGH10 β-helix domain (Extended Data Fig. 2C). Although there is a cleft-like region in the CapGH10 β-helix domain near the position corresponding to the active site of the CE8 family (PDB ID 3UW0 (51), Extended Data Fig. 2D), the catalytic residues are not conserved, and most residues populating this region in the CapGH10 β-helix domain are not even conserved among homologues, weakening the possibility that this region is a catalytic center. Moreover, SAXS data (Figure S10) indicated that the CapGH10 β-helix domain is monomeric in solution, unlike the GH28 and GH91 families, which rely on oligomerization to be functional. These structural analyses, together with the lack of conservation of residues in the cleft-like region among CapGH10 β-helix domain homologues, support the biochemical data indicating that this domain is not catalytically active.
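For readers less familiar with the rmsd metric used in the structural comparisons above, the following minimal sketch computes the root-mean-square deviation between two sets of already paired and already superposed atomic coordinates. It does not perform the structural alignment or optimal superposition that a real comparison requires, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same length unit as the input, e.g. Å).

    Both inputs are sequences of (x, y, z) tuples for equivalent atoms,
    assumed to be already paired and superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate sets of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical Cα positions; not taken from any deposited structure.
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    shifted = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
    print(rmsd(reference, shifted))
```

Because every deviation is squared before averaging, a few poorly matching regions can dominate the value, which is one reason a high rmsd is read as poor overall three-dimensional conservation.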
Considering aromatic and acidic residues as important platforms for carbohydrate interaction, mapping of the molecular surface of the CapGH10 β-helix domain revealed two potential binding regions, one between turns 1-4 (region I) and another between turns 6-10 (region II). Therefore, residues Y62 and E82 from region I and residues E132, D133, Y193, E225, E247, Y279, E282, D360 and D365 from region II were mutated to alanine (Supplementary Fig. 11). Moreover, one mutation at the calcium-binding site (D344L) was evaluated to address whether calcium ion incorporation could be essential for carbohydrate binding.
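The mutation labels above follow the usual convention of wild-type residue, position, and replacement residue. As a minimal sketch of how such labels can be generated and checked against a sequence, the helper functions below are hypothetical and the toy sequence in the usage lines is not the CapGH10 sequence.

```python
import re

# Pattern for labels such as "Y62A": wild-type residue, 1-based position, new residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Residues listed in the text as mutated to alanine (regions I and II).
ALANINE_SCAN = [
    f"{site}A"
    for site in ("Y62", "E82", "E132", "D133", "Y193", "E225",
                 "E247", "Y279", "E282", "D360", "D365")
]


def parse_mutation(label):
    """Split a label like 'D344L' into ('D', 344, 'L')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutation(sequence, label):
    """Return the mutated sequence after checking the wild-type residue."""
    wild_type, position, replacement = parse_mutation(label)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}")
    return sequence[:position - 1] + replacement + sequence[position:]


if __name__ == "__main__":
    toy_sequence = "MKYEDL"  # hypothetical sequence for illustration only
    print(apply_mutation(toy_sequence, "Y3A"))
    print(len(ALANINE_SCAN), "alanine substitutions listed")
```

Checking the wild-type residue before substitution guards against off-by-one numbering errors, which are easy to introduce when a construct carries a tag or lacks a signal peptide.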
Mutations E247A and E282A severely impaired protein stability and led to expression only in the insoluble fraction. Mutation D344L also affected protein stability, to a lesser extent, but the arabinoxylan/xylan binding capacity was preserved (Figure S12). This result indicates that the calcium ion has a structural rather than a functional role in carbohydrate recognition. Among the other nine mutants, only Y62A and E82A affected the migration pattern in AGE assays with beechwood xylan and rye arabinoxylan (Fig. 4C). Both residues are located in region I, indicating that this region plays a role in carbohydrate binding. It is worth mentioning that mutation of two aromatic residues located in the region corresponding to the GH28 active site, Y193 and Y279, did not alter carbohydrate binding, in agreement with this region having no functional relevance in the CapGH10 β-helix domain. Combining the biochemical, structural and mutagenesis analyses, we define the CapGH10 β-helix domain as a CBM, thereby establishing a novel structural scaffold in this superfamily and founding the new family CBMXX.
Considering this unprecedented modular endo-β-1,4-xylanase together with the synergistic activities of the other CC107 partners, we conclude that this cluster confers on Prevotella sp. MAG57 the ability to act on complex heteroxylans (Fig. 4D), a key function in the gut microbiome of the capybara, which has grasses as a major component of its diet.
A new GH family mined from the genomic dark matter of capybara microbiome
The combined MG and MT analysis of the capybara gut microbiome revealed several expressed genes annotated as hypothetical proteins. Some of these genes presented extremely remote similarity to CAZy members, with sequence identity ranging from 10 to 20%, suggesting a potential function in the processing of plant polysaccharides that requires confirmation by functional investigation (Table S11). Aiming to uncover the activity of these proteins, synthesized ORFs were expressed and subjected to biochemical assays employing a diverse set of synthetic, poly- and oligosaccharide substrates.
CapGHXXX orthologues are present in Actinobacteria, Firmicutes, Verrucomicrobia and mainly in Bacteroidetes genomes recovered from diverse sources such as rumen, feces, gut and oral microbiota (Table S12), with the closest sequence coming from a rumen-derived genome (UBA2817) of the uncultured RC9 group (42). Sequence analysis showed that CapGHXXX is distantly related to families GH5 and GH30 (Fig. 5A), and protein threading indicates a TIM barrel fold (Supplementary Fig. 14), suggesting that this novel GH family belongs to the clan GH-A. To further explore this GH family, the enzyme CBK67650.1 (SEQ ID BXY_26070) from B. xylanisolvens, which shares 46% sequence identity with CapGHXXX, was synthesized, produced and biochemically characterized (Table 1). This second member also showed β-galactosidase activity, which strengthens at the biochemical level the establishment of this new GH family.
In the genome of Bacteroidota bacterium MAG42 recovered from the capybara gut, CapGHXXX is found in a putative PUL that additionally comprises enzymes from families GH2 and GH78. A similar PUL organization was predicted in the genome of Bacteroidetes sp. 1_1_30 recovered from the human gut, which additionally harbors enzymes from the GH36, CE7 and PL8_2 families. It is noteworthy that CapGHXXX is often found fused to a GH36 module or in PULs that also contain GH36 members, such as in B. xylanisolvens and Prevotella dentalis, recovered from stool and the oral cavity, respectively (Fig. 5B), indicating a synergistic relationship between these families. Moreover, these families are also commonly found along with GH78 α-L-rhamnosidases in the PUL context. In the genome of the Bacteroidales bacterium UBA2817, a GHXXX member is appended to a GH78 module carrying a CBM67, both targeting rhamnogalacturonans (Fig. 5B). These observations suggest that GHXXX could act on β-linked galactosyl residues in pectic polysaccharides. Further studies in the PUL context are required to shed light on their biological role in complex gut environments.

Discussion
The capybara (Hydrochoerus hydrochaeris), also known as "Master of the grasses", is the largest rodent living on earth, dwelling in the Pantanal wetlands and the forests and plains of the Amazon basin. This semiaquatic herbivore is a hindgut fermenter with an enlarged cecum that can efficiently degrade and utilize recalcitrant plant polysaccharides by microbial processes so far unexplored. Interestingly, in the Southeast region of Brazil these animals have incorporated sugarcane in their diet for decades, raising the possibility that their gut microbiome has been shaped by this biomass of great industrial relevance.
Multi-omics analysis of the capybara gut microbiome revealed that carbohydrate processing relies on an elaborate arsenal of CAZymes from a diversified set of microorganisms from the Bacteroidetes, Firmicutes, Fibrobacteres and Fusobacteria phyla, which also exhibit distinct metabolic pathways to convert dietary fibers into SCFAs, a major energy source for the host (Fig. 6). Our analyses indicate that Fibrobacter bacteria are probably the workhorses for cellulose breakdown, involving the orchestrated action of diverse single-domain as well as modular endo-β-1,4-glucanases from families GH5, GH9 and GH45. On the other hand, the degradation of hemicelluloses and pectins is catalyzed by an intricate and broad repertoire of PULs/CCs observed in the assembled Bacteroidetes genomes, targeting complex heteroxylans, xyloglucans, mixed-linkage β-glucans, homogalacturonans and rhamnogalacturonans, which are abundant polysaccharides in grasses, in particular sugarcane (41). It is noteworthy that many putative PULs/CCs identified in these MAGs only showed similar organization to predicted PULs for which a defined target substrate is unknown, pointing to a number of yet unexplored strategies for glycan processing in this microbiome.
Metagenomics along with metatranscriptomics also revealed a notable number of genes remotely related to known CAZy families or modular architectures comprising unknown domains, leading us to further explore this genomic dark matter through carbohydrate and structural enzymology. A cluster of CAZymes specialized in complex heteroxylans from a novel Prevotella bacterium contains an unconventional modular GH10 endo-β-1,4-xylanase, featuring a novel CBM family targeting xylans and arabinoxylans.
This CBM family exhibits an original fold among the 87 known CBM families and an unusually high molecular weight for a typical CBM, expanding the known three-dimensional architectures in this superfamily. Furthermore, a new GH family has been established with the discovery of two enzymes exhibiting β-galactosidase activity but insufficient sequence similarity for inclusion in previously described CAZy families. This new family is phylogenetically and structurally related to the large GH-A clan.
Together, these results provide an unprecedented and comprehensive understanding of the enzymatic apparatus in the capybara gut microbiome specialized in the breakdown of lignocellulosic biomass.

Conclusion
Multi-omics analysis has unveiled the biochemical and metabolic pathways employed by the gut microbiota from the Amazon monogastric semi-aquatic herbivore, capybara, for the breakdown [...]

[...] and incubated at 68 °C for 10 min. After that, the RNA/rRNA reactions were incubated for 5 min at 50 °C with magnetic beads to remove the hybridized rRNA molecules from the mRNA. Next, the tubes were placed on the magnetic tube rack for 5 min to separate the beads from the solution. The supernatant containing the rRNA-depleted RNA was carefully transferred to another RNase-free tube. The rRNA-depleted RNA was purified by adding 200 μl of freshly prepared 80% ethanol to the tube while it was on the magnetic rack. The solution was incubated at room temperature for 5 min, and then the ethanol was discarded.
The procedure was repeated twice, after which the samples were dried for 15 min at room temperature. The tubes were removed from the magnetic rack, followed by the addition of 11 μl of RNase-free water and incubation for 2 min. Finally, the tubes were placed on the magnetic tube rack again, and the supernatant was collected. The RNA obtained by the magnetic bead procedure described above was used for sequencing.
Microbial community structure and diversity analysis
Capybara gut microbial community structure and diversity were investigated via high-throughput sequencing of the 16S rRNA V4 region (LNBR, Brazil). The 16S rRNA V4 region was amplified using the 515F (5'-GTGCCAGCMGCCGCGGTAA) and 806R (GGACTACHVGGGTWTCTAAT) primers (53).
Sequencing was performed on a MiSeq Sequencing System (Illumina Inc., USA) with the V3 kit, 600 cycles, in paired-end mode (2x300 bp). The ZymoBIOMICS™ Microbial Community DNA Standard (D6305) from Zymo Research (Irvine, CA, USA), with eight phylogenetically distant bacterial strains (3 gram-negative and 5 gram-positive) and 2 yeasts, was included as a positive control to evaluate possible bias in library construction, sequencing and bioinformatics analysis. For taxonomy analysis, paired-end reads were quality checked using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and filtered using Trimmomatic v.0.36 (54) to remove adapters and low-quality reads. Filtered paired-end reads were merged using the fastq_mergepairs function of the Usearch v.10 package (55). Prior to OTU clustering, primer sequences were removed using the fastx_truncate function, reads were dereplicated and singletons were discarded. OTUs were clustered using the UPARSE unoise3 function. Prokaryotic and eukaryotic taxonomy assignment was performed using the sintax function with the RDP database v16 (56).
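The 515F and 806R primers contain degenerate IUPAC positions (M, H, V, W), so a primer cannot be matched against reads by plain string comparison. The following minimal Python sketch illustrates one way to match and strip a degenerate primer from the start of a read. It is an illustration only and not the Usearch fastx_truncate procedure used in this study; the function names `primer_to_regex`, `strip_primer` and `degeneracy` are hypothetical.

```python
import re
from typing import Optional

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text above.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"


def primer_to_regex(primer: str) -> "re.Pattern[str]":
    """Compile a degenerate primer into a regex anchored at the read start."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("^" + "".join(parts))


def strip_primer(read: str, primer: str) -> Optional[str]:
    """Return the read without its leading primer, or None if it does not match."""
    match = primer_to_regex(primer).match(read.upper())
    return read[match.end():] if match else None


def degeneracy(primer: str) -> int:
    """Number of distinct non-degenerate sequences encoded by the primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total
```

This sketch requires an exact match at the degenerate level and tolerates no sequencing errors; a real pipeline would usually allow a small number of mismatches.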

Metagenome and metatranscriptome analysis
Metagenome (MG) and metatranscriptome (MT) raw sequences were quality checked and trimmed as described above. MT reads were also analyzed using SortMeRNA to remove rRNA reads, and then both MG and MT reads were taxonomically classified using Kaiju (57). For functional analysis, the trimmed MG reads were de novo assembled using IDBA_UD (version 1.1.1) with the pre-correction parameter and k-mer sizes from 20 to 60 (58). Furthermore, the assembled metagenome was binned using CONCOCT v.0.4.0 (59) and MaxBin 2.0 (60) to recover putative genomes from the metagenomic data. The binned genomes were dereplicated to remove redundancies using dRep and analyzed using CheckM v1.0.6 (61) to determine the completeness and contamination ratios of these genomes. Long-read sequencing data were used for MAG scaffolding using SSPACE-long-reads v1.1. Genomes with completeness below 55% and a contamination rate above 15% were discarded. Gene prediction and annotation of both the recovered genomes and the co-assembly were performed using Prokka v.1.11 with the meta parameter (62). KEGG pathway and KEGG Orthology (KO) annotation was performed using KOFAM (63) and the Functional Ontology Assignments for Metagenomes (FOAM) database (64); CAZyme annotation was performed according to CAZy database pipelines (65). Furthermore, MG and MT reads were mapped to the assembled metagenome and recovered MAGs using Kallisto v.0.46.1 (66) to estimate the coverage/abundance of protein-coding genes in cecal and rectal samples. Normalized abundance was estimated as the number of reads per kilobase per million mapped reads and expressed as TPM.
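Kallisto reports TPM values directly in its abundance output, so no separate calculation is needed in practice. To make the normalization explicit, the arithmetic behind TPM can be sketched as follows. This is a minimal illustration, assuming per-gene read counts and gene lengths are available as plain lists; the function name `tpm` and its parameters are hypothetical, and Kallisto itself uses estimated counts and effective lengths.

```python
def tpm(counts, lengths_bp):
    """Convert per-gene read counts to TPM.

    counts     -- reads assigned to each gene
    lengths_bp -- gene lengths in base pairs, in the same order
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: reads per kilobase, which removes the gene-length bias.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    # Step 2: rescale so that all values in a sample sum to one million.
    return [r / total * 1e6 for r in rpk]
```

Because every sample sums to the same total, TPM values can be compared between samples, for example between cecal and rectal libraries, more directly than raw counts.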

Phylogenetic analysis and metabolic reconstruction
MAGs whole genome phylogenetic analysis was performed using the PhyloPhlAn pipeline (67). To further assign taxonomy to the recovered genomes, the GTDB-Tk tool was used (68). Phylogenetic analysis of MAG57 and reference Bacteroidetes type strains was performed using 92 concatenated single-copy core genes according to the UBCG method (69). The CAZyme phylogenetic tree was constructed using the catalytic domain of each family, aligned with MAFFT (70) [...]

[...] (76)) were employed for data preparation, anomalous scatterer location and phase calculation, respectively. Initial models were built with the AutoBuild Wizard (77) from the Phenix package (78). All structures were refined with the programs PHENIX.REFINE (79) and REFMAC (80), and the models were inspected and manually adjusted according to the computed σA-weighted (2Fo-Fc) and (Fo-Fc) electron density maps using COOT (81). TLS groups, calculated by TLSMD (82), were applied to both refinements. All structures were evaluated by MolProbity (83) and the PDB-REDO server (84). Structure factors and atomic coordinates of CBMXX and the GH43 enzyme were deposited in the Protein Data Bank (PDB) under the accession codes 7JVI and 7JVH, respectively. Data collection and refinement statistics are summarized in Supplementary Table 8. The GH43_12 structure was first solved in the C2 space group with 2 molecules in the asymmetric unit. However, we obtained unusually high Rwork/Rfree values for the data resolution even after many refinement/validation cycles, which could indicate a wrong space group or a crystal pathology. The data were then re-processed in the P1 space group with 6 molecules in the asymmetric unit, and a small decrease in the R values was obtained. Poor density was observed in some molecules (E and F) in comparison with the others, probably due to the several orientations they assume; our analyses of the data indicate a possible partial rotational order-disorder pathology.
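For readers less familiar with the Rwork/Rfree values discussed above, the crystallographic R-factor is the summed absolute difference between observed and calculated structure factor amplitudes divided by the summed observed amplitudes. Rwork is computed over the reflections used in refinement and Rfree over a small test set excluded from it. The sketch below is a simplified illustration that assumes the amplitudes are already on a common scale; the function names and the `free_flags` argument are hypothetical and do not reproduce the PHENIX or REFMAC implementations.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by the free flag and return (Rwork, Rfree)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free
```

Values that stay high for the resolution of the data, as reported for the C2 solution, mean that the model reproduces the measured amplitudes poorly, which is why a wrong space group or a crystal pathology was suspected.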
Small Angle X-ray Scattering (SAXS) - Data collection and analysis
SAXS data of CBMXX were collected at the SAXS1 beamline (Brazilian Synchrotron Light Laboratory, Campinas, Brazil) at a protein concentration of 8.4 mg·mL⁻¹ in 20 mM HEPES buffer, pH 7.5. Buffer scattering was recorded and subtracted from the corresponding protein scattering. SAXS patterns were integrated using Fit2D (85), and GNOM (86) was used to evaluate the pair-distance distribution functions p(r). Ab initio molecular envelopes were calculated from SAXS data with DAMMIN (87), and averaged models were generated from several runs using DAMAVER (88). Each final SAXS low-resolution model was superimposed on its respective protein crystal structure using the program SUPCOMB (89).

[...] Table 10) were evaluated using the 3,5-dinitrosalicylic acid method by determination of the reducing sugar released (90).

Declarations

Ethics approval
This study was carried out in strict accordance with the Animal Management Rule of the Brazilian Ministry of Environment (Documentation Sisbio 59826-1).

Consent for publication
All authors approved the final version of the manuscript.

Data and materials availability
All data for this study can be found under the BioProject ID PRJNA563062. The 16S, metagenomic and metatranscriptome reads for cecal and rectal samples are available at SRA under the accession numbers SRR11852069-SRR11852086, SRR11852046-SRR11852057 and SRR11852097-SRR11852108, respectively (Supplementary Table 14). The MAGs can be found at GenBank under the accession numbers JABUSA000000000-JABUVA000000000. Structural data have been deposited in the Protein Data Bank (https://www.rcsb.org/) under accession codes 7JVI (CapCBMXX) and 7JVH (CapGH43_12). All other data generated or analyzed during this study are included in this published article (and its Supplementary information files) or are available from the corresponding author on reasonable request.

Competing Interests
The authors declare no competing interests.

[...] LC and MLS performed the metabolomics analysis.
All authors analyzed the results and approved the final version of the manuscript.