Taxonomic distribution of the capybara gut microbiota diverges from typical hindgut fermenters
The taxonomic structure of the capybara gut microbiota is mainly constituted by Bacteria, with only a small fraction of reads corresponding to Archaea (16S: 1.07%, MG: 0.40% and MT: 1.55%) and Fungi (MG 0.12% and MT: 0.32%). 16S rRNA gene-based taxonomy analysis indicates that the most abundant bacteria found in capybara gut microbiota are members from the phyla Firmicutes (mean + sd: 35.8 + 12.4%), Bacteroidetes (31.5 + 9.8%), followed by Fusobacteria (15.3 + 5.4%) and Proteobacteria (8.4 + 5.4%) (Fig. 1a). A similar taxonomic distribution was observed for 16S rRNA reads recovered from metagenome (16S_MG), MG and MT datasets (correlation coefficient r=0.96 for MG:MT, r=0.74 for 16S_MG:MG and r=0.71 for 16S_MG:MT, P < 0.05). Bacteroidetes, Fusobacteria and Proteobacteria phyla were significantly more abundant in MG than in MT considering cecal samples (Fig. 1b), whereas Euryarchaeota (MT/MG ratio expressed as mean + SD: 2.3 ± 0.6), Fibrobacteres (1.7 ± 0.9) and Spirochaetes (1.6 +/- 0.4) were more prevalent in MT than in MG data of cecal samples (Fig. 1b). Although the protein-to-RNA ratios are rather dynamic in complex microbial communities according to absolute multi-omics studies [15], the higher MT/MG ratios for some phyla, such as Euryarchaeota known to include methanogens species, might correlate to the fact that capybara is a pronounced methane producer among the hindgut fermenters [12, 16].
Binning of MG assembled contigs based on tetranucleotide frequency and coverage profile resulted in the reconstruction of 79 unique Metagenome-Assembled Genomes (MAGs) (Suppl. Table S1), being 24 considered of high quality (completeness >90% and contamination <5%) and 50 medium-quality (completeness >50% and contamination <10%), in agreement with the completeness and contamination parameters suggested by Bowers et al. 2017 [17]. Based on the GTDB database, 24 of the recovered MAGs are classified as taxonomic novelties, representing either novel species or genera, which include one from Fibrobacterota (family Fibrobacteraceae), one from Planctomycetota (family Thermoguttaceae), one from Spirochaetota (family Sphaerochaetaceae), eight from Firmicutes (families Erysipelotrichaceae, Lachnospirales, CAG-826 and Oscillospiraceae) and 13 from Bacteroidetes (families Bacteroidaceae, UBA932 and Muribaculaceae) (Suppl. Table S1). It is notable that despite only two MAGs from Fusobacteria and Proteobacteria (MAG38, 39 and MAG77,78, respectively) were recovered from MG, the most abundant OTUs revealed by 16S rRNA gene analysis were classified to these phyla (20% Fusobacteria and 18% Proteobacteria, Suppl. Fig. S1), pointing to an important role of these species in this microbiota.
A prevalence of Firmicutes, Proteobacteria, Bacteroidetes and Tenericutes was observed in the gut microbiomes of other hindgut herbivores such as Castor fiber, Castor canadensis, horse, rabbit, and koala [18–22]. Furthermore, microbiota analysis of domesticated herbivores including hindgut fermenters, ruminants and monogastric animals revealed Firmicutes as the most abundant phylum (53.11, 63.35 and 52.27%, respectively), followed by Bacteroidetes (31.36, 20.95 and 26.95%, respectively) [23]. Although the predominance of Bacteroidetes and Firmicutes is a general feature of mammalian gut microbiomes [24], the microbiota of native Brazilian capybara differs from other hindgut fermenters and ruminants, mainly due to a reduced abundance of Firmicutes (35%) along with a higher abundance of Fusobacteria (15%) and Proteobacteria (8%), suggesting that diversified strategies for lignocelulosic utilization may have evolved in this gut environment.
Fibrobacteres and Bacteroidetes are main degraders of dietary fibers in capybara gut
In order to understand the ability of capybara to convert plant polysaccharides into free sugars, MG and MT data were investigated to determine the genomic potential of its gut microbiota associated with Carbohydrate-Active enZymes (CAZymes). A total of 6,132 putative CAZymes genes encoding for 105 Glycoside Hydrolases (GH), 11 Carbohydrate Esterases (CE), and 10 Polysaccharide Lyases (PL) families were identified, of which 456 genes presented a modular architecture (Suppl. Table S2). The most abundant CAZymes identified are members of the families GH3, GH2 and GH1 (by decreasing abundance) (Fig. 2), which is in agreement with that reported for other host-associated gut microbiomes such as human, swine and cattle rumen [25]. These enzymes encompass diversified activities including β-glucosidase, β-xylosidase, β-galactosidase and β-mannosidase, and are often associated with the final steps in the depolymerization cascade of several plant polysaccharides such as cellulose, heteroxylans, mixed-linkage β-glucans and β-mannans.
In the CAZyme repertoire of this microbiota neither cellulases from families GH6, GH7 and GH48, nor cellulosomes, assessed by the presence of cohesin and dockerin domains associated with cellulases, could be identified in MG or MT datasets. In ruminal anaerobic fungi, these families are found in high abundance, possibly targeting recalcitrant cellulose structures [26]. However, in the capybara gut microbiota assayed here, fungi were detected only at very low abundance (Fig. 1a). This suggests that cellulose degradation in the capybara gut might be mainly accomplished by endo-β-1,4-glucanases (EC 3.2.1.4) from families GH5 (subfamilies GH5_2, GH5_4, GH5_25 and GH5_37), GH8, GH9 and GH45, which were detected either as single domains or in multi-modular protein architectures. Notably, the most expressed genes encoding endo-β-1,4-glucanases belong to families GH5_2, GH8, GH9 and GH45, and were identified in Fibrobacter MAGs (Suppl. Fig. S2 and Suppl. Table S3) that also present a high MT/MG ratio (Fig. 1b), indicating that these bacteria are putatively key players in cellulose degradation in the capybara gut. Fibrobacter succinogenes is known as a highly efficient cellulolytic bacterium in the cow rumen [27] and employs a multi-protein complex to attach to cellulose fibers and cellulases secreted by the T9SS-dependent secretion system for cellulose breakdown [28]. The three Fibrobacter MAGs recovered from capybara gut microbiome encode cellulases with a T9SS signal sequence as well as proteins for cellulose adhesion including tetratricopeptide, fibro-slime, OmpA and pilin proteins, as reported for F. succinogenes [28]. Furthermore, from the set of 347 proteins observed in the outer membrane vesicles (OMVs) from F. succinogenes [29], we have identified 262 with sequence identity ranging from 30-99%. These observations suggest that typical Fibrobacter mechanisms, fundamentally relying on cell surface adhesion and OMVs, are central for cellulose depolymerization in the capybara gut.
In the multiple recovered Bacteroidetes MAGs, a large number of polysaccharide utilization loci (PULs) and clusters of CAZymes (CCs) were identified (Fig. 3 and Suppl. Table S4), which provide highly diversified capabilities to this microbiota to cope with the chemical and structural complexity of typically abundant hemicelluloses and pectins in gramineous or aquatic plants such as heteroxylans, mixed-linkage β-glucans, β-1,3-glucans, xyloglucans, mannans and homogalacturonans. Notably, the identified PUL targeting mixed-linkage β-glucans (Suppl. Fig. S3) is highly conserved in capybara and human microbiomes, presenting identical gene architecture encompassing one GH16 and two GH3 enzymes, as reported in [30]. PULs targeting to heteroxylans and homogalacturonans (Suppl. Fig. S3), common components of gramineous plants such as sugarcane [31], also resemble PULs identified in human gut microbiomes [32], highlighting the significant level of conservation of Bacteroidetes enzymatic systems in omnivores and hindgut herbivores.
Despite the presence of multiple carbohydrate esterases (CEs) (Fig. 3), the lack of auxiliary-active enzymes (AAs) indicates a low capacity of the capybara microbiota to perform plant biomass delignification as also observed for other monogastric herbivores such as horses [33, 34]. As a mechanism to cope with lignin-rich diets, these animals may employ cecotrophy to enhance digestibility and nutrient uptake. In addition, it is noteworthy that many identified PULs only showed similarity with non-experimentally validated PULs and without a clear substrate target (Suppl. Table S4), which in part could be due to intrinsic limitations of genome reconstruction from metagenomes, but also reflects the variability, heterogeneity and partial knowledge of the structure and composition of the glycans present in the diet of wild capybaras.
Taken together, the CAZyome analysis of the capybara gut microbiota indicates Fibrobacteres as main drivers for cellulose breakdown, whereas the numerous Bacteroidetes PULs/CCs confer to this community a myriad of enzymatic strategies to tackle with the complex and diverse hemicelluloses and pectins typically present in gramineous and aquatic plants, major components of capybara diet.
Global metabolite profiling shows high performance on the conversion of dietary fibers into short-chain fatty acids
Once addressed important players in the depolymerization of dietary fibers in capybara gut, we further investigated the role of these microorganisms in the conversion of free sugars into energy for the host by integrating metabolomics and metabolic reconstruction analysis.
The major fermentation products measured in the capybara gut were short-chain fatty acids (SCFAs), among more than 40 metabolites detected by NMR spectroscopy-based metabolomics (Suppl. Table S5). The most abundant metabolites observed in cecal and rectal samples were acetate (mean + SD: 74.83 + 22.17 and 30.40 + 22.76 mM, respectively), propionate (31.0 + 6.67 and 15.98 + 12.8 mM) and butyrate (23.30 + 5.63 and 8.35 + 12.83 mM). These SCFA ratios indicate a forage-based diet and are similar to that seen for ruminants [35, 36], supporting a high efficiency of this microbiota in the use of dietary fibers as an energy source.
Genes related to pyruvate fermentation into acetate were highly abundant in both MG and MT data for cecal and rectal samples, and they are derived from Firmicutes, Bacteroidetes and Fusobacteria (Fig. 4 and Suppl. Fig. S4). Metabolic pathway reconstruction analysis shows that acetate can be putatively produced by any of the bacterial MAGs recovered from capybara gut microbiome (Fig. 4), which is in agreement with the high abundance of this metabolite in both cecal and rectal samples (Suppl. Table S5). On the other hand, the expression analysis of key genes involved in the butyrate pathway (atoA/D genes) indicates that Firmicutes Ileibacterium sp. MAG6 and Megasphaera sp. MAG33 are likely the major butyrate-producing bacteria in the capybara gut (Suppl. Fig. S5 and Suppl. Table S6). The Bacteroidetes Marinilabiliaceae MAG47 and Fusobacteria MAG38 and MAG39 also have co-localized genes atoA/atoD and ptb/butK, suggesting that they also contribute to butyrate production, in some extent (Suppl. Fig. S5 and Suppl. Table S6). The typical genes from acrylate and propanediol pathways involved in propionate production were not identified in the recovered MAGs from capybara gut (Fig. 4), but the mmdA gene encoding a methylmalonyl-CoA decarboxylase from the succinate pathway, is widespread mainly among Bacteroidetes and was also observed in some Firmicutes and Fusobacteria MAGs (Suppl. Fig. S5 and Suppl. Table S6). Furthermore, the ratio of propionate detected in the gut capybara gut correlates (R=0.77 and p=0.07) with the relative abundance of Bacteroidetes, supporting the succinate pathway from this phylum as the major source of propionate production in the capybara gut.
Together, these results demonstrate a high SCFA production in the capybara digestive tract, which is a common marker of digestion performance of dietary fibers [37], and therefore, reinforce the potential of this microbiota for the breakdown of recalcitrant plant polysaccharides with concomitant production of energetic metabolites for the host.
A new GH family mined from the genomic dark matter of capybara microbiome
Drawing on the results showed herein, capybara gut microbiome can be an important source for uncharted enzymes involved in plant polysaccharides depolymerization. Moreover, the joint MG and MT analysis of capybara gut microbiome revealed several expressed genes annotated as hypothetical proteins. Some of these genes are remotely similar to CAZy members, with sequence identity ranging from 10 to 20%, suggesting a potential function in the processing of plant polysaccharides, but requiring further functional investigation (Suppl. Table S7). Aiming to uncover the activity of these proteins, several ORFs were expressed and subjected to biochemical assays employing a diverse set of synthetic, poly- and oligosaccharides substrates (Suppl. Table S8).
One of these proteins (SEQ ID PBMDCECB_44807, named here CapGHXXX) was active on p-nitrophenyl-β-D-galactopyranoside (pNP-β-D-Gal) and kinetic parameters were determined from substrate saturation curves (Table 1 and Suppl. Fig. S6). CapGHXXX orthologues are found in Actinobacteria, Firmicutes, Verrucomicrobia and Bacteroidetes MAGs recovered from diverse sources such as rumen, feces, gut, and oral microbiotas (Suppl. Table S9), being the closest sequence from a rumen-derived MAG (UBA2817) from the uncultured RC9 group [38]. Sequence analysis showed that CapGHXXX is distantly related to families GH5 and GH30 (Fig. 5a) and protein threading indicates a TIM-barrel fold (Suppl. Fig. S7), suggesting that this GH family belongs to the clan GH-A. To further explore the GHXXX family, the enzyme CBK67650.1 (SEQ_ID BXY_26070) from B. xylanisolvens, which shares 46% sequence identity with CapGHXXX, was heterologously produced and biochemically characterized (Table 1). This second member also showed β-galactosidase activity that strengthens at biochemical level the establishment of this new GH family.
In the Bacteroidota bacterium MAG42 recovered from Capybara gut, CapGHXXX is found in a predicted PUL, additionally comprising enzymes from families GH2 and GH78. A similar PUL organization was also predicted in Bacteroidetes sp. 1_1_30 recovered from human gut, which yet harbors enzymes from GH36, CE7 and PL8_2 families. It is noteworthy that CapGHXXX is often found fused to a GH36 module or in PULs having GH36 members, as in B. xylanisolvens and Prevotella dentalis, recovered from stool and oral cavity, respectively (Fig. 5b), indicating a synergistic relationship between these families. Moreover, these families are also commonly found along with GH78 α-L-rhamnosidases in the PUL context. In Bacteroidales bacterium UBA2817, a GHXXX member is appended to a GH78 module carrying a CBM67, both targeting rhamnogalacturonans (Fig. 5b). These observations suggest that GHXXX could act on β-linked galactosyl residues in pectic polysaccharides.
Capybara Prevotella sp. MAG57 harbors a novel family of carbohydrate-binding module
Among the recovered MAGs from capybara gut microbiome, Prevotella sp. MAG57 is the one with the largest number of CAZyme-encoding genes. Phylogenetic and whole genome analyses show that MAG57 is closely related to other uncultured MAGs taxonomically assigned to the Prevotellaceae family recovered from the UBA project [38], either from sheep, elephant and mice gut microbiomes (Suppl. Fig. S8).
Multiple PULs and CAZyme clusters were identified in MAG57 (Suppl. Table S4) including a gene cluster targeting arabinoxylan (CC102), an abundant hemicellulose in secondary cell walls of sugarcane and other grasses. This cluster encodes two exo-enzymes from families GH43 and GH97, and an unconventional GH10 member with an unknown 45 kDa N-terminal domain (Fig. 6a). Sequence analysis showed that this unusual N-terminal domain is also present in Bacteroidetes MAGs derived from the gut of human, mouse, and elephant (Suppl. Table S10); however, it displays no similarity with any known ancillary domain associated with CAZymes.
Therefore, to evaluate the function of this unconventional GH10 member (CapGH10), the full-length protein, its domains apart, and the other GH members comprising the CC102 cluster were recombinantly expressed and characterized. The GH97 member (CapGH97) is a calcium-activated α-galactosidase, whereas the GH43 member (CapGH43_12) is a highly active α-L-arabinofuranosidase (Suppl. Figs. S9-S10 and Table 1) – two key activities to remove decorations of heteroxylans.
The GH10 domain of CapGH10 exhibits endo-β-1,4-xylanase activity, being active on both xylan and distinct arabinoxylans (Table 1). Kinetic analysis indicate that decorations present in rye arabinoxylan (arabinose/xylose ratio = 40/60) are not detrimental to the catalytic performance, showing similar Km and kcat constants compared to xylan (Table 1 and Suppl. Fig. S11). The Xyn10Z enzyme from Hungateiclostridium themocellum ATCC 27405, sharing 36% of sequence identity with CapGH10, is the closest characterized member so far, with high activity on xylan [39]. The N-terminal region of Xyn10Z encompasses a feruloyl esterase followed by a CBM6 domain, which is not conserved in CapGH10 [40]. The CapGH10 N-terminus showed only sequence similarity with uncharacterized proteins, with the closest homologs mostly presenting a GH10 domain with sequence identity around 37-44%. CapGH10 orthologues, featuring similar domain architecture, were identified in PULs from ruminal Prevotella sp. such as Prevotella sp. BP1-148, Prevotella sp. BP1-145, Prevotellaceae bacterium HUN156 and Prevotellaceae bacterium MN60, also likely targeting xylan-related polysaccharides.
The potential enzymatic activity of the isolated N-terminal domain of CapGH10 was assessed against 30 different substrates including synthetic substrates, oligosaccharides, and polysaccharides (Suppl. Table S8), but no (hydrolase, lyase or esterase) activity was observed. Typical activities involved in heteroxylans breakdown including endo-β-1,4-xylanase, β-xylosidase, α-L-arabinofuranosidase, α-D-galactosidase, α-D-glucuronidase, 4-O-methyl-glucuronoyl methylesterase, feruloyl esterase and acetyl xylan esterase were assayed by distinct methods without the detection of product formation or substrate consumption. Under this perspective, we then interrogated the capacity of this N-terminal domain to bind potential substrates of its GH10 partner such as beechwood xylan and arabinoxylans using affinity gel electrophoresis (AGE). As shown in Fig. 6c, this domain can indeed interact with the substrates of the GH10 domain, suggesting that this N-terminal domain may target the CapGH10 catalytic domain to xylan polysaccharides.
To get further insights into the potential role of this unconventional N-terminal domain, its crystallographic structure was solved by SeMet phasing at 1.8 Å resolution (Suppl. Table S11). The domain is monomeric in solution (Suppl. Fig. S12) and exhibits a parallel right-handed β-helix fold, consisting of 14 complete helical turns with two main short helices protruding from the β-helix backbone (Fig. 6b). The 14 helical turns are twisted and curved with a calcium ion between the 11th and 12th turns in an octahedral coordination sphere (Fig. 6b). This β-helix fold is observed in the clan GH-N of the GH superfamily, in the carbohydrate esterase CE8 and in several polysaccharide lyase (PL) families; however, structural comparisons with these CAZy families (GH28, GH91, PL6 and CE8) led to high rmsd values (> 3 Å), indicating poor three-dimensional conservation (Suppl. Table S12). Neither the catalytically relevant residues nor the active site topology of these families are conserved in the CapGH10 β-helix domain (Suppl. Fig. S13), supporting that this domain is not catalytically active.
In order to elucidate the molecular determinants for xylan binding observed in AGE experiments, two surface regions populated with aromatic and acidic residues, typical platforms for carbohydrate interaction, were identified and mutated. The region I between the turns 1-4 and the region II near to turns 6-10 (Fig. 6b and Suppl. Fig. S14). Mutations E247A and E282A, in the region II, severely impaired protein stability and led to the expression only as inclusion bodies. Mutation D344L (at the calcium-binding site) also affected protein stability in a less extent, but the arabinoxylan/xylan binding capacity was preserved (Suppl. Fig. S15). This result indicates that calcium ion has a structural relevance rather than a functional role in carbohydrate recognition. Only the mutants Y62A and E82A, affected the migration pattern in AGE assays with beechwood xylan and rye arabinoxylan (Fig. 6c). Both residues are located at the region I, indicating that this patch plays a role in carbohydrate binding. It is worth to mention that two aromatic residues located at the corresponding region of the GH28 active site, Y193 and Y279, did not alter the carbohydrate binding, in agreement with no functional relevance of this region for CapGH10 β-helix domain. Combining the biochemical, structural and mutagenesis analyses, we would define CapGH10 β-helix domain as a CBM, therefore, establishing a novel structural scaffold in this superfamily and founding the family CBMXX.
Taken together, we conclude that this unprecedented modular endo-β-1,4-xylanase along with the synergistic activities of other CC107 partners confers the ability to Prevotella sp. MAG57 to act on complex heteroxylans (Fig. 6d), a key function in the gut microbiome of capybara that have grasses as a major component in its diet.