Herbivorous Fish Microbiome Adaptations to Sulfated Dietary Polysaccharides

ABSTRACT Marine herbivorous fish that feed primarily on macroalgae, such as those from the genus Kyphosus, are essential for maintaining coral health and abundance on tropical reefs. Here, deep metagenomic sequencing and assembly of gut compartment-specific samples from three sympatric, macroalgivorous Hawaiian kyphosid species have been used to connect host gut microbial taxa with predicted protein functional capacities likely to contribute to efficient macroalgal digestion. Bacterial community compositions, algal dietary sources, and predicted enzyme functionalities were analyzed in parallel for 16 metagenomes spanning the mid- and hindgut digestive regions of wild-caught fishes. Gene colocalization patterns of expanded carbohydrate (CAZy) and sulfatase (SulfAtlas) digestive enzyme families on assembled contigs were used to identify likely polysaccharide utilization locus associations and to visualize potential cooperative networks of extracellularly exported proteins targeting complex sulfated polysaccharides. These insights into the gut microbiota of herbivorous marine fish and their functional capabilities improve our understanding of the enzymes and microorganisms involved in digesting complex macroalgal sulfated polysaccharides.
IMPORTANCE This work connects specific uncultured bacterial taxa with distinct polysaccharide digestion capabilities lacking in their marine vertebrate hosts, providing fresh insights into poorly understood processes for deconstructing complex sulfated polysaccharides and potential evolutionary mechanisms for microbial acquisition of expanded macroalgal utilization gene functions. Several thousand new marine-specific candidate enzyme sequences for polysaccharide utilization have been identified. These data provide foundational resources for future investigations into suppression of coral reef macroalgal overgrowth, fish host physiology, the use of macroalgal feedstocks in terrestrial and aquaculture animal feeds, and the bioconversion of macroalgae biomass into value-added commercial fuel and chemical products.

Although dietary preferences are known to vary among kyphosid species and ontogenic stages of individual animals, macroalgae are believed to be the primary adult food source, based on visual observation of stomach contents (13,14). However, the full range of potential dietary diversity among individual animals over time has not yet been established using molecular methods.
16S rRNA-based molecular methods have identified a number of bacterial species in hindgut compartments of kyphosid fish (35,36) that are taxonomically related to those present in plant-digesting terrestrial herbivores, including Alistipes-related Bacteroidota, Clostridia-related Bacillota, and Desulfovibrio-related Deltaproteobacteria. Although numerous cultured isolates, genomes, and metagenome-assembled genomes (MAGs) have been obtained for these taxa in terrestrial animals (37,38), few laboratory cultures, complete genomes, or high-quality MAGs are currently available for their relatives within herbivorous fish gut microbiota (34,36). As a result, strain-specific functional adaptations that might account for their success in marine hosts with macroalgal diets have not yet been established.
The diverse galactose, mannose, xylose, fucose, rhamnose, arabinose, and uronic acid subunits of macroalgal polysaccharides occur in a wide variety of different ratios, sulfation states, and branching patterns (39–41). Processing these complex carbohydrates into simpler sugars for incorporation into central metabolic pathways requires sulfate removal either before, concurrently with, or after glycosidic bond cleavage (31,42,43), but determining precise hydrolysis mechanisms can be difficult for substrates requiring coordinated action by multiple different enzymes. Enzyme characterization ambiguities can also arise in cases of natural substrate promiscuity, potentially confounding experimental determinations based on simplified oligosaccharide model compounds instead of full-length polymers (44). Additional challenges arise in trying to predict functional specificities from genomic, metagenomic, and/or transcriptomic sequence data for protein families with few experimentally characterized examples.
Bacterial genes encoding cooperative functional activities are often located in close genomic proximity to each other, providing opportunities for coordinated regulation. Well-studied glycohydrolase gene clusters, known as polysaccharide utilization loci (PUL) and carbohydrate-active enzyme gene clusters (CGCs), may also include sulfatases, sugar binding proteins, transporters, and regulatory elements (45,46). PUL and CGCs can be characterized bioinformatically by sequence comparison to the CAZy database of carbohydrate-active enzymes (47) and the associated PUL database (48), using either bacterial isolate genomes or culture-independent metagenomes (23). A large database of sulfatase sequences is also available (49), but its protein family classifications are based on broad evolutionary relatedness, rather than experimentally determined substrate specificities.
The breakdown of naturally occurring sulfated polysaccharides via exported extracellular enzymes is known to occur in cultured bacteria isolated from macroalgal surfaces (50), but the extent to which these observations can be applied to the microbiomes of macroalga-eating vertebrates is unknown. A better understanding of enzyme candidate diversity, species of origin, and subcellular localization can lead to practical applications in aquaculture and animal husbandry: for example, by enhancing digestibility of macroalgal feed sources through probiotic supplements (51–56) and/or reducing ruminant methane contributions to global warming by adding small amounts of seaweed to their diets (57). This information would also be useful in identifying key organisms and enzyme variants for biotechnological processing of marine algal feedstocks for the production of value-added products such as biofuels (58), as well as pharmaceutical production of biologically active glycan compounds such as fucoidans, with medicinally valuable anti-inflammatory, anticancer, antiviral, antioxidation, anticoagulant, antithrombotic, and antiangiogenic effects (43,59).
The current study uses metagenomic analysis to survey microbially encoded proteins related to macroalgal digestion in the gut microbiota of three sympatric Hawaiian reef fishes, Kyphosus vaigiensis, Kyphosus cinerascens, and Kyphosus hawaiiensis (60,61). Networks of expanded CAZyme and sulfatase protein families occurring in close genomic proximity have been used to obtain insights into the diversity, taxonomic distribution, and operon context of microbial enzymes enabling these fish to digest a broad variety of diverse macroalgal polysaccharides.

RESULTS
Taxonomic composition of metagenomic samples. The 16 fish gut metagenomic samples assembled using metaSPAdes yielded a total of 1.478 Tbp of assembled nucleotides in contigs of >2 kb, encoding 1,432,202 predicted proteins (see Table S1B in the supplemental material). Microbial community compositions of these samples were assessed using unassembled reads with Kraken2, enabling relative abundance estimates for eukaryotic, archaeal, and viral taxa as well as bacteria.
High-level taxonomic classifications for individual samples (Fig. 1) were generally consistent with 16S rRNA gene abundances reported from the same kyphosid fish samples (35), although some additional variability was observed between individual fishes that had previously been masked by averaging. Bacteroidota and Bacillota were highly abundant in all hindgut samples, with Alpha-, Beta-, and Deltaproteobacteria, Spirochaetota, Verrucomicrobiota, and Archaea distributed at lower abundances across all gut regions. Interfish differences were observed for Gammaproteobacteria, dominating midgut samples from K. vaigiensis and K. hawaiiensis but not K. cinerascens, and Eukaryota, present at much higher abundances in fish 8 (juvenile K. vaigiensis).
Taxonomic associations of fish gut-associated bacterial clades were further explored using predicted protein sequences of the single-copy DNA-directed RNA polymerase subunit beta (rpoB) gene on assembled contigs. An amino acid sequence tree (Fig. 2) shows the largest numbers of rpoB genes recovered were taxonomically classified as Bacillota (50), followed by Bacteroidota (39), Gammaproteobacteria (19), Spirochaetota (7), Deltaproteobacteria (6), Verrucomicrobiota (5), Alphaproteobacteria (4), and Melainabacteria (1). No rpoB sequences were detected for Archaea, consistent with the low coverage of this taxonomic group observed in unassembled reads. The closest database relatives to kyphosid metagenome rpoB genes suggest that transient ingested environmental bacteria may be more abundant in midgut samples, with hindgut regions containing higher numbers of sequences more closely related to persistent taxa from gastrointestinal compartments in other animal hosts (Table S2). Predicted fish hindgut RpoB proteins from Bacteroidota, Bacillota, Deltaproteobacteria, and Alphaproteobacteria clades most closely matched sequences from terrestrial ruminant MAGs, rather than marine environmental or bacterial isolate genomes. In contrast, midgut-associated Gammaproteobacteria RpoB protein sequences most closely resembled marine environmental Vibrionaceae, including one shallow-branching clade matching Vibrio campbellii at 95.8% amino acid identity (WP_005433641.1). Surprisingly, no matches were found for previously described macroalga-degrading bacteria from the Alteromonadaceae (Gammaproteobacteria) and Flavobacteriaceae (Bacteroidota) families, such as Paraglaciecola agarilytica (23) or Formosa agariphila (22).
Figure legend (rpoB sequence tree). Taxon key: Alphaproteobacteria (4), Bacteroidota (37), Deltaproteobacteria (6), Bacillota (48), Gammaproteobacteria (26), Melainabacteria (1), Verrucomicrobiota. Predicted protein sequences from this study are shown in black, identified by fish number, gut compartment, and Prokka annotation number. Sequences in red are from GenBank nr, with selected species known to digest sulfated macroalgal polysaccharides highlighted in boldface as outgroups. Blue text indicates predicted rpoB sequences from terrestrial ruminant MAGs (not currently included in GenBank as protein sequences) prefixed with GenBank genome identifier codes from reference 38, followed by Prokka annotation numbers from this study (e.g., RUGXXXXX). Additional information about these sequences is provided in Table S2.
Macroalgal components of fish diets. Eukaryotic components of fish gut samples were initially assessed based on the presence of 18S rRNA markers in assembled metagenomic contigs, but this method proved relatively insensitive, detecting only highly abundant taxa. 18S rRNA sequences assembled from metagenomic data were limited to those of kyphosid fish hosts, along with single-celled protozoa (Trichomonadea, Entamoeba, Giardia, and Plasmodium) and multicellular worm taxa (Enenterum, Acanthocephalus, Enoplus, and Opisthadena) often associated with fish pathology (Table S3). Eukaryotic components of gut samples were subsequently analyzed using Kraken2 classifications of unassembled metagenomic reads. Despite relatively low levels of red, brown, and green algal sequences in total metagenomic DNA, the use of very large input data sets (millions of reads for most samples) enabled sensitive quantification of their relative abundances. This methodology was helpful in avoiding potential issues associated with variability in 18S rRNA gene copy numbers, incomplete metagenomic assembly, and potential degradative fragmentation of nucleotide sequences.
Even though the reference sequence library used by Kraken2 does not include exact matches for every possible environmental species of interest, solid taxonomic assignments can be made at broader granularities based on closest available relatives (67). A summary of phylum-level algal matches (Fig. 3) shows that metagenomic reads from red (Rhodophyta) and green (Chlorophyta) algal lineages were more abundant than those from brown algae (Phaeophyceae) in all cases, except the most proximal gut sample from fish 7 (F7GI2). Brown algal abundance declined sharply in more distal gut sections of this fish, consistent with consumption of a meal composed primarily of brown algae during a single, relatively brief time period. Finer-grained examinations revealed that algal sequences from the phylum Phaeophyceae in all fish samples were dominated by matches to the order Ectocarpales (>70% overall), with relatively few examples from Sphacelariales, Dictyotales, Fucales, and Laminariales (Table S4A).
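As an illustration of how relative algal abundances could be tallied from read classifications, the following minimal Python sketch sums clade-level read counts from a Kraken2 report. It assumes the standard six-column, tab-separated Kraken2 report layout (percentage, clade reads, directly assigned reads, rank code, taxonomy identifier, indented name). The function names and the toy report lines are hypothetical and are not taken from this study's pipeline.

```python
# Lineage names as used in the text; matching by exact name is an assumption.
ALGAL_CLADES = ("Rhodophyta", "Chlorophyta", "Phaeophyceae")


def algal_clade_reads(report_lines, clades=ALGAL_CLADES):
    """Return {clade: reads assigned to that clade or its descendants}.

    Assumes the standard Kraken2 report columns:
    percent, clade reads, direct reads, rank code, taxid, indented name.
    """
    counts = {clade: 0 for clade in clades}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        name = fields[5].strip()
        if name in counts:
            counts[name] = int(fields[1])
    return counts


def relative_abundance(counts):
    """Convert read counts to fractions of all algal reads in the sample."""
    total = sum(counts.values())
    if total == 0:
        return {clade: 0.0 for clade in counts}
    return {clade: n / total for clade, n in counts.items()}


# Toy report lines (invented values, for illustration only).
toy_report = [
    " 50.00\t500\t0\tD\t2759\t  Eukaryota",
    "  3.00\t30\t5\tP\t2763\t    Rhodophyta",
    "  1.50\t15\t2\tP\t3041\t    Chlorophyta",
    "  0.50\t5\t1\tC\t2870\t      Phaeophyceae",
]
fractions = relative_abundance(algal_clade_reads(toy_report))
```

Expressing each lineage as a fraction of total algal reads, rather than of all reads, is one possible normalization; other denominators may be more appropriate depending on the comparison being made.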
Red algal matches encompassed a much broader range of taxonomic diversity, including not only macroalgae from the orders Ceramiales, Gracilariales, Gigartinales, and Bangiales, but also sequences most closely related to unicellular forms classified as Cyanidales and Porphyridiales (Table S4B). The most prominent green algal sequences matched unicellular, filamentous, and colonial classes Chlorophyceae, Trebouxiophyceae, and Mamiellophyceae, with much lower representation of parenchymatous lineages from class Ulvophyceae (Table S4C). These results are consistent with prior observations of both kyphosids and other marine reef fish consuming not only macroalgae, but also uncharacterized microalgal assemblages described as "turf algae" and "epilithic algal matrix."
Macroalgal polysaccharide hydrolysis. Functional annotations of predicted kyphosid fish gut microbiome proteins (see Data Set S1 in the supplemental material) were consistent with general metabolic characteristics of microbial taxa related to those observed in Fig. 1, as well as those recently described in Kyphosus sydneyanus (36,70). Although relative abundances of predicted protein functions in metagenomic assemblies do not necessarily correspond with biological activity, expanded numbers of metagenomically predicted genes from specific protein families can indicate potential adaptive capacity.
High levels of structural microheterogeneity in naturally occurring marine macroalgal polysaccharides (15) might be expected to promote expansion and diversification of microbial enzyme families involved in their hydrolysis. To distinguish between marine macroalga-specific enzymes and more general carbohydrate digestion activities common to saccharolytic bacteria from nonalgivorous hosts with high-carbohydrate diets, predicted protein annotations for kyphosid fish gut metagenomes were compared to a set of 391 MAGs from terrestrial ruminant digestive systems (38). The most frequently encountered nonhypothetical gene function descriptions in adult kyphosid fish metagenomes were arylsulfatases (1.9% of all predicted proteins). In contrast, arylsulfatases comprised only 0.13% of annotated proteins in MAGs from terrestrial ruminants with no dietary exposure to macroalgal polysaccharides (Data Set S1).
Although the annotations produced by Prokka are not comprehensive (for example, laminarinases, cellulases, and fucosidases were labeled with only broader, more generic keywords), these results did confirm enrichment of predicted proteins described as agarases, carrageenases, porphyranases, and arylsulfatases in the fish metagenomes (P < 0.05). Carrageenases, porphyranases, and agarases were observed at consistently higher frequencies in hindgut versus midgut samples, compared to more variable compartmental distributions for alginate lyases and arylsulfatases (see Fig. S1 in the supplemental material).
Candidate enzymes for macroalgal digestion were subsequently characterized in more detail by classification according to the CAZy reference database (Fig. 4; Data Set S2). Quantitative family abundances, normalized for total numbers of predicted proteins from each gut compartment and species sample, were used to identify significant protein family expansions and diversifications in the context of total available protein repertoire.
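The normalization described above can be sketched in a few lines of Python. This is an illustrative example only: the scaling factor of 10,000 proteins, the function names, and the toy family count are assumptions, and the sketch does not reproduce the statistical testing used in the study. The only values taken from the text are the arylsulfatase percentages (1.9% in fish versus 0.13% in ruminant MAGs).

```python
def per_10k(family_counts, total_proteins):
    """Family occurrences per 10,000 predicted proteins in one sample."""
    if total_proteins <= 0:
        raise ValueError("total_proteins must be positive")
    return {fam: 10_000 * n / total_proteins for fam, n in family_counts.items()}


def fold_enrichment(rate_a, rate_b):
    """Ratio of two normalized rates; infinite if the family is absent from b."""
    if rate_b == 0:
        return float("inf") if rate_a > 0 else float("nan")
    return rate_a / rate_b


# Hypothetical sample: 50 members of one family among 100,000 proteins.
rates = per_10k({"GH86": 50}, 100_000)  # {"GH86": 5.0}

# Arylsulfatase percentages reported in the text: 1.9% versus 0.13%.
arylsulfatase_fold = fold_enrichment(1.9, 0.13)  # about 14.6
```

Because both percentages share the same denominator type (all predicted or annotated proteins), their ratio gives a rough fold difference without further scaling.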
CAZy families targeting red algal agars (GH117 and GH50), porphyrans (GH86), and carrageenans (GH82 and GH150), green algal ulvans (PL40 and PL38), and nonsulfated alginates from brown algae (PL6_1) were particularly abundant among fish metagenomes. Fish-specific families annotated as chondroitinases (GH88, PL8_1, PL30, and PL29) and heparin lyases (PL13) are most likely associated with the breakdown of host-associated extracellular matrix components. Fish metagenomes also contained large numbers of sequences from families annotated as galactosidases (GH110 and GH165) and xylanases (GH10), as well as several families annotated as hydrolyzing polysaccharides containing multiple different monomers (GH39, GH136, and PL9_2), but potential involvement of these families in macroalgal decomposition could not be unambiguously determined. In agreement with annotation keyword results, the great majority of sequences from CAZy families enriched in kyphosid fish relative to terrestrial ruminants and described as hydrolyzing sulfated macroalgal polysaccharides were obtained from hindgut compartments.
CAZy glycoside hydrolase families are based on shared amino acid sequence similarity, protein structural features, and catalytic mechanisms, but often do not distinguish between enzymes acting on chemically similar substrates, especially those with different branching patterns (47,71,72). Polyspecific CAZy families are particularly common among glycohydrolases annotated as having 1,3-β-glucosidase and/or 1,4-β-glucosidase activities, including laminarinases and cellulases (Table S5). Families GH16_3 and GH16_21, historically described as laminarinases despite having a substantially broader range of actual substrates (73), were slightly more abundant in fish gut than terrestrial ruminant metagenomes, but these differences were not statistically significant. Other CAZy families annotated as hydrolyzing nonsulfated, glucose-containing polysaccharides (e.g., GH3, GH5, GH8, GH9, and GH64) were more abundant in metagenomes of terrestrial ruminants than fish. However, model enzymes in these families hydrolyze not only algal polysaccharides (74), but also nonalgal glycans from bacterial capsule polysaccharides (75) and the cell walls of plants (76) and fungi (77). These ambiguities preclude confident substrate predictions for these particular CAZy families based on sequence data alone.
Two recently described CAZy families (GH107 and GH168) include enzymes experimentally demonstrated to hydrolyze sulfated brown algal fucans (23,43). Although neither of these two families was detected in terrestrial ruminant metagenomes, kyphosid fish samples included several predicted sequences from family GH168 (Fig. 4). Additionally, fucosidase family GH141, present in multiple copies in PUL of known fucan-degrading bacteria (26), was highly expanded in fish gut metagenomes. In contrast, fucosidases from family GH29 were abundant in both kyphosid fish and terrestrial ruminant metagenomes, and family GH95 fucosidases were more abundant in the terrestrial samples, suggesting activity profiles that are not specific for marine macroalgae.
Bacterial export signal sequences were detected in the majority of sequences from fish-expanded CAZy families (Fig. 4A). In Gram-positive bacteria such as Bacillota, proteins containing signal sequences are translocated across the cell membrane and exported to the extracellular environment (78). In Gram-negative bacteria such as Bacteroidota, signal sequences mediate initial translocation across the inner cytoplasmic membrane to the periplasmic space, which may or may not be followed by secretion across the outer membrane (79). Although N-terminal signal sequences could be missing in some predicted proteins due to incomplete metagenomic assembly, potential assembly truncation should not adversely affect reliability of the amino acid similarity comparisons presented in Fig. 4C.

Percent amino acid identities within families GH95 (86%), PL38 (85%), and GH168 (58%) were much lower than those for other families (95 to 99%). These more heterogeneous CAZy families might need to be divided into narrower subfamilies as more experimental data become available in the future. Taxonomic distributions of fish gut-expanded CAZy enzyme families were estimated by protein sequence comparisons to classified relatives in GenBank nr (Fig. 4B). Predicted proteins from families GH168, GH165, and PL40, targeting brown and green algal polysaccharides, along with families PL30, PL29, and PL13, most likely digesting host tissues, exclusively matched database sequences from Bacteroidota-related clades. Predicted carrageenan-degrading family GH82 enzymes were confined to taxa identified as either Bacteroidota or Verrucomicrobiota, while other macroalgal-degrading families also contained some matches to Bacillota and Gammaproteobacteria. Although not all enzyme candidates could be assigned to a particular bacterial taxon, these results confirm the dominant role of Bacteroidota as elite complex carbohydrate digesters as previously described in other environments (80).
Polysaccharide desulfation. Negatively charged sulfate residues in polysaccharides can shield glycosidic bonds from enzymatic cleavage (26), stabilizing carbohydrate backbones against degradation and impeding transport of partially degraded external intermediates into bacterial cells. To identify enzyme families that might help overcome these barriers for marine-specific macroalgal substrates, all predicted metagenomic arylsulfatases were classified according to SulfAtlas database categories (81), and relative abundances of these sulfatase families were compared between kyphosid fish and terrestrial ruminant metagenome samples.
Relative abundances of the most highly expanded SulfAtlas families in fish gut metagenomes are shown in Fig. 5 and Data Set S2. Low or undetectable levels of these families in terrestrial herbivores suggest a high degree of marine specificity (Fig. 5A). As in CAZy families, the majority of taxonomically classifiable sulfatase sequences were associated with Bacteroidota, followed distantly by Verrucomicrobiota and Gammaproteobacteria (Fig. 5B). Only one sulfatase family (S1_27) included members classified as Bacillota. All expanded SulfAtlas families except S1_13, the only group exclusively associated with Gammaproteobacteria, were highly concentrated in hindgut regions. With the exception of family S1_46, most fish gut proteins contained export signal sequences. S1_46 sequences were also unusual in their level of intraclass heterogeneity, with a median amino acid identity of 72% versus 93 to 100% for other expanded SulfAtlas families (Fig. 5C). SulfAtlas families S1_7 and S1_81 have previously been demonstrated to include endo-4S-κ-carrageenan sulfatases (21,42,49,82), while S1_19 family members have shown activity against both endo-4S-κ- and endo-4S-ι-carrageenans (21,83). In this study, metagenomic frequencies for family S1_19 (n = 273) were considerably higher than those for S1_7 (n = 87) or S1_81 (n = 32). These results are consistent with expansion of CAZy families GH82 and GH150, which target the more highly sulfated ι- and λ-carrageenans, rather than families targeting the simpler κ-carrageenans. Further research will be required to determine the role of other, as yet functionally uncharacterized, SulfAtlas families in accommodating sulfated polysaccharide diversity potentially associated with fish microbiomes encountering a wide variety of red macroalgal taxa.

Colocalization of sulfatase and polysaccharide degradation enzyme classes. CAZy and SulfAtlas classified genes occurring in close proximity to each other on the same contig are potential candidates for colocalization within a common PUL. Although not all metagenomic contigs are long enough to encompass full-sized PULs, which can include as many as 25 adjacent genes (48), the median separation distance for the 1,453 unique CAZy/SulfAtlas gene pairs detected in fish gut metagenomes was only 4 genes apart, with 98% separated by 25 or fewer genes (Fig. S2). In contrast, the median intervening distance for CAZy and SulfAtlas family pairs colocalized on terrestrial ruminant metagenomic contigs was 15 genes, with only 62% falling within a 25-gene distance limit.
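The separation statistics above can be illustrated with a minimal Python sketch. Several choices here are assumptions rather than details confirmed by the text: each contig is represented as an ordered list of per-gene family labels (or None for unclassified genes), SulfAtlas families are recognized by an "S1_" prefix, every CAZy/SulfAtlas combination on a contig counts as one pair, and separation is measured as the difference in gene position.

```python
from itertools import product
from statistics import median


def is_sulfatase(label):
    # Assumption: SulfAtlas families are labeled like "S1_19".
    return label.startswith("S1_")


def pair_distances(contigs):
    """Gene-position separations for all CAZy/SulfAtlas pairs on each contig.

    contigs: {contig_id: [family label or None for each gene, in order]}
    """
    distances = []
    for genes in contigs.values():
        sulf = [i for i, g in enumerate(genes) if g and is_sulfatase(g)]
        cazy = [i for i, g in enumerate(genes) if g and not is_sulfatase(g)]
        distances.extend(abs(i - j) for i, j in product(cazy, sulf))
    return distances


def summarize(distances, limit=25):
    """Return (median separation, fraction of pairs within `limit` genes)."""
    if not distances:
        return None
    within = sum(d <= limit for d in distances) / len(distances)
    return median(distances), within


# Toy contigs (invented gene orders, for illustration only).
toy_contigs = {
    "contig_1": ["GH117", None, "S1_15", None, None, "GH86"],
    "contig_2": ["S1_19", "GH82"],
}
summary = summarize(pair_distances(toy_contigs))
```

Counting only the nearest partner for each gene, or counting intervening genes rather than position differences, would shift the reported values slightly, so the definition should be fixed before comparing data sets.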
One striking feature of gene colocalization networks was the presence of higher proximity frequencies between certain enzyme families. These enhanced links suggest potential occurrence within a common PUL that might facilitate coregulation of gene expression. Both linkage frequencies and the diversity of colocalized nodes varied widely between families. These variations were not necessarily proportional to overall metagenomic abundance, as illustrated by comparing the colocalization frequency network centered on CAZy family GH50 (94 metagenomic occurrences) with that of family GH150 (95 metagenomic occurrences) in Fig. 7 and SulfAtlas family S1_29 (75 metagenomic occurrences) with SulfAtlas family S1_30 (63 metagenomic occurrences) in Fig. 8.
Network comparisons also show enormous variation in the number of self-loops formed by neighboring genes from the same class on a single metagenomic contig, suggesting differences in the expansion of particular families by gene duplication (summarized in panels B and C of Fig. 9). The most extreme example of gene duplication was observed in family PL38 glucuronan lyases, where tandem repeats of 7 to 9 closely related sequences predicted to originate from an unknown Spirochaete lineage were found in assembled contigs from both K. cinerascens and K. hawaiiensis samples. Although no closely related proteins have been reported in other Spirochaetes, the PUL Database (48) [...]
Nearly all SulfAtlas families expanded in fish gut metagenomes included some examples of proximity with red algal polysaccharide-hydrolyzing CAZy families, but some were also located near CAZy families predicted to degrade brown and green algal and/or host chondroitin substrates (Fig. 9A). SulfAtlas families S1_28 and S1_25 were predominantly linked to CAZy families hydrolyzing brown algal substrates, while families S1_19, S1_20, S1_27, and S1_30 were more narrowly associated with red algal-digesting enzyme sequences. Families S1_14 and S1_17 were most often associated with green algal ulvan lyases. These distinctively different colocalization patterns suggest promising avenues for future experimental determination of sulfatase family-specific substrate ranges.
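The colocalization frequency networks discussed in this section can be approximated by counting how often two classified families occur within a fixed number of genes of each other on the same contig. The Python sketch below is a simplified illustration under stated assumptions: contigs are ordered lists of family labels (None for unclassified genes), the 25-gene window is borrowed from the PUL size limit mentioned above, and every same-family pair within the window is counted as a self-loop. The function names and toy data are hypothetical.

```python
from collections import Counter


def colocalization_edges(contigs, max_sep=25):
    """Count family pairs found within `max_sep` gene positions on a contig.

    Returns a Counter keyed by alphabetically sorted (family, family) tuples.
    """
    edges = Counter()
    for genes in contigs.values():
        labeled = [(i, g) for i, g in enumerate(genes) if g]
        for a in range(len(labeled)):
            for b in range(a + 1, len(labeled)):
                (i, fam_i), (j, fam_j) = labeled[a], labeled[b]
                if j - i > max_sep:
                    break  # later genes on this contig are even farther away
                edges[tuple(sorted((fam_i, fam_j)))] += 1
    return edges


def self_loops(edges):
    """Edges joining a family to itself, a possible sign of gene duplication."""
    return {a: n for (a, b), n in edges.items() if a == b}


# Toy contig with a tandem repeat (invented, for illustration only).
toy = {"contig_1": ["PL38", "PL38", "PL38", None, "S1_17"]}
edges = colocalization_edges(toy)
loops = self_loops(edges)
```

Edge weights from such counts can be passed to any graph drawing tool; comparing them against overall family abundance helps separate genuine linkage from coincidence among very common families.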

DISCUSSION
Although diet is undoubtedly a key element in determining gut microbial community composition, discovering and quantifying the full variety of dietary items a wild fish might have eaten on a particular day are challenging. Stomach contents are homogenized, fragmented, and conglomerated into amorphous mixtures, making visual identification difficult. Epibionts, biofilms, and marine sediments may be ingested unintentionally along with preferred food items, and relative quantities of the items consumed may vary for individual fish over time periods considerably briefer than those required for digestive system transit. As a result, single postmortem snapshots of stomach contents may fail to capture or recognize the full range of dietary diversity associated with individual fish.
The deep metagenomic sequencing methods used in this study have addressed these issues by quantifying molecular signatures of algal taxa as they passed through the digestive systems of individual fish, minimizing potential observational bias arising from differences in the physical sizes and deterioration states of ingested food. Dietary composition, microbial taxonomy, and predicted enzyme functions were all assessed in parallel for each sample, circumventing potential inconsistencies due to time-dependent variability in host feeding behavior. This approach has also provided evidence suggesting progressive replacement of transient, environmentally sourced microbes such as marine Vibrionaceae in more proximal gut regions with more persistent host-associated taxa during transit to distal regions.
One limitation of the in-depth metagenomic techniques applied in this study is that only one adult representative was analyzed from each of the three different kyphosid species. There is no guarantee that identical dietary profiles would be obtained from other fish of the same species sampled from the same habitat or even the same fish if captured at different times. However, results spanning multiple gut regions in sympatric individuals consuming similar diets have yielded a consistent, high-level picture that greatly extends previous analyses based on observations of stomach contents alone (13,36). These insights will inform the design of future experiments aimed at investigating the relationships between diet and microbial compositions within, as well as between, species and how these relationships might vary between individuals of different ages and sexes.
The microbiomes of wild-caught individuals from three different sympatric kyphosid species revealed consistent connections between particular microbial groups and shared functional capabilities likely to contribute to efficient processing of macroalgal polysaccharides. These connections were highlighted by both expansion of individual enzyme families and their co-occurrence in operon context. Consistent availability of especially diverse red macroalgal dietary items correlated with expanded, marine-specific enzyme families capable of hydrolyzing their characteristic galactose-rich sulfated polysaccharides. Lower metagenomic frequencies were observed for enzyme families targeting brown algal fucans and green algal ulvans, complementing molecular evidence suggesting less frequent consumption of foods containing these polysaccharides.
Previously described methods for detecting polysaccharide utilization loci (PUL) and their operon context rely heavily on sequences from well-characterized taxa (28,86), potentially limiting discovery of novel examples from uncultured or poorly studied environmental taxa with atypical PUL architectures. Quantification of metagenomic enzyme family colocalizations avoids this limitation by identifying statistically verifiable frequency patterns replicated in multiple independent samples, revealing informative associations even in cases where only partial PUL are assembled. This technique has identified relationships among several thousand newly sequenced macroalgal degradation enzyme candidates from kyphosid fish metagenomes, providing valuable tools for exploring evolutionary mechanisms that may be responsible for acquisition and expansion of these capabilities among individual microbial taxa.
Frequently observed extracellular export signals in CAZy and SulfAtlas family proteins enriched in fish relative to terrestrial ruminants, combined with the known complexity and diversity of naturally occurring macroalgal polysaccharides, suggest potential opportunities for cooperative activity on recalcitrant substrates. Exported enzyme cooperativity could include multiple genes originating from the same strain, variants comprising the pangenomic repertoire of closely related strains, or even panmicrobiome diversity arising from widely different taxa within the gut microbial community. Future strategies for testing this hypothesis could include reconstruction of metagenome-assembled genomes (MAGs) from binned metagenomic contigs, transcriptional mapping to determine in vivo gene expression levels, in vitro substrate hydrolysis measurements using combinations of purified enzymes obtained by expression cloning, and comparisons of macroalgal degradation performance by enrichments or defined mixed cultures of live bacteria.
The work presented here extends previous kyphosid microbiome studies (35,36,70) by highlighting taxa likely to be most active in digestion of sulfated macroalgal polysaccharides, identifying potentially extracellularly exported, cooperative networks of macroalgal digestion enzymes, and revealing prospective functional properties of previously uncharacterized sulfatase enzyme families. The discovery of gut compartment-specific microbial adaptations to a diet rich in sulfated macroalgal polysaccharides improves our understanding of the enzymes and organisms involved in host utilization of these molecules, providing foundational resources for future investigations into suppression of coral reef macroalgal overgrowth, the role of gut microbiota in fish physiology, the use of macroalgal feedstocks and probiotics in aquaculture, the inclusion of seaweed supplements in terrestrial ruminant feedstocks, and the use of naturally occurring microbial enzymes in extraction of commercially valuable products from macroalgae.

MATERIALS AND METHODS
Metagenomic sequencing and assembly. DNA samples were isolated from lumen contents distributed over gut compartments from three different kyphosid fish species (see Table S1A in the supplemental material), using previously described collection and processing procedures (35), in accordance with IACUC protocol S12219. Approximately 30 million 250-bp paired-end reads were generated per sample using Illumina NovaSeq 6000 technology. Reads from each individual sample were quality filtered and trimmed using Trimmomatic version 0.36 with the following parameters: adapter-read alignment settings 2:30:10, LEADING:10, TRAILING:20, HEADCROP:12, SLIDINGWINDOW:4:15, and MINLEN:200 (87). Trimmed reads from each sample were then assembled separately using metaSPAdes version 3.13 (88) with a minimum contig retention size of 2,000 nucleotides (nt).
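As an illustration of how the reported trimming parameters might be assembled into a command line, the following Python sketch builds a paired-end Trimmomatic argument list. It assumes that the adapter-read alignment settings 2:30:10 correspond to the ILLUMINACLIP step and that the steps were applied in the order listed above. The jar location, adapter FASTA name, and output file names are hypothetical placeholders.

```python
from typing import List


def trimmomatic_pe_command(
    fwd: str,
    rev: str,
    out_prefix: str,
    adapters: str = "adapters.fa",      # hypothetical adapter FASTA path
    jar: str = "trimmomatic-0.36.jar",  # hypothetical jar location
) -> List[str]:
    """Build a paired-end Trimmomatic argument list with the reported settings."""
    # Paired (P) and unpaired (U) outputs for the forward and reverse reads.
    outputs = [
        f"{out_prefix}_1P.fastq.gz", f"{out_prefix}_1U.fastq.gz",
        f"{out_prefix}_2P.fastq.gz", f"{out_prefix}_2U.fastq.gz",
    ]
    # Trimmomatic applies steps in the order given; this order is an assumption.
    steps = [
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "LEADING:10",
        "TRAILING:20",
        "HEADCROP:12",
        "SLIDINGWINDOW:4:15",
        "MINLEN:200",
    ]
    return ["java", "-jar", jar, "PE", fwd, rev, *outputs, *steps]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` once per sample, which matches the per-sample trimming described above.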
Taxonomic classification. Taxonomic assignments were made for unassembled reads with Kraken2 version 2.1.2 (67) using GenBank nr (accessed August 2021) as a custom reference database. Assembled metagenomic contigs were taxonomically classified using DarkHorse version 2.0_rev09 as previously described (89). DNA-directed RNA polymerase subunit beta (RpoB) protein sequences were retrieved from assembled contigs annotated with Prokka as described below. The closest sequenced database relatives were identified by top matches in blastp searches against a combined database including both GenBank nr entries and predicted proteins from terrestrial ruminant metagenome MAGs (38).
Multiple-sequence alignments of fish gut RpoB sequences of >900 amino acids and their closest database relatives were obtained using MUSCLE version 3.8.31 (90) and used to build phylogenetic trees using FastTree version 2.1.10 (91). Trees were visualized using the R package ggtree version 3.3.5 (92). Fish microbiome 18S rRNA gene sequences were identified by blastn search against the SILVA_138.1_SSURef_NR99 database (93), retaining alignments covering at least 30% of the reference sequence length with E values of 1e-5 or better. These relatively loose parameters were chosen to maximize sensitivity in assigning approximate taxonomies to both full-length and partial, incomplete metagenomic sequences.
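The hit-retention rule for the SILVA search can be sketched as a small Python filter over tabular BLAST output. This is an illustration under stated assumptions rather than the procedure actually used: it assumes a six-column table produced with `-outfmt "6 qseqid sseqid evalue sstart send slen"`, reads the thresholds as at least 30% reference coverage and an E value of 1e-5 or better, and evaluates coverage per alignment without merging multiple alignments to the same reference.

```python
import csv
from typing import Iterable, Iterator, Tuple

Hit = Tuple[str, str, float, float]  # (query, reference, E value, coverage)


def filter_hits(lines: Iterable[str],
                min_ref_cov: float = 0.30,
                max_evalue: float = 1e-5) -> Iterator[Hit]:
    """Yield alignments that pass the reference coverage and E value cutoffs."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        qseqid, sseqid, evalue, sstart, send, slen = row[:6]
        # Subject coordinates are reversed for minus-strand hits, hence abs().
        ref_cov = (abs(int(send) - int(sstart)) + 1) / int(slen)
        if ref_cov >= min_ref_cov and float(evalue) <= max_evalue:
            yield qseqid, sseqid, float(evalue), ref_cov
```

An open file handle can be passed directly as `lines`, since iterating over it yields one table row at a time.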
Gene annotation. Assembled metagenomic contigs were annotated with Prokka version 1.14 (94). Predicted proteins containing microbial extracellular export signals were identified using SignalP version 6.0f (95). Carbohydrate enzyme families were assigned using HMMSEARCH with CAZy version 10 database patterns downloaded from dbCAN2 (96), retaining matches with alignments covering at least 30% of the protein and E values of 1e-15 or better. Sulfatase enzyme categories were determined using the SulfAtlas hidden Markov model (HMM) subfamily classification tool with database version 2.3 (49). Assembled nucleotide sequences for 391 terrestrial ruminant MAGs (38) were downloaded from NCBI BioProject no. PRJEB34458 and annotated using Prokka, SignalP, CAZy, and SulfAtlas, as described above, to enable direct comparisons with assembled fish gut metagenomes.
The dominant microbial community taxa observed in the digestive systems of terrestrial vertebrate herbivores overlap with those of kyphosid marine fish, for example, Alistipes-related Bacteroidota, Clostridia-related Bacillota, and Desulfovibrio-related Deltaproteobacteria (35-38). This overlap provides a context of evolutionarily shared microbial genomic backgrounds that can be used to facilitate discrimination between broad enzyme activities involved in digesting polysaccharides common to all Viridiplantae and those more specific to marine macroalgae. Vertebrate MAGs representing microbiota from domesticated and wild-caught herbivores were selected as potentially informative comparators based on taxonomic similarity of their microbial communities, breadth of potential polysaccharide types consumed, and previously documented similarity in gut transit times (1) and fermentative acetate turnover rates (6).
Metagenomic occurrence frequencies were tallied for CAZy and SulfAtlas database entries and selected Prokka annotation keywords and then grouped into subsets according to the species of origin (for fish, K. vaigiensis, K. cinerascens, and K. hawaiiensis; for ruminants, reindeer, red deer, sheep, goat, cattle, or mixed ruminant assembly) and normalized for total number of predicted protein sequences in each subset. Normalized values were subjected to 1-tailed, homoscedastic t tests using Microsoft Excel version 16.54 to evaluate statistical significance (P values) of observed differences between fish and ruminant data sets. Enzyme families colocalized on the same metagenomic contig were identified using a Unix command-line pipeline of custom Perl scripts available on GitHub (97). Co-occurrence frequencies of enzyme pairs obtained using this pipeline were plotted as edge-weighted network diagrams using Cytoscape version 3.9.1 (98).
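The normalization and the equal-variance comparison described above can be sketched in Python as follows. The function names are illustrative assumptions, and the sketch stops at the pooled-variance t statistic and its degrees of freedom. The 1-tailed P value is the upper-tail probability of Student's t distribution at that statistic, which would be obtained from a statistics package or spreadsheet function.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def normalized_frequency(family_count: int, total_proteins: int) -> float:
    """Occurrences of one enzyme family per predicted protein in a subset."""
    return family_count / total_proteins


def pooled_t_statistic(a: Sequence[float],
                       b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic assuming equal variances (homoscedastic)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df
```

Here `a` would hold one normalized frequency per fish species subset and `b` one per ruminant subset for a given enzyme family, so a positive statistic indicates enrichment in fish.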
Data availability. Sequence reads are available under SRA accession no. SRR19136343 through SRR19136358, assembled contigs under WGS accession no. JAMHIX000000000 through JAMHIZ000000000 and JAMHJA000000000 through JAMHJM000000000, and predicted protein sequences in Zenodo (https://zenodo.org) under DOI no. 10