Taxonomic composition of metagenomic assemblies
The 16 fish gut metagenomic samples assembled using metaSPAdes yielded a total of 1.478 Tbp (terra base pairs) of assembled nucleotides in contigs > 2 kb, encoding 1,432,202 predicted proteins (Addditional File 1). Microbial community compositions of these samples were initially assessed using unassembled reads with Kraken2, enabling relative abundance estimates for eukaryotic, archaeal, and viral taxa as well as bacteria. High-level taxonomic classifications for individual samples (Fig. 1) were consistent with multi-sample 16S rRNA gene amplification averages previously reported for kyphosid fishes . These observations included midgut dominance of Gammaproteobacteria and hindgut enrichment of Bacteroidota and Bacillota, with lower abundances of Alpha-, Beta–, and Deltaproteobacteria, Spirochaetota, Verrucomicrobiota, and Archaea sequences distributed across all compartments. More detailed examination of individual samples showed that Gammaproteobacteria levels varied widely among midgut compartments, and that eukaryotic-associated reads were present in much lower abundances in adult samples from K. vaigiensis, K. cinerascens, and K. hawaiiensis versus the juvenile K. vaigiensis sample (fish 8).
Divergence among fish gut-associated bacterial clades was further explored using predicted protein sequences encoding the single-copy DNA-directed RNA polymerase subunit beta (rpoB) gene on assembled contigs. An amino acid sequence tree (Fig. 2) shows the largest numbers of rpoB genes recovered were taxonomically classified as Bacillota (48), followed by Bacteroidota (37), Gammaproteobacteria (26), Spirochaetes (7), Deltaproteobacteria (6), Verrucomicrobiota (5), Alphaproteobacteria (4), and Melainabacteria (1). No rpoB sequences were detected for Archaea, consistent with the low coverage of this taxonomic group observed in unassembled reads.
Closest database relatives to kyphosid metagenome rpoB genes suggest that ingested environmental bacteria may dominate midgut sample microbial communities, with hindgut compartments containing closer relatives to gastrointestinal-associated taxa from terrestrial hosts (Additional File 2). Gammaproteobacteria RpoB protein sequences from fish midgut samples most closely resembled marine Vibrio, including one shallowly branching clade matching shrimp-associated Vibrio harveyi, at 99.9% identity (WP_005433641.1). Although Verrucomicrobiota RpoB sequences were quite divergent from previously described relatives, their closest database matches were to marine bacteria of the Kiritimatiellaceae family (Kiritimatiella glycovorans, 78% identity) and algae-associated bacteria from genus Lentimonas (WP_162027363.1, 63% identity). One Spirochaetes clade found in fish midgut samples was distantly related to mammalian pathogen Brevinema andersonii (SFB68422.1, 84% identity), but Brevinema-related taxa have also been observed in 16S rRNA gene amplification studies of trout , tilapia , salmon , seahorses , and kyphosids , as well as environmental metagenomes from sulfidic artesian boreholes . Spirochaete sequences from fish hindguts clustered more closely with a terrestrial ruminant sequence (RUG30335_00745, 72% identity). The closest database relatives to fish gut rpoB sequences from Bacteroidota, Bacillota, Deltaproteobacteria, and Alphaproteobacteria were all MAGs from terrestrial ruminants, rather than marine environmental or bacterial isolate genomes.
Assembled fish gut sequences were also analyzed for the presence of 18S rRNA genes, as markers for eukaryotic organisms (Additional Files 3–4). Teleost 18S rRNA genes were most abundant in samples from juvenile fish F8, consistent with overall taxonomic classifications based on unassembled reads. Non-teleost 18S rRNA gene sequences, found in all fish and distributed across all gut compartments, most frequently matched parasitic single-celled protozoa (Trichomonadea, Entamoeba, Giardia, Plasmodium) and multicellular worm taxa (Enenterum, Acanthocephalus, Enoplus, Opisthadena) often associated with fish pathology. 18S rRNA genes associated with marine algae (Hydrolithon, Scytosiphon), fungi (Stereopsis), and crustaceans (Podocopa) were also detected at low levels, but could potentially have originated from dietary sources.
Protein functional annotations
Although relative abundances of predicted protein functions in metagenomic assemblies do not necessarily correspond with biological activity, expanded numbers of metagenomically predicted genes from specific protein families can indicate potential adaptive capacity. Functional annotations of predicted proteins from fish gut metagenome assemblies (Additional Files 5–10) included both generalized and taxon-specific structural and metabolic features consistent with phylogenetic distributions observed in Fig. 1. Some illustrative examples include midgut enrichment of flagellar motility and chemotaxis genes characteristic of Gammaproteobacteria Vibrionaceae and hindgut expansion of genes encoding Bacteroidota-specific SusC/SusD extracellular attachment proteins, as well as glycoside hydrolases targeting galactose, fucose, and xylose-rich polysaccharides typically found in Bacteroidota and Bacillota species from terrestrial microbiomes. Predicted genes annotated as catalases and superoxide dismutases, suggesting potential resistance to oxidative stress, were observed in bacterial contigs from fish metagenomes in both midgut and hindgut compartments.
The most frequently encountered non-hypothetical gene function descriptions in fish metagenomes were arylsulfatases (1.9% of all predicted proteins in adult fish samples), as might be anticipated from a diet rich in sulfated macroalgal polysaccharides. In contrast, numbers of predicted proteins annotated as dissimilatory sulfite reductases (< 0.007%) were quite low, consistent with low abundances of Desulfovibrio-related Deltaproteobacteria (Fig. 1). Low concentrations of sulfate-reducing bacteria were somewhat unexpected, considering their previously reported expansion under conditions of high sulfate availability in the digestive systems of terrestrial animals .
Although no direct oxygen concentration measurements are currently available for kyphosid fish gut compartments, low levels of genes encoding strictly anaerobic processes and the predominance of relatively oxygen-tolerant fermentative taxa such as Bacteroidota, Bacillota, and Gammaproteobacteria suggest the possibility of microaerophilic oxygen stress as a potential inhibitory factor for more strictly anaerobic species. Another striking difference from terrestrial ruminant microbiomes was the absence of key methanogenesis genes such as methyl-coenzyme M reductase, coupled with low read mapping frequencies to methanogenic archaea. This result is consistent with numerous studies demonstrating inhibition of methanogenesis and suppression of methanoarchaeal growth by red and brown macroalgae, both in vitro  and when used as a dietary supplement for terrestrial ruminants .
Macroalgal polysaccharide hydrolysis
High levels of structural micro-heterogeneity in naturally occurring marine macroalgal polysaccharides  might be expected to promote expansion and diversification of microbial enzyme families involved in their hydrolysis. This hypothesis was initially tested by comparing predicted protein annotations for kyphosid fish gut metagenomes to a set of 391 MAGs from terrestrial ruminant digestive systems , processed using an identical bioinformatic pipeline (Fig. 3, Additional Files 6–10). Although the annotations produced by this pipeline were not comprehensive (for example laminarinases, cellulases, and fucosidases were not explicitly labeled as such), these results did reveal significant enrichment of predicted proteins described as agarases, carrageenases, porphyranases, and arylsulfatases in the fish metagenomes (p-values < 0.05). Starch molecules, typically cleaved by enzymes annotated as amylases, occur in both plants and green algae, but not in the red and brown algae typically eaten by kyphosid fish. Amylase keywords were correspondingly more abundant in terrestrial ruminant metagenomes. In contrast, xylanase annotation frequencies did not differ significantly between the two data sets (p-value = .093), consistent with the observation that xylans of various types occur in red and green algae as well as terrestrial plants. Within fish gut compartments, proteins annotated as carrageenases, porphyranases, and agarases were observed at consistently higher frequencies in hindgut versus midgut samples, in contrast to more variable compartmental distributions for alginate lyases and arylsulfatases (Additional File 11).
Candidate enzymes for macroalgal digestion were further characterized by classification according to the CAZy reference database (Fig. 4, Additional File 12). CAZy classes targeting red algal agars (GH117, GH50), porphyrans (GH86) and carrageenans (GH82, GH150), as well as non-sulfated alginates from brown algae (PL38, PL6_1, PL7_5) were particularly abundant in fish metagenomes. Class PL40 sequences, whose annotated activities include both exo-alginase and ulvanase, were also significantly expanded in fish gut samples, but other CAZy database classes described as ulvanases (PL24, PL25, PL28, PL37) were not. Abundant fish-specific classes annotated as chondroitinases (GH88, PL8_1, PL30, PL29) and heparin lyases (PL13) are most likely associated with breakdown of host-associated extracellular matrix components. Fish metagenomes were significantly enriched in classes annotated as galactosidases (GH110), iduronidases (GH39), and xylanases (GH10), as well as classes GH136, GH16_14, PL9_2, GH102, annotated as hydrolyzing polysacccharides containing multiple different monomers, but potential involvement of these classes in macroalgal decomposition could not be unambiguously determined. In agreement with annotation keyword results, the great majority of sequences from fish-enriched CAZy classes described as hydrolyzing sulfated macroalgal polysaccharides were obtained from hindgut compartments.
CAZy glycohydrolase classes are based on shared amino acid sequence similarity, protein structural features, and catalytic mechanisms, but often do not distinguish between enzymes acting on chemically similar substrates, especially those with different branching patterns [45, 79, 80]. Polyspecific CAZy classes are particularly common among glycohydrolases annotated as having 1,3 β-glucosidase and/or 1,4 β-glucosidase activities, including laminarinases and cellulases (Additional File 13). Classes GH16_3 and GH16_21, historically described as laminarinase subfamilies despite having a substantially broader range of actual substrates , were slightly more abundant in fish gut than terrestrial ruminant metagenomes, but these differences were not statistically significant. Other CAZy classes annotated as hydrolyzing non-sulfated, glucose-containing polysaccharides (e.g, GH3, GH5, GH8, GH9, and GH64) were more abundant in metagenomes from terrestrial ruminants than fish. However model enzymes in these classes include not only those hydrolyzing algal polysaccharides , but also examples that digest non-algal glycans from bacterial capsid polysaccharides  and the cell walls of plants  and fungi . These ambiguities preclude confident substrate predictions for these particular CAZy classes based on sequence data alone.
Two recently described CAZy classes (GH107 and GH168) include enzymes experimentally demonstrated to hydrolyze sulfated brown algal fucans [22, 31]. Although neither of these two classes were detected in terrestrial ruminant metagenomes, kyphosid fish samples included 13 predicted sequences from class GH168. Additionally, fucosidase class GH141, present in multiple copies in PULs of known fucan-degrading bacteria , was highly expanded in fish gut metagenomes. In contrast, fucosidases from class GH29 were abundant in both kyphosid fish and terrestrial ruminant metagenomes, and class GH95 fucosidases were significantly more abundant in the terrestrial samples.
Bacterial export signal sequences were detected in the majority of sequences from fish-expanded CAZy classes annotated as degrading either red and brown macroalgae or host extracellular matrix polysaccharides (Fig. 4a) with the notable exception of class GH95, whose members may act intracellularly. In Gram-positive bacteria such as Bacillota, proteins containing signal sequences are translocated across the cell membrane and exported to the extracellular environment. In Gram-negative bacteria such as Bacteriodota, signal sequences mediate initial translocation across the inner membrane to the periplasmic space, which may or may not be followed by secretion across the outer membrane. Percentages of signal sequence-containing macroalgal hydrolysis enzymes were generally lower in terrestrial ruminant gut metagenomes than kyphosid fishes, although N-terminal signal sequences could be missing in some cases due to incomplete metagenomic assembly. However, potential assembly truncation should not adversely affect reliability of the amino acid similarity comparisons presented in Fig. 4C, which show that median percent identities within classes GH95, PL38 and GH168 were much lower than for other classes (50–85% versus 95–99%). These relatively heterogeneous CAZy classes might need to be divided into narrower subclasses as more experimental data becomes available in the future.
Taxonomic distributions of fish gut-expanded CAZy enzyme classes were estimated by protein sequence comparisons to classified relatives in GenBank nr (Fig. 4b). Predicted proteins from classes GH168, GH165, and PL40, targeting brown algal polysaccharides, along with classes PL30, PL29 and PL13, most likely digesting host tissues, exclusively matched database sequences from Bacteroidota-related clades. Predicted carrageenan degrading class GH82 enzymes were confined to taxa identified as either Bacteroidota or Verrucomicrobiota, while other macroalgal-degrading classes also contained lower percentages of matches to proteins classified as Bacillota and Gammaproteobacteria. Although taxonomies cannot be assigned to all enzyme candidates, especially those from groups that are sparsely represented in public databases, these results confirm the dominant role of Bacteroidota as elite complex carbohydrate digesters, as previously described in other environments .
Negatively charged sulfate residues in polysaccharides can shield glycosidic bonds from enzymatic cleavage , stabilizing carbohydrate backbones against degradation and impeding transport of partially degraded external intermediates into bacterial cells. To identify fish gut-specific enzymes that might help overcome these barriers, all predicted metagenomic arylsulfatases were classified according to SulfAtlas database categories , and relative abundances of these sulfatase classes were compared between kyphosid fish and terrestrial ruminant metagenome samples.
More than half of all arylsulfatases detected in fish gut metagenomes fell into the five most abundant SulfAtlas classes, S1_16, S1_15, S1_17, S1_8, and S1_19 (Fig. 5a). Predicted proteins from these classes were highly concentrated in hindgut compartments, and greatly enriched in fish gut samples relative to terrestrial ruminants. Other sulfatase classes abundant in fish hindguts, such as S1_7, S1_14, S1_11, and S1_22, were present at similar (p-value > 0.05) or higher numbers in terrestrial ruminant metagenomes. The majority of fish gut metagenome sulfatase sequences contained bacterial export signal sequences, but this was not true of ruminant examples from the same classes, except for S1_13, S1_27 and S1_46, where the export signal pattern was reversed. These three classes were also the only ones not dominated by database matches to Bacteroidota-related sequences (Fig. 5b). Class S1_13 matches consisted primarily of Gammaproteobacteria, and more than half of S1_27 sequences were most closely matched to Bacillota relatives. Figure 5c shows that intra-class sequence similarity levels were quite high for most SulfAtlas groups (93–100% median amino acid identity), except classes S1_46 and S1_27 (70–85% median amino acid identity).
Co-localization of sulfatase and polysaccharide degradation enzyme classes
CAZy and SulfAtlas classified genes occurring in close proximity to each other on the same contig are potential candidates for co-localization within a common PUL. Although not all metagenomic contigs are long enough to encompass full-sized PULs, which often include as many as 25 adjacent genes , median separation distance for the 1453 unique CAZy/SulfAtlas gene pairs detected in fish gut metagenomes was only 4 genes apart, with 98% separated by 25 or fewer genes (Additional File 14). In contrast, the median intervening distance for CAZy and SulfAtlas class pairs co-localized on terrestrial ruminant metagenomic contigs was 15 genes, with only 62% falling within a 25 gene distance limit.
CAZy-SulfAtlas class pairs identified by proximity screening can be visualized as a network of nodes, with edges depicting co-localization frequencies (Fig. 6). Similarities and differences in potential PUL associations are highlighted by sub-networks, such as those including only fish-enriched CAZy (Fig. 6b) or SulfAtlas (Fig. 6c) classes, as well as those limited to connections associated with individual enzymes (Figs. 7–8). Co-localization patterns for CAZy and SulfAtlas classes present in both fish gut and terrestrial ruminant metagenomes (Additional File 16) showed little similarity to each other, consistent with microbiome adaptation to different dietary inputs.
One striking feature of gene co-localization networks was the presence of highly concentrated links (thicker connecting lines) between some enzyme classes. Higher proximity frequencies associated with these links suggest more likely occurrence within a common PUL, potentially facilitating co-regulation of gene expression. However, both linkage frequencies and the diversity of co-localized nodes varied widely between individual enzyme classes. These variations were not necessarily proportional to overall metagenomic abundance, as illustrated by comparing the co-localization frequency network centered on CAZy class GH50 (94 metagenomic occurrences) with that of class GH150 (95 metagenomic occurrences) in Fig. 7, and SulfAtlas class S1_29 (56 metagenomic occurrences) with SulfAtlas class S1_30 (47 metagenomic occurrences) in Fig. 8. Network comparisons also show enormous variation in the number of self-loops formed by neighboring genes from the same class on a single metagenomic contig, suggesting differences in the expansion of particular classes via gene duplication. The most extreme example of gene duplication was observed in class PL38 alginate lyases, where tandem repeats of 7–10 closely related sequences were found in several assembled contigs predicted to originate from an unknown Spirochaete lineage.
Detailed comparisons of co-localized enzyme classes can reveal subtle relationships and provide hints about potential metabolic interactions that might benefit from co-regulation. As an example, sulfatase classes S1_16 and S1_8 were frequently co-localized on the same contig, suggesting potential complementary PUL-linked functions within a single genome. Both classes were also frequently paired with polysaccharide-degrading CAZy classes that included those targeting highly sulfated iota- (GH82) and lambda- (GH150) carrageenans, as well as exo-α-galactosidase class GH110, α-1,3-(3,6)-anhydro-D-galactosidase class GH127, and generalized hexosidase class GH2. Links to CAZy classes targeting agarases (GH117, GH50) and porphyranases (GH16_11) occurred much less frequently. A different pattern was observed in terrestrial ruminant metagenomes, where network connections for sulfatase class S1_16 were more evenly distributed over a wider variety of partner classes (Additional File 13), while S1_8 was more strongly paired with sulfatases S1_27 and S1_4 instead of S1_16, along with glycosyltransferases, carboxylesterases, and general glycoside hydrolases like GH2 and GH3.
A matrix summarizing SulfAtlas/CAZy co-localization frequencies (Fig. 9) shows that nearly all SulfAtlas classes expanded in fish gut metagenomes included some examples of proximity with red algal polysaccharide-hydrolyzing CAZy classes, but some were also located near CAZy classes predicted to degrade brown algal and/or host chondroitin substrates. Classes S1_17, S1_14, S1_28, and S1_25, were predominantly linked to CAZy classes hydrolyzing brown algal substrates, while classes S1_19, S1_20, S1_27, and S1_30, were more narrowly associated with red-algal digesting enzyme sequences. These distinctively different co-localization patterns suggest promising avenues for future experimental determination of sulfatase class-specific substrate ranges.
One limitation of metagenomic co-localization analysis is that not all paired enzyme features are equally informative, particularly those including diverse glycohydrolase classes like GH2. At least a dozen different functional activities have been ascribed to this CAZy class, targeting not only β-galactosides, but also β-mannosides, β-glucuronosides, and β-galacturonosides. Another constraint is that predicted proteins from sparsely represented microbial taxa are often encoded on shorter, less completely assembled contigs, reducing the amount of available information about potential neighboring genes. Although this limitation can decrease detection sensitivity for rare enzyme class pairs, infrequent relationships could also be interpreted as less important to overall fish digestive processes than features showing elevated metagenomic linkage frequencies that potentially contribute more to successful metabolic strategies, resulting in higher community abundances for the organisms encoding them.