The HUM-V genome database enabled metaproteomic characterization of river sediment microbiomes
Here we created the Hyporheic Uncultured Microbial and Viral (HUM-V) genomic catalog from Columbia River HZ sediments. We leveraged this resource for metaproteomic peptide recruitment, enabling identification of the community members and their gene expression in these sediments.
We reconstructed 655 bacterial and archaeal metagenome-assembled genomes (MAGs); 102 were medium- or high-quality genomes based on the Genome Consortium Standards (Additional File 2). These genomes were dereplicated into 55 genomic representatives to form the bacterial and archaeal portion of the HUM-V microbial genome database. These dereplicated HUM-V MAGs were distributed across 9 bacterial and 2 archaeal phyla. In terms of new genomic discoveries, 1 genome represented a new order within the Actinobacteriota, and 12 genomes represented 6 new genera from archaeal and bacterial phyla including members of the Thermoplasmatota, Acidobacteria, Actinobacteriota, CSP1-3, Proteobacteria, and Desulfobacterota (Fig. 1b, Fig. 2a).
From the same metagenomic assemblies we reconstructed and reported viral metagenome-assembled genomes (vMAGs), making this one of only a handful of genome-resolved studies that include viral genomes derived from rivers [46–48] and, to our knowledge, the first study to complement these with bacterial and archaeal genomes. We reconstructed 2,482 vMAGs that dereplicated into 111 viral populations > 10 kb in size (Additional File 6). Given the sparse sampling of viruses from river corridors, only 5 of the HUM-V viral genomes had taxonomic assignments using established viral taxonomies from standard reference databases. To better understand if the remaining 95% (n = 105) of viral genomes were completely novel or had been previously detected in similar ecosystems, we repeated the analyses, this time adding 1,861 viral genomes we reconstructed or pulled from public metagenomes from four freshwater sites in North and South America (Fig. 2c, Additional File 6). Of the 105 remaining viral genomes in HUM-V, 15% (n = 17) clustered with these freshwater-derived sequences, indicating a portion of this viral community is shared across diverse geographic and freshwater systems. Of the remaining viral genomes, 23% (n = 26) clustered only with genomes recovered in this data set, indicating multiple samplings of the same virus spatially at this site, while 57% (n = 63) of the viral genomes we sampled were singletons (i.e., only sampled from these sediments once). These results hint at possible cosmopolitan and endemic viral lineages that warrant further exploration.
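As a quick arithmetic check on the clustering breakdown above, the following minimal Python sketch recomputes each reported percentage. It assumes (the text does not state this explicitly) that all three percentages are expressed relative to the 111 dereplicated viral populations. The dictionary keys and the function name are illustrative labels, not names used in the study.

```python
# Counts quoted in the paragraph above; names are illustrative only.
TOTAL_VIRAL_POPULATIONS = 111  # dereplicated viral populations > 10 kb

cluster_counts = {
    "clustered_with_other_freshwater_sites": 17,
    "clustered_only_within_this_dataset": 26,
    "singletons": 63,
}


def percent_of_total(count, total=TOTAL_VIRAL_POPULATIONS):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


# ASSUMPTION: the denominator is the full set of 111 populations.
percentages = {name: percent_of_total(n) for name, n in cluster_counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Under this assumption the rounded values match the percentages quoted in the text for each of the three categories.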
HUM-V recruited viral and microbial peptides from our HZ sediment metaproteomic dataset (n = 33 lateral- and depth-resolved samples) (Fig. 2b, d, Additional File 5). Across all sediment samples, microbial genomes recruited 13,102 total peptides to ~1,300 proteins in HUM-V, with 68% of these proteins uniquely assigned to a single microbial genome. For viruses and microbes alike, the most abundant genomes were not necessarily those most actively expressing proteins. The highest-ranked microbial members by abundance included the Nitrospiraceae genus NS7, Binatia, and Nitrososphaeraceae genus TA-21 (Fig. 2b), yet only the Nitrososphaeraceae had high proteomic recruitment (15%). Similarly, some low-abundance members (e.g., members of the Actinobacteria) accounted for a large share of the uniquely assigned proteome relative abundance (49%) (Fig. 2b). As with our microbial dataset, 66% of the viral genomes encoded genes that uniquely recruited peptides (Fig. 2d). This exceeded prior viral metaproteome recruitment from other environmental systems (e.g., wastewater, saliva, rumen; 0.4–15% [49–51]); thus, we infer a relatively large portion of the viral community was active at the time of sampling. While microbial and viral activity did not appear to be structured by transect, sediment depth, or geochemical conditions, these two assemblages were coordinated with one another (Additional File 3: Figure S6). One explanation for this lack of geochemical or spatial structuring is that the microbial heterogeneity in these samples occurred over a finer spatial resolution (pore or biofilm scale) than the bulk 10 cm depths sampled; alternatively, these HZ sediment microbiomes may be metabolically robust to the small but significant changes in chemistry measured across spatial gradients (Additional File 3: Figure S6, Additional File 3: Figure S7, Additional File 7).
Microbial cross feeding of organic carbon is likely sustained by aerobic respiration
It is well recognized that microbial carbon oxidation in HZ sediments largely contributes to river respiration, yet the microbial food webs underpinning this process have yet to be documented. Consistent with resazurin (raz) data (see Additional File 3: supplementary methods) that indicated these sediments were oxygenated and supported aerobic microbial respiration (Additional File 3: Figure S7), all but one of the microbial genomes recovered from this site encoded aerobic respiration machinery, including a complete electron transport chain and a cytochrome oxidase (Additional File 3: Figure S1). Proteomic evidence for aerobic respiration (cytochrome c oxidase aa3) was detected in nearly all samples, but was assigned only to a few members of the Nitrososphaeraceae. However, given limitations with detecting membrane cytochromes, we consider it likely this metabolism was more active than was captured in proteomic data, as we failed to find any evidence for anaerobic metabolisms (e.g., methanogenesis).
While the overall carbon content of these sediments was low (< 10 mg/g) (Additional File 7), our FTICR-MS analysis indicated that plant litter could be an important substrate, as lignin-like compounds were the most abundant biochemical class detected (Additional File 3: Figure S5, Additional File 8). In support of this, from our metagenomes, 38% of the HUM-V genomes encoded genes for potentially degrading phenolic/aromatic monomers, while 10% could degrade the larger, more recalcitrant polymers (Additional File 2). Gene expression of carbohydrate-active enzymes (CAZymes) also supported the degradation of plant polymers like starch and cellulose via extracellular glucoamylase (GH15) and endo-glucanase (GH5) from an actinobacterial genome (Microm_1) and the Nitrososphaeraceae (Nitroso_2), respectively (Fig. 3). In summary, many types of chemical and biological data reveal that heterotrophic, aerobic metabolism in these low carbon sediments is likely maintained by inputs from decomposition.
Given the capacity for plant polymer decomposition (e.g., lignin, cellulose, and starch) across HUM-V genomes, we next tracked the microbial fate of the degradation products of these metabolisms, including sugar monomers, short chain fatty acids, and carbon dioxide (Fig. 3, Additional File 2, Additional File 5). Metabolites detected by NMR included sugars (e.g., glucose, sucrose, and trehalose), which could be the result of depolymerization of plant-derived polymers, and we confirmed that the CAZymes needed to use these substrates were also expressed in situ. NMR also detected organic acids (acetate, butyrate, lactate, pyruvate, propionate) and alcohols (ethanol, methanol, isopropanol), with proteomics supporting the usage of acetate and methanol by an Anaeromyxobacter MAG (Anaerom_1) and Woeseia (Woese_1), respectively. Here our metabolite and proteomic data indicated that plant biomass degradation supports sequential metabolic handoffs that lead to carbon dioxide production.
Carbon dioxide production and consumption is widely encoded by HUM-V microorganisms
In addition to carbon dioxide being generated from the heterotrophic metabolisms described above, our proteomics revealed that carbon dioxide could arise from the aerobic oxidation of carbon monoxide (CO). Genes for aerobic CO dehydrogenases (from Actinobacteria, Binatia, and CSP1-3 genomes) were among the most expressed in these sediments. Analogous to findings from soil systems, it is possible that atmospheric carbon monoxide is a major energy source supporting persistent aerobic heterotrophic bacteria in environments where organic carbon is scarce or dynamic. Based on the genomic inventory of these HUM-V genomes, we posit that Binatia, CSP1-3, and Micromonosporaceae are capable of carboxydotrophy, while Actino_1 is a carboxydovore, using CO metabolism as a supplemental energy or possible carbon source during starvation.
Since heterotrophic respiration and carbon monoxide oxidation would generate carbon dioxide in these sediments, we next tracked microorganisms that could use this carbon source autotrophically (Fig. 3, Additional File 2, Additional File 5). The ability to fix carbon was prevalent, encoded by 75% of HUM-V microbial genomes. In fact, this metabolism was represented by multiple fixation pathways from 18 different lineages, demonstrating both functional and taxonomic redundancy. Specifically, this includes 4 different pathways (Calvin-Benson-Bassham cycle, reductive TCA cycle, 3-Hydroxypropionate/4-Hydroxybutyrate cycle, 3-Hydroxypropionate bi-cycle) from members of nitrifying lineages (Thaumarchaeota and Nitrospirota) (discussed below), as well as from organisms with heterotrophic capabilities like Binatia, CSP1-3, Proteobacteria, Woeseiaceae, and Acidobacteria (Additional File 3: Figure S1, Additional File 2). Collectively our multi-omics data suggest that sediment microbial respiration is likely decoupled from river respiration, since some microbially produced carbon dioxide would be lost to supporting autotrophy. Our research further resolves the carbon economy in HZ sediments, implying that the net effect of carbon dioxide emissions from rivers could depend on the balance between carbon dioxide production from heterotrophy and carbon monoxide oxidation and its consumption by autotrophs.
Microbial metaproteomics supports theoretical inferences derived from geochemistry
The ratio of total elemental carbon (C) to total nitrogen (N) (i.e., C/N) is a geochemical indicator often used to assess the possible microbial metabolisms that can be supported in an ecosystem [55, 56]. Here the C/N ratios of these sediments were low relative to other sediments, at 6.4 ± 1.1 across the samples (Additional File 7). Biogeochemical theory posits that C/N values less than 15 would indicate rapid microbial mineralization of organic nitrogen to release inorganic nitrogen. This theory also states that C/N ratios less than 10 may indicate ammonium is released to the surrounding environment, allowing sufficient concentrations to simultaneously support the assimilatory needs of heterotrophs and the energy needs of nitrifiers, allowing for their co-occurrence. Our multi-omics data offered a new opportunity to substantiate these geochemical inferences by profiling the possible substrates and microbial activity of nitrogen mineralizers and nitrifiers in river sediments.
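The two C/N thresholds described above can be written as a simple decision rule. The sketch below is only an illustration of that reasoning, using the threshold values (15 and 10) and the reported mean and standard deviation (6.4 ± 1.1) from the text. The function name, constant names, and output labels are assumptions introduced here, and real sediments would of course not follow such sharp cut-offs.

```python
# Thresholds as stated in the text; all names below are illustrative.
MINERALIZATION_THRESHOLD = 15   # C/N below this: rapid organic N mineralization
AMMONIUM_RELEASE_THRESHOLD = 10  # C/N below this: ammonium released to surroundings


def interpret_cn_ratio(cn_ratio):
    """Map a sediment C/N ratio onto the qualitative predictions described above."""
    if cn_ratio <= 0:
        raise ValueError("C/N ratio must be positive")
    predictions = []
    if cn_ratio < MINERALIZATION_THRESHOLD:
        predictions.append("rapid mineralization of organic nitrogen")
    if cn_ratio < AMMONIUM_RELEASE_THRESHOLD:
        predictions.append("ammonium release supporting heterotrophs and nitrifiers")
    return predictions


# Reported sediment values: mean 6.4 with standard deviation 1.1.
mean_cn, sd_cn = 6.4, 1.1
for value in (mean_cn - sd_cn, mean_cn, mean_cn + sd_cn):
    print(f"C/N = {value:.1f}: {interpret_cn_ratio(value)}")
```

Because even the mean plus one standard deviation (7.5) falls below 10, both predictions apply across this range, which is the expectation the metaproteomic analyses that follow set out to test.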
Given the prevalence of ammonium in all 33 sediment samples (0.28–11.22 µg/gram) (Additional File 3: Figure S8, Additional File 7), we next examined our metaproteomic data for peptidases, enzymes that could contribute to the mineralization of organic nitrogen into amino acids and free ammonium. Hinting at the relevance of this metabolism, the gene expression of peptidases (n = 31) was 3 times more abundant and prevalent than that of glycoside hydrolase genes modulating organic C transformations (Additional File 5). In support of active microbial N mineralization, hydrophobic, polar, and hydrophilic amino acids were prevalent (more so than sugars) in the 1H-NMR characterized metabolites (Additional File 3: Figure S8).
We focused our analyses on the putative extracellular peptidases, as these were most likely to shape organic nitrogen pools in the sediment. We categorized expressed peptidase families as either amino acid releasing (terminus cleaving, e.g., M28) or peptide releasing (endocleaving, e.g., S08A, M43B, M36, M04) (Fig. 4, Additional File 5). Linking these expressed peptidases to our genomes, members of the Actinobacteria, Thermoproteota, Methylomirabilota, and Binatia are likely candidates for driving the mineralization of organic N. We then profiled amino acid transporters that were expressed, revealing uptake of branched chain amino acids, glutamate, osmoprotectants, spermidine/putrescine, and peptides (Fig. 4). This profiling indicated both synergy and competition for this organic N resource in these sediments.
We propose that in HZ sediments extracellular peptidases are a shared public good whose cost of production is assumed by certain individuals with benefits to the entire community. In some cases, taxa that mineralized organic N were consumers of the resulting products, as genomes in the Actinobacteria and Binatia expressed extracellular peptidase genes and the genes for transporting the organic N products (Fig. 4, linkages shown). In other instances, members of the Proteobacteria, Thermoplasmatota, and CSP1-3 could be functioning as cheater cells that expressed only genes for intracellular transport and benefitted from peptidases produced by others. Our findings reinforce that cooperative interactions based on cross-feeding and public goods are likely at the core of many processes relevant to organic carbon (Fig. 3) and nitrogen (Fig. 4) cycling in these sediments.
Consistent with established conceptual geochemical theory, we showed that the low C/N ratios (< 10) of these sediments supported not only mineralization, which could be a source of free ammonium, but also nitrification. Supporting this, ammonium was detected in all sediments (average concentration 2.6 µg/gram of sediment) (Additional File 3: Figure S8, Additional File 7). Proteomics confirmed ammonium (NH4+) oxidation to nitrite was performed by archaeal Nitrososphaeraceae (formerly Thaumarchaeota), with ammonia monooxygenase proteins being among the most prevalent and highly expressed functional proteins (top 5%) across this dataset (Additional File 5). The next step in nitrification, nitrite oxidation to nitrate, was inferred from nitrite oxidoreductase peptides assigned to 5 genomes belonging to 2 new species (Nitro_40CM-3_1, Nitro_NS7_3, Nitro_NS7_4, Nitro_NS7_5, and Nitro_NS7_14) (Additional File 3: Figure S9, Additional File 5, see sheet metabolism info). Both nitrifying lineages had the capacity for carbon dioxide fixation, with the reductive tricarboxylic acid (TCA) cycle (e.g., ATP-citrate lyase) encoded in Nitrospiraceae genomes and the 3-Hydroxypropionate/4-Hydroxybutyrate (3HP/4HB) cycle encoded by the Nitrososphaeraceae. We did not detect genomic evidence for comammox or anammox; thus, aerobic, chemolithoautotrophic nitrification supported by a metabolic partnership between bacteria and archaea occurred in the presence of heterotrophs, as predicted by C/N ratios.
Similarly, others have reported the prominence of nitrifying lineages from the archaeal thaumarchaeotal Thermoproteota and bacterial Nitrospirota both by 16S rRNA and genome-resolved metagenomics [12, 13] in HZ sediments. Here we nearly doubled the genomic sampling of these river nitrifiers, assigning unique gene expression patterns to 3 and 17 genomes from Nitrososphaeraceae and Nitrospiraceae, respectively, including the first genomic sampling of new genera and species (Fig. 2). Our co-expression data indicate that metabolic handoffs between archaeal ammonia oxidizers and bacterial nitrite oxidizers may be an unaccounted-for biogenic source of nitrate in these sediments (Additional File 5). This suggests the activity of nitrifiers could be an underappreciated modulator of nitrous oxide fluxes from oligotrophic HZ sediments, both through their indirect stimulation of denitrifiers and their own contributions to this greenhouse gas flux. Taken together, the archaeal-bacterial nitrifying mutualism outlined here appears well adapted to the low nutrient conditions present in many HZ sediments, warranting future research on the variables that constrain nitrification rates (i.e., ammonium availability, dissolved oxygen, pH) and their role as a driver of nitrogen fluxes from these systems.
Denitrification is encoded by novel and taxonomically diverse lineages in HZ sediments
Beyond the possible biogenic sources of nitrate we identified from nitrification, these HZ sediments receive significant allochthonous nitrate from groundwater. When river stage decreases, groundwater discharges through the HZ sediments, bringing nitrate concentrations to over 20 mg/L [2, 62]. In support of an important influence of nitrate from either source, HUM-V genomes with the capacity for nitrate reduction spanned diverse taxonomies, with NarG or NapX encoded in 11 genomes from the Actinobacteriota, Binatia, Gammaproteobacteria, and Myxococcota (Additional File 3: Figure S1). However, proteomic evidence for nitrate reduction was detected in less than 10% of the 33 sediment samples, with unique peptides assigned to Binatia NarG from a single sample.
Based on gene expression data, we inventoried other steps in the denitrification pathway. Nitrite reduction to nitric oxide was attributed to both nitrifiers and denitrifiers: archaeal ammonia oxidizers of the Nitrososphaeraceae, active in 79% of metaproteome samples, and Gammaproteobacterial Burkholderia in a single sample, respectively. The role of nitrite reduction by Nitrososphaeraceae is still under investigation but could be used for detoxification. Genes for converting nitric oxide to nitrous oxide were not detected in proteomics, but we did find evidence that the Desulfobacterota genome (Desulf_UBA2774_1) expressed the nos gene for reducing nitrous oxide to nitrogen gas. Phylogenetic analysis suggests this organism used a "Clade II" nos sequence type adapted for low atmospheric concentrations of nitrous oxide (Additional File 3: Figure S2) and, consistent with our genome metabolic summary, did so without encoding other steps of the denitrification pathway. Notably, the capacity for denitrification exists beyond that detected in proteomics, as Binatia encoded dissimilatory nitrite reduction to ammonium (DNRA) and the potential for nitrous oxide production via nor was encoded by two Gammaproteobacteria (Steroid-FEN-1191_1, Steroid_1) and a member of the Myxococcota (Anaerom_1).
In summary, our data add to the growing realization that complete denitrification by a single microorganism is likely the exception rather than the rule in natural systems, including the HZ. In support of this, none of the genomes reconstructed here encoded a complete denitrification pathway for reducing nitrate to nitrous oxide or dinitrogen gas (Additional File 3: Figure S1). Similarly, our proteomics data hinted that separate microbial members likely catalyzed each step of the denitrification pathway (Additional File 3: Figure S4). This suggests cross-organism inorganic nitrogen exchange would be necessary for nitrogen gas flux, such that physical processes (e.g., advection, diffusion) or the spatial colocalization of microorganisms, as well as organic carbon availability, may have disproportionate impacts on the flux of nitrous oxide and dinitrogen from these sediments.
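The distinction drawn above, between a pathway that is complete within one genome and one that is complete only across a community, can be illustrated with a small presence/absence check. The sketch below uses entirely hypothetical genomes and gene assignments (`genome_A`, `genome_B`, `genome_C`); it does not reproduce the HUM-V inventory in Additional File 3: Figure S1, and the step labels and function names are assumptions chosen for illustration.

```python
# Ordered denitrification steps, labeled by commonly used marker genes:
# nitrate -> nitrite (nar or nap), nitrite -> NO (nir),
# NO -> N2O (nor), N2O -> N2 (nos).
DENITRIFICATION_STEPS = ("nar_or_nap", "nir", "nor", "nos")

# HYPOTHETICAL toy inventory, not data from the study.
toy_inventory = {
    "genome_A": {"nar_or_nap", "nor"},
    "genome_B": {"nir"},
    "genome_C": {"nos"},
}


def complete_denitrifiers(inventory, steps=DENITRIFICATION_STEPS):
    """Return genomes that encode every step of the pathway on their own."""
    return [name for name, genes in inventory.items() if set(steps) <= genes]


def community_covers_pathway(inventory, steps=DENITRIFICATION_STEPS):
    """Return True if every step is encoded by at least one genome."""
    pooled = set().union(*inventory.values()) if inventory else set()
    return set(steps) <= pooled


print("Single-genome complete denitrifiers:", complete_denitrifiers(toy_inventory))
print("Pathway covered by community:", community_covers_pathway(toy_inventory))
```

In this toy case no single genome is a complete denitrifier, yet the pooled community covers every step, which is the situation that would make nitrogen gas flux depend on exchange of intermediates between organisms.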
HUM-V identifies new microbial and viral players in hyporheic zone carbon and nitrogen cycling
The creation of a genome database expanded upon prior amplicon-based surveys, allowing us to assign new metabolic functions to microbes and even viruses in hyporheic sediments. While HUM-V contains genomes from phyla (CSP1-3, Eisenbacteria) and classes (Binatia, MOR-1 in Acidobacteria) composed entirely of uncultivated members (Fig. 1d), here we focus our analysis on the Binatia, as we recovered 7 genomes (one of which included a complete 16S rRNA gene), they recruited peptides, and they played key roles in carbon and nitrogen cycling. Using the 16S rRNA gene (from Binatia_7), we inventoried the distribution of species closely related to our HUM-V genomes (> 97% similarity) in Sequence Read Archive (SRA) samples. These relatives were detected in soils, as well as a wide variety of terrestrial, terrestrial-aquatic, and marine samples (Fig. 5), indicating the processes uncovered by proteomics here are likely applicable to a wide range of ecosystems.
A recent comparative genomics analysis of Binatota MAGs provided a first assessment of their metabolic potential, indicating genes for methylotrophy, alkane degradation, and pigment production were distributed across the phylum. These HUM-V genomes belong to a class and family denoted UBA9968. Contrary to this prior metabolic inventory, HUM-V UBA9968 MAGs do not encode the potential for methanol oxidation, and we identified a new role in the decomposition of aromatic compounds from plant biomass (phenylpropionic acid, phenylacetic acid, salicylic acid) and xenobiotics (phthalic acid) (Fig. 5). We provide the first proteomic evidence for any members of the Binatia, supporting their roles in aerobically oxidizing carbon monoxide, producing extracellular peptidases, and in denitrification. Together these findings illustrate the power of HUM-V paired proteomes to illuminate new roles for members of uncultivated, previously enigmatic lineages in HZ carbon and nitrogen cycling.
The relatively high proteomic recruitment of viruses sampled in HUM-V (Fig. 2d) suggested important viral contributions in these sediments. In silico analysis assigned a putative host to 29% of the 111 viral genomes, linking them to 18 microbial genomes that belong to bacterial members in Acidobacteriota, Actinobacteriota, CSP1-3, Eisenbacteria, Methylomirabilota, Myxococcota, Nitrospirota, and Proteobacteria (Additional File 3: Figure S10, Additional File 2, Additional File 6). Analysis of the metaproteomes for these phage-impacted microorganisms revealed these hosts expressed genes for nitrification (Nitrospiraceae) as well as carbon monoxide oxidation and nitrogen mineralization (Actinobacteria) (Fig. 6). Additionally, HUM-V phage genomes encode auxiliary metabolic genes with the potential to enhance microbial metabolism of carbon (CAZymes), sulfur (sulfate adenyl transferase), and nitrogen (amidase to cleave ammonium) (Additional File 3: Figure S11, see Additional File 3 supplemental text). We also show viral abundances were better predictors of total carbon and nitrogen percentages relative to microbial genome abundances (Additional File 3: Figure S12, Additional File S9, see Additional File 3 supplemental text). Together, these HUM-V enabled results indicate viral infections may contribute to river sediment functioning and raise the question of whether enhanced viral interrogation might provide a means to improve ecosystem or biogeochemical models in these systems.