Correlations of metagenome samples based on available metadata.
We analysed 37 metagenomes from around the world. Physicochemical variables were not available for most of our sediment samples. However, all samples come from shallow sediments at the interface with the water column, for which geographic metadata are known: latitude and longitude, depth in metres below sea level (mbsl), and depth in metres below the seafloor (mbsf).
Twenty samples were taken below 1000 mbsl and 17 above (Fig. 1 and Supplementary Table 1). This allowed us to test for correlations between these variables and the class-level taxonomic abundance matrix.
We used a Mantel test to assess whether taxonomic community structure was correlated with geographic and spatial parameters (Supplementary Table 2). Depth below sea level (mbsl) was significantly and positively correlated with taxonomic dissimilarity (Bray-Curtis dissimilarity matrix), whereas geographic distance and depth below the seafloor (mbsf) were not (Supplementary Table 2). To examine this correlation between water-column depth and taxonomic dissimilarity further, we fitted a linear regression between the two dissimilarity matrices (Supplementary Fig. S1). The low R-squared value (0.0404) indicates that depth below sea level explains little of the variation in taxonomic dissimilarity.
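The Mantel test and the regression R² described above can be illustrated with a minimal sketch, assuming square, symmetric distance matrices (for example Bray-Curtis dissimilarity between class-level abundances, and absolute differences in depth). The sketch uses a one-sided permutation test on the Pearson correlation of the upper-triangle entries; the function names, permutation count, seed and toy data are illustrative choices, not the settings used in this study.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def mantel(d_x, d_y, permutations=999, seed=0):
    """One-sided Mantel test (Pearson r) between two square distance matrices.

    Returns the observed correlation and a permutation p-value for r > 0.
    """
    d_x = np.asarray(d_x, dtype=float)
    d_y = np.asarray(d_y, dtype=float)
    n = d_x.shape[0]
    iu = np.triu_indices(n, k=1)  # each sample pair once, diagonal excluded
    x = d_x[iu]
    r_obs = np.corrcoef(x, d_y[iu])[0, 1]

    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        perm = rng.permutation(n)  # shuffle the sample labels of one matrix
        r_perm = np.corrcoef(x, d_y[np.ix_(perm, perm)][iu])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


def regression_r2(d_x, d_y):
    """R^2 of an ordinary least-squares line fitted to paired distances."""
    d_x = np.asarray(d_x, dtype=float)
    d_y = np.asarray(d_y, dtype=float)
    iu = np.triu_indices(d_x.shape[0], k=1)
    x, y = d_x[iu], d_y[iu]
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    class_abundance = rng.random((12, 8))  # toy samples x classes
    depth_mbsl = rng.uniform(0, 3000, size=(12, 1))  # toy water depths
    bray = squareform(pdist(class_abundance, metric="braycurtis"))
    depth = squareform(pdist(depth_mbsl, metric="euclidean"))
    print(mantel(depth, bray), regression_r2(depth, bray))
```

A significant Mantel r can coexist with a low regression R², as in the results above: the test only asks whether the correlation exceeds what shuffled sample labels produce, not how much variance depth explains.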
We therefore profiled the samples by comparing the taxonomic diversity of each community with its metabolic potential.
To this end, we assigned an oxic or anoxic label to each sample based on its content of genes encoding heme-copper oxygen reductases (HCO) and nitric oxide reductases (NOR). HCOs and NORs are enzymes found in the terminal complexes of many microbial respiratory chains (Sousa et al., 2011). As references, we used four sediments from Loki's Castle labelled as anoxic and one from the South Pacific Gyre labelled as oxic (Supplementary Table 1). Samples scoring above the oxic control were considered oxic, and samples scoring below it were considered anoxic. By this criterion, 18 metagenome samples were classified as oxic and 13 as anoxic. Some of the samples labelled oxic were shallow (less than 1000 mbsl). Although greater water-column depth is associated with thinner sediments, in which organic matter is depleted and oxygen penetrates throughout the sediment, many of our samples were taken from the first centimetres of sediment (0 to 2.23 mbsf), where the community uses oxygen (D’Hont et al., 2015). This could explain why these samples from less than 1000 mbsl score above the oxic control (Fig. 2, Supplementary Table 1).
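The labelling rule above can be sketched as a threshold against the oxic reference sample. This is a minimal sketch, assuming each sample has already been reduced to a single score summarising its HCO and NOR gene content; the text does not specify how that score is computed, so the score, the function name and the sample names are hypothetical.

```python
def label_by_oxic_control(scores, oxic_control):
    """Label samples 'oxic' or 'anoxic' relative to an oxic reference sample.

    scores: mapping of sample name -> oxygen-respiration score. The score is a
        hypothetical per-sample summary of HCO/NOR gene content (for example a
        normalised HCO count); the exact metric is an assumption here.
    oxic_control: name of the reference sample labelled oxic.
    """
    threshold = scores[oxic_control]
    labels = {}
    for name, score in scores.items():
        # At or above the control -> oxic (so the control itself is oxic);
        # below the control -> anoxic.
        labels[name] = "oxic" if score >= threshold else "anoxic"
    return labels


if __name__ == "__main__":
    toy_scores = {"gyre_ref": 0.42, "site_a": 0.65, "site_b": 0.08}  # hypothetical
    print(label_by_oxic_control(toy_scores, "gyre_ref"))
```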
Once HCO and NOR abundance had been established as a condition for the samples, a principal coordinate analysis (PCoA) based on relative taxonomic abundance at the class level (Bray-Curtis dissimilarity matrix) showed a clear separation between samples labelled oxic and anoxic (62.18% of the variance explained by CoA1 and CoA2) (Fig. 3a, Supplementary Table 3). The samples clustered into two groups. The oxic group included samples from the deep Gulf of Mexico (Godoy-Lozano et al., 2018; Zhao et al., 2020), which have been reported to lack hydrocarbon or methane seeps (Zhao et al., 2020). The South Pacific Gyre is the only sample from an oxic, oligotrophic sediment layer (Tully et al., 2016). Samples from Korea and Antarctica are subject to anthropogenic disturbance: the Korean metagenomes come from beach samples, and the Davis Station samples come from shallow, nutrient-rich sediments in which oxygen is consumed within the first centimetres (Leeming et al., 2015). The anoxic group comprised samples from the Gulf of Mexico (Delaware University), the Guaymas Basin, Loki’s Castle, Hydrate Ridge in the Pacific, and the Santa Monica Mounds. These sites have been reported to have seepage of hydrocarbons or related compounds (Zhao et al., 2020), hydrothermal vents with anaerobic metabolism (Jaeschke et al., 2012; Kauffman et al., 2018; Bäckström et al., 2019), and mud volcanoes (Kauffman et al., 2018; Bäckström et al., 2019) (Fig. 3a).
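The ordination behind Fig. 3a can be illustrated with a classical PCoA: Bray-Curtis dissimilarities between class-level relative abundances are double-centred and eigendecomposed, and the variance explained by the first two axes is read from the eigenvalues. The same procedure could be applied to normalised CAZyme counts (Fig. 4). This is a minimal NumPy/SciPy sketch, not the software used in the study; dropping negative eigenvalues and the toy data are assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pcoa(dist):
    """Classical PCoA (metric multidimensional scaling) of a distance matrix.

    Returns coordinates on the axes with positive eigenvalues and the share of
    variance each axis explains. Bray-Curtis is not Euclidean, so some
    eigenvalues can be negative; they are dropped here (one common convention).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    b = -0.5 * centering @ (d**2) @ centering  # Gower double-centring
    eigvals, eigvecs = np.linalg.eigh(b)  # b is symmetric
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-12
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals[keep].sum()
    return coords, explained


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.integers(0, 500, size=(10, 20))  # toy samples x classes
    rel = counts / counts.sum(axis=1, keepdims=True)  # relative abundance
    coords, explained = pcoa(squareform(pdist(rel, metric="braycurtis")))
    print(f"CoA1 + CoA2: {100 * explained[:2].sum():.2f}% of variance")
```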
Given this clear separation, we explored differences in taxonomic composition between oxic and anoxic samples with a LEfSe analysis (Segata et al., 2011) of bacterial abundance at the class level and archaeal abundance at the phylum level (Fig. 3b, Supplementary Fig. S2 and S3).
Alphaproteobacteria were significantly enriched in oxic samples, whereas several bacterial classes were significantly enriched in anoxic samples: Epsilonbacteria, Deltaproteobacteria, Bacilli, Clostridia, Fusobacteriia, Dehalococcoidia, Bacteroidia, Sphingobacteriia, Cytophagia, and Thermodesulfobacteria. Among archaeal phyla, Thaumarchaeota were significant indicators of oxic samples, while Candidatus Bathyarchaeota, Euryarchaeota, and Candidatus Lokiarchaeota were indicators of anoxic samples. This is consistent with the literature: anoxic sediments are enriched in strictly anaerobic groups such as members of the Chloroflexota phylum, sulphate-reducing Deltaproteobacteria, and methanogenic archaea such as Euryarchaeota, while oxic sediments are dominated by the Alphaproteobacteria class among bacteria and the Thaumarchaeota phylum among archaea (Biddle et al., 2008; Orsi, 2018; Hoshino et al., 2020). In our samples, the classes Dehalococcoidia (phylum Chloroflexota) and Deltaproteobacteria, together with other anaerobic classes such as Clostridia, Thermodesulfobacteria, and Fusobacteriia and the archaeal Euryarchaeota, were indicative of an anoxic environment, while the bacterial class Alphaproteobacteria and the archaeal Thaumarchaeota (Tully et al., 2016; Hoshino et al., 2020) were indicative of oxic samples (Fig. 3b).
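LEfSe combines a Kruskal-Wallis test, pairwise Wilcoxon tests and linear discriminant analysis (LDA) to estimate effect sizes (Segata et al., 2011). As a hedged illustration of only its first step, the sketch below tests each taxon for a difference between oxic and anoxic samples and reports which group has the higher mean. It is not a substitute for LEfSe; the threshold, function name and toy data are illustrative.

```python
import numpy as np
from scipy.stats import kruskal


def kruskal_screen(abundance, labels, taxa, alpha=0.05):
    """Per-taxon Kruskal-Wallis test between sample labels.

    abundance: samples x taxa array of relative abundances.
    labels: one label per sample (e.g. 'oxic' / 'anoxic').
    Returns (taxon, p-value, label with the higher mean) for taxa with
    p < alpha. Only LEfSe's first step is mimicked: no pairwise Wilcoxon
    test, no LDA effect size, and no multiple-testing correction.
    """
    abundance = np.asarray(abundance, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    hits = []
    for j, taxon in enumerate(taxa):
        column = abundance[:, j]
        if column.max() == column.min():
            continue  # identical values everywhere: the test is undefined
        samples = [column[labels == g] for g in groups]
        _, p_value = kruskal(*samples)
        if p_value < alpha:
            means = [s.mean() for s in samples]
            hits.append((taxon, p_value, str(groups[int(np.argmax(means))])))
    return hits
```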
In summary, the two groups differed significantly in bacterial classes and archaeal phyla, and these differences match the taxa previously reported for oxic and anoxic marine sediments, as well as the HCO and NOR gene content used to label the samples (Fig. 3b).
CAZyme profile of marine sediments.
We examined the distribution of carbohydrate-active enzymes (CAZymes) within the metagenomes. To do so, we performed a PCoA on normalised counts of all CAZyme modules identified in each metagenome sample. As with beta diversity, the samples separated clearly by oxic and anoxic condition (59.18% of the variance explained by CoA1 and CoA2) (Fig. 4).
Assuming that carbon turnover in marine sediments is carried out by microorganisms that use secreted enzymes to degrade organic carbon stored in the sediment over time (Orsi et al., 2018), we searched for extracellular CAZymes. We annotated CAZyme modules carrying a signal peptide against the CAZy database (Lombard et al., 2013) and assigned the sequences to the six CAZy classes, which are involved in the synthesis, breakdown, and recognition of carbohydrates: glycosyltransferases (GTs), glycoside hydrolases (GHs), carbohydrate esterases (CEs), carbohydrate-binding modules (CBMs), polysaccharide lyases (PLs), and auxiliary activities (AAs). Eighteen extracellular CAZyme modules each accounted for more than 1% of all annotations; together they represented 55.94% of all CAZymes annotated in our metagenome samples. Of these modules, GH109, GH23, and CE1 were the most abundant (Fig. 5a). Their abundance was particularly high in the following metagenomes: Guaymas Basin (GBGOC), Davis Station from Antarctica (DSANT), Korean beaches (KOR), South Pacific Hydrate Ridge (HRSPAC47), Loki’s Castle (LOKART) from the Arctic, Santa Monica Mounds (SMMPAC), and the Gulf of Mexico (CIGOMD18 and KJGOM6) (Fig. 5b).
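Selecting the modules that each exceed 1% of all annotations, and computing the share they cover together, amounts to a frequency filter over the pooled annotations. A minimal sketch, assuming the annotations of signal-peptide-bearing CAZymes have already been reduced to one module label per sequence; the function name and toy counts are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def frequent_modules(annotations, min_fraction=0.01):
    """Modules that each exceed `min_fraction` of all annotations.

    annotations: iterable of CAZy module labels, one per annotated sequence.
    Returns a dict of module -> fraction (largest first) and the combined
    fraction of all annotations covered by the retained modules.
    """
    counts = Counter(annotations)
    total = sum(counts.values())
    kept = {module: n / total for module, n in counts.most_common()
            if n / total > min_fraction}
    return kept, sum(kept.values())


if __name__ == "__main__":
    toy = ["GH23"] * 50 + ["GH109"] * 30 + ["CE1"] * 19 + ["PL1"]  # hypothetical counts
    modules, share = frequent_modules(toy)
    print(modules, f"{100 * share:.2f}% of annotations")
```

With this strict "more than" comparison, a module at exactly 1% (PL1 in the toy data) is excluded.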
The metagenomes had an extracellular CAZyme inventory that primarily targets algal detritus and necromass (Fig. 5b). Prevalent modules involved in breaking down algal debris included the glycoside hydrolase modules GH2, GH3, and GH16_3, the carbohydrate esterase CE1, and the carbohydrate-binding modules CBM9, CBM44, and CBM67. GH2 and GH3 comprise enzyme families with β-galactosidase, β-glucuronidase, β-mannosidase, and exo-β-glucosaminidase activities; these glycoside hydrolases and phosphorylases perform a wide range of functions involving biomass degradation and the remodelling of plant and bacterial cell walls. GH16_3 breaks down laminarin, a carbohydrate found in brown algae (Qin et al., 2017), while CE1 includes acetylxylan esterase (EC 3.1.1.72) and feruloyl esterase (EC 3.1.1.73) activities, as well as many other esterases such as PHB depolymerases. CBM9 and CBM44 bind cellulose and other carbohydrates, mainly xylan, and CBM67 binds L-rhamnose, a carbohydrate produced by microalgae (0–13.3% of algal composition; Brown, 1990) (Fig. 5b) (Lombard et al., 2014).
For necromass degradation, the GH23 and GH103 modules contain families of peptidoglycan lytic transglycosylases, and GH23 has also been found to have chitinase activity. In addition, known activities of the CE4 family include acetylxylan esterases, chitin deacetylases, chitooligosaccharide deacetylases, and peptidoglycan deacetylases, and those of the CE14 family include diacetylchitobiose deacetylase (EC 3.5.1.-) and chitin disaccharide deacetylases (Lombard et al., 2014). Finally, for host glycan degradation, the GH29 module contains α-L-fucosidases, and the GH109 module comprises α-N-acetylgalactosaminidases and β-N-acetylhexosaminidases. GH33 sialidases or neuraminidases (EC 3.2.1.18) target the sialic acid of host glycans (Fig. 5b).
Bacterial communities are documented to dominate shallow sediments, which are composed primarily of clay, the cell envelopes of planktonic organisms, and organic matter (Bienhold et al., 2016). Genes related to the degradation of recalcitrant carbon, including cellulose, chitin, and peptidoglycan, are expected to play an important role in marine sediments (Tully et al., 2016; Bradley et al., 2018; Orsi et al., 2018). When oxidised under oxic or anoxic conditions, necromass can meet the energy demand of up to 13% of the microbial community in shallow sediments. In sediments with low energy resources, the oxidation of one cell per year can supply enough energy to sustain thousands of cells, which could make necromass oxidation a primary carbon source for microorganisms that would otherwise be unable to survive in energy-poor environments (Bradley et al., 2018). Because mineralization and adsorption of biopolymers onto sediment particles may reduce the accessibility of other carbohydrates (Orsi et al., 2018), cell envelope components such as peptidoglycan could be preferred substrates, consistent with the high abundance of peptidoglycan-targeting secreted CAZyme modules (Fig. 5a). Most of these CAZyme modules occur across a broad range of life forms but are concentrated in bacteria (Lombard et al., 2014).
Some of the most abundant modules differed between oxic and anoxic samples. The CAZyme modules GH23, CBM9, GH16_3, GT51, CE4, and CE14 were significantly more abundant in oxic samples, whereas CBM44 and GT83 were significantly more abundant in anoxic samples. Interestingly, despite their opposite distributions, both CBM44 and CBM9 can bind cellulose (Fig. 5c).
Reconstruction of MAG and their potential to degrade carbohydrates found in marine sediments.
To better understand the community involved in carbohydrate turnover in marine sediments, we recovered MAG from each metagenome sample. We reconstructed 494 high-quality MAG (completeness > 75%, contamination < 10%), which were then assigned taxonomically. Alphaproteobacteria, Bacteroidia, and Gammaproteobacteria were among the most represented classes in our MAG (Supplementary Table 4).
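The quality thresholds used to retain MAG can be written as a simple filter. A minimal sketch, assuming completeness and contamination estimates (in percent) are already available for each bin, for example from a CheckM-style tool; the tool, the `Bin` class and the bin names are assumptions, not taken from the text. The strict "> 75%" follows the criterion stated for the soil comparison.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    name: str
    completeness: float  # percent
    contamination: float  # percent


def high_quality(bins, min_completeness=75.0, max_contamination=10.0):
    """Keep bins with completeness > 75% and contamination < 10%."""
    return [b for b in bins
            if b.completeness > min_completeness
            and b.contamination < max_contamination]


if __name__ == "__main__":
    toy_bins = [Bin("bin_a", 92.1, 1.4), Bin("bin_b", 75.0, 3.0), Bin("bin_c", 88.0, 12.5)]
    print([b.name for b in high_quality(toy_bins)])
```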
To assess whether the recovered MAG are involved in carbohydrate turnover in marine sediments, we searched for and annotated in the MAG the CAZymes of the 18 most abundant modules found in the metagenome samples. We focused on secreted CAZyme modules and on CAZyme modules belonging to CAZyme Gene Clusters (CGCs) (Fig. 6, Supplementary Table 5 and Supplementary Table 6).
GH23 was the most abundant module in the sediment MAG, and no MAG contained CBM44. MAG from the classes Alphaproteobacteria, Bacteroidia, and Gammaproteobacteria had more than one extracellular CAZyme module (Fig. 6). Alphaproteobacteria MAG had GH103 and GH23 modules. These MAG belong to the Rhodobacteraceae and Methyloligellaceae families and include genera with species found in marine environments, such as Pseudorhodobacter, Sulfitobacter, Roseicyclus, and Hyphomicrobium (Uchino et al., 2002; Rathgeber et al., 2005; Yoon et al., 2007; Vuilleumier et al., 2011), as well as species of the genus Methyloceanibacter, previously reported from North Sea sediments (Vekeman et al., 2016) (Supplementary Table 4, Supplementary Table 5).
Apart from GT83 and CBM44 (the latter absent from all MAG), each of the searched CAZyme modules was present in at least one Bacteroidia MAG. The main modules found in our metagenome samples (GH109, GH23, and CBM9), along with other abundant modules, were more abundant in MAG of the family Flavobacteriaceae assigned to the genera Maribacter (DSANT06_maxbin.002, DSANT06_maxbin.016, and DSANT95_maxbin.030), Pricia (DSANT95_maxbin.051, DSANT95_maxbin.024, and DSANT95_maxbin.016), Eudoraea (DSANT11_maxbin.017), and Aureibaculum (DSANT06_maxbin.006 and DSANT08_maxbin.008), and in a MAG assigned to the genus Prevotella (DSANT95_maxbin.044) (Fig. 6, Supplementary Table 4, Supplementary Table 5).
Members of the genera Prevotella, Maribacter, and Aureibaculum have been recovered from marine sediments of the Pacific Ocean and the Yellow Sea (Reed et al., 2002; Nedashkovskaya et al., 2004; Zhao et al., 2019). The genus Pricia was previously isolated from sandy intertidal sediment collected on the Antarctic coast (Yu et al., 2012), which is consistent with the site where it was recovered here (Davis Station). Eudoraea species have been isolated from coastal waters of the Adriatic Sea (Alain et al., 2008).
Finally, MAG lacking GH23 modules, from the classes Phycisphaerae, UBA2214, Planctomycetia, and Bacteroidia, contained GH109, GH2, GH29, and CBM67; UBA2214 MAG were also enriched in GH3 modules. The UBA2214, Phycisphaerae, and Planctomycetia MAG were assigned to the Zgenome-0027, Anaerohalosphaeraceae, and Thermoguttaceae families, respectively. Species from these families are found in marine sediments and low-oxygen aquatic environments (Dedysh et al., 2020; Pradel et al., 2020; Chiciudean et al., 2022). Furthermore, four Bacteroidia MAG assigned to the order Bacteroidales showed a similar CAZyme inventory (Fig. 6, Supplementary Table 5).
Because CAZymes are known to work together with other CAZymes and proteins in CGCs, we searched for clusters involving the main CAZyme modules found in our metagenomes.
In general, GH23 modules in CGCs were often accompanied by a CBM50 module, GH3 modules by a CBM6 module, and GH2 modules by CBM67 modules (Supplementary Table 6). CBM50, a module that recognises chitin or peptidoglycan (Ohnuma et al., 2008), has already been reported as abundant in marine sediments (Orsi et al., 2018). CBM6 occurs in xylanases, lichenases, β-agarases, laminarinases, and deacetylases, and CBM67 recognises rhamnose; the substrates targeted by both modules are found in algal material (Lombard et al., 2014).
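Co-occurrence patterns such as GH23 with CBM50 can be tallied by counting, for each target module, the CBMs present in the same CGC. A minimal sketch, assuming the CGCs (for example as predicted by a tool such as dbCAN's CGC-Finder, which the text does not name) have already been parsed into lists of module labels; the input layout, function name and toy clusters are hypothetical.

```python
from collections import Counter


def cbm_partners(clusters, target):
    """Count CBM modules found in the same CGC as `target`.

    clusters: iterable of CGCs, each given as an iterable of CAZy module
        labels (a hypothetical, already-parsed input format).
    """
    partners = Counter()
    for modules in clusters:
        modules = set(modules)  # count each module once per cluster
        if target in modules:
            partners.update(m for m in modules if m.startswith("CBM"))
    return partners


if __name__ == "__main__":
    toy_cgcs = [["GH23", "CBM50", "GT51"], ["GH23", "CBM50"], ["GH3", "CBM6"], ["GH2", "CBM67"]]
    for target in ("GH23", "GH3", "GH2"):
        print(target, cbm_partners(toy_cgcs, target).most_common(1))
```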
MAG belonging to Bacteroidia and Gammaproteobacteria had the highest numbers of CGCs. The Bacteroidia MAG classified as Prevotella (DSANT95_maxbin.044) and the Gammaproteobacteria MAG GCA-001735895 sp009937625 (KOR58_maxbin.012.fasta.contigs.refined) had the most CGCs of all (five); those of Prevotella included GH3, GH23, and GH2 modules, and those of the Gammaproteobacteria MAG included GH23, GH103, GT51, and CE4 modules (Supplementary Table 4, Supplementary Table 5, Supplementary Table 6). The Alphaproteobacteria MAG contained CGCs with GH23, GH103, and GH3 modules, and Gammaproteobacteria CGCs contained CE4, GH103, GT51, CBM9, GH23, and GH3 modules.
Although assembled MAG cannot cover all sediment diversity, we recovered MAG from classes that were abundant in our samples, such as Bacteroidia, Alphaproteobacteria, and Gammaproteobacteria, and these MAG carried CAZyme inventories and CGCs containing the most abundant modules found in our metagenomes. Furthermore, the MAG carrying important CAZyme modules belong to genera or families found in or isolated from marine environments, suggesting that these classes are among the main drivers of carbohydrate transformation in marine sediments. The Bacteroidota phylum is considered the primary phylum for carbohydrate degradation (Lapébie et al., 2019), and all our MAG from this phylum belonged to Bacteroidia. Proteobacteria was the most prevalent phylum in the sediment samples, and most of the taxa we found belonged to Gammaproteobacteria and Alphaproteobacteria (13.88–47.75% and 6.91–33.83% relative abundance, respectively) (Supplementary Table 7, Supplementary Fig. S2).
We identified and analysed MAG from the metagenome samples, shedding light on key players in carbohydrate turnover in marine sediments. These classes carried the most abundant CAZyme modules and CAZyme Gene Clusters (CGCs) associated with carbohydrate degradation in marine environments, and their presence in marine-derived MAG suggests an important role for these taxa in carbohydrate transformation in marine sediments.
This highlights the importance of the Bacteroidota phylum, particularly the Bacteroidia class, in carbohydrate degradation, and the large contribution of Gammaproteobacteria and Alphaproteobacteria to the taxa observed in marine sediment samples.
CAZyme profile of marine sediment taxa vs. soil taxa.
Because Alphaproteobacteria, Gammaproteobacteria, and Bacteroidia had such a rich inventory of CAZymes for carbohydrates found in marine sediments, we explored how the CAZyme inventories of our MAG differed from those of Alphaproteobacteria, Bacteroidia, and Gammaproteobacteria MAG from soil samples published by Nayfach et al. (2021), selected with the same criteria (completeness > 75%, contamination < 10%).
Marine sediments and soils are ecosystems rich in microorganisms and crucial components of the Earth's surface, as they sequester carbon and play a role in carbon recycling (Arndt et al., 2013; Bargett et al., 2014).
The soil MAG from these classes belonged mainly to different families than the sediment MAG we recovered (Supplementary Table 8). A PCoA of the counts of all CAZyme modules in each MAG showed that CAZyme composition was similar within phyla: Alphaproteobacteria and Gammaproteobacteria MAG clustered together, as did Bacteroidia MAG (29.74% of the variance explained by CoA1 and CoA2) (Fig. 7a).
The main differences between sediment and soil MAG lay in the number and diversity of their CAZyme modules: every class of soil MAG had more modules in total (Fig. 7b) and more diverse modules (Fig. 7c) than the corresponding class from marine sediments, where Bacteroidia had the highest counts and diversity of CAZyme modules. This is consistent with a study of MAG across environments showing that CAZyme repertoires are phylogenetically conserved among microbial phyla but also show some habitat specificity, with soil harbouring greater CAZyme richness and diversity than marine environments such as marine sediments (López-Mondéjar et al., 2022). Furthermore, the Bacteroidota phylum, to which the Bacteroidia class belongs, has been reported as the main phylum for carbohydrate transformation, as it uses a large CAZyme inventory (Lapébie et al., 2019).
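Module counts and module diversity per MAG, as compared between habitats above, can be summarised with richness and an entropy-based index. The text does not state which diversity index underlies Fig. 7c; the sketch below assumes the Shannon index purely for illustration, and the function name is hypothetical.

```python
import math
from collections import Counter


def module_richness_and_shannon(modules):
    """Richness and Shannon index H' = -sum(p_i * ln p_i) of CAZyme modules.

    modules: iterable of module labels annotated in one MAG (one per hit).
    """
    counts = Counter(modules)
    total = sum(counts.values())
    shannon = -sum((n / total) * math.log(n / total) for n in counts.values())
    return len(counts), shannon


if __name__ == "__main__":
    toy_mag = ["GH23", "GH23", "GH3", "CBM50", "GH2"]  # hypothetical annotations
    print(module_richness_and_shannon(toy_mag))
```

Richness counts distinct modules only, while the Shannon index also rises when counts are spread evenly across modules, so a MAG can gain diversity without gaining richness.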
This comparison between marine sediment and soil MAG of Alphaproteobacteria, Gammaproteobacteria, and Bacteroidia reveals differences that may reflect the contrasting ecological roles and environmental pressures these bacteria experience in their respective habitats. The higher number and diversity of CAZyme modules in soil MAG are consistent with the idea that soil microbial communities are exposed to a wider variety of organic substrates, including plant biomass, animal detritus, and complex soil organic matter. This diversity of substrates likely drives the need for a broader suite of enzymatic capabilities in soil microorganisms, as reflected in their CAZyme repertoire.
In contrast, marine sediments may be more homogeneous in organic substrate availability, possibly because of the predominance of marine-derived organic matter from phytoplankton and other marine organisms. This could explain why marine sediment MAG have a less diverse CAZyme profile than soil MAG.
Alternatively, marine sediments may be more energy-limited than soil, selecting for organisms that can efficiently degrade the available organic matter with a smaller set of enzymes and leading to a more streamlined CAZyme profile in marine sediment bacteria.
Despite these differences, the fact that Alphaproteobacteria, Gammaproteobacteria, and Bacteroidia MAG from soil and marine sediments cluster by taxonomic group in the PCoA suggests a core set of CAZyme modules conserved within these groups, likely reflecting shared evolutionary histories and core metabolic functions.
This study underscores the importance of considering ecological context when studying the functional capabilities of microbial communities. The marked differences between the CAZyme profiles of soil and marine sediment bacteria show how environmental factors can shape the functional potential of microbial communities, so these factors should be taken into account when studying the ecology and function of microorganisms in different environments.