Particulate organic matter (POM) and dissolved inorganic carbon (DIC) fixation have been proposed as the main trophic basis for heterotrophic microbes in the dark ocean1,7,8,10. However, the contribution of individual organismal groups to the dissolved organic matter (DOM) pool and the role of free-living versus particle-associated microbes in transforming DOM in the deep ocean remain enigmatic1-3. During the last decade, metagenomic and metatranscriptomic surveys revealed the structure and function of marine microorganisms7,8,11-15, providing information on the metabolic potential of bacteria and archaea in the global ocean carbon cycle. However, the link between the genetic potential (metagenomics), the transcriptional response (metatranscriptomics) and the metabolic function (metaproteomics) is not well documented, especially in the deep ocean.
Proteins are essential biomolecules for all living organisms. While their expression level is a direct response to the (micro)environment16-18, their abundance is also a measure of the contribution of individual populations to the total biomass in a specific depth layer of the oceanic water column19. We characterized the structure and function of marine microbial communities using a metaproteomics approach to assess the protein abundance of individual taxa. We also focused on microbial enzyme expression profiles to determine the links between organic matter supply and microbial activities.
Protein profiles of the marine plankton community
We collected 61 metaproteomic samples between 5 m and 4,000 m depth in the major ocean basins and in three size-fractions (>0.8 µm, 0.2-0.8 µm, and <0.2 µm) to cover eukaryotes, bacteria, archaea, and viruses (Fig. 1A, Extended Data Fig. 1A-C, supplementary dataset 1, see Methods). Metaproteomic analysis relies heavily on the completeness of the sequence database; a well-curated gene catalog from global-scale metagenomic and metatranscriptomic surveys11,12,14,15,20 can significantly improve protein identification and functional profiling in the metaproteome. Here we employed an optimized database construction strategy for robust protein identification21. Metagenomic assemblies from the same sampling stations were combined with publicly available metagenomic/metatranscriptomic assemblies from global ocean expeditions11,12,14,15,20 to obtain comprehensive coverage of the organisms in the ocean (including eukaryotes, bacteria, archaea and viruses) throughout the entire water column. A two-step searching approach was implemented to minimize the high false discovery rate in protein identification caused by the large database size22. Searching against this comprehensive database, which covers marine micro-eukaryotes (79,577,878 sequences), prokaryotes (105,585,296 sequences) and viruses (1,501,235 sequences), we identified 234,550 protein entries (Extended Data Table 1, supplementary dataset 2, see Methods). Bacterial (156,187) and eukaryotic (55,494) sequences were highly abundant, followed by archaeal (7,163) and viral (6,213) sequences (Extended Data Table 2). More than 70% of the protein sequences were taxonomically classified at the class level and were also functionally annotated in at least one functional database (Extended Data Fig. 1D, E). While protein counts represent protein occurrence, peptide-spectrum matches (PSMs) are related to protein abundance.
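The logic of a two-step search can be illustrated with a minimal sketch. It assumes the commonly described scheme in which a first-pass search against the full database is used only to build a reduced database, and a second search against that reduced database is filtered by target-decoy false discovery rate (FDR). The names `PSM`, `reduced_database` and `filter_at_fdr`, the score convention and the toy data are hypothetical and are not taken from the pipeline used in this study.

```python
from typing import NamedTuple


class PSM(NamedTuple):
    """One peptide-spectrum match (hypothetical minimal record)."""
    spectrum_id: str
    protein_id: str
    score: float      # assumption: higher score means a better match
    is_decoy: bool    # True if the match is to a decoy (reversed/shuffled) sequence


def reduced_database(first_pass, database):
    """Step 1: keep only sequences of proteins hit by a target PSM in the first pass."""
    hit_ids = {p.protein_id for p in first_pass if not p.is_decoy}
    return {pid: seq for pid, seq in database.items() if pid in hit_ids}


def filter_at_fdr(psms, max_fdr=0.01):
    """Step 2: target-decoy filtering of the second-pass PSMs.

    The FDR at a score cutoff is estimated as decoys / targets among all PSMs
    scoring at or above that cutoff; the lowest cutoff that still satisfies
    max_fdr is used, and only target PSMs are returned.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = accepted_upto = 0
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            accepted_upto = i
    return [p for p in ranked[:accepted_upto] if not p.is_decoy]


# Toy usage: three database entries, of which two are hit in the first pass.
full_db = {"A": "MKT", "B": "MLS", "C": "MVR"}
first = [PSM("s1", "A", 50.0, False), PSM("s2", "B", 10.0, True), PSM("s3", "C", 30.0, False)]
small_db = reduced_database(first, full_db)  # searched again in the second pass
```

The reduced database is much smaller than the full one, so fewer random matches compete with true ones in the second pass, which is the reason such schemes are used with very large sequence collections.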
By investigating the relationship between taxonomic affiliation and protein sequence attributes (protein counts, unique peptides and peptide-spectrum matches), we identified, for the first time, the major contributors to the marine protein pool. Bacteroidetes, Alpha- and Gammaproteobacteria (especially Alteromonadales) from the Bacteria domain dominated the identified protein pool (Fig. 1A). Sequences from prokaryotic autotrophs such as Cyanobacteria and Thaumarchaea were also abundant. Interestingly, SAR11, despite their high abundance throughout the ocean23, contributed only marginally to the metaproteomic dataset (Fig. 1A). Sequences related to Myoviridae were the most abundant viral proteins. Eukaryotic proteins mainly originated from algae and zooplankton (Fig. 1A). Across kingdoms, the analysis detected clear differences in the protein distribution pattern between size-fractions and depth strata (Fig. 1B). For example, more than 45% of eukaryotic proteins were found in the >0.8 µm fraction in the epipelagic realm, bacterial and archaeal proteins dominated the 0.2-0.8 µm fraction throughout the water column, and viral proteins were detected in the <0.2 µm fraction. The functional annotations of sequences from each kingdom varied drastically (Fig. 1C). Bacterial proteins were involved in energy production and cellular metabolism, while eukaryotic proteins were mainly involved in protein synthesis and cytoskeleton interlinking.
Functional assessment of the metaproteome was carried out using an analysis based on KEGG orthologues (KOs)24. Protein sequences were grouped into 3,817 KOs (supplementary dataset 3). The relative abundance of KOs revealed that although the sample size and the number of detected KOs differed among the “omics” datasets (Extended Data Fig. 2A, B), KOs with high abundances in the metagenome and metatranscriptome were also highly abundant in the metaproteome (Extended Data Fig. 2C). The functional composition of the metaproteome, however, was significantly different (Fig. 2A, PERMANOVA, p<0.05) from that of the metagenome and metatranscriptome for both the eukaryotic and the prokaryotic community11-13. Diversity analysis on KOs showed that the metaproteomic dataset had the lowest alpha-diversity but a high beta-diversity (Wilcoxon test, p<0.05, Fig. 2B, C). Clusters of size-fractions were found in the metagenomic and metatranscriptomic datasets (Fig. 2A), as well as at the metaproteome level (Fig. 2D, PERMANOVA, p<0.05), although the metaproteomic samples were collected from disparate ocean regions (Extended Data Fig. 1A). The metaproteomic KO profile in the >0.8 µm size-fraction showed the lowest alpha-diversity but the highest variance (Bray-Curtis dissimilarity, Fig. 2E, F). In contrast, in the <0.8 µm size-fraction, alpha-diversity was highest and variance lowest (Wilcoxon test, p<0.05, Fig. 2E, F). These results suggest that the high variance of the metaproteome is driven by the KO profile in the >0.8 µm size-fraction, where a low within-site diversity of proteins but a high diversity across sites was observed (Fig. 2E, F). As the samples were collected from diverse sites (Extended Data Fig. 1A), differences in epipelagic biogeochemistry might affect the phytoplankton community and generate a high diversity of particles25, which would lead to the high beta-diversity in the >0.8 µm size-fraction. A similar pattern was also found for the prokaryotic metatranscriptomes (Fig. 2B, C).
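The two diversity measures used in this comparison can be made concrete with a small sketch. It assumes that alpha-diversity is expressed as the Shannon index (the text does not specify the index) and computes the Bray-Curtis dissimilarity on relative abundances; the function names and the toy KO counts are illustrative only and do not reproduce the analysis performed here.

```python
import math


def relative_abundance(counts):
    """Convert raw KO counts (e.g. PSMs per KO) into proportions that sum to 1."""
    total = sum(counts.values())
    return {ko: c / total for ko, c in counts.items() if c > 0}


def shannon(counts):
    """Within-sample (alpha) diversity as the Shannon index, H = -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in relative_abundance(counts).values())


def bray_curtis(sample_a, sample_b):
    """Between-sample (beta) dissimilarity.

    For profiles that each sum to 1, Bray-Curtis reduces to
    1 - sum over KOs of min(p_a, p_b): 0 for identical profiles,
    1 for profiles that share no KO.
    """
    pa, pb = relative_abundance(sample_a), relative_abundance(sample_b)
    shared = sum(min(pa.get(ko, 0.0), pb.get(ko, 0.0)) for ko in set(pa) | set(pb))
    return 1.0 - shared


# Toy usage with made-up counts for two hypothetical samples.
site_1 = {"K00001": 90, "K00002": 10}
site_2 = {"K00002": 15, "K00003": 85}
h_site_1 = shannon(site_1)                 # low: one KO dominates the profile
d_sites = bray_curtis(site_1, site_2)      # high: the profiles barely overlap
```

A sample dominated by a few KOs yields a low Shannon index, while two samples dominated by different KOs yield a high Bray-Curtis dissimilarity, which is the combination described for the >0.8 µm size-fraction.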
The changes in protein profiles between samples (Fig. 2B, C) imply that the genome (particularly the prokaryotic genome) responds to the environment by transcribing and translating proteins/enzymes adapted to specific functions at adequate expression levels16,26. This observation is consistent with the view that the transcriptional response12 and the translational regulation of protein synthesis16, rather than protein-coding gene abundance, control the microbial interaction with the environment, such as with particles26,27. In addition, the distinct clustering pattern revealed by the different ‘omics’ analyses (Fig. 2A) suggests strong interactions between microbial activity and ocean dynamics. Large-scale surveys combining ‘omics’ tools with rate/parameter measurements will therefore further advance our understanding of the biogeochemical cycles in the global ocean.
To determine the key enzymes/proteins in the different size-fractions, we identified 1,630 (40% of total KOs) differentially expressed KOs (i.e. KOs whose relative abundance differed significantly) either among size-fractions (1,387 KOs) or among depth strata (412 KOs) of the water column (Wilcoxon test, p<0.05, Extended Data Fig. 3A, S4, supplementary dataset 4). There were 178 KOs differentially expressed in both the depth and the size-fraction comparisons, predominantly originating from eukaryotes (ca. 60%, pie chart in Extended Data Fig. 3A). However, bacteria dominated the KOs differentially expressed only among size-fractions (ca. 70%) or only among depths (55%) (pie chart in Extended Data Fig. 3A). The highest number of differentially expressed KOs was found between the <0.2 µm and 0.2-0.8 µm size-fractions, and the difference between the epi- and bathypelagic was substantial (Extended Data Fig. 3B-D). In contrast, the number of differentially expressed KOs was relatively low between the <0.2 µm and >0.8 µm fractions (Extended Data Fig. 3B). This suggests that the bacterial protein in the <0.2 µm fraction likely originates from the particle-attached community. Functional annotations of microbial protein profiles revealed that photosynthesis, nitrification/denitrification, microbial chemotaxis and motility exhibited different expression profiles between size-fractions (Extended Data Fig. 4, supplementary dataset 4). Fourteen KOs were predicted to be responsible for the functional clustering between the size-fractions using a machine-learning random forest28 classification (Fig. 2G, Extended Data Fig. 5). Among these KOs were enzymes involved in C1 metabolism (carbon-monoxide dehydrogenase, CoxL), CO2 fixation (Rubisco, RbcL), nitrification/denitrification (nitrate reductase/nitrite oxidoreductase, NarG/NxrA), sulfur metabolism (dimethylsulfide dehydrogenase, DdhA) and transporter proteins, all of which differed in relative abundance among the size-fractions (Fig. 2G).
Hence, despite the depth-stratification in the taxonomic composition29, functional differences among the size-fractions of the marine microbiome were observed.
Zooplankton-supported deep-sea POM flux
The taxonomic composition of the metaproteome also exhibited a size-clustering pattern (Table 1, supplementary dataset 5). Eukaryotic and bacterial proteins constituted 70-80% of the total proteome in our metaproteomic dataset, with varying contributions among the size-fractions (Table 1). In the >0.8 µm size-fraction, the ratio between bacterial and eukaryotic proteins (Bac:Euk) was about 1 (Table 1). In the 0.2-0.8 µm size-fraction, however, the Bac:Euk ratio of proteins was 3, and in the <0.2 µm fraction ~5. The increase in the Bac:Euk ratio of proteins towards the smaller size-fractions, particularly in the bathypelagic (Wilcoxon test, p<0.05, supplementary dataset 5), indicates a shift in the source of organic matter, where eukaryotic proteins dominate the particle fraction while bacterial proteins dominate the dissolved protein pool.
By grouping eukaryotic proteins into taxonomic categories (zooplankton, algae and fungi), changes in eukaryotic protein profiles were observed throughout the water column (Fig. 3A, Extended Data Fig. 6, supplementary dataset 6). Although zooplankton-derived proteins exhibited only a weak depth-related trend in the >0.8 µm size-fraction (Wilcoxon test, Epi- vs. Meso-pelagic, p=0.095; Epi- vs. Bathy-pelagic, p=0.111, Fig. 3A, supplementary dataset 5), in the meso- and bathypelagic the relative abundance of zooplankton proteins (ca. 30%) was three times higher than that of algal proteins (5-10%) in both the >0.8 µm and the 0.2-0.8 µm size-fraction (Wilcoxon test, p<0.05, Fig. 3A). This difference is in sharp contrast to the epipelagic, where the relative abundances of algal and zooplanktonic proteins were similar in the >0.8 µm size-fraction (Wilcoxon test, p>0.05, Fig. 3A). In particular, in the >0.8 µm size-fraction, the relative abundance of algal proteins significantly decreased from the epipelagic (median = 13.98%, IQR = 12.65-18.45%) to the bathypelagic layer (median = 6.29%, IQR = 3.99-6.96%) (Wilcoxon test, p<0.05).
Proteins are an essential component of total biomass, and changes in protein source reflect the variation in organic matter supply25. The attenuation of algal proteins with water column depth is consistent with the finding that sinking phytoplankton are insufficient to sustain the deep-sea microbiome1,30. It has been suggested that zooplankton-derived POM (fecal pellets, carcasses) and DOM31-33 become the primary carbon source in the deep ocean, supporting microbial activity in the dark ocean1. This also implies that sequestration mechanisms such as the gravity pump (fast-sinking zooplankton fecal pellets)5,31 and the zooplankton migration pump (living zooplankton)6 substantially contribute to the carbon flux into the deep ocean4,6,33. These changes directly influence the quantity and composition of deep-sea POM such as marine snow25.
The relative abundance of fungal proteins remained fairly low (1-3%) throughout the water column in each fraction although an active role of fungi in marine organic matter cycling has been suggested34,35.
Viral lysis of Gammaproteobacteria shapes the DOM pool
Alpha- and Gammaproteobacteria were the two major heterotrophic bacterial groups, with gammaproteobacterial proteins dominating all three size-fractions, especially the <0.2 µm fraction, in contrast to the 16S rRNA profile (Fig. 3B, Extended Data Fig. 7A, supplementary dataset 7). The ratio between Gammaproteobacteria and Alphaproteobacteria in the metaproteome was significantly higher than the 16S rRNA-based ratio (Wilcoxon test, p<0.05, Extended Data Fig. 7B), suggesting that Gammaproteobacteria substantially contribute to protein production despite their relatively low abundance. Taxonomic analysis showed that while Alteromonadales and Oceanospirillales were the major contributors to gammaproteobacterial proteins in all fractions, the dominating groups in Alphaproteobacteria varied between fractions (Extended Data Fig. 8). This variability led to differences in the ratio of Gamma-/Alphaproteobacterial proteins between fractions (Extended Data Fig. 7B, One-way ANOVA, p<0.05). For example, the cell abundance of the most abundant alphaproteobacterial group in the 0.2-0.8 µm fraction, Pelagibacterales (SAR11), was almost 30 times higher than that of Alteromonadales in the epipelagic (Extended Data Fig. 9A). However, their protein abundance was lower than that of Alteromonadales throughout the water column (Extended Data Fig. 9B). Cell-size measurements showed that deep-sea Alteromonas spp. had larger (1.2 times) biovolumes than Pelagibacterales (SAR11) (Wilcoxon test, p<0.05, Extended Data Fig. 9C, D). Thus, the smaller biovolume of SAR11 than of Alteromonadales resulted in a low protein yield. As proteins account for about 50-60% of bacterial dry weight36, protein abundance can be used as a proxy for microbial biomass and adds value in parameterizing ecological and biogeochemical models37.
Remarkably, gammaproteobacterial proteins dominated (ca. 80%) the <0.2 µm size-fraction (Fig. 3B). While proteins collected in the >0.8 µm and 0.2-0.8 µm size-fractions originated mainly from intact cells, proteins collected in the <0.2 µm fraction consisted of cell-free extracellular enzymes and proteins released from microorganisms. Signal peptides indicate protein/enzyme secretion into the environment38. Ten to 15% of the proteins in the <0.2 µm fraction were associated with signal peptides and hence were actively secreted as cell-free extracellular enzymes (Extended Data Fig. 10A). In contrast, cell-associated extracellular enzymes detected in the 0.2-0.8 µm and >0.8 µm size-fractions varied in their relative abundance (Extended Data Fig. 10B, C), but the functional composition of the cell-free and cell-associated extracellular enzymes was similar (Extended Data Fig. 10D-F). Hydrolytic enzymes accounted for only <20% of the extracellular enzyme pool (Extended Data Fig. 10D-F). Oxidoreductases, involved in the oxidative degradation of algal polysaccharides39, in the production of reactive oxygen species and in mediating metal bioavailability40, contributed >40% to the extracellular enzyme pool (Extended Data Fig. 10D-F). Oxidoreductases dominated both the extracellular and the cytoplasmic enzyme pools (Extended Data Fig. 10G-I), but their composition differed (Extended Data Fig. 11), with cell-free oxidoreductases in the <0.2 µm fraction mainly acting on the CH-OH group (EC 1.1). This functional difference suggests distinct enzymatic activities between extracellular substrate processing and cellular metabolism.
Besides the cell-free enzymes, proteins released from cell decay (proteins without a signal peptide) accounted for 80% of the dissolved protein pool in the <0.2 µm size-fraction (Extended Data Fig. 10A) and were mainly of gammaproteobacterial origin (Fig. 3B). The dominance of Gammaproteobacteria-derived dissolved proteins reflects their primary role in the marine carbon cycle and the importance of viral lysis as one of the major causes of cell death15,41.
Viral proteins constituted 1-5% of the total proteome (Table 1), with Myo-, Podo- and Siphoviridae comprising the majority of the viruses (Extended Data Fig. 12A). Linking viral proteins to their putative hosts (see Methods, Fig. 3C, Extended Data Fig. 12B, C) revealed that the relative abundance of viruses infecting Gammaproteobacteria was highest in the >0.8 µm and 0.2-0.8 µm fractions (viruses reproducing in the cell, Fig. 3C, Wilcoxon test, p<0.05, supplementary dataset 8). In the <0.2 µm fraction (free viruses), however, the relative abundance of Gammaproteobacteria-related viruses was similar to that of Alphaproteobacteria-related viruses (Fig. 3C, Wilcoxon test, p>0.05, supplementary dataset 8). Viral proteins detected in the >0.2 µm fractions (>0.8 µm and 0.2-0.8 µm, Fig. 3C) represented viruses actively infecting host cells42,43; their high relative abundance suggests dynamic lytic activity of Gammaproteobacteria-infecting viruses. This close virus-host relationship in our dataset reflects high lytic activity on Gammaproteobacteria, which likely resulted in the high proportion of gammaproteobacterial proteins in the <0.2 µm size-fraction, where 80% of dissolved proteins originated from cell lysis (Fig. 3B). High viral lysis rates have been observed on marine detrital particles, as a large fraction of deep-sea heterotrophic microbes is preferentially associated with particles10,44. Particle-attached Gammaproteobacteria exhibit high turnover rates41,45,46. Measuring leucine incorporation rates of Alteromonas spp. from bathypelagic layers as a proxy for heterotrophic microbial activity further revealed that Alteromonas spp. were among the most active bacterial cells and accounted for 25-50% of the average leucine incorporation rate in the deep ocean (Extended Data Fig. 9E, supplementary dataset 9). Thus, the lytic infection of active Gammaproteobacteria ultimately leads to an efficient conversion of cellular organic matter into DOM41,47.
We also found high yields of gammaproteobacterial DNA in metagenomes of the <0.2 µm fraction in the epipelagic realm (metagenomes from the Tara Oceans expedition, n=45, supplementary dataset 2)11, with a taxonomic profile similar to that of the metaproteome (Extended Data Fig. 13A). Classification at the order level showed that Sphingomonadales and Rhodobacterales from Alphaproteobacteria (Extended Data Fig. 13B) and Alteromonadales and Oceanospirillales from Gammaproteobacteria (Extended Data Fig. 13C) were the major groups in the metagenomes of the <0.2 µm fraction, which is consistent with our metaproteome data (Extended Data Fig. 8). This further suggests a high turnover rate of Gammaproteobacteria, which is in stark contrast to their cell abundance (Extended Data Fig. 7A). Thus, the high turnover rate together with the high yields of extracellular enzymes indicates Gammaproteobacteria’s dominant role in the dark ocean’s carbon cycle (Table 1, Fig. 3B). A similarly close virus-host interaction was found for Cyanobacteria in the epipelagic layer. Cyanobacteria were most abundant in the >0.8 µm size-fraction (ca. 30%) in the epipelagic and decreased in relative abundance with depth (ca. 2% in the bathypelagic, Fig. 3B, supplementary dataset 6). Along with the cyanobacterial hosts, cyanophages were also abundant in the epipelagic waters (Table 1, Fig. 3C). The relative abundance of the viral photosystem-II (psbA) was similar to the relative abundance of cyanobacterial psbA, especially in the >0.8 µm size-fraction (Fig. 3B, C, Extended Data Fig. 12D), reflecting close virus-host interactions42. These results are consistent with those from metatranscriptomic analyses, where 50% of psbA transcripts originated from cyanophages, confirming the major role of viruses in regulating photosynthetic processes in the sunlit surface ocean48.
Bacteroidetes-derived proteins were also abundant in the epipelagic but decreased in abundance with depth (Wilcoxon test, p<0.05) in both the >0.8 µm and the 0.2-0.8 µm size-fraction (Fig. 3B), consistent with the results of the 16S rRNA analysis (Extended Data Fig. 7A). Recently, it has been shown that high hydrostatic pressure inhibits the metabolism of Bacteroidetes in the deep sea46.
Urea fuels nitrification-mediated dark inorganic carbon fixation
Thaumarchaea and Nitrospinae were the major chemolithoautotrophs in our metaproteome, with the highest relative abundances detected in the 0.2-0.8 µm size-fraction in the mesopelagic (Table 1, Fig. 4A). Protein abundance of Nitrospinae was higher than expected from the 16S rRNA analysis (Fig. 4A, Extended Data Fig. 7C, D) because Nitrospinae cells are larger than those of other bacterial taxa8. In contrast, thaumarchaeal cells have a low biovolume as revealed by microscopic analyses (only 60% of that of SAR11, Extended Data Fig. 9C, D). Ammonia monooxygenase (AmoA) and nitrite oxidoreductase (NxrA) are the key enzymes used by Thaumarchaea and Nitrospinae, respectively, for energy harvesting to fuel dark dissolved inorganic carbon (DIC) fixation. In our metaproteome, we found that the relative abundance of NxrA was almost two orders of magnitude higher than that of AmoA (Fig. 4B), which compensates for the lower abundance of Nitrospinae compared to Thaumarchaea (Extended Data Fig. 7C). Nitrospinae also exhibit one order of magnitude higher DIC fixation rates8 but a lower (three- to four-fold) energy conversion efficiency49 than Thaumarchaea. Thus, a high cell-specific oxidation rate is required for Nitrospinae, and the high expression level of NxrA (Fig. 4B) found in our dataset indicates homeostasis of the nitrogen flux between Nitrospinae and Thaumarchaea49.
The first nitrification step is ammonia oxidation, which provides energy for DIC fixation. Recent reports suggest that, when ammonia supply is insufficient, urea might serve as an alternative ammonium source for Thaumarchaea, as they can use urease for urea cleavage16,50,51. In the 0.2-0.8 µm size-fraction, almost 90% of the urease present in the mesopelagic (Fig. 4C-D) was expressed by Thaumarchaea, which attained their highest relative abundance in the mesopelagic (Table 1, Fig. 4A, Extended Data Fig. 7C). This high contribution of thaumarchaeal urease coincided with the high contribution of zooplankton-derived proteins to the total protein pool in the mesopelagic realm (Fig. 3A), where zooplankton excretion is likely a significant source of urea16,50. Thus, the activity of zooplankton in the mesopelagic ocean might not only provide POM for heterotrophs but likely also supports dark DIC fixation by Thaumarchaea via the release of urea. Notably, cyanobacterial urease was abundant in the >0.8 µm size-fraction (Extended Data Fig. 14) and showed a relatively high abundance in the mesopelagic despite the decreased cell and protein abundance of Cyanobacteria compared to the epipelagic layer (Fig. 3B, Extended Data Fig. 7A). This suggests that sinking Cyanobacteria might provide organic carbon and ammonia to prokaryotes in the mesopelagic52, further alleviating the ammonia deficiency in the dark ocean. In addition, niche partitioning was found in heterotrophic bacterial urea utilization, where urease in the >0.8 µm size-fraction was expressed by Alphaproteobacteria while gammaproteobacterial urease dominated the <0.2 µm size-fraction (Fig. 4D).
Enzymes involved in aerobic respiration (CoxA/CyoA/CcoN/CydA, cytochrome oxidase) were drastically reduced in the >0.8 µm size-fraction in the mesopelagic (Extended Data Fig. 15A, B) compared to the epipelagic waters, but denitrification-related enzymes such as dissimilatory nitrate reductase (NapA/NarG) were abundant in the mesopelagic realm (Extended Data Fig. 15C, D). This supports the notion that detrital particles provide a niche for anaerobic microbial metabolism in the dark ocean53.