Functional traits of particle-associated microbial assemblages in the low-latitude Pacific Ocean

Background
Organic particles are hotspots for microbial activity and serve as sites of organic matter mineralisation in the water column of marine systems. In nutrient-limited surface water, degradation of organic matter and nutrient regeneration by marine microbes are crucial. Although free-living (FL) bacteria vastly outnumber those on particles, particle-associated (PA) bacteria can reach locally higher concentrations. Accordingly, to better understand marine microbial ecosystems, it is important to elucidate the differences between PA and FL environmental sample fractions not only in microbial community structure but also in functional traits. In a previous study, we demonstrated that the Genomaple (formerly MAPLE) system could successfully differentiate the functional potentials and diversity of contributors to each function in four metagenomic datasets generated by the Global Ocean Sampling expedition. Hence, we used this system to highlight functional traits in PA microbial assemblages.
Results
The PA and FL fractions could be distinguished from one another both by their taxonomic compositions, inferred from ribosomal proteins, and by the relative abundance of module functions. Module functions that were more abundant in PA assemblages than in FL assemblages were shared between both subtropical gyres, and the taxonomic compositions of their contributors were similar. Bacterial transport systems associated with adhesive molecules used to form microbial assemblages on particulate organic matter were more abundant in the PA fractions. Bacterial regulatory system elements for C4-dicarboxylate transport and modules for B-vitamin biosynthesis were also abundant in PA assemblages, suggesting mutualistic relationships between bacteria and algae involving the exchange of nutrient sources. In contrast, module functions related to amino acid biosynthesis and bacterial transport systems for inorganic nitrogen, phosphorus, and urea were significantly more abundant in the PA assemblages of the more oligotrophic North and South Pacific subtropical gyres than in those of the eastern equatorial Pacific.
Conclusions
Comprehensive functional metagenomic analyses based on functional abundance revealed notable functional traits of PA assemblages related to cell adhesion and nutrient acquisition, enabling these microbes to survive in subtropical regions that are more oligotrophic than the equatorial regions.

Background
Although bacteria can grow rapidly when appropriate nutrients are available, some still grow slowly under optimum conditions. Such microbes are usually present in oligotrophic environments, and are rarely present in copiotrophic environments [1]. Subtropical and tropical ocean surface water is nutritionally poor due to the low productivity of such ecosystems, which are characterised by strong thermal stratification. In particular, the southern Pacific is extremely oligotrophic, with nutrient concentrations that are undetectable by standard methods [2]. Phosphate concentrations of the North Pacific subtropical gyre (NPSG) are lower than those of the South Pacific subtropical gyre (SPSG) [3,4].
By contrast, inorganic nitrogen and phosphorus concentrations in the equatorial (EQ) Pacific regions are relatively high as a result of upwelling [5]. Nevertheless, this is a high-nutrient, low-chlorophyll area: primary productivity is insufficient to exhaust macronutrients owing to the limited supply of iron, which is received as dust from Eurasia [6,7].
Organic particles are hotspots for microbial activity, and microbes play critical roles in the degradation and remineralisation of both sinking [8,9] and suspended particulate organic matter (POM) [10,11]. However, the physiological differences between particle-associated (PA) and free-living (FL) assemblages remain poorly understood [12,13]. In pelagic ecosystems, the primary and most important process of POM formation is photoautotrophic fixation of CO2 into biomass, usually by phytoplankton. Particles associated with microbes constitute a significant fraction of sinking particles and thus contribute to the biological carbon pump [14][15][16]. In nutrient-limited surface water, degradation of POM and nutrient regeneration by marine microbes are crucial and more directly linked to phytoplankton primary production than in nutrient-replete ocean areas. Thus, phytoplankton development affects the community structure and dynamics of marine bacteria through changes in nutrient availability, algal exudates, and biological surfaces [17,18]. Although FL bacteria vastly outnumber those on particles, PA bacteria can reach locally higher concentrations [19]. Accordingly, to better understand marine microbial ecosystems, it is important to elucidate the differences between PA and FL environmental sample fractions not only in microbial community structure but also in functional traits. Although most studies of bacterial community structure do not distinguish between PA and FL assemblages, several have compared microbial diversity between PA and FL fractions at individual marine sites [17,20-23].
Metagenomic approaches have been increasingly recognised as a starting point for understanding microbial functions and community structures [24,25]. However, many microbial community analyses simply infer bacterial community structure from potentially biased polymerase chain reaction (PCR) amplicons of 16S rRNA sequences, even though gene copy number can vary even within the same species [26]. By contrast, we have confirmed the effectiveness of using ribosomal proteins, instead of problematic 16S rRNA sequences, to identify organisms and analyse community structure [27]. A major unrealised goal of metagenomics is the inference of both species diversity and potential functionomes, defined as comprehensive sets of biological functions arising through combinations of the coding genes underlying such functions, e.g., carbon fixation, nitrogen fixation, and amino acid metabolism [28,29]. In practice, only a fraction of genes corresponding to specific metabolic pathways and transporters have been used to characterise habitats functionally. To resolve this problem, we developed Genomaple (formerly MAPLE) [27][28][29], software for evaluating metabolic and physiological potential, which estimates comprehensive and detailed functional potential based on the completion ratio of every functional module defined by the Kyoto Encyclopedia of Genes and Genomes (KEGG) [30] and the abundance of complete modules. Because Genomaple has successfully differentiated the functional potentials and diversity of contributors to each function in four metagenomic datasets generated by the Global Ocean Sampling (GOS) expedition [27], it should be capable of revealing differences in functional traits between PA and FL microbial assemblages.
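To make the idea of a module completion ratio concrete, the following Python sketch scores one module against the set of KEGG orthologs (KOs) detected for a genome or taxonomic bin. It assumes a simplified module model in which a module is an ordered list of steps, a step is satisfied if every KO in at least one of its alternatives is present, and the completion ratio is the fraction of satisfied steps. This is an illustrative approximation, not Genomaple's actual scoring code: real KEGG module definitions contain nested alternatives and optional components, and the KO names below (`K_A`, `K_B`, ...) are hypothetical placeholders.

```python
from typing import FrozenSet, List, Set

# Hypothetical simplified model: a step is a list of alternative KO sets;
# the step is satisfied if every KO in at least one alternative is present.
Step = List[FrozenSet[str]]


def module_completion_ratio(module: List[Step], observed_kos: Set[str]) -> float:
    """Return the fraction of steps in `module` satisfied by `observed_kos`."""
    if not module:
        raise ValueError("module must contain at least one step")
    satisfied = sum(
        1 for step in module if any(alt <= observed_kos for alt in step)
    )
    return satisfied / len(module)


# Toy three-step module with placeholder KO names:
# step 1 needs K_A; step 2 needs K_B or K_C; step 3 is a two-subunit complex (K_D + K_E).
TOY_MODULE: List[Step] = [
    [frozenset({"K_A"})],
    [frozenset({"K_B"}), frozenset({"K_C"})],
    [frozenset({"K_D", "K_E"})],
]

if __name__ == "__main__":
    # Missing subunit K_E leaves step 3 unsatisfied, so the ratio is 2/3.
    print(module_completion_ratio(TOY_MODULE, {"K_A", "K_C", "K_D"}))
    # Every step covered: the module counts as complete (ratio 1.0).
    print(module_completion_ratio(TOY_MODULE, {"K_A", "K_B", "K_D", "K_E"}))
```

Under this model only modules scoring exactly 1.0 would count as "complete"; a partial complex (one subunit present) does not satisfy its step.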
However, it remains unclear how and to what extent PA and FL functional potentials and diversities of microbial consortia differ among the Pacific subtropical gyres and equatorial regions. Accordingly, a functional metagenomic approach has become indispensable for detecting differences in functional potentials and their host organisms, both between PA and FL assemblages and among NPSG, SPSG, and EQ assemblages within the respective PA and FL fractions.

In this study, our Genomaple analyses [27,28], based on the functional modules responsible for pathways and molecular complexes, enabled a novel, comprehensive, and detailed functional characterisation of the PA and FL fractions. Here, we provide comparative analyses of functional abundance and a reanalysis of prokaryotic community structure based on ribosomal proteins instead of problematic 16S rRNA sequences. We place special emphasis on the functional traits of the PA fraction in the subtropical regions and on the biodiversity contributing to the functional abundance of the highlighted modules. We also focus on differences in the functional traits of the PA fraction between subtropical and equatorial regions.

Results and Discussion
Comparison of community structure between PA and FL fractions.
We performed prokaryotic community structure analyses of metagenomes obtained in the NPSG, SPSG, and EQ regions (Additional File 1: Fig. S1) based on ribosomal proteins (Additional File 2: Table S1). The relative abundance of bacteria in the PA fraction of each region exceeded 96%, and the ratio of bacteria to archaea did not differ significantly between the NPSG and EQ regions (Fig. 1a). However, the relative abundance of archaea in the SPSG region was extremely low (0.3-0.8%) in comparison with the other regions. The FL fraction showed a similar tendency to the PA fraction, except that archaeal abundance in the EQ region (2.6-3.6%) was about 2-fold higher than in the NPSG region.
Among bacterial community members, α-proteobacteria predominated across all samples, and their relative abundance was lower in the PA fraction (23.9-45.8%) than in the FL fraction (36.1-57.3%) (Fig. 1b and Additional File 3: Table S2). Cyanobacteria were the second most abundant members in both NPSG fractions, except for the PA fraction from site NPSG1; by contrast, in the SPSG, Cyanobacteria ranked second only in the FL fraction of site SPSG1. Thus, relative abundances differed remarkably between the NPSG and SPSG sites, even though both are oligotrophic and subtropical. The relative abundance of Bacteroidetes in PA assemblages was more than twice that in the FL fraction in almost all regions, and their abundance in the NPSG was about half that in the other regions (Fig. 1b). We previously performed community structure analyses based on ribosomal proteins of metagenomes GS030 and GS031 [27], obtained near the Galápagos Islands by the GOS expedition [36]. The community structure at these GOS sites, which are near site EQ1, was mainly composed of α-proteobacteria, γ-proteobacteria, and Bacteroidetes, but the relative abundance of Cyanobacteria was very low (< 2%), unlike in the FL fraction of site EQ1 (14%). The GOS expedition used a 0.8-μm filter for prefiltration, rather than the 3.0-μm filter used in this study, and the smaller pores may have trapped the longer, narrower Cyanobacteria cells more efficiently, removing them from the analysed fraction. Non-metric multidimensional scaling (nMDS) analysis of the compositional patterns of prokaryotic ribosomal proteins revealed that community structure differed between the FL and PA fractions, and this difference was supported by Bray-Curtis dissimilarity (Fig. 2a). In addition, within each fraction (PA and FL), the difference in prokaryotic community structure among the subtropical and equatorial regions was also statistically significant, as in the former analysis (Fig. 3a).
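Bray-Curtis dissimilarity, the measure underlying the nMDS comparisons above, is the summed absolute difference between two abundance profiles divided by their summed totals. The minimal Python sketch below assumes each sample is a vector of relative abundances over the same ordered set of taxa; the sample names and values are hypothetical, and the function names are illustrative rather than taken from any particular package.

```python
from typing import Dict, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i); 0 = identical, 1 = nothing shared."""
    if len(u) != len(v):
        raise ValueError("profiles must cover the same taxa in the same order")
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / denominator


def dissimilarity_matrix(profiles: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Bray-Curtis dissimilarities between named samples."""
    names = list(profiles)
    return {a: {b: bray_curtis(profiles[a], profiles[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Hypothetical relative abundances of three taxa in two samples.
    toy = {"PA_site": [0.5, 0.3, 0.2], "FL_site": [0.4, 0.2, 0.4]}
    print(dissimilarity_matrix(toy)["PA_site"]["FL_site"])  # approximately 0.2
```

The resulting matrix is the input that ordination (nMDS) and group tests such as ANOSIM or PERMANOVA operate on.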
In most studies, amplicon sequences of 16S rRNA genes are used to compare microbial community structures among different environments [37]. Recently, this PCR-based amplicon approach has generally targeted the V4 region because different 16S rRNA gene regions yield varying degrees of accuracy in taxonomic assignment [38]. However, 16S rRNA copy number varies among prokaryotic species, and the copy numbers of individual uncultivable microbes present in natural environments cannot be determined. Because taxonomic compositions based on amplicon sequences are strongly influenced by copy number and PCR bias, this approach is suboptimal for community structure analysis. We used three types of primer sets containing the V3-V4 region to assess the variability among results, and found that the patterns of bacterial composition in the PA and FL fractions of the NPSG, SPSG, and EQ regions varied significantly depending on the primer set [2]. The relative abundances of the four main bacterial taxa (i.e., α-proteobacteria, Cyanobacteria, γ-proteobacteria, and Bacteroidetes) in both fractions of each sampling site differed among primer sets by factors of 1.2-6, and for Verrucomicrobia, the difference was several tens of fold because these species were barely detected by the V1-V2 primer set (Table 1). Copy numbers of the rrn operon in α-proteobacteria, Cyanobacteria, γ-proteobacteria, and Bacteroidetes range from 1 to 11 (mean, 2.7), 1 to 6 (mean, 2.6), 1 to 21 (mean, 6.2), and 1 to 13 (mean, 3.4), respectively [26].
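The effect of copy number alone can be illustrated with a small worked example using the mean rrn copy numbers quoted above. The Python sketch assumes a hypothetical community in which each of the four taxa contributes exactly 25% of cells, that every gene copy is amplified equally (no PCR bias), and that every cell carries its taxon's mean copy number; real communities violate all three assumptions, so this shows only the direction and scale of the distortion.

```python
from typing import Dict

# Mean rrn operon copy numbers quoted in the text (from [26]).
MEAN_RRN_COPIES: Dict[str, float] = {
    "Alphaproteobacteria": 2.7,
    "Cyanobacteria": 2.6,
    "Gammaproteobacteria": 6.2,
    "Bacteroidetes": 3.4,
}


def expected_amplicon_shares(cells: Dict[str, float], copies: Dict[str, float]) -> Dict[str, float]:
    """Share of 16S amplicons per taxon if every gene copy is amplified equally (no PCR bias)."""
    weighted = {taxon: cells[taxon] * copies[taxon] for taxon in cells}
    total = sum(weighted.values())
    return {taxon: w / total for taxon, w in weighted.items()}


if __name__ == "__main__":
    # Hypothetical community: each taxon contributes 25% of cells.
    equal_cells = {taxon: 1.0 for taxon in MEAN_RRN_COPIES}
    for taxon, share in expected_amplicon_shares(equal_cells, MEAN_RRN_COPIES).items():
        print(f"{taxon}: {share:.1%}")
```

In this idealised case γ-proteobacteria would make up about 42% of amplicons (6.2/14.9) despite being 25% of cells. Reversing the calculation would require the copy numbers of the uncultivated taxa actually present, which is exactly the information that is unavailable; ribosomal proteins, encoded by single-copy genes, avoid this step.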
A comparison of the relative abundance of each bacterial taxon based on ribosomal proteins with amplicon-based relative abundances revealed that although some estimates were similar, the results of the different amplicon primer sets were not consistent with one another. For example, δ-proteobacteria were detected by all types of amplicons in the PA fraction of NPSG2, but were hardly detected based on ribosomal proteins in either fraction from any region. Furthermore, the amplicons clearly overestimated the relative abundance of Bacteroidetes in the PA fractions of the EQ2 and SPSG2 sites.

Localisation pattern and biodiversity of archaea
Among archaeal community members, Marine group II Euryarchaeota (MGII) [39,40] predominated among archaeal ribosomal proteins in both PA and FL fractions of all regions; SPSG samples showed especially high MGII relative abundances, exceeding 95%, and about 80% of MGII sequences belonged to MGIIA (Fig. 4a). In NPSG samples, by contrast, marine group III Euryarchaeota (MGIII) accounted for about 30% and 20% in the PA and FL fractions, respectively, and MGIIB predominated in both fractions. MGIII was also detected in EQ samples, but at a lower relative abundance than in the NPSG samples (16.6% and 6.8% in the PA and FL fractions, respectively). As in the SPSG samples, MGIIA predominated in EQ samples, but the archaeal community structure of this region was intermediate between those of the NPSG and SPSG regions.
Although some uncultivated marine Euryarchaeota can adhere to particulate organic matter [41], we observed no significant differences in archaeal community structure between the PA and FL fractions. Most archaeal ribosomal proteins assigned by Genomaple in each metagenome had uncultured draft genomes as their top hits in homology searches. In all regions, some of these ribosomal proteins shared more than 90% identity with proteins of Euryarchaeota archaeon TMED252 (MGIIA) and TMED215 (MGIIB) [34], and in the NPSG and EQ regions, with those of MGIII Euryarchaeota CG-Epi1 and CG-Epi3 [42] (Fig. 4b). However, many ribosomal proteins reached only 80% identity with any other draft genome, indicating greater marine euryarchaeal diversity than expected, as reported previously for the NPSG region [41].

Functional discrimination based on relative module abundance.
Using Genomaple, we analysed a total of 18 metagenomic sequence datasets from the PA and FL fractions of each sampling site (Additional File 4: Table S3). We recorded the module completion ratio (MCR) by individual taxonomic rank (ITR; see Methods), the number of ITRs with completed modules (i.e., the diversity of ITRs completing each module), and the sum of the relative module abundances of each ITR-completed module (Additional File 5: Table S4). To compare functional traits among samples, we used the relative abundances of modules completed in all metagenomic samples, and we also examined the composition of ITRs responsible for completing each module.
First, we performed statistical analysis based on the pattern of relative module abundance of each sample to confirm that the FL and PA assemblages were functionally distinct, as their microbial community structures were. The nMDS analysis of Bray-Curtis dissimilarity showed that the prokaryotic functional potential of the FL fraction differed significantly from that of the PA fraction (Fig. 2b). Next, we examined, in the same manner, the significance of differences among sampling sites within the same fraction (i.e., FL or PA). In the FL fraction, the NPSG, SPSG, and EQ regions were significantly differentiated (Fig. 3b), although site SPSG1, whose prokaryotic compositional pattern resembled that of site EQ3 (Fig. 1b), deviated from the other SPSG sites. In the PA fraction, the regions were likewise significantly differentiated, in a pattern similar to that of the FL fraction, as supported by ANOSIM and PERMANOVA. These results indicate that the PA and FL fractions each had distinct predominant functions and that their microbial assemblages were independent of each other. We further expected that the PA and FL fractions would have respective functional characteristics reflecting the environmental factors at each sampling location. Therefore, in the next step, we performed comparative analyses to highlight the predominant physiological functions characterising the PA fractions, as well as the subtropical regions.
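ANOSIM, one of the two tests named above, ranks all pairwise dissimilarities and compares the mean rank of between-group pairs with that of within-group pairs; R near 1 means samples are more similar within groups than between them, and significance comes from shuffling group labels. The pure-Python sketch below is a minimal illustration of that standard statistic, not the software used in this study; the distance matrix and labels are hypothetical, and PERMANOVA is not shown.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def anosim_r(dist: Sequence[Sequence[float]], groups: Sequence[str]) -> float:
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (n(n-1)/4)."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    if not within or not between:
        raise ValueError("need >= 2 groups and at least one group with >= 2 samples")
    r_within = sum(within) / len(within)
    r_between = sum(between) / len(between)
    return (r_between - r_within) / (n * (n - 1) / 4)


def anosim(dist: Sequence[Sequence[float]], groups: Sequence[str],
           permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Observed R and a permutation p-value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    labels = list(groups)
    at_least_as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if anosim_r(dist, labels) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical Bray-Curtis matrix: two PA and two FL samples, perfectly separated.
    d = [
        [0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.1],
        [0.9, 0.9, 0.1, 0.0],
    ]
    r, p = anosim(d, ["PA", "PA", "FL", "FL"], permutations=199)
    print(f"R = {r:.2f}, p = {p:.3f}")
```

Even with perfect separation (R = 1), four samples admit only three distinct two-by-two groupings, so the permutation p-value cannot become small; replicate sites per region are what give such tests their power.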

Common functional traits of PA fraction in NPSG and SPSG.
Particle association is likely a prominent strategy for the utilisation of organic matter by heterotrophic bacteria and archaea, especially in oligotrophic marine environments. Accordingly, we focused on prokaryotic functions that predominate in the PA fractions, and combined the prokaryotic sequences from the three sampling sites in each region into representative datasets for the NPSG, EQ, and SPSG regions to highlight differences in the prokaryotic potential functionomes of each region (Additional File 4: Table S3). We focused on modules with relative abundances more than 2-fold higher in the PA fraction than in the FL fraction, based on the Genomaple results (Additional File 6: Table S5). […] Marinobacter flavimaris [43]. Photosynthetic picoeukaryotic algae that contribute to primary production in oligotrophic marine environments also require B vitamins, and the heterotrophic α-proteobacterial partner Dinoroseobacter shibae frequently associates with marine algae within Mamiellales, providing vitamins B1 and B12 to its host [44]. Vitamin B1 plays a pivotal role in intermediary carbon metabolism and is a cofactor for a number of enzymes involved in primary carbohydrate and branched-chain amino acid metabolism. Indeed, the major contributors to the relative abundance of the vitamin B1 biosynthesis module (M00127) were α- and γ-proteobacterial species, including Dinoroseobacter and Marinobacter, respectively. In addition, various dinoflagellates and Mamiellales were detected by eukaryotic community structure analysis based on ribosomal proteins (Additional File 7: Fig. S2).
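The selection of PA-enriched modules described above (a more than 2-fold PA/FL ratio in the pooled regional datasets, with attention to modules shared by both gyres) can be sketched as a ratio filter followed by a set intersection. In this Python illustration the module IDs (`M_A`, `M_B`, `M_C`) and abundances are hypothetical, and restricting the comparison to modules present in both fractions with nonzero FL abundance is an assumption made to keep the ratio defined.

```python
from typing import Dict, Iterable, Set

# module ID -> relative module abundance (hypothetical values below)
Abundance = Dict[str, float]


def pa_enriched(pa: Abundance, fl: Abundance, fold: float = 2.0) -> Set[str]:
    """Modules present in both fractions whose PA abundance exceeds `fold` x the FL abundance."""
    return {m for m in pa.keys() & fl.keys() if fl[m] > 0 and pa[m] / fl[m] > fold}


def shared_pa_enriched(
    regions: Dict[str, Dict[str, Abundance]], names: Iterable[str], fold: float = 2.0
) -> Set[str]:
    """Intersect the PA-enriched module sets of the named regions."""
    sets = [pa_enriched(regions[r]["PA"], regions[r]["FL"], fold) for r in names]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    regions = {
        "NPSG": {"PA": {"M_A": 0.9, "M_B": 0.5, "M_C": 0.2},
                 "FL": {"M_A": 0.3, "M_B": 0.4, "M_C": 0.05}},
        "SPSG": {"PA": {"M_A": 0.8, "M_B": 0.9, "M_C": 0.1},
                 "FL": {"M_A": 0.2, "M_B": 0.3, "M_C": 0.08}},
    }
    print(shared_pa_enriched(regions, ["NPSG", "SPSG"]))  # {'M_A'}
```

Here `M_C` is enriched only in the NPSG and `M_B` only in the SPSG, so only `M_A` passes in both gyres.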
Another notable feature of the PA fractions was the fairly high relative module abundance of two-component regulatory systems related to transport of C4-dicarboxylates (fumarate, succinate, and malate) [45] in both the NPSG and SPSG. Contributors to module abundance belonged to three major proteobacterial classes, namely α-, γ-, and β-proteobacteria (Fig. 5). Although the physiological adaptation of diatoms to nitrogen starvation is poorly understood, proteomic research on the marine diatom Thalassiosira pseudonana has revealed that many TCA cycle proteins that yield C4-dicarboxylates, such as succinate dehydrogenase and fumarate dehydratase, are upregulated 1.6- to 2.5-fold under nitrogen starvation [46]. Diatoms within Thalassiosirales were detected in all PA fractions at 1-6% relative abundance and were most abundant in the EQ region, where upwelling occurs (Additional File 7: Fig. S2). Heterotrophic bacterial assemblages adhering to diatom surfaces may exploit C4-dicarboxylates released from the diatoms as a nutrient source.
Additionally, expression of genes encoding parts of the C4-dicarboxylate transport system and binding proteins has been observed in the α-proteobacterium Pelagibacter in the NPSG region, suggesting that C4-dicarboxylates are an important carbon and energy source for these bacteria [33].
Although such a transcriptomic approach has not yet been applied to SPSG samples, similar results can be expected. Our findings further emphasise the importance of C4-dicarboxylate utilisation, especially for proteobacteria adhering to POM in oligotrophic marine environments.
The relative abundance of the module responsible for the adhesin protein transport system (APTS) in the PA fraction was more than 3-fold greater than in the FL fraction (Fig. 5). POM initially consists of living biomass of primary producers that, through various processes and transfer steps in the food web, eventually becomes detrital POM, which often dominates total POM. Indeed, eukaryotic ribosomal proteins, mainly corresponding to various algal taxa, were significantly overrepresented in the biomass trapped on the 3.0-µm prefilter (Additional File 7: Fig. S2). During the first step of bacterial interaction with POM, including living biomass, bacterial cells produce a series of surface proteinaceous adhesins that promote specific or nonspecific adhesion under various environmental conditions and achieve irreversible attachment [47]. Although APTS occurs in many bacterial species, its major contributors, such as the α-proteobacteria Dinoroseobacter and Planctomarina, were shared between the NPSG and SPSG regions. In addition, the relative abundance of the capsular polysaccharide and lipopolysaccharide (LPS) ABC transport system was higher in the PA fraction.
Complex carbohydrates in the form of LPS, polysaccharides, capsules, and lipooligosaccharides are commonly found on the cell surface [48]. Capsules allow organisms to adhere to various surfaces, as well as to other bacteria, facilitating the colonisation of various ecological niches and the formation of biofilms [49]. The capsule provides the bacterium with mechanisms to avoid non-specific host defences, including complement-mediated bacteriolysis and opsonophagocytosis [50]. Because LPS is also implicated in surface adhesion, bacteriophage sensitivity, and interactions with predators, these aggregative and protective molecules may contribute greatly to the formation and maintenance of PA bacterial assemblages.
Thus, PA bacteria may have an advantage over FL bacteria in obtaining nutrient and energy sources through particle-mediated interactions between microbial communities, but they may also become more susceptible to stress caused by various factors, especially antibiotics produced by secondary metabolism. Consistent with this idea, the relative module abundance of the two-component regulatory systems related to the envelope stress response (ESR) and cationic antimicrobial peptide resistance (CAPR) was remarkably high. ESR is an inducible bacterial response that senses and mediates adaptation to insults to the outer compartment of the cell, and has been implicated in the ability to resist or endure various types of antibiotics [51]. Antimicrobial peptides are part of the innate immune response found among all classes of life, and they can kill Gram-negative and Gram-positive bacteria, as well as enveloped viruses and fungi [52]. Bacteria use various resistance strategies to avoid being killed by antimicrobial peptides [53], but only γ-proteobacterial species contributed to the relative abundance of the CAPR and ESR modules.

Significant functional traits in subtropical regions.
As mentioned previously, surface seawater at subtropical latitudes is generally more oligotrophic than at tropical latitudes, but the concentrations of inorganic nutrients such as nitrate, phosphate, and iron complexes differ between the subtropical and tropical regions, as well as between the northern and southern subtropical regions. Hence, we calculated the relative abundance ratio of each module between the subtropical (NPSG and SPSG) and EQ regions (i.e., NPSG/EQ and SPSG/EQ) in the PA fraction. We identified fifteen modules for which both the NPSG/EQ and SPSG/EQ ratios exceeded 2 (Fig. 6 and Additional File 6: Table S5). In particular, the relative abundance of the module responsible for the Entner-Doudoroff (ED) pathway was more than 3-fold higher in the subtropical PA fractions than in the EQ PA fraction (Fig. 5). Aerobes and facultative anaerobes are more likely to have the ED pathway despite its low ATP yield, because they have other, non-glycolytic means of generating ATP, such as oxidative phosphorylation. The ED pathway is therefore thought to be especially favoured among species thriving in oligotrophic environments, because it requires less enzymatic protein than the Embden-Meyerhof-Parnas (glycolytic) pathway. Another notable feature of the subtropical oligotrophic regions was the high relative abundance of modules related to the biosynthesis of amino acids such as serine, phenylalanine, tyrosine, and histidine; moreover, apart from serine biosynthesis, the contributors to these modules did not differ significantly between the NPSG and SPSG (Fig. 6). In addition to amino acid biosynthesis, the relative abundance of the module responsible for the urea transport system was more than 4-fold higher in the subtropical regions than in the EQ region. The concentration of ammonium in the subtropical regions corresponding to the sampling sites in this study is very low, less than 0.15 μmol/L [54].
In the NPSG region, more than 60% of the contributors to the abundance of the urea transport system were cyanobacterial species similar to Prochlorococcus sp. MIT 604, which has a complete set of urease genes. In the SPSG region, where cyanobacteria make up a smaller proportion of the total community than in the NPSG (Fig. 1b), cyanobacteria contributed only 15% to the urea transport system; however, as in the NPSG, most of these cyanobacterial contributors were Prochlorococcus sp. MIT 604. Like ammonium, nitrate and nitrite (N+N) concentrations in the NPSG and SPSG regions were much lower than in the EQ region (Additional File 1: Fig. S1). Correspondingly, the relative abundances of the two transport systems related to nitrate and nitrite, especially the nitrate/nitrite transport system (M00438), were 8-10 times higher in the subtropical regions than in the EQ region (Fig. 6). Interestingly, the composition of contributors to this function differed significantly between the NPSG and SPSG. These results reflect the differences in the concentrations of carbon and nitrogen sources between the subtropical and tropical regions; indeed, the concentration of chlorophyll a in the EQ region is 3-7 times higher than in the subtropical regions (Additional File 1: Fig. S1). In addition, cell numbers determined by DAPI counts were 3-4 times higher in the EQ region, suggesting larger amounts of organic matter than in the NPSG and SPSG regions. Therefore, PA assemblages in the subtropical regions are presumably adapted to oligotrophic environments through a higher capacity for amino acid synthesis and for uptake of urea and N+N.
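The subtropical comparison above requires a module to exceed a 2-fold difference against the EQ region in both gyres at once. A minimal Python sketch of that dual filter follows; the module IDs and PA-fraction abundances are hypothetical, and ranking hits by the weaker of the two ratios is an illustrative choice rather than a procedure stated in the text.

```python
from typing import Dict, List, Tuple

Abundance = Dict[str, float]  # module ID -> relative abundance in the PA fraction


def subtropical_vs_eq(
    npsg: Abundance, spsg: Abundance, eq: Abundance, fold: float = 2.0
) -> List[Tuple[str, float, float]]:
    """(module, NPSG/EQ, SPSG/EQ) for modules where both ratios exceed `fold`, strongest first."""
    hits: List[Tuple[str, float, float]] = []
    for module in npsg.keys() & spsg.keys() & eq.keys():
        if eq[module] <= 0:
            continue  # ratio undefined; skip rather than report infinity
        r_npsg = npsg[module] / eq[module]
        r_spsg = spsg[module] / eq[module]
        if r_npsg > fold and r_spsg > fold:
            hits.append((module, r_npsg, r_spsg))
    # Rank by the weaker of the two ratios so a module must be high in both gyres.
    return sorted(hits, key=lambda h: min(h[1], h[2]), reverse=True)


if __name__ == "__main__":
    eq = {"M_A": 0.1, "M_B": 0.2, "M_C": 0.4}
    npsg = {"M_A": 0.9, "M_B": 0.5, "M_C": 0.5}
    spsg = {"M_A": 0.3, "M_B": 0.6, "M_C": 0.2}
    for module, r_n, r_s in subtropical_vs_eq(npsg, spsg, eq):
        print(f"{module}: NPSG/EQ={r_n:.1f}, SPSG/EQ={r_s:.1f}")
```

Requiring both ratios guards against modules driven by only one gyre, which matters here because the NPSG and SPSG differ in phosphate and in cyanobacterial abundance.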
Molybdate transport systems were also among the highlighted modules. Nitrogen fixation activity is not observed in the equatorial region, which is rich in soluble reactive phosphorus (SRP), whereas the SRP-depleted NPSG has significantly higher N2-fixation activity [2]. In fact, the relative abundance ratio of the two-component regulatory system for the phosphate starvation response was higher in both the NPSG and SPSG regions, where SRP concentrations are much lower than in the EQ region. Nitrogen fixation contributes up to 84% of new production in the upper waters of the subtropical gyre, where the diazotroph community includes γ-proteobacteria and Cyanobacteria [55]. Because nitrogenase, the core enzyme of nitrogen fixation, is Mo-dependent [56], the higher Mo-transport potential is presumably connected to the significant nitrogen fixation activity in the NPSG [55], given that γ-proteobacteria are the major contributors to the relative abundance of the Mo transport system. The two-component regulatory system related to nitrogen fixation also had a high relative abundance ratio, although its association with nitrogen fixation in the NPSG was unclear, as α-proteobacteria were the main contributors to this function.
As mentioned above, the PA microbial assemblage has a higher functional potential for adhesion and self-protection in oligotrophic surface water. Indeed, the extremely high relative abundance ratios of modules related to the adhesin protein transport system suggest that the subtropical region has a higher capacity than the EQ region to form microbial assemblages capable of surviving in an oligotrophic environment. In addition, unlike in the PA/FL comparison, other types of drug resistance systems (i.e., multidrug efflux pumps and aminoglycoside resistance) were highlighted by significantly high relative abundance ratios. Aminoglycoside antibiotics exhibit bactericidal activity mainly against Gram-negative aerobes and some anaerobic bacilli [57].
Streptomycin, the first-in-class aminoglycoside antibiotic, is produced by Streptomyces griseus; indeed, various actinobacterial species, including S. griseus, were detected in PA assemblages by community structure analysis based on ribosomal proteins, although their populations were small (Fig. 1b and Additional File 3: Table S2).

Conclusions
Through comprehensive functional metagenomic analyses based on functional abundance, we highlighted the functional traits of the PA fractions associated with the formation and maintenance of microbial assemblages, in comparison with the FL fractions and across sampling locations. We also identified functions of PA assemblages required for survival in oligotrophic ocean surface water.
Accordingly, this approach can enable more precise interpretation of the biological information contained in massive metagenomic datasets and lead to a better understanding of marine ecosystems. Our study also reinforces the importance of treating PA and FL prokaryotic assemblages as independent microbial communities. In addition, we reconfirmed that community structure analysis based on problematic 16S rRNA amplicon sequences can often be misleading in studies of this kind.

Sample collection and treatment and DNA extraction.
As previously reported, during two cruises of the R/V Hakuho-Maru across the NPSG, SPSG, and EQ regions of the Pacific Ocean (Additional File 1: Fig. S1), 18 seawater samples were collected from the surface at nine locations [2]. To separate the PA and FL assemblages, 100 L of seawater was serially filtered through 3.0-µm pore Nuclepore polycarbonate membrane filters (Whatman, Florham Park, NJ, USA) and 0.22-µm pore Steripak-GP cartridge filter units (Millipore, Billerica, MA, USA). DNA was extracted using a previously reported method with some modifications [31]. Lysozyme (final concentration, 1 mg/mL) was added to 20 mL of sucrose lysis buffer (50 mM Tris-HCl, pH 8.0, 40 mM EDTA, 0.75 M sucrose), and the mixture was transferred into the Steripak-GP cartridge filter on which the cells had been trapped. After a 30-min incubation at 37°C, the mixture, containing Proteinase K (0.1 mg/mL) and SDS (1%), was divided into 2.0-mL tubes with 3.0-µm membrane filter cartridges, and each sealed tube was incubated at 65°C for 90 min. This treatment was repeated to increase lysis efficiency. After the cell lysate was concentrated with Amicon Ultra-15 units (Millipore), DNA was purified using a standard method.

Sequencing and raw data processing.
Paired-end libraries with 480-bp inserts were prepared using the Nextera XT DNA Library Prep Kit (Illumina, San Diego, CA, USA) and sequenced on the Illumina MiSeq platform, generating 2-5 million reads of 250 bp. Low-quality sequences were removed using the FASTX-Toolkit, duplicates were eliminated using PRINSEQ [58], and the remaining paired-end reads were merged using PEAR (v0.9.6). Genes were predicted from the merged sequences using MetaGeneAnnotator [59]. The predicted genes were classified as partial or complete; partial genes shorter than 50 amino acids (a.a.) were discarded, and sequences were randomly selected from the remaining genes to compile multi-FASTA files for Genomaple analysis. The number of sequences obtained from each sampling site is listed in Table S3 (Additional File 4).
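The length filter and random subsampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `PredictedGene` record, the `select_genes` and `to_multifasta` helpers, the fixed random seed, and the `sample_size` parameter are all hypothetical, and parsing of actual MetaGeneAnnotator output is not shown. Only partial genes are subject to the 50-a.a. cutoff.

```python
import random
from dataclasses import dataclass


@dataclass
class PredictedGene:
    # Hypothetical record; real MetaGeneAnnotator output would first be parsed into this form.
    gene_id: str
    protein: str      # translated amino-acid sequence
    is_partial: bool  # True if the gene lacks a start and/or stop codon


def select_genes(genes, min_partial_len=50, sample_size=None, seed=0):
    """Drop partial genes shorter than min_partial_len a.a., then randomly sample the rest."""
    kept = [g for g in genes if not (g.is_partial and len(g.protein) < min_partial_len)]
    if sample_size is None or sample_size >= len(kept):
        return kept
    rng = random.Random(seed)  # fixed seed (an assumption) makes the subsample reproducible
    return rng.sample(kept, sample_size)


def to_multifasta(genes, width=60):
    """Format genes as a multi-FASTA string, wrapping sequences at `width` characters."""
    lines = []
    for g in genes:
        lines.append(f">{g.gene_id}")
        for i in range(0, len(g.protein), width):
            lines.append(g.protein[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    genes = [
        PredictedGene("g1", "M" * 40, is_partial=True),   # removed: partial and < 50 a.a.
        PredictedGene("g2", "M" * 40, is_partial=False),  # kept: complete genes are not length-filtered
        PredictedGene("g3", "M" * 120, is_partial=True),  # kept: partial but >= 50 a.a.
    ]
    selected = select_genes(genes)
    print([g.gene_id for g in selected])  # ['g2', 'g3']
    print(to_multifasta(selected))
```

Keeping the filter and the sampler as separate steps makes it easy to report how many genes survived the length cutoff before subsampling.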

Functional annotation and evaluation of metabolic potential.
Each multi-FASTA file of a.a. sequences (Additional File 4: Table S3) was submitted to Genomaple 2.3.2, with GHOSTX as the homology search engine [28]. A KEGG Orthology (KO) ID was assigned to each query gene by the single-directional best-hit method [27] using the KEGG Automatic Annotation Server [60]. The module completion ratio (MCR) of each functional module was calculated as described previously [27,29]. To normalise KO abundance, the total number of query sequences assigned to each KO was divided by the average length of that KO group [27]. To enable comparisons among sites, module abundance was normalised against the abundance of all ribosomal proteins in each sample (yielding module abundance per ribosomal protein) [27]. Contributors to complete modules were assigned taxonomically at the level of ITRs (i.e. the secondary taxonomic rank defined by KEGG, such as phylum or class).
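The normalisation steps above can be illustrated with a simplified sketch. It assumes that a module is a list of steps, each satisfied by any one of a set of alternative KOs (real KEGG module definitions have richer syntax), and that MCR is the fraction of satisfied steps. Purely for illustration, it also assumes that module abundance is the median over steps of the highest normalised KO abundance among each step's alternatives, and that the ribosomal reference is the mean normalised abundance of ribosomal-protein KOs. These aggregation rules and all function names are hypothetical and may differ from Genomaple's internal implementation [27].

```python
from statistics import mean, median


def mcr(module_steps, detected_kos):
    """Simplified module completion ratio: fraction of steps with at least one detected KO.

    module_steps: list of sets; each set holds the alternative KO IDs for one step.
    """
    satisfied = sum(1 for alternatives in module_steps if alternatives & detected_kos)
    return satisfied / len(module_steps)


def normalised_ko_abundance(hit_counts, avg_lengths):
    """Divide the number of query hits per KO by that KO group's average length."""
    return {ko: n / avg_lengths[ko] for ko, n in hit_counts.items()}


def module_abundance(module_steps, ko_abund):
    """ASSUMPTION: step value = max over alternatives; module value = median over steps."""
    step_values = [max(ko_abund.get(ko, 0.0) for ko in alternatives) for alternatives in module_steps]
    return median(step_values)


def per_ribosomal_protein(value, ko_abund, ribosomal_kos):
    """ASSUMPTION: reference = mean normalised abundance of the ribosomal-protein KOs."""
    reference = mean(ko_abund.get(ko, 0.0) for ko in ribosomal_kos)
    return value / reference


if __name__ == "__main__":
    steps = [{"K1"}, {"K2", "K3"}]  # hypothetical two-step module
    abund = normalised_ko_abundance(
        {"K1": 30, "K3": 20, "R1": 50, "R2": 30},
        {"K1": 300, "K3": 200, "R1": 100, "R2": 150},
    )
    print(mcr(steps, set(abund)))                 # 1.0: both steps are satisfied
    m = module_abundance(steps, abund)            # median of [0.1, 0.1]
    print(per_ribosomal_protein(m, abund, ["R1", "R2"]))  # about 0.1 / 0.35
```

Dividing by KO length first corrects for longer genes attracting more reads; dividing by the ribosomal reference then expresses each module roughly as copies per genome, which is what makes sites comparable.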

Comparative analysis of module abundance.
PA and FL datasets were created by combining the sequences of the three samples from the same region, as shown in Table S3 (Additional File 4). These six datasets were submitted to Genomaple to calculate the relative abundance of each functional module per ribosomal protein in each fraction (FL and PA) from the NPSG, SPSG, and EQ regions. Based on the Genomaple results, abundance ratios were calculated between the PA and FL fractions within each region and between regions within each fraction (Additional File 6: Table S5).
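The ratio calculations can be sketched as follows, assuming that module abundances per ribosomal protein are available as dictionaries keyed by (region, fraction). The function names are hypothetical; modules whose denominator abundance is zero are simply skipped here, which is one of several possible choices. The `above_fold` helper applies a 2-fold cut-off of the kind used to select modules for display.

```python
def ratio(numer, denom):
    """Module-wise numer/denom for modules present in both, skipping zero denominators."""
    return {m: numer[m] / denom[m] for m in numer.keys() & denom.keys() if denom[m] > 0}


def pa_fl_ratios(abund, regions):
    """PA/FL abundance ratio of every module within each region."""
    return {r: ratio(abund[(r, "PA")], abund[(r, "FL")]) for r in regions}


def region_ratios(abund, fraction, reference="EQ", others=("NPSG", "SPSG")):
    """Ratio of each region to a reference region within one fraction (e.g. NPSG/EQ)."""
    return {
        f"{r}/{reference}": ratio(abund[(r, fraction)], abund[(reference, fraction)])
        for r in others
    }


def above_fold(ratios, fold=2.0):
    """Module IDs whose ratio exceeds `fold`, sorted for stable output."""
    return sorted(m for m, v in ratios.items() if v > fold)


if __name__ == "__main__":
    # Hypothetical abundances per ribosomal protein for two modules, "A" and "B".
    abund = {
        ("NPSG", "PA"): {"A": 4.0, "B": 1.0}, ("NPSG", "FL"): {"A": 1.0, "B": 1.0},
        ("SPSG", "PA"): {"A": 3.0, "B": 0.5}, ("SPSG", "FL"): {"A": 1.0, "B": 1.0},
        ("EQ", "PA"): {"A": 1.0, "B": 1.0},   ("EQ", "FL"): {"A": 1.0, "B": 0.0},
    }
    print(above_fold(pa_fl_ratios(abund, ["NPSG", "SPSG", "EQ"])["NPSG"]))  # ['A']
    print(region_ratios(abund, "PA")["SPSG/EQ"])
```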

Community structure analysis.
The proportional representation of bacteria, archaea, and eukaryotes in the metagenomes was calculated based on the mapping pattern of virtual module M90000 for prokaryotic ribosomes (Additional File 8: Fig. S3) [27]. Because the archaeal ribosome contains six more proteins than the bacterial ribosome (58 vs. 52), we normalised the total number of archaeal ribosomal proteins to the number of bacterial ribosomal proteins by multiplying the archaeal ribosomal protein counts by 52/58. We summed the number of bacterial ribosomal proteins and the normalised number of archaeal ribosomal proteins, and used this sum as the denominator in subsequent calculations. Proportions of prokaryotic taxa at the ITR level were calculated in the same way. For a more detailed analysis of archaeal communities, all sequences that Genomaple assigned to archaeal ribosomal proteins were searched against the NCBI non-redundant (nr) protein database.
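The 52/58 scaling can be written as a small function. This sketch assumes that ribosomal-protein hit counts have already been tallied per domain (or per ITR, tagged with its domain); the function names and example counts are hypothetical. For example, 5,200 bacterial and 580 archaeal hits give a scaled archaeal count of 580 × 52/58 = 520, so the denominator is 5,720 and the archaeal proportion is 520/5,720, or about 9.1%.

```python
BACTERIAL_RP = 52  # ribosomal proteins counted for bacteria in M90000
ARCHAEAL_RP = 58   # ribosomal proteins counted for archaea in M90000


def scaled(domain, count):
    """Rescale archaeal counts to the bacterial ribosome size; bacterial counts are unchanged."""
    return count * BACTERIAL_RP / ARCHAEAL_RP if domain == "Archaea" else count


def domain_proportions(bacterial_rp_count, archaeal_rp_count):
    """Proportions of Bacteria and Archaea using the summed (scaled) denominator."""
    archaeal = scaled("Archaea", archaeal_rp_count)
    total = bacterial_rp_count + archaeal
    return {"Bacteria": bacterial_rp_count / total, "Archaea": archaeal / total}


def itr_proportions(itr_counts):
    """itr_counts: {itr_name: (domain, ribosomal_protein_count)} -> {itr_name: proportion}."""
    values = {itr: scaled(domain, n) for itr, (domain, n) in itr_counts.items()}
    total = sum(values.values())
    return {itr: v / total for itr, v in values.items()}


if __name__ == "__main__":
    print(domain_proportions(5200, 580))
    print(itr_proportions({
        "Cyanobacteria": ("Bacteria", 2200),
        "Alphaproteobacteria": ("Bacteria", 3000),
        "Euryarchaeota": ("Archaea", 580),
    }))
```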

Phylogenetic analysis.
To determine the phylogenetic positions of the euryarchaeal species that were top hits in the homology searches, we retrieved ribosomal proteins from each complete or draft genome of these species registered in the NCBI database. Ten ribosomal proteins conserved across all the draft genomes were then selected. The sequences of these proteins were concatenated and aligned with the corresponding sequence of Halobacterium salinarum, which was used as an outgroup. A phylogenetic tree was constructed by the maximum likelihood method with the LG+G model in MEGA 6.06 [61].
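Selecting ribosomal proteins shared by all genomes and concatenating them in a fixed order can be sketched as follows. The data layout (protein name mapped to sequence for each genome), the alphabetical choice of the first `n` shared proteins, and the helper names are assumptions for illustration; the source does not state how the 10 proteins were chosen, and alignment and tree inference (performed in MEGA 6.06 in this study) are not shown.

```python
def conserved_proteins(genomes, n=10):
    """Names of up to n proteins present in every genome, chosen alphabetically (an assumption).

    genomes: {genome_id: {protein_name: amino_acid_sequence}}
    """
    shared = set.intersection(*(set(proteins) for proteins in genomes.values()))
    return sorted(shared)[:n]


def concatenate(genomes, protein_names):
    """Join each genome's proteins in the same order so that positions correspond across taxa."""
    return {
        genome_id: "".join(proteins[name] for name in protein_names)
        for genome_id, proteins in genomes.items()
    }


def to_fasta(records):
    """Write concatenated sequences as FASTA for an external aligner."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


if __name__ == "__main__":
    # Toy sequences; the outgroup is included so it shares the same protein set.
    genomes = {
        "draft_A": {"L2": "MA", "S3": "KK", "L5": "QQ"},
        "draft_B": {"L2": "MV", "S3": "KR"},
        "Halobacterium_salinarum": {"L2": "MS", "S3": "RK", "L14": "PP"},
    }
    names = conserved_proteins(genomes)
    print(names)  # ['L2', 'S3']
    print(to_fasta(concatenate(genomes, names)), end="")
```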

Statistics.
A dataset based on the mapping patterns of module M90000 was prepared for comparative analysis of prokaryotic community structure, as shown in Table S1 (Additional File 2). Another dataset, based on module abundance patterns, was prepared for comparative functional analysis, as shown in Table S4 (Additional File 5); only abundance data for modules completed in all metagenomic samples were used for this analysis. Distance matrices were constructed using the Bray-Curtis distance [62]. Differences in community structure and module abundance patterns, both between the PA and FL fractions and among locations, were analysed using non-metric multidimensional scaling (nMDS). Differences between categories (i.e., different treatments) were tested using ANOSIM [63] and

[Figure] Functions that were more abundant in the PA fraction in subtropical regions than in equatorial regions, and their biodiversity. Only modules with abundance ratios exceeding 2-fold are shown. NPSG/EQ, abundance ratio of NPSG to EQ; SPSG/EQ, abundance ratio of SPSG to EQ. Module IDs coloured blue indicate rare modules, as shown in Fig. 5.