Ecological distribution
PAUC43f 16S rRNA gene sequences were detected in several marine environments, such as sediments, corals, sponges, oysters, estuaries, sea water and hydrothermal vents, as well as in samples from hypersaline lake sediments, soil and marine sediment mats (Fig. 1A). Notably, of the 179 sequences included in the map, 89 were recovered from marine sediment samples. Geographically, PAUC43f was detected around the world at almost every latitude and longitude, and in both shallow and deep environments.
To gain more insight into the ecological distribution of PAUC43f, its relative abundance (% of total 16S rRNA gene reads) was estimated for each environment (Fig. 1B). Of the 189,104 16S rRNA gene amplicon datasets analysed, PAUC43f was detected in 4,965 (Suppl. Table 1). The highest mean relative abundances were found in sponges, marine sediments and soils (Fig. 1B), while the lowest values were found in sea water and hydrothermal samples. Remarkably, PAUC43f could reach very high relative abundances, as observed, for example, in an arid saline soil from China [66] and in petroleum-impacted sediments from a saline lake [67], where it represented 34.3% and 19.3% of the total community, respectively.
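The relative abundance used above is a simple ratio of reads, and a minimal sketch may make the definition concrete. The function names and the read counts below are hypothetical and purely illustrative; they are not the pipeline or the data used in this study.

```python
def relative_abundance(taxon_reads, total_reads):
    """Return the percentage of 16S rRNA gene reads assigned to a taxon."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= taxon_reads <= total_reads:
        raise ValueError("taxon_reads must lie between 0 and total_reads")
    return 100.0 * taxon_reads / total_reads


def mean_relative_abundance(samples):
    """Mean percentage over (taxon_reads, total_reads) pairs from one environment."""
    if not samples:
        raise ValueError("at least one sample is required")
    values = [relative_abundance(t, n) for t, n in samples]
    return sum(values) / len(values)


# Hypothetical read counts for three samples of one environment (not real data).
example_samples = [(50, 10_000), (5, 20_000), (300, 15_000)]
example_mean = mean_relative_abundance(example_samples)
```

Averaging per-sample percentages, rather than pooling reads across samples, keeps deeply sequenced datasets from dominating the environment mean; whether the original analysis averaged in this way is an assumption of the sketch.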
Since PAUC43f reached its highest relative abundances in sponge, sediment and soil samples, its distribution in these environments was explored in more depth. PAUC43f was detected in at least 30 different sponge species and was most frequently found in Coscinoderma matthewsi, where it accounted for up to 5.4% of the 16S rRNA gene sequences, Xestospongia spp., Rhopaloeides odorabile and Suberites spp. In marine sediments, no clear distribution pattern related to latitude or temperature was observed; PAUC43f was as abundant in cold as in warm waters (Suppl. Figure 1A-1B), although the highest abundances were found at middle latitudes. No pattern related to the depth of the water column above the sediment was found either. Remarkably, PAUC43f was detected even in a sediment with an overlying water column of 5813 m, which suggests that it can withstand high hydrostatic pressure (Suppl. Figure 1C). Sediment depth, by contrast, seemed to be important, since PAUC43f abundances were highest at the surface and decreased with depth (Suppl. Figure 1D). In soils, the highest abundances were found at middle latitudes of the northern hemisphere, although it is important to note that this hemisphere contains a higher proportion of land than the southern hemisphere (Suppl. Figure 2A). As in sediments, the abundance of PAUC43f in soils was also related to depth, with higher abundances at the surface (Suppl. Figure 2B).
Ecotaxonomy
The 16S rRNA-based phylogenetic tree revealed 16 different genera supported by both neighbour-joining and PHYML algorithms (Fig. 2), which included 62% of the total tree sequences. All these genera, and indeed almost all the sequences included in the tree (except AB305477.1.916), belonged to the same order and the same family, based on previously proposed thresholds for these taxonomic ranks (order: 82.0%, family: 86.5%; Yarza et al., 2014).
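The rank assignment described above amounts to comparing a pairwise 16S rRNA gene identity against fixed thresholds. The sketch below uses only the two thresholds quoted in the text (order: 82.0%, family: 86.5%); the function name is hypothetical, and treating the thresholds as inclusive is an assumption.

```python
# Minimum 16S rRNA gene identity (%) for two sequences to share a rank.
# Only the two thresholds quoted in the text are listed, highest rank resolution first.
RANK_THRESHOLDS = [("family", 86.5), ("order", 82.0)]


def lowest_shared_rank(identity_pct):
    """Return the most specific rank two sequences share, or None if below all thresholds."""
    for rank, threshold in RANK_THRESHOLDS:
        # Assumption: a value equal to the threshold counts as sharing the rank.
        if identity_pct >= threshold:
            return rank
    return None
```

For instance, a pair with 90.0% identity would be placed in the same family, whereas a pair with 84.0% identity would share only the order.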
To analyse the ecological distribution of these genera, their frequencies and abundances in each environment were calculated. As shown in Fig. 3A, the frequency of each genus differed across environments: some genera, such as 1, 3, 4, 6 and 9, displayed a wide environmental distribution, while others, such as genera 10, 11, 12 and 13, were limited to a few environments and samples. All genera were detected in corals, sea water, sediment and soil, whereas only a few genera were found in fish, hydrothermal vents, hypersaline lake sediments and marine sediment mats. Regarding relative abundances (Fig. 3B), these genera fell within the rare biosphere in many environments (< 0.1%; [68]); however, in certain environments, some genera had moderate to high relative abundances (> 0.1%). For example, genus 6 was remarkably abundant in hypersaline lake sediments, marine sediments and soil samples, while genus 16 was clearly host-associated, with abundances above 0.1% in coral and sponge samples. Further, genera 7 and 9 had abundances above 0.1% in marine sediment and hydrothermal samples, and genus 10 had high abundances in hypersaline lake sediments and soils. These observations suggest that each genus might be adapted to specific environments and imply that at least some genera are genuine members of the microbiomes of corals, sponges, marine sediments, hypersaline lake sediments and soils. However, since their abundances and frequencies in fish, sediment mats and oysters are low, they are most likely detected in these types of samples as passively transported bacteria.
Phylogenomic and metabolic analyses
Searching genome/MAG databases (GEM & GTDB r95) and recent publications [26, 27] led to the identification of 24 MAGs: 6 from the GEM database, 0 from GTDB, and 18 from the publications. An additional 15 MAGs were recovered from metagenomes sequenced in our lab from Mar Menor sediments (Aldeguer-Riquelme et al., in prep). All 39 MAGs (Table 1) met the criteria to be considered of good quality, with completeness above 80% and contamination below 5% [54, 69]. In addition, 14 MAGs carried 16S rRNA genes (Table 1). MAG sizes ranged from 1.80 to 4.08 Mb, with GC contents between 65.4% and 71.7%. Regarding their origins, the MAGs were obtained from sponge, marine sediment and soil metagenomes (18, 18 and 3 MAGs, respectively; Table 1; Suppl. Table 2). A relationship between MAG origin and size, independent of completeness, was observed, with the smallest genomes found among marine sediment MAGs and the largest among sponge MAGs (Suppl. Figure 3). In terms of abundance, most of these MAGs had abundances above 0.1% (0.05%–12.52%, Table 1) in their original metagenomes and thus belonged to the abundant biosphere.
A phylogenomic tree inferred using all Gemmatimonadota genomes and MAGs available in the GTDB supported the monophyletic origin of PAUC43f within this phylum (Fig. 4A). MAGs recovered from sponges clustered in a PAUC43f sub-branch different from that of the MAGs from sediments and soils. A similar result was obtained when the average amino acid identity (AAI) was calculated among them (Fig. 4B). Thus, PAUC43f MAGs clustered according to their origin, which agrees with the results of the 16S rRNA gene analyses (Figs. 2 and 3). Indeed, the 16S rRNA gene sequences retrieved from MAGs were also classified and confirmed that some genera were associated with specific environments, as shown above (Fig. 2). These results strongly support the specialization of these MAG lineages in specific ecological niches.
With respect to the taxonomic range of MAGs, the phylogenomic tree and AAI values (Fig. 4B) indicated that the 39 MAGs represented 14 different species (AAI ≥ 95%; [69, 70]), 8 of which were recovered at least twice from different metagenomes. The sponge MAGs belonged to 8 different species of the same genus while the 6 species from soils and sediments represented 4 different genera (AAI ≤ 65%; [69]). Hereinafter, the analyses will focus on these 14 species.
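Grouping MAGs into species with the AAI ≥ 95% criterion can be illustrated with a short sketch. Single-linkage clustering with a union-find structure is an assumption made here for illustration and is not necessarily the procedure used in this study; the MAG names and AAI values are invented and are not taken from Fig. 4B.

```python
def cluster_by_aai(names, aai, threshold=95.0):
    """Group genomes so that any pair with AAI >= threshold shares a cluster (single linkage)."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in aai.items():
        if value >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical AAI values (%) between four made-up MAGs.
names = ["mag_a", "mag_b", "mag_c", "mag_d"]
aai = {
    ("mag_a", "mag_b"): 98.2,
    ("mag_a", "mag_c"): 71.0,
    ("mag_b", "mag_c"): 70.5,
    ("mag_a", "mag_d"): 60.3,
    ("mag_b", "mag_d"): 61.1,
    ("mag_c", "mag_d"): 59.8,
}
species = cluster_by_aai(names, aai)
```

In this toy input only mag_a and mag_b exceed the species threshold, so three species-level groups result. Single linkage can chain distant genomes through intermediates, so a stricter linkage may be preferable on real data.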
Table 1
General characteristics of PAUC43f MAGs. ^a Strain heterogeneity (SH). ^b MAG abundance is shown as the percentage of the total metagenome reads recruited by the MAG.
BinID | Contigs | Size (bp) | GC (%) | 16S (Genus) | Completeness (%) | Contamination (%) | SH^a (%) | Abundance^b (%) | Origin | Reference |
IRC1_bin_13 | 43 | 3603896 | 69.29 | No | 91.21 | 3.3 | 0 | 1.96 | Sponge | Engelberts et al., 2020 |
IRC2_bin_12 | 180 | 3551308 | 69.26 | No | 93.41 | 3.3 | 0 | 0.94 | Sponge | Engelberts et al., 2020 |
IRC3_bin_28 | 79 | 3616708 | 69.29 | No | 95.6 | 3.3 | 0 | 1.38 | Sponge | Engelberts et al., 2020 |
IRC4_bin_13 | 52 | 3197555 | 68.94 | No | 93.41 | 3.3 | 0 | 1.48 | Sponge | Engelberts et al., 2020 |
RHO1_bin_50 | 58 | 4078012 | 68.89 | No | 94.51 | 4.4 | 0 | 5.38 | Sponge | Engelberts et al., 2020 |
RHO2_bin_49 | 54 | 3475617 | 69.01 | Yes (16) | 95.54 | 3.3 | 0 | 2.72 | Sponge | Engelberts et al., 2020 |
RHO3_bin_38 | 43 | 3421550 | 69.05 | No | 91.14 | 3.3 | 0 | 6.59 | Sponge | Engelberts et al., 2020 |
VXMQ01000000 | 219 | 3760526 | 69.11 | No | 95.54 | 3.3 | 0 | 0.70 | Sponge | Engelberts et al., 2020 |
VXMR01000000 | 158 | 3163883 | 69.03 | No | 94.51 | 3.36 | 0 | 0.55 | Sponge | Engelberts et al., 2020 |
VXOJ01000000 | 71 | 3403461 | 69.49 | No | 94.51 | 3.3 | 0 | 1.73 | Sponge | Engelberts et al., 2020 |
VXQX01000000 | 96 | 3603938 | 69.05 | No | 94.51 | 3.3 | 0 | 12.52 | Sponge | Engelberts et al., 2020 |
VXSM01000000 | 159 | 3454019 | 68.84 | No | 93.41 | 3.3 | 0 | 1.16 | Sponge | Engelberts et al., 2020 |
VXWG01000000 | 114 | 3599158 | 69.44 | No | 95.6 | 3.3 | 0 | 2.06 | Sponge | Engelberts et al., 2020 |
VXXU01000000 | 424 | 2999705 | 68.98 | No | 84.55 | 2.2 | 0 | 1.27 | Sponge | Engelberts et al., 2020 |
VXYT01000000 | 102 | 3583507 | 69.04 | No | 95.6 | 3.3 | 0 | 1.99 | Sponge | Engelberts et al., 2020 |
VYCV01000000 | 309 | 3082878 | 69.08 | No | 90.11 | 4.5 | 0 | 0.51 | Sponge | Engelberts et al., 2020 |
VYDQ01000000 | 196 | 3551893 | 69.02 | No | 94.51 | 3.3 | 0 | 1.27 | Sponge | Engelberts et al., 2020 |
VYFI01000000 | 341 | 3232310 | 68.91 | No | 87.3 | 2.2 | 0 | 0.66 | Sponge | Engelberts et al., 2020 |
3300025550_6 | 399 | 2468409 | 67.08 | No | 80.49 | 3.85 | 0 | 0.23 | Marine sediment | Tringe, unpublished |
3300025554_5 | 315 | 2894191 | 67.11 | Yes (2) | 91.21 | 2.2 | 0 | 0.20 | Marine sediment | Tringe, unpublished |
3300026122_2 | 273 | 2964121 | 71.34 | Yes | 90.91 | 3.3 | 0 | 0.48 | Soil | Zhou et al., 2022 |
3300026127_2 | 119 | 3341153 | 71.38 | Yes | 96.7 | 3.3 | 0 | 0.98 | Soil | Zhou et al., 2022 |
3300026196_11 | 67 | 3192904 | 71.26 | Yes | 89.01 | 3.3 | 0 | 1.44 | Soil | Zhou et al., 2022 |
3300027962_15 | 199 | 2956959 | 71.68 | No | 96.15 | 4.4 | 0 | 0.62 | Marine sediment | Kimbrel, unpublished |
Bin_M15_27 | 604 | 2793946 | 65.52 | Yes (4) | 90.11 | 0 | 0 | 1.64 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_S15_23 | 531 | 2769550 | 65.51 | Yes (4) | 90.56 | 0.55 | 0 | 1.09 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_M162_005 | 197 | 2934314 | 65.49 | Yes (4) | 95.3 | 3.3 | 0 | 0.34 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_M182_011 | 153 | 2696093 | 65.49 | Yes (4) | 94.51 | 4.4 | 25 | 0.19 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_M42_016 | 78 | 1799666 | 65.37 | Yes (4) | 80.22 | 1.1 | 0 | 0.07 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_M51_014 | 235 | 2719955 | 65.52 | No | 94.51 | 2.75 | 0 | 0.20 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_M92_019 | 219 | 2740149 | 65.42 | Yes (4) | 95.6 | 4.4 | 25 | 0.16 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_S162_005 | 153 | 2826554 | 65.49 | Yes (4) | 96.7 | 4.4 | 25 | 0.37 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_S162_007 | 280 | 2573992 | 66.45 | No | 93.96 | 3.6 | 16.67 | 0.25 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_S182_010 | 257 | 2980185 | 65.47 | No | 96.7 | 3.3 | 0 | 0.26 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_S212_14 | 88 | 2387045 | 65.52 | Yes (4) | 91.71 | 3.3 | 0 | 0.47 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_S43_010 | 308 | 2799597 | 65.57 | No | 94.51 | 4.95 | 16.67 | 0.26 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_S43_056 | 396 | 2710074 | 65.57 | Yes (2) | 82.99 | 3.3 | 0 | 0.05 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_S53_006 | 260 | 2329960 | 65.44 | No | 93.46 | 2.75 | 33.33 | 0.68 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
Bin_S91_007 | 166 | 2340983 | 65.52 | No | 92.49 | 3.3 | 0 | 0.28 | Marine sediment | Aldeguer-Riquelme et al., unpublished |
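The quality and abundance criteria applied to the MAGs in Table 1 (completeness above 80%, contamination below 5%, and the 0.1% cutoff separating the rare from the abundant biosphere) can be sketched as follows. The four rows are copied from Table 1; the function names are hypothetical, and assigning a value of exactly 0.1% to the rare biosphere is an assumption.

```python
# (BinID, completeness %, contamination %, abundance %) copied from Table 1.
ROWS = [
    ("RHO1_bin_50", 94.51, 4.4, 5.38),
    ("VXQX01000000", 94.51, 3.3, 12.52),
    ("Bin_M42_016", 80.22, 1.1, 0.07),
    ("Bin_S43_056", 82.99, 3.3, 0.05),
]


def is_good_quality(completeness, contamination):
    """Apply the thresholds stated in the text: >80% complete, <5% contaminated."""
    return completeness > 80.0 and contamination < 5.0


def biosphere(abundance_pct, cutoff=0.1):
    """Label a MAG as 'abundant' above the cutoff, otherwise 'rare' (assumed boundary handling)."""
    return "abundant" if abundance_pct > cutoff else "rare"


summary = {
    bin_id: (is_good_quality(comp, cont), biosphere(abund))
    for bin_id, comp, cont, abund in ROWS
}
```

All four example MAGs pass the quality filter, while the two least abundant ones (Bin_M42_016 and Bin_S43_056) fall below the 0.1% cutoff.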
To shed light on the ecological role of PAUC43f, the potential metabolic capabilities of each species were explored (Fig. 5, Suppl. Table 3). In terms of cell wall structure, PAUC43f appeared to be a Gram-negative bacterium that lacked the genes for flagellar assembly (except in species 10). Regarding central carbon metabolism, complete glycolysis and tricarboxylic acid (TCA) cycle pathways were found in almost all species, as were sugar transporters, which suggests that PAUC43f has a chemoorganotrophic metabolism. The lack of phosphoglucose isomerase in species 10, 11, 12, 13 and 14, and of phosphoglycerate kinase and malate dehydrogenase in species 8, may be due to MAG incompleteness, since all other enzymes of these pathways were present in these MAGs. Genes related to carbon fixation or photosynthetic metabolism were not found. However, species from sediments and soils encoded 1c and 1f hydrogenases [71], which indicates that they could potentially shift between chemoorganotrophic and chemolithotrophic metabolism. It is noteworthy that hydrogenotrophic metabolism has recently been demonstrated in other Gemmatimonadota members [72].
Members of PAUC43f seemed to be facultative aerobes, since genes for complex IV cytochrome oxidase, which transfers electrons to oxygen, were detected in all MAGs, while most of them also encoded genes for nitrate, nitrite and/or nitrous oxide respiration. In addition, species found in sponges had the potential to respire thiosulfate, and species from sediments and soils could likely carry out alcoholic fermentation. The potential of the PAUC43f lineages represented by MAGs from sediments and soils to reduce N2O agrees with previous observations in other Gemmatimonadota members [14, 15, 73] and highlights their ecological relevance. N2O is a potent greenhouse gas whose atmospheric concentration is increasing at a rate of 0.8 ppb per year [75] due to human activities such as agricultural fertilization or combustion of fossil fuels [74], with some of the highest concentrations measured in coastal and estuarine waters [76, 77]. Thus, N2O consumers, such as PAUC43f, could play a key role in mitigating the harmful effects of this gas.
Biosynthetic pathways for amino acids, B vitamins and secondary metabolites, which are relevant molecules for microbial metabolism and physiology, were explored in PAUC43f. With respect to amino acid biosynthesis, all PAUC43f species encoded the complete pathways for the synthesis of glutamate, alanine, aspartate, glycine, threonine, cysteine and leucine, while the pathways for the remaining amino acids were complete in only some species. It is noteworthy that species from sponges had the potential to synthesize more amino acids (16–18) than species from sediments and soils (10–13). The most widespread putative auxotrophies were for valine, isoleucine, lysine and histidine; each of these could be synthesized by only two species. Species from sponges were all auxotrophic for histidine, whereas species from sediments and soils were auxotrophic for valine, lysine, isoleucine, tyrosine, proline and phenylalanine. However, species from sponges could perhaps circumvent valine and isoleucine auxotrophies by means of a branched-chain amino acid transporter and, in sediment and soil species, tyrosine might be acquired by cotransport with H+. Furthermore, transporters for oligopeptide import were found in all species. A possible explanation for these auxotrophies lies in the frequency of use and biosynthetic cost of each amino acid. The most used amino acids in the PAUC43f proteome were alanine, leucine and glycine (Suppl. Figure 4), which could be produced by these species, two of them (alanine and glycine) at a low metabolic cost [78]. In contrast, histidine and lysine, whose biosynthesis is metabolically more expensive [78], were among the least frequent amino acids, and most species were auxotrophic for them.
On the other hand, PAUC43f could play an important ecological role by providing cysteine and serine to the marine community, since the corresponding biosynthetic pathways and efflux transporters were encoded in almost all species, and serine auxotrophy has been demonstrated for important marine bacteria such as Pelagibacter ubique [79].
With regard to the potential for B vitamin production (Suppl. Table 3), the essential core biosynthetic genes for thiamine (vitamin B1) (thiC, thiG and thiE), which is a cofactor of several essential enzymes [80], were detected in all species except species 10, which lacked thiC. Given that B1 auxotrophy is frequent in both eukaryotic and prokaryotic marine communities [81–83], and is the second most common in marine environments [84], PAUC43f could be a key supplier of B1 to marine communities. Regarding riboflavin (vitamin B2), which is a precursor of the coenzymes FAD and FMN [85], all PAUC43f MAGs encoded the complete biosynthetic operon for this vitamin. Niacin (vitamin B3), a precursor of coenzymes involved in redox reactions, could also be synthesized by PAUC43f species from soils (species 9) and marine sediments (species 10, 11 and 14). On the other hand, the pathway for pantothenate (vitamin B5), a precursor of coenzyme A, was complete only in species 9, while species 1, 3, 4, 5, 6, 7, 8, 10 and 11 lacked one gene and species 2, 12, 13 and 14 lacked several. Similarly, most genes involved in the biosynthesis of folate (vitamin B9), an important molecule in anabolic reactions, were found in PAUC43f MAGs (folE, DHPS, folC, phoD, folB, PTPS), but dihydrofolate reductase (DHFR) and 4-amino-4-deoxychorismate lyase were only present in species 9 and species 11, respectively. Thus, the presence of most genes involved in pantothenate and folate biosynthesis suggests that some PAUC43f species may be capable of synthesizing these vitamins and that the missing genes were likely not binned in the MAGs; however, this remains to be tested in future studies. Biosynthetic pathways for vitamins B6, B7 and B12 were not found, and the presence in all species of the bioY gene, which encodes a biotin (vitamin B7) transporter [86], and of btuF and btuB, which are part of the cobalamin (vitamin B12) transporter [87], suggests that PAUC43f imports these vitamins from the extracellular environment.
Secondary metabolites are usually involved in growth, development and defense [88], and they are of interest for medicine due to their potential uses as antibiotic, anti-tumoral and cholesterol-lowering drugs. PAUC43f MAGs were analysed with antiSMASH [63], which revealed that sponge MAGs encoded a higher number and diversity of biosynthetic gene clusters (BGCs; 4–9 BGCs per MAG) than those from sediments and soils (1–2 BGCs per MAG) (Fig. 6, Suppl. Table 4). Species from sponges encoded type I polyketide synthase (T1PKS), ranthipeptide and ribosomally synthesised and post-translationally modified peptide (RiPP) clusters. Most of them had low similarity to previously described BGCs, so their products could not be predicted. However, some T1PKS clusters were similar to those known to synthesize azinomycin B, a potent antibiotic with antitumor activity [89, 90], cyphomycin, an antifungal compound [91], and vazabitide A and funisamine, both with unknown biological properties [92, 93]. On the other hand, terpene- and β-lactone-producing bacteria are frequently found in marine sediments and among sponge symbionts [88, 94]; thus, it was not unexpected to find the corresponding BGCs encoded by most PAUC43f species.
Antibiotic and heavy metal resistance genes were detected in PAUC43f MAGs. With respect to antibiotic resistance, PAUC43f MAGs encoded β-lactamases, tetracycline/H+ antiporters, and fosmidomycin and macrolide efflux pumps. Heavy metal resistance genes were mainly present in sediment and soil species. Bacterioferritin, an iron storage protein which also protects the cell from reactive Fe2+, was present in species 11, 12, 13 and 14. Additionally, some species could be resistant to As3+ (species 9, 10, 11, 12, 13 and 14) and to Zn2+, Pb2+ and Cd2+ (species 9, 11, 12, 13 and 14) by exporting these metals out of the cell.