4.1 Taxonomic and functional depth stratification of bacteria, archaea, eukarya, and viruses
Guerrero Negro microbial mat is one of the best studied microbial mat ecosystems, and the use of the molecular analysis based on sequencing of small-subunit rRNA genes and 454 sequencing have revealed the high complexity of this stratified ecosystem [6, 7]. While a vertical stratification of oxygen and sulfide, as revealed with microelectrodes, was documented in early studies of this mat [23] as was the surprising occurrence of sulfate reduction in the oxic zone of the mat [23]. Vertical patterns of nitrogen cycling genes with respect to vertical variations in oxygen concentration were documented more recently [9]. The vertical organization of other functions in these communities has been less well studied. Through metagenomics, the current study provides a taxonomic description of the vertical taxonomic organization as well as a functional organization delineated between bacteria, archaea, eukarya and viruses in a GN microbial mat – revealing new insights into the ecology of these communities. In the discussion that follows, we describe the vertical taxonomic distribution and the functional organization of the communities as a whole and subsequently broken down by bacteria, archaea, eukarya and viruses.
4.1.1 Community as a whole
Overall, 922,324 unique gene-copies were recovered (meaning assembled and predicted), with 84.51% of those being classified to at least the domain level, leaving 15.49% unclassified (Table S17). Based on coverage, 88.71% of read-recruitment went to those genes classified to at least the domain level, with the remaining 11.29% going to taxonomically unclassified genes. From here on, the percentages stated are out of the total classified genes and coverages. Focusing on domain-level taxonomy, 98.06% of recovered unique gene-copies were taxonomically classified as bacteria, 1.82% as archaea, 0.07% eukarya, and 0.05% viral. Values based on coverage were similar, with 98.71% of total coverage going to genes taxonomically classified as bacteria, 1.19% for archaea, 0.07% for eukarya, and 0.03% for viruses. The greatest coverages of bacteria and eukarya genes were detected in Layer 1, while the highest coverages of viruses and archaea genes were found in Layers 3 and 4 (Fig. 1).
Based on KEGG KO functional annotations, 28.92% of the total recovered genes (35.47% of total coverage) were annotated, with 71.08% not successfully annotated (64.53% of total coverage). Broken down by domain-level taxonomic classifications, 67.2%, 74.3%, 73.3%, 94.4% of the genes taxonomically classified as bacteria, archaea, eukarya and virus, respectively, were not successfully functionally annotated – highlighting the potential for unrepresented functions and/or lineages as previously speculated in the same mat [6]. The spatial distribution of bacteria, archaea and eukarya in GN mats has been previously surveyed to some extent, largely based on marker genes [24]; however, to the best of our knowledge no studies investigating viruses in GN have been reported. Viruses constituted the lineage with the least genes successfully functionally annotated, with > 94% of the genes taxonomically classified as viral not having any KO annotations assigned (certainly in large part owing to their lack of representation in general databases). With respect to their diversity, 28.2% of the genes were not successfully taxonomically classified at below the domain level, and 58.4% of the genes were not successfully taxonomically classified below the order rank. Studies have reported > 75% of viral proteins are typically not functionally annotated [25]. Additionally, viruses lack many conserved universal genes that are often used for taxonomic classification in the domains of bacteria (16S-23S rRNAs), archaea (18S-28S rRNAs) or in eukaryotes (18S-28S rRNAs), and therefore require different methods of standard identification [26]. The direct sequencing provided through metagenomics and metatranscriptomics approaches has facilitated large scale virome identification from complex ecosystems, discovering novel viral genes and allowing their identification [27].
4.1.2 Bacteria
The bacteria community composition represented 98.06% of the data (as based on unique genes recovered), and 98.71% as based on gene-level coverage (Table S2). Cyanobacteria, including the families of Microcoleacea and Oscillatoriaceae, dominated the upper one millimeter (Layer1), while sequences classified as belonging to the phylum Chloroflexi had the greatest coverages in Layer 2–4 (Figure S1). And overall, the coverage of bacterial genes decreased with depth (Fig. 1A1). This type of high-level taxonomic stratification has been previously recovered in GN mats by several studies [6, 7, 28]. The taxonomic vertical distribution has also been studied in Shark Bay microbial mat in Australia and a similar vertical distribution was found [29]. However, less is known about the bacterial functional organization.
Based on the grouping of KO annotations and their coverages based on KEGG metabolic pathways, genetic information processing was the dominant pathway, followed by signaling and cellular processes, carbohydrate metabolism and energy metabolism (Table S1). For Cyanobacteria, the genes with highest normalized coverages were annotated as photosystem II P680 reaction center D2 protein (psbD) and photosystem II oxygen-evolving enhancer protein 3 (psbQ); while DNA segregation ATPase Ftsk/SpoIIIE, S-DNA-T family (ftsk, poIIIE), integrase/recombinase XerD (xerD), and plasmid segregation protein ParM (parM, stbA) had the greatest coverages for Chloroflexi. Similarly, oxygenic photosynthesis genes have been identified from Cyanobacteria in a Shark Bay microbial mat in Australia [4].
4.1.3 Archaea
Regarding archaea, 1.82% of the sequences (as based on unique genes recovered), and 1.19% as based on coverage (Table S4), were classified as originating from members of this domain, with increasing coverage with depth (Fig. 1B1). Layers 3 and 4 contained the greatest coverages and the phyla Euryarchaeota, namely the family Methanosarcinaceae, were the dominant taxa. Some archaea are known to live in extreme environments such as high salt or temperature due to having diverse energy sources, cell physiology and metabolic characteristics. Previous studies in GN found Euryarchaeota and Methanosarcinaceae as predominant methylotrophic methanogenesis archaea using 16S rRNA amplicon sequencing [30]. Moreover, we also identified an uncultured archaeal MSBL1, described as a lineage exclusive to hypersaline environments [31]. Archaeal classified genes related to N2 fixation and denitrification were also identified (Table S18). Assimilatory pathway of nitrogen fixation genes (nifA, nifB iscU, nifU, glnB) were annotated in genes classified as originating from Candidatus Micrarchaeota and Euryarchaeota, while nitrate reductase gamma subunit (narI, narV), nitric oxide reductase NorD protein (norD), anaerobic nitric oxide reductase flavorubredoxin (norV), nitronate monooxygenase (ncd2, npd) and nitrous oxidase accessory protein (nosD) were present in the archaea lineages Euryarchaeota, Candidatus Thorarchaeota and Candidatus Helarchaeota (Table S18). Genes annotated as being involved in the nitrogen-related pathways of assimilatory nitrate reduction, dissimilatory nitrate reduction to ammonium (DNRA pathway) and nitrogen fixation genes were also detected in hypersaline microbial mat from Shark Bay, Australia [32].
4.1.4 Eukarya
The total number of unique genes attributable to eukarya was relatively low (0.07%), with the contributing coverage being 0.07% (Table S6). Layer 1 contained the greatest coverages of genes and the dominant phylum was Bacillariophyta. Members phylogenetically related to Bacillariophyta have been previously identified in phototrophic microbial mat in Iceland using 18S rDNA [33]. Layer 3 and 4 contained high coverages of genes related to fungi Ascomycota and Chytridiomycota. In a previous study, the abundance of fungi increased in depth-layers in the same mat using qPCR [10]. The most abundant pathways for genes classified as eukaryal were related to genetic information processing and photosynthesis, with the greatest coverages coming from genes related to photosystem (psbX, psaI and PsbX genes for photosystem I and II, respectively) and ribosomal genes. Eukaryotic ribosomal proteins are larger in size and structurally, biochemically and functionally more complex than their archaeal and bacterial counterparts [34]. Photosystem I and II are the two multi-protein complexes that use the light energy to catalyze the primary photosynthetic endergonic reactions producing high energy compounds [35]. Photosynthesis genes have been also found in MAGs recovered from the photo-oxic-zone in Shark Bay mats [4].
4.1.5 Viruses
With respect to viruses, 394 viral genes were identified in the mat co-assembly which were taxonomically classified into 3 phyla, 16 families and 92 species (Table S8). Layers 3 and 4 contained the highest coverages of genes and Caudovirales was the dominant order, comprising seven families: Siphoviridae, Myoviridae, Herelleviridae, Autographiviridae, Podoviridae, Herelleviridae and Autographiviridae. Among the 394 viral genes identified, 332 genes were taxonomically classified at species level and 197 genes were related to phages. The majority of the phages were taxonomically related to uncultured Caudovirales phage, followed by uncultured Mediterranean phage uvMED, Vibrio phage, Synechococcus phage S-SCSM1, Streptomyces phage BRock, Rhodothermus phage RM378, Cellulophaga phage phi14:2, belonging to the families Siphoviridae, Herelleviridae, Podoviridae, Myoviridae and Autographiviridae, all members of the order Caudovirales.
In terms of abundance, viruses are the Viruses are the most dominant biological entities on Earth [36]. They are also critically important in terms of their ecological implications in the transmission of genetic information, the predation of prokaryotic communities, and effects on nutrient cycling [37]. Caudovirales is one of the most predominant viruses and its functional capabilities are relatively well characterized [38]. In other hypersaline microbial mats and in other extreme environment such as stratified sulfidic mine tailings, the order Caudovirales and the families Myoviridae, Podoviridae and Siphoviridae were also the dominant viruses discovered using metagenomic sequencing [39–42]. In the current study, 43 of 92 species of viruses identified were related to known phages. The majority of the phages belonged to the families Siphoviridae, Herelleviridae, Podoviridae, Myoviridae and Autographiviridae, all members of the order Caudovirales. In environmental samples, 1 to 10 phages per prokaryote cell have been reported [43]. Bacteriophages specifically from the families Myoviridae, Siphoviridae, Podoviridae have been previously described in another microbial mat [41, 44]. Seven species belonged to the family Mimiviridae: Cafeteria roenbergensis virus, Barrevirus sp., Homavirus sp., Edafosvirus sp., Klosneuvirus KNV1, Mimivirus LCMiAC01 and Catovirus CTV1. The family Mimiviridae compresses a nucleo-cytoplasmic large DNA viruses (NCLDV) called “giant viruses” that infect eukaryotic cells [45, 46]. We also detected a gene taxonomically classified as the virophage Cafeteria roenbergensis, which increased with depth (Table S16; ID 302828), that was not successfully functionally annotated. Cafeteria roenbergensis is a double-stranded DNA virus, reported in the infection of marine microflagellates and contains a genome around 730 kb [47].
In Layer 1, the most abundant pathways for viral-classified genes were related to genetic information processing. Pathways related to the recombination of proteins increased in Layers 2, 3 and 4. A pathway related to oxidative damage repair was detected in Layer 2 with a gene related to UV damage present in uncultured Caudovirales phage. One gene was annotated as chaperonin GroES and was present in Prokaryotic dsDNA virus sp. (Table S16; ID 521570). Chaperonins are ancestral proteins detected in all cellular life, however, they have only been reported in a few viruses. GroES belongs to chaperonin Group I, which are involved in phage assembly [48]. The majority of the viral-classified genes (94.4%) were not successfully annotated by our KO-annotation process, highlighting the general underrepresentation of viruses in standard databases, as well as the potential great virome that awaits discovery in these mats. Similar limitations in the identification of viruses have been previously reported [49].
4.2 Living in extreme conditions: bacteria, archaea, and virus genes related to adaptation
In this section, we focus on genes annotated with functions that may be related to adaptations to the local environment, specifically functions related to UV radiation resistance, multidrug resistance, oxidative stress, heavy metal, antioxidative stress. This included 3,255 bacterial and 44 archaeal genes (Figs. 2A, 2B, Tables S9-S15). Furthermore, one gene related to UV DNA damage endonuclease that was taxonomically classified as viral was detected (Fig. 2C, Table S16). Genes involved with responses to oxidative stress, metal toxicity, osmotic stress due to hypersalinity, high-UV irradiation or protection against viral damage have been reported in Shark Bay microbial mats in Australia via metagenomics analysis [4]. This Wong study [4] is consistent with other studies in which genes related to adaptation are more common under extreme conditions. In extreme environments like brine, Antarctic lake or permafrost, genetic machinery related to metal resistance genes, cold shock, multidrug resistance efflux pumps or resistance to antibiotic stress conditions has been identified at higher frequencies than in other environments [50]. These ecosystems are living in extreme environments such as high salinity, desiccation, cold, limited nutrients and high UV exposure, leading potentially to a higher complexity and possibly to a better adaptation to the environmental constraints.
Genes encoding arsenical resistance were affiliated to bacteria lineages of Cyanobacteria and Proteobacteria (Table S10). Arsenic metabolism was an energetic metabolism in the primitive Earth and has been detected in modern stromatolites in Andean microbial ecosystems in Argentina and in Shark Bay microbial mats in Australia using metagenomics [4, 51].
Multidrug-resistance genes comprised a total of 930 genes in bacteria and were mostly taxonomically affiliated to Bacteroidetes, Chloroflexi, Planctomycetes, Proteobacteria, Spirochaetes, Verrucomicrobia (Table S9). In archaea, multidrug resistance was detected with 6 genes and was present in Candidatus Methanoliparales and Archaeoglobales (Table S15). The role of multidrug-resistance genes in mats is not yet clear however, and might be correlated with a general great adaptation potential of microbes living in relatively extreme conditions as has been suggested in a recent study [50]. Multidrug-resistance genes have been found in Antarctic microbial mats [52].
The potential for adaptive responses to oxidative stress was detected in terms of genes annotated as superoxide-dismutase, superoxide oxidase and superoxide reductase. Genes encoding the oxidative stress genes were found in bacteria lineages of Proteobacteria, Planctomycetes, Bacteroidetes or Chloroflexi and in archaea lineages of Euryarchaeaota and Candidatus Micrarchaeota (Table S12). In bacteria, superoxide dismutase FeMn (SOD2) were the most abundant (highest coverage) of the oxidative stress genes and were more abundant in the upper layers (Fig. 2A). SOD2 might help to protect microbes from relatively elevated levels of reactive oxygen species (ROS) due to UV radiation at the mat surface.
Cold-shock protein was the annotation for 221 genes in bacteria and 7 genes in archaea and was more abundant in Layers 3 and 4 (Fig. 2A, B; Tables S13 and S15). Cold-shock protein (Csp) is well characterized in bacteria; however, it has been reported that most archaeal genomes do not contain csp genes [53] or at least do not contain detected homologs to the bacterial csp [54]. In this study, cspA gene was present in 3 archaeal phyla: Candidatus Aenigmarchaeota, Euryarchaeota and Candidatus Woesearchaeota (Table S15). In addition to cold stress protection, bacterial csps offer other biological functions such as antioxidative stress, anti-antibiotic activity and stationary growth [55]. Further studies are needed in order to elucidate the biological functions of the recovered archaeal CspA proteins (Table S15).
With regard to heat shock proteins (hsIJ), 184 genes were present in bacteria and 7 in archaea (Tables S13 and S15). In bacteria, hsIJ-annotated genes were mostly taxonomically classified to Proteobacteria, Chloroflexi, Bacteroidetes, while in archaea in Candidatus Micrarchaeota and Euryarchaeota (Tables S13 and S15). In addition to preventing the thermal denaturing of proteins, heat shock proteins may be also associated with cold and UV stress [56–58].
In this study, 997 UV-related genes were classified under the domain Bacteria, with 14 under the archaeal domain (Tables S14 and S15). Microbial mats have existed for around 85% of the Earth’s history, reaching back prior to the emergence of oxygenic photosynthesis. It is tempting to speculate that UV-related genes recovered from these mats may provide insight into this evolutionary history. Before oxygenic photosynthesis, no protective ozone layer was present and microorganisms may have experienced a greater selective pressure to develop strategies to protect themselves against ultraviolet and gamma irradiation than they do on the modern Earth. In mats located in Shark Bay, Australia, high-UV exposure has been correlated with low rainfall, high salinity, high evaporation and variations in the extent of the ozone layer in hypersaline microbial mat [59]. Additionally, photoprotective strategies against UV radiation or reactive oxygen species have been identified in other cyanobacteria-dominated microbial mats [60].
4.3 Investigating relatedness between the genes present in bacteria, archaea and viruses
We employed phylogenetics in order to investigate the evolutionary histories of genes with similar functional annotations that were taxonomically classified as coming from bacteria, archaea and viruses (Table 1).
Table 1
Phylogenetically investigated KO functions
KO definition | KO ID | Gene name(s) | Bacteria | Archaea | Virus | Figures/Tables |
MFS transporter, ACDE family, multidrug resistance protein | K08221 | yitG, ymfD, yfmO | 14 | 1 | 0 | Figure 3A, Tables S9, S15 |
Small multidrug resistance pump | K03297 | emrE, qac, mmr, smr | 21 | 1 | 0 | Figure 3B, Tables S9, S15 |
Superoxide dismutase, Fe-Mn family | K04564 | SOD2 | 171 | 1 | 0 | Figure 4A, S5, Tables S12, S15 |
Superoxide reductase | K05919 | dfx | 107 | 7 | 0 | Figure 4B, S6, Tables S12, S15 |
Heat shock protein HtpX | K03799 | htpX | 106 | 4 | 0 | Figure 5A, S7, Tables S13, S15 |
Cold shock protein | K03704 | cspA | 221 | 7 | 0 | Figure 5B, S8, Tables S13, S15 |
Phage shock protein C | K03973 | pspC | 114 | 2 | 0 | Figure 5C, S9, Tables S13, S15 |
Excinuclease ABC subunit A | K03701 | uvrA | 362 | 5 | 0 | Figure 6A, S10, Tables S14, S15 |
Excinuclease ABC subunit B | K03702 | uvrB | 193 | 1 | 0 | Figure S11, Tables S14, S15 |
Excinuclease ABC subunit C | K03703 | uvrC | 203 | 3 | 0 | Figure 6B, S12, Tables S14, S15 |
DNA helicase II ATP dependent DNA helicase | K03657 | uvrD, pcrA | 214 | 1 | 0 | Figure S13, Tables S14, S15 |
UV DNA damage endonuclease | K13281 | uvsE, UVE1 | 17 | 2 | 1 | Figure 6C, Tables S14, S15, S16 |
Figures 3–6 show the phylogenetic relationships between domains; the complete trees are in the supplementary material Figures S5-S13. And see Table 1 and the corresponding supplemental tables for gene IDs and full amino-acid sequences.
Figure 3 shows the phylogenetic tree of MFS transporter, ACDE family, multidrug resistance protein (yitG, ymfD, yfmO; 3A) and small multidrug resistance pump (emrE, qac, mmr, smr; 3B). For ACDE family, multidrug resistance protein (yitG, ymfD, yfmO) (Fig. 3A), the gene ID 57669 present in Candidatus Methanolliviera hydrocarbonicum was between clades holding genes classified as sourced from the following bacterial lineages: Spirochaetes bacterium (gene ID 391435), Candidate division Zixibacteria bacterium (gene ID 13873), Bacteroidetes bacterium (genes IDs 219193, 511031), Tangfeifania diversioriginum (gene ID 127482) and the genes IDs 158770, 316573, 83331 and 254423.
For small multidrug resistance pump (emrE, qac, mmr, smr) (Fig. 3B), the gene ID 419991 present in Archaeoglobales archaeon formed a clade with 4 genes present in bacteria: the genes IDs 424272, 737623, gene ID 683720 present in Desulfonemais himotonii and gene ID 212872 present in Olavius sp. associated proteobacterium Delta1.
Figure 4 and Figure S5-S6 show the phylogenetic trees for superoxide dismutase, Fe-Mn family (SOD2) (Fig. 4A and S5) and superoxide reductase (dfx) (Fig. 4B and S6). For SOD2, the gene ID 550693 present in archaea formed a clade with a gene ID 201256 present in bacteria. For dfx gene, the gene ID 612886 present in archaea formed a clade with 3 genes present in bacteria lineages: two present in Deltaproteobacteria (genes IDs 56860, 26015) and one present in bacteria gene ID 612887. The gene ID 571069 present in Candidatus Micrarchaeota was closely related to genes present in candidate division KSB1 bacterium (genes IDs 255142, 206357) and gene ID 499729. Finally, 5 genes present in archaea with genes IDs 323416, 494295, 151464, 732097, 733106 present in Thermoplasmatales and Thermoplasmata formed a clade. This clade was closely related to a clade formed by the bacterial lineages Candidatus Cloacimonas sp. (411781), Peptoclostridium litorale (685247), Clostridia (847882), and the gene IDs 40055, 525801, 693963, 411780, 6209.
Figure 5 and Figures S7-S9 show the phylogenetic trees corresponding to heat, cold and phage shock protein. For heat shock protein HtpX (htpx) (Fig. 5A and S7), the gene IDs 443088 and 222562 from the archaea lineages of Thermoplasmata and Candidatus Micrarchaeota formed a clade with a gene recovered from a Deltaproteobacteria (gene ID 17526). The gene ID 350980 present in archaea formed a clade with 2 genes present in Candidatus Pacebacteria bacterium (genes IDs 137878, 190090). And the gene ID 60713 from an archaeon deeply branched in between clades holding bacterial genes.
For cold shock protein (cspA) (Fig. 5B and S8), the genes ID 806860, 245087, 649029, 199423 and 224539 detected from the archaeal lineages of Thermoplasmata, Candidatus Aenigmarchaeota archaeon and Thermoplasmatales formed a clade with two genes present in Chloroflexi bacterium (genes IDs 641093, 589131). The genes ID 454595 and 191917 present in Candidatus Woesearchaeota and Euryarchaeota were in a deeply branching clade with genes from Planctomycetes (gene ID 76796) and Desulfobacterales (gene ID 743598).
For phage shock protein C (pcpc) (Fig. 5C and S9), the gene ID 259327 present in archaea formed a clade with genes from candidate division KSB1 bacterium (gene ID 8195) and Actinobacteria (gene ID 500082). The gene ID 254336 present in Methanomas siliicoccalesa formed a clade with bacterial genes from candidate division KSB1 bacterium (gene ID 558252) and Mariniphaga anaerophila (gene ID 300663).
The phylogenetic trees representing the UV genes are shown in Fig. 6 and in the supplementary Figures S10-S13. For uvrA (Fig. 6A and S10), four genes classified as coming from the archaeal lineages of Euryarchaeota and Thermoplasmata (genes IDs 604005, 848457, 361714, 636321) formed a clade with two genes from bacteria (genes IDs 519469 and 777839). One gene classified as coming from Methanothermobacter (gene ID 727168) was within a deep clade with bacterial genes from Chloroflexi (gene ID 479360), Puniceicoccus vermicola (gene ID 808311), Spirochaetaceae bacterium (898458), and a Proteobacteria bacterium (gene ID 909516).
For uvrB gene (Figure S11), the gene ID 653100 taxonomically classified as Thermoplasmata branched near genes classified as Planctomycetes (genes IDs 728796, 54766).
For uvrC gene (Fig. 6B and S12), the genes IDs 699307, 905297, 647629 corresponding to archaea formed a clade with 6 genes present in bacterial lineages: Deltaproteobacteria (gene ID 873573), Spirochaetes (gene ID 664616), Spirochaetia (gene ID 391558), Spirochaetia (gene ID 399873), Spirochaetia (gene ID 179402) and Gemmatimonadetes bacterium (gene ID 82391).
For uvrD, pcrA gene (Figure S13), the gene ID 181004 present in Candidatus Bathyarchaeota formed a clade with a gene ID 297131 present in Bacteroidales bacterium.
For uvsE, UVE1 gene (Fig. 6C), two genes present in Thermoplasmata (genes IDs 807563 and 510363) and one gene present in uncultured Caudovirales phage (gene ID 667218) formed a clade with 3 genes present in bacteria (genes IDs 572443, 553365 and 74783, representing a Chloroflexi bacterium, an unclassified bacterium, and a Deltaproteobacteria bacterium, respectively).
Phylogenetic analyses showed an interlinking between domains, where genes taxonomically classified as archaea and viruses formed clades suggesting they were phylogenetically relatively closely related to genes present in bacteria lineages – suggesting they might have been acquired from the bacterial domain. In a recent study, sulfate reduction genes detected in viruses were identified as being phylogenetically closely related to bacterial lineages, suggesting they too may have been acquired from bacteria [39]. For microorganisms living in extreme environments, particularly in dense communities like microbial mats, horizontal gene transfer (HGT) is suspected to be the prominent method of gene gain in bacteria and archaea [61, 62]. HGT has been described in hyperhalophilic bacteria and archaea, suggesting adaptive responses against salt concentration, photobiology and bioenergetics [63, 64].