Strong evidence of predator-prey associations in Antarctic soils
Our study expands current insights into the genetic mechanisms underlying predator-prey co-evolutionary associations between bacteria and their viruses in poly-extreme Antarctic conditions (Supplementary Table 1). Following metagenome sequencing, assembly, and genome binning (Supplementary Tables S2 and S3), we recovered 18 medium- to high-quality metagenome-assembled genomes (MAGs) (Fig. 1a). These MAGs include three Acidobacteriota, one Cyanobacteria, twelve Bacteroidota and two Verrucomicrobiota, representing both dominant and rare bacterial phyla in these soils (Supplementary Note 1 and Supplementary Fig. 1). The genome sizes of these bacteria ranged from 2.7 to 5.9 Mb when accounting for completeness. These genomes had moderately low G + C contents (mean = 42.8%), which is surprising given the expectation that extreme environments may select for organisms with high G + C content 50,51. We estimated the genome replication rates for each MAG 52 and found that the highest rates were associated with Acidobacteriota (mean = 3.06) and Verrucomicrobiota (mean = 2.90), compared with lower values for Bacteroidota (mean = 2.48) and Cyanobacteria (2.04). Estimating the minimal doubling time from codon usage bias 53 suggests that growth rates across all MAGs may be low, with only members of Flavisolibacter predicted to double in under five hours. Overall, these patterns suggest extremely slow growth rates, which is expected given the poly-extreme conditions in this region (see additional information regarding the climatic conditions and these taxa in Supplementary Note 1). Given the comparatively low division rates, it is reasonable to predict that the slow evolutionary processes acting on microbial communities in this environment are likely to delay selection. There is some support for this assertion, including the divergent ecological patterns of Antarctic microbiota and their substantial differences from microbiota outside the continent.
Our study provides strong evidence that Antarctic bacterial communities may have ancient origins. Bayesian evolutionary analysis, used to produce time-measured phylogenies, suggests that the divergence times of the genomes retrieved in our study ranged between 500 and 1,200 Mya (Fig. 1b). These findings are consistent with previous estimates of cryptoendolith divergence in Antarctica 54. The results also support approximations for other genomes retrieved from Antarctic soils 46. Altogether, our Antarctic MAGs formed unique and distinct monophyletic clades, suggesting that these bacteria diverged from other known taxa during the Precambrian (before ca. 541 Mya). Taxonomic analysis indicated that 12 of our MAGs potentially represent novel species. Considering this evidence, we predict that bacteriophages associated with these bacteria may also have been co-separated from other microorganisms for a similar length of time, since host-virus interactions are mostly strain specific. We hypothesized that this distinct and specific co-evolution may be corroborated by the recovery of uncharacterized and potentially novel adaptive immune signatures in Antarctic host genomes.
To test this hypothesis, we characterized prophages in our Antarctic genomes. We found one prophage in the Verrucomicrobiota MS5-1_8 genome and two prophages in a Bacteroidota genome (TG5-1_3). Coincidentally, these two taxa were predicted to be the slowest growing. This finding provides direct evidence linking prophages to both Verrucomicrobiota and Bacteroidota in this region (Table 1). The prophage found in the Verrucomicrobiota genome was 4,483 bp in length and was similar to known Microviridae phages (BLASTn: 31% query coverage and 69.74% identity). The prophages identified in the Bacteroidota genome were 5,477 bp and 7,214 bp in length, although these had low sequence similarity to known phages (more information on the uncultivated viral genomes identified here is provided in Supplementary Note 2 and Supplementary Fig. 2). While several studies have investigated the importance of prophages in the evolution of bacterial pathogens 55, knowledge regarding their role in soil taxa remains limited, and the lack of a comprehensive database of soil-dwelling viruses contributes to this knowledge gap 56. Previous studies suggest that lysogenic conversion may confer new traits on bacteria 57, typically novel metabolic functions encoded by auxiliary metabolic genes (AMGs) 58. For example, genes encoded by phages associated with Verrucomicrobiota have previously been implicated in nitrogen fixation 59. In oligotrophic soils, the benefits to the hosts may be vital for ecosystem services, facilitating access to alternative energy sources or stress avoidance mechanisms. Elucidating innate mechanisms may provide insights regarding host-virus interactions and reveal the extent of functional diversity in these soils.
CRISPR systems provide evidence of bacterial virus attacks in Antarctica
The detection of prophages in MAGs and AMGs on phage contigs (Supplementary Note 2) supports the prediction of unique host-virus histories, as most viral genomes are completely unrelated to known viruses (Fig. 2) 60–62. The results from this study suggest that the host adaptive immune system, associated with divergent microbiota in Antarctic soils, may be more prominent than initially envisaged. However, apart from previous studies on rock-associated microbiota, there is a severe knowledge deficit regarding host adaptive immune systems associated with poly-extreme environments.
To further explore host-virus histories associated with our MAGs, we searched for diverse defence strategies against phage predation. As part of characterizing the adaptive immune system, we found putative CRISPR-Cas systems in 16 of the 18 MAGs (ca. 89%). In terms of CRISPR arrays, the identified conserved repeats ranged from 23 to 30 bp across four of the retrieved genomes. The repeats were flanked by unique spacers that were, on average, 36 bp in length (range 34–38 bp). The largest set of CRISPR cassettes was found in the MS7-5_6 genome (Acidobacteriota), with 51 CRISPR spacers housed between 6 unique CRISPR repeats. This value is within the optimal number of spacers, previously suggested to range between 10 and 100 within bacterial genomes 63. The CRISPR loci within bacterial genomes retain the memory of past viral infections 63,64. Yet, the length of these loci appears to be directly related to the capacity to respond to an infection 65. In other words, there appears to be a trade-off between maintaining a vast genetic memory of attacks (harbouring more spacers) and the functionality of the CRISPR mechanism 63. The remaining genomes only had between three and 16 spacers, which is more similar to human gut microbes (average of 12 spacers) 66 than to the average cassette size of between 20 and 40 spacers 67. We speculate that the lower spacer count may be due to limited encounters with a small set of phages. In this scenario, the spatial constraints of the soil microhabitat limit the number of potential interactions between phages and putative hosts. This suggests that phage diversity may be low in this region of Antarctica. Not only are cells immobilized by adsorption to soil particles of the Antarctic desert pavement, but they are also rarely, if ever, subject to precipitation events that might mobilize them, which considerably reduces the spectrum of infection events.
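The repeat-spacer structure described above can be illustrated with a short Python sketch. It is not the pipeline used in this study: it assumes a perfectly conserved repeat, and the repeat, spacer and flanking sequences are synthetic placeholders chosen only so that the toy spacers fall within the 34–38 bp range reported here. The function name `extract_spacers` is hypothetical.

```python
from statistics import mean


def extract_spacers(array_seq: str, repeat: str) -> list[str]:
    """Split a CRISPR array on an exact repeat and return the spacers.

    Assumption: repeats are identical copies. Real arrays often carry
    degenerate repeats, which this simple split would not handle.
    """
    parts = array_seq.split(repeat)
    # The first and last pieces are flanking sequence, not spacers.
    return [p for p in parts[1:-1] if p]


# Synthetic example: a 23 bp repeat and three spacers of 34, 36 and 38 bp.
REPEAT = "GTTGCACCGGCTCGAAAGAGCCG"
SPACERS = ["ACGT" * 8 + "AC", "TGCA" * 9, "GATC" * 9 + "GA"]
ARRAY = "AAAA" + REPEAT + REPEAT.join(SPACERS) + REPEAT + "TTTT"

spacers = extract_spacers(ARRAY, REPEAT)
lengths = [len(s) for s in spacers]
print(len(spacers), min(lengths), max(lengths), mean(lengths))
```

For real data, dedicated CRISPR detection tools that tolerate repeat degeneracy would be preferable; the sketch only makes the counting of spacers and the summary of their lengths explicit.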
In addition to the CRISPR-Cas cassettes, the 16 MAGs had relatively low abundances of cas genes, with between 6 and 42 loci per MAG. Of the 221 cas sequences in total, 122 were unclassified, followed by several classified sequences including 48 type III, 31 type I, 20 type IV and 2 type V Cas systems. These Cas types are similar to those previously reported in Antarctic surface snow, in which CRISPR-Cas types I, II and III were most common 68. The MS7-5_6 MAG (Acidobacteriota) had a contig with 10 genes associated with a hybrid CRISPR-Cas class 1 system. This contig also had a GCN5-related N-acetyltransferase (GNAT) toxin domain 69 (see Fig. 3a), which functions by acetylating charged tRNA molecules to prevent translation. Previous studies suggest that these GNAT toxin domains may represent novel substrates for several enzymes linked to antibiotic modification 70.
We further investigated unbinned metagenomic contigs that possessed eight or more co-localized cas genes to determine whether they represented novel CRISPR-Cas variants. Taxonomically, the CRISPR-Cas systems recovered from these contigs were affiliated with members of the Acidobacteriota (n = 6 contigs), Unclassified Bacteria (n = 2), Chloroflexota (n = 1) and Bacteroidota (n = 1). However, the taxonomic relationships of these taxa suggest potentially shared histories with a variety of bacterial phyla (Fig. 3b). The architecture of effector complexes within the CRISPR-Cas systems suggests that most of these were class 1, with type I or type III systems. Genes for Cas1 and Cas2 proteins were ubiquitously distributed across all contigs (Fig. 3a). These genes were always structured as Cas1-Cas2 complexes. In four examples, the Cas1-Cas2 complex was flanked upstream by cas4, which directly interacts with the Cas1-Cas2 complex to process pre-spacers prior to integration as the Cas4-Cas1-Cas2 complex 71. However, in two instances, our analyses revealed that cas4 was downstream of the Cas1-Cas2 complex, which is an unconventional arrangement of these genes based on data from previous studies 36. In all 10 cases, the effector genes were located upstream of the Cas1-Cas2 operon.
The remaining four CRISPR-Cas systems may represent novel variants, based on the arrangements of their effector modules (Fig. 3b) 36,72. These results imply ongoing horizontal gene transfer and recombination events among diverse CRISPR-Cas loci, driven by continuous interactions with the same viruses. Notably, these uncategorized CRISPR-Cas system types were affiliated with members of the Acidobacteriota. They include contig FI-1_NODE_368 (cas2-cas1-cas4-cas3), which lacks an effector complex and seems to be closely related to Type I-U. Contig FI-1_NODE_81 (cas4-cas2-cas1-cas6-cas3-cas5-cas7-cas8b1-cas7-cas8b1-cas7) is a potential Type I-B variant, based on multiple copies of cas7 and cas8 at the terminus of the array. Contig MtG-4_NODE_208 (cas2-cas1-cas1-RT-csm3gr7-csm3gr7-csm3gr7-cas10) is potentially a Type III-U array with three copies of csm3gr7. Finally, contig PT-2_NODE_41 (cas6-cas2-cas1-csm3gr7-csm3gr7-csm3gr7-cas10) may be a Type III-A or Type III-U variant, since it lacks the csm2, csm4 and csm5 genes.
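The gene orders reported above lend themselves to a simple tabulation. The following Python sketch is an illustration only, not the classification procedure used in this study. It parses the four arrangements exactly as written in the text, counts gene copies, and lists which expected genes are absent. The helper names `gene_counts` and `missing_genes` are hypothetical.

```python
from collections import Counter

# Gene orders copied from the text; names are separated by hyphens.
ARRANGEMENTS = {
    "FI-1_NODE_368": "cas2-cas1-cas4-cas3",
    "FI-1_NODE_81": (
        "cas4-cas2-cas1-cas6-cas3-cas5-cas7-cas8b1-cas7-cas8b1-cas7"
    ),
    "MtG-4_NODE_208": "cas2-cas1-cas1-RT-csm3gr7-csm3gr7-csm3gr7-cas10",
    "PT-2_NODE_41": "cas6-cas2-cas1-csm3gr7-csm3gr7-csm3gr7-cas10",
}


def gene_counts(arrangement: str) -> Counter:
    """Count how many copies of each gene occur in an arrangement."""
    return Counter(arrangement.split("-"))


def missing_genes(arrangement: str, expected: list[str]) -> list[str]:
    """Return the expected genes that are absent from an arrangement."""
    present = set(arrangement.split("-"))
    return [gene for gene in expected if gene not in present]


for contig, order in ARRANGEMENTS.items():
    repeated = {g: n for g, n in gene_counts(order).items() if n > 1}
    absent = missing_genes(order, ["csm2", "csm4", "csm5"])
    print(contig, "repeated:", repeated, "absent:", absent)
```

Such a tabulation makes the basis of each call explicit, for example the three copies of csm3gr7 on MtG-4_NODE_208 and the absence of csm2, csm4 and csm5 on PT-2_NODE_41.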
All 10 predicted CRISPR-Cas systems were associated with CRISPR arrays. These arrays were composed of spacers that ranged from 2 to 122 bp in length, with an average length of 35 bp. The cas2 sequences showed some divergence from those previously reported, and these results contrasted with our expectations. Instead, the cas2 sequences clustered among unrelated phyla, in some cases grouping within the domain Archaea (Fig. 3a). Indeed, our results show several cases where Archaea cluster with Firmicutes and other unrelated phyla. These results are not surprising given that these genes are known to be horizontally acquired. This may indicate that the cas2 gene is not always taxonomically conserved. Instead, the result suggests mobilization via inter-phylum horizontal gene transfer (HGT) events or the existence of phylum-specific cas subtypes. A recent study showed that CRISPR-Cas systems may contribute to the propagation of transposable elements by facilitating transposition into specific sites 73. Our results support previous reports, since we found transposase elements on almost half of the 10 CRISPR-Cas-containing contigs analysed.
Based on these data, we speculate that these Antarctic CRISPR-Cas systems were horizontally transferred in ancient mobilization events. This suggestion is supported by an evaluation of the G + C skew among the 10 contigs containing cas genes, used as a proxy for the timing of insertion events 74. Here, we inferred HGT through the detection of strong deviations in the G + C content of a genomic fragment compared with the remaining genomic signature. Specifically, on NODE_81 from the FI-1 metagenome, the G + C content over the cas genes varied minimally across each gene yet was markedly different from the G + C content of the CRISPR array upstream of the cas genes (Fig. 4). By contrast, the contigs containing integrated prophages within the microbial genomes showed very high variations in G + C content (i.e. G + C skew) across the contig, which possibly indicates a foreign origin 75 (Supplementary Fig. 3). Our Bayesian estimates also indicated ancient divergence events of our MAGs from known bacteria. It is thus likely that the phages of these bacteria have similarly ancient Precambrian histories, which offers a possible explanation for their unique gene compositions.
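As an illustration of how deviations in G + C content along a contig can be screened, a minimal Python sketch follows. It is an assumption-laden example rather than the method used in this study: the window size, step and z-score cutoff are arbitrary illustrative parameters, the function names are hypothetical, and the sketch computes windowed G + C content only.

```python
from statistics import mean, pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def windowed_gc(seq: str, window: int = 500, step: int = 250):
    """Return (start, G + C fraction) for every full window."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def flag_deviant_windows(seq, window=500, step=250, z_cutoff=2.0):
    """Flag windows whose G + C content deviates from the contig mean.

    A window is flagged when its absolute z-score, computed against all
    windows on the same contig, is at least z_cutoff (illustrative value).
    """
    windows = windowed_gc(seq, window, step)
    values = [gc for _, gc in windows]
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [(s, gc) for s, gc in windows if abs(gc - mu) / sigma >= z_cutoff]


# Synthetic contig: an AT-only backbone with one GC-only 500 bp insert.
toy_contig = "AT" * 1000 + "GC" * 250 + "AT" * 1000
print(flag_deviant_windows(toy_contig, window=500, step=500))
```

On real contigs, a region with uniform G + C content that matches the host background would produce no flagged windows, whereas a recently acquired element with a foreign composition would stand out, which mirrors the contrast drawn here between cas loci and prophages.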
Following this, we explored our data for the diversity of type V CRISPR-Cas systems. We identified a total of 216 contigs longer than 1 kb, from 16 of the 18 metagenomes, with predicted cas12 effector proteins. Of these, 112 contigs, ranging in size from 1,007 to 48,306 bp and possessing non-partial cas12 proteins, were retained for downstream analyses. The lengths of the effector proteins on these contigs varied from 89 to 630 amino acids, in contrast to previous reports indicating that type V associated effector proteins are on average ~ 400 amino acids or longer 76,77. Effector proteins associated with type V CRISPR-Cas systems are mainly distinguished by the possession of a RuvC nuclease domain, and we found this domain in 111 of our effectors, including the smallest (89 aa) putative cas12 protein. Only one effector lacked a RuvC domain but possessed a helix-turn-helix domain. Further inspection of the contigs indicated that only 13 of our effectors were proximal to CRISPR arrays and that, unlike typical CRISPR-Cas systems, none of the 112 were co-localized with the cas1-cas2 complex. Phylogenetic analysis indicated that just nine of our effectors (Ant Cas U5-8) clustered with previously characterized cas12 effectors. A further 18 effectors showed a close phylogenetic relationship with transposon-encoded TnpB proteins, further suggesting that type V effectors may have evolved from TnpB-associated nucleases 78. An additional 83 effectors formed a distinct clade (indicated as Ant Cas U4), potentially representing a novel subgroup of cas12-like effectors (Fig. 5).
Altogether, we speculate that the unique diversity of the genes found in these Antarctic soils may be the result of a ‘slowed down’ evolution of genes selected during warmer periods. The Antarctic continent was a temperate rainforest during the mid-Cretaceous period ~ 90 Mya 79 and we speculate that the subsequent cooling of the continent may have constrained evolutionary forces from acting at their previous pace. Combined, these lines of evidence point to an ancient, acquired immunity of bacteria in Antarctica, while contemporary infection events continue to occur through lysogenic phage infections.
We used metagenomes from remote and pristine Antarctic soils to assess their viral and bacterial diversity. Multiple lines of evidence suggest extensive phage-host interactions, potentially novel viral diversity, and CRISPR-Cas variants. The phage signatures (vOTUs) were linked to the infection of dominant soil bacterial lineages in these surface soils, including members of the Bacteroidota and Acidobacteriota, while prophages embedded within Verrucomicrobiota and Bacteroidota MAGs offer further insight into contemporary infections. CRISPR-Cas systems, part of the bacterial adaptive immune system, were common to 4 of 18 MAGs analysed, indicating acquired immunity in both Bacteroidota and Acidobacteriota. Additional class 1 CRISPR-Cas arrays (types I-B, I-C and I-E) were detected in the assembled metagenomes, where four CRISPR-Cas arrays did not perfectly match existing architectures and thus may be novel variants. Our analysis of G + C content and G + C skew across CRISPR-Cas contigs showed low variation in CRISPR-Cas arrays but more variation in prophages, suggesting that these acquired immunity markers are ancient whereas proviral elements appear to be the result of recent foreign DNA transfer, as further evidenced by the description of novel, Antarctic-exclusive cas12-like effectors.