Virus-Host Interactions Reveal Potential Roles Towards Phosphorus Cycling in South Atlantic Ocean Euphotic Zones

Percy Mutseka Lunga University of Pretoria Oliver K.I Bezuidt University of Pretoria Miho Hirai JAMSTEC: Kaiyo Kenkyu Kaihatsu Kiko Yoshihiro Takaki JAMSTEC: Kaiyo Kenkyu Kaihatsu Kiko Taichi Yokokawa JAMSTEC: Kaiyo Kenkyu Kaihatsu Kiko Takuro Nunoura JAMSTEC: Kaiyo Kenkyu Kaihatsu Kiko J. Colin Murrell University of East Anglia Thulani Makhalanyane (  thulani.makhalanyane@up.ac.za ) University of Pretoria https://orcid.org/0000-0002-8173-1678

genetic endemism in this ocean. Three phosphatases, phoN, gmhB and rnhA-cobC, were identi ed as Pcycle AMGs in both prokaryotic double-stranded DNA viruses and eukaryotic Nucleocytoplasmic Large DNA viruses. These genes are associated with the acquisition of inorganic phosphate from phosphate esters, the largest reservoir of P-containing compounds in the marine environment. AMGs were identi ed in both uncultured and unclassi ed prokaryotic double-stranded DNA viruses predicted to infect Bacteriodetes, Proteobacteria, Chloro exota and Poseidonales lineages.

Conclusion
Together, these results suggest that viruses modulate P-cycling in euphotic zones of the ocean and that the acquisition of these phosphatase genes may be cues of P-ester stress.

Background
Viruses are the most numerically abundant entities on Earth and substantially regulate the structure and function of microbial communities [1][2][3]. As obligate parasites of marine microorganisms, viruses play essential roles in marine environments [1,[4][5][6]. Previous studies have shown that viruses in uence nutrient cycling, functional diversity of microbes and particle sinking rates across a diverse range of marine ecosystems [7][8][9][10]. The complexity of microbial interactions such as widespread horizontal gene transfer complicates our efforts to understand the precise impacts of viruses on ecosystem services [4,11,12].
There has been some evidence showing the genetic potential of viruses to augment microbial metabolism through auxiliary metabolic genes (AMGs) [13][14][15][16][17]. AMGs encoded by marine bacteriophages are metabolically diverse and include various processes such as photosynthesis and the sequestration of nitrogen, sulphur, and carbon [18][19][20][21]. Metabolic genes that are equally suggested to augment host cellular processes including central nitrogen metabolism, iron and phosphorous uptake have also been reported for eukaryotic Nucleocytoplasmic Large DNA viruses (NCLDVs) [22,23]. The discovery of diverse AMGs and elucidating their metabolic control on microbiomes has become the subject of extensive research over the past few years [24][25][26][27][28]. In oligotrophic environments, viruses may maximize host tness through the expression of AMGs, which have been shown to increase metabolic exibility [29][30][31]. For instance, cyanophages in oligotrophic systems with particularly low phosphorus levels have been shown to encode alkaline phosphatase (PhoA) and high-a nity phosphate-binding (pstS) genes. These ndings suggest that cyanophages play important regulatory roles linked to host responses to phosphate stress [32][33][34]. Moreover, genes for inorganic phosphate transporters (Pho4) have also reported to be encoded by NCLDVs that infect phytoplankton, further suggesting the overall importance and contributions of viral families towards host responses to nutrient uptake under limiting conditions [35]. Dissolved inorganic phosphate (DIP) is one of the key macronutrients, which limits microbial growth and primary production [36,37]. In oligotrophic oceanic regions, P is assimilated as fractions of the dissolved organic phosphorus (DOP) that constitute P-esters, P-anhydrides, and phosphonates [38,39]. This inventory of DOP exceeds the pool of inorganic phosphate which is a key chemical component of DNA, RNA and phospholipids [40,41]. As a result microbial communities and viruses contribute substantially to P-acquisition and turnover [41,42]. Oligotrophic plankton communities from the North Atlantic have been reported to replace membrane phospholipids with those lacking phosphate in response to P-starvation [43,44]. In another study, conducted in the Mediterranean Sea, C-Plyase complex (phnGHIJKLNMOP), high-a nity inorganic phosphate (pstSCAB) and organophosphate This is likely due to extensive nitrogen xation in the NAO, which results in phosphorous depletion, subsequently affecting microbial communities in the NAO [49][50][51]. However, we lack a clear understanding regarding the importance of viruses in biogeochemical cycling and in particular P acquisition. Previous studies suggest that the SAO is dominated by Alphaproteobacteria, Gammaproteobacteria, Thaumarcheota and Actinobacteria microbial lineages [52,53]. Viruses linked to these microbial communities were mostly from members of the Caudovirales order. These viruses were shown to harbour AMGs involved in amino acid and photosynthetic metabolism [52]. An understanding of AMGs involved in P acquisition may help in identifying the effects of viruses on marine ecosystems.
We primarily characterized the host-virus linkages in euphotic zones of the SAO and used shotgun metagenomic analysis to predict host-virus interactions in the SAO and reveal their role in P-cycling through AMGs. In addition to providing a compendium of metagenome assembled genomes (MAGs) from these environments, we demonstrate that the SAO metagenomes harboured 27 P-related AMGs.
Several of these AMGs were directly linked to assimilation of organic phosphorus. These genes included phosphatases, gmhB and rhnA-cobC, which have not previously been described as AMGs. Our results indicate the potential of viruses to augment P-acquisition from phosphate-esters and phosphonates in the SAO and in other marine environments.

Results And Discussion
A compendium of phylogenetically diverse bacterial and archaeal communities To estimate the diversity of the epipelagic microbial communities, we generated 620 metagenome assembled genomes (MAGs). These were dereplicated to 104 high and medium quality MAGs (>50% completeness and <10% contamination) following the MIMAG reporting standards [54]. In this study, we were able to retrieve a disproportionate number of high (32) and medium (72) draft quality MAGs compared to a recent study conducted in the SAO [52] (Supplementary Table 2). These MAGs represent the largest compendium of SAO genomes and comprised of taxonomically diverse archaeal (2) and bacterial (8) phyla ( Fig. 1b and 1c). MAGs from our study ranged from 0.6Mbp to 6.92Mbp and encoded between 854 to 6,648 proteins consistent with genome completeness estimates. Within bacterial MAGs, members of Bacteriodia (17), Gammaproteobacteria (26) and Alphaproteobacteria (14) classes were highly represented in each sampling station. For archaea, members of Thaumarchaeota (11) and Thermoplasmatota (5) phyla were only recovered from a few sampling sites (sampling stations 1, 2 and 3).  Table  3). The compendium of South Atlantic Ocean metagenomes suggests substantial metabolic versatility in bacteria and archaea from this environment. In this study, MAGs were binned to predict viral-host assignments and assess the potential of viral mediated augmentation of phosphorus cycling amongst different microbial lineages.

SAO harbours novel viral communities
To explore viral communities associated with the SAO euphotic zones, microbial communities were processed using VirSorter2. The viral population obtained in this study were expected to be derived from either proviruses, viruses in infection and replication stages in microbial cells or giant viruses because microbial cell fractions larger than 0.2 µm were used in this study. The results suggest that the 24 metagenomes harboured 7,363 viral contigs. These contigs were assessed for quality using CheckV and the resultant contigs, with qualities ranked as complete (0.2%), high quality (27,8%), medium quality (19,8%) and low quality (43,4%) (Supplementary Table 4 and Supplementary Figure 1) were further used for downstream analyses. Of these, 7,176 contigs (approximately 3% reduction) with sizes >5kb were clustered at >95% ANI over 80% of the shortest sequence. These resulted in 5,999 viral operational taxonomic units (vOTUs) of which 1,663 were >10kb. A large proportion of the total vOTUs comprised 3,581 putative NCLDVs as well as 2,348 dsDNA viruses (Supplementary Figure 2). Only 423 of 3,581 NCLDVs had >= 1 hallmark genes, and of these only 2 had one polB and MCP proteins, respectively. Based on the underrepresentation of these marker genes within the putative NCLDV contigs we did not conduct taxonomic classi cations. Comparisons of prokaryotic-speci c dsDNA vOTUs against known prokaryotic and archaeal viruses from NCBI RefSeq (version 85) using vconTACT2 indicated that only 8.5% of these were classi ed within the Caudovirales order ( Figure Table 5), suggesting that these lineages may be sources of organic matter which may shunt nutrients to the aphotic zone. The remaining 28.9% (excluding putative NCLDV contigs) of prokaryotic dsDNA vOTUs, that could not be taxonomically assigned, suggests that these could represent a wide range of uncharacterised viral communities in the SAO (Supplementary Figure 3). Consistent with previous studies, our results suggest that the SAO viral communities possess protein coding genes which have no homologous hits against the NCBI nr database. However, a much lower proportion of these genes (only 2-3%) were viral [52].

Euphotic zone viruses encode AMGs that potentially reprogram phosphorus cycling
To further understand how viruses in uence nutrient cycling and biogeochemical processes in the SAO, we explored vOTUs for the presence of auxiliary metabolic genes (AMGs). Our analysis revealed 1,279 AMGs involved in a diverse range of metabolic processes among 804 vOTUs of which 356 and 448 were identi ed in both putative NCLDVs as well as dsDNA viruses. Overall, AMGs involved in carbohydrate, amino acids and cofactors/vitamins metabolic pathways were the most overrepresented (Supplementary Figure 4). This suggests that the composition and abundance of AMGs may be in uenced by host diversity and intrinsic environmental conditions [65]. Therefore, the diversity and abundance of AMGs involved in nutrient cycling re ects speci c genes that potentially contribute to host adaptation and nutrient acquisition. Among these, were several AMGs, identi ed in seven uncharacterized vOTUs, which were linked to P assimilation and acquisition. These included acid phosphatases (phoN) and other previously undescribed phosphatases such as D-glycero-D-manno-heptose 1,7-bisphosphate phosphatase (gmhB) and ribonuclease H/adenosylcobalamin/alpha-ribazole phosphatase (rhnA-cobC) ( Figure 3, Supplementary  20,26,66,67]. As these were of major interest, we further explored the genetic content and structure of these AMGs following established methods [26,67]. The screening involved: (i) using Vibrant v1.2.0 to search for AMGs as previously described [11], (ii) structural modelling using SWISS-MODEL and Phyre2 for quaternary and tertiary structures respectively and (iii) assessment of promoters using BPROM. All three sequences of P-AMGs genes were annotated as those linked to phosphate acquisition, based on protein structure models ( Figure 4, Supplementary Table 6).
Acid phosphatase The phoN acid phosphatase genes encode for enzymes that catalyse the hydrolysis of phosphomonoesters (P-esters) to inorganic phosphate at optimal pH under phosphorus stress [68]. Pesters form part of the most widespread pool of dissolved organic phosphorus (DOP) in phosphorouslimited marine environments [34,69]. The presence of phoN, as an AMG, suggests the potential of viruses in the SAO to enhance and facilitate the assimilation of organic phosphorous in bacterial (Gammaproteobacteria, Dehalococcoidia and Bacteroidia) and archaeal (Poseidoniia) lineages following viral-host association analysis using VirHostmatcher (Supplementary Table 6). As the phoN gene was identi ed in viral contigs, we conducted phylogenomic analyses to assess its evolution and check for possible horizontal gene transfer events. The analysis resulted in incongruent ML gene trees, indicating that these phoN AMGs were likely acquired from bacterial and archaeal microorganisms ( Figure 5). This result suggests that these lineages may be highly abundant in the SAO [52,53] or that they likely respond rapidly to P-stress.
D-glycero-D-manno-heptose 1,7-bisphosphate The gmhB gene encodes a phosphatase, which hydrolyses D-glycero-D-manno-heptose 1,7-bisphosphate, producing phosphate as a by-product [70]. This gene is a member of the haloalkanoic acid dehalogenase (HAD) enzyme super-family and has not been previously detected in viral genomes. The putative giant viral contig (S11_k141_598071), which harboured this new AMG, had 45 other genes ( Figure 4). Of these genes, three had functional assignments to a liated sequences previously recovered from unclassi ed Phycodnaviridae which are known to infect algae in the Pyramimonas order [71]. gmhB catalyses two pathways that result in the production of precursors for the S-layer glycoprotein and Lipid A biosynthesis [70,72]. Previous studies have shown that microbial communities in low phosphorus environments use alternative P-recycling mechanisms [43,73]. These mechanisms may include the replacement of membrane phospholipids with those that do not contain phosphorus [43,74,75]. Our results suggest the potential augmentation of the P-cycle by giant viruses in the SAO utilising gmhB to replace membrane phospholipids with lipopolysaccharides during P limitation. In addition, our analysis suggests that carbohydrate phosphates such D-glycero-D-manno-heptose 1,7bisphosphate may be abundant organic phosphate compounds in marine environments playing essential roles as sources of phosphorus during P starvation.
Ribonuclease H/adenosylcobalamin/alpha-ribazole The rhnA-cobC encodes an enzyme which catalyses the hydrolysis of adenosylcobalamin-5`-phosphate to phosphate and adenosylcobalamin, the active form of vitamin B [76]. Many marine bacteria lack the ability to synthesise adenosylcobalamin, and as a result are dependent on archaea and phytoplankton for this function [77]. In this study, the rnhA-cobC gene was associated with putative giant viral contigs. All MAGs in this study did not have the rnhA-cobC gene ( Figure 6). This may be due to the assembly approach used or other sequencing related issues as rnhA-cobC genes are ubiquitous in prokaryotic genomes [78]. In addition, the ML tree suggests that the gene was likely acquired from bacteria further highlighting mosaicism of giant viruses (Figure 6). To the best of our knowledge, this is the rst description of the rnhA-cobC as an AMG. Functional characterisation of the rnhA-cobC has mostly been linked to its cleavage of RNA/DNA hybrids formed following replication and repair [79]. Our results suggest that adenosylcobalamin-5`-phosphate could be a potential alternative source of phosphorus when its hydrolysis liberates phosphate, thus facilitating microbial use of phosphate. Alternatively, rnhA-cobC could be used to cleave host RNA/DNA, molecules which are rich in phosphate, during infection and lysis. Other compounds such as methylphosphonic acid, various phosphonates and monophosphate esters have been shown to be alternative sources of phosphorus [75,80,81]. Similar to phosphate esters, adenosylcobalamin-5`-phosphate has a C-O-P bond which suggests its potential use as a source of P in euphotic zones. Viruses may be shifting their ecological strategies to acquire non-canonical P-genes, thus enabling the assimilation of alternative sources of P by their prokaryotic hosts.

Conclusion
The South Atlantic Ocean is characterized by distinct genetic endemism and generally low levels of nitrogen and phosphorus [53,82]. The paucity of SAO metagenomics studies has resulted in a clear knowledge de cit regarding the roles played by viruses in shaping microbial diversity, ecology and evolution. This study revealed a large proportion of uncharacterised viruses, which putatively infect diverse consortia of archaea and bacteria. Virus-encoded AMGs, comprised of genes related to carbohydrate, amino acids and cofactors/vitamins metabolic pathways. Signifying that viruses have the potential to augment host metabolism during infection. Our analysis also revealed several AMGs, potentially related to P acquisition and assimilation by bacteria. These AMGs include phoN, ghmB and rnhA-cobC, which are outside the canonical repertoire of phosphorus genes. These results suggest that viruses potentially augment the acquisition of alternative sources of P in the form of monoesters and organic phosphorus compounds such as adenosylcobalamin. In addition, these P-AMG signatures suggest that their speci c pathways in Bacteroidetes, Proteobacteria, Chloro exota and Poseidonales are those that potentially respond rapidly to P-starvation. As regions of high productivity, nutrient depletion and high prokaryotic diversity, euphotic zones represent model ecosystems for studying the contributions of viruses to the physiology of their hosts under nutrient starvation. Since only a small proportion of viral contigs could be classi ed in this study, there remains large gaps in the understanding of the microbiology of SAO. Together, the data from our study provide fundamental insights regarding potential viral-host dynamics in euphotic zones of the SAO.

Materials And Methods
Cruise details and sampling using SAMtools v1.9 as previously described [90].

Viral identi cation and taxonomic assignments
Viral signatures in assembled contigs (> 5 kb) were identi ed using VirSorter v2.1 as previously described [102] Brie y, contigs were analysed using VirSorter v2.1 (--include-groups all -min-length 5000 -minscore 0.5) and the quality of the resultant contigs was checked using CheckV [103]. The output results nal-viral-score.tsv and contamination.tsv from VirSorter and CheckV were used to screen viral contigs for downstream analysis respectively. Quality checked contigs were screened based on the following criteria: VirSorter_max_score (>=0.95), viral gene (>0) and viral gene =0 (and host gene =0 or score >=0.95 or hallmark >2) as previously described [104]. The resultant outputs were further manually checked for proportions of contigs putatively assigned as double-stranded DNA viruses (dsDNA), nucleocytoplasmic large DNA viruses (NCLDV) and single stranded DNA (ssDNA). Viral populations were determined by clustering viral contigs > 5kb at 95% average nucleotide identity over 80% of the shortest sequence using CD-HIT-EST from the CD-HIT package [105]. vconTACT2 was used to classify viral contigs [106]. Brie y, viral open reading frames were rstly predicted using Prodigal v2.6.3 as described previously [88,107]. Predicted open reading frames were then queried against the NCBI Bacterial and Archaeal Viral RefSeq V85 with ICTV and NCBI taxonomy [108] used in vconTACT2 (http://www.cyverse.org). The results of viral classi cation were then visualised on Cytoscape v3.8.1 and colour coded based on respective order taxonomic rank [109,110].

AMG identi cation, annotation and validation
Auxiliary metabolic genes were determined using VIBRANT v1.2.0 (in "virome" mode), which was accessed via the CyVerse platform [111]. Contigs containing phosphorus acquisition and assimilation genes were queried and subjected to a series of validation steps to ensure that they could be attributed to viral sequences and to con rm the functional annotations [112]. Genetic architecture plots to visualise the organisation of genes on contigs were generated using the R package gggenes (https://wilkox.org/gggenes/). Conserved domains and active sites of phosphorus AMGs were identi ed using the NCBI conserved domain search (e-value 0.001) (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). Protein domains and structural homology of all AMGs were identi ed using Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2) [113]. Predicted secondary structures, with a 100% con dence score and alignment coverage above 70%, were further analysed with the SWISS-MODEL to predict quaternary structure (GMQE > 0.5) [114]. Promoter regions in the protein domains were predicted using BPROM [115].

Viral relative abundance, distribution and host-viral matches
To determine the relative abundances of viral populations containing P-cycle AMGs, all contigs were concatenated and then used as a database to recruit the quality trimmed reads using BBMap [89]. The relative abundance of each population per sample was estimated from the resulting bam les and converted into table using a custom wrapper script from BamM (https://github.com/ecogenomics/BamM). Coverage values as relative abundance proxies were calculated using the "tpmean" algorithm, normalized for the size of each metagenome in bases, and the length of each contig as previously described [26,116]. The resultant data were used to assess the distribution of viral populations harbouring P-cycle AMGs among sampling stations and visualised using ggplot2 in R v4.0.3 [117,118]. To infer putative links between viruses and hosts, VirHostMatcher was used to compute oligonucleotide frequency between viral contigs and MAGs [119]. As the goal of the study was to primarily study the interactions and in uence of prokaryotic viruses towards bacterial hosts, all putative giant viral contigs were removed prior the analysis. Putative virus-host interactions were ltered using a d2* dissimilarity cut-off < 0.3 and visualised using Cytoscape v3.8.1.

Phylogenetic tree reconstruction
Functional analyses of rnhA-cobC (ribonuclease H/adenosylcobalamin/alpha-ribazole phosphatase) and phoN (acid phosphatase (class A)) genes were performed to investigate the evolutionary origin of these AMGs. Protein sequences were queried against the NCBI nr database (blastp 1000 bit score cut-off, and evalue 0.001) [120]. Closely related sequences were retrieved from the NCBI to estimate the non-viral context of sequences in the phylogenetic trees. The closest relatives were selected and reduced to representative sequences using CD-HIT (-c 0.9 -n 5). Protein sequences were aligned using MAFFT (--auto) and trimmed using trimAL (default parameters) [121,122]. Functional gene trees reconstruction were performed using PhyML with the Chi 2 -based branch support and visualised using iTOL [101,123].

List Of Abbreviations
AMGs

Consent for publication
Not applicable.

Availability of data and material
Nucleotide sequence data have been submitted to NCBI SRA under the study BioProject ID PRJNA748242.

Figure 2
Page 24/27 Taxonomic assignment of viral populations protein network clustering with reference phages. In the protein network, each shape represents a single viral population or reference phage, and shapes are connected by lines respective to shared protein content. Viral population taxonomy was coloured according to family level. The full untrimmed network is supplied as Supplementary Figure 3.