Soil plays a vital role in the distribution of organisms and their contributions to global biogeochemical cycles [22]. However, our understanding of soil viruses, and of the factors driving their distribution, lags far behind that of marine viruses. Furthermore, the ways in which viruses participate in the biogeochemical cycling of soil elements have not been extensively investigated. This study provides evidence that latitude is a key driver of viral distribution in soils. In addition, the higher abundance of virus-encoded P metabolism genes in agricultural soils indicates that viruses may play a role in P cycling in these soil ecosystems.
The taxonomic distribution of soil viruses
The order Caudovirales (tailed viruses that infect Bacteria and Archaea), including Siphoviridae, Myoviridae and Podoviridae, was dominant in all of our soil samples, in agreement with previous studies [9, 11, 12, 16, 46]. In Antarctic soils, Podoviridae were present at similar levels in all samples, whereas the abundances of Myoviridae and Siphoviridae were inversely correlated, possibly because they compete directly for hosts in the same niche, and Siphoviridae were consistently more abundant in neutral to alkaline soils [16]. Our study showed a different trend: Myoviridae occurred in all samples, whereas Siphoviridae and Podoviridae were mainly present in the more acidic soils. More data are needed to establish general patterns, especially because many viruses in the viromic data could not be classified.
Moreover, our soil viromes revealed diverse ssDNA viruses belonging to the Microviridae, Circoviridae, Genomoviridae, and Cruciviridae (Fig. 1c). The broad presence of ssDNA viruses is possibly due to the bias of MDA, which is necessary to generate enough viral DNA for metagenomic analyses but preferentially amplifies genomes of ssDNA viruses and thus leads to a quantitative bias [47]. Therefore, we analyzed the dsDNA and ssDNA viruses separately, and ssDNA viruses were reported in a qualitative rather than quantitative way in this study.
Latitude drives viral community composition and function
Viral community composition has been associated with a variety of environmental factors, such as host community composition, pH, soil depth and moisture [14], and calcium content and site altitude [16]. According to the unsupervised random forest analysis, the viral communities and functions from 19 soil samples across China grouped into three clusters, which corresponded well to geographical location (Fig. 1a, Fig. 3a and Fig. S1). A subsequent supervised random forest analysis showed that latitude was the main environmental driver of these clusters for both viral communities and functions. Few reports have addressed the effects of location on the distribution of viruses. For example, the altitude of Antarctic soils, which is probably linked to temperature, could influence microbial metabolism and substantially affect viral communities and functions [16]. The change in temperature with latitude in this study may have similar effects, especially on viral community composition. All of the viruses that differentiated these clusters were unclassified, which highlights the lack of knowledge and reference sequences for soil viruses.
Although phosphorus is important for viral genome synthesis, our results did not indicate any relationship between soil available P content and viral communities or functions. One possible explanation is that fertilization times differ among agricultural regions, so the soils may have been sampled at different stages of phosphorus metabolism. Alternatively, soil available P content may affect viral abundance more than viral community composition, a possibility that we will examine in future work.
Viruses may directly manipulate P cycling in soils
The phoH gene has been widely used as a signature gene for assessing viral phylogeny and diversity, and is encoded by various morphologically distinct viruses that infect a wide range of hosts, including autotrophic and heterotrophic Bacteria and Eukaryotes [27, 28]. Diverse phoH genes have been found in viral communities inhabiting numerous environments, such as seawater [27], paddy water [45], and a Namib hypolith [11]. In these studies, phoH genes were distributed according to depth and location [27] or biogeography [45], or were found to be entirely novel [11]. In this study, phylogenetic analyses showed that a few phoH sequences in groups 1, 2, 3 and δ (Fig. 4) were widely distributed in agricultural soils, paddy water [45] and seawater from sites around the world [7, 27]. However, most viral phoH sequences (29/33) in this study belonged to two new viral clades (groups S1 and S2) that clearly differed from those in marine and paddy waters (Fig. 4). The majority of the Namib hypolith phoH amino acid sequences clustered separately from other sequences and were omitted from our phylogenetic tree. These results support the inference that the distribution of viral phoH genes depends on the characteristics of the environment [48].
An analysis of a database of 2,473 soil profiles from the second Chinese soil survey [49] showed relatively consistent C:P (136) and N:P (9.3) ratios, with a highly constrained C:N:P ratio of 134:9:1 for the surface soils of both agricultural and natural soils [50]. This ratio indicates that, relative to C and N, the P content of Chinese soils is generally lower than that required by phages, which have a C:N:P ratio of 20:6:1 [19]. Because P diffuses slowly and is strongly fixed in soils, and because crops take up P during agricultural production [18], P can be a major limiting factor for soil microbes, especially viruses. Against this background, the P-deficient environment may select for viruses that regulate P uptake and metabolism through evolution of the phoH gene. Notably, all 33 phoH gene sequences identified in this study were from viruses in agricultural soils. It is possible that agricultural soil is rich in dissolved organic matter produced via photosynthesis and in nitrogen applied as fertilizer, and that these excesses of C and N result in P being limiting. Once P fertilizer is applied, a virus may prompt its host to rapidly absorb inorganic P (Pi) and use PhoH to support its own reproduction (Fig. 6).
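The stoichiometric comparison above can be made explicit with a short calculation. The sketch below is illustrative only: it uses the two ratios quoted in the text (134:9:1 for surface soils [50] and 20:6:1 for phages [19]) and assumes that both are expressed in the same units relative to P = 1. The variable and function names are hypothetical and are not part of the analysis pipeline used in this study.

```python
# Ratios quoted in the text, expressed relative to P = 1.
# Assumption: both ratios use the same units, so they can be compared directly.
SOIL_CNP = (134, 9, 1)   # surface soils, second Chinese soil survey [50]
PHAGE_CNP = (20, 6, 1)   # phage particles [19]


def per_unit_p(ratio):
    """Return (C per unit P, N per unit P) for a C:N:P ratio."""
    c, n, p = ratio
    return c / p, n / p


def p_fraction(ratio):
    """Return P as a fraction of the summed C + N + P amounts."""
    c, n, p = ratio
    return p / (c + n + p)


soil_cp, soil_np = per_unit_p(SOIL_CNP)
phage_cp, phage_np = per_unit_p(PHAGE_CNP)

# How much more C and N the soil holds per unit of P than a phage contains.
cp_fold = soil_cp / phage_cp
np_fold = soil_np / phage_np

print(f"C:P  soil {soil_cp:.0f} vs phage {phage_cp:.0f} ({cp_fold:.1f}-fold)")
print(f"N:P  soil {soil_np:.0f} vs phage {phage_np:.0f} ({np_fold:.1f}-fold)")
print(f"P fraction  soil {p_fraction(SOIL_CNP):.4f} vs phage {p_fraction(PHAGE_CNP):.4f}")
```

Under these assumptions, the soil C:P ratio is about 6.7 times, and the soil N:P ratio 1.5 times, that of a phage, which is the sense in which soil P is scarce relative to phage requirements.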
To better understand the metabolic potential of phoH genes, we searched for, but did not find, additional genes in the Pho regulon. However, four functions related to nucleotide synthesis (dUTPase, MazG, Thy1, and RNR) were identified in association with phoH, and together they may act as a P metabolism module. Previous studies have demonstrated the presence of at least five proteins involved in P metabolism, including PhoH, RNR, Thy1, endodeoxyribonuclease, and MazG pyrophosphatase, in marine phage genomes [41, 42]. Similar modules, comprising dUTPase, PhoH, RNR, and Thy1, were also found in two complete viral genomes from two agricultural soils in our unpublished data (Fig. S4). Before our study, only two P metabolism genes (phoH and RNR) had been reported in terrestrial ecosystems [11]. Here, five P metabolism genes were identified, particularly in agricultural soils (Fig. 5a). Among them, MazG is reported to be a nucleoside triphosphate pyrophosphohydrolase, which can hydrolyze all eight of the canonical ribo- and deoxynucleoside triphosphates to their respective monophosphates and PPi, with a preference for deoxynucleotides [51]. RNR, known as ribonucleoside diphosphate reductase, converts all four ribonucleoside diphosphates (rNDPs) to the respective deoxynucleoside diphosphates (dNDPs), which are then rapidly converted to dNTPs [41, 52]. dUTPase catalyzes the hydrolysis of dUTP to dUMP and diphosphate, thereby providing a substrate (dUMP) for thymidylate synthase [40]. Thy1 converts dUMP to dTMP in a reaction that depends on FAD, NADPH and 5,10-methylenetetrahydrofolate [53]. PhoH has been reported to be a cytoplasmic protein with ATP-binding activity and is predicted to be induced by P starvation [54]; however, its function remains unknown.
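The enzyme activities listed above can be summarized as a small lookup of substrate-to-product steps. The following is a minimal sketch that encodes only the conversions stated in the text; the metabolite labels are simplified, cofactors and released PPi are omitted, the unnamed dNDP-to-dNTP step is left out, and the `STEPS` table and `trace` function are hypothetical names introduced for illustration.

```python
# Substrate -> (enzyme, product), as described in the text.
# Labels are simplified; cofactors and released PPi are not tracked.
STEPS = {
    "(d)NTP": ("MazG", "(d)NMP"),   # triphosphate -> monophosphate + PPi [51]
    "rNDP": ("RNR", "dNDP"),        # ribo- to deoxynucleoside diphosphate [41, 52]
    "dUTP": ("dUTPase", "dUMP"),    # supplies dUMP for thymidylate synthase [40]
    "dUMP": ("Thy1", "dTMP"),       # FAD-, NADPH-, methylene-THF-dependent [53]
}


def trace(start):
    """Follow the listed steps from a starting metabolite until none applies."""
    enzymes, metabolites = [], [start]
    while metabolites[-1] in STEPS:
        enzyme, product = STEPS[metabolites[-1]]
        enzymes.append(enzyme)
        metabolites.append(product)
    return enzymes, metabolites


enzymes, metabolites = trace("dUTP")
print(" -> ".join(metabolites), "via", ", ".join(enzymes))
```

Tracing from dUTP shows that dUTPase and Thy1 act in sequence to yield dTMP, which illustrates why these genes are treated here as parts of one module rather than as independent functions.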
Altogether, this information led us to hypothesize that PhoH can act as a nucleotide synthase, possibly binding and hydrolyzing ATP through its conserved nucleoside triphosphate hydrolase domain to obtain energy, and using Pi taken up from the agricultural soil (through the host cell) to catalyze the synthesis of nucleotides for the virus's own genome (see conceptual model in Fig. 6). This model predicts that the proliferation of large numbers of soil viruses plays an important role in depleting P from the soil ecosystem. Future work should test whether the concentration of Pi in soil is associated with the number of progeny produced by viruses, and should quantify the contribution of viruses to P loss from soil.