Characteristics of LVD dataset and assembled VPs
The land use virome dataset LVD was derived from 2.6 billion paired clean reads of sequences across 50 viromes of 25 samples with five types of land uses. A total of 6,442,065 contigs > 1,500 bp were yielded, then 764,466 (11.8%) putative viral contigs were identified through VIBRANT, in which 27,951 and 48,936 bona fide viral genomes were retained from 25 iVLPs and 25 eVLPs viromes after removing putative false positive viral genomes, respectively. These genomes were clustered into 25,941 and 45,152 VPs for iVLPs and eVLPs viromes, respectively, in which the iVLPs and eVLPs viromes shared 11,467 (19.2%) VPs. Subsequently, they were merged and dereplicated, resulting in 59,626 VPs (Table S2) for the following analysis. A total of 8,112 (13.6%) VPs were obtained with at least one complete viral genome, in which the median length of all and circular VPs were 25,183 bp and 45,511 bp, respectively (Figure 1a). In addition, 15.3% (9,172 VPs) lysogenic VPs were detected (Table S2). Furtherly, 4,844 (8.1%) completed genomes, 6,475 (10.8%) high-quality genomes and 15,156 medium-quality genomes (25.4%) were distributed into separate VPs through checkV (Figure 1b), only 133 genomes (0.2%) were identified as not-determined.
To explore the taxonomic affiliation of VPs in family and genus-level, a gene sharing network consist of 59,626 VP genomes from this study and 3,502 reference phage genomes (from NCBI Viral RefSeq version 201) revealed 6,009 VCs comprising of 37,224 VPs, of which 34,417 VPs were from LVD, besides 2,794 singletons (2,653 from LVD dataset), 16,056 outliers (15,833 from LVD) and 8,492 overlaps (8,061 from LVD) were detected (Table S3). Of these, only 157 VCs contained genomes from both the RefSeq and LVD dataset (1,864 viral genomes) (Table S3). Most of VCs (1,837, 30.4%) included only two members.
At the family-level, most of VPs were classified into Siphoviridae (712 by Vcontact2 and 29,671 (50.9%) by Demovir, tailed dsDNA), Podoviridae (610 by Vcontact2 and 9,923 (17.6 %) by Demovir, tailed dsDNA), Myoviridae (485 by Vcontact2 and 5,445 (9.9%) by Demovir, tailed dsDNA), Tectiviridae (50 by Vcontact2 and 10 (0.10%) by Demovir, non-tailed dsDNA) (Figure 1c). Besides, the Eukaryotic viruses Herpesviridae (159 by Demovir, 0.26%, dsDNA), Phycodnaviridae (120 (0.20%) by Demovir, dsDNA); the Virophage Family Lavidaviridae (15 (0.03%) by Demovir) were detected as well, but a majority of VPs were unclassified in genus-level.
Viral community structures differ across land use types
Bray-Curtis dissimilarity of viral communities (median 0.9951) showed strong heterogeneity of viral communities among different sample sites (Figure 2a). While, the Bray-Curtis dissimilarity (median: 0.5109) between paired viral communities of iVLPs and eVLPs from each site have a significant lower heterogeneity than inter-sites (Wilcox.test, p < 0.001) (Figure 2a). Similarly, viral communities of paired iVLPs and eVLPs were grouped together for each site and were well separated from the other sites (anosim.test, R=0.01, p > 0.05) (Figure 2b), paired iVLPs and eVLPs from each site were merged for subsequently viral community analysis.
PCoA based on Bray–Curtis dissimilarity of viral communities indicated the viral communities of the 25 samples from five land use types were clustered into three land use zones (Figure 2b). We designated these three emergent land use zones as the agricultural zone (AG) including paddy and vegetable plot, urban green spatial zone (UG) including park and road verge, and forest zones (FO) respectively (anoism.test, R=0.72, p=1e-4). PERMANOVA indicated that the best predictor of β-diversity of viral community composition was the land use zones compare with other environmental parameters, which explained 10.4% of the variance (p < 0.0001). The proportion of shared VPs within land use zones was significantly greater than those between zones (Wilcox.test, p<0.0001) (Figure 3a). The heatmap indicated that the viral communities were clustered according to the land use zones (Figure 3b) except sample T5 rather than geographic distance (Figure S3). Furthermore, at the VCs level, the multi-zonal VCs were dominant and accounted for 24.7% to 45.2% relative abundance (Figure 3c), suggesting that it revealed that the strong VPs boundary among different zones of land use presented a lesser pronounced at VCs level, such as the zone FO and AG did not share any VPs, but they shared 34% VCs (Figure 3d). We also observed that UG and AG shared the most VCs (47%), followed by UG and FO (42%), and FO and AG (32%) (Figure 3d).
The diversity of viral communities of three land use zones were comparable as indicated by Shannon (6.1–7.0) and Simpson (0.9878–0.9977) indexes. The compositions of viral communities were similar, dominated by tailed dsDNA bacteriophages assigned to the family Siphoviridae, followed by Podoviridae, Myoviridae, Herpesviridae, and Tectiviridae (Figure S4). Mantel tests revealed that viral community structures at population-level significantly correlated with 22 environmental parameters, of which the pH played the most important role in driving viral community structure (Figure 2c).
The relative abundances of lysogenic phages vary among different land uses
The relative abundance of lysogenic phage in individual viromes was significantly different among land use zones (Figure 4a), such as the relative abundance of lysogenic VPs of the AG (from 4.3% to 17.4%) was significantly lower than those in the UG (from 7.5% to 31.0%) and FO (from 24.5% to 47.8%) (Figure 4a). Likewise, the alpha-diversity of lysogenic phage showed a significant difference among the three zones (Figure S5). Besides, the random sample analysis in the relative abundance matrix revealed that the mitomycin C treatment facilitate the acquiring of lysogenic VPs (Figure 4b). The relative abundances of lysogenic VPs were negative correlated with soil moisture (Figure 4c), whereas a positive correlation with DOC was shown (Figure 4c).
Host-linked viral community compositions
The variations of prokaryotic community composition of different land uses were characterized based on taxonomic profiles derived from corresponding metagenomes. The results showed that the dominant class was Actinobacteria, Bacilli, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, and Deltaproteobacteria (Figure 5a). There was a significant difference among the five land use types (anoim.test R=0.44, p<0.001) (Figure S5a), whereas the alpha-diversity were similar (Figure S5b).
Next, the relative abundances of VPs were summed up according to their assigned hosts at class level (Figure 5b, Table S4). The results indicated obvious variations in community composition according to zones (Figure 5b) compared with overall soil bacterial compositions (Figure 5a). Overall, the dominant predicted host were Actinobacteria, Alphaproteobacteria, Gammaproteobacteria, Betaproteobacteria, and Acidobacteriia (Figure 5b). The zone UG occupied the highest relative abundance (from 17.5% to 52.8%, wilcox.test: p<0.01) of Actinobacteria-linked phages; the AG occupied the highest relative abundance (from 6.8% to 26.9%, wilcox.test: p<0.05) of Betaproteobacteria-linked phages (Figure 5b). We explored the host of 8,910 VPs with significant differences among the three zones (Table S5). Phages infected Gammaproteobacteria were enriched in zone FO, and the phages infected Actinobacteria and Alphaproteobacteria were enriched in zone UG, whereas the zone AG enriched phages that infected Acidobacteriia and Betaproteobacteria (Figure S6).
Mantel tests revealed that the compositions of bacterial communities were significantly related to the environmental factors DC/DN, C/N, DN, NO3, Se and longitude (Figure 5c). The host-linked viral communities were significantly related to bacterial communities (mantel statistic r=0.44, p<0.001), and they were significantly sensitive to the changes of geographic distance (longitude), soil pH, and concentration of metal element Ba, Ni, As, Mo, La, Nd, and Zn (Figure 5c). The network based on the Spearman correlation coefficient indicated that pH, longitude, P, Zn, and moisture have a higher connection degree compared to other environmental factors (Figure S7). pH was strongly correlated to the phages infecting Actinobacteriota, Acidobacteriia, Gammaproteobacteria, but not to the abundance of these taxa in soil metagenome (Figure S8). The relative abundances of Actinobacteria-linked phages were positive correlated with the elevated pH, whereas the relative abundances of Acidobacteria and Gammaproteobacteria-linked phages were negative correlated with pH (Figure S8).
Shared VPs and microdiversity
The shared VPs based on their distribution within and between land use zones were classified as local, regional, and multi-zonal VPs (Figure 6a). In total, 44,534 VPs (74.6%) were classified as “local”, and 12,314 VPs (20.6%) were classified as “regional”. Only 2,771 VPs (4.6%) were present in at least two zones and were defined as “multi-zonal”, of which zone FO and AG only shared 103 VPs, whereas the 1,479 VPs were shared by zone FO and UG. In addition, only 76 VPs were detected in all zones, whereas there were 7,398 bacterial species (64.4% of all) presented in all zones (Figure 6a).
The microdiversity of VPs increased from local, regional to multi-zonal, despite that in FO, significant difference in the microdiversity between regional and multi-zonal VPs was not observed (Figure 6b). The microdiversity of multi-zonal VPs in FO were significantly lower than UG and AG (Figure 6c).
The connection number of node (viral population) in the gene-sharing network constructed by vContact2 were calculated. We found that the number of connections (median 47.00) of multi-zonal VPs with other VPs was significantly greater than that of regional one (median, 38), and regional one (medium, 38) was significantly greater than that of local (median, 31) (figure 6d). Subsequently, 2,771 multi-zonal VPs were observed in 1,416 VCs. The results showed that the size (median, 10) of VCs included multi-zonal VPs is significantly higher than the total (median, 4), the multi-zonal VPs prefer to exist in the large-sized VCs (figure 6e). These results demonstrated that the multi-zonal VPs shared genetic information with more VPs than local and regional in soil.
We identified genes whether under positive selection by evaluating the ratio of non-synonymous to synonymous mutations observed in gene sequences using the pN/pS equation. Of 726,331 genes tested from populations with sufficient coverage (> 5 mean coverage depth), 39,118 genes were identified as being under positive selection in at least one sample, and most of them were predicated responsible for structure and DNA metabolism (Table S6). The genes carried by local VPs yielded higher pN/pS values than those carried by regional and multi-zonal VPs (Figure 6f).