Tracking Double-Stranded DNA Bacteriophages and Their Hosts in a Deep Freshwater Lake by Integrating Metagenomics and The Hi-C Technique

Background: In aquatic ecosystems, bacteriophages play key roles in species diversity, host population dynamics and functional gene transfer. Due to a lack of high-throughput experimental tools to obtain robust associations between bacteriophages and hosts, the current metagenome-based phage community research generally predicts interactions based on various computational pipelines. In this study, by introducing an in vivo proximity-ligation chromosomal conrmation capture (Hi-C) technique combined with existing sequencing data from a freshwater metagenome study, we investigated double-stranded DNA (dsDNA) bacteriophages distribution, bacteriophage-host associations and abundance proles at various depths in a deep alpine lake. Results: A total of 985 nonredundant viral genomes (containing 1,464 viral contigs measuring at least 10 kb) were identied using rigorous bioinformatic pipelines, and 67.3% of bacteriophages had never been reported previously. The bacteriophages had a signicant depth-dependent distribution, and 56.9% of phage populations were only present in the epilimnion. Overall, 567 auxiliary metabolic genes (AMGs) were identied in 239 phages (24.26% of the total population). Hi-C reconstructed 58 bacteriophage-host association events encompassing eight bacterial phyla, among which 14 AMG-coding bacteriophages were clearly linked to their hosts. Three bacteriophages contained the mec gene, which was exactly complementary to their hosts in the sulfur-related pathway, and three other bacteriophages contained AMGs that likely participated in host fructose and mannose metabolism. Conclusions: The present study identies bacteriophages in Lake Fuxian and describes the bacteriophages and their host dynamics along the depth prole. Our results indicate that metagenomics combined with the Hi-C technique has promise for future applications in high-throughput bacteriophage identication and bacteriophage-host association detection, which could further extend our knowledge of the functional roles played by bacteriophages in aquatic ecosystems.

, and mangroves [12]. The ndings of these studies indicated that bacteriophages interacting with their hosts (bacteria and archaea), acted as key partners in multiple ecosystems.
Bacteriophages can affect the diversity and function of aquatic microbial populations through the incorporation and expression of auxiliary metabolic genes (AMGs). Currently, phage-encoded AMGs are reported to include multiple types of genes involved in not only main nutrient cycle pathways, such as carbon metabolism [12][13][14], nitrogen [15], phosphorus [16] and sulfur cycling [17,18], but also some speci c pathways, such as acetate fermentation [19], methane oxidation [14], nucleotide metabolism [20] and oxidative stress responses [21]. Taken together, these ndings imply that viral AMGs in uence most, if not all, pathways of microbial metabolism. For phages, carrying and expressing their own AMGs related to energy metabolism provides advantages by generating ATP and/or reducing power for the synthesis of viral proteins, which places the largest energy expense on relatively small bacteriophages, including cyanophages [22,23]. For hosts, expanding their ecological communities based on receiving AMGs through horizontal gene transfer (HGT) can be mediated by AMG-encoding bacteriophages [24,25]. It follows that the importance of bacteriophages to nutrient cycling is not limited to killing their hosts but also includes altering the central pathways of host metabolism. Nevertheless, a comprehensive map of the AMG content of aquatic viral communities and how viral infection transforms host metabolism, partly through phage-encoded metabolic genes, has not been presented to date [26].
Currently, it remains a major challenge to associate bacteriophages and their hosts in viromics studies, although sequencing-based metagenomic technology has dramatically expanded our knowledge regarding the environmental virome, especially viral genome contents and micro-and macrodiversity [7,27]. Common bioinformatic methods for predicting bacteriophage-host associations are categorized into [28] (i) clustered regularly interspaced short palindromic repeat (CRISPR) spacer mining, wherein the spacer sequences should be nearly identical between bacteriophages and their hosts at the nuclear acid level [29,30], (ii) sequence composition similarity (such as tRNA, codon usage and restrictionmodi cation systems) [27], (iii) homology to other known bacteriophages [31,32], (iv) abundance correlations between bacteriophages and their hosts [26], (v) host-speci c genes located in the viral genome, such as whiB for Actinobacteria [25] and photosystem D1/D2 proteins for Cyanobacteria [33]. However, the rate of correct bacteriophage-host signals at the host species level was determined to be notably low (< 5%) by the above computational approaches [28]. Clearly, bacteriophage-host associations predicted only in silico were limited in further analysis; moreover, traditional experimental approaches, such as plaque assays, liquid assays and viral tagging methods, did not meet the needs of highthroughput detection of bacteriophage-host associations [28]. Recently, a new experimental method called AdsorpSeq (adsorption sequencing) was developed to measure bacteriophages that are preferentially adsorbed to speci c host cell envelopes and to detect unknown bacteriophage-host associations [34]. However, this novel technology was limited in the types of host cell envelopes and bacteriophage isolations that could be analyzed. Single-cell genome sequencing technology could represent direct experimental evidence of bacteriophage-host links and was used in previous aquatic virome studies [10,35,36]. However, with this method, bacteriophages in active replication might not be detected effectively, and single ampli ed host genomes could not be reconstructed on a large scale.
High-throughput chromosomal con rmation capture (Hi-C), which was developed ten years ago, is a xation-based proximity-ligation method for estimating the probability of close physical proximity between DNA fragments in the same cell [37]. In the early stage, this technique was employed to study DNA folding status (3D map for DNA) and chromosome regulatory landscapes [38,39]. During the past decade, Hi-C has been widely employed in contig orientation and scaffolding to construct chromosomelevel genomes for many eukaryotes [40][41][42]. More broadly, this technique was able to reconstruct strainand species-level microbial genomes (metagenome-assembled genomes, MAGs) from meta-communities [43]. The notable advantage of Hi-C-based MAG construction was the determination of the source of plasmid contigs that were always overlooked in traditional pipelines [44,45]. Bacteriophages need to enter host cells to complete their life histories, and it is possible to theoretically track all bacteriophagehost interactions simultaneously by metagenomic Hi-C [46,47]. Hi-C facilitated the association of extrachromosomal DNA (phages and plasmids) with microbial host genomes in the human gut, mouse gut and synthetic bacterial communities but not in environmental eld samples [48][49][50][51].
Freshwater ecosystems are small components of the world's surface ecosystems, but these areas are very important to humans for drinking water supply, and they matter for global carbon cycling [52]. There have been several freshwater virome studies related to spatiotemporal viral community dynamics and the identi cation of phages with antibiotic resistance genes and virophages [9,[53][54][55][56]. However, few freshwater virome studies have explored the viral AMG content and distribution across environmental gradients, such as the water depth, or identi ed bacteriophage-host interactions at the species level by relatively reliable methods. Lake Fuxian (24° 35'N, 102° 50'E, 1788.5 m a.s.l.) is the third deepest lake in the China Yunnan-Guizhou Plateau (its maximum depth is 158.9 m) and is a warm, oligotrophic monomictic lake [57]. Previous metagenomic deep sequencing studies took this lake as a model system for studying the microbiome community in the holomictic period. For Lake Fuxian, which had a microbiome strati cation pattern in the epilimnion and hypolimnion, we obtained 440 MAGs across 10 phyla and found that the epilimnion had a relatively high phage abundance [58]. In that study, the phage community was classi ed at the read level (< 0.5% of total reads) based on the NCBI RefSeq database, which only contained cultivated species, while most bacteriophages detected by metagenomic methods were novel and uncultivated [58]. Considering the wealth of good-quality and deeply sequenced metagenomic data for Lake Fuxian, this research provided an opportunity to thoroughly characterize the virome in this deep freshwater lake. Therefore, the primary aim of this study was to ll the research gap regarding the interactions between bacteriophages and their hosts by introducing Hi-C technology to environmental metagenome-based virome research and to provide a model for studying bacteriophageshost associations in natural ecosystems.

Hi-C sample collection and sequencing
Approximately 15 liters of Lake Fuxian raw water was collected from a depth of 20 m in December 2019 (in the holomictic period) when the sampling time was the same as that in a previous ultradeep metagenome study [58]. Subsequently, 15 liters of water was ltered through 0.1-μm lters (Isopore TM Membrance Filters, Merck Millipore) and used for the Hi-C experiment.
Microbe cells were lysed from one piece of membrane by glass bead disruption in detergent buffer. Next, we collected approximately 300 μl of solid material from the ltered membrane. The Hi-C library was constructed using a ProxiMeta™ kit according to the ProxiMeta Hi-C protocols provided by Phase Genomics (Seattle, WA, USA). Importantly, Hi-C DNA was fragmented using the Sau3AI (5'-^GATC-3') restriction enzyme. Finally, more than 200 million read pairs of Hi-C sequencing data were generated by NovaSeq 6000 sequencing systems (Illumina, USA).

Reanalysis of published Lake Fuxian metagenome datasets
First, raw reads for each sample were trimmed using the JAVA program Trimmomatic (version 0.33) to remove sequencing adapters and low-quality sequences (default parameters) [69]. To decrease the potential contamination from sampling and sequencing library construction procedures, we mapped the remaining sequences to the human genome (NCBI GRCh38) by BWA-MEM (v.0.7.17) with default parameters. This quality control step used in this study was different from a previous study, lacking the removal of known nonenvironmental bacterial genomes where prophage fragments might have existed to maximize the virome detection probability [58,70]. Second, MEGAHIT (v.1.1.1; parameter options: --mincontig-len 500 --k-min 21 --k-max 141) was introduced to perform metagenome assembly for each sample and generate a total of 7.11 Gb contigs (metagenome size was from 1.11 Gb to 1.87 Gb) [71]. Moreover, a coassembly strategy ( ve samples along the depth pro le) was also applied to improve the completeness and quality of Lake Fuxian metagenomes, while more than 6.53 Gb contigs were put into identify viral sequences (Additional File 2: Table S2).

Bacteriophage genomes identi cation
Following a previous virome study in hot spring, we used VirSorter and sequence BLASTN comparison to the IMG/VR database to detect viral genomes in the unbinned fraction of the assembled contigs [10,60,82]. First, the contigs (> 10 kb) classi ed as VirSorter categories 1-2 (viral genomes) and 4-5 (integrated viral contigs) were rst picked as candidate viral contigs. To improve the viral quality from VirSorter, contigs whose Pfam hit scores were higher than those of viral hits were discarded before further analysis.
The second total unbinned contigs were BLASTN against the IMG/VR database (downloaded July 2020, version 2 IMG_VR_2018-07-01_4) with an e-value less than 1e-5. Only best subject contigs remained, and corresponding hits were merged into consensus coordinate ranges by custom PERL scripts. Only contigs with at least 90% identity and 75% alignment coverage to IMG/VR viral genomes were assigned as IMG/VR viral clusters. Contigs with IMG/VR-and/or VirSorter-positive hits were considered nal viral sequences. Then, the nal viral sequences were aligned against each other by nucmer in the mummer package (v.3) [83]. Viral sequences sharing more than 95% nucleotide identity across more than 80% of the whole genome were dereplicated into groups, and the longest sequence within each group was chosen as the representative nonredundant Lake Fuxian viral genome (FXLVG). The remaining unbinned and non-FXLVG contigs were annotated by CAT (version 2020-06-18) with default parameters [84].
Bacteriophage taxonomic assignments and functional annotation vConTACT2 was used to classify the taxonomy for FXLVGs [85]. For each contig of FXLVGs, ORFs were predicted by Prodigal with '-p meta' parameter [86]. The corresponding protein sequences were treated as the input for vConTACT2, and DIAMOND was used to compare the similarity between all gene pairs based on amino acid homology [87]. The phage references used in vConTACT2 were from the NCBI RefSeq database (released on November 01, 2019, totaling 12,155 viral genomes). ClusterONE within the vConTACT2 pipeline was performed in the viral population clustering step (-d 0.3 -s 2 --max-overlap 0.9 -haircut 0.55). Networks representing the viral clusters were graphed using Gephi (v.0.9.2, https://gephi.org/). Auxiliary metabolic genes were annotated by VIBRANT (v1.2.1) with default parameters [64].

Abundance calculation
Sequencing depth has great impacts on the metagenomic gene abundance pro le, especially for ultralowand/or high-abundance objects [88]; therefore, 290 M reads (the lowest sequencing read number among samples) were randomly subsampled without replacement from ve metagenome-sequencing samples. The relative abundance of FXLMGs and FXLVGs was calculated as the genome coverage ratio using BBMap (a pipeline from the BBTools package, https://jgi.doe.gov/data-and-tools/bbtools/) and the SAMtools 'depth' module (v1.546) with default parameters. Representative FXLMGs and FXLVGs described above were selected for abundance calculation, and only genomes with reads across more than 75% of the corresponding genome length were treated as positive hits; otherwise, the relative abundance of objects was considered to be zero. The habitat speci city of FXLMGs and FXLVGs (using epilimnion as an example, S epi ) was measured as the quotient of relative abundance in the epilimnion (20 m) versus the sum of abundance in the epilimnion and hypolimnion (20, 60, 80, 120 and 140 m) [S epi =epilimnion abundance/(epilimnion + hypolimnion abundance) * 100%]. Epilimnion-speci c FXLMGs and FXLVGs were de ned as those whose S epi value was larger than 90%. So were hypolimnion-speci c FXLMGs and FXLVGs. BHR (bacteriophage/host ratio) was de ned as the bacteriophage genome abundance divided by the host abundance.
CRISPR-based bacteriophage-host linking CRISPR spacers in FXLMGs and unbinned contigs were identi ed by three software programs: CRT, CRISPRFinder and CRISPRDetect [89][90][91]. Spacers from one FXLMG and contigs were grouped by cd-hit- . Dereplicated spacer sequences were compared against FXLVGs by BLASTN, and only matched pairs with ≥ 97% alignment coverage and ≥ 97% nucleotide identity were considered positive bacteriophage-host matches.
Hi-C-based bacteriophage-host linking Hi-C data using the Sau3AI (5'-^GATC-3') restriction enzyme were rst trimmed to low-quality reads by Trimmomatic (default setting), and ltered high-quality reads were mapped to FXLMGs and FXLVGs simultaneously by BWA-MEM (v.0.7.17) and SAMtools (v1.546) with parameters recommended by Phase Genomics, Inc. (US Seattle, hic_qc.py script, https://github.com/phasegenomics). Hi-C contacts were extracted from mapping results by ALLHiC software (setting: "--RE GATC") [41]. The number of restriction enzyme cutting sites (RECS) for each genome and the observed link number (OLN) between each pair of genomes were recovered by the ALLHiC-extract module. To obtain highly reliable bacteriophage-host links from Hi-C contacts, we used the abundance mentioned above, as the normalized read coverage M and H is the assembly set of FXLMGs and FXLVGs respectively. We de ned the expected link number ELN(ij) = RECS(i) * M(i) * RECS(j) * H(j). We modeled the statistic (D) of spurious contact between all pairs of FXLMGs and FXLVGs as D= rank of OLN(ij) -rank of ELN(ij). The Kolmogorov-Smirnov test was used to test the normal distribution of statistic D. The threshold for Hi-C signi cant contacts between FXLMGs and FXLVGs was P < 0.05 (single-tailed test).

Results And Discussion
Overview of viral genome in Lake Fuxian From the ve shotgun metagenomes and coassembly results, 1,464 viral sequences (> 10 kb) were recovered from Lake Fuxian. Of these sequences, 942 were identi ed by the de novo method (VirSorter), 211 showed high similarity with reported bacteriophage sequences in the IMG/VR database, and 311 were recovered by both methods (Additional File 1: Table S1). Both approaches for bacteriophage identi cation employed in this study (VirSorter and comparison to IMG/VR) were based on an a priori algorithm with known bacteriophage sequences and core bacteriophage proteins, reducing the number of FXLVGs detected but increasing the accuracy of the results [59,60]. After clustering at least 95% sequence similarity and 85% aligned coverage, 985 nonredundant Lake Fuxian viral genomes (FXLVGs) were obtained for further analysis, with genome lengths ranging from 10 kb to 451.7 kb and guanine-cytosine (GC) contents ranging from 19.37% to 74.05% (Additional File 1: Table S1). Four bacteriophages measured more than 200 kb in length and were classi ed as prophages by VirSorter; therefore, we did not consider them as giant viruses that were reported in previous study [61]. FXLVG richness at 20 m in depth was the highest (three times larger than the average richness at other depths), while the richness from 60 m to the bottom decreased rapidly ( Fig. 1a; Additional File 2: Table S2). The FXLVG richness status also supported the pro le that not only microorganisms, including bacteria and archaea, but also phages had strati ed features during the holomictic period [58].
The genomic heterogeneity of the freshwater virome dataset was evaluated in our study. FXLVGs were compared with viral data obtained from other freshwater habitats at the nucleic acid level with the same sequence similarity cutoff used in FXLVG construction. There was no FXLVG matching with either the UFO (Uncultured Freshwater Organisms) or LBVC (Lake Biwa Viral Contigs) viral datasets, indicating that the freshwater virome had high habitat heterogeneity. A total of 655 FXLVGs were novel sequences that were rst reported in this study (Additional File 3: Fig. S1) [27,55]. It might not be productive to compare viral genomes from similar freshwater habitats even with the same bacteriophage identi cation pipeline, let alone virome dataset from different habitats, such as TARA Ocean, the large ocean virome GOV2.0 dataset generated by at least three kinds of viral sequences detection software [7]. Remarkably, among 322 IMG/VR-like FXLVGs, 245 (76.09%) were from freshwater, and 59 (18.32%) were from other aquatic ecosystems, such as marine and wastewater (Additional File 4: Table S3). A total of 94.41% of IMG/VRlike FXLVGs were traced from aquatic environments, suggesting that our viral detection pipeline was reliable.
A gene-sharing network was constructed by vConTACT2 to classify FXLVGs (985 accessions) with publicly available taxonomy-known viral sequences in RefSeq (12,155 accessions, March 2020). The classi cation using vConTACT2 was able to cluster viral sequences with known genomes. A total of 57.44% (n=7547, 7547/13,140) of viral accessions were clustered, and the remaining accessions did not cluster with any viral genomes. Of the 985 FXLVGs, 778 could be assigned to 158 viral clusters (VCs), of which 21 VCs contained taxonomy-known viral genome members (Additional File 4: Table S3; Additional File 5: Fig. S2). Finally, 66 (6.63%) FXLVGs clustered within the VCs belonging to the families Podoviridae (n=20), Siphoviridae (n=20), Myoviridae (n=19), Mimiviridae (n=5) and Lavidaviridae (n=2). Except for ve Mimiviridae FXLVGs, other known-family FXLVGs belong to Caudovirales classi ed as double-stranded DNA viruses (dsDNA viruses). Mimiviridae and Lavidaviridae were detected only in the epiliminion, while the ratio of Podoviridae in the hypolimnion was two times that in the epiliminion. These results indicated that the majority of the FXLVGs obtained from Lake Fuxian were completely unclassi ed, and the bacteriophage taxonomic structure showed strati cation between the epiliminion and hypolimnion during the holomictic period (Additional File 6: Fig. S3). Interestingly, the bacteriophages whose taxonomic percentages were in the top three were Podoviridae, Siphoviridae and Myoviridae in Lake Fuxian, which coincided with those identi ed from the ocean virome GOV2.0 and wastewater viromes [7,18]. In Lake Fuxian, putative hosts for Myoviridae, Podoviridae and Siphoviridae were members of Caulobacter, Myxococcus and Enterobacteria. In wastewater plants, the most abundant hosts were Acinetobacter, Arcobacter, and Moraxella [18]. This nding demonstrated that host structure could be more diverse than viral taxonomy structure among different habitats.

Depth pro le for bacteriophage abundance and gene function
The relative abundance of FXLVGs was determined based on the metagenomic average depth, calculated as read coverage per base under the same read input for each sample. Among the 985 FXLVGs, 695 (70.56%) were epilimnion-speci c, and 161 (16.35%) were hypolimnion-speci c, which was consistent with the viral richness results (Fig. 1b). With respect to the GC content of the clean metagenomic reads, an increasing GC content of FXLVGs was observed between the epi-and hypolimnion (40.75%±9.09% and 51.98%±12.31%), whose pattern coincided with those of MAGs ( Fig. 1c; Additional File 1: Table S1) [58]. In the hypolimnion, 17 FXLVGs were depth-speci c (7 at 120 m, 6 at 140 m, 2 at 60 m and 2 at 80 m).
These numbers were considerably lower than those observed at 20 m (epilimnion-speci c FXLVGs). We observed that epilimnion-speci c MAGs (higher-temperature habitats) had considerably smaller genome sizes than hypolimnion-speci c MAGs (Additional File 7: Fig. S4). However, no depth pro le was detected for FXLVG genome length based on the current genome completeness. Bacteria and archaea growing in a high-optimum-temperature environment were found to have smaller genomes in a previous report [62].
These results demonstrated that the genome evolutionary process might differ between bacteriophages and their prokaryotic hosts.
To further explore how bacteriophages might affect the biogeochemistry and the potential relationship between viral abundance speci city and function at different lake depths, AMGs in FXLVGs, which are host metabolic genes carried by viral genomes and function during the infection process, were categorized [63]. Overall, 567 AMGs were detected in 239 (24.26%) FXLVGs (2.37 AMGs per bacteriophage) ( Fig. 2; Additional File 8: Table S4). Similar to previous freshwater and soil virome studies, carbohydrate metabolism-related AMGs were the most popular in FXLVGs [9,12]. The proportion of AMGencoding epilimnion-speci c FXLVGs (198/695, 28.49%) was signi cantly higher than that of hypolimnion-speci c FXLVGs (18/161, 11.18%) (Fisher's exact test, p < 0.01). The proportion differences of aromatic compounds, glycans, nucleotides and sulfur relay metabolism-related AMGs between epiand hypolimnion-speci c FXLVGs exceeded more than 2-fold (Fig. 2). Among the four metabolism categories, epilimnion-speci c FXLVGs exhibited more AMGs in aromatic compound and glycan metabolism, while hypolimnion-speci c FXLVGs exhibited more AMGs in nucleotide and sulfur relay metabolism.
All these results demonstrated that the epi-and hypolimnion-speci c lineages were clearly differentiated by viral abundance and GC content, and depth-speci c AMGs might engage in favorable functional niches.
As a relatively reliable computational method for detecting bacteriophage-host association signals, the CRISPR spacer-based approach has been applied in many virome studies [10,14,[26][27][28]. A total of 7,075 CRISPR spacer sequences were identi ed from 393 representative MAGs covering a total of 24 phyla (Additional File 10: Table S6). The spacer sequencer distribution at the phylum level was similar to the FXLMG numbers across each phylum. The top ve phyla (Proteobacteria, Actinobacteriota, Planctomycetota, Verrucomicrobiota and Bacteroidota) also had the highest numbers of spacer sequencers (Fig. 3). The spacer sequence number varied notably at the single-FXLMG level; for example, one Myxococcota FXLMG had 526 spacer sequences, while 49 FXLMGs only had 1 for each ( Fig. 3; Additional File 10: Table S6; Additional File 11: Fig. S5). Furthermore, only 10 CRISPR-based virus-host events (10 FXLVGs and 8 FXLMGs) were identi ed from 985 FXLVGs detected in Lake Fuxian (Additional File 12: Table S7). Among these sequences, one Chloro exota FXLMG (FXLMG137) linked with three FXLMGs, but the other FXLMGs linked one-by-one with a unique FXLVG. We also searched CRISPR spacer sequences in unbinned or no-FXLVG fractions. A total of 2,895 spacer sequences and 19 link events were recovered, but only 10 hosts could be classi ed as bacteria, while the others were unclassi ed (Additional File 13: Table S8).
Similarly, in cone pool virome research, only three bacteriophage-host link events were detected by CRISPR [10]. Twenty-six (1.28%, 26/2034 total viral genomes) bacteriophages had host information identi ed by the CRISPR-based method in two freshwater habitats [27]. These results suggested that most MAGs lack a CRISPR defense system to combat current phages, and existing CRISPR elements across the host genomes might be evolutionary traces of host bacteriophage-ghting machinery in the history of host genomes [28]. On the other hand, the bioinformatic principle of FXLMG reconstruction could not be constructed e ciently for very low-abundance microbial genomes [6,22]. Meanwhile, bacteriophages could also control host abundance to an extent. Nevertheless, the low e ciency of CRISPR-based link event detection in Lake Fuxian suggests the need to use other strategies for linking bacteriophages to their hosts.

Hi-C-based bacteriophage-host link events
Because of the highest richness and abundance of FXLVGs in the epilimnion, we chose the epilimnion sample as a model to investigate the bacteriophage-host relationship by Hi-C technology. A total of 3,118 candidate bacteriophage-host link events were initially identi ed (Additional File 14: Table S9).
Considering the bacteriophage and host abundance and the number of restriction sites in each sequence, 58 link events (con dence > 95%) were eventually detected across 31 representative FXLMGs spanning 8 phyla (Fig. 3). Among 58 link events, 8 FXLVGs had more than one host, and 2 of them might infect FXLMGs across different phyla. In addition, 11 FXLMGs could be infected by multiple FXLVGs (Additional File 15: Fig. S6). There were 24 (24/58, 41.38%) events (among 8 bacteriophage-host clusters), where the corresponding FXLVGs contained AMGs (Fig. 4a). Interestingly, link events detected by CRISPR and Hi-C were completely different in our study. The low number of CRISPR-based events decreased the possibility of overlap event detection, and the Hi-C-based method was focused on the infection events within each host cell in situ. Therefore, it was reasonable that there was no link event detected by either strategy.
Strong correlation signals for functional complementarity between FXLVGs and FXLMGs were observed in the Hi-C-based results. The infection signals of 11 Cyanobacteriota-FXLVG link events were identi ed, including 2 epilimnion-speci c FXLMGs and 8 epilimnion-speci c FXLVGs (Additional File 12: Table S7). Four of the eight FXLVGs (FXLVG66, FXLVG309, FXLVG472 and FXLVG500) contained D1 or D2 proteins, which were members of the photosystem II P680 reaction center (psbA and psbD in the VIBRANT software classi cation system). Among these four FXLVGs, proteins PC and Fd, which participate in photosynthetic electron transport, were also identi ed (Additional File 16: Fig. S7). Remarkably, one of 8 cyanophages, FXLVG608, had the mec gene ([CysO sulfur-carrier protein]-S-L-cysteine hydrolase, EC:3.13.1.6), and its infected host was FXLMG141, which lacked the mec gene in the sulfur relay system (Fig. 4b). These results indicated that cyanophages might provide other important genes to Cyanobacteriota in addition to the most widely known D1 and D2. In previous reports, D1/D2 proteins in viral AMGs were generally used to predict potential Cyanobacteriota hosts [26,27,64,65], but the limitation was that only phylum information of the host could be predicted, and the corresponding bacteriophages were uniformly classi ed as cyanophages.
Another bacteriophage-host functional complementary event was found in the mannose-fucose metabolism pathway (KEGG ko00051) but with different strategies of cooperation. Mannose and fucose are sugars, with the former being an essential endoplasmic reticulum component in prokaryotes, while the latter is distributed in complex carbohydrates of all species. In prokaryotes, GDP fucose is synthesized from GDP-mannose in a three-step reaction catalyzed by two enzymes, GMDS (GDPmannose 4,6-dehydratase, EC:4.2.1.47) and TSTA3 (GDP-L-fucose synthase, EC:1.1.1.271) (Fig. 4c) [66]. In the FXLVG611-FXLMG349-FXLMG350 cluster (VHC4), two key enzymes (GMDS and TSTA3) were identi ed in the bacteriophage, and both hosts contained a PPM gene (phosphomannomutase, EC:5.4.2.8), which was used to transform mannose-6-phosphate into mannose-1-phosphate. In this case, it appears that the host could utilize the fucose synthesized within the bacteriophage. However, in the FXLVG794-FXLMG254 cluster (VHC5), only TSTA3 was identi ed in FXLVG794, and GMDS was observed in the host FXLMG254 (Planctomycetota). These results demonstrated that bacteriophages might play an important role in fucose synthesis, which is necessary in bacterial cell construction, although cooperation can be very diverse. Nevertheless, these ndings demonstrate the considerable power of Hi-C in bacteriophage-host link event detection in the future.

Bacteriophage-host interactive abundance analysis
Based on the identi ed bacteriophage-host pairs and genome coverage information obtained by the read mapping approach along the depth pro le, this research provided an opportunity to evaluate the abundance relationship between bacteriophages and hosts at species level. For 58 Hi-C-based bacteriophage-host pairs in ve samples (290 events), 170 (58.72%) events were discarded in further analysis because of undetectable coverage of the bacteriophage and/or host genome (Additional File 17: Table S10). All 170 events were from hypolimnion samples, which might be observed because only the epilimnion sample was selected to perform the Hi-C experiment, and no hypolimnion-speci c FXLVG was identi ed in the Hi-C-based bacteriophage-host relationship results. Among the remaining 120 events, more than half of the hosts (62/120, 51.67%) had higher genome coverage than their bacteriophages (bacteriophage/host abundance ratios, BHR < 1.5). A total of 22.5% (27/120) of events had nearly equal genome coverages (-1.5 < BHR < 1.5), and 25.83% (31/120) had higher bacteriophage coverage (Additional File 18: Fig. S8). Notably, these results were signi cantly different from those obtained in a previous virome study in which most bacteriophages had greater abundance than their hosts [10,67]. The uncertainty provides another opportunity to use the Hi-C-based method to detect the bacteriophage-host relationship within prokaryotic cells instead of counting bacteriophages at lysis time.
Similar to FXLVG abundance, BHR showed a speci c pattern along the depth pro le and host diversity.
BHRs were signi cantly (p < 0.01) higher in the epilimnion than in the hypolimnion (Fig. 5a). The BHRs in the epilimnion and hypolimnion were both negatively correlated with host abundance (Fig. 5b), and BHRs in the epilimnion decreased considerably more quickly with lower host abundance. Remarkably, BHRs varied among different hosts at the phylum level (Additional File 19: Fig. S9). For Cyanobacteriota, total FXLVGs had a higher abundance than FXLMGs (> 6.97-fold on average). For Planctomycetota, FXLMGs in hypolimnion had higher abundance where the corresponding BHRs were in considerably lower abundance, while in epilimnion, the opposite distribution was observed. These results were in keeping with the "Piggyback-the-Winner" theory that lysogeny was the predominant viral lifestyle in a high-hostabundance environment where bacteriophages (integrated into the host genome or coexisting within the host cell) increased the ability of the host to resist other bacteriophage infections and gain competitive advantages compared with other microbes [68].

Conclusion
Metagenomic techniques and decreasing sequencing costs provided an opportunity to survey the depth pro le of bacteriophage distribution and ecological functions in an alpine freshwater lake. Our study contributes nearly one thousand nonredundant viral genomes to the current collection of freshwater phage communities. The results of bacteriophage abundance pro les and AMG distribution along the depth provide further evidence of the incomplete mixture in Lake Fuxian during the holomictic period. To the best of our knowledge, this study was the rst to employ the Hi-C technique to characterize the bacteriophage-host association for real eld ecological samples, and we identi ed 58 bacteriophage-host association events. The results of AMG association analysis within bacteriophage-host pairs not only suggest that bacteriophages and their hosts might collaborate closely and have complementary functions at the gene level but also provide reliable evidence to indicating the robustness of the Hi-Cbased bacteriophage-host detection method. We believe that it will be useful to design virome studies in extensive ecosystems combining metagenomics and the Hi-C technique in future research.  Columns are annotated by the abundance feature. FXLVGs with Sepi > 90% were epilimnion-speci c (light green) and hypolimnion-speci c FXLVGs had Sepi < 10% (dark green), others were de ned as the mixture type. (c) GC content vs genome length for epi-and hypolimnion-speci c FXLVGs.

Figure 2
AMGs numbers and relative ratio between epi-(light green) and hypolimnion-speci c (dark green) FXLVGs. 11 functional subcategories of AMGs were classi ed by VIBRANT. "*": > 2× fold difference for the corresponding ratio of subcategory.

Figure 3
Phylogeny, genome size information, number of CRISPR spacers, relative abundance of 431 FXLMGs and their associated bacteriophage. Like FXLVGs, FXLMGs were also categorized into epi-and hypolimnionspeci c types. FXLMGs with Sepi > 90% were epilimnion-speci c (light green) and hypolimnion-speci c FXLMGs had Sepi < 10% (dark green). Pentagram means FXLMGs having more than one associated bacteriophage.

Figure 3
Phylogeny, genome size information, number of CRISPR spacers, relative abundance of 431 FXLMGs and their associated bacteriophage. Like FXLVGs, FXLMGs were also categorized into epi-and hypolimnionspeci c types. FXLMGs with Sepi > 90% were epilimnion-speci c (light green) and hypolimnion-speci c FXLMGs had Sepi < 10% (dark green). Pentagram means FXLMGs having more than one associated bacteriophage.