Detection of bacteriophages and cyanophages for water quality control of a water supply system in Amazônia

Background: Despite the importance of understanding the ecology of freshwater viruses, few studies address it compared with marine viruses. The microbiological interactions that occur in these environments are still poorly known, especially those between bacteriophages and their host bacteria and between cyanophages and cyanobacteria. Lake Bolonha, in Belém, capital of the Brazilian State of Pará, is a source of water that supplies the city and its metropolitan region, yet its virome content and viral diversity remain unexplored. Therefore, the main aim of this work is to clarify, in terms of taxonomic diversity, which species of DNA viruses are present in this lake, especially bacteriophages and cyanophages, since they can act both as transducers of resistance genes and as reporters of water quality for human consumption.
Results: We used the metagenomic sequencing data generated by Alves et al. (2020) and analyzed them at the taxonomic level using Kraken2, Bracken, and Pavian; the data were then assembled using Genome Detective, which performs viral assembly. The results suggest the existence of a widely diverse viral community and of established, phage-regulated microbial dynamics in Lake Bolonha.
Conclusions: This work is the first to describe the virome of Lake Bolonha using a metagenomic approach based on high-throughput sequencing, and it contributes to the understanding of water-related public health concerns regarding the spread of antibiotic resistance genes and the population control of native bacteria and cyanobacteria.


Introduction
Lakes close to urban areas undergo increasing ecosystem change as the human population expands and commercial, recreational, and residential uses grow [1]. Eutrophication offers particular conditions for viral replication: environments of this type seem to favor high viral activity, and viruses there hypothetically control host abundance, respiration, and production [2].
For lake environments, there is little information about the ecology of freshwater viruses compared with marine viruses [3]. Most freshwater viruses are bacteriophages or viruses of humans and other animals, but plant viruses are also identified [4]. Bacteriophages can play an important role in the aquatic ecosystem, as they can contribute to the acquisition and spread of antibiotic resistance genes (ARGs) [5]. Some studies have shown ARG-carrying phages to be abundant in many environments, especially those impacted by anthropogenic activities [6][7][8][9][10], which demonstrates that these viruses are relevant to local microbial ecology [6].
Another important category of viruses is the cyanophages, which infect cyanobacteria and have a morphology similar to that of bacteriophages [11]. They are abundant in both fresh and salt water and play an important role in modulating cyanobacterial populations and preserving water quality [11,12]. However, unlike bacteriophages, they have many genera of possible hosts; therefore, freshwater cyanophages can be classified according to the taxonomy of their host organisms [13].
Given the need to study the diversity of different environments, including aquatic ones, techniques such as viral metagenomics (the study of the virome) have been developed. This technique allows a variety of viruses to be studied from environmental samples [14]. The use of metagenomics and next-generation sequencing (NGS) to explore viral populations, both in aquatic environments and within the human microbiome, has demonstrated considerable genetic complexity as well as inter- and intra-species interaction [15,16]. However, despite an increasing number of studies using the technique, there are still significant gaps in the virome databases. It has been estimated that 10^31 viral particles infect bacterial populations, but fewer than 2,200 double-stranded DNA (dsDNA) virus and retrovirus genomes are deposited at the National Center for Biotechnology Information (NCBI), compared with more than 45,000 bacterial genomes [17].
An imbalance in the diversity and abundance of important viruses, such as bacteriophages, can cause major changes in the aquatic ecosystem. These include the transduction of resistance genes between bacteria, which some of these phages can mediate and which can give microorganisms evolutionary advantages and affect water quality [18]. Cyanophages, in turn, are abundant in aquatic environments and play a fundamental role in bloom dynamics, including the regulation of growth and photosynthesis of cyanobacteria [19]. Therefore, our objective was to characterize the diversity of DNA viruses in Lake Bolonha, especially those that behave as bacteriophages and cyanophages and thus can contribute to the transduction of genes associated with antimicrobial resistance and to water quality.

Sample collection
The water samples were collected in January 2017 at Lake Bolonha, Belém-PA, at three different points of the lake before the water reaches the local water treatment station (Fig. 1), according to Alves et al. (2020) [20]. The sampling sites were located at the catchment area where water passes from the lake to the water treatment station (P1), at the point where water leaves for other treatment substations (P2), and at the channel connecting the lakes Água Preta and Bolonha (P3). Alves et al. also performed the water quality assessment, DNA extraction, and metagenomic sequencing [20].

Treatment of raw data
Taxonomic analysis of the raw data was performed with Kraken2 [21]. The option '--download-library viral' was used to download the complete viral sequences from RefSeq, and the reads were then classified according to their taxonomy.
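As an illustration of this step, the following minimal Python sketch assembles the Kraken2 command lines without running them. Only the '--download-library viral' option comes from the text; the taxonomy download and build steps, the database directory name, the report and output file names, and the read file name are assumptions added for illustration, not the settings used in this study.

```python
import shlex


def build_kraken2_commands(db_dir, reads_path, sample):
    """Return Kraken2 command lines as argument lists; nothing is executed."""
    return [
        # Option quoted in the text: fetch the complete viral RefSeq sequences.
        ["kraken2-build", "--download-library", "viral", "--db", db_dir],
        # Assumed extra steps (not stated in the text): taxonomy and build.
        ["kraken2-build", "--download-taxonomy", "--db", db_dir],
        ["kraken2-build", "--build", "--db", db_dir],
        # Classify the reads and write a per-sample report for later steps.
        ["kraken2", "--db", db_dir,
         "--report", f"{sample}.k2report",
         "--output", f"{sample}.kraken2",
         reads_path],
    ]


# Hypothetical database directory and read file for sampling site P1.
for cmd in build_kraken2_commands("viral_db", "P1.fastq", "P1"):
    print(shlex.join(cmd))
```

Keeping the commands as argument lists makes them easy to inspect or log before passing them to a process runner, one sample at a time.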

Assessment of Viral Diversity
The output generated by Kraken2 was submitted to Bracken [22], which uses it to produce more accurate abundance estimates at the virus genus and species levels. The input parameters were "${CLASSIFICATION_LVL} = 'S' (Species)" and "input data = kraken2 output (report)". All other parameters were left at their defaults. The results were then displayed with Pavian [23], which allows the taxonomic classifications obtained by Kraken2 and Bracken to be compared and presents abundance estimates across several samples.
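To show how species-level estimates of this kind can be handled downstream, here is a minimal Python sketch that reads a Bracken-style table and converts the estimated read counts to relative abundances. The column names follow Bracken's tabular output as we understand it; the three rows (names, taxonomy IDs, and counts) are invented placeholders, not results from Lake Bolonha.

```python
import csv
import io

# Placeholder table in Bracken-style layout; all values are hypothetical.
BRACKEN_EXAMPLE = (
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
    "Virus A\t1\tS\t80\t20\t100\t0.5\n"
    "Virus B\t2\tS\t50\t10\t60\t0.3\n"
    "Virus C\t3\tS\t35\t5\t40\t0.2\n"
)


def read_bracken(handle):
    """Map each species name to its re-estimated read count."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["name"]: int(row["new_est_reads"]) for row in reader}


def relative_abundance(counts):
    """Convert read counts to fractions of the sample total."""
    total = sum(counts.values())
    return {name: reads / total for name, reads in counts.items()}


counts = read_bracken(io.StringIO(BRACKEN_EXAMPLE))
for name, fraction in relative_abundance(counts).items():
    print(f"{name}\t{counts[name]}\t{fraction:.3f}")
```

For a real file, the `io.StringIO` object would be replaced by an opened Bracken output file for each sample.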

Viral Metagenome Assembly
The Genome Detective software [24] was used to assemble the sequencing data and to classify the resulting contigs into their respective taxa; assembly relied on the metaSPAdes software for single-end reads [25].

Virome Assembly
The viral portion of the raw metagenomic data was assembled to prepare the data for classification into their respective taxa for each of the freshwater samples (Table 3). The 20 species with the highest read counts in each freshwater sample are listed in Tables 3, 4, and 5. For each virus in this top selection, the number of reads was compared with the number of reads found for the same virus in the other two samples of this study.
In general, sample P1 showed a prevalence of Synechococcus phage, a cyanophage (Fig. 2). The highest number of reads for a single virus was obtained in sample P2, for Choristoneura fumiferana granulovirus (Fig. 3). Sample P2 also showed a much larger number of reads for the genera Pandoravirus and Mimivirus than the other collection sites. Haemophilus virus HP1 was observed only in sample P3 (Fig. 4).
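The top-species comparison described in this section can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, under the assumption that per-sample read counts are available as dictionaries; the species labels and counts below are invented placeholders and do not reproduce Tables 3 to 5.

```python
def top_species_table(samples, focus, n=20):
    """List the n species with most reads in `focus`, with reads in the other samples."""
    others = [name for name in samples if name != focus]
    ranked = sorted(samples[focus].items(), key=lambda kv: kv[1], reverse=True)[:n]
    # A species missing from another sample is reported with zero reads.
    return [
        (species, reads, *[samples[other].get(species, 0) for other in others])
        for species, reads in ranked
    ]


# Hypothetical read counts per sampling site.
samples = {
    "P1": {"virus_a": 50, "virus_b": 30, "virus_c": 1},
    "P2": {"virus_a": 5, "virus_c": 90},
    "P3": {"virus_a": 40, "virus_b": 25},
}

for row in top_species_table(samples, "P1", n=2):
    print(row)
```

Each row holds the species, its reads in the focus sample, and its reads in the remaining samples in their original order, which mirrors the layout of a per-site top-20 table.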

Discussion
When it comes to viral abundance, taxonomic analysis reveals that more than 3,500 distinct viral species are present in Lake Bolonha, a Brazilian freshwater lake. The viruses with the highest read abundance in P1 and P3 are less well represented in P2, which may be explained by the proximity between sampling sites P1 and P3 (Tables 3-5). The alpha diversity observed in the raw sequencing data in Table 1 of Alves et al. [20] also shows significant sample abundance in P1 and P3, which could be related to this observation.
Sample P1 showed an abundance of Synechococcus phage (Table 3), a type of phage that frequently infects cyanobacteria of the genus Synechococcus with diurnal rates of infection, owing to the photosynthetic activity of its host [26]. This infection affects host population dynamics by killing part of the cyanobacterial population each day, estimated at between 0.005% and 30% per day [19,27,28]. Additionally, cyanophages have been described as playing an important part in the diversity and evolution of their host cyanobacteria [4,17,29-33].
Amongst the abundant species, the cyanophage S-RIM was found in sample P1; it infects cyanobacteria of the genus Synechococcus [34]. The abundance of cyanophage S-RIM50 has been reported in both fresh water and seawater [12]. Cyanophages such as S-RIM50 and Synechococcus phage are abundant in freshwater environments and have been isolated from a variety of freshwater reserves, including lakes, ponds, streams, and sewage points [19,26,28]. They contribute substantially to the maintenance of the cyanobacterial community and to the preservation of water quality [11].
According to Alves et al. [20], Lake Bolonha is characterized by the Amazonian vegetation on its shore and by the propagation of large plants under its surface, resulting in eutrophication. It also shows increased phosphorus and total nitrogen values in physical-chemical analysis and a high fecal coliform rate [20]. In this study, we noted a significant presence of cyanophages, mainly at P1 (Table 3) and P3 (Table 5). This finding is an important indicator of the interference of these phages in the environment through their cyanobacterial hosts, which are known for their ability to perform photosynthesis and for their potential to fix nitrogen or produce toxins [31]. Nitrogen or phosphorus supply may naturally limit freshwater ecosystems, including those with cyanobacterial accumulation, reducing growth rate and biomass [35].
Synechococcus phages, present in both samples P1 (Fig. 2) and P3 (Fig. 4), can be associated with the occurrence of health problems such as multiple sclerosis due to the protein expressed by the Epstein-Barr virus, since it contains many short sequences identical to the products of 16 autoantigens related to multiple sclerosis susceptibility genes [36]. Other viruses associated with multiple sclerosis have also shown this behavior, and Synechococcus phage has been identified as a new and important contributor to this phenomenon. Consistent with the distribution of multiple sclerosis, the cyanobacterial phage host prefers a temperate climate, and the ecology of the bacteria and bacteriophages is consistent with the epidemiology of multiple sclerosis [34].
Yellowstone Lake phycodnaviridae, a double-stranded DNA virus that infects algae, and Escherichia virus DE3 were also observed in sample P1 (Fig. 2). A larger number of reads associated with Shigella phage SfIV was observed in P1 than in samples P2 and P3 (Table 3). This may be related to the presence of its bacterial host, Shigella, which causes intestinal infection with or without fever, colic, and diarrhea with blood and mucus [37]. This demonstrates the importance of characterizing environmental conditions as a possible source of information for public health.
The P2 collection point showed a large number of reads for Choristoneura fumiferana granulovirus (Table 4), a member of the Baculoviridae family [38]. It is interesting to note that a small number of reads (25) associated with the species Diplodia scrobiculata RNA virus 1 was found only in P2. This viral species has as its preferred host the endophytic fungus Diplodia scrobiculata, which mostly affects Pinus spp. among other conifers (Bihon et al., 2011). Halovirus HF1, which infects members of the Halobacteriaceae family [39], was also identified (Fig. 3).
A considerable number of reads for Bacillus virus Bp8pC was also observed in sample P2 (Table 4); its hosts are the bacteria Bacillus thuringiensis and Bacillus pumilus. Both are of economic importance because they are used for pest control in agriculture and cause little harm to humans [40]. The presence of the genus Bacillus in lake water may indicate contamination of the aquatic environment by different types of residues carried from the watershed to the lake [41].
In sample P2, a much larger number of reads for the genera Pandoravirus and Mimivirus was observed than at the other collection points (Fig. 3). Previous studies suggest a potential role of Mimivirus in respiratory pathology, as shown by seroconversion in patients with pneumonia. In addition, positive serology for Mimivirus is associated with longer duration of mechanical ventilation and of intensive care unit stay in patients with ventilator-associated pneumonia [42]. Both genera consist of very large viruses and have Amoeba as their common host [43][44][45]. Pandoravirus has a size of about 1 micron and may resemble some types of bacteria. Its genome contains more than 100 distinct genes and can be about twice as large as the Mimivirus genome; it is also quite different from the genomes of other known organisms [46].
Haemophilus virus HP1 was observed only in sample P3 (Fig. 4). This bacteriophage infects the bacterium Haemophilus influenzae [47], and its sampling site is located at the starting point of the channel connecting Lakes Bolonha and Água Preta. It is important to note that what occurs at this site may in the future influence the environment of Lake Água Preta.
Overall, these results indicate the presence of a diverse viral community and suggest the existence of established regulatory dynamics in the local microbial environment of Lake Bolonha, highly influenced by the bacteriophages and cyanophages that inhabit it. The dispersion of these biological entities along the water distribution channels that use Lake Bolonha as a water source, together with general eutrophic activity, might contribute to the spread of minor genetic elements such as ARGs and might in the future unbalance the microenvironments of nearby freshwater sources, such as Lake Água Preta.

Conclusions
Given the importance of Lake Bolonha as a source of drinking water for the metropolitan region of Belém, as well as an area of environmental preservation, studies that elucidate the viral diversity of this environment are relevant for understanding how its exploitation can affect it. The results observed in this work indicate the presence of a widely diverse viral community, especially of bacteriophages and cyanophages. These findings also suggest the existence of established microenvironmental dynamics in Lake Bolonha, possibly regulated by such phages. The distribution of these viral entities shows similarity along the course of the lake: the communities appear more closely related the deeper they are into the lake (P1, P3) and the farther they are from the site where water leaves for other treatment substations (P2). This is the first work to describe the virome of Lake Bolonha and, as such, it contributes to the understanding of water-related public health concerns regarding the spread of antibiotic resistance genes and the population control of native bacteria and cyanobacteria.

Declarations
Ethics approval and consent to participate: Not applicable.

Consent for publication:
Not applicable.

Competing interests:
The authors declare that they have no competing interests.

Funding:
We thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and PROPESP/UFPA (Pró-Reitoria de Pesquisa e Pós-Graduação / Universidade Federal do Pará), and the funding agencies FAPESPA (Fundação Amazônia de Amparo à Estudos e Pesquisas) and FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais), for financial support of this work.
Authors' contributions: BVAG, KCP, and RTJR designed the study. BVAG and KCP compiled and curated the data and performed bioinformatic analysis. BVAG, KCP, and WGN interpreted the results. ACF and RTJR supervised and administered the project and provided funding. BVAG wrote the original draft and manuscript with input from KCP, WGN, AOA, ALCQ, ACF, and RTJR. All authors critically reviewed the manuscript and approved the final version.