Cyanophages are specialized to infect cyanobacteria and could play an important role in modulating harmful blooms. As cyanophage distribution was found related to the occurrence of their hosts [1, 5, 6], it is needed to obtain knowledge of the cyanobacteria composition and factors influencing their growth in the study area. Analysis of cyanobacteria – potential virus host – indicated that their 16S rRNA gene was found in all studied lakes (Table 1). The total cyanobacteria biomass varied from 0.04 to 40.47 mg L-1 (Table S1). Filamentous cyanobacteria from the genera Aphanizomenon, Cuspidothrix, Dolichospermum, Limnothrix, Planktolyngbya, Pseudanabaena, Planktothrix, or Raphidopsis were among the dominants in most studied lakes. Additionally, Microcystis was among the dominant genera (0.57–1.35 mg L-1) in three lakes based on the microscopic analysis, and their overall presence was confirmed in 18 lakes according to the genetic analysis – 16S rRNA (Table 1 and 2, Table S1).
The study area (Fig. 1) was represented by the temperate-humid continental climate zone characterized by hot summers  together with water parameters which were shown in the following ranges: water pH 7.4–9.01, water temperature 16.3–27.8°C, and conductivity 251–729.1 µS cm-1. While total nitrogen and total phosphorus concentrations varied between 0.85–7.5 and 0.02–0.47 mg L-1, respectively (Table S1). Such parameters, conducive to eutrophication, ensured background favored the development of cyanobacteria [52, 53].
Cyanophages occurrence and diversity
The cyanophage genes (psbA, nblA, or g91) presented in host cells were detected in 16 from the 21 studied lakes (Fig. 1, Table 1). The lack of amplification of selected marker genes for cyanophages in some lakes, despite the presence of their potential hosts, may have been related to the number of the genes below the detection limit or used genetic markers not targeting the different phage communities, present in the lakes studied. According to Schrader et al. , the PCR inhibitors should be also taken into consideration.
The psbA was found in 88% cyanophage-positive samples (Table 1). Its DNA sequences were found between 75–98% of similarity for five Polish lakes (LUB, PNI, BUS, PAL, and ILN) and one Lithuanian lake (GIN). The variants with the highest similarity level were observed between LUB-PNI (98%) and BUS-GIN (95%). The psbA sequence of ILN had the lowest level of similarity (75–78%) with the analyzed sequences. Although all psbA sequences observed in this study branched within the larger cluster of marine cyanophages, they also grouped more closely to each other than to their marine counterparts (Fig. 2). Most of the psbA sequences showed 95-100% similarity to each other (data were not shown), with the only exception of Lake ILN (Fig. 3). The psbA sequences were intermixed, indicating that there were no differences in the distribution of cyanophages between distant lakes. Therefore, the higher divergence of ILN from other lakes may suggest that other, most likely local, factors might be responsible for the diversity of the cyanophage community. Whereas the psbA sequences from Polish lake ILN appeared to be the most similar with marine Synechococcus myocyanophage genome (S-CAM22) (Fig. 3). As it was described by Dreher et al. , the psbA similarity to the marine counterparts was also confirmed within Synechococcus specific S-CRM01 cyanophage, isolated from freshwater Copco Reservoir (Nothern California, USA). The high similarity of freshwater cyanophages psbA to marine cyanomyoviruses was also found in East lake (China) by Ge et al. . Moreover, the psbA of novel freshwater Ma-LEP Microcystis podocyanophage, isolated from Erie lake (USA) by Jiang et al. , also presented high sequence similarity with marine S-CBP4 Synechococcus podocyanophage. The above results, of freshwater psbA sequence similarities to their marine counterparts, indicated that this genetic marker can be used to study the diversity among freshwater and marine phages as already described by Chenard and Suttle .
The nblA, g91_S, and g91_L Microcystis cyanophage genes were found in 50%, 56%, and 44% of cyanophage-positive samples, respectively (Table 1). All nlbA sequences observed in this study from the analyzed samples (BYT, BUS, PNI, PAL, GIN, and SIM) if compared to each other showed high similarity, ranging from 88% to 99%. The highest level of sequence similarity (96%) was found between two Polish lakes – BYT and BUS, and between two Lithuanian lakes – GIN and SIM. The nblA sequences from analyzed samples were highly similar (>90%) with their corresponding gene fragments of uncultured Myoviridae phages (AB812972.1 and AB812972) and MaMV-DC (KF356199.1) The literature data indicated that the nblA gene is highly conserved and, hence, may underrepresent the existing diversity among cyanophages . The g91_S sequences obtained from six Polish lakes (LUB, BYT, BUS, PNI, PAL, and MIE) and three Lithuanian lakes (JIE and SIM) were similar in the range of 90–97% between them. Only GIN showed the lowest similarity (80–85%) when compared to all other sequences. Whereas the g91_L from the one Polish (BYT) and three Lithuanian lakes (JIE, GIN, and SIM) were similar in the range of 95–96%. The g91_S and g91_L sequences from this study were convergent (>91%) with their counterparts in culturable (MaMV-DC, KF356199.1; Ma-LMM01, AB231700.1) and unculturable (MH117957.1) Microcystis cyanophages. The above results might indicate the presence of Ma-LMM01-like phages within investigated lakes, as it was also showed in the Bay of Quinte (a Lake Ontario, Canada) by Rozon and Short  or Sulejowski Reservoir (Poland) by Mankiewicz et al. . In the case of lakes where there was no positive detection of nblA and g91 genes represented Ma-LMM01-like phages it is also possible that other Microcystis specific phages occur, which genomes were not characterized yet.
Principal component analysis (PCA) showed the relationships between cyanophages, cyanobacteria, and the physicochemical parameters of water (Fig. 4). The PC1 and PC2 represented up to 36.7 % of the total variance of the observations (19.46 % and 17.24 %, respectively; see also Fig. S3). The PCA scores and loadings (estimated with the Pearson correlation [r]) were described in the supplementary material Table S4 and S5, respectively. The PCA grouped lakes into three different clusters: groups A including only Polish lakes (LUB, PNI, BUS, BYT, and PAL) and B including only Lithuanian lakes (SIM, GIN, and JIE) and group C (DID, SIR, ILN, MYS, MOG, ZAB, GRY, GOP, and ZBA) including both Polish and Lithuanian lakes. In groups A and B two or three cyanophage genetic markers were detected while group C consisted of lakes with only one or none of the studied genes (Table 2, Fig. 4). Group A was significantly segregated from groups B and C (p = 4.76 x 10-4 and 2.2 x 10-4, respectively; see Table S6). The PC1 presented the highest positive correlations with the TP and conductivity (r = 0.71 and 0.70, respectively), followed by the occurrence of cyanophage genes – nblA and psbA (r = 0.56 and 0.52, respectively) (see Table S5). These results suggested that the above-mentioned factors could be important variables contributing to the spatial distancing between the Polish and Lithuanian lakes and favored the development of particular cyanobacteria [52, 53], which can differ in A and B groups analyzed (Fig. 4). The modest relationship between the abundance of some viral genes and TP was indicated for the Bay of Quinte by Rozon and Short . Moreover, TP as one of the most important parameters for the regulation of cyanobacterial occurrence, could directly influence in their development and thus becoming available to phages for the genome replication process inside the host cell [56, 57].
Whereas, the Lithuanian lakes in group B were significantly differentiated from group C by the vertical component – PC2 (p = 6.41 x 10-9; Table S7 and Fig. 4), which could be explained by the high positive correlations observed between the PC2 and the cyanophage genes – g91_S, g91_L, and nblA (r = 0.79, 0.69 and 0.61, respectively), followed by the pH (r = 0.47) (see Table S7). Cyanophages have a wide range of pH tolerances; however, a decrease in pH below the host's optimal requirements may directly affect the host's cells homeostasis and thus negatively affect the intracellular cyanophage replication process . Thus, group C was not only characterized with the lowest detection of cyanophage genes, but also the lowest values of environmental factors, and therefore, was found negatively scored in the PCA (Fig. 4, Table S4).
The psbA sequences presented in LUB, PNI, BUS, PAL, and GIN lakes were aligned close to each other within the phylogenetic tree, with exception of ILN lake (Fig. 3). While, after comparing their presence with physicochemical factors as part of the PCA analysis (Fig. 4) the mentioned psbA sequences with high similarity were divided into two groups: A (LUB, PNI, BUS, and PAL) and B (GIN). The separateness of psbA ILN based on its higher sequence divergence was also reflected within PCA results, as the one which was subjected to group C (Fig. 4). This observation might confirm the important role that the environmental factors, most likely local, may have in shaping the genetic variation in phages.
As it was shown, different cyanobacterial species were subjected to different groups highlighted with the use of PCA analysis (Fig. 4, Table S1). For instance, Planktothrix agardhii (average biomass 15.6 mg L-1) was a characteristic dominant species in group A, Planktolyngbya limnetica (3.2 mg L-1) in group B (Fig. 4), whereas Aphanizomenon gracile was the dominant species found in groups A and B with average biomass of 3.27 mg L-1 and 3.80 mg L-1, respectively and had five times lower biomass in the group C (0.64 mg L-1) (Fig. 4, Table S1). Observed species differentiation might result from the influence of different physicochemical factors. For example, lakes from group A where P. agardhii was a dominant species were positively related to TP which is in line with previous studies that demonstrated domination of this cyanobacterium in hypertrophic lakes with high concentrations of phosphorus [57, 58]. Also, A. gracile is a common dominant species in temperate lakes adapted to various types of environmental and nutritional conditions [59, 60, 61]. However, the cyanobacteria composition represented by total biomass was found to rather enhance the cyanophage genes occurrence (Table 2, Table 1S) than single species highlighted within PCA results. It was observed that in the lakes where two-three cyanophage genes were determined, also cyanobacteria biomass was two-three times higher (Table 2, Table 1S).