Virome Characteristics and Antibiotic Resistance Genes in Virome Reads
In May 2016, six surface water samples were collected from the Han River, South Korea (Fig. S1). Viral particles obtained from 10 L samples were subjected to metagenome sequencing using the Illumina MiSeq platform, producing 3.6 million to 6.6 million reads from each site (Table S1). Among virome reads, 77.3−87.9% were annotated via the MG-RAST pipeline . Only 12.3−13.8% of the annotated reads were referred to as viral reads, most of which belonged to Myoviridae, Podoviridae, and Siphoviridae of Caudovirales (Fig. 1), all commonly found in the environment. The majority of virome reads (82.5−84.9%) were annotated as bacterial genes that predominantly belonged to the classes Alpha-, Beta-, and Gammaproteobacteria. Since only 0.00007−0.0003% of the virome reads were found to be 16S rRNA gene sequences (Table S1), bacterial contamination in the virome data was negligible. Functionally annotated reads based on the SEED subsystems protein database  were generally annotated as phage- and prophage-related functional genes (55.6−66.7%), including terminases, capsid proteins, tail proteins, and phage DNA polymerases (Fig. S2). Of the virome reads annotated as phage-related genes, 38.6−45.0% were taxonomically assigned to bacterial genes, largely derived from Proteobacteria (Fig. S3). The annotation of virome reads as bacterial genes is a common phenomenon due to dearth of viral gene database as most of phages in the environment are not yet cultured. As such, virome reads encoding bona fide viral proteins might be improperly represented as bacterial genes. Virome reads predicted to encode bacterial function might have originated from inadequately named phage or prophage genes within bacterial cells. Along with phage- and prophage-related functions, virome reads were also annotated as diverse auxiliary metabolic genes (AMGs) , such as enzymes for carbohydrate metabolism, respiration, photosynthesis, virulence, and defenses against foreign materials (Fig. S2).
ARGs in the virome reads included genes for lactamases, multidrug transport proteins, polymyxin resistance proteins, and vancomycin resistance proteins (Table 1). Polymyxin resistance proteins were the most frequently annotated ARG, including ArnA, ArnC, and ArnT, which are involved in the lipid A-Ara4N (4-amino-4-deoxy-L-arabinose) pathway that modifies bacterial lipid cell walls to prevent the binding of cationic antibiotics . Multidrug transport proteins were largely composed of ABC-type multidrug efflux pump components. Both types of ARGs were relatively frequently found in diverse viromes while less was reported in isolated phage genomes , indicating that the majority of phages harboring these ARGs are not cultured yet. Genes for lactamases and vancomycin resistance proteins were also detected in the Han River virome at relatively lower frequencies. Collectively, the Han River viral communities were shown to harbor ARGs at proportions of approximately 0.1%. However, the taxonomy of ARG-carrier organisms was inconclusive due to their short sequence lengths (300-bp). Short fragments of virome reads were also insufficient to anticipate whether phage-associated ARGs were incorporated into phage capsids through generalized transduction or into phage genomes through specialized transduction. Taxonomic assignment of ARG-harboring sequences was improved by assembling raw virome reads into contigs prior to further searches.
Antibiotic Resistance Genes in Assembled Viral Contigs
From the six virome datasets of the Han River, raw virome reads were assembled into contigs (Materials and Methods; Table S1). Contigs carrying at least one phage-related gene were selected so that they can confidently be considered as phages genomes, resulting in 5,295 viral contigs. Predicted open reading frames (ORFs) of the viral contigs were used to search for ARGs in the ARG-specific databases CARD (Comprehensive Antibiotic Resistance Database)  and Resfams , and a total of 25 ARGs were found therein (Table 2). ARG-harboring virome contigs were considered to be of viral origin (Figs. S4-S7); however, contig sequences of the contigs were not matched to any known phage genomes.
Although all 25 phage-associated ARGs were selected by threshold (see Materials and Methods), it was difficult to determine the ARG functionality based solely on these sequences because most genes lacked conserved active sites or signature secondary structures and had low sequence similarities to known ARGs. The ORFs H6-C148-ORF17 and H7-C136-ORF18 encoded the vatA gene, and the ORF N1-C442-ORF2 encoded the vatB gene, the virginiamycin O-acetyltranferase enzymes (Table 2). The Vat proteins share seven well conserved antibiotic interaction sites ; however, the vatA gene encoded by the virome contigs did not harbor any of the conserved sequences. The vatB gene encoded by the ORF N1-C442-ORF2 appeared to carry only four out of seven conserved residues (Fig. S8) , and the functionalities of VatA and VatB in the virome contigs were not able to be verified. Acetyl-CoA-dependent N-acetyltransferases (AAC(6’)) were found in 11 viral contigs (Table 2). The AAC(6’) in the GCN5-related N-acetyltransferase superfamily  possesses a signature secondary structure essential for acetylation of aminoglycoside antibiotics . The AAC(6’) encoding genes found in 11 virome contigs had incomplete secondary structures (Fig. S9) and these enzymes were predicted to be functionally impaired. The virome contig, H1-C267 was found to encode the qnrA3 gene that confers resistance to quinolone antibiotics. Sequences of QnrA family proteins have only a few amino acid differences among them , while the qnrA3 sequence found in the viral contig had only 28.8% sequence similarity to qnrA3 (AAZ04782.1). Additionally, it was also difficult to confirm whether this gene confers antibiotic resistance without phenotypic characterization.
Some ARGs confer antibiotic resistance through antibiotic target protein alteration, functions of which can be utilized for different purposes in bacteriophages, such as machinery for replication or infection. The dihydrofolate reductase (DHFR), dfrB2 was detected in four viral contigs (Table 2). While the production of drug (trimethoprim)-resistant DHFR is a known bacterial antibiotic resistance mechanism [38,39], the dfrB2 gene in bacteriophages would likely to function as a participant in phage folate biosynthesis. Two virome contigs were found to harbor vanY genes (Table 2) encoding the d-alanyl-d-alanine carboxypeptidase that cleaves d-Ala from peptidoglycan precursors to prevent the binding of vancomycin  and also known to function as a bacteriophage endolysin . Thereby, antibiotic resistance function of dfrB2 and vanY genes in bacteriophages could not be definitively determined. On the contrary, β-lactamases encoded by four virome contigs (Table 2) had clear indication of antibiotic resistance function due to well conserved active sites (presented below). We thereby focused our detailed functional analysis on these four β-lactamase genes.
Novel and Functional β-lactamases Encoded by Viral Contigs
β-lactamases, one of the widely found ARGs in diverse environments, were found in four viral contigs (Fig. 2). Among these, H4-C441-ORF28 was related to class A β-lactamases and three ORFs, sharing identical nucleotide sequences, showed significant homologies with metallo-β-lactamases (MBLs, Table 2). The ORF H4-C441-ORF28 carried the conserved active sites S70-X-X-K73 and S135-D136-N137 , the motifs specific to class A β-lactamases (Fig. S10). Phylogenetic analyses also showed that H4-C441-ORF28 was affiliated with class A β-lactamases but formed a distinctive clade apart from representative class A β-lactamases (Fig. 3, Table S2). Accordingly, this unique class A β-lactamase gene was named blaHRV-1 and its product was named HRV-1 (Han River Virome β-lactamase-1). Three ORFs, H1-C74-ORF21, H4-C244-ORF21, and H4-C367-ORF18 were predicted to be related to MBLs with the conserved active site, H147-X-H149-X-D151-H152  (Fig. S11). A maximum-likelihood phylogenetic tree grouped the MBL-related ORFs into subclass B3 MBLs, also forming an independent monophyletic cluster (Fig. 4, Table S2); this novel gene and gene product were named as blaHRVM-1 and HRVM-1 (Han River Virome Metallo-β-lactamase-1), respectively. The presence of conserved active sites of β-lactamases suggested the possibility of their functionality, therefore phenotype analyses were performed on HRV-1 and HRVM-1.
For the phenotypic expression analysis, pET-30a(+)/blaHRV-1-His6 and pET-28a(+)/blaHRVM-1-His6 plasmids were transformed into E. coli BL21 (DE3) and their susceptibilities to β-lactam antibiotics were assessed. The two strains showed reduced susceptibility, ranging from 2- to 16-fold reductions, to extended-spectrum cephalosporins and carbapenems, as well as to penicillin and narrow-spectrum cephalosporins, displaying typical characteristics of extended-spectrum β-lactamases (ESBL) or carbapenemase (Fig. 5 and Table S3). A kinetic analysis of purified HRV-1 and HRVM-1 revealed that these two enzymes had broad substrate profiles including carbapenems (Table S4). The catalytic efficiencies (kcat/km) of HRV-1 for cefotaxime (4.1 × 103 M–1 S–1) and ceftazidime (3.3 × 103 M–1 S–1) were 2- and 26-fold higher than those of GES-14 (1.8 × and 0.125 × 103 M–1 S–1, respectively) belonging to class A ESBL . In addition, the kcat/km of HRVM-1 for imipenem (5.0 × 102 M–1 s–1) was similar to those of PNGM-1 (5.5 × 102 M–1 s–1)  from a deep-sea sediment metagenome and CAR-1 (9.6 × 102 M–1 s–1)  from Erwinia carotovora that belong to subclass B3 MBL. The kinetic data for HRV-1 and HRVM-1 were consistent with the determined MICs (Fig. 5 and Table S3), indicating that HRV-1 and HRVM-1 have ESBL and carbapenemase properties.
In addition, bacterial metagenome datasets were generated in parallel at identical sampling stations. Homologous sequences to HRV-1 and HRVM-1 were searched within the metagenome-assembled contigs to observe their occurrence in bacterial fractions. Three metagenomic ORFs, each with 100% sequence identity, were found for both HRV-1 and HRVM-1, (Figs. 3 and 4; Tables S5 and S6). The metagenome contigs with these ORFs showed high synteny to the viral contigs containing blaHRV-1 and blaHRVM-1 (Fig. 6), suggesting the presence of infectious phages or prophages carrying blaHRV-1 and blaHRVM-1 in the Han River bacterial communities . Public viral and bacterial metagenomes were additionally searched to assess the global distribution of HRV-1 and HRVM-1. Sequences significantly similar to HRV-1 (up to 69% similarity) were frequently found in metagenomes prepared from the Colombia River estuary in the United States, while HRVM-1-like sequences (up to 71% similarity) were mainly found in freshwater viromes from Singapore (Table S5, Fig. S12), suggesting that HRV-1 and HRVM-1 are globally distributed. In particular, blaHRVM-1 showed a sequence similarity of ~40% to metallo-hydrolases found in Clostridium botulinum and Clostridioides difficile (Table S6), imposing that HRVM-1 and its homologous sequences are possibly carried by phages that infect these pathogens.