Characterization and genome sequence of the genetically unique Escherichia bacteriophage vB_EcoM_IME392

In this study, a novel Escherichia coli-specific bacteriophage, vB_EcoM_IME392, was isolated from chicken farm sewage in Qingdao, China. The genome of IME392 was found by next-generation sequencing to be 116,460 base pairs in length with a G+C content of 45.4% (GenBank accession number MH719082). BLASTn results revealed that only 2% of the genome sequence of IME392 shows sequence similarity to known phage sequences in the GenBank database, which indicates that IME392 is a novel bacteriophage. Transmission electron microscopy showed that IME392 belongs to the family Myoviridae. The host range, the multiplicity of infection, and a one-step growth curve were also determined.


Introduction
Since the German paediatrician Theodor Escherich isolated Escherichia coli from healthy human faeces in 1885, this bacterium has been studied extensively [7,8,21,22,24]. As a model organism, E. coli plays a vital role in life science research and in biotechnology industries such as pharmaceuticals and industrial chemicals [5,13,18]. It is ubiquitous in the natural environment and in the mammalian gastrointestinal tract, where it is part of the normal intestinal flora. E. coli and other facultative anaerobes constitute approximately 0.1% of the gut microbiota [9]. Most E. coli strains are harmless, but certain serotypes can cause severe food poisoning, septic shock, meningitis, and/or urinary tract infections [16,29], which seriously threaten human life and property. The discovery and use of antibiotics alleviated these dangers, but antibiotic abuse has created a new clinical challenge: antibiotic resistance. E. coli strains resistant to all major antibiotic classes have been reported, including extended-spectrum beta-lactams, carbapenems, fluoroquinolones, aminoglycosides, and trimethoprim-sulfamethoxazole [28]. Recently, even plasmid-mediated colistin resistance has emerged [23]. Bacteriophages, which can lyse bacteria [14], seem to be a promising solution to the prevalence of multidrug-resistant bacteria. After their discovery by Twort and D'Hérelle, phages were soon used to treat bacterial infections. Phage therapy has certain advantages over antibiotic therapy, including low cost, easy availability, specificity, and few side effects.
Bacteriophages are the most widespread biological entities in nature, and their number is 10 times greater than that of bacteria. In co-evolution with their hosts, bacteriophages have developed tremendous diversity. The genome of E. coli phage MS2, with 3,569 nucleotides of positive-sense single-stranded RNA, was the first genome to be completely sequenced [10]. The following year, Sanger et al. completed the sequencing of bacteriophage Φ-X174, which was the first DNA genome to be sequenced [30]. M13 is a filamentous bacteriophage composed of circular single-stranded DNA (ssDNA) that is 6,407 nucleotides long, encapsulated in approximately 2,700 copies of the major coat protein P8, and capped with five copies of two different minor coat proteins (P9, P6, P3) on the ends [26]. In 1951, Esther Lederberg serendipitously discovered that, after ultraviolet irradiation, the laboratory E. coli K12 strain released a bacteriophage, which was later named lambda. Subsequently, its entire life cycle, including the lytic and lysogenic phases, has been studied in depth. In addition, many aspects of bacteriophages with large genomes, such as T2, T4, T5, and T6, have been studied. Of these, T4-like phages are the most representative and widely studied phages with large genomes. The genomes of these phages are more than 100 kb in length and encode 100-300 or even more proteins and a variety of tRNAs. Most of these large-genome phages share well-organized and highly conserved core genes, especially those encoding DNA replication and virion structural proteins [6,27]. Research on these known phages will help us understand and discover the mysteries of life and provide guidance for future research.
Handling Editor: Johannes Wittmann.
Yunjia Hu, Shanwei Tong and Ping Li contributed equally to this work.
Thanks to the rapid development of high-throughput sequencing technology, the number of phage sequences in the GenBank database has grown geometrically. However, most of these phage sequences have a high degree of similarity to previously known sequences from bacteria or phages. In this study, we isolated and identified a genetically unique E. coli phage in which only 2% of the genome sequence shows similarity to the most similar sequences in GenBank, indicating that it may have characteristics and functions completely different from those of existing phages.

Sampling, isolation, and purification of Escherichia phage IME392
Phage IME392 and its host strain, E. coli E2, were isolated from sewage samples from a chicken farm in Qingdao, China. For the purification of phage particles, the sewage samples were first centrifuged at 12,000 × g for 5 min and then filtered through a 0.22-μm membrane. After that, an equal amount of the filtrate was added to 3× LB medium containing the log-phase host bacterium E2 (OD600 = 0.4) and cultured overnight at 37 °C with shaking at 180 rpm. The culture was centrifuged at 12,000 × g for 5 minutes, and the pellet was discarded. The supernatant was passed through a 0.22-μm filter to remove host cells. The filtrate was serially diluted tenfold in sterile PBS, and 100 μL of each dilution was mixed with 200 μL of the log-phase host bacterial culture, followed by incubation at room temperature for 5 minutes. The mixture was added to 5 mL of preheated 0.75% LB soft agar and poured onto the surface of 1.5% hard agar plates. After solidification, the plates were incubated overnight at 37 °C. Single plaques were isolated from the plates and again incubated overnight with a liquid culture of E2 with shaking at 37 °C. Cultures were re-centrifuged and sterile filtered, and the filtrates were subjected to another round of plaque assays. This process was repeated three times to obtain pure phage stocks.

Multilocus sequence typing (MLST)
Based on previous reports [15], primer pairs for eight housekeeping genes, dinB, icdA, pabB, polB, putP, trpA, trpB, and uidA, were designed for PCR amplification. All PCR products were purified by gel extraction and then sequenced by Beijing Ruiboxingke Biotechnology Co., Ltd., using the universal sequencing primers OF and/or OR. Further details about the MLST procedure can be found at http://www.pasteur.fr/mlst.

DNA extraction, gene sequencing, and bioinformatic analysis
Phage DNA was extracted using a modified standard phenol-chloroform extraction protocol [34]. First, DNase I and RNase A (Thermo Scientific, USA) were added to the purified phage IME392 preparation to a final concentration of 1 μg/mL and incubated overnight at 37 °C. After incubation at 80 °C for 15 minutes to inactivate DNase I and cooling to room temperature, lysis buffer was added to final concentrations of 0.5% SDS, 50 μg of proteinase K per mL, and 20 mM EDTA. The solution was incubated for 1 hour at 56 °C, and an equal volume of Tris-saturated phenol was added. The mixture was vortexed to form a uniform emulsion. After centrifugation at 10,000 × g at 4 °C for 5 minutes, the upper aqueous phase was collected and transferred to a new tube, and an equal volume of extraction agent (phenol:chloroform:isoamyl alcohol, 25:24:1) was added. The mixture was centrifuged again (10,000 × g, 4 °C, 5 min), and the aqueous phase was collected and added to an equal volume of isopropanol. The mixture was incubated at -20 °C for more than 1 hour, followed by centrifugation at 10,000 × g at 4 °C for 20 minutes, which precipitated the phage DNA. The pellet was washed twice with 1 mL of cold 75% ethanol, resuspended in 30 μL of deionized water, and stored at -20 °C.
A 2×300-nt paired-end DNA library was prepared using an NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® according to the manufacturer's instructions. A Bioruptor UCD-200TS ultrasound system was used to fragment 50 μL of DNA (approximately 100 ng) into 300- to 600-bp fragments. The resulting fragmented DNA was end-repaired and ligated to the NEBNext adaptor. Cleanup of adaptor-ligated DNA was performed using AMPure XP beads. Finally, the cleaned DNA was amplified by PCR for 4 to 5 cycles, and the PCR product was purified again using AMPure XP beads. An Agilent 2100 Bioanalyzer system was used to measure the size distribution of the constructed library fragments, and the library was quantified using a KAPA Library Quantification Kit. Whole-genome sequencing was performed on an Illumina MiSeq sequencing platform (San Diego, CA, United States) with a 600-cycle MiSeq v3 Reagent kit to generate 2×300-bp paired-end reads. In total, 555,564 raw reads were generated.
The quality of the raw sequencing data was analysed using the quality control software FastQC v0.11.5, and low-quality reads and adaptor regions were filtered out using Trimmomatic 0.36 with default parameters [4]. The high-quality reads were assembled using SPAdes v3.13.0 with default parameters, and approximately 4,872 contigs were generated [3]. Bandage v0.8.1 [31], a tool for visualizing assembly graphs with connections, was used to display the connections between the assembled contigs. Only three of the contigs were circular, with lengths of 116,460, 39,440 and 4,888 bp and coverage of 352×, 7×, and 18×, respectively. BLASTn analysis confirmed that the two shorter contigs were lysogenic phage and plasmid sequences. Mapping was carried out using CLC Genomics Workbench 12.0.2 (length fraction = 0.95; similarity fraction = 0.95), which was also used to adjust the sequences and to check the results. A consensus genome sequence was generated that spanned 100% of the reference genome, and the 425,959 mapped reads gave a mean read coverage of 884.1×. A nucleic acid sequence similarity search was performed using BLASTn (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Gene annotation was first run on RAST [2] (http://rast.nmpdr.org/) and then refined by amino acid sequence comparisons in BLASTp. A genome function map was generated using the laboratory's self-built script and optimized using Inkscape 0.92.1.
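The relationship between the number of mapped reads, read length, and mean coverage can be checked with a short calculation. The sketch below is illustrative only and is not the method used by CLC Genomics Workbench; the function names are hypothetical, and the 300-nt read length is the nominal untrimmed maximum, so the first figure is an upper bound rather than an expected value.

```python
def mean_coverage(n_reads, mean_read_length, genome_length):
    """Mean depth = total mapped bases / genome length."""
    if genome_length <= 0 or n_reads < 0 or mean_read_length < 0:
        raise ValueError("genome_length must be positive; other inputs non-negative")
    return n_reads * mean_read_length / genome_length


def implied_read_length(coverage, n_reads, genome_length):
    """Mean mapped read length implied by a reported mean coverage."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return coverage * genome_length / n_reads


# Values reported in the text.
GENOME_LENGTH = 116_460   # bp
MAPPED_READS = 425_959
REPORTED_COVERAGE = 884.1

# Upper bound, assuming every mapped read kept its full nominal 300 nt.
upper_bound = mean_coverage(MAPPED_READS, 300, GENOME_LENGTH)
# Mean read length after trimming that would give the reported coverage.
implied = implied_read_length(REPORTED_COVERAGE, MAPPED_READS, GENOME_LENGTH)

print(f"coverage upper bound: {upper_bound:.1f}x")
print(f"implied mean mapped read length: {implied:.1f} nt")
```

Under these assumptions, the reported 884.1× corresponds to a mean mapped read length of roughly 242 nt, which is plausible for 2×300-bp reads after quality and adaptor trimming.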
The amino acid sequences of the major capsid protein and the terminase large subunit of the bacteriophage IME392 were used to construct a neighbor-joining phylogenetic tree via MEGA 7.0 with 1000 bootstrap replicates, which was optimized using the online tool EvolView (https://www.evolgenius.info/evolview/).

Transmission electron microscopy
After centrifugation of the coculture of the phage and its host at 12,000 × g and passage through a 0.22-μm filter, the phage particles were purified by sucrose density gradient centrifugation [33]. Approximately 20 μL of the purified, enriched phage sample was deposited on carbon-coated copper grids, allowed to adsorb for 15 minutes, and then dried using filter paper. The phage particles were negatively stained with 2% (w/v) phosphotungstic acid (pH 7.0) for 2 min and examined using a JEM-1200EX transmission electron microscope (Jeol Ltd., Tokyo, Japan) at an acceleration voltage of 80 kV.

Host range determination
The host range of phage IME392 was determined by spot assay and confirmed by plaque assay. Suspected hosts were cultured at 37 °C to an optical density of 1.0. Three hundred microliters of bacterial culture was added to 5 mL of preheated 0.7% LB agar and poured onto 1.5% agar plates. After solidification, each plate was tested by pipetting 5 μL of phage suspension onto the bacterial lawn. Possible hosts were identified by plaque formation after overnight incubation at 37 °C. The plaque assay procedure was as described in the phage purification section.

Determination of the optimal multiplicity of infection
The multiplicity of infection (MOI), or the ratio of phage particles to bacterial cells prior to culture, affects the final level of bacteriophage produced. At the optimal MOI, a culture contains the most phage particles after reaching stationary phase. To determine the optimal MOI, the number of colony-forming units of the log-phase (OD600 = 0.6) E2 host culture and the number of plaque-forming units of the bacteriophage IME392 stock solution were first determined separately. Bacteria and phage were then mixed in 5 mL of LB medium at different ratios to achieve MOIs of 10, 1, 0.1, 0.01, 0.001, and 0.0001 and incubated at 37 °C with shaking at 220 rpm for 4 hours. After centrifugation at 12,000 × g for 1 min and passage of the culture through a 0.22-μm filter, the phage titer was determined using the double-layer plate method after serial dilution. Three replicates were performed, and the MOI that produced the highest phage titer was considered the optimal MOI for this phage.
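Setting up a given MOI amounts to matching the number of plaque-forming units added to the number of colony-forming units in the culture. A minimal sketch of that arithmetic follows; the function name, the host and phage titers, and the 0.1-mL culture volume are hypothetical values chosen for illustration and are not taken from this study.

```python
def phage_volume_for_moi(moi, host_cfu_per_ml, host_volume_ml, phage_pfu_per_ml):
    """Return the phage stock volume (mL) needed to reach the requested MOI.

    MOI = (PFU added) / (CFU present), so
    volume = MOI * CFU/mL * host volume / PFU/mL.
    """
    if min(moi, host_cfu_per_ml, host_volume_ml, phage_pfu_per_ml) <= 0:
        raise ValueError("all inputs must be positive")
    total_cells = host_cfu_per_ml * host_volume_ml
    return moi * total_cells / phage_pfu_per_ml


# Hypothetical titers, for illustration only.
HOST_CFU_PER_ML = 1e8
PHAGE_PFU_PER_ML = 1e9
HOST_VOLUME_ML = 0.1

# The MOI series is the one listed in the text.
for moi in (10, 1, 0.1, 0.01, 0.001, 0.0001):
    volume_ml = phage_volume_for_moi(
        moi, HOST_CFU_PER_ML, HOST_VOLUME_ML, PHAGE_PFU_PER_ML
    )
    print(f"MOI {moi:g}: add {volume_ml * 1000:.4g} uL of phage stock")
```

In practice the lowest MOIs call for sub-microliter volumes of undiluted stock, so the stock would normally be diluted first and the same formula applied to the diluted titer.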

One-step growth curve
A one-step growth curve was generated by the following method. A mixture of phage and bacteria at the optimal MOI (0.1) was incubated at 37 °C for 10 minutes to allow adsorption. After centrifugation at 12,000 × g for 1 min, the supernatant containing unadsorbed phage particles was discarded, and the pellet was washed twice with LB medium and resuspended in 20 mL of LB medium. The moment when the pellet was resuspended in medium was defined as time zero. The suspension was then cultured at 37 °C with shaking at 220 rpm for 140 min. Samples (200 μL) were collected every 10 minutes (every 5 minutes in the first 30 minutes) and then centrifuged and plated on double agar plates to determine the phage titer. Each sample was plated on three separate plates. Finally, the resulting one-step growth curve was plotted using GraphPad Prism 8.0.
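Each point on such a curve is a titer derived from plaque counts on replicate plates. The sketch below shows the standard conversion from plaque counts to PFU/mL; the function name, the plaque counts, the dilution, and the plated volume are hypothetical illustrations, not data from this study.

```python
def titer_pfu_per_ml(plaque_counts, dilution_factor, plated_volume_ml):
    """Mean titer (PFU/mL) from replicate plates of one dilution.

    dilution_factor is the total fold dilution of the plated sample,
    e.g. 1e5 for the fifth tube of a tenfold dilution series.
    """
    if not plaque_counts:
        raise ValueError("at least one plate count is required")
    if dilution_factor <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution factor and plated volume must be positive")
    mean_count = sum(plaque_counts) / len(plaque_counts)
    return mean_count * dilution_factor / plated_volume_ml


# Hypothetical example: three replicate plates of a 10^5-fold dilution,
# 0.1 mL plated per plate.
counts = [42, 38, 40]
titer = titer_pfu_per_ml(counts, dilution_factor=1e5, plated_volume_ml=0.1)
print(f"{titer:.2e} PFU/mL")
```

Averaging the three plates before scaling keeps the replicate structure explicit; plotting the resulting titers against sampling time on a logarithmic axis gives the growth curve.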

Determination of pH and temperature tolerance
To determine the tolerance of phage particles to different pH values, LB medium was adjusted to a variety of pH values ranging from 2 to 13 with 5 M HCl or NaOH solution and then passed through a 0.22-μm filter. Then, 100 μL of purified phage suspension was added to 900 μL of LB medium with different pH values and incubated at 37 °C for 1 hour. To investigate the thermal stability of the phage, 100 μL of purified phage suspension was added to 900 μL of LB medium and incubated at 30 °C, 37 °C, 40 °C, 50 °C, 60 °C, 70 °C, or 80 °C for 1 hour. The titer of the remaining viable phage particles was determined using the double-layer agar method. All assays were performed in triplicate.

Proteomic analysis
The phage particles were concentrated using polyethylene glycol (PEG) and then purified by sucrose density gradient centrifugation [33]. Purified phages were mixed with 6× protein loading buffer (TransGen Biotech Co., LTD) and boiled for 10 minutes, followed by centrifugation at 12,000 × g and 4 °C for 3 min. The proteins were separated by 12% sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), and the bands were visualized by staining the gels with Coomassie brilliant blue. Gel slices were then excised and trypsinized. Briefly, 2.5 μg of trypsin was added to 100 μg of protein solution (a protein:trypsin ratio of 40:1) and incubated at 37 °C for 4 hours. Trypsin was added once more at the same ratio, and enzymatic digestion was continued at 37 °C for 8 hours. The digested peptides were desalted using a Strata X column and vacuum dried. The dried peptide sample was analyzed by liquid chromatography mass spectrometry (LC-MS). Protein identification was based mainly on matching the experimental tandem mass spectrometry data with theoretical mass spectrometry data obtained by database simulation. First, the original mass spectrum data were converted to a mass spectrum peak file. Then, the sequences in the database were searched and matched using Mascot v2.3.02 (parameters: enzyme, trypsin; fragment mass tolerance, 0.05 Da; fixed modifications, carbamidomethyl (C); variable modifications, oxidation (M), Gln->pyro-Glu (N-term Q), and deamidated (NQ); max missed cleavages, 1; instrument type, ESI-FTICR; database, bacteriophage_392_nr.fasta), and filtering and quality control (Mascot evaluation ≤ 0.05) were performed on the search results to give reliable protein identifications.

Morphological features of phage IME392
After 10 hours of culture on double agar plates, the bacteriophage IME392 formed visible but small plaques reaching approximately 0.3-0.5 millimeters in diameter. Transmission electron microscopy results suggested that bacteriophage IME392 should be classified morphologically as a member of the family Myoviridae, possessing an icosahedral head approximately 83.93 ± 0.55 nm in diameter (n = 10) and a contractile tail 122.23 ± 3.55 nm in length (n = 10) (Fig. 1).

Host range
In this study, the ability of bacteriophage IME392 to lyse strains was determined by spot assay and plaque assay. A host range test was conducted on 33 clinically isolated pathogenic strains or environmentally isolated strains, including 28 E. coli strains and other bacteria (Salmonella). As shown in Table 1, among the 28 strains of E. coli, only five produced bright and clear plaques, and the plaques formed on the lawns of eight strains were slightly turbid. The phage could not infect the other 15 E. coli strains or strains of other species. This indicates that bacteriophage IME392 has a relatively narrow and specific host range.

Physiological features of phage IME392
The optimal MOI was determined to be 0.1, and this was used to generate the one-step growth curve for IME392 shown in Figure 2. The latent period and burst period were both approximately 15 minutes. In comparison to other known phages [11,12,19,35], IME392 has a lower titer when reaching the stationary phase, at approximately 10^8 plaque-forming units per milliliter (PFU/mL), which is consistent with the small plaque size of IME392.

The temperature and pH stability data for IME392 phage particles are shown in Figure 3A and B, respectively. IME392 is extremely sensitive to heat treatment. Incubation at 60 °C for 1 hour reduced the phage titer by 99.94%, while no phage activity remained if the incubation temperature reached 70 °C or higher. IME392 was also sensitive to both low- and high-pH environments. No live phage was detected when the phage particles were incubated at pH 2.0 or 13.0 at 37 °C for 1 hour.
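Titer losses of this size are often easier to compare on a logarithmic scale. The following sketch converts a percent reduction into a log10 reduction; the function name is hypothetical, and only the 99.94% figure comes from the text.

```python
import math


def log10_reduction(percent_reduction):
    """Convert a percent titer reduction into a log10 reduction.

    A 100% reduction (no surviving phage) has no finite log value,
    so it is rejected rather than returned as infinity.
    """
    if not 0 <= percent_reduction < 100:
        raise ValueError("percent_reduction must be in [0, 100)")
    surviving_fraction = 1 - percent_reduction / 100
    return -math.log10(surviving_fraction)


# Reduction reported for 1 hour at 60 degrees C.
print(f"{log10_reduction(99.94):.2f} log10 reduction")
```

For the reported 99.94% loss, the surviving fraction is 0.0006, which corresponds to a reduction of a little over three orders of magnitude (about 3.2 log10).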

Genome sequencing and analysis
The complete genome sequence of phage IME392 was 116,460 base pairs in length, with a GC content of 45.4%. A total of 160 potential open reading frames (ORFs) that could encode proteins were predicted by RAST. The genome sequence was submitted to GenBank (https://www.ncbi.nlm.nih.gov/genbank/) and is available under the sequence ID MH719082.1.
The genome sequence of IME392 was analyzed using BLASTn. The results revealed that only 2% of the genome sequence (nt 91,907-94,497) was significantly similar to known nucleotide sequences in public databases. Homologous sequences included both phage and bacterial sequences. Eighty-nine potential CDSs were further analyzed using BLASTp, among which 34 were identified as functional proteins, including 15 morphogenesis-related proteins, 16 replication-related proteins, and five lysis-related proteins. Two tailspikes were classified as both morphogenesis- and lysis-related proteins. Twenty-three ORFs encoded proteins that were highly homologous to phage proteins of unknown function, and two others encoded proteins homologous only to hypothetical bacterial proteins. The products of the remaining 30 ORFs had no significant sequence similarity to proteins in the public database. The annotated genes are shown on the genome function map (Fig. 4) and in Table 2.
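The category counts above overlap, because the two tailspikes are counted in both the morphogenesis and lysis groups. A short tally, using only the numbers given in the text (the variable names are illustrative), shows how they reconcile with the totals of 34 functional proteins and 89 CDSs.

```python
# Counts reported in the text.
morphogenesis = 15
replication = 16
lysis = 5
dual_role_tailspikes = 2  # counted under both morphogenesis and lysis

# Subtract the proteins that appear in two categories.
known_function = morphogenesis + replication + lysis - dual_role_tailspikes

phage_unknown_function = 23   # homologous to phage proteins of unknown function
bacterial_hypothetical = 2    # homologous only to hypothetical bacterial proteins
no_similarity = 30            # no significant database similarity

total_cds = (
    known_function
    + phage_unknown_function
    + bacterial_hypothetical
    + no_similarity
)

print(f"proteins with known function: {known_function}")
print(f"total CDSs analysed: {total_cds}")
```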
The replication-related modules of IME392 are mainly distributed between nt 7,051 and 54,779 of the genome, with a total length of approximately 47.7 kb, which is roughly 41% of the length of the entire genome. This is a relatively rare observation. A total of 60 open reading frames were predicted in this region, of which 16 encode proteins with known functions, nine encode hypothetical proteins, and the remaining 35 did not show homology to currently known proteins. The particularly large size of the replication-related region may be related to the large number of genes for non-homologous proteins interspersed between genes encoding replication-related enzymes such as ligase, DNA polymerase, RNA polymerase, helicase, and topoisomerase. The next two genes encode two tRNAs, specifically for Met (CAT) and Arg (TCT). The genes encoding packaging-related, virion structural, and lysis proteins are closely linked in the genome, covering the range of 61-104 kb, and they are all located on the positive strand of the IME392 genome.
The major capsid protein and terminase large subunit were chosen for phylogenetic tree construction (Fig. 5) [17]. However, the phylogenetic relationship to other phages was clearly distant, indicating that IME392 is a novel phage. Moreover, bacteriophage IME392 forms an independent branch in the phylogenetic tree, has low similarity to known phages in the family Myoviridae, and can be classified as a member of a new genus.
Fig. 4 legend: The outermost circle represents the gene organization. The arrows indicate the direction of transcription of each gene. Different colors refer to different functional categories: hypothetical proteins (gray), lysis-related proteins (yellow), phage morphogenesis proteins (blue), and replication- and regulation-related proteins (green). The two innermost circles represent GC skew [(G-C)/(G+C)] and G+C content. The gray circle represents the G+C content. An outward direction indicates that the G+C content of that region is greater than the average G+C content of the whole genome, and an inward direction indicates that the G+C content of that region is lower than the average G+C content. The black circle represents the GC skew [(G-C)/(G+C)]. An outward direction indicates that the GC skew [(G-C)/(G+C)] is larger than zero, and an inward direction indicates that the GC skew [(G-C)/(G+C)] is lower than zero. The scale units are base pairs.
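The G+C content and GC skew tracks described for the genome map can be reproduced with a sliding-window calculation. The sketch below is a generic illustration and not the laboratory's own script; the function names, the window and step sizes, and the toy sequence are all assumptions made for the example.

```python
def gc_content(seq):
    """Fraction of G and C among the A, C, G, T bases of seq."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return (g + c) / total if total else 0.0


def gc_skew(seq):
    """GC skew, (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq, window, step):
    """Yield (start, subsequence) for each full window along seq."""
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, seq[start:start + window]


# Toy sequence for illustration; a real run would read the genome FASTA.
toy = "ATGCGGGCATATATCCCGCGTTAAGGGG"
genome_gc = gc_content(toy)

for start, sub in sliding_windows(toy, window=8, step=4):
    # Positive deviation plots outward, negative inward, as in the legend.
    gc_deviation = gc_content(sub) - genome_gc
    print(f"{start:>3}  GC deviation {gc_deviation:+.3f}  skew {gc_skew(sub):+.3f}")
```

Plotting the deviation from the genome-wide mean, rather than the raw G+C content, matches the outward and inward convention described in the legend.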

Proteomic analysis
To verify the proteins predicted by our genomic analysis, the phage was concentrated and analysed using mass spectrometry. Sixty proteins were identified, and 28 of these had homologues of known function (Table 3). Nine of the identified proteins can be categorized as structural proteins or proteins involved in the morphogenesis of the phage (CDS1, CDS3, CDS48, CDS53, CDS65, CDS66, CDS68, CDS69, CDS72). In addition to the terminase large subunit, all replication-related proteins were identified. However, only two lysis-related proteins were identified. Mass-spectrometry-based proteomics thus identified 28 of the 34 proteins with a known function predicted in our genomic analysis (Table 2). In total, 60 out of 89 (67.4%) predicted proteins were identified by mass spectrometry proteomics, most of which were encoded by identified structural genes.

Discussion
The invention of antibiotics has greatly improved the living conditions of humans and saved tens of thousands of lives. However, at the same time, the problems of drug resistance and even multidrug resistance caused by antibiotic abuse have posed new challenges. It has been estimated that 700,000 people die from drug-resistant infections each year, and this number may rise to 10 million by 2050. The speed of development of new antibiotics cannot match the speed of antibiotic resistance development [20,25]. Phage therapy has gradually become a research hotspot due to its high efficiency, excellent specificity, and easy availability. A variety of phage preparations have been successfully developed and used in clinical treatments. In this study, we isolated and identified a new bacteriophage, IME392, that can infect E. coli. The phage infected only some of the E. coli strains tested, and none of the strains from other species. In addition, no toxin genes, antibiotic resistance genes, phage lysogenic factors, or other pathogen-related genes were found among the genes with known functions in the phage genome. However, the functions of many genes are still unknown, so it is important to identify the functions of these genes. The use of this phage to treat infections caused by resistant E. coli still has a long way to go.
Genomic analysis shows that bacteriophage IME392 has low similarity to existing biological entities (only 2%) and might have numerous novel features. The genome annotation of IME392 revealed only 13 predicted phage structural proteins, while the genes encoding other phage structural proteins are still unknown. In addition, the genome of phage IME392 also encodes a variety of replication-related proteins, including DNA polymerase, DNA ligase, DNA helicase, topoisomerase, 5'-3' DNA exonuclease, and other replication-related proteins. Therefore, it can rely on its own encoded enzymes to replicate, and we speculate that it may have its own replication mechanism. It is worth noting that we also found two genes encoding a DNA-directed RNA polymerase in the genome of the bacteriophage IME392, which is not common in bacteriophages.
Glycosyltransferases are enzymes that catalyze the transfer of the glycosyl moiety from an activated nucleotide sugar to a nucleophilic glycosyl acceptor molecule [32]. Some bacteriophages modify the glycome by influencing the expression of host glycosyltransferases, while other phages are unique in that they can express their own glycosyltransferases. These bacteriophages glucosylate their DNA to protect it from host restriction endonuclease systems. Furthermore, glycosyltransferase may function in the puncturing or lysis of the cell wall peptidoglycan. We were surprised to find that the genome of bacteriophage IME392 also contains a gene encoding a glycosyltransferase, which may protect the phage DNA from degradation by the host, but its exact function in the life cycle of the bacteriophage still needs experimental study. The IME392 genome also encodes two distinct tailspike proteins. The tailspike protein of enterobacteria phage P22 mediates the recognition and adhesion between the bacteriophage and the surface of Salmonella enterica cells [1]. It is speculated that the tailspike proteins of bacteriophage IME392 have a similar function. We believe that the presence of the glycosyltransferase and tailspike proteins helps phage IME392 adsorb to and infect the host more easily.
In conclusion, we present here the biology and the genomic and proteomic characteristics of E. coli phage IME392, which was isolated from sewage samples from a chicken farm in Qingdao, China. The newly isolated phage IME392 was identified as a member of the family Myoviridae. The findings of this study not only provide new phage resources for the development of phage therapy against E. coli but also show that there are still many completely novel phages waiting to be discovered.