Cronobacter sakazakii are facultative, anaerobic Gram-negative bacteria that exist widely in various foods and raw materials [1, 2]. In recent years, as a new foodborne pathogen, C. sakazakii has been commonly found in formula milk powder and is associated with several infectious diseases, including meningitis, necrotizing enterocolitis, and sepsis [3, 4]. Viruses are ubiquitous in nature, and bacteriophages, which are viruses of bacteria, are effective tools to kill pathogenic bacteria . The genetic diversity of C. sakazakii poses a challenge for the use of phages to control microbial contamination in food processing environments, therefore it is necessary to isolate and identify new phages targeting C. sakazakii .
Fucose-containing exopolysaccharides (FcEPSs) are a promising source of fucosylated oligosaccharides and fucose. Cronobacter sp. typically have the capacity to produce fucose-rich FcEPSs . Bacteriophage-borne glycanases extracted from phages are effective tools for degrading FcEPSs . A previous study found that a phage isolated from sewage was capable of degrading FcEPS produced by C. sakazakii M1 . In this study, the bacteriophage targeting C. sakazakii M1 was purified via a plaque assay, more than 10 times. The genome sequence and functional biological characterization of phage PF-CE2 was completed and compared with those of homologous phages. In addition, the gene encoding the bacteriophage-borne glycanase was predicted.
Phage isolation and purification were performed as described previously . Briefly, prior to phage PF-CE2 DNA isolation, DNase I (10 µg/mL, Sigma-Aldrich) and RNase A (20 µg/mL, Sigma-Aldrich) were added to a purified suspension of PF-CE2 and incubated at 37°C for 1 h, to digest bacterial DNA and RNA. DNA isolation and purification of phage PF-CE2 were carried out using the E.Z.N.A® Viral DNA Kit (OMEGA). The sequencing library was generated using the Illumina TruSeq DNA Nano Sample Preparation Kit (Illumina). One microgram of DNA was sheared into 300–500 bp fragments using M220 Focused-ultrasonicator (Covaris). Following PCR amplification, the purpose straps were recovered by gel excision. A TBS-380 Micro-Fluorometer (Turner BioSystems) and PicoGreen® (ThermoFisher Scientific) were used for quantitative analysis and clusters were generated by bridging PCR amplification on cBot 2 system (Illumina). The genome of phage PF-CE2 was sequenced using the Illumina HiSeq system with a 2 × 150 bp paired-end run. To make sure the accuracy and reliability of the sequencing results, quality control of the original data was performed as follows: (1) removal of the adapter sequence from reads, (2) reads containing non-AGCT at the 5' end were removed, (3) the ends of reads whose sequencing quality was less than Q20 were removed, (4) reads whose N proportion was more than 10% were removed, and (5) fragments less than 50 bp were discarded. Following quality control, clean data were obtained; detailed information is shown in Table S1. The reads were assembled using ABySS (v2.0.2) assembly software, and GapCloser (v1.12) software was used to carry out gap filling and base correction.
The results of high-throughput sequencing showed that phage PF-CE2 was assembled at 88-fold coverage into a genome of 17 8248 bp in length, with a G + C content of 44.8% and that genes made up 95.87% of the genome. The genome sequence of phage PF-CE2 was compared with that of other phages using the standard nucleotide BLAST in the NCBI database (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Table S2 shows the basic characteristics of eight selected phages, which are similar to phage PF-CE2 in length and G + C content, including the Citrobacter phages (Margaery (unpublished) and Maroon ), Cronobacter phages (vB CsaM Cronuts (unpublished), vB CsaM GAP161 , vB CsaM leB , vB CsaM leE , and vB CsaM leN ), and Enterobacter phage vB EkoM5VN (unpublished). tRNAscan-SE 2.0 (http://lowelab.ucsc.edu/tRNAscan-SE/) was used to identify possible tRNAs in the genome, and rRNA was predicted using the RNAmmer 1.2 Server (http://www.cbs.dtu.dk/services/RNAmmer/) [12, 13]. None of them contained rRNA. tRNAMet and tRNAGly were found in the genomes of PF-CE2, Cronobacter phages vB CsaM Cronuts and vB CsaM GAP161, while the others contained only tRNAGly. Comprehensive Antibiotic Resistance Database (https://card.mcmaster.ca/) and Virulence Factor Database (http://www.mgc.ac.cn/VFs/blast/blast.html) analyses did not detected antibiotic resistance encoding genes, or virulence factors in the genomes.
A total of 275 open reading frames (ORFs) were identified using the ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/), of which 123 were located on the plus strand and the remainder on the minus strand (Table S3). Blastp (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was used to search for sequences homologous to the 275 ORFs. A circular representation of the phage PF-CE2 genome was generated using the BLAST Ring Image Generator (BRIG) . As shown in Fig. 1, the genome consisted of several clusters, including structural proteins, DNA replication and transcription proteins, nucleotide metabolism and biosynthesis proteins, lysis proteins, and DNA packaging proteins.
The above 275 ORFs were analyzed using the CAZy database (http://www.cazy.org/). The results showed that only the protein encoded by Gp190 was assigned to the GH family, which implies that Gp190 may have a function related to the degradation of exopolysaccharides. To validate this hypothesis, Gp190 was analyzed using PSI-BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) , and the results of the two iterations are shown in Table S4. In the first iteration, 33 homologous glycoside hydrolase family protein sequences were found in the “Sequence with E-values WORSE than the threshold”. In the second iteration, 45 homologous sequences associated with glycoside hydrolase family proteins were shown in “Sequence with E-values BETTER than threshold”. In addition, HHPred (https://toolkit.tuebingen.mpg.de/tools/hhpred) was also used to analyze the potential function of Gp190. The results showed that the N-terminus of Gp190 was similar to the glycoside hydrolase catalytic center of 1WTH, with a probability of 100% , and that the C-terminus was similar to the glycoside hydrolase catalytic center of 3A1M, with a probability of 94.4% . Bacteriophage-borne glycanases are associated with the tail structure of many tailed phages. Phage ФMR11 is a long-tailed phage that targets Staphylococcus aureus and protein Gp61, which has glycoside hydrolase activity, is located on the tail of this phage . Phage P22 is a short-tailed phage and its tail spike protein, Gp26, has endorhamnosidase activity, which can specifically hydrolyze the O-antigen polysaccharide of Salmonella typhimurium . K5 lyase A is also a tail spike protein and is encoded by the K5A coliphage, which is capable of degrading K5 exopolysaccharides with the repeat unit [-4)-GlcA-(1,4)-GlcNAc(1-] . Here, Gp190 was also identified on the tail of phage PF-CE2. Therefore, according to the above analysis, there is a high possibility that Gp190 of phage PF-CE2 is capable of expressing bacteriophage-borne glycanase. Except for the Cronobacter phage vB CsaM leN, the other seven phages all contained a homologous sequence to Gp190 (> 92% identity, with 100% coverage), suggesting that they may also express bacteriophage-borne glycanase. Regrettably, there have been few studies regarding the glycanases of C. sakazakii phages, so this study is beneficial for exploring which gene encodes glycanase and performing heterologous expression of this enzyme.
For further research and application of Gp190, several online tools were used to study its structure. The primary structure of Gp190 was analyzed using the ProtParam tool (https://web.expasy.org/protparam/) , which showed that Gp190 was 1761 bp, and encoded 586 amino acids. The molecular formula was C2821H4439N797O895S17, with a molecular weight of 64385.10 Da, a theoretical isoelectric point of 5.01, and a liposoluble index of 77.65. The instability index of Gp190 was predicted to be 29.77, which indicated that the protein was stable. The Phyre2 server (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) was utilized to predict the secondary structure of Gp190  and showed that Gp190 contained 14% alpha helices and 35% beta strands. ProtScale (https://web.expasy.org/protscale/) was used to predict the hydrophilicity and hydrophobicity of the Gp190 protein  and revealed that hydrophilic amino acids accounted for 65%, suggesting that Gp190 of phage PF-CE2 may be soluble. No transmembrane domain was identified in Gp190 using the TMHMM Server 2.0 (http://www.cbs.dtu.dk/services/TMHMM-2.0/). Additionally, Gp190 was determined to be non-secretory. Further, the SignalP 4.1 Server (http://www.cbs.dtu.dk/services/SignalP-4.1/), and the PrediSi (http://www.predisi.de/predisi/index.html) software were used to predict the signal peptide of Gp190, however no signal peptides were identified by either tool.
Although most proteins encoded by phage PF-CE2 were similar to the above-mentioned Cronobacter phages, there were still slight differences between them. For instance, Gp202 and Gp262 were unique genes to phage PF-CE2 (Fig. 1). Using HHPred software, Gp202 had a 96.23% possibility of encoding an HNH endonuclease, and the similarity between Gp202 and the mobile endonuclease, MobE encoded by Escherichia virus RB43, was 89%. HNH endonuclease is an important assembly machine in the life cycle of bacteriophages. Most HNH endonucleases have DNA nicking activity, and MobE is believed to promote the movement of homing endonuclease I-TevIII, by cutting the specific non-coding region of the gene encoding the small subunits of aerobic ribonucleotide reductase (nrdB), which is conducive to the inheritance of homing endonuclease I-TevIII in the offspring phages [23, 24]. The protein encoded by Gp202 may be of great significance for the reproduction, and infection of phage PF-CE2. Gp261, Gp262, and Gp263 were all related to the synthesis of phage tail fibers. The possibility of Gp262 encoding long tail fibers was 98.25% according to HHPred software analysis. Enterobacteria phage T4 (NC_00866), also belonging to the subfamily Tevenvirinae of the family Myoviridae, contains both long- and short-tail fibers in its tail, which are responsible for host cell recognition and binding to the host cell. The long tail fiber of phage T4 recognizes outer membrane protein C, or the lipopolysaccharides of host bacteria and is responsible for the initial and reversible attachment of the virion . Therefore, Gp262 may be beneficial for phage PF-CE2 to recognize and adsorb host strain.
Previous studies have used the DNA polymerase and short tail fiber protein sequences to determine the phylogenetic relationships of several phages [26, 27]. The phylogenetic trees are shown in Fig. S1 and were made using MEGA 7.0 with the neighbor-joining tree method . The tree based on the DNA polymerase, shows that phage PF-CE2 was grouped with the Cronobacter phage vB CsaM leB, suggesting a close relationship between these phages. In the short tail fiber protein tree, phage PF-CE2 was similar to the Citrobacter phage Margaery, with a bootstrap value of 98% and both grouped with a larger cluster, which included Cronobacter phages vB CsaM leE and vB CsaM leB. The Bacterial and Archaeal Viruses Subcommittee (BAVS) defined that the difference between two viruses belonging to the same species should be less than 5% at the nucleotide level. Recently, average nucleotide identity (ANI) has been used to assess the genetic relationships among species at the genome level [29–31]. ANI values of genus and species demarcation boundaries have a mean of 73.98% and 95%, respectively [30, 31]. In addition, the ANI value based on more than 50% coverage of genomes is credible . Here we used JSpeciesWS (http://jspecies.ribohost.com/jspeciesws/) to estimate the ANI values between the phage PF-CE2 genome and other phage genomes . When compared with phage genomes in 14 genera of the subfamily Tevenvirinae of the family Myoviridae, as shown in Fig. 2A and Table S5, phage PF-CE2 shared 88.65% genome ANI with Escherichia phage RB16 (Escherichia virus RB16 genus, HM134276), which suggests that phage PF-CE2 belongs to the Escherichia virus RB16 genus. As shown in Fig. 2B and Table S6, phage PF-CE2 and the eight phages mentioned above, shared < 95% genome ANI with Escherichia phage RB16, but shared > 95% genome ANI with each other. This indicates that the nine phages could be assigned to a new species under the Escherichia virus RB16 genus of the subfamily Tevenvirinae in the family Myoviridae. There, we would name this new species, Cronobacter virus PF-CE2.
In conclusion, we determined the genome sequence of phage PF-CE2, belonging to the subfamily Tevenvirinae of the family Myoviridae. This finding is beneficial for identifying the gene encoding the bacteriophage-borne glycanase of the C. sakazakii phage, PF-CE2 and for understanding the mechanism of infection of phage PF-CE2 in host bacteria containing FcEPSs, which could promote the wide application of bacteriophage-borne glycanases in the preparation of fucosylated oligosaccharides. Moreover, a new species named Cronobacter virus PF-CE2 was established and the taxonomic status of eight phages was modified according to ANI analysis, which provides an important reference for the preservation and sharing of these phages.