Complete genome sequence of a clinical isolate of Clostridioides difficile bacteriophage phiCDKH01 of the family Siphoviridae

A new temperate siphophage, phiCDKH01, was obtained from a clinical isolate of Clostridioides difficile. The phage genome is a 45,089-bp linear double-stranded DNA molecule with an average G + C content of 28.7%. It shows low similarity to known phage genomes except for phiCD24-1. Genomic and phylogenetic analyses revealed that phiCDKH01 is a novel phage. A total of 66 putative ORFs were predicted in the genome, 37 of which code for proteins with predicted functions. The phiCDKH01 prophage has been localized in the host genome. The results of this study expand the known genetic diversity of tailed phages.


Introduction
Clostridioides difficile is a pathogen with great epidemic potential and a serious threat to human health [1]. In the CDC's latest report on the risk of drug resistance, C. difficile was classified as the leading cause of nosocomial infections [2]. C. difficile infection (CDI) is closely related to impaired function of the intestinal microbiome as a side effect of antibiotic therapy [3,4]. The clinical picture of CDI is complex and most often manifests as mild, moderate, or severe diarrhea. CDI can progress to life-threatening pseudomembranous colitis or toxic megacolon [5,6,7]. Currently, acute C. difficile infection is treated with antibiotics, i.e. metronidazole, vancomycin and fidaxomicin [8]. The use of antibiotics in the treatment of CDI increases the risk of exacerbating microflora dysbiosis by reducing or eliminating normal intestinal commensals. Consequently, C. difficile may colonize this niche [9]. Moreover, in CDI, antibiotic therapy promotes recurrence of the disease and increases the chance that antibiotic resistance will emerge [10].
In the last decade, interest in bacteriophages that infect pathogenic C. difficile has increased because of their possible contribution to virulence and host biology and their potential as alternative therapeutic agents [11]. All C. difficile phages described so far are temperate. In most cases they have been isolated from bacterial cells after prophage induction [12,13,14]. The described C. difficile phages belong to the families Myoviridae or Siphoviridae of the order Caudovirales, i.e. phages with contractile or noncontractile tails, respectively [12,15]. Myoviridae phages are the most numerous, and their genomes show significant DNA homology, with a tendency to form phylogenetically related clusters. In contrast, only a limited number of Siphoviridae phages have been described and sequenced, and these phages have been shown to be genetically more distant [16].
In the current study, a newly discovered phage named phiCDKH01 was isolated and characterized. The phage genome was sequenced and annotated, and phylogenetic analysis indicated that phiCDKH01 is a member of the Siphoviridae and might belong to a new phage lineage. We also determined the location of the newly discovered siphophage in the genome of its host.

Bacterial strain isolation
Clostridioides difficile CD34-Sr was isolated from the hospital environment in a 600-bed clinical hospital of the Medical University of Silesia, Katowice, Poland. The strain comes from the bed frame in one of the patients' rooms in a Nephrology Ward. The material was collected using a selective broth that enables germination of C. difficile spores: C diff Banana Broth (Hardy Diagnostics, Santa Maria, USA). After incubation, one loopful of broth was replated on the selective C. difficile medium chromID C. difficile (bioMérieux, Marcy L'Etoile, France) and incubated for 48 hours under anaerobic conditions. Colonies with a characteristic horse odor and yellow-green fluorescence under UV light, microscopically recognized as cylindrical Gram-positive bacilli, were identified as C. difficile in the automated VITEK 2 Compact system (bioMérieux, Marcy L'Etoile, France).

Prophage induction and phage isolation
To determine whether strain CD34-Sr contained a functional prophage, we used the previously described high-throughput mitomycin C induction method [14]. In this method, inducible phage DNA in the heated lysate was confirmed by PCR using specific phage primers targeting the holin genes of myoviruses and siphoviruses [17]. The results confirmed that the amplified PCR product was the holin gene of a siphovirus.
Finally, C. difficile strain CD34-Sr, which contains an inducible temperate phage, was chosen for large-scale phage induction. Mitomycin C induction was performed on 500 ml of log-phase bacteria cultured in BHI broth (Sigma-Aldrich, USA). Following overnight incubation, the phage lysate was collected, filtered, and concentrated using PEG precipitation [18]. We analyzed the concentrated phage lysate under the electron microscope and found only one type of phage particle. The phage fraction was then further purified using a CsCl gradient as previously described [18]. The isolated phage was named phiCDKH01 after its discoverer's initials.

The genomic features of phiCDKH01
The genome of phage phiCDKH01 is 45,089 bp in length with a G+C content of 28.7%, similar to that of its host C. difficile. In the initial annotation, a total of 66 ORFs were identified as probable protein-coding genes. Of these, 53 were located on the positive strand, while only 13 were located on the negative strand. No rRNA or tRNA genes were identified. Thirty-seven genes were assigned a predicted function. The complete phage genome could be divided into functional clusters that encode proteins involved in DNA packaging, head and tail morphogenesis, host cell lysis, and replication (Fig 1). We identified genes for the terminase large subunit (phiCDKH01_44), terminase small subunit (phiCDKH01_43), tail tape-measure protein (phiCDKH01_43), two tail family proteins (phiCDKH01_62/63), pre-neck appendage-like protein (phiCDKH01_65), portal protein (phiCDKH01_45), scaffolding protein (phiCDKH01_51) and capsid protein (phiCDKH01_52).
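The G+C figure quoted above is a simple base-composition ratio. As a minimal sketch of that calculation (not the authors' pipeline, which the text does not describe), the following Python function computes the percentage of G and C among unambiguous bases. The function name `gc_content` and the toy sequence are illustrative assumptions; the phiCDKH01 sequence itself is not reproduced here.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted, so N or gap
    characters do not skew the result.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Hypothetical 20-nt fragment for illustration only (not from phiCDKH01).
toy = "ATTAGCATTTAAGCTTATAT"
print(f"{gc_content(toy):.1f}%")  # 4 of 20 bases are G or C
```

Applied to the full 45,089-bp genome sequence, the same ratio would be expected to give a value close to the 28.7% reported in the text.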
We also found genes involved in DNA replication, encoding a DnaD domain protein (phiCDKH01_32), a single-stranded DNA-binding protein (phiCDKH01_33) and two putative PemI proteins (phiCDKH01_10/42). These proteins have been shown to be essential for the autonomous replication of natural low-copy-number plasmids such as R100 [21].
Finally, we identified several additional genes of interest that encode proteins with diverse functions, e.g., an ADP-ribosyltransferase exoenzyme family protein (phiCDKH01_48) that might covalently modify cellular actin and thereby alter the physiology of eukaryotic cells, as the Clostridium botulinum C2 and Clostridium perfringens type E iota toxins do [22]. A gene coding for a putative lipoprotein (phiCDKH01_60) might play a role in cortex modification and thus in spore germination [23]. Another is the HicB antitoxin (phiCDKH01_23), a member of a type II toxin-antitoxin system family found in bacteria and archaea that has been shown to be involved in the stress response, virulence and persistence [24] (Table S1).
Among other interesting features of the phiCDKH01 genome, a putative CRISPR (clustered regularly interspaced short palindromic repeats) and a nearby CRISPR array comprising 5 spacers of 35, 36 or 37 bp were identified (Fig 1, Table S2). Analysis of the CRISPR array revealed that none of the spacers targets known C. difficile phages. Spacers 2 (100% identity) and 5 (97.14% identity) were, however, detected in several other C. difficile genomes, whereas spacers 1, 3 and 4 did not match any known sequences (Table S2). Of note, no other phages were detected in the strain carrying phiCDKH01, which supports the possibility that the phiCDKH01 CRISPR array is active and prevents further infection by phages.
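The identity values quoted for the spacers can be read as matches divided by alignment length, which is how BLASTN-style percent identity is usually defined. The text does not state which spacer length or alignment length applies to spacer 5, so the sketch below only tabulates what a single mismatch would give for each of the three reported spacer lengths, assuming a full-length alignment. The helper name `percent_identity` is hypothetical.

```python
def percent_identity(matches: int, aligned_length: int) -> float:
    """Percent identity as identical positions over alignment length."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    return 100.0 * matches / aligned_length


# Spacer lengths reported in the text; one mismatch is an assumption
# made here purely for illustration.
for length in (35, 36, 37):
    identity = percent_identity(length - 1, length)
    print(f"{length} bp, 1 mismatch: {identity:.2f}%")
```

Under these assumptions, 97.14% corresponds to 34 matching positions in a 35-bp alignment. This is an inference from the arithmetic, not a statement made in the source.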

Location of phiCDKH01 phage in the genome of C. difficile
Bacterial genomic DNA of strain CD34-Sr was isolated using the E.Z.N.A. Bacterial DNA Kit (OMEGA bio-tek, USA). Whole-genome sequencing was performed on the Illumina MiSeq platform (Genomed S.A.) with 72-fold coverage. After quality checking, reads were assembled de novo into 70 contigs using SPAdes v. 3.13.0. The resulting sequences were deposited in GenBank under accession number JACSDL000000000 and subjected to automatic annotation. The sequence of phiCDKH01 was found in contig JACSDL010000003.1 at positions 288,650 to 333,698. The prophage is integrated between loci H7706_07450 and H7706_07755. H7706_07450 shares homology with the manganese catalase family protein (GenBank accession MBC6710325.1). H7706_07755 is annotated as the ilvB gene, coding for the biosynthetic-type acetolactate synthase large subunit (GenBank accession MBC6710385.1).
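The contig coordinates given above can be turned into a span length with one line of arithmetic. The sketch below assumes 1-based, inclusive coordinates, as is conventional for GenBank feature locations; the text does not state the convention, and the function and constant names are hypothetical.

```python
def inclusive_span(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate interval."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


# Coordinates and genome length as reported in the text.
PROPHAGE_START = 288_650
PROPHAGE_END = 333_698
REPORTED_GENOME_LENGTH = 45_089

span = inclusive_span(PROPHAGE_START, PROPHAGE_END)
print(span)                           # 45049
print(REPORTED_GENOME_LENGTH - span)  # 40
```

Under this assumption the interval covers 45,049 bp, which is 40 bp shorter than the 45,089 bp reported for the phage genome. The source does not comment on this difference, and the sketch makes no claim about its cause.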

Declarations
Funding The research was supported by the National Science Centre of Poland MINIATURA Programme (2018/02/X/NZ6/01360).
Conflict of interests The authors declare that they have no conflict of interests.
Ethical approval This article does not contain any study performed with human participants or animals.

Figure legend: Predicted ORFs and direction of transcription are indicated by block arrows. The blue box represents a putative CRISPR. Conserved regions are shaded in grey. Colour intensity corresponds to identity level (89% to 100%). Genomic comparisons were performed with BLASTN. Similarities with E values lower than 1e-100 are plotted. The figure was produced with Easyfig 2.2.5 [27].
