Bacterial strain isolation
Clostridioides difficile CD34-Sr was isolated from the hospital environment of a 600-bed clinical hospital of the Medical University of Silesia, Katowice, Poland. The strain was recovered from a bed frame in one of the patients’ rooms on the Nephrology Ward. The material was collected using a selective broth that enables germination of C. difficile spores: C diff Banana Broth (Hardy Diagnostics, Santa Maria, USA). After incubation, one loopful of broth was plated on the selective medium chromID C. difficile (bioMérieux, Marcy L'Etoile, France) and incubated for 48 hours under anaerobic conditions. Colonies with a characteristic horse odor and yellow-green fluorescence under UV light, recognized microscopically as cylindrical Gram-positive bacilli, were identified as C. difficile with the automated VITEK 2 Compact system (bioMérieux, Marcy L'Etoile, France).
Prophage induction and phage isolation
To determine whether strain CD34-Sr contained a functional prophage, we used the high-throughput mitomycin C induction method described previously [14]. In this method, inducible phage DNA in the heated lysate is confirmed by PCR with phage-specific primers targeting the holin genes of myoviruses and siphoviruses [17]. The amplified PCR product was a siphovirus holin gene. Strain CD34-Sr, which thus carries an inducible temperate phage, was therefore chosen for large-scale phage induction. Mitomycin C induction was performed on 500 ml of log-phase bacteria cultured in BHI broth (Sigma-Aldrich, USA). Following overnight incubation, the phage lysate was collected, filtered, and concentrated by PEG precipitation [18]. Electron microscopy of the concentrated phage lysate revealed only one type of phage particle. The phage fraction was then further purified on a CsCl gradient as previously described [18]. The isolated phage was named phiCDKH01, after its discoverer's initials.
Genome sequencing and annotation
Phage genomic DNA was purified using a Phage DNA Isolation Kit (Norgen Biotek Corp., Canada) following the manufacturer’s instructions. Whole-genome sequencing was performed by Genomed S.A. (Poland) on the Illumina MiSeq platform with 764-fold coverage. High-quality paired-end reads were assembled de novo using SPAdes v. 3.13.0 (https://github.com/ablab/spades). The resulting consensus sequence was annotated with myRAST v. 36 (https://rast.nmpdr.org/) [20] and deposited in GenBank under accession number MN718463.
The genomic features of phiCDKH01
The genome of phage phiCDKH01 is 45,089 bp in length with a G+C content of 28.7%, similar to that of its host C. difficile. In the initial annotation, a total of 66 ORFs were identified as probable protein-coding genes. Fifty-three ORFs were located on the positive strand and only 13 on the negative strand. No rRNA or tRNA genes were identified. Thirty-seven genes were assigned a predicted function. The complete phage genome could be divided into functional clusters that encode proteins involved in DNA packaging, head and tail morphogenesis, host cell lysis, and replication (Fig 1). We identified genes for the terminase large subunit (phiCDKH01_44), terminase small subunit (phiCDKH01_43), tail tape-measure protein (phiCDKH01_43), two tail family proteins (phiCDKH01_62/63), pre-neck appendage-like protein (phiCDKH01_65), portal protein (phiCDKH01_45), scaffolding protein (phiCDKH01_51) and capsid protein (phiCDKH01_52).
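The summary statistics reported above (genome length, G+C content, ORFs per strand) can be recomputed from a genome sequence and an annotation table. The following is a minimal sketch, not the procedure used in this study; the function names and the toy inputs are hypothetical, and in practice the sequence would be read from the GenBank record MN718463.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the G+C percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted; case is ignored.
    """
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


def strand_tally(strands) -> dict:
    """Count ORFs per strand from an iterable of '+' / '-' symbols."""
    tally = {"+": 0, "-": 0}
    for strand in strands:
        if strand not in tally:
            raise ValueError(f"unexpected strand symbol: {strand!r}")
        tally[strand] += 1
    return tally


# Toy inputs (hypothetical, for illustration only).
toy_sequence = "ATGCATTA"  # 2 of 8 bases are G or C
toy_strands = ["+"] * 53 + ["-"] * 13  # strand split reported in the text

toy_gc = gc_content(toy_sequence)
toy_tally = strand_tally(toy_strands)
```

With the strand split given in the text, the two counts add up to the 66 ORFs reported for the initial annotation.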
Additionally, we detected genes encoding proteins whose presence confirms the temperate nature of phiCDKH01, including a recombinase (phiCDKH01_31), integrase (phiCDKH01_12), antirepressors (phiCDKH01_20/24), and 5 putative transcriptional regulators (phiCDKH01_07/17/19/22/27), suggesting that the prophage could affect some bacterial functions.
We identified the gene cluster for host cell lysis containing an N-acetylmuramoyl-L-alanine amidase (phiCDKH01_06), putative holin protein (phiCDKH01_05) and ImmA/IrrE family metallo-endopeptidase (phiCDKH01_13).
We also found genes involved in DNA replication, encoding a DnaD domain protein (phiCDKH01_32), a single-stranded DNA-binding protein (phiCDKH01_33) and two putative PemI proteins (phiCDKH01_10/42). These proteins have been shown to be essential for the autonomous replication of natural low-copy-number plasmids, e.g., R100 [21].
Finally, we identified several additional genes of interest that encode proteins with diverse functions, e.g., an ADP-ribosyltransferase exoenzyme family protein (phiCDKH01_48) that might covalently modify cellular actin and thereby alter the physiology of eukaryotic cells, as the Clostridium botulinum C2 and Clostridium perfringens E iota toxins do [22]. A gene coding for a putative lipoprotein (phiCDKH01_60) might play a role in cortex modification and thus in spore germination [23]. Another is the HicB antitoxin (phiCDKH01_23), a member of a type II toxin-antitoxin system family that is found in bacteria and archaea and has been shown to be involved in the stress response, virulence and persistence [24] (Tab.S1).
Among other interesting features of the phiCDKH01 genome, a putative CRISPR (clustered regularly interspaced short palindromic repeats) and a nearby CRISPR array comprising five spacers of 35, 36 or 37 bp were identified (Fig 1, Tab.S2). Analysis of the CRISPR array revealed that none of the spacers targets known C. difficile phages. Spacers 2 (100% identity) and 5 (97.14% identity) were, however, detected in several other C. difficile genomes, whereas spacers 1, 3 and 4 did not match any known sequences (Tab.S2). Of note, no other phages were detected in the strain carrying phiCDKH01, which suggests that the phiCDKH01 CRISPR array could be active and prevent further infection by phages.
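The spacer identities above are percentages of matching positions between a spacer and its best database hit. The sketch below shows that arithmetic for an ungapped, equal-length comparison. It is an illustration only: the function name and the sequences are hypothetical, and the text does not state which search tool produced the reported values. A single mismatch over 35 bp gives 34/35, which rounds to 97.14%; that this is how the value for spacer 5 arose is an inference from the numbers, not a statement in the text.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent identity of two ungapped, equal-length sequences."""
    if len(query) != len(subject):
        raise ValueError("sequences must have the same length")
    if not query:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(query.upper(), subject.upper()))
    return 100.0 * matches / len(query)


# Hypothetical 35-nt spacer and a hit that differs at one position.
spacer = "ATTGCAATTAGGATTTACGATTAGCATTAGGATCA"
hit = "ATTGCAATTAGGATTTACGATTAGCATTAGGATCG"  # last base differs

identity_one_mismatch = round(percent_identity(spacer, hit), 2)
identity_perfect = percent_identity(spacer, spacer)
```

For comparison, a single mismatch in a 36-bp or 37-bp spacer would give 97.22% or 97.30%, respectively.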
Phylogenetic analysis
The entire genome sequence of phiCDKH01 was used in a multiple genome alignment together with the genomic sequences of 10 other Clostridioides difficile siphophages available in GenBank. The genomes were aligned with Mauve v. 2.3.1 (http://darlinglab.org/mauve/mauve.html) [25] using the progressiveMauve algorithm. The results were visualised with FigTree v. 1.4.4 (https://github.com/rambaut/figtree) (Fig.2A). The most closely related phage was phiCD24-1, which was originally isolated from a clinical isolate of PCR ribotype 078 [13, 26]. The sequences of phiCDKH01 and phiCD24-1 share 89% identity (Fig.2B).
Location of phiCDKH01 phage in the genome of C. difficile
Genomic DNA of strain CD34-Sr was isolated using the E.Z.N.A. Bacterial DNA Kit (OMEGA bio-tek, USA). Whole-genome sequencing was performed on the Illumina MiSeq platform (Genomed S.A.) with 72-fold coverage. After quality control, reads were assembled de novo into 70 contigs using SPAdes v. 3.13.0. The resulting sequences were deposited in GenBank under accession number JACSDL000000000 and subjected to automatic annotation. The sequence of phiCDKH01 was found in contig JACSDL010000003.1 at positions 288,650 to 333,698. The prophage is integrated between loci H7706_07450 and H7706_07755. H7706_07450 shares homology with a manganese catalase family protein (GenBank accession MBC6710325.1). H7706_07755 is annotated as the ilvB gene, coding for the biosynthetic-type acetolactate synthase large subunit (GenBank accession MBC6710385.1).
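The length of the contig region occupied by the prophage follows directly from the reported coordinates. The sketch below assumes 1-based, inclusive coordinates (the GenBank convention); the text does not state the convention explicitly, and the function name is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length of a closed interval given 1-based, inclusive coordinates."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


# Coordinates of phiCDKH01 in contig JACSDL010000003.1, as reported above.
prophage_span = span_length(288_650, 333_698)

# Genome length reported for the sequenced phage.
phage_genome_length = 45_089

difference = phage_genome_length - prophage_span
```

Under this assumption the span is 45,049 bp, 40 bp less than the 45,089 bp reported for the phage genome; the text does not explain the difference.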