3.1. General features of the C. striatum genomes
C. striatum genomes had a predicted size ranging from 2.61 Mb to 3.13 Mb, with slight variation in G+C content between genomes (range: 59.2% to 59.8%). The total number of predicted coding sequences (CDSs) ranged from 2,089 to 2,924. A phylogenetic network analysis based on seven housekeeping genes indicated two well-defined groups of C. striatum strains separated by geographical origin (Fig. 1a). The presence of different MDR C. striatum clones, identified by cgMLST analysis (Fig. 1b), corroborates findings from previous studies based on pulsed-field gel electrophoresis (PFGE) (Ramos et al. 2019). Interestingly, while strains with very similar phenotypic antimicrobial susceptibility profiles formed clonal complexes (strains 2308 and 2023) (Fig. 1b), strains carrying markedly different sets of AMR genes also formed a clonal group (strains LK37 and 797_CAUR) (Fig. 1b). This finding underscores the potential role of horizontal gene transfer as a major force driving the acquisition of antimicrobial resistance genes by C. striatum in the hospital environment.
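Clonal groups such as those in Fig. 1b are derived from cgMLST allelic profiles. The sketch below is a minimal illustration, not the pipeline used in this study: it assumes profiles are already available as locus-to-allele mappings, skips loci missing in either strain, and joins strains by single linkage when they differ at no more than a chosen number of loci. The strain names, loci and threshold are hypothetical; the threshold behind Fig. 1b is not stated in this section.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set

# A cgMLST profile maps each core locus to an allele number; None marks a missing call.
Profile = Dict[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Number of shared loci, called in both strains, whose alleles differ."""
    return sum(
        1
        for locus in a.keys() & b.keys()
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


def clonal_groups(profiles: Dict[str, Profile], threshold: int) -> List[Set[str]]:
    """Single-linkage grouping: strains within `threshold` allele differences are joined."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        # Follow parent links to the group representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s1, s2 in combinations(profiles, 2):
        if allelic_distance(profiles[s1], profiles[s2]) <= threshold:
            parent[find(s1)] = find(s2)

    groups: Dict[str, Set[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical three-locus profiles for three hypothetical strains.
profiles = {
    "strainA": {"locus1": 1, "locus2": 2, "locus3": 3},
    "strainB": {"locus1": 1, "locus2": 2, "locus3": 4},
    "strainC": {"locus1": 5, "locus2": 6, "locus3": None},
}
groups = clonal_groups(profiles, threshold=1)
```

Single linkage lets two strains share a group through intermediates even when they differ by more than the threshold; that chaining is worth keeping in mind when interpreting clonal complexes.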
The single strain from a non-human host, Kc-Na-01, showed the greatest genetic divergence from all other isolates studied (Fig. 1a). We therefore analyzed this genome using average nucleotide identity based on BLAST (ANIb) and found that it shares more than 94.5% identity with all other genomic sequences included in the study (Supplementary Fig. S1). Although a cutoff of ≥ 95% ANIb is generally regarded as the standard for species circumscription, some studies have shown that an ANIb value of ca. 94% corresponds to ca. 70% DNA-DNA hybridization (DDH) (Konstantinidis and Tiedje 2005; Rodriguez-R and Konstantinidis 2014; Qin et al. 2014). This correspondence would permit classifying strain Kc-Na-01 as a member of the species C. striatum.
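To make the ANIb comparison concrete: in the widely used ANIb procedure, the query genome is cut into 1,020-nt fragments, each fragment is searched with BLASTN against the other genome, and ANI is the mean identity of fragments whose best hit has more than 30% identity over at least 70% of the fragment length. The sketch below assumes the BLASTN step has already been run and starts from one best hit per fragment; the hit values are invented for illustration, and the 95% and ca. 94% cutoffs come from the text above.

```python
from typing import Iterable, NamedTuple, Optional


class FragmentHit(NamedTuple):
    """Best BLASTN hit of one query-genome fragment against the reference genome."""
    identity: float       # percent identity of the hit (0-100)
    aligned_length: int   # aligned nucleotides in the hit
    fragment_length: int  # fragment size, usually 1,020 nt for ANIb


def anib(hits: Iterable[FragmentHit], min_identity: float = 30.0,
         min_coverage: float = 0.70) -> Optional[float]:
    """Mean percent identity over fragments that pass both filters (None if none pass)."""
    kept = [
        h.identity
        for h in hits
        if h.identity > min_identity and h.aligned_length / h.fragment_length >= min_coverage
    ]
    return sum(kept) / len(kept) if kept else None


def within_species(ani: float, cutoff: float = 95.0) -> bool:
    """Apply a species-level ANI cutoff (95% by convention; ca. 94% per the DDH argument)."""
    return ani >= cutoff


hits = [
    FragmentHit(95.0, 1020, 1020),  # kept
    FragmentHit(94.0, 900, 1020),   # kept: about 88% of the fragment aligned
    FragmentHit(80.0, 500, 1020),   # dropped: under 70% of the fragment aligned
    FragmentHit(25.0, 1020, 1020),  # dropped: identity not above 30%
]
ani = anib(hits)  # mean of 95.0 and 94.0
```

ANIb is usually computed in both directions (genome A against B and B against A), and the two values can differ slightly, so a full analysis would report both.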
3.2. Pan-genomic analysis of C. striatum
The species C. striatum has an open pan-genome (α = 0.852803) (Fig. 2a) comprising 3,816 gene families, of which 33.99% (1,297) belong to the core genome and 34.25% (1,307) to the accessory genome. Unique genes represent 31.76% of the predicted gene families (1,212) (Fig. 2b). Strains Kc-Na-01, from a non-human host, and BM4687, a clinical isolate from France, account for 42.8% of the species' total number of unique genes (260 and 326 unique genes, respectively). For strain Kc-Na-01, the high number of unique genes may be partly explained by sequences derived from two different plasmids that contribute to the complete gene set of this isolate. Notably, a new aminoglycoside 3-N-acetyltransferase of chromosomal origin, AAC(3)-XI, was recently discovered through analysis of the genome of isolate BM4687 (Galimand et al. 2015).
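The openness criterion behind α can be illustrated with the power-law (Heaps' law) model commonly used for pan-genomes, in which the number of new gene families contributed by the N-th added genome, averaged over random genome orderings, follows n = κN^(-α), with α ≤ 1 indicating an open pan-genome and α > 1 a closed one. The sketch fits α by least squares on log-transformed values. The input curve is synthetic, generated with a hypothetical κ = 500 and α = 0.85, so the fit simply recovers those values; it is not the study's data or fitting software. The last lines recompute the partition percentages reported above.

```python
import math
from typing import Sequence, Tuple


def fit_heaps(n_genomes: Sequence[int], new_families: Sequence[float]) -> Tuple[float, float]:
    """Fit new = kappa * N**(-alpha) by linear least squares on the log-log scale.

    Returns (kappa, alpha).
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(g) for g in new_families]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope


def is_open(alpha: float) -> bool:
    """Under this model, alpha <= 1 means new families keep appearing (open pan-genome)."""
    return alpha <= 1.0


# Synthetic new-family curve for genomes 2..30 (hypothetical kappa=500, alpha=0.85).
ns = list(range(2, 31))
kappa, alpha = fit_heaps(ns, [500 * n ** -0.85 for n in ns])

# Partition sizes reported in the text: 3,816 gene families in total.
partition = {"core": 1297, "accessory": 1307, "unique": 1212}
total = sum(partition.values())
shares = {k: round(100 * v / total, 2) for k, v in partition.items()}
```

With the reported α = 0.852803, `is_open` returns True, consistent with the open pan-genome described above.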
The gene families in the C. striatum pan-genome are mainly distributed among the following COG functional categories: Metabolism (36.9%); Information storage and processing (35.3%); Cellular processes and signaling (8.2%); and Poorly characterized (19.4%) (Fig. 3a). In the core genome, the most prevalent functional categories were Translation, ribosomal structure and biogenesis (9.6%); Amino acid transport and metabolism (9.2%); and Transcription (9.1%). Accessory genes are mainly classified into the categories Amino acid transport and metabolism (10.1%); Replication, recombination and repair (8.6%); and Transcription (8%). Finally, unique genes predominate in the categories Replication, recombination and repair (19.7%); Transcription (10.9%); and Defense mechanisms (10.4%) (Fig. 3b). A high number of genes related to transcriptional regulation was a prominent feature of all three pan-genome subsets.
3.3. Prediction of antimicrobial resistance genes (AMRs)
Through automated prediction, we identified 15 antimicrobial resistance genes (AMRs) in the C. striatum genomes (Fig. 4a). The only one identified in all strains was rpsL, which encodes the 30S ribosomal protein S12 and presented mutations similar to those described in streptomycin-resistant M. tuberculosis (Sreevatsan et al. 1996). However, strains ATCC6940 and 1961 have a streptomycin-susceptible phenotype, whereas all other strains isolated in Brazil are phenotypically resistant to streptomycin (Fig. 4b).
The AMR genes aph(3')-Ia, aph(3'')-Ib and aph(6)-Id encode aminoglycoside phosphotransferases (Wright and Thompson 1999) and were found in 40.7%, 37.0% and 37.0% of the C. striatum genomes, respectively (Fig. 4a). These enzymes catalyze the phosphorylation of various aminoglycosides. The aac(3)-XI gene, found in 22.0% of the strains, encodes aminoglycoside 3-N-acetyltransferase type XI, initially described in C. striatum (Galimand et al. 2015).
The cmx gene, which encodes a major facilitator superfamily (MFS) transporter that mediates chloramphenicol efflux (Tauch et al. 1998), was present in 44.0% of the strains (Fig. 4a). As expected from previous studies, all strains carrying this gene were phenotypically non-susceptible to chloramphenicol (Fig. 4b).
The ermX gene, found in 74.0% of the strains, encodes an rRNA methyltransferase that modifies the 23S rRNA binding site, preventing effective binding of macrolides, lincosamides, and streptogramins (Roberts et al. 1999).
The sul1 gene was identified in 18.5% of the strains; it belongs to a gene family encoding alternative dihydropteroate synthases with reduced affinity for sulfonamides (Changkaew et al. 2014).
Although the tetA and tetW genes were found in 44.4% and 40.7% of the C. striatum strains, respectively (Fig. 4a), tetracycline resistance did not correlate well with the presence of these genes in the genomic sequences (Fig. 4b). These genes confer tetracycline resistance through different mechanisms: tetA encodes an MFS transport protein, whereas tetW encodes a ribosomal protection protein (Roberts 2005).
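The contrast between cmx, whose carriers were all non-susceptible, and tetA/tetW, whose presence tracked the tetracycline phenotype poorly, can be quantified by cross-tabulating gene presence against phenotype. The sketch below is one simple way to do this and is not the analysis behind Fig. 4b; the strain labels are hypothetical, and the choice of metrics (overall agreement, positive predictive value, sensitivity) is an assumption of the sketch.

```python
from typing import Dict, Set


def genotype_phenotype_agreement(
    strains: Set[str], carriers: Set[str], non_susceptible: Set[str]
) -> Dict[str, float]:
    """Cross-tabulate gene presence against phenotypic non-susceptibility.

    `carriers` and `non_susceptible` are assumed to be subsets of `strains`.
    """
    tp = len(carriers & non_susceptible)            # gene present, non-susceptible
    fp = len(carriers - non_susceptible)            # gene present, susceptible
    fn = len(non_susceptible - carriers)            # gene absent, non-susceptible
    tn = len(strains - carriers - non_susceptible)  # gene absent, susceptible
    return {
        "agreement": (tp + tn) / len(strains),
        # Share of gene carriers that are non-susceptible (1.0 would match the cmx pattern).
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        # Share of non-susceptible strains that carry the gene.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
    }


# Hypothetical panel of six strains.
strains = {"S1", "S2", "S3", "S4", "S5", "S6"}
result = genotype_phenotype_agreement(
    strains, carriers={"S1", "S2", "S3"}, non_susceptible={"S1", "S2", "S4"}
)
```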
Four AMR genes in the C. striatum pan-genome were identified as unique genes: tetB, aac(6')-Ib7, aadA and qacE. The tetB gene was found only in strain 2023, which is phenotypically susceptible to tetracycline despite carrying both tetA and tetB, genes encoding tetracycline efflux pumps (Fig. 4b). Additionally, mutations in the rpoB gene similar to those found in rifampicin-resistant strains of Mycobacterium tuberculosis were found in strain 2023, which is phenotypically resistant to rifampicin (Ramos et al. 2019).
The other three unique genes (aac(6')-Ib7, aadA and qacE) were found only in the LK37 lineage (Fig. 4a). The aac(6')-Ib7 gene encodes an aminoglycoside acetyltransferase (Roberts et al. 1999) and aadA an aminoglycoside nucleotidyltransferase (Clark et al. 1999), both responsible for aminoglycoside resistance, while qacE encodes a proton-dependent efflux pump for monovalent cationic antiseptics such as quaternary ammonium compounds (Paulsen et al. 1996).
3.4. Prediction of virulence factors
We identified 32 genes potentially related to virulence in C. striatum: 19 (59.4%) present in all strains, 11 (34.4%) present in at least two strains but not in all, and 2 (6.3%) present in only one strain (Fig. 5). These virulence factors are distributed across 10 functional categories (Fig. 5). In the 'iron uptake' category, three operons were present in all C. striatum strains: the fagABCD operon, also present in Corynebacterium pseudotuberculosis strains (Billington et al. 2002; Dorella et al. 2006), and the hmuTUV and irp6ABC operons, also found in C. diphtheriae (Allen and Schmitt 2009; Schmitt 2014). Additionally, the irtAB operon is present in 17 C. striatum lineages (63%), and the mbtH and fxbA genes in 12 lineages (44.4%); these genes have been widely described in bacteria of the genus Mycobacterium (Dussurget et al. 1999; Timms et al. 2015).
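The three prevalence classes used above (present in all strains; in at least two strains but not all; in a single strain) can be assigned mechanically from a gene presence/absence table. A minimal sketch, assuming presence has already been called per strain; the strain names and the gene-to-strain assignments in the example are hypothetical.

```python
from typing import Dict, List, Set


def prevalence_classes(presence: Dict[str, Set[str]], n_strains: int) -> Dict[str, List[str]]:
    """Assign each gene to 'core' (all strains), 'accessory' (>= 2 but not all) or 'unique' (1)."""
    classes: Dict[str, List[str]] = {"core": [], "accessory": [], "unique": []}
    for gene, carriers in presence.items():
        k = len(carriers)
        if k == n_strains:
            classes["core"].append(gene)
        elif k >= 2:
            classes["accessory"].append(gene)
        elif k == 1:
            classes["unique"].append(gene)
        # Genes with no carriers are left out of every class.
    return classes


# Hypothetical presence calls for three strains.
presence = {
    "dtxR": {"strainA", "strainB", "strainC"},
    "spaD": {"strainA", "strainB"},
    "wecB": {"strainC"},
}
classes = prevalence_classes(presence, n_strains=3)
```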
Four genes encoding transcriptional regulators of potential virulence genes were also found in all C. striatum genomes (Fig. 5). The iron-activated regulator gene dtxR is involved in regulating genes related to iron homeostasis, such as those of the fagABCD, hmuTUV and irp6ABC operons in Corynebacterium spp. (Qian et al. 2002; Trost et al. 2010). This gene may also be involved in regulating irtA, irtB, mbtH and fxbA, as demonstrated by Manabe et al. (2005). The senX3, sigA and sigD genes have already been shown to play essential roles in the virulence and persistence of Mycobacterium spp. (Gomez et al. 1998; Raman et al. 2004; Singh and Kumar 2015).
The pafA and mpa genes, present in all strains, are part of the Pup-proteasome system of Actinobacteria and contribute to the persistence of M. tuberculosis in the host (Darwin 2009). SpaDEF pili are present in 21 strains, together with the sortase genes srtB and srtC required for pilus assembly (Gaspar and Ton-That 2006). These surface-displayed protein structures participate in biofilm formation, DNA translocation, and interactions with other bacteria, and also act as phage receptors, contributing to pathogenesis (Mandlik et al. 2008; Proft and Baker 2009; Kline et al. 2010).
The SecA2 secretion system was found in all C. striatum strains (Fig. 5). In M. tuberculosis, this system has been shown to export multiple effectors that interfere with phagosome maturation and promote intracellular replication (Zulauf et al. 2018).
Orthologs of the genes wecB and wecC, encoding UDP-N-acetylglucosamine 2-epimerase and UDP-N-acetyl-D-mannosamine dehydrogenase, respectively, were found only in strain Kc-Na-01 (Fig. 5). Although these enzymes are typically found in Gram-negative bacteria, where they participate in lipopolysaccharide synthesis (Rai and Mitchell 2020), their orthologs mnaA and mnaB have been described in Staphylococcus spp., where they are involved in cell envelope biosynthesis.
3.5. Genomic islands (GI), prophages, insertion sequences (IS), and CRISPR loci
Eighty-four of the 129 AMR genes distributed across all strains were found within genomic islands (Fig. 6). In sixteen strains, ermX was adjacent to gcrA and gcrB, as shown in the genomic context of strain 2023 (Fig. 7a). Seven AMR genes, including ermX, tetA and tetB, were found within GI18 (Fig. 6; Fig. 7b). The genes aph(3'')-Ib, aph(6)-Id and cmx are all present in GI8 (Fig. 6). These genes, together with other genes found in the regions flanking the AMR genes, were also identified in plasmid pTP10, which is part of the C. striatum M82B genome (Tauch et al. 2000), and in the genome of C. striatum strain 2308 (Ramos et al. 2018).
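Counting AMR genes inside genomic islands, as in the 84-of-129 figure above, reduces to an interval-containment test on gene and island coordinates. A minimal sketch, assuming 1-based inclusive coordinates and counting a gene only if it lies entirely within an island on the same contig; counting partial overlaps instead would be an equally defensible choice. All coordinates below are hypothetical.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def inside_island(gene: Interval, islands: List[Interval]) -> bool:
    """True if the gene lies entirely within at least one island on the same contig."""
    return any(
        gene.contig == gi.contig and gi.start <= gene.start and gene.end <= gi.end
        for gi in islands
    )


def fraction_in_islands(genes: List[Interval], islands: List[Interval]) -> float:
    """Share of genes fully contained in predicted genomic islands."""
    return sum(inside_island(g, islands) for g in genes) / len(genes)


islands = [Interval("contig1", 1000, 5000)]
amr_genes = [
    Interval("contig1", 1200, 2000),  # fully inside the island
    Interval("contig1", 4800, 5200),  # crosses the island boundary
    Interval("contig2", 1200, 2000),  # different contig
]
share = fraction_in_islands(amr_genes, islands)
```

For the counts reported in the text, 84/129 ≈ 0.651, i.e. about 65% of AMR gene occurrences lie within GIs.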
Seventy-four bacteriophage signatures were detected in the studied genomes (Supplementary Table S2). However, only 16 phage sequences were intact, the most prevalent being PHAGE_Rhodoc_REQ3_NC_016654 and PHAGE_Staphy_SPbeta_like_NC_029119. Notably, four AMR genes were found within a phage context in strain LK37 (Fig. 7c), located in GI5 (Fig. 6).
Integrons are versatile genetic elements that can insert, excise, rearrange and express genes through a site-specific recombination system, and they can act as vehicles for intra- and inter-specific transmission of genetic material (El Sayed Zaki et al. 2022). Integrons are classified according to the type of integrase gene they carry. Class 1 integrons are the most frequently observed in clinical isolates, mainly of Gram-negative bacteria (Racewicz et al. 2022). In Pseudomonas aeruginosa, the presence of class 1 integrons is associated with the emergence of the MDR phenotype (El Sayed Zaki et al. 2022). However, studies on integrons in Gram-positive bacteria, especially in Corynebacterium species, are scarce. Class 1 integrons have been found in some Corynebacterium clinical isolates, such as Corynebacterium diphtheriae biovar mitis (Barraud et al. 2011), Corynebacterium resistens (Schröder et al. 2012) and Corynebacterium urealyticum (Rocha et al. 2020). To date, no studies have described class 1 integrons in C. striatum (Leyton et al. 2021). In our study, we found a class 1 integron in strain LK37 carrying the genes sul1, qacE, aadA and aac(6')-Ib7 (Fig. 7c), whereas strains 2130, 2296, 2425 and 3012STDY7069329 carry only the sul1 gene (Supplementary Table S3).
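Because the integron class is defined by the integrase gene, a first-pass classification of an annotated locus can key on the integrase name. The sketch assumes loci are already annotated with gene names following the usual intI1/intI2/intI3 convention for class 1 to 3 integrases; the gene order in the example is illustrative only and is not taken from Fig. 7c. Real integron detection also relies on the attI and attC recombination sites, which this toy lookup ignores.

```python
from typing import Dict, List, Optional, Tuple

# Integrase gene name -> integron class.
INTEGRASE_CLASS: Dict[str, int] = {"intI1": 1, "intI2": 2, "intI3": 3}


def classify_integron(locus_genes: List[str]) -> Tuple[Optional[int], List[str]]:
    """Return (class, remaining genes in the locus); (None, []) if no known integrase is present."""
    for gene in locus_genes:
        if gene in INTEGRASE_CLASS:
            cargo = [g for g in locus_genes if g != gene]
            return INTEGRASE_CLASS[gene], cargo
    return None, []


# Illustrative locus with the LK37 gene content described above (order is hypothetical).
lk37_like = ["intI1", "aac(6')-Ib7", "aadA", "qacE", "sul1"]
cls, cargo = classify_integron(lk37_like)
```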
We also evaluated the insertion sequences (IS) located in the same genomic context as the AMR genes. The main IS families found in these regions were IS3, IS481, IS256, ISL3 and IS6. Interestingly, ISCre1, of the IS256 family, is associated with the aac(3)-XI gene in 7 C. striatum isolates. Additionally, ISCx1, of the ISL3 family, was found in association with erm(X), tet(W) and aac(3)-XI. IS1249 was also found in the genomic context of erm(X), and IS5564 was located near aph(3'')-Ib and aph(6)-Id. Both IS1249 and ISCx1 are part of transposon Tn5432, which has been identified in C. striatum genomic sequences in recent studies (Wang et al. 2019; Leyton et al. 2021; Leyton-Carcaman and Abanto 2022) (Fig. 7c).
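Statements such as "ISCre1 is associated with aac(3)-XI in 7 isolates" amount to counting, per isolate, which IS elements fall within some distance of each AMR gene. A minimal sketch, assuming feature coordinates are available per genome; the 5 kb window and the example coordinates are hypothetical, since the neighbourhood size used in this study is not stated here. Each isolate contributes at most once per IS/AMR pair.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class Feature(NamedTuple):
    name: str
    contig: str
    start: int
    end: int


def gap(a: Feature, b: Feature) -> int:
    """Base pairs separating two features on the same contig (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def is_amr_associations(
    genomes: Dict[str, Tuple[List[Feature], List[Feature]]], window: int
) -> Counter:
    """Count isolates in which each (IS, AMR gene) pair lies within `window` bp."""
    counts: Counter = Counter()
    for amr_genes, insertion_seqs in genomes.values():
        pairs = {
            (ins.name, amr.name)
            for amr in amr_genes
            for ins in insertion_seqs
            if amr.contig == ins.contig and gap(amr, ins) <= window
        }
        counts.update(pairs)  # a set, so each isolate counts once per pair
    return counts


genomes = {
    "isolate1": (
        [Feature("aac(3)-XI", "c1", 1000, 1500)],
        [Feature("ISCre1", "c1", 1600, 2900), Feature("ISCx1", "c1", 20000, 21000)],
    ),
    "isolate2": (
        [Feature("aac(3)-XI", "c1", 500, 1000)],
        [Feature("ISCre1", "c1", 100, 400)],
    ),
}
associations = is_amr_associations(genomes, window=5000)
```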
We also evaluated the genomic context of virulence factors and found that the spaDEF, fagABCD, hmuTUV and irtAB operons, as well as the fxbA gene, can also be located in GIs. Twenty-one C. striatum isolates carry mobile genetic elements in the same genomic context as the genes encoding SpaD-like pili (Fig. 8). This region also lies within a predicted genomic island, GI17 (Fig. 6). Notably, five strains present frameshifts in at least one of the pilus-encoding genes (Fig. 8b), suggesting an inability to assemble pili or a dependence on the sortases SrtB and SrtC (Gaspar and Ton-That 2006). The following strains lacked the pilus assembly machinery in this genomic context: Kc-Na-01, 1961, 962_CAUR, 963_CAUR, 1329_CAUR and LK37 (Fig. 8c).
Great variability in CRISPR-associated loci was observed in C. striatum (Supplementary Table S4). The studied genomes had an average of 52 detected spacer sequences, with a wide range from 5 to 117. The identified repeat sequences were more consistently distributed across the genomes (Supplementary Table S4). Some genomes had two clusters of genes encoding CRISPR-associated (Cas) proteins, but the most prevalent organization was the type I-E CRISPR system (Supplementary Table S4). A recent study suggests the existence of an alternative, as-yet-unidentified CRISPR system organization in this species, termed type I-E' (Ramos et al. 2022).
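For readers less familiar with CRISPR arrays: an array alternates a conserved direct repeat with variable spacers, so spacers can be read off as the sequences between consecutive repeat copies. The toy sketch below assumes exact repeat copies, which real arrays often violate (dedicated CRISPR detection tools tolerate degenerate repeats), and then summarizes per-genome spacer counts the way the mean and range above are reported. The sequences and counts are invented for illustration.

```python
from statistics import mean
from typing import Dict, List


def split_spacers(array_seq: str, repeat: str) -> List[str]:
    """Return the sequences between consecutive exact copies of the direct repeat."""
    parts = array_seq.split(repeat)
    # parts[0] and parts[-1] are flanking sequence outside the first and last repeat.
    return [p for p in parts[1:-1] if p]


def spacer_summary(counts: Dict[str, int]) -> Dict[str, float]:
    """Mean, minimum and maximum number of spacers across genomes."""
    values = list(counts.values())
    return {"mean": mean(values), "min": min(values), "max": max(values)}


repeat = "GGATCC"  # toy repeat, much shorter than real CRISPR repeats
array_seq = "AC" + repeat + "TTTT" + repeat + "CCCC" + repeat + "AA"
spacers = split_spacers(array_seq, repeat)
summary = spacer_summary({"genome1": 10, "genome2": 20, "genome3": 60})
```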