The pan-genome of the emerging multidrug-resistant pathogen Corynebacterium striatum

Corynebacterium striatum, a common constituent of the human skin microbiome, is now considered an emerging multidrug-resistant pathogen of immunocompromised and chronically ill patients. However, little is known about the molecular mechanisms in the transition from colonization to the multidrug-resistant (MDR) invasive phenotype in clinical isolates. This study performed a comprehensive pan-genomic analysis of C. striatum, including isolates from “normal skin microbiome” and from MDR infections, to gain insights into genetic factors contributing to pathogenicity and multidrug resistance in this species. For this, three novel genome sequences were obtained from clinical isolates of C. striatum of patients from Brazil, and other 24 complete or draft C. striatum genomes were retrieved from GenBank, including the ATCC6940 isolate from the Human Microbiome Project. Analysis of C. striatum strains demonstrated the presence of an open pan-genome (α = 0.852803) containing 3816 gene families, including 15 antimicrobial resistance (AMR) genes and 32 putative virulence factors. The core and accessory genomes included 1297 and 1307 genes, respectively. The identified AMR genes are primarily associated with resistance to aminoglycosides and tetracyclines. Of these, 66.6% are present in genomic islands, and four AMR genes, including aac(6')-ib7, are located in a class 1-integron. In conclusion, our data indicated that C. striatum possesses genomic characteristics favorable to the invasive phenotype, with high genomic plasticity, a robust genetic arsenal for iron acquisition, and important virulence determinants and AMR genes present in mobile genetic elements.


Introduction
Corynebacterium striatum is generally a common inhabitant of the human skin and upper respiratory tract. Still, it is also considered an emerging nosocomial pathogen in humans, affecting patients in opportunistic circumstances due to chronic diseases, immunosuppression, invasive medical procedures, previous antibiotic therapies and use of prosthetic devices (Khan et al. 2021). A growing number of reports have demonstrated the relevance of C. striatum in the etiology of a variety of infectious processes, in both immunocompromised and immunocompetent patients: endocarditis (Mansour et al. 2020;Rasmussen et al. 2020;Chang and Chen 2020;Biscarini et al. 2021;Bläckberg et al. 2021), meningitis (Zhang et al. 2020), and chronic septic arthritis (Hollnagel et al. 2020).
Only a few studies have investigated the virulence mechanisms of C. striatum. Some demonstrated the relevance of biofilm formation during bloodstream infections, contributing to host invasive diseases. C. striatum strains were reported as etiologic agents of serious nosocomial invasive bloodstream infections of inpatients submitted to invasive medical procedures including endotracheal intubation and catheterization (Souza et al. 2015;Kang et al. 2018;Ramos et al. 2018); adherence properties to several medical devices made of different abiotic and biotic (hydrophilic and hydrophobic) materials (Alibi et al. 2021;Souza 2021); and increased resistance of sessile and planktonic forms to some biocides, antiseptics, and varied antibiotic classes (Souza et al. 2020). Furthermore, other virulence factors have also been investigated in the Caenorhabditis elegans-based virulence assay .
Recent studies have investigated the genetic composition of the species C. striatum and demonstrated that clonal multidrug-resistant isolates can be identified as the causative agents of nosocomial outbreaks (Pereira Baio et al. 2013;Nudel et al. 2018;Wang et al. 2019Wang et al. , 2021 Genetically diverse isolates carrying similar mobile genetic elements conferring multiple resistance to antimicrobials have already been described, then confirming the role of horizontal gene transfer in the rapid acquisition of the MDR phenotype in the species (Navas et al. 2016;Ramos et al. 2018).
In this study, we performed a pan-genomic analysis of the C. striatum species, including comparative analyzes of the genomes of three clinical isolates newly sequenced by our research consortium associated to additional twenty-four genomes retrieved from public databases. We observed that C. striatum has an open pan-genome, with extensive horizontal gene transfer activity probably being the primary driver of the rapid acquisition of resistance to multiple antimicrobial agents in the species.

Whole-genome sequencing and analysis
The QIAamp DNA mini kit (Qiagen) was used to extract total genomic DNA from three C. striatum clinical isolates (2023,2230,2247) (Ramos et al. 2019) obtained from the Hospital Universitário Pedro Ernesto, State University of Rio de Janeiro (UERJ), Rio de Janeiro, Brazil. Complete genome sequencing was performed with the Ion Torrent Personal Genome Machine System, using a 318 chip and a fragments library, exactly as previously described by our research team (Mattos-Guaraldi et al. 2015). The FastQC tool was used for quality assessment of the generated sequences (http:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ fastqc/), trimming was performed to keep only reads with Phred values > 20. De novo genome assembly was achieved by SPAdes 3.0 (Bankevich et al. 2012) and MIRA 4.0 (Chevreux et al. 1999).

Data retrieval from public databases and through literature mining
Twenty-four additional genomic sequences from C. striatum isolates were retrieved from NCBI's GenBank (see supplementary material Table S1). Clinical data, isolation sites, and phenotypic profiles of susceptibility to antimicrobial agents were obtained through literature searches. All genomic sequences were re-annotated using the Pathosystems Resource Integration Center (PATRIC) platform (Wattam et al. 2017) for standardization of genomic annotation with the RASTtk pipeline (Brettin et al. 2015).

Phylogenetic and phylogenomic analyses
A multilocus sequence analysis (MLSA) and phylogenetic network analysis were performed with the following genes, retrieved from the standardized genomic sequences: atpA, dnaE, dnaK, fusA, leuA, odhA, and rpoB (Mattos-Guaraldi et al. 2015). A public genomic sequence for the PES1 strain of Corynebacterium simulans was used as an external group. The seven gene sequences for each genome were recovered from the standardized annotations, concatenated, and then aligned using MAFFT (Katoh et al. 2017). GBlocks were used to obtain regions with improved syntenies between the concatenated sequences (Castresana 2000). The MLSA tree was built using PhyML 3.0 (Guindon et al. 2010), with a substitution model GTR + G + I. A phylogenetic network was constructed with SplitsTree 4 (Huson and Bryant 2006). In addition, a minimum spanning tree based on Core Genome Multilocus Sequence Typing (cgMLST) was built with BacWGSTdb (Ruan and Feng 2015).

Pan-genomic analysis of C. striatum
The C. striatum pan-genome was inferred for the 27 standardized genomic sequences using the Bacterial Pan Genome Analysis (BPGA 1.3) pipeline (Chaudhari et al. 2016), using USEARCH v11 (Edgar 2010) for gene grouping and an identity cutoff value of 50%. The Power Law regression model (n = k. N α ) was used to determine whether the pangenome is open (α ≤ 1) or closed (α > 1) (Tettelin et al. 2005(Tettelin et al. , 2008. The subgroups of the pan-genome were submitted for functional annotation of the Cluster of Orthologous Groups (COG) categories using the eggNOG-Mapper (Huerta-Cepas et al. 2017). Biosynthetic pathways of secondary metabolites were predicted with antiSMASH 6.0 (Blin et al. 2021).

Predictions of AMR genes and virulence factors
The prediction of antimicrobial resistance (AMR) genes was performed in the PATRIC platform, using annotations of the Database of Antibiotic Resistant Organisms (NDARO) (https:// www. ncbi. nlm. nih. gov/ patho gens/ antim icrob ialresis tance/) and the Comprehensive Antibiotic Resistance Database (CARD) (Jia et al. 2017). Predictions of genes coding for potential virulence factors were made with VFanalyzer (Liu et al. 2018), using pattern searches by Hidden Markov Models with GLIMMER3 (Delcher et al. 2007); the Virulence Factor Database (VFDB) includes domain profiles generated by hmmbuild and searched by hmmsearch, both from the HMMER3 package (Mistry et al. 2013).
The CRISPR-CasFinder tool (Couvin et al. 2018) was used to identify CRISPR arrays and spacers sequences in the C. striatum genomes, using a 100 bp flanking distance an evidence level higher than 2.

Circular plot diagrams and genomic context analysis of antimicrobial resistance genes
Circular plot charts of the C. striatum genomes were generated with the Blast Ring Image Generator (BRIG 0.95) (Alikhan et al. 2011) by comparing all genomic sequences against a reference genome using the Basic Local Alignment Search Tool (BLAST 2.10.1) (Camacho et al. 2009). Predicted coordinates for genomic islands, prophages, virulence factors and antimicrobial resistance genes, including Illustrator for Biological Sequences (IBS 1.0.3) (Liu et al. 2015).

Results and discussion
General features of the C. striatum genomes C. striatum genomes presented a predicted size ranging from 2.61 to 3.13 Mb, with a slight variation in G + C percentage between genomes (range: 59.2-59.8%). The total number of predicted coding sequences (CDSs) varied between 2089 and 2924. A phylogenetic network analysis based on seven housekeeping genes indicates the existence of two well-defined groups of C. striatum strains separated by geographical origin (Fig. 1a). The presence of different MDR C. striatum clones, identified by the cgMLST analysis ( Fig. 1b), corroborates with findings from previous studies based on pulsed-field gel electrophoresis (PFGE) (Ramos et al. 2019). Interestingly, while strains with very similar phenotypic antimicrobial susceptibility profiles could be found forming clonal complexes (strains 2308 and 2023) (Fig. 1b), we also found strains carrying significantly different contents of AMR genes forming a clonal group (strains LK37 and 797_CAUR) (Fig. 1b). This finding underscores the potential role of horizontal gene transfer as a significant force driving the acquisition of antimicrobial resistance genes in the hospital environment by the species C. striatum.
The single strain from a non-human host, Kc-Na-01, presented the most significant genetic divergence from all studied isolates (Fig. 1a). Then, we analyzed this genome using average nucleotide identity by BLAST (ANIb) and found that this genomic sequence shares a higher than 94.5% identity with all genomic sequences included in the study ( Supplementary Fig. S1). Even though it is generally regarded that a standard cutoff for species circumscription would be at ≥ 95% ANIb, some studies have shown that an ANIb value of ca. 94% equals to ca. 70% DNA-DNA hybridization (DDH) (Konstantinidis and Tiedje 2005;Rodriguez-R and Konstantinidis 2014;Qin et al. 2014). This assumption would permit classifying the strain Kc-Na-01 as belonging to the species C. striatum.

Pan-genomic analysis of C. striatum
The species C. striatum possesses an open pan-genome (α = 0.852803) (Fig. 2a), with 3816 gene families, of which 33.99% (1297) are present in the core genome and 34.25% (1307) in the accessory genome. The unique genes represent 31.76% of the predicted gene families (1212) (Fig. 2b). The strains KC-Na-01, from a non-human host, and BM4687, a clinical isolate from France, concentrate 42.8% of the species' total number of unique genes (260 and 326 individual genes, respectively). For strain KC-Na-01, the high number of unique genes may be partially explained by the existence of sequences derived from two different plasmids that contribute to the complete gene set of this isolate. Noteworthy, a new aminoglycoside 3-N-acetyltransferase (AAC(3)-XI) from chromosomal origin was recently discovered from the analysis of the genome of isolate BM4678 (Galimand et al. 2015).
The gene families in the C. striatum pan-genome are mainly distributed in the following COG functional categories: Metabolism (36.9%); Information storage and processing (35.3%); Cellular processes and signaling (8.2%); and Poorly characterized (19.4%) (Fig. 3a). Regarding the core genome, the most prevalent functional categories included: Translation, ribosomal structure, and biogenesis (9.6%); Amino acid transport and metabolism (9.2%); and Transcription (9.1%). The accessory genes are mainly classified into the categories of Amino acid transport and  (Fig. 3b). A high number of genes related to transcription regulation was an essential feature of the three pan-genome subsets.
A prediction of biosynthetic pathways of secondary metabolites revealed the presence of 138 hits in the studied C. striatum genomes, related to 17 different compounds (Supplementary Table S2). Noteworthy, this analysis demonstrates the potential for production of several polyketides and non-ribosomal peptides (NRPs) by the species, of which some have potential antimicrobial activity Fig. 2 Pan-genome development analysis of Corynebacterium striatum. a There are number of gene families in the pan-genome (orange) and in the core genome (green). b The flower diagram demonstrates the dimensions of the core genome and accessory genes in gene numbers, and the contribution of genes per isolate to accessory genes and single genes (Li and Chen 1986;Nofiani et al. 2018). Such antimicrobial metabolites might contribute to the existence of C. striatum as a commensal of the skin and mucous membranes, through inhibition of growth or progression to the pathogenic state of other microorganism, like Staphylococcus aureus (Ramsey et al. 2016;Krismer et al. 2017).

Prediction of AMR genes
Through automated prediction, we identified 15 antimicrobial resistance genes (AMRs) in the C. striatum genomes (Fig. 4a). The single gene identified in all strains was rpsL, which codes for the 30S subunit-S12 ribosomal protein, presenting mutations similar to those described in streptomycinresistant M. tuberculosis (Sreevatsan et al. 1996). However, the ATCC6940 and 1961 strains have a streptomycin susceptibility phenotype (Fig. 4b), whereas all the other strains isolated in Brazil have a streptomycin-resistant phenotype (Fig. 4b).
The cmx gene, part of the MFS transporters family and promotes chloramphenicol efflux (Tauch et al. 1998), was present in 44.0% of the strains (Fig. 4a). As expected from previous studies, all strains that possessed this gene presented phenotypic non-susceptibility to chloramphenicol (Fig. 4b).
The ermX gene was found in 74.0% of the strains and codes for the rRNA methyltransferase enzyme responsible for the ineffective binding of macrolides, lincosamides, and streptogramins to the 23S ribosomal binding site (Roberts et al. 1999).
The sul1 gene was identified in 18.5% of the strains and is part of a gene family that codes for alternatives to the dihydropteroate synthase enzymes, which have less affinity for sulfonamides (Changkaew et al. 2014).
Although the tetA and tetW genes were found in 44.4% and 40.7% of the C. striatum strains, respectively (Fig. 4a), resistance to tetracycline did not correlate well with the identification of these genes in the genomic sequences (Fig. 4b).
These genes confer resistance to tetracycline through different mechanisms: tetA encodes transport proteins of the MFS family, whereas tetW encodes ribosomal protection protein (Roberts 2005).
Four AMR genes in the C. striatum pan-genome were identified as unique genes: tetB, aac(6')-lb7, aadA, and qacE. The tetB gene was found in the 2023 strain, but this strain is phenotypically susceptible to tetracycline, despite carrying the tetA and tetB genes, encoding efflux pumps for tetracyclines (Fig. 4b). Additionally, mutations in the rpoB gene similar to those found in rifampicin-resistant strains of Mycobacterium tuberculosis were found in the 2023 strain, which is phenotypically resistant to rifampicin (Ramos et al. 2019).

Prediction of virulence factors
We identified 32 genes potentially related to virulence in C. striatum, with 19 (59.3%) genes present in all strains, 11 (34.3%) genes appearing in at least two strains, but not in all, and 2 (6.3%) genes uniquely present in one strain (Fig. 5). These virulence factors are distributed in 10 functional categories (Fig. 5). In the "iron uptake category," three operons were identified as present in all C. striatum strains, namely the fagABCD operon, also present in Corynebacterium pseudotuberculosis strains (Billington et al. 2002;Dorella et al. 2006) the hmuTUV and the irp6ABC operons, also found in C. diphtheriae (Allen and Schmitt 2009;Schmitt 2014). Additionally, the itrAB operon is present in 17 C. striatum lineages (63%), and the mbtH and fxbA genes in 12 lineages (44.4%); these genes have been widely described in bacteria of the genus Mycobacterium (Dussurget et al. 1999;Timms et al. 2015).
Four genes coding for transcriptional regulators of potential virulence genes were also found in all C. striatum genomes (Fig. 5). The iron-activated dtxR gene is involved in regulations of genes related to iron homeostase, such as the genes of the fagABC, hmuTUV and ipr6ABC operons in Corynebacteria spp. (Qian et al. 2002;Trost et al. 2010). This gene may also be involved in regulating of irtA, irtB, Fig. 4 Antimicrobial resistance genes. a Left panel: MLSA tree; right panel: prediction of antimicrobial resistance genes. b Antimicrobial susceptibility phenotype retrieved from the literature. The same color scheme is used for geographic coordinates, as in Fig. 1 5 Page 8 of 15 mbtH and fxbA, as demonstrated by Manabe et al. (2005). The senx3, sigA and sigD genes have already been shown to play essential roles in the virulence and persistence of Mycobacteria spp. (Gomez et al. 1998;Raman et al. 2004;Singh and Kumar 2015).
The pafA and mpA genes, present in all strains, are part of the Pup proteasome System in Actinobacteria and have relevance in the persistence of M. tuberculosis in the host (Darwin 2009). The SpaFED pili are present in 21 strains, together with the genes of the sortases srtB and strC necessary for the pili's assembly (Gaspar and Ton-That 2006). These protein structures are displayed on the cell surface and participate in biofilm formation, DNA translocation, and interactions with other bacteria, besides working as phage receptors, contributing to pathogenesis (Mandlik et al. 2008;Proft and Baker 2009;Kline et al. 2010). The secA2 secretion system was found in all C. striatum strains (Fig. 5). This system has been demonstrated to be responsible for the exportation of multiple effectors that interfere with phagosome maturation and promote intracellular replication in M. tuberculosis (Zulauf et al. 2018).
Orthologs of the genes wecB and wecC code for the enzymes UDP-N-acetilglucosamine-2-epimerase and UDP-N-acetil-d-manosamine desidrogenase, uniquely found in the strain Kc-Na-01 (Fig. 5). Although these enzymes are expected to be found in Gram-negative bacteria, for the synthesis of lipopolysaccharide (Rai and Mitchell 2020), the orthologs mnaA and mnaB have been described in Staphylococcus spp. and are involved in the biosynthesis of the cell envelope.
Some of the predicted virulence factors have, in fact, been described in previous studies as being important factors for niche colonization in Corynebacterium spp. (Swierczynski and Ton-That 2006;Tauch and Burkovski 2015). This is because the best prediction strategies for virulence factors, which are based on databases such as the VFDB (Liu et al. 2022), consider genes that have been previously identified as being necessary for causing disease in the host. This includes, for instance, bacterial transcriptional regulators, whose mutants are attenuated in host cells. Therefore, even though classic corynebacterial virulence factors had not been identified in C. striatum, including diphtheria toxin or phospholipase D (Dorella et al. 2006;Sadiya et al. 2019), genes coding for proteins that are important players in the host-pathogen interplay were detected, including ptpA, that codes for an ortholog of a mycobacterial tyrosine phosphatase that significantly contributes to M. tuberculosis pathogenicity through inhibition of phagosome acidification and regulation of the expression of host genes (Wang et al. 2017).

GIs, prophages, ISs, and CRISPR loci
Eighty-four out of 129 AMRs distributed throughout all strains were found within genomic islands (Fig. 6). Sixteen strains presented the ermX adjacent to gcrA and gcrB, as shown in the genomic context for strain 2023 (Fig. 7a). Seven AMRs were found within the GI18, including ermX, tetA, and tetB (Figs. 6 and 7b). The genes aph(3″)Ib, aph(6)-Id, and cmx are all presented in GI8 (Fig. 6). These together with other genes that are found in the flanking regions of AMR genes were identified in the pTP10 plasmid, which is part of the C. striatum M82B genome (Tauch et al. 2000), as well as of the genome of C. striatum strain 2308 (Ramos et al. 2018).
Seventy-four bacteriophage signatures were detected in the studied genomes (Supplementary Table S3). Nevertheless, only 16 phage sequences were found intact in the genomes, of which the most prevalent were the PHAGE_ Rhodoc_REQ3_NC_016654 and the PHAGE_Staphy_ SPbeta_like_NC_029119. Notably, four AMR genes were found within a phage context in strain LK37 (Fig. 7c), present in GI5 (Fig. 6).
Integrons are versatile genetic elements, characterized by the ability to insert, excise, rearrange, and express genes through a site-specific recombination system and can act as vehicles for intra-and inter-specific transmission of genetic material (El Sayed Zaki et al. 2022). Integrons are characterized into classes based on the type of integrase gene. Class 1 integron is the most frequently observed in clinical isolates, mainly in Gram-negative bacteria (Racewicz et al. 2022). In Pseudomonas aeruginosa, the presence of class 1 integron is associated with the emergence of the MDR phenotype (El Sayed Zaki et al. 2022). However, studies on the presence of integrons in Gram-positive bacteria especially in Corynebacterium species are scarce. Class 1 integrons have been found in some Corynebacterium clinical isolates, such as Corynebacterium diphtheriae biovar mitis (Barraud et al. 2011), Corynebacterium resistens (Schröder et al. 2012), and Corynebacterium urealyticum (Rocha et al. 2020). To date, there are no studies describing the presence of class 1 integrons in C. striatum (Leyton et al. 2021). In our study, we found the class 1 integron in LK37 carrying the genes sul1, qacE, aadA, and aac(6')-lb7 (Fig. 7c), in strains 2130, 2296, 2425, and 3012STDY7069329 carry only the sul1 gene (Supplementary Table S4).
We also evaluated the insertion sequences (IS) that appear in the same genomic context as the AMR genes. The main IS families found in these regions were IS3, IS481, IS256, ISL3, and IS6. Interestingly, the ISCre1, belonging to the IS256 family, is associated with the aac(3)-XI gene in 7 C. striatum isolates. Additionally, the ISCx1 insertion sequence, belonging to the ISL3 family, was found in association with erm(X), tet(W) and aac(3)-XI. The IS1249 was also found in the genomic context of erm(X) and the IS5564 was located near the genes aph(3″)-Ib and aph(6')-Id. Both IS1249 and ISCx1 are part of a transposon Tn5432, which has been identified in the genomic sequences of C. striatum by recent studies (Wang et al. 2019;Leyton et al. 2021; Leyton-Carcaman and Abanto 2022) (Fig. 7c).
We also evaluated the genomic context of virulence factors and found that the operons spaDEF, fagABCD, hmuTUV, irtAB, and the gene fxbA can also be located in GIs. Twenty-one C. striatum isolates have mobile genetic elements in the same context of genes coding for SpaDlike pili (Fig. 8). This region is also within a predicted genomic island, GI17 (Fig. 6). Noticeably, five strains present frameshifts in at least one of the pili encoding genes (Fig. 8b), suggesting an incapacity to pili assembly or dependence on sortases StrB and StrC (Gaspar and Ton-That 2006). Some strains did not present the pili assembly machinery within the same genomic context: KC-Na-01, 1961, 962_CAUR, 963_CAUR, 1329_CAUR, and LK37 (Fig. 8c).
Great variability in CRISPR-associated loci was identified in the species C. striatum (Supplementary Table S5).
The studied genomes presented an average number of 52 detected spacer sequences, but with a wide range between 5 and 117 sequences. The identified repeat sequences were more consistently distributed throughout the genomes (Supplementary Table S5). Some genomes presented two clusters of genes coding for CRISPR-associated proteins (Cas), but the most prevalent organization found was the type IE CRISPR system (Supplementary Table S5). A recent study suggests the existence of an alternative, as-yet-unidentified CRISPR system organization in this species, termed Type I-E' (Ramos et al. 2022).

Conclusions
We found that the C. striatum species has an open pangenome, presenting widespread acquisition of novel genes by horizontal gene transfer. This is particularly demonstrated by the high number of identified genes related to resistance to aminoglycosides and tetracycline. The C. striatum genomic sequences illustrate the existence of a robust iron acquisition machinery in the species, suggesting an improved ability to adapt and survive within the host environment. However, the absence of pili or frameshifts in at least one of the five genes in some isolates can lead to a lowered ability to infect and persist in the host, considering that it was the only pili identified in the species. This study Fig. 7 Genomic context of AMR genes and mobile genetic elements. Shown in blue, are miscellaneous genes; in yellow are mobile genetic elements; in red are antimicrobial resistance genes; in gray are, hypothetical genes. a Genomic context of AMR genes from the C. striatum 2023 strain. b Genomic context of antimicrobial resistance genes from the C. striatum LK37 strain. c Prediction of prophage sequences in the genomic context of AMR genes in LK37 Page 11 of 15 5 Fig. 8 Genomic context of virulence factors in C. striatum genomes. Shown in blue, miscellaneous genes; in yellow, mobile genetic elements; in green, virulence factor genes; in orange, virulence factor genes with frameshift mutations; in gray, hypothetical proteins. a Genomic context of SpaD-type pili in ATCC6940. b Demonstrates frameshift in spaD-like pili genes. c Genomic context of the KC-Na-01, 1961, CAUR_962, CAUR_963, CAUR_1329 and LK37 strains at the same locus of the ATCC6940 strain 5 Page 12 of 15 highlights the importance of correctly identifying this microorganism in the clinical microbiology laboratory, besides performing molecular surveillance of antimicrobial resistance genes within the hospital environment, due to the widespread presence of clinically relevant AMRs in the species and the apparent facility to share mobile genetic elements.