Phylogenetic, ANI and isDDH analysis
In the eight tested Cupriavidus species, we found that six species of C. alkaliphilus, C. basilensis, C. gilardii, C. oxalaticus, C. necator, and C. taiwanensis are species complex based on phylogenetic analysis coupled ANI and isDDH (Fig. 1). Among the six species complex, C. taiwanensis and C. alkaliphilus are the most challenging problems because all C. alkaliphilus strains (7 strains) mixed with parts of C. taiwanensis (11 of 42 strains) and formed two new groups (group19, group20). Strains in each new group shared a high value of genomic similarity. For example, in group 19, genomic similarity (value of either ANI or isDDH) between C. alkaliphilus MLR2-44 and C. taiwanensis LMG 19430 was all over 99.8%. Thus, we inferred that some C. alkaliphilus and C. taiwanensis strains might be mislabeled.
Notedly, the ANI cutoff of 95% could not completely differentiate 17 C. taiwanensis strains and 7 C. alkaliphilus strains. Accordingly, the ANI cut-off value for these strains was adjusted from 95% to 96%, whereas the ANI cut-off value for other strains was maintained at 95%. As a result, 17 C. taiwanensis strains and 7 C. alkaliphilus strains were successfully separated into three groups (groups 18-20). In this way, six species (C. alkaliphilus, C. basilensis, C. gilardii, C. oxalaticus, C. necator, and C. taiwanensis) of the tested eight Cupriavidus species were identified as species complex, which included 18 genomic groups.
Characterization of pan-genome and core genome
The pan-genome in the eight species of Cupriavidus contained 37991 gene families (Fig. 2a). The pan-genome data fitted a power law curve (y = ax^b, b=0.37). These results showed that the pan-genome of Cupriavidus was open due to the value of b > 0 [51]. In the case of the core genome, the number of core gene families was 1616; the core genome accounts for 4.25% of the pan-genome. The remaining gene families (accessory and unique genome) amounted to 36375 (95.75%); they formed the variable genome of Cupriavidus, suggesting that Cupriavidus strains have a high degree of genetic variation. COG assignments were employed further to determine the function categories of the core gene families of Cupriavidus (Fig. 2b). A total of 1578 core gene families were annotated. A larger number of core gene families of Cupriavidus were involved in the cell wall/membrane/envelope biogenesis (category M: 98 gene families); translation, ribosomal structure and biogenesis (category J: 134 gene families); transcription (category K: 121 gene families); energy production and conversion (category C: 135 gene families); and amino acid transport and metabolism (category E: 168 gene families). Notably, function unknown (category S: 308 gene families) is the most significant amount of the core genome, indicating that the current knowledge of Cupriavidus is limited.
Mobile genetic elements, TA system, and pathogenicity prediction
The features of mobile genetic elements in Cupriavidus were described from three facets: genomic islands, prophages, and CRISPRs (Table. S2). Genomic islands are among the most abundant mobile genetic elements. All the 97 Cupriavidus strains have genomic islands; the number of genomic islands ranged from 2 to 162, with an average of 59. Prophages take the second place. Most strains (71/97) possessed between 1 to 8 prophages. CRISPRs were the least type of mobile genetic elements; only nine strains had CRISPRs. Overall, genomic islands are the most significant mobile genetic elements in Cupriavidus, which might be responsible for a high degree of genetic variation and promote the Cupriavidus strain's adaptation to various harsh environments and stresses.
The TA system, widely spreading in bacterial genomes, comprises a gene pair encoding a toxin and its corresponding antitoxin [52]. It is crucial in various biological processes, especially stress response and antimicrobial persistence [53]. Unexpectedly, no TA system was found in all strains of Cupriavidus (Table. S2).
To assess the pathogenic potential of Cupriavidus, we employed PathogenFinder, a computational tool for predicting pathogenicity. The probability of being a human pathogen range for this prediction was estimated to be 0.183-0.325 (Table. S2).
Virulence factors identification
A total of 47 genes related to virulence factors associated with virulence genetic profiles were identified in the eight species of Cupriavidus (Fig. 3), in which 12 major virulence factors exist in almost all the 97 selected Cupriavidus strains. These 12 virulence factors are associated with antimicrobial activity (acrB), biofilm (adeG, algU), stress survival (clpP, katA, sodB, ureA, ureB, and ureG), adherence (htpB, kdsA) and others (icl). Additionally, several species-specific virulence factors occur in particular species. The species-specific virulence factors include membrane-acting toxin, immune modulation, inflammatory signaling pathway, and iron uptake. Of these, a virulence factor associated with a membrane-acting toxin (plc-2) is present in almost all strains of C. gilardii, C. metallidurans, C. necator, and part of C. taiwanensis but is absent in the other four tested species. Virulence factors related to immune modulation and inflammatory signaling pathways are mainly present in C. taiwanensis (wbtL), C. basilensis (rfbF, acpXL, and rfbD), and C. gilardii (acpXL). Most iron uptake virulence factors were mainly identified in C. oxalaticus (pvdH), C. taiwanensis (pvdA, mbtH-like), and C. gilardii (mbtH-like). Overall, the differential distribution of virulence factors suggested that the pathogenicity of Cupriavidus might vary from species to species.
Detection of the secretion systems
Cupriavidus strains contain six types of secretion systems (T1SS-T6SS, Fig. 4). T1SS is present in seven Cupriavidus species (C. alkaliphilus, C. basilensis, C. gilardii, C. pauculus, C. oxalaticus, C. necator, and C. taiwanensis). The number of T1SS clusters in Cupriavidus strains ranges from 1 to 5. The genetic organization of the T1SS clusters in Cupriavidus is highly diverse (Fig. 5a), and the components of different T1SS showed a low protein sequences identity threshold (20%), indicating that the presence of T1SS is strain-specific. The phylogenetic tree based on shared sequences of T1SS was not congruent to the genome-based phylogenetic tree (Table. S1a, Fig. 1), suggesting that T1SS occurs in multiple horizontal gene transfer (HGT) events during divergent evolution in Cupriavidus strains.
T2SS was identified in all eight Cupriavidus species (96 strains; strain NDB4MOL1 was the only exception, Fig 4). T2SS shared similar genetic organization in the genomes of 96 Cupriavidus strains (Fig. 5b), and protein sequences of T2SS components indicated a high degree of identity threshold (82%). The phylogenetic tree of the shared sequences of T2SS showed a similar topology to the core SNPs phylogenetic tree (Fig. S1b, Fig. 1). Accordingly, T2SS is conserved in all eight species of Cupriavidus.
T3SS was identified as present in the four species of Cupriavidus, C. alkaliphilus, C. gilardii, C. taiwanensis, and one strain of C. necator (Fig 4). The genetic organization of T3SS clusters was very similar (Fig. 5c); the protein sequences of T3SS components also kept a high protein identity threshold (92%). The phylogenetic tree based on the shared sequences of T3SS showed a similar topological structure compared with the core SNPs phylogenetic tree (Fig. S2a, Fig. 1), suggesting that the strains lacking T3SS clusters probably result from earlier deletion events during divergent evolution. Additionally, it is interesting to note that all of C. gilardii strains were identified to possess an extra T3SS clusters with different genetic organization, while the other seven Cupriavidus species are absent (Fig. 5d). The extra T3SS clusters in C. gilardii strains displayed evident deviation in CAI with their host genomes (Fig. S2b), indicating that the extra T3SS clusters are likely acquired through HGT events. To further identify whether the extra T3SS in C. gilardii has putative homologous with the other bacteria, we performed blastp searches in the NCBI non-redundant protein database. The results showed that the extra T3SS in C. gilardii species is closely related to the T3SS clusters in Yersinia enterocolitica (protein identity with approximately 52%, Fig. 5d).
T4SS was identified with five types (T, I, F, G, and pT4SSt) in 33 strains of eight species of Cupriavidus (Fig. 4). Interestingly, we found that T4SS clusters predominately occurred in pathogenic Cupriavidus species (C. gilardii, C. metallidurans, and C. pauculus) instead of other species. Specifically, 78% (21/27) of pathogenic strains possess T4SS clusters, while 22% (13/60) of the other strains contain T4SS clusters. The genetic organization of T4SS clusters is different from each other (Fig. 6). The phylogenetic tree based on shared sequences of T4SS differed from the tree based on the core SNPs (Fig. S3). As such, T4SS probably results from the multiple HGT events.
T5SS was present in all 97 strains of the eight species of Cupriavidus, in which T5SS could be classified into three distinct types: T5aSSs, T5bSSs, and T5cSSs (Fig. 4). These results showed that the T5SS clusters in Cupriavidus are highly strain-specific because the type and the number of T5SS clusters vary considerably from strain to strain. For example, the maximum number of T5SS clusters in strain KF709 was up to 12, while the least T5SS clusters were as low as 1.
T6SS was identified in 61 strains of six Cupriavidus species, including C. alkaliphilus, C. gilardii, C. pauculus, C. metallidurans, C. necator, and C. taiwanensis (Fig 4). The T6SS clusters in Cupriavidus strains are present in four different genetic organizations (Fig. 7a). To further investigate the diverse origins of T6SS, a phylogenetic tree using the shared TssB proteins of Cupriavidus strains in combination with the data of the SecReT6 database. The T6SS clusters in 61 Cupriavidus strains were clearly classified into four types: i2 (C. metallidurans NA4), i3 (C. pauculus KF709), i4a (C. necator NBRC 102504), and i4b (58 strains) (Fig. 7b). The four different Cupriavidus T6SSs clusters scatter in different locations, showing high homology to other T6SS clusters from other species, such as Escherichia coli(i2) and Yersinia pseudotuberculosis(i3), Pseudomonas syringae (i4a), and Ralstonia solanaacearum (i4b). Thus, the divergent reason for the T6SS clusters in Cupriavidus is probably due to various donor species' HGT events.
Antibiotic resistance gene in Cupriavidus
Forty antibiotic resistance genes were identified in 97 strains of Cupriavidus (Fig. 8). There were 12 major antibiotic resistance genes identified in all strains, which all are related to five types of system of efflux pumps, namely ATP-binding cassette system, resistance-nodulation-cell division system, kdpDE system, major facilitator super-family system, and small multidrug resistance system. Efflux pumps are known as either single-component transporters or multiple-component systems [54]. However, we found that the 12 antibiotic resistance genes do not encode the complete efflux pump except for a small multidrug resistance system (emre).
The opportunistic pathogens of Cupriavidus, C. gilardii, C. metallidurans, and C. pauculus possess more specific-species antibiotic resistance genes compared with the other Cupriavidus species. For example, C. gilardii contain aminoglycoside resistance genes ant(3'')-Ib, aac(3)-Ivb, aac(3)-IV, and a β-lactam resistance gene oxa-837; C. metallidurans contain three β-lactam resistance genes och-4, och-8, and och-1, a fosfomycin efflux pump AbaF, and a multidrug efflux pump MexAB-OprM; C. pauculus has a multidrug efflux pump MexAB-OprM. In the case of the other species, C. taiwanensis is the most distinctive because some strains of C. taiwanensis (group17,18, and 19) have six genes associated with aminoglycoside resistance (ant(3'')-IIa, ant(3'')-IIb, ant(3' ')-IIc, ant(3'')-IId, ant(3'')-IIe, ant(3'')-Iig).