Antimicrobial Susceptibilities of the CRKP Strains
The sources of the isolates are listed in Table 1, which also gives the infection type and the results of susceptibility testing during the patients’ hospitalization. All eight strains involved in the study are confirmed to be K. pneumoniae, with five strains from sputum, one from bile, one from blood, and one from the environment (Supplementary Table 1). Clinical data show that seven of the eight patients are referred because of pulmonary infection and the remaining one because of abdominal infection. The susceptibility testing data in Table 1 reveal that all the K. pneumoniae strains are resistant to almost all antibiotics tested, including cephalosporins, penicillins, quinolones and carbapenems (imipenem MICs ≥16). Among the aminoglycoside antibiotics, isolate 1567D is sensitive to amikacin and tobramycin, whereas all other isolates are resistant. Strains 1566D, 2038D, 2039D and 2040D are resistant to sulfamethoxazole/trimethoprim (MICs ≥320), and the other strains (1567D, 2035D, 2036D, 2037D) are sensitive (MICs ≤20).
Genome Assembly and Annotation
The seven CRKP strains sequenced with short reads are assembled into contigs. As listed in Table 2, the assembled genome sizes of all strains range from 5.4 Mb to 5.8 Mb, with a mean length of 5.7 Mb and an average of 199 contigs. The N50 lengths range from 176.6 kb to 251.6 kb, with an average N50 of 220.4 kb, and the mean GC content is 57.2%. To obtain a more complete genome, the 1567D strain is resequenced using long-read sequencing technology and assembled into three contigs with a size of 5.6 Mb (Figure 1). A total of 5,841 protein-coding genes are predicted, with lengths between 37 and 1,649 bp (Supplementary Figure 1). In total, 4,657, 5,097, 4,714, 3,179 and 3,099 predicted genes are functionally annotated in the NR, COG, Swiss-Prot, GO and KEGG databases, respectively (Supplementary Figures 2, 3, 4).
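For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it can be computed from a list of contig lengths. The function name and the example lengths are illustrative assumptions and are not taken from Table 2 or from the assembly software used in this study.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths in bp (illustrative only, not study data).
example_contigs = [400_000, 250_000, 200_000, 100_000, 50_000]
example_n50 = n50(example_contigs)
```

A higher N50 for the same genome size indicates that the assembly is split into fewer, longer contigs.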
Characteristics of the CRKP Isolates
The seven CRKP isolates are sequenced on the Illumina MiSeq platform and assembled into whole genomes. To understand their genetic diversity, mobile genetic elements are examined, and 24 prophages are identified in the seven CRKP genomes, with sizes ranging from 8.4 kb to 49.3 kb (Figure 2). According to the criterion that an intact prophage should be longer than 20 kb [20], the prophages detected in most strains (except for 2036D) are complete, with a size of at least 20.2 kb and an average GC content of 52.7%. Additionally, three prophages are each identified in three strains, indicating genomic sequence homology among the isolates. The 2036D strain carries just one prophage, probably because of its small genome size and distinct sequence characteristics, which are expected to provide fewer neutral targets for prophage integration [20].
Furthermore, multilocus sequence typing (MLST) analysis reveals two unrelated sequence types (STs) among the K. pneumoniae strains isolated from different patients. The 2036D strain belongs to ST-2632, and the other six strains belong to ST-11 (Table 3). pMLST analysis also reveals that all six ST-11 strains are associated with IncF [F33:A-:b-], whereas the ST-2632 strain is associated with IncHI1 and IncF. ST-11 with the IncF [F33:A-:b-] type therefore accounts for the majority of the K. pneumoniae strains.
Plasmid analysis shows that the individual strains carry different circular plasmids. All strains harbor IncR and ColRNAI plasmids, which carry no virulence genes but contain several associated resistance genes (Table 3). The IncR plasmid is identified as a multidrug-resistance plasmid and has variable copy numbers of certain resistance genes among the K. pneumoniae isolates.
Detection of Antibiotic Resistance Genes of CRKP Isolates
The antibiotic resistance genes of the seven CRKP isolates, from patients and the environment, are identified from the Illumina MiSeq sequencing data. As shown in Table 3, some antimicrobial resistance genes are plasmid-mediated, such as the β-lactamase genes (blaCTX-M, blaKPC, blaLEN, blaTEM) and genes conferring resistance to aminoglycosides [aac(3)-IId, rmtB], chloramphenicol (catA1, catA2), trimethoprim (dfrA1, dfrA17), and fluoroquinolones (QnrS1). The other antimicrobial resistance genes are chromosomally encoded, including blaSHV (a narrow-spectrum β-lactamase in K. pneumoniae), oqxA (1,176 bp) and oqxB (3,153 bp) (efflux pumps), and fosA (420 bp, fosfomycin resistance).
Except for 2036D, all the K. pneumoniae strains harbor the carbapenemase gene blaKPC-2. Extended-spectrum β-lactamase (ESBL) resistance genes such as blaCTX-M, blaTEM, blaLEN and blaSHV are also detected. Different blaCTX-M types (blaCTX-M-14, blaCTX-M-3, blaCTX-M-55 and blaCTX-M-65) are found among the K. pneumoniae strains: blaCTX-M-3 in strain 2036D, blaCTX-M-55 in strain 1567D, blaCTX-M-14 in strains 1566D and 2040D, and blaCTX-M-65 in the other four strains (2035D, 2037D, 2038D, 2039D). The blaLEN12 gene is found exclusively in strain 1566D, which carries no blaSHV gene. In contrast, blaSHV-93 is detected in strain 2036D and blaSHV-11 in the other five K. pneumoniae strains. Except for strain 2036D, the blaTEM-1B gene is observed in all six other K. pneumoniae strains. The aminoglycoside resistance genes aac(3)-IId and rmtB are observed among all strains. The fosA gene, which confers fosfomycin resistance [21], is also detected in all strains. The oqxA and oqxB genes are likewise detected in all CRKP strains and might contribute to carbapenem resistance, since efflux pumps expel antimicrobials from pathogenic bacteria [22].
Characterizing CRKP SNPs
SNP markers are identified for all strains sequenced using the short-read MiSeq data. The data show that 33,716 markers are detected in the 2036D strain, considerably more than in the other strains, which average 8,289 SNPs. cSNPs located in exonic regions make up the majority of all detected SNPs, with a minimum proportion of 85.5% (Supplementary Table 11). In addition, pairwise comparison reveals that the 2036D isolate is distinct from the other strains, based on clustering of sequence similarities using a subprogram of Trinity [23] (Figure 3a). Furthermore, the 2036D strain shares few SNP loci with the others, which is consistent with the strain clusters (Figure 3b).
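The notion of strains sharing SNP loci can be illustrated with a small sketch that counts the loci common to each pair of strains. This is not the Trinity-based procedure used for Figure 3; the function name, the strain labels and the loci below are hypothetical and serve only to show the idea.

```python
from itertools import combinations


def shared_snp_counts(snps_by_strain):
    """Count SNP loci shared by each pair of strains.

    snps_by_strain maps a strain name to a set of (contig, position) loci.
    Returns a dict keyed by (strain_a, strain_b) in sorted name order.
    """
    counts = {}
    for a, b in combinations(sorted(snps_by_strain), 2):
        counts[(a, b)] = len(snps_by_strain[a] & snps_by_strain[b])
    return counts


# Hypothetical strains and loci (illustrative only, not study data).
toy_snps = {
    "strain_A": {("contig1", 10), ("contig1", 20), ("contig2", 5)},
    "strain_B": {("contig1", 10), ("contig1", 20)},
    "strain_C": {("contig2", 99)},
}
toy_counts = shared_snp_counts(toy_snps)
```

In this toy input, strain_C shares no loci with the other two strains, which is the kind of pattern that would set an isolate apart in a pairwise comparison.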
For validation, all strains show a high detection rate: approximately 153 of 200 SNPs (76.4%) are successfully amplified, which supports the accuracy of the analysis (Supplementary Table 11). After removing SNP loci that are not located in exonic regions, that have no allele called, or that are wild-type in every isolate, we eventually obtain 92 SNPs among the 200 validated loci. A total of 40 of the 92 SNPs are variant in all isolates and could be used to distinguish CRKP strains from ordinary K. pneumoniae (Supplementary Table 12). In addition, 24 SNPs are unique to a single strain, including 2036D (18 loci), 2035D (3 loci), 1566D (2 loci) and 2037D (1 locus), and would be helpful resources for strain-specific identification in clinical analysis.
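The filtering and grouping of validated loci described above can be sketched as follows. This is an assumption-laden illustration, not the pipeline used in the study: the call encoding ('ref', 'alt', None), the function name and the toy loci are hypothetical, "no allele" is interpreted here as a missing call, and the exonic-region filter is assumed to have been applied beforehand.

```python
def classify_loci(calls):
    """Filter validated SNP loci and group the informative ones.

    calls maps locus -> {strain: 'ref' | 'alt' | None}, where None marks
    a locus with no allele called in that strain (an assumed encoding).
    Returns (kept, all_variant, unique_by_strain).
    """
    kept, all_variant, unique_by_strain = [], [], {}
    for locus, by_strain in calls.items():
        values = list(by_strain.values())
        if any(v is None for v in values):
            continue  # drop loci with a missing allele call
        if all(v == "ref" for v in values):
            continue  # drop loci that are wild-type in every isolate
        kept.append(locus)
        carriers = [s for s, v in by_strain.items() if v == "alt"]
        if len(carriers) == len(values):
            all_variant.append(locus)  # variant in every isolate
        elif len(carriers) == 1:
            unique_by_strain.setdefault(carriers[0], []).append(locus)
    return kept, all_variant, unique_by_strain


# Hypothetical calls for three hypothetical strains (not study data).
toy_calls = {
    "locus1": {"s1": "alt", "s2": "alt", "s3": "alt"},
    "locus2": {"s1": "alt", "s2": "ref", "s3": "ref"},
    "locus3": {"s1": "ref", "s2": "ref", "s3": "ref"},
    "locus4": {"s1": "alt", "s2": None, "s3": "ref"},
    "locus5": {"s1": "alt", "s2": "alt", "s3": "ref"},
}
kept, all_variant, unique_by_strain = classify_loci(toy_calls)
```

Loci that are variant in every isolate correspond to the all-variation class, and loci carried by exactly one strain correspond to the strain-unique class; loci such as locus5 are kept but fall into neither group.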
Phylogeny Analysis
Strain similarity is compared using the Harvest Tools Suite [24] (version 1.1.2). For all isolates sequenced on a particular platform, parsnp is used to compare the assembled isolates against each other and a known reference strain. Results are visualized using EvolView. The phylogenetic tree shows that the 1567D strain is most distantly related to the other strains, and that 2036D is more closely related to the reference strain than any of the other strains are (Supplementary Figure 5).
GWAS Analysis
To further identify significant SNPs and genes, we perform a genome-wide association study (GWAS). The patients’ body temperature and leukocyte counts are selected as phenotypic traits. The short sequencing reads of six strains (Figure 4) are aligned to the 1567D genome using BWA v0.7.17. We call SNPs using Platypus v0.8.1 [25] and then filter them with PLINK v1.9, removing SNPs with (i) missing loci, (ii) minor allele frequency (MAF) < 0.05, or (iii) significant deviation from Hardy-Weinberg equilibrium (HWE) (P < 0.01). A total of 698 SNP markers remain and are used for the GWAS analysis. As a result, 9 loci are identified (P < 0.05). Two loci (ygbI and murB) are associated with temperature, and the other seven loci (IsrD, SufD, yrkF, fabI, sppA, entF and ttuB) are associated with leukocyte count (Figure 4).
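The first two filtering conditions can be illustrated with a minimal sketch. It is not PLINK's implementation: the 0/1 genotype encoding (0 for reference, 1 for alternate, None for missing), the function names and the toy SNPs are assumptions made for illustration, and the HWE test is deliberately left out.

```python
def passes_filters(genotypes, maf_threshold=0.05):
    """Return True if a SNP has no missing calls and MAF >= maf_threshold.

    genotypes is a list with one entry per strain: 0 (reference allele),
    1 (alternate allele) or None (missing). This encoding is an assumption.
    """
    if not genotypes or any(g is None for g in genotypes):
        return False  # condition (i): missing calls
    alt_freq = sum(genotypes) / len(genotypes)
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= maf_threshold  # condition (ii): MAF filter


def filter_snps(snp_table, maf_threshold=0.05):
    """Keep only the SNPs in snp_table (name -> genotype list) that pass."""
    return {
        name: gts
        for name, gts in snp_table.items()
        if passes_filters(gts, maf_threshold)
    }


# Hypothetical genotypes across six strains (illustrative, not study data).
toy_table = {
    "snp_a": [1, 0, 0, 0, 0, 0],     # MAF = 1/6, kept
    "snp_b": [0, 0, 0, 0, 0, 0],     # monomorphic, removed
    "snp_c": [1, None, 0, 0, 1, 0],  # missing call, removed
    "snp_d": [1, 1, 1, 0, 0, 0],     # MAF = 0.5, kept
}
toy_kept = filter_snps(toy_table)
```

With only six strains, any polymorphic site without missing calls has a MAF of at least 1/6, so in this sketch the MAF filter effectively removes monomorphic sites.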