Antimicrobial Susceptibilities of the CRKP Strains
The sources of the isolates are given in Table 1, which lists the type of infection and the results of susceptibility testing during the patients’ hospitalization. All eight strains in the study are confirmed to be K. pneumoniae, with five strains from sputum, one from bile, one from blood, and one from the environment (Supplementary Table 1). Clinical data show that seven of the eight patients are referred for pulmonary infection and one for abdominal infection. The susceptibility data in Table 1 reveal that all the K. pneumoniae strains are resistant to almost all antibiotics tested, including cephalosporins, penicillins, quinolones and carbapenems (imipenem MICs ≥16 μg/ml). All isolates are resistant to aminoglycosides except 1567D, which is sensitive to amikacin and tobramycin. Strains 1566D, 2038D, 2039D and 2040D are resistant to sulfamethoxazole/trimethoprim (MICs ≥320 μg/ml), whereas the other strains (1567D, 2035D, 2036D and 2037D) are sensitive (MICs ≤20 μg/ml).
Genome Assembly and Annotation
The seven CRKP strains sequenced with short reads are assembled into contigs. As listed in Table 2, the assembled genome sizes of all strains range from 5.4 Mb to 5.8 Mb, with a mean length of 5.7 Mb and an average of 199 contigs. The N50 lengths range from 176.6 kb to 251.6 kb, with an average of 220.4 kb, and the mean GC content is 57.2%. To obtain a more complete genome, the 1567D strain is resequenced with long-read sequencing technology and assembled into three contigs with a size of 5.6 Mb (Supplementary Figure 1). A total of 5,841 protein-coding genes are predicted, with lengths from 37 to 1,649 bp (Supplementary Figure 2). Totals of 4,657, 5,097, 4,714, 3,179 and 3,099 predicted genes are functionally annotated in the NR, COG, Swiss-Prot, GO and KEGG databases, respectively (Supplementary Figures 3, 4, 5).
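The assembly statistics reported above (N50 and GC content) can be computed from contig sequences in a few lines. The following is a minimal sketch rather than the pipeline used in this study; the function names `n50` and `gc_content` are illustrative, and the input is assumed to be a plain list of contig lengths or a nucleotide string.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first reaches
    at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def gc_content(sequence):
    """Return the GC percentage of a sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0
```

For example, for toy contigs of lengths 2, 3, 4, 5 and 6 (total 20), the two longest contigs already cover 11 bases, so the N50 is 5.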
Characteristics of the CRKP Isolates
The eight isolated CRKP strains are sequenced on the Illumina MiSeq platform and assembled into whole genomes. To understand genetic diversity, mobile genetic elements are examined, and 24 prophages are identified in the eight CRKP genomes, with sizes ranging from 8.4 kb to 98.9 kb (Figure 1). According to the criterion that an intact prophage should be longer than 20 kb [9], the prophages detected in most strains (except 2036D) are complete, with a size of at least 20.2 kb and an average GC content of 52.7%. Additionally, three prophages are each identified in three strains simultaneously, indicating genomic sequence homology among the isolates. The 2036D strain harbors only one prophage, probably because of its small genome size and distinct sequence characteristics, which are expected to offer fewer neutral targets for prophage integration [9].
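The length criterion for intact prophages stated above can be expressed as a simple filter. This is a minimal sketch under the assumption that each predicted prophage region is available as a `(strain, start, end)` tuple with 1-based inclusive coordinates; the tuple layout, the function name and the example coordinates are hypothetical, and prophage prediction tools typically apply additional criteria that are not modeled here.

```python
INTACT_MIN_BP = 20_000  # intact prophages are required to be longer than 20 kb


def classify_prophages(regions):
    """Label each prophage region as 'intact' or 'incomplete' by length.

    regions: iterable of (strain, start, end), 1-based inclusive coordinates
    (an assumed input format). Returns a list of (strain, length_bp, label).
    """
    labeled = []
    for strain, start, end in regions:
        length = end - start + 1
        label = "intact" if length > INTACT_MIN_BP else "incomplete"
        labeled.append((strain, length, label))
    return labeled


# Made-up coordinates for illustration only
example = classify_prophages([("strainA", 1, 8_400), ("strainB", 1_001, 99_900)])
```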
Furthermore, multilocus sequence typing (MLST) analysis reveals two unrelated sequence types (STs) among the K. pneumoniae strains isolated from different patients. The 2036D strain is assigned to ST2632, and the other six strains to ST11 (Table 3). pMLST analysis reveals that all six ST11 K. pneumoniae strains are associated with IncF[F33:A-:b-], whereas the ST2632 strain is associated with IncHI1 and IncF.
Plasmid analysis [10] shows that the individual strains carry different circular plasmids. All strains harbor IncR and ColRNAI plasmids, which contain no virulence genes but carry several resistance-associated genes that confer resistance to carbapenems (Table 3). The IncR plasmid is identified as a multidrug-resistance plasmid and has variable copy numbers of certain resistance genes among the K. pneumoniae isolates.
Detection of Antibiotic Resistance Genes of CRKP Isolates
The antibiotic resistance-associated genes of the seven CRKP strains (Table 3), covering both patient and environmental isolates, are sequenced on the Illumina MiSeq platform. As shown in Table 3, some antimicrobial resistance genes are plasmid-mediated, such as the β-lactamase genes (blaCTX-M, blaKPC, blaLEN, blaTEM) and genes conferring resistance to aminoglycosides [aac(3)-IId, rmtB], chloramphenicol (catA1, catA2), trimethoprim (dfrA1, dfrA17) and fluoroquinolones [QnrS1]. The other antimicrobial resistance genes are chromosomally encoded, including blaSHV (a narrow-spectrum β-lactamase in K. pneumoniae), the efflux pump genes oqxA (1,176 bp) and oqxB (3,153 bp), and fosA (420 bp, fosfomycin resistance).
Except for 2036D, all K. pneumoniae strains harbor blaKPC-2, which is associated with carbapenem resistance. Extended-spectrum β-lactamase (ESBL) genes such as blaCTX-M, blaTEM, blaLEN and blaSHV are also detected. blaTEM is one of the genes that produce ESBL. blaCTX-M of different types (blaCTX-M-14, blaCTX-M-3, blaCTX-M-55 and blaCTX-M-65) is found among all the K. pneumoniae strains: blaCTX-M-3 in 2036D, blaCTX-M-55 in 1567D, blaCTX-M-14 in 1566D and 2040D, and blaCTX-M-65 in the other four strains (2035D, 2037D, 2038D, 2039D). The blaLEN12 gene is found exclusively in the 1566D strain, which carries no blaSHV gene. In contrast, blaSHV-93 is detected in the 2036D strain and blaSHV-11 in the other five K. pneumoniae strains. Except for 2036D, the blaTEM-1B gene is observed in all the other six K. pneumoniae strains. aac(3)-IId and rmtB, which encode aminoglycoside resistance, are observed among all strains. oqxA and oqxB, which confer resistance to fluoroquinolones, are detected exclusively in the 2036D strain. fosA, which confers fosfomycin resistance [11], is also detected among all CRKP strains.
Characterizing CRKP SNPs and Phylogeny
SNP markers are identified for all strains sequenced with short-read MiSeq data. The data show that 33,716 markers are detected in the 2036D strain, far more than in the other strains, which average 8,289 SNPs. cSNPs, located in protein-coding regions, make up the majority of all detected SNPs, with a minimum proportion of 85.5% (Supplementary Table 11). In addition, pairwise comparison reveals that the 2036D isolate is distinct from the other strains based on clusters of sequence similarity computed with a subprogram of Trinity [12] (Figure 2a). Furthermore, the 2036D strain shares few SNP loci with the others, which is consistent with the strain clusters (Figure 2b).
For validation, all strains show a high detection rate: approximately 153 of 200 SNPs (76.4%) are successfully amplified, which supports the accuracy of the analysis (Supplementary Table 11). After filtering out SNP loci that are not located in exome regions, that contain no-allele calls, or that are wild-type in every isolate, we eventually obtain 92 SNPs among the 200 validated loci. A total of 40 of the 92 SNPs are variant in all isolates and could be used to distinguish CRKP strains from ordinary K. pneumoniae (Supplementary Table 12). In addition, 24 strain-specific SNP loci, found in 2036D (18 loci), 2035D (3 loci), 1566D (2 loci) and 2037D (1 locus), would be helpful resources for identifying specific strains in clinical analysis.
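The grouping of validated loci into all-variation loci and strain-unique loci can be illustrated with a small genotype table. The sketch below is an assumption-laden illustration, not the authors' procedure: the input format (a dictionary mapping each locus to per-strain states `"var"`, `"ref"` or `None` for a missing call) and the function name are hypothetical, and the exome-region filter is omitted because it requires annotation data.

```python
def classify_loci(genotypes):
    """Split SNP loci into all-variation loci and strain-unique loci.

    genotypes: dict mapping locus -> dict mapping strain -> state, where
    state is "var" (variant), "ref" (wild type) or None (no allele call).
    Returns (all_variation, unique), where unique maps strain -> loci.
    """
    all_variation = []
    unique = {}
    for locus, calls in genotypes.items():
        if any(state is None for state in calls.values()):
            continue  # drop loci with a missing allele call
        variant_strains = [s for s, state in calls.items() if state == "var"]
        if not variant_strains:
            continue  # drop loci that are wild type in every isolate
        if len(variant_strains) == len(calls):
            all_variation.append(locus)  # variant in all isolates
        elif len(variant_strains) == 1:
            unique.setdefault(variant_strains[0], []).append(locus)
    return all_variation, unique


# Toy data for illustration only (not from the study)
toy = {
    "snp1": {"s1": "var", "s2": "var", "s3": "var"},
    "snp2": {"s1": "ref", "s2": "var", "s3": "ref"},
    "snp3": {"s1": "ref", "s2": "ref", "s3": "ref"},
    "snp4": {"s1": "var", "s2": None, "s3": "var"},
    "snp5": {"s1": "var", "s2": "var", "s3": "ref"},
}
shared, strain_specific = classify_loci(toy)
```

Loci that vary in some but not all strains, and in more than one strain (`snp5` in the toy data), fall into neither group.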
Five CRKP strains previously isolated in Hangzhou [13] are downloaded from GenBank and compared with the strains in our study. The comparison suggests that the CRKP strains from Hangzhou differ from those from Fuzhou, indicating a geographical difference (Supplementary Figure 6). The phylogenetic tree shows that the 1566D strain is the most distantly related to the other strains, while 2036D is so divergent that it is not even included in the tree (Supplementary Figure 6).
GWAS Analysis
To further identify significant SNPs and genes, we perform a genome-wide association study (GWAS). The patients’ body temperature and leukocyte counts are selected as phenotypic traits. The short sequencing reads of six strains (Figure 3) are aligned to the 1567D genome using BWA v0.7.17. We call SNPs using Platypus v0.8.1 [14] and then filter them with plink v1.9, removing SNPs with (i) missing loci, (ii) minor allele frequency (MAF) < 0.05 or (iii) significant deviation from Hardy-Weinberg equilibrium (HWE) (P < 0.01). A total of 698 SNP markers remain and are used for the GWAS analysis. As a result, 9 loci are identified (P < 0.05). Two loci (ygbI and murB) are associated with temperature and the other seven loci (IsrD, SufD, yrkF, fabI, sppA, entF and ttuB) are associated with leukocyte count (Figure 3).
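The three SNP filters described above can be sketched in plain Python to make the thresholds concrete. This is an illustrative stand-in, not plink's implementation: it assumes diploid-style genotype counts (`n_aa`, `n_ab`, `n_bb`) per SNP, treats any missing call as grounds for exclusion (the text does not state a missingness threshold), and uses a simple one-degree-of-freedom chi-square test for HWE, with the p-value obtained from the identity P(χ²₁ > x) = erfc(√(x/2)). All function and parameter names are hypothetical.

```python
import math


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts AA, AB, BB."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic locus: the test is not informative
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))


def keep_snp(n_aa, n_ab, n_bb, n_missing=0, maf_min=0.05, hwe_p_min=0.01):
    """Apply the missingness, MAF and HWE filters to one SNP."""
    if n_missing > 0 or (n_aa + n_ab + n_bb) == 0:
        return False  # assumption: any missing call removes the locus
    if maf(n_aa, n_ab, n_bb) < maf_min:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min
```

With toy counts, `keep_snp(25, 50, 25)` retains the SNP (allele frequency 0.5, genotypes exactly at HWE proportions), whereas `keep_snp(50, 0, 50)` rejects it because the complete absence of heterozygotes is a strong deviation from HWE.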