The 423 ST93-IV isolates were collected across Australia from the following states and mainland territories: Northern Territory (n=141), Queensland (n=98), New South Wales (n=64), Western Australia (n=54), Victoria (n=43), South Australia (n=19), Australian Capital Territory (n=3) and Tasmania (n=1). Overall, there were 302 bacteraemia and 121 non-bacteraemia isolates. The non-bacteraemia isolates were limited to four geographical regions: New South Wales, Victoria, Western Australia and Northern Territory.
A rooted phylogeny built from 1383 core genome SNPs showed that the ST93 population clustered primarily into two main clades (Figure 1). Clade 1 contained 111 bacteraemia isolates, predominantly from northern Australia, whilst clade 2 contained 185 bacteraemia and 119 non-bacteraemia isolates collected across Australia.
Comparison between Principal Component Analysis (PCA) and Phylogenetic Clustering
Based on the presence and absence of accessory genes, PCA identified two distinct clusters (Figure 2). The isolates in the two PCA clusters corresponded to the isolates in the two SNP-derived phylogenetic clades.
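To illustrate the kind of analysis described above, the following minimal Python sketch extracts the first principal component from a small gene presence/absence matrix. The matrix, the isolate labels and the function name are hypothetical, and the power-iteration routine is only a dependency-free stand-in for whichever PCA implementation was actually used, which is not specified here.

```python
import math

def first_principal_component(matrix, iterations=200):
    """Return (loadings, scores) of the first principal component.

    Uses power iteration on the gene-by-gene covariance matrix; this is a
    simple stand-in for a library PCA routine, not the authors' method.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Centre each gene column on its mean carriage frequency.
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centred = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]
    # Sample covariance matrix between genes.
    cov = [[sum(r[a] * r[b] for r in centred) / (n_rows - 1)
            for b in range(n_cols)] for a in range(n_cols)]
    # Start from a unit vector along the first gene (assumed not to be
    # orthogonal to the leading eigenvector, which holds for the toy data).
    vec = [1.0] + [0.0] * (n_cols - 1)
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(n_cols)) for a in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Project each centred isolate onto the component.
    scores = [sum(r[j] * vec[j] for j in range(n_cols)) for r in centred]
    return vec, scores

# Hypothetical presence (1) / absence (0) calls for four accessory genes.
isolates = ["A1", "A2", "A3", "B1", "B2", "B3"]
presence_absence = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

loadings, scores = first_principal_component(presence_absence)
for name, score in zip(isolates, scores):
    print(f"{name}\tPC1 = {score:+.3f}")
```

In this toy example the sign of the PC1 score separates the hypothetical "A" isolates from the "B" isolates, which is the kind of two-cluster pattern that can then be compared against clade membership.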
GWAS Comparison between Bacteraemia and Non-bacteraemia ST93 Isolates
GWAS revealed nine accessory genes correlated with the bacteraemia isolates (p < 0.001 and odds ratio > 1) (Table 1). However, because seven of these genes were clade 1 specific, they were not considered bacteraemia factors.
Table 1: GWAS showing genes significantly correlating to bacteraemia using the presence (+) and absence (-) of each gene in 423 isolates (Bonferroni p value < 0.001 and an odds ratio > 1)
| Gene | Function | Bacteraemia isolates, N (%) | Non-bacteraemia isolates, N (%) |
|---|---|---|---|
| clfA | Clumping factor A | 240 (79.4) | 53 (43.8) |
| hsdM_1 | Type I restriction enzyme EcoKI M protein | 269 (89) | 55 (45) |
| ohrR | Organic hydroperoxide resistance transcriptional regulator* | 103 (34.1) | 0 |
| acuI | Putative acrylyl-CoA reductase AcuI* | 102 (33.7) | 0 |
| ypuA | Hypothetical protein* | 105 (34.7) | 1 (0.7) |
| hutl_2 | Hypothetical protein* | 101 (33.4) | 1 (0.7) |
| entE | Enterotoxin type E* | 101 (33.4) | 0 |
| soj | Chromosome partitioning ATPase* | 101 (33.4) | 1 (0.7) |
| entA_2 | Enterotoxin type A* | 99 (32.7) | 0 |
The two non-clade-specific genes that correlated with bacteraemia were hsdM (type I restriction enzyme EcoKI M protein) and clfA (clumping factor A) (Figure 1). Of the 302 bacteraemia isolates, 76% (n=230) carried both genes, 16% (n=49) carried one of the genes, and the remaining 7% (n=23) carried neither gene. Only 43% and 45% of the non-bacteraemia isolates carried the clfA and hsdM genes, respectively. The difference in the prevalence of clfA and hsdM between the bacteraemia and non-bacteraemia isolates was statistically significant (p value < 0.001).
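The odds ratios behind these associations can be recomputed from the counts in Table 1. The sketch below does so for clfA and hsdM_1 and adds a one-sided Fisher's exact test as an illustrative significance check. The choice of Fisher's exact test is an assumption (the test used is not stated here), the function names are hypothetical, and the p values are uncorrected because the number of genes tested, needed for a Bonferroni correction, is not given.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null (enrichment in the first group)."""
    group1, group2 = a + b, c + d
    carriers = a + c
    total = comb(group1 + group2, carriers)
    tail = sum(comb(group1, x) * comb(group2, carriers - x)
               for x in range(a, min(group1, carriers) + 1))
    return tail / total

N_BACTERAEMIA, N_NON_BACTERAEMIA = 302, 121

# Gene-positive counts taken from Table 1: (bacteraemia, non-bacteraemia).
table1_counts = {"clfA": (240, 53), "hsdM_1": (269, 55)}

# Consistency check on the reported carriage pattern in bacteraemia isolates:
# 230 carry both genes, 49 carry one, 23 carry neither.
assert 230 + 49 + 23 == N_BACTERAEMIA
assert 2 * 230 + 49 == 240 + 269

results = {}
for gene, (bact_pos, non_pos) in table1_counts.items():
    a, b = bact_pos, N_BACTERAEMIA - bact_pos      # bacteraemia: present, absent
    c, d = non_pos, N_NON_BACTERAEMIA - non_pos    # non-bacteraemia: present, absent
    results[gene] = (odds_ratio(a, b, c, d), fisher_one_sided(a, b, c, d))
    print(f"{gene}: OR = {results[gene][0]:.2f}, uncorrected p = {results[gene][1]:.2e}")
```

With these counts the odds ratio is roughly 5 for clfA and roughly 10 for hsdM_1, both above the odds ratio > 1 threshold used in Table 1.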
The seven clade 1 specific accessory genes were ohrR (organic hydroperoxide resistance transcriptional regulator), acuI (putative acrylyl-CoA reductase), ypuA (hypothetical protein), hutl_2 (hypothetical protein), entE (enterotoxin E), soj (chromosome-partitioning ATPase) and entA_2 (enterotoxin A) (Figure 1). Approximately 88% (n=98/111) of the clade 1 genomes harboured all seven genes, and seven isolates contained none of them. The seven genes were located on five different contigs, with entE co-located with soj and acuI co-located with ohrR. The read coverage of the seven genes was higher than that of the chromosome, suggesting that they originate from a mobile genetic element.
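The coverage argument can be made concrete with a small sketch that expresses the read depth of a contig relative to the typical chromosomal depth. All depth values, contig names and the function name below are hypothetical and chosen only for illustration; a ratio well above 1 is consistent with a multi-copy element but does not by itself establish a mobile genetic element origin.

```python
from statistics import median

def relative_coverage(contig_depth, chromosome_depths):
    """Contig read depth divided by the median chromosomal read depth."""
    return contig_depth / median(chromosome_depths)

# Hypothetical mean read depths for chromosomal contigs of one genome.
chromosome_depths = [48.0, 52.0, 50.0, 49.0, 51.0]

# Hypothetical mean read depths for two contigs carrying accessory genes.
candidate_contigs = {"contig_a": 155.0, "contig_b": 148.0}

ratios = {name: relative_coverage(depth, chromosome_depths)
          for name, depth in candidate_contigs.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x chromosomal coverage")
```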
Genomic diversity of ST93 over Time and Location
No significant differences in the presence or absence of accessory genes over time or location were identified.
Recombination/rearrangement of the ST93 genome
When we analysed conserved gene neighbourhoods, we observed rearrangements of some genes that correlated with bacteraemia. For example, some bacteraemia isolates contained different rearrangements of the sftA (DNA translocase), sdrF (serine-aspartate repeat-containing protein F), pls (surface protein), and setC (sugar efflux transporter C) genes. Analysis of the contigs carrying these genes shows that sftA and setC are co-located, while sdrF and pls are located separately.