The continuing evolution of community-associated MRSA ST93-IV in Australia

Background The global emergence of community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) has seen the dominance of specic clones in different regions around the world with the PVL-positive ST93-IV as the predominant CA-MRSA clone in Australia. In this study we applied a genome-wide association study (GWAS) approach on a collection of Australian ST93-IV MRSA genomes to identify genetic traits that may have assisted the ongoing transmission of ST93-IV in Australia. We also compared the genomes of ST93-IV bacteraemia and non-bacteraemia isolates to identify potential virulence factors associated with bacteraemia. Results Based on single nucleotide polymorphism phylogenetics we identied two distinct ST93-IV clades circulating concurrently in Australia. One of the clades contained isolates primarily isolated in the northern regions of Australia whilst isolates in the second clade were distributed across the country. Analyses of the ST93-IV genome plasticity over a 15-year period (2002-2017) revealed an observed gain in accessory genes amongst the clone’s population. The GWAS analysis on the bacteraemia isolates identied two genes that have also previously been associated to this kind of infection. summary, this study has multiple additional possibly the Australian community.


Background
Over the last three decades, community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) has emerged globally. Although polyclonal, a small number of CA-MRSA clones are dominant in different regions of the world such as multilocus sequence type (ST) 8-IV (USA300) in North America, ST80-IV in Europe and Northern Africa, ST59-IV/V in Asia, ST772-V and ST22-IV in the Indian subcontinent, and ST30-IV in the West Paci c region (1). Transmission of the dominant clones in other regions has occurred, and characteristically they harbour the lukS/F-PV genes that encode the Panton-Valentine leukocidin (PVL) toxin (2).
In Australia, the dominant CA-MRSA clone is PVL-positive ST93-IV (3). Colloquially known as the "Queensland CA-MRSA clone", ST93-IV was rst described in the early 2000s. Although known to cause severe infections including necrotizing pneumonia, ST93-IV is typically associated with skin and soft tissue infections (4). Reported across Australia, the clone is frequently isolated in the indigenous Australian population where its dominance is believed to be linked to overcrowding (5), poor hygiene and healthcare (6). Using whole genome sequencing (WGS) and temporal and geographical analysis, ST93 has been shown to be an early diverging and recombinant lineage genetically related to ST59/ST121 and to an unknown S. aureus lineage that emerged in the 1970s in the North Western region of Australia (5). Although earlier studies into the genetic diversity of ST93 showed multiple rearrangements of the spa sequence, the core regions of the genome were very stable (2). However in 2014, Stinear et al. suggested ST93 clone was under pressure for adaptive change due to a reduction in both exotoxin expression and oxacillin minimum inhibitory concentration (7).
To determine the association between gene content and disease, genome-wide association studies (GWAS) can be performed by analysing single nucleotide polymorphisms (SNPs), and the accessory genes provided by WGS data. For example, GWAS performed on isolates from children with acute S. aureus osteomyelitis identi ed an association in the number of virulence genes present and the severity of disease (8). In contrast, when applied to S. aureus bacteraemia isolates, no obvious associations in the number of virulence genes present in isolates from patients with and without S. aureus infective endocarditis were identi ed (9). GWAS can also be used to examine the evolution of a bacterial clone. For example recent GWAS performed on livestock-associated CC398 MRSA, showed the clone frequently lost antimicrobial resistance genes and acquired human speci c virulence genes in relation to the origin of the host (10).
In this study, we performed GWAS on a collection of Australian ST93 MRSA bacteraemia isolates collected over a three-year period (2015-2017) and a collection of previously published ST93 MRSA genomes (2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012). Phylogenetic analysis of the genomes was performed by examining SNPs in the core genome and investigating the absence/presence of accessory genes. To determine what genetic traits may have assisted the ongoing transmission of ST93-IV in Australia we correlated the absence and presence of accessory genes in the ST93-IV genomes to time, location and whether they originated from a bloodstream infection.

Results
The 423 ST93-IV were isolated across Australia from the following states and mainland territories: Northern Territory (n=141), Queensland (n=98), New South Wales (n=64), Western Australia (n=54), Victoria (n=43), South Australia (n=19), Australia Capital Territory (n=3) and Tasmania (n=1). Overall, there were 302 bacteraemia and 121 non-bacteraemia isolates. The non-bacteraemia isolates were limited to four geographical regions: New South Wales, Victoria, Western Australia and Northern Territory. Comparison between Principal Component Analysis (PCA) and Phylogenetic Clustering By examining the presence and absence of accessory genes, PCA identi ed two distinct clusters ( Figure   2). Isolates in the two PCA clusters correlated with isolates in the two SNP derived phylogenetic clades.
GWAS Comparison between Bacteraemia and Non-bacteraemia ST93 Isolates GWAS revealed nine accessory genes correlated with the bacteraemia isolates (p < 0.001 and odds ratio > 1) (Table 1). However, as seven of the genes were clade 1 speci c they were not considered bacteraemia factors. The two non-clade speci c genes that correlated with bacteraemia were hsdM (type I restriction enzyme EcoKI M protein) and clfA (clumping factor A) ( Figure 1). Overall, of the 302 bacteraemia isolates, 76% (n=230) carried both genes; 16% (n=49) carried one of the genes, and the remaining 7% (n=23) carried neither gene. Only 43% and 45% of the non-bacteraemia isolates carried the clfA and hsdM genes respectively. The difference in the prevalence of clfA and hsdM in the bacteraemia and non-bacteraemia isolates was statistically signi cant (p value < 0.001).
The seven clade 1 speci c accessory genes were ohrR (organic hydroperoxide resistance transcriptional regulator), acul (putative acrylyl-Coa reductase), ypuA (hypothetical protein), hutl_2 (hypothetical protein), entE (enterotoxin E), soj (chromosome-partitioning ATPase) and entA_2 (enterotoxin A) ( Figure 1). Approximately 88% (n=98/111) of the clade 1 genomes harboured all seven genes, with seven isolates containing none of the seven genes. The seven genes were located on ve different contigs, with entE and acuI co-located with soj and ohR respectively. The higher read coverage of the seven genes relative to the chromosome suggests the seven genes have a mobile genetic element origin.

Genomic diversity of ST93 over Time and Location
No signi cant differences in the presence or absence of accessory genes over time or location were identi ed.
Recombination/rearrangement of the ST93 genome When we analysed conserved gene neighbourhoods, we observed some genes that had re-arrangements correlating to bacteraemia. For example, some bacteraemia isolates contained different rearrangements of the sftA (DNA translocase), sdrF (serine-aspartate repeat-containing protein F), pls (surface protein), and setC (sugar e ux transporter C) genes. Analysis of the contigs carrying these genes show sftA and setC are co-located, while sdrF and pls are located separately.

Discussion
In the current study we have identi ed two distinct ST93-IV clades circulating concurrently in Australia. The identi cation of the two clades by SNP analysis of the core region was supported by the PCA based on the absence and presence of genes matrix. The clade 1 isolates were primarily isolated in the northern regions of Australia spread over three states/territories (Western Australia, Northern Territory and Queensland) whilst the clade 2 isolates were distributed across the country. Based on the genomic data of the van Hal et al. (5) historic ST93-IV isolates that were located at the root of the phylogenetic tree, we believe the two clades recently diverged from a common ancestor.
Clade 1 isolates differed from the clade 2 isolates by having acquired up to seven additional accessory genes. The known biological signi cance of these accessory genes varies. The entA and entE genes, encode the superantigen enterotoxins A and E respectively and play an important role in serious staphylococcal infections by triggering an overexpression of in ammatory mediators. The ohrR gene, which has previously been identi ed in Pseudomonas aeruginosa (11) and Bacillus subtilis (12), is known to increase an organism's resistance to organic hydroperoxides. The ability to resist peroxide provides the organism a growth advantage and increases its survival in host cells (13). The soj gene, a parA homologue involved in chromosome segregation during DNA replication, is not normally found in S. aureus (14). Typically, chromosome segregation in S. aureus is performed by the parB homologue spo0J, which was identi ed in all ST93-IV genomes. In Bacillus subtilis, soj and spo0J are present and work together to prevent premature midcell Z ring assembly (15). By having acquired soj, clade 1 isolates may have an advantage over non-clade 1 isolates as they have a more e cient DNA replication system. The roles of the three remaining accessory genes, acul (a putative protein), and ypuA and hutl (both hypothetical proteins) are not known. The addition of the seven accessory genes, which are likely to have originated on mobile genetic elements, may explain the high rates of ST93-IV skin infections amongst indigenous children living in the northern regions of Australia (16). Further studies are required to determine if clade 1 has become the predominant ST93-IV strain in the region's indigenous communities.
Based on the variability of the ST93-IV accessory genes over time and location we attempted to identify clade 1 or 2 speci c subclades. Although minor accessory gene variations occurred in a small number of isolates, (for example, four isolates contained qacA [antiseptic resistance protein], qacR [HTH-type transcriptional regulator] and tnsB [transposon] which were all located on the same contig), no signi cant differences in the absence or presence of accessory genes related to speci c subclades were observed.

GWAS for Bacteraemia vs Non-bacteraemia MRSA
In 2017 a GWAS performed by Lilje and colleagues was not able to identify genetic differences between S. aureus bacteraemia and non-bacteraemia genomes (9). Their results however may have been in uenced by studying a variety of S. aureus lineages and clonal complexes. To identify if speci c genetic factors are harboured by S. aureus bacteraemia genomes our study was limited to a single S. aureus lineage. GWAS identi ed two genes associated with the ST93-IV bacteraemic isolates. The hsdM gene has recently been shown to be a hotspot for chromosome rearrangements in staphylococcus causing phenotype switching associated with persistent infections (17). The clfA gene, which mediates staphylococcal binding to brin coated surfaces, has previously been linked to bacteraemia (18).
Chromosome rearrangements of genes may lead to altered gene expression (19). The 23 bacteraemic genomes that did not harbor hsdM and clfA all carried rearrangements of the pls, sdrF, setC and sftA genes. The pls and sdrF genes encode surface proteins. pls, which mediates bacterial aggregation and binding to glycolipids and human epithelial cells (20,21), has been shown in mice models to be an important factor in causing sepsis (22). SdrF, which is a microbial surface components recognising adhesive matrix molecule (MSCRAMM), allows staphylococcus to attach to and colonise host cells (23). The sdr gene in the ST93 genome JKD6159 has previously been reported to be the most diverse amongst different S. aureus clones, suggesting acquisition by horizontal transfer has occurred. In the Huping et al. study the sdr in the ST93 genome was classi ed as sdrC (24). However, an updated annotation database has identi ed the gene as sdrF which had previously only been reported in S. epidermidis. SdrF adheres to human keratinocytes and epithelial cells facilitating S. epidermidis colonisation of the skin (25). The sftA gene encodes for a DNA translocase. DNA translocases play an important role in allowing an organism to convert to the "L-form" cell wall de cient state which is resistant to cell wall-targeting antibiotics (26). DNA translocases have also been shown to be necessary in the formation of bio lms (27). Although the role of SftA in S. aureus is believed to be performed by SpolllE, in Bacillus subtilis SftA and SpoIIIE have been shown to have different functions during cell division. SftA aids in moving DNA from the closing septum while SpoIIIE translocates septum-entrapped DNA (28). We therefore hypothesise the rearrangement of sftA in some ST93-IV causing bacteraemia may provide the organism with a more e cient cell division pathway. setC, which encodes a sugar e ux transporter, is not a known bacteraemia factor. However, the gene is co-located with sftA.

Conclusion
GWAS is a powerful tool to identify associations using large datasets however there are limitations such as the patient's age and prior medical condition which are factors associated with bacteraemia. In the current study we have identi ed accessory genes and gene rearrangements that have strong associations with ST93-IV bacteraemia. However, to con rm this association there is a need to expands the dataset to include more ST93 non bacteraemia isolates. Furthermore, to determine if these genes are bacteraemia determinants other S. aureus lineages should be examined. Phylogenetic analysis has shown ST93-IV has recently gained accessory virulence genes which may be contributing to the clone's persistence in the Australian indigenous communities. The raw sequence reads were assembled de novo using SPAdes V3.12 (32). Sequencing quality control was determined based on average sequencing depth. Thirty-one had genomes less with than 40x coverage and therefore were excluded. The MLST pro les of the remaining 269 genomes were determined using the mlst tool described by Seeman et al. (33).

Methods
In addition to the 269 ASSOP ST93 MRSA, whole genome sequences for 154 ST93 MRSA collected between 2002 to 2012 from Van Hal et al. study (5) were included (Supplementary Table 1).
All sequence data obtained from this study were deposited to the NCBI Sequence Read Archive under BioProject ID PRJNA644215.

Phylogenetic Analysis
Using the chromosome of S. aureus CC398 reference strain SO395 (GenBank accession ID AM990992) as the reference genome, the bacterial variant calling tool snippy V4.1.0 (34) was used to extract SNPs from the core genome. The 423 ST93 genomes were used to generate a rooted maximum parsimony phylogenetic tree using MEGA V10.1.7 (35). Phylogenetic clades were de ned as a cluster of isolates sharing multiple common SNP mutations. The iTOL V3 web service was used to visualise the phylogenetic tree and the corresponding metadata (36).

Genome-Wide Association Study (GWAS)
Genes from the 423 assembled S. aureus genome sequences were annotated with Prokka V1.13 (37) and the pan-genome was extracted by Roary V3.12.0 (38) using the -s option of no paralog splitting. The pangenome matrix of gene presence or absence in each genome was used as input for Scoary V1.6.16 (39) with the following traits; SNP phylogeny clades, location (states and territories), year of isolation and whether the isolate was from a bloodstream infection. Genes that scored a Bonferroni corrected p value ≤ 10 −5 and odds ratio > 1 were further investigated similar to the method described by Arnoud H. M. van Vliet (40). In addition to Scoary analysis, principal component analysis (PCA) of binomial variables on the pan-genome matrix was performed for determination of association with the statistical package R version 3.5.1 (41) and ggplot2 V3.2.1 to con rm relationships between genes identi ed in Scoary and traits. A chi-square test of independence was performed to examine the signi cance of any relationship. An alpha level of 0.05 was used as a signi cance criterion for all statistical tests.
The default Roary parameters using paralogous gene splitting was used to detect rearranged genes. Sequences upstream and downstream of the rearranged genes were analysed to con rm recombination. Availability of data and materials

List Of Abbreviations
The data that support the ndings of this study are openly available on the SRA database under Bioproject: PRJNA644215.

Competing Interest
The authors declare that they have no competing interests

Funding
This work received no speci c grant from any funding agency.
Authors' contributions SP conceived of and designed the study and performed the literature search, generated the gures and tables, and wrote the manuscript. DD, SS and MS collected and analyzed the data, and critically reviewed the manuscript. GC supervised the study and with SM reviewed the manuscript. All authors read and approved the nal manuscript.