Accessory gene pool amplification in S. aureus ST398 correlates with host-switch
To provide a global landscape of S. aureus ST398 isolates in a One Health perspective, we conducted whole-genome sequencing of 146 isolates from 72 patients and 30 isolates from 6 animal species (Table 1), and combined them with 903 high-quality ST398 genomes collected from publicly available databases (Supplementary Fig. S1 and Table S1). This generated an atlas of 1079 genomes representative of 13 typical host species, over 23 years (1998–2021), and 25 countries across 5 continents (Fig. 2).
Table 1
ST398 isolate collections in this study.
Collection | No of ST398 isolates | Time of isolation | Country of isolation | Source | Clinical source |
ASPIRE-ICU (NCT02413242) [37] | 25 | 2016–2018 | Bulgaria, Czech Republic, Estonia, France, Germany, Hungary, Serbia, Spain, The Netherlands, Turkey, UK | Humans | ETA, nose, sputum |
ASPIRE-SSI (NCT02935244) [38] | 57 | 2017–2019 | Belgium, Czech Republic, Estonia, France, Italy, Serbia, Spain, The Netherlands, UK | Humans | Nose, throat, ETA, perineal, wound, blood, urine |
SAATELLITE (NCT02296320) [39] | 35 | 2014–2017 | Belgium, France, Spain, Switzerland | Humans | ETA, pleural fluid, blood |
SHIP-TREND-0 and SHIP-TREND-1 [40] | 18 | 2008–2012 and 2016–2019 | Germany | Humans | Nose |
Raafat et al. (2020) [41] | 7 | 2016–2017 | Germany | Animal (rat) | Nose |
Lekkerkerk et al. (2015) [42] | 10 | 1998 and 2009 | The Netherlands | Humans | n.r. |
Lekkerkerk et al. (2015) [42] | 23 | 2007–2010 | The Netherlands | Animal (chicken, horse, pig, veal calf) | Nose, rectal, pharynx |
Ip et al. (2005) [43] | 1 | 2000–2001 | Hong Kong | Humans | Blood |
ETA: endotracheal aspirate; n.r.: not reported.
To profile accessory gene pools of ST398 isolates, we assembled the accessory genes, i.e., those present in some but not all of the 1079 isolates (Fig. 3A). The accessory gene accumulation curves levelled off as the isolate number increased (Supplementary Fig. S2), indicating sufficient coverage of accessory gene pools. The global accessory genome profile showed a non-random pattern, evidenced by its significant difference (P < 0.001) from 1000 random profiles in a null model analysis. In the global accessory genomes (Fig. 3A), multi-drug resistance genes were dominant among antibiotic resistance genes (ARGs), adherence factors predominated among virulence factors (VFs), and genetic information processing was a dominant KEGG pathway. To reduce the bias caused by the uneven distribution of isolates across years, we used periods instead of years to reveal accessory genome succession (Fig. 3B-H). From the period 1998–2005 to 2006–2010, the substantial increase (by 201%) in the number of accessory genes aligned with the increase in the number of host species (by 350%); thereafter, both remained relatively stable with minor increases (by 0–22%). Expansion of the accessory gene pool correlated temporally and positively with increasing diversity of host species (Spearman’s P = 0.051), indicating that the accessory gene pool can provide clues to multiple-host tropism. These results support our hypothesis (Fig. 1B) that the substantial expansion of the accessory gene pool tends to occur during the initial stage (no later than 2010) of frequent host-jumps. The actual time of the events is likely earlier, since strain emergence would have preceded strain collection. From the period 2011–2015 to 2016–2021, the proportions of ST398 isolates significantly increased in pigs and dogs but decreased in cattle and poultry (P < 0.05 for all, Fisher's test) (Fig. 3H).
The numbers of VFs and ARGs slightly decreased, whereas the number of pathway genes increased, from the period 2011–2015 to 2016–2021 (Fig. 3C-F). The proportion of VFs among accessory genes decreased significantly (P = 0.045) from 0.82% (2011–2015) to 0.52% (2016–2021), while the proportions of ARGs and pathway genes did not show any significant shifts over the same interval. To assess whether the accessory genome succession was driven by deterministic (e.g., biotic and abiotic selection) or stochastic (e.g., genetic drift) processes, the null-model based normalized stochasticity ratio (NST), a general framework to quantify ecological stochasticity [15], was applied. NST increased significantly (P < 0.001 in both comparisons, Wilcoxon test) from the period 2006–2010 onwards (Supplementary Fig. S3A), indicating an increasing stochasticity in accessory genome changes. This finding was further supported by the significantly (P < 0.001) higher NST in accessory genome changes between more recent periods (Supplementary Fig. S3B). This result was in line with the low importance (6% of variable explanation in distance-based redundancy analysis (dbRDA)) of collection year in shaping accessory genomes (Fig. 3G).
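The definition of accessory genes and the accumulation curve can be illustrated with a minimal sketch. It assumes a hypothetical input format (a dictionary mapping each isolate to its set of gene names) and toy gene names; it is not the pipeline used in this study, and the permutation count and seed are arbitrary choices.

```python
import random


def accessory_genes(presence):
    """Return genes present in at least one, but not all, isolates.

    presence: dict mapping isolate id -> set of gene names (hypothetical format).
    Assumes at least one isolate is given.
    """
    all_genes = set().union(*presence.values())
    core_genes = set.intersection(*presence.values())
    return all_genes - core_genes


def accumulation_curve(presence, n_permutations=100, seed=0):
    """Mean number of distinct accessory genes seen after adding 1..N isolates.

    Isolates are added in random order; the curve is averaged over permutations.
    A curve that flattens suggests the accessory gene pool is well sampled.
    """
    accessory = accessory_genes(presence)
    rng = random.Random(seed)
    isolates = list(presence)
    totals = [0] * len(isolates)
    for _ in range(n_permutations):
        rng.shuffle(isolates)
        seen = set()
        for step, isolate in enumerate(isolates):
            seen |= presence[isolate] & accessory
            totals[step] += len(seen)
    return [total / n_permutations for total in totals]


# Toy data (hypothetical isolates and genes, for illustration only).
toy = {
    "iso1": {"geneA", "geneB", "geneC"},
    "iso2": {"geneA", "geneB"},
    "iso3": {"geneA", "geneD"},
}
print(sorted(accessory_genes(toy)))
print(accumulation_curve(toy))
```

In this toy example geneA is carried by every isolate and is therefore treated as core; the remaining genes form the accessory pool, and the final point of the curve equals the size of that pool.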
To further explore the mechanisms underlying the global distribution of accessory genomes of ST398, we assessed the impact of collection country and host species. Country is a major determinant (29% of variable explanation) in shaping accessory genomes (Fig. 3G). Countries might provide geographic barriers to restrict dispersal of accessory genes [16], also accompanied by varied environmental characteristics and anthropogenic interventions that cause selection stress on the microbial gene pools [17]. To provide additional evidence of geographic barriers, isolate dissemination across continents was inferred as previously described [18] by using a strict threshold of Mash distances < 0.0005 (Supplementary Fig. S4). The potential transfer of ST398 was continent-dependent, with more frequent transfers between Europe and North America than between Europe and other continents, and no transfer was observed between Asia and other continents, despite a higher representation of isolates from Asia compared to America and Oceania (Fig. 4). ST398 transfer between humans and animals was more frequently detected between Oceania and Europe than between other continents. These differences might be related to the differences in human travel and farm animal trade across different continents [19].
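The threshold-based inference of cross-continent dissemination can be sketched as follows. Only the cutoff (Mash distance < 0.0005) comes from the text; the input format, isolate names and distances are hypothetical, and the sketch assumes pairwise Mash distances have already been computed elsewhere.

```python
from collections import Counter

MASH_THRESHOLD = 0.0005  # strict cutoff stated in the text


def count_cross_continent_links(distances, continent_of, threshold=MASH_THRESHOLD):
    """Count isolate pairs below the distance threshold, per continent pair.

    distances: iterable of (isolate_a, isolate_b, mash_distance) tuples
    continent_of: dict mapping isolate id -> continent name
    Pairs within the same continent are ignored.
    """
    links = Counter()
    for isolate_a, isolate_b, distance in distances:
        continent_a = continent_of[isolate_a]
        continent_b = continent_of[isolate_b]
        if distance < threshold and continent_a != continent_b:
            links[tuple(sorted((continent_a, continent_b)))] += 1
    return links


# Toy data (hypothetical isolates and distances, for illustration only).
continent_of = {"e1": "Europe", "e2": "Europe", "n1": "North America", "a1": "Asia"}
distances = [
    ("e1", "n1", 0.0002),  # below threshold, different continents
    ("e1", "a1", 0.0030),  # too distant
    ("e1", "e2", 0.0001),  # same continent, ignored
    ("e2", "n1", 0.0005),  # equal to the threshold, not strictly below it
]
links = count_cross_continent_links(distances, continent_of)
print(dict(links))
```

Because continents differ in isolate representation, raw counts of this kind would need normalization before any comparison between continent pairs.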
Frequent host-switch rather than host-adaptation shapes the evolution of S. aureus ST398
A host-jump is accompanied not only by acquisition of novel genes but also by loss of dispensable genes during long-term co-evolution with the new host [1]. In this context, host species is assumed to be critical to distinguish accessory genomes (hypothesis (I) in Fig. 1A). Unexpectedly, our results revealed that host species was not a determinant (only 9% of variable explanation) for accessory genomes (Fig. 3G), and thus did not support hypothesis (I). The accessory genetic elements (e.g., ARGs, VFs and phages) relevant to isolate adaptation, colonization and infection were revealed if they were present exclusively in one host species (Fig. 5A). Following the rationale of hypothesis (I), some of the host-exclusive genetic elements would be detected in all isolates of a specific host species, because these elements would have been selected for isolate adaptation during long-term co-evolution with this host species. However, our results showed that none of the host-exclusive genetic elements were present in all isolates of a specific host species (Fig. 5A). The same result was found for all host-exclusive accessory genes (Supplementary Table S2). This result is unlikely to be attributable to ST398 resisting selection by host species, because the accessory gene pool expanded with the increasing diversity of host species (Fig. 3B and E). A probable explanation is then hypothesis (II), that the host-jump is shortly followed by jump-back and/or jump-spillover (Fig. 1), so that there would be insufficient time for co-evolution with a novel host species to generate a host-specific accessory gene pool. Mutations in the core genome are another determinant of microbial adaptation in host species [20, 21]. However, host species barely (4%) determined the variations of ST398 core genome phylogeny based on single nucleotide polymorphisms (SNPs) (Supplementary Fig. S5 and S6), further corroborating the limited impact of host co-evolution on the ST398 core genome.
Such an evolutionary trajectory might be the result of increased globalization of livestock-trade and transportation, and anthropogenic intervention, which increases opportunities for direct/indirect contacts of pathobionts with different host species.
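The test of hypothesis (I) described above has two steps: find genetic elements exclusive to one host species, then ask whether any of them is carried by every isolate of that host. A minimal sketch, assuming a hypothetical input format and toy data (not the study's actual workflow), is:

```python
def host_exclusive_genes(presence, host_of):
    """Map each host species to the genes found only in isolates from that host.

    presence: dict mapping isolate id -> set of gene names (hypothetical format)
    host_of: dict mapping isolate id -> host species
    """
    genes_by_host = {}
    for isolate, genes in presence.items():
        genes_by_host.setdefault(host_of[isolate], set()).update(genes)
    exclusive = {}
    for host, genes in genes_by_host.items():
        genes_elsewhere = set()
        for other_host, other_genes in genes_by_host.items():
            if other_host != host:
                genes_elsewhere |= other_genes
        exclusive[host] = genes - genes_elsewhere
    return exclusive


def universal_exclusive_genes(presence, host_of):
    """Host-exclusive genes that are carried by every isolate of that host."""
    result = {}
    for host, genes in host_exclusive_genes(presence, host_of).items():
        isolates = [i for i in presence if host_of[i] == host]
        result[host] = {g for g in genes if all(g in presence[i] for i in isolates)}
    return result


# Toy data (hypothetical isolates and genes, for illustration only).
presence = {
    "h1": {"core", "x"},
    "h2": {"core", "y"},
    "p1": {"core", "z"},
    "p2": {"core", "z", "w"},
}
host_of = {"h1": "human", "h2": "human", "p1": "pig", "p2": "pig"}
print(host_exclusive_genes(presence, host_of))
print(universal_exclusive_genes(presence, host_of))
```

In the toy data the human isolates mimic the pattern reported here (exclusive genes exist, but none is shared by all isolates of the host), whereas the pig isolates show the contrasting pattern that hypothesis (I) would predict.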
To provide further evidence of a frequent host-switch of ST398, we traced the potential transfer of accessory genes of ST398 across different host species. A wide dissemination of accessory genes across different host species was revealed with each host species showing gene exchange with 5–11 other host species (Fig. 6A and Supplementary Fig. S7). Such wide gene exchange among ST398 across host-species is probably because of the close phylogenetic relatedness among ST398 isolates from various sources, which facilitates this process [22, 23]. Our results showed that humans and cattle were the most common hosts where ST398 acquired and donated accessory genes, respectively (Fig. 6A). This indicated that humans were potentially a major donor for ST398 jumps to other host species, because the genes acquired from isolates in other host species would facilitate jumps into these host species. A large-scale survey of host-jumps of S. aureus based on isolate phylogeny also found humans to be a major donor of jumps to other host species [1].
Despite the wide transfer of accessory genes of ST398, the transfer tended to be host-species dependent. For example, gene transfer between humans and pigs was more common (13% and 10% in both directions, respectively) than between humans and other host species (mean 10% and 8% in both directions, respectively) (Fig. 6A). To provide further evidence for the above finding and reduce the bias caused by uneven sample distribution across different host species, network analysis at the individual isolate level was applied. Similar accessory genomes were more frequently detected between ST398 from humans and pigs than between any other host species (Fig. 6B and Supplementary Fig. S8), further supporting the above finding. The first emergence of ST398 was in pig farms [5], and instances of ST398 transfer between humans and pigs have been frequently reported [5, 24]. Interestingly, some isolates from different countries and years also showed similar accessory genomes, indicating that gene exchange might occur through an ecologically shared accessory gene pool. Smillie et al. demonstrated that bacteria from a similar ecological niche are more likely to share accessory genetic elements than those from distinct environments, probably by gene acquisitions from a shared pool of mobile DNA [16]. Taken together, our findings point to the complexity of (co-)evolutionary trajectories governing ST398 multiple-host tropisms.
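One way to picture an isolate-level network of similar accessory genomes is to link isolates whose accessory gene sets exceed a similarity cutoff. The sketch below uses Jaccard similarity and a cutoff of 0.9; both are assumptions made for illustration, since the similarity measure and threshold are not specified in this section, and the isolates and genes are hypothetical.

```python
from itertools import combinations


def jaccard(genes_a, genes_b):
    """Jaccard similarity of two gene sets (1.0 when both are empty)."""
    union = genes_a | genes_b
    return len(genes_a & genes_b) / len(union) if union else 1.0


def similar_pairs(accessory, min_similarity=0.9):
    """Return isolate pairs (network edges) whose accessory genomes are similar.

    accessory: dict mapping isolate id -> set of accessory gene names
    min_similarity: hypothetical cutoff, not taken from the study
    """
    return [
        (a, b)
        for a, b in combinations(sorted(accessory), 2)
        if jaccard(accessory[a], accessory[b]) >= min_similarity
    ]


# Toy data (hypothetical isolates and genes, for illustration only).
accessory = {
    "human1": {"a", "b", "c"},
    "pig1": {"a", "b", "c"},
    "cattle1": {"a", "d"},
}
edges = similar_pairs(accessory)
print(edges)
```

Counting such edges per pair of host species, relative to the number of possible isolate pairs, is one way to limit the influence of uneven sampling across hosts.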
Human-ST398 is marked by specific VFs and ARGs
Exotoxin genes lukS-PV and lukF-PV encoding the Panton-Valentine Leukocidin, a cytotoxin that causes leukocyte destruction and tissue necrosis, and seb encoding the staphylococcal enterotoxin B, implicated in massive food poisoning and marked as a bioterrorism threat [25], as well as the ARGs aph(3')-IIIa (conferring resistance to aminoglycosides) and tetO (conferring resistance to tetracycline), were abundant and exclusive to ST398 in humans. Among the ARGs and VFs that were detected in multiple host species, ermC (conferring resistance to macrolide-lincosamide-streptogramin) and parC (conferring resistance to fluoroquinolones), and sak (encoding staphylokinase), scn (encoding staphylococcal complement inhibitor) and chp (encoding chemotaxis inhibitory protein) were significantly associated with human isolates compared to animal isolates (P < 0.05 in all, Supplementary Fig. S9 and Table S3). These VFs are important immune modulators [26]. Most human-ST398 isolates exclusively harboring lukS-PV and lukF-PV were not MRSA, except two isolates that carried mecA separately with mobile elements SCCmec IV, V and VII (Supplementary Table S4). Although S. aureus isolates harboring lukS-PV and lukF-PV have been reported to transfer between humans and animals [27], our global surveillance of ST398 did not identify such instances. mecA was detected in most ST398 isolates interspersed among different host species but not in the relatively recent phylogenetic branches (e.g., ST398-9 and ST398-10), generally aligning with the distribution of the SCCmec (Fig. 7A and Supplementary Table S4). The numbers of both genomic islands and ARG-carrying plasmids were higher in poultry-ST398 than in other host species (Supplementary Fig. S10). Poultry pathogens have been found to harbor novel mobile genetic elements that were not detected in human pathogens [28].
Among all the ARGs carried by plasmids, genes conferring resistance to tetracycline were most frequently detected, implying a high transfer potential of resistance to this antibiotic class. Tetracycline is one of the major antimicrobials used in farms and tetracycline resistance is widely distributed among ST398 [29]. Accessory genome co-evolution analysis revealed that the co-evolution events among ARGs, VFs, plasmids and insertion sequences occurred more frequently among human isolates than animal isolates (Supplementary Fig. S11), aligning well with a higher rate (P < 0.001) of gene exchange among human isolates (Supplementary Fig. S12). This indicates a potentially higher evolution/diversification rate among human isolates, which was further supported by a higher (P < 0.001) SNP dissimilarity among human isolates than animal isolates (Supplementary Fig. S13).
Nutrient availability and metabolism drive global diversification of S. aureus ST398
Among the global genomes of ST398 (n = 1079), we utilized the fast BAPS algorithm based on hierarchical Bayesian clustering analysis [30], and de novo detected 26 subtypes based on core genomes (Fig. 7A and Supplementary Fig. S5). Isolates were distinguished by subtypes in the phylogenetic tree, and the variation of core genome phylogeny was explained maximally by subtype (68%), much more than by country (18%), the previously defined clade (Supplementary Table S5, see Methods) (10%), host species (4%) and year (3%) (Supplementary Fig. S6). Therefore, the subtypes could reflect the global diversification of ST398 core genomes. To reveal the major differences across subtypes, a consensus sequence among the isolates within a subtype was generated as the representative sequence for this subtype, and 177 SNPs across these representative sequences were detected (Supplementary Table S6). Most of these SNPs occurred in genes involved in carbohydrate metabolism (n = 15), followed by translation (n = 12) (Fig. 7B and Supplementary Fig. S14). When grouping host-exclusive genes into pathways, most were also involved in metabolic pathways (43%), such as energy metabolism and carbohydrate metabolism (Fig. 5B). These results together indicated that the global diversification of ST398 mainly occurred in nutrient metabolism pathways. Differences in lactose metabolism and in carbon metabolism have been found to be linked to genome diversification of S. aureus [1] and Bathyarchaeia (one of the most abundant microorganisms on earth) [31], respectively. Nutrient availability regulates microbial niches and interactions (e.g., competition and cooperation) [32], and thus is probably a major force for microbial diversification. Within nutrient metabolism, carbohydrate metabolism appeared to be a key factor for diversification of ST398 (Fig. 5B and Fig. 7B).
Carbohydrate is a critical nutrient to regulate microbial growth and responses to perturbations [33], and the differences in microbial carbohydrate metabolism probably reflect differences in microbial realized niches [34].
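The subtype comparison described in this section (one consensus sequence per subtype, then SNPs across those representatives) can be sketched as below. The majority-rule consensus and the toy aligned sequences are assumptions for illustration; the consensus method actually used is not detailed in this section.

```python
from collections import Counter


def consensus(sequences):
    """Majority-rule consensus of equal-length aligned sequences.

    Ties are resolved by whichever base is encountered first, so real data
    would need an explicit tie-breaking and ambiguity-handling policy.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


def snp_positions(representatives):
    """0-based alignment positions at which the representative sequences differ."""
    columns = zip(*representatives.values())
    return [pos for pos, column in enumerate(columns) if len(set(column)) > 1]


# Toy alignments (hypothetical subtypes and sequences, for illustration only).
subtype_alignments = {
    "subtype_1": ["ACGT", "ACGT", "ACGA"],
    "subtype_2": ["ATGT", "ATGT", "ACGT"],
}
representatives = {name: consensus(seqs) for name, seqs in subtype_alignments.items()}
print(representatives)
print(snp_positions(representatives))
```

Mapping each SNP position back to its gene and pathway annotation would then give the per-pathway counts summarized in the text.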
Phylogenomic characteristics of S. aureus ST398 associated with infection
To reveal the infection-related evolution and succession of ST398, we narrowed analyses to isolates (n = 146) from patients (n = 72) whose clinical demographics were available (Supplementary Table S7). Most isolates showed a conserved evolution and succession along the duration of hospitalization of the host-patients, except the two isolates from patient 20007430011 that showed an obvious jump across tree branches (143 SNPs) (Fig. 8). The two isolates were collected separately from blood and endotracheal aspirate within a four-day interval. Since a similar sample collection pattern also occurred in the other patients, the phylogenetic jump is unlikely to be linked to the collection pattern and is more probably related to rare events of phylogenetic diversification. Such diversification has also been found in Klebsiella pneumoniae ST15 isolated from patients during hospitalization [18]. In general, the isolates clustered strongly by host patient (P = 0.001, R2 = 0.80, dbRDA, Supplementary Fig. S15), regardless of infection status, body sites, countries, years, clades or subtypes, with isolates from the same patient (except 20007430011) showing a low dissimilarity of less than 46 SNPs (Supplementary Fig. S16). These results altogether indicate a conserved phylogenomic evolution of ST398 within a patient.
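The within-patient comparison can be illustrated by computing pairwise SNP distances among isolates from the same patient. This is a minimal sketch on toy aligned sequences with hypothetical patient labels; a real analysis would work on a core genome alignment and would need to handle gaps and ambiguous bases, which are ignored here.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Number of differing positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for base_a, base_b in zip(seq_a, seq_b) if base_a != base_b)


def max_within_patient_distance(sequences, patient_of):
    """Largest pairwise SNP distance per patient (patients with >= 2 isolates).

    sequences: dict mapping isolate id -> aligned sequence
    patient_of: dict mapping isolate id -> patient id
    """
    by_patient = {}
    for isolate, sequence in sequences.items():
        by_patient.setdefault(patient_of[isolate], []).append(sequence)
    return {
        patient: max(snp_distance(a, b) for a, b in combinations(seqs, 2))
        for patient, seqs in by_patient.items()
        if len(seqs) > 1
    }


# Toy data (hypothetical isolates, patients and sequences, for illustration only).
sequences = {"i1": "ACGTAC", "i2": "ACGTAT", "i3": "ACGTAC", "i4": "TTGTGG", "i5": "ACGTAC"}
patient_of = {"i1": "P1", "i2": "P1", "i3": "P2", "i4": "P2", "i5": "P3"}
within = max_within_patient_distance(sequences, patient_of)
print(within)
```

A patient whose maximum distance stands far above the others, as P2 does in the toy data, would be flagged for closer inspection in the same way as the outlier patient described above.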
To explore phylogenomic characteristics of ST398 associated with infection, we compared isolates from patients who developed infections with ST398 to those from patients who remained colonized. To avoid overrepresentation of isolates from a patient, and given the observed conserved phylogenetic evolution of isolates within a patient, only one isolate from each patient was included in the analysis. The differences between infection and colonization isolates were mainly reflected in accessory genomes but not in core genomes (Supplementary Fig. S17), with infection isolates showing a significantly lower rate (P < 0.0001) of gene exchange (Supplementary Fig. S18). Correlation analyses based on two independent approaches showed that the clade EM-HA-ST398 (P < 0.05), subtype ST398-9 (0.05 < P < 0.1), and the phages StauST398_5 and StauST398_1 (0.05 < P < 0.1) were closely associated with infection development (Supplementary Fig. S19). The clade EM-HA-ST398 defined human-derived ST398 isolates that harbored both StauST398_4 and StauST398_5 (Supplementary Table S5) [6, 8, 35]. Both EM-HA-ST398 and StauST398_5 have been reported to be associated with invasive infections in humans [8, 35]. The ST398-9 subtype is a relatively recent branch of ST398 global diversification (Fig. 7A), implying ST398’s evolution towards infectiousness. StauST398_1 has been previously detected in ST398 [35], but its relatedness to ST398 infection has been rarely reported. These phylogenomic characteristics of infection isolates detected in this study provide novel avenues for characterizing the molecular basis of ST398 infection. There were no statistical correlations of ARGs, VFs, plasmids or insertion sequences with infection (Supplementary Fig. S19). However, we could not completely rule out their associations with infection, because the patients included developed diverse types of infection and were from different countries (Fig. 8), where phylogenomic characteristics of ST398 associated with infection might differ. Thus, a global-scale clinical study to investigate given types of infection in specific regions is warranted. Although global surveillance of pathogens is widely favored [1, 36], such studies need to take into account the biases caused by uneven sample distribution across different regions, time and host species. In order to minimize the impact of such biases on our results, we used several mitigation approaches such as normalized proportions, multiple independent analysis approaches, and isolate-level analyses.
This study, using a One Health perspective, provides a global high-resolution landscape of diversity and evolution of accessory genomes of a typical multiple-host pathobiont over 20 years. The low-level differentiation of ST398 accessory genomes across different host species indicated that a host-jump tended to be followed shortly by jump-back or jump-spillover rather than by long-term co-evolution with a new host species to generate host-specific gene pools. This pattern has persisted and remains prevalent, shaping the evolution of S. aureus ST398 in humans and livestock.