Whole-genome based phylogeny, population genetic analysis, and average nucleotide identity provided a high-resolution taxonomy
To assess the phylogenetic relationships within the Hafnia genus, a phylogenetic tree was constructed using the concatenated nucleotide sequences of 2045 core genes (Additional file 2: Table S2) from 20 newly sequenced and 27 publicly available Hafnia strains. The core genome tree generated a reliable delineation of phylogenetic relationships across the Hafnia genus. According to our core genome tree, the 47 strains were divided into two phylogenetic lineages (Fig. 1); one lineage contained 26 strains (designated alvei), while the other lineage included 21 strains (designated paralvei). The lineages alvei and paralvei formed distinct, extremely tight clusters in separate clades, suggesting that the genetic differentiation between alvei and paralvei genomes occurred during adaptation to different niches and co-evolution with their hosts, in which other microbes could be an important driving factor. To further explore the genomic similarities among strains, we complemented our phylogeny with a population genetic structure analysis using the program STRUCTURE. As shown in Fig. 1, the strains generally clustered into two structure clusters in a pattern matching that observed in the phylogenetic analysis. Together, the phylogenetic and population structure analyses provided a reliable delineation of the genetic relationships between the species alvei and paralvei. In addition, we identified a mislabeled strain, ATCC 51873, which was previously classified as alvei and should be reclassified as paralvei.
The average nucleotide identity (ANI) was used to delineate species and to estimate the genetic distance between strains at the genomic level. We calculated the pairwise ANI values of the 47 strains to examine the inter-species genetic relatedness within the Hafnia genus. Based on the ANI results, the strains were clearly divided into two groups, consistent with the phylogenetic analysis. The ANI values within alvei and within paralvei were above 96% and 93%, respectively (Additional file 3: Table S3). It is worth noting that the ANI values between alvei and paralvei were approximately 87%, below the recommended 95% threshold for species circumscription, illustrating the prominent genetic distance between these two species.
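The ANI-based grouping described above can be illustrated with a minimal sketch. It assumes a single-linkage rule (two strains are joined whenever their pairwise ANI meets the cutoff), which is one simple way to apply a threshold and not necessarily the procedure used in this study. The strain labels and ANI values below are hypothetical placeholders, not data from Table S3.

```python
def group_by_ani(strains, ani, threshold=95.0):
    """Single-linkage grouping: join strains whose pairwise ANI >= threshold."""
    parent = {s: s for s in strains}

    def find(s):
        # Follow parent links to the group representative (with path halving).
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical strains and pairwise ANI values (%), for illustration only.
strains = ["A1", "A2", "P1", "P2"]
ani = {
    ("A1", "A2"): 97.2,
    ("A1", "P1"): 87.1,
    ("A1", "P2"): 86.8,
    ("A2", "P1"): 87.0,
    ("A2", "P2"): 86.9,
    ("P1", "P2"): 96.1,
}

groups = group_by_ani(strains, ani)
strict_groups = group_by_ani(strains, ani, threshold=98.0)
```

The cutoff is kept as a parameter because the grouping can change when within-species ANI values fall close to the chosen threshold.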
Characterizing the core and pan-genomes
To assess the genetic diversity, we constructed the core (genes shared among all 47 strains) and pan (all genes found across all 47 strains) genome curves of the Hafnia genus (Fig. 2a). In the Hafnia pan-genome, 13255 gene families were identified across the 47 genomes, of which 2529 constitute the core genome. The pan-genome curve rises with the novel gene families contributed by each additional genome. Conversely, the core genome curve declines continuously as genomes are added. Interestingly, the pan-genome curve was strongly affected by the addition of a large number of novel gene families in additional alvei genomes (Fig. 2a), suggesting that there are substantial differences in gene content between alvei and paralvei.
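As a minimal sketch of how such accumulation curves can be computed, the following example adds genomes in random order and tracks the union (pan) and intersection (core) of their gene-family sets, averaging over permutations. The function name, the permutation count, and the toy gene sets are assumptions made for illustration; the software and sampling scheme used in this study may differ.

```python
import random


def pan_core_curves(gene_sets, n_permutations=100, seed=0):
    """Mean pan- and core-genome sizes as genomes are added in random order."""
    rng = random.Random(seed)
    names = list(gene_sets)
    n = len(names)
    pan_totals = [0] * n
    core_totals = [0] * n
    for _ in range(n_permutations):
        order = names[:]
        rng.shuffle(order)
        pan = set()
        core = None
        for i, name in enumerate(order):
            genes = gene_sets[name]
            pan |= genes                                          # union grows
            core = set(genes) if core is None else core & genes   # intersection shrinks
            pan_totals[i] += len(pan)
            core_totals[i] += len(core)
    pan_curve = [t / n_permutations for t in pan_totals]
    core_curve = [t / n_permutations for t in core_totals]
    return pan_curve, core_curve


# Toy gene-family sets for three hypothetical genomes.
gene_sets = {
    "s1": {"g1", "g2", "g3", "g4"},
    "s2": {"g1", "g2", "g3", "g5"},
    "s3": {"g1", "g2", "g6"},
}
pan_curve, core_curve = pan_core_curves(gene_sets)
```

The last point of each curve is independent of the sampling order: it equals the total number of gene families and the number of families shared by every genome, respectively.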
Having determined the genomic differences, we characterized the core and pan-genomes of alvei and paralvei separately (Fig. 2b). The pan-genomes of both species show a clear upward trend in agreement with the Heaps' law pan-genome model, and a robust fit to the data of both species was obtained with an increasing power law with positive exponents of γ = 0.8133 and 0.4500 (Fig. 2b). An exponent γ > 0 indicates an open pan-genome. Additionally, we found that the paralvei pan-genome is approximately 3000 genes larger than the alvei pan-genome (11148 and 8218, respectively), suggesting that strains of this species undergo more frequent genetic exchange events and draw on a larger source of gene content.
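The power-law fit mentioned above can be sketched as follows, assuming the pan-genome size follows P(N) = κ·N^γ, so that log P is linear in log N and γ is the slope of an ordinary least-squares line. The parameterization, the function name, and the synthetic input are assumptions for illustration; the fitting procedure actually used for Fig. 2b is not specified here.

```python
import math


def fit_power_law(n_genomes, pan_sizes):
    """Least-squares fit of log(P) = log(kappa) + gamma * log(N)."""
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(p) for p in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


# Synthetic curve generated from known parameters (not study data).
n_genomes = list(range(1, 21))
pan_sizes = [4000 * n ** 0.45 for n in n_genomes]
kappa, gamma = fit_power_law(n_genomes, pan_sizes)
is_open = gamma > 0  # a positive exponent means the pan-genome keeps growing
```

Because the synthetic data are noise-free, the fit recovers the generating parameters; with real accumulation curves the estimate depends on how the genome orderings are sampled.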
In contrast to the pan-genome curves, the number of core gene families decreased sharply with additional genomes, reaching minimum values of 3134 and 2917 for alvei and paralvei, respectively. We further used Cluster of Orthologous Groups (COG) assignments to determine the functional categories of the core gene families of alvei and paralvei. The core gene families of both species were unevenly distributed across the functional categories (Fig. 2c). Larger proportions of the core gene families of both alvei and paralvei were involved in transcription (category K: 6.8% and 7.5%), energy production (category C: 7.3% and 6.7%), and the transport and metabolism of carbohydrates, amino acids and inorganic ions (categories G: 7.8% and 8.0%, E: 10.2% and 9.2%, and P: 6.8% and 6.3%). It is notable that most of the core gene families in both alvei and paralvei play important roles in maintaining growth and reproduction.
Species-specific core genomes revealed the extent of divergence between alvei and paralvei
The species-specific pan-genome content reveals an underlying profile of gene families that are conserved among strains of a species, some of which are unique to that species. To identify the species-specific gene families, we constructed the accessory genome by subtracting the core genome and low-frequency genes (< 10 examples) from the pan-genome. As shown in Fig. 2d, the cluster map of the Hafnia accessory genome demonstrates that each species is differentiated by a set of conserved gene families (framed in black, Fig. 2d). A total of 213 and 183 gene families were identified as part of the alvei- and paralvei-specific core genomes, respectively (Additional file 4: Table S4). Based on KEGG annotation, the functional categories “carbohydrate metabolism”, “lipid metabolism”, “metabolism of cofactors and vitamins” and “membrane transport” were enriched in the alvei-specific core genome; the functional categories “carbohydrate metabolism” and “membrane transport” were enriched in the paralvei-specific core genome (Fig. 3a). These species-specific genes indicate putative niche differentiation between alvei and paralvei.
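The filtering described above can be sketched on a gene-family presence table. The accessory step follows the text (drop families present in every genome and families found in fewer than a minimum number of genomes). The species-specific step uses a strict criterion, present in all strains of one species and in none of the other, which is an assumption made for this illustration and may be tighter than the criterion applied in the study. All names and the toy data are hypothetical.

```python
def accessory_families(presence, n_genomes, min_count=10):
    """Families that are neither core (in all genomes) nor low frequency."""
    return {
        fam for fam, strains in presence.items()
        if min_count <= len(strains) < n_genomes
    }


def species_specific_core(presence, species_strains, other_strains):
    """Families found in every strain of one species and in no strain of the other."""
    return {
        fam for fam, strains in presence.items()
        if species_strains <= strains and not (strains & other_strains)
    }


# Toy example with five hypothetical genomes; min_count is lowered to 2 to fit it.
alvei = {"a1", "a2", "a3"}
paralvei = {"p1", "p2"}
presence = {
    "core_fam": alvei | paralvei,
    "alvei_fam": set(alvei),
    "paralvei_fam": set(paralvei),
    "rare_fam": {"a1"},
    "mixed_fam": {"a1", "p1"},
}

accessory = accessory_families(presence, n_genomes=5, min_count=2)
alvei_core = species_specific_core(presence, alvei, paralvei)
paralvei_core = species_specific_core(presence, paralvei, alvei)
```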
Three and two complete pathway modules are present in the alvei- and paralvei-specific core genomes, respectively (Fig. 3b). The EvgS-EvgA two-component regulation system in Escherichia coli is a transcriptional regulator of drug efflux genes and is closely related to multi-drug resistance. The cellobiose phosphotransferase system confers the assimilation of cellobiose by cleaving the disaccharide into glucose and glucose-1-phosphate, which serve as carbon and energy sources. MetD (MetI-N-Q) is a high-affinity transport system for methionine, which is an important amino acid involved in numerous metabolic processes in bacteria. The CreC-CreB (carbon source-responsive) two-component regulation system in Escherichia coli affects a number of functions, including intermediary carbon catabolism and the intracellular redox state. 2-O-α-mannosyl-D-glycerate is taken up by the MngA phosphotransferase system and utilized as a sole carbon source. Our analysis revealed that genes related to metabolic pathways, antimicrobial resistance and virulence were part of these species-specific core genomes, which helped us to characterize the genomic differences between alvei and paralvei. These species-specific genes might serve as markers for distinguishing alvei from paralvei.
Macromolecular secretion systems reflected the pathogenic potential of Hafnia and mobile genetic elements mediated the genomic plasticity
Secretion is an essential task for bacteria to interact with their surrounding environment. In particular, many virulence factors in pathogens are secreted. The production of extracellular proteins is important for many aspects of bacterial competition and adaptation, such as virulence, antimicrobial resistance, detoxification and scavenging. In gram-negative bacteria, six types of secretion systems (T1SS to T6SS) have been identified and well characterized by numerous experimental studies. Here, we identified occurrences of the macromolecular secretion systems in the 47 Hafnia genomes using MacSyFinder. The models and evolution of the macromolecular secretion systems identified in Hafnia genomes are described in Figs. 4, 5, and 6. In the following sections we describe the models of each type of macromolecular secretion system.
Gene clusters of types I, III, IV, V and VI, together with homologues of the flagellum and Tad pilus secretion systems, were found in the Hafnia genomes (Fig. 4). Although T1SS, Flagellum 1, Tad pilus, and T6SS-1 are restricted to Hafnia, T3SS, T4SS, T5SS, and the other T6SSs are not wholly exclusive to strains of alvei and/or paralvei. These strain-specific secretion systems might have been horizontally transferred from other species. Furthermore, numerous mobile genetic elements (MGEs), including clustered regularly interspaced short palindromic repeats (CRISPRs), insertion sequences (ISs), genomic islands, and prophages, were identified across the study genomes (Fig. 4). These elements are major contributors to horizontal gene transfer (HGT) and drive the adaptation of bacteria to diverse niches. The diverse secretion systems may have been acquired as part of the MGEs.
Conserved type I secretion system
T1SS is composed of three indispensable membrane proteins: an ABC transporter (ABC: providing an inner membrane channel), a membrane porin (OMF: forming the outer membrane channel) and an inner membrane-anchored adaptor protein (MFP: connecting the OMF and the ABC components). T1SS can secrete many proteins, including haemolysins for pathogenesis in the host organism, extracellular proteases for nutrient acquisition, and bacteriocins for antibacterial activity. We found that the T1SS cluster was present in all 47 Hafnia genomes. Both alvei and paralvei shared a common T1SS cluster (Fig. 5a). The core components ABC, OMF and MFP were encoded together and highly conserved (protein identity > 92%, Fig. 5a), indicating that this T1SS is restricted to and conserved in the Hafnia genus.
Conserved and species-specific flagella and the incidental type III secretion system
The flagellum and T3SS are two of the largest macromolecular complexes spanning both membranes of gram-negative bacteria. Two types of flagellum systems were identified in Hafnia genomes (designated Flagellum 1 and Flagellum 2, Fig. 5c). Flagellum 1 is present in both alvei and paralvei, except for two alvei strains (NCTC 6578 and PCM 1190). As shown in Fig. 4, Flagellum 2 is present in all alvei strains and in only three paralvei strains (HMSC23F03, ATCC 51873, and PCM 1194). Furthermore, we performed phylogenetic analysis using the core genes of Flagellum 2 (Fig. 5d), and the resulting tree revealed a topology similar to that of the core genome tree (Fig. 1). Our analysis suggests that most strains of paralvei lack Flagellum 2 because of an earlier deletion event.
T3SS evolved from the flagellum and is at the centre of the export machinery that enables the direct transfer of proteins from the bacterial cytosol into host cells. T3SSs are usually encoded in a single locus, and many of their components are homologous to components of the flagellar apparatus. Three alvei genomes (GB001, PCM 1204, PCM 1214) had a complete set of T3SS genes (Fig. 5d). We compared the CAI (codon adaptation index) and GC content of the T3SS gene clusters with those of the three host genomes. The T3SS gene clusters displayed an apparent deviation in CAI (Fig. 5e), so it is likely that these T3SS gene clusters were acquired by HGT events. Furthermore, we identified putative homologous T3SSs using blastp searches of the NCBI non-redundant protein database. We found that the well-studied T3SS of Salmonella pathogenicity island 2 (SPI-2) was closely related to the T3SS of Hafnia; the two occupy similar gene loci and show approximately 54% identity between protein sequences (Fig. 5d). The T3SS encoded by SPI-2 is central to the ability of S. enterica to cause systemic infections and to intracellular pathogenesis. It is worth noting that these three alvei strains with T3SS may therefore have the potential to cause systemic disease.
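To make the CAI and GC-content comparison concrete, the sketch below computes a codon adaptation index as the geometric mean of relative adaptiveness weights derived from a reference gene set, alongside a simple GC fraction. Adding a pseudocount of 0.5 to every codon count, the helper names, and the toy sequences are assumptions for illustration; the tool and reference set used in this study are not specified here.

```python
import math
from collections import Counter

# Standard genetic code built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon relative to the most used synonymous codon."""
    counts = Counter(
        c for gene in reference_genes for c in codons(gene) if c in CODON_TABLE
    )
    weights = {}
    for aa in set(CODON_TABLE.values()):
        if aa in "*MW":  # skip stop codons and single-codon amino acids
            continue
        synonymous = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(counts[c] + pseudocount for c in synonymous)
        for c in synonymous:
            weights[c] = (counts[c] + pseudocount) / best
    return weights


def cai(gene, weights):
    """Geometric mean of the codon weights over the scored codons of a gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Toy sequences (hypothetical): the reference prefers CTG, AAA and GAA.
reference = ["CTGCTGCTGAAAAAAGAA"]
weights = relative_adaptiveness(reference)
native_like = "CTGAAAGAA"   # uses the preferred codons
foreign_like = "CTAAAGGAG"  # uses rarely used synonymous codons
cai_native = cai(native_like, weights)
cai_foreign = cai(foreign_like, weights)
```

A gene cluster whose CAI or GC content sits well away from the host genome background is a candidate for horizontal acquisition, although such deviation alone is not conclusive.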
Conserved tight adherence pilus
The Tad pilus secretion system plays a role in biofilm formation, pathogenesis, adhesion or natural transformation in many bacteria. In the Hafnia genus, a Tad pilus system was identified in all strains except one alvei strain (NCTC 6578). The components of the Tad systems of both alvei and paralvei are encoded within one genetic cluster and share identical genetic organization and high protein sequence identity (Fig. 5f). Therefore, our results indicate that this Tad pilus system is restricted to and conserved in the Hafnia genus.
Strain-specific type IV secretion system
T4SSs transport a diverse array of substrates, including DNA, DNA-protein complexes, and proteins, into host cells, and play fundamental roles in both pathogenesis and adaptation to the host cellular niche. T4SSs are divided into eight subtypes. Our analysis revealed that 23 Hafnia genomes possessed T4SSs of types T, I, F, G, and pT4SSt. The strain PCM_1218 possesses pT4SSt, which is a protein secretion system, whereas the other T4SSs in Hafnia were conjugation-related T4SSs. The number of genes in Hafnia T4SSs ranges from 7 to 15. As shown in Fig. 6, the genetic organization of T4SSs in Hafnia is highly diverse, revealing that the presence of T4SSs is strain-specific but not species-specific. The diversity and strain-specific distribution suggest multiple horizontal transfer events during divergent evolution. As the T4SS phylogeny generated with shared sequences was not congruent with that obtained with genomic sequences, the distribution of Hafnia T4SS phylogenetic clusters within the SecReT4 database was not assessed further.
Diverse type V secretion system
T5SS is the simplest and most widespread type of secretion pathway. T5SS encodes the translocator and the passenger domains in a single gene or in two partner genes. In this study, based on the TXSScan software, T5SSs were divided into three types, designated T5aSS (the classical autotransporter), T5bSS (the two-partner system), and T5cSS (the trimeric autotransporter). We found that all 47 Hafnia genomes contain T5SSs (Fig. 4). A total of 370 T5SSs were identified, including 285 T5aSSs, 9 T5bSSs, and 76 T5cSSs. Most Hafnia genomes contained T5aSSs and T5cSSs. Only 9 strains, including 3 alvei and 6 paralvei, contained one T5bSS. It is interesting to note that the number of T5aSSs differs markedly between alvei (3.3±1.8 per genome) and paralvei (9.5±2.5 per genome) (Fig. 4). Given that T5aSSs can function as enzymes, adhesins or cytotoxins, or mediate bacterial motility, this difference in the number of T5SSs suggests that alvei and paralvei differ in their adhesion and evasion abilities. The larger number of T5aSSs may play an important role in the pathogenesis of paralvei, for example in the colonization of host cells, biofilm formation, or evasion of the immune system.
Conserved and diverse type VI secretion system
T6SS is present only in gram-negative bacteria and is a phage-tail-spike-like injection machinery. It is thought to contribute to bacterial pathogenesis through the translocation of substrates into host cells and through competition with other bacteria in their niches. T6SS has not been reported previously in Hafnia. In this study, all Hafnia genomes possessed one or more T6SSs (Fig. 4), suggesting that they may confer some benefit in terms of host colonization and/or pathogenic potential. Although previous studies suggest that strains carrying T6SS may selectively target proteobacterial commensals to gain a competitive advantage over other potential competitors, the function of T6SS in Hafnia remains to be elucidated by further experiments. Based on genetic organization and homology, the T6SSs in Hafnia genomes were divided into 7 subtypes (Fig. 7a). T6SS-1 was conserved in most Hafnia strains, but absent in strains C5 and FDAARGOS_350. In contrast, similar to T4SS, the presence of T6SS-2 to -7 was strain-specific but not species-specific. The presence of diverse T4SSs and T6SSs further supports the view that HGT is a common event in Hafnia.
We constructed an ML tree based on the TssK protein sequences to span the diversity present in Hafnia T6SS (Fig. 7b). In the ML tree, the strains with T6SSs of the same subtype form a single cluster that is distinct from the other subtypes. A similar ML tree generated from TssF is provided in Additional file 5: Figure S1A. This observation reveals the diversity of Hafnia T6SS and the possibility of multiple evolutionary origins. To gain an overall view of these diverse origins, we performed phylogenetic analysis using the shared TssF proteins in the SecReT6 database in combination with the Hafnia T6SS data. As shown in Fig. 7c, the distinct Hafnia T6SSs are scattered in different locations, showing high homology to T6SSs from other species, such as Escherichia coli, Salmonella enterica, Photorhabdus asymbiotica, Enterobacter asburiae, and Citrobacter rodentium. The phylogenetic trees created from the shared TssFs exhibited similar topologies (Additional file 5: Figure S1B). The data indicate that distinct subtypes of Hafnia T6SSs might have been horizontally transferred from diverse donor species.
Virulence genotypic profile revealed the pathogenicity of Hafnia
Mobilome-based elements, such as virulence factors and resistance genes, are very important in pathogenicity and inter-strain variation. In our study, all 47 Hafnia genomes were locally aligned against the VFDB database and the ResFinder database to detect virulence factors and resistance genes. Virulence factors represented the greatest number of genes detected among the Hafnia strains (214 virulence factors, Additional file 6: Table S5). Apart from the genes related to the macromolecular secretion systems described above, the major virulence factors identified in all strains were associated with adherence (ompA, ilpA, papD, papC, htpB, and csg fimbriae), toxin (hlyA and AHA_3493), iron uptake (chu operon), stress adaptation (katA, clpP, and sodB), and efflux pump (farB and acrAB). Additionally, algU (antiphagocytosis), mgtB (magnesium uptake), luxS (quorum sensing), allS (nutritional factor), icl (lipid and fatty acid metabolism) and rcsB (regulation) were identified as common Hafnia virulence factors (Fig. 8).
Several virulence genes were identified as species-specific virulence factors that occurred in the majority of strains of one species but were found infrequently in strains of the other species, making them good discriminators between alvei and paralvei (Fig. 8). The alvei-specific genes included bcfC and chuT. The rest of the chu operon, excluding the alvei-specific chuT, was prevalent in Hafnia strains; this operon has been termed the haem transport locus and appears to be widely distributed among pathogenic E. coli strains. The paralvei-specific genes identified were associated with fimbrial adherence (pefD) and iron uptake (ccmE and fepA). This differential distribution of virulence factors suggests that alvei and paralvei differ in their adhesion abilities and iron absorption capacities.
Antimicrobial genotypic and phenotypic profiles in Hafnia
Fifty resistance genes associated with nine different classes were identified (Fig. 8, Additional file 7: Table S6). All Hafnia genomes contained multiple resistance genes related to aminoglycoside, beta-lactam, bacitracin, cationic antimicrobial peptide, fluoroquinolone, and rifampin. We observed diversity among the acc gene alleles in Hafnia genomes (Fig. 8, Additional file 7: Table S6). With the exception of NCTC8105, all alvei strains contained the acc-3 allele. In contrast, paralvei strains contained the acc-1 (18/21), acc-5 (1/21), and acc-2 (2/21) alleles. This observation is in agreement with a previous study. Similarly, the gyrB alleles related to fluoroquinolone resistance also showed species-specific divergence (Res16 and Res17, Additional file 7: Table S6). Apart from the misplacement of PCM_1191, the ML tree based on gyrB gene sequences separated Hafnia into two species clades with a topology similar to that of the core genome tree (Additional file 8: Figure S2). Additionally, we found that many genes encoding efflux pumps related to multi-drug resistance are present in the genomes of all Hafnia strains.
All 20 Hafnia strains were tested for susceptibility to 21 antimicrobial agents; the results are listed in Additional file 9: Table S7. All strains were uniformly susceptible to ticarcillin-clavulanic acid, cefoperazone-sulbactam, cefepime, imipenem, amikacin, tobramycin, ciprofloxacin, levofloxacin, tigecycline, and trimethoprim-sulfamethoxazole (Additional file 9: Table S7). All or nearly all strains were resistant to amoxicillin-clavulanic acid (100%; n=20) and colistin (90%; n=18). We also observed partial resistance to piperacillin-tazobactam (25%; n=5), ceftriaxone (35%; n=7), ceftazidime (35%; n=7), and ertapenem (20%; n=4). In terms of species susceptibility patterns, some alvei strains were resistant to aztreonam (2/13), meropenem (4/13), and chloramphenicol (2/13), whereas one paralvei strain (PCM_1198) was resistant to doxycycline and minocycline.
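The resistance percentages reported above follow directly from the resistant-strain counts divided by the 20 strains tested. The short sketch below reproduces that arithmetic from the counts given in the text; the variable names are illustrative only.

```python
# Resistant-strain counts as reported in the text (out of 20 strains tested).
n_tested = 20
resistant_counts = {
    "amoxicillin-clavulanic acid": 20,
    "colistin": 18,
    "piperacillin-tazobactam": 5,
    "ceftriaxone": 7,
    "ceftazidime": 7,
    "ertapenem": 4,
}

# Percentage of tested strains that were resistant to each agent.
rates = {drug: 100 * n / n_tested for drug, n in resistant_counts.items()}

for drug, rate in rates.items():
    print(f"{drug}: {rate:.0f}% (n={resistant_counts[drug]})")
```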