The Enterobacterales encode five distinct flagellar systems
The complete and draft genomes of 4,028 taxa belonging to the order Enterobacterales were screened for additional flagellar (flag) loci. This was done by tBlastN searches of the genome sequences with the full complement of complete protein sequences required for the synthesis of the flag-1 and flag-2 flagella of E. coli K-12 strain MG1655 (47 proteins) and E. coli 042 (38 proteins), respectively. In total, 816 (20.26%) of the studied strains incorporate at least two distinct flag loci (Table 1; Additional File 1: Table S1). As 664 strains lack flag loci altogether and can be considered incapable of swimming motility, dual or multiple flag loci are a hallmark of 24.26% of the presumed motile enterobacterial taxa. The secondary flag loci are dominated by the previously characterized flag-2 loci, which occur in 593 of the studied strains (575 strains with and 18 strains lacking flag-1 loci) (Table 1) [15]. The protein complements encoded by the remaining additional flagellar loci were compared with those of taxonomically representative flag-1 and flag-2 datasets. Twenty-five distinct single-copy orthologues (SCOs) are conserved among the loci; these were aligned and concatenated to generate a Maximum Likelihood (ML) phylogeny. In the resultant SCO phylogeny, the loci other than flag-1 and flag-2 form three distinct clades, which we have termed the flag-3, flag-4 and flag-5 loci (Figure 1). The largest clade comprises the novel flag-3 loci, which occur in 249 distinct taxa (6.18% of all taxa studied) across a broad taxonomic spectrum, including members of the families Enterobacteriaceae, Erwiniaceae, Hafniaceae, Morganellaceae and Yersiniaceae (Table 1; Figure 2). By contrast, the flag-4 and flag-5 loci are less prevalent, being restricted to six taxa (four Pectobacteriaceae and two Yersiniaceae members) and eight taxa (all Plesiomonas shigelloides isolates; family unassigned), respectively (Table 1; Figure 2).
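The prevalence figures above are simple ratios of strain counts. The short Python sketch below recomputes them as a check; it assumes a total of 4,028 strains, the denominator that reproduces the reported 20.26%, 24.26% and 6.18%, and the variable names are illustrative only.

```python
# Strain counts reported in this section (illustrative variable names).
TOTAL_STRAINS = 4028   # denominator consistent with the quoted percentages
MULTI_LOCUS = 816      # strains with at least two distinct flag loci
NO_FLAG = 664          # strains lacking any flag locus (presumed non-motile)
FLAG3_TAXA = 249       # taxa carrying a flag-3 locus
TRIPLE_LOCUS = 19      # taxa carrying three flag loci


def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


presumed_motile = TOTAL_STRAINS - NO_FLAG  # strains with >= 1 flag locus
summary = {
    "multi-locus (all strains)": percent(MULTI_LOCUS, TOTAL_STRAINS),
    "multi-locus (presumed motile)": percent(MULTI_LOCUS, presumed_motile),
    "flag-3 (all strains)": percent(FLAG3_TAXA, TOTAL_STRAINS),
    "three loci (all strains)": percent(TRIPLE_LOCUS, TOTAL_STRAINS),
}
for label, value in summary.items():
    print(f"{label}: {value}%")
```

The same helper also gives the ~0.5% quoted below for strains with three flag loci (19 of 4,028).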
The flag-3 loci are predominantly found in enterobacteria that also harbour a flag-1 locus; only Rahnella variigena DLL 7529 incorporates a flag-3 locus but lacks the flag-1 system. This mirrors the flag-2 system, which, with the exception of ten taxa, occurs in flag-1 flagellated taxa [15]. The flag-4 system occurs in three taxa (Sodalis spp.) that also incorporate flag-1 loci, while the other three (two Wigglesworthia spp. and Biostraticola tofi DSM 19580) lack the latter locus. All flag-5 encoding P. shigelloides strains also incorporate a flag-2 locus but lack a flag-1 locus. While the majority of enterobacteria with supernumerary flagellar loci incorporate two flag loci in their genome, nineteen taxa (~0.5% of all strains studied) incorporate three flag loci (Table 1). Eighteen of these harbour a flag-1, a flag-2 and a flag-3 locus and include members of the families Enterobacteriaceae (one strain of Buttiauxella warmboldiae, one strain of Enterobacter and four strains of Citrobacter Clade C), Hafniaceae (four strains of Obesumbacterium proteus) and Yersiniaceae (eight Yersinia spp.) (Table 1; Figure 2). The remaining strain, Enterobacter ludwigii OLC-1682, lacks a flag-2 locus but instead incorporates one flag-1 locus and two flag-3 loci (discussed in further detail below).
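Whether a strain counts as carrying "three flag loci" depends on whether duplicated loci are counted separately: E. ludwigii OLC-1682 carries three loci but only two flag types. A minimal Python sketch of both tallies follows, using a small illustrative subset of strains named in this section rather than the full Table 1 dataset.

```python
from collections import Counter

# Illustrative subset of strains and the flag loci reported for them in the
# text; this is not the full dataset from Table 1.
STRAIN_LOCI = {
    "Enterobacter ludwigii OLC-1682": ["flag-1", "flag-3", "flag-3"],
    "Plesiomonas shigelloides 302-73": ["flag-2", "flag-5"],
    "Sodalis praecaptivus HS1": ["flag-1", "flag-4"],
    "Biostraticola tofi DSM 19580": ["flag-4"],
}


def summarise(strain_loci: dict[str, list[str]]) -> tuple[Counter, Counter]:
    """Count strains by number of loci (duplicate copies included) and by
    number of distinct flag types."""
    n_loci = Counter(len(loci) for loci in strain_loci.values())
    n_types = Counter(len(set(loci)) for loci in strain_loci.values())
    return n_loci, n_types


n_loci, n_types = summarise(STRAIN_LOCI)
supernumerary = [s for s, loci in STRAIN_LOCI.items() if len(loci) >= 2]
print("strains by number of loci:", dict(sorted(n_loci.items())))
print("strains by number of flag types:", dict(sorted(n_types.items())))
print("strains with supernumerary loci:", supernumerary)
```

Counting loci rather than types is what places E. ludwigii OLC-1682 among the three-locus strains.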
Comparison of the flag-1 to flag-5 loci revealed that, while some gene blocks are syntenic across the distinct loci, there is also evidence of extensive rearrangement and inversion of gene blocks (Figure 3). Furthermore, some of the flag loci show evidence of gene deletion events, while others have integrated additional, non-conserved genes (Figure 3). Average amino acid identity (AAI) values across the 25 SCOs shared among all flag loci further support the separation of the enterobacterial flag loci into five distinct types: intra-clade AAI values range between 57.79% (flag-4 loci) and 99.06% (flag-5 loci), while inter-clade values lie only between 32.80% and 52.23% (Additional File 1: Table S2). These data suggest distinct evolutionary histories for the flag-1 to flag-5 loci. A more in-depth analysis of the flag-3, -4 and -5 loci was therefore undertaken.
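The intra- and inter-clade comparison above can be sketched in Python. This is a minimal illustration, assuming that AAI is the mean per-protein identity over the SCOs shared by two loci, computed from pre-aligned sequences with gap columns ignored; the exact AAI tool and settings are not described in this section, and the loci, clades and sequences below are hypothetical.

```python
from itertools import combinations
from statistics import mean


def identity(a: str, b: str) -> float:
    """Percent identity of two aligned protein sequences, skipping columns
    in which either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100 * sum(x == y for x, y in columns) / len(columns)


def aai(locus_a: dict[str, str], locus_b: dict[str, str]) -> float:
    """Mean per-protein identity over the SCOs present in both loci
    (raises statistics.StatisticsError if none are shared)."""
    shared = sorted(locus_a.keys() & locus_b.keys())
    return mean(identity(locus_a[g], locus_b[g]) for g in shared)


def clade_aai(loci: dict[str, dict[str, str]], clades: dict[str, str]):
    """Mean pairwise AAI within each clade and between each pair of clades."""
    within, between = {}, {}
    labels = sorted(set(clades.values()))
    for c in labels:
        members = [name for name in loci if clades[name] == c]
        pairs = list(combinations(members, 2))
        if pairs:  # single-member clades have no intra-clade value
            within[c] = mean(aai(loci[x], loci[y]) for x, y in pairs)
    for c1, c2 in combinations(labels, 2):
        pairs = [(x, y) for x in loci for y in loci
                 if clades[x] == c1 and clades[y] == c2]
        between[(c1, c2)] = mean(aai(loci[x], loci[y]) for x, y in pairs)
    return within, between


# Toy aligned SCOs for three hypothetical loci in two hypothetical clades.
loci = {
    "A1": {"fliF": "MKLV", "flhA": "ACDE"},
    "A2": {"fliF": "MKLI", "flhA": "ACDE"},
    "B1": {"fliF": "MRAI", "flhA": "GCDQ"},
}
clades = {"A1": "clade-A", "A2": "clade-A", "B1": "clade-B"}
within, between = clade_aai(loci, clades)
print(within)   # {'clade-A': 87.5}
print(between)  # {('clade-A', 'clade-B'): 43.75}
```

A clear separation of types corresponds to intra-clade means that consistently exceed the inter-clade means, as in the values reported above.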
The flag-3 peritrichous flagellar loci can be further divided into two subtypes, flag-3a and flag-3b, on the basis of sequence synteny and conservation
The flag-3 loci cluster with the peritrichous primary flagellar (flag-1) loci in the SCO phylogeny (Figure 1), and the two share 52.23% average amino acid identity across the twenty-five conserved SCO proteins encoded on the loci. The flag-3 loci encode orthologues of 46 of the 47 proteins encoded by the primary flagellar locus that are involved in flagellar biosynthesis, regulation, maintenance and chemotaxis. The exception is FliZ, for which no orthologue is present in the flag-3 loci. In the flag-1 loci, FliZ activates the expression of the class 2 (middle) genes involved in the synthesis of the flagellar hook and basal body [17]. A distinct regulatory system for class 2 gene expression may therefore operate in the flag-3 system.
Phylogenetic evaluation of the flag-3 loci, based on a concatenated alignment of the 45 SCOs conserved among them (excluding FliC, which is present in multiple copies in some strains), showed that they fall into two distinct clades (Figure 4): the upper clade (flag-3a) comprises 154 taxa (61.6% of total flag-3 loci) and the lower clade (flag-3b) comprises 95 taxa. The flag-3a loci are restricted to members of three genera in two enterobacterial families (the Enterobacteriaceae and Erwiniaceae), namely Enterobacter (111/608 of the studied strains), Erwinia (41/61 of the studied taxa) and Pantoea (3/151 of the studied taxa) (Table 1). By contrast, the flag-3b locus is represented across a much broader taxonomic spectrum, including the Enterobacteriaceae (one Buttiauxella, four Citrobacter Clade D, fifteen Kosakonia and fourteen Pluralibacter strains), Erwiniaceae (one Erwinia, three Mixta and seven Pantoea strains), Hafniaceae (four Obesumbacterium proteus strains), Morganellaceae (Morganella sp. nov. 2 H1r) and Yersiniaceae (four Rahnella, one Serratia Clade B and 39 Yersinia strains) (Table 1). The two flag-3 subtypes appear to be mutually exclusive, with no taxon containing both flag-3a and flag-3b loci. One strain, Enterobacter ludwigii OLC-1682, does however incorporate two flag-3a loci. These cluster with the other Enterobacter spp. flag-3a loci, but in distinct sub-clades: flag-3a-1 clusters with the E. bugandensis loci and flag-3a-2 with the other E. ludwigii loci. The two loci furthermore share 84.42% AAI with each other (entire Enterobacter clade: 84.12% AAI), suggesting that E. ludwigii OLC-1682 acquired these loci through distinct evolutionary events.
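Selecting SCOs for such a concatenated alignment amounts to keeping the genes present exactly once in every locus, which is why a gene duplicated in some strains, such as fliC, drops out. A minimal Python sketch follows, assuming the per-gene protein sequences are already aligned; the loci and sequences are hypothetical. The resulting FASTA supermatrix would then be passed to a Maximum Likelihood program such as IQ-TREE or RAxML, as the tool and model used here are not stated in this section.

```python
from collections import Counter

# Each locus maps to a list of (gene, aligned protein sequence) pairs; a gene
# listed twice (e.g. a duplicated fliC) is multi-copy in that locus.
Locus = list[tuple[str, str]]


def single_copy_orthologues(loci: dict[str, Locus]) -> list[str]:
    """Genes present exactly once in every locus, in sorted order."""
    counts = [Counter(gene for gene, _ in genes) for genes in loci.values()]
    shared = set.intersection(*(set(c) for c in counts))
    return sorted(g for g in shared if all(c[g] == 1 for c in counts))


def concatenate(loci: dict[str, Locus], genes: list[str]) -> dict[str, str]:
    """Join each locus's aligned SCO sequences in a fixed gene order."""
    out = {}
    for name, entries in loci.items():
        seqs = dict(entries)  # safe: every selected gene occurs once
        out[name] = "".join(seqs[g] for g in genes)
    return out


def to_fasta(records: dict[str, str]) -> str:
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


# Toy, pre-aligned data for three hypothetical loci.
loci = {
    "locusA": [("flhA", "MKV"), ("fliC", "AAA"), ("fliC", "AAG"), ("motA", "LL")],
    "locusB": [("flhA", "MRV"), ("fliC", "AGA"), ("motA", "LI")],
    "locusC": [("flhA", "MKI"), ("fliC", "GAA"), ("motA", "IL"), ("cadC", "Q")],
}
scos = single_copy_orthologues(loci)  # fliC is excluded as multi-copy
supermatrix = concatenate(loci, scos)
print(scos)
print(to_fasta(supermatrix), end="")
```

Fixing the gene order before concatenation keeps the alignment columns homologous across all loci.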
The flag-3a loci are on average 46.3 kb in size, have an average G+C content of 55.0% and code for 51 distinct proteins, while the flag-3b loci are ~44.3 kb in size, have an average G+C content of 50.5% and code for 48 proteins (Additional File 1: Table S3). The proteins encoded by the two subtypes are also distinct in sequence, with intra-clade AAI values of 75.54% (flag-3a) and 69.09% (flag-3b), compared with an inter-clade AAI value of 59.01% (based on 25 conserved SCOs) (Additional File 1: Table S2). Furthermore, the flag-3a and flag-3b loci have distinct gene syntenies. The flag-3a loci all comprise three gene blocks occurring in the order block 1: flhEAB-cheZYBR-fliEFGHIJKLMNOPQR, block 2: fliCDST and block 3: flgNMABCDEFGHIJKL-fliA-flhDC-motAB-cheAW. In most of the flag-3b loci, block 3 precedes block 2 and block 1 is situated at the 3′ end of the locus. Two notable exceptions are the flag-3b loci of Buttiauxella warmboldiae CCUG 35512 and Morganella sp. nov. 2 H1r. In both taxa, gene block 3 occurs at the 5′ end of the locus, but block 2 is integrated within gene block 1, with part of the latter gene block occurring in reverse complement (Figure 5). The co-localisation of the genes within their gene blocks, regardless of the locus subtype, suggests that these loci may have been built by the step-wise incorporation of individual gene blocks, which may have been derived from distinct ancestral loci. This is supported by the distinct G+C contents of the gene blocks: the G+C content of gene block 1 is on average 4.45% and 2.23% higher than those of blocks 2 and 3 of the flag-3 loci (both flag-3a and flag-3b), respectively.
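A block-level G+C comparison of this kind can be sketched in Python as below. The sketch assumes that the reported differences are absolute differences in percentage points, averaged over loci; the gene-block sequences are short hypothetical stand-ins for the real blocks.

```python
from statistics import mean


def gc_content(seq: str) -> float:
    """Percent G+C of a nucleotide sequence, ignoring ambiguous bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence has no unambiguous bases")
    return 100 * sum(b in "GC" for b in bases) / len(bases)


def block_gc_difference(blocks: dict[str, str], ref: str, other: str) -> float:
    """G+C of block `ref` minus G+C of block `other`, in percentage points."""
    return gc_content(blocks[ref]) - gc_content(blocks[other])


# Toy gene-block sequences for two hypothetical flag-3 loci.
loci_blocks = [
    {"block1": "GGCCGCAT", "block2": "GCATATAT", "block3": "GGCCATAT"},
    {"block1": "GCGCGCAT", "block2": "GCGCATAT", "block3": "GGGGCCAT"},
]
for other in ("block2", "block3"):
    diff = mean(block_gc_difference(b, "block1", other) for b in loci_blocks)
    print(f"block1 - {other}: {diff:.2f} percentage points")
```

Computing the difference per locus before averaging keeps each comparison within a single genome background.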
Similar to what is observed in both the flag-1 and flag-2 loci [12,15], the flag-3a and flag-3b loci are characterized by a non-conserved genomic island adjacent to the fliC gene (Figure 5). This island occurs in 150/155 of the flag-3a loci and 78/95 of the flag-3b loci, has an average size of 3.9 kb (range: 0.5-8.9 kb) and codes for between one and eight distinct cargo proteins (Additional File 1: Table S3). A total of twenty-seven distinct proteins are encoded on this island, nine of which are unique to the flag-3a loci and seventeen to the flag-3b loci (Additional File 1: Table S4). One island feature, shared between the islands of 15 flag-3a and 57 flag-3b loci, codes for a methyl-accepting chemotaxis protein (Mcp1; 45.26% average amino acid identity; COG0840). A second methyl-accepting chemotaxis protein (Mcp2; 86.22% average amino acid identity; PRK15048) is found in the flag-3a islands of Enterobacter and Pantoea spp. A key feature of the Enterobacterales flagellar loci is the presence of genes coding for proteins involved in the glycosylation and post-translational modification of the main flagellar structural protein, flagellin. Flagellin glycosylation has been linked to a number of functions, including flagellar synthesis and stabilization, biofilm formation, surface recognition and adherence, virulence and host immune evasion [18,19]. Previous studies showed that 17.4% (307/1,761) and 57.6% (341/592) of the flag-1 and flag-2 loci, respectively, incorporate flagellin glycosylation machinery [12,15]. Among the flag-3 loci, only the flag-3a loci of 41 Erwinia spp. (26.45% of the flag-3a loci; 16.4% of all flag-3 loci) incorporate three genes coding for enzymes involved in flagellin modification adjacent to the fliC gene (Figure 5). One gene codes for a 1,127-amino-acid N-acetylglucosamine glycosyltransferase (Spy; 96.45% AAI), which catalyses the post-translational addition of O-linked beta-N-acetylglucosamine to serine/threonine residues in the target protein (Additional File 1: Table S4) [20]. Flagellin glycan chains are frequently further decorated by formyl, methyl, acetyl and amino groups [12]. Adjacent to the spy gene in the Erwinia flag-3a loci are genes coding for an aminotransferase (WecE; 98.01% AAI; COG0399) and an O-acetyltransferase (WbbJ; 93.65% AAI; COG0110), suggesting that the Erwinia flag-3a flagellin glycan is both acetylated and aminated, while the Spy protein also incorporates a methyltransferase domain (pfam13649) (Additional File 1: Table S4).
A key feature of the non-conserved island adjacent to the fliC gene in the flag-3a loci is the universal presence of a gene coding for a transcriptional regulator, CadC1 (COG3710) (Additional File 1: Table S4). Given that it occupies the position of fliZ in the flag-1 locus, this transcriptional regulator may serve a similar role in the flag-3a loci, although this will need to be validated experimentally. No gene with this putative function is present in the flag-3b loci, and how class 2 gene expression is regulated in the latter loci therefore remains unclear.
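A fliC-adjacent island can be delimited as the run of non-conserved genes between fliC and the nearest conserved flagellar gene. The Python sketch below illustrates this, assuming ordered gene coordinates are available for each locus. The gene names mcp1 and cadC1 are taken from the text, but the gene order, coordinates, the conserved-gene set and the choice of side (the `downstream` parameter) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # 1-based start coordinate within the locus
    end: int    # 1-based inclusive end coordinate


def flic_island(genes: list[Gene], conserved: set[str],
                downstream: bool = True) -> list[Gene]:
    """Return the run of non-conserved genes on one side of the first fliC,
    stopping at the nearest conserved gene."""
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next((i for i, g in enumerate(ordered) if g.name == "fliC"), None)
    if idx is None:
        return []
    step = 1 if downstream else -1
    island = []
    i = idx + step
    while 0 <= i < len(ordered) and ordered[i].name not in conserved:
        island.append(ordered[i])
        i += step
    return sorted(island, key=lambda g: g.start)


def island_size_kb(island: list[Gene]) -> float:
    """Span of the island in kb, from its first start to its last end."""
    if not island:
        return 0.0
    return (max(g.end for g in island) - min(g.start for g in island) + 1) / 1000


# Hypothetical gene order and coordinates, for illustration only.
genes = [
    Gene("flgL", 1, 900),
    Gene("fliC", 1001, 2500),
    Gene("mcp1", 2601, 4200),
    Gene("cadC1", 4301, 5100),
    Gene("fliD", 5201, 6600),
]
conserved = {"flgL", "fliC", "fliD"}
island = flic_island(genes, conserved)
print([g.name for g in island], island_size_kb(island), "kb")
```

Measuring the island as a coordinate span includes intergenic spacers, which is one reasonable reading of "island size"; summing gene lengths would give a smaller value.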
The flag-4 locus is predominant among insect endosymbionts and codes for a predicted peritrichous flagellum
The flag-4 loci cluster with the flag-1 and flag-3 loci in the flagellar SCO phylogeny and share 49.69% and 46.73% AAI with the former and latter loci, respectively (Figure 1; Additional File 1: Table S2). BlastP analysis of the flag-4 protein complement revealed that the closest matches are proteins encoded on the flag-1 and flag-3 loci, suggesting that the flag-4 locus likewise codes for a peritrichous flagellar system. Of all the flag systems, the flag-4 loci are the most variable, ranging in size between 31.6 and 45.1 kb and in G+C content between 22.5 and 56.1%, with an intra-clade AAI value of 57.79% (across 25 conserved SCOs) (Additional File 1: Table S2; Additional File 1: Table S3). Furthermore, while the flag-4 loci share extensive synteny, they show evidence of frequent deletions and of gene disruption through transposon integration (Figure 6). For example, the flag-4 locus of the tsetse fly endosymbiont Sodalis glossinidius ‘morsitans B4’ lacks the cheZYBR-cheAW and fliZ genes and incorporates a pseudogene of flhE, while that of S. pierantonius SOPE (Sitophilus oryzae endosymbiont) lacks the genes cheZYBR-cheAW, flhE, flgB and fliZST and incorporates pseudogene copies of flgC, fliQ, fliE, fliC and fliD. Both genomes also incorporate flag-1 loci, but these are likewise heavily degraded (Figure 6). The eroded flag-1 and flag-4 loci in these Sodalis strains are typical of the degenerative genome evolution that accompanies adaptation to a symbiotic lifestyle, and indeed these strains have been described as non-motile [21,22]. The gene complement of the flag-4 locus of Sodalis praecaptivus HS1 mirrors that of S. glossinidius ‘morsitans B4’, and this strain also harbours an intact flag-1 locus (encoding 46/47 of the primary flagellar proteins, lacking only FliZ). Unlike the insect symbiont Sodalis spp., S. praecaptivus HS1 was isolated from a human wound, can persist in free-living form and has been observed to be capable of swarming motility [23,24], although whether the flag-1 or the flag-4 encoded flagellar system is responsible for this capacity remains to be elucidated.
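Descriptions of eroded loci such as these amount to comparing a locus's gene content against a reference set. A minimal Python sketch follows, assuming gene presence and pseudogene status have already been annotated. The reference below is an illustrative subset of flagellar genes, not the full 47-gene flag-1 complement. The example is restricted to that subset: the missing genes and the flhE pseudogene follow the text, and the remaining genes are treated as intact for illustration.

```python
from collections.abc import Set


def compare_complement(reference: Set[str], present: Set[str],
                       pseudogenes: Set[str] = frozenset()) -> dict:
    """Classify reference genes as missing, pseudogenised or intact.

    `present` lists every gene detected in the locus, pseudogenes included.
    """
    return {
        "missing": sorted(reference - present),
        "pseudogenes": sorted(reference & pseudogenes),
        "intact": sorted(reference & (present - pseudogenes)),
    }


# Illustrative subset of flag-1 reference genes, not the full 47-gene set.
REFERENCE = {"cheA", "cheB", "cheR", "cheW", "cheY", "cheZ",
             "flhE", "fliC", "fliD", "fliZ", "motA", "motB"}
# Flag-4 locus of S. glossinidius 'morsitans B4', restricted to this subset.
present = {"flhE", "fliC", "fliD", "motA", "motB"}
result = compare_complement(REFERENCE, present, pseudogenes={"flhE"})
for category, genes in result.items():
    print(f"{category}: {', '.join(genes)}")
```

Keeping pseudogenes in `present` but reporting them separately distinguishes gene loss from gene disruption, the two forms of erosion described above.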
The two tsetse fly endosymbiont Wigglesworthia glossinidia strains included in this study lack flag-1 loci, but both incorporate flag-4 loci. The loci of these two strains are the smallest among the flag-4 loci (Additional File 1: Table S3) and include only 33/47 genes coding for orthologues of the primary flagellar (flag-1) locus proteins, lacking the chemotaxis genes cheZYBR-cheAW as well as the flagellar biosynthetic and regulatory genes flhE, flgM and fliATZ (Figure 6). The resultant flagellar system would therefore be expected to be non-functional, as is the case in the endosymbiotic Sodalis spp. However, expression analysis of the fliC (flagellin) and motA (motor protein A) genes and immunohistochemistry with flagellin-specific antibodies showed that both genes are expressed and that flagellin is produced in intrauterine larvae and in the milk gland cells of tsetse flies, suggesting an important role for the flag-4 flagellum in the vertical transmission of Wigglesworthia from host mother to progeny [25]. Similarly, Biostraticola tofi DSM 19580, isolated from a biofilm on a tufa limestone deposit, has been shown to synthesise flagella and to be capable of swimming motility [26]. As the genome of this bacterium incorporates only a flag-4 locus, a role in motility can be suggested for the flag-4 encoded flagellar systems of both Biostraticola and Wigglesworthia.
The flag-5 locus is unique to Plesiomonas shigelloides among the Enterobacterales and codes for a polar flagellum
Plesiomonas shigelloides lacks flag-1 loci, but previous studies have identified a lateral flag-2 locus in this species [15,16]. Furthermore, it incorporates the distinct flag-5 locus, which forms a separate clade in the SCO phylogeny (Figure 1). The proteins encoded by this locus share limited sequence identity with those encoded on the other flag loci, with AAI values ranging between 32.80% (flag-4) and 34.92% (flag-1) across the 25 SCOs conserved among the loci (Additional File 1: Table S2). Instead, they share 54.95% AAI across 52 proteins with the polar flagellar loci of Vibrio parahaemolyticus BB22O (AF069392.3 and U12817.2) [16]. The flag-5 locus also shows extensive synteny with the polar flagellar loci of the latter strain (Figure 7). However, while the V. parahaemolyticus polar flagellar system is encoded by two loci separated by ~1.45 megabases on the chromosome, the genes for polar flagellar synthesis in P. shigelloides are harboured in a single locus, which ranges between 57.9 and 62.0 kb in size and codes for 57-61 distinct proteins among the eight P. shigelloides strains included in this study. These are also the largest of the flag-1 to flag-5 loci.
Vibrio parahaemolyticus encodes two flagellar systems that facilitate movement under distinct conditions: multiple lateral flagella allow swarming motility across surfaces, while the single polar flagellum enables swimming in liquid environments [7,27]. The former are powered by the proton motive force, as observed for the peritrichous (flag-1) flagella of the Enterobacterales, while the latter derives its energy from the sodium motive force [7,27]. A previous study has shown that P. shigelloides is likewise capable of both swarming and swimming motility, with the flag-2 and flag-5 systems linked to the former and latter forms of motility, respectively [16].
Comparison of the protein complement with those of the other Enterobacterales flag loci identified twenty-nine proteins that are unique to the flag-5 loci, seven of which share orthology with proteins encoded by the V. parahaemolyticus locus. Three of these orthologues, FlgO, FlgP and FlgT (30.04-48.83% AAI with V. parahaemolyticus BB22O AGB09241.1-244.1), are predicted to form part of the H-ring, an additional basal body ring associated with the outer membrane. The H-ring is specific to Vibrio spp. and facilitates outer membrane penetration and external assembly of the sheathed polar flagella [27,28]. The Vibrio polar flagellar locus lacks the genes flhC and flhD, which code for the master transcriptional regulators of the enterobacterial peritrichous flagella. Instead, the Vibrio polar flagellar loci incorporate three genes, flaK, flaL and flaM, which are purported to fulfil this function. FlaK is a σ54-dependent transcriptional activator of flaL and flaM. The histidine kinase-like FlaL protein then phosphorylates FlaM, which activates the transcription of the flagellar middle class genes [10]. The Plesiomonas FlaK orthologues share 58.15% AAI with that of V. parahaemolyticus BB22O (AGB10537.1). However, instead of two distinct FlaL and FlaM proteins, the flag-5 locus of P. shigelloides encodes a single 558 aa protein, FlaLM, with a histidine kinase domain (cd00082; Bitscore: 41; E-value: 8.47e-5) and a σ54-activator domain (pfam00158; Bitscore: 295; E-value: 3e-98) at the N- and C-terminus, respectively. FlaLM appears to be the product of a gene fusion between flaL and flaM: its first 120 aa share 56.08% AAI with the V. parahaemolyticus FlaL protein (AGB10535.1; aa 1-184), and aa 200-558 share 59.24% AAI with the V. parahaemolyticus FlaM protein (AGB10536.1; aa 121-468). Both the P. shigelloides flag-5 and V. parahaemolyticus BB22O polar flagellar loci lack orthologues of the chaperone protein FliT. However, both incorporate a gene, flaI, adjacent to the fliS orthologues, which encodes a protein similar in size to FliT that may hence perform the chaperone function [7]. Furthermore, both encode the chemotaxis-related protein CheV, a CheY-CheW hybrid protein that is absent from the Enterobacterales flag loci [7].
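A candidate fusion such as FlaLM can be flagged when distinct regions of one query protein match two different reference proteins with little or no overlap. The Python sketch below illustrates this using the query coordinates reported above for P. shigelloides FlaLM (aa 1-120 matching FlaL; aa 200-558 matching FlaM). The `Hit` record and the 20-residue overlap tolerance are hypothetical choices, not part of the original analysis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    subject: str  # reference protein matched by this query segment
    q_start: int  # 1-based start on the query protein
    q_end: int    # 1-based inclusive end on the query protein


def fusion_order(hits: list[Hit], max_overlap: int = 20) -> Optional[list[str]]:
    """Return matched subjects in N- to C-terminal order if the query looks
    like a fusion: at least two distinct subjects whose query segments
    overlap by no more than `max_overlap` residues. Otherwise return None."""
    if len({h.subject for h in hits}) < 2:
        return None
    ordered = sorted(hits, key=lambda h: h.q_start)
    for left, right in zip(ordered, ordered[1:]):
        if left.q_end - right.q_start + 1 > max_overlap:
            return None
    return [h.subject for h in ordered]


# Query coordinates reported for P. shigelloides FlaLM (558 aa).
flalm_hits = [Hit("FlaM", 200, 558), Hit("FlaL", 1, 120)]
print(fusion_order(flalm_hits))  # ['FlaL', 'FlaM']: FlaL-like part is N-terminal
```

Sorting by query start makes the returned order directly readable as the N- to C-terminal domain arrangement of the fused protein.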
A key difference between the P. shigelloides flag-5 loci and the V. parahaemolyticus BB22O polar flagellar locus is the presence of six distinct flagellin proteins, FlaA-F, in the latter strain. The P. shigelloides flag-5 loci include a single flagellin orthologue, which shows highest sequence identity with the FlaC protein of V. parahaemolyticus BB22O (AGB09262.1; 51.85% AAI). By contrast, the P. shigelloides flag-5 loci uniquely incorporate between four and eleven genes adjacent to the flaC gene (Figure 7; Additional File 1: Table S5). In P. shigelloides 302-73, these genes have been shown to code for proteins involved in the synthesis of legionaminic acid, which post-translationally glycosylates the flagellin protein [16]. Mutagenesis showed this glycan to be essential for biosynthesis of the flagellum in this strain [16]. Distinct flagellin glycosylation proteins are encoded in the other flag-5 loci. P. shigelloides FM82 incorporates three genes coding for the proteins NeuA (acylneuraminate cytidylyltransferase), NeuB (N-acetylneuraminate synthase) and NeuC (UDP-N-acetyl-D-glucosamine 2-epimerase), which are involved in the synthesis of neuraminic acid, a nonulosonic acid structurally related to legionaminic acid (Additional File 1: Table S5). These proteins share only 31.16% AAI with their orthologues in the P. shigelloides 302-73 flag-5 locus, suggesting that a distinct flagellin glycan is present in P. shigelloides FM82.
The six other strains included in this study code for orthologues of PseB (UDP-N-acetylglucosamine 4,6-dehydratase), PseC (UDP-4-amino-4,6-dideoxy-N-acetyl-beta-L-altrosamine transaminase), PseF (pseudaminic acid cytidylyltransferase), PseG (UDP-2,4-diacetamido-2,4,6-trideoxy-beta-L-altropyranose hydrolase), PseH (UDP-4-amino-4,6-dideoxy-N-acetyl-beta-L-altrosamine N-acetyltransferase) and PseI (pseudaminic acid synthase) (Additional File 1: Table S5). These are the six enzymes that constitute the pathway for the synthesis of pseudaminic acid, which forms part of the flagellin glycan in a number of Gram-negative (Campylobacter and Helicobacter spp.) and Gram-positive (Geobacillus spp.) taxa [29,30]. These proteins share 97.41% AAI among the six P. shigelloides flag-5 loci that code for them, with the exception of PseI: the orthologue in P. shigelloides NCTC 10360 shares only 29% AAI with those of the other five strains, which share 97.42% AAI among themselves. This highlights that, while flagellin glycosylation appears to be a universal feature of the P. shigelloides flag-5 system, it is highly variable in both the type of sugar and the decoration of the glycan.
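Assigning a flag-5 locus to a glycan type, as done above, amounts to checking which biosynthetic pathways are complete among its annotated proteins. A minimal Python sketch follows, assuming the proteins have already been annotated by name. It lists the pseudaminic acid steps in their usual biosynthetic order (PseB, PseC, PseH, PseG, PseI, PseF) and the neuraminic acid steps as NeuC, NeuB and NeuA (NeuA being the acylneuraminate cytidylyltransferase). Presence alone says nothing about sequence divergence such as that of the NCTC 10360 PseI.

```python
# Pathway steps in biosynthetic order.
PATHWAYS = {
    "pseudaminic acid": ("PseB", "PseC", "PseH", "PseG", "PseI", "PseF"),
    "neuraminic acid": ("NeuC", "NeuB", "NeuA"),
}


def pathway_report(proteins: set[str]) -> dict[str, list[str]]:
    """Map each pathway to the enzymes missing from `proteins`
    (an empty list means the pathway is complete)."""
    return {name: [step for step in steps if step not in proteins]
            for name, steps in PATHWAYS.items()}


def complete_pathways(proteins: set[str]) -> list[str]:
    """Names of the pathways for which every enzyme is present."""
    return [name for name, missing in pathway_report(proteins).items()
            if not missing]


strains = {
    "P. shigelloides FM82": {"NeuA", "NeuB", "NeuC"},
    "P. shigelloides NCTC 10360": {"PseB", "PseC", "PseF", "PseG", "PseH", "PseI"},
}
for strain, proteins in strains.items():
    print(strain, complete_pathways(proteins))
```

Reporting the missing steps, rather than only a yes/no answer, makes partial or eroded glycosylation clusters easy to spot.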