L-ENA fibers constitute a novel sub-class of disulfide cross-linked endospore appendages.
The endospores of the food-poisoning outbreak strain B. paranthracis NVH 0075-95 are decorated with two types of ENA fibers, i.e., S-ENA and L-ENA (Fig.1a-b). Close inspection of the exosporium layer reveals that L-ENA fibers protrude from the brush-like hairy nap layer (Fig.1b), seemingly tethered to the exosporium. L-ENA fibers have an apparent diameter of 8nm and exhibit a ladder-like uranyl staining pattern in nsTEM with a typical recurring distance of 4.6nm between consecutive ladder segments. L-ENAs consistently terminate in a single 2nm diameter tip-fibrillum, or ruffle, that consists of a stalk region (45nm in length) and a terminal knob domain (Fig.1c). In preparation for cryoEM data collection, we produced a sample enriched in ENA via shear-induced dislodging from the spore surface followed by a series of purification steps as described previously (18). Next, a 1568-movie cryoEM dataset (60 frames, 0.766Å/pixel) was collected on a JEOL CRYO ARM 300 microscope, yielding a 5.8Å resolution (FSC=0.143 criterion) reconstructed cryoEM volume of L-ENA fibers with C7 symmetry (see Fig.1e and Supporting Figure 1). The refined helical parameters are a twist of 17.04° and a rise of 43.82Å. We attribute the limited resolution of the final reconstruction to (i) the sparsity of images that contained L-ENA fibers (414 micrographs, i.e., 26% of the total dataset), and (ii) the innate flexibility of the L-ENA fibers (discussed below). The limited resolution precluded de novo building of an atomic model of L-ENA but allowed us to identify the fold of the constituent monomers. Indeed, exploratory rigid-body docking using Ena1B as template model (extracted from PDB: 7A02) indicated that the L-ENA subunits are structurally homologous to Ena1A/B and therefore likely represent members of the DUF3992 family.
A careful search of the NVH 0075-95 genome (GCA_027945115.1) with HMMsearch 3.0 (22) using a hidden Markov model generated from a dataset of Ena1/2 protein sequences as query led to the identification of a fourth gene outside the ena1a-c cluster encoding for a DUF3992 domain-containing protein. As such, we reasoned that WP_017562367 represented the candidate L-ENA subunit and ordered its coding sequence as a synthetic gene, and cloned it into pET28a for cytoplasmic expression in E. coli C43 (DE3), similar to the approach taken for recombinant S-ENA production (18). To probe for the presence of putative L-ENA fibers, the insoluble fraction (after lysozyme treatment and 1% SDS extraction) of the E. coli lysate was analyzed using nsTEM. The obtained micrographs confirmed the presence of micron-sized, 7nm diameter fibers with a characteristic ladder-like pattern at 4.6nm intervals (Fig.1f). Notably, the recEna3A fibers did not have any ruffles on their termini. This result served as an initial confirmation that WP_017562367 is the major subunit of L-ENAs found on B. paranthracis NVH 0075-95. To reflect the fact that WP_017562367 is a member of a new ena gene cluster, we propose to name it ena3a, following the convention used for the ena1a-c (e.g., B. cereus type S-ENA) and ena2a-c (e.g., B. thuringiensis type S-ENA) (18).
Given the relative ease of production and purification of recEna3A fibers in comparison to the L-ENA purification from the natural source, we proceeded to collect a 10886-movie cryoEM dataset of vitrified recEna3A fibers (Supporting Figure 2), leading to a 3.3 Å global resolution (FSC=0.143 criterion) cryoEM volume after helical refinement using C7 symmetry in CryoSPARC v4.0.3 (Fig.2). The refined helical parameters are a twist of 18.5° and a rise of 44.97Å. The recEna3A cryoEM volume reveals an axial stacking of heptameric Ena3A rings that rotate 18.5° clockwise relative to each other (Fig.2a). Each ring is composed of seven Ena3A molecules that interface laterally via β-sheet augmentation. The rings encircle a central, hollow volume which we refer to as the ring lumen (Fig.2b). Consecutive rings interlock via the N-terminal extensions of the upper ring, which dock into the lumen of the ring below (Fig.2c). We refer to the first 14 N-terminal residues as the N-terminal connector (i.e., Ntc) in analogy to the N-termini of Ena1A/B subunits that interlock from above with subunits i-9 and i-10 of the neighboring helical staircase (18). Note that the orientation of the L-ENA fiber in Fig. 2 is such that the top rings represent the distal, ruffled end of the fiber, and that the bottom rings form the pointy end of the fibers, proximal to the spore surface. Close inspection of the cryoEM map revealed covalent cross-links between neighboring Ena3A subunits at three distinct locations (Fig.2d). After manual building of the final L-ENA model, we identified these contacts as three types of inter-molecular disulfide bridges. Two disulfide bridges (i.e., Cys22-Cys82 and Cys14-Cys15) serve to reinforce the lateral contacts within a single ring, whereas one disulfide bridge (i.e., Cys8-Cys21) confers longitudinal coupling between the lumen and the Ntcs of consecutive rings.
Given the heptameric nature of the structure, each ring segment will contain 21 inter-molecular disulfide bridges that are organized along three concentric rings centered on the fiber axis (Fig.2e).
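The ring geometry and the disulfide count can be illustrated with a minimal sketch. It uses only the refined recEna3A parameters quoted above (C7 symmetry, 18.5° twist, 44.97 Å rise, three disulfide types per subunit). The function names and the sign convention for the rotation are illustrative assumptions, not part of the reported processing pipeline.

```python
TWIST_DEG = 18.5             # rotation between consecutive rings (from the text)
RISE_A = 44.97               # axial rise between consecutive rings, in Å (from the text)
N_SUBUNITS = 7               # C7 point-group symmetry of each ring
DISULFIDE_TYPES = 3          # Cys22-Cys82, Cys14-Cys15 and Cys8-Cys21

def subunit_pose(ring, k):
    """Return (azimuth in degrees, axial height in Å) of subunit k in a given ring.

    The positive rotation direction is an arbitrary choice made for this sketch.
    """
    azimuth = (ring * TWIST_DEG + k * 360.0 / N_SUBUNITS) % 360.0
    return azimuth, ring * RISE_A

def disulfides_per_ring():
    """Each of the seven subunits contributes one bridge of each type."""
    return N_SUBUNITS * DISULFIDE_TYPES

if __name__ == "__main__":
    for ring in range(3):
        print(ring, [round(subunit_pose(ring, k)[0], 1) for k in range(N_SUBUNITS)])
    print("disulfides per ring:", disulfides_per_ring())
```

Under these assumptions the count per ring is 7 × 3 = 21, matching the number given in the text.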
Next, we tested the stability of recombinant L-ENA fibers by subjecting them to a series of physical (autoclaving) and/or chemical treatments (8M urea, SDS, proteinase K, formic acid). Following the respective treatments, we performed nsTEM imaging of the recovered fibers and computed 2D class averages to gauge the intactness of the fibers (Supporting Figure 4). Remarkably, the 2D class averages of all treatments (apart from 100% formic acid) are very similar to the 2D class averages obtained from ex vivo purified fibers. We were not able to produce 2D class averages for the 100% formic acid (FA) treated sample, likely indicating that the Ena3A monomers had (partially) unfolded. Interestingly, the FA-treated fibers did not depolymerize, suggesting that the disulfide bonds had not been reduced. We conclude that the extensive hydrogen bonding between the Ena3A subunits combined with the covalent cross-links underlies the physico-chemical robustness of the L-ENA fibers.
L-ENA marries this extreme stability with a remarkable flexibility as judged from the regions of high local curvature in the nsTEM micrographs and 2D class average images obtained during cryoEM processing (Fig.1f). To illustrate the point further, we manually selected and extracted curved L-ENA fiber segments from the motion-corrected micrographs and performed 2D classification (Supporting Figure 3a). To resolve the underlying structural heterogeneity, we performed a 3D variability analysis using CryoSPARC (Supporting Figure 3b). The resulting volume series (processed in ChimeraX 1.4 (23, 24) and exported as Supporting Movie 1) provides further molecular insights into the L-ENA flexibility. Ena3A rings exhibit a rocking motion normal to the fiber axis with the respective hinge points centered on the Ntcs. Hence, L-ENA flexibility can be traced back to (i) the spatial separation between consecutive rings thereby providing the possibility for local ring displacement without introducing steric clashes, (ii) combined with the intrinsic flexibility of the N-terminal connectors.
Ena3A subunits consist of a typical jellyroll fold (25) comprising two juxtaposed β-sheets containing strands BIDG and CHEF (Fig.3). As mentioned earlier, the jellyroll domain is preceded by a flexible 14-residue Ntc that mediates inter-ring coupling. The backbone (420 atoms) root-mean-square deviation (RMSD) between Ena1B and Ena3A is 3.7Å even though the sequence identity is only 28.4% (Supporting Figure 5a-c). Despite the structural similarities at the fold level, S- and L-ENA have a markedly different quaternary architecture (Figure 3a-c). Ena1B subunits interact laterally via β-sheet augmentation (Supporting Figure 5d-e) with a non-zero (i.e., 3.2Å) vertical offset, making a 28° angle relative to the fiber axis. This in turn leads to a helical stacking of Ena1B monomers, yielding an S-ENA fiber with a diameter of 110Å. Ena3A subunits form a similar dimer interface (see the residues marked with an asterisk in Supporting Figure 5b) that follows the same register between the interfacing strands G and C (Supporting Figure 5f-g). In contrast to Ena1B, however, neighboring Ena3A subunits lie within the same plane (i.e., no vertical offset), thereby producing closed ring-like structures. We attribute this difference in axial displacement to the relative tilts that the subunits make with respect to the fiber axis, i.e., 28° and 14° for Ena1B and Ena3A, respectively (Fig. 3c and f). L-ENA fiber biogenesis therefore likely proceeds via docking and covalent locking (via the Cys8-Cys21 disulfide) of fully formed rings, whereas S-ENA fibers elongate via integration of successive monomers.
Distribution and expression of the ena3A gene cluster
Inspection of the GCA_027945115 genome reveals that ena3A (PGS39_28750) is embedded in a three-gene cluster on the NZ_CP116205 plasmid, hereafter referred to as the ena3 gene cluster (Figure 4). In this cluster, ena3A is preceded by the genes PGS39_28740 and PGS39_28745. For reasons discussed below, we suggest naming these two genes, i.e. exsL and l-bclA, respectively. The ena3 gene cluster is rare: it was found in only 62 organisms through a remote search of the entire NCBI RefSeq non-redundant protein database using cblaster (CAGECAT v. 1.0). Assemblies of sufficient quality were downloaded and appended to a representative database of genomes of the Bacillus genus. The proportion of genomes carrying an ena3 gene cluster was indeed low; only 9.5% (n=62/656) of the B. cereus s. l. genomes, of which 51 were B. cereus (n=51/126), four B. thuringiensis (n=4/52), one B. anthracis (n=1/63), one B. paranthracis (n=1/4), one B. toyonensis (n=1/204), two B. mobilis (n=2/5), and two Bacillus sp. (2/4) (Supporting Figure 6). Most genomes had one copy of the gene cluster (n=53), while nine strains had paralogs, carrying two (n=5, B. cereus (3), Bacillus sp. (2)), three (n=2, B. cereus), four (n=1, B. thuringiensis) and five (n=1, a B. thuringiensis) copies. The ena3 gene cluster was not found in B. subtilis or other saprophytic Bacilli (see Supporting Data).
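The counts reported above can be cross-checked with a short tally. The sketch below only restates numbers from the text; the dictionary and function names are illustrative assumptions.

```python
# Genomes carrying an ena3 gene cluster, per species (numbers from the text).
CARRIERS = {
    "B. cereus": 51,
    "B. thuringiensis": 4,
    "B. anthracis": 1,
    "B. paranthracis": 1,
    "B. toyonensis": 1,
    "B. mobilis": 2,
    "Bacillus sp.": 2,
}
TOTAL_BCSL_GENOMES = 656  # B. cereus s. l. genomes in the database (from the text)

# Number of genomes carrying a given number of cluster copies (from the text).
COPY_NUMBER = {1: 53, 2: 5, 3: 2, 4: 1, 5: 1}

def carrier_percentage():
    """Percentage of B. cereus s. l. genomes with at least one ena3 cluster."""
    return 100.0 * sum(CARRIERS.values()) / TOTAL_BCSL_GENOMES

if __name__ == "__main__":
    print("carrier genomes:", sum(CARRIERS.values()))
    print("carrier percentage: %.1f%%" % carrier_percentage())
    print("genomes in copy-number table:", sum(COPY_NUMBER.values()))
```

Both the per-species counts and the copy-number breakdown sum to the same 62 carrier genomes, and 62/656 rounds to the reported 9.5%.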
Thirteen genomes with an L-ENA gene cluster were complete and closed, so that the genomic location of the gene cluster could be inspected. The ena3 gene cluster was located either on the chromosome (n=7 genomes), on plasmids (n=1 genome), or was found as paralogs on one or more plasmids and on the chromosome (n=5 genomes). Many of the isolates carrying the ena3 gene cluster were from outbreaks of bloodborne infections in hospitals in Italy and Japan (26, 27).
The ena3 gene cluster of NVH 0075-95 is flanked by an incomplete topoisomerase upstream and a complete Tn3 family transposase upstream. This gene synteny was found in only two other strains in the cblaster/clinker analysis: B. cereus AFS093282 (NZ_NVMQ01000017.1) and B. pacificus strain BC444B (accession NZ_JAOPRQ010000006.1). In other strains, the L-ENA gene cluster was flanked by frameshifted versions of Tn3 family transposases, tyrosine-type recombinase/integrase and site-specific integrases. A few strains had a shorter Tn3 family transposase upstream of the L-ENA gene clusters. Taken together, these observations suggest that the L-ENA gene cluster is, or has been, located on a transposon and is part of the mobilome of the B. cereus s.l. group, accounting for the polyphyletic distribution of this gene cluster in the population.
To determine the expression of the L-ENA genes (exsL, l-bclA and ena3A), NVH 0075-95 was cultured in sporulation medium for 16 hours, and cDNA was prepared from culture samples collected at four-hour intervals after inoculation (4, 8, 12 and 16 hours) and analyzed by PCR. Using primers that specifically amplify across the open reading frames exsL→l-bclA and l-bclA→ena3A (Figure 4a), we detected a ~587 bp PCR product across l-bclA→ena3A, but not across exsL→l-bclA (Figure 4b). This suggests that l-bclA and ena3A are expressed bicistronically. Notably, no l-bclA-ena3A transcript was detected during the first 8 hrs of cultivation, which represents the vegetative growth phase (Fig. 4b). Consistent with the PCR result, the data from a qPCR analysis indicated that the L-ENA genes are overexpressed exclusively during the sporulation phase (12 and 16 hours) (Fig. 4c). Notably, a ~13000, ~1200 and ~40-fold increase in the expression of ena3A, l-bclA and exsL, respectively, was evident in the samples collected at 12 hours after inoculation.
The ena3 gene cluster consists of the L-ENA subunit, the exosporium anchoring protein ExsL and the complement C1Q-like ruffle protein L-BclA
Ena3A is found in an operon with exsL and l-bclA. In pursuit of the biological roles of exsL and l-bclA, we made individual knockout strains (ΔexsL and Δl-bclA) and investigated their respective endospores by nsTEM (Fig. 5). ExsL-depleted spores are devoid of L-ENA fibers coupled to the exosporium but are otherwise morphologically identical to wild-type NVH 0075-95 spores. Careful inspection of the grid areas led to the identification of detached L-ENAs in the spore supernatant, with ruffles present, suggesting that ExsL mediates the connection of L-ENA to the exosporium. Conversely, Δl-bclA spores have L-ENAs present on the exosporium, but the fibers lack the distal ruffle. We do note that the ruffles remained present on the S-ENA fibers, suggesting that l-bclA specifically encodes the L-ENA ruffle protein. As expected, Δena3A spore samples were completely devoid of L-ENA fibers, whether spore-attached or detached. Complementation of the ΔexsL, Δl-bclA and Δena3A mutants with a low-copy plasmid (pHT315) containing the respective genes restored the corresponding spore phenotypes to that of the wild-type strain (Fig. 5).
A primary sequence analysis of ExsL (WP_048548726.1) using InterPro (28) shows that it is composed of an N-terminal spore coat protein Z (PF10612) domain and a C-terminal Ena core (DUF3992) domain (see the domain organization and corresponding AF2 prediction in Fig.6b). To gain further insights into the molecular mechanisms of L-ENA anchoring and ruffle formation, we performed AlphaFold2 (AF2) modelling (29-31). First, we tested the plausibility of Ena3A-ExsL complex formation. In Supporting Figure 7 we show the AF2 multimer v1.2 model of an Ena3A-ExsL dimer, which had an overall pLDDT score of 82.6 and a ptmscore of 0.73, in support of the hetero-dimer hypothesis. As expected, Ena3A interfaces with ExsL via its C-terminal Ena-core domain in a manner that mimics the Ena3A dimer interface found in the L-ENA structure (Supporting Figure 7e-f). This is somewhat surprising given the low sequence identity (17.4%) between Ena3A and the ExsL Ena-core domain (i.e. residues 157-262). Inspection of the pairwise sequence alignment (Supporting Figure 6d) between Ena3A and ExsL shows that key Ena3A residues (C82, S84 and T86) involved in lateral subunit contacts are conserved in ExsL (C224, S226 and T228). In fact, AF2 predicts a disulfide bridge between ExsL Cys224 and Ena3A Cys22, which mimics the intra-ring L-ENA S-S bridge between Cys22 and Cys82 (Supporting Figure 7e-f; Fig. 2d-e). Despite the low sequence identity between Ena3A and the ExsL Ena-core, the Ena3A-Ena3A and Ena3A-ExsL contacts are highly equivalent (Supporting Figure 7e-f), in line with the general mechanism of β-sheet augmentation, which is primarily driven by shape complementarity and backbone H-bonding and is relatively insensitive to the amino acid sequence in the pairing β-strands (32).
In addition, we looked at the AF2 prediction of a putative ExsL-ExsY dimer. ExsY (WCA20099.1) is one of the major constituents of the B. cereus exosporium and shares structural similarities with the N-terminal CotZ domain of ExsL, albeit at low sequence identity, i.e., 30.8% (Supporting Figure 8a,b,f). Our assumption based on the ΔexsL phenotype and the ExsL-Ena3A AF2 model was that ExsL could also be a potential binding partner of ExsY and, in doing so, act as an exosporium-embedded anchor point for L-ENA fibers. Indeed, the AF2 multimer model of an ExsL-ExsY dimer (with a relatively high pLDDT score of 80.2 and a ptmscore of 0.71) shows an ExsL-ExsY coupling via the N-terminal ExsL1-156 domain, lending further credence to the supposition that ExsL serves as a connective bridge between the paracrystalline exosporium and L-ENA. In analogy to the ExsL-Ena3A dimer, the predicted ExsL-ExsY contact mimics the homomeric ExsY contacts (Supporting Figure 8c,d,e,g) that we observe in a high-confidence AF2 multimer prediction of an ExsY hexamer (pLDDT=81.3; ptmscore=0.86), i.e., the base building unit of the exosporium lattice.
In turn, L-BclA (WP_271292911.1) contains an N-terminal collagen triple helix repeat domain (PF01391) followed by a BclA C-terminal domain (PF18573; part of the C1q_TNF clan (Pfam CL0100)). BclA is the major glycoprotein of the hairy nap found to cover the exosporium of most B. cereus s.l. species (33, 34). Interestingly, this same architecture is found in vertebrate complement component C1q (35) (UniProt: P02745). In Supporting Figure 9a-b, we present the AF2 multimer model of an L-BclA94-336 homotrimer that consists of the C-terminal trimerization domain and a segment of the collagen-like triple helix. The corresponding pLDDT value of 94 and ptmscore of 0.9 support the accuracy of the predicted fold, as well as the supposed trimeric stoichiometry in analogy to the crystal structure (PDB: 1WCK) of BclA-C of B. anthracis. Based on this model, we predict the lateral dimension of the C-terminal domain to be approximately 4nm, and for the triple collagen helix we project an average length of 2.8 Å per residue (see discussion below; Supporting Figure 9a). The all-atom RMSD between L-BclA-CTD and BclA-C is 3.55 Å despite the low sequence identity of 22.1%, demonstrating significant structural homology (Supporting Figure 9c). Tan and Turnbough showed that the attachment of BclA to the basal layer protein BxpB of the exosporium is dependent on (i) proteolytic removal of the first 20 residues from the N-terminus and (ii) the presence of an N-terminal submotif, hereafter referred to as the exosporium leader sequence (InterPro: IPR0212010; Supporting Figure 9d), in front of the collagen-like region (36). Inspection of the N-terminal region of L-BclA demonstrates the absence of an exosporium leader sequence, which indicates that this collagen-like protein is likely not targeted to the exosporium.
A notable feature of collagen-like proteins is that the number of residues that comprise the collagen-stalk region in the primary sequence translates proportionally to the axial dimension of the folded, trimeric entity simply due to the extended nature of the collagen fold (34). To that end, we searched the GCA_027945115.1 genome for the ‘local’ orthologue of BclA (uniprot: Q81JD7) and compared its primary sequence to L-BclA (Supporting Figure 9e). BclANVH 0075-95 (WCA20088.1) has a collagen-like region of 125 residues, whereas the corresponding domain of L-BclA spans 173 residues. Based on the calibration discussed in Supporting Figure 9a, this translates into a predicted fully extended length of a folded trimeric entity of 39 nm and 52 nm for BclA and L-BclA, respectively. These predictions are in agreement with experimental measurements (average height of the hairy nap: ~36nm and average length of L-ENA ruffle: ~50nm) from nsTEM micrographs (Supporting Figure 9f). To further understand the topology of the L-ENA/L-BclA complex, we performed a single round of 2D classification based on particles that were manually picked from the cryoEM dataset that was collected on ex vivo isolated ENA fibers, focusing on L-ENA termini that were decorated with ruffles (Supporting Figure 9g). This class average shows that the ruffle docks into the lumen of the Ena3A ring at the apex of the L-ENA fiber, i.e. mimicking the inter-ring docking mechanism that exists along the body of the fiber. We therefore compared the Ntc of Ena3A to the N-terminus of L-BclA (Supporting Figure 9d). Although no particular sequence homology could be detected, both sequences contain a double cysteine motif at positions 8 and 13, respectively. As cysteine 8 mediates coupling of the Ena3A Ntc to the ring lumen, we speculate that the L-BclA C13C14 motif could be involved in a similar coupling mechanism.
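The length predictions for BclA and L-BclA can be reproduced with a minimal sketch. It assumes that the predicted length is the sum of the collagen-like stalk (2.8 Å per residue, as projected from the AF2 model) and the approximately 4 nm C-terminal domain. This decomposition is an inference that happens to reproduce the stated values; it is not necessarily the exact calibration used for Supporting Figure 9a. The function name is illustrative.

```python
RISE_PER_RESIDUE_NM = 0.28   # 2.8 Å per residue along the collagen triple helix (from the text)
CTD_SIZE_NM = 4.0            # approximate dimension of the C-terminal domain (from the text)

def predicted_length_nm(n_collagen_residues, include_ctd=True):
    """Fully extended length of the trimer for a given collagen-like region size.

    Adding the C-terminal domain is an assumption made for this sketch.
    """
    stalk = n_collagen_residues * RISE_PER_RESIDUE_NM
    return stalk + (CTD_SIZE_NM if include_ctd else 0.0)

if __name__ == "__main__":
    for name, n in (("BclA", 125), ("L-BclA", 173)):
        print(name, round(predicted_length_nm(n)), "nm")
```

With 125 and 173 collagen-like residues, this gives approximately 39 nm and 52 nm, respectively, consistent with the predictions quoted in the text.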