Mfa5 from Porphyromonas gingivalis: a von Willebrand factor domain and an intramolecular isopeptide bond in a Gram-negative bacterial fimbrial protein

The Gram-negative bacterium Porphyromonas gingivalis is a secondary colonizer of the oral biofilm and is involved in the onset and progression of periodontitis. Its type-V fimbriae are important for attachment to other microorganisms in the biofilm and for adhesion to host cells. The fimbriae are assembled from five proteins encoded by the mfa1 operon, of which Mfa5 is one of the ancillary tip proteins. Here we report the X-ray structure of the N-terminal half of Mfa5, which reveals a von Willebrand factor domain and two IgG-like domains. One of the IgG-like domains is stabilized by an intramolecular isopeptide bond, the first such bond observed in a Gram-negative bacterium. These features make Mfa5 structurally more closely related to streptococcal adhesins than to the other P. gingivalis Mfa proteins. The structure reported here indicates that horizontal gene transfer has occurred among the bacteria that live within the oral biofilm.


Introduction
Fimbriae (also called pili) are long filamentous protein polymers that project from the bacterial surface and are crucial for attachment to other microorganisms, host cells, and surfaces [1]. They usually contain several protein subunits encoded by the same gene cluster, resulting in a long shaft of repeating subunits decorated by ancillary tip proteins [2][3][4]. For some gene clusters, flanking transposons have been identified, indicating that horizontal gene transfer can occur [5]. In Gram-negative bacteria, these filamentous structures are assembled non-covalently, assisted by multi-protein complexes that span either the outer membrane or both the outer and inner membranes. In contrast, Gram-positive fimbrial proteins are covalently linked to each other by intermolecular isopeptide bonds, which are amide bonds between a lysine side chain of one subunit and the carboxyl group of a C-terminal threonine of the next subunit. The formation of these bonds is mediated by specific sortases that are encoded by the same gene cluster [6,7]. To date, six types of fimbriae have been classified: 1) type-I, also called the chaperone-usher type, 2) type-IV fimbriae, 3) type-IV secretion fimbriae, 4) type-V fimbriae, 5) curli fibers, and 6) sortase-mediated fimbriae, all of which are essential for bacterial virulence [2,4].
FimA fimbriae are up to 2 µm long and Mfa1 fimbriae are ~100 nm long [26,27], and both have been shown to be involved in disease progression [28]. The mfa1 and fimA gene clusters each encode five proteins, including one major stalk subunit (Mfa1 or FimA), a cell-wall anchor (Mfa2 or FimB), and three ancillary tip proteins (Mfa3-5 or FimC-E) [29]. Each protein carries an N-terminal signal peptide that directs its transport into the periplasm via the Sec pathway [30]. Within the periplasmic space, the proteins are acylated, processed by signal peptidase II, and transported to the outer membrane by the localization of lipoproteins (Lol) pathway [31].
Bacteria of the Bacteroidetes phylum have a specific secretion system, type-IX, that transports certain proteins across the outer membrane [32], and proteins translocated by this system are selected by a conserved C-terminal signal domain. Among the fimbrial proteins, such a C-terminal signal domain is found only in Mfa5. Blocking the type-IX secretion system in P. gingivalis inhibits export of Mfa5 to the outer membrane, whereas the other fimbrial proteins are not affected [33], indicating that they are exported by another, unknown pathway. Mfa5 is also unique among the fimbrial proteins in being exceptionally large (1,228 amino acids): whereas Mfa1-4 each consist of two β-sandwich domains, Mfa5 is predicted to contain several domains, one of which is a von Willebrand factor (vWF) domain.
In this study we present the crystal structure of the N-terminal half of Mfa5, residues 99-664, at 1.8 Å resolution. Our high-resolution structure reveals unforeseen similarity to adhesins hitherto observed only in Gram-positive bacteria. As predicted, the structure contains a vWF domain with a complete metal ion-dependent adhesion site (MIDAS) as well as two IgG-like domains. Intriguingly, one of the IgG-like domains is stabilized by an intramolecular isopeptide bond, which to our knowledge is the first such bond observed in a Gram-negative bacterial surface protein.

Structure determination of Mfa5
A construct lacking the N- and C-terminal signal peptides, Mfa521-1044, was designed and expressed in E. coli (Fig. 1). The calculated molecular weight was 112 kDa; however, purification yielded a degradation product estimated at ~70 kDa by SDS-PAGE. A similar fragment was obtained when the protein was treated with α-chymotrypsin. From this sample, intergrown, plate-like crystals were obtained.
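As a rough plausibility check on the construct sizes quoted here and in the mass-spectrometry analysis, the sketch below applies the common rule of thumb of ~110 Da per residue to the residue ranges named in the text. The 110 Da average and the assumption that affinity tags and cleavage remnants can be ignored are ours, not the authors'; exact masses require the actual sequence.

```python
# Rough size estimate for Mfa5 constructs from residue ranges alone.
# Assumption: ~110 Da per residue (rule of thumb); affinity tags and
# protease-cleavage remnants are ignored.
AVG_RESIDUE_MASS_DA = 110.0


def residue_count(first: int, last: int) -> int:
    """Number of residues in the inclusive range first..last."""
    if last < first:
        raise ValueError("last must be >= first")
    return last - first + 1


def approx_mass_kda(first: int, last: int) -> float:
    """Approximate mass in kDa of residues first..last."""
    return residue_count(first, last) * AVG_RESIDUE_MASS_DA / 1000.0


def residues_for_mass(mass_kda: float) -> int:
    """Approximate number of residues corresponding to a mass in kDa."""
    return round(mass_kda * 1000.0 / AVG_RESIDUE_MASS_DA)


CONSTRUCTS = {
    "Mfa5(21-1044)": (21, 1044),
    "Mfa5(99-664)": (99, 664),
    "Mfa5(138-435)": (138, 435),
}

if __name__ == "__main__":
    for name, (first, last) in CONSTRUCTS.items():
        n = residue_count(first, last)
        print(f"{name}: {n} residues, ~{approx_mass_kda(first, last):.1f} kDa")
    # Size of the ~70 kDa degradation product expressed in residues.
    print(f"~70 kDa fragment: ~{residues_for_mass(70.0)} residues")
```

Under these assumptions the full construct comes to ~112.6 kDa (1,024 residues), in line with the calculated 112 kDa, and the 566-residue Mfa599-664 construct comes to ~62.3 kDa, close to its theoretical mass of 62,059 Da; the ~70 kDa fragment corresponds to roughly 636 residues.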
The crystals diffracted to 1.8 Å, belonged to space group P2₁2₁2₁, and contained one molecule in the asymmetric unit. Because no phase information could be obtained from these crystals, a shorter construct, Mfa5138-435, encompassing only the predicted vWF domain, was crystallized. A single rectangular crystal diffracting to 1.85 Å in space group P2₁ with one molecule in the asymmetric unit was obtained. A highly redundant dataset was collected, and the structure was solved by sulfur SAD. The structure was refined to final Rwork and Rfree of 13
The second domain, D2 (amino acids 105-135 and 423-544), has an IgG-like fold composed of a β-sandwich of two sheets with three and five β-strands, respectively. Intriguingly, the vWF domain is inserted between strands B (first sheet) and C (second sheet) of the D2 domain. The overall topology of D2 is similar to that of CnaA, one of the common building blocks of Gram-positive adhesins [35]. Interestingly, an intramolecular isopeptide bond connects Lys111 of the first β-sheet to Asn518 of the second β-sheet (  [35]; however, the generally high B-factors and comparisons to other IgG-like domains led to the assumption that D3 might be incomplete.
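The Rwork and Rfree statistics mentioned for the refinement follow the standard crystallographic R-factor, R = Σ| |Fobs| − k|Fcalc| | / Σ|Fobs|, evaluated separately on the working reflections and on a small test set excluded from refinement. The sketch below shows only that bookkeeping on toy amplitudes; the single scale factor k and the data are illustrative assumptions, and refinement programs such as PHENIX additionally model bulk solvent and anisotropic scaling.

```python
from typing import Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float], scale: float = 1.0) -> float:
    """Crystallographic R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need equal-length, non-empty amplitude lists")
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_r_free(
    f_obs: Sequence[float],
    f_calc: Sequence[float],
    is_free: Sequence[bool],
    scale: float = 1.0,
) -> Tuple[float, float]:
    """Split reflections into working and test (free) sets and compute R for each."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], scale)
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test], scale)
    return r_work, r_free


if __name__ == "__main__":
    # Toy amplitudes (hypothetical), with one reflection flagged as free.
    f_obs = [100.0, 200.0, 50.0, 80.0]
    f_calc = [90.0, 210.0, 55.0, 80.0]
    flags = [False, False, True, False]
    print(r_work_r_free(f_obs, f_calc, flags))
```

Because the free reflections never enter refinement, a small gap between Rwork and Rfree is the usual indication that the model is not overfitted.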
von Willebrand factor domain
The structure obtained from Mfa5138-435, hereafter referred to as Mfa5D1, is folded into a vWF domain consisting of a typical Rossmann fold: a six-stranded β-sheet flanked by three helices on each side. This globular domain, which has a wrench-like shape, measures 44 Å × 52 Å × 79 Å and includes the MIDAS motif, a central feature for interaction with various targets. In the Mfa5 MIDAS site, the five residues coordinating the metal ion are situated on the wrench head on top of the β-sheet. Mfa5 has a classical MIDAS motif [36] in which the side-chain oxygens of Ser152, Ser154, and Thr224 coordinate the metal directly and the side chains of Asp150 and Asp252 coordinate it via a water molecule. The crystallization conditions and the coordination distances suggest that this is a Mg2+ ion (bottom inset in Fig. 2). A structural homology search with DALI [37] using both Mfa5D1 and Mfa5D1-D3 indicated a close structural relationship to the Gram-positive tip pilins RrgA (PDBID: 2ww8 [38]) and GBS104 (PDBID: 3txa [39]) from Streptococcus pneumoniae and Streptococcus agalactiae, respectively, as well as to human integrins such as the collagen-binding α2-I domain (PDBID: 1dzi [40]). Superposition of their respective vWF domains onto the vWF domain of Mfa5 gives a good match to the streptococcal tip fimbriae (RMSD 1.9 Å over 151 of 296 Cα positions) and to the α2-I vWF domain (RMSD 2.5 Å over 88 of 296 Cα positions). This was unexpected given their low sequence similarity and identity to Mfa5: 19% and 10%, respectively, for the streptococcal adhesins and 21% and 9% for the α2-I domain.
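The RMSD values above are computed only over the Cα atoms that the structural alignment pairs up (151 or 88 of 296), after optimal rigid-body superposition. A minimal sketch of that calculation using the Kabsch algorithm in NumPy follows; it assumes the matched Cα coordinates have already been extracted in corresponding order (a step DALI performs internally and which is not reproduced here), and the coordinates in the demo are random, hypothetical points.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched N x 3 coordinate sets after optimal superposition of p onto q.

    Rows of p and q must correspond (e.g. aligned C-alpha atoms).
    """
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("p and q must both have shape (N, 3)")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def aligned_fraction(n_aligned: int, n_total: int) -> float:
    """Fraction of positions that contributed to the RMSD."""
    return n_aligned / n_total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.normal(scale=10.0, size=(151, 3))  # hypothetical C-alpha coordinates
    angle = np.deg2rad(30.0)
    rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                      [np.sin(angle), np.cos(angle), 0.0],
                      [0.0, 0.0, 1.0]])
    q = p @ rot_z.T + np.array([5.0, -3.0, 2.0])
    print(kabsch_rmsd(p, q))  # essentially zero for a pure rigid-body motion
    print(aligned_fraction(151, 296), aligned_fraction(88, 296))
```

For the values quoted above, 151/296 ≈ 0.51 and 88/296 ≈ 0.30 of the Mfa5 Cα positions enter the comparison, which is why an RMSD is best read together with the number of aligned positions.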
A structure-based sequence alignment of the vWF domain confirmed that the MIDAS site is conserved (Fig. S1).
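The similarity and identity percentages quoted for the vWF domains (e.g. 19% and 10%) are derived from a pairwise alignment. A minimal sketch of how such numbers can be computed from two aligned sequences is shown below; the residue grouping used for "similar" and the choice to count only gap-free columns are illustrative assumptions, since alignment tools differ (many count positive BLOSUM62 scores as similar instead), and the example fragments are hypothetical rather than real Mfa5 sequence.

```python
from typing import Dict, Tuple

# Hypothetical grouping of physico-chemically similar residues (illustration only).
SIMILAR_GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "G", "P", "C"]
GROUP_OF: Dict[str, int] = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def similarity_identity(aligned_a: str, aligned_b: str) -> Tuple[float, float]:
    """Percent similarity and identity over alignment columns without gaps ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        return 0.0, 0.0
    identical = sum(1 for x, y in pairs if x == y)
    similar = sum(
        1 for x, y in pairs
        if x == y or (x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y))
    )
    n = len(pairs)
    return 100.0 * similar / n, 100.0 * identical / n


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical).
    print(similarity_identity("DLSVT-K", "DISTTGR"))
```

Because the denominator (gap-free columns, full alignment length, or length of the shorter sequence) changes the result, percentages from different tools are only comparable when the same convention is used.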
Compared with the α2-I structure, the bacterial vWF domains have several additional structural elements. Similar to GBS104 and RrgA [38,39], the Mfa5 vWF domain has a long insertion (ARM2) between the fourth β-strand and the fourth α-helix of the domain (amino acids 257-355). In Mfa5, ARM2 folds into a small domain composed of long loops centered around a β-sheet of four very short strands. Two helices, separated by a proline, form an L-shaped structure that covers one side of the sheet. One of the loops in the ARM2 domain coordinates the second metal ion in the structure. The metal is coordinated by the main-chain carbonyls of Asn298, Thr301, and Leu303 and by the side-chain oxygens of Asn298 and Thr301. In addition, the side chain of Asp336 from another loop also coordinates the metal. The conformation of the loop is further constrained by three prolines, Pro299, Pro302, and Pro305. Based on the coordination distances, this metal has been modeled as a calcium ion. Interestingly, the whole ARM2 domain is rich in prolines (13 of 98 residues), which contributes to the rigidity of the loops (Fig. S2). The equivalent ARM2 in RrgA and GBS104 is longer, 123 amino acids (Fig. 4), and does not bind any metals. RrgA and GBS104 also have two additional loops, a 38-amino-acid insertion within β-strand C (ARM1) and a shorter loop on top of α2 (9 amino acids). These two insertions have no equivalents in Mfa5. Nonetheless, in all three bacterial proteins additional loops extend out from the vWF domain and create a lid over the cleft harboring the MIDAS site. Superposing integrin structures bound to extracellular matrix proteins onto Mfa5 shows that the ARM2 domain of Mfa5, in its present conformation, would block collagen binding as observed for the integrin α2-I domain (PDBID: 1dzi [40]).
In contrast, the ARM2 domain seems to mimic parts of the αV integrin domain, which together with the β3 integrin forms a cleft that binds fibronectin (PDBID: 4mmx [41]; Fig. S3).
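The metal assignments in this section (Mg2+ at the MIDAS site and Ca2+ in ARM2) rest partly on coordination distances. The sketch below illustrates that distance criterion: it computes metal-ligand distances from coordinates and picks the ion whose typical metal-oxygen distance is closest. The reference values (~2.08 Å for Mg-O, ~2.39 Å for Ca-O) are approximate figures of the kind tabulated in surveys of metal sites in protein structures, and the tolerance and coordinates are assumptions; in practice coordination number, geometry, B-factors, anomalous signal, and crystallization conditions are weighed as well.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]

# Approximate typical metal-oxygen distances in Angstrom (assumed reference values).
TYPICAL_M_O_DISTANCE: Dict[str, float] = {"Mg2+": 2.08, "Ca2+": 2.39}


def coordination_distances(metal: Point, ligands: Sequence[Point]) -> List[float]:
    """Distances from the metal to each coordinating atom."""
    return [math.dist(metal, atom) for atom in ligands]


def suggest_ion(
    metal: Point,
    ligands: Sequence[Point],
    reference: Dict[str, float] = TYPICAL_M_O_DISTANCE,
    tolerance: float = 0.15,
) -> Tuple[Optional[str], float]:
    """Return (best-matching ion or None, mean metal-ligand distance)."""
    distances = coordination_distances(metal, ligands)
    if not distances:
        raise ValueError("need at least one coordinating atom")
    mean_d = sum(distances) / len(distances)
    best = min(reference, key=lambda ion: abs(reference[ion] - mean_d))
    if abs(reference[best] - mean_d) > tolerance:
        return None, mean_d  # ambiguous: no reference value is close enough
    return best, mean_d


def octahedron(r: float) -> List[Point]:
    """Hypothetical ideal octahedral ligand positions at distance r from the origin."""
    return [(r, 0.0, 0.0), (-r, 0.0, 0.0), (0.0, r, 0.0),
            (0.0, -r, 0.0), (0.0, 0.0, r), (0.0, 0.0, -r)]


if __name__ == "__main__":
    origin = (0.0, 0.0, 0.0)
    print(suggest_ion(origin, octahedron(2.10)))  # expected to suggest Mg2+
    print(suggest_ion(origin, octahedron(2.40)))  # expected to suggest Ca2+
```

Returning None when no reference value lies within the tolerance keeps the sketch from forcing an assignment on sites whose geometry fits neither ion.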
The isopeptide bond stabilizes the D2 domain
Although P. gingivalis is Gram-negative, the Mfa5 D2 domain has a fold that is common in surface molecules of Gram-positive bacteria. In Gram-positive bacteria, such domains are often stabilized by intramolecular isopeptide bonds that form spontaneously between the side chains of a lysine and an asparagine or aspartate. These isopeptide bonds generally form in hydrophobic environments and require an adjacent acidic residue to coordinate the participating side chains. Surprisingly, in the D2 domain of Mfa5 we observe continuous electron density from the Lys111 ε-amino group to the δ-carboxyamide group of Asn518, which clearly indicates an intramolecular isopeptide bond (top inset in Fig. 2). The surrounding environment is mainly hydrophobic, and the only acidic residue close to the isopeptide bond is Asp432, which is not within hydrogen-bonding distance (3.7 and 4.0 Å to the nitrogen and oxygen of the bond, respectively). Instead, the isopeptide bond is stabilized by Thr517 (OG) and by Tyr486 (OH) via a water molecule. An identical arrangement, an isopeptide bond formed by Lys and Asn residues with coordinating Asp, Thr, and Tyr residues, is found in the D2 and N2 domains of RrgA and GBS104, respectively (Fig. S2). The intramolecular isopeptide bond in Mfa5 links the two β-sheets of the IgG-like domain: Lys111 is located on the first β-strand of the first β-sheet, and Asn518 is located on an antiparallel β-strand of the second β-sheet.
To further validate the presence of the isopeptide bond, purified Mfa599-664 and its point mutant K111A were analyzed by ESI-TOF mass spectrometry. This method measures molecular mass with an accuracy of ±1 Da, and it gave a mass of 62,043 Da for Mfa599-664, which is 15.95 Da less than the theoretical molecular mass of 62,059 Da. This mass difference is consistent with the loss of one NH3 molecule (17 Da) and thus supports the formation of an isopeptide bond (Fig. S4, S5, S6). In contrast, the measured mass of 62,002 Da for the Mfa599-664 K111A mutant was identical to the theoretical molecular mass. To analyze whether this intramolecular isopeptide bond influences protein stability, a thermal shift assay was performed on both proteins. Mfa599-664 showed a two-step unfolding pattern with Tm values of 66°C and 78°C, indicating two domains. In the K111A mutant, a shoulder at 66°C remained, whereas the second peak shifted to 71°C (Fig. 5). The resulting 7°C decrease in Tm for the second peak and the overall change in unfolding pattern indicate that the absence of the isopeptide bond destabilizes the protein.
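The mass-spectrometry argument is simple bookkeeping: forming one Lys-Asn isopeptide bond releases ammonia (~17.03 Da average mass), and a K111A substitution removes the difference between the Lys and Ala residue masses (~57.1 Da). The sketch below encodes that arithmetic with standard average masses (for a Lys-Asp bond the leaving group would instead be water, ~18 Da, which is not modeled here); it is a consistency check on the numbers reported above, not a reproduction of the authors' analysis.

```python
# Standard average masses in Da (approximate).
NH3_DA = 17.031           # released when Lys and Asn form an isopeptide bond
LYS_RESIDUE_DA = 128.174  # Lys residue, C6H12N2O
ALA_RESIDUE_DA = 71.079   # Ala residue, C3H5NO


def expected_mass(theoretical_da: float, n_isopeptide_bonds: int) -> float:
    """Mass expected after forming n Lys-Asn isopeptide bonds (one NH3 lost per bond)."""
    return theoretical_da - n_isopeptide_bonds * NH3_DA


def best_bond_count(measured_da: float, theoretical_da: float, max_bonds: int = 2) -> int:
    """Number of bonds (0..max_bonds) whose expected mass lies closest to the measurement."""
    return min(
        range(max_bonds + 1),
        key=lambda n: abs(measured_da - expected_mass(theoretical_da, n)),
    )


def lys_to_ala(theoretical_da: float) -> float:
    """Theoretical mass after replacing one Lys with Ala."""
    return theoretical_da - (LYS_RESIDUE_DA - ALA_RESIDUE_DA)


if __name__ == "__main__":
    wt_theoretical, wt_measured = 62059.0, 62043.0  # values quoted in the text
    mutant_theoretical = lys_to_ala(wt_theoretical)
    print(f"K111A theoretical: {mutant_theoretical:.1f} Da")
    print("WT bonds:", best_bond_count(wt_measured, wt_theoretical))
    print("K111A bonds:", best_bond_count(62002.0, mutant_theoretical))
```

With these inputs the derived K111A theoretical mass (~62,001.9 Da) agrees with the 62,002 Da quoted above, the wild-type measurement is best explained by one isopeptide bond, and the mutant measurement by none.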

Mfa5 integration in the fimbriae
The mfa gene cluster encodes five proteins, of which Mfa5 has been confirmed as a substrate for the type-IX secretion system [33]. Because native Mfa1 fimbriae from P. gingivalis contain Mfa1, 3, 4, and 5, as previously described [42], we raised specific antibodies against Mfa5 and one additional tip protein, Mfa3. When native fimbriae were analyzed by SDS-PAGE, the bands at 120-150 kDa and 40 kDa were subsequently verified by western blotting to correspond to Mfa5 and Mfa3, respectively (Fig. 6). When the same Mfa5 antiserum was applied to purified native fimbriae and examined by negative staining, a clear elongation of the fimbriae was detectable (Fig. 7). Bulky connectors roughly in the middle of two ~100 nm long fimbriae indicate an antibody connecting two fimbriae head to head (white arrows).

Discussion
Fimbriae are protein polymers projecting from the bacterial surface, and they make the first contact with the targeted host. It is therefore of great interest to understand the structure, assembly mechanism, and ligand specificity of fimbriae, because their adherence mechanisms and biogenesis are potential targets for the development of novel targeted antibacterials [43]. In recent decades, the combined efforts of many groups have contributed to the understanding of both the ligand specificity and the donor-strand exchange mechanism that underlies the polymerization of type-I fimbriae in Gram-negative bacteria such as E. coli (phylum Proteobacteria). This has led to the development of novel antibacterial treatments that have reached clinical trials [4]. P. gingivalis, on the other hand, belongs to the Bacteroidetes phylum, and its two fimbriae, Mfa1 and FimA, are of type-V. Recently reported X-ray structures of recombinant proteins that build up type-V fimbriae share a common core of two domains, each comprising a β-sandwich [44][45][46], in which the first strand of the protein is removed by gingipain proteases when the fimbriae are polymerized. These structures led to the suggestion that type-V fimbriae also depend on a donor-strand exchange mechanism for assembly; however, the exact nature of this mechanism is still not fully understood [47]. The mfa gene cluster encodes five proteins, of which the first four all have the classical type-V fold. The fifth protein, Mfa5, differs from the other Mfa proteins in many ways. It is considerably larger (1,228 residues) than Mfa1-4 (324-663 residues), and unlike Mfa1, 3, and 4 it does not appear to depend on gingipains for maturation [33]. Instead, Mfa5 is the only fimbrial protein that has been found to depend on the type-IX secretion system for translocation across the outer membrane.
, and the association of these proteins in the fimbrial spatial arrangement is likely to have a stabilizing effect on the large Mfa5 protein compared with when it is expressed recombinantly on its own. The vWF domain is folded from 276 residues located between the two segments of the D2 domain. vWF domains are found mostly in eukaryotes but also occur in some bacteria. In the human integrin α2-I, the vWF domain has its collagen-binding MIDAS motif exposed at the top of the protein, whereas bacterial vWF domains have extra segments (ARM1 and ARM2) that fold as subdomains on the sides of the MIDAS motif. The streptococcal adhesins RrgA and GBS104 have two ARMs, whereas Mfa5 has only one. Superposition of the vWF domain of Mfa5 onto the human α2-I domain indicates that ARM2, in its present conformation, would interfere with collagen binding, whereas superposition onto integrin αVβ3 suggests that ARM2 instead mimics part of its fibronectin-binding cleft. Thus, the vWF domain clearly indicates that the D1 domain of Mfa5 functions as an adhesin, but its ligand has not yet been determined.
The Mfa5 structure suggests that an ancestor of the mfa5 gene was acquired from streptococci. Horizontal gene transfer has been shown to occur frequently in the oral biofilm, where hundreds of species live in close physical contact [48]. Over the course of evolution, Mfa5 has retained crucial structural features from its Gram-positive ancestor, such as the stabilizing intramolecular isopeptide bond and the conserved MIDAS motif, and has adapted to its Gram-negative host by acquiring a C-terminal domain that targets the protein to the type-IX secretion system, which is unique to the Bacteroidetes phylum. Although the assembly mechanism of type-V fimbriae is not fully understood, it is known that the fimbrial proteins that build up the final fimbria are expressed as lipidated precursors that are transported to the outer membrane via the lipoprotein export system [31], followed by gingipain-dependent removal of the first β-strand. The cleaved fimbrial protein can then polymerize through a donor-strand replacement mechanism in which a segment from the next fimbrial protein is predicted to participate [44][45][46]. Integration of Mfa5 into the fimbriae is not expected to follow this mechanism because Mfa5 does not depend on gingipains; however, in vivo studies have indicated that incorporation of Mfa5 into mature fimbriae requires a functional vWF domain [33]. This needs to be studied further, because truncation of the vWF domain might prevent correct folding of the other domains and thus impair the ability of Mfa5 to bind to other fimbrial proteins.
Native fimbrial purifications from P. gingivalis, analyzed on SDS gels, revealed protein bands that were identified as Mfa5 by Hasegawa et al. [42]. In this study we further confirmed that antibodies raised against recombinant Mfa5 specifically stain these bands. In negative-stain images, the antibody against Mfa5 appears to bind two fimbriae together, suggesting that Mfa5 is indeed located at the tip of the fimbriae. The different conformations visible in the images also indicate high flexibility, which might be related to the linker regions between the domains. A structure of the full-length native fimbria would answer the remaining questions about the localization of the different subunits and their biogenesis.
P. gingivalis is a key pathogen that is strongly associated with periodontitis, which affects a large part of the population worldwide. P. gingivalis is believed to contribute to the onset of other systemic diseases such as Alzheimer's disease, rheumatoid arthritis, and oral and pancreatic cancer, and its capacity to bind early colonizers of the oral biofilm and extracellular matrix proteins on host cells is deemed to be an important factor. Therefore, further development of anti-adhesive substances that block the vWF-binding cleft, or of blocking antibodies, could provide future tools to hinder the establishment and proliferation of this pathogen. Preventing the fimbriae from attaching to the primary colonizers of the oral biofilm is crucial, because if P. gingivalis cannot colonize the mouth, it cannot cause the chronic inflammation that burdens the immune system, nor can it spread to other, non-oral parts of the body.
The recombinant proteins were expressed in E. coli C41(DE3) (Mfa521-1044) or BL21(DE3) pLysS (other constructs). Cells were cultivated at 20°C in 2-4 L Luria-Bertani medium supplemented with 50 μg·mL⁻¹ kanamycin. At an OD600 of 0.8, expression was induced by the addition of 0.25 mM IPTG. Cells were harvested 16 h later by centrifugation for 20 min at 4,000 × g, flash-frozen in liquid nitrogen, and stored at −80°C until further processing.
Cell pellets were resuspended in 50 mL cold phosphate-buffered saline (PBS) containing 260 mM NaCl, 10 mM imidazole, 2 mM β-mercaptoethanol, 1% Triton X-100, and protease inhibitors (Pierce) and lysed by sonication. Cell debris was removed by centrifugation at 64,000 × g for 30 min, and the supernatant was applied to Ni-IDA resin (TaKaRa Bio). After two wash steps, the protein was eluted with PBS containing 250 mM imidazole. Tobacco Etch Virus protease or α-chymotrypsin was added to the eluted sample at a 1:100 (w/w) ratio, and the mixture was dialyzed against PBS overnight at 4°C. The sample was then applied to a fresh Ni-resin column for re-chromatography. The cleaved protein was concentrated with an Amicon Ultra centrifugal filter (10 kDa cutoff; Millipore) and run on a Superdex 200 16/60 column (GE Healthcare) in 20 mM Tris pH 7.4 and 100 mM NaCl. Peak fractions were pooled, concentrated, and stored at −80°C until further use.

Purification of intact fimbriae
A total of 1 L of P. gingivalis culture was harvested by centrifugation at 4,000 × g for 20 min. The cells were resuspended in 30 mL 20 mM Tris pH 7.4, 10 mM MgCl2, 1.5 M NaCl, 10% sucrose, 0.1 mM DTT, and DNase and lysed by four passes through a French press at 900 psi. Cell debris was removed by centrifugation at 9,000 × g for 10 min, and the supernatant was cleared at 143,000 × g for 90 min. The supernatant was brought to 50% saturation with ammonium sulfate ((NH4)2SO4) at 4°C, and the precipitated protein was collected by centrifugation at 15,000 × g for 30 min.
Mfa5-vWF or anti-Mfa3 (Agrisera AB, Sweden) were diluted 1:20,000 in 3% milk in TBST and incubated with the membrane on a shaker for 60 min. After three 10-min washes with TBST, the secondary antibody (abcam, Sweden) was applied at a 1:50,000 dilution in 3% milk in TBST for 60 min.
After three washes with TBST, the bound secondary antibody was detected using BCIP/NBT (Promega).
Crystallization, data collection, and structure determination
[59]. All datasets were processed with XDS [60], and the sulfur SAD data were combined with BLEND [61]. Phase determination, refinement, and automated model building were performed in the PHENIX suite [62]. The initial solution of Mfa5138-435 was used as a molecular replacement model for Mfa521-1044, which was built with BUCCANEER [63] and completed through cycles of manual building in COOT [64] and refinement with phenix.refine. The final structures were validated using MOLPROBITY [65] and PDBSUM [66], and analyzed with CATH [34], SALIGN [67], and DALI [37].
nm/detection 510-530 nm) after each 0.5°C step with a 3 s equilibration time. The first derivative of the measured fluorescence was plotted against temperature in Microsoft Excel to locate the inflection points, and the minima of the derivative were taken as the melting temperatures (Tm). Each point was measured in triplicate, and the average value was used.
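The melting temperatures were taken from extrema of the first derivative of the fluorescence signal after averaging triplicate measurements. The sketch below shows one way to do the same outside a spreadsheet: a central-difference derivative, then local extrema above a noise threshold. The threshold, the synthetic two-transition curve, and the `use_minima` switch (the text takes minima of the derivative, which depends on how the signal is oriented) are assumptions for illustration, not the authors' procedure.

```python
import math
from typing import List, Sequence


def average_replicates(replicates: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point mean of replicate fluorescence curves (e.g. triplicates)."""
    return [sum(values) / len(values) for values in zip(*replicates)]


def first_derivative(temps: Sequence[float], signal: Sequence[float]) -> List[float]:
    """dF/dT by central differences (one-sided at the two ends)."""
    n = len(temps)
    deriv = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        deriv.append((signal[hi] - signal[lo]) / (temps[hi] - temps[lo]))
    return deriv


def melting_temperatures(
    temps: Sequence[float],
    signal: Sequence[float],
    use_minima: bool = True,
    min_fraction: float = 0.1,
) -> List[float]:
    """Temperatures of derivative extrema exceeding min_fraction of the largest one."""
    deriv = first_derivative(temps, signal)
    if use_minima:
        deriv = [-d for d in deriv]  # turn minima into maxima
    largest = max(deriv)
    if largest <= 0:
        return []
    tms = []
    for i in range(1, len(deriv) - 1):
        is_peak = deriv[i] > deriv[i - 1] and deriv[i] >= deriv[i + 1]
        if is_peak and deriv[i] >= min_fraction * largest:
            tms.append(temps[i])
    return tms


def two_state_curve(temps: Sequence[float], tms: Sequence[float], width: float = 1.5) -> List[float]:
    """Hypothetical synthetic signal: one sigmoidal transition per Tm."""
    return [sum(1.0 / (1.0 + math.exp((tm - t) / width)) for tm in tms) for t in temps]


if __name__ == "__main__":
    temps = [25.0 + 0.5 * k for k in range(141)]  # 25-95 degC in 0.5 degC steps
    replicates = [two_state_curve(temps, [66.0, 78.0]) for _ in range(3)]
    curve = average_replicates(replicates)
    print(melting_temperatures(temps, curve, use_minima=False))
```

Reporting only extrema above a fraction of the largest one keeps small wiggles in the flat baseline regions from being counted as transitions; with real, noisy data some smoothing before differentiation is usually also needed.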

Mass spectrometry
The accurate molecular masses of Mfa599-664 and its K111A mutant were determined by electrospray ionization time-of-flight (ESI-TOF) mass spectrometry. Purified protein (4 μL at 10 mg·mL⁻¹) was separated on a C8 column using a 3 min gradient from 0.1% aqueous formic acid to 0.1% formic acid in acetonitrile on an Agilent 1200 HPLC system. The sample was subsequently ionized and analyzed on an Agilent 6230 LC-TOF/MS operated in 4 GHz acquisition mode. The acquired total ion chromatogram was deconvoluted in the MassHunter Qualitative Analysis software (Agilent; version B.07.00).
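ESI produces a series of multiply protonated ions, so the neutral mass has to be recovered from the m/z ladder before it can be compared with the theoretical value. The sketch below shows the textbook two-peak calculation for adjacent charge states, using m/z = (M + z·mH)/z; it is a simplified illustration with hypothetical peak values, and the MassHunter software's own deconvolution algorithm is more elaborate and is not reproduced here.

```python
from typing import Tuple

PROTON_MASS_DA = 1.007276  # mass of a proton


def charge_from_adjacent_peaks(mz_lower_charge: float, mz_higher_charge: float) -> int:
    """Charge z of the peak at higher m/z, given its neighbour carrying z + 1 charges.

    For neighbours: mz_z = M/z + mH and mz_(z+1) = M/(z+1) + mH,
    hence z = (mz_(z+1) - mH) / (mz_z - mz_(z+1)).
    """
    if mz_lower_charge <= mz_higher_charge:
        raise ValueError("the lower-charge ion must appear at higher m/z")
    z = (mz_higher_charge - PROTON_MASS_DA) / (mz_lower_charge - mz_higher_charge)
    return round(z)


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M from one peak: M = z * (m/z - mH)."""
    return charge * (mz - PROTON_MASS_DA)


def mass_from_pair(mz_lower_charge: float, mz_higher_charge: float) -> Tuple[int, float]:
    """Charge of the first peak and the mean neutral mass estimated from both peaks."""
    z = charge_from_adjacent_peaks(mz_lower_charge, mz_higher_charge)
    m1 = neutral_mass(mz_lower_charge, z)
    m2 = neutral_mass(mz_higher_charge, z + 1)
    return z, (m1 + m2) / 2.0


if __name__ == "__main__":
    # Hypothetical peaks simulated for a 62,043 Da protein at charges 40+ and 41+.
    m = 62043.0
    peak_40 = m / 40 + PROTON_MASS_DA
    peak_41 = m / 41 + PROTON_MASS_DA
    print(mass_from_pair(peak_40, peak_41))
```

Averaging the estimates from several charge states, as deconvolution software effectively does across the whole envelope, reduces the influence of error in any single peak position.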

Transmission electron microscopy
Purified P. gingivalis fimbriae (3 μL) were applied to a carbon-coated, glow-discharged copper grid (CF300-Cu, Electron Microscopy Sciences) and incubated for 3 min. Excess sample was removed by blotting, and the grid was washed with two drops of water and stained twice with 1.5% (w/v) uranyl acetate for 15 s. A Ceta CMOS camera (4k × 4k pixels, FEI) mounted on a Talos L120C transmission electron microscope (FEI, Umeå Core Facility for Electron Microscopy, Sweden) operating at 120 kV was used to examine the negatively stained samples. Images were recorded using the TIA software (FEI).
We thank the beamline scientists at beamlines ID29 (European Synchrotron Radiation Facility, Grenoble) and MX14.