Sequence similarity network (SSN) revealed substrate-specificity GH29 fucosidase clusters
SSN was used to explore the sequence-function space of microbial fucosidases belonging to the GH29 family (www.cazy.org). The SSN is composed of nodes and edges, with each representative node representing a single protein sequence, which is linked with an edge when sharing over 40% sequence identity. The SSN analysis of GH29 amino acid sequences revealed a total of 2971 representative nodes wired by 141732 edges. The network was composed of 63 distinct main clusters and 121 singletons defined by cluster analysis utility29 (Fig. 1). Clusters 1, 2, and 3 accounted for 54% of the total nodes. Of the 63 clusters analysed, clusters 1-11, 13, 16, 18, 20, 21, 23, 26, 34, 41, 45 and 47 included sequences corresponding to functionally characterised enzymes. Among them, clusters 1, 13, and 45 contained GH29-B enzymes while the remaining clusters belonged to GH29-A apart from cluster 11, in which Fuc30 isolated from breast-fed infant faecal microbiome was found unrelated to GH29-A or GH29-B subfamilies14. Clusters 1 and 13 contained α1,3/4 fucosidases active towards α1,3/4 fucosylated GlcNAc found in Lewis antigens (Supplementary Table S1). The convergency ratios of clusters 2, 3 and 4 were lower than 0.30, indicating that these clusters were not isofunctional. Consistent with this, fucosidases belonging to clusters 2, 3, and 4 have been reported to have promiscuous activities for α1,2/3/4/6 fucosyl linkages (Supplementary Table S1). Fucosidases in clusters 2 and 3 have been reported to release Fuc from xyloglucans8,13,30,31. Cluster 2 also contained the newly found exo-α-l-galactosidase BpGH29 from Bacteroides plebeius DSM 17135 32. Fucosidases from clusters 3 and 47 as well as non-clustered FucWf4 from Wenyingzhuangia fucanilytica CZ1127T have been shown to release terminal α1,3/4 Fuc from sulfated fucooligosaccharides33,34. Most fucosidases in cluster 4 are of animal origin. 
Cluster 5 contained fucosidases that specifically act on α1,3 fucosyl linkages with cFase I from Elizabethkingia meningoseptica FMS-007 cleaving α1,3 Fuc from the core GlcNAc position from intact glycoproteins 35. Cluster 6 contained two characterised fucosidases, BF0810 from Bacteroides fragilis NCTC 9343 active on pNP-Fuc but not on natural substrates with α1,2/3/4/6 linkages36; and Fuc5372 isolated from breast-fed infant faecal microbiome, with preference for α1,2 fucosyl linkages found in HMOs and blood group antigens14. Clusters 7, 8 and 10 contained fucosidases with relatively high catalytic efficiency towards aryl-Fuc and marginal activity against α1,2/3/4 fucosyl linkages7,13,36–38. Cluster 9 contained Fuc1584 from breast-fed infant faecal microbiome which acts on α1,3/4/6 fucosyl linkages14. Clusters 11 and 41 contained α1,6 specific fucosidases with no activity towards α1,2/3/4 fucosyl linkages14,39. In cluster 16, AlfB from Lactobacillus casei BL23 has been reported to be over 800-fold more active on α1,3 fucosylated GlcNAc than on α1,4 fucosylated GlcNAc with the non-terminal Gal in LeX abrogating its activity 39. Cluster 26 contained site-specific core α1,6 fucosidase AlfC from L. casei BL2339. Cluster 45 contained Afc1 from Clostridium perfringens ATCC 13124 which showed no activity against all aryl- and natural substrates tested40. Functionally characterised fucosidases displaying transfucosylation activities were found in GH29-A clusters including clusters 2, 3, 7, 8, 18 and 26 and in cluster 1 belonging to GH29-B subfamily (for full information on functionally characterised fucosidases identified in SSN clusters, see Supplementary Table S1). The novel microbial-derived GH29 sequences identified in this SSN analysis belong to a range of microorganisms from the Proteobacteria, Actinobacteria, Planctomycetes, Spirochaetota, Firmicutes and Bacteroidetes phyla (Supplementary Table S2).
Based on this analysis, we selected 11 GH29 sequences predicted to encode novel fucosidases including three from the GH29-B subfamily, TT1377 and TT1380 in cluster 1, TT4202 in cluster 44; seven from GH29-A subfamily, TT1379 in cluster 2, TT1817 and TT4187 in cluster 3, TT4225 in cluster 4, TT4197 from cluster 9, TT1819 and TT1820 in cluster 26; and one singleton, TT4206. We also included previously characterised fucosidases as controls i.e. two α1,3/4 fucosidases from cluster 1, E1_10125 from R. gnavus E110 and SsFuc (TT1385) from Streptomyces sp. 142; TfFuc1 (TT1386) α1,2/6 fucosidase from Tannerella forsythia ATCC 43037 from cluster 813,41; and Afc1 (TT4199) from C. perfringens ATCC 13124 from cluster 45, a predicted fucosidase but with no reported activity against any of the α1,2/3/4/6 fucosylated substrates tested40.
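The network construction described above (nodes linked by an edge when two sequences share over 40% identity, then grouped into clusters and singletons) can be illustrated with a minimal sketch. It assumes that clusters are simply the connected components of the thresholded graph, which may differ from the cluster analysis utility used in this work; the sequence names and identity values below are hypothetical toy data, not taken from the GH29 dataset.

```python
from collections import Counter


def cluster_ssn(nodes, identities, threshold=40.0):
    """Link node pairs whose identity exceeds `threshold` and return
    (number of edges, sorted cluster sizes, number of singletons).

    Assumption: a cluster is a connected component of the thresholded graph.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    n_edges = 0
    for (a, b), identity in identities.items():
        if identity > threshold:  # "over 40%" is read as strictly greater
            n_edges += 1
            parent[find(a)] = find(b)

    sizes = Counter(find(n) for n in nodes)
    clusters = sorted((s for s in sizes.values() if s > 1), reverse=True)
    singletons = sum(1 for s in sizes.values() if s == 1)
    return n_edges, clusters, singletons


# Hypothetical pairwise identities (%), for illustration only.
nodes = ["seqA", "seqB", "seqC", "seqD", "seqE"]
identities = {
    ("seqA", "seqB"): 62.0,
    ("seqB", "seqC"): 45.5,
    ("seqA", "seqC"): 38.0,
    ("seqC", "seqD"): 21.0,
    ("seqD", "seqE"): 40.0,
}

n_edges, clusters, singletons = cluster_ssn(nodes, identities)
print(n_edges, clusters, singletons)
```

In this toy example seqA, seqB and seqC form one cluster through two edges, while seqD and seqE remain singletons because 40.0% does not exceed the threshold.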
Microbial GH29 enzymes show broad substrate specificity towards fucosylated substrates
The genes encoding the selected GH29 fucosidases were heterologously expressed in E. coli and the His6-tag recombinant proteins purified by IMAC and gel filtration (Supplementary Fig. S1). E. coli Tuner DE3 pLacI strain was chosen as heterologous host as it does not display any endogenous β-galactosidase activity (due to the deletion of the LacZ gene) that may interfere with the enzymatic characterization of the recombinant enzymes.
The kinetic parameters of all GH29 enzymes (TT1377, TT1379, TT1380, TT1385, TT1386, TT1817, TT1819, TT1820, TT4187, TT4199, TT4197, TT4202, TT4206, TT4225 and E1-10125) were determined by calculating the initial rate of reaction with increasing CNP-Fuc concentrations (Table 1). All enzymes were found to be active towards CNP-Fuc, apart from TT4199 from cluster 45 as reported earlier 40. TT1379, belonging to cluster 2, showed the highest activity towards CNP-Fuc among all GH29 enzymes tested, with kcat/Km of 58.24 µM⁻¹·min⁻¹, in a range similar to that of Ssα-fuc from the neighbouring node (kcat/Km=10.25 µM⁻¹·min⁻¹)42. The lowest Km values were obtained for GH29-A enzymes such as TT1386, TT1817 and TT1820 from clusters 8, 3, 26, respectively, a common feature of GH29-A subfamily enzymes against aryl substrates 4. Other GH29-A fucosidases distributed across clusters 3, 4, 8 and 9 had similar Km values in the range of 153.7 to 668 µM, whilst kcat values varied from 0.23 to 1411 min⁻¹ (Table 1). The kinetic parameters of TT1377, TT1380, TT1385 and E1-10125 from cluster 1, and TT4202 from cluster 44 were in the same range with catalytic efficiencies between 10⁻² and 10⁻¹ µM⁻¹·min⁻¹, consistent with other GH29-B fucosidases i.e. BT1625 from B. thetaiotaomicron VPI-54829, Eo0918 from Emticicia oligotrophica DSM 1744843, Blon_2336 from B. longum subsp. infantis ATCC 156977.
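As an illustration of how Km, kcat and kcat/Km can be obtained from initial rates measured at increasing substrate concentrations, the sketch below fits the Michaelis-Menten equation through its Hanes-Woolf linear form, S/v = S/Vmax + Km/Vmax. This is an assumption made for the sake of a dependency-free example: the fitting procedure used in this work is not specified here, and non-linear regression is generally preferred for real, noisy data. The enzyme concentration and rate values are hypothetical and noise-free.

```python
def fit_michaelis_menten(substrate, rates):
    """Estimate (Km, Vmax) by ordinary least squares on the Hanes-Woolf
    form S/v = (1/Vmax) * S + Km/Vmax."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


# Hypothetical, noise-free data generated from known parameters.
true_km = 200.0     # µM
true_kcat = 100.0   # min^-1
enzyme = 0.01       # µM (assumed enzyme concentration)
substrate = [25.0, 50.0, 100.0, 200.0, 400.0, 800.0]  # µM
rates = [true_kcat * enzyme * s / (true_km + s) for s in substrate]  # µM/min

km, vmax = fit_michaelis_menten(substrate, rates)
kcat = vmax / enzyme       # min^-1
efficiency = kcat / km     # µM^-1 min^-1
print(f"Km = {km:.1f} µM, kcat = {kcat:.1f} min^-1, kcat/Km = {efficiency:.3f} µM^-1 min^-1")
```

With noise-free input the fit recovers the parameters used to generate the data; with experimental rates, replicate measurements and a weighted or non-linear fit would give more reliable estimates.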
Next, the substrate specificity of the recombinant fucosidases was tested on a range of fucosylated oligosaccharides. The specific activity was first determined based on fucose release against 2′FL (Fucα1-2Galβ1-4Glc), 3FL (Galβ1-4[Fucα1-3]Glc), DFL (Fucα1-2Galβ1-4[Fucα1-3]Glc), BgA (GalNAcα1-3[Fucα1-2]Galβ1-4GlcNAc), BgB (Galα1-3[Fucα1-2]Galβ1-4GlcNAc), BgH (Fucα1-2Galβ1-4GlcNAc), LeA (Galβ1-3[Fucα1-4]GlcNAc), sLeA (Neu5Acα2-3Galβ1-3[Fucα1-4]GlcNAc), LeX (Galβ1-4[Fucα1-3]GlcNAc), sLeX (Neu5Acα2-3Galβ1-4[Fucα1-3]GlcNAc), LeY (Fucα1-2Galβ1-4[Fucα1-3]GlcNAc), Fuc1,6GlcNAc (Fucα1-6GlcNAc), pPGM and pNP-Fuc using the k-fucose kit (Table 2).
TT1377, TT1380, TT1385 and E1-10125 from cluster 1 were found to be over a hundred times more active towards α1,3/4 fucosylated linkages than α1,2 fucosylated linkages, while no detectable activity was shown towards Fuc1,6GlcNAc, in line with other characterised GH29-B enzymes from cluster 1 such as BT_2192 38 (Supplementary Table S1). In this cluster, only E1-10125 showed similar activity towards both LeX and sLeX 10 (Supplementary Table S1 and Table 2) and pPGM (Table 2), consistent with the Lewis epitopes being capped with sialic acids in type III mucin used in this work44. TT4199 (i.e. Afc1 from C. perfringens ATCC 13124) from cluster 45 belonging to GH29-B showed no activity towards pNP-Fuc, weak activity against Fuc1,6GlcNAc and over a thousand times lower activity towards α1,3/4 substrates compared with GH29-B α1,3/4 fucosidases from cluster 1. The non-clustered TT4206 fucosidase showed an enzymatic profile towards α1,3/4/6-linked fucosylated substrates similar to that of TT4199 but weak activity against pNP-Fuc.
The GH29-A enzymes showed preferences towards different α1,2/6 fucosylated linkages. For instance, the highest Fucα1,2Gal specific activities, around 400 U/µmol towards 2′FL and BgH and 0.7 U/µmol towards pPGM, were found for TT1379. TT1386 (i.e. TfFuc1 from T. forsythia ATCC 43037) from cluster 8 showed the second highest specific activity towards pNP-Fuc, BgH and pPGM. TT1819 and TT1820 from cluster 26, and TT4197 from cluster 9 showed specificity for 6FN, with TT1819 and TT1820 being a hundred times more active against pNP-Fuc than TT4197 (Table 2). TT1386 showed dual specific activity towards α1,2/6 fucosylated linkages in BgH-II and Fuc1,6GlcNAc, respectively. No detectable activity towards LeX was found for TT1817 and TT4187 from cluster 3. TT4202 only showed specificity towards the α1,6 linkage, albeit with low activity. TT4225 showed strict specificity for α1,2-fucosylated linkages. None of the enzymes tested in this study showed detectable activity towards blood group A/B type II antigens under the experimental conditions tested, probably due to steric hindrance from the non-terminal GalNAc/Gal.
HPAEC-PAD analyses confirmed the release of Fuc from all GH29 fucosidases tested on their preferred substrates (Fig. 2 and Supplementary Fig. S2). In addition, due to the lower detection limit of HPAEC-PAD, it was possible to identify products below 5 µM, which was not possible using the fucose-kit assay. For example, minor Fuc peaks released from BgA/BgB were observed for 10 of the 15 GH29 enzymes tested in this study including TT1377, TT1379, TT1380, TT1385, TT1386, TT1817, TT1820, TT4187, TT4197 and E1-10125. However, no Fuc could be detected by HPAEC for the enzymatic reactions of TT4199 on 2′FL and BgH-II, TT4225 on LeA and sLeA, and TT4187/TT4202 on sLeA, in agreement with their specific activity.
LC-FD-MS/MS was used to investigate whether TT1819, TT4197 and TT4225 fucosidases could act directly on α1,3/6 core fucosylated glycoproteins. Among the different substrates tested, TT4225 showed activity towards IgG glycan and TT1819 was active towards FA2G2 (Fig. 3). None of the enzymes tested showed activity with PLA2 or IgG glycoprotein (Supplementary Fig. S3).
Microbial GH29-A fucosidases show transfucosylation activity
To test the transfucosylation capacity of the GH29 fucosidases characterised above, the recombinant enzymes were first assayed using GlcNAc as acceptor and pNP-Fuc as donor. The GH29-A fucosidase ATCC_03833 from R. gnavus ATCC 2914910, showing 73.0% similarity to aLfuk1 from Paenibacillus thiaminolyticus (both in cluster 3), was used as control as aLfuk1 was previously shown to catalyse the transfer of α-l-fucosyl moiety to different pNP-glycopyranosides with pNP-Fuc as donor45. The analysis of the reaction products by TLC confirmed the formation of transfucosylation product by ATCC_03833 and showed that TT1379 from cluster 2, TT1817 from cluster 3, TT1819 and TT1820 from cluster 26 displayed transfucosylation activity (Fig. 4A and Supplementary Fig. S4B). These results are in agreement with the SSN analysis showing that GH29 enzymes with reported transglycosylation activity with GlcNAc as acceptor were distributed in clusters 2, 3, 8, and 26 belonging to GH29-A subfamily. In contrast, none of the GH29-B fucosidases tested showed transfucosylation activity using this acceptor-donor pair. Since the Rf values for Fuc1,3GlcNAc, Fuc1,4GlcNAc and Fuc1,6GlcNAc (0.57, 0.52 and 0.55, respectively) on TLC could not discriminate between the products formed, NMR was used to gain further insights into the linkages of the transfucosylation products. The NMR analysis showed that Fuc1,3GlcNAc was the main product generated by TT1379 although traces of Fuc1,4GlcNAc were detected, consistent with TT1379 showing slightly higher hydrolytic activity towards LeX compared to LeA (Fig. 4B). ATCC_03833 and TT1817 from cluster 3 produced Fuc1,6GlcNAc and Fuc1,3GlcNAc but not Fuc1,4GlcNAc (Supplementary Fig. S4C), in agreement with other cluster 3 enzymes such as AmGH29A from Akkermansia muciniphila ATCC BAA-835 with reported activity towards Fuc1,3GlcNAc but not Fuc1,4GlcNAc46.
Transfucosylation reactions with α1,6 fucosidases TT1819 and TT1820 resulted in the synthesis of Fuc1,6GlcNAc (Supplementary Fig. S4C).
TLC-ESI-MS was carried out to further investigate the transfucosylation capacity of TT1819 (Fig. 4C). The product of the TT1819 enzymatic reaction with GlcNAc was confirmed to be a fucosylated compound (Fuc1,xGlcNAc, found m/z 390.1 for [M+Na]+, calcd for C14H25NO10Na 390.1) (Fig. 4C). The ATCC_03833 reaction with GlcNAc produced a fucosylated product (Fuc1,xGlcNAc, found m/z 390.8 for [M+Na]+, calcd for C14H25NO10Na 390.1) (Supplementary Fig. S4A). Further transfucosylation reactions were performed using Fuc1,3GlcNAc or Fuc1,6GlcNAc as acceptors. Both TT1819 and ATCC_03833 produced bifucosylated products with Fuc1,3GlcNAc but not with Fuc1,6GlcNAc (Fig. 4C and Supplementary Fig. S4A). The TT1819 product of the reaction with Fuc1,3GlcNAc was confirmed to be a product of fucosylation (Fuc1,x[Fuc1,3]GlcNAc, found m/z 537.0 for [M+Na]+, calcd for C20H35NO14Na 536.2) (Fig. 4C). The ATCC_03833 reaction product with Fuc1,3GlcNAc was confirmed as a fucosylation product (Fuc1,x[Fuc1,3]GlcNAc, found m/z 537.0 for [M+Na]+, calcd for C20H35NO14Na 536.2) (Supplementary Fig. S4A). For both enzymatic reactions with Fuc1,6GlcNAc, the only peak produced corresponded to the acceptor (TT1819: Fuc1,6GlcNAc, found m/z 390.8 for [M+Na]+, calcd for C14H25NO10Na 390.1; ATCC_03833: Fuc1,6GlcNAc, found m/z 390.7 for [M+Na]+, calcd for C14H25NO10Na 390.1) (Fig. 4C and Supplementary Fig. S4A). From this analysis, it is expected that the product of the TT1819 reaction with Fuc1,3GlcNAc and GlcNAc as acceptors is Fuc1,6[Fuc1,3]GlcNAc, in agreement with the substrate specificity of TT1819 for α1,6 linkages (Fig. 4D).
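The calculated m/z values quoted above for the sodium adducts can be reproduced from the molecular formulae with a short sketch. It assumes singly charged [M+Na]+ ions and uses standard monoisotopic atomic masses; the helper names are illustrative and not part of any analysis software used in this work.

```python
import re

# Monoisotopic atomic masses (Da) for the elements needed here.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "Na": 22.98976928,
}
ELECTRON_MASS = 0.00054858  # Da


def monoisotopic_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C14H25NO10'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def sodium_adduct_mz(neutral_formula):
    """m/z of the singly charged [M+Na]+ ion of a neutral molecule."""
    return monoisotopic_mass(neutral_formula) + MONOISOTOPIC["Na"] - ELECTRON_MASS


# Neutral formulae: Fuc-GlcNAc disaccharide and the bifucosylated GlcNAc.
mono_fucosylated = sodium_adduct_mz("C14H25NO10")
bi_fucosylated = sodium_adduct_mz("C20H35NO14")
print(round(mono_fucosylated, 1), round(bi_fucosylated, 1))
```

Rounded to one decimal place, these values correspond to the calculated masses of 390.1 and 536.2 given for C14H25NO10Na and C20H35NO14Na.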
Structural basis for TT1819 fucosidase from Bifidobacterium asteroides substrate specificity
In addition to its substrate specificity towards α1-6 linkages and transfucosylation activity reported above, LC-FD-MS/MS analyses showed that TT1819 was active against the decasaccharide FA2G2 (Fig. 2) while no activity was detected towards IgG glycan or glycoprotein (Supplementary Fig. S3). To further explore TT1819 substrate specificity, the crystal structure of the catalytic domain was solved, demonstrating the (α/β)8-fold, typical of GH29 enzymes (Fig. 5A) and catalytic features conserved with previously solved GH29 enzymes such as AlfC from Lactobacillus casei W56 (Fig. 5B). Data collection and refinement statistics are detailed in Table 3. It was only possible to grow diffracting crystals of TT1819 in the presence of 2′FL, resulting in a complex with Fuc bound in the active site (Fig. 5A and Supplementary Fig. S5A). Asp218 and Asp260 were identified as catalytic nucleophile and acid/base, respectively, based on proximity to the Fuc residue and homology with other GH29 enzymes, such as AlfC from L. casei W5647. Asp218 is flanked by the structurally conserved Tyr151 that donates a hydrogen bond to the nucleophile, as previously observed in TmαFuc and E1_10125 fucosidases from T. maritima and R. gnavus E1, respectively10,48. Extensive hydrogen bonding interactions were observed between the active site and the bound sugar hydroxyl groups. The C6 methyl group sits in a hydrophobic pocket formed by Trp216 and Trp305. Unlike E1_10125, which showed evidence of β-fucose bound10, the electron density of the TT1819 complex most clearly matched α-fucose. Furthermore, attempting to model β-fucose led to a steric clash with Asp210. High B-factors were observed in the residues surrounding the active site, indicating that there may be plasticity in the presence of larger substrate molecules (Supplementary Fig. S5B). However, minimal conformation changes are observed when comparing the Fuc bound WT active site to that of an unbound D218N catalytic mutant (Fig. 5A).
Compared to fucosidases E1_10125 from R. gnavus E1 and Blon_2336 from B. longum subsp. infantis, the TT1819 active site was shown to be constricted (Supplementary Fig. S5C, S5D), which may contribute to the substrate specificity of this enzyme.
Tyr57 is of interest in relation to TT1819 α1,6 linkage specificity. The residue hydrogen bonds with the catalytic acid/base at the active site boundary (Fig. 5B). This residue is structurally conserved in AlfC (as Tyr37), which is found in the same SSN cluster as TT1819 and shows specificity to α1,6-linked Fuc47. AlfC Tyr37 has been shown to change conformation in the presence of the α1,6-linked ligand. Tyr37 then forms an aromatic subsite, providing a stacking interaction with the monosaccharide that is immediately linked to the intimately held Fuc47. This conformational change and function are expected to be maintained by TT1819. In contrast, an equivalent residue is absent in E1_10125 and Blon_2336 from B. longum subsp. infantis, both belonging to cluster 1. This cluster favours hydrolysis of α1,3/4 fucosyl linkages rather than α1,6 (Supplementary Fig. S5C, S5D). Additionally, it is proposed that TT1819 Ile284 would clash with substrates presenting α1,3/4 linkages, whereas Blon_2336 has an acidic residue well placed to create a stabilising hydrogen bond to the substrate (Supplementary Fig. S5D).
To gain further structural insights into the ligand specificity of TT1819, saturation transfer difference nuclear magnetic resonance spectroscopy (STD NMR) studies (Mayer and Meyer, 1999) were conducted with the inactive TT1819 D218A mutant in the presence of FA2G2 (Fig. 6A). The D218A mutation allowed the NMR study to focus on the process of molecular recognition of the substrate, disentangling it from the subsequent chemical reaction. Transfer of magnetization as saturation from the protein to the ligand was observed, in agreement with the activity of TT1819 for this substrate. Due to the large size of FA2G2 (decasaccharide), the 1D NMR spectrum showed significant chemical shift overlap, which complicated the analysis. For that reason, only isolated protons were assigned and quantitatively analysed (i.e. protons H5 and H6 of fucose, H2s of mannose and the methyl group of the four GlcNAc rings). A full build-up curve analysis of their STD intensities showed that the enzyme intimately recognises the non-reducing end sugar residues constituting FA2G2 (Fig. 6B and Supplementary Fig. S6) with no significant differences in their binding epitopes. The main contacts were restricted to Fuc and GlcNAc (Fig. 6A) residues, whereas only loose contacts were observed with the distant GlcN moieties (Fig. 6B, 6C, 6D).