Bacterial strains, plasmids, and cultivation media. Escherichia coli strains, plasmids, and primers used in this work are summarized in Table S1. E. coli strain DH5α (Novagen, Darmstadt, Germany) was used to propagate and amplify plasmid constructs. E. coli Rosetta pLysS (DE3, Novagen, Darmstadt, Germany) was used to express the wild-type PsG3Ox and its variants cloned in the pET-15b plasmid (Novagen, Darmstadt, Germany). KRX (Promega, Wisconsin, USA) was used as the expression strain in the SSM library screenings. Luria-Bertani medium (LB) was used for cell cultivation, supplemented with 100 μg ml-1 of ampicillin (NZYTech, Lisbon, Portugal) and, in the case of Rosetta pLysS, also with 20 μg ml-1 of chloramphenicol (NZYTech, Lisbon, Portugal).
Construction of variants by site-directed mutagenesis. The QuikChange mutagenesis protocol (Stratagene, California, USA) was used to construct single variants, as well as to delete two regions of the enzyme: the insertion-1 loop, between residues 73 and 93, and the substrate-binding loop, between residues 345 and 359, in variant Δloop (345-359) (Table S1). The plasmid pSM-1, carrying the wild-type PsG3Ox gene, was used as the DNA template with appropriate primers. PCRs were performed in a thermal cycler (MyCycler™ thermocycler, Bio-Rad) in a 50 μL reaction volume containing 100 ng of DNA template, 1 μM of primers (forward and reverse), and 200 μM of dNTPs (NZYTech, Lisbon, Portugal). 1 U of NZYProof polymerase (NZYTech, Lisbon, Portugal) was used to amplify the DNA, except for pAT27 and pAT28, where 1 U of Q5 High-Fidelity DNA polymerase (New England BioLabs, Massachusetts, USA) was used. For the single and double mutants, after an initial denaturation of 4 min at 95 ºC, 25 cycles of 1 min at 95 ºC, 1.5 min at 72 ºC, and 10 min at 72 ºC were performed, followed by a final elongation of 10 min at 72 ºC. Amplification of truncated variants was performed with an initial denaturation of 30 sec at 98 ºC, 35 cycles of 10 sec at 98 ºC, 30 sec at 72 ºC and 4 min at 72 ºC, followed by a final elongation of 2 min at 72 ºC. For all PCR products, the DNA template was digested with 10 U of DpnI (ThermoFisher, Massachusetts, USA) at 37 ºC for 6 h, followed by purification using the Illustra GFX PCR DNA kit (GE Healthcare, Illinois, USA). For the truncated variants, an overnight ligation with T4 ligase (ThermoFisher, Massachusetts, USA) was performed at room temperature, followed by purification with the abovementioned kit. The PCR products were transformed into E. coli strains by electroporation, and the presence of the desired mutation(s) or deletions was confirmed by DNA sequencing.
Production and purification of PsG3Ox. The recombinant E. coli Rosetta pLysS strains carrying the genes coding for wild-type PsG3Ox and variants were grown in 2.5 L of LB medium supplemented with 100 μg ml-1 of ampicillin and 20 μg ml-1 of chloramphenicol in Corning® 5 L Baffled PETG Erlenmeyer flasks. The cultures were incubated at 37 °C, 100 rpm (Innova 44 incubator shaker, New Brunswick Scientific). Cultures were induced with 100 μM of isopropyl β-D-1-thiogalactopyranoside (IPTG) at an OD600 nm = 0.8, the temperature was lowered to 25 °C, and cells were collected by centrifugation (4420 × g, 10 min, 4 °C) after 16 h of cultivation. The purification of wild-type PsG3Ox and variants was performed using a Histrap HP column (Cytiva, Massachusetts, USA) as previously described (Mendes et al., 2016). Enzyme preparations for crystallographic trials were further purified on a 1 ml Resource-Q column (Cytiva, Massachusetts, USA) using 20 mM Tris-HCl pH 7.6 as running buffer and a gradient of 0-500 mM of NaCl for elution. Before crystallization, the His(6x)-tag was cleaved from the enzyme using the thrombin cleavage kit (Abcam, Cambridge, UK) following the manufacturer's protocol at 20 ºC for 16 h. Afterward, the preparation was loaded on the Histrap HP column, and the flowthrough (containing the untagged enzyme) was collected. The total protein concentration was determined by the Bradford assay using bovine serum albumin as standard. Abs450nm ( M-1 cm-1) of purified preparations was measured to assess the functional fraction of enzyme preparations (e.g., for kinetic measurements).
Crystallization and cryoprotection. Crystallization conditions were screened with a nanodrop crystallization robot (Cartesian, Genomic Solutions) using the sitting drop vapour diffusion method with round-bottom Greiner 96-well CrystalQuick™ plates (Greiner Bio-One, Kremsmünster, Austria). The Structure Screen I and II (Molecular Dimensions) led to the formation of PsG3Ox crystals within seven days at 20 ºC using 2 M ammonium sulfate and 0.1 M Tris-HCl, pH 8.5 in drops of 0.1 μl protein solution (18 mg ml-1) plus 0.1 μl reservoir solution. Following this crystallization hit, microliter-scale crystal optimization proceeded using the hanging drop vapour diffusion method in XRL 24-well crystallization plates (Molecular Dimensions, Newmarket, UK) with 500 μL of reservoir solution. Several conditions were tested: ammonium sulfate concentrations ranging from 0.5 to 2 M, Tris-HCl at pH 7.0-8.5, and different ratios (1:2, 1:1, and 2:1) of protein:reservoir solution volumes. Yellow, round crystals appeared after 7-10 days, reaching 100 µm in all three dimensions when using 2 M ammonium sulfate and 0.1 M Tris-HCl, pH 8.5, with 1 μL of protein (18 mg ml-1) and 2 μL of reservoir solution at 20 °C. Crystals of the PsG3Ox-substrate complexes were obtained by soaking PsG3Ox crystals in 1) the reservoir solution containing 2 M D-Glc for 30 min and 2) the reservoir solution containing 1 mM Mang for 1 min. Crystals were cryo-protected by plunging them into reservoir solution supplemented with 20% (v/v) glycerol before flash-cooling in liquid nitrogen.
Data collection and processing. Diffraction data were measured at the ID23-2 and ID30A-1 beamlines of the European Synchrotron Radiation Facility (ESRF, Grenoble, France) and at the XALOC beamline of ALBA (Barcelona, Spain) for PsG3Ox, PsG3Ox-Glc and PsG3Ox-Mang complex crystals, respectively. Diffraction images of PsG3Ox were obtained with a DECTRIS PILATUS3 X 2M detector, using a radiation wavelength of 0.8731 Å, a crystal-to-detector distance of 232 mm, and an oscillation width of 0.20º over a total rotation of 360º. The diffraction data of the PsG3Ox-Glc complex were obtained with a PILATUS3 2M detector, using a radiation wavelength of 0.9654 Å, a crystal-to-detector distance of 238 mm, and an oscillation width of 0.20º over a total rotation of 180º. The diffraction data of the PsG3Ox-Mang complex were obtained with a DECTRIS PILATUS 6M detector, using a radiation wavelength of 0.97926 Å, a crystal-to-detector distance of 553.95 mm, and an oscillation width of 0.20º over a total rotation of 180º. Data were indexed and integrated with XDS38 in a space group determined with POINTLESS39, and scaled with AIMLESS39,40. These programs were used within the autoPROC data processing pipeline41. Data collection details and processing statistics are listed in Table S2.
Structure determination, refinement, and analysis. The three crystals belong to the same space group and show similar cell dimensions. The distribution of their Matthews coefficient42,43 indicated a high probability of a single molecule in their asymmetric units. The phase problem of PsG3Ox was solved by molecular replacement using MORDA, which selected the coordinates of P. chrysosporium POx (PcP2Ox, PDB 4MIF) and the Alkalihalobacillus halodurans ATP phosphoribosyltransferase regulatory subunit structure (PDB 3OD1) as search models and led to a solution with 99% probability. PHASER, within the PHENIX suite, was used to solve the two PsG3Ox-substrate structures using the native PsG3Ox structure as a search model, which led to TFZ values of 35 and 42, indicating successful structure solutions44. Automated model rebuilding and completion were performed with PHENIX.AUTOBUILD, followed by manual model building with COOT45 and iterative refinement cycles using PHENIX.REFINE.
Structure refinement included atomic coordinates, isotropic atomic displacement parameters (a.d.p.s), and translation, libration, and screw (TLS) refinement of anisotropic a.d.p.s over domains previously defined with the TLSMD server (http://skuld.bmsc.washington.edu/~tlsmd). Approximately 1.5% of reflections were randomly excluded from refinement and used to monitor the refinement strategy. Solvent water molecules were automatically assigned from σA difference map peaks neighboring hydrogen bonding acceptors/donors within 2.45-3.40 Å distances. Other solvent molecules were identified by comparing their shapes against electron density blobs, as well as by comparing their refined a.d.p.s with those of neighboring atoms. Some atoms were modeled with partial occupancies when hinted by the difference Fourier maps and neighboring a.d.p.s values. As some regions of the PsG3Ox and PsG3Ox-substrate complex structures were not visible in the electron density maps, a BUSTER protocol was applied to search for missing atoms46,47. The stereochemistry of the refined structures was analyzed with MOLPROBITY. Three-dimensional superposition of polypeptide chains was performed with MODELLER. Figures of structural models were prepared with PyMOL. Refinement statistics are presented in Table 1. Structure factors and associated structure coordinates of PsG3Ox, PsG3Ox-Glc, and PsG3Ox-Mang complex were deposited in the Protein Data Bank (www.rcsb.org) with PDB codes 7QF8, 7QFD, and 7QVA, respectively.
Loop modeling. Rosetta loop modeling was used to build the non-visible loops in the PsG3Ox (202-204 and 309-318), PsG3Ox-Glc (77-89, 201-206, 308-320, and 345-359), and PsG3Ox-Mang (74-90, 201-206, 309-324 and 351-359) complex structures using a methodology previously described48; a full-atom refinement step was performed with the next-generation kinematic (NGK) closure robotics-inspired conformational sampling protocol49. The crystal structures of PsG3Ox, PsG3Ox-Glc, and PsG3Ox-Mang were kept intact except for the loop regions that were built, with repacking of the side chains within 10 Å of the remodeled region. A total of 500 loops were built and, to find the best loop candidate, each model was scored by its Rosetta energy score and its contacts with the substrate molecules.
Apparent steady-state kinetics. Apparent steady-state kinetics measurements were performed at 37 ºC in 100 mM sodium phosphate buffer, pH 7.5, and reactions were started by the addition of enzyme. The kinetic parameters for D-glucose (D-Glc, PanReac Applichem, Darmstadt, Germany), D-galactose (PanReac Applichem, Darmstadt, Germany), D-ribose (VWR, Pennsylvania, USA), D-xylose (Sigma Aldrich, Missouri, USA), and L-arabinose (Sigma Aldrich, Missouri, USA) were measured using a coupled assay containing 0.1 mM 4-aminoantipyrine (AAP, Acros Organics, Geel, Belgium), 1 mM 3,5-dichloro-2-hydroxybenzenesulfonic acid sodium salt (DCHBS, Alfa Aesar, Massachusetts, USA), 8 U ml-1 horseradish peroxidase (HRP, PanReac Applichem, Darmstadt, Germany) and different concentrations of substrate. Enzymatic activity was monitored using a Synergy2 microplate reader (BioTek, Vermont, USA) by following the formation of N-(4-antipyryl)-3-chloro-5-sulfonate-p-benzoquinone-monoimine (a pink chromogen) at 515 nm (ε515 = 26,000 M-1 cm-1). The kinetic parameters for molecular oxygen were measured in an Oxygraph system (Hansatech Instruments, Pentney, UK) by following oxygen consumption in reactions containing 1 M D-Glc as an electron donor and different oxygen concentrations pre-set by bubbling O2 or N2 gas. The oxidation of the glycosides mangiferin (Sigma Aldrich, Missouri, USA), rutin (Acros Organics, Geel, Belgium), and carminic acid (Sigma Aldrich, Missouri, USA) was followed by oxygen consumption in the Oxygraph apparatus in reactions containing 0-2 mM of Mang, 0-0.5 mM of rutin or 0-1 mM of carminic acid. Specific activity was calculated taking into account the fraction of functional (FAD-loaded) enzyme in the preparation. Apparent steady-state kinetic parameters (kcat and Km) were determined by fitting the data directly to the Michaelis-Menten equation using OriginLab software.
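As an illustration of the direct (untransformed) fit to the Michaelis-Menten equation, the following is a minimal sketch in plain Python; it is not the OriginLab procedure used in this work. The function names, the Km search bounds, and the synthetic data are assumptions introduced for the example. For a fixed Km the best-fitting Vmax has a closed form, so only Km is searched numerically.

```python
import math

def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = Vmax*[S] / (Km + [S])."""
    return vmax * s / (km + s)

def fit_michaelis_menten(substrate, rates, km_min=1e-6, km_max=1e3):
    """Least-squares fit of (Vmax, Km) to untransformed rate data.

    km_min and km_max are hypothetical search bounds (same units as [S]).
    Km is searched on a log scale: coarse scan, then golden-section refinement.
    """
    def best_vmax(km):
        # Closed-form least-squares Vmax for a given Km.
        x = [s / (km + s) for s in substrate]
        return sum(v * xi for v, xi in zip(rates, x)) / sum(xi * xi for xi in x)

    def sse(log_km):
        km = 10.0 ** log_km
        vmax = best_vmax(km)
        return sum((v - mm_rate(s, vmax, km)) ** 2
                   for s, v in zip(substrate, rates))

    lo, hi = math.log10(km_min), math.log10(km_max)
    n = 200
    grid = [lo + (hi - lo) * i / n for i in range(n + 1)]
    i_best = min(range(n + 1), key=lambda i: sse(grid[i]))
    a = grid[max(i_best - 1, 0)]
    b = grid[min(i_best + 1, n)]

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > 1e-10:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    km = 10.0 ** ((a + b) / 2.0)
    return best_vmax(km), km

if __name__ == "__main__":
    # Synthetic, noise-free example data (illustrative values only).
    conc = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
    obs = [mm_rate(s, 10.0, 2.0) for s in conc]
    vmax, km = fit_michaelis_menten(conc, obs)
    print(f"Vmax = {vmax:.4f}, Km = {km:.4f}")
```

Dividing the fitted Vmax by the concentration of functional enzyme then gives kcat.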
For inhibition assays, the steady-state kinetics for D-Glc were performed as described above in the presence of 0-0.5 mM rutin or 0-0.2 mM carminic acid. The data were represented using a Lineweaver-Burk plot, and the inhibition constants were estimated from a secondary plot of the slopes against inhibitor concentration.
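The secondary-plot estimate can be sketched as follows. This is a minimal illustration under the assumption that the Lineweaver-Burk slope depends linearly on inhibitor concentration, slope = (Km/Vmax)(1 + [I]/Ki), as for competitive or mixed inhibition; the inhibition type is not stated here, and all function names are hypothetical. Under that assumption, Ki is minus the x-intercept of the secondary plot.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + gradient * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    gradient = sxy / sxx
    return my - gradient * mx, gradient

def lineweaver_burk_slope(substrate, rates):
    """Slope (apparent Km / Vmax) of the double-reciprocal plot 1/v vs 1/[S]."""
    _, slope = linear_fit([1.0 / s for s in substrate],
                          [1.0 / v for v in rates])
    return slope

def ki_from_secondary_plot(inhibitor_concs, lb_slopes):
    """Ki as minus the x-intercept of LB slope vs [I] (assumes linearity)."""
    intercept, gradient = linear_fit(inhibitor_concs, lb_slopes)
    return intercept / gradient

if __name__ == "__main__":
    # Synthetic competitive-inhibition data (illustrative values only).
    vmax, km, ki_true = 10.0, 2.0, 0.1
    conc = [0.5, 1.0, 2.0, 5.0, 10.0]
    inhibitor = [0.0, 0.05, 0.1, 0.2]
    slopes = []
    for i in inhibitor:
        rates = [vmax * s / (km * (1.0 + i / ki_true) + s) for s in conc]
        slopes.append(lineweaver_burk_slope(conc, rates))
    print(f"Ki = {ki_from_secondary_plot(inhibitor, slopes):.4f}")
```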
Identification of mangiferin oxidation product. Oxidation of Mang was performed under aerobic conditions at 25 ºC, pH 7.5, in 30 mL of Milli-Q water containing 20 mg of Mang, 1 U ml-1 of catalase (Sigma Aldrich, Missouri, USA), and 1 U ml-1 of PsG3Ox. To estimate the time needed to reach a high yield of oxidized Mang, a time course of the reaction was followed by thin-layer chromatography (TLC) on silica gel 60 F254 sheets (Merck, Darmstadt, Germany) using a mixture of butanol, acetic acid, and water in the proportions 4:1:2.2 (v/v) as the mobile phase. The TLC plates were revealed with a diphenylamine-aniline-phosphoric acid reagent50, a system used to distinguish sugars. For the NMR characterization, the reaction was run for 30 min, and the enzymes were then removed by ultrafiltration using a Vivaspin 20 with a 30 kDa cutoff (Cytiva, Massachusetts, USA). The water in the mixture was evaporated under reduced pressure on a rotary vacuum evaporator, and the resulting sediment was resuspended in ~ 600 of dimethyl sulfoxide-d6 (Merck, Darmstadt, Germany). Mang and the reaction product, both in DMSO-d6, were analyzed by 1H, 13C APT, COSY, and HMQC NMR on a Bruker Avance II+ 400. 1H NMR spectra were obtained at 400 MHz and 13C spectra at 100.61 MHz.
Molecular dynamics simulations. Molecular dynamics (MD) simulations of PsG3Ox were carried out to further explore the conformational preferences of the enzyme at different stages of the catalytic cycle. First, MD simulations with 400 ns production runs were performed for four systems built as follows: i) Model I: taken from the PsG3Ox crystal structure (without substrates), which shows closed substrate and insertion-1 loops; ii) Model II: obtained from the same crystal structure but with the substrate loop sampled with the Yasara Sample Loop function and D-Glc docked (see below) in a binding mode compatible with C2 oxidation; this model presents a semi-open substrate loop and a closed insertion-1 loop; iii) Model III: prepared from the PsG3Ox-Mang crystal structure, with Mang removed, the missing parts of the loops built with the Yasara Build Loop function, and D-Glc docked in a C2 binding mode; this model shows open substrate and insertion-1 loops; iv) Model IV: derived from the PsG3Ox-Mang crystal structure (with Mang in a binding mode compatible with C3 oxidation) by building the missing parts of the open substrate and insertion-1 loops. Missing hydrogen atoms were added, and the protonation state of the titratable residues was assigned at pH 7 with the Yasara hydrogen bond network optimization and pKa prediction tools51. The systems were placed in a solvation box and neutralized with NaCl. The conventional MD simulations (cMDs) were set up and run using Yasara52; the AMBER14 force field53 and TIP3P water model54 were used. A two-step equilibration was carried out for 400 ps: first, the system’s temperature was increased in 10 steps from 30 K to 300 K, followed by a second step at constant temperature and box dimensions. The bonds and angles involving hydrogen atoms were fixed. A restraint on all non-hydrogen atoms of the protein-ligand complex was applied so that the equilibration would mainly affect the system’s solvent.
Bonded and non-bonded forces were updated every 2 and 5 fs, respectively. The protein-ligand complex was then released, and a production step of 400 ns was carried out. Gaussian accelerated MDs (GaMDs)55 were then run for Models I-III to access effectively longer simulation times and wider conformational exploration, especially of the substrate loop. The starting structures for these GaMD simulations were the 100 ns structures from the corresponding cMDs. An extra model (Model III*) was built from Model III by removing the bound D-Glc substrate to see whether loop transitions could be observed. The GaMD simulations were run with the AMBER 20 program56. The GaMD protocol consisted of an initial equilibration stage, in which the potential boost was applied and the boost parameters were updated, followed by production runs with fixed boost parameters. A dual boost on dihedral and total potential energy was applied (igamd = 3). Two simulations of 600 ns were run for each of Models I, II, and III*, whereas the simulation for Model III was stopped after 200 ns, as D-Glc was observed to leave the active site. GaMD inputs were generated following the recommendations from the developers57. The convergence of all simulations was assessed by calculating the RMSD values of the protein Cα atoms. The distances between FADN5 and A352, as well as G84, were used as indicators of the conformational state of the substrate and insertion-1 loops, respectively.
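The two trajectory metrics mentioned above can be sketched in a few lines of plain Python. This is only an illustration, not the analysis code used in this work: it assumes frames have already been superposed on the reference, represents each frame as a hypothetical dictionary mapping atom labels to (x, y, z) coordinates in Å, and uses made-up labels such as "FAD:N5" and "A352:CA" (the residue atom used for the distance is an assumption).

```python
import math

def rmsd(coords, ref):
    """RMSD between two equally ordered, already superposed coordinate sets."""
    if len(coords) != len(ref):
        raise ValueError("coordinate sets must have the same length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords, ref))
    return math.sqrt(total / len(ref))

def distance_trace(frames, atom_a, atom_b):
    """Per-frame distance between two labeled atoms.

    Each frame is a dict: atom label -> (x, y, z). Labels are hypothetical.
    """
    return [math.dist(frame[atom_a], frame[atom_b]) for frame in frames]

if __name__ == "__main__":
    # Toy coordinates (illustrative values only).
    ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    moved = [(x + 1.0, y, z) for x, y, z in ref]
    print("RMSD:", rmsd(moved, ref))

    frames = [
        {"FAD:N5": (0.0, 0.0, 0.0), "A352:CA": (3.0, 4.0, 0.0)},
        {"FAD:N5": (0.0, 0.0, 0.0), "A352:CA": (6.0, 8.0, 0.0)},
    ]
    print("N5 to A352 distances:", distance_trace(frames, "FAD:N5", "A352:CA"))
```

Plotting such a distance trace against simulation time is one simple way to see transitions between closed, semi-open, and open loop states.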
Protein-ligand docking. Protein-ligand docking calculations were carried out on different protein structures from the cMD trajectories using AutoDock VINA58-60. D-Glc was docked in PsG3Ox’s active site in 100 frames from the semi-open (Model II) and open (Model III) substrate loop MDs, respectively. Mang, in turn, was docked in 100 frames from the open substrate loop MD (Model IV). The remaining substrate loop conformations, particularly those with the closed loop, did not allow proper positioning of the substrate in the active site due to steric clashes and were discarded. D-Glc was docked using the YAMBER force field61 in a 20 × 20 × 20 Å cuboid docking cell centered on FADN5, while Mang was docked in a 34 × 34 × 34 Å cuboid docking cell centered on the same atom. A total of 16 ligand conformations were generated per frame, potentially yielding 1600 ligand-enzyme combinations per ligand. Distances between the glycoside’s D-Glc group and FADN5 (N5-HC2; N5-HC3), as well as H440 (NE2-HO2; NE2-HO3), were measured to select structures in potentially catalytically relevant C2 (NE2-HO2 and N5-HC2 < 4 Å) and C3 (NE2-HO3 and N5-HC3 < 4 Å) binding modes that could proceed to an optimization protocol consisting of a minimization step of the whole complex. The initial (pre-optimization) and final distances between FADN5/H440 and the D-Glc group of each glycoside, as well as the total energies of the entire system and the ligand binding energies, were obtained for further analysis. The minimum distances between the ligand and residues Q297, Q340, R94, T129, and K55 were measured. The number of events (frequency) for each measured distance and energy, depending on the ligand-binding mode and substrate loop conformation, was plotted in different histograms. Rutin and carminic acid were docked following the same protocol as for Mang.
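The distance-based selection of docking poses can be sketched as below. This is a minimal illustration, not the scripts used in this work: the pose representation (a dictionary of distances in Å keyed by the labels used in the text) and the function names are hypothetical, and the C3 criterion is assumed to use the C3 atom pair (NE2-HO3 and N5-HC3) by analogy with the C2 criterion.

```python
CUTOFF = 4.0  # Å, distance threshold stated in the text

def binding_modes(pose, cutoff=CUTOFF):
    """Return the catalytically relevant modes ("C2", "C3") a pose satisfies.

    pose: dict with distances in Å under the keys
    "NE2-HO2", "N5-HC2", "NE2-HO3", "N5-HC3" (hypothetical representation).
    """
    modes = []
    if pose["NE2-HO2"] < cutoff and pose["N5-HC2"] < cutoff:
        modes.append("C2")
    if pose["NE2-HO3"] < cutoff and pose["N5-HC3"] < cutoff:
        modes.append("C3")
    return modes

def select_poses(poses, cutoff=CUTOFF):
    """Group poses by binding mode; poses meeting neither criterion are dropped."""
    selected = {"C2": [], "C3": []}
    for pose in poses:
        for mode in binding_modes(pose, cutoff):
            selected[mode].append(pose)
    return selected

if __name__ == "__main__":
    # 100 frames x 16 conformations gives up to 1600 poses per ligand.
    print("Maximum poses per ligand:", 100 * 16)
    # Toy poses (illustrative distances only).
    poses = [
        {"NE2-HO2": 2.1, "N5-HC2": 3.0, "NE2-HO3": 5.2, "N5-HC3": 4.8},
        {"NE2-HO2": 5.5, "N5-HC2": 4.9, "NE2-HO3": 2.4, "N5-HC3": 3.1},
        {"NE2-HO2": 6.0, "N5-HC2": 7.2, "NE2-HO3": 6.5, "N5-HC3": 8.0},
    ]
    kept = select_poses(poses)
    print({mode: len(found) for mode, found in kept.items()})
```

Only the poses retained by such a filter would then be passed on to the minimization step.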