Improving the Efficiency of the Assembly of Cellulosomes Derived from Clostridium thermocellum by In Silico Design of Docking Protein

Zirui Wang, Qilu University of Technology; Cuiping Yang, Qilu University of Technology; Le Xue, Huazhong University of Science and Technology; Jie Lu, Qilu University of Technology; Peng Du, Qilu University of Technology; Nan Li, Tianjin University of Science and Technology; Piwu Li, Qilu University of Technology; Junqing Wang (wjqtt.6082@163.com), Qilu University of Technology, https://orcid.org/0000-0001-9862-2366; Ruiming Wang, Qilu University of Technology


Introduction
Lignocellulose, which is the main component of plant cell walls, is currently the most abundant renewable resource worldwide and has a high exploitation value. However, lignocellulose is extremely difficult to degrade because it comprises a mixture of cellulose and lignin, which results in great waste of cellulose resources (Ragauskas et al. 2006). Cellulosomes, which are generally produced by anaerobic microorganisms, are macromolecular complexes assembled from scaffolding proteins and various enzymes that are able to degrade cellulose efficiently and have attracted much attention (Bayer et al.). Cellulosomes are highly efficient self-assembled multienzyme systems that are generally composed of two subunits, namely, a multienzyme subunit containing an anchoring structural domain (dockerin), which has a catalytic role, and a scaffolding protein subunit containing one or more adhesion structural domains (cohesins), which assembles the cellulosome complex.
The concept of artificial cellulosomes was first proposed by Bayer et al. in terms of the artificial design and genetic engineering of cellulosomes for efficiently degrading lignocellulose (Bayer et al. 1994). Several laboratories used recombinant DNA techniques to construct genes encoding scaffolding proteins carrying adhesion structural domains and genes encoding cellulase carrying anchoring structural domains, which were expressed, purified, and assembled in vitro into the predicted multienzyme complexes. Fierobe et al. designed a series of scaffolding proteins containing two adhesion structural domains and assembled in vitro a dual-enzyme complex containing two cellulase domains, which had a specific activity that was seven times that of the free enzyme (Fierobe et al. 2005). Moreover, fibrillar microsomes have been proposed for the exploitation of biomass resources. Artificial cellulosomes can efficiently degrade cellulose-like substances that are difficult to degrade and are present in plant cell wall polysaccharides. They thereby play an important role in fermentation and the production of renewable energy and provide ideas for solving problems associated with the utilization of cellulose resources (Zverlov et al. 2008).
Cellulosome fractions can be functionally assembled in engineered organisms for the efficient production of biofuels from organic waste. However, there have been few studies of improving the interactions between key components of cellulosomes via protein engineering. In order to improve the efficiency of binding between docking proteins and adhesion proteins in cellulosomes, a type I docking protein (hereinafter referred to as DocA) and a type I adhesion protein (hereinafter referred to as Coh) from Clostridium thermocellum were selected as the main objects for in silico design (Shang et al.). Amino acids within 4 Å of the calcium ion-binding site of DocA were regarded as key residues involved in calcium binding. Using a Biacore T200 molecular interaction analyzer, we identified the two mutants that had the highest binding capacity for Coh, and then a molecular dynamics (MD) simulation was carried out to analyze the dynamic binding between the DocA mutants and Coh.

Strains and media
Escherichia coli BL21(DE3) was used as an expression host and was cultured in Luria broth (LB) medium at 37 °C. The pET-28a(+) plasmid vectors (Sangon, Shanghai, China) were used for gene cloning. The enzymes used for DNA amplification and restriction and the plasmid preparation kit were obtained from Vazyme (Nanjing, China). The primers were synthesized by Qingke (Beijing, China). All chemicals were purchased from Sigma-Aldrich (St. Louis, MO, USA). The strains and plasmids used are listed in Table 1.
Please insert Table 1 here.

Selection of key components of cellulosomes
The cellulosome model used in this research was derived from Clostridium thermocellum, which has been extensively studied. The 3D structures of a docking protein (referred to as DocA; Protein Data Bank [PDB] ID code: 2CCL) and an adhesion protein (referred to as Coh; PDB ID code: 1OHZ) from the cellulosomes of C. thermocellum were used for preliminary structural analysis using PyMOL 2.3.2 software. Homologous sequences of DocA were searched on the NCBI website using the BLAST server, and homology alignment among a family of 10 xylanase primary structures was performed using the ClustalW2 program (http://www.ebi.ac.uk/Tools/msa/clustalw2/). The 3D structures of the DocA mutants were predicted by multiple template-based homology modeling using the SALIGN program (http://salilab.org/salign) and the MODELLER 9.9 program (http://salilab.org/modeller/).

Heterologous expression of DocA-EGFP and Coh
The protein DocA was fused with enhanced green fluorescent protein (EGFP) via a connecting peptide with the sequence SGGGSGGGSGGS so that the expression status of DocA could be determined from the fluorescence intensity. The genes corresponding to DocA-G (DocA fused with EGFP) and Coh were codon-optimized according to the genome of E. coli BL21(DE3) and were synthesized by GenScript (Nanjing, China). The pET-28a(+) plasmid was used with an inducible T7 promoter for heterologous expression of DocA-G and Coh with a His-Tag. The resulting pET-28a(+)-Coh and pET-28a(+)-DocA-G vectors were separately transformed into E. coli BL21(DE3) by electroporation. E. coli BL21(DE3) transformants were selected on the basis of their ability to grow on an LB plate containing kanamycin and were then screened by colony polymerase chain reaction (PCR) with the primer pairs Coh-F/Coh-R and DocA-F/DocA-R. The primer sequences used are shown in Table 2. The expression of DocA-G and Coh in E. coli BL21(DE3) was performed according to a previously reported method. The final concentration of the inducer isopropyl thiogalactoside (IPTG) was 0.2 mmol/L.
Please insert Table 2 here.

Purification of recombinant DocA-EGFP and Coh
The induced bacterial cells were resuspended in 2× phosphate-buffered saline (PBS) and were then disrupted using an ultrasonic disintegrator (Scientz-650E, Ningbo, China). After centrifugation at 12,000 × g, a sample of 50 mL culture supernatant was brought to 75% saturation by the addition of solid (NH₄)₂SO₄. The precipitate was harvested, dissolved in 5 mL of 20 mmol/L Na₂HPO₄-NaH₂PO₄ buffer (pH 6.0), and dialyzed against the same buffer overnight. The dialysate was concentrated to 1 mL by ultrafiltration using a membrane with a 3 kDa cut-off (Millipore, Billerica, MA, USA) and was loaded onto a HisTrap HP affinity chromatography column (GE, Palo Alto, USA), followed by elution with a linear gradient of 0-400 mmol/L imidazole in the abovementioned buffer at a flow rate of 0.4 mL/min. Aliquots of 2 mL eluate that contained only the target protein were pooled, dialyzed against deionized water, and concentrated. The purified protein was analyzed by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), and its purity was ascertained. Protein samples that met the experimental requirements were collected, and desalting and concentration were performed separately. DocA-G was dialyzed to remove salts using PBS-EP+ buffer, and Coh was dialyzed to remove salts using acetic acid-sodium acetate buffers with different pH values. All purification procedures were performed at 4 °C unless stated otherwise.

Design and identification of mutant docking proteins
The protein sequence data for DocA-G were imported into the Rosetta website (http://rosettadesign.med.unc.edu) for in silico design. One key sequence involved in calcium ion binding, namely, D-V40-D41-K42-N43-G-S45, was selected as a potential mutation site after protein-protein docking and analysis. As shown in Fig. 1, two single-site DocA-G mutants, which were predicted to have the most stable structures, were identified. We used rapid site-directed mutagenesis (Rapid Site-Directed Mutation Kit; Tiangen, Beijing) to obtain these mutants.
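As a minimal sketch of how a set of point substitutions maps onto the key calcium-binding segment, the Python snippet below applies the residue changes reported in the Results for DocA-D40 and DocA-D41 to the wild-type segment. The numbering of the flanking residues (D39 and G44), the function name apply_substitutions, and the constant names are assumptions made for illustration only; the snippet does not reproduce the Rosetta design procedure itself.

```python
# Wild-type key segment; the start position 39 is an assumption chosen so that
# V, D, K, N and S fall on positions 40, 41, 42, 43 and 45 as in the text.
WILD_TYPE_SEGMENT = "DVDKNGS"
SEGMENT_START = 39


def apply_substitutions(segment, start, substitutions):
    """Return a copy of `segment` with {position: new_residue} applied."""
    residues = list(segment)
    for position, new_residue in substitutions.items():
        index = position - start
        if not 0 <= index < len(residues):
            raise ValueError(f"position {position} lies outside the segment")
        residues[index] = new_residue
    return "".join(residues)


# Substitution sets as listed in the Results for the two selected mutants.
DOCA_D40 = {40: "T", 41: "S", 42: "N", 43: "D", 45: "Y"}
DOCA_D41 = {40: "T", 41: "S", 42: "N", 45: "T"}

d40_segment = apply_substitutions(WILD_TYPE_SEGMENT, SEGMENT_START, DOCA_D40)
d41_segment = apply_substitutions(WILD_TYPE_SEGMENT, SEGMENT_START, DOCA_D41)
print("wild type:", WILD_TYPE_SEGMENT)
print("DocA-D40: ", d40_segment)
print("DocA-D41: ", d41_segment)
```

Under these numbering assumptions the two mutant segments differ only at positions 43 and 45.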
Please insert Fig. 1 here.
Interaction between DocA-G and Coh and determination of binding ability
The DocA-G mutants were heterologously expressed and purified. A Biacore T200 molecular interaction analyzer was employed to investigate the binding mechanism of DocA-G and Coh. Coh was immobilized on a CM5 sensor chip, and DocA-G mixed with different concentrations of CaCl₂ was allowed to flow through.
In accordance with the standard protocol, purified Coh was immobilized on the entire surface of a CM5 sensor chip, and the chip was then loaded into the analyzer. The channel of the DocA-G mutant was used as the detection channel, and the channel of the original DocA-G was used as the reference channel. 4-(2-Hydroxyethyl)-1-piperazineethanesulfonic acid-buffered saline was used as the mobile phase. The flow rate in the flow cell was 10 µL/min, and the temperature was set to 20 °C. DocA-G was diluted to approximately the same concentration as the immobilized Coh, and CaCl₂ was serially diluted to concentrations of 1.00 × 10⁻², 1.00 × 10⁻³, 1.00 × 10⁻⁴, 1.00 × 10⁻⁵, 1.00 × 10⁻⁶, and 1.00 × 10⁻⁷ mol/L and stored at 4 °C. The reaction time was strictly controlled at 30 min. We repeated the experiment three times using the same concentrations to confirm the repeatability of the results. After centrifugation, the ligand was injected into the detection and reference channels at a rate of 10 µL/min, and the binding status was determined according to the value of the absorption response.
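The tenfold CaCl₂ series described above can be written out as a short calculation. The following Python sketch generates the six concentrations and, as an illustration, the stock volume to carry over at each step; the 1000 µL final volume and the function names are hypothetical and are not taken from the protocol.

```python
def tenfold_series(start_molar, steps):
    """Return `steps` concentrations, each one tenth of the previous one."""
    return [start_molar / (10 ** i) for i in range(steps)]


def transfer_volume(final_volume_ul, dilution_factor=10):
    """Volume of the more concentrated solution needed per step (C1*V1 = C2*V2)."""
    return final_volume_ul / dilution_factor


series = tenfold_series(1.00e-2, 6)
carry_over_ul = transfer_volume(1000.0)  # hypothetical 1000 uL per tube

for concentration in series:
    print(f"{concentration:.2e} mol/L")
print(f"carry over {carry_over_ul:.0f} uL into {1000.0 - carry_over_ul:.0f} uL buffer")
```

Writing the series as a function makes it easy to check that the lowest concentration reached after six steps is 1.00 × 10⁻⁷ mol/L.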

Molecular dynamics simulation
We used RosettaDock 3.4 molecular docking software to construct the DocA-Coh complex. An MD simulation was performed using the GROMACS 4.5.4 package with the GROMOS96 force field and the SPC/E explicit water model. Each system was energy-minimized until the maximum force fell to 10 kJ/(mol·nm), as previously described. The systems were then gradually equilibrated at 300 K for 100 ps with the protein and ligand restrained. Periodic boundary conditions were applied, and electrostatic interactions were treated using the particle mesh Ewald method. The integration step was set to 0.002 ps, and bonds were constrained using the LINCS algorithm. After the first equilibration step, full equilibration was carried out for 5 ns without restraints, and then the g_rms tool was used to calculate the root mean square deviation (RMSD) values for the interacting enzymes.
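The run lengths above translate into integration step counts by simple division. The sketch below shows that arithmetic in Python; the function and variable names are illustrative, and in a GROMACS parameter file the two quantities would correspond to the dt and nsteps options.

```python
TIME_STEP_PS = 0.002  # integration step stated in the text


def number_of_steps(duration_ps, time_step_ps=TIME_STEP_PS):
    """Number of integration steps needed to cover `duration_ps`."""
    return round(duration_ps / time_step_ps)


restrained_steps = number_of_steps(100.0)      # 100 ps restrained equilibration
unrestrained_steps = number_of_steps(5000.0)   # 5 ns = 5000 ps without restraints

print("restrained equilibration:", restrained_steps, "steps")
print("unrestrained run:", unrestrained_steps, "steps")
```

With a 0.002 ps step, the restrained phase corresponds to 50,000 steps and the 5 ns phase to 2,500,000 steps.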

Selection and expression of the key components of cellulosomes
The docking protein DocA (PDB ID code: 2CCL) and the adhesion protein Coh (PDB ID code: 1OHZ), whose crystal structures had been investigated, were selected as the targets. The selected genes were first codon-optimized for E. coli and synthesized and were then transformed into E. coli BL21(DE3) using the pET-28a(+) plasmid for gene expression. When the optical density at 600 nm reached 1.0, IPTG at a final concentration of 0.2 mmol/L was added to induce protein expression, and induction was carried out at 22 °C for 10 h. After ultrasonic fragmentation and centrifugation, Coh and DocA-G were purified by salting out, ultrafiltration, and affinity chromatography using a HisTrap HP column and were then analyzed by SDS-PAGE. As shown in Fig. 2, the molecular weight of Coh was about 16.7 kDa (lanes 2 and 4), whereas the molecular weight of DocA-G was about 36.7 kDa (lanes 1 and 3). The molecular weights of the purified proteins were close to the theoretical values, which indicated that Coh and DocA-G, i.e., the key components of cellulosomes, were successfully purified.
Please insert Fig. 2 here.

Analysis of binding ability of DocA-G and Coh
After the purification process, the docking mechanisms of Coh and DocA-G at different calcium ion concentrations were investigated using a Biacore T200 molecular interaction analyzer (GE Healthcare, Chicago, IL, USA). The binding ability of Coh and DocA-G increased with the calcium ion concentration in the range from 1.00 × 10⁻⁷ to 1.00 × 10⁻⁴ mol/L. However, when the calcium ion concentration was lower than 1.00 × 10⁻⁴ mol/L, the binding ability of Coh and DocA-G was similar to that when the calcium ion concentration was 1.00 × 10⁻⁷ mol/L. It can be tentatively concluded that the interaction between Coh and DocA-G to form a stable structure requires the participation of calcium ions at a concentration of about 1.00 × 10⁻⁴ mol/L.

Design and selection of DocA-G mutants
Mutation sites in DocA-G were selected with the help of PyMOL software. The amino acids V40, D41, K42, N43, and S45, which were within 0.4 nm of the calcium ion-binding site of DocA-G, were selected as the key residues involved in calcium binding. Then, the key residues were altered and simulated using the Rosetta website. The two highest-scoring mutants, namely, DocA-D40 (containing T40, S41, N42, D43, and Y45) and DocA-D41 (containing T40, S41, N42, and T45), were selected for protein-protein docking and analysis.
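The distance criterion used to pick the key residues can be illustrated with a small calculation. The Python sketch below flags every residue that has at least one atom within 0.4 nm (4 Å) of a calcium ion; the coordinates, the residue labels in the toy input, and the function name are hypothetical placeholders and are not taken from the 2CCL structure, and the selection in this study was made interactively in PyMOL rather than with this code.

```python
import math

CUTOFF_ANGSTROM = 4.0  # 0.4 nm, the cutoff stated in the text


def residues_near_ion(atoms, ion_xyz, cutoff=CUTOFF_ANGSTROM):
    """Return sorted labels of residues with any atom within `cutoff` of the ion.

    `atoms` is an iterable of (residue_label, (x, y, z)) tuples in angstroms.
    """
    near = set()
    for label, xyz in atoms:
        if math.dist(xyz, ion_xyz) <= cutoff:
            near.add(label)
    return sorted(near)


# Hypothetical coordinates for illustration only (not from a PDB file).
calcium = (0.0, 0.0, 0.0)
atoms = [
    ("V40", (2.5, 0.0, 0.0)),
    ("D41", (0.0, 2.4, 0.0)),
    ("G44", (5.0, 3.0, 0.0)),
    ("S45", (1.0, 2.0, 3.0)),
    ("L50", (8.0, 0.0, 0.0)),
]

selected = residues_near_ion(atoms, calcium)
print(selected)
```

In this toy input only the three residues with an atom closer than 4 Å to the ion are returned; residues further away are ignored.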

Please insert
In vitro confirmation of binding of DocA-G mutants to Coh
As shown in Fig. 4, when the calcium ion concentration was in the range from 1.00 × 10⁻⁷ to 1.00 × 10⁻⁴ mol/L, the binding capacities of DocA-D41 and DocA-G were almost identical, although the binding capacity of DocA-D41 was slightly higher at a calcium ion concentration of 1.00 × 10⁻⁴ mol/L and was about 1.2 times that of DocA-G. When the calcium ion concentration was in the range from 1.00 × 10⁻⁵ to 1.00 × 10⁻² mol/L, the binding capacities of DocA-D40 and DocA-D41 were about 3.68 times and 4.11 times that of the original protein DocA-G, respectively. Moreover, DocA-D41 exhibited the highest binding capacity for Coh at a calcium ion concentration of 5 × 10⁻⁴ mol/L.
Please insert Fig. 4 here.
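The relative binding capacities quoted above are ratios of a mutant response to the response of the original DocA-G. A minimal Python sketch of that calculation, averaging triplicate measurements first, is given below; the response values are hypothetical placeholders chosen for readability and are not the measured data, and the function name is illustrative.

```python
from statistics import mean


def fold_change(mutant_responses, reference_responses):
    """Ratio of the mean mutant response to the mean reference response."""
    reference_mean = mean(reference_responses)
    if reference_mean == 0:
        raise ValueError("reference response must be non-zero")
    return mean(mutant_responses) / reference_mean


# Hypothetical triplicate responses (arbitrary response units), not measured data.
reference_ru = [48.0, 50.0, 52.0]
mutant_ru = [147.0, 150.0, 153.0]

ratio = fold_change(mutant_ru, reference_ru)
print(f"binding capacity relative to reference: {ratio:.2f}-fold")
```

Averaging the replicates before taking the ratio keeps a single noisy reference reading from dominating the reported fold change.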

Molecular dynamics simulation and structural analysis of DocA-G mutants
We constructed the DocA-D40-Coh and DocA-D41-Coh complexes using RosettaDock 3.4 and then performed an MD simulation for 5 ns using GROMACS 4.5 software. Using the g_rms tool in GROMACS 4.5, the structural deviations (i.e., RMSD) of the mutants and of the original docking protein DocA (with/without Ca²⁺) were calculated. The RMSD values for the mutants DocA-D40 and DocA-D41 (0.232 and 0.228, respectively) were lower than that for DocA (0.378), which implies that the structures of DocA-D40 and DocA-D41 are more stable than that of DocA.
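For readers unfamiliar with the quantity, RMSD is the square root of the mean squared distance between corresponding atoms in two structures. The Python sketch below computes it for two small coordinate sets; the coordinates and the function name are hypothetical, and the sketch omits the least-squares superposition that trajectory analysis tools typically perform before the comparison.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example (units follow the input coordinates).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.1), (1.0, 0.0, -0.1)]

value = rmsd(reference, frame)
print(f"RMSD = {value:.3f}")
```

A lower value means that the compared frame stays closer to the reference structure, which is the sense in which the mutants are described as more stable.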
Please insert Fig. 5 here.

Discussion
The value of cellulosomes in the conversion of cellulose has been recognized with advances in research on cellulosomes, which have led to new ideas for artificially designing and modifying natural cellulosomes to act more efficiently in the degradation of cellulose. The concept of artificial cellulosomes was first proposed by Bayer et al. in terms of the artificial design and genetic engineering of cellulosomes for efficiently degrading lignocellulose. Several researchers have used techniques such as DNA recombination to construct genes encoding scaffolding proteins carrying adhesion domains and genes encoding cellulase carrying anchoring domains, which were expressed and purified to assemble the desired multienzyme complex in vitro (Biswas et al. 2015). Fierobe et al. designed a series of scaffolding proteins containing two adhesion structural domains and assembled in vitro a dual-enzyme complex containing two cellulase domains, which had a specific activity that was seven times that of the free enzyme (Fierobe et al. 2005). Moraïs et al. constructed heat-stable exoglucanase Cel48S, endoglucanase Cel8A, and heat-stable β-glucosidase from C. thermocellum by error-prone PCR and introduced them into artificial cellulosomes (Moraïs et al. 2016). The results showed that the degradation rate of the "heat-stable" artificial cellulosomes increased by a factor of 1.7 in comparison with conventionally designed artificial cellulosomes. Carvalho et al. found that DocA had two calcium ion-binding sites, of which one was used to stabilize the protein structure and the other was used for stable binding to Coh (Lytle et al. 2000). Research on cellulosomes currently focuses mostly on their structural analysis and applications, but reports on how to improve the binding efficiency of key components of cellulosomes by rational design have been rare (Igarashi et al. 2009; Jeon et al. 2012; Haimovitz et al. 2008).
In this study, we initially used a Biacore T200 molecular interaction analyzer to determine the binding affinities of Coh and DocA and found that binding between Coh and DocA occurred when the calcium ion concentration was in the range from 1.00 × 10⁻⁴ to 1.00 × 10⁻² mol/L, whereas a high calcium ion concentration may inhibit the binding of Coh and DocA. In order to improve the binding efficiency of Coh and DocA, structure data for DocA-G were imported into the Rosetta website for in silico design. The two highest-scoring mutants, namely, DocA-D40 and DocA-D41, were selected for protein-protein docking and analysis. The results showed that the binding capacities of DocA-D40 and DocA-D41 were about 3.68 times and 4.11 times that of the original protein DocA-G, respectively. These results make it possible to improve the binding activity between components of cellulosomes via in silico design based on 3D structures. The mutant DocA-D40 exhibited a higher binding capacity when the calcium ion concentration was 1.00 × 10⁻⁴ mol/L, which implies that, via in silico design of the calcium ion-binding site, cellulosomes can be assembled in the presence of lower concentrations of calcium ions.
We also performed MD simulations to study the interactions of the DocA mutants with Coh. The results showed that the DocA mutants differed from the original protein to a greater or lesser extent and that the differences were mainly concentrated in the loop region. We used the g_rms tool in GROMACS 4.5.4 to calculate the RMSD values for the structures of the DocA mutants and that of the original protein DocA. As shown in Fig. 5, the mutants DocA-D40 and DocA-D41 had smaller RMSD values than DocA (without Ca²⁺). In contrast to DocA (0.378), the high-frequency RMSD values for DocA-D40 and DocA-D41 were 0.232 and 0.228, respectively, which implies that the structures of the mutants DocA-D40 and DocA-D41 are more stable than that of DocA. Hence, the mutant docking proteins and the adhesion protein are easier to assemble with calcium ions.
In conclusion, we developed a method based on the use of a Biacore T200 molecular interaction analyzer to measure activity involved in the assembly of cellulosomes. Moreover, via in silico design of the calcium ion-binding site based on the structure of DocA, two DocA mutants with higher binding capacities for Coh were obtained. As shown in Fig. 4, the binding capacities of the mutants DocA-D40 and DocA-D41 were about 3.68 times and 4.11 times that of the original protein DocA-G, respectively.
DocA-D41 exhibited the highest binding capacity for Coh at a calcium ion concentration of 5 × 10⁻⁴ mol/L. By an MD simulation and structural analysis, we found that the RMSD values for the mutants DocA-D40 and DocA-D41 were lower than those for the original protein DocA, which implies that the structures of DocA-D40 and DocA-D41 are more stable as a result of mutation. Our findings provide an effective method for constructing efficient cellulosomes derived from those in C. thermocellum and will lay a foundation for the design of other types of cellulosome.

Figure 1
Selection of key amino acids for Ca²⁺ binding and saturation mutation design.
PyMOL 1.7 was used to analyze the structural characteristics of the protein DocA, and the amino acids in the protein DocA that were close to the calcium ion were selected as the key amino acids for calcium ion binding.

Figure 2
Electrophoretogram of purified Coh protein and DocA-G protein.
The docking protein DocA-G and the adhesion protein Coh, whose crystal structures had been investigated, were selected as the targets. The selected genes were first codon-optimized for E. coli and synthesized and were then transformed into E. coli BL21(DE3) using the pET-28a(+) plasmid for gene expression. After ultrasonic fragmentation and centrifugation, Coh and DocA-G were purified by salting out, ultrafiltration, and affinity chromatography using a HisTrap HP column.

Multiple sequence alignment of DocA and its mutants.
Two DocA-G mutants, which were predicted to have the most stable structures, were obtained by RosettaDesign. The figure lists the amino acid sequences of the two DocA-G mutants alongside the original amino acid sequence.

Figure 4
Binding capacity map of single-site mutants and Coh at different Ca²⁺ concentrations.
When the calcium ion concentration was in the range from 1.00 × 10⁻⁷ to 1.00 × 10⁻⁴ mol/L, the binding capacities of DocA-D41 and DocA-G were almost identical, although the binding capacity of DocA-D41 was slightly higher at a calcium ion concentration of 1.00 × 10⁻⁴ mol/L and was about 1.2 times that of DocA-G. When the calcium ion concentration was in the range from 1.00 × 10⁻⁵ to 1.00 × 10⁻² mol/L, the binding capacities of DocA-D40 and DocA-D41 were about 3.68 times and 4.11 times that of the original protein DocA-G, respectively. Moreover, DocA-D41 exhibited the highest binding capacity for Coh at a calcium ion concentration of 5 × 10⁻⁴ mol/L.

Figure 5
Scatter plots of RMSD for four kinds of single-point mutants.
We constructed the DocA-D40-Coh and DocA-D41-Coh complexes using RosettaDock 3.4 and then performed an MD simulation for 5 ns using GROMACS 4.5 software. Using the g_rms tool in GROMACS 4.5, the structural deviations (i.e., RMSD) of the mutants and of the original docking protein DocA (with/without Ca²⁺) were calculated. The RMSD values for the mutants DocA-D40 and DocA-D41 (0.232 and 0.228, respectively) were lower than that for DocA (0.378), which implies that the structures of DocA-D40 and DocA-D41 are more stable than that of DocA.