Predictive Approaches to Guide the Expression of Recombinant Vaccine Targets in Escherichia coli: A Case Study Presentation Utilising Absynth Biologics Ltd Proprietary C. difficile Vaccine Antigens

Background: Bacterial expression systems remain a widely used host for recombinant protein production. However, overexpression of recombinant target proteins in bacterial systems such as Escherichia coli can result in poor solubility and the formation of insoluble aggregates, termed inclusion bodies. As a consequence, numerous strategies and alternative engineering approaches have been employed to increase recombinant protein production. In this case study, we present the strategies used to increase the recombinant production and solubility of 'difficult-to-express' bacterial antigens, termed Ant2 and Ant3, from Absynth Biologics Ltd's Clostridium difficile vaccine programme.

Results: Single recombinant antigens (Ant2 and Ant3) and fusion proteins (Ant2-3 and Ant3-2) formed insoluble aggregates (inclusion bodies) when overexpressed in BL21 CodonPlus (DE3) cells. Further, proteolytic cleavage of Ant2-3 was observed, potentially due to the presence of a large unstructured loop at the boundary between the two antigens. Optimisation of culture conditions, such as varying the induction temperature and adding the heat-shock inducer benzyl alcohol to the growth media, had no significant effect on the processing and protein production pattern of any of the four antigen molecules. Changes to the construct design to include N-terminal solubility tags (thioredoxin and N utilisation substance protein A) did not improve solubility. Screening of different buffers and additives to improve stability showed that the addition of 1-15mM dithiothreitol (DTT) alone improved the stability of both Ant2 and Ant3. Structural models were generated for Ant2 and Ant3, and solubility-based prediction tools were employed to determine the role of charge and hydrophobicity in protein production. The results showed that both Ant2 and Ant3 contained unfavourable features associated with poor solubility. A large non-polar region was detected on the surface of the Ant2 structure, whereas positively charged regions were observed for Ant3.

Conclusions: Commonly used strategies to enhance recombinant protein production in bacterial systems did not increase production of the model 'difficult-to-express' antigens, Ant2 and Ant3, or of their fusion proteins. Sequence and structural analysis of the antigens identified unfavourable features that potentially increase the tendency of these antigens to aggregate and/or lead to improper processing. We present a guide of strategies and predictive approaches that aim to inform construct design, prior to expression studies, in order to define and engineer sequences/structures that could lead to increased expression of single and potentially multi-domain (or fusion) antigens in bacterial expression systems.


Introduction
Bacterial expression systems remain a widely used host for recombinant protein production. Escherichia coli (E. coli) remains the preferred bacterial expression system owing to its low cost, rapid growth and the high cell densities achieved [1]. However, disadvantages of this system include the inability to perform eukaryotic post-translational modifications, e.g. glycosylation and the disulphide bond formation needed for correct protein folding and maturation. Further, overexpression of recombinant target proteins in E. coli can result in poor solubility and the formation of inclusion bodies [2,3].
Inclusion bodies are enriched with insoluble denatured protein aggregates, which require additional extraction and re-folding processes to isolate the proteins of interest. Proteins that localise to inclusion bodies usually have low or no biological activity. It has been reported that only 15-25% of the total amount of protein is recovered from inclusion bodies, due to a loss of secondary structure and protein aggregation during the solubilisation and re-folding processes [3,4]. However, an advantage of protein purification from inclusion bodies is the ease of isolating the inclusion body fraction from bacterial lysates by high-speed centrifugation [5,6]. In addition, inclusion bodies can have a protective effect and prevent proteolytic degradation, resulting in homogeneity of the protein species within this fraction [4,7,8].
Many strategies and engineering approaches have been employed to increase recombinant protein production and to overcome limitations in the production of 'difficult' or toxic target proteins in E. coli, from construct design through to the purification and formulation of target proteins [4]. Efficient recombinant protein production in E. coli relies on a combination of the correct DNA construct, host cell strain and downstream protein purification processes. Within each of these processes, multiple factors can affect recombinant protein production. Strategies for optimal expression rely on careful optimisation of these factors and are often specific to each recombinant target. One approach used to improve production of 'difficult' recombinant targets is to engineer an optimal combination of DNA elements in the expression vector. Also as part of the construct design, protease-cleavable solubility and/or detection tags have been used to aid protein expression and purification, leading to increased amounts of protein recovered from bacterial cultures [9-12]. Different E. coli host cell strains have been engineered and tailored to the requirements of the recombinant targets, such as strains capable of performing specific post-translational modifications [4,13]. Another approach has been the co-expression of chaperones. Different groups of chaperones exist in E. coli to aid protein folding and prevent aggregation and/or degradation, and co-expression of these factors has been shown to increase the yield and solubility of certain recombinant targets [14,15]. Great effort has also been focused on optimisation of the culture conditions, such as the growth medium, inducer concentration [16,17] and induction temperature [18,19], and on the isolation, re-solubilisation and purification of proteins from inclusion bodies [4,7,8].
Other strategies include targeting proteins to different cellular compartments, for example the periplasmic space, where an oxidising environment and lower protease concentration have been shown to increase protein production and stability [20]. Alongside the optimisation of bacterial systems, computational approaches have allowed the prediction of sequence and/or structural features that may affect efficient production and solubility [21-25]. Such tools have allowed the re-design of recombinant targets with increased expression and/or solubility [23,26-28]. Overall, the optimisation of recombinant protein production tends to be tailored to each specific recombinant target, and therefore more generic predictive tools and/or guides to aid efficient recombinant protein production would be beneficial.
Absynth Biologics Ltd is focused on the discovery and development of novel vaccines that target a range of infectious diseases to address the challenge of antimicrobial resistance. Absynth Biologics Ltd has a portfolio of proprietary antigens that have been identified in Clostridium difficile (C. difficile) as potential vaccinogens. Two leading antigens, Ant2 and Ant3, are small protein domains taken from essential proteins that localise to the bacterial outer cell membrane. Ant2 was taken from a protein ortholog of tRNA N6-adenosine threonylcarbamoyltransferase (YdiE/TsaD) [29-33]. TsaD is a predicted membrane protein that has been assigned multiple functions, including RNA translational fidelity and regulation of membrane transport, and it may act as a protease [31,34]. Ant3 was taken from the bacterial cell division protein DivIB/FtsQ, which has been shown to bind to the cell wall and is involved in bacterial morphogenesis and cell division [35-40].
Earlier expression and purification studies of the Ant2 and Ant3 antigens (from C. difficile) by Absynth Biologics Ltd showed that overexpression of both proteins produced insoluble protein (inclusion bodies). Proteins isolated from the inclusion bodies had an increased tendency to oligomerise, and protein precipitation was observed during the purification steps (unpublished work). This case study presents the strategies used to increase the recombinant production and solubility of these 'difficult-to-express' bacterial proteins, Ant2 and Ant3, from Absynth Biologics Ltd's C. difficile portfolio. We also present a workflow or 'fast-track' guide that uses predictive approaches to inform construct design and increase the expression of single and potentially multi-domain (or fusion) antigens in bacterial expression systems.

Materials
All reagents used were of the highest grade and purchased from Sigma-Aldrich unless stated otherwise.
The coding sequence for fusion 2 (Ant3-2) was cloned into three different pET16 parental expression vectors (provided by EA McKenzie) with a cleavable N-terminal 6×His tag alone or in combination with a thioredoxin (Trx) or N utilisation substance protein A (NusA) solubility tag (termed pHis, pHisTrx and pHisNusA respectively). Ant3-2 coding sequences were amplified by polymerase chain reaction (PCR, Table I) and cloned into the three parental expression vectors using BamHI (5') and EcoRI (3') restriction sites. The BamHI and EcoRI restriction sites were introduced into the PCR product via the forward and reverse primers respectively (Table I). Sequences for Ant2, Ant3 and fusion 1 (Ant2-3) were amplified by PCR using primers designed with the In-Fusion® primer design tool (Table I) and cloned into pHisNusA using the In-Fusion® HD cloning method (Clontech, cat no. 638916) as per the manufacturer's instructions.

Small-scale bacterial expression
DNA constructs for the single antigens (Ant2 and Ant3) and fusions (Ant2-3 and Ant3-2) were transformed into BL21-CodonPlus (DE3) E. coli cells. A single colony was used to inoculate a 5ml overnight culture (LB broth with 100μg/ml ampicillin). The next day, bacterial cultures were seeded from the overnight culture (1:100 dilution) in a total volume of 100ml LB broth containing ampicillin (100μg/ml final concentration). The cells were grown at 37°C with shaking at 220rpm. The optical density at 600nm (OD600) of each culture was monitored until an OD600 of 0.5-0.7 was reached. Expression of the recombinant antigens was induced with isopropyl β-D-1-thiogalactopyranoside (IPTG, 0.2mM final concentration), followed by incubation at 37°C, 30°C or 18°C for 20 h at 220rpm. For growth at the lower temperatures (30°C and 18°C), cultures were cooled to the appropriate temperature prior to induction.

Large-scale bacterial expression
DNA constructs were transformed into BL21-CodonPlus (DE3) E. coli cells. A single colony was used to inoculate a 5ml starter culture (LB broth with 100μg/ml ampicillin), which was incubated at 37°C, 220rpm for 6 h. The overnight culture (50ml selective LB broth) was inoculated with 50μl of the starter culture (1:1000 dilution) and incubated overnight at 37°C with shaking at 220rpm. The next day, bacterial cultures were seeded from the overnight culture (1:100 dilution) in a total volume of 1L LB broth containing ampicillin (100μg/ml final concentration). The cells were grown at 37°C with shaking at 220rpm. At the correct OD600 (0.5-0.7), cultures were cooled (4°C for 30 mins) and induced with IPTG (0.2mM final concentration) for 20 h at 18°C, 220rpm. Cultures were harvested by centrifugation (5000rpm, 4°C for 10 mins). The dry pellet weight was recorded, and cell pellets were re-suspended in 30ml ice-cold lysis buffer and sonicated on ice (7 cycles, 30 sec on/off pulses at 25% amplitude, Bandelin Sonoplus sonicator, HD3200). After lysis, an aliquot (40μl) of the bacterial lysate (total fraction) was taken for SDS-PAGE and western blot analysis. The insoluble and soluble fractions were separated by centrifugation (17,000rpm, 4°C for 30 mins). Aliquots (40μl) of the soluble fraction (supernatant) and insoluble fraction (pellet) were taken for SDS-PAGE and western blot analysis.
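The inoculum and inducer volumes in the expression protocols above follow from simple dilution arithmetic (C1V1 = C2V2). The short Python sketch below reproduces that arithmetic for illustration only. The function names are hypothetical, and the 1M IPTG stock concentration is an assumption, since the stock concentration is not stated in the text.

```python
def inoculum_volume_ml(final_volume_ml, dilution_factor):
    """Volume of seed culture needed for a 1:dilution_factor dilution
    into the stated total volume."""
    return final_volume_ml / dilution_factor


def stock_volume_ml(final_conc_mm, final_volume_ml, stock_conc_mm):
    """Solve C1*V1 = C2*V2 for V1 (volume of stock to add)."""
    return final_conc_mm * final_volume_ml / stock_conc_mm


# 1:100 seeding of a 1L (1000ml) expression culture
seed_large = inoculum_volume_ml(1000, 100)

# 1:1000 seeding of the 50ml overnight culture
seed_overnight = inoculum_volume_ml(50, 1000)

# 0.2mM IPTG in 1L, ASSUMING a 1M (1000mM) stock; not stated in the text
iptg_ml = stock_volume_ml(0.2, 1000, 1000)

print(f"Seed for 1L culture: {seed_large:.1f} ml")
print(f"Seed for 50ml overnight culture: {seed_overnight * 1000:.0f} ul")
print(f"IPTG stock (assumed 1M) for 1L: {iptg_ml * 1000:.0f} ul")
```

The second value matches the 50μl inoculum quoted for the 50ml overnight culture.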
Protein re-folding
After high-speed centrifugation of the total fraction, the isolated inclusion-body-enriched pellet (insoluble fraction) was used for protein re-folding. The pellet was re-suspended in either 4ml (small-scale) or 30ml (large-scale) urea buffer (5mM imidazole, 1M NaCl, 40mM Tris-HCl pH 7.9, 8M urea and 0.14% (v/v) β-mercaptoethanol) and sonicated (5 cycles, 30 sec on/off pulses at 25% amplitude). The suspension was centrifuged at 17,000rpm, 18°C for 30 mins to remove cell debris. The supernatant was retained for purification.

His-tag purification
For small-scale purification (100ml bacterial culture) from the soluble fraction, 1ml Ni-NTA agarose (50% suspension, Qiagen) was washed twice with 14ml ice-cold phosphate buffered saline (PBS) in a 15ml falcon tube. For large-scale purifications (1L bacterial culture), 5ml Ni-NTA agarose was washed twice with 45ml ice-cold PBS in a 50ml falcon tube.

Purification from the soluble fraction
All incubation steps were carried out at 4°C and spin steps were completed at 4000rpm, 4°C for 5 mins (small-scale) or 10 mins (large-scale).
For large-scale purification, the Ni-NTA resin was equilibrated with 10ml ice-cold lysis buffer and then spun down, and the supernatant was discarded. 30ml suspension (soluble fraction) was added to the resin and incubated on a roller for 2 h at 4°C. After incubation, the suspension was spun down and the flow-through collected. The resin was washed three times consecutively with 50ml each of buffer 1, buffer 2 and buffer 3 for 20 min at 4°C. After each wash the suspension was spun down and the supernatant isolated. Soluble proteins were eluted with elution buffer in 9 x 1ml steps (E1-E3) at 4°C.
At each stage, 100μl aliquots of each fraction (flow-through, washes and eluates) were collected for protein concentration determination, SDS-PAGE and western blot analysis.

Purification from the insoluble fraction
For small-scale purifications (100ml bacterial culture), all spin steps were carried out at 4000rpm, 4°C or room temperature for 5 mins. The resin was equilibrated with 2ml urea buffer and then spun down. The supernatant was discarded. 4ml suspension (pellet/insoluble fraction) was added to the resin and incubated on a roller overnight at room temperature. The next day, the suspension was spun down and the flow-through collected. The resin was washed with 10ml buffer A (5mM imidazole, 1M sodium chloride, 20mM Tris-HCl pH 7.9, 8M urea) for 1 min at room temperature. The suspension was spun down and the supernatant (wash 1) isolated. The resin was washed with 10ml buffer B (5mM imidazole, 1M sodium chloride, 20mM Tris-HCl pH 7.9 and 0.14% (v/v) β-mercaptoethanol) for 1 min at room temperature. The suspension was spun down and the supernatant (wash 2) isolated. The resin was washed a third time with 10ml buffer C (60mM imidazole, 0.5M sodium chloride, 20mM Tris-HCl pH 7.9 and 0.14% (v/v) β-mercaptoethanol) for 1 min at 4°C. The suspension was spun down at 4°C and the supernatant (wash 3) isolated. Re-folded proteins were eluted with buffer D (1M imidazole, 0.5M sodium chloride, 20mM Tris-HCl pH 7.9 and 0.14% (v/v) β-mercaptoethanol) in 3 x 1ml steps (E1-E3) at 4°C. Insoluble proteins bound to the resin (after re-folding) were eluted with a denaturing elution buffer, buffer E (5mM imidazole, 1M sodium chloride, 20mM Tris-HCl pH 7.9, 8M urea, EDTA), in 3 x 1ml steps (DE1-DE3) at room temperature. All eluates were stored at 4°C.
For large-scale purifications (1L bacterial culture), all spin steps were carried out at 4000rpm, 4°C or room temperature for 10 mins. Following the washes, the resin was equilibrated with 10ml urea buffer and then spun down. The supernatant was discarded. 30ml suspension (pellet/insoluble fraction) was added to the resin and incubated on a roller overnight at room temperature. The next day, the suspension was spun down and the flow-through collected. The resin was washed consecutively with 50ml buffer A and then 50ml buffer B, each for 20 min at room temperature on a roller. After each wash the suspension was spun down and the supernatant isolated. The resin was washed a third time with 50ml buffer C for 20 min at 4°C on a roller. The suspension was spun down at 4°C and the supernatant isolated. The resin was transferred to an empty gravity-flow chromatography column and re-folded proteins were eluted with buffer D in 5 x 1ml steps (E1-E5) at 4°C. Insoluble proteins (after re-folding) were eluted with buffer E in 4 x 1ml steps (DE1-DE4) at room temperature. All eluates were stored at 4°C.
At each stage, an aliquot (100μl) of each fraction was collected for protein concentration determination and characterization by SDS-PAGE and western blot.

Determination of protein concentration
The protein concentration of purified samples was estimated using the Bio-Rad DC™ Protein Assay kit as per the manufacturer's instructions.

SDS-PAGE and western blotting
SDS-PAGE (12.5% (w/v)) gels were prepared and run using the Bio-Rad mini PROTEAN Tetra system.
Protein samples were mixed in equal volume with 2× loading buffer (125mM Tris- . Under reducing SDS-PAGE conditions, the 2× sample buffer was supplemented with 1.8% (v/v) β-mercaptoethanol. Samples were heated to 100°C for 5 mins and cooled before loading. The following sample volumes were loaded: 5μl of the total induced (T), pellet/insoluble (P) and soluble (S) fractions and of the bovine serum albumin (BSA) protein standard, and 15μl of purified protein samples. SDS-PAGE was performed in electrode buffer (25mM Tris base, 190mM glycine, 0.1% (w/v) SDS, pH 8.3). The marker and samples were electrophoresed at 60V through the stacking gel and then at 200V until the dye front reached the bottom of the gel.
For western blot analysis, proteins separated by SDS-PAGE were transferred onto nitrocellulose membrane using the TE 22 wet transfer unit at 300mA for 2 h. Blots were blocked for 1 h at room temperature in 5% (w/v) non-fat milk in phosphate buffered saline (PBS: 137mM NaCl, 2.7mM KCl, 10mM Na2HPO4, 2mM KH2PO4, pH 7.4) with 0.1% (v/v) Tween-20 (5% mPBS-T). A mouse monoclonal anti-6×His tag primary antibody (1:4000 dilution, Abcam, ab18184) and a donkey anti-mouse secondary antibody (1:15000, LI-COR Biosciences, 926-32212) were prepared in 5% mPBS-T and each incubated with the blots for 1 h at room temperature. Blots were imaged using the LI-COR Odyssey® Classic imager according to the manufacturer's instructions. Analysis and quantification of bands was completed using the LI-COR Image Studio™ Lite software.

Protein identification
Protein samples were separated by SDS-PAGE and submitted to the protein identification service (Manchester Institute of Biotechnology, University of Manchester). Protein samples were gel-extracted, trypsin-digested and analysed by Dr. Martin Read using mass spectrometry.

Protein solubility screen
The solubility of purified protein samples was profiled in different formulations using the Optisol™ III protein solubility screening kit (Soluble Bioscience) as per the manufacturer's instructions. Ant2 and Ant3 were expressed in the BL21-CodonPlus (DE3) E. coli strain (1L total volume), and proteins were re-solubilised in urea buffer from the insoluble fraction and purified via the 6×His tag. Proteins were eluted and the protein concentration was determined (Table AI). The three most concentrated samples were pooled for screening. A blank absorbance reading (280nm) was taken of the plate with reagent alone. 15μl of the pooled protein sample was mixed with 150μl of reagent per well and incubated at 37°C for 24 h (stressed conditions). Following incubation, the soluble protein was collected by centrifugation (3000rpm, 30 min). The absorbance of the isolated soluble protein was measured and the blank subtracted from these values. The data were subsequently analysed using the Protein Dashboard™ (supplied by Soluble Bioscience).
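The blank-subtraction step described above can be sketched in a few lines of Python. This is an illustration only and not the Protein Dashboard™ analysis: the well names and absorbance values are hypothetical, and treating a non-positive corrected value as precipitation is an assumption based on how negative absorbance values are interpreted in this study.

```python
def blank_corrected(sample_abs, blank_abs):
    """Subtract each well's reagent-only blank (A280) from its
    post-incubation reading."""
    return {well: sample_abs[well] - blank_abs[well] for well in sample_abs}


def rank_conditions(corrected):
    """Order wells from most to least soluble protein remaining."""
    return sorted(corrected, key=corrected.get, reverse=True)


def precipitated(corrected):
    """Wells with no net absorbance above the blank (assumed precipitated)."""
    return sorted(well for well, value in corrected.items() if value <= 0)


# Hypothetical readings for three formulation wells
blank = {"A1": 0.05, "A2": 0.06, "A3": 0.05}
after_stress = {"A1": 0.45, "A2": 0.04, "A3": 0.25}

corrected = blank_corrected(after_stress, blank)
print(rank_conditions(corrected))
print(precipitated(corrected))
```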

Use of predictive tools for sequence- and structure-based analysis of antigens
Amino acid sequences for all antigens were analysed using Jpred 4 (protein secondary structure prediction server) [41] and ProteinSol (predictive protein solubility tool) [25].
Three-dimensional structural models for the Ant2 and Ant3 sequences were generated in SWISS-MODEL [42,43]. Models were based on the sequence homology of the C. difficile Ant2 and Ant3 sequences from this study with published structures, in the Protein Data Bank (PDB), of their corresponding E. coli homologs, YdiE (PDB ID: 4wq4) and FtsQ (PDB ID: 2vh1) respectively. Structural models for the fusion proteins (Ant2-3 and Ant3-2) could not be generated. The structural models for Ant2 and Ant3 were analysed using the ProteinSol algorithm (in collaboration with Dr. J Warwicker, University of Manchester). The algorithm was used for a sequence-based solubility prediction and for a structure-based prediction, in which surface maps of the model structures were analysed in terms of electrostatic potential (positive vs. negative charge) and hydrophobicity (polar vs. non-polar).

Data analysis
All results are presented as the mean ± standard error of the mean (SEM) for at least three biological replicates. All graphs were plotted in GraphPad Prism® (Version 6.02).
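For reference, the SEM is the sample standard deviation divided by the square root of the number of replicates. The sketch below shows that calculation in Python; it is illustrative only (the analysis in this study was performed in GraphPad Prism®), and the function name and replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, SEM), where SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed to estimate the SEM")
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical measurements from three biological replicates
replicates = [1.0, 2.0, 3.0]
m, sem = mean_sem(replicates)
print(f"{m:.2f} ± {sem:.2f}")
```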

Results
Characterisation of bacterial growth and recombinant protein production
Two antigens from Absynth Biologics Ltd's C. difficile portfolio, Ant2 and Ant3, were used as models to assess the effect of different strategies to improve the production and solubility of these 'difficult-to-express' targets, together with predictive approaches to gain insight into whether these could guide antigen design and so increase recombinant production. As well as the single Ant2 and Ant3 antigens, fusion proteins were generated. The fusion proteins were designed to incorporate Ant2 and Ant3, in different orientations, into a single polypeptide chain: Ant2-3 (fusion 1) and Ant3-2 (fusion 2). As historical data from Absynth Biologics Ltd showed that Ant2 and Ant3 localised to the insoluble (inclusion body) fraction and were recovered poorly (data not shown), we hypothesised that the fusions might improve production and solubility. In addition, use of a fusion protein would decrease the overall cost-of-goods by expressing both antigens simultaneously.
Preliminary work focused on the optimisation of a high-yielding bacterial expression protocol. A panel of E. coli strains (BL21 (DE3), BL21 CodonPlus (DE3), JM109 (DE3) and JM109-pGJKE8 (DE3)) was tested to evaluate their effect on the production of these antigens. All four antigens localised to the insoluble fraction regardless of the host cell strain used (data not shown). For further studies, BL21 CodonPlus E. coli cells were selected as they expressed the largest amount of total protein of all the strains tested.
The constructs for Ant2, Ant3, Ant2-3 (fusion 1) and Ant3-2 (fusion 2) were transformed into the BL21 CodonPlus E. coli strain. Prior to induction, bacterial growth was monitored at 37°C (Figure 1a). No significant difference was observed between the growth/doubling times of cultures expressing the four antigens. Overexpression of the antigens was induced with IPTG for 20 h at one of three temperatures: 18°C, 30°C or 37°C. Post-induction, the bacterial crude lysates (total fraction, T) were analysed by SDS-PAGE. At 18°C, the single antigens, Ant2 (~25kDa) and Ant3 (~26kDa), were expressed in smaller amounts than the fusions, Ant2-3 and Ant3-2 (~51kDa) (Figure 1b). Parallel expression studies at the higher growth temperatures, 37°C and 30°C, showed an increase in the amount of total protein but no effect on protein solubility (Figure A2).

Assessment of protein solubility
To assess protein solubility, the total crude fraction (T) was centrifuged at high speed (17,000rpm) to separate the insoluble (I, pellet) and soluble (S, supernatant) fractions. Aliquots of each fraction were analysed by western blotting, using an anti-6×His tag primary antibody for detection (Figure 2). Assessment of protein solubility for all four antigens showed that although a greater amount of protein was detectable at the higher growth temperatures (37°C and 30°C), almost all of the protein was found in the insoluble fraction. Further, more high and low molecular weight species (predicted oligomers and degradation products) were observed at 37°C and 30°C. At 18°C, soluble protein was detectable for all antigens and significantly fewer high and low molecular weight species were detected (Figure 2).
All four antigens showed a tendency to form oligomers. Presence of oligomers was detected by western blotting under non-reducing conditions (data not shown). In addition, lower molecular weight bands were also detected which may correspond to degradation products.

Protein re-folding and His-tag purification
To compare the yields and purity obtained from the insoluble and soluble fractions, the antigens were purified from each fraction separately via the C-terminal 6×His tag (Figure 3 and Figure A3). As seen with the solubility data, a significant proportion of protein localised to the inclusion-body-enriched insoluble fraction; proteins from this fraction were therefore first re-folded and subsequently purified.
Protein recovery from the soluble fraction was poor for both fusions, Ant2-3 and Ant3-2 (Figure 3a). In addition, other unknown high and low molecular weight species were detected in the purified samples. These species could correspond to host-cell proteins co-purifying with each antigen or binding non-specifically to the Ni-NTA agarose. In contrast, a larger amount of protein was recovered after re-solubilisation (Figure 3b), although a significant proportion of both Ant2-3 and Ant3-2 remained insoluble after re-solubilisation (fractions DE1-DE3).
The same pattern was observed for the purification of the single antigens: a larger amount was recovered from the insoluble fraction than from the soluble fraction (Figure A3). However, for both Ant2 and Ant3, almost all of the protein was solubilised after re-folding from the insoluble fraction (Figure A3b). The single antigens therefore differed from the fusions, for which a significant proportion of protein remained insoluble after re-folding (Figure 3b).
After 6×His tag purification of Ant2-3, a prominent band was detected at half the molecular weight (~25kDa) of the full fusion protein in both the soluble and insoluble fractions; this band was not present in the Ant3-2 samples (Figure 3). Protein identification of the low molecular weight band by mass spectrometry showed that the band corresponded to a mixture of the Ant2 and Ant3 antigens. The trypsin-digested protein fragments detected by mass spectrometry mapped to either side of the boundary between Ant2 and Ant3 in the full fusion protein, suggesting that improper processing of fusion 1 was occurring (discussed further in the next section).

Protein stability
As improper processing of Ant2-3 was observed, further work was carried out to ascertain whether cleavage of Ant2-3 was occurring post-lysis. Ant3-2 was studied as a comparison. Both fusions were purified from the soluble and insoluble fractions, and the purified samples were then incubated at 4°C for 24 h (Figure 4) or at 37°C for 2 h (data not shown), sampled at different time points and analysed by western blotting.
Under both temperature conditions, no increase in the amount of degradation product was observed post-purification in the soluble or insoluble fraction for Ant2-3. Further, no proteolytic products of the same size were observed for Ant3-2 (Figure 4b). The data suggested that the improper processing of Ant2-3 occurred during intracellular expression rather than post-lysis.

Strategies to improve protein production and solubility
As a large proportion of all four antigens localised to the insoluble fraction and, in the case of the fusions, largely remained insoluble after re-folding, strategies were employed to improve protein production and solubility and to prevent improper processing of fusion 1 (Ant2-3).
As mentioned earlier, different E. coli host strains were tested to assess their effect on the production and solubility of these target antigens. For example, the JM109-pGJKE8 (DE3) strain allowed the co-expression of chaperones alongside the target antigens, with the aim of aiding protein folding during expression. The data showed no improvement in protein production or solubility. Further, proteolytic cleavage of Ant2-3 was observed in all strains (data not shown).
Alongside the chaperone strain studies, the heat-shock inducer benzyl alcohol (BA) was added to BL21 CodonPlus (DE3) cultures with the aim of improving recombinant protein yield and solubility and preventing aggregation. Previously, De Marco et al. [44] reported that the use of BA in cultures increased yields of recombinant targets to a degree comparable to chaperone co-expression. Cultures transformed with Ant2, Ant3, Ant2-3 and Ant3-2 were treated with BA 30 mins prior to IPTG induction (18°C for 20 h). Post-induction, cultures were harvested and protein production and solubility were assessed for all antigens with and without BA addition (Figure A4). No significant difference was observed between BA-treated and untreated cultures.
An alternative strategy tested was the use of solubility tags. Initially, Ant3-2 (fusion 2) was used as a model to test the effect of the tags on protein solubility, with the aim of using the data to select the most favourable tag for cloning with the other target antigens. The Ant3-2 gene sequence was cloned into pET16 vectors with different cleavable N-terminal solubility tags, NusA and Trx (Figure 5a). In addition, an N-terminal 6×His tag was present for detection. An N-terminally His-tagged Ant3-2 construct (pHis-F2) was generated as a control to account for the change in position of the His-tag relative to the original pET21d construct (pF2-His, C-terminal His-tag). Bacterial expression of the Ant3-2 constructs showed that the addition of the solubility tags did not increase the solubility of the antigen. Addition of the N-terminal His-tag resulted in a lower amount of total protein compared to the unmodified C-terminally His-tagged Ant3-2 (Figure 5b).
The expression of NusA alone was used as a positive control (Figure A5a). NusA alone was mainly detected in the soluble fraction, whereas pNusA-Ant3-2 was insoluble. Although addition of the NusA tag did not increase the solubility of Ant3-2 during bacterial expression, it did increase the amount of soluble protein after re-folding and His-tag purification (Figure 5c). The N-terminal NusA tag was also added to Ant2, Ant3 and Ant2-3 to see whether the lack of effect on protein solubility was specific to Ant3-2 or common to all four antigens. Addition of the N-terminal NusA tag did not increase the solubility of the other antigens; rather, in all cases a decrease in the amount of total protein was observed (Figure A5b).
As the strategies employed at the plasmid design and bacterial expression stages did not increase protein production and/or solubility, further work involved optimising the current methodology to keep purified material soluble. For Ant2 and Ant3 samples purified from the insoluble fraction, protein precipitation was observed above a certain concentration range (1.5-2mg/ml), either as proteins were eluted from the column or after 24-48 h of storage at 4°C, leading to a significant loss in protein yield. To prevent such losses, the solubility of purified Ant2 and Ant3 was assessed in different buffers.
Ant2 and Ant3 were expressed in a total volume of 1L, and proteins were re-folded and purified from the insoluble fraction (Figure 6). The three eluate fractions with the highest protein concentration (determined by protein assay, Table AI) were pooled and analysed using the OptiSol™ protein solubility screening kit to identify conditions that could stabilise the protein at higher concentrations after purification. Purified protein was plated out onto the supplied reagent plate and incubated under 'stressed' conditions (37°C for 24 h), and the amount of soluble protein remaining after incubation was assessed. Overall, Ant3 showed poorer stability than Ant2, as Ant3 precipitated under almost all conditions (negative absorbance values). Across the different conditions tested, the addition of the reducing agent DTT had a significant effect on protein solubility for both Ant2 (Figure 8a) and Ant3 (Figure 8b). An increase in the concentration of DTT (1-15mM) resulted in a proportional increase in the amount of soluble protein recovered after incubation. The addition of another reducing agent, β-mercaptoethanol, had a negligible effect on protein solubility. The data suggest that the addition of DTT alone may decrease the formation of insoluble aggregates and increase the solubility of these antigens. This hypothesis was supported by further testing: addition of DTT to the initial purified Ant2 sample followed by incubation under 'stressed' conditions increased the amount of soluble protein recovered compared to Ant2 without DTT (data not shown).
In addition to the strategies employed to increase protein yields and solubility, other work was completed in parallel to understand whether the lack of protein solubility could be predicted or rationalised based on the sequence and structural properties of these antigens. Further, we aimed to assess whether sequence and structural features could guide the design of recombinant antigens to prevent limitations and aid efficient recombinant production.
Analysis of the protein secondary structure
Initial analysis of amino acid sequences centered on the prediction of secondary structural elements using the JPred 4 server (Figure 7). Predictions of the single antigens (Figure 7a-b) showed mostly ordered regions with differences in their secondary structure, as expected. Analysis of the fusion proteins showed differences in structure between Ant2-3 and Ant3-2 (Figure 7c-d). Ant2-3 was predicted to contain a 30 amino acid unstructured region at the boundary between Ant2 and Ant3, whereas Ant3-2 was predicted to contain two alpha-helices either side of the boundary between Ant3 and Ant2. The unstructured region in Ant2-3 could potentially result in improper processing and be the site of the proteolytic cleavage observed previously (Figures 2-4).
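The boundary check described above can be illustrated with a minimal sketch. It assumes the secondary structure prediction is available as a plain string of 'H' (helix), 'E' (strand) and '-' (coil) characters, one per residue. The function names, the 20-residue cut-off and both example strings are hypothetical and chosen for illustration only; they are not the real predictions for Ant2-3 or Ant3-2.

```python
import re

def coil_runs(ss):
    """Return (start, length) for every run of coil ('-') in a
    secondary-structure string made of 'H', 'E' and '-' characters."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"-+", ss)]

def boundary_in_long_coil(ss, boundary, min_len=20):
    """True if the residue at 0-based index `boundary` lies inside a coil
    run of at least `min_len` residues (`min_len` is an arbitrary cut-off)."""
    return any(start <= boundary < start + length and length >= min_len
               for start, length in coil_runs(ss))

# Synthetic placeholder predictions, NOT the real Ant2-3 / Ant3-2 output.
fusion_a = "H" * 40 + "-" * 30 + "E" * 40  # long coil spanning the junction
fusion_b = "H" * 50 + "-" * 4 + "H" * 50   # helices either side of the junction

print(boundary_in_long_coil(fusion_a, boundary=55))  # junction inside a long coil
print(boundary_in_long_coil(fusion_b, boundary=52))  # coil too short to be flagged
```

A check of this kind could be run on each candidate antigen order before cloning, so that constructs with a long predicted coil at the junction are flagged for redesign.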
Modeling and computational analysis of protein structures
Analysis of the amino acid sequence was performed to evaluate whether 'difficulties' in the expression of these proteins could be predicted. Amino acid sequence-based prediction using ProteinSol gave an average value below 1 for Ant2 (0.50), Ant3 (0.81) and both fusions (0.56), suggesting all antigens were indeed soluble based on a database of all E.coli proteins expressed in a cell-free bacterial expression system. Therefore, sequence-based analysis could not be used as a predictive tool for the recombinant production of these antigens.
In parallel, structural models for Ant2 and Ant3 were generated based on their sequence homology to existing structures published in the Protein Data Bank (PDB). Ant2 had 39% sequence identity to the published structure 4wq4A, whereas Ant3 had a lower sequence identity of 15% with 2vh2A. It is important to note that these models are predictions and the actual structures may vary. Models of the fusion proteins could not be generated, as the two single antigens are not natural physiological partners.
Structure-based prediction showed that, in terms of electrostatic potential, Ant3 contained a larger positively-charged patch (posQmax = 4125), a feature that has been shown to correlate with poorer solubility (threshold = 2990) (Figure 8b). The prediction for Ant2 (posQmax = 2781) was just within the threshold and Ant2 was therefore considered soluble (Figure 8a). Hydrophobicity analysis of both antigens showed that Ant2 has a notable non-polar patch (Figure 8c), which could potentially have a negative impact on solubility. Ant3 showed no unfavorable features in terms of hydrophobicity (Figure 8d).
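The threshold comparison for the positively-charged patch can be written out as a short sketch. The posQmax values and the threshold of 2990 are those quoted in the text; the function and variable names are hypothetical, and the sketch only reproduces the comparison, not the calculation of posQmax itself.

```python
POSQMAX_THRESHOLD = 2990  # threshold quoted in the text

def exceeds_patch_threshold(pos_q_max, threshold=POSQMAX_THRESHOLD):
    """True if the largest positive patch is above the solubility threshold."""
    return pos_q_max > threshold

# posQmax values reported in the text for the two structural models.
pos_q_max = {"Ant2": 2781, "Ant3": 4125}

flags = {name: exceeds_patch_threshold(value) for name, value in pos_q_max.items()}
margins = {name: value - POSQMAX_THRESHOLD for name, value in pos_q_max.items()}

for name in pos_q_max:
    print(name, "above threshold:", flags[name], "margin:", margins[name])
```

The margins make the difference between the two antigens explicit: Ant2 sits 209 units below the threshold, whereas Ant3 exceeds it by 1135 units.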

Discussion
In this case study we have presented the expression and purification data for 'difficult-to-express' antigens from Absynth Biologics Ltd's C.difficile programme. Two antigens, Ant2 and Ant3, were expressed as single antigens and as fusion (double-antigen) proteins to aid expression and decrease the cost-of-goods. Data shown in this study and observed by Absynth Biologics Ltd showed that both single and fusion proteins formed insoluble aggregates when expressed in E.coli. Use of fusion proteins improved the overall yield, but not the solubility, of these recombinant targets.
To overcome the limitations in the production of these 'difficult' antigens, widely used strategies were employed to improve bacterial recombinant protein production (Figure 9). Further, the role of sequence and structure-based features on expression was analysed to help guide improved recombinant protein production (Figure 9). Initial efforts focused on the optimization of bacterial culture conditions. A preliminary screen of different host E.coli strains showed no improvement in the solubility of these antigens. Further, no significant difference in growth was observed between the bacterial cultures expressing the four antigens, suggesting that these antigens did not have a toxic effect and that cultures expressing the single and fusion proteins grew similarly. A greater amount of soluble protein and fewer high and low molecular weight species (predicted oligomers/degradation products) were detected at a low induction temperature (18°C), which was in agreement with other published studies [18,45-48]. Purification studies showed that a greater yield and purity of protein was achieved via purification and re-folding from the inclusion body-enriched insoluble fraction. The use of bacterial proteins as solubility tags is a commonly used strategy to increase the production of recombinant targets. Commonly used tags include the highly soluble bacterial proteins Trx [49,50] and NusA [51,52]. Contrary to published reports, the solubility tags did not improve the overall solubility and processing of these antigens.
During the bacterial expression studies it was observed that fusion 1 (Ant2-3) underwent proteolytic processing, whereas the alternative fusion, Ant3-2, showed greater stability. Analysis of protein stability post-lysis suggested that the improper processing of Ant2-3 occurred during the bacterial expression stage rather than post-harvest of cultures. The amino acid sequence-based secondary structure prediction tool JPred 4 showed the presence of a long unstructured region between the protein boundaries in Ant2-3 that was absent in fusion 2 (Ant3-2). Potentially, the addition of a linker between Ant2 and Ant3 would prevent improper processing and aid protein folding and hence solubility. Different flexible and rigid linkers have been utilized to construct fusion proteins and increase protein expression [53]. Future design of fusion proteins or polypeptides (multiple antigens) could account for the presence of large unstructured regions, with the use of predictive tools prior to expression, to prevent downstream challenges in the expression of such recombinant antigens. Together, neither the formation of insoluble aggregates nor the improper processing of the fusion protein was rescued by optimisation of the bacterial expression conditions. As a result, increased efforts were applied to understand the implications of sequence and structural features for the recombinant production of these 'difficult-to-express' antigens.
It was shown that the sequence and structure of these antigens are important considerations when designing antigens and fusions. Analysis of cysteine residues in both antigens showed the presence of two residues in Ant2 and a single unpaired cysteine in Ant3. As both antigens are fragments of a full-length protein, the corresponding cysteine partner for Ant3 was absent from this antigen. It is possible that these proteins form disulfide bonds within and/or between separate protein molecules, resulting in insolubility and the formation of the observed oligomers. Addition of DTT may therefore act to prevent the formation of these disulfide bonds, stabilising the proteins and increasing their solubility. This hypothesis is supported by the increased solubility of Ant2 with DTT in the initial purified sample compared to Ant2 without DTT (data not shown). The identification and removal of redundant free cysteine residues could be included as an additional 'checkpoint' when designing recombinant antigens (Figure 9). However, the physiological effect of removing such amino acid residues from antigens would need to be explored.
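The cysteine 'checkpoint' suggested above could be sketched as a simple sequence scan. The function names are hypothetical and the two sequences are short made-up placeholders, not the Ant2 or Ant3 sequences. The sketch flags an odd number of cysteines, which means at least one residue cannot form an intramolecular disulfide bond; an even count does not by itself show that the residues are correctly paired.

```python
def cysteine_positions(seq):
    """Return 1-based positions of cysteine residues in a protein sequence."""
    return [i + 1 for i, aa in enumerate(seq.upper()) if aa == "C"]

def has_unpaired_cysteine(seq):
    """True if the cysteine count is odd, so at least one residue has no
    possible intramolecular partner."""
    return len(cysteine_positions(seq)) % 2 == 1

# Made-up placeholder fragments for illustration only.
two_cys_fragment = "MKTCAYLLDCGS"  # two cysteines
one_cys_fragment = "MKTAYLLDCGS"   # single cysteine, partner lost on truncation

for name, seq in [("two_cys", two_cys_fragment), ("one_cys", one_cys_fragment)]:
    print(name, cysteine_positions(seq), has_unpaired_cysteine(seq))
```

Run on a candidate antigen fragment before construct design, a scan like this would highlight cysteines whose partner lies outside the chosen boundaries of the full-length protein.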

Conclusions
In this case study, we have presented the strategies and predictive tools employed to increase the production of 'difficult-to-express' antigens (Figure 9). Such studies are important in guiding others towards a critical understanding of the potential limitations of specific approaches. We have highlighted the importance of screening antigen sequences and using predictive computational tools (JPred and ProteinSol), prior to expression studies, to guide the design of recombinant antigens. Such measures could potentially prevent limitations in their production and increase the solubility and yields of 'difficult' recombinant targets.

Abbreviations
Ant: Antigen; BA: Benzyl alcohol; BME: β-mercaptoethanol; BSA: Bovine serum albumin; DTT: Dithiothreitol; C.diff: Clostridium difficile; DivIB/FtsQ: Bacterial cell division protein; E.coli: Escherichia coli; NusA: N utilisation substance protein A; Trx: Thioredoxin; TsaD/YdiE: tRNA N6-adenosine threonylcarbamoyltransferase; IMAC: immobilised metal affinity chromatography; PCR: polymerase chain reaction; PDB: protein data bank; PBS: phosphate buffered saline.

Declarations

Ethics Approval and Consent to Participate
No animals or human subjects were used in the above research.

Consent for Publication
Our manuscript does not contain any individual data in any form.

Availability of Data and Materials
All data generated and/or analysed during this study are included in this published article and its supplementary files.

Competing Interests
The authors declare that they have no competing interests.