Escherichia coli is one of the most widely used organisms for protein production. Despite decades of progress in heterologous protein expression, the folding of many proteins, especially large multi-domain eukaryotic proteins, remains a significant challenge [1]. Heterologous protein production can account for up to 50% of E. coli’s total cellular protein content and thus puts strain on the native transcriptional and translational machinery [2]. Additionally, the speed of coupled transcription and translation in bacteria can be problematic for producing large eukaryotic proteins. Key protein folding factors from the native host are also absent, causing further protein folding problems. Combined, these factors can lead to misfolding and/or aggregation of the recombinant protein [3]. Over the years many strategies have been developed to improve heterologous protein production [1, 4, 5].
For each recombinant protein being expressed, numerous variables influence the amount of soluble protein produced by the cell, including the protein sequence, expression vector, cell line, and expression conditions. Many of these factors may need to be optimized to achieve expression of active, soluble protein in E. coli. Upon selecting the protein of interest (POI) to be expressed, several avenues to improve protein solubility can be explored. A protein sequence can be altered by eliminating rare codons, reducing stretches of hydrophobic residues, or simply by selecting alternative homologs of the desired protein to increase the chances of successful expression [6–8]. Any changes to the protein sequence itself come with the risk of altering the activity of the protein. Once a protein sequence has been selected, the appropriate expression system must be designed. The promoter, origin of replication, and the inclusion of solubility tags (MBP, DsbC, SUMO, NusA, etc.) can all be altered to enhance protein expression [9–13]. After designing and building the vector harboring the POI, an E. coli expression strain must be chosen. Many strains have been engineered for heterologous protein expression; these are reviewed in [14]. The choice of strain can have a profound impact on protein expression, and it is often difficult to predict which strain of E. coli will give the best results. Finally, the growth conditions of the strain expressing the recombinant protein need to be optimized. The growth media, temperature, concentration of inducer, and time of induction can all be altered to maximize protein production.
Optimizing the multiple parameters involved in protein expression can be time-consuming and costly. The previously described methods modify the expression conditions, the protein, the expression vector, or the strain in which the proteins are expressed. An additional approach is to co-express molecular chaperones to improve recombinant protein expression. Chaperones are proteins that interact with and assist client proteins during the transition from the unfolded to the natively folded state, without being present in the final complex. Chaperones are frequently used to enhance the expression of heterologous proteins [15, 16]. Two approaches are often used: 1) overexpressing native E. coli chaperones, which may be at capacity during recombinant expression of a client protein [17, 18], and 2) expressing chaperones not native to E. coli [19–22]. However, no single chaperone universally enhances heterologous protein expression; thus, chaperones that assist in expression of a POI are usually identified empirically, which is a time-consuming process.
Here, we developed a genetic selection that identifies chaperones that enhance the solubility of a client protein from a diverse chaperone library by linking the solubility of the POI to cell viability in vivo. This genetic linkage can be achieved in several ways. For example, a POI can be directly fused to the N-terminus of an antibiotic resistance protein (e.g. one conferring resistance to kanamycin [23] or chloramphenicol [24]). While applicable to a broad range of proteins in high-throughput genetic selections, these single-fusion systems have drawbacks. Chiefly, proteolytic cleavage events can separate the reporter from the protein, uncoupling antibiotic resistance from protein solubility [25, 26]. An alternative approach involves creation of a tripartite fusion in which a POI is fused to multiple partners that must function in concert to produce a selectable or screenable in vivo phenotype. The first tripartite selection system used a fusion of the twin-arginine translocation (Tat) sequence, a POI, and the β-lactamase gene [27]. Proteins must be fully folded to be transported to the periplasm by the Tat machinery. Directing this construct through the Tat pathway serves as an internal quality control step that directly indicates the folding state of the POI. The β-lactamase protein is only active in the periplasm. Therefore, the tripartite fusion will only confer ampicillin resistance once it has been correctly folded and transported to the periplasm via the Tat pathway. Using this system, DeLisa et al. assessed a variant library of the Alzheimer’s Aβ42 peptide and demonstrated that they could select for proteins with enhanced solubility by selecting for ampicillin-resistant colonies [27]. However, β-lactamase activity could still be separated from protein solubility if proteolytic cleavage occurs in the periplasm.
A more generally applicable tripartite system involves embedding a POI in-frame within an antibiotic resistance gene [28]. β-lactamase tolerates insertion of a foreign protein in a surface-exposed loop. Neither half of the split protein is capable of providing resistance to the β-lactam antibiotic penicillin V in isolation [29], but insertion of a soluble POI at this split site results in a functional β-lactamase protein capable of conferring antibiotic resistance. Conversely, if the inserted protein is insoluble, the fusion will be misfolded and potentially degraded, leaving the cells sensitive to the antibiotic. Bardwell et al. demonstrated that the relative solubility of the inserted protein correlated with penicillin V resistance, allowing for selection of proteins with increased solubility from a mutant library. Other iterations of the periplasmic tripartite fusion system have been used to select for increased protein solubility by linking cadmium resistance, ampicillin resistance, or maltose metabolism to protein solubility [30, 31]. Cytoplasmic tripartite fusions have also been developed using split constructs of aminoglycoside-3′-phosphotransferase IIa (APH(3′)), adenylyltransferase (ANT(3″)), and nourseothricin acetyltransferase (NAT), which confer resistance to kanamycin, spectinomycin, and nourseothricin, respectively [32].
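The in-frame requirement behind such split-marker fusions can be illustrated with a minimal sketch. It assumes toy DNA strings rather than any real β-lactamase or aminoglycoside phosphotransferase sequence, and the function name, split position, and Gly-Ser linker are hypothetical choices made for illustration only; they are not taken from the constructs cited above. The sketch checks only that the insert preserves the reading frame and introduces no premature stop codon, which is a necessary condition for the C-terminal half of the marker to be translated at all.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a DNA string into consecutive 3-nt codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def insert_in_frame(host_orf, insert_orf, after_codon, linker=""):
    """Build a tripartite ORF: host N-half + linker + insert + linker + host C-half.

    host_orf    -- marker ORF (hypothetical toy sequence here)
    insert_orf  -- POI coding sequence without its stop codon
    after_codon -- number of host codons kept upstream of the insert
    linker      -- optional flanking linker, used on both sides of the insert
    """
    host_orf, insert_orf, linker = (s.upper() for s in (host_orf, insert_orf, linker))

    # Every part must be a whole number of codons, or the downstream half
    # of the marker would be read in the wrong frame.
    for name, seq in (("host", host_orf), ("insert", insert_orf), ("linker", linker)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} length is not a multiple of 3")

    n_host_codons = len(host_orf) // 3
    if not 0 < after_codon < n_host_codons:
        raise ValueError("split site must fall inside the host ORF")

    split = 3 * after_codon
    fusion = host_orf[:split] + linker + insert_orf + linker + host_orf[split:]

    # Any stop codon before the final codon would truncate the fusion
    # upstream of the marker's C-terminal half.
    if any(c in STOP_CODONS for c in codons(fusion)[:-1]):
        raise ValueError("fusion contains a premature in-frame stop codon")
    return fusion


# Toy example: 5-codon "marker", 2-codon "POI", Gly-Ser linker (GGT TCT).
toy_fusion = insert_in_frame("ATGGCTAAAGGTTAA", "CATCAT", after_codon=2, linker="GGTTCT")
```

A frame-preserving insert is only a prerequisite: whether the resulting fusion confers resistance still depends on the solubility and folding of the inserted protein in vivo, which a sequence-level check of this kind cannot predict.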
Thus far, these tripartite selection systems have been used in combination with a mutagenic approach for increasing POI solubility, either by mutagenizing the POI itself or by mutagenizing the E. coli chromosome. The objective of this study was to design a cytoplasmic selection system that couples protein solubility to viability and to use this system in combination with a curated library of chaperones. Thus, by selecting for antibiotic resistance, one can rapidly and accurately identify chaperones that increase POI solubility. To accomplish this, we created a unique tripartite construct in which the human Hsp70 ATPase domain (ATPase70), a model insoluble eukaryotic POI, was inserted into the hygromycin B resistance gene aminoglycoside 7″-phosphotransferase-Ia (APH(7″)). Using a library of constitutively expressed chaperones, we demonstrate selection of a protein-folding factor capable of enhancing ATPase70 solubility: its native co-chaperone Hep. We further show that the hygromycin resistance conferred by the ATPase70–APH(7″) tripartite fusion correlates with the solubility of the fusion construct. We anticipate that our selection system and chaperone library may be broadly applicable to different POIs and may enable identification of protein-folding factors capable of enhancing expression of insoluble protein constructs.