Strain construction, rational mutagenesis, and culture temperature selection
The pelB secretion peptide35 was fused to the N-terminus of the mature Chinese northern yellow Cny-EKL (UniProtKB Q6B4R4) EKL protein sequence (Supplementary Figure S1), in an attempt to secrete EKL into the periplasm and thus facilitate cleavage of the pelB signal peptide, folding of the mature EKL protein, and disulphide-bond formation. Even after optimising the fermentation parameters with BL21 E. coli, the best conditions at 30 °C still gave poor soluble expression (0.5 mg L−1 or 0.13 mg L−1 OD600−1), detectable by SDS-PAGE only after concentrating the cells 20-fold by centrifugation and resuspension, while 99% of the protein formed inclusion bodies (> 55 mg L−1 or 13 mg L−1 OD600−1). A very low level of activity was detected using a fluorescent assay, also after 20-fold concentration of the cells and osmotic shock to release the periplasmic fraction. A multiple sequence alignment of 250 sequences retrieved in a BLAST search identified point mutations towards the consensus sequence with the potential to improve folding, stability, and (we hypothesised) soluble expression (Supplementary Figure S2 and Table S1). Five consensus mutations were constructed, and while four improved soluble expression to some degree (Supplementary Figure S3a), V15Q gave the greatest improvement, doubling the soluble expression while retaining significant activity (Supplementary Figure S3b). The soluble periplasmically secreted protein remained in the range 0.8–2.5% of total protein for all variants, with the remainder expressed as inclusion bodies.
The more widely studied Bovine EKL (UniProtKB P98072) differs from Cny-EKL via only two mutations R82P and D176E. R82P was also a consensus mutation (Supplementary Table S1). Previous work has shown that these mutations do not interfere with inclusion body formation or subsequent solubilisation, in vitro refolding, and autocatalytic activation20. The C112S mutation was also previously found to enhance in vitro refolding yields from solubilised inclusion bodies by 50%22. We hypothesised that the C112S and consensus R82P mutations could potentially improve EKL folding in vivo. Therefore, we introduced the V15Q, R82P, and C112S mutations, to compare the eight variants (WT pelB-CnyEKL, V15Q, R82P, C112S, V15Q/R82P, R82P/C112S, V15Q/C112S, and V15Q/R82P/C112S). These were expressed in TB medium at both 30 °C and 37 °C in E. coli BL21(DE3), and also in C41(DE3) as this was reported to be more tolerant to toxic protein expression36. The variants were then compared for their total activity within lysates, using a more sensitive version of the fluorescence assay that avoided the need to pre-concentrate the cells.
The relative total activity at 30 °C and 37 °C for each variant-strain combination is shown in Fig. 1. For the best variants, E. coli C41(DE3) and 37 °C gave consistently higher total activity than BL21(DE3) and 30 °C, respectively. The activity of WT pelB-CnyEKL expressed at 37 °C could not be detected above the control background in either strain, whereas a low activity was detectable when expressed at 30 °C, though only in C41(DE3). V15Q gave an 8.5-fold increase in this activity to 240 mM hr−1 L−1, and also detectable activity when expressed at 37 °C in C41(DE3). By contrast, R82P and C112S each gave similar improvements in activity across all conditions, to 160–210 mM hr−1 L−1, though with slightly higher activities of 370 and 470 mM hr−1 L−1, respectively, for BL21(DE3) expression at 30 °C. The addition of V15Q within V15Q/R82P gave only modest changes in total activity, with slight increases in C41(DE3) and slight decreases in BL21(DE3). By contrast, the addition of C112S within R82P/C112S led to a 31-fold improvement in activity in C41(DE3) at 37 °C, compared to R82P, and a 24-fold improvement in BL21(DE3) at 37 °C. Comparing the effect of each single mutation when added to form double and triple mutants revealed the same general pattern: a modest impact for V15Q (1.2–3.8-fold increases), and significant impacts for R82P (5–78-fold increases) and C112S (16–40-fold increases). The highest activity attained was for V15Q/R82P/C112S at 37 °C in C41(DE3), with a 230-fold improvement over that of V15Q, and at least 340-fold higher than the WT CnyEKL, assuming it had activity just below the minimum detection limit of 15 mM hr−1 L−1.
That the R82P mutation was highly beneficial was consistent with both the expectation that consensus mutations are frequently stabilising37, and also that proline residues are known to be often stabilising, particularly to loop structures38. The R82P mutation is located within a large structured loop (residues Asn81-Ile94) that also forms part of the active site.
Residue C112 of EKL is unpaired in recombinant expressions of only the light chain, and a previous study had shown that the C112S mutation improved refolding yields by 50% for human EKL22, so it is likely also to have contributed positively through a similar mechanism in vivo here.
Random mutagenesis
It was fully expected that DNA-level mutagenesis of the T7 promoter, lac operator, and rbs, as well as silent mutations within the pelB and EKL sequences, could alter the expression dynamics and hence functional expression levels achieved. Therefore, random mutagenesis and DNA-shuffling techniques were targeted across a range of gene elements consisting of the T7 promoter, lac operator, ribosome binding site (rbs), pelB signal peptide, and the EKL gene sequence (Supplementary Figure S1).
Random mutagenesis by error-prone PCR was applied first on the parent variants V15Q and V15Q/C112S, providing an evolutionary option to discover beneficial mutations that could match, or simply rediscover, the beneficial effect of the R82P mutation. Variants were also screened in E. coli BL21(DE3) at 30°C, to discover mutations that could be potentially beneficial under conditions where the total activity was lowest and yet still measurable using high throughput plate-based growth and assays. Thereafter, iterations of DNA-shuffling and HITS-shuffling were performed to optimize the mutational compositions of evolved EKL variants. For shuffling, more moderate evolutionary pressures were applied by expressing the enzyme variants from E. coli C41(DE3) at 30°C. The stabilities of variants to thermal inactivation were also assessed via the loss of activity after a heat challenge, though this was not used as a screening filter. Therefore, EKL variants exhibiting a wide range of improved total activities and stabilities were discovered throughout the directed evolution campaign.
Three random mutagenesis libraries were constructed and screened. A mutation rate of typically 1–3 base changes, with some containing as many as six base changes, was obtained. Given the high level of E. coli background protease activity towards the GD4K-na substrate, compared to the low total activity of EKL variants, all variants with at least 1.2-fold improved activities over their respective parents (V15Q and V15Q/C112S) were included in subsequent DNA-shuffling rounds. Of approximately 3000 random mutagenesis variants screened, 225 EKL variants were found to have improved total activity, and were therefore sequenced and mined for beneficial mutations. A total of 86 mutations were discovered and taken forward to construct DNA-shuffling libraries.
First-round DNA-shuffling libraries
DNA-shuffling of all 86 variants would constitute a library size of 2^86 (approximately 7.7 × 10^25) possible variants. Instead, several smaller libraries, each shuffling typically only 2–6 variants, were designed to maintain manageable library sizes and ensure that each mutation was represented at least once in the screening. Ten DNA-shuffled combinatorial libraries (L1 – L10) were constructed on the V15Q/R82P/C112S parent variant, which re-introduced the beneficial R82P mutation, and then transformed into the more favourable C41(DE3) E. coli strain. The mutagenic primers incorporated into libraries (L1 – L10) are shown in Supplementary Table S2. The possible number of variants in each library (except L10) varied from 4 to 64, and so each was screened within one or two 96-microwell plates while retaining a 95% probability of observing each variant at least once39. Variants containing non-synonymous mutations were first grouped as bottom, middle, and top-tiered depending on their relative total activity, and subsets of each tier shuffled into libraries L1 – L5 as in Table S2. Additionally, variants with mutations residing in expression or secretion-related gene elements such as the T7 promoter, lac operator, and the ribosome binding site were independently shuffled in libraries L6 – L8. Finally, libraries L9 and L10 consisted of the remaining mutations that were derived from randomly mutagenized variants that were at least 1.2-fold improved over V15Q or V15Q/C112S. Sequencing confirmed the presence of 1–6 mutations in each new variant.
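The library-size reasoning above can be checked with a short calculation. The sketch below is illustrative only: it assumes that each shuffled mutation is simply present or absent (so n mutations give 2^n variants), that all variants in a library are equally likely, and that "95% probability of observing each variant" refers to the chance of sampling any one given variant at least once. The function name `clones_to_screen` is hypothetical and is not taken from the cited method.

```python
import math

def clones_to_screen(n_variants, p_observe=0.95):
    """Clones to pick so that a given variant is sampled at least once with
    probability p_observe, assuming equally likely variants and independent
    picks (an assumption of this sketch, not a statement from the paper)."""
    if n_variants <= 1:
        return 1
    # P(miss a given variant in N picks) = (1 - 1/V)^N <= 1 - p_observe
    return math.ceil(math.log(1 - p_observe) / math.log(1 - 1 / n_variants))

# Full combinatorial space if all 86 mutations were shuffled together.
full_space = 2 ** 86

# Smallest and largest library sizes quoted in the text (2 and 6 shuffled mutations).
small_library = clones_to_screen(2 ** 2)
large_library = clones_to_screen(2 ** 6)

print(f"2^86 = {full_space:.2e} variants")
print(f"4-variant library: screen {small_library} clones")
print(f"64-variant library: screen {large_library} clones")
```

Under these assumptions the largest 64-variant library needs just under two 96-well plates, which is in line with the one or two plates per library described above.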
Approximately 2400 variants were screened in total from across libraries L1-10, and 370 variants with up to 10-fold improvements in total activity, and up to 5-fold improvement in residual activity after heat inactivation compared to the V15Q/R82P/C112S parent (Supplementary Figure S4), were selected and sequenced. The earlier strategy of randomly-mutagenizing debilitated EKL parent variants containing the P82R mutation, and their expression in the BL21(DE3) strain that was less tolerant to functional EKL expression and secretion, thus provided appropriate evolutionary pressures to discover beneficial mutations, that subsequently also recombined beneficially.
Second to fourth-round DNA-shuffling
Top variants selected from the first-round DNA-shuffling libraries were further shuffled in three new successive libraries (Supplementary Tables S3 and S4). New variants were selected based on a combination of improved total activity and also stability as measured by residual activity after heating. As unintended random mutations were also introduced during the construction of the first round of DNA-shuffling libraries, with unclear contributions, mutant primers based on these mutations were also designed and included in one of the three variant-shuffling libraries. Library L11 was generated by shuffling the two highest activity variants from each of the eight libraries L1–6, L8 and L9 (Supplementary Table S3). Library L12 was generated by shuffling ten additional high-activity variants from libraries L1-L4, L7, and L10, plus one newly-discovered variant from library L11 (Table S3). Library L13 was then generated by shuffling eleven top variants for activity from library L12, along with short oligos encoding twenty less frequently observed mutations from libraries (L1-L12) (Supplementary Table S4), plus six newly appearing random mutations arising from DNA shuffling libraries (L1-L10), with the aim of maximising the diversity search. Approximately 1200 variants were screened (270 from L11, 360 from L12, 540 from L13) and 251 variants sequenced (61 from L11, 86 from L12, 104 from L13). Two separate EKL variants were found to be improved up to 24-fold in total activity and up to 37-fold in stability over the V15Q/R82P/C112S parent variant, respectively (Supplementary Figure S4). While both of these top two variants were obtained from library L12, the final library L13 contained a higher frequency of variants with increased stability. The results also showed that the combinatorial libraries (L11 – L13) were successful in further increasing the genetic diversity between the top hits having improved total activity.
Retesting of all improved variants
Overall, approximately 6500 EKL variants, derived from three random mutagenesis and thirteen combinatorial shuffling libraries, were evolved primarily for improved total activity while also measuring the stability of evolved variants to thermal inactivation. From the thirteen combinatorial library screens, 846 EKL variants with a range of total activity and stability phenotypes were sequenced. It was found that 321 of the 846 sequenced EKL variants were unique (Supplementary Table S5). These unique EKL variants were retested together to enable a direct comparison of their total activity and stability. Retests were carried out with EKL variants expressed at both 30 °C and 37 °C to investigate their sensitivity to expression temperature. Comparisons of the total activity and stability to thermo-inactivation when expressed at each temperature, are shown in Fig. 2, for all unique variants. Interesting variants representing a range of properties were selected for further characterisation below, and are highlighted with larger symbols, including V15Q/R82P/C112S as a reference point. The evolution of total activity for key variants throughout this work is summarised also in Supplementary Figure S5.
Particularly interesting variants included WLEK0362, WLEK0699, and WLEK0779. WLEK0362 was one of the most stable variants with 93–100% retained activity after heat inactivation, compared to just 4.3% for V15Q/R82P/C112S and 0% for V15Q. It also had moderately improved (up to 1.8-fold) total activity over V15Q/R82P/C112S at 30°C, but the highest overall total activity when expressed at 37°C, reaching an 11300-fold improvement over the original WT Cny-EKL. This rise to become the best variant at 37°C was likely due to WLEK0362 also being the most stable variant, aiding its high soluble and functional expression. WLEK0699 exhibited both good stability (84–89% retained activity) and high total activity that was up to 3.7-fold improved over V15Q/R82P/C112S, and 620-fold higher than WT Cny-EKL when expressed at 30°C. By contrast, WLEK0779 exhibited the highest total activity when expressed at 30°C (4-fold higher than V15Q/R82P/C112S, and 680-fold higher than WT Cny-EKL), but was only 50% as active when expressed at 37°C. Heat inactivation resulted in only 10% residual activity for WLEK0779, and so its poor stability was likely to have been the reason for activity loss when expressed at 37°C.
The expression temperature did not significantly affect the total activity for most EKL variants (Fig. 2A), although a small number of outliers were highly sensitive to expression temperature. However, the expression temperature did result in a noticeable drop in stability to thermo-inactivation for many variants expressed at 37°C compared to at 30°C. This mainly affected the less stable variants, indicating that they had already become more susceptible to thermal denaturation during expression at the higher temperature, perhaps through formation of a population of partially denatured protein or soluble aggregates that promote further unfolding or aggregation. A parity in stability was reached for the more stable variants, indicating sufficient stability to tolerate expression at 37°C.
The relationship between total activity and the stability as measured from residual activity after a two-hour heat challenge at 50°C, is shown in Fig. 3 for the variants expressed at both 30°C and 37°C. The distributions for the variants expressed at the two different temperatures were very similar, and reveal that increased total activities were biased towards variants that were also more stable. Directed evolution studies often reveal a trade-off whereby variants with greater catalytic efficiency (eg. kcat) lead to some loss in stability. As a result, a minimum threshold stability is often required to accommodate mutations that improve activity40,41. Our assay for total activity represents an overall "fitness", which could improve via increases in several possible factors in addition to catalytic efficiency, including expression level, correct folding, solubility, and the stability of the protein. The non-linear correlation in Fig. 3 suggested that a stability threshold of > 80% residual activity had to be reached before any mutations translated into an impact on total activity, regardless of which of the other factors the mutations were improving. The parent variant V15Q/R82P/C112S was below that threshold and so while selecting for total activity, the initial mutations were being selected indirectly for their stabilising impact.
Statistical ranking of mutational effects
A partial least squares (PLS) analysis was used to deconvolute the effects of individual mutations on EKL total activity and stability, from within the library of 321 unique EKL variants. The 321 unique EKL variants contained 206 unique nucleobase mutations, which included those found on the T7 promoter, lac operator, and ribosomal binding site as well as those encoding residues within the pelB secretion peptide and EKL protein. Redundancies were removed from multiple mutations that always coexisted, by labelling as a single factor with multiple mutations (eg. I135S/t556g), which left a total of 170 mutations in 321 sequences. The frequency of each mutation within the 321 unique sequences, ranged from 1 to 93.
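The encoding step described above, in which mutations that always occur together are collapsed into one factor, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the variant names and mutation combinations in the example are hypothetical, and `merge_coexisting` is a name introduced here. Each mutation becomes a binary presence/absence column across variants, and columns with identical patterns are merged under a joint label such as I135S/t556g.

```python
from collections import defaultdict

def merge_coexisting(variants):
    """Build binary presence/absence columns per mutation and merge
    mutations whose columns are identical across all variants.

    variants: dict mapping variant name -> set of mutation labels.
    Returns (ordered variant names, dict factor label -> 0/1 column).
    """
    names = sorted(variants)
    all_mutations = sorted({m for muts in variants.values() for m in muts})

    # Group mutations by their presence pattern across the variants.
    by_pattern = defaultdict(list)
    for mutation in all_mutations:
        pattern = tuple(int(mutation in variants[name]) for name in names)
        by_pattern[pattern].append(mutation)

    # Mutations sharing a pattern cannot be separated by regression,
    # so they are labelled as a single combined factor.
    factors = {"/".join(muts): list(pattern) for pattern, muts in by_pattern.items()}
    return names, factors

# Hypothetical toy data: three variants, four mutations.
toy_variants = {
    "variant_A": {"I135S", "t556g", "S38T"},
    "variant_B": {"I135S", "t556g"},
    "variant_C": {"S38T", "M96L"},
}
names, factors = merge_coexisting(toy_variants)
for label, column in factors.items():
    print(label, column)
```

The resulting columns would form the independent-variable matrix for a regression such as PLS, with one row per variant once transposed.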
The PLS regression ranked the relative effects of all mutations (as independent variables) on the four dependent (response) variables of total activity and residual activity, each when expressed at 30°C and at 37°C. A 10-fold cross validation resulted in coefficients for each mutation that were obtained from an optimal model (minimum root-mean PRESS) in which 7 components explained 63%, 65%, 70%, and 69% of the variance in activity 30°C, activity 37°C, stability 30°C, and stability 37°C, respectively, reflected in the Pearson correlations between predicted and actual responses of R2 = 0.63–0.7 (Supplementary Figure S6).
The highest coefficient values indicate mutations that have the largest effects on each response in the model. High VIP values indicate mutations which account for more of the overall variability in the sequences themselves, and are therefore of more importance to the model. Mutations were first ranked by their variable importance in projection (VIP) values, to reject those with VIP < 0.6. The remainder were binned into significant mutations with VIP > 1.0, and moderately significant mutations with VIP 0.6 to 1.0, and their coefficients plotted in VIP rank order in Supplementary Figure S6 for each of the four responses.
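The VIP-based triage described above amounts to a simple binning rule, sketched below. The thresholds (reject below 0.6, moderately significant from 0.6 to 1.0, significant above 1.0) come from the text; the VIP values in the example are invented for illustration, and the function name `bin_by_vip` is hypothetical.

```python
def bin_by_vip(vip_by_mutation, reject_below=0.6, significant_above=1.0):
    """Sort mutations by VIP (highest first) and bin them using the
    thresholds quoted in the text."""
    bins = {"significant": [], "moderate": [], "rejected": []}
    ranked = sorted(vip_by_mutation.items(), key=lambda item: item[1], reverse=True)
    for mutation, vip in ranked:
        if vip > significant_above:
            bins["significant"].append(mutation)
        elif vip >= reject_below:
            bins["moderate"].append(mutation)
        else:
            bins["rejected"].append(mutation)
    return bins

# Hypothetical VIP values, for illustration only.
example_vips = {"mut_1": 1.8, "mut_2": 0.9, "mut_3": 0.3, "mut_4": 1.2}
print(bin_by_vip(example_vips))
```

Only the mutations in the first two bins would then have their regression coefficients plotted and interpreted.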
Silent mutations throughout the promoter, lac operator, ribosome binding site, pelB leader and EKL gene generally had insignificant impacts on any of the four responses. It is possible that the silent mutation t556g in the EKL gene had an impact, but this mutation was always convoluted with I135S. One silent mutation, g226a (EKL gene), had a significant negative impact on all four responses.
At the protein mutation level, S38T, I135S/t556g, A129T, S127T, M96L, and M100K were beneficial to both activity and stability. H235N was beneficial to activity, but neutral to stability. I135V/T, M100T, Q160L, and M48K were possibly also beneficial, but these mutations had VIP values of 0.84 to 0.95, making them less important to the PLS model. The pelB mutation P5’L had a positive impact, but only on the total activity obtained when expressed at 37°C. Notable mutations with significant to modest negative impacts were D142F, g226a, P162S, I128V, R124G, L74F, c604g and possibly G27S.
Interestingly, the ranking of mutation I135S, and to a lesser extent I135V or I135T, as highly beneficial for both total activity and stability, was consistent with a previous finding26 that mutation I135K improved the in vitro refolding of EKL. That patent also highlighted M48K as less significant, consistent with our ranking of this mutation as only moderately beneficial for total activity and stability. This validates our statistical analysis to be useful for differentiating between beneficial and deleterious mutations of EKL, including their relative impacts, and has the potential to inform further library designs or rational mutations to improve the properties of EKL.
Purified EKL variant characterization
Several variants were purified and then characterised in more detail to identify the broader links between their stability (Tm and kinetic inactivation rates), soluble expression level, enzyme kinetic parameters, and the total activity obtained in lysates. A total of 10 EKL variants (V15Q, V15Q/R82P/C112S, WLEK0294, WLEK0362, WLEK0488, WLEK0513, WLEK0528, WLEK0699, WLEK0707, and WLEK0779) were expressed at both 30°C and 37°C. To examine the impact of codon-optimization, five variants were additionally codon-optimized to give V15Qopt, V15Q/R82P/C112Sopt, WLEK0362opt, WLEK0513opt, and WLEK0528opt. WLEK0779, WLEK0528 and WLEK0528opt could not be purified in sufficient amounts for characterisation. The residual activity after heat-inactivation in lysates indicated that WLEK0779 was not very stable, and that WLEK0528 was only moderately stable, and so this instability may have hampered their purification. EKL purification used NTA-His6-tag affinity purification followed by affinity capture specific to the EKL active-site using STI-immobilized agarose resin. While the first step would effectively capture all soluble EKL proteins from clarified lysates, the second step would presumably bind only to correctly folded EKL proteins with fully formed active-sites. The characterisation results for those EKL variants purified in sufficient amounts are summarised in Tables 2 & 3.
Table 2
Characterisation of EKL variants expressed at 30°C. Total activity in clarified lysates and specific activity for purified enzyme were each assayed using 0.0625 mM substrate. Functional expression = Total activity / Specific activity. n.d. – no data.
| Variants | Total activity (lysate) AVG (mM hr−1 L−1) | Total activity SE | Specific activity (purified) AVG (mM hr−1 ng−1 EKL) | Specific activity SE | Functional expression AVG (µg EKL L−1) | Functional expression SE | Specific expression AVG (µg EKL L−1 OD−1) | Specific expression SE | Km AVG (mM) | Km SE | kcat AVG (s−1) | kcat SE | kcat/Km AVG (mM−1 s−1) | kcat/Km SE | Tm AVG (°C) | Tm SE | ku at 45 °C AVG (min−1) | ku SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WT CNY | 28 | 12 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
| V15Q | 240 | 10 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
| V15Q/R82P/C112S | 4766 | 117 | 0.014 | 1.3E-03 | 341 | 30 | 60 | 5 | 0.58 | 0.08 | 247 | 23 | 425 | 73 | 48.0 | 0.04 | 0.0085 | 5E-04 |
| WLEK0294 | 12067 | 167 | 0.021 | 1.2E-03 | 567 | 32 | 84 | 5 | 0.58 | 0.13 | 321 | 43 | 553 | 141 | 48.7 | 0.05 | 0.0064 | 7E-05 |
| WLEK0362 | 8725 | 40 | 0.020 | 3.4E-04 | 443 | 7 | 65 | 1 | 0.91 | 0.38 | 421 | 133 | 463 | 242 | 52.8 | 0.07 | 0.0017 | 5E-05 |
| WLEK0488 | 14081 | 2 | 0.010 | 3.3E-04 | 1365 | 43 | 204 | 7 | 0.26 | 0.06 | 81 | 10 | 309 | 82 | 49.2 | 0.05 | 0.0051 | 6E-05 |
| WLEK0513 | 8462 | 52 | 0.015 | 3.4E-04 | 566 | 12 | 75 | 2 | 0.43 | 0.05 | 177 | 12 | 414 | 57 | 51.2 | 0.05 | 0.0025 | 1E-05 |
| WLEK0528 | 8312 | 71 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
| WLEK0699 | 17634 | 196 | 0.025 | 6.4E-04 | 696 | 16 | 126 | 3 | 0.12 | 0.01 | 102 | 5 | 871 | 110 | 50.2 | 0.05 | 0.0043 | 1E-04 |
| WLEK0707 | 7501 | 209 | 0.006 | 3.0E-04 | 1174 | 45 | 341 | 13 | 0.14 | 0.01 | 30 | 1 | 215 | 14 | n.d. | -- | n.d. | -- |
| WLEK0779 | 19081 | 197 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
| V15Qopt | 239 | 5 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
| V15Q/R82P/C112Sopt | 6948 | 108 | 0.008 | 7.3E-04 | 841 | 73 | 162 | 14 | 0.25 | 0.06 | 65 | 8 | 261 | 71 | 48.1 | 0.05 | 0.0074 | 2E-04 |
| WLEK0362opt | 12002 | 180 | 0.013 | 4.9E-04 | 939 | 33 | 166 | 6 | 0.23 | 0.04 | 93 | 7 | 411 | 77 | 52.7 | 0.07 | 0.0018 | 4E-05 |
| WLEK0513opt | 12896 | 114 | 0.023 | 6.1E-04 | 567 | 14 | 108 | 3 | 0.42 | 0.04 | 253 | 15 | 606 | 72 | 51.2 | 0.09 | 0.0025 | 2E-05 |
| WLEK0528opt | 5965 | 73 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
Table 3
Characterisation of EKL variants expressed at 37°C. Total activity in clarified lysates and specific activity for purified enzyme were each assayed using 0.0625 mM substrate. Functional expression = Total activity / Specific activity. n.d. – no data. * WT activity is rounded up to 1, but the limit of detection was approx. 15 mM hr−1 L−1.
| Variants | Total activity (lysate) AVG (mM hr−1 L−1) | Total activity SE | Specific activity (purified) AVG (mM hr−1 ng−1 EKL) | Specific activity SE | Functional expression AVG (µg EKL L−1) | Functional expression SE | Specific expression AVG (µg EKL L−1 OD−1) | Specific expression SE | Km AVG (mM) | Km SE | kcat AVG (s−1) | kcat SE | kcat/Km AVG (mM−1 s−1) | kcat/Km SE | Tm AVG (°C) | Tm SE | ku at 45 °C AVG (min−1) | ku SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WT CNY | 1* | 15 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
| V15Q | 36 | 13 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
| V15Q/R82P/C112S | 8372 | 133 | 0.012 | 3.9E-04 | 698 | 20 | 163 | 5 | 0.28 | 0.04 | 74 | 6 | 260 | 42 | 47.9 | 0.10 | 0.0082 | 2E-04 |
| WLEK0294 | 10001 | 103 | 0.011 | 4.7E-04 | 894 | 37 | 216 | 13 | 0.21 | 0.03 | 67 | 5 | 316 | 57 | 48.4 | 0.07 | 0.0076 | 4E-04 |
| WLEK0362 | 11330 | 154 | 0.008 | 5.0E-04 | 1352 | 78 | 343 | 20 | 0.24 | 0.05 | 58 | 6 | 245 | 58 | 52.1 | 0.05 | 0.0021 | 7E-05 |
| WLEK0488 | 9080 | 35 | 0.011 | 1.3E-04 | 847 | 10 | 262 | 4 | 0.25 | 0.06 | 87 | 9 | 354 | 92 | 48.8 | 0.05 | 0.0058 | 2E-04 |
| WLEK0513 | 10857 | 99 | 0.008 | 2.7E-04 | 1426 | 49 | 363 | 19 | 0.23 | 0.05 | 51 | 6 | 219 | 53 | 50.8 | 0.05 | 0.0032 | 3E-05 |
| WLEK0528 | 1891 | 23 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
| WLEK0699 | 10107 | 82 | 0.041 | 4.7E-04 | 249 | 2 | 106 | 2 | 0.16 | 0.01 | 211 | 7 | 1329 | 104 | 49.1 | 0.2 | n.d. | -- |
| WLEK0707 | 5619 | 47 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
| WLEK0779 | 10633 | 191 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
| V15Qopt | 62 | 11 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
| V15Q/R82P/C112Sopt | 4645 | 67 | 0.019 | 7.0E-04 | 243 | 8 | 61 | 2 | 0.40 | 0.05 | 226 | 13 | 560 | 82 | 49.0 | 0.1 | n.d. | -- |
| WLEK0362opt | 4264 | 75 | 0.012 | 4.5E-04 | 343 | 11 | 143 | 5 | 0.52 | 0.04 | 168 | 8 | 324 | 30 | 52.7 | n/a | n.d. | -- |
| WLEK0513opt | 4382 | 20 | 0.013 | 1.3E-04 | 329 | 3 | 125 | 2 | 0.59 | 0.07 | 204 | 15 | 347 | 47 | 52.6 | n/a | n.d. | -- |
| WLEK0528opt | 3245 | 48 | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- | n.d. | -- |
Enzyme kinetics for purified variants
To determine the kinetic parameters of each variant, initial velocity data were fitted by non-linear regression to a modified Michaelis–Menten equation, to account for any substrate inhibition which was apparent in all variants except the V15Q/R82P/C112Sopt expressed at 37°C. All Km values obtained were > 10x the substrate concentration (0.0625 mM) used in the total activity assays and library screening, and hence kcat/Km was the expected rate constant under these conditions. Furthermore, all substrate inhibition constants were found to be in the range 0.6-1 mM, which was thus an insignificant factor in the total activity assays and library screening performed at 0.0625 mM substrate.
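To illustrate why substrate inhibition is negligible at the screening concentration, the sketch below evaluates a substrate-inhibition rate law at 0.0625 mM. The exact form of the modified Michaelis–Menten equation is not given in this section, so the commonly used form v/[E] = kcat·S / (Km + S + S²/Ki) is assumed here. The kcat and Km values are those of V15Q/R82P/C112S expressed at 30 °C (Table 2), and Ki is set to 0.6 mM, the lower end of the reported range; the function name is hypothetical.

```python
def turnover_rate(s_mM, kcat, km_mM, ki_mM=None):
    """Rate per enzyme molecule (s^-1) at substrate concentration s_mM.
    With ki_mM set, uses the assumed substrate-inhibition form
    kcat*S / (Km + S + S^2/Ki); otherwise plain Michaelis-Menten."""
    denominator = km_mM + s_mM
    if ki_mM is not None:
        denominator += s_mM ** 2 / ki_mM
    return kcat * s_mM / denominator

S_ASSAY = 0.0625          # mM, substrate concentration used in screening
KCAT, KM = 247.0, 0.58    # s^-1 and mM, V15Q/R82P/C112S at 30 °C (Table 2)
KI = 0.6                  # mM, lower end of the reported inhibition range

with_inhibition = turnover_rate(S_ASSAY, KCAT, KM, KI)
without_inhibition = turnover_rate(S_ASSAY, KCAT, KM)
low_substrate_limit = KCAT / KM * S_ASSAY   # (kcat/Km)*S approximation

loss_from_inhibition = 1 - with_inhibition / without_inhibition
print(f"with inhibition:     {with_inhibition:.1f} s^-1")
print(f"without inhibition:  {without_inhibition:.1f} s^-1")
print(f"(kcat/Km)*S limit:   {low_substrate_limit:.1f} s^-1")
print(f"rate lost to inhibition: {loss_from_inhibition:.1%}")
```

Under these assumptions the inhibition term changes the predicted rate by roughly 1%, while the (kcat/Km)·S approximation overestimates it somewhat because S is not vanishingly small compared with this particular Km.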
The relationship between the catalytic turnover rate constant (kcat) and substrate binding affinity (Km) for all EKL variants, expressed at both 30°C and 37°C, is shown in Fig. 4. For most of the originally selected non-codon optimised variants, this revealed only minor differences in kcat and Km, which mostly maintained kcat/Km values within a narrow range of 215–553 mM−1 s−1, with an average standard error of ±120 mM−1 s−1. One exception was WLEK0699, which increased kcat/Km up to 5-fold relative to V15Q/R82P/C112S. Therefore, most of the variants were not altered in their catalytic efficiency, and so the total activity must have increased due to a change in the level of soluble functional protein expression and its stability.
Changing the expression temperature and codon optimisation affected kcat values, and to a lesser extent Km, indicating an impact on the proportion of the purified protein that was fully functional. Codon optimisation at 37°C expression led to a 3-fold increase in kcat, but conversely a 50% decrease when expressed at 30°C. Decreasing the expression temperature from 37°C to 30°C increased kcat 4-fold for non-codon optimised variants, but decreased it by 30% for codon optimised variants. Thus overall, codon optimisation was preferable at the higher expression temperature, whereas non-codon optimisation was preferred at the lower temperature. On balance, the best combination for the highest kcat was achieved with non-codon optimised variants expressed at 30°C. This suggests that translation efficiency at the codon level needs to be matched by some other temperature-dependent aspect of translation, translocation, or folding within the protein expression pathway.
It was expected that total activity would increase as a function of soluble protein expression level, especially as the specificity constants (kcat/Km) for purified variants did not vary significantly. The relationship between soluble protein expression and total activity is shown in Fig. 5 for variants expressed at 30°C and 37°C. WLEK0699, the only variant that increased kcat/Km, is plotted separately in each case (triangles). Soluble expression was determined by dividing the total activity in clarified lysates by the specific activity determined for the purified variants at the same substrate concentration of 0.0625 mM. Soluble expression was independently corroborated by SDS-PAGE densitometry, which was less accurate but confirmed the range of values obtained (Supplementary Figure S7). As expected there was an underlying proportionality between soluble expression and total activity, although only strongly correlated at 37°C (R2 = 0.9). The impact of increased kcat/Km for WLEK0699 can be seen as an increase in total activity above the trend followed by the other variants.
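The soluble (functional) expression calculation described above is a unit conversion that can be reproduced directly from the tabulated values. The sketch below is illustrative, with a hypothetical function name; it uses the total and specific activities of V15Q/R82P/C112S expressed at 30 °C from Table 2. Because the tabulated specific activities are rounded, the result agrees with the table only approximately.

```python
def functional_expression_ug_per_l(total_activity, specific_activity):
    """Functional expression in µg EKL per litre.

    total_activity:    mM hr^-1 L^-1 (clarified lysate)
    specific_activity: mM hr^-1 ng^-1 EKL (purified enzyme)
    The ratio has units of ng EKL per litre; dividing by 1000 gives µg.
    """
    ng_per_litre = total_activity / specific_activity
    return ng_per_litre / 1000.0

# V15Q/R82P/C112S expressed at 30 °C (Table 2): 4766 and 0.014.
parent_expression = functional_expression_ug_per_l(4766, 0.014)
print(f"{parent_expression:.0f} µg EKL per litre")  # Table 2 reports 341
```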
Thermostability of purified variants
Comparisons of enzyme kinetic parameters indicated that the proportion of correctly-folded soluble enzyme, was affected by expression temperature and codon optimisation. The thermostability of the purified variants was measured by differential scanning fluorimetry to probe folding quality further, as a possible source for differences in total activity, residual activity, and enzyme kinetics, when expressed under different conditions. The melting temperatures, Tm, for purified variants are summarised in Tables 2 & 3. These Tm values were also observed to be inversely related to their unfolding rate constants at 45°C providing an orthogonal verification of variant stability (Supplementary Figure S8).
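The inverse relationship between Tm and the unfolding rate constant can be made more tangible by converting ku into a half-life. The sketch below assumes simple first-order (single-exponential) loss of native protein at 45 °C, which the min−1 units of ku suggest but which this section does not state explicitly. It uses the Table 2 values for V15Q/R82P/C112S (Tm 48.0 °C, ku 0.0085 min−1) and WLEK0362 (Tm 52.8 °C, ku 0.0017 min−1); the function names are hypothetical.

```python
import math

def half_life_min(ku_per_min):
    """Half-life in minutes for first-order decay with rate constant ku."""
    return math.log(2) / ku_per_min

def fraction_native(ku_per_min, minutes):
    """Fraction of protein still native after the given time, assuming
    single-exponential unfolding (an assumption of this sketch)."""
    return math.exp(-ku_per_min * minutes)

# (Tm in °C, ku at 45 °C in min^-1) for variants expressed at 30 °C, Table 2.
variants = {
    "V15Q/R82P/C112S": (48.0, 0.0085),
    "WLEK0362": (52.8, 0.0017),
}

for name, (tm, ku) in variants.items():
    print(f"{name}: Tm {tm} °C, half-life at 45 °C {half_life_min(ku):.0f} min, "
          f"{fraction_native(ku, 120):.0%} native after 2 h")
```

Under this assumption the 4.8 °C higher Tm of WLEK0362 corresponds to a five-fold longer half-life at 45 °C.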
A comparison of Tm values to the residual activity after heat inactivation for 2 hours at 50°C showed that variants with Tm <48.4°C, just below the incubation temperature, had a sharp decrease in residual activity from > 80% to < 10% (Supplementary Figure S9), confirming that denaturation of the protein was the principal reason for loss of residual activity in the screening assays. The improvement in total activity for variants with > 80% residual activity (Fig. 3) therefore also translated into a threshold of Tm ≥48.4°C required to achieve good expression and subsequently increased total activity, for which the variants were selected (Supplementary Figure S9B). Therefore, the Tm ≥48.4°C threshold represents the minimal stability required to avoid unfolding, ensure refolding in the periplasm, or provide resistance to proteolytic digestion or aggregation during expression at 37°C.
The Tm values did not correlate with soluble expression levels, kcat, Km or kcat/Km when considering the entire dataset together, or after removing the variants with low stability (Tm <48.4°C). However, codon optimisation led to a noticeable 0.6–1.8°C increase in Tm when expressed at 37°C, yet no change when expressed at 30°C. Meanwhile, decreasing the expression temperature from 37°C to 30°C slightly increased Tm by 0.4–1.1°C for non-codon optimised variants, but decreased it by 0.8°C for codon optimised variants. This directly mirrored the influence of these same expression condition factors on kcat as discussed above, and linked an increase in Tm with an increase in kcat. Presumably both resulted from an improved proportion of the purified protein being in the functional native form, rather than being misfolded or aggregated at the time of measurement. Indeed it is often found that higher expression rates are detrimental to the soluble expression of aggregation-prone proteins42.
Structural location of significant mutations
The 321 unique variants used in the PLS model represented a wide range of activities and stabilities obtained from the libraries, and not just those selected for improvements. The locations of all mutations with highly positive (beneficial) or highly negative (detrimental) PLS coefficients are highlighted in the structure of EKL, along with the locations of mutations present in key variants (Fig. 6). While these mutations were distributed throughout the structure, the positive or negative PLS coefficients for both total activity and stability were segregated such that mutations with negative coefficients were generally closer to catalytic residues or the substrate. This strongly indicates that mutations most detrimental to total activity were those that impacted structural regions directly involved in catalysis or substrate binding. By contrast, the mutations that most improved total activity tended to lie at a distance from the active site and acted indirectly, through improved soluble expression and stability. This is in contrast to most directed evolution studies, which have started with good expression and so focussed on improving catalytic efficiency (e.g. kcat). In those cases, beneficial mutations are more often found closer to the active site43. In our study, it appears that the biggest gains in total activity were easier to obtain through improved stability and expression, which started from a low point, than from improvement of catalytic efficiency.
Most of the observed mutations can be rationalised post hoc through the local interactions gained or lost, or through altered backbone flexibility expected for mutations to and from proline (e.g. P162S and P214L) or glycine (e.g. R214G). Some of these are described below for important mutations found in key selected variants.
Comparison of selected variants
Variant WLEK0779 (V15Q/R82P/M96L/C112S/R124G/N169D) exhibited the highest total activity when expressed at 30°C, and yet only moderate total activity when expressed at 37°C, and just 11% residual activity after pre-incubation at 50°C for 2 hours. Examination of Fig. 6C shows how this high activity and yet poor stability resulted from a balance of beneficial and deleterious mutations. M96L was beneficial to both activity (coeffs. 0.71 at 30°C; 0.76 at 37°C), and stability (coeffs. 0.68 at 30°C; 0.56 at 37°C). However, R124G was slightly deleterious to activity (coeffs. -0.24 at 30°C; -0.36 at 37°C), but highly deleterious to stability (coeffs. -0.92 at 30°C; -0.75 at 37°C). N169D was close to neutral for stability, with a slight benefit to activity (coeffs. 0.2 at 30°C; 0.1 at 37°C). Thus, carrying the R124G mutation was the likely cause of rapid inactivation of WLEK0779 during the 2 hour incubation at 50°C, and lower total activity when expressed at 37°C. The R124G mutation removes a salt bridge to E195, while introduction of a glycine would also potentially increase the local backbone flexibility. Indeed, variant WLEK0463 (V15Q/R82P/M96L/C112S/N169D) was identical except for the R124G mutation, and exhibited considerably better stability with 78% and 92% residual activity from 30°C and 37°C expression respectively. However, its total activity, while 2-fold improved over V15Q/R82P/C112S, was only 30–40% that of WLEK0779.
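The reasoning above, in which the stability of a variant is rationalised from the coefficients of the mutations it carries, can be illustrated with a short Python sketch. It assumes a purely additive reading of the PLS coefficients and ignores the model intercept and any scaling of the variables, so the sums indicate direction only and are not predicted responses. The stability coefficients are those quoted in the text. No value is quoted for N169D, which is described only as close to neutral, so the sketch treats it as zero; this is an assumption. The function and variable names are hypothetical.

```python
# Stability PLS coefficients quoted in the text, as (30 °C, 37 °C) expression.
# N169D is described only as "close to neutral"; no value is quoted, so it
# is absent here and contributes 0.0 below (an assumption).
STABILITY_COEFFS = {
    "M96L": (0.68, 0.56),
    "R124G": (-0.92, -0.75),
}

# Mutations carried in addition to the shared V15Q/R82P/C112S background.
VARIANTS = {
    "WLEK0779": ["M96L", "R124G", "N169D"],
    "WLEK0463": ["M96L", "N169D"],
}


def summed_coeffs(mutations, coeffs):
    """Sum coefficients over a variant's mutations (additive assumption).

    Mutations without a quoted coefficient contribute 0.0.
    Returns the (30 °C, 37 °C) sums rounded to two decimal places.
    """
    total_30 = 0.0
    total_37 = 0.0
    for mutation in mutations:
        c30, c37 = coeffs.get(mutation, (0.0, 0.0))
        total_30 += c30
        total_37 += c37
    return round(total_30, 2), round(total_37, 2)


for name, mutations in VARIANTS.items():
    print(name, summed_coeffs(mutations, STABILITY_COEFFS))
```

Under these assumptions the summed stability coefficient is negative for WLEK0779 and positive for WLEK0463 at both expression temperatures, in line with the much higher residual activity observed for WLEK0463.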
WLEK0362 (V15Q/S38T/L74F/R82P/M100K/C112S/S127T/N169D) was the most stable variant with 93–100% retained activity, and the highest Tm of all when purified. It also had 1.3 to 1.8-fold improved total activity over V15Q/R82P/C112S. Examination of Fig. 6C shows that the high stability and good activity resulted from mutations that were mainly beneficial, and particularly so for stability. S38T had the highest positive PLS coefficients for both activity (coeffs. 1.5 at 30°C; 1.6 at 37°C), and stability (coeffs. 2.0 at 30°C; 1.7 at 37°C), and had the highest importance to the model (VIP of 6.35). S127T was also strongly positive to activity (coeffs. 0.66 at 30°C; 0.67 at 37°C), and stability (coeffs. 1.0 at 30°C; 0.83 at 37°C), and M100K slightly less positive (coeffs. in the range 0.35 to 0.57). As described above, N169D was essentially neutral to all responses. Finally, the effects of L74F were only modestly detrimental (coeffs. in the range -0.16 to -0.27) even though its VIP of 4.7 indicates that it was important to the model.
WLEK0488 (V15Q/S38T/L74F/R82P/M100K/C112S/S127T/P162S/H235N) had slightly lower stability, higher total activity in lysates, and 5-fold lower kcat than WLEK0362. The biggest difference was due to P162S which was deleterious to both activity (coeffs. -0.55 at 30°C; -0.58 at 37°C), and stability (coeffs. -0.66 at 30°C; -0.54 at 37°C). This was counteracted by H235N for activity (coeffs. 1.1 at 30°C; 0.77 at 37°C), but not for stability (coeffs. 0.08 at 30°C; 0.06 at 37°C).
Finally, WLEK0699 (P5'L/A19'V/V15Q/R82P/E99A/C112S/A129T/I135S) had improved stability (84–89% retained activity) and a high total activity (5 to 6-fold improved over V15Q/R82P/C112S). As discussed above, this variant was the only one which increased catalytic efficiency, such that kcat/Km increased up to 5-fold relative to V15Q/R82P/C112S, whereas the other variants improved total activity through increased soluble expression. From Fig. 6C it can be seen that this variant introduced three beneficial mutations, E99A, A129T and I135S, and none that were detrimental. A129T was strongly positive to activity (coeffs. 0.79 at 30°C; 0.78 at 37°C), and stability (coeffs. 1.0 at 30°C; 0.81 at 37°C). I135S was even more positive to activity (coeffs. 0.89 at 30°C; 1.03 at 37°C), and stability (coeffs. 1.2 at 30°C; 0.91 at 37°C). E99A was modestly beneficial to activity (coeffs. 0.32 at 30°C and 37°C), and stability (coeffs. 0.34 at 30°C; 0.25 at 37°C). The promoter region mutations P5'L and A19'V were much less influential with either low coefficients (P5'L) or low VIP (A19'V).
The improved catalytic efficiency of WLEK0699 was driven mainly through a decrease in Km observed at either expression temperature, and an increase in kcat seen only when expressed at 37°C. The A129T and I135S mutations were both found near the EKL extended binding site, and hence could potentially influence kcat or Km. The A129 alanine sidechain was 7.0 Å from the backbone amide nitrogen of catalytic residue S187, and so the threonine mutation gives it the potential to hydrogen bond directly to the carbonyl of neighbouring residue G188 to create a more stable active site. The A129 sidechain also packs onto the V15 sidechain in wildtype EKL, which had been mutated to V15Q in the parent sequence. Thus the benefits of A129T may also have been complementary to the V15Q mutation. The I135 mainchain was 10 Å from the lysine sidechain of the substrate DDDK, although separated by the disulphide formed between C183 and C211. The I135 sidechain is also highly solvent exposed and in the middle of a hydrophobic patch with V2, L134, Y136, and A212, making an I135S mutation potentially beneficial to solubility.
The E99 sidechain makes van der Waals contacts with the hydrophobic M100 and L74 sidechains, an arrangement that unfavourably and partially buries an acidic sidechain charge. E99A would remove this unfavourable charge. It is worth noting that M100K, which has positive PLS coefficients as described above, would potentially be stabilising by forming a new salt bridge to E99. E99A and M100K only appear together once (in WLEK0433), out of 72 instances of variants containing M100K and 24 instances containing E99A, highlighting a strong selection against the two simultaneous mutations. WLEK0433 ranked low, with little improvement over the V15Q/R82P/C112S parent.
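As a rough check on the co-occurrence argument above, the number of variants expected to carry both mutations by chance can be estimated with a hypergeometric model. The Python sketch below is illustrative only and was not part of the analysis reported here. It assumes that the 72 M100K-containing and 24 E99A-containing variants were counted among the 321 unique variants used in the PLS model, and that in the absence of selection the two mutations would be distributed independently; both are assumptions. The function names are hypothetical.

```python
from math import comb

N_TOTAL = 321  # assumed population: unique variants in the PLS model
N_M100K = 72   # variants containing M100K (from the text)
N_E99A = 24    # variants containing E99A (from the text)
OBSERVED = 1   # variants containing both (WLEK0433)


def expected_overlap(n_total, n_a, n_b):
    """Expected number of variants carrying both mutations if independent."""
    return n_a * n_b / n_total


def prob_overlap_at_most(k_max, n_total, n_a, n_b):
    """P(overlap <= k_max) under a hypergeometric model.

    Models the n_b variants carrying mutation B as a random draw, without
    replacement, from n_total variants of which n_a carry mutation A.
    """
    denom = comb(n_total, n_b)
    numer = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(k_max + 1)
    )
    return numer / denom


print(expected_overlap(N_TOTAL, N_M100K, N_E99A))
print(prob_overlap_at_most(OBSERVED, N_TOTAL, N_M100K, N_E99A))
```

Under these assumptions roughly five variants would be expected to carry both mutations by chance, so a single observed instance is consistent with selection against the combination.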