Improved pyrrolysine biosynthesis through continuous directed evolution of the complete pathway

Pyrrolysine (Pyl, O) exists in nature as the 22nd proteinogenic amino acid. Although Pyl is a fundamental building block of proteins, its study has been hindered by the difficulty and inefficiency of both its chemical and biological syntheses. Here, we improved Pyl biosynthesis via rational engineering and directed evolution of the entire biosynthetic pathway. To accommodate the toxicity of Pyl biosynthetic genes in Escherichia coli, we devised an approach termed Alternating Phage Assisted Non-Continuous Evolution (Alt-PANCE) that alternates mutagenic and selective phage growths. The evolved pathway exhibited a 32-fold improved yield of Pyl-containing super-folder green fluorescent protein (sfGFP) compared to the rationally engineered ancestor, whereas the WT pathway produced no detectable quantities of Pyl-containing sfGFP. This study demonstrates that Alt-PANCE provides a general approach for evolving proteins exhibiting toxic side effects, and further provides an improved pathway capable of producing substantially greater quantities of Pyl-proteins in E. coli.

Today, Pyl is found in numerous bacterial and archaeal species but not eukaryotes. Although Pyl has been found in several classes of proteins 3 , it is best known for its essential role in a unique class of methanogenic enzymes 1,4 . Pyl has a remarkably distinct structure compared to other proteinogenic amino acids and is noteworthy for its reactive electrophilic moiety 1 , a feature absent in all other proteinogenic amino acids.
The genetic components required for Pyl incorporation are encoded in a single operon, pylSTBCD 4 , which mediates Pyl biosynthesis and protein incorporation through nonsense suppression of amber (UAG) codons 5 . Within the operon, pylS encodes pyrrolysyl-tRNA synthetase (PylRS), which catalyzes the ligation of Pyl to tRNA, while pylT encodes the corresponding transfer RNA (tRNA Pyl ) 4 . Genes pylB, pylC, and pylD encode enzymes that biosynthesize pyrrolysine from lysine (Figure 1A) 4 . To date, numerous genetic code expansion studies have utilized PylRS and tRNA Pyl to incorporate synthetic amino acids into proteins, as these genes provide an aminoacyl tRNA synthetase (aaRS)-tRNA pair that does not exhibit cross-reactivity with the existing E. coli translation system [6][7][8][9][10] .
Although Pyl is a fundamental building block of proteins in nature, studies of Pyl proteins have been hindered by the poor supply of the amino acid. In contrast to most synthetic amino acids, Pyl is naturally recognized by PylRS and is ligated to tRNA Pyl with high efficiency. Improving production of Pyl proteins therefore poses an unusual challenge, as genetic code expansion studies have typically focused on improving aaRS-tRNA pairs for better recognition of synthetic amino acids. To date, Pyl protein production has been severely limited by the poor activity of the archaeal biosynthetic pylBCD pathway. When this pathway is expressed heterologously in laboratory strains (such as E. coli), Pyl proteins are produced at a very low yield 4,11 . Supplying cells with an exogenous source of synthetically produced Pyl provides an alternative solution 12 . However, organic synthesis of Pyl is known for its difficulty [12][13][14] .
Here, we detail the improvement of the pylBCD pathway for increased production of Pyl proteins in E. coli, performed using a two-step process. Our first step entailed the rational addition of a solubility tag to pylB, which reduced toxic protein aggregation within the cell and enabled detectable levels of Pyl-containing sfGFP production. We next devised a version of PANCE that we term "Alternating Phage Assisted Non-Continuous Evolution" (Alt-PANCE), designed to reduce cellular toxicity during evolution. We used this method to evolve pylBCD for increased activity across numerous selection conditions. This process resulted in an additional 32-fold increase in Pyl-sfGFP production mediated by our most active mutant. Our evolutionary characterization found that the majority of mutations occurred within pylB and served to increase cellular production of this protein by ~6-fold.
This work provides both a new procedure to enable continuous directed evolution of proteins exhibiting toxic side effects, and further provides a substantially improved biosynthetic pathway for bacterial production of Pyl proteins.

Devising Alt-PANCE and improving PylB solubility
We initially attempted to use PANCE to evolve a codon-optimized variant of the M. acetivorans pylBCD pathway, and the poor initial activity of these genes led us to perform additional optimization before beginning evolution. Following overexpression of the pylBCD pathway, we observed formation of inclusion bodies within each cell (Figure S1). After noting that cells expressing only pylCD did not form inclusion bodies, we rationally fused a SUMO tag to the N-terminus of PylB to improve its solubility 21 . The addition of a SUMO tag has previously been shown to improve PylB solubility, enabling purification and crystallization of this protein 22 . Following the addition of a SUMO tag to pylB, we observed that expression of SUMO-pylBCD resulted in healthy cells without inclusion bodies, indicating improved PylB solubility in vivo and reduced toxic side effects (Figure S1). We next cloned SUMO-pylBCD into an M13 selection phage (SP) vector, termed SP.BCD (see Methods).
As expression of SUMO-pylBCD still exhibited a moderate toxic effect on E. coli cells, we next developed an alternating version of PANCE 15 that we termed Alt-PANCE (Figure 1C) to enable evolution of this pathway for improved activity. Typically, PANCE exposes evolving phage to simultaneous selection and mutagenesis, both of which lead to a high fitness cost and reduced phage titers. We further observed that expression of pylBCD in E. coli results in an additional fitness cost, the cumulative effect of which precludes simultaneous selection and mutagenesis. The Alt-PANCE procedure was thus developed to reduce the fitness cost associated with continuous evolution of mildly to moderately toxic genes. Further, the Alt-PANCE approach can be used to simultaneously coevolve multiple genes, in this case the entire Pyl biosynthetic pathway.
To mediate genetic selection, we expressed pylS and pylT within cells to link Pyl production to translation of an amber mutant of the essential phage gene, gIII 16 . Following Pyl biosynthesis, PylRS ligates Pyl to tRNA Pyl , which leads to Pyl incorporation at 1-3 amber codons within gIII and expression of functional PIII (Figure 1B). Besides mediating selection, covalent ligation of Pyl to tRNA Pyl also limits cell-to-cell diffusion of Pyl, which facilitates evolution by reducing evolutionary "cheating". A similar selection system was used in previous work to evolve pylS for improved incorporation of a Pyl analog, Nɛ-Boc-L-lysine (BocK) 15,18 . Like Pyl, this synthetic amino acid mediates amber suppression 15 .
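The selection logic above can be illustrated with a minimal Python sketch. The mRNA below is a toy sequence, not gIII, and the function name `translate` and the flag `pyl_trna_charged` are hypothetical names introduced only for this illustration. The sketch assumes the standard genetic code and treats amber suppression as all-or-nothing: UAG is read as Pyl (O) when charged tRNA Pyl is available and as a stop codon otherwise.

```python
# Standard genetic code, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str, pyl_trna_charged: bool) -> str:
    """Translate an mRNA, reading amber (UAG) as Pyl ('O') only when
    charged tRNA-Pyl is available (an all-or-nothing simplification)."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3]
        if codon == "UAG" and pyl_trna_charged:
            protein.append("O")          # amber suppression: Pyl inserted
        elif CODON_TABLE[codon] == "*":
            break                        # termination: truncated product
        else:
            protein.append(CODON_TABLE[codon])
    return "".join(protein)


# Toy message with one in-frame amber codon (hypothetical, not gIII).
toy_mrna = "AUGAAAUAGGGCUAA"
print(translate(toy_mrna, pyl_trna_charged=False))  # truncated at UAG
print(translate(toy_mrna, pyl_trna_charged=True))   # full-length, contains O
```

In this simplified picture, a full-length product appears only when Pyl has been made and ligated to its tRNA, which is the property the selection relies on.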

Continuous directed evolution of Pyl biosynthesis pathway
Following our engineered improvement of PylB solubility, the pylBCD pathway exhibited sufficient activity to initiate Alt-PANCE. We began evolution at low selection stringency by supplementing the media with a starting concentration of BocK (200 µM) high enough to ease phage propagation yet low enough that phage variants producing more Pyl had a selective advantage (Figure S2B). During selection growths, we began by using accessory plasmid (AP) JH61, which encodes gIII containing one amber codon to facilitate Pyl insertion and also encodes a highly expressed PylRS (Table S1).
During mutagenic growths, phage were propagated using strain S1059 23 , a permissive strain that expresses gIII following phage infection without imposing a selection. Mutagenic growths also included the presence of mutagenesis plasmid MP6, previously shown to increase mutation rate by five orders of magnitude following arabinose induction 24 . SP.BCD was capable of separately propagating across both the selective and mutagenic conditions described above, thereby enabling Alt-PANCE initiation.
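The alternation between the two growth conditions can be sketched as a simple passage schedule. This Python sketch is illustrative only: the class and function names are hypothetical, and the assumptions that each cycle is exactly one mutagenic growth followed by one selective growth, and that the selective host stays fixed at AP JH61, are ours (the text states only that AP JH61 was used initially and that stringency was raised later).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Growth:
    passage: int
    kind: str          # "mutagenic" or "selective"
    host: str          # host description taken from the text
    selection: bool    # propagation depends on amber suppression in gIII
    mutagenesis: bool  # mutagenesis plasmid MP6 induced with arabinose


def alt_pance_schedule(n_cycles: int) -> List[Growth]:
    """Build an alternating schedule in which no single growth imposes
    both selection and mutagenesis (assumed 1:1 alternation)."""
    if n_cycles < 0:
        raise ValueError("n_cycles must be non-negative")
    schedule: List[Growth] = []
    for cycle in range(n_cycles):
        # Permissive strain: gIII is expressed on infection, no selection.
        schedule.append(
            Growth(2 * cycle + 1, "mutagenic", "S1059 + MP6", False, True)
        )
        # Selective host: gIII carries an amber codon, so phage need Pyl.
        schedule.append(
            Growth(2 * cycle + 2, "selective", "host carrying AP JH61", True, False)
        )
    return schedule


for growth in alt_pance_schedule(2):
    print(growth)
```

The point of the sketch is the invariant rather than the details: the fitness costs of selection and of mutagenesis are never paid in the same growth.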
We performed Alt-PANCE of SP.BCD across three independent lineages, termed lineages A, B, and C. As phage continued to evolve, we steadily increased selection strength by varying four separate parameters, growing cells across a total of 11 stringency conditions (Figures S2C and S3).

Analysis of evolved pylBCD mutations
For each lineage, we isolated 10-15 phage plaques; for each isolate, we sequenced the pylBCD insert, its upstream promoter, and ribosome binding sites. We identified 5-8 mutations in each lineage (Figure 2A; Table S2), 11 convergent high-frequency mutations (Figure 2A), and a total of 16 unique mutations (Table S2). All mutations were found within protein coding sequences and the majority were distal to the active site of each enzyme (Figure 2B). We identified six distinct subpopulations (36A_sub-pop1, 36A_sub-pop2, 34B_sub-pop3, 34B_sub-pop4, 34B_sub-pop5, and 40C_sub-pop6) that contained representative combinations of convergent mutations (Table S2). The pylBCD cassettes from these subpopulations were cloned into expression vectors to measure the activity of each variant via a coupled super-folder green fluorescent protein (sfGFP) reporter assay, wherein biosynthesized Pyl is incorporated into sfGFP through amber suppression (see Methods).
While most evolved mutants appeared to exhibit higher activity than the ancestral variant SUMO-pylBCD, the highest activity levels were observed in 36A_sub-pop1, 36A_sub-pop2, and 34B_sub-pop3 (Figure S4). Next, we rationally combined mutations originating from separate lineages to produce five combinatorial variants (Table S3) and observed that 3f2 and JM10.1 Figure S7 and Table S4) and 151 (Figure S8 and Table S5, see Methods).
Assessing the 16 unique mutations identified within the pylBCD operon at the endpoint, the majority (9) were found within the pylB coding sequence, and two of those were within the SUMO tag (Table S2). Each isolated subpopulation contained significantly more mutations in pylB than in pylC or pylD, with two subpopulations (36A_sub-pop1 and 40C_sub-pop6) containing no mutations in pylC or pylD. This disparity is also observed in the highly active combinatorial variants 3f2 and JM10.1. Those two variants each contained only a single mutation in pylD, no mutations in pylC, and either six or eight mutations in pylB (Table S3). These observations suggest that improved Pyl production primarily stemmed from mutations in pylB, consistent with prior biochemical evidence and quantum mechanical simulations indicating this enzyme catalyzes the rate-limiting step of Pyl synthesis 11,26 .

Characterization of PylB mutants
Notably, pylB mutations were remarkably convergent in character, resulting in increased cationic protein surface charge. Each Alt-PANCE evolved subpopulation contained mutations that increased the charge of SUMO-PylB by +2 to +5. Combinatorial variants 3f2 and JM10.1 exhibited an even greater change, with cationic charge increasing by +7 and +9, respectively. Mutations changing anionic glutamic acid to cationic lysine residues were prevalent, with four separate instances observed (E84K, E122K, E175K, and E178K). Analysis of the crystal structure of PylB from a closely related organism, Methanosarcina barkeri 22 , revealed that all four E-to-K mutations occurred at solvent-exposed regions of the protein surface (Figure 2B). This pattern suggests that, instead of directly improving catalytic properties of PylB, these mutations may benefit the biophysical properties of the protein, thereby increasing Pyl production by reducing protein turnover within the cell.
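The charge bookkeeping behind these numbers can be made explicit with a short Python sketch. It assumes integer side-chain charges near neutral pH (Asp and Glu at -1, Lys and Arg at +1, histidine treated as neutral), and the helper name `charge_change` is hypothetical. Under that assumption, each E-to-K substitution shifts the net charge by +2.

```python
# Approximate side-chain charges near neutral pH (assumption: His is neutral).
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def charge_change(mutation: str) -> int:
    """Net charge change for a substitution written like 'E84K'
    (wild-type residue, position, mutant residue)."""
    wild_type, mutant = mutation[0], mutation[-1]
    return SIDE_CHAIN_CHARGE.get(mutant, 0) - SIDE_CHAIN_CHARGE.get(wild_type, 0)


# The four E-to-K substitutions named in the text.
e_to_k = ["E84K", "E122K", "E175K", "E178K"]
for mutation in e_to_k:
    print(mutation, charge_change(mutation))
```

Because the four substitutions were not necessarily found together in one variant, the per-mutation values should not be summed and compared directly with the +7 and +9 totals reported for 3f2 and JM10.1, which reflect each variant's full mutation set.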
To better characterize evolved mutations in PylB, we next overexpressed and purified two evolved SUMO-PylB variants (from 3f2 and JM10.1 cassettes) as well as the ancestral SUMO-PylB for further analysis (see Methods). Consistent with prior work, we were unable to measure activity from any purified PylB samples, owing to the extreme lability of its SAM cofactor 22 Figure S9). We further confirmed that solubility is not improved within evolved mutants 3f2 and JM10.1 by cloning SUMO tag deletion variants, which produced inclusion bodies following pathway induction (Figure S10).
We next examined the effects of evolved PylB mutations on protein thermostability, as protein stability is a key determinant of steady-state protein concentration 28 . For these experiments we used differential scanning fluorimetry (DSF, also known as the Thermofluor assay) 29 , a method in which a fluorescent probe is used to monitor protein unfolding as samples are slowly heated (see Methods).
This assay was performed using each of the aforementioned purified PylB samples, both under low and high salt conditions (Figure S11). Under both salt conditions, evolved mutants exhibited similar melting temperature (Tm) values to one another, while the ancestral SUMO-PylB exhibited a higher Tm. Compared to low salt conditions, higher amounts of salt led to an increased Tm value for the ancestral SUMO-PylB variant, but did not significantly affect the evolved variants.
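For readers unfamiliar with DSF, one common way to read a Tm from a melt curve is to locate the temperature at which fluorescence rises most steeply. The Python sketch below illustrates that idea on a synthetic two-state curve. It is not the analysis used in this study, whose details are in the Methods; the curve parameters, including the simulated midpoint of 55 °C, are arbitrary values chosen for illustration and are not measurements.

```python
import math
from typing import List, Sequence


def boltzmann(temp_c: float, tm_c: float, slope: float = 1.5) -> float:
    """Idealized two-state unfolding signal, scaled from 0 to 1."""
    return 1.0 / (1.0 + math.exp((tm_c - temp_c) / slope))


def estimate_tm(temps: Sequence[float], fluorescence: Sequence[float]) -> float:
    """Estimate Tm as the midpoint of the temperature interval with the
    steepest fluorescence increase (a finite-difference first derivative)."""
    if len(temps) != len(fluorescence) or len(temps) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    best_slope = float("-inf")
    best_tm = temps[0]
    for i in range(len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i + 1]) / 2.0
    return best_tm


# Synthetic melt curve from 25 to 95 degrees C in 0.5 degree steps.
temps: List[float] = [25.0 + 0.5 * i for i in range(141)]
curve: List[float] = [boltzmann(t, tm_c=55.0) for t in temps]
tm = estimate_tm(temps, curve)
print(f"estimated Tm: {tm:.2f} C")
```

On real data the curve would typically be smoothed or fitted before differentiation, since a raw finite difference is sensitive to noise.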
During purification, we noted that protein yields of both evolved PylB variants were substantially greater than that of the ancestral SUMO-PylB (Figure S12). This result was confirmed upon repeating each purification in triplicate (Figure S13), with variants 3f2 and JM10.1 producing 6.0-fold and 5.6-fold greater yields compared to the ancestral variant, respectively. These findings indicate that the evolved mutants exhibit increased steady-state protein concentration compared to the ancestral variant, consistent with our expectation that these evolved mutations confer a benefit.
Since PylB is an enzyme, such an increase in protein concentration will produce an even larger fold increase in product formation. Indeed, the ~6-fold increase in PylB concentration within evolved mutants may largely or entirely account for the ~32-fold increase in Pyl-sfGFP production (Figure 3).

Discussion
In light of our above results, we thus conclude that pylBCD mutations evolved during PANCE mediate
We note here that we also attempted multiple strategies to extract and quantify free Pyl following its biosynthesis (see Supplementary Methods). Although our approach was guided by prior work 33 , we were unable to detect Pyl in our numerous extracts. While the reasons underpinning our inability to reproduce this protocol are unclear, we postulate that our efforts may have been complicated by chemical modification or degradation of the free amino acid within the complex cellular milieu. Indeed, similar instability is observed with the 21st amino acid selenocysteine, as this residue and its biosynthetic intermediates are highly labile; selenocysteine becomes stabilized only upon incorporation into proteins 34,35 .
The Alt-PANCE procedure developed here establishes a general methodology for rapidly evolving proteins for improved function while reducing toxic side effects. Our results indicate this method can improve protein stability within the producing cell by altering biophysical parameters that are difficult to address via rational or targeted methods 36 . While prior PACE and PANCE experiments have primarily evolved single proteins 19, 37 , here we apply these techniques to a complete multi-gene biosynthetic pathway. A persistent challenge lay in diffusion of biosynthesized products between competing cells 37 , which enables evolutionary escape. Our results demonstrate that ligation of the biosynthetic product to tRNA sufficiently reduces diffusion to enable successful continuous directed evolution. As numerous naturally occurring ncAAs cannot be biosynthesized in E. coli at high yield 38 , our selection system could be applied to evolve other valuable amino acid biosynthetic pathways.
Our improved pylBCD pathway enables production of useful Pyl proteins at substantially greater yields. As the 22nd amino acid, Pyl is fundamental to the origins of life, yet has long been difficult to study. In nature, Pyl is a critical residue in methanogenic enzymes (mttB, mtbB, and mtmB) that produce methane from methylamines 1 . Methane is increasingly viewed as a substitute for fossil fuels, but microbial methanogenesis is only known to occur in archaea 39 , which are recalcitrant to genome engineering. Heterologous expression of these Pyl-containing methanogenic enzymes in E. coli, a chassis much easier to engineer than archaea, is now feasible and can provide a viable strategy toward the production of industrially relevant quantities of methane.
The unusual chemical properties of Pyl further enable other exciting bioengineering applications. While electrophilic moieties are notably absent in other proteinogenic amino acids 40 , the imine group found within the Pyl pyrroline ring is both electrophilic and highly reactive. This moiety has been previously shown to react with 2-amino-benzaldehyde (2-ABA) and 2-amino-acetophenone (2-AAP) groups to form a bio-orthogonal crosslink 11 , which provides a bioconjugation method with unique chemistry distinct from other approaches. The positive charge carried by Pyl also enables its use in antimicrobial peptides whose mechanisms of action require a cationic charge 41 .