Strain engineering is now a reliable approach to scaling up production of target metabolites by integrating known genes and applying simple yet effective metabolic engineering strategies1. However, engineering the microbial production of secondary metabolites is limited by the characterized enzymes available in sequence databases, where many annotations are incorrect. In reality, there are millions of enzyme variants to choose from for each desired reaction, and a great many more remain hidden in nature with unknown sequences and functions. In this sense, millions of years of natural evolution can be viewed as a highly diverse screening resource for synthetic biologists. Accordingly, the rational discovery of natural enzymes with novel functions is a powerful and inevitable approach to improving microbial bioproduction pathways2-6.
The current study demonstrates how machine learning can reveal novel enzyme functions with potential for sustainable biosynthesis. In our previous study, aromatic acetaldehyde synthase (AAS) was predicted with the enzyme selection software M-path to improve production of valuable alkaloids7. However, M-path could predict only EC numbers, so the actual selection of candidate sequences had to rely on human intuition. Here, this issue is addressed by developing a support vector machine (SVM) algorithm8 that automatically selects specific enzyme sequences, an upgrade that enables computer-automated Design, Build, Test and Learn (DBTL) cycles.
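As a rough illustration of how an SVM places a candidate sequence in a positive or negative prediction space, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method on dipeptide-composition features. The feature encoding, the toy training sequences, the hyperparameters (`lam`, `epochs`) and all function names are assumptions made for illustration; this is not the algorithm8 used in this study, whose training data and features are described in the Methods.

```python
from collections import Counter
from itertools import product
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed feature set: the 400 possible dipeptides (2-mers).
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def featurize(seq):
    """Return the normalized dipeptide composition of a protein sequence."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    vec = [0.0] * len(DIPEPTIDES)
    total = sum(c for d, c in counts.items() if d in INDEX) or 1
    for d, c in counts.items():
        if d in INDEX:  # skip dipeptides containing non-standard residues
            vec[INDEX[d]] = c / total
    return vec


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. A constant 1.0 feature is appended so the last
    weight acts as a bias (regularized with the others, a simplification)."""
    rng = random.Random(seed)
    Xb = [x + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def svm_score(w, seq):
    """Decision value; > 0 places the sequence in the positive prediction space."""
    x = featurize(seq) + [1.0]
    return sum(wj * xj for wj, xj in zip(w, x))


# Toy usage with made-up sequences (not real enzymes).
train = [("HYHYHYHYHY", 1), ("YHYHYHYHYHYH", 1), ("HYHYHYHYHYHYHY", 1),
         ("NFNFNFNFNF", -1), ("FNFNFNFNFNFN", -1), ("NFNFNFNFNFNFNF", -1)]
X = [featurize(s) for s, _ in train]
y = [label for _, label in train]
w = train_linear_svm(X, y)
for candidate in ["MHYHYHYHYHY", "MNFNFNFNFNF"]:
    print(candidate, round(svm_score(w, candidate), 2))
```

In practice, a kernel SVM from an established machine learning library and curated, structure-classified training sets would replace these toy inputs; the linear version is shown only because its decision rule is easy to inspect.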
To prove the concept of machine learning enzyme prediction, the conversion of tyrosine to benzylisoquinoline alkaloids (BIAs) is selected as the target pathway for optimization (Fig. 1). BIAs are precursors of opioid analgesics that are currently mass-produced from industrially grown Papaver somniferum plants, a historical target of human-directed evolution for natural product production. While opioid misuse is a global problem, natural and semi-synthetic opioids derived from the BIA reticuline actually result in fewer deaths than less expensive and overly potent synthetic opioids (CDC Opioid Data Analysis and Resources). BIAs also have diverse potential beyond analgesia: natural BIAs have been shown to inhibit coronavirus9, and the BIA norcoclaurine is a β2-adrenergic receptor agonist present in edible plants, medicinal herbs and sports supplements10,11.
BIA production in Escherichia coli has used bacterial monoamine oxidase and insect DHPAAS to generate toxic 3,4-dihydroxyphenylacetaldehyde (DHPAA)12. However, DHPAA-containing pathways result in rapid loss of unstable catechol-containing intermediates7,12-14. Other reports suggest that plants favor the 4-hydroxyphenylacetaldehyde (4HPAA) pathway to norcoclaurine (Fig. 1a), which may be more stable because its early intermediates lack a catechol group. Therefore, plant 4HPAA pathways offer the potential to prevent loss of BIA intermediates in E. coli. Furthermore, combining the 4HPAA and DHPAA pathways may also improve utilization of tyrosine and aryl acetaldehydes. Despite success with the 4HPAA pathway in yeast1,4,15,16, and much discussion of the expected phenylpyruvate decarboxylase (PPDC, EC 4.1.1.43) and AAS (EC 4.1.1.107-109) activities in plants, no enzymes producing the aryl acetaldehydes 4HPAA or DHPAA have been characterized from high-alkaloid-producing poppy plants17. Moreover, no plant sequence annotated as phenylpyruvate decarboxylase can be found in public databases, and the functions of numerous P. somniferum cytochrome P450 (CYP450) monooxygenases (EC 1.14.14) remain to be clarified. This serious limitation in known enzymes is addressed here by applying machine learning to predict the essential missing links in plant alkaloid pathways, shown as dotted arrows in Fig. 1.
To guide the selection of sequences from over 100 candidates spread across highly duplicated carboxy-lyase and oxidase gene families, 8 refined SVM models are built from training sequences classified using structure-based rules. Then, to verify the machine learning predictions, 50 strains expressing combinations of candidate sequences and analogous templates are screened using liquid chromatography–mass spectrometry (LC-MS)-, capillary electrophoresis–MS (CE-MS)- and gas chromatography–MS (GC-MS)-based metabolomics. As a result, AAS, PPDC, N-methylcoclaurine 3-hydroxylase (NMCH) and CYP450 reductase (CPR) enzymes with distinct features are identified as missing links that mediate uncharacterized branches of the P. somniferum alkaloid pathway. Synergistically combining predicted sequences with homologous microbial templates affords 96.7 mg/L norcoclaurine, 71.8 mg/L N-methylcoclaurine (NMC) and 24.6 mg/L reticuline without any strain engineering. The alternative branches of flux from tyrosine to downstream alkaloids are confirmed by dynamic metabolic profiling5 with mechanism-directed deuterium labeling patterns.
Prediction and discovery of P. somniferum TyDC1 as a missing link to plant aryl acetaldehydes
DHPAA and tetrahydropapaveroline (THP) are more easily oxidized and more toxic than their corresponding 4-hydroxyphenyl analogues12. Therefore, the 4HPAA pathway to norcoclaurine is explored as a first example of machine learning enzyme selection to construct an improved metabolic pathway (Fig. 2). Our previous M-path analysis identified 4-hydroxyphenylacetaldehyde synthase (4HPAAS, EC 4.1.1.108) as mediating 4HPAA production from tyrosine; however, specific 4HPAAS sequences are incompletely annotated throughout most databases. In this study, the term AAS covers the plant-type AAS enzymes 4HPAAS and phenylacetaldehyde synthase (PAAS, EC 4.1.1.109) as well as insect 3,4-dihydroxyphenylacetaldehyde synthase (DHPAAS, EC 4.1.1.107), because substrate specificities often overlap among these groups.
Unclear variation within the plant-type AAS group, whose members may act on a wide range of substrates including phenylalanine, tyrosine, 3,4-dihydroxy-L-phenylalanine (L-DOPA), tryptophan and histidine, further complicates the selection of a correct sequence based on phylogenetic and structural analyses alone. Accordingly, no AAS enzyme from P. somniferum has been clearly established17. To overcome this prediction challenge, our SVM-based algorithm8 is first applied to select AAS from P. somniferum homologs annotated as tyrosine/DOPA decarboxylase (TyDC) (Fig. 2b and Supplementary Table 1).
Because AAS often exhibits mixed carboxy-lyase and oxidative deamination activities, separate SVM models for aromatic amino acid decarboxylase (AAAD) and AAS (Supplementary Table 1) were trained using sequences classified as described in the Methods. According to database annotations and previous reports, P. somniferum TyDC (PsTyDC) proteins would be expected to catalyze the decarboxylation of tyrosine to tyramine, and possibly the conversion of L-DOPA to dopamine20,21. In contrast, the SVM-based models show that while most of the 8 full-length PsTyDC sequences have high potential for AAAD activity, PsTyDC1-8 also appear in the AAS prediction space, with PsTyDC1, PsTyDC2 and PsTyDC6 scoring highest for positive AAS prediction (Fig. 2b and Supplementary Table 1). PsTyDC6 contains the AAAD-like active site residues Y98, F99, H205, Y350 and S372 and scores high for AAAD prediction, suggesting PsTyDC1 as a better candidate. PsTyDC2 scores high for AAS and low for AAAD and should be explored as an AAS in future studies, although the original report21 suggested that PsTyDC2, which also contains the AAAD-like residues Y100, F101, H203, Y348 and S370, mediates typical tyrosine decarboxylase activity. In addition to their positive AAS prediction scores, PsTyDC1 contains a unique active site residue L205, and PsTyDC3 a unique I370 (serine in all other TyDC sequences), further suggesting atypical activities for these test sequences.
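The structure-based labeling of training sequences can be pictured with a deliberately simplified rule that uses only the residues named in this paragraph: a Tyr-Phe-His triad is treated as AAAD-like, the Phe-Tyr-Asn triad transplanted from insect DHPAAS as AAS-like, and any other known triad as atypical. This rule, the five-residue motif encoding and the function name are hypothetical; the actual classification rules are given in the Methods.

```python
# Hypothetical, simplified labeling rule built only from residues named in the text.
# Motif order follows the AAAD-like active site (e.g. PsTyDC6 Y98, F99, H205, Y350,
# S372); None marks a residue that is not stated.
AAAD_CONSENSUS = ("Y", "F", "H", "Y", "S")
DHPAAS_TRIAD = ("F", "Y", "N")  # residues transplanted from insect DHPAAS


def classify_motif(motif):
    """Return (label, deviations) for a 5-residue active-site motif."""
    triad = tuple(motif[:3])
    if None in triad:
        label = "undetermined"
    elif triad == AAAD_CONSENSUS[:3]:
        label = "AAAD-like"
    elif triad == DHPAAS_TRIAD:
        label = "AAS-like"
    else:
        label = "atypical"
    # (motif index, consensus residue, observed residue) for every known mismatch
    deviations = [
        (i, cons, obs)
        for i, (cons, obs) in enumerate(zip(AAAD_CONSENSUS, motif))
        if obs is not None and obs != cons
    ]
    return label, deviations


motifs = {
    "PsTyDC6": ("Y", "F", "H", "Y", "S"),
    "PsTyDC2": ("Y", "F", "H", "Y", "S"),
    "PsTyDC1": ("Y", "F", "L", None, None),
    "PsTyDC1-Y98F-F99Y-L205N": ("F", "Y", "N", None, None),
    "PsTyDC3": (None, None, None, None, "I"),
}
for name, motif in motifs.items():
    print(name, *classify_motif(motif))
```

Such a rule only flags sequences for a label; it cannot by itself separate mixed activities, which is why the SVM models are trained on top of the labeled sets.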
In accordance with the SVM prediction, expression of PsTyDC1 in E. coli from various plasmids leads to in vivo production of norcoclaurine with very low tyramine production (Fig. 2c), indicating that PsTyDC1 has specific 4HPAAS activity with low bifunctional AAAD activity. As a positive AAS control, PsTyDC1-Y98F-F99Y-L205N, carrying active site residues transplanted from insect DHPAAS, gives results similar to those of wild-type PsTyDC1. When PsTyDC1-L205 is substituted with the histidine found in typical AAADs, production of the decarboxylation product tyramine increases dramatically (Fig. 2c and Supplementary Fig. 1). PsTyDC1-mediated production of norcoclaurine is further confirmed in strains with additional variations in the alkaloid pathway (Supplementary Fig. 2 and Table 1). Consistent with its lower AAS prediction score, wild-type PsTyDC3 gives a lower ratio of the AAS-derived product norcoclaurine to the AAAD product tyramine than wild-type PsTyDC1.
Table 1 | Aromatic producing strains of this study
Strain | Genotype | Conditions | Products
BL21(DE3) | F– ompT gal dcm lon hsdSB(rB–mB–) λ(DE3 [lacI lacUV5-T7p07 ind1 sam7 nin5]) [malB+]K-12(λS) | - | -
BL21-AI | F– ompT gal dcm lon hsdSB(rB–mB–) [malB+]K-12(λS) araB::T7RNAP-tetA | - | -
T1-01-DE3 | pCDFD-TfNCS-PsTyDC1 | LB-Tyr+DA | NC
T1-02-DE3 | pCDFD-TfNCS-PsTyDC1-S | LB-Tyr+DA | Tyramine
T1-03-DE3 | pCDFD-TfNCS-PsTyDC1-T | LB-Tyr+DA | NC
T3-01-DE3 | pCDFD-TfNCS-PsTyDC3 | LB-Tyr+DA | trace NC
T3-02-DE3 | pCDFD-TfNCS-PsTyDC3-S | LB-Tyr+DA | Tyramine
T3-03-DE3 | pCDFD-TfNCS-PsTyDC3-T | LB-Tyr+DA | NC
T1-04-DE3 | pCDFD-PsONCS3-PsTyDC1 | LB-Tyr+DA | trace NC
T1-05-DE3 | pCDFD-PsONCS3-PsTyDC1-S | LB-Tyr+DA | Tyramine
T1-06-DE3 | pCDFD-PsONCS3-PsTyDC1-T | LB-Tyr+DA | trace NC
T3-04-DE3 | pCDFD-PsONCS3-PsTyDC3 | LB-Tyr+DA | trace NC
T3-05-DE3 | pCDFD-PsONCS3-PsTyDC3-S | LB-Tyr+DA | Tyramine
T3-06-DE3 | pCDFD-PsONCS3-PsTyDC3-T | LB-Tyr+DA | trace NC
T1-07-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-TfNCS-PsTyDC1 | M9-Tyr+DA | Reticuline
T1-08-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-TfNCS-PsTyDC1-S | M9-Tyr+DA | Reticuline
T1-09-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-TfNCS-PsTyDC1-T | M9-Tyr+DA | Reticuline
T1-10-DE3 | pACYC-3CjMTs-PpDDC, pCDFD-PsONCS3-PsTyDC1 | TB-DOPA*; TB-Tyr+DOPA | 1 μM Reticuline; NC
P1-01-AI | pBAD-PsPDC1 | M9-4HPP | Tyrosol
P2-01-AI | pBAD-PsPDC2 | M9-4HPP |
P3-01-AI | pBAD-Ps2HCLL | M9-4HPP |
P1-02-AI | pACYC-3CjMTs-PpDDC, pCDFD-PsONCS3, pBAD-PsPDC1 | M9-Tyr+DA; TB-DOPA*; TB-Tyr*+DOPA* | NC; 1.5 μM Reticuline; NMC, Reticuline
P1-03-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-PsONCS3-PsTyDC1, pBAD-PsPDC1 | TB-Tyr+DA | NC
P1-04-AI | pACYC-3CjMTs-PpDDC, pCDFD-PsONCS3-PsTyDC1, pBAD-PsPDC1 | TB-DOPA; TB-Tyr*+DOPA* | 1.5 μM Reticuline; NMC, Reticuline
N1-01-DE3 | pET23a-3PsMTs, pCOLAD-PsNMCH-PsCPR | TB-NC | 27.8 μM Reticuline
N1-02-DE3 | pET23a-3PsMTs, pCOLAD-PsNMCH-H203Y-PsCPR | TB-NC | 15.8 μM Reticuline
N1-03-DE3 | pET23a-3PsMTs, pCOLAD-PsNMCH-AtATR2 | TB-NC | 15.9 μM Reticuline
N1-04-DE3 | pET23a-3PsMTs, pCOLAD-PsNMCH-H203Y-AtATR2 | TB-NC | 8.0 μM Reticuline
N2-01-DE3 | pET23a-3PsMTs, pCOLAD-EcNMCH-AtATR2 | TB-NC | 4.9 μM Reticuline
N2-02-DE3 | pET23a-3PsMTs, pCOLAD-EcNMCH-Y202H-AtATR2 | TB-NC | 8.4 μM Reticuline
N2-03-DE3 | pET23a-3PsMTs, pCOLAD-EcNMCH-PsCPR | TB-NC | 3.7 μM Reticuline
N2-04-DE3 | pET23a-3PsMTs, pCOLAD-EcNMCH-Y202H-PsCPR | TB-NC | 3.6 μM Reticuline
DS-01-DE3 | pACYC-3CjMTs-PpDDC-S, pCDFD-PsONCS3 | M9-DOPA |
DT-01-DE3 | pACYC-3CjMTs-PpDDC-T, pCDFD-TfNCS | M9-DOPA | 16.8 μM THP, 2.5 μM Reticuline
DT-02-DE3 | pACYC-3CjMTs-PpDDC-T, pCDFD-PsONCS3 | M9-DOPA | 34 μM THP, 6.4 μM Reticuline
DS-02-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-CjNCS-PpDDC-S | M9-Tyr+DA | NC
DD-01-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-CjNCS-PpDDC-D | M9-Tyr+DA |
DT-03-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-CjNCS-PpDDC-T | M9-Tyr+DA; M9-DOPA* | 2.1 μM THP, 0.26 μM Reticuline
DQ-01-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-CjNCS-PpDDC-Q | M9-Tyr*+DA |
DS-03-DE3 | pACYC-3CjMTs-PpDDC, pCDFD-CjNCS-PpDDC-S, pET23a-EcHpaBC | M9-Tyr | NC
DD-02-DE3 | pACYC-3CjMTs-PpDDC, pCDFD-CjNCS-PpDDC-D, pET23a-EcHpaBC | M9-Tyr |
DT-04-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-CjNCS-PpDDC-T, pET23a-EcHpaBC | M9-Tyr | DA, THP
T1-11-DE3 | pACYC-3CjMTs-PpDDC, pCDFD-PsONCS3-PsTyDC1, pET23a-EcHpaBC | LB-Tyr | DA
A1-01-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-CjNCS-ARO10 | TB-Tyr*+DA*; TB-DOPA*+DA* | 356 μM NC, 240 μM NMC; 1 μM Reticuline
A1-02-DE3 | pACYC-3CjMTs-PpDDC, pCDFD-CjNCS-ARO10 | |
A1-03-DE3 | pACYC-3CjMTs-PpDDC, pCDFD-CjNCS-ARO10, pET23a-EcHpaBC | M9-Tyr*; TB-Tyr* | 4HPP, DA, NC; THP, NMC
DS-04-DE3 | pACYC-3CjMTs-PpDDC, pCDFD-CjNCS-PpDDC-S, pET23a-EcHpaBC, pE-BmDHPAAS | |
A1-05-DE3 | pACYC-3CjMTs-PsNMCH, pCDFD-CjNCS-ARO10, pTrc-BmDHPAAS-T | TB-Tyr+DOPA | DA
A1-06-AI | pACYC-3CjMTs-PpDDC-T, pCDFD-CjNCS-ARO10, pTXB1-PsTyDC1 | TB-Tyr+DOPA; TB-Tyr+DA | 74.9 μM Reticuline; 112 μM NMC
P1-05-AI | pACYC-3CjMTs-PsNMCH, pCDFD-CjNCS-PpDDC-T, pBAD-PsPDC1 | TB-Tyr+DOPA |
P1-06-DE3 | pACYC-3CjMTs-PpDDC-T, pCDFD-PsONCS3, pBAD-PsPDC1 | TB-Tyr+DOPA | 3.7 μM Reticuline
P1-07-AI | pACYC-3CjMTs-PpDDC-T, pCDFD-PsONCS3, pBAD-PsPDC1 | TB-Tyr+DOPA | 61.7 μM Reticuline
Strains with names ending in "DE3" are derived from BL21(DE3), and strains ending in "AI" are derived from BL21-AI. Plasmid details are given in Supplementary Table 6. The last two columns list successfully tested conditions (growth medium and added substrate) and the BIAs or BIA precursors produced. Concentrations of extracted NMC and reticuline per culture volume are listed for A1-01-DE3, A1-06-AI, P1-06-DE3 and P1-07-AI; all other listed concentrations represent titers in filtered culture medium. Only product titers precisely quantified above 1 μM are listed. Matched substrates and corresponding products are indicated by bold font. Substrates marked with * include isotopes (tyrosine-13C, tyrosine-d4, L-DOPA-d3 or dopamine-d2). Abbreviations: PsONCS3 - P. somniferum multi-domain NCS, CjNCS - Coptis japonica NCS, 2HCLL - 2-hydroxyacyl-CoA ligase-like, S - single variant, D - double variant, T - triple variant, Q - quadruple variant, 3CjMTs - C. japonica 6OMT, CNMT and 4OMT, 3PsMTs - P. somniferum 6OMT, CNMT and 4OMT, Tyr - tyrosine, 4HPP - 4-hydroxyphenylpyruvate, DA - dopamine, DOPA - 3,4-dihydroxy-L-phenylalanine, NC - norcoclaurine, THP - tetrahydropapaveroline, NMC - N-methylcoclaurine.
P. somniferum PDC1 decarboxylates 4-hydroxyphenylpyruvate in an alternative 4HPAA bypass pathway
Phenylpyruvate decarboxylase (PPDC) is an alternative to AAS for the production of the aryl acetaldehyde intermediates 4HPAA and DHPAA (Fig. 2d-f). Previous reports hypothesize that P. somniferum should contain a PPDC with specificity towards 4-hydroxyphenylpyruvate (4HPP)17; however, no plant protein accessions are annotated as phenylpyruvate decarboxylase. Compared with known enzymes with PPDC activity, including Azospirillum brasilense ipdC22, Lactococcus lactis KdcA23 and yeast ARO1024, the active site of P. somniferum PDC1 (PsPDC1) more closely resembles that of typical pyruvate decarboxylases25 (Fig. 2d). Yet, in SVM prediction models constructed as described in the Methods, PsPDC1 scores high for PPDC activity relative to other homologous sequences (Fig. 2e, Supplementary Fig. 3a and Supplementary Table 2). Two additional test candidates, PsPDC2 and a 2-hydroxyacyl-CoA ligase-like sequence, receive lower scores and exhibit lower levels of 4HPP decarboxylase activity than PsPDC1 (Fig. 2e,f and Supplementary Fig. 3).
In vivo screening with PsPDC1 reveals an alternative alkaloid route through 4HPP, a PPDC bypass distinct from the direct aromatic amino acid branch mediated by PsTyDC1 (Fig. 2f). Using PsPDC1 to convert tyrosine through the 4HPP- and 4HPAA-containing pathway raises norcoclaurine titers to the >10 μM range (Fig. 2f), compared with the 100-200 nM range obtained with PsTyDC1 (Fig. 2c and Supplementary Fig. 2).
Paired prediction of NMCH and CPR missing links extends the 4HPAA pathway
After constructing the 4HPAA pathway to norcoclaurine, P. somniferum CYP450 homologs of NMCH are next considered to extend this pathway to reticuline (Fig. 3). Currently, all yeast alkaloid production studies use the characterized Eschscholzia californica NMCH (EcNMCH) to convert NMC to 3-hydroxy-N-methylcoclaurine (3HNMC), a step in the conversion of norcoclaurine to reticuline1,15,26. Several promising P. somniferum CYP450 sequences have been annotated as NMCH based on gene expression analyses in plants27; however, these reports lack clear in vivo or in vitro characterization of the exact P. somniferum NMCH (PsNMCH) sequence28. Furthermore, the many additional CYP450 homologs in the P. somniferum genome complicate the selection of the best candidate sequence. To clarify the selection of this important missing link, an SVM model was trained using plant CYP80B sequences annotated as "N-methylcoclaurine hydroxylase" as positive examples. One hundred P. somniferum CYP450 sequences were then tested against this model to assist the selection of an optimal candidate (Fig. 3b and Supplementary Table 3). PsNMCH isoform 1 (PsNMCH-I1) scored high against the model and was selected.
A CPR redox partner for PsNMCH was also selected using the same workflow. Although a CPR sequence has been characterized from P. somniferum29, the referenced sequence AAC05021.1 is annotated as "NADPH:ferrihemoprotein oxidoreductase", which hinders its rapid identification as a CPR in database searches. Moreover, there are at least 8 other unique P. somniferum sequences with high CPR homology that have not been characterized. After these 8 additional P. somniferum candidates are tested against the CPR SVM model, XP_026404029.1 is selected as a high-scoring sequence (Fig. 3c and Supplementary Table 4) and is observed to exhibit CPR activity (Fig. 3d). This new CPR sequence is annotated as "NADPH--cytochrome P450 reductase-like", and accordingly it is referred to as PsCPR-L in this manuscript.
NMCH activity is evaluated by converting norcoclaurine to stable reticuline with NMCH and CPR variants expressed together with norcoclaurine 6-O-methyltransferase (6OMT), coclaurine N-methyltransferase (CNMT) and 3-hydroxy-N-methylcoclaurine 4-O-methyltransferase (4OMT) (Fig. 3d and Table 1). Because NMC accumulates far more than other intermediates in this system, reticuline titers should reflect activity at the NMCH bottleneck. In this system, PsNMCH-I1 affords more reticuline than EcNMCH when paired with either PsCPR-L or AtATR2 (Fig. 3d). PsNMCH-I1 pairs best with PsCPR-L from the same species, giving the highest reticuline titer. In contrast, reticuline production with EcNMCH is highest with AtATR2, and pairing with PsCPR-L brings no improvement.
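The pairing comparison can be reproduced directly from the reticuline titers listed in Table 1 for strains N1-01-DE3 to N2-04-DE3, where the P. somniferum CPR partner is labeled PsCPR. The helper names below are illustrative, and the ratios are descriptive only, since each entry is a single tabulated value.

```python
# Reticuline titers (μM, filtered TB medium fed norcoclaurine) from Table 1,
# strains N1-01-DE3 to N2-04-DE3, keyed by (NMCH variant, CPR partner).
TITERS_UM = {
    ("PsNMCH", "PsCPR"): 27.8,
    ("PsNMCH-H203Y", "PsCPR"): 15.8,
    ("PsNMCH", "AtATR2"): 15.9,
    ("PsNMCH-H203Y", "AtATR2"): 8.0,
    ("EcNMCH", "AtATR2"): 4.9,
    ("EcNMCH-Y202H", "AtATR2"): 8.4,
    ("EcNMCH", "PsCPR"): 3.7,
    ("EcNMCH-Y202H", "PsCPR"): 3.6,
}


def best_partner(nmch):
    """Return (CPR partner, titer) giving the highest reticuline titer for an NMCH."""
    pairs = {cpr: t for (n, cpr), t in TITERS_UM.items() if n == nmch}
    cpr = max(pairs, key=pairs.get)
    return cpr, pairs[cpr]


def fold_change(a, b):
    """Titer ratio a/b for two (NMCH, CPR) keys."""
    return TITERS_UM[a] / TITERS_UM[b]


for nmch in ("PsNMCH", "EcNMCH"):
    print(nmch, best_partner(nmch))
print(round(fold_change(("PsNMCH", "PsCPR"), ("EcNMCH", "AtATR2")), 1))
```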
The binding pockets of PsNMCH and EcNMCH differ by just one residue: PsNMCH-H203 versus EcNMCH-Y202 (Fig. 3a). SVM prediction of the PsNMCH-H203Y and EcNMCH-Y202H sequences gives lower and higher SVM scores, respectively (Supplementary Table 3 and Fig. 3c), indicating that the SVM model identifies this key residue as an important feature. Consistent with this prediction, transplanting the EcNMCH tyrosine into PsNMCH (PsNMCH-H203Y) lowers reticuline production, and transplanting the PsNMCH histidine into EcNMCH (EcNMCH-Y202H) increases conversion of norcoclaurine to reticuline when paired with AtATR2. The improvement in reticuline with EcNMCH-Y202H was replicated in a second independent test of the same strains (data not shown).
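One way to see how a model can single out a pocket residue is in silico mutagenesis of a linear classifier over one-hot (alignment column, residue) features: the score change caused by a point substitution is exactly the difference between the two residue weights. The weights, the column number and the function names below are invented for illustration; for a nonlinear kernel SVM, the mutated sequence would instead simply be re-scored, as was done here for PsNMCH-H203Y and EcNMCH-Y202H.

```python
# Hypothetical linear model over one-hot (alignment column, residue) features.
# Column 203 stands for the aligned pocket position (PsNMCH-H203 / EcNMCH-Y202);
# all weights are invented for illustration.
WEIGHTS = {(203, "H"): 0.8, (203, "Y"): -0.3, (150, "F"): 0.2}
BIAS = -0.4


def linear_score(features):
    """Decision value of the linear model for a list of (column, residue) features."""
    return BIAS + sum(WEIGHTS.get(f, 0.0) for f in features)


def swap_effect(column, old, new):
    """Score change when residue `old` at `column` is replaced by `new`."""
    return WEIGHTS.get((column, new), 0.0) - WEIGHTS.get((column, old), 0.0)


wild_type = [(150, "F"), (203, "H")]
mutant = [(150, "F"), (203, "Y")]
print(linear_score(wild_type), linear_score(mutant), swap_effect(203, "H", "Y"))
```

Because features unaffected by the substitution contribute identically to both scores, they cancel, which is why a single swapped residue can move a sequence across the decision boundary.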
Early in vivo tests of PsNMCH-I1 without a CPR redox partner in E. coli did not show detectable NMCH activity, but L-DOPA production from tyrosine was detected (Supplementary Fig. 2). This led us to hypothesize that PsNMCH-I1 might also have tyrosine 3-monooxygenase activity; however, the observed L-DOPA production is more likely mediated by native E. coli HpaBC. To further clarify this important missing link in P. somniferum, the candidate CYP450 monooxygenase sequences are also explored as potential tyrosine 3-monooxygenase templates (Supplementary Table 5). Here, the candidate sequences are tested against a plant CYP76AD SVM model and a combined SVM model trained with CYP76AD, CYP98A3 and CYP199A2 sequences that hydroxylate tyrosine and the structurally similar compound coumaric acid30-32. CYP98A2-like (XP_026403623.1), geraniol 8-hydroxylase-like (XP_026409442.1) and flavonoid 3',5'-hydroxylase 1-like (XP_026378021.1) sequences emerge as prime targets, with relatively high scores in the positive prediction space of both high-dimensional models (Supplementary Table 5). The discovery and testing of these tyrosine 3-monooxygenase candidate sequences, which are toxic and difficult to clone, are currently being completed in a parallel study.
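Selecting sequences that fall in the positive prediction space of both models amounts to intersecting two sets of positive scores. The sketch below ranks shared hits by the weaker of their two scores, a conservative choice assumed here for illustration; the candidate names and scores are hypothetical placeholders, not the values in Supplementary Table 5.

```python
def consensus_hits(scores_a, scores_b, threshold=0.0):
    """Return candidates scoring above `threshold` in both models.

    Hits are ranked by the smaller of their two scores, so a candidate must be
    convincing under both models to rank highly."""
    shared = set(scores_a) & set(scores_b)
    hits = [c for c in shared if scores_a[c] > threshold and scores_b[c] > threshold]
    return sorted(hits, key=lambda c: min(scores_a[c], scores_b[c]), reverse=True)


# Hypothetical decision scores from two models (placeholders, not real data).
single_model = {"cand_A": 1.2, "cand_B": 0.4, "cand_C": -0.3, "cand_D": 0.9}
combined_model = {"cand_A": 0.5, "cand_B": 0.8, "cand_C": 1.1, "cand_D": -0.2}
print(consensus_hits(single_model, combined_model))
```

Ranking by the minimum score penalizes candidates that only one model favors; ranking by the mean would be a less conservative alternative.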
Expansion of PsTyDC1 and PsPDC1 routes into dual pathways
Docking of L-DOPA to PsTyDC1 (Fig. 2a) indicates stronger binding affinity than for tyrosine, suggesting that PsTyDC1 may also possess DHPAAS activity. Accordingly, co-expression of PsTyDC1 with PsNMCH-I1, 6OMT, CNMT and 4OMT results in a plant-gene-only dual pathway through 4HPAA and DHPAA to norcoclaurine and reticuline (Supplementary Fig. 2). The DHPAAS activity of PsTyDC1 is confirmed in other strains and by isotope tracing from L-DOPA-d3 to downstream deuterium-labeled BIAs (Table 1). Based on these findings, the potential DHPAAS activity of PsTyDC1 is further explored to combine the norcoclaurine and THP pathways (Fig. 4a).
After incorporating Pseudomonas putida L-DOPA decarboxylase (PpDDC) for in vivo dopamine production and optimizing cultivation in Terrific Broth (TB), PsPDC1- and PsTyDC1-containing strains produce reticuline from L-DOPA via the DHPAA-containing pathway, with titers reaching the μM range (Fig. 4b). This result indicates that PsPDC1 can also produce DHPAA from 3,4-dihydroxyphenylpyruvic acid (DHPP) supplied by L-DOPA transamination. Previously, a single strain containing DHPAAS, 6OMT, CNMT and 4OMT produced only 0.2 μM reticuline from L-DOPA7. Under optimized conditions, PsPDC1 outperforms PsTyDC1 for up to 44 hours. However, 61 hours after addition of the L-DOPA substrate, PsPDC1-mediated reticuline titers decline slightly, likely owing to oxidative degradation, and PsTyDC1 gives a titer similar to that of PsPDC1. PsPDC1 works synergistically with PsTyDC1 at later production times to maintain higher reticuline titers. Accordingly, combinations of PPDC and AAS are next explored to improve BIA titers.
Switching missing link templates to homologous microbial sequences for improved production
The natural plant enzymes PsTyDC1 and PsPDC1 exhibit the desired specificities, but their expression and activity in E. coli are sensitive, which limits titers. Therefore, to better exploit these missing link functions, analogous microbial sequences are explored as enzyme engineering templates (Fig. 5).
AAS activity analogous to that of PsTyDC1 can be engineered into the bacterial PpDDC template by transplanting the Bombyx mori DHPAAS (BmDHPAAS) catalytic residues F79, Y80 and N181. The rationally engineered PpDDC-Y79F-F80Y-H181N mediates improved THP production in E. coli (Fig. 5a). Switching from PsPDC1 to a Saccharomyces cerevisiae ARO10 template improves in vivo PPDC activity towards both DHPP (Fig. 5b) and 4HPP (Fig. 6) relative to the corresponding PsPDC1-containing strains. However, the high activity of ARO10 may come at the cost of specificity, as ARO10 expression also yields additional alkaloids derived from aromatic keto acids (Fig. 5d).
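Residue transplants such as Y79F-F80Y-H181N follow the usual protein substitution notation (wild-type residue, 1-based position, new residue). The hypothetical helper below applies such a specification to a sequence and stops if a stated wild-type residue does not match, a cheap sanity check before building constructs. The toy sequence is made up; the real PpDDC and PsTyDC1 sequences are not reproduced here.

```python
import re

# Wild-type residue, 1-based position, substituted residue (e.g. "H181N").
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(seq, spec):
    """Apply hyphen-separated substitutions such as 'Y79F-F80Y-H181N'.

    Raises ValueError if a token is malformed, out of range, or if the stated
    wild-type residue does not match the sequence."""
    residues = list(seq)
    for token in spec.split("-"):
        match = SUBSTITUTION.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        wild_type, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{token}: position {pos} outside sequence")
        if residues[pos - 1] != wild_type:
            raise ValueError(f"{token}: expected {wild_type} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


toy = "MAYFGHKLPN"  # made-up 10-residue sequence
print(apply_substitutions(toy, "Y3F-F4Y-H6N"))  # MAFYGNKLPN
```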
Combinations of natural and analogous enzyme templates result in improved E. coli BIA production (Fig. 6a and Table 1). Expression of PpDDC-Y79F-F80Y-H181N together with PsPDC1 in strain P1-07-AI selectively promotes the DHPAA pathway in the presence of tyrosine and L-DOPA, producing 20.3 mg/L reticuline, whereas ARO10 in strain A1-01-DE3 selectively favors the 4HPAA pathway in the presence of tyrosine and dopamine, producing 96.7 mg/L norcoclaurine and 71.8 mg/L NMC. A dual pathway from tyrosine and dopamine to 33.6 mg/L NMC and 24.6 mg/L reticuline is promoted by combining PpDDC-Y79F-F80Y-H181N, ARO10 and PsTyDC1 in strain A1-06-AI.
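Table 1 reports titers in μM, while the text reports mg/L; the two are related by the molar mass (μmol/L × g/mol = μg/L). The sketch below assumes the neutral molecular formulas C16H17NO3 (norcoclaurine), C18H21NO3 (NMC) and C19H23NO4 (reticuline) with standard average atomic masses. Under these assumptions, 61.7 μM reticuline converts to about 20.3 mg/L, the other reported mg/L values agree with the tabulated concentrations to within roughly 0.1 mg/L, and 71.8 mg/L NMC corresponds to about 240 μM.

```python
# Standard average atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Assumed neutral molecular formulas.
FORMULAS = {
    "norcoclaurine": {"C": 16, "H": 17, "N": 1, "O": 3},
    "N-methylcoclaurine": {"C": 18, "H": 21, "N": 1, "O": 3},
    "reticuline": {"C": 19, "H": 23, "N": 1, "O": 4},
}


def molar_mass(formula):
    """Molar mass in g/mol from an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def um_to_mg_per_l(conc_um, compound):
    """Convert μM to mg/L: μmol/L × g/mol gives μg/L, divided by 1000 for mg/L."""
    return conc_um * molar_mass(FORMULAS[compound]) / 1000.0


# Table 1 concentrations (μM) for the strains named in the text.
for strain, conc, compound in [
    ("P1-07-AI", 61.7, "reticuline"),
    ("A1-01-DE3", 356, "norcoclaurine"),
    ("A1-06-AI", 74.9, "reticuline"),
    ("A1-06-AI", 112, "N-methylcoclaurine"),
]:
    print(f"{strain}: {conc} μM {compound} ≈ {um_to_mg_per_l(conc, compound):.1f} mg/L")
```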
Dynamic metabolomic profiling of AAS and PPDC branch pathways
Isotope tracing enables analysis of flux through bioproduction pathways5,33,34. Although multiple reaction monitoring (MRM) with LC-MS is sensitive, it does not readily detect isotope-labeled intermediates. Once BIA titers were improved to μM levels suitable for quantification by high-resolution CE-MS, isotope tracing experiments could be performed. Combinations of PsPDC1, ARO10, PsTyDC1 and PpDDC produce various labeling patterns: tyrosine-13C to BIA-13C2, L-DOPA-d3 with tyrosine-d4 to d6-labeled BIA, L-DOPA-d3 to d5-labeled BIA, L-DOPA-d3 with dopamine-d2 to d5-labeled BIA, tyrosine-d4 with dopamine-d2 to d6-labeled BIA, and tyrosine-d4 with dopamine to d4-labeled BIA (Fig. 6 and Supplementary Fig. 4). The loss of a ring deuterium atom during NCS-mediated condensation of aryl acetaldehydes with ring-labeled dopamine is consistent with the reported NCS mechanism (Fig. 6b and Supplementary Fig. 4d)35,36; this kind of mechanism-directed deuterium labeling pattern has not previously been reported for BIA tracing37-39. Isotope tracing from L-DOPA-d3 to d5-labeled BIA supports the bifunctional decarboxylase and oxidative deamination activities of PpDDC-Y79F-F80Y-H181N (Supplementary Fig. 4d). The improved production of NMC-d6 and reticuline-d5 when PsTyDC1 is added to PsPDC1 again demonstrates the synergy between these distinct aryl acetaldehyde-producing enzymes (Fig. 6b). Moreover, the amounts of NMC-d6 and reticuline-d5 relative to their respective precursors norcoclaurine-d6 and THP-d5 (Supplementary Fig. 4b,c) reveal a bottleneck in the S-adenosylmethionine (SAM)-dependent methylation of deuterium-labeled BIAs. Furthermore, isotope tracing from tyrosine-13C supports the conversion of isotope-labeled 4HPP to downstream BIAs by PsPDC1 and ARO10 (Supplementary Fig. 4a).
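The labeling patterns listed above can be checked with simple deuterium bookkeeping: the norcoclaurine or THP scaffold keeps the label of the aryl acetaldehyde half plus that of the dopamine half, minus one deuterium when the dopamine ring is labeled, because the ring position that forms the new C-C bond during NCS-mediated cyclization loses its hydrogen. This sketch assumes that tyrosine-d4 and L-DOPA-d3 carry ring labels that survive conversion to the aldehyde and to dopamine, that the dopamine-d2 label is not on the ring (the assumption under which this rule reproduces the reported d6 and d5 patterns), and which precursor supplies which half, as annotated in the comments. The function name is illustrative.

```python
def bia_deuterium(aldehyde_d, amine_d, amine_ring_labeled):
    """Expected deuterium count of the NCS product scaffold.

    aldehyde_d: deuterium atoms carried by the aryl acetaldehyde (4HPAA or DHPAA).
    amine_d: deuterium atoms carried by the dopamine half.
    amine_ring_labeled: True if the dopamine label sits on the aromatic ring, in
    which case one ring deuterium is lost at the cyclization position."""
    lost = 1 if (amine_ring_labeled and amine_d > 0) else 0
    return aldehyde_d + amine_d - lost


# (feeding, aldehyde D, dopamine D, dopamine ring-labeled?) under the stated assumptions
CASES = [
    ("tyrosine-d4 + dopamine", 4, 0, False),
    ("tyrosine-d4 + dopamine-d2", 4, 2, False),  # assumes dopamine-d2 is not ring-labeled
    ("L-DOPA-d3", 3, 3, True),                   # DHPAA-d3 + dopamine-d3, both from L-DOPA-d3
    ("L-DOPA-d3 + dopamine-d2", 3, 2, False),    # dopamine half from supplied dopamine-d2
    ("L-DOPA-d3 + tyrosine-d4", 4, 3, True),     # 4HPAA-d4 + dopamine-d3 from L-DOPA-d3
]
for feeding, ald, amine, ring in CASES:
    print(f"{feeding}: d{bia_deuterium(ald, amine, ring)}")
```

N-methylation with unlabeled SAM leaves the count unchanged, so the same numbers would apply to NMC formed from these scaffolds.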
Monitoring the time-course turnover of isotope-labeled metabolite fractions enables direct observation of metabolic flux5,33,34. Mixed fractions of unlabeled and labeled BIAs could be clearly quantified in the case of high-titer production of d4-labeled norcoclaurine and NMC (Fig. 6c). In this case, a higher fraction of d4-labeled norcoclaurine relative to d4-labeled NMC is consistent with the SAM-dependent methyltransferase bottleneck observed previously1,7.
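The labeled fraction tracked over a time course is the labeled signal divided by the summed labeled and unlabeled signal for each metabolite. The sketch below computes that fraction for invented peak areas of norcoclaurine-d4 and NMC-d4 at two hypothetical sampling times; a labeled precursor fraction that stays above that of its methylated product mirrors the comparison used above to infer the methyltransferase bottleneck. Function names and numbers are illustrative only.

```python
def labeled_fraction(labeled, unlabeled):
    """Fraction of the total signal carried by the labeled isotopologue."""
    total = labeled + unlabeled
    if total <= 0:
        raise ValueError("no signal for this metabolite")
    return labeled / total


def fraction_series(timecourse):
    """Map [(time_h, labeled, unlabeled), ...] to [(time_h, fraction), ...]."""
    return [(t, labeled_fraction(lab, unlab)) for t, lab, unlab in timecourse]


# Invented peak areas at two hypothetical sampling times.
norcoclaurine = [(12, 30.0, 10.0), (24, 60.0, 15.0)]
nmc = [(12, 5.0, 5.0), (24, 20.0, 10.0)]
for (t, f_nc), (_, f_nmc) in zip(fraction_series(norcoclaurine), fraction_series(nmc)):
    print(f"{t} h: norcoclaurine-d4 {f_nc:.2f}, NMC-d4 {f_nmc:.2f}")
```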