Structure of the PCP2-C3 didomain. To elucidate the structure of a C domain with a PCP domain bound in the acceptor site, we screened several systems, including a PCP2-C3 didomain (containing the PCP domain from module 2 and the C domain from module 3) of the fuscachelin NRPS from the thermophilic organism Thermobifida fusca (Fig. 1A [red rectangle]).30 Expression of the fuscachelin PCP2-C3 didomain in E. coli yielded 0.8 mg/L of culture of stable protein and afforded crystals that grew rapidly in 18–22% w/v PEG 3350 and 0.17–0.3 M magnesium formate at room temperature. Crystals were harvested and cryoprotected in 20–30% glycerol, and diffraction data were collected at the Australian Synchrotron, with initial phases obtained from a single-wavelength anomalous diffraction (SAD) experiment using xenon-derivatised crystals (see Methods). The crystals belonged to the P2₁2₁2₁ space group, with the asymmetric unit comprising two highly similar copies of the PCP2-C3 construct (RMSD (all atoms) 0.74 Å).
The PCP2-C3 didomain structure we obtained from these experiments was solved at a resolution of 2.2 Å (Fig. 2A, SI Table S1). When considered separately, the overall folds of both the PCP2 domain and C3 domain were consistent with previously reported structures.26 The PCP2 domain comprises a 4-helix bundle with a small α-turn between helices 1 and 2 (seen in most crystal structures but absent from NMR structures); the serine residue that is the site of 4’-phosphopantetheine (PPant) attachment is located at the start of helix 2 (Fig. 2B). Of the published crystal structures of PCP domains, this structure is most similar to the PCP domain found in the PCP-Te/R didomain NRPS construct from the archaeon Methanobrevibacter ruminantium M1 (PDB ID 6VTJ; RMSD (all atoms) 1.3 Å, 32% sequence similarity, see SI Table S2). The C3 domain of the didomain resembles other members of its class (see SI Table S3), comprising a pseudo-dimer of CAT domains with bridge (R2923 to T2944) and floor loop (A2843 to L2858) regions (Figs. 1C & 2C). The catalytic residues sit at the core of the C3 domain and can be accessed from the bulk solvent via tunnels formed along the interface of the two pseudo-domains (Fig. 2D). Differences in the relative position of these two halves are observed in structures of C domain homologs and can alter the size and character of the acceptor and donor catalytic tunnels.3 A superimposition of the fuscachelin C3 domain with two well-characterized C domains (from surfactin and linear gramicidin NRPSs)7,12 highlights this, with a pronounced difference in displacements observed when comparing the fuscachelin C3 domain and Srf-A domain (SI Figure S1). This aspect of C domain conformational flexibility and diversity is not yet well understood, although recent efforts have been made to rationalize these conformational differences in terms of the accessibility of the substrates to the active site of the C domain.9
In the PCP2-C3 didomain structure, the PCP2 domain sits at the acceptor-PCP binding site (near the opening of the acceptor substrate channel) on the C3 domain from the second chain in the asymmetric unit. The interface between the PCP2 domain and C3 domain is mostly hydrophobic in nature (537/510 Å² buried surface area (chain A/B) excluding PPant), with the side chains of V2534, L2515, L2518, F2508 and F2538 of the PCP domain playing a major role in the interaction along with residues A2907, V2908, V2584, L2580 and W2579 in the C domain (Fig. 2E, SI Tables S4-5). This interface is reminiscent of the hydrophobic interaction pattern described in other structures of PCP domains docked at the acceptor site of C domains (SrfA-C (PDB ID 2VSQ),12 AB3403 (PDB ID 4ZXH);10 see also 26). These interfaces center around a hydrophobic residue (L2515) immediately following the serine to which the PPant is attached (S2514) and at least one hydrophobic residue ~20 amino acids after the serine residue. In the PCP2-C3 structure, the latter residues are V2534 and the aliphatic moiety of R2535 that interacts with V2908 of the C domain. R2906 also plays an important role in positioning the docked PCP domain via interactions with the phosphate moiety of the PPant arm. The overall orientation of the PCP domain relative to the C domain is similar to what has been observed in the structures of SrfA-C 12 and ObiF1 (PDB ID 6N8E)8 (SI Figure S2A-B), whilst other structures contain a PCP domain that is rotated by several degrees around the conserved serine (AB3403,10 LgrA (PDB ID 6MFZ)7) (SI Figure S2C-D). Although the overall orientation of these PCP domains in relation to the C domain differs, it is important to note that the position of the PPant-modified serine (located at the beginning of the second helix) is always maintained at the entrance of the acceptor substrate channel of the C domain.
Since the PCP2 domain precedes the C3 domain in the fuscachelin NRPS, we had expected that the PCP2 domain would dock at the donor-PCP binding site of the C3 domain. We were surprised, therefore, to find that this construct crystallized with the PCP2 domain docked into the acceptor-PCP binding site of a symmetry-related C3 domain (Fig. 2A). Given that the PCP2 and PCP3 domains of the fuscachelin NRPS are highly similar (65% sequence identity, Fig. 3), and that PCP domains can act as both aminoacyl donors and acceptors for C domains, we rationalized that the arrangement observed in our structure is a valid model of an acceptor-PCP-bound C domain. Indeed, when we determined the structure of the isolated PCP3 domain, we found its structure to be highly similar to that of the PCP2 domain (RMSD (all atoms) 2 Å; Fig. 3A-C). Importantly, the residues at the interface with the C domain are conserved or highly similar (Fig. 3D). Furthermore, computational docking of the PCP3 domain onto the acceptor-PCP binding site of the C3 domain showed that it binds in an almost identical orientation to the PCP2 domain in the structure of the PCP2-C3 didomain (SI Figure S4). This supports the notion that the PCP2-C3 didomain structure is a valid representation of an acceptor-PCP-bound C domain.
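The all-atom RMSD values quoted in this section (for example, 2 Å between the PCP3 and PCP2 domains) follow the usual definition: the square root of the mean squared distance between matched atom pairs. A minimal Python sketch of that final step is given below. It assumes that the two coordinate sets have already been superposed and atom-matched by a structure-alignment tool; the function name and the toy coordinates are illustrative and are not taken from the structures described here.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (same units as the input) between two lists of (x, y, z) tuples.

    Assumption: both lists are already superposed and list the same atoms
    in the same order; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example (hypothetical coordinates in Å): every atom displaced by 1 Å in z.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(x, y, z + 1.0) for x, y, z in ref]
print(f"RMSD = {rmsd(ref, shifted):.2f} Å")
```

Because no superposition is performed, the value returned depends on how the two structures were aligned beforehand.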
Analysis of the PCP2-C3 didomain structure revealed extra density extending from the conserved Ser (S2514) at the beginning of helix 2 of the PCP domain. This serine residue is the target of phosphopantetheinyl transferases, a class of enzymes that attach the essential PPant moiety to PCP domains. Mass spectrometric analysis of the PCP2-C3 didomain construct revealed a 340 Da mass increase, consistent with attachment of PPant to S2514, likely installed by the phosphopantetheinyl transferase EntD, which phosphopantetheinylates some PCP domains when they are expressed in E. coli. Indeed, expression of the PCP2-C3 didomain construct in an entD mutant31 showed no increase in mass, supporting this hypothesis. Having confirmed the presence of a PPant arm, we modeled this into the electron density observed in our structure. Interestingly, we found that the PPant arm did not extend into the active site of the C domain, but instead curled back towards the outer surface of the C domain (Fig. 4A). The side chain of R2577 appears to block the channel that leads to the active site of the C domain (Fig. 4A). Molecular dynamics simulations initiated from structures of the C3 domain (with the PCP-PPant removed) highlight the intrinsically dynamic nature of the acceptor substrate channel and the important role that R2577 has in modulating its shape and size (SI Figure S5). This residue forms the bottleneck of the channel and samples alternate rotamers (primarily rotation around chi-3) that, in concert with a displacement of alpha-helix 1, largely determine its size. When we compared our PCP2-C3 didomain structure with published structures of other C domains in complex with a PPant-modified PCP domain, we found residues with shorter side chains at this position (G21 in AB3403 10 and A18 in ObiF1 8), resulting in channels that do not block PPant access.
Interestingly, this Arg residue appears largely conserved in LCL [1] domains (73% harbor an Arg at this position), but is not seen in DCL domains, where Gly (80%) or Ala (4%) is found instead (SI Figure S6). Whilst it was unclear what role this residue plays in NRPS function, we hypothesized that it could influence access to the acceptor channel of the C domain.
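The 340 Da mass increase attributed to PPant attachment can be checked from elemental composition. The sketch below assumes the standard chemistry of phosphopantetheinylation, in which 4’-phosphopantetheine (C11H23N2O7PS) is joined to the serine hydroxyl with formal loss of one water, so that the net addition is C11H21N2O6PS. The dictionary and function names are illustrative, and the atomic masses are rounded average values.

```python
# Average atomic masses (Da), rounded standard values.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007,
               "O": 15.999, "P": 30.974, "S": 32.06}

# Net composition added to the serine side chain:
# 4'-phosphopantetheine (C11H23N2O7PS) minus H2O lost on phosphoester formation.
PPANT_ADDUCT = {"C": 11, "H": 21, "N": 2, "O": 6, "P": 1, "S": 1}

def average_mass(composition):
    """Sum of average atomic masses for an {element: count} composition."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())

ppant_shift = average_mass(PPANT_ADDUCT)
print(f"Expected PPant mass shift: {ppant_shift:.1f} Da")  # about 340.3 Da
```

The computed shift of about 340.3 Da matches the increase observed by intact-protein mass spectrometry within the precision quoted in the text.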
Effect of R2577G mutation on substrate position. To verify the role of R2577 in controlling access to the catalytic channel, we generated the Arg to Gly mutant (R2577G) of the C3 domain. To control the modification state of the PCP2 domain, the mutant PCP2-C3 didomain construct was expressed in the entD mutant of E. coli.31 After purification, the protein was modified using the promiscuous PPant transferase Sfp R4-4 mutant32 and coenzyme A (CoA) (see Methods) to ensure homogeneous PPant loading. As with the wild-type construct, the protein expressed well and crystallized in the same conditions. Crystals diffracted to 2 Å and the structure was phased using molecular replacement with the previous model (SI Table S1). The structure of the R2577G mutant is very similar to that of the wild-type protein, with the PCP2 domain sitting at the acceptor site of the C3 domain (RMSD (all atoms) 1.2 Å compared to wild type). The first noticeable difference is a small rotation of the PCP domain in relation to the C domain and slight alterations in the PCP-interacting regions of the C domain, likely attributable to the R2577G mutation allowing the first helix of the C domain to sit deeper in the acceptor channel (SI Figure S7).5 The major difference, however, is the positioning of the PPant moiety, which now fully extends through the acceptor channel into the active site (Fig. 4B, SI Figure S8) in a similar way to that seen in the ObiF1, SrfA-C and AB3403 structures.8,10,12 This observation supports the hypothesis that R2577 acts to control substrate access to the active site of the C domain. One possibility is that this process operates by charge repulsion: when an aminoacyl-PPant approaches the acceptor channel, the ammonium group of the substrate repels the Arg side chain and triggers its rotation, which opens the channel and allows the aminoacyl-PPant to enter it.
This would explain our inability to crystallize the wild-type PCP2-C3 construct loaded with PPant derivatives lacking an amino group (such as propionyl and propan-1,3-dioyl,33 data not shown), presumably because interactions that arise when the substrate is not bound in the acceptor channel of the C domain interfere with crystallization. To further explore this mechanism, we next turned to the characterization of the PCP2-C3 construct with an aminoacyl group appended to the PPant thiol group.
Structure with the amino acid acceptor substrate bound. To append the glycyl substrate of module 3 to the PCP2 domain, we attempted to load the apo-PCP2-C3 didomain using Sfp and the CoA thioester of glycine. Crystals in the same space group were readily obtained using the same method as for the two previously described structures. Somewhat surprisingly, in this structure the electron density corresponding to the PPant did not sit in the acceptor channel but rather followed the same path as the substrate-free PPant, appearing to be repelled by R2577. However, upon refinement it became clear that the glycyl thioester had been hydrolyzed during crystallization. This forced us to explore alternatives to thioester-tethered amino acids, and we chose to use an analog of the aminoacyl-CoA with a thioether in place of the reactive thioester. This results in a non-hydrolyzable substrate analog that is still tethered to the PPant via a C-S bond and has a very similar structure to the real substrate (SI Figure S9), circumventing issues encountered with other stabilization strategies.34 To obtain crystals of the PCP2-C3 construct with this substrate analog (hereafter referred to as Glystab) bound, we again used Sfp to attach PPant-Glystab to the PCP domain. This construct was then crystallized as previously, resulting in diffraction to a resolution of 1.9 Å (SI Table S1).
The overall structure of the Glystab-loaded PCP2-C3 construct was highly similar to the holo-PCP2-C3 construct (572/532 Å² buried surface area (chain A/B) excluding PPant). In the Glystab structure, however, the density for the PPant extends through the acceptor channel of the C domain into the active site, as observed in the structure of the R2577G mutant (Fig. 4C-D). R2577 now forms specific interactions with two of the carbonyl oxygen atoms in the PPant arm (3.7 Å and 3.8 Å), likely acting as a ratchet to hold the PPant arm (and substrate) in the correct position until after peptide bond formation has occurred (SI Figure S10). The PPant-Glystab extends completely into the active site (Fig. 5A), with the terminal amine of Glystab stabilized by hydrogen-bond interactions (Fig. 5B). Of particular interest, given the lack of clarity over the role of the active site histidine in the HHxxxDE motif, is its close proximity (3.6 Å) to the amino group of the Glystab moiety. An ordered water molecule also sits close (2.9 Å) to this amino group, where it likely forms a hydrogen bond. By modelling the reaction with density functional theory (Fig. 5C, see Discussion), we predict that N-C bond formation likely precedes N deprotonation and that a distinct zwitterionic (oxyanion/ammonium) intermediate is formed (Fig. 5D). This is reminiscent of amide bond formation via reaction of an ester or anhydride with an amine in solution, which is known to occur via a similar mechanism. A significant energy barrier is observed for proton transfer from the zwitterionic intermediate to the imidazole group of the active site histidine residue, suggesting that the mechanism of peptide bond formation in C domains relies on specific base catalysis. This may explain why the mutation of this central histidine residue does not completely abolish activity in some C domains, as an active site water molecule could instead play the role of an alternate specific base.
The calculations show that the formation of at least one hydrogen bond to the oxyanion is key to stabilizing the zwitterionic intermediate. We also observed the close interaction of the atypical Glu residue in the HHxxxDE motif (a Gly in most C domains) with the nitrogen atom of Glystab (2.6 Å). It is important to note that Glystab sits in a different position to the aminoacyl mimic in a previous model of a C domain bound to the acceptor substrate; in those structures the aminoacyl mimic does not enter the active site as far as observed in our Glystab-PCP2-C3 complex.11
Exploring C domain activity and specificity of the PCP2-C3 construct. To test the activity and selectivity of the C domain, as well as the effect of mutating key residues, we first needed to generate an activity assay for the C domain using the PCP2-C3 construct and downstream PCP3 domain. Given that the interaction between PCP and C domains is weak and transient in nature,26 we first tested the importance of tethering the domains in an assay using separately isolated PCP2-C3 (loaded with a synthetic dihydroxybenzoic acid (DHB)-D-Arg-Gly donor substrate) and PCP3-Gly constructs. This experiment revealed no elongation when these constructs were incubated together. Thus, we turned to the use of a fused PCP2-C3-PCP3 construct, albeit one in which the PCP-containing constructs could be separately loaded with substrates prior to generation of the fused complex (Fig. 6A). To accomplish this, we cloned the donor PCP2-C3 construct with a C-terminal SpyCatcher domain and the acceptor PCP3 with an N-terminal SpyTag peptide.35 This system allows for the separate loading of the substrates on the PCP domain of each construct using Sfp and synthetic CoA substrates whilst also allowing the reconstitution of the NRPS assembly line.
Using this experimental setup, we confirmed that the condensation reaction was performed as expected, with high levels of conversion of the canonical donor DHB-D-Arg-Gly tripeptide into the Gly-extended tetrapeptide, as determined by high-resolution LC-MS/MS experiments (Fig. 6B). Next, we tested a simplified benzoic acid (BA)-D-Arg-Gly donor substrate in these assays, which showed acceptable levels of conversion (61%), and hence we retained this simplified substrate for all subsequent assays. With a functional condensation assay in hand, we could first verify that the stabilized Glystab acceptor substrate was a functional mimic of Gly in this C domain (SI Figure S11) using intact protein MS together with PPant ejection (see Methods). With confidence that the Glystab structure represents a functional acceptor substrate-bound C domain state, we then set out to investigate the effect that mutating key residues had on the condensation activity. Firstly, we confirmed that the R2577G mutant C domain retained activity (with Gly), although this was reduced compared to the wild-type C domain (32%), possibly due to the loss of stabilizing interactions with the PPant arm (Fig. 6C, SI Figure S10). We next generated an active site H2697Q mutant and determined that H2697 is indeed essential for activity with this C domain, as the mutant only retains ~1% of the WT activity with Gly as the acceptor substrate (Fig. 6C).
In addition to Gly and Glystab, we found that C3 could also accept PPant-linked L-Ala and L-Leu as substrates, with 99% and 75% conversion levels, respectively (Fig. 6C). In contrast, PPant-linked L-Phe was a poor substrate, with minimal (6%) levels of conversion. In order to rationalize these differences, we analyzed the structures and performed molecular docking. First, assuming that the position of Glystab in our Glystab-PCP2-C3 complex represents the catalytically competent conformation, and that alternate amino acid acceptor substrates must bind in a way that places the terminal amine group in a similar position, we identified several residues in the central cavity that would likely interact with the side chain of an alternate acceptor substrate. In particular, the side chains of M2917, S2919, Q2921, P2941 and E2950 could contribute to a putative side chain binding pocket for this C domain, in a manner reminiscent of A domains (Fig. 5B). Molecular docking revealed that the side chains of L-Ala and L-Leu could be accommodated by the side chain binding pocket of the active site cavity and had top-scoring poses that positioned the terminal amine towards the catalytic residues (although the L-Leu pose was slightly strained, SI Figure S12). In contrast, the bulky side chain of L-Phe could only be accommodated within the central cavity in poses that positioned the terminal amine away from the catalytic histidine and that would not be compatible with catalysis (SI Figure S12).
Analysis of the putative pocket residues (M2917, S2919, Q2921, P2941 and E2950) compared to the reported activity of the downstream A domain did not reveal any correlation between acceptor substrate and these possible “pocket” residues (Spearman's rho: -0.05), indicating the lack of a C domain side chain binding pocket and hence of a “C domain code” comparable to those found with A domains (SI Figure S13).25,36 Our results do, however, indicate that alterations in the C domain active site can lead to changes in selectivity, and hence we turned to further analysis of the residues within the active site motif.
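For readers unfamiliar with the statistic quoted above, Spearman's rho is the Pearson correlation computed on ranks, with tied values sharing their average rank; values near zero, such as -0.05, indicate no monotonic relationship. The Python sketch below illustrates the calculation only. It assumes that both variables have already been reduced to numeric descriptors; the encoding used for the actual analysis is not reproduced here, and the function names and toy values are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical descriptors: a pocket-residue score and a substrate size score.
pocket_score = [1.0, 2.0, 3.0, 4.0, 5.0]
substrate_size = [10.0, 20.0, 30.0, 40.0, 50.0]
print(spearman_rho(pocket_score, substrate_size))
```

The toy data are perfectly monotonic, so the result is close to 1; reversing one list gives a value close to -1.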
Although most C domains contain a canonical HHxxxDG motif,37 the C3 domain from the fuscachelin NRPS features an unusual HHxxxDE variant. We hypothesized that, in the absence of a side chain in the acceptor substrate (Gly) to position it, the role of this glutamate (E2702) could be to stabilize and orient the acceptor substrate amine group to ensure an efficient nucleophilic attack on the donor substrate thioester. Indeed, an analysis of C domains whose acceptor substrates are known demonstrated that modified motifs are found in a higher proportion where the acceptor substrate is small than are traditional HHxxxDG motifs (SI Figure S14). To test this hypothesis, we mutated this glutamate to the canonical glycine residue (E2702G) and performed condensation reactions with Gly as the acceptor substrate. As expected, the condensation level with Gly as the acceptor substrate was reduced by almost half (from 61% to 35%) when compared to the WT, demonstrating the non-essential, although beneficial, role of this glutamate residue. Interestingly, while the E2702G mutation reduced activity with the Gly acceptor substrate, this substitution improved the activity for PPant-linked L-Leu from 75% to 92% (Fig. 6C). This result indicates that the E2702 residue can play a particularly important role in supporting condensation reactions involving Gly as an acceptor substrate, but may be detrimental for other acceptor substrates. Molecular docking of Gly-PPant and Glystab-PPant into a model of the E2702G C3 mutant reveals how the removal of the glutamic acid results in substrate poses that are unlikely to be compatible with catalysis, with the terminal amine of the substrates instead interacting with E2950 (SI Figure S15).
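The opposing effects of the E2702G substitution on the two acceptor substrates are easiest to compare as relative changes in conversion. The short Python sketch below uses only the conversion levels quoted above (61% to 35% for Gly, 75% to 92% for L-Leu); the function name is illustrative.

```python
def relative_change(before, after):
    """Fractional change of `after` relative to `before` (negative = loss)."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before

# Conversion levels (%) for WT versus E2702G, as quoted in the text.
gly_change = relative_change(61, 35)   # Gly acceptor
leu_change = relative_change(75, 92)   # L-Leu acceptor

print(f"Gly:   {gly_change:+.0%}")   # -43%
print(f"L-Leu: {leu_change:+.0%}")   # +23%
```

On this scale the mutation costs roughly 43% of the activity with Gly while increasing conversion of L-Leu by roughly 23%.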
[1] Superscript indicates the stereochemistry of the C-terminal residue of the donor substrate, subscript indicates the stereochemistry of the acceptor substrate