Identification of kintamdin 1 from Streptomyces sp. RK44
Streptomyces sp. RK44 is a new Streptomyces strain isolated from a soil sample collected at the Kintampo waterfall in the Bono East Region of Ghana in 2014. Metabolite profiling of the strain under laboratory culture conditions revealed a high-molecular-weight metabolite (2507 Da) along with other metabolites8. Large-scale fermentation (10 L) followed by chemical workup yielded pure compound 1 (3 mg).
Analysis by high-resolution electrospray ionization mass spectrometry (HR-ESIMS) gave a [M + 3H]3+ ion at m/z 836.7524, indicating a neutral monoisotopic mass of 2507.2349 Da (Supplementary Fig. 1). The 1H NMR spectrum showed numerous overlapping amino acid α-proton signals (δH 3.4–4.6 ppm), suggesting that 1 is a peptidic natural product (Supplementary Fig. 2). 1 also displays a pair of unusual olefinic proton signals (δH 5.3 and 6.85 ppm, J = 8.4 Hz), suggesting a cis-configured double bond (Fig. 1C and Supplementary Fig. 2). Interpretation of 1D and 2D NMR spectra enabled identification of the major fragment (19 residues) of the peptide (Supplementary Figs. 3–7). Along with various proteinogenic amino acids, modified residues such as dehydroalanine (Dha) and dehydrobutyrine (Dhb) were found in 1 (Supplementary Figures S8-11). The unit containing the cis double bond was identified as an unsaturated β-amino acid, (Z)-3-aminoacrylic acid (Aaa), which, to the best of our knowledge, has not previously been reported in any RiPP (Supplementary Figures S12-14). The sequence tag was further confirmed by de novo interpretation of tandem MS (MSn) spectra, in which sequential fragment ions were assigned (Supplementary Fig. 15 and Supplementary Table 1). All observed mass shifts could be assigned to proteinogenic amino acids except those of 69 and 83 Da, which were assigned to the non-proteinogenic residues Dha and Dhb, respectively. Note that Aaa (ΔZβAla), presumably a rearranged product of dehydrated serine, is isobaric with Dha and therefore cannot be distinguished from it by MS. Collectively, the NMR and MS data allowed us to connect the major fragments of the sequence of 1, while the N- and C-terminal sequences remained undetermined.
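The mass arithmetic in the paragraph above can be reproduced with a short Python sketch. It converts the [M + 3H]3+ ion into a neutral monoisotopic mass. It then matches fragment-ion mass differences against residue masses, which shows why a 69 Da shift cannot distinguish Dha from Aaa. The atomic masses are standard monoisotopic values. The residue table is a small illustrative subset. The function names and the 0.01 Da matching tolerance are our own assumptions and do not describe the software used in this study.

```python
# Monoisotopic masses (Da) of the lightest stable isotopes, and of a proton.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.00727646688


def neutral_mass(mz: float, z: int) -> float:
    """Neutral monoisotopic mass M from the m/z of an [M + zH]z+ ion."""
    return mz * z - z * PROTON


def composition_mass(comp: dict) -> float:
    """Monoisotopic mass of an elemental composition such as {"C": 3, "H": 3}."""
    return sum(MONO[element] * count for element, count in comp.items())


# Residue compositions (free amino acid minus H2O); illustrative subset only.
RESIDUES = {
    "Gly": {"C": 2, "H": 3, "N": 1, "O": 1},
    "Dha": {"C": 3, "H": 3, "N": 1, "O": 1},
    "Aaa": {"C": 3, "H": 3, "N": 1, "O": 1},  # (Z)-3-aminoacrylic acid, same formula as Dha
    "Ala": {"C": 3, "H": 5, "N": 1, "O": 1},
    "Dhb": {"C": 4, "H": 5, "N": 1, "O": 1},
    "Ser": {"C": 3, "H": 5, "N": 1, "O": 2},
    "Thr": {"C": 4, "H": 7, "N": 1, "O": 2},
}


def assign_shift(delta: float, tol: float = 0.01) -> list:
    """All residues whose mass lies within tol Da of a fragment-ion mass difference."""
    return sorted(name for name, comp in RESIDUES.items()
                  if abs(composition_mass(comp) - delta) <= tol)


if __name__ == "__main__":
    print(f"neutral mass: {neutral_mass(836.7524, 3):.4f} Da")
    print("69.0215 Da ->", assign_shift(69.0215))  # Dha and Aaa are isobaric
    print("83.0371 Da ->", assign_shift(83.0371))
```

With the quoted m/z, `neutral_mass(836.7524, 3)` lies within 0.5 mDa of the reported neutral mass. The 69 Da query returns both Dha and Aaa, which is why NMR rather than MS was needed to place Aaa.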
Kintamdin 1 features unusual chemical moieties
To assist the structural elucidation of the remaining sequence of 1, we turned to genome mining. A BLAST search of the RK44 genome, annotated on the RAST server9, using the sequence tag as a probe identified a 174 bp open reading frame (orf) encoding the precursor peptide, KinA (Supplementary Figure 16). With the amino acid sequence in hand, the molecular formula of 1 was established as C115H174N28O31S2 based on the HR-ESIMS data (observed [M + H]+ = 2508.2427, calculated [M + H]+ = 2508.2414, Δ = 0.518 ppm). We then revisited the NMR spectra, including 1H, COSY, TOCSY, HSQC, HMBC and NOESY (Supplementary Figures 17-27 and Supplementary Table 2). An N,N-dimethylisoleucine was found at the N-terminus of 1, a typical chemical feature of the rare linaridin family10 and, more recently, of cacaoidin11. Fragmentation within the C-terminal region was not observed in MS2, and the y6 ion resisted fragmentation in MS3 experiments. Therefore, HR-MS and isotope fine structure analysis were used to confirm the elemental formula of the y6 ion as [C26H34N7O8S]+ (Fig. 2 B-D). Inspection of the NMR spectra, together with the sequence of the precursor peptide, then allowed us to determine an unprecedented bis-thioether crosslinked ring system in the planar structure of the C-terminus of 1 (Supplementary Figures S28-30 and Supplementary Table 2). This bis-thioether crosslink (MAbi) has not been described in other natural products and further highlights the structural novelty of 1. Overall, the combination of these unusual chemical features makes 1 unique among natural products described to date.
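The calculated [M + H]+ value and the ppm error quoted above follow directly from the molecular formula. The minimal Python sketch below assumes standard monoisotopic atomic masses. It uses a simple formula parser written only for illustration; the parser does not handle brackets, and the sketch does not simulate isotope patterns or isotope fine structure. For the y6 ion, the bracketed formula is taken to already contain every atom of the cation, so only one electron mass is subtracted.

```python
import re

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.00727646688       # mass of H+ (Da)
ELECTRON = 0.000548579909    # mass of an electron (Da)


def parse_formula(formula: str) -> dict:
    """Parse a flat formula such as 'C115H174N28O31S2' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a formula (Da)."""
    return sum(MONO[el] * n for el, n in parse_formula(formula).items())


def cation_mz(formula: str) -> float:
    """m/z of a singly charged cation whose formula already includes all its atoms."""
    return monoisotopic_mass(formula) - ELECTRON


def ppm_error(observed: float, calculated: float) -> float:
    """Mass measurement error in parts per million."""
    return (observed - calculated) / calculated * 1e6


calc_mh = monoisotopic_mass("C115H174N28O31S2") + PROTON
print(f"calculated [M + H]+: {calc_mh:.4f}")
print(f"error: {ppm_error(2508.2427, calc_mh):.3f} ppm")
print(f"y6 ion [C26H34N7O8S]+: {cation_mz('C26H34N7O8S'):.4f}")
```

Without intermediate rounding, the error comes out at about 0.52 ppm. Using the calculated value rounded to four decimals (2508.2414) reproduces the 0.518 ppm quoted in the text.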
The planar structure of 1 (Fig. 2A and Supplementary Fig. 31) confirmed several steps. First, the genetically encoded Thr-2, -3, -4, -6, -20 and -22 and Ser-8 are dehydrated. Decarboxylated Cys-27 then undergoes Michael addition to Dhb-22 to yield a transient S-[(Z)-2-aminovinyl]-(3S)-3-methyl-cysteine (AviMeCys), in line with biochemical precedent2,11. Finally, a second cyclization of Cys-11 onto AviMeCys generates the bis-thioether crosslink. The Aaa-7 residue in 1 presumably derives from dehydration and rearrangement of Ser-7. In addition, three genetically encoded serines (Ser-13, -16 and -18) are present in the final structure as Ala residues. These L-Ser residues are likely converted to D-Ala, a transformation with precedent in a few RiPPs4,12−14 in which L-Ser is dehydrated and subsequently reduced. To confirm these conversions and determine the absolute configurations of all amino acids in 1, we performed advanced Marfey's analysis (Supplementary Fig. 32, Supplementary Table 3). The unmodified amino acid residues have the L-configuration, except Cys-11 and the Ala residues. Both L- and D-Ala are present in 1, and the relative peak areas indicated a molar ratio of two L-Ala to three D-Ala. This is consistent with D-Ala in 1 arising exclusively from the three genetically encoded serine residues (Supplementary Fig. 33).
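The modifications listed above imply a net mass change relative to the unmodified core peptide, and this change can be tallied event by event. The Python sketch below rests on several assumptions:

- Ser-7 → Aaa is mass-equivalent to a dehydration.
- Ser → D-Ala is a dehydration followed by reduction, a net loss of one O.
- Decarboxylation of Cys-27 is an oxidative decarboxylation (loss of CO2 and H2), as for other enzymes that form aminovinyl-cysteine.
- Both thioether ring closures are additions across a C=C bond and cause no mass change.

The event counts are taken from the text. The dictionary keys are illustrative labels.

```python
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}


def composition_mass(comp: dict) -> float:
    """Monoisotopic mass of a (possibly negative) elemental composition."""
    return sum(MONO[el] * n for el, n in comp.items())


# Net elemental change per modification (assumed chemistry, see lead-in).
DELTA = {
    "dehydration": {"H": -2, "O": -1},                         # Ser/Thr -> Dha/Dhb; Ser-7 -> Aaa
    "Ser to Ala": {"O": -1},                                   # -H2O, then +H2
    "oxidative decarboxylation": {"C": -1, "O": -2, "H": -2},  # C-terminal Cys -> enethiol
    "N,N-dimethylation": {"C": 2, "H": 4},                     # two CH2 on the N-terminal Ile
    "thioether ring closure": {},                              # addition across C=C, no mass change
}

# Number of times each modification occurs in 1, according to the text.
EVENTS = {
    "dehydration": 8,                # Thr-2, -3, -4, -6, -20, -22; Ser-7; Ser-8
    "Ser to Ala": 3,                 # Ser-13, -16, -18
    "oxidative decarboxylation": 1,  # Cys-27
    "N,N-dimethylation": 1,          # N-terminal Ile
    "thioether ring closure": 2,     # Cys-27 onto Dhb-22; Cys-11 onto AviMeCys
}


def net_mass_change(events: dict) -> float:
    """Sum of per-event mass changes (Da) relative to the unmodified core peptide."""
    return sum(composition_mass(DELTA[name]) * count for name, count in events.items())


print(f"net change: {net_mass_change(EVENTS):+.4f} Da")
```

Under these assumptions, the tally is a loss of about 210.04 Da. Adding it to the monoisotopic mass of the unmodified core peptide would give a cross-check against the observed mass of 1.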
A computational model of 1
The unusual fused macrocyclic MAbi ring system in 1 marks a new structural motif for cyclic peptides. We modelled the three-dimensional conformation of 1 to address the structural constraints imposed by the macrocyclization and by the Aaa-7 (ΔZβAla) residue. The model also had to resolve the stereochemistry of four as-yet-undetermined stereocentres: Cys-11, the α-carbon of aminoether 1,2 dithiol-27 (AED-27), and the α- and β-carbons of Abu-22 (Fig. 2A). Computational modelling has been used successfully to distinguish among possible diastereomers of post-translationally modified peptides15,16. However, the high content of modified residues in these systems makes accurate force fields difficult to obtain, so less accurate potentials must often be used. Here we employed electronic structure calculations at the density functional tight-binding level, which avoid the need for force field parameters specific to the compound under study17. We selected the self-consistent-charge extended tight-binding method GFN2-xTB because it provides accurate geometries, vibrational frequencies and non-bonded interactions. It is also fast enough to permit molecular dynamics (MD) simulations of useful length on a peptide of this size. Atom pairs identified from NOESY correlations were used in conjunction with GFN2-xTB MD. The NOESY spectrum was acquired in CD3OH at 298 K and cross-referenced with a NOESY spectrum in CD3OD18,19 (Supplementary Figures 34-36). MD simulations were performed for multiple structures bearing different combinations of R and S configurations at the stereocentres in the vicinity of the bis-thioether linkage.
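The text does not state how many stereochemical combinations were simulated. A full enumeration of the four two-state centres named above gives 2^4 = 16 candidate diastereomers. The short Python sketch below generates them. The centre labels are illustrative, and the sketch does not check whether every combination is geometrically accessible in the fused ring system.

```python
from itertools import product

# Stereocentres left open by the NMR and Marfey's data (labels are illustrative).
STEREOCENTRES = ("Cys-11", "AED-27 C-alpha", "Abu-22 C-alpha", "Abu-22 C-beta")


def enumerate_diastereomers(centres):
    """Return every R/S assignment over the given centres as a list of dicts."""
    return [dict(zip(centres, combo)) for combo in product("RS", repeat=len(centres))]


candidates = enumerate_diastereomers(STEREOCENTRES)
print(len(candidates), "candidate structures")
for c in candidates:
    print(", ".join(f"{k}: {v}" for k, v in c.items()))
```

Each dictionary would correspond to one starting structure for a constrained MD run. The assignment favoured in the next paragraph is one of the 16.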
The diastereomer with S configurations at both stereocentres of Abu-22, the S configuration at AED-27 and the R configuration at Cys-11 fitted the data better than the other diastereomers. It satisfied a greater number of NOESY-derived distance constraints (Supplementary Table 4 and the molecular dynamics animations in the Supplementary Information). The calculated structure indicates a relatively rigid fused bicyclic structure, held in place by the two crosslinks and by important backbone and side-chain hydrogen bonds (Fig. 3). Extensive backbone hydrogen bonds can also form within the fused macrocycle, giving the ring system a cage-like secondary structure (Fig. 3A and the molecular dynamics animation in the Supplementary Information). Examples include DAla-13 C=O with Val-15 NH; Leu-14 C=O with the NHs of Ala-17 and DAla-18; Ala-17 C=O with Dhb-20; and DAla-18 C=O with the NH of the MAbi residue. Interestingly, the N-terminal tail is also structured, through hydrogen bonds between the side chain of Glu-9 and backbone NH and carbonyl groups, which coil the helix-like N-terminus toward the fused macrocycle (Fig. 3). The Z-configuration at position 7 likely plays an important role in shaping the flexible N-terminal chain, forming an intra-residue N-H···O=C hydrogen bond with an average distance of 2.14 ± 0.20 Å (Fig. 3). Together, these studies provide a conformational model for 1. They indicate that the crosslinks are generated with the R (Cys-11), S (AED-27) and S,S (Abu-22) configurations. They also indicate that the cage-like C-terminal macrocycle and the helix-like N-terminus are significantly stabilised by hydrogen bonding, a prediction that can be tested experimentally in the future. The extensive interactions between the linear chain and the macrocycle in the calculated model (Fig. 3) prompted us to re-examine the NOE data for previously unnoticed long-range correlations.
Indeed, careful interpretation of the complex NOESY spectrum identified two new long-range correlations, further validating the calculated model (Fig. 2A). The first links the Aaa-7 residue in the linear chain to the methyl group of the MAbi motif in the macrocyclic ring. The second links the β-H of Leu-14 to the α-H of the aminoether 1,2 dithiol (S1, S2) of the MAbi motif.
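Comparing MD trajectories with NOE data is commonly done with r^-6 averaging, because NOE intensity scales with the inverse sixth power of the interproton distance. The Python sketch below scores diastereomers by counting NOE atom pairs whose r^-6-averaged distance falls below an upper bound. It also shows how a mean ± SD such as the 2.14 ± 0.20 Å H-bond distance can be computed from per-frame values. Several choices here are assumptions that the text does not specify:

- the averaging scheme;
- the 5 Å upper bound;
- the function names;
- the per-frame distances, which are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def r6_average(distances) -> float:
    """<r^-6>^(-1/6) average of one atom-pair distance over MD frames (Å)."""
    return mean(d ** -6 for d in distances) ** (-1 / 6)


def count_satisfied(noe_pairs: dict, upper_bound: float = 5.0) -> int:
    """Number of NOE atom pairs whose r^-6-averaged distance is <= upper_bound (Å)."""
    return sum(1 for d in noe_pairs.values() if r6_average(d) <= upper_bound)


def mean_sd(distances) -> tuple:
    """Mean and sample standard deviation, e.g. of an H-bond distance over frames."""
    return mean(distances), stdev(distances)


# Hypothetical per-frame distances (Å); labels list Cys-11, AED-27, Abu-22 Ca, Abu-22 Cb.
trajectories = {
    "R,S,S,S": {"pair_1": [3.1, 3.4, 2.9], "pair_2": [4.2, 4.8, 4.5]},
    "S,S,S,S": {"pair_1": [3.2, 3.0, 3.3], "pair_2": [6.8, 7.1, 6.5]},
}
for label, pairs in trajectories.items():
    print(label, count_satisfied(pairs), "of", len(pairs), "NOE constraints satisfied")
```

The r^-6 average is dominated by the shortest distances sampled. A transient close contact can therefore satisfy a constraint that the arithmetic mean distance would violate.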
Overall, 1 displays unusual chemical features and a predicted sophisticated secondary structure, establishing it as a new RiPP. We named it kintamdin after the Kintampo waterfall, where Streptomyces sp. RK44 was originally isolated. When tested for antibacterial and cytotoxic activities, 1 displayed good activity against skin and breast cancer cell lines (2.4 ± 0.1 µM and 0.6 ± 0.1 µM, respectively), but weak or no inhibitory activity against the bacterial strains tested (Supplementary Table 5).
The minimal biosynthetic gene cluster of 1
The structural novelty of 1 motivated us to probe its biosynthetic origin in the producing strain, RK44. Analysis of the genetic environment surrounding kinA identified a candidate gene cluster (kin) (Fig. 4A, Supplementary Table 6). To validate the identity of the BGC, we used transformation-associated recombination (TAR) cloning for heterologous expression. To this end, we modified the construction of pathway-specific capture vectors to improve the efficiency of capturing the BGC of interest from RK44 genomic DNA (Supplementary Fig. 37). One of five clones obtained after yeast transformation contained the full-length BGC. The construct pCAP03-kin2 was then transferred into several Streptomyces hosts available in our lab by E. coli-Streptomyces conjugation. Production of 1 in Streptomyces coelicolor M115220 was confirmed by HRMS and MS2 fragmentation, through comparison with authentic 1 (Fig. 4C, i-ii and xi, Supplementary Figs. 38–39).
To define the boundaries of the BGC of 1, a series of genes on pCAP03-kin2 were inactivated, followed by E. coli-Streptomyces conjugation and fermentation. HRMS analyses of the extracts of these variants showed that inactivation of orf-1 and orf-2, as well as orf1, at the boundaries of the cloned DNA fragment did not perturb production of 1 (Figure S40). Therefore, the minimal BGC directing the biosynthesis of 1 comprises fifteen orfs, encoding:

- the precursor peptide (KinA);
- two M16-family metallopeptidases (KinE and KinF);
- one flavin-dependent decarboxylase (KinI);
- one SAM-dependent methyltransferase (KinO);
- one oxidoreductase (KinJ);
- two aminoglycoside phosphotransferases (KinD and KinH);
- four ABC transporters (KinK-N);
- two hypothetical proteins (KinB and KinC);
- one transposase (KinG).

(Fig. 4A and Supplementary Table 6.) Notably, none of the ORFs in the BGC shares significant homology (all < 30% amino acid sequence identity) with proteins identified in other classes of natural products. Genes involved in some of the post-translational modifications of 1 were assigned (Supplementary Fig. 16 and Supplementary Table 6). KinE and KinF share low homology with each other. They may function as proteases to generate mature 1, a function similar to that proposed in the biosynthesis of ruminococcin C21. KinO, a homologue of the cypemycin N-methyltransferase22, may be responsible for installing the N,N-dimethyl moiety at the N-terminus of 1. KinJ might be involved in the hydrogenation of Dha to D-Ala, as previously described for LanJB enzymes such as CrnJ13 and BsjJB14. Although it shares low homology (< 30% amino acid identity) with the cypemycin decarboxylase (CypD)23, KinI may account for the generation of the macrocyclic bis-thioether ring (MAbi) system. The kin cluster does not encode any protein homologous to the dehydratases found in lanthipeptide (e.g. LanB) or linaridin (e.g. CypH) BGCs.
The most likely candidates for dehydration are KinD and KinH, which show low structural homology (16% identity) to the ATP-dependent aminoglycoside phosphotransferase (APH, Pfam PF01636) superfamily, as predicted by the Phyre2 server24. We propose that KinD and KinH catalyse phosphorylation of Ser and Thr residues, followed by anti-elimination to yield Dha and Dhb residues, respectively. The enzymes responsible for forming the unsaturated β-amino acid Aaa-7 remain to be determined. A BLAST search of NCBI using KinC as the query, together with structural modelling on the Phyre2 server24, indicated that KinC shares no significant homology with any known protein. However, part of its predicted structural model shows a low degree of similarity (83.8% confidence, 18% identity) to the C-terminal domain of phosphoglucomutases. This suggests that KinC may catalyse the rearrangement of phosphorylated Ser-7 into the Aaa-7 residue.
To assess the in vivo roles of these genes, we generated eight variants (ΔkinC-F, ΔkinH-J and ΔkinO) (Fig. 4C, traces iii-xi). Inactivation of kinE markedly reduced production of 1, whereas knocking out kinF caused a moderate reduction (Fig. 4C, traces v-vi). This suggests that the two putative proteases may act synergistically to ensure efficient proteolytic processing.
Production of 1 was abolished in the six other variants (ΔkinC, D, H-J and O), suggesting that these genes are essential for the biosynthesis of 1 (Figure 4C, traces iii, iv and viii-xi). Inactivation of kinO led to accumulation of a new metabolite, 2, in the culture of the ΔkinO mutant (Figure 4C, trace xi). MS and MSn fragmentation analysis showed that 2 is the non-methylated form of 1 (Supplementary Figure 41). Knocking out kinI, which encodes a flavin-dependent decarboxylase, resulted in accumulation of another new metabolite, 3, in the culture of the ΔkinI variant (Figure 4C, trace xi). MS and tandem MS analysis showed that 3 is the N,N-dimethylated linear 27-mer peptide (Supplementary Figure 42). It carries an intact C-terminal cysteine, the dehydrated residues and the Ala residues derived from Dha. This suggests that dehydration of Thr and Ser and hydrogenation of the resulting Dha residues occur prior to decarboxylation.
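The expected masses of the two shunt products follow from the formula of 1, and the Python sketch below makes this arithmetic explicit. It assumes that 2 differs from 1 only by the two missing N-methyl groups (loss of C2H4). It also assumes that 3 is 1 before oxidative decarboxylation of Cys-27 (CO2 + H2 restored) and before the two thioether ring closures, which add no atoms. These assumptions reflect our reading of the paragraph above, not the authors' stated calculation.

```python
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.00727646688


def composition_mass(comp: dict) -> float:
    """Monoisotopic mass of an elemental composition (Da)."""
    return sum(MONO[el] * n for el, n in comp.items())


def mz(mass: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + z * PROTON) / z


M1 = composition_mass({"C": 115, "H": 174, "N": 28, "O": 31, "S": 2})
# 2: 1 without the two N-methyl groups (loss of C2H4).
M2 = M1 - composition_mass({"C": 2, "H": 4})
# 3: assumed to be 1 before oxidative decarboxylation and the (mass-neutral)
# ring closures, i.e. 1 + CO2 + H2.
M3 = M1 + composition_mass({"C": 1, "O": 2, "H": 2})

for name, mass in (("1", M1), ("2", M2), ("3", M3)):
    print(f"{name}: M = {mass:.4f} Da, [M+3H]3+ = {mz(mass, 3):.4f}")
```

Predicted values of this kind give target ions for extracted-ion chromatograms when screening the mutant extracts.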
To assess the roles of KinI and KinO in the biosynthesis of 1, we carried out biochemical analysis of the recombinant proteins. Overexpression of kinI and kinO in E. coli allowed purification of both recombinant proteins to near homogeneity, as judged by SDS-PAGE analysis (Supplementary Fig. 43). An in vitro assay of 6-His-tagged recombinant KinO with 2 in the presence of S-adenosyl-L-methionine (SAM, 1 mM) produced 1, as confirmed by HR-MS and MSn analysis (Supplementary Fig. 44). This shows that KinO is responsible for the N-terminal Ile dimethylation. Interestingly, 3 contains Dhb at position 22. This suggests that the Abu residue in 1 derives from Michael addition of decarboxylated Cys to Dhb, likely yielding AviMeCys, as proposed for cypemycin10 and cacaoidin11. Incubation of KinI with 3 and the other necessary cofactors, however, failed to produce any new products, indicating that 3 is a shunt product in the variant.