Syntenic analysis reveals ubiquity of the miltiradiene biosynthetic gene cluster
C. americana provided a unique opportunity to investigate the evolution of a family-wide diterpenoid BGC, since it belongs to a lineage sister to the rest of the Lamiaceae and has a large, dense BGC. We analyzed nine Lamiaceae genomes against our anchor species, C. americana, to determine synteny with its miltiradiene BGC (Fig. 2). We chose the genome panel based on assembly quality and contiguity as well as subfamily representation (i.e., phylogenetic placement). In addition to three species with previously reported syntenic BGCs, we selected four species with published genomes. To increase the diversity of representatives across the phylogeny, we sequenced three new genomes (Supplemental Table 2, Supplemental Table 3). In total, these ten species represent five of the twelve currently recognized subfamilies, with a most recent common ancestor estimated at 60–70 million years ago80–82.
All 10 species sampled contained diTPSs orthologous to known (+)-CPP and miltiradiene synthases. In seven species these diTPSs were within syntenic BGCs (Fig. 3). The genomes of Prunella vulgaris, Plectranthus barbatus, and R. officinalis were too fragmented to determine whether their diTPSs were part of a larger cluster. Four of the BGCs in this analysis have not been reported previously, indicating that this cluster is even more widely conserved than originally described. All BGCs except that in S. baicalensis contain multiple CYP76AH genes. Five species (C. americana, T. grandis, S. miltiorrhiza, Hyssopus officinalis, and Leonotis leonurus) also had at least one copy of a CYP71D gene.
Comparison of the BGCs provides insight into the formation and maintenance of this cluster in divergent lineages (Fig. 3). The S. baicalensis BGC is the only one that contains no CYPs, but it appears to have tandem duplications of a class II diTPS and an additional non-syntenic class I diTPS. Non-syntenic diTPS and CYP genes are present in most of the BGCs, pointing toward dynamic assembly and independent refinement in each species. There are also several diTPS and CYP pseudogenes, presumably remnants of past tandem duplications. Interestingly, few unrelated intervening genes interrupt these BGCs. The H. officinalis and C. americana BGCs span large genomic regions with more intergenic space, while others, such as those of Pogostemon cablin and L. leonurus, are compact and gene dense. The presence of two related BGCs in both S. miltiorrhiza and L. leonurus may afford these plants evolutionary flexibility through a duplicated pathway. Each BGC, while maintaining the core miltiradiene genes, appears to have assembled and disassembled in a lineage-specific manner.
Phylogenetic evidence of an ancestral miltiradiene cluster in Lamiaceae
To better understand the evolution of genes from each BGC, we estimated phylogenetic relationships for each enzyme subfamily in the BGCs along with a set of functionally characterized reference genes from Lamiaceae, except for the CYP71D clade, for which few characterized Lamiaceae sequences are available (Fig. 4, Supplemental Table 4). Consistent with other angiosperm labdane-type diTPSs, diTPSs with class II function cluster in the TPS-c subfamily, while those with class I function cluster in the TPS-e subfamily.
As expected, syntenic diTPSs in both subfamilies share common ancestry. Recent tandem duplications in the TPS-e and TPS-c subfamilies are evident in all examined species and contribute to lineage-specific BGC expansion (Fig. 4, Fig. 5). The phylogenies also highlight the more distant origins of several non-syntenic diTPSs. The presence of these divergent class I and class II sequences points to independent acquisition during the diversification that accompanied speciation. Close inspection of phylogenetic relationships with characterized diTPSs can offer clues to likely functions. All class II diTPSs syntenic to CamTPS6 cluster in clade TPS-c.2.2, which contains all known Lamiaceae (+)-CPP synthases as well as some diTPSs that yield labdanes in the (+)-configuration. The two divergent class II enzyme sequences, Sb.71 and Pc.28, cluster in TPS-c.1, whose characterized members produce compounds in the ent- rather than the (+)-configuration, so these two enzymes likely do as well.
Consistent with their expected role in specialized metabolism, none of the BGC class I enzymes cluster in clade TPS-e.1, which contains mostly ent-kaurene synthases integral to gibberellin biosynthesis. All BGC class I diTPSs cluster in TPS-e.2, which contains enzymes that generally accept (+)-CPP as a substrate. Enzymes syntenic with CamTPS9 are grouped in clade TPS-e.2.1, which contains all but one of the Lamiaceae enzymes known to catalyze formation of miltiradiene. Also characteristic of this clade is the loss of the internal γ domain, which is retained in most diTPSs but lost in mono- and sesqui-TPSs. The non-syntenic enzyme sequences are split between clades TPS-e.2.2 and TPS-e.2.3, which include only a few characterized sequences, each with a distinct function. This functional heterogeneity makes it difficult to predict the functions of these BGC enzymes but offers intriguing possibilities for the discovery of novel terpene backbones.
While phylogenetic classification is not a perfect predictor of TPS function37,83, previous work has demonstrated a high level of clade-specific functional consistency that allows us to draw tentative conclusions about the function of the BGC diTPSs57. The phylogenetic evidence suggests that these BGCs contain at minimum a (+)-CPP synthase and a miltiradiene synthase, enabling production of miltiradiene in each plant (Fig. 4). Moreover, several BGCs contain diTPSs from clades that may offer distinctive chemistries.
CYPs in the 76AH subfamily cluster closely across the species analyzed. Several functionally characterized CYP76AHs oxidize miltiradiene in critical steps toward tanshinone and carnosic acid biosynthesis63,64. Although we were unable to identify a BGC in R. officinalis because of its fragmented assembly, the close relationship between the RoCYP76AH enzymes and those in the other BGCs supports common ancestry. Nearly all CYP76AHs in the BGCs have paralogs within the same cluster, highlighting the role of tandem duplication in expanding this subfamily55,84. However, several BGC CYP76AHs are highly divergent from their syntelogs. The C. americana enzymes CYP76AH65, CYP76AH66, and CYP76AH67 are phylogenetically distinct, sharing only 50–60% sequence similarity with other BGC CYP76AHs. These enzymes are more closely related to the CYP76AK clade, whose members have not been found in this BGC but are part of the tanshinone and carnosic acid oxidation networks.
CYPs in the 71D subfamily similarly cluster phylogenetically with others in the BGCs. Three CYP71D enzymes from H. officinalis and L. leonurus fall in the same clade as the CYP71D array from S. miltiorrhiza, which was implicated in furan ring formation for the tanshinones24. SmCYP71D410 is a previously unrecognized member of BGC Sm-b that clusters phylogenetically with HoCYP71D724 and PbCYP71D381. PbCYP71D381 can oxidize the forskolin precursor (13R)-manoyl oxide, a close structural relative of miltiradiene85. One enzyme from T. grandis is much more distantly related than the rest, sharing only 40–50% sequence similarity with other BGC CYP71Ds. This enzyme is likely another recent independent acquisition, although it is the only such case observed in the CYP71D subfamily. All BGCs containing CYP71Ds also have at least one CYP71D duplication, once again highlighting the importance of duplication in the diversification of these pathways86.
Close phylogenetic clustering of most enzymes in all four subfamilies provides compelling evidence for a common ancestral origin followed by lineage-specific duplications. We analyzed the presence and absence of syntelogs and proposed a model for a minimal ancestral cluster using ancestral state reconstruction (Fig. 5, Supplemental Fig. 1, Supplemental Fig. 2). High levels of sequence conservation between syntelogs support a minimal ancestral cluster that contains a (+)-CPP synthase, a miltiradiene synthase, a CYP76AH, and a CYP71D. The dynamic nature of this BGC over millions of years of evolution is evident in the gene losses, pseudogenes, and non-syntenic gene additions observed in these extant Lamiaceae. Despite these differences, the high degree of conservation of the ancestral cluster is notable.
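To illustrate how presence/absence data for a single cluster gene can be used to infer its ancestral state, the sketch below applies Fitch parsimony, a simple swapped-in method that may differ from the reconstruction approach used for Fig. 5. The tree topology, the tip labels sp1 to sp5, and the presence/absence states are hypothetical placeholders and do not correspond to the species or data analyzed here.

```python
"""Fitch parsimony for one binary presence/absence character.

Hypothetical illustration: the tree topology, tip labels, and states are
made-up placeholders, not the data behind the reconstruction in Fig. 5.
"""
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: either a tip label or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Bottom-up Fitch pass.

    Returns the Fitch state set for this node (at the root, exactly the
    most-parsimonious root states) and the minimum number of state
    changes (gains or losses) required within its subtree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree of five species and presence (1) or absence (0)
# of one cluster gene, for example a CYP71D syntelog.
example_tree: Tree = ((("sp1", "sp2"), ("sp3", "sp4")), "sp5")
gene_present = {"sp1": 1, "sp2": 1, "sp3": 0, "sp4": 1, "sp5": 1}

root_states, changes = fitch(example_tree, gene_present)
print(root_states, changes)  # {1} 1: present at the root, one loss (in sp3)
```

Fitch parsimony weights gains and losses equally. Because a multigene cluster is unlikely to be re-gained once lost, a Dollo-type model (which forbids re-gain) or a likelihood model with asymmetric gain and loss rates may be more appropriate for real presence/absence data.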
Since the miltiradiene BGC was present in nearly every Lamiaceae species sampled, we also investigated synteny in Erythranthe lutea (yellow monkeyflower; formerly Mimulus luteus), a closely related Lamiales outgroup80,87,88. We found a syntenic block that contains a class II diTPS and a class I diTPS but no CYPs. The class II diTPS, El.13152, falls in clade TPS-c.2 and shows some similarity to the (+)-CPP synthases. The class I enzyme, El.13874, falls within TPS-e.2.1 but is distinct from the rest of the clade and, surprisingly, retains the γ domain (Fig. 4). Because γ-domain loss has occurred multiple times in the evolution of plant TPSs89, it is conceivable that El.13874 represents the ancestral three-domain form of the Lamiaceae miltiradiene synthase present in the most recent common ancestor. While the E. lutea cluster provides a glimpse into an ancestral state of the Lamiaceae BGC, a broader examination of additional Lamiales genomes would be an interesting avenue for future work and could more firmly establish the timeline of gene acquisition and loss.
Functional characterization of the C. americana BGC reveals two metabolic modules and a novel terpene backbone
Although a growing number of computationally predicted BGCs have been identified in plants, only a few have been functionally characterized. So far, coregulation has proven to be a better predictor of functional relationships in BGCs than colocalization alone90. Previous analysis of the two BGCs in S. miltiorrhiza, Sm-a and Sm-b, found that expression of each was divided between root and aerial tissues. The diTPSs from Sm-a and the CYP76AHs from Sm-b were expressed exclusively in roots and were found to catalyze essential steps in the root tanshinone biosynthetic pathway59. In addition, an array of root-specific CYP71Ds located elsewhere in the genome was also integral to tanshinone biosynthesis24. Other cases in which colocalized but differentially expressed genes function in distinct specialized metabolic pathways include the bifunctional phytocassane/oryzalide gene clusters in Oryza sativa (rice)79 and the noscapine/morphinan biosynthetic genes in Papaver spp. (poppy)11,66. Divergent expression may be one way in which plants exploit some of the benefits of genomic clustering while creating distinct pathways through differential regulation.
Given the unprecedented size and complexity of the BGC identified in C. americana, we investigated whether it functions as a single metabolic unit. We first analyzed RNA expression in eight tissue types to determine the expression pattern of the BGC (Fig. 6)78. This revealed a clear divergence between the first and second halves of the BGC. The first half is preferentially expressed in fruit and root tissue and contains a (+)-CPP synthase (CamTPS6)78, the predicted miltiradiene synthase (CamTPS9), and several CYP76AHs. The second half is more strongly expressed in flower and young leaf tissues and contains a non-orthologous class I diTPS (CamTPS10), another predicted (+)-CPP synthase (CamTPS7), two CYP71Ds, and partial fragments of a CYP76AH (Ca.26–27). The presence of a class II/class I diTPS pair and CYPs in each module suggests that this BGC may have evolved divergent diterpenoid pathways.
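One way to formalize a split into expression modules is to group genes by the similarity of their expression profiles across tissues. The sketch below is not the analysis behind Fig. 6; it assumes Pearson correlation as the similarity measure and assigns each gene to whichever of two hypothetical "seed" genes it correlates with more strongly. The seed names, the four tissues, and all expression values are invented for illustration.

```python
"""Assign cluster genes to co-expression modules by Pearson correlation.

Hypothetical illustration: the gene labels, the four tissues, and every
expression value below are invented; they are not the data in Fig. 6.
"""
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def assign_modules(profiles: Dict[str, List[float]], seeds: List[str]) -> Dict[str, str]:
    """Map each gene to the seed gene whose profile it correlates with most."""
    return {
        gene: max(seeds, key=lambda seed: pearson(profile, profiles[seed]))
        for gene, profile in profiles.items()
    }


# Columns: root, fruit, flower, young leaf (hypothetical values).
profiles = {
    "seed_root": [10.0, 8.0, 1.0, 0.5],
    "seed_flower": [0.5, 1.0, 9.0, 7.0],
    "gene_a": [7.0, 9.0, 0.8, 1.2],
    "gene_b": [1.0, 0.2, 6.0, 8.0],
}
modules = assign_modules(profiles, ["seed_root", "seed_flower"])
print(modules)
```

With real RNA-seq data, expression values would typically be log-transformed before computing correlations, and an unsupervised method such as hierarchical clustering would avoid having to choose seed genes in advance.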
We cloned from cDNA and tested the following members of the C. americana cluster: CamTPS7, CamTPS8, CamTPS9, CamTPS10, CamCYP76AH64, CamCYP76AH65, CamCYP76AH67, CamCYP76AH68, CamCYP76AH69, CamCYP71D716, and CamCYP71D717. Combinations of these genes were transiently expressed in Nicotiana benthamiana to evaluate enzyme function and potential promiscuity. All gene constructs were co-infiltrated with two genes encoding rate-limiting steps upstream of diterpene formation, P. barbatus 1-deoxy-D-xylulose-5-phosphate synthase (PbDXS) from the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway and GGPP synthase (PbGGPPS), to boost production of the diterpene precursor GGPP91,92. DiTPS functions were determined by comparing GC-MS mass spectra and retention times with those of published diTPS products or, for previously unreported activities, by NMR (Fig. 7). CamTPS7 was confirmed to be a (+)-CPP synthase (Supplemental Fig. 3). CamTPS9 is a miltiradiene synthase; some abietatriene was also detected, consistent with previously observed spontaneous aromatization in planta93. CamTPS10, when paired with a (+)-CPP synthase, forms (+)-kaurene, a previously unreported diTPS activity (NMR: Supplemental Fig. 4). The biological relevance of this activity is supported by the structure of the diterpenoid calliterpenone, which is derived from the (+)-kaurene backbone and has been documented in multiple Callicarpa species94. Calliterpenone has been investigated as a potential plant growth-promoting agent95 and thus represents an interesting biosynthetic target. Discovery of this (+)-kaurene synthase will enable biosynthetic access to this group of metabolites as well as to non-natural diterpenoids that may have useful bioactivities92.
The physical grouping and similar expression patterns of CamTPS10 and CamTPS7 support the hypothesis that this cluster has diverged into two metabolically distinct modules through duplication of a (+)-CPP synthase, recruitment of an additional class I diTPS, and a shift in tissue-specific gene expression.
After establishing routes to the formation of the C. americana diterpene backbones, we tested each CYP against every diterpene backbone known to be produced in this plant (Fig. 8): ent-kaurene (CamTPS12; Supplemental Fig. 5) and kolavenol78, formed by diTPSs outside the cluster, and (+)-kaurene and miltiradiene, formed by the BGC. No activity was detected with kolavenol or ent-kaurene. With miltiradiene, CamCYP76AH67 formed six different oxidation products. Based on the m/z of the molecular ions and comparison of the mass spectra with each other and with the NIST database, two appear to be oxidized abietatriene and the other four oxidized miltiradiene (Supplemental Fig. 6). We were unable to separate these products by column chromatography, which prevented complete structural elucidation. CamCYP76AH68 dramatically shifted the product profile toward abietatriene and afforded a small amount of oxidized abietatriene (Supplemental Fig. 6). This suggests that CamCYP76AH68 may hydroxylate the C-ring of miltiradiene, which then undergoes water loss to form abietatriene more readily than miltiradiene aromatizes spontaneously (Fig. 9). In previous work characterizing enzymes involved in tanshinone and carnosic acid biosynthesis, the ferruginol synthases showed a preference for abietatriene, but enzymatic conversion of miltiradiene to abietatriene was not observed; it was suggested that the aromatization is spontaneous and possibly driven by sunlight93. The discovery of CamCYP76AH68 suggests that, at least in C. americana, an enzyme may assist in the conversion of miltiradiene to abietatriene. When we expressed each CYP with CamTPS6 and CamTPS10 to evaluate CYP activity on the (+)-kaurene backbone, we observed a new peak upon coexpression of CamCYP71D717. On further investigation, however, this enzyme appears to catalyze formation of (+)-manool (6) from (+)-copalol (5), the dephosphorylation product of (+)-CPP (Fig. 8, Supplemental Fig. 7).
Each CYP/TPS combination that yielded observable products was then coexpressed with each of the other CYPs. Coexpression of CamCYP76AH67 and CamCYP76AH68 in the presence of miltiradiene yielded at least one new oxidized compound (Fig. 8, Supplemental Fig. 6). The combination of CamTPS6 with CamCYP71D716 and CamCYP71D717 resulted in full conversion of (+)-manool to 3-oxy-manool, which was confirmed by NMR (Fig. 8, NMR: Supplemental Fig. 8).
No abietane-type diterpenoids have previously been reported from C. americana, which has been studied primarily for the clerodane diterpenoids produced in its leaves96–98. Other Callicarpa species, including C. bodinieri and C. macrophylla99, produce a wide variety of medicinally relevant abietane diterpenoids (Fig. 9), indicating that the abietane skeleton is a key intermediate for at least some plants in this genus73,99. We analyzed methanol extracts of C. americana root, fruit, and leaf tissue for evidence of diterpenoids. All extracts showed peaks with distinct retention times and MS/MS fragmentation patterns consistent with diterpenoids (Supplemental Fig. 9).
The C. americana genome encodes over 600 predicted CYPs, so the BGC CYPs are likely part of a larger metabolic network that includes peripheral modifying enzymes encoded elsewhere in the genome78. Nevertheless, the activities reported here support the biological significance of the BGC and its divergent modules. The CYPs showed a marked preference for the (+)-copalol and miltiradiene backbones over other diterpenes present in the plant. Within each module, the class I diTPS (the miltiradiene or the (+)-kaurene synthase) is coexpressed with its partner (+)-CPP synthase, and the two modules are differentially expressed. The CYP76AHs were more active toward miltiradiene, whereas the CYP71Ds used (+)-copalol. Functionalization of (+)-kaurene may require oxidations catalyzed by non-clustered enzymes.