Chemical Validation of Briarane Production
We previously characterized homologous cembrene B synthases identified from the genomes of R. muelleri and E. caribaeorum.27,28 Both encoding genes were colocalized with cytochrome P450 genes, which also shared homology with their counterparts across the two distinct genomes. The tantalizing possibility that coral animals evolved gene cluster families for the biosynthesis of structurally related specialized metabolites motivated us to evaluate whether diverse corals known to contain briarane diterpenoids also encode cembrene B synthases in their genomes. As such, we first set out to confirm literature reports of briarane-producing octocoral families. Previously, Pennatuloidea, Ellisellidae, Coralliidae, Briareidae, and Erythropodiidae were established as briarane-containing octocorals (Fig. 2a).30–36 As each of these five coral families belongs to the order Scleralcyonacea, we further examined the remaining 14 families within this taxonomic order and found that five others have reported terpenoid chemistry, though none with briarane scaffolds (Fig. 2b, SI Table 1).37 For the remaining nine families, no terpenoid literature was found.
We thus collected coral tissue from representative species across five known briarane-producing families, including Renilla koellikeri (family Pennatuloidea), Stylatula elongata (family Pennatuloidea), Dichotella gemmacea (family Ellisellidae), the branching and encrusting morphologies of Briareum asbestinum (family Briareidae), E. caribaeorum (family Erythropodiidae), and Corallium rubrum (family Coralliidae). R. koellikeri, collected by SCUBA in San Diego, is not a reported source of briaranes, but it is related to the Atlantic R. muelleri, which produces briaranes38 and whose genome encodes cembrene B synthase as part of a putative BGC.27,28 We thus analyzed the tissue of R. koellikeri and isolated 11-hydroxyptilosarcenone (6), whose chemical structure was confirmed by NMR spectroscopy (NMR supplementary note, SI Table 2, SI Fig. 1). This compound was previously identified from the sea pen Ptilosarcus gurneyi, but not yet from a Renilla sea pansy.39,40
Limitations in accessible biomass for the other octocorals targeted in this study led us to use molecular networking analyses of crude extracts and semi-pure fractions to confirm briarane production. We used 11-hydroxyptilosarcenone (6) as a standard in the molecular network to identify a cluster containing 54 nodes representing unique masses of briarane compounds (Fig. 2, SI Fig. 2, SI Table 3). This analysis not only expanded the ptilosarcenone network from R. koellikeri (Pennatuloidea) but also established related briaranes in D. gemmacea (Ellisellidae), E. caribaeorum (Erythropodiidae), and B. asbestinum (Briareidae). Two nodes corresponding to compounds in these corals had exact masses consistent with literature values: gemmacolide R (7) from D. gemmacea and erythrolide D (8) from E. caribaeorum (Fig. 2c, SI Table 1).41,42 However, we did not detect briarane nodes in extracts of C. rubrum (Coralliidae) or S. elongata (Pennatuloidea). Although there is literature precedent for briarane production by S. elongata and by corals in the family Coralliidae (though not specifically C. rubrum), the samples we analyzed were devoid of briaranes at the time of collection. We nevertheless included these corals in our study on the assumption that they have the genomic capacity to produce briaranes.33,34,43
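The idea of using a characterized standard to pick out a cluster of related nodes can be illustrated with a minimal Python sketch. It assumes that network edges (pairs of nodes with similar fragmentation spectra) have already been computed by a molecular networking tool. The node names and edges below are hypothetical and are not data from this study; the sketch simply collects every node connected, directly or indirectly, to the standard.

```python
from collections import deque

def cluster_containing(seed, edges):
    """Return the set of nodes connected to `seed` through the given edges."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Breadth-first search outward from the seed node.
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Hypothetical toy network: "standard_6" stands in for the purified standard.
toy_edges = [
    ("standard_6", "node_a"),
    ("node_a", "node_b"),
    ("node_c", "node_d"),  # unrelated cluster, not reached from the standard
]
briarane_like = cluster_containing("standard_6", toy_edges)
print(sorted(briarane_like))
```

In this toy case only the nodes linked to the standard are returned, while the unrelated pair is excluded.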
Genome assembly and syntenic analysis of the briarane gene cluster family in Scleralcyonacea corals
At the start of this study, a few draft genomes of briarane-containing corals were available, all with very low contiguity (contig N50): E. caribaeorum (contig N50 2 kb)28, R. muelleri (N50 70.5 kb)44, and Pteroeides caledonicum (N50 4 kb) (SI Table 4 for assembly accessions and statistics). To capture intact BGCs, we proceeded to generate higher quality assemblies of corals across four briarane-producing families, obtaining long-read sequencing data for R. koellikeri (family Pennatuloidea), S. elongata (family Pennatuloidea), D. gemmacea (family Ellisellidae), the branching and encrusting morphologies of B. asbestinum (family Briareidae), and E. caribaeorum (family Erythropodiidae). We assembled 26 to 78 gigabase pairs (Gb) of sequencing reads into genomes ranging in size from 231 megabase pairs (Mb) (R. koellikeri) to 1,237 Mb (B. asbestinum) (Table 1). To assess the completeness of our assemblies, we estimated genome sizes with K-mer based analyses using Illumina short reads (SI Fig. 3), yielding estimates ranging from 144 Mb (R. koellikeri) to 1,155 Mb (B. asbestinum) (Table 1). Illumina short-read coverage was too shallow for E. caribaeorum to allow for K-mer based analysis (SI Table 5 for all sequencing data used). Additionally, we included our recently reported chromosomally resolved C. rubrum (family Coralliidae) genome at 475 Mb (20 chromosomes, 951 scaffolds)45,46 in our analysis to enable BGC exploration across all five known briarane-producing coral families. The genome assemblies are now some of the most contiguous for octocorals, with contig N50 values above 196 kb (Table 1, Fig. 3a, SI Table 6). The chromosomal resolution assembly of C. rubrum had a scaffold N50 of 16,290 kb, on par with the current best scaffolded octocoral assembly from Xenia sp. with a scaffold N50 of 14,832 kb.47 Benchmarking universal single-copy ortholog (BUSCO) completeness scores ranged from 86.4–88% (Table 1).
BUSCO scores are highly dependent on the database used for the analysis, and the low scores could reflect the use of the metazoan database, which was the closest available. The lower-than-expected completeness is on par with the BUSCO score for the Xenia assembly, which was 88.1% complete (Fig. 3a).47 We further assembled transcriptomes to guide our genomic analyses by identifying actively transcribed genes and determining intron boundaries (SI Table 6).
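Because contig N50 is the main contiguity metric used here, a minimal Python sketch of how it is conventionally computed may be helpful: contigs are sorted from longest to shortest, and the N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name and the toy contig lengths are illustrative assumptions, not values from the assemblies in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (same unit as the input)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the assembly is the N50.
        if running >= half_total:
            return length

# Hypothetical toy assembly with lengths in kb (not data from this study).
toy_contigs_kb = [500, 400, 300, 200, 100]
print(n50(toy_contigs_kb))
```

For the toy assembly the total is 1,500 kb, so the N50 is reached at the second-longest contig.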
Using the putative briarane BGC gene sequences from R. muelleri and E. caribaeorum as queries, we analyzed the five newly established genomes of D. gemmacea, B. asbestinum, R. koellikeri, C. rubrum, and S. elongata and the public P. caledonicum genome for homologous genes. In total, the genomes from these eight octocoral species span five of the 20 octocoral families in the order Scleralcyonacea. To determine whether octocoral families that do not produce briaranes also have syntenic BGCs, we also evaluated the publicly available Heliopora coerulea (family Helioporidae) genome.48 We identified homologs of the cembrene B terpene cyclase (cbTC) in all eight octocorals shown or suspected to produce briaranes, whereas the H. coerulea genome did not contain a homologous sequence, although it did encode nine other TCs. Of the eight octocoral species containing a homologous cbTC gene, seven contained iterations of the briarane BGC (Fig. 3b). Given that the P. caledonicum genome had poor contiguity, we were not surprised to find the cbTC homolog on a 3 kb contig, and we expect that it too is part of a BGC.
We observed one conserved short-chain dehydrogenase (cbSDH) and up to three cytochrome P450s (cbCYPa, cbCYPb, and cbCYPc) per briarane BGC (Fig. 3b, SI Fig. 4). In the eight octocorals with briarane BGCs, four maintained the full five-gene briarane BGC, spanning 22 kb in R. muelleri up to 40 kb in C. rubrum, albeit with different gene organization. Notably, in the cases of S. elongata, E. caribaeorum, and B. asbestinum, the orthologous genes were split between two BGC loci (SI Table 7). For S. elongata and B. asbestinum, their respective cbTC and cbCYPc genes reside on one contig, while cbCYPa, cbCYPb, and at least two cbSDHs are on a second contig. In the case of E. caribaeorum, cbCYPc resides outside the BGC contig. Also, in the case of B. asbestinum, the orthologous genes are separated by much longer gaps, resulting in a significantly larger BGC spanning ~150 kb. S. elongata and B. asbestinum further carry three (85%, 89% and 90% sequence similarity) and two (46% sequence similarity) copies of the SDH gene, respectively (SI Fig. 5). The duplications of the SDHs and the gene rearrangements suggest lineage-specific independent evolution across octocoral species.
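The distinction between an intact cluster and one split across two loci can be made concrete with a small Python sketch that groups annotated genes by contig and reports the span of each locus. The record layout, contig names, and coordinates are hypothetical placeholders chosen for illustration; they are not the annotations from SI Table 7.

```python
from collections import defaultdict

def summarize_loci(genes):
    """Group (name, contig, start, end) records by contig and report each locus span in bp."""
    by_contig = defaultdict(list)
    for name, contig, start, end in genes:
        by_contig[contig].append((name, start, end))
    summary = {}
    for contig, members in by_contig.items():
        # Span runs from the first gene start to the last gene end on that contig.
        span = max(end for _, _, end in members) - min(start for _, start, _ in members)
        summary[contig] = {
            "genes": sorted(name for name, _, _ in members),
            "span_bp": span,
        }
    return summary

# Hypothetical coordinates mimicking a cluster split across two contigs.
toy_genes = [
    ("cbTC", "contig_1", 10_000, 14_000),
    ("cbCYPc", "contig_1", 20_000, 25_000),
    ("cbCYPa", "contig_2", 5_000, 9_000),
    ("cbCYPb", "contig_2", 12_000, 17_000),
    ("cbSDH", "contig_2", 30_000, 32_000),
]
loci = summarize_loci(toy_genes)
is_split = len(loci) > 1  # more than one contig means the genes are not in one locus
for contig, info in loci.items():
    print(contig, info["genes"], info["span_bp"])
```

A real analysis would also need to account for gene orientation and for contigs that may be adjacent in the unassembled genome, which this sketch ignores.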
Since we identified the same set of homologous cytochrome P450s in a wide range of octocoral species, we conducted a phylogenetic analysis of these sequences against the background of broader cnidarian cytochrome P450s to shed light on their evolutionary origin. Our analysis included CYPs from four non-octocoral cnidarians as a reference set: Hydra vulgaris (24 CYPs), Acropora digitifera (24 CYPs), Aurelia aurita (37 CYPs), and Nematostella vectensis (70 CYPs).49 From octocorals we included CYPs from six briarane-producing species, D. gemmacea, B. asbestinum, R. koellikeri, S. elongata, C. rubrum, and E. caribaeorum, as well as from Xenia sp. and H. coerulea, which do not produce briaranes, yielding 318 octocoral CYP sequences at ~40 per genome (SI Table 8). We combined the annotated reference cnidarian CYPs with the annotated octocoral CYPs to build a phylogenetic tree (Fig. 3c, SI Fig. 6). The CYPs from the four reference cnidarians were previously associated with accepted CYP clans and families,49 thereby enabling us to examine the evolutionary context of the octocoral BGC CYPs. Notably, some octocoral CYPs fell into the clades associated with the cnidarian CYP clans, suggesting that they fulfill primary, widespread functions generally present in cnidarians. Others, including the BGC-associated CYPs, formed new monophyletic clades that were octocoral specific. The three briarane CYPs, cbCYPa-c, form monophyletic clades consisting of cbCYPa, cbCYPb, and cbCYPc homologs from different species (Fig. 3c). While cbCYPb and cbCYPc share a common ancestor, which may suggest related enzymatic functions, cbCYPa is distantly related and instead lies closer to clan 46, with predicted functions associated with steroid biosynthesis (Fig. 3c). Interestingly, H. coerulea, although also from the order Scleralcyonacea, has CYPs closely related to the cbCYPb and cbCYPc clades, but these genes are not co-localized, hinting at the loss of the briarane BGC in this lineage.
To test whether our phylogenetic approach could be used for genome mining in octocorals with lower quality assemblies, we next included CYP sequences from the poor contiguity genome assembly of P. caledonicum. We found corresponding sequences present in two of the three clades, thereby identifying cbCYPa and cbCYPb orthologs in the genome (Fig. 3d).
Table 1. Assembly statistics and BUSCO assessment of five octocoral genomes
| Organism | Contigs (#) | Assembly size (Mb) | K-mer based genome size (Mb) | Contig N50 (kb) | Longest contig (kb) | BUSCO (% complete / % duplicated) |
| Briareum asbestinum | 22,526 | 1,254 | 1,155 | 332 | 4,126 | 87.3/4.6 |
| Dichotella gemmacea | 8,171 | 550 | 401 | 196 | 2,036 | 87.5/15.6 |
| Renilla koellikeri | 557 | 231 | 144 | 4,895 | 22,942 | 86.9/2.6 |
| Stylatula elongata | 392 | 360 | 269 | 5,139 | 39,131 | 88.0/1.0 |
| Erythropodium caribaeorum | 8,289 | 300 | NA | 233 | 1,576 | 86.4/0.6 |
| Corallium rubrum45 | 20 chromosomes, 951 scaffolds | 545 | 540 | 1,600 (scaffold N50 18,521 kb) | 115,000 | 88.5/1.2 |
Biochemical validation of the briarane BGC
To support the involvement of the briarane BGC in the early stages of briarane biosynthesis, we set out to biochemically validate the clustered biosynthetic genes (SI Table 9). We first tested whether all the homologous cbTC terpene synthases functioned as cembrene B cyclases, as we originally established in R. muelleri and E. caribaeorum.27,28 We expressed each of them, six in total, in Saccharomyces cerevisiae and evaluated their production profiles by GC-MS analysis (Fig. 4, SI Fig. 7). In all cases, we confirmed that GGPP (9) was cyclized into cembrene B (10) as the sole product.
For the remaining enzyme assays, we tested the functions of genes from the R. koellikeri BGC and additional representative homologs identified by synteny. All tested enzyme homologs showed identical reactivity. We evaluated the functions of the three co-localized CYP subtypes in pairwise fashion by co-expression with cognate cbTC genes using previously established methods (Fig. 4).50–52 We consistently measured the production of 19-hydroxycembrene B (11) in yeast upon co-expression with cbCYPb homologs. Its formation proceeded in a regiospecific manner with no other detectable monooxygenated products (SI Fig. 8). Scale-up fermentation and chromatographic purification provided 11 in pure form, which we characterized by NMR spectroscopic analyses (SI Tables 10 and 11, NMR supplementary note, SI Fig. 9). When we instead co-expressed cbCYPc homologs with cbTC genes, we observed formation of 17-hydroxycembrene (S1), which could be detected by LCMS (SI Fig. 10) and characterized by NMR (SI Tables 11 and 12, NMR supplementary note, SI Fig. 9). Because cbCYPb and cbCYPc were found to oxidize complementary positions on 10 with respect to the briarane pathway, we tested the cbCYPb product 11 by feeding it to yeast expressing cbCYPc homologs (SI Fig. 11). We clearly observed a single oxidized product that, upon purification and combined spectroscopic (SI Tables 10 and 11, NMR supplementary note, SI Fig. 9) and X-ray crystallographic characterization (SI Fig. 12, SI Table 12), was established as (7S)-7,19-dihydroxycembrene B (12). Likewise, when yeast harboring cbCYPb genes were fed S1, we similarly obtained 12, albeit with lower rates of conversion (see Methods). The cooperative and related functions of the cbCYPb and cbCYPc enzymes correlate well with their shared evolution (Fig. 3c). Despite numerous attempts with different substrate and gene pairs, we have yet to observe a function for the third cytochrome P450, cbCYPa.
As a variety of cembrene double bond isomers are produced by corals, we considered the possibility that the described CYP450 enzymes could act non-specifically on other cembrene isomers. We thus co-expressed cbCYPb homologs with terpene synthases that selectively produce either cembrene A or C. However, no oxidized products were produced (SI Fig. 13), showing high specificity of cbCYPb for cembrene B.
We next evaluated the co-localized short-chain dehydrogenase cbSDH homologs for their possible involvement in further transforming the diol 12. We expressed cbSDHs in Escherichia coli to conduct a series of in vitro incubation experiments. Incubation of purified cbSDH enzymes with NADP and 11 produced aldehyde (S2), while incubation with the diol 12 produced cembrene B γ-lactone (13), which bears a structural feature diagnostic of all briarane diterpenoids (SI Figs. 14 and 15). Two consecutive dehydrogenation steps were detected by LCMS via the formation of a product ion at m/z 301.2183 ([M + H]+, calcd. m/z 301.2163 for C20H29O2+). We scaled up the in vitro reaction of 12 and NADP with the cbSDH to provide lactone 13, which was purified and characterized spectroscopically. Scale-up experiments were conducted with representative characterized genes to unambiguously determine all structures. The structure was supported by NMR (SI Tables 11 and 12, NMR supplementary note, SI Fig. 9) and the diagnostic γ-lactone carbonyl stretch in the IR spectrum (νmax 1757 cm−1). It is possible that 13 forms from 12 via a five-membered hemiacetal intermediate (SI Fig. 16).
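The agreement between the observed and calculated m/z values for the [M + H]+ ion of 13 can be checked with a short Python sketch. It assumes standard monoisotopic masses for C, H, and O and subtracts one electron mass for the positive charge; depending on the rounding of these constants, the calculated value may differ from the reported one in the last decimal place. The function names are illustrative and not taken from any software used in this study.

```python
# Monoisotopic masses (u) of the most abundant isotopes; standard reference values.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON_MASS = 0.00054858

def cation_mz(composition):
    """m/z of a singly charged cation whose formula already includes the added proton."""
    neutral_sum = sum(MONOISOTOPIC[element] * count for element, count in composition.items())
    # Removing one electron accounts for the single positive charge.
    return neutral_sum - ELECTRON_MASS

def ppm_error(observed, calculated):
    """Relative mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6

# C20H29O2+ is the ion formula reported in the text; 301.2183 is the observed m/z.
calculated_mz = cation_mz({"C": 20, "H": 29, "O": 2})
error_ppm = ppm_error(301.2183, calculated_mz)
print(f"calcd m/z {calculated_mz:.4f}, mass error {error_ppm:.1f} ppm")
```

Under these assumptions the observed ion lies within a few parts per million of the calculated value, consistent with the assigned formula.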