Mining of Cyanobacterial Genomes Indicates That Plasmids Are Involved in the Production of Natural Products


Background Microbial natural products have unique chemical structures and diverse biological activities. Cyanobacteria commonly possess a wide range of biosynthetic gene clusters to produce natural products. Several studies have mapped the distribution of natural product biosynthetic gene clusters in cyanobacterial genomes. However, little attention has been paid to natural product biosynthesis in plasmids. Some genes encoding cyanobacterial natural product biosynthetic pathways are believed to be dispersed by plasmids through horizontal gene transfer. Thus, we examined complete cyanobacterial genomes to assess whether plasmids are involved in the production and dissemination of natural products by cyanobacteria.
Results The 185 analyzed genomes possessed 1 to 42 gene clusters, with an average of 10. In total, 1816 biosynthetic gene clusters were found. Approximately 95% of these clusters were located in chromosomes and the remaining 5% in plasmids, from which homologs of the biosynthetic pathways for aeruginosin, anabaenopeptin, ambiguine, cryptophycin, hassallidin, geosmin, and microcystin were manually curated. The cryptophycin pathway has previously been described as active, while the other gene clusters include all the genes required for biosynthesis. Approximately 12% of the 424 analyzed cyanobacterial plasmids contained homologs of genes involved in conjugation. Large plasmids, previously termed “chromids”, were also found to be widespread in cyanobacteria. Sixteen cryptic natural product biosynthetic gene clusters and geosmin biosynthetic gene clusters were located in these mobile plasmids.
Conclusion Homologues of genes involved in the production of toxins, protease inhibitors, odorous compounds, antimicrobials, antitumor agents, and other unidentified natural products are located in cyanobacterial plasmids. Some of these plasmids are predicted to be conjugative.
The present study provides in silico evidence that plasmids are involved in the distribution of natural product biosynthetic pathways in cyanobacteria.


Background
Microbial natural products originate in secondary metabolism and exhibit a wide diversity of chemical structures and biological activities [1]. These metabolites can act as antibiotics, anticancer agents, antivirals, and toxins, and also find applications as enzyme inhibitors, polymers, or surfactants [2]. The enzymes involved in the biosynthesis of natural products are commonly encoded in contiguous stretches of DNA known as biosynthetic gene clusters (BGCs) [3][4][5]. BGCs usually include genes encoding core biosynthetic and tailoring enzymes, as well as regulatory and resistance genes [6,7]. Among the accessory enzymes, 4-phosphopantetheinyl transferases (PPTs) play a major role in the biosynthesis of several natural products by converting inactive apo-proteins into their active holo-forms [8][9][10].
Cyanobacterial natural product BGCs are mostly concentrated in the genomes of late-branching cyanobacteria, mainly in the orders Oscillatoriales and Nostocales, although BGCs are found in almost all cyanobacterial genomes [22,30,31]. Several studies have attempted to map the distribution of BGCs in these organisms [22]. However, cyanobacterial studies have paid little attention to whether these BGCs are located in chromosomes or plasmids [22,29-32]. Horizontal gene transfer (HGT) events are linked to the dissemination of many cyanobacterial natural product BGCs, including those for toxins of the cylindrospermopsin, microcystin, anatoxin-a, and saxitoxin families [33][34][35].
Plasmids play a key role in HGT, and conjugation is one of the processes that can transfer genetic material [36,37]. The most frequent mechanism of DNA conjugation in gram-negative bacteria involves a relaxosome, which includes a relaxase and a type IV coupling protein (T4CP) encoded by mobility genes (MOB), and a transferosome assembled by a type IV secretion system (T4SS) encoded by mating pair formation genes (MPF) [38,39]. During conjugation, the relaxase cleaves the DNA to be transferred at a site called oriT and binds covalently to it [40]. The T4SS is then believed to act as a secretion channel that transfers the DNA and the attached relaxase into the recipient cell [41]. For this purpose, the T4CP recognizes, energizes, and delivers the nucleoprotein complex to the T4SS [42]. Plasmids encoding all three components are called self-transmissible or conjugative, while mobilizable plasmids usually encode the MOB genes but lack a T4SS and are transmitted only in the presence of a helper conjugative plasmid [40]. Although most cyanobacterial plasmids were found to lack the full set of genes required for conjugation [39], no combined analysis of the presence of BGCs in plasmids and of plasmid mobility is currently available for cyanobacteria.
Thus, the present study screened 185 complete genomes publicly available in the GenBank database [43]: 184 from the phylum Cyanobacteria and one from Candidatus Melainabacteria, a phylum closely related to cyanobacteria [44]. The distribution of natural product BGCs and of the key enzymes PPTs between chromosomes and plasmids was investigated. Moreover, plasmid mobility was predicted and the phylogeny of known BGCs was inferred. We found evidence that plasmids are involved in the production of several natural products and in the HGT of BGCs in Cyanobacteria.

Results
According to the latest proposed classification of cyanobacteria, approximately 37% of the analyzed genomes belong to the order Synechococcales (mainly Synechococcus and Prochlorococcus), followed by Nostocales (26% of genomes; the genus Nostoc alone accounted for approximately 11% of the total dataset) (Fig. 1). The remaining 37% of the genomes were distributed among the orders Gloeobacterales, Gloeomargaritales, Synechococcales, Pleurocapsales, and Chroococcidiopsidales. No genome of the order Spirulinales was analyzed here because none was available.

General features of the evaluated genomes
From the 52 genera represented in the retrieved dataset, 27 included more than one genome; these genera were used to calculate the averages and standard deviations of genomic characteristics (Table 1). The cyanobacterial genomes consisted of up to two chromosomes and 14 plasmids (Table S1). Chromosomes accounted for 97-100% of the genomic DNA, while plasmids represented up to 3% (Table S2). Genome sizes ranged from 1.65 to 12.05 Mb, GC content from 30.8 to 68.7%, and the number of genes from 1816 to 11674 (Table S1). The number of BGCs in chromosomes ranged from 3 to 42, and up to five were encoded in plasmids.
Table 1 Genome statistics of cyanobacterial genera with more than one complete sequence in NCBI GenBank [43]. Averages and standard deviations of genome size, GC content, number of genes, BGCs in the chromosomes and plasmids, and the total number of BGCs in the genome were calculated. Gen = genome, Pld = plasmid, BGC = biosynthetic gene cluster, Chr = chromosome.

Biosynthetic potential
A total of 1816 BGCs were identified, approximately 10 per genome (Table S1). Synechococcus sp. JA-2-3B'a(2-13), Candidatus Melainabacteria MEL.A1, and Synechococcus sp. JA-3-3Ab had only one BGC each and were thus the genomes with the lowest number of BGCs. In contrast, Moorea producens JHB and Moorea producens PAL-8-15-08-1 had 42 BGCs each (Fig. S1). Nostocales genomes were among those with the highest average number of BGCs. The number of BGCs appears to correlate with genome size (Fig. 2).
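The genus-level averages in Table 1 and the size/BGC relationship in Fig. 2 can be illustrated with a minimal sketch. It assumes a simple list of (genus, genome size in Mb, number of BGCs) records; the records below are hypothetical toy values, not data from Table S1. It also assumes the sample standard deviation and the Pearson coefficient, since the text does not state which estimators were used in Excel.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy records: (genus, genome size in Mb, number of BGCs).
# Illustrative values only; they are NOT taken from Table S1.
GENOMES = [
    ("Nostoc", 7.0, 14),
    ("Nostoc", 8.0, 16),
    ("Synechococcus", 2.5, 3),
    ("Synechococcus", 3.0, 5),
    ("Gloeobacter", 4.6, 6),  # singleton genus: excluded from genus averages
]

def genus_summary(records):
    """Mean and sample SD of BGC counts for genera with more than one genome."""
    by_genus = defaultdict(list)
    for genus, _size, n_bgc in records:
        by_genus[genus].append(n_bgc)
    return {
        genus: (mean(counts), stdev(counts))
        for genus, counts in by_genus.items()
        if len(counts) > 1  # stdev() needs at least two values
    }

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

summary = genus_summary(GENOMES)
r = pearson_r([size for _g, size, _n in GENOMES], [n for _g, _s, n in GENOMES])
print(summary)
print(f"r = {r:.2f}")
```

A coefficient close to 1 only indicates a linear association on these toy points; it says nothing about causation or about the real dataset.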
Most BGCs were identified in chromosomes (1719, approximately 95% of the total of 1816). Among these, RiPPs were the most widespread class of BGC products (526 representatives, approximately 31% of the chromosomal BGCs). Terpenes were the second most widespread products (470 representatives, approximately 27% of the chromosomal BGCs) and were absent only in Arthrospira platensis C1, Candidatus Melainabacteria bacterium MEL.A1, and Nostoc sphaeroides CCNUC1. PKS was the least frequent class, with only 49 representatives.
Hybrid NRPS/PKS BGCs were the most common class among the natural product BGCs located on plasmids (26 BGCs), followed by NRPS (20 representatives) and bacteriocins (15 representatives). In contrast to chromosomes, terpenes were among the least frequent BGC classes encoded on plasmids.
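The percentages quoted above follow directly from the reported counts. A minimal sketch that recomputes them, using the total of 1816 BGCs and the chromosomal counts given in this section (the helper name `percent` is ours, for illustration):

```python
# BGC counts reported in this section.
TOTAL_BGCS = 1816
CHROMOSOMAL_BGCS = 1719
CHROMOSOMAL_BY_CLASS = {"RiPP": 526, "terpene": 470, "PKS": 49}

def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# BGCs not on chromosomes are, by elimination, plasmid-borne.
plasmid_bgcs = TOTAL_BGCS - CHROMOSOMAL_BGCS
shares = {
    "chromosomes": percent(CHROMOSOMAL_BGCS, TOTAL_BGCS),
    "plasmids": percent(plasmid_bgcs, TOTAL_BGCS),
}
# Class shares are relative to chromosomal BGCs, as in the text.
class_shares = {cls: percent(n, CHROMOSOMAL_BGCS) for cls, n in CHROMOSOMAL_BY_CLASS.items()}
print(plasmid_bgcs, shares, class_shares)
```

This yields 97 plasmid-borne BGCs, about 94.7% versus 5.3% by replicon, and about 30.6% RiPPs and 27.3% terpenes among chromosomal BGCs, consistent with the rounded values in the text.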

Known biosynthetic pathways on plasmids
Several of the analyzed complete genomes contained large plasmids (here defined as > 500 kb), reaching a maximum of approximately 2.5 Mb in Stanieria cyanosphaera PCC 7437 plasmid pSTA7437.02 (Table S2). These large replicons contained one to five BGCs, more than the remaining analyzed plasmids, which carried at most three. A 16S rRNA phylogeny (Fig. 3A) was compared with phylogenetic trees built from manually curated known BGCs found in plasmids and chromosomes (Fig. 3B-E). In the geosmin BGC tree, plasmid BGCs appear to share recent ancestors with chromosomal BGCs but tend to form their own clade, whose placement is incongruent with the 16S rRNA phylogeny. Thus, these two BGCs from Nostoc plasmids might have been acquired through HGT and possibly face different evolutionary pressures than the geosmin BGCs present in chromosomes. In contrast, the hassallidin and anabaenopeptin BGC trees show no evidence of HGT. Insufficient information is available for the microcystin BGC, as only a single cluster was found in plasmids (Fig. 3D).
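Flagging large plasmids with the > 500 kb threshold used above, and comparing their BGC load with that of smaller plasmids, can be sketched as follows. The input layout (genome identifier mapped to a list of plasmid name, length in bp, and BGC count) and all names and values are hypothetical, chosen only to illustrate the bookkeeping.

```python
LARGE_PLASMID_BP = 500_000  # threshold used in the text (> 500 kb)

# Hypothetical toy input: genome -> list of (plasmid name, length in bp, number of BGCs).
PLASMIDS = {
    "genome_A": [("pA1", 2_500_000, 4), ("pA2", 40_000, 0)],
    "genome_B": [("pB1", 120_000, 1)],
    "genome_C": [],  # genome without plasmids
}

def large_plasmids(plasmids, threshold=LARGE_PLASMID_BP):
    """Names of plasmids strictly longer than `threshold` base pairs."""
    return [name for name, length, _n in plasmids if length > threshold]

def fraction_with_large_plasmid(genomes):
    """Fraction of genomes carrying at least one plasmid above the threshold."""
    hits = sum(1 for plasmids in genomes.values() if large_plasmids(plasmids))
    return hits / len(genomes)

def max_bgcs(genomes, large):
    """Maximum BGC count among large (large=True) or remaining (large=False) plasmids."""
    counts = [
        n
        for plasmids in genomes.values()
        for _name, length, n in plasmids
        if (length > LARGE_PLASMID_BP) == large
    ]
    return max(counts, default=0)

print(fraction_with_large_plasmid(PLASMIDS), max_bgcs(PLASMIDS, True), max_bgcs(PLASMIDS, False))
```

Size alone is only a proxy: whether a large replicon qualifies as a chromid also depends on features such as its replication system and the essential genes it carries.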

Distribution of 4-phosphopantetheinyl transferases
A total of 163 PPTs were found (Table S4). Of the 185 complete genomes analyzed here, 155 (approximately 84%) had at least one PPT homolog. The majority (148) encoded only one enzyme, while 6 genomes encoded two and 1 genome (Halomicronema hongdechloris C2206) encoded three. The enzymes ranged in size from 137 aa (one of the two copies in Chroococcidiopsis thermalis PCC 7203) to 339 aa (one of the three copies in Halomicronema hongdechloris C2206); however, approximately 90% of them (147 enzymes) were between 200 and 280 aa. The genome of Acaryochloris marina MBIC11017 encoded one PPT in the chromosome and another in plasmid pREB1. The plasmid-encoded enzyme was more similar to an AcpS-type PPT than to an Sfp-type PPT from Bacillus subtilis. The remaining 162 PPTs in the cyanobacterial genomes were likely Sfp-type.
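The PPT counts in this paragraph can be cross-checked with a few lines of arithmetic. This sketch uses only the copy-number distribution, the 147 enzymes in the 200-280 aa range, and the single AcpS-like enzyme reported in the text; the variable names are ours.

```python
# Copy-number distribution reported in the text: copies per genome -> number of genomes.
PPT_COPIES = {1: 148, 2: 6, 3: 1}
N_GENOMES = 185
N_PPT_200_280_AA = 147  # enzymes between 200 and 280 aa
N_ACPS_LIKE = 1         # plasmid-encoded enzyme of Acaryochloris marina MBIC11017

genomes_with_ppt = sum(PPT_COPIES.values())
# Total enzymes = sum over copy classes of (copies per genome x number of genomes).
total_ppts = sum(copies * n for copies, n in PPT_COPIES.items())
pct_genomes_with_ppt = round(100 * genomes_with_ppt / N_GENOMES, 1)
pct_in_size_range = round(100 * N_PPT_200_280_AA / total_ppts, 1)
n_sfp_like = total_ppts - N_ACPS_LIKE
print(genomes_with_ppt, total_ppts, pct_genomes_with_ppt, pct_in_size_range, n_sfp_like)
```

The distribution sums to 155 genomes and 163 enzymes; 147 of 163 is about 90.2%, and removing the single AcpS-like enzyme leaves 162, matching the figures quoted above.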

Homologs of proteins involved in conjugation
The 424 plasmids were searched for homologs of the relaxase, VirB4, and VirD4 genes encoded in Nostoc sp. PCC 7120 plasmid pCC7120alpha (Fig. 4). The presence of these three genes was used to predict plasmid transmissibility (Table S5).
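A presence/absence rule of this kind can be sketched as below. The mapping is our simplified assumption, derived from the definitions in the Background: all three markers (relaxase, the VirD4 coupling protein, and VirB4 as a T4SS marker) suggest a conjugative plasmid, a relaxase without the full machinery suggests a mobilizable one, and no relaxase is treated as non-mobilizable. The class `PlasmidHits` and the example plasmids are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlasmidHits:
    """Presence (True) or absence (False) of BLASTp homologs on one plasmid."""
    name: str
    relaxase: bool  # MOB relaxase
    virb4: bool     # VirB4 ATPase, used here as a marker of the T4SS (MPF)
    vird4: bool     # VirD4, the type IV coupling protein (T4CP)

def predict_mobility(hits: PlasmidHits) -> str:
    """Assign a coarse mobility class from marker-gene presence (simplified rule)."""
    if hits.relaxase and hits.virb4 and hits.vird4:
        return "conjugative"  # relaxase + T4CP + T4SS marker all present
    if hits.relaxase:
        return "mobilizable"  # relaxase present, conjugation machinery incomplete
    return "non-mobilizable"  # no relaxase homolog detected

# Hypothetical example plasmids (names and hits are illustrative only).
examples = [
    PlasmidHits("pExample1", relaxase=True, virb4=True, vird4=True),
    PlasmidHits("pExample2", relaxase=True, virb4=False, vird4=True),
    PlasmidHits("pExample3", relaxase=False, virb4=False, vird4=False),
]
predictions = {p.name: predict_mobility(p) for p in examples}
print(predictions)
```

Because the rule relies on homology alone, highly divergent or unknown conjugation systems would be missed and such plasmids would fall into the non-mobilizable class.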

Phylogenomics
Overall, later-branching cyanobacteria from Nostocales, Oscillatoriales, and Chroococcales had more natural product BGCs than other orders (Fig. 5), and these three orders encoded BGCs of all analyzed classes. In contrast, early-branching cyanobacteria, especially those from Gloeobacterales, Synechococcales, and Gloeomargaritales, had fewer natural product BGCs (mainly terpenes and RiPPs) than the other analyzed orders. The same distribution pattern applied to mobile plasmids. PPTs were found in all analyzed cyanobacterial orders.

Discussion
Chromids are large, plasmid-like replicons that were previously found in approximately 10% of bacterial genomes [48].
Chromids possess replication systems that are similar to plasmids and can carry essential genes for cell viability [60].
One of the proposed functions of these large replicons is to increase genome plasticity through the rapid acquisition or loss of genes by HGT [61]. Here, chromids occurred in approximately 15% of the analyzed cyanobacterial genomes, and therefore seem to be more widespread in cyanobacteria than in other phyla [48].
Nostoc sp. strain ATCC 53789 is a known producer of the antiproliferative cytotoxin cryptophycin, whose BGC is encoded in a plasmid [62,63]. Other plasmid-borne BGCs found here, such as those for the hepatotoxin microcystin, the antifungal hassallidin, and the odorous terpenoid geosmin, contained all the core genes and are possibly functional [64][65][66]. Consistent with our results, plasmids have previously been shown to contain genes encoding RiPPs and to be associated with the production of toxic and odorous compounds [29,67]. Toxins produced by other bacteria, such as botulinum neurotoxin from Clostridium botulinum and cereulide from Bacillus cereus, are also encoded on plasmids [68,69]. In the case of botulinum neurotoxin, HGT of its gene cluster by conjugative plasmids < 200 kb is likely [70].
Only the plasmid pCC7120α from Nostoc sp. PCC 7120 has been reported to be transmissible [71]. Nevertheless, our results indicate that other cyanobacterial plasmids may also be conjugative. A previous study using automatic annotation found no homologs of the T4SS in cyanobacteria and hypothesized that an unknown conjugation mechanism could be present in these organisms [72]. Given the limited number of available cyanobacterial genome sequences, it remains unclear whether cyanobacterial plasmids are predominantly immobile, unlike those of other bacterial phyla [72].
NRPSs and PKSs are multidomain enzymes whose domains have specific functions in the biosynthesis of non-ribosomal peptides and polyketides, respectively [73,74]. While the core module of an NRPS consists of at least the adenylation, condensation, and peptidyl carrier protein domains, the acyltransferase, acyl carrier protein, and ketoacyl synthase domains form the core of a PKS module [75,76]. Because these carrier protein domains must be converted from their inactive apo-form into the active holo-form, PPTs are essential for the biosynthesis of these natural products. Two main families of PPTs are known: AcpS-type PPTs, which activate carrier proteins involved in primary metabolism, and Sfp-like PPTs, which are involved in secondary metabolism pathways [9,77].
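The module composition described above maps naturally onto a set-containment check. This is a hypothetical sketch, not a tool used in the study: each module is written as an ordered list of conventional domain abbreviations, the core sets follow the text, and the carrier domains (PCP, ACP) are the ones that would require PPT activation. Note that it applies the text's definition literally, so NRPS starter modules, which often lack a condensation domain, would fail the check.

```python
# Core domains as listed in the text (conventional abbreviations).
NRPS_CORE = {"C", "A", "PCP"}     # condensation, adenylation, peptidyl carrier protein
PKS_CORE = {"KS", "AT", "ACP"}    # ketoacyl synthase, acyltransferase, acyl carrier protein
CARRIER_DOMAINS = {"PCP", "ACP"}  # apo-carriers that a PPT converts to the holo-form

def has_core_domains(domains, kind):
    """Return True if a module's domains include every core domain for its type."""
    core = {"NRPS": NRPS_CORE, "PKS": PKS_CORE}[kind]
    return core.issubset(domains)

def carriers_needing_ppt(domains):
    """Return, in order, the carrier domains of a module that require PPT activation."""
    return [d for d in domains if d in CARRIER_DOMAINS]

# Hypothetical modules written as ordered domain lists.
nrps_module = ["C", "A", "PCP"]
pks_module = ["KS", "AT", "KR", "ACP"]  # KR: optional ketoreductase domain
truncated = ["A", "PCP"]                # lacks C, so it fails the core check

print(has_core_domains(nrps_module, "NRPS"), has_core_domains(pks_module, "PKS"))
print(has_core_domains(truncated, "NRPS"), carriers_needing_ppt(pks_module))
```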
In cyanobacteria, only a single Sfp-like PPT per genome had previously been found across 29 genomes [78]. However, the present study revealed that some cyanobacterial genomes can encode up to three different PPTs. Other bacteria also contain multiple copies of these enzymes in their genomes [10,79]. Interestingly, the only AcpS-like PPT identified here was encoded on a plasmid of Acaryochloris marina MBIC11017, which suggests that this enzyme could be transferred horizontally together with BGCs. Consistent with our results, plasmids from other bacterial phyla have also been found to encode PPTs [80,81].
RiPP gene clusters were found in almost all analyzed genomes. These molecules are produced by post-translational modification of ribosomally synthesized precursor peptides [82]. Thus far, over 20 families of compounds with unique chemical features have been proposed [82]. Cyanobacteria encode the machinery to produce several RiPPs, including cyanobactins [83], lanthipeptides [84], lassopeptides [85], and microviridins [86]. Although cyanobactin BGCs are widespread in cyanobacteria and initially received the most attention, other cyanobacterial RiPPs are also being explored [29,84]. As automated tools improve in predicting the genes involved in the biosynthesis of these compounds, future studies may expand the known repertoire of RiPPs produced by cyanobacteria [47,87].
Although terpenes are commonly isolated from plants and fungi, genes involved in their biosynthesis are widely found in bacterial genomes [88]. Terpenoids are essential in primary metabolism, for example in photosynthesis and respiration, but also act as secondary metabolites [89]. This could explain why genes encoding terpene biosynthetic enzymes are present in cyanobacterial genomes [90]. In cyanobacteria, geosmin and 2-methylisoborneol are widely studied terpenes because they are odorous metabolites that impair drinking water quality [64,91,92]. Nevertheless, the repertoire of terpenes produced by cyanobacteria is possibly larger than currently known, as various cryptic terpene synthases are found in their genomes [30,88].

Conclusion
The availability of complete genomes allowed the mapping of BGCs in plasmids and the detection of known pathways for toxins (microcystin), odorous metabolites (geosmin), protease inhibitors (anabaenopeptin, aeruginosin), antimicrobial compounds (ambiguine and hassallidin), and antitumor compounds (cryptophycin). These findings provide new in silico evidence that plasmids are involved in the biosynthesis of diverse natural products. Cyanobacterial plasmids also appear to be involved in the dissemination of BGCs by HGT. The likelihood of mobility of natural product BGCs seems to be higher in certain orders with larger genomes, particularly Nostocales. Future research should therefore investigate the potential transmission of BGCs between cyanobacteria in vivo. If such transmission occurs, it would present new biotechnological opportunities but also environmental and economic risks: cyanobacteria currently believed to be harmless could acquire genes for toxin biosynthesis.

Methods
Cyanobacterial genomes
Genomes of the "Cyanobacteria/Melainabacteria group" deposited in NCBI GenBank [43] between 27 July 2001 and 14 January 2020 at the "Complete" and "Chromosome" assembly levels were analyzed. Altogether, they included 184 genomes from the phylum Cyanobacteria and 1 genome from Candidatus Melainabacteria (Table S1). Genome assembly statistics were obtained from GenBank. Averages, standard deviations, boxplots, and scatter plots were generated using Microsoft Excel v16.0.6742.2048 (Microsoft, Redmond, WA, USA).

Identification of natural product pathways and other proteins of interest
Gene clusters involved in secondary metabolite pathways were automatically annotated with antiSMASH v5.1.1 [47].
Manual annotation and curation were performed in Artemis v18.1.0 [93], and sequences were compared against the NCBI GenBank database using BLASTp [94]. For the manual identification of BGCs in plasmids and of the gene clusters involved in plasmid mobility, orthologs were assigned using an e-value ≤ 1e-20 and identity ≥ 60%. Because the identification of relaxase, VirB4, and VirD4 homologs spanned a wide diversity of strains, relaxed parameters were used (e-value ≤ 1e-20, identity ≥ 20%). Plasmid maps were generated using the default BLAST parameters of the GView server [95] and BRIG v0.95 [96]. Inkscape v0.92 (https://inkscape.org/) was used for drawing BGCs.
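Applying the two sets of cutoffs above to BLASTp results can be sketched as a simple filter. This assumes BLAST+ tabular output with the default `-outfmt 6` columns and filters only on percent identity (`pident`) and e-value; the text does not say whether coverage or other fields were also considered. The function name and the sample rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def filter_hits(tsv_text, max_evalue=1e-20, min_identity=60.0):
    """Return (query, subject) pairs whose e-value and percent identity pass the cutoffs."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=OUTFMT6_FIELDS, delimiter="\t")
    return [
        (row["qseqid"], row["sseqid"])
        for row in reader
        if float(row["evalue"]) <= max_evalue and float(row["pident"]) >= min_identity
    ]

# Hypothetical BLASTp rows (identifiers and values are made up).
SAMPLE = "\n".join([
    "bgcGeneA\tsubjectA\t75.0\t300\t75\t0\t1\t300\t1\t300\t1e-50\t450",
    "relaxaseB\tsubjectB\t35.0\t280\t182\t3\t1\t280\t5\t284\t1e-30\t120",
    "geneC\tsubjectC\t90.0\t50\t5\t0\t1\t50\t1\t50\t1e-5\t60",
])

orthologs = filter_hits(SAMPLE)                         # BGC genes: identity >= 60%
mobility_hits = filter_hits(SAMPLE, min_identity=20.0)  # relaxase/VirB4/VirD4: >= 20%
print(orthologs, mobility_hits)
```

With these toy rows, only the first hit passes the strict cutoff, the relaxed cutoff also admits the 35% identity hit, and the third row is rejected in both cases because its e-value is above 1e-20.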

Phylogenetic analyses
Phylogenies of the concatenated BGC genes and of 16S rRNA were inferred in MrBayes 3.2.7a [97] with 5,000,000 generations. The best substitution model for each gene in the BGCs was selected using the Bayesian information criterion (BIC) in jModelTest v2.
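BIC-based model selection ranks candidate models by BIC = -2 lnL + k ln(n), where lnL is the maximized log-likelihood, k the number of free parameters, and n the sample size; the model with the lowest BIC is preferred. A minimal sketch, assuming n is the number of alignment sites and using hypothetical log-likelihoods and parameter counts (they do not reflect real jModelTest output, which also counts branch lengths):

```python
from math import log

def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion, BIC = -2 lnL + k ln(n); lower is better."""
    return -2.0 * log_likelihood + n_params * log(n_sites)

def select_model(candidates, n_sites):
    """Return the name of the model with the lowest BIC.

    `candidates` maps model name -> (maximized log-likelihood, free parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))

# Hypothetical fits for one gene alignment of 1500 sites (illustrative values).
fits = {
    "JC": (-5200.0, 1),
    "HKY+G": (-5050.0, 6),
    "GTR+I+G": (-5045.0, 11),
}
best = select_model(fits, n_sites=1500)
print(best)
```

Here GTR+I+G has the highest likelihood, but its five extra parameters cost about 36.6 BIC units (5 × ln 1500) against a likelihood gain worth only 10 units, so the simpler HKY+G model is selected. This penalty is what distinguishes BIC from choosing by likelihood alone.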

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files. The phylogenetic and phylogenomic datasets generated and/or analyzed during the current study are available in the TreeBase repository (study 27385), http://purl.org/phylo/treebase/phylows/study/TB2:S27385?x-access-code=a0eb8a6420200cbeaee9110ca518fe8f&format=html

Competing interests
The authors declare that they have no competing interests.