The Sclerotinia sclerotiorum genome contains 80 putative secondary metabolite clusters
Secondary metabolite biosynthetic gene clusters are ubiquitous among fungi and may constitute an important adaptive component of the fungal genome. To determine how many secondary metabolites S. sclerotiorum potentially produces and aid future investigations into their functions, we used several software packages to predict secondary metabolite biosynthetic gene clusters in the S. sclerotiorum genome.
We found that antiSMASH predicted 87 clusters containing 1,630 genes, while SMURF predicted 46 clusters containing 490 genes (Supplementary Table 1). Thirty SMURF clusters overlapped with antiSMASH clusters. Of these overlapping SMURF clusters, 29 contained predicted PKS, NRPS or PKS/NRPS-like backbone enzymes, while one contained a dimethylallyl tryptophan synthase (DMATS) identified by both SMURF and antiSMASH. Two clusters identified by antiSMASH as fatty acid biosynthesis clusters were excluded from further analyses (Supplementary Table 1). These clusters contained fungal type I fatty acid synthase and type II fatty acid synthase domains but no other biosynthetic or tailoring enzymes.
The 16 SMURF clusters that were not predicted by antiSMASH did not contain genes encoding known biosynthetic backbone enzymes, and few contained tailoring enzymes, transporters or transcription factors. Therefore, only the largest SMURF-only cluster (20 genes), which contained cytochrome P450, transporter and transcription factor genes, was included in further analyses. The other putative clusters are listed in Supplementary Table 1.
Secondary metabolite clusters are often transcriptionally co-regulated. Therefore, to further test the antiSMASH and SMURF predictions, we also analysed the expression of secondary metabolite cluster genes using an existing RNA sequencing dataset profiling S. sclerotiorum gene expression in vitro and during infection of B. napus [4]. Using FunGeneClusterS, we detected 174 clusters of three or more neighbouring co-regulated genes (Figure 1, Supplementary Table 2), which overlapped with 37 antiSMASH-predicted clusters and 12 SMURF-predicted clusters.
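The algorithm behind the co-expression clusters is not described in this section. As a simplified, hypothetical stand-in (not FunGeneClusterS), the sketch below links each pair of neighbouring genes whose expression profiles have a Pearson correlation of at least `r_min` and reports maximal runs of at least `min_len` linked genes. The function names and the threshold of 0.8 are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def coexpressed_runs(profiles, r_min=0.8, min_len=3):
    """Return inclusive (start, end) index pairs of maximal runs of neighbouring
    genes whose adjacent profiles correlate at >= r_min.

    profiles: expression vectors (e.g. one value per condition) in chromosomal order.
    r_min and min_len are hypothetical thresholds, not FunGeneClusterS settings.
    """
    runs = []
    start = 0
    for i in range(1, len(profiles) + 1):
        # Is gene i linked to gene i - 1? The final iteration always closes the run.
        linked = i < len(profiles) and pearson(profiles[i - 1], profiles[i]) >= r_min
        if not linked:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = i
    return runs
```

For example, three neighbouring genes with proportional profiles followed by unrelated genes yield a single run covering indices 0 to 2.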
To obtain a final set of putative secondary metabolite biosynthetic gene clusters based on the predictions of these three software packages, we used the following procedure: 1) clusters were formed from the union of antiSMASH and SMURF predictions (excluding 15 SMURF-only clusters); 2) clusters were extended to include adjoining clusters of co-expressed genes; and 3) clusters separated by a gap of three or fewer genes were joined. Four pairs of clusters and one set of three clusters were joined, and 33 clusters were extended, resulting in 80 clusters (Table 1, Figure 1), of which 46 contained three or more co-expressed genes.
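The three-step merging procedure can be sketched on gene-index intervals for a single chromosome. This is a minimal illustration, assuming clusters are inclusive (start, end) positions in gene order and that "adjoining" means overlapping or directly adjacent; the function names are hypothetical, not part of antiSMASH, SMURF or FunGeneClusterS.

```python
def merge_overlapping(intervals):
    """Union of inclusive (start, end) gene-index intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def extend_with_runs(clusters, runs):
    """Step 2: grow each cluster over co-expressed runs that overlap or directly adjoin it.

    Repeats until stable, because one extension can bring another run into contact.
    Runs never create a cluster on their own.
    """
    extended = []
    for start, end in clusters:
        changed = True
        while changed:
            changed = False
            for r_start, r_end in runs:
                touches = r_start <= end + 1 and r_end >= start - 1
                if touches and (r_start < start or r_end > end):
                    start, end = min(start, r_start), max(end, r_end)
                    changed = True
        extended.append((start, end))
    return merge_overlapping(extended)


def join_close(clusters, max_gap=3):
    """Step 3: join consecutive clusters separated by max_gap or fewer genes."""
    joined = []
    for start, end in sorted(clusters):
        if joined and start - joined[-1][1] - 1 <= max_gap:
            joined[-1] = (joined[-1][0], max(joined[-1][1], end))
        else:
            joined.append((start, end))
    return joined


def build_clusters(antismash, smurf_kept, coexpr_runs, max_gap=3):
    """Step 1 (union), then extension, then gap-joining."""
    clusters = merge_overlapping(antismash + smurf_kept)
    clusters = extend_with_runs(clusters, coexpr_runs)
    return join_close(clusters, max_gap)
```

Joining is the only step that reduces the cluster count. The reported numbers are consistent with this: 87 − 2 = 85 antiSMASH clusters plus one retained SMURF-only cluster gives 86; joining four pairs removes four clusters and joining one set of three removes two, leaving 80.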
Genes encoding biosynthetic backbone enzymes in the clusters included 5 NRPSs, 17 PKSs, 2 hybrid PKS-NRPSs, 96 NRPS-like and PKS/NRPS-like proteins and one DMATS (Supplementary Table 3a). Six clusters contained putative isoprenoid biosynthesis enzymes including three UbiA prenyltransferases, two squalene/phytoene synthases and a polyprenyl synthase. There were seven clusters with no identified backbone enzyme, while 33 clusters had two or more backbone enzymes (Supplementary Table 3). The majority of clusters contained either an ABC or MFS transporter (67%, n=54), a Zn2Cys6 transcription factor (51%, n=41), or both. Twenty-five clusters (31%) contained one or more cytochrome P450s (Supplementary Table 3).
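The percentages above are simple per-cluster tallies over the 80 clusters (for example, 54/80 = 67.5%, 41/80 = 51.25% and 25/80 = 31.25%). A minimal sketch of such a tally follows, assuming each cluster is represented by a list of coarse per-gene labels such as `"transporter"` (ABC or MFS), `"TF"` or `"P450"`; the label scheme and function name are hypothetical.

```python
from collections import Counter


def feature_summary(clusters, features):
    """Count clusters containing at least one gene of each feature class.

    clusters: mapping cluster ID -> list of per-gene annotation labels (hypothetical scheme).
    Returns feature -> (number of clusters, percentage of all clusters).
    """
    total = len(clusters)
    counts = Counter()
    for labels in clusters.values():
        for feature in features:
            if feature in labels:  # count each cluster at most once per feature
                counts[feature] += 1
    return {f: (counts[f], 100 * counts[f] / total) for f in features}
```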
Several putative secondary metabolite biosynthesis clusters in the Sclerotinia sclerotiorum genome are up-regulated during infection of Brassica napus
Many plant pathogenic fungi produce secondary metabolites that have important roles in virulence. To assess whether this may be the case for S. sclerotiorum, we used a previously published transcriptome dataset to determine the expression of biosynthetic gene clusters (BGCs) during infection of B. napus. In their original analysis of this RNA sequencing dataset, Seifbarghi et al. [4] identified 12 PKSs, four NRPSs, five NRPS-like enzymes, a phytoene synthase and a chalcone and stilbene synthase that were upregulated during infection of B. napus. All but one of these enzymes were in our predicted biosynthetic gene clusters, and our analysis agrees that most are upregulated (Supplementary Table 3). The exceptions were three PKSs and an NRPS that were upregulated in planta but not significantly so, and one NRPS (here identified as an NRPS-like protein) that we found not to be upregulated.
We found that 54 backbone enzymes in 41 clusters were significantly upregulated in planta at one or more time points (Figure 1; Supplementary Table 3). These enzymes comprised the phytoene and chalcone/stilbene synthases identified by Seifbarghi et al. [4], two NRPSs, nine PKSs, one hybrid PKS/NRPS, a UbiA prenyltransferase and 39 NRPS-like and PKS/NRPS-like proteins. Other cluster genes upregulated during infection included transcription factors (11 clusters), cytochrome P450s (16 clusters) and transporters (29 clusters). A total of 70 clusters (88%) contained at least one upregulated key gene, including tailoring enzymes, transcription factors and transporters (Figure 1). The number of upregulated backbone enzymes rose from six at 1 hour post inoculation (HPI) to 37 at 24 HPI and remained high, at 33, at 48 HPI. Together, these data indicate that many secondary metabolite biosynthesis clusters in S. sclerotiorum may function during plant infection, and that clusters may play a greater role later in infection (≥24 HPI).
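Per-time-point counts and the "one time point or more" total are different summaries of the same differential expression table. The sketch below shows both, assuming each gene has a (log2 fold change in planta versus in vitro, adjusted p-value) pair per time point; the cut-offs `padj_max=0.05` and `lfc_min=1.0` and the function names are hypothetical, as the thresholds are not stated in this section.

```python
def is_upregulated(stats, padj_max=0.05, lfc_min=1.0):
    """stats: (log2 fold change in planta vs in vitro, adjusted p-value).

    Thresholds are illustrative assumptions, not the study's settings.
    """
    log2fc, padj = stats
    return padj < padj_max and log2fc >= lfc_min


def upregulated_by_timepoint(results, genes, **thresholds):
    """Count genes in `genes` upregulated at each time point.

    results: mapping time point -> {gene ID: (log2FC, padj)}.
    """
    return {
        tp: sum(1 for g in genes if g in table and is_upregulated(table[g], **thresholds))
        for tp, table in results.items()
    }


def upregulated_at_any(results, genes, **thresholds):
    """Set of genes upregulated at one or more time points."""
    return {
        g
        for table in results.values()
        for g in genes
        if g in table and is_upregulated(table[g], **thresholds)
    }
```

Because a gene counts towards the union if it is upregulated at any time point, the union can exceed every single-time-point count, which is why 54 backbone enzymes are upregulated overall while at most 37 are upregulated at any one time point.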
Furthermore, analysis of the transcriptome data revealed 19 clusters of six or more neighbouring co-expressed genes that did not overlap with any predicted secondary metabolite cluster. This could indicate that other biosynthetic pathways, not predicted by the tools we used, are active in S. sclerotiorum. However, this is speculative; these clusters could also have functions unrelated to secondary metabolism.
Comparative analysis of putative secondary metabolite gene clusters provides insight into their potential functions
Numerous secondary metabolite biosynthesis genes have been predicted in eukaryotes, and many of them have been functionally characterised. To assess the homology of predicted S. sclerotiorum gene clusters to clusters in other eukaryotes, we conducted a MultiGeneBlast analysis against all clusters across plant, fungal and mammalian genomes in the GenBank archive (Supplementary Table 4). This identified several clusters with high similarity to homologous clusters in other fungi, including clusters with known products in the closely related fungus B. cinerea.
Most (98 of 129; 76%) of the key biosynthetic enzymes in S. sclerotiorum had homologues in B. cinerea (54–98% amino acid identity, 51–113% query coverage per subject). These include 7 of 16 PKSs (77–90% amino acid identity), all 5 identified NRPSs (71–89% amino acid identity), a phytoene synthase and a chalcone and stilbene synthase. Four of these homologous enzymes occur in biosynthetic gene clusters that have been characterised in B. cinerea and are linked to the production of melanin and the phytotoxin botcinic acid (Supplementary Table 4, Table 1). The homologous phytoene synthase occurs in a four-gene putative carotenoid biosynthesis cluster in both B. cinerea and S. sclerotiorum. A further three homologous NRPSs have been linked to siderophore biosynthesis in B. cinerea, but the associated clusters have not been characterised. The following sections describe specific clusters with homology to characterised gene clusters in B. cinerea.
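This section does not define the identity and coverage figures. As a rough guide only, the sketch below computes both from a single pairwise alignment: percent identity as identical aligned columns divided by alignment length (gaps included), and query coverage as aligned query residues divided by full query length. The function name and input format are hypothetical, and search tools may define these metrics differently.

```python
def alignment_stats(aln_query, aln_subject, query_length):
    """Percent identity and percent query coverage for one pairwise alignment.

    aln_query, aln_subject: equal-length aligned strings using '-' for gaps.
    query_length: length of the full, unaligned query protein.
    """
    if len(aln_query) != len(aln_subject):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for q, s in zip(aln_query, aln_subject) if q == s and q != "-")
    pct_identity = 100 * identical / len(aln_query)
    query_residues = sum(1 for q in aln_query if q != "-")  # query positions covered
    pct_query_cov = 100 * query_residues / query_length
    return pct_identity, pct_query_cov
```

Under this single-alignment definition, coverage cannot exceed 100%. The exact definition behind the range reported above is not stated in this section.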
Putative extracellular siderophore cluster:
We identified a putative cluster (cluster 2_4, Table 1, Figure 2A) containing a homologue of the B. cinerea siderophore NRPS6 and three other genes (encoding an ABC transporter, an enoyl-CoA hydratase and a GCN5-related N-acetyltransferase), all conserved across the Ascomycetes and known to be involved in coprogen or fusarinine biosynthesis. The B. cinerea gene NRPS6 has been categorised as an extracellular siderophore synthetase according to a phylogeny of NRPSs [31]. Three of the S. sclerotiorum genes in this cluster, sscle_02g018200 to sscle_02g018220, were significantly coexpressed according to FunGeneClusterS: the homologue of B. cinerea NRPS6 (sscle_02g018200) and its two 3′ neighbouring genes. The homologue of the ABC transporter in the B. cinerea NRPS6 cluster (sscle_02g018190), which is the gene closest to the 5′ end of the NRPS6 homologue, showed a similar expression pattern to these genes but was not significantly coregulated (Figure 2A). The other genes in this cluster were not coexpressed but were homologous to genes flanking the conserved extracellular siderophore cluster in B. cinerea.
Putative intracellular siderophore biosynthetic gene cluster:
Both NRPS2 and NRPS3 in B. cinerea were classified as intracellular siderophore biosynthesis NRPSs according to the phylogeny of Bushley and Turgeon [31]. We found that the S. sclerotiorum homologue of B. cinerea NRPS2 has a different arrangement of modules from its B. cinerea counterpart but appears to be involved in intracellular siderophore biosynthesis, since it occurs throughout the Leotiomycetes in a cluster with an L-ornithine 5-monooxygenase [32] (cluster 9_5, Table 1, Figure 2B). Genes in cluster 9_5 that were homologous to the B. cinerea NRPS2 cluster showed two distinct expression patterns. The NRPS2 homologue and an oxidoreductase were both significantly upregulated at 24–48 HPI, whereas the others were downregulated throughout infection, with some showing increased expression at 48 HPI (Figure 2B). No genes in cluster 9_5 were significantly coexpressed according to FunGeneClusterS.
The putative intracellular siderophore synthase sscle_05g044190 was homologous to B. cinerea NRPS3, which is found in B. cinerea strain T4 but not in B. cinerea strain B05.10. Homologues of this NRPS and a nearby ABC transporter are clustered in some Trichocomaceae as well as in some Rutstroemiaceae and Vibrissiaceae. However, no other siderophore biosynthesis related genes were found in the cluster. This NRPS showed low expression (< 16 FPKM) and was not upregulated during B. napus infection.
Putative carotenoid biosynthetic gene cluster:
Both S. sclerotiorum and B. cinerea contain a four-gene cluster with similarity to carotenoid gene clusters in Neurospora crassa and F. fujikuroi (cluster 2_3, Table 1, Figure 2C). All four genes in this cluster were upregulated in planta relative to in vitro at 24 HPI, and three of them were also upregulated at 48 HPI. These four genes and three others further downstream in cluster 2_3 were significantly coexpressed with neighbouring genes, but the remaining genes in cluster 2_3 were not.
Putative sclerotial and conidial melanin biosynthesis clusters:
PKS12 and PKS13 are homologues of the B. cinerea dihydroxynaphthalene (DHN) melanin biosynthesis PKSs and occur in separate clusters, along with homologues of other melanin biosynthesis genes identified by Schumacher [33]. Cluster 12_1 contains homologues of BcPKS12 and of the transcription factor BcSMR1 (sclerotial melanin regulator 1) (Table 1, Figure 3A). BcPKS12 is hypothesised to provide the intermediate 1,3,6,8-tetrahydroxynaphthalene (T4HN) in sclerotia for conversion to DHN. Although no genes in the S. sclerotiorum cluster were significantly coexpressed with neighbouring genes, the expression profiles of sscle_12g091470 (ABC transporter) and sscle_12g091490 (Zn2Cys6 transcription factor) were discernibly similar.
Cluster 3_7 contains homologues of BcPKS13 along with two transcription factors, a THN reductase and a scytalone dehydratase (Table 1). BcPKS13 is hypothesised to provide T4HN in conidia for conversion to DHN. This PKS showed low expression during infection (FPKM<16).
Botcinic acid biosynthetic gene cluster:
Cluster 15_3 contains homologues of 11 of the 17 genes of the B. cinerea botcinic acid gene cluster (Boa3 to Boa13), while cluster 5_2 contains another two (Boa1 and Boa2) (Table 1, Figure 3C). These genes were coregulated despite being located on different chromosomes, with almost all of them significantly upregulated at 48 HPI. The exception was Boa9, one of the cluster's two PKSs, which showed low (~20 FPKM) and constant expression throughout infection. Genes in these clusters other than the homologues of the botcinic acid cluster were not significantly coexpressed according to FunGeneClusterS.
Manual curation of domains of predicted co-regulated clusters shows that Sclerotinia sclerotiorum may produce ribosomally synthesised and post-translationally modified peptides
Secondary metabolites can be produced without PKSs, NRPSs and other known key biosynthetic enzymes by ribosomal synthesis, in which a precursor protein is produced ribosomally and then processed via peptidases. A number of gene clusters producing ribosomally synthesised and posttranslationally modified peptides (RIPPs) have been reported in filamentous fungi including gene clusters for the antimitotic toxins ustiloxins [34] and phomopsins [35]. Genes common to biosynthetic clusters for ustiloxins and phomopsins include copper-binding tyrosinases, zinc finger transcription-regulating proteins, S41 family peptidases, multiple DUF3328 proteins and SAM-dependent methyltransferases [35]. The ustiloxin B cluster in A. flavus also contains two flavin-containing monooxygenases, a cytochrome P450, an MFS multidrug transporter and a gamma-glutamyltranspeptidase.
We conducted a preliminary investigation of whether S. sclerotiorum has the capacity to produce RIPPs by searching the InterPro annotation for proteins annotated with DUF3328, since the presence of multiple DUF3328 proteins was noted as a conspicuous feature of known RIPP clusters [35]. There are four pairs of adjacent genes encoding DUF3328 proteins in the S. sclerotiorum genome, two of which are in clusters of coexpressed genes. Genes near these pairs were then scanned for tyrosinases and peptidases. One of these clusters, located on chromosome 3, contained potential RIPP biosynthetic genes (Figure 4). Eight genes in this cluster were co-expressed and significantly upregulated relative to in vitro at 24 HPI. This cluster was not conserved throughout fungi but was present in the distantly related species Talaromyces atroroseus.
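The screen described above (find adjacent DUF3328 genes, then look nearby for a tyrosinase and a peptidase) can be sketched as a scan along one chromosome. This is a minimal illustration, assuming each gene carries a set of simplified domain labels rather than real InterPro accessions; the label strings, the `window` of neighbouring genes and the function name are hypothetical.

```python
def duf3328_candidates(genes, window=10, duf="DUF3328",
                       required=("tyrosinase", "peptidase")):
    """Find adjacent DUF3328 gene pairs with all `required` domain labels nearby.

    genes: list of (gene_id, set_of_domain_labels) in chromosomal order for one
    chromosome. window: number of genes scanned on each side of the pair
    (an illustrative choice, not a published parameter).
    """
    hits = []
    for i in range(len(genes) - 1):
        if duf in genes[i][1] and duf in genes[i + 1][1]:
            lo, hi = max(0, i - window), min(len(genes), i + 2 + window)
            neighbourhood = set()
            for _, labels in genes[lo:hi]:
                neighbourhood |= labels
            if all(r in neighbourhood for r in required):
                hits.append((genes[i][0], genes[i + 1][0]))
    return hits
```

Candidates found this way would still need checking against co-expression data, as was done for the chromosome 3 cluster.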
Sclerotinia sclerotiorum secondary metabolite biosynthetic gene clusters are enriched at subtelomeres
In many species of fungi, secondary metabolite gene clusters are over-represented in polymorphic and repetitive subtelomeric genomic regions [36]. This is thought to result from selection for enhanced metabolic plasticity in a constantly changing environment. To assess whether this is the case in S. sclerotiorum, we counted how many secondary metabolite cluster genes were within 300 kilobase pairs (kb) of telomeres. Secondary metabolite clusters were enriched in subtelomeric regions: 38% of clusters (n = 30) and 29% of cluster genes were subtelomeric, compared with 24% of all genes in the genome (chi-squared test of independence, χ2 = 23.6, degrees of freedom (df) = 1, p = 1.2 × 10⁻⁶).
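The subtelomere analysis combines a distance rule with a 2×2 test of independence. The sketch below shows both, assuming chromosome ends in the assembly stand in for telomeres and that a gene counts as subtelomeric if either edge lies within 300 kb of an end. The function names are hypothetical. The gene counts behind the reported table are not given in this section, so this is not a reproduction of that test.

```python
from math import erfc, sqrt


def is_subtelomeric(gene_start, gene_end, chrom_length, cutoff=300_000):
    """True if a gene lies within `cutoff` bp of either chromosome end (1-based coordinates)."""
    return gene_start <= cutoff or chrom_length - gene_end <= cutoff


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-squared test of independence for the table [[a, b], [c, d]] (df = 1).

    E.g. rows: BGC gene / non-BGC gene; columns: subtelomeric / not subtelomeric.
    Returns (statistic, p-value). With one degree of freedom the upper tail of
    the chi-squared distribution equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:  # continuity correction, as R's chisq.test applies to 2x2 tables by default
                diff = max(0.0, diff - 0.5)
            stat += diff ** 2 / expected
    return stat, erfc(sqrt(stat / 2))
```

As a consistency check, a statistic of 23.6 with one degree of freedom gives erfc(sqrt(23.6 / 2)) ≈ 1.2 × 10⁻⁶, in line with the value reported above. Whether a continuity correction was applied is not stated here.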
We then assessed whether secondary metabolite BGC genes were on average closer to transposable elements than non-BGC genes. We found that secondary metabolite cluster genes were on average further from repeats than non-cluster genes. However, when we performed the comparison using only the genes at the ends of BGCs, there was no difference. Regardless of whether they were in BGCs, genes within 300 kb of telomeres were on average closer to transposable elements (Figure 5). Subtelomeric BGC genes were not closer to repeats than subtelomeric non-BGC genes. These data suggest that, although BGC genes were slightly enriched at subtelomeres, as a gene class they were not especially close to transposable elements.
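Distances from genes to the nearest transposable element can be computed with a binary search over sorted repeat coordinates. This is a minimal sketch, assuming TE intervals on one chromosome are sorted by start and non-overlapping (so their ends are also sorted), and that a gene overlapping a TE has distance 0. The function name is hypothetical.

```python
from bisect import bisect_left


def nearest_te_distances(genes, te_intervals):
    """Distance in bp from each gene to the closest TE on the same chromosome.

    genes: list of (start, end); te_intervals: (start, end) pairs sorted by start
    and non-overlapping. Returns None for a gene if there are no TEs.
    """
    starts = [s for s, _ in te_intervals]
    distances = []
    for g_start, g_end in genes:
        i = bisect_left(starts, g_start)  # first TE starting at or after the gene
        candidates = []
        if i > 0:  # the TE ending closest before the gene start
            candidates.append(max(0, g_start - te_intervals[i - 1][1]))
        if i < len(te_intervals):  # the TE starting closest after the gene start
            candidates.append(max(0, te_intervals[i][0] - g_end))
        distances.append(min(candidates) if candidates else None)
    return distances
```

To repeat the comparison restricted to BGC boundaries, one would pass only the first and last gene of each cluster.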
Sclerotinia sclerotiorum secondary metabolite genes are more likely to be paralogues than other genes
Duplication and neofunctionalisation of genes is an evolutionary process that often affects secondary metabolite genes, and it may occur through the activity of transposable elements [24]. To determine whether S. sclerotiorum secondary metabolite clusters exhibited evidence of recent duplication, we detected paralogues by using OrthoFinder to find S. sclerotiorum genes with multiple orthologues in orthologous groups among 25 fungal genomes from 10 taxonomic classes. Of 10,336 S. sclerotiorum genes in orthologous groups, 3,022 are paralogues, of which 687 (23%) are in secondary metabolite clusters. A chi-squared test of independence showed an association between paralogy and cluster membership, with paralogues significantly more likely to occur in clusters than non-paralogous genes (χ2 = 78.1, df = 1, p < 2.2e-16).
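One way to derive the paralogue counts is sketched below, assuming (as a hypothetical reading of the procedure) that a paralogue is an S. sclerotiorum gene sharing an orthogroup with at least one other S. sclerotiorum gene. Orthogroups are taken as an already-parsed mapping (parsing OrthoFinder output is not shown). The function names are hypothetical.

```python
def find_paralogues(orthogroups, species):
    """Genes of `species` that share an orthogroup with another gene of that species.

    orthogroups: mapping orthogroup ID -> {species name: [gene IDs]}.
    Returns (all genes of `species` in orthogroups, paralogous genes) as two sets.
    """
    in_groups, paralogues = set(), set()
    for members in orthogroups.values():
        genes = members.get(species, [])
        in_groups.update(genes)
        if len(genes) > 1:
            paralogues.update(genes)
    return in_groups, paralogues


def contingency(in_groups, paralogues, bgc_genes):
    """2x2 counts: rows paralogue / non-paralogue, columns in BGC / not in BGC."""
    non_paralogues = in_groups - paralogues
    return (
        len(paralogues & bgc_genes),
        len(paralogues - bgc_genes),
        len(non_paralogues & bgc_genes),
        len(non_paralogues - bgc_genes),
    )
```

With the reported numbers, the first row is 687 and 3,022 − 687 = 2,335 out of 10,336 genes. The second row depends on the number of cluster genes in orthogroups, which this section does not give.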
Sclerotinia sclerotiorum paralogues are closer to repeats and more likely to be in taxonomically restricted orthogroups
To determine whether these paralogous genes might have been duplicated through the activity of transposable elements, we assessed their genomic positions relative to a previously published repeat annotation and to subtelomeres. On average, paralogues were 2,278 base pairs (bp) closer to transposable elements (TEs) than non-paralogues were (Wilcoxon rank-sum test, W = 15056993, p < 2.2e-16). Paralogues were also significantly more likely than non-paralogous genes to be subtelomeric (χ2 = 39.7, df = 1, p < 3.017e-10).
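The W statistic reported above follows the rank-sum convention used by R's `wilcox.test`: the sum of the ranks of the first sample minus n₁(n₁ + 1)/2, which equals the number of pairs in which the first-sample value is larger, with ties counted as one half. A minimal sketch of the statistic alone (no p-value) follows; the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_w(x, y):
    """Wilcoxon rank-sum statistic W of sample x against sample y."""
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[: len(x)]) - len(x) * (len(x) + 1) / 2
```

Because W is rank-based, it tests whether one distribution of distances tends to be larger than the other, rather than testing the difference in means quoted alongside it.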
As a measure of the age of the duplications leading to paralogues, we assigned duplication events to branches of the tree produced by the OrthoFinder algorithm (Supplementary Figure 1). Overall, 201 duplicated genes were specific to S. sclerotiorum. Of these, only 13 were not transposable element genes. Intriguingly, three of these duplicated non-transposable element genes resided in BGCs. Although this is a small number, it provides evidence of ongoing duplication of BGC genes in the S. sclerotiorum lineage. Since the divergence of the Sclerotinia lineage containing S. sclerotiorum and its closest relative in the tree, S. subarctica, duplications appeared to affect 265 genes. Of these, only 64 were not transposable element genes, and 16 of those were in BGCs. Although not specific to S. sclerotiorum, these duplication events appear to have specifically affected the Sclerotinia genus. Duplicated genes specific to the Sclerotinia genus or to S. sclerotiorum alone were not enriched among BGCs, despite the overall enrichment of BGC genes among paralogues. This suggests that much of the duplication and neofunctionalisation of BGCs has occurred over a relatively long evolutionary time frame, with a few recent events indicative of some ongoing selection for changes in the metabolome.
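This section does not state which enrichment test was used for the lineage-specific duplicates. One common choice is a one-sided hypergeometric test, sketched below as an assumption. For example, asking whether 3 of 13 non-TE S. sclerotiorum-specific duplicates falling in BGCs exceeds expectation would call `hypergeom_upper_tail(3, 13, K, N)`, where K and N are the numbers of BGC genes and total genes in the background. Those background counts are not given here, so the test values below are toy numbers. The function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes without replacement from N genes, K of them in BGCs.

    math.comb returns 0 when asked for more items than are available,
    so impossible terms drop out automatically.
    """
    upper = min(n, K)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)
```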
Sclerotinia sclerotiorum secondary metabolite biosynthetic gene clusters exhibit greater sequence diversity and presence / absence polymorphisms than other genes
Since BGC genes were enriched at subtelomeres (albeit without being closer to repeats), we hypothesised that they might accumulate more polymorphisms than other genes. We found that secondary metabolite genes were highly over-represented among genes with presence/absence polymorphisms (P = 6.077e-5) (Figure 6A and B). Around 1.2% of BGC genes were completely absent in at least one isolate, compared with 0.4% of non-BGC genes; however, about 0.69% of BGC genes were partially absent, similar to the 0.77% of non-BGC genes. Despite the over-representation of BGC genes among those completely lost in at least one isolate, BGC genes were not enriched among genes with at least one high-impact SNP or InDel polymorphism (P = 0.9177) (Figure 6A). However, the overall SNP diversity of secondary metabolite genes was higher than that of other genes (Figure 6C). The mean haplotype diversity of secondary metabolite genes was 0.94, significantly higher than the 0.90 of other genes (P < 2.2e-16). The mean nucleotide diversity of secondary metabolite genes was also higher, at 12.3 compared with 10.91 for other genes (P = 4.31e-06). These data indicate that S. sclerotiorum BGC genes are among the most polymorphic genes in the genome, affected both by point mutations and by large-scale insertions and deletions leading to complete gene loss.
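The two diversity measures can be illustrated per gene from aligned haplotype sequences. This is a minimal sketch, assuming Nei's haplotype diversity, n/(n − 1) × (1 − Σpᵢ²), and nucleotide diversity expressed as the mean number of pairwise differences per gene. The reported nucleotide diversity values exceed 1, so they cannot be per-site proportions, but the exact scaling used is not stated in this section. The function names are hypothetical, and gaps and missing data are not handled.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def mean_pairwise_differences(haplotypes):
    """Average number of differing sites over all pairs of equal-length aligned sequences."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        return 0.0
    return sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
```

Dividing the second value by the number of aligned sites would give the usual per-site nucleotide diversity.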