Strain characterization of Cephalotrichum gorgonifer NG_p51
Ecology and taxonomy
We previously isolated the fungal strain NG_p51 and, following the taxonomic classification in use at the time, assigned it to the genus Doratomyces on the basis of its ITS region (Doratomyces sp.; GenBank: HQ115716.1) (18). In the meantime, the genus Cephalotrichum (Microascaceae, Microascales) has been reorganized and now also contains species formerly assigned to the genera Doratomyces and Trichurus. No sexual morphs have been observed. C. gorgonifer is frequently found in soil, decaying plant material, dung and on wet cellulosic materials indoors. Occasionally, it has also been isolated from human hair and respiratory samples. Its temperature optimum is around 25°C to 30°C, but growth at 37°C has also been observed for several isolates. There are no indications of human pathogenicity, although data in this regard are scarce (36, 37).
We performed a phylogenetic analysis with the ITS sequence (internal transcribed spacer regions of the rDNA including the 5.8S region) of strain NG_p51 (GenBank: HQ115716.1), together with the C. gorgonifer dataset (including the ex-epitype of C. gorgonifer, Trichurus spiralis CBS 635.78) and the Cephalotrichum telluricum dataset (the closest outgroup) published by Woudenberg et al. (37). The resulting phylogenetic tree shows that our strain clearly belongs to the species C. gorgonifer and that the C. telluricum strains cluster as an outgroup, as expected (Fig. 1). We therefore designated the strain C. gorgonifer NG_p51.
Figure 1 Phylogenetic tree of Cephalotrichum gorgonifer isolate NG_p51
A: Phylogenetic tree based on the ITS sequences of isolate NG_p51 together with several Cephalotrichum gorgonifer isolates and C. telluricum strains as the closest outgroup. The tree entries follow the pattern “accession number - genus and species - isolate identifier”. The phylogenetic analysis shows that isolate NG_p51 belongs to the species Cephalotrichum gorgonifer. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The isolate used for all experiments in this publication (NG_p51) and the ex-epitype are written in bold letters.
Morphology of C. gorgonifer NG_p51:
Colonies of C. gorgonifer NG_p51 reach a diameter of 35–55 mm on oatmeal agar (OA), potato carrot agar (PCA) and potato dextrose agar (PDA) after 14 days at 25°C. They form an effuse, dark greyish-black mycelium (Fig. 2a) consisting of a nearly velutinous layer of synnemata, which are 1–2 mm in length (Fig. 2b) and formed by aggregation of conidiophores (annellophores). Numerous curled sterile setae are present alongside the fertile part of the synnemata (Fig. 2c-d). Conidia are borne from ampulliform annellides and are smooth-walled and ovoid to broadly ellipsoidal with a truncate base and rounded apex, measuring 4–6 × 3.5–4.0 µm (Fig. 2e). The phenotypic traits of strain NG_p51 (especially the presence of undulate sterile setae, the greyish color of conidia en masse, and their shape and dimensions) are in full agreement with the species concept of C. gorgonifer (Bainier) Sandoval-Denis, Gené & Guarro (36), described in older literature under the name Trichurus spiralis Hasselbr. by Domsch et al. (38) and Ellis (39).
Additionally, growth on malt extract agar (MEA) plates at 28°C, 30°C, 32.5°C, 35°C and 37°C showed that NG_p51 behaves as published for C. gorgonifer (36). Within the tested range, growth was optimal at around 28°C. At 35°C, radial growth was strongly impaired, while at 37°C radial growth was arrested. However, within the inoculation spot, spores germinated at 37°C and the mycelial density increased over the 4 days of incubation (Fig. 3).
Figure 3 Radial growth of C. gorgonifer NG_p51 on MEA Petri dishes at five different temperatures after 4 days.
SM analysis of C. gorgonifer
Initial work with NG_p51 revealed that it is capable of suppressing the growth of Staphylococcus aureus and of two clinical methicillin-resistant isolates of S. aureus when grown in the presence of the HDAC inhibitor valproic acid. Valproic acid treatment increased the production of seven antimicrobial compounds (cyclo-(l-proline-l-methionine), p-hydroxybenzaldehyde, cyclo-(phenylalanine-proline), indole-3-carboxylic acid, phenylacetic acid and indole-3-acetic acid) and also led to the biosynthesis of an otherwise absent compound, which was identified as phenyllactic acid (26). Rasfonin was also found in these screens, although the production of this bioactive compound was subsequently shown not to depend on valproic acid in the growth medium (31). Initial analysis of the production conditions revealed that this metabolite is produced in AMM liquid medium and is detectable within 48 hours of incubation (Fig. 6). Although rasfonin is a well-known bioactive compound with potential applications in cancer treatment (32–35), its biosynthetic pathway remains unknown. To decipher its production pathway, we sequenced the genome and searched for BGCs potentially coding for rasfonin production.
Genome analysis and gene annotation:
Whole-genome shotgun sequencing of C. gorgonifer was performed by 454 pyrosequencing, yielding 1.78 Gb of raw sequence data, which were assembled into 48 scaffolds totalling 35.9 Mb (49.4-fold coverage, N50 of 2.07 Mb). A total of 10,469 gene models were predicted using two different GeneMark gene predictors and manual validation. The annotated ORFs account for 43.8% of the genome. The overall GC content is 55%, while the average GC content of ORFs is 60.3%. The genome sequence data and annotation were deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under the project number PRJEB15373 and can be accessed online (40).
Annotation of biosynthetic gene clusters (BGCs):
Between 1 (41) and around 25 (42) genes can be involved in the biosynthesis of a single SM. In most cases these genes are coregulated and located in close proximity to each other, forming so-called BGCs (43). Genes within these clusters can code for proteins that are responsible for the biosynthesis of the compound-defining backbone (polyketide synthase (PKS), non-ribosomal peptide synthetase (NRPS), terpene synthase (TPS), …), the modification of the chemical backbone structure (e.g., cytochrome P450s, transferases, oxidoreductases, O-methyltransferases, …), the regulation of the cluster genes (global or cluster-specific transcription factors (TFs)) or transport (delivery to compartments or export to the surrounding medium) (14, 44).
For BGC annotation, antiSMASH 6.0.1 for fungi (fungiSMASH) was used with standard parameters (45). The prediction algorithm for fungal BGCs assigned 308 genes to 24 putative BGCs. Six BGCs showed homology to BGCs from other species, with similarities ranging from 20% to 100%; these are included in Table 1 with their respective references (only the best-scoring homologous BGC is shown). Similarity in this case is defined as the “percentage of genes within the closest known compound that have a significant BLAST hit to genes within the current region” (taken from the antiSMASH documentation (45)). In total, ten PKSs and one type 3 PKS, three NRPS-PKS hybrids, five NRPSs, six NRPS-like proteins and three putative TPSs could be assigned.
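The similarity definition quoted above can be illustrated with a minimal Python sketch. The function name and the gene identifiers are hypothetical and serve only to mirror the quoted definition; this is not antiSMASH code, and what counts as a "significant" BLAST hit is assumed to have been decided beforehand.

```python
def cluster_similarity(known_cluster_genes, genes_with_hit):
    """Percentage of genes of the known cluster that have a significant
    BLAST hit in the query region (hit significance is assumed to be
    determined upstream)."""
    known = set(known_cluster_genes)
    if not known:
        raise ValueError("known cluster must contain at least one gene")
    matched = known & set(genes_with_hit)
    return 100.0 * len(matched) / len(known)


# Hypothetical known cluster with five genes, two of which have a hit
known = ["geneA", "geneB", "geneC", "geneD", "geneE"]
hits = ["geneB", "geneD"]
similarity = cluster_similarity(known, hits)
print(f"{similarity:.0f}%")  # 40%
```

Under this definition, a 100% score means that every gene of the reference cluster has a counterpart in the query region; it says nothing about additional genes in the query region.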
Expression analysis of candidate biosynthetic genes and whole genome transcriptome sequencing under different physiological conditions.
To obtain information about the genome-wide differential expression pattern of BGC genes, we cultivated the fungus under conditions of active growth in liquid shake cultures for 24 hours in Aspergillus minimal medium (AMM) and compared the expression pattern with that of cells in stationary phase after 48 hours of growth in the same medium. Of all 10,469 annotated genes, 717 genes that were predicted by antiSMASH not to be part of a secondary metabolism pathway were up-regulated at 48 h and 818 were down-regulated. Of the 308 genes that are putatively involved in secondary metabolism, 39 were up-regulated at the 48-hour time point and 40 were down-regulated. The remaining 229 genes were unaffected. In total, 8,855 genes (85% of the annotated genes) were not differentially regulated between the two time points.
The whole transcript dataset can be retrieved from the additional materials (Additional_file_2.xlsx). The whole transcriptome was uploaded to NCBI's Gene Expression Omnibus (GEO) (46) under the accession number GSE217303.
To estimate the transcriptional activity of BGCs under the chosen conditions of active growth versus stationary phase, we categorized the transcriptional status of the key enzymes within a given BGC into three states (Table 1, column Expression; Cat.). The expression of category 1 genes is not detectable or falls below the arbitrarily set threshold of biological relevance (see caption of Table 1) at both time points (14 of 29 genes). Category 2 genes (3 genes) are expressed at only one time point (either 24 h or 48 h), and the expression of category 3 genes (12 genes) is above the threshold at both 24 h and 48 h, although their expression levels may vary between the two time points (7 genes with at least a 2-fold difference between the time points). The individual expression levels, the differential regulation between 24 h and 48 h and the transcriptional category of all predicted key-enzyme-encoding genes can be retrieved from Table 1.
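The genome-wide totals in this paragraph follow from simple bookkeeping, shown here as a short Python check. The variable names are ours; the counts are the ones reported above.

```python
total_genes = 10469

# Differentially regulated genes at 48 h relative to 24 h
non_sm_up, non_sm_down = 717, 818   # genes outside predicted BGCs
sm_up, sm_down = 39, 40             # genes inside predicted BGCs

regulated = non_sm_up + non_sm_down + sm_up + sm_down
unchanged = total_genes - regulated
fraction_unchanged = unchanged / total_genes

print(regulated)                    # 1614
print(unchanged)                    # 8855
print(f"{fraction_unchanged:.1%}")  # 84.6%
```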
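The three-state categorization can be written down as a small Python sketch. It assumes that a gene counts as expressed when its log-scale expression value is at or above the threshold of 1 given in the Table 1 caption; the function name is hypothetical, and the example values are taken from Table 1.

```python
THRESHOLD = 1.0  # log-scale expression threshold from the Table 1 caption


def expression_category(expr_24h, expr_48h, threshold=THRESHOLD):
    """Return Cat1 (silent at both time points), Cat2 (expressed at one
    time point) or Cat3 (expressed at both time points).
    Assumption: 'expressed' means value >= threshold."""
    n_expressed = int(expr_24h >= threshold) + int(expr_48h >= threshold)
    return {0: "Cat1", 1: "Cat2", 2: "Cat3"}[n_expressed]


# (24 h, 48 h) expression values of three backbone genes from Table 1
examples = {
    "DNG_03072": (0.0, 0.0),
    "DNG_02774": (0.0, 6.4),
    "DNG_00901": (5.6, 6.2),
}
categories = {gene: expression_category(*values) for gene, values in examples.items()}
print(categories)
# {'DNG_03072': 'Cat1', 'DNG_02774': 'Cat2', 'DNG_00901': 'Cat3'}
```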
Table 1
Summary of BGCs of Cephalotrichum gorgonifer NG_p51 as predicted by antiSMASH. Columns: ID: cluster ID as assigned by antiSMASH; BB-gene(s): gene number of the key-enzyme-encoding genes; Type: type of key enzyme; Range (genes): the predicted borders of the BGC with the total gene count in parentheses; Expression: expression of backbone genes at 24 h and 48 h, depicted as a 2-zone heatmap (weak expression | red --> blue | strong expression) based on their RPKM values. An arbitrary log-value of 1 was considered the threshold for biologically relevant expression; expression below this threshold was set to zero. The column “Δ” depicts the differential expression between the 24-hour and 48-hour samples and is visually supported by a 3-zone heatmap (down-regulated at 48 h | blue --> white --> red | up-regulated at 48 h), with zero being the turning point. A significance threshold of p < 0.05 was applied; significant values are marked with an asterisk next to the differential expression value. If the differential expression is higher than 2-fold, it is marked with an additional "+". Cat.: expression category (Cat1: silent at 24 h and 48 h; Cat2: expressed at either 24 h or 48 h (i.e., differentially expressed); Cat3: expressed at both 24 h and 48 h). The best-scoring homologous BGCs in other species found by antiSMASH are also displayed, together with their similarity in % and references.
ID | BB-gene(s) | Type | Expression 24h | Expression 48h | Δ | Cat. | Homologs (Species) | Similarity | Ref. |
1 | DNG_00901 | terpene | 5.6 | 6.2 | 0.6 | Cat3 | - | - | - |
2 | DNG_01262 | T1PKS | 5.3 | 5.6 | 0.3 | Cat3 | naphthalene BGC (Daldinia eschscholzii) | 33% | (47) |
3 | DNG_01539 | T1PKS | 1.7 | 3.7 | 2.0 | Cat3 | - | - | - |
4 | DNG_02059 | T3PKS | 4.9 | 6.1 | 1.2*, + | Cat3 | - | - | - |
5 | DNG_02555 | terpene | 6.8 | 4.3 | -2.5*, + | Cat3 | - | - | - |
6 | DNG_02774 | T1PKS | 0.0 | 6.4 | 7.7*, + | Cat2 | - | - | - |
6 | DNG_02782 | T1PKS | 0.0 | 6.2 | 9.5*, + | Cat2 | - | - | - |
7 | DNG_03072 | T1PKS | 0.0 | 0.0 | - | Cat1 | - | - | - |
8 | DNG_04293 | NRPS | 0.0 | 0.0 | - | Cat1 | - | - | - |
8 | DNG_04303 | T1PKS | 0.0 | 0.0 | - | Cat1 | | | |
9 | DNG_04582 | NRPS-T1PKS | 0.0 | 0.0 | - | Cat1 | betaenone A–C (Phoma betae) | 25% | (48) |
10 | DNG_04746 | terpene | 7.7 | 7.5 | -0.1 | Cat3 | squalestatin (Aspergillus sp.) | 40% | (49) |
11 | DNG_06431 | T1PKS | 0.0 | 0.0 | - | Cat1 | - | - | - |
12 | DNG_07105 | T1PKS | 0.0 | 0.0 | - | Cat1 | - | - | - |
12 | DNG_07107 | T1PKS | 0.0 | 0.0 | - | Cat1 | | | |
13 | DNG_07411 | NRPS | 2.1 | 2.7 | 0.6 | Cat3 | - | - | - |
14 | DNG_08044 | NRPS | 0.0 | 1.9 | 0.9 | Cat2 | phyllostictine A/B (Phyllosticta cirsii) | 20% | (50) |
14 | DNG_08052 | NRPS-T1PKS | 0.0 | 0.0 | - | Cat1 | | | |
15 | DNG_08262 | NRPS-like | 0.0 | 0.0 | - | Cat1 | - | - | - |
16 | DNG_08309 | NRPS | 5.4 | 1.4 | -4.1*, + | Cat3 | dimethylcoprogen (Alternaria alternata) | 100% | (51) |
17 | DNG_09028 | NRPS-like | 5.2 | 5.2 | 0.0 | Cat3 | - | - | - |
18 | DNG_09136 | NRPS-like | 0.0 | 0.0 | - | Cat1 | - | - | - |
19 | DNG_09754 | NRPS | 0.0 | 0.0 | - | Cat1 | - | - | - |
20 | DNG_09868 | NRPS-like | 5.0 | 1.6 | -3.4*, + | Cat3 | - | - | - |
21 | DNG_09951 | NRPS-like | 0.0 | 0.0 | - | Cat1 | - | - | - |
22 | DNG_10196 | NRPS-like | 7.3 | 5.6 | -1.7*, + | Cat3 | - | - | - |
23 | DNG_10230 | NRPS-T1PKS | 0.0 | 0.0 | - | Cat1 | cytochalasin E/K (Aspergillus clavatus) | 23% | (52) |
24 | DNG_10425 | T1PKS | 0.0 | 0.0 | - | Cat1 | - | - | - |
Prediction of the BGC responsible for rasfonin biosynthesis based on PKS domain structure.
Rasfonin (Fig. 4) is an α-pyrone-containing natural product composed of two distinct acetate-based polyketide (PK) chains. The PK chains are derived from the condensation of four and six acetyl-CoA subunits, resulting in the formation of a partially reduced, di- and trimethylated tetra- and hexaketide, respectively, which are further modified (hydroxylation) and finally linked via an ester bond. These chemical features suggest the involvement of two highly reducing PKSs (HR-PKSs) that contain, in addition to the essential acyltransferase (AT), ketoacyl synthase (KS) and acyl carrier protein (ACP) domains, domains involved in the reduction of the growing PK chain, i.e., a ketoreductase (KR), a dehydratase (DH) and an enoylreductase (ER). In addition, the methyl groups at C6, C8 and C10 and at C4’ and C6’ suggest that both PKSs contain an intrinsic C-methyltransferase (MT) domain, which transfers a methyl group from S-adenosylmethionine (SAM) to a β-ketoacyl-ACP substrate.
We thus screened the genome sequence of C. gorgonifer for predicted PKS-encoding genes with a domain composition that putatively fits the enzymatic functions necessary for rasfonin biosynthesis. Bioinformatic analysis of the 14 PKSs present in the C. gorgonifer genome revealed that seven HR-PKSs harbour the necessary domain architecture (Fig. 5A). Only two BGCs contain two HR-PKSs in close proximity to each other, namely BGC 6 (Table 1, ID 6) with the combination [CgPKS4 + CgPKS5] (Fig. 5A) and BGC 12 (Table 1, ID 12) with the combination [CgPKS10 + CgPKS11] (Fig. 5A). Both also encode a putative transferase, i.e., DNG_02776 and DNG_07106, respectively. To determine which of the two BGCs is involved in rasfonin biosynthesis, we queried our transcriptome dataset and found that only the BGC with the HR-PKS pair CgPKS4 and CgPKS5 (Fig. 5A; Table 1, ID 6) was expressed at 48 hours. The other candidate pair, CgPKS10 and CgPKS11 (Fig. 5A; Table 1, ID 12), was not expressed under any tested condition.
Next, we set out to analyse the extent of the CgPKS4/CgPKS5 BGC. Our transcriptome data showed a clear coregulation of the genes that are positioned between CgPKS4 and CgPKS5. No transcription was observed 24 hours post inoculation (hpi), while expression of all genes was significantly up-regulated at 48 hpi. The genes upstream of CgPKS4 and downstream of CgPKS5 were not coregulated (Fig. 5B). The GO annotations of these adjacent genes do not suggest that they are involved in the synthesis of rasfonin. Thus, the BGC likely spans nine genes, i.e., DNG_02774 to DNG_02782. To evaluate the contribution of the putative CgPKS4/CgPKS5 BGC to rasfonin biosynthesis, one of the PKS-encoding genes, CgPKS4, was arbitrarily chosen for targeted gene disruption.
Figure 5 A: Domain organization of the PKSs, analysed using the NCBI Conserved Domain Database (36), InterPro (37), SBSPKSv2 (38) and the PKS/NRPS Analysis Web-site (39). KS (red), ketosynthase; AT (yellow), acyltransferase; DH (pink), dehydratase; MT (blue), C-methyltransferase; ER (grey), enoylreductase; KR (violet), ketoreductase; ACP (green), acyl carrier protein; cAT (orange), carnitine acyltransferase; Chalcone/stilbene_synt_N (lime-green), chalcone/stilbene synthase N-terminal; Chalcone/stilbene_synt_C (dark-cyan), chalcone/stilbene synthase C-terminal. The proposed PKS names are listed together with their gene number and length. The transcriptional activity at 24 h or 48 h is indicated as “-” (no transcription at the given time point) or “+” (transcription at the given time point). The two PKS-encoding genes that are proposed to be involved in rasfonin synthesis (i.e., CgPKS4 and CgPKS5) are highlighted by a red box. B: The proposed BGC for rasfonin biosynthesis, depicted with transcript levels at the 24 h and 48 h time points (shake flask culture, AMM). The BGC borders are depicted in red and were assigned based on the coregulation of genes. Proposed BGC gene names (rsf1 to rsf9) are depicted above the gene illustrations.
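The screening logic described above (keep PKSs carrying the full reducing and methylating domain set, then look for BGCs that contain two of them) can be sketched in Python as follows. The input table is a hypothetical toy example, not the actual domain annotation of the 14 C. gorgonifer PKSs, and the function name is ours.

```python
from collections import defaultdict

# Domains expected in an HR-PKS that could build the rasfonin chains
REQUIRED_DOMAINS = {"KS", "AT", "DH", "MT", "ER", "KR", "ACP"}

# Hypothetical toy input: (PKS name, BGC ID, annotated domains)
pks_table = [
    ("PKS_a", 1, {"KS", "AT", "DH", "MT", "ER", "KR", "ACP"}),
    ("PKS_b", 1, {"KS", "AT", "DH", "MT", "ER", "KR", "ACP"}),
    ("PKS_c", 2, {"KS", "AT", "ACP"}),  # non-reducing, filtered out
    ("PKS_d", 3, {"KS", "AT", "DH", "MT", "ER", "KR", "ACP"}),
]


def candidate_clusters(table, required=REQUIRED_DOMAINS, min_pks=2):
    """Return BGCs that contain at least `min_pks` PKSs carrying every
    required domain."""
    by_bgc = defaultdict(list)
    for name, bgc_id, domains in table:
        if required <= domains:  # subset test: all required domains present
            by_bgc[bgc_id].append(name)
    return {bgc: names for bgc, names in by_bgc.items() if len(names) >= min_pks}


candidates = candidate_clusters(pks_table)
print(candidates)  # {1: ['PKS_a', 'PKS_b']}
```

In this toy input, BGC 3 is discarded even though its PKS has the right domain set, because the rasfonin structure calls for two such enzymes in the same cluster.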
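The border assignment by coregulation can be illustrated with a minimal Python sketch: starting from a seed gene, the span is extended in both directions for as long as the neighbouring gene shows the same pattern (silent at 24 h, expressed at 48 h). The gene names and expression values below are hypothetical, the threshold of 1 is the one used for Table 1, and the actual assignment in this work was made by inspecting the transcriptome data rather than by this procedure.

```python
def is_induced(expr_24h, expr_48h, threshold=1.0):
    """Silent at 24 h and expressed at 48 h (assumed definition)."""
    return expr_24h < threshold and expr_48h >= threshold


def coregulated_span(genes, seed_index):
    """genes: list of (name, expr_24h, expr_48h) in chromosomal order.
    Returns the names of the contiguous induced block around the seed."""
    if not is_induced(*genes[seed_index][1:]):
        raise ValueError("seed gene does not show the induced pattern")
    start = end = seed_index
    while start > 0 and is_induced(*genes[start - 1][1:]):
        start -= 1
    while end < len(genes) - 1 and is_induced(*genes[end + 1][1:]):
        end += 1
    return [name for name, _, _ in genes[start:end + 1]]


# Hypothetical locus: two flanking genes that are not coregulated
locus = [
    ("flank_up", 5.0, 5.1),
    ("gene_1", 0.0, 6.4),
    ("gene_2", 0.0, 4.0),
    ("gene_3", 0.0, 6.2),
    ("flank_down", 3.0, 0.5),
]
span = coregulated_span(locus, seed_index=1)
print(span)  # ['gene_1', 'gene_2', 'gene_3']
```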
Experimental verification of the BGC involved in rasfonin biosynthesis
The standard method to prove the involvement of a candidate gene in SM biosynthesis is deletion of its coding region in the genome, accompanied by the loss of product formation. However, Cephalotrichum spp. have so far not been genetically modified or used for reverse genetics approaches. Therefore, a transformation method had to be established with the aim of disrupting or mutating CgPKS4 and showing by chemical analysis that rasfonin is no longer produced in the transformed strain. To maximize the chance of obtaining CgPKS4 mutations by transformation, three different approaches were selected, i.e., a Cas9 approach (strain Cg-Cas9-02774-1), an Agrobacterium-mediated transformation approach (strain Cg-At-02774-1) and a homology-directed repair (HDR) approach via protoplast-mediated transformation of a linear fragment (strain Cg-HDR-02774-1) (53, 54). All three transformation methods were successfully implemented in our C. gorgonifer strain NG_p51. The Cas9 and HDR transformations were performed on cryo-stocks of C. gorgonifer protoplasts (see Materials and Methods section).
Disruption of CgPKS4 by Cas9 resulted in the strain Cg-Cas9-02774-1. Sequencing of the locus revealed a 640 bp deletion ranging from base pair 269 (i.e., the sgRNA annealing site) to base pair 908 of the coding sequence (CDS), a mutation that additionally leads to a frameshift within the remaining CDS of the gene (sequence at the Cas9 cleavage site: 5’-ACCAATGCACCCGGT | 640 bp deletion | GTTCGACCACCGCGC-3’) (see Additional_file_3.gbk).
The A. tumefaciens-mediated mutation resulted in the transformant Cg-At-02774-1, in which the first 2300 base pairs of CgPKS4 (DNG_02774) are replaced with the hygromycin resistance cassette (hph), again resulting in a frameshift within the CDS. The 5’ homology region spans 1200 base pairs upstream of the start codon (i.e., the promoter region) and the 3’ homology region spans 1500 base pairs downstream of the exchanged fragment (i.e., the first 2300 base pairs of the gene body) (see Additional_file_3.gbk).
The transformant Cg-HDR-02774-1 resulted from the HDR approach. Here, the same replacement cassette was used as for the Agrobacterium transformation (i.e., the PCR product from the “HDR-Fragment” primer pair with template pAt-DNG02774; Additional_file_1.pdf Table S1). Both strains (i.e., Cg-At-02774-1 and Cg-HDR-02774-1) were tested by PCR for the successful gene replacement event (see Additional_file_1.pdf Figure S1; Additional_file_3.gbk).
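The size of the deletion and the resulting frameshift can be checked with a few lines of Python. The sketch assumes that the reported coordinates are 1-based and inclusive, which is consistent with the stated deletion length.

```python
# First and last deleted base pair of the CDS (assumed 1-based, inclusive)
deletion_start, deletion_end = 269, 908

deletion_length = deletion_end - deletion_start + 1
# A deletion shifts the reading frame unless its length is a multiple of 3
causes_frameshift = deletion_length % 3 != 0

print(deletion_length)    # 640
print(causes_frameshift)  # True
```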
To test whether the mutated target gene is required for the biosynthesis of rasfonin, the transformants and the recipient strain NG_p51 were cultivated in parallel. The fungal colonies (mycelia + agar medium) were extracted, and the extracts were analysed by high-performance liquid chromatography (HPLC) for the presence of rasfonin. None of the three tested mutants showed any detectable amounts of rasfonin (Figure 6C, left) when compared to the wildtype NG_p51 (Figure 6B, left). A 25 µg/mL rasfonin solution produced in our laboratory and verified by NMR (see Materials and Methods) served as the analytical standard (Figure 6A). In addition, we spiked the samples with our standard, thereby unequivocally proving that the peak in the wildtype chromatogram is indeed rasfonin (Figure 6B and C, right) and that rasfonin is missing in our CgPKS4 mutants. To verify that not even trace amounts of rasfonin are produced that would escape detection by our HPLC method, we subjected an extract from the Cg-Cas9-02774-1 strain to liquid chromatography-high resolution mass spectrometry (LC-HRMS). The results of this analysis showed no detectable amounts of rasfonin in the mutant strains (Additional_file_1.pdf Figure S2). This verifies that CgPKS4 (from now on referred to as Rsf1) is indeed directly involved in the biosynthesis of rasfonin in C. gorgonifer NG_p51.
Figure 6 HPLC analysis of rasfonin in a 25 µg/mL rasfonin standard (green, A), the wildtype isolate NG_p51 (blue, B left) and the CgPKS4 knockout strain NG_p51∆CgPKS4 (orange, C left). All three knockout strains and all replicates showed the same absence of rasfonin, but only one is shown here. Spiked samples are shown in the right panel and are indicated by a green peak (NG_p51, B right; NG_p51∆CgPKS4, C right). The retention times in minutes are shown next to the peaks.
In addition to CgPKS4 (rsf1), eight genes are coregulated under rasfonin-inducing conditions, including a second PKS-encoding gene, CgPKS5, located at the end of the putative BGC. Thus, we propose that the BGC involved in rasfonin biosynthesis stretches from CgPKS4 to CgPKS5 and comprises nine genes overall, rsf1 to rsf9 (Fig. 5B). In addition to the PKSs, the BGC encodes three putative cytochrome P450s (DNG_02775: rsf2; DNG_02779: rsf6; DNG_02781: rsf8), a putative O-acyltransferase (DNG_02776: rsf3), a major facilitator superfamily (MFS1) transporter (DNG_02777: rsf4), one FSH domain-containing protein (DNG_02778: rsf5) and one uncharacterized protein (DNG_02780: rsf7).