Biosynthetic potential analysis of an Arctic marine-derived strain Aspergillus sydowii MNP-2

doi:10.21203/rs.3.rs-4071076/v1


This work is licensed under a CC BY 4.0 License

Version 1


Genome mining plays a key role in the discovery of bioactive secondary metabolites (SMs) from microorganisms. This work aimed to decipher the biosynthetic potential of the Arctic marine-derived strain Aspergillus sydowii MNP-2 by combining whole-genome analysis and antiSMASH with feature-based molecular networking (MN) on the Global Natural Products Social Molecular Networking (GNPS) platform. A complete genome assembly of 34.9 Mb comprising 10 contigs with a contig N50 of 4.1 Mb was generated, and a total of 13,218 protein-coding genes and 46 rRNAs, 7 sRNAs, 32 snRNAs, and 119 tRNAs were annotated using Nr, GO, COG, Pfam, KEGG, and other databases. AntiSMASH results indicated that strain MNP-2 harbors 52 biosynthetic gene clusters (BGCs), suggesting great potential for producing SMs with various structural motifs. Notably, SMs with therapeutic potential that may be encoded by these BGCs were detected among its metabolic products through GNPS and MN analyses.

Biological sciences/Biological techniques

Biological sciences/Microbiology

Polar microorganisms

Aspergillus sydowii

Whole-genome sequence

Biosynthetic gene cluster

AntiSMASH

Molecular networking

A rich diversity of microorganisms inhabits polar environments. Under long-term selection by extreme environmental stresses, these microorganisms have evolved physiological, genetic, and metabolic adaptations to extreme conditions, and thus have important basic research value and great application potential^1–3. A growing number of studies have shown that the polar regions may be a significant repository of microbial resources and a source of bioactive compounds. Between 2001 and 2020, 263 new natural products were discovered from polar organisms, 134 of which were derived from polar microorganisms^4,5. These products cover a wide range of structural types, including alkaloids, macrolides, terpenoids, peptides, and polyketides, and show promising biological activities such as antibacterial⁶, antitumor⁷, and antiviral⁸ effects. However, the number of new secondary metabolites of polar microbial origin has not increased considerably over the past few years, most likely because polar bacteria are difficult to cultivate using common culture techniques.

Natural products derived from polar marine microbes offer tremendous potential as sources of therapeutic agents⁹. As fewer new drugs are being discovered, researchers are increasingly focusing on special microbial resources, such as habitat-specific microbes, and have begun to move away from bioactivity-guided fractionation as the gold-standard approach for natural product discovery, turning instead to genomics, metabolomics, and other big-data approaches to guide isolation efforts toward uncharted chemical space^10,11. Even well-studied organisms have untapped biosynthetic potential, because they encode many biosynthetic genes that have not yet been linked to metabolite products¹². Paulus¹³ et al. localized the biosynthetic gene cluster of the α-pyrone lagunapyrone in a marine-derived Streptomyces strain using antiSMASH and identified two analogues. Hou¹⁴ et al. identified seven cyclohexadepsipeptides, chrysogeamides A–G, from the coral-derived fungus Penicillium chrysogenum using MN. Liu¹⁵ et al. used MN to locate and identify two polycyclic macrolactam ansamycins from Streptomyces. Sun¹⁶ et al. obtained 11 omicsynin analogues and identified a biosynthetic gene cluster for antiviral components by integrating antiSMASH, MN, and other techniques. Here, we sequenced and assembled the genome of the Arctic marine-derived strain Aspergillus sydowii MNP-2 using a combination of Nanopore and Illumina sequencing data, and used Nr, COG, KEGG, GO, Pfam, and other databases for gene function annotation based on sequence or motif similarity searches. Additionally, to further investigate the natural product biosynthetic capacity of strain MNP-2, we used the antiSMASH platform to analyze its BGCs and the GNPS platform to build a molecular network from LC-MS/MS data, in order to profile its metabolites and associate them with BGCs.

DNA extraction and next-generation sequencing

The fungal strain MNP-2 of A. sydowii was derived from Arctic marine sediments (73.8° N, 168.9° W) and is preserved at the China Center for Type Culture Collection (CCTCC NO: M 2022061). Strain MNP-2 grown on potato dextrose agar (PDA) was inoculated into 250 mL Erlenmeyer flasks containing 100 mL potato dextrose broth (PDB) and shaken for 2 days at 200 rpm and 30 ℃. After centrifugation, the supernatant was discarded, and the mycelium was washed once with phosphate buffered solution (PBS) and stored at -80 ℃. The purity and integrity of the genomic DNA were evaluated by 1% agarose gel electrophoresis and densitometry against comparably sized standards.

The yield and purity of the extracted DNA were determined using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, USA) and a Qubit 2.0 fluorometer (Thermo Fisher Scientific, USA). After the DNA samples passed quality control, sequencing libraries were constructed. Each library was initially quantified with Qubit 2.0 and diluted, and its insert size was checked using an Agilent 2100 Bioanalyzer. Once the insert size met expectations, the effective library concentration was accurately measured by qPCR to ensure library quality. Libraries that passed these checks were pooled onto flow cells according to their effective concentrations and the required data output, clusters were generated on a cBot, and the libraries were sequenced on Illumina's high-throughput NovaSeq platform.

Nanopore sequencing

The qualified genomic DNA samples described above were randomly sheared using a Megaruptor, and large DNA fragments were enriched and purified using magnetic beads. Large fragments were size-selected and recovered using a BluePippin automated nucleic acid recovery instrument. After purification, the fragment ends were repaired and dA-tailed, and the library was prepared using the SQK-LSK109 kit. Finally, the DNA library was accurately quantified using Qubit. After library construction, a defined concentration and volume of the library was loaded onto a flow cell, which was then transferred to an Oxford Nanopore PromethION sequencer for real-time single-molecule sequencing.

Genome assembly

Because raw data may contain low-quality sequences, adapter sequences, and so on, the raw data must be filtered to obtain valid data (clean reads or pass reads), which are stored in FASTQ format, to ensure the reliability of downstream analyses. SOAPnuke (v2.1.2, https://github.com/BGI-flexlab/SOAPnuke) was used to filter the raw next-generation sequencing data. The raw third-generation sequencing data (fast5 files) were base-called with Guppy, converted to FASTQ format, and then filtered to obtain valid data. K-mer values were selected automatically based on read length and data type. NECAT (v0.0.1, https://github.com/xiaochuanle/NECAT) was used to correct the reads and assemble the initial genome; Racon (v1.4.11, https://github.com/isovic/racon) was then used for two rounds of error correction with the third-generation reads, followed by two rounds of error correction with the next-generation reads using Pilon (v1.23, https://github.com/broadinstitute/pilon). The final assembly was obtained by removing redundant haplotigs from the corrected genome using purge_haplotigs (v1.1.2, https://github.com/skingan/purge_haplotigs_multiBAM).
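SOAPnuke and Guppy apply their own filtering rules, which are not reproduced here. The minimal Python sketch below only illustrates the general idea of read-level filtering described above: a read is kept if it is long enough, has few undetermined (N) bases, and has a sufficiently high mean base quality. All function names and thresholds (mean Phred ≥ 20, N fraction ≤ 5%) are hypothetical and are not the parameters used in this study; averaging Phred scores directly is also a simplification, since some tools average error probabilities instead.

```python
from typing import Iterator, List, Tuple


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from well-formed 4-line FASTQ records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:].split()[0], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Arithmetic mean of Phred scores encoded with the given ASCII offset."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def n_fraction(seq: str) -> float:
    """Fraction of undetermined 'N' bases in a read."""
    return seq.upper().count("N") / len(seq)


def filter_reads(text: str, min_mean_q: float = 20.0,
                 max_n_frac: float = 0.05, min_len: int = 1) -> List[str]:
    """Return IDs of reads passing all (hypothetical) cut-offs."""
    kept = []
    for read_id, seq, qual in parse_fastq(text):
        if len(seq) < min_len:
            continue  # too short (also guards the divisions below)
        if n_fraction(seq) > max_n_frac:
            continue  # too many undetermined bases
        if mean_phred(qual) < min_mean_q:
            continue  # low overall base quality
        kept.append(read_id)
    return kept
```

Checking the length first keeps the per-read divisions safe for empty records.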

Gene annotation

Gene structure prediction provides extensive information about the distribution and structure of genes in a genome, as well as essential material for functional annotation and evolutionary studies. Gene prediction for the MNP-2 genome was performed using BRAKER (v2.1.4, https://github.com/Gaius-Augustus/BRAKER), which combines GeneMark-ET¹⁷ and AUGUSTUS¹⁸. Functional annotation assigns gene functions and metabolic pathways based on existing databases, including predicted motifs, structural domains, protein activities, and the metabolic pathways in which the gene products participate. To obtain comprehensive gene function information, the genes of strain MNP-2 were annotated against nine databases: Nr (https://ftp.ncbi.nlm.nih.gov), Pfam (https://pfam.xfam.org/), eggCOG (https://www.ncbi.nlm.nih.gov/COG/), Uniprot (https://www.uniprot.org/), KEGG (https://www.kegg.jp/kegg/), GO (http://geneontology.org/), Pathway (http://www.pathwaycommons.org/), Refseq (https://www.ncbi.nlm.nih.gov/refseq/), and Interproscan (https://github.com/ebi-pf-team/interproscan).

Non-coding RNA annotation

Non-coding RNAs (ncRNAs) are transcribed from the genome but, rather than being translated into proteins, carry out their biological functions at the RNA level. Among them, tRNA and rRNA are directly involved in protein synthesis. Various classes of ncRNAs were predicted and categorized using INFERNAL (v1.1.2, https://github.com/EddyRivasLab/infernal) with the Rfam database (http://rfam.xfam.org/).

Repetitive sequence annotation

Repetitive sequences fall into two types: interspersed repeats and tandem repeats. Interspersed repeats, also known as transposable elements, include LTR retrotransposons, LINEs, SINEs, and DNA transposons. Based on copy number, repeats can also be classified as highly, moderately, or lowly repetitive sequences. RepeatModeler (v1.0.4, https://github.com/Dfam-consortium/RepeatModeler) was used to build a de novo repeat library for the genome, and RepeatMasker (v4.0.5, https://github.com/rmhubley/RepeatMasker) was used to annotate repetitive sequences in the genome.

Prediction of carbohydrate-active enzymes (CAZymes)

CAZymes are an important class of enzymes, classified into Glycoside Hydrolases (GHs), Glycosyl Transferases (GTs), Polysaccharide Lyases (PLs), Carbohydrate Esterases (CEs), Auxiliary Activities (AAs), Carbohydrate-Binding Modules (CBMs), and other classes. Studying carbohydrate-active enzymes can yield much useful biological information, and the CAZy database provides genomic, structural, and biochemical information on these enzymes. Protein sequences were annotated against the CAZy database (http://www.cazy.org/) using HMMER (v3.2.1, https://github.com/EddyRivasLab/hmmer), with the filtering criteria E-value < 1e-18 and coverage > 0.35.
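A minimal sketch of how the CAZy filtering criteria could be applied to HMMER output, reading the E-value cut-off as 1e-18 and the coverage cut-off as 0.35. It assumes the hits come from `hmmscan --domtblout` (one line per domain, `#` comment lines), uses the independent E-value (column 13), and defines coverage as the fraction of the HMM spanned by the alignment (columns 16 and 17 over the HMM length in column 3). These parsing and coverage choices are assumptions for illustration, not details reported in the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainHit:
    family: str      # HMM (CAZy family) name, domtblout column 1
    hmm_len: int     # HMM length, column 3
    protein: str     # query protein ID, column 4
    i_evalue: float  # independent E-value, column 13
    hmm_from: int    # alignment start on the HMM, column 16
    hmm_to: int      # alignment end on the HMM, column 17

    @property
    def coverage(self) -> float:
        """Fraction of the HMM covered by the alignment (assumed definition)."""
        return (self.hmm_to - self.hmm_from + 1) / self.hmm_len


def parse_domtblout(text: str) -> List[DomainHit]:
    """Parse whitespace-delimited domtblout lines into DomainHit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.split()
        hits.append(DomainHit(f[0], int(f[2]), f[3], float(f[12]),
                              int(f[15]), int(f[16])))
    return hits


def passes(hit: DomainHit, max_evalue: float = 1e-18, min_cov: float = 0.35) -> bool:
    """Keep a hit only if E-value < max_evalue and coverage > min_cov."""
    return hit.i_evalue < max_evalue and hit.coverage > min_cov
```

Keeping the thresholds as keyword arguments makes it easy to compare stricter or looser cut-offs on the same hit table.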

Analysis of pathogen-host interaction (PHI)

PHI-base (http://www.phi-base.org) is a database of experimentally validated pathogen-host interactions, derived primarily from fungal, oomycete, and bacterial pathogens infecting hosts such as animals, plants, and insects. The target protein sequences were annotated against PHI-base using Diamond blastp (v2.9.0, https://github.com/bbuchfink/diamond).
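The PHI-base, FungalP450, and DFVF annotations all rely on Diamond blastp searches. Diamond can write the BLAST tabular format (`--outfmt 6`), whose default 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. The sketch below keeps the highest-scoring hit per query protein; the E-value and identity cut-offs are hypothetical placeholders, because the thresholds used in this study are not stated.

```python
from typing import Dict, Tuple

# Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore


def best_hits(text: str, max_evalue: float = 1e-5,
              min_pident: float = 0.0) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, pident, bitscore)} for the top hit per query.

    The default cut-offs are hypothetical, not those used in the study.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        f = line.split("\t")
        query, subject = f[0], f[1]
        pident, evalue, bitscore = float(f[2]), float(f[10]), float(f[11])
        if evalue > max_evalue or pident < min_pident:
            continue  # hit fails the significance or identity filter
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)  # higher bit score wins
    return best
```

Ranking by bit score rather than E-value avoids ties when many hits have E-values that underflow to zero.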

Prediction of drug-resistant gene

CARD (https://card.mcmaster.ca/) is organized around the Antibiotic Resistance Ontology (ARO), which links information on antibiotics and their targets, resistance mechanisms, gene variants, and so on. The alignment results report the position of each gene annotated in CARD, together with its ARO ID and classification description, which can be used to understand the specific function of each gene related to antibiotic resistance.

Cytochromes P450 (CYP450) annotation

CYP450 is a large protein family that catalyzes the oxidation of a variety of substrates and participates in the metabolism of endogenous and exogenous substances such as drugs and environmental compounds. The target protein sequences were annotated using Diamond blastp based on the FungalP450 database (http://drnelson.utmem.edu/CytochromeP450.html).

Prediction of virulence gene

The Database of Fungal Virulence Factors (DFVF, http://sysbio.unl.edu/DFVF/) is dedicated to the study of fungal virulence factors. To investigate the virulence-related genes present in strain MNP-2, the predicted protein sequences were compared against DFVF using Diamond blastp.

Other annotations

Membrane transport proteins were classified using the Transporter Classification Database (TCDB, http://www.tcdb.org/). All predicted protein sequences were analyzed with SignalP (v5.0, http://www.cbs.dtu.dk/services/SignalP/) to identify proteins containing signal peptides, and with TMHMM (v2.0, http://www.cbs.dtu.dk/services/TMHMM/) to identify proteins containing transmembrane helices and predicted secreted proteins.
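Calling a protein "secreted" usually combines the two predictors above. The sketch below encodes one commonly used heuristic, stated here as an assumption rather than the study's exact rule: a protein is treated as classically secreted if SignalP predicts a signal peptide and TMHMM predicts no transmembrane helix beyond the cleavage site. The function name and input layout are hypothetical; real SignalP and TMHMM outputs would need to be parsed into these arguments first.

```python
from typing import List, Optional, Tuple


def is_secreted(cleavage_site: Optional[int],
                tm_helices: List[Tuple[int, int]]) -> bool:
    """Heuristic secretion call from SignalP + TMHMM results (assumed rule).

    cleavage_site: SignalP cleavage position, or None if no signal peptide.
    tm_helices: TMHMM helix spans as (start, end), 1-based residue positions.
    Helices lying entirely inside the signal peptide are ignored, because
    TMHMM can mistake the hydrophobic signal-peptide core for a helix.
    """
    if cleavage_site is None:
        return False  # no signal peptide: not on the classical secretory route
    mature_helices = [h for h in tm_helices if h[1] > cleavage_site]
    return not mature_helices  # secreted only if the mature chain has no helix
```

Ignoring helices that fall inside the signal peptide reduces false negatives that would otherwise arise from the overlap between the two predictors.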

Prediction of biosynthetic gene clusters (BGCs)

The genes responsible for secondary metabolite production are typically organized in BGCs. AntiSMASH (v6.1.1, https://docs.antismash.secondarymetabolites.org/) is currently the most widely used tool for identifying and characterizing BGCs in bacteria and fungi. AntiSMASH uses a rule-based approach to detect many types of SM biosynthetic pathways. For BGCs encoding NRPSs, type I and type II PKSs, lanthipeptides, lasso peptides, sactipeptides, and thiopeptides, more in-depth cluster-specific analyses are performed, which can provide more information about the biosynthetic steps involved and thus more detailed predictions of the compounds produced.
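To illustrate what "rule-based" detection means, the toy Python sketch below assigns a cluster type when a region's set of core-domain hits satisfies a Boolean rule, and labels a region that satisfies several rules as a hybrid. This is NOT antiSMASH's actual rule set: the three rules and the domain names are simplified, hypothetical stand-ins, and real antiSMASH rules also consider many more profiles and the distances between core genes.

```python
from typing import Callable, Dict, List, Set

# Hypothetical, simplified rules mapping a region's core-domain hits to True/False.
RULES: Dict[str, Callable[[Set[str]], bool]] = {
    "NRPS": lambda d: "Condensation" in d and "AMP-binding" in d,
    "T1PKS": lambda d: "PKS_KS" in d and "PKS_AT" in d,
    "terpene": lambda d: bool(d & {"Terpene_synth", "Terpene_synth_C"}),
}


def classify(domains: Set[str]) -> List[str]:
    """Return the matching type(s); two or more matches are reported as a hybrid."""
    types = [name for name, rule in RULES.items() if rule(domains)]
    if len(types) > 1:
        return ["hybrid:" + "-".join(types)]
    return types
```

For example, a region carrying condensation, adenylation, ketosynthase, and acyltransferase domains would be reported as `hybrid:NRPS-T1PKS` under these toy rules.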

Analysis of molecular networking (MN)

Since the advent of the online open-access Global Natural Products Social Molecular Networking platform (GNPS, https://gnps.ucsd.edu/), MN has rapidly become a widely used technique in natural products chemistry, with applications ranging from dereplication to genome mining, metabolomics, and chemical space visualization. The samples were dissolved in methanol (1 mg/mL) and analyzed on a SCIEX X500 QTOF mass spectrometer (SCIEX, USA) to generate LC-MS/MS data, which were pre-processed with MZmine and analyzed on the GNPS platform to generate molecular networks; the networks were visualized using Cytoscape.
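Molecular networks connect MS/MS spectra whose fragmentation patterns are similar. The sketch below computes a plain cosine similarity between two centroided spectra after binning m/z values and square-root scaling intensities. It is a simplified stand-in: GNPS uses a modified cosine that also matches peaks shifted by the precursor mass difference and applies further parameters (for example, a minimum number of matched peaks), none of which are implemented here. The bin width is a hypothetical choice.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 0.02) -> Dict[int, float]:
    """Sum square-root-scaled intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += math.sqrt(intensity)
    return binned


def cosine(a: List[Peak], b: List[Peak], bin_width: float = 0.02) -> float:
    """Plain (unmodified) cosine similarity of two binned spectra, 0 to 1."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0
```

Square-root scaling damps the influence of a few dominant fragment ions, so that smaller diagnostic peaks still contribute to the score.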

Fermentation and extraction

Strain MNP-2 grown on PDA was inoculated into 500 mL Erlenmeyer flasks containing 200 mL PDB and shaken for 3 days at 200 rpm and 30 ℃ to prepare seed cultures. Solid-state fermentation was performed in Erlenmeyer flasks (2 × 1 L), each containing rice (80 g) and tap water (120 mL). After autoclaving at 121 ℃ for 20 minutes, each flask was inoculated with 5% seed culture and incubated at room temperature under static conditions for 30 days. The fermented rice in each flask was extracted three times with 500 mL EtOAc by ultrasonication for 20 minutes, followed by filtration through gauze. All filtrates were combined and evaporated to dryness under vacuum to obtain sample 1 (approx. 1.56 g). In addition, strain MNP-2 was inoculated into 500 mL flasks containing 200 mL PDB or Czapek-Dox medium (2 flasks each) and shaken for 15 days at 200 rpm and 30 ℃. After fermentation, the broth was extracted three times with EtOAc (twice the volume of the broth) and evaporated to dryness under vacuum to obtain samples 2 (approx. 0.31 g) and 3 (approx. 0.27 g). The culture medium compositions are given in the supporting material.

Morphology, classification and phylogenetic analysis of strain MNP-2

On PDA, strain MNP-2 grows quickly: colonies are initially white and filamentous, turn green within 2–3 days, and eventually become dark green and powdery after a few more days (Fig. 1a). Electron microscopy showed highly branched, septate, multinucleate hyphae; conidiophores whose apices expand into spherical vesicles; and small phialides bearing chains of conidia (Fig. 1b). The internal transcribed spacer (ITS) region of the nuclear ribosomal DNA (nrDNA) of strain MNP-2 was amplified and sequenced, and the ITS sequence was searched for homologs in GenBank. BLAST analysis¹⁹ of the ITS sequence revealed that strain MNP-2 had the highest similarity (100%) to A. sydowii CBS 593.65 (Fig. 1c).

Genome feature of strain MNP-2

The genome of strain MNP-2 was assembled into 10 contigs with a total size of 34.9 Mb (Fig. 2), a contig N50 of 4.1 Mb, and a GC content of 50.0% (Table 1). N50 is the length of the shortest contig among the longest contigs that together cover at least 50% of the assembly. Contig N50 is generally used to assess genome contiguity: the larger the contig N50, the more contiguous the assembly. The overall genomic characteristics are similar to those of the five A. sydowii strains currently available in the NCBI database (https://www.ncbi.nlm.nih.gov/) (Table S1).

Table 1

Statistics of strain assembly
Item	Value
Total_length (bp)	34,924,093
Total_length_without N (bp)	34,924,093
Contig	10
GC_content (%)	50.00
N50 (bp)	4,126,853
N90 (bp)	2,836,306
Average (bp)	3,492,409.30
Median (bp)	3,372,540.50
Min (bp)	1,368,430
Max (bp)	5,687,656

Note: Total_length is the assembly length; Total_length_without N is the assembly length excluding gaps (N); GC_content is the GC content; N50 is the contig N50: when contigs are added from longest to shortest, N50 is the length of the contig at which the cumulative length first reaches half of the total length. N90 is defined in the same way at 90% of the total length. Average, Median, Min, and Max are the mean, median, minimum, and maximum contig lengths, respectively. Contig N50 is generally used to evaluate the contiguity of the genome.
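The N50 and N90 definitions in Table 1 can be expressed as one small function. The Python sketch below sorts contig lengths from longest to shortest and returns the length at which the running total first reaches the requested fraction of the assembly. The individual contig lengths of MNP-2 are not listed here, so the usage check relies on a toy example; as a separate sanity check, the Average row equals Total_length divided by the contig count (34,924,093 / 10 = 3,492,409.3).

```python
from typing import Iterable


def nx(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nx statistic: N50 for fraction=0.5, N90 for fraction=0.9.

    Contigs are sorted longest first; Nx is the length of the contig at which
    the running total first reaches `fraction` of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    raise ValueError("empty assembly")
```

For contigs of lengths 5, 4, 3, 2, and 1 (total 15), the running total first reaches 7.5 at the second contig (N50 = 4) and 13.5 at the fourth (N90 = 2).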

Prediction of genetic structure

Gene prediction and completeness evaluation identified 13,218 genes, with an average mRNA length of 1,610.07 bp and an average CDS length of 1,444.90 bp. The genes contained a total of 42,982 exons and 29,764 introns, with an average of 3.25 exons per gene, an average exon length of 444.34 bp, and an average intron length of 73.35 bp. BUSCO analysis²⁰ gave a single-copy completeness of 98% (Table S2, Fig. S1). Non-coding RNA annotation identified 46 rRNAs, 7 sRNAs, 32 snRNAs, and 119 tRNAs (Table S3). Repeat annotation identified 19 SINEs (short interspersed elements), 414 LINEs (long interspersed nuclear elements), 1,728 LTRs (long terminal repeats), 606 DNA transposons, 40 satellite DNAs, and 91 other repeats (Table S4).
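The gene-structure statistics above are internally linked, which makes them easy to cross-check. Assuming one transcript per gene (an assumption, not stated in the text), each gene with k exons has k − 1 introns, so the intron count should equal exons minus genes, and the mean exon length times exons per gene should approximate the mean CDS length. The short Python check below uses only numbers reported in this section; the close agreement of the last identity suggests the exon statistics refer to coding exons.

```python
# Values reported in this section; one transcript per gene is assumed.
genes = 13_218
exons = 42_982
introns = 29_764
mean_exon_len = 444.34   # bp
mean_cds_len = 1_444.90  # bp
ncrna = {"rRNA": 46, "sRNA": 7, "snRNA": 32, "tRNA": 119}

exons_per_gene = exons / genes                       # reported as 3.25
implied_introns = exons - genes                      # (exons - 1) introns per gene
implied_cds_len = mean_exon_len * exons_per_gene     # compare with mean CDS length
total_ncrna = sum(ncrna.values())                    # total ncRNAs predicted

print(f"{exons_per_gene:.2f} {implied_introns} {implied_cds_len:.1f} {total_ncrna}")
```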

Annotation of gene functions

The total number of predicted genes was 13,218, of which 12,912 (97.68%) had annotation information. The numbers of genes annotated in the Nr, Pfam²¹, eggCOG²², Uniprot²³, KEGG²⁴, GO²⁵, Pathway²⁶, Refseq²⁷, and Interproscan²⁸ databases were 12,894 (97.55%), 10,642 (80.51%), 1,047 (7.92%), 7,772 (58.80%), 2,946 (22.29%), 7,693 (58.20%), 2,764 (20.91%), 6,252 (47.30%), and 10,626 (80.39%), respectively (Table S5). According to the Nr annotation results, most best hits of strain MNP-2 genes were to Aspergillus species (Fig. 3a). The COG database, built on the evolutionary relationships among bacteria, algae, and eukaryotes, can be used to classify genes by orthology. Analysis of the COG data showed that the more prevalent categories were energy production and conversion, amino acid transport and metabolism, carbohydrate transport and metabolism, and lipid transport and metabolism (Fig. 3b). After KEGG annotation, the sequences were grouped into major categories including cellular processes, environmental information processing, genetic information processing, metabolism, organismal systems, etc. Among them, metabolism had the most annotated genes (3,823, 56.25%), particularly carbohydrate metabolism and amino acid metabolism, with 742 and 732 annotated genes, respectively. These annotations suggest rich and diverse functions in protein and lipid metabolism, which may contribute to efficient energy conversion (Fig. 3c). GO annotations were simplified into GOslim categories; after gene functions were summarized by cellular component, molecular function, and biological process, the 20 most annotated second-level GOslim categories in each group were plotted (Fig. 3d).
The number of genes assigned to each second-level GO term, relative to all genes, was used to assess the representation of each function. The Pfam database contains information on protein families. The genes annotated to each domain were tallied and the top 20 domains were plotted (Fig. 3e); the Major Facilitator Superfamily (MFS) domain matched the most genes (620).
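Because every annotation percentage in this section is a count divided by the 13,218 predicted genes, the percentages can be recomputed directly from the counts, which is a quick way to catch transcription slips in summary tables. The Python sketch below does this for the counts reported above; the dictionary labels are simply the database names used in the text.

```python
TOTAL_GENES = 13_218

# Gene counts per database, as reported in this section.
annotated = {
    "Any database": 12_912, "Nr": 12_894, "Pfam": 10_642, "eggCOG": 1_047,
    "Uniprot": 7_772, "KEGG": 2_946, "GO": 7_693, "Pathway": 2_764,
    "Refseq": 6_252, "Interproscan": 10_626,
}

# Percentage of all predicted genes, rounded to two decimals.
percent = {db: round(100 * n / TOTAL_GENES, 2) for db, n in annotated.items()}

for db, p in percent.items():
    print(f"{db}: {annotated[db]} genes ({p:.2f}%)")
```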

Annotations to proprietary databases

In addition, 6 carbohydrate-active enzymes were annotated using the Carbohydrate-Active enZymes (CAZy) database²⁹ (Table S6). Based on the Pathogen-Host Interactions database (PHI-base)³⁰, 6 query sequences were annotated, and the matched database entries and their similarities are listed (Table S7). To explore the drug resistance genes present in the genome, 10 drug resistance genes were annotated using the Comprehensive Antibiotic Resistance Database (CARD)³¹ (Table S8). CYP450s are a broad family of heme-containing proteins; a total of 1,366 CYP450 protein sequences were annotated based on the FungalP450 database³² (Table S9). Comparison of the predicted protein sequences with the Database of Fungal Virulence Factors (DFVF)³³ identified 6 virulence-related genes in the sequenced strain (Table S10). Meanwhile, 2,139 membrane transport proteins were annotated in the Transporter Classification Database (TCDB)³⁴, 1,182 protein sequences containing signal peptides were predicted by SignalP, and 2,729 proteins containing transmembrane helices and 926 secreted proteins were predicted by TMHMM (Table S11). The analysis software and databases used in this study are listed in Table S12.

Prediction of secondary metabolite clusters (BGCs)

SM cluster prediction in A. sydowii MNP-2 was performed using antiSMASH³⁵, which predicted 52 putative clusters by searching the genome assembly for conserved domains. The predicted SM clusters of strain MNP-2 are defined by the "backbone enzymes" that generate the carbon skeleton of the putative SM. Most backbone enzymes in strain MNP-2 are polyketide synthases (PKSs) or nonribosomal peptide synthetases (NRPSs): 18 SM clusters contain sequences encoding NRPS/NRPS-like enzymes, 10 contain sequences encoding PKS/PKS-like enzymes, and 10 are hybrid clusters. Among the remaining clusters, 8 have a terpene cyclase as the backbone enzyme and are likely involved in terpenoid production, and 5 have an indole synthase as the backbone enzyme. These predicted BGCs suggest that strain MNP-2 has a diverse secondary metabolite production potential, and some of them are strikingly similar to the BGCs of previously reported compounds such as neosartorin (A-1)³⁶, nidulanin A (A-2)³⁷, asperlactone (A-3)³⁸, squalestatin (A-4)³⁹, penicillin (A-5)⁴⁰, fellutamide B (A-6)⁴¹, equisetin (A-7)⁴², and destruxin A (A-8)⁴³, among others (Fig. 4). These compounds are all microbial secondary metabolites with a variety of biological activities (Table S13). The presence of such similar BGCs suggests that strain MNP-2 may be able to produce compounds of these types.

Analysis of molecular networking (MN)

The metabolite analysis showed that strain MNP-2 can produce complex molecular skeletons, particularly heterocyclic and bridged-ring skeletons (G-1–G-10, Fig. 5). These compounds originate from a wide range of sources and have diverse biological activities (Table S13)^44–50. Destruxin A (A-8), a cyclic peptide with strong bioactivity, was also detected in sample 1 (Fig. 5, G-3), and its potential BGC was examined using antiSMASH (Fig. 4h). Similarly, compounds structurally related to neosartorin (A-1), a xanthone analogue, and asperlactone (A-3), which contains a lactone ring, were found in the MN (G-7–G-10), which may facilitate analysis of the biosynthetic pathways of these compound classes. However, most of the metabolites analyzed could not be matched to corresponding BGCs, indicating a significant research gap that remains to be filled.

The analysis revealed that strain MNP-2 produces an abundant array of metabolites, whose structural features mainly include aromatic polyketides, peptides, alkaloids, terpenoids, and fatty acids. Strain MNP-2 produced the most abundant metabolites on rice solid medium (Fig. 5). Furthermore, the molecular network contains many single-colored nodes (features detected in only one extract), indicating that the metabolite profile of strain MNP-2 differs markedly among culture media. Many unknown nodes were present in the clusters, and some of the marker compounds were closely related to the BGCs of strain MNP-2.
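The reasoning about single-colored nodes can be made concrete: in a network colored by sample, a node drawn in one color was detected in only one extract, so counting such nodes per extract indicates how medium-specific each metabolite profile is. The Python sketch below performs that count on a hypothetical node-to-extract mapping; the node IDs and extract labels are invented for illustration and do not come from the MNP-2 network.

```python
from typing import Dict, Set


def medium_specific_nodes(node_sources: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count nodes detected in exactly one extract, grouped by that extract."""
    counts: Dict[str, int] = {}
    for sources in node_sources.values():
        if len(sources) == 1:
            (only,) = sources  # the single extract in which the node was seen
            counts[only] = counts.get(only, 0) + 1
    return counts


# Hypothetical example network (not data from this study).
example = {
    "n1": {"rice"}, "n2": {"rice", "PDB"}, "n3": {"Czapek-Dox"}, "n4": {"rice"},
}
print(medium_specific_nodes(example))
```

Nodes shared by several extracts (such as `n2`) are excluded, so the counts reflect only medium-specific features.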

To date, only five genome sequences of A. sydowii strains from various sources have been deposited in the NCBI database. Comparison of the overall genome features of these strains with those of A. sydowii MNP-2 showed that strain MNP-2 has an intermediate genome size (34.9 Mb) and GC content (50%). However, the assembly of strain MNP-2 contains only 10 contigs, suggesting that it is more contiguous than the others.

AntiSMASH analysis revealed that strain MNP-2 possesses 52 BGCs, including NRPS/NRPS-like (18 SM clusters), PKS/PKS-like (10 SM clusters), terpene cyclase (8 SM clusters), indole synthase (5 SM clusters), hybrid (10 SM clusters), and fungal-RiPP (1 SM cluster) clusters. Compared with A. sydowii CBS 593.65 (47 SM clusters), A. sydowii Fsh102 (43 SM clusters), A. sydowii AS31 (48 SM clusters), A. sydowii AS42 (43 SM clusters), and A. sydowii BOBA1 (50 SM clusters), strain A. sydowii MNP-2 has the highest number of BGCs. Some BGCs may produce structurally complex bioactive compounds (such as A-1–A-8)^36–43. However, most of the other BGCs do not match known clusters and might synthesize unique chemical skeletons. Although BGC types can be accurately predicted from the core SM biosynthetic genes encoding backbone enzymes, it is still impossible to predict the exact boundaries of BGCs or the functions of clusters lacking backbone enzymes. This is because many genes surrounding the core SM biosynthetic genes in strain MNP-2 cannot be characterized using open-source bioinformatics tools^51,52. Metabolic analyses of strain MNP-2 grown on various media (rice solid, Czapek-Dox, and PDB) using GNPS molecular networking revealed its great potential for the biosynthesis of bioactive SMs containing a variety of heterocyclic and bridged-ring structures. For example, compound G-2 exhibited potent anti-HIV activity with an IC₅₀ value of 7.2 nM and an EC₅₀ value of 0.9 nM⁴⁴. Compound G-5 showed excellent in vitro cytotoxicity against the K562, MCF-7, HeLa, DU145, U1975, SGC-7901, A549, MOLT-4, and HL60 cell lines, with IC₅₀ values ranging from 0.10 to 3.3 µM, and significant antiviral activity against H1N1 and H3N2, with IC₅₀ values of 15.9 and 30.0 µM, respectively⁴⁶. Compound G-10 displayed a moderate immunosuppressive effect with an IC₅₀ value of 19.2 µg/mL⁵⁰.
These findings indicate that the Arctic marine-derived strain MNP-2 may be a prolific producer of potential therapeutic agents. To mine the biosynthetic potential of strain MNP-2 in depth, integrating genomic and metabolomic data can help rapidly assess the novelty of metabolites and link them to their BGCs^53–55.
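The BGC counts discussed above can be cross-checked in a few lines of Python: the per-type counts should sum to the 52 clusters reported by antiSMASH, and that total can be compared with the counts for the other A. sydowii strains. All numbers are copied from this section; the check only restates the text and adds no new data.

```python
# BGC counts for strain MNP-2 by backbone/cluster type, as reported above.
bgc_types = {
    "NRPS/NRPS-like": 18, "PKS/PKS-like": 10, "hybrid": 10,
    "terpene cyclase": 8, "indole synthase": 5, "fungal-RiPP": 1,
}

# Reported BGC counts for the other A. sydowii strains.
other_strains = {"CBS 593.65": 47, "Fsh102": 43, "AS31": 48, "AS42": 43, "BOBA1": 50}

total = sum(bgc_types.values())
runner_up = max(other_strains.values())
print(f"MNP-2: {total} BGCs; highest among other strains: {runner_up}")
```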

As some of the least exploited organisms on Earth, polar marine-derived microbes harbor more diverse genes for the biosynthesis of functional natural products¹. In this study, a high-quality whole-genome sequence of the Arctic marine-derived strain MNP-2, 34.9 Mb in size, was obtained. BRAKER predicted a total of 13,218 genes, and INFERNAL predicted 204 non-coding RNAs (rRNA, sRNA, snRNA, and tRNA). Using the Nr, Pfam, eggCOG, KEGG, GO, and other databases, 12,912 genes (97.68% of all genes) were annotated. These results are valuable for gene resource mining and for genomic studies of polar microorganisms. Additionally, antiSMASH analysis indicated that strain MNP-2 harbors 52 BGCs, which may produce SMs with various structural motifs. This work unveiled the biosynthetic potential of strain MNP-2 using genomic and metabolomic techniques. Further strategies should be employed to activate the cryptic BGCs in this strain to produce novel and/or valuable SMs⁵⁶, such as ribosome engineering⁵⁷, metabolic engineering⁵⁸, global regulators⁵⁹, protein modification genes⁶⁰, heterologous expression⁶¹, promoter exchange⁶², BGC refactoring⁶³, and BGC-specific regulators⁶⁴.

Abbreviations

SMs: secondary metabolites
MN: molecular networking
GNPS: Global Natural Products Social Molecular Networking
BGCs: biosynthetic gene clusters
PDA: potato dextrose agar
PDB: potato dextrose broth
PBS: phosphate buffered solution
CAZymes: carbohydrate-active enzymes
GHs: Glycoside Hydrolases
GTs: Glycosyl Transferases
PLs: Polysaccharide Lyases
CEs: Carbohydrate Esterases
AAs: Auxiliary Activities
CBMs: Carbohydrate-Binding Modules
PHI: pathogen-host interaction
ARO: Antibiotic Resistance Ontology
CYP450: Cytochromes P450
DFVF: Database of Fungal Virulence Factors
TCDB: Transporter Classification Database
ITS: internal transcribed spacer region
CDSs: coding sequences
SINE: Short interspersed element
LINE: Long interspersed nuclear element
LTR: Long terminal repeat
MFS: Major Facilitator Superfamily
Nr: Non-redundant protein sequence database
COG: Clusters of Orthologous Groups of proteins
KEGG: Kyoto Encyclopedia of Genes and Genomes
GO: Gene Ontology
CARD: Comprehensive Antibiotic Resistance Database
PKS: polyketide synthase
NRPS: nonribosomal peptide synthetase

Data availability

Data are contained within the article or supplementary material. The complete genome sequence reported in this study is available in NCBI under accession GCA_034192605.1.

Author contributions

Huawei Zhang designed the research; Zhiyang Fu performed the research; Xiangzhou Gong, Zhe Hu and Bin Wei modified the figures; Zhiyang Fu wrote the manuscript. All authors read and approved the manuscript.

Funding

This work was financially supported by the National Key Research and Development Program of China (2022YFC2804203).

Competing interests

The authors declare no competing interests.

Santiago, I.F., Soares, M.A., Rosa, C.A. & Rosa, L.H. Lichensphere: A protected natural microhabitat of the non-lichenised fungal communities living in extreme environments of Antarctica. Extremophiles 19 (6), 1087–1097 (2015).
Makhalanyane, T.P., Van Goethem, M.W. & Cowan, D.A. Microbial diversity and functional capacity in polar soils. Curr Opin Biotech. 38, 159–166 (2016).
Liu, J.T. et al. Bioactive natural products from the Antarctic and Arctic organisms. Mini-Rev Med Chem. 13 (4), 617–626 (2013).
Tian, Y., Taglialatela-Scafati, O. & Zhao, F. Secondary metabolites from polar organisms. Mar Drugs. 15 (3), 28 (2017).
dos Santos, G.S., Teixeira, T.R., Colepicolo, P. & Debonsi, H.M. Natural products from the poles: structural diversity and biological activities. Rev Bras Farmacogn. 31, 531–560 (2021).
Asthana, R.K. et al. Isolation and identification of a new antibacterial entity from the Antarctic cyanobacterium Nostoc CCC 537. J Appl Phycol. 21, 81–88 (2009).
Lin, A., Wu, G., Gu, Q., Zhu, T. & Li, D. New eremophilane-type sesquiterpenes from an Antarctic deep-sea derived fungus, Penicillium sp. PR19 N-1. Arch Pharm Res. 37 (7), 839–844 (2014).
Yang, A. et al. Nitrosporeusines A and B, unprecedented thioester-bearing alkaloids from the Arctic Streptomyces nitrosporeus. Org Lett. 15 (20), 5366–5369 (2013).
Tripathi, V.C. et al. Natural products from polar organisms: Structural diversity, bioactivities and potential pharmaceutical applications. Polar Sci. 18, 147–166 (2018).
Kellogg, J.J. et al. Biochemometrics for natural products research: comparison of data analysis approaches and application to identification of bioactive compounds. J Nat Prod. 79 (2), 376–386 (2016).
Bachmann, B.O., Van Lanen, S.G. & Baltz, R.H. Microbial genome mining for accelerated natural products discovery: Is a renaissance in the making? J Ind Microbiol Biot. 41 (2), 175–184 (2014).
Caesar, L.K., Montaser, R., Keller, N.P. & Kelleher, N.L. Metabolomics and genomics in natural products research: Complementary tools for targeting new chemical entities. Nat Prod Rep. 38 (11), 2041–2065 (2021).
Paulus, C. et al. New natural products identified by combined genomics-metabolomics profiling of marine Streptomyces sp. MP131-18. Sci Rep. 7, 42382 (2017).
Hou, X.M. et al. Integrating molecular networking and ¹H NMR to target the isolation of chrysogeamides from a library of marine-derived Penicillium fungi. J Org Chem. 84 (3), 1228–1237 (2019).
Liu, L.L. et al. Molecular networking-based for the target discovery of potent antiproliferative polycyclic macrolactam ansamycins from Streptomyces cacaoi subsp. asoensis. Org Chem Front. 7 (24), 4008–4018 (2020).
Sun, H.M. et al. Multi-omics-guided discovery of omicsynins produced by Streptomyces sp. 1647: Pseudo-tetrapeptides active against influenza A viruses and Coronavirus HCoV-229E. Engineering 16, 176–186 (2022).
Hoff, K.J., Lange, S., Lomsadze, A., Borodovsky, M. & Stanke, M. BRAKER1: Unsupervised RNA-seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32 (5), 767–769 (2016).
Stanke, M. et al. AUGUSTUS: Ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435-W439 (2006).
Kumar, S., Stecher, G. & Tamura, K. MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 33 (7), 1870–1874 (2016).
Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V. & Zdobnov, E.M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31 (19), 3210–3212 (2015).
Finn, R.D. et al. Pfam: Clans, web tools and services. Nucleic Acids Res. 34, D247-D251 (2006).
Tatusov, R.L., Galperin, M.Y., Natale, D.A. & Koonin, E.V. The COG database: A tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 28 (1), 33–36 (2000).
The UniProt Consortium. Reorganizing the protein space at the universal Protein Resource (UniProt). Nucleic Acids Res. 40 (D1), D71-D75 (2012).
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: New perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45 (D1), D353-D361 (2017).
The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 47 (D1), D330-D338 (2018).
Karp, P.D. et al. Pathway tools version 19.0 update: Software for pathway/genome informatics and systems biology. Brief Bioinform. 17 (5), 877–890 (2015).
O'Leary, N.A. et al. Reference Sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44 (D1), D733-D745 (2016).
Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116-W120 (2005).
Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P.M. & Henrissat, B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42 (D1), D490-D495 (2014).
Urban, M. et al. PHI-base: The pathogen–host interactions database. Nucleic Acids Res. 48 (D1), D613-D620 (2019).
Alcock, B.P. et al. CARD 2020: Antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 48 (D1), D517-D525 (2019).
Park, J. et al. Fungal cytochrome P450 database. BMC Genomics 9 (1), 402 (2008).
Lu, T., Yao, B. & Zhang, C. DFVF: Database of fungal virulence factors. Database 2012, bas032 (2012).
Saier, M.H. et al. The transporter classification database (TCDB): Recent advances. Nucleic Acids Res. 44 (D1), D372-D379 (2016).
Blin, K. et al. antiSMASH 5.0: Updates to the secondary metabolite genome mining pipeline. Nucleic Acids Res. 47 (W1), W81-W87 (2019).
Matsuda, Y., Gotfredsen, C.H. & Larsen, T.O. Genetic characterization of neosartorin biosynthesis provides insight into heterodimeric natural product generation. Org Lett. 20 (22), 7197–7200 (2018).
Andersen, M.R. et al. Accurate prediction of secondary metabolite gene clusters in filamentous fungi. Proc Natl Acad Sci U.S.A. 110 (1), E99-E107 (2012).
Bacha, N. et al. Cloning and characterization of novel methylsalicylic acid synthase gene involved in the biosynthesis of isoasperlactone and asperlactone in Aspergillus westerdijkiae. Fungal Genet Biol. 46 (10), 742–749 (2009).
Bonsch, B. et al. Identification of genes encoding squalestatin S1 biosynthesis and in vitro production of new squalestatin analogues. Chem Commun. 52 (41), 6777–6780 (2016).
Fierro, F. et al. Transcriptional and bioinformatic analysis of the 56.8kb DNA region amplified in tandem repeats containing the penicillin gene cluster in Penicillium chrysogenum. Fungal Genet Biol. 43 (9), 618–629 (2006).
Yeh, H.H. et al. Resistance gene-guided genome mining: serial promoter exchanges in Aspergillus nidulans reveal the biosynthetic pathway for fellutamide B, a proteasome inhibitor. ACS Chem Biol. 11 (8), 2275–2284 (2016).
Kakule, T.B., Sardar, D., Lin, Z. & Schmidt, E.W. Two related pyrrolidinedione synthetase loci in Fusarium heterosporum ATCC 74349 produce divergent metabolites. ACS Chem Biol. 8 (7), 1549–1557 (2013).
Wang, B., Kang, Q., Lu, Y., Bai, L. & Wang, C. Unveiling the biosynthetic puzzle of destruxins in Metarhizium species. Proc Natl Acad Sci U.S.A. 109 (4), 1287–1292 (2012).
Sato, M. et al. Novel HIV-1 integrase inhibitors derived from quinolone antibiotics. J Med Chem. 49 (5), 1506–1508 (2006).
Truman, P., Stirling, D.J., Northcote, P., Lake, R.J. & Hannah, D.J. Determination of brevetoxins in shellfish by the neuroblastoma assay. J AOAC Int. 85 (5), 1057–1063 (2002).
Wang, J.F. et al. Dicarabrol, a new dimeric sesquiterpene from Carpesium abrotanoides L. Bioorg Med Chem Lett. 25 (19), 4082–4084 (2015).
Li, A., Sun, A. & Liu, R. Preparative isolation and purification of costunolide and dehydrocostuslactone from Aucklandia lappa Decne by high-speed counter-current chromatography. J Chromatogr A. 1076 (1–2), 193–197 (2005).
Sviridov, A.F. Ginkgolides and bilobalide: structure, pharmacology, and synthesis. Bioorg Khim. 17 (10), 1301–1312 (1991).
Ma, T.T. et al. Xanthones with α-glucosidase inhibitory activities from Aspergillus versicolor, a fungal endophyte of Huperzia serrata. Helv Chim Acta. 98 (1), 148–152 (2015).
Liu, H. et al. Polyketides with immunosuppressive activities from mangrove endophytic fungus Penicillium sp. ZJ-SY2. Mar Drugs. 14 (12), 217 (2016).
Li, X. et al. Genome sequencing and evolutionary analysis of marine gut fungus Aspergillus sp. Z5 from Ligia oceanica. Evol Bioinform Online. 12 (Suppl 1), 1–4 (2016).
Yaegashi, J., Oakley, B.R. & Wang, C.C. Recent advances in genome mining of secondary metabolite biosynthetic gene clusters and the development of heterologous expression systems in Aspergillus nidulans. J Ind Microbiol Biotechnol. 41 (2), 433–442 (2014).
Louwen, J.J.R., Medema, M.H. & van der Hooft, J.J.J. Enhanced correlation-based linking of biosynthetic gene clusters to their metabolic products through chemical class matching. Microbiome 11 (1), 13 (2023).
van der Hooft, J.J.J. et al. Linking genomics and metabolomics to chart specialized metabolic diversity. Chem Soc Rev. 49, 3297–3314 (2020).
Louwen, J.J.R. & van der Hooft, J.J.J. Comprehensive large-scale integrative analysis of omics data to accelerate specialized metabolite discovery. mSystems 6 (4), e0072621 (2021).
Kalkreuter, E., Pan, G., Cepeda, A.J. & Shen, B. Targeting bacterial genomes for natural product discovery. Trends Pharmacol Sci. 41 (1), 13–26 (2019).
Liu, L. et al. Ribosome engineering and fermentation optimization leads to overproduction of tiancimycin A, a new enediyne natural product from Streptomyces sp. CB03234. J Ind Microbiol Biot. 45 (3), 141–151 (2018).
Xu, F. et al. A genetics-free method for high-throughput discovery of cryptic microbial metabolites. Nat Chem Biol. 15, 161–168 (2019).
Peng, Q. et al. Engineered Streptomyces lividans strains for optimal identification and expression of cryptic biosynthetic gene clusters. Front Microbiol. 9, (2018).
Zhang, B. et al. Activation of natural products biosynthetic pathways via a protein modification level regulation. ACS Chem Biol. 12 (7), 1732–1736 (2017).
Alberti, F. et al. Heterologous expression reveals the biosynthesis of the antibiotic pleuromutilin and generates bioactive semi-synthetic derivatives. Nat Commun. 8, 1831 (2017).
Liu, Y. et al. A CRISPR-Cas9 strategy for activating the Saccharopolyspora erythraea erythromycin biosynthetic gene cluster with knock-in bidirectional promoters. ACS Synth Biol. 8 (5), 1134–1143 (2019).
Ren, H., Biswas, S., Ho, S., van der Donk, W.A. & Zhao, H. Rapid discovery of glycocins through pathway refactoring in Escherichia coli. ACS Chem Biol. 13 (10), 2966–2972 (2018).
Chen, Y., Yin, M., Horsman, G.P. & Shen, B. Improvement of the enediyne antitumor antibiotic C-1027 production by manipulating its biosynthetic pathway regulation in Streptomyces globisporus. J Nat Prod. 74 (3), 420–424 (2011).

SupplementaryInformation.pdf
