DNA extraction and next-generation sequencing
The fungal strain A. sydowii MNP-2 was isolated from Arctic marine sediments (73.8° N, 168.9° W) and is preserved at the China Center for Type Culture Collection (CCTCC NO: M 2022061). Strain MNP-2 grown on potato dextrose agar (PDA) was inoculated into 250 mL Erlenmeyer flasks containing 100 mL of potato dextrose broth (PDB) and shaken for 2 days at 200 rpm and 30 ℃. After centrifugation, the supernatant was discarded, and the mycelium was washed once with phosphate-buffered solution (PBS) and stored at -80 ℃. The purity and integrity of the genomic DNA were evaluated by 1% agarose gel electrophoresis, with densitometric comparison against standards of comparable size.
The yield and purity of the extracted DNA were determined using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, USA) and a Qubit 2.0 fluorometer (Thermo Fisher Scientific, USA). Libraries were constructed from DNA samples that passed quality control. The finished libraries were first quantified using Qubit 2.0 and diluted, and their insert sizes were then checked on an Agilent 2100. Once the insert size met expectations, the effective concentration of each library was accurately measured by Q-PCR to ensure library quality. Qualified libraries were pooled onto flow cells according to their effective concentrations and the required downstream data volume. Clusters were generated on a cBot and sequenced on the Illumina NovaSeq high-throughput sequencing platform.
SMRT sequencing
Genomic DNA from the qualified samples described in 3.1 was randomly sheared using a Megaruptor, and large DNA fragments were enriched and purified with magnetic beads. Large fragments were then cut and recovered using a BluePippin automated nucleic acid recovery instrument. After purification, end repair and A-tailing were performed at both ends of the DNA fragments, and adapters were ligated using the SQK-LSK109 kit. Finally, the DNA library was accurately quantified using Qubit. After library construction, a defined concentration and volume of the DNA library was loaded onto a flow cell, which was transferred to an Oxford Nanopore PromethION sequencer for real-time single-molecule sequencing.
Genome assembly
Because raw data may contain low-quality sequences, adapter sequences, and other artifacts, they must be filtered to obtain valid data (clean reads or pass reads), which are stored in FASTQ format, to ensure the reliability of the downstream analyses. SOAPnuke (v2.1.2, https://github.com/BGI-flexlab/SOAPnuke) was used to filter the raw next-generation sequencing data. The raw third-generation sequencing data were FAST5 files, which were converted to FASTQ format by base calling with Guppy and then filtered to obtain valid data. K-mer values were selected automatically based on read length and data type. NECAT (v0.0.1, https://github.com/xiaochuanle/NECAT) was used to correct the reads and assemble the genome, giving the initial assembly. Racon (v1.4.11, https://github.com/isovic/racon) was then used for two rounds of error correction of this assembly with the third-generation sequencing data, followed by two further rounds of error correction with the next-generation sequencing data using Pilon (v1.23, https://github.com/broadinstitute/pilon). The final assembly was obtained by removing redundant sequences from the corrected genome using purge_haplotigs (v1.1.2, https://github.com/skingan/purge_haplotigs_multiBAM).
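The read-filtering idea described above can be illustrated with a minimal Python sketch. This is not SOAPnuke's implementation: the function names and the thresholds (`min_mean_q`, `max_n_frac`) are hypothetical and chosen only for illustration, and Phred+33 quality encoding is assumed.

```python
from typing import Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].strip(), lines[i + 1].strip(), lines[i + 3].strip()


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def is_clean(read: Read, min_mean_q: float = 20.0, max_n_frac: float = 0.1) -> bool:
    """Hypothetical filter: reject low-quality or N-rich reads."""
    _, seq, qual = read
    if not seq or len(seq) != len(qual):
        return False  # malformed record
    n_frac = seq.upper().count("N") / len(seq)
    return mean_phred(qual) >= min_mean_q and n_frac <= max_n_frac


def filter_reads(lines: List[str], **kwargs) -> List[Read]:
    """Return only the reads that pass the filter."""
    return [r for r in parse_fastq(lines) if is_clean(r, **kwargs)]


# Toy input: three reads of eight bases each.
example = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",  # Q40 at every base: kept
    "@read2", "ACGTNNNN", "+", "IIIIIIII",  # 50% N bases: dropped
    "@read3", "ACGTACGT", "+", "!!!!!!!!",  # Q0 at every base: dropped
]
clean = filter_reads(example)
```

Real filters also trim adapters and handle paired reads, which this sketch leaves out.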
Gene annotation
Gene structure prediction provides detailed information about the distribution and structure of genes in the genome and supplies essential input for functional annotation and evolutionary studies. Gene annotation of the MNP-2 genome was conducted using BRAKER (v2.1.4, https://github.com/Gaius-Augustus/BRAKER), which combines GeneMark-ET [17] and AUGUSTUS [18]. Functional annotation refers to the annotation of gene functions and metabolic pathways based on existing databases, including predictions of motifs, structural domains, protein activities, and the metabolic pathways in which the proteins participate. Gene function annotation of strain MNP-2 was performed using nine databases, Nr (https://ftp.ncbi.nlm.nih.gov), Pfam (https://pfam.xfam.org/), eggCOG (https://www.ncbi.nlm.nih.gov/COG/), UniProt (https://www.uniprot.org/), KEGG (https://www.kegg.jp/kegg/), GO (http://geneontology.org/), Pathway (http://www.pathwaycommons.org/), RefSeq (https://www.ncbi.nlm.nih.gov/refseq/), and InterProScan (https://github.com/ebi-pf-team/interproscan), to obtain comprehensive gene function information.
Non-coding RNA annotation
Non-coding RNAs (ncRNAs) are transcribed from the genome but are not translated into proteins; instead, they carry out their biological functions at the RNA level. Among them, tRNA and rRNA are directly involved in protein synthesis. The various types of ncRNAs were predicted and counted by category using INFERNAL (v1.1.2, https://github.com/EddyRivasLab/infernal) with the Rfam database (http://rfam.xfam.org/).
Repetitive sequence annotation
Repetitive sequences fall into two types: interspersed repeats and tandem repeats. Interspersed repeats, also known as transposable elements, include LTR, LINE, SINE, and DNA transposons. Repetitive sequences can also be classified as highly, moderately, or low repetitive according to the number of repeats. RepeatModeler (v1.0.4, https://github.com/Dfam-consortium/RepeatModeler) was used to build a de novo repeat library for the genome, and RepeatMasker (v4.0.5, https://github.com/rmhubley/RepeatMasker) was used to annotate the repetitive sequences in the genome.
Prediction of carbohydrate-active enzymes (CAZymes)
CAZymes are an important class of enzymes that are classified into glycoside hydrolases (GHs), glycosyl transferases (GTs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), auxiliary activities (AAs), and carbohydrate-binding modules (CBMs). Studying carbohydrate-related enzymes can yield useful biological information. The CAZy database (http://www.cazy.org/) provides genomic, structural, and biochemical information on carbohydrate-active enzymes. HMMER (v3.2.1, https://github.com/EddyRivasLab/hmmer) was used to annotate the protein sequences against the CAZy database (filtering parameters: E-value < 1e-18, coverage > 0.35).
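The filtering step can be sketched in Python as follows. The two thresholds are those stated above; everything else is an assumption made for illustration, in particular the `DomainHit` record, the toy hits, and the definition of coverage as the aligned fraction of the profile HMM (the text does not define how coverage was computed).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainHit:
    """Hypothetical record holding one parsed HMMER domain hit."""
    protein_id: str
    family: str      # CAZy family label, e.g. "GH5"
    evalue: float    # domain E-value
    hmm_from: int    # first matched profile position (1-based)
    hmm_to: int      # last matched profile position (inclusive)
    hmm_length: int  # total length of the profile HMM

    @property
    def coverage(self) -> float:
        # Assumption: coverage = aligned fraction of the profile HMM.
        return (self.hmm_to - self.hmm_from + 1) / self.hmm_length


def filter_hits(
    hits: List[DomainHit],
    max_evalue: float = 1e-18,
    min_coverage: float = 0.35,
) -> List[DomainHit]:
    """Keep hits with E-value < max_evalue and coverage > min_coverage."""
    return [h for h in hits if h.evalue < max_evalue and h.coverage > min_coverage]


# Toy hits with made-up identifiers.
hits = [
    DomainHit("g1", "GH5", 1e-40, 1, 250, 300),  # passes both filters
    DomainHit("g2", "GT2", 1e-10, 1, 160, 170),  # E-value too weak
    DomainHit("g3", "CBM1", 1e-25, 1, 9, 30),    # coverage 0.30, too low
]
passing = filter_hits(hits)
```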
Analysis of pathogen-host interaction (PHI)
The PHI database contains experimentally validated pathogen-host interactions, derived primarily from fungal, oomycete, and bacterial pathogens that infect hosts such as animals, plants, and insects. The target protein sequences were annotated using Diamond blastp (v2.9.0, https://github.com/enormandeau/ncbi_blast_tutorial) against the PHI database (http://www.phi-base.org).
Prediction of drug resistance genes
The Comprehensive Antibiotic Resistance Database (CARD, https://card.mcmaster.ca/) is organized around the Antibiotic Resistance Ontology (ARO), whose terms link information on antibiotic molecules and their targets, resistance mechanisms, gene variants, and related data. The comparison results give, for each gene, its matching entry in the CARD database together with the ARO ID and classification description, which can be used to understand the specific function of each gene in antibiotic resistance.
Cytochromes P450 (CYP450) annotation
The CYP450s are a large protein family that catalyzes the oxidation of a variety of substrates and participates in the metabolism of endogenous and exogenous substances such as drugs and environmental compounds. The target protein sequences were annotated using Diamond blastp against the FungalP450 database (http://drnelson.utmem.edu/CytochromeP450.html).
Prediction of virulence genes
The Database of Fungal Virulence Factors (DFVF, http://sysbio.unl.edu/DFVF/) is dedicated to the study of fungal virulence factors. To investigate the virulence-related genes present in strain MNP-2, the predicted protein sequences were compared against DFVF using Diamond blastp.
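The Diamond blastp comparisons in the preceding sections produce alignment tables that need to be reduced to one annotation per protein. A minimal Python sketch of that step is given here, assuming the standard 12-column BLAST tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The E-value cutoff of 1e-5, the best-hit-by-bit-score rule, and the identifiers in the toy table are illustrative assumptions, not parameters reported in this study.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse 12-column BLAST-style tabular output (tab-separated)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Keep the highest-scoring hit per query below a hypothetical E-value cutoff."""
    best: Dict[str, Hit] = {}
    for hit in parse_tabular(lines):
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


# Toy table with made-up query and subject identifiers.
table = [
    "g1\tDB_0001\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-50\t310.0",
    "g1\tDB_0002\t60.0\t180\t72\t0\t1\t180\t9\t188\t1e-20\t150.0",
    "g2\tDB_0003\t35.0\t50\t32\t1\t10\t59\t3\t52\t0.5\t28.0",
]
annotations = best_hits(table)
```

Here `g1` keeps only its stronger match and `g2` is left unannotated because its single hit fails the cutoff.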
Other annotations
Membrane transporter proteins were classified using the Transporter Classification Database (TCDB, http://www.tcdb.org/). All predicted protein sequences were analyzed using SignalP (v5.0, http://www.cbs.dtu.dk/services/SignalP/) to identify proteins containing signal peptides. To identify proteins containing transmembrane helices as well as secreted proteins, all predicted protein sequences were also analyzed using TMHMM (v2.0, http://www.cbs.dtu.dk/services/TMHMM/).
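One way to combine the two predictions is sketched below in Python. The rule used here, that a protein with a signal peptide and no predicted transmembrane helix is a putative secreted protein, is a common convention and an assumption; the text does not state the exact criterion applied. The inputs are hypothetical pre-parsed dictionaries, not the native output of SignalP or TMHMM.

```python
from typing import Dict, Set


def classify_proteins(
    has_signal_peptide: Dict[str, bool],
    tm_helix_count: Dict[str, int],
) -> Dict[str, Set[str]]:
    """Group protein IDs by predicted signal peptide and transmembrane helices."""
    with_signal = {p for p, flag in has_signal_peptide.items() if flag}
    transmembrane = {p for p, n in tm_helix_count.items() if n > 0}
    # Assumed rule: secreted = signal peptide present, no transmembrane helix.
    secreted = with_signal - transmembrane
    return {
        "signal_peptide": with_signal,
        "transmembrane": transmembrane,
        "secreted": secreted,
    }


# Toy predictions for three made-up proteins.
signal = {"g1": True, "g2": True, "g3": False}
helices = {"g1": 0, "g2": 3, "g3": 7}
groups = classify_proteins(signal, helices)
```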
Prediction of biosynthetic gene clusters (BGCs)
The genes responsible for secondary metabolite production are typically organized in BGCs. antiSMASH (v6.1.1, https://docs.antismash.secondarymetabolites.org/) is currently the most widely used tool for identifying and characterizing BGCs in bacteria and fungi. It employs a rule-based approach to detect many types of secondary metabolite (SM) biosynthetic pathways. More in-depth, cluster-specific analyses are performed for BGCs encoding NRPSs, type I and type II PKSs, lanthipeptides, lasso peptides, sactipeptides, and thiopeptides; these analyses give more information about the biosynthetic steps performed and thus more detailed predictions of the compounds produced.
Analysis of molecular networking (MN)
Since the launch of the online open-source Global Natural Products Social Molecular Networking platform (GNPS, https://gnps.ucsd.edu/), MN has rapidly become a widely used technique in natural products chemistry, with applications ranging from dereplication to genome mining, metabolomics, and chemical space visualization. The samples were dissolved in methanol (1 mg/mL) and analyzed on a SCIEX X500 QTOF mass spectrometer (SCIEX, USA) to generate LC-MS/MS data. The data were pre-processed with MZmine and submitted to the GNPS online platform to generate molecular networks, which were visualized using Cytoscape.
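Molecular networks connect MS/MS spectra whose fragmentation patterns are similar. The Python sketch below shows a plain cosine score between two centroided spectra, as a simplified illustration of that idea. It is not the GNPS implementation, which uses a modified cosine that also accounts for precursor mass differences. The m/z tolerance of 0.02 and the greedy peak matching are assumptions made here.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def normalize(spectrum: List[Peak]) -> List[Peak]:
    """Scale intensities so the intensity vector has unit length."""
    norm = math.sqrt(sum(i * i for _, i in spectrum))
    if norm == 0:
        return []
    return [(mz, i / norm) for mz, i in spectrum]


def cosine_score(a: List[Peak], b: List[Peak], tol: float = 0.02) -> float:
    """Greedy cosine score: each peak is matched at most once within tol."""
    a, b = normalize(a), normalize(b)
    candidates = [
        (ia * ib, x, y)
        for x, (mza, ia) in enumerate(a)
        for y, (mzb, ib) in enumerate(b)
        if abs(mza - mzb) <= tol
    ]
    candidates.sort(reverse=True)  # strongest intensity products first
    used_a, used_b = set(), set()
    score = 0.0
    for product, x, y in candidates:
        if x in used_a or y in used_b:
            continue
        used_a.add(x)
        used_b.add(y)
        score += product
    return score


# Toy spectra: spec2 is spec1 with small m/z shifts; spec3 shares no peaks.
spec1 = [(100.00, 10.0), (150.00, 5.0)]
spec2 = [(100.01, 10.0), (150.01, 5.0)]
spec3 = [(200.00, 1.0)]
same = cosine_score(spec1, spec2)
different = cosine_score(spec1, spec3)
```

In a network, two spectra are joined by an edge when their score exceeds a chosen threshold, so `spec1` and `spec2` would be linked while `spec3` would remain separate.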
Fermentation and extraction
Strain MNP-2 grown on potato dextrose agar (PDA) was inoculated into 500 mL Erlenmeyer flasks containing 200 mL of potato dextrose broth (PDB) and shaken for 3 days at 200 rpm and 30 ℃ to prepare seed cultures. The rice fermentation was performed in Erlenmeyer flasks (2 × 1 L) with rice (80 g) and tap water (120 mL). After autoclaving at 121 ℃ for 20 minutes, each flask was inoculated with 5% seed culture and incubated statically at room temperature for 30 days. The fermented rice in each flask was extracted three times with 500 mL of EtOAc under ultrasonication for 20 minutes each time, followed by filtration through gauze. The filtrates were combined and evaporated to dryness under vacuum to give sample 1 (approx. 1.56 g). Strain MNP-2 was also inoculated into 500 mL flasks containing 200 mL of PDB or Czapek-Dox medium (2 flasks each) and shaken for 15 days at 200 rpm and 30 ℃. After fermentation, the broth was extracted three times with EtOAc (twice the volume of the fermentation broth), and the extract was evaporated to dryness under vacuum to give samples 2 (approx. 0.31 g) and 3 (approx. 0.27 g). The composition of the culture media is given in the supporting material.