Insect inoculation and DNA extraction
M. brunneum strain 4556 was cultured on SDA plates and incubated at 25 °C for 10 days. Conidia were then collected by flooding each plate with 20 mL of 0.04 % Tween 80 and scraping the surface with a scalpel. The conidial suspension was vortexed until completely homogenized and filtered through a sterile nylon membrane. The concentration of the conidial suspension was adjusted to 1 × 10⁸ spores mL⁻¹ using a hemocytometer (Neubauer, Germany). Spore viability was verified; spores were considered to have germinated if they had formed a germ tube that was as long as the spore width.
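The adjustment to 1 × 10⁸ spores mL⁻¹ is simple arithmetic, sketched below as an illustration rather than the protocol actually followed. The sketch assumes counts are taken in the large squares of a Neubauer chamber (1 mm × 1 mm × 0.1 mm, i.e. 10⁻⁴ mL each); the counts, the dilution factor and the function names are hypothetical.

```python
# Minimal sketch: hemocytometer counts -> spores per mL -> dilution volume.
# Assumption: each counted large square holds 1e-4 mL (1 mm x 1 mm x 0.1 mm),
# so there are 10,000 such volumes per mL. All numbers below are illustrative.

SQUARES_PER_ML = 10_000


def hemocytometer_concentration(counts, dilution_factor=1.0):
    """Return spores per mL of the undiluted suspension."""
    if not counts:
        raise ValueError("at least one square must be counted")
    mean_count = sum(counts) / len(counts)
    return mean_count * dilution_factor * SQUARES_PER_ML


def stock_volume_needed(stock_conc, target_conc, final_volume_ml):
    """Volume of stock (mL) to dilute to final_volume_ml (C1*V1 = C2*V2)."""
    if target_conc > stock_conc:
        raise ValueError("stock is more dilute than the target")
    return target_conc * final_volume_ml / stock_conc


# Hypothetical counts from four large squares of a 1:1000 dilution.
stock = hemocytometer_concentration([52, 48, 50, 50], dilution_factor=1000)
volume = stock_volume_needed(stock, target_conc=1e8, final_volume_ml=10)
print(f"stock: {stock:.2e} spores/mL; use {volume:.2f} mL stock in 10 mL")
```

With these made-up counts the stock works out to 5 × 10⁸ spores mL⁻¹, so 2 mL of stock would be made up to 10 mL with 0.04 % Tween 80.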
Larvae of the greater wax moth, Galleria mellonella, were immersed in 10 mL of conidial suspension for 10 seconds and placed on moist filter paper in Petri dishes to encourage fungal growth and sporulation. Control insects were immersed in 0.04 % Tween 80 alone, in order to confirm that insect death was a result of fungal infection. Plates were incubated in the dark at 25 °C and inspected daily. Once fungal growth was observed, mycelia were collected and grown on SDA for DNA extraction.
A total of 100 mg of conidia was scraped off the plate under a laminar flow hood, and collected into a sterile 1.5 mL DNA LoBind tube (Eppendorf, Hamburg, Germany). The conidia were ground in the tube with a micro-pestle, and DNA was extracted using the PureLink® Plant Total DNA Purification Kit (Invitrogen, Carlsbad, USA), following the manufacturer’s guidelines. The DNA was checked for purity on a Nanodrop (Thermo Scientific, USA), and DNA concentrations were measured using the Qubit broad range DNA assay kit (Thermo Scientific, USA).
Illumina sequencing
Illumina DNA library preparation and sequencing were outsourced to Eurofins Genomics GmbH, Ebersberg, Germany. Illumina paired-end reads (2 x 150 bp) were produced using the ‘INVIEW Resequencing Sequencing of Fungi 50x Coverage’ package. Illumina reads were trimmed using Trimmomatic version 0.38 (32), setting the HEADCROP configuration to 15 and the CROP configuration to 120. Read qualities were assessed with FastQC (33).
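Trimmomatic applies its steps in the order they are given on the command line, and the order of HEADCROP and CROP is not stated here. The sketch below mimics the two steps on a synthetic 150 bp read to show that the two orderings leave reads of different length; the helper functions are illustrative stand-ins, not Trimmomatic code.

```python
# Minimal sketch of what HEADCROP:15 and CROP:120 do to a 150 bp read.
# Assumption: these helpers only imitate the two Trimmomatic steps; the
# order in which the steps were actually supplied is not given in the text.


def headcrop(record, n):
    """Remove the first n bases (and their qualities) from a read."""
    seq, qual = record
    return seq[n:], qual[n:]


def crop(record, length):
    """Keep at most the first `length` bases of a read."""
    seq, qual = record
    return seq[:length], qual[:length]


read = (("ACGT" * 38)[:150], "I" * 150)  # synthetic 150 bp read

headcrop_then_crop = crop(headcrop(read, 15), 120)
crop_then_headcrop = headcrop(crop(read, 120), 15)

print(len(headcrop_then_crop[0]))  # bases 16-135 of the original read
print(len(crop_then_headcrop[0]))  # bases 16-120 of the original read
```

HEADCROP followed by CROP keeps 120 bases per read, whereas CROP followed by HEADCROP keeps 105.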
Nanopore sequencing
A total of 1 µg of genomic DNA was used for Nanopore library preparation using a 1D Ligation Sequencing Kit (SQK-LSK109, Oxford Nanopore Technologies). Sequencing was performed on a MinION device (Oxford Nanopore Technologies) equipped with an R9.4.1 MinION flow cell. Base calling was performed offline with ONT’s Guppy software pipeline version 3.4.5, enabling the --pt_scaling flag and setting the --trim_strategy flag to DNA.
Long read filtering and correction
Long read adapter trimming was performed with Porechop version 0.2.4 (www.github.com/rrwick/Porechop), setting --adapter_threshold to 96 and enabling the --no_split flag. In order to retrieve any circular contig assemblies (e.g. mitochondrial DNA), the adapter-trimmed long reads and the trimmed Illumina paired-end reads were used as input for Unicycler version 0.4.8-beta (34), using the default settings. The trimmed long reads were filtered to remove reads under 3,000 bases in length using NanoFilt version 2.6.0 (35), and were subsequently converted from FASTQ to FASTA format using a custom AWK script [awk 'BEGIN{P=1}{if(P==1||P==2){gsub(/^[@]/,">");print}; if(P==4)P=0; P++}' in.fastq > out.fasta]. The filtered long reads were corrected using the trimmed Illumina short reads with FMLRC version 1.0.0 (36). These corrected reads were further trimmed with Canu version 1.9 (37), using the -trim option, setting the genome size to 38 Mb, and disabling the stop on low coverage and stop on low quality features. Two filtered read sets were generated from the Canu output using SeqKit version 0.11.0 (38), one containing reads of >3,000 bases and the other containing reads of >5,000 bases.
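For readers who prefer not to parse the AWK one-liner, the same FASTQ to FASTA conversion, combined with a minimum-length filter, can be sketched in Python. This is an illustrative equivalent, not the code used in the study. It assumes strict four-line FASTQ records with no blank lines, and that "reads under 3,000 bases" means reads with length >= 3000 are kept; the function name is hypothetical.

```python
# Minimal sketch: stream four-line FASTQ records and emit FASTA lines,
# optionally dropping reads shorter than min_length.
# Assumptions: one sequence line per record, no blank lines in the input.
import io
from itertools import islice


def fastq_to_fasta(lines, min_length=0):
    """Yield FASTA lines (header, sequence) from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not record:
            return  # end of input
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record")
        header, seq = record[0], record[1]
        if len(seq) >= min_length:
            yield ">" + header[1:]  # swap the leading '@' for '>'
            yield seq


example = "@read1 demo\nACGTACGT\n+\nIIIIIIII\n@read2\nACG\n+\nIII\n"
for line in fastq_to_fasta(io.StringIO(example), min_length=5):
    print(line)
```

On real data the function would be called with an open file handle and min_length=3000, writing each yielded line to the output FASTA.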
Long read assembly
One assembly was carried out per read set using Flye version 2.7 (39) with the --nano-corr flag, setting the genome size to 38 Mb and enabling the --trestle flag. Each of the two assemblies was then used to generate an additional assembly by subjecting it to a total of two rounds of polishing with Flye (as opposed to the default of one round). Evidence from all assemblies was used to manually resolve tangles. Mapping of reads to a short contig of 5,231 bp, which contained the telomere sequence TTAGGG at its terminal end, showed that the contig overlapped an end repeat region of Chromosome 1, and the two were combined manually with the aid of CAP3 (40). Together with the manual resolution of tangles, this produced a FASTA file containing all 7 complete chromosomes.
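A quick way to check whether a contig ends in telomeric repeats, as the 5,231 bp contig did, is to look for tandem copies of the motif near each end. The sketch below is illustrative only. It assumes that TTAGGG repeats appear at the 3' end of the written strand and the reverse complement (CCCTAA) at the 5' end; the three-repeat minimum and the 200 bp window are arbitrary choices, not parameters from this study.

```python
# Minimal sketch: flag telomeric repeats at the two ends of a contig.
# Assumptions: min_repeats and window are arbitrary illustrative defaults.
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def telomere_ends(seq, motif="TTAGGG", min_repeats=3, window=200):
    """Return which contig ends carry >= min_repeats tandem motif copies."""
    seq = seq.upper()
    forward = re.compile("(?:%s){%d,}" % (motif, min_repeats))
    reverse = re.compile("(?:%s){%d,}" % (reverse_complement(motif), min_repeats))
    return {
        "start": bool(reverse.search(seq[:window])),  # CCCTAA repeats expected
        "end": bool(forward.search(seq[-window:])),   # TTAGGG repeats expected
    }


# Synthetic contig with repeats at both ends and unrelated sequence between.
contig = "CCCTAA" * 4 + "ACGTACGTAC" * 30 + "TTAGGG" * 4
print(telomere_ends(contig))
```

A contig reporting both ends as true is a candidate telomere-to-telomere chromosome. Confirming this still requires evidence such as read mapping.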
Validation of assembly and comparison of long read assembler performance
To validate the final complete assembly and compare the performance of long read assemblers on a fungal genome, assemblies were carried out on both the adapter-trimmed long reads (>3,000 bp) and the FMLRC-corrected, Canu-trimmed long reads (>3,000 bp) using various assemblers. The assemblers tested were Canu version 2.0, Flye version 2.7, Miniasm/Minipolish version 0.1.3 (41), Raven version 1.1.10 (42), NECAT version 0.01 (43), wtdbg2 version 2.5 (44), and Shasta version 0.5.1 (45). All assemblers were run with default parameters, flagging raw or corrected reads according to the read input; Raven was run with the --weaken flag when corrected reads were used. Additional Flye assemblies were performed using the Canu and NECAT self-corrected read sets and a further short-read corrected read set produced with Ratatosk version 0.1 (46), in order to assess the performance of each read correction strategy. The Ratatosk-corrected reads were trimmed with Canu using the same settings as for the FMLRC-corrected read set. Assemblies were compared using Quast version 5.0.2 (47). Bandage version 0.8.1 (48) was used to visualize assembly graphs, to search for telomere sequences using the built-in BLAST function with the query TTAGGGn5, and to BLAST the complete assembly against each assembly in order to detect inter-chromosomal mis-assembly events.
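Contiguity statistics are a common basis for comparing assemblies of this kind. As an illustration, the sketch below computes N50 from a list of contig lengths using the usual definition (the length of the shortest contig in the smallest set of longest contigs that covers at least half of the assembly). It is not Quast code, and the contig lengths are invented.

```python
# Minimal sketch: contig count, total length, largest contig and N50.
# Assumption: the lengths below are made up for illustration.


def n50(lengths):
    """Length L such that contigs of length >= L cover >= 50 % of the total."""
    if not lengths:
        raise ValueError("no contigs supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def summarize(lengths):
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "largest": max(lengths),
        "n50": n50(lengths),
    }


fragmented = [10, 8, 6, 4, 2]
print(summarize(fragmented))
```

Fewer contigs together with a higher N50 at a similar total length generally indicates a more contiguous assembly, although neither measure detects mis-assemblies.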
Assembly polishing
The uncorrected, adapter-trimmed >3,000 bp long reads were realigned to the manually resolved assembly with minimap2 version 2.17-r941 (49), and the resulting alignment file was used to polish the assembly with Racon version 1.4.13 (50), using default parameters with the --no-trimming flag enabled. A total of two rounds of Racon polishing were performed in this manner. The corrected consensus was further polished with the same long read set using Medaka version 0.11.5 (https://github.com/nanoporetech/medaka). The trimmed paired-end Illumina reads were mapped to the long-read polished contigs using BWA-mem2 version 2.0pre2 (51), and the assembly was further polished with Pilon version 1.23 (52), enabling the --fix all and --changes flags. In total, four iterations of polishing with the Illumina reads were performed in this manner; further polishing yielded no additional changes. A summary of the full assembly pipeline is shown in Figure 1. A dotplot comparison of the scaffolds and contigs from the NCBI reference M. brunneum ARSEF 3297 assembly (GCF_000814965.1) against the complete assembly produced in this study was made using MUMmer version 3 (53).
Gene prediction and functional annotation
BUSCO analyses were performed with BUSCO version 4.0.2 (54), using the hypocreales_odb10 lineage gene set. Chromosomes were visualized in Tapestry version 1.0.0 (https://github.com/johnomics/tapestry) in order to determine chromosome completeness (by checking for long read mapping gaps), with the telomere sequence set to TTAGGG, a common eukaryotic telomere repeat sequence previously shown to be present in Metarhizium telomeres (55). All assembly annotations were performed in GenSAS version 6.0 (56), unless otherwise stated. Low complexity regions and repeats were detected and masked using RepeatModeler version 1.0.11 (57) and RepeatMasker version 4.0.7 (58), setting the DNA source to fungi and the speed/sensitivity parameter to slow. A masked consensus sequence was generated, on which ab initio gene prediction was performed using the following tools: (1) GeneMarkES version 4.33 (59) with default parameters; (2) Augustus version 3.3.1, using Fusarium graminearum as the species but otherwise keeping the default parameters; and (3) GlimmerM version 2.5.1 (60), selecting Aspergillus as the organism. Two separate standalone ab initio gene predictions were conducted on the masked consensus sequence (one including the mitogenome sequence and the other without) using the latest version of GeneMarkES (4.48_3.60.lic), enabling the --ES and --Fungus flags. The highest BUSCO scoring ab initio predicted protein set was used for functional analyses with InterProScan version 5.25-68.0 (61) and a native version of SignalP version 5.0 (62), setting the -org flag to eukaryote. The ab initio predicted proteins were identified with blastp (63) by conducting a protein vs protein search against the SwissProt protein data set to determine best matches. Ribosomal RNA genes were detected using RNAmmer version 1.2 (64). tRNA genes were determined using tRNAscan-SE version 2.0.3 (65). Comparison of orthologous gene clusters between the protein set generated in this study and the NCBI reference M. brunneum, M. anisopliae and M. robertsii protein sets was performed using OrthoVenn2 (66), with default parameters. The mitogenome, including previously described manual annotations (67), was visualized using the GeSeq tool in Chlorobox (68), selecting a circular mitochondrial sequence.
Full genome sequence-based synteny and pan-genome analyses of Hypocreales fungi
Synteny analyses were performed by comparing the M. brunneum complete genome assembly to those of three other species within the order Hypocreales whose genome assemblies are designated as complete by the NCBI (full telomere length chromosomes). These were the entomopathogenic fungus Cordyceps militaris (69), the systemic endophytic fungus Epichloe festucae (70), and the cellulolytic, endophytic fungus Trichoderma reesei (71). Genomes were aligned with progressiveMauve v2.4.0 (72), using default settings. Alignment blocks were filtered to remove syntenic blocks that were less than 1,000 bp in size, as well as those that were not present in all 4 species. Synteny was inferred with i-ADHoRe v3.0 (73) running default parameters, and whole genome synteny between species was visualized with Circos plots using Circos v2.40.1 (74). Ab initio gene prediction was performed on the genome assemblies of the three other Hypocreales species using GeneMarkES (4.48_3.60.lic), enabling the --ES and --Fungus flags. In order to determine the core genes shared across the 4 species, comparison of orthologous gene clusters between the protein sets of the Hypocreales fungi was performed with OrthoVenn2 using default parameters.
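The block filtering step can be illustrated with a short sketch. It assumes each alignment block has already been parsed into a mapping from genome name to inclusive (start, end) coordinates, and that a block's size is taken as the shortest of its per-genome segments. How block size was actually measured is not specified here, and the genome labels and coordinates are invented.

```python
# Minimal sketch: keep alignment blocks present in every genome whose
# shortest segment is at least min_size bp.
# Assumptions: the block representation, the size definition and all
# coordinates are hypothetical.

GENOMES = ["M_brunneum", "C_militaris", "E_festucae", "T_reesei"]


def filter_blocks(blocks, genomes, min_size=1000):
    kept = []
    for block in blocks:
        if not all(genome in block for genome in genomes):
            continue  # block is missing from at least one species
        sizes = [end - start + 1 for start, end in (block[g] for g in genomes)]
        if min(sizes) < min_size:
            continue  # block is too short in at least one species
        kept.append(block)
    return kept


blocks = [
    {g: (1, 5000) for g in GENOMES},                        # kept
    {g: (1, 800) for g in GENOMES},                         # too short
    {"M_brunneum": (1, 5000), "C_militaris": (10, 6000)},   # only 2 species
]
print(len(filter_blocks(blocks, GENOMES)))
```

Only the first of the three hypothetical blocks passes both criteria.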