Isolates and oxalic acid production
Scleotium rolfsii strains ZY and GP3 were originally collected from Henan and Guangxi provinces of China, respectively. These two strains were in different mycelial compatibility group (MCG) and exhibited similar growth rate on potato dextrose agar (PDA medium: 200 g peeled and sliced potatoes boiled for 20 min, 20 g dextrose, adjusted to pH 7.0, 20 g agar, to make the final volume 1000 ml with distilled water) [65]. Oxalic acid production of S. rolfsii was detected by two methods. PDA containing bromophenol blue was used to test oxalic acid produced in PDA plate. Mycelium discs of each strain were placed in the center of PDA medium containing 0.5 g/l of bromophenol blue and kept at 30 °C in the dark, four petri dishes for each strain. The diameter of yellow halo was measured after three days. KMnO4 titration was used to detect oxalic acid produced in liquid PDB. The strains grew in liquid PDB medium, three replicates of 150 ml flasks containing 30 mL of medium were included for each isolate, three discs per flask were added and the flasks were incubated without shaking at 30 °C in the dark. The culture of each strain was filtered through a Whatman No.1 filter paper after 5 days incubation. Oxalic acid (OA) content in 5 mL filtrate was determined using a KMnO4 titration method following the procedure of Kritzman’s [66].
Pathogenicity test
The experimental design was a randomized complete block with three replications. Plots consisted of three rows with a row length of 2.5 m and row space of 0.33 m. The peanut variety Zhonghua 21 was planted in all trials (15 plants per row) in plots. The plants were inoculated 50–55 days after sowing. S. rolfsii inoculum was prepared just before inoculation. Oat grains were soaked in water for 4 h, sterilized at 121℃ for 30 min twice after water removed. The fresh mycelium discs of S. rolfsii GP3 and ZY were transferred to the flasks containing sterilized oat grains, respectively. The oat grains culture maintained in the dark at 30℃ until surface of grains covered by S. rolfsii mycelium. The oat grains inoculum was mixed with equal amount of sterilized sand to ensure uniform delivery of inoculum. Each plant was inoculated with 2 g of S. rolfsii oat inoculum and sand mixture. The plots were watered to field capacity after inoculation. Disease symptoms were investigated 14 days after inoculation. A 1–5 scale for the severity of wilting according to Shokes’ method [67], where 1 = no symptom, 2 = lesions on stem only, 3 = up to 25% of the plant wilting, 4 = 26–50% of the plant wilting, and 5 = > 50% of the plant wilting. Disease index was calculated by using the following formula. DI = {[Σ(number of plants × corresponding diseases scale)] / (total number of plants × maximum disease scale)} × 100. Different level of virulence was determined according to Punja (1985) [16], high virulence with DI more than 66.7, and weak virulence showing DI less than 33.3.
DNA and RNA purification
To prepare the genomic DNA and RNA for sequencing, the GP3 and ZY isolates were cultured on PDA plates overlaid by cellophane films and maintained in the dark at 30 °C for 3–4 days. Mycelia were collected and grounded for DNA and RNA extraction. High-molecular-weight genomic DNA for single-molecule real-time (SMRT) was extracted using the SMRTbellTM Templated Prep Kit 1.0 (PACBIO). The genomic DNA for Illumina sequencing was extracted using a CTAB method as previously described [68]. Total RNA was extracted from mycelia using the TRIZOL Kit (Invitrogen, Carlsbad, CA, USA) following the manufacture’s protocol.
Genome sequencing and assembly
For PacBio Sequel genome sequencing, high molecular weight genomic DNA (20 µg ) was random sheared with Covaris- g-Tube with a goal of DNA fragments of approximately 20 kb and end-repaired according to the manufacturer’s instructions. A blunt-end ligation reaction followed by exonuclease treatment was performed to generate the SMRT Bell template, then the library was qualified and quantified using an Agilent Bioanalyzer 12 kb DNA Chip (Agilent Technologies, Santa Clara, CA, USA) and a Qubit fluorimeter (Invitrogen, Carlsbad, CA, USA). SMRT Bell cells were sequenced using the PacBio Sequel sequencing platform (Nextomics Biosciences, Co., Ltd., Wuhan, China). After adaptor removed and low quality reads filtered out, a total of 9, 97 Gb with 8.80 kb average subreads were generated for ZY and 6.34 Gb with 10.68 kb mean subreads for GP3, respectively.
For Illumina sequencing, about 100 µg of genomic DNA were sheared to ~ 180 bp using a Covaris LE instrument and adapted for Illumina sequencing on Illumina Hiseq Xten platform (San Diego, CA, USA) by NextOmics Biosciences. Illumina short reads were trimmed using Trimmomatic version 0.36 [69], a total of 6.42 Gb and 7.03 Gb clean data were yielded for GP3 and ZY, respectively.
The cDNA libraries were prepared by Illumina TreSeq RNA Sample Preparation Kit (Illumina, Inc., San Diego, CA, USA) and validated according to Illumina’s low-throughput protocol. The RNA-seq was conducted on an Illumina HiSeq 2500 Platform with 150 bp paired-end strategy.
A de novo genome assemblies of ZY and GP3 were generated with the PacBio Sequel reads using CANU pipeline (v1.5) [19] with default setting. The assemblies were adjusted using Arrow program, and polished using Illumina reads by Pilon [20]. Finally, the integrity of assemblies was evaluated using BUSCO [24].
Repetitive elements analysis
Repetitive elements were identified by using different methods. Transposable elements were analyzed using four programs, two programs for de novo prediction, including RepeatMoldeler (http://www.repeatmasker.org/RepeatModeler) and LTR finder [70], and the database based programs RepeatMasker (http://www.repeatmasker.org/) and Repeat-ProteinMasker (submodule in Repeatmasker) with default parameters to search Repbase [71]. Tandem Repeats Finder (TRF) was used to identify tandem repeat sequences [72]. MicroSAtellites (MISA) (https://www.plob.org/tag/misa) was used to identify simple sequence repeats (SSR) with default setting.
Genome annotation
Gene predication was performed by using a combination of ab initio-based and homology-based methods. To aid gene annotation, we generated transcript assemblies based on RNA of GP3 and ZY, respectively. For ab initio-based prediction, Augustus v2.4 [73] and Genscan (version 1.0) [74] were used to de novo predict protein coding genes with the default setting. Exonerate [75] was used to predict the gene structure with RNA-seq data. For homology-based predication, GeneWise [76] was used to predict protein coding genes by homology analysis with known protein sequences from six related species of Basidiomycota, including Galerina marginata, Gymnopus luxurians, Hydnomerulius pinast, Jaapia argillacea, Piloderma croceum, and Plicaturopsis crispa. EvidenceModler (EVM) was used to compute the weighed consensus gene structure annotation [77]. The final gene sets were obtained after removed genes with TE transposable elements by Tranposon PSI [78].
The predicted gene sets of S. rolfsii GP3 and ZY were functionally annotated based on similarity comparison with homologous in public databases. BLASTP was used to align the protein sequences by automated searches in NCBI-NR, Swiss-Prot (http://www expasy.org/sprot/), KEGG, GO and KOG database with E-values ≤ 1e-5. Gene function domain annotation was conducted by InterProScan program [79]. The pathway analyses were conducted by KAAS-KEGG Automatic Annotation Serve [80]. The candidate non-coding RNA (ncRNA) was annotated by two approaches, BLAST was used to align the S. rolfsii genome against the Rfam database [81], and tRNA scan-SE [82] and RNAmmer [83] were used to predict tRNAs and rRNA, respectively.
Analysis of orthologous gene families in Agariomycetes
Orthology comparison was conducted by OrthoMCL [28] (http://va.orthomcl.org) with e-value less than le-5 among protein sets of S. rolfsii GP3, ZY and nine related species of Agariomycetes, including Armillaria gallica (GenBank: GCA_002307695.1), Auricularia subglabra (GenBank: GCA_000265015.1), Exidia glandulosa (GenBank: GCA_001632375.1), Galerina marginata (GenBank: GCA_000697645.1), Gymnopus luxurians (GenBank: GCA_000827265.1 ), Hydnomerulius pinast (GenBank: GCA_000827185.1), Psilocybe cyanescens (GenBank: GCA_002938375.1), Scleroderma citrinum (GenBank: GCA_000827425.1), and Piloderma croceum (GenBank: GCA_000827315.1) .
Phylogenetic analysis and synteny analysis
The phylogenetic tree of S. rolfsii GP3, ZY and the above related nine species of Agaricomycetes was constructed by single copy gene based on the orthologous gene families analysis. Mafft [84] software was conducted to align the protein sequence of the single copy gene, and converted to coding sequence alignment. Gblocks [85] was used to extract the well-aligned regions of each coding sequence alignments. RAxML 8.2.12 [86] was carried out to generate the maximum-likelihood tree with 100 bootstrap replicates with Psilocybe cyanescens as an outgroup. The whole genome aligner Murmer 3.06 [87] was used for comparative analysis of the assemblies of GP3 and ZY. Dot plots between contigs of GP3 and ZY were created by MuMerplot programs from the MuMmer package.
Identification of the pathogenicity related genes
The S. rolfsii GP3 and ZY protein sets were used to conduct a BLASTP search against PHI base (a database of Host-Pathogen gene interaction) with e-value less than 1e-5 to identify pathogenicity genes. Putative carbohydrate active enzymes (CAZymes) of S.roflsii GP3 and ZY were annotation using dbCAN (dbCAN HMMs 5.0) [88] servers, with an e-value of less than 1e-5 and more than 70% coverage. The CAZymes were classified as per type of reaction catalyzed like Glycoside Hydrolases (GHs), Polysaccharide Lyases (PLs), Carbohydrate Esterases (CEs), Glycosyl Transferase (GTs), Carbohydrate-Binding Modules (CBMs), and
Auxillary Activities (AAs) as described in CAZyme database classification (http://www.cazy.org) [89].
Secretome and effector predication
The prediction of the S. rolfsii strains GP3 and ZY secretome was conducted based on the following pipeline. SignalP version 4.0 [90] was used to analyze signal peptide and cleavage sites of S. rolfsii GP3 and ZY proteins. Candidate proteins with signal peptide were identified by Protcomp 9.0 (http://www.softberry.com/berry.phtml) using the LocDB and PotLoc DB databases and proteins predicted as extracellular or unknown were kept for next analysis. The candidate proteins were conducted by TMHMM version 2.0 [91] to identify protein with transmembrane domains, and all proteins with 0 TM or 1 TM, if located in the predicted N- terminal signal peptide, were kept. The candidate proteins that harbored a putative glycophosphatidylinositol membrane-anchoring domain were identified by GPIsom (http://gpi.unibe.ch/) [92]. The remaining proteins without GPI-anchor were predicted with Target P [93], the proteins with a Target P Loc = S or – were kept in the final secretome databases. The candidate secretory proteins were blasted in NR database and PHI database to annotate the protein function and also searched against CAZyme database for function of CAZymes. The candidate effectors were identified by passing the secretome through the program Effector P 1.0 [94]. Putative effector candidates were manually inspected and those with known plant cell wall degrading catalytic domains, such as lipase, peptidase, CAZymes and cutinase were filter out. Putative candidate effectors were also screened for those candidates with molecular weight ranged from 50 to 300 amino acids, and at least 4 cysteine amino acids in their sequences.