Genomic analysis for Biosynthesis and Metabolic Pathway of Elsinochrome Toxin produced by Elsinoë arachidis

Background: Elsinoë arachidis, an important fungal pathogen of peanut that is widely distributed and causes large-scale losses in peanut-producing regions of China, produces elsinochromes (ESCs) as a vital toxin during pathogenesis. However, the biosynthesis of elsinochromes has not been investigated, and the transcriptional response of elsinochrome synthesis to light in Elsinoë is poorly understood. Results: Here we report a high-quality genome of E. arachidis obtained by PacBio RS II sequencing. The 33.18 Mb genome encodes 9056 predicted genes, including genes for 734 secreted proteins (8.0% of the total), 124 transporter-related genes, 949 signal peptide-containing proteins and 1,829 transmembrane protein-coding genes, together with 127 non-coding RNAs and 13 pseudogenes. The assembly comprises 16 scaffolds that contain 86 secondary metabolite gene clusters, including six polyketide synthase gene clusters: two predicted to be involved in melanin, one in elsinochrome and three in T-toxin biosynthesis. Additionally, the ESC biosynthesis-related gene cluster is predicted to contain ESCB1, which was highly expressed under light. Conclusion: The genomic information of E. arachidis lays a solid molecular foundation for further exploration of its pathogenic mechanisms and toxin biosynthesis pathways. Taken together, these results provide a valuable basis for studying elsinochrome biosynthesis and essential information for further understanding the virulence mechanism of the pathogen.

4 interaction genes allow researchers to better understand E. arachidis when investigating its pathogenic mechanism.

Genome features
The genome of E. arachidis LNFT-H01 was sequenced at 100× coverage, yielding 6.28 Gb of high-quality sequencing data. The reads were assembled de novo with Canu into 16 scaffolds (N50, 3,376,838 bp) with a total size of about 33.18 Mb, larger than the genomes of E. australis (23.34 Mb; GenBank: NHZQ00000000) and Sphaceloma murrayae (20.72 Mb; GenBank: NKHZ00000000). All sixteen scaffolds are longer than 1 kb; their total length is 33,184,353 bp and the longest scaffold is 4,426,246 bp.
The completeness of the E. arachidis LNFT-H01 genome was estimated to be > 99%. The genome encodes 9174 protein-coding genes, similar to E. australis (9223) and more than S. murrayae (8256). Secreted proteins account for 8.0% of the encoded proteins (734 proteins), a proportion that falls in the 7-10% range. The sixteen scaffolds are displayed in a Circos plot (Fig. 1). The gene density is 285 genes per Mb, and 127 non-coding RNAs and 13 pseudogenes were predicted in the genome.
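The N50 quoted for the assembly is the length of the scaffold at which the running total of scaffold lengths, sorted from longest to shortest, first reaches half of the assembly size. The sketch below illustrates that definition only. The function name `n50` and the toy lengths are hypothetical; the individual scaffold lengths of the E. arachidis assembly are not given in the text, so the real value is not reproduced here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half
        # of the assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical toy lengths, NOT the real E. arachidis scaffolds.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

With the real assembly, the input would be the 16 scaffold lengths read from the FASTA file.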

Phylogenetic Analysis And Collinear Analysis
Phylogenetic analysis shows that E. arachidis is closely related to Sphaceloma murrayae and E. australis (Fig. 2A). In addition, synteny analysis reveals that the E. arachidis genome shares high synteny with that of E. australis. For example, scaffolds 1, 5, 6 and 17 of E. australis correspond well to scaffold 1 of E. arachidis, and scaffolds 32 and 37 show good synteny with scaffold 10 of E. arachidis (Fig. 2B).

Repetitive DNA Sequences And Methylation Sites
In eukaryotic genomes, repetitive DNA sequences play a critical role in gene function and genome structure, and the types and proportions of repetitive sequences differ between species [48,49]. Across the 16 scaffolds of the E. arachidis LNFT-H01 genome, a total of 7,033,311 bp of repeat sequences were identified, including DNA transposons and LTR retrotransposons (Table S2). These account for 21.4% of the genome, of which LTRs account for 78.46%.
DNA methylation is important in epigenetic and cellular processes [50,51]. Fungal genomes contain a variety of DNA modifications, the most common of which are adenine methylation and cytosine methylation. In the E. arachidis LNFT-H01 genome, 1,033,888 m4C (N4-methylcytosine) and 28,762 m6A (N6-methyladenine) sites were identified; m4C accounts for the majority of methylation (97.3%), whereas m6A accounts for only 2.7%. Methylase-specific motifs were identified from DNA polymerase kinetic information, and fungus-specific motifs were detected (Table S3).
To clarify the secondary metabolic pathways of E. arachidis, KEGG was used to identify biological pathways. The KEGG annotation shows that metabolism in this pathogen is active, including not only the formation of nutrients such as amino acids and sugars but also the synthesis of some secondary metabolites. A total of 4958 genes of E. arachidis were assigned to 24 KOG functional categories, and the number of genes differed markedly among categories (Fig S1). The categories that account for a high percentage of the annotated genes are posttranslational modification and protein turnover, signal transduction mechanisms, carbohydrate transport and metabolism, amino acid transport and metabolism, and secondary metabolites biosynthesis. Genes involved in transport and catabolism are abundant, including genes related to lipid transport and metabolism and to the transport and metabolism of ions and coenzymes.
In the GO analysis, 3237 genes were assigned to 42 GO functional classes within the biological process, cellular component and molecular function categories (Fig S2). The proportions of genes annotated with catalytic activity and metabolic process are high, and detoxification and antioxidant activity, which relate to self-detoxification of the pathogen, were also noted. These functional annotations provide a rich data resource for further study of secondary metabolite biosynthesis and toxin transport in E. arachidis.
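The methylation proportions reported in this section follow directly from the two site counts. The short check below recomputes them; the helper name `proportions` is introduced here for illustration only and is not part of the original analysis.

```python
# Site counts as reported in the text for E. arachidis LNFT-H01.
m4c_sites = 1_033_888
m6a_sites = 28_762


def proportions(counts):
    """Convert a dict of counts into a dict of percentages of the total."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


shares = proportions({"m4C": m4c_sites, "m6A": m6a_sites})
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Rounded to one decimal place, the shares are 97.3% for m4C and 2.7% for m6A, matching the values given above.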

Gene Family Analysis
Analyses of the E. arachidis genome for pathogenicity proteins
To explore potential pathogenicity genes of E. arachidis, the predicted proteins were aligned against the pathogen-host interaction database using Blastp. A total of 2,752 genes were screened from the E. arachidis genome, including key genes for secondary metabolite synthesis, cytochrome P450 genes, ATP-binding cassette (ABC) superfamily transporter genes, Major Facilitator Superfamily (MFS) transporter genes and other related genes, which suggests that the disease process is complex (Table S4).

Gene Associated With Detoxification
ESC, which is produced by E. arachidis, generates large amounts of reactive oxygen species under light. Reactive oxygen species act on cell membranes and destroy their structure. E. arachidis can nevertheless grow and develop under these conditions, indicating that it has some capacity for detoxification.
The MFS and ABC transporters are the two largest families of fungal transporters [52][53]. A total of 57 ABC superfamily transporter genes and 190 MFS superfamily transporter genes were identified in the genome of E. arachidis. In addition, the cytochrome P450 enzyme system is a multifunctional oxidoreductase system [54]. In the genome of E. arachidis, 78 cytochrome P450 enzymes were predicted; these may be involved in the synthesis and detoxification of toxins.

The CAZyme
The carbohydrate-active enzymes secreted by pathogenic fungi are involved in the process of [...] The cuticle and cell wall on the plant surface are the first barriers against pathogen invasion.
Most pathogens breach these host defenses by producing cutinases and cell wall-degrading enzymes. Further analysis detected pectin lyases (15), cutinases (13) and cellulases (19); their functions in pathogenesis remain to be studied.

Secondary Metabolite Gene Clusters
E. arachidis produces the secondary metabolite ESC [5]. Although ESC is an important pathogenicity factor of E. arachidis, the core gene for ESC synthesis has not been identified in this species. The genome of E. arachidis makes it possible to find the core genes for ESC biosynthesis. To identify gene clusters responsible for polyketide biosynthesis in E. arachidis, antiSMASH2 was used to identify all secondary metabolite clusters. In total, 86 secondary metabolite clusters were predicted, including polyketide synthase (PKS), nonribosomal peptide synthetase (NRPS), NRPS-PKS hybrid and other clusters. The number and distribution of coding genes differ among the gene clusters.

Identification and analysis of PKS Genes in E. arachidis
To further clarify which PKS gene cluster governs the biosynthesis of ESC in E. arachidis, a total of 19 polyketide synthase protein sequences from different species were analyzed, and a phylogenetic tree was constructed (Fig. 3

RT-qPCR analysis of ESCB1
ESC biosynthesis in E. arachidis differed significantly between light conditions [47].
Under light, ESC production was 16 nmol/plug, whereas in the dark no toxin synthesis was detected. To further clarify whether ESCB1 participates in ESC biosynthesis, the expression of ESCB1 under different light conditions was examined. The results showed that the expression pattern of ESCB1 was consistent with that of toxin production (Fig. 5).
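The text does not state how relative expression was calculated from the RT-qPCR data. One widely used approach is the 2^-ΔΔCt method, sketched below as an assumption rather than as the procedure actually used. The function name, the reference gene and all Ct values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    Each Ct is normalised to a reference gene within its own
    condition; the treated dCt is then compared with the control dCt.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return 2 ** -delta_delta


# Hypothetical Ct values: "treated" = light, "control" = dark.
fold_change = relative_expression(
    ct_target_treated=23.0, ct_ref_treated=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
print(fold_change)
```

A fold change above 1 would indicate higher expression of the target gene under light than in the dark, under the usual assumption of roughly equal amplification efficiency for the target and reference genes.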

Distribution of the ESCB1 gene cluster of ESC
Further analysis of the ESCB1 cluster identified 13 putative ORFs, including ESCB1 (Fig. 6).
The ESCs are photosensitive perylenequinone toxins that produce large amounts of reactive oxygen species (ROS) under light; ROS act on the cell membrane and destroy its structure. E. arachidis can nevertheless grow and develop while producing high levels of toxin, indicating that it has some capacity for detoxification. Genome analysis of E. arachidis identified 57 ABC superfamily transporter genes, 190 MFS superfamily transporter genes and 78 cytochrome P450 enzymes.
ABC transporters are multi-component primary active transporters with ATP-binding regions that use the energy of ATP to drive the transmembrane transport of both small molecules and macromolecules, whereas MFS transporters are single-polypeptide secondary carriers capable of transporting only small molecules in response to chemiosmotic ion gradients [54][55]. The cytochrome P450 enzyme system is a multifunctional oxidoreductase system [56], and it is speculated that all three may participate in the self-detoxification of E. arachidis, reducing the effect of the toxins on the fungus by exporting them from the cell and through redox and other physiological reactions. In addition, the gene annotation includes detoxification and antioxidant genes related to self-detoxification, which provides ideas and references for further research on the self-tolerance of E. arachidis to ESC.
So far, the molecular mechanism of cercosporin biosynthesis is the most thoroughly studied and characterized.
Work on the CTB gene cluster, centered on CTB1 (cercosporin synthase gene 1), which encodes a polyketide synthase, has confirmed that the toxin is synthesized by the polyketide [...] were involved in T-toxin. Interestingly, comparison of the genomic positions of the six gene clusters showed that ESCB1, EVM0004732 and EVM0005880 are all located on Contig00003. It is speculated that the positional relationship among the three may be related to the mechanism of melanin and ESC production in E. arachidis. This also provides new ideas for studying the secondary metabolic pathway of ESC synthesis. The gene cluster contains 13 predicted genes including

Materials And Methods

Strain and culture conditions
The strain of Elsinoë arachidis used in this study was LNFT-H01, which has been identified previously.

Genome Sequencing And Assembly
A 20-kb library was constructed from genomic DNA of the Elsinoë arachidis LNFT-H01 strain and sequenced on a PacBio RS II (Biomarker Technologies), ensuring a sequencing depth of not less than 100×. The genome sequence was assembled with Canu [18][19][20].

Phylogenetic And Syntenic Analysis
The predicted protein sequences of E. arachidis LNFT-H01 and of the reference genome of E. arachidis were subjected to family clustering using OrthoMCL, and single-copy genes were then extracted. The phylogenetic tree was constructed with PhyML from the single-copy gene sequences of E. arachidis LNFT-H01 and E. australis [21,22]. The protein sequences of E. arachidis LNFT-H01 were aligned against those of E. australis by BLAST, and collinearity at the nucleic acid level was then derived from the positions of the homologous genes on the genomic sequences. MCScanX was used to map E. arachidis LNFT-H01 to E. australis pairwise according to this collinearity [23].

Repetitive Sequences And DNA Methylation Analysis
Because repetitive sequences are poorly conserved between species, predicting repeats for a given species requires a species-specific repeat database. We therefore used LTR_FINDER [24], MITE-Hunter [25], RepeatScout [26] and PILER-DF [27] to build a repeat database for the sequenced genome based on structural and de novo prediction, classified the database with PASTEClassifier [28], and merged it with Repbase [29] to form the final repeat database. RepeatMasker [30] was then used to predict the repetitive sequences in the genome based on this database.
Based on the kinetic information generated during the DNA polymerase synthesis reaction, SMRT third-generation sequencing can directly identify DNA methylation sites [31].
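Once repeats have been predicted, the repeat content of an assembly can be summarised as the fraction of masked bases. The sketch below assumes a soft-masked FASTA file in which repeats are written in lowercase (RepeatMasker can produce this with its `-xsmall` option); the text does not say which masking mode was used, and the function name and toy sequence are hypothetical.

```python
def repeat_fraction(fasta_text):
    """Fraction of bases that are soft-masked (lowercase) in FASTA text."""
    masked = 0
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    if total == 0:
        raise ValueError("no sequence found")
    return masked / total


# Hypothetical toy record, NOT real E. arachidis sequence.
toy_fasta = ">scaffold_demo\nACGTacgtACGTACGTAC\nGGccGG\n"
print(repeat_fraction(toy_fasta))
```

For a real genome, the file would be read from disk and the result multiplied by 100 to give a percentage comparable to the repeat proportion reported in the Results.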

Protein Family Classifications And Function Characterization
To further analyze protein families in E. arachidis, the Carbohydrate-Active enZYmes database (http://www.cazy.org/) [43], the Transporter Classification Database (TCDB) [44] and the pathogen-host interaction database [45] were used to identify CAZymes and other important gene families.

Secondary Metabolite Cluster Identification, Characterization And Visualization
Secondary metabolite clusters in the E. arachidis genome sequence were identified using antiSMASH2. The GI numbers of the polyketide synthases are shown in Table S1.

Measurements of toxin contents and Expression of ESCB1
Elsinochrome was extracted and quantified as described previously [47].

Competing interests
The authors declare that they have no competing interests. All authors have read and approved the final version of the manuscript.

Availability of data and materials
The data have been deposited in NCBI (http://www.ncbi.nlm.nih) under accession number JAAPAX000000000.
[...] the fourth circle shows repeated sequences; the fifth circle shows tRNA (blue) and rRNA (purple); the sixth circle shows GC content, where light yellow indicates regions with GC content above the genome average and blue indicates regions with GC content below the genome average, and the higher the peak, the greater the difference from the average GC content; the innermost circle shows GC skew, where dark gray represents regions in which G content is greater than C and red represents regions in which C content is greater than G.
Distribution of the ESCB1 gene cluster. BLASTX was used to search the NCBI database to predict the function of related genes.
