Whole Genome Sequence of Wilsonomyces carpophilus, the Causal Agent of Shot Hole of Stone Fruits: Insights Into Secreted Proteins of a Necrotrophic Fungal Repository

Wilsonomyces carpophilus is a necrotrophic plant pathogenic fungus with a wide host range, infecting all stone fruits, such as peach, plum, apricot and cherry, and almond among the nut crops. Necrotrophs are highly destructive, with complex pathogenicity mechanisms and poorly characterized effector repertoires. Here, we report a 29.9-megabase draft genome assembly of W. carpophilus. We used a hybrid approach combining Illumina HiSeq and PacBio sequencing technologies to obtain unbiased sequence reads, aligning the short Illumina reads against the long PacBio reads. A total of 10,901 protein-coding genes were predicted, including a varied set of genes such as heterokaryon incompatibility (HET) genes, cytochrome P450 genes and kinases. We mined 2,851 simple sequence repeats (SSRs) in the genome assembly. We also predicted a diverse inventory of secretory proteins, transporters, and primary and secondary metabolic enzymes. Among the 225 predicted secreted proteins, hydrolases, polysaccharide-degrading enzymes, and esterolytic, lipolytic and proteolytic enzymes were the most significant, reflecting the necrotrophic lifestyle of W. carpophilus. We also identified 146 tRNAs and 52 rRNAs in the pathogen genome.


Introduction
The stone fruits, which include peach, plum, cherry, apricot and nectarine, together with almond among the nut crops, are important crops grown throughout the world. The foremost growing countries are America, Australia, Afghanistan, China, Iran, Italy, Greece, France, New Zealand, Portugal, India and the Central Asian countries of the former USSR 1. Among the biotic factors affecting stone fruits, shot hole disease caused by Wilsonomyces carpophilus is of paramount importance 2. Shot hole is one of the major fungal foliar diseases of Prunus species worldwide 3 and has been reported from Africa, Asia, America (North, South and Central), Australia and Oceania 4. Recently, a study 5 reported shot hole disease of stone fruits (Prunus spp.) as a major threat to the wild-fruit forest of the Western Tianshan Mountains of China. Shot hole disease of Prunus spp. has also been reported from California and Poland (https://nt.ars-grin.gov/fungaldatabases/). Several synonyms exist in the literature for the pathogen, such as Thyrostroma carpophilum, Stigmina carpophila and W. carpophilus; however, a recent study 6 proposed the name W. carpophilus. Intermittent outbreaks of the disease cause notable yield losses, ranging from 30 to 90% in cherry and about 60.3% in apricot in Malatya province of Turkey 1. The disease appears as small, circular, reddish or purplish lesions with a yellow halo; the centre gradually enlarges, becomes necrotic and ultimately falls out, leaving a shot-hole appearance 7. The fungus shows cross-pathogenicity on different hosts under in vitro conditions 8, suggesting that it lacks specificity towards a particular host species and can therefore cause disease in all stone fruits and in almond. Such studies indicate a broad host range, which needs thorough study before a better management strategy can be devised. The high pathological and molecular diversity of W. carpophilus also hampers resistance breeding, a viable disease management alternative 1.
Several synonyms, such as Clasterosporium carpophilum (Lev.), Stigmina carpophila (Lev.), Thyrostroma carpophilum and W. carpophilus, exist in the literature for the pathogen 9; however, a recent study 6 recognized W. carpophilus as the causal organism of shot hole disease of Prunus spp. W. carpophilus remains an orphan plant pathogen in terms of studies of the Prunus-shot hole interface. Although the characterization of high pathogen diversity and the development of an Agrobacterium tumefaciens-mediated transformation (ATMT) protocol for the fungus 1,10 have increased our understanding, many basic questions about the pathosystem warrant additional research. Genome sequencing of plant pathogens has provided insights into pathogen lifestyles and has shifted the field from conventional genetics to genomics. Over the last few years, hundreds of plant pathogenic fungal genomes have been decoded, and pangenomics now provides deep insights into the pathogenicity and lifestyle characteristics of plant pathogens 11,12. To gain evolutionary insights into the poorly studied W. carpophilus, it is necessary to exploit the high-throughput sequencing toolbox to understand its lifestyle and evolutionary dynamics. Based on its feeding habit, W. carpophilus is a necrotroph, and to infect diverse Prunus spp. the pathogen needs to be equipped with a pathogenicity arsenal capable of breaching both the pattern-triggered and the effector-triggered immunity of its Prunus hosts. Besides secreting a battery of enzymes to degrade host tissues, necrotrophs exploit the cell death machinery of the host 13. Effectors, the secreted proteins encoded by the pathogen, therefore play crucial roles in helping necrotrophs evade the host defense system. It is thus necessary to understand the pathogenicity mechanism of W. carpophilus and how the pathogen manipulates the host cell machinery.
Decoding genomes with fine-tuned bioinformatics pipelines has increased our knowledge of pathogenicity mechanisms and provided insights into the role of secreted effector molecules in evading host defenses. Here we report the first draft genome of W. carpophilus, with the aim of gaining insights into the Prunus-shot hole pathogen interaction and the pathogenicity mechanism of the fungus. Decoding the W. carpophilus genome with hybrid NGS technology provided clues about its ability to evade host defense mechanisms and about the pathogenicity armory encoded in its genome that makes it a successful pathogen of Prunus hosts.

Fungal culture preparation
The pathogen was isolated from shot hole-infected leaves of stone fruits (plum, peach, apricot and cherry) and of almond grown in the University orchard of SKUAST-K, Shalimar, Srinagar (J&K). The purified fungal culture was maintained on Asthana and Hawker's medium and potato dextrose agar (PDA) 10,14. On the basis of morpho-cultural characteristics, the pathogen was identified as Wilsonomyces carpophilus, synonym Thyrostroma carpophilum Nabi 1,10. The pathogenicity of these isolates was tested by the detached leaf technique on their respective hosts 15, followed by cross-infectivity tests on different stone fruits, including almond.

DNA isolation for whole genome sequencing
The most virulent isolate, selected on the basis of minimum incubation time and symptom development, was used for whole genome sequencing. Its DNA was extracted using the XcelGen DNA isolation kit (Xceleris, Ahmedabad, India) according to the manufacturer's instructions. The quality and quantity of the extracted DNA were checked using a Qubit 2.0 Fluorometer (Life Technologies Ltd., Paisley, UK), and DNA integrity (DNA integrity number, DIN) was checked using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA).

Library preparation and genome sequencing
The DNA library was prepared using the NEBNext Ultra DNA Library Prep Kit (New England Biolabs), starting from 200 ng of DNA. Adapters were ligated to both ends of the DNA fragments; these adapters contain the sequences needed to bind dual-barcoded libraries to a flow cell for sequencing and PCR amplification. To maximize yield from limited amounts of starting material, a high-fidelity amplification step was performed using PCR Master Mix.
The whole genome of the plant pathogenic fungus W. carpophilus was sequenced using Illumina HiSeq and PacBio technologies. De novo assembly of high-quality paired-end reads was performed with Velvet v1.2.10, and the assembly was optimized at a k-mer size of 79 (Supplementary Table 2; Fig. 7). Scaffolding of the pre-assembled contigs was then performed with the PacBio long reads using SSPACE-LongRead v1.1. We aligned the Illumina short reads to the PacBio long reads (a hybrid approach) using PBJelly and GapCloser v1.12 to increase the accuracy of base calling.
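The k-mer size optimized in Velvet (k = 79) is the word length used to build a de Bruijn graph from the reads. The toy Python sketch below shows what that parameter controls; it is not Velvet's implementation, the reads are hypothetical, k is set to 5 for readability, and reverse complements, sequencing errors and coverage cut-offs are all ignored.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def de_bruijn_graph(reads, k):
    """Each k-mer adds an edge from its (k-1)-mer prefix to its (k-1)-mer suffix.
    Edge weights count how often a k-mer was seen (a rough proxy for coverage)."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one
    successor. Real assemblers also handle reverse complements, error-induced
    branches, in-degree checks and coverage cut-offs; this sketch does not."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, {})) == 1:
        (nxt,) = graph[node].keys()
        if nxt in seen:  # stop instead of looping forever on a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Two overlapping toy reads (hypothetical data); k=5 here versus k=79 in the study.
reads = ["ATGGCGTGCA", "GCGTGCAATG"]
graph = de_bruijn_graph(reads, k=5)
print(walk_unambiguous(graph, "ATGG"))  # ATGGCGTGCAATG
```

Raising k lets the graph span repeats shorter than k, but each k-mer then needs more reads covering it exactly, which is why assemblies are typically compared over a range of k values before one is chosen.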

Gene prediction and annotation
The assembled genome was subjected to gene prediction using Augustus v2.5.5 to identify coding sequences. The predicted protein-coding genes were searched for similarity against NCBI's non-redundant (nr), UniProt, KOG and Pfam databases using BLASTP with an e-value threshold of 1e-5. Annotations from the different databases were compared using InteractiVenn (http://www.interactivenn.net/). Gene Ontology (GO) annotations were obtained from the nr results using Blast2GO command line v1.4.1. The GO sequence distribution specifies all annotated nodes that make up the GO functional groups, and genes associated with similar functions were assigned to the same GO functional group. The GO sequence distribution was analyzed for all three GO domains: biological process, molecular function and cellular component.
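As an illustration of the e-value cut-off and the cross-database comparison that InteractiVenn visualizes, the Python sketch below parses BLAST+ tabular output, keeps genes with at least one hit at e-value ≤ 1e-5, and intersects the per-database gene sets. It assumes the default 12-column `-outfmt 6` layout, treats the threshold as inclusive, and uses hypothetical genes and hits; it is not the authors' pipeline.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def annotated_genes(tabular_text, max_evalue=1e-5):
    """Return query IDs with at least one hit whose e-value is <= max_evalue."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    return {row["qseqid"] for row in reader if float(row["evalue"]) <= max_evalue}


def fake_line(gene, evalue):
    """Build one hypothetical -outfmt 6 row; only qseqid and evalue matter here."""
    return "\t".join([gene, "subject", "50.0", "100", "50", "0",
                      "1", "100", "1", "100", str(evalue), "60.0"])


# Hypothetical hits for three genes against the four databases.
tables = {
    "nr":      [("g1", 1e-40), ("g2", 1e-8), ("g3", 1e-6)],
    "UniProt": [("g1", 1e-35), ("g2", 0.01), ("g3", 1e-7)],
    "KOG":     [("g1", 1e-12), ("g3", 1e-9)],
    "Pfam":    [("g1", 1e-20), ("g2", 1e-6), ("g3", 1e-3)],
}
annotated = {db: annotated_genes("\n".join(fake_line(g, e) for g, e in hits))
             for db, hits in tables.items()}
common = set.intersection(*annotated.values())  # genes annotated in every database
print(sorted(common))  # ['g1']
```

On real BLAST output, the same intersection would give the genes annotated in all four databases, the quantity summarized in the Venn diagram of Figure 5.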
Simple sequence repeats
A high-throughput search for mono- to hexanucleotide SSR motifs was performed using the MIcroSAtellite (MISA) identification tool (http://pgrc.ipk-gatersleben.de/misa/download/misa.pl) with default parameters, under which a dinucleotide motif must be repeated at least six times and tri-, tetra-, penta- and hexanucleotide motifs at least five times.
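The thresholds above amount to a simple rule: report a motif of length n when it occurs in tandem at least a minimum number of times for that n. A minimal Python sketch of perfect-repeat detection under those thresholds follows. It is not MISA itself (a Perl script), it skips MISA's reporting of interrupted and compound SSRs, and the mononucleotide minimum of 10 is an assumption taken from MISA's commonly distributed configuration rather than from the text.

```python
# Minimum repeat numbers per motif length. Di- to hexa- follow the text;
# the mononucleotide value (10) is an ASSUMPTION based on MISA's usual misa.ini.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit ('ATAT' is not)."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem
    repeats. Interrupted and compound SSRs are not handled."""
    seq = seq.upper()
    hits = []
    for unit, needed in min_repeats.items():
        i = 0
        while i + unit <= len(seq):
            motif = seq[i:i + unit]
            j = i + unit
            while seq[j:j + unit] == motif:  # extend the tandem run
                j += unit
            count = (j - i) // unit
            if count >= needed and is_primitive(motif) and set(motif) <= set("ACGT"):
                hits.append((i, motif, count))
                i = j  # resume scanning after this repeat
            else:
                i += 1
    return sorted(hits)


# Hypothetical sequence containing an (AT)7 and a (CAG)5 repeat.
seq = "GC" + "AT" * 7 + "GGC" + "CAG" * 5 + "TT"
print(find_ssrs(seq))  # [(2, 'AT', 7), (19, 'CAG', 5)]
```

The primitive-motif check keeps a run such as (AT)7 from also being reported as a tetra- or hexanucleotide repeat.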

Pathway analysis
Pathway analysis, ortholog assignment and mapping of genes to biological pathways were performed using the KEGG Automatic Annotation Server (KAAS). All gene sequences were compared against the KEGG database using BLASTP with a bit-score threshold of 60 (the default).
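To show what a bit-score cut-off does during ortholog assignment, here is a simplified Python sketch that keeps, for each gene, the KEGG Orthology (KO) identifier of its best hit scoring at least 60 bits and then tallies genes per pathway. KAAS's own assignment procedure is more involved (it offers bi-directional and single-directional best-hit methods), so treat this as an illustration only; the hits, KO identifiers and pathway IDs are hypothetical placeholders.

```python
from collections import Counter


def assign_kos(hits, min_bitscore=60.0):
    """hits: iterable of (gene, ko, bitscore) tuples.
    For each gene keep the KO of its highest-scoring hit, provided that score
    is at least min_bitscore; genes with no such hit stay unassigned."""
    best = {}
    for gene, ko, score in hits:
        if score >= min_bitscore and score > best.get(gene, (None, float("-inf")))[1]:
            best[gene] = (ko, score)
    return {gene: ko for gene, (ko, _) in best.items()}


def pathway_counts(gene_to_ko, ko_to_pathways):
    """Count assigned genes per pathway; one KO may map to several pathways."""
    counts = Counter()
    for ko in gene_to_ko.values():
        for pathway in ko_to_pathways.get(ko, []):
            counts[pathway] += 1
    return counts


# Hypothetical hits, KO identifiers and pathway IDs (placeholders, not KEGG data).
hits = [("g1", "K_A", 310.0), ("g1", "K_B", 120.0),
        ("g2", "K_B", 58.5), ("g3", "K_C", 75.2)]
ko_to_pathways = {"K_A": ["map_X"], "K_B": ["map_X"], "K_C": ["map_X", "map_Y"]}

gene_to_ko = assign_kos(hits)
print(gene_to_ko)                                  # {'g1': 'K_A', 'g3': 'K_C'}
print(pathway_counts(gene_to_ko, ko_to_pathways))  # Counter({'map_X': 2, 'map_Y': 1})
```

Gene g2 stays unassigned because its only hit (58.5 bits) falls below the 60-bit cut-off.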

Identification of tRNAs and rRNAs in the genome
To identify probable tRNA genes, we used tRNAscan-SE, which detects unusual tRNA species and accurately predicts their secondary structures; its search also covers prokaryotic and eukaryotic selenocysteine tRNA genes, tRNA-derived repetitive elements and pseudogenes. RNAmmer 1.2 was used to identify rRNA genes.

Collection of diseased plant material/samples
Necessary permissions were obtained where required, and all relevant guidelines and legislation were followed for the collection of diseased plant material or samples from the University orchard, SKUAST-K, Shalimar, Srinagar (J&K), India.

Results
The introduction of novel sequencing technologies such as PacBio has revolutionized genomic studies. Long-read sequencing platforms are used to resolve complex genomic regions that are difficult to explore with short-read sequencing technologies. However, long-read technologies are also prone to higher error rates; therefore, in the present study, we used a hybrid approach combining Illumina HiSeq and PacBio to decipher a high-quality whole genome sequence of Wilsonomyces carpophilus, a plant pathogenic fungus whose genome is reported here for the first time. The W. carpophilus assembly shows that the hybrid approach is effective in constructing contigs with almost full-length genes. To validate the genome assembly, we assessed the completeness of the gene space using different software tools and found that most of the core conserved eukaryotic genes were represented in the assembled genome. Gene prediction identified 10,901 genes. When these genes were searched against the nr database with BLAST, most top hits were to Pyrenochaeta spp., followed by Ascochyta rabiei. The genus Pyrenochaeta comprises a wide range of species infecting plants and humans. The plant pathogenic species of Pyrenochaeta behave similarly to W. carpophilus: no sexual stage (mating) has been reported in nature for either Pyrenochaeta lycopersici (a plant pathogenic fungus) 16 or W. carpophilus [17][18][19]. Interestingly, the pathogen genome also lacks MAT genes, the key factors that determine whether sexual or asexual reproduction occurs in a fungus. This evidence strongly suggests that the fungus is incapable of sexual reproduction. The variability of the fungus may therefore arise through vegetative hyphal fusion (anastomosis), which is consistent with the presence of HET genes in the W. carpophilus genome.
A viable heterokaryon forms by anastomosis only when individuals share the same HET genotype; individuals with different HET genotypes form an incompatible vegetative heterokaryon that subsequently undergoes programmed cell death 20. This limits the transmission of infectious agents and other deleterious replicons between strains 21. Such selective pressure is probably responsible for the broad diversification of HET genes, which play a key role in the transfer of genetic information between strains; this variability is indispensable for the pathogen to adapt to its environment and to overcome host defense mechanisms. The pathogen genome also encodes antimicrobial peptide (AMP)-binding genes, which the pathogen may use to surmount host defenses. AMPs are part of the plant innate immune system against pathogen attack 22, and binding them could allow the pathogen to overcome this line of plant defense. AMPs are naturally synthesized, low-molecular-weight peptides (up to 100 amino acids) that are structurally and biochemically diverse. This diversity of AMPs may be matched by a diversity of AMP-binding genes in the pathogen, which could help explain its broad host range. We also found cytochrome P450s, protein kinases, sugar transporters and other genes in the pathogen genome. Cytochrome P450s (CYPs) are heme-containing proteins involved in the degradation of plant-derived toxins and therefore play important roles in fungal development and, eventually, in pathogenesis 23. A higher number of CYPs is also consistent with a wide host range, since the pathogen must overcome a broader array of host phytoalexins. Similarly, protein kinases play important roles in key processes of the fungal life cycle such as growth direction, nutrient uptake, stress responses and reproduction 24, and may therefore be important factors in the infection process of the pathogen.
Secretory proteins also play a crucial role in fungal pathogenesis and colonization, and the set of secretory proteins can indicate whether a pathogen's feeding habit is biotrophic, hemibiotrophic or necrotrophic. The efficiency and aggressiveness of phytopathogens are often associated with the presence of cell wall-degrading enzymes, because the first and foremost obstacle to fungal pathogens in plants is the cell wall and its associated components. Plant pathogenic fungi secrete a cocktail of hydrolytic enzymes known as carbohydrate-active enzymes (CAZymes) that are required to degrade the cell wall components of the host 25. Cell wall-degrading enzymes such as the glycosidases, glucanases, carboxylesterases, laccases, pectate lyases and cellulases found in the W. carpophilus genome are more often predicted in necrotrophs, and their presence points to a strong resemblance between W. carpophilus and necrotrophic plant pathogens. Unlike biotrophs, necrotrophs have a significantly expanded set of cell wall-degrading enzymes; the secretome of W. carpophilus thus suggests necrotrophic behavior. The secretome also revealed other proteins suggestive of necrotrophic behavior, such as FAD-binding domain proteins, which catalyse various biochemical reactions and are mainly involved in the electron transport chain 26. We also found other secretory proteins, such as chaperones (heat shock proteins), that play an important role in pathogenesis by maintaining the integrity of the pathogen under adverse conditions, thus making it more viable for infection 27. Further secreted proteins include a laccase precursor, which plays an important role in lignin depolymerization of the infected host 28, and a number of other enzymes that play crucial roles in pathogenicity, such as endo-1,4-beta-galactosidase, cutinase, rhamnogalacturonan lyase, pectin esterase, pectate lyase and lipolytic proteins.
Surprisingly, we also found pathogenicity determinants that may disarm host defenses, such as mycelial catalases, which are known to degrade the hydrogen peroxide that the host produces during the oxidative burst to kill the pathogen 29. The presence of these enzymes suggests that the pathogen secretes diverse proteins that act in a coordinated manner to cause disease and evade host defenses. However, these pathogenicity determinants need further characterization and validation to gain in-depth insight into the pathogenicity mechanism of the fungus. In our previous study, we successfully transformed the fungus using random ATMT, and the transformants were unable to cause disease 10. The present study has now revealed a number of candidate pathogenicity genes that could be targeted to render the pathogen ineffective. It opens new opportunities for comprehensive genomic study of the biological, metabolic and pathological aspects that make W. carpophilus a successful necrotrophic pathogen. It is therefore an opportune time to go beyond conventional neutral genetics by identifying, analyzing and site-specifically targeting pathogenicity determinants and re-modelling the core effector repositories.

Discussion
Table (partial): Unclassified: genetic information processing, 4; Unclassified: signaling and cellular processes, 13; Poorly characterized, 33.

Figures

Figure 1
Scaffold length distribution across the genome. The largest number of scaffolds are longer than 5 kb; the least represented length class lies between less than 1 kb and 5 kb or less.

Relationship of other fungal species with Wilsonomyces carpophilus. The bars represent the number of hits obtained in the nr database against the Wilsonomyces carpophilus genome, showing its similarity with other plant pathogenic fungal species; the highest similarity is to Pyrenochaeta species.

Figure 5
Venn diagram showing the number of genes shared among the four databases. Only 3,142 of the 10,901 genes were annotated in all four databases.