Analysis of the Genome Sequence of Phomopsis vexans: A Fungal Pathogen Causing Phomopsis Blight of Eggplant


Background: Phomopsis vexans is a phytopathogenic fungus that causes Phomopsis blight of eggplant, one of the major diseases reducing eggplant production. To lay a foundation for research on its pathogenicity and to understand the mechanism of disease development, the genome of isolate PV4, collected in Guangdong, China, was sequenced, assembled and analyzed. Results: The complete assembled genome of Phomopsis vexans was about 59.78 Mb in size, with a G+C content of 51.24% and a contig N50 of 4.93 Mb. In the genome, 3,552 annotated repetitive elements were identified. A total of 15,034 genes with an average length of 1,790 bp were predicted, of which 14,116 were annotated against the NCBI nr database. Moreover, 1,206 genes were annotated in the Carbohydrate-Active enZymes (CAZy) database; these may play a role in degrading plant cell walls. Finally, 134 effector proteins were predicted. Conclusions: This is the first genome sequence of Phomopsis vexans. The information obtained in this study provides an important resource for research on the pathogen and for shedding light on its pathogenicity mechanism.


Background
Phomopsis vexans is a phytopathogenic fungus that belongs to class Sordariomycetes, order Diaporthales, family Valsaceae [1] and causes Phomopsis blight of eggplant (Solanum melongena) [2]. It was first reported by Halsted in the USA and has since spread widely to all eggplant-growing countries [3][4][5]. The common symptoms of Phomopsis blight include leaf spot, brown spots on branches, withering and fruit rot [6]. It has caused significant yield losses in eggplant, especially in tropical and subtropical regions, where the climate is warm and humid [7]. Phomopsis blight is therefore one of the most economically important diseases of eggplant.
Several approaches have been tried to control Phomopsis blight, including crop rotation and fungicide treatments [2,6,8]. However, they either lack efficiency or cause environmental and food safety problems. Understanding the pathogenesis mechanism is therefore critical for developing efficient ways to manage Phomopsis blight, and this requires more information about Phomopsis vexans. The fungus has been isolated and characterized morphologically and molecularly [7,9,10,11]. However, information about the genomic features of P. vexans, which is indispensable for research on its pathogenesis mechanism, is still lacking.
To provide information on the genomic basis of the pathogenicity of Phomopsis vexans, the genome of isolate PV4 was sequenced, assembled and annotated, and potential pathogenesis mechanisms are discussed. This is the first genome of Phomopsis vexans. It will shed light on the pathogenesis mechanism and help in developing more efficient ways to manage Phomopsis blight of eggplant.

Methods
Isolation, identification, and cultivation of P. vexans isolate
A strain of Phomopsis vexans was isolated from an infected branch of an eggplant (Solanum melongena) plant in Zhongluotan Town, Baiyun District, Guangzhou, Guangdong Province, China (23°23'24.5"N 113°26'19.4"E). Briefly, a 1 × 0.5 cm piece of fruit tissue from the junction between diseased and healthy tissue was selected, rinsed twice with sterile water, sterilized with sodium hypochlorite solution for 10 minutes, washed three times with sterile water, and cultured on a PDA plate at 28°C.

Genomic DNA extraction and sequencing
After 7 days of culture, 100 mg of mycelia was collected from the PDA plate, immediately frozen in liquid nitrogen, and ground with a mortar and pestle. Genomic DNA was then extracted with the QIAGEN Genomic-tip kit. Sample quality testing, library construction, library quality testing and library sequencing were performed in accordance with the standard protocol provided by Oxford Nanopore Technologies (ONT). The procedure included the following steps: (a) Nanodrop, Qubit and 0.35% agarose gel electrophoresis were used to test DNA purity, concentration and integrity; (b) large DNA fragments were recovered with the BluePippin automatic nucleic acid recovery system; (c) the library was constructed with the SQK-LSK109 ligation kit; (d) the library was sequenced on the Nanopore sequencing platform.

Data analysis and genome assembly
Clean data were obtained by trimming adapter sequences and poor-quality bases from each sequence read. Canu v1.5 [21] was used to correct errors in the filtered subreads, and wtdbg [22] was then used to assemble the corrected subreads. Finally, Pilon [23] was used to further correct the assembled genome and improve its final accuracy.
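The assembly is summarized in this paper by its contig N50 and G+C content. The following is a minimal sketch of how these two statistics are conventionally computed from a set of contigs. The function names and the toy contigs are illustrative assumptions and are not taken from the pipeline used in this study.

```python
from typing import Iterable


def contig_n50(lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def gc_percent(sequences: Iterable[str]) -> float:
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    gc = at = 0
    for seq in sequences:
        upper = seq.upper()
        gc += upper.count("G") + upper.count("C")
        at += upper.count("A") + upper.count("T")
    return 100.0 * gc / (gc + at) if (gc + at) else 0.0


# Toy contigs for illustration only (not data from isolate PV4).
toy_contigs = {
    "contig_1": "GCGCATAT",
    "contig_2": "GGCC",
    "contig_3": "AT",
}

n50 = contig_n50(len(s) for s in toy_contigs.values())
gc = gc_percent(toy_contigs.values())
print(f"N50 = {n50} bp, G+C = {gc:.2f}%")
```

Ambiguous bases such as N are excluded from the G+C denominator here; other tools may count them differently, which can shift the reported percentage slightly.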

Repeat elements prediction
Because repetitive sequences are relatively poorly conserved between species, a species-specific repeat database is needed when predicting repetitive sequences. Therefore, LTR_FINDER v1.05 [24], MITE-Hunter [25], RepeatScout v1.0.5 [26] and PILER-DF v2.4 [27] were used to build a repetitive sequence database for the fungal genome. PASTEClassifier [28] was used to classify the database, which was then merged with Repbase [29] to form the final repeat sequence database. RepeatMasker v4.0.6 [30] was then employed to predict the repeat sequences of the fungus.

General genome features
The genome of P. vexans isolate PV4 was sequenced by nanopore strand sequencing. Statistics of genome sequencing and assembly are shown in Table 1.

Repetitive elements identification
A total of 3,552 annotated repetitive elements were identified in the genome of PV4 (Table 3). Information on their annotation source, type, loci and attributes is given in Table S1. They were classified into Class I (retrotransposons, 738), Class II (DNA transposons, 369), potential host genes (455) and SSRs (1,990). The major types in Class I and Class II were LTR/Copia (258) and MITE (218), respectively. In addition, there were 4,931 unknown repetitive elements.
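As a quick consistency check, the four annotated categories reported above can be tallied to confirm that they sum to the stated total of 3,552, and to see each category's share. This is only an illustrative sketch; the dictionary keys are labels chosen here, and the counts are the ones given in the text.

```python
# Counts of annotated repetitive elements as reported in the text.
repeat_counts = {
    "Class I (retrotransposons)": 738,
    "Class II (DNA transposons)": 369,
    "Potential host gene": 455,
    "SSR": 1990,
}

total_annotated = sum(repeat_counts.values())

# Share of each category among the annotated repetitive elements, in percent.
shares = {
    name: 100.0 * count / total_annotated
    for name, count in repeat_counts.items()
}

print(f"Total annotated repetitive elements: {total_annotated}")
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The 4,931 unknown repetitive elements are reported separately in the text and are deliberately left out of this tally.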

Gene prediction and annotation
Gene prediction results are shown in Table 4. A total of 14,181 predicted genes were annotated against NCBI nr (14,116), GO (5,841), KEGG (3,729) and four other databases (Table S2). Moreover, 95 predicted genes did not significantly match any known genes.
Nr homologous species distribution analysis showed that PV4 shared the most homologous genes with Togninia minima (21.97%), Pestalotiopsis fici (7.45%) and Colletotrichum gloeosporioides (6.36%) (Fig. 1). The functional categorization and distribution of predicted genes by GO annotation are shown in Fig. 2, and the distribution of annotated genes in the KEGG database is shown in Fig. 3. Biosynthesis of amino acids (145), carbon metabolism (124) and ribosome (102) were the pathways with the most annotated genes.
The Pathogen-Host Interactions (PHI) database was used to find more information about genes related to pathogen-host interactions; the results are shown in Table S3. Protein subcellular localization analysis predicted 1,786 proteins with signal peptides, 3,223 transmembrane proteins, 1,394 secreted proteins and 134 effector proteins (Table S4).
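The text does not state how secreted proteins were derived from the signal peptide and transmembrane predictions. A common convention, assumed here purely for illustration, is to treat a protein as a secreted candidate when it carries a signal peptide and has no predicted transmembrane helix. The sketch below applies that assumed rule to toy records; the class, field names and identifiers are hypothetical and do not come from this study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProteinPrediction:
    protein_id: str           # hypothetical identifier
    has_signal_peptide: bool  # output of a signal peptide predictor
    tm_helices: int           # number of predicted transmembrane helices


def is_secreted(p: ProteinPrediction) -> bool:
    """Assumed rule: signal peptide present and no transmembrane helix."""
    return p.has_signal_peptide and p.tm_helices == 0


def secreted_candidates(preds: List[ProteinPrediction]) -> List[str]:
    """Return the identifiers of proteins that pass the assumed rule."""
    return [p.protein_id for p in preds if is_secreted(p)]


# Toy records for illustration only (not predictions for isolate PV4).
toy = [
    ProteinPrediction("toy_001", True, 0),   # signal peptide, no helix
    ProteinPrediction("toy_002", True, 2),   # signal peptide, membrane-bound
    ProteinPrediction("toy_003", False, 0),  # no signal peptide
    ProteinPrediction("toy_004", False, 7),  # membrane protein
]

print(secreted_candidates(toy))
```

Under such a rule the secreted set is necessarily a subset of the signal peptide set, which is at least compatible with the reported counts (1,394 secreted proteins versus 1,786 proteins with signal peptides).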
Repetitive elements are critical determinants of genome architecture and drive the evolution and host adaptation of fungal genomes. Here, repetitive elements made up 4.84% of the genome, which is less than in Phomopsis longicolla (13%) [12] but greater than in Phomopsis phragmitis (2.12%) [13]. The most important category of repetitive elements is transposable elements (TEs), which account for 1.5% of the Phomopsis vexans genome. TEs are divided into two main classes based on structural features and mode of transposition: Class I TEs (retrotransposons), which transpose via reverse transcription and propagate through a copy-and-paste mechanism, and Class II TEs (DNA transposons), which propagate through a cut-and-paste mechanism [16,17].
In this study, the majority of TEs were Class I (1.4% of the whole genome), among which long terminal repeat (LTR) elements of the Copia type were the most numerous. Class II formed a minor fraction (0.1%) of the genome. This is consistent with previous studies [12,18].
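To put these percentages in absolute terms, the short calculation below converts them to approximate sequence lengths using the assembly size of about 59.78 Mb reported in the abstract. It is a back-of-the-envelope sketch: the variable names are chosen here, and the results inherit the rounding of the published percentages.

```python
# Values reported in the text.
genome_size_mb = 59.78    # assembled genome size, Mb
repeat_pct = 4.84         # all repetitive elements, % of genome
class_i_pct = 1.4         # Class I TEs (retrotransposons), % of genome
class_ii_pct = 0.1        # Class II TEs (DNA transposons), % of genome

# Total TE fraction is the sum of the two classes.
te_pct = class_i_pct + class_ii_pct

# Convert percentages of the genome to approximate lengths in Mb.
repeat_mb = genome_size_mb * repeat_pct / 100
te_mb = genome_size_mb * te_pct / 100

# Share of the TE fraction contributed by Class I elements.
class_i_share_of_te = 100 * class_i_pct / te_pct

print(f"TEs: {te_pct:.1f}% of genome, about {te_mb:.2f} Mb")
print(f"All repeats: about {repeat_mb:.2f} Mb")
print(f"Class I share of TEs: {class_i_share_of_te:.1f}%")
```

The two class percentages sum to the 1.5% TE fraction quoted above, so the figures in the text are internally consistent at the reported precision.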
Passing through the plant cell wall is the first step of infection by a plant pathogen [12]. Plant cell wall degrading enzymes (PCWDEs) are a subset of carbohydrate-active enzymes (CAZymes) that plant pathogens produce to degrade plant cell walls [19]. A total of 1,206 genes were annotated as CAZyme-coding genes. In other studies, the number of such genes in Phomopsis pathogens ranged from 778 to 1,702 [12][13][14][15]. This result indicates that Phomopsis vexans has a large number of genes potentially encoding PCWDEs.
Pathogens need effector proteins to suppress the host immune response or manipulate host cell physiology [20]. Here, 134 effector proteins were predicted by EffectorP. No effector predictions have been reported in other studies on Phomopsis spp., and no effector has been validated in Phomopsis spp. Now that effector predictions are available, more research is needed to characterize their functions.

Conclusion
Phomopsis blight of eggplant is one of the most serious diseases affecting eggplant production around the world. Research on its pathogenesis mechanism is critical for its management. However, genomic information on Phomopsis vexans has been lacking. Here, we report the first genome sequence of Phomopsis vexans. Repetitive elements, carbohydrate-active enzymes, effectors and other genes were identified and annotated. The genome sequence information provided by this study will help in understanding the pathogenicity mechanism of this disease.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
Raw genome data are available at: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA694791

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
TL designed the experiments. ZH and QY analyzed the data and wrote the paper. ZhiliangL, ZhengingL, YL and HW managed the eggplant field and sampled fruits. XX, BS and CG isolated the Phomopsis vexans. All authors have read and approved the final manuscript.