HGCC isolate features
HGCC isolate with high virulence to cucumber was selected for de novo sequencing of C. cassiicola sampled from cucumber. By being sub-cultured on PDA plates at 25℃ in dark, HGCC produced a layer of fluffy aerial mycelium with whitish gray when young and dark green when old (Figure 1A), which was easily peel off. The clubbed conidia were varied in length (from 20 to 120 µm) and shape with one to nine septa, which could germinate at one end or two ends in sterile distilled water (Figure 1B). After being inoculated on sensitive and resistant cucumber cultivars artificially, HGCC caused typical symptom of cucumber target spot disease in the sensitive cultivar Biyu, but it hardly infected the resistant cultivar Shenqing-1 (Figure 1C and D).
Genome sequencing and general features
HGCC genome was de novo sequenced (184 × Coverage) using Illumina Hiseq X-Ten. 54,580,316 high quality reads were assembled into 1,032 scaffolds (N50: 500 kb) with 42.7 Mb of genome size, slightly less than CCP (44.8 Mb) (JGI: 1019537) and greater than UM591 (41.4 Mb) (GenBank: JAQF00000000.1) (Table 1). HGCC genome encoded 15,678 genes close to CCP (15,614) and slightly more than UM591 (15,388) through the prediction of coding sequence (CDS) prediction (Table 1). However, the number of secreted proteins in HGCC (1,166) were less than CCP (1,216) and UM591 (1,182). The proportions of secreted proteins in the three isolates were 7.4% for HGCC, 7.8% for CCP and 7.7% for UM591, which were similar to other phytopathogenic ascomycetes fungi (7-10%) .
Interspecific genome-wide phylogeny
HGCC genome had 93.1% and 92.3% amino acid sequence identity with CCP and UM591, and 46.1-58.4% with other phytopathogenic fungi such as C. lunata (58.3%), B. maydis (58.3%), S. turcica (58.3%), P. nodorum (58.2%), P. tritici-repentis (57.7%), C. zeae-maydis (48.4%), A. flavus (48%), B. cinerea (47.1%), F. graminearum (46.4%) and M. oryzae (46.1%), respectively. >70% of HGCC genes had >90% amino acid sequence identity with CCP and UM591, far more than 55% between two C. lunata isolates CX-3 from maize and m118 from sorghum, respectively . 13,672, 13,701 and 13,715 homologous core genes were screened by reciprocal blast analysis in HGCC, CCP and UM591, of which 1,335, 1,330 and 1,401 were specific to the 10 selected pathogenic fungi (Figure 2A), respectively. Additionally, HGCC, CCP and UM591 had 7.3% (1,138), 6.6% (1,035) and 6.7% (1,037) specific genes compared to each other, of which 6.7% (1,047), 6.3% (961) and 6.3% (971) were specific to the 10 selected pathogenic fungi, respectively (Figure 2A and Table 2). It was suggested that the three C. cassiicola isolates had distinct differences in genome, although HGCC had high amino acid sequence identity with CCP and UM591.
12,225 gene encoding proteins were classified into 4,070 conserved protein families in HGCC by Pfam matches with profile hidden Markov models, slightly more than 4,003 families containing 12,238 proteins in CCP and 4,002 families containing 11,918 proteins in UM591. Glycoside hydrolase, pectate lyase, cutinase and cellulose were important pathogenic factors in phytopathogenic fungi that richly existed in C. cassiicola (Additional file 1: Table S1). C. cassiicola had family expansions in fungal specific transcription factor (P = 0.00451), transporter (P = 0.00005), major facilitator superfamily (P = 0.00002), ATP-binding cassette (ABC) superfamily (P = 0.00985), cytochrome P450s (P = 0.00001), G-protein coupled receptors (P =0.00336), protein kinases (P =0.00008), proteases (P = 0.00004), glycoside hydrolase (P = 0.00035) and pectate lyases (P = 0) compared to P. nodorum, P. tritici-repentis, S. turcica, B. maydis, C. lunata, C. zeae-maydis, B. cinerea and M. oryzae, F. graminearum. These protein families were expected to play important roles in the fungal survival in varied adverse environments.
In order to mine potential virulence-associated genes, Blastp searches of three C. cassiicola genomes were conducted against the pathogen-host interaction (PHI) database. 28.6% of predicted genes of HGCC were matched with PHI database at a E-value of 1e-105 and putatively related to the PHI, which was close to 29.0% in CCP, 28.7% in UM591, 25.3% in P. tritici-repentis, 28.4% in S. turcica, 26.7% in B. maydis, 30.8% in C. lunata, 25.8% in C. zeae-maydis, 32.6% in A. flavus, 25.1% in M. oryzae, but far more than 21.9% in P. nodorum and 19.6% in B. cinerea.
An interspecific phylogenomic tree of C. cassiicola and the 10 other plant fungi was constructed based on concatenated amino acid sequences of 2,344 core proteins (Figure 2B). The tree showed C. cassiicola of Pleosporales order had the genetic affinity with other Pleosporales order fungi, followed by C. zeae-maydis of Capnodiales order of Dothideomycetes class, A. flavus of Eurotiomycetes class, B. cinerea of Leotiomycetes class, and M. oryzae and F. graminearum of Sordariomycetes class. In addition, it was showed in the tree that C. cassiicola speciation had occurred before the speciation of the other Pleosporales order fungi, suggesting that C. cassiicola speciation had occurred before the speciation of the Pleosporales order. These results were similar to the result of David Lopez .
Pathogenic signal pathway
G-protein-coupled receptors (GPCRs) transduce external environmental signals by way of heterotrimeric G proteins into secondary messengers to regulate gene expression and subsequent cellular response . They are required in plant recognition and pheromone/nutrient sensing of plant pathogenic fungi . Pth11, as one GPCR of Magnaporthe grisea, mediates appressorium differentiation and fungal pathogenicity , and its homologues exists in other plant pathogenic fungi. HGCC genome contains 181 GPCR-like genes and 77 Pth11-like GPCRs, which is not only more than 165/61 in CCP and 169/55 in UM591, but also far more than 81-156/21-58 in selected 10 other plant pathogenic fungi (Additional file 1: Table S2). 56.4% of GPCRs-like genes and 80.5% of Pth11-like GPCRs are identified as PHI associated genes in HGCC, far more than 28.6% of PHI percentage in HGCC genome. G-protein alpha subunit is a important component of heterotrimeric G protein complex , which can activate downstream effectors and function as fungal pathogenicity [30, 31]. In HGCC genome, three G-protein alpha subunit (HGCC_9368, HGCC_6623 and HGCC_511) were identified, and both of them were both PHI-associated genes. HGCC_9368, HGCC_6623 and HGCC_511 showed high amino acid identities with a G protein beta subunit of Pseudocercospora fijiensis (GenBank: XP_007925889.1, 93%) and two G-protein alpha subunits of Stemphylium lycopersici (GenBank: KNG46771.1, 95%; GenBank: KNG45504.1, 85%), respectively.
MAPK, cAMP and Ca signaling pathways were main virulence-associated signal pathways, and controlled by a series of protein kinase . Three MAPK pathways of Saccharomyces cerevisiae, FUS3/KSS1, Mpk1 and Hog1, were well studied and highly conserved in other fungi . Based on homologous searches of HGCC genome against known MAPK, cAMP or Ca pathways associated genes in S. cerevisiae, 48, 12 and 23 genes in HGCC, high homologous with MAPK, cAMP and Ca signal pathways genes of S. cerevisiae, were screened, respectively (Additional file 1: Table S3, S4 and S5). 156 protein kinases were identified in HGCC, close to 160 in CCP and 157 in UM591, but more than 106-140 in other plant pathogenic fungi (Figure 3, Additional file 1: Table S6). The 156 protein kinases of HGCC were classified into 8 groups. STE group (16 kinases) and MAPK family (5 kinases) in CMGC group were involved in MAPK pathway, and PKA family (3 kinases) of AGC group in cAMP pathway, CLK family (1 kinase) and RCK family (1 kinase) of CMGC group, CAMK group (22 kinase), and PKC family (2 kinases) in AGC group are related to Ca pathway. Interestingly, almost all protein kinases (151/156) of HGCC were PHI-associated genes, suggesting that protein kinases played key roles in pathogenic processes mainly mediated by the three pathogenic signal pathways. Fungal histidine kinase (HK) phosphorelay signaling pathway, i.e. two-component signaling pathway, play important roles in stress adaptation and virulence . HGCC, CCP and UM591 contained 11, 13 and 12 HKs, respectively.
Protein families involved in degrading plant cuticle, cell wall, cytomembrane and cell inclusion
In order to successfully invade into host plant, plant pathogenic fungi are expected to produce and secrete multiple extracellular degrading enzymes such as cutinase for the degradation of cuticle , cell wall degrading enzymes (pectinase, cellulase and hemicellulase) , and cytomembrane and cell inclusion degrading enzymes (protease and lipase) [35, 36]. 9 cutinases were identified both in HGCC, CCP and UM591 genomes (Table S1). But the number was constricted (P = 0.02717) with other plant fungi as the reference (average 12). Pectate lyase, a type of pectinase, well existed in C. cassiicola (36 in HGCC, 31 in CCP, 34 in UM591), far more than 9-20 in other fungi. But pectinesterase, another type of pectinase was less in C. cassiicola (4 in HGCC, 4 in CCP, 5 in UM591) and other fungi (1-5). 17, 19 and 19 cellulase were identified in HGCC, CCP and UM591, respectively, close to the average (15) of plant pathogenic fungi. Glucanase, glucosidase and xylanase belonging to hemicellulase were rich in C. cassiicola, of which only glucosidase existed gene expansion (P = 0.00277). In HGCC, CCP and UM591 genomes, proteases were the second largest family including 73 subfamilies, but they were mainly in families of metallpo peptidase (138/144/136) and serine protease (372/376/374) (Figure 3, Additional file 1: Table S7). Aspartic peptidases are virulence factors both in plant and mammalian pathogens due to their ability in cleaving a large number of host proteins . Interestingly, the number of A11 transposon peptidase (11 in HGCC, 10 in CCP, 8 in UM591) in C. cassiicola was significantly expanded (P = 0) in comparison with the average (0.3) of other plant fungi. HGCC, CCP and UM591 contained 23, 27 and 24 lipases, respectively, which were not significantly expanded (P = 0.09042) compared to other plant fungi.
Glycoside hydrolase (GH) catalyzes the hydrolysis of glycosidic bonds in complex sugars , which functions as fungal pathogenesis. C. cassiicola contains 44 GH families (Additional file 1: Table S8). The numbers of GHs in HGCC (269) is lightly less than CCP (282) and UM591 (281), but far more than the average of other plant pathogenic fungi (207). More than half of HGCC GHs (143/269) were PHI-associated genes. It was reported that GH6, GH7, GH45 and GH61 cellulases and GH10 xlyanases were absent in insect pathogenic fungi, but present in plant pathogenic fungi . The GH families of cellulases well existed in HGCC (72), CCP (74) and UM591 (75) including GH3, GH 6, GH7, GH45 and GH61 cellulases, of which GH3 and GH61 of cellulases were the majority. GH16 family of xyloglucosy transferases play an important role in the digestion of plant cell walls, and were well present in HGCC (14), CCP (17) and UM591 (18), close to the average (16) of other plant fungi.
Protein families for transportation
C. cassiicola contained a large number of transporters (693 in HGCC, 690 in CCP, 686 in UM591), which were classed into 88 families (Additional file 1: Table S9). The major facilitator superfamily (MFS) (240 in HGCC, 241 in CCP and 237 in UM591) and ABC (51 in HGCC, 50 in CCP and 48 in UM591) superfamily were the two biggest superfamilies. The former was capable of transporting small solutes in response to chemiosmotic ion gradients, and the latter transported small molecules and macromolecules under ATP hydrolysis [40, 41]. Notably, almost all MFS transporters and all ABC transporters in HGCC were PHI-associated genes, showing that MFS and ABC transporters played key roles in fungal pathogenicity.
In phytopathogenic fungi, drug transporters of ABC and MFS superfamilies can secrete endogenous fungal virulence factors such as toxin and protect pathogen against exogenous plant defense compounds such as phytoalexins [39, 42]. Two drug:H+ antiporter (DHA) subfamilies (DHA1 and DHA2), drug transporters of MFS superfamily, can secrete toxic compounds into outer environment . The multidrug resistance (MDR) and the pleiotropic drug resistance (PDR) subfamilies are drug transporters of ABC superfamily, being capable of functioning in resisting antifungal agents . There was almost no difference in the numbers of DHA1 (41, 41 and 41), DHA2 (5, 6 and 6), MDR (10, 9 and 9) and PDR (15, 15 and 14) among HGCC, CCP and UM591 genomes (Additional file 1: Table S10), but they were more than the average of other plant pathogenic fungi. Interestingly, all drug transporters of HGCC were PHI-associated genes except for DHA2 (4/5).
Protein families for detoxification
Cytochrome P450 enzymes (CYPs) are proteins of a superfamily being ubiquitous in all biological kingdoms , which contain heme as a cofactor and therefore are hemoproteins. Fungal CYPs play key roles in various metabolisms such as housekeeping biochemical reactions, detoxification of chemicals and adaptation to adverse surroundings . A large number of CYPs were identified in C. cassiicola HGCC (229), CCP (226) and UM591 (216), being classed into 115 families, which were far more than 95-167 in other plant fungi (Figure 3, Additional file 1: Table S11). Interestingly, almost all CYPs of HGCC (211/229) were involved in the PHI. CYP65 and CYP505 subfamilies participate in the biosynthesis of mycotoxin, for example, CYP65 subfamily catalyzed the epoxidation reaction in the trichothecene biosynthesis in F. graminearum , and both CYP505 and CYP65 were required in the fumonisin biosynthesis of Fusarium verticillioides [46, 47], showing that CYP505 and CYP65 subfamilies were probably related to the mycotoxin biosynthesis in C. cassiicola. CYP65 was the biggest subfamily in CYP superfamily of C. cassiicola (24 in HGCC, 28 in CCP and 26 in UM591) and other plant pathogenic fungi. Relatively, C. cassiicola contained less CYP505 (6 in HGCC, 5 in CCP and 9 in UM591).
Secondary metabolite backbone genes
Melanin and mycotoxin are important virulence factors in plant pathogenic fungi . So far, melanin and mycotoxin associated genes were not identified in C. cassiicola yet except for cassiicolin coded Cas gene. Secondary metabolite backbone genes were essential for the biosynthesis of melanin and mycotoxin as secondary metabolites. C. cassiicola HGCC contained 52 backbone genes, close to 51 in CCP and 49 in UM591 (Additional file 1: Table S12). Notably, almost all backbones (50/52) were PHI-associated genes, showing that these backbone genes were probably involved in the pathogenic process of C. cassiicola. The 52 backbone genes were classified into 5 groups including non-ribosomal peptide synthetase (NRPS), NRPS-like, polyketone synthase (PKS), PKS-like, dimethylallyl tryptophan synthase (DMAT) but lacking hybrid PKS-NRPS enzyme (HYBRID), of which PKS was the biggest group containing 33 genes. The numbers of backbone genes (P = 0.00350) and PKS (0.00032) were significantly expanded compared to other plant fungi.
In order to analyze the relationship between PKS domain and its function, phylogenetic analysis for the ketoacyl CoA synthase (KS) domain of PKS was performed among C. cassiicola PKSs and role-known PKSs in other pathogenic fungi (Figure 4). These PKSs were divided into two different clusters based on the phylogenetic analysis. One kind was reducing PKSs with KS, acyltransferase (AT) and dehydratase (DH) domains at least, which contained 25 C. cassiicola PKSs and 6 known PKSs involving in the biosynthesis of mycotoxin in other plant pathogenic fungi such as Aspergillus ochraceus AoLC35-12 for ochratoxin, Alternaria alternate ACTTS3 for ACT-toxin, B. maydis PKS1 and PKS2 for T-toxin, Gibberella zeae PKS4 for zearalenon, and Gibberella moniliformis Fum1p for fumonisin. Two reducing PKSs (g13578 and g7009) were specific in CCPagainst HGCC. The other kind was non-reducing PKSs with KS and AT domains at least and without dehydratase (DH), enoyl reductase (ER) and ketoreductase (KR) domains, which included 10 PKSs of C. cassiicola and 9 known PKSs related to melanin biosynthesis in other fungal pathogens such as Aspergillus fumigatus Alb1p, Ceratocystis resinifera PKS1, Colletotrichum lagenarium PKS1, Chaetomium globosum PKS-1, Ascochyta rabie PKS1, B. maydis PKS18, S. turcica StPKS, Bipolaris oryzae PKS1, and A. alternata ALM1. It was hard to identify which reducing PKSs were involved in the mycotoxin biosynthesis of C. cassiicola due to the complicated evolutionary relationship for KS domain of C. cassiicola reducing PKSs and known mycotoxin related PKSs. Nevertheless, a non-reducing PKS HGCC_7666 of C. cassiicola had the closest evolutionary relationship with known melanin-associated non-reducing PKSs, suggesting that HGCC_7666 was probably the backbone gene for melanin synthesis of C. cassiicola.
Small, cysteine-rich peptides and effector proteins
The small cysteine-rich proteins (SCRPs) could be secreted directly into host plant cells to function as host recognition or colonization  and the stimulation of host hypersensitive response (HR) . Some SCRPs as virulence effectors facilitated fungal virulence by multiple ways including perturbing host cell signaling, interfering with host recognition of the pathogen and suppressing pathogen-associated molecular pattern (PAMP)-triggered immunity (PTI). Some SCRPs as Avr genes triggered or suppressed effector-triggered immunity (ETI) mediated by a gene-for-gene system in the PHI . 51 SCRPs were identified in HGCC ranging in size from 68 to 150 amino acids, including 3 hydrophobin and 1 cerato-platanin (Additional file 1: Table S13). Fungal hydrophobins are involved in surface recognition , and play a role as effectors in the PHI . Cerato-platanin family proteins could elicit disease resistance responses in the host plant .
Lysin motif (LysM) contained effectors are ubiquitous in plant pathogenic fungi, which are secreted out of the cell to suppress the immune response of host [56, 57]. Common in fungal extracellular membrane (CFEM) domain is a fungi specific domain containing eight cysteines and is found in some proteins with proposed roles in fungal infection and colonization [58, 59]. HGCC contained 8 LysM-contained and 20 CEEM-contained genes, of which 5 LysM-contained and 15 CFEM-contained genes had signal peptides. It was suggested that the 5 LysM and 15 CFEM as candidate effectors were probably involved in fungal pathogenic process.
Gene expression in spore germination
To reveal the molecular and cellular processes during spore germination of C. cassiicola, 6 h- and 12 h- germination times were selected for RNA-seq based transcriptome analysis. For 12 h at 25℃ in sterile ddH2O, the majority of spores were germinated at two ends, and only a few of that at one end (Figure 1B). Germinated and un-germinated spores of C. cassiicola were harvested in biological triplicates for RNA isolation. RNA-seq libraries were constructed and sequenced using Illumina HiseqTM. Following quality control and adapters trimming, 611,310,902 bp clean paired reads were obtained from 6 C. cassiicola RNA-seq libraries. Sequencing yield from individual libraries ranged from 84,671,942 to 116,800,420 reads per sample (Table S14 in Additional file 2). Pearson correlation analysis show that samples of same treatment were high correlative to each other in gene expression levels with 0.885-1 of R2 value higher than 0.742-0.85 of R2 value between samples in different treatments (Figure S1), which suggested that samples in the same treatment were repeatable and RNA-seq data was reliable. 66-73% of C. cassiicola filtered reads per library mapped to the gene predictions of C. cassiicola. A total of 3,288 genes were differentially expressed, of which 1,552 and 1,736 were up-regulated and down-regulated during spore germiantion of C. cassiicola with un-germinated spores as the reference, respectively (Figure 5). Specific DEGs were listed in Table S15 in Additional file 2. These DEGs contained hundreds of previously identified functional gene from the genome sequences classified by gene families (Table 3).
A total of 2,600 DEGs (79.08%) were annotated by gene ontology (GO). GO term analysis for C. cassiicola revealed an enrichment for cellular process, localization, metabolic process, regulation of biological process and single-organism process, and binding and catalytic activity (Figure 6). A total of 891 DEGs (34.27%) were annotated by KEGG pathway, containing 624 up-regulated DEGs and 267 down-regulated DEGs. The pathway classification of DEGs were listed in Table S16 in Additional file 2, and the top 20 of the enriched pathway terms were shown in Figure 7. The pathway enrichment statics indicated that the majority of the KEGG annotated DEGs were involved in metabolism, genetic information processing, cellular processes, organismal system, human diseases and environmental information processing.
qRT-PCR validation of selected DEGs
qRT-PCR assays were conducted to validate the gene expression patterns of 66 DEGs containing 49 up-regulated and 17 down-regulated genes. As shown in Figure 8, qRT-PCR data were correlated with the RNA-seq data (R2=0.7565). The express patterns of 56 DEGs were confirmed by qRT-PCR except for 10 DEGs (Table S17). These results showed a high correlation of RNA-seq and qRT-PCR results, indicating that the RNA-seq data was very reliable.