New de novo assemblies using long-read sequencing for six Zymoseptoria spp.
We sequenced and assembled the genome of the reference isolates of Z. ardabiliae, Z. brevis, Z. pseudotritici and Z. passerinii and the genomes of two Z. tritici isolates sampled in Denmark and Iran. The obtained contigs were filtered based on base-quality confidence and read depth to ensure high quality of the final assemblies (see Methods). This filter removed a high number of contigs (between 17% and 58% of the total), but little overall length (between 0.4 and 2.6% of the total assemblies), indicating that most of the excluded contigs were of small size. The best assemblies were of the two Z. tritici isolates comprising 19 and 30 contigs and the most fragmented was of Z. passerinii comprising 103 contigs (Figure 1). The resulting assembly lengths ranged from 38.1 Mb for Z. ardabiliae to 41.6 Mb for Z. brevis, which is comparable to the reference assembly length of Z. tritici (39.7 Mb) but slightly larger than previous short-read based assemblies (Table 1; previous assemblies ranged from 31.5 Z. ardabiliae to 32.7 for Z. pseudotritici [22, 23]. The assembly of the Iranian Z. tritici isolate Zt10 has telomeric repeats at the end of all contigs, indicating that each chromosome is completely assembled, comprising six accessory and thirteen core chromosomes. The assemblies for the Danish Z. tritici isolate (Zt05), Z. brevis (Zb87) and Z. pseudotritici (Zp13) contained, respectively, twelve, nine and five fully assembled chromosomes including both core and accessory chromosomes (Figure S1). The assemblies of the Z. ardabiliae (Za17) and Z. passerinii (Zpa63) genomes included no fully assembled chromosomes, but twelve and ten contigs respectively with telomeres at one of the ends (Table 1; Figure S1).
The transcriptome-based gene predictions for these new assemblies include between 10,528 and 12,386 protein-coding genes (Table S1). This range is consistent with the annotation of the reference genome IPO323 reporting 11,839 protein-coding genes [22]. We used Benchmarking Universal Single-Copy Orthologs (BUSCO) from the lineage dataset Pezizomycotina to evaluate the completeness of the assemblies and gene predictions (Waterhouse et al. 2018). The proportion of complete BUSCOs were as follow Zt10: 98.7%, Zt05: 98.4%, Zp13: 98.5%, Zb87: 97.0%, Za17: 98.2% and Zpa63: 97.5%. These values are comparable to the one obtained for the reference genome of Z. tritici (97.8%). The assessment of gene content completeness (see Methods) indicates that, despite more fragmented assemblies of Z. ardabiliae and Z. passerinii, the genomes are complete in terms of gene content and that the unassembled fragments are more likely to comprise repeats and not protein coding genes.
Based on the whole-genome sequences and the predicted genes we reconstructed the phylogeny of the Zymoseptoria genus using the publicly available genome of Cercospora beticola as an outgroup [34, 35]. For both trees, the phylogenetic relationship of the Zymoseptoria species is in accordance with previously published phylogeny based on seven loci sequenced in multiples isolates (Figure 1) [21].
Genomes of Zymoseptoria spp. comprise accessory chromosomes and compartments but show overall high synteny
Next we addressed the extent of co-linearity of the Zymoseptoria genomes. Using coordinates of orthologous genes, we were able to reveal a high extent of synteny conservation among the five Zymoseptoria species and between the three isolates of Z. tritici, as depicted in Figures 2 and S2. Based on this high extent of synteny and the prediction of telomeric repeats, we identified the correspondence of chromosomes between the reference genome of Z. tritici IPO323 and the other Zymoseptoria genomes (Figure 2 and S2). Z. brevis and Z. pseudotritici share a near perfect synteny in their core chromosomes, however, when compared to Z. tritici, Z. brevis and Z. pseudotritici have two large-scale inversions comprising roughly ~900 kb and ~1.2 Mb of chromosomes 2 and 6, respectively (Figure 2, S2 and S3). Based on the phylogeny in Figure 1, it is likely that these two events occurred after the divergence of Z. tritici from Z. brevis and Z. pseudotritici. Overall, we observe a higher extent of synteny conservation between Z. brevis and Z. pseudotritici compared to Z. tritici IPO323 (Figure 2; S2 and S3).
In Z. tritici, core and accessory compartments have very distinct genomic features. It was previously shown that hallmarks of accessory regions in the reference isolate IPO323 include lower gene density, lower levels of H3K4 methylation levels and lower gene expression [6, 31]. In the reference genome of IPO323, compartments with these genomic and epigenomic hallmarks represent either accessory chromosomes or specific regions of the core chromosomes. Here we find that the specific accessory hallmarks including low gene density, low expression, low H3K4me2 methylation and significant enrichment of species-specific genes (see description below) on the non-core contigs are found in genomic compartments throughout the genus (Table S3, Figures 3 and S4).
In the reference Z. tritici strain, the specific “accessory-like” pattern includes a particular region of the core chromosome 7 of ~0.6 Mb (Figure 3A) [6]. This particularly large accessory marked region is observed in several Zymoseptoria spp (Figure 3B; S4). We identify ~0.7 Mb of the contig 28 in Z. pseudotritici and ~0.6 Mb of the contig 17 for Z. brevis, both corresponding to chromosome 7 of Z. tritici, which share the same hallmarks of accessory chromosomes (Figures 3 and S4, Table S3). For the two remaining sister species, the fragmentation of the assembly does not allow the identification of such pattern although there is an indication of a similar effect on gene expression and species-specific gene enrichment on contig 19 of Z. ardabiliae which corresponds to a fragment of chromosome 7 in IPO323 (Figure S5).
We also identified other regions enriched in isolate-specific genes, thus defining orphan loci in the core chromosomes of both Zt10 and Zt05 (Figures 3 and S4). We observe a region of ~0.2 Mb of contig 1 in the Iranian isolate Zt10 corresponding to the core chromosome 3 in the IPO323 genome with high content of isolate-specific genes (Figure S4; table S3). We furthermore identified small segments with species-specific genes on the core chromosomes of the wild-grass infecting sister species including a ~0.3 Mb region of the contig 26 in Z. brevis and ~0.1 Mb of contig 30. Overall, we show that genome compartmentalization in core and accessory regions is an ancestral and shared trait among the Zymoseptoria species. This phenomenon generates highly variable compartments and defined loci that deviate from genome averages in terms of gene content, sequence composition and synteny conservation.
Variable repertoires of effector candidate genes
To obtain gene annotations for the Zymoseptoria genome assemblies, we established a custom pipeline adapted from Lorrain and co-workers (Figure S5) [36]. Briefly, we use the consensus of three methods to predict gene product localization, then extract secreted proteins to further identify predicted effectors. This detailed functional annotation provided a catalog of predicted gene functions and cellular localizations (Figure S6). For each genome, a large proportion of genes could not be assigned to a protein function and lacked protein domains. 49.6% of genes in Z. tritici (N = 5953) and up to 71.8% of genes in Z. pseudotritici (N = 8373) lack a predicated function (Figure S6A). A relatively consistent number of genes are predicted for each functional category among Zymoseptoria spp. (Figure S6A and B). Likewise, the numbers of gene products predicted to belong to the different subcellular localizations are very similar (Figure S6C) across the whole genus, including secreted proteins. The difference between the minimal and maximal gene number fpr the different categories of subcellular localizations do not exceed 1.6X between species (Figure S6C). Overall, secretomes range from 7% of the genes predicted in Z. passerinii (N=828) to 11% of genes in Z. ardabiliae (N=1328 genes, Figure S6B).
We further investigated the number and distribution of genes predicted to encode proteins with a pathogenicity-related function, such as secondary metabolites, CAZymes and effector candidates (Figures S1 and S6B). Genes involved in the synthesis of secondary metabolites are typically organized in clusters, with genes participating in the same biosynthetic pathway grouping together at a genomic locus (Shi-Kunne et al. 2019). The number of biosynthetic gene clusters (BGC) ranges from 25 in Z. ardabiliae and Z. passerinii to 33 in the IPO323 reference genome and includes from 305 to 471 predicted genes (Figure S1). The only BGC identified in a non-core contig is a non-ribosomal peptide synthetase BGC found on the contig 38 of Z. brevis which has no orthologous cluster detected in any of the other Zymoseptoria genomes (Figure S1). We identified between 454 and 515 CAZyme genes in the Zymoseptoria species. Both BGCs and CAZymes are almost exclusively found on the core chromosomes (Figure S1). The only exceptions are a CAZyme encoding gene found on chromosome 14 in Z. tritici IPO323 and Zt05, and a CAZyme encoding gene on the putative accessory contig 38 of Z. brevis (Figure S1). These two genes encode for a beta-glucosidase and a carboxylic-ester hydrolase, respectively.
In contrast to the high conservation of CAZyme and BGC gene content among the Zymoseptoria genomes, we find that predicted effector genes exhibit a large variation in gene numbers between genomes (Figure S6B). In fact, the predicted effector gene repertoire of in Z. ardabiliae (N=637) is three times as high compared to Z. brevis (N=206). Interestingly, the three Z. tritici isolates also vary considerably in their predicted effector repertoires. The reference isolate IPO323 has a reduced set of effector genes (N=274) compared to Zt05 and Zt10 that encode approximately 30% more effector genes (N=417 and N=403, respectively, Figure S6B). Despite the high variability, the effector genes are mostly located on core chromosomes and none of the five Zymoseptoria species have more than ten effector genes located on accessory chromosomes (Figure S1).
The accessory genes of Z. tritici are shared with the closely related wild-grass infecting species
To further characterize variation in gene content among the five Zymoseptoria species, we identified orthologous genes (i.e. orthogroups) from the gene predictions. We categorized 22341 gene orthogroups identified in the seven Zymoseptoria genomes and in C. beticola according to their distribution among fungal genomes (Figure 4A). The core orthogroups, which are genes present in all eight genomes, represent around 30% of all orthogroups (N = 6698). The genus-specific orthogroups, shared between several Zymoseptoria spp. but not found in the C. beticola genome, represent 45% of the orthogroups (N = 9955; ranging from 2066 to 3212 per species). Among the genus-specific orthogroups, 1100 are found in all Zymoseptoria genomes (Figure 4A), whereas all others show presence-absence polymorphisms within the genus. A total of 2476 species-specific orthogroups (ranging from 552 to 1191 per species) are found only in individual species. Among the species-specific genes, 205 orthogroups (Ngenes = 414 to 562) are found in all three Z. tritici genomes while the isolate-specific genes in Z. tritici represent 391 (Zt10) to 792 (IPO323) genes.
Comparing the three Z. tritici isolates independently from the other species, we observe extensive gene presence-absence polymorphisms between the three isolates: 1540 orthogroups are identified in only two strains and 2522 are found in only one (Figure 4B). The number of genes showing presence-absence variation is striking compared to the 10098 core genes in Z. tritici as these genes comprise almost 30% of all predicted genes. Interestingly, we show that the number of orthogroups detected as isolate-specific is much larger when the comparison includes only members of the same species than when the other species are included (1035, 849 and 638 vs 792, 659 and 391 genes for IPO323, Zt05 and Zt10 respectively; Figures 4A and B). This indicates that a large part of the accessory gene content in Z. tritici is shared among the sister species, and highlight the importance of including sister species when establishing core and accessory gene content.
Interestingly, we show that effectors are enriched among the genus-specific genes but not among the species-specific or isolate-specific gene categories (with the exception of Z. ardabiliae). Fifty-six percent of effectors in Z. ardabiliae and up to 78% of effectors in Z. pseudotritici are shared with at least one of the other five Zymoseptoria species (Figure 4C). Indeed, 427 effector orthogroups are found in at least two genomes. However, only 47 (10% of the total effector orthogroups N= 474) are found in all seven Zymoseptoria genomes. Among the effectors shared by Z. tritici, and at least one other Zymoseptoria species, 32% (N = 112 of 352) are present in all three Z. tritici isolates while 68% (N = 240 of 352) show presence-absence polymorphisms in at least one of the three isolates. These results indicate that the majority of these shared effectors is actually accessory (i.e. presence-absence polymorphism) in Z. tritici.
Among in planta differentially expressed genes, species-specific are more expressed than core genes
Finally, we addressed the functional relevance of accessory and orphan genes in Z. tritici by analyzing gene expression patterns. We used previously published transcriptome data of three Z. tritici isolates [30] and focused on gene expression of the above-defined categories (core genes, genus-specific, species-specific, and isolate-specific). We sorted in planta expression data into two different infection phases: the biotrophic phase and the necrotrophic phase, a separation supported by principal component analysis of normalized DESeq2 counts (Figure S7). We compared expression levels by mapping RNA-seq reads to the genomes of IPO323, Zt05 and Zt10, using normalized read mappings to transcript per million. We tested differences among gene categories using pairwise comparisons with a Kruskal-Wallis test (Figure 5). Overall, we find that gene expression of the species-specific and isolate-specific genes is significantly lower in IPO323 and Zt10, but not in Zt05 (Kruskal-Wallis p-value < 0.05). Species-specific and isolate-specific gene median expression ranges from 3.2 to 5.6 TPM in IPO323 and Zt10 while median expression of core genes is 12.1 and 10.9, respectively. The Zt05 expression profile does not follow the same trend: the core genes are the lowest expressed gene category (8.9 median TPM), while genus-; species- and isolate-specific genes showed higher transcription levels (12.0; 14.4 and 13.5 median TPM respectively, Kruskal-Wallis p-value < 0.05).
In contrast, we observe a significantly higher expression of the species-specific and isolate-specific genes for all three isolates (Table S2; Kruskal-Wallis p-value < 0.05) when comparing the expression of genes that are differentially expressed (DEGs; DESeq2 p-adjusted < 0.05) between the biotrophic and necrotrophic phases. Species-specific and isolate-specific DEGs are higher expressed in planta than the core and genus-specific genes (Figure 5B; Kruskal-Wallis p-value < 0.05). The expression pattern of DEGs with different levels of specificity present a consistent pattern in all three isolates (Table S2). Overall, this comparison reveals a potential functional relevance of accessory genes which are up-regulated during infection of Z. tritici.