A phylogenomic approach facilitates clustering-based identification of F.g.
The pattern of clustering observed on the phylogenomic tree (Fig. 1) sheds new light on the phylogeographic structure of, and genetic relationships among, the species, which is important for their identification. Species-specific clustering was evident for all species represented by more than one strain (Fig. 1). In general, two major sister clades were resolved. The first clade occupied the basal position of the tree and grouped all species endemic to Africa (F. aethiopicum, F. acaciae-mearnsii), Asia (F. asiaticum, F. nepalense, F. ussurianum and F. vorosii) and Australia (F. acaciae-mearnsii). The second, larger clade was more diverse and included species endemic to North America (F. gerlachii, F. louisianense), South America (F. austroamericanum, F. mesoamericanum, F. brasilicum) and New Zealand (F. cortaderiae). F. boothii and F. meridionale, which have been reported from diverse regions of Asia, Africa and Latin America (van der Lee et al. 2015), also clustered in this large clade.
To minimize the impact of geographic variation that may interfere with the results, we incorporated a large set of geographically diverse strains. Such a strategy is especially critical for pandemic species, which are often subdivided into genetically distinct populations (Shakya et al. 2021). F. graminearum formed a peripheral clade on the tree, within which the strains were separated into thirteen major subclades, designated I to XIII. We observed geographic overlap among these subclades. Five of the nine strains in subclade I (n=9) originated from North or South America. The basal position of subclade I within the F.g. clade may suggest that it includes the genotypes most closely related to the ancestors of F.g. Interestingly, subclade I included the oldest known strains of F.g., un1 (CBS 185.32) and un2 (CBS 104.09), which were isolated/deposited in the collection of the Westerdijk Fungal Biodiversity Institute in 1932 and 1904, respectively. Unfortunately, the metadata associated with these two strains do not include their geographic origin. The first three subclades (II, III and IV) diverging from subclade I were each represented by a single strain; two of these, ar4 (114-2) and ar1 (CBS 139514), came from South America and one (sy1, CS3005) from Australia. Most European strains were grouped into subclades VI-XIII. Interestingly, subclades VI (n=7), VII (n=1), VIII (n=10) and IX (n=2) consisted mostly of Polish strains. The small subclade X contained only two strains: ne5 (79E1) from the Netherlands and po11 (16-390-z) from Poland. This small clade was located between the Polish subclade IX and the west European subclade XI, which comprised 34 strains mostly from Western Europe. In contrast, no such geographic clustering was observed in subclade XIII (n=27), which diverged from the west European subclade XI. This subclade grouped strains from diverse geographic locations such as Argentina, Brazil, Italy, Germany, the Netherlands, Poland, Russia and Serbia, suggesting their recent spread to new geographic locations. Geographic expansion may also be indicated by the clustering of two European strains, ge3 (CS10007) from Germany and ru13 (70725) from Russia, into subclade I. Further examples come from subclade XI, which included a single strain from South Africa (sa1, CBS 119799), and subclade VI, which included strain ir1 (CBS 110263) from Iran.
A phylogenomic approach enables the detection and clarification of the incorrect taxonomic status of historical strains held in fungal collections
Among the F.g. strains, one strain (sa2, CBS 119800) unexpectedly clustered outside the F.g. clade, within the F. boothii clade. This strain is held in the fungal collection of the Westerdijk Fungal Biodiversity Institute (Utrecht, The Netherlands) as F. graminearum and, according to the available metadata, was isolated from maize in South Africa, where F. boothii has been frequently reported (van der Lee et al. 2015). However, our previous comprehensive mitochondrial studies did not indicate an incorrect taxonomic status for this strain (Kulik et al. 2015; Brankovics et al. 2018; Wyrębek et al. 2021). The complete mitogenome of CBS 119800 (NCBI accession no. KP966554) displays 100% sequence identity to that of F.g. strain CBS 104.09 (NCBI accession no. KR011238). The partial (659 nt) tef gene sequence of CBS 119800 (NCBI accession no. KT855180) shows 100% identity to F.g. (NCBI accession no. KP267345). However, a BLAST search against the NCBI non-redundant (nr) nucleotide database with an E-value of 0.0, 100% identity and 100% coverage also yielded nearly 30 hits to F. boothii, indicating that tef alone cannot resolve the taxonomic affiliation of this strain. To further clarify its taxonomic status, we retrieved the complete sequences of the topoisomerase 1 (top1) and phosphoglycerate kinase (pgk) genes, which have become widely used taxonomic markers for Fusaria (Stielow et al. 2015). A BLAST search with the top1 gene as a query (E-value of 0.0 and 100% coverage) yielded two hits to F. boothii (NCBI accession nos. KY952952 and KY952951) with 100% sequence identity. In contrast, a BLAST search with the pgk gene as a query did not produce hits with 100% sequence identity, presumably because pgk sequences of F. boothii are lacking in GenBank. To determine the similarity of pgk between CBS 119800 and other F. boothii strains, we retrieved the pgk sequences from the genome assemblies of three F. boothii strains: CBS 316.73, CBS 110251 and CBS 119170. Subsequent sequence comparison (data not shown) revealed 100% pgk sequence identity among all four strains, supporting the identification of CBS 119800 as F. boothii.
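The identity and coverage filters used in the BLAST searches above can be applied to tabular BLAST+ output with a few lines of code. The Python sketch below is a minimal illustration, not the pipeline used in this study: it assumes the search was run with `-outfmt "6 qseqid sseqid pident length qlen evalue"`, and the function name `full_length_identical_hits` and the example records are hypothetical.

```python
import csv
import io

# Assumed column order, matching a BLAST+ run with
# -outfmt "6 qseqid sseqid pident length qlen evalue"
FIELDS = ["qseqid", "sseqid", "pident", "length", "qlen", "evalue"]


def full_length_identical_hits(tabular_text, min_identity=100.0, min_coverage=100.0):
    """Return subject IDs whose hit meets both the identity and query-coverage thresholds.

    Coverage is approximated as alignment length / query length * 100.
    """
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row:  # skip blank lines
            continue
        rec = dict(zip(FIELDS, row))
        identity = float(rec["pident"])
        coverage = 100.0 * int(rec["length"]) / int(rec["qlen"])
        if identity >= min_identity and coverage >= min_coverage:
            hits.append(rec["sseqid"])
    return hits


# Hypothetical records: only subject_A is both 100% identical and full-length
example = (
    "q\tsubject_A\t100.000\t659\t659\t0.0\n"
    "q\tsubject_B\t99.848\t659\t659\t0.0\n"
    "q\tsubject_C\t100.000\t600\t659\t0.0\n"
)
print(full_length_identical_hits(example))  # ['subject_A']
```

BLAST+ can also report query coverage directly through the `qcovs` or `qcovhsp` format specifiers, which would avoid the approximation used here.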
Our phylogenomic analysis may also help resolve the uncertain taxonomic status of CBS 110260. This strain was isolated from maize in Nepal and has been variously assigned to F. asiaticum or F. meridionale, or regarded as a hybrid (O’Donnell et al. 2000; Ward et al. 2002; O’Donnell et al. 2004; Starkey et al. 2007; Yang et al. 2008). The most recent study, by Walkowiak et al. (2016), based on analyses of SNPs and indels, suggested that this strain shares 99% sequence identity with F. meridionale. Indeed, our phylogenomic approach grouped CBS 110260 with the F. meridionale strain CBS 110249 (fme1, Fig. 1), which supports its taxonomic assignment as F. meridionale. Moreover, the position of F. meridionale in the second large clade suggests that this cryptic species is more closely related to F. cortaderiae, F. austroamericanum, F. brasilicum and even F.g. than to F. asiaticum. Additional whole-genome SNP analyses (Table 1) confirmed these findings. Notably, the number of SNPs between CBS 110260 and F. meridionale strain CBS 110249 (126,189) falls within the range of intraspecific variability found for F.g. (discussed in later sections), which is consistent with its assignment as F. meridionale.
Table 1
Results of whole-genome SNP analyses obtained by mapping genome assemblies to the reference strain CBS 110260
| Assembly genome | SNPs vs. CBS 110260 | Linear coverage |
| --- | --- | --- |
| F. meridionale CBS 110249 | 126,189 | 98.5% |
| F. cortaderiae CBS 119183 | 336,338 | 97.4% |
| F. austroamericanum CBS 110244 | 335,339 | 97.4% |
| F. brasilicum CBS 119179 | 342,238 | 97.1% |
| F. graminearum PH-1 | 567,931 | 95.6% |
| F. asiaticum CBS 110258 | 577,489 | 93.5% |
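The comparison behind Table 1 amounts to finding the assembly with the fewest SNPs relative to CBS 110260 and checking whether that count falls within the intraspecific range reported for F.g. The Python sketch below hard-codes the Table 1 values; the function names and the use of the approximate range of 86,000 to 158,000 SNPs as a yardstick are illustrative assumptions, not the authors' stated procedure.

```python
# SNP counts against reference CBS 110260 and linear coverage (%), from Table 1
TABLE_1 = {
    "F. meridionale CBS 110249": (126_189, 98.5),
    "F. cortaderiae CBS 119183": (336_338, 97.4),
    "F. austroamericanum CBS 110244": (335_339, 97.4),
    "F. brasilicum CBS 119179": (342_238, 97.1),
    "F. graminearum PH-1": (567_931, 95.6),
    "F. asiaticum CBS 110258": (577_489, 93.5),
}

# Approximate intraspecific range reported for F. graminearum, excluding the
# divergent Gulf Coast strain CBS 119173 (assumed yardstick for this sketch)
INTRASPECIFIC_RANGE = (86_000, 158_000)


def closest_assembly(table):
    """Return the assembly with the fewest SNPs relative to the reference."""
    return min(table, key=lambda name: table[name][0])


def within_intraspecific_range(snps, bounds=INTRASPECIFIC_RANGE):
    """True if the SNP count lies inside the (inclusive) intraspecific range."""
    low, high = bounds
    return low <= snps <= high


best = closest_assembly(TABLE_1)
print(best, within_intraspecific_range(TABLE_1[best][0]))
# F. meridionale CBS 110249 True
```

All other assemblies in Table 1 exceed the upper bound by more than a factor of two, which is why the nearest match is unambiguous in this case.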
Assessment of genome-wide SNP counts allows successful identification of F.g.
One of the major drawbacks of clustering-based strategies for microbial identification is that they are time-consuming and require bioinformatics skills as well as specialized software and equipment. Intuitive, user-friendly web-based platforms that enable fast and easy processing of next-generation sequencing data with numerous state-of-the-art tools offer a remedy for these limitations. We used the PhaME workflow available on the EDGE Bioinformatics platform, which enables rapid counting of the total number of SNPs, to identify F.g. We validated the method through multiple comparisons of genome assemblies to the reference strain PH-1. We estimated intra- and interspecific differences in the number of SNPs to facilitate species recognition (Supplementary File S2). Intraspecific variation, calculated from comparisons of F.g. strains, ranged from around 86,000 to nearly 158,000 SNPs, with one exception: strain CBS 119173 yielded nearly 209,000 SNPs. This increased variability is not unexpected, because CBS 119173 belongs to the Gulf Coast population of F.g., which showed higher divergence in previous phylogenetic analyses of multilocus DNA sequence data (Starkey et al. 2007). The increased genetic divergence of CBS 119173 is also reflected in the phylogenomic tree (Fig. 1), where this strain clustered separately from the remaining F.g. strains.
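PhaME itself performs the genome alignment and SNP calling; the Python sketch below only illustrates the final counting step on a pair of sequences that are assumed to be already aligned to the same length. It is not PhaME's implementation, and the function `count_snps` and its rule of skipping gaps and ambiguous bases are assumptions made for illustration.

```python
VALID_BASES = set("ACGT")


def count_snps(reference, query):
    """Count positions where two aligned sequences carry different unambiguous bases.

    Gaps ('-') and ambiguous bases (e.g. 'N') are skipped, so indels are not counted.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    snps = 0
    for ref_base, qry_base in zip(reference.upper(), query.upper()):
        # Count only substitutions between two unambiguous nucleotides
        if ref_base in VALID_BASES and qry_base in VALID_BASES and ref_base != qry_base:
            snps += 1
    return snps


print(count_snps("ACGTACGTNA", "ACGAACG-TA"))  # 1
```

In the example, the T/A difference at the fourth position is counted, while the gap and the `N` are skipped. Summing such counts over all aligned regions of two genomes gives the kind of genome-wide total compared in this section.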
For cryptic species of the FGSC, interspecific variation ranged from 304,164 to 706,454 SNPs; even the lowest value is nearly twice the usual upper limit of intraspecific variation (about 158,000 SNPs). Unsurprisingly, even higher interspecific variation was found for the closely related morphospecies F. culmorum, F. pseudograminearum, F. sambucinum and F. venenatum, ranging from 1,032,686 to 1,955,620 SNPs, more than 6.5 times higher than intraspecific variation.
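The ranges above suggest a simple rule for assigning a strain from its SNP count against PH-1. The Python sketch below encodes the reported ranges and reproduces the fold differences quoted in the text; the function `classify_snp_count` is hypothetical, and reporting counts that fall in the gaps between observed ranges as unresolved is an assumption rather than a criterion defined in this study.

```python
# Ranges of SNP counts vs. reference PH-1 reported in this section
INTRASPECIFIC_MAX = 209_000        # F.g., including the divergent strain CBS 119173 (approximate)
FGSC_RANGE = (304_164, 706_454)    # other cryptic species of the FGSC
MORPHOSPECIES_MIN = 1_032_686      # F. culmorum, F. pseudograminearum, F. sambucinum, F. venenatum


def classify_snp_count(snps):
    """Place a SNP count (vs. PH-1) into one of the observed ranges.

    Counts in the gaps between observed ranges are reported as 'unresolved'
    rather than forced into a category (an assumption of this sketch).
    """
    if snps <= INTRASPECIFIC_MAX:
        return "F. graminearum"
    if FGSC_RANGE[0] <= snps <= FGSC_RANGE[1]:
        return "other FGSC species"
    if snps >= MORPHOSPECIES_MIN:
        return "outside FGSC"
    return "unresolved"


# Fold differences quoted in the text, relative to the usual upper
# intraspecific bound of about 158,000 SNPs (CBS 119173 excluded)
print(round(304_164 / 158_000, 2))    # 1.93
print(round(1_032_686 / 158_000, 2))  # 6.54
```

Keeping an explicit "unresolved" outcome makes the rule conservative: a strain whose count lies between the observed ranges would call for the clustering-based analysis rather than a threshold decision.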