Recent sequencing projects have shown that relying solely on rDNA markers for species identification and phylogenetic analysis can lead to problematic species delimitations and incorrect identifications, especially in species complexes, owing to the presence of intragenomic polymorphisms (Stadler et al. 2020; Paloi et al. 2023; Bradshaw et al. 2023). To explore this further, we aimed to obtain high-quality genomes for rDNA analysis by combining second-generation sequencing technologies such as Illumina with third-generation technologies such as Oxford Nanopore, along with extensive use of bioinformatic tools, which significantly increased the accuracy of the genome assemblies (Stadler et al. 2020; Paloi et al. 2022; Hoang et al. 2022).
The results obtained here agree well with previous research by Stadler et al. (2020), which revealed high intragenomic polymorphism in a pilot study featuring a smaller subset of strains accommodated within the Hypoxylaceae. Additional sequence analysis revealed the presence of deep rDNA paralogs. Intragenomic variation in the rDNA cistron can likely be traced back to nucleotide deletions, insertions, and substitutions within the genome (Bradshaw et al. 2023; Paloi et al. 2022).
Although the number of LSU copies did not differ significantly in most of the studied strains, probably because the rDNA cistron is arranged in tandem throughout the genome (Torres-Machorro et al. 2009), considerable variation was observed in the genomes of, e.g., J. cohaerens, Pa. papillatum, H. addis, H. dussii, H. guialense, H. sporistriatatunicum, Hyp. monticulosa, Py. hunteri and X. hypoxylon, where more copies of ITS than of LSU were found and, in some cases, only a partial LSU was recovered. Additionally, we were not able to retrieve ITS sequences from H. rubiginosum MUCL 52887 and found only one copy in Hyp. submonticulosa. The quality of third-generation sequencing technologies and its implications for studying phenomena such as polymorphisms of the rDNA cistron, including their multiplicity (Bradshaw et al. 2023), are widely discussed for different groups of fungi (Paloi et al. 2022; Stadler et al. 2020). Bradshaw et al. (2023) recently pointed out that the apparent absence of ITS sequences from a genome sequence, or at least the failure to detect them for technical reasons, is not an isolated case. Those authors were unable to locate ITS sequences for a quarter of all taxa evaluated and found only a single ITS copy for half of the taxa studied.
The study by Stadler et al. (2020) reported polymorphisms in the ITS region for the species H. lienhwacheense, H. rickii, Hyp. monticulosa and Py. hunteri. In our analysis, all of these species except Py. hunteri also exhibited such variations in the LSU region. On the other hand, the genomes of A. truncatum, D. concentrica, H. pulicicidum and J. multiformis were found to possess polymorphisms in the LSU region but not in ITS (Table 2). Overall, for the species studied here, the ITS rDNA showed a lower rate of polymorphisms than the LSU region. We expected the opposite, considering that the ITS region (~ 400–900 bp) contains two highly variable spacers (ITS1 and ITS2) and the well-conserved small non-coding 5.8S rRNA gene. The LSU (~ 3000–5000 bp) is usually highly conserved within species because of its crucial role in ribosome function and protein synthesis (cf. Gregory et al. 2019), and many variations in its sequence could disrupt these processes.
After in-depth analyses of the sequences retrieved from the genome of H. rickii, we concluded that only one of the four sections in which the ITS and LSU can be located in the genome had been amplified in the past using Sanger sequencing methods (Supplementary Information Fig. 6). This is interesting because the target sites of the commonly used primers (ITS1, ITS4, ITS5) located in the genome did not diverge, and thus all sections should be amplified stochastically by PCR. We encountered a similar phenomenon when studying the Py. hunteri genome sequence, where only one of the five sections can be found in the literature (Supplementary Information Fig. 5). For both regions (ITS and LSU), however, one of the sequences retrieved from the genome presents mismatches with the primers ITS1 and ITS5, which offers an explanation for why this region is not targeted for amplification. A second explanation would be a highly condensed rDNA cistron, with the consequence that these sections are not actively transcribed when the cell needs to produce ribosomes for protein synthesis.
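The primer-mismatch argument above can be illustrated with a minimal sketch. It slides a primer along a template and reports the best ungapped match together with its mismatch count. The function name `best_primer_site` and both template strings are hypothetical and purely illustrative; they are not sequences from this study. The primer string is the commonly cited ITS1 sequence and is included as an assumption that should be checked against the primer set actually used. The sketch ignores degenerate bases, gaps, and the greater weight of mismatches near the 3' end.

```python
def best_primer_site(template: str, primer: str) -> tuple[int, int]:
    """Return (position, mismatches) of the best ungapped primer match.

    Ties are resolved in favour of the leftmost position.
    """
    template = template.upper()
    primer = primer.upper()
    if len(primer) > len(template):
        raise ValueError("primer is longer than template")

    best_pos, best_mm = -1, len(primer) + 1
    for pos in range(len(template) - len(primer) + 1):
        window = template[pos:pos + len(primer)]
        # Hamming distance between the primer and this window
        mm = sum(1 for a, b in zip(window, primer) if a != b)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


# Commonly cited ITS1 primer sequence (assumption; verify before use)
ITS1 = "TCCGTAGGTGAACCTGCGG"

# Hypothetical toy templates, NOT data from this study:
# copy_a carries an intact primer site, copy_b carries two substitutions.
copy_a = "AAGG" + "TCCGTAGGTGAACCTGCGG" + "AAGGATCATTA"
copy_b = "AAGG" + "TCCGTAGGTGAACTTGCGA" + "AAGGATCATTA"

for name, seq in (("copy_a", copy_a), ("copy_b", copy_b)):
    pos, mm = best_primer_site(seq, ITS1)
    print(f"{name}: best site at {pos}, {mm} mismatch(es)")
```

Applied to each rDNA copy recovered from an assembly, a count of this kind would flag copies whose primer sites have diverged, although whether a given number of mismatches actually prevents amplification depends on the PCR conditions.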
Phylogenetic analysis of the ITS showed low to medium support for the main clades of the Hypoxylaceae reported in previous studies (Wendt et al. 2018; Lambert et al. 2019; Becker et al. 2020; Cedeño-Sanchez et al. 2023). This result is congruent with older phylogenetic studies relying solely on ITS rDNA data, but does not reflect the taxonomic advances of the last decade (compare with Sánchez-Ballesteros et al. 2000; Hsieh et al. 2005). Strikingly, sequences from the same strain consistently formed well-supported clades, confirming the reliability of our results. This suggests that ITS is better used to estimate taxonomic affinities to the different species complexes described for the Hypoxylaceae in a “quick-and-dirty” fashion. Because LSU shows a similar pattern of intragenomic polymorphism, we would consider it risky to apply it as a complementary barcoding marker to ITS for identification to species level. In contrast, all the protein-coding regions studied here clearly displayed better phylogenetic resolution and can hence be regarded as much better suited for barcoding of fungi, as they are highly conserved and do not display intragenomic polymorphisms, at least in the strains studied here.
Of note, we retrieved TEF1 data from all genomes investigated here, marking the first inclusion of this locus in a Xylariales phylogeny. This locus is well established for other taxonomic groups, such as Amphisphaeriaceae, Cainiaceae, Cladosporiaceae, Clypeosphaeriaceae, Diatrypaceae and Hyponectriaceae (Jaklitsch et al. 2012; Dai et al. 2014; Vicente et al. 2021; Samarakoon et al. 2022). However, applying this locus alone did not show any advantage in direct comparison with the others, at least for Hypoxylaceae. The other retrieved genes (TUB2 and RPB2) have already been successfully applied to infer well-resolved phylogenies (Hsieh et al. 2005; Wendt et al. 2018; Lambert et al. 2019; Becker et al. 2020; Cedeño-Sanchez et al. 2023). A comparison of the results that Hsieh et al. (2005) obtained with a partial ACT1 sequence and those obtained in this study with the whole ACT1 gene makes clear that ACT1 alone is insufficient for resolving the relationships among species in the Hypoxylaceae.
Our investigation revealed high intragenomic polymorphism of rDNA in distinct strains within the Hypoxylaceae, indicating the presence of paralogs. Specifically, we found multiplets of rDNA sequences in the Hyp. monticulosa and Pa. papillatum genomes, which we propose to call deep rDNA paralogs. They uniformly exhibited a significantly lower GC content than the major copy and showed highly variable 5.8S rDNA sequences, which were otherwise found to be highly conserved across species. Some of the paralogs retained the motifs necessary for maintaining the required secondary structure of ITS2, whereas two paralogs exhibited a thermodynamically less stable structure (i.e. one with a high free energy) that could no longer be modeled for technical reasons (Supplementary Information Table 2). Specifically, in two minor haplotypes of Pa. papillatum, XXX conservative motifs in 5.8S and LSU sequences were disrupted, resulting in a corresponding proximal stem structure with a highly atypical form. Low GC content and mutations in otherwise well-conserved sequence segments are typical signs of a pseudogene (see Kolařík and Vohník 2018 and Stadler et al. 2020 for review). In eukaryotes, the hybridized 5.8S and LSU rRNA parts, which form the so-called proximal stem, have a free nucleotide on each side with approximately six base pairs in between. The structural pattern of this proximal stem is necessary for its recognition by the associated processing machinery (see Keller et al. 2009) and has been proposed as a diagnostic character for pseudogene detection (Harpke and Peterson 2007, 2008). Overall, the third ITS haplotype of Hyp. monticulosa deviates the most, as it exhibits the lowest GC content, an ITS2 secondary structure that could not be predicted, and a highly disrupted proximal stem structure. Furthermore, the LSU sequence adjacent to this haplotype shows the highest observed divergence (11.5%) from the major haplotype.
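Two of the quantities used above, GC content and percent divergence from the major haplotype, can be computed as in the following minimal sketch. It assumes that divergence is measured as an uncorrected p-distance over aligned columns in which both sequences carry an unambiguous base; the text does not state how the 11.5% value was derived, so this is an illustrative choice and not the authors' method. The function names and the toy sequences are hypothetical and are not data from this study.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def p_distance(aln_a: str, aln_b: str) -> float:
    """Uncorrected p-distance between two rows of the same alignment.

    Columns containing a gap or an ambiguous base in either row are skipped.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    diffs = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return diffs / compared


# Hypothetical aligned toy sequences, NOT data from this study
major = "GCGCGGCCATGCGCTAGCCG"
minor = "ATGCAATTATGCACTA-TTG"

print(f"GC major: {gc_content(major):.2f}")
print(f"GC minor: {gc_content(minor):.2f}")
print(f"p-distance: {100 * p_distance(major, minor):.1f}%")
```

A paralog that combines a markedly lower GC content with a high p-distance from the major copy would fit the pseudogene signature described above, although neither value alone is diagnostic.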
To conclude, it is highly likely that the captured deep paralogs represent pseudogenes (at least in the case of Hyp. monticulosa) or sequences in an advanced stage of pseudogene formation. The variations in rDNA identity observed in the other genomes explored can be explained by nucleotide deletions, insertions, and substitutions in the genome.
Lastly, we want to stress that the major conclusion of this study reinforces our opinion that the rDNA cistron alone is insufficient as a universal barcode marker for fungi (Paloi et al. 2022; Bradshaw et al. 2023), especially in the Hypoxylaceae (Stadler et al. 2020). As a substitute, we propose to evaluate the TUB2 gene as a new primary barcoding marker for Hypoxylaceae, because ample reference sequences have already been obtained for Hypoxylon and many other genera of Xylariales. A phylogeny based on RPB2 sequences resolved the hitherto accepted topology for Hypoxylaceae and helped to further stabilize the phylogeny, but the number of available sequences derived from type material and reliably identified vouchers is far lower than in the case of TUB2. The suitability of TUB2 for many taxa needs to be examined by including additional vouchers belonging to the same species (complex) to assess interspecific variability, because for more than 50% of the described Hypoxylaceae taxa there are either no molecular data at all or only a single sequence dataset. Nevertheless, for the few Hypoxylon taxa for which multiple specimens were sequenced, it has already been shown that the TUB2 locus shows little variability, e.g. in H. fragiforme and H. rubiginosum (according to data from GenBank arising from multiple independent studies) and in H. fuscum (cf. Lambert et al. 2021). Our study contributes to a better understanding of the genetic diversity and evolutionary dynamics within Xylariales and emphasizes the need for holistic approaches, such as multi-gene sequencing, in fungal barcoding endeavors and phylogenetic analyses.