ITS barcoding-based species identification for Sanghuangporus (Basidiomycota), a genus of medicinal mushrooms

“Sanghuang” is a group of important medicinal mushrooms taxonomically represented by members of the fungal genus Sanghuangporus. Species of Sanghuangporus used in medicinal studies and industry are identified mainly by BLAST searches of GenBank with the ITS barcoding region as a query. However, inappropriately labeled ITS sequences related to “Sanghuang” in GenBank hinder accurate species identification and, to some extent, the utilization of these medicinal resources. Here, we examined all 271 available ITS sequences related to “Sanghuang” in GenBank, including 31 sequences newly submitted for this study. More than half of these sequences were mislabeled, and the corresponding species names are corrected. The mislabeled sequences mainly came from strains sequenced by non-taxonomists. Based on analyses of ITS sequences submitted by taxonomists, we treat Sanghuangporus toxicodendri as a later synonym of S. quercicola; intraspecific ITS differences are below 1.50% (except in S. weirianus), and interspecific differences are above 1.50%. Moreover, ten potential diagnostic sequences are provided for hyperbranched rolling circle amplification to rapidly detect three common commercial species, viz. S. baumii, S. sanghuang and S. vaninii. Overall, these results provide a practical method for ITS barcoding-based species identification of Sanghuangporus and will promote medicinal studies and industrial development from a taxonomic perspective.


Introduction
Macrofungi are fungi that produce fruiting bodies visible to the naked eye. Many macrofungi are well-known medicinal mushrooms with diverse medicinal functions (Wu et al. 2019a). Among them, "Sanghuang", a group of important wood-inhabiting medicinal mushrooms, has been used in folk medicine for the past two thousand years in China and adjacent countries. Since modern scientific studies revealed several medicinal functions of "Sanghuang", including antitumor, antioxidant, anti-inflammatory and immunomodulatory activities, these fungal resources have also attracted the attention of European fungal chemists and pharmacologists (Chepkirui et al. 2018; Cheng et al. 2019). Secondary metabolites such as polysaccharides, polyphenols, pyrones and terpenes are responsible for these medicinal functions of "Sanghuang". Nowadays, "Sanghuang" is mainly consumed as a tea made from chips and pieces of cultivated basidiocarps, and occasionally orally as mycelial powder.
Like those of other precious wood-inhabiting medicinal mushrooms, such as "Lingzhi" (Cao et al. 2012; Wang et al. 2012; Yao et al. 2013, 2020; Dai et al. 2017), "Niuchangchih" (Wu et al. 2012b, c) and "Fuhling" (Redhead and Ginns 2006), the taxonomic identity of "Sanghuang" has been hotly debated. Most fungal taxonomists now agree that "Sanghuang" is represented by species of Sanghuangporus Sheng H. Wu, L.W. Zhou & Y.C. Dai. A total of 14 species have been described and accepted in Sanghuangporus: 11 are distributed in Asia, one in Africa, one in North America and one in Europe. In addition, more new species of Sanghuangporus await description from Africa (Chepkirui et al. 2018; Cheng et al. 2019) and possibly from other parts of the world. Besides morphological and ecological characters, the ITS barcoding region provides the most powerful evidence for discriminating species of Sanghuangporus.
As a hot topic, Sanghuangporus has been the subject of transdisciplinary studies aimed at promoting the utilization of these medicinal resources (Zhou et Shao et al. 2020). Most such medicinal studies try to identify their materials via a BLAST search of GenBank (https://www.ncbi.nlm.nih.gov/genbank/) with the ITS barcoding region as a query. However, even though a reliable ITS sequence accession is available for each of the 14 species of Sanghuangporus, it is sometimes not easy to determine which species a material represents from a simple ITS-based BLAST search, because redundant and even incorrectly labeled ITS sequences are present in GenBank. With these problematic sequences as references, some collections will inevitably be misidentified at the species level, and the ITS sequences generated from them will in turn be submitted to GenBank as new obstacles for later species identification. As a result, some medicinal findings will be attributed to incorrect species names. Meanwhile, before the genus Sanghuangporus was erected (published online in 2015; Zhou et al. 2016), ITS sequences generated from "Sanghuang" were labeled under other generic names, such as Inonotus P. Karst. and Phellinus Quél., even when the epithets were correct. This confuses fungal chemists and pharmacologists who lack taxonomic knowledge and also leads to the misapplication of species names to certain medicinal functions. Such misapplications hinder obtaining government approval for industrial development (Zhou 2020).
As stated by Zhou (2020), the use of correct Latin names for fungal species is crucial for traditional Chinese medicinal studies and the industry of macrofungi. To facilitate the medicinal utilization of Sanghuangporus, all ITS sequences related to "Sanghuang" in GenBank should be examined to assist species identification. Accordingly, the aims of the current study are to correct previously mislabeled ITS sequences of Sanghuangporus species in GenBank, to re-delimit species boundaries in Sanghuangporus on the basis of the ITS barcoding region, and to provide candidate diagnostic ITS sequences for rapid species identification of Sanghuangporus using hyperbranched rolling circle amplification (HRCA).

Molecular sequencing
A small piece of each specimen or strain was taken for DNA extraction using the CTAB rapid plant genome extraction kit-DN14 (Aidlab Biotechnologies Co., Ltd, Beijing). The crude DNA was used as the template for PCR amplification of the ITS region. The primer pairs ITS1F/ITS4 and ITS5/ITS4 (White et al. 1990; Gardes and Bruns 1993) were used for amplification and subsequent sequencing at the Beijing Genomics Institute, Beijing, China. The PCR program was as follows: initial denaturation at 95 °C for 3 min, followed by 34 cycles of 94 °C for 40 s, 57.2 °C for 45 s and 72 °C for 1 min, and a final extension at 72 °C for 10 min. All newly generated sequences have been deposited in GenBank (Table 1).
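As a quick arithmetic check on the program above, the programmed hold times can be summed as in the following sketch. The `Step` class and `hold_time_seconds` function are hypothetical names, not part of any instrument software, and real run times will be longer because temperature ramping between steps is not counted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int

# Thermal profile as stated in the Methods (hold times only).
INITIAL = [Step("initial denaturation", 95.0, 3 * 60)]
CYCLE = [
    Step("denaturation", 94.0, 40),
    Step("annealing", 57.2, 45),
    Step("extension", 72.0, 60),
]
N_CYCLES = 34
FINAL = [Step("final extension", 72.0, 10 * 60)]

def hold_time_seconds(initial, cycle, n_cycles, final):
    """Sum of programmed hold times; ramping and any final 4 °C hold are ignored."""
    total = sum(step.seconds for step in initial)
    total += n_cycles * sum(step.seconds for step in cycle)
    total += sum(step.seconds for step in final)
    return total

total = hold_time_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(f"{total} s = {total / 60:.1f} min")  # 5710 s = 95.2 min
```

Each cycle holds for 145 s, so the 34 cycles account for most of the roughly 95 minutes of programmed hold time.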

Downloading sequences from GenBank
The genus name Sanghuangporus and the epithets of the 14 Sanghuangporus species were first used as queries to search GenBank. In parallel, the reliable sequences of the 14 Sanghuangporus species were used as queries for BLAST searches in GenBank, with the similarity cut-off for the resulting sequences set to 95%. All such ITS sequences available by April 30, 2020 were retrieved from GenBank (Table 1). In addition, recently published papers on the taxonomy of Sanghuangporus were checked for supplementary sequence information (Wu et
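The 95% cut-off step can be sketched as a filter over BLAST hits. The paper does not state which BLAST interface or output format was used, so this minimal sketch assumes the hits were saved as BLAST+ tabular output (`-outfmt 6`, whose first three columns are query ID, subject ID and percent identity); the function name and example rows are hypothetical. Note that percent identity is computed over the aligned region only, so it may differ from similarity across the whole ITS sequence.

```python
import csv
import io

def filter_hits(tabular_text, cutoff=95.0):
    """Subject IDs whose percent identity (column 3) is >= cutoff.

    Expects BLAST+ tabular output (-outfmt 6): qseqid, sseqid, pident, ...
    Hits from several query references are merged, so duplicates collapse.
    """
    keep = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, pident = row[1], float(row[2])
        if pident >= cutoff:
            keep.add(subject_id)
    return keep

# Made-up rows for illustration only (IDs and values are not real data).
EXAMPLE = (
    "# hypothetical BLAST+ -outfmt 6 output\n"
    "query_1\tsubject_A\t99.85\t660\t1\t0\t1\t660\t1\t660\t0.0\t1210\n"
    "query_1\tsubject_B\t91.20\t640\t56\t0\t1\t640\t1\t640\t1e-150\t540\n"
    "query_2\tsubject_C\t95.00\t655\t33\t0\t1\t655\t1\t655\t0.0\t1000\n"
    "query_2\tsubject_A\t97.40\t650\t17\t0\t1\t650\t1\t650\t0.0\t1100\n"
)
print(sorted(filter_hits(EXAMPLE)))  # ['subject_A', 'subject_C']
```

Using a set reflects the fact that the same GenBank record can be hit by several of the 14 reference queries but should be retrieved only once.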

Phylogenetic analyses
The datasets of ITS sequences were separately aligned using MAFFT (Katoh and Standley 2013). Bayesian inference (BI) was performed using MrBayes (Ronquist et al. 2012) with two independent runs, each with four chains and starting from random trees. Trees were sampled every 1000th generation; the first 25% of sampled trees were discarded as burn-in, and the remaining 75% were used to construct a 50% majority-rule consensus tree and to calculate Bayesian posterior probabilities (BPPs). Tracer 1.5 (http://tree.bio.ed.ac.uk/software/tracer/) was used to judge the convergence of chains.
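The burn-in and consensus steps can be illustrated with a minimal Python sketch. It is not MrBayes code: it assumes the sampled trees have already been parsed into sets of clades (each a frozenset of tip names), which simplifies real tree files, and for unrooted trees the "clades" would be bipartitions. The function names and toy clades are hypothetical. Burn-in is discarded from each run before pooling, a clade's BPP is the fraction of retained trees containing it, and clades above 50% make up the majority-rule consensus.

```python
from collections import Counter

def clade_posteriors(runs, burnin_fraction=0.25):
    """Fraction of post-burn-in trees containing each clade (its BPP).

    runs: one list of sampled trees per independent run; each tree is an
    iterable of clades, each clade a frozenset of tip names.
    """
    kept = []
    for run in runs:
        n_burn = int(len(run) * burnin_fraction)  # discard burn-in per run
        kept.extend(run[n_burn:])
    counts = Counter()
    for tree in kept:
        counts.update(set(tree))  # count each clade once per tree
    return {clade: n / len(kept) for clade, n in counts.items()}

def majority_rule(posteriors, threshold=0.5):
    """Clades found in more than `threshold` of the retained trees."""
    return {clade: p for clade, p in posteriors.items() if p > threshold}

# Toy example with four tips; clades are hypothetical.
AB, CD = frozenset("ab"), frozenset("cd")
AC, BD = frozenset("ac"), frozenset("bd")
run1 = [[AC, BD], [AB, CD], [AB, CD], [AB, CD]]
run2 = [[AC, BD], [AB, CD], [AC, BD], [AB, CD]]
post = clade_posteriors([run1, run2])
print(sorted((sorted(c), round(p, 3)) for c, p in majority_rule(post).items()))
# [(['a', 'b'], 0.833), (['c', 'd'], 0.833)]
```

Because any two clades each present in more than half of the trees must co-occur in at least one tree, the clades kept by `majority_rule` are mutually compatible and can be assembled into a single consensus tree.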

Evaluation of genetic distances of ITS sequences
The genetic distances among aligned ITS sequences were estimated using MEGA X (Kumar et al. 2018; Stecher et al. 2020). For genetic distances both between and within species of Sanghuangporus, the same parameters were used: bootstrap (BS) variance estimation with 1000 BS replications, the p-distance substitution model counting both transitions and transversions, uniform rates among sites, and pairwise deletion of gaps and missing data.
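The distance settings above (p-distance, pairwise deletion, bootstrap variance) can be sketched as follows. This is not MEGA's implementation: the function names are hypothetical, gaps and missing data are assumed to be coded as `-`, `?` or `N`, and the bootstrap resamples alignment columns with replacement, which approximates but may not exactly reproduce MEGA's variance estimate. A species represented by a single collection has no within-species pair, so `mean_distance` raises an error for it, mirroring the fact that no intraspecific distance can be computed for such species.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

GAP_OR_MISSING = set("-?N")  # assumed coding of gaps/missing data

def p_distance(seq1, seq2, sites=None):
    """Proportion of differing sites under pairwise deletion.

    Columns with a gap or missing base in either sequence are skipped for
    this pair only; transitions and transversions count equally.
    """
    if sites is None:
        sites = range(len(seq1))
    compared = diffs = 0
    for i in sites:
        a, b = seq1[i].upper(), seq2[i].upper()
        if a in GAP_OR_MISSING or b in GAP_OR_MISSING:
            continue
        compared += 1
        diffs += a != b
    return diffs / compared if compared else float("nan")

def mean_distance(group1, group2=None, sites=None):
    """Mean within-group (group2 is None) or between-group p-distance."""
    if group2 is None:
        pairs = combinations(group1, 2)
    else:
        pairs = ((a, b) for a in group1 for b in group2)
    return mean(p_distance(a, b, sites) for a, b in pairs)

def bootstrap_se(group1, group2=None, replicates=1000, seed=1):
    """Bootstrap standard error from resampling alignment columns."""
    rng = random.Random(seed)
    n_sites = len(group1[0])
    values = []
    for _ in range(replicates):
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        values.append(mean_distance(group1, group2, sites))
    return pstdev(values)

# Toy aligned sequences for two hypothetical species.
sp1 = ["ACGTACGTAC", "ACGTACGTAA"]
sp2 = ["TCGTACGAAC", "TCGTACGA-C"]
print(round(mean_distance(sp1), 4))        # 0.1
print(round(mean_distance(sp1, sp2), 4))   # 0.2639
print(bootstrap_se(sp1, sp2) >= 0)         # True
```

In a distance table laid out like Table 2, the within-species means would fill the diagonal and the between-species means the cells below it.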

Identification of diagnostic ITS sequences
In the alignment of ITS sequences generated using MAFFT 7.110 (Katoh and Standley 2013) with the G-INS-i option (Katoh et al. 2005), any fragment longer than one nucleotide that was unique to one species and invariant within that species was identified as a potential diagnostic sequence for that species.
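One possible reading of this criterion can be sketched as a sliding-window scan of the alignment. The sketch assumes fixed-width windows, treats a fragment as "unique" when no sequence of another species carries the identical fragment at the same columns, and skips windows containing gaps; the function name, sequence IDs and species labels are hypothetical, and the authors may have applied the criterion differently (for example, by extending windows to the full differing stretch).

```python
def diagnostic_windows(alignment, species_of, target, width=2):
    """Candidate diagnostic fragments for `target` (one reading of the criterion).

    alignment:  sequence ID -> aligned sequence (all the same length)
    species_of: sequence ID -> species name
    Returns (start, fragment) for each window of `width` columns in which all
    target sequences share one gap-free fragment that no other sequence has
    at the same columns.
    """
    target_ids = [i for i in alignment if species_of[i] == target]
    other_ids = [i for i in alignment if species_of[i] != target]
    length = len(next(iter(alignment.values())))
    hits = []
    for start in range(length - width + 1):
        end = start + width
        fragments = {alignment[i][start:end] for i in target_ids}
        if len(fragments) != 1:
            continue  # variable within the target species
        fragment = fragments.pop()
        if "-" in fragment:
            continue  # skip windows containing gaps
        if any(alignment[i][start:end] == fragment for i in other_ids):
            continue  # shared with another species
        hits.append((start, fragment))
    return hits

# Toy alignment with hypothetical sequence IDs and species labels.
aln = {"a1": "ACGTTA", "a2": "ACGTTA", "b1": "ACCATA", "c1": "TCGTAA"}
species = {"a1": "sp_A", "a2": "sp_A", "b1": "sp_B", "c1": "sp_C"}
print(diagnostic_windows(aln, species, "sp_A"))  # [(3, 'TT')]
print(diagnostic_windows(aln, species, "sp_B"))  # [(1, 'CC'), (2, 'CA'), (3, 'AT')]
```

Overlapping windows are reported separately here; merging adjacent hits into a single longer fragment would be a natural next step.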

Results
A total of 13 specimens and 18 strains were newly sequenced, and the resulting ITS sequences were submitted to GenBank (Table 1). According to our criteria, 240 ITS sequences were downloaded from GenBank, but two of them (HQ845057 and KP974834) differed unexpectedly strongly from other Sanghuangporus sequences in BLAST searches and were therefore excluded from subsequent phylogenetic analyses (Table 1). Eventually, a dataset of all 269 available ITS sequences of Sanghuangporus species (31 newly sequenced and 238 downloaded from GenBank) was used to construct a preliminary phylogenetic framework for the genus. This dataset yielded an alignment of 941 characters, and HKY + G was estimated as the best-fit evolutionary model for phylogenetic analysis. The ML search stopped after 850 bootstrap replicates. All chains in BI converged after ten million generations, as indicated by effective sample sizes (ESSs) of all parameters above 500 and potential scale reduction factors (PSRFs) close to 1.000. The ML and BI algorithms generated nearly congruent topologies for the main lineages (Additional file 1: Tree S1, Additional file 2: Tree S2). Therefore, only the topology from the ML algorithm is visualized, in circular form; the midpoint-rooted tree recovered 13 species and three undescribed lineages of Sanghuangporus (Fig. 1). In GenBank, species names were misapplied to nine of the 77 phylogenetically analyzed specimens (tips labeled in green in Fig. 1), and 131 of the 192 phylogenetically analyzed strains were misidentified at the species level (tips labeled in red in Fig. 1). In addition, the two excluded ITS sequences of strains labeled as members of Sanghuangporus (HQ845057 and KP974834) were extremely divergent and may have resulted from misreading of Sanger sequencing chromatograms (Table 1). Most of these errors came from sequences submitted by non-taxonomists.
Therefore, to delimit species boundaries in Sanghuangporus, we selected the ITS sequences submitted to GenBank by taxonomists for a new round of phylogenetic analysis (Table 1). The new dataset included 122 ITS sequences and yielded an alignment of 871 characters, with HKY + I + G as the best-fit evolutionary model. The ML search stopped after 450 bootstrap replicates. All chains in BI converged after four million generations, as indicated by ESSs of all parameters above 1000 and PSRFs close to 1.000. The ML and BI algorithms generated nearly congruent topologies for the main lineages, and only the midpoint-rooted ML tree is presented, with BPPs also shown at the nodes (Fig. 2). As in Fig. 1, this tree recovered 13 species of Sanghuangporus, with S. quercicola and S. toxicodendri nested within a single clade (Fig. 2). Among these 13 species, S. lonicericola was still not strongly supported as a monophyletic lineage, and S. alpinus and S. sanghuang were moderately supported by the ML algorithm and fully supported by the BI algorithm, while all other species received strong statistical support from both the ML and the BI algorithms (Fig. 2).
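The PSRF used above as a convergence criterion compares the variation between independent runs with the variation within them. A minimal sketch of the basic Gelman-Rubin form follows; the function name is hypothetical, and MrBayes and Tracer may apply small-sample corrections, so their reported values can differ slightly from this simplified version.

```python
from statistics import mean, variance

def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one parameter.

    chains: equally long lists of post-burn-in samples, one list per run.
    """
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    between = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    within = mean(variance(c) for c in chains)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

same = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
apart = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
print(round(psrf(same), 3), round(psrf(apart), 3))  # 0.866 5.545
```

With only four samples per run, the value for identical runs falls slightly below 1; with the thousands of samples retained in a real analysis, agreeing runs give values approaching 1.000, whereas runs stuck in different regions of parameter space give values well above 1.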
To further explore species relationships in Sanghuangporus, a genetic distance analysis was conducted on the alignment of the 122 selected ITS sequences. Apart from Sanghuangporus microcystideus and S. pilatii (Černý) Tomšovský, each represented by a single collection, the genetic distances of ITS sequences within species of Sanghuangporus were mostly below 1.00% (even 0.
Table 2 The genetic distances between species are shown below the diagonal, and those within species are shown in italics along the diagonal.
Fifty-eight ITS sequences of S. baumii, S. sanghuang and S. vaninii (Ljub.) L.W. Zhou & Y.C. Dai, the most common species in medicinal studies and products, were further extracted from the dataset of 122 selected sequences. These 58 ITS sequences were realigned, and the alignment is presented with shading (Fig. 3). From this alignment, 10 potential diagnostic sequences with two to six nucleotide differences were identified for HRCA-based species discrimination: two for S. baumii, two for S. sanghuang and six for S. vaninii (Fig. 3, Table 3).
Table 3 Diagnostic sequences adapted from Fig. 3

Discussion
In this study, we summarized all available ITS barcoding sequences of "Sanghuang" in GenBank. A total of 271 ITS sequences related to "Sanghuang", including 31 newly generated for this study, were analyzed. More than half of these sequences (142) were mislabeled. So many errors inevitably cause confusion in BLAST searches, especially for non-taxonomists.
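The figure of 142 mislabeled sequences can be reconciled with the counts reported in the Results as shown below, assuming the two excluded, strongly divergent sequences (HQ845057 and KP974834) are counted among the mislabeled ones. The variable names are ours and purely illustrative.

```python
# Counts reported in the Results section.
specimens, mislabeled_specimens = 77, 9
strains, mislabeled_strains = 192, 131
excluded_divergent = 2  # HQ845057 and KP974834, counted here as mislabeled

analysed = specimens + strains                 # sequences in the 269-tip tree
total = analysed + excluded_divergent          # all sequences examined
mislabeled = mislabeled_specimens + mislabeled_strains + excluded_divergent

print(analysed, total, mislabeled)             # 269 271 142
print(f"overall:   {mislabeled / total:.1%}")  # 52.4%
print(f"specimens: {mislabeled_specimens / specimens:.1%}")  # 11.7%
print(f"strains:   {mislabeled_strains / strains:.1%}")      # 68.2%
```

The contrast between roughly one in nine specimen-derived sequences and about two in three strain-derived sequences underlies the comparison in the next paragraph.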
Compared with specimens, strains yielded far more mislabeled sequences, most of which were submitted by non-taxonomists. One typical case is a recently published paper on genome sequencing of "Sanghuang" that also submitted six ITS sequences to GenBank (Shao et al. 2020). In GenBank, all six sequences were labeled as Inonotus sp. rather than as a particular species of Sanghuangporus (MN242716-MN242721), whereas the six strains that generated these sequences were named Sanghuangporus sanghuang in the same paper (Shao et al. 2020). However, five of the six strains, including the one subjected to genome sequencing, are actually Sanghuangporus vaninii (Fig. 1; Zhou et al. 2020). That is, five of the six strains were misidentified at the species level. Consequently, this incorrect identification attributes the whole-genome sequence of "Sanghuang" to the wrong species. Even worse, Shao et al. (2020) stated that these six strains are commercially cultivated, which further confuses the names of commercial "Sanghuang" products. Another case is a paper specifically on the species identity of "Sanghuang" strains (Han et al. 2016). Thirty strains deposited in the Agricultural Sciences Institute culture collection (Mushroom Research Division, Rural Development Administration, Republic of Korea) were correctly identified as Sanghuangporus vaninii and S. sanghuang by an ITS-based phylogenetic analysis; unfortunately, however, most of these ITS sequences were mislabeled when submitted to GenBank.
Nine mislabeled sequences came from specimens. These errors were caused mainly by updates in taxonomic concepts. Six sequences from specimens originally labeled as Sanghuangporus sp. are here treated as S. quercicola (Table 1). In the paper that submitted these six sequences, the specimens generating them were newly described as Sanghuangporus toxicodendri (Wu et al. 2019b). However, in that paper the separation of S. toxicodendri from S. quercicola was not actually supported phylogenetically, and moreover, the morphological differences between the two species were not based on stable characters (Wu et al. 2019b).
In the current phylogenetic analyses, the six specimens of S. toxicodendri, three specimens of S. quercicola and four additional collections grouped together in a fully supported clade (Additional file 1: Tree S1, Additional file 2: Tree S2, Fig. 2). Therefore, S. toxicodendri and S. quercicola are considered conspecific, and the name S. quercicola has priority over S. toxicodendri. Another mislabeled sequence was generated from a specimen originally described as Inonotus tenuicontextus L.W. Zhou (Tian et al. 2013); this sequence is accepted here as representing S. weigelae (Table 1).
The independence of Sanghuangporus lonicericola was not well supported in the current phylogenetic analyses (Additional file 1: Tree S1, Additional file 2: Tree S2, Fig. 2). Similarly, Sanghuangporus alpinus and S. sanghuang were not strongly supported as monophyletic species by the ML algorithm (Fig. 2). However, the intraspecific ITS differences within each of these three species were quite low (0.10-0.49%, Table 2). We therefore still accept S. alpinus, S. lonicericola and S. sanghuang as three independent species; a phylogenetic analysis employing more loci may improve the resolution. By contrast, Sanghuangporus baumii, S. weirianus and S. zonatus are the only three species with intraspecific ITS differences above 1.00% (Table 2), yet all three received strong support as independent lineages (Additional file 1: Tree S1, Additional file 2: Tree S2, Fig. 2). Notably, Chinese collections of Sanghuangporus baumii formed three strongly supported subclades corresponding to geographic origin, viz. nine from Northeast China, two from Beijing and two from Shanxi; regarding S. zonatus, two collections from Hainan, China grouped together with full statistical support and then formed a fully supported clade with the collection from Yunnan, China (Table 1, Fig. 2). Moreover, the branch lengths of the only two available collections of S. weirianus were extremely different (Fig. 2). More comprehensive sampling of these three species in phylogenetic analyses will further clarify their intraspecific relationships. For now, we tentatively accept them as monophyletic species.
Although intact mature specimens of "Sanghuang" can be identified morphologically to species level quickly and without much difficulty, most commercial products are chips, pieces or even powders, and it is normally impossible to rapidly determine which species such products actually represent. As with other medicinal mushrooms (Raja et al. 2017), species names of Sanghuangporus are sometimes misapplied to "Sanghuang" products (Shao et al. 2020). This confusion restricts, to some extent, the industrial development of "Sanghuang" (Zhou 2020). Therefore, to standardize the "Sanghuang" industry, ten candidate sequences for HRCA were provided on the basis of accurate species boundaries among three commonly studied and cultivated species, viz. Sanghuangporus baumii, S. sanghuang and S. vaninii (Lin et al. 2017; Zhou et al. 2020). HRCA is an isothermal amplification approach and thus provides rapid, simple and low-cost detection of specific nucleic acid sequences (Nilsson et al. 1994; Lizardi et al. 1998). This approach has been widely used for clinical detection of human-pathogenic microfungi (Zhou et al. 2008; Trilles et al. 2014; Rodrigues et al. 2015) and has recently been reported for rapid detection of poisonous macrofungi (He et al. 2019a, 2019b). For lethal Amanita species, a difference more than two nucleotides long was shown to be valid for identification of the α-amanitin gene (He et al. 2019a). Here, to provide more candidates, fragments with two or more nucleotide differences are given, because this approach has been reported to detect single-nucleotide differences (Nilsson et al. 1997). Hopefully, some of these candidates will work well in future experiments.
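HRCA starts from a padlock probe: a linear oligonucleotide whose two end segments hybridize head-to-head to adjacent target bases, so that a ligase can efficiently close it into a circle only when the ends pair correctly (Nilsson et al. 1994); the circle is then amplified by rolling circle amplification (Lizardi et al. 1998). The sketch below is a hypothetical helper, not a validated design tool. It shows how the two arms could be derived from a target strand so that the probe's 3'-terminal base pairs with a diagnostic nucleotide. It ignores melting temperature, secondary structure, 5' phosphorylation and the primer-binding sites a real backbone needs, and the toy target, arm length and backbone are placeholders.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    """Reverse complement of a DNA string (5'->3' in, 5'->3' out)."""
    return seq.translate(COMPLEMENT)[::-1]

def padlock_probe(target, site, arm_len, backbone):
    """Hypothetical padlock layout with the ligation junction at `site`.

    target:   target strand written 5'->3'
    site:     0-based index of the diagnostic nucleotide in `target`
    The 3' arm pairs with target[site:site + arm_len], so the probe's
    3'-terminal base pairs with the diagnostic nucleotide; the 5' arm pairs
    with the arm_len bases immediately upstream of it.
    """
    if site < arm_len or site + arm_len > len(target):
        raise ValueError("not enough flanking sequence for both arms")
    five_prime_arm = revcomp(target[site - arm_len:site])
    three_prime_arm = revcomp(target[site:site + arm_len])
    return five_prime_arm + backbone + three_prime_arm

# Toy target and placeholder backbone ("N" stands for unspecified bases).
probe = padlock_probe("AAGGCCTTAGCA", site=6, arm_len=3, backbone="NNNN")
print(probe)  # GGCNNNNTAA
```

Placing the diagnostic base at the ligation junction is the design choice that lets a ligase discriminate between species: a mismatch there should strongly reduce circularization, and therefore amplification.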

Conclusion
Overall, to promote medicinal studies and industrial development, the ITS barcoding region of Sanghuangporus was comprehensively analyzed for accurate species identification. First, the names of all available ITS sequences in GenBank related to "Sanghuang" were carefully corrected. Second, the intraspecific ITS difference of each Sanghuangporus species except S. weirianus was found to be below 1.50%, while the interspecific ITS difference was always above 1.50%. This provides a practical cut-off value for BLAST search-based species identification. Finally, ten potential diagnostic sequences are provided for HRCA assays to rapidly discriminate three commonly studied and cultivated species, viz. Sanghuangporus baumii, S. sanghuang and S. vaninii.
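The 1.50% cut-off could be applied to identification results roughly as follows. This is a hypothetical helper. It assumes that a p-distance between the query and one trusted reference ITS sequence per species has already been computed. A 1.50% p-distance corresponds only approximately to 98.5% BLAST identity, because the two measures treat gaps and alignment length differently. Matches to S. weirianus, whose intraspecific difference exceeds the cut-off, would still need manual checking.

```python
def assign_species(distances, cutoff=0.015):
    """Closest reference species if its p-distance is below `cutoff`, else None.

    distances: species name -> p-distance (0-1) between the query and that
    species' trusted reference ITS sequence (hypothetical input format).
    """
    best = min(distances, key=distances.get)
    return best if distances[best] < cutoff else None

print(assign_species({"sp_A": 0.004, "sp_B": 0.031}))  # sp_A
print(assign_species({"sp_A": 0.021, "sp_B": 0.031}))  # None
```

Returning `None` rather than forcing the closest name keeps queries from undescribed lineages, or from problematic sequences such as the two excluded here, from being silently labeled as a known species.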

Figure 1
The phylogenetic tree inferred from 269 ITS sequences. The topology was generated from the maximum likelihood algorithm. Tips in blue represent mislabeled specimens, while those in red represent mislabeled strains.

Figure 2
The phylogenetic tree inferred from ITS sequences submitted by taxonomists. The topology was generated from the maximum likelihood algorithm; bootstrap values and Bayesian posterior probabilities are presented at nodes where they are above 50% and 0.8, respectively.

Figure 3
Alignment of ITS sequences of Sanghuangporus baumii, S. sanghuang and S. vaninii submitted by taxonomists. Ten potential diagnostic sequences for hyperbranched rolling circle amplification are shown in capital letters.