Variability in ITS1 and ITS2 Sequences of Historic (herbaria) and Extant (fresh) Phalaris Species

Background: Phalaris species occupy diverse environments throughout all continents except Antarctica. Phalaris arundinacea is an important forage, ornamental, wetland restoration and biofuel crop grown globally as well as being a wetland invasive. The ITS (internal transcribed spacer) region has been used for Phalaris barcoding as a DNA region with high nucleotide diversity for Phalaris species identification. Recent findings that P. arundinacea populations in Minnesota USA are most likely native and not European prompted this analysis to determine whether Eurasian vs. native North American P. arundinacea differed in ITS regions. Our objectives were to amplify and compare ITS regions (ITS1 and ITS2) of historic (herbaria) and extant (fresh) Phalaris specimens; analyze ITS regions for species-specific polymorphisms (diagnostic SNPs); and compare ITS regions of historic Phalaris specimens with known, extant Phalaris species. Results: We obtained complete ITS1 and ITS2 sequences from 31 historic Phalaris (herbaria samples, 1908 to 2001) and five extant (fresh) specimens. Herbaria Phalaris specimens did not produce new SNPs (single nucleotide polymorphisms) absent from extant specimens. Diagnostic SNPs were identified in 8/12 (66.7%) Phalaris species. This study demonstrates the use of herbaria tissue for barcoding as a means for improved species identification of Phalaris herbaria type specimens. No significant correlation between specimen age and genomic DNA concentration was found. Phalaris arundinacea showed high SNP variation within its clade, with the North American types being distinctly different from other U.S. and most Eurasian types, potentially allowing for future identification of SNPs specific to geographic origin. Conclusions: While not as efficient as extant specimens to obtain DNA, Phalaris herbaria specimens can produce high quality ITS sequences to evaluate historic genetic resources and facilitate identification of new species-specific barcodes.
No correlation between DNA concentration and age of historic samples (spanning 119 years) occurred. Considerable polymorphism was exhibited in the P. arundinacea clade with several N. American accessions being distinct from Eurasian types. Further development of within species- and genus-specific barcodes could contribute to designing PCR primers for efficient and accurate identification of N. American P. arundinacea. Our finding of misidentified Phalaris species indicates the need for DNA sequence database curation for proper specimen identification.


Background
From the 1980s onwards, sequence-based molecular phylogenetic studies in plants relied primarily on plastid genome spacers and genes, particularly rbcL (ribulose-bisphosphate carboxylase in the chloroplast genome) [1]. Risks of using such uniparentally inherited sequences for phylogenetics necessitated the development and implementation of nuclear markers to reflect biparental trait inheritance [2]. The internal transcribed spacer (ITS) region of the nuclear ribosomal cistron, 18S-5.8S-26S, introduced by Baldwin's laboratory [3][4][5], quickly became the universally applied tool for molecular-based phylogenetic research for several reasons (Alvarez and Wendel, 2003) [6], not least of which is that 18S-26S rDNA arrays and their products are essential components of eukaryotic nucleolus organizing regions (NORs). Consequently, numerous plant families, genera and species have been analyzed for variance in ITS sequences, particularly for phylogenetic studies, species identification (barcoding) as well as ascertaining cultivar or genotype identities.
Such is the case with the genus Phalaris L. which is an important forage, ornamental, birdseed, wetland remediation/restoration and biofuel crop grown across the globe as well as being recognized as an invasive wetland species [7][8][9]. The genus Phalaris (Poaceae, grass family), classified in the Aveneae-Poeae section of the subfamily Pooideae, contains 20 species in the latest taxonomic treatises [10][11][12][13][14][15], although it previously included as many as 25 taxa. All Phalaris species in this monophyletic genus are cool-season grasses, with either annual or perennial life histories, from both the New and Old Worlds, varying in basic chromosome numbers of x = 6 and x = 7 and including a polyploid series from 2x to 8x [16][17][18]. Polyploid Phalaris exist in both Europe and North America, although polyploidy predominates throughout Europe and Asia. The species are widely adapted across the globe and are not restricted to one hemisphere [19]; six floret types distinguish the species as well as other diagnostic traits such as the presence of characteristic ligules [15]. The center of origin and diversity is the Mediterranean Basin while a secondary center of diversity exists in western North America [7,17,18,19]. The most cosmopolitan species is P. arundinacea L., circumpolar in distribution in the northern hemisphere. Most likely, a diploid ancestor of P. arundinacea came across on the land bridge of the Bering Strait into the present-day State of Alaska, USA [20] during the late Tertiary period [21].
Phalaris has a lengthy taxonomic history with the earliest references arising in the 1st Century CE (common era), with P. canariensis being described by Dioscorides with an accompanying drawing from the Byzantine period (~ 525 CE) [22], although [7] argued that the drawing did not clearly delineate this species. Two species were given scientific names in quadrinomial nomenclature early in the 1600s: P. major semine albo (cf. P. canariensis) and P. major semine nigro (cf. P. minor Retz.), prior to the Linnaean era of binomial nomenclature [23][24].
Of particular importance within the genus, two species are of primary commercial value as cultivated crops, i.e. P. arundinacea L. (reed canarygrass; grown for ornamental, forage, biofuel, and remediation/restoration efforts) [7,8,9] and P. canariensis L. (canarygrass; grown for birdseed) [25], although additional species are also cultivated: P. aquatica, P. minor [17,18] and P. xdaviesii [26]. Seed of both P. arundinacea and P. canariensis is commercially produced in Roseau, Minnesota USA [25], the state with the highest concentration and wetland surface area coverage (50%-100% of wetland area) of native, yet invasive, P. arundinacea in the continental U.S.A. and in North America [27]. The species also is native to Eurasia [7,28] and North America [29]. As a widely adaptable species, it is able to withstand a variety of conditions including frost, drought, partial shade, and poorly drained soil [30]. While minor variation in plant height, biomass, etc. exists due to genetic and environmental factors, both the native P. arundinacea North American types and those native to Eurasia are virtually indistinguishable for any morphological trait [31], since all possess ligules [7] and the same floret type, "Floret Type 4" [15], although the floret type cannot be used when collecting vegetative genotypes for analyses. The North American and Eurasian types (using both extant and historic or herbaria specimens) have been separated, however, using biochemical (allozymes) [28] and molecular phylogenetic markers, such as ISSRs (inter-simple sequence repeats) [32], AFLPs (amplified fragment length polymorphisms) [33][34][35], SNPs (single nucleotide polymorphisms) [29,36], as well as ITS regions [11,15,19,20].
Herbaria specimens are museum-quality, pressed and dried plant samples deposited and preserved in global herbaria to serve in future research [37]. Plant specimens are mounted on acid-free paper with the primary goal of preserving specimen integrity and visual quality to allow for future systematic and taxonomic studies [37]. Herbaria contain an extensive collection of past Phalaris germplasm (pre-1900) that was collected by early land surveyors during European settlement [38,39]. Those herbarium specimens can serve as an informative resource for comparative genetics, genomics and systematics, to describe changes of Phalaris species germplasm over time. Less than optimal early specimen preservation methods can lead to various degrees of DNA degradation [29]. Estimation of the genetic distance between individuals and populations is based on single nucleotide polymorphisms (SNPs) and depends on proper nucleotide base calling. Nucleotide misincorporations were found during amplification of ancient DNA and could cause false SNP recognition [40,41]. False SNP recognition in herbaria specimens can be highly problematic when proper relatedness needs to be determined in comparison to fresh tissue collection. Here we compared SNPs found in genomic DNA obtained from herbaria specimens (collection years from specimens resulted in varying SNP differences in comparison to DNA obtained from fresh tissue.
Plastid trnT-F, trnL-F and the nuclear ribosomal ITS region (ITS1-5.8S-ITS2) were amplified/sequenced in the core tribe Aveneae Dumort. (oats) for taxonomic reconstruction of the Aveneae-Poeae-Seslerieae complex in the Poaceae [11]. This study included the genera Anthoxanthum, Hierochloe, and Phalaris, which reside in the Phalarideae (sub Panicoideae). Alignment or clustering vs. bootstraps of ITS showed Phalaris canariensis and P. truncata to be completely aligned (100/100; posterior probability support or PPS/bootstrap support) whereas P. coerulescens scored 100/95 in relation to these two Phalaris species [11]. The use of plastid and nuclear ribosomal markers provided highly accurate taxonomic delineation of the associated genera and species in the Aveneae-Poeae-Seslerieae complex wherein Phalaris was classified as a "small, less-diversified satellite lineage" [11]. Chloroplast DNA (13 intergenic sequence regions) and AFLPs were subsequently used to distinguish among North American and European Phalaris arundinacea herbaria samples [9]. Chloroplast markers supported AFLP findings that North American races of P. arundinacea were distinctly different from European types (which had higher genetic diversity), plus the added finding of a separate Scandinavian chloroplast race distinct from the rest of Europe. Subsequently, Voshell et al. [19] were the first to use ITS with plastid trnT-F in Phalaris to construct a phylogenetic tree for the genus as well as determine floret evolution and polyploidy relationships. ITS region length ranged from the shortest of 588 bp in P. rotgesii to the longest of 602 bp in P. arundinacea (both of which are closely related) and 142 of 169 variable characters were parsimonious [19]. ITS sequencing differentiated two main clades in the genus with an additional subclade, proving the utility and robustness of ITS in taxonomically distinguishing among species and within species' genotypic differences as well as discerning ploidy and floret type variation. Subsequent studies using ITS in Phalaris [15,20] were used to determine historic dispersal routes from the center of origin in the Mediterranean Basin into the Americas via the Bering land route. Phalaris arundinacea appeared in two ITS clades with one embedded in Europe while the other was distinctly North American; the mid-Miocene was identified as the epoch in which Phalaris species diversification occurred [20].
The recent discovery that, based on SNPs, all historic (herbaria) and extant riparian and cultivated populations of P. arundinacea in the State of Minnesota are most likely North American natives [29,36] provided impetus for the present study to examine specific Minnesota / North American native P. arundinacea genotypes for their identity with reported ITS for the species as well as for comparison with additional Phalaris. The objectives of this study were to: 1) amplify and compare ITS regions (ITS1 and ITS2) of historic (herbaria) and extant (fresh) Phalaris specimens; 2) analyze ITS regions for species-specific polymorphisms (diagnostic SNPs); 3) compare ITS regions of historic Phalaris specimens with known extant Phalaris species. Associated null hypotheses tested were, respectively: 1) H0 = There are no differences in the ITS region between herbaria and fresh Phalaris specimens due to herbarium sample age; 2) H0 = There is no ITS region polymorphism found within Phalaris species; 3) H0 = There is no additional polymorphism between herbaria genotypes and currently known extant Phalaris species (National Center for Biotechnology Information, NCBI).

Historic Specimen DNA Degradation
Sampling of historic specimens showed variable levels of DNA degradation when compared to DNA obtained from extant Phalaris tissue (Fig. 1), similar to previous sampling [29]. Extant tissue DNA extracted from P. aquatica (PI 476288) and P. arundinacea (PI 241065) (lanes 1 and 2, respectively, Fig. 1) have a majority of higher molecular weight DNA fragments (<10 kbp). Among representative historic samples P. canariensis (PI 619107; lane 3, Fig. 1 A standard DNA purification kit (OPS Diagnostics Laboratory, Lebanon, NJ) provided high quality and quantity of genomic DNA that allowed amplification of all extant tissue and most of the historic specimens (41/52 or 78.8%; Table 1). For the majority of herbaria samples, the OD260/280 values were within the expected range for high quality DNA (2.10 ± 0.79, mean ± standard deviation; Table 1) and the second measure of DNA purity (OD260/230) had higher variation (1.65 ± 1.41; Table 1). Fresh specimens yielded high quality DNA, based on both the OD260/280 (1.93 ± 0.05) and OD260/230 (1.93 ± 0.24; Table 2) markers. Both herbaria and fresh specimens have a wider range in OD260/230 purity measures. However, for both purity measurements, herbaria specimens were less consistent than fresh specimens.
Concentration of DNA from historic Phalaris specimens varied widely, from 2.5 ng/μl (P. paradoxa, ISC-V-0021415; Table 1) to 87.7 ng/μl (P. canariensis, 619107; Table 1), despite similar amounts of tissue used for extraction. No correlation between the age of the historic samples over a 119 year range and DNA concentration was found, based on the non-significant slope of the linear regression, y = 0.0364x + 18.93, and a small, non-significant r² = 0.003 (p-value = 0.7; Fig. 2a). Extant specimens exhibited much more consistent values with a range of 28.5 to 80.7 ng/μl (Table 2). No regression was fit to the concentration of fresh specimens (Fig. 2b) due to an inadequate sample size (n=7). One of the oldest samples, collected 29 July 1886 (P. canariensis; 71229, lane 7, Fig. 1; Table 1), produced 41.9 ng/μl of DNA of relatively good quality. This specimen did not, however, yield a successful amplification or sequence of the ITS regions.
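The age versus concentration regression reported above can be reproduced in outline with an ordinary least-squares fit. The following is a minimal sketch only: the function name `linear_fit` is hypothetical, and the age and concentration values are made-up placeholders, not the measurements in Table 1, so the printed slope and r² will not match the reported y = 0.0364x + 18.93.

```python
from statistics import mean


def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # r^2 is the squared Pearson correlation between x and y
    r_squared = (sxy ** 2) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


# Placeholder data (assumption): specimen age in years, DNA yield in ng/ul
ages = [19, 45, 64, 88, 112, 134]
conc = [22.0, 61.5, 8.3, 40.2, 12.9, 41.9]

slope, intercept, r2 = linear_fit(ages, conc)
print(f"y = {slope:.4f}x + {intercept:.2f}, r^2 = {r2:.3f}")
```

An r² near zero, as reported for the herbarium samples, means specimen age explains almost none of the variance in DNA yield. A significance test on the slope would need an additional step not shown here.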

ITS Amplification and Sequencing of Phalaris Specimens
Plant-specific ITS-P5 and ITS-U4 primers were efficient in PCR amplification of the ITS regions across a diverse set of Phalaris species, including a majority of Phalaris tissue extracted from herbarium specimens, producing complete sequences of both the ITS1 and ITS2 regions (31/52 or 59.6%, Table 1). The oldest specimen fully sequenced was collected in 1908: P. californica (ISC-V-0021043, MN811182.1; Table 1). In addition, most of the ITS amplifications were suitable for direct sequencing after clean up. Overall, 31/41 (75.6%) herbarium (+; Table 1) and 5/7 (71.4%) fresh specimens (+; Table 2) produced full sequences of the ITS region for 12 Phalaris species. An additional six sequences of fresh (n=2; P. arundinacea PI 241065 and P. aquatica PI 476288) and herbarium samples (n=4; P. arundinacea 71166, P. canariensis 619107 and 71226, and P. caroliniana ISC-V-0021166) produced partial sequences of the ITS region but were not used for sequence analysis. High PCR amplification success for fresh specimens was evident across the sampling, e.g. P. canariensis (lane 1; Fig. 3a), P. aquatica (lane 2; Fig. 3a), and P. arundinacea (lane 3; Fig. 3a).
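The success rates quoted in this section use different denominators (all sampled herbarium specimens, amplified herbarium specimens, fresh specimens), which is easy to misread. As a small illustrative check, the sketch below recomputes each percentage from the counts given in the text; the helper name `pct` and the dictionary labels are hypothetical.

```python
def pct(successes, total):
    """Percentage of successes, rounded to one decimal place."""
    return round(100 * successes / total, 1)


# Counts taken from the text: (successes, denominator)
rates = {
    "herbarium amplified / sampled": (41, 52),
    "herbarium fully sequenced / sampled": (31, 52),
    "herbarium fully sequenced / amplified": (31, 41),
    "fresh fully sequenced / sampled": (5, 7),
}

for label, (k, n) in rates.items():
    print(f"{label}: {k}/{n} = {pct(k, n)}%")
```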
The low PCR amplification success rate of herbaria specimens is most likely caused by DNA degradation that did not allow amplification of the ITS DNA, as found for samples P. californica (ISC-V-0021040), P. paradoxa (ISC-V-0021360), and P. minor (ISC-V-0021338; lanes 6, 8, 10, respectively; Fig. 3b). Overall, herbaria specimens that were collected before 1929 ± 36 years did not amplify the full ITS region, whereas most of the Phalaris herbarium samples collected later, in 1956 ± 17 years, were successful in ITS region amplification (Table 1). The 50 ng/μl DNA template was used as a standard DNA concentration for PCR amplification of all Phalaris herbarium specimens (Fig. 3b).
However, due to variable degradation of DNA from herbarium specimens, ITS amplification did not always allow use of those products for direct sequencing based on sequencing requirements (Fig. 3b, lane 8; University of Minnesota Genomics Center - Sanger Sequencing Classic). For some herbaria samples it was necessary to reamplify PCR products, which allowed for later product sequencing (Fig. 3c).

Polymorphism Analysis of Phalaris Species
Reconstruction of genetic distance from newly sequenced herbarium and fresh Phalaris specimens in combination with those available at NCBI showed proper specimen classification and grouping within the same species (Fig. 4a). Generally, the sequences produced in this study matched previous sequences of the same species. Shared SNPs from the matching sequences created unique clades by species in the phylogenetic tree (Fig. 4a). Some species (P. brachystachys and P. canariensis) are more closely related than others and the distance between clades suggests that the ITS region was insufficient to separate those two species into separate branches. Both species share nine unique SNPs that distinguish the two from the rest of Phalaris (Supplementary Fig. 1; Table 3).
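The text does not state which distance measure underlies Fig. 4a. Purely as an illustration of how a pairwise genetic distance can be derived from aligned ITS sequences, the sketch below computes the uncorrected p-distance (proportion of differing sites). The choice of p-distance, the function name `p_distance`, and the toy sequences are all assumptions and not the authors' method or data.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Columns where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else float("nan")


# Toy aligned fragments (hypothetical, not real ITS sequences)
toy = {
    "specimen_1": "ACGTTGCA-ACG",
    "specimen_2": "ACGTTGCATACG",
    "specimen_3": "ACCTTGAATACG",
}
names = list(toy)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, round(p_distance(toy[first], toy[second]), 3))
```

Two specimens that cannot be separated by ITS, as described for P. brachystachys and P. canariensis, would show a distance at or near zero under a measure of this kind.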
A large Phalaris species sequence collection allowed us to identify species-specific SNPs that identified selected Phalaris species, based on single SNPs found within ITS1 and ITS2 (Table 3). Two additional species (P. lemmonii, P. angusta) lacked species-specific SNPs, based on their sequence alignment (Table 3).
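One way to think about a diagnostic, species-specific SNP is as an alignment column where every sequence of one species carries the same base and no sequence of any other species carries it. The sketch below applies that definition to a toy alignment. The definition, the function name `diagnostic_snps`, and the species labels and sequences are assumptions for illustration and do not reproduce Table 3.

```python
from collections import defaultdict


def diagnostic_snps(alignment):
    """Find species-specific fixed bases in an alignment.

    alignment: dict mapping species name -> list of aligned sequences
    (all of equal length). Returns a dict mapping species name -> list of
    (1-based position, base) pairs that are fixed within that species
    and absent from every other species at that position.
    """
    length = len(next(iter(alignment.values()))[0])
    found = defaultdict(list)
    for pos in range(length):
        # Set of bases observed per species at this column
        column = {sp: {s[pos] for s in seqs} for sp, seqs in alignment.items()}
        for sp, own in column.items():
            if len(own) != 1:
                continue  # polymorphic within the species, not diagnostic
            base = next(iter(own))
            if base in "-N":
                continue  # gaps and ambiguous calls are not informative
            others = set().union(*(b for o, b in column.items() if o != sp))
            if base not in others:
                found[sp].append((pos + 1, base))
    return dict(found)


# Toy alignment (hypothetical species labels and sequences)
toy_alignment = {
    "species_A": ["ACGTA", "ACGTA"],
    "species_B": ["ACCTA", "ACCTA"],
    "species_C": ["ACGTT", "ACGTA"],
}
print(diagnostic_snps(toy_alignment))
```

In this toy case only species_B has a diagnostic site (position 3, base C). Species that share all fixed bases with another species, as reported for P. lemmonii and P. angusta, would return no entries.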
Considerable polymorphism was exhibited in the P. arundinacea clade (Table 3; AR1, AR2, AR3, AR4, AR5, and AR6 diagnostic, species-specific, SNP sites in the ITS1 and ITS2 regions), particularly among Asian and those of N. American origin (Fig. 4b). We did not observe sequence abnormalities between herbarium specimens and fresh tissue specimens of the same species despite suspected misidentified samples. Collection locations of P. arundinacea specimens reflect the relatedness within or distinctiveness among P. arundinacea genotypes (Fig. 4b). The P. arundinacea ITS sequences identified three clusters, the first with P. arundinacea originating in the United States which are N. American in origin [29,36]: MN811175.1, MN811200.1, MN811176.1 (herbarium) and MN811174.1 (fresh) clustered tightly together (Fig. 4b), although they were segmented in between Korean accessions KF713257.1 and FJ766174.1. The second cluster of P. arundinacea was an adjacent clade, on one side of the N. American natives, which included one genotype from Korea (KF713257.1, Fig. 4b) and two others from the east coast U.S. State of Virginia (KF753779.1, JF951077.1). On the other side in closely related clades were four Eurasian genotypes from Taebaek-si and Yungyang-gun, S. Korea, specifically FJ766174.1, KF713256.1, KU883517.1, and KF713255.1 (Fig. 4b). A third, more distant cluster of two inter-related clades contained strictly Eurasian types from S. Korea (n=5), China (n=2), and Germany (n=1) (Fig. 4b). The relatedness, based predominantly on geographic distribution, illustrates high variation of the P. arundinacea samples within the ITS region along with the possibility to distinguish among P. arundinacea from different geographic locations on a continental scale (Fig. 4b).
The composition of the Phalaris phylogenetic tree also indicates that there are NCBI sequences and herbaria specimens that may have been misidentified (Fig. 4a). In contrast to the proper classification and groupings, multiple ITS sequences found in NCBI did not follow the species branch assignment: P. angusta (KX873129.1) was classified with the P. aquatica clade; P. paradoxa (JF951070.1) was assigned to the P. minor clade; P. caroliniana (JF951065.1) was assigned to the P. angusta clade; P. minor (JF951086.1) and P. aquatica (KU883516.1) were assigned to the P. paradoxa clade (Fig. 4a). Three of the sequences produced in this research did not group with their expected species clade: P. coerulescens (MN811190.1; ISC-V-0021199), P. aquatica (MN811171.1; ISC-V-0021399) and P. angusta (MN811166.1; ISC-V-0020920), which clustered with P. minor, P. angusta and P. caroliniana, respectively. Both MN811189.1 and MN811190.1 (that did not group with a species clade) were P. coerulescens herbarium specimens grown from Turkish seed at Iowa State University but did not have the same SNPs, further suggesting possible misidentification.
The misidentification is likely due to the similar morphological characteristics within the Phalaris genus and highlights the need for ITS barcoding to distinguish among and between extant and historic Phalaris species.

Discussion
Historic Specimen DNA Degradation
Efficient DNA amplification (barcoding) requires the DNA template to be of high quality (measured by OD260/280 and OD260/230), quantity (ng of DNA template/µl) and integrity (visualized on agarose gels). High DNA quality equates to the DNA being free from protein and chemical contaminants with a sufficient quantity of DNA template to start PCR amplification of targeted DNA regions [42]. A measure of DNA quality is the ability to be amplified. A second round of PCR can reduce inhibitors, as found in extracted DNA [43]. The potential effect of PCR inhibitors is reduced during the PCR reaction, in that isolated DNA from each reaction was diluted 1/25 in volume prepared for PCR in our experimentation. One of the most critical aspects when working with herbaria specimens is physical DNA degradation (fragmentation) that directly impacts amplification of selected DNA regions, such as the ITS regions.
In the case of historic Phalaris specimens [29], the additional factor of varying levels of genomic DNA degradation also affects DNA amplification [44] (Ribeiro and Lovato, 2007). Choi [45] demonstrated that trends between age and purity were not generally significant in four evolutionarily, geographically, and ecologically different plant lineages.
Therefore, DNA purity of specimens is likely due to the post-harvest and storage techniques used. Historic samples that are preserved and stored in classic herbaria cases maintain the plant tissue appearance and overall specimen condition to minimize tissue degradation and maintain the phenotype of the type specimens [37]. However, the process and conditions of tissue preservation (harvest and post-harvest tissue storage) may not preserve DNA integrity [29,46], leading to degradation of DNA in herbarium specimens which reduces amplification of ITS regions for samples with significantly degraded DNA [47]. Other factors, such as sample age, are also inadequate assessors of DNA quality [48]. Nonetheless, despite obtaining ODs with high purity and DNA concentrations of 2.5 to 87.7 ng/µl (Table 1), only some of the Phalaris herbaria samples yielded high molecular weight DNA (Fig. 1), which was not as consistent as with fresh (extant) specimens, yielding 28.5 to 80.7 ng/µl (Table 2).
It is not sufficient to obtain a certain quantity of DNA template (e.g. 50 ng/PCR reaction); evaluation of DNA degradation should also be taken into account as a major factor limiting proper PCR amplification [49,50]. Another tactic to be considered is to amplify a locus in a stepwise fashion when the DNA template is highly degraded [47].
While an exhaustive study was not performed on reamplification techniques (nested PCR, whole genome amplification, or others) with the intent of increasing the concentration of the PCR product to the required concentration for Sanger Sequencing, a visible band was needed. This method could be useful when working with old or precious tissue, such as herbaria, as a means to increase sequencing success.
The lack of consistency in herbaria specimens may be due to increased degradation and/or minor levels of contaminants, although the wide age range of 119 years across herbaria samples was not correlated with DNA concentration (Fig. 2a). When using the ITS region for barcoding herbaria specimens within the Juncaceae, a significantly negative association between age and sequencing success was found [47]. Therefore, the relationship between age and sequencing success may vary by species, and sequencing success may depend on storage treatments of herbaria specimens rather than age, at least in the case of our Phalaris herbaria samples.

ITS Amplification and Sequencing of Phalaris Specimens
Complete ITS1 and ITS2 sequences were obtained in 59.6% of the accessions across 12 species in the genus Phalaris (Tables 1, 2) and were within the expected range of 757 ± 140 bp [51]. The oldest herbarium specimen that was fully sequenced was P. californica collected in 1908 (Table 1). Six fresh and herbaria partial sequences of P. arundinacea, P. aquatica, P. canariensis, and P. caroliniana (Tables 1, 2) were not used in the sequence analyses. The occurrence of partial sequences is not unusual, as previous researchers had difficulties with P. peruviana producing low quality sequences with missing data [15]. Design of a new set of ITS PCR primers located farther from the beginning (and end) of the ITS region would most likely provide more high quality ITS sequences. Sanger sequencing tends to produce low quality sequences just after the end of the sequencing primer position [52].

Polymorphism Analysis of Phalaris Species
Multi-alignment of the ITS region from herbaria and fresh Phalaris species showed that sequences and DNA polymorphisms found in newly sequenced Phalaris species resemble those previously reported [19,53]. The similarity between the phylogenetic tree of the ITS region constructed by Voshell et al. [19] (fresh Phalaris tissue) and the phylogenetic tree constructed in this study (mainly herbarium Phalaris tissue) verifies the relatedness found among the Phalaris species and the genetic distinctions that can be extrapolated from SNPs in the ITS region with the use of herbarium tissue as old as 119 years.
Two species (P. lemmonii, P. angusta) lacked species-specific SNPs, based on their sequence alignment (Table 3). Their relatedness may be expressed in flower morphology and may be due to geographic separation [19,53]. In other instances, Phalaris species, i.e. P. brachystachys and P. canariensis, were too closely related such that the ITS region was insufficient to separate them (Fig. 4a). To separate both species we recommend use of other barcoding regions, such as: atpF-atpH, matK, rbcL, rpoB, rpoC1, psbK-psbI, and trnH-psbA [54].
When comparing the phylogenetic tree constructed by Voshell et al. [19] to our tree (Fig. 4) of predominantly herbarium-based ITS regions, similar clade groupings can be identified. A separation of clades (P. canariensis, P. brachystachys, and P. truncata) and remaining species was identified in our study (Fig. 4), similar to [19]. Both ITS region-based phylogenetic trees, from Fig. 4 and Voshell et al. [19], form a consistent cluster of the species P. lemmonii, P. angusta, P. caroliniana, P. californica, and P. arundinacea interpreted to resemble lineages 1 and 2 of the Phalaris genus [19]. The divergence of lineage 3 including P. coerulescens, P. minor, P. paradoxa, and P. aquatica was also found in this analysis [19]. The similarity between the phylogenetic tree of the ITS region constructed in this study (mainly herbaria Phalaris tissue) and by Voshell et al. [19] (fresh Phalaris tissue) verifies the relatedness found among Phalaris species and the genetic distinctions that can be extrapolated from SNPs in the ITS region with the use of herbarium tissue as old as 112 years. No false SNPs were produced in sequence alignment (Supplementary Fig. 1) and phylogenetic tree construction due to the age of tissue and relatedness [19], indicating the versatility of herbarium specimens in ITS barcoding (Fig. 4a). Nucleotide polymorphisms among most Phalaris species were shared among historic and extant tissue types.
The clustering of P. arundinacea ITS sequences into predominantly regional clades from the U.S. and Asia is novel and significant (Fig. 4b). All genotypes from N. America were in a single clade, distinct from most other ones of Eurasian origin. Despite considerable polymorphisms in the species (AR1-AR6, Table 3 Fig. 4b) was unexpected. These were segmented among two accessions of Korean origin (KF713257.1, FJ766174.1) which could be the original ancestors of P. arundinacea germplasm that came across the land bridge of the Bering Strait into the State of Alaska, USA [20] during the late Tertiary period [21].
Other genotypes of U.S. origin (either native N. American or of Eurasian ancestry) were separated in the clade from the four N. American (Minnesota) natives. While the separation of U.S. and Asia P. arundinacea was not 100% consistent, most of the Eurasian types were in distinct clades; the use of additional barcoding markers would most likely allow proper geographic association of P. arundinacea. The relatedness of N. American vs. most Eurasian types illustrates the possibility to distinguish among P. arundinacea from different geographic locations on a continental scale (Fig. 4b). Future research is essential with our large germplasm base of native riparian, roadside, lakebed, and cultivated N. American types in the Midwest (> 3,000 genotypes) [29,36] to potentially provide substantive confirmation of this distinction or a melding of all N. American types.
The high variation within the P. arundinacea species and the detection of variants, forma, and subspecies (Fig. 4b) further justify the sequencing of more specimens within the ITS region and other barcoding regions to better understand the relatedness and variation within this species and the genus as a whole. Herbaria curators at The Consortium of California Herbaria have recognized the P. arundinacea species to contain the subspecies picta and typica, four varietas (arundinacea, colorata, geuina, and japonica), and nine forma (Arundinacea, coarcta, luteo-Picts, minor, pallens, pallida, Ramifera, Ramosa, variegata; http://www.cch2.org/portal/taxa/index.php?taxon=Phalaris+arundinacea&formsubmit=Search+Terms). Further genetic structure analysis of P. arundinacea will provide the potential ability to genetically distinguish P. arundinacea by continents, even though it is morphologically indistinguishable. More exhaustive analysis of the ITS region may provide SNPs to discriminate the native vs. exotic status in N. American populations.

Conclusions
Our study demonstrates that most Phalaris herbaria specimens can be used to produce high quality ITS sequences that are useful to evaluate past genetic resources. The methods used in this study are relatively simple to replicate and, most importantly, do not require specific protocol modifications. Overall, the use of a standard DNA isolation protocol, PCR amplification method, and direct PCR product sequencing was sufficient to sequence the majority of the collected herbarium specimens. The lack of amplification of some specimens is most likely due to high DNA degradation. Furthermore, herbaria can provide the plant tissue and taxonomically identified specimens, facilitating the identification of new species-specific barcodes and evaluation of past genetic resources. The ITS region (ITS1 and ITS2) was sufficient to distinguish eight out of twelve Phalaris species in this study. In addition, separation of the genus and within species was possible, implying that subsampling of multiple genotypes of the same species is necessary. The ITS region distinguished among most Eurasian vs. N. American accessions of P. arundinacea. Further development of within species- and genus-specific barcodes could contribute to designing PCR primers for efficient and accurate plant identification. Our finding of misidentified Phalaris species indicates the need for DNA sequence database curation for proper specimen identification.

Germplasm Collection
Extant (fresh) and historic (herbaria) samples of the Phalaris species were used for genomic DNA extraction. A total of n = 52 historic specimens were collected from the Bell Museum Herbarium, University of Minnesota, St. Paul, MN (MIN; n = 9; Table 1) and the Ada Hayden Herbarium, Iowa State University, Ames, IA (ISC; n = 43; Table 1). Approximately 2.5 x 0.63 cm of leaf tissue was destructively sampled from each herbarium specimen, taking leaves positioned at the back of the specimen so as not to reduce its visual integrity, similar to our previous methodology [29]. Permission for destructive tissue sampling was obtained from each herbarium curator in advance. Specimen notes were added to document the specimen-specific sampling. This herbarium collection represents a range of North American Phalaris herbarium specimens with collection dates ranging from 1882 (P. minor Retz.; ISC-V-0021344) to 2001 (P. arundinacea L.; 484712; Table 1). The selected P. arundinacea herbaria samples had already been tested for SNP genetic variation and, where sufficient high-quality nuclear DNA could be extracted, were determined to be most likely native North American genotypes [29,36].
For additional sampling purposes, seeds were obtained from the U.S. Department of Agriculture Germplasm Resources Information Network (USDA GRIN; https://www.ars-grin.gov/). Germinated seedlings were used as extant specimens of the Phalaris species. Fresh samples served to validate amplification and SNP detection. Extant specimens included three P. aquatica (PI 476287; PI 476288; PI 303825), two P. arundinacea (PI 241065; PI 422030), and two P. canariensis (PI 578800; PI 578798; Table 2). Seeds were sown in 10 cm square pots filled with Sungro Professional Growing Mix (Sun Gro Horticulture; SKU:5105, Agawam, MA) and placed in a mist house (greenhouse with an intermittent mist system). Once the seedlings had germinated and developed true leaves, plants were moved to the greenhouse for continued growth and leaf harvest. Environmental conditions in both greenhouses were 24.4±3.0/18.3±1.5°C day/night daily integral and a 16 h long-day photoperiod (0600-2200 HR) with lighting from 400 W high-pressure sodium high-intensity discharge lamps (HPS-HID) at a minimum of 150 μmol m⁻² s⁻¹. Plants were fertilized twice daily, between 0700-0800 and 1600-1700, using a constant liquid feed (CLF) of 125 ppm N from water-soluble 20N-4.4P-16.6K (Scotts, Marysville, OH). Fungicide drenches were applied in monthly rotations. Tips of mature, fully expanded leaves were harvested and stored at -20°C in sealed plastic bags.
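The fertilizer and lighting figures above imply two simple quantities that are not stated in the text: the mass of 20N-4.4P-16.6K fertilizer per litre needed for 125 ppm N, and the minimum daily light integral delivered by the lamps alone. The following is a minimal sketch of that arithmetic, assuming 1 ppm equals 1 mg/L in dilute solution and a constant lamp output over the whole photoperiod; the function names are hypothetical and sunlight is ignored.

```python
# Illustrative arithmetic only; function names are hypothetical.

def fertilizer_mg_per_l(target_ppm_n: float, percent_n: float) -> float:
    """Fertilizer mass (mg) per litre of water for a target N concentration.

    Assumes 1 ppm = 1 mg/L, which holds for dilute aqueous solutions.
    """
    if percent_n <= 0:
        raise ValueError("percent_n must be positive")
    return target_ppm_n / (percent_n / 100.0)


def daily_light_integral(ppfd_umol: float, photoperiod_h: float) -> float:
    """Daily light integral (mol m^-2 d^-1) from a constant PPFD (umol m^-2 s^-1)."""
    seconds = photoperiod_h * 3600
    return ppfd_umol * seconds / 1_000_000


# Values taken from the growing conditions described above.
fert = fertilizer_mg_per_l(target_ppm_n=125, percent_n=20)
dli = daily_light_integral(ppfd_umol=150, photoperiod_h=16)
print(f"fertilizer: {fert:.0f} mg/L, lamp-only DLI: {dli:.2f} mol m^-2 d^-1")
```

Under these assumptions the feed works out to about 625 mg of fertilizer per litre, and the lamps alone supply at least 8.64 mol m⁻² d⁻¹; the actual daily light integral would be higher whenever natural light contributed.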

NCBI ITS Resources
National Center for Biotechnology Information (NCBI) database searches were performed to collect existing ITS region sequences for comparative purposes from the twelve Phalaris species included in this study (Table 1). Only Phalaris ITS sequences that contained full ITS1 and ITS2 regions were included. Partial sequences were excluded because the missing portions could contain SNP(s), which would make the phylogenetic analysis less accurate. Multiple ITS sequences from twelve Phalaris species were found (n = 68, full sequences): P. angusta (KX873129.

DNA Extraction
Nuclear DNA was extracted from historic and extant Phalaris samples using the Synergy 2.0 Plant DNA Extraction Kit (OPS Diagnostics Laboratory, Lebanon, NJ) with minimal adjustments to the protocol [29]. Tissue was loaded using scissors and forceps that were washed in soapy water, rinsed twice in distilled water and dried with paper towels. Extant samples, unlike historic ones, were kept cool on dry ice throughout the loading process. Samples were ground for a total of 15 minutes at 1,500 rpm in a homogenizer (Geno/Grinder; SPEX SamplePrep, Metuchen, NJ). Once purified, DNA was suspended in molecular grade water and stored at -20°C. DNA quality and quantity were checked with a spectrophotometer (NanoDrop 2000; Thermo Scientific, Waltham, MA). Quality guidelines recommended by Thermo Fisher Scientific (an absorbance ratio of ~1.8) were followed.
Polymerase chain reaction (PCR) was performed using 10 μM of ITS-P5 and ITS-U4 primers [51]. Each reaction contained PCR master mix (GoTaq Green Master Mix, M712; Promega, Madison, WI) plus 1 μL of DNA (at a minimum concentration of 50 ng/μL); for more dilute extracts, the DNA volume was adjusted to obtain 50 ng total DNA. The PCR protocol followed previous methodology [51]: 94°C for 4 min, then 34 cycles of 30 sec at 94°C, 40 sec at 55°C and 1 min at 72°C, finishing with 10 min at 72°C. PCR reactions were visualized using electrophoresis on a 1% (w/v) agarose gel (1 x Tris-acetate-EDTA buffer) with ethidium bromide and a DNA ladder. The expected amplification product of the ITS-P5 and ITS-U4 primers was 757 ± 140 bp [51]. If a single amplification product was observed, amplified products were directly purified using the PureLink™ Quick PCR Purification Kit (Thermo Fisher Scientific, Waltham, MA) and prepared for Sanger sequencing at the University of Minnesota Genomics Center, following UMGC sample requirements (University of Minnesota Genomics Center - Sanger Sequencing Classic, http://genomics.umn.edu/sanger-sequencing-classic.php).
If the purified PCR reaction yielded less than the DNA concentration required for Sanger sequencing (25 ng/μL), the PCR product was diluted in water (1/50) and 1 μL of the diluted product was re-amplified following the same procedure (ITS-P5 and ITS-U4 primer set, PCR reaction composition, PCR program).
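The template volume, cycling program, and re-amplification rule described above can be written down as a short calculation. This is only a sketch under stated assumptions: the constants come from the text, the function and variable names are hypothetical, and the program duration counts hold times only, ignoring thermocycler ramp time.

```python
# Constants taken from the protocol text; all names are hypothetical.
TARGET_TEMPLATE_NG = 50.0         # total template DNA per reaction
SEQUENCING_MIN_NG_PER_UL = 25.0   # product concentration needed for Sanger sequencing

# (temperature in deg C, hold time in seconds)
INITIAL_DENATURATION = (94, 240)
CYCLE_STEPS = [(94, 30), (55, 40), (72, 60)]
N_CYCLES = 34
FINAL_EXTENSION = (72, 600)


def template_volume_ul(conc_ng_per_ul: float) -> float:
    """Volume of DNA extract to add so the reaction holds about 50 ng template.

    Extracts at or above 50 ng/uL are added as a fixed 1 uL; more dilute
    extracts are scaled up to reach 50 ng total.
    """
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    if conc_ng_per_ul >= TARGET_TEMPLATE_NG:
        return 1.0
    return TARGET_TEMPLATE_NG / conc_ng_per_ul


def program_hold_seconds() -> int:
    """Total hold time of the PCR program, excluding ramping between steps."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


def needs_reamplification(product_ng_per_ul: float) -> bool:
    """True when purified product is too dilute for sequencing submission."""
    return product_ng_per_ul < SEQUENCING_MIN_NG_PER_UL


print(template_volume_ul(20.0))          # volume for a dilute herbarium extract
print(program_hold_seconds() / 60)       # program length in minutes, hold times only
print(needs_reamplification(12.5))
```

For example, a 20 ng/μL extract would need 2.5 μL of template, and the program's hold times sum to 5,260 seconds (just under 88 minutes).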

Sequence Analysis
Sequencing results were edited and quality-checked using 4Peaks software (Nucleobytes, Alsmeer, Netherlands; http://nucleobytes.com/4peaks/) with additional manual sequence trimming. Sequence editing, alignments, annotations and manipulations were done using Geneious 11.1.5 software (Biomatters, Ltd., New Zealand; https://www.geneious.com) [55]. The Arabidopsis thaliana (X52320.1) ITS sequence was initially used to annotate full ITS regions of the newly obtained (n = 36) Phalaris DNA sequences (Table 1, Table 2). Genetic distance among Phalaris species was inferred using the Neighbor-Joining method [56], and a bootstrap test (100 replicates) was performed for each tree across the newly obtained and NCBI ITS sequences. Multiple sequence alignment was performed using MUSCLE [57]. Diagnostic, species-specific SNPs that differentiate among Phalaris species were determined from the multiple alignment of full-length ITS (ITS1 and ITS2 regions) sequences, excluding the 5.8S ribosomal subunit. Sequences obtained in this research were deposited in the NCBI database; accession numbers can be found in Tables 1 and 2.

Declarations

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The DNA sequences produced are deposited in the NCBI database (https://www.ncbi.nlm.nih.gov/nuccore/) and can be found by searching for each NCBI Accession Number. Sequence lists can be found in the Materials and Methods section (Tables 1 and 2).

Competing interests
The authors declare that they have no competing interests.

n/a denotes specimens with no NCBI sequence publication.
+ denotes a successful PCR amplification.
- denotes an unsuccessful PCR amplification.

Table 3. Diagnostic, species-specific single nucleotide polymorphism (SNP) sites in the ITS1 and ITS2 regions for Phalaris species. Each site is labeled with the species-specific letter code followed by a region number (1-9) and is listed with its nucleotide substitution (A/C, A/G, A/T, C/A, C/G, C/T, G/A, G/C, T/A, T/C, T/G) and its relative position on the ITS1 and ITS2 multiple alignments (Supplementary Figure 1). Two species (P. lemmonii, P. angusta) lack diagnostic SNPs, while two species (P. brachystachys, P. canariensis) are indistinguishable (N/A).
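One way to express the idea of a diagnostic, species-specific SNP site is as a column-wise test on a multiple alignment. The sketch below is not the procedure used in this study, which relied on Geneious and MUSCLE alignments. It assumes a hypothetical criterion: a column is diagnostic for a species when every sequence of that species carries the same nucleotide and no sequence of any other species carries it. The function name, species labels, and toy sequences are invented for illustration.

```python
from collections import defaultdict


def diagnostic_snps(alignment):
    """Find species-diagnostic columns in a multiple sequence alignment.

    alignment: dict mapping species name -> list of aligned sequences,
    all of equal length (gaps as '-').
    Returns a dict mapping species -> list of
    (1-based column, species base, bases seen in all other species).
    """
    lengths = {len(seq) for seqs in alignment.values() for seq in seqs}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (n_cols,) = lengths

    result = defaultdict(list)
    for col in range(n_cols):
        # Set of characters each species shows at this column.
        column = {
            species: {seq[col].upper() for seq in seqs}
            for species, seqs in alignment.items()
        }
        for species, bases in column.items():
            if len(bases) != 1:
                continue  # variable within the species, so not diagnostic
            (base,) = bases
            if base not in "ACGT":
                continue  # skip gaps and ambiguity codes
            others = set()
            for other, other_bases in column.items():
                if other != species:
                    others |= other_bases
            if base not in others:
                result[species].append((col + 1, base, "".join(sorted(others))))
    return dict(result)


# Toy alignment with invented species labels and sequences.
toy = {
    "sp_A": ["ACGTA", "ACGTA"],
    "sp_B": ["ACCTA", "ACCTA"],
    "sp_C": ["ACGTT", "ACGTA"],
}
print(diagnostic_snps(toy))
```

In the toy alignment only sp_B has a diagnostic site (column 3, C where the other species carry G). Column 5 is not diagnostic for sp_C because the T is not fixed within that species, which illustrates why sampling several genotypes per species matters.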