Plant material and DNA extraction
Seeds of 2–3 cultivars of 16 forage species (Alopecurus pratensis L., Arrhenaterum elatius L., Cynosurus cristatus L., Dactylis glomerata L., Festuca pratensis Huds., F. rubra L., Lolium perenne L., L. multiflorum Lam., Lotus corniculatus L., Medicago sativa L., Phleum pratense L., Poa pratensis L., Trifolium pratense L., T. repens L. and Trisetum flavescens L.) were germinated and their seedlings were transferred into pot trays (77 wells, 50 cm x 32 cm, with compost as substrate). Plants were grown for 3 weeks after which DNA was extracted from three plants per species. A list of the selected plant specimens and cultivar name is presented in Table 1. For grasses, three leaf fragments of ~1 cm and for legumes three young leaflets were harvested. The plant material was freeze-dried for 48 h and pulverized in a QIAGEN TissueLyser II (QIAGEN, Hilden, Germany). DNA was extracted using the NucleoSpin® II kit (Macherey-Nagel, Düren, Germany) and its integrity visually inspected by agarose gel electrophoresis (1% w/v). DNA purity and concentration were determined with a NanoDrop™ spectrophotometer (ThermoFisher Scientific, Waltham, MA, USA).
DNA barcode amplification and sequencing
The BOLD database was screened for DNA barcode sequences for our selected species and close relatives; barcodes rbcLa, matK and trnH-psbA were selected as candidates because they reported the most available sequences. Primer sequences for the three barcodes were obtainedfrom BOLD [10] and were optimized for amplification in the target plant families (Additional file table S1). Each PCR reaction consisted of 15 ng of template DNA, 1 x flexi buffer (Promega, Madison, WI, USA), 2 mM MgCl2, 200 µM dNTPs, each primer at 0.4 µM, 0.75 units of GoTaq® G2 Flexi DNA Polymerase (Promega, Madison, WI, USA) and water to a final volume of 30 µL.
For rbcLa, PCR conditions were 5 min at 94°C followed by 33 cycles of 40 s at 94°C, 1 min at 55°C and 40 s at 72°C, followed by a final extension cycle of 10 min at 72°C. For matK and trnH-psbA, a 5 min at 94°C followed by 50 cycles of 40 s at 94°C, 1 min at 54°C and 40 s at 72°C followed by a final extension cycle of 10 min at 72°C were used.
Amplicons were purified in a MultiScreen PCR96 filter plate (Merck, Darmstadt, Germany). Sequencing reactions were prepared with 1 x BigDye™ Terminator 3.1 Reaction Mix (ThermoFisher Scientific, Waltham, MA, USA), 1 x BigDye™ 3.1 Sequencing Buffer, forward or reverse primer at 0.16 µM and 800 ng of purified amplicon to a final volume of 5 µL. The same primers used for PCR were used for sequencing. Capillary electrophoresis was performed on a 3130 ABI (ThermoFisher Scientific, Waltham, MA, USA). The resulting traces were quality filtered and merged using GAP4 [14] with the default settings. All traces and sequences were uploaded to BOLD v4 (project code: SWFRG; http://boldsystems.org/index.php/MAS_Management_DataConsole?codes = SWFRG).
Taxonomical assignments
Sequences of matK, rbcLa and trnH-psbA were downloaded from BOLD v4 on May 23, 2019 [10]. Only sequences from the Poaceae and Fabaceae families with no contaminants and longer than 200 bp were included. In total, 6,232 rbcLa, 11,971 matK and 1,236 trnH-psbA sequences were present in the downloaded fasta files, which also include the plants from the BOLD project SWFRG (Supplementary Table S2). The taxonomical identifiers of the BOLD fasta files were reformatted to remove spaces and rearrange their informative fields in a consistent manner (fasta_name_reformat.py script from https://github.com/mloera/forage-barcoding).
Each barcode-specific fasta file was then used to make a blast database and the SWFRG sequences were queried in their corresponding database with blastn using the flag outfmt = 6 (i.e., tabular format). The resulting blast output tables were parsed with the blastn_matcher.R script from the above-mentioned GitHub repository. The script removes self-hits and corrects some misspellings in the taxonomy of queries and hits. The script then compares the taxonomy of the queries and hits at the species- and genus-level. A “match” was called when the taxonomy of a query sequence is equal to the taxonomy of the highest scoring hit or hits. A “taxonomical assignment rate”for each barcode was then calculated as the ratio between the sum of its correct taxonomical assignments and the total number of query sequences.