Accurate evaluation of the biodiversity of a given ecosystem is a keystone, even imperative, element in numerous biological and applied disciplines, including ecology, conservation biology, food regulatory compliance, forensics, and ecosystem monitoring and assessment [1-2]. In response to these needs, DNA-based taxon identification relying on genetic barcode markers, above all the Cytochrome c Oxidase I (COI) gene, is commonly used to assess biodiversity, including species identification, species boundaries and diversity analyses. Besides COI, other genes are used for animal barcoding, such as the mitochondrial Cytb and 12S and the nuclear 18S. While COI is the marker commonly used for most animals, other markers are in use for different taxa: RuBisCO (ribulose-1,5-bisphosphate carboxylase/oxygenase) for plants, the internal transcribed spacer (ITS) rRNA region for fungi, the 16S rRNA gene for prokaryotes, and the 18S rRNA gene for microbial eukaryotes. Further markers can be found under the “primers list” in the Barcode of Life Data System (BOLD), where a few thousand primers are available for the different markers used in species identification. BOLD [3], a cloud-based data storage and analysis platform that can also be employed as a curation tool, currently contains (as of April 2020) about eight million barcodes, encompassing >310,000 animal, plant and fungal species. The rationale for using the COI marker gene in species barcoding rests on the fact that intraspecific diversity for this gene is usually lower than interspecific diversity, which makes it effective for species identification, and on the difficulties associated with traditional morphological taxonomy [4].
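The rationale above, that intraspecific diversity is usually lower than interspecific diversity, can be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length and uses the uncorrected p-distance; the function names and the toy 10-bp fragments are hypothetical and are not taken from BOLD or from any cited study.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcoding_gap(records):
    """records: list of (species_name, aligned_sequence) pairs.

    Returns (maximum intraspecific distance, minimum interspecific distance);
    either value is None if no pair of that kind exists.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=None), min(inter, default=None)


# Made-up fragments for illustration only (real COI barcodes are far longer).
toy_records = [
    ("Species A", "ACGTACGTAC"),
    ("Species A", "ACGTACGTAT"),
    ("Species B", "ACGAACTTGC"),
    ("Species B", "ACGAACTTGC"),
]
max_intra, min_inter = barcoding_gap(toy_records)
```

When the largest within-species distance is smaller than the smallest between-species distance, as in this toy data, the marker separates the species cleanly; overlapping values would indicate that the marker alone cannot discriminate them.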
The major benefit of using BOLD becomes apparent when an unknown sequence is compared against the database to determine its closest species match, an evaluation that depends strictly on the correctness and reliability of the data stored in BOLD and on the quality of the barcode libraries [5]. It should be noted, however, that while barcodes may be excellent tools for identifying species that are already in BOLD, they may have poor predictive power for species that are absent from it, unless the species has close relatives already represented in BOLD, in which case the predictive power is good.
Because BOLD serves as a curation tool, it is assumed that all barcode sequences stored in the database are backed by vouchered specimens and have been thoroughly identified by expert taxonomists. Yet, being a public database, BOLD, like any similar curation tool, inevitably accrues erroneous data, sometimes in significant amounts [2,6]. Taxonomic misidentifications and conundrums, cryptic species complexes and the difficulty of delimiting cryptic species, and technical faults such as deficient DNA extraction, PCR-based errors and contamination with foreign DNA, including bacterial sequences, especially among COI sequences, are just some of the causes that may unavoidably generate erroneous data and inaccurate sequences [2,6-10]. These difficulties may dramatically affect the accuracy of barcoding. For example, the Barcode Index Number (BIN) system serves as a persistent registry for animal operational taxonomic units (OTUs) recognized through sequence variation in the COI DNA barcode region, and aids species taxonomy by flagging possible synonyms among specimens that are likely to belong to the same species. However, it has been claimed that it can lead to a lack of unambiguous species-level identification in the BOLD system, and to taxonomic conflicts through the assignment of more than one species name to a single BIN [11].
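The kind of taxonomic conflict described above, more than one species name assigned to a single BIN, can be flagged with a short sketch. It assumes the records have already been exported as (BIN identifier, species name) pairs; the function name, identifiers and taxon names below are placeholders, not real BOLD entries or part of any BOLD interface.

```python
from collections import defaultdict


def bins_with_conflicting_names(records):
    """records: iterable of (bin_id, species_name) pairs.

    Returns {bin_id: sorted list of names} for every BIN that carries
    more than one distinct species-level name.
    """
    names_by_bin = defaultdict(set)
    for bin_id, species in records:
        species = species.strip()
        if species:  # ignore records lacking a species-level name
            names_by_bin[bin_id].add(species)
    return {b: sorted(n) for b, n in names_by_bin.items() if len(n) > 1}


# Placeholder records for illustration only.
toy_records = [
    ("BIN-0001", "Genus alpha"),
    ("BIN-0001", "Genus beta"),
    ("BIN-0002", "Genus gamma"),
    ("BIN-0002", "Genus gamma"),
    ("BIN-0003", ""),
]
conflicts = bins_with_conflicting_names(toy_records)
```

A flagged BIN is only a candidate for review: the discordance may reflect a misidentification, a synonym, or a genuine case of closely related species sharing a BIN.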
The European Register of Marine Species (ERMS [12]) is an authoritative taxonomic checklist of the species found in European marine environments. This all-taxon inventory extends from the Canaries and Azores to Greenland and northwest Russia, including the Mediterranean Sea and the Baltic Sea, and covers the deep sea, all continental shelf areas up to the splash zone above the high tide mark, and estuaries down to a salinity of 0.5 psu. ERMS was published on the internet during 1997–1999 and subsequently as a book, containing a list of about 30,000 marine species of the kingdoms Animalia, Plantae, Fungi and Protoctista occurring in the European marine environment [13]. This marine species inventory is expected to serve as the standard reference and technological tool for marine research and for management of the marine environment in Europe.
Until recently, the standardized methodologies available to practitioners for biological monitoring and management of marine environments were restricted to traditional morphological taxonomy, a tedious and time-consuming approach that requires expert taxonomists whose skills can only be attained through years of practice. This line of analysis is currently being complemented, and may even be replaced in the future, by molecular approaches such as DNA barcoding and metabarcoding of bulk samples or environmental DNA (eDNA) [5, 14-17]. The success of these approaches depends strictly on complete and reliable DNA barcode reference libraries. Thus, it is of special interest to identify gaps in the existing or developing DNA barcode reference libraries, primarily those that are pertinent to the EU Water Framework Directive (WFD) and the Marine Strategy Framework Directive (MSFD). A recent global study from this perspective [5] revealed that barcoding coverage varies strongly among taxonomic groups and among geographic regions, pointed to many missing species and unreliable data (e.g., errors in species identification, discordance among taxonomists) that are relevant to monitoring, and highlighted the need to improve quality assurance of the barcode reference libraries.
Following the global analysis of Weigand et al. [5], we aim here to investigate potential gaps among already DNA-barcoded organisms (based on publicly available data in the BOLD database) listed in the ascidian and cnidarian (Anthozoa and Hydrozoa) reference libraries of the ERMS inventory. We discuss the necessity of quality control (QC) when building and curating a barcode reference library, and provide recommendations for filling the gaps in the barcode library of European aquatic taxa.
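At its simplest, a gap analysis of this kind compares a taxonomic checklist with the set of species names that have at least one public barcode. The sketch below is only an illustration of that comparison under stated assumptions: both inputs are plain lists of species names, matching is by exact name after normalizing case and whitespace (synonyms are not resolved), and the function name and taxon names are hypothetical. It is not the procedure used in this study.

```python
def barcode_coverage(checklist, barcoded):
    """Compare a species checklist with the names that have a barcode.

    Returns (coverage fraction, sorted list of checklist names lacking a barcode).
    Names are matched after collapsing whitespace and lowercasing.
    """
    def normalize(name: str) -> str:
        return " ".join(name.split()).lower()

    barcoded_norm = {normalize(n) for n in barcoded}
    # Deduplicate checklist entries that differ only in case or spacing.
    unique = {normalize(n): n for n in checklist}
    missing = sorted(orig for key, orig in unique.items() if key not in barcoded_norm)
    coverage = 1 - len(missing) / len(unique) if unique else 0.0
    return coverage, missing


# Placeholder names for illustration only.
checklist = ["Taxon one", "Taxon two", "Taxon three", "Taxon four"]
barcoded = ["taxon one", "Taxon  three", "Taxon nine"]
coverage, missing = barcode_coverage(checklist, barcoded)
```

In practice, name matching would also need to account for synonyms and for records identified only above species level, which is one reason quality control of the reference library matters.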