Our study, based on a taxonomically diverse dataset and several species delimitation methods, reveals that about 70% of the examined Ponto-Caspian amphipod species show concordance between the MOTUs revealed by DNA barcoding and species recognized through morphological study. However, the remaining species showed morpho-molecular mismatches resulting from both splitting and lumping. Because some of these incongruent taxa are prominent invasive species, our results indicate that taxonomic work is urgently needed to resolve the inconsistencies.
Taxonomic And Evolutionary Implications
Many of the taxa (ca. 70%) examined in our study had congruent morphological and molecular boundaries, especially when using the conservative ASAP method. This indicates that DNA barcodes can effectively identify Ponto-Caspian amphipods. However, the other species showed consistent morpho-molecular discordances that need further attention. In several instances, the majority of the species delimitation methods indicated the presence of several evolutionarily distinct molecular lineages (MOTUs) within a particular morphospecies. Because members of the different MOTUs possessed the same diagnostic morphological characters (as defined by Copilaș-Ciocianu and Sidorov23), they can provisionally be considered cryptic species55. Many cases of cryptic lineages reflected divergence between populations in the Black Sea and Caspian Sea basins, suggesting the importance of geographical isolation for speciation in these lineages27. However, Pontogammarus borceae possessed three MOTUs in the Black Sea basin and one in the Caspian Sea. Our study further corroborates previous findings that Dikerogammarus haemobaphes, D. bispinosus, Pontogammarus maeoticus and perhaps Chaetogammarus ischnus contain cryptic lineages that require taxonomic attention32,33,36. Interestingly, for amphipods, cryptic or pseudo-cryptic lineages have not yet been detected outside the native range. However, such cases were reported for Ponto-Caspian mysid crustaceans56. Despite these cases, it appears that cryptic species are less common within Ponto-Caspian gammarids than in strictly freshwater taxa such as Gammarus, where most of the nominal species contain numerous cryptic lineages2,57−59.
Apart from these splits, we also observed cases where several distinct morphospecies were merged into a single MOTU. The most remarkable case involved Trichogammarus (alternatively Echinogammarus) trichiatus, which was nested within C. ischnus. Earlier studies sequenced European populations of T. trichiatus, but no prior sequences came from the Caucasian range, which is the type locality of T. trichiatus28,60. Our results show that the European taxon (indicated as T. cf. trichiatus) is molecularly distinct, and likely represents a distinct species from the Caucasian taxon (indicated as T. trichiatus). The European taxon was also initially described as a separate species (from Romanian and Bulgarian lagoons), which was later synonymized with the Caucasian T. trichiatus60. Other cases of lumping involved Pontogammarus robustoides and P. setosus, Amathillina cristata and A. spinosa, and D. haemobaphes and D. oskari. The first pair of taxa is allopatric, while the last two pairs are sympatric. In one exceptional instance, DNA markers showed that three currently described morphospecies, Chelicorophium curvispinum, C. monodon and C. mucronatum, are likely to belong to one species. None of these cases appear to represent mitochondrial introgression, as the same lack of divergence was observed for the nuclear markers (unpublished data), indicating either incipient speciation or that some “species” are just ecophenotypic variants.
In rare cases we observed both lumping and splitting of a particular morphospecies. For example, D. haemobaphes was split into two cryptic lineages but D. oskari was nested in one of them. Similarly, P. borceae was split into three lineages in the Black Sea, but its Caspian lineage was lumped with P. abbreviatus. Chelicorophium mucronatum was sometimes resolved as a distinct lineage, but in other cases it was lumped with C. curvispinum and C. monodon.
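The categories used above (congruent, split, lumped, or both) can be expressed as a simple rule on specimen-level assignments. The following is a minimal sketch, not the procedure used in this study: the function name, the input format, and the toy MOTU labels are all hypothetical. It assumes that each specimen carries one morphospecies name and one MOTU label from a single delimitation method.

```python
from collections import defaultdict


def classify_concordance(assignments):
    """Label each morphospecies as 'match', 'split', 'lumped' or 'split+lumped'.

    assignments: iterable of (specimen_id, morphospecies, motu) tuples
    (hypothetical input format, one MOTU label per specimen).
    """
    motus_per_species = defaultdict(set)
    species_per_motu = defaultdict(set)
    for _specimen, species, motu in assignments:
        motus_per_species[species].add(motu)
        species_per_motu[motu].add(species)

    labels = {}
    for species, motus in motus_per_species.items():
        # Split: one morphospecies spans more than one MOTU.
        is_split = len(motus) > 1
        # Lumped: at least one of its MOTUs also holds another morphospecies.
        is_lumped = any(len(species_per_motu[m]) > 1 for m in motus)
        if is_split and is_lumped:
            labels[species] = "split+lumped"
        elif is_split:
            labels[species] = "split"
        elif is_lumped:
            labels[species] = "lumped"
        else:
            labels[species] = "match"
    return labels


# Toy data mirroring the D. haemobaphes / D. oskari pattern described above;
# specimen IDs, MOTU labels and "Species X" are invented for illustration.
toy = [
    ("s1", "D. haemobaphes", "MOTU_A"),
    ("s2", "D. haemobaphes", "MOTU_B"),
    ("s3", "D. oskari", "MOTU_A"),
    ("s4", "Species X", "MOTU_C"),
]
labels = classify_concordance(toy)
congruence = sum(v == "match" for v in labels.values()) / len(labels)
```

Under this rule, a congruence percentage is simply the share of morphospecies labeled "match". How singletons and unidentified specimens are treated would change the value, and this sketch does not address that choice.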
These inconsistencies between MOTUs and morphology imply the need for integrative taxonomic work, preferably incorporating nuclear and mitochondrial markers, as well as morphology and ecology61. Our study does affirm the value of DNA barcoding as an important first step for exposing cases of taxonomic incongruence. From an evolutionary perspective, the high and low genetic distances observed between and within morphospecies indicate that the Ponto-Caspian amphipod fauna is diversifying in complex ways and at divergent rates.
Reference Library And Importance Of Accurate Species Identifications
Although our study has more than doubled the DNA barcode coverage for Ponto-Caspian amphipods by adding records for 32 taxa, the current dataset only includes representatives for about 60% of the known fauna. Nevertheless, the current data include the most common species and all species that have substantially dispersed outside their native Ponto-Caspian realm. The dataset therefore constitutes a reliable reference library for monitoring and early detection of species that have invaded Europe and North America. It also provides a foundation for further taxonomic progress.
By comparing our COI dataset with sequences available in GenBank and BOLD, we detected 14 cases of misidentification or errors in data entry. In some cases, the incorrectly identified species are morphologically similar to the correct identification (e.g., D. haemobaphes misidentified as D. villosus). However, most cases of misidentification involve representatives of different genera that are very distinct morphologically (e.g., T. cf. trichiatus misidentified as Akerogammarus sp., or Obesogammarus crassus misidentified as Pandorites podoceroides). Such cases may reflect errors in data handling or contamination. Sequencing of immature specimens might also be the source of some errors, since their accurate identification is more problematic. Whatever their source, these errors emphasize the need to reverse the declining numbers of well-trained taxonomists62.
Because well-validated DNA barcode reference libraries are critical for accurate taxonomic species identification via barcoding or metabarcoding, the occurrence of misidentified/mislabeled sequences is highly problematic, especially for taxa with few sequences available. For example, before our study there was only one publicly available COI sequence labeled as Chaetogammarus warpachowskyi, but it actually represented a misidentified specimen of C. ischnus. There needs to be an ongoing effort to ensure that records in reference libraries are carefully validated.
Accuracy Of Species Delimitation Methods
Our results indicated that MOTUs delineated by the conservative ASAP method, which clusters individuals into species based on the gap between intra- and interspecific genetic distances, had the highest congruence with morphology-based taxonomy, with values approaching 69% for both the COI and 16S markers. On the downside, it lumped more morphospecies than the other methods, especially for the more conserved 16S marker (28%). The COI-based patristic distance threshold (PDT) and BINs showed similar congruence levels, at 67% and 66% respectively, the former being more prone to lumping and the latter to oversplitting. The congruence levels of the PTP method were not far behind, with 65% for COI and 63% for 16S. With PTP, lumping and splitting occurred with similar frequency at COI, but lumping was twice as prevalent at 16S. By contrast, the KoT method at the traditional threshold of 4 was much less congruent with morphology (54% for COI, 57% for 16S). With respect to markers, it was more prone to oversplitting at COI and to lumping at 16S. Using a more conservative threshold of 5 increased the accuracy, but only for 16S.
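The idea of a gap between intra- and interspecific distances can be illustrated with a deliberately simplified sketch. This is not the ASAP algorithm, which scores multiple candidate partitions; it only locates the widest interval between consecutive sorted pairwise distances. The function name and the toy distances are hypothetical.

```python
def largest_gap_threshold(distances):
    """Return (threshold, gap_width) for the widest gap in pairwise distances.

    distances: iterable of pairwise genetic distances (hypothetical input).
    The threshold is placed at the midpoint of the widest gap.
    """
    ordered = sorted(distances)
    if len(ordered) < 2:
        raise ValueError("need at least two pairwise distances")
    # Width of each gap between consecutive sorted distances, with its index.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    width, i = max(gaps)
    return (ordered[i] + ordered[i + 1]) / 2, width


# Invented distances: three small (within-species-like) values and
# three large (between-species-like) values.
toy_distances = [0.02, 0.18, 0.01, 0.22, 0.03, 0.20]
threshold, width = largest_gap_threshold(toy_distances)
```

With real data the distribution is rarely this clean. Overlap between the two classes of distances is exactly what produces lumping under a conservative threshold and oversplitting under a permissive one.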
The present results are in broad agreement with previous studies, which showed that distance-based methods without a priori divergence thresholds were generally more accurate than tree-based approaches, which can be more prone to oversplitting63–66. The patristic distance threshold of 0.16 substitutions per site proposed for the COI marker67 appears realistic and should be employed more often in crustacean taxonomy, although it may be overly conservative in some instances68.
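As an illustration of how a fixed patristic distance threshold translates into MOTUs, the sketch below groups haplotypes whose pairwise distance falls below 0.16 substitutions per site. It rests on several assumptions that are not taken from this study: the patristic distances are precomputed from a tree, grouping is by single linkage (any one sub-threshold link joins two groups), and all names and values are invented.

```python
def cluster_by_threshold(names, distances, threshold=0.16):
    """Single-linkage grouping of haplotypes below a distance threshold.

    names: list of haplotype labels.
    distances: dict mapping (name_a, name_b) to a patristic distance
    in substitutions per site (hypothetical, precomputed input).
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d < threshold:
            parent[find(a)] = find(b)  # merge the two groups

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Invented example: two tight pairs separated by distances above 0.16.
haplotypes = ["h1", "h2", "h3", "h4"]
toy_pd = {
    ("h1", "h2"): 0.02, ("h3", "h4"): 0.05,
    ("h1", "h3"): 0.20, ("h1", "h4"): 0.22,
    ("h2", "h3"): 0.21, ("h2", "h4"): 0.23,
}
motus = cluster_by_threshold(haplotypes, toy_pd)
```

Single linkage is the most permissive way to apply such a threshold, since a chain of close intermediates can merge otherwise distant haplotypes. Applying the threshold to distances between monophyletic clades could yield different groupings.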