Here we demonstrate the high accuracy (> 99% correct assignment) of a set of short haplotypic markers for identifying 54 species of the genus Sebastes, including all of the species commonly found in the California Current Large Marine Ecosystem along the Pacific coast of North America. Using these loci, we distinguish between closely related and recently described cryptic species, describe phylogenetic relationships, and quantify a decrease in the heterozygosity and nucleotide diversity of these genetic markers in species with increasing evolutionary genetic distance from the ascertainment species.
Ecological studies and fisheries management require efficient methods to conclusively identify sympatric marine species, particularly at the larval and juvenile stages. In rockfishes, planktonic larvae from many species coexist during their pelagic phase and remain challenging to identify morphologically as they recruit to settlement habitats (Butler et al. 2012). Even among adults, the number of species present in overlapping habitats, the presence of cryptic species (e.g., Frable et al. 2015), and subtle differences in coloration or morphology (Ingram and Kai 2014) underscore the need for genetic species identification. Other marker types have previously been used for this task: one study included 33 species with 97.4% assignment accuracy (Pearse et al. 2007), and another, a much more complete survey of the genus, genotyped 103 individuals from 101 species at seven mitochondrial and two nuclear genes but did not test these data for genetic assignment accuracy (Hyde and Vetter 2007). Our method, which genotypes fewer than 100 multiplexed microhaplotype loci using high-throughput DNA sequencing, is highly accurate and efficient for large sample sizes, and it can be coupled with a reproducible analysis workflow built on the species-assignment reference database generated by this study.
Self-assignment using genotype data from the 90 retained microhaplotype markers correctly identified the species of every sample, except for some individuals of two extremely closely related species. At a stringent threshold (> 95% scaled likelihood), eight samples of S. carnatus and S. chrysomelas assigned to the combined S. carnatus/chrysomelas group at a lower level of confidence, though still with a scaled likelihood above 50%. Notably, these sister species have been the subject of ongoing research (Narum et al. 2004; Buonaccorsi et al. 2011), and our self-assignment results demonstrate the challenge of separating the two groups with existing genetic markers and call into question their taxonomic status as two distinct species.
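For readers less familiar with scaled likelihoods, the self-assignment logic can be sketched in Python. This is a minimal illustration, not the workflow used in this study: it assumes known per-species microhaplotype allele frequencies, Hardy-Weinberg proportions within species, independence among loci, and an arbitrary floor frequency for alleles not observed in a reference species. Each species' scaled likelihood is its genotype likelihood divided by the sum across all candidate species (a posterior under a uniform prior), and a sample counts as confidently assigned only if the top value exceeds the 95% threshold. All function names and toy data are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical data layout (an assumption for illustration):
# species -> locus -> microhaplotype allele -> frequency
SpeciesFreqs = Dict[str, Dict[str, Dict[str, float]]]
# locus -> (allele 1, allele 2) for one diploid sample
Genotype = Dict[str, Tuple[str, str]]


def genotype_log_likelihood(geno: Genotype,
                            freqs: Dict[str, Dict[str, float]],
                            floor: float = 1e-4) -> float:
    """Log-likelihood of a diploid multilocus genotype under Hardy-Weinberg
    proportions, treating loci as independent. Alleles not observed in the
    reference species receive an arbitrary floor frequency (an assumption)."""
    total = 0.0
    for locus, (a, b) in geno.items():
        locus_freqs = freqs.get(locus, {})
        p = locus_freqs.get(a, floor)
        q = locus_freqs.get(b, floor)
        # Homozygote: p^2; heterozygote: 2pq
        total += math.log(p * q if a == b else 2.0 * p * q)
    return total


def scaled_likelihoods(geno: Genotype, ref: SpeciesFreqs) -> Dict[str, float]:
    """Normalize genotype likelihoods across candidate species so they sum
    to 1; subtracting the maximum log-likelihood avoids underflow."""
    logs = {sp: genotype_log_likelihood(geno, f) for sp, f in ref.items()}
    top = max(logs.values())
    weights = {sp: math.exp(v - top) for sp, v in logs.items()}
    total = sum(weights.values())
    return {sp: w / total for sp, w in weights.items()}


def assign(geno: Genotype, ref: SpeciesFreqs,
           threshold: float = 0.95) -> Tuple[str, float, bool]:
    """Return the top species, its scaled likelihood, and whether that
    value exceeds the threshold."""
    sl = scaled_likelihoods(geno, ref)
    best = max(sl, key=sl.get)
    return best, sl[best], sl[best] > threshold


def group_scaled_likelihood(sl: Dict[str, float], members: Iterable[str]) -> float:
    """Summed scaled likelihood for a group of species (e.g. sister taxa)."""
    return sum(sl[sp] for sp in members)


# Toy reference: two hypothetical species at two loci
ref = {
    "sp_A": {"loc1": {"AC": 0.9, "GT": 0.1}, "loc2": {"TT": 0.8, "CA": 0.2}},
    "sp_B": {"loc1": {"AC": 0.1, "GT": 0.9}, "loc2": {"TT": 0.2, "CA": 0.8}},
}
sample = {"loc1": ("AC", "AC"), "loc2": ("TT", "TT")}
print(assign(sample, ref))  # top species sp_A, scaled likelihood ~0.9992
```

Summing the scaled likelihoods of two sister species, as in `group_scaled_likelihood`, is one way to express confidence in assignment to a combined group when neither species alone clears the threshold.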
Coincidentally, S. carnatus and S. chrysomelas are also the closest phylogenetic relatives of the primary ascertainment species (S. atrovirens; Fig. 1, Fig. S3, Fig. S4) and show nearly as much variation at these loci (Fig. 2, Fig. 3). While these genetic markers easily differentiate cryptic species at the juvenile stage (e.g., S. mystinus/diaconus, S. aleutianus/melanostictus) and species commonly misidentified even as adults (e.g., S. flavidus/serranoides), they underperform for S. carnatus/chrysomelas. This indicates that these taxa are more genetically similar than any other pair of sister species in our dataset, at least in the portion of the genome surveyed with these loci, consistent with the lowest pairwise FST value in the study (0.015). Previous work on S. carnatus and S. chrysomelas identified a single, highly diverged locus and concluded that the pair is likely in the final stages of speciation, but with ongoing gene flow (Narum et al. 2004; Buonaccorsi et al. 2011). Such a hypothesis is consistent with the general idea that speciation in rockfishes likely involves both allopatric and sympatric processes, including habitat differentiation associated with depth gradients (Ingram 2011) and mate choice reinforced by internal fertilization (Buonaccorsi et al. 2011).
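The pairwise FST value cited above summarizes how much of the total allelic variation is partitioned between two species. This section does not state which estimator produced that value, so the following Python sketch uses Nei's G_ST, a simple frequency-based analogue of FST, purely as an illustration; it is not expected to reproduce the reported 0.015. Weighting the two taxa equally and combining loci as a ratio of sums are assumptions, and the function and variable names are hypothetical.

```python
from typing import Dict

# Hypothetical layout: locus -> microhaplotype allele -> frequency
AlleleFreqs = Dict[str, Dict[str, float]]


def expected_het(freqs: Dict[str, float]) -> float:
    """Expected heterozygosity 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(pop1: AlleleFreqs, pop2: AlleleFreqs) -> float:
    """Multilocus Nei's G_ST = (H_T - H_S) / H_T for two populations,
    weighting them equally and combining loci as a ratio of sums."""
    numerator = denominator = 0.0
    for locus in pop1.keys() & pop2.keys():  # loci scored in both taxa
        f1, f2 = pop1[locus], pop2[locus]
        alleles = f1.keys() | f2.keys()
        pooled = {a: (f1.get(a, 0.0) + f2.get(a, 0.0)) / 2.0 for a in alleles}
        h_s = (expected_het(f1) + expected_het(f2)) / 2.0  # mean within-taxon
        h_t = expected_het(pooled)                         # pooled total
        numerator += h_t - h_s
        denominator += h_t
    return numerator / denominator if denominator > 0 else 0.0


# Toy example: two hypothetical taxa with opposite allele frequencies
taxon_1 = {"loc1": {"AC": 0.9, "GT": 0.1}}
taxon_2 = {"loc1": {"AC": 0.1, "GT": 0.9}}
print(round(nei_gst(taxon_1, taxon_2), 3))  # 0.64
```

Values near zero, like the one reported for S. carnatus and S. chrysomelas, indicate that nearly all of the variation at the surveyed loci is shared between the two taxa.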
Previously described rockfish species relationships relied heavily on mitochondrial DNA data (Kai et al. 2003; Li et al. 2006; Hyde and Vetter 2007; Li et al. 2007), providing an opportunity to use the nuclear markers from this study to estimate phylogenetic relationships for comparison (Fig. 1, Fig. S3, Fig. S4). Rooted and unrooted maximum-likelihood trees produced consistent topologies with very similar branch support, although some deeper nodes garnered higher support in the unrooted tree, while other nodes were better supported in the rooted tree (Fig. 1, Fig. S4). Nodes with high confidence in the Bayesian tree were generally well supported in the maximum-likelihood tree, with most differences occurring at nodes with lower support, such as the placement of either S. alutus or S. borealis in a clade with S. melanostictus and S. aleutianus (Fig. 1, Fig. S3). Few well-supported Bayesian relationships deviate from the maximum-likelihood tree, although S. polyspinis presents one such case. The Bayesian tree topology from our data is the most appropriate for comparison with the phylogeny in Hyde and Vetter (2007) because the analyses are equivalent; moreover, although Bayesian methods can overestimate node support, bootstrapped maximum-likelihood values may be overly conservative (Douady et al. 2003).
Most relationships are consistent between the microhaplotype tree topologies and the more complete Sebastes tree from Hyde and Vetter (2007). Although Hyde and Vetter (2007) analyzed species that are absent from our dataset, primarily from the northwest Pacific and North Atlantic, we include representatives of each major clade except the subgenera Sebastocles and Mebarus. The constituents of these two subgenera occur exclusively in the northwest Pacific, apart from S. atrovirens, which should clearly be included in the subgenus Pteropodus. Generally, we find very high concordance with Hyde and Vetter (2007) at the subgeneric level. Areas in which the microhaplotype tree (Fig. 1) deviates from their tree include clade "D" nesting within Pteropodus, and members of Eosebastes (S. aurora and S. diploproa) nesting within Sebastichthys. At the species level, there is more variation. For example, both trees depict close phylogenetic relationships among S. atrovirens, S. carnatus, and S. chrysomelas, but the microhaplotype tree places S. maliger as a closer relative of these three species than S. caurinus, which occupies that position in the mitochondrial tree. Other small differences between the topologies include strong support that S. melanops is more closely related to S. flavidus than to S. serranoides, and that S. goodei is more closely related to S. paucispinis than to S. jordani. We also show that S. diaconus and S. mystinus are easily distinguished; their position as nearest neighbors in the phylogeny is unsurprising, since these species were only recently described as separate taxa (Frable et al. 2015).
The taxonomy of rockfishes, particularly at the subgeneric level, has been and continues to be dynamic, as highlighted by multiple revisions of subgeneric classifications (Love et al. 2002). For example, Kendall (2000), citing Eigenmann and Beeson (1894), places S. diploproa in the subgenus Sebastichthys, whereas Li et al. (2006) designate S. diploproa as a member of Allosebastes, attributed to Gilbert (1890). Phylogenetic relationships inferred from the microhaplotype data are generally consistent with mitochondrial data and support the polyphyly of generally accepted subgenera, including Acutomentum, Allosebastes, and Sebastosomus (Hyde and Vetter 2007; Li et al. 2007). A formal re-description of these subgenera would alleviate some of the taxonomic confusion, but a comprehensive taxonomic revision would require data from more species in the genus than were included in this study.
The set of nearly 100 microhaplotype loci targets substantial variation in the ascertainment species, S. atrovirens, S. carnatus, and S. chrysomelas (Baetscher et al. 2018; 2019), and contains a similar amount of variation in a closely related taxon (S. maliger). However, variation declines rapidly with increasing genetic distance (Fig. 3), even among members of the Pteropodus subgenus. Such reduced variation has been documented in studies of ascertainment bias in microsatellite loci across multiple genera (Vowles and Amos 2006). The ascertainment bias we observe here, however, is more pronounced than previously reported: nucleotide diversity decreases dramatically over relatively small evolutionary genetic distances, and only the species most closely related to those included in the marker discovery process retain substantial variation (Fig. 3). The surprising amount of variation in S. rosaceus and S. ensifer, despite their evolutionary distance from Pteropodus, might be explained by cryptic structure in those species, as indicated by the relatively high number of loci that deviated from Hardy-Weinberg equilibrium (HWE). However, selectively removing loci for individual species would be challenging with the more than 50 species included in this analysis.
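The two diversity statistics discussed here can be computed per locus as in the Python sketch below. It assumes that each species' microhaplotype sequences at a locus are available as aligned strings of equal length, which is an assumption about the input format. Nucleotide diversity is taken as the mean number of pairwise differences per site among sampled sequences, and expected heterozygosity uses gene diversity with the n/(n-1) small-sample correction. The exact estimators used in this study, and how per-locus values were averaged within species, are not specified in this section, so both choices and all names are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import List


def nucleotide_diversity(haplotypes: List[str]) -> float:
    """Mean number of pairwise differences per site among aligned
    sequences sampled at one locus."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be non-empty and equal in length")
    differences = sum(
        sum(x != y for x, y in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    n_pairs = n * (n - 1) / 2
    return differences / n_pairs / length


def expected_heterozygosity(haplotypes: List[str]) -> float:
    """Gene diversity with the n/(n-1) small-sample correction, treating
    each distinct sequence as one allele."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    counts = Counter(haplotypes)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


# Toy example: four sampled gene copies at one hypothetical locus
sample = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(nucleotide_diversity(sample))     # 7 differences / 6 pairs / 4 sites
print(expected_heterozygosity(sample))  # 4/3 * (1 - 0.375)
```

Computing these values per species and plotting them against genetic distance from the ascertainment species is one way to visualize the decline described above.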
Although the relatively low observed heterozygosity of this set of markers in the majority of species analyzed here suggests limited utility for purposes other than species identification (e.g., pedigree reconstruction), the amplicon library preparation protocol is highly flexible and enables researchers to add loci or swap in markers that would increase power for species of particular interest. Such an effort could bolster this set of markers for population genetic structure or pedigree analyses in additional species, and previous research has shown that genotyping samples with a single set of genetic markers to both identify species and analyze pedigree relationships is an economical approach (Baetscher et al. 2019).
Here, we describe an efficient method for genotyping and analyzing genetic data to identify species of rockfishes, particularly for taxa commonly captured together as juveniles. The genetic markers we employ, and our subsequent analytical workflow, provide highly accurate species identification and estimates of phylogenetic relationships largely consistent with previous genetic data. In addition, we describe a flexible protocol for modifying the set of target loci and accounting for ascertainment bias to suit the specific needs of a variety of ecological studies and fisheries management objectives.