Collections. A total of 76 tissue samples were obtained from crayfish morphologically identified as F. shoupi from three sites within MC, Tennessee and one site in the Tennessee River at Pickwick Tailwater (Table 1, Fig. 1). Collections at PW were conducted by scuba divers operating from an anchored dive boat at depths ranging from approximately 2-4 meters and a river width of 308 meters. Crayfish were collected by overturning loose slab rocks and quickly grabbing specimens with gloved hands. The specimens were placed in mesh holding bags, brought to the surface where they were processed for data before being released.
Sampling sites in the MC drainage were selected to span the currently known distribution of the species with portions of the upper, middle (type locale) and lower reaches serving as designated sites and to facilitate comparisons between MC and PW. Tissues from outgroup species were collected from the Green River in Buffalo River Drainage (F. durelli, type locality), Barrens Fork in Collins River Drainage (F. placidus, BF population) and Sinking Creek in the Lower Cumberland - Old Hickory Lake Drainage (F. placidus, type locality) (Table 1). Tissue samples from F. shoupi were obtained from leg sections separated at autonomizing joints (approximately 5 mm) and individuals were immediately released on site. Crayfish are known to undergo autotomy of appendages and regeneration initiates after ~6 days, depending on the intermolt period (Durand 1960). Complete specimens for outgroups were preserved in 70% ethanol and will be deposited as voucher specimens in the Tennessee Tech Ichthyology collection. Tissues were stored in RNAlater at -20°C for preservation of DNA prior to extraction.
Mitochondrial sequencing. Genomic DNA was extracted from leg tissues using the EZNA Tissues DNA Kit (Omega Bio-tek) except that DNA was eluted in water in the final step. The 5’ end of the mitochondrial gene cytochrome oxidase I (COI) was amplified using universal primers LCO1490 (GGTCAACAAATCATAAAGATATTGG ) and HCO2198 (TAAACTTCAGGGTGACCAAAAAATCA) (Folmer et al. 1994). Conditions for polymerase chain reactions (PCR) were as follows: initial denaturation step of 5 minutes at 95°C followed by 35 cycles of 15 s at 95°C, 15 s at 54°C, 45 s at 72°C. This program ended with a final extension of 10 minutes at 72°C. PCR products were cleaned prior to cycle sequencing reactions by exonuclease I/shrimp alkaline phosphatase (New England Biolabs) and used for bi-directional Sanger sequencing on an ABI 3730 automated sequencer (MCLAB). Sequence chromatograms were imported and visualized using SEQUENCHER version 5.2 (Gene Codes Corp.). Sequences were aligned using the ClustalW alignment algorithm (Thompson et al. 1994) as implemented in Bioedit (Hall 1999). Alignments were refined by eye and protein-coding genes were examined for stop codons to check for miscalled bases and pseudogenes.
Mitochondrial Haplotype Diversity. Estimates of genetic variation within sampled populations at the mitochondrial COI gene, including genotypic diversity (HGD) and nucleotide diversity (πM) were performed in Arlequin 3.5.2 (Excoffier et al. 2010). Mitochondrial haplotypes from all F. shoupi individuals and outgroups were trimmed to an equal length of 632 base pairs and used in a reconstruction of a minimum spanning haplotype network as performed by the package pegas (Pardis 2010) in the R-studio suite (Racine 2012). Structuring of genetic variation for mitochondrial COI haplotypes was investigated using Analysis of Molecular Variance (AMOVA) as performed by Arlequin (Excoffier et al. 1992). The hierarchical analysis investigated the proportion of molecular variance due to difference between drainages (i.e. MC populations and PW; FST), variation between populations within drainages (FSC), and variation within sampled populations (FCT).
Phylogenetic Analyses. Phylogenetic reconstructions of mitochondrial haplotypes were estimated using both Bayesian and maximum likelihood (ML) optimality criteria. Both analyses assumed the HKY-Gamma model of evolution (Hasegawa et al. 1985); this was selected as the best-fit model under Bayesian Information Criteria in as performed by MEGA X (Kumar et al. 2018). A Bayesian phylogenetic reconstruction was performed using MrBayes version 3.2.7a on the CIPRES Science Gateway (Miller et al., 2010). The Markov chain Monte Carlo (MCMC) algorithm ran for 10,000,000 generations, sampling every 1,000 generations. Two independent runs were performed and the resulting trees were combined after the deletion of a burnin (first 10% of trees). A majority-rule consensus tree was generated and nodal support was estimated by posterior probabilities. A ML analysis was performed using MEGA X and the tree with the highest log likelihood was retained. Nodal support was estimated by bootstrap analysis with 1000 replicates.
Genotyping by Sequencing. Genotyping by sequencing (GBS) was used to identify and genotype SNPs across all four F. shoupi sites, F. placidus LC and F. placidus BF (Table 2) following the protocol described by Elshire et al. (2011). Genotyping by sequencing library construction used the same DNA extractions that were used for mitochondrial Sanger sequencing. Extracted DNA was standardized to 5 ng/μl and quantified using the Quantifluor dsDNA System (Promega). Standardized DNA was digested with the restriction enzyme ApeKI (New England Biolabs) and barcoded adapters were ligated to cut restriction sites. Each individual was labelled with a unique barcode. Ligated DNA was pooled and PCR amplified using primers complementary to ligated adapters. The resulting PCR product was cleaned using the Qiaquick PCR purification kit and the distribution of PCR fragment lengths was measured using an Agilent 2100 BioAnalyzer (Agilent Technologies). The resulting library was cleaned using the AxyPrep Mag bead purification system (Agilent) and sequenced on an Illumina NexSeq 500 (Illumina Inc.) with the 75 bp, single-end read chemistry.
Genotyping and SNP filtering. Single Nucleotide Polymorphisms were identified and filtered using the de novo map STACKS pipeline (Catchen et al. 2013). In ustacks, the minimum number of raw reads needed to form a stack (allele) was set to m=3 and the number of mismatches between alleles within individuals M=2. A catalog of loci was constructed in cstacks where n=2 mismatches were allowed between sample tags. The populations function was used to estimate population summary statistics and to generate input datasets for downstream analyses. One source of bias in GBS datasets is mutations that occur at restriction enzyme cut-sites; the accumulation of substitutions in restriction sites can influence population genetic inferences by causing allele drop-out and biasing datasets to more conserved regions of the genome (Guatier et al. 2013; Silliman et al. 2021). We investigated the influence of filtering parameters for missing data on the retention of loci and variable sites using the filtering flags in the populations function. Estimates of contemporary effective population sizes and tests for bottlenecks used the p1r70 dataset, where SNPs were required to be present in 70% of individuals in a single population. This dataset was used to maximize the number of loci within populations instead of shared loci across populations. Estimates of within population genetic variation and population structure (Bayesian assignment tests, DAPC, and pairwise FST’ and ΦST estimates) were performed using the p2r60 dataset, so that only SNPs present in at least two out of four of the F. shoupi populations were included in these analyses. Maximum heterozygosity was set at 0.60 to remove homeologous loci. Tests for Hardy-Weinberg equilibrium (HWE) were performed in Plink 2.0 (Chang 2015) and loci with p-values below 0.01 in any single populations were removed from further analyses.
Genetic Variation at SNP loci. Estimates of genetic variation including mean observed heterozygosity (Ho), expected heterozygosity (He), and nucleotide diversity at variable sites (πN) were estimated in the program populations within the STACKS workflow. We estimated the average allelic richness (ARN) and private allelic richness (PRN) using a rarefaction approach to correct for unequal sample size across sites as implemented in the software ADZE (Szpiech et al. 2008). Due to the loss of shared loci across species we did not include F. placidus LC and F. placidus BF in rarefaction analyses.
Demographic Histories. Contemporary effective populations size (Ne) estimates for all populations of F. shoupi and F. placidus were performed using the LD method (Waples and Do 2008), as implemented in NeEstimator v2.01 (Do et al. 2014). The LD method measures the strength of association of alleles at independent loci to infer the degree of drift as a function of Ne. A threshold of 0.02 for the allelic frequency was chosen in order to reduce bias resulting from low-frequency variants. Confidence intervals (95%) were estimated using parametric bootstrap. Linkage disequilibrium based estimates of Ne were adjusted for physical linkage using the equation by Waples et al. (2016).
We used the estimated chromosome number from the congeneric F. virilis for Ne adjustments (Chr = 200, Mlinarec 2016).
Rapid declines in population size can also leave a signature on patterns of allelic variation within populations due to a loss of low frequency alleles. We tested for evidence of recent population bottlenecks using the program BOTTLENECK (Piry et al. 1999). A test for excess heterozygosity (compared with mutation-drift equilibrium) was performed using the 1-tailed Wilcoxon signed rank test (α = 0.05), which detects historical bottlenecks (0.25 – 2.5 times Ne generations; Cornuet and Luikart 1996). Simulation parameters were set for 1000 iterations and assumed an infinite allele model (IAM). We also tested for a mode-shift in allele frequency distributions to detect a genetic signature of more recent bottlenecks (Luikart et al. 1998). Recent bottlenecks result in a rapid loss of low frequency alleles, resulting in the allele frequency spectrum to a greater number of alleles at intermediate frequencies then expected under mutation-drift equilibrium. The allele frequency shift is detectable for several dozen generations.
Population Structure. Pairwise population differentiation at SNP loci was summarized using pairwise FST’ and ΦST as implemented in the program Populations in the STACKS workflow. FST’, an analogue of FST, is a standardized measure of differentiation that is independent of the amount of genetic variation that is present within populations (Meirmans 2006). The ΦST statistic is an AMOVA-based measure of population differentiation that incorporates information regarding the molecular distance between haplotypes (Excoffier 1992).
A Bayesian assignment test was implemented in the software STRUCTURE v. 2.3.4 (Pritchard et al. 2000) on all F. placidus and F. shoupi populations; F. durelli was not included due to small sample size. We performed ten independent runs for each value of K (1-6) assuming the admixture model. Markov chain Monte Carlo (MCMC) settings included two independent MCMC chains, each with a burnin of 50,000 iterations followed by 100,000 iterations. The optimal valued of K was examined by plotting the average natural log of the probability of the data for K=1-6 (Ln Pr(X|K); Pritchard and Wen 2003). We also investigated the optimal value of K using the ΔK method as implemented in Structure Harvester; however, due to small standard deviation across replicates for K=3, the ΔK method could not be performed (Evanno 2005; Earl 2012). Permuted membership assignment coefficients from the ten independent runs were estimated using Clumpp v. 1.1.2 (Jokobsson and Rosenberg, 2007). Structure bar plots were generated using STRUCTURE PLOT (Ramasamy et al. 2014).
Discriminant analysis of principal components (DAPC) was performed in R using the package ‘adegenet’ (Jombart et al. 2010). This method determines the number of populations or groups without prior knowledge of sample origins. Discriminant analysis of principal components differs from Bayesian clustering in that it does not assume that loci are unlinked and that populations are randomly mating. The optimal number of clusters was determined using BIC as performed by the find.clusters function.
Divergence time Estimation. Divergence times estimates for the split between the F. shoupi MC sites and PW were estimated using the Bayesian multispecies-coalescent-with-introgression (MSci) method implemented in the A00 model in BPP v.4.2 (Yang 2015). The input tree for the BPP analysis was based on results from the mtDNA gene tree. Two different models were used for divergence time estimates. The first model assumed no migration after initial population divergence between F. shoupi populations at MC and PW. The second model allowed bidirectional migration between the two drainages. Each analysis of 50,000 MCMC generations was ran twice from different starting seeds with a burn-in period of 5000 using algorithm 0 (default fine-tuning parameter, e = 2) and an estimated heredity (a = 4, b = 4); this gave consistent parameter estimates between replicate runs and ESS values >1000 for all parameters. We consider speciation probability values >0.95 as strong support for all speciation events. The theta prior (3, 0.04), tau prior (3,0.2) phi prior (1,1) finetune automatic update. The posterior probabilities of the model with migration (M1) and without migration (M2) were compared using the stepping-stones method as implemented in the bppr package (https://github.com/dosreislab/bppr, see also, and Rannala and Yang 2017). The stepping-stone method analysis used 16 independent MCMC chains for each model (50,000 generations per chain).