Fishes (161 individuals) were captured in October 2019 from 10 locations in the Irtysh River basin, Xinjiang, China (Fig. 1, Table 1). Genomic DNA was extracted using an Ezup Column Animal Genomic DNA Extraction Kit from Sangon Biotech (Shanghai) Co., Ltd. Extracted DNA was stored at 4°C for less than a week, until further use.
Table 1
Sample locations including species, sample location, sample population code, drainage basin, number of individuals screened for both mtDNA and microsatellite loci, and geographical coordinates for Thymallus samples in this study
Sample location | Sample code | Number of individuals | | |
| | mtDNA | microsatellite | Lat. (N) | Long. (E) |
Kayiertesi River | KY | 25 | 12 | 47°38′31″ | 89°44′55″ |
Karaertis River | KL | 25 | 45 | 47°58′53″ | 88°40′10″ |
Crane River | KE | 16 | 14 | 47°54′38″ | 88°7′25″ |
Hongqi Reservoir | HQ | 15 | – | 48°4′33″ | 87°7′38″ |
Burqin River | BE | 15 | – | 47°47′26″ | 87°5′49″ |
Chonghuer Reservoir | CE | 15 | – | 48°4′33″ | 87°7′38″ |
Hemu River | HM | 13 | – | 48°40′14″ | 87°32′57″ |
Kanas River | KN | 15 | 27 | 48°47′8″ | 87°1′56″ |
Akkaba River | BH | 7 | – | 48°30′11″ | 86°36′9″ |
Kara-Kaba River | HB | 15 | 13 | 48°43′17″ | 86°46′23″ |
To amplify Cyt b gene sequences, we used L14724 (5'-GACTTGAAAAACCACCGTTG-3') and H15915 (5'-CTCCGATCTCCGGATTACAAGAC-3') primers. For the CR (Dloop) gene sequence we used D-loop-F (5'-ACCCCTGGCTCCCAAAGC-3') and D-loop-R (5'-ATCTTAGCATCTTCAGTG-3') primers.
A 25 µL PCR reaction system was established with 1 µL of each upstream and downstream primer (concentration 10 pmol/µL), 0.2 µL of polymerase, 2.5 µL of dNTP, 2.5 µL of buffer, 1.5 µL of Mg2+, 2 µL of DNA template (concentration 50 ng/µL), supplemented with double-distilled (dd) H2O to 25 µL. The PCR program entailed pre-denaturation at 94°C for 4 min, denaturation at 94°C for 55 s, annealing at 60°C for 45 s, an extension at 72°C for 30 s, and a final extension at 72°C for 7 min, for 30 cycles. After the reaction, 3 µL of PCR product was collected, and after confirming that the CR product displayed clear and bright bands after 1% agarose gel electrophoresis, purification and sequencing were conducted by Sangon Biotech (Shanghai) Co., Ltd.
Table 2 Microsatellite loci | |
| Locus | Repeat motif | Primer sequence (5′–3′) | Ta (°C) |
Single | ClaTet1 | (GACA)13 | F: GAGCCCATCATCACTGAGAAAGA | 60°C |
R: CTGCTACCCACAAACCCCTG |
4Plex | BFRO004 | (GT)11 | F: GCTCCAGTGAGGGTGACCAG | 58°C |
R: AGGCCACTGATTGAGCAGAG |
BFRO010 | (AC)17 | F: GGA CGG AGC CAG CAT CAC | 58°C |
R: GTTTCTTGATTTCATAATCAGGTCAATAGTCAT |
Tar103 | (ATCC)7TCC(ATCC)14 | F: CAGTCGGGCGTCATCACGGGGATCAATAAAGTATCC | 58°C |
R: CTTCACTGTCGCTGTGAGTAC |
Tth445 | (GATA)20 | F: TGA CGG CTA CAG GAA TTGT | 58°C |
R: GTTTCTTCCACAGAGGGTTCTACATTG |
5Plex | Tar100 | (CTTT)5CTTC(CTTT)18 | F: CAGTCGGGCGTCATCATTTGGATGTGTCAGACCTG | 58°C |
R: GAGAAAGCAAGGAGAAATCAC |
Tar101 | (CTTT)22 | F: CAGAGCACACCAAGCAGAG | 58°C |
R: GTTTCTTAGGGCAAGTCATTCCAGTC |
Tar110 | (TAGA)30 | F: GCAATAACAATTCCATGAGAAG | 58°C |
R: GTTTCTTCTCCTCTGATTCCAAGAAATG |
Tar112 | (TATC)7 | F: CAGTCGGGCGTCATCACCTGGGAATCAACAAAGTATC | 58°C |
R: AGGAGGTTCAGTGAGTGTTTC |
Tth313 | (GAGT)22 | F: AAACCAGTCCAAGCGAGAG | 58°C |
R: GTTTCTTCTCCTGTTTATCACATGA |
Microsatellite sequences are detailed in Table 2, as cited from previous studies (Weiss et al. 2020b). A 25 µL PCR reaction system was established containing 0.5 µL of each forward and reverse primer, 0.5 µL of dNTP (mix), 2.5 µL of Taq Buffer (with MgCl2), 0.2 µL of Taq polymerase, and supplemented with ddH2O to 25 µL. PCR amplification was performed using an ABI VeritiTM 96-well PCR machine, with pre-denaturation at 95°C for 5 min; denaturation at 94°C for 30 s, annealing at 60°C for 30 s, an extension at 72°C for 30 s, repeated for 10 cycles; denaturation at 94°C for 30 s, annealing at 55°C for 30 s, and an extension at 72°C for 30 s was repeated for 30 cycles; with a final extension at 72°C for 10 min. The size of the microsatellite loci was determined using a 3730xl DNA Analyzer.
mtDNA data analysis
The CR (Dloop) sequences were aligned with those of T. brevicephalus (N = 33) from existing literature, T. brevirostris sequences (N = 27), those of T. nikolskyi (N = 3) (Knizhin et al. 2008; Weiss et al. 2020b), and six outgroup taxa (Froufe et al. 2003b; Jacobsen et al. 2012; Koskinen et al. 2002b; Wang et al. 2019); 230 sequences were obtained. Multiple sequence alignment was performed using the ClustalW method in MEGA 11.0.13 (Larkin et al. 2007; Tamura et al. 2021), and haplotype analysis was conducted using DnaSP 6.12.03 (Rozas et al. 2017). The best nucleotide substitution models for Maximum Likelihood estimation (ML) and Bayesian Inference (BI) were identified as HKY + Gamma using MEGA 11.0.13 and MrModeltest 2 (Nylander 2004; Tamura et al. 2021). ML estimation analysis with 1000 iterations was performed using the HKY + G model in raxmlGUI 2.0 (Edler et al. 2021). BI was conducted in MrBayes 3.2.7, with the number of runs set to 2 × 107 (Ronquist et al. 2012). Haplotype networks were generated using PopArt1.7 (Leigh and Bryant 2015), with undefined states masked. To gain further insights into relationships among Thymallus species in the upper Irtysh River, the obtained 166 Cyt b sequences were trimmed and overlaid with the CR (Dloop) sequences. Using the same methodology, a combined sequence analysis of haplotypes was conducted to generate a haplotype network.
To estimate divergence times between species, a time-calibrated Cyt b + CR phylogeny was constructed using BEAUti in BEAST 1.10.4. This BI phylogenetic analysis employed the HKY + G substitution model (Drummond and Rambaut 2007). A molecular clock was modeled using uncorrelated relaxed molecular clock priors, specifically lognormal distributions (Suchard and Rambaut 2009). The tree prior was modeled using the birth–death process. The mtDNA molecular clock calibration for Salmoniformes species was set at 1% per MY, with a relative death rate following a normal distribution (mean 0.01, SD 0.002). Two divergence times were specified: the most recent common ancestor of Salmoniformes around 50 MY (lognormal distribution, offset 50, mean 10, SD 1) and the period around 0.13 MY when Thymallus baicalensis entered Lake Baikal (normal distribution, mean 0.12, SD 0.1) (Koskinen et al. 2002b; Weiss et al. 2021). Iterations were run 30 × 106 times, with sampling conducted every 3000 iterations. Generated data were assessed for convergence and effective sample sizes (> 200) using Tracer v1.7.2 (Rambaut et al. 2018). The final tree was constructed using TreeAnnotator v1.10.4, exported in FigTree v1.4.4 (Rambaut 2012), and optimized on the iTOL (Letunic and Bork 2021).
To further investigate if Thymallus species in the upper Irtysh River were conspecific, and to explore differences between them and other species, we used DnaSP v6.12.03 to compute nucleotide diversity (Rozas et al. 2017). Additionally, MEGA v11.0.13 was used to estimate inter-population genetic distances based on Kimura 2-parameter and Tamura–Nei models (Tamura et al. 2021). Analysis of molecular variance (AMOVA) was performed using Arlequin 3.5 to detect genetic variation among geographic populations or groups, and genetic variation indices (FCT, variance among groups relative to total variance; FSC, variance among populations within groups; FST, variance among populations) (Excoffier and Lischer 2010). The significance of covariance at different levels of genetic structure was tested through 1000 resampling iterations.
Tajima’s D test and Fu’s Fs test in Arlequin 3.5 software (Excoffier and Lischer 2010), as well as the sum of squared deviation (SSD) and Harpending’s raggedness index (r), were used to infer population historical dynamics. Mismatch distribution plots were observed using DnaSP v6.12.03 (Rozas et al. 2017). BEAST 2.7.4 (Drummond and Rambaut 2007) using Bayesian Skyline Plot (BSPs) was used to infer historical dynamics and approximate time ranges of the effective sample size for each genetic lineage, with parameter settings consistent with those used in divergence time analysis. Finally, Tracer 1.5 software was used to visualize and edit the historical dynamics of effective population size (Rambaut et al. 2018).
Nuclear microsatellite genotyping and data analysis
Preliminary analysis indicates that Thymallus populations in the upper Irtysh River are conspecific. Because of maternal inheritance of mtDNA (Wallace 2007), we used 10 microsatellite loci markers for supplementary phylogenetic reconstruction to further understand genetic relationships between populations. Using FSTAT v2.9.3.2 (Goudet 2001), we calculated the number of alleles per locus, deviations based on the FIS values for Hardy–Weinberg equilibrium, and deviations from Linkage Equilibrium. ARLEQUIN v3.5.2.2 was used to calculate observed and expected heterozygosity, as well as to perform analyses based on the infinite allele model (FST), stepwise mutation model (RST), and AMOVA (Excoffier and Lischer 2010). GenAlEx was used to calculate Nei’s genetic distance among populations. Principal coordinates analysis (PCoA) was performed based on genetic distances between populations and individuals (Peakall and Smouse 2012). A clustering tree was constructed based on Nei’s genetic distance using MEGA (Tamura et al. 2021).
Overall genetic structure was assessed using the Bayesian clustering method in STRUCTURE v2.3 (Porras-Hurtado et al. 2013). Prior values of K (number of populations) were assumed between 1 and 5. STRUCTURE was run for 100,000 iterations, with the first 50,000 iterations discarded as burn-in, and five independent MCMC replicates were performed for each K value. The Delta K statistic method (Earl and vonHoldt 2005) were used to determine the most suitable K value. After obtaining output results, K matrices from multiple runs were merged using CLUMPP (Jakobsson and Rosenberg 2007); the visualization was optimized using Distruct (Rosenberg 2004).
To further decipher the potential genetic structure of Thymallus in the upper Irtysh River, a cluster analysis was conducted using the stochastic optimization method in BAPS 6.0 (Corander et al. 2008; Corander and Marttinen 2006). BAPS analysis parameters were initially set to K = 1 to 5, with each K value repeated 10 times; the maximum Log marginal likelihood [Log(ML)] value determines the best K. For population admixture analysis, the number of iterations was set to 1000; other settings remained at default values.
To further understand divergence time and population history, we used the Approximate Bayesian Computation (ABC) algorithm implemented in the diyabcGUIv1.2.1 package (Collin et al. 2021). Population parameters for five hypothetical evolutionary scenarios were simulated and compared to observed data using Linear Discriminant Analysis (Cornuet et al. 2014). We used default uniform and wide priors for all parameters.