SNPs from ddRAD-seq
We established ddRAD library and sequenced (150 bp paired-end) from 156 individual fish of 8 populations. The numbers of raw sequence read (average per sample: 16,016,724, range: 12,794,576 − 24,139,524) was obtained. After removing poor quality sequences, the average number of raw reads (15,046,821, range: 11,394,714 − 22,827,782) was applying for sequence mapping and SNP discerning. A total of 13,684,154 tags were detected and the final 32,744 high-quality SNPs showing minor allele frequency (MAF) of > 0.05 across the entire sequence data set were obtained by filtering and screening original SNPs.
For these loci, a significant difference was detected in genotype frequencies and the number of these SNPs among 8 populations. In general, SNPs statistical results of Dianchi (DC), Fuxian Lake (FXH), Chongming (CM), and Suzhou (SZ) were highly similar, in which ref and heterozygous genotypes were relatively dominant. In contrast, the results of all southern populations (GM, SS and DM) demonstrated a consistent genetic variation with more alt genotype, while Jining (JN) population was at an intermediate level (Fig.S1).
Genetic diversity and population divergence
Amplified from 156 individuals, 61,898,184 bp region of genomics gave rise to 32,744 SNP sites. Based on all these variant SNPs, the evaluations of genetic diversity, including Ho, He, π and FIS, were estimated (Table 1). There was still a marginal but distinct difference in genetic diversity among groups for different watershed regions. Among the 8 populations, the DM, GM and SS showed slightly lower nucleotide diversity (0.1937 ~ 0.1954) and expected heterozygosity (0.1850 ~ 0.1891) than the others. High genetic diversity was observed in the northern and the plateau group. In addition, the value of FIS within the plateau populations was lower than other groups (0.1113 ~ 0.1147), which might be connected with random genetic drift and the founder effect after transplantation.
AMOVA tests based on 32,744 SNPs was employed to show that the majority of the molecular variances derived from the differentiation among groups (48.8%) or within populations (50.31%) rather than within groups (Table 2). Furthermore, pairwise FST values of the southern group from the Pearl River System appeared a high level of differentiation (0.296 ~ 0.322) when compared to the remaining groups, coupling with low gene flow (0.527 ~ 0.595) (Table 3), thus corroborating on a distinct cluster in the molecular phylogenetic tree (Fig. 4). However, genetic differentiation between each pair of populations within groups was lower. Due to the transplant event, the genetic divergence of both the northern and the plateau groups did not reach a significant level in a short time (0.046 ~ 0.067), showing at least mild differentiation (FST > 0.05, Nm < 5; Wright 1984) in comparison with the other populations respectively (Table 3), but its pairwise FST increased strikingly, associated with trends of genetic divergence. And the plateau group exhibited relatively low FIS values (0.1113 ~ 0.1147), indicating a lower level of inbreeding coefficient of an individual relative to a subpopulation. Although genetic differentiation is significant among groups, no signs of isolation by distance were detected across all locations (R2 = 0.2011, p = 0.096 > 0.05).
Genetics structure, phylogeography and migration
Genetic structure and phylogenetic analysis determined that the 8 populations were divided into two clades three independent clusters, respectively. The sampled individuals from the Yangtze and Plateau Lakes Systems were assigned to one admixture by phylogenetic analysis, suggesting the close relationship between the two systems (Fig. 3). The southern group formed a distinct cluster that was well separated from the rest of populations in the phylogenetic relationship, lacking gene flow (0.527 ~ 0.595) and promoting genetic isolate. Similar phylogenetic topology was recognized by PCA which clearly divided the populations into three distinct lineages (Fig. 4).
Bayesian clustering analyses with high-density SNP markers inferred the optimal number of subgroups (K) by using Evanno's ΔK method and the likelihood at each K (Evanno et al. 2005). As the most parsimonious partitioning of individuals within species, the estimates for ΔK fall into two distinct patterns: the highest ΔK values at K = 3, followed by secondary values K = 4 (Fig. 5a, b). Nevertheless, these analyses achieve the lowest likelihood at K = 3 and the ancestral barplots showed that the plateau groups and the northern were grouped into one cluster, which was not consistent with previous phylogenetic analysis. The barplot at K = 4 reasonably reflected the complex relationship that individuals can be attributed to 4 genetic clusters, identifying to the ancestral components, which they denoted as northern group (orange and pink), plateau group (green and pink), and the southern group (blue and orange) with no admixed genotype (Fig. 5c).
The analysis of population splitting patterns, the direction and magnitude of migration was conducted by TreeMix based on our SNP data (Fig. 6), and the tree rooted using SNPs variants of H. quoyi, one sister genus of H. intermedius, which revealed two key results: First, within two main clades, GM, SS, and DM populations, comprising of the southern groups, were well separated from the remaining sites, while the Plateau and northern groups were admixtures with strong support. 7 historical migration events were added to the tree sequentially. For all SNPs, the migration events were inferred for several pairs of taxa, including gene flow from the whole five population except southern group into CM population (0.320), from northern and plateau groups into southern group (0.440), and from the northern group into Yangtze River System (SZ and CM; 0.345), as well as another 4 weak migration events (0.004 ~ 0.055). Noteworthily, the total direction of migration was non-random, moving from higher to lower altitude and higher to lower latitude.
Demographic history of ancestry populations
To assess deviations from neutrality across whole groups, the results for the overall SNPs dataset of H. intermedius showed that the significant positive value (mean Tajima's D = 2.773; Fig. 7), which revealed that the medium-frequency loci were polymorphic and dominant in the genomes, indicating effects of balancing selection or population subdivision, which maintained the higher genetic variance and contributed to local adaptation. There were fewer negative Tajima's D values and low-frequency loci, proving no strong positive selection.
Demographic history analysis based on posterior probabilities was conducted by ABC analysis to interpret the history of the genetic clusters detected in the phylogeography. Prior to this, we hypothesized that 4 scenarios were consistent with the demography of H. intermedius (Fig. 8c). The analysis results indicated that support for scenario 2 reached the highest level with a probability of 0.34 in direct approach (Fig. 8a) and 0.7577 in logistic approach (posterior probability, Table 4, Fig. 8b), whereas other models exhibited lower probability. Based on the demographic model, the scaled model values of effective population sizes varied greatly among regions: the population groups of PL had large population sizes (6.58e + 3), whereas the northern populations (NP) were smaller (6.53e + 3), and the southern population (SP) sizes was only 6.40e + 2(Table.4).
Morphology and phenotype-environment association
Analysis distinguished pronounced changes of shape associated with the bases of the dorsal and anal fins. Furthermore, the numbers of dorsal and anal fin rays showed a significant difference between the south groups and the remaining groups (P < 0.05), indicating the appearance of dramatic phenotypic difference which coincided with the genomic differentiation.
To explore the relationships between morphological traits and environmental heterogeneity, all the measured phenotypic and environmental parameters were conducted by RDA analysis (the redundancy analysis). After the Monte Carlo tests and the forward selection from RDA, water transparency, water temperature, and daily temperature range were selected as the key environmental variables which significantly influence morphological traits of H. intermedius (P < 0.05; Fig. 9a, b), but dissolved oxygen content also contribute to certainly influence on the traits with no significant p-value(p = 0.094 > 0.05). Ordination diagram of RDA displayed that the first two axes cumulatively explained 86.71% of the variance of the phenotype-environment relationship (Axes 1 and 2 modeled 53.7% and 33% of phenotype data respectively). Of the four variables, only water temperature was highly positively paralleled to the axis 1 of redundancy analysis, whereas daily temperature range, DO (dissolved oxygen) and transparency was negatively correlated with both axes. Axis 1 explained 53.7% of the total variability on the phenotype-environment relationship, while the second axis explained 33% of this variance in morphological traits (Table 5).
One main gradient along axis 1 was observed, linking to morphological data (Fig. 9a) and sites (Fig. 9b) being separated from three groups. The phenotypic traits such as dorsal and anal fins related to DTR and DO. While the UJAW(upper jaw length) and many other traits generally preferred warmer water, BW(body weight) and SL(standard length) were correlative with high transparency (Fig. 9a). In terms of sites, the plateau group performed a strong relationship with dissolved oxygen content, daily temperature range, while the southern group distribution was associated with the water temperature and JN population was admixed with the plateau group.