Sample collection and genotyping
A total of five Takifugu populations comprising 90 individuals were collected from three sites in China (Figure 1, Supplementary Table 1). After quality control of extracted DNA, all of the individuals were re-sequenced to explore the population structure and the selective signals under adaptive evolution. In total, 438.67 Gb of sequencing data were generated, while for each individual, 4.87 Gb sequences with an average of ~12.43-fold depth per sample were obtained. From 2.92 G raw reads, 2.57 G reads were mapped to the reference genome for SNP calling with an average mapping ratio of 87.87%. After the sequence alignment and filtering, a total of 11,344,115 candidate SNPs were identified for further analysis (Supplementary Table 2).
Population structure
To investigate the genetic relationships of different Takifugu species, a phylogenetic tree was constructed using all of the informative SNPs (Figure 2A). Each population formed a distinct clade that reflected the evolutionary differences among species as expected. Similar results were observed in the plot of the PCA (principle components analysis) analysis. Overall, the populations of T. rubripes, T. niphobles, and T. obscurus clustered separately and tightly, while the individuals of T. flavidus and T. bimaculatus were closely clustered together and overlapped. This was consistent with the results of phylogenetic analysis in that the clades of T. flavidus and T. bimaculatus shared one main branch point and were adjacent to each other. Four clusters from the PCA were far apart from each other with no confusing admixture. Population structure of the five species was evaluated using the Bayesian clustering program STRUCTURE. As the model K=4 obtained the highest log likelihood value, the population structure for K=4 is shown in Figure 2C. Unsurprisingly, similar results were observed; the five populations were separated into three subgroups. Outside the T. rubripes and T. obscurus subgroups, T. flavidus, T. niphobles, and T. bimaculatus clustered and shared a common ancestry. T. rubripes, T. niphobles, and T. obscurus showed a simplex inheritance background, while T. flavidus and T. bimaculatus reflected a more admixed genetic structure. T. flavidus and T. bimaculatus have a very similar genetic structure and shared some ancestral sequences with T. obscurus. When K=2, 3, or 5, only two subgroups existed, T. rubripes and the others (Figure S1). This information, which was consistent with the phylogenetic analysis, also reflected the relatively earlier differentiation time of T. rubripes and the close genetic relationships and relatively recent species differentiation of the other four species.
LD decay and pattern of haplotype blocks
LD (Linkage Disequilibrium) decay was investigated for five populations, and R2 values were calculated by PLINK and sorted by distance ranges. T. flavidus showed the highest R2 value, while the other four populations obtained significantly lower R2 values (Figure 3A). When the distance was greater than 1 kb, the mean R2 values were 0.55 in T. flavidus, 0.40–0.45 in T. rubripes, T. niphobles, and T. obscurus, and 0.38 in T. bimaculatus (Supplementary Table 3). LD decay of whole samples and saline samples showed higher R2 values than most single populations except for T. flavidus. Because of the relatively small sample sizes for T. flavidus (N=8) and T. bimaculatus (N=8), haplotypes were constructed only for the populations of T. rubripes, T. niphobles, and T. obscurus using the PLINK “-block” parameter. The distribution of haplotype block lengths was calculated using the R software ggplot2 package (Figure 3B). The block patterns were similar in three populations, while all samples and saline samples showed higher densities in smaller blocks. The maximum block lengths were 146.67 kb, 36.51 kb, and 34.22 kb in T. rubripes, T. niphobles, and T. obscurus, respectively, while the corresponding mean block sizes were 3.81 kb, 2.17 kb, and 2.42 kb (Supplementary Table 4).
Genome-wide selective sweep analysis
The genetic diversity in certain regions of the genome might be dramatically decreased as a result of natural selection or domestication. Among the five species in this study, the T. obscurus population (Ob group) is more strongly euryhaline compared with the other four species (the Saline group), which is probably due to its unique genetic background. To identify genome regions under selection in the T. obscurus genome, we scanned the genome-wide variation and allele frequency spectra based on the approximately 13.69 million SNPs. The π ratios (πSaline/Obscurus) were calculated using a 100 kb sliding-window approach with 10 kb steps. In comparison to the Saline group, the Ob group had a significantly lower level of diversity (median πSaline/Obscurus = 2.860), reflecting the fewer recombination events in the Ob population. We identified a total of 1,548 significant windows (top 5%, empirical π ratios ≥ 5.856) with median πSaline/Obscurus = 8.230 that included 554 candidate genes (Supplementary Table 5) based on the π ratio analysis. To further validate the genome regions under strong selective pressure in the Ob population, the genome regions with Fst greater than 0.232 (top 5%) were also identified, corresponding to 497 candidate genes (Supplementary Table 5). A total of 379 candidate genes shared by both the π ratio and Fst analyses were recognized as potential genes under selection (Figure 4A). Among these selected genes, many are involved in ion transportation and ATPase related pathways; those that have been reported in previous studies include atp1a3 (sodium/potassium-transporting ATPase), slc13a1 (sodium/sulfate symporter), kcna2/3/10 (potassium voltage-gated channel subfamily A member 2/3/10), slc5a8 (sodium-coupled monocarboxylate transporter), and scnb1 (sodium channel subunit beta-1) (Figure 4B). The diversity in the neighboring regions of these selected genes showed significant decrease in the Ob population compared with the Saline group, indicating possible selective sweeps adapting to the long-term changing osmotic environment. For example, in the kcna gene family, kcna2, kcna3, and kcna10 in the same sweep region (>500 kb) were identified, while another gene, slc5a8, was also found in this region. To better understand these candidate genes and their potential functions, further GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analyses also identified several key genes related to osmotic regulation. Two genes, fyn and atp2a3, were enriched in the GO term “ATP binding,” and grb2 was enriched in the “Gap junction” pathway (Figure 4B, Figure S2, and Supplementary Table 6). The results based on Tajima’s D analysis also provided supporting evidence identifying strong selective sweep signals (Figure 5). The results suggested that the genomes of the Ob population have significantly evolved adaptations to the euryhaline environment.
As the Saline group contains four species, the genetic divergence among species could well reduce the common background noise when used as a whole. The comparison would then be focused on the species’ shared lower osmoregulation ability in contrast to the Ob population. Still, it might be useful to identify more potential genes under selection by one-to-one comparisons between the Ob population and the other four populations (shown as Bi, Fl, Ni, Ru). The π ratios and Fst values for the four comparisons are shown in Figure 6A, and genes within selected regions are summarized in Supplementary Tables 7-10. The most abundant selected genes (211 genes) were found in the comparison Fl vs Ob, while 41 genes, 132 genes, and 92 genes were identified in Bi vs Ob, Ni vs Ob, and Ru vs Ob, respectively. Despite the result that similar genes were found in all four comparisons of Saline vs Ob (e.g., atp1a3, atp2a3, fyn, and scnb1), other important ion transporter genes were also identified (Figure 6B), including slc12a4 (potassium/chloride transporter) from the Bi vs Ob group and slc12a2 (sodium/potassium/chloride transporter) and slc26a2 (anion exchanger) from the Ru vs Ob group, indicating their vital roles in regulating osmotic balance. Additionally, atp2b2 (calcium-transporting ATPase 2) and aqp3 (aquaporin 3) were also discovered in selective sweep regions from three comparisons (Bi vs Ob, Fl vs Ob, and Ru vs Ob). The prlr (prolactin receptor), which was reported to be involved in salinity tolerance of Nile tilapia [29], was also identified in Bi vs Ob and Fl vs Ob comparisons. GO and KEGG enrichment analyses identified crucial pathways including “neuromast development” (contains slc26a2), “integral component of plasma membrane” (contains aqp3), and “neuroactive ligand-receptor interaction” (contains prlr), suggesting involvement of core functions of the neural system in the regulation of salinity (Supplementary Table 11). Further analyses based on Tajima’s D differences also provided evidence for previous selective sweep signals (Figure S3). These results suggested that in pairwise comparisons between the Ob population and the other four populations, the genome of the Ob population displays significant selection signatures indicating enhancement of the species’ euryhaline ability.
The most important genes under selection are summarized in Supplementary Table 12. Ion transporters comprise a large proportion of the candidate genes, while growth and neural development-related genes might play upstream roles in the regulation networks.