Genetic variation of S. variegatus populations
Seventy haplotypes of the COI gene and 67 haplotypes of the Cytb gene were identified from the 15 populations. The S. variegatus COI fragment (652 bp) and Cytb fragment (421 bp) have 45 (6.9%) and 40 (9.5%) variable sites, with 28 and 23 parsimony-informative sites, respectively (Table 1). The base composition of both genes is biased toward adenine (A) and thymine (T), with A+T contents of 67.5% and 73.3%, respectively, which is common for insect mitochondrial genes. For the COI gene, haplotype diversity (Hd) ranges from 0.424 to 0.913 (mean = 0.865) and nucleotide diversity (π) ranges from 0.00072 to 0.00462 (mean = 0.00427) (Table 1). Similarly, for the Cytb gene, Hd ranges from 0.464 to 0.833 (mean = 0.834) and π ranges from 0.00119 to 0.00539 (mean = 0.00479) (Table 1).
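As a reminder of what Hd and π measure, the following minimal sketch computes both from a set of aligned sequences, using the usual definitions (Hd = n/(n-1) × (1 - Σp_i²); π = mean number of pairwise differences per site). The toy alignment and the function names are hypothetical and are not taken from this study; the values in Table 1 come from the authors' own analysis, which may differ in detail (for example, in the handling of gaps or missing data).

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(sequences):
    """Haplotype diversity: Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(sequences)
    counts = Counter(sequences)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(sequences):
    """Nucleotide diversity (pi): mean pairwise differences per site."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical, NOT data from this study):
# four individuals, six sites, three distinct haplotypes.
toy = ["ATTAAT", "ATTAAT", "ATTTAT", "ACTTAT"]
hd = haplotype_diversity(toy)   # 5/6
pi = nucleotide_diversity(toy)  # 7/36
```

In this toy case, three haplotypes among four individuals give Hd = 5/6, and 7 differences over 6 pairs × 6 sites give π = 7/36.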
Haplotype analyses of the COI and Cytb genes
The distribution of the haplotypes of the two genes across the studied populations is shown in Table S1. Rarefaction analyses showed that the curves converged on an asymptote (Fig. S1). The COI haplotypes (H1-H70) include 34 (48.6%) unique haplotypes (Table S2). The four most frequent haplotypes (H1-H4) were found in 132 (30.2%), 59 (13.5%), 29 (6.6%), and 60 (13.7%) individuals, respectively (Table S2; Fig. 2a). Haplotype 1 (H1) was present in all populations except GDQH, FJCQ and ESHB, whereas haplotype 2 (H2) was found only in the GYSC, HZSX, AKSX, FJCQ, ESHB and LCHB populations (Table S2). The Cytb haplotypes (H1-H67) include 35 (52.2%) unique haplotypes, while the remaining 32 were observed in more than one individual (Table S2). The three most frequent haplotypes (H1-H3) were found in 158 (36.2%), 61 (14.0%) and 48 (10.9%) individuals, respectively (Table S2; Fig. 2b). Haplotype 1 (H1) was found in all populations except ESHB, whereas haplotype 3 (H3) was found only in the AQAH, LAAH, HFAH, CHAH, NJJS and ZJJS populations (Table S1).
The haplotype distribution and haplotype network analyses (see below) of both the COI and Cytb genes revealed that S. variegatus populations could be divided into three major geographical regions or haplogroups: the northwestern China (NW) haplogroup (GDQH, HZGS and ZYGS populations), the central China (CC) haplogroup (GYSC, HZSX, AKSX, FJCQ, ESHB and LCHB populations) and the central and eastern China (CE) haplogroup (AQAH, LAAH, HFAH, CHAH, NJJS and ZJJS populations) (Fig. 1).
In the haplotype network of the COI gene, only one haplotype (H1) was shared among all three haplogroups. Haplotype 2 (H2) was abundant and detected only in the CC haplogroup, and haplotype 3 (H3) was found only in the CE haplogroup. Six haplotypes (H4-H9) were shared between the NW and CC haplogroups. A total of five missing haplotypes were inferred across all populations (Fig. 2a). Similarly, in the haplotype network of the Cytb gene, two haplotypes (H1 and H4) were shared among all three haplogroups. Haplotype 2 (H2), the most abundant, was detected only in the CC haplogroup, and haplotype 3 (H3) was found only in the CE haplogroup. Haplotypes H5-H6, H7, and H8-H9 were shared between the NW and CC, the NW and CE, and the CC and CE haplogroups, respectively. A total of four missing haplotypes were inferred in the CC haplogroup (Fig. 2b).
Population genetic differentiation
To further assess whether the three inferred clusters of S. variegatus populations are genetically distinct, Bayesian clustering analysis was performed using STRUCTURE. The most likely value of K, chosen with Evanno’s ΔK method, was 3, again indicating a division of genetic variation into three clusters. The proportion of each population assigned to each of the three clusters is shown in Figure 3. Clusters 1 (red) and 2 (yellow) were composed mainly of the NW and CC populations, respectively, and the CE populations were assigned mainly to cluster 3 (green).
Strong genetic divergence was observed across populations (FST = 0.425, P < 0.0001, Table 2). The FCT value among the three regions (NW, CC and CE) was highly significant (FCT = 0.470, P < 0.0001, Table 2), further demonstrating that S. variegatus populations in China are divided into three regions. Based on the combined data of the COI and Cytb genes, significant genetic differentiation was also observed among populations within regions (FSC = 0.072, P < 0.0001, Table 2) and within populations (FST = 0.508, P < 0.0001, Table 2). The percentages of genetic variation within populations (60.16% in the comparison between the NW and CC regions, and 56.00% in the comparison between the NW and CE regions) were markedly higher than those between regions (33.89% between the NW and CC regions, and 33.88% between the NW and CE regions) (Table 2). However, the percentage of genetic variation between the CC and CE regions (54.95%) was higher than that within populations (42.82%) (Table 2), indicating limited gene flow between the CC and CE regions.
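The three hierarchical fixation indices are not independent: in a three-level analysis they satisfy the standard identity (1 - FST) = (1 - FSC)(1 - FCT). The short check below, using only the values quoted in the text, suggests that FST = 0.508 is the index from the three-region hierarchical analysis, whereas FST = 0.425 comes from the analysis without regional grouping. The variable names are illustrative only.

```python
# Fixation indices as quoted in the text (Table 2 of the source).
f_ct = 0.470  # among regions
f_sc = 0.072  # among populations within regions

# Hierarchical identity: (1 - F_ST) = (1 - F_SC) * (1 - F_CT)
f_st_implied = 1.0 - (1.0 - f_sc) * (1.0 - f_ct)
print(round(f_st_implied, 3))  # 0.508
```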
The pairwise FST values among populations, based on the combined data of the COI and Cytb genes, ranged from -0.015 to 0.811 (Table 3). Of the 105 pairwise comparisons, 88 showed significant genetic differentiation. The pairwise FST values among populations within the CC and CE regions were less than 0.159, whereas those between populations from the CC and CE regions were above 0.409. In addition, the pairwise FST values among regions were high and significant (FST > 0.25, P < 0.001, Table 4), and estimated gene flow among regions was low (Nm < 1, Table 4), suggesting limited genetic exchange among regions. These results are consistent with those of the analysis of molecular variance (AMOVA) described above.
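Two of the numbers above can be illustrated with a short sketch. The count of 105 is simply the number of unordered pairs among 15 populations. The Nm estimate is commonly derived from FST under an island model, but the text does not state which formula was used, so the function below is an assumption: `ploidy_factor=2` corresponds to the form often applied to haploid, maternally inherited markers, Nm ≈ (1 - FST) / (2 FST), and `ploidy_factor=4` to the diploid nuclear form. The function name and parameter are hypothetical.

```python
from math import comb

# Number of unordered population pairs among the 15 populations.
n_populations = 15
n_pairs = comb(n_populations, 2)  # 105


def nm_from_fst(fst, ploidy_factor=2):
    """Island-model estimate Nm ~ (1 - FST) / (ploidy_factor * FST).

    ASSUMPTION: the source does not say which form was used.
    ploidy_factor=2 is often used for haploid mitochondrial markers,
    ploidy_factor=4 for diploid nuclear markers.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (ploidy_factor * fst)


# Lower bound of the CC-versus-CE population comparisons quoted in the text.
nm_cc_ce = nm_from_fst(0.409)
```

Under the haploid form, Nm falls below 1 whenever FST exceeds 1/3; under the diploid form, whenever FST exceeds 0.2.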
The Mantel test based on the combined data of the COI and Cytb genes revealed a significant correlation between genetic distance (FST/(1-FST)) and geographical distance among all populations (r = 0.500, P < 0.0001, Fig. 4).
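For readers unfamiliar with the procedure, a Mantel test can be sketched as a permutation test on the correlation between two distance matrices: population labels of one matrix are shuffled repeatedly, and the observed correlation is compared with the permuted ones. The code below is a minimal one-sided version written for illustration, not the software used in this study. The coordinates, the linear FST model and all names are hypothetical toy choices.

```python
import random
from itertools import combinations
from math import dist, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(gen, geo, permutations=999, seed=0):
    """One-sided Mantel test: correlation of two symmetric distance matrices.

    Rows and columns of `gen` are permuted together; the p-value is the
    share of permutations with a correlation at least as large as observed.
    """
    n = len(gen)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]
    r_obs = pearson([gen[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [gen[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example with hypothetical coordinates (NOT the study's sampling sites).
coords = [(0, 0), (100, 50), (300, 80), (420, 400),
          (800, 350), (900, 900), (50, 700), (600, 100)]
geo = [[dist(a, b) for b in coords] for a in coords]
# Hypothetical FST that grows linearly with distance (isolation by distance).
fst = [[0.0005 * d for d in row] for row in geo]
# Linearized genetic distance, as in the text: FST / (1 - FST).
linearized = [[f / (1.0 - f) for f in row] for row in fst]
r, p = mantel_test(linearized, geo)
```

Because the toy FST values are a monotone function of distance, the sketch returns a strongly positive r with a small p-value; real data would normally show a weaker relationship, such as the r = 0.500 reported here.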
Demographic analyses
The Tajima’s D values obtained with the single and combined gene data in the NW region were negative but not significant (P > 0.05, Table 1). The Tajima’s D and Fu’s Fs values in the CC and CE regions were negative and significant (P < 0.05, Table 1); however, the CE region showed significant sum of squares deviation (SSD) values (P < 0.05, Fig. 5, S2). Thus, the sudden expansion hypothesis was rejected for the NW and CE regions. In contrast, the distributions of pairwise differences obtained with the single and combined gene data in the CC region were unimodal, with non-significant SSD and Harpending’s raggedness index (Rag) values (Fig. 5, S2), suggesting an expansion event in the CC region. The tau values (τ), a rough estimate of the time since population expansion, were approximately 3.842 (COI data), 2.016 (Cytb data), and 1.595 (COI+Cytb data) mutation units for the CC region. For the NW and CE regions, τ was 1.344 and 0.766 for the COI data, 3.693 and 0.875 for the Cytb data, and 2.628 and 1.875 for the combined COI and Cytb data, respectively (Fig. 5, S2).