Genetic variation of S. variegatus populations
Seventy haplotypes of the COI gene and 67 haplotypes of the Cytb gene were identified from the 15 populations. The S. variegatus COI fragment (652 bp) and Cytb fragment (421 bp) had 45 (6.9%) and 40 (9.5%) variable sites, of which 28 and 23 were parsimony informative, respectively (Table 1). The base composition of both genes was biased toward adenine (A) and thymine (T) (67.5% and 73.3%, respectively), which is common for insect mitochondrial genes. For the COI gene, the haplotype diversity (Hd) ranged from 0.424 to 0.913 (mean = 0.865) and the nucleotide diversity (π) ranged from 0.00072 to 0.00462 (mean = 0.00427) (Table 1). Similarly, for the Cytb gene, Hd ranged from 0.464 to 0.833 (mean = 0.834) and π ranged from 0.00119 to 0.00539 (mean = 0.00479) (Table 1).
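For readers unfamiliar with the two diversity indices reported above, the following is a minimal sketch of how Hd and π are commonly computed from aligned sequences, assuming Nei's estimators (Hd = n/(n-1) × (1 - Σp²), and π as the mean number of pairwise differences per site). The function names and the toy alignment are hypothetical and are not taken from the study's data or software.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site among aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical toy alignment (4 individuals, 4 sites), for illustration only.
toy = ["ATTA", "ATTA", "ATTT", "ACTT"]
print(haplotype_diversity(toy), nucleotide_diversity(toy))
```

Both functions expect sequences of equal length with no gaps or missing data; real alignments would need those cases handled first.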
Haplotype analyses of the COI and Cytb genes
The distribution of the haplotypes for the two genes across the populations studied is shown in Table S1. The rarefaction analyses showed that the curves converged on an asymptote (Fig. S1). The COI haplotypes (H1-H70) included 34 (48.6%) unique haplotypes (Table S2). The four most frequent haplotypes (H1-H4) were found in 132 (30.2%), 59 (13.5%), 29 (6.6%), and 60 (13.7%) individuals, respectively (Table S2; Fig. 2a). Haplotype 1 (H1) was present in all populations except GDQH, FJCQ and ESHB, whereas haplotype 2 (H2) was found only in the GYSC, HZSX, AKSX, FJCQ, ESHB and LCHB populations (Table S2). The Cytb haplotypes (H1-H67) included 35 (52.2%) unique haplotypes; the remaining 32 were observed in more than one individual (Table S2). The three most frequent haplotypes (H1-H3) were found in 158 (36.2%), 61 (14.0%) and 48 (10.9%) individuals, respectively (Table S2; Fig. 2b). Haplotype 1 (H1) was found in all populations except ESHB, whereas haplotype 3 (H3) was found only in the AQAH, LAAH, HFAH, CHAH, NJJS and ZJJS populations (Table S1).
The haplotype distribution and haplotype network analyses (see below) of both the COI and Cytb genes revealed that S. variegatus populations could be divided into three major geographical regions or haplogroups: the northwestern China (NW) haplogroup (GDQH, HZGS and ZYGS populations), the central China (CC) haplogroup (GYSC, HZSX, AKSX, FJCQ, ESHB and LCHB populations) and the central and eastern China (CE) haplogroup (AQAH, LAAH, HFAH, CHAH, NJJS and ZJJS populations) (Fig. 1).
In the haplotype network of the COI gene, only one haplotype (H1) was shared among all three haplogroups. Haplotype 2 (H2) was abundant and detected only in the CC haplogroup. Haplotype 3 (H3) was found only in the CE haplogroup. Six haplotypes (H4-H9) were shared between the NW and CC haplogroups. A total of five missing haplotypes were observed across all populations (Fig. 2a). Similarly, in the haplotype network of the Cytb gene, two haplotypes (H1 and H4) were shared among all three haplogroups. Haplotype 2 (H2) was the most abundant and was detected only in the CC haplogroup. Haplotype 3 (H3) was found only in the CE haplogroup. Haplotypes H5-H6, H7, and H8-H9 were shared between the NW and CC haplogroups, the NW and CE haplogroups, and the CC and CE haplogroups, respectively. A total of four missing haplotypes were observed in the CC haplogroup (Fig. 2b).
Population genetic differentiation
Strong genetic divergence was observed across populations (FST = 0.425, P < 0.0001, Table 2). The FCT value among the three regions (NW, CC and CE) was highly significant (FCT = 0.470, P < 0.0001, Table 2), further supporting the division of S. variegatus populations in China into three regions. Based on the combined data of the COI and Cytb genes, significant genetic differentiation was also observed among populations within the regions (FSC = 0.072, P < 0.0001, Table 2) and within the populations (FST = 0.508, P < 0.0001, Table 2). The percentages of genetic variation within the populations (60.16% for the NW and CC comparison, and 56.00% for the NW and CE comparison) were markedly higher than those between the regions (33.89% between the NW and CC regions, and 33.88% between the NW and CE regions) (Table 2). However, the percentage of genetic variation between the CC and CE regions (54.95%) was higher than that within the populations (42.82%) (Table 2), indicating limited gene flow between the CC and CE regions.
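As a consistency check on the hierarchical fixation indices above, the following minimal sketch assumes the standard relationship among hierarchical F-statistics, (1 - FST) = (1 - FSC)(1 - FCT). The function name is hypothetical; the inputs are the FCT and FSC values reported in Table 2 as quoted in the text.

```python
def fst_from_hierarchy(fct, fsc):
    """Recover FST from FCT and FSC via (1 - FST) = (1 - FSC) * (1 - FCT)."""
    return 1 - (1 - fct) * (1 - fsc)


# FCT and FSC as quoted in the text for the three-region grouping.
fst = fst_from_hierarchy(fct=0.470, fsc=0.072)
print(round(fst, 3))
```

Under this identity the reported FCT and FSC reproduce the reported FST of 0.508 to three decimal places, so the three indices are mutually consistent.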
The pairwise FST values among populations, based on the combined data of the COI and Cytb genes, ranged from -0.015 to 0.811 (Table 3). Of the 105 comparisons, 88 showed significant genetic differentiation. The pairwise FST values among populations within the CC and CE regions were less than 0.159, whereas those between populations from the CC and CE regions were above 0.409. In addition, the pairwise FST values among the regions were highly significant (FST > 0.25, P < 0.001, Table 4), and the estimated gene flow among the regions was very low (Nm < 1, Table 4), suggesting limited gene flow among the regions. These results are highly consistent with those obtained by the analysis of molecular variance (AMOVA) described above.
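The relationship between FST and Nm used above can be illustrated with a minimal sketch. The text does not state which estimator was used, so this assumes Wright's island-model approximation Nm = (1 - FST) / (k × FST), where k = 4 is the usual choice for diploid nuclear markers and k = 2 is often used for haploid, maternally inherited markers such as mitochondrial genes. The function name and the `factor` parameter are hypothetical. The sketch also confirms that 15 populations yield 105 pairwise comparisons.

```python
from math import comb


def gene_flow_nm(fst, factor=4):
    """Island-model approximation: Nm = (1 - FST) / (factor * FST).

    factor=4 is the diploid nuclear form; factor=2 is an assumed
    alternative for haploid mitochondrial markers.
    """
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1 - fst) / (factor * fst)


# Number of pairwise comparisons among 15 populations.
n_pairs = comb(15, 2)
print(n_pairs)

# FST thresholds quoted in the text; Nm depends on the assumed factor.
for fst in (0.25, 0.409, 0.811):
    print(fst, round(gene_flow_nm(fst), 3), round(gene_flow_nm(fst, factor=2), 3))
```

Because Nm falls monotonically as FST rises, any FST above 0.25 gives Nm below 0.75 under the k = 4 form, which fits the reported Nm < 1.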
The Mantel test based on the combined data of the COI and Cytb genes revealed a significant correlation between genetic distance (FST/(1-FST)) and geographical distance among all populations (r = 0.500, P < 0.0001, Fig. 3).
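The logic of the Mantel test can be sketched as follows: pairwise FST is linearized as FST/(1-FST), correlated with geographical distance, and the significance of r is assessed by permuting population labels in one matrix. This is a minimal pure-Python sketch under those assumptions, not the software or settings used in the study. The function names, the number of permutations and the toy matrices are all hypothetical.

```python
import random
from itertools import combinations


def linearize(fst_matrix):
    """Convert pairwise FST to FST/(1 - FST), leaving the diagonal at 0."""
    return [[f / (1 - f) if i != j else 0.0 for j, f in enumerate(row)]
            for i, row in enumerate(fst_matrix)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(gen, geo, permutations=999, seed=0):
    """One-sided Mantel test: returns (r, permutation P value)."""
    n = len(gen)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]
    observed = pearson([gen[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel populations in the genetic matrix
        permuted = [gen[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy matrices for four populations (not the study's data).
toy_fst = [[0.00, 0.05, 0.20, 0.40],
           [0.05, 0.00, 0.10, 0.35],
           [0.20, 0.10, 0.00, 0.15],
           [0.40, 0.35, 0.15, 0.00]]
toy_km = [[0, 100, 300, 700],
          [100, 0, 200, 600],
          [300, 200, 0, 400],
          [700, 600, 400, 0]]
r, p = mantel(linearize(toy_fst), toy_km)
print(r, p)
```

Permuting rows and columns together, rather than shuffling individual matrix cells, preserves the dependence among distances that share a population, which is why the Mantel test is preferred over an ordinary correlation test here.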
Demographic analyses
The Tajima’s D values obtained with either the single or the combined data of the two genes in the NW region were negative but not significant (P > 0.05, Table 1). The Tajima’s D and Fu’s Fs values in the CC and CE regions were negative and significant (P < 0.05, Table 1), whereas the CE region showed significant sum of squares deviation (SSD) values (P < 0.05, Fig. 4, S2). Thus, the sudden expansion hypothesis was rejected for the NW and CE regions. However, the distributions of the pairwise differences obtained with the single and combined gene data in the CC region were unimodal, with non-significant SSD and Harpending’s raggedness index (Rag) values (Fig. 4, S2), suggesting an expansion event in the CC region. The tau values (τ), a rough estimate of the time since population expansion in mutational units, were approximately 3.842 (COI data), 2.016 (Cytb data), and 1.595 (COI + Cytb data) for the CC region. For the NW and CE regions, τ was 1.344 and 0.766 for the COI gene, 3.693 and 0.875 for the Cytb gene, and 2.628 and 1.875 for the combined COI and Cytb data, respectively (Fig. 4, S2).
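To show how a τ value translates into a time since expansion, the following minimal sketch assumes the sudden-expansion relationship τ = 2ut, where u is the mutation rate per sequence per generation and t is the number of generations since expansion. The text gives no mutation rate or generation time, so the rate used below is a placeholder for illustration only, and the function and parameter names are hypothetical.

```python
def generations_since_expansion(tau, mu_per_site, seq_length):
    """Solve tau = 2 * u * t for t, with u = mu_per_site * seq_length."""
    u = mu_per_site * seq_length  # per-sequence mutation rate per generation
    return tau / (2 * u)


# tau and fragment length for the COI data in the CC region, as quoted in the text.
# mu_per_site is an ASSUMED placeholder value, not a rate reported by the study.
t = generations_since_expansion(tau=3.842, mu_per_site=1e-8, seq_length=652)
print(t)
```

Because t scales inversely with the assumed mutation rate, τ values are comparable across regions only within the same marker and fragment length.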