Metapopulation of Ellobium Chinense Through The Late-Middle and Late Pleistocene Expansions: Four Covariate COI Hotspots Linked to G-Quadruplex Conformation

The land snail Ellobium chinense (Pulmonata, Ellobiida, Ellobiidae), which inhabits the salt marshes along the coastal areas of northwestern Pacic, is an endangered species on the IUCN Red List. Over recent decades, the population size of E. chinense has consistently decreased due to environmental interference caused by natural disasters and human activities. Here, we provide the rst assessment of the genetic diversity and population genetic structures of northwestern Pacic E. chinense. The results analyzed with COI and microsatellites revealed that E. chinense population exhibit metapopulation characteristics, retaining under the inuence of the Kuroshio warm currents through expansion of the Late-Middle and Late Pleistocene. We also found four phylogenetic groups, regardless of geographical distributions, which were easily distinguishable by four unidirectional and stepwise adenine-to-guanine transitions in COI (sites 207-282-354-420: A-A-A-A, A-A-G-A, G-A-G-A, and G-G-G-G). Additionally, the four COI hotspots were robustly connected with a high degree of covariance between them. We discuss the role of these covariate guanines which link to form four consecutive G-quadruplexes, and their possible benecial effects under positive selection pressure.


Introduction
Ellobium chinense 1 , of the family Ellobiidae (Gastropoda, Pulmonata, Ellobiida), is a small, air-breathing, land snail that dwells between-stones in freshwater tidelands near the shoreline and on halophytes such as Zoysia sinica, as shown in Figs 1a and S1. Its distribution range is con ned only to the northwestern Paci c coasts including the coastlines of South Korea, Japan, and China 2 . Importantly, E. chinense is a key indicator of the environmental health of the intertidal zone of its habitat. However, over recent decades, its natural habitat and the number of populations and individuals have rapidly declined due to environmental stresses, such as typhoons and tsunamis, and human activities including land reclamation and coastal development; thus, E. chinense has been designated as an endangered species on the South Korean Regional Red List 3 , as a vulnerable species on the Japanese Regional Red List, and as an endangered species on the International Union for Conservation of Nature (IUCN; Association of Wildlife Research and Envision, 2007).
Su cient genetic diversity is important for ensuring the persistence of species, increasing their adaptive potential in the face of environmental change, and avoiding inbreeding depression 4 . On the other hand, the loss of intraspeci c genetic diversity dramatically reduces the capacity for species to respond to novel selection pressure and thereby increases their extinction risk 5 . In recent decades, research on the population genetics of endangered metazoan species has been increasingly conducted because of its central importance in planning both in-situ and ex-situ conservation efforts 6 . In such research, molecular markers, including speci c mitochondrial genes [mainly cytochrome c oxidase I (COI)] 7 , whole mitochondrial genomes 8 , and microsatellites 9 are consistently used. Given the importance of determining the genetic diversity and population genetic structure of an endangered species in conservation planning, it is surprising that such research has rarely been northwestern Paci c E. chinense.
In the present study, therefore, we provide the rst assessment of the genetic diversity and the pattern of genetic structures among populations of northwestern Paci c E. chinense (sampled in South Korea and Japan). Our analyses were based on the popular DNA barcode region of the mitochondrial gene COI and 10 selected microsatellite markers. In total, 140 COI data (consisting of 139 South Korean samples and one Japanese sample), and 54 microsatellite data (from South Korea only) were analyzed. In addition, we employed four distinct covariate adenine/guanine hotspots observed on COI which are distinct key sequences to easily classify phylogenetic groups of E. chinense; these sites were closely associated with G-quadruplex structure conformation. G-quadruplexes, i.e., four-stranded noncanonical DNA structures, can be spontaneously formed by Hoogsteen bonds between stacked sets of four guanines from each of four separated runs of two, three, or four guanines 10 . A stable G-quadruplex within the coding sequences (CDS) causes ribosome stalling; protein expression can be enhanced through silent mutations that affect its stability 11 . Herein, by determining the genetic diversity and structures of populations of E. chinense, we were able to discuss not only the roles of the four covariate hotspots on the CDS of COI in relation to Gquadruplex structure conformation but also the associated bene cial and resistant effects against environmental perturbation such as oxidative stress.

Sample collection and DNA extraction
In total, 103 individuals of E. chinense were collected from eight sampling sites across the western and southern coastal areas of South Korea; these covered the entire remaining habitats of E. chinense on the Korean Peninsula (Figs 1b and S1; Table 1). Since this species is protected as an endangered species in South Korean government and the IUCN Red List, samples were collected after gaining permission from the Ministry of Environment, South Korea. The collected samples were immediately preserved in 95% ethanol immediately. Total genomic DNA was extracted from the foot tissue of E. chinense with the DNeasy Blood and Tissue Kit (Qiagen Co., USA) following the manufacturer's protocol and then maintained at −20°C until further use. The e ciency of the genomic DNA extraction and the purity of the extracted DNA were examined with a spectrophotometer (Nanodrop Technologies, USA).

PCR ampli cation and sequencing of COI
To amplify a partial fragment of mitochondrial COI gene of E. chinense, PCR was conducted using two universal primers: LCO1490 (5′-GGT CAA CAA ATC ATA AAG ATA TTG G-3′) and HCO2198 (5′-TAA ACT TCA GGG TGA CCA AAA AAT CA-3′). These primers, which were developed for invertebrates by Folmer et al. (1994) 34 , can be used for PCR ampli cation of a partial mitochondrial COI fragment, i.e. a DNA barcoding region, prior to sequencing. The PCR mixtures were prepared to a total volume of 50 μl containing 2 μl of 10 pM of each primer, 1 μl of 10 mM dNTP mix, 5 μl of 10 × Taq DNA polymerase reaction buffer, 0.25 μl of 5U/μl DiaSter Taq DNA polymerase, and 2 μl of total genomic DNA. Thermal cycling conditions were as follows: denaturation at 94°C for 2 min, 35 cycles of 94°C for 40 s, 52°C for 1 min, and 72°C for 1 min, and a nal extension step of 72°C for 7 min, which followed a cool down step at 4°C. The PCR products were electrophoresed on 1% agarose gel including eco-dye and examined under UV light; they were puri ed using a QIAquick PCR Puri cation Kit (Qiagen Co., USA) and then sequenced using the primers applied in PCR and an ABI Prism 3730 DNA sequencer (PerkinElmer, USA) with a BigDye Termination Sequencing Kit (PerkinElmer, USA). The determined DNA sequences were manipulated in BioEdit v7.2.5 35 , and nally veri ed by visual inspection.
PCR ampli cation and genotyping of microsatellite markers A total of 30 microsatellite markers that have recently been developed for E. chinense 36 ; these were tested using some representative E. chinense samples collected in the present study. The initial screenings discovered ten primer pairs showing clear and reproducible pro les; these were selected for subsequent genetic analyses of E. chinense populations (Table S1). A tailed PCR primer was produced by adding a M13 (-21) universal sequence "tag" to the 5′ end of each forward primer as described by Schuelke (2000) 37 . The M13-forward oligonucleotide was labeled with the 6-FAM uorescent dyes. PCRs were performed in a total volume of 20 μl consisting of 20 ng of template (gDNA), 10 mM dNTPs, 10 pmol of M13 modi ed forward primer, 10 pmol of each reverse primer and M13 primer uorescently labeled with 6-FAM (Euro ns Genomics, France), with 10 × incubation mix including MgCl 2 , 5 units HANLAB Taq DNA polymerase, and lled with distilled water. PCR cycling conditions comprised an initial denaturation phase at 94°C for 5 min, followed by 34 cycles of 94°C for 30 s, 54°C for 30 s, and 72°C for 30 s, and nally an extension step at 72°C for 5 min. The tagged PCR product (0.2 μl) was added to 9.8 μl of Hi-Di formamide loading buffer, sized precisely using 0.2 μl of GeneScan 500 LIZ Size Standard (Applied Biosystems, USA) and subsequently run on an ABI 3730x1 automated sequencer (Applied Biosystems, USA). Allele size calling and genotypic analyses were performed using GeneMapper Software ver. 4.1 (Applied Biosystems, USA).

Analyses of population genetic diversity and structure
To elucidate the level of genetic diversity and structure at the interpopulation and/or intrapopulation levels in E. chinense, the 140 obtained COI sequences were aligned with Clustal X 38 (Data S1). The numbers of polymorphic sites and haplotypes, haplotype diversity, and nucleotide diversity were estimated for each population using the program DnaSP v.6.0 39 . AMOVA (analysis of molecular variance) was conducted in ARLEQUIN v.3.5 40 to partition the genetic variance within and among populations. Pairwise F ST values were calculated to assess the pattern of population differentiation. A neutrality test was performed by calculating Tajima's D 41 and Fu's Fs 42 via ARLEQUIN v.3.5 40 to detect the historical existence of demographic expansion. To examine whether E. chinense has passed through recent expansion or frequent migration among neighbor demes, a mismatched distribution analysis was conducted based on the distribution of the observed pairwise nucleotide site differences and the expected values obtained assuming the populations of constant size or under the growth 12 . Based on the sequence alignment set of 58 COI haplotypes ( Fig. S2; Data S2), unrooted and outgroup-rooted maximum likelihood (ML) trees were reconstructed using the IQ Tree web server (http://iqtree.cibiv.univie.ac.at) with the GTR+I+G model and 1,000 bootstrapping replicates. Additionally, a phylogenetic network was generated via the neighbor-net algorithm 43 . Furthermore, a statistical parsimony haplotype network was constructed with a 95% connection limit using TCS v.1.2.1 44 ; this was used to assess the genealogical relationship at the intraspeci c level and to infer biogeography by constructing a COI haplotype network. To evaluate and visualize the geographic genetic structure among populations, principal coordinates analyses (PCoA) were conducted via DARwin v.6.0.9 45 , which ordinated genetic distance estimates calculated with the haplotype data used in this study.
Using the 10 microsatellite loci (Table S1) selected from the 30 microsatellite markers previously developed by Hyun et al. (2017) 36 , we reexamined the genetic diversity and structure of E.chinense populations in comparison to the results inferred from COI haplotype data. The PCR primers selected as microsatellite markers in the present study exhibited highly informative values of polymorphism information content (PIC > 0.5). Based on these microsatellite markers, genetic variability was estimated in terms of the number of alleles averaged over 10 loci (N A ), observed heterozygosity (H O ), and expected heterozygosity (H E ) across populations and across loci using GenALEx v.6.5 46 . Furthermore, F IS was calculated for each population and for each locus, based on Wright's F-statistics 47 . The genetic differentiation coe cient (F ST ) was estimated using GenALEx v.6.5; additionally, tests for deviation from the Hardy-Weinberg Equilibrium between the microsatellite loci were performed using this software. To characterize the overall genetic structure, we applied a Bayesian clustering approach implemented in STRUCTURE v.2.3.4 48 which identi es the number of potential genetic clusters (K) in the dataset without a priori information on geographical or population identi ers. An admixture model was used with correlated allele frequencies applying burn-ins of 2 × 10 5 and runs of 10 5 repetitions for each value of K varying from 1 to 8. Three iterations were completed for each tested K value. The true number of K was identi ed based on the approach of Evanno et al. (2005) 16 by applying Structure Harvester v.0.6.94 49 . In addition, we investigated isolation by distance (IBD) was examined by correlating the matrix of pairwise geographic distances between sampling locations with two correspondent matrices of genetic differentiation estimates: F ST (estimated based on haplotype frequency) and ф ST (considering the level of distance among haplotypes) 50 . Geographic distances were calculated as the shortest pathways along the coastal line.

Covariance calculation and screening of possible G-quadruplex motifs in COI
From the multiple sequence alignment results of the COI haplotypes (Fig. S2), we searched for cases of concerted patterns of variation between different alignment positions. In performing this covariance calculation, we made pairwise gene sequence comparisons for every position. The base was checked for covariance with the frequency ratio in the multiple sequence alignment results, and this result was displayed on the network.
To screen for possible G-quadruplex-forming sequence motifs in COI (Fig. S1), we used the QGRS-Conserve program 51 , considering new types of G-quadruplex in addition to the canonical forms. The frequencies of nonoverlapping G-quadruplex-forming sequence motifs in the COI barcoding region were calculated using uninterrupted or interrupted by non-G-rich fragments not exceeding the maximum loop length (20 bp). This calculation process was repeated up to 10 times.

Divergence time estimation based on COI
Divergence time estimation of the nodes on the phylogeny of E. chinense was conducted based on COI haplotype sequences via the BEAST 2.6.0. 52 . The divergence time was estimated using the strict molecular clock algorithm under the calibrated-Yule tree prior. For the calibration point, the previously estimated age of the subfamily Carychiinae (from the family Ellobidae) was used as the basal node (60 mya; normal distribution). The age of Carychiinae was given by Weigand et al. (2013) 53 , who among others reported that it originated no earlier than the beginning of the early Cenozoic (about 65 mya) 53,54 .
The best-t nucleotide substitution model was selected by jModelTest 55 as GTR+I+G. The posterior distributions of parameters were estimated using 1,000,000 MCMC generations with sampling every 1,000 generations. Additionally, an effective population size value was determined in Tracer 1.7 56 .
TreeAnnotator 2.6.0. 57 was used to produce a maximum clade credibility tree with a median height after removing the initial 25% of trees as burn-in. FigTree 1.4.2. 58 was used to visualize the topology of the resultant consensus tree.

Results
Genetic diversity and phylogeny of E. chinense based on COI The partial fragment of COI, 595bp in length, was sequenced from 103 E. chinense individuals collected from the eight sites in South Korea (Table 1). These sequences were aligned together with 37 COI haplotype sequences retrieved from the NCBI GenBank. Hence, 140 COI haplotype sequences of E. chinense were analyzed, representing 18 collection sites at the nine populations in South Korea and Japan (Table 1). Based on the alignment set (no indels) of these 140 COI sequences (Data S1), we obtained a total of 58 COI haplotypes, of which 43 were singleton, appeared in only a single site ( Table 2). The novel 58 COI haplotype sequences obtained were registered under the GenBank accession nos. MW265437−MW265477. According to the sequence alignment of the 58 COI haplotypes ( Fig. S2; Data S2), there were 71 polymorphic sites and 31 parsimoniously informative sites (Fig. 1c), among which four adenine/guanine hotspots at 207, 282, 354, and 420 were ascertained to articulately divide the haplotypes of E. chinense into four meaningful phylogenetic groups: Based on the COI haplotype sequence alignment ( Fig. S2; Data S2), we reconstructed a ML tree using Ellobium aurisjudae as an outgroup. In the resultant tree topology (Fig. S3), it was con rmed that E. chinense appeared as a monophyletic group, but no distinction between the haplotypes from each geographical population was observed. To de ne detailed relationships among the COI haplotypes, the outgroup was removed and then an unrooted ML tree (Fig. 1d) was reconstructed. The resultant tree showed two distinctive phylogenetic groups, namely A-A-A-A and the other groups (including at least one G or more in the four positions), regardless of collection localities. The A-A-A-A group included 35 of the 58 COI haplotypes. The others could be divided into the A-A-G-A group (N=12: ECH11, 12, 15, 16, 18, 23, 27, 28, 32, 33, 36, and 49), the G-A-G-A group (N=1: ECH35), and the G-G-G-G group (N=8: ECH01, 07, 19, 29, 41, 45, 48, and 54).
As shown in Table S2 and Fig. S3, ECH01 was a dominant member of the G-G-G-G group with the most individuals (27), which appeared across all the South Korean populations examined here. As shown in Fig. 1d, the A-A-A-A group is likely to be an ancestral type because it was most frequently found in the other species within Ellobiidae (data not shown) and its haplotype diversity was the highest among the four genetic groups. Given that the G-G-G-G group exhibited much lower haplotype diversity than the A-A-A-A group, and was not observed in any other ellobiid species (data not shown), it is reasonable to suggest that the G-G-G-G group is a derived rather than an ancestral type. Thus, as shown in Fig. 1d, it is conceivable that unidirectional and stepwise A→G transition events from A-A-A-A to G-G-G-G may have been occurred in E. chinense. Within the A-A-A-A and A-A-G-A groups, parsimoniously informative A→G transition event was found at the sites 120 (ECH12, 15, 16, 18, 23, 38, 40, 44, and 49) and 183 (ECH3, 9, 10, 20, 43, 50, and 55), with a few exceptional cases, for example, G→A at the sites 216 (ECH12, 15, 16, 18, and 23) and 372 (ECH12, 15, and 23), and 429 (ECH37, 38, 46, and 47; ECH19 found in the G-G-G-G group). Table 2, the nucleotide diversity (π) is relatively low among the nine populations of E. chinense, ranging from 0.00749 (population BG) to 0.01042 (SC) with an average of 0.00865, whereas the haplotype diversity was very high across these populations, ranging from 0.924 (YG) to 1.000 (SC and JB) with an average of 0.939). All values of Tajima's D and Fu's F S were congruently negative, with averages of -1.87100 (P < 0.05) and -2.75886 (not statistically signi cant), respectively, indicating that the signature of demographic expansions recently occurred in E. chinense.

As indicated in
Population genetic structureof E. chinense based on COI As shown in Table S3, the pairwise F ST values between nine E. chinense populations were highly low and statistically non-signi cant, ranging from -0.21177 (JB and GY) to 0.09091 (JB and JK), indicating the lack of genetic structure. Analysis of molecular variance (AMOVA) for the nine populations of E. chinense in South Korea and Japan (Table S4) revealed that 101.05% of the variance was allocated into the level of individuals within each population, while differentiation between populations (-1.05%) did not appear to contribute to the overall variance.
Upon the results of TCS network analysis, all the COI haplotypes were connected by forming a single, large network, conforming that no distinct genetic structure existed among populations (Fig. S4a). The G-G-G-G group, with its representative member ECH01, was located rather at a distal position on the TCS network, which implies that these haplotypes may have recently emerged during the expansion of northwestern Paci c E. chinense populations. The phylogenetic network analysis (Fig. S4b) also indicated that a distinct genetic structure representing the geographical distributions of E. chinense did not exist. Indeed, all haplotypes were grouped into a single broad cluster with overlaps between populations, which is in agreement with the results of the TCS network analysis (Fig. S4a). In addition, the mismatch distribution analysis showed a clearly unimodal curve (Fig. S4c), evidently implying the metapopulation of E. chinense that may have experienced a recent demographic expansion 12,13 or through a range expansion with high levels of migration between neighboring demes 14,15 .
The PCoA shown in Fig. 2a provided additional evidence that, regardless of the geographical distribution of Northwestern Paci c E. chinense, there was consistent support for the existence of four distinct phylogenetic groups, i.e., A-A-A-A, A-A-G-A, G-A-G-A, and G-G-G-G (as also shown in Fig. 1). In this analysis, the plausible ancestral A-A-A-A group (with the highest degree of haplotype diversity) was placed mainly around the bottom of the second quadrant, whereas the A-A-G-A group was in the left of the fourth quadrant, the G-A-G-A group (ECH35 only) in the right middle of the fourth quadrant, and the G-G-G-G group (represented by ECH01) was in the rst quadrant. The three variation types within the A-A-A-A group, namely A-A-A-G, A-G-A-A, and A-G-A-G, appeared in the ML tree ( Fig. 1d; Fig. S3) and were spatially separated from the remaining orthodox members of A-A-A-A, which were located along the Y axis between the rst and second quadrants. Based on the results of PCoA (Fig. 2a) and the ML tree (Fig. 1a), we con dently suggest that the four phylogenetic groups of E. chinense arose as a result of serial, stepwise A→G transition events in the following order: Fig. 1d; Fig. S3); these could be considered the fth spatially separated phylogenetic group of E. chinense, based only on PCoA (Fig. 2a)

, being located between the phylogenetic groups G-G-G-G and A-A-A-A but apart from A-A-A-A.
Covariance of four COI hotspots linked to the conformation of four G-quadruplexes Similar sequence interval lengths existed between the four adenine/guanine hotspots of COI characterized by unidirectional A→G transitional changes: G-(75 bp)-G-(72 bp)-G-(66 bp)-G. Additionally, the covariance analysis indicated that there was a high degree of covariance among these (see the four thick and bold linkage lines in Fig. 2b): out of 58 haplotypes, 44 were covariate between 207 and 282 sites, 44 between 207 and 420 sites, 41 between 282 and 420 sites, 37 between 207 and 354 sites, 32 between 282 and 354 sites, and 32 between 282 and 420 sites (Data S3 in detail). Given the similar intervals and covariance characteristics, we used QGRS-Conserve to search for sequence motifs that could possibly indicate the formation of G-quadruplex structures in relation to the four hotspots on the CDS of the COI barcoding region. Our results revealed possible sequences associated with the conformation of four different G-quadruplexes represented by each of the four hotspots (Fig. 2c). The rst hotspot located at site 207 may be involved in forming a G-quadruplex structure such as G 2 -N 6 -G 2 - G-quadruplex structures generally have 1-7 nucleotides between the four guanine-tracks each consisting of two or three guanines, e.g., G 2-3 -N 1-7 -G 2-3 -N 1-7 -G 2-3 -N 1-7 -G 2-3 ; two (for G 2 ) or three (for G 3 ) Gquadruplexes can form a stacked structure. Considering the typical features of G-quadruplex structures, each of the putative motifs searched for G-quadruplex structure formation is likely to exhibit two Gquadruplexes stacked together (Figs 2c and d). However, the number of nucleotides between the G-tracks is much larger than usual for some sites, and some G tracks consist of only one G (not two or three). Thus, the G-quadruplex structures formed from the putative motifs are expected to be relatively weak.
Assuming that all four putative motifs found in our analyses are capable of forming G-quadruplex structures, the four consecutive G-quadruplexes with intervals of 57/60, 39/41, and 65 bp in order might play a role in reducing or inhibiting transcription or translation of COI expression, i.e., they may be an exemplar of downregulation of gene expression; the ribosome stalling process (Fig. 2d).
Genetic diversity of E. chinense inferred from microsatellite markers The 10 polymorphic microsatellite loci were ampli ed from 54 individuals collected from the four natural populations (BG, YG, HK, and HS) of E. chinense in South Korea (Table 1), and the PCR products were then sequenced and analyzed to elucidate the level of genetic diversity and the pattern of genetic structure of E. chinense. Fig. 3a  Generally, pairwise genetic differentiation among populations was low, indicating that the E. chinense populations were not genetically differentiated from each other.
The distributions of molecular genetic variation among and within populations, as estimated by AMOVA, are shown in Table S7; only 0.85% of the total molecular variation was attributed to interpopulation differentiation, whereas 86.40% of the variation was within populations. This supports the conclusion that signi cant genetic differences did not exist among the four examined E. chinense populations.
Based on microsatellite markers, the 54 E. chinense individuals were further examined to determine population strati cation. The optimal number of clusters, K, according to the method of Evanno et al.
(2005) 16 , was 2 (Fig. 3b). However, with this method, it is only possible to infer with con dence about clusters ≥2. Indeed, the LnP (D) plot shows a strong drop off in model t only after K=2 (data not shown); thus, the ancestry proportions were plotted for each individual given two clusters. All individuals were equally admixed (Fig. 3b) across the range, supporting a single genetic cluster. IBD analysis indicated that the pairwise observed genetic differences were not correlated with the pairwise geographical distances among the four examined E. chinense populations (Fig. 3c). Consistently, the PCoA results (Fig.  3d) showed that a distinct genetic structure did not exist and that all populations were clustered into a single genetic group. Taken together, the Bayesian clustering approach, IBD analysis, and PCoA provided consistent results that indicated a lack of spatial genetic clustering among the four E. chinense populations from South Korea and Japan. These results are coincident with the metapopulation of E. chinense through recent expansion, as inferred from COI haplotypes (Figs 1 and 2).

Discussion
The land snail E. chinense inhabits on the shoreline adjacent to areas of brackish water and prefers to dwell in salt grass colonies, which are distributed widely across northwestern Paci c coastal areas including those in South Korea, southern Japan, Hong Kong, and southern China. Over recent decades, population numbers have decreased sharply due to coastal development, tidal at reclamation, and salt marsh destruction. E. chinense is also known to be a group of organisms that are particularly more susceptible to human activities than other counterparts of marine ecosystems 17 . Consequently, E. chinense is currently designated as an endangered species II class by the Ministry of Environment in the South Korean government, and a vulnerable species by the Japanese government and IUCN; thus, it has been under continuous management at both national international levels 17 .
The phylogenetic and population genetic analyses of E. chinense conducted here based on COI and microsatellite markers consistently showed the absence of any distinct genetic structure with which to distinguish populations; this may be attributable to their larvae being able to easily cross oceanic barriers via Kuroshio warm currents 18 . It is most likely that E. chinense exhibits typical metapopulation characteristics, which are commonly observed in other marine organisms 19 : 1) high haplotype diversity but low nucleotide diversity, 2) lack of genetic structure along populations, and 3) low genetic differentiation (frequent genetic ows between populations). Given the general metapopulation characteristics, it is reasonable to conclude that E. chinense populations inhabiting South Korea and Japan are a single broad metapopulation that has arisen due to repeating extinctions and migrations caused by continuous and rapid changes in environmental and habitat conditions during the Late-Middle and Late Pleistocene (Figs 4b and S5).
As shown in Fig. 4a, a part of the Kuroshio warm current passing through Japan diverges near Haenam to the south and west coasts of the Korean Peninsula. Thus, it is assumed that E. chinense larvae spread east along the southern coast and north along the western coast according to the in uence of the Kuroshio warm currents. In summary, our results strongly suggest that South Korean and Japanese populations of E. chinense possess metapopulation characteristics associated with Late-Middle and Late Pleistocene expansions (Fig. 4b), and that all northwestern Paci c E. chinense affected by passes of the Kuroshio warm current are likely to be a single, large metapopulation (Fig. 4a). However, our study is limited because we included only one E. chinense sample from Japan and did not analyze samples from southern China and Hong Kong given the di culties in international collection of endangered species. Therefore, we cannot conclusively state that northwestern Paci c E. chinense from the Korean Peninsula, southern Japan, southern China, and Hong Kong exhibit overall metapopulation dynamics. Indeed, further research is required in which additional E. chinense samples from southern Japan, southern China, and Hong Kong are collected and comprehensive analyzed.
According to the results of our molecular clock analysis based on 58 COI haplotypes, the rst appearance of the subfamily Ellobiinae including E. chinense was estimated to be at ca. 58 (Fig. 4b).
Intriguingly, we found that four adenine-to-guanine transition hotspots at the 207, 282, 354, and 420 sites of the COI haplotype alignment of E. chinense could determine the production of genetic groups in the species, regardless of geographical distribution. Based on our PCoA, it seems possible to divide the COI haplotypes into ve different genetic groups within a single metapopulation of E. chinense. The largest group, A-A-A-A, which is plausibly ancestral, includes 35 of the 58 haplotypes; the most derived group, G-G-G-G, consists of only 8 haplotypes. Despite a few G→A transitions, such as at the 216, 372, and 429 sites, purine base transitions seem to be biased to unidirectional A→G transitions, including the four hotspot sites and two other sites, 120 and 183. In particular, the unidirectional A→G transition events are shown in the serial and stepwise transitions of the four hotspots in COI. Such a strong unidirectional tendency can produce guanine richness under plausible positive selection pressure. For example, a positive selective bias from adenine-to-guanine transition has recently been characterized in soft-bodied cephalopods by Moldovan et al. (2020) 23 . These authors suggested that edited adenines tend to be substituted to guanines, and this tendency is supported by positive selection at highly edited sites.
Furthermore, they suggest that such A→G transition bias can yield bene cial effects of increased phenotypic diversity in a low-polymorphic population, which can enhance adaptation, and facilitate the evolutionary process. Similar observations were recently reported in Drosophila and human mRNA editing sites 24 . Therefore, guanine-rich DNA sequences, resulting from the serial and stepwise A→G transitions reported here, could increase phenotypic diversity in a low-polymorphic metapopulation such as that of E. chinense, which might have enhanced the possibility of adaptation in terrestrial and brackish water areas as the species passed through repeated long glacial and relatively short interglacial periods during the Late-Middle Pleistocene and Late Pleistocene (Fig. 4b).
We found strong covariance among the four guanine hotspot sequences in COI, each of which may link respectively to a putative G-quadruplex motif. Guanine-rich sequences are known to be capable of producing noncanonical conformations of G-quadruplexes (also known as G-tetrads or G4) in DNA or RNA sequences 25 . In the nuclear genome, G-quadruplex motifs have been associated with genome instability and gene expression defects; however, they are mainly recognized as regulatory structures that control gene expression or positively affect telomere capping 26 . Recent studies 27 have reported that Gquadruplex structures can form in the mitochondrial genome and even on the CDS in protein-coding genes 28 . Although G-quadruplexes typically have harmful effects, those formed in mitochondrial genomes can help in the response to certain environmental stresses such as oxidative stress 29 .
De ciency in COI within mitochondria caused by G-quadruplexes could result in positive selection bias that reduces the production of reactive oxygen leading to less oxidative damage and a selective advantage during competition with other mitochondria within the same cell, which could generate homoplasmy for COI de ciency. Hershman et al. (2008) 30 suggested that cells with cytochrome c oxidase de ciency are apoptosis resistant and therefore more likely to survive. Active COI oxidizes cytochrome c, which activates pro-caspase 9 leading to apoptosis. Assuming that the four guanine hotspots link to create four G-quadruplex structures, they may function in the regulation (mainly inhibition) of COI expression at the transcription or translation levels, and in modulating (mainly reducing) the concentration of activated COI within the mitochondria. This would reduce oxidative damage and increase resistance to apoptosis with lower reactive oxygen production. Such G-quadruplex conformation-related oxidative stress and apoptosis resistance could provide a selective advantage in competition with other mitochondria in a cell. According to a previous study on the energy metabolism of the garden snail Helix aspersa, natural selection can reduce energy metabolism in a land snail 31 . Intriguingly, the slow behaviors of land snails that have led to reduced energy consumption might be associated with downregulation of COI expression modulated by G-quadruplex motifs on the CDS of COI (as we hypothesized; see Fig. 2).
Determining the genetic diversity and population genetic structure of isolated and threatened species is of great importance when planning a suitable conservation strategy 32 , especially for the selection of appropriate populations and management methods aimed at maintaining genetic variation. Highly diverse populations can be targeted for protection while small and depleted populations can be subjected to management actions that restore their diversity. For example, enhancement or reinforcement plans aim to add individuals to existing populations while reintroduction plans aim to re-establishment a species within its natural habitat where it has become extinct 33 . From the present study, the genetic resources of the endangered E. chinense could be essential for successfully designing conservation strategies aimed at preserving its population and restoring its habitat. In research into endangered species, such as E. chinense, permission from local and national governments must be acquired, via strict evaluation processes, prior to sample collection, which makes it di cult to conduct further research and collect samples. Moreover, direct collection of endangered species from foreign countries is largely prohibited by law, which explains the rareness, to date, of international-scale sample collections, and related research. Considering the di culty in performing population genetic, phylogenetic, and demographical research on endangered species, the present study, which includes a well-organized research model, has great value for the global protection of E. chinense. Total 140 *Only 10 samples out of 11 were employed. 'N' indicates the number of individuals. Diversity parameters are given for each locality: N = the number of CO1 sequences (individuals), N h = the number of haplotypes, N p = the number of private haplotypes, h = haplotype diversity, π = Jukes-Cantor corrected estimates of nucleotide diversity, S = the number of segregation sites, K = the average number of pairwise nucleotide differences, and 'nd' = not determined. Statistically signi cant values are written in bold: *P < 0.05 and **P < 0.01. The localities of the populations refer to Table 1 and Figure 1.