Study areas and sampling method
Eelgrass, Zostera marina L., is distributed in the coastal waters of Japan between temperate and subarctic seas (Nakaoka and Aioi 2001). The study areas were the Seto Inland Sea, Tokyo Bay, and the coastal waters of the town of Akkeshi, all of which are located along the Pacific Ocean (Fig. 1a) and are sites of eelgrass beds. All the sampling sites were in sheltered areas and near the offshore ocean.
The Seto Inland Sea (SIS) has mouths at its western and eastern sides that connect it to the Pacific Ocean (Fig. 1a). The calm conditions in the SIS account in part for the widespread eelgrass within it (Yoshida 2012). The area of eelgrass in the SIS began to decline in the 1970s because of the degradation of water quality and the reclamation of sites for coastal development and port construction (Komatsu 1997). Both restoration efforts in the 2000s and natural reproduction have allowed the eelgrass beds to largely recover (Morita et al. 2002; Hosokawa et al. 2019); natural reproduction has provided evidence of connections among eelgrass populations in the SIS. We focused on the populations on the western side (spatial scale < 200 km), which are isolated by a geographical barrier, the Hoyo Strait (Fig. 1b). We sampled at 12 sites within the inner SIS and two sites outside the SIS by the Hoyo Strait.
Tokyo Bay adjoins the Pacific Ocean at the mouth of the bay (Fig. 1a). Although eelgrass beds were widely distributed in the bay in the 1900s (Morita 2013), the present beds are fragmented because much of the eelgrass disappeared after the urbanization of the coastline. The eelgrass was sampled at three sites located ~ 20 km from each other (Fig. 1c). There has been an eelgrass bed at the Futtsu site since before urbanization (Yamakita and Nakaoka 2009; Yamakita et al. 2011). The Yokohama Seaside Park (YSP) is an artificial, shallow flat that lies within 1 km of another seagrass bed that is distributed on the flat (Watanabe et al. 2024). Eelgrass seed was artificially dispersed at the YSP as part of restoration programs (https://www.spf.org/opri/newsletter/120_3.html). Although the Hashirimizu coast has been hypothesized to be the source of the seed (see Fig. 1c for the location of the coast), there is no definitive record with which to test that hypothesis. Eelgrass sampling was performed at subsites YSP1, YSP2, and YSP3, which are 20–40 m from each other within the site. The third sampling site was Kurihama Bay, which is located at the outside of Tokyo Bay (Fig. 1c). Aerial photographs indicate that the eelgrass beds have recovered since the 1990s at this site (Hosokawa et al. 2010). Five microsatellite markers have been used to analyze the genetic differences between the eelgrass at different sites in Tokyo Bay (Tanaka et al. 2011), but the genetic structures of the three populations remain unclear.
The Akkeshi site consisted of the Akkeshi-ko estuary and Akkeshi Bay, which are connected by a narrow channel (Fig. 1d). There are areal gradients of water temperature and salinity between Akkeshi Bay and the mouths of small rivers as well as Bekanbeushi River (Momota and Nakaoka 2018). The biomass of eelgrass, which is distributed in sandy or muddy areas at depths of ~ 5 m in this study area, varies along substrate gradations (Hasegawa et al. 2008; Momota and Nakaoka 2017; Momota and Nakaoka 2018). The tide drives currents, and the outflow of water from the estuary to the bay is governed by the estuarine circulation at the surface (Hasegawa et al. 2018), but the currents are restricted by the vegetation in the estuary (Hasegawa et al. 2008). We chose three sites from the eelgrass beds within the Akkeshi-ko estuary and two sites outside the estuary through a channel in Akkeshi Bay (Fig. 1d). The study area within Akkeshi Bay was the smallest of the three areas that we studied.
We chose 22 sites for eelgrass sampling (Table 1). We collected at least 20 samples at each site from within the bed or 10–20 samples from rafted individuals. The youngest and/or second-youngest eelgrass leaves were cut from each individual to mitigate potential DNA contamination from epiphytes on the eelgrass leaves (Hosokawa et al. 2009). The leaf samples were subsequently rinsed with distilled water, wiped with paper towels, and packed in silica gel for transport to the laboratory. At most sites, the sampling was performed in 2022 and 2023, but at the KRH site in Tokyo Bay in 2019 (Table 1).
Table 1
Samples used in this study. Type of sample: A = sample collected from bed, B = drifted sample, and C = sample obtained in 2019
Region | Site | Type of site | Abbreviation | Sampling | No. samples taken | Type of sample |
Year | Month |
SlS | Nakatsu | inner | NKT | 2022 | Aug | 10 | B |
| Aio | inner | AIO | 2022 | Dec | 20 | A |
| Aio | inner | AIOd | 2022 | Aug | 20 | B |
| Murozumi | inner | MZ | 2022 | Aug | 13 | B |
| Beppu | inner | BPU | 2022 | Aug | 10 | B |
| Hiji | inner | HIJ | 2022 | Dec | 20 | A |
| Mitsukue | inner | MTK | 2022 | Dec | 20 | A |
| Suo-oshima | inner | SUO | 2022 | Dec | 20 | A |
| Gogo-shima | inner | GOG | 2022 | Dec | 20 | A |
| Baishinji | inner | BSJ | 2022 | Aug | 16 | B |
| Mitsukuchi | inner | MIT | 2022 | Dec | 20 | A |
| Omishima | inner | OMS | 2022 | Aug | 15 | B |
| Saiki | outer | SIK | 2023 | Jun | 20 | B |
| Uwajima | outer | UWJ | 2023 | Jun | 20 | B |
Tokyo Bay | Futtsu | inner | FT | 2022 | May | 40 | A |
| Yokohama Seaside Park | inner | YSP | 2022 | May | 29 | A |
| Kurihama | outer | KRH | 2019 | June | 20 | A, C |
Akkeshi | Akkeshi-ko1 | inner | AK1 | 2022 | Jul | 39 | A |
| Akkeshi-ko2 | inner | AK2 | 2022 | Jul | 41 | A |
| Central lake | inner | CL | 2022 | Jul | 20 | A |
| Shinryu | outer | SR | 2022 | Jul | 20 | A |
| Aininkappu | outer | AI | 2022 | Jul | 20 | A |
Genetic analysis
In 2019, the samples were freeze-dried and homogenized, and the genomic DNA was extracted using an MPure Bacterial DNA extraction kit (MP Bio Japan K.K., Tokyo, Japan). In 2022 and 2023, the samples were ground in liquid nitrogen, and the genomic DNA was extracted using an Isospin Plant DNA extraction kit (Nippon Gene Co., Ltd., Tokyo, Japan).
Genotyping by Random Amplicon Sequencing-Direct (GRAS-Di®) was used for genome-wide sequencing. The GRAS-Di method involves a two-step polymerase chain reaction (PCR) with random primers to generate a sequence library of amplicons (Enoki and Takeuchi 2018; Enoki 2019); details of the method are described in Patent ID P2018 42548A. The library construction and sequencing were carried out by Bioengineering Lab. Co., Ltd. (Sagamihara, Japan) for the samples in 2019 and by GeneBay, Inc. (Yokohama, Japan) for the samples in 2022 and 2023. The sequencing was performed using a NextSeq 500 (Illumina, San Diego, CA, USA) with a 76-bp paired-end protocol for the samples in 2019 and a DNBSEQ-G400 (MGI Tech Co., Ltd., Shenzhen, China) with a 150-bp paired-end protocol for the samples in 2022 and 2023.
SNP filtering
We used Cutadapt version 2.8 (Martin 2011) to filter the reads generated by next-generation sequencing by trimming low-quality ends in the sequence reads (≤ Q20), filtering of short reads (remaining length > 51 bp) and removal of adapters. The cleaned sequences were mapped to the reference genome (JGI: Zostera marina v3.1; Ma et al. 2021) using the BWA-MEM algorithm in BWA version 0.7.17 (Li and Durbin 2009). Identification of duplicate reads, base quality score recalibration, and SNP calling were performed using GATK version 4.2 (McKenna et al. 2010); base quality score recalibration was performed by hard filtering.
Repeated SNPs are noise in the analyses of genetic structure and were filtered using bedtools version 2.27.1 (Quinlan and Hall 2010). Repeat sequences in the reference genome were masked. Low-quality SNPs (< Q20 and < 5 in depth) were filtered using VCFtools version 0.1.16 (Danecek et al. 2011), and the insertions and deletions were removed. We defined this filtering to be the first stage of SNP filtering. The rate of genotyping may have been lower for samples collected in 2019 than for newer samples after this stage of filtering (see Figs. S1 and S2 in Supporting Information A).
Analysis of genetic structure
The occurrence of IBD and IBR was tested based on the genetic distances in each study area. The genetic distance between sites was quantified by the value of the pairwise FST/(1 − pairwise FST). The pairwise FST was calculated using SNPs from the first stage of filtering and the individuals that remained after the second stage of filtering (vide infra). The geographic distance of the site pair was calculated using the direct distance of the pair on the map and the distance via the transit point at the pair where the direct distance was not rational on the real current (Fig. 1b, c, d). The possibility of resistance was considered between the outer and inner sites and between outer sites (Table 1). The nonparametric Wilcoxon rank sum test was used to test the difference in the genetic distance between sites with and without the possibility of resistance. The other test was based on a multiple matrix regression with randomization (MMRR) (Wang 2013). We tested for the possibility of a single pattern of IBD or IBR as well as multiple patterns. In that case, we set the paired site with possible resistance to 1 and the paired site without possible resistance to 0. The number of random permutations was set to 9999 in the SIS because of the large number of total, real permutations and to 119 permutations calculated from five sites in Tokyo Bay and Akkeshi. We also demonstrated the statistical significance of IBD at a small number of sites by a simulation using the inner sites in the SIS with the MMRR algorithm. The simulation selected five sites randomly per iteration, and its statistical significance was tested with 119 permutations. The simulation was performed 100,000 times. The number of significant iterations was counted. The pairwise FST was calculated using the fs.dosage functions in hierfstat version 0.5–11 of the R package (Goudet et al. 2022). The geographic distance was calculated using the distGeo function in geosphere version 1.5–18 (Hijmans et al. 2022). The Mantel test was performed by using the mantel function in vegan version 2.6-4 (Oksanen et al. 2022).
We analyzed the detailed genetic structure using a principal component analysis (PCA) and ADMIXTURE analysis. These analyses usually required additional filtering to remove noise. This second stage of filtering was set to avoid the drawback of reducing the numbers of samples and SNPs (see Table S1 and Fig. S3 in Supporting Information A). Such a reduction, which would have decreased the statistical power and caused a bias in the calculation of statistics (Weir and Goudet 2017), would have occurred in our dataset because of the increased strength of filtering (Fig. S4) and would have caused a serious problem in comparison of statistics between the samples from 2019 and more recent samples (Fig. S5). In the second stage, we filtered the samples based on their rate of genotyping, the SNPs based on their rate of genotyping, and the SNPs based on their minor allele frequency (see Supplementary Information B). We used strict filtering in the SIS and Akkeshi (Table S2 in Supplementary Information C) and moderate filtering in Tokyo Bay to retain the KRH samples obtained in 2019 (Table S3).
The PCA used a count datum of 0 when the alleles did not match the reference genome at marker j and individual i of the diploid sample, 1 when the alleles matched the reference in heterozygous individuals, and 2 when the alleles matched the reference in homozygous individuals (Patterson et al. 2006). The count data were normalized, and the analysis was performed on a matrix consisting of the count data. A scree plot was used to determine the number of significant principal components in the PCA.
ADMIXTURE analysis is a statistical technique for estimating individual ancestries in K genetic clusters based on the likelihood of an allele’s occurring at locus j in individual i (Alexander et al. 2009). ADMIXTURE analysis was conducted with genetic clusters ranging from K = 1 to K = 10. A fivefold cross-validation was carried out for each number of genetic clusters. We considered the number of genetic clusters with the minimum cross-validation to be the optimized model (Alexander and Lange 2011). We used the model with minimum cross-validation and models with cross-validation close to the minimum cross-validation to find the genetic structure in the three study areas.
Nucleotide diversity, the proportion of variants, and the population-specific FIS and FST were calculated based on the SNPs after the first stage of filtering. Because nucleotide diversity can be biased when calculated using a marker for which the rate of genotyping is biased, we calculated it for the loci for which the genotyping rate was 100% (i) at the site or (ii) in the study area. The first method made it possible to obtain a relatively large number of SNPs, but there was a risk of assessing the nucleotide diversity under different markers across sites. The second method was more likely to assess the nucleotide diversity under same marker across sites, but it may have been less powerful because the number of SNPs was relatively low. We refer to the former as “nucleotide diversity” and the latter as “restricted nucleotide diversity”. The population-specific FIS is a relative inbreeding coefficient (Weir and Goudet 2017), which has a low value when the degree of heterozygosity is high and a value of 1 when the homozygosity is a maximum. The population-specific FST is a measure of deviation from the ancestral population (Kitada et al. 2021) and has a maximum value of 1. The population-specific FIS and FST can take negative values. The two nucleotide diversities and the population-specific FIS and FST were estimated with a bootstrapping method, randomly resampling one-tenth of the SNPs after the first stage of filtering in a bootstrap sample and with 200 bootstrap samplings; their 95% confidence intervals (CI) were also calculated. Because the restricted nucleotide diversity and the population-specific FIS and FST depend on individuals, they were estimated in the individuals retained after strict and moderate filtering in the analysis of Tokyo Bay.
The PCA and ADMIXTURE analysis were performed using the R package pcadapt version 4.3.3 (Luu et al. 2017) and the ADMIXTURE software version 1.3.0 (Alexander and Lange 2011), respectively. Missing data were inputted using the pairwise covariance approach as the default process in the R package and ignored in ADMIXTURE. The nucleotide diversity was calculated by the pi.dosage functions in hierfstat version 0.5–11 of the R package. The population-specific FIS and population-specific FST were calculated using the fs.dosage functions in the hierfstat package.