Collection sites
Since none of the D. oxycoccana sample collections in this study were carried out in restricted areas, national parks, etc., where permits are required, it is clearly stated that there is no content regarding collection permits. We examined a total of 632 D. oxycoccana individuals obtained in 2011‒2013 from 31 different population collections from Korea (22 collections) and USA (9 collections) (Fig. 1; Supplementary Material Table S1). Of these, 28 collections were obtained from highbush, lowbush, or rabbiteye blueberries (Vaccinium spp.) in Korea and USA, while three collections were obtained from cranberries (V. macrocarpon) in USA. Samples from US populations were collected from the central (Michigan), southern (Georgia), and eastern (New Jersey) regions, which comprise the native range of D. oxycoccana and possible sources of invasion into Korea. In the invaded region, samples from Korean populations were collected from areas where D. oxycoccana occurred during the period of initial invasion (2011‒2013). All samples consisted of D. oxycoccana larvae, which is the damaging stage living inside the plant tissues and thus having a strong association with its host plant. To avoid sampling related (sibling) D. oxycoccana individuals, specimens used in the molecular analyses were collected from different host plants that were distantly located. All freshly (live) collected D. oxycoccana larvae used for molecular analyses were carefully removed from infested buds and preserved in 95% or 99% ethanol, and stored at -70°C.
Microsatellite genotyping
A total of 632 D. oxycoccana individuals were genotyped using 12 microsatellite loci (Dox08, Dox09, Dox10, Dox11, Dox12, Dox22, Dox23, Dox25, Dox30, Dox33, Dox41, Dox42), which were previously isolated from this species (Kim et al. 2015). In a preliminary test, all loci developed in the previous study (Kim et al. 2015) were polymorphic among most population samples, and were thus included in the henceforth analyses. Total genomic DNA was extracted from single individuals using LaboPass ™ Tissue Genomic DNA mini Kit (COSMOGENETECH, Daejeon, Korea) according to the manual protocol. All genomic DNA templates were extracted from the whole body of D. oxycoccana larvae, which were mostly at the final instar stage. The tissues were left in the lysis buffer with protease K solution at 55°C for 24 hours, and then the cleared cuticle was dehydrated. Microsatellite amplifications were performed using AccuPower® PCR PreMix K-2037 (BIONEER, Daejeon, Korea) in 20 µl reaction mixtures containing 0.5 µM forward labeled with a fluorescent dye (6-FAM, HEX, or TAMRA), reverse primers and 0.05 µg of DNA template. Polymerase chain reaction (PCR) was performed using a GS482 thermo-cycler (Gene Technologies, Essex, UK) according to the following procedure: initial denaturation at 95°C for 5 minutes, followed by 34 cycles of 95°C for 30 seconds; annealing at 56°C for 40 seconds; extension at 72°C for 45 seconds, and a final extension at 72°C for 5 minutes. PCR products were visualized by electrophoresis on a .5 % agarose gel with a low range DNA ladder to check for positive amplifications. Automated fluorescent fragment analyses were performed on the ABI PRISM 377 Genetic Analyzer (Applied Biosystems, Waltham, MA, USA), and allele sizes of PCR products were calibrated using the molecular size marker, ROX labeled-size standard (GenScanTM ROX 500, ABI, Waltham, MA, USA). Raw data on each fluorescent DNA products were analyzed using GeneMapper® version 4.0 (ABI, Waltham, MA, USA).
Data analyses
For the 632 individual samples, the results of allele data analyses were processed in GENALEX 6.503 (Peakall, Smouse 2012) through Microsoft office Excel 2019 (Microsoft). We used GENCLONE 2.0 (Arnaud-Haond, Belkhir 2007) to identify multilocus genotypes (MLGs) among D. oxycoccana populations (Dorken, Eckert 2001). Observed (HO) and expected heterozygosity (HE) values were estimated on multiple loci using GENEPOP 4.0.7 (Raymond, Rousset 1995) among the population datasets as well as between the regional datasets. Levels of significance for Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium tests were adjusted using sequential Bonferroni correction for all tests involving multiple comparisons (Rice 1989). Deviations from HWE were tested for heterozygote deficiency or excess. MICRO-CHECKER (Oosterhout et al. 2004) was used to test for null alleles (Brookfield 1996) and identify possible scoring errors because of the large-allele dropout and stuttering. The program FSTAT 2.93 (Goudet 2002) was used to estimate the gene diversity (HS), a mean number of alleles (NA), allelic richness (RS) and inbreeding coefficient (FIS).
Different groupings were tested independently on the (1) ecological basis (cranberry-associated versus blueberry-associated; ‘case 1’), (2) geographical basis (source versus invasion; D. oxycoccana from blueberry only; ‘case 2’), and (3) genetic structure-based groups (A, B, C, D, E; D. oxycoccana from blueberry only; ‘case 3’) with analysis of molecular variance (AMOVA) in ARLEQUIN 3.5.1.2 (Excoffier, Lischer 2010), with significance determined using the non-parametric permutation approach described by Excoffier et al. (1992). We also used ARLEQUIN for calculations of pairwise genetic differentiation (FST) values (Weir, Cockerham 1984), in which 31 populations were assigned by each local collection. Exact test of population differentiation was done as optioned by 100,000 Markov chains, 10,000 Dememorisation steps, and 0.05 significance level.
The program BOTTLENECK 1.2.02 (Piry et al. 1999) was used to identify in our samples the possible effect of a recent bottleneck, separately for each population. Two mutation models, considered appropriate for microsatellites (Cornuet, Luikart 1996; Piry et al. 1999), were applied as the strictly stepwise mutational model (SMM) and the two-phase model (TPM). For the TPM, a model that includes both 90% SMM and 10% TPM was used for 20,000 iterations. Significant deviations in observed heterozygosity over all loci were tested using a nonparametric Wilcoxon signed-rank test (Cornuet, Luikart 1996; Piry et al. 1999).
To examine genetic relationships among 31 D. oxycoccana populations, we used principal coordinate analysis (PCoA) on a genetic distance matrix based on codominant-genotypic distance (Peakall et al. 1995; Smouse, Peakall 1999) provided in GENALEX 6.503 (Peakall, Smouse 2012). The PCoA is a multivariate technique that allows to plot and visualize major patterns within a multivariate dataset (e.g., multiple loci and multiple samples) (Peakall et al. 1995; Smouse, Peakall 1999). Plots of the PCoA were independently estimated and drawn based on both individual and population distances.
STRUCTURE 2.3.4 (Pritchard et al. 2000) was used to analyze the genetic structure of 31 D. oxycoccana populations using a Bayesian clustering approach. We set the number of clusters (K) from 1 to 11 and conducted five independent runs for each value of K. Each run consisted of a burn-in period of 30,000 steps, followed by 500,000 Markov chain Monte Carlo (MCMC) repetitions with a model allowing admixture. The ΔK value calculated as ‘∆K = m(|L′′(K)|) / s[L(K)]’ was obtained using the ad hoc quantity, which is calculated based on the second-order rate of change of the likelihood (Evanno et al. 2005). To correctly perform this process, ∆K was calculated using the online resource STRUCTURE HARVESTER 0.6.94 (Earl 2012) that explained the data structure. Visualization of the STRUCTURE results was conducted using DISTRUCT 1.1 (Rosenberg 2004).
In addition, GENECLASS 2 (Piry et al. 2004) was used to perform the assignment/exclusion tests, which were used for the detection of genetic signatures of dispersal and immigration. For each individual of a population, the program estimated the probability of belonging to any other reference population or to be a resident of the population where it was sampled. The sample with the highest probability of assignment was considered as the most likely source for the assigned genotype. We used a Bayesian method of estimating population allele frequencies (Rannala, Mountain 1997) with Monte-Carlo resampling computation (10,000 simulated individuals) to infer the significance of assignments (type I error, alpha = 0.01) (Paetkau et al. 2004).
Approximate Bayesian computation analysis
To estimate the relative likelihood of alternative scenarios of the D. oxycoccana invasion, an approximate Bayesian computation (ABC) was performed for microsatellite data, as implemented in DIYABC 2.1.0 (Cornuet et al. 2014). DIYABC allows for the comparison of complex scenarios involving bottlenecks, serial or independent introductions, and genetic admixture events in introduced populations (Estoup and Guillemaud 2010). The parameters for modeling scenarios are the times of split or admixture events, the stable effective population size, the effective number of founders in introduced populations, the duration of the bottleneck during colonization, and the rate of admixture (Cornuet et al. 2010). The software generates a simulated dataset used to estimate the posterior distribution of parameters to select the most likelihood scenario (Cornuet et al. 2010). DIYABC generates a simulated dataset that is then used to select those most similar to the observed dataset, and so-called selected dataset (nδ), which are finally used to estimate the posterior distribution of parameters (Cornuet et al. 2008).
The DIYABC analysis was conducted on the purpose of inferring (1) the serial divergence between blueberry and cranberry host races in Analysis #1; (2) the initial introduction process of D. oxycoccana from the source (North American) to the invaded (Korean) regions in Analysis #2. Considering the results of PCoA, STRUCTURE and GENECLASS2 (see Results), several populations could be selectively used to estimate the scenarios based on their relationships. In the first ABC analysis that tested the divergence between blueberry and cranberry D. oxycoccana populations, Analysis #1, we set one cranberry-associated group (CR) with populations US-C-NJ4, US-C-MA, US-C-WC, and three blueberry-associated groups; New Jersey (USA) group (BBNJ) with populations US-B-NJ1, US-B-NJ2, US-B-NJ3, Georgia (USA) group (BBGA) with populations US-B-GA1, US-B-GA2, and Michigan (USA) group (BBMG) with population US-B-MG. Three scenarios (1‒3) were estimated with comparison to each other in the DIYABC (Supplementary Material Fig. S1). In the second ABC analysis (Analysis #2) that tested the introduction process of D. oxycoccana from the source (USA) to the invaded region (Korea), we set one source blueberry-associated group (SCNJ) with populations US-B-NJ1, US-B-NJ2, US-B-NJ3, and two invasive blueberry-associated groups; one invasive group A (INVA) with populations KR-B-UW, KR-B-HS1, KR-B-KY, KR-B-CW, KR-B-YD, KR-B-DA, KR-B-IS, KR-B-DJ, KR-B-SC, KR-B-HW, KR-B-NH, KR-B-JJ1, and another invasive group B (INVB) with populations KR-B-GJ, KR-B-HS2, KR-B-PT, KR-B-YS, KR-B-SJ, KR-B-BH1. SCNJ was set as the source group because all populations within this group came from New Jersey (USA), where D. oxyccocana originates and thus a likely source of invasion into Korea. Three scenarios (1‒3) also were estimated with comparison to each other in the DIYABC (Supplementary Material Fig. S2).
We produced 1,000,000 simulated datasets for each scenario. We used a generalized stepwise model (GSM) as the mutational model for microsatellites, which assumes increases or reductions by single repeat units (Cornuet et al. 2008). To identify the posterior probability of these three scenarios, the nδ = 30,000 (1%) simulated datasets closest to the pseudo-observed dataset were selected for the logistic regression, which were similar to the nδ = 300 (0.01%) ones for the direct approach (Cornuet et al. 2010). The summary of statistics was calculated from the simulated and observed data for each of the tested scenarios such as the mean number of alleles per locus (A), mean genetic diversity for each group and between group, genetic differentiation between pairwise groups (FST), classification index, shared alleles distance (DAS), and Goldstein distance.