Sampling and growth rate measurement
The translocated population of B. vittatum was established in 1993 using a mixture of three genetically divergent, conspecific source populations: the Abrolhos Islands (28° 57′ S 113° 57′ E); Albany (35° 04′ S 117° 55′ E); and Penguin Island (32° 18′ S 115° 41′ E). Pure B. vittatum were represented by samples from each of the source populations of the translocated population (Albany, Penguin Island and the Abrolhos Islands; Fig. 1) collected in 1993 (n=15). Pure B. auratum samples included those collected from Fremantle (n=10) in 1994, prior to hybridisation, and collected from Bayonet Head (n=20) in Albany in 2021. Samples were collected from the translocation site in 1996 (n=46), 2007 (n=64) and 2020 (n=73) to determine the amount of admixture between the parent species at each time point.
In 2007 and 2020 individuals were collected from the translocated population as juveniles (snails <5mm shell width) to conduct a growth rate experiment (Brooshooft, 2009). The dorsal surface of each juvenile was sprayed with non-toxic enamel paint, which has no detectable effect on growth rate (Parsons, 1997), and juveniles were left to grow for six months. Painted snails were then recaptured (n=191 in 2007 and n=384 in 2020), and measurements of initial width (indicated by the outer margin of spray paint) and final width (measured as the widest distance along the base of the shell) were obtained using digital callipers to 0.1mm accuracy. Growth rate was standardised using the below equation, to ensure measurements of growth rate were not confounded by initial size (as initial size increases, growth rate decreases) (Johnson and Black, 1998). Growth was standardised to a common initial width of 4.5mm using the following equation:
y’ = y + m(X-x),
where y’ is standardised growth, y is the measured change in width, m is the slope of regression of change in width against initial width, X is the standardised width (4.5) and x is the initial width. Untransformed data were used to obtain the slope of regression between change in width (y) and initial width (x) as no transformation resolved skewness of the data. The slope of the regression was significant only for samples collected in 2007, not 2020, but we standardised growth rate for both years using the above equation for consistency and to account for any effect of initial size on growth rate. Based on standardised growth rate, we chose the fastest and slowest growing individuals from 2007 (n=32 for both fast and slow growers) and 2020 (n=37 for both fast and slow growers) for Single Nucleotide Polymorphism (SNP) sequencing. Samples were stored at -80 o C awaiting DNA extraction.
DNA extraction
Genomic DNA was extracted from the head and foot tissue of each snail from each snail, following a combination of a modified DNeasy Plant Mini Kit (Qiagen) extraction protocol and CTAB extraction method (Cummins et al, 2023). Tissue was homogenised in 400 µL of CTAB buffer (2% CTAB, 1.4M NaCl, 20mM EDTA, 100mM Tris-HCL pH 8, 0.2% β-mercaptoethanol) using a mini pestle and some sand. 10 µL of 20 mg/mL of Proteinase K was added prior to incubation of each sample at 60 o C overnight. From incubation onwards, the manufacturer’s instructions outlined in the Dneasy Plant Mini Kit (Qiagen) were followed. To assess DNA concentration and quality, Nano-drop and gel electrophoresis were used, respectively.
Mitochondrial and genomic DNA sequencing
Mitochondrial DNA (mtDNA) haplotypes for each individual collected from the translocated population and each of the source populations were determined based on a partial sequence of the mitochondrial 12S ribosomal rRNA gene. 12S allows identification of individuals in terms of species and populations within species (Kennington et al, 2012). 324 base-pair fragments of the 12S ribosomal gene were PCR amplified using the method described in Williams et al. (2003). PCRs were performed as described in (Cummins et al, 2023), and PCR products were sent to the Australian Genome Research Facility (AGRF) for purification and sequencing. Sequences of 12S haplotype were edited and aligned manually using Geneious Prime 2022.0.1 (https://www.geneious.com).
Single Nucleotide Polymorphism (SNP) data were generated via a genotype-by-sequencing method (DarT-seq; http://www.diversityarrays.com/). DarTseq involves a combination of Illumina sequencing and DarT complexity reduction methods (DarT-seq; http://www.diversityarrays.com/). Preliminary data filtering and quality control were performed by DarT, using DarT’s proprietary algorithm, and SNPs were called from a groomed dataset of raw sequence reads via a secondary analytical pipeline (Kilian et al, 2012).
SNP filtering
SNP filtering was carried out using the R package ‘dartR’ (Gruber et al, 2018). SNP loci were removed if sequence coverage was exceptionally low (<5) or high (>50), the proportion of missing values was high (call rate <0.95 by individual and by locus), locus repeatability was <0.9 by individual and <0.95 by locus, minor allele frequency was <0.05 and if loci shared a sequence tag (secondaries). In the case of secondaries, the more repeatable SNP locus was retained.
As previously described in (Cummins et al, 2023), we refer to SNP loci that were fixed and contained alleles that were private (unique) to one species as diagnostic loci. Identification of diagnostic loci was based on the allele frequencies (reference and alternative SNP allele) at each locus in each population, calculated using the R package ‘hierfstat’ (Goudet, 2005). Alleles were considered private for B. auratum if they were present in both B. auratum populations (Albany 2021 and Fremantle 1994) and absent from all B. vittatum source populations. Alleles with a frequency equal to one in both B. auratum populations were considered fixed. The same approach was used to identify B. vittatum private alleles. These alleles were present in all B. vittatum source populations (Albany, Abrolhos and Penguin Island) and absent from both B. auratum populations. Loci that had alleles that were both private and fixed for either species were retained as diagnostic loci.
Using a similar filtering process, we identified alleles that were private to each B. vittatum source population. Alleles were considered private to either Abrolhos, Albany or Penguin Island if they were present in a single B. vittatum source population and absent from the other two B. vittatum sources and the native B. auratum population. Loci that were private to each B. vittatum source population were not necessarily fixed.
Data analysis
We used generalised linear models fitted with a quasibinomial error distribution to test the effect of differing levels of admixture and mtDNA haplotype on growth rate in the hybrid population. This approach was used in preference to assigning individuals to hybrid classes using NewHybrids (Anderson and Thompson, 2002) and testing for an association with growth rate because our previous analysis revealed a narrow range of generational classes in the 2007 and 2020 population samples (Cummins et al. 2023). We constructed the model using the ‘glm’ function from the R package ‘stats’ (R Core Team, 2020), with growth rate as the response variable, a measurement of admixture or heterozygosity as a main effect and an interaction between mtDNA haplotype and the main effect to look for a potential cyto-nuclear interaction effects. Standardised growth rate values for fast and slow growers from each year were converted into a binary representation, where ‘1’ represented fast growers and ‘0’ represented slow growers and the model was run separately for 2007 and 2020. Admixture coefficients were calculated for each individual from 2007 and 2020 using the software package Structure 2.3.4 (Pritchard et al, 2000). The first analysis included all loci and all sampled populations, and the second included diagnostic loci and only the native B. auratum population from Fremantle, the three B. vittatum source populations and the three timepoints sampled from the admixed population (Fremantle 1996, 2007, 2020). The number of genetic clusters (K) ranged from one to nine in the first analysis and from one to seven in the second analysis, with a burn-in of 10 000, 100 000 MCMC replicates (assuming correlated allele frequencies), and ten replicates per K in each analysis. High admixture coefficients (>0.5) corresponded to a high proportion of B. auratum alleles, and low values (<0.5) corresponded to a high proportion of B. vittatum alleles. Another measure of admixture (hybrid index) was calculated using the R package ‘introgress’ (Gompert and Buerkle, 2010), using all loci and using just diagnostic loci. ‘introgress’ measures the proportion of parental ancestry of each hybrid, based on the proportion of alleles inherited from B. auratum or B. vittatum parents, via a maximum likelihood estimation of hybrid index. For both analyses, B. vittatum collected from the Abrolhos, Albany and Penguin Island represented the source of B. vittatum alleles, and B. auratum collected from Fremantle (1994) and Albany (2021) represented the source of B. vittatum alleles. High hybrid index values (>0.5) corresponded to greater B. auratum ancestry and low values (<0.5) corresponded to B. vittatum ancestry. In addition, we calculated observed heterozygosity and the frequency of the homozygous reference and the homozygous alternative SNP, using ‘dartR’. Each of these measurements was also tested as a main effect. We tested for an association between each measure of admixture and between the same measurements calculated using all loci versus diagnostic loci using Spearman’s correlation test (Spearman, 1904) in R. We investigated the effect of species-specific haplotype (12S partial gene matching B. auratum or B. vittatum) and population-specific haplotype (12S partial gene matching B. vittatum source Albany, Abrolhos or Penguin Island) on the relationship between admixture and growth rate.
For alleles that were unique to either the Abrolhos, Albany or Penguin Island, we calculated the change in frequency over time of each allele in the Fremantle admixed population, from 1996 to 2020, to see if any alleles increased in frequency over time, contrary to the overall pattern. For all filtered loci we calculated Weir and Cockerham’s estimate of locus-specific FST (Weir and Cockerham, 1984) in the Fremantle admixed population, using ‘hierfstat’. The significance of FST estimates for each locus was determined by comparing the observed value to a distribution obtained by randomly shuffling genotypes among population samples (Fremantle 1996, 2007 or 2020) and recalculating the FST value 1000 times. For loci that were unique to a B. vittatum source population and significantly increased in frequency over time in the Fremantle population, we estimated pairwise linkage disequilibrium between loci, using the R package ‘snpStats’ (Clayton, 2022).
To investigate whether individual loci had a significant effect on growth rate, we used a similar generalised linear model to that described above. The analysis was conducted using all SNP loci and using just diagnostic loci. Standardised growth rates were converted to ‘0’ for slow growers and ‘1’ for fast growers. As above, we constructed the quasibinomial generalised linear model using the ‘glm’ function from the R package ‘stats’. We included hybrid index in the model as the main effect, to account for the effect of admixture on growth rate and independently tested the effect of the genotype at each locus (0 = two copies of reference allele, 0.5 = heterozygote and 1 = two copies of alternate allele) as a covariate in the model. We did not include the interaction effect of haplotype in the model because it was found to be non-significant in the previous models. We corrected for multiple comparisons using a permutation method. To do this we randomly shuffled growth rates with respect to the multilocus genotypes of the individuals. After testing the effect of the most common allele at each locus separately, the smallest P-value was extracted from each permutated data set. The P-value of the real non-permuted data was considered significant at P < 0.05 if it was lower than the 5% percentile of the distribution of 1000 permuted values. We also used the ‘p.adjust’ function in the R package ‘stats’ to control for the false discovery rate (Benjamini and Hochberg, 1995) of loci showing a significant association with growth rate.
To assess whether (putatively) adaptive variation within each species showed patterns of introgression that differed from other loci we first conducted outlier analyses on each species separately using the source populations for B. vittatum and samples from Albany and Fremantle (prior to the introduction of B. vittatum) for B. auratum. Several methods of outlier detection were used. The first was a Bayesian model-based approach, conducted using Bayescan 2.0 (Foll and Gaggiotti, 2008). This method has been shown to have lower rates of type 1 errors, and SNP loci with a q-value of less than 0.1 were considered to be under directional selection. The second method was a principal component approach to identify outlier loci, using the R package ‘PCAdapt’ (Privé et al, 2020), which is more powerful than other outlier detection methods in the presence of population divergence (Luu et al, 2017). We accounted for loci that were close in proximity to each other by applying a linkage disequilibrium thinning step, and used false discovery rate (FDR) to identify outliers while adjusting for multiple comparisons, using the R package ‘qvalue’ (Storey et al, 2019). Loci with a q-value less than 0.05 were retained as putative outlier loci for each species. Third, we used the R package ‘Outflank’ (Whitlock and Lotterhos, 2015) to identify outlier loci based on atypical values of FST.
We then used BLAST (Altschul et al, 1990) to see if the sequences of any loci with potentially interesting patterns (including outlier loci and alleles unique to B. vittatum source populations that increased in frequency in the admixed population) were associated with genic regions. For B. vittatum loci that increased in frequency, we conducted an initial general nucleotide blast search (blastn) using both the megablast (for highly similar sequences) and discontiguous megablast (for more dissimilar sequences such as sequences from different species) settings. A match was considered significant with an e-score of 1x10-10 or less. Due to the underrepresentation of chromosome level genome assemblies (including annotation) for gastropods, and the short sequence length (69 base pairs) of SNP sequences, few matches resulted from a general nucleotide blast. We also blasted our sequences against the ‘whole-genome shotgun contigs’ database, including gastropod genomes assembled to the level of contigs or scaffolds. If any loci significantly matched part of another gastropod species assembly, we used the matching region, plus surrounding base pairs to provide a longer sequence (150 base pairs) to blast.