Data Collection
Sampling and DNA isolation
Typically, caecilians are considered difficult to find due to their mostly burrowing nature [87], such that substantial, dedicated fieldwork can be required to collect suitable numbers of samples for phylogeographic projects. This difficulty in sampling is perhaps one reason that very few studies of caecilian molecular ecology have been undertaken [74–76]. Hypogeophis rostratus were collected on nine granitic islands in the Republic of the Seychelles between 1981 and 1991, and between 2013 and 2015: Mahé, Sainte Anne, Cerf, Silhouette, Frégate, La Digue, Praslin, Curieuse, and Félicité (Fig. 3). Our sampling covers the known distribution of the species [56] with the exception of the small island of Grande Soeur. We included a total of 782 individuals in the various components of this study (Additional file 2). The specimens collected between 1984 and 1991 for molecular analyses (mt- & nuDNA sequences and AFLPs) were either returned live to the lab, subsequently euthanized, and liver and muscle tissues snap frozen and stored at -80°C at University of Michigan Museum of Zoology; or they were collected in the field and snap frozen. Additional tissue samples were collected between 2013 and 2015 for molecular analyses (mt- and nuDNA sequences) and were processed in the field or were non-lethally sampled (buccal swabs and/or annular pocket biopsies, [following 88]) and stored in 100% EtOH. Specimens used for morphological analyses (collected between 1984 and 1991) were fixed in 10% buffered formalin and stored in 65 – 70% EtOH. Samples utilized in this study are represented by vouchers accessioned to the permanent collections of the Natural History Museum, London (BMNH, issued with SM field codes) and the University of Michigan Museum of Zoology (UMMZ) (Additional file 2). Genomic DNA from 312 individuals of Hypogeophis rostratus from nine islands was obtained from tissue by a standard [89] proteinase K digestion with either a phenol/chloroform/isoamyl alcohol extraction or a Qiagen DNeasy Blood and Tissue kit [see 88 for non-lethal DNA extraction protocols].
DNA amplification for Sanger sequencing
The polymerase chain reaction (PCR) was used to amplify a portion of the mtDNA cytochrome b (cytb) gene from 100 individuals sampled from nine islands (Additional file 1) and four nuclear loci for 26 individuals from seven islands; samples for sequencing were semi-randomly selected in order to obtain geographic coverage. The nuclear loci consisted of portions of two protein coding genes (pro-opiomelanocortin (pomc) and brain-derived neurotrophic factor (bdnf)) and two anonymous nuclear markers (brev5 and rost5) [79]. See Additional file 1 for primer, DNA amplification, sequencing and sequence editing details.
Amplified Fragment Length Polymorphism (AFLP) data generation
AFLP data were used to assess nuclear genetic variation among sampled populations of H. rostratus. AFLP marker profiles were generated for a total of 274 individuals from seven islands (all of these seven islands were among those sampled in the mitochondrial and nuclear DNA sequence datasets: Additional file 2), collected between 1984 and 1991, using the protocol described by Vos et al. [90] with modifications described in Mock and Miller [91]. In the second selective amplification step, the following four primer combinations were used to generate multilocus DNA fingerprints: EcoACG/MseAGA, EcoACG/MseACT, EcoAGG/MseATC, EcoAGG/MseAGA. After the second selective amplification, PCR products were run on an ABI 3100 automated DNA sequencer (Applied Biosystems, Inc.) using a Rox400 internal size standard. Repeatability was checked by duplicating ca. 15% of samples. Following sequencer runs, the presence or absence of individual AFLP marker phenotypes were visualized and scored using Genographer [92]. Markers were scored if they were polymorphic (95% criterion) and could be scored unambiguously across the dataset. Scoring was performed without reference to sample or population identity. The final AFLP dataset included data from 49 polymorphic loci.
Phenotypic traits
Morphological data were generated for 506 individuals of H. rostratus from eight islands (including all islands that were sampled for mitochondrial and nuclear DNA sequence data: see Additional file 1). Only individuals collected between 1984 and 1991, that were in a good preservation state, were examined for morphology. The sex of 461 of the specimens was determined by direct examination of gonads (females = 231, males = 230). We examined 5 males and 4 females from Curieuse, 9 males and 11 females from Félicité, 17 males and 21 females from Frégate, 30 males and 15 females from La Digue, 98 males and 115 females from Mahé, 37 males and 39 females from Praslin, 2 males and 2 females from Sainte Anne, and 26 males and 17 females from Silhouette. A ruler was used to record total length to the nearest 1.0 mm. Dial calipers were used for all other measurements. Body width was measured to the nearest 1.0 mm, all remaining measurements to the nearest 0.1 mm. The morphological measures recorded were: total length (TL), body width at midbody (BW), head length measured dorsally from tip of snout to first groove of first collar (HL), head width at level of corner of mouth (HW), inter-ocular distance between medial borders of eyes (IO), inter-narial distance between medial borders of nares (IN), eye-naris distance from anterior margin of eye to posterior margin of naris (EN), eye-tentacle distance from anterior corner of eye to midpoint of tentacular aperture (ET), tentacle-naris distance from midpoint of tentacular aperture to posterior margin of naris (TN), number of primary annular folds (grooves) after second collar (PF), number of primary annular folds interrupted by the vent (VF), number of primary annuli bearing partial or complete secondary annular folds (SF), number of secondary folds that completely encircle the body (CSF), total number of vertebrae counted from x-ray plates (VERT), number of overlapping rows of scales in a posterior primary fold counted in the dorsal region (SR), and number of primary folds containing scales (PWS) determined using the method described by Wilkinson et al. [93].
Data Analyses
Mitochondrial and nuclear DNA sequence analyses
We inferred relationships among individuals by constructing haplotype networks and by inferring phylogenetic trees. Individual alleles for diploid nuclear sequences were reconstructed using PHASE v.2.1.1 [94]. Input files for PHASE were produced using seqPHASE [95]. PHASE was executed three times for each locus, at a random starting seed, for 1000 iterations, a 10 thinning interval, and 100 burn-in. Each run was examined for mean frequency concordance, and the run that was most similar to zero selected for further analysis. Sites with heterozygous probabilities of ≥0.7 were considered to be correctly called by PHASE, others being coded as IUPAC ambiguities. Haplotype networks for cytb and nuclear sequence data were constructed using the median-joining algorithm [96] as implemented in the software NETWORK v.4.611 (fluxus-engineering.com). Uncorrected pairwise (p-) molecular genetic distances and Tajima D values were estimated using MEGA X [97].
Phylogenetic trees were inferred for cytb using Bayesian inference (BI) methods. PartitionFinder v.2.1.1 [77] was used to identify the best-fitting partitioning strategy and models of sequence evolution. MrBayes v.3.2.2 [78] was used to infer the BI tree for cytb, sampling every 10,000 generations over 106 generations with one cold and three heated chains. Tracer v.1.7 [98] was used to check that chain convergence and good mixing occurred for all parameters and that all effective sample size (ESS) values were >200. Convergence was also investigated by assessing that the potential scale reduction factor (PSRF) was close to 1.0 for all parameters, that the average standard deviation of split frequencies was below 0.01 and that log probability was within a relatively stable range and not still increasing. The first 10% of trees were discarded as burn-in. Cytb sequence data for a congeneric endemic Seychelles caecilian, H. brevis, were obtained from GenBank (HQ444110), and used as a closely related outgroup [38] to root phylogenetic trees. BI analyses were performed on the CIPRES Science Gateway v.3.1 [99].
Based on results from the initial molecular analyses, the Isolation-with-Migration (IM) model [100] as implemented in IMa2 [80, 101] was used to estimate population sizes, migration and time since divergence between the two major H. rostratus clades found ((Praslin, La Digue, Felicite, Curieuse, Frégate) and (Mahé, Silhouette, Cerf, St Anne)). The null hypothesis is that, since divergence, there has been no gene flow between the two (northern and southern island-group) lineages because of a saltwater barrier separating the two groups [28–32]. Initially, the mitochondrial cytb gene was used in the analyses (as its mutation rate would allow us the estimate of the parameters in meaningful demographic units) but after multiple independent runs, double peaks were observed in several parameters, which were likely a consequence of reciprocal monophyly between the two groups for cytb data. Final IMa analyses therefore were carried out with only the nuclear data. No molecular rate was provided for the nuclear data because of a lack of comparative evolutionary rates for these markers. The IS model was used for the nuclear loci. Posterior density curves were inspected for clear peak estimates and appropriate prior distributions boundaries. Priors were defined based on summary statistics and adjusted following initial runs. The final analysis was run five times with different starting seeds and checked for consistency between runs.
Time estimates for the split between the northern- vs southern-island lineages were made using *BEAST in BEAST2, defining each clade as a “species” [102] for all of the sequence data (mt- and nuDNA). We estimated divergence dates based on an evolutionary rate of approximately 1% sequence divergence per million years with a lognormal distribution and a standard deviation of 0.0027 for cytb, following other studies of amphibians and reptiles [45, 103]. Nuclear rates were estimated relative to cytb. PartitionFinder v.2.1.1 [77] was again used to identify the best-fitting models of sequence evolution based, this time on each locus. Some of these were later replaced by their immediate less complex most similar model, when they failed to converge. The Yule Model prior was used as the tree-prior because we had no evidence of extinction in the population, and because choice of prior is unlikely to impact estimated divergence dates significantly [104]. As a test of time estimate robustness we also ran analyses using the Birth Death Model and the Coalescent (constant population size) priors. A log normal relaxed clock was used as the clock-prior for the cytb partition, and strict clock priors were used for the nuclear partitions (according to the coefficient of variation obtained during preliminary runs, a strict clock was considered appropriate to use). Analyses were run for 106 generations, with sampling every 10,000 generations. Tracer was used to investigate that good mixing was occurring between all parameters and that ESS values were >200 as per previous analyses.
AFLP analyses
Two different approaches were used to quantify within-island AFLP diversity. First, the program TFPGA [105] was used to calculate Nei’s [81] unbiased heterozygosity. Allele frequencies were calculated from the dominant AFLP marker data using the allele frequency estimator of Lynch and Milligan [106]. Second, the program MANTEL-STRUCT [107] was used to obtain average pairwise band sharing coefficients between individuals within islands. This procedure calculates the average proportion of shared AFLP marker phenotypes among individuals.
Genetic structuring among islands and island groups was characterized using Wier and Cockerham’s [108] estimate (θST) of Wright’s FST in Arlequin v.3.5 [109]. Ninety-five percent confidence limits of qST were obtained via bootstrapping over loci. Given that other analyses of our genetic data indicated the presence of two distinct island groups (see Results), we also estimated qST and its associated confidence limits for a posteriori grouping of islands. To further refine our inferences about patterns of genetic structure among islands, we used TFPGA to generate a UPGMA dendrogram based on pairwise Nei’s [81] unbiased genetic distances. Support for resulting dendrogram clusters was quantified using a bootstrap procedure over loci (sensu Felsenstein 1985 [110]). To further characterize patterns of genetic differentiation, we used the program NTSYSpc (Exeter Software, Inc.) to perform a principal coordinates ordination to visualize patterns of (dis)similarity among all individuals included in AFLP analyses.
To identify the number of discrete populations (K) that occur within the sampled H. rostratus we used a Bayesian clustering algorithm implemented in STRUCTURE v.2.3.4 [82]. To assess additional possible substructure, subsets of the data that formed distinct clusters in initial analyses were also subject to the same analyses (southern island group; northern island group; Mahé). Each individual analysis comprised four replicates of 100,000 steps with a 10,000 step burn-in. Given our assumption that individuals from the different islands have not been in contact, the no-admixture model with correlated allele frequencies was used [111]; tested K values were specified based on the number of islands for which data were generated for (i.e. K1 – K7 for the full dataset) in each analysis. An analysis of H. rostratus genetic variation within the largest, highest and geographically most complex Seychelles island of Mahé, which has the most sampled populations (seven) and specimens (n = 97), was carried out using the admixture model to investigate intra-island population structure. To infer the most likely K for each dataset STRUCTURE HARVESTER [112] was used to determine DK [113]. The most likely K value was then subjected to independent runs on STRUCTURE of 1 x 106 with a burn-in of 100,000 steps. Final summary figures of STRUCTURE results were created using distruct v.1.1 [114].
Phenotypic data analyses
Nussbaum & Pfrender [115] presented evidence of sexual size dimorphism (SSD) in Frégate H. rostratus. We tested for SSD across all sampled islands using a two-factor analysis of covariance (ANCOVA) with TL as the covariate implemented using PROC GLM [116]. Using ANCOVA, trait means were adjusted for TL and tested for effects of sex, island of origin, and their interaction. Results from this analysis suggested strong dimorphism between males and females for body width and head dimension traits. Subsequent phenotypic multivariate analyses and estimation of levels of population subdivision for phenotypic traits were conducted separately for each sex.
Multivariate analyses for measures and counts of 16 phenotypic traits in 170 males and 146 females were conducted separately for each sex to determine patterns of morphological variation among islands. Principal components analysis (PCA) of metric characters (transformed relative to TL using the allometric vs. standard method [117]) and principal coordinates analysis (PCoA) of metric + meristic variables was implemented using PAST3 [118]. In addition, we estimated Mahalanobis distance matrices using PROC CANDISC (SAS 2003) to generate a matrix of pairwise island phenotypic distances. This matrix was used to construct UPGMA dendrograms of among island morphological variation for each sex using MEGA v.3.1 [119].
To examine patterns of phenotypic divergence on a trait-by-trait basis we obtained restricted Maximum Likelihood (ML) estimates of the variance components explained by island of origin (Vi) and error (Ve) for each trait using PROC VARCOMP [116]. Island and error variance estimates are analogous to estimates of the within and among island components of variance, respectively. These estimates were then used to calculate the level of population subdivision for phenotypic characters, PST, using the formula Vi/(Vi + 2(Ve)). This measure is analogous to the standard measure of quantitative trait subdivision QST [120, 121]. However, because our measurements were not taken from individuals reared in a common environment we cannot partition genetic from environmental effects, and thus PST includes both genetic and environmental sources of variance. We used a nested design, with islands nested within groups (southern, and northern + Frégate island groups) to estimate the levels of subdivision at the level of island group. Because the univariate ANCOVA demonstrated significant sexual dimorphism we examined the correlation between males and females in population subdivision across all phenotypic traits using a least-squared regression of PST values.
Combined data analysis
To assess factors influencing evolution within H. rostratus, simple and partial Mantel tests [122, 123] were employed to investigate isolation-by-distance (IBD) and isolation-by-adaptation (IBA), respectively. Morphometric (male and female), molecular (AFLP and mtDNA) and geographic distance data were used in IBD and IBA comparisons to investigate correlations among datasets. The vegan package [124] as implemented in R v. 3.2.3 [125] was used for all tests. Analyses were completed using the Pearson method with 999 permutations. For direct comparisons, only islands that had data available across all sampled datasets were used in tests, leaving a total of seven islands: Curieuse, Frégate, La Digue, Mahé, Praslin, Ste Anne, Silhouette.
To produce distance matrices for the Mantel tests, appropriate analyses for each dataset were completed. For the morphometric data, Gower similarity coefficients for males and females were used and generated by PAST v.3.05 [118]. For AFLPs, ΦPT were generated using 999 permutations with the Microsoft EXCEL add-in GenA1Ex v.6.502 [126, 127]. For mtDNA data the Kimura two-parameter (K2P) model [128] was used to generate K2P distances in MEGA X [97] with partial deletion. The Geographic Distance Matrix Generator v.1.2.3 (Ersts; http://biodiversityinformatics.amnh.org/open_source/gdmg/) was used to generate a pairwise distance matrix for each geographic sampling locality. The probability of the observed correlation was estimated by comparison with a distribution of correlation coefficients generated with 1,000 random permutations of matrix elements.