Sampling
Scat collection was conducted between 2008–2010 within two areas in Central Belize (Figure 1A) as part of the Global Felid Genetics Program: The Central Belize Corridor and Cockscomb Basin Wildlife Sanctuary. The Central Belize Corridor (17.349140° N, 88.455310° W, 50m elevation) has been identified as an important link between jaguar populations in northern and southern Belize (24). Cockscomb Basin Wildlife Sanctuary (16.7162° N, 88.6608° W, 500 m elevation) houses the highest density of jaguars in Mesoamerica (21,23). These two areas play a crucial role in the maintenance of the Mesoamerican Biological Corridor, comprised of a network of protected areas stretching from Mexico to Panama (21). A total of 852 scats were collected opportunistically along paths, trails, roads, and transects. Samples were georeferenced at the time of collection and stored individually at room temperature using silica gel beads until DNA extractions. All faecal material has been deposited in the Sackler Institute for Comparative Genomics at the American Museum of Natural History.
DNA extraction and species identification
About 200mg of the dry sample was shaved from each scat and used to extract genomic DNA using a QIAmp DNA extraction Stool Mini Kit (Qiagen Inc., Valencia, California, USA). Samples were screened to identify species via PCR using three mitochondrial gene regions including cytochrome b (H15149)(40,41), 12S rDNA (L1085 and H1259)(42) 16S rDNA (L2513 and H2714)(42) and 16Scp (16S cp-F 16S cp-R)(42). PCR amplifications were done using G&E Ready-to-go PCR Beads (GE Healthcare) in 25ul reaction volumes containing 0.2uM of each primer, 0.3uL BSA and 1ul of DNA. PCR profiles for each reaction are available as supplementary material (see Additional file 2). All amplifications included negative controls. Cycle-sequencing was performed with the BidDye® Terminator v. 1.1 cycle Sequencing kit (Applied Biosystems, Lennik, Belgium) using the PCR primers. Amplified products were visualised and scored in an ABI 3730xl DNA Analyzer (Applied Biosystems, Carlsbad, CA). Sequences were aligned and manually corrected using GENEIOUS v.6.5 (Biomatters Ltd., Auckland, New Zealand). Species identification was achieved by comparing consensus sequences to known felid references and by constructing a phylogenetic tree with 93% similarity using the Jukes-Cantor neighbourhood-joining model (43). Species ID was validated if at least three gene fragments successfully identified the same species.
Individual and sex identification
Samples identified as P. onca were genotyped for twelve polymorphic microsatellite loci (FCA 32, 75, 96, 107, 126, 100, 124, 132, 212, 229, 275, and 225)(44). Each sample was screened 4–6 times to have enough replicates to assign the alleles correctly. The genotypes were validated if they successfully scored at least three times. Samples that failed to amplify at least six loci were not considered in our analysis. We amplified sets of 3 primers in multiplex PCR reactions and FCA225 by itself. Amplification was performed using the Qiagen Multiplex Master Mix ®, with a final volume of 20ul following manufacturers recommendations.
PCR reaction conditions were optimised in the annealing temperature for each set of primers as follows: 95°C for 15min, followed by 13 cycles of 94 °C for 30s, 57.4–62.4 °C for 90s, 72°C for 60s, followed by 32 cycles of 94°C for 30s, 55–60 °C for 90 s, 72°C for 60s and 60 °C for 30min (see Additional file 2). Amplified products were visualised and scored in an ABI 3730xl DNA Analyzer (Applied Biosystems, Carlsbad, CA). PCR products were analysed using GeneMapper v4.0 following the quality index measure to validate genotypes (Miquel et al. 2006). In order to further validate the allele calling, we used FRAGMAN v.1.0.8 package in R (45). Identification of unique multilocus genotypes was achieved using the ALLELEMATCH package in R v.2.5 (46), which accommodates for genotyping error and missing data by making pairwise comparisons and finding similarity scores for each pair of profiles and clustering similar ones. To determine if genotypes were not different individuals matched by chance, the cumulative probability of identity for unrelated individuals (PID) and siblings (Psib) was calculated. Loci with Psib < 0.010 were positively identified as single individuals (15).
Gender was determined following the PCR-CTPP method for sex identification developed by Wei et al. (2008) based on zinc finger alleles (ZFX/ZFY). The two forward primers (ZF–1F and ZFY–2F) were fluorescently labelled with FAM and MAX, respectively. Amplification was performed using G&E Ready-to-go PCR Beads (GE Healthcare) in a final volume of 25ul containing 2mM of each primer and 5ul of DNA. The PCR conditions were as follows: 95°C for 5min, followed by 13 cycles of 95 °C for 45s, 63 °C for 60s, 72°C for 60s, followed by 35 cycles of 95°C for 45s, 53 °C for 45s, 72°C for 60s and 72 °C for 10min (see Additional file 2). DNA amplifications were visualised in a 5% agarose gel to confirm the molecular sexing. Amplified products were visualised and scored in an ABI 3730xl DNA Analyzer (Applied Biosystems, Carlsbad, CA). PCR products were analysed using GeneMapper v4.0.
Population genetics analyses
MICROCHECKER v 2.2.3 (47) was used to screen null alleles in each locus. The rarefaction procedure implemented in HIERFSTAT v 2.9.3.250 (48) was used to estimate the expected number of alleles and to compare allelic richness (r). Measures of genetic diversity as average number of alleles per locus (A),, allelic richness (AR),, observed (Ho) and expected heterozygosity (He) deviations from Hardy-Weinberg equilibrium (HWE),, inbreeding coefficient (FIS),, and multivariate analyses were performed using the ADEGENET package in R v.2.0.1 (49). Genetic diversity among sampled individuals was summarised in a Principal Component Analysis (PCA) based on the allele frequencies. Linkage disequilibrium (LD) between pairs of loci was performed in GENEPOP v. 4.2 (50) with default settings. We used GenAlex v6.0 (51) to explore multivariate patterns of molecular diversity relative to populations via Principal Coordinate Analysis (PCoA) and Mantel tests of matrix correspondence to test for Isolation by Distance (IBD);; we assessed the partitioning of genetic variation between sampling localities with Analysis of Molecular Variance (AMOVA)..
Population Structure
We estimated population genetic structure using Bayesian assignment methods with STRUCTURE v2.3.4 (52), which assigns individuals to a number K of genetically homogeneous groups, based on the Bayesian estimate in accordance to the expected Hardy–Weinberg equilibrium and absence of linkage disequilibrium between loci. We ran STRUCTURE with the LOCPRIOR option to allow sampling location to assist in the clustering, and we performed 20 independent runs for k = 1–10. We set a burn-in period of 100,000 and 1,000,000 MCMC iterations and assumed an admixture model with correlated allele frequencies. To determine the optimal number of clusters and render bar plots, we implemented the Evanno method (30) using POPHELPER package in R v.1.2.1 (53). Furthermore, we inferred spatial genetic structure with TESS 2.3.1 (54). This program assumes that population memberships follow a hidden Markov random field model where the log-probability of an individual belonging to a particular population, given the population membership of its closest neighbours, is equal to the number of neighbours belonging to this population (55). We tested the CAR, and BYM models with linear trend surface to define the spatial prior for admixture (31); we set a burn-in period of 100,000 and 1000,000 sweeps through 10 independent runs testing the maximal number of clusters from 1—10. To decide the optimal K, we plotted the deviance information criterion (DIC) against Kmax. We used GENELAND v4.0.6 (55) as an additional method to infer the number of populations and the spatial location of genetic discontinuities. This program allows using georeferenced individual multilocus genotypes to infer the number of populations and uses the spatial location of genetic discontinuities between those populations. We determined K across 20 independent runs with 1,000,000 MCMC iterations. Thinning was set at 100, allowing K to vary from 1 to 10. We used the correlated allele model and set the maximum rate of the Poisson process at 50 (the number of individuals), the maximum number of nuclei in the Poisson-Voronoi tessellation at 150 (three times the number of individuals), and the uncertainty of spatial coordinates of the collection at 25 meters. We re-ran the analysis ten times to check for consistency across runs.
To further explore the genetic diversity and structure among individuals, we reduced the dimensions via a Discriminant Principal Component Analysis (DAPC) without a priori group assignment using the ADEGENT package in R. v2.0.1 (49,56). The tools implemented in DAPC allow solving complex population structures by summarising the genetic differentiation between groups while overlooking within-group variation, therefore achieving the best discrimination of individuals into pre-defined groups (57). This multivariate method is useful to identify clusters of genetically related individuals when group priors are lacking. Estimation of clusters is performed by comparing the different clustering solutions using the Bayesian Information Criterion (BIC). We compared the results from the three Bayesian approaches and the DAPC to provide confidence in the spatial designation of genetic groupings.
Relatedness
Levels of genetic relatedness were calculated using seven estimators as implemented in the RELATED package in R v1.0 (58). Pairwise relatedness was calculated using the estimators described by Queller and Goodnight, 1989; Li et al., 1993; Ritland, 1996; Lynch and Ritland, 1999 and Wang, 2002), as well as the dyadic likelihood estimator described in Milligan (2003) and the triadic likelihood estimator from Wang (2007). Genotyping errors and inbreeding estimations were incorporated into the model, and confidence intervals (95%) were obtained through bootstrapping across loci. Allele frequencies were used to simulate pairs of individuals of known relatedness based on Parent-Offspring, Full siblings, Half siblings and Unrelated individuals.
Landscape Connectivity
We predicted the most effective corridor via least-cost path analysis using GDISTANCE package in R v1.1 (66). This approach offers the shortest cost-weighted distance between two sampling points; the program allows calculating grid-based distances and routes and is comparable to ArcGIS Spatial Analyst (67), GRASS GIS (GRASS Development Team 2012), and CIRCUITSCAPE (68). The package implements measures to model dispersal histories and contains specific functionality for geographical genetic analyses (66). The least-cost path analysis was inferred from the most northern GPS point in the Central Corridor to the most southern point in the Cockscomb Reserve. Additionally, we used CIRCUITSCAPE v3.5 (68) to model resistance surfaces of the landscape as an alternative to the least-cost path analysis, which assumes that gene flow is associated to total cost along a single optimal path (69). Circuit theory considers all possible paths and is useful to assess different interactions between different landscape features. With these two approaches, we were able to identify the most probable routes for dispersal and gene flow between localities.
Ecosystem preference costs were based on literature review and expert opinion of habitat use by jaguars (3,5,6,35,70–72). To model each ecosystem as analogous to an electrical circuit, each pixel was assigned a resistance value in a scale of 0—9 based upon land cover (Table 2). The resistance value represents the relative effort required to move from one point to another, and the map of resistance values is used to derive all possible pathways for jaguar movement. Spatial data were obtained from the Biodiversity and Environmental Resource Data System of Belize (BERDS)(73).