Study Area
Our study was located south of Houston, Texas on and around Galveston Island, San Bernard Wildlife Refuge, and Anahuac National Wildlife Refuge. This region is humid and subtropical, where hurricanes are common during the summer and fall seasons (Berger 2008). Galveston Island is a barrier island situated approximately 80 km South of Houston on the Texas Gulf Coast. It is 160 square kilometers, 43.5 km long, and no more than 4.8 km at its widest point. It has a year-round population of 48,000 residents (Census 2010) and supports many tourists throughout the year. Galveston Island is completely disconnected from the mainland, and only has two access bridges, from highway 45 at the North end and San Luis Pass on the west end. The closest distance to the mainland is approximately 1 km at San Luis Pass. San Bernard National Wildlife Refuge is ~ 65 km west of Galveston in East-central Texas. It is a ~ 185 km2 wildlife refuge that is a mix of salt marsh, coastal prairies, and bottomland hardwood forests (Yao et al. 2020). The Anahuac National Wildlife Refuge is ~ 40 km east of Galveston Island and is a part of the Texas Chenier Plain Refuge Complex which covers 140 km2 (Lane 2017).
Field Methods
We systematically sampled Galveston Island for feces for noninvasive genetic sampling. We collected fecal samples and recorded their GPS location along 25 designated transects distributed east to west (Supplemental Fig. 1) across the island during three field seasons, January 2020, July 2020, and January 2021. Each transect was routinely walked or driven over multiple days to retrieve fresh fecal samples. Based on a pilot study in August 2019 to test fecal collection protocols (Appendix A), we collected fecal samples by swabbing them with a sterile cotton swab that was subsequently placed in a 2 mL tube of Longmire buffer. Additionally, the whole fecal sample was collected in a labelled paper bag and dried for future diet analysis. We also opportunistically collected fecal samples and recorded their GPS location from mainland Texas, including National Wildlife Refuges surrounding Galveston to compare Galveston Island to nearby parts of Texas. Swabs were stored in a freezer at -20°C and whole fecal samples were stored in a clean dry area at Michigan Technological University (MTU), Houghton, USA. We additionally collected roadkill tissue samples from Galveston Island and mainland Texas. Coyote tissue was placed on silica or ice and transported to MTU for long term storage at -20°C. All research was approved by the MTU Institutional Animal Care and Use Committee (#1438689) and the Texas Parks and Wildlife Department (SPR-0220-020).
Molecular Lab Methods
We extracted DNA from fecal samples using a modified QiAmp Fast DNA Stool protocol (Qiagen, Inc., Hilden, Germany), in a laboratory dedicated to low-quality DNA at MTU. We extracted DNA from tissue samples using a DNeasy Blood and Tissue Kit (QIAGEN, Valencia, CA) following the manufacturers protocol in a laboratory dedicated to high-quality DNA samples. We amplified a portion of the cytochrome B region of the mitochondrial genome via polymerase chain reaction (PCR) to confirm DNA isolated from fecal samples was collected from wild coyotes and to gather information on matrilineages, which is useful in differentiating red wolves and coyotes (Adams & Waits 2007). Extant red wolves are represented by a single mitochondrial haplotype, so we compared the mitochondrial lineages of our generated sequences with previously published sequences to assess if they had the red wolf haplotype.
For low quality DNA extracted from fecal samples, we targeted a ~ 200 base-pair fragment using primers ScatSeqF: 5`- CCATGCATATAAGCATGTACAT-3` and ScatSeqR 5`-AGATGCCAGGTATAGTTCCA-3` (Adams et al. 2003). The PCR mix consisted of 10mM of each primer, 0.2mM dNTPs, 1x Buffer II, 2.5 mM MgCl2 and 0.5 µL Amplitaq Gold (Applied Biosystems) in a 15 µL reaction with 1.5 µL of DNA extract. We used an Eppendorf Mastercycler Gradient Thermal Cycler (Eppendorf, Hamburg, Germany) using an initial denaturation step of 95°C for 10 minutes, 40 cycles (95°C for 30 s, 48°C for 45 s, 72°C for 60 s) and a final extension for 7 minutes at 72°C. For high quality DNA extracted from tissue we targeted a ~ 420 base-pair fragment using primers Thr-L: 15926 5’-CAATTCCCCGGTC TTGTAAACC-3` and DL-H 16340: 5’ -CCTGAAGTAGGAA CCAGATG-3` (Vila et al. 1999). The PCR mix consisted of 10mM of each primer, 0.2mM dNTPs, 1x Buffer II, 2.5 mM MgCl2 and 0.5 µL Amplitaq Gold (Applied Biosystems) in a 10 µL reaction with 1 µL of DNA extract. The Eppendorf Mastercycler Gradient Thermal Cycler conditions were an initial 10-minute denaturation at 95°C, 40 cycles (95°C for 30 s, 55°C for 45 s, 72°C for 60 s) and a final extension for 7 minutes at 72°C. Each PCR was run with a negative control from DNA extractions to monitor for contamination. We visualized each PCR product on a 2% agarose gel run at 115 volts for one hour. Samples that contained nonspecific amplification were reamplified with an annealing temperature of 58°C. We enzymatically cleaned samples using the product ExoSapIT (Bell 2008) on an Eppendorf Mastercycler Gradient Thermal Cycler following manufacturers protocol and sent samples to GENEWIZ (New Jersey, U.S.A.) for Sanger Sequencing; mitochondrial sequences were viewed, trimmed, and aligned with Geneious software v2021.1.1 (Kearse et al. 2012).
We compared our generated sequences to all previously published sequences in the NCBI BLAST database using Geneious. The samples confirmed to be from the target sequence were aligned using a pairwise MUSCLE alignment in Geneious using 8 iterations. The shorter sequences that were amplified using the Scatseq primers were matched to a longer sequence from the Thr-L 15926 and DL-H 16340 primers for haplotype identification. We used a Muscle alignment in Geneious to compare our mtDNA haplotypes to all extant North American canid haplotypes from NCBI GenBank.
We genotyped nuclear DNA (nDNA) at 17 microsatellites at the Laboratory for Ecological, Evolutionary, and Conservation Genetics (University of Idaho, Moscow, U.S.A.) to identify individuals from fecal samples, estimate ancestry, and estimate relatedness. This multi-locus microsatellite panel has been used extensively in the past for identifying red wolf X coyote hybrids (Adams et al. 2007; Miller et al. 2003; Bohling & Waits 2011; Bohling et al. 2013; Murphy et al. 2019). We generated genotypes using two multiplexes (Bohling & Waits 2011; Bohling et al. 2013). The first multiplex contained 0.06 µM of CXX.377, 0.07 µM of CXX. 172, CXX.173, and CXX.250, 0.13 µM of CXX.109, 0.16 µM of CXX.200, 0.20 µM of AHTq121, 0.60 µM of AHT103, 0.71 µM of CXX.20, 1X Qiagen Multiplex PCR Kit Master Mix, 0.5X Q solution, and 1 µL of DNA extract in a 7 µL reaction (Ostrander et al. 1993, 1995; Mellersh et al. 1997; Miller et al. 2003; Murphy et al. 2019). The second multiplex contained 0.06 µM of FH2010, 0.07 µM of FH2062 and FH2054, 0.10 µM of FH2001, 0.16 µM of FH2145, 0.24 µM of FH2004, 0.36 µM of CXX.225, 0.80 µM of CXX.403, 1X Qiagen Multiplex PCR kit Master Mix, 0.5X Q solution, and 1 µL of DNA extract in a 7 µL reaction (Ostrander et al. 1993; Mellersh et al. 1997; Miller et al. 2003). We amplified tissue samples in duplicate and performed up to four and six replicate PCRs for the tissue and fecal samples respectively. We visualized PCR products using a 3130xl DNA Sequencer and scored allele sizes using Genemapper 5.0 (Applied Biosystems, Inc., Foster City, U.S.A.). Assessment of sample quality and genotype screening methods followed those described by Adams and Waits (2007). We calculated Hardy-Weinberg Equilibrium using package Pegas in R v4.0.2 (Paradis 2010) and used a Bonferroni correction that corrects for multiple, simultaneous comparisons (Weisstein 2004). We calculated probability of identity (PID) and probability of identity for siblings (PIDSIBS) using nine loci in Multiplex 1 with GenAlEx v6.5 (Peakall & Smouse 2006) and performed a matching analysis in GenAlEx to determine how many individuals were detected in the fecal genotypes after Multiplex 1. Only five loci were necessary to differentiate between individuals, so only unique individuals were amplified with Multiplex 2. We removed 4 individuals collected from mainland Texas from the rest of the analyses because they were from South Texas, not adjacent to Galveston Island, and thus not a genetically relevant group for assessing gene flow between Galveston Island and the adjacent mainland.
Analytical Methods
Our first objective was to estimate red wolf ancestry proportions in each individual, by assessing the mitochondrial lineage and determining the distribution of red wolf ancestry in nDNA across the island and adjacent mainland Texas. To accomplish this, we estimated gene trees with mtDNA haplotypes using Bayesian methods implemented in BEAST v.1.10.4 (Drummond et al. 2012), with a constant size coalescent tree prior, an uncorrelated lognormal relaxed molecular clock, and a random starting tree. We conducted Markov Chain Monte Carlo (MCMC) analyses with 20 million steps, sampling every 2000 steps, and combined tree estimates from each run with LogCombiner v.1.10.4 (Rambaut & Drummond 2015) with a 10% burn-in. We calculated the maximum clade credibility in TreeAnnotator v.1.10.4 (Helfrich et al. 2018) and uploaded the most likely tree in the Interactive Tree of Life v3.6.3 online webtool to visualize the gene tree and mtDNA lineages (Letunic et al. 2016).
To determine red wolf nDNA ancestry, we used a Bayesian assignment method (Bohling et al. 2013) implemented in program STRUCTURE (Pritchard et al. 2000; Falush et al. 2003). Galveston Island and mainland samples were assigned ancestry proportions to our Canis references. References included Mexican wolves (n = 14), domestic dogs (n = 38), gray wolves (n = 38), red wolves (n = 19) representing 13 of the 14 genetic founders plus descendants of the fourteenth founder of the captive breeding population, and southeastern coyotes (n = 107). For the STRUCTURE analysis we set the number of populations (K) a priori to five (e.g., Mexican wolf, domestic dog, gray wolf, red wolf, and coyote) and ran 10 independent runs of the admixture model with correlated allele frequencies with a burn-in of 20,000 Markov chain Monte Carlo iterations followed by 50,000 iterations to estimate a posterior probability of ancestry (q). To prevent bias that can arise from including related individuals among our samples, we used the PopFlag prior that uses a Boolean variable to indicate learning samples. Because the Popflag prior can be sensitive to admixed individuals in the reference populations, after an initial STRUCTURE run, we removed reference individuals with ≥ 25% ancestry assignment to a difference reference group (i.e., gray wolf with 25% dog ancestry). This left us with Mexican wolves (n = 14), domestic dogs (n = 37), gray wolves (n = 36), red wolves (n = 18), and southeastern coyotes (n = 98). We used a t-test to compare mean red wolf ancestry proportions between males and females and further evaluated genetic clustering with a principal component analysis (PCA) using package Adegenet (Jombart 2008) and package Factoextra (Kassambara & Mundt 2017) in R v4.0.2. We conducted two PCAs, the first with all reference populations and second with a subset (gray wolf, red wolf, coyote, our samples) to better visualize clustering with only one outgroup.
Our second objective was to evaluate the baseline genetic variation of Galveston Island coyotes and compare it to Texas mainland coyotes surrounding the island to measure for restricted gene flow and inbreeding. We calculated standard measures of genetic variation in several ways. First, we calculated observed and expected heterozygosity, which are measures of genetic diversity within a population using program GenALEx (Peakall & Smouse 2006). We calculated this metric for all Galveston Island coyotes, Texas mainland samples that were close geographically proximity to the island, and all Canis reference populations (Table 1). Next, we assessed genetic differentiation of Galveston Island and our Texas mainland samples in comparison to reference populations by estimating pairwise FST (Takezaki & Nei 1996) and FIS values per population using package Hierfstat in program R (Goudet, Jombart, & Goudet 2015). We additionally calculated allelic richness and private alleles using package PopGenReport in program R (Adamack et al. 2014).
Our third objective was to estimate relatedness, genetic substructure, and describe the distribution of red wolf ancestry on Galveston Island. To identify possible family groups, we first estimated genetic relatedness of Galveston Island coyotes by calculating pairwise relatedness between all individuals on Galveston Island using the maximum likelihood approach implemented by the program ML-Related (Kalinowski et al. 2006). We estimated average relatedness of all individuals on Galveston Island and evaluated differences in relatedness between the sexes. Next, we used STRUCTURE to evaluate the genetic substructure of coyotes on Galveston Island. We conducted 10 independent runs for each K value with the admixture model K = 1–10, using 50,000 repetitions after a burn-in of 20,000 Markov chain Monte Carlo iterations. The most likely number of genetic clusters represented by the data was estimated by considering delta K (Evanno et al. 2005), calculated with STRUCTURE HAVESTER v0.6.94 (Earl & vonHoldt 2012). However, delta K cannot provide support for K = 1 or K = 10 as it is based on the rate of change in log-likelihood between successive K values, so we also used the log-likelihood (Ln Probability) values inferred from STRUCTURE (Pritchard et al. 2000). We used pie charts plotted spatially in ArcGIS Pro to visualize differences in genetic clusters (K = 2–7). We considered individuals to have high assignment to a given inferred cluster if the ancestry proportion (q) was greater than or equal to 0.8. We separated individuals into family groups based on their assignment.
To address our fourth objective, we then separated the family groups by the habitat characteristics based on the GPS location they were collected within (developed or undeveloped), to test for differences in mean red wolf ancestry proportions between developed and undeveloped natural areas of Galveston Island. We used The National Oceanic and Atmospheric Administration (NOAA) geospatial classification system with a 30-meter resolution (NOAA 2010). We ran a F-test to test for unequal variances and then ran a one sample t-test on the ancestry proportions of the family groups within each habitat characteristic. Next, we used NOAA land cover class (NOAA 2010) in ArcGIS Pro and grouped all individuals into three habitat categories: open, low intensity developed, and high intensity developed based on where they were sampled. The open category included habitats with little to no human presence including Galveston Island State Park, East End Nature Preserve, Artist Boat Coastal Heritage Preserve, and Galveston Bay Foundation Conservation Preserve; the low intensity category included green spaces with some development including golf courses, airports, and RV parks; the high intensity category included Galveston City. We conducted an ANOVA in R Studio to test for a difference in mean red wolf ancestry between the three habitat categories using family groups as a random variable.