Sampling
Our study was carried out along the SWA coast, which was conducted during night monitoring of females along the Espírito Santo (ES), Bahia (BA), and Sergipe (SE) rookeries and covered the 2004/2005 until 2021/2022 nesting seasons (from October to March). Blood and tissue samples were obtained with an active collection of a piece of 6 mm of epithelial tissue with a biopsy punch from the base of the neck and the beginning of the shoulder preserved in ethanol at 95%.
Laboratory procedures
Genomic DNA (gDNA) was extracted following the saline solution (Bruford et al 1992) and CTAB 2% (Doyle and Doyle 1987). We genotyped 15 nuclear microsatellite loci (SSRs dataset) using Cc7G11, Cc1F01, Cc1G02, Cc1G03, CcP7D04, CcP2F11, CcP7C06, CcP8D06, CcP1F09, CcP5C11, CcP1F01, CcP1G03, CcP1B03, CcP5C08, and CcP5H07 primers (Schuelke 2000; Shamblin et al 2007; 2009). Alleles were scored using Geneious the following literature (Schuelke 2000; Shamblin et al 2007; 2009). The identification of null allele, large allele dropout, PCR slippage, and typographic genotyping errors was evaluated by MICRO-CHECKER v2.2.3 (van Oosterhout et al 2004) with a 95% confidence interval by Monte Carlo simulation. Departures from Hardy-Weinberg equilibrium (HWE) for each locus were examined by the exact test method using GENEPOP (Rousset 2008), and the linkage disequilibrium (LD) between each pair of loci was evaluated through the significance test using the Markov Chain method (10,000 dememorizations, 20 batches, and 5000 iterations per batch) with a Bonferroni correction with the standard procedure of an initial critical value of 0.05 using the FreeNA (Kawashima et al 2009).
Additionally, we compiled D-loop haplotype sequences from a previous study (Ludwig et al 2023) for comparison levels.
Population structure analysis
The population structure of the SWA loggerhead populations was evaluated using a Bayesian clustering analysis implemented in Geneland v4.0.9 (Guillot et al 2005) for R (R Development Core Team, 2013), which detects if there is spatial genetic population structure based on the distribution of allelic frequencies (SSRs dataset) by populations and correlates with geographic distance (UTM coordinates) through an optimal value of K estimation by examining the posterior probabilities averaged over multiple runs (10 runs, K = 1–10). The analysis was set using a correlated alleles model, 100,000 iterations, thinning of 200, and a burn‐in of 500, and default settings were used for the maximum rate of the Poisson process, and the maximum number of nuclei in the Poisson–Voronoi tessellation. Based on these results, we split the individuals and their respective D-loop and SSR datasets according to the estimated genetic clusters for all further analyses (Table 1), except for RJ where the SSR dataset was not available.
Then, to properly examine and validate that there is significative partitioning of genetic variation among the genetic clusters (obtained by Genland), we performed a pairwise FST and Analysis of Molecular Variance (AMOVA) implemented in GenAlex (Peakall and Smouse 2012) followed by statistical significance with 9,999 permutation steps each. Besides, we performed a principal coordinate analysis (PCoA) based on global FST that is implemented in GeneAlex to examine the genetic variations among the genetic clusters. At last, we performed a discriminant analysis of principal components (DAPC) that is implemented in adegenet R package (Jombart and Collins 2017), to evaluate the admixture levels among the four nesting areas using the population information a priori, which could indicate a sharing ancestry.
Demographic analysis
Based on the compiled D-loop dataset, signatures for demographic expansions were evaluated in DnaSP v6.0 (Rozas et al 2017) by statistics of neutrality tests of Tajima’s D test (Tajima 1989), Fu’s Fs test (Fu 1997). In addition, in DnaSP, Mismatch distribution analyses (MD) were carried out to estimate the frequency of pairwise nucleotide-site differences between sequences using the population growth-decline model to further examine the demographic history. From that, the MD can provide distinct patterns like unimodal which is associated with a sudden population expansion, skewed unimodal is generally associated with a recent expansion or bottleneck, or multimodal and bimodal which are usually associated with the presence of two distinct lineages (e.g. Reis et al 2010). We also estimated the sum of squared deviations (SSD) between the observed and expected mismatch for each nesting area using a parametric bootstrap approach using 1,000 replicates (Harpending et al 1993). A non-significant index suggests that the observed data have a relatively good fit for the growth-decline model. In contrast, a significant index is indicative of a stable population which is typically thought to show a ‘ragged’, multi-modal mismatch (Harpending 1994).
Based on the SSR dataset, the contemporary Ne was estimated through the linkage disequilibrium (LD) method implemented in NeESTIMATOR v2 (Do et al 2014) comparing the genetic clusters. Then, the changing of Ne over the last generations through past population declines (0.5-5 Ne generations) for the SWA loggerheads was estimated in BOTTLENECK 1.2.02 (Cornuet and Luikart 1996; Piry et al 1999) that is based on the expected distribution of alleles frequencies and the significant number of loci with excess He comparing the three mutation models available (IAM of Maruyama and Fuerst 1985, SMM of Cornuet and Luikart 1996, TPM of Fu and Chakraborty 1998) with 95% single-step mutations and 5% multi-step mutations. The significance of excess heterozygosity was assessed using a one-tailed Wilcoxon sign-rank test, as recommended for fewer than 20 loci (Luikart 1997), and probability (P) for 1,000 simulations.
The BSP was performed to evaluate in an older time window the SWA loggerhead's demographic history. It was chosen because it is based on posterior probabilities derived from an approximate likelihood Markov Chain Monte Carlo (MCMC) approach. For the D-loop dataset (BSP1) was performed in BEAST v1.8.2 (Drummond et al 2013), as follows (Clusa et al 2018; Baltazar-Soares et al 2020). The substitution model was set to HKY as the best-fit model of nucleotide substitution chosen through Akaike Information Criteria (AICc). The mutation rate was set to 3,24 x 10–3 substitutions/site/MY (Duchene et al 2012). Plots were drawn using Tracer v1.6 (Rambaut et al 2014) and 25% of each chain as burn-in. While, for the SSR dataset (BSP2) was performed in the VarEff package (Nikolic and Chevalet 2014), which applies a coalescent approach estimating changes in ancestral Ne that rely on motif distance frequencies between alleles to estimate variation in Ne over time (Global θ = 4Neµ; µ = mutation rate) (Nikolic and Chevalet 2014). We set four independent MCMC runs using a Two-Phase Mutation model with µ =5,7 x10 -4 (estimated mutation rate from FitzSimmons et al 1997) and 10% of the mutations greater than a single step, simulating 10 million generations sampled at every 1,000 trees, and 25% burn-in. Inferences were made about past expansions and declines based on posterior distributions and their attributes (mean, median, and mode values) (Nikolic and Chevalet 2014) along the last 1000 generations which covered the SWA loggerheads colonization period into the SWA. We transform the generation's time using 32 years as the mean age of maturation for SWA loggerheads as follows (Petitet et al 2012; Vilaça et al 2021).
Evolutionary history reconstruction
The evolutionary reconstruction history of the SWA was performed using the SSR dataset only by a Bayesian inference model (Beaumont et al 2002) that implemented the DIYABC v2.1.0 (Cornuet et al 2014), which is based on the coalescent using a similarity criterion between simulated and observed data sets. The approach allows the posterior probability (PP) of different scenarios applied to the same data set to be estimated (Miller et al 2005; Terray et al 2007), with the assumption that the scenario with the highest PP is the most accurate based on a polychotomic logistic regression of each scenario probability on the deviations between simulated and observed summary statistics (Fagundes et al 2007; Beaumont 2008; Cornuet et al 2008).
We simulated five scenarios for the divergence and colonization of loggerhead populations into the SWA region following the DAPC results: (1) null model with a split of SWA loggerhead populations by their respective rookery diverging from an unknown lineage; (2) sequential splitting model directly following results from Geneland, where one the hand the BA/k1 and SE/k2 populations raised independently each from the ES/k3 and ES/k4; 3) the BA/k1, ES/k3 and ES/k4 sharing a common origin, while SE/k2 diverged independently; 4) where the ES/k3 and ES/k4 raised independently each from BA/k1 and SE/k2; and 5) the BA/k1 originated by introgression among ES/k3 and ES/k4, and SE/k2 raised independently from others. For that, we assumed that microsatellites evolved under a stepwise mutation model (after positive indication by BOTTLENECK, see further), and all individual loci had the same value, then, we simulated 1,000,00 datasets in three independent Monte Carlo Markov Chains (MCMC), with a burn-in of 25% for every 1000 interactions. To determine the best-supported scenario, all scenarios were compared using the posterior distribution probability with 95% confidence intervals (CI) with type I and type II errors for the best-supported scenarios, and a Relative Median Absolute Deviation (RMedAD) to estimated parameters for all scenarios (details in Table S2).