Development of 27 new microsatellite markers for the shanny Lipophrys pholis

The shanny Lipophrys pholis is an intertidal fish that is widely distributed throughout the Northeast Atlantic. Characterized by limited adult mobility and a long pelagic larval duration, the shanny stands as an ideal model to better understand larval dispersal and connectivity dynamics, which are critical parameters with implications for marine conservation and management. To this aim, we developed 27 highly polymorphic microsatellite markers and characterized a population of 42 individuals, presenting an average allelic diversity of 20.1 alleles per locus and heterozygosity ranging from 0.619 to 1. This set of newly developed microsatellite markers will be useful in providing critical insights into the processes which shape L. pholis gene flow and connectivity patterns and can be used to investigate local parentage lineages.


Introduction
Connectivity among geographically distant populations is a key aspect in the ecology of marine organisms, as it shapes metapopulation dynamics, determines genetic diversity, and drives resiliency of populations to environmental disturbances and exploitation [1]. Understanding population connectivity has proven valuable in identifying dispersal-related processes, with important consequences for conservation and management through the design of ecologically relevant protected areas and fisheries management units [2]. Connectivity dynamics vary widely among marine organisms, including intertidal fish [3], according to oceanographic characteristics and life-history traits. In particular, pelagic larval duration has long been associated with dispersal capabilities, with often greater genetic homogeneity for species with a long planktonic phase [4].
The shanny Lipophrys pholis (Linnaeus, 1758, Pisces: Blenniidae) is a common resident fish of the intertidal rocky shores of the Northeast Atlantic that is particularly useful for biomonitoring ecosystem health (e.g. for detecting organic contaminants [5]), as its life-history traits reflect local environmental conditions. Due to its high abundance as well as its restricted home range [6], it has been used as a model species in marine physiology [7] and to study the exposure to genotoxins [8]. L. pholis presents a wide distribution, ranging from Norway to Mauritania in latitude, and from the Azores and Madeira in the Atlantic to the Western Mediterranean in longitude [9]. Its life-cycle includes a long larval dispersal period of 57 to 73 days [10] after which they remain in the same intertidal pool for the rest of their lifecycle, with little adult migration [11]. These characteristics also make it an ideal model for investigating genetic differentiation in marine fishes with a long pelagic larval duration.
Earlier genetic studies, based on mitochondrial DNA, suggest that the Atlantic populations of L. pholis can be separated into two groups [12,13]. One population from the European coastline (UK to Morocco) to Madeira displays a lack of phylogeographic structure along the Atlantic coast and high genetic diversity throughout its range implying 1 3 large-scale connectivity between shanny populations along the European coastline. A second genetically distinct population is situated in the Azores. Nevertheless, these patterns of high genetic connectivity do not necessarily oppose small-scale demographic processes [14]. In fact, cross-scale research linking local and broader geographic scale patterns of larval exchange is key to understanding population dynamics and resource management [15] and is valuable in shedding light on scale-related demographic exchange [16]. As such, L. pholis may present finer scale genetic differentiation patterns which have yet to be detected. Research on the genetics of L. pholis has largely been assessed through mitochondrial (Control Region, D-Loop, 12S, and 16S) and nuclear (S7) sequence data but has failed to provide information on fine-scale connectivity [12,13,17]. Microsatellites have been a widely used tool in genetic studies since the 1990s because of their high polymorphism, their abundance, and their dispersion throughout eukaryotic genomes [18]. By using a panel of several microsatellite loci, a unique genotype profile can be produced for each organism tested, leading to their individual identification. Only one study has developed and used microsatellite markers to study interspecific amplification, but it did not focus on genetic structure [19]. Guillemaud et al. [19] only developed four specific markers for the shanny, which is insufficient to conduct population structure and parentage analyses [19]. The goal of this study was to develop an additional set of de novo microsatellites for L. pholis to extend the existing set, in order to investigate the Northeast Atlantic population structure at a finer geographic scale.

Biological samples and DNA extraction
For this study, nine L. pholis samples originating from six sample sites in Galicia, Spain (Corrubedo Cape, Couso Cape, O Grove, Ons Island, Cies Island, Silleiro Cape) were used to develop a database of species-specific microsatellites. Additionally, 42 samples from Ons Island (42°23′28.4 N "8°55′24.0 W) within the Islas Atlánticas Marine Park (Pontevedra, Spain) were characterized using the newly developed set of markers. Individuals were collected using hand nets in multiple tide pools within sample locations. After collection, fin clips were preserved in 70% ethanol, and frozen until DNA extraction was performed at the laboratory. Genomic DNA was extracted from all samples using the QIAamp® 96 DNA QIAcube® HT robotic workstation (Qiagen, Hilden, Germany) following the manufacturer's instructions. DNA templates were then diluted at 50 ng/µL and stored at − 24 °C.

Development of new microsatellite markers
After extraction, the DNA was sent to GenoScreen (Lille, France) for high-throughput genomic sequencing and library preparation, using the approach described in Abdelkrim (2009) [20] to provide a database of microsatellites. The paired-end 2 × 250 sequence run was done on an Illumina MiSeq platform, and the resulting reads were merged by the Usearch software. A final analysis of the raw sequences was done using QDD v.3, which treats all bioinformatics steps from raw sequences until Polymerase Chain Reaction (PCR) primers are obtained and relies on BLAST, ClustalW and Primer3 programs for adapter removal, microsatellite detection, redundancy detection, and sequence selection with target microsatellites and primer design. The default parameters used for QDD were 200 bp flanking region length, removal of sequences shorter than 80 bp, 95% minimum pairwise identity between sequences of a contig, and a 66% minimum of sequences with the same base at a site to be considered as a consensus. Primer design settings were set for a PCR product between 90 and 300 bp, with primer length set between 18 and 27 bp. Following this initial sequencing step, 2331 primer pairs were obtained, from which 50 pairs were selected based on repeat number, motif (di-, tri-, or tetranucleotide), and PCR product size (≥ 100 bp). Primers were then tested individually using PCR to determine optimal annealing temperature. PCR was performed with reaction volume of 11 µL, containing 5 µL of Master Mix Type-it 2X (Qiagen) which included all materials required for the PCR amplification (HotStarTaq Plus DNA Polymerase, dNTP mix with 200 µM each, and PCR buffer containing 6 mM MgCl 2 ), 6 µL RNase-free water, 1 µL of forward and reverse primers (2 µM diluted in TE pH 8 buffer), and 1 µL of 50 ng/µL genomic DNA solution. The PCR conditions were as follows: an initial denaturation step at 95 °C for 5 min, 40 cycles of 95 °C for 30 s, chosen annealing temperature (53 °C to 65 °C, see Table 1) for 90 s, 72 °C for 30 s, and a final 30 min extension step at 60 °C. Electrophoresis was carried out at 100 V on 2% agarose gels for microsatellite size characterization as well as for preliminary detection of polymorphism. Thirty microsatellites were chosen and included in five multiplex panels designed by combining loci of different allele sizes and identical primer annealing temperatures. In order to proceed with genotyping, forward primers were labeled with fluorescent dyes FAM, YAKYE, ATTO550, and ATTO565 (Eurofins, Luxembourg). Amplifications were carried out with PCR conditions identical to those previously described. PCR products from all samples were sent to GenoScreen for fragment visualization using an Applied Biosystems 3730 Sequencer, where an internal ladder was    added to each sample in order to ensure accurate sizing (GeneScan 500 LIZ, Applied Biosystems). Allele sizes were scored in GENEMAPPER software v.5 (Applied Biosystems), with ambiguous peaks assigned as missing data to avoid errors in genotyping.

Data analysis
The software MICRO-CHECKER v 2.2.3 [21] was used to detect null alleles, potential scoring errors as well as to test for large allele dropout. The total number of alleles N a was calculated in GenAlEx 6.503 [22]. Observed (H o ) and expected (H e ) heterozygosities were estimated through GENETIX v 4.05.2 [23], which was also used to compute the inbreeding coefficient (F IS ) and linkage disequilibrium (LD) by permutations (1000 permutations each). Holm-Bonferroni correction was applied to LD analysis to adjust p-values for multiple comparisons.

Results and discussion
Out of the 50 primer pairs that were tested in this study, 30 polymorphic loci exhibiting clear amplification profiles were successfully developed. Analysis with MICRO-CHECKER revealed the presence of null alleles in three loci (AG08, AC17, AGG47), which also featured high and significant F IS values as well as a noticeable difference between expected and observed heterozygosity (Table 1) compared to other markers, indicative of a departure from the Hardy-Weinberg equilibrium. These loci were removed from the dataset for further analysis, and no other large allele dropout or allele scoring errors were detected in the rest of the marker set. No significant LD was detected between the retained loci after Holm-Bonferroni correction. While previous studies only developed dinucleotide markers, the new panel of microsatellite markers was characterized by an almost equal distribution of motif abundance percentages, with a majority of di-(37%), tri-(33%), and tetranucleotide repeats (29%) [19]. Although less abundant than dinucleotides in the genome, tri-and tetranucleotide repeats are traditionally sought-after due to the large number of base-pair differentiating alleles: this makes variations in the number of repeats simpler to identify during the genotyping process, therefore removing possible biases, such as stuttering [24]. After 42 individuals from Ons Island were screened with the 27 newly developed loci, a total of 568 alleles were revealed in the dataset, which was deemed robust enough to study the structure and parentage of the L. pholis population [25].
The number of alleles found per locus ranged between 8 (AT50) and 42 (AG10) with an average of 20.074 alleles per locus (Table 1). Excluding the three loci which exhibited null alleles, average expected heterozygosity (H e ) was 0.8832, ranging between 0.6566 and 0.9722, and the observed heterozygosity (H o ) ranged from 0.619 to 1 with an average of 0.8754. The difference between H e and H o was less than 1% with slightly fewer heterozygotes observed than expected. The new panel of loci showed a high level of genetic diversity and high heterozygosity levels, which is in line with the previous study on L. pholis that used microsatellites [19], although recent findings seem to suggest this may not be the case throughout its distribution range and may also vary according to sampling years [26]. Similar to other marine species, the high diversity shown here is likely correlated with a combination of life-history traits, such as small body length [27] and large population size [28][29][30]. Moreover, L. pholis also displayed high diversity when other genetic markers were used at a larger geographic scale of hundreds of kilometers [12,17].
Excluding the loci that presented null alleles, only one marker (AGG49) presented a significant F IS value (p-value = 0.047), suggesting heterozygote deficiency which could be attributed to inbreeding, the presence of null alleles, a Wahlund effect, or selection [31]. In the shanny, occupancy of a tide pool is based on body size [32]. As individuals were sampled from multiple tide pools, it is possible that sampling included individuals from genetically distinct cohorts influenced by interannual current variability. However, a Wahlund effect would typically be observed at all loci, similar to inbreeding. Furthermore, loci with possible null alleles were discarded from the set of markers following analysis by MICRO-CHECKER, which did not detect evidence for null alleles in AGG49.
The successful amplification of a large number of markers (27 de novo markers) and the high levels of polymorphism found in the samples suggest that these newly developed microsatellite markers can be effectively used in L. pholis to make inferences about genetic diversity, family structure and to assess potential dispersal patterns. Having a large panel of microsatellite markers is also fundamental in the study of patterns of connectivity among populations and these newly developed markers will help to shine a light on gene flow patterns in future genetic studies.

3
Ethical approval Sampling was conducted by A. B. under a permit delivered by the Atlantic Islands of Galicia Maritime-Terrestrial National Park. A. B. has been trained, is aware of and fully complies with all current ethical guidelines of the European Commission (Article 9 of the Directive 2010/63/EU) and national legislation regarding animal research.