Sex-linked markers (SLMs) are important in both theoretical and applied biological sciences, especially in conservation. For example, such markers can provide valuable insight into speciation and genome evolution (Graves 2008; Charlesworth and Mank 2010; Kitano and Peichel 2012). Further, in the field of population genetics they allow the inference of sex-specific demographic events due to their sex-specific inheritance, for instance the comparison of mitochondrial and Y chromosome markers can reveal sex-specific migration rates (Petit et al. 2002; Wilson Sayres 2018). Lastly, sex ratio is a key component ecological and demographic studies, particularly when females and males differ in life-history traits (e.g. Tsai et al. 2014; Pillans et al. 2021). In species with a heterogametic sex, markers on sex chromosomes provide a way to identify sex from DNA samples when sexual dimorphism is absent and destructive sampling is not an option, e.g. for Threatened species (Stovall et al. 2018; Suda et al. 2019) or when sampling in no-take zones.
Despite the importance of SLMs, only a few methods and tools exist to identify them in non-model species. Most approaches are focussed on presence-absence and heterozygosity patterns of Restriction-site-Associated DNA sequencing (RADseq) data to differentiate the heterogametic (XY or ZW) from the homogametic (XX or ZZ) sex (Gamble and Zarkower 2014; Fowler and Buonaccorsi 2016; Gamble 2016; Hill et al. 2018). Most of the workflows following this approach use demultiplexed FASTQ files and are based in STACKS or RADtools (see Gamble and Zarkower 2014; Fowler and Buonaccorsi 2016), which can be computationally intensive, or the faster RADSex software (based in C++; Feron et al. 2021). Currently, the only methods that can identify SLMs from genotyped single nucleotide polymorphism, or SNP, data are outliers detection methods (e.g. BayeScan; Foll and Gaggiotti 2008) where the data is partitioned by sex (e.g. Benestan et al. 2017; Trenkel et al. 2020).
Many elasmobranchs (sharks and rays) are threatened with extinction (Dulvy et al. 2014; Dulvy et al. 2021). Their slow life history, low fitness and low connectivity between populations, which is often male biased (see Phillips et al. 2021), has instigated many conservation genomic studies in elasmobranchs (see Ovenden et al. 2018 and references therein). However, to date, no studies have used these available genomic resources to investigate the value of SLMs for sex identification and population genetics. In this study, we introduce a tool that can analyse the existing genomic data, such as RADseq, Diversity Arrays Technologies (DArTseq), genotyping by sequencing (GBS) and others, to look for differential signals between heterogametic and homogametic individuals and identify SLMs on the sex chromosomes.
Specifically, we have designed a function, ‘sexy_markers’, as part of the radiator package (Gosselin et al. 2020) in the R environment (R Core Team 2020) which tests three different scenarios to find markers on the sex chromosomes under the assumption that one sex is heterogametic (see Supplementary Material): (i) markers are only present in females or males, (ii) markers are homozygous in one sex while exhibiting an intermediate range of heterozygosity in the other sex, (iii) markers have double the read depth in females or males. Here, the first scenario identifies markers on the sex chromosome unique to the heterogametic sex and the latter two detect markers on the sex chromosomes shared by the heterogametic and homogametic sexes. The function is based on a visual identification of SLMs from genotyped RAD-type data after minimal data-quality filtering. This function allows the re-assignment of genetic sex when markers on the heterogametic sex chromosome are identified. In addition, we demonstrated the work flow and accuracy of this function using a White Shark example (Carcharodon carcharias, listed as Vulnerable to extinction; Rigby et al. 2019), with a DArTseq dataset of 558 individuals and 23,393 SNPs (Bruce et al. 2018; Hillary et al. 2018). We further compare our function to alternative approaches that use genotyped SNP data as input (i.e. outlier detection methods): OutFLANK (Whitlock and Lotterhos 2015) and PCadapt (Luu et al. 2017). The SLMs were validated using polymerase chain reactions (PCR) with primers designed from SLMs that were mapped to the reference genomes from Marra et al. (2019) and the Vertebrate Genome Project (VGP; https://vgp.github.io/genomeark/). Autosomal primers in the beta-actin gene were also designed from the reference genome to act as a positive control between sexes. Primer sequences and PCR condition are described in the Supplementary Material.
Overall, we found nine Y-linked and 406 X-linked markers using the ‘sexy_markers’ function in less than 5 min computation time. The nine heterogametic SLMs allowed us to assign sex to 43 individuals with unknown visual sex and showed a 6.7% phenotypic – genotypic sex discrepancy across the 402 sexed sharks that passed quality filtering. The latter is most likely explained by human error, although hermaphroditic elasmobranchs have been identified (reviewed in Adolfi et al. 2019). Further, the outlier methods identified 131 and 2720 SLMs for OutFLANK and PCadapt respectively (10 markers in common), but only PCadapt had 16 markers in common with the ‘sexy_markers’ approach. We were able to confidently blast 179 SLMs (seven Y-linked and 172 X-linked markers) to 49 scaffolds from the Marra et al. (2019) genome, of which 47 SLMs mapped to five scaffolds (i.e. putative sex scaffolds). Eight Y-linked and 215 X-linked markers had confident BLAST hits (see Supplementary Material) to eight scaffolds from the VGP genome, with the majority (199 SLMs) mapping to three scaffolds. Overall, we conclude that 48% of the 415 identified SLMs were located in close proximity on putative sex chromosome scaffolds. These markers were considered as a reference to test the accuracy of the ‘sexy_markers’ function with suboptimal data. By randomly sampling 6, 12, 24, 48, 72, 96, 120, 144, 252 and 348 individuals for 200, 1000, 2000, 5000, 7500, 10000, 12500, 15000, 17500 and 20000 markers (Fig. 1), we showed that too few individuals (< 100) and too few markers (< 10,000 or 50% of the total data) will identify false positive SLMs (i.e. not in common with the SLMs from the full data; Fig. 1A-B). This result was more pronounced when the female:male sex ratio was skewed (2:1 or 1:2; Fig. 1C-F). The Y- and X-linked markers were validated through multiplex PCR (Fig. 2; Supplementary Material). PCR results showed that males amplified for the Y-chromosome fragments, while X and beta actin fragments were present in both sexes. The same individuals that had a phenotypic – genotypic sex mismatch based on the heterogametic SLMs (70 base pairs) also showed this discrepancy for the Y-chromosome fragment (655 base pairs).
In general, these results confirm that the White Shark has morphologically distinct sex chromosomes (X and Y), males being the heterogametic sex; an observation also obtained using karyotyping (Maddock and Schwartz 1996), where the authors suggested the White Shark and some other elasmobranchs possess X and Y sex chromosomes. Further, we showed the utility and robustness of the ‘sexy_markers’ function for species with distinct sex chromosomes. Importantly, the function takes genotyped SNP data as input (whereas other software require demultiplexed FASTQ files), which allows a more versatile use of previously published datasets for comparative studies. Finally, we developed a quick (~ 2-hour) PCR assay to identify the sex of sampled White Sharks. This tool will prove useful for sex identification in species that do not display obvious morphological differences between sexes. For instance, most juvenile sharks without developed external sex organs or samples obtained from processed carcasses (e.g. fisheries or fin trade) could be sexed using our method. Future studies include applying the R function on other species, as well as utilising the sex-linked markers for population genetic studies.