The mammalian Y chromosome is widely recognized for its role in male traits. It is paternally inherited, has a high mutation rate, and undergoes little recombination or reverse mutation (Cortez et al., 2014). Over approximately 300 million years, it has evolved from ancestral autosomes to include genes vital for male reproduction and fertility (Lahn and Page, 1999; Bachtrog, 2013). Whole-genome sequences have been characterized in cattle and, more recently, in yak (Kumar et al., 2024). However, research on the yak Y chromosome remains scarce, likely because its repetitive sequences, palindromic regions, and elevated mutation rates make it difficult to sequence. The Y chromosome plays a crucial role in studying paternal genetic variation, origins, and population structure in domestic animals. Previous studies on the yak Y chromosome primarily used PCR-based methods to investigate genetic diversity (Luo et al., 2022; Li et al., 2014; Ma et al., 2018). Our study represents a pioneering effort to identify and analyze single nucleotide polymorphisms (SNPs) on the yak Y chromosome through whole-genome resequencing. It is also the first study to explore the impact of linkage disequilibrium (LD) pruning on Y chromosome variants in bovine species.
Variant calling
The SNP calling pipeline employed in this study closely followed the methodology described by Kumar et al. (2024), with one notable difference: we used the LU_Bosgru_v3.0.99 reference database in SnpEff. With this pipeline we identified a total of 274,828 SNPs in Arunachali yaks, 243,143 in Himachali yaks, 283,774 in Ladakhi yaks, and 194,228 in Jinchuan yaks. These numbers differ substantially from those reported by Kumar et al. (2024), suggesting that SNP detection is sensitive to the choice of reference database and analytical pipeline. Despite similar sample sizes across the studied populations, Jinchuan yaks exhibited a notably lower SNP count than the other populations. This reduction may be attributable to the intense selection applied to Jinchuan yaks, which are selectively bred for superior reproductive performance, achieving up to 90% reproductive success, and for enhanced milk and meat quality (Mipam et al., 2012). Such selective breeding reduces genetic diversity within a population, resulting in fewer SNPs than in yak populations that have experienced less intense selection (Lachance and Tishkoff, 2013). Thus, while selective pressure enhances desirable traits, it also reduces genetic variation, and this is reflected in the SNP profile of Jinchuan yaks.
Gene annotation and functions
The genes ASMT (acetylserotonin O-methyltransferase) and ASMTL (acetylserotonin O-methyltransferase-like) are located in the pseudoautosomal region (PAR) of the short arm of the X and Y chromosomes. The ASMT gene plays a critical role in the biosynthesis of melatonin, a hormone involved in regulating circadian rhythms (Liu et al., 2023). Research has shown that overexpression of ASMT in dairy goats and cattle leads to elevated melatonin levels in blood and milk, resulting in better milk quality (Wu et al., 2022; Ma et al., 2017; Yao et al., 2022). Melatonin is essential for maintaining the circadian rhythm, which synchronizes internal biological processes with external environmental cues (Lan et al., 2018). Another significant gene is ACE2 (angiotensin-converting enzyme 2), whose mRNA expression and protein levels have been observed to increase during the early stages of hypoxia before returning to near-baseline levels later (Zhang et al., 2009). This response highlights its potential role in adaptation to low-oxygen environments. Mutations in the GLRA2 gene (glycine receptor, alpha 2 subunit) have been linked to high myopia in both humans and mice (Tian et al., 2023), illustrating the gene's importance in ocular development. Additionally, the RBBP7 gene (RB Binding Protein 7) is involved in cell differentiation and proliferation.
The SRY gene (Sex-determining Region Y) is crucial for male sex development, functioning as a transcription factor that initiates the formation of Sertoli cells, which are key in the development of testes. This gene is pivotal in determining male sex differentiation based on chromosomal configuration. Interestingly, our study also identified the Vascular Endothelial Growth Factor (VEGF) gene, a known target of the hypoxia-inducible factor HIF-1α, which is crucial under hypoxic conditions (Rashid et al., 2020; Kumar et al., 2024). Previous whole-genome resequencing studies on Indian yaks revealed the VEGF gene's involvement in hypoxia response, highlighting its adaptive significance. Similarly, Li et al. (2014) identified the VEGF gene under selection in Tibetan Mastiffs, emphasizing its role in adaptation to high-altitude environments.
Effect of LD pruning on Y chromosome
Linkage disequilibrium (LD) describes the non-random association of alleles at different loci. When the frequency of a haplotype equals the product of the individual allele frequencies, the alleles are in linkage equilibrium and are independent. Deviations from this equilibrium indicate LD, which is influenced by factors such as genetic drift, mutation, and recombination. Typically, alleles at loci close to each other exhibit high LD because recombination between them is infrequent, while alleles at distant loci or on separate chromosomes show lower LD. Selection, inbreeding, and population structure further shape LD patterns (Slatkin et al., 2008). LD pruning, the process of removing SNPs based on their LD, can be complex. Traditional methods prune SNPs that are highly correlated, which can remove SNPs with large allele frequency differences among subpopulations. This issue, known as the two-locus Wahlund effect (Nei, 1973), can obscure true genetic differentiation between populations and bias metrics such as FST, which measure genetic divergence (Li et al., 2019).
In our study, we applied various LD pruning parameters to the Y chromosome data from the different yak populations. We observed that 70–96% of SNPs were pruned, depending on the r² threshold. Specifically, using an r² threshold of 0.2 with a fixed window size of 50 kb and a step size of 5 kb, we removed 256,291, 227,778, 264,533, and 186,183 SNPs in Arunachali, Himachali, Ladakhi, and Jinchuan yaks, respectively. A similar approach in goats removed 30,139 SNPs, leaving around 15,000 SNPs (Visser et al., 2016). Raising the r² threshold to 0.5 reduced the number of SNPs removed to 226,591, 201,001, 230,752, and 174,883 in these yak populations, consistent with previous findings in bovine species (Li et al., 2022). At an r² threshold of 0.8, the number removed dropped to 202,080, 178,873, 202,769, and 161,148, respectively.
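The definition of LD and the idea behind r²-based pruning can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the toy haplotypes, and the simplified greedy scheme (no step size, haploid 0/1 allele calls) are assumptions made for illustration only.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic loci, given haploid 0/1 alleles per sample."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of the haplotype carrying allele 1 at both loci
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # D = 0 corresponds to linkage equilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def ld_prune(positions, haplotypes, window_bp=50_000, r2_threshold=0.2):
    """Simplified greedy pruning (illustrative, not a specific tool's algorithm).

    A SNP is kept unless its r^2 with an already-kept SNP lying within
    window_bp exceeds r2_threshold. Positions must be sorted ascending.
    """
    kept = []
    for i in range(len(positions)):
        in_ld = any(
            positions[i] - positions[j] <= window_bp
            and r_squared(haplotypes[i], haplotypes[j]) > r2_threshold
            for j in kept
        )
        if not in_ld:
            kept.append(i)
    return kept


# Hypothetical toy data: 4 SNPs typed in 6 males
positions = [1_000, 2_000, 3_000, 90_000]
haplotypes = [
    [0, 0, 1, 1, 0, 1],  # SNP 0
    [0, 0, 1, 1, 0, 1],  # SNP 1: identical to SNP 0, so r^2 = 1
    [0, 1, 0, 1, 0, 1],  # SNP 2: weakly correlated with SNP 0
    [0, 0, 1, 1, 0, 1],  # SNP 3: identical to SNP 0 but outside the window
]
kept_indices = ld_prune(positions, haplotypes)
print(kept_indices)
```

In this toy case SNP 1 is pruned because it duplicates SNP 0 within the window, whereas SNP 3 carries the same pattern but survives because it lies more than 50 kb away. On a chromosome with little recombination, many SNPs behave like SNP 1, which is one way to picture why such a large share of variants is removed.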
The extensive SNP pruning, which removed roughly 70–96% of variants, highlights the potential pitfalls of stringent LD criteria, especially for sex chromosomes (allosomes). Such pruning may inadvertently eliminate loci under selection, distorting the analysis of genetic diversity and population structure (Falconer, 1996; Laird and Lange, 2011; Abramovs et al., 2020; Kanaka et al., 2023). Therefore, careful consideration is required when applying LD pruning, particularly in genetic studies involving multiple populations or sex chromosomes.
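The proportions quoted above can be checked directly from the reported counts. The short Python sketch below does this under one stated assumption: the counts given for each r² threshold are read as the number of SNPs removed, not the number retained. The variable and function names are illustrative.

```python
# Total Y-chromosome SNPs per population, as reported in the variant calling section
total_snps = {
    "Arunachali": 274_828,
    "Himachali": 243_143,
    "Ladakhi": 283_774,
    "Jinchuan": 194_228,
}

# Counts reported per r^2 threshold (assumed here to be SNPs removed)
removed_snps = {
    0.2: {"Arunachali": 256_291, "Himachali": 227_778, "Ladakhi": 264_533, "Jinchuan": 186_183},
    0.5: {"Arunachali": 226_591, "Himachali": 201_001, "Ladakhi": 230_752, "Jinchuan": 174_883},
    0.8: {"Arunachali": 202_080, "Himachali": 178_873, "Ladakhi": 202_769, "Jinchuan": 161_148},
}


def percent_removed(r2, population):
    """Percentage of a population's SNPs removed at a given r^2 threshold."""
    return 100 * removed_snps[r2][population] / total_snps[population]


for r2, counts in removed_snps.items():
    for population in counts:
        pct = percent_removed(r2, population)
        retained = total_snps[population] - counts[population]
        print(f"r2={r2} {population}: {pct:.1f}% removed, {retained} retained")
```

Under this reading, every population and threshold falls between about 71% and 96% removed, with the largest share removed at the most stringent threshold (r² = 0.2).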