Genomic signatures of divergent selection for production and adaptation in domestic goats (Capra hircus)

Background : Goats are distributed worldwide and include many breeds with marked phenotypic variation. Both natural and human driven processes have shaped their genomes. Here, 52K genome-wide SNP genotypes were used to investigate signatures of divergent selection between indigenous goats from an arid hot environment in Egypt and breeds developed under temperate environments in Europe. Three selection signature approaches were used, the di derivative of F ST , iHS and Rsb. Results: Out of a total of 36 candidate regions that were detected to be under selection by the three approaches, the analysis detected nine regions with the strongest selection signatures spanning 134 genes of which 68 were the most significantly functionally enriched. In addition to well-known genes affecting dairy traits ( LAP3 , ABCG2 , MEPE , IBSP , MED28 ) and body weight and stature ( LCORL , NCAPG , CCSER1 , DCAF16 ) in several mammalian species, we found evidence for selection in regions spanning genes implicated in the regulation of fatty acid synthesis and metabolic pathways ( PI4K2B and PPARGC1A ), inflammatory gene expression and autoimmune response pathways ( CXCL8 , HERC5 , RGS18 , TROVE2 ) and spermatogenesis ( SPATA18 , LAP3 ). Other genes not reported before in goats included SLIT2 , PACRGL , GRP125 , DHX15 , SOD3 and KCNIP4 which have been associated with thermal nociception in mice. We also detected a paralog of TBC1D12 , namely TBC1D14 , in one of the candidate regions. TBC1D12 has been linked to environmental adaptation in sheep. Gene ontology analysis revealed the 68 candidate genes were highly enriched in two biological processes viz regulation of G protein-coupled receptor signaling pathway (GO:0045744) and cellular response to stress (GO:0033554). Conclusions: Our results provide evidence that goat genomes have evolved, in part, via diverse positive selection directed at breed formation, at past and recent genetic improvement and adaptation to environments. Based on their uniqueness, these breed-group specific signatures of

selection can be considered to be footprints of divergence which may be useful in characterizing genome architecture and diversity in domestic animals and for targeted genetic improvement.

Background
Humans and the environment have played a fundamental role in the evolutionary history of domestic livestock [1]. Genetic changes in the caprine genome driven by the quest to adapt to new circumstances dates to about 11,000 years ago [2]. Domestication and subsequent human-mediated dispersion across the globe, and socio-cultural and economic activities, exposed domestic goat genomes to new and challenging natural and/or artificial environments and husbandry practices. For some of the exposures, the species evolved to adapt to the novel environments but in other instances, the resulting outcomes were not favourable. The favourable adaptations saw the species being established in a wide range of habitats within the tropics, sub-tropics and temperate environments, where locally adapted populations and breeds were differentiated by various factors including selection, genetic drift and reproductive isolation. More recently, artificial selection for various unique and aesthetic morphological (coat colours, presence of horns etc.) and economically important (growth rates, milk production etc.) traits geared towards the development of specialized and distinct breeds must have left signatures of divergent selection in their genome.
The introduction of goats to Africa may have commenced with the movement of herders from the Near East into North Africa (Egypt, Libya, Algeria) around 7,000 years ago [3].
Egypt has approximately 4.4 million goats [4], that are predominantly of indigenous types.
The hot climate and arid environment in Egypt present a major positive selection pressure and livestock keepers prefer indigenous stocks over their exotic counterparts, due to their better adaptive survival and low husbandry requirements. The combined effects of natural selection, traditional management, and ancient and contemporary intermixing may have resulted in a uniquely adapted indigenous goat gene pool which can be used to examine the phenotypic and genetic consequences of adaptive evolution.
The detection of genomic regions that have been targeted by recent positive selection has proved a powerful tool for delineating genes contributing to adaptation to environmental variables and for informing functions accounting for phenotypic diversity. Over the last decade, many genome-wide scans for selection have been reported in domestic animals, fuelled by the advent of single nucleotide polymorphism (SNP) data sets and wholegenome sequences. These studies have made use of various statistical methods based on the predictable effects of positive selection on patterns of genetic variation. These effects include a decrease in haplotype diversity, high fraction of rare alleles, or major shifts of allele frequency between populations [see reviews by 5,6]. These approaches have led to the identification of several hundred genomic regions displaying selection footprints, suggesting the presence in these regions of new beneficial mutations that have spread rapidly through the population.
Several candidate selection signatures associated with adaptation to arid environments were reported in the genomes of Barki goats and sheep from Egypt [7]. The authors however, analysed only a small number of individuals (68 goats and 59 sheep) with a restricted geographic range. Furthermore, selection signatures associated with meat and milk production traits have been reported mostly for breeds of sheep and cattle in the developed countries [8,9,10,11]. To advance our knowledge and understanding of the genome dynamics of indigenous goats vis a vis well-defined commercial breeds, we analysed 52K SNP genotype data of breeds of goats from a sub-tropical arid environment in Egypt verses temperate European environment, and exposed to contrasting selection pressures, natural versus artificial. The comparative analysis of the goat breeds from Africa and Europe, identified 36 candidate regions spanning 258 genes under positive selection. Sixty-eight functionally enriched genes were found across nine candidate regions that had the strongest signatures of selection. We considered the 68 genes to be the likely primary candidate genes constituting the first line of adaptive evolutionary divergence amongst the different goat breeds.

Results
Genome-wide patterns of signatures of selection. We investigated candidate regions under selection using allele frequencies estimated within (iHS) and between (di and Rsb) Egyptian and European breed-groups of goats (Supplementary Table S1). The di identified four candidate regions showing significant differentiation between the two breed-groups on four chromosomes (CHI5, CHI6, CHI12 and CHI17; Figure 1a; Supplementary Table S2). CHI6 and CHI12 had the strongest signatures of differential selection spanning 5 and 20 significant SNPs, respectively. The Rsb revealed 19 candidate regions spread across 12 chromosomes (Figure 1b; Supplementary Table S2). The strongest and most significant regions occurred on CHI3, CHI6 and CHI16 and spanned 15, 11 and 81 significant SNPs, respectively. Thirteen candidate regions found across eight chromosomes were revealed by iHS (Figure 1c; Supplementary Table S2). The strongest of these regions were found on CHI6 and CHI12 and were defined by 19 and 18 significant SNPs, respectively.
Overlap between candidate regions. All the three approaches identified candidate regions under selection on CHI6 and CHI12 (Supplementary Table S2). On CHI6, one region was identified by di (68,327,495 Gene content and functional annotation of the strongest candidate regions. There were five (Rsb = 3, iHS = 2) candidate regions that spanned no annotated genes (Supplementary Table S2). Such regions have been designated as "gene deserts" and have also been reported by other investigators [12,13,14,15]. Genome-wide, we identified 258 genes (di = 13, Rsb = 115, iHS = 130) mapping to 36 (di = 4, Rsb = 19, iHS = 13) candidate selection regions that were defined by 287 significant SNPs (Supplementary   Table S2). Amongst the 36 candidate regions, there were seven that were the strongest (di = 2, Rsb = 3, iHS = 2) and spanned 169 significant SNPs (Table 1). In total, 134 genes mapped to the nine strongest candidate regions (Table 1). We regard the positive selection acting on these nine strongest candidate regions to be the primary ones driving the evolutionary divergence between the two groups of goats analysed.
The two strongest signals of selection revealed by di were respectively, on CHI6 and CHI12. The region on CHI6 spanned four genes and the one on CHI12 extended across two genes ( Table 1). The strongest signals revealed by Rsb were on CHI3, CHI6 and CHI16, respectively. The region on CHI3 spanned four genes, the one on CHI6 five genes and the one on CHI16, 29 genes (Table 1). Chromosomes CHI6 and CHI12 also had the strongest iHS candidate regions. The one on CHI6 spanned 80 genes, while 10 genes occurred in the region on CHI12 (Table 1).
Enrichment analysis was performed with the 134 genes present in the nine strongest candidate regions. We limited the functional annotation to B. taurus and used it as the background species because, in both cases, using C. hircus returned no results, most likely due to either the incomplete annotation of its genome or its exclusion in the DAVID database. Based on the bovine RefSeq gene annotation, of the 134 genes, 68 were significantly enriched. From DAVID analysis, we found seven functional annotation clusters that were significantly enriched (Supplementary Table S3); the top three were regulator of G protein-coupled receptor (GPCR) protein signaling pathway (GO:0008277; Enrichment score = 4.14), protein ubiquitination involved in ubiquitin-dependent protein catabolic process (GO:0042787; Enrichment score = 2.02) and Leucine-rich repeats (LRR) typical subfamily (IPR003591; Enrichment score = 1.42).
The functions of the 68 genes were further investigated using the text-mining tool in STRING. We detected several genes that have been reported before to be strongly linked to positive selection in other domestic livestock. These genes included LAP3, IBSP, MED28, MEPE, ABCG2, CCSER1, SPP1, that have been associated with milk production related traits [8,12,16,17,18,19,20] and PLAG1, NCAPG, IBSP, DCAF16, CCSER1, FAM184B, CCKAR and LCORL which are body size-related genes [12,21,22] in mammals. There were also genes associated with male fertility and spermatogenesis (ANAPC5, GPR125, SMARCAD1, SPATA18) and prolificacy (BMPR1B). Other genes were involved in fatty acid synthesis (adipogenesis) and metabolic pathways (PI4K2B, PPARGC1A, RGS1, RGS2, RGS13), and with pro-inflammation and immune response (SPP1, CXCL8/IL8, HERC5, HERC3) [23]. There were several genes that have been reported to be of functional significance in other species. SLIT2, PACRGL, GPR125, DHX15, SOD3 and KCNIP4 were identified in a mice QTL interval linked to thermal nociception [24,25]. TBC1D14 has a critical development role for RAB-GAP-mediated protein transport in early embryogenesis in mouse, while its paralog, TBC1D12, has been implicated as a candidate gene for environmental genetic adaptation [26]. TROVE2, UCHL5 and RGS1 were identified as autoimmune disease response candidate genes in activated T cells in human subjects and in interactions with ubiquitin and proteasome pathways [27] while RGS proteins play an essential role in B-lymphocytes and thus in immune response in mice [28].
Protein-protein interactions (PPI), as well as GO enrichment, were also investigated for the 68 most plausible candidate genes using STRING. The network proteins encoded by the 68 genes had significantly more interactions amongst themselves than was expected for a random set of proteins of similar size, drawn from the genome (85 edges identified; PPI enrichment P-value = 1.0 × 10 − 16 ; Figure 2). STRING also revealed two GO terms that were the most significantly enriched (FDR < 0.05) viz: negative regulation of G proteincoupled receptor (GPCR) signaling (GO: GO:0045744) and cellular response to stress (GO:0033554).
GPCR's represent the largest family of transmembrane receptors in the mammalian genome and occur in nearly all multicellular organisms [29,30]. They play a major role in signal transduction and perception of external environments in mammalian cells. In vertebrates, GPCR signaling networks have been associated with neurotransmission, cellular metabolism, secretion, cellular differentiation and growth, inflammatory and immune responses, smell, taste and vision [31]. Cellular stress response encompasses a wide range of molecular changes in response to environmental stressors, including extremes of temperature, exposure to toxins, and mechanical damage. The various processes involved in cellular stress response serve the adaptive function of protection against unfavourable environmental conditions, through either short-term mechanisms that minimize acute damage to cell integrity, and/or long-term mechanisms which provide a measure of resiliency against adverse conditions.

Discussion
Goats have played multiple and crucial socio-cultural and economic roles in ancient and modern societies. Accurate characterization of the species genome architecture could help infer the genetic and functional variation underlying the population structure and the design of appropriate conservation, and genomic selection and breeding programs. Using 708 samples we provide an assessment of the genome structure amongst goats of African descent from a hot non-tropical arid environment and those of European ancestry selected artificially for production traits in temperate environments.
Comparing the results of the three analyses, there were differences in the number of signals detected. The di analysis detected four candidate regions, compared with 19 and the configuration of the three selection mapping statistics that capture different patterns in the genome [32,33,34]. Selection signatures based on genetic differentiation persists for a longer period than signals based on haplotype structure. The, di therefore can detect older signatures of selection [32,34] which are most likely associated with long term adaptive divergence. The signatures that are based on haplotype structure arise from short-and/or medium-term events. They are likely to reveal outcomes of artificial selection and formation of breeds given that systematic selection and animal breeding based on predefined traits, which resulted in well-defined breeds, came into effect in the mid-1800's [35]. The combination of the three methods therefore provides a comprehensive picture of the signatures of selection acting in the genomes of goats that are exposed to different environments and selection pressures. Although few overlapping selection signature regions have been identified using computationally different approaches, when they occur, they do provide support for credibility of the results [36] and in our case, they increase the reliability of the nine primary candidate regions reported here to result from genuine selection events. Furthermore, our selection signature scans used allele frequencies that were derived from multiple populations, and individual chromosome segments have a low probability of drifting, simultaneously, towards fixation in several populations.
In performing the di and Rsb, we pooled four Egyptian breeds of goats into one test group (Egyptian predominantly under natural selection) and 10 breeds from Europe into another group (non-Egyptian predominantly under artificial selection). The grouping was based on a priori knowledge of the breeding history and geographic-environmental origin of the breeds. Such a pooling strategy has been implemented previously [13,15,37]. Sample pooling offers several advantages. Apart from the fact that it results in stronger and more distinct signals of selection, it also reduces the effects of population specific linkage disequilibrium arising from genetic drift, smoothens out population specific effects that may arise from specific genetic backgrounds and averages out the effects of within group structure on haplotype homozygosity [37].
Five of the candidate regions under selection were gene deserts. This can be due to the lack of genes with functional significance in the candidate regions or more likely due to poor annotation. In observing such regions in cattle, Zhao et al. [21] suggested that such regions are likely to play important roles in adaptation which may be elucidated in future with improvements in genome annotations. It is possible that there may be an Several studies have implicated mutations in ADAMTS20 in skin pigmentation [47,48]. The current data therefore raises the possibility that, by virtue of their functions, mutations in the genes flanking the gene deserts may affect long range regulatory sequences and this may explain our finding. Alternatively, the mutations could disrupt local chromatin structures, or interaction of chromatin with the nuclear matrix, in a manner that affects the expression of other genes. However, these postulations require further investigations.
Given the limited marker density in our data and the finding of Kemper et al. [49] that response to selection is usually based on small changes in frequency at many loci, it is likely that we may have only detected loci with major effects due to strong selection. The collection and analysis of many phenotypes together with high-density genotype data or full genome sequences, may be necessary to disentangle the genetic basis and functional significance of the gene deserts.
Out of the 36 candidate regions, nine were defined by the strongest signals and spanned 68 functional genes. We hypothesize that these nine candidate regions and the associated 68 genes could be the primary candidates driving genomic adaptive divergence and evolution in the study breeds and could therefore be potential targets for functional studies. A large number of the 68 candidate genes are linked to multiple functions. This suggests that the underlying genetic mechanism for optimal productivity and performance and adaptation to arid environments are complex and may involve multiple genomic regions and genes with pleitropic functions. For instance, PPARGC1A mediates expression of genes involved in oxidative metabolism, adipogenesis, and gluconeogenesis [50].
Expression of this gene has been suggested to be required for the initiation and development of lactation in dairy cattle [18] and has also been shown to be associated with milk composition [18,51], reproduction [52], growth [53], carcass traits [54] and meat quality [55]. LAP3 and CCSER1 have been associated with milk production and body size in Charolais cattle [12,21,56]. Furthermore, DAVID revealed 12 gene clusters that were functionally enriched while STRING revealed more significant protein-protein interactions than were expected for the 68 primary genes suggesting the proteins are, at least partially biologically linked as a functional group.
The variability in signatures of selection can shed light on genome function and evolution.
Our analysis revealed several genes associated with production traits, and variation in the latter can provide evidence of phenotypic response to artificial selection [12]. Genes underlying variation in quantitative and qualitative traits are of interest in breed formation, breeding improvement and tracing trajectories of genome evolution [12].
Several causal variants identified within PLAG1, DCAF16, NCAPG and LCORL genes, which correspond to the same QTL, have been reported to affect calving ease through their impacts on animal development (growth and weight, body size, height and stature) and birth weights in sheep [9], cattle [16,57], pigs [58], horse [59] and dogs [60]. RGS proteins have been implicated in efficiency in differentiation during adipogenesis and osteogenesis and thus in bone development and therefore body size and fat deposition [61]. In addition to LAP3, LCORL and NCAPG, IBSP, MED28, MEPE and ABCG2 have been associated with dairy traits and explain a large percentage of the variation in milk yields, protein content and fat composition [62,63,64]. Through differential expression analysis, it was revealed that ABCG2 also acts as a regulator of lactation [65]. NCAPG also plays a role in decreasing subcutaneous fat thickness and increasing carcass yields while PI4K2B and PPARGC1A are involved in fatty acid synthesis and metabolic pathways. The considerable overlap between our results and those of other studies, appear to suggest that the genetic mechanisms underlying several production traits (milk and body size) might be the same across domestic livestock and it is unlikely that they are false associations, or our selection signatures are false positives. It also implies that the incorporation of any identified causative variants into genomic selection strategies could improve their accuracy and robustness, while allowing targeted selection to achieve more rapid genetic improvement across domestic livestock. Further studies may therefore be required to assess how mutations in these candidate genes affect protein functions and phenotypes.
Our results provided indirect insights on possible mechanisms underlying adaptation to arid environments in indigenous goats as evidenced by the two most significant biological processes, negative regulation of G protein-coupled receptor (GPCR) signaling and cellular response to stress as revealed by STRING. Arid environments are characterized by a number of bio-physical stressors including high temperatures, solar radiation, feed and water stress. Indigenous domestic animals are known to be well adapted to these stressors and several studies on signatures of selection have been undertaken to unravel the genetic basis of adaptations to environments in a number of livestock species [7,13,15,26,66]. A paralog of TBC1D14, namely TBC1D12, was present in one of our candidate regions and was identified previously by Lv et al. (26) to be candidate gene for environmental adaptation in sheep. Other genes such as SLIT2, PACRGL, GPR125, DHX15, SOD3 and KCNIP4 were identified in a mice QTL interval linked to thermal nociception [24,25]. Environmental stressors can trigger a variety of responses at the cellular and subsequently organism level. Depending on the severity and duration of the stress factors, cells either re-establish cellular homeostasis to the former state or adopt an altered state in the new environments. Signal transduction, a fundamental biological process that is required to maintain cellular homeostasis and ensure coordinated cellular activity in all organisms, is important in this regard. Cell surface membrane proteins provide the communication interface between the cell's external and internal environments. GPCRs are one of the largest and most diverse membrane protein families [67]. They function by detecting a wide spectrum of extracellular signals and following ligand binding, undergo conformational changes, causing the activation of complex cytosolic signalling networks, resulting in a cellular response [68]. Based on our results, it is likely that an interplay between the genes modulating cellular stress response and GPCRs constitutes a "stress surveillance and response system" which functions to detect biotic and abiotic stressors to trigger appropriate responses resulting in either short-and/or long-term adaptation. We hypothesize that the "stress surveillance and response system" could be highly conserved in indigenous goats and may reflect shared mechanisms of environmental response related to GPCR signaling across domestic animals.

Conclusions
We carried out a genome-wide scan of selection signatures using the Caprine 52K SNP BeadChip array that contrasted breeds of goats from an arid environment in Egypt and a temperate one in Europe. The two genetic stocks represented diverse gene pools based on location and breeding history. The analysis of these two diverse genetic stocks identified a number of genes under positive selection that are related to milk production, growth and body size, and other functions such as fatty acid synthesis and response to stress and adaptation. Although some of the genes were previously reported in other domestic animals, we confirmed the occurrence of positive selection in the goat breeds using more diverse samples. We also revealed other genes not reported before in domestic animals that are linked to skeletal structure and thus body size, reproduction and nociception, aspects which are important in adaptation to stressful environments. Our results clearly indicate that diverse genomic selection during the process of genetic improvement and breed formation, and adaptation to arid environments have contributed to the sculpturing of the caprine genome.

Methods
Sampling, genotyping and data quality control: Whole blood was collected via jugular venipuncture from 366 individuals of four breeds of Egyptian indigenous goats (Barki, Saidi, Wahati, Nubian; Supplementary Table S1), using EDTA coated vacutainer tubes. All the animals were released immediately following sampling for grazing. The animals were selected at random from flocks maintained by livestock producers and research farm.
Efforts were made to ensure the animals represented, as much as possible, the target breeds of goats. Genotyping was performed on the Illumina CaprineSNP52K BeadChip at GeneSeek Inc. The raw genotypes for the 53,347 SNPs included in the genotyping platform were first analysed with GenomeStudio (Illumina; GenCall score for raw genotypes > 0. 15) and used to extract the genotypes in standard format for PLINK 1.09 [69]. Amongst the 53,347 SNPs that were genotyped, 1483 that failed genotyping in at least 80% of the individuals in each population and 1756 that were not polymorphic (minor allele frequency (MAF) ≤ 0.03) were dropped. We included in this study, a subset of SNP genotypes that were generated with the same BeadChip array that were collated within the framework of the ADAPTmap project [70]. The extracted subset included 447 individuals of 12 European breeds of goats that have been bred and selected for meat and milk production under temperate environments (Supplementary Table S1). These two sets of genetic stocks present divergent but ideal, gene pools for assessing genome regions under different selection pressures (mainly natural versus artificial). The two datasets were merged and the SNPs that mapped on autosomes, with positions based on the goat genome assembly v1.0 [71], were chosen for analysis. In total, 708 individuals with genotypes for at least 95% of the SNPs were available for analysis. This dataset was subjected to the following quality control filters: individual call rates ≥ 95%, SNP call rates ≥ 90%, MAF ≤ 0.03 and genotype frequencies had to be in Hardy-Weinberg equilibrium with P ≥ 10−4. The number of SNPs retained in the final dataset following quality control filtering was 49,056.
Detection of selection signatures: The genotypes were pooled into two groups comprised of Egyptian and European (non-Egyptian) breeds of goats based on a priori knowledge of their breeding history and geographic origin. Three analyses were performed to detect genomic regions under positive selection. A genetic differentiation analysis was used to contrast the Egyptian and European breeds by calculating the unbiased estimate of F ST [72] as described by Akey et al. [73], for each of the 49,056 SNPs that passed quality thresholds. The F ST values were averaged across a sliding window of 20 SNPs which overlapped by five SNPs following Kim et al. [7]. Identification of candidate signatures of selection was based on window estimates at the extreme of the empirical distributions following Akey et al. [73]. This approach has been used to detect selection sweeps that are specific to a breed/population, or a subset of breeds/populations [7,12,74].
Specifically, we considered that a position carried a signature of selection if it occurred in the top 1 percentile of the empirical distribution for genetic differentiation.
Two extended haplotype homozygosity (EHH) statistics, iHS [75] and Rsb [76] were also used to map selection signatures. The tests detect regions with high levels of haplotype homozygosity, either within (iHS) or between (Rsb) breeds, over an unexpected extended genomic region relative to neutral expectations. The intra-population Integrated Haplotype Score (iHS score) was computed using the rehh package [77] in R [78]. The natural log of the ratio between the integrated EHH for the ancient (iHH A ) and derived (iHH D ) alleles for the Egyptian goats breed-group were calculated for each SNP. The allelic states (ancient/derived) of each SNP were inferred using two approaches: i) the allele with the highest frequency was inferred as the ancestral one as was determined for humans [79]; ii) the allelic states were assigned at random after permutating 100 times to ensure consistency in the assigned allelic states. A two tailed Z-test was used to identify The inter-population Rsb scores were also computed between the Egyptian and European breed-groups of goats. The Rsb statistic was defined as the natural logarithm of the ratio between iES Egyptian and iES European . For this purpose, the integrated EHHS (site-specific EHH) for each SNP and group of goats (iES) was calculated. A Z-test was used to identify the significant SNPs under selection in the Egyptian group of goats. One sided P-values were derived as -log10(1-Φ(Rsb)), where Φ(Rsb) represents the Gaussian cumulative distribution function.
For the two EHH statistics, haplotypes were reconstructed by phasing the genotyped SNPs using Beagle [80]. Sliding windows of 20 SNPs that overlapped by five SNPs as per Kim et al. [7] were used to calculate haplotype frequencies. The -log10(P-value) > 4.0 (P-value < 0.0001), was used as the threshold to define the significant iHS and Rsb scores. Candidate regions were identified if at least three SNPs passed the threshold. In instances where multiple SNPs passed the threshold, a distance of 0.25 Kb up-and down-stream of the most significant SNP was used to define candidate selection region intervals. This distance was chosen based on the rate of change in the mean pairwise linkage disequilibrium statistic (r2) calculated using the r2fast function of GenABEL [81], binned over distance intervals across autosomes of the Egyptian goats (Supplementary Figure S1).
Functional annotation and protein-protein interactions: All variants found within the confidence intervals of the candidate selection regions were annotated. To avoid missing important genes while including potential effects of regulatory changes on loci, as well as reduce the risk of excluding the outermost portions of the selected haplotypes, as may arise due to using sliding windows of fixed sizes [82], the confidence intervals were extended by 100 kb on each side of the candidate region interval. We retrieved genes within the candidate region intervals from the NCBI database based on the caprine ARS1 reference genome assembly [71], Capra hircus Annotation Release 102. Functional annotation and gene ontology (GO) enrichment analysis were performed with the function annotation clustering tool of DAVID 6.8 [83,84]. Each gene was analyzed and an enrichment analysis was performed using the B. taurus genome background supplied with DAVID 6.8. Corrections for multiple testing were performed by applying the Benjamini and Hochberg [85] approach. The GO terms and the KEGG pathways were considered significant at P values lower than 0.05. In the case of the functional annotation clusters, an enrichment score of 1.3 was used as the threshold following Huang et al. [83,84].
Functional protein-protein interaction (PPI) networks encoded by the possible candidate genes were also investigated using STRING Genomics 11.0 [86]. STRING provides known PPI from curated databases or experiments and PPI predicted based on gene neighborhood, fusions, and co-occurrences, text mining in literature, co-expression, or protein homology. A global PPI network which retained only interactions with a high level of confidence (score > 0.4) was constructed.

Acknowledgements
Special thanks to flock owners in Egypt whose goats were sampled and to ADAPTMap consortium for providing genotypes of the European breeds. This study was carried out as

Availability of data and materials
The datasets generated and/or analysed during the current study are available from the   Figure 1 The Protein-Protein Interaction (PPI) network for the 68 primary candidate genes generated using STRING.

Figure 2
Manhattan plots for the genome-wide distribution of SNPs following selection sweep analysis based on di (1a), Rsb (1b) and iHS (1c) analysis.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download.