Whole-Genome Sequencing and Genetic Variation
Genome sequencing yielded a total of 438.46 billion reads of raw paired-end reads with a length of 150-bp from 19 goats in the present study. Stringent quality filtering yielded 430.32 billion reads or 64,548 Gb of genome data (approximately 147× effective sequence depth). Over 98% of the total clean reads were mapped to the annotated goat reference genome with a coverage of ~99.66% (Supplementary Table S1), These data are indicative of high quality sequences for the subsequent analyses. A total of 22.29 million high quality SNPs were obtained, relative to the published reference goat genome. Most of the SNPs (62.98%) in the individual genome of goat were located in intergenic regions. The remaining SNPs were located upstream (0.55%) and downstream (0.57%) of open reading frame, in introns (34.51%) and only 0.76% of the SNPs were located in exonic regions (0.63%) (Supplementary Table S2). A total of 60,259 non-synonymous SNPs and 84,818 synonymous SNPs were located within exons, forming a nonsynonymous/synonymous ratio of 0.71 was closed to the ration of Mouflon sheep22 (0.689) (Supplementary Table S2). Next, the 18.58 million SNPs were divided into 3 categories: 9.86 million SNPs shared between CDMG and CDBG, as well as 8.57 million and 0.15 million SNPs unique to CDMG and CDBG (Fig.1), respectively. Next, 18.58 million SNPs were measured for genome-wide variation and the frequency spectrum so as to accurately detect genomic footprints left by selection. The numbers of SNPs in the exonic, intronic, and intergenic regions of CDBG were lower than was observed in CDMG. Furthermore, higher genomic diversity was observed in CDMG relative to CDBG (Supplementary Table S3). These data indicate that the high altitude goat breed underwent intense natural and artificial selection in extreme weather conditions.
The sequence diversity (π) was estimated to be 1.35x10-3 in CDBG and 2.05x10-3 in CDMG, which were slightly lower than the native sheep22 (2.28x10-3). Moreover, the goats exhibited a nucleotide diversity that was higher than estimates for human populations23 (1.0x10-3), and lower than that of Chinese cattle breeds24 (Leiqiong cattle,4.2x10-3). The level of genetic differentiation between them, Fst, was estimated to be 0.03, which is lower than that between Tibetan and Hu sheep25 (about 0.249). The high proportion of shared SNPs and the low Fst revealed a small genetic divergence between plateau and plain goats. However, the principal component analysis (PCA) was significantly separated along the first principal component between plateau and plain goats (Fig.2), which accounts for the greatest proportion of genetic variability. The neighbor-joining tree also grouped the plateau and plain goats into two distinct groups.
Short insertions and deletions (InDels) and Structural Variants (SVs)
As efforts turn to whole genome sequencing as an approach for identifying adaptive variation, it will become even more critical to detect InDels fully and accurately26. Furthermore, InDels and SVs are important types of genetic variations that contribute significantly to genomic diversity27. A total of 861,995 insertions and 1,114,290 deletions were identified. Within all InDels, 1,129,583 were observed in both domestic goat, 388,539 were specific to CDBG and 741,044 were specific to CDMG. The length of InDels ranged from 0 to 10 bp, with the large majority consisting of single nucleotide deletions (Fig.3a). Observation of a high number of unique genetic variations suggests a benefit to the animals in protecting genetic diversity.
A total of 31,483 reliable structure variants were detected. Among these, 4,941 were insertions, 16,785 were deletions, 292 were inversions, 7,888 were chromosomal translocations, and 1,577 were tandem duplications. Within these SVs, 1,366 were specific to CDBG and the other 11,915 were specific to CDMG (Fig.3b). No significant differences were observed in the length of these types of SVs. Apparently, more variation was retained under selection pressure for CDMG in this environment. This phenomenon could be the result of the life-sustaining adaptions to high altitude, low temperature, and low oxygen conditions. The discovery range of InDels overlapped with that of structural variation. Thus, breed-specific variants could be useful for further studies of breed characterization, while overlapping variants may reflect closer relations between breeds28.
Signatures of Positive Selection in goat Genome
Genome variations were compared to both nucleotide diversity and allele frequency in response to natural and artificial selection. The ratio of the genetic diversity in CDBG versus CDMG goats (πCDBG/πCDMG) across the goat genome along the autosome using the sliding window method was measured. The 5% of windows with a Fst>=0, π ratio (CDBG compared to CDMG=0.775), in comparison to CDBG, CDMG was observed to possess lower levels of polymorphisms, suggesting that these genomic regions were strongly selected for in the extreme environment. These selected regions also exhibited significant differences (Tajima.D= - 0.313). This result quantitatively reflects that the extreme climatic conditions had a strong impact on the emergence of adaptive genetic changes in highland goats. In addition, although the overall goat genome is similar, two divergent clusters were observed that may be the result of adaptations to the extreme environmental conditions in which they lived (Fig.2). Moreover, the combination of Fst values and π ratios were more reliable indicators of goat genome for the selected genomic regions from the plateau and high-altitude regions. A total of 1277, were identified as high leves of differentiation regions (Supplementary Table S4). Using the top 5% of Fst>=0 and π ratio>=0.78 cutoffs for the CDBG/CDMG (Fig.4b). A total of 3184 overlapping selective candidate genes were observed to be shared between the Fst values and π ration values (Fig.4a). The combined analysis of these methods provided further support that the candidate genes were in fact a result directly related to the treatment.
GO and pathway enrichment analysis of selection signals
It is reasonable to expect that specific genes and biological pathways be involved in adaptation to extreme environments. To gain insights into the molecular functions of the candidate genes in the selected region and to provide clues for further screening validation, a gene ontology enrichment analysis was conducted. The analysis identified significant enrichment of 1867 specific candidate genes embedded in the selective regions in 175 GO terms (P < 0.05, Supplementary Table S5). These genes were significantly enriched in detection of stimulus involved in sensory perception (GO:0050906), neurological system process (GO:0050877), positive regulation of phospholipase activity (GO:0010518), detection of stimulus (GO:0051606). These pathways reflect the likely adaptations of cashmere goats to the low temperatures on the plateau. Also enriched were cardiac conduction (GO:0061337), pulmonary valve morphogenesis (GO:0003184), positive regulation of potassium ion transmembrane transporter activity (GO:1901018). These pathways are associated with adaptations to activity under the hypoxic environmental conditions associated with life at high altitude. Moreover, some immune activities were also significantly enriched. For example, positive regulation of T cell mediated immunity (GO:0002711), regulation of humoral immune response (GO:0002920), this would probably have been the result of a rapid response to new pathogens associated with frequent contact with humans during domestication29. Multiple candidate genes involved in production traits. For example, coat color genes included OSTM1 (ncbi_102189757), KITLG (ncbi_100860807), POMC (ncbi_102182514), APC (ncbi_102184724), CDK5 (ncbi_102178040), body size included BMP2K (ncbi_102186290), FOSL2 (ncbi_102171522), IGF2 (ncbi_100861277), BMP1 (ncbi_102179476), ACSL3 (ncbi_102185694), POMC (ncbi_102182514), CKM (ncbi_102170262), MYF5 (ncbi_100861252), cashmere traits included PHF14 (ncbi_102173711), FGF5 (ncbi_102171226), KRT74 (ncbi_102176789). In addition, some specific selective genes that were expected to be mutated to allow for environmental adaptation were also identified. These included genes involved in adaptability to hypoxia, such as GNB1 (ncbi_102187070), HIF1A (ncbi_100861391), RHOG (ncbi_102191156), PPARA (ncbi_100861393), TMTC2 (ncbi_102190760), EPO (ncbi_102180175), TH (ncbi_102179849), PPARA (ncbi_100861393), ACER1 (ncbi_102179093), HSP70.1 (ncbi_100860849). These findings substantially enriched the catalog of genetic variation and provided new insights into the unique characteristics of Chinese native goat breeds.
Different genes coordinate with each other to perform their biological functions, and pathway-based analysis is helpful to further understand the biological functions of genes. Selected gene were annotated using KEGG pathways. The analysis identified 21 genes that were significantly enriched (P<0.05) (Fig.5, Supplementary Table S6). Many of these pathways that were identified are involved in inflammatory responses in the lung tissue and immune system. For example some of the enriched pathways included Asthma pathway, Pertussis pathway, Antigen processing and presentation pathway. This may be because in the hypoxic environment, lung tissue and pulmonary blood vessels are damaged, so that the pathway of organ inflammatory response is in a state of inhibition, and by activating a large number of immune-related biological pathways, the inflammatory response caused by blood vessels in the remodeling process is reduced30. This is probably a defensive adaptation mechanism of the cardiovascular system to the environment of high altitude hypoxia. The immune regulatory cytokine interleukin 10 (IL-10) is an important regulatory cytokine with an overall immunosuppressive effect, which may represent an evolutionary adaptation that permits reactivation of the IL-10 gene under appropriate physiological or pathological conditions31. There are also candidate gene clusters associated with neural and automatic immune responses. Immune related genes are expected to be a main target for selection because they play a key role in immunity and defense, for example interleukin 1A (IL1A), interleukin 1B (IL1B), interleukin 10 (IL10), interleukin 9 (IL9), interleukin 4 (IL4), interleukin 13 (IL13), interleukin 18 (IL18)32,33,34. This reflects the effect of natural selection pressure on plateau goats in the harsh habitat of high altitude areas. There may be general patterns of recent evolutionary adaptation of the human immune system to particular regions and that these adaptations can produce differences in disease susceptibility35. Taken together, these results suggest that regulatory variations affecting immune-related genes, particularly those associated with viruses, may confer a selective advantage on the populations.
In addition to the environmental challenges posed by low oxygen and high UV radiation, smell plays a key role in the survival of animals through a series of core behaviors, such as sexual selection, alarm signals, foraging, and defensive awareness36. As the data presented here indicated, the Olfactory transduction signaling pathway was highly enriched (Q< 0.05) (Fig.5a). The ability of prey to innately recognize the odor of a potential predator provides a strong selective advantage. These neural mechanisms enable chemical surveillance against other species, interpretation of the cues, and initiation of a defensive response 37,38. Signals that induce fearful behavior, secreted by one individual and received by a second individual of the same species, in which they initiate a specific reaction, are cues transmitted between species that selectively disadvantage the sender and advantage the recipient39. Pheromones are chemical signals between two members of the same species which are known to be found mainly in animal communication. Although these models have not provided insight into the organization of the neural response in the receiving animals40, subsets of sensory neurons are thought to be genetically determined and modulate innate behavior.
However, the evolution of olfactory perception during goat domestication has previously received little attention. Olfactory perception acuity is less important for survival of domesticated goats living in the predictable farm environment than those living in the wild. The three genes enriched in the olfactory perception signaling pathway were Cyclic nucleotide gated channel alpha 4(CNGA4), Adenylate cyclase 3 (ADCY3), cGMP-dependent 3',5'-cyclic phosphodiesterase (PDE2A). Heteromultimeric cyclic nucleotide—gated (CNG) channels play a central role in the transduction of odorant signals and subsequent adaptation, Cyclic nucleotide gated channel alpha 4 (CNGA4)subunit rapid olfactory adaptation allows an animal to continuously assess changes in the odor environment and the intensity of odor41. The CNGA4 subunit in Olfactory sensory neurons (OSNs) was a functionally important mechanism for sensory adaptation of the olfactory system in vivo42. This volatile "alarm signal" produces and releases defensive chemicals that play an important role in animals adapting to their natural lives, as originally described by Darwinian natural selection in extreme environments. In this study, the CNGA4 gene revealed a higher level of diversity in CDMG (Fig.6a, 6b). Further screening of this candidate gene for non-synonymous mutations in exon, which may be representative of putative functional variants, revealed 4 missense SNPs (exon3: c.G178A/p.D60N, exon4: c.T835C/p.F279L, exon4: c.G893A/p.R298Q and exon6: c.C1489A:p.L497M) (Fig.6c) in the domestic goats. These results suggest that variations may be an important adaptation of the olfactory defense system.
Moreover, ADCY3 gene encodes the adenylate cyclase involved in olfactory signal transduction. Animals carrying functional knockouts of ADCY3 are unable to smell, as olfactory signaling across the main olfactory epithelium is mediated via receptor stimulation of type 3 adenylyl cyclase43. An indispensable role of ADCY3 in postnatal maturation of OSNs and the maintenance of olfactory ciliary ultrastructure in mice through adulthood44. The PDE2A gene was expressed in olfactory sensory neurons of olfactory epithelium45. As a marker for a unique set of olfactory sensory neurons, PDE2A is now defined as the “necklace olfactory”46. In the present study, 2 remarkable selection signals were detected in the ADCY3 (Fst=0 and π ratio=0.87) and PDE2A (Fst=0 and π ratio=0.79). These observations suggest that these selection signals may result in an enhancement of the olfactory perception defense system.
Interestingly, KEGG enrichment analysis showed that oocyte meiosis pathways were also involved in the adaptations to the environment. Calcium/calmodulin-dependent protein kinaseII (Camk 2b) might be related the hypoxia-induced factors-1 (HIF-1) pathway, or may possibly even function through the HIF-1 signal pathway47. The HIF pathway is a known complex hypoxic regulatory system involved in homeostasis, embryogenesis, and development, and can stimulate hundreds of down-regulated genes in response to hypoxia. In addition, the anaphase-promoting complex subunit 13 (ANAPC13) gene in this pathway is specifically involved in oocyte maturation48. In hypoxic environment, through the transmission of multiple collective signals, cellular metabolism may be converted from aerobic to glycolytic metabolism, which is conducive to the maturation of granulosa cells in the process of follicle maturation. This would allow for the cells to adapt their energy requirements using pathways with reduced oxygen requirements in hypoxic environments. Therefore, after a period of Darwinian natural selection, the relevant adaptation mechanism of a species may be the result of interactions between multiple genes. This indicates the complexity of selecting important adaptively mutated genes in signaling regions, and the associated regulatory pathways of these genes.
Some of the enriched signaling pathways were not believed to be related to the adaptations of indigenous goats to domestication in the present study, such as the Riboflavin metabolism pathway, Lipoic acid metabolism pathway, Linoleic acid metabolism pathway of metabolism system and prolactin signaling pathway of endocrine system. However, these pathways are involved in the basic regulation of physiological metabolic and endocrine processes. Therefore, it can be speculated that the selected genes associated with these pathways could be involved in responses resulting from domestication, possibly providing new clues to the functions of such signaling pathways.