Whole genome sequences reveal multiple functions of PAT1 gene on environment adaptation and coat coloration in Tibetan sheep populations

Background: Most sheep breeding programs designed for the tropics and sub-tropics have to take into account the impacts of both productive and adaptive traits. However, the genetic mechanism regulating the multiple biological process remain unclear. Results: In this study, we report a novel PAT1 gene that simultaneously explained the variations of productive trait (coat color), adaptive traits (altitude and geography response) in 15 indigenous Tibetan sheep populations. Overlapped genomic regions harboring 6 candidate genes across three traits were identified at 27 chromosomes, with the top 1% of Fst and |Zph| values. The SNP/INDELs and expression of these candidate genes were further analyzed, and we find that only PAT1 gene, a CSDE1 homologue was consistent with the variation of multiple traits regarding. Haplotype analysis of PAT1 reveal that Tibetan sheep breeds with C-type of PAT1 have significantly greater body weight, shear amount, chest width and body length, but have lower body height, than those with CA-type of PAT1. Conclusions: We emphasized that PAT1 gene could be a potentially selective target used for the improvements of environmental adaption and coat coloration in the future. These results contribute to the knowledge of adaptive response in Tibetan sheep populations and will help to guide future conservation programs for Tibetan sheep native to Qinghai-Tibetan Plateau. subpopulations white coat subpopulation suggested that these overlapped genes identified in both altitude and geography locations are related to convergent adaption to high altitudes in indigenous sheep


Genome-wide Selective Sweep Test
We calculated the genome-wide distribution of Fst values as previous reports in [25] for three subpopulation pairs as mentioned above as to originated altitude information, coat coloration types, and geographic regions (Table 2), using a sliding-window approach (100kb windows with 50-kb increments). To identify regions that were likely to be or have been under selection, the "Z transformed heterozygosity" (ZHp) approach was used, as previously described [26]. Individual Hp values were Z transformed as follows: ZHp = (Hp-μHp)/σHp, where μHp is the overall average heterozygosity, and σHp is the standard deviation for all windows within each population. We calculated the ZHp value in sliding 150-kb windows along the autosomes from sequence reads corresponding to the most and least frequently observed alleles at all SNP positions as previously described [27].

Bioinformatics analysis of breed specific SNPs/INDELs
Overlapped candidate genes across originated altitudes, coat coloration, and ecological regions were chosen for further analysis. In addition to this, we compared gene sequence of Capra hircus, Pantholops hodgsonii, and Bos taurus from NCBI database (https://www.ncbi.nlm.nih.gov/genome). We specifically focused on SNPs within genes and 1000-bp upstream and downstream flanking regions. Protein-protein interaction network was used to compare potential interactive genes with key overlapped genes in Homo sapiens from STRING database [28].

Validation of interactive genes
Expression of putative proteins interacted with key overlapped candidates relating to genetic variation of originated altitudes, coat coloration, and ecological regions as mentioned above were confirmed by qPCR. Detailed procedures for real-time PCR amplification were performed following standard protocol of SYBR Green real-time PCR kit (Applied Biosystems™ SYBR™ Green). The primer is listed in Supplementary Table S1. β-actin gene was used as internal control. The levels of relative gene expression were calculated against the expression of RXFP2 [17], using 2-Ct method [29]. Four biological replicates and three technical replicates were conducted.

Results
Genomic sequencing and PCA in 15  More than 9 million SNPs for each Tibetan sheep population that confidently remained after filtering were used in the subsequent analyses. Results from SNPs statistics showed that 63% SNPs were identified at intergenic regions, whereas only 0.7% SNPs cases were found at exon regions (Table S5) Table S6).
Results showed that 31, 1, and 22 common genes were identified that in classifications of altitude & geography, color & altitude, and color & geography, respectively (Fig. 2D).
In addition, we found other 5 genes, except IL12RB1 at chr.5 which was identified in both coat coloration and geography. Among these genes we cite; CSF2, LOC106990309 that specifically in Tibetan geography locations, TRNAC-GCA, ACSL6 and MEIKIN were common genes that have been identified simultaneously. These genes are located at chromosome 27. These genes include PAT1 (LOC101123097, also named productive and adaptive traits), KLF4, PSMA1, POF1B, APOOL, and ZNF711 (Table S6). These genes have high heterozygosity levels within either white coat coloration or low altitude subgroups (Table S6).
Haplotypes of PAT1 are related to adaptive response to environments and body size To further analyze the natural variations of these 6 overlapped genes, we analyzed INDELs of these genes including 2k-bp promoter and CDS regions along with other species, such as Capra, Pantholops, and Bos Taurus ( Figure 3). Results from INDEL information showed that 62% INDEL events were identified at intergenic regions, whereas only 0.3% INDEL cases were found at exon regions (Table S7). Our findings suggested that among 6 genes, only INDEL at 5'-flank region of PAT1 can fully explain the variations of different defined clusters in altitude variations, coat color, and geography locations. C-type is occurred in high altitude, black coat coloration, and Gansu geography location, while CA-type is occurred in low altitude, white coat coloration, and both Qinghai and Tibetan geography regions ( Figure S1-S3).
We further analyzed the protein structure and gene expression pattern among different indigenous Tibetan sheep populations ( Figure 4). According to bioinformatics analysis, we predicted the protein sequences of PAT1 according to translate tool in ExPASy database (https://web.expasy.org/translate/), and found the similarities score is 99% in both mRNA and protein sequences between CSDE1 and PAT1 protein, indicating its counterpart roles of PAT1 with CSDE1. We predicted the secondary structure of PAT1 with 798 amino acids, and found there are 13 different amino acids. Most variation loci were observed at Nterminal of protein containing 7 different amino acids, and these amino acids were located in either Beta strand or Turn domains, but not Helix ( Figure 4A). Three different amino acids were mapped at 3D protein structure from two PDB entries (1WFQ and 2YTX) showing high expression of PAT1, but is a white coat coloration sheep ( Figure 4D).
Therefore, an interesting question arising here is whether interactive proteins with CSDE1 could show similar expression pattern between the subpopulations of altitude, color and geography region? Therefore, we reconstructed the protein-protein interaction networks using CSDE1, and found PABPC1 and MYC, with relatively high correlation with PAT1, bears similar expression pattern with PAT1 ( Figure S4).
To interpret the biological functions of PAT1, we compared 47 physiological and biochemical parameters in representative 4 indigenous Tibetan sheep populations with different promoter haplotypes of PAT1 gene ( Figure 5; Table 2). As mentioned in Figure   3C, there are two haplotypes, C-type and CA-type, containing AW&HB, and GJ & QL Tibetan sheep populations, belong to both high and low altitude clusters, respectively. Tibetan sheep subpopulations with C-type of PAT1 have significantly greater body weight, shear amount, chest width and body length, but have low body height, relative to sheep in subgroups with CA-type of PAT1 ( Figure 5A-B).

Discussion
Global climate change would have a fundamental impact on agricultural production.
Integrated characteristic for livestock needs to be taken in consideration for designed breeding program, including adaptive ability to high altitudes, black coat coloration, and widespread geographical locations. Tibetan sheep is known to adapt to various agroecological conditions, it therefore represents an excellent model for understanding genetic mechanisms on adaptations of livestock to extreme environmental conditions. This study is aimed to identify key genes those regulate environmental response and coat coloration simultaneously.
Overlapped genes in the explanation of altitude and geography variations It is common that hypoxia-inducible factors (HIFs) are transcription factors that respond to changes in the available oxygen in the cellular environment under high-altitude conditions. In this study, we identified a candidate gene, ZEB1 at chr.13 with top 1% Fst values in both altitude and geography variations (Figure 2 A-C). It is reported that ZEB1 is a target gene of HIFs, which is critical in the regulation of the macrovascular angiogenic response but not that of microvascular angiogenesis [32]. Moreover, under hypoxic conditions, HIF-1α up-regulates the expression of proteins that induce TWIST1 and ZEB1 [33,34]. RALY is identified to be correlated with both altitude and geography variations, which has been reported as candidate genes with significant enrichments in spliceosomal complex pathway in high-altitude adaption Tibetan sheep populations, indicating its important roles in hypoxia response [13]. Fst values, also named fixation index, were typically used to evaluate differentiated genomic regions and identify selective signals among whole-genomic SNPs [30,31]. IL12RB1 is identified with high Fst values for both color and geography (Figure 2 B-C), and IL12RB1 also has high levels of heterozygosity specifically in Tibetan geography locations ( Figure S3). The numbers of totally overlapped candidate genes were presented based on top 1% Fst values for different combinations among altitude, color and geography from 150-kb windows ( Figure 2D).

Overlapped genes in the explanation of coat coloration and geography variations
Differentiated genomic regions in classification of geography factor reflect environmental selections and robustness. We identified that two known genes that is related to variation of both geography and coat coloration. ARAP1 gene at chr.15 has been previously reported to be under climate-driven selection as signal molecular involving in GTPase regulator activity process [35], which is in charge of various types of membrane transport, such as vesicle fusion and docking of transport vesicles to specific target organelles and/or plasma membranes during secretory processes [36,37]. It can anchor and transport melanosomes to the plasma membrane, and melanosomes within melanocytes is critical for pigmentation synthesis of hair [38]. IL12RB1 at chr.5, was also correlated with environmental selection [35]. It is involved in cytokine-cytokine receptor interaction pathways, and cytokine is secreted by keratinocytes and by melanocytes during the first two steps of Hyperpigmentation in the epidermis [39]. We also compared levels of genomic heterozygosity within different subpopulations as represented by |ZHp| values [40], and found IL12RB1 has high levels of heterozygosity specifically in Tibetan geography locations ( Figure S3).

Coat coloration, altitude and geography variations
Our findings suggested that only variation of SNP/INDEL at 5'-flank region of PAT1 can fully explain the phenotypic variations in combined characteristics of altitude response, coat color, and geography locations ( Figure 3C). This indicates that the expression of this gene could play key roles in regulating these productive and adaptive traits, which would be discussed in the following. For other genes, the INDEL variation only can explain partially to defined clusters. For example, the KLF4 has 4 types of INDELs, in 5'flank and other 3 intron regions. The types of INDEL can explain the variation of altitudes and coat coloration, but not three geography types. PSMA1 can explain coat coloration variation but cannot explain variation of other two clusters, whereas ZNF711 can explain altitude variation, but cannot explain variation of another two clusters. Taken together, only the 5'-UTR of PAT1 can largely explain the variance of three classifications, indicating that genetic basis of such as coat coloration and adaptive traits relies on gene expression.

Function of PAT1 in productive and adaptive characteristics
The CSDE1 (Cold Shock Domain Containing E1), alternative name: UNR, is a conserved RBP containing five cold-shock domains (CSDs) that bind single-stranded RNA [41,42]. We confirmed CSDE1 homologous protein, PAT1 showed clear differences between high low altitudes subpopulation of Tibetan sheep populations and between different geography locations with changed climate conditions ( Figure 3C). CSDs represent the most evolutionarily conserved nucleic acid-binding protein domain, found in bacteria and eukaryotes [43,44]. This domain of around 70 amino acid residues mediates binding to single-stranded DNA and RNA [45,46]. Function of proteins possessing CSDs are involved in two processes: transcriptional and translational control. Two major old shock proteins CspA and CspB in Escherichia coli and Bacillus subtilis, respectively, were intensively reported. CspA and CspB are massively and transiently induced after a temperature downshift and are involved in the adaptation to cold shock [47].
To explain the relationship of PAT1 and CSDE1 with coat coloration, we reconstructed the CSDE1-centerized module according to STRING protein-interaction database, and we found 9 genes that involving in this module. These are: MAX, MYC, YBX1, HNRNPU, DHX9, PABPC1, PAIP1, STRAP, SYNCRIP. Differential gene expression analysis via qPCR suggested that these genes have strong correlation with CSDE1 and PAT1 (Table S8) Figure S4), and we confirmed that expression of the two genes showed distinct pattern between white coat coloration and black coat coloration subpopulations, especially for PABPC1 ( Figure S4C). These inherent correlations probably explain the variation of coat coloration regulated by CSDE1/PAT1.

Morphological and physiological parameters of different haplotypes
Different haplotypes of PAT1 could be reflective of altitude response, and is related to changes of morphological characters, such as body weight, shear amount, chest width and body length and body height ( Figure 5A-B). This is consistent with the conclusion that large physical size in highland as observed in man [50]. In particular, body weight of Tibetan sheep in C-type population is 1.7 times higher than that in CA-type group. For body haematological parameters, body temperature, pulse interval, MCH, MCHC, HGB levels are at least 1.5 times higher in C-type of PAT1 group than that in CA-type population; this is similar to that observed in Tibetan sheep and dogs [13,51].
Interestingly, breathe rates, pulse rates, RBC, HCT, MCV, PLT, and RDW-CV are relatively lower in C-type of PAT1 than that in CA-type population. Values of blood-gas parameters were similar or relatively lower in C-type population than that in CA-type population, except for pO2. For lung tissue structure parameters, values are relatively higher in Ctype population than that in CA-type population. In particular, elevated lactate dehydrogenase (LDH) in C-type population relative to CA-type population support the    subpopulations. The overlapped genes were highlighted in blue fonts. Identified genes in two and three clusters were labeled in yellow and blue, respectively. D: Venn diagram representing overlapped genes across different clustered subpopulations. E: Numbers of corresponding genes harboring overlapped SNPs across different clustered subpopulations.  Comparison on sequence alignments and expression pattern between PAT1 and CSDE1. A: Secondary structure of PAT1 and differentiated amino acids between Supplementary Files