Mitogenomic Diversity and Maternal Origins of Yak ( Bos grunniens )

Background and aim: Mitochondrial genome has the characteristics of simple structure, no recombination, maternal inheritance, high conservation, fast evolution, and high copy number. It is easy to sequence, contains high-resolution phylogenetic information, and exists in a wide range of taxa. Therefore, it is widely used in the study of biological phylogeny. At present, phylogenetic studies focus mainly on D-loop region, cytochrome b gene, and protein coding sequence; phylogenetic studies using the mitochondrial complete sequence are rare. Therefore, this study aimed to conduct phylogenetic experiments using the mitochondrial complete sequence and compare the results with previous findings obtained using partial sequences. Results: Complete mitochondrial sequences of five yak populations from Qinghai and Xinjiang were obtained. The haplotype diversity of the five populations were Xueduo yak (0.992 ± 0.015), Pamir yak (0.990 ± 0.014), Yushu yak (0.963 ± 0.033), Qilian yak (0.948 ± 0.036), and Huanhu yak (0.905 ± 0.048), which showed a higher haplotype diversity compared with other breeds. A total of 78 haplotypes were obtained from 111 individuals. Among these, Yushu yak, Huanhu yak, Xueduo yak, and Qilian yak all shared haplotypes, but the Pamir yak did not share haplotypes with these four populations. Phylogenetic analysis showed that yak populations were separable into three distinct branches. The analysis identified a new phylogenetic branch with both wild and domestic yaks being represented in the new branch. The 155 haplotypes were divided into 6 haplotype groups by haplotype clustering. The haplotype group had nothing to do with the morphological group of yaks, and yaks from the same population or the same ecological group were distributed in different haplotype groups. The four haplotype groups A, B, C, and D, showed a star-shaped distribution of haplotypes. The central haplotypes were widely distributed and had a high frequency. Conclusions: The genetic diversity of yaks in Qinghai was high. Both domestic and wild yaks exhibited three different maternal origins. Three distributed and had a high frequency. Overall, these findings suggested that haplogroup A was associated with the primary yak domestication event. However, the results of this study might be susceptible to haplogroup bias owing to the limited sample size. Additional samples from outside of Qinghai province should be collected to clarify these results in the future.


Introduction
Yak (Bos grunniens) is a large animal with a compact body, an absence of functioning sweat glands, and a relatively small skin surface area per unit of body weight [1]. These characteristics make yaks well adapted to low-temperature hypoxic environments where they are endemic, such as the QTP Qinghai−Tibetan Plateau (QTP) and adjacent high-altitude regions [2]. China has more than 16 million yaks [3], accounting for more than 95% of the global yak population [4]. Yaks are multipurpose, high-altitude bovid species [5], providing an essential source of economically valuable products for local herders such as milk, meat, fur, and fuel [6]. In addition, yaks provide locals with a means of transportation [7]. Yak thus play an important socioeconomic role in regions where they are endemic, being vital in maintaining pasture ecosystems and agricultural biodiversity in plateau areas [8]; hence, yaks are referred to as "all-around animals" [2]. However, yak populations are currently undergoing degradation, leading to an urgent need to cultivate new breeds. Yaks may have been domesticated from wild yaks by ancient Qiang people in northwest China during the early Holocene period [7] and are the only large animal to live in an environment similar to that of their wild ancestors [9]. Qinghai province is located in the northern QTP and is home to abundant yak genetic resources. Archeological analyses and assessments of mitochondrial DNA (mtDNA) sequences suggest that Qinghai province may be the origin or site of yak domestication [10]. However, further research is warranted to better clarify the maternal origins, phylogenetic structure, and diversity of domestic yak populations by studying the yak mitochondrial genome.
Genetic diversity is a central facet of biological diversity [11], and variations in DNA sequences are the primary drivers of such diversity. MtDNA is a self-replicating, maternally inherited circular DNA molecule that undergoes rapid evolution and no recombination, and contains high frequencies of polymorphic variants [12]. As the entire mitochondrial genome is inherited as a single unit and mutations accumulate therein [13], mtDNA is considered an ideal tool for performing animal studies, evolution, classification, and population genetic diversity [14]. 4 Previous studies analyzed mtDNA to explore bovine genetic diversity [15]. A separate analysis of mtDNA samples from domestic yaks revealed that all domestic yaks originated from a common wild yak ancestor [15], enabling researchers to tentatively identify two domestic yak migration routes [14]: One originates from the eastern part of the QTP through the Himalayas and the Kunlun Mountains to the Pamir region, and the other originates from the eastern part of the QTP through the South Gobi and the Altai Mountains to Mongolia and Russia. Diversity levels of the domestic yak population are the highest in the QTP region [14]. Most mtDNA studies conducted to date have focused explicitly on the control D-loop region of the mitochondrial genome, making it virtually impossible to clearly distinguish certain important branches in livestock [16,17].
Recently, researchers have discussed the importance of performing complete mtDNA sequencing because such analyses can yield a detailed genetic map when conducted with a sufficiently large sample size [18].
In combination with previous research results, yaks from Qinghai province, which has the highest yak population diversity in the QTP region and is also the presumed origin of yaks, were selected as the research object and Pamir yak from Xinjiang province as the control group in this study.
The complete mitochondrial sequences of Yushu, Qilian, Huanhu, Xueduo, and Pamir populations were obtained by sequencing, and the genetic diversity and interspecific genetic distance of the five populations were analyzed. The phylogenetic relationship of yaks was analyzed using the sequencing data and the existing mitochondrial sequence data in GenBank. This study evaluated the maternal origin of yak at the molecular level and laid a foundation for protecting and effectively utilizing bovine resources.

Analysis of base composition and polymorphic sites of mtDNA
This study assessed genetic variations in 111 complete mtDNA sequences (16321-16325) from 4 yak breeds in Qinghai province and 1 yak breed in Xinjiang province as a means of evaluating the genetic diversity, phylogeny, and maternal origins of these breeds. These analyses found that these five yak breeds all exhibited an identical overall base composition (27.3% T, 25.8% C, 33.7% A, 5 and 13.2% G). Among these, A+T (61%) was higher than G+C (39%), indicating that A + T was substantially more common than G + C in the total yak mitochondrial gene, displaying an AT bias.
A total of 114 variable sites were found in Huanhu yaks (1 singleton variable site and 113 parsimony informative sites), 105 in Pamir yaks (1 singleton variable site and 104 parsimony informative sites), 105 in Qilian yaks (all were parsimony informative sites), 122 in Xueduo yaks (all were parsimony informative sites), and 115 in Yushu yaks (all were parsimony informative sites). Through these sequence analyses across all five populations, 150 variable sites were found, including 2 singleton variable sites and 148 parsimony informative sites.

Haplotype analysis of mtDNA
A total of 78 haplotypes were defined for these 111 samples. Among 78 haplotypes, 13 were shared by different populations (accounting for 16.67% of the total haplotypes) and 65 were specific haplotypes of specific populations (accounting for 83.33% of the total haplotypes). The number and type of haplotypes were different among different populations. Of these, haplotype H2 was the most common, being observed 13 times and represented among Huanhu, Qilian, Xueduo, and Yushu yaks. Of the 78 haplotypes observed in these 5 yak populations, only Pamir yaks in Xinjiang did not share haplotypes with other yak populations; the other 4 yak breeds all had shared haplotypes. For more information about haplotypes, refer to Supplementary Table S2.

Genetic diversity analysis of yak mtDNA
The genetic diversity analysis showed that the haplotype diversity of the five yak populations was 0.981 ± 0.007, and the nucleotide diversity was 0.00272 ± 0.00019. These data showed that the genetic diversity of five yak populations was high. The nucleotide diversity value of the Pamir yak population (0.00309 ± 0.00018) was higher than that of the other four populations. The haplotype diversity reached a maximum in the Xueduo yak population (0.992 ± 0.015) and a minimum in the Huanhu yak population (0.905 ± 0.048). The D-loop region was found to be the most variable mtDNA sequence in these yaks. Additional details pertaining to genetic diversity estimates, including variable site number, haplotype number, and nucleotide diversity (Pi ± SD), are shown in Table 1.

Analysis of genetic distance between five yak populations
The results of the genetic distance between five yak groups are shown in Table 2. The genetic distance between any two yak groups was small based on the mitochondrial complete genome sequences. The genetic distance between Pamir yak and Qilian yak, Qilian yak and Xueduo yak both are 0.002, and the genetic distance between other yak groups was 0.003. No genetic distance of zero was found between any two populations, indicating differences in the genetic and breeding development of yaks.

Assessment of mtDNA sequence variations and genetic diversity
Combined with the 111 sequences obtained by sequencing, 95 complete mitochondrial genome sequences of yaks that lived in other provinces were downloaded from GenBank (Supplementary  Table S4). Base replacement saturation for 155 haplotype sequences was calculated using DAMBE software. The result is shown in Figure 1. The 155 haplotype sequence base substituents in yaks were not saturated, and therefore phylogenetic trees could be constructed.
A neighbor-joining tree (Fig. 2) was next constructed based on these 155 haplotype sequences.
This tree separated yaks into 3 branches, of which branch I was the largest (with 103 haplotypes), accounting for 66.45% of the total haplotypes. Branch II contained 52 haplotypes, accounting for 32.26% of the total haplotypes. Branch III was the smallest; only one domestic and one wild yak were included, accounting for 1.29% of the total haplotypes. Phylogenetic analyses revealed that yak mtDNA was separable into six haplogroups (haplogroups A-F). Haplogroup A was the most frequent; it was also the only haplogroup found in five yak groups, including Yushu yak, Qilian yak, Pamir yak, Xueduo yak, and Huanhu yak. This was followed by haplogroup B, which included yaks from all provinces. In contrast, yaks from Yunnan were not included in haplogroup A, whereas yaks from all other provinces were present therein. The aforementioned experimental results showed that branch I comprising A and B was the main branch of yak. In order to verify the reliability of the results, we constructed a Bayesian phylogenetic tree (Fig. 3), and found that 7 206 yak mitochondrial complete sequences were also divided into 3 large branches and 6 haplogroups.This proves the reliability of data analysis.
A network diagram of 206 yak individuals (Figure 4) was constructed to study yak phylogeny.
This diagram mainly comprised three branches, thus verifying the reliability of the branches in Figure 2. Six haplogroups (A-F) were identified, with haplogroups A, B, C, and D exhibiting a star-shaped phylogenetic relationship, and haplogroups E and F exhibiting a dendritic phylogenetic relationship.

Discussion
As a form of matrilineally inherited genetic material, mtDNA has been widely used to study maternal phylogenetic relationships among mammals [19,20]. Studies of the evolution and taxonomy of yak populations in China to date have primarily focused on the mitochondrial D-loop region [21] and cytochrome b gene (cytb) sequences [16]. The cytb gene is relatively stable in organisms, and its mutation process is slow in mtDNA. Therefore, it is commonly used to reconstruct phylogenetic relationships above the species level. The D-loop region is often used to study the system relationships at the subspecies level because of its rapid variation in mtDNA [15].
Complete mtDNA sequences yield more data than either of these individual isolated sequences, making them more useful than cytb, D-loop regions, or nuclear genes when evaluating mammalian phylogenetic relationships [22][23][24][25][26]. Four yak breeds in Qinghai province and one yak breed in Xinjiang province were sequenced, evaluating the genetic diversity and matrilineal origins of these breeds by comparing them with one another and with yak mtDNA sequences in the GenBank database to develop a phylogenetic model of Chinese yaks.
In this study, mitochondrial full-length sequencing of 111 individuals from 5 yak populations showed that the A + T content was higher than the G + C content, showing the characteristics of mitochondrial base preference. By referring to the literature published in the journal Mitochondrial DNA Part B, a general AT preference was found in the yak mitochondrial genome.
Specific species often form a set of favorable codon usage patterns in the long-term evolutionary process to adapt to environmental changes [27]. The codon preference is influenced by many 8 factors, such as genome size, gene sequence length, gene density, gene expression level, CpG island, GC distribution location, tRNA abundance, protein secondary structure, and codon variation preference [28,29]. Among these, natural selection and mutation are the two main factors affecting codon usage bias, which are also the two factors that have received much attention at present [30,31]. The base preference of the yak mitochondrial genome may also be to better adapt to the alpine and hypoxic environment at high altitudes. In 111 individuals, 78 haplotypes were found, among which Yushu yak, Huanhu yak, Xueduo yak, and Qilian yak all shared haplotypes, but the Pamir yak did not share haplotypes with these four populations. This might be because the first four populations were close to each other and exchanged genes between populations, while the Pamir yak was far away from the other four species, creating geographical isolation. Further, the haplotype specificity of the Pamir yak was caused by the accumulation of mutations during migration from its origin to the Pamir region.
Genetic diversity is an integral part of all biological diversity. It is the basis of biological evolution and species differentiation, and is of great significance for population maintenance and reproduction and adaptation to habitat changes. The haplotype diversity and nucleotide diversity are important indicators to measure the degree of genetic variation of the population. The higher the haplotype diversity and nucleotide diversity, the higher the degree of genetic variation of the population. The more the genetic diversity, the more likely it is to adapt to different environments.
The haplotype diversity of the five populations was higher than that of Jiulong yaks, Maiwa yaks, Zhongdian yaks, Tianzhu white yaks, and Huanhu yaks studied by Zhengchao Tu by sequencing the mitochondrial genomes of 111 yaks from 5 populations in Qinghai and Xinjiang [32]. This was consistent with the result that the diversity level of the yak population was the highest in the QTP [14]. The haplotype diversity (0.905 ± 0.048) of Huanhu yaks in this study was close to that of Huanhu yaks (0.9000) studied by Zhengchao Tu, which verified the reliability of the analysis.
Phylogenetic relationship analyses conducted in this study revealed that yak populations were separable into three distinct branches. Relative to the findings of Lai [16], Guo [33], and Ho [34], the present analysis identified a new phylogenetic branch with both wild and domestic yaks. 9 However, Wang et al. [35] separated wild yaks into three branches, whereas domestic yaks into two branches. Only a single wild yak was represented in branch III in the present study. In their COIII study of 111 Tibetan yaks, Zhao et al. [36] separated these animals into three branches, consistent with the results of the present study. Previous studies showed highly differentiated intraspecific genetic branches either from several independent domestication events or from a wild species that diverged in an early stage [37,38]. Together, the findings of this study suggested that yaks might have three matrilineal origins. All three branches were found in both yaks and wild yaks, suggesting that highly differentiated genetic branches had been developed in the early wild yaks. Differences in grouping among studies might be attributable to limited sample sizes in certain analyses, resulting in the overlooking of yaks in the smaller third phylogenetic branch.
The 155 haplotypes were divided into 6 haplotype groups by haplotype clustering, which was consistent with the results of Zhaofeng Wang [35] using mitochondrial protein coding sequence clustering. Zhengchao Tu [32] used the enzyme digestion method to divide yaks into five haplotype groups, revealing that the results of different methods might be inconsistent. The haplotype group had nothing to do with the morphological group of yaks, and yaks from the same population or the same ecological group were distributed in different haplotype groups.
Accordingly, each haplotype group contained individuals from different populations or different ecological groups. These results were consistent with the findings of Guo [33] and Wang [35].
The mtDNA of yaks in Qinghai was distributed across haplogroups A-F, but such distribution was not observed for yaks in other provinces. Yaks in Qinghai were also close to the wild yak distribution range, suggesting that Qinghai was likely the site of initial yak domestication [35]. All yaks were divided into six haplotype groups; four of these haplotype groups (A, B, C, and D) showed star-shaped distribution of haplotypes, which was typical of domestic species and consistent with population expansion [39]. The central haplotypes of A, B, C, and D were widely distributed and had a high frequency. Overall, these findings suggested that haplogroup A was associated with the primary yak domestication event. However, the results of this study might be susceptible to haplogroup bias owing to the limited sample size. Additional samples from outside of Qinghai province should be collected to clarify these results in the future.

Conclusions
In summary, the results revealed that the genetic diversity of yaks in Qinghai was high. Both domestic and wild yaks exhibited three different maternal origins, and three different matrilineal origins all existed in ancient wild yaks before domestication, with haplogroup A being the primary haplogroup associated with yak domestication.

Animals and sample collection
The yaks in Qinghai province, the central producing area of yaks, were selected as the research object, and Pamir yaks in Xinjiang were selected as the control group.

Extraction, amplification, and sequencing of the mitochondrial genome
In the experiment, primers (Table 1, P1-P6) designed by Zhaofeng Wang [35] were used to amplify the entire yak mitochondrial genome, and the primers were sent to Xi'an Qingke Biotechnology Co., Ltd. (Xi'an, China) for synthesis. An EasyPure Blood Genomic DNA Kit (Quanshijin Biotechnology Co., Ltd., Beijing, China) was used to extract DNA from yak blood samples. The extracted DNA was diluted to 50 ng/μL and stored at −80°C until further use. The 11 DNA concentration and OD260/280 ratio of the samples were determined using a NanoDrop 2000 spectro-photometer (ThermoFisher Scientific, MA, USA). The DNA concentrations and OD260/280 ratios of the samples were 50-1000 ng/μL and 1.6-1.8, respectively. The DNA extraction metrics and tissue-specific metadata are shown in Supplementary Table S1. Ribonucleic acid quality was assessed by evaluating the band on a 1% agarose electrophoretic gel.
Individual polymerase chain reaction (PCR) reactions were conducted using a PCR reaction system, with each reaction containing 2 μL of each primer, 25 [43] was used to calculate the frontal genetic distance between five yak populations, and an NJ tree was constructed. Next, 1000 bootstrap replicates were used to assess the reliability of the topology of this tree. A Bayesian phylogenetic tree was constructed using MRBAYES 3.1.2 [44] based on the GTR +G + I Model.  Sciences, Chinese Academy of Agricultural Sciences.

Consent for publication
Not applicable.

Availability of data and material
The datasets generated and analysed during the current study are available in the GenBank repository, accession number: MW414100-MW414210 (https://www.ncbi.nlm.nih.gov/genbank/).
The data supporting the conclusions of this study are available within the supplementary table.