Genomic Analyses Reveal Distinct Genetic Architectures and Selective Pressures in Chinese Donkeys

Jiafei Shen Northwest A&F University: Northwest Agriculture and Forestry University Jie Yu Northwest A&F University: Northwest Agriculture and Forestry University Xuelei Dai Northwest A&F University: Northwest Agriculture and Forestry University Gang Wang Northwest A&F University: Northwest Agriculture and Forestry University Ningbo Chen Northwest A&F University: Northwest Agriculture and Forestry University Chuzhao Lei Northwest A&F University: Northwest Agriculture and Forestry University RUIHUA DANG (  drh211@hotmail.com ) Northwest A&F University: Northwest Agriculture and Forestry University https://orcid.org/0000-00024818-5871


Abstract
Background Donkey (Equus asinus) is an important livestock animal in China because of its nourishment and medical value. After a long period of natural and arti cial selection, the variety diversity and phenotype of donkey are very rich. However, due to the lack of genome-wide studies, its genomic value remains unclear.

Results
We clari ed the genetic and demographic characteristics of Chinese donkeys and the selective pressures on them by analyzing 78 whole genomes from 12 breeds. We found that Gunsha donkey had the lowest nucleotide diversity, longest length, and largest number of runs of homozygosity. Phylogenetic analysis revealed obvious geographical distribution trends in Chinese donkeys. In selective sweep, gene annotation, functional enrichment, and differential expression analyses between large and small body donkey groups, we identi ed selective signals, including NCAPG and LCORL, to be related to rapid growth and large body size.

Conclusion
Our ndings improve the understanding of the evolutionary history and formation of different donkey breeds and provide theoretical insights into the genetic mechanism underlying breed characteristics and molecular breeding programs of donkey clades.

Background
The distribution of domestic donkey, an important farm and transportation animal in China, is concentrated between 32° and 42° north latitude. Domestic donkeys live in the northwest, north, and southwest regions of China in temperate and warm climates (1). After a long period of natural and arti cial selection, domestic donkeys developed to have different body shapes and production performance. They are particularly suited for transport in mountainous and arid regions. With advancements in agricultural mechanization and rapid development of the transportation industry, domestic donkeys have gradually bene eliminated as a means of transportation, and the number of donkey herds has decreased sharply; however, the meat and medicinal values of domestic donkeys have increased (2).
The history and origins of the donkey are particularly interesting. Because the earliest archeological evidence (fossil evidence) of domestic donkeys was found in Egypt, it was originally thought that domestic donkeys were domesticated from Nubian wild donkeys around the Nile in ancient Egypt (3)(4)(5)(6).
However, ancient donkey bone samples were found in Syria, Iran, and Iraq in the 1980s because of technical limitations at the time. Some archeological researchers even believed that the Asian wild donkeys were the direct ancestors of domestic donkeys (7)(8)(9)(10)(11). Donkeys can be either domestic or wild.
Ten articulated donkey skeletons were discovered in the three brick tombs, causing scholars to believe that domestic donkeys were domesticated from wild African donkeys (12). Archeological, historical, and ethnographic data indicated that there were at least three different groups of wild asses in Africa 2000 years ago, with only two surviving until modern times (12). Somali wild ass is critically endangered but is still found in Somalia, Ethiopia, and Eritrea. Nubian wild asses have rarely been seen in recent years, creating concerns that these populations have become extinct. Previous studies of mitochondrial DNA levels have shown that modern donkeys can be clearly separated into two distinct clades: clade I was clustered with the Nubian wild ass (Equus africanus africanus) and clade II was closer to the extinct Somali wild ass (E. africanus somaliensis) (13,14).
Whole-genome sequencing is a powerful method for evaluating population structures and demographic patterns as well as for identifying regions associated with important economic and environmental adaptive traits. However, because of the lack of a complete chromosome-level reference genome, most recent studies of donkeys have been based on mitochondrial levels. Considering that the entire genome variation of donkey is largely unexplored, we sequenced the whole genome of 78 donkeys from 12 breeds of different geographic origins to examine the genomic diversity, population structure, and demographic history of this important livestock species and to reveal possible signs of natural and arti cial selection.

Ethics statement
This study was conducted to the guidelines of the Council of China and animal welfare requirements.

Sample collection and sequencing
We evaluated 78 donkeys from different locations (Table S1). Genomic DNA was extracted from the ear tissue or blood samples using the standard phenol-chloroform protocol (15), ampli ed in genomic libraries with an average insert size of 500 base pairs, and sequenced (150-base pair paired-end reads) on an Illumina HiSeq 2000. Additional details are provided in Table S2 Pseudo chromosomal level reference genome assembly We downloaded the latest donkey reference genome (PRJNA259598) from NCBI. This genome was published in 2015 and is currently at the scaffold level (16). From an evolutionary perspective, horses are relatives of donkeys and high-quality reference genomes of horses have been determined at the chromosomal level. Therefore, we used the reference genome of a related horse species (PRJNA421018) to assemble the donkey scaffold at the chromosome level (17). First, we used minimap2 software (18) (parameter: secondary = no -cx asm10) to compare the donkey's reference genome to the horse's reference genome to obtain a paf le. Next, using a Python script (Connect_Pseudo_Chromosome.py, Note S1), according to the comparison information in the paf le, the donkey's Scaffold was assembled to at the pseudo chromosome level, and the connection information of each scaffold was recorded. To obtain annotation information for the donkey's pseudo genome, we used a Python script (Pseudo_Chromosome_Annotation.py, Note S2) to convert the annotation information at the donkey scaffold level to the pseudogene which was assembled based on the donkey's scaffold connection information.

Alignments and variant identi cation
All cleaned reads were aligned to the reference genome (PRJNA421018) linked to "30 + X + Y + unplaced" pseudo-chromosomes (Table S3) using BWA-MEM (19) with default settings. Duplicate reads were ltered using Picard tools. SNPs were detected with the Genome Analysis Toolkit (GATK, version 3.6-0-g89b7209) (20) and ltered using the "VariantFiltration" tool, as described in the Supplementary Information.

Phylogenetic and population structure analyses
The NJ tree, PCA, and ADMIXTURE methods were used to explore the genetic relationships among key populations. An individual-based NJ tree based on the matrix of pairwise genetic distances from the autosomal SNP data of 60 donkeys was constructed with PLINK (version 1.9) and visualized in FigTree.
The TreeMix program was used to evaluate the population-level phylogeny (21). PCA was performed using the SmartPCA program in the EIGENSOFT v5.0 package (22) and the signi cance of the eigenvectors was detected by the Tracy-Widom test. The population genetic structure was estimated using ADMIXTURE v. 1.3.0 (23) considering 2-10 clusters (K).

Genome-wide selective sweep test
To detect selective sweeps in different body size groups, we compared two types of donkeys: (i) large body size group as the reference and small body size group as the object population; (ii) small body size group as the reference and large body size group as the object population. Two methods were used: (i) xation index (F ST ) values (24) were calculated in sliding 50-kb windows with 20-kb steps along the autosomes using VCFtools (25); (ii) high differences in genetic diversity (π ln ratio) were calculated with 50-kb sliding-windows and 20-kb steps along the autosomes using VCFtools and in-house scripts. Signi cant genomic regions were identi ed as having a P-value < 0.005. The two methods showed outlier signals (P-value < 0.005) in overlapping regions and were therefore considered as candidate selective regions. KOBAS 3.0 (http://kobas.cbi.pku.edu.cn/anno_iden.php) was used to examine their biological functions and pathways involved in these regions.

Results
Analyses of 78 China donkey genomes revealed > 24.4 million single-nucleotide polymorphisms (SNPs)  (Fig. 1A). Analysis of the genomic characteristics showed that nucleotide diversity was highest in Huaibeihui donkey and lowest was in Gunsha donkey, while diversity was similar in other donkey breeds (Fig. 1C). Runs of homozygosity (ROH) analysis performed using the sum of the ROH (S ROH ) and number of ROH (N ROH ) showed signi cant differences in 12 donkey breeds.

Population genetic structure and relationship
Neighbor-joining (NJ) trees, principal component analysis (PCA), and ADMIXTURE were used to explore the genetic relationships among the examined donkey populations. In the NJ tree, the donkey breeds were separated into three main clades: East China Plain donkey, loess plateau donkey, and plateau area donkey ( Fig. 2A). PCA showed that the rst PC explained 2.66% of the total variation and separated Gunsha and Jiami donkeys from other donkey breeds. The second PC, explaining 2.59% of the total variation, could not separate any breed (Fig. 2B). Admixture analysis did not distinguish between each donkey breed and different ancestral types. The results showed that most individuals were of mixed ancestry ( Figure S1).

Two clearly separated Chinese donkey mitochondria clades
Previous studies divided the phylogenetic relationship of donkey mitochondria in two lineages. Thus, we constructed a highly supported phylogenetic tree based on the 78 mtDNA genome sequences (coverage > 100×) from this study and 8 mtDNA genome sequences downloaded from NCBI (including one horse, four E. hemionus, one E. kiang, and two E. asinuss somali).
As outgroup, horses were independent of domestic and wild donkeys. Similarly, Asian wild asses (E. hemionus and E. kiang) formed a branch between the horse and domestic donkey. Two highly divergent phylogenetic clades (Clade I and Clade II) were identi ed in domestic donkeys, with two Somali wild asses clustering within a group close to Clade II. In addition to mtDNA genome phylogeny analysis, we constructed a reduced median network based on 78 domestic genome sequences to assess whether their evaluation patterns were similar between the two lineages. Seventy-eight samples were de ned as having 12 haplotypes based on their variety information. Similar to mitochondrial cluster analysis, the haplotypes can be divided into two branches; Clade I showed a simple star-like shape, whereas the genetic architecture of Clade II haplotypes was more complex, with more universally occurring haplotypes.

Genome-wide differential selection in small and large body size
The phenotypes of Chinese domesticate donkey we found to be very rich and were divided into three groups according to body size (large donkeys, medium donkeys, and small donkeys). We applied the Fst and π-ratio to evaluate genomic regions related to the selection of the large and small body size groups. The Manhattan plots showed the distribution trend of the selected signals among these different groups (Fig. 3A). The 1% top regions from different methods were considered as candidate selection regions, and 332 and 589 genes were obtained after annotation, respectively. Only 85 overlapping genes (Fig. 3D) identi ed by the two selected statistical methods were de ned as candidate genes under positive selection (Table S4). In both methods, the most signi cant signals were observed on chromosome 3, which mainly contains two genes, non-SMC condensin I complex subunit G (NCAPG) and liganddependent nuclear receptor corepressor-like (LCORL). Previous studies reported that LCORL and NCAPG are closely related to body size traits in other species (26)(27)(28)(29). We also found a well-known gene, FAM184B, which is related to the body size traits on the 3 chromosomes from the π-ratio (30,31). In addition, 11 and 5 nonsynonymous sites were found on LCORL and NCAPG, respectively, through mutation site analysis. According to genotype analysis of 16 nonsynonymous loci, the nonsynonymous genotypes signi cantly differed between large and small donkeys (Fig. 3B). LCORL and NCAPG were observed as neighboring genes at the chromosome level, occupied near 0.2-Mb regions, and contained 586 SNPs overall. We calculated the linkage disequilibrium values of these SNPs to determine whether the SNPs were linked, which revealed that SNPs in this region have strong linkage (Fig. 3C).
In addition, we calculated the SweepD values of 9 representative donkey breeds to choose the signi cant selection signals of each breed. Experience P-values of SweepD value less than 0.01 were considered as positive selection regions for each breed, and the distribution of SNPs in each breed was displayed in Manhattan plots ( Figure S2). By annotating the candidate regions of each variety, the counts of candidate genes for each breed were displayed on a histogram, as shown in Fig. 3E Intersections of candidate genes in each breed were considered, and the NCAPG and PTPRN2, which are related to body size traits, were detected in large donkeys (Dezhou, Guanzhong, and Guoluo donkey). Finally, we performed Kyoto Encyclopedia of Genes and Genomes and Gene Ontology enrichment analysis of the 85 candidate genes to determine the known functional information for each gene ( Figure S3).

Discussion
In this study, we analyzed the whole genome sequence of 78 donkeys from 12 breeds. According to their body sizes, 12 donkey breeds were divided into groups of small donkeys (Xinjiang, Tibetan, Qinghai, Haibeihui, Taihang, and Gunsha donkeys), medium donkeys (Jiami and Qingyang donkeys), and large donkeys (Biyang, Dezhou, Guanzhong, and Guoluo donkeys). The genetic diversity of these animals may re ect various evolutionary events, including bottlenecks during domestication, introgression, recent cross-breeding, and arti cial selection. Gunsha donkey showed reduced levels of nucleotide diversity compared to the other breeds, which is likely the result of intensive arti cial selection in China. The nucleotide diversity of the other 11 donkey breeds was similar, which may be explained by unknown historical demography such as population expansion or introgression and a degree of inbreeding or a smaller effective population size. The length and number of homozygosity (ROH) among different donkey breeds were analyzed. ROH was rst described by Gibson (32) and de ned as contiguous homozygous genotype segments in the genome that are present in an individual because of the transmission of identical haplotypes from parents to their offspring (33)(34)(35). Inbreeding can increase the homozygosity of a population; as the level of inbreeding increases, the possibility for becoming homozygous for harmful recessive genes also increases, which may lead to decreases in reproduction, viability, and phenotypes of the offspring (36). The segment and number of ROH in Gunsha donkey were longer and larger, respectively, than those in other donkey breeds, possibly because of strong arti cial selection and recent excessive inbreeding of Gunsha donkeys. Qinghai donkey had the shortest and smallest number of ROHs among the 12 donkey breeds. This may be because of the following reasons: 1. Qinghai donkey inbreeding may have been low in the large area of Qinghai and 2. Qinghai donkey was not the only or most important economic and transportation animal in the local area, and thus the intensity of natural and arti cial selection may be relatively low. A lower inbreeding coe cient and lower selection pressure may be underlying factors causing Qinghai donkey to have the smallest number of and shortest ROH.
Population structure and phylogenetic analyses can improve the understanding of the evolutionary process of certain species or breeds. As there is no complete chromosome-level reference genome available, no population genomic analyses have examined the genetic ancestry and population structure of Chinese donkey. In our study, we selected representative breeds of Chinese donkey to analyze their relationships. According to the NJ tree, Mongolian wild ass was the outer group, and Chinese donkey breeds were clustered into three branches according to their geographic region. These mainly included the East China Plain, loess plateau, and plateau area donkeys. The results of PCA showed that only the rst PC separated Gunsha and Jiami donkey from the other donkey breeds, whereas other PCs could not separate the breeds. Admixture analysis did not distinguish between each donkey breed and different ancestral types. The reason that clear results were not obtained by PCA and admixture analysis may be that most individuals were of mixed ancestry. With the rapid development of civilization, donkeys were rarely used as a means of transport; however, domestic donkeys have been widely used for their meat and medicinal value. This has led to the arti cial migration, hybridization, and breeding of Chinese donkeys. Such mixed results can be explained by mixed ancestry between donkey breeds. The degree of genetic diversity among Chinese donkey breeds may not be high, and the difference in genetic background is not obvious. Similar to the results of previous studies, we found that the Chinese donkey has two maternal sources and forms two branches, clade I and clade II. Among them, Clade II and Somali wild donkeys gather together, and this branch may have originated from Somali wild donkeys in Africa.
As the Asian wild ass and Chinese donkey were present on different branches, it does not appear that the domestic donkey originated from the Asian wild ass.
By performing whole-genome selection scans, we analyzed the genetic basis of the divergence of different body sizes of donkey groups. The top two divergent genes on chromosome 3 between the large and small donkey groups were both strongly related to cell mitosis and bone development, including NCAPG and LCORL, which may play an important role in body growth. NCAPG is a subunit of the condensin 1 complex which is involved in chromosome condensation and interacts with a DNA methyltransferase linking methylation and chromatin condensation (37). LCORL encodes a transcription factor that is thought to function during spermatogenesis (38). These neighboring genes exhibit strong linkage. In previous studies of the functions of these two genes, the regions containing the genes were often described together in many species. The regions of NCAPG and LCORL have been identi ed as loci for adult human height in European (39)(40)(41)(42)(43), Japanese (44) and African/African American populations (45,46). The region was also associated with the peak height velocity in infancy (47), birth weight (48) and birth length (49). In cattle, the two gene regions were associated with birth weight (50), weaning and yearling weight (51), increase in body frame size (52) and carcass weight (53). Further, this region was found to be associated with height in some horse breeds (26,29,54,55) and has been identi ed as highly differentiated between dog breeds (38) and a selective sweep region in European domestic pigs (56).

Conclusions
In our study, whether it is the selection signal among different body types of donkeys, or the selection signal in large-sized donkey breeds. NCAPG and LCORL were in signi cant selection regions. Thus, in the process of natural selection and arti cial domestication of donkeys in China, these two genes may play a vital biological role in the evolution of donkeys.  Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.