Investigation of the origin and genetic diversity of wild boars in Northeast Asia via mitochondrial D-loop markers

In this study, we determined the mtDNA D-loop sequences of 23 wild boars from Northeast Asia and analysed the sequences of 480 Eurasian wild boars available from NCBI. The Eurasian wild boar sequence pool included a total of 21 D-loop haplotypes, among which 27 SNPs were distributed. The phylogenetic tree showed that Eurasian wild boars could be divided into two major groups: Northeast Asian wild boars were mainly concentrated in group A, dominated by East Asian wild boars, while group B consisted of West Asian and European wild boars. Haplotypes 2, 11, 13, and 14 of group A in Southeast China shared position 120C with the outgroup, indicating that they might represent the ancestral group of wild boars. The observed geographical distribution and haplotype heat maps further showed that all the Eurasian wild boars were separately distributed and differentiated from native-born boars in Southeast China. Similarly, Northeast Asian wild boars belonged to one of the branches. The results indicate the possibility that Heilongjiang may be the main region of immigration or site of origin for the local wild boars in Northeast Asia.


Background
The wild boar is distributed mainly on the Eurasian continent and to a lesser extent in Northwest Africa [1]. It is generally accepted that domestic pigs are the evolutionary descendants of wild boars, which were domesticated by humans 9000 years ago [2][3][4][5]. The origin of wild boars has been a topic of interest in long-term studies of genetic and morphological variations in Eurasian pig breeds. In the past ten years, investigations based on a large amount of mitochondrial molecular data from domestic pigs and some wild boars have indicated that the oldest wild boar populations in Europe and Asia originated in Southeast Asia and then evolved into native breeds [6,7].
China harbours the largest number of autochthonous pig breeds in the world, and pork is the primary meat resource in China [8]. Currently, China accounts for 56.6% of global pig breeding and 49.6% of the world's pork consumption, holding the top ranking worldwide in both cases [9]. According to Chinese historical records, the Chinese economic and cultural centre was located south of the Yangtze River, where the domestication of wild boars began. The Chinese domestic pig has been identified as a descendant of the group of Southwest Asian wild boars [7,10]. Wild boars are widely distributed in China and are found in nearly every province. There are approximately seven geographical subspecies of wild boar in China, among which the Northeast subspecies accounts for the largest proportion (Fig. 1a).
However, the population characteristics of wild boars in Northeast China have not been described in detail; thus, whether they migrated from other regions or originated locally is still unclear.
Although whole-genome-based high-throughput sequencing has become a popular data analysis tool, evolutionary differences between individuals of the same species can still be observed by mitochondrial DNA (mtDNA) sequencing. Because of its faster base substitution rate than the nuclear genome, mtDNA provides fast, economical, effective candidate markers suitable for phylogenetic analysis [11,12]. One of the components of the mtDNA sequence that serves as a regulatory promoter element, the displacement loop (D-loop), contains a hypervariable region at the 5' end; therefore, it has frequently been used as a target for investigating matrilineal inheritance and interspecific and intraspecific genetic relationships and for tracing the origin of vertebrates [13][14][15][16][17].
Our study focused on the genetic relationships among wild boars in Northeast Asia and the groups living in other areas (Europe and Asia), with the purpose of understanding the evolutionary status of this group in the Eurasian wild boar population.

Phylogeny of Northeast Asian wild boars
We sequenced the hypervariable region (161 bp-564 bp, Heilongjiang and Inner Mongolia (indicated with a yellow line). In the tree, we can observe that the genotype from Heilongjiang Province is the main haplotype throughout Northeast Asia, suggesting that Heilongjiang may be the centre from which the diffusion migration of Northeast Asian wild boars began.
The results indicate the possibility that Heilongjiang may be the main region of immigration or site of origin for the local wild boars in Northeast Asia.

Haplotype of the mtDNA D-loop and nucleotide variation in wild boars
To explore the origin of Northeast Asian wild boars, we further investigated the data for wild boars from Europe, West Asia, Southeast Asia and areas of China other than the northeast. A Eurasian wild boar dataset consisting of 480 samples was established. In total, 21 haplotypes defined by 27 SNP sites were obtained, including 8 singleton variable sites and 19 parsimony-informative sites. The SNP sites in the sequence are presented in Figure 3, and the base substitutions are listed in Table 1.
To clearly describe the origin and differentiation of wild boars in Northeast Asia, we first built a median-joining network (Fig. 4a) with Network 5.0 to obtain the ratios of different regions among the 21 wild boar haplotypes and two Suina outgroup members, peccary (Tayassuidae) and warthog (Suidae), and integrated the pie chart information for each haplotype into the neighbour-joining tree constructed with MEGA X. Additionally, we calculated the ratios of the seven geographic areas in which every haplotype was distributed (Fig. 4c). According to the phylogenetic relationships, we found that all the wild boar haplotypes could be classified into group A and group B (Fig. 4b). We further analysed the haplotype ratios of the wild boars of every group from every area of the Old World (Fig. 4c). We observed that Hap4 occurred only once in the European group, while Hap4 was one of six haplotypes that occurred simultaneously in the West Asian group. Hap4 and Hap15 accounted for the largest proportions of the haplotypes in West Asia. The East Asia group exhibited the maximum number of haplotype varieties. The primary haplotype among the six haplotypes in the Northeast Asia group was Hap5, and the two secondary haplotypes were Hap3 and Hap10.
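The split of the 27 variable sites into singleton and parsimony-informative sites follows a simple counting rule, which the sketch below illustrates. It is a minimal illustration and not the DnaSP implementation used in the study: the function name `classify_sites` and the toy alignment are assumptions made for this example, and columns containing gaps or ambiguous bases are simply skipped.

```python
from collections import Counter


def classify_sites(seqs):
    """Classify alignment columns as singleton or parsimony-informative.

    seqs: list of equal-length aligned sequences (strings over A/C/G/T).
    Columns with gaps or ambiguous bases are skipped in this sketch.
    Returns two lists of 0-based column indices.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    singleton, informative = [], []
    for pos in range(len(seqs[0])):
        column = [s[pos].upper() for s in seqs]
        if any(base not in "ACGT" for base in column):
            continue
        counts = Counter(column)
        if len(counts) < 2:
            continue  # invariant site
        # Parsimony-informative: at least two bases that each occur twice or more.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(pos)
        else:
            singleton.append(pos)
    return singleton, informative


# Toy alignment (hypothetical, not data from this study).
toy = ["ACGTA", "ACGTA", "ATGCA", "ATGTG"]
singleton_sites, informative_sites = classify_sites(toy)
print(singleton_sites, informative_sites)  # [3, 4] [1]
```

In the toy alignment, column 1 carries two bases that each appear twice and is therefore parsimony-informative, whereas columns 3 and 4 each carry a variant seen in a single sequence. The total number of variable sites is the sum of the two classes, as with 8 + 19 = 27 in the dataset above.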
Consistent with the above results, the haplotype heatmap showed that more haplotypes were present in East Asia than in other areas, and the wild boar population in Northeast Asia showed a haplotype composition relatively similar to that of the Southeast Asia group and different from the haplotype clusters in Southern Asia, West Asia and Europe (Fig. 4d).
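The regional haplotype ratios that underlie the pie charts and the heatmap amount to a per-region frequency table. The following is a minimal sketch under stated assumptions: the helper `haplotype_proportions` is hypothetical, and the sample counts are invented purely to show the calculation. They are not the counts observed in this study.

```python
from collections import Counter, defaultdict


def haplotype_proportions(samples):
    """Compute the proportion of each haplotype within each region.

    samples: iterable of (region, haplotype) pairs, one pair per individual.
    Returns {region: {haplotype: proportion}}.
    """
    counts = defaultdict(Counter)
    for region, hap in samples:
        counts[region][hap] += 1
    return {
        region: {hap: n / sum(c.values()) for hap, n in c.items()}
        for region, c in counts.items()
    }


# Invented example records (region and haplotype labels reuse names from the text,
# but the counts are hypothetical).
records = [
    ("Northeast Asia", "Hap5"),
    ("Northeast Asia", "Hap5"),
    ("Northeast Asia", "Hap3"),
    ("Northeast Asia", "Hap10"),
    ("West Asia", "Hap4"),
    ("West Asia", "Hap15"),
]
proportions = haplotype_proportions(records)
print(proportions["Northeast Asia"]["Hap5"])  # 0.5
```

Each inner dictionary sums to 1, so rows of this table can be compared directly across regions of different sample size.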

Discussion
Compared with the abundant research on domestic pigs, the available studies on wild boars are clearly insufficient. Because the wild boar is not a wild animal of national priority for protection due to its impact on crops and livestock [18], some groups of wild boars are facing extinction without the implementation of necessary conservation measures. In this paper, we determined mtDNA D-loop sequence variation among 23 wild boars from Northeast China and integrated 445 additional sequences from other regions in Eurasia available in the NCBI database, with the purpose of understanding the origin and genetic diversity of wild boars in Northeast China. Although our sampling was not sufficient to cover the entire worldwide species distribution, the investigation results may help to trace the maternal origin of wild boars in various habitats.
There is a clear hypothesis that domestic pigs evolved from wild boars [3-5, 7, 19]. Based on this notion, it is essential to understand how the wild boar originated. It is widely accepted that the domestication of pigs occurred separately in Asia and Europe [20,21]. Spontaneous or human-mediated introgression from Asia to Europe around the mid-18th century has also been reported [5].
Similarly, the domestication of the Chinese breed occurred separately in southern and northern regions divided by the Yangtze River [22,23]. However, there is an obvious dearth of surveys on feral pig origin and evolution patterns. A previous authoritative study suggested that pioneer wild boars originated from Southeast Asia and then extended to Eurasia [24,25]. Furthermore, Northeast China may be the centre from which domestic pigs spread to other areas in Asia, such as Korea [26].
After collating the wild boar sequences in the NCBI database, we considered the 21 haplotypes of the wild boar mitochondrial D-loop to be representative. Additionally, we found that wild boar diversity in East Asia was markedly greater than that in West Asia and Europe and that Southeast Asia harboured more variety than Northeast Asia. According to the sequence alignment shown in Fig. 5, we inferred that Hap2, 8, 11, 13 and 14 came from the oldest wild boar population because they shared the 120T base with the peccary and warthog sequences; therefore, 120T is predicted to be one of the initial alleles of wild boar. Furthermore, these five haplotypes were only present in China, so we speculated that China, especially Southeast China, may be the region of origin for wild boars. We also found that the haplotypes of the subgroup in group B were the ultimate results of evolution, as they possessed the 293T base. However, peccary, warthog and wild boar, in addition to group B, which included 293C, were mainly distributed in Southeast, West and Southwest Asia, indicating that diverse differentiation has occurred in Southeast China. However, in Europe and West Asia, Sus scrofa was a transition group in the multifarious evolutionary event, and it is very likely that this group migrated to West Asia and even Europe and contributed to the domestication of European pigs. The haplotype analysis of the Northeast Asian wild boar group showed that the formation of this group likely resulted from northward migration of the southern group. This hypothesis was supported by the results showing that the haplotypes of Europe and most of those of West Asia did not appear in the Northeast China group and that, except for Hap17, all the haplotypes in Northeast Asia could be identified in the group from Southeast Asia.
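The reasoning above, that a base shared with both outgroups is treated as a candidate ancestral allele, can be expressed as a small lookup. This is a minimal sketch assuming a simple dictionary layout: the function `shares_outgroup_base` and the haplotype names `HapX`, `HapY` and `HapZ` are hypothetical placeholders, and the bases are invented for illustration rather than taken from Fig. 5.

```python
def shares_outgroup_base(haplotypes, outgroups, position):
    """Return haplotypes whose base at `position` matches all outgroups.

    haplotypes, outgroups: dicts mapping a name to {site_number: base}.
    If the outgroups disagree at the site, no ancestral state is assumed
    and an empty list is returned.
    """
    outgroup_bases = {sites[position] for sites in outgroups.values()}
    if len(outgroup_bases) != 1:
        return []
    ancestral = next(iter(outgroup_bases))
    return sorted(
        name for name, sites in haplotypes.items() if sites[position] == ancestral
    )


# Invented placeholder data (not the alignment from this study).
outgroups = {"peccary": {120: "T"}, "warthog": {120: "T"}}
haplotypes = {"HapX": {120: "T"}, "HapY": {120: "C"}, "HapZ": {120: "T"}}
candidates = shares_outgroup_base(haplotypes, outgroups, 120)
print(candidates)  # ['HapX', 'HapZ']
```

A shared base at a single site is only suggestive of an ancestral state, since the same base can also arise by back mutation, which is why the text phrases the inference as a prediction.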

Conclusions
On the basis of our research, we consider the southeastern zone of China to be the region of origin for wild boars in mainland China, which is consistent with previous research results [6,7]. Subsequently, during persistent species differentiation that generated diversity, some individuals or communities dispersed to West Asia and Europe, and the settlement of the boars in Northeast Asia was due to the northward spread of the Southeast Asian group.

Samples and DNA extraction
Ear tissues of 23 wild boars from Heilongjiang Province (China) were obtained, and genomic DNA was extracted using a Genomic DNA Kit (TIANGEN BIOTECH, Beijing, China). The concentration of the DNA samples was measured using a NanoDrop 1000 spectrophotometer (Thermo Fisher, Massachusetts, USA). The samples were stored at -20°C.

D-loop sequence amplification and genotyping
Each 50 μl polymerase chain reaction (PCR) contained 25 μl of Taq DNA polymerase master mix (2×Taq Plus PCR MasterMix, TIANGEN), 1 ng of DNA template and primers (forward primer: 5'-CCAAAACAAGCATTCCATTCG-3'; reverse primer: 5'-CGTAACCATTGACTGAATAGCACC-3'). The reaction conditions were as follows: 94°C for 5 min, followed by 35 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 50 s, and a final extension at 72°C for 50 min. Sequencing of the region of interest in the amplification product was entrusted to Sangon Biotech Co., Ltd.

Software and Data Analysis
All mtDNA D-loop sequences obtained in this article were aligned by using MEGA X software (version 10.0.4) [27][28][29][30] and Geneious free trial software [31,32]. Nucleotide variation sites were analysed with DNASTAR (version 7.10) software [33,34]. DNA Dragon was used to screen the SNPs that appeared only once. Statistical analysis was conducted by using DnaSP (version 6.12) [35][36][37]. Reduced median networks were generated using the NETWORK 5.0 program (https://www.fluxus-engineering.com/). The available sequence data were cited from NCBI PopSet data sets (https://www.ncbi.nlm.nih.gov/popset).

Availability of data and materials
The raw data obtained in this study are available in the supplemental files.

a. Schematic diagram of the wild boar mitochondrial genome and the hypervariable region within the D-loop amplified in this research. b. Phylogenetic tree of Northeast Asian wild boars. The evolutionary history was inferred using the neighbour-joining method [18][19][20]. The bootstrap consensus tree inferred from 1000 replicates is used to represent the evolutionary history of the analysed taxa [21,22]. Branches corresponding to partitions reproduced in less than 50% bootstrap replicates are collapsed. The evolutionary distances were computed using the number of nucleotide differences method [23] and are presented in units of the number of base differences per sequence. This analysis involved 56 nucleotide sequences. All positions with less than 95% site coverage were eliminated (i.e., fewer than 5% alignment gaps, missing data, or ambiguous bases were allowed at any position; partial deletion option). There were a total of 511 positions in the final dataset. Evolutionary analyses were conducted in MEGA X [24].
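The distance calculation described in the caption above (number of nucleotide differences, with partial deletion at a 95% site coverage cut-off) can be sketched as follows. This is an illustrative approximation and not MEGA X's own code: the function names and the toy alignment are assumptions, and the toy example uses a 75% cut-off only because it contains just four sequences.

```python
def filter_sites(seqs, min_coverage=0.95):
    """Return column indices where at least `min_coverage` of the
    sequences carry an unambiguous base (partial deletion)."""
    n = len(seqs)
    keep = []
    for pos in range(len(seqs[0])):
        covered = sum(1 for s in seqs if s[pos].upper() in "ACGT")
        if covered / n >= min_coverage:
            keep.append(pos)
    return keep


def pairwise_differences(a, b, sites):
    """Count retained sites where both sequences have an unambiguous
    base and the two bases differ."""
    diff = 0
    for pos in sites:
        x, y = a[pos].upper(), b[pos].upper()
        if x in "ACGT" and y in "ACGT" and x != y:
            diff += 1
    return diff


def distance_matrix(seqs, min_coverage=0.95):
    """Build the symmetric matrix of pairwise difference counts."""
    sites = filter_sites(seqs, min_coverage)
    return [[pairwise_differences(a, b, sites) for b in seqs] for a in seqs]


# Toy alignment (hypothetical); '-' marks an alignment gap.
toy = ["ACGT-A", "ACGTTA", "ATGT-G", "ATG-TG"]
kept = filter_sites(toy, min_coverage=0.75)
matrix = distance_matrix(toy, min_coverage=0.75)
print(kept)       # [0, 1, 2, 3, 5]
print(matrix[0])  # [0, 0, 2, 2]
```

Column 4 is dropped because only half of the toy sequences have a base there, while column 3 is kept and its single gap is ignored in the pairwise comparisons that involve it. A matrix of this form is the input that a neighbour-joining routine would cluster.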

Figure 3
The SNP sites of the 21 haplotypes screened from 480 wild boar samples and typical peccary and warthog transcripts.