Whole-Genome Sequencing of Bamei Mutton Sheep for Screening the Genes and SNPs Associated with Litter Size under Selection

Background: Bamei mutton sheep is a Chinese domestic sheep breed developed by crossing German Mutton Merino sheep and indigenous Mongolian sheep for meat production. Here, we focused on increasing the litter size of this breed to improve the eciency of mutton production. Results: We selected ve high- and ve low-fecundity Bamei mutton sheep for whole-genome resequencing to identify candidate genes for sheep prolicacy. We used the FST and XP-EHH statistical approach to detect the selective sweeps between these two groups. Combining the two selective sweep methods, the reproduction-related genes JUN, ITPR3, PLCB2, HERC5, and KDM4B were detected. JUN, ITPR3, and PLCB2 play vital roles in GnRH, oxytocin, and estrogen signaling pathway. Moreover, KDM4B, which had the highest FST value, exhibits demethylase activity. It can affect reproduction by binding the promoters of estrogen-regulated genes, such as FOXA1 and ESR1. Notably, one nonsynonymous mutation (p.S936A) specic to the high-prolicacy group was identied at the TUDOR domain of KDM4B. Conclusion: These observations provide a new opportunity to research the genetic variation inuencing fecundity traits within a population evolving under articial selection. The identied genomic regions of KDM4B that are responsible for litter size can in turn be used for further selection.

As reported in Science, the rst high-quality 2.61 Gb reference genomes of domestic sheep were sequenced and assembled in 2014. They can help in identifying genomic signatures of domestic traits in sheep (12). By performing whole-genome resequencing on phenotypically divergent sheep populations, some selective sweeps were identi ed relating to important traits targeted by arti cial selection during domestication, such as horn (1,13), coat color (14), tail morphology and fat deposition (3,15,16) and variation of thoracic vertebrae (17).
Regarding reproductive traits, using selection tests in pigs, previous studies demonstrated some strong selective signals belonging to the TGF-β signaling pathway (18,19). In addition, in goats, the genes regulating seasonal reproduction and litter size have been speci cally selected (20,21). Moreover, some fecundity-related genes revealed a strong selection signature in sheep from Ethiopia and Europe (22,23).
Against this background, we performed whole-genome resequencing of 10 ewes selected to have extreme fecundity and sweeping analysis to identify the underlying variants and genes responsible for the litter size of Bamei mutton sheep under the in uence of arti cial selection.

Results
Sequencing and mapping of the ten sheep We sequenced ve monotocous and ve polytocous sheep to an average depth of 5.5 × using an Illumina HiSeq2500 sequencer, generating a total of 627. 16 Table S5).

Population structure analysis
The levels of genome-wide genetic diversity, Tajima's D, and the minor allele frequency (MAF) distribution indicated that high-frequency minor alleles constitute a small proportion of the total but are slightly more abundant in polytocous sheep than in monotocous ones (Supplemental Fig. S1). Additionally, haplotype analysis indicated that the polytocous sheep have slower decay of pairwise correlation coe cient (r 2 ) and higher integrated haplotype homozygosity (iHH) than the monotocous sheep (Supplemental Fig. S2).
To determine the phylogenetic relationship between monotocous and polytocous sheep, a neighbor-joining tree was constructed using high-quality SNPs. When the tree was generated using the SNPs for the whole genome, monotocous and polytocous sheep formed a mixed clade (Fig. 1A). This indicated that the pairwise distances within each group were larger than those between the groups and there was no signi cant distinction between the two groups. However, in the phylogenetic tree constructed using SNP data selected based on F ST score, the monotocous and polytocous sheep were classi ed into two genetically different groups (Fig. 1B). The results of the two trees show that the population genetic structure was not associated with the litter size and that the breeding time of highfecundity Bamei mutton sheep was relatively short.
Analysis of the selected loci and candidate genes To nd selection signals associated with high proli cacy, the average F ST values were calculated for nonoverlapping 50 kb windows on the autosomes and X chromosome. After Z-transforming the values, we selected the windows with Z(F ST ) > 5 across the genome, as in previous studies (24). In total, we identi ed 85 unique autosomal regions  Table S6).
We also estimated the XP-EHH statistic for the monotocous and polytocous groups using monotocous sheep as a control. An XP-EHH value greater than zero indicates that these sites have been selected for in the monotocous population, while a value less than zero indicates that selection has occurred in the polytocous population. We  Table   S7).
We combined the genes obtained by the two methods described above. A total of 221 genes were found to have been selected in total. Overall, 14 genes were positively selected across both of the two methods (Supplementary Table S8).
Gene Ontology (GO) and KEGG pathway analyses were performed to further study the functions of the selected genes identi ed by the two methods. The enriched GO terms and KEGG pathways are shown in Supplementary Tables S9 and S10. By the Functional Annotation Clustering tool of DAVID 6.8, we identi ed two annotation clusters (classi cation stringency: Medium). Annotation cluster 2 contained GnRH signaling pathway, estrogen signaling pathway, and oxytocin signaling pathway. The reproductive hormones in these pathways are involved in regulating sheep estrus, follicle development, and ovulation. JUN (JUN proto-oncogene, AP-1 transcription factor subunit), ITPR3 (inositol 1,4,5-trisphosphate receptor type 3), and PLCB2 (phospholipase C beta 2) were enriched in all three pathways (Table 1). Mutations in the KDM4B gene KDM4B is located in the highly differentiated region with the highest F ST value between the monotocous and polytocous groups. A selected mutation that goes to xation tends to reduce variation in linked sites in the process of a selective sweep (25). Therefore, we determined the Hp value of the window (chr 5: 16750000-16800000 bp) around KDM4B. The Hp value of the polytocous group (Hp = 0.0818) decreased in the KDM4B region and was the lower than in the monotocous group (Hp = 0.22923) (Fig. 3A).
To identify SNVs subjected to selection, we screened the exonic mutations of the KDM4B gene in both monotocous and polytocous groups. SNVs that can alter protein translation, structure, and even function may contribute to rapid evolution in domestic animals (26). Here, 13 synonymous SNVs, 3 nonsynonymous SNVs, and 1 frameshift deletion were identi ed. All of the three nonsynonymous SNVs cause amino acid sequence changes p.  Table S11).
We compared the F ST values of the three nonsynonymous SNVs; only mutation p.S936A had high divergence of allele frequency within our population: monotocous sheep 50% (n = 5) and polytocous sheep 90% (n = 5). To con rm these frequencies based on Illumina sequencing, alleles of additional samples were genotyped by Sanger sequencing. The results showed a similar trend: monotocous sheep 40% (n = 15) and polytocous sheep 60% (n = 14) ( Fig. 3B). This indicates that the nonsynonymous mutations may be associated with the selective sweep at KDM4B.
p.S936A affected the TUDOR domain (Fig. 3C). This domain can bind to speci c lysine methylation marks on histone proteins (H3-K4me3, H3-K23me3, and H4-K20me3). It plays a vital role in chromatin localization and the regulation of enzymatic function (27). Thus, we aligned the KDM4B protein mutant with its ortholog in diverse vertebrates to evaluate the functional effects of the variants. The results reveal that p.S936A is quite well conserved, being invariant among all of the other mammals that we used (Fig. 3C). All of the results indicate that p.S936A is an important mutation for the reproduction-related KDM4B sweep.

Discussion
Continuous arti cial selection in production-oriented breeding has left selective signatures and genomic variability in domesticated sheep. In this study, we performed whole-genome sequencing of 10 Bamei mutton sheep with different litter sizes from the same population. In this population, reproductive traits have undergone intensive selection via the breeding strategy. In this research, numerous mutations were selected in the population after screening, but the rates of heterozygosity did not differ between the two populations. When a neighbor-joining tree was constructed using SNPs selected based on F ST score, these two groups were differentiated into two genetic clusters. Most quantitative traits are known to respond quickly to arti cial selection, and this population subjected to selection of litter size might have evolved in opposite directions with soft sweeps (28,29).
Selective sweeps can identify important genomic regions that have been swept in the recent past. Some genes associated with complex, economically important traits have been identi ed with the help of selected sweep analysis by whole-genome sequencing. We used the XP-EHH test to detect alleles near xation within a Bamei mutton sheep population. Upon undergoing recent selection, the selected allele generally reaches a high frequency or xation in one group, but remains polymorphic in the whole population (30). In this research, we discovered 155 regions selected for in the polytocous population, which was greater than the 43 regions screened in the monotocous population. Using xed window selection of F ST and XP-EHH, 221 candidate genes were found to be associated with the proli cacy trait. In a previous study using a segregated ock based on QTL and GWAS mapping, some mutations (FecB, FecX, and FecG) were identi ed to affect ovulation in sheep (11). However, here we did not identify any mutations previously reported to increase the number of ovulations in sheep. The litter size per breeding ewe is not only in uenced by ovulation, but also affected by a number of factors including fertilization rate and pregnancy loss (in the embryonic and fetal development period). Recently, the LEPR gene, estrogen receptor 1 (ESR1) gene, and prolactin (PRL) gene were found to be associated with fecundity, as revealed by a selective sweep analysis in European commercial and semi-feral breeds and a Chinese indigenous breed distributed in different ecoregions (23,31,32).
JUN, ITPR3, and PLCB2 in the selected region were enriched in GnRH, oxytocin, and estrogen signaling pathway associated with the complex process from estrus to lambing. This process is organized by complex communication among the hypothalamus, pituitary, ovary, and uterus. Gonadotrophin releasing hormone (GnRH), secreted by the hypothalamus, regulates the synthesis and release of gonadotrophins, follicle-stimulating hormone (FSH), and luteinizing hormone (LH) from the pituitary (33). FSH and LH support the growth and maturation of follicles.
Estradiol and progesterone secreted by the corpus luteum inhibit GnRH release by feedback modulation. During the follicular phase, following luteolysis, estradiol reaches a critical threshold and stimulates the preovulatory gonadotrophin surge, appearing to help the ovulation of mature follicles (11).
PLCB is involved in a wide range of signals of reproductive processes, being regulated by many hormones (FSH, LH, GnRH, oxytocin). PLCB2 as a member of the PLCB subfamily is activated by G-protein-linked receptors and can hydrolyze phosphatidylinositol 4,5-bisphosphate to form inositol-1,3,4-trisphosphate (IP3) and diacylglycerol (DAG), which stimulate Ca 2+ release and protein kinase C activity, respectively (34 (39). In the developing ovarian follicle, granulosa cells are the main producers of estrogen. KDM4B is expressed in granulosa cells at early stages of folliculogenesis and its level is correlated with pregnancy failure in IVF patients (40). KDM4B expression in the uterus is also associated with recurrent pregnancy loss in women (41). The identi ed intersection between steroid hormones and KDM4B in the ovary and uterus sheds new light on the regulation of reproduction.

Conclusions
Following high-throughput sequencing, SNPs and indels were identi ed from two sheep populations. The sequencing data revealed the genetic diversity and population differentiation of the Bamei mutton sheep population experiencing selection for litter size. The genomic selection scan detected some interesting candidate genes and pathways under arti cial selection, which might have increased litter size. This genome-wide research provides valuable information for future whole-genome selection for fecundity in this sheep breed.

Animals and DNA preparation
In this study, we collected whole-blood samples from the jugular vein of 10 Bamei mutton sheep at the Xianghe Bamei mutton sheep-breeding park, Bayannur, China. These blood samples were placed in EDTA vacutainer tubes for storage. These ewes were about 3 years old and selected from among 500 sheep. Only ewes with litter size showing that they had given birth more than three times were sampled. The selected ewes were grouped into two categories based on the phenotype of litter size (monotocous sheep giving birth to only one lamb in three consecutive parities and polytocous sheep giving birth to more than two lambs in two consecutive parities). Then, genomic DNA was extracted from 200 µL of sheep blood using a QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), in accordance with the manufacturer's instructions.

Genome sequencing
High-quality DNA for Illumina sequencing library construction was randomly sheared into small pieces (300-400 bp). After end-repair, "A"-tailing and ligating to Illumina sequencing adapters, 400-500 bp ligated products were ampli ed by ligation-mediated PCR (LM-PCR). Then, 2 × 100 bp paired-end sequencing was carried out on an Illumina HiSeq 2500 sequencer and the original data were analyzed by Illumina HiSeq Control Software.

Population genetics analysis
The samples were separated into two groups ( ve monotocous individuals, ve polytocous individuals). The SNP densities, minor allele frequencies, and Tajima's D of each group were calculated by VCFtools v0.1.12b (46). Only SNPs in autosomes were preserved for the phylogenetic analysis. A phylogenetic tree of all samples was generated using SNPhylo (47) version 20160204, based on the maximum likelihood method. A total of 500,000 SNPs were randomly selected for calculating the linkage disequilibrium (LD) r 2 using Haploview (48) with parameters set as follows: "--missingCutoff 0.2 --dprime --minMAF 0.1." The SNP pairs were clustered based on the physical distances of these genes. The average LD (e.g., 0-1 kb) of each group was represented by the mean r 2 .

Selective sweep analysis
The pooled heterozygosity Hp was calculated over 10 kb windows using the formula where Σn MAJ denotes the sum of major allele frequencies in a selected window and Σn MIN denotes the sum of minor allele frequencies (49). Putative selection targets were extracted from the extreme tail of the distribution by applying a Z(F ST ) > 5 cut-off (24). fastPHASE v1.4.0 was used to phase the genotypes of all samples with the parameters "-T10 -K8" (52). Then, the phased data were used to calculate the cross-population extended haplotype homozygosity (XP-EHH) value by XP-EHH (53). XP-EHH values were averaged over 50 kb sliding windows. These scores approximately followed a normal distribution; the threshold to locate putatively selected regions was two times the XP-EHH distribution standard deviation (|XP-EHH| > 2). Manhattan plots for F ST and XP-EHH were generated with the R package gap.
A phylogenetic tree was generated using all variants located in these regions. Candidate genes targeted by positive selection were de ned as genes overlapping with sweep regions (ZF ST > 5 and |XP-EHH| > 2). GO and KEGG enrichment analyses for candidate genes were performed by DAVID 6.8 (54), and P values were corrected using the

Consent for publication
Not applicable.

Availability of data and materials
The raw genomic sequencing data from this work have been uploaded to the NCBI SRA database with accession number PRJNA560632 (https://www.ncbi.nlm.nih.gov/sra/PRJNA560632).