Ethics statement
This study followed the recommendations of the “Regulations for the Management of Affairs Concerning Experimental Animals” (Ministry of Science and Technology, China, revised in June 2004). The study was approved by the ethics committees of all the participating institutes (Institute of Animal Science, Chinese Academy of Agricultural Sciences; Ningxia Academy of Agriculture and Forestry Sciences; Research Institute of Animal Science, Tibet Academy of Agricultural and Animal Husbandry Sciences; National Animal Science Institute, Nepal Agricultural Research Council).).
Samples and genotyping
Genome-wide SNP data of 968 sheep individuals from five South Asian thin-tailed sheep breeds (i.e., Tibetan, Changthangi, Deccani, IndianGarole and Garut sheep), six European thin-tailed sheep breeds (i.e., Churra, Leccese, Comisana, Altamurana, MacarthurMerino and MilkLacaune sheep), three American thin-tailed sheep breeds (i.e., BarbadosBlackBelly, MoradaNova and SantaInes sheep), six Middle East fat-tailed sheep breeds (i.e., Afshari, LocalAwassi, Karakas, Norduz, Moghani and CyprusFatTail sheep), four African fat-tailed sheep breeds (i.e., EthiopianMenz, NamaquaAfrikaner, RedMaasai and RonderibAfrikaner) and eight Mouflon sheep individuals (wild sheep) were downloaded from Ovine HapMap project (http://www.sheephapmap.org/hapmap.php) [32]. SNP data of four Chinese fat-tailed sheep breeds (i.e., Hu, Tong, Large tail Han and Lop sheep) were obtained from a previous study [10]. SNP data of four Nepalese thin-tailed sheep breeds (i.e., Bhyanglung, Baruwal, Lampuchhre and Kage sheep) were generated in our lab [33]. Tail type for these breeds was determined by either direct observation or description of previous literatures. The detailed information of these sheep breeds was provided in Additional file 1: Table S1. All the SNP data were generated using the Illumina Ovine 50K Beadchip and were thus readily merged together. The final dataset included 968 sheep individuals and 47,415 common autosomal SNPs (based on genome Oar_v3.1).
Determination of ancestral allele and data quality control
Ancestral allele information for a subset of 33,059 SNPs were obtained from Ovine HapMap [32]. For the remaining SNPs, the ancestral allele was deduced according to the major allele in Mouflon sheep. Finally, a total of 46,540 SNPs with available ancestral allele information were kept for next analysis. PLINK v2.05 [34] was applied for further SNP data quality control. SNPs were removed if any of the following conditions were met: 1) with call rate ≤90%; 2) with minor allele frequency (MAF) ≤0.05. Sheep individuals with an average call rate below 90% were discarded. To ensure independence among the studied sheep individuals, cryptic relatedness among individuals within each breed were identified using pair-wise Identity-By-Descent (IBD) metric (referred to as PI-HAT in PLINK). One individual from a pair of sheep individuals was removed from the following analyses if their PI-HAT value was over 0.3.
Phylogenetic analysis
A pruned set of 32,450 SNPs were used to investigate the genetic relationship among these sheep breeds from different geographic locations. PCA was performed with the GCTA software [35] and the individuals outside of their expected population clusters were excluded from further analysis. The neighbor-joining tree was constructed using PHYLIP 3.68 (http://evolution.genetics.washington.edu/phylip.html) on the basis of allele frequency data. After PCA analysis, a total of 45,337 SNPs for 828 individuals from 30 diverse sheep breeds were retained for downstream analyses (Additional file 1: Table S1).
Genomic screen for positively selected genes in fat-tailed sheep
Three previous studies compared the genomic variations between Middle East/European thin-tailed versus (v.s.) Middle East fat-tailed sheep [11], European thin-tailed v.s. Middle East fat-tailed sheep [12], and Chinese thin- versus fat-tailed sheep [10], respectively. Therefore, we used sheep breeds from South Asia, Middle East and Europe to identify the positively selected genes in fat-tailed sheep. Three group-pair comparisons between thin- and fat-tailed sheep were considered, including MEF v.s. SAT, MEF v.s. EUT, and CHF v.s. SAT.
Two statistics, including the FST and ΔDAF were applied to evaluate the genetic differentiation of each SNP between thin- and fat-tailed sheep populations. The FST analysis was conducted using Genepop 4.3 software [36]. ΔDAF was calculated as the derived allele frequency in the fat-tailed sheep minus the DAF in thin-tailed sheep (DAFFat-tailed sheep – DAFThin-tailed sheep) using R version 3.3.3 (https://www.r-project.org/). For each group-pair comparison, FST and ΔDAF at each SNP marker was calculated between each thin-tailed breed against each fat-tailed breed and averaged across breed-pairs to produce an overall value for each SNP. The top 1% SNPs with large overall FST or ΔDAF value were considered as the significant SNPs in each test and the significant SNPs overlapped in both tests were considered as positively selected SNPs in fat-tailed sheep. Finally, positively selected SNPs that were identified in all three group-pair comparisons were considered as the loci under positive selection in fat-tailed sheep and genes within 150 kb of these loci were retrieved from Ensembl BioMart database (http://useast.ensembl.org/biomart/martview/625a397ecca62a421509935f099cdaa7). Because the high average value of FST or ΔDAF may be resulted from the extremely large values in several specific breed-pairs due to population structure, we additionally examined the DAF of the candidate SNPs in three thin-tailed sheep breeds from Americas and three fat-tailed sheep breeds from Africa and further filtered the promising candidate list by removing SNPs with DAF >0.5 in more than five thin-tailed sheep breeds or DAF <0.5 in more than four fat-tailed sheep breeds (>30% of total thin- or fat-tailed sheep breeds).
Analysis of sequence variations within top genes
Genomic variations stored in Variant Call Format (VCF) file for 16 thin-tailed sheep individuals and 13 fat-tailed sheep individuals from 17 different breeds from around the world were downloaded from NextGen of Ensembl Projects (http://projects.ensembl.org/nextgen/) (Additional file 7: Table S5). The VCF file was converted to PLINK PED file using VCFtools [37]. PLINK [34] was used to perform the SNP quality control (removing SNPs with call rate ≤90% or MAF ≤0.05) and to calculate the allele frequency of each SNP in thin- and fat-tailed sheep group. The absolute ΔAF) of each SNP between fat- and thin-tailed sheep group were calculated using R.
Validation of top SNPs in expanded samples using Sequenom MassARRAY
Ear tissues of 200 Tibetan sheep (Thin-tailed sheep) 184 Tan sheep (fat-tailed sheep) were collected from farm of Research Institute of Animal Science, Tibet Academy of Agricultural and Animal Husbandry Sciences, and Ningxia Yanchi Tan sheep breeding farm, respectively. Animals were released after collecting the samples. Gnomic DNA was isolated and was quantified using a Nanodrop 2000 (Thermo Fisher Scientific, DE). A total of 13 intronic SNPs of PDGFD gene with the highest ΔAF value (>0.8) and derived allele frequency larger than 0.8 in fat-tailed sheep were selected for further validation in this expanded cohort. The genotyping of these 13 SNPs was carried out with Sequenom MassArray system. Briefly, primers for PCR and for locus-specific single-base extension were designed with MassArray Assay Design 4.0. The PCR products were applied for locus-specific single-base extension reactions. The final products were desalted and transferred to a matrix chip. The resultant mass spectrograms and genotypes were analyzed with MassArray Typer software.
Sections, Hematoxylin and Eosin (HE) staining, Oil Red O staining
A total of 16 pregnant tan sheep individuals (fat-tailed sheep) subjected to artificial insemination were randomly chosen at farm of Institute of Animal Science, Ningxia Academy of Agricultural and Forestry Sciences in China. At each of the expected embryonic stages including embryonic day 60 (E60), E70, E80 and E90, four animals were euthanized by captive bolt stunning followed by exsanguination and embryonic tail tissues were collected and subjected to histology analysis, RNA-seq or qRT-PCR analysis. For Histology analysis, tissues were fixed with 4% paraformaldehyde over-night at 4°C and then embedded in paraffin. Sections were cut at 8-um thickness. For HE staining, sections were deparaffinized with two changes of Xylene (10 minutes each) and were rehydrated with 100%, 95% and 80% ethanol rinses (5 minutes each). After a brief wash in distilled water, sections were stained in hematoxylin solution for 5 minutes, washed in running tap water for 5 minutes, differentiated in 1% acid alcohol for 30 seconds and washed again with running tap water for 1 minute. Following bluing in 0.2% ammonia water for 30 seconds and a wash in running tap water for 5 minutes, sections were rinsed in 95% alcohol and stained in eosin-phloxine solution for 30 seconds. Sections were then dehydrated with a series of rinses in 80% alcohol (5 seconds), 95% alcohol (twice, 10 seconds each), 100% alcohol (twice, 5 minutes each), and then mounted. For Oil Red O staining, 0.5 g Oil Red O powder (Sigma) was evenly dissolved in 100 mL of isopropanol to prepare stock solution, which was then diluted with distilled water at the ratio of 3:2 and filtrated to make working solution. The sections were rinsed with 60% isopropanol and then stained with Oil Red O working solution for 15 minutes. Sections were then mounted and imaged for evaluation of droplet formation.
RNA-sequencing and bioinformatics analysis
Tail tissues from Tan sheep at three different developmental stages including E60, E70, E80 were used for RNA-seq analysis. Total RNA was extracted by RNeasy Lipid Tissue Mini Kit (Qiagen) and the RNA-seq library was constructed with Dynabeads mRNA DIRECT Kit (invitrogen) for whole transcriptome RNA-seq analysis. Pair-end RNA-sequencing was performed on a X-ten system (Illumina) in 150-bp length.
After removing adaptor sequence and low-quality reads, pass-filtered reads were then mapped to Ensembl sheep reference genome Oar_v4.0 using Tophat 2.1.1 [38]. The genes annotated in Ensembl were quantified with Cufflinks [39]. FPKM (Fragments Per Kilobase of exon per Million fragments mapped) were calculated from raw counts and used to quantify relative gene expression for each sample.
Quantitative reverse transcription-PCR (qRT-PCR) analysis
Tail tissues of Tan sheep (fat-tailed) at different developmental stages including E60, E70, E80, E90 described above were used for qRT-PCR analysis of PDGFD expression during development. Four additional tail samples from each of Suffolk sheep (thin-tailed sheep) at E70 and Tan sheep at adult stage (about 2 years old, female) were collected from farm of Institute of Animal Science, Ningxia Academy of Agricultural and Forestry Sciences in China. Tail samples from four Bhyanglung sheep individuals at adult stage (thin-tailed sheep, about 2 years old, female) were collected from farm of National Animal Science Institute, Nepal Agricultural Research Council in Nepal. For embryonic sample collection, animals were euthanized by captive bolt stunning followed by exsanguination and embryonic tail tissues were harvested. For adult sample collection, animals were released right after harvesting samples. Total RNA from tail tissues were isolated with RNeasy Lipid Tissue Mini Kit (Qiagen). 0.8μg of total RNA was utilized as template for RT with random hexamer primers using PrimeScript RT reagent Kit (Takara). qRT-PCR was performed with respective gene-specific primers (PDGFD: Forward: GGGAGTCAGTCACAAGCTCT, Reverse: AGTGGGGTCCGTTACTGATG; ACTB: Forward: TCTGGCACCACACCTTCTAC; Reverse: TCTTCTCACGGTTGGCCTTG). All samples were amplified in triplicate and all experiments were repeated at least 3 independent times. Relative gene expression was converted using the 2-êêCT method against the internal control house-keeping gene ACTB where êêCT= (CTexperimental gene - CTexperimental ACTB) - (CTcontrol gene - CTcontrol ACTB). The relative gene expression in control group was set to 1. Student t test was performed to determine significance.
PDGFD expression in adipose tissues/cells in mouse and human
Micro-array data of different cell types isolated from human white adipose tissues by fluorescence-activated cell sorting (FACS) including CD45-/CD34+/CD31- (progenitors), CD45+/CD14+/CD206+ (total macrophages), CD45+/CD14+CD206+/CD11c+ (M1 macrophages), CD45+/CD14+/CD206+/CD11c- (M2 macrophages), CD45+/CD3+ (Total T cells), CD45+/CD3+/CD4+/CD8- (Th T-cells), CD45+/CD3+/CD4-/CD8+ (Tc T-cells) (GSE100795) [24] were obtained from GEO database and re-analyzed with online build-in tool GEO2R. Raw gene count matrix generated from RNA-seq for mouse 3T3-L1 preadiopocytes and differentiating cells at 24 hours after induction by DHA differentiation cocktail was obtained from GEO database (GSE118471) [23]. FPKM was calculated for each sample and used for quantification of gene expression level. Micro-array data of inguinal fat tissues from mice exhibiting high or low weight gain after 4 weeks on a high saturated fat diet (GSE4692) [28], adipocytes from non-diabetic lean and non-diabetic obese Pima Indian subjects (GSE2508) [27] were obtained from GEO database and analyzed with online build-in tool GEO2R.
Dual luciferase reporter assay
Using human genomic DNA as template, two fragments (WT, 222 bp; Del, 202 bp) spanning PDGFD gene promoter region were synthetized directly by BerryGenomics Company (Beijing, China), and were cloned into pGL4.23 (Promega) luciferase reporter vector between Kpn1 and EcoRV restriction sites. All plasmids were sequenced to verify the integrity of the insert. Transfection was performed with Lipofectamine 3000 reagent (ThermoFisher) essentially following manufactory’s instruction. The promoter activity was evaluated by measurement of the firefly luciferase activity relative to the internal control minimal TK promoter driven-Renilla luciferase activity using the Dual Luciferase Assay System as described by the manufacturer (Promega). A minimum of four independent transfections was performed and all assays were replicated at least twice.