DOI: https://doi.org/10.21203/rs.2.15926/v1
The dog (Canis familiaris) was likely the earliest domesticated animal and, in the distant past, humans' only animal companion [17, 58]. Genetic studies and archaeological findings show that dogs share a common ancestor with the gray wolf (Canis lupus) [18, 56, 60]. In Southwest Asia, large-scale farming spread within the so-called Fertile Crescent (FC), where the independent domestication of plants and animals led to a shift from hunting and gathering to sedentary farming, followed by the expansion of the first complex societies [19, 65]. Most agricultural developments took place in the eastern horn of the FC, especially in Elam (a region covering southern Iraq and Iran), which joins Mesopotamia and the Iranian plateau [4]. Dogs are often depicted in ancient art from several parts of Southwest Asia [17, 44]. One of the most widely held hypotheses about the geographical origin of the domestic dog has therefore been that dogs originated in Southwest Asia, presumably in the FC [17]. In addition, the Middle East has been proposed as the place of origin of the domestic dog because of extensive haplotype sharing between Middle Eastern wolves and dog breeds [55], although this hypothesis has been questioned because the sharing may reflect dog-wolf introgression in the Middle East [6, 7, 25] rather than a Middle Eastern origin. The dog is a notable example of variation under domestication; however, the evolutionary processes underlying the genesis of this diversity are poorly understood.
In recent years, advances in high-throughput genome analysis techniques, especially whole-genome sequencing, SNP genotyping arrays and comparative genomic hybridization (CGH) arrays, have enabled the detection of genome-wide structural variants. Array-based methods have limited resolution and low sensitivity because their performance depends strongly on marker density and on specifically designed non-polymorphic markers [5, 36, 46]; thus they cannot detect small CNVs (< 10 kb) and cannot precisely identify CNV boundaries [64]. Compared with these methods, next-generation sequencing provides a high-accuracy, base-by-base view of the genome and captures variants of all sizes that might otherwise be missed. Such variants are important and have significant effects on a wide range of traits in domesticated animals. For example, increased transcription of GRIK2 led to an increased fear response in domesticated animals compared with their wild counterparts, including rabbit, guinea pig, dog and chicken [33]; variation in MC1R produces coat color variants in pig [23]; and a mutation in TSHR influences seasonal reproduction in chicken [48]. Copy number variations (CNVs) can also cause major phenotypic changes in animals. For example, the pea-comb phenotype in chickens is caused by a CNV in the SOX5 gene [61]; late feathering in chickens is caused by a CNV involving SPEF2 and PRLR [22]; polledness in goats is caused by a deletion [42]; the hair ridge phenotype in Ridgeback dogs is caused by a CNV in the FGF gene [28]; a highly duplicated APOL3 gene, involved in lipid transport, has been reported in beef cattle breeds [12]; and increased AMY2B and AKR1B1 copy numbers confer adaptation to a starch-rich diet in dogs [9, 59].
In this work, for the first time, we sequenced the whole genomes of six canids from the same geographical range (three Iranian wolves and three Iranian dogs) at relatively high coverage (14.51x to 17.15x). One of the sequenced dogs, a Qahderijani, is a mastiff-type ecotype originating in Qahderijan, Iran, which lies in the FC belt (the areas surrounding the FC). The other two dogs were sampled from the Saluki breed, a hunting breed that originated in the FC and is regarded as the marathon runner of the canine world because its remarkable endurance enables it to run for many miles.
In our analysis of the Iranian dog and wolf sequences, we used the canFam3.1 assembly as the reference sequence [34]. SNPs and small Indels were identified as differences between the newly generated genome sequences and the reference sequence, yielding 12.45 million SNPs and 3.48 million small Indels. Established algorithms were applied to the six genomes to obtain highly reliable CNVs and SVs. Potentially breed-specific CNVRs were defined, and the functional relevance of the genes overlapping SVs and CNVRs was further evaluated by GO enrichment analysis. Genome-wide analysis indicates more genetic diversity in the wolf genomes than in the dog genomes. Annotation of the different types of genomic variants suggested that an increased proportion of variants in the coding and regulatory regions of genes, relative to intronic and intergenic regions, during domestication is a substantial contributor to the differences currently observed between dog and wolf. In addition, comparison of the effects of genomic variants between dog and wolf genomes showed that genes involved in neurological, digestive and metabolic processes generally played a considerable role in dog domestication. The CNVs reported in this study are enriched for olfactory and immune system genes.
Sequencing output
Illumina paired-end sequencing was performed for six individuals (Additional file 1: Table S1 and Figure S1). After filtering, the total high-quality sequence data per individual ranged from 42.1 Gb (sample ID: #GW1) to 51 Gb (#DogQI), and coverage varied from 14.51x (#GW1) to 17.15x (#GW2) (Additional file 1: Table S2). To increase the reliability of CNV calling, we used a uniform depth of coverage across the six genomes (Additional file 1: Table S2), as suggested previously [1]. The mean insert size was longer than the combined length of both reads, and insert sizes followed a Poisson-like distribution with a small standard deviation across the six genomes (Additional file 1: Table S1 and Figure S1), which increased the amount of usable sequence in our dataset for detecting genomic variants [53]. To increase confidence in base calls and the accuracy of variant detection, sequencing was done at a relatively high mean depth for all six individuals (Additional file 1: Table S2). A relatively high mean depth can increase the accuracy of CNV calling by the read-depth method [1], and paired-end reads combined with a relatively long read length are useful for identifying Indels [39, 54].
SNP detection and annotation
SNPs were detected by aligning the sequences to the reference genome. The number of SNPs was calculated for each individual (Additional file 1: Table S3 and Figure S2). We detected a total of 12.45 million SNPs in the six individuals, of which 10.45 million were identified in the three wolves and 7.82 million in the three dogs. We obtained the ratio of transitions to transversions (Ti/Tv) and the numbers of heterozygous and homozygous SNPs for each of the six genomes (Additional file 1: Table S4). Heterozygous SNPs outnumbered homozygous SNPs in all six individuals. The Ti/Tv ratio varied from 1.99 (#DogQI) to 2.07 (#GW3) (Additional file 1: Table S4). Annotation showed that most SNPs are located in intergenic and intronic regions (Additional file 1: Table S5). Of all SNPs, 53.57, 31.99, 0.81, 0.001, 4.83, 4.63, 0.44 and 0.12% were located in intergenic, intron, exon, transcript, upstream, downstream, three prime untranslated (3'-UTR) and five prime untranslated (5'-UTR) regions, respectively (Figure 1). In addition, synonymous (silent) SNPs outnumbered non-synonymous (nonsense and missense) SNPs (Additional file 1: Table S6). The proportion of SNPs in intronic and intergenic regions was higher in the wolf genomes than in the dog genomes, whereas the percentage of SNPs in exons and 3'-UTRs was higher in the dog genomes than in the wolf genomes.
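As a minimal illustration of how a Ti/Tv ratio is obtained, the Python sketch below counts transitions and transversions over biallelic SNPs given as (reference, alternate) base pairs. The function name and the toy input are assumptions made for illustration only; they are not part of the study, in which these ratios were calculated with SnpSift.

```python
# Illustrative sketch only: names and toy data are hypothetical, not from the study.
BASES = set("ACGT")
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Return (transitions, transversions, Ti/Tv) for an iterable of (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a single-base substitution (e.g. Indels).
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    ratio = ti / tv if tv else float("nan")
    return ti, tv, ratio


# Toy input: four transitions and two transversions.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
ti, tv, ratio = ti_tv_ratio(toy_snps)
print(ti, tv, ratio)
```

With random substitutions, each base has one possible transition and two possible transversions, so a ratio near 0.5 would be expected; values around 2, as reported here, are commonly taken as a sign that most calls are real variants.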
Small Indels detection, annotation and gene ontology
Indels were detected by aligning the sequences to the reference genome. The number of Indels was calculated for each individual (Additional file 1: Table S3). A total of 3.48 million Indels were detected across the six genomes, of which 2.24 million were found in the three dogs and 3.11 million in the three wolves. We calculated the numbers of heterozygous and homozygous Indels for each of the six genomes (Additional file 1: Table S4). Heterozygous Indels outnumbered homozygous Indels in all six individuals. Across the six genomes, the total number of small insertions was 1.58 million and the total number of small deletions was 1.9 million (Additional file 1: Table S7). We drew Indel length histograms for the three dogs (Additional file 1: Figure S3), the three wolves (Additional file 1: Figure S4) and all six genomes together (Additional file 1: Figure S5). Indels of 1 bp made up the highest percentage across the six genomes, and at any given size deletions were more frequent than insertions. Annotation showed that most small Indels are located in intergenic and intronic regions (Additional file 1: Table S8). Of all small Indels, 53.79, 34.778, 0.25, 0.002, 5.54, 4.95, 0.46 and 0.14% were located in intergenic, intron, exon, transcript, upstream, downstream, 3'-UTR and 5'-UTR regions, respectively. The percentage of small Indels located in upstream, 5'-UTR, 3'-UTR, exon and transcript regions was higher in the three dog genomes than in the three wolf genomes, whereas the percentage of Indels located in downstream, intronic and intergenic regions was higher in the three wolf genomes than in the three dog genomes. Annotation of the 3.48 million small Indels yielded 21,104 genes from Ensembl.
Subsequently, we carried out gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses for these genes (Additional file 1: Table S9). The GO analysis classified the genes associated with small Indels into the three main categories (molecular function, biological process and cellular component) (Additional file 1: Table S9). The KEGG pathway analysis showed that two pathways, related to cancer and melanoma (usually, but not always, a cancer of the skin), were enriched among the small Indels in both dog and wolf.
Structural variants detection, annotation and gene ontology
We obtained genomic structural variants, including insertions, deletions, translocations (inter- and intra-chromosomal) and inversions, for three dogs and three wolves (Additional file 1: Table S10; Additional file 2: Table S16 and Additional file 3: Table S17). The total numbers of deletions, insertions, inversions, inter-chromosomal translocations and intra-chromosomal translocations across the six individuals were 14,321, 566, 469, 798 and 637, respectively (Additional file 4: Table S18). For every type of structural variant except insertions, the total number was higher in the wolf genomes than in the dog genomes (Additional file 2: Table S16 and Additional file 3: Table S17). To explore the potential functional roles of the different types of structural variants, all genes that completely or partially overlapped them were retrieved from Ensembl. Annotation yielded 470, 163 and 228 genes in dog and 6,466, 191 and 269 genes in wolf for Indels (insertions and deletions), inversions and complex structural variants (inter- and intra-chromosomal translocations), respectively (Additional file 1: Table S11). Annotation of the structural variants showed that, in general, the percentage of intergenic and noncoding transcript variants is higher in the wolf genome than in the dog genome, whereas the proportion of coding sequence and 3'-UTR variants is higher in the dog genome than in the wolf genome (Additional file 1: Figures S6-S13). Gene ontology (GO) analysis classified the genes associated with structural variants into three categories: molecular function, biological process and cellular component (Additional file 1: Table S12). Genes related to the olfactory and immune systems were enriched in the set of SVs identified in this work (Additional file 1: Table S12). The most conspicuous cluster terms in dog and wolf were “cellular carbohydrate metabolic process” and “nervous system development”.
CNV detection
We obtained putative CNVs for the six individuals using CNVnator; the mean number of CNVs per individual was 4143.83, ranging from 2871 to 5437 (Additional file 1: Table S13). For all autosomal CNVs categorized as gains, the mean copy number across the six individuals was 3.57 and the maximum estimated copy number was 174.472, on chromosome 7 (chr7) of a wolf. The number of gains was higher in the three dog genomes than in the three wolf genomes (Additional file 1: Table S13). A total of 10571 CNVRs were obtained by merging overlapping CNVs across the six individuals (Additional file 5: Table S19). They were located on chromosomes 1-38 and X, ranged in size from 1.05 kb to 3433.35 kb with a mean of 14.63 kb and a median of 7.05 kb, and covered 154.65 Mb, or 6.41%, of the assayed CanFam genome (Table 1). The CNVRs were divided into three groups: 6400 loss, 3916 gain and 255 both (gain and loss) events (Additional file 5: Table S19). The deletion:duplication ratio in the total CNVRs was 1.96. Among all CNVRs, 6,105 (57.75%) were found in a single individual (singletons), 1,522 (14.39%) were shared by two individuals, and 2,944 (27.84%) were shared by at least three individuals (Figure 2B). A total of 6702 (63.4%) CNVRs were shorter than 10 kb, while 494 (4.7%) were longer than 50 kb (Table 1 and Figure 2A). The highest and lowest numbers of CNVRs were found on chromosomes 18 and 35, respectively (Figure 4 and Additional file 6: Table S20).
CNV annotation and gene ontology analysis
Annotation of the CNVs showed that the percentage of CNVs in coding sequences (14% vs. 6%) and 3'-UTRs (6% vs. 0%) was considerably higher in the dog genome than in the wolf genome, whereas the percentage of CNVs in intergenic regions (22% vs. 14%) was considerably higher in the wolf genome than in the dog genome (Additional file 1: Figures S14 and S15). To explore the potential functional roles of the putative CNVs, all genes that completely or partially overlapped these CNVs were retrieved from Ensembl. A total of 8595 genes were retrieved, covering 6703 of the CNVs. Gene ontology (GO) analysis showed that, in general, genes associated with the olfactory and immune systems are enriched among the CNV gains in dog and wolf (Additional file 1: Table S14). All terms related to the olfactory system are over-represented (P<0.01) in the wolf compared with the dog (Additional file 1: Table S14). The term “Starch and sucrose metabolism” is enriched in the dog CNV gains. The terms “cardiac conduction”, “Cardiac muscle contraction”, “regulation of heart contraction”, “heart development”, “muscle filament sliding”, “regulation of smooth muscle cell proliferation”, “ATP binding”, “calcium ion binding” and “muscle cell development” are enriched among the CNV gains in Saluki dogs (Additional file 1: Table S14).
Comparison with previous dog CNV studies
To compare the CNVRs identified in this work with those of published studies, all previously reported CNVR coordinates were converted from canFam2 to canFam3 using the UCSC liftOver program. In our results, 4454 CNVRs (42.1%) overlapped CNVRs reported in four previous studies, and the remaining 6117 (57.865%) were considered novel CNVRs (Additional file 1: Table S15 and Additional file 7: Table S21).
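The comparison with published CNVRs comes down to an interval-overlap test on a common assembly. The Python sketch below shows one way to count known and novel regions; it assumes 1-based inclusive coordinates and that any overlap of at least 1 bp counts as a match, neither of which is stated in the text for this comparison. The function name and toy coordinates are hypothetical.

```python
# Illustrative sketch only: the overlap criterion (>= 1 bp) and all names are assumptions.
from collections import defaultdict


def count_known_and_novel(query_cnvrs, published_cnvrs):
    """Count query CNVRs that overlap (>= 1 bp) at least one published CNVR.

    Both inputs are iterables of (chrom, start, end) with inclusive coordinates
    on the same assembly. Returns (n_known, n_novel).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in published_cnvrs:
        by_chrom[chrom].append((start, end))

    known = 0
    for chrom, start, end in query_cnvrs:
        candidates = by_chrom.get(chrom, ())
        # Two inclusive intervals overlap when each starts before the other ends.
        if any(start <= p_end and p_start <= end for p_start, p_end in candidates):
            known += 1
    return known, len(query_cnvrs) - known


# Toy example with made-up coordinates.
ours = [("chr1", 1000, 5000), ("chr1", 9000, 12000), ("chr2", 100, 2000)]
published = [("chr1", 4500, 6000), ("chr2", 5000, 7000)]
n_known, n_novel = count_known_and_novel(ours, published)
print(n_known, n_novel)
```

For genome-scale inputs, sorted intervals or an interval tree would replace the linear scan; the simple form is kept here for readability.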
Visualization of Structural Genomic Variation
To visualize similarities and differences in positional relationships and genome structure between the dog and wolf genomes, we drew circular genome maps for dog and wolf (Figure 3).
Analysis of the high-quality next-generation sequencing data clearly showed differences in the distribution and impact of genomic variants between dog and wolf. The ratio of transitions to transversions (Ti/Tv) is an indicator of the false-positive rate in SNP calling [10, 27]; the ratios calculated for all individuals (1.99 to 2.07) (Supplementary Table S4) support the precision of SNP identification in our study. In addition, in agreement with previous studies [50], most SNPs lay within introns or between genes, and synonymous SNPs outnumbered non-synonymous SNPs. The majority of small Indels (95.89% in dog and 95.64% in wolf) were shorter than 10 bp; similar results were reported in a study of Indels in chicken [63].
We detected 10571 CNVRs in the canine genome, with a mean of 4143.83 CNVs per sample. As reported in dog and wolf [15, 38, 40, 41], human [20, 47] and mouse [26], loss events were more prevalent than gain events in our results (1.63-fold). This may reflect the greater relative difficulty of identifying gains, because a gain produces a smaller relative change in copy number than a loss (3:2 versus 2:1). Loss events spanned shorter genomic sequences than gains in terms of median (4.499 kb vs. 11.699 kb), mean (7.387625 kb vs. 21.38724 kb) and total length (47.280800 Mb vs. 83.752434 Mb) (Table 1). This could indicate that duplications are less likely to be removed by purifying selection [5]. A total of 4466 (42.25%) CNVRs were seen in at least two individuals and 6105 (57.75%) CNVRs were present in only one individual. The percentage of singletons obtained in this work agrees with that reported in previous CNV studies in human [47], dog [40] and chicken [64]. The CNVRs were non-randomly distributed across the canid genome (Table S20). For example, 2.03% of the sequence of chromosome 32 is copy number variable, whereas 42.79% of the sequence of chromosome 18 shows copy number variation (Supplementary Table S20). In general, chromosomes 9 (13.03%), 26 (14.97%) and 18 (42.78%) showed a high percentage of CNVRs.
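The 3:2 versus 2:1 argument can be made concrete with expected read-depth ratios. The Python sketch below assumes a diploid baseline and single-copy events, a simplification chosen for illustration rather than a description of the calling procedure used in this study; the function name is hypothetical.

```python
# Illustrative sketch only: diploid baseline and single-copy events are assumptions.
import math


def log2_depth_ratio(copy_number, ploidy=2):
    """Expected log2 read-depth ratio of a region relative to the diploid baseline."""
    return math.log2(copy_number / ploidy)


gain = log2_depth_ratio(3)  # one extra copy: depth ratio 3:2
loss = log2_depth_ratio(1)  # one copy lost: depth ratio 1:2
print(round(gain, 3), loss)
```

On the log2 scale a single-copy gain shifts the signal by about 0.58, whereas a single-copy loss shifts it by 1, so at a given noise level gains stand out less clearly than losses.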
The terms “sensory perception of smell”, “detection of chemical stimulus” and “Olfactory transduction” are involved in sensory perception and were enriched among the CNV gains in dog and wolf; all of them were over-represented among the CNV gains in wolf (P<0.01). Both wolf and dog develop olfaction, audition and vision by 2 weeks, 4 weeks and 6 weeks of age on average, respectively [35]. Wolf pups start to investigate their environment at 2 weeks of age, while they are still blind and deaf and must depend mainly on their sense of smell, whereas dog pups start to investigate their environment at 4 weeks of age [35]. In a previous study, the fraction of olfactory receptor pseudogenes in dog and wolf was 17.78 and 12.08%, respectively; however, the difference between these values was not significant [67]. In another study, no difference in olfactory capacity was reported between dog breeds that had been selected for their scenting ability and hand-reared grey wolves [43]. Nevertheless, our results emphasize the importance of olfaction during dog domestication.
Many of the GO terms associated with CNV gains in this study are similar to those reported using the aCGH method in dog [11]. GO enrichment analysis showed that gene families involved in the sense of smell and the immune system were enriched; such families commonly evolve rapidly because of their importance in the organism's response to fast changes in the environment and in fitness, and they have frequently been identified in CNV regions of multiple mammalian genomes [2, 62, 69]. GO terms related to heart function, such as “cardiac conduction” and “regulation of heart contraction”, were enriched only in the CNV gains of Saluki dogs. These results can be expected because the Saluki is a hunting breed that is regarded as the marathon runner of the canine world, and its remarkable endurance enables it to run for many miles. Endurance exercise training has been shown to induce a number of cardiac adaptations to marathon running [51].
A substantial proportion of the CNVs from this work (42.13%) are consistent with those identified in previous studies in dogs and wolves. In addition, a substantial number of the GO terms enriched among the CNVs in this study are concordant with the GO terms reported in studies of copy number variation in dogs and wolves. This consistency with previous studies, together with the identification of CNVs specific to the Saluki breed, lends further support to the CNVs identified in this work. The differences between the CNVs detected here and those described previously may be related to the particular breeds studied and to differences between the methods used. Generally, CNVs identified by read-depth analysis are on average much smaller than those detected by aCGH.
The total numbers of SNPs, Indels, deletions, inversions, and inter- and intra-chromosomal translocations were higher in the wolf genome than in the dog genome, whereas the total number of duplications or insertions was higher in the dog genome than in the wolf genome. It is widely accepted that gene duplication can be a major source of evolutionary novelty [68].
Our genome analysis of dog and wolf revealed a reduction of genomic diversity during dog domestication. A population bottleneck occurred in wolves thousands of years ago, after which a population expansion took place under human artificial selection on specific traits, leading to the different breeds of dogs [3, 25]. The effective population size of wolves is larger than that of dogs, so higher genomic diversity is expected in wolves than in dogs [3, 25]. Our results from two sources of genetic variation, SVs and CNVs, confirmed that novel adaptations allowed the early ancestors of modern dogs to live on a diet rich in starch, in contrast to the carnivorous diet of wolves, and formed an essential step in the early domestication of dogs [8, 9, 25, 52, 60]. The terms “Negative regulation of neuron apoptotic process”, “positive regulation of dendrite development” and “nervous system development” were enriched among SVs in wolf and are indicative of reduced aggression in the first steps of animal domestication. “Nervous system development” is defined as a process whose specific outcome is the progression of nervous tissue over time, from its formation to its mature state.
In previous studies, the terms “axon” and “Nervous system development” were enriched among the genes related to the regions under selection during dog domestication [9, 57].
Annotation of the different types of genomic variants showed that, in general, the percentage of variants in intronic and intergenic regions is higher in the wolf genome than in the dog genome, whereas the percentage in coding sequences and 3'-UTRs is higher in the dog genome than in the wolf genome. It seems that domestication and related processes such as relaxed selection have played an important role in increasing the percentage of genomic variation in the coding and regulatory sequences of genes in dog. The relaxation of selection likely increases functional genetic diversity throughout the dog genome, and this diversity includes both genes and elements involved in gene expression [13, 21]. However, it should be noted that mammalian genomes have a complex structure with a diversity of repetitive elements, which complicates extensive genome-wide analyses [54]. To better understand this result, there is still a need to use mate-pair sequences, or to combine long-insert mate-pair and short-insert paired-end sequences, to analyze the dog and wolf genomes and elucidate the differences in the distribution and impact of genomic variants between dog and wolf during dog domestication.
We resequenced the whole genomes of six canids from the Middle East for the first time and compared the effect and distribution of genomic variants between dog and wolf genomes. Whole-genome resequencing of three dogs and three wolves detected 7.82 million and 10.45 million SNPs, respectively. Numerous putative CNVs were identified through an analysis of read-depth differences. Furthermore, we identified SVs that could be useful for marker-based population genetic investigations. Downstream analysis of the identified SVs and CNVs revealed changes between the dog and wolf genomes during dog domestication. More work is needed to unravel the significance of the higher proportion of CNV gains in the Saluki dog.
Sampling and sequencing
The animals used in this study came from the following sources: one wolf was sampled at Kerman Zoo, southern Iran; two wolves at Eram Park Zoo, Tehran, Iran; two Saluki dogs at private farms in Kurdistan province, western Iran; and one Qahderijani dog at a private farm in Isfahan, Iran. We collected blood samples from three captive Iranian wolves (Additional file 1: S17) and three Iranian dogs, comprising one Qahderijani (Additional file 1: Figure S17) and two Salukis (Additional file 1: Figure S18), with the consent of the owners. Sampling locations are reported in Table 2. DNA was extracted using the phenol/chloroform method. Paired-end sequence data for all six individuals were generated on an Illumina HiSeq 2500 platform by a company in China (www.berrygenomics.com).
Quality control and mapping
The quality of the reads was evaluated with FastQC; the quality control output showed that all reads were of high quality and free of adaptor contamination. Reads were aligned against the canFam3.1 genome assembly with the Burrows-Wheeler Aligner (BWA) [31]. SAMtools [32] was used to convert the Sequence Alignment/Map (SAM) files to Binary Alignment/Map (BAM) files and to sort and index them. PCR duplicates were removed from all BAM files with Picard. Mapping accuracy was evaluated with SAMtools using two criteria: the percentage of reads aligned to the reference genome and the mean depth.
Short indel and SNP detection
The Genome Analysis Toolkit (GATK) [37] was used to detect SNPs and Indels. All BAM files were preprocessed in two steps: (i) local realignment around Indels using known Indels, and (ii) base quality score recalibration to improve the accuracy of the quality score of each base. The processed data for each individual were used to create genomic variant call format (gVCF) files with GATK HaplotypeCaller, and the gVCF files of all individuals were then merged with GATK GenotypeGVCFs. Finally, SNPs and Indels were extracted from the resulting raw variant file with GATK SelectVariants and filtered with GATK VariantFiltration.
SVs detection
SVs, including deletions, inversions, translocations (inter- and intra-chromosomal) and insertions, were detected using BreakDancer 1.1 [14]. SVs were retained if they had read coverage >= 10, a score >= 80 and a size >= 50 bp.
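The three thresholds above amount to a simple record filter. The Python sketch below applies them to SV calls represented as dictionaries; the field names ("type", "size", "score", "num_reads") and the toy records are hypothetical, and treating "read coverage" as the number of supporting read pairs is an assumption made here for illustration.

```python
# Illustrative sketch only: field names and toy records are hypothetical.
MIN_READS = 10  # minimum supporting reads (assumed meaning of "read coverage")
MIN_SCORE = 80  # minimum confidence score
MIN_SIZE = 50   # minimum SV size in bp


def passes_filters(sv):
    """Return True if an SV call meets all three thresholds."""
    return (
        sv["num_reads"] >= MIN_READS
        and sv["score"] >= MIN_SCORE
        and sv["size"] >= MIN_SIZE
    )


calls = [
    {"type": "DEL", "size": 1200, "score": 99, "num_reads": 15},  # kept
    {"type": "INV", "size": 300, "score": 75, "num_reads": 20},   # score too low
    {"type": "INS", "size": 60, "score": 90, "num_reads": 8},     # too few reads
    {"type": "DEL", "size": 40, "score": 99, "num_reads": 30},    # too small
    {"type": "ITX", "size": 5000, "score": 85, "num_reads": 10},  # kept
]
kept = [sv for sv in calls if passes_filters(sv)]
print([sv["type"] for sv in kept])
```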
SNP and Indel annotation
The functional consequences of SNPs and short Indels were analyzed using SnpEff 4.0e [16]. Transition-to-transversion and homozygous-to-heterozygous ratios for single nucleotide variants were calculated with SnpSift [49].
CNV Calling
Putative CNVs on the 38 canine autosomes and the X chromosome were detected with the read-depth method using CNVnator [1]. We ran CNVnator with a bin size of 150 bp and GC correction (default). Putative CNVs were filtered using the following criteria: size > 1 kb, P-value < 0.01 and q0 (zero mapping quality) < 0.5. We removed all CNVs on un-localized chromosomes (chrUn). Putative CNVRs were obtained using BEDTools [45] by merging CNVs that overlapped by 1 bp or more on chromosomes 1-38 and the X chromosome across the six individuals, as reported before [47]. All CNVRs were categorized into three classes: “Loss” (containing deletions), “Gain” (containing duplications) and “Both” (containing both deletions and duplications). To compare the putative CNVRs from this study with the CNVRs reported in previous studies, all CNVR coordinates from the previous studies were converted from CanFam 2.0 to CanFam 3.1 using the UCSC liftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver).
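The filtering and merging steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the pipeline used in the study (which relied on CNVnator and BEDTools): the dictionary keys, the toy calls and the use of inclusive coordinates are all hypothetical.

```python
# Illustrative sketch only: keys ("sample", "chrom", "start", "end", "type", "p", "q0")
# and the toy calls are hypothetical; coordinates are assumed to be inclusive.

def filter_cnvs(cnvs, min_size=1000, max_p=0.01, max_q0=0.5):
    """Keep CNVs with size > 1 kb, P-value < 0.01 and q0 < 0.5, excluding chrUn."""
    return [
        c for c in cnvs
        if c["chrom"] != "chrUn"
        and c["end"] - c["start"] + 1 > min_size
        and c["p"] < max_p
        and c["q0"] < max_q0
    ]


def merge_to_cnvrs(cnvs):
    """Merge CNVs pooled over individuals that overlap by >= 1 bp into CNVRs."""
    regions = []
    for c in sorted(cnvs, key=lambda c: (c["chrom"], c["start"])):
        last = regions[-1] if regions else None
        if last and last["chrom"] == c["chrom"] and c["start"] <= last["end"]:
            # Overlap with the current region: extend it and record type and sample.
            last["end"] = max(last["end"], c["end"])
            last["types"].add(c["type"])
            last["samples"].add(c["sample"])
        else:
            regions.append({
                "chrom": c["chrom"], "start": c["start"], "end": c["end"],
                "types": {c["type"]}, "samples": {c["sample"]},
            })
    for r in regions:
        if len(r["types"]) == 2:
            r["class"] = "Both"
        else:
            r["class"] = "Loss" if "deletion" in r["types"] else "Gain"
    return regions


toy_cnvs = [
    {"sample": "DogSI1", "chrom": "chr1", "start": 10000, "end": 14000,
     "type": "deletion", "p": 0.001, "q0": 0.1},
    {"sample": "GW1", "chrom": "chr1", "start": 13000, "end": 20000,
     "type": "duplication", "p": 0.002, "q0": 0.0},
    {"sample": "GW2", "chrom": "chr1", "start": 50000, "end": 50500,
     "type": "deletion", "p": 0.001, "q0": 0.0},       # removed: shorter than 1 kb
    {"sample": "DogQI", "chrom": "chr2", "start": 5000, "end": 9000,
     "type": "duplication", "p": 0.2, "q0": 0.0},      # removed: P-value too high
    {"sample": "GW3", "chrom": "chr2", "start": 30000, "end": 36000,
     "type": "duplication", "p": 0.001, "q0": 0.2},
]
cnvrs = merge_to_cnvrs(filter_cnvs(toy_cnvs))
summary = [(r["chrom"], r["start"], r["end"], r["class"], len(r["samples"])) for r in cnvrs]
print(summary)
```

Keeping the set of contributing samples per region also makes it straightforward to separate singleton CNVRs from those shared by two or more individuals.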
Gene contents and gene ontology analysis
Dog gene IDs overlapping small Indels, SVs and CNVRs were retrieved from the Ensembl annotation [24]. All dog gene IDs were converted to human gene IDs using the dog-human orthology relationships obtained from Ensembl. Gene ontology (GO) analysis was performed using DAVID [29].
Visualization of structural genomic variation
We drew the physical distribution of CNVRs on chromosomes 1-38 and X chromosomes using VCStools [30]. RCircos package [66] was used to draw circular genetic maps for visualizing similarities and differences of positional relationships and genome structure between dog and wolf.
BAM: Binary Alignment/Map; BWA: Burrows-Wheeler Aligner; chrUn: Un-localized chromosome; CGH: Comparative genomic hybridization; CNVRs: Copy number variation regions; CNVs: Copy number variations; FC: Fertile Crescent; GATK: Genome Analysis Toolkit; GO: Gene ontology; gVCF: Genomic variant call format; GW: Gray wolf; Indels: Insertions and deletions; KEGG: Kyoto Encyclopedia of Genes and Genomes; QI: Qahderijani; SAM: Sequence Alignment/Map; SI: Saluki; SVs: Structural variants; 3'-UTR: Three prime untranslated region; 5'-UTR: Five prime untranslated region
Ethics approval and consent to participate
This study had Institutional Animal Care and Use Committee (Kunming Institute of Zoology, approval ID: SYDW-2013021) approval. We collected peripheral blood samples from 3 Iranian dogs with the consent of owners and 3 gray wolves after obtaining authorization for research from the Department of Environmental Protection in Iran (No. 93/34089, dated 14 October 2014).
Consent for publication
Not applicable.
Availability of data and materials
Data deposition: Raw sequence reads have been deposited in the Genome Sequence Archive (http://gsa.big.ac.cn/) under accession CRA0001324.
Competing interests
The authors declare no competing interests.
Funding
This research was funded by the National Natural Science Foundation of China (No. 91531303), the International Cooperation Program of the Bureau of International Cooperation, Chinese Academy of Sciences (No. GJHZ1559), and the Animal Branch of the Germplasm Bank of Wild Species, Chinese Academy of Sciences (the Large Research Infrastructure Funding). A.E. was supported by the Chinese Academy of Sciences President's International Fellowship Initiative (No. 2016VBA050). MSP and GDW appreciate the assistance of the Youth Innovation Promotion Association, Chinese Academy of Sciences.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Authors' contributions
G-DW, AE and M-SP conceived and planned the study. ZAG and MAF provided samples. G-DW prepared the genomic DNA of the six samples. ZAG and G-DW analyzed and interpreted the data. ZAG drafted the manuscript. M-SP and AE revised the manuscript. Y-PZ prepared the resequencing data and was the project leader. All authors have read and approved the final version of the manuscript.
Author details
1Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, PB 76169-133, Kerman, Iran
2Young Researchers Society, Shahid Bahonar University of Kerman, PB 76169-133, Kerman, Iran
3State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences No. 32 Jiaochang Donglu, Kunming, Yunnan, 650223, China.
4 State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming 650091, China.
¶ Co-first authors.
Acknowledgements
This research was carried out as part of a PhD thesis at Shahid Bahonar University of Kerman, Iran. We appreciate the sampling assistance of the dog owners and of staff from the Departments of Natural Resources in Tehran and Kerman, Kerman Zoo, Tehran Eram Zoo and Shiraz Zoo in Iran. We also thank Dr. Hosein Rashidi and Dr. Iman Memarian for their assistance in sampling wolves at Kerman Zoo and Tehran Eram Zoo, respectively.
Table 1 - Size distribution of CNVRs detected by CNVnator
| Summary statistic of CNVRs | Gain | Loss | Both (loss and gain) | Total |
|---|---|---|---|---|
| Number of CNVRs | 3916 | 6400 | 255 | 10571 |
| Total length (Mb) | 83.75 | 47.28 | 23.62 | 154.65 |
| Mean length (kb) | 21.39 | 7.39 | 92.62 | 14.63 |
| Median length (kb) | 11.70 | 4.49 | 38.99 | 7.05 |
| ≥1 kb to <5 kb | 555 (14.17%) | 60 (0.94%) | - | 3996 (37.80%) |
| ≥5 kb to <10 kb | 1119 (28.57%) | 3441 (53.76%) | 14 (5.49%) | 2706 (25.59%) |
| ≥10 kb to <20 kb | 1160 (29.62%) | 1573 (24.57%) | 45 (17.64%) | 2252 (21.30%) |
| ≥20 kb to <50 kb | 750 (19.15%) | 1047 (16.35%) | 189 (74.11%) | 1123 (10.62%) |
| ≥50 kb | 332 (8.47%) | 279 (4.35%) | 7 (2.74%) | 494 (4.67%) |
Table 2 - Sampling location and ecotypes
| Latitude and longitude of each location | Ecotype | Location | Sample ID | Sample |
|---|---|---|---|---|
| 35° 18′ 52″ N, 46° 59′ 32″ E | Saluki (Tazi) | Sanandaj, Iran | DogSI1 | Dog |
| 35° 52′ 22″ N, 47° 36′ 10″ E | Saluki (Tazi) | Bijar, Iran | DogSI2 | Dog |
| 32° 38′ 0″ N, 51° 39′ 0″ E | Qahderijani | Esfahan, Iran | DogQI | Dog |
| 34° 48′ 0″ N, 48° 31′ 0″ E | - | Hamadan, Iran | GW1 | Wolf |
| 35° 41′ 46″ N, 51° 25′ 23″ E | - | Tehran, Iran | GW2 | Wolf |
| 30° 17′ 0″ N, 57° 5′ 0″ E | - | Kerman, Iran | GW3 | Wolf |
Additional file 1: Supplementary.doc. Includes Tables S1-S15 and Figures S1-S18.
Additional file 2: Table S16. Genomic structural variants including insertions, deletions, translocations (inter and intra chromosomal) and inversions for three dogs.
Additional file 3: Table S17. Genomic structural variants including insertions, deletions, translocations (inter and intra chromosomal) and inversions for three wolves.
Additional file 4: Table S18. The total number of deletions, insertions, inversions, inter chromosomal translocations and intra chromosomal translocations, across the 6 individuals.
Additional file 5: Table S19. The total number of CNVRs.
Additional file 6: Table S20. CNV statistics for the canine autosomes and X chromosome.
Additional file 7: Table S21. Comparison with previous dog CNV studies