Sequencing output
Illumina Paired-end sequencing was performed for all 6 individuals (Additional file 1: Table S1 and Figure S1). After filtering, the range of total high-quality sequence data was from 42.1 Gb (Sample ID: #GW1) to 51 Gb (#DogQI), and the coverage varied from 14.51 (#GW1) to 17.15 (#GW2) (Additional file 1: Table S2). Relatively high mean sequencing depth can increase the accuracy of CNV calling through read depth method [1], and using the paired-end DNA sequencing reads together with the relatively long read length (by a uniform length of 125) will be useful to identify Indels [39, 54]. Here, to increase confidence of base calls and accuracy of detecting genomic variations, sequencing was done with relatively high mean depth for all samples (Additional file 1: Table S1). We also used uniform depth of coverage across individual genomes for increasing reliability of CNV calling (Additional file 1: Table S2).
SNP detection and annotation
SNPs were detected through aligning sequences to the reference genome. A total of 12.45 million SNPs were detected in all individuals (10.45 and 7.82 million SNPs were identified for all studied wolves and dogs, respectively) (Additional file 1: Table S3 and Figure S2).
We obtained the ratio of transitions to transversions (Ti/Tv) for the SNPs and the number of heterozygous and homozygous in the SNPs across the 6 individual genomes. The number of heterozygous SNPs was higher than the number of homozygous SNPs in 6 individuals. The Ti/Tv ratio in SNPs varied from 1.99 (#DogQI) to 2.07 (#GW3) (Additional file 1: Table S4). Figure 1 illustrates the proportion of SNPs present in each genomic regions, including intergenic, introns, exon, transcript, upstream, downstream, 3' untranslated regions (3'-UTR) and 5′ untranslated regions (5'-UTR). Our results indicate that most of the SNPs are located in the intergenic (53.57%) and intron (31.99%) regions (Additional file 1: Table S5). The total number of synonymous SNPs (silent SNPs, 68899) were more than the total number of non-synonymous SNPs (nonsense and missense SNPs, 46789) (Additional file 1: Table S6). Also, results of SNPs annotation showed that the proportion of SNPs in intron (31.85 vs 31.81) and intergenic (53.92 vs 53.52) regions in wolf genome was higher while the percentage of the SNPs in exon regions (0.81 vs 0.84) and 3'-UTR (0.43 vs 0.46) in wolf genome was lower than that in dog genome.
Small Indels detection, annotation and gene ontology
Indels were detected using aligning sequences to the reference genome. The number of Indels was calculated for all individuals (Additional file 1: Table S3). A total of 3.48 million Indels were detected across the 6 individual genomes, 2.24 million and 3.11 million of which were for 3 dogs and 3 wolves, respectively. We calculated the number of heterozygous and homozygous Indels across the 6 individual genomes (Additional file 1: Table S4). The proportion of heterozygous Indels (52.12) was higher than the proportion of homozygous Indels (47.59) in the 6 individuals. The total number of small insertions across the 6 individual genomes was 1.58 million and the total number of small deletions across the 6 individual genomes was 1.9 million (Additional file 1: Table S7). We drew the Indel length histogram for 3 dogs (Additional file 1: Figure S3), 3 wolves (Additional file 1: Figure S4) and across six individual genomes (Additional file 1: Figure S5). The results showed that the Indels of 1 bp in length across the 6 individual genomes had the highest frequency and the deletions of the same size were more frequent than the iinsertions. Annotation of the results from small Indels showed that most of the Indels are located in intergenic (53.79%) and intron regions (34.45%) (Additional file 1: Table S8). Of the total number of small Indels, 53.79, 34.778, 0.25, 0.002, 5.54, 4.95, 0.46, and 0.14% were located within the intergenic, introns, exon, transcript, upstream, downstream, 3'-UTR and 5'-UTR regions, respectively. The percentage of small Indels that are located in upstream, 5'-UTR, 3'-UTR, exon and transcript regions across 3 dog genomes was higher than that across 3 wolf genomes, but the percentage of Indels that are located in downstream, introns and intergenic regions across 3 wolf genomes was higher than that across 3 dog genomes. We obtained 21,104 genes from ensemble through the annotation of a total of 3.48 million small Indels. After, we carried out gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis for these genes (Additional file 1: Table S9). Gene Ontology (GO) analysis categorized genes related to small Indels in the three main classes (molecular function, biological process and cellular component) (Additional file 1: Table S9). The KEGG pathway analysis showed that two pathways related to cancer and Melanoma (usually but not always, a cancer of the skin) were enriched among the small Indels in both dog and wolf.
SVs detection, annotation and gene ontology
We obtained genomic SVs including insertions, deletions, tandem duplication, translocations (inter and intra chromosomal) and inversions from three dogs and three wolves (Additional file 1: Table S10; Additional file 2: Table S16, Additional file 3: Table S17 and Additional file 4: Table S18). To obtain potential functional roles related to the different types of SVs, all genes that were completely or partially overlapped with genomic regions including, Indels (insertion and deletion), inventions and complex SVs (inter and intra chromosomal translocations) were retrieved from Ensemble (Additional file 1: Table S11). Annotation of results from SVs showed that in general the percentage of coding sequences variants in dog genome is higher than that in wolf genome (Additional file 1: Figures S6-S13). Results of gene set enrichment analysis showed three enriched categories related to “covering molecular function”, “biological process” and “cellular component” (Additional file 1: Table S12). The most conspicuous cluster terms related to dog and wolf individuals were “cellular carbohydrate metabolic process (P-value, 0.04)” and “nervous system development (P-value, 0.03)”, respectively. We also identified some candidate genes associated with olfactory and immune systems (Additional file 1: Table S12).
CNV detection
We obtained putative CNVs for 6 individuals using CNVnator program and the mean number of CNVs per individual was 4143.83 ranging from 2871 to 5437 (Additional file 1: Table S13). For all of the autosomal CNVs categorized as gain, the mean copy number value of six individuals was 3.57 and the maximum copy number assessment was 174.472 on chromosome 7 (chr7) of wolf. The results showed that the number of gains in the three dog genomes was higher than those in the three wolf genomes (Additional file 1: Table S13). A total of 10571 CNVRs were obtained from overlapping of all CNVs across the 6 individuals (Additional file 5: Table S19), including 1-38 and X chromosomes, ranging in size from 1.05 kb to 3433.35 kb with an average of 14.63 kb and a median of 7.05 kb, covering 154.65 Mb, or 6.41%, of the assayed canFam3.1 genome (Table 1). CNVRs were divided into three groups, including 6400 loss, 3916 gain and 255 both (gain and loss) events (Additional file 5: Table S19). Deletion:duplication ratio in the total CNVRs was 1.96. Among all CNVRs, 6,105 (57.75%) were found in a single individuals (singleton), 1,522 (14.39%) shared in two individuals, and 2,944 (27.84%) shared in at least three individuals (Figure 2B). A number of 6702 (63.4%) CNVR events were less than 10 Kb while 494 (4.7%) of the CNVRs were longer than 50 kb in size (Table 1 and Figure 2A). The highest and lowest numbers of CNVRs belonged to chromosomes 18 and 35, respectively (Additional file 1: Figure S14 and Additional file 6: Table S20).
CNV annotation and gene ontology analysis
The annotation of results from CNVs showed that the percentage of CNVs in coding sequences (14% vs. 6%) and 3'-UTR (6% vs 0) in the dog genome was greatly higher than that in the wolf genome, but the percentage of CNVs in the intergenic regions (22% vs. 14%) in wolf genome was greatly higher than that in the dog genome (Additional file1: Figures S15 and S16). To achieve potential functional roles related to the putative CNVs, all genes that completely or partially overlapped with these CNVs were detected from Ensemble. A total of 8595 genes were retrieved, including 6703 of the CNVs. Results of gene ontology (Go) analysis showed that some general genes associated with olfactory and immune systems are enriched among the CNV gains in dog and wolf (Additional file 1: Table S14). All the terms related to olfactory system are over-represented (P-value < 0.01) in the wolf compared with those in the dog (Additional file 1: Table S14). The term “Starch and sucrose metabolism” is enriched in the dog CNV gains. Also, our result showed some categories including“cardiac conduction (P-value <0.03)”, “actin filament (P-value, 0.037)”, “muscle filament sliding (P-value, 0.02)”, “ATP binding (P-value, 3.46E-04)” and “calcium ion binding (P-value, 0.001)” are enriched among the CNV gains in the Saluki breed (Additional file 1: Table S14).
Comparison with previous dog CNV studies
To compare the identified CNVRs in this work with those of the published studies, all previous CNVR coordinates from canFam2 were migrated to canFam3 using the UCSC leftover program. In our results, 4454 CNVRs (42.1%) were overlapped by four previous studies, and the remaining 6117 (57.865%) were considered as novel CNVRs (Additional file 1: Table S15 and Additional file 7: Table S21).
Visualization of Structural Genomic Variation
For visualizing similarities and differences of positional relationships and genome structure between dog and wolf genomes, we drew maps of circular genomes for dog and wolf (Figure 3).
Identification of Deleterious Mutations
Population genetic processes due to reduced population size, such as inbreeding depression and bottlenecks, have a profound impact on the genetic makeup of a species including levels of deleterious variation [15, 34, 36]. Our results indicated that the proportion of deleterious mutations varied between wolf and doge chromosomes (Figure 4), and more deleterious mutations are in dog genome, compared with their wild ancestor.