Diversity of CNVs between different European horse breeds
Genetic polymorphisms play an important role in phenotypic diversification and speciation in equids [27-28]. Although diverse genetic variants underlying phenotypic variation have been successfully mapped (e.g. [12-13,29-30]), a large proportion of the horse genome remains poorly understood. Using HD-SNP array data from a large cohort of individuals across groups of phenotypically and ancestrally divergent horses, we showed that the CNV distribution across breeds had many commonalities (genomic location, gain or loss), but that some private CNVs were observed in particular genomic regions. Moreover, both the validation rate of CNVRs and the overall genotyping concordance rate of 82.5% indicate that the Axiom Analysis is a consistent method for CNV calling.
Principal component analysis showed remarkable variation among populations, in accordance with known breed divergence and history [26,31]. Comparable findings have also been reported in other domestic and livestock species [23,32-33]. Similar to the current study, previous studies have indicated clear distinctions in CNV frequency between breed groups or populations, possibly reflecting breed patterns of phenotypic diversity and the population history of different breeds, such as a change in past effective population size, gene flow, or selection [9,21,23]. The differences we observed also support the hypothesis that genetic variation from CNVs may contribute to breed phenotypic diversity, although they may equally result from the differing demographic histories and effective population sizes of the breeds [9,23].
With only minor exceptions, the distribution of SNP and segment CNVs differed little between breeds. The average percentage of the genome covered by the segment CNVs identified in the investigated breeds was small in relation to the reference breed (< 0.5%). In general, low CNV diversity is expected, as the segment CNVs identified in the present study covered an even smaller proportion of the genome than those previously reported in the same type of horse breeds [5,7,16-17]. The differences observed may be attributable to the different genetic background of the individuals, the sample size and the methodologies applied for CNV discovery. As observed in other studies with the same type of horse breeds, the largest number of shared CNVRs was found on ECA12 (e.g. ECA12:9,158,392-17,707,943 [5,7,16,18,20]). ECA12 is enriched with clusters of olfactory receptor genes, a feature also observed in other mammalian genomes, and it has been hypothesized to influence flight response and temperament diversity in horses. Similarly, we found CNVRs overlapping previously identified T-cell receptor and MHC class genes that exhibited high levels of diversity in one or more horse breeds of a similar type (e.g. on ECA1:154,857,175-156,876,500; ECA1:158,843,180-160,751,024; and ECA20:28,731,700-35,604,382) [5,7,16,18]. Interestingly, CNVRs on ECA1 overlapped with established QTLs for conformation in British ponies and reproductive traits in German Warmbloods (e.g. on ECA1:53,917,112-54,139,087 and ECA1:93,506,091-95,186,154 [34-35]), and CNVRs on ECA20 overlapped with QTLs for back conformation in American Saddlebred horses (e.g. on ECA20:41,994,712-43,192,412 and ECA20:43,299,741-43,568,400).
The largest number of CNVRs shared by all horse breeds studied overlapped with QTLs for white markings detected in the light draught Franches-Montagnes horses (ECA1:154,857,175-156,876,500) and for height at withers detected in British ponies (ECA7:43,0483,386-53,094,544; ECA8:0-6,769,072 and ECA12:9,158,392-17,707,943). Although no candidate genes have previously been reported in these regions, our findings suggest that the functions of CNV-enriched genes in horses fall into sensory perception, immunity, reproduction and exterior traits.
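The overlap between CNVRs and QTL intervals discussed above amounts to an interval intersection per chromosome. The following is a minimal sketch of that idea, not the pipeline used in this study. The two CNVR coordinates are taken from the text; the QTL intervals, the `Region` type and both function names are hypothetical and serve only as illustration. Coordinates are assumed to be 1-based and inclusive.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (assumption)
    end: int    # 1-based, inclusive (assumption)
    label: str


def overlap_bp(a: Region, b: Region) -> int:
    """Number of base pairs shared by two regions (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_overlaps(cnvrs: List[Region], qtls: List[Region]) -> List[Tuple[str, str, int]]:
    """Return (CNVR label, QTL label, shared bp) for every overlapping pair."""
    hits = []
    for cnvr in cnvrs:
        for qtl in qtls:
            shared = overlap_bp(cnvr, qtl)
            if shared > 0:
                hits.append((cnvr.label, qtl.label, shared))
    return hits


# CNVR coordinates as quoted in the text.
cnvrs = [
    Region("ECA1", 154_857_175, 156_876_500, "CNVR_ECA1"),
    Region("ECA12", 9_158_392, 17_707_943, "CNVR_ECA12"),
]

# Hypothetical QTL intervals, invented for illustration only.
qtls = [
    Region("ECA1", 155_000_000, 155_500_000, "QTL_example_A"),
    Region("ECA12", 20_000_000, 21_000_000, "QTL_example_B"),
]

hits = find_overlaps(cnvrs, qtls)
for cnvr_label, qtl_label, shared in hits:
    print(f"{cnvr_label} overlaps {qtl_label} by {shared} bp")
```

The nested loop is adequate for a few hundred regions; for genome-wide sets, a sorted sweep or an interval tree would be the usual choice.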
Our results also support the hypothesis that high-frequency private SNP CNVs in particular (e.g. on ECA25 in Exmoor ponies) may be involved in population-specific selection and adaptation [21-22]. This provides further evidence that CNVs in these regions may represent a substantial source of genetic variation for diverse phenotypes and biological processes, although further analysis is required to confirm phenotypic changes. Additionally, a greater number of segment CNV gains than losses was observed, potentially reflecting the large number of losses in the reference native Belgian draught horse. However, this may also reflect the fact that duplications of coding sequences potentially enhance an organism's genetic diversity, phenotypic variation and adaptation potential [38-39].
Diversity of CNVs in breed clusters
The large cohort of individuals analysed in this study (~195 horses on average per breed), in comparison with other studies where breeds are represented by one or two horses, is likely to provide a more accurate overview of single private CNVs. The PCA of the SNP and segment CNV distribution according to the three breed clusters confirmed similar frequency levels within the three breed clusters. This is not surprising given that the largest number of shared CNVs was previously reported between closely related breeds such as Warmbloods.
Our results also showed that the proportion of breed cluster-specific SNP CNVs differed between groups. For instance, the Draught and Warmblood clusters displayed a relatively high proportion of unique private SNP CNVs (up to 30-50%). However, a smaller proportion of SNP CNVs may be attributable to breed-specific characteristics (less than 14%). Such differences in unique CNVs in ancestrally divergent Equidae members have also been previously reported. As an example, Doan et al. detected higher proportions of certain specific CNVs in donkeys (35%) and miniature horses (24%), out of the total breed-specific CNVs across 17 different Equidae species, which has been attributed to larger divergences relative to the Thoroughbreds.
Interestingly, we also identified breed group-specific SNP CNVs located in particular genomic regions which may be attributable to breed-group features. For instance, specific SNP CNV gains were detected on ECA1 (163,032,489-163,181,822) in more than 20% of the individuals belonging to the Friesian horse cluster, and more than 20% of the individuals belonging to the Draught cluster showed specific SNP CNV losses on ECA7 (34,601,429-34,608,700). However, all these private SNP CNVs reside in intergenic regions or merge into CNVRs that partially overlap with previously identified regions in the Quarter horse. Nevertheless, the SNP CNV gains identified on ECA9 (40,822,784-40,867,675) in more than 20% of the individuals belonging to the Warmblood cluster reside in a region which contains two genes, RAD54 homolog B (RAD54B) and Reactive intermediate imine deaminase A (RIDA). The latter is regarded as a potential candidate gene for athletic performance since it is involved in metabolic processes and has been related to blood protein levels in humans. CNV polymorphisms in these regions may represent a substantial source of genetic variation of high value for future genetic association analyses.
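The "more than 20% of individuals in one cluster" criterion used above can be illustrated with a small frequency filter. This is a sketch under stated assumptions rather than the procedure applied in this study: a CNV is treated as cluster-specific if it is carried by more than `min_freq` of the individuals in one cluster and by no individual in any other cluster. The function name, the CNV identifiers and the toy genotypes are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def cluster_specific_cnvs(
    calls: Dict[str, List[Set[str]]], min_freq: float = 0.20
) -> Dict[str, Set[str]]:
    """calls maps a cluster name to one set of CNV ids per individual.

    Returns, per cluster, the CNVs carried by more than min_freq of its
    individuals and absent from every other cluster (assumed definition).
    """
    # Count carriers of each CNV within each cluster.
    carriers = {
        cluster: Counter(cnv for individual in individuals for cnv in individual)
        for cluster, individuals in calls.items()
    }
    result = {}
    for cluster, individuals in calls.items():
        n = len(individuals)
        # Every CNV observed at least once outside the current cluster.
        elsewhere = set().union(
            *(carriers[other].keys() for other in calls if other != cluster)
        )
        result[cluster] = {
            cnv
            for cnv, count in carriers[cluster].items()
            if n > 0 and count / n > min_freq and cnv not in elsewhere
        }
    return result


# Toy genotypes, invented for illustration only (5 individuals per cluster).
toy_calls = {
    "Warmblood": [{"ECA9_gain", "shared"}, {"ECA9_gain", "shared"}, {"shared"}, set(), set()],
    "Draught": [{"ECA7_loss"}, {"ECA7_loss", "shared"}, {"rare"}, set(), set()],
}

specific = cluster_specific_cnvs(toy_calls)
print(specific)
```

In the toy data, `shared` is excluded because it occurs in both clusters, and `rare` is excluded because one carrier in five is exactly 20%, which does not exceed the threshold.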
PANTHER analysis of genes underlying the CNVR and novel CNVR discovery
The evolutionary process of species formation (speciation) is complex and influenced by fast-evolving changes in specific regions of the genome (i.e. CNVRs driving novel gene functions), which may affect key regulatory biological mechanisms and play a fundamental role in gradual adaptation to different environments [1, 22]. In the last 10 years, substantial progress has been made in understanding the functional impact of structural variants in several species, with a focus on population diversity. Enrichment of CNVRs in genes related to immune response, brain development, metabolic processes, and sensory perception of smell or chemical stimuli has been reported in global populations of humans as well as pigs, dogs, cattle and horses [2, 4, 8, 18, 23, 41]. In this sense, our results also indicate that CNVRs are located in specific genomic regions and are involved in important biological processes in mammals such as immunity. As horse populations have gone through strong and diverse selection since horse domestication, these findings could be expected and may indicate favourable selection of structural variants associated with specific traits (e.g. insect bite hypersensitivity (IBH) in the Friesian horses). However, the association of CNVs with certain private traits in horses needs further exploration.