Deleterious mutation load in diverse cattle breeds of the world

Domestication of wild animals reduces the effective population size, and this could affect the deleterious mutation load of domesticated breeds. Furthermore, artificial selection also contributes to the accumulation of deleterious mutations because of the increased rate of inbreeding among these animals. The process of domestication, founder population size, and artificial selection differ between cattle breeds, which could lead to variation in their deleterious mutation loads. We investigated this using whole genome data from 432 animals belonging to 54 cattle breeds of the world. Our analysis revealed a negative correlation between genomic heterozygosity and the ratio of amino acid changing diversity to silent diversity. This suggests a proportionally higher number of amino acid changing single nucleotide variants (SNVs) in breeds with low diversity. Our results also showed that breeds with low diversity had more high-frequency (DAF > 0.51) deleterious SNVs than those with high diversity. The reverse trend was observed for low-frequency (DAF ≤ 0.51) deleterious SNVs. Overall, taurine cattle breeds had more high-frequency deleterious SNVs than indicine (or taurine-indicine hybrid) breeds. However, within taurine breeds, European and Northeast Asian taurines had more high-frequency deleterious SNVs than East Asian and African taurine breeds. Similarly, within indicine breeds, South Asian indicines had more high-frequency deleterious SNVs than East Asian indicine breeds. All of these patterns were reversed for low-frequency deleterious SNVs. Some of the variation in the deleterious mutation load observed between breeds could be attributed to the population sizes of the wild progenitors before domestication. However, the variation observed within taurine and within indicine breeds could be due to differences in the extent of inbreeding, the strength of artificial selection and/or the founding population size.
The findings of this study imply that the rate of incidence of genetic diseases might vary between cattle breeds.


Background
Theories of population genetics predict that at low frequencies deleterious Single Nucleotide Variants (SNVs) can contribute significantly to heterozygosity. In contrast, they are prevented from reaching high frequencies and are eventually eliminated by purifying selection [1]. Domestication of wild plants and animals results in a population bottleneck, because only a small subset of the wild population is typically sampled to form the founder stock [2]. Artificial selection for desired traits and inbreeding also lead to a further reduction in the effective population size during breed formation [3]. For these reasons, domesticated plants and animals are expected to accumulate an excess of deleterious mutations compared to their wild types. A number of previous studies investigated this by comparing the deleterious mutational loads of wild and domesticated plants and animals. The ratio (ω) of the diversities of amino acid changing (nonsynonymous) and silent (or synonymous) SNVs was used as the measure of deleterious mutational load. In a previous study, ω was found to be much higher in domesticated pig breeds than in wild pigs [4]. The ω estimated for commercial white layer chickens was found to be much higher than that observed for wild African village chickens (putatively close to jungle fowl) [4]. Similarly, much higher ω values were observed for domesticated breeds of horse [5], dog [6], rabbit [7] and silkworm [7] than for their wild relatives.
Studies on plants also showed higher ω for cultivated crops such as rice [2,8], soybean [9], cassava [10], and sunflower [11] compared to their wild progenitors.
Theories also predict that when population size declines, harmful mutations are elevated to high frequencies by genetic drift [12]. For instance, human migration out of Africa resulted in a series of bottlenecks in non-African populations as they were successively subsampled along the migratory route. This resulted in a higher proportion of high-frequency and homozygous deleterious SNVs in non-Africans compared to Africans [13,14]. Since the process of domestication also introduces bottlenecks, a similar pattern is expected in domesticated animals. This was confirmed by a previous study that compared the exomes of dog breeds and wild wolves and found a higher proportion of homozygous deleterious SNVs in dogs than in wolves [6]. Similarly, domesticated yak populations were reported to have a higher number of homozygous deleterious amino acid changing SNVs than wild yaks [15]. High-frequency deleterious variants causing diseases such as retinal degeneration in European cattle breeds have been attributed to the process of domestication and artificial selection [16].
Global cattle breeds predominantly derive from Bos taurus, Bos indicus or their hybrid Bos taurus × Bos indicus [17]. Therefore, the heterozygosity as well as the deleterious mutational load in different breeds could also depend on the ancestral source species/population [18]. In addition, differences in the degree and patterns of artificial selection and in the rate of inbreeding could also contribute to the deleterious mutation load of different cattle breeds [3]. The present study estimated the deleterious mutational load in various cattle breeds and investigated the potential contributions of the above-mentioned factors by analysing whole genome data from 432 animals belonging to 54 distinct breeds of the world.

Genome data
Whole genome data from 108 cows and 314 bulls were obtained from the Bovine Genome Variation Database [19]. These animals belong to 54 breeds including those from Europe (Central and Western Europe), Northeast Asia (Japan and South Korea), Africa (Western Africa and Guinea), the Middle East (Iran), South Asia (India, Pakistan, and Sri Lanka) and East Asia (China). The number of individuals in each breed varies from 1 to 45 (for details, see Supplementary Table S1). To orient the direction of mutations and to find the derived SNVs, the whole genome data of the American Bison (Bison bison) was used. For this purpose, the whole genome LASTZ alignment of the Cow and American Bison pair was downloaded from the Ensembl genome data resource (ftp://ftp.ensembl.org/pub/release-102/maf/ensemblcompara/pairwise_alignments/btau_ars-ucd1.2.v.bbbi_bison_umd1.0.lastz_net.tar.gz). Using the chromosomal positional coordinates, the corresponding Bison nucleotide was obtained for each SNV, and this was used to determine the orientation of the mutational change.
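The orientation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the treatment of SNVs whose Bison base matches neither cattle allele (returned as undetermined), and the input layout are hypothetical and are not taken from the study's own pipeline.

```python
def polarize(ref, alt, outgroup):
    """Orient a biallelic SNV using the outgroup (Bison) base.

    Returns (ancestral, derived), or None when the outgroup base matches
    neither allele (e.g. an alignment gap or a third base).
    Assumption: such sites are simply left unoriented.
    """
    outgroup = outgroup.upper()
    if outgroup == ref:
        return ref, alt
    if outgroup == alt:
        return alt, ref
    return None


def derived_allele_frequency(ref, alt, outgroup, alt_count, n_chromosomes):
    """Derived allele frequency (DAF) of one SNV, or None if unoriented."""
    oriented = polarize(ref, alt, outgroup)
    if oriented is None:
        return None
    _, derived = oriented
    alt_freq = alt_count / n_chromosomes
    # If the reference allele is the derived one, flip the frequency.
    return alt_freq if derived == alt else 1.0 - alt_freq


if __name__ == "__main__":
    # Bison carries the reference base, so the alternate allele is derived.
    print(derived_allele_frequency("A", "G", "A", alt_count=2, n_chromosomes=8))
    # Bison carries the alternate base, so the reference allele is derived.
    print(derived_allele_frequency("A", "G", "G", alt_count=2, n_chromosomes=8))
```

Leaving unmatched sites unoriented is one conservative choice; other handling is possible and the source does not say which was used.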

Functional annotations
To identify amino acid changing (nonsynonymous) SNVs and silent (synonymous) SNVs the genome annotation file containing the functional consequences information was obtained from

Data analysis
Nucleotide diversity () per base was estimated using the following equations [22] = − 1 where pi is the allele frequency of SNV i, S is the total number of SNVs in the whole genome or exome, n is the number of chromosomes sampled and L is the number of sites or bases in the genome, synonymous or nonsynonymous positions. The ratio () was estimated as = where N and S are nonsynonymous and synonymous nucleotide diversities. Only biallelic SNVs are used in the analysis. To test the significance between mean estimates the Z-test was used and the Spearman rank correlation was used to estimate the strength of relationships.

Results
To examine the pattern of genomic variation, the whole genome nucleotide diversity was estimated for 54 breeds. The X-axis in Figure 1 shows that the taurine breeds typically had [...]
The ω estimates indirectly revealed the nonsynonymous deleterious mutation loads in various cattle breeds. We then estimated the actual counts of derived deleterious nonsynonymous SNVs. Deleterious nonsynonymous SNVs were identified using a GERP score threshold of >2.0. The counts of deleterious SNVs varied from 680 to 870 per individual and were plotted against the whole genome diversity of cattle breeds (Figure 2A).
Although this relationship was significant (Spearman ρ = 0.71, P < 0.00001), the maximum difference in the number of deleterious SNVs among breeds was only 20%.
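The per-individual count described above can be illustrated with a short Python sketch. The GERP threshold of 2.0 comes from the text. Everything else is an assumption made for illustration: the record fields (`id`, `consequence`, `gerp`), the genotype encoding as the number of derived allele copies, and the rule that an individual is counted as carrying an SNV when it has at least one derived copy.

```python
GERP_THRESHOLD = 2.0  # threshold stated in the text


def is_deleterious(variant, threshold=GERP_THRESHOLD):
    """A nonsynonymous SNV with a GERP score above the threshold."""
    return (variant["consequence"] == "nonsynonymous"
            and variant["gerp"] > threshold)


def count_deleterious(variants, derived_copies):
    """Count deleterious SNVs carried by one individual.

    variants      : list of dicts with hypothetical keys
                    'id', 'consequence' and 'gerp'
    derived_copies: dict mapping SNV id -> derived allele copies (0, 1 or 2)
    Assumption: an SNV is counted once if at least one derived copy is present.
    """
    return sum(
        1
        for v in variants
        if is_deleterious(v) and derived_copies.get(v["id"], 0) > 0
    )


if __name__ == "__main__":
    variants = [
        {"id": "snv1", "consequence": "nonsynonymous", "gerp": 3.1},
        {"id": "snv2", "consequence": "nonsynonymous", "gerp": 1.2},
        {"id": "snv3", "consequence": "synonymous", "gerp": 4.0},
        {"id": "snv4", "consequence": "nonsynonymous", "gerp": 2.5},
    ]
    individual = {"snv1": 1, "snv2": 2, "snv3": 1, "snv4": 0}
    print(count_deleterious(variants, individual))
```

Counting derived allele copies instead of SNVs (so that homozygotes count twice) is an equally plausible convention and would change only the summation line.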
To investigate the deleterious mutation load based on allele frequencies, the deleterious SNVs were separated into two groups. One group consisted of deleterious SNVs with DAF (Derived Allele Frequency) ≤0.51 and the other comprised those with DAF >0.51. The correlations of these two groups with genomic diversity revealed contrasting patterns. Figure 2B reveals a significant positive correlation (Spearman ρ = 0.96, P < 0.00001) between the number of low-frequency deleterious SNVs per genome and genomic diversity, whereas a significant negative correlation (Spearman ρ = -0.92, P < 0.00001) was observed for the high-frequency deleterious SNVs.
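The frequency split and the rank correlation used here can be sketched as follows. The DAF cut-off of 0.51 is taken from the text. The helper names are hypothetical, and the Spearman coefficient is computed from scratch as the Pearson correlation of average ranks, which is the standard definition with ties but is not necessarily the software the study used. P-values are not computed in this sketch.

```python
DAF_THRESHOLD = 0.51  # cut-off stated in the text


def split_by_daf(dafs, threshold=DAF_THRESHOLD):
    """Split derived allele frequencies into low (<=) and high (>) groups."""
    low = [d for d in dafs if d <= threshold]
    high = [d for d in dafs if d > threshold]
    return low, high


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    low, high = split_by_daf([0.10, 0.51, 0.60, 0.95])
    print(len(low), len(high))
    # Toy per-breed values: diversity versus count of low-frequency SNVs.
    print(spearman_rho([0.001, 0.002, 0.003, 0.004], [300, 350, 420, 500]))
```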
The preceding analyses revealed differences in the mutation load among cattle breeds. To examine the load patterns specific to groups of breeds, the breeds were grouped into six categories based on their geographical locations and ancestral source populations, as shown in Figure 3. [...] higher counts of high-frequency deleterious SNVs than their East Asian counterparts (P < 0.00001), and this pattern was reversed for the low-frequency deleterious SNVs (P = 0.00011).

Discussion
The whole genome diversity estimated in this study varied significantly between the cattle breeds but was very similar to that reported by a previous study [23]. The correlation between diversity and ω suggests a higher nonsynonymous mutation load in breeds with low diversity than in those with high diversity. This result is similar to the correlations observed between diversity and ω estimated for various dog breeds [6], and for domestic breeds of rabbit, pig and chicken [7]. Furthermore, the higher ω observed for many domestic crop varieties and animal breeds compared to their wild relatives also supports our results [2, 4-11, 15, 24]. This is because those studies showed that the diversity of the domesticated breeds or varieties was always smaller than that of their wild relatives. Analysis based on the actual number of deleterious nonsynonymous SNVs revealed a higher number of high-frequency deleterious SNVs in breeds with low diversity [6,15]. This is due to genetic drift, which elevates the frequency of SNVs following bottlenecks, inbreeding and artificial selection. Previous studies on dog breeds and yaks showed a higher number of homozygous deleterious SNVs in domesticated canines or yaks than in their respective wild relatives [6,15]. Since homozygous SNVs represent high-frequency variants, our results are consistent with those reported by the above-mentioned studies.
The deleterious SNVs estimated for groups of breeds showed that, on average, taurine cattle breeds have a higher mutational load and more high-frequency deleterious SNVs than indicine breeds. This could be due to the difference in the effective population sizes of their progenitors before domestication, as suggested previously [18,25]. Alternatively, this could also be due to a difference in the severity of the bottlenecks that occurred during their respective domestication.
The mutation loads and the number of high-frequency deleterious SNVs also vary significantly within taurine and within indicine breeds. For instance, Northeast Asian taurine breeds have a higher number of high-frequency deleterious SNVs and a lower number of low-frequency deleterious SNVs than East Asian taurines. Similarly, South Asian indicine breeds have a higher number of high-frequency deleterious SNVs and a lower number of low-frequency deleterious SNVs than East Asian indicines. These differences could be attributed to the practices followed during breed formation [3]. For example, the strength of artificial selection, the extent of inbreeding and the number of founders selected could vary significantly among breeds.

Conclusion
Our study revealed a higher mutation load in cattle breeds with low genomic diversity compared to those with high diversity. The results also showed a higher number of high-frequency deleterious SNVs in the former than in the latter. These results suggest that diversity, deleterious mutation load and the frequency of deleterious mutations are determined by the effective population sizes of the breeds, as predicted by population genetic theories. While we found a higher mutational load in taurine breeds compared to indicines or hybrids, the loads did vary within taurine breeds owing to differences in their population sizes. These results have implications for understanding the health of different cattle breeds, as the mutations causing genetic diseases and their frequencies are expected to vary between them. For instance, the rate of incidence of genetic diseases caused by recessive homozygous variations could potentially be higher in breeds that have small effective sizes.

Supplementary information
Supplementary information accompanies this paper at https ://doi.