Inbreeding estimations based on runs of homozygosity revealed genes associated with local adaptation and dairy traits in Iranian water buffaloes, Bubalus bubalis

Background Consecutive homozygous fragments of the genome that inherited from a common ancestor to offspring are known as runs of homozygosity (ROH). The ROH stretches can be used to estimate the genomic inbreeding and also for identifying genomic regions under extensive selection pressure. The aims of this study were (i) to estimate inbreeding based on ROH (FROH); (ii) to compare FROH with FHOM (excess of homozygosity), FUNI (correlation between uniting gametes), and FGRM (genomic relationship matrix); and (iii) to identify frequently occurred ROH islands (i.e., shared by >20% of the population) for gene enrichment analyses. A dataset consisted of 396 buffalo including Azeri (N = 281) and Khuzestani (N = 115) breeds genotyped for ~65000 SNP were used.


Conclusion
Genes within common ROH spots present a modest phenotypic selection for milk production traits in Khuzestani buffalo. Further, our results suggest that natural selection may have a role in forming the genome of Iranian buffaloes in order to adapt them to the harsh environment of the country.

Background
There are two main species of buffalo: Asian buffalo which is often called water (or river) buffalo, and African wild buffalo (Syncerus) [1,2] . In many tropical and subtropical countries, water buffalo is raised for both meat and milk production [3]. The worldwide buffalo population is only about 11% that of cattle. However, compared to other domesticated species, the world's buffalo population has raised in the last five decades bỹ 1.65% annually [4], for which India, Pakistan, and Europe with 5.3%, 4.8%, and 4.5% have the highest rates of annual increase, respectively. [5].
Domestication of Bubalus bubalis occurred around 5000 years ago [1]. The three major Iranian buffalo breeds are Azeri (AZ), Khuzestani (KHZ), and Mazandarani (MZ), each belongs to different geographical zone [1,4]. The AZ, KHZ, and MZ are common in the north-west (and north), west (and southwest), and north of the country, respectively.
Buffalo is of great economic and cultural importance in Iran and other developing nations for their milk production because of having such unique properties as resistance to local parasites, high ability to make use of low-quality feed, and long productive lifespan.
However, according to Safari, Hossein-Zadeh, Shadparvar and Arpanahi [1] the total number of buffalo population in Iran in 1930 was estimated 1500 thousand heads which decreased to 48 thousand by 1995, and this trend is not similar with the global buffalo population growth.
The inbreeding coefficient (F) is the most prevalent parameter in use in breeding programs because it can be estimated from pedigree records. However, the access to high-density SNP markers makes it possible to estimate the F based on genome-wide autozygosity [6]. Autozygosity occurs when parents pass identical chromosomal fragments, which they inherited from a common ancestor, to their offspring [7], and this is described as runs of homozygosity (ROH) [8]. The ROH has been suggested as a reliable way as it can distinguish between homozygous tracks that are IBD (identical by descent) from those non-autozygous IBS (identical by state) segments in the genome [6]. So, it has previously been reported that molecular methods of calculating the inbreeding such as F ROH can eliminate the drawbacks of using pedigrees to estimate inbreeding [9,10].
Also, studying ROH may give us a better understanding about the directions in which genetic selection has occurred as it leaves footprints on the genome. [11]. ROH islands that frequently occurred among individuals may contain genes associated with different traits of interest so that they can be interesting in animal genetics [11,12]. In other words, recombination occurs randomly, and the distribution of ROH is expected to be random over the genome. So, frequently occurring ROH in some regions of the genome may be a representative of selection pressure on those spots. The distribution and the occurrence of ROH have been studied in humans [13], cattle [9,14], pigs [15], sheep [16][17][18] but rarely in species such as buffaloes. In the current study, we aimed to estimate autozygosity in the genome of Azeri and Khuzestani buffalo breeds and to study ROH spots that frequently occurred among the individuals. Moreover, we provided a comparison of F ROH with other molecular inbreeding estimation methods such as F GRM , F HOM , and F UNI in these populations.

Material & Methods
Sample collection, ethical statement, and data quality control  [4].
For the 90K chip, SNPs were selected using buffalo DNA sequence [2,19]. However, since the draft of buffalo genome [20] does not have the appropriate quality [2,19], the SNPs were aligned to the cattle reference genome (UMD_3.1) [21]. Genotypes were obtained through AffyPipe [22]; then all the monomorphic and polymorphic SNPs with highresolution (n=64,750) were first maintained. According to the filtration criteria, samples with more than 5% missing genotype and SNPs with 5% missing rate were eliminated from further analyses. SNPs with unidentified position in the UMD_3.1 assembly, with MAF > 2%, and with HWE chi-square test P-value of lower than 0.000001 were filtered out. In total, 64,711 variants and 396 samples with an average call rate of 99.6% passed filters and QC.

Genetic distance among breeds
The genetic distance which is based on the IBS matrix was estimated through the -ibs command implemented in PLINK v1.09 [23]. Principal Component (PC) analysis of genetic distances was performed to visualize the genetic relationship among samples, and the graphical representation was displayed using R (http://www.R-project.org/). We removed four samples including two from each breed which were placed outside their expected breed cluster.

Runs of homozygosity analyses
Analyses of ROH were conducted for Azeri (n=279) and Khuzestani (n=113) populations, separately. For each individual within their breed cluster, ROH segments were identified using PLINK v1.09 according to the following criteria: (1) In order to avoid short ROH that where l is the number of SNP allowed in ROH, α set to be 0.5 which represent the false positive ROH; n a and n s represent the numbers of animal per breed and the number of SNP per individual, respectively; and het is the average heterozygosity for each breed. In the current study, l was calculated to be 40 and 38 for Azeri and Khuzestani breed; (4) each ROH should have at least 1 SNP over 100 Kb; (5) max gap between two neighboring SNPs that were in ROH was set 1 Mb. Then we categorized detected ROH into five groups including 1 to 2, 2 to 4, 4 to 8, 8 to 16 and >16 Mb using the same classification that reported by other authors [9,24]. After that, the frequency and the mean length (Mb) of ROHs within each category, the percentage of each ROH category, and the percentage of genome coverage by each ROH category were calculated, separately for each breed, through R software (http://www.R-project.org/).

Inbreeding coefficient estimations
The coefficient of inbreeding were estimated through ROH (F ROH ), excess of homozygous genotypes (F HOM ), correlation between uniting gametes (F UNI ) and genomic relationship matrix (F GRM ). F ROH (Eq. 2) was calculated per individual as following [25]: in which F R O H i is the inbreeding for the i th animal; n and L R O H j are the total number of ROHs and the length of the j th ROH for the i th animal, respectively; and L aut is the total autosome length covered by the SNP markers (i.e., 2.5 Gb in the present study).
Alternatively, we calculated genomic inbreeding through the -ibc command implemented in GCTA software [26] which returns three different inbreeding estimations including F GRM (Eq. 3), F HOM (Eq. 4) and F UNI (Eq. 5) where x i and p i are the number of copies and the frequency of the first allele for the i th genotype, respectively; and h i is 2p i (1-2p i ). The Pearson's correlation coefficient among these four inbreeding measures were also calculated.

Frequently appeared ROH islands and gene enrichment analyses
To identify genomic regions which were frequently covered with ROH, the frequency that each SNP occurred in ROH was calculated for each breed. The percentage that each SNPs has occurred in ROH were depicted against their position along the autosomes which were mapped against the cattle reference genome (UMD3.1). The ROHs that had a frequency of > 20% were considered as prospective genomic regions. To identify genes inside these spots, we used the UMD3.1 map viewer from NCBI website (https://www.ncbi.nlm.nih.gov/mapview/). Then to identify significantly enriched gene ontology terms (P < 0.05) using the list of genes that had previously been obtained based on the Bos taurus annotation file, we used DAVID v6.8 tool [27,28]. Finally, we did an extensive literature review to explore the biological function of annotated genes inside the ROH regions.

Results And Discussion
The objective of this research was to estimate genome-wide inbreeding in the genome of AZ and KHZ buffalo breeds using different methods, as well as to identify ROHs which shared by more than 20% of samples within each breed for further gene enrichment analyses. AZ and KHZ, which are the two major buffalo breeds in Iran, adapted to distinct geographical areas. [4]. As shown in Fig. 1 principal component analyses of the IBS matrix derived from SNP data shows two separate populations with no overlap between them which means that they are genetically distinct. However, population structure analyses reported by Mokhber, Moradi-Shahrbabak, Sadeghi, Moradi-Shahrbabak, Stella, Nicolzzi, Rahmaninia and Williams [4] has already shown a significant admixture between two breeds.

Runs of homozygosity
In the present study, we defined ROHs as lengths of homozygous genotypes that had >1Mb length but only up to one heterozygous genotype. Due to linkage disequilibrium, short homozygous haplotypes are prevalent in the bovine genome. Hence, we set a min length of 1 Mb and a min number of 40 (AZ) and 38 (KHZ) SNP for ROH detection in order not to consider those small and prevalent haplotypes in ROH. Unlike human populations, livestock species have higher levels of autozygosity and longer ROHs [9,14,25]. In addition, genotyping error can always occur. Therefore, allowing at least one heterozygous SNP seems more reasonable [16,17,24] as it avoids underestimating of long ROHs.
In total, 10,247 ROH were detected, from which 6496 ROH were in the AZ genome and 9 3751 in the KHZ genome ( Table 1). The average F ROH> 1Mb was 0.047±0.048 in the AZ and 0.06±0.04 in the KHZ breed. The average number of ROH per individual was 23.3 in AZ (ranging from 4 to 88) and 33.2 in KHZ (ranging from 4 to 132). These results illustrate that all the individuals of AZ and KHZ breed had at least 4 ROH longer than 1Mb (data not shown). As illustrated in Table (1), more than 53% of the detected ROH were of length 2-4 Mb. The percent of different length of ROH can be a reliable indicator of the number of past generations in which inbreeding has occurred. This is due to the recombination event.
In fact, recent inbreeding results in longer ROH because the recombination event requires sufficient time (generations) to disintegrate those long IBD stretches. On the other hand, short ROHs are due to ancient inbreeding because during continuous meiosis long IBD segment would have been broken down [13]. Surprisingly, 100% of the samples involved in our study displayed not less than one ROH with a length between 2-4 Mb (data not shown), representing that inbreeding event has occurred around 20 generations ago [6].
However, this results should be interpreted with cautious. As reported by Ferenčaković, Sölkner and Curik [29], a medium-density chip can result in the overestimation of the number of short-length ROHs (>4 Mb) probably because the searches cannot find the heterozygous genotype on the medium-density panel that appeared on the denser one.

Evaluation of different methods of genomic inbreeding
The mean inbreeding coefficient estimated through different methods were described in Table 2. While the KHZ presented lower inbreeding values for F HOM , F UNI , and F GRM (ranging from 0.019 to 0.021) than did AZ breed (0.03) for the same methods, F ROH was higher for KHZ (0.059) than AZ breed (0.047) ( Table 2). Compared to F ROH , inbreeding estimators such as F HOM , F UNI , and F GRM show noticeable variation among different breeds as they depend on allele frequency [10]. Our result was not surprising because there are differences in the minor allele frequency distribution among two buffalo breeds, and SNPs with low MAF (<0.2) are more frequent in KHZ genome ( Figure S1).
Pearson's correlation between F ROH and the other inbreeding estimators were represented in Table 3. High correlations were observed between F ROH and F UNI (0.94 and 0.98 for KHZ and AZ, respectively), as well as between F ROH and F HOM (0.93 for both breeds) ( Table 3).
In the literature, low to high correlation between F ROH and F UNI (ranging from 0.15 to 0.80) was reported by Zhang, Calus, Guldbrandtsen, Lund and Sahana [10]. In addition, close correlations between F ROH -F HOM ranging from 0.83 to 0.95 was described by Mastrangelo, Tolone, Di Gerlando, Fontanesi, Sardina and Portolano [14] for Italian dairy cattle breeds.
Another high correlation (0.84) between these two inbreeding estimator was reported by Zhang, Young, Wang, Sun, Wolc and Dekkers [15] which was based on ROHs >5Mb. There are also considerable variation in correlations between F ROH and F HOM reported in the literature (0.06, 0.35 and 0.61) for three dairy cattle breeds [10]. Table 3, we obtained modest to strong correlations between F ROH -F GRM (0.78 and 0.86 for KHZ and AZ, respectively) which was the lowest, among others. In Holstein cattle, the correlation of 0.81 was reported by Bjelland, Weigel, Vukasinovic and Nkrumah [30]. However, small to modest correlations between F ROH -F GRM (ranging from 0.17 to 0.42) was also reported by Mastrangelo, Tolone, Di Gerlando, Fontanesi, Sardina and Portolano [14] in Italian dairy breeds, and a moderate correlation (0.62) by Pryce, Haile-Mariam, Goddard and Hayes [31] in Holstein and Jersey populations. The variation among different results reported in the literature for correlation between F ROH and F GRM is most likely due to the fact that F GRM depends strongly on the allele frequencies, and especially populations with different allele frequency are expected to illustrate misleading IBD results. F PED of 0.03 in Iranian buffalo was previously reported [32] that was lower than F ROH estimated in this study. Since pedigree data were not available in the present study, comparing F PED and F ROH was not possible. However, Zhang, Calus, Guldbrandtsen, Lund and Sahana [10] and Purfield, McParland, Wall and Berry [18] reported a moderate to high (from 0.47 to 0.82) and low to moderate (from 0.12 to 0.76) correlations between F PED and F ROH in cattle and sheep breeds, respectively. Low to moderate correlation between F PED and F ROH was also reported by Peripolli, Stafuzza, Munari, Lima, Irgang, Machado, do Carmo Panetto, Ventura, Baldi and da Silva [9] suggesting that F PED may not capture small IBD segment that are the result of ancient inbreeding. It is apparent that while inbreeding based on pedigree has its own drawbacks such as the inability to capture ancient inbreeding, and that accurate and in-depth pedigree records are not always available, inbreeding methods based on allele frequency has also shown a considerable variation among different studies. So, it can be concluded that since ROH does not depend on allele frequencies and can capture both recent and ancient inbreeding, it is a more suitable approach to assess inbreeding in livestock genomics.

As shown in
It should be noted that ROH in an animal could occur as a result of either inbreeding or such events as selection. Indeed, some haplotypes that frequently occur on the genome are a direct result of selection pressure on the regions of genome harboring functional role [33]. Iranian buffalo breeds have not been exposed to intensive artificial selection; hence, the existence of ROH in these two breeds can be due to both natural selection and the inbreeding itself. Pryce, Haile-Mariam, Goddard and Hayes [31] studying the association of homozygosity on the genome of Holstein cattle with inbreeding depression reported associations of ROH > 3.5Mb with decreased milk production. In the current study, around 35% of the total detected ROHs were of length > 3.5 Mb. Therefore, our results suggest drawing up conservation plans to preserve the local livestock genetic resources.
Variations between samples with respect to their total ROH number and total ROH length were seen (Fig. 2). Individuals with an equal portion of the genome covered by ROH revealed different numbers of segments, which is likely due to the fact that they have a distinct distance from the common ancestor. On the whole, the portion of the genome that was autozygot (4.71% and 5.96% in AZ and KHZ, respectively) was high compared with the result reported in Nellore cattle (4.58%) [34], and was low compared to the result reported in Marchigiana beef breed (7%) [11], Austrian dual purpose breeds (9%) [35], and in Holstein cattle (10%) [36].  Table   S1). From among these ROH islands, those on BTA7, BTA13, and BTA14 were seen in both AZ and KHZ breeds (Figure 3). The strongest peak was viewed on BTA19 (19:411,773-3,701,223 bp) for AZ and on BTA5 (5:55,217,391-57,476,442 bp) for KHZ, which appeared in about 30% of the individuals (Fig. 3).

Candidate genes inside frequently occurred ROH regions
In this study, all the gene ontologies that significantly enriched (P < 0.05) were not discussed in detail, but we concentrated on those regions that displayed associations with particular traits of interest regarding animal genetics and breeding. In total, 220 genes were identified; of which 22 in AZ and 198 in KHZ genome (Table S2). Surprisingly, no significant GO term (P < 0.05) was found for the AZ population. The total length of frequently occurred ROH islands were 6,254,221 and 15,356,992 kb in the AZ and KHZ genome, respectively. That is more likely why only a few genes with no significant GO was identified for the AZ. However, the GO analyses revealed numerous significant process for the in KHZ breed. Five genes identified with respect to positive regulation of DNA metabolic development (GO:0051054) in biological process, in which we highlight the STAT6 (signal transducer and activator of transcription 6) gene on BTA 5. Interestingly, in feedlot cattle, it was recognized to have significant effects on the growth efficiency and on the quality of carcass [37]. In addition, Nguyen, Reverter, Cánovas, Venus, Anderson, Islas-Trejo, Dias, Crawford, Lehnert and Medrano [38] using a co-expression network analysis reported the critical role for transcription factors: STAT6, PBX homeobox2, and polybromo1 in regulating pubertal development in Brahman heifers.
Twelve genes recognized in ROH islands that were related to lipid metabolic process (GO:0006629). Of these, BMP2 (bone morphogenetic protein 2, on BTA13) may play a role in the rebuilding of the hair follicle [39]. In porcine, this gene was reported to have an important role in regulating body dimension and muscle developments [40,41]. Kim, Elbeltagy, Aboul-Naga, Rischkowsky, Sayre, Mwacharo and Rothschild [42], studying signatures of selection between goats and sheep native to Egypt, reported associations between several genes including BMP2 with body size and development and concluded that these genes regulate adaptation to hot arid habitat. According to Moioli, Pilla and Ciani [43] BMP2 an VNRT genes are most likely associated with the fat tail phenotype in sheep. Yuan, Liu, Liu, Kijas, Zhu, Hu, Ma, Zhang, Du and Wang [44] reported some genes including BMP2 as a candidate gene associate with tail type in Chinese indigenous sheep.
Another gene related to this ontology identified on BTA 5 is CYP27B1 (1-α-hydroxylase), for which gene expression was reported to be up-regulated as a result of bacterial infection suggesting that this gene has a role in modulating innate immune responses [45]. On BTA13 of KHZ breed, three genes identified with regard to positive regulation of DNA replication (GO:0045740) in biological process; from which, the PCNA (proliferating cell nuclear antigen) gene has earlier been reported to associate with follicular development and growth in buffalo ovary [46].
Single-organism cellular process (GO:0009987) with 64 genes was significantly enriched (p = 0.05), which contains the MARS (methionyl-tRNA synthetase) and ADRA1D (adrenoceptor alpha 1D also known as ADRA1A) genes on BTA5 and 13, respectively. Surprisingly, MARS has previously been reported to be in association with milk and protein production in Chines [47] and Portuguese [48] Holstein cattle, and ADRA1D with milk protein in Murrah dairy buffaloes [49]. Other interesting genes within this category were INHBC and INHBE (inhibin beta C and E subunits) on BTA 5 reported as candidate genes associate with reproductive ability of tropical young bulls [50], and with composite reproductive traits of Lori-Bakhtiari sheep [51]. KCTD16 gene on BTA7 were reported as candidate gene for fat yield in Nordic Holstein cattle [52], with meat quality in Simmental beef cattle [53]. And with residual feed intake in Junmu White pigs [54]. The PMEL (premelanosome protein) and MYO1A (myosin IA) on BTA 5 are two genes within this category that have been reported as putative candidate genes related to coat color phenotypes in cattle by [55] and [56], respectively. PMEL is required for the melanin biosynthesis process; for which the original function is the pigmentation of hair, mucous membranes, and eyes [57]. In cattle, this gene is reported as a candidate gene associated with the dilution of coat color, which has an effect on color intensity [58,59]. SUOX (Sulfite Oxidase), the other important gene within this category was reported to associate with bone development in cattle [60].
Basically, ROH pattern differs from one to another population [14,61]; as seen in the recent study between AZ and KHZ. Investigation of the pattern of ROH within breeds may help us discover the direction of selection and trace the impact of artificial and natural selection on the genome over time [36]. For buffalo breeding in Iran, selection practice on dairy traits including milk production, and fat and protein percentage have been carrying out through the government and under the supervision of ABCI [1]. Candidate bulls are selected from rural herds based upon their body conformation and maternal performance, and semen of selected bulls are distributed in individual herds [1]. On the other hand, Iranian buffalo breeds display noticeable adaptability to the rigid environment of the country. For example, according to Mokhber, Moradi-Shahrbabak, Sadeghi, Moradi-Shahrbabak, Stella, Nicolzzi, Rahmaninia and Williams [4], AZ breed in the northwest of Iran is subject to dry summers with temperatures approaching 35 °C, and cold and belowfreezing winters with heavy snowfall. The climate of Khuzestan province placed in the Southwest of the country is usually sweltering and sometimes sultry with summers' temperature regularly reaching 45 °C. Therefore, it is not astonishing that the detected ROH regions with significant GO term harboring candidate genes are associated with adaptability to the harsh environments of the country and with milk production related traits which deserve further studies.

Conclusion
In this work, we assessed runs of homozygosity (ROH) in the genome of Iranian local buffalo breeds. Several genes were identified in hotspot ROH zones, some of which were associated with growth efficiency, immune response, reproduction, and milk and protein production. Our result suggest that genes detected in the present study are related to adaptation of the animal to the harsh environment of the country, as well as to milk production related traits that have been under phenotypic selection. Thus, they deserve further investigations.    Figure 1 Principal Component analysis using IBS distance matrix   Table S1.xlsx Figure S1.tif