Genome-wide scan for runs of homozygosity identifies potential candidate regions associated with growth traits in beef cattle

Background: Runs of homozygosity (ROH) are continuous homozygous regions that generally exist in the DNA sequence of diploid organisms. Identifications of regions of the genome lead to reduction in performance can provide valuable insight into the genetic architecture of complex traits. Here, we evaluated genome-wide patterns of homozygosity and their association with growth traits in a commercial beef cattle population. Results: We identified a total of 29,271 ROH segments with an average number of 63.36 and an average length of 0.98 Mb in this commercial beef cattle population, representing ~2.53% (~63.36Mb) of the genome. To evaluate the enrichment of ROH across genomes, we initially identified 280 ROH regions by merging ROH events identified across all individuals. Of these, nine regions were significantly associated with six growth phenotype traits (body height, chest circumference, fat coverage, backfat thickness, ribeye area, carcass length; P<0.01), which contain 187 candidate genes. Furthermore, we found 26 consensus ROH regions with frequencies exceeding 10%, and several of these consensus overlapped with QTLs which are associated with weight gain, calving difficulty and stillbirth. To precisely locate locus within each ROH for every studied trait, we further utilized loci-based methods for association analysis among these identified regions. Totally, we obtained 9,360 loci within ROH, and 1,631 loci displaying significant association (P<0.01) for eight traits. In addition, we found that 67 genes embedded with homozygous loci. Several identified candidate genes, including EBF2, SLC20A2, SH3BGRL2, HMGA1 and ACSL1, were related to growth traits. Conclusions: This study assessed genome-wide autozygosity pattern and inbreeding level in a commercial beef cattle population. Our study identified many candidate regions and genes with ROH for growth traits in beef cattle, which can provide important insights into investigating homozygosity across genome in other farm animals. Our findings may further be unitized to assist the design of selection mating strategy. trait loci; Linkage disequilibrium; IBD:Identity GO:Gene ontology; DAVID:The Database for Annotation, Visualization and Integrated Discovery.


Background
Runs of homozygosity (ROH) are continuous homozygous regions that generally exist in the DNA sequence of diploid organisms [1]. Several studies have suggested that the individual inherits the chromosomal fragment that are identity by descent (IBD) from both parents, resulting in homozygous segments in the offspring's genome [2,3]. Many factors can contribute to the formation of ROH including genetic drift, inbreeding and intensive selection [4]. With the development of high density array and whole-genome sequencing technologies, the emergence of high-throughput approaches provided unprecedented opportunities to explore homozygosity segments in high resolution [5].
Initially, Broman et al. [3] proposed that ROH exists widely in the human genome and has an important impact on human health. Gibson et al. [2] further explored different distribution frequencies of ROH with different lengths using high-density SNP arrays, these study promoted the emergence for study of ROH in human genetics. Additional studies found ROH can provide insights into elucidating population structure and demography in human [6][7][8] and genetic relationships, inbreeding level and selection pressure for farm animals [9][10][11][12][13][14]. The enrichment of ROH may increase the harmful recessive alleles, and reduce the survival ability of individuals [15]. Several studies found a significant association between ROH and the occurrence of recessive diseases [16][17][18]. Other studies also suggested ROHs were related to human diseases including schizophrenia [16], rheumatoid arthritis [19] and Alzheimer's disease [20].
In recent years, ROH has been widely utilized to assess the genome wide inbreeding level and genetic relationship in farm animals [12,[21][22][23][24][25][26][27][28][29][30]. ROH can provide a novel perspective for detecting detrimental variation in the genome, assessing inbreeding level and offer important information of genetic and demographic history [31]. In cattle, Ferencakovic et al. investigated genome-wide ROH pattern and revealed inbreeding coefficients calculated by ROH were useful for assessing inbreeding level in cattle [9]. Also, more accurate estimation based on ROH can be obtained for inbreeding coefficient, when compared with the pedigree data [22]. Moreover, ROH can reflect the genetic relationship and inbreeding levels among groups, and affect the selection on the genomic region [27,32]. Previous studies suggested the formation of ROH may involve the selection of complex traits and some regions of ROH were identified which are related to economical important traits in farm animals [26,33,34]. However, there is no systematic assessment of runs homozygosity pattern and their association with important traits in commercial beef cattle populations. The commercial beef cattle population used on current study is a hybrid population between Wagyu cattle and Fuzhou cattle in China. Based on this population, we have previously identified many candidate genes associated with body measurement traits [35] and fatty acid composition in these cattle [36].
The aims of this study were to (i) assess genome-wide autozygosity patterns and estimate inbreeding level based on ROH in this commercial beef cattle population; (ii) characterize profiles of ROH with different sizes and their gene contents with ROH hotspots; and (iii) identify associated ROH regions and loci for economical important traits.

Genomic ROH distribution
We identified a total of 29271 ROHs in 462 Chinese commercial beef cattle population. ROH were identified with an average number of 63.36 segments and an average length of 0.98 Mb. We found most ROH lengths are relatively short, and most of ROH with lengths ranging from 0.5 Mb to 5 Mb, which accounting for 98.50% of total number, while the proportion of the genome covered by them was relatively small (Fig. 1a). The size of ROH varies from 500.02 kb to 48554.99 kb, we found the longest ROH was located on BTA10 (48554.99 kb, with 10015 SNPs), while the shortest ROH was located on BTA3 (500.02 kb, with 130 SNPs). In addition, we observed the number of ROH vary across chromosomes, while the highest number of ROH was observed on BTA1 (2463 ROH segments) and the minimum number of ROH was observed on BTA25 (Fig. 1b).

Roh Region And Inbreeding Coefficients
ROH regions were determined by merging ROH identified across all individuals using previously published protocols implemented in BEDTools [37]. In this study, we identified 280 nonredundant ROH regions in 462 individuals. Then, we estimated inbreeding coefficients of the commercial beef cattle population using two different approaches including F HOM and F ROH . The inbreeding coefficient of F HOM varied from − 0.24 to 0.27, and the values of F ROH varied from 0.01 to 0.33. Our results also show that F HOM and F ROH were highly correlated (r = 0.96, P < 2.2e-16).

Genomic Patterns Of Homozygosity
To evaluate the ROH pattern of the commercial beef cattle population, we divided the length of ROH into three size classes: a. Small (500 kb to 1 Mb), b. Medium (1 Mb to 5 Mb), and c. Large (> 5 Mb), as described in the previous study [6]. The ROH distributions by total number and length for studied population were shown in Fig. 2a. We observed the total of 23390 Small (500 kb to 1 Mb) with length of 15.88 Mb, while the total of 440 Large (> 5 Mb) with length of 4.80 Mb. Meanwhile, our study revealed ~ 2.81% (70.30 Mb) of the genome is autozygous in the population. Also, we found the total lengths of ROH and the number of ROH were highly correlated (r = 0.91, P < 2.2e- 16), and the number of ROH has increasing trend with the increase of the length of across autosomes (Fig. 2b).

The Consensus Of Roh Across Population
To explore the occurrence of the consensus of ROH across population, the distribution of ROH frequencies for consensus ROH was estimated using PLINK command (-homozyg -group) across genome. We found the distribution of ROH frequencies among different chromosomes is uneven (Fig. 3). For instance, the highest frequency of ROH (> 36%) was observed in the middle part of BTA23, and we also found two ROH region with frequency of 28% and 25% which are located at BTA5 and BTA12, respectively. We totally detected 26 regions with ROH frequency exceeding 10% among studied population, and these regions was overlapped with 16 refgenes based on UMD 3.1. Moreover, based on cattle QTLdb, 35 QTLs were found that overlapped with 26 regions. Notably, we found the high frequency regions were overlapped with QTLs which are associated with weight gain (at BTA6) and calving difficulty (at BTA23). Moreover, it was noted that there are several important QTLs associated with several important traits including Gestation length (at BTA5), Lean meat yield (at BTA12) and Stillbirth (at BTA23).

Roh Regions Associated With Phenotype
ROH frequently can be formed by selection pressure at specific positions on the genome, which may also indicate association with complex traits [38]. In our study, we totally identified 280 regions and performed association analysis using linear model for eight economical important traits. The summary statistics of ROH region were presented in (Additional file 1: Table S1). For body height, two significant ROHs were found at BTA23 and BTA7, which spanned around 1.2 and 3.6 Mb, these two ROH contain 34 and 56 candidate genes. Moreover, we identified several significant ROH at BTA8, BTA27, BTA12 and BTA28 for the chest circumference, the backfat thickness, the ribeye area and the fat coverage, respectively, which spanned a region 1.9 Mb, 0.6 Mb, 2,1 MB, 1 Mb (Additional file 2: Table S2). These identified regions contained 33, 22, 0, 4 candidate genes (P < 0.01). We generated genome wide ROH regions plot using Circos (http://circos.ca/) which illustrating ROH regions and association analysis results for eight traits (as shown in Fig. 4). For the body height and body length, we detected several regions on BTA23 with 1.2 Mb which contained 34 candidate genes. In addition, we also identified several significant ROH (with significant level P < 0.05) at BTA13, BTA14, BTA22, BTA23, BTA9 overlapping with 4, 62, 1, 34, 9 genes for body length. For slaughter weight, we also identified several significant ROH (P < 0.05) at BTA12, BTA14, BTA18, BTA8, which overlapped with 0, 7, 37, 107 candidate genes. Several candidate ROH regions and candidate genes for growth related traits using region based association analysis was presented in Table 2.

Loci Associated With Phenotype
To precisely locate the candidate loci within ROH region associated for important traits, we further carried out association analysis for eight traits in the 462 commercial beef cattle population based each locus with homozygous states. Among the identified candidate ROH region, we totally identified 9360 loci for 8 traits and found 1,631 significant loci and responding to 67 candidate genes. Among them, 37 loci were significantly associated (P < 0.01) with the fat coverage and only one candidate gene (EBF2) was found on BTA8 (Table 3). Also, 27 loci were significantly associated (P < 0.01) with the Slaughter weight, and two candidate genes ODF1 and UBR5 were identified at BTA14. However, no associated locus was detected for body length. In addition, a total of 109 significant loci (at P < 0.01) was found for chest circumference at BTA8 and BTA14, which contain 22 candidate genes (Additional file3: Table S3).

Discussion
Many studies have explored the ROH pattern and inbreeding depression at the genomic level in multiple cattle populations [39][40][41]. To our knowledge, our study is the first attempt to present the occurrence and distribution of ROH in commercial beef cattle population using high-density SNP arrays. Previous study suggested that highdensity SNP arrays are more sensitive to determine the small segments, while Bovine SNP50K arrays may underestimate the number of fragments with length of 1-4 Mb [42]. In present study, our results showed the most abundance of ROH segments were between 0.5 and 1 Mb, which implied the power of identification of small ROH using high density SNP array.
We estimated inbreeding coefficients in the commercial beef cattle population using two methods including F ROH and F HOM . In general, the correlation between FROH and FHOM varies across many studies. Previous study found the correlation range 0.78 to 0.85, indicating that FROH has a high correlation with the FHOM, thus F ROH can be used as an accurate estimate of the proportion of IBD genomes [43]. In this study, the significant correlation (p < 2.2e-16) were observed between FROH and FHOM, which is consistent with previous report [44].
Many studies have shown significant differences in the total number and length of ROH in cattle, as well as the genetic relationship between individuals [7,45]. The pattern of ROH count and length may indicate the differences of breed formation and recent breed management [7]. For instance, analysis of the British Isles breeds including Hereford, Guernsey, Angus and Jersey cattle displayed the highest sum of ROH events per animal compared other breeds, while African breeds displayed high variability in total number of ROH among breeds [7]. A recent study revealed that the average total length of ROH was 106 Mb and 371 Mb for Piedmontese cattle and Brown cattle, and the study also showed that ROHs identified in dairy breeds was longer and larger than beef and dual-purpose cattle. Consistent with previous findings, our results show ROH sizes range from 500 kb to several megabytes and also length and number of ROH are vary in commercial beef cattle population. Several studies suggest that short ROH reflect ancient inbreeding, and long ROH segments reveal recent inbreeding [31, 46,47]. Therefore, our study revealed that most of ROH belong to the short and Medium, and this also indicate that the commercial population has low inbreeding level, which are agree with population history that this population have undergo hybrid process for recent selection.
Our study identified many consensuses of ROH across genome among population, and the distribution of ROH across genomic regions can imply functional effect for traits and help to identify functional pathways affected by inbreeding. In this study, we totally detected 26 regions with ROH frequency exceeding 10% among population, these regions were overlapped with 16 QTLs related to important traits including weight gain, calving difficulty, Gestation length, Lean meat yield and Stillbirth, which may potentially imply their impacts on reduction in performance for important trait [38].
Many previous studies have been investigated to detect the regions of homozygosity and their impact on complex traits in human [16,19,20,48]. For instance, several methods have been utilized to explore the association between ROH region and complex disease [14,45,49,50]. In current study, we proposed the regionbased and loci-based approach to investigate the candidate the ROH region and loci associated with important traits in cattle. These approaches can also be extended to identify the loci of ROH related to important traits in other farm animals.
In present study, we firstly carried the association test based on ROH region. Totally, we obtained 280 regions by merging all individual ROH events among population, and then we utilized proportion of ROH coverage for each region among individual as variable. Using linear model, we identified 16 significant ROH regions, we found 371 candidate genes been located in nonredundant regions in the studied population. Based on the significant region, we obtained 153 genes, while other region with 218 genes. Since many genes were identified for the association analysis based on ROH region, therefore, it is difficult to pinpoint the region of ROH associated with studied trait.
Furthermore, we performed loci-based association to precisely locate the region of ROH for important trait. Totally, we found 1631 non redundant loci and 67 candidate genes for 8 traits. For the fat coverage, we found only one gene (EBF2) within 37 loci for fat coverage, this gene was found related to fat formation [51,52]. Two genes (ODF1 and UBR5) with 4 loci and 23 loci were detected for slaughter Weight. One study found UBR5 gene is involved with glucose-dependent degradation of PEPCK1 [53]. For backfat thickness, 103 candidate loci within 9 candidate genes were identified in current study. However, only one gene (SLC20A2) was associated with carcass trait [54]. For carcass length, 188 loci and 31 candidate genes were identified. Among them, there most promising candidate genes including SH3BGRL2, HMGA1 and ACSL1 have been reported from previous report. For instance, SH3BGRL2 gene had a significant effect on fatty acid metabolism [55], and HMGA1 gene was related to the growth, fertility and lean meat content in commercial pigs [56], and ACSL1 gene was a candidate gene for the function of fatty acid composition in bovine skeletal muscle [57]. For the body height, we found 24 candidate genes embedded with 119 loci, of these gene, HMGA1 and ACSL1 have been implied that can promote the growth and development of animal body [57,58].

Conclusions
Our study investigated the homozygosity patterns and population inbreeding level in commercial beef cattle population. Both region-based and loci-based association were proposed and used to assess the association between ROH and important traits, which can be extended to explored genetic basis involved ROH for other farm animals. Our finding also implied some candidate ROH regions which may have potential impacts on reduction in performance for important trait in beef cattle.

Ethics Statement
No ethics statement was required for the collection of genetic material. The dataset from animals included in this study were from previous analyses that obtained specific permissions [36].

Genotyped Samples
The original data used were the Bovine HD chip from commercial beef cattle (Hybrid population from Wagyu and Fuzhou cattle), which was extracted from our pervious publication [36]. The population was established in Dalian XueLong Co. Ltd, Liaoning Province, China. Only autosomal SNP markers (735293) were analyzed for subsequently analysis. Then, individuals with missing SNPs exceeding 5% were removed. PLINK v1.07 [59] software was used to check the quality of dataset, and the standards were set as follows: minimum allele frequency (MAF) < 0.05, individual call rate < 10%, marker genotype deletion rate < 10%. After quality control, 503579 SNPs and 462 individuals were used for subsequently analyses.

Roh Estimation
In this study, we used PLINK v1.07 to detect ROH on the autosomes of each individual. Since LD can cause a short and common ROH throughout the genome, ROH was defined to be at least 0.5 MB in this study. Then, the specific parameters were as follows: 50 SNPs sliding windows slid along the chromosome to detect homozygous segments of each individual, and the sliding window allowed no more than 1 heterozygote. Several important parameters of defining ROH are involved: (i) the minimum length > 500 kb; (ii) the proportion of homozygous overlap window is 0.05; (iii) the minimum number of consecutive SNPs included in a ROH was 100; (iv) the minimum SNP density > 50 kb/SNP; (v) the maximum gap between continuous homozygous SNPs > 100 kb; (vi) the missing genotypes of ROH < 2.

Roh Classification And Inbreeding Coefficient
The length of ROH were divided into three classes: Small (0.

Qtl Regions And Genomic Regions Within Roh
To explore the potential QTLs loci related to economic traits and diseases in the commercial beef cattle, we adopted approach plugged in PLINK v1.07 [59] (-homozyg -group ) to estimate the consensus region across individuals, which represent ROH pools of overlapping and potentially matching segments. Then, based on these consensus regions, we annotated QTL and reference gene based on both QTLdb and UMD3.1 genome assembly when ROH frequencies exceeding 10%. The function annotation of the identified genes and GO terms were further assessed using DAVID platform [61,62].

Region Association
ROH regions were generated using BEDtools and custom script [37]. The proportion of ROH coverage for each region among individual were considered as variable. The matrix represents the ratio of each ROH event to ROH regions across each individual. Raw phenotype was adjusted by the general linear model. The covariates included the weight of entry and fattening days, and the fixed effects included field, year, gender and group stratification. After correction of phenotypic data, eight growth traits including fat coverage, slaughter weight, backfat thickness, ribeye area, carcass length, body height, chest circumference and body length were utilized for subsequently analysis. Association analysis between ROH region and the adjusted phenotype were conducted using linear regression model, and candidate regions were considered statistically significant when P < 0.01. All statistical analyses were carried out using R programming (https://www.r-project.org/).

Loci Based Association
To further investigate the candidate loci within ROH for studied traits, we carried out the association test based the homozygous status of each locus. Variable for association analysis was generated based on the homozygous status of loci. For locus with the ROH for regions was coded as 1, while the locus was coded as 0 without the ROH. Also, association analysis was carried out using the linear regression model between the state of loci (homozygous or non homozygous) and the adjusted phenotypes, candidate loci were determined by the statistically significant level (P < 0.01).

Declarations Ethics approval and consent to participate
No ethics approval was required for any aspect of this study.

Consent to publish
Not applicable

Availability of Data and Materials
The genotype data reported in this article are available upon request for research purpose.

Competing interests
The authors declare that they have no competing interests

Funding
This study was supported by the Agricultural Science and Technology Innovation Program in Chinese Academy of Agricultural Sciences (CAAS-ZDXT2018006, ASTIP-IAS-TS-16 and ASTIP-IAS03) and National Beef Cattle Industrial Technology System (CARS-37) for the design of the study and data collection. The project was also partly supported by Beijing City Board of Education Foundation (PXM2016_014207_000012) for the data analysis and interpretation of the study. L.Y.X was supported by the Elite Youth Program in Chinese Academy of Agricultural Sciences. Table S1. Summary of summary statistics of ROH region including chromosome, start, end and frequency of population, average of length of each ROH, and average of SNP count within each region. The ROH region was generated were produced by aggregating overlapping CNVs (by at least 1bp) across samples using BEDTools, Table S2. Summary of the all candidate associated loci and candidate genes for growth related traits using locus based association analysis. Candidate loci were determined by the statistically significant level (P<0.01). Table S3. Summary of the candidate associated ROH regions and their candidate genes list for growth related traits using region based association analysis in commercial beef cattle population. Figure 2 (a) The total length of ROH belonging to three size classes including Small (0.5 to 1 Mb), Medium (1 to 5 Mb) and Large (>5 Mb) size. (b) Evaluation of number of ROH and ROH total length. The number of ROH found for each individual genome (y-axis) is plotted against ROH total length (i.e. the length of Mb covered by ROH in each genome, x-axis).

Figure 3
Distribution characteristics of ROH on chromosome in commercial beef cattle population. The horizontal axis is SNPs, which are ordered by the physical location of the genome; the vertical axis is the sample proportion with ROH in this SNP. The horizontal line in the graph is where the ROH frequency exceeds 10%.

Figure 4
Circos plot illustrating ROH regions and association analysis for eight traits. Association significance test for each of ROH are plotted based on -log10(P) value using histograms plot within the gray inner circle. ROH frequency indicated by occurrence of ROH with each region across total population. Frequency values are shown in the inner circle using circus package. The outermost circle displays the cattle auto chromosomes. The circles from inside to outside represent association result for body height, body length, chest circumference, fat coverage, backfat thickness, ribeye area, carcass length and slaughter weight, respectively.