Pedigree reconstruction and population structure using SNP markers in Gir cattle

Our objective was to establish a SNPs panel for pedigree reconstruction using microarrays of different densities and evaluate the genomic relationship coefficient of the inferred pedigree, in addition to analyzing the population structure based on genomic analyses in Gir cattle. For parentage analysis and genomic relationship, 16,205 genotyped Gir animals (14,458 females and 1747 males) and 1810 common markers to the four SNP microarrays were used. For population structure analyses, including linkage disequilibrium, effective population size, and runs of homozygosity (ROH), genotypes from 21,656 animals were imputed. Likelihood ratio (LR) approach was used to reconstruct the pedigree, deepening the pedigree and showing it is well established in terms of recent information. Coefficients for each relationship category of the inferred pedigree were adequate. Linkage disequilibrium showed rapid decay. We detected a decrease in the effective population size over the last 50 generations, with the average generation interval around 9.08 years. Higher ROH-based inbreeding coefficient in a class of short ROH segments, with moderate to high values, was also detected, suggesting bottlenecks in the Gir genome. Breeding strategies to minimize inbreeding and avoid massive use of few proven sires with high genetic value are suggested to maintain genetic variability in future generations. In addition, we recommend reducing the generation interval to maximize genetic progress and increase effective population size.


Introduction
Taurine breeds are adapted in temperate or subtropical climates, but do not maintain performances in tropical regions, especially due to heat stress and parasite infestations. Thus, better adapted cattle genetics is needed in tropical herds, such as indicine breeds. Gir breed is a major choice in tropical milk production, either in crossbreeding or as pure animals.
Considering genomic evaluation in commercial breeds, there are several available genotyping panels. Moreover, genotypes obtained at different densities microarrays are used to estimate relationship coefficients with great accuracy in an unobserved pedigree or to identify genomic regions that are not affected by recombination (Speed and Balding 2014).
The success of breeding programs depends on the knowledge of several factors that interfere with selection, such as effective population size, generation interval, and genetic variability (Malhado et al. 2010). Population structure combined with information about genetic changes can guide management decisions, allowing development of strategies to promote genetic gain.

Communicated by: Maciej Szydlowski
The main studies evaluating population structure are based on pedigree records (Leroy et al. 2013;Santana Jr et al. 2014). Recent genomic approaches provide information on genetic diversity, population history, and inbreeding coefficient (Peripolli et al. 2018;Ospina et al. 2019). In the current research, our objectives are as follows: (1) to establish a SNP panel for parentage testing and pedigree reconstruction using different densities microarrays and (2) to evaluate genomic relationship coefficient of inferred pedigrees, in addition to disentangle population structure in Gir cattle.

Data
Our data includes files from the Brazilian Dairy Gir Breeding Program (PNMGL), coordinated by Brazilian Agricultural Research Corporation (Embrapa) Dairy Cattle (Juiz de Fora, Minas Gerais, Brazil) in cooperation with the Brazilian Association of Zebu Breeders (ABCZ) and the Brazilian Association of Dairy Gir Breeders (ABCGIL).
Quality control for genotypes was implemented using snpStats package (Clayton 2019) in the R software (R version 4.0.2; R Foundation for Statistical Computing, Vienna, Austria). Samples with call rate < 0.90 and average GenCall score (GC) < 0.70 were removed. SNPs at the same position or not assigned to any chromosome were removed from the dataset. Only autosomal SNPs with Call Rate > 0.90, at Hardy-Weinberg equilibrium (p value > 10 −6 ), minor allele frequency (MAF) > 0.05, and mean GC score > 0.70 present in all four different SNP microarrays were maintained for further analysis. SNPs in linkage disequilibrium, with r 2 > 0.20, were removed. After quality control, 16,205 Gir animals, 14,458 females and 1747 males, remained for parentage and genomic relationship analyses and 1810 SNPs common to the four SNP microarrays.
For population structure analyses, which includes linkage disequilibrium, effective population size and runs of homozygosity, the genotypes of 21,656 animals genotyped with five microarrays of different densities were imputed. A detailed description of this analysis is done in the next section.

Imputation
For population structure analyses, genotypes and genealogical records from animals born between 1960 and 2020 were considered. Imputation was implemented using FIMPUTE 2.2 software (Sargolzaei et al. 2014), in which the lower density panels were imputed to the high-density level (HD). Animals genotyped with the Bovine HD BeadChip (Illumina Inc., San Diego, CA, USA) were used as reference population for imputation. After imputation, the 21,656 animals had information of 416,155 SNPs. Imputation accuracy was 0.9669.

Parentage analysis and pedigree reconstruction
Pedigree reconstruction was performed by likelihood method, using sequoia package (Huisman 2017) in the R software (R version 4.0.2; R Foundation for Statistical Computing, Vienna, Austria), which calculates the likelihood ratio (LR) of the offspring genotype. LR gives more weight to rare alleles, besides considering that all loci are independent; so the total likelihood ratio is multiplied by all loci (Grashei et al. 2018). We used the logarithm (base 10) of likelihood ratios (LLR), which indicates that a given animal is the parent versus the next most likely relationship between the focal individual and this animal (Marshall et al. 1998), with parentage attributed to the most likely parent. LLR values, while useful in obtaining a solution of maximum likelihood, cannot be interpreted statistically or biologically.
The quantity that was maximized was the total likelihood of the pedigree configuration over all genotyped individuals (N) (Huisman 2017): where P(A l = X|D A , S A ) is the probability of observing genotype X at locus l in individual A, conditional only on its parents D A and S A in pedigree P . Which were then multiplied over all individuals and multiplied over all loci.
Although pedigree files were available initially, to reconstruct the pedigree, we considered the worst-case scenario, assuming no available pedigree information. This implied that all peer relationships would be tested. The following relationships were attributed: parent-offspring (PO); full siblings (FS); half siblings (HS); grand-offspring (GO); full avuncular (FA); half avuncular (HA); full nephew/niece (FN); and half nephew/niece (HN).
In well-established pedigrees, the sequoia package (Huisman 2017) estimates the probability of confidence through reference pedigrees. For this analysis, the reference pedigree provided by the Brazilian Association of Zebu Breeders (ABCZ) was used, with prior verification of incompatibilities between parents and offspring based on the count of Mendelian conflicts, using SeekParentF90 software (Aguilar et al. 2014). Conflict threshold to exclude any parent was set to one percent (1.0%) of SNP markers. Confidence probability, obtained through ten simulations, was considered the number of correct assignments, which coincided with the reference pedigree, divided by all assignments made in simulated pedigrees.

Genomic relationship coefficient
Genomic relationship matrix was calculated according to Van-Raden (2008), using preGSf90 (Aguilar et al. 2010): where Z is an incidence matrix for SNP effects with elements for animal i and SNP j with allele frequency p j (VanRaden 2008).

Linkage disequilibrium (LD) and LD decay
Linkage disequilibrium between SNPs is computed as correlation of gene frequencies ( r 2 ij ) (Hill and Robertson 1968) using PLINK 1.9 software (Purcell et al. 2007): where p ij is the probability of the marker allele pair i and j; p i and q i are marginal allelic frequencies at i and j; and p 1 p 2 q 1 q 2 is the product of four allele frequencies at both loci.
To assess LD decay, r 2 ij will be regressed in the distance between pairs of markers based on the nonlinear parametric model described by Sved (1971): where LD ij is the r 2 ij observed between SNPs i and j; d ij is the distance in Kb between SNPs i and j; is the coefficient that describing LD decay with distance; and e ij is a random residue defined as e ij ∼ N(0, 2 ).

Effective population size
Effective population size (N e ) was estimated based on relationship between N e and LD without mutation and recombination rate ( c ) proposed by Sved (1971), using the equation (Hayes et al. 2003;Ospsina et al. 2019): where c represents the distance from map in Morgans, assuming 1 cM is equal to 1 Mb, and N T is effective population size in the T th generation, and T is 1 2c .

Runs of homozygosity and inbreeding coefficient
Runs of homozygosity (ROH) were identified in 21,656 individuals, using the genotypes imputed, by sliding window methodology of specified length, or a number of homozygous SNPs. For this, we used PLINK 1.9 software (Purcell et al. 2007). The parameters applied to define ROHs were as follows: (i) a sliding window of 50 SNPs throughout the genome; (ii) proportion of homozygous overlapping windows was 0.05; (iii) minimum number of consecutive SNPs included in an ROH was 120; (iv) minimum ROH length was 1 Mb; (v) maximum interval between consecutive homozygous SNPs was 1000 Kb; and (vi) density of one SNP per 50 Kb; vii. maximum of 1 SNP with missing genotypes and even one heterozygous genotype was allowed in every ROH. ROHs were defined by a minimum of 1 Mb in length to avoid short and common ROHs, which occur throughout genomes due to LD. ROHs were classified into five length classes: where L ROHj is the length of ROH j and L total is total size of each class, chromosome, or wide genome, covered by markers. L total , for wide genome, was taken as 2,487,068,108 bp, based on consensus map ARS-UCD1.2 (GenBank project accession: NKLS00000000.2).
For each animal, F ROH (F ROH1-2 Mb , F ROH2-4 Mb , F ROH4-8 Mb , F ROH8-16 Mb , and F ROH > 16 Mb ) was calculated based on the distribution of ROH of the five different lengths (ROH j ). F ROH for each generation was calculated using the expected length of autozygous segments in distributions with average equal to 1 2g Morgans, where g is the number of generations since common ancestors (Howrigan et al. 2011). Generation intervals were calculated using optiSel package (Wellmann

Results
In this work, for parentage and genomic relationship analyses, SNPs in common from four different genotyping microarrays were used. Markers present at all microarrays initially resulted in a set of 7372 common SNPs (Fig. 1A). After quality control, a panel with 1810 common SNPs was considered (Fig. 1B). MAF values of the 1810 SNPs ranged from 0.06 to 0.50, with mean MAF of 0.35 ± 0.115. MAF values for each marker, as well as name, chromosome, and position, according to the ARS-UCD1.2 bovine genome assembly, are presented in Online Resource 1. The distribution of SNPs number along the 29 autosome chromosomes can be seen in Online Resource 2. From ISAG SNP panel (ISAG, 2012), after quality control 46 markers (23%), were polymorphic. MAF values of these SNPs ranged from 0.07 to 0.50, with mean and standard deviation of 0.24 ± 0.125, respectively.

Parentage analysis and pedigree reconstruction
LLR values ranged from 3.84 to 84.56 for dams and 1.92 to 128.72 for sires, being mean and standard deviation for dams and sires 42.44 ± 6.08 and 33.77 ± 12.69, respectively. For pairs of parents, the LLR values ranged from 36.38 to 89.50, with mean and standard deviation of 63.52 ± 10.77. The distribution of LLR frequencies for pairs, dams, and sires can be seen in Fig. 2.
The opposite homozygous loci (OH) were checked between a given individual and each of the assigned parents (Fig. 3), being OH calculated for the parent-offspring relationship and the Mendelian Error (ME) for the trio information (sire-offspring-dam).
There should be no Mendelian inconsistencies in true parent-offspring relationships, and no locus with OH should occur between both evaluated individuals, except for mutation events and genotyping errors. Regarding 9238 dam assignments, 85.90% (n = 7935) did not present opposite homozygosis, 13.21% (n = 1220) presented OH equal to one, and for loci in OH equal to two and three, the percentage was 0.89% (n = 80 and n = 2, respectively). For sires, out of the 15,066 parentage assignments, 90.21% (n = 13,591) were detected OH equal to zero, 9.49% for OH equal to one (n = 1430), and for loci in OH equal to two and three, the percentage was 0.30% (n = 43 and n = 2, respectively).
Considering ME count, only assignments for parent pairs were used. For pairs of parents, 8799 assignments were made, in which 75.33% (n = 6628) did not present ME, 21.98% (n = 1934) presented one ME, 2.48% presented ME equal to two (n = 218), and the sum of ME equal to three and four represented 0.22% (n = 17 and n = 2, respectively).
Estimation of parentage assignments confidence to the most likely parent was achieved by comparison with the reference pedigree (Huisman 2017). Confidence probability values in the current study equals to 99.9970% for pairs of parents, 99.9980% for sires, and 99.9956% for dams.
In the reference pedigree, there was 13,460 information about sires (Info Ref ) (Table 1), while in the inferred pedigree, we assigned 15,066 (Info Inf ), with 1612 information occurring only in the inferred pedigree (Inf only ). In the dam-offspring category, the inferred pedigree assigned 9238 dams, while our reference pedigree contained 8051 assigned dams, resulting in 7.33% more information in the inferred pedigree. For trio data, the reference pedigree contained 7045 information, while we assigned 8799 sire-offspring-dam in the inferred pedigree.
Information, or lack of information, from one or both parents, was not identical in only 16.02% of the 16,205 animals. Comparing information from reference and inferred pedigrees, 90.01%, 92,65%, and 83.98% are respectively coincident for sires, dams, and trios (sire-offspring-dam).

Genomic relationship coefficient
We also constructed a categorical relationship matrix, containing up to two generations, with dimension equal to NxN (where N = number of animals) from the inferred pedigree file (Fig. 4). This analysis was performed with the 1810 SNPs common to the four microarrays.
Descriptive statistics of the number of assignments made in our inferred pedigree, genomic relationship values, and pedigree-based expected relationship coefficient values for each category are described in Table 2.

Linkage disequilibrium and effective population size
Imputed genotype data for 21,656 animals was used for population structure analyses. LD decay pattern in Gir breed was created based on estimated r 2 measurements, using the equation described by Sved (1971), and can be seen at Fig. 5.
Estimated r 2 between pairs of markers equals to 0.208 for 50 Kb distance. Average r 2 value, observed at distances less than 10 Kb, equals to 0.37 ± 0.35. Average r 2 values decreased with increasing distance, as well as the percentage of r 2 greater than 0.3 (Online Resource 3). The highest average r 2 value was equal to 0.40 ± 0.36, and the lowest value was equal to 0.06 ± 0.09, at distances of 0 − 5 Kb and 900 − 1000 Kb, respectively. The r 2 values differed among the autosomes (Online Resource 4).   The estimated Gir cattle population N e for the previous 50 generations is illustrated in Fig. 6. N e estimated was higher in generation 50, with 41 animals, and has declined dramatically over generations, in which the last generation (5) presented 11 animals. Analyzing from the older to the most recent generation, N e estimates for Gir cattle show a decreasing trend. Mean generation interval was calculated in 9.08 years.

Runs of homozygosity and inbreeding coefficient
A total of 1,156,117 ROHs were found among the 21,656 Gir animals using 416,155 markers. Per animal 53 ± 9.0 ROHs were detected, with a maximum number of 284 and a minimum of 6 ROHs. ROH length per individual was 3.24 Mb ± 1.06 Mb. A total of 12,272 animals presented ROHs in the class of > 16 Mb, with an average of 2 ROHs per animal in this class, in which minimum and maximum equal to 1 and 19 ROHs, respectively (Table 3).
Distribution and mean length, as well as ROH coverage, by chromosome is shown in Online Resource 5 and Online Resource 6, respectively.
The extent and frequency of ROH are used to infer ancestry at individual and breed levels. Inbreeding, measured as the proportion of ROH-covered genome, results in an increase of homozygosity. We found, in our work, a mean F ROH of 0.19 ± 0.13 for Gir cattle population, the distribution can be seen in Fig. 7.
To differentiate old and recent inbreeding, we calculate F ROH based on five different ROH length classes as shown in Table 4. We consider F ROH of 1-2 Mb in length equal to ≈50 generations, 2-4 Mb ≈ 25 generations, 4-8 Mb ≈ 12 generations, 8-16 Mb ≈ 6 generations, and 16 Mb ≈ 3 generations (Peripolli et al. 2018). Inbreeding showed large variation among chromosomes, both in relation to mean values and level of inbreeding per individual (Fig. 8).  The panel proposed by ISAG for parentage testing consists of 100 markers derived from European taurine breeds, and additional 100 markers from indicine and synthetic breeds (ISAG 2012). In our study, only 46 SNPs were informative, indicating that ISAG panel is not fully representative in all breeds and populations. Strucken et al. (2014) in a study with Hanwoo cattle, which, despite being a taurine breed, differs significantly from European taurines, showed that breed-specific marker panels perform better for parentage testing than the panel established by ISAG. Kopps et al. (2015) conducted a study testing type and quantity of markers required for high accuracy in parentage tests, concluding this vary according to relationship categories, mating system, and number of overlapping generations. There are few studies reporting parentage testing in cattle, and especially for indicine breeds, there is a lack of studies regarding pedigree analysis tackling genomic relationship. This point must be overcome, since indicine breeds are essential resources in tropical countries.

Parentage analysis and pedigree reconstruction
Opposing homozygotes occur if one parent is homozygous for one allele, the other parent is homozygous for the other allele, and the offspring is not heterozygous. In true parent-offspring relationships, there should be no Mendelian inconsistencies and no locus with OH should occur between both parents, except for mutation events or genotyping errors.  Even if two individuals are considered unrelated, according to the pedigree, it is possible to obtain genetic information supporting the hypothesis that they share identical alleles; markers might detect incorrect relationships in a pedigree, when these are monomorphic. The uncertainty is due to the random nature of allelic inheritance from parents to offspring during meiosis, and whether alleles are identical by descent or state, but the use of likelihood ratio allows increased information about relationship between individuals to be extracted from observed genotypes.

Genomic relationship coefficient
Unlike the relationship matrix calculated using pedigree information, which contains only positive values, genomic relationship matrix can also contain negative values, since these values are standardized correlations (VanRaden 2008). The explanation is that there is an average relationship inside each population and there are animals presenting values above or below the average. Animals with divergent genotypes will show negative values (Legarra et al. 2018).
When we compared in our study, the genomic relationship coefficients, and the average relationship coefficient values expected for each class, the calculated values are lower in some classes. This fact can be explained by the inbreeding control that Gir breed has had over the last few years, which is supported by the inbreeding coefficient calculated for the ROH >16 Mb class, which represents the most recent inbreeding. Therefore, when compared to the population mean, we found a lower than expected degree of homozygosity. O'BRIEN et al. (2014) evaluated taurine and indicine breeds, including Gir cattle, and observed that lower levels of r 2 were found for indicine breeds. Within indicine breeds, the levels of r 2 at a distance of 0-1 Kb is approximately 0.55, in which Gir breed presented a r 2 average equal to 0.3 with only 16.5 Kb, which indicates a rapid decay of the LD curve (O'Brien et al. 2014). In our study, the average value of r 2 observed at 0-1 Kb was 0.387 ± 0.3566.

Linkage disequilibrium and effective population size
The LD decay graph showed a r 2 = 0.3 at a distance of 29 Kb, similar to taurine breeds at 33 Kb (O'Brien et al. 2014). Gir breed presented an estimated r 2 between pairs of markers equal to 0.208 at 50 Kb, showing a fast decay. Porto-Neto et al. (2014) evaluated LD decay for taurine, indicine and crossbreeds animals. They noted that decay was rapid, with r 2 below 0.2 for pairs of markers separated by 50 Kb for all breeds, but at distances less than 10 Kb, taurine breeds showed higher LD ( r 2 = 0.45) than indicine breeds ( r 2 = 0.25) and crossbreeds ( r 2 = 0.32).
Our average value of r 2 = 0.376 ± 0.3538, observed at distances of less than 10 Kb, is higher than found by Porto-Neto et al. (2014) for indicine breeds. This higher LD value can be explained by the smaller effective population size in our population, which possibly reflects a stronger bottleneck during breed formation. Considering greater distances (100-1000 Kb), the values found by O'Brien et al. (2014), for indicines in a high-density dataset, were similar to ours, with an observed r 2 mean approximately from 0.15 to 100 Kb, and less than 0.1 to 1 Mb, showing smooth decay at greater distances.
Differences in LD levels of indicine breeds can be attributed to historical population events, including mixing with wild-type ancestors (Murray et al. 2010), lower selection pressure (Thévenon et al. 2007) and larger effective population sizes (Ospina et al. 2019). N e , according genomic data, can be estimated based on LD standards (Barbato et al. 2015;Jasielczuk et al. 2020). The strength of LD among loci at different genetic distances allowed us to infer about the effective size of the ancestral population; N e estimated in Gir cattle population for the previous 50 generations is illustrated in Fig. 6. N e estimates show a decreasing trend over generations, analyzing from older to the most recent generation. Malhado et al. (2010) reported N e , based on the sex ratio, for Gir raised in Northeastern Brazil estimated at 100 to 150, in the period from 2002 to 2006, corroborating with Reis-Filho et al. (2010) andSantana Jr et al. (2014), who described similar values of N e , based on inbreeding coefficients. The current Gir population presented a much smaller effective size than that found by these authors. The difference was possibly due to the methodology, since unlike the others, our information come from molecular markers, resulting in more accurate values.
The mean generation interval (9.08 years) was higher than described by Malhado et al. (2010), 7.9 years, andReis-Filho et al. (2010), 8.41 years. Longer generation intervals result in reduced annual genetic gain and consequently lower economic returns. Generally, long generation intervals are due to the longevity of the Gir breed, in which animals are usually kept for reproduction until older ages, in addition to the continuous use of specific bulls, without replacements. Nowadays, considering the genomic approach in breeding programs, younger bulls are being included in progeny tests, fastening genetic gain, and replacement of the proven older sires, decreasing generation interval. Santana Jr. et al. (2014) and Reis-Filho et al. (2010), analyzing the pedigree based inbreeding coefficient (F PED ) in Gir cattle, found values of 0.0192 and 0.0280, respectively. These studies described inbreeding values close to those found by us at distances above 8 Mb (F ROH8-16 Mb = 0,026 ± 0,025, e F ROH > 16 Mb = 0,022 ± 0,021). This was explained by Zavarez et al. (2015), who observed that incomplete pedigree does not capture remote inbreeding and estimates based on pedigree information are comparable only with the F ROH calculated based on long ROH lengths. The variation between these two estimates occurs since F PED does not assume selection and genetic recombination events; therefore, it does not consider the potential bias of these events (Peripolli et al. 2018). We emphasize that pedigree relationship is obtained from statistical expectations of the probable genomic proportion of identical by descent alleles, while estimates based on genotypes use the information of identical by state alleles, resulting in greater accuracy (Visscher et al. 2006).

Runs of homozygosity and inbreeding coefficient
Long ROHs are indicative of recent inbreeding, while short ROHs capture ancient inbreeding derived from older ancestors and detect population bottlenecks. Under the assumption that 1 cM is equivalent to approximately 1 Mb, ROH can be separated into length classes to express different points in the time at each inbreeding occurred (Curik et al. 2014). To differentiate old and recent inbreeding, we calculate F ROH based on five different ROH length classes (Table 4). Inbreeding events that took place ≈25 generations ago were presented by the F ROH calculated by length below or equal to 2 Mb. The degree of inbreeding based on the longest ROH class presented mean inbreeding equal to 0.022 ± 0.021, reflecting inbreeding events occurring < 3 generations ago.
The Gir population used in this study showed a reduction in the N e based on LD, when calculated over the last 50 generations, suggesting a recent inbreeding, which is supported by the occurrence of long ROH segments. Inbreeding levels, especially those based on short ROH segments, were moderate to high, suggesting not only a small N e but also existence of bottlenecks in the genome of Gir cattle.

Conclusions
The panel with 1810 SNPs was satisfactory for parentage attribution, in which the number of attributions was higher than the number of information already contained in the reference pedigree, with a probability of confidence greater than 99,995%. The methodology added information that was not in the pedigree of the breeders' association, suggesting that when there is a large number of genotyped individuals, the approach is advantageous. For Gir breed, due to the ease and speed of execution and interpretation, confirmation of parentage by the exclusion method would still be the most suitable method for routine analysis. The relationship coefficients allowed us to evaluate the distribution of values for each relationship category of the inferred pedigree, showing appropriate values for each category.
Regarding population structure, which we used the imputed genotypes, considering the N e decrease based on LD over the past 50 generations, we recommend monitoring and conservation measures to minimize loss of genetic diversity in the breed. We suggest the application of breeding strategies to minimize inbreeding and avoid the use of the most used bulls with high genetic value to maintain genetic variability in future generations. We also recommend increasing the effective population size and reducing generation interval to maximize genetic progress.

Supplementary Information
The online version contains supplementary material available at https:// doi. org/ 10. 1007/ s13353-023-00747-x. tuto Nacional de Ciência e Tecnologia-Ciência Animal (INCTCA) for their financial support and Embrapa Dairy Cattle Research Center and Embrapa Digital Agriculture (CNPTIA) for providing the data and digital means necessary to carry out this study.
Author contribution All authors contributed to the study conception. Material preparation and data collection were performed by João Cláudio do Carmo Panetto, Marcos Vinicius Gualberto Barbosa da Silva, Marco Antônio Machado, and Rui da Silva Verneque. Material preparation and data analysis were performed by Arielly Oliveira Garcia and Pamela Itajara Otto. Resources, conceptualization, and supervision were performed by Simone Eliza Facioni Guimarães. Editing and data analysis were performed by Luiz Afonso Glatzl Junior, Renata de Fátima Bretanha Rocha, Mateus Guimarães dos Santos, and Daniele Alves de Oliveira. The first draft of the manuscript was written by Arielly Oliveira Garcia, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding This work was supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) (Proex 1246/2020). Author AOG has received research support from Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG). Author MGS has received research support from CAPES. Author RFBR has received research support from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

Data Availability
The data sets analysed in the current study might be available under reasonable request.

Declarations
Ethics approval This study did not make direct use of animals; a database from the breeders' association was used. The Universidade Federal de Vicosa Research Ethics Committee has confirmed that no ethical approval is required.

Competing interests
The authors declare no competing interests.