A within- and across-country assessment of the genomic diversity and autozygosity of South African and eSwatini Nguni cattle

In southern Africa, the Nguni cattle breed is classified as an indigenous and transboundary animal genetic resource (AnGR) that manifests unique adaptation abilities across distinct agroecological zones. The genetic integrity of various ecotypes is under potential threat due to both indiscriminate crossbreeding and uncontrolled inbreeding. The aim of this study was to assess the genetic diversity and autozygosity that exist both across countries (ES: eSwatini; SA: South Africa) and within countries (SA), between purebred stud animals (SA-S) and research herds (SA-R). Subsets of 96 ES, 96 SA-S, and 96 SA-R genotyped for 40,930 common SNPs were used to study genome-wide profiles of runs of homozygosity (ROH) and heterozygosity (ROHet) as well as inbreeding levels and population structure. The highest percentage (39.8%) of the 2168 ROH segments was 4–8 Mbp in length, whereas 65% of the 935 ROHet segments fell within the 0.5–1 Mbp length category. Inbreeding coefficients indicated positive but low inbreeding (FROH>1Mbp range: 0.025 for SA-S to 0.029 for SA-R). Principal component (PCA) and population structure analyses illustrated genome-level distinctness of (1) the Nguni from global indicine (Boran) and taurine (Hereford) breeds (K = 3), (2) the SA Nguni populations from the ES Nguni population (K = 4), and (3) different Nguni ecotypes within countries (K = 8). Furthermore, greater admixture was observed for the SA-R population compared to purebred SA-S population (shared ancestry = 0.631 ± 0.353 compared to 0.741 ± 0.123), and fewer genomics-defined ES ecotypes were observed than phenotypically (pre)defined. Overall, the results illustrated that genetic uniqueness within the sampled Nguni cattle resulted from both geographic isolation and exposure to different breeding strategies (and, selection pressures). A further loss of genetic variability should be monitored to prevent the endangerment of unique and beneficial ecotypes.


Introduction
The Nguni breed is one of the 150 recognized cattle breeds indigenous to the African continent (Mwai et al., 2015). Classified as a Sanga breed (Bos taurus africanus), these small-to medium-framed cattle have an admixed genetic composition that is intermediate between Bos taurus and Bos indicus subspecies (Hanotte et al., 2002). Nguni cattle are known for their relatively low maintenance requirements (Musemwa et al., 2010), longevity, and high calving rates (Matjuda et al., 2014). Furthermore, their heat tolerance (Katiyatiya et al., 2017) and their resistance to ticks and tick-borne diseases (Muchenje et al., 2008;Mapholi et al., 2014) deem them adaptable to the diverse production environments characteristic of southern Africa.
Accompanying the settlement of several ethnic groups in different geographic and climatic regions of South Africa (SA), distinct Sanga cattle ecotypes, phenotypically distinguishable by coat color variations, developed over time (Bester et al., 2003) and these include the Shangaan, Pedi, and Nkone (van Marle-Köster et al., 2021). In the early 1930s, Nguni cattle populations were classified as nondescript and an official breed society was only established in 1986. The SA Nguni, therefore, has a relatively short history of animal recording and objective selection (van Marle-Köster et al., 2021) in comparison to other beef cattle breeds in the country.
The breed is currently farmed throughout South Africa but is most popular in the regions with higher temperatures and humidity characteristic of both commercial and smallholder production systems. In rural communities, Nguni cattle are milked for household consumption and play an integral role in cultural and religious ceremonies, and are often only marketed when cash is required (Mapiye et al., 2019). Due to their smaller frame sizes, and relatively slower growth, Nguni weaners are undesirable and fetch lower prices on the commercial level (i.e., in feedlots; Leeuw & Jiyana, 2020). There are approximately 240 registered Nguni stud farmers who take part in animal recording and genetic evaluations (SA Stud Book Annual Report, 2016). Genomelevel genetic characterization of existing SA Nguni ecotypes is limited to one study based on microsatellite markers (Sanarana et al., 2016) and another using a 50,000 singlenucleotide polymorphism (SNP) genotyping panel but with a limited sample size (n = 50; Makina et al., 2014).
In eSwatini, Nguni cattle are reared in all six ecological regions of the country where it plays a significant role in the economy of rural communities, contributing more than 70% to the livelihoods of communities dependent on livestock (Vilakati, 1994). Even though indigenous eSwatini Nguni cattle are noted as an important heritage of the eMaswati (people of the kingdom of eSwatini), the breed has previously been listed as endangered in the country (Scherf, 2000). More than two decades ago, four conservation and breeding stations were established for the conservation of the eSwatini Nguni. Over time, the mandate of these farm units has changed from conservation alone to a combined focus on conservation and improvement for beef cattle production. The improvement program consists of intercrossing the six different indigenous lines for multiplication purposes and upgrading of selected populations with exotic breeds including the Angus and Brahman (FAO, 2004). The influx of exotic genetic material has resulted in the upgrading of locally adapted indigenous populations, including the Nguni, and this poses a threat to their genetic integrity (Keller and Waller, 2002;Taberlet et al., 2008). According to reports, the eSwatini Nguni population is furthermore declining (Madilindi et al., 2020). Genomic characterization of the eSwatini Nguni is, however, limited to a single 25 microsatellite-based study on 30 cattle sampled from two research stations (Madilindi et al., 2020).
Advancements in SNP-based methodologies have allowed the identification of homozygosity-and heterozygosity-rich regions to study both genome-wide levels of genetic uniformity (i.e., regions lacking diversity) and diversity. Runs of homozygosity (ROH) may be used to describe the inbreeding status (in terms of degree and age) of populations or to identify genomic regions under selection (Biscarini et al., 2020). Runs of heterozygosity (ROHet) are genomic regions where diversity might be beneficial (e.g., adaptive traits) and may indicate balancing selection events (Biscarini et al., 2020). These and other inbreeding and diversity parameters may assist in evaluating current genetic characteristics and to perform a risk assessment of southern African Nguni populations. This study aimed to analyze the genetic diversity and autozygosity that exist both across countries (ES: eSwatini; SA: South Africa) and within countries (SA), between purebred stud animals (SA-S) and research herds (SA-R).

Materials and methods
Subsets of 96 SA purebred stud animals (SA-S), 96 SA animals from research stations (SA-R), and 96 eSwatini cattle from three governmental breeding stations (ES) were studied. The SA-S cattle (all male animals) formed part of a previously published feedlot study (De Vos, 2018). The SA-R cattle (all female animals) were a randomly selected cohort used in a larger tick resistance study by Mapholi et al. (2016) and were reared under natural grazing conditions across four different agroclimatic zones in the Eastern Cape, Gauteng, Kwa-Zulu Natal, and Limpopo provinces of SA. The 96 ES cattle (18 males, and 78 females) comprised 32 animals each randomly sampled from the Nsalitshe (Western Lowveld), Mpisi (Lower Middleveld), and Manyonyaneni (Eastern Lowveld) Nguni cattle breeding stations. Pedigree data available in each breeding station was used in selecting against full and half-sibs to guarantee that unrelated individuals were sampled.
Deoxyribonucleic acid (DNA) was extracted from the hair (ES and SA-S) and/or blood samples (SA-R). The ES and SA-R populations were genotyped with the Illu-mina® Bovine SNP50 version 2 genotyping panel (54,609 SNPs), while the SA-S population was genotyped with the GeneSeek® Genomic Profiler™ uHD bovine genotyping panel (141,716 SNPs). A common set of 40,930 autosomal SNPs were retained for each population after the exclusion of SNPs on sex and mitochondrial chromosomes, unmapped SNPs, and SNPs that had duplicated genomic positions. Using PLINK software (Purcell et al., 2007), the exclusion of samples and SNPs was based on sample call rate (< 90%), SNP call rate (< 95%), low minor allele frequency (MAF < 1%), and Hardy-Weinberg equilibrium (P < 0.001). Liberal linkage disequilibrium (LD)-based filtering was additionally employed (r 2 > 0.5) before genetic relatedness and population structure analyses. In concordance with suggestions by Meyermans et al. (2020) for runs of homozygosity (ROH) identification, no MAF filtering was applied prior to this analysis. However, MAF filtering (as described above) was applied before runs of heterozygosity (ROHet) identification to prevent the underestimation of segment lengths. Low-MAF SNPs tend to have homozygous genotypes, which may partition longer ROHet segments (i.e., more but smaller ROHet are identified). No LD filtering was applied prior to neither ROH nor ROHet analyses.
Both ROH and ROHet segments were identified using the R package detectRUNS (Biscarini et al., 2018) by executing the consecutive-SNP-based detection method (Marras et al., 2015). For ROH, the minimum number of SNPs constituting a segment (n = 56) was estimated using the following formula implemented by Purfield et al. (2012): log e n s. n i ∕log e (1 − het) , where n s and n i were the number of SNPs and individuals, respectively, α represented the proportion of false-positive identifications (set to 0.05), and het was the mean SNP-wide heterozygosity. For ROHet, the default minimum of 10 SNPs constituting a segment was used. For ROH and ROHet, minimum segment lengths of 1Mbp and 10kbp, respectively, were required. A maximum number of zero (for ROHet) and one (for ROH) opposing genotypes, as well as two missing genotypes (for both ROH and ROHet), were allowed within segments, with a maximum gap of 1 Mbp allowed between segments. Two coefficients of inbreeding were calculated: (1) F IS , which was an SNP-by-SNP-based estimation of excess in homozygosity as implemented using PLINK's --het command, and (2) F ROH , which was an ROH-based coefficient considering all segments larger than 1 Mbp in size. The F ROH coefficient was calculated as S ROH ∕L GEN , where S ROH represented the summed length of ROH per animal and L GEN represented the base pair length of the genome covered by SNPs. The F ROH coefficient was calculated for different ROH length categories, namely > 1Mbp (F ROH≥1Mbp ), 1Mbp ≤ ROH < 4Mbp (F ROH=1-4Mbp ), 4Mbp ≤ ROH < 8Mbp (F ROH=4-8Mbp ), and > 16Mbp (F ROH≥16Mbp ).
To assess the genetic relatedness between individuals, a genomic relationship matrix (GRM) was constructed using GCTA software (Genome-wide Complex Trait Analysis; Yang et al., 2011) and used in the principal component analysis (PCA) to estimate eigenvectors per individual. The cross-validation (CV) procedure was used to identify the ideal number of ancestral populations (K) in ADMIXTURE software (Alexander et al., 2009); the K-value producing the lowest CV error was considered ideal. For comparison, 96 Boran cattle (BOR; representing the Bos indicus subspecies) and 96 Hereford cattle (HFD; representing the Bos taurus subspecies) were randomly selected from a study by van Marle-Köster et al. (2021) and included in the analysis. For visualization, population structure bar plots were produced with GENESIS software (Buchmann and Hazelhurst, 2014).

Results and discussion
A total of 2168 ROH segments were identified, ranging from 670 segments for ES to 806 segments for SA-R. The sub-totals of ROHet segments were 343, 309, and 283 for the ES, SA-R, and SA-S populations, respectively, which was similar to per-population numbers observed in other studies (e.g., 2 for Creolo from Guadalupe to 1702 for Montana Tropical composite; Mulim et al., 2021). The mean ROH and ROHet segment lengths observed were 8.76Mbp and 0.624Mbp, respectively. Though complex to compare (due to differing criteria for defining ROH and ROHet segments), mean lengths were similar to those reported for ROH (8.55Mbp) and ROHet (0.70Mbp) by Biscarini et al. (2020) for semi-feral Maremmana cattle. As illustrated in Fig. 1, the highest proportion of ROH segments (range: 0.363 for ES to 0.441 for SA-S) was 4-8Mbp in size, which translates to the majority of inbreeding effects occurring 12.5 generations ago and earlier (Howrigan et al., 2011). For the SA Drakensberger, also a Sanga breed, the highest proportion (0.545) of identified ROH segments was < 4Mbp in size (Lashmar et al., 2018). This is a confirmation of a longer history of intensive artificial selection for numerically larger and more commercially popular breeds, such as the SA Drakensberger, in comparison to the Nguni. The higher proportion of long ROH segments (> 16Mbp) in comparison to the SA Drakensberger (0.124 versus 0.052), however, is indicative of a recent increase in inbreeding practices and requires monitoring.
Similar to Biscarini et al. (2020), the 0.5-1 Mbp length category ranked the highest in terms of the mean proportion of ROHet segments (0.650 across populations), with the highest number (219 of 341 segments) identified for the ES population. This provides evidence of selection favoring heterozygotes, which could be attributed to admixture between ES populations. The number of SNPs available did not allow for the utilization of ROH and ROHet as tools to identify and locate fixed regions driven by selection; the most prevalent SNPs within runs were only identified in ~ 15.7% and ~ 19.5% of single subpopulations for ROH and ROHet. Larger per-population sample sizes and improved marker densities (or whole-genome sequencing) are expected to provide improved resolution.
The inbreeding coefficients indicated low but positive levels of inbreeding. The SA-S population ranked the highest for mean F IS coefficient (0.017 ± 0.046) and the ES population the lowest (0.001 ± 0.037), which was in concordance with higher F IS estimates previously observed for populations under higher selection pressures (e.g., 0.086 for Hereford versus − 0.017 for Montana Tropical composite; Mulim et al., 2021). Furthermore, Abin (2016) had previously reported the highest number of offspring per sire for the SA Nguni breed compared to other Sanga (Afrikaner, Drakensberger, and Tuli) and indicine (Boran) breeds, indicating a tendency toward overutilization of sires. The rankings were slightly different for ROH-based coefficients; the F ROH≥1Mbp coefficient ranged from 0.025 (SA-S) to 0.029 (SA-R), whereas F ROH=4-8Mbp ranged from 0.006 (ES) to 0.009 (SA-R) and F ROH≥16Mbp ranged from 0.016 (SA-R) to 0.025 (ES). These differences may be attributed to differences in the frequency distribution (Fig. 2). For ES and SA-S, there were a few highly inbred animals with F ROH>1Mbp > 0.1 (three and five, respectively), and this could be a sampling effect or the result of current breeding strategies. The high proportion of ROH > 16Mbp, and consequently high F ROH≥16Mbp , is indicative of a recent introduction of breeding practices that favor consanguineous mating (e.g., the overutilization of high-impact bulls across research stations). Erosion in the number of breeding animals that are available to certain ecotypes (i.e., that conform to a certain coat color variation) may also restrict the choice of breeding animals and, therefore, increase the chances of inbreeding.
The Nguni breed is clearly distinct from both indicine and taurine subspecies (Fig. 3a). The within-breed PCA indicated a clear separation of ES and SA populations (Fig. 3b), with more dispersion observed within the ES population than the overall SA population, and is consistent with a longer history of methodic within-breed selection for SA overall (since 1986) compared to a longer history of indiscriminate crossbreeding with exotic breeds for ES (since 1975). The tighter cluster observed for SA-S subpopulation compared to the SA-R subpopulation supports more intense exposure to artificial selection and the utilization of high-impact animals across herds. Population structure results (Fig. 3c) supported the clustering patterns depicted Fig. 1 The proportion of runs of homozygosity (ROH; a) and runs of heterozygosity (ROHet; b) within different segment length categories in the PCA. The ideal number of ancestral populations (K), with the lowest cross-validation error (CV = 0.514), for the merged data set was eight. Whereas K = 3 separated different subspecies (i.e., indicine, Sanga, and taurine), K = 4 separated the Nguni into country-specific populations and K = 8 further separated all populations, except SA-S, into ecotypes or subpopulations. The SA-S population was more uniform (mean proportion shared = 0.741), which is consistent with stud breeding practices (i.e., conforming to uniform breed standards). The SA-R population displayed a higher degree of admixture (mean proportion shared = 0.631 ± 0.353), which supported sampling from different agroecological zones and, hence, the possibility of different ecotypes. Even though ES animals were sampled from three separate breeding stations, only two main ancestral contributors were identified. Individual ecotypes in eSwatini may, therefore, be at risk of genetic erosion. The sustainable conservationthrough-utilization of distinct ecotypes will therefore rely on improved phenotype-and genotype-level characterization of these ecotypes to facilitate within-ecotype management and selection practices.
In conclusion, the genomic information proved useful to provide insight into the genomic diversity and inbreeding among the three Nguni populations. The SA and ES populations could be distinguished as separate clusters and require further investigation for the conservation of potential ecotypes. Breeding strategies for ES populations should be monitored to prevent genetic erosion of subpopulations. Higher resolution genomic profiling (e.g., with wholegenome sequencing information) of Nguni populations will be beneficial in identifying and locating consensus ROH and ROHet regions and the potentially important genes contained within these regions.