Fine-scale genetic diversity of the Brazilian Pantaneiro horse breed adapted to flooded regions

Among the animal species first introduced to Brazil during the country’s discovery, horses (Equus caballus) stand out because of their evolutionary history and relationship with humans. Among Brazilian horse breeds, the Pantaneiro draws attention for its adaptive traits. Blood samples from 116 Pantaneiro horses were divided into six populations according to sampling location, in order to test for genetic structure and to quantify genetic diversity within and between populations. Populations were compared to better elucidate genetic variability and differentiation and to assess the impact of the Pantanal’s natural geographic barriers on gene flow between them. Data from the GGP Equine BeadChip (Geneseek-Neogen, 65,157 SNPs) were used to assess basic diversity parameters, genetic distance (FST), principal component analysis (PCA), and population structure (ADMIXTURE) for the sampled animals. A Mantel test was also performed to investigate the correlation between the populations’ genetic and geographic distances. Results showed high genetic variability in all populations, with elevated levels of admixture in their structure. High levels of admixture make it challenging to establish a breed standard and, consequently, to delineate populations within the breed; only one population was differentiated from the others. No significant correlation between genetic and geographic distances was observed, indicating that environmental barriers did not hinder gene flow between populations and that farmers’ selection practices may not have changed the breed’s genetic composition significantly. Low genetic distances and similar heterozygosity values were observed among populations, suggesting strong genetic proximity and low differentiation. The Pantaneiro breed therefore does not exhibit genetic subpopulations and could be considered, for conservation purposes, a single large population in the Pantanal region. This study will support sampling strategies for the national genebank.


Introduction
The Pantanal Matogrossense is the world's largest wetland. Located in Brazil's midwestern region, it corresponds to 35% of the hydrographic basin of the Alto Paraguai River, in central South America (Abreu et al. 2010). The climate is hot, with dry winters. Annual precipitation varies between 1,000 and 1,400 mm, and approximately 80% of the rainfall occurs during the summer (November to March), much of it in December and January. The region's terrain is flat, which contributes to water being retained above the soil surface as the rivers overflow and repeatedly flood the area (Abreu et al. 2010).
Together with little to no human interference, this region's conditions gave rise to the horse breed known today as the Pantaneiro. As a result of over 200 years of selective pressure, these animals are well adapted to the region (Sereno et al. 1997; Mariante and Cavalcante 2000). They possess adaptive traits that help them thrive in this environment, such as the ability to endure long treks through the wetlands during the rainy season (Silva et al. 2005; Abreu et al. 2010). Animals of the breed are widely used for transportation and for working the large cattle herds common in the region (Mariante and Cavalcante 2000; Silva et al. 2005; Abreu et al. 2010).
Since 1900, the Pantaneiro horse has been crossbred with breeds such as the Arabian, Anglo-Arabian, and English Thoroughbred in order to increase size and improve the breed's conformation (Balieiro 1971; Beck 1985). This crossbreeding was carried out indiscriminately, increasing the breed's susceptibility to diseases such as trypanosomiasis and equine infectious anemia, which were responsible for population decline (Mariante and Cavalcante 2000; Santos et al. 2001).
The conservation and use of naturalized breeds, such as the Pantaneiro, are directly associated with their functionality and efficiency in the field and their adaptation to severe environmental conditions, which translates into financial return (Paiva 2005). Cattle raising in the Pantanal is extensive and takes place on large farms, making the Pantaneiro horse a fundamental tool in the field (Silva et al. 2005). For that reason, cattle farmers tend to prefer the breed, especially for moving cattle herds through this flooded region, as the Pantanal has only one road running through it.
Genetic characterization is the first step in breed conservation and may have practical implications for future breeding strategies (Solis et al. 2005). There are no studies on the genetic characterization and structure of the Pantaneiro breed using single nucleotide polymorphisms (SNPs), a molecular tool growing in popularity because it generates large quantities of data quickly and at relatively low cost. The present study aimed to use the SNP markers present in the Equine SNP70 BeadChip to help elucidate the genetic differentiation, structure, characterization, and variability of the breed and to assess the impact of the geographical barriers imposed by the Pantanal's environment on gene flow between the suggested populations.

Sampling and genotyping
The present study included 116 Pantaneiro horses, divided into six populations based on their sampling locations from Mato Grosso (MT) and Mato Grosso do Sul (MS) states in Brazil (Table 1). Sampling locations were chosen based on the large number of registered animals and the presence of conservation nuclei for the breed. Geographical distances between the sites were also taken into consideration. All blood samples were collected from breeders/associations and Embrapa's conservation nuclei. A graphical representation of the sampling locations and geographical distance (in km) between them can be found in Fig. S1 and Table S1, respectively.
Blood samples stabilized with EDTA were used for DNA extraction following the adapted protocol of Miller et al. (1988). Approximately 0.5 µg of DNA was used for SNP genotyping with the GGP Equine BeadChip (Geneseek-Neogen, 65,157 SNPs).
After purification and shipment to Neogen® Genomics (https://genomics.neogen.com/) for genotyping, the remaining DNA was added to the Brazilian Animal Germplasm Bank (Embrapa Genetic Resources and Biotechnology) and registered in the Animal Allele portal (http://aleloanimal.cenargen.embrapa.br/database_collaboration_page_dev).
Quality control, including the exclusion of samples and loci that did not meet the established criteria, was performed in SNP and Variation Suite (SVS; Golden Helix, Bozeman, MT, USA). The quality control criteria were as follows: minor allele frequency (MAF) (< 0.05); call rate for markers (> 0.99); call rate for samples (> 0.90); Hardy-Weinberg equilibrium (HWE) (p < 0.001); and linkage disequilibrium (LD) filter (markers with r² values < 0.4 in a 50-SNP sliding window). Because sample sizes were uneven between populations, and to remove animals with high kinship, one animal of each pair with a kinship value of 0.8 or higher was removed. In total, 27,930 SNPs remained. Basic genetic parameters were calculated prior to marker filtering, using 61,746 SNPs; only markers on sex chromosomes were removed for this analysis.
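The marker-level part of this filtering can be illustrated with a minimal sketch. It is not the SVS procedure used in the study. The function name `marker_qc`, the 0/1/2 genotype coding with `None` for missing calls, and the reading of the thresholds as "keep markers with MAF of at least 0.05 and call rate of at least 0.99" are assumptions made for illustration, and the toy genotypes are invented.

```python
def marker_qc(genotypes, min_maf=0.05, min_call_rate=0.99):
    """Return one True/False flag per SNP (column): True if the marker passes.

    genotypes: list of rows (animals); each value is 0, 1 or 2 copies of one
    allele, or None for a missing call (illustrative encoding, an assumption).
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    passed = []
    for j in range(n_snps):
        # Keep only the called genotypes for this marker.
        calls = [row[j] for row in genotypes if row[j] is not None]
        call_rate = len(calls) / n_animals
        if calls:
            freq = sum(calls) / (2 * len(calls))  # frequency of the counted allele
            maf = min(freq, 1 - freq)             # minor allele frequency
        else:
            maf = 0.0                             # no calls: treat as monomorphic
        passed.append(maf >= min_maf and call_rate >= min_call_rate)
    return passed


# Invented toy data: 4 animals x 3 SNPs.
example = [
    [0, 1, 2],
    [0, 1, None],
    [0, 2, 1],
    [0, 0, 2],
]
flags = marker_qc(example)
```

In this toy matrix the first SNP is monomorphic and the third has one missing call out of four, so only the second SNP passes under the default thresholds.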

Genetic diversity
Basic genetic parameters calculated for all populations were as follows: average call rate; average observed (Weir and Cockerham 1984) and expected (Nei 1978) heterozygosity; average number of alleles; and number of polymorphic markers. Genetic distance between populations was assessed using a pairwise FST matrix (2000 permutations; p < 0.05). This distance, as well as the relationship between populations, was also evaluated by principal component analysis (PCA) using the EIGENSTRAT method (Price et al. 2006). Calculations were performed in SNP and Variation Suite v8 (Golden Helix, Inc., Bozeman, MT, www.goldenhelix.com) and Arlequin 2000 v.3.5 (Excoffier and Lischer 2010). A 3D graphical representation of the PCA was produced in SigmaPlot 13 (https://systatsoftware.com/products/sigmaplot/).
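As an illustration of the two heterozygosity measures, the sketch below computes mean observed heterozygosity (the proportion of heterozygous genotypes per marker) and mean expected heterozygosity with the small-sample correction 2n / (2n - 1) of Nei (1978) for biallelic SNPs. It is a simplified stand-in, not the SVS or Arlequin implementation. The function name, the 0/1/2 coding with `None` for missing calls, and the toy genotypes are assumptions for illustration only.

```python
def heterozygosity(genotypes):
    """Return (mean observed, mean expected) heterozygosity over biallelic SNPs.

    genotypes: list of rows (animals); values are 0/1/2 copies of one allele,
    None for a missing call (illustrative encoding, an assumption).
    Expected heterozygosity uses Nei's (1978) correction 2n / (2n - 1),
    where n is the number of animals with a call at the marker.
    """
    ho_values, he_values = [], []
    for j in range(len(genotypes[0])):
        calls = [row[j] for row in genotypes if row[j] is not None]
        n = len(calls)
        if n == 0:
            continue  # a marker with no calls contributes nothing
        # Observed: share of heterozygous (coded 1) genotypes.
        ho_values.append(sum(1 for g in calls if g == 1) / n)
        # Expected: 2pq scaled by the small-sample correction.
        p = sum(calls) / (2 * n)
        he_values.append((2 * n / (2 * n - 1)) * 2 * p * (1 - p))
    return sum(ho_values) / len(ho_values), sum(he_values) / len(he_values)


# Invented toy population: 4 animals x 2 SNPs.
toy_population = [
    [0, 1],
    [1, 1],
    [2, 1],
    [1, 0],
]
ho, he = heterozygosity(toy_population)
```

Running the function separately on the animals of each sampling location would give per-population values comparable in kind, though not in implementation detail, to those reported in the study.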

Genetic structure
Population genetic structure was estimated in ADMIXTURE 1.2.2 (Alexander et al. 2009). This analysis estimates ancestry with a maximum likelihood model based on allele frequencies and a hypothetical number of clusters (K), estimating the probability of each animal (or the proportion of its genome) being assigned to each of the K clusters. Ten repetitions were run for each value of K, from K = 2 to 12. A cross-validation test was performed to estimate the ideal value of K: the mean cross-validation error of the 10 iterations was calculated for each value of K and displayed in a scatter plot. The "optimal" value was the one with the lowest mean cross-validation error (Alexander et al. 2009). Graphical projections were assembled with StructureSelector (Li and Liu 2018).
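The selection rule described above (average the cross-validation error over the repeated runs for each K, then take the K with the lowest mean) can be sketched as follows. The function name `best_k` and the error values are hypothetical; the numbers are invented for illustration and are not results from this study.

```python
from statistics import mean


def best_k(cv_errors):
    """Pick the K with the lowest mean cross-validation error.

    cv_errors: dict mapping K -> list of cross-validation errors, one per
    repeated run (hypothetical input layout, assumed for illustration).
    Returns (best K, dict of mean error per K).
    """
    means = {k: mean(errors) for k, errors in cv_errors.items()}
    return min(means, key=means.get), means


# Invented values, three runs per K (the study used ten runs, K = 2 to 12).
toy_cv = {
    2: [0.61, 0.62, 0.60],
    3: [0.58, 0.59, 0.57],
    4: [0.60, 0.61, 0.59],
}
chosen_k, mean_errors = best_k(toy_cv)
```

Plotting `mean_errors` against K gives the kind of scatter plot from which the minimum is read.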

Correlation between genetic and geographic distances
A Mantel test was performed using the software Arlequin 2000 v.3.5 (Excoffier and Lischer 2010) to verify the correlation between genetic and geographical distances, using pairwise FST values between populations and the geographical distances described in Table 2.
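For readers unfamiliar with the Mantel test, the sketch below shows a basic permutation version: the Pearson correlation between the off-diagonal entries of two symmetric distance matrices is compared with the correlations obtained after jointly permuting the rows and columns of one matrix. It is not the Arlequin implementation. The function names, the one-sided p-value, the number of permutations, and both toy matrices are assumptions for illustration, and the values are invented.

```python
import random
from itertools import combinations


def _pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value).

    dist_a, dist_b: symmetric square distance matrices (lists of lists)
    over the same populations, in the same order.
    """
    n = len(dist_a)
    pairs = list(combinations(range(n), 2))  # upper-triangle index pairs
    a = [dist_a[i][j] for i, j in pairs]
    r_obs = _pearson(a, [dist_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the populations of the second matrix
        b_perm = [dist_b[order[i]][order[j]] for i, j in pairs]
        if _pearson(a, b_perm) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Invented toy matrices for four populations (km and FST).
toy_geo = [
    [0, 100, 250, 400],
    [100, 0, 180, 320],
    [250, 180, 0, 150],
    [400, 320, 150, 0],
]
toy_fst = [
    [0, 0.010, 0.020, 0.035],
    [0.010, 0, 0.015, 0.030],
    [0.020, 0.015, 0, 0.012],
    [0.035, 0.030, 0.012, 0],
]
r, p = mantel(toy_fst, toy_geo, permutations=99, seed=1)
```

A non-significant p-value, as reported for the populations in this study, means the observed correlation is not unusual compared with the permuted arrangements.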

Genotyping and sampling
The kinship matrix showed two individuals with consanguinity values above the established threshold (> 0.8), reducing the total number of analyzed horses from 116 to 114.

Genetic distance (pairwise FST)
All pairwise FST values were significant (Table 4). The highest observed value was between the Barra do Bugres and Promissão populations (FST = 0.056), and the next highest values were found in comparisons between these two populations and the others. The lowest value was observed between Nhumirim and Campo Grande (FST = 0.008). The Mantel test showed no significant correlation between genetic and geographical distances for the analyzed populations.

Principal component analysis
In the PCA, components 1, 2, and 3 accounted for 1.87%, 1.48%, and 1.47% of the genetic variation between populations, respectively. All populations overlapped strongly, with only a few individuals from the Nhumirim, Campo Grande, and Promissão populations placed at some distance from the others (Fig. 1).

Population genetic structure (ADMIXTURE)
The cross-validation test indicated K = 4 as the adequate number of clusters (Fig. 2A and B, and supplementary material Fig. S2). With K = 4, all populations showed high levels of admixture. The only population that showed some level of structure was Barra do Bugres, with a largely "pure" structure. However, the major genetic component of this population is also present, to varying degrees, in the other populations. A new structure (green) can be observed for this value of K, present in all populations but more intense in Nhumirim.
Overall, except for Barra do Bugres, no clear genetic structure or contribution could be observed in the populations. The highest levels of admixture were observed in Campo Grande, Nhumirim, and Poconé, although all populations showed some level of admixture (Fig. 2A).

Intrapopulation genetic variability
Variation in the number of polymorphic alleles observed in the genetic parameter analysis (Table 3) may be associated with farm management at the sampling locations (such as selection for a specific trait/phenotype or other breeding strategies). Consequently, different alleles may be fixed in these populations, yet all showed similar heterozygosity values. This indicates gene flow between the populations, since the number of polymorphic SNPs does not necessarily reflect the heterozygosity rates observed. Sample sizes differed between locations, which may bias the analysis, since more polymorphic markers were observed in populations with more individuals sampled.
Average heterozygosity values reported for Pantaneiro horse populations in other studies using microsatellite (Giacomoni et al. 2008; Sereno et al. 2008; Cortés et al. 2017) and RAPD-PCR markers (Egito et al. 2007) were similar to one another but differed from those of our study. These earlier studies used multiallelic markers, which explains the differences from the values obtained here with SNPs. There are no published studies assessing the genetic diversity of the Pantaneiro horse breed with SNP markers.

Interpopulation genetic variability
Despite being significant, all FST values observed were low (Table 4). This low genetic distance suggests gene flow between populations, indicating that the natural environmental barriers imposed by the Pantanal ecosystem (especially flooded terrain) do not prevent reproduction between individuals from different locations. This is possible because of the adaptive traits the breed acquired by inhabiting this biome, which allow these animals to remain in flooded areas for long periods without, for example, developing foot rot (Ribeiro et al. 2008; Santos et al. 2008). This hypothesis is reinforced by the PCA results (Fig. 1), in which animals from all populations overlapped with no clear segregation between them, indicating a lack of genetic structure.
The Mantel test showed no significant correlation between the populations' genetic (FST) and geographical (km) distances. Some populations that were geographically closer had higher FST values than others that were further apart. Therefore, no reproductive isolation caused by geographical distance was detected in the analyzed populations, which supports the hypothesis that the terrain does not hinder gene flow between them. Similar results for Pantaneiro populations were reported by Giacomoni et al. (2008), with low genetic distances ranging from FST = 0.008 to 0.064. Those authors also observed low inbreeding levels for these populations and suggested that, given the breed's adaptive traits that allow gene flow between animals from different locations, variation in FST may be associated with the local breeding strategies of each farm. Cortés et al. (2017) also observed low inbreeding values for the breed. These results corroborate the absence of genetic differentiation observed among populations of the breed in the present study. McManus et al. (2013) observed considerable genetic variability for the breed between farms but low variability between Pantanal counties, as well as low inbreeding values, corroborating the hypothesis above.

Population genetic structure (ADMIXTURE)
Genetic structure analysis (ADMIXTURE), together with the cross-validation test performed to select an ideal number of clusters (K), suggests the existence of four substructured populations (Fig. 2). However, only one population (Barra do Bugres) showed a somewhat higher number of animals with a "pure" genetic structure. In addition, the major genetic component present in Barra do Bugres animals can be found in other populations, so it is not exclusive. This corroborates, once again, the existence of gene flow between locations. According to McManus et al. (2013), the demand for famous stallions is high, which accounts for movement between regions and farms. Moving from K = 3 to K = 4, a new structure appears, which is more intense in the Nhumirim population. Except for these two somewhat structured populations, the remaining ones (Campo Grande, Cuiabá, Poconé, and Promissão) showed high levels of admixture. Therefore, it was not possible to observe four substructured populations.
With more clusters (K = 7), certain levels of differentiation could be observed in two other populations: Cuiabá and Promissão. The genetic components involved can, however, also be found in all other populations to some extent. With still more clusters (K = 9 and 11), this pattern is accentuated and three larger genetic structures can be observed: Barra do Bugres (with a largely "pure" composition), and Cuiabá and Promissão (somewhat "pure" portions, with a more intense presence of admixture). Even with the additional clusters, the Campo Grande, Nhumirim, and Poconé populations retained a high degree of admixture. Nhumirim animals, however, showed a genetic structure mostly exclusive to this population at K = 9 and 11 (Fig. S2). Sereno et al. (2008), using microsatellite markers, also observed intense admixture in the genetic composition of Pantaneiro horses, corroborating the high genetic diversity found in the present study, as well as the lack of genetic structure between the studied populations.
We hypothesize that the selection of Pantaneiro horses may be based on morphological characteristics rather than on the animals' geographic location. This would explain the appearance of new substructured Pantaneiro populations carrying alleles favored by selection, as particular breeders mate animals with desired traits. The Pantaneiro breed may also have changed through past indiscriminate crossing (Santos et al. 2003) with other breeds not sampled in this study, as indicated by genetic contributions that are not present in other locally adapted Brazilian breeds (Bchara 2021). However, the Pantaneiro breed presents alleles that allow it to be confirmed as a distinct breed relative to other naturalized Brazilian horse breeds. It was not possible, in this study, to determine the cause of the apparent substructures and admixture observed in the Pantaneiro horse breed. Nevertheless, as noted by McManus et al. (2008), selection should ensure that the breed does not lose its adaptive traits.

Conclusion
The Pantaneiro horse showed high genetic variability and low substructuring. Geographical barriers imposed by the Pantanal Matogrossense ecosystem do not hinder gene flow between populations, possibly because of the breed's ability to cross this kind of flooded terrain and because of farmers' breeding choices. Only the Barra do Bugres population could be differentiated from the others. No significant correlation between genetic and geographical distances was observed. The populations in this study showed high admixture, with low FST values and extensive overlap in the PCA. Subpopulations may also be forming as a result of breeders' specific breeding goals, which aim to fix alleles of economic interest (e.g., morphological differences).