Genome resequencing data offer remarkably high information content per individual (e.g., estimates of GD such as mean H or F100kb). This means that sampling only a few individuals can provide key insights into population biology. The relationships among GD, Ne, and fitness have been thoroughly reviewed and summarized by previous studies30–32. These and other studies indicate that GD, as measured by H or related measures, is a critical component not only of contemporary fitness but also of future evolutionary potential. The idea that GD can serve as an indicator of future evolutionary potential should not be overlooked considering the global environmental challenges facing natural populations today.
A reduction in GD, with its concomitant loss of fitness and increased probability of extinction9,33, is expected to result from demographic events like population bottlenecks, population subdivision, and founder events that reduce population sizes. Neutral GD is determined by the product of the generational mutation rate and the effective population size (Ne), and thus GD is determined in part by the census size of the population32,34. Moreover, and not surprisingly, population census size is positively correlated with geographic range size. According to conservation theory, small, threatened populations tend to have lower GD than large, broadly distributed populations which are typically not threatened35.
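To make the stated dependence of neutral GD on mutation rate and Ne concrete, the sketch below uses the standard diploid expectation that per-site neutral heterozygosity is approximately 4Neμ when that product is small. The function names, the mutation rate, and the Ne values are illustrative assumptions, not estimates from this study.

```python
def expected_neutral_heterozygosity(ne: float, mu: float) -> float:
    """Approximate per-site neutral heterozygosity for a diploid population.

    Uses the standard small-theta approximation H ~ 4 * Ne * mu.
    """
    return 4.0 * ne * mu


def implied_ne(h: float, mu: float) -> float:
    """Invert the same approximation to get the long-term Ne implied by H."""
    return h / (4.0 * mu)


# Hypothetical per-site, per-generation mutation rate (an assumption).
MU = 1e-8

# Tenfold changes in Ne translate into tenfold changes in expected H.
for ne in (1_000, 10_000, 100_000):
    h = expected_neutral_heterozygosity(ne, MU)
    print(f"Ne = {ne:>7,}: expected H ~ {h:.5f}")
```

Because the relationship is linear in this approximation, any error in the assumed mutation rate carries over proportionally to an Ne value inferred from H.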
Our analyses of empirical data bear out those theoretical predictions (Fig. 1). We analyzed population genomic data from 83 species belonging to 11 Orders of mammals representing the various IUCN conservation categories. For each species, we calculated GD metrics and tested for significant associations between these metrics and various biological parameters, such as geographic distribution or body size, that might impact diversity. The overarching goal of the research was to determine the relationship between population-level GD metrics and IUCN conservation categories while simultaneously identifying key intrinsic drivers of mammalian GD, which we address first.
Description of mammalian genomic diversity
Our results are consistent with a long history of empirical genetic studies dating to the 1960s, when protein electrophoresis was first used to measure GD in natural populations of mammals. For example, Fig. 1 indicates that the three species with the highest H values are O. virginianus (white-tailed deer), Peromyscus maniculatus (deer mouse, including 2 subspecies), and Myotis lucifugus (little brown bat). Nevo et al. (1984) compiled an allozyme dataset of GD metrics, including H, from 1111 species of animals and plants, including 184 species of mammals. Their dataset comprised GD estimates from only a few dozen allozyme markers per species, and they examined only a few of the same species that we did. However, there are some remarkable similarities between Nevo et al. (1984) and our current study. Nevo et al. (1984) reported only 12 species of mammals (not including humans and domestic cat) that had values of H ≥ 0.09. Among them were O. virginianus, P. maniculatus, and two species of bats of the genus Myotis. The fact that the three species with the highest H in our dataset are either the same species as, or a congener of, high-GD species reported by Nevo et al. (1984) using such a different analytical approach is reassuring. It bolsters our confidence that evolutionary genetics theory is buttressed by existing, publicly available genomic datasets that can be readily exploited by conservationists.
Taxonomic Order is the taxonomic level in which member species share a broad suite of morphological, physiological, genetic, and ecological characteristics; species of different Orders can easily be distinguished by many conservationists. If we consider only the 4 most speciose Orders, Rodentia had the highest mean value of H = 0.00520 and Carnivora had the lowest mean value of H = 0.00088. This is not unexpected given that small herbivores generally have much larger population sizes and nucleotide substitution rates than do carnivores36. Conversely, Carnivora had the highest mean F1Mb = 0.06209 and Rodentia had the second lowest mean F1Mb = 0.02441. Again, this is consistent with their population biology: rodents are expected to have higher effective mutation rates and larger population sizes than carnivores, in which there is generally far more opportunity for inbreeding in isolated populations. Primates had relatively high inbreeding, with F1Mb = 0.05569. This is perhaps a reflection of a high degree of social structuring, small census population sizes, and slower rates of molecular evolution in primates36.
Genomic diversity and Red List status
The major finding of this study is that key population GD metrics are predictive of IUCN conservation categories that presumably reflect extinction threat status. This supports the idea that GD is indirectly reflected by the current Red List assessment methodology. Our results also indicate that Threatened species or populations have reduced GD compared to those with Non-Threatened status. We found that H (and its correlates) was the best conservation metric, followed by F100kb (a measure of autozygosity that is reflective of inbreeding). Two individual Red List criteria, “population trend” and “geographic range”, also reflect GD. Species with “Stable” population trends had significantly higher H than did “Decreasing” or “Increasing” species. Geographic range was inversely proportional to the fraction of longer ROH (Supplementary Figure S18), another reasonable result in that habitat contraction can result in elevated levels of inbreeding relative to random mating37.
Since H and F100kb were the best predictors of Red List designation, we plotted their global distributions (Supplementary Figures S19 and S20) to illustrate worldwide patterns of GD. Mammalian populations in Asia and Africa, where the human footprint is the oldest, generally had higher levels of inbreeding than did those on other continents, whereas North American mammals seemed to have relatively healthier GD distributions. Taken as a whole, the worldwide GD distribution calls for more active conservation efforts and research in Asia and the Global South.
The correlation between GD and Red List status has been tested before10,11,38,39 but mostly with mitochondrial or microsatellite marker data. There has been no scientific consensus on whether the Red List indirectly captures GD. Recently, Schmidt et al. (2023) performed a meta-analysis of studies that used different markers and corroborated Willoughby et al. (2015), who found that GD is modestly predictive of Red List status. Our results are consistent with this interpretation. Several authors10,11,40 have suggested using the loss of GD rather than snapshot values of GD in conservation assessments. In the next section, we extend this line of reasoning by detailing an approach for including GD as an explicit criterion in future conservation assessments.
An explicit genetic criterion for conservation assessments
Over thirty years ago, Mace and Lande (1991) suggested an assessment criterion based on Ne in Version 1.0 of the Red List Categories and Criteria, but the most recent iteration of these Criteria (Version 3.1) still does not embrace Ne despite recent pleas to include genetic considerations in status determinations (e.g., 10,11,42). We suggest that an additional criterion that explicitly considers GD metrics and thresholds would help further inform conservation assessments, especially for species that might otherwise be deemed Data Deficient.
Our proposal for an explicit new GD criterion for status assessments is based on the mean loss of heterozygosity over time43. We chose H not only because the concept of heterozygosity is well understood by most biologists, but because our results indicate that it was the best indicator as well as the best predictor of existing IUCN categories. Furthermore, H has a solid theoretical foundation based on Crow and Kimura’s equation:
$$H_{\text{T}} = H_{\text{O}}\left(1 - \frac{1}{2N_{\text{e}}}\right)^{T}$$
where Ne = effective population size, HO = observed heterozygosity, HT = heterozygosity at time T, and T = the number of generations in 100 years (e.g., T is 100 for most insects or annual plants, T is 50 for antelope with 2-year generation times, and T is 5 for whales with 20-year generation times). Our proposed GD criterion is illustrated in Fig. 3 and, in principle, could be readily applied by any conservation organization that conducts status assessments given that the model parameters can be estimated from publicly available resources10. For example, HO can be estimated from population genomic datasets and generation time is generally known from life history studies. Ne can be estimated either indirectly, as a crude approximation from census population size (Nc)44, or directly from population genomic data. For example, contemporary Ne can be estimated using a linkage-disequilibrium-based method (e.g., GONE45) or a coalescence-based method (e.g., Stairway Plot 246), so long as practitioners recognize that genomes do not immediately register demographic changes (i.e., there is a lag time47,48).
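As a worked illustration of the equation above, the following minimal sketch computes the fraction of heterozygosity expected to remain after 100 years. The function names and the example population (Ne = 500, 2-year generation time) are hypothetical and chosen only for illustration; they are not values from our dataset.

```python
def generations_in_window(generation_time_years: float,
                          window_years: float = 100.0) -> float:
    """Number of generations T that elapse in the assessment window."""
    return window_years / generation_time_years


def heterozygosity_retained(ne: float, generations: float) -> float:
    """Fraction H_T / H_O expected to remain after `generations` generations.

    Implements H_T / H_O = (1 - 1 / (2 * Ne)) ** T.
    """
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


# Hypothetical population: Ne = 500 with a 2-year generation time.
t = generations_in_window(2.0)  # 50 generations in 100 years
retained = heterozygosity_retained(500, t)
print(f"T = {t:.0f}, H_T/H_O = {retained:.4f}, "
      f"loss = {100 * (1 - retained):.2f}%")
```

For this hypothetical population the expected loss is a little under 5% in 100 years. Note that HO cancels out of the ratio, so the proportional loss depends only on Ne and T.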
We suggest that GD can be used to assign threat categories (e.g., CR or VU) when a population is expected to lose a given proportion of its H in 100 years49–52 as follows:
CR: if HT is 90% or less of HO (i.e., a 10% or more loss of heterozygosity in 100 years)
EN: if HT is 90–95% of HO (a 5–10% loss of heterozygosity in 100 years)
VU: if HT is 95–97.5% of HO (a 2.5–5% loss of heterozygosity in 100 years) OR Ne < 1000
NT: if HT is more than 97.5% of HO AND 1000 ≤ Ne < 5000
LC: if HT is more than 97.5% of HO AND Ne ≥ 5000
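The thresholds above can be sketched as a simple decision rule. This is a minimal illustration rather than the pipeline used in our analyses: the function name is hypothetical, and because the listed ranges share their endpoints, the sketch assumes that a value falling exactly on a boundary is assigned to the more threatened category.

```python
def gd_category(ne: float, generations: float) -> str:
    """Assign a GD-based threat category from Ne and T.

    `generations` is T, the number of generations in 100 years.
    Boundary values go to the more threatened category (an assumption).
    """
    if ne <= 0.5:
        raise ValueError("Ne must be greater than 0.5")

    # Fraction of heterozygosity retained: H_T / H_O = (1 - 1/(2Ne))^T
    retained = (1.0 - 1.0 / (2.0 * ne)) ** generations

    if retained <= 0.90:                # 10% or more loss
        return "CR"
    if retained <= 0.95:                # 5-10% loss
        return "EN"
    if retained <= 0.975 or ne < 1000:  # 2.5-5% loss, or small Ne
        return "VU"
    if ne < 5000:                       # <2.5% loss, 1000 <= Ne < 5000
        return "NT"
    return "LC"                         # <2.5% loss, Ne >= 5000


# Hypothetical populations (illustrative values only).
for ne, t in [(50, 50), (300, 50), (500, 50), (2000, 5), (10000, 5)]:
    print(f"Ne = {ne:>5}, T = {t:>2}: {gd_category(ne, t)}")
```

Evaluating the conditions from most to least threatened means the Ne < 1000 clause can only raise a population to VU; it never overrides an EN or CR assignment based on heterozygosity loss.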
We tested this new GD criterion using maximum population size estimates for species with Red List information, or by employing Stairway Plot 2 to estimate Ne for “Data Deficient” species (i.e., those without Red List information). The results are presented in Fig. 4 and Supplementary Table S5. Compared to the official IUCN Red List categories, the “GD categories” that we derived from the GD criterion described above were generally more conservative, likely because we used the maximum population size estimates available. The percentage loss of heterozygosity in 100 years (“Het_loss_%” in Supplementary Table S5) was less than 2.5% in many cases thanks to large census population sizes and/or long generation intervals. The two “Data-Deficient” species were assigned as “LC” or “NT” based on large estimates of Ne. However, while all the “LC” and “NT” species of the Red List remained in “Non-Threatened” categories, some “EN” and “CR” species according to the Red List were elevated or remained as “CR” when evaluated using only our new GD criterion, perhaps foreshadowing genomic manifestations of the extinction vortex5.
Our analyses show that the five conservation criteria currently used by IUCN (census population size, demographic trajectory, geographic range size, a combined index of population size and geographic range size, and associated quantitative analyses) indirectly capture heterozygosity, a key element of GD. However, many species on IUCN’s Red List are “Data Deficient” because parameters like census population size or demographic trajectory are extremely difficult to estimate. We think that GD could become valuable as a sixth criterion for conservation assessments, in large part because GD can be more easily and inexpensively evaluated than census size or demographic trajectory and can be estimated directly (by anyone) from public databases that are expanding rapidly.
Regardless of whether the scientific community adopts our specific GD criterion, we think conservationists would do well to explicitly assess GD metrics as part of a comprehensive evaluation of each species. We expect other genomic assessments, such as genetic load or genomic offset, could ultimately be incorporated into a more comprehensive GD criterion at some point in the future, but heterozygosity estimates for many species can be generated today as conservationists struggle with the ongoing biodiversity crises. Our study outlines the theoretical and empirical justification for a new GD criterion, a bioinformatic pipeline for estimating GD from publicly-available population genomic data, an analytical framework, and explicit recommendations for use by conservation authorities. We have illustrated our ideas using mammalian data, but they are applicable to most branches of the tree of life.