Maximum levels of global phylogenetic diversity efficiently capture plant services for humankind

The divergent nature of evolution suggests that securing the human benefits that are directly provided by biodiversity may require counting on disparate lineages of the Tree of Life. However, quantitative evidence supporting this claim is still tenuous. Here, we draw on a global review of plant-use records demonstrating that maximum levels of phylogenetic diversity capture significantly greater numbers of plant-use records than random selection of taxa. Our study establishes an empirical foundation that links evolutionary history to human wellbeing, and it will serve as a discussion baseline to promote better-grounded accounts of the services that are directly provided by biodiversity. Maximum levels of global plant phylogenetic diversity capture more human benefits and at higher diversity levels than does random selection of taxa.

tenuous 7 and not without controversy 8,9 . While some authors hold that maximizing phylogenetic diversity should lead to recognition of high levels of useful feature diversity 1,8 , others have suggested that the phylogenetic approach can be misleading 9 . This controversy likely reflects that the connection between evolutionary history and human wellbeing remains largely theoretical 10 (but see Forest et al. 11 for an empirical local assessment), being only an initial move towards its consolidation as a scientific paradigm.
Here, we provide quantitative evidence that maximum levels of global plant phylogenetic diversity (PD max ) capture more human benefits (plant-use records sorted into 28 standard categories of use 12 ) and at higher diversity levels (records more evenly distributed between the categories) than does random selection of taxa, supporting the long-standing notion that maximizing phylogenetic diversity is a valuable means to retrieve high levels of useful feature diversity 4–6 . Our genus-level analysis is based on the most comprehensive time-calibrated vascular plant phylogeny available 13,14 , including all accepted vascular plant genera worldwide (a total of 13,489) as well as 9,478 genus-level plant-use records (presence/absence) obtained from a systematic review of botanical literature and authoritative websites 15 .
The PD max strategy outperformed random selection of taxa at any sample size (Fig. 1a), with relative gains varying between 4% and 46% (Fig. 1b). This result suggests that, in the absence of any other source of information beyond evolutionary history, prospecting disparate lineages of the phylogeny could help to make the most of the natural services that are the result of evolution. With regard to individual plant-use categories, PD max retrieved a higher number of records relative to random selection in 92% of the comparisons (Fig. 2 and Supplementary Fig. 1).

Fig. 1. a, Plant-use records (9,478 counted for all use categories combined) retrieved with the PD max and random selection strategies across sample sizes. b, Gain in plant-use records obtained with PD max relative to random selection across sample sizes. c, Equitability (Pielou's evenness index) in the distribution of plant-use records among the 28 categories with PD max and random sampling strategies across sample sizes. The symbols in a and c indicate statistical significance (based on SES scores) for a nominal alpha of 10% (·), 5% (*), 1% (**) and 0.1% (***) (two-tailed tests), and the vertical thin bars at the centre of the percentage bars represent confidence intervals at 95%.

Fig. 2. Relative gains in plant-use records per category (axis from −30% to +120%; category groups include environmental, social, and human and animal nutrition). The bars represent the relative gains obtained with PD max relative to random selection at S = 20% of the total pool of taxa, the sample size at which the maximum equitability in the distribution of records among use categories was observed (Fig. 1c). The symbols on the bars indicate statistical significance (based on SES scores) for a nominal alpha of 10% (·), 5% (*), 1% (**) and 0.1% (***) (two-tailed tests). The colours represent different groups of categories following the Economic Botany Data Collection Standard (Supplementary Table 1).
Moreover, given that relative record gains with PD max were overall higher for the less common categories (Supplementary Fig. 2), PD max also retrieved significantly more equitable distributions of records among categories at most sample sizes (Fig. 1c). This indicates that PD max recovers more plant uses in general than random selection, and that it does so by optimizing the capture of some of the rarest uses, thus resulting in a more balanced palette of human benefits. Both PD max and random selection strategies retrieved the maximum possible richness of plant-use categories (n = 28) across most sample sizes, yet random selection failed to retrieve the maximum richness of categories at 10% and 20% sample sizes in a few cases. Our genus-level approach is superior to the species-level approach in that the latter would suffer from unacceptable omission errors (ethnobotanical knowledge will most likely remain vastly under-documented below the genus level for a long time 16–18 ) and extreme lack of phylogenetic information 13 , yet it may introduce some uncertainty because the operational unit of plant use is often the species. As such, retrieving a useful genus that comprises just a few species could be considered more valuable than a highly diversified one with the same use, because the uncertainty regarding the species that are actually useful within each genus would be less in the former case. Nonetheless, a re-analysis of the data after downweighting our genus-level plant-use observations in direct proportion to species richness per genus revealed an even stronger pattern (Supplementary Fig. 3). Moreover, the relationship between PD and plant benefits held in separate continental regions of the world (TDWG level 1 standards, Supplementary Figs. 4 and 5), which suggests that our results are consistent across floras that have evolved in distinct biogeographic regions and over different timescales.

The striking success of the PD max strategy lies in the phylogenetic structure of the categories. As such, we found a strong positive relationship between the PD that is encapsulated in each plant-use category and the relative gain in records per category under the PD max strategy (Extended Data Fig. 1), meaning that greater gains are predicted for phylogenetically dispersed categories. In fact, the only category that was significantly underrepresented with PD max relative to random selection concerns rubber plants ( Fig. 2 and Supplementary Fig. 1), which are strongly clumped in the phylogeny (Supplementary Table 2). Our results complement previous findings reported in local studies that high levels of PD can increase multifunctionality via complementarity of beneficial attributes among phylogenetically distant taxa 3 . For example, regarding the production of natural poisons against harmful or nuisance invertebrates, we found that maximum levels of global PD capture more plant taxa generating them than random selection (Fig. 2), which in turn may imply an increased potential to control the detrimental effects of disparate invertebrate lineages. While the latter hypothesis cannot be tested with our data, observations that most of the antagonistic plant-invertebrate interactions that ultimately shaped this benefit are phylogenetically conserved 19,20 (that is, invertebrate species often attack a narrow range of closely related host plants) and geographically restricted 21 support this idea. It follows that, in the shadow of global change, counting on a variety of invertebrate poisons and deterrents from distinct plant lineages may help to counter phylogenetically diverse pests coming from disparate parts of the world 22,23 .
It is important to note that an unobserved link between a human need and a taxon does not necessarily imply that the link will not be found in the future. The ecological apparency hypothesis states that, among equally valuable taxa with regard to a certain use, the most apparent or salient ones are preferred simply because they are readily available 24 . Furthermore, cultural factors could also explain the preferential use of certain taxa at the expense of others that might equally fulfil the need 25 . By analogy to the ecological prediction that higher competition between closely related taxa of similar phenotypes can lead to greater phylogenetic diversity 26 , human preference patterns in the use of available plant resources might have increased phylogenetic overdispersion in local ethnofloras. Therefore, ecological and cultural factors, together with the fact that both plant lineages and the human cultures that prospect them are geographically restricted to a greater or lesser extent, may have contributed to the striking success of the PD max strategy over random selection in capturing the human benefits that are associated with plant biodiversity.
The Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) recently approved undertaking an assessment of the use of 'wild' species, including the identification of opportunities to establish measures that ensure and promote sustainable practices 27 . The ultimate goal of this conservation initiative is securing the 'option values' of biodiversity, this is, the present and future benefits that are associated with the continued existence of a wide variety of taxa in Nature, and phylogenetic diversity is increasingly recognized as a valuable indicator of such maintenance of options 28 . Concurring with the IPBES philosophy that the world is in need of a broad appreciation of option values as a key contribution of Nature to people 29,30 , our study establishes a solid empirical foundation that links evolutionary history to human wellbeing, and it will serve as a discussion baseline to promote better-grounded accounts of the services that are directly provided by biodiversity 31,32 .

Methods
Plant-use dataset. We compiled a genus-level dataset of plant-use records for all vascular plant taxa described to date using the information gathered in the fourth edition of Mabberley's Plant-book 15 . Mabberley's Plant-book is the most comprehensive and authoritative encyclopaedic review of global plant classification (genera) and their uses published hitherto. From 1974 to 2017, all the information included in Mabberley's Plant-book was gathered, sorted, evaluated and synthesized by David Mabberley, who systematically reviewed over 1,000 botanical sources including modern Floras, handbooks, periodicals, monographs and websites (all references can be found in Mabberley 15 ). We conducted a double-check manual screening of all plant uses described in Mabberley's Plant-book and sorted them into 28 standard categories of use following the guidelines in the Economic Botany Data Collection Standard 12 (hereafter 'Collection Standard'). When two or more applications of the same category were described for a given taxon, we considered them as a single plant-use record. For example, if the wood of a taxon is used to build poles, furniture and toys (that is, three different applications), we simply recorded that the taxon provides timber. This procedure resulted in a binary classification of 9,478 plant-use records across the 28 categories, including benefits related to human and animal nutrition (human food, human-food additives, vertebrate food and invertebrate food), materials (wood, stems, fibres, leaves, seeds/fruits, tannins/dyestuffs, gums/resins, lipids, waxes, scents and latex/rubber), fuels (fuelwood, charcoal and biofuels), medicine (both human and veterinary), poisons (vertebrate poison and invertebrate poison), social (antifertility agents, smoking materials/drugs and symbolic/magic/inspiration) and environmental uses (ornamental, bioindicators/bioremediators, soil improvers and hedging/ shelter). 
A detailed description of the categories is provided in Supplementary Table 1. Although the use of leaves and seeds/fruits as materials is considered as 'miscellaneous' in the Collection Standard, we treated them as independent categories because we found many records in Mabberley's Plant-book that fit into these categories (typically leaves for thatching and seeds/fruits for handicrafts). The environmental categories 'erosion control', 'revegetators', 'soil improvers' and 'agroforestry' described in the Collection Standard were considered as one single category (soil improvers) because they were very difficult to tease apart in many cases (for example, some plants are used in agroforestry because they prevent soil erosion, and revegetators often improve soil quality). The same rationale applies to the Collection Standard categories 'shade/shelter' and 'boundaries/barriers/supports', which were merged into one single category (hedges and shelters). The Collection Standard also recognized different sub-categories of medicine, human food and poisons 12 , but we did not distinguish between them here because such information is often unknown and does not make much sense in the context of our global assessment. For example, while we are interested in recording the value of a taxon as human food, distinguishing between the parts of the plant that are actually eaten (sub-categories for human food in the Collection Standard) is rather irrelevant for the purposes of the study. A few records could not be assigned to any of the categories described in the Collection Standard (for example, spores and inflorescences used as materials), which recommends gathering such cases into 'miscellaneous' categories 12 . However, we simply disregarded them because such a mixture of poorly represented categories would not make sense in the context of our study.
Finally, the category 'cork and cork substitutes' described in the Collection Standard was disregarded because we found very few records in Mabberley's Plant-book (likely because cork and cork substitutes are provided by only a few species and primarily from Quercus). We considered both fully realized (>99% of the cases) and mooted uses (as long as they were properly documented in literature), and doubtful entries were disregarded in any case. The resultant plant-use binary matrix (presence/absence of uses per genus) was used in all the analyses described below. Additionally, we derived a downweighted plant-use matrix by dividing the entries in the binary matrix (plant-use observations at the genus level) by the total number of accepted species per genus (following Plants of the World Online 33 ). This second matrix was used in a second round of analyses to take into account the uncertainty in the relationship between plant-use records in the genus-level dataset and the species that are actually useful, as the latter information is often unknown.
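The downweighting step described above can be illustrated with a minimal Python sketch (the analyses in the study were run in R; this is only an illustration). The function name `downweight`, the dictionary layout and the toy genera and species counts are hypothetical and are not taken from the dataset.

```python
def downweight(binary_matrix, species_per_genus):
    """Divide each presence/absence entry of a genus by its number of accepted species."""
    weighted = {}
    for genus, uses in binary_matrix.items():
        n_species = species_per_genus[genus]
        weighted[genus] = {category: present / n_species
                           for category, present in uses.items()}
    return weighted


# Hypothetical toy data: two genera, two use categories.
binary = {
    "genus_a": {"medicine": 1, "human food": 0},
    "genus_b": {"medicine": 1, "human food": 1},
}
richness = {"genus_a": 2, "genus_b": 50}  # assumed species counts, for illustration only

weighted = downweight(binary, richness)
print(weighted["genus_a"]["medicine"], weighted["genus_b"]["medicine"])
```

In this toy case, the medicinal record of the species-poor genus carries more weight than that of the species-rich genus, which reflects the lower uncertainty about which species are actually useful.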
Of all the taxa included in the dataset, 33% showed at least one category of use, with a maximum number of plant-use records per taxon of 17 (Supplementary Fig. 6). The most common category was 'ornamental' (26%), followed by 'medicine' (16%), 'human food' (13%) and 'timber' (8%), while the other categories occurred at a frequency lower than 5% (Supplementary Fig. 7). The phi correlation coefficient among the categories varied between −0.008 and 0.332, suggesting overall weak relationships among them.
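For readers unfamiliar with the phi coefficient, the following Python sketch shows how it can be computed for two binary (presence/absence) category vectors from their 2 × 2 contingency counts. The function name and the toy vectors are hypothetical; the source does not state which implementation was used.

```python
import math


def phi_coefficient(x, y):
    """Phi correlation between two equally long 0/1 vectors."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        return float("nan")  # undefined when a vector is constant
    return (n11 * n00 - n10 * n01) / denominator


# Hypothetical toy vectors: presence of two use categories across four genera.
ornamental = [1, 1, 0, 0]
medicine = [1, 0, 1, 0]
print(phi_coefficient(ornamental, medicine))
```

Values close to zero, as in this toy case, indicate that the presence of one category in a genus says little about the presence of the other.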
Phylogenetic data. We generated a genus-level time-calibrated molecular phylogeny using the mega-tree GBOTB.extended 14 , which is a combination of the GBOTB tree for seed plants of Smith and Brown 13 and the pteridophytes clade in the phylogeny of Zanne et al. 34 with updates and corrections (that is, taxonomic standardization to The Plant List 35 nomenclatural and spelling criteria). This combined phylogeny represents the most comprehensive and sophisticated molecular phylogeny for vascular plants published hitherto. For each accepted genus in Mabberley's Plant-book, we picked one representative species at random from the largest monophyletic cluster of the genus in GBOTB.extended (if available). In the very few cases where more than one largest monophyletic cluster was found, we first selected one of the clusters at random and then picked one representative species. The GBOTB.extended phylogeny was then pruned to retain only the representative species of the genera. After resolving a few discrepancies and synonymy issues between Mabberley's Plant-book 15 and The Plant List 35 (using the nomenclatural criteria in Plants of the World Online 33 as a complementary reference to solve disputes), we found that 71% of the genera accepted in Mabberley's Plant-book included at least one representative species in the phylogeny. This purely molecular phylogenetic topology (hereafter 'molecular tree') revealed that all the taxonomic families of the genera included in the tree formed monophyletic clades except for Nymphaeaceae, Olacaceae and Tectariaceae, which were paraphyletic, and the polyphyletic Diplaziopsidaceae (see Supplementary Table 3 for a list of genera with taxonomic families). 
To take into account uncertainty in the phylogenetic relationships of the taxa that were missed in the molecular tree (hereafter 'phylogenetically uncertain taxa' or PUT 36 ), we derived a distribution of phylogenetic hypotheses from the latter using a systematic randomization procedure that was taxonomically and phylogenetically informed. The workflow implies defining for each PUT its 'most derived consensus clade' (MDCC) (that is, the clade in the molecular tree that most certainly contains the PUT) based on expert knowledge 36 (taxonomy, morphology, geographic distribution, etc.). Once the MDCCs of the PUTs are defined, a distribution of phylogenetic hypotheses can be generated by replicating the random insertion of the PUTs within their respective MDCCs a high number of times (for example, 100 times per posterior tree 36 ). The resultant phylogenetic hypotheses can then be used to replicate the analyses and average the results over the entire distribution of trees 9,14,36 . Smith and Brown 13 provided just one maximum-likelihood tree rather than a posterior distribution, thus we derived 100 alternative phylogenetic hypotheses from the maximum-likelihood tree as follows.
First, we retrieved for each genus in the dataset the taxonomic rank immediately above in the taxonomic hierarchy (typically subtribe, tribe or subfamily in ascending order, hereafter 'taxonomic ranks') from the National Center for Biotechnology Information (NCBI) Taxonomy database, the standard nomenclature and classification repository for the International Nucleotide Sequence Database Collaboration 37 . For some families, this information was not available in the NCBI repository, in which case we retrieved the taxonomic ranks from Mabberley's Plant-book 15 . In cases where taxonomic ranks were available in neither of these sources, we simply assigned the family rank to the genera. The mapping of taxonomic ranks in the molecular tree reveals whether they represent natural lineages (that is, monophyletic or paraphyletic 38 ), and we took advantage of such information to define the MDCCs for our PUTs. If the taxonomic rank of a PUT mapped as purely monophyletic or purely paraphyletic in the molecular tree, the subset of phylogenetic branches connecting all the genera in the tree that shared the same taxonomic rank as the PUT (hereafter 'sharing taxa') defined the MDCC (Supplementary Figs. 8a and 9a). In a few cases, the taxonomic ranks did not map as purely monophyletic or paraphyletic due to (1) the presence of 'outliers' that mapped away from the main cluster of sharing taxa or (2) the presence of 'intruders' from a different taxonomic rank within the main cluster. Such outliers and intruders might represent incorrect taxonomic assignments or even artefacts derived from the phylogenetic inference rather than evidence of unnatural (polyphyletic) groups. Thus, we calculated two different indices for each potential monophyletic or paraphyletic cluster of sharing taxa (because of the presence of outliers, intruders or both) in the phylogeny. 
The outlier ratio (OR) for a given set of sharing taxa is the ratio between the number of outliers observed for the set (relative to the largest cluster) and the number of sharing taxa in the set, and the intruder ratio (IR) is the ratio between the number of intruders observed within the largest cluster of sharing taxa and the size of the cluster (Supplementary Figs. 8 and 9). If (and only if) both ratios were ≤0.05, the subset of phylogenetic branches connecting all the sharing taxa in the largest cluster (that is, including intruders if any but not outliers) defined the MDCC of the PUT. Otherwise, the MDCC was defined as the smallest phylogenetic clade that included all the sharing taxa in the tree (that is, including outliers and/or intruders; Supplementary Figs. 8 and 9). In those cases where one single genus represented the only sharing taxon of a PUT in the molecular tree, the terminal node (phylogenetic tip) defined the MDCC of the PUT only if the node represented a singleton taxonomic family or subfamily. Otherwise (singleton tribes or subtribes), the parent node of the singleton sharing taxon defined the MDCC instead (Supplementary Fig. 10). Once all the PUTs were assigned to a MDCC (Supplementary Table 4), they were added to a randomly selected branch of their corresponding MDCC, the probability of being added along any branch of the clade being directly proportional to the length of the branch. We used a uniform distribution to determine the exact position to insert the PUTs along the selected branches 39 . This procedure was replicated 100 times to obtain a distribution of phylogenetic hypotheses.
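The decision rule based on the two ratios, and the random placement of a PUT within its MDCC, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the function names, the return labels and the toy branch lengths are hypothetical, and the cluster size used for the intruder ratio is passed explicitly because the text does not specify whether it counts intruders.

```python
import random


def mdcc_scope(n_sharing, n_outliers, n_intruders, cluster_size, threshold=0.05):
    """Decide which set of branches defines the MDCC under the OR/IR rule."""
    outlier_ratio = n_outliers / n_sharing       # OR: outliers over sharing taxa
    intruder_ratio = n_intruders / cluster_size  # IR: intruders over cluster size
    if outlier_ratio <= threshold and intruder_ratio <= threshold:
        return "largest_cluster"            # intruders kept, outliers dropped
    return "smallest_enclosing_clade"       # includes outliers and/or intruders


def pick_insertion_point(branch_lengths, rng):
    """Pick a branch with probability proportional to its length,
    then a uniformly distributed position along that branch."""
    branches = sorted(branch_lengths)
    weights = [branch_lengths[b] for b in branches]
    branch = rng.choices(branches, weights=weights, k=1)[0]
    position = rng.uniform(0.0, branch_lengths[branch])
    return branch, position


rng = random.Random(1)                            # fixed seed for repeatability
toy_clade = {"b1": 5.0, "b2": 1.0, "b3": 0.5}     # hypothetical branch lengths
print(mdcc_scope(n_sharing=40, n_outliers=1, n_intruders=1, cluster_size=39))
print(pick_insertion_point(toy_clade, rng))
```

Repeating the second function over all PUTs, and the whole procedure 100 times, would yield a distribution of trees analogous to the one described in the text.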
Finding the subsets of genera that maximize phylogenetic diversity. We used the phylogenetic diversity (PD) index as a metric of the evolutionary history encompassed by a set of taxa 4 because PD is the most commonly used metric in exercises that aim at maximizing phylogenetic diversity 4,8,9,40 . The greedy algorithm 41 was used to find heuristically the subset of genera in the phylogeny that maximized the PD metric (PD max ) for a sample size S = 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% of the total pool (n = 13,489). Because there are multiple subsets of size S that maximize PD in a phylogeny, we produced ten PD max subsets of genera per alternative phylogenetic hypothesis (n = 100) and sample size S. Thus, we obtained 1,000 different PD max subsets for each sample size S 9 .
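The greedy idea can be illustrated with a small Python sketch. It is not the greedyPD function used in the study: it assumes a rooted tree stored as child-to-parent and child-to-branch-length dictionaries (hypothetical names), measures PD as the total branch length connecting the chosen tips to the root, and breaks ties by alphabetical order, whereas ties in a real analysis would be broken at random to obtain alternative PD max subsets.

```python
def greedy_pd(parent, length, tips, k):
    """Greedily select k tips, each time adding the tip that contributes
    the most branch length not yet covered by previously chosen tips."""
    covered = set()            # nodes whose branch to the parent is already counted
    chosen, total = [], 0.0
    remaining = sorted(tips)
    for _ in range(min(k, len(remaining))):
        best_tip, best_gain = None, -1.0
        for tip in remaining:
            gain, node = 0.0, tip
            while node in parent and node not in covered:
                gain += length[node]
                node = parent[node]
            if gain > best_gain:   # strict comparison keeps the first tip on ties
                best_tip, best_gain = tip, gain
        node = best_tip
        while node in parent and node not in covered:
            covered.add(node)
            node = parent[node]
        chosen.append(best_tip)
        remaining.remove(best_tip)
        total += best_gain
    return chosen, total


# Hypothetical toy tree: ((A:1, B:1)X:2, C:3)root
parent = {"A": "X", "B": "X", "X": "root", "C": "root"}
length = {"A": 1.0, "B": 1.0, "X": 2.0, "C": 3.0}
subset, pd_value = greedy_pd(parent, length, ["A", "B", "C"], k=2)
print(subset, pd_value)
```

In the toy tree, the second pick is the distant tip C rather than B, the sister of A, because C adds three units of unshared branch length whereas B adds only one.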
Assessing the performance of the PD max strategy. With regard to human benefits provided by plant biodiversity, the PD max strategy could be considered more efficient than random selection of taxa if the former captures (i) a greater richness of plant-use categories, (ii) a greater number of plant-use records (in total and per category) and (iii) a greater equitability in the distribution of the records among the categories (Pielou's evenness index 42 ). Thus, for each sample size S, we computed these variables using 1,000 PD max subsets and averaged the results to obtain one observed value per sample size and variable 9 . We used standardized effect sizes (SES) to compare observed values against null distributions generated by randomly picking subsets of S taxa 1,000 times:

$$\mathrm{SES} = \frac{M_{\mathrm{obs}} - M_{\mathrm{null}}}{SD_{\mathrm{null}}} \qquad (1)$$

where SES is the SES score for a given variable and sample size, M obs is the observed averaged value of the variable when taxa selection is phylogenetically informed (that is, using PD max subsets), M null is the mean of the null distribution (averaged value of the variable when taxa are picked at random) and SD null is the standard deviation of the null distribution.
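As an illustration of the two summary statistics used here, the following Python sketch computes Pielou's evenness (Shannon entropy divided by the logarithm of the number of categories) and an SES score (observed value minus the null mean, divided by the null standard deviation). Two choices are assumptions of the sketch rather than statements from the source: evenness is computed over categories with at least one record, and the sample standard deviation is used for the null distribution.

```python
import math
import statistics


def pielou_evenness(counts):
    """Shannon entropy of the record counts divided by its maximum, ln(richness)."""
    positive = [c for c in counts if c > 0]   # assumption: empty categories ignored
    if len(positive) < 2:
        return float("nan")                   # evenness undefined for one category
    total = sum(positive)
    shannon = -sum((c / total) * math.log(c / total) for c in positive)
    return shannon / math.log(len(positive))


def ses(observed, null_values):
    """Standardized effect size of an observed value against a null distribution."""
    null_mean = statistics.mean(null_values)
    null_sd = statistics.stdev(null_values)   # assumption: sample standard deviation
    return (observed - null_mean) / null_sd


# Hypothetical toy values.
print(pielou_evenness([10, 10, 10, 10]))  # perfectly even distribution of records
print(ses(5.0, [1.0, 2.0, 3.0]))
```

In the study, the observed value would be the average over the 1,000 PD max subsets and the null values would come from the 1,000 random subsets of the same size.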

Phylogenetic diversity of plant-use categories.
We computed the amount of evolutionary history (PD) that is encapsulated in each plant-use category in our dataset 4 . PD is not statistically independent of taxa richness, which differed greatly between the categories (Supplementary Table 2). Therefore, in order to make PD values comparable between them, we computed SES scores using Eq. 1. Null distributions of PD were generated for each category by shuffling taxa labels across the phylogenetic tips 1,000 times 43 , and SES scores were averaged across the 100 phylogenetic hypotheses used in the study. All analyses were conducted in R 44 using the packages 'picante' 45 and 'phytools' 39 and the greedyPD function developed by Mazel et al. 9 .
Continental-scale analyses. To assess whether the relationship between PD and plant benefits holds across floras that have evolved in distinct biogeographic regions, we also conducted all the analyses described above at the continental scale.
To do so, we compiled a checklist of the native genera of each TDWG level 1 region (Biodiversity Information Standards 46 ), namely Africa (n = 4,487), Australasia (n = 2,067), Europe + Asia-Temperate (n = 4,117), North America (n = 3,307), Asia-Tropical (n = 4,071) and South America (n = 4,783), using distributional information available in Plants of the World Online 33 and also Mabberley's Plant-book 15 in the few cases where this information could not be retrieved from the former source. The TDWG regions 'Pacific' (minor Pacific islands) and 'Antarctic' were disregarded because they showed comparatively lower diversities, and 'Europe' and 'Asia-Temperate' were merged into one single unit because the taxonomic turnover between the two regions (β sim distance 47 ) was very low (Supplementary Table 5), meaning that most of the genus-level flora of 'Europe' (the less diverse of the two) is shared with that of 'Asia-Temperate'. Thus, we finally analysed six continental datasets separately. We note that widespread genera might not always include useful species across their entire distribution range, which would lead to overestimating the ethnofloras of the regions. Thus, in order to account for this uncertainty, we also conducted the continental-scale analyses using only the genera that were endemic to each region (2,294 for Africa, 776 for Australasia, 1,887 for Europe + Asia-Temperate, 824 for North America, 809 for Asia-Tropical and 2,387 for South America).
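The turnover measure used to justify merging 'Europe' and 'Asia-Temperate' can be illustrated with a short Python sketch of the Simpson dissimilarity, β sim = min(b, c) / (a + min(b, c)), where a is the number of shared genera and b and c are the numbers of genera unique to each region. The function name and the toy genus sets are hypothetical.

```python
def beta_sim(region_1, region_2):
    """Simpson dissimilarity (turnover) between two sets of genera."""
    shared = len(region_1 & region_2)       # a: genera present in both regions
    unique_1 = len(region_1 - region_2)     # b: genera only in region 1
    unique_2 = len(region_2 - region_1)     # c: genera only in region 2
    smaller_unique = min(unique_1, unique_2)
    if shared + smaller_unique == 0:
        return float("nan")                 # undefined if either set is empty
    return smaller_unique / (shared + smaller_unique)


# Hypothetical toy floras: the smaller flora is fully nested within the larger one.
small_flora = {"g1", "g2", "g3", "g4"}
large_flora = {"g1", "g2", "g3", "g4", "g5", "g6", "g7"}
print(beta_sim(small_flora, large_flora))
```

Because the index is insensitive to differences in richness, a less diverse flora that is largely a subset of a richer one yields a value close to zero, which is the situation described above for the two merged regions.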
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The data that support the findings of this study are available at https://doi.org/10.6084/m9.figshare.13625546.v1.

Code availability
All the code used in this research is available as functions that were either implemented in published R packages or provided as supplementary material in a previous open-access study.

Acknowledgements
We thank the Scientific Computation Center of Andalusia (CICA) for the computing services they provided and H. Lima for assistance in downloading plant distributional information from the web. This work was supported by the Regional Government of the Community of Madrid and the University of Alcalá through the project 'Plant evolutionary history and human wellbeing in a changing world; assessing theoretical foundations using empirical evidence and new phylogenetic tools' , which was granted to R.

Corresponding author(s): Rafael Molina-Venegas
Last updated by author(s): Jan 14, 2021

Reporting Summary

Software and code
Data collection
We used the software Excel to assemble the plant-use dataset.

Data analysis
All the analyses were conducted in R v. 3.6.3 using the packages picante, phytools and the greedyPD function published by Mazel et al. (2018).

Data
The data that support the findings of this study will be published as a data paper upon acceptance of the article. Meanwhile, they are available from the corresponding author upon reasonable request.

Study description
In this study, we tested the hypothesis that maximum levels of plant phylogenetic diversity (PD) capture more human benefits (i.e. plant-use records sorted into 28 standard categories of use) than does random selection of taxa, both globally and across the main continental regions of the world. Our genus-level analysis included all accepted vascular plant genera worldwide (a total of 13,489) as well as 9,478 genus-level plant-use records obtained from a systematic review of botanical literature and authoritative websites. We analyzed nine sample sizes representing the deciles of the total pool of genera (i.e. S = 10, 20, 30 and up to 90% of them), and quantified three key aspects of the wealth of human benefits captured by each subset of taxa (sampled both randomly and maximizing PD), namely: (1) (4). RMV generated a genus-level time-calibrated molecular phylogeny using the mega-tree GBOTB.extended. Missing genera were randomly inserted in the phylogeny following a systematic and standardized protocol. This procedure resulted in a distribution of 100 phylogenetic hypotheses, and results were averaged across the entire distribution of trees to account for phylogenetic uncertainty.
Timing and spatial scale
The dataset was assembled from October 2019 through March 2020, and it covers all accepted vascular plant genera (a total of 13,489). The analyses were conducted both globally and across the main continental regions of the world (TDWG level-1 standards).

Data exclusions
A few doubtful plant-use records (e.g. poorly documented, contradictory information) were disregarded.

Reproducibility
Our research was not experimental but an analysis of plant-use records documented in the literature in a phylogenetic context, and thus it is fully reproducible.

Randomization
This is not relevant for our study because we did not follow an experimental approach. We did not use multiple covariates in the study.