Darwin’s naturalization conundrum disentangled: the role of phylogenetic relatedness depends on the invasion stages


 Darwin’s naturalization hypothesis predicts successful invaders to be distantly related to native species, whereas his pre-adaptation hypothesis predicts the opposite. It has been suggested that depending on the invasion stage (i.e. introduction, naturalization, and invasiveness), both hypotheses, now known as Darwin’s naturalization conundrum, could hold true. We tested this by analysing whether the likelihood of introduction for cultivation as well as subsequent stages of naturalization and invasion of species alien to Southern Africa are correlated with their phylogenetic distance to the native flora of this region. While species were more likely to be introduced for cultivation if they are distantly related to the native flora, the probability of subsequent naturalization was higher for species closely related to the native flora. Furthermore, the probability of becoming invasive was higher for naturalized species distantly related to the native flora. These results were consistent across three different metrics of phylogenetic distance. Our study reveals that the relationship between phylogenetic distance to the native flora and success of an alien species depends on the invasion stage.

are pre-adapted to the local environmental conditions (the so-called pre-adaptation hypothesis). These apparently contradictory hypotheses are now known as Darwin's naturalization conundrum 32,38.
Studies that tested Darwin's naturalization conundrum have found no consistent relationships between invasion success and evolutionary distance to native species (reviewed in 32,39). This might reflect that the studies considered different temporal and spatial scales; for example, Darwin's naturalization hypothesis is likely to hold at small scales, and the pre-adaptation hypothesis at large scales 32,39,40. Furthermore, most studies addressed Darwin's naturalization conundrum without considering possible differences between invasion stages 28,41-45 (but see 38,46). The process of biological invasions can be subdivided into successive stages, and species have to pass different barriers or filters to move from one stage to the next 1,47. First, species must overcome biogeographic dispersal barriers via human introduction into a new region. Although Darwin's naturalization conundrum does not make any predictions regarding the introduction stage, accounting for this stage is important because the vast majority of naturalized plants have been introduced intentionally for economic uses (e.g. as ornamental plants), and these plants are not a random phylogenetic subset of the global flora 48. Ignoring such introduction bias can result in the over- or underestimation of the importance of species' traits underlying naturalization success 25. Once a species has been introduced, phylogenetic relatedness to the native flora may impact success in subsequent stages in different ways. It is plausible that in order to establish self-sustaining populations (i.e. to become naturalized) in a region, it is important that a species is pre-adapted to the new environment 49, which is likely the case when there are closely related natives 28,39,50.
However, in order to reach the next stage of invasion (i.e. to become dominant and widespread, and thus invasive), it may be beneficial to be different from the native species and capitalize on unoccupied niches, which is likely when there are no closely related natives. We hence hypothesize that distinguishing between the different stages of invasion 39,51,52 might provide new insights into Darwin's naturalization conundrum.
The evolutionary relatedness of naturalized species to native floras has typically been compared to the relatedness expected if the naturalized species had been randomly drawn from the global flora (e.g. refs. 29,53,54). Such a test relies on the unrealistic assumption that there has been no bias with regard to the phylogenetic affinity and characteristics of the introduced species and that all species have had the same opportunity to naturalize. However, the pool of introduced species from which the naturalized ones emerged is often phylogenetically biased 48. Information on the pool of alien species introduced to a region for cultivation, in addition to information on naturalized and native species, is necessary to account for biases that might be associated with the selection of species for introduction 28,42. Furthermore, when assessing drivers of plant invasions, one should consider that relationships may be non-linear (i.e. hump- or U-shaped) 18. To improve our understanding of the drivers of plant invasion success, we hence need to quantify and account for biases associated with the introduction stage 25,55 and to test for non-linear effects 40.
Here, we study 5,091 alien angiosperm species that have been introduced to Southern Africa for cultivation 56. Combined with lists of the global flora, the naturalized and native species within the region of Southern Africa, and the species that have become invasive within the country of South Africa (Fig. 1), we test Darwin's naturalization conundrum at the different stages of the invasion process. Specifically, we asked how phylogenetic relatedness to native species is correlated with the successive stages of introduction, naturalization and invasiveness of alien species.

The introduction and naturalization transitions
We found that of all the 299,833 angiosperm species that are not native to Southern Africa, 899 (0.29%) have become naturalized in at least one of Southern Africa's subregions. Among the phylogenetic distance models explaining the likelihood of being naturalized, phylogenetic distance to the most closely related native species (PD Min) provided the best model fit (i.e. had the lowest AIC), followed by weighted mean phylogenetic distance to the native species (PD wMean) and mean phylogenetic distance to the native species (PD Mean) (Table S1). The likelihood of naturalization of a species from the global species pool decreased significantly for species more distantly related to the native flora according to all three phylogenetic indices (Fig. 3a, b, c; Table S1).
When accounting for introduction bias by considering the two transitions (to introduction for cultivation, and from introduction to naturalization) separately, we found that of all 299,833 angiosperm species that are not native to Southern Africa, 5,091 (1.69%) have been introduced for cultivation, of which 554 (10.88%) have become naturalized in at least one subregion of Southern Africa. For both transitions, PD Min again provided the best model fit among the phylogenetic distance indices (Table S1).
Regardless of which distance index was chosen, the probability of a species being introduced from the global species pool into Southern Africa increased with phylogenetic distance to the native flora (Fig. 2a; Fig. 3d, e, f; Table S1), whereas the opposite was true for the probability of subsequent naturalization (Fig. 2b; Fig. 3g, h, i; Table S1). However, the amount of variation explained by the models was relatively small (all Nagelkerke R² < 0.1; Table S1).
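The two quantities used above to compare models, AIC and Nagelkerke's pseudo R², can both be computed from model log-likelihoods. The following is a minimal sketch using the standard textbook formulas; the log-likelihood values and the parameter count are hypothetical placeholders chosen for illustration and are not taken from Table S1.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: lower values indicate a better fit."""
    return 2 * n_params - 2 * log_likelihood


def nagelkerke_r2(ll_null, ll_model, n_obs):
    """Nagelkerke's pseudo R2: Cox-Snell R2 rescaled by its maximum value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n_obs)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n_obs)
    return cox_snell / max_cox_snell


# Hypothetical log-likelihoods for three single-predictor models
# (illustrative numbers only, not the study's results).
candidates = {"PD_Min": -180.0, "PD_wMean": -183.5, "PD_Mean": -184.0}

# Each model has an intercept and one slope, hence two parameters.
aics = {name: aic(ll, n_params=2) for name, ll in candidates.items()}
best = min(aics, key=aics.get)

# Hypothetical intercept-only log-likelihood and sample size.
r2_best = nagelkerke_r2(ll_null=-190.0, ll_model=candidates[best], n_obs=554)
print(best, aics[best], round(r2_best, 3))
```

With equal numbers of parameters, ranking by AIC is equivalent to ranking by log-likelihood, so a model can be clearly the best of the three and still explain little variation, as is the case for the models reported here.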

The naturalization-invasiveness transition
Among the 459 introduced cultivated species that have naturalized in the country of South Africa, 261 (56.8%) are considered invasive. In the phylogenetic distance models explaining the likelihood of being invasive in South Africa, PD Min again provided the best model fit, followed by PD wMean and PD Mean (Table S2). According to the best model, naturalized species were more likely to become invasive if the phylogenetic distance to the most closely related native species is large (Fig. 2c; Fig. 4c; Table S2). Yet, the two other models revealed marginal non-linear U-shaped patterns, indicating not only that naturalized species that were distantly related to the native flora are more likely to become invasive, but also that naturalized species that were closely related to the native flora are more likely to become invasive than those with intermediate relatedness (Fig. 4a, b; Table S2).

Discussion
As many studies on biological invasions lack crucial data on which species have been introduced but failed to establish, we took advantage of a comprehensive list of alien plant species that have been introduced to Southern Africa for cultivation. This list allowed us to account for introduction biases in our tests of Darwin's naturalization conundrum along the different stages of the invasion process. Our results show that alien species distantly related to native species were more likely to be introduced for cultivation. Once introduced, however, those that are more closely related to native species were more likely to naturalize. On the other hand, among the naturalized species in the country of South Africa, the ones most distantly related to the natives were more likely to become invasive. Our results thus show that phylogenetic distance to the native flora has opposing effects on the transitions during the invasion process.
Humans have introduced thousands of plant species from their native regions into foreign lands. Although some of those introductions were accidental, most of them were intentional for cultivation purposes 40,57-60. Moreover, it is likely that the species that have been introduced are not a random selection from the global flora but possess certain characteristics that make them of interest for cultivation 25,55. This is also reflected in their phylogenetic distance to the native flora, as we found that introduction of alien species for cultivation in Southern Africa was positively associated with phylogenetic distance to the native flora (Fig. 2a; Fig. 3d, e, f; Table S1). This most likely indicates that non-native species with characteristics that are missing in the native flora were more likely to have been prospected for cultivation in Southern Africa. For example, the fact that Australian Eucalyptus species, which have no close relatives in Southern Africa, grow faster and produce better wood than most native Southern African trees made them very attractive for cultivation in forestry plantations 61. The same is true for the introduction of other woody species to South Africa that were planted for desertification control and to reduce firewood shortage 62. In other words, our results are in line with the idea that humans might preferentially introduce plant species with characteristics that could provide economic and social benefits or ecosystem services that are not provided by species of the native flora.
Once introduced, not all alien species manage to grow and reproduce outside of cultivation. We found that the naturalization success of species introduced for cultivation was negatively associated with phylogenetic distance to the native flora (Figs. 2b, 3g, h, i; Table S1). In other words, and in line with the pre-adaptation hypothesis, introduced species that have closely related, and likely ecologically similar, species in the native Southern African flora were more likely to naturalize. This pattern was even visible when we considered the naturalization of all species in the global flora that are not native to Southern Africa (Fig. 3a, b, c; Table S1), despite the biased introduction of species that are distantly related to the native flora. This suggests that even if there is no information on which alien species have been introduced and failed to establish, comparing the naturalized ones to the global flora, as done in previous studies (e.g. 29,53,54), provides some indication of the importance of phylogenetic distance.
Naturalization in a new range depends on several abiotic and biotic factors 1. These factors can act as filters that determine which species can in principle grow in the region, but biotic interactions such as competition can also provide resistance against the ultimate establishment of those species. It has been shown that, at least at the local scale, environmental pre-adaptation and biotic resistance are both important for alien plant naturalization 38,43,63. However, in line with our results, at large spatial scales, environmental filtering is usually the most decisive factor for naturalization success 28,39,50. One potential explanation, the pre-adaptation hypothesis proposed by Darwin 36, is that introduced species closely related to the native flora share features with those native species that allow for survival and reproduction in the new range. These shared characteristics make the alien species pre-adapted to the new environmental conditions. While it is likely that closely related species have stronger competitive impacts on each other 64-67 (but see 68), at large scales, the pre-adaptation effect may overrule the drawback of stronger competition from close relatives 32,41,69. In a study that used functional traits of alien and native flora, Divíšek, et al. 21 found that the similarity of alien and native species facilitates naturalization, but not invasiveness. This agreement with our study suggests that this pattern persists even without accounting for the phylogenetic signal of species traits.
While the naturalization of introduced alien plants in Southern Africa might have occurred at sites where closely related native species happened to be present, this is less likely the case for naturalized species that have become widespread and locally dominant (i.e. invasive). In order for a naturalized species to spread and become dominant, it will have to interact with increasing numbers of native species. Those interactions are expected to be less detrimental for the alien species if it is more dissimilar to the natives, as this will allow it to avoid strong competition and common enemies by occupying vacant niches 69,70. In line with this idea, we found that all naturalized species and the subset of cultivated naturalized species in South Africa were more likely to be invasive if they have no close relatives in the native flora. This has also been shown for alien trees and shrubs in Southern Africa 71. It should be noted, however, that in this final stage of the invasion process, the three phylogenetic indices provided slightly different results. While the best-fitting model revealed a monotonic increase of the probability of being invasive with increasing PD Min, the other models revealed marginal non-linear U-shaped trends for PD Mean and PD wMean. At high values of PD Mean and PD wMean, invasiveness probability increased with increasing distance, as it did with PD Min, but it also showed a slight increase at very low values of PD Mean and PD wMean (Fig. 4a, b; Table S2). This not only suggests that naturalized species are more likely to become invasive when they are phylogenetically distant from the native flora, but also that closely related species could benefit, e.g. by sharing pollinators 72, and that super-competitors among alien species can outcompete closely related native species 73. It could also reflect that closely related species have weaker negative allelopathic effects on each other, as recently shown by Zhang, et al. 74.
Although we used three different phylogenetic distance indices, they largely revealed the same patterns. This is not surprising given that the indices are correlated. Nevertheless, models that included PD Min had the best fit. This suggests that for successful transitions from one stage of the invasion process to the next one, the phylogenetic distance to the native species most closely related to the alien species is more important than the average distance to all native species, weighted or unweighted with regard to commonness 32. In another study, however, Malecore, et al. 43 found that models that used PD Min consistently had the worst fit. This apparent discrepancy may reflect the difference in spatial scale between both studies. In our large-scale study, the most closely related native species is more likely to indicate whether there is a suitable habitat for the species somewhere in Southern Africa than the average distance to all native species does. At a smaller spatial scale, such as in the local plant communities of Malecore, et al. 43, the unweighted or weighted mean distance to all native species might be a better indicator of the overall suitability of the local site for the introduced species.
In conclusion, our results show that the direction of the effect of phylogenetic distance of alien plants to the native flora alternated along invasion stages in Southern Africa. While introduction success was positively associated with phylogenetic distance to the native flora, the opposite was true for subsequent naturalization success, but invasiveness was again positively associated with phylogenetic distance. For the latter, the association might be non-linear, at least for the PD Mean and PD wMean indices, with a tendency that not just distantly related aliens are more invasive but also aliens that are very closely related to natives. Thus, accounting for the different invasion stages and considering multiple phylogenetic distance metrics provide more insights into Darwin's naturalization conundrum. Finally, yet importantly, the two seemingly opposing hypotheses of Darwin need not be in conflict. Rather, the mechanisms underlying them act concurrently, and the one that dominates depends on the alien species' stage along the invasion process.

Study area
Our study focuses on the region of Southern Africa, which includes 10 countries: Angola, Botswana, Eswatini, Lesotho, Malawi, Mozambique, Namibia, South Africa, Zambia and Zimbabwe, with approximately 4,000,000 km² of land area 71. As we had separate native species occurrence data for the nine provinces of South Africa, the total number of Southern African subregions was 18 (Fig. 1). Southern Africa has had a long history of plant introductions, which started in the late 18th century with the arrival of the European settlers 75.

The cultivated, naturalized, and invasive alien species lists
The basis for our study is a list of over 8,000 taxa (including species and infraspecific taxa; hereafter jointly referred to as 'species') reported as being cultivated in Southern Africa in the book 'Cultivated Plants of Southern Africa' 56. To allow the alignment of the list of species with other datasets used in this study (see below), we validated and synchronized the taxonomic names according to The Plant List (version 1.1; http://www.theplantlist.org/) using the R package 'Taxonstand' 76. We then removed the species that, according to The Global Inventory of Floras and Traits database (GIFT) 77, are native to Southern Africa. Moreover, we removed ferns and gymnosperms as they are phylogenetically very distinct from the angiosperms. The final list of introduced species comprised 5,091 cultivated alien angiosperm species.
To identify which of the cultivated alien species have become naturalized in Southern Africa, we used the Global Naturalized Alien Flora database (GloNAF) 78. GloNAF includes lists of naturalized vascular plant taxa for over 1,000 regions (usually administrative regions, such as countries, states and provinces) around the globe 78. Of the 5,091 cultivated species, 554 (10.88%) have managed to naturalize in at least one subregion of Southern Africa (following the GloNAF regions 78).
As invasiveness of the naturalized species might vary among the different subregions of Southern Africa, and because data on invasiveness were not available for all those subregions, we restricted the analysis of how invasiveness relates to phylogenetic distance between the alien and native species to the subset of alien species that are naturalized in the country of South Africa (n = 459). For this, we used the most recent list of invasive plant species of South Africa by Zengeya and Wilson 79, in which invasive species are defined as widely spread species, in accordance with the unified invasion framework of Blackburn, et al. 47.

Global angiosperm species pool
To test whether the cultivated introduced plant species as well as the naturalized species are more or less phylogenetically related to the native flora of Southern Africa than expected by chance, we needed a list of the global species pool outside of Southern Africa. To get such a list, we extracted all 324,808 accepted names of angiosperm taxa from The Plant List (http://www.theplantlist.org/, last accessed in May 2019). From this list, we excluded the species that, according to GIFT, are native to Southern Africa (n = 24,974), resulting in a global pool of 299,834 species that are not native to Southern Africa.
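The construction of the non-native pool amounts to a set difference between the accepted names and the native names. The sketch below illustrates this with a four-species toy list; the function name and the toy lists are hypothetical, and only the two totals at the end are taken from the text.

```python
def non_native_pool(accepted_names, native_names):
    """Return the accepted names that are not native to the focal region."""
    return set(accepted_names) - set(native_names)


# Toy illustration (hypothetical mini-lists, not the actual data sources).
accepted = {
    "Eucalyptus globulus",
    "Acacia mearnsii",
    "Protea cynaroides",
    "Aloe ferox",
}
native = {"Protea cynaroides", "Aloe ferox"}
pool = non_native_pool(accepted, native)
print(sorted(pool))

# Arithmetic check of the pool size using the counts reported in the text.
n_accepted = 324_808
n_native = 24_974
expected_pool_size = n_accepted - n_native
print(expected_pool_size)
```

The subtraction reproduces the 299,834 species stated above, under the assumption that every native species appears among the accepted names.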

Phylogenetic tree of the global flora
To compute phylogenetic distance indices between the species from the global species pool outside of Southern Africa and the native species of Southern Africa, we constructed a global phylogenetic tree of all angiosperms with accepted names according to The Plant List. In brief, as a basis for this phylogeny, we used the time-calibrated phylogeny of Smith and Brown 80, which is currently the most comprehensive phylogeny for seed plants. Of the global angiosperm flora (n = 324,808 species), 71,133 (21.9%) species were missing from the base tree. We added these missing species to their genus- or family-level root node in the base tree using the R package 'V.PhyloMaker' 81. The family Haptanthaceae is missing from the base tree, and the only species belonging to this family (Haptanthus hazlettii) was excluded from our analysis. This resulted in a global angiosperm phylogeny of 324,807 species, of which 24,974 are native to Southern Africa.

Phylogenetic distance indices
Several metrics can be used to describe the phylogenetic relatedness between plant species 32. We selected the three most commonly used phylogenetic distance indices 32. First, we calculated the mean phylogenetic distance of a focal alien species to all native species (PD Mean). This measure assumes that the whole native flora (irrespective of the abundances of the species) contributes equally to the success or failure of the alien species. Second, we calculated the weighted mean phylogenetic distance to the native species (PD wMean). This measure weights the phylogenetic distance by the occurrence frequency of the native species, assuming that widespread native species will be more likely to interact with the alien species. As a measure of how widespread a native species is, we used the number of Southern African subregions (total n = 18; Fig. 1) in which it occurs according to GIFT. Third, we calculated the phylogenetic distance to the most closely related native species (PD Min). This measure assumes that the success of an alien species is driven by its distance to the phylogenetically nearest native species because they most likely use similar resources and share similar enemies and mutualists.
Using the global angiosperm phylogeny, we calculated for each species that is introduced for cultivation in Southern Africa its PD Mean, PD wMean and PD Min to the native flora of Southern Africa. In addition, to test for the effect of phylogenetic relatedness on the probability that a naturalized species becomes invasive in the country of South Africa, we calculated for each cultivated naturalized species in South Africa its PD Mean, PD wMean and PD Min to the native flora of South Africa (n = 15,382, according to GIFT).
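The three indices defined above can be illustrated for a single focal alien species as follows. This is a minimal sketch, assuming that the pairwise distances to the native species have already been extracted from the phylogeny and that PD wMean is an arithmetic mean weighted by the number of subregions occupied; the function name, species labels and numbers are hypothetical.

```python
def pd_indices(distances, occurrence_counts):
    """Compute (PD_Mean, PD_wMean, PD_Min) for one focal alien species.

    distances: native species -> phylogenetic distance to the focal alien.
    occurrence_counts: native species -> number of subregions occupied,
        used as the weight (assumed weighting scheme).
    """
    pd_mean = sum(distances.values()) / len(distances)
    total_weight = sum(occurrence_counts[sp] for sp in distances)
    pd_wmean = (
        sum(d * occurrence_counts[sp] for sp, d in distances.items())
        / total_weight
    )
    pd_min = min(distances.values())
    return pd_mean, pd_wmean, pd_min


# Hypothetical distances (e.g. in million years) to three native species.
distances = {"native_a": 20.0, "native_b": 100.0, "native_c": 180.0}
# Hypothetical numbers of subregions (out of 18) in which each native occurs.
occurrences = {"native_a": 1, "native_b": 3, "native_c": 6}

pd_mean, pd_wmean, pd_min = pd_indices(distances, occurrences)
print(pd_mean, pd_wmean, pd_min)
```

In this toy case the widespread native is also the most distant one, so PD wMean exceeds PD Mean, while PD Min is determined entirely by the single closest relative.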

Statistical analysis
All statistical analyses were performed in R version 3.6.1 82. To test how introduction for cultivation, naturalization and invasiveness are associated with the phylogenetic distance indices PD Mean, PD wMean and PD Min, we used a series of complementary generalized linear models (GLMs) with binomial error distribution and either a complementary log-log (clog-log) or a logit link function (see below). To see how naturalization success relates to phylogenetic distance if one does not account for a potential introduction bias, we first tested among the entire non-native global flora how the likelihood of naturalization in Southern Africa relates to the phylogenetic distance indices. Because the binomial response variable in these analyses had many more zeros than ones, we used the clog-log link function 83. Then, to account for the introduction bias, we broke this analysis down into two further analyses: (1) We tested with the entire global flora not native to Southern Africa how the likelihood of being introduced to Southern Africa for cultivation relates to the phylogenetic distance indices. We again used the clog-log link function in this analysis. (2) We tested among the species introduced for cultivation in Southern Africa how the likelihood of naturalization relates to the phylogenetic distance indices. Because the numbers of zeros and ones were similar in this analysis, we used the logit link function. Finally, we tested for the subset of cultivated species that have become naturalized in the country of South Africa (n = 459) how the likelihood of invasiveness relates to the phylogenetic distance indices. We also used the logit link function for this analysis. For the invasiveness analysis, we assessed the robustness of results when considering all naturalized species in South Africa according to GloNAF (n = 799 for South Africa), i.e. irrespective of whether they were introduced for cultivation (n = 459 for South Africa) or not (n = 340). As the results of the models were comparable when considering all species or only the cultivated species, we present the results from the analyses using the cultivated species in the main manuscript and the results from the analyses using all species in the Appendix (Table S2).
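The difference between the two link functions used above can be made concrete with their inverse forms, which map the linear predictor onto a probability. The sketch below uses the standard definitions; the function names and the example values of the linear predictor are illustrative assumptions and are not fitted coefficients from this study.

```python
import math


def inv_logit(eta):
    """Inverse logit link: symmetric around a probability of 0.5."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_cloglog(eta):
    """Inverse complementary log-log link: asymmetric around 0.5."""
    return 1.0 - math.exp(-math.exp(eta))


def logit(p):
    return math.log(p / (1.0 - p))


def cloglog(p):
    return math.log(-math.log(1.0 - p))


# Illustrative linear-predictor values (not model estimates).
for eta in (-5.0, -2.0, 0.0, 2.0):
    print(eta, round(inv_logit(eta), 4), round(inv_cloglog(eta), 4))
```

For strongly negative linear predictors, i.e. rare events, the two links give almost the same probabilities, but the clog-log curve approaches one much faster than it approaches zero, which is the asymmetry that motivates its use for responses with many more zeros than ones.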
As pairwise Pearson correlations between some of the phylogenetic indices were strong to very strong (PD Mean and PD wMean: r = 0.98, PD Mean and PD Min: r = 0.35, PD Min and PD wMean: r = 0.37), we ran separate GLMs for each of them. We then used Akaike's Information Criterion (AIC) to identify which of the three phylogenetic distance indices resulted in the best model fit, as in 43. To test for potential non-linear hump- or U-shaped relationships of introduction, naturalization and invasiveness probabilities with the phylogenetic distance indices, we also included a quadratic term for each distance index after centering the indices to means of zero. Moreover, to facilitate comparisons of the estimates within and between the models, we also scaled each explanatory variable to a standard deviation of one 84. When the quadratic term was not significant (i.e. P > 0.05), we removed it from the model. To calculate the deviance explained (R²) by the explanatory variables in our models, we calculated the Nagelkerke pseudo R² for each GLM 85 using the 'rcompanion' R package 86. To account for phylogenetic non-independence of the species, we also ran phylogenetically corrected GLMs using the 'phyloglm' function of the R package 'phylolm', ver. 2.6.2 87. However, because the results were very similar to those of the standard GLMs, we present the results of the phylogenetically corrected GLMs only in Tables S3 and S4.

Figure 1. Map of Africa and the study regions showing the 10 countries of Southern Africa: Angola, Botswana, Eswatini, Lesotho, Malawi, Mozambique, Namibia, South Africa, Zambia and Zimbabwe. As we had separate native-species occurrence data for the nine provinces of South Africa, the total number of Southern African regions was 18.

Figure 2. Phylogenetic tree of naturalized species in the country of South Africa (n = 459). The three inner rings indicate in heat colors the three indices of phylogenetic distance to the respective native flora: mean phylogenetic distance to the native species (PD Mean), weighted mean phylogenetic distance to the native species (PD wMean), and phylogenetic distance to the nearest native species (PD Min). The outer black stripes in a, b and c indicate cultivated species (n = 5,091), naturalized species (n = 554) and invasive species (n = 261), respectively. The names of major clades (i.e. plant orders) are provided for orientation.

Figure 3. The probabilities of introduction and naturalization success of alien flora of Southern Africa. The probabilities of naturalization of all non-native plant species (a-c), of introduction for cultivation of all alien plant species (d-f) and of naturalization of introduced cultivated plant species in Southern Africa (g-i) in relation to the different phylogenetic distance indices: mean phylogenetic distance to the native species (PD Mean; a,d,g), weighted mean phylogenetic distance to the native