The role of phylogenetic relatedness on alien plant success depends on the stage of invasion

Darwin’s naturalization hypothesis predicts successful alien invaders to be distantly related to native species, whereas his pre-adaptation hypothesis predicts the opposite. It has been suggested that depending on the invasion stage (that is, introduction, naturalization and invasiveness), both hypotheses, now known as Darwin’s naturalization conundrum, could hold true. We tested this by analysing whether the likelihood of introduction for cultivation, as well as the subsequent stages of naturalization and spread (that is, becoming invasive) of species alien to Southern Africa are correlated with their phylogenetic distance to the native flora of this region. Although species are more likely to be introduced for cultivation if they are distantly related to the native flora, the probability of subsequent naturalization was higher for species closely related to the native flora. Furthermore, the probability of becoming invasive was higher for naturalized species distantly related to the native flora. These results were consistent across three different metrics of phylogenetic distance. Our study reveals that the relationship between phylogenetic distance to the native flora and the success of an alien species changes from one invasion stage to the other. This paper examines how the relationship between native and alien plants changes the nature of an invasion, finding that the stages, and ultimate success, of an invasion are intrinsically linked to the phylogenetic relationship.

The translocation and introduction of species to new regions where they are not native and their successful establishment in the wild (naturalization) 1 is rapidly changing the distribution of biota 2,3 . A subset of those naturalized alien species, the invasive ones, cause serious environmental and socio-economic problems [4][5][6][7] . Numbers of introduced, naturalized and invasive species are expected to continue to increase 6,8,9 , partly boosted by other global changes (for example, climate and land use) and the increasing transportation of people and goods 8,[10][11][12][13] . Thus, understanding the drivers of biological invasions has become a major objective in ecology.
Many studies have tried to determine the features of successful plant invaders by analysing their functional traits (for example, seed mass, height, mode of reproduction), native origins and introduction history across spatial and temporal scales [14][15][16][17][18][19][20][21][22][23][24][25][26][27] . Another line of research has tried to explain invasion success through taxonomic and phylogenetic comparisons of alien and native floras [28][29][30] , because evolutionary relatedness is assumed to reflect the functional similarity of species. This approach has been fostered by the increasing availability of phylogenetic information, which has allowed for more accurate estimates of the evolutionary relatedness of alien to native species [31][32][33][34][35] .
Interest in using evolutionary relatedness as a potential driver of invasion success started with Darwin's naturalization hypothesis 32,36,37 . Darwin 36 proposed that the more distantly related an alien species is to the native community, the more likely it will establish successfully. The reasoning behind this prediction is that distantly related alien species are functionally more distinct from native species, and it is therefore more likely that they can utilize empty ecological niches in the resident community. However, Darwin 36 also proposed that alien species that are closely related to native species should be more successful because their similarity to the native species would indicate that they are pre-adapted to local environmental conditions: the so-called pre-adaptation hypothesis. These seemingly contradictory hypotheses are known as Darwin's naturalization conundrum 32,38 .
Studies that tested Darwin's naturalization conundrum have found no consistent relationships between invasion success and evolutionary distance to native species 32,39 . This might reflect that studies considered different temporal and spatial scales; for example, at small scales Darwin's naturalization hypothesis is likely to hold, whereas at large scales the pre-adaptation hypothesis is likely to hold 32,39,40 . Furthermore, most studies have addressed Darwin's naturalization conundrum without considering possible differences between invasion stages 28,[41][42][43][44][45] , although some have considered both naturalization and invasiveness 38,46,47 . The process of biological invasion can be subdivided into subsequent stages, and species have to pass different barriers or filters to move from one stage to the next 1,48 . First, species must overcome biogeographic dispersal barriers via human introduction into a new region. Although Darwin's naturalization conundrum does not make any predictions regarding the introduction stage, accounting for this stage is important because introduced plants, such as the ones introduced intentionally for economic uses (for example, as ornamental plants), are unlikely to be a random phylogenetic subset of the global flora 49 . Ignoring such introduction bias can result in the over- or underestimation of the importance of species' traits and phylogenetic distance for naturalization success 25 . Once a species has been introduced, phylogenetic relatedness to the native flora may impact its success in subsequent stages in different ways. It is plausible that to establish self-sustaining populations (that is, to become naturalized) in a region, it is important that a species is pre-adapted to the new environment 50 , which is likely the case when there are closely related natives 28,39,51 .
However, to reach the next stage of invasion (that is, to become dominant and widespread, or invasive), it may be beneficial to be different from the native species and capitalize on unoccupied niches, which is likely when there are no closely related natives. We hence hypothesize that distinguishing between the three different stages of invasion 39,52,53 might provide additional insights into Darwin's naturalization conundrum.
The evolutionary relatedness of naturalized species to native floras has typically been compared with the relatedness expected if the naturalized species had been drawn randomly from the global flora 29,54,55 . Such a test relies on the unrealistic assumption that there has been no bias with regard to the phylogenetic affinity and characteristics of the introduced species and that all species have had the same opportunity to naturalize. However, the pool of introduced species from which the naturalized ones have emerged is often phylogenetically biased 49 . Information on the pool of alien species introduced into a region for cultivation, in addition to information on naturalized and native species, would be needed to account for biases that might be associated with the selection of species for introduction 28,42 . Furthermore, when assessing drivers of plant invasions, one should consider that relationships may be nonlinear (hump or U-shaped) 18 . To improve our understanding of the drivers of plant invasion success, we hence need to quantify and account for biases associated with the introduction stage 25,56 and to test for nonlinear effects 40 .
Here, we study 5,091 alien angiosperm species that have been introduced into Southern Africa for cultivation 57 . Combined with lists of the global flora, the naturalized and native species within the region of Southern Africa and the species that have become invasive within the country of South Africa (Fig. 1), we test Darwin's naturalization conundrum at the three different stages of the invasion process. Specifically, we asked how phylogenetic relatedness to native species is correlated with the subsequent stages of introduction for cultivation, naturalization and invasiveness of alien species.

Results
The introduction and naturalization transitions. We found that of the 299,834 angiosperm species that are not native to Southern Africa, 974 (0.32%) have become naturalized in at least one of Southern Africa's subregions. Among the phylogenetic distance models explaining the likelihood of being naturalized, phylogenetic distance to the most closely related native species (PD Min ) provided the best model fit (had the lowest Akaike's information criterion), followed by weighted mean phylogenetic distance to the native species (PD wMean ) and mean phylogenetic distance to the native species (PD Mean ) ( Fig. 2 and Supplementary Table 1). The likelihood of naturalization of a species from the global species pool decreased significantly for species more distantly related to the native flora according to all three phylogenetic indices ( Fig. 3a-c, Supplementary Fig. 2a and Supplementary Table 1).
When accounting for cultivation bias by considering the two transitions (to the introduction for cultivation, and from the introduction for cultivation to naturalization) separately, we found that of the 299,834 angiosperm species that are not native to Southern Africa, 5,091 have been introduced for cultivation, of which 592 (11.6%) have become naturalized in at least one of Southern Africa's subregions.
The naturalization-invasiveness transition. Because information on the invasiveness of naturalized aliens is not available for the entire region of Southern Africa, we tested for a phylogenetic signal in the naturalization-invasiveness transition within the naturalized cultivated flora of the country of South Africa. Among the 524 introduced cultivated species that have naturalized in the country of South Africa, 310 (59.1%) are considered invasive. In phylogenetic distance models explaining the likelihood of being invasive in the country of South Africa, PD Min provided the best model fit, followed by PD wMean and PD Mean (Supplementary Table 2). According to all three models, naturalized species were more likely to become invasive if the phylogenetic distance to the most closely related native species is large (Supplementary Fig. 1c, Fig. 4 and Supplementary Table 2).

Discussion
Because many studies on biological invasions lack crucial data on which species have been introduced but failed to establish, we took advantage of a comprehensive list of alien plant species that have been introduced into Southern Africa for cultivation. This list allowed us to account for cultivation biases in our tests of Darwin's naturalization conundrum along the different stages of the invasion process. Our results show that alien species distantly related to native species were more likely to be introduced for cultivation. Once introduced for cultivation, however, those that are more closely related to native species were more likely to naturalize. By contrast, among the naturalized species in the country of South Africa, the ones most distantly related to the natives were more likely to become invasive. Our results thus show that phylogenetic distance to the native flora has opposing effects on the subsequent transitions during the invasion process.
Humans have introduced thousands of plant species from their native regions into foreign lands. Although some of those introductions were accidental, most were intentional for cultivation purposes 40,58-61 . Moreover, it is likely that the species that have been introduced are not a random selection from the global flora but possess certain characteristics that make them of interest for cultivation 25,56 . This is also reflected in their phylogenetic distance to the native flora because we found that introduction of alien species for cultivation was positively associated with phylogenetic distance to the native flora (Supplementary Table 1). This most likely indicates that non-native species with characteristics that are missing in the native flora were more likely to have been prospected for cultivation in Southern Africa. For example, the fact that Australian Eucalyptus species, which have no close relatives in Southern Africa, grow faster and produce better wood than most native Southern African trees made them very attractive for cultivation in forestry plantations 62 . The same is true for the introduction of other woody species into the country of South Africa that were planted for desertification control and to reduce firewood shortages 63 . In other words, our results are in line with the idea that humans might preferentially introduce plant species with characteristics that could provide economic and social benefits or ecosystem services not provided by species of the native flora.
Once introduced for cultivation, not all alien species manage to grow and reproduce outside cultivation. We found that the naturalization success of species introduced for cultivation was negatively associated with phylogenetic distance to the native flora (Figs. 2 and 3g-i and Supplementary Table 1). In other words, and in line with the pre-adaptation hypothesis, introduced species that have closely related, and likely ecologically similar, species in the native Southern African flora were more likely to naturalize. This pattern was even visible when we considered the naturalization of non-cultivated species only (Supplementary Fig. 2) and the naturalization of all species in the global flora that are not native to Southern Africa (Fig.  3a-c and Supplementary Table 1), despite the biased introduction of cultivated species that are distantly related to the native flora. This suggests that even if there is no information on which alien species have been introduced and failed to establish, comparing naturalized species with the global flora, as done in previous studies 29,54,55 , provides some indication of the importance of phylogenetic distance.
Naturalization in a new range depends on several abiotic and biotic factors 1 . These factors can act as filters that determine which species can, in principle, grow in the region, but biotic interactions such as competition can also provide resistance against the ultimate establishment of those species. It has been shown that, at least at the local scale, environmental pre-adaptation and biotic resistance are both important for alien plant naturalization 38,43,64 . However, in line with our results, at large spatial scales, environmental filtering is usually the most decisive factor for naturalization success 28,39,51 . One potential explanation, the pre-adaptation hypothesis proposed by Darwin 36 , is that introduced species closely related to the native flora share features with those native species that allow for survival and reproduction in the new range. Indeed, a study that used functional traits of alien and native plants found that the similarity of both groups facilitates naturalization 21 . These shared characteristics make the alien species pre-adapted to the new environmental conditions. Although it is likely that closely related species have stronger competitive impacts on each other [65][66][67][68] (but see Dostál 69 ), at large scales, the pre-adaptation effect may overrule the drawback of stronger competition from close relatives 32,41,70 . Overall, our results at the naturalization stage support the results of previous studies that examined the same mechanism 39 .
Although the naturalization of introduced alien plants for cultivation in Southern Africa might have occurred at sites where closely related native species happened to be present, this is less likely the case for naturalized species that have become widespread and locally dominant (invasive). For a naturalized species to spread and become dominant, it has to interact with increasing numbers of native species. Those interactions are expected to be less detrimental for the alien species if it is more dissimilar to the natives because this allows it to avoid strong competition and common enemies by occupying vacant niches 70,71 . In line with this idea, we found that all naturalized species (Supplementary Fig. 1c) and the subset of cultivated naturalized species in the country of South Africa (Fig. 4) were more likely to be invasive if they have no close relatives in the native flora. The same trend, although not significant, was also found for the subset of non-cultivated species in the country of South Africa for the phylogenetic distance metric that provided the best model fit (PD Min ; Supplementary Fig. 3). A positive relationship between phylogenetic distance and invasiveness has also been shown for alien trees and shrubs in Southern Africa 72 . Many other studies addressing Darwin's naturalization conundrum in plants also found support for Darwin's naturalization hypothesis at the invasiveness stage 21,28,39,45,51,73 . However, a recent study on the abundance of alien birds 74 and a study on alien plant dominance in vegetation plots 46 found support for Darwin's pre-adaptation hypothesis. This suggests that the role of phylogenetic relatedness in driving invasiveness may differ between groups of organisms and depends on the spatial scale considered. We used three different phylogenetic distance indices, but because they are correlated, they revealed similar patterns.
The time-calibrated phylogeny of Smith and Brown 75 , which we used to construct the global phylogeny, includes 86% of all genera. Consequently, the phylogenetic resolution of our tree is relatively good within families but is not within most genera. This might particularly affect estimates of PD Min . Nonetheless, because phylogenetic distances between species of the same genus are usually shorter than for species belonging to different genera or families, PD Min should still provide a good proxy for the presence of closely related species, and therefore is a useful phylogenetic distance measure for the question at hand. Although all phylogenetic distance measures revealed the same patterns, the ones that provided the best model fit differed among invasion stages ( Fig.  3 and Supplementary Table 1). For the probability of introduction for cultivation, PD wMean had the best model fit, suggesting that particularly alien species that have no widely distributed native relatives are likely to be introduced for cultivation. For subsequent naturalization of the cultivated species, unweighted PD Mean had the best model fit, suggesting that for naturalizing, each native species is equally important, irrespective of its range size. For the probability of naturalized species becoming invasive, PD Min had the best model fit, suggesting that distance to the closest relatives is most important for becoming widespread and dominant.
In conclusion, our results show that the direction of the effect of phylogenetic distance of alien plants to the native flora alternated from one invasion stage to another in Southern Africa. Although introduction for cultivation was positively associated with phylogenetic distance to the native flora, the opposite was true for subsequent naturalization success, but invasiveness was again positively associated with phylogenetic distance. Although the three different phylogenetic distance indices (PD Mean, PD wMean and PD Min ) showed the same patterns along the cultivation-naturalization-invasiveness transitions, different phylogenetic indices provided the best fit model in each stage. Thus, accounting for the different invasion stages and considering multiple phylogenetic distance metrics provide additional insights into Darwin's naturalization conundrum.
Finally, yet importantly, the two seemingly opposing hypotheses of Darwin are not in conflict. Rather, the mechanisms underlying them act concurrently, and the one that dominates depends on the alien species' stage along the invasion process.

Study area.
Our study focuses on the region of Southern Africa, which includes ten countries: Angola, Botswana, Eswatini, Lesotho, Malawi, Mozambique, Namibia, South Africa, Zambia and Zimbabwe, with a land area of approximately 4,000,000 km² (ref. 72). Because we had separate native species occurrence data for the nine provinces of the country of South Africa, the total number of Southern African subregions was 18 (Fig. 1). Southern Africa has had a long history of plant introductions, which started in the late 18th century with the arrival of the European settlers 76 .
The cultivated, naturalized and invasive alien species lists. The basis for our study is a list of over 8,000 taxa (including species and infraspecific taxa; here jointly referred to as 'species') reported as being cultivated in Southern Africa in the book Cultivated Plants of Southern Africa 57 . To allow the alignment of the list of species with other datasets used in this study (see below), we validated and synchronized the taxonomic names according to The Plant List (v.1.1; http://www.theplantlist.org/) using the R package 'Taxonstand' 77 . We then removed the species that, according to The Global Inventory of Floras and Traits database (GIFT) 78 , are native to Southern Africa. GIFT data had been extracted using the R package 'RMySQL' v.0.10.20. Moreover, we removed ferns and gymnosperms because they are phylogenetically very distinct from angiosperms. The final list of introduced species comprised 5,091 cultivated alien angiosperm species.
To identify which of the cultivated alien species have become naturalized in Southern Africa, we used the Global Naturalized Alien Flora database (GloNAF) 79 . GloNAF includes lists of naturalized vascular plant taxa for over 1,000 regions (usually administrative regions, such as countries, states and provinces) around the globe 79 . Of the 5,091 cultivated species, 592 (11.6%) have managed to naturalize in at least one of the GloNAF regions 79 that are part of Southern Africa.
Because the invasiveness of naturalized species might vary among the different subregions of Southern Africa, and because data on invasiveness were not available for all those subregions, we restricted the analysis of how invasiveness relates to phylogenetic distance between the alien and native species to the subset of alien species that are naturalized in the country of South Africa (n = 524). For this, we used the most recent list of invasive plant species for the country of South Africa by Zengeya and Wilson 80 , in which invasive species are defined as widely spread alien species, in accordance with the unified invasion framework of Blackburn et al. 48 .
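The way the species lists nest into stage-specific binary responses can be sketched as follows. This is a minimal illustration in Python, not the R workflow used for the study; the species names, the toy lists and the function name stage_table are all hypothetical.

```python
# Hypothetical toy lists. In the study these roles are played by the
# cultivated-plant list, GIFT (natives), GloNAF (naturalized) and the
# national list of invasive species.
cultivated = {"sp_a", "sp_b", "sp_c", "sp_d", "sp_e"}
native = {"sp_e", "sp_x"}
naturalized = {"sp_a", "sp_b", "sp_y"}
invasive = {"sp_a"}


def stage_table(cultivated, native, naturalized, invasive):
    """Return one record per cultivated alien species with 0/1 stage flags."""
    aliens = sorted(cultivated - native)  # drop species native to the region
    rows = []
    for species in aliens:
        is_naturalized = species in naturalized
        rows.append({
            "species": species,
            "naturalized": int(is_naturalized),
            # Invasiveness is only assessed among naturalized species.
            "invasive": int(species in invasive) if is_naturalized else None,
        })
    return rows


rows = stage_table(cultivated, native, naturalized, invasive)
for row in rows:
    print(row)
```

Keeping the invasive flag undefined for non-naturalized species mirrors the fact that each transition is analysed only among the species that reached the previous stage.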

Global angiosperm species pool.
To be able to test whether the cultivated introduced plant species as well as the naturalized species are more or less phylogenetically related to the native flora of Southern Africa than expected by chance, we needed a list of the global species pool outside Southern Africa. To obtain such a list, we extracted all 324,808 accepted names of angiosperm taxa from The Plant List (http://www.theplantlist.org/, last accessed May 2019). From this list, we excluded the species that, according to GIFT, are native to Southern Africa (n = 24,974), resulting in a global pool of 299,834 species that are not native to Southern Africa.
Phylogenetic tree of the global flora. To compute phylogenetic distance indices between the species from the global species pool outside Southern Africa and the native species of Southern Africa, we constructed a global phylogenetic tree of all plant species with accepted names according to The Plant List (n = 326,101; for more details see van Kleunen et al. 49 ). In brief, as a basis for this phylogeny, we used the time-calibrated phylogeny of Smith and Brown 75 , which is currently the most comprehensive phylogeny for seed plants. Of the global flora, 2,056 genera (14.0%) and 27,064 species (8.3%) were missing from the base tree. We manually added these missing species to their genus- or family-level root nodes in the base tree using the R package 'phytools' 81 . Because we were only interested in seed plants, we removed all non-seed plants from the phylogeny. This resulted in a global angiosperm phylogeny of 324,808 species (Fig. 2), of which 24,974 are native to Southern Africa.

Phylogenetic distance indices.
Several metrics can be used to describe phylogenetic relatedness between plant species 32 . Because phylogenetic grain (the depth within a phylogeny) might affect the results of phylogenetic analyses 73 , we selected the three most commonly used phylogenetic distance indices that describe phylogenetic composition at the deeper branches and at the tips of the tree 32 . First, to capture phylogenetic composition at the deeper branches of the tree, we calculated PD Mean . This measure assumes that the whole native flora (irrespective of the abundances of the species) contributes equally to the success or failure of the alien species. Second, we calculated PD wMean . This measure weights the phylogenetic distance by the occurrence frequency of the native species, assuming that the alien species will be more likely to interact with the widespread native species. As a measure of how widespread a native species is, we used the number of Southern African subregions (total n = 18; Fig. 1) in which it occurs according to GIFT. Lastly, to capture phylogenetic composition at the tips of the tree, we calculated PD Min . This measure assumes that the success of an alien species is driven by its distance to the closest native relative because they most likely use similar resources and share similar enemies and mutualists.
Using the global angiosperm phylogeny, we calculated for each species introduced for cultivation in Southern Africa its PD Mean , PD wMean and PD Min to the native flora of Southern Africa. In addition, to test for the effect of phylogenetic relatedness on the probability that a naturalized species becomes invasive in the country of South Africa, we calculated for each cultivated naturalized species in the country of South Africa its PD Mean , PD wMean and PD Min to the native flora of the country of South Africa (n = 15,382, according to GIFT).
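As a minimal sketch of the three indices for a single alien species, assuming its pairwise phylogenetic distances to the native species are already available: the code below is Python rather than the R used in the study, the function name pd_indices and the toy values are hypothetical, and the weighted mean is written as a plain occurrence-weighted arithmetic mean, which is one reading of the description above.

```python
def pd_indices(dist_to_natives, n_subregions):
    """Compute PD Mean, PD wMean and PD Min for one alien species.

    dist_to_natives: {native species: phylogenetic distance to the alien}
    n_subregions:    {native species: number of subregions it occupies}
    """
    natives = list(dist_to_natives)
    distances = [dist_to_natives[s] for s in natives]
    weights = [n_subregions[s] for s in natives]

    # Every native species counts equally.
    pd_mean = sum(distances) / len(distances)
    # Widespread natives count more (assumed weighting scheme).
    pd_wmean = sum(d * w for d, w in zip(distances, weights)) / sum(weights)
    # Only the closest native relative matters.
    pd_min = min(distances)
    return pd_mean, pd_wmean, pd_min


# Toy example: three native species, distances in arbitrary units.
dist = {"nat_1": 20.0, "nat_2": 200.0, "nat_3": 260.0}
occurrence = {"nat_1": 1, "nat_2": 18, "nat_3": 9}
pd_mean, pd_wmean, pd_min = pd_indices(dist, occurrence)
print(pd_mean, pd_wmean, pd_min)
```

In this toy case the closest relative is rare, so PD wMean exceeds PD Mean while PD Min stays small, which shows how the three indices can diverge for the same alien species.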

Statistical analysis.
All statistical analyses were performed in R v.3.6.1 (ref. 82 ). To test how introduction for cultivation, naturalization and invasiveness are associated with the phylogenetic distance indices, PD Mean , PD wMean and PD Min , we used a series of complementary generalized linear models (GLMs) with binomial error distribution with either a clog-log or a logit link function (see below). To see how the naturalization success of all non-native species relates to phylogenetic distance if not accounting for a potential cultivation bias, we first tested among the entire non-native global flora how the likelihood of naturalization in Southern Africa relates to the phylogenetic distance indices. Because in these analyses, the binomial response variable had many more zeros than ones, we used the clog-log link function 83 . Then to account for cultivation bias, we broke this down into two further analyses. First, we tested using the entire global flora not native to Southern Africa how the likelihood of being introduced into Southern Africa for cultivation relates to the phylogenetic distance indices. We again used the clog-log link function in this analysis. Second, we tested among the species introduced for cultivation in Southern Africa how the likelihood of naturalization relates to the phylogenetic distance indices. Because for this analysis, the numbers of zeros and ones were more similar, we used the logit link function.
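The difference between the two link functions can be illustrated with a short sketch. It is written in Python rather than R, and the function names are illustrative and not taken from the analysis code.

```python
import math


def inv_logit(eta):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_cloglog(eta):
    """Inverse complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))


def logit(p):
    return math.log(p / (1.0 - p))


def cloglog(p):
    return math.log(-math.log(1.0 - p))


# The logit curve is symmetric around p = 0.5, whereas the cloglog curve
# is asymmetric: it approaches 0 slowly and 1 quickly.
for eta in (-6.0, -2.0, 0.0, 2.0):
    print(eta, round(inv_logit(eta), 4), round(inv_cloglog(eta), 4))
```

For very negative linear predictors the inverse cloglog is close to exp(eta), which is one reason this link is often chosen when ones are rare relative to zeros.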
Finally, we tested for the subset of cultivated species that have become naturalized in the country of South Africa (n = 524) how the likelihood of invasiveness relates to the phylogenetic distance indices. We also used the logit link function for this analysis. For the invasiveness analysis, we also assessed the robustness of the results when considering all naturalized species in the country of South Africa (n = 874 species), irrespective of whether they were introduced for cultivation (n = 524) or not (n = 350). Because the results of models considering all naturalized species and the results of models considering only the naturalized cultivated species were comparable, we present the results from the analyses using the cultivated species in the main article and the results from the analyses using all species in Supplementary Table 2. Because the analysis of invasiveness was only possible for the country of South Africa, whereas the other analyses were done for the entire region of Southern Africa, there was a scale mismatch. To assess how sensitive our results are to this scale mismatch, we also performed the analysis of naturalization success among cultivated species for the country of South Africa only. The results of this analysis are comparable with those of the analysis for all of Southern Africa (Supplementary Table 5).
Because pairwise Pearson correlations between some of the phylogenetic indices were strong to very strong (PD Mean and PD wMean : r = 0.99; PD Mean and PD Min : r = 0.39; PD Min and PD wMean : r = 0.42), we ran separate GLMs for each. We then used Akaike's information criterion to identify which of the three phylogenetic distance indices resulted in the best model fit as in Malecore et al. 43 . To test for potential nonlinear hump- or U-shaped relationships of introduction, naturalization and invasiveness probabilities with the phylogenetic distance indices, we also included a quadratic term for each distance index after centring the indices to means of zero. Moreover, to facilitate comparisons of the estimates within and between the models, we also scaled each explanatory variable to a standard deviation of one 84 . When the quadratic term was not significant (P > 0.05), we removed it from the model. To calculate the explained deviance (R²) by the explanatory variables in our models, we calculated the Nagelkerke pseudo R² for each GLM 85 using the 'rcompanion' R package 86 . To account for phylogenetic non-independence of the species, we also ran phylogenetically corrected GLMs using the 'phyloglm' function of the R package 'phylolm', v.2.6.2 (ref. 87). However, because the results were very similar to those of the standard GLMs, we present the results of the phylogenetically corrected GLMs in Supplementary Tables 3 and 4.
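The model-comparison steps described above (centring and scaling, adding a quadratic term, ranking by Akaike's information criterion and computing the Nagelkerke pseudo R²) can be sketched as follows. The sketch is in Python rather than R, all function names are illustrative, the use of the sample standard deviation is an assumption, and the log-likelihood values are hypothetical placeholders, not results from the study.

```python
import math


def standardize(values):
    """Centre to mean 0 and scale to SD 1 (sample SD with n - 1, assumed)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def aic(log_lik, n_params):
    """Akaike's information criterion: lower values indicate better fit."""
    return 2 * n_params - 2 * log_lik


def nagelkerke_r2(log_lik_null, log_lik_model, n_obs):
    """Cox and Snell R2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (log_lik_null - log_lik_model) / n_obs)
    max_r2 = 1.0 - math.exp(2.0 * log_lik_null / n_obs)
    return cox_snell / max_r2


# Linear and quadratic predictors built from a standardized distance index.
z = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
z_squared = [v ** 2 for v in z]

# Hypothetical log-likelihoods of three separate GLMs (intercept + slope).
log_liks = {"PD_Mean": -310.0, "PD_wMean": -309.2, "PD_Min": -305.5}
aics = {name: aic(ll, n_params=2) for name, ll in log_liks.items()}
best = min(aics, key=aics.get)

r2 = nagelkerke_r2(log_lik_null=-320.0, log_lik_model=log_liks[best], n_obs=500)
print(best, aics[best], round(r2, 3))
```

Squaring the predictor after centring reduces the correlation between the linear and quadratic terms, which makes their estimates easier to interpret separately.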
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The core dataset of this study is available at (https://doi.org/10.6084/m9.figshare.19597093.v1). We also used the GloNAF dataset available at (https://idata.idiv…).

Software and code
Data collection
We obtained GloNAF and GIFT datasets using the R package "RMySQL" (version 0.10.20).

Data analysis
The main data processing and analyses were done using R version 4.1.0. We used the R package "Taxonstand" (version 2.2) to validate and synchronize the taxonomic names according to The Plant List. We constructed a global phylogenetic tree of all plant species with accepted names according to The Plant List using the R package "phytools" (version 1.0-1). We ran the phylogenetically corrected GLMs using the 'phyloglm' function of the R package 'phylolm' (version 2.6.2). We calculated the Nagelkerke pseudo R² for each GLM using the "rcompanion" R package (version 2.4.1). The map in Figure 1 was produced using ArcMap software version 10.8 (website: https://desktop.arcgis.com/en/arcmap/).


Study description
We collected data on 5,091 alien angiosperm species that have been introduced to Southern Africa for cultivation. We combined these data with lists of the global flora non-native to Southern Africa (n = 299,834 species), the naturalized and native species within the region of Southern Africa, and the species that have become invasive within the country of South Africa. This novel combination of datasets was used to test Darwin's naturalization conundrum at the different stages of the invasion process. Specifically, we showed how phylogenetic relatedness to native species is correlated with the subsequent stages of introduction, naturalization and invasiveness of alien species.

Research sample
Our study is based on several existing datasets. We used these datasets because, to our knowledge, they represent the most comprehensive available data on the cultivated flora of Southern Africa.

Sampling strategy
We extracted the list of over 8,000 species that are known to be cultivated in Southern Africa from the book 'Cultivated Plants of Southern Africa' (Glen 2000, Jacana: National Botanical Institute). Then, we validated and synchronized the taxonomic names according to The Plant List (version 1.1; http://www.theplantlist.org/) using the R package 'Taxonstand'. Moreover, we removed the species that, according to The Global Inventory of Floras and Traits database (GIFT), are native to Southern Africa. Finally, we removed ferns and gymnosperms as they are phylogenetically very distinct from the angiosperms. The final list of introduced species comprised 5,091 cultivated alien angiosperm species.

Data collection
The list of cultivated flora of Southern Africa was manually digitized to an Excel sheet by C. Gommel, K. Mamonova, V. Pasqualetto and B. Rüter.

Timing and spatial scale
The GloNAF and GIFT datasets were accessed on 23 September 2020. The Smith and Brown phylogeny was accessed on 18 January 2019. The list of the invasive flora of South Africa was downloaded on 1 April 2021. The GloNAF, GIFT, and Smith and Brown phylogeny datasets have a global spatial scale.

Data exclusions
To allow the alignment of the list of species with other datasets used in this study, we validated and synchronized the taxonomic names according to The Plant List (version 1.1; http://www.theplantlist.org/) and then removed species with unaccepted names. Moreover, we removed ferns and gymnosperms as they are phylogenetically very distinct from the angiosperms.

Reproducibility
As our study mainly analyzed existing datasets, we did not test for reproducibility. However, we will provide a link to all the processed data.