Cost data and processing
We extracted cost data from the latest version of the InvaCost database (version 4.1, publicly available at 10.6084/m9.figshare.12668570 [20, 41]). InvaCost has been generated following a systematic, standardized methodology to collate invasion costs from peer-reviewed scientific articles, official reports, grey literature, and stakeholder and expert elicitation. Following a thorough and hierarchical screening of each source document for relevance, costs were extracted, standardized to a common currency (2017 USA dollars/US$), and adjusted for inflation through the Consumer Price Index (https://data.worldbank.org/indicator/FP.CPI.TOTL?end=2017&start=1960) to be comparable across space and over time [41]. Costs were categorized under a range of descriptive fields pertaining to the original source (such as title, authors and publication year), spatial and temporal coverage (such as period of estimation, study area), cost estimation methodology (such as method reliability and acquisition method) and the cost estimates per se (such as nature and typology of cost relating to damage and/or management costs). Detailed information on all descriptive variables can be found in an online repository of the InvaCost database (https://doi.org/10.6084/m9.figshare.12668570, “Descriptors4.1.xlsx”).
Costs can occur over varying periods; for example, a one-off cost associated with a one-time eradication effort versus a multi-year cost associated with recurrent, annually estimated damages to crop production. To homogenize the temporal occurrence of these cost entries in the database, they were all converted to annual costs using the expandYearlyCosts function of the invacost R package [42]. This function provides annualized cost estimates for all entries, based upon the probable starting and ending years of the cost occurrence provided in the database (‘Probable_starting_year_adjusted’ and ‘Probable_ending_year_adjusted’ columns). For example, a single cost entry of $5,000 that occurred between 2000 and 2009 would be transformed to ten entries following expansion, each amounting to $500 per year. Accounting for these dimensions of costs also allowed for assessments of the dynamics of cost occurrence over time [42]. Furthermore, for this analysis, we considered costs with impact years between 1960 and 2020, given limited InvaCost data before 1960, and constraints on the availability of relevant socio-economic variables beyond this period (see Predictor variables).
We further considered species-specific cost entries only, thus excluding those for diverse (where costs were reported collectively for multiple taxa) or unspecific (where species-level information was missing) taxa. Likewise, we removed costs reported in unspecified geographic regions (those that could not be attributed to any continents or countries) and blank cost entries. We additionally removed cost entries for disease agents (viruses, bacteria, human pathogens) from the data, as these taxa are equivocally identified as non-native, and we are typically more interested in the movement of their vector species. For example, invasive alien mosquitoes (Aedes spp.) would be included, while the viral diseases they vector (yellow fever, Zika, chikungunya etc.) would be excluded. We also opted to use the most robust subset of these resulting data, by considering only costs that were of high method reliability (from peer-reviewed literature or other sources with documented, reproducible and traceable methods) and empirically observed (costs actually incurred, rather than expected or predicted). Further, we removed cost entries at the ‘unit’ spatial scale (belonging to various minor scales below the site level) because of the higher likelihood of double-counting with costs at larger geographic scales, and because the total area over which these costs were incurred was variable and often unreported (for example, costs reported per m2 without indicating the total size of the area impacted). These filters thus allowed us to consider costs (i) from individual IAS in defined recipient continents or nations, and for which regional origins could be determined, (ii) that were actually incurred, reported and estimated through “highly reliable” methods, and (iii) at appropriate, distinct spatial scales. The aforementioned filters, however, also mean our reported costs are underestimated and uneven due to reporting differences regionally. Unless specified differently, all results are provided for the filtered dataset.
Species origins
As a first step in determining species’ countries of origin, we employed a web scraping script to gather data from the Centre for Agriculture and Bioscience International (CABI) Invasive Species Compendium (ISC, www.cabi.org/isc), the International Union for Conservation of Nature (IUCN) Global Invasive Species Database (GISD, http://www.iucngisd.org/gisd/) and the Global Biodiversity Information Facility (GBIF, www.gbif.org) (see www.github.com/emmajhudgins/Givers_Takers for more information). CABI’s ISC contains a variety of information on IAS around the world, including their current distribution and countries of origin [18]. Our script searched using the species names as entered within InvaCost (harmonized using the GBIF.org Backbone Taxonomy; [39]) as well as synonyms in the Integrated Taxonomic Information System (ITIS) database via the taxize R package [43]. If a species match was found within the CABI ISC, we searched for a “Distribution Table” portion of the species’ entry. If found, we extracted country or region (within country) names tagged as “Native” within this table. GISD contains geographical information for many IAS, and was used as an alternative to CABI where distributional data were missing. Our script searched for GISD distributional data points tagged “Native” and compiled them at the country level. Finally, we checked for matching entries in GBIF — a global database of all types of species distribution — tagged as “Native” at the country level within the occ_search function of the rgbif package version 3.6.0 [44]. We used present day political border definitions for each country as defined by ISO3C codes in the countrycode package [45].
Next, where possible, we used country-scale origins to infer continental regions. Countries designated in InvaCost to be part of Central America were assigned to North America (and we refer to them henceforth as North America). Following InvaCost protocols, overseas territories were linked with the continent that matched their geographic, rather than political, designation. As exceptions, Turkey and Russia were identified as transcontinental sender and recipient countries. Origin continents within Turkey and Russia were selected on a case-by-case basis for each species, considering published data on the finer-scale distribution of each species within these countries as well as the continental designation of other countries listed (for example, if all other origin countries listed were European, we considered the native range to be European; see Supplementary Table 3 for details of species impacted). In these two cases, we classified recipient regions based on human population, because of the role of humans in transporting IAS [46] and incurring economic impacts [15]. Since most of Turkey’s population is in Asia and the majority of Russia’s population is in Europe, we assigned them accordingly to these continents. As a third exception, China’s Special Autonomous Regions (Hong Kong and Macau) and Taiwan were merged with mainland China due to them representing a much smaller landmass, as well as being strongly linked to China politically, economically and geographically.
All origin assignments were checked manually by co-authors (where we ensured that there existed 1 reliable sources that agreed on the origin continent at least), or were entered for the first time when information was unavailable from GISD and CABI, using available literature and databases. Literature was identified through ad hoc, informal searches, so it is possible that some known native countries were missed. However, this is likely to be a small issue compared to the number of native countries that have never been identified in the literature. A list of literature sources used to check the species’ origins is provided in Supplementary Note 4. Some species were allocated only to a continent of origin, due to the absence of country-level data (see later).
Origin information was identified for 467 unique species with cost records that met our aforementioned filters (high reliability, observed records within defined continents, cost incurred 1960–2020, non-pathogens). Of these, eight were removed due to a domesticated status (cat, Felis catus; dog/wolf, Canis lupus; sheep, Ovis aries; dromedary camel, Camelus dromedarius; pig, Sus scrofa; horse, Equus caballus; donkey, Equus asinus; and goat, Capra hircus, and with cow, Bos taurus, and ferret, Mustela furo having been removed by previous filters). This set of species does not have clear native ranges due to their long domestication and/or hybridization history. In contrast, we opted to retain species such as the European rabbit (Oryctolagus cuniculus) with a well-defined native range [47]. The remaining 459 unique species were recorded in six origin and recipient continents, amounting to 4,107 cost entries reported across 539 independent publications (expanded to 8,060 total entries; Figure 5).
When subset to entries with a country-level resolution, our dataset was further restricted to 412 unique species in 223 origin and 80 recipient countries, corresponding to 3,685 raw cost entries, 436 unique publications, and 7,112 expanded entries. Overseas territories were removed from this portion of the analysis because they lacked trade volume, GDP, and/or population data, which were implemented in models (see Predictor variables).
Impact distributions
Our analyses illustrate the distributions of both (i) numbers of IAS with costs and (ii) monetary costs, each among sender and recipient regions. Therefore, our analysis of IAS flows considers only those with reported costs in InvaCost. For numbers of species with costs, each species’ contribution was divided by the number of origin regions known for the species and/or destination regions recorded in InvaCost. This ensured that each species’ contribution summed to ‘1’ in the total number of species sent or received. For example, if a species was native to three countries and was reported to cause impacts to two countries in all of InvaCost, it would contribute a value of 0.33 species sent from each country and 0.5 species received to each country. We acknowledge that this may not be an accurate representation of the weight of particular origins of the invasion, but this information was unavailable given the complexity and changeability of pathways and vectors. For costs, when a single cost entry was reported in two or more geographic regions or countries in a cost entry, the cost was split equally among those recipient regions or countries. Similarly, if an IAS originated from two or more origin regions or countries, the aggregate cost from that IAS was split equally among those origin regions or countries.
Predictor variables
We separated our analysis by decade. Then, from InvaCost version 4.1, we generated a variety of predictor variables that we hypothesized would influence the magnitude of cost flow to and from different locations (where the cost flow from Region A to B refers to the costs of IAS in Region B that are due to native species from Region A). Firstly, we extracted the number of unique cost references associated with each receiving country in each decade, as a proxy for research effort (“Reference_ID” field in the InvaCost database). Secondly, we summed the total number of species causing the impact flows between countries for each decade. Thirdly, we summed the total cost, incurred between 1960 and 2020, of IAS originating from each country.
Beyond these InvaCost-specific predictors, we employed several external variables hypothesized to influence the magnitude of cost flows due to biological invasions [48]. We extracted the total volume of imported goods (in metric tonnes) for each country pair from the Centre d'Études Prospectives et d'Informations Internationales’ BACI database [49] for the years 1995–1999 inclusive and 2015–2019 inclusive, selecting the HS92 designation of harmonized import and export records (see Supplementary Table 2 example data from 10 pairs). We calculated the mean annual flow of goods between each country pair for the 1995–1999 period and dubbed this ‘historical trade’. Historical trade can be more predictive of present-day invasion risk due to invasion lags (see [48]), but we note that consistent import data are not available for the entire period of our cost data. The mean annual flows for 2015–2019 reflect recent trade (prior to the COVID-19 pandemic). To assess the role of origin and recipient biodiversity in dictating the flow of invasion impacts, we downloaded species richness data for each country from Mongabay, which tallies species richness for amphibian, bird, fish, mammal, reptile, and vascular plant species [50]. As a proxy for environmental matching, we identified countries that shared at least one terrestrial biome using data from [51] and Global Administrative Areas v3.6. We assume that country pairs sharing terrestrial biome(s) also share some freshwater and marine environments with similar conditions. Note that we also tested the effect of a shared climate zone variable, but the biome model had greater deviance explained. We also used the mean annual GDP and human population of each decade, and surface land area (reported in 2018, and measured in square km, including inland waters, but excluding marine Exclusive Economic Zones) from the World Bank using the wbstats R package, and the distance between countries variable from the Centre for Prospective Studies and International Information (CEPII) database (calculated between two most populous cities), which has been previously employed to model invasional flows [2]. From this same paper, we also extracted information on common language (spoken by at least 9% of the population in each country), shared geographical borders, the existence of a free trade agreement between countries, and a shared colonial history (CEPII GeoDist and Gravity Databases). Missing predictor variable values were filled in with either the closest decade of available data (372 missing sender country+decade pairs for GDP, one missing sender country+decade pair for population), or the mean value of the predictor across data points where no information was unavailable (96 interpolated countries for sender species richness, 42 interpolated countries for recipient species richness, 131 missing country pairs for distance, and one missing sender country for GDP [Democratic People’s Republic of Korea]). No pair of predictor variables was highly correlated (all r < 0.80), and thus all predictors were retained in the models.
Statistical modelling
We built predictive models of the cost flow between each country pair for all complete (non-zero) flows recorded in InvaCost (Fig. 5). To do this, we first summed our cost data within each decade and within each origin-recipient country combination, employing the countrycode R package [45] to ensure consistent country naming by converting all InvaCost country records to ISO3C codes. All models were fit as generalized additive models (GAMs) using the mgcv package [52], where all quantitative predictors and the cost flows (in millions of $) were logarithmically scaled. Decade was included as a thin-plate smoother term with five knots (a maximum of four inflection points in its functional form) to de-trend the cost flows for consistent variability in time. This variability could be due, for instance, to periods of global economic growth and decline. The ‘numbers of species with costs’ predictor per cost flow controlled for the expected increase in IAS impacts due to a simple increase in IAS sent or received. Within each GAM, we employed the select method to avoid the overparameterization of our smoother terms. This method uses a cross-validation approach to penalize overfitted smoother terms (using the GCV.Cp method). All non-smoothed variables were loge-transformed prior to analysis to meet model assumptions as determined by GAM model-checking results. Models were checked for high concurvity using the mcgv function concurvity (the GAM equivalent of multicollinearity; [52]), where ‘worst case’ concurvity values of >0.8 were taken to indicate model overfitting.