Cyrtophora citricola is an orb-weaving spider that builds a dense, sheet web (Eberhard 2020) (ca. 30-40 cm in diameter, Leborgne et al. 1998), with an irregular tangle above and below it (Wheeler 1926) (Fig. 1a). C. citricola is a gregarious species that occurs in both colonial (Fig. 1b) and solitary webs, built on diverse plant species and human-constructed structures (Chauhan et al. 2009; Johannesen et al. 2012; Lubin 1980; Madrigal-Brenes 2012; Rypstra 1979; Teruel et al. 2014). Colonies can be massive and often cover the entire crown of bushes and medium-size trees (Barba-Díaz et al. 2014; Chauhan et al. 2009; Edwards 2006; Martín-Castejón and Sánchez-Ruiz 2010; Rao and Lubin 2010).
We compiled 2795 geo-referenced occurrence points of C. citricola from five different sources. We obtained 258 data points from the Global Biodiversity Information Facility (GBIF.org; accessed on March 29th 2018https://doi.org/10.15468/dl.hi6ahq), 18 from SpeciesLink (http://splink.cria.org.br/, accessed on April 4th, 2018), and 662 from the Royal Museum of Central Africa database. We also obtained 78 records from different literature sources (Online Resource 1) and collected 13 points in the field in Costa Rica that we geo-referenced using Google Earth. Additionally, our colleague Angela Chuang kindly provided 1574 data points from USA, collected as part of her own research.
We removed duplicated and inaccurate data points (e.g. points that fell in the ocean) from the database prior to conducting the analyses by projecting all points on a global map. Then we filtered the remaining data using the R package spThin (Aiello-Lammens et al. 2015) to remove all data points having less than a distance of 5 km from any other point and guarantee one record maximum per cell according to the resolution of our climatic layers (Online resource 2). With this procedure, we generated 32 data points for South Africa and the Southeastern part of Mozambique (hereafter, the South African region), 108 for the Mediterranean Region and 122 for America (Fig. 2).
To quantify environmental conditions throughout C. citricola’s distribution range, we used the 19 Wordlclim’s bioclimatic variables for all analyses. We used data from the WorldClim database Version 2 (Fick and Hijmans 2017, http://www. worldclim.org), at a 2.5 arcmin resolution (approximately 20 km² near the Equator). Additionally, we constructed other three environmental grids using information on monthly precipitation, minimum temperature and wind speed from the same climatic database (Fick and Hijmans 2017). We obtained the mean value from the monthly precipitation layers and then calculated the Average Annual Precipitation (hereafter AAP). We extracted the minimum temperature value for each raster cell within the 12-month layers of minimum temperature to get the Minimum Annual Temperature (hereafter MinAT). Finally, we constructed a Maximum Annual Wind Speed (hereafter MWS) layer, by extracting the maximum speed values for each cell layer within the monthly layers of wind speed. We included this last variable as new-born spiderlings of this species are known to disperse by wind (Johannesen et al. 2012).
For the analyses described below, we created a subset of environmental variables. For each Species Distribution Model (SDM) and the respective multivariate environmental similarity surface (MESS) analyses (Online Resource 3), we selected the subset of variables with the lowest collinearity. For this, we used the step-wise approach coded by M. W. Beck and available at https://gist.github.com/fawda123/4717702, which calculates the Variance-Inflation Factors (VIF).
To obtain the possible native region of the American C. citricola, we created three sets of candidate models: one set for each of the two potential native regions (South Africa, Mediterranean) and one for the invaded region (America). We generated the sets of several candidate SDMs with different parameterizations using the ENMeval R package (Muscarella et al. 2014). We generated each model using variations of two different parameters: (1) regularization multipliers that generate penalty values which help to select more simple models (see Elith et al. 2011; Phillips et al. 2017) (these values range from 0.5 to 5, at intervals of 0.5) and (2) different feature classes (transformations of the environmental variables values, see Elith et al. (2011)): Linear (L), Quadratic-Linear(QL), Hinge (H), and their combinations with Product and Threshold (LQHP y LQHPT) (Phillips et al. 2006; Phillips and Dudík 2008).
We calibrated all candidate models by delimiting an area of 1000 km around the presence points of each region, so that the calibration area would include enough background area containing both environments where the species could be present and environments where the species is likely absent. To avoid the spatial autocorrelation between testing and training points, we used two different data partitioning methods implemented by Muscarella et al. (2014). For the South African Model, we used the Blocks method, which divides the data in four bins with an equal number of occurrences but allows bins to vary in geographic size (Muscarella et al. 2014). This method has been recommended in cases where spatial extrapolation is needed (Muscarella et al. 2014) such as the case of this region which presents few occurrences that also tend to be grouped (Fig. 2). For the Mediterranean and American models, we used the Checkerboard 2 partitioning method, which, as the blocks method, divides the data in four bins, but facilitates the inclusion of isolated occurrences without altering the geographical size of the bin (Muscarella et al. 2014). Therefore, we considered this method appropriate for the scattered occurrences we have for these two regions.
We selected the model with the best evaluation metric from each set of candidate models and this was the model used for the projections on other regions. To determine the accuracy with which each native model predicted the real known occurrences of spiders in the invaded region, we projected each of the two native models on America. Then we created a third model – the one calibrated on the invaded environmental conditions – and projected it onto the two possible native regions, to cross validate the accuracy of our native models.
To evaluate the performance of each set of models created and select the best fitted for each native region, we used four selection criteria with the following priority order: (i) the lowest ‘Minimum training presence’ omission rate (ORMTP), (ii) the highest Area Under the Curve (AUCTEST), (iii) the lowest value of 10% Training omission rate (OR10), and (iv) the lowest number of parameters. For details regarding these criteria, see Muscarella et al. (2014). The model that best fitted the criteria mentioned, was selected as the model to run the posterior projections and analyses.
Geographic and environmental similarity
We extracted the suitability values assigned by each selected native model to the occurrences of C. citricola in America. These values were generated by each model for each cell in the region where it was projected, after correlating and fitting the environmental variables to the occurrences included for the region used for calibration (Warren and Seifert 2011). We compared the suitability values of both native models using a T-test.
To analyze the climatic niche overlap among each native region and the invaded region, we used Schoener’s D index (Schoener 1968; Broenniman et al. 2012) and Hellinger’s I index (Warren et al. 2008; Broenniman et al. 2012). These indices range from 0 (no niche overlap) to 1 (total niche overlap). We used the same criteria proposed by Rodder and Engler (2011) to interpret the index obtained: values between 0-0.2 indicate no or limited overlap, 0.2-0.4 indicates low overlap, 0.4-0.6 for moderate overlap, 0.6-0.8 represents high overlap and 0.8-1 very high overlap.
Additionally, we calculated the environmental niche overlap using a PCA-env following the approach described by Broennimann et al. (2012). This approach includes three metrics that allowed us to compare the environmental conditions of the American region with those present in each native region. The most relevant parameters in this approach are “Stability”, which measures the proportion of environmental conditions shared between two regions (Petitpierre et al. 2012)—in our case the invaded versus each of the two native regions; “Expansion” that refers to the proportion of conditions in which the species is present in the new environment but not in the native environment; and “Unfilling” that shows the proportion of the native niche that do not overlap with the niche occupied in the new environment (Petitpierre et al. 2012; Tingley et al. 2014).
We performed some supplementary analyses, including (i) a multivariate environmental similarity surface (MESS) (Elith et al. 2010) to measure the amount of extrapolation the prediction of each distribution model (SDM) is inferring when projected on a different region (Saupe et al. 2012); (ii) density curves for all the 22 environmental characteristics associated with the occurrences of the three regions, and (iii) a Principal Component Analyses to compare the overall overlap of the three niches. For details of these analyses (i-iii), see Online Resource 3. To estimate the D and I indices, as well as the PCA-env and PCA we used all 22 variables.