2.1. Alien fish species
This study considers 127 alien fish species, belonging to 81 genera and 29 families that have been recorded as established in freshwater systems of one or more European countries (Fig. 1). Species were selected based on their establishment status retrieved from several sources [e.g. GISD; Pagad et al. 2018; Frose and Pauly 2020; all species traits, references and otherwise, are available in the OSF online repository https://osf.io/p3qkj/; Marcolin et al. (2022)]. A species was selected when it was reported as established in at least one European country. Whenever inconsistent information between two or more sources (i.e. one source reporting that the species is established in a country and the other source reporting as not established) were found, priority was given to the source reporting the newest reference. Scientific names follow the classification proposed by Fricke et al. (2020).
2.2. Fish species traits and invasion history
For each alien freshwater fish species, 23 species traits (nine functional traits and fourteen ecological traits), and seven variables describing the introduction history of the species were identified (Table 1). Data were retrieved from multiple sources including research studies, data sources and fish index manuals (Marcolin et al. 2022). Whenever inconsistent information between two or more sources were found, priority was given to the source reporting the newest reference. Furthermore, when the inconsistent information was reported by two references of the same year of publication, the “Precautionary Principle” (PP) approach (Cooney 2004) was followed: in face of inconsistent information on alien fish traits, the selected trait class or value was the one that potentially had higher impacts on native ecosystems following the introduction of the alien fish species. For example, since the adult trophic status of Vimba vimba is reported either as omnivore or invertivore by different sources, the PP approach led to classify the species’ diet as omnivore since it can be more impactful on the native fauna than an invertivore one (Oberdorff et al. 2002). Information was successfully retrieved for 90% (3417 out of 3810) of the trait records among all species (Supplementary Information Table S1). For 30% of the species (38 out of 127), all the traits were retrieved.
For many species traits, all the information required for each species was not available. In such cases, firstly it was attempted to consider the phylogenetic similarity (Troia and McManamay 2019), i.e. assigning a trait value, level or category of species belonging to a common genera or family with known traits. Whenever more than one of such species was available, the above PP approach was followed. In addition, whenever possible and needed, fish morphological characteristics were used to infer ecological traits [e.g. from the body shape of Cobitis bilineata was derived the feeding habitat as benthic; Chan 2001; Haas et al. 2010; Franssen 2011; Branco et al. 2013; Ingram 2015; all references for traits and traits values can be found in Marcolin et al. (2022)]. Among the resulting missing trait records, 90% (181 out of 201) were filled following the procedure based on the PP approach and retrieving information according to the phylogenetic similarity and morphological characteristics.
In case were not possible to fill the trait, trait levels were assigned to an intermediate trait score (e.g. the missing value for the Maximum fecundity of the Trinectes maculatus was assigned as ‘3’ out of five possible levels) (Branco et al. 2008). Only the remaining 10% (20 out of 201; 0.5% of the 3810 records) were filled through the assignment of records to the central class of the trait, based on the levels of that specific trait.
Five traits had a high level of missing data (Acid tolerance, Habitat degradation tolerance, Water quality tolerance general, Water quality tolerance oxygen, Water quality tolerance toxic), thus were aggregated into a single combined General tolerance trait. Following the PP approach, the level of the combined trait (tolerant, intermediate, or intolerant) resulted from the highest tolerance score found among the five tolerance traits. After aggregating the tolerance traits into a single variable, the completeness of the records increased to 94% (3101 out of 3302).
Finally, whenever the available information was in the form of interval values (e.g. Female maturity) the PP approach was also followed by selecting the value leading to a higher impact on native ecosystems (Marcolin et al. 2022).
2.2.1. Invasion pathway
Species traits selection/filtering was investigated for the release, establishment, spread and impact stages of the invasion pathway. To model the influence of the different species traits on the invasive potential of alien species, proxies of the four invasion stages were used as response variables (Marchetti et al. 2004a; Ribeiro et al. 2008). Specifically, (1) the propagule pressure as a proxy for the release stage; (2) the fraction of European countries with established populations where the species was released and where established populations have been reported as a proxy for the establishment stage; (3) the number of European countries with established populations as a proxy for the spread stage; and (4) the invasibility status as a proxy of the impact stage (Table 1). As a surrogate measure of the propagule pressure, the number of introduction events corresponding to the number of records of introduction reported of a species in European countries of the study area considered were used, retrieving the information from Froese and Pauly (2020) and AquaNIS database (http://www.corpi.ku.lt/databases/index.php/aquanis). This approach was based on the method used by McGregor et al. (2012), Procheş et al. (2012) and Peoples and Midway (2018) where the authors used, as a surrogate of propagule pressure, the total number of entries/abstracts of human use of a species in discipline-specific literature databases [e.g. “species name AND aquaculture” in Peoples and Midway (2018)]. The Fraction of European countries with established populations was calculated as the proportion between the number of European countries where a species was introduced and established, and the number of European countries in which the species was introduced but not necessarily established. In this way, this variable is a measure of establishment success that is independent from the total number of countries with established populations. The variable was based on the same sources used to assess where a species was established or not (see section 2.1 above). The invasibility status had a binary nature with 0 representing alien species for which there is no evidence of being invasive, and 1 representing IAS. A species was considered to be invasive whenever information was found (e.g. GISD, Pagad et al. 2018; Frose and Pauly 2020) that the species status is invasive in, at least, one country. However, most species were evaluated a priori, following the method of Vilizzi et al. (2019).
2.3. Data analysis
To avoid problems derived from multicollinearity among variables, a correlation tests was ran using the Pearson correlation coefficient. All variable pairs showing r > |0.7| were considered to be correlated. Variance Inflation Factor (VIF; Zuur et al. 2007) was also computed in a stepwise fashion by excluding at each step the variable with the highest VIF and computing the VIF again in the following step until all the VIF of the remaining variables were lower than 2.5 (Johnston et al. 2018). A correlation coefficient of 0.70 was found between maximum adult size and maximum lifespan and, based on the stepwise VIF, the maximum adult size was discarded from all models.
An empirical modelling approach based on Generalized Linear Mixed Models (GLMM) was followed, both to hierarchize the importance of each variable at different invasion stages and to build predictive models for each invasion stage. GLMM was used to account for the non-independence among traits induced by phylogenetic similarity (Ruesink et al. 2005), using taxonomic classification at the family level as a random factor. Different family distributions and link functions were used according to the type of response at each stage (Table 1): Gaussian errors and link = 1 (multiple linear regression) for the release stage (propagule pressure; continuous), Poisson errors and log link (Poisson regression) for the spread stage (Number of European countries with established populations; discrete), and binomial errors and logit link (logistic regression) for the establishment (Fraction of European countries with established populations; proportion) and impact stages (Invasive status; binary). All predictor variables coded at the ordinal scale were treated as numeric in the models. Along the sequence of analyses, the outcome of each stage was assumed to be dependent on the outcome of the previous stage(s). Hence, at each stage the effect of variables was tested while controlling for the effect of the previous stage by adding the response variables of all previous stages to the models of the subsequent stages as candidate independent variables. First, the probability of a species to be introduced (release stage), was modelled as measured by the propagule pressure, using functional and ecological traits as potential predictors, as well as three variables related to the introduction history (cause of introduction, number of causes of introduction and prior invasion success outside Europe; Table 2). Then, in a second model, the probability of a species to establish populations in Europe (establishment stage) was modelled using the Fraction of European countries with established populations, as the response variable. In this step, the same predictors were used but now adding also the response variable of the release stage (propagule pressure; Table 2). In the next modelling step, prior invasion success at European countries (spread stage), was modelled now adding also the response variable of the previous stage (Fraction of European countries with established populations) as a candidate predictor variable. Finally, the invasive status (impact stage) was used as the response variable (binary variable: invasive/non-invasive) adding the response variable of the spread stage (number of European countries with established populations). Nonetheless, the analysis of the impact stage was an exception: the response variable of the release stage (propagule pressure) was not included as a candidate variable in the model since its correlation with the spread stage response variable was higher than the established threshold (r = 0.9).
For model selection, a multimodel inference approach (Burnham and Anderson 2002) was followed. To select a reduced subset of candidate variables, firstly was ran a backward variable selection based on the Akaike’s Information Criterion (AIC) for each response variable (four stages; as similarly done in Anderle et al. 2022). Based on this reduced subset of variables, a set of candidate models (Supplementary Information Tables S2 to S5) was defined following two main a priori assumptions: (1) the success of alien species at each invasion stage always depend on the introduction history and the outcome of the preceding stages; (2) any other variable (functional and ecological traits) was allowed to enter the model, as far it did not exceed a total of five variables in each candidate model, to avoid model overfitting as well as to ease the setting of a number of candidate models that should not be greater than the number of cases (n=127; Burnham and Anderson 2002). The candidate models were ranked according to the Akaike’s Information Criterion. Models with ΔAICc < 2 (difference between model’s AICc and the lowest AICc) were used for model averaging (Bolker et al. 2009; Richards 2008), in which the resulting variable coefficients resulted from averaging the coefficients of the models containing the variable (conditional averaging). The relative importance of the selected variables was assessed using the probability of each variable to be included in the best approximating models, estimated by summing the Akaike’s weights of all candidate models where the variable was included (Burnham and Anderson 2002). To estimate the goodness-of-fit of models, the marginal and conditional R2 were computed: the first representing the variance explained by the fixed effects and the second representing the variance explained by the entire model (Nakagawa and Schielzeth 2013). When the final model resulted from averaging the best approximating models, a weighted average of the models’ R2 was computed, in which the weights corresponded to the respective Akaike weights (wi).
All the analyses were performed with R software, version 4.0.0 (R Core Team, 2020). The vifstep function of the usdm R package (Naimi 2015) was used to compute the VIF stepwise procedure. GLMM were performed using lme4 R package (Bates et al. 2007). Multimodel inference was performed with the MuMIn R package (Bartoń 2016).