Identifying Key Processes and Drivers Affecting the Success of Non-indigenous Marine Species in Coastal Waters

“Identifying key processes and drivers affecting the success of non-indigenous marine species in coastal waters” we evaluated Biological as Research Paper. Despite the rapid spread of non-indigenous species (NIS) in coastal waters worldwide, biotic invasions are widely disregarded in marine conservation planning. To guide conservation actions, a better understanding of the underlying mechanisms and drivers determining the success of NIS are therefore needed. In this study, we develop and apply a joint modelling approach to identify the key drivers and community assembly processes determining the occurrence of invasive benthic invertebrates, using Danish coastal waters as a case study. To reflect factors affecting the introduction, establishment and spread of NIS throughout the area, we compiled long-term monitoring data on NIS, as well as information on commercial shipping, environmental conditions and estimates of larvae settling densities derived from advanced drift model simulations (informed by species traits). We then applied a set of species distribution models to identify the key drivers determining the occurrence of NIS. Our results demonstrate a significant positive effect of vessel activity, a negative effect of depth and bottom salinity, as well as a positive effect of the simulated settling densities on the probability of presence. Taken together, our results highlight the role of commercial shipping, habitat characteristics and passive advection of early-life stages on the success of NIS. Our joint modelling approach provide improved process understanding on the key community assembly processes determining the presence of NIS and may serve to guide monitoring, management and conservation planning in order to limit future invasions and their negative consequences on coastal ecosystems. Abstract 8 Non-indigenous species (NIS) pose a major threat to biodiversity and the functioning and services of 9 ecosystems. Despite their rapid spread in coastal waters worldwide, biotic invasions are widely 10 disregarded in marine conservation planning. To guide conservation actions, a better understanding of 11 the underlying mechanisms determining the success of NIS are therefore needed. Here we develop a 12 joint modelling approach to identify the key drivers and community assembly processes determining 13 the occurrence of invasive benthic invertebrates, using Danish coastal waters as a case study. To 14 reflect factors affecting the introduction, establishment and spread of NIS throughout the area, we 15 compiled long-term monitoring data on NIS, as well as information on commercial shipping, 16 environmental conditions and estimates of larvae settling densities derived from drift model 17 simulations informed by species traits. We then applied a set of species distribution models to identify 18 the key drivers determining the occurrence of NIS. Our results demonstrate a significant positive 19 effect of vessel activity, a negative effect of depth and bottom salinity, as well as a positive effect of 20 the simulated settling densities on the probability of presence. Taken together, our results highlight 21 the role of commercial shipping, habitat characteristics and passive advection of early-life stages on 22 the success of NIS. Our joint modelling approach provide improved process understanding on the key 23 community assembly processes determining the presence of NIS and may serve to guide monitoring, 24 management and conservation planning in order to limit future invasions and their negative 25 consequences on coastal ecosystems. 26

included only records obtained using the standard 0.0143m 2 "HAPS" corer, the most commonly 131 applied sampling method of benthic invertebrates across all years and sites within the monitoring 132 program. Furthermore, we accounted for the marked differences in sampling effort over time and 133 between sites (Fig. 1), which may bias the detection and probability of presence of NIS at any given 134 site, by performing a formal data standardization. This was achieved by randomly bootstrapping (i.e., 135 resampling without replacement) the same minimum number of unique samples from each site and 136 season (i.e., aggregating months by quarters) over 500 random iterations. Two stations 137 ("Hirsholmene" and "Knud001") were excluded due to insufficient sampling coverage across years 138 and seasons. We then extracted the mean abundances and their standard deviations for all species, 139 sites, years and seasons across the entire set of randomized draws and used this standardized metric in 140 the following statistical analysis. Furthermore, we restricted the analysis to the years from 2000 141 onwards to allow for a sufficient coverage of sampling across stations and ensure a temporal match 142 with the drift model simulations. After omitting sporadic records based on other gear types and 143 observations prior to 2000, the standardized data set contained 17,769 mean abundances (or 144 presence/absences) of 11 NIS covering in total 66 sites. 145 146

Transport vectors and habitat characteristics 147
To represent the role of commercial shipping, which is considered to be one of the key transport humanactivities.eu/view-data.php). The data is based on all available vessel positions retrieved from 7 the Automatic Identification System (AIS) in 2019, and is expressed as hours per km 2 (Fig. 2a). To 152 represent key habitat characteristics allowing establishment of NIS, we also extracted available data 153 on depth, temperature and salinity ( Fig. 2b- To represent the dispersal phase of NIS (once having been introduced and established) we simulated 164 larval dispersal using the agent-based modelling system IBMlib (Christensen, 2008;Christensen et al. 165 2018). IBMlib is a model library for individual-based modelling using a Lagrangian approach that has 166 been specifically developed for simulating larval dispersal including biological processes affecting 167 spawning, drift and settlement of marine larvae. We ran simulations for each of the NIS for the years To identify a set of credible starting locations for releasing agents (i.e., larvae), as well as the timing 177 of release and duration of simulations, available information on species traits reflecting their 178 spawning period, pelagic larvae duration, substrate required to settle, preferred depth, as well as the 179 temperature and salinity tolerances for adults and larvae were collected for each NIS (Table S1).

Statistical analysis of underlying assembly processes acting on NIS 204
To assess the effect and relative importance of factors explaining the overall presence and absence of 205 the set of NIS considered, we applied Generalized Additive Models (GAMs) and random forests 206 PAs,l,y,q=a+s(spns,l)+s(shipl)+s(tempSl,y,q)+s(tempBl,y,q)+s(sall,d,q)+s(depthl)+5y+S+e 214 where the response variable PA is the presence/absence (0,1) of each species s, at site l, in year y and 215 season q as a function (using a logit link) of the simulated settling density (spn), ship activity level 216 (ship), surface and bottom temperature (temS, temB), salinity (sal) and depth at each site. To account 217 for mean differences in the probability of presence between species and over time we included species 218 identity (S) and years classified into 5-year time periods (5y) as fixed effect factors. The constant a is 219 the overall intercept, s the thin plate smoothing function for each smooth term and ε the error term. 220 Although the number of regression splines is optimized (and penalized) by the generalized cross 221 validation criterion (GCV), the degrees of freedom of the spline smoother function (s) was further 222 constrained to three knots (k=3) to allow for potential nonlinearities, but restrict flexibility during model 223 fitting. Finally, we applied backwards model selection to identify the best possible set of predictors. 224

225
The second method used, i.e., RF, is a machine learning tool comprising ensembles of decision trees 226 that rely on bagging (i.e. bootstrap aggregation). RFs are capable of reproducing complex nonlinear 227 shapes in single and multiple dimensions, making them suitable for ecological applications in which 228 complex shapes are to be expected (Breiman 2001). In addition, RF has fewer constraints and is able to 229 capture interactions between variables that cannot easily be achieved with GAMs. Individual 230 classification trees within the random forest are trained on randomly selected subsets of the data. The 231 final forest prediction is obtained by averaging predictions across all trees in the forest. We used the 232 same model setup in terms of response and explanatory variables as in the GAM. In order to account 233 for differences in the number of observations reporting absences versus presences of NIS, we accounted 234 for such unbalanced classes by assigning different weights to each class (using the "classwt" option). 235 Once trained on the available data, we used the final RFs (based on 10 000 individual trees) to estimate 236 the relative importance of each predictor, as well as visualize the partial response curves of each 237 individual explanatory variable. Finally, we used the final models to estimate the overall probability of 238 occurrence of NIS across the entire study area, beyond the NOVANA monitoring stations used for 239 model fitting and training. This was achieved by predicting occurrences based on a compilation of all 240 covariate values in each grid cell, i.e., based on both the collected ship activity, depth, temperature and 241 salinity data (Fig. 2), as well as the drift model outputs of species-specific settling densities (Fig. S2). 242 The model predictions serves to highlight areas highly vulnerable to NIS and in need of monitoring and 243 conservation actions. All statistical analyses were conducted using the R software, version 4.0.2 (R core 244 Team 2020) using the following packages: "mgcv" (Wood 2017) and "randomForest" (Liaw and 245

Spatial patterns of NIS and environmental conditions 249
The overall occurrences and abundances of the set of NIS demonstrated pronounced spatial 250 heterogeneity across sites throughout the study area. This is illustrated by marked spatial differences 251 in the number of NIS that are frequently occurring at each station based on our bootstrapping (Fig.  252   3a). Notably, the highest richness of NIS occurred in nearshore coastal areas, particularly in the 253 southern part of the area (i.e., around the densely populated islands of Zealand and Fyn and in the 254 Wadden Sea), while the lowest richness (or even absence) of NIS was found in more offshore areas in 255 Kattegat and the North Sea. A similar spatial pattern was also evident in the sum of median 256 abundances across NIS (Fig. 3b), indicating that not only the number of species, but the total number 257 of individuals of NIS are higher in nearshore and shallow, coastal waters, especially in the southern 258 part of the area. Note however, that the individual abundance records are highly variable across 259 species, sites and seasons and may in some samples be considerably higher (>100) than the sum of 260 median abundances shown here (Table 1; Fig. S1). Interestingly, some of the sampling stations where 261 NIS are entirely absent are located in deeper, offshore sites with low vessel activity (Fig. 3c)   This is illustrated by high overall estimates of larvae settling densities, primarily in shallow, nearshore 267 waters, while the deeper, offshore areas, or waters with high current velocities (such as in the Sound 268 and southern Kattegat) receive very few number of larvae (Fig. 4a). These patterns are broadly 269 consistent with the underlying habitat maps used to constrain the initial locations when seeding the 270 agents in our simulations (Fig. 4b). However, the areal extent of habitats classified as suitable 271 according to species traits are clearly larger than the areas receiving a high number of settled larvae, 272 particularly by extending further offshore. Please note that the simulated settling densities, as well as 273 habitat maps differ between individual species (Fig. S2-S3). These differences in turn, arise from 274 species-specific differences in the reported values of key traits reflecting their spawning period, larvae 275 duration and habitat requirements, respectively (Table S1).  Table 2). Furthermore, the model demonstrated a high overall 284 predictive performance, as demonstrated by an AUC (i.e., area under the receiver-operator 285 characteristic curve) value equal to 0.95 (i.e., 95% overall accuracy). However, please note that the 286 accuracy of predicting absences was considerably greater than predicting presences (Fig. S4). In terms 287 of RF, the final model demonstrated a slightly better overall predictive accuracy compared to GAM 288 (i.e., out-of-bag (OOB) error rate = 2.3%). More importantly, it showed more equal prediction skills 289 for both classes, demonstrated by mean error rates amounting to 2.2% and 6.6% for absences and 290 presences, respectively. 291

292
The fixed effect (parametric) terms of the GAM indicated clear differences in the overall mean 293 probability of presence between species, as well as between 5-year time periods (Table 2), where in 294 the latter case the higher estimated coefficients for the recent decades indicate an overall increased 295 overall probability of presence of NIS over time. Furthermore, the partial smooth plots showed a 296 positive, exponentially increasing response of probability of presence to higher settling densities (Fig.  297 5a). However, please note that the relationship towards low values of settling density are driven by a 298 pronounced scatter of observations at zero values of settling density. The corresponding partial 299 dependence plots derived based on RF predictions showed a similar positive relationship, but with a 300 more linear increase across all values of settling densities (Fig. 6a). In terms of ship activity, both 301 GAM and RF demonstrated a positive and saturating response at higher values of vessel density (Fig.  302   5b; Fig. 6b). Please note that the slightly decreasing relationship based on GAMs towards the highest 303 values of vessel density is uncertain and driven by very few extreme observations. Hence, it should be 304 considered with caution. The relationships with temperature, salinity and depth indicate the partial 305 responses of probability of presence of NIS to environmental conditions (Fig. 5c-e; Fig. 6c-e). The 306 responses are largely similar between methods, except for bottom temperature, where GAM 307 13 demonstrated a dome-shaped relationship (with highest probability at medium temperatures of 308 ~10°C), while the partial dependence based on RF was more erratic and highly variable across the 309 range of values. With regards to salinity and depth both methods demonstrated a negative, linear or 310 non-linear declining relationship, respectively, with low probability of presence at greater depths and 311 high salinities. Among the entire set of predictors, species identity had the highest relative variable 312 importance in the RF model, as illustrated by the largest decrease in the mean Gini index (Fig. S5). 313 Subsequently, the set of environmental predictors were deemed important (i.e., in the following order: 314 salinity > depth >> temperature), closely followed by the effect of shipping activity, time period and 315 settling density. Finally, predictions based on the selected models demonstrate a high mean 316 probability of presence of NIS in shallow, brackish and coastal areas, especially in the inner Danish 317 waters of Kattegat and the Belt Seas (Fig. 7a, b). In general, GAMs predict higher mean probabilities 318 compared to RF and also demonstrate a higher standard deviation of predicted probabilities across all 319 species compared to RF (Fig. 7c, d). Conversely, RF predicted higher probabilities along the major 320 shipping routes compared to GAM. In general, the model predictions are able to adequately 321 characterize the overall patterns of observed probability of presence of NIS based on available 322 monitoring data, but tend to either slightly overestimate (i.e., GAM) or underestimate (i.e., RF) 323 probabilities (Fig. S6), particularly in the inner Danish waters (Fig. 7e, f). The holistic approach we have developed and applied using a combination of observational data, 399 advanced drift models and multiple SDMs has generated insight into the key factors and assembly 400 processes affecting the success of marine NIS. However, we fully acknowledge that our 401 understanding of all the underlying mechanisms and drivers regulating their introduction, 402 establishment and spread throughout the entire study area and beyond is still incomplete. This is 403 primarily due to a number of limitations, such as the availability and quality of input data, as well as 404 sources of model uncertainty. The NOVANA monitoring program provides an excellent and best 405 available data set for studying marine NIS within the study area. But, the number and geographical 406 coverage of stations, the choice of gear, as well as the spatio-temporal resolution of sampling may not 407 entirely reflect the true underlying occurrences and abundances of the set of NIS considered; nor may 408 it represent the entire range of conditions and habitats where NIS are present or absent. This is 409 especially true for NIS in the early phases of the invasion process that have not yet reached 410 equilibrium and managed to occupy all the potential sites and habitats with conditions fitting their 411 reported niches (Fig. S3). Secondly, while we aimed to include a broad set of covariates, we may have 412 missed, or not sufficiently accounted for all the key determinants regulating the introduction, 413 establishment and spread of NIS. One such aspect is the degree to which biotic interaction, including a key factor promoting, or hindering spread of marine NIS across the sea scape. These insights may 451 provide key guidance for improving and prioritising existing monitoring programs and sampling 452 protocols to ensure a timely detection and robust status assessment of marine NIS. As an example, our 453 identification of vulnerable habitats and areas demonstrating high predicted probability of presence of 454 NIS may serve to aid in the planning and prioritization of limited conservation efforts aiming to avoid 455 or combat marine NIS. In our case study, this is particularly evident in the shallow, brackish and 456 coastal areas in the inner Danish waters where both predicted probabilities of NIS (Fig. 7a, b) and 457 human impacts are high. Implementing conservation efforts, either through protected areas and/or 458 other direct interventions (e.g., removals) are indeed critical, especially in the early stages of the 459 invasion process, since regulating an already established, self-supporting population is considerably 460 more difficult than avoiding them getting to a location in the first place. Finally, it is our hope and 461 ambition that the model development, knowledge and maps provided here will contribute to future     Table S1. Traits of 23 benthic species considered as non-indigenous in Danish waters. Species with asterisk denote the NIS present also in the 95 NOVANA data set (see Table 1). The four species in "red" were not included in the study due to lack of data. Habitat abbreviations: ALL=all 96 habitat types; S= sandy; M= muddy; H=Hard; Mus= mussel and oyster beds; W = seaweed and seagrasses.   Table S2. EMODNET seabed substrate classified into 3 main categories: "Mud", "Sand" and 99 "Hard substrates". Notice that "hard substrate" and "Sand" both include "Mixed sediments" 100 and "Coarse sediments".