First, we developed a habitat suitability model by comparing the habitat in 1 x 1 km grid cells of the Universal Transverse Mercator (UTM) coordinate system that were occupied by beaver to non-occupied but accessible grid cells in Flanders. This resulted in a habitat suitability index (HSI) per grid cell ranging from unsuited (0) to ideal (1) beaver habitat. A similar model for beaver in Flanders had previously been developed by Swinnen (2015) as a proxy for future beaver distribution. Validation by Vanstaen (2019) however showed several highly suited habitat patches to be unoccupied, while other, far less suited patches had been. Vanstaen (2019) attributed these discrepancies to the fact that the time needed to disperse into distant suited patches had been too short and to the rapid acceleration in population growth since the development of the model in 2015, forcing beavers to occupy less favourable habitat patches. We countered these concerns by using updated distribution information from the significantly larger Flemish beaver population of 2019 and by including a dispersion model in our prediction. All steps followed in the construction of these models are described below and presented in Fig. 1.

## Habitat suitability model

We constructed the habitat suitability model to the known distribution of the species in Flanders in 2017, as monitored every year by the Flanders Agency for Nature and Forest (ANB). In that year, Flanders had a total of 171 locations which were either occupied or known to have been occupied by beavers. We chose a 95% Minimum Convex Polygon (MCP) around these locations as the core area for beaver spread in Flanders. As a background to calculate habitat suitability, we selected all grid cells that overlapped with this area and of which at least 95 ha of the surface area was situated in Flanders. This limits the observed distribution to a core region which is necessary since locations outside of this spread are harder to reach and can therefore be unoccupied by reasons other than habitat preference. For the same reasons, we only evaluate accessible habitat within this distribution area. In semi-aquatic species like the beaver, it is useless to analyse habitat composition in locations that are not in the vicinity of streams or water bodies. We thus further limited accessible habitat to a buffer of 50 m around streams of categories 0, 1, and 2 and water bodies with a minimal surface area of 1000 m², that were located within 500 m of such streams (AIV 2018). For the selection of habitat characteristics, we used different habitat types as described in the Flemish biological valuation map (BWK) (De Saeger et al. 2016; Vriens et al. 2011). The choice of possible predictive habitat types was based on Swinnen (2015) and included: grasslands, freshwater swamps, thickets (except dunes), willow and swamp forests, poplar forests, other deciduous forests, small water bodies, swamps, orchards, groves and hedges. The presence and absence of beavers in the selected region were based on data from monitoring of 132 possible territories in 2018 done in a collaborative effort by INBO and Antwerp University (UA) (Huysentruyt et al. 2019; Vanstaen 2019). This dataset, with a total of 4127 track sites, contained the location of all kinds of beaver tracks, often situated within the same territories. This could result in an overestimation of beaver presence in certain locations. To counter this, we performed spatial thinning at a scale of 500 m using the R-package spThin v0.2.0 (Aiello-Lammens et al. 2015). This yielded a total of 233 locations within the selected region for further analysis. A similar exercise using the Flanders Environment Agency (VMM) data yielded an additional 95 locations for model testing. We used MaxEnt to determine habitat preference (Merow et al. 2013; Phillips et al. 2006; Phillips and Dudík 2008; Phillips 2017; Swinnen 2015; Rutten et al. 2019). Evaluation of the model parameters and model construction was done using the R-packages ENMeval v0.3.1 and dismo v1.1.-4 respectively (Hijmans et al. 2017; Muscarella et al. 2014). The model was validated using data from stream inventories performed by VMM in 2019. These inventories were part of management actions for Norway rat (*Rattus norvegicus*), muskrat (*Ondatra zibethicus*), and coypu (*Myocaster coypus*) but also incorporated data on beaver presence. The observations of other rodent species in that dataset further allowed us to compare both a model with and without sampling bias and evaluate the need to incorporate this bias in the final model. Analysis of geographical data was performed in ArcGIS v10.4.1. Model building was performed in R v4.0.2, model validation was performed in R v4.3.0 (R Core Team 2023).

## Dispersion model

We constructed a dispersion model based on the known distribution of beaver territories in Flanders up until 2019. We first applied a set of model parameters to small initial populations known on two known sites where beavers were either released or were known to have entered Flanders through dispersion from neighbouring regions in 2003 and made projections for the distribution in 2014. We then adapted the model parameters to let this distribution mimic the known distribution for 2014. After this, using these parameter settings, we simulated the 2017 distribution using the 2014 known distribution for final model validation. Finally, we used the known 2019 distribution to predict the 2022 population.. Habitat suitability model results were used as a measure of landscape permeability and the chance of a beaver settling at a given location. Watercourses as described by AIV (2018) were used as connective features throughout this landscape. To model the movements of individual beavers through the landscape we used the R-package SiMRiv, v1.0.4 (Quaglietta and Porto 2019). The output was visualized using ArcGIS® v10.8.1 (ESRI 2020).

For this paper, we used additional data on the beaver distribution in the 2020–2022 period to evaluate this model. The model we constructed aims at predicting the distribution of territories in Flanders, not individual animals. For this, we assume that every successful settlement of an individual beaver in a new location will result in a new territory the next year. In our model, new territories are then used as potential additional starting points for dispersion in the next year. In our approach we also favoured newly formed territories as potential starting points by selecting only the most recent 100 territories in each of the years following the initial year in order to allow sufficient chances of colonisation over larger distances. We used a fixed population growth rate for the expected number of territories, based on observed growth rates in the previous years. Since the number of territories in Flanders increased from 82 in 2017 to 210 in 2019 with a yearly average of 21 ± 6% (Huysentruyt et al. 2019), we used a growth factor of 1.21 in our model. Given all these assumptions, we expect our model to be more likely to overestimate beaver presence, which fits the purpose of our risk model better than an underestimate.

It has been suggested that habitat suitability can be a poor proxy for landscape connectivity and resistance during dispersal since animals use the landscape differently during dispersal than in the home range as animals can easily disperse through low-quality habitat (Keeley et al. 2017). Therefore, data on landscape permeability was added to the model in two different ways. On the one hand, all major waterways (categories 0 and 1) were added as dispersal routes with zero resistance, allowing easy long-distance dispersal, even through landscape with low habitat quality. Alongside these watercourses, a buffer of 200 m was added to increase the chance of capturing beaver trajectories. Second, habitat suitability (*h*) per kilometre square, as calculated in the habitat suitability model, was transformed into landscape resistance (*w*) using a negative logistic exponential transformation as suggested by Carroll et al. (2020). This allowed us to divert from a strictly linear relationship between habitat quality and resistance. Using such a transformation means that larger fractions of the landscape offer low resistance, allowing greater flexibility where potential corridors are located (Keeley et al. 2016). After initial exploration with the 2003–2014 data, we opted to limit this diversion to:

$$w=\frac{(100-99\text{*}\frac{\left(1-{e}^{-1\text{*}h}\right)}{1-{e}^{-1}})}{100}$$

To simulate beaver movement, we used a *levy walk* (LW) approach. In this approach, short movements in various directions (*random walk* RW) are alternated with longer movements in a continuous direction (*correlated random walk* CRW) (Porto and Quaglietta 2017). We choose a 40 m step length and a 1 km scan horizon in which a choice of direction for the next step is made. The correlation between two consecutive steps in CRW was set at 0.98. This approach aims to mimic reality in which local movements are alternated with long-distance dispersions. In our model, the probability of shifting between RW and CRW was kept low at 0.1. This allowed for longer bursts of both local and long-distance movements. We simulated 60,000 LW steps from each of the selected starting points. At each endpoint of these movements, the chance of a beaver settlement in that location was modelled as a binomial chance based on the habitat suitability index of the respective grid cell. When settlement was successful, this location was added as a new territory to the dataset. In case of failure, the LW procedure was repeated from the same starting point, with a maximum of three iterations. After each settlement, habitat suitability of the grid cell was lowered by 0.1 to reduce the chance of additional settlement in that same grid cell. This way, grid cells with more suitable habitat were not only more susceptible to beaver settlement, but they also had a larger chance of multiple settlements. In the case of settlement, the resistance of the grid cell was not altered, since, while occupied territories complicate additional settlement, beavers are easily capable of dispersing through them. We aimed to simulate 20,000 settlements and adjusted the number of iterations accordingly. The initial output of the model was the number of model runs in which a km² grid cell was occupied by beavers. However, this occupation probability does not take into account the average beaver territory size. This way, some km² grid cells would be labelled as unoccupied, even if they are surrounded by cells with possible occupancy. This is not realistic as the chance of occupancy in even low-quality habitat grid cells rises when nearby cells are occupied. To account for this, we enlarged the scale of possible occupancy. This was done by calculating the occupancy probability for each grid cell in combination with the average occupancy of all surrounding cells (= scale of 9 km²). When this value was higher than the observed occupancy probability in the grid cell, the value for this grid cell was adjusted accordingly. This helps in filling up any possible gaps in the occupancy probability map. At each output phase (initial and adjusted), the numeric output with the exclusion of cases with zero occurrences, was reclassified to four classes, labelled as accidental (1), low (2–5), moderate (6–20) and high (21–100) chance of occupancy.

## Validation

For validation of this combined model output, we compared the output per km² grid cell in Flanders to the observed distribution of beavers in Flanders in 2020–2022. For this, we used all data on beavers compiled in GBIF, data which also comprises stream inventory data performed by VMM (GBIF.org 2023). We combined this data with occurrence data obtained from waarnemingen.be, a citizen science portal hosted by a Flemish nature conservancy NGO (Natuurpunt 2023). To eliminate errors in the combined dataset, only km² grid cells with more than one observation over the three years were classified as occupied by beaver. This dataset was then used as the response variable in validation analysis, using either presence/absence as a binary response variable, where a single occupancy event is sufficient to label as present, or the number of occupancy cases as a numeric response. For the predictive variables, we used both the binary occupancy as well as the numeric output of the dispersion model. In all cases, we calculated the area under the curve using the package pROC v1.18 in R (Robin et al. 2011). In the analysis where both a binary response and predictor were compared, we also performed a confusion matrix analysis using the caret package v6.0-94 in R (Kuhn 2008).