Study area and record collection
We defined the study area as the one encompassing all European countries where S. pedo is known to occur at more than one location. The resulting area ranges from Portugal in the west to western Siberia in the east, and from Sicily in the south to the Czech Republic and Slovakia in the north (Figure 1).
Occurrence records were retrieved from several sources, including the GBIF database (via the rgbif package: Chamberlain et al. 2017), iNaturalist (www.inaturalist.org), the authors’ own field surveys, and published references (see reference list in Supplementary materials). Records were included only if georeferenced with <1 km accuracy. We then removed duplicate records and used the spThin package (Aiello-Lammens et al. 2015) to thin the data at a 5 km distance, i.e. reducing multiple records within that distance to a single one, in order to limit spatial autocorrelation and avoid overestimating the importance of environmental variables from over-sampled geographical areas. This procedure yielded 3,283 independent records.
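The logic of distance-based thinning can be illustrated with a minimal Python sketch. It is not the spThin algorithm, which relies on a randomized procedure; here a simpler greedy, order-dependent rule is swapped in, and the coordinates, function names, and the choice of great-circle (haversine) distance are assumptions made only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin_records(records, min_dist_km=5.0):
    """Drop exact duplicates, then keep a record only if it lies at least
    min_dist_km from every record already kept (greedy, order-dependent)."""
    kept = []
    for rec in dict.fromkeys(records):  # de-duplicate, preserving input order
        if all(haversine_km(rec, k) >= min_dist_km for k in kept):
            kept.append(rec)
    return kept


# Hypothetical (lat, lon) records: one duplicate, one near neighbour, one distant point
records = [
    (40.8000, 16.9000),
    (40.8000, 16.9000),
    (40.8100, 16.9100),
    (41.2000, 16.2000),
]
thinned = thin_records(records)
```

In this toy input, the duplicate and the record lying roughly 1.4 km from the first one are discarded, so two records remain. A greedy rule of this kind does not guarantee the largest possible retained set, which is one reason randomized or repeated thinning runs are often preferred in practice.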
To assess the species’ association with protected grassland habitats listed in the HD, we focused on Apulia, southern Italy (Figure 1b), a region where the species is relatively common and frequently recorded (33.3% of Italian records) and for which detailed and exhaustive mapping of listed grassland habitats is available, i.e. also covering areas outside the N2K network.
Species Distribution Models
We downloaded 19 bioclimatic variables as descriptors of climatic conditions from WorldClim 2 (Fick and Hijmans 2017), with a 10 km resolution. We controlled for multicollinearity among variables by running a Variance Inflation Factor (VIF) analysis, retaining only variables with VIF values <10 (Curto and Pinto 2011). This procedure led us to retain 6 bioclimatic variables (bio2, bio6, bio8, bio15, bio18, and bio19). We also included, as a predictor variable, the most recent available layer of grassland cover at the European scale (https://land.copernicus.eu/pan-european/high-resolution-layers/grassland/status-maps/grassland-2018), a 10 m resolution raster mapping natural and semi-natural grasslands (including heathlands, sparsely vegetated grasslands, semi-arid steppes, and meadows), all known to potentially host S. pedo.
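A VIF screening of this kind can be sketched as follows. The sketch uses the standard identity that the VIFs of a set of predictors equal the diagonal elements of the inverse of their correlation matrix. The stepwise rule (repeatedly dropping the variable with the largest VIF until all values fall below the threshold) is an assumption about how the screening was run, and the variable names and values are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / n
        centred.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centred, norms)]
            for ci, ni in zip(centred, norms)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def vif_step(data, threshold=10.0):
    """Assumed stepwise rule: drop the variable with the largest VIF
    until every remaining variable has VIF below the threshold."""
    kept = dict(data)
    while len(kept) > 1:
        scores = dict(zip(kept, vif(list(kept.values()))))
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del kept[worst]
    return list(kept)


# Hypothetical predictors: "b" is almost exactly twice "a"; "c" is unrelated
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],
    "c": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
retained = vif_step(data)
```

In this toy example one of the two nearly collinear variables is removed, while the unrelated one is retained.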
We built species distribution models (SDMs) based on a bioclimatic envelope concept (Pearson and Dawson 2005), adopting an ensemble forecasting approach as implemented in the sdm R package (Naimi and Araújo 2016), a well-established procedure that reduces the uncertainty associated with predictions from single model algorithms (Watling et al. 2015). We considered three modelling techniques: Generalized Linear Models (GLMs), Random Forests (RFs), and Maximum Entropy Models (Maxent), performing 10 runs for each technique. For RFs and GLMs, we generated pseudo-absences (background data, n=10,000) by adopting a randomization approach (Barve et al. 2011). The combination of these algorithms is considered among the best performing ones, providing robust and reliable predictions when used in an ensemble (Kaky et al. 2020). For model training, we randomly selected 70% of the occurrence data, using the remaining 30% to test model performance. Model performance was assessed by inspecting the area under the receiver operating characteristic curve (AUC) and the True Skill Statistic (TSS), two validation metrics widely used in SDMs (Araújo and New 2007). AUC evaluates the model's discrimination ability, whereas TSS measures the rate of correct predictions corrected for those expected by chance, and is recommended for assessing the performance of predictive models (Allouche et al. 2008).
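For clarity, the two performance metrics can be written out in a few lines of Python. This is a minimal sketch rather than the sdm package's implementation: AUC is computed as the probability that a randomly chosen presence scores higher than a randomly chosen background point, and TSS as sensitivity + specificity - 1. The suitability scores and the 0.5 threshold are hypothetical.

```python
def auc(presence_scores, background_scores):
    """AUC as the probability that a random presence outscores a random
    background point; ties count one half (Mann-Whitney formulation)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def tss(presence_scores, background_scores, threshold):
    """True Skill Statistic at a given threshold: sensitivity + specificity - 1."""
    sensitivity = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    specificity = sum(s < threshold for s in background_scores) / len(background_scores)
    return sensitivity + specificity - 1.0


# Hypothetical predicted suitability for held-out presences and background points
presence = [0.9, 0.8, 0.6, 0.4]
background = [0.7, 0.3, 0.2, 0.1]

auc_value = auc(presence, background)        # 14 of 16 pairs ranked correctly: 0.875
tss_value = tss(presence, background, 0.5)   # 0.75 + 0.75 - 1 = 0.5
```

Unlike AUC, TSS depends on the threshold used to binarize predictions, so the threshold criterion should be reported alongside the TSS values.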
The effect of each environmental predictor on the probability of occurrence of S. pedo was assessed by inspecting the response curves, while each variable’s relative importance was calculated with the dedicated function in the sdm package (getVarImp), which determines the change in AUC values due to the inclusion of each target variable.
Conservation Gap Analysis
We assessed the degree of protection granted to S. pedo by PAs in Europe through a spatial conservation gap analysis, based on both the full occurrence dataset and the binarized potential distribution map. Each was overlaid with the shapefiles defining the boundaries of N2K sites and NPAs for Europe, downloaded from the websites of the European Environment Agency and UNEP’s World Conservation Monitoring Centre, respectively. We conducted this analysis at both the entire-range and single-country scales. We excluded countries with fewer than 5 records, those outside the EU, and those with no suitable habitat as predicted by our SDM. We then calculated the percentage of records and of area suitable for S. pedo overlapping with the N2K, NPA, and overall PA (N2K+NPA) networks of protected areas. To assess whether the two types of protected areas differed in their efficacy in preserving the species, we ran a repeated-measures two-way ANOVA on the percent coverage values, with class of protected area and type of data (records vs suitable range) as factors, using Tukey’s post hoc tests to assess the significance of differences in coverage between each pair of values.
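The record-based coverage percentages can be illustrated with a short Python sketch. It assumes that each occurrence has already been flagged, by a prior spatial overlay, as falling inside or outside each network, and that the combined PA class is the union of N2K and NPA, so that records lying within both are counted once. The function name and flags are hypothetical.

```python
def coverage_percent(records):
    """records: list of (in_n2k, in_npa) booleans, one pair per occurrence.
    Returns the percentage of records inside each network; "PA" is the
    union of the two networks (an assumption of this sketch)."""
    n = len(records)
    if n == 0:
        raise ValueError("no records")
    n2k = sum(1 for a, _ in records if a)
    npa = sum(1 for _, b in records if b)
    pa = sum(1 for a, b in records if a or b)
    return {"N2K": 100.0 * n2k / n, "NPA": 100.0 * npa / n, "PA": 100.0 * pa / n}


# Hypothetical overlay results for five records in one country
flags = [(True, False), (True, True), (False, False), (False, True), (False, False)]
coverage = coverage_percent(flags)  # N2K 40%, NPA 40%, PA 60%
```

Because the two networks overlap spatially, the combined PA coverage is generally lower than the sum of the N2K and NPA percentages.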
Association with habitats
We assessed whether protected habitats listed in Annex I of the Habitats Directive provide an effective surrogate of S. pedo’s ecological needs for conservation, by focusing on a more local (regional) scale. The grassland layer used for the models and the habitat suitability raster were first clipped to the geographical extent of the regional boundaries. We then selected three grassland habitats listed in Annex I of the HD and occurring widely across the region, namely i) semi-natural dry grasslands on calcareous substrates (HD code: 6210), ii) pseudo-steppe with grasses and annuals of the Thero-Brachypodietea (6220), and iii) Eastern sub-Mediterranean dry grasslands (62A0). These grassland types encompass most of the protected dry grassland habitats occurring at low to mid altitudes across Southern Europe, and are considered priority habitats for conservation due to the high diversity of both plants and invertebrates they host (Valkó et al. 2016). Habitat layers, which were mapped across the entire regional territory by 2018, were provided by the regional authority as vector polygons (https://pugliacon.regione.puglia.it/). In order to consider listed and non-listed grassland surfaces separately, we first excluded all portions of the grassland layer overlapping with any habitat polygon, i.e. leaving only grassland areas classified as “Non-habitat grassland”. To assess the importance of different kinds of grassland in fostering the occurrence of S. pedo by boosting habitat suitability, we calculated the percent cover of each habitat and of non-habitat grassland within each suitable grid cell across the region.
Subsequently, we quantified the relationship between suitability values and grassland composition by running a Generalized Linear Model (GLM), using suitability values as the response variable and the percent cover of each habitat and of non-habitat grassland as predictors. We considered significant those effects with p<0.05 and whose confidence intervals did not encompass 0.