Data collection
We surveyed the Central Apennines in 2021, recording the geographic coordinates and the elevation (10 m precision) of all the sites where we observed L. petropolitana (Online Resource 1). Occurrence data for the Alps and the Apennines, as defined by Menchetti et al. (2021) (Fig. 1), were retrieved from literature, museum and private collections with reference to the Italian CKmap project (Balletto et al. 2007) and from collection data stored in Roger Vila’s laboratory at the Institut de Biologia Evolutiva CSIC-UPF (Barcelona, Spain). Data deposited in the platform iNaturalist have also been included if they were associated with high-resolution coordinates (error lower than 1000 m) and if accompanied by a photograph in order to avoid the risk of confusion with the similar L. maera (Linnaeus, 1758). A subset of occurrences was obtained using only the specimens with year and altitude recorded. We found a total of 147 suitable occurrence data. Due to the different sources of these data, and to reduce clustered occurrences, we then filtered the dataset selecting a minimum distance between the occurrence points of 2 km using the ‘thin’ function of ‘spThin’ R package (Aiello-Lammens et al. 2015) in R v. 4.0.3 (R Core Team, 2020). This reduced the final dataset used for the Species Distribution Modelling to 109 records (Fig. 1).
COI analyses
To evaluate the genetic identity of the Apennine population of L. petropolitana we gathered COI sequences for this species from across its distribution range (Eurasia) by downloading data publicly available in GenBank and in BOLD (Barcode of Life Data System), and by adding a series of seven unpublished sequences specifically analysed for this study, representing specimens collected in 2021 in Monti della Laga and Gran Sasso massifs. COI sequences from the Apennine specimens were obtained using LepF1 and LepR1 primers, following the protocol by Platania et al. (2020) at the Institut de Biologia Evolutiva CSIC-UPF (Barcelona, Spain). We aligned all the sequences with MegaX. We removed potential contaminant sequences as well as sequences shorter than 600 bp. This resulted in a final alignment of 55 specimens (Online Resource 2). We constructed maximum parsimony haplotype networks using the COI alignments and TCS 1.21 (Clement et al. 2000) by setting a 95% connection limit (11 steps). The haplotype network was graphically edited with tcsBU (dos Santos et al. 2016).
Haplotype colours were assigned according to p-distances projected after Principal Coordinates Analysis (PCoA) on the RGB space as done by the ‘recluster.col’ function of ‘recluster’ R package https://cran.r-project.org/web/packages/recluster/index.html) (Dapporto et al. 2013). Subsequently, the colours were plotted on a map using the recluster.plot.pie function of the same package. The haplotypes in the network were coloured according to their geographical origin.
Species distribution modelling
We modelled the distribution of L. petropolitana in the study areas as a single entity, although Alps and Apennine populations may be differently adapted to local conditions. Recent studies highlighted the importance of including adaptive genetic variation in climate change vulnerability assessment (Razgour et al. 2019). However, given the very limited distribution of L. petropolitana in the Apennines (4 raster cells) this was not possible. Modelling the entire species as a single entity is expected to generate more generous estimates of species tolerance to environmental conditions.
In the Alps and Apennines L. petropolitana inhabits grassy habitats in woodland and shrub areas, mainly on rocky substrates and in cold and humid sites (Prola et al. 1978; D’Alessandro et al. 2008; Helmann and Parenzan, 2010; Bonato et al. 2014). We obtained 19 climatic variables from the CHELSA database (Karger et al. 2017; Karger et al. 2018) with a resolution similar to that of the occurrence data (30 sec, ~1 km2), and clipped them for the study area. Then we carried out a PCA on the variables to obtain their correlation structure and, according to L. petropolitana biology, we selected the following four low correlated variables: temperature annual range between the warmest and coldest month, mean daily minimum air temperature of the coldest month and mean monthly precipitation amounts of the warmest quarter and of the coldest quarter. We also evaluated the Variance Inflation Factor (VIF) for each selected variable using the R package ‘usdm’ (Naimi et al. 2014) and verified that all values were < 3, which indicates a very low multicollinearity (Alin, 2010; Kock and Lynn 2012) (Online Resource 3). Then, we downloaded and clipped the same variables for future projections for the 2041-2060 period, as well as the Representative Concentration Pathway (RPC) 4.5, described by IPCC (2007) as the intermediate scenario. For future climatic variables we used the median of four Global Circulation Models (GCMs): NorESM1, CM5A-MR, MPI-ESM-MR, HadGEM2-AO.
Species distribution modelling was performed using the R package ‘biomod2’ (Thuiller 2014). In general, overall statistical models tend to be less prone to overfitting than machine learning models, but the latter usually achieve higher accuracy in present predictions (e.g. Merow et al. 2013). As such, it is recommended to use multiple and diverse algorithms (Araújo and New 2007; Norberg et al. 2019) and validate them using a robust blocking approach, in order to estimate the uncertainty around predictions (Bahn and McGill, 2013; Roberts et al. 2017). Accordingly, we used four statistical models to forecast the current and the future species distribution: generalised additive model (GAM), generalised linear model (GLM), maximum entropy (MAXENT) and random forests (RF). The first two are statistical linear models, while MAXENT and RF are machine learning algorithms, so the entire set covers a wide range of complexity in distribution models (Merow et al. 2013) allowing appropriate estimates of uncertainty in model projections (Buisson et al. 2010).
Thereafter, we generated 10 datasets of 1000 randomly selected background points, one dataset for each of 10 repetitions used for the model validation. To estimate the relative importance of the predictor variables, we correlated the predicted probabilities of presence, using the full dataset, with the predicted probabilities of presence obtained after permuting the variable of interest. The relative importance of each variable was quantified as one minus the Pearson rank correlation coefficient (Thuiller et al. 2009). We validated the models using spatial block validation with the R package ‘blockCV' (Valavi et al. 2019). We split the study area into six spatial blocks and two folds (Online Resource 4), iteratively fitted the model on all but one block, and tested against the left-out block. This procedure is known to provide a more objective assessment of model transferability for future projections (Bahn and McGill 2013; Roberts et al. 2017). We measured model performance using the True Skill Statistic (TSS) and the area under the curve (AUC). For both present and future projections, we obtained one occurrence probability raster for each statistical model by calculating the mean of all the projections of models with a TSS > 0.5 and an AUC > 0.7. In order to obtain a single projection for each scenario, we then averaged the four rasters and excluded cells with an occurrence probability <10%. In the results, the term ‘cells’ will thus indicate the cells with an occurrence probability >10%. Then we calculated the difference between the two scenarios by subtracting the current average predictions from the future ones; raster cells with positive delta values indicate a predicted improvement of climatic conditions, whereas raster cells with negative delta values indicate a deterioration of climatic conditions. To estimate the uncertainty in the predictions due to disagreements among the four different algorithms, we assigned -1 to all cells with negative values, +1 to all cells with positive values (and 0 otherwise) of the average single-model predictions, and assessed the consensus of model predictions by summing the four binarized maps. This resulted in a raster map with values ranging between -4 and +4, with extreme values indicating that all the four statistical models predicted a decrease (-4) or increase (+4) in the probability of occurrence, and intermediate values indicating partial (±2-3) or high disagreement (-1 to 1) among the predictions of the algorithms.
To ensure transparency and full replicability of the methods we provide an Overview, Data, Model, Assessment and Prediction (ODMAP) protocol as part of the supplementary material, that details all methodological choices and parameters (Online Resource 5; Zurell et al. 2020). The 109 occurrence data points used for species distribution modelling are available in the Online Resource 6.
Changes in elevation over time
Lasiommata petropolitana shifts in elevation were evaluated from 1964 to 2021, assuming 1964 as one of the dates suggested to mark the beginning of the Anthropocene (Lewis and Maslin 2015). When explicitly reported, elevation values were obtained from literature and collections data. For the occurrences without a clear report of elevation, but with precise coordinates (error lower than 1000 m like most citizen science data), the elevations were extracted using a 1 km2 layer available at https://www.earthenv.org/topography (Amatulli et al. 2018). The final dataset was composed of 323 records over a period of 57 years. An ordinary least squares (OLS) regression between time and elevation was used to test for the existence of a temporal trend in elevational shifts.