In this section, evaluation results are first presented for the 2006–2010 period, covered by all observational and reanalysis products. Then, the analysis is extended to a 26-year period (1985–2010) for the datasets with a sufficiently long temporal coverage, including the WRF regional climate simulations.
4.1 Evaluation for the 2006–2010 period
Table 3 presents the statistical parameters (Pearson correlation coefficient, ME, MAE, RMSE) calculated between station measurements and the gridded observation-based datasets. The ERA5 reanalysis and the CGLS product perform remarkably well, with correlation coefficients exceeding 0.9 and ME values close to zero.
Table 3 Pearson correlation coefficient (PCORR), mean error (ME), mean absolute error (MAE), and root-mean-square error (RMSE) of daily mean snow depth from the different gridded datasets, compared to station observations. Period: November to March 2006–2010
In the case of ERA5, the correlation coefficient even reaches 0.99 for two stations. The good performance of ERA5 and CGLS demonstrates the advantages of using direct observations in the construction of snow datasets, e.g., through the advanced data assimilation system of ERA5 and the incorporation of ground-based snow depth (SD) measurements to complement satellite retrievals in CGLS. ERA5-Land and CARPATCLIM display slightly lower correlation coefficients in the range of 0.8–0.9 and 0.7–0.8, respectively. With a PCORR number of 0.54, the CARPATCLIM dataset performs the worst in the case of the Debrecen station. The highest error numbers can also be observed for CARPATCLIM, followed by ERA5-Land. Muñoz-Sabater et al. (2021) noted that ERA5-Land does not necessarily perform better than ERA5 over relatively data-rich and non-mountainous regions, which is confirmed by our results.
Figure 3 Daily mean snow depth from station observations (OBS) and the different gridded datasets in the 2006–2010 period
The mean annual cycle of SD in the 2006–2010 period is generally reproduced by all datasets, with CARPATCLIM and ERA5-Land displaying noteworthy biases (Fig. 3). The mean errors of ERA5 and CGLS mostly remain below 1 and 2 cm, respectively, which suggests an exceptionally good agreement with station observations. On the other hand, biases of the CARPATCLIM dataset reach 4–7 cm in the case of the Budapest and Pécs measurement sites. For the other two stations, the overestimation of CARPATCLIM remains below 2 cm. The occasional noisiness of the data is presumably a consequence of the relatively short (5-year) study period and the fact that snow cover has significant year-to-year variability in the investigation area with intermittent snow-free periods even in the winter months. The discrepancies of the CARPATCLIM dataset likely originate from the application of a snow model to derive SD from the predictor variables, as its temperature and precipitation records have been proven to be of high quality and utilized in several RCM verification studies (e.g., Kotlarski et al. 2019; Vautard et al. 2021). In addition, the relationship between standard meteorological parameters and SD is not straightforward, and the snow model might not be appropriately tuned for this particular area. Daily mean SD is overestimated in the ERA5-Land reanalysis too, although the magnitude of the positive bias is lower compared to CARPATCLIM (mostly less than 3 cm). Moreover, ERA5-Land slightly underestimates peak SDs in the case of Budapest at the beginning of February. Figure S1 displays daily mean SDs from the different datasets averaged across the four stations in the 2006–2010 period.
Figure 4 Daily mean snow depth from the different gridded datasets in the 2006–2010 period, spatially averaged over the region shown in Fig. 1
To address the possible representativeness issue arising from the nearest grid cell method, Fig. 4 presents daily mean SDs from the gridded datasets in the 2006–2010 period, spatially averaged over the geographical area shown in Fig. 1. The relative performance of each dataset is very similar in the case of the pointwise comparison and the spatial averages. Moreover, the areal mean patterns shown in Fig. 4 are almost indistinguishable from those obtained by averaging out daily mean SDs across the four stations (Fig. S1). The ERA5 and CGLS datasets are very close to each other; meanwhile, ERA5-Land displays slightly higher values (by less than 1 cm). Mean SDs from CARPATCLIM are 1–2 cm higher compared to the other data sources in January and February.
The spatial distribution of mean February SD is similar among the gridded observation-based and reanalysis datasets (Fig. 5). SD minima can be observed over southeastern Hungary and northern Serbia. SD values increase from the south toward the north in every dataset. Although the differences are small over the low-altitude regions of Hungary, CARPATCLIM displays slightly larger values throughout the country than the other data sources. Over the ranges of the Carpathian Mountains, marked peak values are present in CARPATCLIM and ERA5-Land, presumably caused by the better description of topography at the higher horizontal resolution, as such pronounced maxima cannot be observed in the case of ERA5. Mountain regions are masked out in the CGLS product. In summary, Fig. 5 indicates a considerable dataset uncertainty over complex terrain.
Figure 5 Spatial distribution of mean February snow depth from the different gridded datasets in the 2006–2010 period
However, quantitative verification in mountainous areas is out of the scope of this paper. Several studies evaluated the ERA5 and ERA5-Land reanalyses over regions with complex topography, indicating the superiority of ERA5-Land. Muñoz-Sabater et al. (2021) demonstrated an improved performance of ERA5-Land compared to ERA5 over mid-altitude mountains due to the enhanced horizontal resolution. Larger SDs in ERA5-Land compared to ERA5 agree better with in situ observations over the Tibetan Plateau according to Lei et al. (2022). Li et al. (2022b) indicated that both ERA5 and ERA5-Land overestimate SD in the Tianshan Mountains (Central Asia), although the magnitude of the positive bias is larger in ERA5. Maps of the monthly mean SD for December, January, and March are shown in Figs S2–4. Conclusions for the other months are identical to those discussed above in the case of February.
4.2 Evaluation for the 1985–2010 period
Table 4 displays the statistical metrics calculated between the different datasets and the station observations for the 1985–2010 period. The correlation coefficients of ERA5 are between 0.94 and 0.97. The PCORR numbers of ERA5-Land and CARPATCLIM are in the range of 0.83–0.89 and 0.73–0.82, respectively.
Table 4 Pearson correlation coefficient (PCORR), mean error (ME), mean absolute error (MAE), and root-mean-square error (RMSE) of daily mean snow depth from the different gridded datasets, compared to station observations. Period: November to March 1985–2010
Correlation coefficients of the WRF regional climate simulations are between 0.52 and 0.65, implying a moderate performance compared to the observation-based and reanalysis datasets. Error numbers (ME, MAE, and RMSE) are the lowest for ERA5, followed by CARPATCLIM and ERA5-Land, displaying errors of similar magnitude. Error values are the largest for the WRF simulations. Positive MEs indicate a general overestimation for every dataset, although with different magnitudes. Reducing the grid spacing from 50 km to 10 km in WRF does not improve the results considerably.
Figure 6 Daily mean snow depth from station observations (OBS) and the different gridded datasets in the 1985–2010 period
The observation-based datasets, the reanalysis products, and the WRF regional climate simulations correctly depict the day-to-day variability of measured mean SDs in the 1985–2010 period (Fig. 6). However, magnitude errors exist, occasionally reaching significant values in the case of the RCM runs. The overestimation of ERA5 is almost negligible, remaining below 1 cm. ERA5-Land and CARPATCLIM show a comparable positive bias of 2–3 cm in the case of the Budapest and Debrecen stations in January, and excessive SDs persist throughout the entire period. The overestimation of ERA5-Land is lower (remains below 1 cm) in the case of Pécs and Szeged than for the other two stations. The WRF simulations significantly overestimate daily SD values. In the ablation period (February and March), WRF model biases exceed 5 cm (and even reach 8 cm) for some measurement sites. The melting period is delayed in the case of all four stations, but discrepancies are more severe for the sites located further north (Budapest and Debrecen). A similar SD overestimation has been found across the Tianshan Mountains in the melting period by Li et al. (2022a) using the Noah-MP land-surface model. You et al. (2020) indicated that the different options for the description of physical processes in Noah-MP affect SD in the ablation period, and the overestimation might be reduced by the optimal selection of model settings (e.g., using a semi-implicit snow temperature time scheme accelerated snow ablation compared to a full-implicit scheme). The delay of snowmelt impacts land-surface interactions and can have a deteriorating effect on simulated surface air temperatures, as shown in the case of the ERA5 reanalysis by Lei et al. (2022). Our results indicate that the increase in horizontal resolution does not lead to better model performance in the WRF simulations. We assume that the beneficial effects of the reduced grid spacing would be more evident over regions with complex topography. In addition, we propose that the significant overestimation of SD is a reasonable explanation for the extensive cold bias found in previous RCM evaluation studies in winter over large parts of Europe (García-Díez et al. 2015; Katragkou et al. 2015; Varga and Breuer 2020). The delayed ablation might also indicate problems in the representation of snowmelt processes in Noah-MP that should be addressed in further model improvement efforts. Figure S5 displays daily mean SDs from the different datasets averaged across the four stations in the 1985–2010 period.
Figure 7 Daily mean snow depth from the different gridded datasets in the 1985–2010 period, spatially averaged over the region shown in Fig. 1
The inspection of spatially averaged daily mean SDs from the grid-based datasets confirms the findings of the pointwise comparison (Fig. 7). The ERA5 reanalysis displays the lowest values, followed by ERA5-Land, CARPACTLIM, and the WRF regional climate simulations. Surprisingly, in most of the months, WRF10 presents even higher SDs than WRF50, i.e., the overestimation increases with the decreased grid spacing. On the other hand, in the ablation period (late February–March), WRF10 SDs are slightly below those of WRF50, suggesting that snowmelt processes might be better resolved at the higher horizontal resolution. However, the better performance of WRF10 is not as consistent at the station scale as in the spatial averages.
The excessive snow cover in the RCM simulations might at least partially originate from the overestimation of precipitation, as suggested by Orsolini et al. (2019) for the ERA5 reanalysis and Frei et al. (2018) for the EURO-CORDEX regional climate model ensemble. Indeed, WRF simulations participating in EURO-CORDEX overestimate winter precipitation over Europe (García-Díez et al. 2015; Vautard et al. 2021). Moreover, we also encountered a general wet bias in winter and spring during our preliminary sensitivity tests with the WRF model over the Pannonian Basin region (Varga and Breuer 2020). Lin and Chen (2022) demonstrated a significant overestimation of snowfall in the CMIP6 ESM ensemble and the ERA5 reanalysis over many parts of the world, using proxy snowfall data constructed from 2 m temperature and precipitation. According to their results, snowfall overestimation is the largest in spring, and the simulated snowfall duration is considerably longer than in reality. In summary, the overestimation of precipitation and therefore snowfall is quite common in state-of-the-art model-based datasets. It is challenging, however, to separate the relative contribution of precipitation and temperature errors, and land-surface model deficiencies to SD biases (McCrary et al. 2017).
The significant February–March SD overestimation in the WRF simulations is alarming for several reasons. First, the unrealistically deep and persistent snow cover can induce excessive snow-albedo feedback (Minder et al. 2016) that would lead to further atmospheric cooling in the model, resulting in an increased amount of solid precipitation in late winter and early spring. Moreover, the representation of albedo over snow surfaces might essentially (i.e., without major SD biases) be inaccurate. For example, Minder et al. (2016) demonstrated a large overestimation of surface albedo in the snowmelt season over the central Rocky Mountains, which they attributed to the lack of considering impurities deposited onto the snow in the LSM. Second, the positive albedo bias likely affects the surface energy budget and air temperatures aloft. In addition, the role of the temperature-albedo feedback is larger in spring when solar radiation fluxes are increasing (Mudryk et al. 2020). Finally, Thackeray et al. (2019) stated that the snow-atmosphere coupling is the strongest during the springtime melting period, and biases in snow cover can have a long-lasting impact on soil moisture.
Figure 8 Spatial distribution of mean February snow depth from the different gridded datasets in the 1985–2010 period
Figure 8 shows the spatial distribution of the mean February SD from the observation-based CARPATCLIM dataset and the reanalysis products for the 1985–2010 period. The conclusions are very similar to those for the 5-year period (Fig. 5), except for ERA5 also displaying peak values over the Carpathian Mountains. Nevertheless, high-elevation maxima are more spatially extensive and larger in magnitude in the case of CARPATCLIM and ERA5-Land. Over the low-altitude Hungarian areas, CARPATCLIM SDs are generally higher compared to the reanalysis products. Similar monthly mean SD maps for December, January, and March are shown in Figs S6–8.
Figure 9 Spatial distribution of the mean February snow depth bias of WRF50 (left column) and WRF10 (right column) in the 1985–2010 period, compared to the CARPATCLIM, ERA5, and ERA5-Land datasets (first, second, and third row, respectively)
The spatial distribution of the mean February SD bias of the WRF simulations with respect to CARPATCLIM, ERA5, and ERA5-Land can be seen in Fig. 9. Generally, an increasing overestimation of SD can be observed from southwest to northeast within the investigation area, implying that the snow excess is also greater where average SDs are greater. Over southwestern Hungary and northern Croatia, the WRF regional climate simulations overestimate SD compared to the ERA5 and ERA5-Land reanalyses but slightly underestimate it with respect to the CARPATCLIM dataset. The contrasting sign of the biases highlights the difficulty of snow verification originating from dataset uncertainties, pointed out by several earlier studies (e.g., Mortimer et al. 2020; Li et al. 2022b). The observational uncertainty is amplified over the Carpathian Mountains, where WRF50 consistently overestimates monthly mean SDs compared to ERA5 but underestimates them with respect to CARPATCLIM and ERA5-Land. On the other hand, WRF10 displays excessive SDs compared to every dataset over the high-elevation regions of the Carpathians (Fig. 9). Therefore, the 10 km model generates a deeper snowpack over the mountains than the 50 km simulation, presumably because of higher precipitation amounts due to the enhanced resolution and better-resolved orography. Terzago et al. (2017) also stated that dataset uncertainty can be very high over mountainous areas, undermining our ability to quantitatively evaluate RCM simulations over complex terrain. However, according to Terzago et al. (2017), despite the considerable spread in the observational products, many EURO-CORDEX regional climate models display even larger SWE values than any reference dataset over the greater Alpine region. Over low-altitude areas, the bias patterns of WRF50 and WRF10 are very similar regarding spatial distribution and magnitude. Bias maps for December, January, and March are shown in Figs S9–11.