This study evaluated the performance of eight selected CMIP5 GCMs in simulating decadal precipitation at the catchment level with a 0.05-degree spatial resolution. Different skill metrics were employed from both temporal and spatial perspectives. The performance metrics CC, ACC, and IA measured the temporal skills of the models, while the number of grids meeting individual metric thresholds represented their spatial skills. These metrics were also calculated for the spatial sum of precipitation (summed over the entire catchment) for all models. In addition, FSSa85 and FSSb15 captured the spatial skill of the models for the wet and dry seasons, respectively. The CC and ACC measured the phase and correspondence (or anomalies) of the modelled time series with respect to the observed values. The models showed a wide range of performance scores over the initialization years as well as across the catchments. This may be due to differences in how the models represent local climate features, to the finer temporal and spatial resolution of the precipitation data, or to a combination of both.
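The temporal metrics above follow standard definitions; a minimal sketch for one grid cell (assuming 1-D time series and using the observed mean as the climatology for ACC, assumptions not fixed by the text) could be:

```python
import numpy as np

def skill_metrics(obs, sim, clim=None):
    """CC, ACC, and IA for one grid cell (illustrative helper).

    obs, sim : 1-D arrays of observed and simulated precipitation.
    clim     : climatology used to form anomalies for ACC; defaults
               to the observed mean (an assumption made here).
    """
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    if clim is None:
        clim = obs.mean()
    # CC: Pearson correlation of the raw series (phase agreement)
    cc = np.corrcoef(obs, sim)[0, 1]
    # ACC: correlation of anomalies about the climatology
    oa, sa = obs - clim, sim - clim
    acc = (oa * sa).sum() / np.sqrt((oa ** 2).sum() * (sa ** 2).sum())
    # IA: Willmott's index of agreement (0 = no skill, 1 = perfect)
    ia = 1.0 - ((sim - obs) ** 2).sum() / (
        (np.abs(sim - obs.mean()) + np.abs(obs - obs.mean())) ** 2).sum()
    return cc, acc, ia
```

All three metrics equal 1 for a perfect simulation; CC and ACC fall toward 0 (or below) as phase agreement is lost, while IA stays bounded in [0, 1].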
Indeed, model performance depends on each model's assumptions and its basic principles for representing the earth's climate system, its processes, and the interactions among the atmosphere, oceans, land, and ice-covered regions of the planet. Decadal prediction skill also depends on the method of model initialization and on the quality and coverage of the ocean observations (Taylor et al., 2012). Different initializations may also induce internal variability in the models, which remains open for further discussion. For decadal prediction, one of the most important aspects is model drift and its correction (Mehrotra et al., 2014). However, because this study evaluated the models' raw outputs, drifts were neither investigated nor corrected; the drift correction method itself may introduce additional errors that would not reflect the real performance of the models (Hossain et al., 2021c, 2021b). Based on their understanding of the physical, chemical, and biological mechanisms of the earth system, different modelling groups have developed models whose ability to reproduce climate variables varies across regions (Choi et al., 2016; Homsi et al., 2020; Purwaningsih and Hidayat, 2016) and across variables (Kamworapan and Surussavadee, 2019; Kumar et al., 2014, 2013). For instance, Kumar et al. (2013) analyzed twentieth-century precipitation and temperature trends from nineteen CMIP5 models and reported that the models' relative performances were better for temperature than for precipitation trends. Generally, models show lower skill in simulating precipitation than temperature, because temperature is obtained from a thermodynamic balance, while precipitation results from simplified parameterizations approximating the actual processes (Flato et al., 2013; see also references therein).
In addition, the temporal and spatial scale (the area considered) of the variables, including the seasons of the year (Sheffield et al., 2013; Ta et al., 2018), may also cause model performance to vary. For instance, some models reproduce winter precipitation very well while others do not, and vice versa. Likewise, Lovino et al. (2018) evaluated CMIP5 decadal simulations and concluded that even the best models and the MMEM could reproduce large-scale features very well but failed to replicate the smaller-scale spatial variability of the observed annual precipitation pattern. These findings are clear evidence of spatial variation in climate model performance across the globe, as the models are developed by different organizations (Chen et al., 2017). This study noticed the highest skill for the 1990 initialization and the lowest for the 1980 initialization, but the reason behind these extremes remains unknown. Meehl et al. (2015) reported that the Fuego (1974) and Pinatubo (1991) eruptions degraded the decadal hindcast skill for Pacific sea surface temperature in the mid-1970s and mid-1990s, respectively. As Fuego was smaller than Pinatubo, the skill degradation was smaller in the mid-1970s than in the mid-1990s, while no degradation was evident after the Agung (1963) and El Chichón (1982) eruptions (Meehl et al., 2015). In this study, the models' higher and lower skills for the 1990 and 1980 initializations seem related neither to volcanic eruptions nor to their post-eruption sequences. The quality of the observed precipitation, or the coverage of the observed ocean state used to initialize the models, may instead have been responsible.
The CC and ACC values of all the selected models in all initialization years remained below 0.6, the threshold marked as significant in previous studies (Choi et al., 2016; Lovino et al., 2018), although those studies used coarser spatial resolutions and, in one case, different climate variables. Lovino et al. (2018) compared CMIP5 model performance for two variables at the local level and reported that skill scores for precipitation were remarkably lower than those for temperature from the same models; similar results were reported by Jain et al. (2019). In this sense, the higher spatial resolution of the precipitation data may be the reason the models did not reach the significant level of skill in linear association (CC) or in phase differences and anomalies (ACC). However, a few models did reach this level (≥ 0.6) for the performance metric IA, a measure of predictive accuracy, which suggests promising predictive skill. Since the studies that adopted 0.6 as the significance level for CC and ACC used either coarser-resolution data (Lovino et al., 2018) or different climate variables (Choi et al., 2016), a score of 0.50 seems significant for raw model precipitation of higher spatial and temporal resolution at the local or regional level; the same applies to the corresponding metrics for total precipitation.
This study also investigated the models' ability to reproduce summer and winter precipitation. Comparing the model skills in reproducing extreme wet events (≥ 85th percentile of the observed values) and dry events (< 15th percentile of the observed values) across the catchment and at the selected grid, this study reveals that, except for CMCC-CM, all models show almost similar skill in reproducing summer precipitation but exhibit some variation in reproducing winter precipitation. Similar skills are also noted for the intermediate thresholds. This reflects the maximum and minimum precipitation occurring in Brisbane during summer and winter, respectively. It means the models reproduce summer precipitation better than winter precipitation, with a tendency to overestimate the higher precipitation events. The Category-I models performed comparatively better in capturing dry events (Fig. 4) than wet events, but this may vary across regions. For instance, MRI-CGCM3 showed very good skill and was marked as a first-category model in this study, yet it showed insignificant or no skill in reproducing Sahelian precipitation, whereas MPI-ESM-LR and MIROC5, categorized here as second- and third-category models, were marked as models of improved skill for Sahelian precipitation (Gaetani and Mohino, 2013).
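The percentile-defined event scores can be sketched as follows. This is a simplified, whole-catchment version of a fractions-skill-style score (the standard FSS compares neighbourhood fractions at several scales; here the neighbourhood is taken as the full domain, and the function name and arguments are illustrative):

```python
import numpy as np

def fss(obs_field, sim_field, q=85, wet=True):
    """Fractions-skill-style score for percentile-defined events.

    wet=True  : wet events, values >= q-th observed percentile (e.g. FSSa85)
    wet=False : dry events, values <  q-th observed percentile (e.g. FSSb15)
    The neighbourhood is the whole catchment, a deliberate simplification.
    """
    obs = np.asarray(obs_field, float).ravel()
    sim = np.asarray(sim_field, float).ravel()
    thr = np.percentile(obs, q)          # event threshold from observations
    po = (obs >= thr).mean() if wet else (obs < thr).mean()
    pf = (sim >= thr).mean() if wet else (sim < thr).mean()
    mse, mse_ref = (pf - po) ** 2, pf ** 2 + po ** 2
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
```

A score of 1 means the model places the same fraction of grids above (or below) the observed percentile threshold as the observations; the score decays toward 0 as the event frequencies diverge.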
Previous studies (Jain et al., 2019; Lovino et al., 2018) reported that the MMEM improves the models' skill in reproducing climate variables, but the selection of models to form the MMEM is very challenging, as the arithmetic mean of the models' outputs may lead to a loss of signal (Knutti et al., 2010). This study also examined the performance of the MMEM and revealed that it improves the performance metrics to some extent, but not always, and the improvement depends strongly on the combination of models used. For instance, MMEM2 shows better performance than the other two combinations in reproducing the extremely dry and wet events, whereas MMEM3 performed worst (Fig. 9). On the contrary, for the highest thresholds of the individual metrics, a few individual models were found to be better than MMEM3. Similar results were reported in other studies (Kumar et al., 2013; McKellar et al., 2013), where individual models were found to be somewhat better than the MMEM. However, the lower skill of CMIP5 models for decadal precipitation compared to temperature also holds for the MMEM, as reported by Mehrotra et al. (2014).
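The loss-of-signal caveat for an arithmetic-mean MMEM can be illustrated with synthetic data (purely illustrative; the two "models" below are hypothetical sinusoids, not the study's GCMs): when ensemble members' anomalies are out of phase, averaging cancels them.

```python
import numpy as np

# Two hypothetical model anomaly series with opposite phase
t = np.arange(120)                              # 120 months
model_a = np.sin(2 * np.pi * t / 12)            # annual cycle
model_b = np.sin(2 * np.pi * t / 12 + np.pi)    # opposite phase

# Arithmetic multi-model ensemble mean (grid-wise mean in practice)
ensemble_mean = (model_a + model_b) / 2.0

# Out-of-phase anomalies cancel: the MMEM loses the signal entirely,
# even though each member has unit amplitude.
print(np.abs(ensemble_mean).max())  # ~0
```

This is the extreme case; in practice partial phase disagreement damps rather than eliminates variability, which is why an MMEM can smooth large-scale features well yet underestimate extremes, and why the choice of members matters.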
In addition to the representation of the climate system, the models' configuration, including the spatial and temporal resolutions of the simulated variables, also plays a vital role in determining model performance (Sakamoto et al., 2012). In this study, except for CMCC-CM, models with finer atmospheric resolutions performed better than models with coarser resolutions (see Table 1, Category-I models). This means that models of finer atmospheric resolution can reproduce local climate features better than models of coarser spatial resolution; similar results were reported in previous studies (Jain et al., 2019; Lovino et al., 2018). The lower skill of CMCC-CM may be due to differences in its formulation or geographical focus, and its performance may differ for other climate variables such as temperature (Lovino et al., 2018). This study will help water managers, infrastructure developers, and agricultural stakeholders shortlist models before making planning and infrastructure decisions based on the models' predicted future precipitation. The findings will also help researchers in hydrological modelling, and other relevant stakeholders, to increase society's resilience to climate change in relation to future water availability and its uncertainty.