Performance evaluation of CMIP6 global climate models for selecting models for climate projection over Nigeria

This study assessed the performance of 13 global climate models (GCMs) of the CMIP6 in replicating precipitation and maximum and minimum temperatures over Nigeria during the 1984–2014 period in order to identify the best GCMs for multi-model ensemble aggregation for climate projection. The study uses the monthly full reanalysis precipitation product version 6 of the Global Precipitation Climatology Centre and the maximum and minimum temperature products (CRU TS v. 3.23) of the Climatic Research Unit as reference data. Five statistical indices were applied, namely, normalized root mean square error, percentage of bias, Nash–Sutcliffe efficiency, coefficient of determination, and volumetric efficiency. Compromise programming (CP) was then used to aggregate the scores of the different GCMs for each variable. Spatial assessment, probability distribution functions, Taylor diagrams, and mean monthly assessments were used to confirm the findings from the CP. The study revealed that CP was able to evaluate the GCMs uniformly even though the statistical indicators gave some contradictory results. Spatial assessment of the GCMs against the observed data showed that the GCMs ranked highest by the CP better reproduced the observed properties. The lowest ranking GCMs either overestimated or underestimated precipitation and temperature over the study area. In combination with the other measures, the GCMs were ranked using the final scores from the CP. IPSL-CM6A-LR, NESM3, CMCC-CM2-SR5, and ACCESS-ESM1-5 were the highest ranking GCMs for precipitation. For maximum temperature, INM-CM4-8, BCC-CSM2-MR, MRI-ESM2-0, and ACCESS-ESM1-5 ranked the highest, while AWI-CM-1-1-MR, IPSL-CM6A-LR, INM-CM5-0, and CanESM5 ranked the highest for minimum temperature.


Introduction
There have been many studies on the impacts of climate change resulting from increasing temperature and erratic precipitations in many parts of the globe (Iqbal et al. 2019;Khan et al. 2020a;Salman et al. 2020;Shiru et al. 2018). It has been commonly concluded that disaster frequencies, severities, and risks, particularly relating to droughts and floods, have increased (Alamgir et al. 2019;Asdak and Supian, 2018;Ayugi et al. 2020;Manawi et al. 2020). They are also expected to increase in the future under different emission scenarios of generations of global climate models (GCMs) (Homsi et al. 2020;Sa'adi et al. 2019;Shiru et al. 2020c;Tan et al. 2020), which are the primary tools for climate predictions and future climate projections. For increased confidence in future climate projections, performance evaluation of GCMs is imperative (Zhao et al. 2020). Such evaluations are pivotal to the development of reliable adaptation and mitigation measures against the risks and impacts of climate change.
Over the years, many GCMs have been developed for the different scenarios of the IPCC assessment reports, including those of the Coupled Model Intercomparison Project (CMIP) phase 3, phase 5, and the recently released phase 6. Many studies reported improvements of the CMIP5 over the CMIP3 (Tanveer et al. 2016; Taylor et al. 2012; Zhou et al. 2017). The main differences between the CMIP6 simulations and the earlier CMIP phases (CMIP3 and CMIP5) are the start years for the future scenarios and the new sets of specifications for concentration, emission, and land-use scenarios (Gidden et al. 2019). Although the CMIP6 ensemble of GCMs is not yet complete, some recent studies have demonstrated its improvement over CMIP5 in some regions, e.g., South Asia (Zhai et al. 2020), China (Xin et al. 2020), South Korea (Song et al. 2020), Australia (Grose et al. 2020), and Africa (Ayugi et al. 2021a). It is therefore important to assess the performance of the CMIP6 GCMs in other regions where they have not yet been applied or have not been widely applied.
Because there has been a general consensus that all GCMs show similar climate characteristics over the globe, each GCM has been treated as an equal. However, GCM performance varies spatially across the globe (Homsi et al. 2020; Khan et al. 2020b). Therefore, many studies have recommended the aggregation of a multi-model ensemble (MME) from a pool of GCMs by excluding the GCMs that are considered the least realistic, in order to reduce the uncertainties associated with the GCMs (Ahmed et al. 2019a; Lutz et al. 2016; Shiru et al. 2019a). Though combining multiple GCMs for projection poses challenges, such as how effectively errors cancel when GCMs are averaged, it is better than using a single model (Knutti et al. 2010). There are also the challenges of defining performance metrics that sufficiently relate to the models' prediction skills and of devising an overall, multipurpose ranking method for selecting model subsets.
Studies have been conducted in assessing the performances of GCMs in many parts of the globe using different statistical measures (Rivera and Arnould 2020; Sreelatha and Anand Raj 2019). However, there can be challenges in decision-making as to which GCMs performed best due to contradictions in the outputs from different statistical measures (Ayugi et al. 2021b;Klutse et al. 2021). This emphasizes the implementation of a compromise solution which can consider tradeoffs among the measures for selecting the best models.
Compromise programming (CP) (Zeleny, 1973) is one of the multi-criteria decision-making techniques applied in many fields including climate studies. The basic idea in CP is to identify the ideal point as the point where each considered attribute achieves its optimum value. The preferred solution is then the one that is closest to the ideal point. Because CP itself identifies the solution closest to the ideal, interference by the decision-maker is prevented (Rezaei et al. 2017; Samal and Kansal 2015). In climate research, CP was used to reconcile the various performance measures and select the best gridded precipitation data over Iraq (Salman et al. 2019) and to rank GCMs (Raju et al. 2017).
The CMIP6, being recently released, is not known to have been applied in Nigeria. GCM selection over the region has also not been conducted using the CP method. Therefore, this study aims to select the most suitable GCMs from a total of 13 GCMs each for precipitation and maximum and minimum temperatures for aggregation into MMEs for climate projection over the region. This study uses the monthly Global Precipitation Climatology Centre (GPCC) precipitation and Climatic Research Unit (CRU) maximum and minimum temperature data as reference data (observed data). Five statistical indices, namely, normalized root mean square error (NRMSE), percentage of bias (Pbias), Nash-Sutcliffe efficiency (NSE), coefficient of determination (R²), and volumetric efficiency (VE), were used to evaluate the performance of the CMIP6 GCMs relative to the observed precipitation and maximum and minimum temperatures. Using CP, the scores from the metrics were used to rank the GCMs. Probability distribution function (PDF) and Taylor diagram (TD) were used to assess the performance of the GCMs in replicating the observed data. The annual and monthly time series were also plotted to assess the closeness of the GCMs to the observed data. Finally, the best performing GCMs were selected. The remainder of the paper is organized as follows: Sect. 2 presents the study area and datasets; Sect. 3, the methodology; Sect. 4, the results; and Sect. 5, the discussions, while the conclusions are given in Sect. 6.

Study area
Nigeria, in the western part of Africa (latitudes 4°15′-13°55′ N; longitudes 2°40′-14°45′ E) and covering an area of 923,000 km² (Fig. 1), is considered for this study. To the north of the country is Niger, to the east are Chad and Cameroon, and Benin Republic borders it to the west, while the southern part is bordered by the Atlantic Ocean. The country has two major seasons, namely, the rainy and dry seasons. The climate of the country varies spatially and temporally, with the southern parts receiving over 2,000 mm of annual precipitation during April to October and the northern parts receiving below 500 mm, occurring between June and September. In the inland areas of the country, the mean annual minimum and maximum temperatures vary from 19.0 to 22.8 °C and 32.3 to 36.0 °C, respectively. At the coast, the mean annual minimum temperature ranges from 21.8 to 22.7 °C, while the mean annual maximum ranges from 30.7 to 31.0 °C. Different climatic conditions exist in the country, mainly a monsoon climate in the south, a tropical savanna climate in the central part, and warm semi-arid and warm desert climates in the north. The ecological zones within the country are mangrove swamp and rainforest in the south, Guinea savanna in the central part, and Sudan savanna and Sahel savanna in the north. The lowest point within the country, at 0 m, is at the southern coast adjoining the Atlantic Ocean, while Chappal Waddi, at 2,419 m in the northeastern part of the country, is the highest point.

Gauge-based gridded precipitation and temperature data
As in many developing nations, there is difficulty in accessing reliable climate data in Nigeria. Firstly, gauges are sparsely distributed across the country: only 87 operating rain gauges are available, against the 1057 gauges, corresponding to a gauge density of one per 874 km², recommended by the World Meteorological Organization (WMO, 1965) for appropriately measuring precipitation. Secondly, the data for some of the stations are incomplete, and some stations have no long-term records. Therefore, this study uses the 0.5° × 0.5° resolution GPCC V6 monthly full reanalysis data product of the Deutscher Wetterdienst (Becker et al. 2013) and the monthly maximum and minimum temperature CRU TS v. 3.23 of the University of East Anglia (Harris et al. 2014) for the period 1984-2014. Where station data are insufficient, the GPCC precipitation and the CRU temperature products have been found to be suitable for climate studies over the African continent, including Nigeria (Diallo et al. 2018; Shiru et al. 2019b, 2020b; Tirivarombo et al. 2018). The monthly precipitation data of the GPCC are produced from over 85,000 rain gauge stations in about 190 countries. A smart interpolation technique, which considers the systematic relationship between elevation and station observations and thereby enhances estimation accuracy, is used in the production of the GPCC (Funk et al. 2007).
The CRU employs measurements from about 4,000 monitoring stations across the globe. The product undergoes extensive quality control in two stages, one manual and one semi-automatic, the first to ensure consistency and the second to remove stations or months with large errors. The production of the world's land-based gridded temperature data by the CRU has been of great significance to the international community, especially in climate research, as it has been widely used in assessing changes in temperature across the globe.

Global climate models
This study uses the historical precipitation and temperature of the newly released CMIP6 GCMs. In the CMIP6 phase, the representative concentration pathway (RCP) scenarios of the CMIP5, RCP2.6, RCP4.5, RCP6.0, and RCP8.5, have been updated to shared socio-economic pathways (SSPs), SSP1-2.6, SSP2-4.5, SSP4-6.0, and SSP5-8.5, respectively, each of which also considers 2100 radiative forcing levels. The climate change research community established the SSPs to facilitate the integrated analysis of future climate vulnerabilities, impacts, adaptation, and mitigation (Riahi et al. 2017). According to Hausfather (2018), the SSPs involve five possibilities: (1) a situation of sustainability-focused growth and equality (SSP1); (2) a situation where the trends broadly follow their historical patterns (SSP2); (3) a fragmented world of "resurgent nationalism" (SSP3); (4) a world of ever-increasing inequality (SSP4); and (5) a situation of rapid and unconstrained growth in energy use and economic output (SSP5). Through improved emissions and land-use scenarios, improved physical processes, and model parameterization, the CMIP6 is targeted to robustly unfold the future conditions of the climate (O'Neill et al. 2016). In this study, 13 sets of historical GCMs each for precipitation and maximum and minimum temperature were chosen from the CMIP6 database for the period 1984-2014. This period was considered since the CMIP6 historical simulations end in 2014 and in order to have a 30-year baseline period, as was considered for CMIP5. The GCMs were chosen based on the availability of at least one SSP for the future period, their being common to the precipitation and temperature variables, and their availability for the study area. There are several ensemble members of the CMIP6, including r1i1p1f1, r2i1p1f1, and r3i1p1f1, whose labels represent the realization, initialization, physics, and forcing indices of the model run. The first ensemble members, r1i1p1f1, of the GCMs were considered in order to have an unbiased comparison of all of them. Information about the CMIP6 GCMs chosen for this study is provided in Table 1.

Methodology
The methods applied in this study are presented in this section. Before the assessments, the GPCC precipitation, the CRU maximum and minimum temperatures, and all GCMs were re-gridded to a common 2° × 2° resolution using bilinear interpolation. The 2° × 2° resolution was chosen as it is approximately equal to the average spatial resolution of most of the GCMs. Bilinear interpolation is often used to smoothly transform spatially coarse GCM data onto a finer grid by interpolating from the four nearest neighboring grid points (Almazroui et al. 2020b; Penalba and Rivera 2016). The individual methods are discussed as follows.
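The re-gridding step can be illustrated with a minimal NumPy sketch. It is not the code used in the study (the software is not stated); the function name `bilinear_regrid`, the coordinate arrays, and the synthetic field are all assumptions made for illustration. Each target value is a weighted average of the four surrounding source grid points.

```python
import numpy as np

def bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D (lat, lon) field onto a new regular grid.

    Assumes ascending 1-D coordinates and target points inside the source domain.
    """
    field = np.asarray(field, dtype=float)
    src_lat, src_lon = np.asarray(src_lat, float), np.asarray(src_lon, float)
    dst_lat, dst_lon = np.asarray(dst_lat, float), np.asarray(dst_lon, float)

    # Lower-corner index of a source cell enclosing each target coordinate.
    i = np.clip(np.searchsorted(src_lat, dst_lat) - 1, 0, len(src_lat) - 2)
    j = np.clip(np.searchsorted(src_lon, dst_lon) - 1, 0, len(src_lon) - 2)

    # Fractional position of the target inside that cell (0 = lower edge, 1 = upper edge).
    ty = ((dst_lat - src_lat[i]) / (src_lat[i + 1] - src_lat[i]))[:, None]
    tx = ((dst_lon - src_lon[j]) / (src_lon[j + 1] - src_lon[j]))[None, :]

    # Values at the four corners of the enclosing cell.
    f00 = field[np.ix_(i, j)]
    f01 = field[np.ix_(i, j + 1)]
    f10 = field[np.ix_(i + 1, j)]
    f11 = field[np.ix_(i + 1, j + 1)]

    # Interpolate along longitude first, then along latitude.
    lower = (1 - tx) * f00 + tx * f01
    upper = (1 - tx) * f10 + tx * f11
    return (1 - ty) * lower + ty * upper

# Hypothetical 0.5-degree source grid and 2-degree target grid (illustrative only).
src_lat = np.arange(4.0, 14.5, 0.5)
src_lon = np.arange(2.5, 15.0, 0.5)
dst_lat = np.arange(4.5, 13.0, 2.0)
dst_lon = np.arange(3.5, 12.0, 2.0)

# Synthetic field that varies linearly with latitude and longitude.
src_field = 2.0 * src_lat[:, None] + 3.0 * src_lon[None, :]
dst_field = bilinear_regrid(src_field, src_lat, src_lon, dst_lat, dst_lon)
```

Bilinear interpolation reproduces a linear field exactly, which makes the synthetic field a convenient check. Real applications would more likely rely on an established regridding tool, and the 5 × 5 target grid here is only a stand-in for the 25 grid points mentioned in the text.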

Statistical indices
The ability of the different GCMs to reproduce the properties of the observed data at the 25 grid points of the study area was assessed using five statistical indices: NRMSE, Pbias, NSE, R², and VE. In the expressions used to describe the statistical metrics, $x_{pred,i}$ and $x_{obs,i}$ are the i-th GCM and observed values, respectively, and $n$ is the number of observations. Details about the statistical metrics used in this study are as follows.
The magnitude of the errors in predictions for various times is summarized by the NRMSE, making it a good measure of accuracy (Willmott, 1982). It is a normalized statistic determining the relative magnitude of the residual variance to the variance of the measured data. The closer the NRMSE value is to zero, the more accurate the model is (Chen and Liu, 2012; Johnston, 2004; Raju et al. 2017).
The tendency of the model data to under- or overestimate the observed data is measured by the Pbias. Model performance is better when the Pbias value is closer to zero. A negative Pbias value is an indication of overestimation, while a positive one indicates underestimation (Gupta et al. 1999).
The NSE of Nash and Sutcliffe (1970) is defined as one minus the sum of the squared differences between the predicted and the observed values, normalized by the variance of the observed values during the period under investigation. The NSE can take values between −∞ and 1.0, with 1.0 being the optimal value. NSE values between 0.0 and 1.0 are considered acceptable levels of performance, whereas values < 0.0 indicate unacceptable model performance, in which the mean observed value is a better predictor than the simulated value (Moriasi et al. 2007).
The R² can be defined as the square of Pearson's product moment correlation coefficient (i.e., R² = r²), describing the proportion of the total variance in the observed data that is explained by the model (Legates and McCabe Jr, 1999). R² values range between 0.0 and 1.0, with higher values indicating better agreement.
The VE measures the ratio between GCM and GPCC precipitation volumes over a period, where a VE value of 1 indicates a perfect estimation.
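The defining equations of the five indices are not reproduced in this text, so the following Python sketch implements them under commonly used conventions rather than the authors' exact formulas. The assumptions are: NRMSE is the RMSE divided by the standard deviation of the observations (other normalizers such as the observed range or mean are also in use), Pbias follows the sign convention stated above (positive for underestimation), and VE is taken as one minus the ratio of the summed absolute errors to the summed observations. All function names are illustrative.

```python
import numpy as np

def nrmse(pred, obs):
    """RMSE normalized by the standard deviation of the observations (assumed normalizer)."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.sqrt(np.mean((pred - obs) ** 2)) / np.std(obs)

def pbias(pred, obs):
    """Percentage bias: positive means underestimation, negative means overestimation."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return 100.0 * np.sum(obs - pred) / np.sum(obs)

def nse(pred, obs):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variability."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return 1.0 - np.sum((pred - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def r2(pred, obs):
    """Coefficient of determination as the squared Pearson correlation."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.corrcoef(pred, obs)[0, 1] ** 2

def ve(pred, obs):
    """Volumetric efficiency (assumed form): 1 minus summed absolute error over summed observations."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return 1.0 - np.sum(np.abs(pred - obs)) / np.sum(obs)

# Made-up values: a model that is uniformly 10% too wet.
obs_demo = np.array([10.0, 20.0, 30.0, 40.0])
pred_demo = 1.1 * obs_demo
scores_demo = {
    "NRMSE": nrmse(pred_demo, obs_demo),
    "Pbias": pbias(pred_demo, obs_demo),
    "NSE": nse(pred_demo, obs_demo),
    "R2": r2(pred_demo, obs_demo),
    "VE": ve(pred_demo, obs_demo),
}
```

In this made-up case R² stays at 1 because the model is perfectly correlated with the observations, while Pbias is negative and NSE and VE fall below 1. This is a small instance of the contradictions between indices that motivate an aggregate ranking.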

Compromise programming
CP (Zeleny, 1973) is an approach applicable to the measurement of the combined effect of several statistical indices. CP was employed in this study for ranking the GCMs based on the five statistical performance measures described above: NRMSE, Pbias, NSE, R², and VE. The values of the statistical indices were used to estimate the distance measure ($L_p$ metric) of CP. The $L_p$ metric is given as

$$L_p = \left[\sum_{j} \left| f_j^{*} - f_j \right|^{p}\right]^{1/p}$$

where $f_j$ is the value of the statistical performance measure $j$, $f_j^{*}$ is the ideal value of the statistical performance measure $j$, and $p$ is a parameter which is equal to 1. For a statistical performance measure, the ideal value is that corresponding to a perfect match between the relevant observations and the simulations produced by the model. The value of the $L_p$ metric is never negative, and the lower the value, the higher the performance of the model; therefore, the smallest values of $L_p$ are preferred.
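A minimal sketch of the CP ranking step is given below. It takes the ideal values to be those of a perfect match (0 for NRMSE and Pbias, 1 for NSE, R², and VE) and uses p = 1 as stated above. The model names and scores are made up for illustration, and the sketch applies no rescaling of the indices; whether the study rescaled them before aggregation is not stated here.

```python
# Ideal value of each index for a perfect match between model and observations.
IDEAL = {"NRMSE": 0.0, "Pbias": 0.0, "NSE": 1.0, "R2": 1.0, "VE": 1.0}

def lp_distance(scores, ideal=IDEAL, p=1):
    """Distance of one model's scores from the ideal point (L_p metric)."""
    total = sum(abs(ideal[name] - scores[name]) ** p for name in ideal)
    return total ** (1.0 / p)

def rank_models(table, ideal=IDEAL, p=1):
    """Return (model, L_p) pairs sorted from best (smallest L_p) to worst."""
    distances = {model: lp_distance(scores, ideal, p) for model, scores in table.items()}
    return sorted(distances.items(), key=lambda item: item[1])

# Hypothetical scores for two hypothetical models.
demo_table = {
    "model_A": {"NRMSE": 0.50, "Pbias": -4.0, "NSE": 0.70, "R2": 0.80, "VE": 0.75},
    "model_B": {"NRMSE": 0.60, "Pbias": 1.0, "NSE": 0.60, "R2": 0.85, "VE": 0.70},
}
demo_ranking = rank_models(demo_table)
```

Here model_A is better on NRMSE, NSE, and VE but still ranks second because of its larger bias. The example also shows that an index expressed in percent can dominate the sum when the indices are not brought to a common scale.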
In addition, the abilities of the GCMs to replicate the climate properties of the study area were also assessed using spatial comparisons, PDF, TD, and the plots of the annual and monthly averages of the observed and the GCMs.

Ranking of GCMs for precipitation
The performance metrics for precipitation calculated for all GCMs and the ranking of the GCMs derived using CP are shown in Table 2. Though the ideal values vary from GCM to GCM, the overall ranking indicates that a GCM that has the most ideal value for more metrics does not necessarily rank best. For example, ACCESS-ESM1-5 has two of its metrics as the ideal values but ranked fourth in the overall ranking of the GCMs. For precipitation, the four highest ranked GCMs are IPSL-CM6A-LR, NESM3, CMCC-CM2-SR5, and ACCESS-ESM1-5, in that order. The least performing GCM for precipitation is INM-CM4-8.

Ranking of GCMs for maximum temperature
The performance metrics for maximum temperature calculated for all GCMs and the ranking of the GCMs derived using CP are presented in Table 3. The highest ranking GCMs are INM-CM4-8, BCC-CSM2-MR, MRI-ESM2-0, and ACCESS-ESM1-5, in that order. The lowest ranking GCM for maximum temperature using the CP method was CMCC-CM2-SR5.

Ranking of GCMs for minimum temperature
The performance metrics for minimum temperature calculated for all GCMs and the ranking of the GCMs derived using CP are presented in Table 4. The highest ranking GCMs are AWI-CM-1-1-MR, IPSL-CM6A-LR, INM-CM5-0, and CanESM5, in that order. As for maximum temperature, the lowest ranking GCM for minimum temperature was CMCC-CM2-SR5.

Spatial comparison of GCMs with GPCC precipitation
The ability of the different GCMs to spatially replicate the average annual GPCC precipitation over the period 1984-2014 is presented in Fig. 2. The GCMs vary in their abilities to reproduce the GPCC precipitation. However, almost all GCMs show similarities in some regions, such as the lower precipitation in the northeastern arid region of the country. Four GCMs, namely, AWI-CM-1-1-MR, CanESM5, BCC-CSM2-MR, and CMCC-CM2-SR5, overestimated the precipitation in the southwest of the country. Precipitation was mostly underestimated by INM-CM4-8, INM-CM5-0, and MPI-ESM1-2-LR. The GCMs ranked high by the CP method were seen to have better abilities in spatially replicating the historical precipitation in Nigeria.

Spatial comparison of GCMs with CRU maximum temperature
The ability of the different GCMs to spatially replicate the CRU mean annual maximum temperature is presented in Fig. 3. MIROC6 showed a significant overestimation of the temperature at the northeast of the country. The spatial assessment result is in agreement with the result of the CP for maximum temperature which ranked NESM3, IPSL-CM6A-LR, and CMCC-CM2-SR5 as the lowest. These GCMs underestimated the maximum temperature in the study area relative to the CRU. The highest ranking GCMs INM-CM4-8, BCC-CSM2-MR, MRI-ESM2-0, and ACCESS-ESM1-5 showed close spatial relationship with the CRU maximum temperature.

Spatial comparison of GCMs with CRU minimum temperature
The ability of the different GCMs to spatially replicate the CRU mean annual minimum temperature is presented in Fig. 4. The figure shows that all the GCMs overestimated the minimum temperature relative to the CRU minimum temperature, particularly MRI-ESM2-0, NESM3, and CMCC-CM2-SR5, which are the lowest ranking ones. The highest ranking GCMs from the CP showed better skill in replicating the CRU minimum temperature over the country.

Comparison using probability density function
The PDFs of the mean monthly precipitation, maximum temperature, and minimum temperature of the GCMs compared with the observed GPCC and CRU data for the period 1984-2014 in Nigeria are presented in Fig. 5a, b, and c, respectively. The figures show that most of the GCMs were able to replicate the precipitation properties of the GPCC, especially in the tails. However, the distribution of precipitation relative to the GPCC varies more in the GCMs INM-CM4-8, INM-CM5-0, and MPI-ESM1-2-LR, which had the lowest rankings in the CP and spatially underestimated the precipitation. For maximum temperature, the underestimations by CMCC-CM2-SR5, IPSL-CM6A-LR, and NESM3 are visible in the PDF plots, as are the overestimations by MIROC6 and AWI-CM-1-1-MR. The PDFs show that most of the maximum temperature distributions are centered around 30 °C. For minimum temperature, the PDFs of the GCMs

Performance assessment using Taylor diagram
TD (Taylor, 2001) gives a good statistical summary of the correlation, standard deviation, and root mean square (RMS) difference between the modelled and the observed data. Figure 6a, b, and c show the performance of the precipitation, maximum temperature, and minimum temperature GCMs, respectively, relative to their observed data using TD. Though the standard deviations are all different, the correlation between the modelled and observed data ranges from 0.8 to 0.91. The highest ranking GCMs were seen to have higher correlations.
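The three statistics summarized on a Taylor diagram can be computed as in the sketch below, which is illustrative and not taken from the study. The function name `taylor_stats` is an assumption, population standard deviations are used, and the RMS difference is the centered one (means removed), which is the quantity that satisfies the geometric relation underlying the diagram.

```python
import numpy as np

def taylor_stats(model, obs):
    """Return correlation, model std, observed std, and centered RMS difference."""
    model, obs = np.asarray(model, float), np.asarray(obs, float)
    sd_model = model.std()  # population standard deviation (ddof=0)
    sd_obs = obs.std()
    corr = np.corrcoef(model, obs)[0, 1]
    # Centered RMS difference: remove each series' mean before differencing.
    anomalies_diff = (model - model.mean()) - (obs - obs.mean())
    crmsd = np.sqrt(np.mean(anomalies_diff ** 2))
    return corr, sd_model, sd_obs, crmsd
```

These quantities obey crmsd² = sd_model² + sd_obs² − 2 · sd_model · sd_obs · corr, the law-of-cosines relation that allows all three to be shown as a single point on the diagram.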

Comparison of the mean monthly precipitation and temperature of GCMs with the observed
The mean monthly precipitation, maximum temperature, and minimum temperature for the GCMs compared to the observed data for 1984-2014 are presented in Fig. 7a, b, and c, respectively. For precipitation, it can be seen that most of the models perform well during the dry season from November to March, when there is little or no variability in precipitation. There is high variability in the precipitation estimated by the GCMs relative to the GPCC during the rainy season. For minimum temperature, significant overestimation was observed for NESM3 and CMCC-CM2-SR5. Other GCMs also overestimated minimum temperatures between April and October. A few GCMs also underestimated the minimum temperature between October and April. The highest ranking GCMs, AWI-CM-1-1-MR, IPSL-CM6A-LR, INM-CM5-0, and CanESM5, are closer to the CRU.
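The mean monthly comparison can be sketched as follows, assuming each dataset is a monthly series that starts in January and covers whole years (as 1984-2014 does). The function names are illustrative and not from the study.

```python
import numpy as np

def monthly_climatology(series):
    """Mean of each calendar month for a monthly series starting in January."""
    series = np.asarray(series, dtype=float)
    if series.size % 12 != 0:
        raise ValueError("series must cover whole years (length divisible by 12)")
    # Rows are years, columns are calendar months January..December.
    return series.reshape(-1, 12).mean(axis=0)

def monthly_bias(model, obs):
    """Model minus observed climatology for each calendar month."""
    return monthly_climatology(model) - monthly_climatology(obs)
```

A positive entry in the result of `monthly_bias` indicates overestimation by the model in that calendar month, and a negative entry indicates underestimation.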

Selection of multi-model ensemble for climate change projection
The scores and final rankings of the different GCMs in replicating the observed precipitation and maximum and minimum temperature derived using CP are shown in Table 5.

Discussion
The trends of increasing temperature and changes in precipitation are expected to intensify under the different climate change scenarios of the previous assessment reports (AR) (Homsi et al. 2020; Onyutha et al. 2016; Sa'adi et al. 2019). The expected changes are also projected to increase the incidence of disasters and their risks. While understanding these expected changes through climate projection is imperative for the development of adaptation and mitigation measures against climate change, the choice of GCM or the generation of an MME is even more crucial for a region. This is due to the different uncertainties associated with GCMs, which may result from GCM initialization and parameterization (Tebaldi et al. 2005) and from future GHG emission and aerosol content scenarios (Hawkins and Sutton, 2009), giving them characteristically varying spatial performance across the globe. Though the newly released CMIP6 has been used for the projection of climate over the globe, including Africa (Almazroui et al. 2020a; Zhai et al. 2020), no study had specifically applied it in Nigeria at the time of this study. The projection of climate using CMIP6 is therefore crucial for the country, where the majority of the rural populace depends on rain-fed agriculture, which constitutes 20% of the gross domestic product (GDP). In addition, the country has one of the highest rates of population growth, which will require more water resources, a resource grossly affected by climate. Water resources in the country have been reported to be declining (MacDonald et al. 2005), and a recent study projected that groundwater levels will drop by as much as 12 m in some parts of the country in the future (Shiru et al. 2020a). This emphasizes the need to conduct climate projection using reliable GCMs.
Among the studies that have evaluated CMIP6 performance on the African continent, Babaousmail et al. (2021) considered fifteen precipitation GCMs and used a combination of statistical metrics, the empirical cumulative distribution function, the Taylor skill score, and the Taylor diagram. They found that the CMIP6 GCMs were able to satisfactorily reproduce the mean annual climatology of dry/wet months in the northern part of the continent. The study found EC-Earth3-Veg, UKESM1-0-LL, GFDL-CM4, NorESM2-LM, IPSL-CM6A-LR, and GFDL-ESM4 to be the best performing models. Similarly, the present study found IPSL-CM6A-LR to perform well for Nigeria. That study also found that the GCMs mostly underestimated precipitation during the wet seasons, a finding similar to ours.
In East Africa, covering Kenya, Rwanda, Tanzania, Uganda, and Burundi, thirteen CMIP6 temperature GCMs were used in evaluating mean surface temperature (Ayugi et al. 2021b) by employing statistical metrics, mean state, and trends. The study found that most of the GCMs overestimated the mean annual temperature, with few underestimating it. FGOALS-g3, HadGEM-GC31-LL, MPI-ESM2-LR, CNRM-CM6-1, and IPSL-CM6A-LR were the models found to be the best performing in the study. Since that study and ours mostly considered different GCMs and different temperature parameters (mean temperature in the former and maximum and minimum temperature in the latter), no conclusion can be drawn from the two studies as to which models perform well in general. Some common models, such as IPSL-CM6A-LR and MPI-ESM2-LR, performed well for mean temperature in Ayugi et al. (2021b) and for minimum temperature in the present study. However, IPSL-CM6A-LR was the second lowest ranking model for maximum temperature in this study. It is therefore not known whether similar rankings would be obtained if the same temperature variables were considered. Differences in the study region and in the methods applied may also contribute to the differences in results.
Fig. 5 PDFs comparing mean monthly observed a GPCC precipitation, b CRU maximum temperature, and c CRU minimum temperature to the GCMs
In evaluating the capability of twenty-one CMIP6 GCMs to replicate daily precipitation during the West African monsoon (WAM) period (June to September) over West Africa, using three observational datasets, the Global Precipitation Climatology Project (GPCP), the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), and the Tropical Applications of Meteorology using SATellite data (TAMSAT), for validation of the model simulations, Klutse et al. (2021) found substantial discrepancies among the models and in comparison with the observations. It was concluded that no single model exhibited all the features of the observational datasets. Our findings are similar in that no GCM was able to completely replicate both the spatial pattern and the annual cycle of monthly precipitation or temperature. This justifies the need to evaluate models in order to select the most realistic ones, or to aggregate them into ensembles, for the projection of climate, so as to reduce the individual uncertainties associated with different GCMs. However, such a selection does not necessarily rank a GCM similarly for different variables. For example, CMCC-CM2-SR5, which ranked high for precipitation, had the lowest performance for the maximum and the minimum temperature. Similarly, INM-CM4-8, which ranked high for temperature, showed a lower ranking for precipitation. This can be attributed to several factors, including the quality of the observed data (Miao et al. 2012): the observed climatological records may contain inhomogeneities resulting from factors such as changes in measurement practices, relocation of stations, and changes in a station's surroundings over the years (Ducré-Robitaille et al. 2003). The climatic condition and terrain of an area can also be a factor.
Fig. 7 Comparison of mean monthly a GCM precipitation and GPCC, b GCM maximum temperature and CRU maximum temperature, and c GCM minimum temperature and CRU minimum temperature
A study conducted over the Tibetan Plateau (Lun et al. 2020) revealed significant differences in the simulation of some GCMs for different climatic variables. For example, CanESM5 from CMIP6 simulated precipitation (rank score (RS), 6.09) better than the air temperature (RS, 3.11), while for FIO-ESM of the CMIP5, there was significantly better performance of the air temperature (RS, 7.01) than the precipitation (RS, 1.59).
CP is one of the methods of selecting GCMs that has been found to be efficient in a number of studies. Raju et al. (2017) applied the method for the ranking of 36 maximum and minimum temperature GCMs of CMIP5 over India using three performance indicators, namely, Pearson correlation coefficient (CC), NRMSE, and skill score (SS). The study found that CP was efficient in aggregating ensembles of different combinations of GCMs based on their different weights over 40 grid points of India. The CP method was applied by Sa'adi et al. (2019) for the aggregation of scores obtained by 20 temperature GCMs of the CMIP5 over Borneo Island in South East Asia. The results of the study showed four GCMs, FIO-ESM, MRI-CGCM3, GFDL-CM3, and IPSL-CM5A-MR, to be the most suitable ones for temperature projection over the region. Similarly, Ahmed et al. (2019a, b) used the method to assess the performance of 20 precipitation and temperature GCMs of the CMIP5 over Pakistan. The study considered NRMSE, SS, and CC. The method identified CESM1-CAM5, HadGEM2-AO, NorESM1-M, and GFDL-CM3 as the best performing GCMs for the area.
Table 5 Final ranking of GCMs using CP method (bold GCMs are the most suitable for the study area while GCMs in italic are the least suitable)
This study demonstrates the efficiency of the CP method in the aggregation of ensemble GCMs based on different statistical indicators, which are sometimes contradictory. For example, for precipitation, IPSL-CM6A-LR, which has the best statistical score under NRMSE and NSE, ranked second for Pbias and fourth for R². This suggests that the application of a robust approach such as CP can place all GCMs on a uniform assessment criterion. In addition, other applications using spatial performances, PDF curves, TD, and mean monthly precipitation to compare the GCMs with the