Evaluation of events of extreme temperature change between neighboring days in CMIP6 models over China

In the context of global warming, the frequency and intensity of extreme weather and climate events are increasing. However, the change most directly felt by people is the day-to-day temperature change. Extreme temperature changes between neighboring days (ETCNs) carry substantial disease risks and socioeconomic impacts. Evaluations of ETCN events in global climate models (GCMs) remain scarce for China. This study quantitatively evaluates the performance of 35 GCMs and the multi-model ensemble (MME) of the Coupled Model Intercomparison Project Phase 6 (CMIP6) in simulating extreme cooling (EC) and extreme warming (EW) events between two consecutive days, as defined by relative thresholds. The results show that from 1981 to 2013, the annual average frequency of EW events increased over China whereas that of EC events decreased, and EW events were more frequent than EC events. EW events occurred mostly in spring, while EC events occurred mostly in autumn. The performance of the CMIP6 models differed considerably between EC and EW events. The models captured the annual cycle of both EC and EW events well, and the simulations of EW events were generally more reliable than those of EC events. Most CMIP6 models overestimated the frequency of EW events but underestimated the frequency of EC events in China. The CMIP6 models captured the trends in EC events in China but failed to simulate the trends in EW events. The interannual variability of EW events was simulated relatively better than that of EC events. The CMIP6 MME effectively improved the simulation of the climatology of ETCN events, whereas individual CMIP6 models performed better than the MME in terms of trend and interannual variability. Finally, according to the overall ranking of the CMIP6 models, MPI-ESM-1-2-HAM and FGOALS-f3-L achieved the best performance in simulating EW and EC events, respectively. This study selected the optimal models in different regions at the seasonal and annual scales, providing support for the frequency projection and modeling improvement of ETCN events.


Introduction
Global climate change is a great challenge for humankind. According to the fifth assessment report of the IPCC, from 1880 to 2012, the global average surface temperature showed an upward linear trend, increasing by 0.85 °C (IPCC, 2013). Compared with the mean climate state, extreme weather and climate events are more sensitive to global warming (Aguilar et al. 2009;Ha and Yun 2012;Schoetter et al. 2015;Lewis et al. 2017), and their frequency and intensity are also increasing (Sui et al. 2018), exerting profound impacts on ecosystems and human society (Easterling et al. 2000). The day-to-day temperature changes and fluctuations induced by climate change are directly felt by people (Li and Yan 2010). An extreme temperature change between neighboring days (ETCN) refers to a dramatic change between the daily temperature of a given day and the temperature of the previous day. Recent studies found that ETCN changes not only have huge effects on ecosystems but also increase the risk of disease and death. These impacts indicate that in addition to the commonly used temperature indicators (such as the average temperature, maximum temperature, and minimum temperature), ETCN is an important meteorological index that reflects the impacts of meteorological environmental changes and human health-related changes (Goldberg et al. 2011;Zhan et al. 2017).
In recent years, the frequency, intensity, and duration of ETCNs have increased, raising the risk of disease and death (Lin et al. 2013; Cheng et al. 2014). For example, Guo et al. (2011) showed that when the 24-h temperature decreased by 3 °C in Brisbane, the nonaccidental death rate increased by 15.7%, the cardiovascular death rate increased by 35.3%, and people under the age of 65 were most easily affected. Kang et al. (2021) also found that a 1 °C increase in temperature variability was associated with a 6% increase in cardiovascular disease in China. Other studies have found that changes in ETCNs are related to infectious diseases, such as hand, foot, and mouth disease (Cheng et al. 2016), pneumonia, and respiratory tract infections (Liu et al. 2015). ETCNs also affect the growth and reproduction of animals and plants. In Australia, studies on lizards have shown that sharp temperature fluctuations can change the shape of lizards and have a significant effect on embryo formation (Shine and Elphick 2001). China has a large population and a complex and diverse climate and is especially vulnerable to extreme weather events, which cause serious economic losses and casualties. Given these characteristics, the simulation of ETCNs is essential for reducing model uncertainties and developing adaptation strategies to reduce the risk of diseases, especially in different regions and seasons in China (Ren et al. 2011).
Global climate models are important tools used for climate simulations and projections of future climate change (Yang et al. 2018). The Coupled Model Intercomparison Project (CMIP) launched by the World Climate Research Program (WCRP) has evolved through CMIP1 (1995), CMIP2 (1997), CMIP3 (2004), and CMIP5 (2013) since its inception. Today, the CMIP has entered the CMIP6 phase, with CMIP6 providing data on the current simulations and future projections of approximately 112 climate models from nearly 33 institutions around the world (Zhou et al. 2019). Compared with the previous-generation CMIP5 models, CMIP6 models are significantly improved in spatial resolution, physical processes, the coupled carbon cycle, and so on (Eyring et al. 2016). The ability of a model to simulate the current climate has a direct impact on the accuracy of its future projections. Therefore, it is necessary to test the simulation ability of models and to objectively evaluate the accuracy and uncertainty of the simulation results (Grose et al. 2020). Many scholars have evaluated the ability of CMIP6 models to simulate air temperature (Luo et al. 2020), extreme precipitation (Akinsanola et al. 2020), terrestrial evapotranspiration, sea ice (Notz and Community 2020), the Hadley circulation (Grise and Davis 2020), soil moisture (Yuan et al. 2021), wind speed (Krishnan and Bhaskaran 2020), snow depth, the Indian Ocean dipole (McKenna et al. 2020), the East Asian summer monsoon index (Xin et al. 2020), and so on. Regarding ETCNs, Peng et al. (2019) used daily temperature data from 804 meteorological stations across China to calculate the frequency of consecutive days with temperature differences greater than 1 °C. However, the temperature change threshold set by the authors was too small to reflect extreme temperature changes between neighboring days. Zhou et al. (2020) studied the spatiotemporal pattern of global day-to-day temperature changes using station-collected data and 27 CMIP6 models. The results showed that events in which the daily average temperature decreased by more than 10 °C became less frequent. However, the authors only studied the trends of EC events and did not consider changes in EW events or seasonal changes. At the beginning of the twenty-first century, the World Meteorological Organization (WMO) and WCRP jointly established the Expert Team on Climate Change Detection and Indexes (ETCCDI). Based on daily temperature and precipitation data, the ETCCDI defined 27 representative extreme climate indexes for the study of global and regional extreme climate change (Kim et al. 2020a, b; Akinsanola et al. 2020; Ge et al. 2021; Fan et al. 2020); the ETCN index was not among them.
Therefore, research on the evaluation of different meteorological elements or variables in China using CMIP6 models has been extensive, but to our knowledge, few relevant evaluative studies of ETCN events using CMIP6 models have been conducted in different seasons over China. To what extent do CMIP6 models capture the annual cycle, spatial pattern, trend change, and interannual variability of ETCN events? Which models should be selected as the optimal models based on a set of metrics in different zones in China? Is the performance of multi-model ensemble better than that of individual models? Motivated by these questions, the purpose of this study is to quantitatively evaluate the performance of individual CMIP6 models and the multi-model ensemble in simulating ETCN events, which are defined by relative thresholds and divided into EC events and EW events in eight subregions in China. This study provides a reliable scientific theory for the improvement of climate models and the projection of future EW and EC events in China.

Data
As an observational reference, we used the daily maximum temperature (Tmax) and daily minimum temperature (Tmin) data provided by the National Climate Center of the China Meteorological Administration. This dataset has been widely used in research on climate change in China. The research period of this study is from January 1, 1981, to February 28, 2014. To better evaluate the ability of the models to simulate the ETCN event frequencies in different regions of China, China was divided into eight regions in this paper. These land subregions were adapted from Wu et al. (2020), as shown in Fig. 1.
GCM daily Tmax and Tmin data were downloaded from the CMIP6 dataset (https://esgf-node.llnl.gov/search/cmip6/). The dataset covers the same period as the observations. We selected the "historical" Experiment ID and the "r1i1p1f1" Variant Label, for a total of 35 models. Table 1 provides an overview of the institutions, model names, and resolution information.

ETCN event definitions
A day-to-day temperature change is estimated as the difference between the daily temperatures on two neighboring days (Gough, 2008; Zhan et al. 2017; Lei et al. 2020):
$$TCN_i = T_i - T_{i-1}, \quad i = 1, 2, \ldots, n$$
where $TCN_i$ denotes the change in the daily Tmax or Tmin for day $i$, and $T_i$ ($T_{i-1}$) denotes the Tmax or Tmin for day $i$ (the previous day, $i-1$). The term $n$ is the total number of days in the four seasons, which are spring (MAM), summer (JJA), autumn (SON), and winter (DJF). $TCN < 0$ indicates a cooling event, while $TCN > 0$ indicates a warming event.
Slight warming or cooling events have little impact on human health. Thus, this study only considers sudden strong warming or cooling events. The TCN values are first sorted from highest to lowest. Following Shi et al. (2018) and Tan et al. (2021), the 10th and 90th percentiles are then taken as the thresholds for ETCN events. ETCN events are divided into extreme cooling (EC) and extreme warming (EW) events: an EC event is a decrease in Tmin (TCN < 0) below the 10th percentile, while an EW event is an increase in Tmax (TCN > 0) above the 90th percentile (Song and Yan 2021; Cai et al. 2020; Shi et al. 2018).
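The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the percentile uses linear interpolation, and the thresholds are computed over the whole input series, since the text does not state the base period or the interpolation rule.

```python
def day_to_day_change(temps):
    """TCN_i = T_i - T_(i-1) for a list of consecutive daily temperatures."""
    return [temps[i] - temps[i - 1] for i in range(1, len(temps))]


def percentile(values, q):
    """Percentile (q in 0..100) of a non-empty list, linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def count_extreme_events(tmax, tmin):
    """Return (EW days, EC days) for one station.

    EW: Tmax rises (TCN > 0) by more than the 90th percentile of its changes.
    EC: Tmin falls (TCN < 0) by more than the 10th percentile of its changes.
    Assumption: thresholds are taken from the same series that is counted.
    """
    tcn_max = day_to_day_change(tmax)
    tcn_min = day_to_day_change(tmin)
    ew_threshold = percentile(tcn_max, 90)
    ec_threshold = percentile(tcn_min, 10)
    ew = sum(1 for c in tcn_max if c > 0 and c > ew_threshold)
    ec = sum(1 for c in tcn_min if c < 0 and c < ec_threshold)
    return ew, ec
```

In practice the counts would be accumulated per season and per year to give the event frequencies analyzed later.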

Bilinear interpolation
Because the CMIP6 models have different resolutions, bilinear interpolation is used in this paper to interpolate the data of each model to the meteorological stations. The model simulation results are then compared with the observations to evaluate the ability of each climate model to simulate ETCNs in China.
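As a rough illustration of grid-to-station bilinear interpolation, the sketch below weights the four grid points surrounding a station by their fractional distances. It assumes a regular latitude-longitude grid with ascending coordinates and a station that lies inside the grid; the function name and argument layout are hypothetical and not taken from the study.

```python
def bilinear_to_station(lons, lats, grid, lon, lat):
    """Interpolate grid[j][i] (lat index j, lon index i) to a station.

    lons and lats are ascending 1-D lists of grid-point coordinates.
    Raises ValueError if the station lies west or south of the grid.
    """
    # Index of the grid point at or just below the station coordinate.
    i = max(k for k in range(len(lons) - 1) if lons[k] <= lon)
    j = max(k for k in range(len(lats) - 1) if lats[k] <= lat)
    # Fractional position of the station inside the enclosing cell.
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    # Interpolate along longitude on both latitude rows, then along latitude.
    south = grid[j][i] * (1 - tx) + grid[j][i + 1] * tx
    north = grid[j + 1][i] * (1 - tx) + grid[j + 1][i + 1] * tx
    return south * (1 - ty) + north * ty
```

Interpolating every model to the same station locations lets the error statistics be computed point by point against the observations.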

Linear trend analysis
The long-term trend of the frequencies of ETCN events was analyzed using the linear tendency estimation method (Tokarska et al. 2020). A simple linear regression was performed between the temperature variable ($y$) and the corresponding time ($x$):
$$y = ax + b$$
where $a$ is the linear regression coefficient representing the rate of change in ETCN events and $b$ is the intercept. A positive or negative value of $a$ indicates an increasing or decreasing trend of ETCN events, respectively.
The trend results were tested for significance using the t test at the 95% confidence level.
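A minimal sketch of the trend estimate and its t statistic is given below, assuming an ordinary least-squares fit of annual frequency against year. The function name is illustrative. Significance at the 95% level would be judged by comparing the absolute t statistic with the two-sided Student t critical value for n − 2 degrees of freedom (roughly 2.04 for a 33-year series).

```python
import math


def linear_trend(years, freq):
    """Return (slope a, intercept b, t statistic of the slope)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(freq) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, freq))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Residual sum of squares; the variance estimate uses n - 2 degrees of freedom.
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(years, freq))
    se_a = math.sqrt(sse / (n - 2) / sxx)
    # A perfect fit has zero standard error; report an infinite t statistic.
    t_stat = a / se_a if se_a > 0 else math.copysign(math.inf, a)
    return a, b, t_stat
```

Multiplying the slope by 10 converts it from days per year to the days/10a unit used in the results.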

Evaluation indexes of the performances of CMIP6 models
To assess the performance of the simulations against the observations, two statistical indexes were considered: the root mean square error (RMSE) and the bias (BIAS), which measure the overall accuracy of the simulations. These two indexes were calculated as follows:
$$BIAS = \frac{1}{n}\sum_{i=1}^{n}\left(M_i - O_i\right)$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(M_i - O_i\right)^2}$$
where $O_i$ and $M_i$ are the observed and simulated ETCN event frequencies and $n$ is the number of samples; a good model will have BIAS and RMSE values close to 0.
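The two accuracy measures can be sketched as follows, assuming the usual definitions: BIAS as the mean of the simulated-minus-observed differences, and RMSE as the square root of the mean squared difference. The function names are illustrative.

```python
import math


def bias(sim, obs):
    """Mean simulated-minus-observed difference; positive means overestimation."""
    return sum(m - o for m, o in zip(sim, obs)) / len(obs)


def rmse(sim, obs):
    """Root mean square error between simulated and observed values."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(sim, obs)) / len(obs))
```

Because positive and negative errors cancel in BIAS but not in RMSE, a model can have a near-zero bias and still a large RMSE, which is why the two are reported together.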
We selected the Taylor skill score (TS) to assess the spatial pattern of the models. The TS follows the definition of Wang et al. (2018) and depends on $R$, the spatial correlation coefficient between the observed and simulated frequencies of ETCN events; $R_0$, the maximum attainable correlation coefficient; and $\sigma_M$ and $\sigma_O$, the standard deviations of the simulations and observations, respectively. A score closer to 1 indicates better consistency between the observations and simulations, while a score of 0 indicates no match at all. The TS therefore reflects both the correlation coefficient and the variance, according to Guo et al. (2021).
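For illustration, the sketch below implements one widely used form of the Taylor skill score, TS = 4(1 + R)^2 / [(σ_M/σ_O + σ_O/σ_M)^2 (1 + R_0)^2]. Whether this exact variant (rather than, for example, one with a fourth-power correlation term) is the one used in the study is an assumption, as are the function name and the default R_0 = 1.

```python
import math


def taylor_skill(sim, obs, r0=1.0):
    """Taylor skill score of a simulated spatial field against observations.

    Assumed form: 4 (1 + R)^2 / ((s_m/s_o + s_o/s_m)^2 (1 + R0)^2).
    Both fields must vary in space (non-zero standard deviation).
    """
    n = len(obs)
    mean_m = sum(sim) / n
    mean_o = sum(obs) / n
    sd_m = math.sqrt(sum((m - mean_m) ** 2 for m in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(sim, obs)) / n
    r = cov / (sd_m * sd_o)
    ratio = sd_m / sd_o
    return 4 * (1 + r) ** 2 / ((ratio + 1 / ratio) ** 2 * (1 + r0) ** 2)
```

With this form, a field identical to the observations scores 1 and a perfectly anti-correlated field scores 0, matching the interpretation given in the text.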
The annual cycle is also a crucial evaluation index for model performance, as it captures the seasonal evolution of ETCN events. The annual cycle index (ACI) reflects the amplitude of seasonal variation and is calculated as
$$ACI = \frac{M_{max} - M_{min}}{O_{max} - O_{min}}$$
where $M_{max}$ and $M_{min}$ are the maximum and minimum seasonal frequencies (spring, summer, autumn, and winter) simulated by the CMIP6 model, and $O_{max}$ and $O_{min}$ are the corresponding observed values. Better performance is achieved when ACI is closer to 1.
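A minimal sketch of the index, assuming ACI is the ratio of the simulated to the observed seasonal amplitude (maximum minus minimum over the four seasons); the function name and the sample values are hypothetical.

```python
def annual_cycle_index(sim_seasonal, obs_seasonal):
    """Ratio of simulated to observed seasonal amplitude (max minus min).

    Each argument holds the four seasonal frequencies; the observed values
    must not all be equal, otherwise the amplitude is zero.
    """
    sim_amplitude = max(sim_seasonal) - min(sim_seasonal)
    obs_amplitude = max(obs_seasonal) - min(obs_seasonal)
    return sim_amplitude / obs_amplitude


# Hypothetical seasonal frequencies (days) for spring, summer, autumn, winter.
example_aci = annual_cycle_index([6.0, 5.0, 5.2, 4.0], [5.0, 4.8, 4.7, 4.5])
```

Values above 1 mean the model exaggerates the seasonal contrast, and values below 1 mean it damps it.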
To analyze the abilities of the models with respect to temporal variations, we examined how well the models reproduce the temporal standard deviation, using the interannual variability skill (IVS) (Zhu et al. 2020; Kim et al. 2020a, b):
$$IVS = \left(\frac{STD_m}{STD_o} - \frac{STD_o}{STD_m}\right)^2$$
where $STD_m$ and $STD_o$ are the interannual standard deviations of the simulated and observed data, respectively. Smaller IVS values indicate better agreement between the CMIP6 models and the reference datasets.
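The skill can be sketched as below, assuming the commonly used form IVS = (STD_m/STD_o − STD_o/STD_m)^2. The function name is illustrative, and the choice of population rather than sample standard deviation does not affect the ratio.

```python
import statistics


def interannual_variability_skill(sim_annual, obs_annual):
    """IVS from annual series; 0 means identical interannual spread."""
    std_m = statistics.pstdev(sim_annual)
    std_o = statistics.pstdev(obs_annual)
    return (std_m / std_o - std_o / std_m) ** 2
```

The expression is symmetric in the two ratios, so a model with twice the observed variability is penalized as much as one with half of it.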

Multi-model ensemble mean
To reduce the systematic errors of individual models, we use the multi-model ensemble mean (MME) of the 35 GCMs (Mudryk et al. 2020; Bai et al. 2021), calculated as follows:
$$MME(t) = \frac{1}{N}\sum_{n=1}^{N} P_n(t)$$
where $MME(t)$ is the ensemble mean at time $t$, $N$ is the total number of GCMs, and $P_n(t)$ is the frequency of the $n$th GCM at time $t$.
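An equal-weight ensemble mean of this kind can be sketched as follows; the function name, the dictionary layout, and the model labels in the example are illustrative assumptions.

```python
def multi_model_mean(series_by_model):
    """Equal-weight ensemble mean at each time step.

    series_by_model maps a model name to its list of frequencies;
    all lists must cover the same time steps in the same order.
    """
    n_models = len(series_by_model)
    return [sum(values) / n_models for values in zip(*series_by_model.values())]


# Two hypothetical models with two time steps each.
example_mme = multi_model_mean({"model_a": [1.0, 2.0], "model_b": [3.0, 6.0]})
```

Averaging at each time step tends to cancel opposite-signed model errors, but it also smooths year-to-year variability.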

Comprehensive rating index
According to these multiple evaluation indexes, the rank of each model can be obtained. The comprehensive rating index (CRI) was introduced to calculate the comprehensive rank of each model (Rivera and Arnould, 2020; Zhang et al. 2018):
$$CRI = 1 - \frac{1}{m \times n}\sum_{i=1}^{n} rank_i$$
In the above formula, $m$ is the number of models (here, we used 35); $n$ is the number of indicators used for the evaluation; and $rank_i$ is the rank of the model according to its simulation ability based on evaluation index $i$. The closer $rank_i$ is to 1, the closer CRI is to 1, indicating a better simulation effect of the model.
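A minimal sketch of the rating, assuming the commonly used form CRI = 1 − (sum of ranks) / (m × n), with rank 1 denoting the best model for each index; the function name and the three-model example are hypothetical.

```python
def comprehensive_rating_index(ranks_by_model):
    """CRI per model from its ranks (1 = best) on each evaluation index.

    ranks_by_model maps a model name to a list of n ranks, one per index;
    m is the number of models being ranked.
    """
    m = len(ranks_by_model)
    cri = {}
    for model, ranks in ranks_by_model.items():
        n = len(ranks)
        cri[model] = 1 - sum(ranks) / (n * m)
    return cri


# Three hypothetical models ranked on two indexes.
example_cri = comprehensive_rating_index(
    {"model_a": [1, 1], "model_b": [2, 3], "model_c": [3, 2]}
)
```

Under this form, a model ranked first on every index scores 1 − 1/m, so the upper bound approaches 1 as the number of models grows.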

Annual cycle
The observation results show that from 1981 to 2013, the annual average frequency of EW events in China was higher than that of EC events; these frequencies were 19.042 days and 17.980 days, respectively. In terms of different seasons, the frequencies of EW events in spring, summer, and winter are higher than those of EC events, and the frequency of EW events in autumn is slightly lower than that of EC events. EW events occur most often in spring (5.0 days) and least often in winter (4.584 days). In contrast, EC events occur most often in autumn (4.751 days) and least often in spring (4.323 days). Therefore, ETCN events are most likely to occur in the spring and autumn seasons. Additionally, obvious regional differences exist in the spatial distributions of EW and EC events. Most EW events occur in the SC region, with an average annual frequency of about 19.344 days (Figs. 2 and 3). EW events mainly occur in the WNW and SC regions in spring and autumn, respectively, and are more frequent in the NEC region in summer and winter. Notably, the TP region has the fewest EW events in all seasons. The frequency of EC events is lowest in the WNW region, especially in spring. Generally, EC events are more frequent in the NEC region in spring, autumn, and winter, and in the TP region in summer.
The annual cycle of the frequencies of EW and EC events in China is simulated by 35 CMIP6 models. The results show that the CMIP6 models capture the annual cycle features well (except for the SC region). The ACI values of EW events are generally closer to 1 than those of EC events in China; thus, CMIP6 models reproduce the annual cycle of EW events more robustly than that of EC events. However, CMIP6 models generally overestimate the magnitude of the annual cycle, especially in the TP region. The ACI values range from 1.221 (GISS-E2-1-G) to 3.783 (INM-CM4-8) for EW events and from 1.306 (CMCC-ESM2) to 3.484 (MPI-ESM-1-2-HAM) for EC events.
The CMIP6 MME shows limited ability to improve the simulated annual cycle in China, and the improvement varies by region. For example, the CMIP6 MME clearly improves the simulation of EW events in the NEC, WNW, and NC regions, with ACI values of 1.020, 1.020, and 1.016, respectively. For EC events, the CMIP6 MME better reproduces the annual cycle in the NEC region, whereas individual CMIP6 models perform better in the other regions.

Climatology
Figures 4 and 5 show the difference in the spatial distributions between the EW and EC events simulated by CMIP6 models and those observed in China. Generally, CMIP6 models adequately capture the spatial patterns of EW and EC events in China, but some bias still exists, as shown in Figs. 6 and 7. Most CMIP6 models overestimate the frequency of EW events in China; the annual average bias ranges from approximately 0.279 days (GISS-E2-1-G) to 1.452 days (EC-Earth3-CC), and the RMSE values range from 2.173 days (TaiESM1) to 3.853 days (NESM3). However, most models underestimate the frequency of EW events in the TP region in autumn, with biases of approximately −0.589 days (INM-CM5-0) to 0.075 days (KIOST-ESM) and RMSEs of 0.970 days (TaiESM1) to 1.572 days (NorCPM1). Generally, the SWC region simulations are the closest to the observations. In addition, the annual average frequency of EC events in China is underestimated as a whole, which is contrary to the conclusion obtained for EW events; the most negative bias is approximately −2.517 days (NorESM2-LM). However, the CMIP6 model simulations vastly overestimate autumn and winter EC events in China, especially in the TP and WNW regions, with more than 70% of the models displaying positive biases. Interestingly, unlike most CMIP6 models, the INM-CM4-8 and INM-CM5-0 models overestimate the frequency of EC events in China and its subregions, especially in autumn and winter.
The TS scores of EW events are generally higher than those of EC events over China (Fig. 8). The TS scores of most CMIP6 models simulating EW events are above 0.4. Furthermore, models from the same institution generally exhibit consistent performance. For example, the simulation abilities of the EC-Earth-Consortium models (EC-Earth3, EC-Earth3-Veg, EC-Earth3-Veg-LR) are relatively good, especially that of EC-Earth3-AerChem, whose TS score is 0.621 over China. Models with high resolutions thus tend to produce better simulations of EW events than models with low resolutions, which is consistent with a previous study (Sillmann et al., 2013). From the perspective of different seasons, the TS scores for simulations of EW events in spring are high, at 0.305 (NESM3) to 0.643 (TaiESM1), followed by those of EW events in autumn, at 0.187 (GISS-E2-1-G) to 0.542 (EC-Earth3-AerChem). However, the TS scores for the simulations of EW events in summer are generally low, ranging from 0.173 (IPSL-CM6A-LR) to 0.477 (GFDL-ESM4). Regarding the different regions, the TS scores obtained for the simulations of EW events in the WNW region are generally higher, with an annual average ranging from 0.369 (AWI-CM-1-1-MR) to 0.688 (BCC-CSM2-MR). In contrast, the TS values in summer in the SC region and in winter in the NC, WNW, and NEC regions are lower.
Compared with EW events, most models yield lower TS scores for the annual average frequency of EC events in China. Among them, the TS score of NESM3 is the highest, at 0.525. In terms of different seasons, the TS scores for the simulations of spring EC events in China range from 0.173 (INM-CM4-8) to 0.588 (E3SM-1-0), followed by those for the simulations of autumn EC events, which range from 0.181 (MPI-ESM-1-2-HAM) to 0.485 (KIOST-ESM); the TS scores of winter EC events are generally low, ranging from 0.073 (EC-Earth3-CC) to 0.418 (NESM3). Regarding the different regions, the TS scores for the simulations of EW events in the WNW region are generally higher, with an annual average ranging from 0.369 (AWI-CM-1-1-MR) to 0.688 (BCC-CSM2-MR). The TS scores obtained in the WNW region are lower in spring, summer, and winter, but higher in autumn, ranging from 0.225 (CanESM5) to 0.673 (NESM3).
The improvement from the CMIP6 MME varies with the evaluation index and region. In terms of BIAS, the CMIP6 MME fails to improve the simulation ability. In terms of RMSE, however, the CMIP6 MME outperforms the individual CMIP6 models in all regions. The CMIP6 MME also achieves higher TS scores in simulating EC events in autumn in the WNW region and in winter in the ENW region, with TS values of 0.709 and 0.415, respectively, which are better than those of all individual CMIP6 models.
Fig. 8 Same as Fig. 6, but for TS

Trend analysis
The observation results reveal that the annual average EW frequencies show a significant increasing trend (1.092 days/10a) over China. The increasing rate of EW events is fastest in summer (0.311 days/10a) and slowest in autumn (0.1 days/10a). In terms of spatial distribution, the annual average EW frequencies show significant upward trends in most regions, except for the NEC, WNW, and TP regions. The increasing rate of EC events is fastest over the SWC region, especially in summer (0.838 days/10a). By comparison, the annual average EC event frequencies show a decreasing trend (−0.317 days/10a) with no statistical significance from 1981 to 2013 in China. The decreasing rate of EC events is fastest in the NC and TP regions, with rates of −0.748 days/10a (P < 0.05) and −1.056 days/10a (P < 0.05), respectively. However, in the ENW and NEC regions, the frequencies of EC events show an increasing trend, with rates of 0.221 days/10a (P > 0.05) and 0.325 days/10a (P > 0.05), respectively. In terms of the four seasons, the trends of EC event frequencies in summer, autumn, and winter are consistent with the decreasing trend of the annual average frequencies in China noted above. However, in spring, the frequencies of EC events show a slight upward trend (0.065 days/10a), notably in the ENW and JH regions.
Whether a model can simulate the trend changes in EW and EC events is an important index used to examine the simulation ability of the model. As shown in Figs. 9 and 10, large differences exist among the simulation results of the models, especially for EW events. Most CMIP6 models cannot reflect the trend changes in EW or EC events in China, and only a few models give good simulation results. For EW events, 11 models perform better in simulating the increasing trend of the annual average frequency of EW events in China (Fig. 11). The NorESM2-MM model is the closest to the observations, with a rate of 0.734 days/10a. Most models simulate the observed increasing trend in winter but perform poorly in summer. Generally, the trends of EW events in China simulated by NorESM2-MM in spring, IITM-ESM in summer, GISS-E2-1-G in autumn, and AWI-ESM-1-1-LR in winter are in good agreement with the observations. From the perspective of different regions, the models reproduce the decreasing trend of EW events in the NEC region in summer and autumn and also capture the increasing trend of winter EW events in the ENW and JH regions. However, CMIP6 models perform poorly in reproducing the trend in the JH, ENW, and SWC regions in summer and in the NC region in autumn.
Compared with EW events, most models perform relatively better in simulating the trend of the frequencies of EC events (Fig. 12). Except for the IITM-ESM model, CMIP6 models reasonably simulate the decreasing trend of the annual average frequencies of EC events in China. The outputs of the FGOALS-f3-L model are closest to the observations, with a rate of −0.319 days/10a. Most models simulate the decreasing trend in summer and winter but perform poorly in spring and autumn. The trends of EC events in China simulated by CanESM5 in spring, EC-Earth3-Veg in summer, IPSL-CM6A-LR in autumn, and ACCESS-CM2 in winter are in good agreement with the observations, and the optimal models in the four seasons differ from those for EW events. Regarding the subregions, more than 50% of the models simulate the trends of EC events in the NC region in all seasons. Most models perform better for the trends of EC events in the NC, JH, SC, SWC, and ENW regions in summer but perform poorly in the NEC region.
The CMIP6 MME reasonably captures the decreasing trends in EC events in China; however, it fails to simulate the increasing trends in EW events (Fig. 13). In addition, the performance of the CMIP6 MME for EC and EW events differs considerably among regions. The CMIP6 MME clearly improves the simulation of winter EC events in China, especially in the JH region. The trend obtained is −0.107 days/10a, which is second only to the FGOALS-f3-L model and better than the outputs of the other individual models. In terms of EW events, the CMIP6 MME does not outperform the individual models in China. Regionally, a significant improvement from the CMIP6 MME is found in the NEC, TP, and WNW regions in autumn, especially in the NEC region. The trend of EC events simulated by the CMIP6 MME is −0.133 days/10a in the NEC region, which is close to the observed trend (−0.153 days/10a). However, for the other regions, the CMIP6 MME simulation is not as good as the simulations of most individual models. Although some models successfully simulate the sign of the trends of the EW and EC event frequencies, the simulated change rates are much weaker and differ significantly from the observed results.

Interannual variability
The assessment of temporal variability is another index used to measure the simulation ability of a model. In this paper, IVS is used to evaluate the consistency of interannual variability between the modeled and observed data. The results are shown in Fig. 14. Overall, the interannual variability of EW events is simulated better than that of EC events. Specifically, for EW events, the annual average IVS in China ranges from 0 (AWI-ESM-1-1-LR) to 1.238 (IITM-ESM). Regarding the different seasons, the IVS is highest in autumn, with IVS values greater than 1 obtained for more than 50% of the models, indicating poor interannual variability performance. However, the IVS values are lower in spring and winter, ranging from 0 (INM-CM5-0) to 1.320 (MIROC6) and from 0 (AWI-ESM-1-1-LR) to 0.665 (IITM-ESM), respectively. In the different regions, the IVS values obtained in the SWC region in summer are generally lower than those obtained in the other regions. In comparison, the IVS values obtained in the SC and SWC regions in spring are generally higher. In addition, the annual average IVS values of the TP region are higher than those of other regions, at 0.006 (TaiESM1) to 2.358 (GISS-E2-1-G), likely because the models can hardly reproduce the EW events related to the complex topography (You et al. 2020).
For annual average EC events, the IVS values in China range from 0.035 (EC-Earth3-Veg) to 1.628 (NESM3). The IVS in summer is generally higher, especially for the models from NCC (NorCPM1, NorESM2-LM, and NorESM2-MM) and INM (INM-CM4-8 and INM-CM5-0), which have IVS values greater than 1. The IVS is relatively lower in spring and winter, ranging from 0 (AWI-CM-1-1-MR) to 0.767 (NESM3) and from 0 (NorESM2-LM) to 0.590 (IPSL-CM6A-LR), respectively. Regionally, IVS is generally lower in summer in the NEC and ENW regions and in winter in the SC region, while in the TP and WNW regions, IVS is higher than in other regions.
The CMIP6 MME has difficulty reproducing the interannual variability in the annual average EW and EC events in China, consistent with the findings described in Jiang et al. (2015), who pointed out that most models failed to simulate the interannual variability of extreme precipitation in China. Meanwhile, the improvements in other regions are also not obvious.

Overall model ranking
In this study, different evaluation indexes are used to select the optimal individual CMIP6 models in simulating the annual cycles, spatial patterns, climatology, interannual variability, and trends of EW and EC events in China. Because the rankings differ among the individual indexes, the comprehensive rating index (CRI) is adopted to rank the models. As shown in Tables 2 and 3, the optimal models for simulating EW and EC events differ among seasons and regions. For the annual average EW events in China, MPI-ESM-1-2-HAM has the best simulation performance. TaiESM1 in summer, AWI-ESM-1-1-LR in winter, and NorESM2-MM in spring and autumn are the preferred models for simulating EW events. For the annual average EC events in China, the FGOALS-f3-L model has the best simulation performance, and TaiESM1 in spring, FGOALS-g3 in summer, IPSL-CM6A-LR in autumn, and MPI-ESM-1-2-HAM in winter are the best models for EC events. Notably, the high-resolution models perform better for EW events in the NC, JH, and SC regions and for EC events in the NEC and JH regions, whereas this advantage does not extend to other regions.
It is worth noting that the CMIP6 MME significantly improves the climatology simulations of EW and EC events in winter in China. This is mainly because the CMIP6 MME reduces the bias and uncertainty among climate models and produces results that are closer to the observations (Wang et al. 2018; Almazroui et al. 2017). However, the CMIP6 MME does not significantly improve the overall performance for EW and EC events in China and is not the best scheme. A possible reason for this result is that the CRI score includes the IVS index value. Previous studies have shown that the CMIP6 MME has a large bias when simulating interannual variability, which may be because the individual models in the CMIP6 MME cancel each other out and smooth the natural variability within the climate system. Thus, the long-term trends of EW and EC events are underestimated, which is consistent with the conclusions of temperature studies conducted by Zhao et al. (2014) and Li et al. (2019). Therefore, the CMIP6 MME method should be used cautiously when simulating the trends and interannual variability of EW and EC events.

Discussion and conclusion
In this paper, we quantitatively assessed the performances of 35 CMIP6 GCMs in simulating the annual cycle, spatial pattern, climatology, interannual variability, and trend in ETCN events in China and different regions within China using multiple assessment methods and CRI metrics combined with daily Tmin and Tmax. These results provided a reference for model improvements and projections in the future. The following conclusions were obtained. The observation results show that from 1981 to 2013, the annual average frequency of EW events in China is higher than that of EC events. EW events mostly occur in spring, while EC events mainly occur in autumn. From the trend, the annual average EW frequencies show an increasing trend over China but a decreasing trend for EC events.
The 35 CMIP6 models reproduce the annual cycles of EW and EC events well. The simulation of the annual cycle of EW events is better than that of EC events. The CMIP6 models capture the spatial patterns of EW and EC events in China. However, most CMIP6 models overestimate the frequency of EW events but underestimate the frequency of EC events in China. The TS scores of EW events are generally higher than those of EC events over China.
Most CMIP6 models could capture the trends in EC events in China but fail to simulate the trend in EW events. The interannual variability of EW events exhibits a relatively better performance than that of EC events. Combining the evaluation indicators of the annual cycle, spatial pattern, climatology, trend and the interannual variability in EW events and EC events, it can be concluded that the optimal models for simulating EW and EC events differ among different seasons and regions in China. MPI-ESM-1-2-HAM best simulates EW events, while FGOALS-f3-L best simulates EC events in China.
Although the CMIP6 climate models reproduce the spatial pattern of the frequency of ETCN events in China, some biases still exist in the simulation of ETCN events. Previous studies have shown that uncertainties in climate models mainly result from the subgrid parameterization process (Bony and Dufresne 2005; Luan et al. 2016). Dynamic downscaling based on the best models for simulating ETCN events, as selected here, is a useful way to reduce model uncertainty and can improve simulations of regional climate changes (Seo and Ok 2013). To reduce model uncertainty further, the resolution of the climate model could be increased to lessen its dependence on physical parameterization (Randall et al. 2003; Eyring et al. 2019; Paik et al. 2020). The high-resolution models effectively improve the simulation of EW and EC events, especially in the JH region. However, this improvement does not carry over to the TP region. This indicates that physical processes play an important role in areas with complex topography, which remains a challenge for the improvement of global climate models (Jiang et al. 2015).
In addition, the uncertainty of the observed data is another main cause of biases, such as the data uncertainty in the TP region related to the sparse station density (Yin et al. 2019). In the future, multisource observation data will be used for comparative analyses to improve the accuracy of model evaluations (Li and Yan 2010). The MME method used in this study is an equal-weight ensemble. Each model could instead be given a different weight according to the ranking of its performance to obtain more accurate simulation results (Knutti 2010); this will be the subject of our next work.