Evaluation of re-analyses over China based on the temporal asymmetry of daily temperature variability

As an intrinsic feature of daily surface air temperature (SAT) variability found in station measurements, temporal asymmetry (TA) can be taken as an evaluation metric to access the quality of SAT re-analysis product. In this study, TA calculated from four SAT variables, i.e., daily mean SAT (Tmean), daily maximum SAT (Tmax), daily minimum SAT (Tmin) and diurnal temperature range (TDTR = Tmax − Tmin), is applied to evaluate synoptic-scale performance of four reanalysis products (NCEP-2, JRA-55, ERA-I, and ERA-5) over China. The results show that four re-analyses overall overestimate the TA of daily Tmax and Tmin variability over China, but with a comparatively consistent estimated TA for Tmean. Moreover, the TA of Tmean variability for these four re-analyses shares high spatial consistency with those from the observation. However, four re-analyses own the similar region-dependent spatial patterns of overestimated TA for Tmax and Tmin variability, especially for Tmax. Since high TA is an indicator for strong nonlinear feature, only Tmean reanalysis is the most suitable to explore synoptic-scale extreme events, such as heat waves and cold waves, which are highly related to the strong nonlinear processes.

As a kind of asymmetry, the temporal asymmetry (TA) in time series, defined by different statistics between forward and backward (reversed) directed series, plays an important role in air temperature variability studies (Bartos and Jánosi 2005;Gyüre et al. 2007;Ashkenazy et al. 2008;Xie et al. 2016Xie et al. , 2019. Previous studies also found that there exists differential TA among different temperature variables' daily fluctuations over China from both station observations and NCEP-2 re-analyses . For commonly used temperature variables, daily mean 2-m surface air temperature (SAT, T mean ), daily maximum SAT (T max ), daily minimum SAT (T min ), and diurnal temperature range (T DTR = T max − T min ), TA strengths among these four temperature variables are markedly different with the weakest TA for DTR and the strongest TA for T mean . Compared with the TA from station observations, TA from T max and T min in National Centers for Environmental Prediction (NCEP) AMIP-II Reanalysis (hereafter NCEP-2, Kanamitsu et al. 2002) is highly overestimated for most regions over China .
TA has been found to be an intrinsic feature in daily SAT variability in station measurements (Bartos and Jánosi 2005;Gyüre et al. 2007;Xie et al. 2016Xie et al. , 2019Li et al. 2021). Although similar TA behavior was also reported in re-analysis data (Ashkenazy et al. 2008;Xie et al. 2019;Li et al. 2021), only a few studies Li et al. 2021) compared TA difference between station measurement and re-analysis products, detailed and systematic comparative studies are still required. More specifically, is it a universal feature to different kinds of re-analyses for the reported overestimated TA in T max and T min variability from NCEP-2 re-analyses over China Li et al. 2021)? If the answer is yes, is there any spatial consistency among different reanalysis products?
The reanalyzed products are outputs from the assimilation technology of numerical weather prediction to restore the observation data and compensate for the lack of uneven distribution of weather stations (Bengtsson and Shukla 1988;Trenberth and Olson 1988;Chen and Liu 2016). Previous studies used different methods to compare and evaluate reanalysis data at different temporal and spatial scales from the point of view of mean value, standard deviation, long-term trend, long-range correlation (LRC), and so on, and they found re-analyses do not always work well to reproduce consistent results with observations (Flocas et al. 2005;Ma et al. 2008;Pitman and Perkins 2009;Mao et al. 2010;Marques et al. 2010;Mooney et al. 2011;Alfred et al. 2011;You et al. 2011You et al. , 2013Cornes and Jones 2013;Chen and Iwasaki 2014;Taguchi 2017;Zhu et al. 2017;He et al. 2018;Alghamdi 2020). For example, He and Zhao (2018) evaluated the air temperature reanalysis variables by means of LRC, and they found that NCEP-2 reanalysis overestimates LRC over some specific regions. Zhu et al. (2017) revealed that Interim European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-Interim, herein ERA-I, Dee and Coauthors 2011) can capture the intensity indices of the continuous extreme temperature events, but not their frequency indices. Since TA is closely related to the synoptic-scale processes (Ashkenazy et al. 2008;Xie et al. 2016;Li et al. 2021;Quan et al. 2021) and nonlinear behaviors (Li et al. 2021), behaviors like extreme temperature events found in ERA-I should also be able to be revealed by TA.
Besides NCEP-2 reanalysis and ERA-I, other commonly used reanalysis products are available: the 55-year Japanese Project (hereafter JRA-55, Ayataka et al. 2011;Kobayashi et al. 2015), and the Fifth generation ECMWF Re-Analysis (hereafter ERA-5, Radu et al. 2018;Hersbach et al. 2020). These four reanalysis products will be exploited to test whether there are universal conclusions on TA related to SAT variability.
Moreover, Ye and Hsieh (2008) found that increasing nonlinearity in ENSO and Lorenz systems can enhance their predictability by improving the contributions from the low-frequency variations. The process-dependent intrinsic predictability is also reported to be enhanced with increased nonlinearity (Huang and Fu 2019). As one kind of nonlinearity, overestimated TA in the NCEP-2 reanalysis daily minimum and maximum air temperature anomaly series over China is concurrent with overestimated intrinsic predictability, prediction skill, and the occurrence number of extreme events (Li et al. 2021). This finding indicates that overestimated TA in reanalysis products may distort the conclusions on extreme event studies. By comparing the TA calculated from the station observations of different SAT variables to those from different reanalysis products, the most suitable SAT variable and the corresponding reanalysis products to estimate TA can be determined.
The paper is organized as follows. Section 2 summarizes the sources of observation and reanalysis data and the methods used in this study. Section 3 reports the overestimated TA in both T max and T min from observations and re-analyses, contrasting to the comparable consistency in TA for T mean . And quantitative comparison of TA between observations and re-analyses for four different SAT variables is also provided. At last, Section 4 concludes this study with a brief discussion.

Observations
The observational time series (hereafter observations) of T mean , T max , T min , and T DTR were downloaded from China Meteorological Administration. After data quality check by removing the time series with missing points, datasets of 643 Chinese meteorological stations were finally selected for this study, covering a total of 40 years from 1979 to 2018.
Previous studies found that topography, altitude, and sparse station are the main factors affecting the quality of reanalysis data (Rusticucci and Kousky 2002;Wang et al. 2015;Zhao et al. 2018;He and Zhao 2018). Due to the sparse stations in Tibetan region, we removed the stations over Tibetan regions and only evaluated the quality of the reanalysis data by using observations from the rest of the stations over China except Tibetan regions.
In this study, all referred time series are standardized by removing the annual cycle (Koscielny-Bunde et al. 1998) by where T i is any given daily air temperature variables, 〈T i 〉 is its long-time average for each calendar day, T ′ i is the corresponding anomaly, and N is the data length.

Interpolation for reanalysis
In order to compare TA measures calculated from the four reanalysis products with different spatial resolutions to those from station observations, we interpolated the two-dimensional gridded data into the corresponding observation stations (hereafter interpolated reanalysis). Two widely used methods in previous studies were adopted: Gaussian weight function (Maddox et al. 1981;Xie et al. 2019) and interpolation by directly choosing the closest grid point to represent the targeted station (Diaconescu et al. 2018;Pendergrass and Knutti 2018;Yang et al. 2020).
For Gaussian weight function, the interpolated reanalysis is given by where s(i, j) is the targeted variable over grid (i, j) and the weight function w(i, j) is: where C is the weighting constant, the Euler distance d is from the grid point (i, j) to the location of the specific station. For the reanalyzed data with different resolutions, detailed numerical calculations found that the smaller value C (for example C = 0.5 or C = 1.0) within a certain range can produce reliable interpolations. (1)

Measuring temporal asymmetry of time series
Different methods have been proposed and applied to measure the TA strength of air temperature variations (Xie et al. 2016Zhang et al. 2019;Li et al. 2021). Zhang et al.
(2019) compared several TA measure methods and found all of them perform nearly equally well for most of cases. For simplicity, the asymmetry index (A) is adopted to quantify the TA strength in the air temperature anomaly T ′ i , which is defined as the ratio of positive air temperature variability steps to the total (positive plus negative) steps (Ashkenazy et al. 2008): When the value of TA is smaller than 0.5 (A < 0.5), it indicates that the air temperature warms rapidly and gradually becomes cold. The value of TA is larger than 0.5 (A > 0.5) indicates that the air temperature rapidly cools and gradually warms. When the value of TA is close to 0.5 (A ≈ 0.5), the air temperature time series are symmetric.

Significance test
To carry out the significance tests, the iterative amplitude Fourier transform (IAAFT) is employed to generate surrogates for each time series by keeping the two-point correlation and probability density function as those from the original series. Taken T mean anomaly time series as an example, we generate 500 surrogate series for it, and calculate A by means of Eq. (3) for each surrogate to obtain 500 values for A. Sorting these 500 values for A can define the lower 1% levels and the upper 99% among these 500 values of A as the significance thresholds. And then two thresholds of A c1 and A c2 can be obtained. When the calculated value of A for a given series is lower than A c1 or higher than A c2 , then this given series is taken to be statistically significantly temporal asymmetric. Such significance test through IAAFT surrogate can ensure that our results are not artificially influenced by stochastic effects, the autocorrelation and probability distribution of a time series (Huang et al. 2020).

Overestimated temporal asymmetry in T max and T min
First of all, the estimated temporal asymmetry (TA) for four SAT variables' variability from the observations is compared with those from four kinds of re-analyses, and detailed results are presented in Fig. 1. Here, the determination of overestimation or underestimation should be based on the statistical significance test. For each estimated TA, there is an interval from the significance test to judge whether it is overestimated or underestimated. The critical values at the significance level of 0.02 are shown as the lines on both sides of the 1:1 line in Fig. 1. Only outside of the critical lines, the estimated TA can be taken as significantly overestimated or underestimated at the significance level of 0.02. The marked results are that nearly all values of estimated TA in T max and T min from all four re-analyses are overestimated (see Fig. 1b, c), especially in T max , only a few from NCEP-2 reanalysis are comparable with or lower than those from observations. This finding is consistent with previous studies in NCEP-2 reanalysis compared with limited station observations . Different from the results given by only one specific reanalysis product (NCEP-2), the results given here indicate that the overestimated TA in T max and T min from re-analyses may be taken as a common intrinsic feature to all analyzed reanalysis products. Secondly, overestimated TA in both T max and T min from all four kinds of re-analyses has its well-defined spatial  patterns (see Figs. 2 and 3). Especially for T max , four kinds of re-analyses share very similar spatial distribution of TA difference between observation and interpolated reanalysis, which is defined as A diff = A re − A obs with A re from reanalysis and A obs from observations. There are region-dependent spatial patterns of A diff . Weak overestimated TA occupies most of the regions of China. However, there are still some special patterns. The consistent estimation of TA between observation and interpolated reanalysis mainly locates over northeast part of Heilongjiang Province and west part of both Yunnan and Sichuan Provinces. Among four kinds of re-analyses, the shared patterns of the strongest overestimated TA lie in Guizhou and Chongqing provinces. These regions are also where the TA is not statistically significant in T max for observations (see Fig. 4f), but strong TA for four kinds of re-analyses (see Fig. 4g-j). According to studies on temperature asymmetry and the mechanism of temperature asymmetry (Ashkenazy et al. 2008;Piskala and Huth 2020;Quan et al. 2021), the possible cause behind regiondependent spatial patterns of A diff is closely related to frontal processes, and frontal processes have a strong regional dependence (Ashkenazy et al. 2008;Piskala and Huth 2020;Quan et al. 2021). This may be part of the reason for TA's regional dependence, and the mechanism behind this phenomenon deserves further study in depth.
For T min , no common national-scale spatial pattern of TA difference is shared among four kinds of re-analyses between observation and interpolated reanalysis over China (see Fig. 3). Only NCEP-2 and ERA-5 share the similar national-scale spatial distribution of overestimated TA (see Fig. 3a, d). However, only JRA-55 and ERA-I rather than NCEP-2 and ERA-5 share the similar spatial distribution of TA difference with an east-west dipole pattern. Over the half part of China east to 110° E, nearly all values of TA are overestimated, whereas over the another half part of China west to 100° E, nearly all values of TA are weakly underestimated. The contrasting spatial patterns of TA in T min are mainly due to the weak TA strength in observations (see Fig. 5a) but strong TA strength in Central parts of China in four kinds of re-analyses (see Fig. 5b-e). At the same time, contrast topography and possible related specific procedures adopted by JRA-55 and ERA-I reanalysis products (Pinheiro et al. 2020) may contribute to TA difference with an eastwest dipole pattern in T min .
Besides the almost identical patterns of TA difference for T min , JRA-55, and ERA-I also share almost the same TA patterns for all temperature variables (T mean , T max , T min , and T DTR ), details can be found in Fig. 4c, d, h, and j; Fig. 5c, d, h, and j. The similarity between ERA-I and JRA-55 reanalyses on synoptic-scale phenomena has been reported in  2020) studied subtropical cut-off lows in the southern hemisphere, they found that the differences of track density between ERA-I and JRA-55 are relatively small. They explained this similarity between ERA-I and JRA-55, as both of them used the same data assimilation systems (Pinheiro et al. 2020).

Comparable estimation of temporal asymmetry in T mean
Different from the markedly overestimated TA found in T max and T min , more consistent TA estimations are revealed in T mean between observations and re-analyses (see Fig. 6). There are only a few stations (less than 1%) with strong overestimated TA, and nearly 50% of the stations have the same TA estimation for both observations and re-analyses from NCEP-2 and ERA-5 (see Fig. 6a Fig. 1a).
Since nearly all values of A for the reanalysis products are all larger than these critical values (see Fig. 1a), nearly all the daily mean SAT variability takes the temporal asymmetric behaviors over China. The regions of the strongest TA strength lie over central and southeast regions of China, especially to the south of Qinling Mountains and to east of 105° E (see Fig. 4a). Over these regions, the variations of T mean gradually warm and rapidly cool. Such a temporal asymmetry phenomenon occurring at mid-latitudes has been attributed to the different contributions from the warm and cold fronts (Ashkenazy et al. 2008;Piskala and Huth 2020;Quan et al. 2021). Lower A is found in the Basin of Tarim, Qaidam and Sichuan, Yun-Gui Plateau and northeast of China (see Fig. 4a). Surprisingly, all four re-analyses can reproduce the high and low values of TA over these regions (see Fig. 6), and this is why there are more consistent TA estimations in T mean (see Fig. 1a).

TA spatial pattern similarity quantification
In order to quantify the TA spatial pattern similarity between observations and interpolated re-analyses, we adopt the Taylor diagram (Taylor 2001) to compare it quantitatively. Three statistics most often used to quantify pattern similarity in Taylor diagram are the correlation coefficient between observations and interpolated re-analyses, defined as and the standard deviation of A Ra standardized by A o Taylor diagrams constructed from four kinds of interpolated re-analysis and observations for TA of four SAT variability could more accurately evaluate the quality of different reanalysis products for different SAT variables (Fig. 7).
First of all, it should be pointed out that synoptic-scale performance of all analyzed re-analyses is not comparable to their climate-scale performance. For T mean , the four reanalysis products all perform the best among these four  Taylor diagram for a T mean , b T max , c T min , d T DTR of A from observation and interpolated reanalysis. The black dot stands for the results calculated from observations, which serves as the reference. The radial distance from the origin is proportional to the standard deviation of a pattern normalized by reference pattern, the centered root mean square (RMS) difference between the reference and re-analyses is proportional to their distance apart and the correlation between the reference and re-analyses is given by the azimuthal position of a given reanalysis SAT variables, and they are all located in a limited region in the Taylor diagram (see Fig. 7a). Especially ERA-I, ERA-5, and JRA-55 are nearly of the equally well performance with the highest RCC value (around 0.8) and the lowest SRMS value (around 0.62), only NCEP-2 a little bit worse with the RCC value around 0.73 and SRMS value around 0.78. Four reanalysis products perform second well for T max among these four SAT variables (see Fig. 7b), among them, ERA-5 is the best with the RCC value around 0.73 and SRMS value around 0.74 and NCEP-2 the worst with the RCC value around 0.4 and SRMS value around 1.14. There are great discrepancies in TA estimation between the observation and the interpolated re-analyses for both T min and T DTR (Fig. 7c, d), for all re-analyses, the RCC values are low (less than 0.5), and SRMS values are all high (larger than 0.99). Moreover, Ye and Hsieh found (2008) that increasing nonlinearity in ENSO and Lorenz systems can enhance their predictability by improving the contributions from the low-frequency variations. As a kind of nonlinearity, temporal asymmetry in daily SAT variability is closely related to extreme events and some small-scale phenomena (Raghavendra et al. 2018;Li et al. 2021). More comparable TA estimation between observations and re-analyses leads daily mean surface temperature reanalysis products to be the most suitable choice to synoptic-scale extreme event study.

Discussion and conclusion
In order to directly compare the TA from observation with those from reanalysis, the grid reanalysis data have been interpolated into the targeted station to reach the interpolated reanalysis data. In previous studies, a number of interpolation methods were proposed and applied to test the impacts of interpolations on the derived results (Maddox et al. 1981;Xie et al. 2019). Xie et al. (2019) found that if the suitable choice is made to the interpolation distance parameter C, interpolations do not change the TA calculations too much. We compared the effects from different methods and different choices of interpolation distance parameter C on the calculations of TA from the original and four interpolated reanalysis data, detailed results can be found in Fig. 8 and were summarized in Table 2. It is confirmed that if the suitable distance parameter is chosen, the estimation of TA is insensitive to the interpolations. Also the different interpolation methods do not change the TA calculations too much. Especially, the estimation of TA from ERA-I and JRA-55 different values of C (C = 1.5, green; C = 1.0, cyan; C = 0.5, blue), Method-2 represents interpolation by the closest points to the stations (red) and Grid-ori from original reanalysis (gray) shadow is more robust to the interpolation methods and choices of distance parameter (Fig. 8; Table 2).
As an important nonlinear indicator, temporal asymmetry of time series can be taken as an intrinsic feature of nonlinear time series. Similar results can be found in a simple nonlinear model, such Logistic map with a given regime (Figure not shown here). TA calculation from observation in daily mean surface air temperature variability over China, (Fig. 1a), having nearly totally statistically significant temporal asymmetry leads it to be an intrinsic feature of daily surface air temperature variability. It can be taken as a metric to access the quality of different daily SAT reanalysis products. Taking TA as an evaluation measure, four daily SAT reanalysis products (NCEP-2, ERA-I, ERA-5, JRA-55) are accessed. Compared with the observations, the four reanalyses can consistently reproduce the TA in T mean over South China. NCEP-2 underestimates the TA in T mean over the northeast and central regions of China. JRA-55, ERA-I, and ERA-5 overestimate the TA in T mean over the northwest and central regions. However, all four re-analyses universally overestimate the TA in T max and T min . These results are different from the findings by previous studies based on the linear view He and Zhao 2018). Due to the weak TA in the T DTR (see Figs. 1d and 5f-j), there are larger relative uncertainties in calculation and comparison of TA from the observations and reanalysis, so no conclusive results can be reached on T DTR .
It was reported that the modeled daily maximum temperature and daily minimum temperature are unsuitable for the study of extreme events such as heat waves due to the underestimated daily maximum or overestimated daily minimum temperature compared with observations (Raghavendra et al. 2018). Based on the results of TA estimation from all four reanalysis products, we confirm that the nonlinear strength is highly overestimated in T max and T min from the re-analyses. Since there is a close relation between nonlinearity and extreme events (Ye and Hsieh 2008;Li et al. 2021), results from daily maximum or minimum temperature reanalyses may distort conclusions on extreme events such as heat waves and cold waves. On the contrary, the most comparable consistency of TA estimation in daily mean surface air temperature variability from both observations and re-analyses make it to be a reasonable choice. Last but not least, detailed comparison among the TA results for seasonal (Figure not shown here) and all-year data (see Fig. 1) shows that all these conclusions given above are qualitatively similar with only minor quantitative differences among results for four seasons, i.e., the above conclusions are not season-dependent.
Code availability Not applicable.
Funding This research was supported by the National Natural Science Foundation of China through Grants (Nos. 41475048 and 41975059).

Declarations
Ethics approval We confirm that this article is an original research and has not been published or presented previously in any journal or conference in any language (in whole or in part).

Consent to participate and consent for publication
We have consent to participate and publish.

Conflict of interest
The authors declare no competing interests.