Predictable Patterns and Their Corresponding Signal Sources of Midsummer Surface Air Temperature Over Eastern China in the ECMWF S2S Forecasts

The maximum signal-to-noise empirical orthogonal function (MSN EOF) method is used to evaluate subseasonal-to-seasonal (S2S) forecasts of midsummer 2-m air temperature (T2m) over Eastern China from the ECMWF model, and to investigate the mechanisms linking the temperature modes to their predictable sources. The first predictable pattern is a dipole with positive values in the south and negative values in the north. The model captures the signal of the transition from a preceding El Niño to La Niña and the accompanying warm sea surface temperature in the tropical Indian Ocean. In the summers of transition years, the Western Pacific Subtropical High is stronger and extends further west while the southwest monsoon strengthens; these are the main direct causes of the high pressure in the south and the increased precipitation in the north. Compared with observations, although the model captures the relationship between the temperature mode and the preceding sea surface temperature signal, it obscures the mediating role of the Western Pacific Subtropical High. The second predictable pattern is characterized by warming over the Yangtze River valley (YRV), and its main signal is the North Atlantic Oscillation, a mode of atmospheric internal variability. A wave train propagating from northwestern Russia to Northeast Asia is the main cause of the anomalous high pressure over the YRV. The third mode mainly represents the temperature trend, and its spatial characteristics differ considerably between observations and the model. The ECMWF model shows high forecast skill for all three modes and presents high (low) surface pressure in areas with high (low) temperature, together with reduced (increased) precipitation and increased (reduced) solar radiation, which indicates that the model simulates well the mechanism by which circulation anomalies affect surface air temperature.


Introduction
In the context of global warming and climate change, heat waves (HWs) have shown a clear increasing trend. HWs harm human health, increase energy consumption, and damage ecosystems, making them important meteorological and climatic disasters that have drawn growing attention in recent years (Easterling et al. 2000; Rey et al. 2009; Coumou and Rahmstorf 2012; Papalexiou et al. 2018). For instance, the European HW of 2003 and the Russian HW of 2010 both caused heavy casualties and property losses, which prompted many researchers to explore their causes and to establish prediction systems for extreme HWs (Fouillet et al. 2006; Barriopedro et al. 2011). Eastern China (EC) is a coastal region that is densely populated, economically developed, and very sensitive to extreme disasters. In the summer of 2013, for example, the sustained HW in the Yangtze River valley (YRV) caused more than 5,500 deaths and a direct economic loss of 59 billion yuan (Gu et al. 2016). Recent research shows that air temperature and HWs in EC have a continuous increasing trend, and that extreme high-temperature events (EHTEs) are mainly concentrated in midsummer (Sun et al. 2014; Guo et al. 2017). Therefore, it is particularly important to understand the predictable sources of midsummer air temperature in EC and to predict it in a timely and accurate manner.
Abnormal temperature events such as EHTEs often require earlier warning, so that governments have time to prepare response measures and people have sufficient time to take precautions.
The World Weather Research Program and the World Climate Research Program jointly initiated the subseasonal to seasonal (S2S) prediction project with the aim of improving forecast skill on the S2S scale and understanding its physical mechanisms; a total of 11 members have participated in this project and provided forecast data. S2S prediction fills the gap between weather and seasonal forecasts (Vitart et al. 2017; White et al. 2017). Because the memory of the atmospheric initial conditions has faded while the influence of the boundary conditions is delayed, it is difficult for forecast models to identify clear signal sources on this time scale. Elusive non-linear processes across multiple scales also hinder prediction. Consequently, current research on the S2S models shows that weather-scale forecast skill within 2 weeks is high, that skill in the extended range varies widely among models, and that individual models retain high forecast skill out to about 4 weeks. For example, Andrade et al. (2018) showed that ECMWF, UKMO and KMA had high precipitation forecast skill on a global scale, but that this skill declined rapidly after the second week. Zhou et al. (2019) found that ECMWF had the highest forecast skill for winter 2-m air temperature (T2m) in China, with a skillful duration of 2-4 weeks and with forecast scores that were higher in the northern region than in the southern region.
Exploring the predictable sources of the S2S models is an important direction for understanding subseasonal processes and improving forecasts. The predictable sources on the S2S time scale mainly include the Madden-Julian Oscillation (MJO), atmospheric quasi-biweekly oscillations and other long-lasting atmospheric initial conditions; boundary conditions such as sea surface temperature (SST), soil moisture, and ice and snow conditions; and other factors such as atmospheric teleconnections and interactions between atmospheric layers. Several issues deserve more attention: what the driving factors and physical mechanisms of the temperature modes on the S2S scale are, whether the signals identified in observational research can be reproduced in the model, and whether the model contains false signals that do not exist in the observations.
EC, our main research area, has a vast territory and a complex circulation system. Its summer air temperature is affected by several different systems, individually or in combination, which complicates prediction by numerical models. The Western Pacific Subtropical High (WPSH), an important member of the East Asian summer monsoon system, is the most important factor affecting air temperature over EC. When it becomes significantly stronger and extends westward, abnormally high temperatures appear in the area under high-pressure control, and its north-south position controls the ranges of high and low temperature and the rain bands (Wang et al. 2016; Deng et al. 2019). The southwest monsoon and atmospheric quasi-biweekly oscillations are two other decisive circulation systems affecting air temperature in South China (SC) (Chen and Lu 2015; Chen et al. 2017). The location and intensity of the East Asian subtropical jet and the South Asian high affect the north-south and east-west shifts of the high-temperature zone in north central China (Wang et al. 2013).
These circulation anomalies usually have precursor signals; for example, the WPSH is considered a bridge connecting El Niño to the East Asian climate. In the summers of El Niño decaying years, the stronger WPSH causes higher air temperature in SC (Wang et al. 2017; Deng et al. 2019). In addition, the equatorial Indian Ocean capacitor effect influences the region from SC to Jianghuai by exciting an anomalous Kelvin wave that strengthens the anticyclone over the western Pacific (Xie et al. 2009). Atlantic sea surface temperature anomalies (SSTAs) from winter to summer can affect summer temperature in northern China through the atmospheric circumglobal teleconnection (CGT) or the North Atlantic-Eurasian teleconnection (Li and Ruan 2018; Li et al. 2019). To explore whether the S2S model can capture these complex multi-system signals and reproduce the underlying mechanisms, we use forecasts from the ECMWF model, which has the highest forecast skill in the S2S project, to study the main predictable patterns of midsummer surface air temperature over EC and their corresponding predictable sources.
The maximum signal-to-noise empirical orthogonal function (MSN EOF) method is an effective way to extract a model's predictable patterns (predictable components). It was developed by Allen and Smith (1997) and applied by Huang (2004), Hu and Huang (2007) and Liang et al. (2009). MSN EOF maximizes the signal-to-noise ratio for a limited number of ensemble members, reduces random errors, removes the unpredictable part, and yields the predictable patterns, which makes it well suited to the questions raised above. We therefore adopt this method in this paper. The paper is structured as follows. The data and method are described in section 2. The forecast skill and predictable patterns of the ECMWF model are shown in section 3. Section 4 discusses the corresponding physical mechanisms. A summary and discussion are provided in section 5.

Data and method
The ECMWF S2S forecast data are issued by the European Centre for Medium-Range Weather Forecasts (ECMWF) and include real-time forecasts and hindcasts. The real-time forecasts, which started in 2015, are 0-46 d integrations. The horizontal resolution is T639/319 (0.25° × 0.25° before day 10, 0.5° × 0.5° after day 10), with 91 vertical levels (L91), and forecasts are issued twice a week. The ensemble consists of one control member and 50 perturbed members (with perturbed atmospheric initial conditions). The hindcasts are produced on the fly: with each real-time forecast, the model also generates hindcasts for the past 20 years.
Their resolution, forecast frequency and forecast length are the same as those of the real-time forecasts, but the ensemble consists of one control member and 10 perturbed members. The model is coupled to the ocean but not to sea ice.
This article focuses on the eastern part of China (18°-42° N, 105°-123° E) in midsummer, when EHTEs occur frequently. To support the statistical analysis and ensure the consistency of the data, we use the real-time forecasts in 2019 and the hindcasts of the 2019 model version (1999-2018). The ensemble consists of one control member and 10 perturbed members. Because our research period is midsummer, we use only the first forecast in July of each year. Previous studies show that the model's forecast skill declines sharply after the fourth week, so the 0-30 d average is selected to obtain the predictable patterns. The observed data used for verification are from the fifth-generation ECMWF atmospheric reanalysis (ERA5). Because SST changes slowly, ERA5 SST is used instead of the model forecast SST to analyze the external forcing. The SST and circulation indices are issued by the National Climate Center of China. The 21-year mean (1999-2019) has been removed from all data.
MSN EOF, developed by Allen and Smith (1997), is the main analysis tool applied in this study to extract the model's predictable patterns. We follow the formulation of Venzke et al. (1999). A brief introduction and the algorithm are given below; the detailed principles can be found in the above literature. If the model produced an infinite number of ensemble members, the ensemble mean could be regarded as the unbiased signal of the model, because the unknown random error of each member's forecast would be averaged out. In reality, however, the number of members is limited, so the ensemble mean cannot completely eliminate this internal random error. The key is to remove as much of this error as possible. The first MSN EOF mode maximizes the ratio of signal (the ensemble mean) to noise (the deviations among members), the second mode maximizes this ratio while being orthogonal to the first mode, and so on. Like the traditional EOF, MSN EOF decomposes the forecast set into spatial patterns and time series. In this study, the forecast set spans 21 years and 11 ensemble members, and the spatial domain is 18°-42°N, 105°-123°E, with a total of 221 grid points. The specific operations are as follows. Denote by $X_k$ the forecast of ensemble member $k$ at a fixed lead time:

$X_k = \bar{X} + X'_k$

where $\bar{X}$ represents the ensemble mean, $X'_k$ represents the deviation from the ensemble mean, and $k = 1, 2, \ldots, K$ indexes the ensemble members. If the ensemble had an unlimited number of members, the ensemble mean would eliminate the random error, and its expectation $E(\bar{X})$ would be the complete signal being sought. Assume that $\bar{X}$ can be decomposed into predictable and unpredictable components:

$\bar{X} = X_F + X_N$

where $X_F$ is the common "signal" in all members and $X_N$ is the random "noise" in individual members.
Predictable patterns are the leading EOF modes of $X_F$.
Assuming that $X_F$ and $X_N$ are temporally uncorrelated with each other, the covariance matrix $C_M$ of $\bar{X}$ can be decomposed into the signal and noise covariance matrices:

$C_M = C_F + C_N$

Suppose there is a matrix $F$ such that $F^{T} C_N F = I$, where $I$ is the identity matrix. This transformation is called "prewhitening". In actual calculation, $F$ is constructed from the first

=′ (4)
Where ′ represents the first element in the diagonal matrix of SVD for ′ . The first MSN EOF (predictable pattern) is obtained by projecting onto the PC1: The second pattern can be determined accordingly which is orthogonal to the first pattern…… The significance test method of the predictable patterns comes from Venzke et al. 1999.
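The prewhitening and projection steps described above can be illustrated with a minimal NumPy sketch. It follows the general MSN EOF recipe (noise EOFs from the member deviations, a prewhitening operator, the leading direction of the prewhitened ensemble mean, and regression onto the resulting time series). The function names `noise_prewhitener` and `msn_eof`, the choice of how many noise EOFs to retain, the scaling of the noise covariance, and the synthetic data are all assumptions made for illustration; they are not taken from the paper or from Venzke et al. (1999).

```python
import numpy as np

def noise_prewhitener(dev, n_noise_eofs, dof):
    """Return F with shape (space, n_noise_eofs) such that F.T @ C_N @ F = I.

    dev: (samples, space) member deviations from the ensemble mean.
    dof: degrees of freedom used to scale the noise covariance C_N (assumed).
    """
    _, s, vt = np.linalg.svd(dev, full_matrices=False)
    eigvals = s[:n_noise_eofs] ** 2 / dof      # leading eigenvalues of C_N
    eofs = vt[:n_noise_eofs].T                 # leading noise EOFs as columns
    return eofs / np.sqrt(eigvals)

def msn_eof(ens, n_noise_eofs):
    """Leading MSN EOF of ens with shape (members, years, space).

    Returns (pattern, pc, filt): the spatial pattern, the standardized time
    series, and the filter that yields pc from the ensemble mean.
    """
    n_mem, n_time, n_space = ens.shape
    ens = ens - ens.mean(axis=(0, 1))          # remove the climatological mean
    mean = ens.mean(axis=0)                    # ensemble mean, (years, space)
    dev = (ens - mean).reshape(n_mem * n_time, n_space)
    F = noise_prewhitener(dev, n_noise_eofs, n_time * (n_mem - 1))
    white = mean @ F                           # prewhitened ensemble mean
    _, _, vt = np.linalg.svd(white, full_matrices=False)
    filt = F @ vt[0]                           # filter mapped back to grid space
    pc = mean @ filt                           # time mean is already zero
    pc = pc / pc.std()
    pattern = mean.T @ pc / n_time             # regression of the mean onto pc
    return pattern, pc, filt

# Synthetic demonstration (not real forecast data): one common signal plus
# independent noise in every member.
rng = np.random.default_rng(0)
n_mem, n_time, n_space = 8, 40, 10
true_pattern = np.sin(np.linspace(0.0, np.pi, n_space))
signal = 2.0 * rng.standard_normal(n_time)
ens = (signal[None, :, None] * true_pattern[None, None, :]
       + rng.standard_normal((n_mem, n_time, n_space)))
pattern, pc, filt = msn_eof(ens, n_noise_eofs=n_space)
```

In a setting like the one in this study (221 grid points, 21 years, 11 members) the noise covariance is rank-deficient, so only a truncated set of leading noise EOFs could be retained; the truncation level is a free parameter of this sketch.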
Suppose that a given mode contains no real signal. Then the ratio of the variance $\sigma_M^2$ of the time series obtained by projecting the ensemble mean onto $\tilde{e}$ to the averaged within-ensemble variance $\sigma_N^2$ of the time series obtained by projecting the member deviations onto $\tilde{e}$ obeys an F distribution, where m and n are the number of time samples and ensemble members, respectively.
The standard deviation of the ECMWF forecast T2m, used as a measure of interannual variability, is compared with that of the observations (Obs) in Fig. 2. The most notable feature in Obs is that large standard deviations extend from the middle and lower reaches of the Yangtze River to the central and southern regions. Compared with Obs, the ECMWF forecast has a significantly smaller standard deviation, with a bias of about 0.5-1℃ in the area of maximum values. This may reflect the fact that the ECMWF forecast significantly underestimates both the mean temperature and the interannual variability.
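For reference, the null-hypothesis test described above can be written in the form of a standard one-way analysis of variance. The expression below is a sketch under the textbook assumptions of independent, normally distributed noise; the symbols $a_{ik}$, $\bar{a}_i$ and $\bar{a}$ (the projected value for year $i$ and member $k$, its ensemble mean, and the mean over all years) are introduced here for illustration, and the exact normalization used by Venzke et al. (1999) may differ.

```latex
F = \frac{n\,\sigma_M^{2}}{\sigma_N^{2}} \sim F\bigl(m-1,\; m(n-1)\bigr),
\qquad
\sigma_M^{2} = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\bar{a}_i - \bar{a}\bigr)^{2},
\qquad
\sigma_N^{2} = \frac{1}{m(n-1)}\sum_{i=1}^{m}\sum_{k=1}^{n}\bigl(a_{ik} - \bar{a}_i\bigr)^{2}
```

With the 21 years and 11 members used in this study, this form would give $m-1 = 20$ and $m(n-1) = 210$ degrees of freedom.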

Climatology, standard deviations, and trends
Both Obs and the model show that the region with the highest climatological temperature coincides with the region of largest interannual variability.

Most predictable patterns
The predictable patterns of T2m in EC are extracted by applying MSN EOF. We discuss the three leading predictable patterns, since only these exceed the 95% confidence level (F test). Fig. 4a shows the first MSN EOF mode (MSN EOF1), which explains 22.4% of the total variance. The first mode is a north-south dipole with opposite signs on either side of about 30°N. The corresponding time series (Fig. 4b, black curve) shows clear interannual variability. Because the MSN EOF modes are derived from model forecast data only, we cross-checked whether the most predictable pattern, MSN EOF1, exists in reality by comparing the time series obtained by projecting the ensemble mean of the model forecasts onto MSN EOF1 (PC1) with that obtained by projecting the observed data onto MSN EOF1 (oPC1). The correlation between the two series is 0.77, which exceeds the 95% confidence level, and both series (PC1, oPC1) show clear interannual variability. Although the amplitudes of the model and Obs time series are relatively consistent, the explained variance is quite different. MSN EOF1 explains more variance in the model than in Obs, which is likely due to the underestimated standard deviations in the model (Fig. 2). To assess whether oPC1 is affected by any systematic bias of the model, we obtained a spatial pattern by regressing the observed T2m onto the PC and examined the relationship between the first mode and the observed T2m pattern. Fig. 5a shows that the first predictable pattern and the observed T2m agree in the north-south dipole signal, and the positive and negative centers are similar to those in Fig. 4a (pattern correlation coefficient (pcc) = 0.92), which supports the authenticity of PC1.
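The cross-check described above (projecting model and observed fields onto the same pattern, correlating the two time series, and comparing maps with a pattern correlation coefficient) can be sketched as follows. The helper names `project_onto_pattern` and `pattern_correlation`, the unweighted (no latitude weighting) formulas, and the synthetic arrays are assumptions for illustration only; the paper does not state how its projections or pcc values were computed.

```python
import numpy as np

def project_onto_pattern(field, pattern):
    """Project a (time, space) anomaly field onto a (space,) pattern.

    Returns one coefficient per time step (least-squares fit of the pattern).
    """
    return field @ pattern / (pattern @ pattern)

def pattern_correlation(map_a, map_b):
    """Centered, unweighted correlation between two (space,) maps."""
    a = map_a - map_a.mean()
    b = map_b - map_b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Synthetic stand-ins shaped like the 21-year, 221-point data set (not real data).
rng = np.random.default_rng(0)
pattern = rng.standard_normal(221)            # hypothetical MSN EOF1 map
series = rng.standard_normal(21)              # hypothetical underlying time series
model_mean = np.outer(series, pattern)        # ensemble-mean anomalies
obs = model_mean + 0.1 * rng.standard_normal((21, 221))  # noisy "observations"

pc1 = project_onto_pattern(model_mean, pattern)   # analogue of PC1
opc1 = project_onto_pattern(obs, pattern)         # analogue of oPC1
temporal_r = float(np.corrcoef(pc1, opc1)[0, 1])  # analogue of the PC1-oPC1 correlation
```

On a regular latitude-longitude grid, area weighting (for example by the cosine of latitude) is a common refinement that this sketch leaves out.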
The high correlation between the model and Obs time series (which is affected by their consistent trends) and the differing features of this predictable pattern and the associated observed signal (Fig. 5c) indicate that the third predictable pattern may be a temperature trend mode. To verify this conjecture, we compared Fig. 4e with Fig. 3b and Fig. 5c with Fig. 3a; the two pattern correlation coefficients are -0.74 and -0.52, respectively (the negative values are due to the decreasing trend). The similarity between the third predictable pattern and the trend pattern supports our conjecture to a certain extent.

Drivers and mechanisms
It is known that high-pressure and low-pressure systems in the atmosphere are the main factors affecting high and low temperatures. The anticyclones (cyclones) in high-pressure (low-pressure) systems and the accompanying sinking (rising) airflow reduce (increase) cloud cover. Sunny (cloudy) weather causes the ground to receive more (less) solar radiation, thereby heating (cooling) the surface. In addition, the sunny situation means less precipitation and evaporation, which further reduces cloud cover and reinforces the anomaly (Wang et al. 2017, 2018; Deng et al. 2019). To investigate whether the predictable patterns of surface air temperature in the ECMWF model match the circulation anomalies, and whether the above physical mechanisms can be reproduced, we analyzed the relationship between the modes and the above factors. Fig. 6 shows the regression maps of the ensemble-mean surface pressure (SP), total precipitation (TP) and downward surface solar (shortwave) radiation (SSRD) forecasted by the ECMWF model onto the PCs.
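Regression maps of this kind can be computed grid point by grid point. The sketch below assumes a simple least-squares regression of each grid-point series onto the standardized PC, together with the corresponding correlation map; the function names and the toy data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def regression_map(field, pc):
    """Slope of each grid point of field (time, space) per one standard
    deviation of the time series pc (time,)."""
    z = (pc - pc.mean()) / pc.std()
    anomalies = field - field.mean(axis=0)
    return anomalies.T @ z / z.size

def correlation_map(field, pc):
    """Pearson correlation of each grid point with pc.

    Grid points with zero variance would give a division by zero and
    should be masked beforehand.
    """
    anomalies = field - field.mean(axis=0)
    return regression_map(field, pc) / anomalies.std(axis=0)

# Toy example: three grid points that depend linearly on a 21-year series.
rng = np.random.default_rng(1)
pc = rng.standard_normal(21)
slope = np.array([2.0, -1.0, 0.5])
field = np.outer(pc, slope) + 10.0

reg = regression_map(field, pc)    # equals pc.std() * slope for this toy field
cor = correlation_map(field, pc)   # equals the sign of each slope here
```

Standardizing the PC first means that the map is expressed in the units of the field per standard deviation of the PC, which makes maps for different variables easier to compare.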
For the first mode, a large positive surface pressure anomaly occupies the low-latitude area (Fig. 6a), with less TP and more SSRD. The northern area has clearly more precipitation and less solar radiation. Low water vapor conditions from the Bay of Bengal and the South China Sea to SC are similar to the water vapor conditions of the southwest monsoon that cause high temperatures in SC (Chen and Lu 2015). For the second mode, there is a strong low pressure over most parts of northern China. More precipitation appears in NC and SC, and the anomalous dryness in most of the remaining areas, including the middle and lower reaches of the YRV, appears smaller than the positive center in the temperature mode. The SSRD shows a pattern similar to MSN EOF2, with significant positive correlations. For the third mode, the correlation with SP is not significant.
Less water vapor and positive SSRD anomalies are seen over southeast China. These results are consistent with the previous conclusion that anomalously high temperature is accompanied by an increase in SSRD and a decrease in TP. From the analysis of surface pressure, the circulation anomalies related to the first mode are mainly the strong positive pressure in the south, and the second mode
What interests us is why the circulation anomalies associated with the predictable T2m modes occur in these regions, what the precursor signals and influencing processes are in the ECMWF model, and whether these actually exist in Obs. Fig. 7 shows the regression maps of SST, 500-hPa geopotential height (GH500) and wind onto PC1. Because the 5880 line in the GH500 predicted by ECMWF does not appear in the selected area (possibly because of model bias), we use the 5860 line instead of the 5880 line to indicate the extent of the WPSH. The most obvious feature in Fig. 7a-c is the anomalously warm SSTAs in the central and eastern Pacific in the preceding winter, which gradually disappear in the following spring and summer; this is typical of the transition from a preceding El Niño to La Niña. Fig. 8a shows that the correlation coefficient between the preceding-winter NINO3.4 index and PC1 reaches 0.36, which exceeds the 90% confidence level, indicating that El Niño is the major external forcing associated with the first predictable pattern. Previous studies have shown that warm SST in the equatorial eastern Pacific in the preceding winter causes an anomalous anticyclone in the western Pacific in summer through deep convection, thereby affecting the climate of East Asia (Chang et al. 2000; Wang et al. 2000).
The regression map of GH500 (Fig. 7d) shows a positive geopotential height anomaly from southeast China to the western Pacific, and the WPSH extends further west than its climatological position. The southwesterly flow to the west of the anticyclone brings water vapor from the Bay of Bengal and the South China Sea to the area north of the Yangtze River, causing more precipitation in the north, in agreement with Fig. 6d. The regression maps of the circulation anomalies onto Nino3.4 (Fig. 7e) show consistent characteristics. Although these results are similar to previous observational studies, it is worth noting that these circulation anomalies do not pass the significance test. To analyze the reasons further, we calculated the correlations among the observed and model PCs, the index of the western ridge point of the WPSH (WPSH_wrpi), and the Nino3.4 index. The high correlation in Obs (Fig. 8b) is consistent with previous studies, confirming that El Niño causes the temperature anomalies in the first mode by affecting the WPSH. The low correlation between the SSTA and the WPSH in ECMWF (Fig. 8c) shows that the model has difficulty reproducing the air-sea interaction between the WPSH and El Niño, which obscures the role of the WPSH as an intermediary between the first predictable mode and El Niño.
can also strengthen the WPSH in two ways: by affecting the easterly wind anomalies from the equatorial Indian Ocean to the western Pacific and by reducing the latent heat in the equatorial central Pacific (Lu and Dong 2005; Hong et al. 2014). Although both of them promote the WPSH, they are also affected by El Niño. In summary, the predictable source of the first predictable mode in ECMWF is El Niño, but, compared with Obs, there is no obvious correlation between the SSTAs and the WPSH, which may be the key to improving forecasts.
Pacific regions during the same period, but they were not obvious from the previous winter to spring.
We think these local SSTAs are formed as a response to the atmosphere rather than being preceding external forcing signals. The large-scale atmospheric circulation in Fig. 10a shows that these SSTA centers correspond to the areas of 200-hPa geopotential height (GH200) anomalies, and it is interesting that the circulation characteristics resemble the circulation mode associated with high air temperature in the YRV described by Deng et al. (2019). As shown in Fig. 10a and in Deng et al. (2019), there is an anomalous low pressure over the high-latitude North Atlantic Ocean (north of 60°N) and a high pressure extending from the mid-latitude North Atlantic Ocean to the north of Russia. There is also a "positive-negative-positive" wave train propagating southeastward from northwestern Russia to East Asia. The upper-level high-pressure center over Northeast Asia affects the local lower-level circulation by baroclinic convection (Deng et al. 2019). The wave activity flux in Fig. 10a shows two wave trains, one propagating northeastward from the North Atlantic and one propagating southeastward from northwestern Russia.
The circulation pattern over the North Atlantic, located at the source of the wave train, is similar to the North Atlantic Oscillation (NAO). The positive phase of the NAO presents a positive pressure anomaly at 30-60°N and a negative pressure anomaly at 60-80°N (Ambaum et al. 2001), the same as in Fig. 10a. The regression maps of the circulation onto the NAO are consistent with the regression maps onto PC2, and the geopotential height anomalies are even more pronounced. Fig. 10c shows that the correlation coefficient between the NAO and PC2 is 0.42, exceeding the 95% confidence level, indicating that the NAO is the main signal source of the second mode in the ECMWF model. The correlation coefficient between the NAO and oPC2 is 0.51, so this signal appears to be stronger in Obs.
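The confidence levels quoted for these correlation coefficients can be related to a Student's t statistic. The sketch below assumes that the yearly values are independent and that the sample size is $N = 21$ (the length of the 1999-2019 record); the specific test applied in this study is not stated, so this is only an illustration.

```latex
t = r\,\sqrt{\frac{N-2}{1-r^{2}}}, \qquad N = 21:\quad
r = 0.42 \;\Rightarrow\; t \approx 2.02, \qquad
r = 0.51 \;\Rightarrow\; t \approx 2.58
```

The statistic would then be compared with the critical value of the t distribution with $N - 2 = 19$ degrees of freedom for the chosen one-sided or two-sided test.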
Although our main focus is July T2m, we also analyzed the data for June and August (figures not shown). Compared with July, ECMWF's forecast skill for T2m in June and August is lower. However, the predictable patterns and corresponding circulation anomalies have the same characteristics as in July, and El Niño and the NAO are still the most important signals.

Summary and discussion
This study investigates the performance of the ECMWF model, which has the highest forecast skill among the S2S models, in forecasting midsummer T2m over Eastern China. It further analyzes the predictable patterns of temperature anomalies based on the MSN EOF method, together with the corresponding predictable sources and underlying mechanisms. The ECMWF model and Obs show highly similar patterns in the climatological mean and standard deviation of T2m. Both the highest mean temperature and the largest standard deviations are located in the middle and lower reaches of the YRV, with low temperatures and small standard deviations in the north, but ECMWF underestimates the climatological mean and interannual variability in southeast China. ECMWF and Obs have very different patterns of temperature trends. Obs shows a weakening trend in NC and SC and an increasing trend in the middle Huanghuai Basin, whereas the model shows an increasing trend in most areas, most intense in the north.
The first predictable pattern of T2m in the ECMWF model is a dipole with lower temperature in the north and higher temperature in the south, and it is significantly correlated with ENSO. In the transition phase from El Niño to La Niña, the WPSH is stronger and extends further west, and southern China is controlled by an anomalous anticyclone. At the same time, the southwest monsoon brings more water vapor to the north of the YRV. Together these create a temperature mode that is colder in the north and warmer in the south. Although ECMWF captures the relationship between El Niño and this temperature mode, the weak correlation between the WPSH and the SSTAs prevents it from fully reproducing the physical process. The second mode is mainly a strong positive pattern in the YRV, and the NAO is its most important precursor signal. In the positive phase of the NAO, there is a dipole circulation pattern over the North Atlantic Ocean. The anomalous high pressure extends to northwestern Russia and subsequently induces an anomalous wave train propagating to the southeast. Under its influence, an anomalous high pressure forms from the lower reaches of the YRV to Northeast Asia, which causes the temperature anomalies in the second mode. For the two leading modes, high temperature is accompanied by local high pressure, reduced precipitation and increased solar radiation. The sinking airflow and reduced cloud cover strengthen the diabatic heating and increase the surface temperature, while less precipitation reinforces this process, and vice versa. The third mode is mainly the temperature trend term predicted by the model. Although the ECMWF S2S model has high skill beyond the weather timescale and captures the precursor signal sources corresponding to the predictable patterns of T2m, there are still ambiguous or unknown mechanisms within the model, which indicates potential for improvement in dynamical or statistical prediction on the S2S time scale. For example, Fig. 8c clearly shows that the relationship between the ECMWF model and Obs for the WPSH has undergone obvious interdecadal changes, with clear synchronization in the middle period but almost opposite interannual changes after 2013. In addition, surface air temperature may be affected by many other factors; for example, Arctic warming affects different modes of temperature by changing the propagation speed of Rossby waves (Francis and Vavrus 2012; Screen and Simmonds 2014), and the third (trend) mode may be affected by anthropogenic factors such as greenhouse gases (Kang and Eltahir 2018). Prediction on the extended-range timescale will remain a frontier of atmospheric research for a long time.