A preliminary attempt on decadal prediction of the East Asian summer monsoon

The East Asian summer monsoon (EASM) is one of the major synoptic systems that affect the summer climate in China. Decadal prediction of the EASM is of great significance, yet few attempts have been made by far. This study represents a preliminary attempt that uses the decadal increment method to predict the decadal variability of the EASM. The 3-year increment of the decadal EASM (DI_EASMI) is firstly predicted by the leading 4 years tropical ocean and the leading 5 years sea ice over Barents Sea. These two leading predictors with quasi-10-year oscillation can explain most variance of the DI_EASMI, especially in recent decades. The predicted DI_EASMI is combined with the observed EASM at 3 years ago to get the final prediction result. The results of cross-validation and independent hindcast show that the decadal increment method can well predict decadal variability of the EASM during the recent century. The real-time prediction results show that the chance for the occurrence of strong decadal EASM would be rare in 2021. The method developed in the present study provides a new approach for decadal prediction of the EASM.


Introduction
The East Asian summer monsoon (EASM) refers to the southerly winds that prevail in East Asia in summer. As a unique component of the Asian climate system, the EASM demonstrates special spatiotemporal structure and dominates climate and climate change in East Asia (Tao and Chen 1987;Ding 1994;Huang et al. 2004a, b). Due to its interactions with the climate system on multiple time scales, the EASM shows large variations from intraseasonal to interdecadal scale. The interaction between the EASM and the climate system often leads to anomalies of the EASM, which can subsequently cause heavy droughts and floods in China. Such kinds of meteorological disasters impose serious impacts on people's lives and economic activities in East Asia. Therefore, reliable prediction of the EASM changes is of great significance (Huang et al. 2007;Ha et al. 2012). On the intraseasonal scale, the EASM often demonstrates 10-20-day and 30-60-day oscillation patterns (Chen et al. 2001;Mao and Chan 2005;Guan and Chan 2006). On the interannual scale, 2-year and 4-year oscillations account for a large part of the EASM variability. These intraseasonal and interannual variabilities of the EASM have remarkable impacts on summer rainfall in China and large-scale circulation in East Asia (Meehl and Arblaster 2002;Wu et al. 2008;Ding et al. 2013). Decadal climate change is defined as changes on the time scale of 10-30-year in the future. In the twentieth century, the EASM has undergone significant decadal changes. The weakening of the EASM that occurred at the end of the 1970s (Wang 2001(Wang , 2002Xue 2001;Huang et al. 2004a, b;Ding et al. 2008) was characterized by weaker than normal south-westerly winds in the surface level and at 850 hPa and a weakened easterly jet in the upper troposphere (Hu 1997;Xu et al. 2006;Ding et al. 2008;Zhou et al. 2009). Meanwhile, the western Pacific subtropical high retreated to the east Huang and Li 2015;Tong et al. 2020). In the beginning of the 1990s, the EASM started to intensify (Liu et al. 2012;Ding et al. 2013).
Corresponding to the aforementioned decadal changes in the EASM, summer precipitation in China has also experienced significant decadal changes. From the middle of the 1960s to the end of the 1970s, the rain belts in eastern China were located in North China and South China, while the Jianghuai and Yangtze River basins had less than normal precipitation. The spatial distribution of rainfall anomaly displayed a "positive-negative-positive" pattern (" + − + " meridional pattern) along the meridional direction. In the late 1970s, following the EASM weakening, the rainfall zone shifted to the Yangtze River basin, while precipitation in North China and South China decreased. Precipitation anomaly presented a "negative-positive-negative" pattern (" − + − " pattern) along the meridional direction. By the early 1990s, with the intensification of the summer monsoon, the main rainfall zone moved northward again, and a dipole-type of rainfall pattern prevailed in eastern China with floods in the north and droughts in the south (Wang 2001;Ding et al. 2008;Ding et al. 2018;Si et al. 2009;Zhu et al. 2011;Huang et al. 2011;Lü et al. 2014;Zhang et al. 2021).
Many scientists have conducted research to explore the mechanisms for the decadal variability of the EASM. At the end of the 1970s, it was found that a persistent interdecadal scale cooling occurred in the upper troposphere and lower stratosphere. On the one hand, it affected the cyclonic circulation anomaly in the upper levels and promoted the strengthening of the westerly winds to the south of the East Asian jet axis. On the other hand, it influenced the anticyclonic circulation anomaly in the lower levels and weakened the EASM (Yu et al. 2004, Yu andZhou 2007) . Based on results of numerical experiments, Li et al. (2008) demonstrated that the decadal weakening of the EASM is attributed to the decadal change in the tropical sea surface temperature (SST). Changes in the SST over the tropical central and eastern Pacific led to tropospheric warming above the tropical ocean in the late 1970s. At the same time, temperature in the temperate zone of continental East Asia decreased. As a result, the thermal contrast between the ocean and the land reduced, which subsequently weakened the EASM. Yu (2013) believed that the negative correlation between the phases of the Pacific Decadal Oscillation (PDO) and the EASM is the reason for the decadal weakening of the EASM since the late 1970s. The phase change of the PDO caused SST variation in the Pacific and Indian oceans, which affected the intensity of the western Pacific subtropical high and led to the anticyclone anomaly over the western Pacific. As a result, the EASM was weakened (Gong and Ho 2002;Dong and Xue 2016). In addition, Ding et al. (2008Ding et al. ( , 2009 found that the weakening of the atmospheric heating over the Tibet Plateau since the late 1970s has effectively reduced sensible heat flux transfer from the surface to the atmosphere, leading to increased snowfall in the previous winter and spring. Larger than normal snow cover in the Tibet Plateau delayed the formation of monsoon temperature gradient, and thereby weakened the EASM. Other factors such as North Atlantic SST tripole and North Pacific Gyre Oscillation can also affect the decadal variability of the EASM (Zuo et al. 2013;Ye et al. 2016).
In recent years, extensive attention has been focused on decadal climate prediction, which potentially has great impacts on economic and social development (Meehl et al. 2014;Zhou and Wu 2017) . Decadal climate prediction at present mainly relies on the initialized climate models, yet the initial shock is still a problem in the model initialization that has not been fully solved (Meehl et al. 2014;Zhou and Wu 2017). Compared with that in the North Pacific, the initialization in decadal prediction over the North Atlantic has been remarkably improved (Doblas-Reyes et al. 2013;Kirtman et al. 2013). Most of the current models have demonstrated satisfactory skills for prediction of the Atlantic Multidecadal Oscillation, whereas the model skills for PDO prediction are generally low (Kim et al. 2012). In addition, the model performance for the prediction of land surface temperature in the northern hemisphere remains poor. Compared with the uninitialized prediction skills, the model performance shows no obvious improvement after implementing initialization strategies (Doblas-Reyes et al. 2013;Wu et al. 2019). Also, decadal prediction of precipitation by climate models still has problems. The models only show prediction skills over some areas, and the improvements are quite limited even after the initialization strategy is implemented (Kirtman et al. 2013;Meehl et al. 2014;Wang et al. 2018). Therefore, decadal climate prediction by current climate models still remains a challenging issue.
Based on the interannual increment approach (Fan et al. 2008;Wang et al. 2012), Huang and Wang (2020a, b) proposed a decadal increment method to address climate prediction. In this method, decadal signals are first obtained from moving averages of the raw data, and the decadal increments are then used to identify the predictors to build the forecast model. Finally, the predicted increment is combined with the previous observations to get the final prediction result. This method helps to increase the effective samples and obtain more useful decadal signals of the climate system from previous observations. Great progress has been made in the predictions of PDO and decadal variability of summer precipitation in North China using the decadal increment method. However, it still remains questionable whether this method can be effectively applied to the prediction of other climate variables. Taking into account the close relationship between the EASM and the summer climate in China and the limited predictive skills of current climate models for precipitation and land surface temperature, this study attempts to predict decadal variability of the EASM using the decadal increment method.
The rest of this article is organized as follows. Section 2 describes the datasets and method. Section 3 introduces how to use the decadal increment to build a statistical model. The hindcast skills of the statistical model based on the decadal incremental method are presented in Sect. 4. The discussion is given in Section 6. Finally, Section 7 presents the summary.

Data and methods
Monthly mean reanalysis datasets used in this study include the following: Centre with a horizontal resolution of 1° × 1° and covers the period from 1910-2020 (Rayner et al. 2003).
There are few datasets covering the period from hundred years ago to current and the 20CR datasets used in this paper end at 2015. In order to predict in real time, the 20CR datasets before 1979 and NCEP2 datasets for the rest of years are combined.
The EASM index (EASMI) is defined as the normalized area-mean (110°E-125°E, 20°N-40°N) wind speed at 850 hPa in summer (June-July-August) (Wang 2002). Based on the decadal increment method proposed by Huang and Wang (2020a, b), the statistical model is established. First, decadal EASMI is obtained from the 5-year running mean of the original EASMI and marked as EASMI in the middle year of the 5 years. Decadal increment (DI) of the EASMI (DI_EASMI) is then calculated from decadal EASMI; that is, the decadal EASMI of the current year minus the decadal EASMI 3 years ago will be treated as DI_EASMI of the current year (Eq. 1). Finally, predictors are identified based on DI_EASMI and a prediction model is built using these predictors. The final predicted EASMI is the predicted DI_EASMI plus the observed EASMI 3 years ago (Eq. 2). For example, the final predicted EASMI in 1924 is obtained by adding the predicted DI_EASMI in 1924 to the observed EASMI in 1921. Similar to the DI of the predictand, the DI of the predictor is obtained by performing the 5-year running mean first and then calculating the 3-year increment. To avoid possible containing of the information of the prediction period in the predictors, the predictors have to lead the predictand by at least 3 years. For example, the summer DI of the SST (DI_SST) from 1918 to 2015 and the DI_EASMI from 1921 to 2018 are used for correlation analysis to find key predictors, and so on.
In order to judge whether climate variables follow normal distribution, the skewness coefficient (g1) and kurtosis coefficient (g2) are calculated.
where n is the number of the sample,‾x and s are the mean and standard deviation of the sample. The skewness coefficient characterizes the degree to which the peak of the curve deviates from the mean value. A positive g1 indicates that the mean is to the left of the peak, and a negative g1 indicates that the mean is to the right of the peak. The kurtosis coefficient characterizes the convexity of the peak of the distribution pattern and measures the degree of concentration of the frequency distribution. A positive g2 indicates that the frequency distribution is more concentrated than the normal distribution, and the average is more representative, and a negative g2 is the opposite.
Cross-validation and independent hindcast are used to verify the predictive skill of the established empirical statistical model. In the cross-validation, the EASM observational data for the period 1921-2018 and the predicted EASM data 3-5 years ahead of the observation time are selected first, and the 5-year data centred on the target year are excluded. The data in the remaining years are then used to build the model to predict the EASM in the target year. This process is repeated for each target year. The first and last three years of the data are verified by leaving the first and last five years of data out. The independent hindcast uses the same observational data. To avoid using the information of the prediction period, the data from the 64 years to 3 years ahead of the target year are used as the training samples to establish the empirical statistical model to predict the situation in the target year. The aforementioned process is repeated for . For example, if the target year is 1990 (1991), the data from 1926 to 1987 (1927-1988) is used to build the prediction model. The significance of the correlation coefficient is tested by Student's t-test. The effective sample size N* is computed (Bretherton et al. 1999): where N is the number of available time steps and r x and r y are the autocorrelation coefficients of the two correlated variables lagging one step behind.
The mean square skill score (MSSS) is used to test the predictive skill of the model (Murphy, 1988;Goddard et al. 2012). The MSSS algorithm is written as: where fi and o i represent the time series of observations and forecasts, respectively. MSSS reflects the percentage reduction of the mean square error (MSE) predicted by the statistical model, and MSE C is the MSE of the "climatological forecast." A positive MSSS indicates that the statistical model prediction is better than that of the "climatological forecast," and a negative MSSS indicates that the model forecast is inferior to the "climatological forecast."

Prediction model
Based on the definition of the EASMI, decadal EASMI is calculated. The time series of EASMI for the period 1918-2020 is displayed in Fig. 1a, which shows that the EASM was in a positive phase from 1920 to the mid-1930s and then switched to a negative phase from the mid-1930s to the mid-1940s. The EASM was relatively strong during the period from the 1950s to the 1960s, but it suddenly weakened at the end of the 1960s and the 1970s. This result is consistent with studies of Wang (2001), Guo et al. (2003), Huang et al. (2004a, b), and Ding et al. (2008). In the early 1990s, the EASM began to intensify again. This phenomenon is also found in Liu et al. (2012) and Ding et al. (2013). It returns to negative phase from the late 1990s until now.
The basic assumption of climate statistics analysis is that the climate variables follow a normal distribution. When the skewness coefficient (g1) and kurtosis coefficient (g2) of the data are both less than 1.96, the data can be treated as following normal distribution at the 95% confidence level. For the decadal EASMI (Fig. 2), both the skewness coefficient and the kurtosis coefficient are higher than 1.96, which means right shifting skewness distribution than normal distribution and steeper kurtosis distribution. The 1-year and 2-year increments of decadal EASMI also do not satisfy the normal distribution. Interestingly, the 3-year increment follows normal distribution at the 95% confidence level (Fig. 2). Therefore, the 3-year increment is selected for further analysis.
This paper implements the decadal increment method and uses the predictors in the decadal increment form to establish the empirical statistical model for DI_EASMI prediction. First, based on findings of previous studies, potential predictors for the decadal prediction are mainly selected from the external forcings which includes SST, SIC, snow cover, soil temperature, and so on. They are 3-5 years ahead of the DI_EASMI, and the number of which are far more than the final predictors. The stepwise regression method is then used to select predictors that are significantly related to DI_EASMI. And the selected predictors must be independent of each other. The finally built prediction models during 1921-2018 and 1985-2018 are expressed by The total explained DI_EASMI variance is 55% for the entire predict period of 1921-2018, but can increase to 73% during recent decades .
The first predictor selected in this paper is the tropical ocean temperature signal in summer that leads DI_EASMI by 4 years. Figure 3a shows the spatial pattern of the correlation coefficient between DI_SST and DI_EASMI. The negative center appears in the western equatorial Pacific, while the positive center is in the central and eastern Pacific. To quantify this characteristic relationship, area-weighted regionally averaged DI_SST in these two regions (multiplied by − 1 in the negative region) is defined as the sea surface temperature index (DI_SSTI). The ranges of positive and negative regions are (14°S-4°N, 170°E-160°W) and (2°-14°N, 133°E-153°E), respectively. The DI_SSTI has a quasi-10-year cycle (Fig. 4a), accompanied by a 4-6-year significant negative autocorrelation (Fig. 5a), which means SST signal can last 4-6 years. Therefore, the warmer equatorial western Pacific and the colder equatorial central and eastern Pacific are conducive to the enhancement of DI_ EASMI. Physically, the equatorial central and eastern Pacific (9) DI_EASMI = 0.57 * DI_SSTI + 0.47 * DI_SIC (10) DI_EASMI = 0.54 * DI_SSTI + 0.66 * DI_SIC warming is favorable for the EASM weakening because anomalous ascending motions develop above the equatorial central-eastern Pacific while descending motions occur over the subtropical western Pacific. The above circulation is the so-called "quasi-Walker circulation." At the same time, an enhanced East Asian Hadley Cell is induced, which leads to a weaker EASM (Ying and Sun 2000;Wu et al. 2003;Zeng et al. 2007;He et al. 2015). Additionally, the western Pacific warming can strengthen the EASM through changing in situ precipitation and further triggering descending atmospheric Rossby waves (Zhang et al. 1999;Wang et al. 2000a, b;Wang et al. 2013;Huang et al. 2018).
The other predictor is the leading 5-year Barents Sea SIC (77°N-80°N, 35°E-55°E) in autumn (Fig. 3b). It has a 10-year cycle (Fig. 4b) and a significant negative autocorrelation of 4-5 years (Fig. 5b), which means SIC signal can last 4-5 years. The changes of SIC are highly correlated with the variability of the EASM (Zhao et al. 2004;Guo et al. 2013;Lin and Li 2018). The variability of SIC may trigger quasi-stationary Rossby waves through direct thermal forcing and dynamic processes in the atmosphere. The Rossby waves transfer energy to East Asia and affect the climate here (Wu et al. 2009a, b;Zhang and Wu 2011). In addition, the SIC anomaly is closely related to changes in the extent of Eurasia snow cover (Cohen et al. 2012;, which has an impact on EASM (Wu et al. 2009c) through affecting the soil moisture (Zhang and Zuo 2011). Figure 6 shows time series of DI for each predictor and DI_EASMI. The correlation coefficients of DI_EASMI with DI_SSTI and DI_SIC are 0.71 and 0.47, respectively, both of which are significant at the 99% confidence level by the Student's t-test. The variabilities of the predictors are relatively consistent with DI_EASMI. Generally, predictors are stably and significantly correlated with DI_ EASMI across most study period (Fig. 7). Due to the long study period, the correlation between each predictor and Fig. 1 a Decadal EASMI during 1918and b DI_EASMI during 1921 DI_EASMI may present decadal changes. The correlation between the DI_SSTI and DI_EASMI dropped abruptly after 1950 and became insignificant from the early 1970s to the mid-1980s (Fig. 7a), which is possibly due to the insignificant quasi-10-year wavelet power (Fig. 4a). Similarly, the insignificant quasi-10-year cycle of DI_SIC may be responsible for the dropped correlation between the DI_SIC and DI_EASMI before 1950 ( Fig. 4b and Fig. 7b). Therefore, the quasi-10-year decadal variabilities of DI of SSTI and SIC may play important roles in the decadal prediction of the EASM.

Prediction effect
To evaluate the performance of the established statistical model, cross-validation and independent hindcast are conducted in the present study. Results of the cross-validation are displayed in Fig. 8a, which shows that the correlation coefficient between the predicted DI_EASMI and the observation for the period 1921-2018 is 0.71, which is significant at the 99% confidence level by the Student's t-test. MSSS is 0.50. The predicted DI_EASMI well captures the variability shown in observations during the study period except for the late 1930s, the mid-1950s and the 1980s, when decadal variability of the related relationships occurred. This result indicates that the empirical statistical model developed in the present study has a high predictive skill. The DI_EASMI predicted by the empirical statistical model is added to the leading 3-year observed EASMI to obtain predicted decadal EASMI. Except for the inconsistency that existed in the late 1940s to the mid-1950s, and from the late 1970s to the mid-1980s, the predicted EASMI realistically reproduces the positive phase over the period from the 1920s to the 1930s, the negative phase from around the late 1930s to the 1940s, the positive phase since the 1960s, the weaker EASM in the mid-1980s, the intensification of the EASM in the early 1990s and the weakening in the late 1990s. The amplitude of the EASM oscillation is also quite consistent with observations. Furthermore, the correlation coefficient between the observed and predicted EASMI is 0.88 and MSSS is 0.71 ( Fig. 8c and Table 1). The above results show that the decadal increment method has a high hindcast skill.
The hindcast skill is further verified based on the independent hindcast of the DI_EASMI from 1985 to 2018. Figure 8b shows that the variability displayed in observations Fig. 2 Skewness coefficient (red bar) and kurtosis coefficient (blue bar) of decadal EASMI and DI_EASMI. "0" represents decadal EASMI; "1," "2," and "3" represent 1-year, 2-year, and 3-year increment of decadal EASMI, respectively is basically grasped in most years, and the correlation coefficient between the observed and the predicted DI_EASMI is 0.85, which is above the 99% confidence level by the Student's t-test. MSSS is 0.71. The variability and phase of the predicted EASMI are relatively consistent with observations, including the transition from positive to negative EASMI in the late 1990s. The correlation coefficient between the observations and predictions of the EASMI is 0.83, and MSSS is 0.64 ( Fig. 8d and Table 1).
Regardless of cross-validation or independent hindcast, the prediction model shows good prediction results, which increases our confidence in operating the real time prediction of EASM. Therefore, we further use this model to predict the EASM in 2021 and 2022. According to the results of independent hindcast, DI_EASMI from 2019 to 2021 are all near the value of 0, and the EASMI in 2021 is significantly greater than in 2019 and 2020, but it is still in a negative phase.

Discussion
To compare with the decadal increment method, the empirical statistical model in the original form is established, and the predictors are selected over the same areas as that selected for the decadal increment model (Eq. 9).
The correlation coefficients of EASMI with SSTI and SIC are 0.69 and 0.45. Results of the cross-validation show that the correlation coefficient between observations and predictions of the EASMI by the empirical statistical model in the original form is 0.60 and MSSS is 0.35 ( Fig. 9a and Table 1), both of which are lower than that between observations and forecast of the empirical statistical model for decadal although it reproduces the negative phase around 1940, the positive phase in the early 1960s, the positive phase in the early 1990s, and the negative phase in the late 1990s (Fig. 9a). In the independent hindcast, the correlation coefficient between the predicted and the observed EASM from 1985 to 2018 is 0.45 and the variability of the prediction is similar to the observation ( Fig. 9b and Table 1). However, the amplitude of the prediction is different from the observation, and MSSS is − 0.22 (Table 1). This result indicates that the hindcast skill of the statistical model is quite limited. Therefore, using the decadal increment method to build a statistical model is helpful to improve the skill for the EASM prediction.

Summary
Based on previous research of decadal changes in the EASM and analysis of the relationship between the DI of predictors and DI_EASMI, the present study develops the empirical statistical prediction model that is combined with the decadal increment method to predict decadal variability of the EASM.
The statistical model is valuable when the relationships between the predictand and the predictors remain stable during the study period. The process of model building is illustrated in Fig. 10. First, based on the 5-year running mean EASMI, the 3-year decadal DI_EASMI is calculated, that is, the running mean EASMI of the current year minus the EASMI 3 years ago. The leading 4-year summer DI_SSTI over the equatorial Pacific and the leading 5-year autumn DI_SIC over the Barents Sea are then selected as predictors. Note that these predictors remain independent of each other. Compared with the method of selecting the original  SSTI, b SIC). The dotted lines are at the 90% and 95% confidence level by the Student's t-test, and the effective sample numbers are 24 and 24, respectively variables directly, the incremental method increases effective samples in the correlation analysis and obtains more useful decadal signals in the climate system from previous observations. Finally, the selected predictors are used to build the statistical model, and the final predicted EASMI is the predicted DI_EASMI plus the observed EASMI 3 years ago. Results of cross-validation from 1921 to 2018 indicate that the variability and amplitude of the predicted EASMI are relatively consistent with the observed EASMI. Their correlation coefficient is 0.88, and MSSS is 0.71. The independent hindcast during the period 1985 to 2018 also shows that the variability and amplitude of the predicted EASMI are consistent with the observed EASMI. Their correlation coefficient is 0.83, and MSSS is 0.64. The above results indicate that the decadal incremental method can well reproduce the characteristics of decadal variability of the EASM. Compared with the statistical model in the original form, the prediction skills of the model developed in this study are also improved, which makes it a great tool for future prediction of decadal changes in the EASM. Therefore, the model is further used to predict the EASM in 2021. The results show that EASM is in a negative phase in 2021, so the chance for the occurrence of strong decadal EASM would be rare.
Using the decadal increment method, preliminary attempts have been made to predict decadal variability in the EASM, and satisfactory results have been achieved. This study provides a new solution for the prediction of decadal variability. In the future, machine learning combined with this method can replace multiple linear regression for the prediction of other climate phenomena, such as decadal variability of drought in North China. Since  the predictive effect of the decadal increment method is better than that of the original form, it can also be applied for decadal prediction in climate dynamic models, which may further improve the predictive skill of climate models.