An Optimum Initial Manifold for Improved Skill and Lead in Long-range Forecasting of Monsoon Variability

Using an initial manifold approach, an ensemble forecast methodology is shown to simultaneously increase lead and realizable skill in long-range forecasting of the monsoon over continental India. The initial manifold approach distinguishes initial states that have coherence from a collection of unrelated states. In this work, an optimized and validated variable-resolution general circulation model is adopted for long-range forecasting of the monsoon using the multi-lead ensemble methodology. In terms of realizable (as against potential) skill at the resolution (~60 km) and lead (2–5 months) considered here, the present method performs very well. The skill of the improved methodology is significant, capturing 9 of the 12 extreme monsoon years during 1980–2003 at the seasonal (June-August) scale. Eight-member ensemble-average hindcasts, carried out for realizable skill with leads of 2 (for June) to 5 (for August) months, and an optimum ensemble are presented.


Introduction
The conventional formalism generally implies a loss of forecasting skill with increasing lead in the long-range prediction of rainfall (Lorenz 1965; Molteni and Palmer 1993; Goswami 1998). Although ensemble forecasting has been applied extensively in short-range forecasting (Toth and Kalnay 1997 The difficulties in improving the range or accuracy of forecasts are part of the general challenge of predictability; in particular, it is generally accepted that an increase in the range of a forecast leads to poorer accuracy due to the growth of error (Lorenz 1965; Molteni and Palmer 1993). It may therefore seem counterintuitive to attempt a simultaneous increase in the lead and accuracy of a forecast. It has been recognized that improved forecast methodology, such as bias correction and multi-model ensembles, can significantly improve forecast skill (Kirtman and Shukla 2002; Wang et al. 2004; Yun et al. 2003). Besides, predictability at long range also depends on the scale and the degrees of freedom; it has been shown that non-linearity in the presence of a large number of degrees of freedom can improve predictability (Kang and Shukla 2006).
A primary source of error in forecasts is uncertainty in the initial conditions. A well-known methodology to quantify and reduce this uncertainty is ensemble forecasting (Toth and Kalnay 1997; Toth and Kalnay 1993; Buizza 1997). In terms of implementation, ensemble forecasting generates a number of simulations using multiple-lead initial conditions such that the average of the ensemble is more accurate than the simulation (forecast) obtained from any individual member of the ensemble. Generally, researchers consider a large sample of simulations. The spread of the ensemble members carries qualitative information that increases the reliability of the forecast and, in turn, provides a good basis for probabilistic forecasting. Thus, unlike a classical forecast, where the system evolves from a given initial state (a point in phase space), in an ensemble forecast the system evolves from a neighborhood of states.
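As a minimal illustration of ensemble averaging (not the model configuration used in this study), the sketch below averages three hypothetical member forecasts and compares root-mean-square errors. All numbers and the function names `ensemble_mean` and `rmse` are assumptions made for illustration. In this toy case the ensemble mean has a smaller error than every member, which is not guaranteed in general.

```python
def ensemble_mean(members):
    """Average the member forecasts element-wise (one value per year)."""
    n = len(members)
    return [sum(values) / n for values in zip(*members)]


def rmse(forecast, observed):
    """Root-mean-square error of a forecast series against observations."""
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical rainfall anomalies (arbitrary units), one value per year.
observed = [1.0, -2.0, 0.5, 3.0]
members = [
    [2.0, -1.0, 1.5, 2.0],
    [0.0, -3.5, -0.5, 4.5],
    [1.5, -1.0, 1.0, 2.0],
]

mean_forecast = ensemble_mean(members)
member_rmse = [rmse(m, observed) for m in members]
mean_rmse = rmse(mean_forecast, observed)
```

Member errors of opposite sign partly cancel in the average, which is the mechanism the ensemble approach relies on.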
Conventional ensemble forecasting can be said to assume that all initial conditions (with comparable leads) in a given neighborhood of state space are equivalent (Buizza et al. 1999). This is true for an isolated system, like a non-linear oscillator, whose initial state can be prepared by the observer. It also appears reasonable for short-range forecasting, for which the day-to-day synoptic noise can overwhelm any (weaker) underlying structure (Moron et al. 2006). However, it may not be valid for processes like the Indian summer monsoon (ISM), which is part of a continuous global dynamics. The ISM region is mainly characterized by a quasi-periodic oscillation spectrum that influences the regional circulation, and it is well established that the intraseasonal oscillation (ISO) is very important for tropical as well as extratropical systems, including the monsoon. The presence of the ISO also implies that the states in the pre-monsoon system are not dynamically disconnected but are embedded in an underlying dynamics. Thus, the initial states belonging to one cycle of an ISO, say 90 days, will have dynamic coherence; we shall call such states a manifold of states, to distinguish them from a collection of unrelated states such as those used in short-range forecasting or in the simulation of an isolated system. We shall use this dynamic coherence of initial states to improve skill in long-range forecasting of the monsoon.
An important and challenging issue in ensemble forecasting is the generation of the best ensemble, i.e. the creation of a set of initial states that will yield better results with the dynamical model. In short-range prediction, many techniques have evolved to create ensembles by generating perturbations (Toth and Kalnay 1993; Kyouda and Kusunoki 2002). In long-range forecasting (LRF), the ensemble of initial conditions is mainly required to represent the dynamical states on different dates. Thus, in the case of long-range forecasting of the monsoon, the problem of choosing an ensemble can be said to be the choice of an initial manifold. It has been shown that an ensemble with wider spread but longer lead can provide better monsoon rainfall prediction skill than an ensemble of states taken over a short initial manifold during the pre-monsoon period, i.e. March 01-April 30.

Model, Data And Methodology
The variable-resolution atmospheric General Circulation Model (GCM) adopted from LMD, France, is used for this simulation study. The GCM uses a stretched coordinate with a zoom, so that higher resolution can be obtained over a chosen domain. The principle and formulation of the model are described in earlier studies (Sharma et al. 1987; Hourdin et al. 2006); the version used in this study is the optimum configuration, which has already been used for long-range forecasting of monsoon rainfall (

Results And Discussion
A simple measure of manifold structure is the correlation among the members of the set (ensemble). Correlation of states (globally averaged 850 mb zonal wind) for 54 years during a certain (pre-monsoon) period with reference states taken from the monsoon season (June 1, July 1 and August 1) reveals this structure (Fig. 1, left panel). In particular, the states between March 18 and April 15 and between April 15 and May 15 are significantly correlated with at least two of the reference states. In contrast, there is little such correlation for states in the winter season (Fig. 1, right panel). Similarly, the average correlations of the states belonging to the compact ensemble (CE; April 23-May 01) with the states of June 1, July 1 and August 1 are 0.06, 0.16 and 0.15, respectively (Fig. 1, left panel); thus these states do not have any coherence with the monsoon state. In other words, certain states during the pre-monsoon period have better coherence with monsoon states and can be expected to provide better skill in capturing monsoon dynamics. The same analysis, averaged over the domain 0-30°N, 65-95°E, is presented in Fig. 2.
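The correlation screening described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for Fig. 1: each state is taken to be a series with one value per year, the function names `pearson` and `coherent_dates`, the threshold of 0.7, the use of the absolute correlation, and all numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coherent_dates(states_by_date, reference_states, threshold, min_refs=2):
    """Return candidate dates correlated with at least `min_refs` reference states.

    A date counts as correlated with a reference state when the absolute
    correlation across years reaches `threshold` (an assumed criterion).
    """
    selected = []
    for date, series in states_by_date.items():
        hits = sum(
            1
            for ref in reference_states.values()
            if abs(pearson(series, ref)) >= threshold
        )
        if hits >= min_refs:
            selected.append(date)
    return selected


# Hypothetical yearly values of the state variable (six years only).
reference_states = {
    "Jun 01": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Jul 01": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "Aug 01": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
}
states_by_date = {
    "Mar 25": [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],  # pre-monsoon candidate
    "Jan 10": [2.0, 5.0, 1.0, 6.0, 3.0, 4.0],  # winter candidate
}

selected = coherent_dates(states_by_date, reference_states, threshold=0.7)
```

In practice the threshold would be set from the significance level appropriate to the number of years in the record rather than fixed by hand.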
We then consider eight such sub-manifolds of the initial manifold, each characterized by eight members (leads) spread over a period of about fifteen days. As our null hypotheses we consider two ensembles: a compact ensemble (CE) of eight leads of closely packed states (April 23-May 01) and a large ensemble (LE) of states from all eight test ensembles (March 01-April 30). The CE, due to its short time span, cannot embed the dynamical coherence inherent in an initial manifold characterized by the low-frequency ISO; the CE is thus akin to a collection of unrelated (synoptic) states. This is clear from the fact that none of the states of the eight leads in the CE has a significant correlation with any of the reference states (Fig. 2). In other words, it is expected that better sampling of initial states over the ISO time scale generally compensates for the error due to the longer lead and, in turn, improves on the forecast obtained using an ensemble with shorter lead and a short sampling time scale. A measure of the difference among the ensembles is the standard deviation among the members (normalized by the ensemble mean), presented in Fig. 3. It can be seen (Fig. 3) that the initial manifolds with wider spread have larger internal structure than the compact ensemble. Further, there is a gradual decrease in the richness of this structure beyond set 4 (March 18-April 15; Table 1). Thus, the ensembles with longer leads have the higher dispersion of states. Based on the results of Fig. 1 and Fig. 3, therefore, we expect the ensemble of states between March 18 and April 15 to have the highest skill; it is referred to as the optimum initial manifold (OIM) in the subsequent discussion.
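The dispersion measure used above, the standard deviation among members normalized by the ensemble mean, can be sketched as below. The function name `normalized_spread`, the use of the population standard deviation, and the member values are assumptions for illustration only.

```python
import statistics


def normalized_spread(member_values):
    """Standard deviation among ensemble members divided by the ensemble mean.

    The population standard deviation is used here; the sample form is an
    equally plausible choice and differs only by a constant factor for a
    fixed ensemble size.
    """
    mean = statistics.fmean(member_values)
    if mean == 0:
        raise ValueError("ensemble mean is zero; normalized spread is undefined")
    return statistics.pstdev(member_values) / abs(mean)


# Hypothetical values of one state variable for two four-member ensembles.
compact_members = [10.0, 10.1, 9.9, 10.0]  # closely packed initial states
wide_members = [8.0, 12.0, 9.0, 11.0]      # states sampled over a longer window

compact_spread = normalized_spread(compact_members)
wide_spread = normalized_spread(wide_members)
```

Because the measure is dimensionless, it allows ensembles built from different windows to be compared on the same footing.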
It needs to be emphasized that the number of states in an ensemble should be chosen carefully to ensure that the results are stable with respect to the size of the ensemble; it has been shown that, for the present model configuration, dispersion among the forecasts from different ensembles saturates at an ensemble size of 6 (Goswami and Gouda, 2009). It may be mentioned that, in terms of the climatological seasonal cycle, most ensembles performed comparably well (Fig. 4). However, in terms of inter-annual variability in area-averaged (75-85°E, 8-28°N) seasonal (June-August) rainfall, defined as the departure from the corresponding 24-year (1980-2003) mean, the OIM outperforms both the CE and the LE (Fig. 5). The OIM has a phase synchronization of 67% with a correlation coefficient of 0.44 between all-India seasonal (JJA) rainfall anomalies, significant at the 99% confidence level for the degrees of freedom involved. August) rainfall over continental India using the APCC and NCEP long-range simulations and IMD observations as presented in Figure S1, which clearly indicates that the simulation with the APCC model has zero correlation and only 45% phase synchronization, while the NCEP (CFSv2) simulation has a correlation of 0.43 with 59% phase synchronization, which is lower than that of the OIM used in our study, as mentioned earlier.
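The phase synchronization quoted above can be illustrated with a short sketch. It assumes that anomalies are expressed as percentage departures from the period mean and that phase synchronization is the percentage of years in which the forecast and observed anomalies have the same sign. The function names and the rainfall values are hypothetical, not data from this study.

```python
def percent_anomalies(rainfall):
    """Percentage departure of each year's rainfall from the period mean."""
    mean = sum(rainfall) / len(rainfall)
    return [100.0 * (value - mean) / mean for value in rainfall]


def phase_synchronization(forecast_anom, observed_anom):
    """Percentage of years in which forecast and observed anomalies share a sign."""
    matches = sum(1 for f, o in zip(forecast_anom, observed_anom) if f * o > 0)
    return 100.0 * matches / len(observed_anom)


# Hypothetical seasonal rainfall totals (mm) for six years.
observed_rain = [850.0, 950.0, 1000.0, 1100.0, 700.0, 800.0]
forecast_rain = [880.0, 890.0, 950.0, 1000.0, 820.0, 920.0]

observed_anom = percent_anomalies(observed_rain)
forecast_anom = percent_anomalies(forecast_rain)
phase_sync = phase_synchronization(forecast_anom, observed_anom)
```

Each series is referenced to its own mean, so a systematic wet or dry bias in the model does not by itself change the sign of the forecast anomaly.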
A number of other parameters have been considered to quantify the skill of the forecasts (Table 1). Comparison of the skill of the forecasts for different initial manifolds in terms of these parameters shows the OIM to have the highest, and significant, skill (Table 1). The total number of failures ($N_{UW} + N_{OW} + N_{M} + N_{FF} + N_{FD}$) for the OIM is 9, followed by 13 for the ensemble with a starting initial state of March 01; however, this latter ensemble scores only 54% in terms of phase synchronization (Table 1). As phase synchronization is a binary (i.e. 0 or 1) process, a random forecast would be expected to result in a 50% success rate. Thus, this ensemble scores only marginally better than a random forecast.
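One way to make the comparison with a random forecast concrete is a binomial tail probability: the chance that a coin-flip forecast matches the observed sign in at least a given number of years. The sketch below assumes 24 independent years with a success probability of 0.5 per year, and reads 54% and 67% as 13 and 16 matching years out of 24; the function name `chance_of_at_least` is hypothetical.

```python
import math


def chance_of_at_least(hits, n_years, p=0.5):
    """Probability of `hits` or more sign matches in `n_years` by chance alone."""
    return sum(
        math.comb(n_years, k) * p**k * (1 - p) ** (n_years - k)
        for k in range(hits, n_years + 1)
    )


n_years = 24  # hindcast period 1980-2003

# Assumed conversion of the quoted percentages into whole years.
p_54 = chance_of_at_least(13, n_years)  # about 54% phase synchronization
p_67 = chance_of_at_least(16, n_years)  # about 67% phase synchronization
```

The independence of successive years is itself an assumption; any year-to-year persistence in the anomalies would reduce the effective sample size.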
An important consideration in the evaluation of skill is the performance for extreme years. Examination of the skill for the years with an anomaly amplitude of more than 5% at different scales shows (Table 2) the OIM to perform significantly better than most of the other ensembles; only two ensembles have more (by one) extreme years in phase.
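The count of extreme years in phase can be sketched as follows, assuming that a year is extreme when the magnitude of the observed anomaly exceeds 5% and that it is captured when the forecast anomaly has the same sign. The function name, the year labels and the anomaly values are hypothetical.

```python
def extreme_years_in_phase(years, forecast_anom, observed_anom, threshold=5.0):
    """Return (captured_years, number_of_extreme_years).

    A year is treated as extreme when |observed anomaly| > threshold (percent),
    and as captured when forecast and observed anomalies share a sign.
    """
    extreme = [
        (year, f, o)
        for year, f, o in zip(years, forecast_anom, observed_anom)
        if abs(o) > threshold
    ]
    captured = [year for year, f, o in extreme if f * o > 0]
    return captured, len(extreme)


# Hypothetical anomalies (percent of mean) for six years labelled 1 to 6.
years = [1, 2, 3, 4, 5, 6]
observed_anom = [-12.0, 3.0, 8.0, -6.5, 1.0, 15.0]
forecast_anom = [-4.0, -1.0, 2.5, 3.0, 0.5, 6.0]

captured, n_extreme = extreme_years_in_phase(years, forecast_anom, observed_anom)
```

Only the sign of the forecast anomaly is tested here; a stricter criterion could also require the forecast amplitude to exceed the threshold.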
It is important to note that a larger ensemble (LE), with initial states including those of the optimum ensemble, does not provide a better forecast; the optimum initial manifold has significantly better skill than the LE. To measure the effectiveness of ensemble forecasting, the ratio (ε) of the error in the ensemble forecast to the average forecast error of the ensemble members is computed and presented in Figure 6. The initial manifolds 3-5 have among the lowest values of this error ratio; on the other hand, the CE has a ratio higher than one, while for the LE the ratio is comparable to that of the OIM.

Conclusion
The simulations carried out here in hindcast mode using the monthly climatological sea surface temperature (SST) show that the ensemble forecasting skill is minimal but realizable, unlike the skill of simulations with observed SST, which is only potential. This ensemble forecasting shows that the forecast dispersion is mainly due to atmospheric internal dynamics (atmospheric states) and land surface processes. It is logical to expect that the use of an OIM in a coupled ocean-atmosphere model will further improve the realizable skill of long-range forecasting of the monsoon. While it has been recognized that prescription of sea surface temperature is a major factor that determines monsoon simulation (