Geopotential-based Multivariate MJO Index: extending RMM-like indices to pre-satellite era

Model simulations suggest that Madden–Julian oscillation (MJO) activity changes under the anthropogenic climate change background. However, satellite observations, which provide information on MJO convection activity, are not available before the 1970s, hindering research on the long-term historical variability of MJO. This study aims at extending the data length of MJO indices that include both MJO circulation and convection features, such as the widely used Real-Multivariate MJO (RMM) Index, to the pre-satellite era. This paper introduces a new MJO Index construction method, in which the outgoing longwave radiation (OLR) input is derived from upper-level geopotential, and names it as the Geopotential-Based Multivariate MJO (GMM) Index. The GMM Index is derived from 1902 to 2008 based on the 20th century reanlaysis product, by assuming that the relationship between OLR and geopotential does not change over time, and is compared with the filtered version of the RMM (FMM) Index during 1981–2008 and historical observed precipitation records in the 20th century. The GMM Index is shown to (1) have the same climatological properties as the FMM Index, (2) be statistically highly correlated to the FMM Index, and (3) be able to indicate MJO activities and MJO’s convection features in the pre-satellite era. The overall bivariate correlation between the FMM and GMM indices based on ERA-20C is 0.964. Evaluation results confirm the validity of the proposed MJO Index construction method, which could capture MJO convection activity in the pre-satellite era and can be applied to all MJO indices that require OLR inputs. This study provides an alternative way that overcomes the difficulty of historical MJO studies and will be beneficial to our understanding of the long-term change of MJO.


Introduction
The Madden-Julian oscillation (MJO; Julian 1971, 1972), named after Madden and Julian, is an important tropical atmospheric system often treated as a bridge linking weather and climate (Zhang 2013). Despite being named an oscillation, the MJO acts more like an eastward-moving atmospheric pulse with convection clusters 5000-10,000 km spatial scale (Nakazawa 1988;Lau et al. 2012) and notable zonal wind anomalies along the equator. Lower-tropospheric equatorial westerly and easterly anomalies are observed to the west and east of the MJO deep convection center, respectively. The low-level westerly and easterly anomalies are, respectively, the Rossby and Kelvin wave responses to the heat source over the deep convection center (Gill 1980;Wang and Rui 1990;Hsu and Li 2012;Wang and Chen 2017). The upper-level zonal wind also shows similar Rossby-Kelvin wave responses.
Given the dominant planetary-scale circulation and convection features of the MJO, MJO signals are usually extracted from outgoing longwave radiation (OLR) and tropospheric zonal wind data. Numerous methods and indices were proposed to derive MJO signals from observed atmospheric data. Among all, the Real-Multivariate MJO (RMM) Index (Wheeler and Hendon 2004, hereafter WH04), developed based on OLR, zonal wind at 200-hPa (u200) and , is the most widely used MJO Index. Constructed from the two leading principal components (PCs) of combined empirical orthogonal function (CEOF), the RMM Index can well visualize each MJO event's propagation and intensity in a two-dimensional Cartesian phase diagram, in terms of RMM phase and RMM intensity. Although some research pointed out the limitations and shortages of the RMM Index (e.g., Roundy and Schreck III 2009;Ventrice et al. 2011;Straub 2013;Wolding and Maloney 2015;Liu et al. 2016), the simplicity and convenience of the RMM Index makes it the most commonly used MJO Index at present.
A limitation of the RMM Index is its short data record. Like most MJO indices, the RMM Index is derived from OLR, a satellite-based data that captures MJO's convection features (Liebmann and Smith 1996); thus it is challenging to construct the RMM Index in the pre-satellite era when satellite data is absent. In other words, this hinders scientists from studying the observed long-term variability of MJO before the 1970s using the RMM Index. In order to explore MJO's long-term change and interdecadal variation, Oliver and Thompson (2012) reconstructed the RMM Index over the period 1905-2008 by regressing tropical surface pressures onto WH04's RMM Index. The MJO-related convection information in the RMM Index was assumed to be included in the regression model, although no convection data was directly input into the regression model. Their reconstructed index only accounts for 69% of the RMM index.
Other different ways have been applied to extract MJO signals in the pre-satellite era. A simple yet standard method is examining the intraseasonal variability of lower-and upper-tropospheric circulation (e.g., zonal wind) over the MJO-active area. For example, Slingo et al. (1999) showed an increasing trend of interannual MJO activity since 1958 by analyzing the variances of bandpass (20-100 days) filtered u200 and u850 averaged between 10° S and 10° N. Chen et al. (2017) also defined an MJO intensity index in a similar way (amplitude of u850 averaged over 10° S-10° N, 120°-160° E) in his study of interannual and interdecadal MJO variability during 1861-2010. Another method reflecting MJO activity is to compute the two leading PCs of MJO circulation responses, such as u200, u850, velocity potential, stream function, or combinations of them (Slingo et al. 1999;Jones and Carvalho 2006;Ventrice et al. 2013). A widely known example is the Velocity Potential MJO (VPM) Index, which captures MJO signals from u200, u850, and 200 hPa velocity potential (Ventrice et al. 2013). Although the VPM Index was initially not designed to solve the data length problem of the RMM Index, the VPM Index does not require observations of OLR and is thus available in the pre-satellite era. Among the above, however, most methods stated above could hardly consider both the convection and circulation features of MJO in their studies due to the absence of OLR observations.
Since the MJO plays a vital role in extreme weather climate events, it is important to understand how the MJO activity changes under the anthropogenic climate change background. However, due to the lack of satellite observations, studies on historical MJO variability could hardly capture the MJO's convection activity; and, studies about the impacts of climate change on MJO are mainly based on sophisticated modelling techniques, which might not provide perfect representations of the climate system (Henderson et al. 2017;Maloney et al. 2019). Thus, aiming at solving the above difficulties, this study introduces a new method, based on recent findings of the close relationship between atmospheric geopotential field and MJO deep convection (Leung and Qian 2017, see Sect. 3.1), to construct an extended MJO Index that (1) takes both MJO's deep convection and circulation features into account, and (2) covering both the satellite and pre-satellite eras. Leung and Qian (2017) discovered that the zonal gradient of equatorial 150-hPa geopotential (z150) anomaly (hereafter 150-hPa ∇ x z � ) is able to indicate MJO deep convection centers, which is consistent with the similarity of the upper-tropospheric circulation pattern in the MJO to a free Kelvin wave (Roundy 2020). As mentioned above, OLR observation data has been only available since 1974; thus, one has to obtain a longer time series of MJO deep convection before extending the MJO Index. The discovery of 150-hPa ∇ x z � provides a new option to reconstruct the MJO-related OLR and the MJO Index before 1974. More detail about the 150-hPa ∇ x z � is discussed in Sect. 3 after the data introduction in Sect. 2. The procedure of constructing the new MJO Index is also introduced in Sect. 3. The validity of the newly proposed method and MJO Index are evaluated in Sect. 4. Section 5 examines the extended MJO Index's ability to indicate MJO's activity in the pre-satellite era. Conclusions and discussions are given in Sect. 6.

Data
In the following sections, the extended MJO Index was built from the European Centre for Medium-Range Weather Forecasts (ECMWF) Atmospheric Reanalysis of the 20th century (ERA-20C) product (Poli et al. 2016). The ERA-20C provides 4D-Var analysis of surface and upper-air data from 1 3 1900 to 2010 by assimilating surface pressure and marine winds observations only. Daily-mean z150, u200, and u850 data with a horizontal resolution of 1° × 1° were used to derive the extended MJO Index. Since satellite and upper-air observations are not assimilated in the ERA-20C product, the reanalysis product may not provide the best estimate of the atmospheric system. Thus, z150, u200, and u850 data from the ECMWF Re-analysis Interim (ERA-I) product (Dee et al. 2011) was also used to evaluate the validity of the proposed method of calculating the extended MJO Index. The ERA-I and ERA-20C data provides 6-hourly values only, and were converted into daily-mean values by taking averages.
Daily-mean OLR observation was obtained from the National Oceanic and Atmospheric Administration (NOAA) gridded interpolated OLR data archive (Liebmann and Smith 1996). The OLR data is based on polar-orbiting satellite observation and covers globally with a spatial resolution of 2.5° × 2.5° from 1974 to 2019.
In order to verify the extended MJO Index's ability to indicate MJO activity in the pre-satellite data, historical rainfall observations are used in the following analyses. The historical rainfall data is obtained from the Global Historical Climatology Network-Daily (GHCND) archive (Durre et al. 2008(Durre et al. , 2010Menne et al. 2012), which contains daily precipitation records from over 100,000 stations in 180 countries and territories up to 175 years. In this study, rainfall records of stations over Central North Australia (12°-15° S, 130°-135° E) and Northeast Australia (12°-15° S, 140°-145° E) that cover the period of 1902-2008 are selected.

Construction of the geopotential-based
Multivariate MJO (GMM) Index

150-hPa ∇ x z � : a proxy of OLR measurement
Since OLR measurements are based on satellite observations, a good proxy for OLR is needed for constructing an extended MJO Index that contains MJO's convection information. Leung and Qian (2017), by examining the observed geopotential structures of all MJO events during 1979-2013, discovered that 150-hPa ∇ x z � (i.e., zonal gradient of equatorial z150 anomaly) is in phase with MJO's convection activity. Namely, 150-hPa ∇ x z � shows positive above the MJO convection center over the MJO-active regions, and vice versa. Noting that ∇ x z � is referred to as the MJO-filtered (a space-time bandpass filter of 0-10 wavenumber and 15-100 day period, see Sect. 3.2 for more detail) zonal gradient of equatorial z150 anomaly in the following discussion, unless specified.
The relationship between 150-hPa ∇ x z � and MJO convection is an outcome of the MJO's Kelvin wave response. The MJO convection center acts as a mass source in the upper troposphere, thereby yielding a Kelvin ridge response to the east and a Kelvin trough plus a Rossby ridge response to the west. The MJO has a first baroclinic structure. In the lower troposphere, an equatorial low and high are respectively located to the east and west of the MJO convection center and a pair of off-equatorial low in its two flanks; in the upper troposphere, the equatorial low and high are respectively to the west and east of the convection center, and a pair of off-equatorial highs are in its two flanks (Adames and Wallace 2014) (Fig. 1a). The MJO convection center is located between the upper-level high and low anomaly to its east and west, respectively, and this is why the upper-tropospheric ∇ x z � is in phase with MJO's convection activity.
Through numerical simulation experiments based on the ideal theoretical MJO framework by Wang and Chen (2017), it was shown that the upper-level positive ∇ x z � center always appears right above the MJO convection center once the MJO Rossby-Kelvin wave packet starts to develop (Leung 2019). Leung and Qian (2017), using four sets of reanalysis products (including ERA-I, ERA40, NCEP1, and NCEP2), also showed the strong correlation between 150-hPa ∇ x z � and OLR anomaly over the MJO-active regions after applying a wavenumber-frequency filter (space-time bandpass filter, Wheeler and Kiladis 1999;Kiladis et al. 2006) of 0-10 wavenumber and 15-100 day period (hereafter MJO-filter). Based on the ERA-I, their correlation coefficients range from − 0.64 to − 0.75 from the western Indian Ocean to the western Pacific region, although this relationship is not significant over the eastern Pacific and Atlantic areas where active MJO convection is seldom observed. In all the four reanalysis products, the MJO-filtered 150-hPa ∇ x z � is the most correlated with filtered OLR anomaly, while ∇ x z � at other altitudes are relatively insignificant and even have a lead-lag relationship with OLR anomalies (Fig. 1b), so the 150-hPa ∇ x z � is taken as the proxy of MJO deep convection is the following analyses.

Procedure of constructing the GMM Index
Given the high correlation of 150-hPa ∇ x z � with OLR and the long record of ERA-20C reanalysis, one could estimate the OLR by 150-hPa ∇ x z � in the pre-satellite era. This could compensate for the missing OLR data due to the absence of satellite observation. Using the estimated OLR and zonal wind data from ERA-20C reanalysis, an extended MJO Index that takes both MJO-related circulation and convection features into account can be derived based on procedures similar to that in WH04. To distinguish with the currently existing MJO indices, the MJO Index introduced in this paper will be referred to as the geopotential-based Multivariate MJO Index (GMM Index). The detailed procedure of constructing the GMM Index is as follow: (1) Data pre-processing Daily-mean z150, u200, and u850 data from reanalysis products and OLR data were used in the following steps. In order to remove unrelated seasonal, interannual, and interdecadal variability, etc., the mean and first three harmonics of the annual cycle of each vari-able are first removed. Then, a wavenumber-frequency filter (space-time bandpass filter, Wheeler and Kiladis 1999;Kiladis et al. 2006) of 0-10 wavenumber and 15-100 day period (MJO filter) is applied to extract the intraseasonal, eastward-propagating MJO-related anomalies (hereafter, anomalies are referred to as MJO-filtered anomalies unless specified). The resulting u850, u200, OLR, and z150 anomalies were assumed to mostly contain MJO information and used to construct the GMM Index below. The MJO filter may cause weakening effects at both ends of the time series, and hence the data of the first two years and last 2 years of each dataset is excluded in the following analyses.
(2) Calculation of 150-hPa ∇ x z � After obtaining the MJO-filtered anomalies, the 150-hPa ∇ x z � (unit: gpm m −1 ), namely the MJO-filtered zonal gradient of z150 anomaly, is derived by taking the zonal spatial derivative of MJO-filtered equatorial z150 anomaly: ∇ x z � = z � x , where z ′ and x are the z150 anomaly averaged over 10° S-10° N and the zonal distance respectively.
(3) Derivation of OLR from 150-hPa ∇ x z � The OLR is derived from 150-hPa ∇ x z � based on linear regression. Leung and Qian (2017) verified that the linear relationship between 10° S-10° N 150-hPa ∇ x z � and 15° S-15° N OLR holds not only in the satellite era, but also during 1974-1977 using the National Centers for Environmental Prediction (NCEP) Reanalysis 1 (NCEP R1) (Kalnay et al. 1996) and 40-year ECMWF Re-Analysis (ERA-40) (Uppala et al. 2005) data. However, this relationship has not been confirmed with ERA-20C. Here, we show in Fig. 2 that the linear relationship between 15° S-15° N OLR anomaly and 10° S-10° N 150-hPa ∇ x z � derived from ERA-20C (Fig. 2c, d) does hold over MJO-active regions as that derived from ERA-I ( Fig. 2a, b). Despite the fact that 150-hPa ∇ x z � derived from ERA-20C has comparatively lower correlation coefficients (− 0.823 and − 0.775 in Fig. 2c, d compared to − 0.893 and − 0.872 in Fig. 2a, b) with OLR anomaly, 150-hPa ∇ x z � is still a valid choice for deriving OLR in the pre-satellite era. In the following context, OLR anomaly is referred to OLR anomaly averaged over 15° S-15° N and 150-hPa ∇ x z � is referred to that averaged over 10° S-10° N, unless specified. The different latitude domains of OLR anomaly and 150-hPa ∇ x z � are selected based on the results in Leung and Qian (2017).
It was also reported that 150-hPa ∇ x z � may have 1-to 2-day time lag with OLR anomaly over different MJO stages. According to Leung and Qian (2017), positive 150 hPa ∇ x z � , respectively, lead and lag negative OLR anomalies in the MJO developing and dissipating stage, because the Rossby-Kelvin wave packet Fig. 1 a Horizontal distribution of composite MJO-filtered 150-hPa geopotential (contours, 4 gpm interval), 150-hPa wind (vectors, unit: m s −1 ), and OLR (shadings, 5 W m −2 interval) anomalies of all MJO events over (10° S-10° N, 90° E-100° E). Letters "L" and "H" respectively represents anomalous low and high, with subscripts "R" and "K" indicating Rossby and Kelvin wave responses. b Evolution of vertical profiles of composite MJO-filtered ∇ x z � (shadings, 0.5 × 10 −6 gpm m −1 interval) and zonal wind (contours, 0.5 m s −1 interval) anomalies averaged of all MJO events over (10° S-10° N, 90° E-100° E). Lead-lag time series (days) of composite OLR anomalies (blue), 150 hPa ∇ x z � (red) and 150 hPa zonal wind anomalies (black) are plotted at the bottom of panel b. a, b are respectively the same as Fig. 10b and 11b of Leung and Qian (2017). Detailed analyses of a and b could be referred to their paper helps trigger and develop the deep convection in the development stage, and the MJO convection is easily affected by other systems (such as the Southern Pacific convergence zone) in the weakening stage. Considering MJO's eastward propagating property, the time lag between 150-hPa ∇ x z � and OLR implies that there may be zonal displacements between the two variables. Namely, positive 150-hPa ∇ x z � leading negative OLR anomalies implies that the MJO convection center is located to the west of the positive 150-hPa ∇ x z � , and vice versa. Thus, before applying linear regression, we first calculate the correlation coefficients between OLR anomaly of each longitude ( OLR ) and 150-hPa ∇ x z � of 15 degrees longitude nearby, and then determine the longitude ( z ) at which 150-hPa ∇ x z � has the largest correlation coefficient with OLR anomaly for OLR . Considering the seasonal variation of MJO activity, the calculation z for each OLR is repeated for each The red dashed line is the regression line of each case. Noted that the longitudes of 150-hPa ∇ x z � are different for different OLR and datasets. Correlation coefficients of all panels are shown in the figure title and are all statistically significant at the 99.9% confidence level month, thus z is a matrix with dimension of length of OLR times number of months ( 144 × 5 ). Since this study only covers the extended winter period, only five months (NDJFM) are included. Then, OLR anomaly at each OLR is linearly regressed on 150-hPa ∇ x z � at z (i.e., largest correlation), obtaining the regression coefficient m OLR , z , month and intercept term c( OLR , z , month) (Eq. 1): By plugging the regression coeff icient m OLR , z , month and intercept term c( OLR , z , month) into Eq. (1), the global equatorial daily MJO-filtered OLR anomaly during 1902-2008 can be obtained from the ERA-20C MJO-filtered 150-hPa ∇ x z � . Noting that MJO convection activity is associated with negative OLR anomalies and positive 150-hPa ∇ x z � (Sect. 3.1), which means their correlation coefficients should be theoretically negative; thus, the above-mentioned "largest correlation coefficient between OLR and 150-hPa ∇ x z � " refers to correlation coefficient that is closest to − 1. Figure 2 is an example illustrating the linear relationship between OLR( OLR ) and its corresponding 150-hPa ∇ x z � z during November of 1981-2008.
(1) Figure 3 plots the maximum correlation coefficient of OLR anomaly at each longitude ( OLR( OLR ) ) with its corresponding 150-hPa ∇ x z � z during the extended winter of 1981-2008. The correlation between the two variables is more significant between 60° E and 180° E (MJO-active regions, correlation coefficients ≥ −0.7 ), and relatively smaller in the Western Hemisphere, due to the weaker MJO convection activity. The correlation coefficients of OLR anomaly with 150-hPa ∇ x z � derived from ERA-20C ( Fig. 3b) are slightly weaker than that with 150-hPa ∇ x z � derived from ERA-I (Fig. 3a). This suggests the poorer data quality of the ERA-20C reanalysis, mostly due to the lack of satellite data input, in reflecting the MJO's atmospheric status. Nevertheless, the close relationship between OLR and 150-hPa ∇ x z � still holds in ERA-20C data with their correlation coefficients ranging between − 0.7 and − 0.9 over the MJO-active region (statistically significant at the 99.9% confidence level), implying that the ERA-20C 150-hPa ∇ x z � could well reflect MJO's convection feature and it is reasonable to derive OLR anomaly from ERA-20C products. The statistical significance of correlation coefficients here and throughout the paper is tested using Student's t-test, with the degree of freedom estimated following Wang and Li (2017) considering the reduction of degree of freedom due to filtering. propagating signals (arrows in Fig. 4d, e). By applying the linear regression method introduced above (Eq. 1), the OLR anomalies derived from ERA-I and ERA-20C 150-hPa ∇ x z � are plotted in Fig. 4b, c, respectively, and both of them do reflect the two MJO events' signals.
Although the derived OLR anomalies are apparently weaker than observations from 180° E to 60° E (Western Hemisphere and Africa) because of the weaker correlation between 150-hPa ∇ x z � and OLR anomalies ( Fig. 3), this error does not affect much when constructing the GMM Index (shown later in Fig. 5). Another important note is the overall smaller amplitude of OLR anomalies derived from ERA-20C (note the different colorbars in Fig. 4). This is due to the overall relatively smaller correlation coefficients of OLR with 150-hPa ∇ x z � derived from ERA-20C (Fig. 3b), compared to that with ERA-I 150-hPa ∇ x z � (Fig. 3a). Yet this does not affect the GMM Index calculation either. (4) Construction of GMM Index After deriving OLR anomalies from 150-hPa ∇ x z � , the remaining steps are similar to those documented in previous studies. Following WH04's method of constructing RMM Index, the CEOF analysis is applied on the 15° S-15° N averaged MJO-filtered u200, u850 anomalies, and OLR derived from 150-hPa ∇ x z � . The resulting PCs (PC1 and PC2) of the first two leading modes (EOF1 and EOF2) are then used to define the phase and intensity of MJO Index on a two-dimensional Cartesian phase diagram. The fundamental difference of the MJO Index presented here is that the OLR input of CEOF is not the primary satellite-observed OLR data but derived from atmospheric geopotential field, which is available with a longer data record. ERA-20C u200 + ERA-20C u850 (Set 3, Fig. 5c). Set 1 is the same as the RMM Index, except the CEOF input variables are MJO-filtered anomalies, and is referred to as the reference that represents the "correct" MJO Index. Set 2 is similar to Set 1, besides the OLR input is derived from ERA-I 150-hPa ∇ x z � (GMM-ERAI Index), i.e., the GMM Index derived from ERA-I. Set 3 is the GMM Index derived from ERA-20C, with everything being the same as Set 2 except that all input data is based on ERA-20C, i.e., the GMM Index derived from ERA-20C (GMM-ERA20C Index). The first two leading EOF modes of all three sets explains about 50% of the total variance and show the major MJO patterns consistent with that mentioned in WH04: EOF1 shows enhanced convection activities over maritime continent and suppressed convection over Africa and Western hemisphere, while EOF2 has enhanced and suppressed convection respectively over the Pacific Ocean and the Indian Ocean, respectively (Fig. 5). Only small differences exist between the EOFs based on observed and derived OLR: both EOF1 and EOF2 of Set 2 and Set 3 slightly underestimate the MJO convection activities over the Western Hemisphere and Africa (from 180° E to 60° E) (black line in Fig. 5b); the region of anomalous OLR of Set 1 and Set 2 are slightly narrower than that of Set 3. Since the OLR input is the only source of error of Set 2, compared to Set 1, we could attribute the above difference to the error of OLR derivation from 150-hPa ∇ x z � , namely the weaker correlation between OLR and 150-hPa ∇ x z � over the Western Hemisphere and Africa. Meanwhile, although the OLR anomalies derived from ERA-20C 150-hPa ∇ x z � are overall smaller than observations (Fig. 4c), the CEOF result of Set 3 is basically the same as that of Set 2. Unlike Set 2, the sources of error of Set 3 could be due to both the relatively worse data quality of ERA-20C and the error of OLR derivation from 150-hPa ∇ x z � .
In spite of some differences in Set 2 and Set 3 compared with Set 1, the first two leading EOF modes of Set 2 and Set 3 do capture the major MJO convection and circulation features. The correlation coefficients of the resulting EOF1 and EOF2 OLR components between Set 1 and Set 2 respectively reach 0.962 and 0.922, and that between Set 1 and Set 3 respectively reach 0.929 and 0.892 (significant at the 99.9% confidence level). These results imply that the small inconsistencies in the CEOF results do not produce large errors in the final GMM Index (more details shown in Sect. 4). derived from ERA-20C 150-hPa ∇ x z � + ERA-20C u200 + ERA-20C u850 (Set 3, GMM-ERA20C index). CEOF analyses were performed over extended winter during 1981-2008. Noted that the order of EOF modes are rearranged such that they are consistent with the two leading modes of the RMM index (Wheeler and Hendon 2004) The above procedures of calculating the GMM Index are basically the same as that introduced in WH04 except that (1) the OLR input is derived from 150-hPa ∇ x z � and (2) all variables have been MJO-filtered. The remarkable similarity of the above CEOF results (Fig. 5b, c) to that in WH04 confirms the feasibility of constructing an MJO Index that includes MJO convection features without the use of raw OLR observations.

Validity of the GMM Index and sources of errors
In this section, we will assess the validity of the GMM Index by (1) examining the climatological properties of the GMM Index, (2) statistically evaluating the GMM Index, and (3) the univariate indices based on the derived OLR. As mentioned above, the GMM Index in this study is derived from MJO-filtered anomalies; thus, the RMM Index that is based on MJO-filtered anomalies (hereafter FMM Index) is referred to as a standard reference for the following evaluation. The FMM Index is constructed from the CEOF results of MJO-filtered ERA-I u200, ERA-I u850, and NOAA gridded interpolated OLR anomalies, which is equivalent to Set 1 in Sect. 3.2 (Fig. 5a). The GMM indices built from Set 2 and Set 3 are respectively defined as the GMM-ERAI Index and GMM-ERA20C Index, among which the latter covers both the pre-satellite and modern satellite era. Considering the weakening effect of the MJO filter at both ends of the time series, the data of the first two years and last two years of each dataset is neglected in the following analyses. i.e., the FMM, GMM-ERAI indices cover the extended winter during 1981-2008 while the GMM-ERA20C Index covers 1902-2008.

Climatological properties of the GMM Index
An advantage of the GMM Index, as if RMM and FMM indices, is its capability of reflecting MJO's intensity and dividing MJO's life cycle into 8 phases. By combining the PCs of CEOF results (Fig. 5), one could visualize the FMM and GMM indices on a two-dimensional Cartesian phase diagram. Figure 6 plots all the data points of the GMM-ERA20C Index of the extended winter and each month during 1902-2008, showing the climatology of the GMM-ERA20C Index. Following Lafleur et al. (2015), the GMM Index of each day is classified into four categories: (1) inactive (IA, GMM < 1.0), (2) active (A, 1.0 ≤ GMM < 1.5),  Lafleur et al. (2015). By comparing the composite spatial distribution of OLR and z150 anomalies of different MJO phases based on the three MJO indices, the GMM-ERAI and GMM-ERA20C indices are shown to reproduce the same MJO features as that indicated by the FMM Index. Figure 7 shows the composited NOAA interpolated OLR and ERA-I z150 of each MJO phase during the extended winter of 1981-2008. The MJO phases of each column are defined based on the three MJO indices and are referred to as the FMM phase, GMM-ERAI phase, and GMM-ERA20C phase, respectively. The spatial distributions of both OLR and z150 anomalies, especially their center locations and intensities, of GMM-ERAI (Fig. 7b) and GMM-ERA20C (Fig. 7c) indices are consistent with that of the FMM Index (Fig. 7a)

Statistical errors of the GMM Index in the satellite era
The main purpose of building an MJO Index is to visualize the intensities and positions of MJO activities, and to be able to track the MJO state in observations and models. Thus, in addition to the climatological properties of the GMM Index, it is more important to validate the ability of the GMM Index to reflect the correct intensities and phases of MJO events. We have shown from OLR observations that two MJO events occurred during the 2006-2007 extended winter (Fig. 4).
Here, we compare the performances of the GMM-ERAI and GMM-ERA20C indices with the FMM Index (Fig. 8).
As shown in Fig. 4a, MJO convection of the first MJO event initiated in mid-December, sustained for about one month and dissipated in mid-January; the second MJO event appeared over Africa in early-February, propagated at a slower speed and finally disappeared in late March. The FMM Index is able to reproduce the two MJO events mentioned above (Fig. 8a). The FMM Index stays weak from November to early December and its amplitude attains 1 in Phase 1 on 15 December 2006, which corresponds to the first MJO event. The FMM Index reaches its maximum amplitude at the boundary between Phase 4 and Phase 5 on 3 January 2007, which is consistent with the observed OLR anomaly center over 130° E (Fig. 4a), and then dissipates in late-January in Phase 7. Then, the FMM Index once again attains the active MJO level on 7 February 2007 in Phase 8, implying the start of the second MJO event. The FMM Index reaches maximum on 21 February in Phase 2, then weakens to inactive MJO level on 6 March, and strengthens to active MJO-level again on 20 March. According to the FMM Index, there are indeed two separate MJO events during February-March 2007, which is inconsistent with the observed OLR anomalies that show one MJO event only. This could be explained from the zonal wind anomalies which show two separate eastward propagating upper-tropospheric easterly and lower-tropospheric westerly systems (Fig. 9) Fig. 4a shows only one MJO event during February-March 2007 is probably due to that the MJO-filtered anomalies combine the two successive MJO events into one.
As shown in Fig. 8b, c, the GMM-ERAI and GMM-ERA20C indices are both able to indicate the two MJO events during the 2006-2007 extended winter. Results show that both the GMM-ERAI and GMM-ERA20C indices have the ability to reflect MJO activities. However, compared to the FMM Index, the GMM Index in general overestimates the MJO strengths. Specifically, the maximum FMM intensity during the 2006-2007 extended winter is 2.48, but the maximum intensities of GMM-ERAI and GMM-ERA20C over the same period are respectively 2.73 and 2.78. The GMM-ERAI and GMM-ERA20C indices overall respectively underestimate and overestimate MJO intensities where FMM ≥ 1 by 0.03 (2.0%) and 0.15 (8.6%) during 1981-2008. In addition, errors of the GMM Index are in general more obvious from Phase 6 to Phase 8, for example, the second MJO event initiated in early-February 2007 in Phase 8 according to the FMM Index but the GMM Index indicates that the event started in late-January in Phase 7. And the errors of the GMM-ERA20C Index are slightly larger than that of the GMM-ERAI Index. ( The statistical errors of the GMM-ERAI Index are first examined. As mentioned above, the only difference between the GMM-ERAI Index and the FMM Index is the OLR input when applying CEOF analyses: the FMM Index is based on NOAA OLR observations while the GMM-ERAI Index is based on OLR values derived from ERA-I 150-hPa ∇ x z � . Table 1 gives the BCORR and BRMSE  Given that the only possible error source of the GMM-ERAI Index comes from the derivation of OLR anomalies, the relatively large index errors in Phases 6-8 are likely due to the larger OLR derivation error over the Central Pacific to Western Hemisphere (Figs. 3a, 4b). The EOF1 patterns of the GMM-ERAI and FMM indices (Fig. 5a v.s. b) also show obvious differences from the Central Pacific to Western Hemisphere. Meanwhile, the smaller differences over the Indian Ocean and Maritime Continent explain the higher evaluation score in Phases 2-4. Similar evaluation results are obtained for different categories (inactive, active, very active and extremely active MJO) of MJO days. For active, very active and extremely active MJO days, the GMM-ERAI Index has relatively larger errors in Phases 1, 6-8 and smaller errors in Phases 2-5, which can be explained by the lower precision of OLR derivation over the Pacific and Western Hemisphere. Meanwhile, larger errors of the GMM-ERAI Index are found in Phases 1, 3, and 8 for inactive MJO days, which is a bit different from the above results, but it is meaningless to discuss phases before MJO is developed. In short, the above evaluation result shows that the GMM-ERAI Index is highly similar to the FMM Index, confirming the validity of constructing the GMM Index (or RMM-like index) based on OLR values derived from 150-hPa ∇ x z � . We further evaluate the performance of the GMM-ERA20C Index, which can be extended to the early twentieth century but may have larger errors due to the poorer data quality of ERA-20C reanalysis product. As done in above, the differences between the GMM-ERA20C and FMM indices are examined by calculating their BCORR and BRMSE (Table 2). Comparing the whole series including all MJO intensities (FMM > 0) and phases, the overall BCORR and BRMSE are respectively 0.964 and 0.434, which is approximately double that between GMM-ERAI and FMM indices. Nevertheless, the high BCORR of GMM-ERA20C Index with FMM Index implies the high similarity between the two indices. The BCORR of GMM-ERA20C Index of all MJO phases on inactive (FMM < 1), active (1 ≤ FMM < 1.5), very active (1.5 ≤ FMM < 2.5) and extremely active (FMM ≥ 2.5) MJO days are 0.873, 0.953, 0.977 and 0.986, respectively; the BRMSE are respectively 0.391, 0.421, 0.479 and 0.549. In general, the error of the GMM-ERA20C Index is greater than that of the GMM-ERAI Index, but is still sufficiently high (small) to conclude that the GMM-ERA20C Index could well indicate MJO activities as the FMM Index.
Unlike the GMM-ERAI Index, the largest errors of the GMM-ERA20C Index do not center around the Pacific and Western Hemisphere. As shown in Table 2, the lowest BCORR and largest BRMSE of the GMM-ERA20C Index As mentioned above, the errors of the GMM-ERA20C Index come from both the derivation error of OLR values from z150 data and the relatively lower accuracies of ERA-20C u200, u850 and z150 data. While the former mostly contributes to the errors in Phases 1 and 6-8, the latter could affect all MJO phases. Thus, the random distribution of large GMM-ERA20C Index errors in different MJO phases suggests that the precision of twentieth century reanalysis data contributes more to the index error.

Similarities and differences between the GMM and VPM indices
As mentioned in Sect. 3.1, the 150-hPa ∇ x z � is a Kelvin wave response of the MJO convection which acts as a mass source in the upper troposphere. The 200-hPa velocity potential, used in the VPM index (Ventrice et al. 2013), also shares a similar dynamical essence, indicating the planetary-scale upper-tropospheric divergence and convergence. In this sense, the VPM Index indicates the MJO in a similar way to the GMM Index introduced in this paper, and it can also be extended to the pre-satellite era. However, as discussed in Liu et al. (2016), the VPM Index deemphasizes convective signals of the MJO over the Indian Ocean warm pool and enhances MJO signals over the relatively dry longitudes of the equatorial eastern Pacific and Atlantic, because strong 200-hPa velocity potential anomalies can still be observed in the western hemisphere where MJO convection is weak. Meanwhile, the GMM Index is constructed based on the OLR derived from 150-hPa ∇ x z � by a regression model. The resulted OLR anomalies do not overestimate the MJO convection activity in the western hemisphere. Consequently, this avoids the CEOF calculation underestimating MJO's convective signals over the Indian Ocean or overestimating signals over the western hemisphere.
This can be illustrated by comparing the difference between the VPM and FMM indices to that between the GMM and FMM Index. We, by using the same construction method as the GMM Index, calculated the VPM Index based on the MJO-filtered 200-hPa velocity potential, u200, u850 anomalies of ERA-I (hereafter VPM-ERAI Index). The calculating procedure of the VPM-ERAI Index is the same as that of the FMM Index, except the input NOAA OLR anomalies is replaced with 200-hPa velocity potential anomalies. The MJO phase diagrams during the 2006-2007 extended winter show that while the VPM-ERAI Index could generally capture the MJO process, the VPM-ERAI Index does not accurately reproduce the MJO intensity denoted by the FMM Index (Fig. 8a v.s. d), compared with the GMM-ERAI Index (Fig. 8b). A complete statistical evaluation of the VPM-ERAI Index shows that the error between the VPM-ERAI and FMM indices (Table 3) is slightly larger than that between the GMM-ERAI and FMM indices ( Table 1). The overall BCORR and BRMSE of the VPM-ERAI Index is 0.982 and 0.273, respectively. The BCORR of VPM-ERAI Index of all MJO phases on inactive (FMM < 1), active (1 ≤ FMM < 1.5), very active (1.5 ≤ FMM < 2.5) and extremely active (FMM ≥ 2.5) MJO days are 0.940, 0.977, 0.988 and 0.993, respectively, which are all smaller than that between the GMM-ERAI and FMM indices. This indicates that the proposed GMM Index calculation procedure is able to avoid estimation error of MJO's convective signals, results in a MJO Index more similar to the RMM Index, than the VPM Index does. One should bear in mind that every MJO Index has its own advantages and disadvantages. The high BCORR (and small BRMSE) between the GMM and FMM indices does not mean that the GMM Index is a better indicator of MJO activities compared to other indices, but implies that applying geopotential-derived OLR anomalies as an input in the MJO Index calculation only produces very small differences on the resulted index.

Statistical errors of univariate MJO indices based on derived OLR
Previous studies reported that the calculation of RMM Index is more sensitive to zonal wind variation (Straub 2013;Liu et al. 2016), which suggests that the validity of geopotential-derived OLR as a substitute of satellite OLR observation could not be concluded solely from the high similarity between the GMM and FMM indices. Straub (2013) found that the bivariate correlations between the full RMM Index and an RMM Index constructed with OLR removed (i.e., u850 and u200 only) is 0.99 during 1979-2010; and, the full RMM is also highly correlated with the univariate version of the RMM with u200 (correlation coefficient = 0.91) or u850 (correlation coefficient = 0.90). These suggest that OLR added little value to the RMM Index beyond what was already determined by the zonal winds. Hence, in order to justify the use of derived OLR as a component in the MJO Index calculation, we further evaluate the accuracy of univariate OLR Index based on the OLR derived from 150-hPa ∇ x z � in this subsection.

Reconstructing MJO events in the pre-satellite era
It has been verified in Sect. 4 that the proposed GMM-ERA20C Index is highly similar to the FMM Index even though OLR observation is not used in the index calculation. In other words, one could reconstruct RMM-like indices (i.e., MJO indices that include both MJO convection and circulation features) without using observed OLR data. This breaks through the previous limitation that MJO indices are not able to indicate MJO convection activities in the pre-satellite era. In this section, we extend the GMM-ERA20C Index back to the pre-satellite era by applying the same method, and explore MJO events that occurred in the early twentieth century. Figure 10 plots the time series of the GMM-ERA20C Index intensity during 1902-2008, calculated by √ PC1 2 + PC2 2 . If an MJO event is defined when the GMM (or FMM) Index intensity is greater than 1 for more than 15 consecutive days, there were in total 58 winter MJO events in the satellite era (1981-2008; 2.15 events per year) and 167 events in the pre-satellite era (1902-1978; 2.20 events per year) according to the constructed GMM-ERA20C Index. The number of MJO events estimated by the FMM, GMM-ERAI, and GMM-ERA20C indices are the same in the satellite era  The MJO activity during the extended winter of 1905-1906 is examined based on the GMM-ERA20C Index as an example (Fig. 11a). The year was selected because the strongest MJO day was recorded on 19 February 1906. According to the GMM-ERA20C Index, there were two MJO events with three convective episodes during the extended winter of 1905-1906. (1) The first event was developed before November 1905, and had already reached Phase 5 on 1 November 1905. The MJO maintained its strength in the coming three weeks, traveled through Phases 6-1, and  1902-1928, b 1929-1954, c 1955-1980  The above MJO events indicated by the GMM-ERA20C Index are clearly shown in the Hovmöller diagrams of 150-hPa ∇ x z � , u200, u850 anomalies, and OLR anomalies derived from z150 ( Fig. 11b-d). The 150-hPa ∇ x z � and OLR anomalies clearly indicate the eastward propagations and strengths of the three MJO convective episodes (see red arrows), where the zonal wind anomalies at the end of January circumnavigate the western hemisphere and connect the anomalous convective signal. However, one might notice that the strongest OLR anomalies do not necessarily occur on the strongest days of the GMM-ERA20C Index, e.g., the strongest convection of the third MJO event appeared in late February (Fig. 11c) instead of 18 February (Fig. 11a). This is because the GMM Index, which is constructed in the same way as the RMM Index, is more easily influenced by the MJO circulation activities (i.e., u200 and u850) than by convection activities (i.e., OLR) (Liu et al. 2016).
To verify the validity of the GMM-ERA20C Index in the pre-satellite era, we compare the GMM-ERA20C Index with the daily historical rainfall observations in Australia. MJO activity is a major factor affecting the rainfall amount over Australia. Concretely, MJO tends to induce positive precipitation anomalies over North Australia during MJO Phases 4-7 (Wheeler and Hendon 2004;Wheeler et al. 2009). Figure 12 compares the historical rainfall amount observed by stations over the Central North Australia (stations over 12°-15° S and 130°-135° E, including Beatrice Hill, Darwin Botanic Gardens, Elsey, Katherine Council, and Pine Fig. 12 Boxplots of historical observed regional-mean rainfall (unit: mm day −1 ) of different GMM-ERA20C phases during 1902-2008 over a Central North Australia (12°-15° S, 130°-135° E) and b Northeast Australia (12°-15° S, 140°-145° E). c, d The same as a, b except during . Noted that only rainfall data on active MJO days (GMM-ERA20C intensity > 1) is included in the calcula-tion. Orange solid lines and blue dotted lines denote the median and mean values of rainfall in each GMM-ERA20C phase, respectively. The upper and lower bounds of the boxplots respectively indicate the 75th and 25th percentiles. The black dotted line is the average rainfall over all GMM-ERA20C phases Creek) and Northeast Australia (stations over 12°-15° S and 140°-145° E, including Coen Post Office, Moreton Telegraph Station and Musgrave) regions (Table 4) in different MJO phases defined by the GMM-ERA20C Index. It shows that the overall rainfall, including its mean, median and maximum values, over the Central Northern Australia region peak in MJO Phases 4-5 and that over Northeast Australia in Phases 5-6 during 1902-2008 (Fig. 12a, b). The result is consistent with the previous findings (Wheeler and Hendon 2004;Wheeler et al. 2009), except that the median is slightly higher in Phase 4 than 6 in Fig. 12d. The same result is also true from observations in the pre-satellite era (1902-1978, Fig. 12c, d). The above results imply that the GMM-ERA20C Index is able to indicate MJO's activity, including its convection feature and its impacts, even when satellite data is not available.

Conclusions and discussion
Aiming at extending the data length of historical MJO indices that requires an input of convection data, this study introduces a new method to construct RMM-like indices (i.e., MJO indices that include both convection and circulation information) by estimating MJO convection activities in the pre-satellite era based on upper-tropospheric geopotential anomaly. Based on the intrinsic relationship between MJO convection and upper-level geopotential (i.e., MJO convection center is located in between the upper-level high and low anomalies) (Adames and Wallace 2014; Leung and Qian 2017), the proposed method (1) first derives MJO-related OLR anomalies by a linear regression model of OLR and 150-hPa ∇ x z � and then (2) applies the derived OLR anomalies to the calculation procedures of any MJO indices (such as the CEOF calculation of RMM Index by WH04).
As an example, the proposed method is applied to extend a filtered version of the RMM Index (FMM Index), which is based on observed u200, u850, and OLR input. By replacing the OLR input with ORL anomalies derived from the ERA-I and ERA-20C reanalyzed z150, GMM-ERAI and GMM-ERA20C indices are respectively constructed and compared with the original FMM Index. The validity of GMM-ERAI and GMM-ERA20C indices are evaluated in three aspects: (1) the climatological properties of the index, (2) statistical errors of the index in the satellite era, and (3) the index's ability to indicate MJO activities in the pre-satellite era.
(1) The overall evaluation results show that the GMM Index is highly similar to the FMM Index in terms of its climatological characteristics and ability to indicate MJO events. Both the GMM-ERAI and GMM-   in Australia shows that MJO tends to induce more precipitation over Central North Australia (around Darwin, stations over 12°-15° S and 130°-135° E) and Northeast Australia (around Cape York Peninsula, stations over 12°-15° S and 140°-145° E) regions during MJO Phases 4-5 and 5-6, respectively. The result is consistent with previous studies (Wheeler and Hendon 2004;Wheeler et al. 2009) and thus verifies the ability of the GMM-ERA20C Index to indicate MJO activity in both the modern satellite era and pre-satellite era.
In Sect. 4.2, statistical evaluation results show high BCORR and low BRMSE between the GMM and FMM indices. The high correlation does not mean that the GMM Index is a better indicator of MJO activities compared to other indices, but suggests the validity and high accuracy of replacing observed OLR anomalies with OLR derived from 150-hPa ∇ x z � when constructing RMM-like indices. The error of the GMM-ERA20C Index could come from the derivation of OLR anomalies from 150-hPa ∇ x z � and the data quality of ERA-20C reanalysis product. Our results show that the latter factor may produce larger errors than the former factor. Nevertheless, the small error of the GMM-ERA20C Index demonstrates the feasibility of constructing reliable RMM-like indices over the pre-satellite era. According to the long-term trend of the GMM-ERA20C Index, the overall MJO activities have become more active during 1902-2008, in terms of both the index intensity and the number of MJO events. Since the main target of this study is introducing a new method of constructing RMM-like indices, the long-term trend of presatellite MJO activities is not given in detail here and will be discussed in our future work. A few remarks about using 150-hPa ∇ x z � as an OLR proxy, and caution of calculating the GMM Index are noted here: (1) By deriving the historical OLR anomalies from the 150-hPa ∇ x z � in the GMM Index calculation, we made an important assumption that the relationship between OLR and 150-hPa ∇ x z � remains the same in the whole twentieth century and does not change under climate change. This assumption is difficult to verify without a long-term record of satellite OLR observations. If the above assumption is invalid, the GMM Index in the early twentieth century may not be as accurate as that after 1979.
(2) The 150-hPa ∇ x z � may not fully respond to MJO's shallow convection. According to convection Leung and Qian (2017), the 150-hPa ∇ x z � is a good indicator of MJO deep convection, but shallow convection may not induce significant geopotential anomalies in the upper troposphere. Nevertheless, because the calculation procedure of the GMM (or RMM) Index (i.e., CEOF) mainly captures the MJO deep convection signals, the above shortcoming of the 150-hPa ∇ x z � does not cause great impacts on the validity of the GMM Index. (3) In fact, the 150-hPa ∇ x z � shares a similar dynamical essence to the velocity potential, in terms that they both reflect the source/sink of air mass in the upper troposphere. One could also extend the proposed OLR derivation method to the velocity potential. Namely, OLR anomalies can be derived from velocity potential by a simple linear regression model, and be further used in the MJO Index calculation. (4) The proposed GMM Index building method is based on the high linear correlation between the MJO-filtered 150-hPa ∇ x z � and OLR anomalies. This helps eliminating signals of high-frequency waves (such as the equa-torial Kelvin waves), which are also significantly large in the power spectrum of the unfiltered data of both variables. It is important to note here that the Pearson correlation between the unfiltered anomalies of 150-hPa ∇ x z � and OLR is rather weak, and we do not recommend reconstructing the unfiltered GMM Index using the proposed method. (5) OLR (and precipitation) data is also generated in the twentieth century reanalysis data, but the uncertainty of the reanalyzed rainfall data has been reported in numerous research. By comparing with individual rainfall observation records that are not assimilated in the ERA-20C, the ERA-20C precipitation data was shown to have particular large errors over the mountainous, monsoon, and tropical regions (Poli et al. 2016;Rustemeier et al. 2019). In spite of this, our additional analyses show that one could also reconstruct the FMM Index directly using the reanalyzed OLR product of ERA-20C that is highly similar to the observation (figure omitted). This provides another direction to reconstruct the MJO indices in the pre-satellite era.
The long-term behavior of the MJO in the pre-satellite era is an important topic and deserves more attention in scientific literature. Meanwhile, the short record of OLR observation limits research on the long-term historical variability of MJO. The above evidence confirms the validity of the GMM Index and the proposed MJO Index construction method. The proposed method can be applied to any MJO indices that require the input of OLR data, such as the OMI and FMO indices, by simply replacing the OLR data with that derived from 150-hPa ∇ x z � . Making use of the proposed method, one could extend the records of MJO indices, especially that involve MJO convection information, to the presatellite era and study the long-term trend and inter-decadal variability of MJO activity which can hardly be done before due to the lack of satellite observations. This study provides an alternative way to overcome the difficulty of historical MJO studies, and will be beneficial to our understanding of the long-term variability of MJO.