Our main results section is divided into four subsections. We start the analysis in subsection 3.1 with investigating the cross-correlation between **T** and **h**, which is a fundamental statistic on which the ENSO phase space analysis is based on. This analysis will also focus on the regional shift in ENSO patterns, as it affects cross-correlation between **T** and **h**. In the following subsection 3.2, we analyse the ENSO phase space in CMIP simulations, which is the main focus of this study. This is followed by the analysis of the linear ReOsc model fitted to the CMIP simulations. We conclude our discussion by examining the effect of wind stress on ENSO dynamics and how well CMIP models capture this atmospheric feedback in the subsection 3.4.

## 3.1 Regional variations of T and h

Fig. 1a shows the observed cross-correlation between *T* and *h*. It shows the characteristic out-of-phase relation between *T* and *h* with positive cross-correlation when *h* leads *T*, zero cross-correlation at lag zero, and strong negative correlations when *T* leads *h*.

The CMIP 5 and 6 ensembles are very similar to each other, and both show significant deviations from the observed cross-correlation (Fig. 1a). All CMIP 5 and 6 simulations have negative cross-correlation at lag zero, shifting the whole ensemble of simulations into clear negative correlations at lag zero. This shift also affects the out-of-phase cross-correlations with lower positive correlation when **h** leads **T** that tend to peak at longer (10mon.) lead times than observed (7mon.). In turn, the negative cross-correlations when **T** leads **h** are stronger in the CMIP models than in the observations and tend to have short lead times (5mon.) than observed (7mon.).

The shift towards negative lag-zero cross-correlations between **T** and **h** in CMIP simulations could indicate an eastward shift in the thermocline depth variability associated with ENSO in the CMIP simulations. This would be consistent with observed regional variations in the SST cross-correlation with thermocline depth variability along the equatorial Pacific, which is known to be more positive in the eastern Pacific and more negative in the western Pacific (Burgers and Stephenson 1999; Dommenget et al. 2023).

Given that the central assumption in the ENSO phase space analysis is that the cross-correlation between **T** and **h** is zero at lag zero, we now focus on the hypothesis that regional shifts in the simulated ENSO patterns could affect ENSO dynamics (e.g., the cross-correlation between **T** and **h**) in CMIP simulations. We now analyse the equatorial patterns of the ENSO modes in the models, focussing on the SST pattern first and then on the more important regional shifts in the thermocline depth variability.

The **T** index in the ReOsc model is most closely related to the empirical orthogonal function (EOF) mode-1 of the tropical Pacific SST, see Fig. 2a. It is marked by a horseshoe pattern with warm SST anomalies in the eastern equatorial Pacific that transition into negative SST anomalies in the western equatorial and off-equatorial Pacific. This transition crosses zero equatorial SST anomalies at the longitude of about 159oE (see dashed line in Fig. 2a).

In Figs. 2b-d, we show three scenarios for EOF mode-1 of SST from CMIP simulations to highlight cases closest to the observed (Fig. 2b; ACCESS-CM2), average (Fig. 2c; MRI-ESM2-0) and worst (Fig. 2d; NorESM2-MM) in relation to the observed transition line for zero equatorial SST anomalies. While the best CMIP simulation is close to the observed with some negative SST anomalies in the western equatorial Pacific, the worst model has essentially only positive SST anomalies over the whole equatorial Pacific.

In general, the EOF mode-1 of tropical Pacific SST of CMIP simulations is similar to the observed, but the transition line for zero equatorial SST anomalies is, in ensemble mean, about 20o further west for both CMIP 5 and 6 simulations (Fig. 3a). This is consistent with the westward shift reported in previous studies (Weller and Cai 2013; Yang and Giese 2013; Bellenger et al. 2014; Capotondi et al. 2015; Wang et al. 2015; Guilyardi et al. 2020; Jiang et al. 2021; Planton et al. 2021).

The observed EOF mode-1 of thermocline depth variability is a dipole pattern (tilting mode; not shown) that has an in-phase relation with the **T** index in the ReOsc model and thus does not represent the **h** index (Meinen and McPhaden 2000). The EOF mode-2 of the equatorial Pacific is similar to a basin-wide mode, but the pattern is somewhat shifted to the eastern Pacific, see Fig. 2e. This mode is more closely correlated to **h** (correlation = 0.91). While the EOF mode-2 is mostly a monopole, it does have a transition from negative anomalies to positive anomalies in the far western equatorial Pacific at a longitude of about 146°E (see dashed line in Fig. 2e).

The EOF mode-2 of the CMCC-CM2-SR5 (best model) exhibits a similar spatial pattern, with near-zero values present around the same region with observation (Fig. 2f). A medium CMIP6 model, ACCESS-ESM1-5, demonstrates that the near-zero values shift towards the eastern equatorial Pacific region, and we can see a dipole-like structure with deeper thermocline depth in the western and shallower thermocline depth in the central to eastern pacific region (Fig. 2g). In addition, the CNRM-CM6-1 CMIP6 model (worst model) exhibits a clear dipole pattern in EOF mode-2, more similar to the tilting mode described in Meinen and McPhaden (2000; Fig. 2h).

In the ensemble of CMIP5 and 6 models, we can see that the EOF mode-2 patterns of thermocline depth (**h**) suggest an eastward shift in the equatorial Pacific region of about 20°E (Fig. 3b).

In summary, we find significant regional shifts in the SST and thermocline depth patterns associated with the ReOsc model. Interestingly, we find that the SST pattern is shifted to the west and the thermocline depth pattern is shifted to the east. At this point, we don't have a clear understanding of the reasons behind these biases, but it appears very likely that such regional shifts can affect the cross-correlation between **T** and **h**.

Following the analysis of Dommenget et al. (2023) we know that an eastward shift of the **h** index region leads to a more positive lag-zero cross-correlation between **T** and **h**. This could compensate for the negative lag-zero cross-correlation between **T** and **h** found in CMIP models (Fig. 1a). To fit into the ReOsc model and to follow the phase space concept, CMIP models should have an out-of-phase relationship between **T** and **h** (e.g., zero lag-zero cross-correlation).

To define a revised **h** index for the CMIP models which has zero lag-zero cross-correlation between **T** and **h**, we shifted the index region for **h** eastward along the equator (**h****shift**) until we found a region that most closely follows the observed zero lag-zero cross-correlation between **T** and **h**, for all CMIP simulations. We concluded that the out-of-phase characteristics for CMIP models in the ensemble mean are very similar to observation when **h****shift** is considered for the east equatorial pacific region (5°S to 5°N, 190°E to 80°W) (Fig. 1b). However, individual models do have some significant variations from this overall good fit.

In addition to this, we also investigated various regions of **T** to understand the impact of the SST pattern shift on the ENSO phase space for CMIP models. However, we found that changing the **T** regions had no significant impact on the out-of-phase relationship between **T** and **h.** As a result, we have focused solely on the regional shift of **h** for CMIP models for the remaining analysis.

## 3.2 ENSO phase space analysis for CMIP models

The observed ENSO phase space is depicted in Fig. 4a. The main features show a clockwise rotation of the tendencies, with clear phase propagation through all four phases of the ENSO cycle. Unlike the phase space of an idealised linear ReOsc model, which is symmetric in all phases (Dommenget and Al-Ansari 2022), the observed variability is clearly skewed towards El Niño (positive **T****n** values) to discharge states (negative **h****n** values) (quarter Q2) and the tendencies are stronger in quarter Q2 and weaker in quarter Q4 (La Niña to recharge phases). A more detailed discussion is presented in Dommenget et al. (2023).

Examples of CMIP models are shown for the phase space estimated based on **h** (left column in Fig. 4) and based on **h****shift** (right column), see Fig. 4. We selected a best (NorESM2-MM), medium (CESM2-WACCM), and worst (BCC-ESM1) model based on the correlation values between the mean phase space estimated based on **h****shift** and the observed value. If we compare the left with the right column, we can notice that the phase spaces of the left column are pronounced along the diagonal of Q2 to Q4, which is a signature of a negative cross-correlation between **T** and **h** (Dommenget et al. 2023). The right column, in turn, is more equally distributed on all four quarters of the phase space and generally, closer to the observed behaviour. This highlights that the CMIP models compare better with the observed phase space when **h****shift** is considered.

The best model (NorESM2-MM) closely reproduces the observed ENSO phase space with high correlation in the mean phase space (0.9), clear clockwise propagation in all four phases, and variability skewed towards stronger positive **T****n** (El Niño), and stronger negative **h****n** (discharge phase). In turn, the worst model (BCC-ESM1) shows opposite asymmetries in the phase space to those observed, with a negative correlation in the mean phase space (-0.85), indicating that it has extreme recharge states instead of the observed extreme discharge states. However, it also shows clear clockwise propagation in all four phases.

The mean statistical features in the ENSO phase space are shown for all CMIP models and the observations in Fig. 5. Again, the left column shows the CMIP models based on **h** and the right column based on **h****shift**. The observed values are shown in both columns for comparison (both based on **h**). The CMIP models phase space based on **h** (left column) shows clear biases in all statistics. The mean phase space and the probability distribution of **S** > 2 are both more pronounced along the diagonal of Q2 to Q4, the growth rate is more pronounced along the **T**-axis and the phase speed along the diagonal of Q1 to Q3. All these biases directly result from the negative cross-correlation between **T** and **h** (Dommenget et al. 2023). As a result, the CMIP models phase space statistics based on **h** do not correlate well with the observed values.

The phase space statistics of the CMIP models based on **h****shift** show much less obvious biases and correlate much better with the observed values. The CMIP models generally do capture the asymmetry in the mean phase space, with the largest mean value in Q2 (El Niño and discharge state), smallest value in Q4 (La Niña and recharge state), and a correlation of the ensemble mean value in CMIP 5 and 6 of 0.88 and 0.81, respectively. However, the CMIP ensemble does not quite capture the intensity of the shift from Q4 to Q2, indicating the CMIP models are underestimating the non-linearity in the ENSO phase space.

This also holds for the probability distribution of **S** > 2 (extreme values), which in observations show a clear shift to Q2 and away from Q4. This is captured by the CMIP models, but again not with the right intensity. In particular, the absence of observed extremes in Q4 is not fully captured by the CMIP models. Furthermore, we see a very large spread in CMIP ensemble members, indicating widely different behaviour between the models.

The observed mean growth rate as a function of the ENSO phase shows positive values mostly around the recharge towards the El Niño state and negative values in the transition from discharge to a La Niña state (Fig. 5g). This signature is somewhat captured in the CMIP ensemble, but only with a moderate correlation. The CMIP models capture the negative growth rate in the transition from discharge to a La Niña state better than the positive values during the recharge to the El Niño state. Again, the CMIP models show a wide spread between the ensemble members. Notably, we find several models with very strong negative growth rates around the discharge and recharge phases, suggesting that these ENSO states can collapse much faster in these models than observed.

The observed phase speed is fastest, or most clear, in Q2 (after El Niño states) and slowest, or least clear in Q4 (after La Niña states; Fig. 5h). The CMIP model ensemble can partly capture this signature, but only with a moderate positive correlation. In particular, the models have problems in capturing the smaller phase speed after La Niña events. This suggests that models tend to have a faster or clearer transition from La Niña states to the recharge state. The model ensemble also has a wide range of different phase speeds. Some models transition through some ENSO phases more than twice as fast than observed, and some models have phase speeds near zero in some ENSO phases, suggesting the models have no clear ENSO cycle in these phases.

The phase speed analysis suggests that CMIP models are oscillating a bit faster than observed. Figure 6 shows the observed and CMIP simulated power spectrum of the **T** index. We can first notice that the ensemble mean power spectrum of CMIP 5 and 6 are in very good agreement with the observed, with no clear shift in the power to higher frequencies. However, both CMIP ensembles have a small tendency to peak at slightly shorter periods (~ 3yrs period) than observed (~ 4yrs period). Individual members of the ensemble can, however, behave quite differently, with some models having clear peaks at different periods and other models having less obvious peaks with a more continuous power spectrum.

## 3.3 Fitted linear ReOsc model for CMIP

The ENSO phase space is based on the ReOsc model. It therefore does help to analyse the ReOsc model fitted to the data of the CMIP models to understand what is causing the characteristics of the ENSO phase space. Here we focus on the linear model, thus not considering non-linearities, and without considering any seasonality in the fitted parameters. Neglecting non-linearities is a strong limitation, as will be shown below, and is only reflecting that we yet do not know what non-linear model can describe the observed asymmetries in the ENSO phase space (Dommenget and Al-Ansari 2022; Dommenget et al. 2023).

Figure 7 shows the fitted ReOsc model parameters for all CMIP models and observations for models based on *h* (left column) and based on *h*shift (right column). The observed estimate is the same in both columns. Here all parameters are normalized (as in *T*n and *h*n) to allow a better discussion of the dynamical implication for the ENSO phase space.

We can first-of-all note that there is quite some spread between models, with many models having parameters being far away from the observed values in both estimates. Further, we find that the two estimates of the ReOsc models are quite different from each other. The models based on **h** are strongly biased in the growth rates and the noise strength in **h** relative to the observed values but are much less biased in the coupling parameters. In turn, the models based on **h****shift** are more strongly biased in the coupling parameters.

The results for CMIP5 models are also somewhat different from Vijayeta and Dommenget (2018; hereafter VD18), which used **Z****20** to estimate **h** instead of **Z****mxg**. This does alter the ReOsc model parameter fits. However, the CMIP5 models tend to overestimate the damping (negative a22) of **h** in both estimates present here and in VD18.

We now focus on the ReOsc model fits based on **h****shift** as they better represent the observed ENSO phase space. Here we are primarily interested in what dynamical elements of the ReOsc model are causing the simulated ENSO phase space characteristics. Any deviation of the ENSO phase space from a perfect cycle (a perfectly symmetric phase space), can be linked to asymmetries (deviations for the dashed lines in Fig. 7) in the ReOsc models parameter pairs: growth rates (a11, a22), coupling (a12, a21) or noise forcings (ξ1, ξ2). In a non-linear model, they can also result from non-linearities in any of these aspects.

Individual models do show strong deviations from the dashed lines in the growth rates and coupling parameters, but less so in the noise forcings. In the ensemble mean the largest asymmetries are found in the coupling parameters, where nearly all models are below the dashed line, indicating a stronger coupling of **T** to **h** (a12) than the coupling of **h** to **T** (a21), which is quite different from the observed symmetric coupling. The coupling parameters are most important for the period of ENSO (Wyrtki 1985). The Wyrtki-index, which measures the peak period of ENSO as a function of a12 and a21, is for most models slightly shifted to shorter periods (Fig. 7e), consistent with somewhat larger phase speed values seen in the phase space diagrams (Fig. 5h).

Integrating the linear ReOsc model fitted to each CMIP model allows us to compute the phase space statistics based on the resulting **T****n** and **h****n** values, see left column in Fig. 8. The phase spaces resulting from the ReOsc model are in general similar to those of the CMIP models, but, due to the linear approach, the phase spaces are by construction symmetric (e.g., all structures are mirroring at the origin) and therefore lack all asymmetric variations. This is particularly important when compared against the observed phase space, because most of the interesting observed phase space characteristics are asymmetric. None of the observed asymmetries can be captured by the analysis of the linear ReOsc model, indicating that such structures must result from the non-linear processes. Thus, a first and important outcome of the linear ReOsc model fit is that the CMIP model mismatch to the observed phase space characteristics are likely to result from non-linear processes.

Despite the limitation of the linear ReOsc model fit, we can still gain some understanding about the CMIP model ensemble spread and how different linear aspects of the ReOsc model affect the ENSO phase space statistics. The mean **S**, the probability of extreme **S** > 2 and the phase speed are all only varying along the diagonals of the phase space but have no variation along the **T**-axis and **h**-axis of the diagrams (left column in Fig. 8). The growth rates are only varying along **T**-axis and **h**-axis, but not along the diagonals. However, the actual data of the CMIP ensembles does show more complex variations in all statistical aspects (right column Fig. 5). This in turn suggests that variations not captured by the linear model parameters are a result of non-linear processes.

We can further test the sensitivity of the ENSO phase space to the individual ReOsc model parameter variations in the CMIP ensembles, by varying only a subset of the parameters, while holding the other parameters at the ensemble mean values following the approach of VD18. This approach can allow us to determine which ReOsc model dynamics are causing asymmetries or variations in the ENSO phase space. The following three scenarios are presented: asymmetries in the growth rates (second column of Fig. 8), coupling (third column of Fig. 8), and forcing strength (right column of Fig. 8).

The spread in the mean **S** values in the phase space of the linear ReOsc model fits (first row Fig. 8) is much smaller in each of the three sensitivity experiments, suggesting that the spread in mean **S** is resulting from a more complex combination of several parameters, but does not result from any of the three asymmetries tested. The ensemble mean **S** value is mostly affected by asymmetries in the coupling and forcings, which both have opposing effects. Here the asymmetries in linear coupling do have some similarities with the observed, but they lack the non-linear elements. Asymmetries in the growth rates have little impact on the mean **S**.

The phase space variations of the extreme values (**S** > 2) are affected by all their parameter asymmetries, but most strongly by the asymmetries in the coupling and forcings (second row Fig. 8). The phase space variations of the growth rates are clearly linked to the asymmetries in the growth rates, but asymmetries in the coupling and forcings do also have a significant effect (third row Fig. 8).

The sensitivities of the phase speed to the linear ReOsc parameters show complex behaviour (last row Fig. 8). When we only consider the asymmetries in the growth rates or forcing strength, we get much larger phase speeds than observed in the CMIP ensemble. This suggests that the asymmetry in the coupling parameters has a strong impact on the phase speed. We can further notice that none of the three asymmetries reflect the overall variations in the phase speed, suggesting that more complex combinations of different ReOsc model parameters affect the phase speed.

## 3.4 Wind dynamics for CMIP models

The surface wind response to SST anomalies is a key element of the Bjerknes feedback and is one of the main processes that control ENSO growth rate and period. While this relation does not directly relate to the ENSO phase space or the ReOsc model, it is implicitly related to both as it affects all parameters of the ReOsc model, in particular the growth rate of **T**.

Figure 9a shows the cross-correlation between the NINO4 region (5°S to 5°N, 160°E to150°W) zonal wind stress (**τ****x**) and **T** for the observations and the CMIP models. The observed cross-correlation shows a strong and mostly in-phase relation between **τ****x** and **T**, consistent with a strong response of **τ****x** to variations in **T**. The relation is, however, strongest when **τ****x** leads **T** by about one month, indicating also that variations in **τ****x** are forcing variations in **T**.

Overall, we see a close agreement between CMIP 5 and 6, and between the models and the observations. Despite the good agreement, there are some deviations in the CMIP models from observations. The CMIP models have a slightly low cross-correlation at lag zero, indicating a weaker response of **τ****x** to **T**. The maximum cross-correlation in the CMIP ensembles is when **τ****x** leads **T** by about two months, indicating a stronger forcing of **τ****x** on variations in **T** than observed. The spread in these relations is quite large in the CMIP ensembles, indicating quite diverse behaviour in the individual models.

Figure 9b shows the regression value of **τ****x** per **T**, which is often referred to as the atmospheric Bjerknes feedback (Bjerknes 1969). The observed atmospheric Bjerknes feedback is approximately 12.5*10− 3 Nm− 2/oC, whereas the CMIP ensemble averages are 7.4 (CMIP5) and 7.8*10− 3 (CMIP6) Nm− 2/oC. Thus, CMIP models underestimate this important feedback by approximately 40%, with none of the CMIP models reaching the observed value and no significant improvement from CMIP 5 to 6. There is a substantial spread within the CMIP model ensemble that is also of similar magnitude in both CMIP 5 and 6. These results are largely consistent with previous studies (Guilyardi et al. 2012; Bellenger et al. 2014; Vijayeta and Dommenget 2018; Guilyardi et al. 2020).

To further examine the variability in **T**, **τ****x** and **h** we looked at the standard deviation of these variables (Fig. 10). The standard deviation of **T** in the CMIP model averages closely resembles observations, with the CMIP5 model average being smaller than the CMIP6 value (Fig. 10a). However, there are significant differences between individual CMIP models, with some models having significantly smaller and others larger standard deviations than the ensemble model or observed.

The observed **τ****x** has a significantly larger standard deviation than both CMIP5 and CMIP6 model averages (Fig. 10b). This would be consistent with the weaker **T** forcing on **τ****x**. Some CMIP models show standard deviations that are closer to the observation, while others have noticeably smaller standard deviations. The results suggest that the weaker **τ****x** variations may be a cause or a reflection of the weaker **τ****x** relation to **T** in the models.

The CMIP6 ensemble mean standard deviation for **h** is very close to the observed, whereas the CMIP5 ensemble mean is a bit smaller, and some show large deviations for the ensemble mean (Fig. 10c). However, more relevant for the ENSO phase space discussion is the variability of **h****shift** (Fig. 10d). This is significantly weaker than the observed **h** for both CMIP ensembles. Given a similar cross-correlation between **h****shift** and **T** in the CMIP model as in the observations (Fig. 1b) and the smaller standard deviation of **h****shift** in the CMIP as shown above, we would expect larger values of a12 in CMIP models than observed (see Fig. 7e), since the regression values are divided the smaller standard deviations of **h****shift**. Thus, per **h****shift** anomaly in the CMIP models we get a larger **T** anomaly than in observations. It should be noted here that this does not need necessarily imply a larger sensitivity of **T** to **h****shift**, because correlation does not allow to determine causality here.