CMIP6 captures the satellite-era jet slowdown and Arctic amplification, yet projects future jet speedup and tropical amplification

The polar-to-subtropical temperature gradient in the free troposphere is a key driver of the extratropical jet stream response to climate change. Climate models tend to steepen this gradient in response to large greenhouse gas increases, due to very strong subtropical upper-level warming. This strengthens the simulated jets. However, multiple lines of observational evidence point to a slowing northern jet over the satellite era, driven by enhanced Arctic free-tropospheric warming and weakening of the gradient. Here, we reconcile these seemingly contradictory results by showing that the CMIP6 ensemble successfully simulates both the observed satellite-era slowdown/weakening, and the speedup/strengthening with strong global warming. Specifically, the observed gradient weakening from 1980–1997 to 1997–2014 appears inconsistent (p < 0.05) with the simulated gradient changes for just 6 of 45 models using Microwave Sounding Unit observations, and for just 5 of 45 models using reanalysis estimates. The observed jet slowdown appears inconsistent with the simulated jet changes for just 1 of 45 models. In fact, a clear majority of the CMIP6 models weaken the gradient and slow down the jet over this interval. Yet a strong majority of the models reverse course under a high-emissions future-type scenario, simulating gradient strengthening and jet speedup. Future work will seek to clarify the cause(s) of this unexpected difference between past and future atmospheric responses.


Introduction
The Northern Hemisphere extratropical jet is driven in large part by the equator-pole temperature gradient. The strength and location of the jet are anticipated to change in a warmed climate, but how these changes will manifest is not well defined (Francis and Vavrus 2015; Francis 2017; Matsumura et al. 2019). Two responses of the jet to climate change are often found in modeling studies. One is a poleward shift of the jet and extratropical storm track, often associated with an increase in zonal wind speed, a less wavy jet pattern, and potentially less persistent weather patterns, although this is contested (Butler et al. 2010; Francis and Vavrus 2012; Barnes et al. 2014; Deser et al. 2015; Ceppi and Hartmann 2016; Yim et al. 2016; Peings et al. 2018; Shaw and Tan 2018; Stendel et al. 2021). These features are also signatures of a positive North Atlantic Oscillation (NAO)/Arctic Oscillation (AO) (Feldstein and Lee 2014). In contrast, other studies show an equatorward shift of the jet and extratropical storm track, associated with a decrease in zonal wind speed, a wavier jet pattern, and possibly more persistent weather patterns (Butler et al. 2010; Francis and Vavrus 2012; Barnes et al. 2014; Barnes and Polvani 2015; Deser et al. 2015; Ceppi and Hartmann 2016; Yim et al. 2016; Peings et al. 2018; Shaw and Tan 2018; Stendel et al. 2021). These are also features of a negative NAO/AO (Feldstein and Lee 2014).
The poleward shift and strengthening of the extratropical jet have been attributed to increased warming aloft in the tropical upper troposphere (Butler et al. 2010; Allen et al. 2012; Deser et al. 2015; Ceppi and Hartmann 2016; Yim et al. 2016; Peings et al. 2018; Shaw and Tan 2018), though Shaw and Tan (2018) found that the subtropical upper troposphere plays a larger role than the tropics. Lower-stratospheric cooling in the Arctic has also been shown to cause a poleward shift of the jet (Butler et al. 2010; Allen et al. 2012; Deser et al. 2015; Yim et al. 2016; Shaw and Tan 2018; Peings et al. 2018). Conversely, the equatorward shift and weakening of the extratropical jet have been attributed to enhanced warming in the Arctic lower troposphere, known as Arctic Amplification (Butler et al. 2010; Allen et al. 2012; Barnes and Polvani 2015; Deser et al. 2015; Ceppi and Hartmann 2016; Yim et al. 2016; Shaw and Tan 2018; Peings et al. 2018). Other factors may also contribute to these jet changes. For example, Allen et al. (2012) found that warming in the midlatitude boundary layer also produces a poleward shift of the jet.
The balance between warming of the tropical upper troposphere and warming of the Arctic lower troposphere is the premise of the "tug-of-war" hypothesis: whichever region warms more in the future should determine the future jet response. Because the two jet responses have opposing potential effects on weather (e.g., a wavier or less wavy jet; more or less persistent blocking patterns), accurately projecting the future jet change in climate models is important for understanding the effects of climate change on midlatitude weather.
However, there is evidence that the behavior of the jet in climate models is at odds with observations over the last several decades. Most climate models project a poleward jet shift and/or stronger warming in the tropics than in the Arctic (Yin 2005; Lorenz and DeWeaver 2007; Butler et al. 2010; Barnes and Polvani 2013; Delcambre et al. 2013; Lorenz 2014; Ceppi and Hartmann 2016; Lachmy and Shaw 2018; Screen et al. 2018; Li et al. 2019; Harvey et al. 2020), but studies based on observations and reanalyses of the satellite era indicate that the Arctic may be warming more than the tropics and/or that the jet has shifted equatorward (Graversen et al. 2008; Santer et al. 2013; Santer et al. 2018; Feldstein and Lee 2014; Francis and Vavrus 2015; Manney and Hegglin 2018).
Additionally, the discussion of the competition between the Arctic and the tropics has focused specifically on warming of the Arctic near the surface and of the tropics in the upper troposphere. The Arctic upper troposphere is typically not considered, as most of the Arctic warming occurs near the surface (Lorenz 2014; Barnes and Screen 2015; Matsumura et al. 2019). However, some reanalysis- and observation-based studies indicate that the Arctic upper troposphere is also warming significantly, and that CMIP5 and earlier generations of models overestimate tropical warming and underestimate Arctic warming in the mid to upper troposphere (Graversen et al. 2008; Santer et al. 2013; Santer et al. 2018). Kim et al. (2021) used targeted GCM simulations to show that midlatitude jet shifts are more sensitive to Arctic free-tropospheric warming than to warming concentrated near the Arctic surface.
While the multi-model means of studies based on earlier model generations tend to produce a poleward shift and strengthening of the jet, individual models have been shown to produce a wide variety of jet response patterns (Yim et al. 2016). Several studies have used the tug-of-war framework to understand differences in the jet response among climate models and found a significant correlation between the Arctic-tropical warming contrast and changes in the jet (Barnes and Polvani 2015; Yim et al. 2016; Peings et al. 2018). Peings et al. (2018) found a strong correlation of 0.84 between the change in jet strength and the "ratio between upper-tropospheric tropical and Arctic warming (RUTAW)", consistent with the tug-of-war hypothesis. Such a correlation makes it possible to predict the jet response in climate models quantitatively from the pattern of tropospheric warming.
Given the apparent disagreement between climate models and observations, as well as the disagreement among individual models, there is strong motivation to study the relationship between the Arctic-tropical warming contrast and the extratropical jet. Most studies have used earlier phases of the Coupled Model Intercomparison Project (CMIP). The most recent phase, CMIP6 (Eyring et al. 2016), offers a suite of state-of-the-art climate models with which to address several questions. Does the significant relationship between Arctic and tropical warming and zonal wind found by prior studies hold in CMIP6 future and historical simulations, and if so, is it stronger or weaker? Are climate models accurately capturing the satellite-era extratropical jet change and the tropical-polar warming contrast? Finally, given the utility of the correlation between Arctic-minus-tropical warming and zonal wind change for predicting the future jet response, can models be "ruled out" by comparing their historical simulations to observations?

Data
High-emission simulations (1pctCO2)
To represent the future under a high-emissions scenario, 45 climate model simulations from CMIP6 were used. The idealized 1pctCO2 experiment (1% per year CO2 increase for 150 years; Eyring et al. 2016) was chosen instead of one of the more realistic shared socioeconomic pathway (SSP; Eyring et al. 2016) experiments, because relatively few modeling groups had made their SSP simulations available when data were being downloaded for this study, whereas many had made their 1pctCO2 simulations available (Table 1). Although the SSP simulations have more realistic forcing, that forcing is still mostly CO2 (especially for the high-emission SSP5-8.5), so the 1pctCO2 simulations are a reasonable substitute. By the end of the 150-year 1pctCO2 simulations, CO2 concentration has more than quadrupled, from 284 ppm to about 1263 ppm, analogous to a high-emission future. Only the first ensemble member of each model is used for the 1pctCO2 portion of the study, because the other ensemble members behave nearly identically to the first in CMIP5 RCP8.5 (Golden 2020), and a cursory check of a few models confirmed this for CMIP6 1pctCO2 as well. This is expected given the very high signal-to-noise ratio of this experiment.
The variables analyzed in this study are monthly air temperature (ta) and zonal wind (ua) on pressure levels, and a control period and a future-like period are defined for each variable. The control period is model years 1 through 30 (average CO2 ~330 ppm), and the future-like period is model years 121 through 150 (average CO2 ~1090 ppm) (Eyring et al. 2016). Note that these model years do not correspond to actual calendar years.
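The CO2 values quoted above follow from 1% compound growth. The sketch below is illustrative only: it assumes the concentration after k years is 284 × 1.01^k ppm and averages end-of-year values, so it reproduces the ~330 and ~1090 ppm period means only approximately (using start-of-year values shifts each mean by a few ppm). The function names are hypothetical.

```python
"""Illustrative 1pctCO2 concentration path (assumed compounding convention)."""

C0_PPM = 284.0  # preindustrial starting concentration quoted in the text (ppm)
GROWTH = 1.01   # 1% per year compound increase


def co2_ppm(year: float) -> float:
    """Assumed CO2 concentration (ppm) after `year` years of 1%/yr growth."""
    return C0_PPM * GROWTH ** year


def mean_co2(first_year: int, last_year: int) -> float:
    """Mean of end-of-year concentrations over model years first..last (inclusive)."""
    years = range(first_year, last_year + 1)
    return sum(co2_ppm(y) for y in years) / len(years)


if __name__ == "__main__":
    print(f"end of year 150: {co2_ppm(150):.0f} ppm")              # ≈ 1263 ppm
    print(f"control (1-30): {mean_co2(1, 30):.0f} ppm")            # ≈ 333 ppm
    print(f"future-like (121-150): {mean_co2(121, 150):.0f} ppm")  # ≈ 1098 ppm
```

Under this convention the concentration first exceeds four times its starting value in year 140, consistent with it having more than quadrupled by year 150.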

Satellite era simulations
Simulations of the 1980-2014 satellite era are provided by the CMIP6 historical simulations from the same 45 models used above. While SSP output could be used to extend this period to the most recent observations (e.g., 2022), few models had SSP simulation data available at the time of this study. Unlike for the 1pctCO2 simulations, multiple ensemble members are included where available to help account for internal variability, because the signal-to-noise ratio is smaller over the satellite era. In total, 469 ensemble members across 45 models are included in this part of the study (Table 1). Again, the variables analyzed are monthly temperature and zonal wind, which are split into two 18-year segments to obtain past (1980-1997) and present (1997-2014) periods. The start year is 1980 rather than the usual 1979 simply because one of the reanalysis datasets (below; "MERRA-2" in Table 2) begins only in 1980. Because the 35-year record has an odd number of years, the two periods overlap by one year (1997).
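The overlapping split can be checked in a few lines. This is a sketch with a hypothetical helper name; it simply takes the first and last 18 calendar years of the 35-year record.

```python
def split_satellite_era(start=1980, end=2014, length=18):
    """Return (past, present, overlap) year lists for the satellite-era split."""
    years = list(range(start, end + 1))  # 35 calendar years, 1980-2014
    past = years[:length]                # first 18 years
    present = years[-length:]            # last 18 years
    overlap = sorted(set(past) & set(present))
    return past, present, overlap


past, present, overlap = split_satellite_era()
print(past[0], past[-1])        # 1980 1997
print(present[0], present[-1])  # 1997 2014
print(overlap)                  # [1997]
```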

Satellite observations and reanalyses
For the observational comparison, we use temperature and zonal wind output from four modern global reanalyses (Table 2), as well as satellite Microwave Sounding Unit (MSU) data on free-tropospheric temperature. Reanalyses are not designed for long-term trend analysis, so there is inherent uncertainty in the reliability of their temperature and zonal wind trends. However, they are among the best estimates of reality available for the free troposphere, and the four reanalyses used in this study broadly agree (below; Sect. 4). The Climate Forecast System Reanalysis (CFSR; Saha et al. 2014) was originally also included in this analysis, but it was an outlier in its tropical and subtropical warming compared with the other reanalyses. This is likely due to the transition from CFSR to CFSv2 in 2010 (Wright et al. 2020), suggesting that the CFSR warming may be spurious. CFSR was removed from the analysis, with little change to the results.
A direct comparison of MSU data to CMIP6 requires two datasets. The first is the actual MSU data, which consist of tropospheric temperature observations from 1979 to near-present, originating from over a dozen satellites spliced together and interpreted by three different groups (Table 3). The second is synthetic MSU data for the same time frame, developed by Po-Chedley et al. (2021), showing what a satellite would have observed in the atmosphere simulated by a CMIP6 climate model. Synthetic MSU data are available for 44 of the 45 models (Table 1). For comparison with the CMIP6 analyses, only 1980-2014 is considered.
Preliminary work by Golden (2020) determined that warming throughout the depth of the free troposphere in both the Arctic and the subtropics is important for the jet response (see the methods section), so the stratosphere-corrected TMT (temperature of the midtroposphere) channel, also known as "T24" or "TTT", was chosen. The uncorrected TMT product includes some contamination from stratospheric cooling, which can be removed using information from the MSU lower-stratospheric temperature channel (TLS) (Fu et al. 2004). Specifically, this study uses TTT = (1 − a) × TMT + a × TLS with a = −0.1, i.e., TTT = 1.1 × TMT − 0.1 × TLS. Some studies use a latitudinally varying coefficient for corrected TMT (e.g., a = −0.18 poleward of 30°; Fu and Johanson 2005). We chose a latitudinally invariant coefficient to avoid artificial jumps in tropospheric temperature within the regions of interest, as in Santer et al. (2018).
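A minimal NumPy sketch of the correction above, applied to TMT and TLS anomalies or trends. The function name is hypothetical; because `a` is a single scalar, the same coefficient applies at every latitude, matching the latitudinally invariant choice described in the text.

```python
import numpy as np


def corrected_tmt(tmt, tls, a=-0.1):
    """Stratosphere-corrected TMT ("TTT"): (1 - a) * TMT + a * TLS.

    With the default a = -0.1 this equals 1.1 * TMT - 0.1 * TLS.
    Inputs may be scalars or arrays of matching shape (e.g. K/decade trends).
    """
    tmt = np.asarray(tmt, dtype=float)
    tls = np.asarray(tls, dtype=float)
    return (1.0 - a) * tmt + a * tls


# A cooling stratosphere (negative TLS change) raises TTT above TMT,
# offsetting the stratospheric cooling that leaks into the TMT channel.
print(corrected_tmt(0.5, -1.0))  # ≈ 0.65
```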

Methods
High-emission simulations (1pctCO2)
Zonal-mean, annual-mean temperature and wind change fields are computed for each CMIP6 model in Table 1 by differencing atmospheric temperature and wind between the control (years 1-30) and future-like (years 121-150) periods.
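The differencing step can be sketched with NumPy as follows. This assumes each model's monthly output has already been loaded into an array shaped (month, level, lat, lon) starting at model year 1; the function names are hypothetical, and months are weighted equally in the annual mean, which is a simplification rather than the study's stated procedure.

```python
import numpy as np


def annual_means(monthly):
    """Collapse a (month, ...) array to (year, ...) with equal monthly weights."""
    monthly = np.asarray(monthly, dtype=float)
    n_years = monthly.shape[0] // 12
    trimmed = monthly[: n_years * 12]
    return trimmed.reshape(n_years, 12, *monthly.shape[1:]).mean(axis=1)


def period_change(monthly, control=(1, 30), future=(121, 150)):
    """Zonal-mean, annual-mean change (future minus control) on (level, lat).

    `control` and `future` are 1-based, inclusive model-year ranges.
    """
    annual = annual_means(monthly)   # (year, level, lat, lon)
    zonal = annual.mean(axis=-1)     # zonal mean -> (year, level, lat)
    c0, c1 = control
    f0, f1 = future
    control_mean = zonal[c0 - 1 : c1].mean(axis=0)
    future_mean = zonal[f0 - 1 : f1].mean(axis=0)
    return future_mean - control_mean
```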
To understand the relationship between these variables, the extents of the subtropics, the Arctic, and the extratropical jet must also be defined. A latitude-pressure box is defined for each of these three regions, based on sensitivity tests conducted for CMIP5 in Golden (2020): the extratropical jet box extends from 30° N to 70° N and from 1000 mb to 200 mb, the Arctic box from 60° N to 90° N and from 850 mb to 300 mb, and the subtropical box from 20° N to 40° N and from 850 mb to 200 mb (Fig. 1). Golden (2020) found that extending the Arctic box into the upper troposphere and the subtropical box into the lower troposphere yields stronger correlations than using only the subtropical upper troposphere or the Arctic lower troposphere. The chosen latitudinal extents also yielded significantly better correlations than the other combinations tested, with the exception of the 60° N to 90° N Arctic latitude bounds, which were not tested. The 20° N to 40° N extent for the subtropics is also supported by Shaw and Tan (2018), who found that the jet is more sensitive to subtropical than to tropical warming. Additionally, the choice to extend the Arctic box into the upper troposphere is supported by Kim et al. (2021), who showed in modeling experiments that Arctic free-tropospheric warming produces a jet shift more effectively than Arctic near-surface warming alone. A simple, unweighted average over latitudes and pressure levels is taken within each of the three boxes to obtain the subtropical warming, the Arctic warming, and the zonal wind change for each model. The subtropical warming is subtracted from the Arctic warming to obtain the Arctic-subtropical warming difference. A simple linear regression between the Arctic-subtropical warming difference and the zonal wind change is then performed to test the tug-of-war hypothesis.
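The box averaging and regression described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the change fields are NumPy arrays shaped (level, lat) with pressure in mb and latitude in degrees north, treats box bounds as inclusive, and uses NumPy's least-squares `polyfit` for the regression. The helper names are hypothetical.

```python
import numpy as np

# Inclusive latitude (deg N) and pressure (mb) bounds of the three boxes.
BOXES = {
    "jet": {"lat": (30, 70), "plev": (200, 1000)},
    "arctic": {"lat": (60, 90), "plev": (300, 850)},
    "subtropics": {"lat": (20, 40), "plev": (200, 850)},
}


def box_mean(change, lats, plevs, box):
    """Simple unweighted mean of a (level, lat) field over one lat-pressure box."""
    lat_lo, lat_hi = box["lat"]
    p_lo, p_hi = box["plev"]
    lat_mask = (lats >= lat_lo) & (lats <= lat_hi)
    lev_mask = (plevs >= p_lo) & (plevs <= p_hi)
    return float(change[np.ix_(lev_mask, lat_mask)].mean())


def tug_of_war_metrics(d_temp, d_wind, lats, plevs):
    """Return (Arctic minus subtropical warming, jet-box zonal wind change)."""
    arctic = box_mean(d_temp, lats, plevs, BOXES["arctic"])
    subtropics = box_mean(d_temp, lats, plevs, BOXES["subtropics"])
    jet = box_mean(d_wind, lats, plevs, BOXES["jet"])
    return arctic - subtropics, jet


def regress(warming_diff, wind_change):
    """Least-squares line and Pearson r across models (or ensemble members)."""
    x = np.asarray(warming_diff, dtype=float)
    y = np.asarray(wind_change, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r
```

A negative `r` corresponds to the tug-of-war signature: a larger Arctic-minus-subtropical warming goes with a weaker jet.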

Satellite era simulations
The above process is repeated for the historical simulations of the 1980-2014 satellite era. The only differences are the use of the 18-year past and present segments defined above instead of the 30-year segments used for 1pctCO2, and the inclusion of 469 ensemble members across the 45 models. The regression is first performed using only the first ensemble member of each model, which provides a more direct comparison with the 1pctCO2 regression and shows how the strength of the relationship differs between the two sets of simulations. However, a single member does not represent the full distribution over the satellite era, so the regression is repeated with all 469 ensemble members for comparison with the observations. For each model, the satellite-era ensemble-mean warming difference is also plotted against the 1pctCO2 warming difference, and the satellite-era ensemble-mean zonal wind change against the 1pctCO2 zonal wind change. This allows a comparison of how the models portray the warming difference and the zonal wind change over the satellite era versus the future.
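Reducing member-level results to the per-model ensemble means used in the satellite-era versus 1pctCO2 comparison is a simple group-and-average. A sketch, assuming each member has already been reduced to a hypothetical (model, warming difference, wind change) record:

```python
from collections import defaultdict
from statistics import fmean


def ensemble_means(records):
    """Average (model, warming_diff, wind_change) member records by model.

    Returns {model: (mean warming difference, mean wind change, n_members)}.
    """
    groups = defaultdict(list)
    for model, warming_diff, wind_change in records:
        groups[model].append((warming_diff, wind_change))
    return {
        model: (
            fmean(w for w, _ in members),
            fmean(u for _, u in members),
            len(members),
        )
        for model, members in groups.items()
    }


members = [("A", 1.0, -0.2), ("A", 3.0, -0.4), ("B", -1.0, 0.5)]
print(ensemble_means(members))
```

Keeping the member count alongside each mean makes it easy to flag models whose ensembles are too small for the consistency test described below.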
The satellite MSU data are analyzed in a similar way. For both the observational and the synthetic climate model MSU data, the warming between the first 18 years and the last 18 years is calculated. However, because each MSU channel measures a deep-layer average temperature, the full latitude-pressure boxes are not used; instead, the box latitudes are combined with the stratosphere-corrected TMT channel described above to define the Arctic and subtropical domains. This channel corresponds most closely to the pressure levels used for the latitude-pressure boxes in the previous analyses. As before, the subtropical mean warming is subtracted from the Arctic mean warming. For each model, the synthetic MSU warming difference is correlated with the climate model zonal wind change over the satellite era to determine the utility of the synthetic MSU Arctic-minus-subtropical warming as a predictor of the modeled jet change.
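The MSU version of the metric replaces the latitude-pressure boxes with latitude bands of a single-layer field. A sketch, assuming annual-mean, zonal-mean TTT has been arranged as a (year, lat) NumPy array covering 1980-2014, and using the same simple unweighted averaging as the box method (the text does not state whether cosine-latitude weighting was applied). The function name is hypothetical.

```python
import numpy as np


def msu_warming_difference(ttt, lats, n_years=18,
                           arctic=(60, 90), subtropics=(20, 40)):
    """Arctic minus subtropical TTT change between the first and last n_years.

    ttt:  (year, lat) array of annual, zonal-mean TTT (K).
    lats: latitudes (deg N) matching the second axis of `ttt`.
    """
    ttt = np.asarray(ttt, dtype=float)
    lats = np.asarray(lats, dtype=float)
    change = ttt[-n_years:].mean(axis=0) - ttt[:n_years].mean(axis=0)

    def band_mean(bounds):
        lo, hi = bounds
        return float(change[(lats >= lo) & (lats <= hi)].mean())

    return band_mean(arctic) - band_mean(subtropics)
```

The same function would apply unchanged to each observational product and to each model's synthetic MSU output, which keeps the two sides of the comparison on an identical footing.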

Comparison between models and observations
The reanalyses are averaged over the same boxes as the climate models, and the reanalysis results are then compared with the satellite-era climate model results to determine which models accurately capture the changes estimated over the satellite era. Similarly, the observational MSU Arctic-minus-subtropical warming is compared with the synthetic climate model MSU Arctic-minus-subtropical warming to determine whether the models successfully capture the satellite-era change.
To quantify which models are possibly consistent with the reanalysis or satellite MSU output, a two-sample t test is used for both the warming difference and the zonal wind change to compare each model's ensemble with reality as estimated by the reanalyses and/or the satellite MSU data. The average warming difference is calculated over all four reanalysis products in Table 2 and over all three MSU observational products in Table 3, giving a single estimate of reality for each that is used as one of the samples for the t test. The collection of warming differences from all ensemble members of a given model is used as the other sample; for the satellite MSU comparison, this is the collection of the model's synthetic MSU ensemble members. Thus, the observation-based sample has a single value, while the model sample has as many values as there are ensemble members. This is repeated for the zonal wind change, for the reanalyses only. The two-sample t test evaluates the null hypothesis that the average observed warming difference or zonal wind change could be a reasonable member of each model's full ensemble population. If the test yields p < 0.05, the null hypothesis is rejected and the model is deemed inconsistent with observations; otherwise, the null hypothesis cannot be rejected and the model is tentatively consistent with observations. The two-sample t test cannot be applied to models with only one ensemble member, so these models are excluded from this part of the analysis (Table 1).
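The text does not give the exact implementation of the test. One standard way to run a pooled-variance two-sample t test when one sample is a single observed value is sketched below: the single value contributes no variance, so the pooled variance reduces to the ensemble sample variance s², the standard error becomes s·sqrt(1 + 1/n), and there are n − 1 degrees of freedom. This is an assumption about the method, the function name is hypothetical, and SciPy's Student t distribution supplies the p value.

```python
import math
from statistics import mean, stdev

from scipy import stats


def single_obs_t_test(observed, ensemble):
    """Pooled two-sample t test of one observed value against a model ensemble.

    With a single observation (n1 = 1), the pooled variance equals the
    ensemble sample variance s**2, the standard error is s * sqrt(1 + 1/n),
    and the degrees of freedom are n1 + n2 - 2 = n - 1.
    Returns (t statistic, two-sided p value).
    """
    n = len(ensemble)
    if n < 2:
        # No ensemble spread can be estimated, so the test is undefined.
        raise ValueError("need at least two ensemble members")
    s = stdev(ensemble)  # sample standard deviation (n - 1 denominator)
    t_stat = (observed - mean(ensemble)) / (s * math.sqrt(1.0 + 1.0 / n))
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value


ensemble = [0.0, 1.0, 2.0, 3.0, 4.0]
t_stat, p_value = single_obs_t_test(20.0, ensemble)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}, inconsistent = {p_value < 0.05}")
```

The guard mirrors the exclusion of single-member models in the text: with n = 1 there is no spread to compare against.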

Results and discussion
The CMIP6 1pctCO2 and historical simulations both produced a variety of warming and jet change patterns that are consistent with the tug-of-war hypothesis. For example, the CESM2 historical simulation shows much stronger satellite-era warming in the Arctic than in the subtropics (Fig. 2c) and a corresponding decrease in jet strength (Fig. 2d). The CIESM historical simulation shows more warming in the subtropics (Fig. 2a) and an increase in jet strength (Fig. 2b).
The regression between the Arctic-subtropical warming difference and the zonal wind change shows a very strong, statistically significant negative correlation in both the CMIP6 1pctCO2 and historical simulations (Fig. 3a, b, d). In other words, when the Arctic warms more than the subtropics, the models tend to produce a decrease in jet strength, and when the subtropics warm more than the Arctic, they tend to produce an increase in jet strength. This relationship is consistent with the tug-of-war hypothesis and is much stronger in 1pctCO2 than previous studies have found for future-like simulations (Barnes and Polvani 2015; Yim et al. 2016; Peings et al. 2018), potentially owing to the optimization of the latitude-pressure boxes by Golden (2020). The relationship is also notably strong over the satellite era (though weaker than for 1pctCO2), which most other studies have not tested. While the intermodel correlations are strong in Fig. 3b, c, the correlation between ensemble members within individual models varies widely. Some individual model correlations are much stronger (e.g., −0.99), some are much weaker (e.g., −0.19), and some are positive rather than negative (not shown). Scatterplots of each individual model's ensemble members can be found in Figs. S1-S5 in the supplemental information.
The reanalysis output and satellite MSU temperature data are also included on the satellite-era scatterplots in Fig. 3b, c. The reanalyses all fall in the lower right quadrant of the historical simulation scatterplot (Fig. 3b), showing stronger warming in the Arctic than in the subtropics and a decrease in jet strength over the satellite era, consistent with prior literature (Graversen et al. 2008; Santer et al. 2013, 2018; Feldstein and Lee 2014; Francis and Vavrus 2015; Manney and Hegglin 2018). The satellite MSU products also fall on the right side of the scatterplot, indicating greater warming in the Arctic (Fig. 3c). Additionally, both the reanalyses and the satellite MSU temperatures fall well within the model range. This was somewhat unexpected, given that prior studies indicated that models were not capturing the observed warming in the Arctic or the tropics over the satellite era (Santer et al. 2013, 2018; Jansen et al. 2020).
The ability of so many models to produce increased Arctic warming and a weaker jet over the satellite era is a new feature of CMIP6 compared with CMIP5. Golden (2020) performed a similar analysis of the Arctic-subtropical warming and zonal wind change for the RCP8.5 and historical simulations in CMIP5, and found that models favored stronger warming in the subtropics and a stronger jet in both the historical and the future-like simulations. This is shown in Figs. S6 and S7 in the supplemental information, reproduced from Golden (2020). Both the satellite-era simulations (Fig. S6) and the future simulations (Fig. S7) are weighted towards the upper left quadrant, with a ratio of simulations in the upper left quadrant to simulations in the lower right quadrant of 1.6:1 for Fig. S6 and 2.1:1 for Fig. S7. This indicates a preference for stronger subtropical warming and a strengthening of the jet, in agreement with other studies of the modeled future (Santer et al. 2013, 2018; Jansen et al. 2020). Figure 3d shows a similar outcome in the CMIP6 1pctCO2 simulations, with an upper-left to lower-right ratio of 1.9:1. However, the corresponding ratio in the historical scatterplots is 1:3 for Fig. 3a, 1:3.6 for Fig. 3b, and 1:1.9 for Fig. 3c. This is the opposite of the CMIP5 satellite-era simulations, indicating a preference for stronger Arctic warming and a weakening of the jet.
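The quadrant ratios quoted above are simple counts. A sketch with hypothetical helper names, assuming each simulation is an (Arctic-minus-subtropical warming, zonal wind change) pair and that points lying exactly on an axis are counted in neither quadrant; the points in the usage line are toy values, not data from the study.

```python
def quadrant_counts(points):
    """Count points in the upper-left and lower-right quadrants.

    Upper left: subtropics warm more (x < 0) and the jet strengthens (y > 0).
    Lower right: Arctic warms more (x > 0) and the jet weakens (y < 0).
    """
    upper_left = sum(1 for x, y in points if x < 0 and y > 0)
    lower_right = sum(1 for x, y in points if x > 0 and y < 0)
    return upper_left, lower_right


def format_ratio(upper_left, lower_right):
    """Express the counts as 'k:1' or '1:k', scaling the smaller count to 1."""
    if upper_left == 0 or lower_right == 0:
        return f"{upper_left}:{lower_right}"
    if upper_left >= lower_right:
        return f"{upper_left / lower_right:.1f}:1"
    return f"1:{lower_right / upper_left:.1f}"


toy_points = [(-1.0, 0.5)] * 19 + [(1.0, -0.5)] * 10 + [(0.5, 0.2)] * 3
print(format_ratio(*quadrant_counts(toy_points)))  # 1.9:1
```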
Scatterplots for some individual models are shown in Fig. 4, and the rest can be found in Figs. S1-S5. The majority of the models unambiguously produce values that are reasonable given the observations, e.g., CanESM5-p2 and UKESM1-0-LL (Fig. 4a, d). Conversely, a handful of models do not appear to produce values consistent with the observations, e.g., CIESM and GFDL-ESM4 (Fig. 4b, e). A few other models are more ambiguous, and it is difficult to determine whether they are consistent, e.g., FIO-ESM-2-0 and CNRM-ESM2-1 (Fig. 4c, f).
The results of the two-sample t test confirm that the reanalyses and satellite observations could be plausible members of almost every model's ensemble (Table 1), leaving very few models that are inconsistent with observations of either the warming difference or the zonal wind change. Most of the models that fail the t test, on the basis of either the reanalysis warming difference or zonal wind change or the satellite MSU warming difference, have very few ensemble members (e.g., Fig. 4b, e). With such limited sampling, the evidence is likely not strong enough to say definitively that these models are inconsistent with observations. Most of the models with a large number of ensemble members pass the t test when compared with the reanalyses, like Fig. 4a, and are clearly consistent with observations. Likewise, most models with a large number of ensemble members also pass the t test when compared with the MSU data, like Fig. 4d. However, as with the models that fail the t test, there are also many models with very few ensemble members that nonetheless pass it (e.g., Fig. 4c). For these, too, the sample size is too limited to say that they are clearly consistent with the observations, as the t test results would imply. Therefore, results for models with very few ensemble members are inconclusive whether they pass or fail, and only models with larger ensembles can be convincingly shown to be consistent or inconsistent with the observations. Some models do fail the t test with relatively large ensembles (EC-Earth3-Veg, GISS-E2-1-G-p5, MPI-ESM1-2-LR, and NorCPM1). Of these, NorCPM1 has the largest ensemble and would be the best candidate for being ruled out based on the t test. However, the scatterplot of NorCPM1 (as well as those of the other models listed; see Figs. S1-S5) shows that the reanalysis is not an obviously implausible member of the distribution. Thus, no model can truly be ruled out. With no models that are convincingly inconsistent with the observations, this increases confidence that the current generation of models
Interestingly, although many models better capture the warming and jet change observed over the satellite era, they largely fail to capture the full magnitude of the warming difference between the Arctic and the subtropics. This can be seen in Figs. 3b, c, where the majority of the models fall to the left of the reanalyses or satellite MSU. It can also be seen for individual models: in Fig. 4a, d, for example, the reanalysis and satellite observations lie toward the right side of the model ensemble cloud. This difference in magnitude between the observations and the models could be the result of internal variability (Rantanen et al. 2022) or of flaws in the forced response.
Finally, despite the models' general agreement with observations of stronger Arctic than subtropical warming over the satellite era, they surprisingly still tend to favor stronger subtropical warming and increased jet strength in the future-like simulations. This discrepancy can be seen in Fig. 3. On the 1pctCO2 scatterplot (Fig. 3d), a clear majority of the models fall in the upper left quadrant, favoring greater subtropical warming and increasing jet strength. Far fewer points fall in the lower right quadrant, favoring greater Arctic warming and decreasing jet strength, and most of those are clustered near the origin, with only a few models showing substantially stronger Arctic warming and a larger jet slowdown. On the historical scatterplots (Fig. 3a-c), by contrast, the majority of the models fall in the lower right quadrant, favoring stronger Arctic warming and a decrease in jet strength. This is shown more clearly in Fig. 5, which plots the historical ensemble-mean against the 1pctCO2 Arctic-subtropical warming difference and zonal wind change for each model. In the warming difference plot (Fig. 5a), the models are significantly weighted to the lower right quadrant compared with the upper left, indicating that many models that produce greater Arctic warming over the satellite era still tend to produce greater subtropical warming in a future-like climate. Similarly, in the zonal wind change plot (Fig. 5b), the models are significantly weighted to the upper left quadrant compared with the lower right, indicating that many models that produce a jet slowdown over the satellite era tend to produce a jet speedup in a future-like climate. Thus, the simulated future warming difference is systematically more subtropically amplified than the satellite-era difference, suggesting that, at least in these experiments, a future in which the subtropics warm more and the jet speeds up may be consistent with a past in which the Arctic warmed more and the jet slowed down. This difference in the warming and jet responses could be influenced by the current and future state of the meridional temperature gradient, the Arctic temperature inversion, and sea ice cover. These factors all strongly affect Arctic Amplification and are expected to evolve under climate change in ways that would reduce it (Previdi et al. 2021). Alternatively, the enhanced Arctic Amplification in the satellite-era simulations could simply stem from spurious high-latitude biomass-burning aerosol forcing, as found by Fasullo et al. (2022), making those simulations "right for the wrong reasons" with regard to Arctic warming. Differences in the forcings between these periods may also contribute. It should also be noted that the 1pctCO2 runs start from a preindustrial state, so comparing their first 30 years with 1980-1997 is not a like-for-like comparison and could introduce discrepancies. Additionally, there is evidence that both the enhanced warming observed in the Arctic compared to simulations is partially driven by internal variability (e.g.

Conclusion
This study has shown that there is a strong correlation between the Arctic-minus-subtropical warming and the zonal wind change in both CMIP6 historical and future-like simulations, consistent with the tug-of-war hypothesis. The modeled future agrees with the previous model consensus, tending to warm the subtropics more and increase jet strength, while the modeled satellite era is more consistent with recent studies showing the Arctic warming more than the tropics. This indicates that CMIP6 models capture historical Arctic warming better than earlier model generations, and that the increased Arctic warming over the satellite era may actually be consistent with the future depicted by the models. However, although the models perform better than earlier generations, they still fail to capture the magnitude of the warming difference between the Arctic and the subtropics over the satellite era. Comparing the modeled satellite era with reanalysis output and satellite MSU observations shows that most models are broadly consistent with the observations, and no model can definitively be "ruled out". Overall, then, the models perform better than expected, given that previous model output tended to show the tropics, rather than the Arctic, winning the tug-of-war over the satellite era (Jansen et al. 2020; Yin 2005; Butler et al. 2010; Delcambre et al. 2013; Lorenz 2014; Lachmy and Shaw 2018; Screen et al. 2018; Santer et al. 2013, 2018; Golden 2020). This performance over the satellite era indicates that past warming in the Arctic may actually be consistent with future warming in the tropics and subtropics. It also shows that the satellite era may not be a straightforward predictor of the future response of the Northern Hemisphere extratropical jet to climate change.
Several key additional steps are planned for subsequent work on this topic. Most immediately, the endpoint of the satellite-era comparison will be extended from 2014 to the present, now that CMIP6 SSP simulations continuing the historical simulations beyond 2014 are widely available. The SSP future (i.e., to 2100) simulations will also be examined alongside the idealized 1pctCO2 simulations, to test whether the future-past discrepancy found here is specific to 1pctCO2 or more generalizable. Similarly, the early portion of the 1pctCO2 experiment (corresponding to satellite-era global radiative forcing and temperature) will be compared to the historical satellite-era analysis to test whether the discrepancy is specific to the historical experiment. The analyses will also be repeated for individual seasons and/or longitudinal sectors, to pinpoint where and when the discrepancies arise.
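Selecting the early portion of 1pctCO2 that corresponds to a given satellite-era forcing can be illustrated with a short worked equation. This sketch assumes the commonly used logarithmic approximation for CO2 radiative forcing with coefficient α ≈ 5.35 W m^-2; that coefficient and the target forcing F* are illustrative assumptions, not values taken from this study.

$$F(t) = \alpha \ln\!\left(\frac{C(t)}{C_0}\right), \qquad C(t) = C_0\,(1.01)^{t}$$

$$\Rightarrow\; F(t) = \alpha\, t \ln(1.01) \approx 0.0532\, t \;\mathrm{W\,m^{-2}}, \qquad t^{*} = \frac{F^{*}}{\alpha \ln(1.01)}$$

For example, CO2 doubles at t = ln 2 / ln 1.01 ≈ 70 years, where F ≈ 3.7 W m^-2. Because historical forcing also includes non-CO2 agents, matching on global-mean warming rather than on forcing alone offers a complementary way to choose the comparison window.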
Most importantly, the key causes of the discrepancies will be investigated. Specifically, the potential role of forcing differences between the satellite era and the future will be isolated using the CMIP6 single-forcing experiments (Gillett et al. 2016). The CMIP6 satellite-era aerosol forcing in particular is highly polar-amplified but may be partly spurious (Fasullo et al. 2022), so it will be useful to know whether aerosol forcing is a key cause of the discrepancies. The moist energy balance modeling approach of Bonan et al. (2018) will then be used to quantify and probe the role of feedback differences between the satellite era and the future. Together, this work will yield more confident results about the future of the tropospheric temperature gradient and the northern extratropical jet stream on a warming planet.

Fig. 2 Warming between the present (1997–2014) and past (1980–1997) periods of the historical simulations for two example models (left column) and the corresponding zonal wind change (right column). Black contours show the past-period zonal-mean wind at 5 m/s intervals.

Fig. 3 Scatterplot and regression lines for a the first ensemble member of each model for the historical simulation analysis, b all ensemble members for the historical simulation analysis with each reanalysis (blue circles) and the reanalysis average (red diamond) overlaid, c all ensemble members of synthetic MSU warming difference and

Fig. 4 Scatterplot of a–c an individual model's ensemble with the reanalysis average (red triangle) and each reanalysis (red dots) overlaid, and d–f each model's ensemble of synthetic MSU warming

Fig. 5 Scatterplot of a the historical Arctic minus subtropical warming difference versus the 1pctCO2 Arctic minus subtropical warming difference and b the historical zonal wind change versus the 1pctCO2 zonal wind change

Table 1
A list of all CMIP6 1pctCO2, historical, and synthetic MSU models used. Unavailable models are noted by "N/A". The last three columns give the p-values from the two-sample t tests for observational consistency with the model ensemble, described more fully in Sect. 3c. Bold text highlights p-values less than 0.05. A "-" marks t tests that could not be conducted because only one ensemble member was available.
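The observational-consistency test summarized in Table 1 can be sketched with `scipy.stats.ttest_ind`, comparing the set of observational estimates (reanalyses or MSU groups) against one model's ensemble members. This is a hedged sketch: the study's exact t-test variant is not stated here, so Welch's unequal-variance form is an assumption, and the function name `observational_consistency` is hypothetical.

```python
import numpy as np
from scipy import stats


def observational_consistency(obs_estimates, ensemble_members, alpha=0.05):
    """Two-sample t test of observational estimates against one model's ensemble.

    obs_estimates    : one diagnostic value per reanalysis or MSU group
    ensemble_members : the same diagnostic for each ensemble member of one model
    Returns (p_value, inconsistent), or None when either sample has fewer than
    two values (e.g. a model with a single ensemble member, marked "-" in Table 1).
    """
    obs = np.asarray(obs_estimates, dtype=float)
    ens = np.asarray(ensemble_members, dtype=float)
    if obs.size < 2 or ens.size < 2:
        return None
    # Welch's form (equal_var=False) is an assumption, not taken from the study.
    result = stats.ttest_ind(obs, ens, equal_var=False)
    p_value = float(result.pvalue)
    # "Inconsistent" here means the test rejects equal means at level alpha.
    return p_value, p_value < alpha
```

With alpha = 0.05 this mirrors the bold-text threshold used in Table 1; a model is flagged only when the observational mean lies far outside its ensemble spread.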

Table 3
A list of the groups that interpreted the satellite MSU observations