**Data assimilation method.** The reconstructions are generated using the ensemble Kalman filter to spatially spread information from global paleoclimate proxy data using teleconnection patterns captured by covariance structures in climate models, following the Last Millennium Reanalysis framework25,26. The ensemble approach generates a distribution of climate states, allowing us to estimate the most probable state as well as quantify associated errors. Additionally, this approach can generate multiple climate fields at once, including those not typically derived from proxy data. We use an offline data assimilation method in which we randomly draw annually-averaged anomaly states from climate model output to form a prior ensemble with 100 members, which we use for every year reconstructed. This approach ensures that the temporal variance and trends in the final reconstruction, or “posterior” ensemble, are generated from the proxies rather than variability in the model. This approach also leaves us with a posterior ensemble of 100 members, which we use to calculate uncertainty. Additionally, the offline approach allows us to test the sensitivity of the reconstruction to the choice of climate model used for the prior.

Our ensemble data assimilation method follows these equations:

The forward model used to map the prior estimate to proxy space is a seasonal linear regression between instrumental temperature anomalies and each proxy record for the period of overlap. For tree-ring records, we use a seasonal bilinear forward model calibrated to both instrumental temperature and precipitation data with proxy-specific seasonality26. Our calibration data for temperature and precipitation are GISTEMP60 and Global Precipitation Climatology Centre61 (GPCC), respectively. For ice core records from Antarctica, we calibrate the data to temperatures from Nicolas and Bromwich, 2014 rather than GISTEMP. Nicolas and Bromwich, 2014 is the most up to date and accurate spatial temperature reconstruction for Antarctica, though we note that the reconstructions show only very minor differences if we calibrate the ice cores to GISTEMP. The ice core records we use comprise both water-isotope (δ18O) records and annual-accumulation records; both are calibrated to temperature, following a study39 that found that using both accumulation and δ18O yields better performance for temperature than reconstructions using δ18O alone. GPCC is not reliable over Antarctica, and while other precipitation data sets are available, climate models generally show significant bias in accumulation; we therefore do not use an accumulation prior.

The anomaly reference period in the reconstructions is 1961-1990 unless otherwise noted (e.g., in comparisons to satellite-based reanalysis products that start in 1979, in which we shift the anomaly reference period in the reconstructions by moving them by the difference in reconstruction mean from 1961-1990 and 1979-2005). The reconstructions are regridded to 1° resolution.

**Correlation and coefficient of efficiency calculations.** To assess reconstruction skill, we calculate correlation to measure signal timing (after accounting for autocorrelation) and coefficient of efficiency to quantify errors in signal amplitude or bias. We use the Pearson correlation coefficient to calculate correlation on the ensemble mean time series. Correlation significance is calculated with the 2-tailed student t-test with 95% confidence, using the number of independent samples (N) that accounts for autocorrelation62:

**Calculations of trend magnitude and uncertainty.** Trends are calculated as the slope of a linear regression for the ensemble mean of the reconstruction. To calculate uncertainty on the magnitude of the trend, we randomly draw from the 100 ensemble members for each year to generate a random ensemble of time series (i.e., the ensemble indices are randomly labeled in time). We perform this step 100 times and calculate the uncertainty as twice the standard deviation of the 100 linear fits associated with these random draws. We note that 100 is sufficient for the statistics to converge. We use 2σ as the uncertainty bounds on the magnitude of the trend, as the data these are performed on are approximately normal (by failing to reject a Kolmogorov-Smirnov test for normality with 95% confidence). Thus, 2 standard deviations from the mean is equivalent to the middle 95% of the 100-member ensemble trends.

**Trend significance**. Trends are considered statistically significant if two criteria are met: (1) the trend magnitudes and their uncertainties for both reconstructions overlap with each other and do not overlap with 0 (i.e. they are consistent in sign and magnitude), and (2) if the 95% confidence intervals of the ensemble mean trend do not overlap with 0, after accounting for autocorrelation, in both reconstructions (i.e. how meaningful the linear trend is for how noisy the data are). The 95% confidence interval of the linear fit (for the second criteria) is calculated as follows:

**Southern hemisphere surface westerly position and strength calculations**. We calculate the circumpolar Southern Hemisphere (SH) westerly wind position as the center of mass latitude for zonally averaged westerly winds between the latitudes of 20°S and 70°S. The SH surface wind strength is the wind speed at the center of mass latitude. To perform these calculations, we add the mean of the prior back into the respective reconstruction so that we are working with the full climate field. Trends are calculated with 100 randomly generated time series drawn from ensemble members following the steps outlined above. We perform this calculation on the zonal average (full circumpolar) and by regions64: the Pacific (150–290°E), Atlantic (290–20°E), and Indian (20–150°E) Ocean regions (Table S3).

**Model bias in the reconstructions**. The trends in the reconstructions are generated by information from the proxies, but the mean wind speeds and wind positions are largely inherited from the models that the priors are drawn from (Fig. S7, Table S4). As a result, the CESM reconstruction shows a positive bias in wind strength (1.19 m/s) relative to ERA5, while the HadCM3 reconstruction shows a negative bias relative to NCEP2 (-1.19 m/s). The reconstructions show minor position biases relative to both ERA5 and NCEP2 (-0.43° to 0.86°). These biases are small compared to biases seen across CMIP5 and CMIP6 models50. Wind position biases in models have been shown to affect trends in the circumpolar westerly wind position64,65,66, so trends subject to biases must be treated with caution. However, even with contrasting biases in strength and small biases in position, both reconstructions show insignificant trends in position and weak strengthening of the SH westerlies during the 20th century, with a large strengthening component from the Pacific region.

**Pseudoproxy experiments**. We use the same data assimilation method as the real-proxy reconstructions to generate pseudoproxy reconstructions, with a few changes. We use a prior from a historical model so that we can generate pseudoproxies from the 20th century. We use a PACE prior composed of ensemble members 2-20, concatenated together to form a sufficiently long list of climate states to draw from. We generate the pseudoproxies by drawing temperature from the proxy locations in ensemble member 1 of PACE. Since the pseudoproxies are taken directly from the temperature fields, no forward model is needed. We use an observational error of 0.01°C. The ability of the pseudoproxy reconstructions to reproduce the historical climate models allows us to understand the upper bound of how well our method can reproduce the truth, given the proxy locations that we have and the covariance structures in the models.