Land temperature variability driven by oceans at millennial timescales

Variations in regional temperature have widespread implications for society, but our understanding of the amplitude and origin of long-term natural variability is insufficient. This is especially the case for terrestrial temperature variability which is currently thought to be weak over long timescales. Here, we provide the first comprehensive estimate of regional temperature variability from annual to millennial timescales based on sedimentary pollen records and instrumental data from the Northern Hemisphere. We show that the short-term random variations are overprinted by strong ocean-driven climate variability on multi-decadal and longer timescales. This may cause substantial climate change at the regional scale in the coming century, contrasting the rather monotonous warming projected by climate models. Due to the marine influence, regions characterized by stable oceanic climate at sub-decadal timescales experience stronger long-term variability while continental regions with higher sub-decadal variability show weaker long-term variability. This fundamental relationship between the timescales provides unique insight into the mechanisms governing slow terrestrial climate variability and sets the basis to project its amplitude. the

Several studies have suggested that GCMs can simulate realistic climate variability for global mean temperature across timescales 9,10 which is dominated by the response to external forcing. However, at the regional scale, where only a small fraction of the variability is forced 11 , the skill of GCMs has been called into question for slow timescales 12,13 . Marine proxies suggest a strong scaling of regional sea-surface temperature (SST) variance from annual to millennial timescales (β≈1) 7 which contrasts with the weak scaling found in GCMs (β≈0.1) 14,15 . This leads to an increasing discrepancy between reconstructed SST and model simulations at longer timescales, reaching two orders of magnitude in variance at the millennial timescale 7 .
On land, instrumental temperatures show a fundamentally different behaviour compared to the oceans, with only a weak timescale dependency up to multi-decadal scales (β≈0.1) [14][15][16] ; this lowfrequency weather has been termed "macroweather regime" and starts at sub-monthly timescales 17 . This difference can be explained by the much smaller heat capacity of land surfaces compared to the oceanic mixed layer 18 . If the macroweather-type behaviour over land would hold on to longer timescales, internal variability would only play a minor role on the uncertainty of regional climate projections 19 . However, a different scaling behaviour of ocean and land at longer timescales would imply an increasingly large variability discrepancy between terrestrial and marine regions which seems physically implausible and in contradiction with both diffusive energy balance models 20 and GCMs 15 . This leaves us with the conundrum that we must either reject altogether the marine proxies or see a fundamental change of variability in the terrestrial domain on longer timescales.
Given the limited instrumental record, this fundamental question on the scaling of variability between land and ocean cannot be answered from observations without terrestrial proxy data. Holocene Greenland ice-core δ 18O timeseries suggest that the weak scaling behaviour found in the instrumental data extends to millennial timescales (akin to white noise with β≈0) 21 . However, it is unclear whether the climate variability derived from the Greenland ice-cores is representative for other terrestrial regions 22 and to what extent the proxy variability reflects temperature variability 23 . In order to elucidate the behaviour of climate variability over land at timescales longer than centennial, we compiled and analysed an extensive collection of recent Holocene Northern Hemisphere pollen data covering the past 8000 years, providing the largest spatial coverage of any land-based proxy at millennial timescales. This compilation of pollen data comprises 985 unique records from the northern hemisphere, 363 of which from North America, 487 from Europe, and 135 from Asia including the most complete dataset for China 24 and Siberia 25 . We produced summer (June-July-August) temperature reconstructions (hereafter simply termed temperature) as summer temperatures are usually well correlated with other variables driving vegetation growth 26 . Our results are not sensitive on the choice of summer temperature (Extended Data Fig. 1) or a potential influence of precipitation (Extended Data Fig. 2). Pollen-based reconstructions rely on the assumption of dynamical equilibrium between climate and vegetation. This assumption is timescale dependent and is generally more valid at longer timescales 26 . In this respect, millennial scale estimates of temperature variability should thus be particularly robust. In addition, the comparison with instrumental and tree-ring data (Extended Data Figure 3) suggests that temperature variability derived from pollen assemblages also reliably captures faster timescales. Finally, we verified that the estimates of millennial variability were not systematically biased by human influences (see Extended Data Fig. 4).
Pollen-based reconstructions, once coupled with instrumental data 27 , enable us to establish for the first time a comprehensive picture of regional land temperature variability from inter-annual to millennial timescales (Fig. 1). The spectrum of instrumental air temperature shows a rather flat scaling behaviour that is characteristic of the macroweather regime. At longer timescales, however, the pollen-based reconstructions show a strong increase in variability with increasing timescale (Fig. 1a). This suggests that even in the relatively stable recent Holocene, there exists significant centennial to millennial scale temperature variability. This behaviour clearly differs from the rather flat macroweather regime and resembles power-law scaling with an exponent β near unity for timescales longer than centennial. Traces of this increased scaling behaviour at multi-decadal timescales already appear in the instrumental record and is corroborated by dendrochronological timeseries (Extended Data Fig. 3).
We benchmark our result against three transient climate models simulations of the recent Holocene: IPSL 28 , ECHAM5 29 and the last 8000 years of the TraCE-21ka deglaciation experiment 30 (i.e. after the last freshwater forcing events). The transient simulations of these GCMs show the weak scaling behaviour characteristic of the macroweather regime, although with higher annual to decadal variability relative to the instrumental data as previously recognized 15 . However, they fail to capture the increase in variability observed in the reconstructions at multi-decadal timescales. Instead, they show a relatively weak increase at multi-centennial timescales and a sharp increase at millennial timescales due to spectral leaking from the orbital 23-ka precession cycle (Fig. 1a, Extended Data Fig. 5). As a result of this divergence in variability scaling there is an increasing deficit in temperature variability observed in the simulations compared to the reconstructions. The range of variance ratios between the reconstructions and the different model simulations increases from 7-10 over 100-300 years to 17-56 (19-102 with orbital detrending) over 1000-3000 years, resembling the discrepancy between models and proxy data for regional SST variability 7 .
Comparing the reconstructions over land with estimates of marine variability 7 shows they have a very similar low-frequency behaviour (Fig. 1b). This supports the view that both components will vary more coherently as climatic variability becomes a global phenomenon over longer timescales 20 also indicated by coherent land and ocean average temperatures 5 . Energy-balance models suggest that this parallel behaviour of land and oceans on long timescales is due to heat exchange between the land and ocean compartments. In such models, land air temperature can be described as a linear combination of the SST and a time-dependent forcing over land 31 ; the resulting variability spectrum over land is then a linear combination of the spectra of each term (Fig. 1b) when the two are uncorrelated (see Supplementary Information). In this framework, the change in scaling behaviour can be regarded as a transition from the macroweather regime at shorter timescales, dominated by a weakly scaling forcing component akin to white noise over land 18 , to an oceanic regime dominated by the SST component at timescales longer than decadal. Interestingly, the parallel behaviour between land and ocean temperature spectra on multi-decadal to millennial timescales provides no evidence for additional terrestrial slow climate feedbacks. The oceanic component present in land temperature variability appears amplified by a factor of ~4 in PSD or ~2 in amplitude. This factor is similar to the land-sea warming contrast 32 observed during the last century 33 and within the range of land-sea warming ratios measured in GCMs 34 . This is thought to be the result of local feedbacks, for example the evaporation feedback, when moisture availability over land limits evaporative cooling in comparison with marine regions 35 , and also because of the asymmetry in the land-ocean heat exchange which favours land due to its lower specific humidity 34 .
The extensive spatial coverage of the pollen-based reconstructions allows us to perform a spatial analysis of the millennial scale temperature variability PSD 1000-3000 years (the mean PSD over 1000-3000 year) (Fig. 2a) and investigate the potential link to oceanic influence. The spatial coherency (Moran's I=0.19, p<0.001, see Methods) demonstrates that the variability estimates are not drowned out by local noise. Over Europe, with its large number of records, millennial scale variability decreases inland along the path of prevailing winds blowing from the Atlantic Ocean, and is lowest over Fennoscandia where blocking events are most frequent 36 . Similarly, China's high millennial variability would be linked to the persistent oceanic influence carried by the dominant easterlies at that latitude, while further north in eastern Siberia the dominant westerlies bring little oceanic influence. This further suggests that higher millennial variability relies on higher connectivity to oceans, as implied by energy-balance models, although compounded with local sensitivity. The high variability in central Asia remains an outlier given the strong continentality there, but the significance is lower because of the sparseness of records. It is also possible that the lower connectivity to oceans is compensated by the stronger local climate sensitivity 33 which may be linked to hydrological feedback due to the arid conditions 35 . Meanwhile, in North America the lowest millennial variability is found in the prairies, near the centre of the continent, where the westerlies predominantly blow from the north-west and the oceanic influence is lowest.
We use the instrumental data to study the mechanisms governing the spatial distribution of the millennial variability and the continuum of variability. The scaling of variability in instrumental data has already been shown to be related to the strength of the annual cycle and of the sub-decadal variability 8 . If we aggregate the instrumental data and reconstructions based on the sub-decadal variability PSD 2-10 years (the mean PSD over 2-10 years; Fig . We should thus expect an inversion where regions of low (high) sub-decadal variability, typically characterized by more maritime (continental) influences, would become regions of high (low) variability at long timescales.
Indeed, this hypothesized relationship is confirmed by the pollen-based reconstructions. Their estimates of millennial temperature variability PSD 1000-3000 years show a strong anti-correlation with the sub-decadal variability PSD 2-10 years (r=-0.95, p<0.01) and a strong correlation with the multidecadal scaling β 10-60 years (r=0.91, p<0.01). These significant strong relationships between the pollenbased reconstructions and independent instrumental temperature data demonstrate a fundamental link of temperature variability from sub-decadal to millennial timescales. The spatial pattern of the variability ( Fig. 2 and Extended Data Fig. 5) further suggests that this relationship is caused by varying marine influence. In addition, this can explain the relation of the amplitude of the annual cycle (an indicator for continentality) with the inter-annual variability scaling in the instrumental relationship proposed by Huybers and Curry (2006) 8 . Therefore, our findings complete the linkage between seasonal and millennial land temperature variability .   129  130  131  132  133  134  135  136  137  138  139  140  141  142  143  144  145   146  147  148  149  150  151  152  153  154  155  156  157   158  159  160  161  162  163  164  165  166  167 Our results indicate that current GCMs underestimate regional temperature variability over land at timescales longer than multi-decadal (Fig. 1a, Extended Data Fig. 5). In combination with the spatial pattern of variability (Fig. 2), this suggests that the deficit in low-frequency variability is related to an underestimation of marine variability 7 . The interpretation of climate sensitive proxies remains an area of active research, and in principle, it remains possible that the observed modeldata mismatches stem from non-climatic variability. However, several lines of evidence argue against this interpretation. Firstly, there are no known archival processes yet which could artificially create such power-law scaling in sedimentary archives 37 . Specifically, the known processes such as counting errors, spatial or temporal aliasing, and bioturbation in the sediment cannot explain the power spectra of variability found here. Secondly, the consistency between independent marine and terrestrial archives (Fig. 1b) provides further support for the temperature variability reconstruction. Finally, the spatial relationship with the independent instrumental temperature data (Fig. 3) also indicates that this is no artefact from the proxy data.
Thus, pollen-based reconstructions support the paradigm of an increasing continuum of climate variability with increasing timescales 8 , in contradiction with the local temperature variability in current GCM simulations and the classical picture of Mitchell 6 . More importantly, our results extend previous findings from instrumental data 8 and demonstrate a fundamental link between interannual, multi-decadal and millennial timescales driven by the interaction of marine and terrestrial temperature variability modulated by continentality.
This fundamental behaviour of temperature variability has implications for the relative impact of natural and anthropogenically forced variability. High latitude regions characterised by high interannual variability show a weaker oceanic regime and, ultimately, less natural variability on long timescales. As these regions are also highly sensitive to anthropogenic forcing, the impact of anthropogenic warming, relative to natural variability, will be greater. However, regions of strong maritime influence, where most of the world's population is located, could see large natural variability, that is not covered by current GCM projections which tend to display monotonous warming 33 . It is thus possible that until now, the stronger natural variability at multi-decadal timescales in maritime regions has partly overshadowed the anthropogenic warming in those regions which could explain their lower observational transient climate sensitivity 33 . Integrative archives such as glaciers should be particularly sensitive to this increased memory 16 and could be used to verify our findings. Large compilations of climate archives have the potential to inform us on the spatial patterns of slow variability and their underlying causes, and further studies combining multiple proxies over land and ocean show great promise to improve our understanding of the spatio-temporal correlation structure of climate variability.   Figure 1| Average spectral estimates of local land temperature over the Northern Hemisphere. a, Average spectral estimates of land air temperature from pollen-based reconstructions, instrumental data and model simulations extracted at the pollen record locations. Also shown are the spectra estimated after applying a 23-ka sinusoidal detrending (dashed, see Methods). 90% confidence intervals are given (shaded). The number of pollen records contributing to each timescale is indicated below (brown axis). b, Average spectral estimates from reconstructed annual sea-surface temperature derived from marine archives 5 , and from instrumental data at the corresponding locations. The pollen-based and instrumental average spectra from a are reproduced. Also shown are linear combinations of power-laws with slope β=1.2 and white noise series (dashed green for land, dashed blue for sea and dashed grey for the white noise levels) corresponding to the energy-balance model approximation.   Figure 3 | Spectral estimates of land temperature as a function of sub-decadal variability. a, Spectral estimates of the instrumental temperature (Δt<60 years) and of the pollen-based reconstructions (Δt>100 years) binned according to sub-decadal instrumental variability (See Methods and Extended Data Fig.7). Dashed lines indicate estimated instrumental multi-decadal scaling exponents β 10-60 years . 90% confidence intervals are given (shaded). b, Relation of the instrumental sub-decadal temperature variability PSD 2-10 years and the pollen-based millennial temperature variability PSD 1000-3000 years . The size of the points is proportional to the errors on the millennial variability estimates. c, As in b, but between β 10-60 years and PSD 1000-3000 years .

Reconstructions
The modern pollen dataset used for calibration consists of 15,532 sampling sites. For each fossil record location, we selected a unique subset of modern sites within a 2000 km radius in order to increase the reliability of the reconstructions and avoid a bi-modal climate optimum 38 24 . A fossil database comprising 985 records was compiled based on the requirement that the resulting spectral estimates covered timescales at least one fifth of an order of magnitude, i.e. 0.2 in a base 10 logarithm, below one third the length (to avoid the well-known multitaper low-bias at long timescales) 40 . The European and Asian datasets were combined keeping the 70 most common taxa according to Hill's second number.
The modern climate data used for calibration was the average of the June, July and August climatologies, i.e. the summer temperature, for the years from 1970-2000 as obtained from WorldClim 2.1 41 . The Weighted Averaging Partial Least Square (WAPLS) 42 method was used to calibrate transfer functions relating the pollen-assemblages to the summer temperature, with leaveone-out cross validation. The pollen percentages were square-root transformed to decrease the dominance of abundant taxa with high productivity. The number of retained WAPLS components was selected using a randomization t-test. The same method was also applied to reconstruct annual mean temperature (Extended Data Fig. 1) and annual precipitation (Extended Data Fig. 2), also using the climatologies from WorldClim 2.1 41 for the calibration of each variable.

Significance Testing
Following Telford and Birks (2011) 43 , we tested the summer temperature reconstructions for significance using the R package "palaeoSig" and found that 228 out of 985 reconstructions were significant (p<0.1). However, this significance test is rather conservative and several reasons can create type II errors (false negatives), including for example a low diversity of taxa, a small number of sub-fossil observations, an input climate signal that is less variable, or an inadequate training set 43 . A higher p-values therefore does not necessarily mean that the summer temperature has not been recorded, but rather that the information is insufficient to confirm it. Thus, instead of unduly discarding most records, we decided to include all records in the main analysis. We show that our conclusions are robust and continue to hold even if we restrict our analysis to the 'significant locations' only (p<0.1) which yielded similar results to the 'not significant locations' (p>0.1) (Extended Data Fig. 8).

Testing for Anthropogenic Impacts
We considered all series covering entirely the last 8-ka showed and defined two temporal windows: the more recent (0-4ka) and the more distant (4ka-8ka) past. The 1000-2000 year timescale band was taken to calculate the variance ratio between the two 4ka windows. We only included those series in our analyses whose spectral estimates covered at least one tenth of an order of magnitude (on a base 10 logarithmic scale) for both time periods; a criterion that was met by 344 records. We 224   225   226  227  228  229  230  231  232  233  234  235   236  237  238  239  240  241  242  243  244   245   246  247  248  249  250  251  252  253  254  255  256   257   258  259  260  261 found no systematic variance increase in the more recent half of the series as the mean of the distribution of the logarithm of the variance ratios did not significantly differ from zero (p>0.1). In fact, the more recent period, where human impacts may have contributed to an increased variability, is about 6 % less variable than the earlier one. Likewise, the spatial distribution (Extended Data Fig.  4) did not show any obvious spatial patterns that could be related to human occupation, displaying a non-significant Moran's I of 0.014 (p>0.1 ; see Methods). If human occupation was the dominant driver of millennial scale variability, we would have expected to observe an increase in variability over both Europe and China, where human occupation has been increasing the most over the last 4000 years compared to the preceding 4000 years. We thus conclude that human impacts on vegetation did not have a significant enough impact on the slow variability to systematically bias millennial scale variability estimates.

Instrumental Data
For the instrumental dataset we used the Berkeley Earth Surface Temperature (BEST) land and ocean product covering the period from 1850-2020. The equal area product is used for calculations, while the regular 1°×1° product was used for visual display. The instrumental data was detrended from anthropogenic influences to a first-order component 44 proportional to historical timeseries of doublings in atmospheric carbon dioxide concentration 45 . There are 403 grid points which comprise pollen records (circles on Fig. 2). Using alternative instrumental datasets 46 leads to similar conclusions.

Model Data
Three model simulations of the recent Holocene were considered, namely the IPSL 28 , the ECHAM5 29 and the CCSM3 47 . The first two are recent Holocene transient simulations of the past 6,000 years, and the latter is the TraCE-21ka deglaciation experiment 30 . We only retained the last 8,000 years of TraCE-21ka since it is comparable to the recent Holocene transient simulations as they contain no more freshwater forcing events which were the main drivers of the deglaciation. We selected the average 2-meter summer air temperature (June-July-August) and averaged it annually.
Since the long-term trends in summer temperature are linked to the precession in the Earth's orbit, we also analysed the timeseries after detrending for a 23-ka sinusoid rather than use the standard linear detrending performed before computing spectral estimates. This approach attempts to minimize power leakage from the orbital forcing frequencies onto the observed frequencies (Fig.  1a, dashed lines). The reduction in leaked power was not nearly as important in the case of the pollen-based reconstructions (Fig. 1a, Extended Data Fig. 5).

Spectral Estimates
The power spectral density estimates were calculated using the multitaper method 48 , adapted for irregular sampling through linear interpolation 49 , with the number of tapers n tapers =3 and the timebandwidth parameter ω=2, which yield up to n tapers *ω=6 degrees of freedom for the individual spectral estimates. Only timescales greater than twice the maximal resolution were kept to minimize power loss due to the interpolation 37 .
The confidence intervals were derived from the chi-squared distribution (χ²) of the multitaper estimates. The degrees of freedom of the χ² were limited to a maximum based on the expected effective spatial degrees of freedom at a given timescale in order to avoid obtaining over-confident estimates. The approximate relationship assumed for the maximum degrees of freedom ν max as a function of timescale is: ν max =40 Δt -0. 3 . This corresponds to about 20, 10, and 5 degrees of freedom at the decadal, centennial and millennial timescales respectively 37 . They were multiplied by six, the number of degrees of freedom for the multitaper estimates with three independent tapers, and further modulated by a factor of spatial representativity f Δt calculated as the fraction of the land area represented by the spatial distribution of the data.
For each point over land we calculated the effective number of records N Eff on a regular 1º by 1º degree grid using a Gaussian kernel: where d i,j is the geographical distance between the i th data record (out of n records with a PSD estimate at the given timescale Δt) and the j th grid point where N Eff is calculated, and σ is a characteristic decorrelation scale which we took as σ=300 km. N Eff was also used for Fig. 2a in order to modulate the opacity of the background as a function of nearby records. The factor of spatial representativity for a given timescale was then calculated as: where m is the number of grid points covering the land area north of the southernmost pollen record.

Variance Ratios
Variance ratios were computed by taking the ratio between the mean PSD over the same timescale band between different series after interpolating in the spectral domain. Since the ratio of two χ² distributed variables follows an F-distribution, the ratios were multiplied by (d-2) d ¹ where d is the ⁻ number of degrees of freedom of the denominator 22 .

Sub-Decadal Variability Binning
The data was aggregated based on the mean sub-decadal variability PSD 2-10 years , defined as the mean PSD over the 2-10 year timescale band. We calculated PSD 2-10 years for each of the 403 instrumental grid point for which pollen records were present nearby, ordered the results, and split them into eight non-overlapping bins (Extended Data Fig. 7). Each pollen record was assigned to the nearest instrumental grid point and averaged in the spectral domain. Varying the number of bins, for example using twenty bins instead of eight, leads to similar correlations (Extended Data Fig. 7). subsets according to the PSD 2-10 years amplitudes (see previous section). The standard errors on the millennial variability estimates, i.e. √2 DoF , where DoF is the total degrees of freedom for the PSD estimate, were used as weights for the correlation calculation and for visual representation in Fig. 3b,c.

Moran's I
Moran's I spatial autocorrelation index was calculated using the method from Gittleman and Kot (1990) 50 as implemented in the R-package "ape" 51 . The weight matrix used corresponds to the inverse of the distance between sites.

Extended Data
Extended Data Figure 1 | Summer and Annual Temperatures Spectra -Comparing the average spectra at the location of pollen records for the mean summer temperature (dashed) and the mean annual temperatures (solid). 90% confidence intervals are given (shaded). The IPSL and ECHAM5 model results exhibit a slightly lower variability in their annual temperature than in their summer temperature over all timescales, except for the longest timescale since it is dominated by leaked power from the Earth's orbital precession which mainly affects summer temperature in the Northern Hemisphere during the Holocene. On the other hand, TraCE-21ka generally shows a slightly higher variability in its annual compared to its summer temperature. Although the pollen-based reconstructions calibrated for annual temperature are thought to be less reliable than the summer temperature reconstructions, they give a very similar result. This shows that our conclusions are robust against uncertainties in the seasonal attribution of pollen variability.  Fig. 1, but for precipitation instead of summer temperature. While most locations should reflect temperature, here we also tested the boundary case of assuming that all sites reflect precipitation. Even in this extreme case, the main results hold, namely increasing climate variability over land as a function of timescale and a corresponding deficit of variability in the climate models. In all cases, there was little difference between the estimates with (dashed) and without (solid) sinusoidal detrending. The three climate models vastly disagree in terms of the amplitude of precipitation variability, but they all show a large deficit of variability at long timescales compared to the pollen-based reconstructions. Thus, even if precipitation would affect parts of our records, this cannot reconcile the model-proxy discrepancy.  Extended Data Figure 5| Comparison of millennial scale variability in the reconstructions and models. a-c Millennial temperature variability PSD 1000-3000 years (mean PSD for the timescale band 1000-3000 years) for the pollen-based reconstructions after smoothing with a Gaussian kernel with a characteristic scale of 300 km (see Methods). d-l Millennial temperature variability PSD 1000-3000 years for the three climate models (without smoothing). The results with different detrending methods before computing the power spectra are compared: d,g,j without detrending, e,h,k with linear detrending and f,i,l with a 23-ka sinusoidal detrending. The same colour scale is used for all maps. While the typical pollen-based variability would be generally between 10 and 1000 K 2 year -1 , the models yields a variability 1-2 orders of magnitude smaller, between 0.1 and 10 K 2 years -1 , with some regions reaching up to 100 K 2 year -1 . Extended Data Figure 7 | Spectral estimates of land temperature as a function of sub-decadal variability using twenty bins. As in Fig.3, but using twenty bins instead of eight. While the result is noisier than in Fig. 3, the correlation is still highly significant, which shows that our result is not sensitive to the number of bins. A total of 985 pollen records which provided spectral estimates were tested for statistical significance and separated into 'significant locations' (p<0.1) and 'not significant locations' (p>0.1). The resulting average spectra overlap and are fairly similar over a wide range of timescales. The average spectra of the instrumental data at the corresponding locations are shown alongside for reference (dashed). 90% confidence intervals are given (shaded).