Observed poleward freshwater transport since 1970

Warming-induced global water cycle changes pose a significant challenge to global ecosystems and human society. However, quantifying historical water cycle change is difficult owing to a dearth of direct observations, particularly over the ocean, where 77% and 85% of global precipitation and evaporation occur, respectively1–3. Air–sea fluxes of freshwater imprint on ocean salinity such that mean salinity is lowest in the warmest and coldest parts of the ocean, and is highest at intermediate temperatures4. Here we track salinity trends in the warm, salty fraction of the ocean, and quantify the observed net poleward transport of freshwater in the Earth system from 1970 to 2014. Over this period, poleward freshwater transport from warm to cold ocean regions has occurred at a rate of 34–62 milli-sverdrups (mSv = 103 m3 s−1), a rate that is not replicated in the current generation of climate models (the Climate Model Intercomparison Project Phase 6 (CMIP6)). In CMIP6 models, surface freshwater flux intensification in warm ocean regions leads to an approximately equivalent change in ocean freshwater content, with little impact from ocean mixing and circulation. Should this partition of processes hold for the real world, the implication is that the historical surface flux amplification is weaker (0.3–4.6%) in CMIP6 compared with observations (3.0–7.4%). These results establish a historical constraint on poleward freshwater transport that will assist in addressing biases in climate models. A study uses a temperature-percentile water mass framework to analyse warm-to-cold poleward transport of freshwater in the Earth system, and establishes a constraint to help address biases in climate models.

The global water cycle represents the sustained transport of freshwater through the lithosphere, hydrosphere, biosphere and atmosphere, and underpins the foundation of human society and ecological systems 1 . Understanding the response of the global water cycle to long-term global warming is critically important to mitigating the impacts of climate change. However, direct observations of rainfall are sparse, particularly over the ocean, which hosts 77% of global rainfall and 85% of surface evaporation [1][2][3] . Limited rainfall and evaporation observations, combined with substantial variance across atmospheric reanalysis products [4][5][6] , have made direct measurements of long-term changes to the global water cycle challenging. As a result, some of our understanding of changes to the water cycle has been shaped by proxies for rainfall measurements over the ocean, including salinity. As water associated with the hydrological cycle transits through the ocean, it generates a signature of ocean salinification and freshening, both at the surface 7,8 and the interior 9,10 . Using in situ salinity observations and ocean models, a growing body of research has identified a 'wet gets wetter, dry gets drier' paradigm, wherein the global water cycle intensifies owing to global warming 7,[11][12][13] , with some estimates suggesting a water cycle intensification of 2-4% relative to the 1950 ocean state 8,10,14 . This intensification manifests as increasingly salty subtropical oceans and increasingly fresh tropical and subpolar oceans.
Observed ocean salinity change has a large-scale coherent pattern through depth 15,16 (Fig. 1). The Indo-Pacific basin has well-defined regions of freshening and salinification, whereas the Atlantic basin is broadly salinifying in warmer temperature zones (compare Fig. 1a, b with Fig. 1c, d). Subsurface salinity changes seem to be related to surface patterns of precipitation, river runoff and evaporation (P E R − + ) in observations and climate models (hatched line plots in Fig. 1). There are also significant differences between estimates of ocean salinity changes in observations (Fig. 1a, c) and climate models over the same period (Fig. 1b, d). Such differences between observations and models in Eulerian diagnostics have motivated past research to use water-mass-based diagnostics to frame ocean salinity changes, for instance, by quantifying mean salinity, S, in temperature, T, (or density) layers (known as the T-S curve of the ocean) 9,17,18 .
Tracking the global T-S curve over time provides information about salinity changes in different climatic regions based on temperature. However, simply computing trends of absolute salinity at constant conservative temperature ignores ocean warming that moves the underlying T-S curve 19 , yielding anomalously large salinity tendencies at constant T (Extended Data Fig. 2). In this work, we analyse salinity changes at constant temperature percentile rather than constant temperature 20 . That is, we organize the ocean from warmest to coldest and track changes at a fixed accumulated volume (for example, by tracking salinity changes in the warmest 6% of the ocean by volume). Temperature-percentile surfaces occupy the same volume over time, and outcrop in roughly the same climatic region even as the ocean warms (for instance, Extended Data Fig. 3 exhibits the poleward migration of temperature-percentile outcrops compared with the equivalent isotherms). Using the temperature-percentile Article framework, we can track changes in the T-S curve of the ocean while eliminating much of the direct influence of ocean warming, enabling a clean analysis between different datasets that may be warmer or cooler in general but have a similar volume-temperature distribution. The overlaid black contours in Fig. 1 illustrate the geographical distribution of temperature percentiles in observations, from hot (0%) to cold (100%).
In this work, we quantify the global warm-to-cold limb of freshwater transport, a crucial component of the water cycle, in multiple observational datasets and climate models using the temperature-percentile water mass framework. We track changes in conservative temperature and absolute salinity in three gridded hydrographic historical observations (Institute of Atmospheric Physics; IAP, Ishii, and Enhanced Ocean Data Assimilation and Climate prediction (ENACT) version 4; EN4) 14,21-23 , as well as changes in potential temperature and practical salinity in 20 CMIP6 historical model runs 24 . To understand model biases, we examine two large ensembles of single CMIP6 model runs, and six Detection and Attribution Model Intercomparison Project (DAMIP) historical model runs 25 (see Methods for theory and data sources). We also evaluate a composite observational dataset (Composite Observations) that combines IAP data for which depth z < 2,000 m and EN4 data for which z > 2,000 m. This dataset is formulated to limit salinity errors associated with sampling biases before the Argo era (see ref. 21 for information on IAP sampling biases and ref. 20 for further information on the composite dataset). We also estimate observed time-mean surface freshwater fluxes from the fifth generation European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis (ERA5) 26 . All observational and model data sources (with the exception of Ishii and ERA5) cover the historical period from 1970 to 2014, with Ishii covering 1970-2012 and ERA5 covering 1979-2014. Using these observations and models, we produce an estimate of observed poleward freshwater transport since 1970 and explore biases in salinity trends in the latest generation of climate models.
The ocean's T-S curve for 1970 (dashed black curve in Fig. 2) has a characteristic question mark shape. Mean ocean salinity is lowest at the warmest and coldest temperature percentiles (representing the tropics, subpolar and polar regions), and is highest at intermediate temperature percentiles (representing the subtropics). The observed salinity tendency follows the widely established 'wet gets wetter, dry gets drier' pattern ( Fig. 2a), with red arrow vectors showing salinification (and warming) in the subtropics (between the warmest 0.2%-6% of the ocean) and blue vectors exhibiting freshening in the tropics (the warmest 0%-0.2% ocean by volume) and subpolar/polar regions (the warmest 6%-100% ocean by volume). The CMIP6 multi-model mean exhibits salinification over a narrower band of temperature percentiles, between the warmest 0.2%-2% of the ocean. Note that the CMIP6 models largely conserve salt on a global scale 27 (Extended Data Fig. 7a, b), as do observations in the range of uncertainty presented. On the basis of Fig. 2, the warmest 2% of the ocean experiences salinification in both the CMIP6 models and observations, and the warmest 6% of the ocean experiences salinification in the observations. Consequently, we focus hereafter on tracking changes in the warmest 2% and warmest 6% of the ocean.
We track implied freshwater content change in the 2 and 6% water masses by multiplying mean salinity tendency by layer volume (as in equation (6) in the Methods). In the warmest 2% ocean by volume (Fig. 3a), the freshwater content decreases in both the observations and the CMIP6 models. Over the historical period, the warmest 2% of the observed ocean loses freshwater at a rate of 13-41 mSv, 1.6-5 times more than the rate of freshwater loss in the CMIP6 ensemble mean (8 mSv). That said, both observations and CMIP6 models have a similar spread in freshwater content change (histograms in Fig. 3a). In the warmest 2% of the ocean, the single-model spread in internal variability is similar to the CMIP6 inter-model spread, as diagnosed from an ensemble of historical simulations from two CMIP6 models: ACCESS-ESM1-5 and  Stippling in b and d indicates regions for which more than 2/3 of the CMIP6 models agree on the sign of the salinity tendency. Black contour lines indicate zonally averaged temperature percentiles with the warmest 2% of the ocean bound by the 2% contour, the warmest 6% by the 6% contour and so on. Here we track salinity changes in these percentile layers.
CNRM-CM6-1 (Extended Data Fig. 5; one such run from each ensemble is highlighted in Fig. 3 in blue and orange, respectively). Therefore, the freshwater content change in the single-model ensembles also overlaps with the observational datasets explored here.
In the warmest 6% of the ocean (Fig. 3b), the observations show a poleward redistribution of freshwater of between 46 and 77 × 10 12 m 3 (at a rate of 34-62 mSv). The implied freshwater content change is robust across different observational products, lending credence to the temperature-percentile framework being used here. By contrast, many CMIP6 models (and the CMIP6 ensemble mean) experience freshening over the historical period, with the closest CMIP6 model experiencing a freshwater loss of about 1/2 of observations (21 × 10 12 m 3 ; 20 mSv rate). The substantial difference between CMIP6 models and observations is highlighted further by focusing on freshwater content change in the warmest 2-6% ocean layer (Fig. 3c). Almost every CMIP6 model has a net freshwater transport into this layer, compared to a remarkably consistent net freshwater transport out of this layer in all three observational datasets. As with the 2% water mass, internal variability in individual models has a similar spread in freshwater content change to the CMIP6 models (Extended Data Fig. 5) in the warmest 6% of the ocean and the 2-6% layer. Therefore, the observed salinification is still far from the range in the CMIP6 models, and internal variability does not explain the difference in poleward freshwater transport in models and observations. There is also larger spread between individual CMIP6 model realizations in the warmest 6% of the ocean, with the models coalescing into salty and fresh groups (histograms in Fig. 3b; see Extended Data Table 1 for individual models).
The difference between models and observations in the warmest 6% of the ocean in Fig. 3 may be due to a difference in the pattern of salinity change. Indeed, Fig. 1 shows that a freshening signal intrudes into the warmest 6% of the ocean by volume in the North Atlantic and North Pacific in CMIP6 models, and plots of sea surface salinity (SSS) tendency in the observations and CMIP6 models (Extended Data Fig. 3) confirm that surface freshening is more widely distributed in the CMIP6 models, particularly in the North Atlantic and North Pacific. Preliminary analysis suggests that this freshening signal, including the dipole in the subtropical Indo-Pacific, may be associated with surface freshwater anomalies (not shown).
Future projections of warming-induced extreme precipitation and global agricultural changes are based in part on estimates of water cycle change from CMIP6 models. The fact that CMIP6 models cannot accurately capture poleward freshwater transport is therefore a cause of concern for future climate change projections, and the systematic fresh bias in CMIP6 models must be urgently addressed. We explore the role of changing atmospheric surface fluxes (P − E + R) in modifying ocean freshwater content and causing this CMIP6 bias. Historical CMIP6 models alone do not exhibit a significant relationship between surface freshwater flux (P − E + R) tendencies and freshwater content changes (small black dots in Fig. 4). To sample a wider range of perturbations to the water cycle, we analyse a set of six DAMIP models 25 that each performed a historical greenhouse gas forcing (GHG-only) and a historical anthropogenic aerosol forcing (AA-only) experiment. In the GHG-only experiments, planetary warming associated with GHG build up amplifies the strength of the water cycle in a 'wet gets wetter, dry gets drier' pattern 6,13 . In the AA-only experiments, anthropogenic aerosols cool the climate system, leading to wet areas getting drier and dry areas getting wetter 13 . Forcing climate models solely in the GHG-only and AA-only configuration therefore provides a wider range of water cycle perturbations by which to judge climate model responses.
In the warmest 2% of the ocean, there is a correlation between surface freshwater flux changes (expressed as a linear rate of change between 1970 and 2014, in milli-sverdrups) and freshwater content change (also in milli-sverdrups) applied to the DAMIP runs and their six corresponding CMIP6 historical runs (red, blue and purple circles), with an r 2 value of 0.43 (Fig. 4a). An increase in net evaporation over the warmest 2% of the ocean leads to decreasing freshwater content in this water mass.

Article
The slope of the linear regression is between 1.41 and 2.57, indicating that a surface freshwater flux change of 1.4-2.6 mSv is required to effect a freshwater content change of 1 mSv. The remaining surface freshwater flux change of 0.4-1.6 mSv presumably leaves the warmest 2% of the ocean via mixing and circulation changes. In the warmest 6% of the ocean, the linear regression also indicates an inverse relationship between surface flux intensification and freshwater content change, with an r 2 value of 0.47 (Fig. 4b). Importantly, the slope of the linear regression in the warmest 6% of the ocean is closer to 1 (0.72-1.23), implying that approximately 1 mSv of surface freshwater flux change leads to 1 mSv of freshwater content change, with minimal contribution from mixing and circulation changes. Note that the linear regression does not cross the origin in both the 2% and 6% water masses, indicating that even in the absence of surface freshwater flux changes, there will still be some background rate of freshwater content change. This offset may be explained by surface ocean warming, which suppresses upper ocean mixing, leading to a reduction in the transfer of freshwater from cool, fresh, deep waters to warmer, saline, surface waters. Work by ref. 8 suggested that this was a leading-order effect in surface salinity pattern changes and a minor (but not insignificant) effect in changes in three-dimensional (3D) salinity contrasts. Furthermore, ref. 28 suggested that northward Ekman transport in the Southern Ocean may contribute to salinity changes in the subpolar ocean, although this effect is model-resolution dependent.
The linear regressions in Fig. 4 provide an indication of the relationship between surface freshwater flux changes and freshwater content changes in the DAMIP models. An equivalent relationship for observational datasets is difficult to independently analyse, mainly because atmospheric reanalysis products tend to disagree in the sign and magnitude of surface freshwater flux tendencies 4,5 . In the absence of reliable observations of surface flux tendencies, we assume that the relationship between freshwater content change and surface flux change obtained from the DAMIP models in Fig. 4 is realistic and can be applied to observations. Applying the linear regressions to the known observational freshwater content change (from the histograms in Fig. 3), we infer that surface freshwater fluxes in the warmest 2% ocean have intensified  Table 1). Right panels: histograms show the rate of freshwater change (in milli-sverdrups) of each model member and observational product based on the slope of a linear regression over the historical period. by 42-152 mSv (equivalent to an amplification of 2.7-10.1% relative to pre-industrial fluxes; Extended Data Fig. 6), compared with 29-116 mSv (1.9-8%) in CMIP6 models (green shaded area in Fig. 4a). In the warmest 6% ocean, we infer that surface freshwater fluxes have intensified by 55-127 mSv (equivalent to an amplification of 3-7.4%), compared with 5-77 mSv (0.3-4.6%) in CMIP6 models. Thus, in the warmest 6% of the ocean, we conclude that the surface flux intensification in the CMIP6 models is too weak resulting in models that are biased fresh.
Although this relationship is based on six models rather than the wider suite of twenty CMIP6 models analysed earlier, we note that the response in the fourteen other CMIP6 models (small black dots in Fig. 4) lies in broadly the same region as the six CMIP6 historical runs corresponding to the DAMIP models (purple dots in Fig. 4). Note that some GHG-only and CMIP6 historical models deviate significantly from the linear regression, reflecting the fact that the regression is not a perfect correlation, and that individual models may also be impacted by mixing and circulation. The CMIP6 model that deviates most from the regression has a surface flux change over the warmest 6% of the ocean that is approximately 20 times greater than that implied by the linear regression. This suggests broad scope for mixing to also modify the freshwater content in specific models. In the future, increased Argo coverage may provide an observational constraint on these model estimates, allowing for the removal of outliers from the analysis.
The temperature-percentile analysis presented here is distinct from previous methodological approaches that have estimated changes to the water cycle as a whole. Past studies have quantified the net change in freshwater transport, integrated over all climatic regions. In this work, however, we specifically quantify the crucial warm-to-cold limb of freshwater transport, that is, the net transport of freshwater out of the tropics and subtropics into the cold subpolar and polar regions. The trends we have identified in the warm-to-cold limb of the global water cycle are robust across three leading observational datasets and sit outside the envelope of 20 CMIP6 climate model simulations. Previous ocean salinity-based studies have indeed indicated substantial changes in the global water cycle. For instance, ref. 7 analysed the global 2D SSS pattern amplification to infer global water cycle amplification rates from 1950 to 2000. Their observational analysis indicated an SSS pattern amplification that was larger (per degree Celsius of surface air temperature rise) than the majority of CMIP3 climate model simulations. However, subsequent research 8,12 showed substantial spread in SSS pattern amplification across different observational datasets, with EN4 and EN3 data indicating a weaker SSS pattern amplification more consistent with climate models. Moreover, alternative approaches based on 3D ocean salinity changes suggested that although an anthropogenic signal is emerging from background variability 14 this signal is in line with climate model simulations 10 . Our targeted analysis of the warm-to-cold limb of water cycle change stands in contrast to these previous studies, indicating a substantial difference between CMIP6 models and all three observational datasets analysed.
Our understanding of the global water cycle hinges on a clear understanding of historical water cycle changes, and on climate models' faithful reproduction of these changes. In this study, we use a temperature-percentile water mass framework to quantify the warm-to-cold poleward transport of freshwater in our Earth system since 1970, inferred from gridded ocean observations. For 1970-2014, we estimate that 46-77 × 10 12 m 3 of freshwater has been transported

Article
towards the poles at a rate of 34-62 mSv. The latest generation of CMIP6 climate models show broad variability in poleward freshwater transport, which is 2-4 times smaller than the observations. The CMIP6 models experience salinification over a narrower band of temperatures, calling into question the accuracy with which they may be able to produce future water cycle projections. On the basis of the relationship obtained from DAMIP models, the weaker than expected freshening trend in CMIP6 historical runs is attributed to insufficient amplification of the surface freshwater fluxes over the warm portions of the ocean. In the warm regions of CMIP6 models, surface flux amplification has an approximately 1:1 impact on freshwater content changes, with mixing and circulation changes playing a minimal role, underscoring the need to correctly represent surface flux changes in these regions. These results establish an observational constraint on freshwater content changes, and motivate further exploration of the surface freshwater flux biases in climate models.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-04370-w.

The tracer-percentile framework
We begin by touching on the formulation for the tracer-percentile framework in temperature percentiles, as laid out in detail by ref. 20 . The volume of ocean warmer than a given conservative temperature Θ* is expressed as: x y z t Θ( , , , ) >Θ* V in which Θ is the global temperature field and Θ* is the binning temperature chosen for the integration. We invert equation (1) to express temperature as a function of volume, V, or equivalently, temperature percentile, p: in which V p V = / × 100 T and V T is the total volume of the ocean. V t Θ( , ) therefore represents the mean temperature bounding a given volume, and p t Θ( , ) represents the mean temperature bounding a given temperature percentile. The total volume of the ocean changes negligibly over time, so the volume and temperature percentile in this framework are related by a constant factor.

Mean salinity in temperature percentiles
We now extend the tracer-percentile framework by deriving the mean tracer tendency budget at constant tracer percentile. In this case, the mean tracer of interest is mean salinity, but this framework may be readily modified for use with other conservative and non-conservative tracers. The mean salinity at constant temperature is defined as: x y z t Θ( , , , )>Θ ∭ * * * * in which S x y z t ( , , , ) is the 3D absolute salinity field in the ocean. To find mean salinity at temperature percentiles, we interpolate S t (Θ*, ) onto p t Θ( , ) surfaces:

S t S p t t S p t (Θ*, ) (Θ( , ), ) = ( , ). (4) ⟹
Mapping mean salinity changes at constant temperature percentiles ensures that we maintain the benefits of adiabatic-invariant water-mass methods while also being able to trace freshwater fluxes directly to climatic regions (for example, tropics, subtropics and subpolar regions) at the ocean's surface.
The resulting variable S p t ( , ) provides a characteristic p-S curve for the ocean which, like T-S curves, captures changes in the salinity distribution of the global ocean 18 . For ease of comparison with traditional T-S curves, we recast the p-S curve onto the time-mean temperature p Θ( ) for a given temperature percentile, yielding a 'remapped' T-S curve.

Freshwater content and fluxes in temperature percentiles
To account for the volume of water in a given percentile layer experiencing salinity changes, we approximate a freshwater content, F p t ( , ) by multiplying salinity by layer volume: T 0 in which S t ( ) is the volume-averaged salinity that serves as the baseline differentiating 'fresh' and 'salty' water in the ocean, and S 0 is the reference salinity, assumed to be 35 g kg −1 . The global change in freshwater, or freshwater flux F p t ( , ), can be expressed as: T 0 F In practice, S t ∂ /∂ is approximated over the time period of interest as the slope of the linear trend in S p t ( , ). p t ( , ) F is related to surface fluxes s F and local fluxes of salt in the ocean, M, including mixing, such that the freshwater tendency budget may be defined as = + s F F M. Note that in the temperature-percentile framework, local fluxes of salt in the ocean M may be a consequence of both diabatic (that is, mixing) and adiabatic salt fluxes (for example, advective salt fluxes that occur alongside an unchanging temperature distribution).
The changes in freshwater input due to water cycle changes may be quantified at the surface of the ocean. As with salinity in equations (3) and (4), we begin by quantifying surface freshwater fluxes into volumes bounded by isotherms before interpolating onto temperature percentiles: in which A is the surface area bounded by contours of temperature Θ*, ρ 0 is the reference density, assumed to be 1,035 kg m −3 , and P, E and R are the precipitation, evaporation and river runoff respectively, in kilograms per second, assumed to be positive into the ocean. The residual between surface freshwater fluxes and the global freshwater flux represents the fluxes of salinity between percentile layers in the ocean.

Data
We explore the salinity tendency and freshwater and surface freshwater flux change over 44 years from 1970 to 2014 in a number of observational and climate model datasets.
Observations. The observational results presented come from three observational datasets and a composite dataset: Institute of Atmospheric Physics (IAP) data (in situ gridded temperature and salinity observations from refs. 14,21 for the upper ocean (z < 2,000 m), monthly averaged from 1970 to 2014 with 1° horizontal resolution); EN4 data (full-depth in situ temperature and salinity data from the UK Met Office Hadley Centre Enhanced Ocean Data Assimilation and Climate prediction version 4 (subversion EN.4.2.1, with expendable bathythermograph corrections from ref. 29 ; see ref. 23 for more details), monthly averaged from 1970 to 2014 with 1° horizontal resolution); Ishii data (in situ temperature and salinity data from ref. 22 for the upper ocean (z < 1,500 m), monthly averaged from 1970 to 2012 with 1° horizontal resolution); and composite data (full-depth composite of IAP data for z < 2,000 m and EN4 data for z > 2,000 m, monthly averaged from 1970 to 2014 with 1° horizontal resolution). Note that the IAP dataset is specifically formulated to reduce errors associated with sampling biases due to the rapid introduction of Argo in the early 2000s (see ref. 14 for more details), so we opt to use it where available. The composite dataset therefore makes use of the IAP data in the upper ocean and fills the deep ocean with EN4 data. All observed temperature and salinity measurements are converted to conservative temperature and absolute salinity in line with oceanographic standard practice and following recommendations by the UNESCO-IOC [30][31][32][33] . Marginal seas (the Mediterranean, Red, Baltic and Black seas, the Persian Gulf and Hudson Bay) are influenced by ocean processes different to the major ocean basins and vary in size and shape between climate models, so are excluded in the observations. In addition, observed time-mean surface freshwater fluxes in Fig. 1a, c are quantified by taking the zonal integral of precipitation, evaporation and river runoff from ECMWF Reanalysis 5th Generation (ERA5) data 26 , from 1979 to 2014.
Models. The analysis presented here enables direct comparisons between observations and climate models. We assess the difference between observations and climate models using 20 climate models from the CMIP6 (ref. 24 ; model names and references are provided in Extended Data Table 1). We focus on the historical experiments in the CMIP6 models (with masked marginal seas) that branch off from the pre-industrial control (piControl) runs from 1850 onwards. We analyse monthly averaged potential temperature and practical salinity fields from January 1970 to December 2014. Note that the difference between the potential temperature and practical salinity provided in the models and conservative temperature and absolute salinity is negligible 33 . Any differences between the CMIP6 models and observations may be attributed to the forcing field by comparing the CMIP6 response to a set of DAMIP models 25 . We look individually at the change in salinity in GHG-only (hist-GHG) and AA-only (hist-aer) runs in six DAMIP models that branch off from the piControl run in 1850. In these runs, the models are separately forced by AAs and GHGs in an effort to attribute the model response to either forcing field (see ref. 25 for more details of the DAMIP protocol). The list of DAMIP models used and relevant references (which include further details on the model setup) may be found in Extended Data Table 2. As with the CMIP6 models, the monthly averaged potential temperature and practical salinity fields from January 1970 to December 2014 are analysed in the DAMIP models. To evaluate internal variability in individual models, we also analyse 2 large single-model ensembles-30 historical runs with ACCESS-ESM1-5 and 28 historical runs with CNRM-CM6-1.
In the observations, CMIP6 runs and DAMIP runs, the annual mean of all relevant monthly averaged variables is calculated, binned into temperature space and interpolated onto temperature percentiles. yields the surface freshwater flux anomaly. Surface flux intensification (y axis in Extended Data Fig. 6) is calculated by dividing the surface freshwater flux anomaly by the climatological mean surface freshwater fluxes. Note that surface freshwater flux anomalies are not readily available in observations (with substantial regional variance between atmospheric reanalyses, see ref. 4 ), so they are calculated only for the CMIP6 and DAMIP runs.

Time-mean salinity and surface freshwater fluxes
We present the time-mean climatological state of the ocean in Extended Data Fig. 1. The time-mean T-S curve in Extended Data Fig. 1a shows that, generally, tropical and subpolar oceans are characterized by relatively fresh water, and the subtropics are characterized by relatively salty water. Although the CMIP6 models recreate the shape of the T-S curve seen in the observations, the majority are biased fresh compared with the observations. The distinction between climatological regions is driven, in part, by surface freshwater fluxes that, on average, dump freshwater into the tropics and subpolar regions and draw freshwater from the subtropics (Extended Data Fig. 1b).

Key characteristics of the temperature-percentile framework
Here we highlight the differences between the temperature-percentile framework and the more commonly used fixed-temperature framework. In absolute temperature space, the mean salinity of a given temperature layer changes owing to both changes in salinity and the warming of the underlying T-S distribution, which is particularly pronounced in strongly warming model simulations, and may be nonlinear. Extended Data Fig. 2 shows that tracking mean salinity changes in absolute temperature space yields an anomalously large salinity change compared with temperature percentiles (compare black and red lines in Extended Data Fig. 2). Transforming to temperature percentiles also has the added benefit of preserving the rough geographic distribution of outcrops, where isotherms would migrate poleward in a warming ocean. Extended Data Fig. 3 shows the time-mean surface outcrop location of temperature percentiles from 1970 to 1980 (dashed red lines), the time-mean outcrop location of temperature percentiles from 2004 to 2014 (solid red lines) and the surface outcrop location of the 1970-1980 isotherm corresponding to the dashed red lines, in the 2004-2014 period (solid blue lines). Clearly, temperature percentiles do not migrate as far north as isotherms, although this is more pronounced in the observations compared with the CMIP6 MMM.

T-S changes in DAMIP simulations
Here we present changes in the T-S curve in the DAMIP GHG-only, AA-only and corresponding CMIP6 historical runs, shown in Extended Data Fig. 4b-d, respectively. The multi-model mean T-S change in the six CMIP6 historical runs corresponding to the DAMIP models is similar to the multi-model mean change for all CMIP6 models in Fig. 2b. The GHG-only multi-model mean T-S change exhibits a larger magnitude of salinification compared with the all-forcing ensemble mean over a similarly narrow band of temperature percentiles, between the warmest ≈0.2% and 2% of the ocean. AAs, on the other hand, drive an opposite trend in water cycle change, with the subtropics getting fresher and the rest of the ocean salinifying.

Impact of internal variability on freshwater transport
As discussed in the main text, we analyse the influence of internal variability on the freshwater content change by analysing 30 and 28 historical ensemble members in the ACCESS-ESM1-5 and CNRM-CM6-1 models, respectively, as shown in Extended Data Fig. 5. In both the 2% and 6% warmest ocean by volume, the individual model ensemble has a similar spread to the CMIP6 multi-model spread in freshwater content change (compare coloured and grey histograms in Extended Data Fig. 5). In the 2% warmest ocean by volume, the standard deviation in the CMIP6 inter-model response is 11.8 mSv, compared to 10.1 mSv in the ACCESS-ESM1-5 ensemble and 7.4 mSv in the CNRM-CM6-1 ensemble. In the 6% warmest ocean by volume, the standard deviation in the CMIP6 inter-model response is 26.8 mSv, compared to 13.6 mSv in the ACCESS-ESM1-5 ensemble and 18.2 mSv in the CNRM-CM6-1 ensemble. The range of inter-and intra-model responses still lies outside the observations in the 6% warmest ocean. This difference is highlighted by tracking the freshwater content in the 2-6% warmest ocean layer, which shows a robust observational signal that is not captured by either the intra-or inter-model spread, implying that internal variability cannot explain the difference between CMIP6 models and the observations in the warmest 6% ocean. Whereas Figs. 3 and 4 and Extended Data Figs. 5 and 6 focus on freshwater content change and surface freshwater flux change in the warmest 2% and warmest 6% of the ocean, in Extended Data Fig. 7 we present freshwater fluxes for all percentiles, integrated from hot (0%) to cold (100%). As these plots are integrated from hot to cold, the sign of the slope of the lines indicates a net freshening or salinification. The observations and CMIP6 models freshen in the warmest ≈0.2% of the ocean, before salinifying in the warmest 6% and 2% of the ocean, respectively (Extended Data Fig. 7a). There is an increase in net P − E + R over the warmest 0.2% of the ocean, followed by decrease in net P − E + R over a narrow band of temperature percentiles that does not align with the broader range of percentiles over which salinification occurs in the CMIP6 mean (Extended Data Fig. 7b). Therefore, there is some component of circulation and mixing changes that impacts the salinification in the global freshwater fluxes in Extended Data Fig. 7a. In the warmest ≈22.8% of the ocean, the CMIP6 models experience a strong peak in freshening, which is not replicated in the observations. We also posit that the peak freshening over the warmest ≈22.8% of the ocean is a consequence of changes in ocean circulation and mixing, possibly due to changes in high-latitude ocean ventilation.

Global freshwater fluxes in CMIP6 and DAMIP
Extended Data Fig. 7c, d shows the global and surface freshwater flux changes in the observations, CMIP6 mean and the DAMIP runs. GHG-forced DAMIP models show a stronger salinification up to the warmest 2% of the ocean compared with the corresponding CMIP6 historical runs (compare red/orange lines with purple/magenta lines in Extended Data Fig. 7c). In addition, the freshening between the warmest 2% and warmest ≈22.8% of the ocean in the CMIP6 historical runs is not replicated in the GHG-only simulations, with little freshwater change in these layers. AA-only simulations show a strong freshening in the warmest 6% of the ocean (blue/light blue lines in Extended Data Fig. 7c). The GHGs and AAs in the climate models interact in a complex manner to yield the CMIP6 historical response-a weak salinification in the warmest 2% of the ocean and strong freshening in the warmest ≈22.8% of the ocean (purple/magenta lines in Extended Data Fig. 7c).
The stark difference in global freshwater fluxes in the GHG-only and AA-only runs may be explained, in part, by the surface flux tendencies in Extended Data Fig. 7d. The stronger freshening and salinification in the warmest 2% ocean in the GHG-only runs may, in part, be attributed to increased net precipitation and evaporation in the surface freshwater fluxes in Extended Data Fig. 7d. In addition, the GHG-only runs experience similar net evaporation over the warmest ≈22.8% of the ocean compared with the all-forcing runs, but do not experience the same peak in freshening as in the corresponding CMIP6 historical runs. We propose that the strong evaporation in the GHG-only runs at warmer percentiles and/or circulation changes may explain the lack of a freshening peak in the GHG-only runs. The AA-only runs show broadly the opposite trend in surface freshwater flux tendencies to the GHG-only and all-forcing runs.