The Eulerian frame of reference is convenient for atmospheric observation and modelling but conceals a prevalent sampling bias in atmospheric data. This frame examines a control volume or “point” whose mass varies with inward and outward air fluxes, accommodating immobile measurement stations and model grid cells. By contrast, a Lagrangian frame that follows a particular fluid mass is preferable for expressing physical conservation laws1, and therefore for recognizing sampling bias. From the Lagrangian perspective it becomes clear that “simple random sampling”, which requires an equal sampling likelihood for every element of the population and is the key to avoiding bias and simplifying analytic computations2, is practically impossible in the atmosphere.
Since Lagrangian measurement is exceedingly difficult, it is instructive to specify a simple, artificial dataset. Consider therefore a thermodynamic system composed of many dry air parcels, each with identical mass and pressure but thermally divided into two categories, with half the parcels warm (20 °C) and half cold (−20 °C). Clearly, the average temperature is 0 °C. However, because most of the aggregate volume is warm (by Charles's law, parcel volume scales with absolute temperature at fixed pressure), a thermometer deployed randomly within the system is more likely to sample warm air than cold, despite their equal masses. This exposes a universal issue with atmospheric sampling and reveals a bias that repeated measurement does not easily amend.
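To make the volume argument concrete, the following minimal sketch (an illustration added here, not part of the original analysis) applies Charles's law to the two parcel temperatures; at fixed pressure and mass, each parcel's volume is proportional to its absolute temperature:

```python
# Volume shares of warm and cold parcels with equal mass and pressure.
# By Charles's law, volume is proportional to absolute temperature, so the
# probability that a randomly placed thermometer lands in warm air is the
# warm fraction of the total volume.
T_WARM = 293.15  # 20 degC in kelvin
T_COLD = 253.15  # -20 degC in kelvin

p_warm = T_WARM / (T_WARM + T_COLD)
print(f"P(warm sample) = {p_warm:.4f}")  # ~0.5366, i.e. >53%
```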
Increasing the number of observations in this simple example, whether by random or gridded sampling, would yield >53% warm samples, so that arithmetic averaging would overestimate the mean temperature by nearly 1.5 °C. Dividing the system into numerous equal-volume grid cells would produce more warm cells than cold (denser) cells. Such spatially defined measurement or modelling domains are typical in atmospheric science and exemplify what statisticians call “cluster sampling”3. Critically, cluster populations vary with atmospheric state (particularly temperature) but are unknown a priori. This precludes unbiased sampling unless the instrument sample volume exceeds the scales of atmospheric variability.
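A brief Monte Carlo check of these numbers (a sketch under the same assumptions; the sampling probability is the warm volume fraction computed above):

```python
import random

T_WARM_C, T_COLD_C = 20.0, -20.0        # parcel temperatures, degC
P_WARM = 293.15 / (293.15 + 253.15)     # volume-weighted sampling probability

random.seed(0)
samples = [T_WARM_C if random.random() < P_WARM else T_COLD_C
           for _ in range(1_000_000)]
t_arith = sum(samples) / len(samples)
print(f"fraction warm: {sum(s > 0 for s in samples) / len(samples):.4f}")
print(f"arithmetic mean: {t_arith:.2f} degC")  # ~1.46 degC instead of 0 degC
```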
Such bias affects the statistical moments of any state or flow variable. If the parcels defined above were moving vertically in a convective boundary layer, with warm/cold parcels rising/falling each at 10 cm s⁻¹, the average velocity would be zero. However, an anemometer deployed to a random point would sample rising air with >53% probability. Arithmetic averaging of measurements from numerous such anemometers within the system, or of a time series measured by a single instrument with “frozen turbulence” advecting past (employing Taylor’s hypothesis4), would then yield an upward average vertical velocity of 7 mm s⁻¹, again because sampling bias nudges the arithmetic average towards the characteristics of warmer air.
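The same volume-weighted sampling probability yields the biased velocity directly (again a sketch using the example's numbers, not a measurement procedure):

```python
# Expected arithmetic average of vertical velocity under volume-biased
# sampling: warm parcels rise and cold parcels fall at 10 cm/s each.
P_WARM = 293.15 / (293.15 + 253.15)   # probability of sampling (rising) warm air

w_up, w_down = 0.10, -0.10            # vertical velocities, m/s
w_biased = P_WARM * w_up + (1 - P_WARM) * w_down
print(f"biased mean vertical velocity: {w_biased * 1000:.1f} mm/s")  # ~+7.3
```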
Failure to recognize this has confused the physical mechanisms of convective boundary-layer transport. Long-standing theory in micrometeorology, reasoned from the Eulerian perspective, holds that turbulence transports air downward against an upward heat flux, since warm updrafts are less dense than cool downdrafts5,6. To conserve air mass, this would imply a compensating average velocity in the direction of the heat flux (upward), and it is the basis of the "density corrections"6 that have underpinned decades of micrometeorological research7. But the simple example specified above illustrates its fault: with equal mass and opposing velocities of warm and cold parcels, their momenta sum to zero, defining a null ensemble velocity. Cold Eulerian volumes are denser but also fewer, since the same mass occupies less volume. Neglecting this, random spatial sampling and arithmetic averaging combine to produce an upward average velocity that is an artefact of sampling bias. In fact, neither the turbulence nor the mean flow transports air under the conditions specified. Thus, unrecognized sampling bias has led to the conflation of two gas transport mechanisms: transport by turbulent diffusion and transport by the mean flow8.
Systematic errors such as those arising from sampling bias should be quantified when recognized and corrected when not negligible. If sampling bias cannot be avoided, unbiased estimates of statistical moments such as the average can be obtained via weighted calculation schemes. The Horvitz-Thompson estimator9 was designed for this purpose, and for dry air it can accurately approximate the average by weighting each sample by the number of (moles of) air molecules that it represents.
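A minimal sketch of such a weighted estimate for the two-temperature example (assuming, via the ideal gas law at uniform pressure, that the moles of air represented by each volume sample scale as 1/T; this is an illustration, not the published implementation):

```python
import random

K_WARM, K_COLD = 293.15, 253.15        # parcel temperatures, kelvin
P_WARM = K_WARM / (K_WARM + K_COLD)    # volume-weighted inclusion probability

random.seed(1)
temps = [K_WARM if random.random() < P_WARM else K_COLD
         for _ in range(1_000_000)]

# Horvitz-Thompson-style correction: weight each sample by the moles of air
# it represents.  At uniform pressure, molar density is P/(R*T), i.e.
# proportional to 1/T, so weights of 1/T recover the mass-weighted average.
weights = [1.0 / t for t in temps]
t_w = sum(w * t for w, t in zip(weights, temps)) / sum(weights)
print(f"weighted mean: {t_w - 273.15:+.3f} degC")  # ~0 degC, bias removed
```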
However, atmospheric composition varies, and the leverage exerted by a sample on the aggregate is determined not by its molecule count but by conservation laws. For example, in thermodynamics the determinant is the heat content: when we combine equal numbers of molecules of water vapour at one temperature and dry air at another, the temperature of the mixture is closer to that of the water vapour owing to its higher molar heat capacity, its smaller molar mass notwithstanding. Generally, any isolated system of air molecules that mixes internally conserves its mass, linear momentum, and energy, yielding final values of state and flow variables that correspond to their ensemble averages, which remain constant in the absence of external forcing. Therefore, weighting factors have been derived from mixing theory10 to negate the systematic errors that sampling bias otherwise induces.
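As an illustration of heat-content weighting (a sketch using approximate constant-pressure molar heat capacities, roughly 29.1 J mol⁻¹ K⁻¹ for dry air and 33.6 J mol⁻¹ K⁻¹ for water vapour; these constants are assumptions, not values from the text):

```python
CP_DRY = 29.1   # J/(mol K), approximate molar heat capacity of dry air
CP_VAP = 33.6   # J/(mol K), approximate molar heat capacity of water vapour

def mix_temperature(t_dry, t_vap, n_dry=1.0, n_vap=1.0):
    """Equilibrium temperature of an adiabatic mixture (kelvin in/out).

    Energy conservation weights each constituent by moles times molar heat
    capacity, not by molecule count alone.
    """
    return ((n_dry * CP_DRY * t_dry + n_vap * CP_VAP * t_vap)
            / (n_dry * CP_DRY + n_vap * CP_VAP))

# Equal mole counts: the mixture ends up closer to the vapour's temperature.
print(f"{mix_temperature(t_dry=280.0, t_vap=300.0):.2f} K")  # 290.72 > 290.00
```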
The effects of sampling bias span the full range of atmospheric scales. We have calculated the magnitude of sampling-bias-induced errors in estimates of global-scale warming, comparing trends in National Centers for Environmental Prediction (NCEP)-National Center for Atmospheric Research (NCAR) reanalysis11 temperatures averaged arithmetically (Ta) versus weighted (Tw) according to mixing theory (Fig. 1). The artefacts of bias towards the characteristics of warm air on Ta include (a) the overestimation of the global-average surface air temperature by ca. 0.4 °C; (b) the underestimation of the rate of temperature increase by about 9%, since the tropics have warmed less rapidly than the poles12; and (c) the underestimation of the variance explained by the linear trend. The difference between trends in Tw and Ta is of a magnitude similar to the differences between ensemble members of the HadCRUT4 data set for the period 1979-2010 (ref. 13). Thus, uncorrected sampling bias causes meaningful underestimation of both the magnitude and certainty of global warming. (Similar conclusions were reached using the fifth-generation European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data (ERA5)14,15; see Supplementary Information.)
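A schematic of the Ta-versus-Tw comparison for a gridded temperature field (assuming a regular latitude-longitude grid and near-uniform surface pressure, so that the air mass represented by each cell scales as area/T; this simplified weighting stands in for the full mixing-theory weights of ref. 10):

```python
import numpy as np

def global_means(temps_k, lats_deg):
    """Area-weighted arithmetic mean (Ta) vs mass-weighted mean (Tw).

    temps_k  : (nlat, nlon) array of temperatures in kelvin
    lats_deg : (nlat,) array of grid-cell centre latitudes in degrees
    """
    area = np.cos(np.deg2rad(lats_deg))[:, None] * np.ones_like(temps_k)
    t_a = np.average(temps_k, weights=area)            # arithmetic, Ta
    t_w = np.average(temps_k, weights=area / temps_k)  # ~mass-weighted, Tw
    return t_a, t_w

# Toy field with warm tropics and cold poles: Ta runs warmer than Tw.
lats = np.linspace(-87.5, 87.5, 72)
temps = 273.15 + 40.0 * np.cos(np.deg2rad(lats))[:, None] * np.ones((72, 144))
t_a, t_w = global_means(temps, lats)
print(f"Ta - Tw = {t_a - t_w:.3f} K")  # positive: warm-air sampling bias
```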
Unless unbiased sampling strategies can be devised, micro- to global-scale atmospheric science requires weighted calculation of averages and other statistical moments, in order to avoid systematic error.