In this section, we first present the geolocator data used to illustrate our new approach (3.1). Since our methodology relies on identified stationary periods, we describe the steps followed to derive these stationary periods from activity data (3.2), before presenting our novel approach to estimate the geolocator position based on pressure data (3.3). Finally, we outline the methodology followed to estimate position from light data (3.4) to compare positions based on light and pressure respectively.

## 3.1 Geolocator data

In order to test our methodology with a diverse range of migrants, we use multi-sensor geolocator data from 16 tracks of nine species. The dataset includes four Afro-palearctic migrants, two Palearctic migrants and three Afro-tropical migrants (Fig. 1). These species exhibit a wide range of migration strategies in terms of distance covered (664–7192 km), migration duration (5-222 days and 39–474 hours of flight) and number of stopovers (9-151). They are all flapping migrants, mostly nocturnal with some whose flight extends into the day (e.g., Tawny Pipit, [24]). We included a fully mountainous species (Ring Ouzel, [25]) as well as a partly mountainous one (Eurasian Hoopoe in breeding season only) to illustrate the challenges of altitudinal movement with pressure and explore solutions to these challenges. The tracks also span over a wide range of latitudes in order to compare our results with light-based positioning, as well as different pressure systems (e.g., pressure tide at the Equator).

All individuals were equipped with a SOI-GDL3 PAM device weighing 1.3-1.5g [17]. The data of interest recorded are: (1) light measured every 5 minutes, (2) activity as the sum of the difference in acceleration on the z-axis of 32 measurements taken at 10Hz (~ 3 sec) every 5 minutes, and (3) atmospheric pressure (hPa) measured at 5-, 15- or 30-minutes intervals depending on the geolocator setting. The sensitivity of all sensors of all tags are tested under controlled conditions for three days before they are used in the field.

## 3.2 Identification of stationary periods from accelerometer data

We use the accelerometer data to determine stationary periods, namely a period during which a bird stays within the same location (at km scale). This is performed with a classification of the bird’s activity following

[18]. This classification consists of first classifying high and low activity data based on a K-mean clustering and then labelling continuous periods of high activity lasting at least 30 min as migratory flight. Because the pressure analysis requires high precision of classification, an iterative manual editing of the activity data is performed using TRAINSET [28]. More information on this manual editing can be found at https://raphaelnussbaumer.com/GeoPressureR/articles/labelling-tracks.

## 3.3 Creation of map of probable positions from pressure data

For each stationary period, we produce a probability distribution map of the position of the bird by applying a new approach which matches the timeseries of atmospheric pressure measured by the geolocator with ERA5 reanalysis data [29].

The atmospheric surface pressure is taken at the maximum available resolution of 0.25° x 1hr. To be able to compare the geolocator and reanalysis timeseries, we downscale the geolocator pressure to 1 hour resolution. Beforehand, we remove outliers with a moving median of +/- 1 datapoint (corresponding to 5, 15 or 30 minutes depending on the measurement resolution) for each stationary period.

The probability distribution map is estimated for each stationary period by combining two sets of information: (1) we apply a mask of possible locations for which the ground level elevation matches the pressure range measured by the geolocator (3.3.1) and (2) we quantify the likelihood of the mismatch between the temporal timeseries measured by the geolocator and the reanalysis data (3.3.2).

## 3.3.1 Filtering locations based on absolute pressure values

In this step, we leverage the relationship between pressure and elevation to filter out locations where the pressure recorded by the geolocator does not correspond to the ERA5 pressure at ground level.

The elevation of the reanalysis \({z}_{ERA5}\) can be computed based on the geopotential \({\Phi }\) (m2 s− 2) value provided by ERA5 with

$${z}_{ERA5}=\frac{1}{g}\frac{r{\Phi }}{r-{\Phi }},$$

where the gravity constant \(g\) (m s− 2) and the earth radius \(r\) (m) are adjusted for latitude [e.g., 30].

However, the spatial resolution of the reanalysis product is too coarse (0.25°~30km) and does not account for the possible variations of elevation (and pressure) within each grid cell. To circumvent this issue, we compute the minimum and maximum ground elevation which can be found in each grid cell based on the finer resolved digital elevation model, the SRTM-90m [31] (3arc-second ~ 90m).

To correctly estimate the altitude of the bird, we account for the known temporal variation of pressure from the ERA5 surface pressure. For each stationary period and grid cell, we compute the equivalent altitude, that is the bird’s altitude if the bird was located in this specific grid cell. The equivalent altitude \({z}_{gl}\) is computed based on the pressure measured by the geolocator \({P}_{gl}\) and the surface pressure of the reanalysis \({P}_{ERA5}\) with the barometric equation,

$${z}_{gl}={z}_{ERA5}+\frac{{T}_{0}}{{L}_{b}}{\left(\frac{{P}_{gl}}{{P}_{ERA5}}\right)}^{\frac{R{L}_{b}}{\text{g}M}-1},$$

where \({L}_{b}\)is the standard temperature lapse rate (-0.0065 K/m), \(R\) is the universal gas constant (8.31432 N*m/mol/K), \(g\) is gravity constant, \(M\) is the molar mass of Earth’s air (0.0289644 kg/mol) and \({T}_{0}\) is the standard temperature (273.15 + 15 K).

Then, assuming that the bird is always on the ground (i.e., between the minimum and maximum elevation of each grid cell), we mask all grid cells where the equivalent altitude \({z}_{gl}\) is either always below the minimum altitude or always above the maximum altitude during the entire length of the stationary period. In order to account for some errors and variability of the timeseries (e.g., due to temporal offset in tidal effect), we smooth the timeseries with a moving average of one day and accept elevation errors within a margin of +/-20m.

This first information extracted from pressure data thus produces a mask of possible grid cells where the bird could have been.

## 3.3.2 Temporal match of pressure values

The second information extracted from the pressure data quantifies how well the temporal variation of the geolocator’s pressure measurement matches the temporal variation of the reanalysis data.

For each stationary period, we quantify the difference between the geolocator \({P}_{gl}\) and reanalysis pressure \({P}_{ERA5}\) for each grid cell. However, as the influence of altitude is already verified in the previous sub-section, we remove the mean error of each stationary period,

$$\epsilon ={P}_{gl}-{P}_{ERA5}-\left(\stackrel{-}{{P}_{gl}-{P}_{ERA5}}\right).$$

We then assess the match by assuming a normal distribution of the errors,

$$f\propto \text{exp}\left(-w\left(n\right){\sum }_{i=1}^{n}\frac{{\epsilon }_{i}^{2}}{{\sigma }^{2}}\right),$$

where \(\sigma\) is the standard deviation of the error which was calibrated at the equipment and retrieval site (value ranging between 0.4 and 3) (see Fig. 1 in Additional file 1). This value accounts mainly for the altitudinal movement of a bird during a stationary period. Because some species live in mountainous areas only during their breeding period (e.g., Eurasian Hoopoe), we used a lower standard deviation during the rest of its journey than the one calibrated at the equipment site.

\(w\left(n\right)\) accounts for the correlation in the \(n\) datapoints of the pressure error timeseries \(\epsilon\) and is equivalent to the log-linear pooling weight in aggregation probability theory [e.g., 32,33]. As such, \(w=1\) assumes independence of the errors (i.e., conjunction of probabilities) and \(w=1/n\) (disjunction of probability). We empirically determine \(w\left(n\right)\) as

$$w\left(n\right)=\frac{\text{log}\left(n\right)}{n}.$$

We outline the rationale for using log-linear pooling and assess \(w\left(n\right)\) at the calibration periods at https://raphaelnussbaumer.com/GeoPressureR/articles/probability-aggregation.html.

For mountainous species, where extensive altitudinal change was found, we choose the most frequently occurring altitude level and manually discard the pressure at other altitudes (e.g., 20OE and 20OA in Additional file 2). If the bird changes altitude constantly, we discard the pressure for that entire stationary period (e.g., 20OA in Fig. 4). In both cases, we still use the pressure threshold mask without discarding any data. Because this methodology requires the pressure timeseries measured from a single elevation, we discard pressure datapoints when the bird is at a different elevation. This iterative labelling process is outlined in more detail and illustrated with examples at https://raphaelnussbaumer.com/GeoPressureR/articles/labelling-tracks. The resulting labelled pressure timeseries for all tracks can be visualized at https://raphaelnussbaumer.com/GeoPressureMAT/html/AllTracksPressureWithReanalysis and in Additional file 2.

Finally, we combine the mask of possible locations created in section 3.3.1 with the probability map based on the mismatch of pressure timeseries generated in section 3.3.2 to produce a probability map of possible bird locations for each stationary period.

## 3.4 Creation of map of probable positions from light data

We estimated the probability map of the bird’s position for each stationary period based on light measurements following the threshold method [9,10,15,34,35].

First, the twilight time is defined automatically as the first and the last light of each day recorded. A manual editing is performed to remove outliers with TRAINSET [28] based on a visual comparison of geolocator twilight datapoints and the theoretical twilight at the best match location based on light data for each stationary period.

Then, we calibrate the twilight based on the stationary period when the bird was located at the equipment and retrieval site. The calibration is performed directly on the zenith angle of the sun which corresponds to the angle of the sun at the known location of the bird, and at the time of twilight estimated on the geolocator data. We fit the distribution of zenith angle of all twilight of the calibration period with a kernel smoothing function [e.g., 36], allowing to account for any probability distribution of zenith angle (Fig. 3 in Additional file 1). Based on the distribution of zenith angle, we compute the probability distribution map of each twilight [e.g., 34].

Finally, the probability distribution maps of all twilights belonging to the same stationary period are aggregated into a single probability map. Because consecutive twilight estimates cannot be assumed to be independent due to the constant or correlated shading effects (e.g., topography, weather, vegetation), we use a log-linear pooling aggregator to combine the probability maps. Log-linear pooling relaxes the assumption of independence in the Bayesian aggregation [e.g., 32,33,37] by combining (conditional) probabilities as a weighted sum in log-space. A constant weight of 0.1 was empirically chosen.