Snow and landscape freeze-thaw information was obtained by fusing two different datasets the GlobSnow v3.0 climate data record (CDR)15,34, and the landscape freeze-thaw Earth system data record (FT-ESDR)16,35. The two products were combined on a fixed equal-area terrestrial grid (EASE Grid of 625 km2 grid cells), with one of five different daily landscape status values determined for each grid cell: thawed snow-free, frozen snow-free, thawed soil during fall with (dry) snow cover, frozen soil with dry snow cover and melting snow cover (soil frozen or top-soil thawed)28. For the analyses here, this classification was simplified to two classes: (1) thawed conditions and (2) frozen soil and/or snow cover. Annual summer maximum NDVI was extracted for evergreen forested areas of each investigated EASE Grid cell from the AVHRR NDVI data at 0.05 spatial resolution29. In practice, the maximum NDVI for the period 20 June - 31 July was extracted from ca. 25 AVHRR pixels covering each EASE Grid cell with a forest coverage fraction higher than 50 %. The applied eddy covariance (EC) flux station dataset includes daily averaged observations from 16 circumpolar sites20 that predominantly include mature forest stands (Extended Data Table 1). They provide altogether 114 annual time series of NEE from which GPP and ER were partitioned using established methods36. Mean annual NEE, GPP and ER [gC m-2 y-1] were calculated for all 114 flux observation time series. Additional datasets used for comparison include estimates from FLUXCOM24, JSBACH simulations30 driven by CRU-JRA dataset31, and TRENDY DLEM-S333. These datasets were interpolated to the same EASE Grid, and annual cumulative values were calculated for each dataset.

The basis of the developed approach is the significant correlation between the length of annual/seasonal thawed or frozen conditions and the cumulative ecosystem respiration and gross primary production (Extended Data Table 2). A regression formula for describing annual GPP as a function of the timing of spring snow melt (snow clearance from the landscape) and NDVI was obtained by analysing satellite retrievals against each of the 114 mean annual values of GPP from all EC flux stations. A formula applying a latitudinal correction to NDVI showed the best performance:

where *GPP*est [gC m-2 y-1] is the estimated mean annual GPP, is the number of days with thawed conditions during the first half of the year (of 183 days) based on satellite data, *NDVI*max is the annual maximum value for the grid cell (EASE Grid) from AVHRR data, *LAT* is the latitude. Regression coefficients *a*2 (7.873), *a*1 (2.042 103) and *a*0 (-837.8) were obtained by least-squares fitting the formula to the training dataset of 114 flux data-based mean annual values of GPP.

To facilitate the determination of error bounds for ER estimates the regression formula for mean annual ER is presented as a function of estimated GPP (note that GPP and ER are highly correlated):

where *ER*est [gC m-2 y-1] is the estimated mean annual ER, *annual-thaw* refers to the number of thawed, snow-free days during a single year, and *b*2 (-0.2026), *b*1 (15.42) and *b*0 (-423.2) are regression coefficients obtained by fitting the model to training data of mean annual ER. Equation (1) includes two predictor variables estimated by fitting the satellite data retrievals to flux data-derived values, whereas Equation (2) effectively includes three predictor variables as Equation (1) with the estimated parameters *a*2, *a*1 and *a*0 is inserted to Equation (2) for estimating the mean annual ER. The product of the length of annual thaw and the latitude-corrected *NDVI*max is used in Equation (2) as the second predictor variable since it provided the highest performance among investigated combinations.

For the mapping of circumpolar values of mean annual GPP and ER Equations (1) and (2) were applied to all grid cells representing evergreen boreal forests using a criterion that the grid cell of 625 km2 must have at least a 30 % fraction of conifer evergreen forest (larch dominated Siberian forests are excluded). Additionally, the analysis was only carried out for grid cells for which at least 80 % of winters have over 59 snow days during the first half of the year. This criterion makes the investigated forested area slightly smaller than previous studies7. The forest information was extracted from the ESA GlobCover and ESA Climate Change Initiative Land Cover dataset37. Circumpolar, North American, and European scale estimates of mean annual NEE were determined by subtracting the regional average of GPP for the regional value of mean annual ER. Since NEE is the small difference between ER and GPP and random error in estimating ER and GPP is relatively high, the spatial pattern of NEE cannot be reliably estimated for small regions (for large, continental-scale regions random estimation error is averaged out, which was confirmed by investigating the systematic error through the leave-one-out analysis).

Error bounds for the obtained satellite-derived estimates were determined through leave-one-out analysis (cross-validation). In this method, each flux station or a group of nearby flux stations is left out from the training of the regression algorithm one by one. The obtained regression algorithm is tested for the station left out from the training dataset. When this procedure is repeated for all stations, the outcome is the estimated random and systematic error (bias), see Extended Data Fig. 2. The obtained results indicate only a small bias both for the mean annual GPP (7.7 gC m-2 y-1) and ER (-20.9 gC m-2 y-1), which suggests that the satellite data analysis supported by the available flux station dataset yields spatially distributed estimates that have only a small systematic error. The maximum and minimum error bounds were calculated by applying the set of algorithms obtained from the leave-one-out analysis (Figs 2 and 3, Table 1). Altogether 10 algorithms are used to estimate the uncertainty range of GPP as flux stations from four regions were grouped together in the leave-one-out analysis (Estonia, northern Sweden, Russia, and Northwest Canada). For ER, the number of algorithms is 7 10 as the formula for estimating ER includes 10 GPP algorithms and the leave-one-out training is carried out using in situ ER values for 7 stations excluding the southern stations of Estonia, Hyytiälä (southern Finland) and Norunda (central Sweden), see Extended Data Fig. 2. However, Hyytiälä and Norunda were included in the testing of obtained ER algorithms, whereas Estonia south of the actual boreal zone was not included (Extended Data Fig. 2b).

The increasing circumpolar trends of mean annual GPP and ER during the period 1981-2018 are statistically significant with the criterion P-value < 0.05 for all regression algorithms obtained by the leave-one-out analysis (Fig. 2b and 2c). Further, the decreasing period of annual GPP from 2010 to 2018 (Fig. 2b) is also statistically significant (P-value of trend line = 0.0179 for the regression algorithm that shows the lowest statistical performance out of the ten GPP algorithms obtained by the leave-one-out analysis).

The validity of the developed approach can be further investigated by analysing the temporal behaviour of GPP and ER estimation error at the locations of circumpolar flux towers (Extended Data Fig. 4b). The difference between flux data-derived values and satellite estimates indicates that for both Eurasia and North America the overall bias does not show any significant trend during the period of flux observations. Flux data for calculating the mean annual values of GPP and ER, and incorporating several stations, is available for the period 2000-2018. Even though the available flux data do not extend to earlier years, the drivers of the developed satellite proxy explain the possible small decrease in GPP and ER through 1981-1990. The temporal dynamics of the satellite proxy are predominantly driven by the estimated lengths of thawed conditions. The pixel-wise median effect of springtime (6-month) thaw length variability to the estimated mean annual GPP is 368 gC m-2 y-1 during the period of 38 years, whereas the median dynamic effect of the latitudinally corrected NDVI driver is 279 gC m-2 y-1.

The MODIS GPP product38 can be also employed to provide an independent validation dataset. The comparison of MODIS GPP with the developed satellite proxy at the locations of flux stations through the period 2002-2018 indicates a relatively good agreement (r = 0.77, Extended Data Fig. 6). Based on comparison with the employed eddy covariance flux dataset, the MODIS product appears to underestimate the mean annual GPP for values above 1200 gC m-2 y-1 (nevertheless, the overall correlation is as high as r = 0.82 between the flux data-derived mean annual GPP and that derived from MODIS observations). Additionally, the comparison of the employed summer maximum NDVI with the summer maximum value of solar-induced chlorophyll fluorescence (SIF) obtained from a merged time series of Global Ozone Monitoring Experiment 2 (GOME-2) and Sentinel-5P TROPOMI satellite observations39,40 indicates that the NDVI information is highly correlated with the magnitude of boreal forest vegetation photosynthesis (Extended Data Fig. 1). Thus, the summer maximum NDVI explains well the spatio-temporal variability of peak vegetation productivity.