1. Theoretical basis to estimate CUE from eddy covariance observations
We used eddy covariance observations of CO2 flux between ecosystems and the atmosphere provided in the standard FLUXNET2015 dataset 58, as well as the flux data acquired from Ameriflux and ICOS, processed by the FLUXNET2015 pipeline. The pipeline to produce the data includes the quality control of raw flux data and the partitioning of the net CO2 exchange into GPP and Reco. Our main analysis used the GPP and Reco values based on the daytime partitioning method (GPP_DT_REF and Reco_DT_REF), though we have also tested GPP and Reco from the nighttime partitioning method and got similar results (Fig. S2).
According to the definition of CUE, CUE = (GPP-Ra)/GPP = 1 – Ra/GPP. Between any two timestamps, we can obtain:
ΔRa = (1-CUE) × ΔGPP (Equation 1)
in which we assume CUE to be constant between the two timestamps. Meanwhile, we further expanded autotrophic respiration (Ra) to growth respiration (RG) and maintenance respiration (RM). RG is the cost of producing new biomass (G) and RM is the cost of maintaining the existing biomass (Wlive). Therefore, RG and RM can be further expanded to the product of their respective respiration coefficients (gR and mR) and biomass amounts (G and Wlive)17, leading to Equation 2 below:
Ra = RG+RM = gRG+mRWlive (Equation 2)
Substituting Equation 2 into Equation 1, the change in Ra between two timestamps can be described as ΔRa = gRΔG+mRΔWlive. The change in new biomass growth (ΔG) is proportional to the change in NPP between two timestamps (ΔNPP = ΔGPP × CUE). The changes in live biomass (ΔWlive) that needs maintenance are proportional to the accumulated NPP between the two timestamps ΔcNPP = ΔcGPP×CUE×(1-𝜏), where cNPP and cGPP are the cumulative NPP and GPP from day one of the year, respectively. 𝜏 represents the fraction of live biomass turned to non-respiratory biomass per unit time. Therefore, we can adjust the Equation 2 to:
ΔRa = gR×CUE×ΔGPP + mR×CUE×ΔcGPP×(1-𝜏) (Equation 3)
Note that in equation 3, we assume all ΔGPP × CUE between the two timestamps is used for growth, neglecting the carbon allocated for root exudation or symbiosis of mycorrhizal. This assumption would not affect the estimation of CUE, but may lead to an overestimation of gR if carbonallocated for non-growth purposes is significant between the two timestamps.
Since mR has a strong temperature dependence19 and changes substantially within a year, to facilitate the comparison between site-years we further include the temperature sensitivity of maintenance respiration (i.e., Ea in eV, the Arrhenius activation energy for respiration) into equation 3:
ΔRa = gR×CUE×ΔGPP + e(-Ea/KB)(1/T-1/T0) ×mR0×CUE×ΔcGPP× (1-𝜏) (Equation 4)
where mR0 refers to the baseline mR of the site-year and T0 (in K) is the temperature corresponding to mR0. T is the mean temperature of the temperature bin. KB = 8.617333262 × 10−5 eV K−1 is the Boltzmann constant.
We further used two criteria to organize daily flux data for the derivation of CUE - 1) we grouped flux observations using windows of 5 consecutive days in the growing season, and 2) we grouped flux observations by daily mean temperature (i.e., the difference in temperature) should be less than 1 °C and remove the days with rainfall greater than 2 mm/day to avoid sudden pulses of respiration after rewetting the soil (e.g. Birch effect) 59,60. The criteria are to ensure there are limited changes in soil organic matter (i.e., majority of the soil organic matter have turnover time longer than a week) and no changes in Rh under similar temperatures. Therefore, we can regard ΔRa = ΔReco in these data groups. A side effect of these criteria is relevant to the interpretation of the results. The criterion 1) assumes there are limited changes in the biomass of leaf and fine roots within the short time window (i.e., 5-day), therefore the estimated mR and 𝜏 mostly reflect the characteristics of live biomass in stems, branches and coarse roots where most biomass is stored 61.
2. The workflow to estimate CUE from eddy covariance observations
Following the theoretical basis above, we designed the following workflow to estimate CUE, gR, 𝜏 and Ea for each site-year (Fig. S3):
Step 1. Organize daily flux data in the growing season into groups of 5 consecutive days. Within each 5-day window, we obtained ΔGPP, ΔRa and ΔcGPP between every two days. For example, if in a 5-day window the daily GPP are 1.1, 1.2, 1.3, 1.4, 1.5 g/m2/day, then the ΔGPP between day 5 and day 1 would be 1.5 – 1.1 = 0.4, and the ΔcGPP between day 5 and day 1 would be (1.1+1.2+1.3+1.4+1.5) – 1.1 = 5.4.
For each time window, we got approximately 20 samples of ΔGPP, ΔRa and ΔcGPP. The growing season was defined as when daily GPP was above 20% of the maximum daily GPP of the site-year.
Step 2. Apply the data obtained in step 1 to equations 1 and 3, and estimateCUE, gR, 𝜏 and mR for each 5-day time window. In this step, since we have 20 samples and 4 unknown parameters, we used the Markov Chain Monte Carlo (MCMC) method 62 to estimate the optimal values of CUE, gR, 𝜏 and mR for each 5-day time window. When running MCMC, the prior of variables for CUE, gR, mR and 𝜏 were set within the ranges of (0, 1), (0.15, 0.3), (0.0, 0.5) and (0 1), respectively.
Step 3. Obtain mR0 and T0 for the site-year, where mR0 is the bottom ten percentile of the mR values of all 5-day windows in the site-year (from step 2). T0 is the mean temperature corresponding to mR0. We used the bottom ten percentile to avoid obtaining an extreme mR0 value which is not representative of the ecosystem.
Step 4. Obtain gR and 𝜏 for the site-year, where gR and 𝜏 are the average gR and 𝜏 of all 5-day windows (from step 2). The step aligns with the observations that gR has a small range based on the construction cost of key chemicals in plants and should have limited seasonality 42,63. There were no previous reports on the seasonality of 𝜏, therefore we assume that it is also a constant for each site-year.
Step 5. Organize daily flux data by temperature and remove the days with precipitation greater than 2 mm. Within each temperature bin (the range of temperature is less than 1K), we obtained ΔGPP, ΔRa and ΔcGPP between every two days. For each temperature bin, we got approximately 20-200 samples of ΔGPP, ΔRa and ΔcGPP.
Step 6. Apply the data obtained in step 5, the mR0 and T0 obtained in step 3, and the gR and 𝜏 obtained in step 4, to equations 1 and 4, and estimateCUE and Ea for each temperature bin. In this step, since we have 20-200 samples and 2 unknown parameters, we used MCMC again to estimate the optimal values of CUE and Ea for each bin. When running MCMC, the prior values for CUE and Ea were set within the ranges of (0, 1) and (0.2 1.5), respectively.
Step 7. Obtain CUE and Ea for the site-year, where CUE and Ea are the average CUE and Ea of all bins from step 6. We acknowledge that while our method provides seasonality of CUE and Ea, we only examined CUE and other parameters at the annual time scale in this study.In total, we obtained 2737 site-years of CUE record for our analysis. We also got the average CUE for each site for validation (Fig. 1b; Table S2).
3. Quantify relative CUE from paired basal area increment (pBAI)
In parallel to CUE, the carbon accumulation rate of individual trees is assessed by another metric - basal area increment (BAI). BAI is indicative of NPP, as NPP allocated to stem, NPPstem, can be estimated by scaling BAI with stem density (i.e., number of trees per unit area), wood density and tree height. Similar to CUE, many studies have examined BAI and its associated changes in biomass production with abiotic (i.e., temperature 64 and droughts 65) and biotic factors (i.e., stand age 66, tree size 67, mortality 68, forest types 69), however, they often had a regional focus and thus lack global implications. Following these studies, we have:
NPPstem = BAI × tree height × wood density × stem density (Equation 5)
Meanwhile, we estimated that the CUE for stem is CUEstem = NPPstem/GPP, and assume that the difference in CUEstem is indicativeof the difference in total CUE. To compare the relative difference between the CUE of deciduous trees (DB) and evergreen trees (EV) at the same locations (i.e., similar climate and thus GPP), we have:
CUEstem_DB/CUEstem_EV ~= BAIDB/BAIEV = pBAI (Equation 6)
where pBAI means the paired BAI of DB and EN trees at the same locations. In this study, to obtain pBAI, we used a clean and high-quality version of tree-ring width data from international tree-ring data bank (ITRDB) 26 and estimated BAI from tree-ring width using the functions in an open R package “dlpR”. We classified the trees into DB and EV, based on the species information compiled at https://www1.ncdc.noaa.gov/pub/data/paleo/templates/tree-species-code.txt (accessed in 2021, now archived at https://github.com/lxzswr/CUE_fluxnet/blob/main/tree-species-code.txt). Note the EV trees are mostly evergreen needleleaf trees in ITRDB while evergreen broadleaf trees in the tropics are underrepresented.
4. Ancillary climate, soil and model datasets
For site-level analysis, we used the daily meteorological observations from the eddy covariance datasets. The climate variables we used include air temperature, precipitation and photosynthetic active radiation. The stand age of flux sites is acquired from a previous study70 wherever there is a record. For sites lacking stand age data, we sourced the required information from the Global Forest Age Dataset (GFAD v1.1) 71. The stand age of non-forest biomes is assumed to be 1 in all attribution analyses. We extract the soil total nitrogen content (soil N content) for each flux site from Soilgrids at 250 m resolution 72.
We further used the TS4.04 version of monthly gridded air temperature, precipitation, and solar radiation data at 0.5° provided by the Climate Research Unit (CRU) 73, along with the aforementioned GFAD v1.1 stand age dataset and the soil N content from soilgrids, to estimate global CUE using a random forest model (see Section 5 of Methods). All gridded datasets were resampled to 0.5 degree. The gridded climate data were averaged to annual values, including mean annual temperature (MAT), mean annual precipitation (MAP) and mean annual photosynthetic active radiation (PAR).
Meanwhile, we studied the CUE estimated by 15 Dynamic Global Vegetation Models (DGVMs) participating in the TRENDY v9 project in our study 74,75. For each DGVM, we estimated their CUE as the ratio of mean annual NPP to mean annual GPP from 1980 to 2019 (Fig. S5). The simulations of NPP and GPP were conducted under the S3 scenario (i.e., considering elevated CO2, climate change, and land use and land cover changes).
5. Linear mixed effect model and random forest model
We used the linear mixed effect (LME) model 76 to quantify the sensitivities of CUE to abiotic and biotic factors. LME model suffers less from correlations of samples from similar groups and can quantify the impacts of grouping as the random effect term. This is especially useful in our study to understand the impact of PFTs on CUE, while obtaining the sensitivity of CUE to continuous variables such as climate and soil properties. We tested multiple LME models and selected the one that has the lowest Akaike information criterion (AIC) and Bayesian information criterion (BIC), and considers the major factors found in previous studies (Table S1). The final LME model to estimate CUE was based on MAT, MAP, PAR, stand age, and PFTs, where PFTs were set as a random effect, and the others were fixed effect. The analysis was conducted using the function ‘fitlme’ in MATLAB.
We then trained a random forest model using the 2737 records of CUEEC and the in-situ variables identified by the LME model as inputs. Subsequently, the random forest model was employed to estimate global CUE using gridded MAT, MAP, PAR, stand age, soil N and PFTs. External validation of the random forest model shows good performance in extrapolating CUE (R = 0.72, RMSE = 0.08). We chose the random forest model as it can utilize both continuous variables (i.e., climate) and categorical variables (i.e., PFT) as inputs, capture the nonlinear dependence of CUE on some variables, and demonstrate stronger statistical performance in predicting CUE compared to LME. The random forest model has been widely used for upscaling in ecology and is often regarded as one of the better performing models 77–79.