Field-based tree mortality observations constrain model-projected forest carbon sinks across continents

Considerable uncertainty and debate exist in projecting the future capacity of forests to sequester atmospheric CO 2 . Here we generate spatially explicit patterns of biomass loss by mortality (LOSS) from a dataset (n = 2676) of long-term (1951 to 2018), largely unmanaged forest plots to constrain projected (2015-2099) net primary productivity (NPP), heterotrophic respiration (HR) and net carbon sink in six dynamic global vegetation models (DGVMs) across North and South America, Africa, Asia, and Australia. This approach was motivated by the higher accuracy of field-derived LOSS estimates and the strong relationship among LOSS, NPP, and HR at continental or biome scales. The field observations showed higher values of LOSS in tropical regions (0.53 Kg m -2 y -1 ) than in North America (0.22 Kg m -2 y -1 ). The upscaled gridded LOSS map show hotspots in Southern Asia & Australia, Northwestern South America, and the western coast of North America. The DGVMs overestimated LOSS, particularly in tropical regions and eastern North America by as much as 0.5 Kg m 2- y 1 . The spread of DGVM-projected NPP and HR uncertainties was substantially reduced after constraining the simulations with the observed LOSS patterns. The observation-constrained models show a decrease in the tropical forest carbon sink by the end of the century, particularly across South America (from 2 to 1.3 PgC y 1 ), and an increase in the sink in North America (from 0.75 to 0.97 PgC y 1 ). These results suggest that forest demographic data can be used to constrain land carbon sink projections. c, d). These results suggest a weak linear


Main text
Forests are major drivers of biophysical land-atmosphere feedbacks, the global carbon and water cycles, and thus overall planetary climate regulation 1 -yet large uncertainties and substantial debate persist regarding ongoing and potential changes in the capacity of forests carbon sinks across continents or biomes (i.e., tropical vs boreal) under anticipated future climates 2,3 . It is often thought that increased net primary productivity (NPP) across spatial resource gradients or under elevated CO2 will increase carbon sinks 4,5 . The loss of CO2 to the atmosphere through ecosystem heterotrophic respiration (HR), however, has been observed to increase worldwide 6 and could lead to a trade-off or even outweigh the positive effects of NPP (i.e., at regional scales) 7 , thus potentially amplifying future climate change via enhanced rates of carbon emission to atmosphere 8,9 . Reducing uncertainties about the future forest carbon sink thus requires an integrated understanding of NPP and HR across space or time.
Recent studies have suggested that faster forest growth (NPP) leads to higher tree mortality at local and regional scales, particularly in tropical forests 10,11 . This has been recently demonstrated across spatial scales in boreal forests using tree-ring datasets 12 and across forest biomes in Earth system models (ESMs) 13 . While the fraction of biomass subjected to tree mortality is often treated as a constant proportion of standing stocks in ESM simulations 13 , this steady state representation of mortality and growth is not necessarily observed in forests responding to global change. Ongoing climate change can result in disproportionate mortality relative to growth, from direct physiological mortality from more extreme drought and heat events, insect outbreaks, windthrow, lightning, and wildfire 14,15 . In contrast, growth may be exceeding mortality in some forests due to increased atmospheric CO2 reducing drought stress 16 and/or nitrogen deposition enhancing growth 17 . At long-term and broad spatial scales, however, NPP, biomass loss from tree mortality (hereafter LOSS) and HR could be coupled with their strong positive relationships because of the linkage of aboveground and belowground processes 18,19 , while the instantaneous rates of NPP, LOSS and HR could be decoupled over short term or local scales 20,21 . Thus, data on LOSS which can be directly measured in groundbased forest plots with high certainty 13 and linked or correlated with NPP and HR, may provide an unique constrain on the forest carbon sink in a future changing climate.
Uncertainty in projections of forest carbon cycling could be reduced through an emergent constraint (EC) approach by identifying heuristic relationships between a multi-model ensemble and an observational estimate. This approach has been used to constrain the projected land carbon storage 22 , or photosynthesis/GPP [23][24][25] , with the assumption that the processes driving the long-term response are also driving the historical (short-term) patterns 22,23 . Interpreting the results of emergent constraints needs caution in the confirmation of the verified mechanisms 26,27 and the mismatch of spatial and temporal scales between data and models 25,26 . Earlier studies usually used atmospheric variables (i.e., CO2 concentrations) to constrain the carbon storage and photosynthesis at regional to global scales 22,23 . Using spatially explicit observational products (i.e., GPP, evapotranspiration and leaf area index), studies have emergently constrained the future terrestrial carbon cycling projections at grid scale and then spatially aggregated to broader spatial scales 24,28 . Recent studies highlight the potential of integrating emergent constraints at lumped broad spatial scales and machine learning to generate the spatially explicit constraint of projected gross primary production (GPP) by accounting for the non-linear relationships between GPP and environmental drivers 29 . However, to date it remains unclear whether the increased availability of ground-based datasets (forest plot data) could better constrain the projected forest carbon sink at broad spatial scales in a future climate. Because observational uncertainty has more influence than model ensemble uncertainty in EC 26 , a LOSS constraint based on forest plot datasets is urgently needed and expected to largely reduce the uncertainties in projections of future forest carbon sinks and associated feedbacks to climate.
Here we generate spatially explicit patterns of LOSS from long-term ( and higher mortality [10][11][12][13] and the couplings of growth, mortality and respiration at long-term and broad spatial scales 18,19 . We used LOSS because: 1) it can be directly measured in forest inventory datasets 13 with higher accuracy than input fluxes such as growth or woody NPP (see Methods). Indeed, quantification of growth includes components of recruitment of new trees and growth of surviving trees 13 , and differing thresholds of diameter at breast height as a criterion to determine the recruitment of new trees could lead to large uncertainties in estimates of growth across forest plots 30 ; and 2) the accuracy of constraining the projected carbon cycle in DGVMs largely depends on the observational uncertainty 26 . We first used a random forest to upscale spatial variations of LOSS with 58 environmental variables (see Supplemental Table 1 40,41 . This pattern also support the existence of a spatial tradeoff between faster growth and higher mortality because of resource limitations or younger death, whereby competition plays the fundamental role 13,41 . In contrast to other studies 15,42 , forest age (available in boreal and temperate forests in North America) was not a good predictor of LOSS ( Supplementary Fig. 9), likely because of our focus on mature and old-growth forests (i.e., age > 80 and 100 years in boreal and temperate forests, respectively).
We then compared the observed patterns of LOSS with those simulated by six state-of-theart DGVMs in which tree mortality and LOSS were explicitly simulated 31  Methods). We found that the emergent constraint approach worked well in North America, where the relationship between historical LOSS and projected NPP and HR was significant (the scenario of using original plot data of LOSS: R 2 = 0.68 and P = 0.04 for grid-level NPP; R 2 = 0.97 and P = 0.0001 for grid-level HR; the scenario of using map of LOSS at continent scale: R 2 = 0.7 and P = 0.04 for grid-level NPP; R 2 = 0.95 and P = 0.0008 for grid-level HR) (Supplementary Fig. 11a; Supplementary Fig. 12a). This emergent constraint approach was less effective for other continents, where tropical forests are predominant (all P > 0.05; Supplementary Fig. 11b, c, d; Supplementary Fig. 12b, c, d). These results suggest a weak linear relationships when observations are lumped at broad continental scales, thus highlighting the importance of spatial scale in this emergent constraint 26 . We interpret the result that this LOSS emergent constraint works better in North America than in the tropical forests, by a better representation of forest plot distribution and couplings of LOSS and NPP and HR across space in North America.
To overcome this limitation, we trained a machine learning algorithm 29 13). After including the LOSShis (derived from LOSS) in the machine learning algorithm, we were able to generate spatially-explicit constrained estimates 29 of projected NPP and HR, and then compare them with the scenario without the constraint (Supplementary Fig. 14;   Supplementary Fig. 15). These patterns essentially show a lower NPPpro or HRpro in locations of overestimated LOSShis in DGVMs, consistent with the positive relationship between LOSShis and NPPpro or HRpro (Supplementary Fig. 3).
Our results show that DGVMs overestimate tree mortality, particularly in tropical regions (Figs. 2c, 2e). Thus, if modeled mortality is over-estimated, we expect that NPP is over-estimated as well. Our constraint approach by LOSS reduces this common bias of models and decreases projected NPP down to 7.4, 2.2, 1.9 Pg C y -1 in South America, Africa and Asia & Australia, compared to original NPP values of 9, 2.4, 2.3 Pg C y -1 (Fig. 4a). The reason for this is that NPP or growth is strongly positively correlated with LOSS across space in both inventory data and DGVMs (Supplementary Figs. 2,3; Supplementary Fig. 16). The constant mortality used in most models may be too large if modelers have tuned this parameter to obtain reasonable biomass stocks, thus compensating for an overestimate of NPP in absence of modeled competition between individuals and nutrients (e.g. phosphorus) limitations in tropical forests 13 .
Likewise, HRpro showed similar patterns with NPPpro because of coupling of HR and NPP and LOSS at broad spatial and long term scales 18,19 , despite the likely decoupling of the instantaneous rate of HR and NPP and LOSS at local and short-term scales 20,21 . Thus, we also constrained a decrease in projected grid-level HR with values of 6.2, 1.8, 1.6 Pg C y -1 in South America, Africa and Asia & Australia compared to 7, 1.9, 1.8 Pg C y -1 in the original model ensemble (Fig. 4b). Taken together, our results constrain a weaker future tropical forest carbon sink from observation-based LOSS estimates down to 1.3, 0.4, 0.3 Pg C y -1 in South America, Africa and Asia & Australia as compared to 2, 0.5, 0.5 Pg C y -1 in the original models. The sink is reduced in particular over the Amazon basin, while North America showed an enhanced future carbon sink (0.97 and 0.75 Pg C y -1 after and before constraint, respectively). The constraint by the machine learning approach substantially similarly reduced the model spread in grid-level NPPpro and HRpro across continents ( Fig. 4; Table 1).
Our results were robust to the inclusion of projected climate (temperature and precipitation) across space in the machine learning algorithm (Supplementary Fig. 17), while we note that our approach does not account for the slow varying effects of temporal climate change and atmospheric CO2 concentrations 22,23,25 . Indeed, our study focuses on the carbon flux -NPP and HR and carbon sink averaged over the long-term projected future (2015-2099) in mature forests across continents. The mechanistic basis underlying this approach is the observed pattern of faster growth and higher mortality and thus coupling of growth, mortality, and respiration averaged over long-term and broad spatial scales, which have been demonstrated in forest inventory datasets 11,43 , tree rings 12 , eddy flux towers 18 and Earth System models 13 and DGVMs here. Our approach assumes that this coupling holds both in the short-term historical period and will hold in the future, a space for time approximation allowing us to use observed historical LOSS to constrain the future projected NPP and HR carbon fluxes. Models typically have a simple representation of tree mortality processes, as a fraction of standing stock 13 , but increasingly more detailed global datasets of tree biomass and demography will also make it possible to test more realistic simulations of competition to access to light, water and nutrients in the next generation of models 31,44 . The finite resources and vegetation carrying capacity govern the tradeoffs between growth and mortality and respiration across space or time. Our results indicate that the projected increase in tropical forest productivity may not be as great as previously thought, especially over the Amazon, which may offset projected increases in boreal and temperate forest productivity, thereby reducing the carbon sink potential of global forests.

Forest plot datasets
Forest plot data used in this study met the following criteria 13 : (1) all plots had at least three consecutive censuses and long-term (>9 years) observations between the first and last census so that LOSS averaged over all censuses was representative of the historic forest status; (2) plots were natural, unmanaged forest stands that have not been disturbed by fires, harvesting, and other human activities; (3) the plots were largely mature or old-growth forests and were screened by criteria such as forest age or forest gymnosperm fraction (see Supplementary Information for details); (4) the plots were in a quasi-steady state and were screened by criteria that plots with growth more than 3 times of LOSS or LOSS more than 3 times of growth were excluded -such large differences (i.e., more than 3 times) between LOSS and NPP are likely due to disturbances such as fires that are not well-represented yet in DGVMs. Thus this criterion is thus used to further select natural mature and old-growth forest stands; and (5) the plots with low values of biomass (i.e., < 3 Kg m -2 ) were excluded because they are not fully stocked and thus not likely to be mature forests. Ultimately, we compiled a broad-scale (n = 2676) and long-term (1951 to 2018) dataset of largely unmanaged forest plots in a quasi-steady state, distributed in Canada, USA, Amazon, Africa and Asia & Australia (Supplementary Fig. 1). To be comparable with DGVMs, the aboveground woody biomass loss was converted to total woody biomass loss including belowground roots using the root-shoot biomass ratio product 45 . More details for the criterion of plots selected, plot establishment, and measurements are described in Supplementary Methods. We stress the advantage of using LOSS rather than growth or woody NPP to constrain the projected carbon sink because of different thresholds of a diameter at breast height to record the new recruitments to quantify growth across forest plots 13,30 . For instance, the criteria of 12. We used these six DGVMs because they explicitly reported tree mortality. To be comparable with forest plot data, we used the components of tree mortality flux without disturbance, such as fires (Supplementary Table 2). The historic mortality flux was averaged over 1961-2014 forced by CRU-NCEP v5 climate data and the projected NPP and HR were averaged over 2015-2099 forced by bias-corrected IPSL-CM5A-LR RCP 8.5 simulated climate data. Because the output of HR is reported at grid scale and it is not able to be decomposed into tree-level HR and non-tree level HR, we used grid scale NPP to estimate forest carbon sink, quantified as the difference of grid-level NPP and grid-level HR. To be consistent with forest inventory data, outputs of historical mortality at tree-level were used to compare with observations. All outputs were resampled and analyzed at 0.5 degree.

Geospatial modeling and environmental drivers
We used the Random Forest machine learning algorithm 46 Fig. 18a). Random Forest was able to predict the variation in LOSS with high predictive accuracy (R 2 = 0.48 in 10-fold cross validation; R 2 = 0.93 in final model; Supplementary Fig. 18b). Two types of bootstrapping were used to evaluate the uncertainty (standard deviation as a fraction of the mean predicted value) in the map of LOSS. One was based on a stratified bootstrapping (100 iterations) procedure 47 where is 'projected', ℎ is 'historic', is the index of model, and are coefficients.
NPP and HR were aggregated as sum within each continent and LOSS were aggregated as average across forest plot sites or within each continent ( Supplementary Fig. 4). In details, we used LOSS at forest-plot local scale to constrain the projected NPP and HR at continental scale in DGVMs and found its limited feasibility in the classic emergent constraint approach because of spatial mismatch between data and model 26 . This motivated us to generate a spatially explicit map of LOSS derived from the machine learning algorithm (see Methods) to constrain the projected NPP and HR at continental scale in DGVMs.
Alternatively, we used the historical (1961-2014) LOSS as the predictor to train a random forest model and constrain the projected (2015-2099) NPP, HR and forest carbon sink: where equation 2 is the scenario without the projected climate effects and equation 3 is the scenario with the projected climate effects. Climate effects were incorporated to account for their potential influence on the couplings or relationships between historical LOSS and projected NPP and HR across environmental (climate) gradients. CO2 was not included in eqn (3) because of the minimal spatial heterogeneity on annual timescales. We clarified that all of these variables were averages over the historical or future periods and our study thus focused on the spatial patterns of variables of interests in the quasi-steady state.
We stress that the random forest model allowed for inclusion of all grid values of variables of interests and consideration of non-linearity. This was achieved with 1000 trees and 10 maximum tree depth and 80% of the data for training purpose and the rest 20% for validation.
With this approach, we surrogated the predictive relationship between the historical (1961-2014) LOSS and the projected (2015-2099) NPP and HR from a complex but physically driven DGVMs with an empirical machine learning model (random forest). Then, we feed the observed maps of LOSS derived from forest plots datasets into the trained random forest model to assess the impacts of historic LOSS on the projected NPP and HR. In this way, we were able to generate the spatially explicit 29 constraint of the projected NPP and HR, which was subsequently aggregated as sum within each continent to assess the forest carbon sink at the continental scale.
The mechanistic underpinning justifying the linear (emergent) and non-linear (machine learning) constrain was the theory of faster growth and higher mortality 12,13,43 and thus coupling of growth, mortality and respiration averaged over long term and broad spatial scales 18,19 .