Most root-derived carbon inputs do not contribute to the global bulk soil carbon pool

Plant root-derived carbon (C) inputs (Iroot) are the primary source of C in mineral bulk soil. However, a fraction of Iroot may be lost directly (Iloss; e.g., via rhizosphere microbial respiration, leaching and fauna feeding) without contributing to the bulk soil C pool. This loss has never been quantified, particularly at the global scale, which inhibits reliable estimation of soil C dynamics. Here we integrate three observational global datasets, covering radiocarbon content, allocation of photosynthetically assimilated C, and root biomass distribution, in 2,034 soil profiles to quantify Iroot and its contribution to the bulk soil C pool. We show that global average Iroot in the 0-200 cm soil profile is 3.5 Mg ha⁻¹ yr⁻¹, ~80% of which (i.e., Iloss) is lost rather than entering bulk soil. If Iloss is ignored, bulk soil C turnover will be incorrectly estimated to be four times faster. This can explain why Earth system models (in which all Iroot enters bulk soil C pools) predict much faster soil C turnover than radiocarbon-constrained estimates.

[...] (e.g., C isotopes) techniques [8] or by root growth measurements [12][13][14]. Numerous relevant data (Δ¹⁴Cdata) to calculate soil C ages [35][36][37] in 750 soil profiles across the globe (Extended Data [...]

Based on the Δ¹⁴C-derived Cage and the soil C stock (Cstock) measured together with Δ¹⁴C (Methods), we can estimate C inputs to bulk soil (Ibulk) in each soil layer of the Δ¹⁴Cdata profiles as Cstock/Cage under the steady-state assumption (Methods). Averaged across the 750 soil profiles, Ibulk in the 0-200 cm soil profile is 0.60 Mg ha⁻¹ yr⁻¹, ranging from 0.06 (2.5% CI) to 3.59 Mg ha⁻¹ yr⁻¹ (97.5% CI; Fig. 1a), and differs significantly among global biomes (Fig. 1a). Temperate forests have the highest Ibulk (1.20 Mg ha⁻¹ yr⁻¹), followed by boreal forests (0.77 Mg ha⁻¹ yr⁻¹); deserts have the lowest Ibulk (0.16 Mg ha⁻¹ yr⁻¹), followed by tundra (0.26 Mg ha⁻¹ yr⁻¹; Fig. 1a). The depth distribution of Ibulk is generally consistent with that of Iroot (Fig. 1b and d). On average, the results indicate 0.26, 0.06 and 0.03 Mg ha⁻¹ yr⁻¹ of Ibulk in the 0-20 cm, 20-40 cm and 40-60 cm soil layers, respectively (Fig. 1b), accounting for 68.9%, 14.1%, and 5.2% of total Ibulk, respectively (Fig. 1d). Across all biomes, Ibulk is significantly lower than Iroot (Fig. 1a). As expected, depth is the dominant control on Ibulk in different soil layers (Fig. 2). The multivariate linear mixed regression considering soil depth, edaphic, climatic and topographic variables, and biome type (Methods) explains 74% of the variance of Ibulk (Fig. 2). Compared with the effects of environmental factors on Iroot, their effects on Ibulk are more consistent among biome types. In addition, the three groups of environmental factors (i.e., edaphic, climatic and topographic) show similar overall importance for controlling Ibulk (Fig. 2).
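As a minimal sketch of the steady-state calculation described above, the snippet below computes Ibulk as Cstock/Cage for a few soil layers. It additionally assumes a single well-mixed pool under pre-bomb atmospheric radiocarbon, in which case the steady-state balance k·Ainput = (k + γ)·Asoil gives k = γ·Asoil/(Ainput - Asoil) and a mean C age of 1/k. This simplification, the function names and the layer values are illustrative assumptions, not the radiocarbon model or the data used in the study.

```python
GAMMA = 1.0 / 8267.0  # radioactive decay rate of 14C (per year), as given in Methods


def steady_state_decay_rate(a_soil, a_input, gamma=GAMMA):
    """Decay rate k (per year) of a single pool at radiocarbon steady state.

    Assumes k * a_input = (k + gamma) * a_soil, i.e. pre-bomb conditions
    with a constant atmospheric 14C/12C ratio (an illustrative assumption).
    """
    if not 0.0 < a_soil < a_input:
        raise ValueError("expected 0 < a_soil < a_input")
    return gamma * a_soil / (a_input - a_soil)


def carbon_age(a_soil, a_input, gamma=GAMMA):
    """Mean C age (years) of a one-pool steady-state system, 1 / k."""
    return 1.0 / steady_state_decay_rate(a_soil, a_input, gamma)


def bulk_inputs(c_stock, c_age):
    """Ibulk = Cstock / Cage (Mg ha^-1 yr^-1) under the steady-state assumption."""
    if c_age <= 0:
        raise ValueError("c_age must be positive")
    return c_stock / c_age


# Hypothetical layers: (label, Cstock in Mg ha^-1, Asoil / Ainput ratio)
layers = [("0-20 cm", 50.0, 0.98), ("20-40 cm", 30.0, 0.90), ("40-60 cm", 20.0, 0.80)]

for label, stock, ratio in layers:
    age = carbon_age(a_soil=ratio, a_input=1.0)
    print(f"{label}: Cage = {age:.0f} yr, Ibulk = {bulk_inputs(stock, age):.3f} Mg/ha/yr")
```

In this sketch, a lower Asoil/Ainput ratio (more depleted radiocarbon) yields an older Cage and therefore a smaller Ibulk for the same stock.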
Using the estimates of Iroot and Ibulk, we can calculate the difference between them (i.e., Iroot - Ibulk), which represents the root-derived C inputs (Iloss) that leave the soil via any direct loss pathway without entering the bulk soil and participating in the C cycle there. Considering potential uncertainties in NPP, and thus in Iroot and its allocation to soil depths, we conducted six independent estimates of Iloss based on four independent datasets (Methods, Extended Data Table 1) to provide confidence in the estimate of Iloss (Fig. 3 [...] Table 2 and Table 3). Across the globe, the six independent estimates of Iloss are generally comparable, with an average of 2.31 Mg ha⁻¹ yr⁻¹ in the 0-200 cm soil profile (Fig. 3a) [...] and provide spatial insights into prediction uncertainties.
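The difference described above can be illustrated with a short sketch. The function names and input values are hypothetical and chosen only for round numbers. The turnover comparison assumes that, at steady state, the turnover rate equals C input divided by C stock, so that treating all of Iroot as entering bulk soil inflates the apparent turnover rate by the factor Iroot/Ibulk.

```python
def direct_loss(i_root, i_bulk):
    """Iloss = Iroot - Ibulk (same units as the inputs)."""
    if i_bulk > i_root:
        raise ValueError("i_bulk should not exceed i_root")
    return i_root - i_bulk


def loss_fraction(i_root, i_bulk):
    """Share of root-derived inputs that never enters bulk soil."""
    if i_root <= 0:
        raise ValueError("i_root must be positive")
    return direct_loss(i_root, i_bulk) / i_root


def apparent_turnover_ratio(i_root, i_bulk):
    """Ratio of the turnover rate implied by Iroot to that implied by Ibulk.

    With a fixed stock, rate = input / stock, so the ratio is Iroot / Ibulk.
    """
    if i_bulk <= 0:
        raise ValueError("i_bulk must be positive")
    return i_root / i_bulk


# Hypothetical whole-profile values in Mg ha^-1 yr^-1 (not the study's data)
i_root, i_bulk = 3.0, 0.6
print(f"Iloss = {direct_loss(i_root, i_bulk):.2f}")
print(f"fraction lost = {loss_fraction(i_root, i_bulk):.0%}")
print(f"apparent turnover ratio = {apparent_turnover_ratio(i_root, i_bulk):.1f}")
```

With these illustrative inputs, 80% of Iroot is lost, and the turnover rate implied by Iroot is five times the rate implied by Ibulk.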

The connections and interactions between belowground and aboveground processes remain elusive and unquantified. One of the key issues is our poor understanding of the plant-soil interactions in which the rhizosphere may play a core mediating role [39]. Under elevated atmospheric CO2, for example, numerous field manipulation experiments have found enhanced photosynthetic C assimilation and thus stimulated C inputs to soil, but the soil C stock shows no or only a marginal response to such enhanced belowground C inputs and remains relatively stable [40,41]. The reason for this imbalance is widely debated. One proposed explanation is that microbial decomposition of organic matter in bulk soil is accelerated as microbes "mine" nutrients from soil organic matter in order to utilize the enhanced input of newly assimilated C, particularly in nutrient-poor environments. But this ignores the fact that plants also need additional nutrients to support their stimulated growth [4]. Our results shed light on another mechanism, i.e., [...]

At the global scale, our results demonstrate that most root-derived C inputs to soil do not directly contribute to the bulk soil C pool. That is, around 80% of root-derived C inputs leave the soil directly via various pathways potentially mediated by the rhizosphere. Here, we propose three such pathways (Fig. 5). First, the majority of C released by roots as exudates may be respired as CO2 by microbes in the rhizosphere [24] instead of moving to the C pool in bulk soil. Root-released C (in the form of exudates and root detritus) may mainly contribute to the bulk soil C pool via by-products (e.g., necromass) of microbes that utilize root-derived C in the rhizosphere. Indeed, there is evidence that microbial necromass accounts for >50% of soil organic C in temperate cropping, grassland and forest topsoils [44]. Second, soil fauna feed on roots and microbes, and thus remove a fraction of root-derived C [23,45]. Third, root-derived C may pass directly to other systems beyond the soil layer, such as bedrock or groundwater. For example, plant roots may penetrate to the bedrock [30], and thus the relevant root-derived C does not contribute to bulk soil C at all. It is also intriguing to note that the loss of root-derived C relative to inputs in each soil layer decreases with soil depth, except in the top 0-20 cm soil layer (Fig. 5). This may be due to decreasing activities of soil microbes and fauna with soil depth; in the top layer, bulk soil C inputs may include a marked amount of C from the soil surface [38].

Earth system models usually have a litter soil C pool with fast turnover rates [10,17], which may conceptually capture the direct C losses. However, the size and turnover rate of this pool have rarely been explicitly tested or verified. The spatially explicit maps of Iroot, Ibulk and Iloss provide benchmark global layers to force and verify these models and to explore the relevant implications for long-term C dynamics. The significant loss of root-derived C inputs provides insights into C sequestration by capturing the "leaked" C, such as through the application of biochar, which can stabilize new root-derived C by forming microaggregates via organo-mineral interactions [46]. We suggest that the oxidation of fixed C that occurs in the rhizosphere must be considered for reliable soil C predictions and land management to simulate C sequestration.

[...] and lower depths). In total, we used 3,128 unique measurements of Δ¹⁴C from 750 profiles (Extended Data Fig. 1).

Estimation of root-derived C inputs to soil (Iroot). Using the NPPdata dataset, belowground C allocation, i.e., root-derived C inputs to soil (Iroot), can be directly estimated as the belowground NPP (i.e., BNPP) recorded by NPPdata (Extended Data Fig. 1a). It is a significant challenge to determine the allocation of Iroot to different soil layer depths using current technologies. However, it is reasonable to assume that Iroot to a specific layer is proportional to root biomass in that layer. We can therefore infer C inputs to different soil layer depths by combining the depth distribution of root biomass with BNPP [15]. The depth distribution of root biomass based on Rootdata (Extended Data Fig. 1) was used to estimate the depth distribution of Iroot. To quantify the absolute Iroot to each soil layer, we ideally need to know BNPP at the corresponding NPPdata location, but such data are unavailable for Rootdata sites. Here, we used a machine [...] NPPdata sites to predict Iroot, which is further allocated to different soil layers according to the vertical distribution of Ibulk, assuming these two variables (i.e., depth distribution predicted by Rootdata and Δ¹⁴Cdata) follow a similar pattern of depth distribution (Fig. 1d; see the section on Machine learning models for Iroot and Ibulk).
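The proportional-allocation assumption described above can be sketched as follows. The function name and the numbers are hypothetical; the sketch simply distributes a profile-level BNPP across layers in proportion to each layer's root biomass and is not the study's fitted model.

```python
def allocate_inputs_by_root_biomass(bnpp, root_biomass_by_layer):
    """Split profile-level BNPP (Iroot) across layers in proportion to root biomass.

    bnpp: total root-derived C input for the profile (e.g. Mg ha^-1 yr^-1).
    root_biomass_by_layer: non-negative root biomass per layer (any consistent unit).
    Returns a list of per-layer inputs that sums to bnpp.
    """
    if any(b < 0 for b in root_biomass_by_layer):
        raise ValueError("root biomass cannot be negative")
    total = sum(root_biomass_by_layer)
    if total <= 0:
        raise ValueError("total root biomass must be positive")
    return [bnpp * b / total for b in root_biomass_by_layer]


# Hypothetical profile: three layers with decreasing root biomass
layer_labels = ["0-20 cm", "20-40 cm", "40-60 cm"]
root_biomass = [6.0, 3.0, 1.0]
inputs = allocate_inputs_by_root_biomass(bnpp=3.0, root_biomass_by_layer=root_biomass)

for label, value in zip(layer_labels, inputs):
    print(f"{label}: Iroot = {value:.2f}")
```

Because only the relative shares of root biomass matter, the biomass values can be in any unit as long as it is the same for every layer.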

Estimation of actual C inputs to bulk soil (Ibulk). Under the assumption of steady state, C inputs to bulk soil (Ibulk) can be estimated as bulk soil C stock divided by C age. Using the Δ¹⁴Cdata, a soil radiocarbon model [48] was adopted to estimate Cage: [...] where Ainput is the ¹⁴C/¹²C ratio of C inputs into bulk soil and is assumed to be equal to the ¹⁴C/¹²C ratio of the atmosphere, Asoil is the ¹⁴C/¹²C ratio of bulk soil C, k is the decay rate of bulk soil C, and γ is the β-decay rate of ¹⁴C, equal to 1/8267 per year. Before the nuclear testing in the 1950s and 1960s, the atmospheric ¹⁴C/¹²C ratio was relatively stable. Under the steady-state assumption, Asoil can be estimated by Equation (2), and with Asoil measurements at the steady state, k can be estimated by Equation (3) as follows: [...] If Asoil is measured after the nuclear testing, Equation (3) [...]

[...] Fig. 5). Because soil layers were treated as independent drivers in the model of Ibulk, we could directly predict the depth distribution of Ibulk. It is also important to highlight that the depth distribution of Ibulk is comparable with that of root biomass (Fig. 1d). This gave us confidence to use the Ibulk depth distribution to infer the Iroot depth distribution, [...]
Treating the observed BNPP (i.e., Iroot) and fBNPP (the fraction of BNPP in total NPP) data at the NPPdata sites as dependent variables, we fitted two machine learning-based models for Iroot and fBNPP, respectively. [...] (Extended Data Fig. 3) and fBNPP (Extended Data Fig. 9) compared with the other four algorithms (i.e., XGBoost, SVM, Cubist and MARS). Using these fitted machine learning models (ensemble models) and retrieving the predictor variables at other locations, we predicted Ibulk at Rootdata and NPPdata sites, and also calculated Iroot and fBNPP at Rootdata and Δ¹⁴Cdata locations.

Global mapping and prediction uncertainty. Using the fitted machine learning models, we mapped Iroot (Extended Data Fig. 4), Ibulk (Extended Data Fig. 5) and Iloss (i.e., Iroot - Ibulk; Extended Data Fig. 6) in the seven standard layers across the globe on a 30 arc-second (~1 km) grid. Here, we used the random forest model, rather than the ensemble model, [...]

[...] independent data sources and/or methods (Iloss1-6), which are described in Extended Data Table 1; b, depth distribution of Iloss; c, the probability density distribution of the percentage fraction [...]

[...] root-derived carbon inputs is lost directly via three potential pathways: rhizosphere microbial consumption, fauna feeding and leaching; and does not contribute to bulk soil carbon pool.