A generalizable framework for spatially explicit exploration of soil carbon sequestration on global marginal land

Designing land-based climate stabilisation strategies demands changes in land use management or new areas to grow biomass sustainably, without reducing food supply or damaging natural ecosystems. Here we integrated, into a generalizable framework, the mapping of a set of biophysical (climatic and edaphic) and land conservation constraints to identify the pairing of target areas with suitable plant species, and quantify their contributions to long-term soil carbon sequestration, including losses from rain-driven erosion. The proposed framework represents a refinement to previous global mapping exercises, which seldom consider pedo-climatic constraints, plant species tolerances and world protected areas. We demonstrated the applicability of the framework through a global case study by mapping target areas featuring low soil organic carbon stocks (≤50 t SOC ha -1 ) and producing quantifiable soil carbon sequestration potentials of best matches. Preliminary target areas were mapped at a 30 arc-sec resolution, and then consolidated per geographical and environmental boundaries (geopolitical world regions x global ecological zones). soil by canopy (expressed as the cover management factor in the erosion model), lead to these effects. From the resulting best target area x biopump pairs, one third of the results showed negligible soil C losses by erosion (up to 8% of final SOC values before erosion) located in European Russia (Boreal mountain system, Temperate mountain system and Boreal coniferous forest), Western Europe (Temperate mountain system and Temperate continental forest) and Northern (Temperate continental forest).


Introduction
The global top 2 m of soil feature a soil organic carbon (SOC) debt of 133 Gt 1 , which has increased in the past two centuries due to increasing agricultural land use. This debt (i.e. the difference between the original pre-agriculture and the current amounts of SOC in exploited soils) demonstrates a strong correlation between land degradation and SOC losses, dependent on the degree of intensity and duration of soil exploitation. It has been recognised that the regions featuring historic SOC losses can now be considered as SOC sinks 2,3 , with a potential to offset two thirds of today's SOC debt in the near-term 1 .
The 4 per 1000 initiative (www.4p1000.org), launched at the COP21 Paris Climate Summit in 2015, is a multi-stakeholder international initiative aiming at increasing SOC sequestration through sustainable land management 4 . It is based on the premise that an annual mean SOC increase of 0.4% in the global agricultural topsoil (30 cm) would contribute to an annual global sequestration of 2.5 Gt C 5 -an estimation subject to criticism, due to intrinsic data and model uncertainty 6 , yet representing a clear goal towards climate stabilisation. Soil carbon sequestration (SCS) thus constitutes a key element of negative emission technologies and practices that result in the net removal of greenhouse gases from the atmosphere" 7 . SCS is increasingly seen as a key land-based mechanism relying on plant photosynthesis to stabilise the climate 8 , constituting both a mitigation measure (for climate change but also for ecosystems quality) and a way to induce additional negative emissions, as required by the Paris Agreement, to limit warming below 2⁰C 9 . Biophysical potentials for SCS have been variously estimated at 0.4-8.6 Gt CO2 yr -1 10 , while its technical feasibility may range as much as between 1-5 Gt CO2 yr -1 11 -reliant on local land use type (e.g. cropland), management (including external inputs) and pedoclimatic conditions, among other factors. Moreover, exploited soils featuring SOC stocks <30 t C ha -1 are expected to be able to attain high SCS after introducing specific management practices 5 .
The bulk of agriculture's contribution to SCS is influenced by land use management (e.g. addition of organic matter via organic amendments and fertilisers, deployment of improved crop rotations and cover crops), including the cultivation of specific crop types such as perennial and deep rooting species 12,13 . Perennial species have received special attention, as they generally require less soil work, enlarge the C fraction in the soil micro-aggregates, and increase belowground C allocation 14 . In contrast to annually harvested crops, species with long life spans accumulate C over several years to decades in above-and belowground biomass, representing an organic C stock 8 . The extent to which crops in general contribute to SCS is subject to extensive research 15 .
The cultivation of dedicated biomass enabling SCS and eventually providing feedstock for various economic pathways is intrinsically connected with land demand, and thus with the availability of new areas to grow plants sustainably; i.e. with no negative effects (e.g. on the carbon debt, on food security, on ecosystem services, on biodiversity quality) 16 . In the past decade, research has focused on identifying land unsuitable for food production but potentially suitable for non-food crops (e.g. bioenergy, biomaterials), defined as marginal land 17 . Marginal land could, depending on its capacity to sustain biomass, be used for biomass cultivation aiming at either increasing SCS or providing feedstock for the bioeconomy. The expected benefits of exploiting marginal land are wide, ranging from soil quality improvements (e.g. soil fertility, soil structural stability) 18 to carbon sequestration (e.g. 30-50% of C in cropland mineral soils has been lost due to degradation 11,13 ), through biodiversity conservation 19 and eventual socio-economic development (e.g. employment, infrastructure, rural poverty alleviation) 20 .
Key types of land classifiable as marginal include degraded and abandoned agricultural lands. Degraded land, which may include soils naturally characterised by low productivity (e.g. natural high salinity soils, or heathlands such as the Mediterranean garrigue), has received considerable attention, as a key constituency of marginal land 21 . In particular, research led by the International Soil Reference and Information Centre (ISRIC), throughout various projects such as GLASOD 22 , GLADA 23 and LADA 24 , applied the use of a remotely sensed global normalised difference vegetation index (NDVI) as a proxy for land degradation due to different causes 25 . Among the dominant typologies of degraded land, the following FAO classificationreferred to as the FAO agricultural problem land approach 26 , associated with the FAO/UNESCO Digital Soil Map of the World 27 , now integrated into and superseded by the Harmonized World Soils Database (HWSD) 28 and into FAO's Global Agro-Ecological Zones (GAEZ) 29 -, represents a synthesis of criteria: too cold (polar/boreal), alluvial soil in deserts, too dry, steep lands (dominant slope >30%), shallow lands, poorly drained, coarse texture, vertisols, infertile (e.g. nutrient-poor), saline/sodic, acid sulphate, and peats (organic soils). For abandoned agricultural land, another key constituency of marginal land, extensive research has proposed different definitions and mapping approaches [30][31][32] . Agricultural land abandonment has been considered to be mainly driven by biophysical constraints, but also by reasons pertaining to farm structure, agricultural viability, as well as to changing population, political regimes, nature conservation and other regional contexts 31,33 . These approaches generically consist of comparing satellite data corresponding to two different periods, and interpreting the land cover differences.
Several studies attempted to quantify and map marginal land use across spatial (global, national, regional and local data) and temporal (historic to current data) resolutions 34 . At the global scale, it has been quantified by mapping current land cover and land suitability indices 35 . However, it has not always been possible to identify the relative importance of agricultural (i.e. abandoned) and non-agricultural land types on the identification of marginal land, because of a combination of factors: analyses based exclusively on (bio)economic drivers, quality of underlying datasets and scales, and lack of detail on current land covers' spatial (i.e. granularity) and temporal resolutions 16 . Beyond the challenge of defining and identifying marginal lands, the design of SCS strategies involving biomass demands the correct pairing of suitable plant species with specific target areas, based on the compatibility of plant species with the target areas' biophysical characteristics. Such endeavour is not negligible, as demonstrated by the body of research evidence, as recently reviewed at the global scale 16 .
A recently proposed strategy to facilitate global scale soil climate mitigation 11 , suggests that detailed maps of C sequestration potentials including erosion, associated with simple analytical tools, would contribute to supporting the global implementation of SCS. One of the first examples of such an approach for quantifying global SOC dynamics is that by Morais et al. 36 , which was applied to spatially differentiated unique homogenous territorial units characterised in three main land use classes, namely cropland, grassland and forest land. Following the hypothesis that global soils with low initial SOC stocks feature high SCS potentials, we chose to explore and map global SOC-deficient marginal lands, featuring up to 50 t SOC ha -1 at the topsoil (≤30 cm), equivalent to 1.2% C in soil, to test if the hypothesis applies to this specific condition. The framework proposed here thus complements that of Morais et al. (which focused on nonmarginal land) and other attempted approaches focusing at least partially on marginal land for biomass production as bioenergy feedstock, usually at a regional scale 37,38 . This study had thus a double purpose: i) to propose a coherent generalizable framework to facilitate the mapping of marginal lands with low SOC stocks (≤50 t SOC ha -1 ) at global scale and their matching with candidate plant species, and ii) to illustrate the feasibility and interest of the proposed framework by means of a proof-of-concept implementation producing a quantification of SCS resulting from suitable matches.

Conceptual framework
The proposed framework allows quantifying global SCS potentials of biomass cultivation on target areas by accounting for both geographic-and plant species-dependent climatic and edaphic limitations. We combined and intersected georeferenced products and automatized the tasks of matching pre-selected "biopumps" (i.e. plant species featuring SCS enhancing capabilities and representing a potential source of feedstock for the bioeconomy) to target areas (i.e. aggregations of specific areas of marginal land, based on ecological zoning). Such matching was based on the pedoclimatic tolerances of the former to the prevailing conditions of the latter, and was followed by a computation of soil C sequestration and C losses due to erosion by water over the simulation time horizon 2020-2100, using well-established models. A R code combined with the data treatment strategy depicted in Figure 4, constitutes the proposed generalizable framework. The framework is able to produce estimations showing whether more soil C would be sequestered, in the long-term, than it would be lost to degradation and rain-driven erosion, if the matching plant species were systematically grown on the identified areas. To our knowledge, no previous global study on SCS considered such careful identification of marginal land (i.e. integrating pedoclimatic constraints, plant species tolerances, and land conservation), while balancing SOC sequestration and losses.

Global marginal lands and target areas
To define the extent of marginal land, elements from various studies 16,37,38 were combined to define marginal lands as land covers that are currently unused or underutilised by agriculture due to an aggregation of socio-economic and biophysical constraints, or human-induced land degradation, but which could potentially be suitable for sustainable biomass production. These areas are presented in Supplementary Results ( Figure S1) with SOC stocks divided into five classes, following increments of 10 t SOC ha -1 . Identified marginal lands (in total 2 714 Mha), consist of bare areas (74.4%), sparsely vegetated (25.4%) and abandoned agricultural land (0.14%). Additional intermediate maps are also presented in Supplementary Results, depicting marginal land ( Figure S2 to S6) and the criteria leading to the identification of target areas ( Figure S7 to S10) under the retained constraints (e.g. protected areas, pedoclimatic constraints, global climate zones, and world regions).  The majority of target areas were found in Asia (Eastern, Central, Southern, and Western), Northern Africa, and South America (Figure 2), corresponding to GEZ in the temperate, tropical and subtropical desert and mountain systems, as well as temperate and subtropical steppe. Europe accounted for 0.1% of all target areas, largely present in the Southern Europe Region concordant with GEZ Subtropical dry forest. All values per world region and GEZ are listed in the Supplementary Results Table S1.

Figure 2. Relative geographical concentration of identified target areas
Suitable "biopumps" with soil carbon sequestration potentials Initially, 432 species were identified in the FAO's ECOCROP database 40 , and their environmental requirements were determined to further assess the suitability to grow on the identified target areas. The eventual introduction of invasive species in the receiving ecosystems was not considered beyond the exclusion of world protected areas. Then, 50 biopumps were preselected by scoring and ranking SCS enhancing capabilities, primary yield productivity, and marginal land adaptability. SCS potentials are provided in Supplementary Results (Table S1 and Figure S10), including a ranking and scoring of biopumps in Table S2.
The matching exercise resulted in 460 viable combinations of target areas x biopumps (out of 65 664 theoretically possible matches). Overall, 128 combinations (28% of all matches) featured net SOC stocks >0 at the year 2100 (i.e. the sequestered C is larger than the eroded C), where 17 biopumps (associated to 67 species) were compatible within 12 world regions and 11 GEZ (Figure 3).

Target areas-biopumps pairs with the highest sequestration potentials
From the previously identified 128 possible target area-biopump pairs with SOC >0, 6.5% (i.e. 30 matches) represent the selection of only one biopump per target area. Overall, 11 biopumps (associated with 12 species) were identified to yield the highest net SCS results at year 2100 (referred to as "best-cases"), whereas the most representative in number of matches were cup plant (S. perfoliatum) followed by  (Table S2). We hypothesise that very strong seasonal precipitation, coupled with incomplete protection of the soil by canopy (expressed as the cover management factor in the erosion model), lead to these effects. From the resulting best target area x biopump pairs, one third of the results showed negligible soil C losses by erosion (up to 8% of final SOC values before erosion) located in European Russia (Boreal mountain system, Temperate mountain system and Boreal coniferous forest), Western Europe (Temperate mountain system and Temperate continental forest) and Northern Europe (Temperate continental forest).

Discussion
The framework design is generic enough to be implemented in other settings, regarding target area definition, plant species of interest, scopes (e.g. geographic boundaries), as well as temporal and spatial resolutions. The provided R script in 41 is usable in different situations as long as the required input data is organised in the prescribed way. Moreover, the framework is flexible to accommodate different SOC sequestration and erosion models, depending on the required outputs and data availability. For our example, we selected the RothC model, being an appropriate SOC model for the following reasons: (i) its demonstrated performance to replicate observed SOC changes in validation experiments, as compared with other models (described in Supplementary Methods), (ii) its applicability to a wide range of world climates and regions in combination with GIS products 3,36 ; (iii) its recognition, being recommended as a standard spatialized SOC model at a 30 arcsec resolution by the FAO 42 .
As a proof-of-concept we applied the framework on global marginal lands under well-defined criteria (e.g. soil constraints, land conservation and low initial SOC stocks) and were able to identify the plant species delivering highest net SCS -showing the feasibility and interest of the proposed framework to produce immediately useful results. The climatic and edaphic characteristics of the target areas were averaged at the world region x GEZ scale. Despite the loss of detail, we considered that the consolidated target areas were the more suitable geographical resolution to explore biopump compatibility at a manageable scale. Furthermore, the projected long-term SOC changes were based on near-term climate data, i.e. disregarding climate change effects on SOC stocks dependent on temperature, precipitation and evapotranspiration A key hypothesis for the global case study selection was that only SOC-deficient land covers are interesting for biopump implantations, as they have not reached C saturation and are likely to attain higher sequestration potentials 5 . However, SOC with <0.5% and <0.75% organic carbon (30 cm topsoil) are considered, respectively, as featuring severe to sub-severe soil fertility deficiency 49 , which is equivalent to about 21 and 31.5 t SOC ha -1 respectively, as per the equation in Ledo et al. 50 . We disregarded the (sub-)severe soil fertility constraints for this exercise by cross-referencing global land covers with global SOC stocks ≤50 t C ha -1 (about 1.2 % organic carbon). This soil constraint might even further reduce the identified target areas. However, we have not included it because the residual biomass, as input to the soil, contributes to soil fertility and other management practices might further improve the soil quality. Any land use management (or change) influences the evolution of SOC pools, which need to be specified in the soil model. In our example, we have not considered additional management practices other than plant-based C inputs. RothC is a non-saturating SOC model 51 , yet deemed to yield accurate predictions in cases when Cinputs are "low" 52 (e.g. in the absence of organic fertilisation). A further refinement in the framework would involve adding a C-saturating model to better account for management practices, including organic fertilisation 51,53,54 .
Cultivation practices in terms of resource requirements (e.g. nutrient supply, irrigation), including best management practices (e.g. organic amendments and fertilisers, cover crops), should be considered in more detailed assessments, as they have been shown to influence SCS rates 5 . Other specific agronomic requirements (e.g. to prioritise only low-input species on marginal land), as well as ecological impacts and invasion risks (of so-called alien or invasive species), need further assessment.
The preselection of the biopumps was primarily based on soil C sequestration and feedstock (yield) potentials. Only about 17% of all inventoried plant species were suitable for the identified target areas, yet representing about 34% of the preselected biopumps. The preselection criteria alone are not a qualification for SOC improvements and biomass productivity. The study has demonstrated that the net sequestration is a result of a combination of biophysical factors (e.g. soil, terrain and climate types, rain erosion, etc.), and that the performance of one crop on one target area is not the same on another. Several other elements would require consideration, such as local environmental challenges (e.g. compaction and low water retention, limited nutrients, low organic matter and potential phytotoxicity, weed and pests, etc.), tradeoffs with competing land uses (e.g. forage and livestock, housing, conservation, recreation, etc.), and socioeconomic market interferences (e.g. disruption of value chains or people's livelihoods).

Methods
We developed a framework to explore SCS potentials by various plant species and categories, and applied it to identifying and consolidating global target areas delimited by geopolitical and environmental boundaries. The proposed framework relies on the use of georeferenced products, corresponding to the needs of macro-level global models. It was structured around four main steps ( Figure 4): 1) identifying target areas corresponding to land covers of interest, while considering biophysical constraints to biomass production and land conservation (in our example: global marginal land); 2) characterising target areas by their pedoclimatic and terrain conditions, and consolidating them into the required level of aggregation/granularity (in our example: consolidated target areas defined by geopolitical world regions and geoecological zones), 3) selecting plant species or groups of species (in our example: promising biopumps) and determining their environmental requirements/tolerances, and 4) matching target areas-biopump pairs as determined by the compatibility of plant species to target areas' biophysical characteristics, and modelling soil C flows of the resulting target areas-biopump pairs.
The full list of exploited data and sources of georeferenced data is presented in the dataset 41 and Supplementary Methods Table S1.

Identification of marginal land (steps 1)
Global marginal lands were mapped at a 30 arcsec (1 km) resolution, using a geographic information system (ArcGIS v10.6). We studied different definitions of marginal land to segregate land types into agricultural (potentially suitable for food production historically, currently or in future) and non-agricultural (unsuitable/unfavourable for food production). The selection was based on a combination of various criteria of soil constraints to biomass (plants-based) production from key marginal land studies (detailed in Supplementary Methods, Table S2 and S3) to ensure that their associated biophysical constraints would be compatible with the environmental requirements for plant growth, and that chosen areas correspond to a land cover category that, if changed, would not contribute to further environmental degradation (including SOC losses). In our example, we followed five sub-steps to identify marginal lands: In step 1.1, we explored the global land cover (LC) map by the European Space Agency Copernicus Climate Change Initiative (ESA-CCI) for the year 2018 at a 300 m (9.7 arcsec at the equator) resolution, which features 22 LC classes defined in the FAO Land Cover Classification System 55 (see Supplementary Methods,  Table S4). We retained bare and sparsely vegetated (<15% of vegetation) areas, as all other LC classes likely feature food/feed production (e.g. irrigated or rainfed cropland, natural grasslands), densities of vegetation that would risk severe damage to natural ecosystems and SOC losses from land use change (e.g. forests and shrubland), or are vulnerable to crop growth due severe biophysical constraints (e.g. lichens and mosses).
In step 1.2, we identified recent abandoned agricultural land, understood as the share of marginal agricultural land that is unused or underutilised.  58 .
In step 1.5, we identified the biophysical constraints associated with the lands satisfying the previous two conditions, in order to discard areas featuring (sub-)severe soil and terrain restrictions that would exceed the tolerances of potential plant species. These constraints correspond to those associated with degraded lands in the georeferenced FAO Dominant Type of Problem Lands product for a generic perspective, complemented by the Harmonized World Soil Database (HWSD v1.21) 28 on soil properties.

Characterisation and consolidation of target areas (step 2)
The identified marginal lands were further characterised with their edaphic and climatic conditions for an iterative refinement in finding possible matches with biopumps' environmental tolerances, as well as to inform SOC models. In our example, we used the following georeferenced products: • Soil properties (e.g. texture, clay content, pH, bulk density, depth) and terrain specification (slope and elevation) were obtained from HWSD v1.21 28 Table S6).
Once all individual (contiguous pixels) marginal lands were characterised, it was necessary to consolidate them into larger target areas, according with the desired scope and granularity. In our example, we consolidated all marginal lands within the same GEZ (e.g. Tropical shrubland) and geo-political world region (e.g. West Africa), coarsely following 61 .

Identification and characterisation of plant species (step 3)
A preselection of potential biopumps was performed and the plant data recorded into a database to evaluate their suitability to grow on identified and consolidated target areas based on their pedoclimatic requirements, and to inform the SOC model (e.g. plant-based C inputs). Initially a list of 164 herbaceous and woody plants were considered, and data compiled from diverse data sources specific to agricultural perennial 50 and lignocellulosic bioenergy 62 crops, including all 68 industrial crops initially considered in the EU H2020 MAGIC project (www.magic-h2020.eu/). The preselection was performed via a semi-quantitative multi-criteria analysis by scoring and ranking SCS performance, feedstock productivity, and marginal land suitability (detailed in in Supplementary Methods, Table S7).
Data on the pedoclimatic tolerances provided in 41 Table S6). Moreover, several plant species with the same common name were recorded, showing different tolerances, to ensure a broader initial pool for the matching exercise, and because species specifications were lacking for some of the previously used datasets.
The land use class of biopumps was classified into three types based on the life form described in retained datasets 40,50,65 : grass (e.g. bahiagrass, hemp, miscanthus, banana, bamboo, etc., including small woody or herbaceous shrubs (e.g. blueberry, etc.), crops (e.g. sugar cane, maize, loofah, etc.), and trees (e.g. short rotation coppice, orchards, tree-nuts, etc., as well as woody shrubs that are tree-like, e.g. due woody stem, lifetime, size and physiognomy of the plant). Uncertainties may have accumulated where the boundaries between grass and woody species are not clear.
Data provided in 41 on plant C-inputs [t C ha -1 yr -1 ] were computed by the fractioning and C partitioning approach 66 , individually for above-(product, stem, and leaves) and below-(roots) ground compartments and C contents per fraction [%] 67 . The total C-inputs were resolved by partitioning the C per fraction to the soil, depended on the lifecycle (annual or perennial) and lifetime in years. For perennial species several estimates are required, as variations arise from annualising the aboveground C-inputs (e.g. whether the leaves are deciduous or evergreen or the stem is considered a bioeconomy feedstock). Roots, on the contrary, remain over the entire rotation length in the soil. We adopted a linear approach by dividing the total C root by the lifetime in years, here assuming an uneven-aged approach 68 .

Matching of target areas and plant species and computation of net sequestration of target areas-biopumps pairs (step 4)
Once both target areas and a list of candidate plant species were identified, it was necessary to determine matches that would be, ad minimum, biophysically possible. A clear way to achieve such pairings is to compare the pedoclimatic conditions prevailing in the target areas with those of the plants.
We developed a R (R Core Team 69 ) engine provided in 41 to firstly identify the corresponding matches between potential biopumps and target areas, and subsequently run the models for both SOC stock changes and SOC losses to erosion per matched target area-biopump pair up to the year 2100. A match took place when the tolerance ranges of biopumps were within the values associated to the target areas (e.g. temperature, pH, elevation, climatic zone compatibility, etc.). The resulting combinations contain all the information gathered from the previous modules, which is a unique set of input data required to initiate the SOC model and compute the associated SOC stock changes.
We used them monthly time-stepped and processes-based Rothamsted C (RothC) model v26.3 70 to compute SCS (described in Supplementary Methods, including a model comparison in Table S8). To run RothC for a combination of multiple sites and biopumps we used the SoilR package v1.1 71 , which provides a library of functions and tools under the R environment. For the initialisation of the conceptual carbon pools we used pedotransfer functions (equation and constants provided in Supplementary Methods), which seemed appropriate, as other approaches (e.g. physico-chemical or model equilibrium analysis) are not measurable at the regional scale. Monthly temperature and precipitation data were retrieved from CHELSA 59 , and monthly evapotranspiration from CGIAR's High-Resolution Global Soil-Water Balance 72 .
The SCS estimations were complemented with the method described in Lugato et al. 73 to compute SOC erosion from soil erosion by water (detailed in Supplementary Methods). Input layers for soil erosion were used from the Global Soil Loss map at a 25 km (810 arcsec) resolution for the year 2012 74 , based on a Revised Universal Soil Loss Equation (RUSLE)-based method 75 . Retained cover-management factors required by the SOC erosion method 73 are listed in Supplementary Methods Table S9.
The p-value were computed via a paired sample t-test.