Modeling soil organic carbon as a function of topography and forest characteristics


 BackgroundThe soil organic carbon (SOC), an important carbon pool among the five forest carbon pools, plays a crucial role in global carbon cycling. However, it is affected by various factors like topography and climate. Some studies conducted for smaller areas have shown a dependence of the SOC on altitude, but do not allow a general statement for larger regions. Nepal, which has the world's most dynamic terrain elevation, is ideally suited for studying the dependence of the SOC on altitude. ResultsWe found a significant correlation between SOC and altitude (r=0.81). In our study, altitude alone can explain about 66 percent of the variability of the SOC. On the basis of our data, we developed a model which, with altitude as the only independent variable, can predict around 61 percent of the SOC. Conclusions﻿Only altitude has been found to be a sound predictor of SOC at the national level. Other factors with potential impacts on SOC predictors such as crown cover, slope or basal area have only a minor contribution to the improvement of prediction accuracy of the model.


Background
The soil organic carbon (SOC) plays crucial role in global carbon cycling (1,2).It is a vital component of soil and contributes effectively on the functioning of terrestrial ecosystems (3). The SOC is an important carbon pool among the ve forest carbon pools (4)(5)(6). The forest soils comprises about 73% of global soil carbon storage (7). Therefore, a slight change in the amount of soil carbon may have substantial effects on atmospheric CO 2 concentration (8).
Forest soils may serve as important carbon sinks for ameliorating excess atmospheric carbon dioxide (CO 2 )(9). The carbon sequestration capacity of soils is affected by biophysical processes such as rainfall in ltration, soil erosion and soil temperature due to landscape heterogeneity (3). The heterogeneity affects carbon input and carbon losses resulting in difference in carbon content along topographic gradients (9). The amount of SOC is affected by various factors such as topographical factors (10) and climatic factors (2,11). Ecosystem attributes affecting soil carbon dynamics along elevation gradients are usually the product of the long-term interactions between climate, vegetation, and soil type (12).
The distribution of SOC along the altitudinal gradient has not been presented consistently. In the Himalayan region, studies show negative relationship of SOC concentration with altitude and argued that SOC has stabilized to a greater extent at low altitudes than at higher altitudes (13,14). The density of SOC in the forest in Himalayan region is characterized by climate, vegetation and topography (15,16). In contrast, other studies show positive relationship between SOC and altitude in tropics (17)(18)(19)and temperate regions (20)(21)(22).
Most of the studies dealing with SOC (10,13,23,24)have covered small ranges of altitude(150 m-1961 m, 1800 m-2500 m, 1060 m -1230m and1200m-2200 m respectively) and do not allow for generalized statements. In order to be able to obtain generally valid statements for larger regions, we selected Nepal as reference area for our study. Nepal is the country with the widest range of altitude in the world. This results in distinct physiographic zones ranging from sub-tropics to permanent snow-covered Himalayas, which allows studying the SOC response with respect to a wide range of topography and forest characteristics. Investigating along elevation gradients are a useful approach to the study of environmental change, and its effect on soil processes (12).Therefore, this study will provide a new insight in understanding the relationships between the SOC and biophysical factors from sub-tropical lowland to Himalayan foothills of Nepal.

Study area
In Nepal, hills and high mountains cover about 86% of the total land area and the remaining 14% is the atland which lies below less than 300m Altitude. Altitude varies from 60m above sea level in the Teraithe lowland stretching from east to west-to the Mount Everest, with 8,848m the highest peak in the world (25). An altitude is considered as a major factor that has resulted wide range of climatic conditions in Nepal (25). Average temperature decreases by 6 o C for every 1,000m gain in altitude ((26) cited in (25). Wide altitudinal variations and diverse climatic conditions have produced four main physiographic zones i.e. Terai (lowlands), midhills, high mountains and high Himal((27)cited in (25)). These wide ranges of climatic conditions in uence composition of ora and fauna in Nepal, and are mainly due to altitudinal variation (25).
Stainton ((28)cited in (25)) has classi ed 35 forest types in Nepal. These forests are broadly categorized into 10 major groups based on altitudinal range (25).These forests found in varied altitudinal ranges have been reported to store soil organic carbon, above ground tree biomass to the different level (29).

Data collection
The study used National Forest Inventory (NFI) data collected during 2010 to 2014. The NFI adopted two phase systematic sampling design with cluster plots. The NFI established 450 clusters and 1,553 Permanent Sample Plots (PSPs) systematically in the forest area representing different physiographic zones of the country. Data collection was conducted from Terai region ( at land) to high Himal region (High altitude land). Concentric circular sample plots (CCSPs) were used to record biophysical data. Tree attributes (diameter at breast height, total tree height, crown length, crown cover, species, quality class etc), stand level variables (forest types, disturbances on forest stands, management regime, slope, aspect, location etc) and soil samples were collected. The detail methodology of data collection has been shown in States of Nepal's Forests (29)(hereafter, Forest Resource Assessment (FRA) report of Nepal)). The data represents the wide topography, i.e., altitude ranging from 88m to ~4000m and slope of 0-100%. Likewise, the data were taken from sparse forests to very dense forests.
Above ground tree biomass (AGTB) and soil organic carbon (SOC) analysis: Above ground tree biomass was calculated by summing up stem biomass, branch biomass and foliage biomass. Stem biomass was calculated as a product of volume of the stem and air dried wood density (Equation 1). Furthermore, branch biomass and foliage biomass were calculated using branch-to-stem and foliage-to-stem ratios based on the species type and size of the stem at diameter at breast height (DBH) (30). Biomass = Stem biomass = Volume * Density ………..Equation (1) Where, Volume = Stem volume in cubic meters (m 3 )

Density = Air dried wood density
Similarly, for SOC analysis, black wet combustion method ((31)cited in (32)) was applied in the soil samples collected up to 30 cm depth in the DFRS soil laboratory Nepal while dry combustion and LECO CHN Analyzer was used in the Metla Soil Laboratory, Finland for the quality assurance. Detailed method of estimating AGTB and SOC has been explained in the FRA report of Terai forest in detail(32).

Data split
Data analysis was focused on assessing SOC based on topographic (altitude, slope and aspect) and forest variables (AGTB, Basal Area (BA)/ha and crown cover)). Before splitting data, boxplot was used to check the presence of outliers in the data. Altogether 1026 SOC plots were used for the analysis. The data was split into two sets i.e. train data (80% i.e. 822) for developing models and test data (20% i.e. 204)for data validation using "createDataPartition" function in "caret" package (33).It splits data randomly into two different sub-sets with different proportions. All the data analysis work was done in R program (34).

Modelling
Before modelling, correlation analysis was performed to determine linear relationship of SOC with six independent variables (altitude, basal area, AGTB, slope, aspect, crown cover) using "cor" function in "stats" package in R (34).Afterwards, these six independent variables were used as function of SOC using "lm" function in the "stats" package (34).To nd out the best predicted variables, stepwise regression method was applied using "step" function in the "stats" package (34). Altogether six different models were developed (M1, M2, M3, M4, M5, M6). Akaike Information Criterion(AIC) and coe cient of determination (R 2 ) value were used as selection criteria for the best model. Number of predicted variables used in the models having lower AIC value and higher R 2 value indicate better ts in the model. Moreover, VIF (variation in ation factor) function in "car" package (35)was used to determine the presence of multicollinearity problem in the predicted variables in the models.In this way, models were tested based on several variables to assess SOC in the entire region of Nepal.

Data transformation
Prior to validation of the models, it is necessary to check whether the built models violate assumption of simple linear regression (i.e. homoscedasticity and normality). For this purpose, bptest function in "lmtest" package (36) for heteroscedasticity test and shapiro.testfunction in "stats" package (34) for normality tests were conducted. To overcome the problem of rejecting the null hypothesis of the homoscedasticity and normality assumption, the response variable (i.e. SOC) was transformed using BoxCoxTransfunction in "e1071" package (37) to normalize its distribution and newer models were developed ( TM1, TM2, TM3, TM4, TM5 and TM6).

Model validation
The model was validated using test data set. The predicted value of response variable was back transformed and compared with real value in the test data. The mean absolute percentage error (MAPE) was calculated to assess the accuracy of the model using "MAPE" function in "MLmetrics" package (33). Lower MAPE value indicates the higher accuracy of the models.

Distribution of variables
Six independent variables and one response variable were used. Distribution of response variables (i.e. SOC) ranges from 2.18 ton/ha to 61.35 ton/ ha against different independent variables (Fig 2) and its descriptions ( Table 1). It is evident from gure 2 that altitude has stronger correlation with SOC compared to the other variables.

Correlation of variables
Correlation analysis was conducted to assess degree of relationship between response and predicted variables (Fig 2).  To check the multi-collinearity problem in the M4, Variation In ation Factor (VIF) of predicted variables of the model were found to be<1.1. It shows that there is no multi-collinearity problem in the M4. However, the M4 uses three different predicted variables to predict SOC. In practice, more predicted variables in the model require more efforts to collect data and hence increase the cost. To increase the e ciency of the model, the number of predicted variables should be kept as minimum as possible in order to make it cost effective. Furthermore, step wise regression was done on the M4 by eliminating less important variable in a subsequent manner (Table 2). Despite the robustness of the model, it is necessary to check whether the model violates assumption of simple linear regression (i.e. homoscedasticity and normality). It was found that all of the models M1 to M6 rejected null hypothesis (assumption of homoscedasticity and assumption of normality) and could be seen in graphical presentation (Fig.4). To overcome the problem, the response variable (SOC) was transformed using Box Cox Transformation method and newer models were developed (Table 3). Among six predicted variables used in the Transformed Model1 (TM1), altitude (p<2e-16) was found signi cant followed by crown cover (p<0.00042), slope (0.00272) and aspect (0.01052) while AGTB and basal area were found insigni cant. During the model development, predicted variable (altitude>250m) was used by hit and trial method. Altitude below 250m had no correlation (r= -0.08) with SOC and using it in the model did not remove heteroscedasticity and normality problem in all the new models. Finally, all the transformed models (TM1 to TM6) accepted the null hypothesis of homoscedasticity (p>0.05) and normality (p>0.05). Same data (altitude>250m) was also used in the ordinary model (M1 to M6), however problems of heteroscedasticity and non-normality were still present. Figures 4 and 5 show the presence of heteroscedasticity and homoscedasticity in the ordinary data and transformed data respectively

Discussion
Soil organic carbon pool plays an important role in carbon cycle by mitigating the level of CO 2 in the atmosphere (38). Forest soil stores majority amount of SOC than other pools (6).SOC is found to be positively correlated with the altitude in this study. It shows that there is increase in SOC along with altitudinal gradient. Same trend has been reported in several previous studies in different regions which were con ned to short altitudinal gradients than this current study (250-3993m).i.e. Meghalaya, India  (10,(17)(18)(19)(20)(21)(22)(39)(40)(41).This study has con rmed the positive relation of altitude and SOC distribution. Increase in the stock of soil organic carbon along the higher altitudes could partly associated with the decreasing temperature with the increasing altitude ((42) and reduced soil carbon losses through decomposition of organic matter (12). It is also argued that low temperature is likely to control retention of SOC (23).
Contrastingly, declining trend of SOC along with altitudinal gradient has also been reported by other studies (13,14)conducted in short altitudinal ranges (1800-2500m;600-1200m /and1800-2200m)arguing that it is due to better stabilization of SOC at lower Altitudes. Our study also found similar result (r = -0.08) in the altitudinal range below 250 m. This can be explained by better stabilization of carbon in soils at lower altitudes (14).
Higher variation in SOC among different altitudinal ranges might be due to a variation in topographic and climatic factors. Topographical factors (such as elevation) are main contributing factor for spatial variability of SOC (43)(44)(45) and induce heterogeneity in SOC which is likely to produce large uncertainties in SOC storage (46,47).Altitude itself does not affect directly to the SOC distribution but the variation in climate (temperature, precipitation) due to changes in altitudinal ranges affect the SOC amount. The phenomena of decreasing temperature in the increase of altitude may help to use altitude as a proxy of climatic variable (i.e. temperature) in the assessment of SOC distribution.
Moreover, the strong correlation between SOC and altitude offers the possibility to develop a linear regression model in which altitude solely serves as major predictor (approx. 2/3rd variance of the real value) for SOC estimates. Our study shows that model uncertainties increase with altitude. To improve the model, transformation of response variable is necessary to improve linear relationship.
Compared to altitude, other variables (AGTB, crown cover, basal area) are not signi cant to predict SOC distribution over the large area. AGTB do not have signi cant relationship with SOC in this study. Though AGTB is one of the main sources of carbon for SOC formation but the amount of AGTB is directly affected by deforestation and forest degradation (Gibbs et al 2007). Likewise, crown cover and basal area of the trees do also not have a pronounced effect on SOC distribution. These variables are also much in uenced by management practices in the forests. Large amount of timber and fuel-wood is extracted from the forest to ful ll the demand of the people. Effect of forest management on AGTB, crown cover and basal area is larger than the effect of climate. Existing forest management practices (timber harvesting, thinning, fuelwood fodder collection) based on the different objectives and other factors (forest re, hurricane) could be a reason for weak relation between SOC and AGTB/crown cover/basal area.
The amount of SOC is also affected by topographic positions (i.e. slope and aspect). A study by Zhu et al (2017) reported the effect of slope and aspect on SOC distribution is higher than the altitude in the small area (47). Whilst our study area is large (national level), slope and aspect are not well suited to capture variation of SOC distribution over the larger area. Similar slope and aspect in two different altitudes in the larger area may have different climates (temperature, precipitation) resulting in different amount of SOC accumulation.

Conclusions
The study along the altitudinal gradients has contributed a better understanding of biophysical factors that potentially affect SOC. Effect of altitude on SOC distribution has been well presented in the previous studies. However, most of the studies are limited to the areas with small range of altitudinal gradient and are not applicable to larger scale estimation of soil carbon storage. This study has used representative data from the large area with wider altitudinal gradient and has provided reliable estimates of SOC stock.
Thus, our study (wider altitudinal range) supports the ndings of previous studies on positive relation of SOC distribution along altitudinal gradient and helps permit wider acceptance of this hypothesis.
The signi cance of altitudinal gradient on SOC is con rmed in this study. Altitude has been found to be a good predictor of SOC at the large area with diverse altitudinal gradient. Other predictors such as crown cover, slope and basal area help increase the accuracy of the model slightly. But use of these variables in the model seems ine cient and unworthy for the larger area with diverse altitudinal gradient.
In the context of detrimental impacts of climate change on land degradation and food security, there is promising development concerning soil security. Recently, Sherwood et al. show that a warming in the range of 2.3 to 4.5 °C is very likely to occur in the next 40 to 80 years (48). High-altitude forests which store large quantities of SOC would loss more organic carbon under the elevated temperature. The ndings of our study will have signi cant implications on the maintenance of SOC sinks and forest Cpools as well as the GHG mitigation potential of high-altitude forests.
The understanding provided by this study on the relationships of SOC with environmental and topographic factors particularly in the mountain ecosystems contributes to climate change research. Authors' contributions RM contributes on data acquisition, data analysis and drafting manuscript while PN and MK contribute from draft stage to the nal stage of the manuscripts. All authors discussed and revised the manuscript.

Abbreviations
All authors read and approved the nal manuscript.  Showing correlations among seven different variables (soil organic carbon, altitude, slope, aspect, above ground tree biomass, basal area, crown cover) Figure 4 Showing sample plot selection during national forest inventory Showing presence of homoscedasticity (left) and normal distribution (right)