Modeling Forest Carbon Estimation Using Sentinel-2 Derived Indices in Yayu Afro-Montane Forest, South West Ethiopia

Background: Empirical analyses were common methods for forest carbon estimation. Lately, satellite images are popularly used to study different attributes of forest vegetation. Sentinel-2 image provides a signicant improvement in spectral coverage, spatial resolution and temporal frequency to assess forest biomass. This study assessed the potential of vegetation indices and biophysical variables derived from Sentinel-2 images in modeling above ground biomass (AGB) and carbon stock in the Yayu forest biosphere reserve. Method: About twenty variables extracted from the Sentinel-2 image were used in this study. Forest stand parameters such as DBH and tree height were used to calculate AGB using allometric equations. The correlation between the biomass values measured from plots and the variables extracted from Sentinel-2 images were examined using the Pearson correlation coecients. A regression analysis was applied to select determinant variables for predicting AGB. The regression model results were validated based on the coecients of determination between the observed and the predicted values. Results: A strong correlation (r = 0.65 - 0.74) was found between the biophysical variables from Sentinel-2 image and AGB measured from sampling plots. The multispectral (MS) Band 4, the biophysical vegetation variables from Sentine-2 images were strongly correlated with the AGB. The variables MS Band 4, IRECI, LAI, FCOVER and FAPAR are good predictors of the forest AGB. The model goodness of t between the observed and predicted values of the AGB showed a coecient of determination (r 2 ) value of 0.74 and root mean square error (RMSE) of 0.16 ton C/pixel. Conclusion: The developed AGB prediction model was applied to successfully quantify and map the AGB and carbon stock of the forest in the biosphere reserve. Vegetation indices from Sentinel-2 images can effectively predict AGB in forest landscapes and can avoid costly ground surveys to quantify AGB and carbon stock in dicult terrains. as to the on the and within the and core of the Yayu Biosphere Within each all with ≥ 5 cm and H of > 1.3 m were and for DBH and The DBH was using diameter tape while H was using Sunnto clinometer. The establish a forest biomass prediction determinant vegetation and coecient were a multi-linear regression The input variables for the prediction were again selected the magnitude of their coecients and the root mean square error. The values of the coecients of determination of the variables ranged from 0.31 to 0.74 between Sentinel-2 image extracted indices and the above ground biomass. A strong correlation was observed for most of the vegetation biophysical variables (r = 0.65 - 0.74) than the vegetation indices with the measured AGB from the eld. the

Data on forest productivity assessment, total biomass production, growth prediction and ecosystem services valuation are essential for forest management planning and utilization ( (Georgia et al. 2017;Pertille et al. 2019). In this study, we used correlation analysis and regression algorithm to establish above ground biomass estimation model in a natural forest. The technique has been tested in other studies and yielded best results in modeling above ground biomass estimation (Lu 2006). This study was aimed at examining the relationship between directly measured above ground forest biomass and vegetation indices as well as biophysical variables derived from Sentinel-2 Multispectral Instrument (MSI) image, and to identify best predictor variables so as to develop carbon stock predictor model for the study forest types.

Description of the study area
The Yayu afro-montane forest is found in the Illubabor Zone, southwest of the country at about 550 Km from the capital, Addis Ababa. The geographic location is between 8 4′ 56.05″− 8 24′ 40.46″ N latitude and 35 44′53.85″− 36 5′12.23″ E longitudes (Fig. 1). Large part of the Yanu afro-montane forest is protected as a Forest Biosphere Reserve. The forest is part of the last remaining intact patches of natural forests in the southwest region. The forest has multiple economic, social and environmental bene ts. It provides non-timber forest products, mainly spices, honey, and herbal medicine to rural communities for their livelihoods. The forest contains one of the largest forest biomass in the country and hence signi cantly contributes to climate change mitigation. Besides, the Yayu forest is one of the last remaining montane-rainforests containing wild Coffee arabica gene pool populations in Ethiopia. The forest site is effectively serving as an in situ conservation forest for the wild Coffee arabica population gene pool (Gole et al. 2008;Schuit et al. 2021). Coffee makes the largest share of living for the local communities. The climate is characterized by hot and humid tropical climate with a mean annual temperature of 25°C, varying between 12.7°C and 26.1°C. The region receives high mean annual rainfall of about 2100 mm, with high annual variability ranging from 1400 to 3000 (Gole et al. 2008).
The topography is complex with undulating hills and valleys dissected by several small streams draining into the Geba and Dogi Rivers. The elevation ranges between 1217 m.a.s.l at the valley bottom to 2583 m.a.s.l at the highest point in the watershed (Fig. 2). The valley gorges and the mountains are steep slopes and not easily accessible. As result, the dense and large patches of the forests are located in these parts of the landscape.
The land use land cover was mapped from a Landsat 8 dry season imagery of 2018. The forest land constitutes the largest cover with about 62 % followed by cultivated agricultural land constituting about 30 % of the total cover. The rest of the landscape is covered with shrub lands (3 %), settlements (2.7 %) and wetlands (2.3 %) (Fig. 3). Although the forest area is registered as a National Forest Priority Area and a Biosphere reserve, the local communities are highly dependent on the forest mainly for harvesting natural coffee, spices and honey production. Thus, the Biosphere reserve forest has three functional zones allowing farmers to harvest non-timber forest products in the transition and buffer zones while leaving the core zone as access-restricted conservation zone (located primarily in the valleys and mountains). As shown in Figure 3 below, the dark green covers are the dense forests designated as core zones in the inaccessible high altitude steep mountain and in the low altitude river valleys in the Yayu forest. The landscape in the middle altitude landscape are the buffer and transitions zones, where agricultural cultivation is practices with strict management (Gole et al. 2008) Data sources Three data sources were used for the study. The Landsat 8 image, dated February 2018, was used to classify the land use land cover map and extract the forest cover area of the Yayu forest biosphere reserve. Vegetation parameter data for biomass estimation were directly measured in the eld using vegetation sampling plots. Vegetation indices (VIs), biophysical variables (BPVs) and relevant bands were derived from Sentinel-2 imagery (Fig. 4).
The Sentinel-2 satellite imagery, taken in the dry season of February 2018, was downloaded from the open access European Space Agency (ESA) hub. The images were pre-processed using the Sentinel Application Platform (SNAP) and quantum GIS (QGIS). The Sentinel-2 Multispectral instrument (MSI) with swath width of 290 km was Ortho-recti ed to UTM Zone 37N projection and a radiometric correction was done to reduce atmospheric and sun angle effects (Baillarin et al. 2012). The image was transformed from radiance to surface re ectance by applying the Dark Object Subtraction (DOS) method using the semi-automatic classi cation plugin (SCP) in QGIS software. The DOS method removes the darkest pixel in each band that might be affected by atmospheric scattering (Chavez 1988). The blue, green, red and near infrared bands, with 10 m resolution, were resampled into a 20 m resolution using ArcGIS software to correspond with the 20 m vegetation sampling plot size of the eld data measurement. The Sentinell-2 MSI was used for deriving multi-spectral bands, vegetation indices (VIs) and biophysical variables (BPVs) (Fig. 4).

Vegetation indices (VI) extraction
The vegetation indices for biomass estimation in this study were extracted from the Sentinel-2 image (Table 1). In a remotely sensed data, a vegetation index is a spectral transformation of two or more bands designed to enhance the contribution of vegetation properties and allow reliable spatial and temporal inter-comparisons of terrestrial photosynthetic activity and canopy structural variations (Huete et al. 2000). Vegetation indices extracted from Satellite data have emerged as important tools in monitoring, mapping and managing terrestrial vegetation as the indices provide radiometric measurement of the quantity, structure and condition of vegetation, and effectively serve as useful indicators of seasonal and inter-annual variations.
There are many VIs with similar functionality and most of them use the inverse relationship between red and near-infrared re ectance associated with healthy green vegetation. The measurements of vegetation attributes include leaf area index (LAI), green leaf area index (GLAI), percent green cover or fractional green cover, chlorophyll content, green biomass and absorbed photosynthetically active radiation (APAR). According to Bannari et al. (1995), VIs are normally classi ed based on a range of attributes such as the number of spectral bands (2 or greater than 2); the method of calculations (ratio or orthogonal), depending on the required objective; and the historical development (as rst generation VIs or second generation VIs). In order to compare the effectiveness of different VIs, Lyon et al. (1998) classi ed seven types of VIs based on their computational methods (Subtraction, Division or Rational Transform). With the advancement in hyper-spectral remote sensing technology, high-resolution re ectance spectrums are now available to be used along the traditional multispectral VIs. Besides, VIs have also been developed to be speci cally used with hyper-spectral data such as the use of Narrow Band Vegetation Indices.

Biophysical variables (BPVs) extraction
Surface biophysical or canopy properties provide an understanding of the physics of the interactions between solar radiation and vegetation elements (Asrar et al. 1989). Surface parameter retrieval from satellite remote sensing data has been one of the major sources to obtain surface parameters because it relates the vegetation characteristics to its spectral signature or re ectance value thereby providing reasonable estimates of vegetation properties across various spectral, spatial and temporal scales (Asrar et al. 1989). According to Widlowski et al. (2004), biophysical variables describe the spatial distribution of vegetation state and dynamics, thus, are useful for biomass estimation. The vegetation indices and biophysical variables were computed using the ArcGIS and SNAP software. The indices were selected based on their performances in biomass estimation in earlier studies ( Table 1). The vegetation index map layers were produced using QGIS and ArcMap (Fig. 5 and 6).

Vegetation parameter measurement from sampling plots
A total of 20 randomly drawn sample plots were used to measure the AGB biomass samples from the forest. The vegetation parameter (tree parameters) such as Diameter at Breast Height (DBH) and height (H) were measured in a 20 m X 20 m (400m 2 ) sampling plot, which were randomly generated from the forest map using ArcGIS. The sampling plot coordinates were used as references to locate the plots on the ground and within the transitional, buffer and core zones of the Yayu Biosphere reserve forest. Within each plot, all trees with ≥5 cm diameter and H of > 1.3 m were recorded and measured for DBH and height. The DBH was measured using diameter tape while H was measured using Sunnto clinometer. The eld data were used for validating the biomass modeling outputs and to serve as a ground truth data. For most vegetation types in the tropics, a relationship is established for measurable tree parameters and forest stand parameters such as volume and biomass, which are often di cult for a direct measurement (Husch et al. 2003). Hence, already established allometric equations are often used to estimate the biomass by using tree parameter data.

Extraction of the pixel values of predictor variables
The pixel values for each variable derived from the Sentinel-2 image were extracted using zonal statistics in ArcGIS. The eld plot geographical location (latitude and longitude) points were used as references to match the pixels as shown in the gure below (Fig. 7). The extracted pixel values for each variable were exported in CSV (comma separated variable) data formats.
Above ground biomass and carbon stock estimation The above ground biomass and carbon stock were quanti ed using an allometric equation with input data from the tree parameter measurements such as DBH and H in the eld. Besides, speci c wood density, which is the dry mass of a unit volume of fresh wood of trees, is used to convert the wood volume into biomass and carbon estimate. The allometric equation selected for this study was established for tropical forest biomass estimation and has been widely applied in similar studies (Chave Where, AGB is Above-ground biomass (g), ρ is speci c wood density (g/cm3), D2 is diameter at breast height (DBH) (cm); H is height of tree (m). The above-ground biomass was converted into carbon equivalent using the biomass conversion factor or carbon fraction of 0.47 IPCC (2006).

C = AGB x CF Equ. 2
Where, C is Carbon stock (g), and CF is Carbon Fraction of above ground biomass Data analysis (Correlation, regression analysis and model development) The forest biomass data measured from the eld and the extracted variables from the Sentinel-2 images were organized into a spreadsheet with CSV format. Correlation between the biomass estimates from the eld and variables from the Sentinel images were tested using SPSS software. Those variables having signi cant correlation with the measured biomass data were identi ed, selected and a regression analysis was performed between the measured biomass and the vegetation indices to develop a biomass prediction model.
The model was then evaluated based on the magnitude of the Root Mean Square Error (RMSE) and coe cient of determination (r2). The best model was developed by integrating those variables with high r2 and a low RMSE. The equation obtained from the regression model was then used to estimate AGB. The r2 was preferred since it has a standard measure with values ranging from 0 to 1. The r2 also shows the percentage of the variability explained by the model (Husch et al. 2003). This makes it easy to understand the relationship between the independent (indices) and dependent variable (biomass) (Peters 2007). The signi cance of the model was assessed using the P-Value at α = 0.05. For those signi cant indices, the equation obtained from the regression model was then used to estimate AGB.

Above Ground Biomass of the Forest
The results from the plot measurements data showed that the highest amount of forest above ground biomass (ABG) was concentrated in those plots located in the dense part of the forest or the core zone of the biosphere reserve, in which large sized trees are found in large numbers ( Table 2, plots shown in bold). The least amount of above ground biomass were recorded In those plots located in the disturbed and semi-disturbed parts of the forest or in the buffer and transition zones of the biosphere reserve ( Table  2, plots in red and bold). This is perhaps directly linked to the degree of human impact on the forest since the core is protected while the buffer and transition zones are open for community access that might lead to selective removal of mature trees from the forest.

Correlation between AGB and the predictor variables from Sentinel-2 image
The result of a correlation analysis between the measured above ground biomass and the predictor variables extracted from the Sentinel-2 images showed that there is strong correlation between the observed AGB and the predictor variables, with a correlation coe cient (r) value ranging from 0.36 to 0.74. Among the predictor variables, NDVI (r = 0.36), IRECI (r =0.5), NDVI45 (r = 0.40), LAI (r = 0.74), FAPAR (r = 0.7), FCOVER (r = 0.64) and Cab (r =0.69) were strongly correlated with the AGB (Table 3). IRECI from the vegetation indices and LAI from the biophysical variables were found to be best correlated with the observed/measured AGB. Among the different variables, biophysical variables were strongly correlated with the above ground biomass (r = 0.65-0.74).

Relationship between measured biomass and derived indices
The results from the regression analysis revealed that there is a positive linear relationship between forest biomass and vegetation indices extracted from the Sentinel-2 satellite images ( Fig. 8; r Similarly, the forest biomass has showed a strong and linear relationship with the biophysical variables drawn from the satellite images ( Fig. 9; r 2 = 0.42 -0.54). From the MSI bands, Band 4 performed better than other Sentinel-2 bands (r = -0.44 and r 2 = 0.2) which is selected for developing the AGB prediction regression model. The best predictor variables were selected for the biomass prediction model development based on the strength of the relationship between the indices and the measured above ground biomass.

Modeling AGB Biomass Prediction from vegetation indices
From the regression analysis, those variables with high values of coe cient of determination were selected for the above ground biomass prediction. Those with low values of coe cient of determination and those showing multicollinearity were excluded from the model. Only ve variables were selected to develop the model and others were excluded because of very low values of coe cient of determination and problem of multi-collinearity (Table 4). As a result, LAI, FCOVER and FAPAR from the biophysical variables, IRECI from the vegetation indices and Band 4 from the MSI bands were selected for the model development ( Table 4). The results show that the biophysical variables are better suited for developing forest biomass prediction model compared to other types of vegetation indices. Where B4 is Band 4, LAI is Leaf area index, IRECI is Inverted Red-Edge Chlorophyll Index, FCOVER is Fraction of vegetation cover, FAPAR is Fraction of Absorbed Photo-synthetically Active Radiation. These indices can be derived from any Sentinel-2 image and can be used to predict forest above ground biomass using the prediction equation. The prediction model was validated using the measured or observed values of above ground biomass from the eld ( Table 5). The measure of the goodness of t between the observed and predicted values showed a strong linear relationship with a coe cient of determination of r 2 = 0.73 (Fig. 10).

Discussion
The plot measurement results correspond with the forest biosphere strata of the Yayu forest, illustrating the different management zones of the forest. The magnitude of the Biomass measured from those plots located in the buffer zone was lower than those measured from plots located in the core zone. Unlike the buffer and transition zones, the core zone is protected for biodiversity reserve and conservation (Gole et al. 2008;Schuit et al. 2021). Besides, it is inaccessible as well as the density and size of trees is relatively high compared to the transition zones. On the contrary, the buffer and transition zones are freely accessible for agricultural production and managed for coffee cultivation (Schuit et al. 2021 Using the raster calculator of ArcGIS and the forest layer thematic map, the above ground biomass was mapped by applying the prediction model (Fig. 11). The result well corresponded with the biosphere structure zones. The highest amount of the AGB is in the range of 6 to 10 ton per pixel or 150 to 250 ton/ha, which are those areas closer to the core conservation zone of the forest biosphere reserve. In the transitional and the buffer zone of the forest, where access roads are available and where agricultural activities are permitted, the concentration of biomass is below 6 ton per pixel (Fig. 11). Within the core zone, a predicted value of biomass higher than 10 ton per pixel (250 ton/ha) has been recorded in scattered pocket areas of the forest reserve. These spots are located in the steepest and most inaccessible parts of the forest, in which anthropogenic activities are restricted and very minimum.
The above ground forest biomass was converted to the carbon equivalent using carbon conversion factor (a default value of CF = 0.47) and mapped using the raster calculator in ArcGIS. The carbon stock map is similar to the biomass distribution map and the inaccessible areas were found to have higher amount of carbon stock with a value of 7.05 ton/pixel or 176.25 ton/ha. Likewise, the lowest amount of carbon stock, i.e., 2.82 ton/pixel or 70.5 ton/ha, was recorded in the accessible and transitional zone of the biosphere reserve forest (Fig. 12).

Conclusion
Forest biomass and carbon stock estimation techniques using remotely sensed data are becoming more reliable due to improvements in spectral and spatial resolutions of products from different sensors.
Sentinel-2 optical data are increasingly applied for biomass and other vegetation attributes estimation.
This study assessed the strength of correlation and relationship between forest above ground biomass directly measured from sampling plots and vegetation indices as well as biophysical variables extracted from Sentinel-2 optical images so as to establish a forest biomass prediction model by identifying the determinant vegetation indices. Those variables and bands with high coe cient of determination were selected for a multi-linear regression analysis. The input variables for the prediction model were again selected based on the magnitude of their coe cients and the root mean square error. The values of the coe cients of determination of the variables ranged from 0.31 to 0.74 between Sentinel-2 image extracted indices and the above ground biomass. A strong correlation was observed for most of the vegetation biophysical variables (r = 0.65 -0.74) than for the vegetation indices with the measured AGB data from the eld. As a result, the variables such as the LAI, FCOVER, FAPAR, IRECI and Band 4 were main inputs for developing the above ground biomass prediction model. The model was validated by considering the strong correlation coe cient of 0.738 and the root mean square error of 0.16 between the observed and predicted values of the forest above ground biomass. Hence, we can conclude that vegetation biophysical variables derived from Sentinel-2 optical images are highly suitable for forest above ground biomass prediction.  Linear relationship between Observed AGB and Vegetation Indices extracted from Sentinel-2 MSI  Scatter plot showing goodness of t between the observed and predicted values of the above ground biomass Figure 11 Map of the predicted values of the above ground biomass of Yayu forest biosphere reserve Figure 12 Map of the predicted carbon stock in the biosphere reserve of Yayu forest