Prediction of Biomass in Dry Tropical Forests: An Approach on the Importance of Total Height in the Development of Local and Pan-tropical Models

ABSTRACT Dry tropical forests in arid lands cover large areas in Brazil, but few studies have reported their total biomass stock while showing the importance of height measurements and applying and comparing local and pan-tropical models. Here, we use a biomass data set of 500 trees and shrubs, covering 15 species, harvested under a forest management plan in the state of Pernambuco, Brazil. We develop local models and compare them with the equations traditionally applied to dry forests, showing the importance of tree height measurements. Because biomass relates non-linearly to the tree variables used as predictors, we fitted the models with a nonlinear least squares technique and evaluated them with a cross-validation procedure. Our above-ground biomass data set is best represented by the Schumacher-Hall equation, exp[3.5336 + 1.9126 × log(D) + 1.2438 × log(Ht)], which shows that height measurements are essential for estimating biomass accurately. The larger prediction errors observed when testing pan-tropical models on our data demonstrate the importance of developing new local models and indicate that careful consideration is needed before generic "pan-tropical" models without height measurements are applied to dry forests in Brazil.


Introduction
Dry tropical forests are a large reservoir of living above-ground biomass; they play a key role in the global carbon cycle and are widely recognized as one of the main ecosystems acting as a barrier to desertification (Abich et al., 2018; Guha et al., 2019; Salinas-Melgoza et al., 2018). Current efforts to quantify global above-ground biomass and carbon stocks in these forests address their dynamics and productivity (e.g., Althoff et al., 2018; Avitabile et al., 2016; Wagner et al., 2016), assess their conservation potential for mitigating climate change (e.g., Bastin et al., 2017; Salis et al., 2006) and examine biodiversity-ecosystem function relationships (Chave et al., 2005a, 2014; Hiltner et al., 2018). Almost all of these efforts depend on robust estimates of above-ground biomass (AGB) and carbon storage.
Estimating biomass with allometric models is therefore essential in forest inventories (e.g., Avitabile & Camia, 2018; Gonzalez De Tanago et al., 2018), especially in deforested dry forest areas: besides supporting estimates of local carbon dynamics (Althoff et al., 2018), it provides information for understanding carbon concentrations at different continental scales (Chave et al., 2014).
However, allometric models that relate biomass only to tree diameter can perform poorly in dry tropical forests (Chaturvedi & Raghubanshi, 2015; Návar et al., 2013), especially when compared with models that include tree height and/or basic wood density (Ali & Mattsson, 2018) and are parameterized at local, regional or continental scales (Abich et al., 2018; Chave et al., 2014).
Tree height is an important component of this allometric relationship, because tree biomass is partly a function of tree volume, which in turn depends on tree height, trunk basal area and trunk taper (Chave et al., 2005a; Sullivan et al., 2018). Although measuring tree height in forest inventories is not easy (see Larjavaara & Muller-Landau, 2013, for discussion), incorporating height is known to markedly improve individual-tree biomass estimates (e.g., Lima et al., 2017; Sampaio et al., 2010; Sampaio & Silva, 2005), with substantial effects at pan-tropical scales as well (Feldpausch et al., 2012; Sullivan et al., 2018). In practice, this can lead to tree height being incorporated into REDD+ (Reducing Emissions from Deforestation and Forest Degradation) carbon monitoring (Pelletier et al., 2017; Sullivan et al., 2018).
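To make the volume argument concrete, the sketch below approximates biomass as a form factor times wood density times the volume of a cylinder with the tree's basal area and height. It is a minimal illustration, assuming AGB ≈ F × WD × (π/4) × D² × Ht; the form factor F = 0.5 (a stand-in for trunk taper), the function name `volume_based_agb` and the tree values are hypothetical and are not parameters estimated in this study.

```python
import math

def volume_based_agb(d_cm: float, h_m: float, wd_g_cm3: float, form_factor: float = 0.5) -> float:
    """Approximate above-ground biomass (kg) as form factor x wood density x cylinder volume.

    form_factor = 0.5 is a hypothetical taper correction used only for illustration.
    """
    basal_area_m2 = math.pi / 4 * (d_cm / 100) ** 2   # trunk cross-section at the measured diameter
    cylinder_volume_m3 = basal_area_m2 * h_m           # cylinder with the same basal area and height
    wd_kg_m3 = wd_g_cm3 * 1000                         # convert g cm^-3 to kg m^-3
    return form_factor * wd_kg_m3 * cylinder_volume_m3

# Two hypothetical trees with the same diameter and wood density but different total heights
short_tree = volume_based_agb(d_cm=10, h_m=4, wd_g_cm3=0.8)
tall_tree = volume_based_agb(d_cm=10, h_m=8, wd_g_cm3=0.8)
print(round(short_tree, 2), round(tall_tree, 2))  # 12.57 25.13
```

Because both trees share the same diameter, a diameter-only model would assign them the same biomass, whereas the volume-based estimate doubles when height doubles.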
Several pan-tropical models have therefore been developed to estimate biomass in dry tropical forests, relating biomass to diameter alone (Návar, 2015); to diameter and basic wood density (Chave et al., 2005a, Type II.1 and Type II.3); or to diameter, height and basic wood density (Chave et al., 2005a, Type I.1 and Type I.5). However, these relationships can be expected to vary across environmental and spatial scales (Ubuy et al., 2018), suggesting that even these pan-tropical or local models lack the sophistication required for many applications (Rutishauser et al., 2013).
It would also be very useful to understand more generally how tree height affects the reliability of local-scale models in Brazilian dry tropical forests. In particular, ecologists and practitioners seeking more accurate forest biomass estimates would benefit from knowing whether locally derived models consistently outperform existing regional and pan-tropical models, and how much measuring tree height contributes to that advantage.
In this paper, we address these challenges by assembling a data set of 507 trees sampled for biomass and total height, and we use it to quantify how well locally derived models predict individual-tree biomass. A cross-validation approach was used to test the performance of different allometric models with and without height on data independent of those used for model fitting. The specific objectives were: (1) to examine how site-derived models with and without height affect prediction errors; and (2) to apply pan-tropical models with and without height to the local data, in order to assess the gain in biomass prediction when no local model is available, for areas with and without changes in their structure.

Study area and collected data
The data used in this study were collected from trees harvested in an area under forest management in the municipality of Floresta, State of Pernambuco, Brazil (8° 30ʹ 37" S, 37° 59ʹ 07" W). The vegetation is predominantly Caatinga (tropical dry forest), characterized by shrub-tree vegetation with cacti and a herbaceous stratum (Instituto Brasileiro de Geografia e Estatística-IBGE, 2012) (Figure 1).
The two study areas differ in their state of preservation. The northern area, called "transposition", covers approximately 50 ha, has 40 permanent plots of 400 m² (20 × 20 m) and is considered preserved (55 years with little anthropogenic disturbance). The southern area, called "correntão", also has 40 permanent plots of 400 m² (20 × 20 m); it was cleared with the correntão (chain-dragging) technique in 1987 for eucalyptus planting, but was then abandoned and has been regenerating for 29 years (Figure 2).
The above-ground biomass (trunk and branches), total and commercial heights, base diameter (0.30 m above ground level) and diameter at breast height (1.30 m above ground level) were measured for 507 trees belonging to 14 species and 2 genera. The harvested trees ranged from a minimum stem-base diameter (Db) of 1.9 cm up to the maximum found in the area, covering a wide range of diameters and heights. Table 1 summarizes the mean, maximum and minimum values of the measured variables by species. We used local values of basic wood density for each species and, when these were not available, the global database (Chave et al., 2014) available at https://datadryad.org/handle/10255/dryad.235. Where species-level data are lacking, several studies recommend using average wood density at the genus level for biomass assessment (Henry et al., 2010; Ubuy et al., 2018).
Here, n is the number of trees harvested, WD is the mean basic wood density of individual trees (g cm⁻³), Db is the diameter at stump height (cm), Ht is the total tree height (m), and AGB is the total above-ground tree biomass (kg).
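The wood density procedure described above can be expressed as a simple lookup chain. This is a sketch under stated assumptions: the tables `LOCAL_WD` and `GLOBAL_WD`, the taxon names and the density values are hypothetical placeholders, and the order of preference (local species value, then global species value, then genus mean) is one reasonable reading of the procedure rather than the authors' code.

```python
from statistics import mean
from typing import Optional

# Hypothetical basic wood densities (g cm^-3); names and values are placeholders, not study data.
LOCAL_WD = {"GenusA alpha": 0.85, "GenusB beta": 0.72}
GLOBAL_WD = {"GenusC gamma": 0.64, "GenusD delta": 0.90, "GenusD epsilon": 0.80}

def basic_wood_density(taxon: str) -> Optional[float]:
    """Return WD for a taxon: local species value, else global species value, else genus mean."""
    if taxon in LOCAL_WD:
        return LOCAL_WD[taxon]
    if taxon in GLOBAL_WD:
        return GLOBAL_WD[taxon]
    genus = taxon.split()[0]
    congeners = [wd for name, wd in {**LOCAL_WD, **GLOBAL_WD}.items() if name.split()[0] == genus]
    return mean(congeners) if congeners else None  # None: no information even at genus level

print(basic_wood_density("GenusA alpha"))  # local species value
print(basic_wood_density("GenusD sp."))    # genus-level mean of the two GenusD entries
```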

Model fitting and validation
After measurement, the biometric data were randomly divided into two subsets for fitting and validating the allometric models, using a hold-out cross-validation procedure that randomly selects N1 = 80% of the sample to fit the models and N2 = 20% to validate them.
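A reproducible version of the 80/20 hold-out split could look like the following. The function name `holdout_split`, the seed, the rounding of N1 and the use of tree indices instead of full measurement records are assumptions made for illustration.

```python
import random

def holdout_split(records, fit_fraction=0.8, seed=42):
    """Randomly split records into a fitting subset (N1) and a validation subset (N2)."""
    rng = random.Random(seed)                 # fixed seed so the split can be regenerated
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_fit = round(fit_fraction * len(shuffled))
    return shuffled[:n_fit], shuffled[n_fit:]

tree_ids = list(range(1, 508))                # 507 harvested trees, identified here only by index
fit_set, validation_set = holdout_split(tree_ids)
print(len(fit_set), len(validation_set))      # 406 101
```

With 507 trees this gives N1 = 406 fitting trees and N2 = 101 validation trees; fixing the seed lets the same partition be reused whenever the models are refitted.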
Three single-entry allometric models (with base diameter as the only explanatory variable) and five double-entry models (with Db and total height as explanatory variables) were tested; these are referred to hereafter as locally derived models (Table 2).
where βi are the parameters to be estimated and εi is the random error. We estimated the model parameters by ordinary least squares (OLS) and tested their significance with a t-test at the 0.05 level. The parameters were calculated from the complete data set of measured trees and are assumed to be the true parameters that represent the allometry.
Table 2. Allometric models tested to estimate above-ground biomass in a dry tropical forest in the semiarid region of Pernambuco, Brazil (columns: Author, Model, Variables; entries include Husch and Schumacher-Hall, 1933).
However, a 20% sample was extracted from the complete data set to assess the predictions and calculate the bias of the estimates obtained with the fitted parameters (hold-out). The purpose of this procedure is to estimate a set of evaluation statistics by cross-validation. The estimates are obtained by repeating a test cycle N times, where N is the size of the data set: in each repetition, one of the N observations is left out to serve as the test set, while the remaining N − 1 observations are used to fit the model (James et al., 2013). The process is repeated until each of the N observations has been set aside once, and the final estimates are the average of the N scores obtained in the different repetitions. All computations and analyses were performed using the R® statistical software (R Core Team, 2019). Local equations with and without height were selected to verify the influence of this variable on tree-level biomass predictions. All fitted equations were compared using the following statistical criteria:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(AGB_i - \widehat{AGB}_i\right)^2} \qquad (1)$$
where $AGB_i$ is the observed above-ground biomass of tree i, $\widehat{AGB}_i$ is the estimated above-ground biomass of tree i, n is the total number of observations, and $\overline{AGB}$ is the mean above-ground biomass.
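The leave-one-out procedure and the RMSE criterion can be sketched as follows. For brevity, the sketch refits only a single-entry log-log model, ln(AGB) = b0 + b1 ln(Db), by OLS; the tree values are hypothetical, and pooling the N squared errors before taking the square root is our assumption about how the N scores are averaged.

```python
import math

def rmse(observed, predicted):
    """Root mean square error: sqrt(mean((AGB_i - AGB_hat_i)^2))."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def fit_loglog(db, agb):
    """OLS fit of ln(AGB) = b0 + b1 ln(Db); returns (b0, b1)."""
    x = [math.log(d) for d in db]
    y = [math.log(a) for a in agb]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b1 * mx, b1

def loocv_rmse(db, agb):
    """Leave-one-out CV: refit without tree i, predict tree i on the original scale, pool errors."""
    predictions = []
    for i in range(len(db)):
        b0, b1 = fit_loglog(db[:i] + db[i + 1:], agb[:i] + agb[i + 1:])
        predictions.append(math.exp(b0 + b1 * math.log(db[i])))
    return rmse(agb, predictions)

# Hypothetical trees: base diameter (cm) and measured AGB (kg)
db = [2.0, 3.5, 5.0, 7.5, 10.0, 14.0, 18.0, 22.0]
agb = [1.1, 2.9, 6.8, 14.0, 27.5, 52.0, 90.0, 131.0]
print(f"LOOCV RMSE: {loocv_rmse(db, agb):.2f} kg")
```

Each tree is predicted by a model that never saw it, so the resulting RMSE reflects out-of-sample error rather than goodness of fit.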

Allometric predictions
The local equations developed in this study were compared with generic local and pan-tropical equations developed for dry forest areas that use diameter alone, or diameter combined with height (Ht) and wood density (WD), as predictor variables (Table SI-1).
These equations, developed for dry tropical forests and widely used to assess AGB and carbon stocks in dry tropical regions (Abich et al., 2018), were applied to the sample trees of this study. We compared the observed mean AGB with the predicted mean AGB.
The model error (RSE) and the tree-level coefficient of variation (CV) were defined for each equation as follows:
where $AGB_{ALT,i}$ and $AGB_{REF,i}$ are the biomass estimates of tree i obtained from the alternative equations (pan-tropical, with and without height) and from the reference equations (local models with and without height), respectively. A large CV(i) is acceptable as long as the bias is low, because the model is generally applied to many trees within a site, so random errors tend to cancel each other out (Chave et al., 2014).
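The argument that tree-level random errors cancel when many trees are summed can be checked with a small simulation. This is a sketch under assumptions: the reference biomass values are random, the alternative equation is given purely random errors of up to ±50% per tree, and the per-tree relative error (AGB_ALT − AGB_REF)/AGB_REF is used as an illustrative error measure, not necessarily the RSE definition used in the study.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed for a reproducible illustration

# Hypothetical reference biomass (kg) for 500 trees, and an alternative equation whose
# tree-level errors are purely random (uniform, up to +/-50 % per tree, zero mean).
agb_ref = [rng.uniform(1, 200) for _ in range(500)]
agb_alt = [ref * (1 + rng.uniform(-0.5, 0.5)) for ref in agb_ref]

def relative_error(alt, ref):
    return (alt - ref) / ref

# Tree level: typical size of the per-tree relative error
mean_abs_tree_error = mean(abs(relative_error(a, r)) for a, r in zip(agb_alt, agb_ref))

# Stand level: relative error of the summed biomass
stand_error = relative_error(sum(agb_alt), sum(agb_ref))

# Same trees with an added systematic +10 % bias, which does not cancel out
agb_biased = [1.10 * a for a in agb_alt]
biased_stand_error = relative_error(sum(agb_biased), sum(agb_ref))

print(f"mean |tree error| = {mean_abs_tree_error:.1%}")
print(f"stand error = {stand_error:.1%}, with +10 % bias = {biased_stand_error:.1%}")
```

The random tree-level errors average out in the stand total, while the systematic 10% overestimate survives aggregation, which is why a large CV(i) is tolerable only when the bias is low.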
We also calculated the mean prediction error (bias) and evaluated the models using the pseudo-R², as follows:
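One way to compute these two criteria is sketched below. The definitions are assumptions: bias is taken as the mean of observed minus predicted AGB (sign conventions vary between studies), and pseudo-R² as 1 − SS_res/SS_tot, a common choice for nonlinear or back-transformed models; the example values are hypothetical.

```python
from statistics import mean

def bias(observed, predicted):
    """Mean prediction error in kg, assumed here to be mean(observed - predicted)."""
    return mean(o - p for o, p in zip(observed, predicted))

def pseudo_r2(observed, predicted):
    """1 - SS_residual / SS_total, computed on the original (untransformed) biomass scale."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

observed = [2.0, 5.0, 9.0, 14.0]     # hypothetical measured AGB (kg)
predicted = [2.5, 4.5, 9.5, 13.0]    # hypothetical model predictions (kg)
print(bias(observed, predicted), round(pseudo_r2(observed, predicted), 4))  # 0.125 0.9784
```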

Results
Among the local equations, those obtained from the Schumacher-Hall (1933) and Chave et al. (2005b) logarithmic models (Model I, 5 and 6) performed best. Both equations produced similar values of RMSE, RSE, bias and CV, but the Schumacher-Hall model was slightly superior, explaining more than 90% of the total variance and having a lower AIC (Table 3). We noticed that the parameter related to basic density in the Chave et al. The Schumacher-Hall equation selected for biomass prediction in the local dry forest indicates that log-transforming diameter and height reduces the bias of the estimate for a given tree mass. These results support the use of regression methods to build the models and estimate their parameters, and suggest that it is more parsimonious to retain a double-entry allometric model, in this case the Schumacher-Hall model obtained for the area (Figure 3). The residual plots of the models show a trend at the highest predicted biomass values and a few outliers at the lowest values. However, these outliers suggest a curvature possibly caused by model error rather than by the data selected for the fit (e.g., trees that are unusually tall or short for their diameter).
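The comparison between a single-entry model and the double-entry Schumacher-Hall model by AIC can be sketched on synthetic data. Following the methods, both models are fitted by OLS on the log scale; the helper names `ols` and `aic_log_ols`, the "true" coefficients, the noise level and the simulated trees are hypothetical, and AIC is computed as n ln(RSS/n) + 2k with additive constants dropped, which is valid only for comparing models fitted to the same response (here ln AGB).

```python
import math
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y, via Gauss-Jordan elimination."""
    p = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)] + [sum(r[i] * v for r, v in zip(X, y))]
         for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))   # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        a[c] = [v / a[c][c] for v in a[c]]                    # scale pivot row so a[c][c] == 1
        for r in range(p):
            if r != c:
                a[r] = [vr - a[r][c] * vc for vr, vc in zip(a[r], a[c])]
    return [row[p] for row in a]

def aic_log_ols(X, y):
    """Fit by OLS and return (coefficients, AIC = n ln(RSS/n) + 2k) for the log-scale model."""
    beta = ols(X, y)
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    n, k = len(y), len(beta)
    return beta, n * math.log(rss / n) + 2 * k

# Synthetic trees with hypothetical "true" coefficients (not the values estimated in this study)
rng = random.Random(7)
db = [rng.uniform(2, 25) for _ in range(200)]     # base diameter (cm)
ht = [rng.uniform(2, 10) for _ in range(200)]     # total height (m), varies independently of Db
ln_agb = [-2.0 + 2.0 * math.log(d) + 1.0 * math.log(h) + rng.gauss(0, 0.1) for d, h in zip(db, ht)]

# Single-entry: ln AGB = b0 + b1 ln Db;  Schumacher-Hall: ln AGB = b0 + b1 ln Db + b2 ln Ht
beta_d, aic_d = aic_log_ols([[1.0, math.log(d)] for d in db], ln_agb)
beta_dh, aic_dh = aic_log_ols([[1.0, math.log(d), math.log(h)] for d, h in zip(db, ht)], ln_agb)
print(f"AIC without height: {aic_d:.1f}; with height: {aic_dh:.1f}")
print("Schumacher-Hall coefficients:", [round(b, 3) for b in beta_dh])
```

Because height varies independently of diameter in the simulated trees, the single-entry model cannot absorb it and its AIC is far higher. A nonlinear least-squares fit on the original biomass scale, as mentioned in the abstract, would weight large trees more heavily and could yield different coefficients.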
Although diameter alone explained much of the variation in biomass for the pan-tropical models, including height and basic wood density improved the predictions considerably (see Table 4). For small-diameter trees, the biomass estimates of the local and pan-tropical models varied less, and models including height, or height and basic wood density, predicted them similarly. However, the models diverge strongly across diameter classes when height is not included, resulting in considerably higher percentage biases than those of the models that include height (Figure 4).
Compared with the equations developed in this study, the pan-tropical equations validated at the tree level showed no substantial differences in mean errors (RMSE and bias), except for the local Kopezky-Gehrhardt equation (Table 3) and the pan-tropical equation of Brown et al. (1989) (Table 4), both of which use diameter as the only predictor variable. The highest mean CV(i) across all comparisons, 378%, was obtained for the equation that uses wood density and diameter (Chave et al., 2005a, Type II.1), although the Brown et al. (1989) equation showed the largest absolute error at the tree level (bias = 7.57 kg). The lowest mean bias, 0.32 kg, was obtained with the Sampaio and Silva (2005) equation, which has the same structure as the locally developed model. For the pan-tropical equations, these results reflect not only absolute values but also a large percentage variation, especially when wood density is included together with diameter and height. Interestingly, however, the RMSE values of these equations practically double when compared with those of equations from other tropical regions that use only diameter, or diameter and basic wood density.
The locally developed equations tend to predict AGB more homogeneously and with a smaller error range for smaller-diameter trees, diverging slightly for trees with Db > 15 cm (bias > 30%; Figure 4c). Pan-tropical equations and local equations from other regions show larger differences across diameter classes, most visibly for smaller trees and mainly for the Brown et al. (1989) equation (Figure 4d). Except for the equations of Brown et al. (1989) and Barreto et al. (2018), the mean biomass measured in the field showed no visible differences from the estimates of the best local equations developed here, the local equations of Dalla-Lana, or the pan-tropical equation of Návar (2015).
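Percentage bias by diameter class, as summarized in Figure 4, can be tabulated with a short helper. The class limits other than 15 cm, the sign convention 100 × (predicted − observed)/observed, the helper names and the example trees are hypothetical.

```python
from collections import defaultdict
from statistics import mean

EDGES = (0, 5, 10, 15)   # class limits in cm; only the 15 cm limit comes from the text

def diameter_class(d):
    """Label a base diameter (cm) with its class, e.g. '5-10 cm' or '>=15 cm'."""
    for lo, hi in zip(EDGES, EDGES[1:]):
        if lo <= d < hi:
            return f"{lo}-{hi} cm"
    return f">={EDGES[-1]} cm"

def percent_bias_by_class(db, observed, predicted):
    """Mean percentage bias, 100 * (predicted - observed) / observed, per diameter class."""
    errors = defaultdict(list)
    for d, o, p in zip(db, observed, predicted):
        errors[diameter_class(d)].append(100 * (p - o) / o)
    return {label: mean(e) for label, e in errors.items()}

# Hypothetical trees and predictions from some alternative equation
db = [3.0, 4.0, 12.0, 18.0, 20.0]
observed = [2.0, 4.0, 40.0, 100.0, 150.0]
predicted = [2.2, 3.8, 44.0, 130.0, 195.0]
print(percent_bias_by_class(db, observed, predicted))
```

Averaging within classes exposes size-dependent bias that a single overall mean would hide, for example a model that is nearly unbiased for small trees but overestimates large ones.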

Discussion
Logarithmic models have been widely used in the study of biometric relationships, mainly for developing biomass equations in dry tropical forests (Brown et al., 1989; Packard & Boardman, 2008; Ubuy et al., 2018). Our results are also in line with studies carried out in Brazilian dry tropical forests (Brahma et al., 2018; Roitman et al., 2018) and in dry tropical forests of other regions (Abich et al., 2018; Návar-Cháidez, 2010).
Base diameter and total height were generally the best biometric predictors of AGB. All the statistical evaluation criteria indicated that the double-entry equations give more precise predictions, especially the equation obtained from the Schumacher-Hall model. These results support including height in biomass estimation, because single-entry models implicitly assume that trees of the same diameter have the same height, which does not hold in dry tropical forests (Helmer et al., 2010; Salas-Morales et al., 2018). The functional form of the fitted equation is biologically consistent, especially with the inclusion of height; we therefore conclude that tree height is an important biomass predictor, particularly when the data include different species (Abich et al., 2018).
Although height measurements are more expensive and time-consuming in forest inventories, the use of height-diameter models is recommended in tropical forests (Feldpausch et al., 2011, 2012; Sullivan et al., 2018). Models that include tree height improve biomass estimates in many tropical forests and thereby support more accurate carbon estimates (Rutishauser et al., 2013).
However, it is important to ensure that the predominant tree forms are represented among the sample trees used to develop allometric and hypsometric models (Duncanson et al., 2015). With appropriate sample trees and small measurement errors, adding height as an explanatory variable is likely to improve the accuracy of biomass predictions, despite the uncertainty introduced by using a height-diameter model (Larjavaara et al., 2013; Sullivan et al., 2018).
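Where height is not measured on every tree, a height-diameter (hypsometric) model can supply it. The sketch below fits a power-law model, ln(Ht) = a + b ln(Db), by least squares; the power-law form is only one common option, the helper names are hypothetical, and the sample values are synthetic, generated from a known curve so the fit can be checked. Heights imputed this way carry their own error, which propagates into the biomass estimate.

```python
import math

def fit_power_hd(db, ht):
    """Fit ln(Ht) = a + b ln(Db) by simple least squares and return (a, b)."""
    x = [math.log(d) for d in db]
    y = [math.log(h) for h in ht]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def predict_height(a, b, d):
    """Back-transform the fitted log-log model to a height (m) for diameter d (cm)."""
    return math.exp(a + b * math.log(d))

# Synthetic subsample: heights generated from Ht = sqrt(2) * Db**0.5 (hypothetical curve)
db_sample = [2.0, 4.0, 8.0, 16.0]
ht_sample = [math.sqrt(2) * d ** 0.5 for d in db_sample]
a, b = fit_power_hd(db_sample, ht_sample)

# Impute heights for trees measured only for diameter
unmeasured_db = [6.0, 10.0, 12.0]
imputed_ht = [predict_height(a, b, d) for d in unmeasured_db]
print([round(h, 2) for h in imputed_ht])
```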
The biomass predictions of the best pan-tropical models suggest that their shape and trunk-profile parameters (β1 and β2) did not differ substantially from the parameter estimates of the best local model. This suggests that biomass allometry does not vary much at the pan-tropical scale and that the local model of this study could be applied in other dry forests in the tropics. These results may also reflect similarities in vegetation type across different land-use histories, due to bioclimatic conditions and soil types, or to intrinsic tree characteristics such as physiology, regrowth, adaptive development and trunk bifurcation in multi-stemmed species (Chave et al., 2014; Dexter et al., 2018).
This work seeks to fill a gap concerning the validity of allometric equations developed for dry tropical forests, although previous research has already suggested developing individual equations for species and regions (Chave et al., 2014; Lima et al., 2017). Beyond the importance of height measurements, the question discussed in this paper is whether, where no local allometric equation is available, it is better to use generic equations developed elsewhere or to develop location-specific equations.
Another important consideration in developing equations is intra- and interspecific variation, such as variation in basic wood density (Ali & Mattsson, 2018; Bastin et al., 2015; Henry et al., 2010), tree canopy (Bastin et al., 2014; L.I. Duncanson et al., 2010; Gara et al., 2014; Salas-Morales et al., 2018), leaf area index and height profiles (Cushman & Kellner, 2019; Greaves et al., 2015; Helmer et al., 2010; Wagner et al., 2016). Because these factors may in turn be influenced by changes in structural parameters such as richness, density, frequency and dominance, developing site-specific and, even more so, species-specific allometric equations is fundamental for understanding the concentration of carbon stocks (Abich et al., 2018).
However, developing allometric models is not a trivial task: the limiting factor has always been the destructive sampling of trees needed to fit and select the models. Highly accurate volume and biomass estimates of individual trees are increasingly available through Lidar technology (Estornell et al., 2011, 2012; Hildebrandt & Iost, 2012). These estimates do not require destructive sampling and can be obtained systematically in the field (Duncanson & Dubayah, 2018; Duncanson et al., 2017). With a proper sampling design, such a system could collect in situ biometric data from trees along environmental gradients, offering a potential solution to outstanding problems related to forest biomass and carbon stocks.

Conclusions
The statistical results of the selected models were satisfactory, and the evaluations revealed that equations including height give more accurate biomass predictions than models without height; this also holds for pan-tropical models with height when no local equation is available. The prediction errors observed when testing and validating local and "pan-tropical" models without height demonstrate the importance of developing new models that include total tree height. The large prediction errors of "pan-tropical" models without height also indicate that careful consideration is needed before applying them to dry forests elsewhere in northeastern Brazil. As no other appropriate models are available, we recommend applying this set of models to estimate biomass over large areas in the region.
After fitting and validation, our above-ground biomass data set is best represented by the Schumacher-Hall equation: $AGB = \exp[3.5336 + 1.9126 \times \log(D) + 1.2438 \times \log(Ht)]$. Developing local allometric equations is appropriate for improving AGB estimates in the dry tropical forests of Pernambuco, Brazil, and is a vital step in the ecosystem assessment of forest resources. This forest is widely recognized as one of the main ecosystems for carbon sequestration and as a barrier to desertification, and local equations can serve as important tools for carbon credit and payment for environmental services initiatives. As revealed in this study, base diameter and height are the best biomass predictors, explaining more than 90% of the variation in AGB for multi-species data. The comparison of equations showed that the pan-tropical equation for dry forests developed by Chave et al. (2005b) can be used to evaluate forest biomass when site- and species-specific equations are absent. Thus, the equations can be used for carbon accounting in REDD+ and in incentive projects that initiate forest development and evaluate ecosystem services.