Using Spatial Regression Models in Identifying the Drivers of Forest Structure in the Hyrcanian Forests of Iran

doi:10.21203/rs.3.rs-41187/v1

This work is licensed under a CC BY 4.0 License

Version 1

Background: Understanding the relationships between forest structure, in particular attainable height, and the environment is important for sustainable forest management. Similarly, modeling structural attributes improves our understanding of forest growth dynamics and may identify key drivers of long-term changes in the forest ecosystem. Due to the inherent complexity of these relationships, quantification of some drivers of forest growth is often not available, resulting in spatially autocorrelated errors of the regression model.

Methods: To explore the tree height-environment relationships of oriental beech, we compared the performance of a standard regression model (multiple linear regression, MLR) to those accommodating a spatial correlation structure, specifically a Generalized Least Squares model with exponential correlation structure (GLS) and three variations of the Simultaneous Autoregressive Model (SAR): the spatial lag model (SLM), the spatial Durbin model (SDM) and the spatial error model (SEM). Across 127 circular sample plots of 0.1 ha each in the primeval World Heritage Hyrcanian Forests of Iran, we collected data on tree height and on edaphic and topographic variables. Within each plot, the height of all trees with DBH ≥ 7 cm was measured.

Results: SAR and GLS models reduced spatial autocorrelation of model residuals and improved model fit, with both SDM and SEM slightly superior to the SLM in removing spatial autocorrelation in the model residuals. SDM performed better than SEM in terms of RMSE and adjusted R².

Conclusions: Although SAR-based models performed marginally better than GLS, we still recommend GLS for spatial analyses because it is easier to implement and use than SAR models. However, when computation time is a concern, SAR-based models can be more useful because of their faster execution.

Keywords: spatial autocorrelation; Hyrcanian forests; multiple linear regression model; simultaneous autoregressive model; generalized least squares

In primeval forests, tree height is one of the most important forest structural parameters for both forest ecology and forest utilization: forest tree height, forest productivity and carbon sequestration are strongly related (Lou et al. 2016). Tree height also correlates positively with terrestrial plant diversity at various spatial scales (Lindenmayer et al. 2012; Slik et al. 2013; Marks et al. 2016; Gatti et al. 2017; Lutz et al. 2018). Hence, understanding which abiotic factors affect forest height, and how, will be indispensable for connecting ecological theory with ecosystem management in an era of global change (Fricker et al. 2019). In spite of significant local variations in tree height, the environmental predictors determining it have mostly been assessed at large scales (Tao et al. 2016; Zhang et al. 2016).

Sustainable forest management requires an understanding of the relationships between forest structure attributes such as tree height and environmental parameters for multiple reasons. Modeling the relationships between structural attributes and the environment can be used to understand the drivers behind variation in forest structure and to identify key indicators for monitoring long-term changes in the forest ecosystem. Understanding the variations in tree growth is also important for sustainable forest management, in order to tailor harvesting to growth dynamics (Coomes and Allen 2007; Pretzsch 2009). A large suite of non-linear, interacting factors, including climate, topography, soil conditions and competition for resources, influence the growth of forest tree species (Oliver and Larson 1996). Topography is a key driving factor in forest structure and composition, constraining the local nutrient and hydraulic conditions under which trees grow (Jucker et al. 2018). These effects are also reflected in forest structure (Tateno and Takeda 2003). The variation in topography constitutes a resource gradient and thus provides an opportunity to study the relative importance of soil nutrient availability in the development of forest structure through resource competition between tree species (Tateno and Takeda 2003).

A variety of regression techniques, including machine learning algorithms, have been used to model the complex relationships between biological response variables (e.g. the growth of individuals or the structure of ecological communities) and environmental predictors, but for interpretable results, multiple linear regression (MLR) is still the go-to method (e.g. Currie, 1991; Kerr and Packer, 1997; Rahbek and Graves, 2001). A critical weakness of all these techniques is that they assume independent data points, an assumption that is violated when spatial autocorrelation (SAC) is not taken into account (Keitt et al. 2002).

In recent years, spatial regression analysis has become an important tool in both forestry and ecology for improving the description of spatial patterns in plant or other organism groups or communities (e.g. Beale et al. 2009, Lu and Zhang, 2011, Shestasova et al. 2019). Several extrinsic factors and intrinsic processes may lead to spatial structure in forest stands (Moeur 1993; Rouvinen and Kuuluvainen 1997). Spatial autocorrelation describes the pattern in which observations become less similar the further apart they are in space (Fortin and Dale 2005). The presence of spatial autocorrelation in model residuals violates the independence assumption and can inflate type I errors (Kühn 2007). This, in turn, can result in the selection of unimportant predictors and poorly estimated regression coefficients in species distribution studies (Lennon, 2000; Dormann et al., 2007). Spatial autocorrelation is an important issue in ecology because it is a general property of ecological variables that are measured over geographic space (Legendre and Legendre 2012).

There are two types of causes of spatial autocorrelation, depending on which kind of process generates the spatial structure: endogenous or exogenous (Fortin and Dale 2005; Legendre and Legendre 2012). In the case of endogenous processes, factors related to the biology of the species under consideration (for example, conspecific attraction, dispersal limitation, demography, interspecific interactions, colonial breeding, home-range size, host availability, predation or parasitization risk, and so forth) generate the spatial pattern (Lichstein et al., 2002; Dormann, 2007). Exogenous processes, on the other hand, are drivers of the response variable that are themselves spatially autocorrelated (e.g. topography varies smoothly and is hence autocorrelated); they imprint their spatial autocorrelation onto the response, in what is called (induced) spatial dependence (Fortin and Dale 2005). Many studies have shown the importance of spatial autocorrelation in studying species-environment relationships (Rahbek and Graves 2001; Dormann et al. 2007; Miller et al. 2007; Kissling and Carl 2008; Beguería and Pueyo 2009; Beale et al. 2010; Dahlin et al. 2014; Teng et al. 2018). Analyses accounting for spatial autocorrelation provide a more detailed description of spatial structure in species performance data and lead to a better understanding of the underlying ecological processes (Diniz-Filho et al. 2003).

The northern forests of Iran, called Hyrcanian or Caspian forests, are important sources of genetic variation, biodiversity, commercial woody products, and various environmental services (Ahmadi et al. 2017). These forests, inscribed as a World Heritage site in 2019, cover an area of about 1.85 million ha. Hyrcanian forests account for 15% of the total Iranian forests and 1.1% of the country’s area. These forests range from sea level up to an altitude of 2800 m and comprise various forest types, harboring approximately 80 woody species (trees and shrubs). Oriental beech (Fagus orientalis Lipsky) forests in the Hyrcanian ecoregion account for about 18% of the total forest area, 30% of the standing volume and 24% of the stem number (Sagheb Talebi et al. 2014).

Because oriental beech is one of the main timber species in the Hyrcanian forests, our overall goal in this paper was to explore the tree height-environment relationships of this species. We hypothesize that statistical models considering spatial information provide both a better description and a sounder statistical framework than non-spatial models. We compare the performance of multiple linear regression (MLR), as a benchmark, with alternative models with a spatial correlation structure for modelling tree height in relation to edaphic and topographic predictors in the study area.

Study Area

The study area was a mid-elevation (1000–1500 m a.s.l.) natural oriental beech (Fagus orientalis Lipsky) forest in the Hyrcanian ecoregion, northern Iran. The study area of approximately 450 ha is located between 36° 31' 56" N and 36° 32' 11" N latitudes and 51° 47' 49" E and 51° 47' 56" E longitudes (Fig. 1). The minimum temperature in December is 6.6 °C and the maximum temperature of 25 °C occurs in June. The mean annual precipitation is 1500 mm at the Nowshahr city meteorological station, which is located 40 km from the study area. The bedrock is limestone-dolomite, leading to soils with a silty-clay-loam texture (Ahmadi et al. 2013). The forests of the study area are mixed and uneven-aged and are dominated by Fagus associated with other species such as Carpinus betulus, Acer velutinum, Parrotia persica and Quercus castaneifolia. There is no history of harvesting in these forests, which are managed as a protected area.

Data Collection

Data were collected from a total of 127 circular temporary sample plots of 0.1 ha each, using a random-systematic network laid out in the field. Specifically, we set up a 0.1 km × 0.1 km grid across the study region and selected 130 grid intersection plots at random. The sample plots were established in sites with no evidence of disturbance to minimise the noise in the response variable, which removed three plots from the initial selection. Within each plot, the diameter of all trees with DBH > 7 cm and the total height of all beech trees were measured using a caliper and a Vertex IV (Haglöf, Sweden), respectively.

Since environmental changes to the soil are more strongly reflected in the topsoil, five soil samples of the top 10 cm below the litter were randomly taken within each plot using a core soil sampler. The soil samples were then mixed and analyzed in the laboratory. Roots, shoots and stones were separated by hand and discarded, and the air-dried soil samples were then sieved at 2 mm mesh size. Soil organic matter was determined using the Walkley-Black method (Allison 1975). Total nitrogen was measured in the laboratory by the Kjeldahl method (Bremner and Mulvaney 1982). The available K was determined with a flame atomic absorption spectrophotometer (AA500F, PG Instruments Ltd, China). The available P was determined using the Olsen method (Homer and Pratt 1961). The Bouyoucos hydrometer method was used for determining the soil texture (Bouyoucos 1962). Soil pH and bulk density (at air-dried moisture content) were determined using a pH meter and the method of Plaster (1985), respectively. Site factors such as altitude, slope percent and aspect were recorded at each sample location. Aspect (θ), as the azimuth in degrees measured from true north, was then converted to a topographic radiation index using the following equation: TRASP = [1 – cos((π/180)(θ – 30))]/2 (Alavi et al. 2019). Environmental variables collected as a basis for modelling are summarized in Table 1.
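The aspect transformation can be checked numerically. The following is a minimal Python sketch of the TRASP equation given above; the function name and the example aspects are illustrative assumptions and are not taken from the study's own scripts.

```python
import math


def trasp(aspect_deg: float) -> float:
    """TRASP index for an aspect given as azimuth in degrees from true north.

    Implements TRASP = [1 - cos((pi/180) * (theta - 30))] / 2.
    """
    return (1.0 - math.cos(math.radians(aspect_deg - 30.0))) / 2.0


# Illustrative aspects only (not plot data from the study).
for aspect in (30, 120, 210, 300):
    print(aspect, round(trasp(aspect), 3))
```

By construction the index is bounded between 0 and 1, with the minimum at an aspect of 30° and the maximum at 210°, which matches the range reported for TRASP in Table 1.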

Table 1

Summary of the continuous site characteristics in the sample plots. TRASP, OC, OM and N refer to solar radiation aspect, organic carbon content, organic matter and nitrogen, respectively.
Variable	Mean	Standard Deviation	Minimum	Maximum
Height (m)	27.65	4.27	19.49	37.13
Altitude (m a.s.l)	1229	80.63	1067	1445
Slope (%)	26.2	12.37	3.3	72.6
TRASP	0.30	0.28	0	1
Sand (%)	27.34	12.25	4	62
Clay (%)	35.85	12.04	0	64
Silt (%)	36.81	7.68	16	56
OC (%)	3.49	1.82	1.13	7.76
OM (%)	6.01	3.13	1.96	13.38
N (%)	0.33	0.12	0.13	0.69
Phosphorus (mg kg⁻¹)	14.13	8.79	4	37.84
K (mg kg⁻¹)	99.2	63.47	6	224
C-N-ratio	10.29	3.26	4.46	20.18
Bulk Density (g cm⁻³)	1.52	0.25	1.01	2.05
pH	5.99	0.54	5.1	7.53
Saturation Moisture (%)	48.74	2.62	43.64	55.42

Statistical analysis

Collinearity among environmental predictors was tested by hierarchical cluster analysis using squared Spearman correlations with the Hmisc package (Harrell et al., 2018) in the statistical software R (R Core Team 2018). The variables percentage sand, percentage carbon, percentage organic matter and percentage saturation were removed from the set of predictors because they were highly correlated with the retained predictors (altitude, slope, TRASP, clay, silt, nitrogen, phosphorus, K, C-N-ratio, bulk density and pH) and were deemed ecologically least relevant based on the authors' expert knowledge. The linear model with quadratic and first-order interaction terms was simplified by backward stepwise selection based on the Bayesian Information Criterion (BIC), which considers both goodness-of-fit and model complexity and penalizes model complexity more than the AIC does (Burnham and Anderson 2002). The model selection step was carried out on the MLR to identify a minimal adequate model structure for all model types. The residuals of the MLR were normally distributed (for details see supplementary information).
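To illustrate why the BIC penalizes complexity more strongly than the AIC for a data set of this size, the standard definitions can be sketched in Python as follows. The log-likelihood value and the parameter count are hypothetical placeholders; only the sample size of 127 plots comes from the text.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_lik


n_plots = 127          # number of sample plots reported in the text
log_lik = -300.0       # hypothetical log-likelihood, for illustration only
k_params = 10          # hypothetical number of estimated parameters

# Penalty added per extra parameter under each criterion.
print("AIC penalty per parameter:", 2.0)
print("BIC penalty per parameter:", round(math.log(n_plots), 2))
print("AIC:", aic(log_lik, k_params), "BIC:", round(bic(log_lik, k_params, n_plots), 2))
```

With n = 127 the BIC charges ln(127) ≈ 4.84 per parameter against 2 for the AIC, so backward selection on the BIC tends to retain fewer terms.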

Simultaneous autoregressive (SAR) models assume that the response variable at each location i, conditional on the value of explanatory variables at i, depends on the response at neighboring locations j (Haining 2003). SAR models extend the linear regression model with an additional term that incorporates the spatial autocorrelation structure of the observations. In simultaneous autoregressive models, the neighborhood relationship is formally expressed in an n × n matrix of spatial weights (W), in which elements (w_ij) represent a measure of the connection between locations i and j. In the present study, we used three different SAR models: the spatial error model (SEM), the spatial lag model (SLM) and the spatial Durbin model (SDM, also called the spatial mixed model). The SEM assumes that the autoregressive process is in the error term. This is most likely the case when spatial autocorrelation is not completely explained by the predictor variables (Diniz-Filho et al. 2003). The SAR spatial error model takes the form

Y = Xβ + λWµ + ε ,

where λ is the spatial autoregression coefficient, W is the spatial weights matrix, µ is the spatially dependent error term, β is a vector representing the slopes associated with the explanatory variables in the original predictor matrix X, and ε represents the (spatially) independent errors.

The SLM assumes that the autoregressive process affects only the response variable. It takes the form

Y = Xβ + ρWY + ε

where ρ is the autoregression parameter, and the remaining terms are as above.

If spatial autocorrelation affects both the response and the explanatory variables, the SDM is appropriate; it takes the form (Anselin and Griffith 1988):

Y = Xβ + ρWY + WXγ + ε

Here a new term (WXγ) appears in the model, in which γ is the vector of autoregression coefficients of the spatially lagged explanatory variables (WX).
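The three SAR variants are closely related, which may help when comparing their fits. The short derivation below uses the notation of the equations above, writes the spatially dependent error term (µ in the SEM equation) as u, and assumes that (I − λW) is invertible. It is a standard textbook rearrangement and is not taken from the original analysis.

```latex
% SEM with the error process written explicitly
Y = X\beta + u, \qquad u = \lambda W u + \varepsilon
\;\Longrightarrow\; (I - \lambda W)\,u = \varepsilon

% Pre-multiplying Y = X\beta + u by (I - \lambda W)
(I - \lambda W)\,Y = (I - \lambda W)\,X\beta + \varepsilon
\;\Longrightarrow\;
Y = \lambda W Y + X\beta - \lambda W X\beta + \varepsilon

% Comparison with the SDM: Y = X\beta + \rho W Y + W X\gamma + \varepsilon
\text{SEM: } \rho = \lambda,\ \gamma = -\lambda\beta
\qquad\qquad
\text{SLM: } \gamma = 0
```

Under these restrictions both the SEM and the SLM are special cases of the SDM, which is consistent with the SDM being the structurally most flexible of the three.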

In all SAR models, the neighborhood needs to be provided as input, which we consider as leaving room for arbitrary decisions (as discussed in Bauman et al. 2018 for eigenvector approaches), particularly compared to the generalized least squares model (GLS). GLS is, in principle, just the name of the algorithm used for fitting regression models with pre-specified error covariances, and it is hence also used to fit the SAR models (Pinheiro and Bates 2000). Typically, however, the term refers to an approach in which the spatial covariance structure is modelled assuming a simple distance-decay, e.g. exponential or linear (Dormann et al., 2007).
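As an illustration of what a distance-decay correlation structure looks like, the sketch below codes three common textbook forms (exponential, Gaussian and spherical) without a nugget effect. The function names, the range value of 150 m and the example distances are assumptions made for illustration; they are not fitted values from this study.

```python
import math


def cor_exponential(d: float, range_: float) -> float:
    """Exponential decay: exp(-d / range)."""
    return math.exp(-d / range_)


def cor_gaussian(d: float, range_: float) -> float:
    """Gaussian decay: exp(-(d / range)^2)."""
    return math.exp(-((d / range_) ** 2))


def cor_spherical(d: float, range_: float) -> float:
    """Spherical decay: 1 - 1.5 (d/range) + 0.5 (d/range)^3 below the range, 0 beyond it."""
    if d >= range_:
        return 0.0
    r = d / range_
    return 1.0 - 1.5 * r + 0.5 * r ** 3


range_m = 150.0  # hypothetical range parameter in metres
for d in (0.0, 50.0, 100.0, 150.0, 300.0):
    print(d,
          round(cor_exponential(d, range_m), 3),
          round(cor_gaussian(d, range_m), 3),
          round(cor_spherical(d, range_m), 3))
```

All three functions equal 1 at distance zero and decline with distance; only the spherical form reaches exactly zero at a finite distance, whereas the other two approach zero asymptotically.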

All approaches, MLR, SAR and GLS, were fitted with the free software R. The three SAR model types (SEM, SLM, and SDM) are implemented in the 'spdep' package (Bivand 2011). Determining the neighborhood distance is an important step when using the SAR functions. The spatial weights matrix is then calculated by weighting the neighbors with a certain coding scheme. To specify the best neighborhood distance, we looped through 20–50 m distance settings and selected the distance with the lowest AIC. The final simultaneous autoregressive models were run with a spatial weights matrix based on a neighbourhood distance of 150 m and the row-standardized coding scheme ‘W’. For the GLS model, three different correlation structures (corExp, corGaus, and corSpher) were specified using the nlme package (Pinheiro et al. 2019).
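The construction of a distance-based, row-standardized weights matrix can be sketched in plain Python as follows. This is not the spdep implementation used in the study; the function name and the five plot coordinates are hypothetical, and only the 150 m threshold is taken from the text.

```python
import math


def row_standardised_weights(coords, max_dist):
    """Return an n x n weights matrix as a list of lists.

    Plots within max_dist of each other (excluding the plot itself) are
    neighbours; each row is scaled so that its weights sum to 1.
    A plot without neighbours keeps a row of zeros.
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [
            j for j in range(n)
            if j != i and math.dist(coords[i], coords[j]) <= max_dist
        ]
        for j in neighbours:
            w[i][j] = 1.0 / len(neighbours)
    return w


# Hypothetical plot centres (metres) on a 100 m grid, for illustration only.
coords = [(0, 0), (100, 0), (200, 0), (0, 100), (100, 100)]
W = row_standardised_weights(coords, max_dist=150.0)
for row in W:
    print([round(x, 2) for x in row])
```

On a 100 m grid a 150 m threshold links each plot to its direct and diagonal grid neighbours, and row standardization makes the spatially lagged value of a variable the mean over those neighbours.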

To test for spatial autocorrelation in the model residuals, we employed Moran’s I, which quantifies the correlation among residuals as a function of geographical distance. Values of Moran’s I vary between −1.0 and 1.0: positive values show that observations within a certain distance tend to be similar, negative values indicate dissimilarity, and values of approximately zero indicate that observations are arranged randomly and independently over space (Bao 2001). To evaluate patterns of spatial autocorrelation in the residuals of all regression models, we used Moran’s I assessed by a test statistic (the Moran’s I standard deviate) (Dormann et al., 2007). Spatial correlograms for the mean height of oriental beech trees were prepared based on a neighbourhood from 0 to 150 m. To compare the different modeling approaches, we computed several goodness-of-fit measures: (1) percentage of variance explained, as adjusted R²; (2) AIC; (3) root mean square error (RMSE); and (4) Moran's I.
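For readers unfamiliar with the statistic, the global Moran's I can be computed from a vector of residuals and a weights matrix as in the following Python sketch. The function name, the four-plot transect and the binary weights are toy assumptions; the significance test based on the standard deviate is not included.

```python
def morans_i(values, w):
    """Global Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2).

    values: list of observations (e.g. model residuals)
    w:      n x n spatial weights matrix as a list of lists
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                 # deviations from the mean
    s0 = sum(sum(row) for row in w)                # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Toy example: four plots along a transect, each linked to its direct neighbours.
w_chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = [1.0, 2.0, 3.0, 4.0]        # similar values next to each other
alternating = [1.0, 4.0, 1.0, 4.0]   # dissimilar values next to each other

print(round(morans_i(smooth, w_chain), 3))       # positive autocorrelation
print(round(morans_i(alternating, w_chain), 3))  # negative autocorrelation
```

The smooth gradient gives a positive value and the alternating pattern a negative one, in line with the interpretation of the sign described above.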

MLR model

Ten of the 14 explanatory variables were significant and entered the model in the MLR analysis (Table 3). The non-spatial multiple linear regression (MLR) explained 56.6% of the variance in the dataset. Using standardized regression coefficients as a criterion for determining the relative importance of explanatory variables (Kabacoff 2015) showed that phosphorus, altitude and nitrogen were the most important variables for the observed tree heights. Carbon-to-nitrogen ratio was the least important predictor, followed by silt, pH and TRASP.

The spatial correlogram (Fig. 2) representing the spatial autocorrelation for the model indicated that at short distances there was positive autocorrelation, as was also indicated by a highly significant Moran’s I score of 0.17 (P < 0.001).

SAR model

Tables 2 and 3 show the statistical results of the SAR and GLS models fitted by maximum likelihood. The spatial autoregressive parameter for SEM (λ = 0.533) is significant, as shown by P < 0.001 in both an asymptotic t-test and a likelihood-ratio (LR) test, clearly indicating the influence of neighboring observations. The estimate of the spatial autoregressive coefficient for SLM (ρ = 0.155) is also highly significant (P < 0.01). For both the SLM and the SDM, the LR test on ρ was also significant (P < 0.01), and the estimate of ρ was slightly higher for the SDM (0.196 vs. 0.155).

Across all models, the structurally most flexible approach, the SDM, achieved the best fit in terms of adjusted R² and RMSE. However, it did not rank best on the AIC, due to its larger number of spatial parameters. Instead, the SEM had the smallest AIC, followed by the GLS with spherical correlation structure. GLS with Gaussian correlation structure, SLM, GLS with exponential correlation structure and SDM had larger AICs, but the differences among those models were small (638.14 to 643.39). This group was also not substantially different in AIC from the non-spatial MLR.

For the Moran’s I of model residuals (Fig. 3), SEM had the smallest value among the SAR models (Moran’s I = 0.027, P = 0.608), while for the GLS_Gau it was the largest (Moran’s I = 0.160, P = 0.72) and nearly indistinguishable from the non-spatial MLR (Moran’s I = 0.168, P < 0.001). The results of this study showed that the residual autocorrelation was removed by all spatial models but not by the non-spatial MLR.

There were differences in the estimates and significance level of the variables. For some variables such as altitude and bulk density, P-values changed as well (Table 3).

Table 2

Results of the MLR and spatial models including the following statistics: Adjusted coefficient of determination (R²), Root Mean Square Error (RMSE), Akaike Information Criterion (AIC), and residuals’ autocorrelation (as estimated by the Moran’s I coefficient) along with its p-value. All models have the same number of predictors (22 plus intercept).
Model	AIC	Adjusted R² (%)	RMSE	Moran’s I	p-value
MLR	644.97	56.82	2.77	0.168	< 0.001
SEM	634.61	61.01	2.52	0.027	0.608
SLM	639.35	59.45	2.68	0.089	0.159
SDM	643.39	71.35	2.25	0.047	0.426
GLS_sph	637.58	60.69	2.95	0.012	0.766
GLS_Exp	640.82	56.82	2.88	0.041	0.475
GLS_Gau	638.14	60.51	2.94	0.160	0.720
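The AIC ranking in Table 2 is easier to read as differences from the best model. The short Python sketch below does this arithmetic on the AIC values copied from Table 2; it adds no new analysis, and the variable names are illustrative.

```python
# AIC values as reported in Table 2.
aic_table = {
    "MLR": 644.97,
    "SEM": 634.61,
    "SLM": 639.35,
    "SDM": 643.39,
    "GLS_sph": 637.58,
    "GLS_Exp": 640.82,
    "GLS_Gau": 638.14,
}

best_model = min(aic_table, key=aic_table.get)          # model with the lowest AIC
ranking = sorted(aic_table, key=aic_table.get)          # models from lowest to highest AIC
delta_aic = {m: round(aic_table[m] - aic_table[best_model], 2) for m in ranking}

for model in ranking:
    print(f"{model:8s} AIC = {aic_table[model]:.2f}  delta = {delta_aic[model]:.2f}")
```

The differences range from 2.97 (GLS_sph) to 10.36 (MLR) relative to the SEM.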

The intercept coefficients β₀ of the SEM and GLS spatial models were close to that of MLR and were highly significant, but in the other spatial models (SLM and SDM), the intercept coefficients β₀ were very different. There are considerable differences between the slope coefficients β of the SLM, SDM, SEM and GLS models and those of MLR (Table 3 and Figure of parameter estimates in the appendix). The standard errors of β₀ and β₁ of the SLM, SDM, SEM and GLS models were nearly equal to or lower than those of MLR (Table 3).

Table 3

Parameter estimates of all models (and their standard error). Note that all predictors were standardised before the analysis so that (absolute) estimates serve as an indicator of variable importance. ***, **, * and n.s. refer to significance levels of the model term at P < 0.001, < 0.01, < 0.05 and > 0.05, respectively. One important effect of including spatial autocorrelation is a much larger uncertainty of the estimates in all spatial models.
Term	MLR	SEM	SLM	SDM	GLS.Sph	GLS.Gau	GLS.Exp
(Intercept)	28.82 (0.67)***	28.17 (0.68)***	24.58 (1.6)***	25.08 (1.5)***	28.36 (0.77)***	28.36 (0.77)***	28.47 (0.77)***
Altitude	-2.17 (0.33)***	-1.86 (0.45)***	-1.9 (0.29)***	-1.13 (0.86) n.s.	-1.94 (0.47)***	-1.97 (0.47)***	-2 (0.46)***
Bulk density	-0.64 (0.32)*	-0.12 (0.27) n.s.	-0.56 (0.28)*	-0.11 (0.27) n.s.	-0.06 (0.3) n.s.	-0.08 (0.3) n.s.	-0.2 (0.31) n.s.
Clay	-0.74 (0.6) n.s.	-0.55 (0.46) n.s.	-0.59 (0.52) n.s.	-0.68 (0.5) n.s.	-0.67 (0.51) n.s.	-0.65 (0.51) n.s.	-0.72 (0.52) n.s.
C-N-ratio	0.23 (0.43) n.s.	0.2 (0.33) n.s.	0.22 (0.37) n.s.	-0.15 (0.34) n.s.	0.07 (0.38) n.s.	0.08 (0.38) n.s.	0.09 (0.39) n.s.
K	1.72 (0.57)**	1.53 (0.44)**	1.64 (0.49)**	1.26 (0.47)**	1.58 (0.52)**	1.61 (0.52)**	1.61 (0.54)**
N	-1.98 (0.51)***	-1.69 (0.4)***	-1.95 (0.44)***	-1.69 (0.41)***	-1.76 (0.45)***	-1.83 (0.45)***	-1.85 (0.46)***
P	2.94 (0.53)***	2.44 (0.49)***	2.46 (0.48)***	1.46 (0.54)**	2.41 (0.55)***	2.43 (0.56)***	2.52 (0.56)***
pH	0.24 (0.37) n.s.	0.24 (0.3) n.s.	0.31 (0.32) n.s.	0.29 (0.33) n.s.	0.37 (0.34) n.s.	0.35 (0.34) n.s.	0.34 (0.35) n.s.
Silt	-0.23 (0.49) n.s.	-0.19 (0.39) n.s.	-0.14 (0.43) n.s.	-0.09 (0.42) n.s.	-0.19 (0.43) n.s.	-0.2 (0.43) n.s.	-0.19 (0.45) n.s.
TRASP	0.24 (0.32) n.s.	0.18 (0.27) n.s.	0.12 (0.28) n.s.	0.18 (0.27) n.s.	0.17 (0.3) n.s.	0.18 (0.3) n.s.	0.19 (0.31) n.s.
C-N-ratio²	-0.87 (0.31)**	-0.59 (0.25)*	-0.89 (0.27)**	-0.65 (0.27)*	-0.48 (0.28) n.s.	-0.5 (0.29) n.s.	-0.54 (0.29) n.s.
P²	-0.98 (0.34)**	-0.71 (0.29)*	-0.78 (0.3)*	-0.24 (0.32) n.s.	-0.71 (0.32)*	-0.71 (0.33)*	-0.76 (0.33)*
Altitude:Clay	1.7 (0.49)***	1.78 (0.41)***	1.95 (0.43)***	2.26 (0.42)***	1.98 (0.46)***	1.99 (0.46)***	1.92 (0.47)***
Altitude:C-N-ratio	1.37 (0.43)**	0.91 (0.32)**	1.34 (0.37)***	0.97 (0.39)*	1.03 (0.37)**	1.05 (0.37)*	1.1 (0.38)*
Altitude:K	1.06 (0.46)*	0.73 (0.36)*	1.17 (0.4)**	1.35 (0.41)**	1 (0.42)*	1 (0.42)*	0.98 (0.44)*
Bulk density:C-N-ratio	1.01 (0.38)**	0.46 (0.31) n.s.	1.06 (0.33)**	0.76 (0.31)*	0.66 (0.34) n.s.	0.69 (0.34)*	0.75 (0.36)*
Bulk density:Silt	-0.91 (0.37)*	-1.04 (0.3)***	-0.97 (0.32)**	-1.1 (0.3)***	-0.88 (0.32)**	-0.88 (0.32)*	-0.87 (0.33)*
Clay:K	-1.43 (0.7)*	-1.4 (0.55)*	-1.19 (0.61)*	-0.76 (0.57) n.s.	-1.29 (0.61)*	-1.34 (0.61)*	-1.33 (0.63)*
Clay:pH	1.1 (0.44)*	0.61 (0.37) n.s.	0.87 (0.39)*	0.73 (0.38) n.s.	0.76 (0.41) n.s.	0.76 (0.41) n.s.	0.79 (0.42) n.s.
Clay:TRASP	-1.29 (0.39)**	-0.97 (0.32)**	-1.29 (0.34)***	-0.99 (0.34)**	-0.9 (0.37)*	-0.9 (0.37)*	-0.98 (0.37)*
C-N-ratio:Silt	1.21 (0.55)*	0.9 (0.44)*	1.17 (0.48)*	1.14 (0.46)*	1.11 (0.5)*	1.12 (0.5)*	1.16 (0.52)*
K:Silt	-1.21 (0.51)*	-1.05 (0.39)**	-1.21 (0.44)**	-1.37 (0.44)**	-1.16 (0.44)*	-1.16 (0.45)*	-1.17 (0.46)*
Silt:TRASP	0.91 (0.36)*	0.94 (0.29)**	0.84 (0.31)**	0.92 (0.28)**	0.89 (0.32)**	0.9 (0.32)*	0.9 (0.33)*
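Because all predictors were standardised, the absolute size of an estimate can be read as a rough indicator of variable importance. The Python sketch below ranks the main effects of the MLR column of Table 3 in this way; the estimates are copied from the table, while the restriction to main effects (ignoring quadratic and interaction terms) is a simplifying assumption made here for illustration.

```python
# MLR main-effect estimates copied from Table 3 (standardised predictors).
mlr_main_effects = {
    "Altitude": -2.17,
    "Bulk density": -0.64,
    "Clay": -0.74,
    "C-N-ratio": 0.23,
    "K": 1.72,
    "N": -1.98,
    "P": 2.94,
    "pH": 0.24,
    "Silt": -0.23,
    "TRASP": 0.24,
}

# Rank predictors by the absolute value of their standardised estimate.
ranked = sorted(mlr_main_effects, key=lambda name: abs(mlr_main_effects[name]), reverse=True)

for name in ranked:
    print(f"{name:12s} |estimate| = {abs(mlr_main_effects[name]):.2f}")
```

Phosphorus, altitude and nitrogen come out on top, while C-N-ratio, silt, pH and TRASP have the smallest absolute main-effect estimates.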

Various spatial models have been used for reducing spatial autocorrelation of model residuals (Kissling and Carl 2008; Meng et al. 2009; Lu and Zhang 2011; Lou et al. 2016). In this study, three spatial autoregressive models and three GLS model structures were used to evaluate the relationships between beech tree height and environmental variables. All spatial models performed better than the non-spatial multiple regression. In general, the results of SDM and SEM were better than those of SLM, suggesting that the spatial error was largely due to environmental, not endogenous, ecological processes. Because they have a high potential for reducing the spatial pattern of model residuals, spatial autoregressive and generalized least squares models help to meet the assumption of independence in regression models. In the modeling process, we sought to select the best model by comparing evaluation criteria such as R² and AIC. The results showed that when the spatial weights matrix was added in the SAR models, the adjusted R² increased from 56.8% for the MLR model to 71.4% for the SDM (Table 2).

The GLS with spherical correlation was the best-fitting model in the GLS set, and the mixed SAR (SDM) and error SAR (SEM) were the best in the SAR set. Kissling and Carl (2008) specified that SEM and SDM were the most reliable models regarding the precision of parameter estimates, reducing the spatial autocorrelation in model residuals, and controlling the type I error, irrespective of what kind of spatial autocorrelation existed in the data. Zhang et al. (2009) indicated that SEM performed better than SLM in terms of goodness of fit (e.g., AIC and R²) and Moran’s I of model residuals. Meng et al. (2009) also found that SEM was better than SLM and SDM in model fitting in a case study on 690 slash pine (Pinus elliottii Engelm.) trees, but the difference between SEM and SDM was subtle. Lu and Zhang (2011) showed that SDM had the best performance in terms of criteria such as goodness of fit, model prediction, and spatial autocorrelation in both model residuals and prediction errors. However, SEM and SDM were very close to each other regarding model fitting and performance. Using simultaneous autoregressive (SAR) models to analyze the relationship between stand top height and stand mean height in mixed Quercus mongolica broadleaved natural stands in Northeast China, Lou et al. (2016) found that SEM and SDM performed better than MLR regarding the reduction of spatial dependence in the model residuals and model fitting. Overall, SEM and SDM seem to work very well in a forestry context.

Global Moran's I calculations detect the problem of misspecification in the relationships between the predictors and the response variable described by the model (Anselin 2005). The residuals of the MLR model showed significant positive autocorrelation (Moran's I = 0.168, p < 0.001). By using the spatial models, Moran's I was reduced to 0.027 in the case of the SEM model and 0.012 in the GLS_sph model (both not significant). This indicates that the SAR and GLS models not only improved model performance but also alleviated the problem of spatially autocorrelated error terms, as intended.

Several researchers have stated that the interpretation of parameter estimates and coefficients of spatial autoregressive models is among the most important issues in geographical ecology (Lennon 2000; Diniz-Filho et al. 2003; Tognelli and Kelt 2004; Dormann et al. 2007; Kühn 2007; Kissling and Carl 2008; Lu and Zhang 2011). Thus our comparison is not just a statistical exercise; it has profound implications for research in biogeography, macroecology and global change, because any bias in the parameter estimates and any model misspecification affect hypothesis testing and the prediction of species distributions (Diniz-Filho et al. 2003; Dormann et al. 2007). Some authors assume that spatial models always provide better parameter estimates than the MLR technique, but Kissling and Carl (2008) suggested that researchers must be cautious about this assumption. In the study by Lu and Zhang (2011), SEM commonly provided coefficient estimates similar to those of the MLR model. We found that all spatial models performed better than the MLR model. However, there were considerable differences in parameter estimates between MLR and the spatial models, which is inconsistent with Lu and Zhang (2011). More important than bias may be the fact that spatial models always had larger uncertainty for the estimates, resulting also in a wider error margin for model predictions.

Both SAR and GLS rely on generalized least squares regression, therefore they are mathematically very similar, although GLS shows more flexibility in the way spatial autocorrelation is accounted for (Dormann et al. 2007). Different settings of the distance parameters in the SAR-based models were compared by using AIC and the best configuration of the correlation matrix was found. Comparing the performance of models here suggests that GLS with spherical correlation structure and SEM provide more accurate results than other models. This result is attributed to the fact that spatial autocorrelation is considered at all scales in the GLS model, therefore, we do not need a priori knowledge of the distance range of spatial influence. By including the correlation structure derived from a semivariogram, an improvement to SAR models could be achievable (Beguería and Pueyo 2009).

Our analysis suggests that phosphorus, altitude, and nitrogen were the most important predictor variables of the height of beech trees in both spatial and non-spatial models. Soil nitrogen and phosphorus are the most common macronutrients limiting the growth of plants under natural conditions (Liu et al. 2014). The strong correlation of beech tree height with the concentration of phosphorus in the soil suggests that phosphorus is the primary limiting nutrient for beech tree height in the Hyrcanian Forest. Various studies conducted in other temperate forests have shown that phosphorus is a limiting factor in forest stands (Binkley and Högberg 1997; Harrison et al. 1999; Brown and Courtin 2003; Corbin et al. 2003). Phosphorus is one of the major limiting nutrients of primary productivity in terrestrial ecosystems and, therefore, the phosphorus demand of plants might be among the most important drivers of soil and ecosystem development (Gradowski and Thomas 2006). Phosphorus availability may also shape the interactions among plants, microorganisms and soil in forest ecosystems (Lang et al. 2016). In addition to its function in energy storage and in non-cyclic electron transport, phosphorus supply is related to the concentration of plant Rubisco (Warren and Adams 2002) and as such can affect the carbon gain and growth of trees (Courtin 1992; Herbert and Fownes 1995; Brown and Courtin 2003; Marschner 2011). Nevertheless, the importance of phosphorus in photosynthesis is likely not the only reason for its limiting role in the study area. Phosphorus is of particular importance in accelerating root growth, cell division, and the growth of meristem tissues, and its limitation is associated with a sharp decline in tree growth. As a result, phosphorus deficiency will slow down or stop the growth of above- and underground parts of forest trees. In temperate forests, the main limiting nutrient for plant growth is generally the available nitrogen (Aber et al. 1989). It is expected that when tree nitrogen requirements are satisfied, some other nutrient may become the growth-limiting factor (Taylor 1934). In an analysis of the response curve of this particular beech species, Alavi et al. (2017) concluded that NPK and C/N variables are effective indices of tree growth.

The performance of species along an elevation gradient is governed by a series of interacting biological, climatic and historical factors (Colwell and Lees 2000). As such, elevation represents a complex gradient along which many environmental variables change simultaneously (Austin et al. 1996). Beech trees perform best at lower altitudes (ca. 1100 m), which is consistent with Mohadjer (2005). These altitudes appear to offer optimum humidity conditions and resource availability, yielding high productivity (Rahbek 1995; Rosenzweig 1995). The major decline in beech performance at higher altitudes could be due in part to ecophysiological constraints, such as reduced growing season length, low temperatures and hence low ecosystem productivity at high elevation (Körner 1998).

In this study, three spatial autoregressive models and three GLS models were used to model the relationships between beech tree height and environmental variables, with MLR as a benchmark. All three spatial autoregressive models performed better than the MLR model. In general, SDM and SEM outperformed SLM. SEM was better than SDM based on the AIC and the spatial correlogram, whereas SDM performed better than SEM in terms of RMSE and adjusted R². SDM has the advantage of analyzing the spatial variation of micro-environmental conditions in forest stands and competition among individual trees simultaneously. However, if the complexity of the model structure is a concern, SEM is a reasonable choice over SDM because it makes the model much easier to understand. Although SAR-based models performed better than the GLS model, we recommend using the GLS model for modeling tree height, because GLS is easier to implement and use than SAR-based models. However, when computation time is a concern, SAR-based models can be more useful because of their faster execution.
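The evaluation criteria named above (RMSE, adjusted R², AIC and residual spatial autocorrelation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function names are hypothetical, the log-likelihood is the Gaussian maximum-likelihood form for independent residuals, and Moran's I is computed for a user-supplied weights matrix `W`. For SAR models, an R² computed from fitted values in this way is usually treated as a pseudo-R².

```python
import numpy as np


def rmse(y, fitted):
    """Root mean squared error of the fitted values."""
    return float(np.sqrt(np.mean((y - fitted) ** 2)))


def adjusted_r2(y, fitted, n_predictors):
    """R^2 penalized for the number of predictors (intercept not counted)."""
    n = len(y)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1))


def gaussian_loglik(resid):
    """Maximized Gaussian log-likelihood for independent residuals."""
    n = len(resid)
    sigma2 = np.mean(resid ** 2)
    return float(-0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0))


def aic(log_lik, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * log_lik


def morans_i(resid, W):
    """Moran's I of residuals for a spatial weights matrix W."""
    z = resid - resid.mean()
    return float(len(z) / W.sum() * (z @ W @ z) / (z @ z))


# Tiny synthetic example (hypothetical numbers, not the study data).
y = np.array([28.0, 31.0, 35.0, 30.0, 33.0])
fitted = np.array([29.0, 30.5, 34.0, 31.0, 32.5])
resid = y - fitted
# Chain neighbourhood: each plot is linked to the next one along a transect.
W = np.diag(np.ones(4), k=1) + np.diag(np.ones(4), k=-1)

print("RMSE:", rmse(y, fitted))
print("Adjusted R2:", adjusted_r2(y, fitted, n_predictors=1))
print("AIC:", aic(gaussian_loglik(resid), n_params=3))
print("Moran's I of residuals:", morans_i(resid, W))
```

A model that removes spatial autocorrelation well should yield residual Moran's I values close to zero across distance classes, which is what a spatial correlogram displays.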

S.J.A.: Seyed Jalil Alavi; V.M.: Vria Mardanpour; C.F.D.: Carsten F. Dormann

Acknowledgements

We would like to thank the members of the Soil Laboratory of the Faculty of Natural Resources of Tarbiat Modares University for their help with laboratory work. We thank Ehsan Fakour and Koroush Ahmadi for their help with field data sampling.

Authors’ contributions

S.J.A., V.M., and C.F.D. conceived and designed the experiment; S.J.A. and V.M. collected the data; S.J.A. and C.F.D. analyzed the data; S.J.A. and C.F.D. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Tarbiat Modares University.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Author details

¹Department of Forestry, Faculty of Natural Resources, Tarbiat Modares University, Tehran, Iran

²Department of Biometry and Environmental System Analysis, Tennenbacher Straße 4, 79106 Freiburg, Germany

Aber JD, Nadelhoffer KJ, Steudler P, Melillo JM (1989) Nitrogen saturation in northern forest ecosystems. Bioscience 39(6):286–378
Achat DL, Pousse N, Nicolas M, Brédoire F, Augusto L (2016) Soil properties controlling inorganic phosphorus availability: general results from a national forest network and a global compilation of the literature. Biogeochemistry 127(2–3):255–272
Ahmadi K, Alavi SJ, Kouchaksaraei MT (2017) Constructing site quality curves and productivity assessment for uneven-aged and mixed stands of oriental beech (Fagus orientalis Lipsky) in Hyrcanian forest, Iran. Forest Sci Technol 13:41–46
Ahmadi K, Alavi SJ, Tabari Kouchaksaraei M, Aertsen W (2013) Non-linear height-diameter models for oriental beech (Fagus orientalis Lipsky) in the Hyrcanian forests, Iran. Biotechnol Agron Soc Environ 17:431–440
Alavi SJ, Ahmadi K, Hosseini SM et al (2019) The response of English yew (Taxus baccata L.) to climate change in the Caspian Hyrcanian Mixed Forest ecoregion. Reg Environ Chang. https://doi.org/10.1007/s10113-019-01483-x
Alavi SJ, Nouri Z, Zahedi Amiri G (2017) The Response Curve of Beech Tree (Fagus Orientalis Lipsky.) in Relation to Environmental Variables Using Generalized Additive Model in Khayroud Forest, Nowshahr. J Wood For Sci Technol 24:29–59. https://doi.org/10.22069/jwfst.2017.11630.1617
Allison L (1975) Organic Carbon. Methods soil Anal Part 2 Chem Microbiol Prop 1367–1378
Anselin L (2005) Spatial regression analysis in R: a workbook. Urbana 51:61801
Anselin L, Griffith DA (1988) Do spatial effects really matter in regression analysis? Pap Reg Sci 65:11–34. https://doi.org/10.1111/j.1435-5597.1988.tb01155.x
Bao S (2001) Literature review of spatial statistics and models. China Data Center, Univ Michigan Pap available online http//www umich edu/* iinet/chinadata/docs/review pdf
Beale CM, Lennon JJ, Yearsley JM et al (2010) Regression analysis of spatial data. Ecol Lett 13:246–264
Beguería S, Pueyo Y (2009) A comparison of simultaneous autoregressive and generalized least squares models for dealing with spatial autocorrelation. Glob Ecol Biogeogr 18:273–279
Binkley D, Högberg P (1997) Does atmospheric deposition of nitrogen threaten Swedish forests? For Ecol Manage 92:119–152
Bivand R (2011) spdep: Spatial dependence: weighting schemes, statistics and models. Spdep Spat Depend Weight Schemes, Stat Model
Bouyoucos GJ (1962) Hydrometer method improved for making particle size analyses of soils. Agron J 54:464–465
Bremner JM, Mulvaney RG (1982) Nitrogen total. In: Page AL, Miller RH, Keeney DR (eds) Methods of soil analysis. Am Soc Agron, Madison, Wisconsin, pp 575–624
Brown KR, Courtin PJ (2003) Effects of phosphorus fertilization and liming on growth, mineral nutrition, and gas exchange of Alnus rubra seedlings grown in soils from mature alluvial Alnus stands. Can J For Res 33:2089–2096
Burnham KP, Anderson DR (2002) Model Selection and Multimodel Inference, 2nd edn. Springer-Verlag, New York
Coomes DA, Allen RB (2007) Effects of size, competition and altitude on tree growth. J Ecol 95:1084–1097. https://doi.org/10.1111/j.1365-2745.2007.01280.x
Corbin JD, Avis PG, Wilbur RB (2003) The role of phosphorus availability in the response of soil nitrogen cycling, understory vegetation and arbuscular mycorrhizal inoculum potential to elevated nitrogen inputs. Water Air Soil Pollut 147:141–162
Courtin PJ (1992) The relationships between ecological site quality and the site index and stem form of red alder in southwestern BC. MF thesis. Univ Br Columbia, Vancouver, BC
Currie DJ (1991) Energy and large-scale patterns of animal-and plant-species richness. Am Nat 137:27–49
Dahlin KM, Asner GP, Field CB (2014) Linking vegetation patterns to environmental gradients and human impacts in a Mediterranean-type island ecosystem. Landsc Ecol 29:1571–1585
Diniz-Filho JAF, Bini LM, Hawkins BA (2003) Spatial autocorrelation and red herrings in geographical ecology. Glob Ecol Biogeogr 12:53–64. https://doi.org/10.1046/j.1466-822X.2003.00322.x
Dormann CF, McPherson JM, Araújo MB et al (2007) Methods to account for spatial autocorrelation in the analysis of species distributional data: A review. Ecography 30:609–628. https://doi.org/10.1111/j.2007.0906-7590.05171.x
Dormann CF (2007) Effects of incorporating spatial autocorrelation into the analysis of species distribution data. Glob Ecol Biogeogr 16:129–138. https://doi.org/10.1111/j.1466-8238.2006.00279.x
Fortin M-J, Dale MRT (2005) Spatial analysis: A guide for ecologists
Fricker GA, Synes NW, Serra-Diaz JM et al (2019) More than climate? Predictors of tree canopy height vary with scale in complex terrain, Sierra Nevada, CA (USA). For Ecol Manage 434:142–153
Gatti RC, Di Paola A, Bombelli A et al (2017) Exploring the relationship between canopy height and terrestrial plant diversity. Plant Ecol 218:899–908
Gradowski T, Thomas SC (2006) Phosphorus limitation of sugar maple growth in central Ontario. For Ecol Manage 226:104–109. https://doi.org/10.1016/j.foreco.2005.12.062
Haining RP (2003) Spatial data analysis: theory and practice, 1st edn. Cambridge University Press
Harrell FE Jr, Dupont C, Others M (2018) Hmisc: Harrell Miscellaneous. R package version 4.1-1. https://CRAN.R-project.org/package=Hmisc
Harrison AF, Carreira J, Poskitt JM et al (1999) Impacts of pollutant inputs on forest canopy condition in the UK: possible role of P limitations. Forestry 72:367–378
Herbert DA, Fownes JH (1995) Phosphorus limitation of forest leaf area and net primary production on a highly weathered soil. Biogeochemistry 29:223–235
Homer DC, Pratt PF (1961) Methods of analysis for soils, plants and waters. Univ California, Div Agri Sci USA 150–196
Jucker T, Bongalov B, Burslem DFRP et al (2018) Topography shapes the structure, composition and function of tropical forest landscapes. Ecol Lett 21:989–1000. https://doi.org/10.1111/ele.12964
Kabacoff R (2015) R in action: data analysis and graphics with R, 2nd edn. Manning Publications
Keitt TH, Bjørnstad ON, Dixon PM, Citron-Pousty S (2002) Accounting for spatial pattern when modeling organism-environment interactions. Ecography 25:616–625. https://doi.org/10.1034/j.1600-0587.2002.250509.x
Kerr JT, Packer L (1997) Habitat heterogeneity as a determinant of mammal species richness in high-energy regions. Nature 385:252
Kissling WD, Carl G (2008) Spatial autocorrelation and the selection of simultaneous autoregressive models. Glob Ecol Biogeogr 17:59–71. https://doi.org/10.1111/j.1466-8238.2007.00334.x
Körner C (1998) A re-assessment of high elevation treeline positions and their explanation. Oecologia 115:445–459. https://doi.org/10.1007/s004420050540
Kühn I (2007) Incorporating spatial autocorrelation may invert observed patterns. Divers Distrib 13:66–69. https://doi.org/10.1111/j.1472-4642.2006.00293.x
Lang F, Bauhus J, Frossard E et al (2016) Phosphorus in forest ecosystems: New insights from an ecosystem nutrition perspective. J Plant Nutr Soil Sci 179:129–135. https://doi.org/10.1002/jpln.201500541
Legendre P, Legendre LFJ (2012) Numerical ecology. Elsevier
Lennon JJ (2000) Red-shifts and red herrings in geographical ecology. Ecography 23:101–113. https://doi.org/10.1111/j.1600-0587.2000.tb00265.x
Lichstein JW, Simons TR, Shriner SA, Franzreb KE (2002) Spatial autocorrelation and autoregressive models in ecology. Ecol Monogr 72:445–463. https://doi.org/10.1890/0012-9615(2002)072[0445:SAAAMI]2.0.CO;2
Lindenmayer DB, Laurance WF, Franklin JF (2012) Global decline in large old trees. Science 338:1305–1306
Liu X, Meng W, Liang G et al (2014) Available phosphorus in forest soil increases with soil nitrogen but not total phosphorus: Evidence from subtropical forests and a pot experiment. PLoS One 9:1–8. https://doi.org/10.1371/journal.pone.0088070
Lou M, Zhang H, Lei X et al (2016) Spatial autoregressive models for stand top and stand mean height relationship in mixed Quercus mongolica broadleaved natural stands of northeast China. Forests 7:. https://doi.org/10.3390/f7020043
Lu J, Zhang L (2011) Modeling and prediction of tree height-diameter relationships using spatial autoregressive models. For Sci 57:252–264
Lutz JA, Furniss TJ, Johnson DJ et al (2018) Global importance of large-diameter trees. Glob Ecol Biogeogr 27:849–864
Marks CO, Muller-Landau HC, Tilman D (2016) Tree diversity, tree height and environmental harshness in eastern and western North America. Ecol Lett 19:743–751
Marschner H (2011) Marschner’s mineral nutrition of higher plants. Academic press
Marvie Mohadjer MR (2005) Silviculture. University of Tehran
Meng Q, Cieszewski CJ, Strub MR, Borders BE (2009) Spatial regression modeling of tree height-diameter relationships. Can J For Res 39:2283–2293. https://doi.org/10.1139/X09-136
Miller J, Franklin J, Aspinall R (2007) Incorporating spatial dependence in predictive vegetation models. Ecol Modell 202:225–242
Moeur M (1993) Characterizing spatial patterns of trees using stem-mapped data. For Sci 39:756–775
Oliver CD, Larson BC (1996) Forest stand dynamics. Wiley New York
Pinheiro J, Bates D, DebRoy S et al (2019) nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-141
Pinheiro JC, Bates DM (2000) Linear mixed-effects models: basic concepts and examples. Mix Model S S-Plus 3–56
Pretzsch H (2009) Forest dynamics, growth, and yield. In: Forest dynamics, growth and yield. Springer, pp 1–39
R Core Team (2018) R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Austria, 2015
Rahbek C (1995) The elevational gradient of species richness: a uniform pattern? Ecography 18:200–205
Rahbek C, Graves GR (2001) Multiscale assessment of patterns of avian species richness. Proc Natl Acad Sci 98:4534–4539
Rosenzweig ML (1995) Species diversity in space and time. Cambridge University Press
Rouvinen S, Kuuluvainen T (1997) Structure and asymmetry of tree crowns in relation to local competition in a natural mature scots pine forest. Can J For Res 27:890–902. https://doi.org/10.1139/x97-012
Sagheb Talebi K, Sajedi T, Pourhashemi M et al (2014) Forests of Iran: A Treasure from the Past, a Hope for the Future. Springer
Slik JWF, Paoli G, McGuire K et al (2013) Large trees drive forest aboveground biomass variation in moist lowland forests across the tropics. Glob Ecol Biogeogr 22:1261–1271
Tao S, Guo Q, Li C et al (2016) Global patterns and determinants of forest canopy height. Ecology 97:3265–3270
Tateno R, Takeda H (2003) Forest structure and tree species distribution in relation to topography-mediated heterogeneity of soil nitrogen and light at the forest floor. Ecol Res 18:559–571
Taylor WP (1934) Significance of extreme or intermittent conditions in distribution of species and management of natural resources, with a restatement of Liebig’s law of minimum. Ecology 15:374–379
Teng SN, Xu C, Sandel B, Svenning J (2018) Effects of intrinsic sources of spatial autocorrelation on spatial regression modelling. Methods Ecol Evol 9:363–372
Tognelli MF, Kelt DA (2004) Analysis of determinants of mammalian species richness in South America using spatial autoregressive models. Ecography 27:427–436. https://doi.org/10.1111/j.0906-7590.2004.03732.x
Warren CR, Adams MA (2002) Phosphorus affects growth and partitioning of nitrogen to Rubisco in Pinus pinaster. Tree Physiol 22:11–19. https://doi.org/10.1093/treephys/22.1.11
Zhang J, Nielsen SE, Mao L et al (2016) Regional and historical factors supplement current climate in shaping global forest canopy height. J Ecol 104:469–478
