Spatial modeling of relationship between soil erosion factors and land-use changes at sub-watershed scale for the Talar watershed, Iran

Soil erosion is one of the most common types of land degradation. To provide useful information for proper management, quantitative soil erosion evaluation and identifying influential factors are needed. However, rare studies have been reported on spatial modeling of soil erosion in connection with affective factors to prioritize the locality and the type of erosion control measures. Hence, the aim of this study was to (1) assess erosion-prone areas in the Talar watershed, Iran, using the revised universal soil loss equation (RUSLE) model and (2) investigate the relationship between soil erosion variability and land-use changes. Toward that, the ordinary least squares (OLS), geographically weighted regression (GWR) models, and principal component analysis (PCA) were used to analyze spatial relationships between soil erosion, land-use, and the RUSLE factors. The results of the OLS and GWR models indicated that these relationships are spatially non-stationary. GWR models had a good predictive performance than OLS with lower Akaike’s Information Criterion (from 254.31 to 276.81 in OLS and from 247.87 to 269.42 in GWR) and higher adjusted R2 values (from 0.12 to 0.54 in OLS, and from 0.36 to 0.66 in GWR). Among the variables mentioned above, LS factor, P factor, forest, and irrigated land were the most influential variables in GWR models. The results of PCA showed that PC1 and PC2 explained 66.2% of the variation in soil erosion concerning land-use and the RUSLE factors. These results provided appropriate references for managers and experts properly planning the study watershed.


Introduction
Soil erosion is one of the most severe environmental issues linked to land degradation. As a result, soil erosion is the primary cause (> 85%) of land loss (Tang et al. 2015). It causes some immediate on-site and off-site problems such as soil nutrient loss, reduced agricultural productivity, land abandonment (Nunes et al. 2011;Ochoa et al. 2016), and some secondary environmental problems viz. flooding, river siltation, and water pollution (Chakrabortty et al. 2020). Furthermore, soil erosion is an environmental threat to sustainable development. Water erosion affects some 1100 Mha of land worldwide, which is almost twice what occurs by wind erosion (Ganasri and Ramesh 2016). As a result, adequate soil and water conservation, the adaptation of specific management strategies, monitoring and estimation of water erosion, and identification of critical areas for implementing managerial practice and technical measures are needed.
Remote sensing (RS) technology, together with Geographical Information Systems (GIS), provides valuable information to help policymakers make sound decisions about implementing erosion control strategies. At the same time, it remains the most effective and time-consuming method. Many models, such as the soil and water assessment tool (SWAT) (Halecki et al. 2018;Zema et al. 2022), the water erosion prediction project (WEPP) (Anache et al. 2018;Choudhury et al. 2022), the universal soil loss equation (USLE) (Gia et al. 2018), the revised universal soil loss equation (RUSLE) (Chicas et al. 2016;Senanayake et al. 2022), and erosion productivity impact calculator (EPIC) (Gao et al. 2017) have been developed in this way for risk evaluation or predictive assessment of water erosion. Each model has its own set of features and implementation possibilities. The RUSLE and its previous version (USLE) are accepted universally because of reasonable costs, applicability, and reliability of results which has been corroborated as one of the best and most widely used models for estimating soil erosion (Prasannakumar et al. 2011;Zerihun et al., 2018;Fayas et al. 2019;Efthimiou et al. 2020).
RUSLE predicts soil loss using rainfall erosivity, topography, soil erodibility, and vegetation cover and practice management, which will be effective for better understanding the erosion process. Many researchers have shown that soil erosion is influenced by some other factors, such as characteristics of soil particle size distribution (Wang et al. 2008), different crop types (Ruysschaert et al. 2007), and land-use (Vanacker et al. 2019;Avand et al. 2022). Changes in the land-use distribution pattern, known as a landscape and land-use morphology, can also affect soil erosion. Both land-use types and landscape morphology influence soil erosion.
Some researchers have applied traditional statistical analysis and the ordinary least squares (OLS) model to explore relationships among dependent and independent factors. As a result, the relationships are spatially consistent across the entire study area. In other words, these traditional statistical methods provide average and global relationships, which may neglect some significant spatial characteristics and hide local variations. Generating a global relationship shows the average conditions, while local variations of parameters are ignored. In this light, advanced approaches can be used to address current limitations and assess spatial heterogeneity in relationships. In this vein, a local statistical model called geographically weighted regression (GWR) was proposed by Brunsdon et al. (1996) to determine the spatial non-stationary relationships. This technique provides spatially weighting information, which enables the regression models to be various locally.
In the past, GWR and OLS have been used in a variety of fields, including the ecology and even human geography (Tu 2011), NDVI-rainfall relationship (Georganos et al. 2017), land use, land cover (Dadashpoor et al. 2019), soil organic carbon (Mirchooli et al., 2020a), and land surface temperature (Li et al. 2020;Zhao et al. 2018a;Mirchooli et al., 2020b). Many studies investigated the relationships between soil erosion, RUSLE factors, and land-uses using commonly global statistical models and machine learning (Mehri et al. 2018;Setyawan et al. 2019;Nyesheja et al. 2019). For example, Sahour et al. (2021) used soil erosion pins to assess the spatial distribution of soil erosion and applied multiple linear regression (MLR), boosted regression trees (BRT), and deep learning (DL) to identify the relationships between environmental factors and soil erosion. They found that slope steepness was the most influential factor in the predicted soil erosion. Nguyen et al. (2021) quantified soil erosion using erosion pins and machine learning in the Shihmen Reservoir watershed, Taiwan. They found that slope direction and type of slope had the maximum impacts on soil erosion across the watershed. Chuma et al. (2022) estimated soil erosion using the RUSLE model in the Chisheke watershed in the Eastern Democratic Republic of Congo. They reported high impacts of land use/cover on soil erosion.
A review of the current literature reveals that most studies used a single model to apply soil erosion and other driving forces variables to all data, which was uniformly distributed throughout the study area. To achieve an acceptable result, research on the spatial non-stationary relationship between soil erosion and different variables is necessary to understand these relationships better. Therefore, the main goals of the current research are to (1) assess the spatial distribution of soil erosion for the Talar watershed, Iran, using the RUSLE model, (2) assess local spatial distributions of soil erosion using cluster analysis and classify similar sub-watersheds, and (3) identify spatial variations of soil erosion in response to the land-use area and the RUSLE factors using PCA, OLS, and GWR models.

Study area
The Talar watershed in eastern Mazandaran Province, north of Iran, was used as a case study for this study. It is located at 36 36 to 36 46 N and 55 23 to 54 31 E with an area of ca. 211,000 ha. Talar is one of the most significant rivers draining into the Caspian Sea. The Talar watershed has been subjected to various human interventions, including fuel, timber smuggling, livestock grazing, road construction, mine exploitation, and factory development (Kavian et al. 2018), leading to severe threats to the water quality and soil conditions in the watershed. The Talar River, with a 100 km length, intersects the Kassilian River with a length of 50 km (Talar). With mean annual precipitation of 552.7 mm, the climate in this watershed is semi-humid and cold. The annual mean minimum and maximum temperatures are 7.7 and 21.1 °C, respectively. This watershed is characterized by gentle steep hill slopes (< 12%), which lie between the southeast highland plains and land near the watershed outlet. Highlands and sloping areas (> 60%) comprise 14.2% of the total area, while more than 30% of inclinations constitute 38%. (Mohammadi et al. 2019). Figure 1 presents the general geographical view of the location and other important information about the Talar watershed in the north of Iran and Mazandaran Province. In this watershed, the lands were natural forests, but the land-use change occurred by the local people and industries. In this watershed, the natural forest was converted to agricultural and pasture lands, making the watershed more vulnerable to soil erosion. The main land-uses are Forest, rangeland, dry farming, irrigated agriculture, and residential area (Mohamadi et al. 2016).
To achieve the study's goals, remote sensing data and input data of the RUSLE model were processed to obtain a land-use map and soil erosion. Three models of PCA, OLS, and GWR were used to investigate the relationships among soil erosion, the RUSLE factors, and landuses. The flowchart presented in Fig. 2 shows the proposed method in this study.

Data pre-processing
Land-use distribution map of the Talar watershed was obtained using Landsat 8/Operational Land Imager (OLI) image (02.07.2014) and supervised classification algorithm in the ENVI 5.3 software environment (Torabi Haghighi et al. 2018). Five land-use classes were identified, including forest, rangeland, irrigated agriculture, rainfed agriculture, and residential area. To meet the requirement for the RUSLE model, the monthly precipitation data were obtained from the Ministry of Energy (http:// www. moe. gov. ir/), and a digital elevation model was downloaded from www. earth explo rer. usgs. gov to compute the LS factor. In addition, to prepare K factor inputs, 140 points of soil samples were collected based on homogeneous land units resulting from overlying soil types, land-use/cover, and slope layers.
After that, the necessary geo-referenced, radiometric, and FLAASH atmospheric corrections were made. In addition, the Talar watershed's land-use map was generated using supervised classification and a maximum likelihood algorithm. Some point samples of 1250 were collected during the field survey and divided into two sets of validation and training with respective values of 30 and 70%. The error matrix, as well as the producer's accuracy and the user's accuracy, was used to determine the overall accuracy of land-use.

Fig. 2
Flowchart of the methodology adopted for spatial modeling of the relationship between soil erosion factors and land use changes for the Talar watershed, Iran 1 3

Soil erosion estimation
The RUSLE model is the simplest model to estimate soil erosion in this analysis. Equation (1) was used to measure the mean annual soil loss (A) (Renard et al. 1991): where R is the rainfall erosivity factor [MJ mm, (ha −1 h −1 year −1 )]; K is the soil erodibility factor [ton ha −1 h MJ −1 ha −1 mm −1 ]; LS is the slope length and steepness factor (dimensionless); C is the cover management factor (dimensionless), and P is the conservation support or erosion control practices factor (dimensionless).
Rainfall erosivity (R) The r factor reflected the soil erosion potential caused by rainfall was calculated using the Renard and Freimund (1994) method according to Eqs. (2), (3), and (4): F is the modified index value, P i is the average monthly precipitation (mm), and P̅ is the average annual precipitation (mm).
Soil erodibility (K) The Wischmeier and Smith equation were used to determine the soil erodibility factor in this analysis (Eq. 5) (Wischmeier 1976).
where K is the soil erodibility factor, M is the particle size parameter (% silt + %very fine sand) * (100-% clay), OM is the organic matter content (%), S and P are the classes for soil structure and permeability, respectively.
Topographic factor (LS) LS factor reflects the combined impact of slope length (L) and steepness (S) on soil erosion. In this study, a program written in Arc Macro Language (ALM) (Hickey 2000), updated in 2004 with the C ++ programming language (http:// www. iamg. org), was applied to calculate the LS factor.
Land cover management (C) The protective impact of ground covers against erosive rainfall, i.e., the C factor was calculated using Normalized difference vegetation index (NDVI) according to Eq. (6) (Nearing et al. 1989): Conservation practices factor (P) The relevant P factor value was calculated in this study using a method proposed by Troeh et al. (1980), which was partly based on landuse data. (1)

OLS and GWR models
The OLS and GWR models were applied in the present study to understand the importance of the RUSLE factors and land-uses in the estimated soil erosion and the spatial relationships among them.
OLS is the primary statistical and global model and examines the average condition over space, as shown in Eq. (7).
where x i is the independent variables (i.e., RUSLE factors and land-use areas) and y is the dependent variable (i.e., soil erosion); n is the number of independent variables; β 0 is intercept and β i and e represent, respectively, coefficient and error. While the GWR model is a local form of linear regression that overcomes the limitation of the OLS model. This model improves the results of OLS by recognizing the spatial variation in the relationship among the variables. GWR improves the OLS model by considering the local parameters estimate for each observation i as expressed in Eq. (8): where y i is the value of the soil erosion at location i, (u i , v i ) is the spatial location of ith data, β 0 (u j , v j ) acts as intercept, and β j (u i , v i ) is the value of the jth parameter at the location of i, i is the random error for location i (Georganos et al. 2017).
The location distance i is used to measure the weight of data which varies with i. The parameters in a matrix form at location i are estimated with the help of Eq. (9): where W(u i , v i ) is a spatial weighting matrix, X is an independent data (i.e., RUSLE factors and land-use areas) matrix, and Y is a soil erosion data vector (Fotheringham et al. 2002;Gao and Li 2011).
Although several functions have the potential to be used for determining the weighting matrix, the common method of the Gaussian function was applied to calculate the weight of each point using Eq. (10): where w ij is the weight of observation at location j for estimating the dependent variable at location i, and h is the bandwidth. Wang et al. (2005Wang et al. ( , 2014 regressed this equation as a distance decay function, which means that the higher the weight of position j, the closer it is to location i. In the GWR model, the maximum distance from location i as bandwidth controlled the size of the neighborhood was considered. The fixed kernel appropriate for little data evenly distributed in space (Georganos et al. 2017;Taghipour Javi et al. 2014) was used for developing the GWR model in the current research.
In the present study, OLS and GWR were conducted in ArcGIS 10.3. Finally, adjusted R 2 and Akaike's Information Criterion (AIC) were used to measure and compare the 1 3 accuracy and performance of these models. The higher adjusted R 2 and lower AIC mean the better performance of the model.

Principal component analysis
Principal component analysis (PCA), a multivariate statistical technique, was used to provide information about the most critical factors and to describe the variance of the data (Pejman et al. 2015). In the present study, the PCA approach was used to analyze the relationship between soil erosion, land-use, and the RUSLE factors and verify the critical potential factor of soil erosion in the studied region, using R software, FactoMineR, and factoextra packages. Figure 3 indicates the spatial distribution of RUSLE factors and soil loss across the study area. The value of the R factor ranged from 209.65 to 503.24 (MJ mm ha −1 h −1 y −1 ) across the study watershed, and the highest and lowest values were distributed in the northern and western parts of the watershed, respectively. The mean value of this factor was 308.25 MJ mm ha −1 h −1 y −1 , along with the results of Sadeghi and Tavangar (2015). In the case of the K factor, values were distributed between 0.02 and 0.09, with the highest values in central areas. The values of the LS factor varied from some 0.00 to 140.82 following flow accumulation and slope steepness. The values of the C factor were between 0.24 and 0.46. Ultimately, the P factor ranged from 0.10 to 0.70. Table 1 also shows the mean of RUSLE values in sub-watersheds. Soil erosion potential was divided into three categories low, moderate, and strong. Low soil erosion is observed in most watershed areas, especially in the east and southeast. Soil erosion is most prevalent in the central parts of the watershed, where almost all affective factors are high except R. To assess soil erosion for the Talar watershed, enhanced hierarchical Cluster analysis based on Euclidean distance was adapted to cluster data of soil erosion in the sub-watersheds.Cluster analysis generated a dendrogram, grouping all 34 sub-watersheds into two significant clusters of subwatersheds. In one group, all sub-watersheds with a low level of soil erosion are clustered together. The other group includes sub-watersheds with a high level of soil erosion which can be further subdivided into three additional clusters. The results of the cluster analysis are shown in Fig. 4.Group 1 consisted of sub-watersheds 5, 1, 2, 3, 7, 8, 10, 11; group 2 included sub-watersheds 27, 30, 34, 32, 33; and group 3 comprised sub-watersheds 14,15,17,18,4,6,9,12,13, and group 4 included remained sub-watersheds of the Talar watershed. The sub-watersheds in the same group have similar characteristics in terms of soil erosion. In the case of group 1, these sub-watersheds were located near the outlet of the watershed and were mainly influenced by dense forest cover (i.e., lower C factor). The least human interference and good land-use (i.e., lower P factor).

Development of OLS and GWR models
In this research, different variables, including areas of some land-use types viz. rainfed agriculture, forest, irrigated agriculture, rangeland, urban, R factor, K factor, LS factor, C factor, and P factor were used to predict soil erosion using OLS and GWR for the Talar watershed.
The OLS models were conducted before the GWR models. The analysis of OLS statistics, namely Jarque-Bera Statistic, showed that it was insignificant (p 1 3 values = 0.26-0.87) for all variables, which meant the residuals were not normally distributed. The normal distribution of residual indicated that the model was biased and missed some influential critical variables. Moreover, for the studied variables, the joint F-statistics were significant (p value < 0.05), meaning that the relationship between the dependent and independent variables was non-stationary and that the GWR model might boost it. The variables of urban and rainfed agriculture areas with Jarque-Bera Statistics and Joint F-statistics of p values < 0.05 and > 0.05 were removed from further analyses. The adjusted R-squared (R 2 ) and AICs were used to compare the model performances. So, the models with higher adjusted R 2 and lower AICs were considered better performed. In the OLS model, among the areas of land-uses variables, the area of forest and irrigated farming was the most influential variables with adjusted R 2 and AICs values of 0.29, 269.20, and 0.14 and 275.99, respectively. The detailed results are shown in Table 2. Furthermore, model (10) of the P factor, followed by the model (8) of the C factor, were the most important factors among other models of the RUSLE factors in the OLS models.  Table 2 also explained that the higher variance and lower AIC values rather than OLS models in all studied models which confirmed the results of other researchers in other fields (Zhao et al. 2018b;Dadashpoor et al. 2019;Li et al. 2020). The variable of the irrigated land farming area had the most significant effect with the least AIC and maximum adjusted R 2 .
The outputs of the GWR model offered a spatial illustration of the independent variables in describing soil erosion in the Talar watershed. The spatial pattern of local adjusted R 2 for independent variables is shown in Fig. 5. It indicated that local R 2 was not homogeneously distributed for the Talar watershed. Most local adjusted R 2 values for the K factor were less than 0.50, suggesting that the K factor was linked to soil erosion in the southern regions of the study watershed. The K factor had positive regression coefficients, which indicated that an increase in the K factor could result in a soil erosion increase. The influence was more substantial in the northern and central sub-watersheds.  In almost all sub-watersheds, the LS factor can explain more than 26% of the spatial variation in soil erosion. Some researchers have also reported similar findings on the crucial impact of topography on soil erosion (Nazari and Mohammady 2017;Chalise et al. 2018). The increase in LS factor affected the soil erosion more strongly in the northern and central sub-watersheds with a local regression coefficient of 4.59-5.29, which meant the change of LS factor could increase the soil erosion up to a maximum level of 459-529% in local areas.
From the north to the south of the watershed, the local adjusted R 2 for the C factor decreased. As a result, the C factor had a more significant effect on soil erosion in the watershed's upper parts than in the lower parts. Similarly, the highest local adjusted R 2 of the P factor was obtained for sub-watersheds located in the northern part of the watershed. Generally, it can be explained that among the RUSLE factors, LS and P factors had an enormous impact on soil erosion due to the topographic characteristics of the Talar watershed and suitable protective measurements in most parts of the watershed.
These local variations in local adjusted R 2 for the P factor showed that the spatial trends of the P factor were better associated with soil erosion in some northern parts of the watershed. In contrast, correlations in the southern parts were weaker. In the sub-watersheds 32 and 34, the P factor was negatively associated with soil erosion. The most considerable value of the local coefficient was found in the north of the Talar watershed.
The local coefficient and local adjusted R 2 distributions of the land-use for soil erosion in the GWR models are shown in Fig. 6. As indicated, the northern and central parts of the watershed with the most area of irrigated land had higher values of local adjusted R 2 . On the other hand, the global coefficient for irrigated land was − 0.04, suggesting a negative relationship between soil erosion and irrigated land areas. The local map of irrigated land coefficients, however, showed a range of − 0.059 to 0.154, indicating the spatial heterogeneity of the model. The relationships between soil erosion and areas of irrigated land had complex local characteristics. The sign of the local coefficients for this variable switched from positive in the southeastern sub-watersheds to negative in the other sub-watersheds. Cropping systems, mainly wheat cultivated on sloping lands with intrinsic high soil erodibility, represented the prone areas of the watershed to soil loss among different irrigated lands and caused a positive relationship with soil erosion.
For forest land-use, an obvious spatial correlation between forest area and soil erosion was determined by local adjusted R 2 values. The local adjusted R 2 in the northern subwatershed was more significant than those in the southern parts due to the more fabulous presence of forestlands in this part of the watershed. Forestland had negative regression coefficients that indicated that a decrease in the area of the forest could increase soil erosion. This close relationship between forestland and soil erosion had also been verified by other research (Koirala et al. 2019;Belayneh et al. 2019;El Jazouli et al. 2019).
Regarding local adjusted R 2 and coefficient of rangeland areas, southern parts of the watershed showed a weaker correlation with soil erosion than northern parts. However, the rangeland mainly covers the southern sub-watersheds. This could be attributed to rangeland conditions, which are more degraded in the northern sub-watersheds than southern parts.
The result of the GWR model showed a meaningful relationship between soil erosion and land-uses for the Talar watershed, particularly for forestland. In this way, studies reported that land-use is an essential factor that exacerbates the roles of rainfall and slope 1 3 steepness on soil erosion processes (Thornes et al. 1990; Thornes and Wainwright 2004). In the last two decades, a large area of the Talar watershed turned to agriculture through deforestation and land conversion (Kavian et al. 2018), leading to dramatic soil erosion, as confirmed by other researchers (e.g., Chaplot et al. 2005).

Principle component analysis (PCA)
The scree plot obtained from PCA analysis revealed ten principal components (PCs) in soil erosion prediction, as shown in Fig. 7. In total, PC1 and PC2 explained 66.2% of soil erosion resulted from the RUSLE model concerning land-use and the RUSLE factors, which gave a good idea of data structure. PC1 and PC2 represented 40.7 and 25.5% of the total variance for soil erosion about explanatory variables, respectively. The contribution of different explanatory variables in PC1 and PC2 of soil erosion is indicated in Fig. 8.
Based on Fig. 8, as the arrows change from bright blue and shorter ones to darker blue and longer ones, it is inferred that the contribution of variables in studied PCs increased. Some variables, viz. forest, irrigated agriculture, and urban, had obvious correlations and higher contributions with dimension 1 (p values < 0.01). Previous research (Sharma et al. 2011;Park 2012;Zare et al. 2017) confirmed the critical role of forestland on soil erosion and the impact of traditional farming systems in the Talar watershed irrigated land.
As shown in Table 3, forest and irrigated agriculture had a significant positive relationship with soil erosion among different land-uses, which is due to the presence of dense vegetation cover and its protective role. Furthermore, it is found the solid negative contribution and correlation for P factor (− 0.82) and C factor (− 0.78) in dimension 1 (Dim1), which meant proper land cover management and conservation practices could reduce soil erosion significantly, as documented by Tang et al. (2015) and Thomas et al. (2018).
A positive higher correlation between rainfed agriculture and the K factor in the RUSLE model characterized the case of dimension 2 (Dim2). The R factor was negatively associated with Dim2 (p values < 0.01).

Conclusion
The spatial relationships between soil erosion, land use, and the RUSLE factors were investigated using OLS and GWR. The results of these models were compared, and it was found that all GWR models outperformed their OLS counterparts in terms of modified R 2 and AIC values, providing more details through a local coefficient. In other words, GWR was more effective and powerful for interpreting the relationship between soil erosion and explanatory variables. The relationships provided by the GWR models revealed significant spatial non-stationarity. The association between soil erosion was more robust with the LS factor, P factor, and forestland. The LS factor had a positive relationship with soil erosion. PCA also showed that PC1 and PC2 could explain some 66.2% of the variance, and forest and irrigated lands had an obvious correlation and higher contribution with variables associated with dimension1. Forest had hostile relationships with soil erosion, so it was related to the obstacle role of vegetation cover against erosion. The current study indicated how to