Evaluating the effects of vegetation and land management on runoff control using field plots and machine learning models

Excess surface water after heavy rainfall leads to soil erosion and flash floods, resulting in human and financial losses. Reducing runoff is an essential management measure to protect water and soil resources. This study aimed to evaluate the effects of vegetation and land management methods on runoff control and to provide a model to predict runoff values. Field plot data and three machine learning (ML) methods, namely artificial neural network (ANN), coactive neuro-fuzzy inference system (CANFIS), and extreme gradient boosting (EGB), were used at a test site in the north of Iran. Plots with various vegetation and land management treatments, including bare soil, rangeland cover, forest litter, rangeland litter, tillage in the direction of the slope, tillage perpendicular to the slope, and repetition of treatments under forest canopy, were constructed on a hillslope. After each rainfall event, the amount of rainfall and the corresponding runoff generated in each plot were recorded. The three ML models were used to establish relationships between the recorded runoff and its controlling factors (rainfall, antecedent soil moisture (A.M.C), shrub canopy percentage and height, tree canopy percentage and height, soil texture (clay, silt, and sand percent), slope degree, leaf litter percentage of soil, and tillage interval). These data were normalized, randomized, and divided into training and testing subsets. Results showed that the ANN performed better than the other two models in predicting runoff in the training (R2 = 0.98; MSE = 0.004) and test stages (R2 = 0.90; MSE = 0.95). Statistical analysis and sensitivity analysis of input factors showed that rainfall, rangeland cover, and A.M.C are the three most important factors controlling runoff generation.
The adopted method can be used to predict the effect of different vegetation and land management scenarios on runoff generation in the study area and the areas with similar settings elsewhere.


Introduction
Excessive runoff harms the environment by intensifying soil erosion, water and soil pollution, and flash floods (Schismenos et al. 2022). Therefore, it is necessary to evaluate runoff and water erosion in order to mitigate the harmful effects of excessive runoff (Hu et al. 2020). In recent decades, soil erosion and flood occurrences in many parts of the world have increased due to population growth, deforestation, and other human manipulation of the landscape (Food and Agriculture Organization 2016). Additionally, runoff and soil erosion cause excessive sedimentation downstream of rivers, introducing pollution to aquatic environments and reducing reservoirs' storage capacity (Hu et al. 2020; Boussadia-Omari et al. 2021; Gholami and Sahour 2022). Studies have shown that runoff rates depend on various factors, including rainfall (amount and intensity), land cover and land use, topography, antecedent soil moisture (A.M.C), and soil properties (Poesen and Hooke 1997). On hillslopes, slope steepness, vegetation cover, and soil type are the most critical factors affecting splash erosion, overland flow, and runoff (Yair and Lavee 1974).
Previous studies used experimental models to quantify runoff generation using geomorphological and physical parameters (Wischmeier and Smith 1978; Jasrotia et al. 2002). The importance of runoff prediction in hydrological studies has urged scientists to develop various experimental and mathematical methods for rainfall-runoff modeling. Despite their effectiveness for specific conditions, experimental methods are limited by high uncertainty and significant differences between estimated and actual runoff volume (Alewell et al. 2019).
The conversion of rainfall to runoff is a nonlinear process in time and space and is controlled by various factors. Therefore, advanced methods are required to account for the complexity of the rainfall-runoff process and to incorporate multiple controlling elements. Measuring runoff through hydrometric stations and field measurements such as field plots is costly and time-consuming; thus, alternative methods for accurately estimating runoff are essential for watersheds without gauging stations.
Machine learning (ML) techniques have been used in numerous studies and have proven to be useful tools for assessing and mapping different types of rainfall-runoff models (Liu et al. 2016; McAfee and Brynjolfsson 2017; He et al. 2019; Gholami et al. 2019; Kratzert et al. 2019a, b; Cappugi et al. 2021; Mina et al. 2022). ML techniques, including artificial neural networks (ANNs), random forest (RF), support vector machine (SVM), deep learning (DL), and extreme gradient boosting (EGB), have been used in environmental modeling (Sahour et al. 2021a, b; Halecki et al. 2018).
An ANN comprises a series of connected nodes that establish nonlinear relationships between inputs and corresponding outputs. The ANN is inspired by biological nervous systems (Kia et al. 2012). The structure of an ANN enables the nodes to pass information between one another and allows them to learn from data, recognize patterns, and make decisions. According to research by Braddock et al. (1998) and the Task Committee (ASCE), 90% of the ANNs used in hydrological studies are based on backpropagation (BP) algorithms (Lippman 1987). The ANN technique employed in this study is known as the multilayer perceptron (MLP). An MLP is composed of sets of units called perceptrons. A perceptron consists of one or multiple inputs, an activation function, and an output. An MLP consists of three main layers (input, hidden, and output), each composed of neurons, and is typically trained by a BP algorithm, in which the connection weights are adjusted based on the error values. In this study, a feed-forward MLP was used to model runoff.
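As a minimal sketch of the feed-forward computation described above (not the network trained in this study), the following code passes one input vector through a single hidden layer with a hyperbolic tangent activation and a linear output neuron. The function name, weights, biases, and layer sizes are hypothetical values chosen only for illustration; training by BP is not shown.

```python
import math

def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer perceptron network.

    Each hidden neuron computes tanh(w . x + b); the single output
    neuron is a linear combination of the hidden activations.
    """
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        hidden.append(math.tanh(z))
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical example: 3 normalized inputs, 2 hidden neurons, 1 output.
x = [0.8, 0.1, 0.5]
hidden_weights = [[0.4, -0.6, 0.1], [-0.3, 0.2, 0.7]]
hidden_biases = [0.0, 0.1]
output_weights = [1.2, -0.5]
output_bias = 0.05

prediction = mlp_forward(x, hidden_weights, hidden_biases,
                         output_weights, output_bias)
print(f"predicted (normalized) runoff: {prediction:.4f}")
```

In a trained network the weights would be fitted to the plot data rather than set by hand.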
EGB is a predictive ML method used here to model runoff from a set of input variables. The learning procedure in EGB is based on the decision tree algorithm, which proceeds from observations of input variables (branches of the tree) to the prediction of the output variable (leaves of the tree). Tree-based models have several advantages, including handling various data types, modeling complicated interactions, and managing missing data with minimal loss of information. However, tree-based learning algorithms have two main limitations: weak predictive performance and difficulty in analyzing large trees. To address these shortcomings, gradient boosting was first introduced by Breiman et al. (1984) and further developed by others (Mason et al. 2000; Chen and Guestrin 2016). The EGB method combines a series of sequentially constructed weak models to create a robust final model (Mirzaei et al. 2021; Sahour et al. 2021b, 2022). The EGB is based on reinforcement and boosting, which can be described as creating a "strong learner" by combining the results of several weak learners (Fan et al. 2018). The EGB also uses parallel processing, which reduces the required computing time (Fan et al. 2018; Naghibi et al. 2020).
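In gradient boosting, the goal is to use a set of inputs $(X_1, \ldots, X_n)$ to predict a set of outputs $(Y_1, \ldots, Y_n)$ by creating a model $F(X) \rightarrow Y$ and minimizing the sum of the loss function, $J = \sum_{i} L(Y_i, F(X_i))$, by improving $F(X)$. This is performed by calculating the negative gradients of $J$ with respect to $F(X_i)$, which is $-\partial J / \partial F(X_i)$. Then, a regression tree $h$ is fit to the negative gradients $-\partial J / \partial F(X_i)$, and the model is updated as

$$F \leftarrow F + \rho h,$$

where $\rho$ is the step size of the algorithm to achieve the estimated minimum of $J$.
As a significant improvement in EGB, the process begins with a loss function $L(Y_i, F(X_i) + h(X_i))$ and minimizes

$$J = \sum_{i} L\big(Y_i, F(X_i) + h(X_i)\big) + \gamma T + \frac{1}{2} \lambda \lVert w \rVert^{2},$$

where $T$ represents the number of leaves in the tree, $w$ is the weight of the leaves, and $\gamma$ and $\lambda$ are regularization parameters.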
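The residual-fitting loop described above can be illustrated with a short sketch. It implements plain gradient boosting with one-split regression trees (stumps) under a squared-error loss, for which the negative gradient is simply the residual; it does not include the regularization terms or the parallel tree construction of EGB. The function names, the `step_size` value, and the rainfall and runoff numbers are hypothetical and are not data from this study.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=50, step_size=0.1):
    """Boost stumps on squared loss; return the model and per-round training MSE."""
    initial = sum(ys) / len(ys)  # F starts as the mean of the targets
    trees = []
    preds = [initial] * len(xs)
    history = []
    for _ in range(n_rounds):
        # For L = 0.5 * (y - F)^2 the negative gradient is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        trees.append(h)
        # Update F <- F + step_size * h
        preds = [p + step_size * h(x) for p, x in zip(preds, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def model(x):
        return initial + step_size * sum(h(x) for h in trees)

    return model, history

# Hypothetical rainfall (mm) and runoff (mm) pairs, for illustration only.
rain = [5, 10, 15, 20, 25, 30, 35, 40]
runoff = [0.0, 0.0, 0.0, 0.5, 2.0, 4.5, 7.0, 10.0]
model, history = gradient_boost(rain, runoff)
print(f"training MSE: first round {history[0]:.3f}, last round {history[-1]:.3f}")
```

Each round fits a new weak learner to what the current ensemble still gets wrong, which is why the training error shrinks as rounds accumulate.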
The high performance of ML models has been reported in various types of runoff-induced soil erosion modeling (Kern et al. 2017; Zhao et al. 2020). ML models are well suited to mapping relationships between input and output data. Using ML techniques, runoff can be estimated by quantifying its relationship with its controlling factors. These methods have become popular in water science and engineering due to their nonlinear mathematical structure (Prasad et al. 2018; Nawar and Mouazen 2019; Kashani et al. 2020). One of the most common types of neural networks is the multilayer perceptron (MLP), which has been successfully used in a wide range of applications, including soil erosion and rainfall-runoff simulation (Lippman 1987; Braddock et al. 1998; Harris and Boardman 1990; Loh and Tim 2000; ASCE 2000; Kisi 2008; Wang et al. 2009; Hu et al. 2020). For example, Pal et al. (2003) successfully predicted air temperature using a combination of MLP and self-organizing map (SOM). Adamowski (2013) compared SVM and ANN performance in rainfall-runoff modeling in a mountainous basin in Uttaranchal, India; the findings indicated the higher accuracy of the SVM model in predicting surface runoff, baseflow, and total flow. Successful attempts in rainfall-runoff modeling using ANNs can also be found in Nourani and Komasi (2013) and Lafdani et al. (2013).
A coactive neuro-fuzzy inference system (CANFIS), a generalized form of ANFIS (adaptive network-based fuzzy inference system), is a type of ANN based on the Takagi-Sugeno fuzzy inference system. The method was developed in the early 1990s, when Jang (1993) combined the linguistic power of fuzzy systems with ANN training to introduce fuzzy systems based on adaptive neural networks. Its inference system corresponds to a set of fuzzy IF-THEN rules that have the learning capability to approximate nonlinear functions (Abraham 2005). CANFIS is based on an input-output data set of a fuzzy inference system (FIS). This system combines the following components: membership functions of input and output variables (fuzzy), fuzzy rules (rule-based), an inference mechanism (combination of rules with fuzzy input), and output characteristics and system results (nonfuzzy). The CANFIS is an extended version of the earlier ANFIS with multiple input-output pairs. The CANFIS structure consists of five layers, each one being adaptive or fixed. It combines several single-output ANFIS models to produce a multiple-output model with nonlinear fuzzy rules (Dinh and Afzulpurkar 2007). CANFIS has proven to be capable of nonlinear modeling with high precision. It uses the learning ability of ANNs and the modeling ability of fuzzy inference, which enhances its performance over an individual ANN or fuzzy system (Mirjalili and Lewis 2016; Reddy et al. 2017).
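To make the IF-THEN inference step concrete, the sketch below evaluates a first-order Takagi-Sugeno rule base for a single input using generalized bell membership functions. It shows inference only; in CANFIS the membership and consequent parameters are learned from data, which is not shown here. The function names, the two rules, and all parameter values are hypothetical.

```python
def bell_membership(x, a, b, c):
    """Generalized bell membership function: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def tsk_predict(x, rules):
    """First-order Takagi-Sugeno inference for a single input x.

    Each rule is (a, b, c, p, q): the bell parameters of its IF part and
    the coefficients of its linear THEN part, f = p * x + q.
    The output is the firing-strength-weighted average of the rule outputs.
    """
    strengths = [bell_membership(x, a, b, c) for a, b, c, _, _ in rules]
    outputs = [p * x + q for _, _, _, p, q in rules]
    total = sum(strengths)
    return sum(w * f for w, f in zip(strengths, outputs)) / total

# Two hypothetical rules relating rainfall (mm) to runoff (mm):
#   IF rainfall is LOW  (bell centred at 10 mm) THEN runoff = 0
#   IF rainfall is HIGH (bell centred at 40 mm) THEN runoff = 0.3 * rainfall - 2
rules = [
    (10.0, 2.0, 10.0, 0.0, 0.0),
    (10.0, 2.0, 40.0, 0.3, -2.0),
]

for rainfall in (10.0, 25.0, 40.0):
    print(rainfall, round(tsk_predict(rainfall, rules), 3))
```

Because the rule outputs are blended by membership degree, the prediction moves smoothly from one rule to the other as rainfall increases.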
Although several studies have used ML algorithms for rainfall-runoff modeling (Milly et al. 2008; Vaze et al. 2015; Kratzert et al. 2019a, b), few have used these algorithms to disclose the nonlinear relationships between runoff and its controlling factors.
The critical issue is to provide an efficient methodology to model the rainfall-runoff process with respect to land and vegetation management, and to find an optimum ML-based framework to evaluate the effect of land use management practices on runoff control. Therefore, the current study aims to fill the gap in the existing literature by quantifying the effects of landscape management, vegetation cover, soil properties, and topography on runoff generation. In this study, we simulated various management scenarios by constructing field plots in open areas that replicate natural environments to investigate the combined effect of crop management and landscape parameters on the generation of runoff.

Study area
The study site is a hillslope located at the University of Guilan, Iran (49° 19′ E longitude, 37° 18′ N latitude) (Fig. 1). The elevation of the study site is 20 m above sea level. The average annual precipitation is 1360 mm, and the region's climate is temperate and humid. The study area comprises Quaternary alluvial sediments with a heavy-textured clay or loamy clay soil. The predominant vegetation of the area is grass and native trees, including beech and alder. The field plots were placed on a hillslope with a length of 100 m and a slope of about 25° (Fig. 1). A rain gauge was installed at the study site to measure rainfall.

Runoff measurement in field plots
Eighteen plots with dimensions of 2 × 1 m and different land cover types were placed on the hillslope with minimum soil tampering. The plots were placed in pairs with the same size and dimensions. The total height of the plots is 40 cm with 10 cm on the ground (Fig. 2). The plots are made of thin metal sheets, making them stable in the ground with minimum soil tampering. The plots were built with triangle-shaped outlets to direct the runoff to a drainage pipe, which transferred the runoff to properly sealed reservoirs. The plots were established before spring, the beginning of the growing season in the region. As mentioned earlier, the plots contained various vegetation types and soil covers. For the bare-soil plot, runoff and rainfall were recorded at the beginning of plowing and before vegetation growth. For the other plots with vegetation cover, the measurements were carried out from late spring 2019 to late winter 2020, when the root systems of the plants had formed and the plants were stable in the ground.
The eighteen plots were tested with different soil and vegetation treatments, including bare soil, rangeland vegetation cover, forest litter, rangeland litter, tillage in the slope direction, tillage perpendicular to the slope, and repetitions of the treatments under tree canopy. An automated rain gauge was set up near the plots to measure rainfall values, and an evaporation pan was used to measure possible evaporation from the rain gauge surface. These devices measure rainfall and evaporation with an accuracy of one millimeter. The precision of the runoff measurement depends on the plot functioning correctly in directing the runoff toward the tank. This was addressed by designing a non-rectangular outlet for the plots. The volumetric method can measure the collected runoff with an accuracy of 0.01 L. However, the primary issue is to direct all generated runoff into the tanks. In this regard, selecting an accessible study site on the university campus and monitoring the drainage process during rainfall events helped minimize errors in runoff measurement.

Plots and treatments
The soil texture (clay or loamy clay) and land slope (25°) were the same in all plots. The plots' small size and similar shapes helped to create uniform soil and slope conditions, so that the effects of other factors, including vegetation, litter, and land management, could be investigated. A detailed explanation of each treatment used in the plots is as follows:

Rangeland vegetation treatment
In this treatment, plots with 0, 50, and 100% vegetation coverage were used to evaluate the effects of rangeland vegetation on runoff generation. Canopy height was about 20 cm (Fig. 2A). The rangeland species were native grasses, including Paspalum and Dactylis glomerata. The canopy percentage was estimated by photography and calculating the ratio of the covered area to the plot area. Vegetation height and canopy percentage were measured with accuracies of 1 cm and 1%, respectively.

Wooded rangeland vegetation treatment
In this treatment, plots with 0, 50, and 100% rangeland vegetation cover were placed under trees with a height of about 10 m and canopy percentages of 30 and 50% (Fig. 2E). The trees were native species of beech (Fagus orientalis) and alder (Alnus subcordata). The percentage of tree canopy was estimated by photography and calculating the ratio of the covered area to the plot area, with a 1% accuracy.

Treatment of plowed bare soil
Two plots were used in this treatment with thoroughly plowed soil and without vegetation. One was used under the tree canopy, and the other was placed away from the tree canopy.

Rangeland litter treatment
We used two plots in this treatment, one with 50% and the other with 100% litter coverage (Fig. 2B, C). Dried litter remnants of grass species were used for these plots. The depth of litter was uniform across the plot area (measured with a ruler with an accuracy of 1 mm), and the litter percentage was assessed by photography and calculating ratios with an accuracy of about 1%.

Forest litter treatment
After plowing the land, we used different percentages of litter and chopped branches of native trees in this treatment. These plots included one with 50% and the other with 100% forest litter coverage (Fig. 2B, C). The thickness of the litter cover was the same in each plot (measured with a ruler with a 1 mm accuracy). Litter percentage was assessed by photography and calculation of ratios with 1% accuracy. A separate plot was used for the plowed bare soil.

Tillage treatment parallel to the slope
In this exercise, we plowed the soil parallel to the slope and then planted native rangeland vegetation (same as rangeland plot treatment) (Fig. 2D). Cultivation distances were about 30 cm in three rows. The plants were cultivated at the beginning of spring, and measurements started after the full establishment of plants at the end of spring and continued until the end of the year.

Tillage treatment perpendicular to the slope
In this exercise, the plot was first plowed perpendicular to the slope, then native rangeland plants were planted on the ridges (Fig. 2D). Cultivation distances were about 40 cm in three rows. The plants were cultivated at the beginning of spring, and measurements were conducted from the end of the spring to the end of winter.

Model inputs and output
The factors affecting the generation of runoff (independent variables), including rainfall, AMC, rangeland cover, forest cover, soil texture, slope degree, percentage of litter cover, and tillage direction, were used as inputs of the ML models. Runoff generated at each plot was measured and used as the output (dependent variable) in the modeling process.

Runoff
The runoff height in each rainfall event is the only output of the models in this study. After each rainfall event, the runoff volume collected in the reservoirs was measured. The runoff volume was divided by the plot area to obtain the runoff height of the rainfall event in mm. Small rainfall events do not generate significant runoff and are, therefore, not suitable for comparing and evaluating the role of vegetation and land management in runoff generation. Runoff is generated when rainfall exceeds the initial loss (sum of infiltration, interception, and depression storage). Initial loss can be estimated using experimental methods. Based on the SCS-curve number method, the approximate values of the initial loss were determined between 15 and 30 mm (Eqs. 1 and 2):

$$S = \frac{25400}{CN} - 254 \quad (1)$$

$$I_a = 0.2\,S \quad (2)$$

where CN is a curve number that is determined based on soil hydrological group, land use, soil surface vegetation, and AMC; S is the total amount of loss in rainfall (mm); and $I_a$ is the initial loss (mm), equal to two-tenths of S. This means there will be no runoff for rainfall values below 0.2S. The effects of vegetation, soil texture, and land use on runoff generation are more pronounced after heavy rainfall (Mein Rafiei Sardoii et al. 2012). The total loss values (S) were calculated by subtracting runoff from rainfall values. There was no depression storage associated with the plots in this study due to the small size and locations of the plots. The initial loss values indicated the amount of infiltration into the ground and interception by the leaves in the forested plots. Interception is the part of the rainfall retained by the tree canopy.
When rainfall and runoff data are used to model the rainfall-runoff process, it is also necessary to include no-rainfall events in the inputs so that the models do not generate runoff for non-rainy days.
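The runoff-height conversion and the initial-loss estimate described in this subsection can be sketched as follows. The code assumes the standard metric form of the SCS-curve number relations (S = 25400/CN − 254, with the initial loss taken as 0.2S). The function names, the collected volume, and the two curve numbers are hypothetical; the curve numbers were chosen only because they bracket the 15 to 30 mm initial-loss range reported above and are not values from this study.

```python
def runoff_depth_mm(volume_liters, plot_area_m2):
    """Convert collected runoff volume to depth: 1 L over 1 m^2 equals 1 mm."""
    return volume_liters / plot_area_m2

def potential_retention_mm(curve_number):
    """SCS-CN total loss (potential maximum retention) S in mm."""
    return 25400.0 / curve_number - 254.0

def initial_loss_mm(curve_number, ratio=0.2):
    """Initial loss, taken as a fixed fraction (0.2) of S."""
    return ratio * potential_retention_mm(curve_number)

# Plot area from the 2 m x 1 m plot dimensions; the volume is hypothetical.
plot_area = 2.0 * 1.0
print(runoff_depth_mm(3.0, plot_area), "mm of runoff from 3.0 L")

# Hypothetical curve numbers, for illustration only.
for cn in (63, 77):
    print(f"CN = {cn}: S = {potential_retention_mm(cn):.1f} mm, "
          f"initial loss = {initial_loss_mm(cn):.1f} mm")
```

Rainfall events smaller than the initial loss would not be expected to produce runoff under this relation.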

Rainfall
The amount and intensity of rainfall are the most influential input parameters in generating runoff. Because only a cumulative rain gauge was available, only rainfall amounts could be measured. Rainfall values were measured with a 1 mm accuracy using a rain gauge installed near the plots (Fig. 1). After each measurement, the rain gauge was emptied for the subsequent measurement.

A.M.C
Measuring the A.M.C is difficult and time-consuming. Therefore, the total precipitation of the latest 5 days was used as an indicator of A.M.C (Song and Wang 2019). To record the total rainfall of the latest 5 days, the data of the meteorological station and the rain gauge (1 mm accuracy) located on the site were used.
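A minimal sketch of this indicator is given below. It assumes a list of daily rainfall totals and sums the five days before the event day, excluding the event day itself; whether the event day was included in the original calculation is not stated, so that choice is an assumption. The function name and the rainfall values are hypothetical.

```python
def antecedent_rainfall_mm(daily_rainfall_mm, event_index, window_days=5):
    """Sum rainfall over the `window_days` days preceding the event day.

    The event day itself is excluded (an assumption of this sketch).
    If fewer than `window_days` days of record exist, all available
    preceding days are used.
    """
    start = max(0, event_index - window_days)
    return sum(daily_rainfall_mm[start:event_index])

# Hypothetical daily rainfall record (mm); the event occurs on the last day.
daily = [0.0, 12.5, 0.0, 3.0, 0.0, 8.2, 30.0]
amc_index = antecedent_rainfall_mm(daily, event_index=6)
print(f"5-day antecedent rainfall: {amc_index:.1f} mm")
```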

Rangeland cover
To evaluate the effect of rangeland cover, two indicators were used quantitatively. The most important indicator was the percentage of rangeland canopy. The percentage of rangeland cover was estimated through photography and the ratio of vegetation cover area to the plot area. Another indicator of rangeland cover is vegetation height. In the study plots, the percentage of vegetation cover varied between 0 and 100, and the height of the rangeland cover varied between 0 and 20 cm.

Forest cover
The heights of the trees were almost the same because they belonged to the same species and age. The percentage of tree canopy was also determined by photography and comparing the covered area with the plot area. Concerning runoff generation, the percentage of tree canopy affects the interception of rainfall and the amount of runoff (Berland et al. 2017; Liu and Chang 2019; Selbig et al. 2022). Moreover, the height of the trees (10 m) and the distance between the canopy and the ground level affect soil splash and runoff production (Ghahramani et al. 2011).

Soil texture
Soil texture was determined by sampling and soil tests. This parameter was used quantitatively in the modeling process through the percentage of sand, clay, and silt (Sahour et al. 2021a). However, soil texture was not an effective input in the rainfall-runoff model in this study due to the lack of noticeable changes in the limited area of the plots.

Slope degree
The land slope was determined by land surveying with a 1° accuracy. The slope was 25°, with minimal variation in the plots. Due to its limited variation, the land slope was not an optimal input of the model (or an effective factor) in runoff generation.

Percentage of litter cover
Rangeland and forest litter covers with variable percentages were used in different plots (0, 50, and 100% litter covers).
The percentage of litter in each plot was measured by photographing the plot and calculating the ratio of the litter-covered area to the plot area with a 1% accuracy.

Tillage pattern
The tillage pattern is challenging to quantify and incorporate into the modeling process. The row widths were estimated and used as an input to incorporate the tillage pattern in the rainfall-runoff models. Using different tillage patterns, we could compare the effect of two tillage scenarios (perpendicular and parallel to the slope) on runoff generation.
A summary of the plots and treatment conditions is shown in Table 1. Moreover, the measurement methods and their accuracies are presented in Table 2.

Rainfall-runoff modeling
This study used three machine learning methods, namely ANN (MLP), CANFIS, and EGB, for rainfall-runoff modeling. The output of the machine learning models was recorded runoff values, and the inputs included rainfall, percentage of rangeland canopy, rangeland vegetation height, percentage of the tree canopy, A.M.C, slope, soil texture, tillage row widths, and litter percentage. First, the data were randomized, and after normalizing the data, they were divided into two subsets of training (including 70% of total data) and test data (30% of data). The same training and testing data were used in the modeling process for all three methods. Then, the model training, model optimization, and model testing were performed. For ANN models, the Grid Search method was used to determine the optimal network structure, including optimal transfer function, optimal learning technique, optimal inputs, number of neurons, and number of optimal training cycles in the training and optimization phases (Isaaks and Srivastava 1989; Isik et al. 2013). This method chooses the optimum parameters by searching through a subset of manually specified values. The optimal number of hidden neurons was found from a list of numbers from 1 to 10. Finally, the optimal inputs were selected based on their runoff modeling and sensitivity analysis performance. We used a grid search method to find optimum hyperparameters of the EGB model, including learning rate, number of estimators, maximum depth, and subsample. The three methods in runoff modeling were evaluated by comparing the predicted values with the measured ones using statistical coefficients (R 2 and mean squared error (MSE)). An optimal network is a network that provides the highest R-squared and lowest MSE.
Table 1 The plot types and treatments in the field studies. Soil texture and the ground slope remained invariant in all field plots
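The data preparation and grid search steps described in this section can be sketched in plain Python as follows. The normalization formula is not stated in the text, so min-max scaling is assumed here; the function names, the random seed, and the toy scoring function are hypothetical stand-ins. In practice the score would be the validation error of a trained model for each candidate setting.

```python
import random
from itertools import product

def min_max_normalize(column):
    """Scale a list of values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def shuffle_and_split(records, train_fraction=0.7, seed=42):
    """Randomize the records and split them into training and test subsets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; keep the one with the lowest score."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical rainfall values (mm), normalized and split 70/30.
rainfall = [1.5, 5.0, 8.0, 12.0, 20.0, 22.5, 27.0, 31.0, 36.0, 40.0]
train, test = shuffle_and_split(min_max_normalize(rainfall))

# Toy score standing in for a model's validation MSE (illustration only).
def toy_score(params):
    return (params["hidden_neurons"] - 4) ** 2 + params["learning_rate"]

grid = {"hidden_neurons": range(1, 11), "learning_rate": [0.01, 0.1]}
best_params, best_score = grid_search(grid, toy_score)
print(len(train), len(test), best_params)
```

Fixing the seed keeps the split reproducible, which matters when the same training and testing subsets must be reused across several models.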

Model performance evaluation
The performance of the three models was evaluated by comparing the predicted runoff values with the measured values on training and testing subsets. We used statistical coefficients, namely mean square error (MSE), coefficient of determination (R 2 ), and mean absolute error (MAE). The optimum model is the model that produces the highest R 2 and the lowest error values.
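For reference, the three criteria can be computed as in the sketch below. R 2 is implemented here as 1 − SS_res/SS_tot; the text does not state which definition was used (some studies report the squared correlation coefficient instead), so this is an assumption. The function names and the observed and predicted values are hypothetical.

```python
def mse(observed, predicted):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted runoff values (mm).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
print(mse(observed, predicted), mae(observed, predicted),
      r_squared(observed, predicted))
```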

Results of field plots
We measured the runoff generated in the field plots after ten rainfall events. The recorded rainfall values ranged between 1.5 and 40 mm. However, only rainfall events of more than 20 mm generated runoff in all plots. The rangeland canopy percentage ranged from 0 to 100%, and the tree canopy percentage ranged from 0 to 50%. Further, the amount of litter in the plots varied between 0 and 100%. Soil texture and land slope did not change significantly among the plots. The A.M.C was estimated between 0 and 53.7 mm in the studied rainfall events. Figure 3 presents the measured runoff volume (in liters) in the plots, the rainfall recorded at the rain gauge station, and the different treatment scenarios (A.M.C, litter percentage, vegetation canopy, tree canopy). Soil texture (silt, clay, and sand) and ground slope are not shown in Fig. 3 because they did not vary notably among the plots. The difference between the amount of rainfall and runoff indicates the amount of total loss. Total loss (S) varied between events (Fig. 2, S = rainfall − runoff). During rainfall, even under similar soil and slope conditions, the amount of runoff and S is highly variable due to the role of vegetation. The runoff values also change from event to event in the same plot with changes in rainfall and AMC.

Relationships between model inputs and runoff
The relationship between the measured runoff and its controlling factors is shown in Fig. 3. The rainfall-runoff process is affected by parameters such as rainfall intensity, rainfall value, vegetation, land cover, slope, A.M.C, and land management. Pearson's correlation coefficients between the measured runoff and the factors affecting runoff generation are given in Table 3. According to Table 3, all factors, except slope, soil texture, and tree height, have a significant relationship with runoff. Soil texture and slope are generally among the most important factors in runoff generation, but they showed no significant relationship with runoff in this study because these variables did not change among the plots. Due to high infiltration, a sand-dominated soil type can prevent runoff production even during heavy rainfall. We did not have the instruments to measure rainfall intensity and, therefore, did not evaluate this factor. Additionally, the study aimed to provide a workflow that could potentially be duplicated in areas with minimal hydroclimate recording tools.
Rainfall value has a significant and direct relationship with runoff (R = 0.76). Other factors, including rangeland vegetation, tree canopy, A.M.C, litter cover, and tillage row width, have substantial and inverse relationships with runoff production. Soil moisture is the second most important factor for runoff values (R = −0.32). However, in the rangeland plots, where vegetation cover changed from zero to 100%, rangeland vegetation is considered the second factor after rainfall. Tree height is effective in protecting soil from splash erosion and land degradation and in preventing soil hardpan formation. However, tree height in the plots did not change notably; therefore, it was not one of the optimum variables for the models' inputs.
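The correlation coefficients discussed above follow the usual Pearson definition, which can be sketched as below. The function name and the rainfall and runoff values are hypothetical and are not the measurements behind Table 3.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical rainfall (mm) and runoff (mm) pairs, for illustration only.
rainfall = [5.0, 12.0, 20.0, 28.0, 40.0]
runoff = [0.0, 0.2, 1.1, 3.0, 6.5]
print(f"R = {pearson_r(rainfall, runoff):.2f}")
```

A positive coefficient indicates a direct relationship and a negative one an inverse relationship; a variable that does not vary across the plots has zero variance, so its coefficient is undefined.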
Effects of tillage direction (parallel and perpendicular to slope direction) were compared in plots with different vegetation heights and percent coverage. Figure 2D shows a pair of plots with different tillage directions. These plots are constructed and cultivated with similar conditions of soil and rangeland species. At the end of the growing season, we noticed that vegetation conditions and especially

Modeling results
The grid search method was used in the modeling process to determine the optimal hyperparameters and network structures for the ANN, CANFIS, and EGB models. The results indicate that rainfall values, A.M.C, rangeland vegetation, canopy percentage, litter, and tillage width are optimal inputs to predict runoff amounts. Further, the hyperbolic tangent transfer function and the Levenberg-Marquardt (LM) learning technique were the optimum parameters of the MLP network, as was also found in previous studies (Gholami et al. 2019). In the CANFIS modeling process, the bell membership function and the Takagi-Sugeno-Kang (TSK) fuzzy model were the optimum, parameters that were also found to be the best options in previous studies (Zhang et al. 2009).
All three models used in the study showed acceptable performance in the training stage (Table 4). The comparison between observed and predicted runoff on the test subset is shown in Table 5. According to the training and testing results, all three methods showed favorable performance. The ANN and CANFIS showed the highest and the lowest performance, respectively.
The comparison between predicted and measured runoff values in the test stage is presented in Fig. 4. There is a good match between predicted and measured values. However, the three models showed notable differences in the prediction of maximum, average, and minimum runoff values. Finally, a sensitivity analysis of input factors was performed using the tested models.
Table 3 Pearson's correlation coefficients between runoff and its controlling factors

Field plots data
In rainfall-runoff modeling, the selection of input variables is important. Statistical analysis, sensitivity analysis, and model optimization showed that rainfall, vegetation (type and canopy), and soil moisture are the best inputs for rainfall-runoff modeling. Previous studies have similarly identified rainfall, vegetation, slope, and soil moisture conditions as important factors (Fang et al. 2015; Varvani et al. 2019; Gao et al. 2020). Based on statistical analysis, rainfall and A.M.C have a positive relationship with runoff production, and vegetation and litter are inversely related to runoff values. The most important natural factor in runoff production is rainfall. However, vegetation cover is the most important factor that humans can manipulate to decrease runoff generation. Land slope and soil properties are also two important factors; however, in this study, we focused on the role of vegetation treatment and land management in controlling runoff. Therefore, to investigate the effect of these two factors on plots, the conditions of slope and soil properties should be the same (Gholami et al. 2018). In other words, since there were no noticeable changes in soil texture and land slope in the study plots, no significant relationship between them and changes in runoff values was observed. One of the primary goals of the current research was to select the conditions of the plots in such a way (uniform soil conditions and land slope) that the changes in runoff generation depend on the characteristics of rainfall, vegetation, and land management. The slope length was also omitted from the factors controlling runoff generation because it was challenging to investigate the effect of slope length within the small size of the plots (Ghahramani et al. 2011). Generally, on a steep slope, the runoff velocity increases from the top of the slope downhill (Kara et al. 2010; Liu et al. 2020).
Although rainfall intensity is an important parameter in runoff generation, the study area lacks gauging equipment to measure this parameter. This is also the case in the majority of watersheds in the region. Therefore, this study presented a practical model-based workflow for watersheds lacking rainfall intensity measurement equipment. The methodology could be used in other watersheds with similar settings.
The vegetation factor has multifaceted effects on runoff production. The type of vegetation and its density are the most important factors. Rangeland cover or short crop cover with high density has maximum efficiency because it controls the effect of rain splash and has high water infiltration and absorption. The highest efficiency in controlling runoff was observed in a plot with rangeland cover with 100% canopy cover. Vegetation determines litter type and coverage and soil humus. Tree canopy cover showed a limited effect on runoff control. However, it should be noted that there was no significant rangeland vegetation cover under the tree canopies.
Previous studies showed that dense forests could reduce the runoff by up to 40% (Styzcen and Morgan 1995;Buendia et al. 2016;Wolka et al. 2018). However, we observed only a 10% decrease in runoff generation in this study. The low impact of trees on runoff reduction was due to the limited density of trees and the low density of tree canopy. We did not have broadleaf trees with more than 60% canopy.
The effect of litter on reducing runoff production is highly variable and depends on the amount and type of litter (Ghahramani et al. 2011; Gomyo and Kuraj 2016). The results showed that both litter type (remnants of grass species or tree mulch) and the percentage of litter cover effectively control runoff. The use of litter and large chopped tree branches can lead to the generation of surface runoff due to the mass and durability of the debris. The litter of herbaceous plants and rangeland vegetation has a relatively better efficiency in controlling runoff and soil erosion (Whitford 2002; Muñoz-Robles 2010).
One of the important results is the effect of crop management and tillage direction. These parameters could not be evaluated statistically, but a visual comparison of two adjacent plots is shown in Fig. 2D, one planted in the slope direction and the other planted with the same species perpendicular to the slope direction. The two plots were similar in slope, soil characteristics, rainfall, planting species, and planting method. The plants were cultivated in the spring without human interference or irrigation during the growing period, and the result was observed after three months. Comparing the two plots shows that tillage perpendicular to the slope reduces runoff velocity, increases infiltration and soil moisture retention, and improves vegetation growth (Muñoz-Robles 2010; Laufer et al. 2016), which eventually reduces runoff. Runoff generated in the plot with perpendicular tillage was 40% less than in the plot with tillage parallel to the slope, showing the effectiveness of perpendicular planting in controlling runoff.
Total loss (S) showed high variability among treatment scenarios. The maximum S, observed in the rangeland vegetation treatment with 100% canopy, was almost equal to the total rainfall for average and minimum rainfall events. The minimum S, observed in the bare soil treatment, was very small. Total loss (S) and initial loss (0.2S) are primarily related to rainfall and vegetation conditions. Rainfall events of less than 20 mm were unable to generate notable runoff regardless of vegetation type and treatment scenario.
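The notation S and 0.2S matches the SCS curve number runoff equation. Assuming that relation underlies the quantities above (this section does not state it explicitly), the link between rainfall, total loss, and runoff can be sketched as follows. The function names and example values are hypothetical and are not taken from the study's data.

```python
import math

def scs_runoff(rainfall_mm, s_mm, ia_ratio=0.2):
    """Runoff depth Q (mm) from the SCS curve number equation.

    rainfall_mm: event rainfall depth P
    s_mm:        potential maximum retention ("total loss") S
    ia_ratio:    initial loss as a fraction of S (0.2, as in the text)
    """
    if rainfall_mm < 0 or s_mm < 0:
        raise ValueError("rainfall and S must be non-negative")
    initial_loss = ia_ratio * s_mm
    if rainfall_mm <= initial_loss:
        return 0.0  # all rainfall is absorbed by the initial loss
    excess = rainfall_mm - initial_loss
    return excess ** 2 / (excess + s_mm)

def total_loss_from_event(rainfall_mm, runoff_mm):
    """Back-calculate S (mm) from one measured event, assuming Ia = 0.2 S.

    With zero runoff this returns 5 * P, the smallest S consistent with
    no runoff, so it is only a lower bound in that case.
    """
    p, q = rainfall_mm, runoff_mm
    return 5.0 * (p + 2.0 * q - math.sqrt(4.0 * q * q + 5.0 * p * q))

# Hypothetical events: a high-retention plot and a low-retention plot.
print(scs_runoff(20.0, 100.0))  # rainfall equals the initial loss, so no runoff
print(scs_runoff(50.0, 5.0))    # small S, so most rainfall becomes runoff
```

Under this relation, a plot whose S approaches the event rainfall produces almost no runoff, while a plot with very small S converts most of the rainfall into runoff.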

Rainfall-runoff modeling
The performance evaluation of the three models showed that the MLP has a higher performance than the EGB model, but the EGB model predicts minimum runoff values with higher accuracy. This capability of the EGB model is important in predicting runoff values in small rainfall events in the annual time scale.
Previous studies showed that models have larger errors at minimum and maximum values (Jimeno-Sáez et al. 2018; Sahour et al. 2021a). This could be due to the nonlinear relationship between rainfall and runoff values, which makes the models more error-prone at high runoff values. To address this issue, it is necessary to incorporate high rainfall and runoff values in the modeling process. Moreover, accurate estimation of maximum runoff values is particularly important in watershed management, flash flood prediction, and soil and water conservation plans.
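One way to make this range-dependent error visible is to compute the mean squared error separately for low, intermediate, and high observed runoff. The sketch below assumes hypothetical cut-off values and hypothetical observed and predicted series; it is not the evaluation procedure used in this study.

```python
def mse(observed, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def mse_by_range(observed, predicted, low_cut, high_cut):
    """Group events by observed runoff and return the MSE of each group.

    Events below low_cut are "low", above high_cut are "high", and the
    rest are "mid". A group with no events gets None.
    """
    groups = {"low": ([], []), "mid": ([], []), "high": ([], [])}
    for o, p in zip(observed, predicted):
        key = "low" if o < low_cut else "high" if o > high_cut else "mid"
        groups[key][0].append(o)
        groups[key][1].append(p)
    return {k: (mse(obs, pred) if obs else None) for k, (obs, pred) in groups.items()}

# Hypothetical test-stage values in mm (illustrative only).
observed = [0.2, 0.5, 3.0, 4.0, 5.0, 15.0, 20.0]
predicted = [0.3, 0.4, 3.5, 3.5, 5.5, 12.0, 24.0]

errors = mse_by_range(observed, predicted, low_cut=1.0, high_cut=10.0)
for name, value in errors.items():
    print(name, value)
```

Reporting errors per range, rather than a single overall MSE, helps when the choice of model depends on whether small events or flood peaks matter most.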
The input parameters used in this study (rainfall, vegetation height, canopy percentage, slope, soil texture, and A.M.C) can be quantified and incorporated into the modeling process. Using data from meteorological stations, rainfall values and the total precipitation of the preceding 5 days (A.M.C) are available on an annual or decadal scale. Therefore, the adopted methodology can be used to predict runoff values in many places.
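As an illustration of how A.M.C could be derived from a station record, the sketch below sums daily rainfall over the five days before each day. It assumes that the event day itself is excluded from the total, which this section does not specify; the function name and the daily series are hypothetical.

```python
def antecedent_rainfall(daily_rain_mm, window_days=5):
    """For each day, total rainfall over the preceding `window_days` days.

    The current day's rainfall is excluded (an assumption). Days before
    the start of the record are unavailable, so the first few totals
    cover fewer than `window_days` days.
    """
    totals = []
    for i in range(len(daily_rain_mm)):
        start = max(0, i - window_days)
        totals.append(sum(daily_rain_mm[start:i]))
    return totals

# Hypothetical daily rainfall record in mm (illustrative only).
daily = [0.0, 10.0, 0.0, 5.0, 0.0, 0.0, 20.0, 0.0]
amc = antecedent_rainfall(daily)
print(amc)
```

Each total can then be paired with the rainfall of the same day as model inputs for that event.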

Conclusion
Rainfall characteristics are the main determining factor in runoff generation; however, vegetation type and density are the parameters that can be manipulated and used for flood risk management. Additionally, land management and tillage direction are other important factors. Therefore, a suitable vegetation type and tillage practice can be selected by considering soil characteristics, land slope, A.M.C, total loss, and initial loss. Unfortunately, quantifying runoff over large areas is costly and time-consuming. ML models are powerful tools to predict runoff values on a monthly or annual scale based on precipitation data from meteorological stations. In this regard, the selection and precise measurement of inputs play a significant role in the accuracy of the results. The distinctive features of machine learning and AI-based models in predicting runoff are speed of operation, high accuracy, low cost, and, most importantly, the ability to use different structures and models with inputs appropriate to the region's conditions and available data. Depending on the nature of the input and output data, the performance of the models will differ significantly.
This study presented an efficient methodology to predict runoff values with respect to land and vegetation management. The MLP network was more efficient than the CANFIS network and the EGB model in this study. However, the EGB model showed the least error for the minimum rainfall and runoff amounts on sunny days. EGB is therefore a strong choice for predicting long-term runoff when rainfall values vary widely. More generally, the optimal model should be selected based on the nature of the data, the objectives, and the parameters selected in the modeling process. For example, the target parameter can be the prediction of maximum runoff values for flood prediction or the runoff reduction in the study of vegetation effects. For future studies, we suggest using additional plots with different slope degrees and land use types, as well as more rainfall data. We also suggest incorporating rainfall intensity values in addition to rainfall depth.
Data availability All data generated or analyzed during this study are included in this published article [and its supplementary information files].

Declarations
Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent Informed consent was obtained from all individual participants included in the present study.

Conflict of interest
The authors declare no competing interests.