A comparison of different Articial Intelligence and Machine Learning methods for Gully Erosion Susceptibility Mapping in the Upper Narmada Basin

Gully erosion (GE) is one of the most important mechanisms of soil loss worldwide. In this study, various machine learning techniques such as Classication and Regression Trees (CART), Random Forest (RF) and Articial Neural Networks (ANN), have been used to ascertain gully erosion susceptibility (GES) in the Upper Narmada Basin (UNB). The mapping and analysis was achieved using R programming and ArcGIS 10.8 software. Initially, a gully inventory map (GIM) of 1501 gully locations was prepared from Sentinel-2 and Google Earth images and extensive eld surveys. Out of the 1501 gullies in the study area, 1051 gully locations (about 70%) were used for training and 450 gully locations (about 30%) were used for validating the models. For GES modeling, 12 gully conditioning factors (GCFs) were used and the relationships between these GCFs and gully erosion were evaluated. The GES maps were prepared using the CART, RF and ANN models and divided into three susceptibility-based classes: low, moderately and highly susceptible GE classes. A large part of the study area was found to be highly susceptible to GE. Subsequent validation tests proved the high ecacy of these models in ascertaining the GES. The RF model was found to perform best compared to the others in this respect with an AUC-ROC value of 0.78. This model can therefore be used not only in the UNB but in other such areas to evaluate the GES zones and thereby aid in framing suitable measures to mitigate soil loss through gully erosion.


Introduction
Soils are a signi cant biodiversity carrier that sustains vital human needs (Keesstra et al., 2016) and their persistent worldwide deterioration has been a long-standing problem (Magliulo, 2012), adversely affecting lives and livelihoods (UNEP, 2017;Haregeweyn et al., 2017). Soil erosion through gullies is a common occurrence (Nampak, 2018;Amiri et al., 2019), with such narrow, incised channels being pathways for the surface runoff channelization (Kirkby and Bracken, 2009) that leads to substantial landscape degradation (Marzolff and Ries, 2011;Borrelli et al., 2017). This causes loss of a vital natural resource and therefore studies of gully formation and analysis of their causative factors have gained much precedence (Castillo and Gomez, 2016). Gully erosion can foster various environmental issues, such as deserti cation, oods, lowered soil fertility and agricultural productivity decline, which have a detrimental effect on the local economy (Zhang et al., 2018;Zabihi et al., 2018). The factors enabling gully erosion are generally twofold-physiographic (which encapsulates the region's lithology, topography, soil and climatic attributes) and anthropogenic (comprising of its land use and land cover, land management practices and built infrastructure) (Cui et al., 2012;Tang et al., 2013;Castillo and Gomez, 2016;Malik, 2008). This process ensues when the geomorphological threshold value of an area is surpassed by water erosion or sediment formation (Mccloskey et al., 2016).
Gully erosion thus raises the runoff coe cient by changing surface roughness, removing the topsoil and reducing ground permeability, thereby altering the hydrological characteristics of a region (Rahmati et al., 2016). It also affects the production of sediments in upstream areas, river ow conditions and water quality, the effectiveness/maintenance of water transfer structures and the storage of downstream surface water (Rahmati et al., 2017;Renschler and Harbor, 2002). Climatic changes can also contribute to the rise of gullies, following subsequent changes in the local groundwater balance Luca et al., 2011). As continual gully erosion can severely affect agricultural prospects and enhance wasteland expansion (Deng et al., 2015;Prosdocimi et al., 2016), a range of methods have been employed to track their growth and how the ambient landscape attributes enable this process to occur (e.g. Yitbarek et al., 2012;Conoscenti et al., 2013;Desprats et al., 2013;Shruthi et al. 2015), in order to demarcate the most susceptible zones where new gullies can form (Gomez-Gutierrez et al., 2015). The use of geospatial datasets and methods for this purpose is now widespread (e.g. Liu et al., 2016;Goodwin et al., 2017;Bennett and Wells, 2019), along with the infusion of various statistical measures and algorithms that facilitate such analysis and provide accuracy assessments of the generated outputs (El Baroudy and Moghanm, 2014;Bianchin et al., 2016).
In such susceptibility studies, discerning the spatial distribution of gullies becomes important along with the undertaking of a sensitivity analysis of its causative factors (Kirkby and Bracken, 2009;Luca et al., 2011), which analyses the diversity of these attributes (Rahmati et al., 2017;Hosseinalizadeh et al., 2019;Zakerinejad and Maerke, 2015) and the differences in their relative contribution towards enabling gully formation (Agharazi et al., 2017;Conoscenti et al., 2014). Through this, gully erosion susceptibility maps can be derived, which highlight areas where such features are most likely to develop (Dewitte et al., 2015).
While, these investigations have been mostly geoinformatics-based (Sadeghi-Niaraki et al., 2010;Faroqi and Sadeghi-Niaraki, 2016), the recent development of different machine learning algorithms and the integration of geographic information systems (GIS) with the R programming language has facilitated more precise susceptibility model preparation (Arabameri et al., 2019;Hosseinalizadeh et al., 2019).
Since susceptibility assessments depend on a number of diverse factors, statistical approaches, especially particular multivariate techniques, are clearly relevant for obtaining quantitative forecasts of potential event locations of such natural hazards (Luca et al., 2011). In this regard, ANN, Boosted Regression Tree (BRT), MARS and Generalized Linear Model (GLM) techniques have contributed signi cantly to the eld of susceptibility mapping, most notably in landslide hazard assessments and gully erosion studies (Vorpahl et al., 2012;Confortiet al., 2014;Youssefet al., 2016;Gomez-Gutierrez et al., 2015;Pourghasemiet at., 2017;Rahmati et al., 2017). Such machine learning based models are bene cial since-(a) the dependent variable is dichotomous (i.e. 0 or 1) and the expected values can be expressed as a probability; (b) various types of independent variables can be used, such as categorical, ordinal and continuous binary variables; (c) no preliminary assumptions need to be described before they are used; and (d) the independent variables, when used in these models, do not necessarily need to have normal distributions (Luca et al., 2011;Rahmati et al., 2017).
Given the widespread use of machine learning algorithms (MLA) for gully erosion susceptibility analysis, we have undertaken a comparison of some of the most common methods employed, using the Upper Narmada Basin (UNB) in western India as a case study. This provides an insight into which of these methods is more apt for this purpose. We therefore aim to-Assess the different environmental factors that abet the occurrence of gullies in the test region.
Explore the e cacy and robustness of various susceptibility models in predicting gully erosion zones.
To prepare the above GESM outputs for each model tested on the UNB, rstly sites of the existent gullies in the area shall be mapped. Alongside this, the respective layers of the different causative factors associated with gully erosion (as ascertained from the literature) shall be prepared from various geospatial datasets. Different machine learning models will then be calibrated and trained using the above database and executed to predict the gully erosion susceptibility in the area. Subsequently, statistical measures shall be enumerated to gauge these models' effectiveness and accuracy, and their respective outputs shall be compared with the actual gully sites in the area for validation. The primary objective of this analysis would thus be to prepare a GESM for the use of RBF-ICA models from the LR, RF and ensemble models. The novelty of this study lies in its re nement of the process of RBF interpolation by including a susceptibility map of gully erosion using the ICA algorithm.

The Study Area
The Narmada River is known as the lifeline of the central Indian state of Madhya Pradesh and it is the largest westward owing river in the country. It rises in the Amarkantak Plateau of the Maikal Range in Anuppur District of Madhya Pradesh at a height of 1057 m above sea level and travels 1312 km to outfall into the Gulf of Khambhat in the Arabian Sea, near Bharuch in Gujarat. The Central Water Commission divides its basin, which lies hemmed in between the Vindhyachal and Satpura Ranges, into three parts-Upper, Middle and Lower Narmada basin. The Upper Narmada Basin (UNB) comprises of a fairly large tract lying between 21°41 N to 23°46 N and 79°17 E to 81°46 E (Fig. 1). It covers an area of 23089 sq.km (~ 23.37% of the overall basin area) and its perimeter is 1652.20 km. The longitudinal basin length (L) is 258.40 km and the highest and lowest points within the UNB are at 1135.73 m and 328.74 m, respectively, above mean sea level. The UNB has a humid tropical climate, receiving on average an annual rainfall amount of 1178 mm, while the annual average temperature is 25.80°C. About 30.83% of its area is under forest cover, 57.81 % under arable land and 6.26 % is grassland/wastelands.

Materials And Methods
For this study, diverse datasets were gathered together or generated from different sources (Table 1). Image analysis was undertaken on the Google Earth Engine platform on Sentinel-2 (MSI Multispectral Instrument, Level 1C) images of spatial resolution 10 m of the area and a Shuttle Radar Topographic Mission (SRTM) digital elevation model (DEM) of spatial resolution 30 m was used to extract the terrain attributes. The study followed the following steps ( Figure 2): ii. Examination of the interlinkages between the GIM and individual GCFs.
iii. Application of machine learning models [Classi cation and Regression Trees (CART), RF and ANN] to prepare the gully erosion susceptibility maps (GESMs).

Gully Inventory Map
The Gully Inventory Map (GIM) is important since it enables the different predictive models to prepare the gully erosion susceptibility maps (GESMs) (Hosseinalizadeh et al., 2019). The GIM layer was considered to be the dependent variable in this analysis. Gully locations were initially ascertained from the Google Earth Engine platform and veri ed in the eld using a handheld global positioning system (GPS) device.
In the study area, a total number of approximately 1501 gullies were found, from which 1051 gully locations (about 70%) were used for the model training purpose and 450 gully locations (about 30%) were used for validating their respective outputs.

Gully Erosion Conditioning Factors
A signi cant step in the preparation of GESMs is the initial choice of geo-environmental factors (Hosseinalizadeh et al., 2019), which are then analyzed by different machine learning algorithms. Based on the existent literature, twelve gully erosion conditioning factors (GCFs) were chosen (Table 1). These were the area's elevation, slope amount, slope aspect, soil cover, geomorphology, lithology, land use and land cover (LULC), Normalized Difference Vegetation Index (NDVI), annual rainfall, stream density and road density (Figure 3a, 3b, 3c). Another layer with the junctions of streams and roads and existing large gullied tracts was also prepared, as newer gullies are more prone to form near such locations (Gayen et al., 2019;Hosseinalizadeh et al., 2019).
The altitude of an area primarily in uences gully erosion and also affects vegetation and precipitation (Zhu et al., 2014;Gomez-Gutierrez et al., 2015). It also regulates the drainage ow and markedly in uences the moisture content of the soil and the hillside slope attributes (Rahmati et al., 2017). Both, the slope amount and its direction, have a major impact on gully erosion, in terms of channeling the surface runoff (Arabameri et al., 2019). The slope map of the UNB was extracted from the SRTM DEM using the relevant algorithms provided in the ArcGIS toolbox and this was also classi ed into ve zones. The aspect map (similarly prepared from the SRTM dataset), was classi ed into ten sub-classes on the basis of the principal directions.
Due to its direct effects on the rock erodibility, soil character and in ltration rate (Casalı, et al., 1999), the ambient lithology is one of the most important input layers when modeling gully erosion susceptibility.
Along with the lithology, the local geomorphology (both natural and human-made landforms) plays a vital role in conditioning gully erosion (Gayen et al., 2019). Both these layers were prepared from the Geological Survey of India databases and maps for the same. Another important factor governing in ltration and thereby runoff generation for surface erosion is the local soil properties (Hosseinalizadeh et al., 2019). This layer was prepared from the Soil and Land Use Survey of India (SLUSI) data products for this region.
The annual rainfall map of the UNB was prepared on the basis of rainfall data collected from CHIRPS Pentad 2.0 (https://www.chc.ucsb.edu/data/chirps) using the kriging interpolation method in ArcGIS. The stream network layer was extracted from the DEM using the D8 ow routing and ow accumulation algorithms available in ArcGIS and for calculating the drainage density, the line density tool was used. The type of land use and land cover (LULC) greatly impacts on erosional processes. Barren areas typically undergo more rapid erosion than forest-covered surfaces (Dai et al., 2001). Using the maximum likelihood classi cation method, the LULC map of the UNB was derived from the Sentinel-2 images used (Copernicus Global Land Cover Layers: CGLS-LC100 Collection 2). Alongside the extent of the ambient vegetation cover, its robustness/health is a pertinent factor that in uences the amount of precipitation intercepted and lost through evapotranspiration, thereby reducing the amount available for runoff. To gauge the vegetation health in the UNB, the NDVI parameter was ascertained as follows: where, IR is the infrared portion of the electromagnetic spectrum and R is the red portion of the electromagnetic spectrum. NDVI values range between +1 (healthy vegetation) and -1 (non-vegetation class). A road density layer was incorporated since road construction often results in forest clearance and slope modi cation, which can then facilitate runoff channelization and gully initiation. The main road networks were digitized and distance buffers created around them in ArcGIS, using the relevant tools. The junctions of the main roads and streams and the already gullied tracts were extracted through digitization. The principal contours and their alignment/pattern were also demarcated in this layer to highlight the broad physiographic entities within the UNB.
All the above input layers are made similar in terms of their locational reference information and a uniform cell size of 10 m for the ensuing analyses (following similar procedures adopted by Rahmati et al., 2017;Gayen et al., 2019;Hosseinalizadeh et al., 2019).

Classi cation and regression trees (CART)
CART (Breiman et al., 1984 is a common technique of data mining based on recursive binary partitioning and was developed as a MLA using R software (Breiman et al., 2017). However, only a few GESM studies have employed it (Akgün and Türk, 2011). By repeated partitioning of nodes from which the data is partitioned, CART models are acquired until the values are either identical or limited to fewer observations than the user-de ned boundaries (Naghibi and Pourghasemi, 2015;Youssef et al., 2016). Monotonic alteration and dissimilar factor scales (Aertsen et al., 2011;Naghibi and Pourghasemi, 2015;Youssefet al., 2016;Akgün and Türk, 2011) remain unaltered in the model. Additionally, the pruning method helps prevent problems relating to over-tting and the exclusion of irrelevant data (Akgün and Türk, 2011). Here, a selection rule called the modi ed towing rule, based on a straightforward relationship between the distribution of the target function in two child nodes data (Akgün and Türk, 2011), was used as follows: where, k denotes the target parameter, the probability distributions of the left and right child nodes are indicated separately by PL(k) and PR(k) and u speci es the penalty imposed on partitions that generate two con icting child nodes (Wu et al., 2008). The result of CART is a binary decision tree dividing the prediction space into regions (Rm), where the response factor values are homogeneous (≅α m ), as follows: In R 3.6.0 (Therneau et al., 2019), the CART model is implemented using the 'rpart' package.

Random Forest (RF)
Random forest MLAs have emerged from the random forest decisions rst suggested by Ho (1995) and then Breiman (2001). RF has been recognized as a rapid learning algorithm and has recently been validated in a number of GESM studies (Rahmati et al., 2017;Arabameri et al., 2019;Gayen et al., 2019) due to its unbiased classi cation and selection without deletion of a large number of input variables (Immitzer et al., 2012). RF is also commonly used for questions related to regression and classi cation (Rahmati et al., 2017). In order to create a set of decision trees with regulated variance, the approach incorporates Breiman's bagging idea (i.e. bootstrap aggregation- Breiman, 2001) and the random selection of features, implemented independently (Ho, 1998;Amit and Geman, 1997). This algorithm generates numerous decision trees that form a forest, altering the factors that disturb the target and generating aggregations to provide accurate results (Arabameri et al., 2019). Each tree is built using the CART technique on the basis of a bootstrapped sample of data with a random subset of variables selected at the individual node, with the nal model prediction ascertained using the decision trees of all majority votes (Micheletti et al., 2014;Liaw and Wiener, 2002). The key bene ts of the RF approach compared to other MLA are discussed in Naghibi and Pourghasemi (2015). Inputs are given for the number of trees (t) and factors (

Arti cial Neuron Network (ANN)
ANN is a machine learning system that is generally used to solve different classi cation and prediction di culties and is a particular type of nonlinear regression based on a collection of connected neurons or computational units (De la Rosa et al., 1999). It is a method of input data training that identi es a pattern of performance, i.e. the aim that has been determined by its ability to learn. Several researchers have carried out surface erosion analysis using ANN and have obtained acceptable accuracy predictions (Negnevitsky, 2002;Kim and Gilley, 2008). The most common form of ANN algorithm is the multi-layer perceptron (MLP) (Negnevitsky, 2002). The MLP consists of three-layers of neurons (nodes or processing elements) comprising the input layer, hidden layer and the output layer. Based on the corresponding weights, the neurons or nodes in the hidden layer analyze a signi cant amount of information from their inputs (i.e. the conditioning factors) (Lee et al., 2003) and the input layer uses a nonlinear transfer function to obtain a probabilistic estimate of the occurrence of the dependent parameters. The eventual output layer consists of a susceptibility map . The secret and output layers thus receive data from the input layer and evaluate complex functions in order to boost the model's generalizability and predictive in uence (Falaschi et al., 2009). The numbers of input and output nodes are xed using a design application in order to use an ANN to generate a GESM. In fact, the input nodes are equal to the control factors and the output nodes are the pairs of values for each pixel for the dependent variable or gully erosion (between 0 and 1), where '1' denotes the highest likelihood of gully erosion and '0' denotes no such erosion. We have used a feed forward implementation of ANN with a single hidden layer in this analysis (Figure 4). In this form of ANN, the two key control parameters, comprising the amount of weight decay and the number of units or neurons in the hidden layer (Vorpahl et al., 2012), were selected using the 'train' function in the 'caret' package (Kuhn, 2008) by a tenfold internal cross-validation process. The ANN model was implemented using the 'nnet' package (Ripley and Venables, 2016) in R software v.3.4.1.

Spatial distribution of the causative factors
The prepared UNB elevation map from the SRTM DEM was classi ed into ve zones. The elevation range was 807 m (highest 1136 m, lowest 329 m). Higher elevations were evident towards the UNB's eastern margins while the landscape graded down to the basin outlet in the west. The elevation map also clearly highlighted the principal valleys in the UNB, which trended from east to west. Most of the basin area has gentler slopes (~ 0° to 2.75°) but marked breaks-of-slope were evident along the valley sides in the central and eastern part of the study area. The highest slope values of ~ 58° were noted along the escarpments traversing the eastern side. The aspect map showed a dominance of at slope segments throughout the basin. Following the trend of the main drainage lines, many slopes segments dip towards the west and southwest. Due to the presence of steep escarpments, local variations in the slope aspect are seen on opposing anks of the ridges.
Being a part of India's peninsular block, the UNB is mostly covered in basaltic lava ows (basic volcanics) and the associated crystalline rock types (metamorphics and other plutonic intrusives). In all, there are seven rock groups present in the UNB. Patches of sedimentary rocks and unconsolidated sediments are found in the principal river valleys at lower elevations towards the west. The UNB has a wide variety of landform types, which range from denudational to structural ones. Fluvial depositional features are more evident to the west while in the higher elevation and steeper slope sections to the east, there is a preponderance of structural landforms and residual, dissected landscapes. Human-made landforms like mine dumps are also present in the area. The prepared soil map of the UNB had six categories based on the ambient soil types-deep black soils, laterite soil, medium black soil, red loamy soil, red sandy soil and shallow clay soil. Deep black soils, which can retain much moisture, are only seen within the main valleys and oodplains towards the west, while most of the dissected plateau and ridge areas contain shallow soils. Leaching in the higher elevation zones in the east has resulted in the formation of laterite soils.
The rainfall amount varies from about 176 cm near the southern fringe of the basin to 138 cm near its northern extremities. Drainage density values are on the lower side (highest values of 0.44 km/sq.km are seen in various stream con uence zones). The main LULC components found were water bodies, agricultural lands, forests/dense vegetation, shrubs and built-up tracts. Forests and agricultural lands were the dominant classes. The only signi cant built-up patch was present in the western part of the UNB (the town of Jabalpur), near which is present the Bargi Reservoir on the Narmada River. The dissected plateau tracts and hill ranges in the eastern part of the UNB reported the highest NDVI values of about 0.74. These roads converge on Jabalpur town, located in the western part of the UNB. The junctions of the main roads and streams and the already gullied tracts were extracted through digitization. The broad physiographic entities present in the UNB ( oodplains, pediments, ridges, scarps and dissected plateau surfaces), generally followed the local elevation variation.
Based on their ROC-AUC values that ranged from 0.616625 to 0.781775 (Fig. 6, Table 2), all the three models could be said to have been mostly successful in predicting likely gully erosion locations in the UNB. Their respective accuracies were 61.66% for the CART, 78.17% for the RF and 65.75% for the ANN model. The RF model was thus by far the best predictor of the likely gully erosion sites in the study area and also displayed a much higher precision. The ANN model performance was also acceptable as its accuracy was above 65%.  On the basis of the relative importance accorded to the different input factors, both the CART and RF models highlighted the area's geomorphology (landforms), soil type (texture) and lithology as being the three most important variables that conditioned gully erosion in the region. The ANN model incorporated the LULC parameter within the list of the most important parameters, alongside those noted by the CART and RF models. These were followed by the other factors like road density, elevation, slope aspect, NDVI, precipitation, stream density and slope amount.
The individual GESM of the three models were all classi ed into high, moderate and low susceptibility zones (Fig. 5a, 5b), on the basis of the enumerated cell values (Table 3), using the natural breaks classi cation scheme (following Zhang, 2018). In each of the output GESM layers, the highest possibility of gully formation and erosion was noted in the central and eastern parts of the UNB, which lie within the dissected plateau and hill ranges that are comprised of basic volcanics or crystallines/metamorphics. This portion of the basin also has relatively higher elevation and is overlain by thin, shallow soils or

Model comparison
The respective outputs of the three models used in this analysis, i.e. CART, RF and ANN, were categorized into zones of low, moderately and highly susceptible (GES) zones (Fig. 5a, 5b). When combined together [Figure 5b(g)], the ANN model shows the largest area of the highly susceptible zone while the CART model shows the largest area of the low susceptibility zone. Both these models have demonstrated quite acceptable prediction accuracy, according to the validation results (Fig. 6, Table 2). However, in terms of the AUC values, the RF model is the best with a score of 78.17%, making it the most accurate, followed by the ANN model (65.75%) and then the CART model (61.66%). Therefore, relative to the other models, the RF model was better in predicting the GES zones in the UNB. In general, the results show that the area of the respective susceptibility groups decreased for all three models with increasing susceptibility (from low to high), whereas the area under gully erosion increased in contrast. These outcomes are in line with those obtained by Youssef et al. (2016).
By taking the RF model as the most accurate, the proportionate area under each of its three demarcated susceptibility classes (low, medium and high) were computed as being 26.27%, 37.74% and 35.99%, respectively ( class and 30.66% and 37.26% in the medium and high susceptibility classes, respectively). Thus, it is of concern that more than one-third of the UNB can be subjected to gully erosion quite easily and that this process is quite likely to occur over two-thirds of the basin area (when considering the medium and high category coverage together).

Discussion
While numerous techniques have been developed and used worldwide for the spatial prediction of environmental hazards, the goal of all these methods is the same, i.e. to gauge the likely susceptibility of a place to that threat. The rst step in controlling gully erosion is thus to assess the locations where this phenomenon can occur by preparing suitable GESMs, based on the existing locations of ambient gullied tracts and spatial distribution of the most likely erosion enhancing geo-environmental variables. The development of machine learning techniques during the last decade has furthered the preparation of seemingly more accurate susceptibility maps. For this, the input categorization, clustering and data elaboration are essential procedures (Mezaal et al., 2017;Rizeei et al., 2016).
To generate the GESMs for the UNB, based on the prepared training and validation datasets and 12 GCFs, we used the CART, RF and ANN MLAs. Landforms, soil type and lithology are the most in uential and successful factors among all three models (CART, RF and ANN). These were followed by attributes like road density, elevation, aspect, NDVI, LULC, precipitation, stream density and slope, which had a comparatively lesser in uence on the occurrence of gully erosion in the study area. We identi ed a highly susceptible class of GES in the central to eastern parts of the UNB and our results ranged from being quite accurate for the CART model to being very accurate for the RF model, based on their elicited AUC values. Very few such studies have been previously conducted in the headwater regions of the Narmada River, with more attention being paid to the neighboring Chambal and Godavari basins (e.g. Joshi and Nagare, 2013;Joshi, 2014;Pani, 2016;Ranga et al., 2016). Thus this paper lls an existing gap in the gully development landscape of the Indian Peninsula.
Some errors and limitations do exist in our study. The collection of gully locations in certain places within the affected area was quite less as these locations were di cult to access in the eld while groundtruthing or were di cult to discern clearly from the images. Furthermore, the sampled gully dataset was split into a 70:30 ratio, following previous literature, without testing the sample accuracy. The topographic/landscape attributes derived from the DEM and images can be enumerated more precisely from higher resolution datasets (Erasmi et al., 2014;Pope et al., 2014;Das et al., 2016). In order to improve the accuracy of the GES models, the effective study of the non-gully area ratio, the assessment of the sample division, the consideration of the method for selecting the feature and the use of ensemble approaches are also helpful. At best, the ascertained GESMs highlight the broad zones wherein gullying can possibly take place, they are not indicative of individual sites of gully formation or growth, as such landform features are too minute to be detected from the range of datasets employed here. The tracking of the development of singular gullies and their associated landscape forms would be better achieved using close-range or aerial photogrammetry (Patel et al., 2020;Meinen et al., 2020).

Conclusion
The goal of this research was to not only analyze the ability of machine learning models to predict the susceptibility to gully erosion, but also to compare the ability and robustness of the models implemented, i.e. the CART, ANN and RF models. Results reveal that even when conducting model comparisons with some clear goals, such as prediction accuracy and robustness, the understanding of their strengths and limitations remains somewhat challenging for model preference. The RF obtained the most outstanding performance as per the achievements, based on evaluation criteria, while all three models indicated that the highest gully erosion susceptible zone is the central and eastern portion of the study area. Suitable land use planning measures are therefore required to avoid further gully and soil erosion in these tracts. Due to the high accuracy ascertained for the RF model, it can be employed not only in the Upper Narmada Basin for demarcating likely gullying affected zones, but in other areas too for similar assessments.

Declarations Declaration of Interest Statement
No con ict of interest exists.
We wish to con rm that there are no known con icts of interest associated with this publication and there has been no signi cant nancial support for this work that could have in uenced its outcome.