The proposed approach in this work is based on the use of six machine learning classifiers (MLC) for landslide susceptibility modeling. Figure 3 illustrates the overall workflow of the current study.
3.3. Landslide triggering factors (LTFs)
To achieve high accuracy in landslide susceptibility modeling, selecting and preparing the LTF database is a crucial step. Previous studies indicate that precipitation and earthquakes are the external triggering factors of landslides (Goetz et al. 2015). Others note that landslides also depend on the characteristics of the study area, known as internal factors (Youssef and Pourghasemi 2021). However, Alqadhi et al. (2022) stated that an assessment of the optimal conditioning factors should be conducted.
Based on the study area and field investigations, thirteen LTFs were selected: slope angle, slope aspect, slope length (LS), distance from roads, geology, distance from streams, distance from faults, elevation, rainfall, erodibility, plan curvature, profile curvature, and land use/land cover (LULC). Accordingly, a database of 13 geo-environmental factors was generated using GIS software (ArcGIS 10.5), and thirteen thematic layers of the same spatial resolution were prepared (Fig. 5). All layers use the UTM coordinate system, zone 30, with the WGS 84 datum. The LTFs are of two types: nominal (categorical), where the data have no natural order or ranking (geology, slope aspect, and land use/land cover), and numeric (continuous), where the order matters (elevation, slope angle, slope length, plan curvature, profile curvature, distance from faults, distance from roads, and distance from streams).
3.3.1 Slope angle
The slope, or degree of slope inclination, is a major factor in the occurrence of geological hazards in mining areas (Su et al. 2021). The higher the slope inclination, the larger the number of landslides (Youssef and Pourghasemi 2021), because the gravitational force acting on the slope increases with the slope angle. Moreover, the greater the slope, the stronger the anti-weathering ability of the slope rock. The slope angle map of the study area was derived from the DEM-SRTM with a resolution of 30 m; slope values range from 0° to 66.08° (Fig. 5a).
3.3.2 Slope aspect
Slope aspect can influence landslides directly or indirectly through its effect on various processes, including the surface distribution of solar radiation (Alghamdi and Abdel-Mottaleb 2021), wind directions, precipitation patterns, discontinuity orientations, hydrological processes, evapotranspiration, soil moisture concentration, vegetation, and root development. Capitani et al. (2013) showed that slope aspect significantly influenced the distribution of shallow landslide types but did not influence other landslide types. The slope aspect map was generated from the DEM-SRTM and classified into nine categories: Flat (−1), North (0°–22.5°; 337.5°–360°), Northeast (22.5°–67.5°), East (67.5°–112.5°), Southeast (112.5°–157.5°), South (157.5°–202.5°), Southwest (202.5°–247.5°), West (247.5°–292.5°), and Northwest (292.5°–337.5°) (Fig. 5b).
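The nine-class aspect scheme (flat cells plus eight compass sectors of 45° each) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `aspect_class` is hypothetical, and assigning a boundary value such as 22.5° to the upper class is an assumption, since the text does not say how boundaries are handled.

```python
def aspect_class(aspect_deg: float) -> str:
    """Map an aspect in degrees (negative values mean flat cells) to a compass class."""
    if aspect_deg < 0:
        return "Flat"
    names = ["North", "Northeast", "East", "Southeast",
             "South", "Southwest", "West", "Northwest"]
    # Shift by half a sector (22.5 deg) so that North covers 337.5-360 and 0-22.5.
    sector = int(((aspect_deg % 360.0) + 22.5) // 45.0) % 8
    return names[sector]


# Example aspects for a few cells (hypothetical values).
classes = [aspect_class(a) for a in (-1, 10, 100, 225, 300, 350)]
```

The modulo on the sector index is what lets the two North intervals fall into a single class.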
3.3.3 Slope length (LS)
LS is a topographical parameter that greatly influences the susceptibility to landslides. This factor incorporates the slope angle and affects soil loss and hydrological processes in mountainous areas (Pourghasemi and Rahmati 2018). In the current study, the LS factor map, ranging from 0 to 432 m (Fig. 5i), was extracted from the DEM using ArcGIS 10.5 according to the following formula of Wischmeier and Smith, developed by Bizuwerk et al. (2003):
$$LS={\left(\frac{L}{22.13}\right)}^{m}\left(65.41\,{\sin}^{2}(S)+4.56\,\sin(S)+0.065\right) \tag{1}$$
where L is the slope length (m), computed as L = flow accumulation × DEM spatial resolution; S is the slope gradient (in %); and m is a constant equal to 0.5 for slopes greater than 5%, 0.4 for slopes of 3.5 to 5%, 0.3 for slopes of 1 to 3.5%, and 0.2 for slopes of less than 1%.
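A per-cell evaluation of Eq. (1) might look like the sketch below. It rests on assumptions not stated in the text: the slope is supplied as an angle in degrees for the sine terms and converted to a percentage only to select the exponent m, and values falling exactly on a class boundary (for example 3.5%) go to the steeper class. The function names are hypothetical.

```python
import math


def slope_exponent(slope_percent: float) -> float:
    """Select the exponent m of Eq. (1) from the slope gradient in percent."""
    if slope_percent > 5.0:
        return 0.5
    if slope_percent >= 3.5:
        return 0.4
    if slope_percent >= 1.0:
        return 0.3
    return 0.2


def ls_factor(flow_accumulation: float, cell_size_m: float, slope_deg: float) -> float:
    """Evaluate Eq. (1) for one raster cell (assumed inputs, see lead-in)."""
    length_m = flow_accumulation * cell_size_m      # L = flow accumulation x resolution
    theta = math.radians(slope_deg)                 # assumption: S given as an angle
    m = slope_exponent(100.0 * math.tan(theta))     # percent slope used only to pick m
    sine = math.sin(theta)
    return (length_m / 22.13) ** m * (65.41 * sine ** 2 + 4.56 * sine + 0.065)


# Hypothetical cell: flow accumulation of 12 cells, 30 m DEM, 20 degree slope.
example_ls = ls_factor(12, 30.0, 20.0)
```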
3.3.4 Distance from roads
Road construction in mountainous areas involves engineering activities such as cutting or excavating slopes, which change the original geological conditions and weaken the natural support of rocky slopes. These anthropogenic activities are an important agent influencing the topography of natural slopes (Xiao et al. 2019) and therefore increase landslide occurrence (Wang et al. 2016). Consequently, proximity to roads can be a potential indicator of landslides. In the current study, the proximity-to-roads map was developed using the Euclidean distance tool in ArcGIS 10.5 (Fig. 5e).
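To make the idea of a Euclidean distance raster concrete, the brute-force sketch below computes, for every cell of a small grid, the straight-line distance to the nearest source cell (a road cell here). It illustrates the concept only and is not the ArcGIS implementation; the grid size, cell size, and road cell positions are hypothetical.

```python
import math


def euclidean_distance_grid(sources, n_rows, n_cols, cell_size_m):
    """Distance in metres from each cell centre to the nearest source cell centre."""
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            nearest = min(math.hypot(r - sr, c - sc) for sr, sc in sources)
            row.append(nearest * cell_size_m)
        grid.append(row)
    return grid


# Hypothetical 3 x 4 raster with 30 m cells and road cells at (0, 0) and (2, 3).
distances = euclidean_distance_grid([(0, 0), (2, 3)], 3, 4, 30.0)
```

The same function would apply unchanged to stream or fault cells; only the list of source cells differs.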
3.3.5 Distance from streams
The drainage network (the main streams) was extracted from the DEM of the study area. Runoff along wadis plays a crucial role in undermining slopes, increasing pore pressure in the areas adjacent to wadis, and initiating landslides (Youssef and Pourghasemi 2021). Distance from streams is therefore an essential triggering factor for landslide susceptibility. The map of proximity to streams was generated using the Euclidean distance tool in ArcGIS 10.5 (Fig. 5f).
3.3.6 Distance from faults
Faults greatly affect the surface stability of a given area, and their structures control the distribution, number, and magnitude of geohazards. Using a GIS platform, the two geological structure maps of Rich and Boudnib at a scale of 1:200,000 were vectorized, georeferenced, cut, and assembled to generate the fault map of ZUW. The map of proximity to faults (Fig. 5g) was then developed using the Euclidean distance tool in ArcGIS 10.5.
3.3.7 Lithology
Each lithology is prone to particular types of geohazard: landslides can occur on lithologies different from those where land subsidence occurs (Youssef and Pourghasemi 2021). Lithologies vary in their physical and mechanical characteristics, including type, strength, degree of weathering, durability, density, and permeability (Henriques et al. 2015).
In the current study, the lithology was extracted from the geological map (scale 1:200,000) acquired from the Geological Service Office of Morocco. According to stratum age, eight lithological units have been identified, including Trias, Aalenian, Toarcian, Domerian, Pliensbachian, Sinemurian, Batho-Bajocian and Quaternary (Fig. 2).
3.3.8 LULC
According to previous studies, rangeland and degraded forest account for 60.31% and 20% of the total area of ZUW, respectively. On the one hand, these two land cover types do great damage to the surface and thereby affect the occurrence of landslide hazards; on the other hand, the absence of land maintenance and the fact that a large part of ZUW is almost bare provide favorable conditions for geological hazards. The LULC map of the study area is shown in Fig. 5m.
3.3.9 Rainfall
Precipitation is an essential landslide conditioning factor. According to many field surveys, rainfall is the main triggering factor of landslides in ZUW.
Based on data recorded over the past 36 years at four meteorological stations, all located in the study area, the rainfall map was produced using ArcGIS 10.5 and is shown in Fig. 5h.
3.3.10 Elevation
Many studies have shown that slope instability is influenced by topographic factors, including elevation (Feizizadeh et al. 2014), and this factor is commonly used in landslide susceptibility prediction. In this study, the elevation map was produced by classifying the study area's DEM. Elevation values range between 1023 m and 3687 m above sea level (Fig. 5c).
3.3.11 Plan and Profile curvature
Slope shape and terrain morphology influence landslide susceptibility in different ways (Haigh and Rawat 2012). Evans (1979) indicated that curvature is one of the principal terrain variables used in geomorphological analyses. For instance, plan curvature affects the convergence and dispersion of surface runoff (Nasiri Aghdam et al. 2016), while profile curvature influences the deposition rate of materials by controlling their transport along the slope (Xiao et al. 2019).
In this work, plan and profile curvatures were generated from the ZUW DEM using the Spatial Analyst tool of ArcGIS 10.5. The plan and profile curvature values range from −5.1 to 7.8 and from −7.3 to 7.9, respectively (Fig. 5j and 5k).
3.3.12 Erodibility index (K)
The erodibility index estimates the susceptibility of soil surface particles to detachment under the effect of raindrops, surface runoff, or a combination of the two. The erodibility rate depends on the soil's physical characteristics, in particular the percentage of organic matter, clay content, structure, and permeability.
The ZUW erodibility map (Fig. 5l) used in this work is based on a previous study carried out by Fenjiro et al. (2020).
3.4 Normalization of LTFs
The LTF database is used to analyze the relationship between landslides and their triggering factors. Generally, all the parameters are normalized using approaches such as the frequency ratio (Zhu et al. 2021). This approach defines the ratio of the area where landslides have occurred to the whole study area for a given period (Aditian et al. 2018). The frequency ratio is the ratio of the landslide occurrence percentage to the area percentage for each class of each triggering factor, as given in Eq. (2). According to Samanta et al. (2018), the higher the frequency ratio, the higher the chance of landslide occurrence.
$$F{R}_{i}=\frac{{N}_{i}/N}{{S}_{i}/S} \tag{2}$$
Where FRi is the frequency ratio of the ith class of a triggering factor; Ni and Si, respectively, represent the number of landslides and the size of the area in the ith class of this triggering factor; N is the total number of landslides in the study area; and S is the total study area.
The FR values of the 13 triggering factors (each with several classes) were calculated and are presented in Tables 1 and 2. Using the normalize filter method, the FR value of each LTF class was rescaled to the range 0.1 to 0.9. A normalized value close to the upper bound (0.9) indicates a stronger association between landslides and the factor class, whereas a value close to the lower bound (0.1) indicates a weaker association.
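As a worked illustration of Eq. (2) and the rescaling step, the sketch below computes FR for the five slope classes using the pixel and landslide counts listed in Table 1. The rescaling is assumed to be a linear min-max mapping onto [0.1, 0.9]; the text only names a "normalize filter", so this is an assumption, although it reproduces the tabulated slope values to within rounding. Function names are hypothetical.

```python
def frequency_ratio(counts_per_class, pixels_per_class):
    """FR_i = (N_i / N) / (S_i / S) for each class of one factor."""
    total_count = sum(counts_per_class)
    total_pixels = sum(pixels_per_class)
    return [
        (n_i / total_count) / (s_i / total_pixels)
        for n_i, s_i in zip(counts_per_class, pixels_per_class)
    ]


def rescale(values, low=0.1, high=0.9):
    """Linear min-max rescaling onto [low, high] (assumed normalization)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [low for _ in values]
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]


# Slope classes from Table 1: gentle, moderate, steep, extreme, very strong.
slope_landslides = [11, 21, 28, 36, 48]
slope_pixels = [1807422, 1249754, 881213, 671457, 303773]

fr = frequency_ratio(slope_landslides, slope_pixels)
fr_normalized = rescale(fr)
```

Classes with FR above 1 hold a larger share of landslides than of area; the rescaling keeps that ordering while bounding every factor to the same range.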
Table 1
Spatial relationship between LTFs and landslides using frequency ratio (FR) and normalized classes
LTF | Class | Number of pixels | Proportion of pixels | Number of landslides | Proportion of landslides | Frequency ratio (FRi) | Normalized frequency ratio (FRnormalized) |
Slope | Gentle (0–15°) | 1807422 | 0,37 | 11 | 0,08 | 0,21 | 0,10 |
Moderate(15–30°) | 1249754 | 0,25 | 21 | 0,15 | 0,57 | 0,16 |
Steep (30–45°) | 881213 | 0,18 | 28 | 0,19 | 1,08 | 0,23 |
Extreme (45–60°) | 671457 | 0,14 | 36 | 0,25 | 1,83 | 0,35 |
Very strong (> 60°) | 303773 | 0,06 | 48 | 0,33 | 5,39 | 0,90 |
Aspect | Flat (-1-40) | 505437 | 0,10 | 20 | 0,14 | 1,35 | 0,73 |
N (40–80) | 414655 | 0,08 | 5 | 0,03 | 0,41 | 0,10 |
NE (80–120) | 531783 | 0,11 | 7 | 0,05 | 0,45 | 0,13 |
E (120–160) | 754482 | 0,15 | 30 | 0,21 | 1,36 | 0,73 |
SE (160–200) | 772236 | 0,16 | 36 | 0,25 | 1,59 | 0,89 |
S (200–240) | 456295 | 0,09 | 9 | 0,06 | 0,67 | 0,27 |
SW (240–280) | 398328 | 0,08 | 9 | 0,06 | 0,77 | 0,16 |
W (280–320) | 467495 | 0,10 | 11 | 0,08 | 0,80 | 0,16 |
NW (320–360) | 617270 | 0,13 | 17 | 0,12 | 0,94 | 0,45 |
Elevation | 1 | 393526 | 0,08 | 18 | 0,13 | 1,56 | 0,47 |
2 | 850896 | 0,17 | 21 | 0,15 | 0,84 | 0,29 |
3 | 837542 | 0,17 | 9 | 0,06 | 0,37 | 0,17 |
4 | 885384 | 0,18 | 6 | 0,04 | 0,23 | 0,14 |
5 | 655483 | 0,13 | 14 | 0,10 | 0,73 | 0,26 |
6 | 552065 | 0,11 | 29 | 0,20 | 1,79 | 0,53 |
7 | 409303 | 0,08 | 22 | 0,15 | 1,84 | 0,54 |
8 | 240897 | 0,05 | 23 | 0,16 | 3,26 | 0,90 |
9 | 92885 | 0,02 | 2 | 0,01 | 0,74 | 0,27 |
Geologic unit | Sinemurian | 360359 | 0,07 | 33 | 0,23 | 3,19 | 0,90 |
Quaternary | 597887 | 0,12 | 7 | 0,05 | 0,41 | 0,17 |
Batho-Bajocian | 2317137 | 0,46 | 103 | 0,72 | 1,55 | 0,47 |
Domerian | 268712 | 0,05 | 1 | 0,01 | 0,13 | 0,10 |
Pliensbachian | 532940 | 0,11 | 0 | 0,00 | 0,00 | 0,10 |
Trias | 60809 | 0,01 | 0 | 0,00 | 0,00 | 0,10 |
Aalenian | 423867 | 0,08 | 0 | 0,00 | 0,00 | 0,10 |
Toarcian | 454640 | 0,09 | 0 | 0,00 | 0,00 | 0,10 |
Distance to road | 1 | 857267 | 0,17 | 47 | 0,33 | 1,87 | 0,90 |
2 | 707668 | 0,14 | 31 | 0,22 | 1,50 | 0,72 |
3 | 634743 | 0,13 | 24 | 0,17 | 1,29 | 0,62 |
4 | 549073 | 0,11 | 30 | 0,21 | 1,87 | 0,90 |
5 | 2170701 | 0,44 | 12 | 0,08 | 0,19 | 0,10 |
Distance to stream | 1 | 147774 | 0,03 | 4 | 0,03 | 0,92 | 0,13 |
2 | 134610 | 0,03 | 4 | 0,03 | 1,02 | 0,14 |
3 | 143612 | 0,03 | 5 | 0,03 | 1,19 | 0,15 |
4 | 100359 | 0,02 | 5 | 0,03 | 1,70 | 0,18 |
5 | 4393097 | 0,89 | 126 | 0,88 | 0,98 | 0,14 |
Distance to fault | 1 | 927639 | 0,19 | 71 | 0,49 | 2,61 | 0,90 |
2 | 854753 | 0,17 | 33 | 0,23 | 1,32 | 0,47 |
3 | 814911 | 0,17 | 19 | 0,13 | 0,80 | 0,29 |
4 | 739301 | 0,15 | 10 | 0,07 | 0,46 | 0,18 |
5 | 583038 | 0,12 | 4 | 0,03 | 0,23 | 0,10 |
6 | 999810 | 0,20 | 7 | 0,05 | 0,24 | 0,10 |
Rainfall | 1 | 1906686 | 0,39 | 46 | 0,32 | 0,82 | 0,21 |
2 | 2118988 | 0,43 | 9 | 0,06 | 0,15 | 0,10 |
3 | 410111 | 0,08 | 47 | 0,33 | 3,92 | 0,70 |
4 | 224347 | 0,05 | 34 | 0,24 | 5,18 | 0,90 |
5 | 259320 | 0,05 | 8 | 0,06 | 1,05 | 0,24 |
Length of slope | 1 | 21828 | 0,00 | 142 | 0,99 | 222 | 0,90 |
2 | 76772 | 0,02 | 0 | 0,00 | 0,00 | 0,10 |
3 | 1015571 | 0,21 | 0 | 0,00 | 0,00 | 0,10 |
4 | 1949329 | 0,40 | 0 | 0,00 | 0,00 | 0,10 |
5 | 1855923 | 0,38 | 2 | 0,01 | 0,04 | 0,10 |
Plan of curvature | 1 | 249457 | 0,05 | 138 | 0,96 | 18,9 | 0,90 |
2 | 974262 | 0,20 | 0 | 0,00 | 0,00 | 0,10 |
3 | 2332702 | 0,47 | 1 | 0,01 | 0,01 | 0,10 |
4 | 1101346 | 0,22 | 2 | 0,01 | 0,06 | 0,10 |
5 | 260214 | 0,05 | 3 | 0,02 | 0,39 | 0,12 |
Profile of curvature | 1 | 208449 | 0,04 | 12 | 0,08 | 1,97 | 0,89 |
2 | 774445 | 0,16 | 28 | 0,19 | 1,23 | 0,45 |
3 | 2241836 | 0,46 | 42 | 0,29 | 0,64 | 0,10 |
4 | 1348631 | 0,27 | 42 | 0,29 | 1,06 | 0,35 |
5 | 344620 | 0,07 | 20 | 0,14 | 1,98 | 0,90 |
Erodibility (K) | 1 | 23208 | 0,46 | 100 | 0,69 | 1,51 | 0,40 |
2 | 620 | 0,01 | 7 | 0,05 | 3,95 | 0,90 |
3 | 17820 | 0,35 | 36 | 0,25 | 0,71 | 0,24 |
4 | 8724 | 0,17 | 1 | 0,01 | 0,04 | 0,10 |
Land use / Land cover (LULC) | Agricultural fields | 176032 | 0,04 | 1 | 0,01 | 0,19 | 0,10 |
Degraded forest | 450223 | 0,09 | 21 | 0,15 | 1,59 | 0,90 |
Rangeland | 4245821 | 0,87 | 122 | 0,85 | 0,98 | 0,55 |
Water bodies | 28178 | 0,01 | 0 | 0,00 | 0,00 | 0,10 |
Table 2
Frequency ratio (FR) and normalized classes of the conditioning factors for the non-landslide points used in this study
LTF | Class | Number of pixels | Proportion of pixels | Number of non-landslide points | Proportion of non-landslide points | Frequency ratio (FRi) | Normalized frequency ratio (FRnormalized) |
Slope | Gentle (0–15°) | 1807422 | 0,37 | 17 | 0,12 | 0,32 | 0,10 |
Moderate(15–30°) | 1249754 | 0,25 | 127 | 0,88 | 3,47 | 0,90 |
Steep (30–45°) | 881213 | 0,18 | 0 | 0,00 | 0,00 | 0,10 |
Extreme (45–60°) | 671457 | 0,14 | 0 | 0,00 | 0,00 | 0,10 |
Very strong (> 60°) | 303773 | 0,06 | 0 | 0,00 | 0,00 | 0,10 |
Aspect | Flat (-1-40) | 505437 | 0,10 | 12 | 0,08 | 0,81 | 0,16 |
N (40–80) | 414655 | 0,08 | 13 | 0,09 | 1,07 | 0,40 |
NE (80–120) | 531783 | 0,11 | 25 | 0,17 | 1,61 | 0,90 |
E (120–160) | 754482 | 0,15 | 21 | 0,15 | 0,95 | 0,29 |
SE (160–200) | 772236 | 0,16 | 22 | 0,15 | 0,97 | 0,30 |
S (200–240) | 456295 | 0,09 | 10 | 0,07 | 0,75 | 0,10 |
SW (240–280) | 398328 | 0,08 | 10 | 0,07 | 0,86 | 0,20 |
W (280–320) | 467495 | 0,10 | 13 | 0,09 | 0,95 | 0,29 |
NW (320–360) | 617270 | 0,13 | 18 | 0,13 | 1,00 | 0,33 |
Elevation | 1 | 393526 | 0,08 | 28 | 0,19 | 2,43 | 0,90 |
2 | 850896 | 0,17 | 34 | 0,24 | 1,36 | 0,54 |
3 | 837542 | 0,17 | 33 | 0,23 | 1,35 | 0,53 |
4 | 885384 | 0,18 | 20 | 0,14 | 0,77 | 0,33 |
5 | 655483 | 0,13 | 26 | 0,18 | 1,35 | 0,53 |
6 | 552065 | 0,11 | 2 | 0,01 | 0,12 | 0,11 |
7 | 409303 | 0,08 | 1 | 0,01 | 0,08 | 0,10 |
8 | 240897 | 0,05 | 0 | 0,00 | 0,00 | 0,10 |
9 | 92885 | 0,02 | 0 | 0,00 | 0,00 | 0,10 |
Geologic unit | Sinemurian | 360359 | 0,07 | 2 | 0,01 | 0,19 | 0,10 |
Quaternary | 597887 | 0,12 | 20 | 0,14 | 1,17 | 0,70 |
Batho-Bajocian | 2317137 | 0,46 | 99 | 0,69 | 1,49 | 0,90 |
Domerian | 268712 | 0,05 | 2 | 0,01 | 0,26 | 0,14 |
Pliensbachian | 532940 | 0,11 | 0 | 0,00 | 0,00 | 0,10 |
Trias | 60809 | 0,01 | 0 | 0,00 | 0,00 | 0,10 |
Aalenian | 423867 | 0,08 | 8 | 0,06 | 0,66 | 0,39 |
Toarcian | 454640 | 0,09 | 13 | 0,09 | 1,00 | 0,60 |
Distance to road | 1 | 857267 | 0,17 | 127 | 0,88 | 5,06 | 0,90 |
2 | 707668 | 0,14 | 12 | 0,08 | 0,58 | 0,18 |
3 | 634743 | 0,13 | 4 | 0,03 | 0,22 | 0,13 |
4 | 549073 | 0,11 | 0 | 0,00 | 0,00 | 0,10 |
5 | 2170701 | 0,44 | 1 | 0,01 | 0,02 | 0,10 |
Distance to stream | 1 | 147774 | 0,03 | 60 | 0,42 | 13,9 | 0,90 |
2 | 134610 | 0,03 | 16 | 0,11 | 4,06 | 0,32 |
3 | 143612 | 0,03 | 13 | 0,09 | 3,09 | 0,26 |
4 | 100359 | 0,02 | 8 | 0,06 | 2,72 | 0,24 |
5 | 4393097 | 0,89 | 47 | 0,33 | 0,37 | 0,10 |
Distance to fault | 1 | 927639 | 0,19 | 10 | 0,07 | 0,37 | 0,10 |
2 | 854753 | 0,17 | 11 | 0,08 | 0,44 | 0,13 |
3 | 814911 | 0,17 | 27 | 0,19 | 1,13 | 0,41 |
4 | 739301 | 0,15 | 51 | 0,35 | 2,36 | 0,90 |
5 | 583038 | 0,12 | 22 | 0,15 | 1,29 | 0,47 |
6 | 999810 | 0,20 | 44 | 0,31 | 1,50 | 0,55 |
Rainfall | 1 | 1906686 | 0,39 | 48 | 0,33 | 0,86 | 0,25 |
2 | 2118988 | 0,43 | 51 | 0,35 | 0,82 | 0,23 |
3 | 410111 | 0,08 | 23 | 0,16 | 1,92 | 0,59 |
4 | 224347 | 0,05 | 19 | 0,13 | 2,89 | 0,90 |
5 | 259320 | 0,05 | 3 | 0,02 | 0,40 | 0,10 |
Length of slope | 1 | 21828 | 0,00 | 137 | 0,95 | 214 | 0,90 |
2 | 76772 | 0,02 | 0 | 0,00 | 0,00 | 0,10 |
3 | 1015571 | 0,21 | 1 | 0,01 | 0,03 | 0,10 |
4 | 1949329 | 0,40 | 2 | 0,01 | 0,04 | 0,10 |
5 | 1855923 | 0,38 | 4 | 0,03 | 0,07 | 0,10 |
Plan of curvature | 1 | 249457 | 0,05 | 0 | 0,00 | 0,00 | 0,10 |
2 | 974262 | 0,20 | 4 | 0,03 | 0,14 | 0,10 |
3 | 2332702 | 0,47 | 103 | 0,72 | 1,51 | 0,90 |
4 | 1101346 | 0,22 | 33 | 0,23 | 1,02 | 0,61 |
5 | 260214 | 0,05 | 4 | 0,03 | 0,52 | 0,32 |
Profile of curvature | 1 | 208449 | 0,04 | 0 | 0,00 | 0,00 | 0,10 |
2 | 774445 | 0,16 | 4 | 0,03 | 0,18 | 0,10 |
3 | 2241836 | 0,46 | 103 | 0,72 | 1,57 | 0,90 |
4 | 1348631 | 0,27 | 33 | 0,23 | 0,84 | 0,48 |
5 | 344620 | 0,07 | 4 | 0,03 | 0,40 | 0,23 |
Erodibility (K) | 1 | 23208 | 0,46 | 100 | 0,69 | 1,51 | 0,90 |
2 | 620 | 0,01 | 0 | 0,00 | 0,00 | 0,10 |
3 | 17820 | 0,35 | 38 | 0,26 | 0,75 | 0,42 |
4 | 8724 | 0,17 | 6 | 0,04 | 0,24 | 0,10 |
Land use / Land cover (LULC) | Agricultural field | 176032 | 0,04 | 106 | 0,74 | 20,5 | 0,90 |
Degraded forest | 450223 | 0,09 | 24 | 0,17 | 1,81 | 0,17 |
Rangeland | 4245821 | 0,87 | 11 | 0,08 | 0,09 | 0,10 |
Water bodies | 28178 | 0,01 | 3 | 0,02 | 3,62 | 0,24 |
3.5 Modelling procedure using MLTs
The following paragraphs describe the machine learning classifiers used in this study.
3.5.1 AdaBoost
According to Freund and Schapire (1997), the AdaBoost algorithm builds a series of individual classifiers iteratively, each of which tries to classify the training data accurately; it is one of the most widely used boosting algorithms. The algorithm uses an adaptive resampling technique to choose training samples: samples misclassified by a previous classifier are selected more often than correctly classified ones, so that each new classifier concentrates on the cases its predecessors found difficult. At each iteration, a weight is assigned to every sample so that the next iteration focuses on the reweighted samples that were previously misclassified.
The final classifier is a weighted sum of the ensemble predictions. An advantage of the AdaBoost algorithm is that it works well for two-class problems, multi-class single-label problems, multi-class multi-label problems, single-category problems, and regression problems (Hong et al. 2018).
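The reweighting idea can be illustrated with a single round of the standard two-class (discrete) AdaBoost update, with labels coded as +1 and −1. This is a minimal sketch that uses sample reweighting rather than resampling, and it is not necessarily the exact variant run in this study; the labels and predictions are hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One boosting round: classifier weight (alpha) and updated sample weights."""
    # Weighted error of the current weak classifier (weights assumed to sum to 1).
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    error = min(max(error, 1e-10), 1 - 1e-10)       # guard against log(0)
    alpha = 0.5 * math.log((1 - error) / error)
    # Misclassified samples (t * p = -1) are scaled up, correct ones scaled down.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Hypothetical round: five samples, the second one is misclassified.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]
alpha, new_weights = adaboost_round([0.2] * 5, y_true, y_pred)
```

After the update, the misclassified sample carries more weight than any correctly classified one, which is what steers the next weak classifier towards it. The final ensemble prediction is the sign of the alpha-weighted sum of the weak classifiers' outputs.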
3.5.2 Artificial Neural Network (MLP)
Inspired by the biological nervous system of the human brain, Multi-Layer Perceptrons (MLPs) are a form of artificial neural network (ANN). They can simulate complex functions such as decision making and model generation (Liakos et al. 2018). MLPs, like the human brain, consist of a set of interconnected processing units called "neurons". Each unit is a multi-input, single-output nonlinear element (Ding et al. 2013).
Neurons mostly act in parallel and are arranged in multiple layers: an input layer, one or more hidden layers where learning takes place, and an output layer (Liakos et al. 2018).
MLPs can detect complex nonlinear relationships through a learning process that adjusts the weighted connections between neurons. Once the number of hidden layers and the number of processing units in each layer have been chosen, the ANN starts learning from the training samples (Aditian et al. 2018).
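The layered structure can be made concrete with a forward pass through a tiny network. The sketch below assumes sigmoid activations and a single hidden layer; the architecture and all weights are hypothetical and are not those used in this study. Training would adjust the weights, which is not shown here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Each neuron: weighted sum of all inputs plus a bias, passed through a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def mlp_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> one hidden layer -> single output neuron."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, output_w, output_b)[0]


# Hypothetical network: 3 normalized inputs, 2 hidden neurons, 1 output.
score = mlp_forward(
    [0.9, 0.35, 0.1],
    [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1],
    [[1.0, -1.0]], [0.0],
)
```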
3.5.3 Random Forest
RF is an ensemble learning model composed of a group of decision trees (DTs) and is used for classification and regression problems (Al-Najjar and Pradhan 2021). These models are effective for prediction because they exploit the strength of each tree and the correlations between trees, and they are less sensitive to over-fitting. The difference between RF and DT is that a decision tree is built on the whole dataset using all the variables of interest, whereas a random forest randomly selects observations and variables to construct multiple decision trees and then averages their results. In the present study, samples of landslide and non-landslide events were selected to construct the classification trees (30% of the samples were set aside from training, and 500 nodes was adopted as the preferred value).
3.5.4 Naïve Bayes (NB)
The NB classifier is a machine learning algorithm based on Bayes' theorem. It assumes that all learning variables are independent and that each variable contributes equally to the classification problem (Tsangaratos and Ilia 2016). The algorithm does not require complicated iterative parameter estimation schemes; therefore, it is easy to build. According to Tien Bui et al. (2012), the NB classifier is robust to noise and irrelevant attributes.
According to Khosravi et al. (2018), applying the NB classifier involves four steps: (1) choosing the training variables, (2) calculating the posterior probability of each class (landslide sites/non-landslide sites), (3) estimating the class level along with a covariance matrix, and (4) creating a discriminant function for each class level.
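The independence assumption is easiest to see in code. The following is a minimal Gaussian naive Bayes sketch: each feature is modelled per class by its own mean and variance, and the class with the highest log-posterior is returned. The Gaussian likelihood, the small variance floor, and the toy data are assumptions made for illustration and do not reproduce the implementation used in this study.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Per class: prior probability plus per-feature mean and variance."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9  # floor avoids zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (len(members) / len(rows), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the largest log-posterior for one sample."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        # Independence assumption: per-feature log-likelihoods simply add up.
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Toy data: two normalized factors; 1 = landslide, 0 = non-landslide.
rows = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.9], [0.1, 0.2], [0.2, 0.1], [0.3, 0.1]]
labels = [1, 1, 1, 0, 0, 0]
nb_model = fit_gaussian_nb(rows, labels)
```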
3.5.5 Instance-Based K (IBK)
The IBK classifier is a K-Nearest Neighbors (KNN) classifier that can select an appropriate value of K by cross-validation and can also apply distance weighting.
According to Pedregosa et al. (2011), the KNN classifier classifies a data point based on the properties of its neighboring data points. The probability that a data point belongs to a given class is defined by the classes of its nearest neighbors (Bröcker and Smith 2007).
For each unclassified point, KNN calculates the distances to the training points to find the K nearest neighbors. The classes of these neighbors are then used for voting, and the class with the most votes is assigned to the unclassified data point (Abraham et al. 2021).
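The distance-then-vote procedure can be sketched as follows. This is plain majority voting with Euclidean distance and a fixed K; the distance weighting and cross-validated choice of K that IBK supports are not shown, and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Assign the majority class among the k training points closest to the query."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data: two normalized factors; 1 = landslide, 0 = non-landslide.
train_rows = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.9], [0.1, 0.2], [0.2, 0.1], [0.3, 0.1]]
train_labels = [1, 1, 1, 0, 0, 0]
predicted = knn_predict(train_rows, train_labels, [0.8, 0.8], k=3)
```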
3.5.6 Decision Tree (J48)
The Decision Tree model (J48) is a supervised, nonparametric machine learning algorithm that requires no prior knowledge of the data distribution, is easy to interpret, reduces data complexity, and can model the relationships between variables. Compared to other models, it is a flexible, fast, and robust algorithm that can handle nonlinearity between the input features and discrete classes, so that nonlinear relationships between parameters do not affect tree performance. Moreover, DT models are simple to construct and to explain to decision-makers (Kadavi et al. 2019; Al-Januar and Pradhan 2021).
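J48 is the Weka implementation of the C4.5 algorithm, which chooses the attribute to split on by its gain ratio. The sketch below computes that criterion for one categorical attribute; it covers only the split-selection step (no tree growth or pruning), and the example classes are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of the values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of the split divided by the split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    information_gain = entropy(labels) - remainder
    split_info = entropy(feature_values)
    return information_gain / split_info if split_info > 0 else 0.0


# Hypothetical attribute that separates the two classes perfectly.
slope_class = ["steep", "steep", "gentle", "gentle"]
is_landslide = [1, 1, 0, 0]
ratio = gain_ratio(slope_class, is_landslide)
```

An attribute that tells the classes apart perfectly scores highest, while one whose classes each contain the same mix of labels scores zero.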
3.6 Evaluation of LTFs
After the preparation of the landslide inventory map and the thirteen causative factor maps, the six algorithms (AdaBoost, NB, RF, IBK, MLP, and J48) were used to produce the landslide susceptibility maps.
In addition to the 144 landslide sites inventoried in the study area, the same number of points were chosen as non-prone landslide sites (144) to provide essential knowledge on stable or unfavorable conditions of landslide occurrence (Su et al. 2021).
The 288 points were then randomly split into two parts: (1) the first, containing 70% of the data, was used to train the models; (2) the second, containing the remaining 30%, was used to validate the models and confirm their accuracy.
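A random 70/30 split of the 288 points can be sketched as below. The seed, the rounding rule, and the sample identifiers are assumptions; the text does not report the exact number of points in each subset or whether the split was stratified by class.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=42):
    """Shuffle a copy of the samples and cut it at the requested fraction."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 144 landslide and 144 non-landslide points, identified here by placeholder tuples.
samples = [("landslide", i) for i in range(144)] + \
          [("non_landslide", i) for i in range(144)]
train_set, test_set = train_test_split(samples)
```

With this rounding rule the split yields 202 training and 86 validation points; a different rounding convention would shift one point between the subsets.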
Three measures, accuracy, specificity, and sensitivity, were adopted to evaluate model performance on the dataset. In addition, the Receiver Operating Characteristic (ROC) curve of each model was plotted, and the Area Under the ROC Curve (AUC) was obtained. The landslide susceptibility maps were classified into five classes: very low, low, moderate, high, and very high.
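The three measures follow directly from the confusion matrix, using their standard definitions. In the sketch below, landslide points are coded 1 and non-landslide points 0; the labels and predictions are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return true positives, true negatives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # share of landslide points correctly detected
        "specificity": tn / (tn + fp),   # share of non-landslide points correctly detected
    }


# Hypothetical validation labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
metrics = evaluate(y_true, y_pred)
```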
3.7 Modelling prediction and performance
Using the training datasets and the saved models, the landslide susceptibility index (LSI) was calculated for each pixel in the entire study area. The resulting landslide susceptibility maps (LSMs) were then reclassified into five susceptibility classes: very low, low, moderate, high, and very high.
Prediction accuracy and performance can be evaluated quantitatively and graphically by different means. Several performance indices (RMSE, recall, confusion matrix, …) have been used in landslide susceptibility modelling, but many authors have reported that the AUC is the most useful statistical index of overall performance (Lee et al. 2004).
In the current study, the ROC curve is obtained by plotting the false positive rate (1 − specificity) on the X-axis and the true positive rate (sensitivity) on the Y-axis. According to Youssef and Pourghasemi (2021), the AUC values help to extract valuable practical information, such as: (i) a quantitative evaluation of the prediction capacity of the models; (ii) identification of the prediction approach or mathematical functions that give better predictions; (iii) a comparison of the predictive capability of multiple landslide susceptibility models produced by different methodologies; and (iv) an evaluation of the capacity of each model to discriminate between the most landslide-prone zones and the least susceptible (safest) areas.
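The ROC curve and its AUC can be computed from predicted susceptibility scores as sketched below, with the false positive rate on the horizontal axis and the true positive rate on the vertical axis. The area is obtained with the trapezoidal rule, which is one common choice and an assumption here; the scores and labels are hypothetical.

```python
def roc_points(y_true, scores):
    """ROC points (FPR, TPR) obtained by lowering the threshold through the scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process all samples sharing the same score together (ties).
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical validation points: 1 = landslide, 0 = non-landslide.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(y_true, scores))
```

An AUC of 1 corresponds to a model that ranks every landslide point above every non-landslide point, while 0.5 corresponds to random ranking.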