3.1 Multicollinearity testing on the conditioning factors
A test for multicollinearity was run on all the landslide precondition variables. According to Chen et al. (2018), linear dependence among the parameters reduces the predictive power of the models. Multicollinearity was assessed using the tolerance and the variance inflation factor (VIF), which is the reciprocal of tolerance. Tolerance values < 0.1 and VIF values > 10 signify multicollinearity problems (Du et al. 2017). To select appropriate parameters for the LSMs, the multicollinearity test was applied to the candidate landslide precondition factors: the topographical/geomorphological factors (elevation, slope degree, slope aspect, TWI, PC, TPI, TRI, SA, and TST), the hydrological factors (rainfall and DtS), the geological factor (lithology), and the environmental factors (DtR, LU/LC, NDVI, and soil type). The results of the multicollinearity analysis, listed in Table 3, show that all conditioning factors have tolerance values above 0.1 and VIF values below 10, indicating that the landslide factors have no multicollinearity problem. The highest tolerance value was found for DtR (0.995) and the lowest for lithology (0.344). Conversely, lithology had the highest VIF (2.907) among the sixteen conditioning factors, whereas DtR had the lowest (1.005) (Table 3). These results suggest that all the input predisposing factors were suitable for producing the landslide susceptibility maps with the SVR-GOA, BRT, ANN, and elastic net models in the Kalaleh basin.
Table 3
Result of multicollinearity analysis among landslide conditioning factors
| Factor | Tolerance | VIF |
|---|---|---|
| Elevation | 0.912 | 1.096 |
| Aspect | 0.943 | 1.060 |
| Slope | 0.740 | 1.351 |
| Plan curvature (PC) | 0.522 | 1.916 |
| Topography position index (TPI) | 0.705 | 1.418 |
| Topography wetness index (TWI) | 0.897 | 1.115 |
| Surface area (SA) | 0.369 | 2.710 |
| Terrain ruggedness index (TRI) | 0.444 | 2.252 |
| NDVI | 0.504 | 1.984 |
| Rainfall | 0.823 | 1.215 |
| Distance to stream (DtS) | 0.878 | 1.139 |
| Distance to road (DtR) | 0.995 | 1.005 |
| Soil | 0.436 | 2.294 |
| Terrain surface texture (TST) | 0.936 | 1.068 |
| Lithology | 0.344 | 2.907 |
| LU/LC | 0.779 | 1.284 |
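As a minimal sketch, assuming the conditioning factors have been sampled into a numeric matrix with one column per factor (categorical factors such as lithology, soil, and LU/LC already encoded numerically), tolerance and VIF can be obtained by regressing each factor on all the others with ordinary least squares: tolerance is 1 − R² and VIF is its reciprocal. The function name `vif_and_tolerance` is hypothetical and the synthetic data are illustrative only; this is not the software used in the study.

```python
import numpy as np


def vif_and_tolerance(X):
    """Return (tolerance, VIF) for each column of a 2-D factor matrix X.

    Each factor is regressed on all other factors (plus an intercept);
    tolerance = 1 - R^2 and VIF = 1 / tolerance.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    results = []
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept column plus every other factor
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        tolerance = 1.0 - r2
        results.append((tolerance, 1.0 / tolerance))
    return results


# Synthetic example: x3 is nearly a copy of x1, x2 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.05 * rng.normal(size=500)
stats = vif_and_tolerance(np.column_stack([x1, x2, x3]))
for name, (tol, vif) in zip(["x1", "x2", "x3"], stats):
    # Same thresholds as in the text: tolerance < 0.1 or VIF > 10
    flag = "multicollinear" if tol < 0.1 or vif > 10 else "ok"
    print(f"{name}: tolerance={tol:.3f}, VIF={vif:.2f} ({flag})")
```

Because VIF = 1/tolerance, each pair of values in Table 3 mirrors the other; for example, the tolerance of 0.344 for lithology corresponds to a VIF of 1/0.344 ≈ 2.907.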
3.2 Spatial distribution and effects of the landslide conditioning factors
Landslides are caused by a number of hydrological, geological, environmental, geomorphological, and topographical variables. Selecting pertinent factors is crucial in LSM because only a small number of parameters in a given area have a major impact on landslide occurrence. Sixteen landslide predisposing factors were chosen in this study (Fig. 4). With landslide mitigation management as the study's goal, random forest (RF) modeling was used to characterize the relationships between landslide sites and the conditioning factors. The pixel-based RF analysis is precise and well suited to landslide susceptibility mapping. As shown in Table 4, lithology (geology), slope degree, rainfall, TPI, TWI, SA, and LU/LC are the most influential conditioning factors of landslides in the Kalaleh basin.
Table 4
Relative influence of landslide conditioning factors in the LSM based on RF
| Factor | Weight |
|---|---|
| Elevation | 3.9 |
| Aspect | 3.6 |
| Slope | 17.9 |
| Plan curvature (PC) | 0.3 |
| Topography position index (TPI) | 8.9 |
| Topography wetness index (TWI) | 7.9 |
| Surface area (SA) | 6.1 |
| Terrain ruggedness index (TRI) | 1.3 |
| NDVI | 2.8 |
| Rainfall | 11.3 |
| Distance to stream (DtS) | 4.9 |
| Distance to road (DtR) | 4.2 |
| Soil | 0.7 |
| Terrain surface texture (TST) | 3.8 |
| Lithology | 19.3 |
| LU/LC | 5.5 |
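The ranking of the most influential factors quoted in the text can be reproduced directly from the Table 4 weights. The short sketch below only sorts the transcribed values; it does not re-fit the random forest, and the dictionary keys are abbreviations chosen for illustration.

```python
# Relative influence of each conditioning factor, transcribed from Table 4
weights = {
    "Elevation": 3.9, "Aspect": 3.6, "Slope": 17.9, "PC": 0.3,
    "TPI": 8.9, "TWI": 7.9, "SA": 6.1, "TRI": 1.3, "NDVI": 2.8,
    "Rainfall": 11.3, "DtS": 4.9, "DtR": 4.2, "Soil": 0.7,
    "TST": 3.8, "Lithology": 19.3, "LU/LC": 5.5,
}

# Sort factors from most to least influential and keep the top seven
ranking = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
top_factors = [name for name, _ in ranking[:7]]
print(top_factors)
# ['Lithology', 'Slope', 'Rainfall', 'TPI', 'TWI', 'SA', 'LU/LC']
```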
The topographical/geomorphological factors adopted in this study were classified using the Jenks natural breaks method, and their spatial distributions are presented in Fig. 4. Elevation ranged from 60 to 2348 m (Fig. 4a), slope aspect from −1 to 359.7° (Fig. 4b), slope degree from 0° to 73.1° (Fig. 4c), PC from −72.7 to 93 (Fig. 4d), and TPI from 5.4 to 6.9 (Fig. 4e). The remaining topographical/geomorphological factors, TWI, SA, TRI, and TST, ranged from 12 to 22.5 (Fig. 4f), 900 to 3013 (Fig. 4g), 0 to 62.1 (Fig. 4h), and 0.05 to 67.6 (Fig. 4n), respectively, across the study area. Lithology (representing geology) was categorized into classes (Fig. 4o) following the geologic units presented in Table 1. The hydrological factors, rainfall and DtS, ranged from 236 to 1005 mm (Fig. 4j) and from 0 to 1800 m (Fig. 4k), respectively. NDVI, DtR, soil type, and LU/LC were also classified spatially into groups. The NDVI sublayers, classified using the Jenks natural breaks method, ranged from −0.18 to 0.68 (Fig. 4i). DtR in the study area ranged from 0 to 12,472 m (Fig. 4l). The soil types in the area are alfisols, inceptisols, mollisols, and entisols (Fig. 4m). The LU/LC types in the Kalaleh basin are forest, agriculture, dry farming, orchard, water bodies, rock bodies, agriculture-orchard, and urbanized areas (Fig. 4p). Lithology, soil type, and LU/LC are treated as categorical conditioning parameters.
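Jenks natural breaks chooses class boundaries that minimize the sum of squared deviations of values from their class means. The sketch below is a minimal, exact dynamic-programming version (Fisher's formulation) suitable for modest sample sizes; the function names are hypothetical, and the classification software used in the study is not specified. For a full raster, a random sample of cell values could be classified to keep the O(k·n²) cost manageable.

```python
from bisect import bisect_left


def jenks_breaks(values, n_classes):
    """Exact Jenks/Fisher natural breaks via dynamic programming.

    Returns n_classes + 1 boundaries: the minimum, then the upper bound of
    each class. Cost is O(n_classes * n^2), so use a sample for large rasters.
    """
    x = sorted(values)
    n = len(x)
    # Prefix sums give the within-class sum of squared deviations in O(1)
    ps, ps2 = [0.0], [0.0]
    for v in x:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def ssd(i, j):  # squared deviations of x[i:j] from its mean
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last class is x[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], split[k][j] = c, i

    # Walk back through the split points to recover the class upper bounds
    uppers, j = [], n
    for k in range(n_classes, 0, -1):
        uppers.append(x[j - 1])
        j = split[k][j]
    return [x[0]] + uppers[::-1]


def classify(value, breaks):
    """1-based class index; each class includes its upper bound."""
    return bisect_left(breaks[1:-1], value) + 1


data = [1, 2, 3, 10, 11, 12, 20, 21, 22]
breaks = jenks_breaks(data, 3)
print(breaks)  # [1, 3, 12, 22]
print([classify(v, breaks) for v in data])  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```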
Topographic variables are of major importance in preparing LSMs (Rahmati et al. 2016). Elevation is directly correlated with the likelihood of landslides, and the subclasses of the elevation dataset delineate the different susceptibility zones. In most places, rainfall is the main cause of landslides: prolonged periods of heavy precipitation directly promote landslides by loosening the soil, accelerating sediment transport, and increasing surface runoff velocity. The parts of the study area that receive the most rainfall are therefore expected to be the most susceptible. With respect to slope, steeply sloping areas are more prone to landslides than gently sloping ones (Nwazelibe et al. 2023), and in this study the slope classes greater than 30° have the highest likelihood of landslides. TPI also affects landslide likelihood. The TPI sublayers were greater than 2, suggesting high landslide susceptibility; lower TPI values would have indicated lower susceptibility.
The relationship between road distance and landslide locations can point to areas of extremely high landslide risk. In this area, landslides commonly occur near roads (Fig. 3), and the DtR subclasses > 1000 m show a weaker link to landslide occurrence as distance from roads increases. Proximity to streams also has a significant influence on landslide likelihood: because of stream velocity and discharge, a short distance to a stream (DtS) directly increases the likelihood of landslides. In other words, the closer soil or rock slopes are to streams, the more susceptible they are to sliding. In contrast, larger DtS values have no direct influence on landslide occurrence, and the DtS sublayers > 200 m have little bearing on landslide susceptibility. SA often directly affects landslide susceptibility, because maximum surface area exposure can create conditions favourable to landslides. The SA subclasses obtained in this study indicate a high probability of landslide events and disasters.
Geologically, the study area is dominated by sedimentary formations, which are often more susceptible to sliding than hard rocks. For the soil type factor, the most vulnerable, brittle, and soft soils in the area carry the largest landslide risk; alfisols and inceptisols therefore have a larger impact on landslide occurrence than entisols and mollisols. LU/LC also significantly influences landslides. Eight LU/LC classes were identified in total. According to field observations and Fig. 4p, most landslides in the region occur in urbanized, agriculture-orchard, and mountain rock areas, whereas very few are observed in agricultural, dry farming, and forested areas. Low-vegetation areas are more prone to landslides and erosion and may host high concentrations of landslides, whereas high vegetation density can reduce landslide risk and slow soil transport and erosion (Nwazelibe 2023).
3.3 Landslide susceptibility models and their implementation
In this landslide susceptibility study, the dependent variable was coded as a binary variable. The categorical and numerical values of the landslide conditioning factors were extracted in R to build the final landslide susceptibility maps with the four computational ML models, which are presented in Fig. 5. As Fig. 5 shows, the landslide susceptibility of the area can be classified into five zones: very low, low, moderate, high, and very high susceptibility. A closer look at the four maps suggests that the area is predominantly characterized by moderate to very high susceptibility zones. Across the area, the southern, central, and northern portions are generally characterized by high and very high susceptibility zones, whereas the eastern and western portions are characterized by very low, low, and moderate susceptibility zones (Fig. 5). The elastic net and SVR-GOA maps look much alike (Figs. 5a and 5c), while some spatial similarities are evident between the BRT and ANN maps (Figs. 5b and 5d).
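The step from a continuous model output to the five zones of Fig. 5 can be sketched as follows, assuming each model yields a susceptibility score per pixel and that four ascending thresholds separating the classes are available (for example, from Jenks natural breaks). The thresholds used in the study are not reported in this section, so the values below, like the function names, are hypothetical. The per-class shares computed this way are the kind of areal percentages summarized in Fig. 6.

```python
import numpy as np

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]


def classify_susceptibility(scores, thresholds):
    """Map per-pixel scores to class indices 0..4 (very low .. very high).

    `thresholds` are four ascending cut points; a score equal to a
    threshold falls in the lower class (right=True).
    """
    return np.digitize(np.asarray(scores, dtype=float), thresholds, right=True)


def class_area_percent(class_idx, n_classes=5):
    """Percentage of pixels (i.e. of the area, for equal-size cells) per class."""
    counts = np.bincount(np.ravel(class_idx), minlength=n_classes)
    return 100.0 * counts / counts.sum()


# Hypothetical scores for six pixels and hypothetical thresholds
scores = [0.10, 0.30, 0.50, 0.70, 0.90, 0.95]
idx = classify_susceptibility(scores, [0.2, 0.4, 0.6, 0.8])
for name, pct in zip(CLASS_NAMES, class_area_percent(idx)):
    print(f"{name}: {pct:.2f}%")
```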
Furthermore, the results of this study indicate that the areas exposed to moderate, high, and very high susceptibility are larger than those with very low and low susceptibility (Fig. 5), which is critical for the area and its inhabitants. Fig. 6 breaks down the percentage of the area covered by each susceptibility class in each model. The ANN LSM classified about 27.69%, 25.46%, 18.71%, 15.72%, and 12.42% of the Kalaleh basin as very low, low, moderate, high, and very high susceptibility zones, respectively. The ANN model thus shows a predominance of the combined very low and low zones over the combined moderate, high, and very high zones (Fig. 6). The SVR-GOA LSM indicates that 19.30% and 18.49% of the area have high and very high susceptibility, respectively, whereas 20.28%, 21.31%, and 20.62% have moderate, low, and very low susceptibility. In the SVR-GOA map, therefore, areas of moderate, high, and very high susceptibility predominate.
The BRT LSM was likewise classified into five classes: very high, high, moderate, low, and very low susceptibility. About 26.67% of the study area has very low landslide susceptibility, 26.48% low, 19.00% moderate, 15.49% high, and 12.36% very high susceptibility. This class distribution agrees most closely with that of the ANN model (Fig. 6). Lastly, the elastic net model agrees more closely with the SVR-GOA model: 20.23%, 20.03%, and 14.59% of the Kalaleh basin have moderate, high, and very high susceptibility, respectively, whereas 20.09% and 20.05% have low and very low susceptibility (Fig. 6).
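The model comparisons in Section 3.3 rest on grouping the Fig. 6 percentages into very low plus low versus moderate, high, and very high. The snippet below only re-adds the reported values; it shows, for instance, that the moderate-to-very-high share exceeds the very low-plus-low share in the SVR-GOA model, whereas the reverse holds for the ANN and BRT models.

```python
# Class shares (%) from Fig. 6, ordered very low, low, moderate, high, very high
fig6_shares = {
    "ANN": (27.69, 25.46, 18.71, 15.72, 12.42),
    "SVR-GOA": (20.62, 21.31, 20.28, 19.30, 18.49),
    "BRT": (26.67, 26.48, 19.00, 15.49, 12.36),
    "Elastic net": (20.05, 20.09, 20.23, 20.03, 14.59),
}

grouped = {}
for model, shares in fig6_shares.items():
    lower = round(sum(shares[:2]), 2)  # very low + low
    upper = round(sum(shares[2:]), 2)  # moderate + high + very high
    grouped[model] = (lower, upper)
    print(f"{model}: very low to low {lower}%, moderate to very high {upper}%")
```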
3.4 Validation and comparison of the models’ performances
The success and reliability of the landslide models were tested using AUROC curves (Fig. 7) obtained from the training and testing landslide datasets; Kappa and RMSE values were also used for model validation (Fig. 8). For the training datasets, the AUROC values were 0.992 (99.2%), 0.901 (90.1%), 0.900 (90.0%), and 0.757 (75.7%) for the SVR-GOA, ANN, BRT, and elastic net models, respectively. This shows that the SVR-GOA fit the training data best, with the performance order SVR-GOA > ANN > BRT > elastic net. The Kappa and RMSE values gave the same ranking in training (Fig. 8), and the same ranking was also observed in the testing stage. However, the AUROC and Kappa values decreased, and the RMSE values increased, for the testing datasets. The testing AUROC values were 0.930 (93.0%), 0.833 (83.3%), 0.822 (82.2%), and 0.726 (72.6%) for the SVR-GOA, ANN, BRT, and elastic net models, respectively. From training to testing, the Kappa values dropped from 0.945 to 0.872, 0.812 to 0.790, 0.804 to 0.783, and 0.698 to 0.672 for the SVR-GOA, ANN, BRT, and elastic net models, respectively, while the RMSE increased from 0.280, 0.320, 0.330, and 0.410 in training to 0.300, 0.350, 0.370, and 0.430 in testing.
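The three validation statistics can be sketched as below for binary landslide labels (1 = landslide, 0 = non-landslide) and per-sample predicted scores. This is a minimal illustration under stated assumptions, not the study's code: AUROC uses the rank (Mann–Whitney) formulation, Kappa is Cohen's kappa after thresholding the scores at a hypothetical cut-off of 0.5, and RMSE is taken between labels and predicted scores. The study may have computed Kappa and RMSE differently.

```python
import numpy as np


def auroc(labels, scores):
    """Area under the ROC curve: probability that a random landslide sample
    scores higher than a random non-landslide sample (ties count as half)."""
    y = np.asarray(labels)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)


def cohen_kappa(labels, predicted):
    """Cohen's kappa for two binary label vectors."""
    y = np.asarray(labels)
    p = np.asarray(predicted)
    observed = np.mean(y == p)
    # Agreement expected by chance from the marginal class frequencies
    expected = y.mean() * p.mean() + (1 - y.mean()) * (1 - p.mean())
    return (observed - expected) / (1 - expected)


def rmse(labels, scores):
    """Root-mean-square error between 0/1 labels and predicted scores."""
    diff = np.asarray(labels, dtype=float) - np.asarray(scores, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


labels = np.array([0, 0, 1, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80])
predicted = (scores >= 0.5).astype(int)  # hypothetical 0.5 cut-off
print(auroc(labels, scores))  # 0.75
print(cohen_kappa(labels, predicted))  # 0.5
print(rmse(labels, scores))
```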
The validation results also indicate strong agreement between the observed landslide sites and the landslide locations predicted by the machine learning models. Although the models' performances differ, the novel SVR-GOA, applied here to landslide mapping, performed considerably better, with the lowest modeling errors, than the ANN, BRT, and elastic net models. This implies that the SVR-GOA LSM is likely to match conditions in the Kalaleh basin more closely than the other models. Nevertheless, the ANN, BRT, and elastic net models also performed well: their AUROC and Kappa values for the training and testing datasets were well above the benchmark value of 0.500 for model acceptability, and their RMSE values in both stages were well below 1, indicating high model acceptability. Thus, all four models performed well, but the novel SVR-GOA was exceptional in producing a more accurate landslide susceptibility map. The agreement between the four landslide models is also substantial. The variations in the performances of the four models could be attributed to structural differences in their algorithms.
3.5 Performance of the present SVR-GOA ensemble and previous studies
Numerous studies have used ensemble models for spatial modeling (Ruidas et al. 2022). Owing to their high predictive accuracy and their capacity to handle large, inexpensive datasets, ML and AI algorithms have recently gained considerable attention, especially for analyzing environmental hazards (Fu et al. 2020; Band et al. 2020). Given appropriate conditioning factors for a particular area, these approaches have produced strong results. Although LSM approaches have progressed significantly, further developments are still needed to improve performance. The SVR-GOA ensemble has recently been used to model flood susceptibility in the Qazvin Plain, Iran (Panahi et al. 2021) and in the Gandheswari River basin, India (Ruidas et al. 2022). However, we are not aware of any study that has applied this ensemble to LSM; its application in this study is therefore novel.
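In SVR-GOA ensembles, the grasshopper optimization algorithm (GOA) is generally used to search for SVR hyperparameters (for example, the penalty C, the kernel width, and the ε-insensitive margin); that role is an assumption here, because this section does not describe the tuning setup. The sketch below follows our reading of the original GOA update: each agent's new position is the best position found so far plus a social-interaction term, scaled by a coefficient c that shrinks linearly, with the force s(r) = f·exp(−r/l) − exp(−r), f = 0.5, and l = 1.5. The mapping of distances into [2, 4) is an assumed convention. To stay self-contained, the sketch minimizes a toy sphere function; in practice, the objective would hypothetically be a cross-validated SVR error at the candidate hyperparameters. The names `goa_minimize` and `social_force` are hypothetical.

```python
import numpy as np


def social_force(r, f=0.5, l=1.5):
    """GOA social interaction strength: attraction (>0) or repulsion (<0)."""
    return f * np.exp(-r / l) - np.exp(-r)


def goa_minimize(objective, lb, ub, n_agents=30, n_iter=100,
                 c_max=1.0, c_min=1e-5, seed=0):
    """Minimal grasshopper optimization algorithm (GOA) sketch.

    Returns the best position found and its objective value.
    """
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    X = rng.uniform(lb, ub, size=(n_agents, lb.size))
    fitness = np.array([objective(x) for x in X])
    best, best_val = X[fitness.argmin()].copy(), fitness.min()

    for t in range(1, n_iter + 1):
        # Comfort-zone coefficient decreases linearly: exploration -> exploitation
        c = c_max - t * (c_max - c_min) / n_iter
        X_new = np.empty_like(X)
        for i in range(n_agents):
            others = np.arange(n_agents) != i
            diff = X[others] - X[i]
            dist = np.linalg.norm(diff, axis=1)
            unit = diff / (dist[:, None] + 1e-12)
            # Distance normalization into [2, 4); assumed convention
            r = 2.0 + np.fmod(dist, 2.0)
            social = (c * (ub - lb) / 2.0 * social_force(r)[:, None] * unit).sum(axis=0)
            # New position: scaled social term around the best solution so far
            X_new[i] = np.clip(c * social + best, lb, ub)
        X = X_new
        fitness = np.array([objective(x) for x in X])
        if fitness.min() < best_val:
            best, best_val = X[fitness.argmin()].copy(), fitness.min()
    return best, best_val


def sphere(x):
    """Toy stand-in objective with its minimum (0) at the origin."""
    return float(np.sum(np.square(x)))


best, best_val = goa_minimize(sphere, lb=[-5, -5], ub=[5, 5], seed=1)
print(best, best_val)
```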
A comparison of this study's results with those of flood susceptibility studies is instructive, because both flooding and landslides are governed by multiple factors, several of which are common to the two geohazards. Although the models in this study were validated with three statistical techniques (AUROC, Kappa, and RMSE), only AUROC is used in this comparison. Panahi et al. (2021) applied SVR-GOA and reported a validation AUROC of 0.959 (95.9%), and Ruidas et al. (2022) obtained a validation AUROC of 0.938 (93.8%). In this study, the validation AUROC of the SVR-GOA ensemble is 0.930 (93.0%). These values show the high predictive performance of SVR-GOA as a modeling technique and demonstrate a high level of consistency between this study and previous studies that applied the SVR-GOA ensemble. According to all three validation metrics, the SVR-GOA ensemble also remained the best-performing model in both the training and testing stages (AUROC 0.992 and 0.930, Kappa 0.945 and 0.872, RMSE 0.280 and 0.300) compared with the ANN, BRT, and elastic net models. This suggests that combining models in an ensemble may help limit overfitting. The validated SVR-GOA ensemble developed in this work is therefore well suited to LSM in the Kalaleh basin.
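The training-to-testing changes behind this comparison can be tabulated directly from the values reported in Section 3.4. The snippet below only recomputes differences from those reported numbers; it does not re-evaluate any model.

```python
# (AUROC, Kappa, RMSE) reported for training and testing in Section 3.4
train = {
    "SVR-GOA": (0.992, 0.945, 0.280),
    "ANN": (0.901, 0.812, 0.320),
    "BRT": (0.900, 0.804, 0.330),
    "Elastic net": (0.757, 0.698, 0.410),
}
test = {
    "SVR-GOA": (0.930, 0.872, 0.300),
    "ANN": (0.833, 0.790, 0.350),
    "BRT": (0.822, 0.783, 0.370),
    "Elastic net": (0.726, 0.672, 0.430),
}

# Change from training to testing for each metric (negative = decrease)
deltas = {
    model: tuple(round(te - tr, 3) for tr, te in zip(train[model], test[model]))
    for model in train
}
# Models ordered by testing AUROC, highest first
ranking = sorted(test, key=lambda m: test[m][0], reverse=True)
for model in ranking:
    d_auc, d_kappa, d_rmse = deltas[model]
    print(f"{model}: dAUROC={d_auc:+.3f}, dKappa={d_kappa:+.3f}, dRMSE={d_rmse:+.3f}")
```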
3.6 Implications of the present prediction modeling
The results of this study indicate that all of the conditioning factors significantly influence landslide susceptibility across the Kalaleh basin. Priority areas with high landslide hazard must therefore be carefully considered for prompt and efficient soil management and conservation. The results show that the SVR-GOA ensemble was effective alongside the ANN, BRT, and elastic net models, and we therefore suggest that the design of this study can be applied in other settings and regions of the world. The possible practical implications of this study are as follows.
- The use of multiple data-driven models in this study helped minimize the bias that could arise from relying on a standalone SVR-GOA model, thereby reducing the problem of overfitting. The SVR-GOA can thus be applied together with these other models elsewhere, or integrated with other types of data-driven algorithms.
- The SVR-GOA also appears computationally efficient: the validated ensemble required less computation time and produced lower errors. The other standard models, by contrast, seemed to require more memory, larger datasets, and more processing time to improve their performance. Similarly, standard statistical methods demand considerable effort, large datasets, and additional input factors, which makes them unsuitable for data-scarce areas. Compared with the other three models, the SVR-GOA method achieved higher accuracy with the available datasets.
- Applying the SVR-GOA ensemble to LSM could advance landslide research at regional, national, and global scales. As noted earlier, we found no previous studies applying the SVR-GOA algorithm to LSM. The algorithm performed well in this case study and therefore appears well suited to predicting landslide susceptibility. The content of this paper provides baseline data on SVR-GOA landslide mapping that may be helpful to future studies in many parts of the world.
- The SVR-GOA ensemble could improve pixel-level detection of vulnerable spots in the Kalaleh basin that require rapid reclamation and maintenance, as well as the prioritization of landslide risk. This study identified the local characteristics that affect the likelihood of landslides. When appropriate area-specific conditioning factors and model input variables are considered, the method proposed in this study can effectively estimate landslide susceptibility in other basins.
- Other natural hazards could potentially be modelled with the SVR-GOA ensemble to improve prediction accuracy.