A landslide is a sudden, rapidly occurring and destructive natural disaster in mountainous areas. It is caused by immature geological conditions together with a sequence of triggering factors, and it alters the landscape and causes extensive damage (Cruden, 1991). In recent years, the magnitude and frequency of landslide events have increased greatly. Landslide studies have consequently drawn growing global attention, mainly because of increasing urbanization pressure and the socio-economic effects of landslides on settlements (examples are shown in Fig. 1) (Akgun, 2012). Identifying landslide-susceptible areas through approaches such as landslide susceptibility mapping (LSM) has therefore become essential for landslide risk reduction and management (Das et al., 2012).
LSM is now considered an integral part of landslide management in many regions. It delineates areas susceptible to landslides on the assumption that landslides are likely to occur in areas whose geo-environmental conditions and landslide history are similar to those of areas where landslides have already occurred (Yimin Mao et al., 2021; Roy et al., 2019). Many researchers around the world have applied machine learning (ML) techniques, supported by Geographic Information System (GIS) and Remote Sensing (RS) technologies, to develop LSM models. Some of these ML methods are based on classification (supervised learning) and others on clustering (unsupervised learning). Classification methods work with a labeled dataset of mapping units described by multiple conditioning factors and learn to assign the units to landslide and non-landslide classes. Such methods include classification and regression trees (Li & Chen, 2020), random forest (Hong et al., 2019), support vector machines (SVM) (Lee et al., 2017; Zhao & Zhao, 2021), boosted regression (Park & Kim, 2019), quadratic discriminant analysis (Wang et al., 2020), naïve Bayesian classification (Mao et al., 2015), artificial neural networks (Bragagnolo et al., 2020) and decision trees (Arabameri et al., 2021; Mao et al., 2017). Despite their wide application, classification-based LSM models depend on large labeled landslide datasets to achieve high accuracy. Acquiring such datasets requires extensive field surveys, which becomes a serious challenge in large study areas and limits the applicability of these methods.
Clustering methods, in contrast, do not require a labeled landslide dataset. Given mapping units (points) described by multiple conditioning factors, they analyze the dataset, discover meaningful similarities and group the mapping units into subclasses (clusters) according to those similarities. However, very few clustering-based LSM models have been developed so far. These include agglomerative hierarchical clustering (Yimin Mao et al., 2021), hierarchical clustering (Pokharel et al., 2021), fuzzy c-means (FCM) (Wan, 2013), k-means (Wang et al., 2017) and KPSO (Wan et al., 2015). Most of these methods have limited performance because they cannot detect subclasses of arbitrary shapes and varying sizes, are sensitive to noise, perform poorly in large study areas with large datasets, and cannot adequately process uncertain data such as rainfall.
The above analysis shows that there is still a need to develop, implement and evaluate improved clustering methods, and to demonstrate their capacity to produce reliable susceptibility maps in landslide-prone areas. Such methods are easy to implement, and their performance does not depend on the availability of large labeled datasets.
In the present study, we therefore developed a new clustering algorithm, CIBD-CURE, which integrates the city block distance (CIBD; de Souza & De Carvalho, 2004) with Clustering Using Representatives (CURE; Guha et al., 1998), and applied it to LSM in Baota District, Shaanxi Province, China. The proposed methodology aims to overcome the limitations that affect clustering algorithms in LSM: the inability to identify clusters (subclasses) of arbitrary shapes and varying sizes, sensitivity to noise, poor performance in large study areas with large datasets, and the inability to process uncertain data such as rainfall. In addition, for the susceptibility mapping step, the LEPAM method, which combines landslide density and eigenvalues (LE) with Partitioning Around Medoids (PAM; Rdusseeun & Kaufman, 1987), was designed and applied to classify the study area into five susceptibility levels (very low, low, moderate, high and very high).
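To illustrate the general idea that CIBD-CURE builds on, the following is a minimal sketch of CURE-style agglomerative clustering with the city block (L1) distance used as the dissimilarity measure. It is not the authors' CIBD-CURE implementation. The function names, the number of representatives `c`, the shrink factor `alpha` and the toy data are illustrative assumptions. The random sampling, partitioning and outlier-elimination steps of the original CURE algorithm are also omitted. Each cluster is summarized by a few well-scattered points shrunk toward its centroid, which is what lets CURE follow non-spherical clusters while damping the influence of outliers.

```python
from itertools import combinations


def city_block(p, q):
    """City block (Manhattan, L1) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of feature vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def representatives(points, c, alpha):
    """Choose up to c well-scattered points of a cluster and shrink them
    toward the cluster centroid by the fraction alpha (0 <= alpha <= 1)."""
    mean = centroid(points)
    # First representative: the point farthest from the centroid.
    chosen = [max(range(len(points)), key=lambda i: city_block(points[i], mean))]
    # Then repeatedly add the point farthest from the representatives chosen so far.
    while len(chosen) < min(c, len(points)):
        remaining = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(
            remaining,
            key=lambda i: min(city_block(points[i], points[j]) for j in chosen),
        ))
    # Shrinking toward the centroid dampens the influence of outlying points.
    return [tuple(x + alpha * (m - x) for x, m in zip(points[i], mean)) for i in chosen]


def cure(points, k, c=4, alpha=0.3):
    """Agglomerative CURE-style clustering into k clusters using city block distance."""
    clusters = [[p] for p in points]
    reps = [representatives(cl, c, alpha) for cl in clusters]
    while len(clusters) > k:
        # Merge the two clusters whose closest representatives are nearest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                city_block(a, b) for a in reps[ij[0]] for b in reps[ij[1]]
            ),
        )
        merged = clusters[i] + clusters[j]
        del clusters[j], reps[j]  # j > i, so index i is unaffected
        clusters[i] = merged
        reps[i] = representatives(merged, c, alpha)
    return clusters


if __name__ == "__main__":
    # Hypothetical mapping units described by two normalized conditioning factors.
    units = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
    for label, members in enumerate(cure(units, k=2)):
        print(label, members)
```

Because this sketch compares every pair of clusters at each merge, it is only suitable for small illustrative inputs. The original CURE relies on random sampling, partitioning, a heap and a k-d tree to scale to large datasets.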
The remainder of this paper is organized as follows: Section 2 briefly describes the study area; Section 3 details the research materials and methods; Section 4 presents the results and discussion; and Section 5 presents the conclusions.