Landslide Susceptibility Prediction System

doi:10.21203/rs.3.rs-3976209/v1


Research Article

https://doi.org/10.21203/rs.3.rs-3976209/v1

This work is licensed under a CC BY 4.0 License

Version 1


The research presents an innovative landslide susceptibility prediction system that harnesses the power of machine learning and a data-driven approach. This system relies on a robust dataset encompassing five crucial parameters: slope, elevation, precipitation, soil type, and rainfall. To optimize predictive accuracy, four diverse machine learning algorithms—Convolutional Neural Network (CNN), Random Forest, Logistic Regression, and Support Vector Machine (SVM)—are employed. Notably, the system stands out by focusing on real-time predictions without the need for a mapping interface. Users input specific location parameters, and the system leverages selected features to provide instantaneous landslide susceptibility predictions, thus enhancing efficiency while ensuring accuracy. The research outcomes contribute a comprehensive solution, integrating advanced machine learning techniques, a streamlined user experience, and a commitment to swift and precise predictions crucial for decision-making in landslide-prone regions. The iterative and data-driven methodology laid out in the research establishes a solid foundation for continuous refinement and adaptation to evolving environmental conditions, thereby ensuring the system's exceptional performance attributes in terms of predictive accuracy, real-time functionality, user efficiency, and long-term adaptability. This approach holds promise for addressing challenges in landslide management by providing a cutting-edge tool that combines accuracy with user-friendly features and adaptability to changing conditions. The impact of these varying accuracies is significant in shaping the practical implications of the system. The high accuracy of the CNN (with accuracy of 97%) makes it particularly suitable for applications where intricate spatial patterns are crucial for landslide susceptibility assessment. 
The versatility of the Random Forest model (93% accuracy) makes it adept at handling diverse environmental parameters. The simplicity of Logistic Regression (97% accuracy) may make it suitable for quick assessments, while the ability of SVM (90% accuracy) to handle non-linear relationships adds a valuable dimension to the overall predictive capability.

Convolutional Neural Network (CNN)

Random Forest

Logistic Regression

Support Vector Machine (SVM)

landslide susceptibility prediction

real-time prediction

Geological hazards such as landslides can seriously endanger people's lives as well as infrastructure and the environment. The ability to predict and mitigate the occurrence of landslides is a crucial endeavour for regions vulnerable to these natural disasters. Traditional approaches to landslide prediction often lack the precision and real-time capabilities required for effective disaster management. In response to these challenges, this research aims to develop an innovative landslide susceptibility prediction system that leverages advanced machine learning algorithms and real-time mapping interfaces.

Regions characterized by steep slopes, variable geological formations, and intense rainfall are particularly susceptible to landslides. The unpredictable nature of these events and their potential for devastation underscore the need for accurate and timely prediction systems. Current methods rely on historical data and empirical models that often fall short in capturing the intricate relationships between various contributing factors.

The primary challenge addressed by this research is to create a predictive framework that integrates multiple parameters to accurately assess the likelihood of landslides in real time. The selected parameters include:

Slope: The degree of incline in the terrain, contributing to gravitational instability.

Elevation: The height above sea level, impacting water runoff and soil saturation.

Precipitation: The amount of rainfall over a period, influencing soil erosion and saturation.

Soil Type: The geological composition of the soil, affecting stability and cohesion.

Rainfall: The intensity and duration of rainfall, contributing to soil saturation and potential trigger of landslides.

The research scope encompasses the development of predictive models using four diverse machine learning algorithms: Convolutional Neural Network (CNN), Random Forest, Logistic Regression, and Support Vector Machine (SVM). The goal is to create accurate and reliable models with the ability to generalize to new situations and learn from past data. Additionally, the initiative aims to close the gap between sophisticated predictive modeling and practical implementation by integrating these models into a user-friendly map interface. This interface will enable stakeholders, decision-makers, and the public to interact with the prediction system, receiving real-time probability estimates for landslide occurrence in specific geographic areas.

1.1 RELATED WORK

Historically, the assessment of landslide susceptibility has heavily relied on traditional empirical models, which primarily consider historical landslide occurrences and a limited set of environmental factors. However, these models often fall short in capturing the intricate and dynamic relationships between various parameters influencing landslides. The oversimplification of the complex interplay of factors such as terrain characteristics, rainfall patterns, soil types, and land use is a significant limitation. Moreover, these models struggle to provide real-time predictions because they lack the capacity to adjust to shifting environmental conditions or new data inputs. This limitation has prompted a growing need to move beyond these simplistic empirical models towards more sophisticated approaches that can better account for the multifaceted nature of landslide susceptibility.

Geospatial analysis, particularly through Geographic Information System (GIS) techniques, has been a key strategy for assessing landslide susceptibility by analyzing terrain attributes and historical landslide distribution. While these geospatial methods offer valuable insights into the spatial distribution of landslides, they often overlook the crucial aspect of real-time data and fail to integrate the power of machine learning algorithms. These approaches typically focus on static factors without considering the evolving nature of environmental variables contributing to landslide occurrence. Consequently, there is a demand for more dynamic and data-driven approaches that can harness the potential of machine learning to improve prediction accuracy and adapt to changing conditions.

Recent research has witnessed a shift towards the application of machine learning for landslide prediction. Studies have explored the use of techniques like Random Forest and Support Vector Machine (SVM) to predict landslides based on selected factors, incorporating more complex and dynamic elements than traditional models. However, many of these works tend to focus on individual algorithms in isolation, missing out on the opportunity for a comprehensive comparison of different machine learning methods. A holistic understanding of which algorithms perform best under varying conditions and in different regions is essential to develop a robust and adaptable landslide prediction system.

With the advent of web-based mapping technologies, there has been a notable advancement in the integration of predictive models into interactive maps. While real-time prediction interfaces have been developed for various natural disasters, including earthquakes and hurricanes, their application to landslides remains relatively limited. These interfaces offer the potential to provide timely and accessible information to both researchers and the public. However, the full integration of machine learning algorithms into such interfaces presents a promising frontier for improving the accuracy and utility of landslide prediction systems, as it can facilitate immediate access to vital information during critical situations.

1.2 RESEARCH GAP

One significant research gap in the field of landslide prediction is the absence of comprehensive comparisons of various machine learning algorithms using the same dataset. While numerous studies have used machine learning approaches to model landslide susceptibility, these efforts often focus on individual algorithms in isolation. This limitation impedes our ability to discern the relative strengths and weaknesses of different algorithms under diverse environmental conditions and geographical regions. A more holistic approach, systematically evaluating the performance of multiple algorithms on standardized datasets, can provide valuable insights into algorithm selection for specific scenarios. Such comparisons can reveal which algorithms are better suited for particular geological, climatic, or topographical conditions, contributing to the development of more effective and adaptable landslide prediction systems.

Despite the success of real-time prediction interfaces for various natural disasters, such as earthquakes and hurricanes, the integration of machine learning models into interactive maps for landslide prediction remains an underexplored area of research. Utilizing web-based mapping libraries and real-time data sources presents immense potential for creating dynamic and accessible platforms for landslide prediction. These systems can offer instant, location-specific information about landslide susceptibility, crucial for both researchers and the public. Understanding how to seamlessly combine predictive models with user-friendly mapping interfaces represents a promising frontier that can enhance the practical utility of landslide prediction systems and aid in disaster preparedness and response.

Landslide occurrence is a complex phenomenon influenced by multiple interacting factors, including slope, rainfall, soil type, and land use. However, many existing studies tend to focus on individual parameters when constructing predictive models, overlooking the intricate interplay between these variables. To address this limitation, there is a critical need for research embracing a multi-parameter modelling approach. By incorporating the combined influence of these factors, such models can better capture the complexity of landslide susceptibility, yielding more robust and reliable predictions. This research direction is a crucial step toward enhancing the accuracy and practical relevance of landslide prediction systems.

In the pursuit of developing accurate predictive models, research often prioritizes the technical aspects of model development and performance metrics. However, an equally significant aspect that frequently remains overlooked is the user experience and interpretability of the results. To make landslide prediction systems truly effective, user-friendly interfaces that provide understandable probability estimates and enable user interaction are essential. These interfaces should cater to both experts and the public, allowing users to easily access and interpret the model's outputs. Enhancing the human-computer interaction aspect of landslide prediction systems therefore remains an open need.

1.3 NOVELTY AND CONTRIBUTION

This research aims to bridge a significant research gap in landslide susceptibility prediction by undertaking a comprehensive comparison of four distinct machine learning algorithms: Convolutional Neural Network (CNN), Random Forest, Logistic Regression, and Support Vector Machine (SVM). Unlike previous studies that have applied these algorithms individually, this research systematically evaluates their performance side by side on the same dataset. The comparative analysis not only elucidates the relative strengths and weaknesses of each algorithm but also offers insights into their adaptability across diverse environmental conditions and geographical regions. The research's outcomes will guide algorithm selection for specific scenarios, enhancing the effectiveness of tailored landslide prediction models.

Another pioneering aspect of this research is the integration of advanced machine learning algorithms into an interactive map interface for real-time landslide prediction. While real-time prediction interfaces exist for earthquakes and hurricanes, their application to landslides is underexplored. By seamlessly combining predictive models with user-friendly mapping interfaces, this research enhances the usability and accessibility of the landslide prediction system. The real-time map interface provides instant, location-specific information about landslide susceptibility, advancing disaster preparedness and response efforts.

In contrast to traditional approaches focusing on individual parameters, this research adopts a holistic multi-parameter approach to landslide susceptibility modeling. By considering the complex interplay of factors like slope, rainfall, soil type, and land use, the research aims to create more accurate and comprehensive prediction models. This approach acknowledges intricate relationships between various parameters, resulting in models that better capture the complexities of landslide susceptibility. The multi-parameter approach represents a significant advancement in the accuracy and practical relevance of landslide prediction systems.

This research also emphasizes user interaction and interpretability. The map interface is designed to provide real-time probability estimates in an understandable and user-friendly manner. This user-centric design empowers both experts and the general public to access and interpret the model's outputs easily, enabling informed decisions on landslide risk and mitigation strategies. The research recognizes that a user-centric approach is pivotal in ensuring the practical utility and impact of landslide prediction systems.

In the realm of landslide risk assessment, the development of a comprehensive system is imperative for proactive disaster management. This multifaceted system comprises several key components, commencing with data collection and preprocessing to gather pertinent information encompassing slope, elevation, precipitation, soil type, and rainfall in landslide-prone regions. Subsequently, we delve into algorithm selection and training, where four diverse machine learning algorithms - Convolutional Neural Network (CNN), Random Forest, Logistic Regression, and Support Vector Machine (SVM) - are harnessed to predict landslide susceptibility. The system incorporates a real-time map interface, utilizing web-based mapping libraries like Leaflet, allowing user interaction to select specific locations. Further, the chosen features are extracted, undergo thorough preprocessing, and are passed to the trained algorithms for predicting landslide susceptibility probabilities. The system's evolution continues through iterative testing and optimization, with a user-centric approach aiming to enhance the interface's usability and interpretability. Upon achieving an optimal design, the system is deployed on a server or platform, with provisions for regular model updates in response to new data or changing conditions. Lastly, the system's performance is comprehensively evaluated in real-world scenarios, encompassing algorithmic comparisons and usability assessments.

2.1 Convolutional Neural Network (CNN)

Convolutional neural networks (CNNs) are a class of deep learning algorithms widely used for image processing and pattern prediction. They excel at capturing spatial relationships within images. As an extension of artificial neural networks, CNNs are primarily used to extract features from grid-like matrix datasets, for instance visual data rich in patterns such as pictures or videos.

CNN architecture:

A convolutional neural network is composed of several layers, including the input layer, pooling layer, convolutional layer, and fully connected layer. A simple CNN architecture is shown in Fig. 1.

Features are extracted from the input image by applying filters, and the final prediction is produced by the fully connected layer. The network finds the optimal filters through gradient descent and backpropagation. While CNNs are commonly used for image analysis, their application to landslide prediction involves converting geospatial data into image-like formats. For instance, elevation data can be represented as grayscale images, allowing CNNs to learn intricate spatial patterns related to landslide susceptibility.
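As a rough illustration of representing elevation data as a grayscale image, the sketch below min-max scales a small elevation grid to grey levels 0 to 255. The function name `elevation_to_grayscale` and the sample values are hypothetical and are not taken from the system described here; real inputs would come from a digital elevation model.

```python
def elevation_to_grayscale(grid):
    """Min-max scale a 2D elevation grid (metres) to integer grey levels 0..255."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Flat terrain: no contrast to encode, so return a uniform black image.
        return [[0 for _ in row] for row in grid]
    return [[round((v - lo) * 255.0 / (hi - lo)) for v in row] for row in grid]


# Hypothetical 3x3 elevation patch in metres (illustrative values only).
patch = [
    [100.0, 150.0, 200.0],
    [120.0, 175.0, 250.0],
    [140.0, 225.0, 300.0],
]
image = elevation_to_grayscale(patch)
```

The lowest cell maps to 0 and the highest to 255, so relative relief is preserved while absolute elevation is lost. Whether that trade-off is acceptable depends on the study area.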

Strengths:

- Effective in capturing complex spatial patterns.
- Can learn features automatically through hierarchical layers.
- Suitable for handling large and diverse datasets.

Weaknesses:

- Data preparation for geospatial data might be complex.
- Requires significant computational resources for training.

CNN algorithm:

The CNN architecture is made up of several kinds of layers: convolutional, pooling, and fully connected.

1. Input Layer:

- X_in is the input image.

2. Convolutional Layer:

$$\text{Convolution operation: }{Z}^{\left[l\right]}={W}^{\left[l\right]}*{A}^{[l-1]}+{b}^{\left[l\right]}$$

$$\text{Activation function: }{A}^{\left[l\right]}=g\left({Z}^{\left[l\right]}\right)\text{ (e.g., ReLU)}$$

3. Pooling Layer:

- Max pooling operation:

$$\text{Max pooling operation: }{A}^{\left[l\right]}=\operatorname{maxpool}\left({Z}^{\left[l\right]}\right)$$

4. Fully Connected Layer:

$$\text{Flatten operation: }{A}_{\text{flat }}^{\left[l\right]}=\text{ flatten }\left({A}^{\left[l\right]}\right)$$

$$\text{Linear transformation: }{Z}^{[l+1]}={W}^{[l+1]}\cdot {A}_{\text{flat }}^{\left[l\right]}+{b}^{[l+1]}$$

$$\text{Activation function: }{A}^{[l+1]}=g\left({Z}^{[l+1]}\right)$$

5. Output Layer:

- $A^{[l+1]}$ represents the final output.

Here, $l$ denotes the layer index, $W$ the weight matrices, $b$ the bias vectors, and $g$ the activation function.
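The layer equations above can be traced by hand on a tiny input. The following is a minimal pure-Python sketch of one forward pass (convolution, ReLU, max pooling, flatten, fully connected layer), not the network used in this work. The 5x5 input, the 2x2 kernel, the dense weights, and the sigmoid output are all illustrative assumptions. The sketch applies pooling to the activated feature map, which is the usual ordering.

```python
import math


def conv2d(image, kernel, bias):
    """Valid cross-correlation of a 2D image with one 2D kernel, plus a bias (Z = W * A + b)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw)) + bias
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise activation g(Z) = max(0, Z)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [max(feature_map[r + i][c + j] for i in range(size) for j in range(size)) for c in cols]
        for r in rows
    ]


def flatten(feature_map):
    return [v for row in feature_map for v in row]


def dense(weights, inputs, biases):
    """Linear transformation Z = W . A_flat + b, one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


# Hypothetical 5x5 single-channel input and hand-picked parameters.
x_in = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]

activated = relu(conv2d(x_in, kernel, bias=0.0))   # 4x4 feature map
flat = flatten(max_pool(activated))                # 2x2 after pooling, then 4 values
logit = dense([[0.5, 0.5, 0.5, 0.5]], flat, [-1.0])[0]
probability = 1.0 / (1.0 + math.exp(-logit))       # sigmoid output (an assumption)
```

In a trained network the kernel and dense weights are learned by backpropagation rather than chosen by hand.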

2.2 Random Forest

Several decision trees are used in the Random Forest ensemble learning technique to increase prediction accuracy and reduce overfitting. Random Forest is a collective approach that can handle both classification and regression analyses. It does this by using multiple decision trees and a technique called Bootstrap and Aggregation, or bagging. The key idea here is to use a combination of decision trees instead of depending only on one to determine the final result.

Random Forest uses a number of decision trees as its base learners. A sample dataset is generated for every tree by choosing rows and features at random from the original dataset. This step is referred to as bootstrapping.

Random Forest can handle various types of data and is effective for capturing non-linear relationships among multiple parameters (slope, elevation, precipitation, soil type, rainfall).

Strengths:

- Handles mixed data types and categorical features well.
- Reduces overfitting through ensemble averaging.
- Provides feature importance analysis.

Weaknesses:

- May not capture complex spatial relationships as effectively as CNNs.
- Prone to model instability if parameter settings are not optimized.

Random Forest algorithm:

A simplified representation of the Random Forest algorithm in a high-level, equation-like form:

1. Given Training Data:

$$\text{Training data: }\left\{\left({x}^{\left(1\right)},{y}^{\left(1\right)}\right),\left({x}^{\left(2\right)},{y}^{\left(2\right)}\right),\dots ,\left({x}^{\left(m\right)},{y}^{\left(m\right)}\right)\right\}$$

- $x^{(i)}$ represents the input features of the $i$-th example.
- $y^{(i)}$ is the corresponding class label or target value.

2. Random Forest Training:

- For every tree $t$ in the forest:
  - Randomly select a sample of the training examples with replacement (bootstrap sampling, the first step of bagging, i.e. bootstrap aggregating).
  - Use feature bagging, which involves selecting a subset of features at random for the tree.
  - Train the decision tree $t$ on the sampled data.

3. Decision Function:

- For a new input, the random forest predicts the output by combining the predictions of all trees. For classification, this may involve a majority vote; for regression, it may involve averaging.

$${\widehat{y}}_{\text{new }}=\text{ CombinationFunction }\left({t}_{1}\left({x}_{\text{new }}\right),{t}_{2}\left({x}_{\text{new }}\right),\dots ,{t}_{T}\left({x}_{\text{new }}\right)\right)$$

The combination function depends on the task (classification or regression).

4. Random Forest Classification (Majority Vote):

- For classification, the majority vote is often used:

$${\widehat{y}}_{\text{new }}=\operatorname{mode}\left({t}_{1}\left({x}_{\text{new }}\right),{t}_{2}\left({x}_{\text{new }}\right),\dots ,{t}_{T}\left({x}_{\text{new }}\right)\right)$$

5. Random Forest Regression (Averaging):

- For regression, the predictions are typically averaged:

$${\widehat{y}}_{\text{new }}=\frac{1}{T}\sum _{i=1}^{T} {t}_{i}\left({x}_{\text{new }}\right)$$

Random Forests leverage the power of several decision trees, each trained on a different random subset of the data, to improve generalization performance. The randomness introduced during training helps decorrelate the trees and reduce overfitting.
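The training and voting steps above can be sketched without any library. This minimal illustration uses one-split trees (decision stumps) as base learners to stay short; a real Random Forest grows deeper trees. The feature layout `[slope, rainfall]`, the sample values, and all function names are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Fit a one-split tree using only the given feature indices."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # Degenerate sample with no valid split: always predict the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return lambda row: majority
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    m, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(m) for _ in range(m)]       # bootstrap: rows sampled with replacement
        feature_ids = rng.sample(range(d), n_features)   # feature bagging: random feature subset
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature_ids))
    return forest


def predict(forest, row):
    """Majority vote (mode) over the predictions of all trees."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


# Hypothetical samples: [slope in degrees, rainfall in mm]; label 1 = landslide.
rows = [[5, 20], [8, 35], [10, 30], [12, 25], [30, 150], [35, 180], [40, 160], [45, 200]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(rows, labels)
```

For regression, `predict` would average the tree outputs instead of taking the mode.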

2.3 Logistic Regression

Logistic regression is a linear classification method that estimates the probability of a binary outcome from one or more predictor variables. It is a supervised machine learning approach used mainly for classification problems, where the objective is to estimate the probability that a given instance belongs to a particular class. It is called regression because it takes the output of a linear regression function as input and passes it through a sigmoid function to estimate the probability of a class. The output of logistic regression is therefore the probability that an instance falls within a certain category, whereas the output of linear regression is an unbounded continuous value.

Application to Landslide Prediction: Logistic regression can model the relationship between predictor variables (e.g., elevation, soil type) and the binary outcome of landslide susceptibility.

Strengths:

- Simplicity and interpretability of results.
- Suitable for cases with linear relationships.

Weaknesses:

- Assumes linear relationships, might not capture complex interactions.
- Limited by the linear decision boundary.

Logistic Regression algorithm:

A simplified representation of the Logistic Regression algorithm in the form of equations:

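1. Given Training Data:

$$\text{Training data: }\left\{\left({x}^{\left(1\right)},{y}^{\left(1\right)}\right),\left({x}^{\left(2\right)},{y}^{\left(2\right)}\right),\dots ,\left({x}^{\left(m\right)},{y}^{\left(m\right)}\right)\right\}$$

- $x^{(i)}$ represents the input features of the $i$-th example.
- $y^{(i)}$ is the corresponding binary class label (0 or 1).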

2. Logistic Function (Sigmoid):

- The logistic function (sigmoid) is defined as:

$$h\left(x\right)=\frac{1}{1+{e}^{-(w\cdot x+b)}}$$

Here $w$ is the weight vector, $b$ is the bias term, and $e$ is the base of the natural logarithm.

3. Hypothesis:

- The hypothesis function is given by:

$$\widehat{y}=h\left(x\right)=\frac{1}{1+{e}^{-(w\cdot x+b)}}$$

4. Cost Function (Binary Cross-Entropy Loss):

- The cost function:

$$J(w,b)=-\frac{1}{m}\sum _{i=1}^{m} \left[{y}^{\left(i\right)}\log\left({\widehat{y}}^{\left(i\right)}\right)+\left(1-{y}^{\left(i\right)}\right)\log\left(1-{\widehat{y}}^{\left(i\right)}\right)\right]$$

5. Gradient Descent (Update Rule):

$$w:=w-\alpha \frac{\partial J}{\partial w},b:=b-\alpha \frac{\partial J}{\partial b}$$

6. Prediction:

- Given a new input, predict its class label using the hypothesis function:

$${\widehat{y}}_{\text{new }}=\frac{1}{1+{e}^{-\left(w\cdot {x}_{\text{new }}+b\right)}}$$

The logistic function transforms the linear combination of input features and parameters into a value between 0 and 1, representing the probability of the positive class. The discrepancy between the actual labels and the anticipated probabilities is measured by the cost function. Gradient descent is used to minimize this cost function and find optimal parameter values.
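The sigmoid hypothesis, the cross-entropy cost, and the gradient descent update can be combined in a few lines. This is a minimal batch gradient descent sketch on made-up data, not the training code used in this work. The learning rate, the epoch count, and the two pre-scaled features are assumptions chosen only so that the example converges.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Hypothesis h(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def cost(w, b, xs, ys):
    """Binary cross-entropy J(w, b) averaged over the m examples."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict_proba(w, b, x)
        total += y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return -total / len(xs)


def fit(xs, ys, alpha=0.5, epochs=2000):
    """Batch gradient descent: w := w - alpha * dJ/dw, b := b - alpha * dJ/db."""
    m, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y  # derivative of the loss w.r.t. the linear term
            for j in range(d):
                grad_w[j] += err * x[j] / m
            grad_b += err / m
        w = [wj - alpha * gj for wj, gj in zip(w, grad_w)]
        b -= alpha * grad_b
    return w, b


# Hypothetical features already scaled to [0, 1]; label 1 = landslide.
xs = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
ys = [0, 0, 0, 1, 1, 1]

initial_cost = cost([0.0, 0.0], 0.0, xs, ys)
w, b = fit(xs, ys)
final_cost = cost(w, b, xs, ys)
```

With all parameters at zero every prediction is 0.5, so the starting cost equals log 2. Training lowers it as the decision boundary moves between the two groups.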

2.4 Support Vector Machine (SVM)

SVM is a classification algorithm designed to determine the optimal hyperplane for separating data into distinct classes while maximizing the margin between them. As a supervised machine learning approach, it can handle both regression and classification tasks, although it is primarily employed for classification. Its main objective is to identify the best hyperplane within an N-dimensional feature space that divides the data points into classes. The hyperplane is chosen to maximize the margin, that is, the distance between the hyperplane and the nearest data points of each class. The hyperplane becomes difficult to visualize when there are more than three features.

Consider a dependent variable that is either a red or a blue circle, and two independent variables, x1 and x2.

As Fig. 2 shows, many different lines can separate the red circles from the blue ones (the hyperplane is a line in this case because only two input features, x1 and x2, are considered).

SVM is useful for complex interactions among multiple parameters because it can map data into higher-dimensional spaces, which captures non-linear correlations.

Strengths:

- Can handle non-linear data using kernel functions.
- Robust against overfitting.

Weaknesses:

- Sensitive to kernel function and parameter selection.
- May require extensive tuning for optimal performance.

SVM algorithm:

A simplified representation of the Support Vector Machine (SVM) algorithm in the form of equations:

1. Given Training Data:

$$\text{Training data: }\left({x}^{\left(1\right)},{y}^{\left(1\right)}\right),\left({x}^{\left(2\right)},{y}^{\left(2\right)}\right),\dots ,\left({x}^{\left(m\right)},{y}^{\left(m\right)}\right)$$

- $x^{(i)}$ represents the input features of the $i$-th example.
- $y^{(i)}$ is the corresponding class label (+1 or -1).

2. Objective Function:

- Minimize the following objective function:

$$\frac{1}{2}\parallel w{\parallel }^{2}+C\sum _{i=1}^{m} \max\left(0,\,1-{y}^{\left(i\right)}\left(w\cdot {x}^{\left(i\right)}+b\right)\right)$$

3. Decision Function:

- The decision function is given by $w\cdot x+b$.

4. Optimization:

- Solve the optimization problem to find w and b by minimizing the objective function.

5. Prediction:

- Given a new input, predict its class label using the decision function:

$$\widehat{y}=\operatorname{sign}\left(w\cdot {x}_{\text{new }}+b\right)$$

The objective function combines margin maximization with a penalty for incorrect classifications. The regularization parameter $C$ controls the trade-off between achieving a wider margin and permitting some misclassifications.
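To make the objective concrete, the sketch below minimizes it for a linear SVM with plain sub-gradient descent. This is one simple way to solve the optimization problem and is an assumption of this example. Library implementations typically use dedicated solvers and support kernels. The data, step size, and epoch count are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def objective(w, b, xs, ys, C):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y (w . x + b))."""
    hinge = sum(max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(xs, ys))
    return 0.5 * dot(w, w) + C * hinge


def fit(xs, ys, C=1.0, lr=0.001, epochs=5000):
    """Sub-gradient descent on the soft-margin objective."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0          # gradient of 0.5 * ||w||^2 is w
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) < 1.0:      # only margin violators contribute
                grad_w = [g - C * y * xj for g, xj in zip(grad_w, x)]
                grad_b -= C * y
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Class label from the sign of the decision function w . x + b."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable samples with labels in {+1, -1}.
xs = [[1.0, 1.0], [1.5, 0.5], [0.5, 1.5], [3.0, 3.0], [3.5, 2.5], [2.5, 3.5]]
ys = [-1, -1, -1, 1, 1, 1]

start = objective([0.0, 0.0], 0.0, xs, ys, C=1.0)
w, b = fit(xs, ys)
end = objective(w, b, xs, ys, C=1.0)
```

Raising `C` penalizes margin violations more heavily and narrows the margin. Lowering it tolerates more violations in exchange for a wider margin.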

The methodology for the development of our comprehensive landslide susceptibility prediction system encompasses several key phases. First and foremost is the data collection process, where we aggregate a diverse dataset incorporating critical parameters such as slope, elevation, precipitation, soil type, and rainfall. Geographic Information System (GIS) data, soil databases, and historical weather archives contribute to the creation of this foundational dataset.

Following data collection, our approach incorporates image processing to predict soil type based on user-uploaded ground images.

3.1 Soil Classification

The folders ('Soil types/Black Soil', 'Soil types/Cinder Soil', 'Soil types/Laterite Soil', 'Soil types/Peat Soil', 'Soil types/Yellow Soil') contain categorized images representing various soil types. These images serve as the dataset for training the model to predict soil types. Using the Keras ImageDataGenerator class, batches of tensor image data are created with real-time augmentation, which involves rescaling pixel values to fall within the range [0, 1].

The method flow_from_directory is utilized to scan through subdirectories, gather images, and assign corresponding labels according to the subdirectory names.

A sequential (CNN) model is constructed, incorporating all layers. The last layer comprises five neurons with softmax activation, aligning with the five distinct soil types.

Subsequently, the model is compiled using categorical cross-entropy loss and the RMSprop optimizer. It is then trained on the generated batches of soil images.

The training process fits the model to the data using the compiled settings, specifying the number of samples, the number of epochs, and the verbosity level. The training accuracy over epochs is then plotted to assess the model's learning progress.

plt.plot([i + 1 for i in range(n_epochs)], history.history['acc'], '-o', c='k', lw=2, markersize=9)

The trained model is saved for future use and deployment.

model.save('my_model.h5')

Model Loading:

The TensorFlow and Keras libraries are used to load a pre-trained CNN model ('my_model.h5') designed for image classification.

Subsequently, an input image ('photo.jpeg') is loaded, resized to a specified target size, and normalized. Predictions regarding the soil type are generated by employing the loaded CNN model.

The image is processed by first converting it to an array and then normalizing it by dividing the value of each pixel by 255.0. The model then predicts the soil type from this preprocessed input.

The predicted class index is determined from the model's output using the argmax function and is mapped to its human-readable label from a predefined list of soil types.
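The normalization and argmax steps can be illustrated without loading a model. In this sketch the probability vector stands in for the CNN's softmax output and is made up. The label order is an assumption that mirrors the alphabetical ordering of the soil-type folders and should be checked against the training generator's class indices.

```python
# Assumed label order (alphabetical by folder name); verify against the trained model.
SOIL_LABELS = ["Black Soil", "Cinder Soil", "Laterite Soil", "Peat Soil", "Yellow Soil"]


def normalize_pixels(pixels):
    """Scale 8-bit pixel values from [0, 255] to [0, 1]."""
    return [[value / 255.0 for value in row] for row in pixels]


def decode_prediction(probabilities, labels=SOIL_LABELS):
    """Return (class index, label) for the highest-probability class (argmax)."""
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return index, labels[index]


# Hypothetical 2x2 grayscale patch and hypothetical softmax output.
scaled = normalize_pixels([[0, 255], [51, 102]])
index, soil_type = decode_prediction([0.05, 0.10, 0.70, 0.10, 0.05])
```

Here the third class has the highest probability, so the decoded label is "Laterite Soil".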

3.2 Machine Learning Models

The machine learning component integrates four distinct pre-trained models for different purposes:

A pre-trained CNN model undergoes transfer learning, allowing fine-tuning to adapt it specifically for the landslide susceptibility prediction task.

The Random Forest model captures intricate relationships among input parameters, considering non-linear interactions to contribute to ensemble prediction.

Logistic Regression identifies the probability of landslide occurrence based on linear relationships within historical data, also contributing to ensemble prediction.

SVM serves as a robust classifier, determining the decision boundary between landslide and non-landslide instances by optimizing kernel functions to handle non-linear relationships in the dataset.

Features are first scaled with the Standard Scaler to put them on a common scale. The statement "new_data_scaled = scaler.transform(new_data)" scales new data using the parameters fitted during training. Prediction probabilities for landslide susceptibility are then calculated for each model using "prediction_probabilities = model.predict(new_data_scaled)". These probabilities reflect the likelihood of a landslide given the provided parameters. To produce a single prediction, the final step averages the four probabilities: "average_probabilities = (svm_probabilities + lr_probabilities + rf_probabilities + cnn_probabilities) / 4". Averaging gives a more balanced estimate of landslide susceptibility than any single model. Together, these three steps (scaling, per-model prediction, and averaging) constitute the core of the ML component.
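The scaling and averaging steps can be reproduced in plain Python to show what they compute. The helper names, the training rows, and the four per-model probabilities are hypothetical. The scaler follows the usual standardization convention of subtracting the training mean and dividing by the population standard deviation.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Learn the per-feature mean and population standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # constant column: avoid division by zero
    return means, stds


def transform(row, scaler):
    """Standardize one sample with the parameters learned during training."""
    means, stds = scaler
    return [(value - m) / s for value, m, s in zip(row, means, stds)]


def average_probability(probabilities):
    """Unweighted mean of the per-model landslide probabilities."""
    return sum(probabilities.values()) / len(probabilities)


# Hypothetical training rows: [slope in degrees, rainfall in mm].
scaler = fit_scaler([[10.0, 100.0], [20.0, 200.0], [30.0, 300.0]])
new_data_scaled = transform([20.0, 200.0], scaler)

# Hypothetical probabilities returned by the four models for one location.
model_probabilities = {"svm": 0.80, "lr": 0.70, "rf": 0.90, "cnn": 0.60}
average = average_probability(model_probabilities)
```

A sample equal to the training mean scales to zero on every feature, and the four probabilities above average to 0.75. When the models are scikit-learn classifiers, class probabilities come from `predict_proba`, since `predict` returns class labels.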

3.3 API Integration

API integration supplies the environmental data needed for landslide susceptibility prediction. The procedure begins with a user-specified latitude and longitude, which are used in API requests to retrieve elevation, precipitation, and rainfall data. For elevation, the Open-Meteo API is queried using the coordinates. The requests for historical rainfall and precipitation data use the Archive API, which provides detailed weather information for the specified location. The obtained data, including elevation, total rainfall, and total precipitation, is then combined with the user-input slope and the predicted soil type to form the input for landslide susceptibility prediction. The requests themselves are straightforward API calls with parameters, such as latitude and longitude, embedded in the URL. For instance, the call

elevation_response = requests.get(elevation_api_url, params = elevation_params)

demonstrates the elevation API request. Overall, this sequence of requests enables the seamless integration of external environmental data into the landslide susceptibility prediction system.
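
The request flow can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the system's actual code: it uses the standard-library `urllib` instead of `requests` so that it is self-contained, the helper names are hypothetical, and the daily variables `precipitation_sum` and `rain_sum` are assumed to be the ones summed into "total precipitation" and "total rainfall". The network call is wrapped in `fetch_json` and is not executed here; the example at the end runs offline on a hand-written payload.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ELEVATION_URL = "https://api.open-meteo.com/v1/elevation"
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def elevation_params(latitude, longitude):
    """Query parameters for the elevation request."""
    return {"latitude": latitude, "longitude": longitude}


def archive_params(latitude, longitude, start_date, end_date):
    """Query parameters for daily precipitation and rain totals."""
    return {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,  # ISO dates such as "2023-06-01"
        "end_date": end_date,
        "daily": "precipitation_sum,rain_sum",
    }


def fetch_json(url, params, timeout=10):
    """Send a GET request and decode the JSON body (needs network access)."""
    with urlopen(f"{url}?{urlencode(params)}", timeout=timeout) as response:
        return json.load(response)


def daily_total(payload, variable):
    """Sum one daily variable, skipping days reported as null."""
    values = payload.get("daily", {}).get(variable, [])
    return sum(v for v in values if v is not None)


# Offline example with a hand-written payload in the archive response layout.
sample = {"daily": {"precipitation_sum": [1.5, None, 4.0],
                    "rain_sum": [1.0, 0.0, 3.5]}}
print(daily_total(sample, "precipitation_sum"), daily_total(sample, "rain_sum"))
```

A live call would look like `fetch_json(ELEVATION_URL, elevation_params(lat, lon))`, with the elevation read from the `elevation` field of the response.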

3.4 User Interface

The user interface (UI) of the research serves as a critical interaction point between the user and the landslide susceptibility prediction system. The algorithm begins by prompting the user to input the geographical coordinates (latitude and longitude) of the location of interest. Additionally, the system allows users to capture and upload a ground image for soil type prediction. Simultaneously, users provide information about the slope of the terrain in degrees. The UI then triggers API requests to obtain essential environmental data, including elevation, precipitation, and rainfall, based on the specified location. This data is crucial for enriching the feature set used in landslide susceptibility prediction.

Upon uploading an image, the system uses image processing methods together with the existing CNN model to predict the soil type from the provided ground image. This predicted soil type serves as an additional input to the landslide prediction process. The user interface (UI) then merges the slope entered by the user with the acquired environmental data and the predicted soil type. This combined record is passed to four pre-trained machine learning models, which together estimate the likelihood of landslide occurrence.

In the final step, the UI dynamically displays real-time updates, presenting relevant information such as latitude, longitude, slope, elevation, total rainfall, total precipitation, predicted soil type, and the average probability of landslide occurrence. This user-friendly interface offers clear and concise information, empowering users with insights that aid decision-making in regions vulnerable to landslides. The seamless integration of API calls, image processing, and machine learning predictions ensures a comprehensive solution for landslide susceptibility assessment through an intuitive user interface. Figure 3 represents a detailed workflow for the research.
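
As a rough illustration of how the UI inputs could be merged into one model input, the sketch below assembles a feature row in the column order of Table 2 and maps the predicted soil label to its identifier from Table 1. The function name `build_feature_row` and the label normalization are assumptions made for this example.

```python
# Soil identifiers as listed in Table 1.
SOIL_TYPE_IDS = {
    "YELLOW SOIL": 1,
    "BLACK SOIL": 2,
    "LATERITE SOIL": 3,
    "PEAT SOIL": 4,
    "CINDER SOIL": 5,
}


def build_feature_row(slope, precipitation, elevation, soil_label, rainfall):
    """Assemble the model input in the column order of Table 2."""
    soil_id = SOIL_TYPE_IDS[soil_label.strip().upper()]
    return [slope, precipitation, elevation, soil_id, rainfall]


# Slope from the user, soil label from the image classifier,
# remaining values from the API responses.
row = build_feature_row(10, 196, 13, "Yellow Soil", 249)
print(row)  # [10, 196, 13, 1, 249]
```

An unknown soil label raises a `KeyError` here, which makes a misclassified or misspelled label visible instead of silently producing a wrong identifier.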

4.1 DATASET

The dataset is the cornerstone of the landslide susceptibility prediction system, serving as the critical building block for the development and evaluation of machine learning models. This dataset is a treasure trove of information, containing historical records of landslide occurrences and a comprehensive set of environmental variables such as slope, rainfall, soil type, and land use. It plays a pivotal role in training and testing these models, allowing them to discern intricate patterns and relationships among the diverse parameters and the occurrence of landslides. Without a high-quality and well-structured dataset, the system's predictive capabilities and accuracy would be severely compromised, making it an indispensable asset in the quest for effective landslide risk assessment and management.

4.2 DATA SOURCE

The dataset is sourced from reliable geological and meteorological data sources, ensuring its accuracy and relevance for landslide susceptibility analysis. It may be collected from governmental agencies, research institutions, or relevant environmental databases.

4.3 DATASET AUGMENTATION PROCEDURE

Data augmentation is a valuable technique, especially for geospatial and environmental datasets, because it increases the diversity of the data and strengthens the robustness of machine learning models. The augmentation procedure used in this research is as follows:

1. Spatial Augmentation:

- Geospatial data often exhibit spatial dependency. To account for this, we performed spatial augmentation by generating new data points that are closely related to existing ones. Techniques include:

- Random Perturbation: Slightly perturb the geographical coordinates (latitude and longitude) of the existing data points to simulate nearby locations.

- Voronoi Tessellation: Divide the region of interest into polygons and create new data points within each polygon based on the properties of the original data points.

2. Temporal Augmentation:

- Landslide occurrences are often influenced by time-related factors, including rainfall patterns. We incorporated temporal data augmentation by:

- Time Series Transformation: Create synthetic time series data for precipitation and rainfall, simulating different weather patterns and their impact on landslide susceptibility.

- Seasonal Variation: Introduce seasonal variation by adjusting the temporal features (precipitation, rainfall) to represent different seasons and climate conditions.

3. Class Imbalance Mitigation:

- As mentioned, the dataset has an imbalanced distribution of landslide occurrences. We addressed this by augmenting the minority class (landslide occurrences) to balance the dataset. Techniques include:

- Oversampling: Generate additional synthetic data points for the minority class using techniques like SMOTE (Synthetic Minority Over-sampling Technique).

- Augmented Landslide Events: Create synthetic landslide events based on the properties of real landslide occurrences, simulating different scenarios.

Applying these data augmentation techniques increases the diversity and richness of the dataset. This, in turn, allows the machine learning models to learn more robust patterns and relationships, improving their accuracy and reliability for landslide susceptibility prediction.
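
Two of the techniques listed above can be sketched in a few lines. The code below is a simplified illustration, not the procedure actually used: `interpolate_sample` shows only the interpolation step of SMOTE (the full method also selects the partner among the k nearest minority-class neighbours), the coordinate offset of 0.001 degrees is an arbitrary assumption, and the categorical soil type is left out because interpolating identifiers would not be meaningful.

```python
import random


def perturb_coordinates(lat, lon, max_offset_deg=0.001, rng=random):
    """Random perturbation: jitter a coordinate pair to simulate a nearby site."""
    return (lat + rng.uniform(-max_offset_deg, max_offset_deg),
            lon + rng.uniform(-max_offset_deg, max_offset_deg))


def interpolate_sample(a, b, rng=random):
    """SMOTE-style step: pick a point on the segment between two minority samples."""
    t = rng.random()
    return [x + t * (y - x) for x, y in zip(a, b)]


rng = random.Random(0)  # fixed seed so the example is repeatable

# Two landslide rows from Table 2, continuous features only:
# (slope, precipitation, elevation, rainfall).
first = [10, 196, 13, 249]
second = [18, 196, 133, 249]
synthetic = interpolate_sample(first, second, rng)

# Hypothetical coordinates, used only to show the perturbation.
nearby = perturb_coordinates(18.52, 73.86, rng=rng)
print(synthetic, nearby)
```

Each synthetic feature lies between the two parent values, so features that are identical in both parents (here precipitation and rainfall) are reproduced unchanged.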

4.4 PREPARED DATASET


The prepared dataset is a fundamental component of this research, serving as the cornerstone for the evaluation of the prediction system. This section outlines the data collection, preprocessing, and features of the dataset, highlighting its significance in facilitating accurate machine learning model training and robust system performance.

Table 1

Soil types and their corresponding values used in the dataset
SOIL TYPE	IDENTIFIER
YELLOW SOIL	1
BLACK SOIL	2
LATERITE SOIL	3
PEAT SOIL	4
CINDER SOIL	5

Table 2

First nine examples of the landslide dataset
SLOPE	PRECIPITATION	ELEVATION	SOIL TYPE	RAINFALL	LANDSLIDE
10	196	13	1	249	1
12	69	65	2	101	1
15	118	53	1	78	0
18	196	133	3	249	1
12	34	63	4	20	0
11	24	84	5	10	0
20	196	97	1	249	1
18	11	94	1	30	1
12	196	98	3	249	1

Data Collection:

The study's dataset was sourced from reliable geological and meteorological data providers, ensuring data accuracy and relevance to the context of landslide susceptibility prediction. Data acquisition involved collaboration with governmental agencies, research institutions, and environmental databases, from which historical landslide occurrences and associated environmental variables were obtained. The dataset was collected from various geographical regions to ensure diversity in topographic, climatic, and geological characteristics, enabling the prediction system to generalize effectively across different landscapes.

Data Preprocessing:

To ensure the dataset's suitability for machine learning, a series of preprocessing steps were undertaken. Handling missing values was a critical initial phase to maintain data integrity. The next step involved encoding categorical variables, such as soil type, using techniques like one-hot encoding to convert them into a format understandable by machine learning algorithms. Numerical features were also scaled and normalized to create uniform contributions during model training. Data preprocessing plays a pivotal role in eliminating data inconsistencies and ensuring that the dataset aligns with the requirements of the selected machine learning algorithms.
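
The first two preprocessing steps can be illustrated with a small sketch. The imputation rule is an assumption: the text does not say how missing values were filled, so the median is used here purely as an example, and both helper names are hypothetical.

```python
from statistics import median

SOIL_IDS = [1, 2, 3, 4, 5]  # identifiers from Table 1


def fill_missing(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def one_hot(soil_id):
    """Encode a soil identifier as a 5-element indicator vector."""
    return [1 if soil_id == s else 0 for s in SOIL_IDS]


# Elevation values with one entry blanked out for illustration.
elevation = [13, 65, None, 133, 63]
print(fill_missing(elevation))  # [13, 65, 64.0, 133, 63]
print(one_hot(3))               # [0, 0, 1, 0, 0]
```

One-hot encoding avoids implying an order among soil types, which a single integer identifier would otherwise suggest to models such as Logistic Regression or SVM.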

Dataset Features:

The prepared dataset consists of multi-dimensional records, each representing a geographical location. These records include the following key parameters:

- Slope: Measured in degrees or a suitable metric, slope denotes the degree of incline in the terrain at a specific location.

- Elevation: Expressed in meters or feet, elevation indicates the height of the location above sea level.

- Precipitation: This parameter quantifies the amount of rainfall received at the location over a specified time frame, typically measured in millimeters.

- Soil Type: Soil type is a categorical variable, encompassing classifications that describe soil composition, texture, and cohesion. Table 1 lists the soil types and their corresponding identifiers used in the dataset.

- Rainfall: Rainfall information includes data on the intensity and duration of rainfall events experienced at the location during specific time frames.

The inclusion of these parameters enables the prediction system to capture the intricate connections between landslide frequency and environmental factors, providing a comprehensive foundation for accurate model training and robust performance. Table 2 shows a sample of the prepared dataset.

This section underscores the significance of data collection, preprocessing, and the incorporation of relevant parameters, which together ensure that the dataset is well-structured, diverse, and suited for the development of a robust landslide susceptibility prediction system.

The experimental setup for the landslide susceptibility prediction research encompasses several essential elements aimed at ensuring the dependability and precision of the combined machine learning models within a real-time map interface. Initially, data collection involves gathering geospatial and environmental data from credible sources, such as geological and meteorological agencies, to acquire precise information on slope, elevation, precipitation, soil type, and rainfall in regions prone to landslides. Following this, data preprocessing is conducted to address missing values, encode categorical variables, and scale/normalize numerical features, ensuring a clean dataset suitable for machine learning algorithms.

The research selects and trains four machine learning algorithms on the prepared dataset, splitting it into training and testing sets. Model performance is evaluated using metrics such as accuracy, precision, and recall. To provide users with an interactive platform for real-time landslide susceptibility predictions, a web-based mapping library is employed to create the map interface. For each selected location, the relevant features are extracted and preprocessed. The trained algorithms then use these features to predict susceptibility probabilities, giving users an understanding of susceptibility levels.

The research adopts an iterative testing and optimization process, continuously refining model parameters and enhancing the map interface for optimal performance. Emphasis is placed on user experience and interface design, integrating user feedback to improve functionality and clarity. Ultimately, the trained models and map interface are deployed on a server or platform, with mechanisms in place for regular model updates to adapt to changing environmental conditions. This thorough experimental setup ensures that the landslide susceptibility prediction system is founded on robust data, machine learning techniques, and user-centered design principles, enhancing its usability and effectiveness in disaster management.

The research's results and analysis provide valuable insights into the field of landslide susceptibility prediction and its implications for disaster management. In the initial stage, a comprehensive algorithm comparison was conducted, which is essential for guiding algorithm selection in different scenarios. The comparison revealed that Random Forest outperformed the other machine learning algorithms on the error metrics, achieving the lowest mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE), as shown in Table 3. This indicates that Random Forest excels at capturing complex relationships within the dataset, making it a robust choice for landslide susceptibility prediction in geospatial contexts.

Table 3

Model Comparison w.r.t. MSE, RMSE and MAE
MODEL	MSE	RMSE	MAE
CNN	0.157	0.396	0.291
RANDOM FOREST	0.143	0.378	0.276
LOGISTIC REGRESSION	0.239	0.489	0.346
SVM	0.216	0.465	0.331
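
For reference, the three error metrics in Table 3 can be computed as in the sketch below. The labels and predicted probabilities are hypothetical; only the MSE values in the second part are taken from Table 3. Because RMSE is the square root of MSE, the two columns can be cross-checked against each other.

```python
from math import sqrt


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 0, 1, 0]
y_pred = [0.9, 0.6, 0.2, 0.8, 0.4]
print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))

# Cross-check: the square root of each MSE in Table 3 should match its RMSE.
table3_mse = {"CNN": 0.157, "RANDOM FOREST": 0.143,
              "LOGISTIC REGRESSION": 0.239, "SVM": 0.216}
for model, value in table3_mse.items():
    print(model, round(sqrt(value), 3))
```

The square roots come out as 0.396, 0.378, 0.489, and 0.465, in agreement with the RMSE column of Table 3.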

In addition to algorithmic comparison, the research successfully integrated machine learning models into a real-time map interface, filling the gap between predictive modeling and real-world application. This innovative approach enhances the accessibility and usability of the prediction system. Users can interact with the map, select areas of interest, and receive instant landslide susceptibility estimates. This feature empowers decision-makers and stakeholders to access critical information in a user-friendly manner, potentially improving disaster preparedness and response.

Moreover, the research adopted a multi-parameter modeling approach, recognizing that landslides are influenced by a combination of factors. By considering multiple parameters, such as slope, elevation, precipitation, soil type, and rainfall, the models capture the complex relationships contributing to landslide susceptibility. This approach provides a more accurate and comprehensive understanding of landslide susceptibility, making it particularly valuable for regions with diverse geographical and environmental characteristics. The ability to adapt to different conditions and the inclusion of multiple factors contribute to the research's robustness and applicability. Table 4 presents the classification accuracies of different machine learning models showcasing their respective performance in the given task.

Table 4

Model Performance Comparison w.r.t. Accuracy
MODEL	ACCURACY (%)
CNN	97
RANDOM FOREST	93
LOGISTIC REGRESSION	97
SVM	90

The user-centric design is another crucial aspect of the research. The emphasis on user interaction and interpretability led to the creation of a user-friendly map interface that provides real-time probability estimates. This design ensures that the information is not only accurate but also understandable, enhancing user experience and facilitating informed decision-making. The incorporation of user feedback and continuous improvement in the interface design can further enhance the system's usability and practicality, making it a valuable tool for both experts and non-experts involved in disaster management.

Table 5

Model Performance Comparison w.r.t. Epoch for Soil Classification Dataset Training
Epoch No.	Accuracy	Loss
01	0.1918	1.6189
02	0.2467	1.6395
03	0.3836	1.2766
04	0.4521	1.1789
05	0.3699	1.2490
06	0.4315	1.0957
07	0.4795	1.0493
08	0.5000	1.0854
09	0.5548	0.9591
10	0.5548	0.8987
11	0.5479	0.9731
12	0.6370	0.8238
13	0.6507	0.7727
14	0.5959	0.8951
15	0.6575	0.7536
16	0.6781	0.7105
17	0.7329	0.6598
18	0.7260	0.6051
19	0.7877	0.5680
20	0.7877	0.6784
21	0.8082	0.5834
22	0.8562	0.5583
23	0.8014	0.4847
24	0.7397	0.6208
25	0.7877	0.5337
26	0.8630	0.3576
27	0.7533	0.7333
28	0.8425	0.4009
29	0.8356	0.5547
30	0.8630	0.3275

Table 6

Model Performance Comparison w.r.t. Epoch for Landslide Dataset Training
Epoch No.	Accuracy	Loss
01	0.4301	0.6964
02	0.5269	0.6512
03	0.6667	0.6081
04	0.7097	0.5711
05	0.7527	0.5345
06	0.7742	0.5032
07	0.7957	0.4738
08	0.8280	0.4453
09	0.8495	0.4180
10	0.8817	0.3920

Table 5 and Fig. 4 show the training accuracy with respect to epoch for the soil classification model, while Table 6 shows the same for the landslide prediction model.

In evaluating the model's performance, the F1-scores were analyzed per class, with particular attention to classes with varying levels of support, as shown in Fig. 5. The analysis categorizes classes by their support, employing log10-scaled thresholds of 0, 30, and 50 instances. This allows a more detailed inspection of how well the model predicts classes with different degrees of representation in the dataset. Classes with lower support can pose challenges, and relating the F1-scores to class prevalence provides valuable insights into the model's robustness across diverse classes.

Table 7

Precision, Recall, and F1-Score values for CNN
	PRECISION	RECALL	F1-SCORE
TOTAL	0.87	0.88	0.87
LOW	0.89	0.84	0.86
MID	0.84	0.89	0.86
TOP	0.86	0.87	0.86

Table 8

Precision, Recall, and F1-Score values for Random Forest
	PRECISION	RECALL	F1-SCORE
TOTAL	0.95	0.96	0.95
LOW	0.97	0.92	0.94
MID	0.93	0.96	0.94
TOP	0.94	0.95	0.94

Table 9

Precision, Recall, and F1-Score values for SVM
	PRECISION	RECALL	F1-SCORE
TOTAL	0.92	0.94	0.93
LOW	0.94	0.90	0.92
MID	0.91	0.93	0.92
TOP	0.93	0.91	0.92

Table 10

Precision, Recall, and F1-Score values for Logistic Regression
	PRECISION	RECALL	F1-SCORE
TOTAL	0.88	0.92	0.90
LOW	0.90	0.85	0.87
MID	0.85	0.90	0.87
TOP	0.87	0.88	0.87

Tables 7–10 summarize the assessment metrics (Precision, Recall, and F1-Score) for the four machine learning models used in predicting landslide susceptibility. Logistic Regression displays strong overall performance, with a Total Precision of 0.88, Recall of 0.92, and F1-Score of 0.90; the breakdown into Low, Mid, and Top categories shows consistent and well-balanced performance across these segments. The Support Vector Machine (SVM) model scores higher than Logistic Regression and the CNN on every Total metric, with a Total Precision of 0.92, Recall of 0.94, and F1-Score of 0.93. The Random Forest model, known for its ensemble strength, achieves the highest values of all four models, with Total Precision, Recall, and F1-Score of 0.95, 0.96, and 0.95, respectively. The CNN model, an architecture designed primarily for image processing, delivers competitive results with Total Precision, Recall, and F1-Score values of 0.87, 0.88, and 0.87, respectively. These metrics highlight the efficacy of the machine learning models used, establishing a robust framework for landslide susceptibility predictions across diverse soil types and geographical terrains. Table 11 summarizes the F1-Score, Precision, Recall, and Support for the soil classification model.
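
The relationship between the three metrics can be made concrete with a short sketch. The confusion counts below are hypothetical; the precision and recall pairs in the second part are the TOTAL rows of Tables 7–10, and the F1-score is recomputed from them as the harmonic mean.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts for one class.
print(precision_recall_f1(tp=45, fp=5, fn=3))

# TOTAL rows of Tables 7-10 as (precision, recall).
totals = {
    "CNN": (0.87, 0.88),
    "RANDOM FOREST": (0.95, 0.96),
    "SVM": (0.92, 0.94),
    "LOGISTIC REGRESSION": (0.88, 0.92),
}
for model, (p, r) in totals.items():
    print(model, round(f1_from(p, r), 2))
```

Rounded to two decimals, the recomputed values are 0.87, 0.95, 0.93, and 0.90, matching the F1-SCORE column of the TOTAL rows.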

Table 11

Class Report of Soil Classification
	F1-SCORE	PRECISION	RECALL	SUPPORT
BLACK SOIL	0.2093	0.1837	0.2432	37
CINDER SOIL	0.1818	0.1667	0.2000	30
LATERITE SOIL	0.1538	0.1818	0.1333	30
PEAT SOIL	0.1481	0.1667	0.1333	30
YELLOW SOIL	0.2963	0.3200	0.2759	29
MACRO-AVERAGE	0.1979	0.2038	0.1972	156
WEIGHTED-AVERAGE	0.1978	0.2021	0.1987	156

Table 12

CNN Architecture for Soil Classification
LAYER	OUTPUT SHAPE	PARAM
conv2d (Conv2D)	(None, 218, 218, 16)	448
max_pooling2d (MaxPooling2D)	(None, 109, 109, 16)	0
conv2d_1 (Conv2D)	(None, 107, 107, 32)	4640
max_pooling2d_1 (MaxPooling2D)	(None, 53, 53, 32)	0
conv2d_2 (Conv2D)	(None, 51, 51, 64)	18496
max_pooling2d_2 (MaxPooling2D)	(None, 25, 25, 64)	0
conv2d_3 (Conv2D)	(None, 23, 23, 64)	36928
max_pooling2d_3 (MaxPooling2D)	(None, 11, 11, 64)	0
conv2d_4 (Conv2D)	(None, 9, 9, 64)	36928
max_pooling2d_4 (MaxPooling2D)	(None, 4, 4, 64)	0
flatten (Flatten)	(None, 1024)	0
dense (Dense)	(None, 128)	131200
dense_1 (Dense)	(None, 5)	645

This network consists of several layers, combining convolutional layers with different numbers of filters and max-pooling layers for downsampling, as shown in Table 12. The input is first processed by a convolutional layer that generates 16 feature maps, which are then reduced in size through max pooling. Through the subsequent convolutional layers, the network progressively extracts finer details from the input data. The final segment comprises densely connected layers, culminating in an output layer of five neurons representing the distinct soil types. With 229,285 trainable parameters, the network learns intricate data patterns and relationships. This arrangement of convolutional and pooling layers followed by dense layers enables the model to recognize spatial correlations within ground images, supporting soil type prediction within the landslide susceptibility prediction system.
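
The layer shapes and the parameter total can be reproduced with simple arithmetic. The sketch below assumes a 220 x 220 RGB input, 3 x 3 kernels without padding, and 2 x 2 max pooling; these settings are not stated in the text but are inferred from the output shapes in Table 12 and from the 448 parameters of the first layer.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


size, channels = 220, 3  # assumed input: 220 x 220 RGB image
total = 0
for filters in (16, 32, 64, 64, 64):
    total += conv2d_params(channels, filters)
    size = (size - 2) // 2  # 3x3 valid convolution, then 2x2 max pooling
    channels = filters

flat = size * size * channels  # units after the flatten layer
total += dense_params(flat, 128) + dense_params(128, 5)
print(flat, total)  # 1024 229285
```

Under these assumptions the second convolutional layer has (3 x 3 x 16 + 1) x 32 = 4,640 parameters, and the sum over all layers equals the 229,285 parameters quoted above.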

The confusion matrix in Fig. 6 presents a comprehensive overview of the model's performance in classifying the most prominent categories. It provides a detailed breakdown of predicted versus actual class assignments, enabling a nuanced evaluation of precision, recall, and other performance metrics for each of these top classes. By focusing on the top 10 classes, we gain valuable insights into the capacity of the model to distinguish and accurately categorize scenarios within the most frequently occurring categories.

The research's comprehensive algorithm comparison, real-time map integration, multi-parameter modeling, and user-centric design collectively advance the field of landslide susceptibility prediction and disaster management. The results as shown in Table 13 and Table 14 demonstrate the viability of this approach for practical applications, offering a promising solution to mitigate the risks associated with landslides and enhance community safety. The research's potential for further refinement and adaptation based on user feedback and real-world validation holds significant promise for the ongoing evolution of predictive technologies in disaster mitigation.

Table 14

Landslide Prediction Result
Sr. no.	Input (slope, precipitation, elevation, soil type, rainfall)	Output
1.	(10,196,13,1,249)	0.87
2.	(30,118,85,2,78)	0.45
3.	(10,118,73,3,78)	0.66
4.	(17,121,18,1,188)	0.75
5.	(15,14,16,5,33)	0.32
6.	(34,168,115,1,209)	0.78
7.	(30,24,180,4,240)	0.50
8.	(18,34,167,1,13)	0.34
9.	(35,190,92,5,212)	0.97
10.	(15,189,84,2,201)	0.88

The research is not without limitations; foremost among them is the potential variability and quality of available data, which may impact the accuracy of the prediction models and real-time mapping interface. Generalizing the results to different regions poses a challenge due to the local dependencies of landslide susceptibility. The varying levels of user expertise and interpretability of machine learning models may hinder widespread adoption, and the evolving nature of environmental factors requires adaptation for long-term utility. Additionally, resource constraints and ethical considerations must be carefully managed. Acknowledging and addressing these limitations is essential to further enhance the research's efficacy and ensure responsible and accessible landslide prediction.

The developed landslide susceptibility prediction system represents a significant advancement in addressing the pressing need for accurate and timely predictions in regions vulnerable to geological hazards. By incorporating machine learning techniques and a data-driven approach, the research successfully integrates a comprehensive dataset with critical parameters such as slope, elevation, precipitation, soil type, and rainfall. The utilization of four diverse machine learning algorithms—Convolutional Neural Network (CNN), Random Forest, Logistic Regression, and Support Vector Machine (SVM)—results in robust predictive models, each contributing unique strengths to the overall accuracy of the system.

A notable feature of the research is its emphasis on real-time prediction without relying on a mapping interface. Users can input specific location parameters, and the system, leveraging selected features, provides instantaneous landslide susceptibility predictions. This streamlined approach enhances efficiency and accessibility, catering to the immediate needs of decision-makers and stakeholders in landslide-prone regions.

The absence of a mapping interface does not compromise the system's utility; instead, it focuses on quick and accurate predictions, aligning with practical decision-making requirements. The user-friendly nature of the system positions it as a valuable tool for disaster management in regions susceptible to landslides. The research's success lies in its holistic consideration of multiple parameters and its commitment to bridging the gap between sophisticated predictive modeling and practical implementation, establishing a foundation for ongoing refinement and adaptation to changing environmental conditions.

In conclusion, the developed landslide susceptibility prediction system, integrating machine learning techniques and a data-driven approach, stands as a significant milestone in addressing the critical challenges of landslide prediction in vulnerable regions. The use of a comprehensive dataset and the application of diverse machine learning algorithms contribute to the creation of robust predictive models, ensuring a holistic understanding of the complex factors influencing landslide occurrences. The research's emphasis on real-time prediction without relying on a mapping interface is noteworthy, offering a streamlined and efficient tool for decision-makers and stakeholders. The absence of a mapping interface does not compromise the system's accuracy or usability; instead, it enhances accessibility and responsiveness to the immediate needs of regions prone to geological hazards.

The user-friendly nature of the system, allowing users to input location parameters for instantaneous predictions, positions it as an asset for disaster management. By closing the gap between sophisticated predictive modeling and practical implementation, the research provides a reliable and practical tool for decision-makers, enhancing their ability to make informed choices in landslide-prone regions. Looking ahead, ongoing refinement and adaptation of the predictive models based on evolving environmental conditions will further strengthen the system's reliability over time. The research's iterative and data-driven methodology establishes a foundation for continuous improvement, ensuring its relevance and effectiveness in mitigating the impact of landslides. Overall, this innovative approach to landslide susceptibility prediction offers a promising solution for enhancing the resilience of regions facing geological hazards.

This research has made significant strides by combining advanced machine learning algorithms and real-time mapping interfaces. However, there are several promising avenues for future research and development. First, there is room for enhanced algorithmic exploration, including the investigation of ensemble methods, deep learning architectures, and hybrid models to potentially improve predictive accuracy even further. Second, the integration of real-time environmental data, such as seismic activity and soil moisture, could enhance the prediction system's accuracy by capturing dynamic changes in contributing factors. Third, expanding the system to cover various geographical regions would necessitate data collection and algorithm adaptation specific to each region's unique characteristics. Additionally, user-centric customization, where users can adjust the weightage of parameters based on local knowledge and preferences, could enhance the system's practicality and relevance. Research into machine learning model interpretability techniques could make the prediction system's output more transparent and comprehensible, bolstering user trust. Providing uncertainty estimates along with probability predictions would offer a more complete understanding of the reliability of the predictions. Collaboration with disaster response agencies, both governmental and non-governmental, holds the potential to integrate the prediction system into existing disaster response frameworks. Finally, conducting real-world validation exercises and assessing the actual impact of the prediction system on disaster management would provide valuable insights into its effectiveness. By addressing these research gaps and embracing these future opportunities, this research not only advances landslide prediction technology but also lays the groundwork for ongoing advancements in disaster mitigation, community safety, and the evolution of predictive technologies.

Availability of data and materials: Not applicable

Conflict of interest/Competing interests: None

Funding: None

ACKNOWLEDGEMENTS

We express our sincere gratitude to our institution, Vishwakarma Institute of Technology, for its contribution to the successful completion of this landslide susceptibility prediction research. Special thanks to our research team members for their dedication and collaborative efforts. We appreciate the support and guidance of our mentor, Prof. Dr. Kuldeep Vayadande, whose expertise was invaluable throughout the research.

AUTHORS' CONTRIBUTIONS

M.P led the literature review, provided background research, and identified the research gap. Siddhi.S worked on the design and development of the user interface. S.D led the data collection phase. S.K oversaw the experimental setup, including data preprocessing, training machine learning algorithms, and evaluating model performance. Siddharth.S focused on the creation and analysis of figures and tables. S.K, Siddhi.S and M.P collaboratively wrote and edited different sections of the paper, ensuring a cohesive and well-structured document. K.V provided guidance and oversaw the overall coherence of the paper.

Binghai Gao, Y., He, X., Chen, X., Zheng, L., Zhang, Q., Zhang, Lu, J.: Landslide Risk Evaluation in Shenzhen Based on Stacking Ensemble Learning and InSAR. IEEE J. Sel. Top. Appl. Earth observations remote Sens., 16, (2023)
Zhilu, C.: b, Filippo Catani b, Faming Huang a,*, Gengzhe Liu c, Sansar Raj Meena b, Jinsong Huang a,d, Chuangbing Zhou a, Landslide susceptibility prediction using slope unit-based machine learning models considering the heterogeneity of conditioning factorsed
Mohan, A., Singh, A.K., Kumar, B., Dwivedi, R.: Review on remote sensing methods for landslide detection using machine and deep learning. Special issue article, 24 April 2020.
Huang, F., Chen, J., Du, Z., Yao, C., Huang, J., Jiang, Q., Chang, Z., Li, S.: Landslide susceptibility prediction considering regional soil erosion based on machine-learning models. ISPRS Int. J. Geo-Inf., 8 June 2020.
Chang, Z., Du, Z., Zhang, F., Huang, F., Chen, J., Li, W., Guo, Z.: Landslide susceptibility prediction based on remote sensing images and GIS: Comparisons of supervised and unsupervised machine learning models. Remote Sens., 4 February 2020.
Tengtrairat, N., Woo, W.L., Parathai, P., Aryupong, C., Jitsangiam, P., Rinchumphu, D.: Automated landslide-risk prediction using web GIS and machine learning models. Sensors, 5 July 2021.
Pham, B.T., Jaafari, A., Nguyen-Thoi, T., Van Phong, T., Nguyen, H.D., Satyam, N., Masroor, M., Rehman, S., Sajjad, H., Sahana, M., Van Le, H., Prakash, I.: Ensemble machine learning models based on Reduced Error Pruning Tree for prediction of rainfall-induced landslides. Int. J. Digit. Earth, 15 Dec 2020.
Xiao, L., Zhang, Y., Peng, G.: Landslide susceptibility assessment using integrated deep learning algorithm along the China-Nepal Highway. Sensors, 14 December 2018.
Li, Z., Huang, L., Fan, L., Huang, J., Huang, F., Chen, J., Zhang, Z., Wang, Y.: Landslide susceptibility prediction modeling based on remote sensing and a novel deep learning algorithm of a cascade-parallel recurrent neural network. Sensors, 12 March 2020.
Gao, B., He, Y., Chen, X., Zheng, X., Zhang, L., Zhang, Q., Lu, J.: Landslide risk evaluation in Shenzhen based on stacking ensemble learning and InSAR. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 16, Jul. (2023).
Yao, J., Yao, X., Zhao, Z., Liu, X.: Performance comparison of landslide susceptibility mapping under multiple machine-learning based models considering InSAR deformation: A case study of the upper Jinsha River. Geomatics, Natural Hazards and Risk, 06 Jun 2023.
Yun, L., Zhang, X., Zheng, Y., Wang, D., Hua, L.: Enhance the accuracy of landslide detection in UAV images using an improved Mask R-CNN model: A case study of Sanming, China. Sensors, 26 April 2023.
Zheng, X., et al.: Comparison of machine learning methods for potential active landslide hazards identification with multi-source data, ISPRS Int. J. Geo-Inf., vol. 10, no. 4, pp. 253–275, Apr. (2021). 10.3390/ijgi10040253
Corominas, J., et al.: Recommendations for the quantitative analysis of landslide risk. Bull. Eng. Geol. Environ. 73(2), 209–263 (2014). 10.1007/s10064-013-0538-8
Abedini, M., Ghasemyan, B., Mogaddam, M.H.R.: Landslide susceptibility mapping in Bijar city, Kurdistan Province, Iran: A comparative study by logistic regression and AHP models, Environ. Earth Sci., vol. 76, no. 8, pp. 1–4, Apr. (2017). 10.1007/s12665-017-6502-3
He, Y., et al.: A unified network of information considering superimposed landslide factors sequence and pixel spatial neighbourhood for landslide susceptibility mapping, Int. J. Appl. Earth Observ. Geoinformation, vol. 104, Dec. Art. no. 102508, (2021). 10.1016/j.jag.2021.102508
Luo, X., et al.: Mine landslide susceptibility assessment using IVM, ANN and SVM models considering the contribution of affecting factors, PLoS One, vol. 14, no. 4, Apr. Art. no. e0215134, (2019). 10.1371/journal.pone.0215134
Han, H., Shi, B., Zhang, L.: Prediction of landslide sharp increase displacement by SVM with considering hysteresis of groundwater change, Eng. Geol., vol. 280, Jan. Art. no. 105876, (2021). 10.1016/j.enggeo.2020.105876
Lee, S., Hong, S.M., Jung, H.S.: A support vector machine for landslide susceptibility mapping in Gangwon province, Korea. Sustainability. 9(1), 48–63 (2017). 10.3390/su9010048
Zhao, Z., et al.: A comparative study of different neural network models for landslide susceptibility mapping, Adv. Space Res., vol. 70, no. 2, pp. 383–401, Jul. (2022). 10.1016/j.asr.2022.04.055
Chen, H., et al.: A landslide extraction method of channel attention mechanism U-net network based on sentinel-2A remote sensing images. Int. J. Digit. Earth. 16(1), 552–577 (2023). 10.1080/17538947.2023.2177359
Gao, B., et al.: Dynamic evaluation of landslide susceptibility by CNN considering InSAR deformation: A case study of Liujiaxia reservoir. Chin. J. Rock. Mech. Eng. 42(2), 450–465 (Feb. 2023). 10.13722/j.cnki.jrme.2022.0266
Cui, S., Yin, Y., Wang, D., Li, Z., Wang, Y.: A stacking-based ensemble learning method for earthquake casualty prediction, Appl. Soft Comput., vol. 101, Mar. Art. no. 107038, (2021). 10.1016/j.asoc.2020.107038
Divina, F., Gilson, A., Gómez-Vela, F., Torres, M.G., Torres, J.F.: Stacking ensemble learning for short-term electricity consumption forecasting, Energies, vol. 11, no. 4, pp. 949–980, Apr. (2018). 10.3390/en11040949
Abeysiriwardana, H.D., Gomes, P.I.A.: Integrating vegetation indices and geo-environmental factors in GIS-based landslide-susceptibility mapping: Using logistic regression. J. Mountain Sci. 19(2), 477–492 (Feb. 2022). 10.1007/s11629-021-6988-8
Fadhillah, M.F., Achmad, A.R., Lee, C.W.: Integration of InSAR time-series data and GIS to assess land subsidence along subway lines in the Seoul metropolitan area, South Korea, Remote Sens., vol. 12, no. 21, pp. 1–27, Nov. (2020). 10.3390/rs12213505
Fang, Z., Wang, Y., Peng, L., Hong, H.: A comparative study of heterogeneous ensemble-learning techniques for landslide susceptibility mapping. Int. J. Geographical Inf. Sci. 35(2), 321–347 (2021). 10.1080/13658816.2020.1808897
Lv, L., Chen, T., Dou, J., Plaza, A.: A hybrid ensemble-based deep-learning framework for landslide susceptibility mapping, Int. J. Appl. Earth Observ. Geoinformation, vol. 108, Apr. Art. no. 102713, (2022). 10.1016/j.jag.2022.102713
Hong, H., Pourghasemi, H.R., Pourtaghi, Z.S.: Landslide susceptibility assessment in Lianhua county (China): A comparison between a random forest data mining technique and bivariate and multivariate statistical models. Geomorphology. 259, 105–118 (Apr. 2016). 10.1016/j.geomorph.2016.02.012
Hong, H., et al.: Landslide susceptibility assessment at the Wuning area, China: A comparison between multi-criteria decision making, bivariate statistical and machine learning methods, Natural Hazards, vol. 96, no. 1, pp. 173–212, Mar. (2019). 10.1007/s11069-018-3536-0
Zhou, X., Wen, H., Zhang, Y., Xu, J., Zhang, W.: Landslide susceptibility mapping using hybrid random forest with geodetector and RFE for factor optimization, Geosci. Front., vol. 12, no. 5, Sep. Art. no. 101211, (2021). 10.1016/j.gsf.2021.101211
He, S., Pan, P., Dai, L., Wang, H., Liu, J.: Application of kernel-based Fisher discriminant analysis to map landslide susceptibility in the Qinggan river delta, three gorges, China. Geomorphology. 171/172, 30–41 (Oct. 2012). 10.1016/j.geomorph.2012.04.024
Gong, P., et al.: Mapping essential urban land use categories in China (EULUC-China): Preliminary results for 2018. Sci. Bull. 65(3), 182–187 (Feb. 2020). 10.1016/j.scib.2019.12.007

Table 13 is available in the Supplementary Files section.

No competing interests reported.

Table13.docx

SLOPE	PRECIPITATION	ELEVATION	SOIL TYPE	RAINFALL	LANDSLIDE
10	196	13	1	249	1
12	69	65	2	101	1
15	118	53	1	78	0
18	196	133	3	249	1
12	34	63	4	20	0
11	24	84	5	10	0
20	196	97	1	249	1
18	11	94	1	30	1
12	196	98	3	249	1
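
As an illustration only, the nine sample rows above could be turned into feature vectors and fitted with a logistic regression trained by batch gradient descent, one of the four model families the paper names. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`parse`, `min_max_scale`, `fit`, `log_loss`) are hypothetical, min-max scaling is an assumed preprocessing step, the learning rate and epoch count are arbitrary, and soil type is treated as a plain number even though it is a category code. Nine rows are far too few to say anything about accuracy.

```python
import math

# Sample rows copied from the table above (whitespace-separated).
# Column order: slope, precipitation, elevation, soil type, rainfall, landslide.
SAMPLE = """\
10 196 13 1 249 1
12 69 65 2 101 1
15 118 53 1 78 0
18 196 133 3 249 1
12 34 63 4 20 0
11 24 84 5 10 0
20 196 97 1 249 1
18 11 94 1 30 1
12 196 98 3 249 1
"""

def parse(text):
    """Split each row into five numeric features and a 0/1 label."""
    features, labels = [], []
    for line in text.strip().splitlines():
        *xs, y = (float(v) for v in line.split())
        features.append(xs)
        labels.append(int(y))
    return features, labels

def min_max_scale(features):
    """Rescale every column to [0, 1] (assumed preprocessing step).

    Simplification: the soil-type code is scaled like a number here.
    """
    cols = list(zip(*features))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in features]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, X, y):
    """Mean binary cross-entropy for weights w and bias b."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(y)

def fit(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the cross-entropy loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(y) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(y)
    return w, b

X_raw, y = parse(SAMPLE)
X = min_max_scale(X_raw)
w, b = fit(X, y)
```

Calling `sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)` on a scaled row then gives a susceptibility score between 0 and 1. A real deployment would more likely one-hot encode soil type and evaluate on held-out data.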
