Patterns of influence of different landslide boundaries and their spatial shapes on the uncertainty of landslide susceptibility prediction

Some landslide susceptibility models use idealized landslide points or buffer circles as landslide boundaries, which adds uncertainty to susceptibility modeling. In reality, landslide boundaries and their spatial shapes are typically irregular polygonal surfaces, such as semicircles and bumps. To study the influence of different landslide boundaries on modeling uncertainty, 370 landslides and 11 environmental factors in Ruijin were selected, and the frequency ratio (FR) correlations between landslides and the environmental factors were established for each boundary type. The boundaries were represented, respectively, by landslide points, buffer circles, and precisely mapped polygons. Point, Circle, and Polygon-based models were then built using a deep belief network (DBN) and random forest (RF). Finally, modeling uncertainty was analyzed using receiver operating characteristic (ROC) accuracy together with the distribution pattern and variability of the susceptibility index. The results indicate that: (1) using landslide points or buffer circles as boundaries can increase modeling uncertainty, whereas accurate landslide polygon boundaries better ensure modeling accuracy and reliability; (2) but the. (3) in the absence of precise landslide boundaries, the susceptibility results obtained with points and buffer circles as landslide boundaries can still reflect the overall spatial distribution of landslide likelihood in the study area.


Introduction
Landslide susceptibility prediction modeling provides a workable scientific approach for accurately estimating the spatial probability of future landslides in a given area (Lombardo et al. 2015; Jaafari et al. 2019; Dou et al. 2020). The first step in the modeling process is obtaining accurate landslide inventory data, which involves recording and cataloging landslide locations, types, boundaries and their spatial configuration, dates, triggers, and potential damage. The landslide boundaries and their spatial shapes in these inventories directly affect the performance of landslide susceptibility modeling (Kirschbaum et al. 2015; Kritikos and Davies 2015; Lombardo et al. 2015; Huang et al. 2020a, 2020b). This is because the landslide boundary determines the landslide area and its nonlinear association with environmental factors, both of which strongly affect the modeled spatial dataset (input and output variables, training and test sets, etc.) (Melchiorre and Abella 2011; Huang et al. 2020a, 2020b). Precise mapping of landslide boundaries is therefore a prerequisite for successful susceptibility modeling, and ambiguous or highly inaccurate boundaries are likely to cause modeling uncertainty. In practice, establishing precise landslide boundaries for susceptibility modeling is difficult for several reasons, including limitations in field mapping and in GIS spatial analysis. In the absence of accurate boundary data, various alternative boundary forms may be derived, and studies do not agree on which form to use; for example, most common landslide inventories currently represent landslides as points or circles (Steger et al. 2016; Ada and San 2018; …).
However, the landslides that have formed in the area tend to be polygonal in shape (irregular oval, skip-shaped, semicircular, elongated, etc.) (Ada and San 2018; Bordoni et al. 2020; Faming et al. 2020; Pourghasemi et al. 2020). Polygonal surfaces are generally used to represent the entire landslide area, landslide points represent the center of each landslide, and buffer circles are circular landslide boundaries generated by buffering outward from the landslide points (Huang et al. 2021). Although these representations appear in many studies, a point or buffer circle used as an idealized landslide boundary clearly does not correspond to the actual landslide. Points have been used as landslide boundaries because the boundaries were not precisely recorded when the landslides were first inventoried, or because the original geographic base map was so inaccurate that the boundaries could not be mapped reliably in GIS. Buffer circles, by contrast, are used to increase the number of landslide raster cells and thereby expand the modeled spatial dataset (Nguyen et al. 2019).
To reduce modeling uncertainty, it is therefore vital to investigate how susceptibility modeling differs under various landslide boundaries. This study uses landslide points, buffer circles, and precisely mapped landslide polygons, respectively, to calculate the spatial correlation between landslides and environmental factors. Spatial datasets for each boundary condition are then constructed and imported into data-driven models for susceptibility modeling (Paul and Li et al. 2018).
Numerous data-driven models, including the information value model, logistic regression, grey relational analysis, DBN, support vector machines, and random forest, are widely used to predict the likelihood of landslides (Chang et al. 2020; Li et al. 2020; Huang et al. 2022). To reduce the uncertainty that could arise from relying on a single data-driven model, two common and relatively sophisticated models, DBN and RF, are chosen. Point, Circle, and Polygon-based DBN and RF models are then built from landslide points, buffer circles, and polygonal surfaces, respectively. Finally, modeling uncertainty is evaluated using Ruijin City as a case study (Fig. 1).

Ideas
This paper's main goal is to investigate how different landslide boundaries and spatial shapes affect landslide susceptibility prediction modeling. The main process consists of the following six steps:
1. Summarize the approaches and challenges involved in collecting and mapping landslide boundaries, highlighting the most important aspects.
2. Collect and process environmental factors, including topography, hydrology, lithology, and ground cover.
3. Determine the FR values of the environmental factors for the three groups of landslide boundaries using frequency ratio (FR) and correlation analysis.
4. Use the FR values as inputs to the DBN and RF models; the output variables are the inventoried landslides and randomly selected non-landslide samples from the region.
5. Predict and map the county-wide landslide susceptibility index using the trained and tested Point, Circle, and Polygon-based RF and DBN models.
6. Assess the impact of the different landslide boundaries on modeling uncertainty based on the ROC curve, the distribution pattern of the landslide susceptibility index, and the significance of differences.

Landslide cataloging and its mapping
Basic landslide inventory components, including reasonably precise landslide boundaries, can be plotted on DEM base maps with a resolution of 30 m or finer. Inventory data can be acquired by interpreting satellite data, such as high-resolution remote-sensing imagery, or airborne data, mainly aerial imagery, laser scanning, and hyperspectral imagery. Alternatively, field data can be acquired together with geological, land-use, and DEM information, and landslide boundaries can then be delineated directly in software such as ArcGIS (Dou et al. 2019). The landslide polygon boundaries in this paper are the exact original boundaries from the landslide inventory interpreted by the Ruijin Natural Resources Bureau. The landslide points are placed at the center of each polygon, and the buffer circles are then generated from these points using GIS buffer analysis.
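The two derived boundary types can be illustrated with a small, self-contained sketch. It assumes that the "center" of a polygon is its area-weighted centroid (computed with the shoelace formula) and approximates the buffer circle by a regular 64-sided polygon; the function names, the example polygon, and the 50 m radius are hypothetical, since the buffer distance is not stated here. In practice this step would be done with the GIS buffer tool rather than hand-written code.

```python
import math


def polygon_area_centroid(vertices):
    """Area and area-weighted centroid of a simple polygon (shoelace formula).

    vertices: list of (x, y) tuples in map units, first vertex not repeated.
    Returns (area, (cx, cy)) with the area as an absolute value.
    """
    a2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = a2 / 2.0
    return abs(area), (cx / (6.0 * area), cy / (6.0 * area))


def buffer_circle(center, radius, n_segments=64):
    """Hypothetical helper: approximate a circular buffer as a regular n-gon."""
    cx, cy = center
    return [
        (cx + radius * math.cos(2 * math.pi * k / n_segments),
         cy + radius * math.sin(2 * math.pi * k / n_segments))
        for k in range(n_segments)
    ]


# Hypothetical landslide polygon in metres: a 100 m x 130 m rectangle.
landslide_polygon = [(0, 0), (100, 0), (100, 130), (0, 130)]
area, point = polygon_area_centroid(landslide_polygon)
circle = buffer_circle(point, radius=50.0)  # 50 m radius is an illustrative assumption
print(area, point)  # 13000.0 (50.0, 65.0)
```

For strongly concave (for example, curved or elongated) landslides, the area-weighted centroid can fall outside the polygon, which is one more way a single point can misrepresent the actual landslide.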

Frequency ratio model (FR)
Statistical susceptibility assessment is based on classifying each indicator factor into states (classes) and calculating the degree of influence of each state on landslide occurrence. Previous studies have frequently used the FR model for this purpose in order to improve the accuracy of the state classification (Chen et al. 2016). The formula is as follows:

$$
\mathrm{FR} = \frac{\text{Landslide area within class} \,/\, \text{Total landslide area of study area}}{\text{Area within class} \,/\, \text{Total area of study area}} = \frac{\text{Landslide area ratio}}{\text{Class area ratio}} \qquad (1)
$$

The FR value characterizes the importance of each state of an indicator factor for landslide occurrence: FR > 1 indicates that the state is favorable for landslides, and FR < 1 indicates that it is unfavorable (Kavzoglu et al. 2015). Because a standalone FR model ignores the weighting of each indicator factor, this paper uses the FR values as inputs to the machine learning models and performs susceptibility zoning of the resulting index on the GIS platform.

Random forest (RF)
RF is an effective ensemble classifier that builds many decision trees, each from a random subset of the data and variables, and it has been used to solve numerous nonlinear problems (Wang et al. 2015). The RF algorithm works as follows: (1) the initial training data are repeatedly resampled with replacement (bootstrap sampling); (2) a random subset of features is selected for each resample; (3) a decision tree is estimated from each resample and its random feature set; and (4) the predictions of the estimated trees are aggregated, by majority vote for classification, into a single ensemble prediction.
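A compact, pure-Python sketch of the RF procedure described above (bootstrap resampling, random feature selection, CART-style trees grown with the Gini criterion, and voting) is given below. It is illustrative only and is not the MATLAB implementation used in this study; following Breiman's usual formulation, the random feature subset is drawn at every split rather than once per resample. The tree depth, forest size, and toy FR-coded samples are assumptions chosen to keep the example small.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, features):
    """Return (impurity, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or impurity < best[0]:
                best = (impurity, f, t)
    return best


def build_tree(X, y, n_features, max_depth, rng):
    """Grow a tree; every split examines a random subset of n_features features."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority  # leaf node: a class label
    split = best_split(X, y, rng.sample(range(len(X[0])), n_features))
    if split is None:
        return majority
    _, f, t = split

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], n_features, max_depth - 1, rng)

    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, child(left), child(right))


def predict_tree(node, row):
    """Follow the splits until a leaf (class label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_trees=25, n_features=1, max_depth=3, seed=0):
    """Fit n_trees trees, each on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_features, max_depth, rng))
    return forest


def predict_proba(forest, row):
    """Fraction of trees voting 'landslide' (label 1), a susceptibility index in [0, 1]."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


def predict(forest, row):
    """Majority vote over the trees."""
    return 1 if predict_proba(forest, row) >= 0.5 else 0


# Hypothetical FR-coded samples: [FR of slope class, FR of lithology class]; 1 = landslide.
X = [[0.2, 0.3], [0.4, 0.1], [0.3, 0.5], [0.5, 0.4],
     [1.8, 1.6], [2.1, 1.9], [1.6, 2.2], [1.9, 1.7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = random_forest(X, y)
print(predict(forest, [2.5, 2.5]), predict(forest, [0.05, 0.05]))
```

The vote fraction returned by `predict_proba` plays the role of a susceptibility index between 0 and 1; the study itself reports 477 trees and 6 random features for its RF models.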

Deep belief network (DBN)
DBN, an important deep learning model, is a probabilistic generative model composed of several restricted Boltzmann machine (RBM) units (Yin et al. 2017). In an RBM, every unit in the hidden layer is connected to every unit in the visible layer, and there are no connections between units within the same layer. As illustrated in Fig. 2, the DBN consists of a stack of RBMs, with the output of each RBM layer serving as the input to the next. RBMs are usually trained with the contrastive divergence (CD) algorithm to reduce the training time of the model. With the training data held fixed on the visible layer, the main idea is to adjust the network parameters so as to lower the energy that the model assigns to the data. This layer-wise unsupervised pretraining helps keep the network from being trapped in poor local optima during training and substantially reduces training time.
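The CD training step and the greedy layer-wise stacking described above can be sketched with NumPy. This is a minimal sketch assuming Bernoulli (sigmoid) units, inputs scaled to [0, 1], a single Gibbs step (CD-1), and no momentum or supervised fine-tuning; the function names, layer sizes, learning rate, and epoch count are illustrative and are not the settings reported for the Ruijin models.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, b_vis, b_hid, lr, rng):
    """One contrastive-divergence (CD-1) update for a Bernoulli-Bernoulli RBM.

    v0: (batch, n_visible) inputs in [0, 1]; W: (n_visible, n_hidden).
    Returns the updated (W, b_vis, b_hid) and the mean squared reconstruction error.
    """
    # Positive phase: hidden probabilities given the data, then a binary sample.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one Gibbs step reconstructs the visible layer.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    n = v0.shape[0]
    # Approximate gradient: <v h>_data - <v h>_reconstruction.
    W = W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis = b_vis + lr * (v0 - v1_prob).mean(axis=0)
    b_hid = b_hid + lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid, float(((v0 - v1_prob) ** 2).mean())


def pretrain_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: each RBM is trained on the previous layer's output."""
    rng = np.random.default_rng(seed)
    layers, x = [], data
    for n_hidden in layer_sizes:
        n_visible = x.shape[1]
        W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        b_vis, b_hid = np.zeros(n_visible), np.zeros(n_hidden)
        for _ in range(epochs):
            W, b_vis, b_hid, _err = cd1_step(x, W, b_vis, b_hid, lr, rng)
        layers.append((W, b_hid))
        x = sigmoid(x @ W + b_hid)  # this layer's output feeds the next RBM
    return layers, x


# Hypothetical input: 20 samples x 11 environmental factors, scaled to [0, 1].
data = np.random.default_rng(1).random((20, 11))
layers, features = pretrain_dbn(data, layer_sizes=[8, 4])  # two hidden layers
print(features.shape)  # (20, 4)
```

In a full DBN classifier, the top-layer features would feed an output layer (for example softmax) and the whole network would then be fine-tuned with labeled landslide and non-landslide samples.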

ROC curve evaluation modeling accuracy
To assess model accuracy, appropriate evaluation measures must be defined. Landslide susceptibility evaluation is fundamentally a binary classification problem, because it assesses whether a landslide occurs (1) or does not occur (0). The receiver operating characteristic (ROC) curve and the area under the curve (AUC) are two commonly used evaluation metrics for binary classification problems. Plotting an ROC curve for landslide susceptibility assessment requires calculating the true-positive and false-positive rates of the assessment results. The true-positive rate is the proportion of actual landslide elements that are correctly classified as landslides, while the false-positive rate is the proportion of all non-landslide elements that are incorrectly classified as landslides. The false-positive rate is plotted on the horizontal axis of the ROC curve and the true-positive rate (hit rate) on the vertical axis. The closer the ROC curve is to the upper left corner, the more accurate the model. As its name suggests, the AUC measures the area under the ROC curve; the higher the AUC value, the higher the accuracy of the model. Once a landslide susceptibility model has been built, a susceptibility probability between 0 and 1 can be computed for each raster cell in the study area. A higher probability indicates higher susceptibility, that is, a greater likelihood of slope instability and landslide damage.
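The ROC construction and AUC computation can be illustrated in plain Python. The sketch sweeps the decision threshold from the highest to the lowest predicted probability, records (false-positive rate, true-positive rate) pairs, and integrates them with the trapezoidal rule; tied scores are moved across the threshold together. The labels and scores are hypothetical, and the code assumes both landslide and non-landslide samples are present.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by lowering the threshold through the sorted scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Move every sample sharing this score across the threshold at once (handles ties).
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]              # 1 = landslide cell, 0 = non-landslide cell
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical susceptibility probabilities
pts = roc_points(labels, scores)
print(round(auc(pts), 4))  # 0.8889
```

Here 8 of the 9 landslide/non-landslide pairs are ranked correctly, so the AUC equals 8/9; this equivalence between the AUC and the probability of ranking a random landslide above a random non-landslide is a convenient sanity check.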

Confusion matrix for the susceptibility index
Statistical measures of the agreement between the observed test samples and the predicted landslide susceptibility outcomes, such as sensitivity, positive prediction rate (PPR), and total accuracy (TA), are used to evaluate model performance (Yaseen et al. 2019). In binary classification, samples are divided into positive and negative classes: a landslide is a positive sample, and a non-landslide is a negative sample. There are four possible classification outcomes. A landslide sample predicted as a landslide is a true positive (TP); a non-landslide sample predicted as a landslide is a false positive (FP); a non-landslide sample predicted as a non-landslide is a true negative (TN); and a landslide sample predicted as a non-landslide is a false negative (FN). Because landslide susceptibility assessment yields a probability, a threshold must be set to obtain a binary result. For example, with a threshold of 0.5, a predicted value below 0.5 is classified as no landslide, and a value of 0.5 or above is classified as a landslide.
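A minimal sketch of the thresholding and the three measures follows. It assumes that PPR denotes the proportion of predicted landslides that are actual landslides (precision) and that sensitivity is the proportion of actual landslides that are detected (recall); the function names and sample values are hypothetical.

```python
def confusion_counts(labels, probs, threshold=0.5):
    """Return (TP, FP, TN, FN); a probability >= threshold is predicted as a landslide (1)."""
    tp = fp = tn = fn = 0
    for actual, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_measures(tp, fp, tn, fn):
    """Sensitivity (recall), PPR (assumed here to mean precision), and total accuracy."""
    sensitivity = tp / (tp + fn)
    ppr = tp / (tp + fp)
    total_accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, ppr, total_accuracy


labels = [1, 1, 0, 1, 0, 0, 0, 0]                   # 1 = landslide, 0 = non-landslide
probs = [0.9, 0.8, 0.7, 0.45, 0.4, 0.2, 0.1, 0.3]   # hypothetical model outputs
counts = confusion_counts(labels, probs)
print(counts)  # (2, 1, 4, 1)
print(summary_measures(*counts))
```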

Overview of Ruijin City and landslide data

Introduction to Ruijin and landslide cataloging
Ruijin City, with a total area of 2441.4 km², is located in the southeast of Ganzhou City, Jiangxi Province (Fig. 3). It has a subtropical monsoon climate with an average annual rainfall of 780 mm. Elevation ranges from 139 to 1117 m, and the stratigraphic lithology consists mainly of metamorphic, carbonate, and clastic rocks. The northeast, northwest, and southwest of the city are mountainous, while the southeast consists of hills and river basins. The complex natural setting, geological conditions, seasonal heavy rainfall, and slope excavation have produced numerous accumulation landslides in Ruijin City.
According to the geological disaster inventory of the Ruijin Land Bureau, 370 landslides had been recorded in Ruijin by the end of 2014, most of them small to medium-sized shallow slides. Each landslide boundary is accurately delineated as a polygonal surface, and the average area of a landslide, including the region it affects, is about 13,000 m². Most of these landslides are located near densely populated areas, along roads, or in gullies.

Analysis of landslides and their environmental factors
A brief description of the data sources modeled in this paper is as follows. (1) Landslide data: most landslides are shallow, small to medium-sized landslides. The accumulation layer consists mainly of Quaternary silty clay and debris, and the primary movement mode is overall downslope sliding of the slope mass.
Landslide susceptibility can be understood as the probability of spatial slope instability considering only the underlying environmental factors, without external triggers (Yang et al. 2021). Based on the spatial distribution pattern and formation conditions of landslides in Ruijin, and with reference to the environmental factors selected in other study areas with environmental characteristics similar to Ruijin, 11 environmental factors were chosen, including elevation, slope, NDVI, distance from the water system, and stratigraphic lithology. Raster layers of each environmental factor were extracted at 30 m resolution on the GIS platform. Slope, aspect, profile curvature, and plan curvature were extracted from the DEM using GIS surface analysis tools. Topographic relief was obtained with the focal statistics tool and the raster calculator. NDBI, NDVI, and MNDWI were extracted by band calculation of remotely sensed images in ArcGIS. The river network was extracted by GIS hydrological analysis, and buffer zones were then generated around it to obtain the distance from the water system (Noori et al. 2019). The FR value represents the influence of each subinterval of an environmental factor on landslide occurrence. Table 1 lists the FR values of the 11 environmental factors used in this study under the three landslide boundary types.
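To make the effect of the boundary representation concrete, the sketch below applies the FR formula to raster cells, using cell counts in place of areas (valid because every cell has the same 30 m size). The ten-cell "study area", the two slope classes, and both masks are hypothetical: the same landslide is encoded once as a polygon covering four cells and once as a single centroid cell.

```python
from collections import Counter


def frequency_ratio(factor_classes, landslide_mask):
    """FR per class = (landslide cells in class / all landslide cells) / (cells in class / all cells).

    factor_classes: class label of every raster cell (flattened);
    landslide_mask: 1 if the cell lies inside the landslide boundary being tested, else 0.
    """
    total_cells = len(factor_classes)
    total_landslide = sum(landslide_mask)
    class_cells = Counter(factor_classes)
    landslide_cells = Counter(c for c, inside in zip(factor_classes, landslide_mask) if inside)
    return {c: (landslide_cells[c] / total_landslide) / (n / total_cells)
            for c, n in class_cells.items()}


slope_class = ["gentle"] * 5 + ["steep"] * 5      # hypothetical slope classes per cell
polygon_mask = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]     # landslide mapped as a polygon
point_mask = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]       # same landslide as its centroid cell
print(frequency_ratio(slope_class, polygon_mask))  # {'gentle': 0.5, 'steep': 1.5}
print(frequency_ratio(slope_class, point_mask))    # {'gentle': 2.0, 'steep': 0.0}
```

With these made-up numbers, the single point places the whole landslide in the gentler class, so that class receives the higher FR; this shows mechanically how point and polygon boundaries can yield different FR values for the same factor.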

Topographical and geomorphological factors
The DEM provides topographic and hydrological information. As shown in Table 1 and Fig. 3, the FR values calculated under the three landslide boundaries in the elevation range of 139.9-1117.8 m are all > 1, suggesting that the middle and low elevation areas of Ruijin City are favorable for landslide development. FR values based on polygonal surfaces are all > 1 for slopes from 4.4° to 28.7°. However, the FR values for points and circles are higher in the lower slope interval of 4.4° to 13.2°, which indicates that points and circles express the characteristics of landslide development with considerable error and lead to an underestimate of the slope at which landslides develop. Profile curvature is obtained by computing the slope of the slope raster, and plan curvature by computing the slope of the aspect raster; both reflect the structure of the landslide and its complexity from different perspectives.
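Slope can be derived from the DEM with a 3 x 3 finite-difference operator. The sketch below assumes Horn's method, the operator documented for common GIS slope tools such as ArcGIS's Slope tool; it is not necessarily the exact tool configuration used here. Applying the same function to a slope raster yields the "slope of slope" used above as the basis of profile curvature. The 30 m cell size matches the resolution stated in this paper; the tiny DEM is hypothetical.

```python
import math


def horn_slope(dem, cell_size):
    """Slope in degrees for interior cells of a DEM (list of rows) using Horn's method.

    Edge cells are left as None; GIS tools treat raster borders in their own ways.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # z[i][j] is the 3 x 3 neighbourhood centred on cell (r, c).
            z = [[dem[r + dr][c + dc] for dc in (-1, 0, 1)] for dr in (-1, 0, 1)]
            dz_dx = ((z[0][2] + 2 * z[1][2] + z[2][2]) - (z[0][0] + 2 * z[1][0] + z[2][0])) / (8 * cell_size)
            dz_dy = ((z[2][0] + 2 * z[2][1] + z[2][2]) - (z[0][0] + 2 * z[0][1] + z[0][2])) / (8 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope


# Hypothetical DEM rising 30 m per 30 m cell toward the east: a 45 degree plane.
dem = [[30.0 * j for j in range(3)] for _ in range(3)]
s = horn_slope(dem, cell_size=30.0)
print(round(s[1][1], 6))  # 45.0
```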

Hydrological environment, engineering geology, and surface cover factors
Hydrology influences water transport within the rock and soil mass (see Fig. 4g and Table 1). The FR values for all three landslide boundary shapes are > 1 where the distance from the water system is < 300 m, showing the important influence of hydrology on landslide development. The lithology of the formation determines the material composition, weight, strength, and shear modulus of the rock and soil mass. Figure 4f and Table 1 show FR values > 1 for metamorphic and magmatic rocks under all three landslide shapes, and FR values < 1 for carbonate and clastic rocks, indicating that carbonate and clastic rocks are relatively unfavorable for landslide development. In this paper, surface cover is described mainly by NDBI, NDVI, and MNDWI (see Fig. 4i, j, k and Table 1). Landslides occur more frequently as construction activity increases, but very intensive construction tends to attract the attention of government departments, whose prevention and control projects reduce the probability of landslides. Accordingly, as NDBI increases, the FR values for all three landslide boundary shapes first increase and then decrease.
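The three surface-cover indices are normalized band differences. The sketch assumes the standard definitions NDVI = (NIR - Red)/(NIR + Red), NDBI = (SWIR - NIR)/(SWIR + NIR), and MNDWI = (Green - SWIR)/(Green + SWIR); the sensor and bands used in the study are not stated here (for Landsat 8 OLI these would be bands 3, 4, 5, and 6 for Green, Red, NIR, and SWIR1). The reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 where both bands are zero to avoid division by zero."""
    return (a - b) / (a + b) if (a + b) != 0 else 0.0


def ndvi(nir, red):
    return normalized_difference(nir, red)      # vegetation: high for dense vegetation


def ndbi(swir, nir):
    return normalized_difference(swir, nir)     # built-up land: high for construction


def mndwi(green, swir):
    return normalized_difference(green, swir)   # open water: high for water bodies


# Hypothetical surface reflectances of a vegetated pixel.
green, red, nir, swir = 0.08, 0.05, 0.45, 0.20
print(round(ndvi(nir, red), 3), round(ndbi(swir, nir), 3), round(mndwi(green, swir), 3))
```

Applied cell by cell to co-registered band rasters, these functions reproduce what a GIS raster calculator expression would compute.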

Landslide susceptibility results under different landslide boundaries
Landslide susceptibility modeling is based on the principle of engineering geological analogy: statistical correlations between actual landslide events and environmental factors are used to build models that forecast susceptibility at nearby locations.

Point, Circle, and Polygon-based RF models predict landslide susceptibility
For each boundary condition, the out-of-bag (OOB) error of the RF model was computed on the MATLAB platform through cyclic iterations; the lower the OOB error, the higher the accuracy of the RF model. The optimal number of random features was 6, and the number of decision trees in the RF model was 477. Finally, landslide susceptibility prediction and mapping were carried out using the trained and tested Point, Circle, and Polygon-based RF models (see Fig. 5). As also shown in Fig. 4 and Table 2, most landslides fall in the very high and high susceptibility zones and only a few in the very low and low zones, indicating that all three models can predict the overall landslide susceptibility pattern of Ruijin City reasonably well.
Fig. 4 a elevation, b slope, c aspect, d plan curvature, e profile curvature, f lithology, g distance to river, h topographic relief, i NDVI, j NDBI, and k MNDWI

Point, Circle, and Polygon-based DBN models predict landslide susceptibility
To build the best model, the parameters were tuned in MATLAB using the training and testing datasets of the Point, Circle, and Polygon-based DBN models (split at the widely used 7:3 ratio). The learning rate, momentum, and number of iterations are 0.01, 0.25, and 500, respectively, and the number of hidden layers is set to 2. The DBN's activation function is set to the softmax function. The trained models are used to predict the landslide susceptibility index for the whole of Ruijin City, and the results are imported into ArcGIS 10.2 for rasterization to produce landslide susceptibility maps for the Point, Circle, and Polygon-based DBN models. Landslide susceptibility classification methods include quantiles, natural breaks, and equal intervals, among others. In this study, the susceptibility index is reclassified into five classes (very low, low, medium, high, and very high) using the natural breaks method (see Fig. 6 and Table 2) (Pellicani et al. 2017). Table 2 shows that, for every model, the FR values of the susceptibility classes increase progressively from the lowest to the highest class, demonstrating the general validity of the predictions made by the various DBN models.
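The natural breaks reclassification can be sketched with Fisher's exact dynamic-programming solution to the Jenks criterion, which minimizes the total within-class sum of squared deviations. This is an illustrative implementation, not the ArcGIS 10.2 routine, whose break values may differ in detail; the function names and index values are hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Upper bound of each class for an optimal 1-D (Fisher-Jenks) classification.

    Minimizes the total within-class sum of squared deviations; O(n_classes * n^2).
    Requires len(values) >= n_classes.
    """
    x = sorted(values)
    n = len(x)
    s1 = [0.0] * (n + 1)  # prefix sums of values
    s2 = [0.0] * (n + 1)  # prefix sums of squared values
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations of x[i:j] in O(1)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of x[:j] in k classes; start[k][j]: first index of class k.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], start[k][j] = c, i
    breaks, j = [], n
    for k in range(n_classes, 0, -1):  # backtrack from the last class
        breaks.append(x[j - 1])
        j = start[k][j]
    return breaks[::-1]


def classify(value, breaks, labels=("very low", "low", "medium", "high", "very high")):
    """Assign a susceptibility value to the first class whose upper bound it does not exceed."""
    for upper, label in zip(breaks, labels):
        if value <= upper:
            return label
    return labels[-1]


index = [0.05, 0.1, 0.12, 0.3, 0.32, 0.5, 0.52, 0.7, 0.72, 0.9, 0.95]  # hypothetical
breaks = jenks_breaks(index, 5)
print(breaks)  # [0.12, 0.32, 0.52, 0.72, 0.95]
print(classify(0.31, breaks))  # low
```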

Comparison of modeling accuracy
The ROC curve and its AUC value are used to assess modeling accuracy and thereby analyze modeling uncertainty under the different landslide boundaries and their spatial shapes (Fig. 7). The AUC values of the Polygon-based DBN and RF models are 0.935 and 0.876, respectively, clearly higher than those of the DBN and RF models under the other two landslide boundaries: 0.886 and 0.836 for the Point-based DBN and RF models, and 0.891 and 0.862 for the Circle-based DBN and RF models. Although DBN achieves a higher AUC than RF, the pattern of susceptibility prediction across the different landslide boundaries is consistent between the two machine learning models.

Confusion matrix for susceptibility index
Because the DBN prediction models output a continuous landslide susceptibility value between 0 and 1, 0.5 is used as the threshold for a binary classification of landslide occurrence: a value greater than or equal to 0.5 indicates a landslide, and a value below 0.5 indicates no landslide. The assessment results were tallied to produce the error (confusion) matrix in Table 3. The table shows that, although the total accuracy of the classification results at a threshold of 0.5 is relatively high for all models, the precision and recall for the landslide class are low. Among the models, the Polygon-based DBN has the highest precision, recall, and total accuracy.

Uncertainty in susceptibility results under different landslide boundaries
Landslide susceptibility modeling requires reliable landslide locations and spatial shapes, which are fundamental to identifying potential landslides and their instability. As Sects. 5.1 and 5.2 show, using polygonal surfaces as landslide shapes generally gives more accurate susceptibility results, whereas when landslide boundaries cannot be mapped accurately, simplified boundary forms have to be used. When landslides extend over large areas, modeling them as points frequently results in high uncertainty. A buffer circle used as a landslide shape inherently carries some measurement error, which inevitably introduces erroneous information into the modeled spatial dataset, whereas accurately mapped landslide boundaries effectively improve the accuracy and reduce the uncertainty of susceptibility predictions.
This study also shows that, in the absence of precise landslide boundary data, boundary shapes based on landslide points and buffer circles can still be used for landslide susceptibility modeling, and the resulting susceptibility maps can reflect the overall landslide distribution pattern in the study area. In addition, this paper examined only the impact of different landslide boundary shapes on susceptibility modeling; it did not examine their impact on landslide hazard warning modeling that accounts for external triggers such as engineering slope cutting and heavy rainfall.

Technological advances in cataloging and mapping landslides
Mapping accurate landslide boundaries is very challenging, and recent developments in geospatial informatics based on 3S technology and spatial data infrastructure have provided new technical support for landslide cataloging and susceptibility modeling.
(1) Landslide detection, identification, cataloging, monitoring, and early warning have all been greatly enhanced by new high-definition remote sensing and earth observation technologies, such as high-resolution visible and thermal infrared imagery, satellite-based interferometric radar, differential interferometric radar, and airborne laser altimetry.
(2) The interpretation of landslide imagery can be greatly enhanced by image visualization and virtual reality, multi-temporal elevation detection, cloud GIS, and large-scale parallel RS processing software.
(3) Combining the detailed geohazard surveys organized by the China Geological Survey with high-precision remote sensing interpretation will further complement and improve the geohazard databases established by county and municipal geohazard surveys.
(4) High-resolution topographic maps or DEMs are very important input data for landslide interpretation and can be used to map landslide boundaries accurately and to identify topographic features in detail.
(5) Predictive modeling of landslide susceptibility depends strongly on the accuracy of the landslide inventory data and on how well its mapping matches actual landslide development. On the one hand, the accuracy of susceptibility results can indirectly reflect the reliability of the original landslide inventory, with highly accurate results indicating a high-quality inventory. On the other hand, to check whether a landslide inventory is valid, relevant specialists can be asked to test the inventory mapping comprehensively using basic information such as local landslide field survey experience and landslide environmental factors.

Conclusion
1. The findings of this study show that, compared with the Point-based and Circle-based models, the Polygon-based model has higher accuracy and lower uncertainty, and its susceptibility index agrees better with the actual field distribution of landslide probability. Using an exact polygonal surface as the landslide boundary therefore allows more precise prediction of landslide susceptibility.
2. Under all the combinations of conditions compared, the DBN model achieves higher susceptibility prediction accuracy than RF, demonstrating that more sophisticated machine learning can significantly improve landslide prediction accuracy and that machine learning models typically achieve better susceptibility prediction accuracy than traditional mathematical statistics and heuristic models.
3. In contrast to existing studies, which discussed the accuracy of susceptibility prediction only through accuracy indices such as AUC, this paper offers new ideas for exploring the factors that affect modeling and the resulting uncertainties, starting from the distribution pattern of the landslide susceptibility index and the significance of differences. Future geohazard survey maps should depict landslide boundaries and their spatial forms appropriately, and landslide points or buffer circles should not be used to denote landslides. Nevertheless, buffer circles can be used to approximate landslide boundaries in the absence of precise information, and the resulting susceptibility can still represent the likelihood of landslides occurring at specific locations across the whole study area.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.