Integration of Information Theory, Fractal Theory and Statistical Analyses for Landslide Susceptibility Mapping at Regional Scales

A new model that integrates information theory, fractal theory and statistical analysis for accurate landslide susceptibility mapping (LSM) at regional scales is proposed. In this model, landslide conditional factors are first classified into an optimal number of classes, which is determined by maximizing their information coefficients estimated from Shannon's entropy model. The spatial association between influencing factors and induced landslides is measured with the variable fractal dimension method (VFDM), which fully considers the fractal distribution characteristics of landslides. The fractal dimensions (𝐷) are then calculated to assign numerical weights to the factors. Finally, the proposed model combines the landslide frequency ratio (𝑓𝑟) of each factor with its corresponding weight to achieve spatial prediction of landslides, illustrated by an example area in China. In the study area, 600 landslides have been identified from aerial photograph interpretation, extensive field investigations, and historical and bibliographical landslide data. These landslides are randomly split into a training dataset (70 %) and a validating dataset (30 %). Seven factors are recognized and analyzed by the frequency ratio (FR) method: lithology, distance to fault, altitude, slope, aspect, distance to stream and distance to the road. The area under the receiver operating characteristic curve (AUROC) has been adopted to compare and validate the model results. Results show that the proposed landslide model achieves a more accurate prediction, with an AUROC of 0.8467, outperforming the conventional frequency ratio method (AUROC = 0.8088). According to the final prognostic landslide susceptibility map, 16.37 % of the study area shows very high and high susceptibility, accounting for 63.55 % of all landslides. Evaluation of relative factor importance based on a one-by-one factor removal test indicates that the lithology factor contributes unique information for landslides. In conclusion, the example demonstrates that the proposed framework is promising for further improvement of LSM.


Introduction
(fractal) function (Malamud et al. 2004; Guthrie et al. 2007; Trigila et al. 2010; Ghosh et al. 2012). Also, geoscientists explored the spatial association between landslides and corresponding influencing variables (Brunetti et al. 2009). Some related and significant conclusions have

The objective of this study is to propose an alternative approach for landslide susceptibility mapping (LSM). The proposed new method integrates information theory, fractal theory and statistical analyses.

Table 3). Around 53.9 % of the study area is covered by metamorphic rocks (schist, gneiss, and metamorphic limestone).

Landslide inventory map
The unique mountain-basin landscape, complex geologic conditions and frequent extreme weather are responsible for the widespread landslide distribution in this area. A landslide inventory has been compiled so that a homogeneous population can be analyzed. According to statistics, over 85 % of slope movements in the study area are rockfalls. Thus, only rockfalls have been selected. A total of 600 landslides were registered.

beyond the scope of this study. In this study, the 600 landslides are depicted as points in GIS shapefile format. The landslide inventory map was then produced as a point map (Fig. 1).

Seven landslide conditional factors are recognized in the study area: lithology, distance to fault, altitude, slope, aspect, distance to stream and distance to the road. The inclusion of these factors draws heavily on extensive literature reviews, experience gained from studying landslide phenomena, and sufficient data availability. Some ancillary data have been used to extract these conditional factors, as listed in Table 1. These factors and the landslide inventory map have been extracted with a grid size of 30 m × 30 m to match the digital elevation model (DEM).

Landslide observations are usually divided into a training dataset for model building and a validating

are randomly sampled from the landslide-free areas.
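The random division of the inventory into training and validating subsets can be sketched as follows. This is a minimal illustration only: the 70 % training share is the one reported for this study, while the function name `split_inventory`, the fixed seed and the integer point IDs are assumptions introduced for the example.

```python
import random


def split_inventory(points, train_fraction=0.7, seed=42):
    """Randomly split landslide points into training and validating subsets."""
    shuffled = list(points)                  # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)    # seeded for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical inventory: 600 point IDs standing in for the mapped rockfalls.
inventory = list(range(600))
train, valid = split_inventory(inventory)
print(len(train), len(valid))
```

Fixing the seed keeps the split reproducible, so that different models are compared on the same validating subset.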

In the treatment of classifying continuous landslide conditional factors, Shannon's entropy index has been adopted to determine optimal class numbers by maximizing the information coefficient of each

The frequency ratio (FR) method has been commonly used to predict landslide spatial distribution for its

Construct cumulative-sum sequences with the basic data sequence {𝑁}. We define the first-order cumulative-sum sequence {𝑆1}, the second-order cumulative-sum sequence {𝑆2}, and so on, as follows:
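A sketch of this construction, assuming the basic sequence holds $n$ values $N_1, \ldots, N_n$ and each higher-order sequence is the running sum of the one before it (the index notation and the generic order $k$ are assumptions introduced for illustration):

```latex
\begin{aligned}
{S1}_i &= \sum_{j=1}^{i} N_j = N_1 + N_2 + \cdots + N_i, \\
{S2}_i &= \sum_{j=1}^{i} {S1}_j = {S1}_1 + {S1}_2 + \cdots + {S1}_i, \\
{Sk}_i &= \sum_{j=1}^{i} {S(k-1)}_j, \qquad i = 1, 2, \ldots, n .
\end{aligned}
```

For example, a basic sequence $(1, 2, 3)$ gives the first-order sequence $(1, 3, 6)$ and the second-order sequence $(1, 4, 10)$.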

Validation and comparison strategy
The receiver operating characteristic (ROC) curve can evaluate landslide susceptibility models (Zweig and Campbell 1993). Specifically, in each mapping unit, the stochastic methods produce an estimate of the probability (between 0 and 1) of an unstable condition. Once an optimal cutoff value has been derived,

gives the least information.
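The area under the ROC curve summarizes performance over all possible cutoff values. As a minimal sketch, it can be computed without choosing a cutoff by using the equivalent pairwise-ranking form: the fraction of (landslide, landslide-free) pairs in which the landslide unit receives the higher score. The function name `auroc` and the sample values are hypothetical and are not data from the study area.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney) identity.

    scores: predicted susceptibility per mapping unit (higher = more unstable)
    labels: 1 for landslide units, 0 for landslide-free units
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one landslide and one landslide-free unit")
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical validating sample (not data from the study area).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auroc(scores, labels))  # 13 of 16 pairs are ordered correctly -> 0.8125
```

The double loop is quadratic in the number of units, which is acceptable for a small validating sample; a rank-based implementation would be preferable for full raster maps.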

Optimal class numbers of five continuous factors are given in Fig. 3. The two categorical factors, lithology and aspect, are also illustrated in this figure.

It can be concluded that the GI serves as a more suitable classification method for the current case, for the information coefficient values derived from this method are generally higher than those derived from the NB method. Note that "Null" means not calculated, indicating that no landslide points fall in the specific factor class/interval. For the factors of distance to lineaments (fault, stream, road), most of the information coefficient values were assigned as "Null" when classified by the NB method. This is mainly because

Landslide frequency ratio
After the determination of the optimal class number, the calculation of 𝑓𝑟 followed. The calculated 𝑓𝑟 values for each class/interval of the conditional factors were plotted in Fig. 4.

Fractal-based weights
The measurement of spatial associations between landslides and the conditional factors using fractal dimension has been performed by the VFDM. Fig. 5 presents the linear fitting results of the relationships between the conditional factors and the transformed landslide cumulative-sum sequences. The figure shows that the paired datasets (𝑟, 𝑆2) of six factors can be well fitted linearly after the second-order cumulative-sum transform, except for the slope factor.

For visualization, the final landslide susceptibility map is illustrated in Fig. 6a. According to the statistical result of the landslide susceptibility map (Table 5)

Validation and comparison
The prediction performance of the landslide model has been evaluated using the ROC curve. The ROC analysis shows that the proposed method (red line in Fig. 7) provides a satisfying prediction accuracy, with an AUROC value of 0.8467.

The second landslide susceptibility map (Fig. 6b), together with the related ROC curve (blue line in Fig. 7), has been added using the same weight for all the conditional factors. Its ROC shows relatively

the fractal structure of landslide distribution is a heterogeneous one.
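As a rough illustration of the frequency ratio and weighting steps reported in this section, the sketch below computes the frequency ratio per class in its usual form (share of landslides in a class divided by share of area in that class) and then combines the per-factor values of one grid cell as a weighted sum. The weighted-sum form, the function names, the weights and all counts are assumptions for illustration and are not values from the study.

```python
def frequency_ratio(landslides_per_class, cells_per_class):
    """Frequency ratio per class: landslide share divided by area share."""
    total_landslides = sum(landslides_per_class)
    total_cells = sum(cells_per_class)
    return [
        (ls / total_landslides) / (cells / total_cells)
        for ls, cells in zip(landslides_per_class, cells_per_class)
    ]


def susceptibility_index(fr_by_factor, weights):
    """Weighted sum of the frequency ratios one grid cell takes (assumed form)."""
    return sum(weights[name] * fr for name, fr in fr_by_factor.items())


# Hypothetical counts for one factor with three classes.
fr_slope = frequency_ratio([10, 40, 50], [500, 250, 250])
print(fr_slope)  # values above 1 mark classes with more landslides than their area share

# Hypothetical cell: the frequency ratio of the class it falls in, per factor.
cell = {"slope": 2.0, "lithology": 1.5}
weighted = susceptibility_index(cell, {"slope": 0.4, "lithology": 0.6})
equal = susceptibility_index(cell, {"slope": 0.5, "lithology": 0.5})
print(weighted, equal)
```

Passing identical weights for every factor corresponds to the equal-weight comparison map, so the same function can serve both variants.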

Furthermore, the VFDM has been introduced to process the calculated data in Table 5. The landslide grid number was transformed to obtain the first-order cumulative-sum sequence 𝑆1 (Row 4), and the data series (𝑟, 𝑆1) were then plotted in double logarithmic coordinates (Fig. 9b). By comparison, these data series can be well fitted by a single straight line, from which we can obtain a constant fractal dimension value.
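The transform-and-fit procedure described above can be sketched as follows: apply the running-sum transform a chosen number of times, then fit a straight line in double logarithmic coordinates and inspect the slope and the goodness of fit. This is a minimal sketch under stated assumptions: the function names, the use of ordinary least squares, and the sample values are hypothetical, and the mapping from the fitted slope to the reported fractal dimension is not specified here.

```python
import math


def cumulative_sum(values, order=1):
    """Apply the running-sum transform `order` times (order=0 returns a copy)."""
    seq = list(values)
    for _ in range(order):
        total, out = 0.0, []
        for v in seq:
            total += v
            out.append(total)
        seq = out
    return seq


def loglog_fit(r, s):
    """Least-squares line ln(s) = a + b*ln(r); returns the slope b and R^2."""
    x = [math.log(v) for v in r]
    y = [math.log(v) for v in s]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Hypothetical class values r and landslide grid counts N (not Table 5 data).
r = [1, 2, 3, 4, 5, 6]
N = [40, 25, 12, 9, 3, 1]
for order in (0, 1, 2):
    slope, r2 = loglog_fit(r, cumulative_sum(N, order))
    print(order, round(slope, 3), round(r2, 3))
```

Comparing R² across orders shows which transform, if any, brings the series close to a single straight line.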

These comparisons support our hypothesis and demonstrate the feasibility of the VFDM for the current research.
16.37 % of the study area showed very high and high susceptibility, containing 63.55 % of the total number of landslides. The most susceptible areas were located in the northeast mountainous areas, while zones of low and very low susceptibility characterized the central-south areas. Evaluation of relative factor importance based on a one-by-one factor removal test indicated that the lithology factor provides unique information compared to the rest.
Furthermore, for many regions where the landslide distribution presents fractal characteristics with a heterogeneous structure, the framework described here can provide a promising landslide susceptibility assessment.