The multivariate statistical methodology developed in this study to assess landslide susceptibility was chosen because it adapts quickly to data management needs; statistical methods have long been used in the geosciences and remain in use, at least in some countries. The approach makes it possible to describe the relationships among variables observed simultaneously on all landslides, even when these are numerous. The research comprised data collection, field survey, landslide inventory mapping, image analysis, assessment and mapping of landslide factors, modeling, and validation of the landslide susceptibility map.
The first step is to process and analyze the database resulting from the inventory of landslides in the region. The data are chosen on the basis of relevance, availability and scale (Maquaire et al., 2006). The seven factors selected from Table 1 for the assessment of landslide susceptibility must be operational, comprehensive, nonuniform, fundamental, and measurable (Ayalew and Yamagishi, 2005).
Table 1
Variables used for the landslide susceptibility analysis in Kabylia and their sources of information.
Group of variables | Variables: dependent (Vd) / explanatory (Ve) | Source
Inventory of observed events | Vd: Landslides | Orthophoto plan (Google Earth); site observations; topographic maps
Topography | Ve: Slope gradient; Ve: Elevation | Digital terrain model (DTM)
Geology / Geomorphology | Ve: Lithology | Geological map; observations; orthophoto plan; technical reports
Vegetation cover | Ve: Land cover | Satellite image fusion (Landsat ETM, SPOT XS)
Land use | Ve: Land use | Land use maps
Weather | Ve: Rainfall | Annual precipitation map
The next step is to produce a landslide susceptibility map and to assess the parameters (factors) that influence landslide susceptibility. Principal component analysis (PCA) was used to determine the most influential parameters and their respective weights from the percentage of variability explained, to reduce redundant information among the variables, and to transform correlated variables into uncorrelated ones (Gorsevski, 2001). The outputs are then integrated into a geographic information system (GIS) model to assess and map landslide susceptibility in the study area. The analysis uses a linear model that calculates the probability that an individual pixel contains a landslide.
To validate the model, we used landslide density graphs, comparing the landslide susceptibility map (LSM) with the training and validation datasets of the inventory map. The overall procedure followed in this work is summarized in Fig. 5.
Data Acquisition And Preparation
In this study, the seven parameters acquired (elevation, lithology, slope angle, NDVI vegetation cover, rainfall, land use and cause category) were prepared in an ArcGIS 10.3 database as landslide conditioning factors.
These conditioning factors and the landslide inventory, all in the same projection (UTM) and with the same pixel size (30 m x 30 m), formed the database for this study. The landslide factor maps were then reclassified into subclasses and overlaid with the classified training landslide datasets (Fig. 6).
The slopes are steep, with gradients between 10° and 45°. The slope map was reclassified into six slope classes and shows that more than 70% of the slopes have an inclination of less than 20°. The vegetation cover map of the region was built with the NDVI method from Landsat 8 satellite images. To simplify the map and limit the number of classes while preserving most of the information, the NDVI values were grouped into four classes: weakly dense (F. dense), medium dense (M. dense), dense and very dense (T. dense).
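The NDVI step can be sketched as follows. NDVI is the standard ratio (NIR - Red) / (NIR + Red); for Landsat 8 these are bands 5 and 4. The class limits below are hypothetical placeholders, since the limits actually used for the four density classes are not reported here, and the function names are illustrative only.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed per pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Pixels with a zero denominator stay NaN instead of raising a warning.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical class limits (assumption, not taken from the study).
NDVI_BREAKS = [0.2, 0.4, 0.6]
NDVI_LABELS = ["weakly dense", "medium dense", "dense", "very dense"]

def classify_ndvi(values, breaks=NDVI_BREAKS):
    """Class index 0..3 per pixel; -1 marks pixels without a valid NDVI."""
    values = np.asarray(values, dtype=float)
    classes = np.digitize(values, breaks)
    return np.where(np.isnan(values), -1, classes)

# Toy 2 x 2 reflectance rasters (made-up values).
nir = np.array([[0.50, 0.30], [0.45, 0.20]])
red = np.array([[0.10, 0.15], [0.15, 0.20]])
index = ndvi(nir, red)
classes = classify_ndvi(index)
```

With real imagery the two arrays would be the co-registered NIR and red bands resampled to the common 30 m grid.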
The land use / land cover map of the study area was derived from the interpretation of the satellite image and the Google Earth image. It was reclassified into five classes: forest, river, villages, agricultural area and towns.
Lithology has a significant impact on slope stability, because lithological units differ in their susceptibility. The lithology map was derived from the geological map and reclassified into seven classes according to the main units: clays, gneiss-granites, flysch, marls, shales, sandstones and alluvium.
Precipitation plays an important role in decreasing shear strength by increasing pore pressure, and this parameter is widely used in susceptibility analysis. The mean annual precipitation map was produced by interpolation of isohyets (kriging) and then reclassified at 100 mm/year intervals into the classes 600–700, 700–800, 800–900, 900–1000 and > 1000 mm/year.
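As an illustration of the reclassification, the sketch below maps annual precipitation values to the class intervals listed above. How values that fall exactly on a class limit were assigned is not stated, so the tie rule used here (a limit belongs to the lower class) is an assumption, as are the function and constant names.

```python
import numpy as np

# Upper limits of the first four classes listed in the text (mm/year);
# anything above the last limit falls in the "> 1000" class.
RAIN_BREAKS = [700, 800, 900, 1000]
RAIN_LABELS = ["600-700", "700-800", "800-900", "900-1000", "> 1000"]

def reclassify_rainfall(annual_mm):
    """Map annual precipitation (mm/year) to a class index 0..4."""
    annual_mm = np.asarray(annual_mm, dtype=float)
    # right=True puts a value equal to a limit in the lower class,
    # e.g. exactly 700 mm -> "600-700" (assumed tie rule).
    return np.digitize(annual_mm, RAIN_BREAKS, right=True)

sample = np.array([650, 700, 850, 999, 1200])
labels = [RAIN_LABELS[i] for i in reclassify_rainfall(sample)]
```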
Determination Of Influencing Factors
In this step, the influencing factors were determined: these are the initial conditioning factors that have a major effect on landslide occurrence. The relationship between the landslide distribution and the landslide control factors is reflected by the weight values of the parameter classes, derived from the percentage of variability explained. This was done with PCA, a multivariate statistical method that removes the redundancy of information that arises when a large number of parameters is analyzed, reduces the number of variables and limits their interdependence.
A negative weight value for a parameter class means that its presence may not contribute to the occurrence of landslides. Conversely, a positive weight value means that the presence of that parameter class can increase the likelihood of landslides. A weight value around zero (± 0.1) is not indicative of either the presence or the absence of landslides (Van Westen et al., 2003). Magliulo et al. (2008) demonstrated that using landslide detachment (scarp) zones rather than landslide bodies in the statistical analysis gives significant results in the susceptibility assessment.
To perform the PCA on the classified variables, an unrotated PCA was first carried out on the correlation matrix (Table 2), so that all variables are given the same weight. The correlation matrix also describes the associations between the active variables, as recommended by Lebart et al. (2000).
Table 2
Correlation matrix used in Principal Component Analysis (PCA).
Variable | Lithology | Slope degree | Rainfall | NDVI | Elevation | Land use | Causes
Lithology | 1.000 | | | | | |
Slope degree | 0.034 | 1.000 | | | | |
Rainfall | 0.238 | 0.011 | 1.000 | | | |
NDVI | 0.010 | 0.024 | 0.295 | 1.000 | | |
Elevation | 0.177 | 0.093 | 0.286 | 0.440 | 1.000 | |
Land use | 0.200 | 0.008 | 0.120 | 0.347 | 0.009 | 1.000 |
Causes | 0.270 | 0.041 | 0.031 | 0.359 | 0.038 | 0.398 | 1.000
Table 2 shows that the correlations between the variables are generally not very high. Only NDVI and elevation show a comparatively strong correlation (0.440), as do causes and land use (0.398).
A first strategy is simply to inspect the correlation matrix. When the variables are not closely related, the off-diagonal elements take low absolute values, and it is then illusory to hope for an efficient synthesis in a reduced number of factors. Here some variables are more strongly correlated than others. It is difficult to give a threshold value above which links can be considered exploitable; consulting the matrix gives, above all, a "general impression of correlation".
It is not always easy to identify, from the correlation matrix, the groups of variables that behave in roughly the same way. Principal component analysis provides a synthesis of these links. The extraction of the important components, however, amounts to a rotation that maximizes the variance (varimax) in the original variable space (Table 3).
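A minimal sketch of a PCA on a correlation matrix is given below, using the values printed in Table 2. It only illustrates the mechanics (eigen-decomposition of a symmetric matrix); the printed coefficients are rounded and shown without sign, so the eigenvalues it returns need not reproduce those reported by the authors. All names are illustrative.

```python
import numpy as np

NAMES = ["Lithology", "Slope degree", "Rainfall", "NDVI",
         "Elevation", "Land use", "Causes"]

# Lower triangle of Table 2, as printed (all entries shown as positive).
LOWER = [
    [1.000],
    [0.034, 1.000],
    [0.238, 0.011, 1.000],
    [0.010, 0.024, 0.295, 1.000],
    [0.177, 0.093, 0.286, 0.440, 1.000],
    [0.200, 0.008, 0.120, 0.347, 0.009, 1.000],
    [0.270, 0.041, 0.031, 0.359, 0.038, 0.398, 1.000],
]

def full_matrix(lower):
    """Mirror a lower-triangular list of rows into a symmetric matrix."""
    n = len(lower)
    m = np.zeros((n, n))
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            m[i, j] = m[j, i] = value
    return m

def pca_from_correlation(corr):
    """Eigenvalues (descending) and matching eigenvectors."""
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    values, vectors = np.linalg.eigh(corr)
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]

corr = full_matrix(LOWER)
eigenvalues, eigenvectors = pca_from_correlation(corr)
# Share of the total variance carried by each component, in percent.
explained = 100 * eigenvalues / eigenvalues.sum()
```

Because the diagonal of a correlation matrix is all ones, the eigenvalues sum to the number of variables (seven here), which is why an eigenvalue of 1 corresponds to the variance of one original variable.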
Table 3
Eigenvalues of the principal components.
Value (factor) | Eigenvalue | % total variance | Cumulative eigenvalue | Cumulative %
Lithology | 1.958 | 27.97 | 1.958 | 28
Slope degree | 1.608 | 22.97 | 3.566 | 51
Rainfall | 1.014 | 14.49 | 4.580 | 65
NDVI | 0.812 | 11.61 | 5.392 | 77
Elevation | 0.655 | 9.36 | 6.047 | 86
Land use | 0.543 | 7.76 | 6.590 | 94
Causes | 0.410 | 5.85 | 7.000 | 100
The first result consists of the list of eigenvalues and the percentages of variance calculated from the correlations (Table 3 and Fig. 7). The principal components retained in this study are those that contribute at least as much to the total variance as an original variable (eigenvalue > 1). On the basis of this criterion, the first three (03) components are selected. Together they represent about 65% of the variance. These components can be discussed in terms of factorial weights and communalities (common variances).
A factorial weight represents the correlation between an original variable and a new component (Table 4 and Fig. 7) and indicates the contribution of each variable as a function of this correlation. The matrix of factorial weights, with the original variables in rows and the principal components in columns, allows each component to be described according to the strongest correlations it shows with the original variables. Table 5 presents the matrix of factorial weights obtained after a "normalized varimax" rotation for the first three (03) components considered useful for the analysis.
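The selection rule described above (keep components with eigenvalue > 1) can be checked directly on the eigenvalues of Table 3. The sketch below is illustrative; the function name is an assumption.

```python
import numpy as np

# Eigenvalues from Table 3, in the order listed there.
EIGENVALUES = np.array([1.958, 1.608, 1.014, 0.812, 0.655, 0.543, 0.410])

def kaiser_selection(eigenvalues, threshold=1.0):
    """Return (components retained, % variance each, cumulative %)."""
    share = 100 * eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(share)
    retained = int(np.sum(eigenvalues > threshold))
    return retained, share, cumulative

retained, share, cumulative = kaiser_selection(EIGENVALUES)
print(retained, round(float(cumulative[retained - 1]), 1))  # prints: 3 65.4
```

The third eigenvalue (1.014) only just passes the threshold, so the choice of three components is sensitive to small changes in the input data.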
Table 4
Eigenvalues of main components.
Variable | Lithology | Slope degree | Rainfall | NDVI | Elevation | Land use | Causes
Lithology | 0.01 | 0.34 | 0.00 | 0.08 | 0.57 | 0.00 | 0.00
Slope degree | 0.00 | 0.01 | 0.94 | 0.03 | 0.01 | 0.00 | 0.02
Rainfall | 0.10 | 0.19 | 0.02 | 0.26 | 0.25 | 0.17 | 0.01
NDVI | 0.36 | 0.01 | 0.01 | 0.05 | 0.01 | 0.01 | 0.56
Elevation | 0.14 | 0.19 | 0.01 | 0.32 | 0.01 | 0.05 | 0.27
Land use | 0.20 | 0.10 | 0.00 | 0.25 | 0.01 | 0.36 | 0.07
Causes | 0.19 | 0.16 | 0.01 | 0.01 | 0.15 | 0.41 | 0.07
The representation of the variables in the two factorial planes is illustrated in Fig. 8, and the parameters selected as a function of the principal factors are presented in Fig. 9. The principal component analysis provided a better understanding of the interrelationships among the explanatory variables used in modeling and highlighted their redundancy. According to this study, these variables can be grouped into three main components. A model combining three (03) data layers, each describing the spatial distribution of one of the selected principal components, should therefore allow a reliable prediction of landslide susceptibility.
Table 5
Varimax rotated factor matrix
Variable | With 1 factor: «Lithology» | With 2 factors: «Lithology + slope degree» | With 3 factors: «Lithology + slope degree + rainfall»
Lithology | 0.011 | 0.559 | 0.560
Slope degree | 0.004 | 0.012 | 0.966
Rainfall | 0.201 | 0.509 | 0.534
NDVI | 0.697 | 0.712 | 0.722
Elevation | 0.273 | 0.578 | 0.591
Land use | 0.397 | 0.561 | 0.563
Causes | 0.376 | 0.635 | 0.645
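One way to read Tables 3 to 5 together, offered here as an interpretation and not as something stated by the authors, is that the columns of Table 4 are the seven components, its entries are the contributions of each variable to a component, and Table 5 lists the cumulative communalities for one, two and three factors. Under that reading, a contribution multiplied by the eigenvalue of its component gives a squared loading, and the running sum of squared loadings gives the communality. The sketch below checks this numerically, up to the rounding of the printed values.

```python
import numpy as np

NAMES = ["Lithology", "Slope degree", "Rainfall", "NDVI",
         "Elevation", "Land use", "Causes"]

# First three eigenvalues from Table 3.
EIGENVALUES = np.array([1.958, 1.608, 1.014])

# First three columns of Table 4, rows in the order of NAMES.
CONTRIBUTIONS = np.array([
    [0.01, 0.34, 0.00],
    [0.00, 0.01, 0.94],
    [0.10, 0.19, 0.02],
    [0.36, 0.01, 0.01],
    [0.14, 0.19, 0.01],
    [0.20, 0.10, 0.00],
    [0.19, 0.16, 0.01],
])

# Table 5 as printed.
TABLE_5 = np.array([
    [0.011, 0.559, 0.560],
    [0.004, 0.012, 0.966],
    [0.201, 0.509, 0.534],
    [0.697, 0.712, 0.722],
    [0.273, 0.578, 0.591],
    [0.397, 0.561, 0.563],
    [0.376, 0.635, 0.645],
])

# Assumed relation: contribution x eigenvalue = squared loading.
squared_loadings = CONTRIBUTIONS * EIGENVALUES
# Running sum over components = communality with 1, 2 and 3 factors.
communalities = np.cumsum(squared_loadings, axis=1)
max_gap = float(np.abs(communalities - TABLE_5).max())
```

The largest difference between the recomputed values and Table 5 stays within what two-decimal rounding of Table 4 can explain, which supports the reading but does not prove it.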

It is clear from the PCA that three (03) of the variables already assumed to have a predominant role in the reactivation processes, namely lithology, slope degree and precipitation, are not very redundant. To maximize the effect of the intervening factors, we extend the selection to five (05) variables.
Landslide Susceptibility Mapping
In this section, we establish the landslide susceptibility maps of the Kabylia region in ArcGIS. The weights obtained by the multivariate statistical processing (Table 6) were used to compute the susceptibility index for slope failures. These weight values are used as function coefficients: each selected factor layer (parameter) influencing slope instability was assigned its relative weight (Fig. 10).
Table 6
Varimax rotated factor matrix.
Variable | Function coefficient
Lithology | 0.611504
Slope degree | 0.976530
Rainfall | 0.555157
NDVI | 0.835143
Elevation | 0.552416

The calculation of the landslide sensitivity values (named "LS") was done according to the following equation:
where f_n is the value of the stability factor for variable n, evaluated according to the influence of that factor on the stability of the site, and W_n is the weight attributed to factor f_n (the index n identifies the factor considered) (Montoya-Montes et al. 2012).
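The equation announced above did not survive in this text. Given the definitions of f_n and W_n and the linear combination of weighted raster maps described in this section, one plausible form is a weighted sum over the N factors retained. This is a reconstruction under that assumption, not the authors' printed formula, and any normalization they applied is not known:

```latex
LS = \sum_{n=1}^{N} W_n \, f_n
```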
The landslide susceptibility index S varies from 0 to 5, making it possible to distinguish five levels of susceptibility: very low (0 < S ≤ 1), low (1 < S ≤ 2), moderate (2 < S ≤ 3), high (3 < S ≤ 4), and very high (4 < S ≤ 5) (Montoya-Montes et al. 2012). The weighted maps were subsequently developed in ArcGIS and rasterized using spatial analysis tools. After rasterization of the factor maps, the landslide susceptibility index maps were generated by linear combination in ArcGIS (the sum of all raster maps, computed with the raster calculator in Map Algebra).
The map (LSM) was classified into five categories (very low, low, moderate, high and very high susceptibility) using natural breaks (Fig. 11). As shown in Fig. 11, the "high" susceptibility classes are mainly located on varied slopes, generally of low to medium class up to 25°, in areas whose lithology is characterized by marl and clay formations, with a low or medium density plant cover and annual rainfall ranging from 600 mm to 900 mm. The middle and southeastern parts of the region are broadly classified as "low" to "moderate" susceptibility. These observations agree well with expert opinion. The distribution of the landslides relative to the susceptibility map is shown in Fig. 12.
The distribution plot helps determine how well the resulting maps classify existing landslide areas (Chung and Fabbri 1999). This type of classification is a suitable method for easy visual interpretation of the LSM.
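The overlay and the five susceptibility levels can be sketched as below, using the function coefficients of Table 6. Two points are assumptions made for the sketch: each factor raster is taken to hold scores between 0 and 5, and the weighted sum is divided by the sum of the weights so that the index stays in the 0 to 5 range quoted above. The toy rasters and all names are hypothetical.

```python
import numpy as np

# Function coefficients from Table 6.
WEIGHTS = {
    "lithology": 0.611504,
    "slope": 0.976530,
    "rainfall": 0.555157,
    "ndvi": 0.835143,
    "elevation": 0.552416,
}
LEVELS = ["very low", "low", "moderate", "high", "very high"]

def susceptibility_index(scores, weights=WEIGHTS):
    """Weighted mean of per-pixel factor scores (each assumed in 0..5)."""
    total = sum(weights.values())
    weighted = sum(weights[name] * np.asarray(scores[name], dtype=float)
                   for name in weights)
    # Dividing by the sum of weights is an assumption that keeps S in 0..5.
    return weighted / total

def susceptibility_level(s):
    """Class index 0..4 for the intervals (0,1], (1,2], ..., (4,5]."""
    return np.digitize(s, [1, 2, 3, 4], right=True)

# Hypothetical 1 x 3 rasters of factor scores.
scores = {
    "lithology": [[5, 1, 3]],
    "slope":     [[5, 1, 3]],
    "rainfall":  [[5, 1, 2]],
    "ndvi":      [[5, 0, 3]],
    "elevation": [[5, 1, 3]],
}
s = susceptibility_index(scores)
levels = [LEVELS[i] for i in susceptibility_level(s)[0]]
```

In a GIS workflow the same expression would be typed into a raster calculator, with one raster per factor on the common 30 m grid.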
Validation
In this study, the landslide density graph of the susceptibility map produced by the proposed method was drawn. This graph is a suitable way to show how landslides are distributed across the susceptibility classes of an LSM. To plot it, the landslide density (the ratio of pixels where landslides occurred to pixels where landslides did not occur) is computed for each susceptibility class and plotted on a diagram. In theory, the landslide density is expected to increase, at an increasing rate, from the very low to the very high susceptibility areas (Pradhan & Lee 2010a). The results obtained are shown in Fig. 13.
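A minimal sketch of the density computation follows. The wording of the density definition leaves some room for interpretation; here it is read as the share of landslide pixels among all pixels of a class, which is an assumption. The rasters are made-up toy data and the function name is illustrative.

```python
import numpy as np

def landslide_density(class_map, landslide_mask, n_classes=5):
    """Share of landslide pixels within each susceptibility class, in %."""
    class_map = np.asarray(class_map)
    landslide_mask = np.asarray(landslide_mask, dtype=bool)
    density = np.full(n_classes, np.nan)
    for c in range(n_classes):
        in_class = class_map == c
        # Classes absent from the map keep a NaN density.
        if in_class.any():
            density[c] = 100 * landslide_mask[in_class].mean()
    return density

# Hypothetical toy rasters: classes 0 (very low) .. 4 (very high).
class_map = np.array([0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
landslides = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
density = landslide_density(class_map, landslides)
# The expected behaviour is a density that does not decrease with class rank.
is_increasing = bool(np.all(np.diff(density) >= 0))
```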
According to the results of the analysis, 37.50% and 38.24% of the study area fall into the very low and low susceptibility classes, respectively. The moderate, high and very high susceptibility classes represent 45.45%, 71.43% and 83.33% of the study area, respectively.
According to Pradhan and Lee (2010), the landslide density is expected to increase from the very low to the very high susceptibility class. The same trend was observed in this study, and Fig. 11 shows an increasing rate. The results show that a large part of the instabilities is classified as very high susceptibility.
The model was validated by comparing the existing landslide inventory and validation datasets of the study area with the landslide susceptibility map produced. The ROC curve of the model was plotted and the area under this curve (AUC) was calculated using IBM SPSS Statistics software.
As Fig. 14 and Table 7 show, the closer the ROC curve lies to the upper-left corner of the plot, the higher the accuracy of the model. As shown in Table 7, the AUC value is 0.782, which is close to one and indicates high model accuracy.
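For readers without SPSS, the AUC can be computed from first principles as the probability that a randomly chosen landslide pixel receives a higher susceptibility value than a randomly chosen stable pixel, with ties counted as one half. The sketch below is an independent illustration on made-up data, not the procedure used to obtain the value reported here.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the probability that a landslide pixel outranks a stable one.

    Ties count one half (Mann-Whitney U formulation).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes are needed to compute an AUC")
    # Pairwise comparison: fine for small samples, memory-hungry for rasters.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Hypothetical susceptibility values and observed landslide labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to a model that ranks pixels no better than chance, which is the null hypothesis tested in the significance column of an ROC table.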
Table 7
Area under the ROC curve (AUC) of the model proposed in the study region.
Area | Standard error (a) | Asymptotic sig. (b) | 95% asymptotic confidence interval, lower bound | 95% asymptotic confidence interval, upper bound
0.782 | 0.073 | 0.060 | 0.541 | 0.827
a. Under the nonparametric assumption
b. Null hypothesis: true area = 0.5
