Landslide Susceptibility Assessment Using Statistical Multivariate Analysis “PCA” in the Mediterranean Kabylia

Landslides are one of the most catastrophic geo-risks observed in northern Algeria, particularly in the mountain ranges of Mediterranean Kabylia, where the processes are spectacular. Over the past decade, the landslide risk has become increasingly amplified in urban areas, mainly affecting the economy and human life. This highlights the need to predict the spatial occurrence of these events across the national territory. To better manage this phenomenon, decision-makers need susceptibility maps that allow them to identify the areas in their region where new landslides are more likely to be triggered in the future, and to anticipate the associated damage. However, approaches to assessing landslide susceptibility require good knowledge of previously observed events and involve data collection and management as well as spatial and statistical analyses. In this study, a new multivariate statistical approach based on principal component analysis (PCA) is proposed to produce landslide susceptibility maps of the study area in a GIS. The approach allows the automatic analysis of most of the parameters related to the occurrence of slope failures while discarding the factors that do not influence landslide triggering, which eliminates redundancy among the factors studied.


Introduction
Land movements are among the most destructive natural phenomena. They are triggered by various factors and occur mainly in areas where relatively precarious soil equilibrium conditions prevail, most often aggravated by hydraulic, seismic or anthropogenic stresses. The damage caused by land movements has considerable human and socio-economic consequences, so land movements become real risks that are often underestimated, because they generally form part of multi-hazard disasters (Schuster, 1996).
In the Mediterranean region, landslides occur every year and increasingly affect urban areas, with mainly economic consequences (destruction of buildings and infrastructure, rehousing of many families). This highlights the need to predict the spatial occurrence of these events across the national territory and to establish landslide susceptibility maps.
Risk assessment and management require a large amount of multidisciplinary and technical information that must be collected, processed, analyzed and communicated to a wide range of users under very different conditions, ranging from planning and regulatory activities to emergency management (Fedra, 1997). Modern GIS technologies provide tools for assessing slope-movement susceptibility and hazard (van Westen, 1993; Soeters and van Westen, 1996; Longley et al., 2001; Süzen, 2002).
Many susceptibility mapping methods have been developed since the 1970s to address practical management problems (Kienholz, 1978; Brabb et al., 1979; Carrara et al., 1979). Methodologically, three main types of approaches to susceptibility mapping can be identified: qualitative, semi-quantitative and quantitative. Qualitative approaches are based on the expert opinion of the person performing the susceptibility analysis and mapping.
These methods are direct, because the expert assesses the susceptibility of the terrain directly in the field, based on the observed phenomena and the geomorphological/geological setting, or in the office, as a map derived from a geomorphological map using existing documents such as topographic, geomorphological and geological maps (Petley et al., 2002; Barlow et al., 2003; Canuti et al., 2003; van Westen, 2004; van Westen et al., 2006; van Westen et al., 2014). In the direct heuristic method, the GIS is used as a tool to digitize the final map without in-depth modeling (Corominas et al., 2014). Semi-quantitative (semi-empirical) approaches are based only on a relative assessment of the value of the assets at stake. This relative value (single or variable) can be assigned individually to a specific type of element (MATE/METL, 1999), or globally to a set of elements within a homogeneous area (Bonnard et al., 2004). Examples of semi-quantitative methods include fuzzy logic and the analytic hierarchy process (AHP).
Quantitative methods are based on weighting criteria and are theoretically reproducible: they produce identical results given the same conditions and data. They include statistical approaches, probabilistic approaches, artificial intelligence, deterministic models and temporal approaches (Fessard, 2014). Various methods exist for establishing rules and relationships between variables, including bivariate and multivariate analysis. Bivariate analysis methods (Brabb et al., 1972) calculate a weight, designated as the information value, for each predisposing factor considered in the model. Multivariate analysis methods include, in particular, discriminant analysis (Neuland, 1976; Carrara, 1983; Carrara et al., 1995), logistic regression on a binary (presence/absence) landslide variable (Atkinson and Massari, 1998; Ayalew and Yamagishi, 2005), Bayesian weights-of-evidence methods and neural networks (Gómez and Kavzoglu, 2005; Lee et al., 2006). The limitations of these methods stem from data quality, such as mapping errors, incomplete inventories and the poor resolution of some datasets, since the models are essentially data-driven. In addition, their results are not easily transferable from one region to another.
To overcome some of these data limitations, risk analysis based on multivariate statistics is applied in the geosciences when correlations can be established between failures and explanatory factors.
As a corollary, this assumes that abundant data on failures are available. Such analyses allow the evolution of landslide behavior (displacements, pore pressures, etc.) to be predicted and undesirable phenomena to be anticipated, which makes them a powerful approach for the risk analysis of mechanisms.
The objective of this work is to assess landslide susceptibility in Mediterranean Kabylia using a proposed multivariate statistical method, principal component analysis (PCA). The landslide susceptibility assessment procedure is implemented in a GIS using ArcGIS software. The approach allows the automatic analysis of most of the parameters related to the occurrence of slope failures while discarding the factors that do not influence landslide triggering, which eliminates redundancy among the factors studied.

Study Area
The Mediterranean Kabylia region, located in north-central Algeria, has experienced intense activity of several natural hazards in recent years, including landslides: many natural slopes undergo more or less extensive and active land movements, which are increasingly amplified in urban areas and mainly affect the economy and human life (Fig. 1).
The region covers an area of 2,992.96 km². It lies between latitudes 36°28′ N and 36°55′ N and longitudes 03°45′ E and 04°31′ E (Fig. 2). Schematically, the region is a vast bastion made up of a succession of mountain ranges, all with a general East-West orientation, which enclose narrow alluvial plains.
The Central Kabyle massif comprises a set of entangled mountain ranges oriented North-South without interruption. The Kabylia of Djurdjura thus takes the form of a "mouth", whose upper lip would be formed by the maritime massif, while the lower lip would contain the central massif bordered by the Djurdjura.
The Sebaou valley separates the two "lips". The physical environment of the region is dominated by mountains; more than 50% of its territory is defined as very high mountains.
The distribution of slopes across the physical units of the study region shows that more than 83% of its territory consists of "difficult" terrain with steep to very steep slopes: 51.84% of the region's surface has very steep slopes (> 25%), whereas gentle slopes (0 to 3%) represent only 6.24% of the total surface.
The studied region is located in the northern Atlas chain, more precisely in the northern Tell, which has a particularly complex and diverse structure comprising various sedimentary formations. Geologically (Fig. 3), more than half of the region rests on clays and marls, which occupy large areas. Sandstones and shales represent nearly 30% of the total, and the rest is composed of Quaternary alluvium along the valleys, scree, limestone and mica schists. Lithology and tectonics have produced a great diversity of landscapes and limited the extent of the valleys, where parallel East-West trending reliefs coexist with depressions containing elongated plains that widen downstream. With the exception of the permeable limestone formations and the alluvium of the Sebaou wadi, the watershed formations are generally "impermeable".
The climate is warm Mediterranean, characterized by a long dry summer season and relatively mild winter temperatures with heavy rainfall during the rainy season. The average temperature in the region is around 18.5 °C. The dry season begins in May, when temperatures can reach 40 °C; this climate is, however, slightly moderated by the proximity of the sea, and beneficial thunderstorms are frequent. On average, Kabylia receives between 600 and 1000 mm of rain per year, which makes it a well-watered region. As it is mainly composed of limestone massifs, this water is retained by the soil. The region has a dense, well-organized and mostly deep drainage network. The region also has a dense and diverse vegetation cover and is one of the most forested regions (38% forest cover). Over the past two centuries, natural vegetation has been replaced by secondary vegetation, and large areas have undergone desertification. Earthquakes can also occur along the suture that joins Kabylia to the external Tellian domain (Fig. 4): they occur along the southern edge of the Grande Kabylie massifs, from Aomar to the Béjaia region, and numerous earthquakes also punctuate the suture along the edge of Petite Kabylie (Yelles-Chaouche et al., 2006).

Materials And Methods
The multivariate statistical methodology developed in this study to assess landslide susceptibility was chosen because it adapts quickly to data management needs, and because statistical methods have been, and still are, widely used in the geosciences, at least in some countries. It makes it possible to describe the relationships between variables observed simultaneously on all landslides, even when these are numerous. The research involved data collection, field surveys, landslide inventory mapping, image analysis, assessment and mapping of landslide factors, and modeling and validation of landslide susceptibility.
The first step is to process and analyze the database resulting from the landslide inventory of the region. The data are chosen according to relevance, availability and scale (Maquaire et al., 2006). The seven factors selected in Table 1 for the assessment of landslide susceptibility must be operational, comprehensive, non-uniform, fundamental and measurable (Ayalew and Yamagishi, 2005). The next step is to derive a landslide susceptibility map and assess the parameters (factors) influencing landslide susceptibility. Principal component analysis (PCA) was used to determine the most influential parameters and their respective weights according to the percentage of variance explained, in order to reduce redundant information among the variables and to transform correlated variables into uncorrelated ones (Gorsevski, 2001). The outputs are then integrated into a geographic information system (GIS) model to assess and map landslide susceptibility in the study area. The analysis is performed using a linear model, which calculates the probability that an individual pixel contains a landslide.
Table 1. Variables used for the analysis of landslide susceptibility in Kabylia, and sources of information.

[Table 1: Group of variables | Variables]
To validate the model, landslide density graphs were used, comparing the LSM with the training and validation datasets of the inventory map. The overall procedure followed in this work is summarized in Fig. 5.

Data Acquisition And Preparation
In this study, seven parameters were acquired and prepared in an ArcGIS 10.3 database as landslide conditioning factors: altitude, lithology, slope angle, NDVI vegetation cover, rainfall, land use and cause category.
These conditioning factors, together with a landslide inventory, all with the same projection (UTM) and the same pixel size (30 m × 30 m), formed the database for this study. The landslide factor maps were then reclassified into subclasses and overlaid with the classified training landslide datasets (Fig. 6).
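Overlaying a reclassified factor map with the landslide inventory amounts to a per-class cross-tabulation of two aligned rasters. The sketch below assumes both rasters are already loaded as NumPy arrays of identical shape (same UTM grid and 30 m pixels); the function name `class_landslide_counts` is hypothetical and is not an ArcGIS tool.

```python
import numpy as np

def class_landslide_counts(factor_classes, landslide_mask):
    """Cross-tabulate a reclassified factor raster with a binary landslide raster.

    factor_classes: 2-D integer array of class codes (one per pixel).
    landslide_mask: 2-D boolean array, True where the inventory maps a landslide.
    Returns {class_code: (total_pixels, landslide_pixels)}.
    """
    if factor_classes.shape != landslide_mask.shape:
        raise ValueError("rasters must be aligned (same shape)")
    counts = {}
    for code in np.unique(factor_classes):
        in_class = factor_classes == code
        counts[int(code)] = (int(in_class.sum()),
                             int(np.logical_and(in_class, landslide_mask).sum()))
    return counts

# Toy 2 x 3 rasters, for illustration only.
classes = np.array([[1, 1, 2], [2, 3, 3]])
slides = np.array([[True, False, False], [True, True, True]])
print(class_landslide_counts(classes, slides))  # {1: (2, 1), 2: (2, 1), 3: (2, 2)}
```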
The slopes are characterized by strong gradients between 10° and 45°. The slope gradient map was reclassified into six slope classes, which show that more than 70% of the slopes have an inclination of less than 20°. The NDVI method was chosen to build the vegetation cover map of the region. To this end, Landsat 8 satellite images were used; to simplify and limit the number of classes while preserving most of the information, NDVI values were grouped into four classes: weakly dense (F. dense), medium dense (M. dense), dense, and very dense (T. dense).
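As an illustration, NDVI is computed per pixel as (NIR - Red) / (NIR + Red); for Landsat 8 OLI, Red is band 4 and NIR is band 5. The sketch below then reclassifies NDVI and slope with `numpy.digitize`. The class limits `NDVI_BREAKS` and `SLOPE_BREAKS_DEG` are hypothetical placeholders, since the text does not give the thresholds used.

```python
import numpy as np

def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); pixels with NIR + Red == 0 stay NaN."""
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    denom = nir + red
    out = np.full(red.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def reclassify(values, breaks):
    """Map values to class codes 1..len(breaks)+1.

    Code i + 1 is assigned when breaks[i-1] <= value < breaks[i];
    values below breaks[0] receive code 1.
    """
    return np.digitize(values, breaks) + 1

# HYPOTHETICAL class limits (not given in the source).
NDVI_BREAKS = [0.2, 0.4, 0.6]           # 4 classes: F. dense, M. dense, dense, T. dense
SLOPE_BREAKS_DEG = [5, 10, 15, 20, 30]  # 6 slope classes

red = np.array([[100, 200], [50, 400]])   # toy band-4 values
nir = np.array([[300, 200], [150, 1600]]) # toy band-5 values
v = ndvi(red, nir)
ndvi_classes = reclassify(v, NDVI_BREAKS)
slope_classes = reclassify(np.array([3.0, 12.0, 45.0]), SLOPE_BREAKS_DEG)
```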
The land use/land cover map of the study area was derived from the interpretation of satellite imagery and Google Earth imagery, and was reclassified into five classes: forest, river, villages, agricultural area and towns.
Soil lithology has a significant impact on slope stability, since different lithological units have different susceptibilities. The lithology map was derived from the geological map and reclassified into seven classes according to the main units: clays, gneiss-granites, flysch, marls, shales, sandstones and alluvium.
Precipitation plays an important role in reducing shear strength by increasing pore pressure, and it is widely used in susceptibility analysis. The mean annual precipitation map was produced by interpolating isohyet curves (kriging) and then reclassified at 100 mm/year intervals into five classes: 600-700, 700-800, 800-900, 900-1000 and > 1000 mm/year.
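The rainfall reclassification can be sketched on an already interpolated precipitation grid (the kriging step itself is not shown). The boundary convention is an assumption: a value exactly on a limit (e.g. 700 mm) goes to the upper class, and any value below 600 mm falls into the first class.

```python
import numpy as np

# Upper limits of the first four classes (mm/year); the fifth class is > 1000.
RAIN_BREAKS = [700, 800, 900, 1000]
RAIN_LABELS = ["600-700", "700-800", "800-900", "900-1000", ">1000"]

def reclassify_rainfall(rain_mm):
    """Return class codes 1..5 for mean annual rainfall values in mm/year."""
    return np.digitize(rain_mm, RAIN_BREAKS) + 1

codes = reclassify_rainfall(np.array([650.0, 700.0, 850.0, 999.9, 1200.0]))
labels = [RAIN_LABELS[c - 1] for c in codes]
```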

Determination Of Influencing Factors
In this step, the influencing factors were determined, that is, those among the initial conditioning factors that have a major effect on landslide occurrence. The relationship between the landslide distribution and the landslide control factors is reflected by the weight values of the parameter classes, derived from the percentage of variance explained. This was done using PCA, a multivariate statistical method that effectively eliminates the redundancy of information arising when a large number of parameters are analyzed, reduces the number of variables and limits their interdependence.
A negative weight value for a parameter class means that its presence may not contribute to landslide occurrence, whereas a positive weight value means that the presence of that parameter class can increase the likelihood of landslides. A weight value around zero (± 0.1) indicates neither the presence nor the absence of landslides (Van Westen et al., 2003). Magliulo et al. (2008) showed that using landslide source (scarp) zones rather than landslide bodies in the statistical analysis gives significant results in the susceptibility assessment.
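The reading rule for class weights can be written as a small function. Treating exactly ±0.1 as "indeterminate" is an assumption, and the example weights are hypothetical, not the values obtained in this study.

```python
def interpret_class_weight(weight, tolerance=0.1):
    """Qualitative reading of a parameter-class weight (rule stated in the text)."""
    if abs(weight) <= tolerance:
        return "indeterminate"
    return "favours landslides" if weight > 0 else "does not favour landslides"

# HYPOTHETICAL class weights, for illustration only.
weights = {"clays": 0.85, "alluvium": -0.42, "sandstones": 0.05}
readings = {name: interpret_class_weight(w) for name, w in weights.items()}
```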
To perform the principal component analysis on the classified variables, an unrotated PCA was first carried out using the correlation matrix (Table 2), so that all variables are given the same weight. The correlation matrix describes the associations between the active variables (following the recommendation of Lebart et al. (2000)). Table 2 shows that the correlations between the variables are generally not very high: only NDVI and altitude are strongly correlated with each other, as are causes and land use.
A first strategy is simply to inspect the correlation matrix. The off-diagonal elements take low absolute values when the variables are not closely related, in which case it is illusory to hope for an efficient synthesis in a small number of factors. Some variables are strongly correlated and others less so. It is difficult to set a threshold value above which exploitable links can be said to exist; the matrix gives above all a "general impression of correlation".
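Inspecting the correlation matrix can be automated by listing the variable pairs whose absolute Pearson correlation exceeds a chosen threshold. As the text notes, any such threshold is arbitrary; the value 0.5 and the toy data below are assumptions for illustration only.

```python
import numpy as np

def strongly_correlated_pairs(data, names, threshold=0.5):
    """Return the correlation matrix and the pairs with |r| >= threshold.

    data: array of shape (n_pixels, n_variables), one column per classified factor.
    """
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(r[i, j]), 3)))
    return r, pairs

# Toy data: column b is an exact multiple of a; c is only weakly related to a.
toy = np.array([[1, 2, 4], [2, 4, 1], [3, 6, 3], [4, 8, 2]])
r, pairs = strongly_correlated_pairs(toy, ["a", "b", "c"])
```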
It is not always easy to find, from the correlation matrix, groups of variables that behave in roughly the same way; principal component analysis provides a synthesis of these links. Extracting the important components amounts to a rotation of the original variable space that maximizes variance, and a varimax rotation is subsequently applied to the retained components (Table 3). The first result is the list of eigenvalues and the percentages of variance calculated from the correlations (Table 3 and Fig. 7). The principal components retained in this study are those that contribute at least as much to the total variance as a single original variable (eigenvalue > 1). On this criterion, the first three (03) components are selected, representing about 65% of the variance. These components can be discussed in terms of factor loadings and communalities (common variances).
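The eigenvalue table and the Kaiser criterion (eigenvalue > 1) can be reproduced from any correlation matrix with `numpy.linalg.eigh`. The 4-variable matrix below is a toy example, not the correlation matrix of Table 2.

```python
import numpy as np

def pca_on_correlation(r):
    """Eigen-decompose a correlation matrix and apply the Kaiser criterion.

    Returns eigenvalues (descending), % of total variance per component,
    cumulative %, number of components with eigenvalue > 1, and eigenvectors.
    """
    r = np.asarray(r, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(r)   # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pct = 100.0 * eigvals / eigvals.sum()  # trace = number of variables
    n_keep = int(np.sum(eigvals > 1.0))
    return eigvals, pct, np.cumsum(pct), n_keep, eigvecs

# Toy correlation matrix with two correlated pairs of variables.
r = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.8, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.6],
              [0.0, 0.0, 0.6, 1.0]])
eigvals, pct, cum, n_keep, vecs = pca_on_correlation(r)
```

In this toy case the eigenvalues are 1.8, 1.6, 0.4 and 0.2, so two components are kept and together explain 85% of the variance.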
A factor loading represents the correlation between an original variable and a new component (Table 4 and Fig. 7) and indicates the contribution of each variable through this correlation. The matrix of factor loadings, with the original variables in rows and the principal components in columns, allows each component to be described by its strongest correlations with the original variables. Table 5 presents the matrix of factor loadings obtained after a "normalized varimax" rotation for the first three (03) components considered useful for the analysis. The representation of the variables in the two factorial planes is illustrated in Fig. 8, and the parameters selected according to the principal factors are presented in Fig. 9. The principal component analysis provided a better understanding of the interrelationships among the explanatory variables used in the modeling and highlighted their redundancy. According to this analysis, these variables can be grouped into three main components. Therefore, a model combining three (03) data layers describing the spatial distribution of each selected principal component should allow a reliable prediction of landslide susceptibility. The PCA clearly shows that three (03) of the variables already assumed to play a predominant role in reactivation processes, namely lithology, slope angle and precipitation, are not very redundant. To maximize the effect of the contributing factors, we attempted to retain up to five (05) variables.
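Loadings are obtained by scaling each retained eigenvector by the square root of its eigenvalue, and the "normalized varimax" rotation can be sketched with the common SVD-based varimax iteration combined with Kaiser row normalization. This is a generic implementation, not necessarily the algorithm of the software used by the authors; it assumes no variable has zero communality. Because the rotation is orthogonal, communalities (row sums of squared loadings) are unchanged.

```python
import numpy as np

def loadings_from_eig(eigvals, eigvecs, k):
    """Factor loadings (variable-component correlations) of the first k components."""
    return eigvecs[:, :k] * np.sqrt(eigvals[:k])

def varimax(loadings, normalize=True, max_iter=500, tol=1e-10):
    """Orthogonal varimax rotation; normalize=True applies Kaiser row normalization."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    h = np.sqrt(np.sum(L ** 2, axis=1)) if normalize else np.ones(p)
    A = L / h[:, None]
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        B = A @ R
        # Gradient of the varimax criterion, projected onto an orthogonal matrix via SVD.
        G = A.T @ (B ** 3 - B @ np.diag(np.sum(B ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        d_old, d = d, np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return (A @ R) * h[:, None], R

# Toy unrotated loadings (4 variables x 2 components), for illustration only.
L = np.array([[0.7, 0.5], [0.6, 0.6], [0.5, -0.6], [0.6, -0.5]])
rotated, R = varimax(L)
communalities = np.sum(L ** 2, axis=1)
```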

Landslide Susceptibility Mapping
In this section, landslide susceptibility maps of the Kabylia region are established in ArcGIS. The weights obtained from the multivariate statistical processing (Table 6) were used to establish the slope-failure susceptibility index; these weight values serve as coefficients of a linear function. Each selected factor layer (parameter) influencing slope instability was assigned its relative weight (Fig. 10). The landslide susceptibility values (named "LS") were calculated with the following equation:

$$LS = \sum_{n} W_n \, f_n$$

where f_n is the stability-factor value of the variable considered, evaluated according to the influence of each factor on site stability, and W_n is the weight assigned to factor f (the index n denotes the factor considered) (Montoya-Montes et al., 2012).
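Outside ArcGIS, the weighted linear combination LS = Σ W_n f_n is a pixel-wise weighted sum of aligned rasters. The factor scores and weights below are hypothetical and do not reproduce Table 6.

```python
import numpy as np

def susceptibility_index(factor_rasters, weights):
    """LS = sum_n W_n * f_n, evaluated pixel by pixel on aligned rasters."""
    if len(factor_rasters) != len(weights):
        raise ValueError("one weight per factor raster is required")
    ls = np.zeros(np.shape(factor_rasters[0]), dtype=float)
    for f, w in zip(factor_rasters, weights):
        ls += w * np.asarray(f, dtype=float)
    return ls

# HYPOTHETICAL factor scores f_n and weights W_n.
slope_score = np.array([[1.0, 2.0]])
litho_score = np.array([[3.0, 4.0]])
ls = susceptibility_index([slope_score, litho_score], [0.5, 0.25])
```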
Landslide susceptibility ranges from 0 to 5, which makes it possible to distinguish five levels: very low (0 < S ≤ 1), low (1 < S ≤ 2), moderate (2 < S ≤ 3), high (3 < S ≤ 4) and very high (4 < S ≤ 5) (Montoya-Montes et al., 2012). The weighted maps were then produced in ArcGIS and rasterized with the Spatial Analyst tools. After rasterization of the factor maps, the landslide susceptibility index map was generated by linear combination in ArcGIS (the sum of all weighted raster maps using the Raster Calculator in Map Algebra).
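The five intervals, each closed on its upper bound, map directly onto `numpy.digitize` with `right=True`. Assigning LS = 0 to "very low" is an assumption, since the text's first interval excludes 0.

```python
import numpy as np

LEVELS = ["very low", "low", "moderate", "high", "very high"]

def susceptibility_level(ls):
    """Map LS in [0, 5] to the intervals (0,1], (1,2], (2,3], (3,4], (4,5]."""
    ls = np.asarray(ls, dtype=float)
    if np.any((ls < 0) | (ls > 5)):
        raise ValueError("LS is expected to lie between 0 and 5")
    # right=True makes each interval closed on its upper bound.
    return np.digitize(ls, [1, 2, 3, 4], right=True)

idx = susceptibility_level([0.5, 1.0, 1.5, 3.0, 4.2, 5.0])
names = [LEVELS[i] for i in idx]
```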
The LSM was classified into five categories (very low, low, moderate, high and very high susceptibility) using natural breaks (Fig. 11). As shown in Fig. 11, the "high" susceptibility classes are mainly located on slopes of varied gradient, generally in the low to medium slope classes (up to 25°), in areas whose lithology is dominated by marls and clays, with low- or medium-density vegetation cover and annual rainfall of 600 to 900 mm. The central and south-eastern parts of the region are mostly classified as "low" to "moderate" susceptibility. These observations are in good agreement with expert opinion. The distribution of landslides relative to the susceptibility map is shown in Fig. 12.
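Natural-breaks classification seeks class limits that minimize the within-class sum of squared deviations. The sketch below solves that objective exactly by dynamic programming (Fisher's method, on which Jenks natural breaks is based); it is O(k·n²) and intended for a sample of pixel values, and ArcGIS's own implementation may give slightly different limits.

```python
def natural_breaks(values, k):
    """Return the k-1 upper class limits minimizing total within-class squared deviation."""
    x = sorted(values)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")
    # Prefix sums give the sum of squared deviations of x[i:j] in O(1).
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        t = s1[j] - s1[i]
        return (s2[j] - s2[i]) - t * t / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):          # first j sorted values in c classes
            for i in range(c - 1, j):      # last class is x[i:j]
                if cost[c - 1][i] == inf:
                    continue
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i
    # Backtrack to recover the upper limit of each class.
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(x[j - 1])
        j = split[c][j]
    bounds.reverse()
    return bounds[:-1]

breaks = natural_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3)
```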
The distribution plot helps determine how well the resulting maps classify existing landslide areas (Chung and Fabbri, 1999). This type of classification is suitable for easy visual interpretation of the LSM.

Validation
In this study, the density graph of the landslide susceptibility map produced by the proposed method was drawn. This graph is a suitable way to show how landslides are distributed among the susceptibility classes of an output LSM. To draw it, the landslide density of each susceptibility class (the ratio of pixels in which landslides occurred to pixels in which no landslide occurred) is plotted on a diagram. In theory, landslide density is expected to increase from the very low to the very high susceptibility classes (Pradhan and Lee, 2010). The same behavior was observed in this study, and Fig. 11 shows an increasing rate. The results show that a large proportion of the instabilities is classified as very high susceptibility.
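The density values plotted in such a graph can be computed per class using the definition given above (landslide pixels divided by non-landslide pixels), followed by a check that they increase from class 1 to class 5. The function names and the toy rasters are hypothetical.

```python
import numpy as np

def landslide_density(class_map, landslide_mask, n_classes=5):
    """Per-class ratio of landslide pixels to non-landslide pixels (classes 1..n)."""
    class_map = np.asarray(class_map)
    landslide_mask = np.asarray(landslide_mask, dtype=bool)
    densities = []
    for c in range(1, n_classes + 1):
        in_class = class_map == c
        slid = int(np.sum(in_class & landslide_mask))
        stable = int(np.sum(in_class & ~landslide_mask))
        densities.append(slid / stable if stable else float("inf"))
    return densities

def is_non_decreasing(seq):
    return all(a <= b for a, b in zip(seq, seq[1:]))

# Toy flattened rasters: susceptibility class per pixel and landslide presence.
class_map = np.array([1] * 4 + [2] * 3 + [3] * 2 + [4] * 3 + [5] * 4)
mask = np.array([0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0], dtype=bool)
dens = landslide_density(class_map, mask)
```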
The model was validated by comparing the existing landslide inventory and validation datasets of the study area with the resulting landslide susceptibility map. The ROC curve of the model was plotted and the area under the curve (AUC) was calculated using IBM SPSS Statistics.
As the results in Fig. 14 and Table 7 show, the closer the ROC curve lies to the upper-left corner of the plot, the higher the accuracy of the model. The AUC value of 0.782 (Table 7) is reasonably close to one, indicating good model accuracy.
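The study computed the AUC in SPSS. As a cross-check, the AUC equals the probability that a randomly chosen landslide pixel receives a higher susceptibility score than a randomly chosen stable pixel, with ties counted as one half (the Mann-Whitney formulation). The pairwise comparison below uses O(n_pos × n_neg) memory, so it assumes a sampled set of pixels.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic; labels are 1 (landslide) or 0 (stable)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both landslide and stable pixels")
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float(greater + 0.5 * ties) / (pos.size * neg.size)
```

With susceptibility class codes (1 to 5) as scores, ties are frequent, which is why they are counted as one half.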

Discussion
The results show that the "elevation" parameter has no direct influence on landslide occurrence, unlike the other parameters considered, which directly contribute to landslide occurrence and strongly affect the whole system. This factor could nevertheless not be excluded, although its weight index is low. Similar findings were reported by Rozos et al. (2008). Landslide susceptibility is directly affected by slope and increases with slope angle. In addition, some landslide susceptibility classes are influenced by lithology, whereas others are weakly affected or unaffected by this parameter. For land use/land cover, the most susceptible areas are generally located in agricultural areas where significant deforestation has been observed. The results also indicate that water is one of the most important causative factors of landslides in the Kabylia region. Indeed, according to Varnes (1984), water acts as a lubricant on a sliding surface, which facilitates landslide movements. An increase in NDVI value is associated with a gradual increase in landslide frequency.
The landslide susceptibility map (LSM) of the study area, subdivided into five classes (very low, low, moderate, high and very high susceptibility) on the basis of multivariate statistical analysis, is well suited to handling unevenly distributed data. The reliability of the LSM depends on the quality of the input parameters used to evaluate landslide susceptibility. The landslide conditioning parameters were carefully selected, and the choice is justified for many of them. The PCA also proved effective in taking the correlations between these parameters into account.
The success rate curves on the proposed model from the landslide database showed how well the model could classify the study area based on previously existing landslides.