A detailed landslide inventory was based on fieldwork and interpretation of satellite images (SPOT-6), and on a compilation of pre-existent landslide inventories. Fieldwork was conducted in year 2015 and 2016 along the main river and some tributary rivers. Field data recorded landslide type, size (length, height, and depth), and location in relation to the river. For the landslide mapping and classification, we followed the landslide hazard zonation protocol (2006) of the Forest Practices Division, Department of Natural Resources, Washington State, Cruden & Varnes (1996), and Wieczorec (1984). Landslides were classified into shallow landslides, debris flow, debris slides, deep-seated landslides, incised meanders, and rock falls; interpretation and mapping used Spot 6 from the year 2015 with spatial resolution of 1.5 m in panchromatic mode and of 6 m in multispectral mode (SIAP-SEDENA 2016). Some landslides were added from the database of Álvarez (2015). With the landslide location mapped, a point coverage was produced in the Arc/Info® program version 10.0. Positioning the points corresponding to landslides over the satellite image, the landslide headscarps were identified, verified, and digitized (Fig. 2).
Two classes of dependent variables were recognized. Landslide headscarp areas and non-landslide areas were rasterized and reclassified in a dichotomous way: landslide areas as 1 (1 299 pixels, 0.97% of the total watershed area) and non-landslide areas as 0 (132 390 pixels, 99.03% of the total watershed area) with a spatial resolution of 15 m per pixel (Fig. 3).
We used stratified random sampling of sites in landslide areas and non-landslide areas to conduct MLR-in situ and MLR-CNSA analysis (McGrew and Monroe 2000; Pardo and Ruíz 2002). Of 512 sites selected, 384 (75%) were used for the calibration of the models and 128 (25%) for model validation. The same number of sample sites was sampled from the landslide areas and non-landslide areas: 192 from each class for the calibration, and 64 from each for the validation.
Thematic maps and second-derivative products at a scale of 1:50 000 were used as the independent variables. The thematic cartography was standardized to a raster format with a spatial resolution of 15 m per pixel. The 16 independent cartographic variables were processed and coded according to the type of data that they represented: six were qualitative (land use, land use change, geomorphology, edaphology, lithology, and soil texture) (Table 1) and the other ten were quantitative (altitude, terrain steepness, down-slope direction, terrain surface curvature, terrain roughness, slope length, distance from faults and fractures, distance from roads, and normalized difference vegetation index) (Table 2). For the MLR-in situ analysis, the qualitative cartographic variables were coded as categorical (string variable) and the quantitative variables by using their raw values (numeric variable). For the MLR-CNSA analysis, the qualitative cartographic variables were coded using dummy variables. Dummy coding is a way of representing groups of a cartographic qualitative variable using only zeros and ones. To do this, we created several variables, one for each cartographic variable class. For instance, the qualitative cartographic variable of land use change had 3 classes: permanence, changes, and deforestation. For each of these classes, a dummy variable with 0 and 1 was created (Table 1, Fig. 4). For the quantitative cartographic variables, the raw values were used (Table 2, Fig. 5).
Table 1
Independent variables of qualitative type
Variable / Class
|
Variable / Class
|
Land use 2015 (Castro, 2020)
|
Geomorphology (García, 2017)
|
Human settlements
|
Dome 1
|
Farming
|
Dome 2
|
Fragmented forest
|
Dome 3
|
Semi-dense forest
|
Dome 4
|
Dense forest
|
Nevado de Toluca Cone
|
Natural grassland
|
Upper Hillslope
|
Induced grassland
|
Lower Hillslope
|
Without vegetation
|
Lava Flow Mesa
|
Other types of vegetation
|
Dome 5 partially buried
|
Land use change 1983–2015 (Castro, 2020)
|
Lava hillslopes covered with pumice and ash
|
Permanence
|
Debris avalanche hillslopes covered with pyroclast
|
Changes
|
Pumice hillslopes and fall deposits
|
Deforestation
|
Pumice hillslopes, block and ash
|
Edaphology (INEGI, 2001a)
|
Lava hillslopes covered with pyroclast
|
Humic Andosol
|
Falling pumice hillslopes and pyroclastic flows
|
Molic Andosol
|
Pyroclastic hillslopes
|
Ochric Andosol
|
Lava flow covered with pumice and ash flows
|
Haplic Pheozems
|
Lava flow partially covered with pyroclast
|
Eutric Fluvisol
|
Alluvial fan
|
Leptosol
|
Floodplain
|
Eutric Regosol
|
Lithology (INEGI, 2001b)
|
Soil texture (INEGI, 2001a)
|
Intermediate extrusive igneous
|
Gross
|
Alluvial
|
Medium
|
Volcanic breccia
|
|
Tuff
|
|
Andesite
|
|
Basalt
|
Table 2
Independent variables of quantitative type
Variable / Units
|
Variable / Units
|
Altitude m a.s.l. (INEGI, 2013a; ESRI, 2016)
|
Slope length (INEGI, 2013a; FSFI, 2016)
|
Units meters above sea level.
|
Units meters.
|
Terrain steepness (INEGI, 2013a; ESRI, 2016)
|
Distance from faults and fractures (INEGI, 2001b; ESRI, 2016)
|
In degrees.
|
Units meters.
|
Down-slope direction (INEGI, 2013a; ESRI, 2016)
|
Distance from rivers (INEGI, 2010; ESRI, 2016)
|
Expressed in azimuthal measure; the value − 1 corresponds to flat areas.
|
Units meters.
|
Terrain surface curvature (INEGI, 2013a; ESRI, 2016)
|
Distance from roads (INEGI, 2013b; ESRI, 2016)
|
Positive values for convex surfaces, negative values for concave surfaces, and 0 for flat areas.
|
Units meters.
|
Terrain Roughness (INEGI, 2013a; Sappington et al., 2007)
|
Normalized Difference Vegetation Index (NDVI) (SIAP-SEDENA, 2016; ESRI, 2016)
|
values between 0 and 1, where 0 corresponds to flat areas, 1 corresponds to maximum roughness.
|
Values between − 1 to 1; values close to -1 correspond to water or snow and values close to 1 indicate vegetation with a high chlorophyll content.
|
A multicollinearity diagnostic was calculated for the quantitative variables by using the variance inflation factor (VIF). A VIF value greater than 10 is indicative of a serious multicollinearity problem (Field 2013).
MLR-in situ was calculated with the values extracted from the 16 thematic variables for the sample sites. To extract the values, 384 sites corresponding to the calibration sample were coded 1 in areas with landslides and 0 in areas with no landslides. Using the values, an intercept β0 and β coefficients were calculated by using logistic function (Eq. 1) under SPSS (Pardo and Ruíz 2002).

where:
Y is the probability that a process will occur.
X 1 to Xk are the independent variables that are part of the model.
β0 is the constant in the model coefficients.
β1 to βk are the independent variables coefficients.
With SPSS, forward MLR examined the variables in the model to see whether any variable should be removed. This method begins the model with only the constant β0, then selects the independent variable that has the highest simple correlation with the outcome. If this variable significantly improves the ability of the model to predict landslides, then this predictor is retained in the model and the computer searches for a second predictor. The process is repeated until all variables have been evaluated (Spiegel and Stephens 2009).
The MLR-CNSA analysis required, first, use of the Focal Statistics tool in ArcMap for a search radius to extract pixel values for each cartographic variable based on a distance ratio. The distance ratio allowed consideration of the values f the neighborhood pixels that surround a landslide. The specific neighborhood in our case was concentric circles with radii from 1 pixel to 20 pixels. For the qualitative cartographic variables, the values assigned to each variable class was the sum of values within each concentric circle; to do this, we used the Focal Statistic in ArcMap to calculate the Sum statistic for the maps of dummy variables with 0 and 1. For the quantitative cartographic variables, the value assigned to each variable class was the mean of values within each concentric circle; this used the mean function of the Focal Statistics tool in ArcMap. A data matrix was obtained for each neighborhood area analyzed, from a 1-pixel radius to 20 pixels. The matrix thereby generated contained the data corresponding to the sampling site and the number of pixels of each class within the neighborhood area that made up the variable (Tables 3 and 4, and Fig. 6). Hence, a unique ASCII matrix was obtained for each cartographic variable, leading to 320 matrices (16 thematic variables x 20 neighborhood areas).
Table 3
Example of qualitative matrices from the CNSA for the land-use change variable at distances of 1 and 20 pixels
Sum of the CNSA at 1-pixel diameter
|
Sum of the CNSA at 20-pixels diameter
|
Site
|
Process
|
Permanence
|
Changes
|
Deforestation
|
Site
|
Process
|
Permanence
|
Changes
|
Deforestation
|
230
|
0
|
5
|
0
|
0
|
230
|
0
|
1 257
|
0
|
0
|
231
|
0
|
0
|
0
|
5
|
231
|
0
|
941
|
0
|
316
|
232
|
0
|
5
|
0
|
0
|
232
|
0
|
1 074
|
0
|
183
|
489
|
1
|
3
|
0
|
2
|
489
|
1
|
941
|
0
|
316
|
490
|
1
|
1
|
0
|
4
|
490
|
1
|
966
|
0
|
291
|
Table 4
Example of quantitative matrices from the CNSA for the altitude variable at distances of 1 and 20 pixels
Mean of the CNSA at 1-pixel diameter
|
Mean of the CNSA at 20-pixels diameter
|
Site
|
Process
|
Altitude
|
Site
|
Process
|
Altitude
|
230
|
0
|
2 888
|
230
|
0
|
2 880
|
231
|
0
|
3 075
|
231
|
0
|
3 066
|
232
|
0
|
3 046
|
232
|
0
|
3 011
|
489
|
1
|
2 981
|
489
|
1
|
3 012
|
490
|
1
|
2 975
|
490
|
1
|
3 008
|
The next stage in the MLR-CNSA analysis required selection of the neighborhood radius value. MLR was used to evaluate the relationship of the information obtained from the CNSA, and to develop the landslide probability model. Among the statistics of the MLR model obtained with SPSS, the − 2 log likelihood (-2 LL) assesses the level of probability from the model's result by comparing the observed cases with the expected ones (Martín et al. 2008). A smaller value of the − 2 LL indicator signifies a better adjustment for the statistical model, with 0 being the optimal value. The − 2 LL indicator can be compared between models, and the values show the level of adjustment from one model to another (Hair et al. 1999). Therefore, -2 LL can be used to evaluate the data obtained with the CNSA. For each variable, the distance of the neighborhood analysis that reports the best fit of the MLR model with the landslide process was selected, based on the lowest value of -2 LL within the range analyzed (Castro 2020; Castro and Legorreta 2019).
With the selected distances and their neighborhood values, a single matrix was integrated (Fig. 7) for the 16 independent variables. The conditional forward MLR model was applied to this matrix to create the model.
For an objective comparison between the MLR-CNSA and MLR-in situ models, a binary classification scheme was applied (landslide and non-landslide areas). The cut-off point between the two classes for each model was obtained by ROC curve analysis; this maximizes successes in both classification classes (Pardo and Ruíz 2002). Once the cut-off point had been obtained, spatial models of landslide probability were compared to identify possible advantages or disadvantages.
The accuracy in the predictive capability of the MLR-CNSA and MLR-in situ models was evaluated from the ROC curve; the area under the curve corresponded to the proportion of elements correctly classified with the MLR (Pardo and Ruíz 2002). The model maps were evaluated with a contingency table obtained from the overlay between inventory and models; the evaluation was made for the enlarged sample. The contingency table evaluates the classificatory capacity of the discriminant functions such as MLR (Hair et al. 1999). The contingency table yielded the kappa index that expressed the degree of agreement between the real group and predicted group (Pardo and Ruíz 2002). Other statistics calculated were as follows: the overall accuracy, which is the percentage of elements correctly classified (Nemmaoui et al. 2013; Pardo and Ruiz 2002); the producer’s accuracy, which is the ratio of elements correctly classified in a class divided by the total of real elements of the same class (Legorreta et al. 2012); the user’s accuracy, which is the ratio between the elements correctly classified in a class and the total elements classified by the model in that same category (Legorreta et al. 2012); and the model efficiency, which is the ratio of correctly classified elements of a class, minus the incorrectly predicted elements of the same category, divided by the total elements of the real group (Legorreta et al. 2012). In all of these statistics, the MLR-CNSA model had better results, which reflected its more exact spatial representation with respect to the landslides identified in the field.