This section introduces the general workflow of the study, the study area, the employed datasets and their processing steps as well as the methodology of the exposure and social vulnerability analysis. Moreover, a conceptual note explains the assumptions taken in the context of this paper.
We developed the workflow shown in Figure 1 for the long-term multitemporal, intra-urban and meso-scale landslide exposure and social vulnerability assessment. First, for the generation of urban masks for three time-steps (1994, 2006 and 2018), we perform land cover (LC) classifications based on Landsat mosaics (a); second, we divide the built-up area of the city into two thematic groups: using the informal settlement layer from Kühnl et al. (2021), we classify informal and formal settlement areas to approximate social groups (b); third, population is estimated at the pixel level using disaggregation methods (c) and specified into population of informal settlements (d). These results (i.e., multitemporal urban masks, informal settlements and population) are combined with the landslide hazard map for the exposure and social vulnerability analysis (e).
Fig. 1 Workflow of the study composed of land cover classifications based on Landsat data in order to conduct multitemporal urban masks (a), multitemporal informal mask calculations based on an informal settlement layer from Kühnl et al. (2021) (b), population disaggregation methods based on census data separating between formal (c) and informal areas (d) as well as the exposure and as social vulnerability analysis using all pre-processed data and a landslide hazard map (e)
2.1 Study area
Our study area, the city of Medellín, is the second largest city in Colombia. It is the capital of the Department of Antioquia as well as of the Metropolitan region of the Aburrá Valley, a political and administrative unit of ten municipalities with a population of 3.5 million (Figure 1b; Garcia Ferrari et al. 2018; Hernandez Palacio 2012).
The municipality of Medellín (Figure 2, white boundary) is composed of urban (Medellín and San Antonio) and rural parts (Figure 2a and c). The area of interest (AOI) in this study refers to the urban, expansion and urbanized areas of Medellín (Figure 2a). Expansion areas are in the process of getting added to the administrative urban areas, but do not yet fully belong to this planning level (Alcaldía des Medellín, 2014 a, b). Urbanized areas are officially rural zones, but we include them into our AOI since they are characterized by mostly informal urban structures growing in the city’s outskirts. This allows the analysis to reflect the built-up conditions beyond administrative boundaries.
Geographically, Medellín is situated in a 14 km north-south expansion in the Aburrá Valley between two mountain ranges of the Andes in the west and in the east, crossed by the Medellín river running from north to south along the valley. The valley itsself has a 10 km maximum width and the height difference between the highest and lowest point is about 1 km (Garcia Ferrari et al., 2018; Hernandez Palacio, 2012). These characteristics lead to a very steep topography of the valley slopes in the east and west with a significant landslide risk (Claghorn & Werthmann 2015).
Socio-economically, the living conditions get worse with distance to the Medellín river and higher up in the mountains. Especially, the landslide and flash-flood prone slopes in the west and east of the city are mainly occupied by informal settlers. Also, a north-south seggregation can be identified. The neighborhoods with low unemployment rates are located in the south-east of the city whereas the contrary is dominant in the north-east and -west (Garcia Ferrari et al., 2018).
Fig. 2 Location of the study area. Urban, expansion and urbanized areas of the Municipality of Medellín within the Aburrá Valley in Colombia. In the background the hillshade of a digital elevation model shows the topography (DEM AMVA: Open data Medellín)
2.2 Data
In this study we relied on long-term multitemporal data; therefore, we used Landsat satellites imagery at a 30-meter resolution to derive land cover (LC) maps, with a special focus on the urban layout. To do so, we created cloud-free Landsat mosaics for three time-steps: 1994, 2006 and 2018. We used the Google Earth Engine platform (Gorelick et al. 2017) for building the cloud-free mosaics and downloading the resulting images. Since the time span is fairly wide, we used the atmospherically corrected surface reflectance datasets from Landsat 5 ETM (L5), Landsat 7 ETM+ (L7), and Landsat 8 OLI/TIRS (L8) sensors. We filtered cloudy pixels by masking low-quality pixels using the pixel_qa band. Due to the tropical climate in Medellin, the chances to get cloud free pixels are quite low, and thus, we set long-term periods for the mosaicking to obtain good enough results for each date. Hence, we used imagery from the year 1989 to 1994 to create the 1994 mosaic, images from 2003 to 2006 to create the 2006 mosaic, and from 2013 to 2018 to create the 2018 mosaic. For the LC classification, we selected a subset of spectral bands and calculated additional indices. On top of the visible red, green and blue (RGB), near infrared (NIR) and short-wave infrared (SWIR) bands (bands 1,2,3,4,5,7 in L5 and L7, and bands 2,3,4,5,6,7 in L8), we calculated the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Water Index (NDWI) and the Normalized Difference Buildings Index (NDBI) as well as their 10th, 25th, 50th, 75th and 90th percentiles. The NDVI gives information about the greenness of vegetation (Rouse et al. 1973), the NDWI indicates water features (positive values) or soil and terrestrial vegetation (negative values; McFeeters 1996) and the NDBI acts as indicator for built-up areas (Zha et al. 2003). As a result, each image mosaic (see Figure 1a) has 24 bands (composed of spectral bands, indices and percentiles). These indices and their percentiles provide additional information for training the LC classification algorithm.
Secondly, to retrieve information on the social vulnerability, we proxied the socio-economic status based on the morphologic characteristics of the living environment and built-up structures. We used an informal settlement mask based on a scene-based Local Climate Zone (LCZ) classification of Medellín, performed with the use of a very high-resolution satellite image from the year 2019 and urban blocks (Kühnl et al. 2021). The lightweight low-rise class is an urban structural type within the LCZ schema (Stewart & Oke 2012), which shows typical morphological features of informal settlements like high density, small and low-rise buildings, lightweight construction materials and sparse vegetation. We extracted the lightweight low-rise polygons with their centroid in our AOI (Figure 3a and 1b). Figure 3b and c show two examples of the neighborhoods and building types identified as informal. The accuracy of the informal settlement layer has been measured at 86% (Kühnl et al. 2021). For this study, we manually checked and corrected over- and under classifications to improve the informal settlement mask.
Fig. 3 (a) Location of formal and informal settlements in Medellín based on the improved lightweight low-rise built-up. (b) Google Street View image (© Google Street View 2021) of an exemplary housing structure in an informal settlement, and (c) of one of their characteristic slopes. (d) Landslide hazard map from the POT 2014 (Plan de Ordenamiento Territorial de Medellín). The map categorizes landslide hazard into very low, low, medium and high. (e) and (f) are visual examples of the high hazardous slopes (own source)
Thirdly, to estimate the amount of population at risk over time we used population data from the official population projections (Proyecciones de Población 1993-2005 a 2015 de Medellín; Alcaldía de Medellín 2015) for the years 1994 and 2006 and the Colombian census (Censo Nacional de Población y Vivienda; DANE 2018) for the year 2018 (Figure 4). We used these dates since the projections coincide with former census surveys from 1993 and 2005 and the uncertainty in the projection is expected to be lower. Depending on the year, the population data is available in different spatial units (i.e., commune, neighborhood, sector and section levels; Figure 4). Data from 2018 is existing at different spatial levels, therefore we use the sector level to disaggregate the population to the pixel level and the section level, with higher detail, to validate the result.
Fig. 4 Population density for the years 1994, 2006 and 2018 as well as the different spatial levels of availability. In 2018 the sector level is used to disaggregate the population to the pixel level and the section level to validate the result
Finally, we use an open source landslide hazard map for the whole municipality in vector format provided by the land use plan from Medellín (Plan de Ordenamiento Territorial de Medellín; POT 2014). The hazard map was created by combining all risk-related information available (Figure 3d). It relies on hazard maps from the POT 2006 and from the National University of Colombia from 2009, as well as mass movement inventories by the Administrative Department of Disaster Management (DAGRD), morphodynamic process maps and all geotechnical and slope stability studies carried out for the municipality of Medellín since 2006. It categorizes landslide hazard into very low, low, medium and high susceptibility zones (Alcaldía de Medellín 2014c). Hence, it is the best hazard map that is available at the city level, since it includes multi-source data and it was created by local experts.
2.3 Conceptual note
Landslide estimation, exposure, and population vulnerability is a multifaceted and highly complex problem. This requires highly accurate, diverse and, in our approach, multi-temporal data sets. Since these are not available in the necessary spatial and thematic depth, nor are they consistent, complete, and given over time, we must make some conceptual assumptions based on the available data.
First, we analyze landslide risk using a hazard map from 2014 assuming that the hazard areas keep constant over time. In this study, we consider two risk levels, risk areas when the hazard of a landslide in the POT is medium or high, and non-risk areas when the hazard is low or very low.
Secondly, with respect to exposure and the capabilities of remotely sensed data available since the 1990s, we make the following assumptions: pixels classified as urban that are categorized as ‘formal’ in 2018, are per definition also ‘formal’ in previous time steps if they were identified as urban in 1994 and 2006, otherwise they are non-urban pixels. Similarly, urban pixels categorized as ‘informal’ in 2018 are also informal in previous time steps, when the pixels are urban. Therefore, formal development does not change to informal over the time, since this transition is very unlikely. At the same time, we disregard informal settlement upgrade, since we do not have data on informality over time as our informal settlement mask is from 2019.
Thirdly, socio-economic indicators allowing to approach the social sphere in a multi-temporal manner are inexistent, especially on such a high spatial granularity. Thus, we tackle social vulnerability using morphological parameters as proxy. We assume that people living in informal settlements are more vulnerable. We assume their economic capabilities reduce the chances of recovering after a disaster. Besides, the definition of morphologic informality differs slightly to informal settlements from a socioeconomic perspective; however, as Kühnl et al. (2021) show, it is a fairly accurate proxy of precariousness and informality.
2.4 Land cover classification
Our focus for the analysis of exposure towards landslide hazard are settlement areas. In this regard, we use Landsat mosaics for the three time steps to classify Medellín into basic land covers (urban, open vegetation, forest, bare soil). We apply a supervised pixel-based classification method using the Random Forest (RF) machine learning algorithm (Breiman 2001). In preparation, we manually create ground truth sample data. This is done independently for the three time steps by visual interpretation with the help of historical and very high resolution images from Google Earth©. The sample data are polygons covering areas of homogeneous land cover, which are equally split into spatially independent training and testing polygons (50/50). The image pixels for the training and testing datasets are subsequently randomly selected from the respective polygons. We use the training dataset to build a RF model for each year that is validated against the testing dataset of the respective year. Using spatially independent pixels in the validation, we thus avoid spatial correlation between classification and evaluation. Table 1 shows the accuracy of the RF classifications for the different thematic classes and the overall accuracy for each time step.
Table 1 Land cover classification accuracy metrics per year
|
1994
|
2006
|
2018
|
Land Cover
|
User’s Accuracy
|
Producer’s Accuracy
|
User’s Accuracy
|
Producer’s Accuracy
|
User’s Accuracy
|
Producer’s Accuracy
|
Urban
|
96.40
|
99.60
|
92.39
|
99.43
|
89.97
|
99.79
|
Open vegetation
|
86.82
|
99.56
|
98.28
|
97.22
|
99.75
|
99.86
|
Forest
|
95.28
|
83.66
|
96.75
|
98.63
|
83.59
|
93.32
|
Bare soil
|
89.17
|
48.43
|
97.13
|
70.33
|
75.94
|
32.30
|
Overall Accuracy (OA)
|
90.85
|
96.45
|
93.61
|
We use the urban land covers to build the urban masks for 1994, 2006 and 2018. In order to diminish temporal fluctuation in the data, we apply a change trajectory analysis similar to the ones applied by Taubenböck et al. 2012. This approach solves errors from the classifier when it is unable to correctly detect urban areas over time. The assumption is that urban pixels cannot change to non-urban pixels. Therefore, several rules where established following the majority rule. For example, if a pixel is urban in 1994 and 2018, but it is non-urban in 2006, we change the status to urban pixel in 2006 to keep consistency. Similarly, if a pixel is urban in 1994 and 2006, it should be also urban in 2018. However, if a pixel is urban in 1994, but it is non-urban in 2006 and 2018, the state of 1994 is changed to non-urban.
2.5 Identification of informal settlements
In order to produce the multitemporal informal settlement masks, we rely on the urban block level vector dataset with informal settlements in 2019 (see section 2.2).We applied morphological filters to fill the gaps between urban blocks, such as roads, to obtain a continuous surface, and then it is transformed to raster format using the majority rule with a spatial resolution of 30 m. Finally, the informal settlement mask 2019 is spatially overlapped with the multitemporal urban masks, and matching pixels are considered informal settlements. As a result, we have three consistent informal settlement masks for the years 1994, 2006 and 2018.
2.6 Disaggregation of population to the pixel level
We estimate population at the pixel level using a top-down binary dasymetric disaggregation method originally developed by Wright (1936). This method consists of redistributing population counts from larger spatial units into smaller spatial units (Reed et al. 2018; Stevenson et al. 2015; Wu et al. 2005).
In this study, we use administrative boundaries as source zones where population counts are known (i.e., communes, neighborhoods and sectors for 1993, 2006 and 2018 respectively) and urban pixels, from the urban masks 1994, 2006 and 2018, as target zones. In a first step, we calculate for each source zone the population density by dividing the population count by the urban area. Then, the population at target zones (pixels) is estimated by multiplying the area of the pixel by the previously calculated population density from the source zone where the pixel is located. For those remaining target zones outside of the boundary of a source zone (a few urban pixels in the west and east of Medellín are not covered by the administrative units), we use the population density of the closest urban pixel. Once we have the population at the pixel level for 2018, we extract the pixels covered by the informal settlement mask from the same year to obtain the informal settlement population at pixel level. Lastly, we validate the disaggregation method with official population counts at a higher spatial detail. We use the population counts at section level from the year 2018 as validation zones to evaluate the performance of the disaggregation method (Grippa et al. 2019). Therefore, the population at pixel level is summarized using the boundaries of the validation zones, and subsequently this sum is compared to the population counts from the validation zones to measure the root mean square error (RMSE) and the RMSE divided by the mean validation zone population count (%RMSE).
The same process is then applied to the remaining time-steps 1994 and 2006 using the respective commune and neighborhood levels.
2.7 Quantifying exposure and social vulnerability to landslides
To quantify the development of exposure and social vulnerability in Medellín over time, we spatially overlapped the multitemporal assets and people (urban and informal settlement masks and pixel population, Figure 1 a, b, c and d) with the landslide hazard map (Figure 1e).
In order to estimate how much urban area and population are at risk, we first intersected the multitemporal urban masks with the hazard map to summarize the total urban area for each hazard level and year. For those pixels crossed by the boundary of a hazard level, only the proportional area of the pixel covered is summarized. Secondly, and similarly, the multitemporal population at pixel level is intersected with the hazard map and the total population is summarized. Likewise, when a pixel is crossed by the boundary only the proportional population of the pixel is summarized for each hazard level and year, and this is calculated using the population density of the pixel and the proportional area of the pixel covered. Finally, this process is replicated with the multitemporal informal settlement masks as well as the informal population at pixel level.
The results of these spatial analyses enabled the calculation of the amount of exposed and socially vulnerable areas and their population based on their spatial localization for the years 1994, 2006 and 2018 by separating into no-risk and risk areas. Since the landslide hazard map consists of four levels (very low, low, medium and high; see section 2.2), we calculate the results based on this division, but focus on risk (medium and high landslide hazard) and no-risk (very low and low landslide hazard) areas in the interpretation. We also calculated the ratio between formal and informal settlements over time, using both, the area and population. In addition, we monitored the development of urban areas and population with respect to their risk to answer the question whether exposure and social vulnerability have relatively increased over time in Medellín.