The methodology was framed to assess the vulnerability of the study area. Vulnerability is the susceptibility to suffer losses; in other words, weakened resilience in the face of a disaster. It incorporates both the intrinsic value of the elements concerned and their functional value in contributing to communal well-being in general and to emergency response and post-disaster recovery in particular. Socio-economic vulnerability arises from adverse social positioning caused by poverty, unemployment, living in hazard-prone zones, or dilapidated structures. Physical-environmental vulnerability, on the other hand, refers to the influence of topographic, hydrologic, and environmental parameters associated with flood propagation.
The flood vulnerability assessment for a region thus encompasses two phases. In phase one, the socio-economic vulnerability map was created. For a group of people within an area of identical physical-environmental conditions, the socio-economic factors can individually contribute to their vulnerability to flood hazard. The contribution of these factors is subjective and is analysed based on expert opinion; hence a multi-criteria decision analysis (MCDA) approach is used. In phase two, the physical-environmental vulnerability map was prepared. Independent of the social and economic patterns of habitation, this vulnerability varies over the study area depending on the topography and conditions of the terrain. Random forest, a highly efficient machine learning classification algorithm, is used to categorize the physical-environmental vulnerability.
The resulting spatial representations of the two phases were subjected to weighted overlay analysis to generate the vulnerability classification map following the procedure shown in Figure 1.
Socio-economic vulnerability: Indices such as age group, gender, number of members in a family, function of buildings, type of roof covering, condition of buildings, unemployment and literacy rate, occupancy, population density, building density, and land use or land cover classification of the area were considered to estimate socio-economic vulnerability. Children, women and aged people find it harder to cope with a flood event and are considered more vulnerable. Families with a higher number of individuals are also more vulnerable when their mobility during evacuation and their coping capacity are taken into account. With regard to building type, a commercial building is expected to hold a smaller population during a flood event and is given lower priority. Buildings that are well built (concrete houses) and in good condition with good roofing are considered less vulnerable, since they are better able to withstand the effects of flood and heavy rain. Unemployed people depend on others for financial aid and bear a greater economic burden during floods, so they are considered more vulnerable from the economic perspective. Similarly, illiterate people are often unaware of disaster preparedness and management and are hence more vulnerable. Furthermore, areas with higher population density and building density are more exposed to flood hazards. From the exposure viewpoint, urban and agricultural areas are given higher priority than barren land in the assessment.
The following procedure was adopted for analysing socio-economic vulnerability. First, a hierarchical structure was created from population data and land use data by analysing the various factors that influence the vulnerability of the region. The population data, obtained from the census department of India, include age, gender, number of members in a family, function of the building, roof type, condition of buildings, unemployment, occupancy (Landlord/Tenant), literacy rate, number of buildings and population density. These data were used along with land use data derived by supervised classification to form four objectives, namely population, buildings, economics and exposed elements. These were arranged in a three-level hierarchical classification model (Paulo et al., 2015; Kirby et al., 2019). ArcGIS software was used to create the map layers for the various parameters considered in the study. Next, each factor was rescaled to a linear scale of 1 to 10, and relative weights were assigned to the factors using the Analytic Hierarchy Process (AHP) (Saaty, 1977). The factors were then combined using a weighted linear combination, a simple additive weighting procedure, according to the following equation: (see Equation 1 in the Supplementary Files)
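The rescaling and additive weighting steps can be illustrated with a minimal sketch. It assumes min-max rescaling and weights that sum to one; the function names and the input values are hypothetical and are not taken from the study, and Equation 1 in the Supplementary Files remains the authoritative form.

```python
def rescale_1_to_10(values):
    """Linearly rescale raw factor values to the 1-10 range (min-max, assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant layer carries no contrast, so map it to 1.
        return [1.0 for _ in values]
    return [1.0 + 9.0 * (v - lo) / (hi - lo) for v in values]


def weighted_linear_combination(layers, weights):
    """Combine rescaled layers cell by cell as sum_i(w_i * x_i)."""
    if len(layers) != len(weights):
        raise ValueError("one weight is needed per layer")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    n_cells = len(layers[0])
    return [
        sum(w * layer[cell] for w, layer in zip(weights, layers))
        for cell in range(n_cells)
    ]


# Hypothetical values for three map cells (not data from the study).
unemployment = rescale_1_to_10([2.0, 5.0, 8.0])
illiteracy = rescale_1_to_10([10.0, 30.0, 20.0])
scores = weighted_linear_combination([unemployment, illiteracy], [0.6, 0.4])
```

Each cell of `scores` stays within the 1 to 10 range because the weights sum to one, which keeps maps at different hierarchy levels comparable.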
The weighted linear combination (WLC) aggregation method was used to map socio-economic vulnerability and to classify the region into areas of high, medium and low vulnerability to flood hazard. Since socio-economic vulnerability relates to the adaptive capacity of the population to the hazard, an area can be considered highly vulnerable if its population has less capacity to resist the impact of the natural hazard and to recover from its long-term or short-term effects. Socio-economic vulnerability is a subjective term. For example, commercial buildings are highly vulnerable from an economic point of view, while they are less vulnerable from a population point of view. This makes it necessary to evaluate socio-economic vulnerability from four different perspectives. The schematic workflow of the approach is presented in Figure 2.
Physical-environmental Vulnerability: In the present study, elevation, proximity to the river, slope, Normalized Difference Vegetation Index (NDVI), land use/land cover (LULC) patterns, Stream Power Index (SPI), and Topographic Wetness Index (TWI) were identified as the factors affecting physical-environmental vulnerability (Haghizadeh et al. 2017, Samanta 2018). Flooding in the study area is caused by heavy rains and the associated overflow of water from the river channel to nearby areas. Vulnerability to flooding therefore increases with nearness to the river. Moreover, areas with higher elevation and steeper slope are less likely to hold the excess water that causes flooding, which decreases their vulnerability. TWI and SPI are two topographic indices that influence flooding (Moore et al., 1991; Pourghasemi et al., 2013). An increase in TWI or SPI over an area increases its vulnerability to flood events. NDVI is a measure of the vegetation cover over the area. Interception losses in tree cover and infiltration of water into the ground increase with NDVI and thereby make the area less vulnerable (Wang et al., 2003). Physical-environmental vulnerability is further analysed through the land use/land cover pattern of the area. Built-up areas and roads do not allow water to percolate to the subsurface and can make the area more vulnerable, whereas paddy fields enhance the infiltration of water into subsurface layers and thus decrease the chance of flooding.
To create a spatial map of physical-environmental vulnerability using the Random forest model, a set of raster layers was prepared as shown in Figure 3. The number of layers depends on the number of parameters considered as hazard-inducing factors for the flood event. All indices considered in the study were derived from satellite data and converted to raster layers in a GIS platform. The following equations were used. (see Equations 2-4 in the Supplementary Files)
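For readers without access to the Supplementary Files, the three indices can be sketched in their commonly used forms: NDVI as the normalized difference of the near-infrared and red bands, TWI as ln(a / tan β) and SPI as a · tan β, where a is the upstream contributing area per unit width and β the local slope. These standard forms are assumed here and may differ in detail from Equations 2-4; the `min_slope_rad` guard for flat cells and all input values are hypothetical.

```python
import math


def ndvi(nir, red):
    """Normalized difference of near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat an empty pixel as zero NDVI
    return (nir - red) / total


def twi(specific_area, slope_rad, min_slope_rad=1e-3):
    """Topographic wetness index ln(a / tan(beta)); requires specific_area > 0.

    min_slope_rad is a hypothetical guard against division by zero on flat cells.
    """
    tan_beta = math.tan(max(slope_rad, min_slope_rad))
    return math.log(specific_area / tan_beta)


def spi(specific_area, slope_rad):
    """Stream power index a * tan(beta)."""
    return specific_area * math.tan(slope_rad)


# Hypothetical cell: slope of 10 percent, specific contributing area of 100 m.
example_slope = math.atan(0.1)
example_ndvi = ndvi(nir=0.5, red=0.1)
example_twi = twi(100.0, example_slope)
example_spi = spi(100.0, example_slope)
```

In this form TWI grows as the slope flattens and the contributing area increases, while SPI grows with both contributing area and slope.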
The dataset was prepared by stacking these raster layers and served as the input to the Random forest model. The Random forest model draws bootstrap samples from the parameter values of the dataset and divides them into training and testing samples. A number of decision trees were grown to form the random forest, and the end nodes of these trees were labelled as high, medium or low vulnerability based on the training data. The training data was a separate raster layer created by interpolating known flood water levels at different locations over the study area. To minimize the uncertainties due to interpolation of flood water levels, the surface elevation data of the area was combined with the water level to obtain the required training raster. The pixel values of the training raster represent the three vulnerability zones considered in the study, so the model can be trained on three vulnerability zones, namely high, medium and low. The Random forest model then used the same set of classification trees to classify the pixels of the testing sample set, and the model accuracy was checked. When the required accuracy level was met, the whole dataset of the study area was fed to the trained RF model, which classified every pixel into one of the three zones: high, medium or low vulnerability to the flood hazard. Areas of high physical-environmental vulnerability are those at higher flood risk owing to their physical and environmental characteristics. For instance, an area at a higher elevation is less vulnerable, while an area closer to the river is more vulnerable. A combined assessment of such physical and environmental factors is used to classify the study area into different vulnerability zones.
A low-vulnerability zone is less susceptible to flooding and can be considered suitable for human settlements. The algorithm was implemented in Python.
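The text does not state how the water level and surface elevation were combined into the training raster. One plausible reading, sketched below purely as an assumption, is to take the flood depth as the interpolated water level minus the ground elevation and to bin that depth into the three classes. The thresholds `medium_from` and `high_from` are hypothetical and are not values used in the study.

```python
def flood_depth(water_level, ground_elevation):
    """Depth of water above ground; zero where the ground is above the water level."""
    return max(0.0, water_level - ground_elevation)


def depth_to_class(depth, medium_from=0.5, high_from=1.5):
    """Bin a flood depth (m) into a vulnerability label using hypothetical thresholds."""
    if depth >= high_from:
        return "high"
    if depth >= medium_from:
        return "medium"
    return "low"


# Hypothetical cells: interpolated water level and ground elevation, both in metres.
water_levels = [6.0, 6.0, 6.0]
elevations = [4.0, 5.2, 7.0]
training_labels = [
    depth_to_class(flood_depth(w, z)) for w, z in zip(water_levels, elevations)
]
```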
Combined Vulnerability to Floods: A region can be considered vulnerable to flood if it is both socio-economically and physical-environmentally vulnerable. This concept can prevent overestimation in many cases, as a socio-economically vulnerable region may not be physical-environmentally vulnerable. Thus, the socio-economic vulnerability (SV) obtained from the MCDA approach and the physical-environmental vulnerability (PV) obtained from the Random forest method were combined using the AND operator to get the spatial distribution of vulnerable zones (V) within the study area. This can be expressed as: (see Equation 5 in the Supplementary Files)
An analysis was also performed by varying the weights of the socio-economic and physical-environmental vulnerabilities, to study how their relative contributions affect the distribution of vulnerable zones.
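Both combination schemes can be sketched on ordinal class codes. The sketch assumes that the AND operator is read as a cell-wise minimum of the two class codes (the fuzzy-logic convention) and that the weight variation is a two-term weighted sum; either may differ from Equation 5. The class coding, the function names and the sample values are hypothetical.

```python
# Ordinal class codes assumed for this sketch: 1 = low, 2 = medium, 3 = high.


def combine_and(sv, pv):
    """AND-style overlay: a cell is only as vulnerable as its lower-ranked class."""
    return [min(s, p) for s, p in zip(sv, pv)]


def combine_weighted(sv, pv, w_sv):
    """Weighted overlay with weight w_sv on SV and (1 - w_sv) on PV."""
    if not 0.0 <= w_sv <= 1.0:
        raise ValueError("w_sv must lie between 0 and 1")
    return [w_sv * s + (1.0 - w_sv) * p for s, p in zip(sv, pv)]


# Hypothetical class codes for four cells.
sv = [3, 3, 1, 2]
pv = [3, 1, 3, 2]
v_and = combine_and(sv, pv)
v_weighted = {w: combine_weighted(sv, pv, w) for w in (0.3, 0.5, 0.7)}
```

Under the minimum rule, a cell that is high in SV but low in PV is classed as low, which reflects the stated aim of avoiding overestimation.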
ANALYSIS
The study area, the data used, and the assessments of SV and PV for the study area are discussed below.
Study area
The proposed methodology was illustrated for Aluva town, a peri-urban municipality in the northern suburbs of the city of Kochi, in Ernakulam district, Kerala state, India (Figure 4). The town centre is located at latitude 10.1004° N and longitude 76.3570° E, with the Periyar River flowing through the municipality such that it almost divides the region into two. It is a town of around 25,000 residents with an area of 6.46 km² comprising 23 wards as per the 2011 census. Aluva town was the most affected municipality in Ernakulam district during the 2018 Kerala flood. The influence of dams and other hydraulic structures that regulate the flood was not considered in this study.
Data used
The data used in the study include spatial data products such as the Cartosat DEM, satellite imagery, the ward map of the municipality, Google imagery and the inundation map with flood levels. The non-spatial data include population data and associated statistical data. Details of the data used in the study are presented in Table 1.
Table 1: Data used in the study
SL NO | DATA | SOURCE | REMARKS
1 | DEM | National Remote Sensing Centre (NRSC) | Cartosat 1; spatial resolution: 1 arc second; file format: GeoTIFF
2 | LISS III | National Remote Sensing Centre (NRSC) | Resourcesat 1; spatial resolution: 23.5 m; file format: GeoTIFF; number of bands: 4 (2, 3, 4, 5); B3: 0.62-0.68 µm (Red); B4: 0.77-0.86 µm (NIR)
3 | Ward map | Aluva Municipality | Hardcopy
5 | Population data | 2011 Census data | Datasheets
6 | Flood level data and inundation map | Kerala State Disaster Management Authority (KSDMA) | Shapefiles
Analysis of socio-economic vulnerability
The hierarchical structure of the socio-economic vulnerability model is shown in Figure 5. The population objective considers age, gender, and the number of members per house as the second-level factors. The third-level classification for age comprises age below 14 years, age between 14 and 65 years, and age above 65 years, as the extremes of the age spectrum are more vulnerable to the disaster. For gender, the third-level classification is male or female, and for family size it is houses with fewer than 4 members or more than 5 members, considering that larger families have more dependents to evacuate and have to share resources.
The second-level factors for the buildings objective are function, type of roof, and condition of the buildings. Based on the type of use, buildings were classified as residential or commercial. Based on roof type, houses were classified into three groups: houses with concrete roofs, houses with tiled roofs, and houses with other types of roofs, which include thatched houses and houses made of bamboo, slate, etc. Since concrete houses can withstand the effects of flooding, they were given comparatively less weightage, and houses with other types of roofing were given higher weightage. A similar classification was followed for the condition of buildings. The buildings were classified into three groups: houses in good living condition, in liveable condition, and in dilapidated condition, with maximum weightage given to houses in dilapidated condition and minimum weightage to those in good condition.
The next objective, economics, considered unemployment, housing occupancy (Landlord/Tenant) and illiteracy as the factors. The unemployed are dependent on other family members and are considered more vulnerable. With regard to housing occupancy, tenants usually do not possess the financial means to own a house and are thus economically more vulnerable. Illiterate people were considered more vulnerable, as they generally lack the basic knowledge needed to adapt in a hazard situation.
For the exposed elements objective, the second-level factors are land use, population density, and building density. Under land use, built-up area, agricultural area and barren land were considered as the third-level factors. The urban area was considered highly vulnerable to a flood event, whereas the vulnerability of agricultural land depends on the crop season, and barren land always has relatively low vulnerability. Population density and building density are also important factors that influence socio-economic vulnerability in risk areas.
Following the hierarchical classification, maps were prepared at three levels. The map preparation assumes that the floating population within the study area has a negligible effect. The map layers were created in the ArcGIS platform. The higher-level maps were prepared by overlaying the lower-level maps of the corresponding factor. To avoid sharp variations of population and building densities across ward boundaries, the population and building density maps were created by pycnophylactic interpolation using the focal statistics tool in ArcGIS.
The weights were assigned to each factor using the Analytic Hierarchy Process (AHP). After creating the hierarchical structure of the socio-economic vulnerability model, the relative importance of each pair of criteria in the model was evaluated. Saaty’s 9-point continuous scale (Saaty 1980) was used to weight each criterion when creating the pairwise comparison matrices. First, a pairwise comparison matrix was created for the attributes within the third-level factors. A score of 1 indicates that both criteria are equally important, whereas a score of 9 indicates extreme importance of one criterion over the other. All scores were assembled in a pairwise comparison matrix with the value 1 on the diagonal and reciprocal scores in the lower-left triangle. The pairwise comparisons were based on an evaluation of expert opinion. An eigenvector was extracted from each comparison matrix, and the weights were assigned to each of the factors within the third level (Leal 2020). For each level in the hierarchy, the consistency of the judgements must be checked before the results are accepted. The consistency ratio (CR) is used for this check, and judgements with a CR greater than 0.1 have to be re-evaluated. By trial and error, the weights for each factor in the third level were obtained. Using these weights, weighted overlay was performed on the third-level factors to create the second-level factors. Similarly, weight estimation and overlay were performed for the second-level and then the first-level factors to obtain the final vulnerability map.
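The eigenvector extraction and consistency check can be sketched as follows. The sketch approximates the principal eigenvector by power iteration and uses Saaty's published random index values; the function name and the example comparison matrix are hypothetical and do not reproduce the expert judgements used in the study.

```python
# Saaty's random consistency index (RI) by matrix size.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency_ratio) for a reciprocal comparison matrix."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalize to sum to 1.
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    random_index = RANDOM_INDEX[n]
    ratio = consistency_index / random_index if random_index > 0 else 0.0
    return weights, ratio


# Hypothetical judgements for three age classes (rows and columns in the same order).
age_matrix = [
    [1.0, 2.0, 6.0],
    [1.0 / 2.0, 1.0, 3.0],
    [1.0 / 6.0, 1.0 / 3.0, 1.0],
]
age_weights, age_cr = ahp_weights(age_matrix)
accept = age_cr <= 0.1  # judgements with CR above 0.1 would be revisited
```

The example matrix is perfectly consistent by construction, so its CR is essentially zero; real expert judgements typically give a small positive CR.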
The factors were measured on different scales, so rescaling them to a common scale was necessary. The factors were standardised to a linear scale of 1 to 10, where 1 represents very low vulnerability and 10 represents very high vulnerability. A simple additive weighting procedure, weighted linear combination (WLC), was used to combine the criteria. The third-level maps were combined using WLC to create the second-level maps, and these second-level maps were in turn combined to generate the first-level maps for the four objectives mentioned above. The final socio-economic vulnerability map was obtained by combining the four objectives with equal weightage.
Analysis of factors affecting PV
Raster layers were created from remote sensing data using ArcGIS and Erdas Imagine software and were then used to build the dataset fed into the Random forest model. Elevation above mean sea level was one of the factors affecting physical-environmental vulnerability. A Digital Elevation Model (DEM) was used to obtain the spatial variation of elevation in the study area. The Cartosat DEM with a spatial resolution of one arc second (approximately 30 m) was used in this study. Pixels at higher elevation are considered less prone to flooding than those in low-lying areas. Slope was another aspect considered in the study; it was derived from the DEM using GIS tools and reflects the degree of topographic change. Regions with steep slopes drain water faster than flat regions, which are more susceptible to flooding. Proximity to the river was another important factor affecting vulnerability. An area closer to the river is considered more likely to be affected by flooding. Since both proximity to the river and elevation are incorporated in the study, combined situations, such as an area near the river but at higher elevation, were considered less vulnerable. Proximity to the river was estimated by computing the Euclidean distance of every point in the area from the digitized line feature (from the LISS III image) representing the river. In addition, two major topographic indices were considered: the topographic wetness index (TWI) and the stream power index (SPI). Both are derived from the DEM and were computed using equation 3 and equation 4. The topographic wetness index, also known as the compound topographic index (CTI), is a steady-state wetness index. TWI is determined from the slope and the upstream contributing area per unit width perpendicular to the flow direction, increasing with the contributing area and decreasing with the slope.
TWI correlates well with many soil properties such as horizon depth, percentage of silt, organic matter and phosphorus, and hence directly influences flood vulnerability (Moore et al., 1991). The stream power index represents the potential flow erosion at a particular point. Erosion risk increases as the amount and velocity of water increase, and SPI hence contributes directly to flood vulnerability (Pourghasemi et al., 2013). The Normalized Difference Vegetation Index (NDVI) was used to quantify the vegetation cover of an area. It was computed using equation 2 from the LISS III image using band 3 and band 4, which are the Red and Near Infra-red bands respectively. An increase in NDVI indicates thicker vegetation cover, which reduces runoff. Interception loss increases with vegetation, and vegetation facilitates more infiltration, which in turn reduces the surface runoff that causes flooding. The lowest NDVI values are usually shown by water bodies and barren lands, where uninterrupted flow of water can occur with very little penetration to the subsurface (Wang et al., 2003). Hence a decrease in NDVI was considered to increase flood hazard vulnerability. Land use/land cover of the area is also a key factor determining the intensity of flooding. Five land classes were identified over the study area by field studies, and using supervised classification of the LISS III image, the study area was classified into built-up area, water cover, tree cover, barren land and paddy fields. Built-up area includes regions where buildings and roads exist, with the least percolation of water to the subsurface, whereas paddy fields are recharge zones where water easily reaches the subsurface layers. Built-up area thereby contributes to flood hazard vulnerability, whereas paddy fields and barren lands can decrease the chances of the area being flooded. The raster maps created for all the indices considered as hazard-inducing factors were then analysed.
All the maps were brought to the same resolution using the nearest neighbour resampling technique. The output maps, showing the spatial distribution and intensity variations across Aluva municipality, are presented in Figure 6.
Analysis of Random forest model
The spatial variation of all the physical-environmental factors influencing flood vulnerability over the study area was mapped into raster layers in ArcGIS, which together form the input dataset for the Random forest model. The variables in the model are the physical-environmental factors, of which some are randomly selected, and decision trees are grown on bootstrap samples drawn from the training dataset. A number of decision trees are grown on these samples, and their end nodes are labelled as high, medium or low vulnerability based on the real flood data. The random forest algorithm implemented in Python is summarised as follows:
- Samples were selected from the dataset by the bootstrapping procedure (Yeh et al., 2014).
- The best split on randomly selected variables was found based on the Gini value, and trees were grown on the training data (Wang et al., 2015).
- The data to be predicted were given to the model, and the final prediction was obtained by combining the outputs of the decision trees through majority voting.
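The three building blocks listed above (bootstrap sampling, Gini-based split evaluation and majority voting) can be sketched in plain Python. This is an illustrative sketch of the generic technique, not the authors' implementation; the function names and the sample elevations and labels are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Size-weighted Gini impurity after splitting on values <= threshold."""
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def majority_vote(predictions):
    """Most frequent label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical training cells: elevation in metres and known class.
elevation = [2.0, 3.0, 8.0, 9.0]
labels = ["high", "high", "low", "low"]
impurity_before = gini(labels)
impurity_after = split_gini(elevation, labels, threshold=5.0)
resampled = bootstrap_sample(list(zip(elevation, labels)), random.Random(0))
final_class = majority_vote(["high", "low", "high"])
```

A tree-growing routine would evaluate `split_gini` over candidate thresholds of the randomly selected variables and keep the split with the lowest impurity.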
The accuracy of the model was obtained using the testing samples: the model was run on the testing samples, and the predicted output was compared with the known values. The influence of the parameters considered was also determined by the model. This was done using the mean decrease in Gini index, which accumulates the Gini decreases for each parameter over all the trees of the forest (Ai et al., 2014). In addition, the sensitivity of the model to the number of trees grown and to the depth of pruning was assessed, and the accuracy of the model was checked for different combinations of tree number and pruning depth. The predicted output was visualized in ArcGIS software. The ward map of Aluva municipality was superimposed on the vulnerability map to obtain the ward-wise distribution of physical-environmental vulnerability.
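The accuracy check on the testing samples can be sketched as the fraction of correctly classified pixels, with a table of (known, predicted) pairs to show where the errors fall. The text does not specify which accuracy measure was used, so overall accuracy is an assumption here, and the sample labels are hypothetical.

```python
from collections import Counter


def overall_accuracy(known, predicted):
    """Fraction of testing pixels whose predicted class matches the known class."""
    if len(known) != len(predicted) or not known:
        raise ValueError("known and predicted must be non-empty and equally long")
    return sum(k == p for k, p in zip(known, predicted)) / len(known)


def confusion_counts(known, predicted):
    """Count of each (known, predicted) class pair."""
    return Counter(zip(known, predicted))


# Hypothetical testing pixels.
known = ["high", "high", "medium", "low", "low"]
predicted = ["high", "medium", "medium", "low", "low"]
accuracy = overall_accuracy(known, predicted)
confusion = confusion_counts(known, predicted)
```

Repeating this check for each combination of tree number and pruning depth gives the sensitivity table described above.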