The flood vulnerability assessment for a region comprises two phases: Phase 1, mapping socio-economic vulnerability using multi-criteria decision analysis (MCDA), and Phase 2, mapping physiographic vulnerability using the Random Forest algorithm. The resulting spatial layers were combined by weighted overlay analysis to generate the vulnerability classification map, following the procedure shown in Figure 1.
Socio-economic vulnerability: The following procedure was adopted to analyse socio-economic vulnerability. First, a hierarchical structure was created by analysing the factors that influence the vulnerability of the region. A three-level hierarchical classification was adopted, comprising four components: Population, Buildings, Socio-economics and Exposed elements (Paulo et al., 2015). ArcGIS software was used to create the map layers for the parameters considered in the study. Next, each factor was rescaled to a linear scale of 1 to 10, and relative weights were assigned to the factors using the analytic hierarchy process (AHP) developed by Saaty (1977). MCDA allows decision-makers to compare criteria based on their relative importance in order to obtain a final solution. The factors were combined using a weighted linear combination, a simple additive weighting procedure, according to the following equation:

$SV = \sum_{i=1}^{n} W_i X_i$ …. (1)
where
$W_i$ = normalised criteria scores
$X_i$ = criteria weights
$n$ = number of criteria
The weighted linear combination (WLC) aggregation method was used to map socio-economic vulnerability and to classify the region into areas of high, medium and low vulnerability to flood hazard. Since socio-economic vulnerability relates to the adaptive capacity of the population to the hazard, an area can be considered highly vulnerable if its population has less capacity to resist the impact of the natural hazard and to recover from its effects, whether long term or short term. Socio-economic vulnerability is to some extent a subjective term. For example, commercial buildings are highly vulnerable from an economic point of view but less vulnerable from a population point of view. In this context, an area labelled as less vulnerable in the socio-economic domain indicates that the population residing there is more resilient to the flood hazard, whereas a highly vulnerable area in the same domain indicates that the community would experience adverse impacts of the flood hazard and can be regarded as less resilient. The schematic workflow of the approach is presented in Figure 2.
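The rescaling and aggregation steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the study's ArcGIS workflow: the function names, the toy 2 x 2 rasters, the weights and the equal-interval class breaks (4 and 7 on the 1 to 10 scale) are all assumptions made for the example. It also assumes that a higher raw value always means higher vulnerability; criteria with the opposite polarity would need to be inverted first.

```python
import numpy as np

def rescale_1_to_10(layer):
    """Linearly rescale a raster to the range 1 (low) to 10 (high vulnerability)."""
    layer = np.asarray(layer, dtype=float)
    lo, hi = layer.min(), layer.max()
    if hi == lo:  # constant layer: nothing to stretch
        return np.full_like(layer, 1.0)
    return 1.0 + 9.0 * (layer - lo) / (hi - lo)

def weighted_linear_combination(layers, weights):
    """Pixel-wise sum of weight * rescaled score over all criteria."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # force the weights to sum to 1
    stack = np.stack([rescale_1_to_10(layer) for layer in layers])
    return np.tensordot(weights, stack, axes=1)

def classify_three_zones(score):
    """0 = low, 1 = medium, 2 = high, using assumed breaks at 4 and 7."""
    return np.digitize(score, bins=[4.0, 7.0])

# Hypothetical 2 x 2 rasters for two criteria (values are made up)
illiteracy = np.array([[0.0, 10.0], [20.0, 30.0]])
unemployment = np.array([[5.0, 5.0], [15.0, 25.0]])

score = weighted_linear_combination([illiteracy, unemployment], [0.6, 0.4])
zones = classify_three_zones(score)
```

In the hierarchical scheme, the same combination would be applied repeatedly: third-level maps are combined into second-level maps, and so on up to the four objectives.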
Physiographic Vulnerability: To create a spatial map of physiographic vulnerability using the Random Forest model, a set of raster layers was prepared as shown in Figure 3. The number of layers depends on the number of parameters considered as hazard-inducing factors for the flood event. In the present study, Elevation, Proximity to the river, Slope, Normalized Difference Vegetation Index (NDVI), Land use/land cover (LULC), Stream Power Index (SPI) and Topographic Wetness Index (TWI) were identified as the factors affecting flood vulnerability, and were derived from satellite data as raster layers on GIS platforms. The following equations were used.
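$\mathrm{NDVI} = \dfrac{NIR - R}{NIR + R}$ …. (2) (Wang et al., 2003)
(where NIR = Near Infra-Red band and R = Red band)
$\mathrm{TWI} = \ln\left(\dfrac{\alpha}{\tan b}\right)$ …. (3) (Moore et al., 1991)
$\mathrm{SPI} = \alpha \tan b$ …. (4) (Pourghasemi et al., 2013)
(where α = upslope area and b = local slope in radians)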
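As an illustration of how these indices can be evaluated on raster arrays, the sketch below uses the standard definitions of NDVI, TWI and SPI. The function names and the small `eps` guard against division by zero on flat cells are assumptions for the example. The inputs are assumed to be co-registered arrays: band reflectances for NDVI, and upslope contributing area with slope in radians for the two topographic indices.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index; 0 where both bands are 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def twi(upslope_area, slope_rad, eps=1e-6):
    """Topographic Wetness Index: ln(upslope area / tan(slope))."""
    area = np.maximum(np.asarray(upslope_area, dtype=float), eps)
    tan_b = np.maximum(np.tan(np.asarray(slope_rad, dtype=float)), eps)
    return np.log(area / tan_b)

def spi(upslope_area, slope_rad):
    """Stream Power Index: upslope area * tan(slope)."""
    area = np.asarray(upslope_area, dtype=float)
    return area * np.tan(np.asarray(slope_rad, dtype=float))

# Hypothetical single-row rasters (values are made up)
ndvi_map = ndvi(np.array([0.5, 0.0]), np.array([0.1, 0.0]))
twi_map = twi(np.array([100.0]), np.array([np.pi / 4]))
spi_map = spi(np.array([100.0]), np.array([np.pi / 4]))
```

Clamping the slope tangent to `eps` keeps TWI finite on perfectly flat cells; other conventions (for example, adding a small constant to the slope) are equally common.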
The dataset was prepared by stacking these raster layers, and this served as the input to the Random Forest model. The model randomly samples values from the dataset by bootstrapping and divides them into training and testing samples. A number of decision trees were grown to form the random forest, and the end nodes of these trees were labelled as high, medium or low vulnerability based on the training data. The training data was a further raster layer, created by interpolating known flood water levels at different locations over the study area. To minimise the uncertainty introduced by interpolating the floodwater levels, the surface elevation data of the area was combined with the water levels to obtain the required training raster. The pixel values of the training raster represent the three vulnerability zones considered in the study, so the model can be trained to distinguish high, medium and low vulnerability. The Random Forest model then used the same set of classification trees to classify the pixels of the testing sample set, and the model accuracy was checked. Once the required accuracy level was met, the whole dataset of the study area was fed to the trained model, which classified every pixel into one of the three categories, dividing the area into zones of high, medium and low vulnerability to the flood hazard.
A highly physiographically vulnerable area is one that is at higher flood risk due to its physiographic characteristics. For instance, an area at a higher elevation is less vulnerable, while an area closer to the river is more vulnerable. A combined assessment of such physiographic factors was used to classify the study area into the different vulnerability zones. A low-vulnerability zone is less susceptible to flooding and can be considered suitable for human settlements. The algorithm was implemented in Python.
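The data preparation described above (stacking layers, deriving training labels from water level and ground elevation, and splitting into training and testing samples) might look roughly as follows. This is a sketch under stated assumptions: the depth breaks of 0.5 m and 1.5 m, the 30 % test fraction and all function names are hypothetical and are not taken from the study.

```python
import numpy as np

def stack_to_samples(layers):
    """Stack equally shaped rasters and flatten to (n_pixels, n_features)."""
    stack = np.stack([np.asarray(layer, dtype=float) for layer in layers], axis=-1)
    return stack.reshape(-1, stack.shape[-1])

def depth_to_labels(water_level, ground_elevation, breaks=(0.5, 1.5)):
    """Label pixels 0 = low, 1 = medium, 2 = high from flood depth.

    Depth is water level minus ground elevation, floored at zero.
    The break values are illustrative assumptions.
    """
    depth = np.asarray(water_level, dtype=float) - np.asarray(ground_elevation, dtype=float)
    depth = np.clip(depth, 0.0, None)
    return np.digitize(depth, bins=breaks).ravel()

def split_indices(n_samples, test_fraction=0.3, seed=0):
    """Random, non-overlapping train and test index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return order[n_test:], order[:n_test]

# Hypothetical 2 x 2 rasters (values are made up)
elevation = np.array([[2.0, 4.0], [6.0, 8.0]])
distance = np.array([[10.0, 20.0], [30.0, 40.0]])
water_level = np.full((2, 2), 5.0)  # interpolated flood water surface

X = stack_to_samples([elevation, distance])
y = depth_to_labels(water_level, elevation)
train_idx, test_idx = split_indices(len(y))
```

The feature matrix `X` and label vector `y` are in the row-per-pixel layout that most Python classifiers expect.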
Combined Vulnerability to Floods: A region can be considered vulnerable to flood if it is both socio-economically and physiographically vulnerable. This concept can prevent overestimation in many cases, as a socio-economically vulnerable (SV) region may not be physiographically vulnerable (PV). Thus, the socio-economic vulnerability obtained from MCDA and the physiographic vulnerability obtained from Random Forest can be combined using the AND operator to obtain the spatial distribution of vulnerable zones (V) within the study area. This can be expressed as:
V= SV ∩ PV …. (5)
An analysis was also performed by varying the weights of the socio-economic and physiographic vulnerabilities, to study the effect of their relative contribution on the distribution of vulnerable zones.
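One way to picture the two combination schemes is sketched below. The source does not specify how the AND operator is applied to three-class maps, so this example assumes a pixel-wise minimum (a pixel is rated high only if both maps rate it high). The weighted variant returns a continuous score, and the weight `w_sv` is a hypothetical parameter.

```python
import numpy as np

LOW, MEDIUM, HIGH = 0, 1, 2

def combine_and(sv, pv):
    """Assumed AND rule: take the lower of the two classes at each pixel."""
    return np.minimum(np.asarray(sv), np.asarray(pv))

def combine_weighted(sv, pv, w_sv=0.5):
    """Weighted overlay of the two class maps; returns a continuous score."""
    sv = np.asarray(sv, dtype=float)
    pv = np.asarray(pv, dtype=float)
    return w_sv * sv + (1.0 - w_sv) * pv

# Hypothetical class maps (values are made up)
sv = np.array([[HIGH, HIGH], [LOW, MEDIUM]])
pv = np.array([[HIGH, LOW], [HIGH, MEDIUM]])

v_and = combine_and(sv, pv)
v_weighted = combine_weighted(sv, pv, w_sv=0.7)
```

Sweeping `w_sv` between 0 and 1 and re-mapping the result is one simple way to examine how sensitive the zone boundaries are to the relative weighting.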
ANALYSIS
The details of the study area, the data used, and the assessment of socio-economic and physiographic vulnerability for the study area are discussed below.
Study area
The proposed methodology was illustrated for Aluva town, a peri-urban municipality in the northern suburbs of the city of Kochi, in Ernakulam district, Kerala state, India (Figure 2). The town centre is located at latitude 10.1004° N and longitude 76.3570° E, and the Periyar river flows through the municipality, almost dividing it in two. As per the 2011 census, the town has around 25,000 residents and an area of 6.46 km², comprising 23 wards. Aluva was the most affected municipality in Ernakulam district during the 2018 Kerala flood. The influence of dams and other hydraulic structures that regulate the flood was not considered in this study.
Data used
The data used in the study include spatial data products such as the Cartosat DEM, satellite imagery, the ward map of the municipality, Google image and the inundation map with flood levels. The non-spatial data include population data and associated statistical data. Details of the data used in the study are presented in Table 1.
Table 1: Data used in the study
| Sl. No. | Data | Source | Remarks |
| --- | --- | --- | --- |
| 1 | DEM | National Remote Sensing Centre (NRSC) | Cartosat 1; spatial resolution: 1 arc second; file format: GeoTIFF |
| 2 | LISS III | National Remote Sensing Centre (NRSC) | Resourcesat 1; spatial resolution: 23.5 m; file format: GeoTIFF; number of bands: 4 (2, 3, 4, 5); B3: 0.62-0.68 µm (Red); B4: 0.77-0.86 µm (NIR) |
| 3 | Ward map | Aluva Municipality | Hardcopy |
| 5 | Population data | 2011 Census data | Datasheets |
| 6 | Flood level data and inundation map | Kerala State Disaster Management Authority (KSDMA) | Shapefiles |
Analysis of socio-economic vulnerability
The hierarchical structure of the socio-economic vulnerability model is shown in Figure 5. The objective Population considers age, gender and the number of members per house as the second-level factors. The third-level classification for age comprised age below 14 years, age between 14 and 65 years, and age above 65 years, as the extremes of the age spectrum are more vulnerable to the disaster. For gender, the third-level classification comprised male and female. For the number of persons per household, houses with fewer than 5 members and houses with more than 4 members were distinguished, since larger families have more dependents to evacuate and must share resources among more people.
The second-level factors for the objective Buildings were the number of rooms, the type of use, the roof material and the condition of the buildings. The third-level classification for each second-level factor was as follows. The number of rooms was divided into two classes: houses with fewer than 5 rooms and houses with more than 4 rooms. Based on the type of use, buildings were classified as residential or non-residential. Based on roof material, houses were classified into three classes: houses with concrete roofs, houses with tiled roofs, and houses with other types of roofs, which included thatched houses and houses made of bamboo, slate, etc. Since concrete houses can withstand the effect of flooding, they were given comparatively less weightage, and houses with other types of roofing were given higher weightage. A similar classification was followed for the condition of buildings, which were classified into three classes: good, liveable and dilapidated, with maximum weightage given to houses in dilapidated condition and minimum weightage to those in good condition.
The next objective, Socio-economics, consisted of unemployment, housing occupancy (landlord/tenant) and illiteracy. The unemployed are dependent on other family members and are considered more vulnerable. With regard to housing occupancy, tenants usually do not have the financial means to own a house and are thus economically more vulnerable. Illiterate people can also be considered more vulnerable, as they generally lack the basic knowledge needed to adapt in a hazard situation.
For the objective Exposed elements, the second-level factors were land use, population density and building density. Under land use, urban area, agricultural area and barren land were considered as the third-level factors. Urban areas can be highly vulnerable to a flood event. The vulnerability of agricultural land depends on the crop season, and barren land always has relatively low vulnerability. The densities of people and buildings are also important factors that influence socio-economic vulnerability in risk areas.
Following the hierarchical classification, three levels of maps were prepared. The effect of the floating population within the study area was ignored during map preparation. The map layers were created in the ArcGIS platform. The higher-level maps were prepared by overlaying the lower-level maps of the corresponding factors. The analytic hierarchy process (AHP) was employed to assign the weightage to each criterion, and aggregation was done by weighted linear combination. To avoid sharp variation of population and building densities across the ward boundaries, the population and building density maps were created using pycnophylactic interpolation, employing the focal statistics tool in ArcGIS.
The factors were measured on different scales, so rescaling them to a common scale was necessary. The factors were standardised to a linear scale of 1 to 10, where 1 represents very low vulnerability and 10 represents very high vulnerability. The weights were assigned to each factor using AHP. The consistency of judgement was checked by calculating the consistency ratio (CR), and any set of judgements with a CR greater than 0.1 was re-evaluated (Saaty, 1980).
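The AHP weighting and consistency check can be illustrated with the principal-eigenvector method. The sketch below is a generic implementation, not the study's own: the pairwise comparison matrix is invented for the example, and the random index values are Saaty's commonly tabulated ones for matrices of order 1 to 9.

```python
import numpy as np

# Saaty's random consistency index (RI) by matrix order
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(pairwise):
    """Return (weights, consistency_ratio) for a reciprocal pairwise matrix."""
    a = np.asarray(pairwise, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = np.argmax(eigvals.real)            # principal eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                        # normalise weights to sum to 1
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr

# Hypothetical judgements for three factors (values are made up)
pairwise = [[1.0, 3.0, 5.0],
            [1.0 / 3.0, 1.0, 3.0],
            [1.0 / 5.0, 1.0 / 3.0, 1.0]]

weights, cr = ahp_weights(pairwise)
acceptable = cr <= 0.1  # judgements above this threshold would be revisited
```

A perfectly consistent matrix (every entry equal to the ratio of two underlying weights) gives a principal eigenvalue equal to the matrix order and hence a CR of zero.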
A simple additive weighting procedure, the weighted linear combination (WLC), was used to combine the criteria. The third-level maps were combined using WLC to create the second-level maps, and these second-level maps were in turn combined to generate the first-level maps for the four objectives mentioned above. The final socio-economic vulnerability map was obtained by combining the four objectives with equal weightage.
Analysis of factors affecting the physiographic vulnerability
Raster layers were created from remote sensing data using ArcGIS and Erdas Imagine software, and were then used to build the dataset fed into the Random Forest model. Elevation above mean sea level was one of the factors affecting physiographic vulnerability. A Digital Elevation Model (DEM) was used to obtain the spatial variation of elevation in the study area. The Cartosat DEM of one arc second spatial resolution (approximately 30 m) was used in this study. Pixels at higher elevation were considered less prone to flooding than those in low-lying areas. Slope was another factor considered in the study; it was derived from the DEM using GIS tools and reflects the degree of topographic change. Regions with steep slopes drain water faster than flat regions, which are more susceptible to flooding. Proximity to the river was another important factor affecting vulnerability: an area closer to the river was considered more likely to be affected by flooding. Since both proximity to the river and elevation were incorporated in the study, a combined situation such as an area near the river but at higher elevation was considered less vulnerable. Proximity to the river was estimated by computing the Euclidean distance from every point in the area to the digitised line feature representing the river.
In addition, two major topographic indices were considered: the topographic wetness index and the stream power index. The topographic wetness index (TWI), also known as the compound topographic index (CTI), is a steady-state wetness index. TWI was determined from the local slope and the upstream contributing area per unit width perpendicular to the flow direction. TWI correlates well with many soil properties, such as horizon depth, percentage of silt, organic matter and phosphorus, and hence directly influences flood vulnerability (Moore et al., 1991). The stream power index (SPI) represents the potential flow erosion at a particular point. Erosion risk increases as the amount and velocity of water increase, and hence SPI directly contributes to flood vulnerability (Pourghasemi et al., 2013). The Normalized Difference Vegetation Index (NDVI) was used to quantify the vegetation cover of an area. It was computed using equation 2 from the LISS III image, where band 3 and band 4 are the Red and Near Infra-Red bands respectively. An increase in NDVI indicates thicker vegetation cover, which reduces runoff: interception loss increases with vegetation, and vegetation facilitates more infiltration, which in turn reduces the surface runoff that causes flooding. The lowest NDVI values are usually shown by water bodies and barren lands, where uninterrupted flow of water can occur with very little penetration to the subsurface (Wang et al., 2003). Hence a decrease in NDVI was considered to increase flood hazard vulnerability. Land use/land cover of the area was also a key factor determining the intensity of flooding. Built-up areas are places where buildings and roads exist, with the least percolation of water to the subsurface, whereas paddy fields are recharge zones where water easily reaches the subsurface layers. Thus, built-up areas contribute to flood hazard vulnerability, whereas paddy fields and barren lands can decrease the chances of the area being flooded. The raster maps created for all the hazard-inducing factors were analysed to obtain output maps showing their spatial distribution and intensity variation across Aluva municipality, as in Figure 6.
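The proximity layer can be pictured as a distance raster. The sketch below is a brute-force NumPy version that assumes the river has already been rasterised into a boolean mask, whereas the study measured distance to a digitised line feature in GIS. The function name and the 30 m cell size in the example are assumptions, and the approach is only practical for small grids because it compares every cell with every river cell.

```python
import numpy as np

def distance_to_river(river_mask, cell_size=1.0):
    """Euclidean distance from each cell centre to the nearest river cell."""
    river_mask = np.asarray(river_mask, dtype=bool)
    rows, cols = np.indices(river_mask.shape)
    cells = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    river = cells[river_mask.ravel()]
    if river.size == 0:
        raise ValueError("river_mask contains no river cells")
    # Pairwise offsets between all cells and all river cells
    diff = cells[:, None, :] - river[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return dist.reshape(river_mask.shape) * cell_size

# Hypothetical 3 x 3 grid with the river running down the middle column
mask = np.array([[0, 1, 0],
                 [0, 1, 0],
                 [0, 1, 0]])
proximity = distance_to_river(mask, cell_size=30.0)
```

For full-size rasters, a dedicated distance-transform routine from a GIS or image-processing library would be the usual choice.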
Analysis of Random forest model
The spatial variation of all the physiographic factors influencing flood vulnerability over the study area was stored as raster layers in ArcGIS, which together form the input dataset for the Random Forest model. The variables in the model are the physiographic factors, some of which are randomly selected by bootstrapping, and decision trees are grown on the selected samples, which form the training dataset. A number of decision trees are randomly grown on these datasets, and their end nodes are labelled as High, Medium or Low vulnerability based on the real flood data. The Random Forest algorithm, implemented in Python, is summarised as follows:
- Samples were selected from the dataset by a bootstrapping procedure (Yeh et al., 2014).
- The best split on randomly selected variables was found based on the Gini value, and trees were grown based on the training data (Wang et al., 2015).
- The data to be predicted was given to the model, and the final prediction was made by combining the output of each decision tree through the majority voting principle.
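The three steps listed above can be illustrated with standard-library Python. This is a teaching sketch of the building blocks (bootstrap sampling, a single Gini-based split, and majority voting), not the study's implementation and not a complete random forest; the function names and the four toy samples are invented for the example.

```python
import random
from collections import Counter

def bootstrap_indices(n_samples, rng):
    """Step 1: draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Step 2: (feature, threshold, weighted Gini) with the lowest impurity."""
    best = (None, None, gini(labels))
    n = len(rows)
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best

def majority_vote(tree_predictions):
    """Step 3: most common class among the trees' predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical samples: (elevation in m, distance to river in m)
rows = [(2.0, 50.0), (3.0, 400.0), (9.0, 60.0), (12.0, 500.0)]
labels = ["high", "high", "low", "low"]

rng = random.Random(42)
sample = bootstrap_indices(len(rows), rng)
split = best_split(rows, labels, feature_ids=[0, 1])
vote = majority_vote(["high", "low", "high"])
```

In a full forest, each tree would be grown on its own bootstrap sample, and `feature_ids` would be a random subset of the variables drawn afresh at every node rather than the full list used here.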
The accuracy of the model was obtained using the testing samples: the model was run on the testing samples and the predicted output was compared with the known values. The influence of each parameter was also determined by the model, using the mean decrease in Gini index, which computes the Gini decrease for each parameter over all the trees of the forest (Ai et al., 2014). In addition, the sensitivity of the model to the number of trees grown and to the pruning depth was assessed, and the accuracy of the model was checked for different combinations of tree number and pruning depth. The predicted output was visualised in ArcGIS software. The ward map of Aluva municipality was superimposed on the vulnerability map to obtain the ward-wise distribution of physiographic risk.
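The accuracy check on the testing samples amounts to comparing predicted and known classes. A minimal sketch is given below, with classes coded 0 = low, 1 = medium and 2 = high; the coding, the function names and the six test labels are assumptions for the example.

```python
import numpy as np

def confusion_matrix(observed, predicted, n_classes=3):
    """Rows are observed classes, columns are predicted classes."""
    observed = np.asarray(observed, dtype=int)
    predicted = np.asarray(predicted, dtype=int)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (observed, predicted), 1)  # count each (observed, predicted) pair
    return cm

def overall_accuracy(cm):
    """Share of test pixels on the diagonal of the confusion matrix."""
    total = cm.sum()
    return float(np.trace(cm)) / total if total else float("nan")

# Hypothetical test-set labels (values are made up)
observed = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(observed, predicted)
accuracy = overall_accuracy(cm)
```

Repeating this calculation for each combination of tree number and depth gives the kind of sensitivity table described in the text.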