Comparison of Statistical and Analytical Hierarchy Process Methods on Flood Susceptibility Mapping: In a Case Study of the Tana Sub-basin in Northwestern Ethiopia

The sub-basin of Lake Tana is one of the most flood-prone areas in northwestern Ethiopia. Flood susceptibility modeling in this area is essential for hazard reduction. For this purpose, the analytical hierarchy process (AHP) and bivariate and multivariate statistical methods were used. Using an intensive field survey, historical records, and Google Earth imagery, 1404 flood locations were identified and split into training (70%) and testing (30%) datasets using the subset tool in GIS. The statistical relationship between the probability of flood occurrence and eleven flood-driving factors was evaluated using GIS tools. The flood susceptibility map of the area was then developed by summing all weighted factors with the raster calculator and was classified into very low, low, moderate, high, and very high susceptibility classes using the natural breaks method. The area under the curve (AUC) is 99.1% for the frequency ratio model, compared with 86.9% for AHP, 81.4% for the logistic regression model, and 78.2% for the information value model. Based on the AUC values, the frequency ratio (FR) model performs best, followed by the AHP model, for regional planning, flood hazard mitigation, and prevention purposes.


Introduction
A flood is an overflow of water that submerges usually dry land. It can occur in rivers or lakes when the flow rate exceeds the capacity of the river channel, particularly at bends or meanders in the waterway, and through backflow from lakes. Natural hazards, floods in particular, affect many parts of the world during rainy seasons. Even though flooding is a natural part of the hydrological cycle, it is increasing in both frequency and magnitude from year to year (Samanta et al. 2018). This is attributed to climate change and land degradation caused by anthropogenic intervention.
Anthropogenic intervention can reduce the water retention capacity of catchments because forest is cleared for various purposes, which results in a high rate of soil erosion. Flood hazards cause damage to crops, infrastructure, engineering structures, and property, as well as loss of human and animal lives, worldwide, including in Ethiopia. As reported by Samanta et al. (2018) and Calil et al. (2015), floods pose a risk to human beings (loss of life, injury), property (agricultural areas, yield production, villages, and buildings), communication systems (urban infrastructure, bridges, roads, and railway routes), cultural heritage, and ecosystems. Zou et al. (2013) and Calil et al. (2015) stated that more than 2000 deaths can occur within a single year and that more than 75 million people have been adversely affected worldwide by flood hazards.
Flood hazard is becoming one of the most destructive natural hazards in Ethiopia, followed by landslides, and has resulted in huge damage to property, crops, farmland, and infrastructure, and in loss of life. For example, in 2019-2020, floods displaced more than 500,000 people, damaged more than 25,000 ha of cultivated land and various engineering structures, destroyed more than 35 houses, and caused loss of life in the Amhara, Somali, Afar, SNNP, Dire Dawa, and Oromia regions of Ethiopia. The study area is one of the areas most severely affected by flooding, which has resulted in loss of life and property and in the destruction of houses, roads, and more than 7,000 hectares of farmland covered by various crops. These figures show that flooding causes huge economic losses that retard the sustainable economic development of the country. Therefore, flood susceptibility mapping is one of the most important elements of early warning systems and of strategies to prevent and mitigate future flooding, and it helps to reduce the negative consequences of flood hazards. Flood susceptibility mapping can also be perceived as one form of vulnerability assessment (Adger et al. 2006; Jacinto et al. 2015). In geohazard mapping, susceptibility/vulnerability, hazard, and risk mapping are the most important activities for understanding, mapping, and evaluating the spatiotemporal conditions and the level of risk due to geohazards. These terms have different meanings, but some researchers use them interchangeably.
Susceptibility refers to the probability of occurrence of an event of a particular type in a given location, whereas hazard refers to the probability of occurrence of an event of a particular type and magnitude in a given location within a reference period. This means that susceptibility can be used to predict the spatial occurrence of events, whereas hazard can be used to predict their spatiotemporal occurrence in a given terrain. The term risk refers to the expected losses or damage caused by events in a given region, which are the product of susceptibility, hazard, and elements at risk. Since the main objective of this study is to prepare a flood susceptibility map, the study focuses only on flood susceptibility rather than on hazard and risk. Flood susceptibility mapping has been implemented using various methods in numerous studies. These methods include qualitative methods (for example, the analytical hierarchy process (AHP)), quantitative methods (machine learning, statistical), and hydrological-based methods. The hydrological methods are very simple and are based on a nonlinear concept, and they are less effective for modeling complex features like catchments (Sahoo et al. 2009). Nowadays, these traditional methods have been replaced by automated and rule-based methods that are more suitable for flood hazard mapping (Hostache et al. 2013). SWAT (Anjum et al. 2016) and WetSpass (Nurmohamed et al. 2012) are examples of hydrological methods that are used to produce spatial flood susceptibility models by integrating GIS and remote sensing tools. Qualitative methods are expert-driven approaches that require specialists with field experience (Rahmati et al. 2016; Dahri and Abida 2017).
The drawback of these methods is their subjectivity and their reliance on the experience and professional background knowledge of experts.
The analytical hierarchy process (AHP) is an example of a qualitative method used by many scholars to produce flood susceptibility models within a multicriteria analysis framework (Karimi et al. 2018). Machine learning techniques are advanced methods used in flood susceptibility mapping; however, considerable processing time, the need for high-performance computing systems and specific software, and strict selection criteria for input parameters make machine learning methods less usable for a wide range of users (Ghalkhani et al. 2013; Tehrany et al. 2013).
The topographic wetness index (TWI) is a commonly used, very simple traditional flood hazard mapping method; however, it considers only the slope and flow accumulation in a region. Flood hazard, by contrast, results from the combination of several factors such as soil texture, depth of groundwater, land use, vegetation cover, elevation, rainfall, stream density, distance from streams, and the depth of the riverbank. To overcome this limitation of the TWI, statistical and analytical hierarchy process methods are important for evaluating the spatial statistical correlation between flood-driving factors and flood points. Statistical methods are indirect susceptibility mapping methods that are widely and routinely used to evaluate the correlation between flood-driving factors and floods based on mathematical expressions (Bednarik et al. 2012; Chen and Wang 2007; Pradhan et al. 2011; Regmi et al. 2014; Wang et al. 2011). They are quick, understandable, and accurate methods for flood susceptibility modeling, and they have no specific requirements regarding input data, software, or computer capacity. Statistical methods can be further divided into multivariate and bivariate statistical methods, which are widely used throughout the world and provide reliable results (Dai and Lepcha 2002; Donati and Turrini 2002; Luelseged and Yamagishi 2005; Duman et al. 2006; Sarkar et al. 2013; Meten et al. 2015; Chandak et al. 2016; Kouhpeima et al. 2017; Wubalem and Meten 2020; Hong et al. 2020). Bivariate statistical methods are used to evaluate the relationship between flood-governing factors and past flooding. Frequency ratio, certainty factor, information value, and weight of evidence are examples of bivariate statistical methods, which are simple and easy to apply and produce reliable models. They also make it possible to evaluate the effect of each factor class on flooding, which is not possible with data mining or multivariate methods. However, they require quality input data and past flood data, and they cannot evaluate the relationships among flood-governing factors. Multivariate statistical methods are used to examine the relationship among three or more dependent and independent variables (Pham et al. 2016b; Das 2019; Duman et al. 2006; Kouhpeima et al. 2017; Luelseged and Yamagishi 2005). Logistic regression and discriminant analysis are examples of multivariate statistical methods that are frequently used in flood susceptibility modeling and provide reliable results (Chen and Wang 2007; Das 2019; Duman et al. 2006; Kouhpeima et al. 2017; Luelseged and Yamagishi 2005; Meten et al. 2015). However, like data mining methods and unlike bivariate methods, they cannot examine the contribution of each factor class to flood probability.
Many scholars have employed both qualitative and quantitative methods for flood susceptibility modeling; however, there is no clear agreement on the best method for flood susceptibility modeling practice. Although the suitability of a model depends on various constraints, including physical parameters, data quality and availability, expertise, and technological advancement, comparison among different natural hazard mapping methods is one way to select appropriate approaches. Because each method has its limitations, using different approaches together for landslide or flood susceptibility mapping is important to fill the gaps among the methods. For example, the logistic regression model can perform multivariate statistical analysis between a dependent variable and a set of independent variables, but it cannot analyze the impact of the individual classes of flood-governing factors on flood occurrence. This limitation can be addressed using bivariate statistical methods. For example, the frequency ratio and information value methods can extract the influence of each flood-governing factor class on flood occurrence, but they cannot consider the relationship between these flood-governing factors and flood occurrence. Therefore, the combined use of bivariate and multivariate statistical methods is essential to overcome the limitations of each method. As a result, in the present study, bivariate, multivariate, and expert-based methods are employed to generate flood susceptibility models for the sub-basin of Lake Tana, and the performance of each method is evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). Thus, based on the concerns stated above, the main objectives of this study are (1) to compare and evaluate the performance of the frequency ratio, information value, logistic regression, and analytical hierarchy process methods in determining flood-prone areas and (2) to evaluate the relationship between flood factors and flood probability as well as between flood factor classes and flood occurrence probability. The novelty of this study lies in the following: (1) rigorous flood susceptibility methods such as statistical methods are applied for the first time in the sub-basin of Lake Tana to generate a flood susceptibility model, and (2) a comparison among the information value, frequency ratio, logistic regression, and analytical hierarchy process methods has not yet been performed. This study will identify statistically significant methods for flood susceptibility modeling. The resulting map will help regional and local authorities and policymakers to mitigate flood hazards.

Study Area
The study area is located in the sub-basin of Lake Tana in the Amhara Regional State of northwestern Ethiopia and is characterized by wide, flat to gently sloping plains and somewhat rugged topography. Its elevation ranges from 1,774 to 4,037 m above mean sea level (Fig. 1). It is bounded by 330,000-410,000 E and 1,280,000-1,350,000 N. It is characterized by subtropical to cool climatic zones with very high and prolonged rainfall from June to October. The study area mainly covers three districts, Fogera, Farta, and Libo Kemkem, which are frequently affected by flood hazards every year during heavy and prolonged rainfall seasons. The study area has many tributaries that drain into two major rivers, the Gumara and the Ribb, which in turn drain into Lake Tana, part of the Abay basin. Agriculture is the dominant land use in the study area and is practiced more than twice per year. The dominant soil types in the study area include clay, loam, sandy loam, silty sand, fine to coarse sand, and gravel derived from volcanic rocks.

Flood Inventory Map
In flood susceptibility mapping, flood inventory mapping is one of the key elements. The inventory can be prepared using various techniques such as aerial photograph or Google Earth imagery interpretation, field investigation, and evaluation of archived data, coupled with GIS tools. Evaluating and recognizing the correlation between flood-driving factors and flood incidence requires an accurate and precise flood inventory map (Pradhan et al. 2012; Tehrany and Jones 2017; Mahyat et al. 2019).
The flood inventory map can be prepared from data collected through satellite image or Google Earth imagery interpretation, historical records, and extensive field surveys. In the present work, 1404 flood inventory points were collected from historical records, Google Earth imagery interpretation, and extensive fieldwork (Fig. 2). Several suggestions are provided in the literature regarding the size of the flood samples to be used for modeling and model verification (Ohlmacher and Davis 2003). Based on a literature review, the flood inventory data were split into 70% (983) for the training dataset and 30% (421) for the testing dataset, preserving their spatial distribution, using the subset tool in ArcGIS 10.1 (Lee et al. 2012; Tehrany et al. 2013; Khosravi et al. 2016; Mahyat et al. 2019), as shown in the figure. The same number of flood and non-flood points was chosen for the logistic regression analysis.
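The 70/30 partition described above can be illustrated with a minimal sketch. It assumes a plain seeded random split over hypothetical point identifiers; the study itself used the subset tool in ArcGIS 10.1, and this sketch does not attempt to preserve the spatial distribution of the points.

```python
import random


def split_inventory(points, train_fraction=0.7, seed=42):
    """Randomly split flood points into training and testing subsets.

    Assumption: a simple random shuffle; no spatial stratification.
    """
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical IDs standing in for the 1404 mapped flood locations.
flood_ids = list(range(1404))
train, test = split_inventory(flood_ids)
print(len(train), len(test))  # 983 421
```

Fixing the seed keeps the split reproducible, which matters when several models are compared on the same testing points.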

Flood Driving Factors
The selection of flood factors is one of the most crucial elements in flood susceptibility mapping and depends on the physical and natural characteristics of the study area and on data availability (Kia et al. 2012; Liuzzo et al. 2019); however, there are no well-defined standards for selecting the most significant flood-driving factors. The factors that initiate flooding in the study area were selected based on the area's environmental conditions, data availability, logistic regression analysis, and a literature review (Lee et al. 2012; Mahyat et al. 2019). Slope angle, slope curvature, land use, soil texture, distance to stream/river, stream density, normalized difference vegetation index (NDVI), flow accumulation, groundwater depth, rainfall, and elevation were considered in order to examine their spatial relationship with flood occurrence in the study area. These factors were classified into sub-factor classes using natural breaks in ArcGIS to evaluate the effect of each flood factor class for the frequency ratio and information value methods. The DEM-derived flood factor maps, namely distance to stream (five classes), slope angle (five classes), flow accumulation (five classes), stream density (five classes), elevation (five classes), and slope curvature (three classes), were constructed from a 12.5 m x 12.5 m resolution DEM (Fig. 3). The soil map of the study area was digitized from a 1:250,000 textural soil map of the Amhara Region and has four classes (silty sand, sandy loam, clay, and loam). The land use and NDVI maps of the study area were prepared from Sentinel-2 satellite image analysis in ArcGIS with the help of high-resolution Google Earth image interpretation. The LULC map has eight classes, namely grazing land, agricultural land, barren land, residential/settlement, river zone/water body, dense forest, moderate forest, and wetland (Fig. 3), whereas the NDVI map has five classes. The rainfall and groundwater depth raster maps were constructed using ArcGIS 10.1 from annual mean rainfall and well data collected from the Amhara Meteorological Agency and the Amhara Water Well Drilling Enterprise, respectively. To determine the effect of each flood factor class on flood occurrence, a weight rating obtained by combining each flood factor raster with the flood raster map is important. For this purpose, all flood factor maps were converted to rasters and reclassified with the same pixel size (12.5 m x 12.5 m) and the same projection using GIS tools. The flood inventory map was then overlaid on each flood factor class raster using the combine tool in the local toolset of the spatial analyst tools to extract the flood pixels for each flood-driving factor class. The effect of each factor class was then determined using the frequency ratio and information value equations, as summarized in Table 1.

Methodology
To achieve the goal of the present research, various activities and steps were carried out. These are data collection, flood inventory mapping, database creation for flood factors, flood susceptibility modeling using the frequency ratio, information value, logistic regression, and AHP methods, and model validation using the receiver operating characteristic (ROC) curve. Appropriate data, including a topographic map, borehole data, a digital elevation model (DEM) with 12.5 m resolution, historical flood events, a soil type map, a geological map, and meteorological data, were collected. These data were obtained from the United States Geological Survey (USGS), the Amhara Water Well Drilling Enterprise (AWWDE), field surveys, Google Earth imagery from NASA, the Ethiopian National Meteorological Agency, and the Geological Survey of Ethiopia (GSE). The flood locations in the study area were identified using historical records, Google Earth imagery analysis, and an intensive field survey. They were split into training and testing flood datasets. The training dataset was used for model preparation, whereas the testing dataset was used to evaluate model prediction accuracy. Based on data availability, local environmental conditions, data evaluation, the literature, and interviews with local people, eleven flood-driving factors were determined. The flood-driving factor maps and the flood inventory map were prepared using ArcGIS 10.1.
Geodatabase building is one of the most fundamental elements in flood susceptibility mapping. Therefore, four databases were built for the information value, logistic regression, frequency ratio, and analytical hierarchy process (AHP) models. The frequency ratio, information value, and AHP databases contain the flood inventory and the flood-driving factors, while the logistic regression database contains flood and non-flood points with the eleven weighted flood-driving factors. After the databases were built, the next step was to evaluate the relationship between floods and flood factors and to determine the statistical significance of each flood factor. Therefore, the eleven flood factor maps were reclassified into subclasses and overlaid with the reclassified training flood dataset. Weight ratings for all flood factor classes were assigned statistically using Excel. These weighted maps were rasterized using the lookup tool in spatial analyst.
After the factor maps were rasterized, the flood susceptibility index maps were generated by summing all raster maps using the raster calculator in Map Algebra. These index (LSI) maps were classified into five susceptibility classes, very low, low, moderate, high, and very high, using natural breaks (Figs. 5, 6, 7, and 8). In the case of the logistic regression method, the study area was sampled as training flood and non-flood points using GIS. Then, the weights of the eleven factors were extracted to generate the logistic regression coefficient of each flood factor in SPSS, and finally, the flood susceptibility index of the area was generated using the logistic flood probability equation (Eq. 8) and GIS tools (Fig. 3).
Finally, the accuracy of the four models was evaluated using the prediction rate curve based on the observed testing flood dataset (Fig. 9).
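As an illustration of how an AUC value can be obtained from model output, the sketch below uses the rank-based (pairwise) definition of the area under the ROC curve. The score lists are hypothetical, and the function name is introduced here only for illustration; the prediction rate curve used in the study may be constructed differently (for example, from cumulative area and cumulative flood percentages).

```python
def auc_from_scores(flood_scores, nonflood_scores):
    """Rank-based AUC: share of (flood, non-flood) pairs in which the flood
    point has the higher susceptibility index; ties count as half."""
    if not flood_scores or not nonflood_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for f in flood_scores:
        for nf in nonflood_scores:
            if f > nf:
                wins += 1.0
            elif f == nf:
                wins += 0.5
    return wins / (len(flood_scores) * len(nonflood_scores))


# Hypothetical susceptibility index values sampled at testing points.
flood_scores = [0.9, 0.8, 0.4]
nonflood_scores = [0.7, 0.3, 0.2]
auc = auc_from_scores(flood_scores, nonflood_scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a model that ranks flood and non-flood points no better than chance, and 1.0 to perfect separation.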

Modeling Approaches

Information Value Model
The information value method is a probabilistic bivariate statistical method used to evaluate the correlation between floods and flood factor classes (Sakar et al. 2006). The information value of each factor class was determined by combining the reclassified flood raster with the reclassified flood factor raster based on the presence of floods in a given map unit. These values define the role of each causal factor class in flood occurrence. The information value is calculated as in Eq. 1.
$$\mathrm{IV} = \log\left(\frac{\text{Conditional probability}}{\text{Prior probability}}\right) = \log\left(\frac{N_{fopix}/N_{cpix}}{N_{tfopix}/N_{tcpix}}\right) \quad (1)$$
where the conditional probability is the ratio of the number of flood pixels in a class to the number of pixels in that class, and the prior probability is the ratio of the total number of flood pixels to the total number of pixels in the study area. Nfopix is the flood pixel count/area in a flood factor class, Ntfopix is the total flood area in the entire study area, Ncpix is the area of the class in the study area, and Ntcpix is the total pixel area of the entire study area. When IV > 0.1, flood occurrence is highly correlated with the factor class, meaning a high probability of flood occurrence; when IV < 0.1 or IV < 0, the correlation between the flood factor class and flood occurrence is low, indicating a low probability of flood occurrence. After the information value of each flood factor class was calculated using Microsoft Excel and GIS, it was assigned to the factor class through a join in ArcGIS. The weighted flood factors were then rasterized using the lookup tool in spatial analyst, and the flood susceptibility index (LSI) of the study area was calculated as in Eq. 2:
$$\mathrm{LSI} = \sum_{i=1}^{n} \mathrm{IV}_i \quad (2)$$
where LSI is the flood susceptibility index, n is the number of flood factors, and IV_i is the information value of the class of the i-th factor. A higher LSI value indicates a higher probability of flood occurrence.
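The information value calculation described above can be sketched as follows. This is a minimal illustration, not the authors' workflow: the pixel counts are hypothetical, the function names are introduced here, and the natural logarithm is assumed because the base is not stated in the text.

```python
import math


def information_value(n_flood_in_class, n_class, n_flood_total, n_total):
    """Log of conditional probability over prior probability.

    Assumption: natural logarithm. Classes with no flood pixels are
    rejected because the logarithm of zero is undefined.
    """
    if n_flood_in_class <= 0:
        raise ValueError("class contains no flood pixels")
    conditional = n_flood_in_class / n_class  # flood density in the class
    prior = n_flood_total / n_total           # flood density in the area
    return math.log(conditional / prior)


def susceptibility_index(class_values):
    """Sum the information values of the classes present at one pixel."""
    return sum(class_values)


# Hypothetical counts: 50 flood pixels in a 1000-pixel class,
# 100 flood pixels in a 10000-pixel study area.
iv = information_value(50, 1000, 100, 10000)
lsi = susceptibility_index([iv, -0.3, 0.2])  # other values are made up
print(round(iv, 3), round(lsi, 3))
```

A class whose flood density equals the area-wide density gets an information value of zero, so positive values mark classes that are over-represented among flooded pixels.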

Logistic Regression Model
Logistic regression is one of the popular multivariate statistical analysis methods and can be used to establish a multivariate regression relationship between dependent and independent variables (Pradhan and Lee 2010). Among statistical methods, the logistic regression model has proven to be one of the most reliable approaches for flood susceptibility mapping. The sample data for the analysis can be prepared in three ways. The first uses all data from the entire study area. However, this leads to an uneven proportion of non-flood and flood pixels and incorporates a large volume of data in the analysis.
The second method uses all flood pixels with an equal number of non-flood pixels; this also gives a less reliable output, but it reduces the sample size and the sampling bias. The third method uses an unequal or equal proportion of flood and non-flood pixels by dividing the floods into training and testing datasets.
In the present work, the floods of the study area were divided into training (70%) and testing (30%) flood datasets. The dependent variable is binary, taking the values 0 and 1, which represent the absence and presence of floods, respectively. Consequently, an equal number of non-flood sample points, whose dependent variable value is 0, were randomly selected from flood-free areas using GIS to represent the absence of floods. The flood points and the equal number of non-flood points were merged. The values of all independent variables at the flood and non-flood points were then extracted from the maps of each flood-governing factor using ArcGIS. Then, the logistic regression was run and the coefficients were calculated in SPSS. The model can be expressed mathematically (Lee and Sambath 2006; Schicker and Moon 2012) as
$$P = \frac{1}{1 + e^{-Z}}$$
where P is the probability of flood occurrence, which varies from zero to one, and Z is the linear combination of the predictors. Z ranges from -∞ to +∞: negative values correspond to higher odds of non-flood occurrence and positive values to higher odds of flood occurrence. Z is defined as
$$Z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_n x_n$$
where x1, x2, x3, ..., xn are the independent variables, β0 is the intercept of the logistic regression, and β1, β2, β3, ..., βn are the coefficients of the logistic regression.
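To make the relationship between the linear predictor Z and the probability P concrete, the sketch below evaluates the standard logistic function for one point. The intercept, coefficients, and factor values are hypothetical placeholders, not the SPSS output of this study, and the function name is introduced here for illustration.

```python
import math


def flood_probability(intercept, coefficients, factor_values):
    """Logistic probability P = 1 / (1 + exp(-Z)) with
    Z = intercept + sum(coefficient_i * factor_value_i)."""
    if len(coefficients) != len(factor_values):
        raise ValueError("one coefficient is needed per factor value")
    z = intercept + sum(b * x for b, x in zip(coefficients, factor_values))
    # Numerically stable evaluation for large negative or positive Z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical intercept and two coefficients (not values from the study).
p = flood_probability(-1.2, [0.8, -0.5], [2.0, 1.0])
print(round(p, 3))
```

Z equal to zero gives P = 0.5; positive Z pushes the probability toward one and negative Z toward zero.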

Frequency Ratio Model
The frequency ratio is a bivariate probability method that is applicable for determining the correlation between flood occurrence and flood causative factor classes. It is the ratio of the proportion of flood occurrence in a factor class to the proportion of the study area that the class occupies. When the ratio is greater than one, it indicates a strong correlation between the factor class and flood occurrence in a given terrain, whereas a ratio less than one indicates a weak correlation, which means a low probability of flood occurrence (Lee and Talib 2005). It is calculated using Eq. 5.
$$\mathrm{FR} = \frac{N_{fopix}/N_{tfopix}}{N_{cpix}/N_{tcpix}} = \frac{a}{b} \quad (5)$$
where FR is the frequency ratio, Nfopix is the flood pixel count/area in a flood factor class, Ntfopix is the total flood area in the entire study area (their ratio is a), Ncpix is the area of the class in the study area, and Ntcpix is the total pixel area of the entire study area (their ratio is b). In the present work, the frequency ratio of each causative factor class was calculated using Eq. 5, and the results are summarized in Table 1.
After the frequency ratio of each flood factor class was calculated using Microsoft Excel and GIS, it was assigned to the factor class through a join in ArcGIS. The weighted flood factors were then rasterized using the lookup tool in spatial analyst. The flood susceptibility index (LSI) of the study area was calculated by summing the weighted factor raster maps, that is, the frequency ratios of each factor type or class, with the raster calculator in Map Algebra of the spatial analyst tools, as in Eq. 6:
$$\mathrm{LSI} = \sum_{i=1}^{n} \mathrm{FR}_i \quad (6)$$
where LSI is the flood susceptibility index, n is the number of flood factors, Xi is the i-th flood factor, and FRi is the frequency ratio of the type or class of Xi. The flood susceptibility index indicates the degree of susceptibility of the area to flood occurrence. After the flood susceptibility index was calculated, the index values were classified into different flood susceptibility zones using natural breaks in ArcGIS. The higher the flood susceptibility index (LSI), the higher the probability of flood occurrence, and the lower the LSI, the lower the probability of flood occurrence.
Based on the natural breaks classification, the flood susceptibility map of the study area has five classes: very low, low, moderate, high, and very high flood susceptibility (Fig. 5).
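The frequency ratio workflow of this section (class-level ratios followed by a per-pixel sum) can be sketched as below. All pixel counts, class labels, and the curvature ratios are hypothetical values chosen for illustration, and the function names are not part of any GIS package; the study performed these steps in Excel and ArcGIS.

```python
def frequency_ratio(n_flood_in_class, n_flood_total, n_class, n_total):
    """Share of flood pixels in the class (a) over share of area in the class (b)."""
    flood_share = n_flood_in_class / n_flood_total  # a
    area_share = n_class / n_total                  # b
    return flood_share / area_share


def susceptibility_index(pixel_classes, fr_tables):
    """Sum, over all factors, the FR of the class present at one pixel."""
    return sum(fr_tables[factor][cls] for factor, cls in pixel_classes.items())


# Hypothetical counts per slope class: (flood pixels, class pixels).
slope_counts = {"0-5": (96, 4000), ">5": (4, 6000)}
n_flood_total = sum(f for f, _ in slope_counts.values())
n_total = sum(a for _, a in slope_counts.values())
slope_fr = {
    cls: frequency_ratio(f, n_flood_total, a, n_total)
    for cls, (f, a) in slope_counts.items()
}

# Second factor with made-up FR values, to show the per-pixel summation.
fr_tables = {
    "slope": slope_fr,
    "curvature": {"flat": 1.5, "convex": 0.6, "concave": 0.9},
}
lsi = susceptibility_index({"slope": "0-5", "curvature": "flat"}, fr_tables)
print(round(slope_fr["0-5"], 2), round(lsi, 2))
```

In a raster workflow the same lookup and sum would be applied to every pixel, after which the index values are grouped into susceptibility classes.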

Analytical Hierarchy Process (AHP)
The AHP is a qualitative method used to determine the relationship between flood factor classes and flood occurrence. It is a structured tool for analyzing difficult decisions based on mathematics and psychology (Cho et al. 2015; Nguyen et al. 2015; Saaty 2000; Zhang et al. 2016). To produce the factor weights, the pairwise comparison method was used with Saaty's ranking scale (Luu et al. 2018; Saaty 2008). The consistency of the calculated weight of each flood factor class was examined with the consistency ratio, calculated using Eq. 7 (Luu et al. 2018; Saaty 2001). When the consistency ratio (CR) is less than 0.1, the weights calculated from the comparison matrix are consistent; if it is greater than 0.1, the comparison matrix is inconsistent and should be revised.
After the weight of each factor class was determined, the flood susceptibility map was produced as shown in Eq. 9 (Rahmati et al. 2016c).
$$\mathrm{CR} = \frac{\mathrm{CI}}{\mathrm{RI}} \quad (7)$$
$$\mathrm{CI} = \frac{\lambda_{\max} - n}{n - 1}$$
$$\mathrm{FSI} = \sum_{i=1}^{n} W_i X_i \quad (9)$$
where CR is the consistency ratio, CI is the consistency index, RI is the average random consistency index of the judgment matrix, λmax is the largest eigenvalue of the pairwise comparison matrix, n is the number of flood factors, Wi is the weight of the i-th flood factor, Xi is the i-th flood factor, and FSI is the flood susceptibility index.
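The AHP steps described above (weights from a pairwise comparison matrix, followed by a consistency check) can be sketched in plain Python. The 3 x 3 matrix is a hypothetical, perfectly consistent example rather than the judgment matrix of this study; the weights are approximated by power iteration; and the random index values are the commonly tabulated Saaty values, which should be checked against the source actually used.

```python
# Commonly tabulated random consistency index values (assumed here).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49, 11: 1.51}


def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector (weights) and largest
    eigenvalue of a pairwise comparison matrix by power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n))
                   for i in range(n)]
        total = sum(product)
        weights = [p / total for p in product]
    product = [sum(matrix[i][j] * weights[j] for j in range(n))
               for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    return weights, lambda_max


def consistency_ratio(lambda_max, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    if n < 3:
        return 0.0  # 1x1 and 2x2 matrices are always consistent
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def weighted_index(weights, factor_values):
    """Weighted sum of factor values at one pixel."""
    return sum(w * x for w, x in zip(weights, factor_values))


# Hypothetical, perfectly consistent comparison of three factors.
pairwise = [[1.0, 2.0, 4.0],
            [0.5, 1.0, 2.0],
            [0.25, 0.5, 1.0]]
weights, lambda_max = ahp_weights(pairwise)
cr = consistency_ratio(lambda_max, len(pairwise))
fsi = weighted_index(weights, [3.0, 2.0, 1.0])  # made-up factor ratings
print([round(w, 3) for w in weights], round(cr, 3))
```

For a real judgment matrix, a CR above 0.1 would signal that the pairwise comparisons should be revised before the weights are used.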

Frequency Ratio Results
The frequency ratio method was used to calculate the FR for each subclass of every flood-driving factor, that is, the ratio of the flood occurrence ratio to the area ratio. The FR results are summarized in Table 1. A greater FR value indicates a stronger correlation between the flood factor class and flood occurrence, and the probability of flood occurrence is higher when FR is greater than unity (Table 1 and Fig. 4). As shown in Table 1 and Fig. 4, the FR value for the first slope class, 0°-5°, is greater than 1, indicating a higher probability of flood occurrence; this class contains 96% of the flooded area. This finding is consistent with other studies (e.g., Rahmati and Pourghasemi 2017; Tehrany et al. 2014; Shafizadeh et al. 2018). Slope gradients greater than 5°, however, are less correlated with flood occurrence. This result confirms that as the slope gradient increases, the probability of flood occurrence in a given terrain decreases. This is because the steeper the slope gradient, the higher the downslope water velocity and the lower the water concentration and the infiltration of rainwater into the ground. When the slope gradient decreases, the potential for surface water concentration and rainwater infiltration into the ground increases, depending on the hydraulic behavior of the soil in the region. A higher concentration of surface water results in a high probability of flood incidence.
Slope curvature is another flood factor, with three classes: convex, concave, and flat slope shapes. As the correlation analysis of the curvature classes with the flood inventory in Table 1 indicates, the flat class received a higher FR value, indicating a strong correlation with flood occurrence; 56.1% of the flooded area falls in this class. This is due to the higher potential for rainwater concentration and the low infiltration of rainwater caused by its flatness and the presence of impermeable soil formations. Because this class is flat, overflow of water from the riverbed is high, which is why the flat curvature class indicates a higher flood occurrence probability. This finding is consistent with other studies (Cao et al. 2016; Chapi et al. 2017; Khosravi et al. 2016; Shafizadeh et al. 2018). As indicated in Table 1, the relationship between elevation and the relative likelihood of flood occurrence is negative at elevations > 1,972 m, meaning that the probability of flood occurrence is lower in elevated lands than in lowlands (Shafizadeh et al. 2018). This result is similar to previous studies (Hong et al. 2016; Shafizadeh et al. 2018).
In the spatial prediction of flood-prone areas in a catchment, distance to the river is a critical factor because floods occur when water overflows the riverbanks (Chapi et al. 2017). Therefore, areas closer to riverbeds respond rapidly to rainstorms and flooding. As shown in Table 1, the first four classes (0-100 m, 100-300 m, 300-500 m, and 500-700 m) indicate a strong correlation with flood occurrence, and 57.1% of the flooded area falls in these classes, but the FR value decreases as the distance to the riverbed increases. This result confirms that the closer an area is to the riverbed, the higher the flood occurrence probability (Chapi et al. 2017; Hong et al. 2020; Shafizadeh et al. 2018). Flow accumulation is one of the most important parameters in flood susceptibility mapping (Pradhan 2010). A higher FR value for flow accumulation indicates a higher concentration of water and consequently a higher flood occurrence probability. As Table 1 indicates, the FR value increases as flow accumulation increases. Land use and land cover is another important parameter in flood susceptibility mapping, which can influence the interrelationship between surface water and groundwater, the amount of infiltration, surface water concentration, and overland flow. As the correlation analysis of land use with the flood inventory in Table 1 indicates, the river zone, barren land, grazing land, settlement, and moderate vegetation/cropland classes have higher FR values, indicating a higher flood occurrence probability; 37% of the flooded area falls in these land-use classes. Moderate vegetation/cropland favors rainwater infiltration, and because the groundwater in this region is shallow, overland flow is enhanced, which is why the moderate vegetation class received a higher FR value. The urban and grazing land classes received higher FR values because of their impermeable nature, indicating a stronger correlation with flood occurrence probability. This result is in line with the work of Shafizadeh et al. (2018). The NDVI is one of the important parameters for flood susceptibility mapping; its value ranges from -1 to 1. Values closer to 1 imply higher vegetation cover, whereas values closer to -1 imply lower vegetation cover. A higher NDVI indicates dense vegetation that can reduce and slow water flow (Turoglu and Dolek 2011). This gives the water time to infiltrate into the ground, resulting in a decrease in water volume and a lower probability of flood occurrence. However, this depends on the hydraulic behavior of the soil and the depth of groundwater. In this study, the NDVI ranges from -1 to 1, that is, from non-vegetated to highly vegetated regions. As vegetation density increases, the flood susceptibility of a region decreases, depending on the depth of groundwater and the vegetation type.
As the NDVI and flood inventory correlation analysis in Table 1 indicates, the first, third, fourth, and fifth NDVI classes received higher FR values, indicating a higher flood occurrence probability. This is because the groundwater of the study area is shallow, which increases overland flow by reducing the rate of rainwater infiltration; 60.4% of the flooded area falls in these classes.
Table 1 shows that as stream density increases, the FR value increases in parallel, indicating a high flood occurrence probability (Chapi et al. 2017; Shafizadeh et al. 2018). The stream density classes 3.5-5.1 m/km² and 5.1-8.8 m/km² received high FR values, indicating a strong correlation with flood occurrence, and 61.5% of the flooded area falls in these classes.
The amount of surface water concentration and the rainwater infiltration rate mainly depend on the hydraulic behavior of soils in the region. When the soil mass in a region is highly pervious, the rate of water infiltration into the ground is higher and the amount of surface water concentration is lower, which reduces the flood incidence probability in the region. However, this is strongly affected by the depth of groundwater. As the soil and flood inventory correlation analysis in Table 1 indicates, silty sand and clay soil masses received higher FR values than loam and sandy loam soil masses, indicating a higher flood incidence probability. This is because of the impervious behavior of fine-grained soils. When the grain size of a soil mass decreases, the percentage of pore space between soil grains increases but the pore diameter becomes small. This blocks the flow of water inside the soil, so these soils have a high water holding capacity, which in turn increases the overland flow of water and contributes to a high flood incidence probability. 88% of the flooded area falls in the silty sand and clay soil masses.
Table 1 indicates that the shallow groundwater class received a high FR value, indicating a high flood incidence probability; 97.2% of the flooded area falls in the very shallow groundwater depth class.
Even though rainfall is one of the most important flood driving factors, its effect depends strongly on the nature of the ground and the depth of the river channel. As the rainfall and flood inventory analysis in Table 1 indicates, the annual mean rainfall class of 106-113 mm received a high FR value, indicating a high flood incidence probability. This is because of the impervious hydraulic behavior of the soil mass, the low slope gradient, and the shallow groundwater depth. 68.5% of the flooded area falls in this class (106-113 mm).
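The FR values discussed in this section can be recomputed from per-class counts. The following is a minimal sketch, assuming the usual definition of the frequency ratio as the share of flooded area in a class divided by the share of the study area occupied by that class. The function name and the pixel counts are hypothetical and are not taken from Table 1.

```python
def frequency_ratio(flooded_in_class, class_area):
    """Return the frequency ratio (FR) for every class of one flood factor.

    flooded_in_class: flooded area (or pixel count) in each class
    class_area: total area (or pixel count) of each class
    """
    total_flooded = sum(flooded_in_class)
    total_area = sum(class_area)
    ratios = []
    for a, b in zip(flooded_in_class, class_area):
        share_of_floods = a / total_flooded   # fraction of all flooded area in this class
        share_of_area = b / total_area        # fraction of the study area in this class
        ratios.append(share_of_floods / share_of_area)
    return ratios


# Hypothetical pixel counts for three curvature classes (concave, flat, convex).
flooded = [200, 600, 200]
area = [4000, 2000, 4000]
fr = frequency_ratio(flooded, area)
```

Under this definition, a value above 1 means the class holds a larger share of floods than of land, which is the sense in which a "higher FR value" is read in the text.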

Information value Results
ArcGIS 10.2 and Microsoft Excel were used to calculate the information value (IV) of each factor class in order to determine the statistical significance of each factor class for flood incidence probability. A factor class with a higher (positive) information value indicates a higher flood occurrence probability, whereas a factor class with a lower (negative) information value indicates a negative or weak correlation with flood occurrence probability. For example, as shown in Table 1, the first four distance-to-stream classes indicate a positive correlation with flood occurrence, but the remaining distance-to-stream classes show negative correlations with flood occurrence probability. The slope class > 5°, elevation > 1,972 m, the first class of flow accumulation, the distance to stream class > 700 m, the stream density classes (0-0.8 Km², 0.8-2.1 Km², and 2.1-3.5 Km²), slope curvature (concave and convex slopes), LULC (dense forest, wetland, and agricultural land), the second and third classes of NDVI, soil texture (sandy loam and loam), and groundwater depth > 1,951 m showed a negative statistical correlation with flood occurrence probability (Table 1).
IV is information value, FR is frequency ratio, a is the flooded area in a factor class, b is the area of the factor class, Con_P is the conditional probability, and Prio_P is the prior probability.
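The sign convention described above can be illustrated with a short sketch. It assumes that the information value is the natural logarithm of the ratio of the conditional probability (Con_P = a / b) to the prior probability (Prio_P = total flooded area / total study area), which is the common form of the method. The function name and the numbers are hypothetical.

```python
import math


def information_value(flooded_in_class, class_area, total_flooded, total_area):
    """Information value of one factor class (assumed form: ln(Con_P / Prio_P))."""
    if flooded_in_class == 0:
        # No flood pixels in the class: the logarithm is undefined, so report -inf.
        return float("-inf")
    con_p = flooded_in_class / class_area    # Con_P = a / b
    prio_p = total_flooded / total_area      # Prio_P = flood density of the whole area
    return math.log(con_p / prio_p)


# Hypothetical counts: a class holding many floods and a class holding few.
iv_flat = information_value(600, 2000, 1000, 10000)
iv_concave = information_value(200, 4000, 1000, 10000)
```

With these counts the first class receives a positive value and the second a negative one, matching the positive and negative correlations read from Table 1.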

Logistic Regression Results
Because sets of independent variables are sensitive to collinearity (interrelatedness of the independent variables), which can be checked using the tolerance (TOL) and the variance inflation factor (VIF), a multicollinearity test was applied using SPSS software before the logistic regression analysis. When TOL < 0.2 and VIF > 5, the given independent variable exhibits multicollinearity. As the multicollinearity test in Table 2 indicates, none of the independent variables used in the flood susceptibility analysis showed multicollinearity. Using logistic regression analysis in SPSS, the logistic regression coefficients for all flood-driving factors were determined. Similar to the information value method, positive logistic regression coefficients indicate a positive association with flood occurrence probability, whereas negative logistic regression coefficients indicate a negative correlation of the flood factors with flood occurrence probability. As the logistic regression analysis in Table 2 indicates, stream density, NDVI, rainfall, and curvature received negative logistic regression coefficients, whereas the remaining factors received positive coefficients, indicating that those factors are positively associated with flood occurrence probability.
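The multicollinearity check can be sketched outside SPSS as follows. This is an illustration under stated assumptions, not the authors' procedure: each factor is regressed on the remaining factors by ordinary least squares, TOL is taken as 1 - R², and VIF as 1 / TOL. The factor names and the synthetic data are hypothetical.

```python
import numpy as np


def tolerance_and_vif(X):
    """Return a (TOL, VIF) pair for every column of X (rows = samples)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    results = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])   # intercept + other factors
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        tol = ss_res / ss_tot                            # equals 1 - R^2
        vif = 1.0 / tol if tol > 0 else float("inf")
        results.append((tol, vif))
    return results


# Synthetic example: "elevation" is built to be nearly collinear with "slope".
rng = np.random.default_rng(0)
slope = rng.normal(size=200)
rainfall = rng.normal(size=200)
elevation = 0.95 * slope + 0.05 * rng.normal(size=200)
checks = tolerance_and_vif(np.column_stack([slope, rainfall, elevation]))

# Apply the thresholds quoted in the text (TOL < 0.2 and VIF > 5).
flagged = [tol < 0.2 and vif > 5 for tol, vif in checks]
```

In this synthetic case the slope and elevation columns are flagged while rainfall is not; in the study itself no factor was flagged (Table 2).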

AHP Pairwise Comparison Matrix Results
After reclassifying and ranking the eleven flood factor thematic rasters into subclasses, the pairwise comparison was performed for 5 x 5, 8 x 8, 4 x 4, and 3 x 3 matrices using the AHP calculator (Table 3), where each diagonal element is equal to 1. As indicated in Table 3, the significance of the sub-criteria for each factor is shown in the rows of the pairwise comparison matrix. The first row in Table 3 illustrates the significance of the first slope angle class compared to the other slope angle classes.
For instance, the first slope angle class (0°-5°) is significantly more important for flood probability than the other slope classes placed in the columns, and it is assigned 9. However, the last slope angle classes in the rows have less significance for flood probability and are assigned the reciprocal values of the pairwise comparison (e.g., 1/9 for the last slope class, 29°-77°). The weight ratings for all parameters are summarized in Table 3, and the consistency of the factor class weights was evaluated using the consistency ratio (CR). When CR < 0.1, the consistency of the weights is affirmed. As indicated in Table 3, the CR value for all factor classes is less than 0.1, indicating no inconsistency in the weights. Based on the results of the pairwise comparison analysis, as the slope angle, elevation, and groundwater depth increase, the flood probability decreases, and vice versa. Similarly, as the distance to the riverbed increases, the flood probability decreases. Concerning the other parameters, as the stream density, rainfall, and flow accumulation increase, the flood probability increases (Table 3). The flood occurrence probability and its impact also depend on the hydraulic behavior of the soil. If the permeability of the soil is high, the flood probability is low. This depends on the grain size and the diameter of the pore spaces between soil particles. Clay soil therefore has low permeability and a high water holding capacity, which is why it received a high value (9) in the pairwise comparison matrix (Table 3). In the study area, settlement, bare land, agricultural land, grazing land, water body, and wetland, in that order, contribute more to flood occurrence than the forested regions.
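The weight derivation and consistency check can be sketched as follows. The example assumes the standard Saaty procedure (weights from the principal eigenvector, CI = (lambda_max - n) / (n - 1), CR = CI / RI with Saaty's random index values). It is not the AHP calculator used in the study, and the 3 x 3 matrix is hypothetical rather than taken from Table 3.

```python
import numpy as np

# Saaty's random consistency index for matrix sizes 3 to 9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Return (weights, consistency_ratio) for a reciprocal pairwise matrix."""
    A = np.asarray(matrix, dtype=float)
    n = A.shape[0]
    eigenvalues, eigenvectors = np.linalg.eig(A)
    principal = int(np.argmax(eigenvalues.real))     # index of lambda_max
    lambda_max = eigenvalues[principal].real
    w = np.abs(eigenvectors[:, principal].real)
    w = w / w.sum()                                  # normalise weights to sum to 1
    ci = (lambda_max - n) / (n - 1)                  # consistency index
    cr = ci / RANDOM_INDEX[n]                        # consistency ratio
    return w, cr


# Hypothetical comparison of three curvature classes (flat, concave, convex).
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights, cr = ahp_weights(pairwise)
is_consistent = cr < 0.1
```

The same function applies to the 4 x 4, 5 x 5, and 8 x 8 matrices, since the random index table covers those sizes.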

Flood Susceptibility Model
Frequency Ratio Flood Susceptibility Model
After the weight rating of each flood-driving factor class using FR, each flood-driving factor was converted into a raster using the lookup tool in the spatial analysis option of ArcGIS 10.2. The flood susceptibility index of the study area was generated by summing all raster maps using the raster calculator in spatial analysis. The flood susceptibility index (Fig. 5) was reclassified into five classes (very low, low, moderate, high, and very high) using the natural break method in ArcGIS as shown in Eq. 6.

Information Value Flood Susceptibility Model
Similar to the frequency ratio method, the flood susceptibility index generated using the information value method (Fig. 6) was reclassified into five classes (very low, low, moderate, high, and very high) using the natural break method in ArcGIS as shown in Eq. 2. As shown in Table 4, the high and very high flood susceptibility classes cover 20.3% and 20.2% of the study area, respectively. The remaining 13.1%, 23.9%, and 22.5% of the study area are covered by very low, low, and moderate flood susceptibility areas, respectively.
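The index construction used for the FR and IV models (summing the weighted factor rasters, then reclassifying into five classes) can be sketched with arrays standing in for rasters. The break values below are hypothetical placeholders supplied by hand; the natural breaks optimisation performed by ArcGIS is not implemented here, and the tiny 2 x 2 rasters are illustrative only.

```python
import numpy as np


def susceptibility_index(weight_rasters):
    """Cell-by-cell sum of the weighted factor rasters (same shape required)."""
    return np.sum(np.stack(weight_rasters), axis=0)


def classify(index, breaks):
    """Map index values to classes 1..len(breaks)+1 using upper-inclusive breaks."""
    return np.digitize(index, breaks, right=True) + 1


# Two hypothetical factor rasters already converted to FR weights.
slope_fr = np.array([[2.0, 0.5], [1.2, 0.1]])
river_fr = np.array([[1.8, 0.4], [0.9, 0.2]])

index = susceptibility_index([slope_fr, river_fr])
classes = classify(index, breaks=[0.5, 1.0, 2.0, 3.0])   # 1 = very low ... 5 = very high
```

The share of the study area in each class, as reported in Table 4, would then follow from counting the cells per class.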

Logistic Regression Flood Susceptibility Model
In the logistic regression method, the logistic regression coefficient for each individual factor was determined using SPSS. The linear combination of the LR constant and the products of the factors with their LR coefficients is called Z, which is calculated as shown in Eq. 4. The value of Z is entered into Eq. 3 to generate the flood probability index (P). The value of P ranges from 0 to 1, and the closer the value is to one, the higher the flood susceptibility of the region. Similar to the frequency ratio and information value methods, the flood susceptibility index generated using the logistic regression method (Fig. 7) was reclassified into five classes (very low, low, moderate, high, and very high) using the natural break method in ArcGIS as shown in Eq. 3.
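The step from Z to P can be sketched as follows, assuming Eq. 3 has the standard logistic form P = 1 / (1 + e^(-Z)). The intercept and the six coefficients are the ones that appear in the Z expression printed with Table 4; the remaining factors are left out of this illustration, and the factor values are hypothetical class ranks.

```python
import math


def flood_probability(intercept, coefficients, factor_values):
    """Logistic flood probability for one cell: P = 1 / (1 + exp(-Z))."""
    z = intercept + sum(coefficients[name] * factor_values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Coefficients as printed in the Z expression; the dictionary keys are hypothetical names.
coefficients = {
    "slope": 0.769,
    "stream_density": -0.095,
    "curvature": -0.040,
    "soil_texture": 0.106,
    "land_use": 0.159,
    "distance_to_stream": 1.73,
}

# Hypothetical reclassified values for two cells that differ only in one factor.
cell_low = {name: 1 for name in coefficients}
cell_high = dict(cell_low, distance_to_stream=3)

p_low = flood_probability(-4.38, coefficients, cell_low)
p_high = flood_probability(-4.38, coefficients, cell_high)
```

Because the coefficient of the distance-to-stream raster is positive, the cell with the higher value of that raster receives the higher probability.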

Model Validation and Comparison
The main aim of flood susceptibility mapping is to determine the areas that are prone to flood hazards. However, flood susceptibility modeling without prediction and model performance evaluation is of little use for disaster reduction programs. Although researchers have used many techniques to validate flood susceptibility models, the receiver operating characteristics (ROC) method is routinely used (Shafizadeh et al. 2018; Tehrany et al. 2013; Liuzzo et al. 2019) because of its simplicity and because it produces clear and reliable results (Samanta et al. 2018; Rhmati et al. 2016; Khosravi et al. 2016; Pradhan and Lee 2010). Therefore, the prediction and model performance of the flood susceptibility map of the study area was validated by comparing the flood model with existing flood data using the ROC curve (Lee et al. 2007; Tien Bui et al. 2012; Pourghasemi et al. 2012). The prediction accuracy and model performance of the flood susceptibility map were evaluated quantitatively using the ROC curve, based on the true and false positive rates (Chauhan et al. 2010; Mahyat et al. 2019). Both the training and testing datasets were used to calculate the success rate curve and the prediction rate curve. The prediction rate curve for the four models was obtained by comparing the testing flood datasets with the flood susceptibility index, while the success rate curve was obtained for the four models by comparing the training flood datasets. The AUC value ranges from 0.5 to 1 (Yesilnacar and Topal 2005), and the closer the value is to one, the higher the accuracy of the model. As the AUC analysis in Table 4 and Fig. 9 indicates, the FR model received 97.9% and 99.1% for the success rate curve and prediction rate curve, respectively. These are in line with the findings of (Kia et al. 2012; Chapi et al. 2017; Mosavi et al. 2018; Falah et al. 2019; Rahman et al. 2019).
Overall, logistic regression also causes oversimplification and generalization of the effects of flood governing factors. Frequency ratio and information value, in contrast, are simple and effective statistical methods that can extract the influence of each flood governing factor class on flood occurrence (Table 1), but they cannot consider the relationship between these flood governing factors and flood occurrence.
The analytical hierarchy process is an important method for evaluating the effects of factors and factor classes on flood occurrence probability; however, it suffers from subjectivity when weights are assigned to each factor class and flood driving factor during the pairwise comparison. In summary, no single statistical or expert-based method determines both the effects of each factor class and the general effects of the flood factors. Therefore, the combined use of bivariate and multivariate statistical methods to predict flood susceptibility in a region is essential, since no single method evaluates the effects of flood driving factors both in general and at the class level.
important role than the methods. Although all models indicated high prediction accuracy, based on the AUC values (see Table 4), the frequency ratio (FR) model is better than the analytical hierarchy process (AHP), logistic regression (LR), and information value (IV) models for regional land use planning, flood hazard mitigation, and prevention purposes.
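The AUC comparison described in this section can be illustrated with a small sketch. It uses the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve: the probability that a randomly chosen flood location has a higher susceptibility index than a randomly chosen non-flood location. The scores and labels are hypothetical, and the use of explicit non-flood points is an assumption of this illustration rather than a description of the GIS workflow used in the study.

```python
def auc(scores, labels):
    """Area under the ROC curve from susceptibility scores and 0/1 flood labels."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0       # flood point ranked above non-flood point
            elif p == n:
                wins += 0.5       # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical index values at six validation points (1 = flooded, 0 = not flooded).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc_value = auc(scores, labels)
```

Running the function on the training points gives the success rate and running it on the held-out testing points gives the prediction rate, which is the distinction drawn in the text.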

Conclusion
In flood hazard reduction and mitigation management, a flood susceptibility map is one of the key elements. Therefore, it is essential to prepare the most precise and reliable flood susceptibility map possible. The frequency ratio, information value, logistic regression, and analytical hierarchy process (AHP) models were tested in flood susceptibility mapping, and their results were compared with each other using the AUC values. The results showed that the flood susceptibility map produced by the frequency ratio method is relatively better than those produced by the AHP, logistic regression, and information value methods. However, the prediction accuracy values of all four methods indicate that the frequency ratio, AHP, logistic regression, and information value methods are all capable of producing an acceptable flood susceptibility model.
The models generated using the bivariate statistical, multivariate statistical, and AHP methods can help to understand the flood hazard problems in the study area. Although the resulting maps cannot forecast when or how often floods will occur, they provide the spatial distribution of flood probability. These models can also provide important information to researchers, local people, government, and planners to reduce the flood hazard problems in the study area. Therefore, the concerned bodies at the Wereda/District, Zone, Region, and Federal levels may take tangible actions to mitigate the flood problem by avoiding permanent activities in the high and very high susceptibility regions, together with the construction of check dams on streams.

Figures
Figure 1 Location Map of the Study Area
Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.
Figure 2 Flood location map
Figure 3 Flood governing factor maps
Figure 5 Flood Susceptibility map using the frequency ratio method
Figure 6 Flood Susceptibility map using information value method
Figure 7 Flood Susceptibility map using the logistic regression method
Figure 8 Flood Susceptibility map using analytical hierarchy process method
Figure 9 Predictive and success rate curves for IV, LR, FR, and AHP methods

Table 1 indicated that the FR value for the elevation classes decreases as the elevation of the region increases (Shafizadeh et al. 2018), indicating a higher flood probability correlation with the first class (1,774-1,972 m), in which 99% of the flooded area falls.

Table 1
Statistical analysis results of flood occurrence and flood factors using the FR and IV methods

Table 3
Pairwise comparison matrix and weight of flood factor classes

Consistency Ratio CR = 4.4% Distance Stream
As shown in Table 4, the high and very high flood susceptibility classes cover 19.8% and 20.7% of the study area, respectively. The remaining 14.1%, 23.6%, and 21.7% of the study area are covered by very low, low, and moderate flood susceptibility areas, respectively. The high and very high flood susceptibility classes fall along the Ribb River, the Gumara River, the Ribb dam, and other streams, as well as in flat and impervious soil regions. However, the low and very low susceptibility regions fall in areas of steep slope gradient and deep groundwater, as well as in densely forested and pervious regions.

Table 4
LRFSI is logistic regression flood susceptibility index, LRFSP is logistic regression flood susceptibility pixel, FRFSI is frequency ratio flood susceptibility index, FRFSP is frequency ratio flood susceptibility pixel
Z = -4.38 + 0.769*Slope raster - 0.095*Stream density raster - 0.040*Slope curvature raster + 0.106*Soil texture raster + 0.159*Land use raster + 1.73*Distance to stream raster
.9% and 99.1% success rate curve and prediction rate curve, respectively. When evaluating the accuracy of the model, the FR model indicated superior
respectively. This finding is similar to the work of (Bui et al. 2018; Samanta et al. 2018a; Rahman et al. 2019). Besides the ROC curve, flood-testing datasets that were not used for model development were overlaid on the four flood susceptibility maps. The number of flood points that fell in the very high susceptibility class was measured: as shown in Table 4, 85.2%, 55.3%, 85.1%, and 93.92% of the flood points fell in the very high susceptibility class of the FR, LR, IV, and AHP models, respectively. Here too, the FR and AHP models confirm their excellent performance, followed by the IV model. All in all, the share of flood points that fell in the very high susceptibility class is greater than 55% for every model, indicating acceptable accuracy of the IV, LR, AHP, and FR models.
Although the analytical hierarchy process, frequency ratio, information value, and logistic regression methods are routinely used for flood susceptibility mapping, they have some foreseeable limitations. For example, the logistic regression model can perform multivariate statistical analysis between a dependent variable and a set of independent variables (Table 2), but it is incapable of analyzing the impacts of the internal classes of flood governing factors individually on flood occurrence. As the results in Table 2 indicate, the importance of the flood driving factors is determined using the LR model. The result showed that among the eleven factors, distance to stream (1.73), elevation (0.8), slope gradient (0.769), flow accumulation (0.222), land use (0.159), soil texture (0.106), and groundwater depth (0.006) had the highest statistical impact on the probability of flood occurrence (Table 2).