Optimal flood susceptibility model based on performance comparisons of LR, EGB, and RF algorithms

Wadi El-Matulla, located in the eastern desert of Egypt, is the most important water basin. The Qift–Qusayr highway (west–east direction) and the Cairo–Aswan eastern desert highway (north–south direction) pass through the watershed. Many urban areas (villages and industrial areas) and agricultural lands are located at the outlet of these basins. In addition, the basin has promising potential for future economic and urban development as it is located within the Golden Triangle (governmental megaproject). The current study investigates flood hazard modeling and its impact on the area. To determine the optimal flood susceptibility mapping algorithm, the performance of three techniques was compared: logistic regression (LR), extreme gradient boosting (EGB), and random forest (RF). Remote sensing, topographic, geologic, and meteorological data, supported by field visits, were used to build the spatial and inventory database required by the models. The performance and reliability of the model predictions were evaluated using five statistical indices: receiver operating characteristic–area under the curve (ROC-AUC), overall accuracy (OAC), kappa index, root mean square error (RMSE), and mean absolute error (MAE). The models yielded ROC-AUC values of 93, 86, and 80%; OAC values of 88, 82, and 76%; kappa indices of 0.85, 0.75, and 0.51; RMSE values of 0.34, 0.42, and 0.49; and MAE values of 0.12, 0.18, and 0.24 for RF, EGB, and LR, respectively. Based on AUC values, the RF and EGB models provide excellent and very good predictions of flood susceptibility, respectively. Our results show that RF is the optimal algorithm for flood susceptibility mapping, followed by EGB and LR. The flood susceptibility map produced by the RF model was classified into five classes, namely very low (51.7%), low (23.7%), moderate (16.2%), high (7.1%), and very high (1.3%).
Ultimately, the RF model was verified against Sentinel-1 imagery of real floods in 2016 and 2021 and showed good agreement. The optimal model could be useful for decision makers and planners to protect existing facilities and plan future projects in non-flood-prone areas. Accordingly, future development should be sited mainly in the low and very low flood hazard areas.


Introduction
Nowadays, floods are a global hazard due to climate change and environmental variability (Bubeck and Thieken 2018;Xu et al. 2019). They can pose a significant threat to human life and property and cause damage to the physical environment (agricultural land, urban facilities, and infrastructure), the breakdown of social ties, and economic losses (Thomas 2017;Costache 2019;Waqas et al. 2021). Many factors can contribute to flooding, including heavy rainfall (an important factor), inadequate drainage systems to drain excess water or lack of preventive measures, and human activities (land use planning, agricultural practices, and built facilities) (Lin et al. 2019;Zhao et al. 2019;Ullah and Zhang 2020). Gao et al. (2019) pointed out that many disasters occurred worldwide between 1960 and 2014, including 2171 floods, droughts, and extreme hydrological events. From 2018 to 2021, about 51% of displaced people were displaced due to floods, 35% due to storms, 12% due to earthquakes and tsunamis, and less than 2% due to other types of hazards (Uddin and Matin 2021). In general, there are different types of floods, such as flash floods, river floods, coastal floods, and urban floods. Among them, flash floods are the most devastating because they occur very quickly and at high velocity, carrying a large amount of debris and earth material; therefore, these floods can cause severe damage to infrastructure and property as well as fatalities (Bui et al. 2020;Siegel 2020).
It is an urgent task to create a safe environment for people and property by minimizing the flood risk of areas. To achieve this, flood susceptibility mapping must be done as early as possible, especially for future planning areas (Al-Abadi et al. 2020;Uddin and Matin 2021). With the advantages of "geographic information systems (GIS) and remote sensing (RS)," susceptibility maps are becoming very popular recently. Remote sensing can be used to collect various types of spatial information showing the characteristics of topographic surfaces, vegetation cover, changes in land use over time, and the effects of climate change (Douvinet et al. 2015;Kabenge et al. 2017;Hussain et al. 2020). In addition, the methods of GIS help to create a spatial database, which is considered the cornerstone for flood susceptibility analysis and delineation of flood-prone areas (Zhao et al. 2019). Various methods have been developed to map flood-susceptible areas, including the "analytical hierarchy process (AHP) (Seejata et al. 2018;Skilodimou et al. 2019;Das and Gupta 2021); frequency ratio (FR) and index of entropy (IoE) (Feizizadeh et al. 2021), logistic regression (LR)" (Band et al. 2020;Malik et al. 2020), and ensemble fuzzy MCDM (Tella and Balogun 2020). Machine learning techniques such as alternating decision tree, logistic model tree, reduced-error pruning tree, J48 decision tree, and Naïve Bayes tree (Luu et al. 2021); extreme gradient boosting (EGB) (Naghibi et al. 2020;Mirzaei et al. 2021); and random forest (RF) (Farhadi and Najafzadeh 2021) were applied for flood susceptibility mapping. Other methods such as deep learning techniques have been applied for flood susceptibility (Satarzadeh et al. 2022).
Flood hazard assessment is one of the most important and complex applied geomorphological studies and requires various datasets. Through the application of RS and GIS, these data are available to build flood hazard models that predict flood-prone areas with a high degree of accuracy. The Egyptian authorities have selected Wadi El-Matulla and its surroundings as an area for future development (agriculture, industry, and urban development). Various studies have been conducted in this area and its surroundings, covering geomorphology (El-Shamy 1985;Embabi 2004) and hydrological conditions (Abd-Allatief et al. 2014). Recently, the Egyptian government has chosen this area as one of the megaprojects named Golden Triangle; subsequently, several studies were conducted in this region to investigate its groundwater characteristics (Abdelkareem and El-Baz 2015;Gaber et al. 2018). However, to ensure the sustainability of any project, natural hazard evaluation is a must (Bathrellos et al. 2017;Skilodimou et al. 2019). Unfortunately, hazard vulnerability mapping, and flood susceptibility assessment in particular, is still a new topic in Egypt and the Middle East, and few studies have been conducted on it (e.g., Abdrabo et al. 2020;Prama et al. 2020;Saber et al. 2020;Hermas et al. 2021).
The main objective of the present study is to develop and apply quantitative analysis techniques integrated with GIS for mapping flood-prone areas. An attempt is made to compare different models in order to achieve an optimal flood susceptibility zonation in a region highly affected by floods. Wadi El-Matulla and its surrounding catchments in Egypt were selected as the study area. The area is promising for agriculture and new development activities as it is part of the Golden Triangle. In addition, the wadis are adjacent to several urban and industrial areas that could be affected by flooding. In order to avert flood hazards and protect existing facilities and future developments, flood susceptibility modeling was conducted by applying and comparing three individual models: a multivariate technique (LR) and two machine learning techniques, "Extreme Gradient Boosting (EGB) and Random Forest (RF)." In addition, the results of the different models are evaluated to select the optimal model based on five statistical indices: receiver operating characteristic-area under the curve (ROC-AUC), overall accuracy (OAC), kappa index (Ka), root mean square error (RMSE), and mean absolute error (MAE). These results will be useful for planners, investors, researchers, and decision makers in impact assessment to predict the susceptible flood zones in the future (Bathrellos et al. 2017;Skilodimou et al. 2019). They could also support the design of mitigation strategies and contribute to the sustainable development of the basin.

Study area characteristics
The Wadi El-Matulla catchment (approximately 130 km long) and its surroundings are located in the eastern desert of Egypt, starting from the "Red Sea Mountains" and crossing the sandy Ababda plateau. Wadi El-Matulla tributaries (e.g., Wadi El-Hammamat, Wadi Zaidun, Wadi El-Mishash and Wadi El-Muweih) drain the catchment area during rainstorms toward the Nile Valley (westward), where many urban areas are located, such as the Al-Kalahin village and the town of Qift. The area is situated between longitudes of 32° 50′ 31'' and 34° 06′ 31'' and latitudes of 25° 26′ 34'' and 26° 19′ 56''. Its total area is almost 7231 km² (Fig. 1). This wadi is also characterized by numerous ancient monuments located along the ancient Qift-Qusier road, including numerous remains of ancient wells from the Pharaonic and Roman periods (e.g., the Hammam Cleopatra, Bir Lakita and Bir Fawakhier).
Geologically, the rocks in the basin area vary between igneous, metamorphic and sedimentary rocks (Conoco 1987;Said 1993), and the basin is divided into three terrain zones. The first zone is the Red Sea mountain range, which occupies the eastern part of the basin and covers 40% of the basin area. This zone is characterized by many mountains, such as Matiq (1112 m above sea level (asl)) and Nasib Azraq (1062 m asl). The area is dissected by many normal faults and joints and is crossed by numerous wadis (valleys). The second zone is the plateau zone (Ababda Plateau), which occupies the central and western part of the basin and covers 50% of the area. This zone is composed of sedimentary rocks, mainly sandstone and chalk (Upper Cretaceous) and limestone (Paleocene, Eocene and Pliocene). The plateau is a semi-flat surface covered by a few isolated mounds and dissected by numerous wadis. The third zone is the lowest part of the basin, which accounts for 10% of its area and consists mainly of the valley floor of the main wadis and the fan deposits of some tributary basins that flow into this zone (mainly filled by thick sand and gravel deposits). Its height ranges from 76 to 160 m, and its surface is semi-level with a very low gradient.
Based on meteorological investigation, the catchment area belongs to the arid to hyperarid zone, which is characterized by erratic precipitation with an average annual rainfall of less than 100 mm (Sultan et al. 2008). As a result of rainstorms, the catchment has experienced various flash flood events (e.g., 2016 and 2021). These flooding events are

Data used
Several datasets were used for the current study, including: (1) Digital Elevation Model (DEM): A DEM (12.5 m resolution) from ALOS PALSAR was acquired in 2020 (https://asf.alaska.edu/data-sets). It was used to extract eight important flood-conditioning parameters (geomorphological layers): elevation, slope angle, and plan and profile curvatures using ArcGIS (v.10.8), as well as topographic position index (TPI), convergence index (CI), slope length (LS), and topographic wetness index (TWI) using SAGA (v8.0) software. (2) Topographic and geological maps: In the current study, different topographic maps (Egyptian Survey Authority, 1990) at 1:50,000-scale and geological maps (Conoco 1987) at 1:250,000-scale were used to extract drainage networks and lithological units. (3) Satellite and Google Earth imagery: The current study relies on three types of remote sensing data. The first is passive satellite imagery, where Sentinel-2 (https://earthexplorer.usgs.gov) was utilized to extract NDVI and land use/land cover at a spatial resolution of 10 m. The second is Sentinel-1 (SAR) imagery, acquired from Sentinels Scientific Data Hub (https://apps.sentinel-hub.com). These images were used to identify flooded areas and create a dataset for the inventory. The third is Google Earth imagery, which was used to map the various flood locations related to previous events. (4) Meteorological data: Records from three meteorological stations (Qena, Luxor and Qusayr), obtained from the "Water Resources Research Institute of the Ministry of Water Resources and Irrigation," were used to estimate the distribution of rainfall depth in the catchment area. (5) Anthropogenic parameters: Roads were extracted from government data and high-resolution Google Earth images. (6) Field study: Various field studies were carried out between 2018 and 2021 to study the geographical features of the catchment and verify the accuracy of the results (flooded areas extracted from historical records and from Google Earth and radar imagery).
Seven steps are required for flood susceptibility mapping, as follows: (1) data collection from various sources; (2) production of a flood map (inventory map) based on remote sensing imagery, historical records, and field visits; (3) extraction of various flood-conditioning factors and construction of a database using ArcGIS 10.8; (4) application of multivariate (LR) and machine learning models (EGB and RF); (5) construction of flood susceptibility maps; (6) validation of flood models using ROC-AUC and other statistical parameters; and (7) selection of the optimal flood model for flood management analysis (Fig. 2).

Inventory map (dependent factor)
Historical and current flood events are crucial for the preparation of the flood inventory, which serves as the basis for flood susceptibility modeling to predict the potentially vulnerable areas for future floods (Sarkar and Mondal 2020). With the advantages of remote sensing applications, "C-Band Sentinel-1 Synthetic Aperture Radar (SAR)" has been used to detect flooded areas after flood events. These data are freely available, have a short revisit time, work in all weather conditions, and are sensitive to water bodies and wetland areas (Voormansik et al. 2014;Filion et al. 2016;Anusha and Bharathi 2019). Two Sentinel-1 SAR images covering the study area and surrounding areas were acquired from Sentinels Scientific Data Hub (https://apps.sentinel-hub.com). These images were acquired on February 25, 2016 (flood event on February 14, 2016), and November 19, 2021 (flood event on November 13, 2021). The data were processed using ENVI v.5.5 and ArcGIS v.10.8 software and were corrected for speckle and noise effects. In the current study, polarimetry was applied mainly to identify wetland and flooded areas, which is why the Sentinel-1 imagery acquired after the 2016 and 2021 flood events was used. Sentinel-1 is a dual-polarization system that transmits and receives signals in either horizontal (H) or vertical (V) polarization. Different objects on the ground (e.g., forest canopy and ocean surface) have different characteristics that produce distinct polarization signatures (such as different reflection intensities and exchange of polarization between H and V). Polarimetric techniques can be used to separate the different scattering mechanisms and thus provide information about these objects. In the current study, the Sentinel-1 image of the study area is shown as an RGB (color) composite generated from the VV, VH, and VV/VH ratio channels for red, green, and blue, respectively.
The RGB color composite of the Sentinel-1 SAR imagery of the basin area has been used to highlight certain features of the flooded areas after the flood events (downstream area of the El-Matulla basin) (Fig. 3a, c). Additionally, Google Earth was applied to detect and visualize past floods. Google Earth has a time-track feature that allows us to go back in time to see the impact of flooding on the affected areas. In the current study, Google Earth has been applied to extract the flooded locations after the two flood events of 2016 and 2021 (Fig. 3b, d), where the signature of flowing water is represented by white color (upstream section of the wadi). Field photographs were also taken after the November 13, 2021, flood event (Fig. 3e, f). As can be seen from the photographs, the flood waters flowed through the villages and isolated them. People were afraid of drowning during the flood and therefore stood on the roofs of their houses, and it seems that after the flood, the community tried to build some dikes to protect themselves from the rains. Based on all these data obtained from Sentinel-1 imagery, Google Earth, archived surveys, and other historical records, a flood database was created in ArcGIS 10.8. In this work, 480 flood points and non-flood points were prepared (Fig. 2). These data were randomly separated into two datasets. The training dataset (70% of the flood and non-flood points) was used to build the flood susceptibility models. The remaining dataset (30% of the flood and non-flood points) was used for model testing and to help in selecting the optimal model based on accuracy and performance evaluations (Islam et al. 2021;Tang et al. 2021).
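The random 70/30 separation described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the text does not state whether 480 is the total or the per-class count, so the sketch assumes 240 flood and 240 non-flood points, and the stratified split, the `seed`, and the `split_points` helper are all hypothetical choices.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Randomly split labelled points into training and testing subsets,
    keeping the flood / non-flood ratio identical in both (stratified)."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):  # 0 = non-flood, 1 = flood
        group = [p for p in points if p["flood"] == label]
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Hypothetical inventory: 240 flood and 240 non-flood points (ids only).
points = [{"id": i, "flood": i % 2} for i in range(480)]
train, test = split_points(points)
print(len(train), len(test))  # 336 144
```

Fixing the seed keeps the split reproducible, which matters when several models are compared on the same testing subset.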
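The topographic wetness index (TWI) was calculated as

$$\mathrm{TWI} = \ln\left(\frac{A}{\tan\beta}\right)$$

where A is the catchment area and β is the slope angle (in °) (Wolock and McCabe 1995).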
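As a rough illustration of the wetness index defined by A and β, the following sketch evaluates the standard form TWI = ln(A / tan β) for two cells. The input values are invented, and the clamping of very flat slopes (`min_slope_deg`) is an assumption added here to avoid division by zero; it is not described in the study.

```python
import math

def twi(catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index: ln(A / tan(beta))."""
    # Clamp very flat cells so tan(beta) never reaches zero (assumption).
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(catchment_area / math.tan(beta))

# Hypothetical cells: a gentle wadi floor with a large upslope area,
# and a steep hillslope with a small upslope area.
wadi_floor = twi(5000.0, 2.0)
hillslope = twi(50.0, 30.0)
print(wadi_floor > hillslope)  # True: the flat, well-fed cell is "wetter"
```

Higher values indicate cells that collect more runoff and drain it more slowly, which is why high TWI is associated with flood-prone terrain.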
The NDVI map was produced from the relationship between band 8 (near-infrared (NIR) radiation, wavelength 842 nm) and band 4 (red (R) radiation, wavelength 665 nm) (Eq. 3):

$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \tag{3}$$

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, which is why these two bands are used to calculate the NDVI (Rouse et al. 1974).
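A minimal per-pixel sketch of the NDVI ratio, (NIR − R) / (NIR + R), is given below. The reflectance values are illustrative only, and returning 0 for pixels where both bands are zero is an assumption made here for no-data handling.

```python
def ndvi(nir, red):
    """NDVI = (NIR - R) / (NIR + R), from band 8 and band 4 reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # no-data pixel; avoids division by zero (assumption)
    return (nir - red) / total

# Illustrative reflectances (hypothetical, not taken from the study):
vegetated = ndvi(0.45, 0.05)  # cultivated land near the wadi outlet
barren = ndvi(0.30, 0.28)     # bare rock or wadi deposits
print(vegetated > barren)     # True
```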
The topographic position index was calculated based on Eq. (4) (Guisan et al. 1999):

$$\mathrm{TPI} = Z_0 - \frac{1}{n}\sum_{i=1}^{n} Z_i \tag{4}$$

where Z_0 is the height of the target cell and the second term is the average height of its n neighboring cells, Z_i.
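The calculation can be illustrated on a small elevation window. This sketch assumes a 3 × 3 neighborhood (eight neighbors); the neighborhood size actually used in SAGA for this study is not stated in the text, and the elevation values are invented.

```python
def tpi(grid, row, col):
    """TPI of one interior cell: its height minus the mean height of
    its 8 neighbours in a 3x3 window (window size is an assumption)."""
    neighbours = [
        grid[r][c]
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
        if (r, c) != (row, col)
    ]
    return grid[row][col] - sum(neighbours) / len(neighbours)

# Hypothetical 3x3 elevation windows (metres).
ridge = [[100, 100, 100], [100, 110, 100], [100, 100, 100]]
valley = [[100, 100, 100], [100, 90, 100], [100, 100, 100]]
print(tpi(ridge, 1, 1), tpi(valley, 1, 1))  # 10.0 -10.0
```

Negative values mark cells lying below their surroundings, such as wadi floors.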
ArcGIS 10.8 was used to create the stream density layer at 12.5 m resolution using the line density tool, while the distance to roads (DTR) and distance to wadis (DTW) were created using the Euclidean distance tool. Different LULC types in a watershed can significantly affect flood susceptibility and were extracted based on a supervised classification of the Sentinel-2A image. The LULC map was classified into five categories: agriculture, urban, sparse tree cover, wadi deposits, and barren rocks.
The topographic position index is defined as the height difference between a pixel and its adjacent pixels (Weiss 2001). If the pixel is higher than its neighbors, the index is positive, while a negative value means that the pixel is lower than its neighbors, which generally represents the flooded areas.
Precipitation is a major factor causing flooding, and many studies have shown that it is a triggering factor that can contribute greatly to flooding potential. To estimate the amount of rainfall for the 100-year return period, the daily rainfall data recorded at the three stations between 1970 and 2018 were collected. Statistical probability distribution methods were applied to select the suitable distribution for the rainfall data of the studied meteorological stations. Our study shows that the generalized extreme value (GEV) distribution was suitable for the Qena and Qusayr stations; Pearson type III was suitable for Qusayr station. The 100-year return period rainfall values were 70, 57, and 45 for Qena, Luxor, and Qusayr stations, respectively. The "inverse distance weighting (IDW)" algorithm was used to generate the rain distribution map as it is the most common technique for interpolating scatter points (Kilinc 2018).
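The IDW interpolation step can be sketched as follows. The three rainfall values are the 100-year values quoted above, but the station coordinates are placeholders in an arbitrary planar system, not the real station locations, and the power of 2 is a common default assumed here rather than a setting reported by the study.

```python
import math

def idw(x, y, stations, power=2):
    """Inverse distance weighting estimate at (x, y).
    stations: list of (x, y, value) tuples."""
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value  # query point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Rainfall values from the text; coordinates (km) are hypothetical.
stations = [
    (0.0, 80.0, 70.0),    # Qena
    (10.0, 0.0, 57.0),    # Luxor
    (150.0, 60.0, 45.0),  # Qusayr
]
estimate = idw(60.0, 50.0, stations)
print(round(estimate, 1))
```

Because IDW is a weighted average, every interpolated value lies between the smallest and largest station value; with only three stations the resulting surface is necessarily smooth and coarse.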

Background of the algorithms
Flood susceptibility mapping is crucial to identify the vulnerable zones of potential flooding and could be created by identifying the relationship between actual flood locations and associated causal factors. Three models were used in this study including logistic regression model (LR), Extreme Gradient Boosting (EGB) and Random Forest (RF).

Logistic regression (LR)
LR is a common multivariate algorithm that identifies the regression relationship between independent factors (two or more factors) and a dependent variable (one factor) (Liao et al. 1988). It calculates the probability that an event will occur versus the probability that an event will not occur (Sofia et al. 2018). It has been used as a proven approach for flood susceptibility mapping and for identifying the variables that contribute most (Wubalem et al. 2020). Equation (5) shows the relationship between flood events and the conditioning factors:

$$P = \frac{1}{1 + e^{-z}} \tag{5}$$

where P is the probability of flooded or non-flooded area and z is the linear predictor. The coefficients are determined as the best mathematical fit for the specified model. A coefficient indicates the influence of each independent variable on the outcome variable, taking into account all other independent variables. This technique has been adopted as an accurate hazard prediction algorithm that can determine the probability of flooding (Band et al. 2020;Malik et al. 2020). The LR model was performed in the current study using SPSS (v.26) statistical software, with 15 flood-conditioning factors (independent factors) and flood and non-flood points (dependent factor). The predicted value of the model is the sum of the individual products formed by multiplying each coefficient value by its independent variable, as shown in Eq. (6):

$$z = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_{15} X_{15} \tag{6}$$

where B_0 is the constant term, B_1-B_15 are the coefficient values, and X_1-X_15 are the flood-related factors. The fitted linear model expresses the presence or absence of flooding as a function of the values of the independent factors associated with the occurrence of flooding in previous years. The probability of flood susceptibility is expressed by a value ranging from 0 to 1.
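To make the mapping from the weighted sum of factors to a probability concrete, the sketch below applies the logistic function to a linear predictor. The coefficients and factor values are invented for illustration (the fitted SPSS coefficients are not reported in this section), and the intercept term is a common convention assumed here.

```python
import math

def flood_probability(intercept, coefficients, factors):
    """P = 1 / (1 + exp(-z)), with z = B0 + sum(Bi * Xi)."""
    z = intercept + sum(b * x for b, x in zip(coefficients, factors))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for three standardised factors:
# distance to wadis, TWI, slope (signs chosen for illustration only).
coefs = [-1.2, 0.8, -0.5]

neutral = flood_probability(0.0, coefs, [0.0, 0.0, 0.0])
near_wadi = flood_probability(0.0, coefs, [-1.5, 1.0, -1.0])
far_steep = flood_probability(0.0, coefs, [1.5, -1.0, 1.0])
print(neutral)  # 0.5
```

Whatever the factor values, the output stays strictly between 0 and 1, which is what allows it to be mapped directly as a susceptibility index.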

Extreme gradient boosting (EGB)
EGB is an advanced supervised machine learning algorithm developed by Chen and Guestrin (2016). The major advantages of the EGB technique are its ability to build a strong learner from the outcomes of multiple weak learners, to deal with missing data in the datasets, to tune the factors without overfitting the model, and to use parallel processing to reduce the computation time (Fan et al. 2018;Naghibi et al. 2020). Conducting the EGB model requires three steps: (1) building the learning feature over the entire dataset of factors, (2) constructing the next model over the residuals, and (3) ending the procedure when it reaches the "stopping criteria" (Fan et al. 2018). In the current study, EGB was performed using the open-source "xgboost" package for R software (Chen et al. 2015).
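The three steps above (fit a first model, fit each following model to the residuals, stop on a criterion) can be illustrated with a deliberately simplified sketch. Note that this is plain gradient boosting with squared-error loss on one-variable regression stumps, not the xgboost implementation used in the study: it has no regularization, no second-order terms, and no missing-value handling. The data, learning rate, and tolerance are all hypothetical.

```python
def fit_stump(xs, residuals):
    """Weak learner: the single threshold split that minimises squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every split leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=50, learning_rate=0.3, tol=1e-6):
    base = sum(ys) / len(ys)              # step 1: initial model (the mean)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        if sum(r * r for r in residuals) / len(ys) < tol:
            break                         # step 3: stopping criterion
        stump = fit_stump(xs, residuals)  # step 2: next model on residuals
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Toy data: one factor (e.g. a TWI class) against flood (1) / non-flood (0).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = boost(xs, ys)
print(model(2) < 0.01, model(7) > 0.99)  # True True
```

Each round removes a fixed share of the remaining error, so many weak stumps together reproduce the step in the data that no single shrunken stump could.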

Random forest (RF)
RF was developed by Breiman (2001) as an ensemble approach built from "numerous decision trees as predictors and run for classification and regression analyses." Many authors have pointed out that RF is a robust and flexible method in which each tree is grown on a set of cases drawn by bootstrapping, and the cases not used to grow a given tree are referred to as out-of-bag (Zhu and Zhang 2021). RF has several advantages: (1) the ability to rank parameters based on their contribution using "mean decrease accuracy (MDA) and mean decrease Gini (MDG)" (Wang et al. 2016) and (2) the ability to process a large number of datasets and obtain satisfactory results (Rahman et al. 2019a, b). In this study, the flood susceptibility map was created using the randomForest package in R software (Breiman and Cutler 2015). The final map was created and classified using ArcGIS 10.8.
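The bootstrapping and out-of-bag idea can be sketched without growing any trees. The snippet below only draws the bootstrap samples; the number of training points (336, i.e. 70% of 480) and the number of trees (100) are assumptions for illustration, not settings reported for the randomForest run.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)
n_points, n_trees = 336, 100  # hypothetical sizes

oob_counts = [len(bootstrap_indices(n_points, rng)[1]) for _ in range(n_trees)]
oob_share = sum(oob_counts) / (n_points * n_trees)
print(round(oob_share, 2))  # roughly one third of the points per tree
```

Because each tree never sees its own out-of-bag points, those points act as a built-in validation set, which is what makes permutation-based importance measures such as MDA possible.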

Multicollinearity, models' validation and comparisons
Various methods are used to evaluate factor effectiveness in model construction, such as the multicollinearity method (Yoo and Cho 2019). The "variance inflation factors (VIF) and tolerance (TOL)" were applied to evaluate the effectiveness of factors (Eqs. 7 and 8):

$$\mathrm{TOL} = 1 - R_j^2$$

$$\mathrm{VIF} = \frac{1}{\mathrm{TOL}}$$

where R_j^2 represents the coefficient of determination obtained by regressing the independent factor j on all other independent factors. Rahman et al. (2019a, b) indicate that a TOL < 0.10 and a VIF > 5 account for multicollinearity issues.
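The relationship between the two indicators can be checked with a short sketch using the standard definitions TOL = 1 − R² and VIF = 1 / TOL. The R² values below are back-calculated from the VIF values reported in the multicollinearity results (6.432 for elevation, 3.901 for geology); flagging a factor when either threshold is crossed is an assumption that matches how the elevation factor is treated in those results.

```python
def tolerance(r_squared):
    """TOL = 1 - R^2 of factor j regressed on all other factors."""
    return 1.0 - r_squared

def vif(r_squared):
    """VIF = 1 / TOL."""
    return 1.0 / tolerance(r_squared)

def has_multicollinearity(r_squared, tol_limit=0.10, vif_limit=5.0):
    """Flag a factor if TOL < 0.10 or VIF > 5 (either test failing)."""
    return tolerance(r_squared) < tol_limit or vif(r_squared) > vif_limit

r2_elevation = 1.0 - 1.0 / 6.432  # back-calculated from the reported VIF
r2_geology = 1.0 - 1.0 / 3.901

print(round(tolerance(r2_elevation), 3))  # 0.155, the reported minimum TOL
print(has_multicollinearity(r2_elevation), has_multicollinearity(r2_geology))
```

Since VIF is simply the reciprocal of TOL, the two thresholds are not equivalent: TOL < 0.10 corresponds to VIF > 10, so the VIF > 5 rule is the stricter of the two.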
In addition, the reliability of the models' performance can be measured using the "receiver operating characteristics (ROC) and the area under the curves (AUC)" method, which is a crucial step in flood susceptibility modeling (Metz 1978). The "ROC-AUC" was used in this study to assess the accuracy of the flood susceptibility maps, as it is an appropriate indicator and provides valuable information. Cao et al. (2020) classified the AUC values into four zones as follows: less than 0.6, which indicates no scientific significance of the model (weak); between 0.6 and 0.7, which shows a model of moderate significance; between 0.7 and 0.8, which represents a good model; and above 0.8, which indicates a very good model. To compare these susceptibility models, various arithmetic indices were applied: (1) discrimination accuracy measures (hazard accuracy (HAC), non-hazard accuracy (NHAC), overall hazard accuracy (OHAC), and kappa index (K)) and (2) reliability measures ("mean absolute error (MAE) and root mean square error (RMSE)") (Hembram et al. 2021). These indices are shown in Eqs. 9-14. The HAC, NHAC, and OHAC equations are based on the confusion matrix (Table 2), which depends on flood and non-flood points.
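As a small illustration of how an AUC value is obtained and then placed in the four zones listed above, the sketch below computes the AUC as a pairwise ranking statistic, which is equivalent to the area under the ROC curve. The scores are invented, and how values falling exactly on 0.6, 0.7, or 0.8 are assigned is an assumption, since the text does not specify the boundaries.

```python
def auc(labels, scores):
    """AUC as the probability that a random flood point scores higher
    than a random non-flood point (ties count one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_zone(value):
    """Four zones after Cao et al. (2020); boundary handling is assumed."""
    if value < 0.6:
        return "weak"
    if value < 0.7:
        return "moderate"
    if value < 0.8:
        return "good"
    return "very good"

# Hypothetical susceptibility scores for six testing points.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
value = auc(labels, scores)  # 8 of 9 flood/non-flood pairs ranked correctly
print(auc_zone(value))       # very good
```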
In Eqs. 9-14, TF = number of correctly classified flood pixels (true flood), TNF = number of correctly classified non-flood pixels (true non-flood), FF = number of pixels incorrectly classified as flood (false flood), FNF = number of pixels incorrectly classified as non-flood (false non-flood), Xei = values predicted by the model, Xoi = observed values, n = the number of data points, Lc = the proportion of correctly classified flood and non-flood pixels, and Lexp = the expected matches.
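A hedged sketch of how such indices can be derived from a confusion matrix is given below. It uses the standard definitions of overall accuracy, Cohen's kappa, MAE, and RMSE on hard 0/1 class labels; the exact forms of the study's Eqs. 9-14 (including HAC and NHAC) are not reproduced here, and the counts are hypothetical, not the study's Table 2.

```python
import math

def evaluate(observed, predicted):
    """Overall accuracy, Cohen's kappa, MAE and RMSE for 0/1 labels."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    tf = sum(o == 1 and p == 1 for o, p in pairs)   # true flood
    tnf = sum(o == 0 and p == 0 for o, p in pairs)  # true non-flood
    ff = sum(o == 0 and p == 1 for o, p in pairs)   # false flood
    fnf = sum(o == 1 and p == 0 for o, p in pairs)  # false non-flood
    oac = (tf + tnf) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tf + ff) * (tf + fnf) + (tnf + fnf) * (tnf + ff)) / n ** 2
    kappa = (oac - expected) / (1 - expected)
    mae = sum(abs(p - o) for o, p in pairs) / n
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in pairs) / n)
    return {"oac": oac, "kappa": kappa, "mae": mae, "rmse": rmse}

# Hypothetical test set: 50 flood and 50 non-flood points.
observed = [1] * 50 + [0] * 50
predicted = [1] * 45 + [0] * 5 + [0] * 40 + [1] * 10
result = evaluate(observed, predicted)
print(result["oac"], round(result["kappa"], 2))  # 0.85 0.7
```

On hard class labels, MAE equals 1 − OAC and RMSE is its square root. The MAE values reported in this study (0.12, 0.18, and 0.24) equal 1 − OAC for all three models, which is what one would expect if the errors were computed in this way, although the text does not say so explicitly.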

Multicollinearity test
The VIF and TOL multicollinearity indicators of the 15 flood-conditioning factors used in the current work are shown in Fig. 5. The tolerance (TOL) values are all above the acceptable limit of 0.1 (the minimum value is 0.155, for the elevation factor), and the VIF values are below 5 (the maximum value is 3.901, for the geology factor), except for the elevation factor, whose VIF value is above 5 (6.432). Accordingly, the results showed no multicollinearity among the remaining 14 factors, which were used in the models, and the elevation factor was excluded from model construction in this study.

Factors importance evaluation (FsIE)
The importance of flood-related factors in the current study was determined using RF by calculating the mean Gini decrease. It is presented in Fig. 6 as a percentage (%) after excluding the elevation factor from the analysis. The results show that three factors are above 10% (distance to wadis has the highest value with 23.9%, followed by TWI with a value of 17.3%, then slope with a value of 13.2%), three factors from 5 to 10% (drainage density (9.81%), geology (6.17%) and NDVI (5.51%)), and the remaining factors are below 5% (CI has 4.8%, rainfall (4.38%), LULC (4.1%), LS (2.96%), PlC (2.42%), TPI (2.01%), PrC (1.93%) and the lowest value is for distance to road (1.62%)).

Flood susceptibility models
In this work, ArcGIS 10.8 was used to produce flood susceptibility maps for the Wadi El-Matulla basin and its surroundings, Egypt. The natural breaks classification (Nicu 2018) was applied to divide the flood susceptibility index maps (FSIMs) into five zones ("very low, low, moderate, high, and very high") (Cao et al. 2020;Swain et al. 2020).

Models performance assessment and comparisons
Validation datasets (30%) of both "flood and non-flood locations," which were not used for model construction, were utilized to assess each model's performance (prediction rate). Additionally, other statistical indices, namely overall accuracy, kappa index, RMSE, and MAE, were used to compare the models (Fig. 7).

Discussion
In the present study, three algorithms and GIS techniques were used to evaluate the flood susceptibility zones in the drainage areas of El-Matulla Wadi and its surroundings in Egypt. Fifteen factors were considered as the most important flood-related factors affecting runoff. Future development of buildability is an essential issue to ensure sustainable urban and infrastructural development worldwide. Accordingly, understanding the extent of flood-prone areas is a critical factor in ensuring a future sustainable environment (Bathrellos et al. 2016). In the present work, predictions were made using LR (easy to interpret), EGB and RF (high performance). Based on these models, flood susceptibility maps were generated and then the best predictive model (the optimal model) was selected based on various statistical indices that can be used for future development of urban, agriculture, and industrial suitability of the study area. This will provide important information to decision makers and planners to select less flood-prone areas.
In the present work, multivariate statistical analysis (LR) was applied and showed acceptable performance with AUC = 80%, which is consistent with the work of Tehrany et al. (2019) on flood susceptibility, where "the statistical index (Wi), logistic regression (LR), and frequency ratio (FR)" were used and the AUC for LR reached 79.54%. Moreover, EGB is a robust boosting technique among data mining algorithms, and it performed better than the LR model, with an AUC value of 86%. Our results are also very close to those of Abedi et al. (2021), who used four models: "Classification and Regression Tree (CART), RF, Boosted Regression Trees (BRT), and XGBoost." They concluded that RF performed best (AUC = 95.6%), while "XGBoost" performed well (AUC = 89.2%). Studies have shown that the main advantage of Extreme Gradient Boosting is its strong regularization parameter, which overcomes the overfitting problem (Abedi et al. 2021). In addition, Janizadeh et al. (2021) applied Extreme Randomize Tree (ERT) and Extreme Gradient Boosting for flood susceptibility analysis and found that EGB provides higher model effectiveness (AUC = 91.95%) than ERT (AUC = 91.37%). This is consistent with our results, in which EGB provides a higher AUC (86.0%) than logistic regression (80.0%).
Fig. 7 FSMs for the Wadi El-Matulla basin and its surroundings using LR, EGB and RF models and classes percent
Our findings indicated that the RF and EGB algorithms have higher accuracy than the LR model. In the susceptibility maps generated by these two models, the areas with very high and high flood hazard are distributed mainly in the lower reaches of the wadis, especially in the western part of the study area (Fig. 6b, c). Moreover, RF has achieved high accuracy and performance due to its characteristics, such as robustness in feature detection, strength in identifying noise and outliers, the ability to work with various inputs and large datasets without removing factors, and better interpretability (Park et al. 2019;Paul et al. 2019). In the current work, the RF susceptibility map (Fig. 8a) was verified by comparing it with the areas actually flooded in 2016 and 2021 using Sentinel-1 images (VV-VH-VV/VH, RGB). After special filtering, we were able to highlight in blue the areas that were inundated during the February 14, 2016 (Fig. 8b), and November 13, 2021 (Fig. 8c), flood events. The RF susceptibility map agreed well with the Sentinel-1 images. Chen et al. (2020a, b) pointed out that RF is able to rank various factors based on their contribution. Accordingly, the results indicated that distance to wadis plays a crucial role in the model results; the second factor is TWI, followed by slope, drainage density, geology, and NDVI (Fig. 9). In addition, lower slopes are associated with the western part of the study area, which is characterized by high TWI and lower elevation, resulting in higher flood susceptibility. Conversely, areas with high slopes away from wadis and with moderate to high elevations are not expected to experience flooding, making future development in these areas sustainable (see green and yellow areas in Fig. 8a).

Conclusion
Identifying areas at high risk of flooding is a critical step in implementing long-term sustainable projects. However, the lack of high-quality data (both spatial and temporal) poses a challenge to the accurate management of flood disasters, especially in developing countries such as Egypt. This study provides an accurate methodology to produce powerful flood susceptibility maps for large regions by applying three models including LR, EGB and RF.
The comparison of the models shows that RF and EGB have high performance and their results have satisfactory efficiency. The EGB model had an AUC value of 0.86, which is considered very good. On the other hand, the RF model had an AUC value of 0.93, which is considered to have excellent predictive ability in evaluating flood susceptibility. Therefore, we highly recommend the use of RF approach (the optimal model) for future flood vulnerability assessment. In addition, the results of the significance of the variables showed a high influence of the distance to wadis, the topographic wetness index, slope, drainage density, geology and NDVI in modeling flood susceptibility. This means that topographic parameters play an important role in flood modeling. It was also found that susceptibility maps could be improved if the quality of spatial resolution of digital elevation models is increased. This will improve the factors derived from DEM. Our results indicate that the western part of the study area is most affected by flooding, along the contact between the Nile floodplain and the areas reclaimed from the desert, which include lowland regions. The current work proposes an appropriate method to compare different models in order to select the optimal flood susceptibility model, which could be a cornerstone for the sustainability of future development. Moreover, the flood-prone areas could be saved from flood damage by appropriate preventive measures.