Multiple Kernel Learning with Maximum Inundation Extent from MODIS Imagery for Spatial Prediction of Flood Susceptibility

Identifying flood prone areas is essential for basin management. In this paper, a spatial prediction technology of flood susceptibility based on multiple kernel learning (MKL) is proposed. We establish the flood susceptibility model by using EasyMKL, nonlinear MKL (NLMKL), Representative MKL(RMKL), Generalized MKL(GMKL), support vector machine(SVM) with linear kernel and SVM with Gaussian radial base function(RBF) kernel, The spatial prediction of flood susceptibility in Sanhuajian basin of the Yellow River is carried out. We use MODIS remote sensing images to obtain historical flood inundation sites in the study area. Then, ten flood conditioning factors are used as inputs to the flood susceptibility model. The model performance is evaluated in terms of accuracy (ACC), balanced F Score (F1 score), and areas under the curve (AUC). According to the results, MKL significantly outperforms the SVM adopting single kernel, and NLMKL(ACC=0.833,F1=0.841,AUC=0.889) demonstrates the best comprehensive performance. The flood susceptibility map generated by MODIS remote sensing images and MKL, therefore, can provide effective help for researchers and decision makers in flood management.


Introduction
Flood is one of the most destructive natural disasters. It inflicts massive losses all over the world on a yearly basis (Chen et al. 2020). From 1980 to 2016, flood accounted for 23% of global economic loss (Mateo et al. 2017). Such losses can be mitigated or even averted by effective flood risk management methods.
Hydrodynamics-based flood inundation models, such as HEC-RAS (Pappenberger et al. 2005), Mike21 (Beilicci et al. 2016), Delft3D (Hoch et al. 2018), LISFLOOD-FP (Rajib et al. 2020), etc., can provide high-precision information on the spatial distribution of flood risks in a river basin so as to facilitate decision-making. However, it is not easy to use these models. These models require researchers to have high professional knowledge, and model calculation requires detailed basin data, a large amount of computing resources, and each calculation case takes a long time (Teng et al. 2017). Therefore, they are not suitable for basin scale flood assessment.
In recent years, flood susceptibility map (FSM) combined with machine learning and geographical information system (GIS) technology has been widely used in flood management (Tehrany et al. 2013;Shafapour et al. 2019;Zhao et al. 2019;Liuzzo et al. 2019). Machine learning model of FSM identifies flood prone areas by learning the relationship between historical flood records and flood conditioning factors. At present, the commonly used methods include SVM (Tehrany et al. 2014;Zhao et al. 2019), artificial neural network (Kia et al. 2012;Tien et al. 2020), decision tree (Tehrany et al. 2013;Yariyan et al. 2020), Bayesian method (Janizadeh et al. 2021).
MKL is a machine learning method based on SVM. MKL, now, has become a hotspot in machine learning researches and been widely used in imaging (Gu et al. 2016), bioinformatics (Qi et al. 2020), and finance (Arratia et al. 2020), but it has not been applied to the research of FSM. MKL learns through a combination of multiple kernel functions instead of one, and the kernel function selection and parameter optimization are the key to performance. The kernel combination can be linear, nonlinear, or data dependent (Gönen and Alpaydin 2011). Compared with SVM using single kernel function, MKL method has better generalization ability and robustness.
In the actual modelling process, however, insufficiency and poor availability of the historical data (labeled data) is a commonplace issue. Moreover, since the majority of studies only focus on a single flood outbreak in the study area, the findings are prone to have limitations and do not consider the comprehensive situation of multiple floods in the basin over a long period of time. In view of this, we employed the open water likelihood index (OWL) method to extract the inundation range from MODIS remote sensing images in the study area (Huang et al. 2014). The historical maximum flood inundation range in the study area is obtained by combining the inundation ranges in differente periods, and used this as the base map for the sampling of the machine learning model. This research focuses on using different combinations of MKL to classify the flood susceptibility in the study area and draw the FSM. The study area is from Sanmenxia to Huayuankou of the Yellow River Basin in China. EasyMKL, a scalable MKL algorithm (Aiolli and Donini 2015) and NLMKL (Cortes et al. 2009) are adopted, compared with SVM (linear kernel), SVM (RBF kernel), RMKL (Do et al. 2009), and GMKL (Varma and Babu 2009), and their performance is evaluated.

Description of Study Area
This paper examines the area between Sanmenxia and Huayuanko (A.K.A. the Sanhuajian Basin) in the middle reaches of the Yellow River, the longitude between 109.73° E and 113.58° E, and the latitude between 33.64° N and 36.97° N. The area of the watershed is 40,000km 2 . Sanhuajian is surrounded by mountains in the north, west and south; the Yellow River runs from west to east across it in the middle. The two tributaries of Yiluo River and Qin River join the Yellow River above Huayuankou. The whole terrain of Sanhuajian is high on the north and south sides and low in the middle. Its altitude is about 2000 meters in the north and 1600 ~ 1800 meters in the south. In the middle, it gradually drops below 100 meters from the north and south sides. A semi-humid area, Sanhuajian witnesses frequent rainstorms from June to August, the annual precipitation is 600 ~ 800 mm, and the precipitation distribution decreases from south to north. (Wang and Zheng 2014).

Historical Daily Flow Data
We obtained the historical daily flow data of the Huayuankou Site from the YRCC website 1 , which cover ten complete water years (from 2007 to 2017) as shown in Fig. 1 Flooding is usually caused by flood peaks, and the major peaks are represented by red dots in the Fig. 1.

MODIS Imagery Maximum Inundation Extent and Flood Inventories
MOD09A1 is a synthetic product of the MODIS surface reflectance by the Terra satellite following an exclusion of interferential factors like atmospheric radiation and aerosol. Its spatial resolution is 500 meters, and its temporal resolution is once every eight days (Barnes et al. 1998). MODIS remote sensing imagery is widely used in flood detection and suitable for large-scale spatial research (Chen et al. 2013;Coltin et al. 2016). This study adopts the open water likelihood (OWL) index (Huang et al. 2014) to extract flood inundation ranges from MOD09A1 remote sensing images. The OWL index contains five parameters sensitive to water bodies, namely, the short-wave infrared band (band 6 and band 7), NDVI (Townshend and Justice 1986), NDWI (Gao 1995) and the multi-resolution valley bottom flatness index (MrVBF) (Gallant and Dowling 2003). Modeled by nonlinear logistic regression, the OWL index adopts the expression below: where where x1 = SWIR Band 6 (reflectance × 1000); x2 = SWIR Band 7 (reflectance × 1000); x3 = NDVI; x4 = NDWI; x5 = MrVBF; a0 = −3.41375620; a1 = −0.000959735270; a2 = 0.00417955330; a3 = 14.1927990; a4 = −0.430407140; a5 = −0.0961932990. The OWL index value ranges from 0 to 100, representing the probability of water existence in each pixel of the MOD09A1 remote sensing images. Whether the pixel is a flooded or nonflooded one is determined by setting the corresponding threshold. In this study, the threshold is set as 0, so if the value exceeds 0, it is judged to be a flooded pixel.
The flood inundation range of a single image is obtained through the OWL index. ArcGIS can help synthesize multiple flood inundation ranges of the same study area throughout a long time series into the maximum flood inundation range, which is used as the base map for the sampling of the machine learning model. The chosen sample sites are labeled "flooded" or "non-flooded".
According to the time points of flood peak occurrences in Fig. 1, we pinpointed the corresponding dates of the MOD09A1 remote sensing images, along with the dates before and after the occurrences. The flood inundation range of each remote sensing image was extracted by OWL index and synthesized via ArcGIS into the maximum flood inundation range of the study area.
In the maximum flood inundation range, 310 flooded sites were randomly selected; in the non-flood inundation range, 310 non-flooded sites were randomly selected. In Fig. 2, the red dots stand for flooded sites, and the blue dots, non-flooded sites. We used 70% (434) of the total sample sites as the training set and 30% (186) as the test set, marking the flooded sites as "1" and the non-flooded sites as "-1".

Flood Conditioning Factors
Flood disasters involve an extremely complex process and can be attributed to many interplaying factors. Drawing reference from previous studies (Zhao et al. 2019;Dodangeh et al. 2020;Tien et al. 2020), this paper selects out ten flood conditioning factors, including elevation, slope, aspect, curvature, Stream Power Index (SPI), Topographic Wetness Index , distance from river, soil type, vegetation, and annual average rainfall. Table 1 lists the data source, format, spatial and temporal resolution of these flood conditioning factors. Topography mainly affects the occurrence and distribution of flood through elevation and topographic fluctuation, which is also considered to be the main factor affecting the occurrence and distribution of flood. The elevation data of the study area is ASTER GDEM V2 product, which is downloaded from "Geospatial Data Cloud" 2 . The slope, aspect, curvature and distance to rivers are obtained by processing DEM data with ArcGIS software.  Stream Power Index (SPI) represents the river erosivity of the catchment area, while Topographic Wetness Index (TWI) takes into account the influence of the terrain and soil characteristics on soil moisture distribution (Regmi et al. 2010). They are calculated using the following formulas: where A (m 2 ) is the upstream catchment area or flow accumulation, b (m) is the unit width of water flow, and β (radian) is the slope.
Vegetation type and soil type jointly determine water infiltration, water holding and water control capacity. The data of soil type and vegetation type comes from "earth big data engineering data sharing service system" 3 .
The data of annual average rainfall for many years are from "Daily value of surface precipitation in China" in 0.5° × 0.5° grid data set (V2.0) " 4 in " China Meteorological Data Service Centre", we selected the data of the study area from 2007 to 2017, calculated the annual average rainfall of 10 years by using ArcGIS, and formed the multi-year average rainfall distribution map by spline function interpolation.
Because the resolution of each factor is different, we used ArcGIS to preprocess all the data and resampled them into raster data with a resolution of 90m, so that all raster data share the same coordinate system and pixel size, making it convenient for machine learning and statistical analysis. Flood conditioning factors are shown in Fig. 3.

Multiple Kernel Learning
Support vector machines often use a single kernel function to measure the similarity between different instances. The selection of the kernel function and its parameters determines the structure of feature space as well as the classification effect (Damasevicius 2010). However, a single kernel function cannot fully describe data similarity. MKL, in contrast, combines multiple kernel functions and thus often outperforms single kernel SVM. The kernel combination of MKL can be linear (weighted addition of multiple kernels), nonlinear (product or power of multiple kernels) or data dependent (kernel combination in relation to sample characteristics). By finding the best kernel combination parameters, MKL combines several base kernels and makes use of their respective advantages to obtain a kernel function with stronger learning and generalization abilities. Common methods of determining the kernel combination parameters include the fixed rule-based method, heuristic method, and optimization method (Gönen and Alpaydin 2011). This paper introduces EasyMKL of linear kernel combination and NLMKL of nonlinear kernel combination. Both of them use optimization method to determine the parameters of kernel combination.

EasyMKL
EasyMKL (Aiolli and Donini 2015), a highly scalable MKL algorithm, can effectively handle hundreds of thousands of kernels. Compared with other MKL algorithms (e.g. RMKL, GMKL, etc.), it shows better performance and maintains good robustness even in the presence of data noise (Aiolli and Donini 2015;Saleh et al. 2018;Arratia et al. 2020). This algorithm has a linear combination of the base kernels in the following form: where K r is the base kernel function, R is the number of base kernels, and r , the weight coefficient of each base kernel. EasyMKL obtains the best combination parameters by solving a min-max problem about the variables and : where is the probability distribution of positive and negative training samples, is the base kernel combination vector, a unitary norm vector. Ŷ is the label vector of training set, and K r , the r-th base kernel. Define the vector of the r-th entry d r ( ) = TŶK rŶ , and then the optimization problem can be written as: By maximizing the distance between positive and negative samples, the min-max problem can be simplified as a quadratic problem, of which the analytical solution is as follows: The weight parameters of each base kernel K r are obtained through the following formula:

NLMKL
NLMKL is a nonlinear polynomial kernel combination method (Cortes et al. 2009): where d is the degree of the combination of polynomials, K is a positive semidefinite matrix. In order to reduce the complexity, only the case where d = 2 is taken into consideration, and thus the nonlinear kernel combination can be given the following expression: Apply (11) to SVM, and then the weight coefficient r of each base kernel is obtained by solving a min-max problem: where Ω is a positive, bounded, and convex set. Norm-1 and norm-2 bounded sets constitute the two possible choices for the set Ω , and the norm-2 bounded set is adopted in this paper.
In Eq. (13), 0 is the bias parameter of the base kernel weight vector . In actual implementation, 0 is initially taken as 0 and Λ as 1. The min-max optimization problem of Eq. (8) can be solved by a two-step method. This paper proposes a momentum-based gradient descent algorithm (Algorithm 1). In each iteration, firstly, given the fixation of the base kernel weight vector , the support vector machine problem is solved to obtain . Then, considering the constraint Ω 2 , the base kernel weight vector is updated by the gradient descent method with momentum term while the SVM parameters are fixed. Repeat these two steps until convergence. Because the momentum term increases gradually in the gradient direction while decreasing gradually where the gradient direction changes, we can get faster convergence speed and weaker oscillation.
As the focus of this paper is to develop a machine learning model for spatial prediction of flood susceptibility, we have compared the two proposed MKL models with other algorithms, selecting SVM (linear kernel), SVM (RBF kernel), RMKL, and GMKL as baseline methods. The setting particulars of the four MKL algorithms (EasyMKL, NLMK, RMKL and GMKL), all of which use linear and RBF kernels as the base kernel, will be covered in Sect. 3.5.

Experiment Settings
The historical flood inundation data set (Sect. 3.2) includes 620 instances (310 flooded sites and 310 non-flooded sites). Randomly select 70% (434) of the instances as the training set and 30% (186) of the instances as the test set. The number of flooded sites and non-flooded sites in training set is equal, and so is the test set. The dimension (feature length) of each instance is 10. In order to get the best performance, the data set has been standardized. SVM (linear kernel), SVM (RBF kernel), RMKL and GMKL were adopted as the baseline methods. EasyMKL, NLMK, RMKL and GMKL all used linear and RBF kernels as the base kernels.
We implemented these algorithms in Python using Sklearn Library and PyTorch Library and found the best hyper parameter configuration through 5-fold cross-validation and grid search: C for SVM and for RBF kernel function. The value range of C is C = 10 −2 , 10 −1 , … , 10 2 , and that of is = {0.01, 0.1, 0.2, 0.5, 0.8, 1} . Accuracy is the evaluation criterion of the cross-validation. The table of optimal parameter selection is shown in Appendix A.

Performance Evaluation
The performance of each model was evaluated in terms of accuracy (ACC), F1 score, and Area Under Curve (AUC) -common indicators for machine learning model evaluation (Choubin et al. 2019;Hu et al. 2021). TP, TN, FP, and FN were true positive (correctly classified flooded sites), true negative (correctly classified non-flooded sites), false positive (misclassified flooded sites) and false negative (misclassified non-flooded sites), respectively. P and R are precision and recall: As to the receiver operating characteristic (ROC) curve, false positive rate (FPR) is defined as the X-axis, and true positive rate (TPR), the Y-axis. Area Under Curve (AUC), which is widely applied in FSM model evaluation, was used to assess the model performance. When AUC is 0.5, it means the model works like a random model; when AUC exceeds 0.8, it shows the model has good performance (Tien et al. 2016); when AUC is 1, the model features perfect classification. Table 2 presents a detailed comparison of the classification algorithms used in the training set. The performance of the multiple kernel method is better than that of the single kernel support vector machine. In terms of accuracy, NLMKL is the best (0.938), followed by EasyMKL (0.917) and GMKL(0.912). In terms of F score, NLMKL (0.941) and EasyMKL (0.92) are the best performers. Figure 4 shows the ROC curve and AUC value of each model in the training set. EasyMKL (0.973) and NLMKL (0.987) have higher AUC values, similar ROC curves, and therefore similar performances, followed by GMKL(0.970). Table 3 presents a detailed comparison of the classification algorithms used in the test set. In general, the multiple kernel method again outperforms the single kernel support vector machine. In terms of accuracy, NLMKL is the best (0.833), followed by EasyMKL (0.823) and GMKL (0.817). With regard to the F score, the ranking remains unchanged. Figure 5 shows the ROC curve and AUC value of each model in the training set. All classification algorithms show similar performance, yet NLMKL (0.889) has a higher AUC value.

Experiment Results and Analysis
The trained machine learning model was used to predict the probability of flood inundation for each pixel in the study area. The classification methods commonly used in GIS include natural breaks, quantile, equal interval, and geometric interval (Tehrany et al. 2013). The natural breaks method, based on the inherent natural grouping of data, can identify the similar values and group them in the most appropriate manner (Smith et al. 2007). It can also maximize the difference between classes and set the boundary at the point where the difference of data values is relatively large. Therefore, we used the natural breaks method in ArcGIS to divide the model-obtained probability values into five categories (very low, low, moderate, high, and very high) and completed the flood susceptibility map of the study area (Fig. 6). As can be seen from all the figures, the most flood-vulnerable areas are mainly located in the east of the basin. These areas are low and flat, and placed in proximity of the river and most of the vegetation there is cultivated. Flood density was taken into consideration to further evaluate the performance of the flood susceptibility map (Chapi et al. 2017). FD is the ratio of flooded site number to class pixel number in each susceptibility category. Ideally, density values should increase from very low susceptibility areas to very high susceptibility areas (Pradhan and Lee 2010). Table 4 lists the distribution of the categories of flood susceptibility maps by each model, the number of flood sites in each category, and flood density. Figure 7 shows flood density in different flood susceptibility areas by the six compared models. The results show that the flood susceptibility map obtained by the NLMKL (363.68) model has the highest FD in extremely high susceptibility area. Also a multiple kernel learning method, EasyMKL (360.66) shows great performance, which is superior to other baseline methods. Likewise, in very low and low susceptibility areas, the multiple kernel learning method outperforms the single kernel SVM. Among those in comparison, NLMKL (0.72 and 2.87) is the best performer. In terms of flood site coverage, Fig. 8 shows the inundation sites in different flood susceptibility areas by the six compared models. EasyMKL and NLMKL cover 283 and 287 flood sites respectively in the high and very high susceptibility areas, outperforming the linear kernel SVM (254), RBF kernel SVM (269), RMKL (271), and GMKL (275); in low and very low susceptibility areas, EasyMKL and NLMKL cover six and four flood sites respectively, outperforming the linear kernel SVM (11), RBF kernel SVM (16), RMKL (11), and GMKL (10) .

Discussion
FSM can help managers and decision-makers identify flood susceptibility areas in the basin. The establishment of FSM model requires historical inundation site data in the study area, but the data is valuable and difficult to obtain. We found that MODIS remote sensing can be used to obtain historical inundation site data, and the six models used in this paper have achieved good performance. FSM generated by the six models are relatively similar. In the study area of this paper, there is a strong correlation between flood susceptibility and river network. The distribution of river network determines the possibility of flood inundation in a certain place to a certain extent. The closer to the river, the higher susceptibility of flood. The susceptibility of flood is highest in plain areas close to the watershed outlet. Topography is also the main factor affecting flood occurrence and distribution. Areas with low altitude and gentle slope have high flood susceptibility. During the flood season, these areas need key protection.
Among the six models tested in this paper, the flood susceptibility prediction using NLMKL has the highest accuracy and performance, and the MKL performance is better than that of single kernel SVM. Central to MKL is the combination of multiple kernel functions and the learning method for the base kernel weight coefficient. For NLMKL with nonlinear kernel function combination, a momentum-based gradient descent algorithm is proposed to solve the optimal weight of each kernel matrix. This method can eliminate the heterogeneous information in sample features and uneven distribution of data in highdimensional feature space, and quickly and stably obtain the optimal parameters and optimal model.

Conclusion
In this study, two multiple kernel learning methods were developed for flood susceptibility mapping of Sanhuajian of the Yellow River. The performance of the two MKL methods (EasyMKL and NLMKL) is compared with that of the linear kernel SVM, RBF kernel SVM, RMKL, and GMKL. The results show that the MKL method has better performance and is more suitable for flood susceptibility mapping, especially for flood inundation sites extracted from historical MODIS remote sensing images alone. To summarize, the major contributions of this study are as follows: 1. In the absence of any historical records of the inundation sites, the sites are pinpointed using MODIS remote sensing images to provide data sets for machine learning models and complete the flood susceptibility map of the study area. This data acquisition method can provide reference to other areas in need of a flood susceptibility map when lacking a record of the inundation sites. 2. The two MKL models have higher prediction accuracy and higher quality in generating the flood susceptibility map. Their performance is better than that of the single kernel support vector machine. 3. Between the two MKL methods, NLMKL has the higher prediction accuracy, F1 score, and AUC than EasyMKL. In terms of the flood susceptibility map performance, NLMKL is also slightly better than EasyMKL. Table 5 shows the best parameter configuration of the implemented algorithms in paper. Figure 9 shows the maximum flood inundation map, which is extracted from MODIS imagery using OWL algorithm and ArcGIS.