Floods pose a significant threat to life and property, and this risk is increasing steadily owing to rapid population growth and climate change. Two approaches are employed to mitigate flood losses: structural and non-structural measures. Structural measures include building reservoirs and levees and improving channels. Non-structural measures focus on developing flood forecasting systems, implementing floodplain zoning, and organizing evacuation and relocation (Subramanya 2008).
The Bagmati River, which flows from Nepal into Bihar, India, is a major cause of flooding in the region almost every year. Bihar, being densely populated, suffers especially heavy loss of life and property during these floods. However, the government’s proposal to construct embankments along the riverbanks for flood control has been opposed by local people because such structures have failed in the past. They fear that embankments, and any failure or breach of them, will cause loss of land and livelihood and will affect the fertile soil brought by the river. Non-structural flood management measures, such as real-time flood forecasting (RTFF), are therefore needed in this region.
RTFF is the forecasting of flood discharge or water level using real-time hydro-meteorological data (Todini 2005; Tshimanga et al. 2016). It involves collecting real-time data such as precipitation, water level, discharge, evapotranspiration, temperature, and humidity, among others, which serve as inputs to rainfall-runoff models and streamflow routing programs that forecast water level or flood discharge for lead times ranging from a few hours to a few days. Lead time is the interval between the issuance of a flood forecast and the expected occurrence of the flood event; a longer lead time allows better preparation, evacuation, and property protection (Jain et al. 2018). However, developing countries such as India face challenges from inadequate weather monitoring networks, varied terrain, and sparsely distributed rain gauge stations, which result in data scarcity and hinder real-time flood monitoring (Belabid et al. 2019; Yeditha et al. 2020). To overcome these issues, satellite precipitation products (SPPs) have emerged as a viable alternative source of precipitation data, offering extensive spatial and temporal coverage. SPPs use satellite-based remote sensing to estimate precipitation and can provide data for ungauged basins and inaccessible remote areas. Unlike rain gauge data, SPPs are not prone to errors such as wind-induced undercatch or evaporation losses (Yeditha et al. 2020).
The SPPs commonly used for precipitation estimation include the Integrated Multi-Satellite Retrievals for Global Precipitation Measurement (IMERG) (Huffman et al. 2019), the Tropical Rainfall Measuring Mission (TRMM) (Huffman et al. 2007), and the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) (Funk et al. 2015). Most of these datasets are free of cost and accessible without restriction, in some cases in near real time. These SPPs, integrated with various hydrological and climatological models, have proven effective for monitoring floods, droughts, weather, soil erosion, and landslides (Belabid et al. 2019; Xiao et al. 2020; Yeditha et al. 2020; Hinge et al. 2021; Yigez et al. 2022; Yeditha et al. 2022). However, before an SPP is used in any study, its quality should be checked by comparison with observed ground-based rainfall data (Kumar et al. 2017; Prakash et al. 2018; Xiao et al. 2020; Gautam and Pandey 2022; Bhattacharyya et al. 2022; Reddy and Saravanan 2022).
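As a hedged illustration of the SPP-versus-gauge quality check described above, the sketch below computes statistics frequently used in such comparisons: the Pearson correlation coefficient (CC), root mean square error (RMSE), relative bias, and the categorical scores probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI). The function name `evaluate_spp`, the assumed 1 mm/day rain/no-rain threshold, and the sample values are hypothetical; they are not the statistics, thresholds, or data used in this study.

```python
import math


def evaluate_spp(sat, obs, rain_threshold=1.0):
    """Compare paired daily satellite (sat) and gauge (obs) rainfall, both in mm/day.

    rain_threshold is an assumed rain/no-rain cutoff (mm/day) for the categorical scores.
    """
    if not sat or len(sat) != len(obs):
        raise ValueError("sat and obs must be non-empty and of equal length")
    n = len(sat)
    mean_s, mean_o = sum(sat) / n, sum(obs) / n

    # Continuous statistics
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sat, obs))
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    denom = math.sqrt(var_s * var_o)
    cc = cov / denom if denom else float("nan")
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sat, obs)) / n)
    # Relative bias in %: positive values mean the SPP overestimates total rainfall
    total_obs = sum(obs)
    rbias = 100.0 * (sum(sat) - total_obs) / total_obs if total_obs else float("nan")

    # Contingency-table counts for rain/no-rain detection
    hits = misses = false_alarms = 0
    for s, o in zip(sat, obs):
        sat_rain, obs_rain = s >= rain_threshold, o >= rain_threshold
        if sat_rain and obs_rain:
            hits += 1
        elif obs_rain:  # gauge saw rain, satellite did not
            misses += 1
        elif sat_rain:  # satellite saw rain, gauge did not
            false_alarms += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "CC": cc,
        "RMSE": rmse,
        "RBias_%": rbias,
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "CSI": ratio(hits, hits + misses + false_alarms),
    }


# Illustrative (made-up) paired daily values, not data from this study
sat = [0.0, 5.0, 12.0, 0.5, 20.0, 3.0]
obs = [0.0, 4.0, 15.0, 2.0, 18.0, 0.0]
metrics = evaluate_spp(sat, obs)
```

The continuous scores describe how closely rainfall amounts agree, while POD, FAR, and CSI describe how well the SPP detects rain events; studies usually report both because a product can score well on one and poorly on the other.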
There are three types of rainfall-runoff and flood forecasting models: physical, conceptual, and data-driven (Yeditha et al. 2020; Sezen and Partal 2022). Physical models, such as MIKE Flood and HEC-RAS, represent detailed physical processes in a watershed but require extensive data. Conceptual models, such as the SCS Curve Number method and HEC-HMS, are based on simplified mathematical equations and expert knowledge but may not capture all of a system's complexities. Data-driven models, such as Artificial Neural Networks (ANNs), use statistical and machine learning techniques and can adapt to changing conditions in real-time forecasting (ASCE 2000a,b); however, they are less reliable when data are limited or of poor quality. Studies using conceptual and physical models with SPPs as rainfall input have reported good results (Kumar et al. 2017; Li et al. 2018; Belabid et al. 2019; Belayneh et al. 2020; Llauca et al. 2021; Zhou et al. 2021; Soo et al. 2022; Mokhtari et al. 2022). Data-driven machine learning (ML) models, particularly ANNs, have emerged as reliable options for flood forecasting (Li et al. 2014; Kim et al. 2016; Nanda et al. 2016; Tripura and Roy 2018; Ghose 2018; Yeditha et al. 2022). These models, inspired by the human brain, can predict water level or flood discharge without relying on knowledge of a basin's internal hydraulic structure. However, they are often considered black-box models because they do not reveal how they arrive at their outputs (Kumar et al. 2018).
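To make the idea of a data-driven ANN concrete, the following is a minimal sketch of a single-hidden-layer feed-forward neural network trained by full-batch gradient descent (backpropagation) on a mean-squared-error loss. The tanh activation, hidden-layer size, learning rate, epoch count, and synthetic data are illustrative assumptions, not the architecture or settings used in this study.

```python
import numpy as np


def train_ffnn(X, y, n_hidden=10, lr=0.05, epochs=2000, seed=0):
    """Fit y ~ tanh(X @ W1 + b1) @ W2 + b2 by gradient descent on 0.5 * mean squared error."""
    rng = np.random.default_rng(seed)
    n, n_features = X.shape
    W1 = rng.normal(0.0, 0.5, size=(n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, linear output layer
        H = np.tanh(X @ W1 + b1)
        y_hat = H @ W2 + b2
        # Backward pass: error signal propagated from the output to the hidden layer
        err = (y_hat - y) / n
        dW2 = H.T @ err
        db2 = err.sum(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ dH
        db1 = dH.sum(axis=0)
        # Gradient-descent updates
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict_ffnn(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()


# Synthetic example: two predictors and a target that depends on them
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1]
params = train_ffnn(X, y)
mse = np.mean((predict_ffnn(params, X) - y) ** 2)
```

The network learns the input-output mapping purely from data; nothing in the weights corresponds to a physical process, which is why such models are described as black boxes.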
For discharge and water level forecasting, ML models such as the Feed Forward Neural Network (FFNN), Extreme Learning Machine (ELM), Long Short-Term Memory (LSTM) network, Decision Tree (DT), and Support Vector Machine (SVM) are commonly used (Jain et al. 2018; Piadeh et al. 2022; Yeditha et al. 2022).
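Unlike an FFNN trained iteratively by backpropagation, an ELM assigns its hidden-layer weights and biases at random, keeps them fixed, and solves only the output weights in a single least-squares step using the Moore-Penrose pseudoinverse of the hidden-layer output matrix. The sketch below illustrates that idea; the sigmoid activation, the uniform [-1, 1] initialization, the number of hidden neurons, and the synthetic data are assumptions for illustration rather than this study's configuration.

```python
import numpy as np


class ELM:
    """Single-hidden-layer extreme learning machine for regression."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the fixed random projection of the inputs
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        # Input weights and biases are random and are never updated
        self.W = self.rng.uniform(-1.0, 1.0, size=(n_features, self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: minimum-norm least-squares solution beta = pinv(H) @ y
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


# Synthetic example: a smooth nonlinear response to one predictor
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(300, 1))
y = np.sin(3.0 * X[:, 0])
model = ELM(n_hidden=50, seed=0).fit(X, y)
mse = np.mean((model.predict(X) - y) ** 2)
```

Because training reduces to one linear solve, an ELM is typically much faster to train than a gradient-trained FFNN, at the cost of relying on a sufficiently large random hidden layer.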
Various studies have suggested that the performance of singular ML models can be further enhanced by incorporating wavelets, mathematical functions commonly used in hydrology to analyze non-stationary time series. Maheswaran and Khosa (2012) compared the db1 (Haar), db2, db3, db4, Sym4, and Spline-B3 wavelets for hydrologic forecasting with a Wavelet Volterra Coupled model. Their results showed that the Haar wavelet performed well for time series with short memory and transient features, whereas the db2 and spline wavelets were better suited to series with long-term features. The study nevertheless emphasized selecting an appropriate wavelet for the data through proper analysis. Other studies (Tiwari and Chatterjee 2010; Shoaib et al. 2014; Sehgal et al. 2014; Nanda et al. 2016; Yeditha et al. 2020; Linh et al. 2021) have also demonstrated that combining wavelets with ML models improves forecasting accuracy. However, very few studies, particularly in India, have explored combining wavelets and ML models with SPPs for RTFF.
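In a wavelet-based hybrid model, an input series such as rainfall or water level is typically decomposed into a low-frequency approximation sub-series and one or more high-frequency detail sub-series, which are then fed to the ML model. The sketch below shows a multilevel decomposition with the orthonormal Haar (db1) wavelet, returning coefficients in the order [approximation at the coarsest level, detail at the coarsest level, ..., detail at level 1]. The function names are hypothetical, the input length must be divisible by 2 to the power of the level, and boundary handling and the look-ahead issue of applying a decimated transform to a whole record (a concern for real-time use) are deliberately not addressed here.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; len(signal) must be even."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [INV_SQRT2 * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [INV_SQRT2 * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one Haar level by recovering each original pair of samples."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([INV_SQRT2 * (a + d), INV_SQRT2 * (a - d)])
    return out


def haar_wavedec(signal, level):
    """Multilevel decomposition returning [cA_level, cD_level, ..., cD_1]."""
    coeffs = []
    approx = list(signal)
    for _ in range(level):
        approx, detail = haar_dwt(approx)
        coeffs.insert(0, detail)  # coarser details go in front
    coeffs.insert(0, approx)
    return coeffs


def haar_waverec(coeffs):
    """Rebuild the signal from the output of haar_wavedec."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_idwt(approx, detail)
    return approx


signal = [4, 6, 10, 12, 8, 6, 5, 5]
coeffs = haar_wavedec(signal, level=2)
reconstructed = haar_waverec(coeffs)
```

Because the Haar transform is orthonormal, the decomposition preserves the signal's energy and is exactly invertible, so the sub-series together carry the same information as the original series, only separated by time scale.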
Based on previous studies, two ML models, FFNN and ELM, were selected for developing the flood forecasting models. This paper has two main objectives: first, to compare daily IMERG SPP data with observed rain gauge data from the India Meteorological Department (IMD); and second, to develop singular ML models (FFNN and ELM) and wavelet-based hybrid ML models (W-FFNN and W-ELM) that use hourly IMERG rainfall data to forecast flood water levels at the Hayaghat gauging site in the Bagmati River basin. The models were used to forecast flood water levels at lead times of 1 h, 3 h, 6 h, 12 h, 1 day, 3 days, 5 days, 7 days, and 10 days.
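One way to set up forecasting at multiple lead times is to build a separate supervised dataset for each lead time from the hourly series: the inputs are values available at forecast time t, and the target is the water level at t plus the lead time. The sketch below assumes hourly rainfall and water-level series of equal length; the function `make_lead_time_dataset`, the use of three lagged values of each variable as inputs, and the dictionary `LEAD_TIMES_H` are hypothetical choices for illustration, not the input structure used in this study.

```python
# Lead times from this study, expressed as hourly steps
LEAD_TIMES_H = {"1 h": 1, "3 h": 3, "6 h": 6, "12 h": 12, "1 day": 24,
                "3 days": 72, "5 days": 120, "7 days": 168, "10 days": 240}


def make_lead_time_dataset(rain, level, lead, n_lags=3):
    """Build (inputs, target) pairs from hourly series for one lead time in hours.

    Inputs at time t (hypothetical choice): the last n_lags rainfall values and
    the last n_lags water levels up to and including t. Target: level[t + lead].
    """
    if len(rain) != len(level):
        raise ValueError("rain and level must have equal length")
    X, y = [], []
    for t in range(n_lags - 1, len(level) - lead):
        features = list(rain[t - n_lags + 1:t + 1]) + list(level[t - n_lags + 1:t + 1])
        X.append(features)
        y.append(level[t + lead])
    return X, y


# Toy hourly series, not data from this study
rain = list(range(10))
level = [100 + i for i in range(10)]
X, y = make_lead_time_dataset(rain, level, lead=2)
```

Longer lead times leave fewer usable samples from the same record and a weaker link between inputs and target, which is one reason forecast skill generally declines as lead time increases.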