Real-Time Flood Forecasting using Satellite Precipitation Product and Machine Learning Approach in Bagmati River Basin, India

doi:10.21203/rs.3.rs-3193368/v1


Research Article


https://doi.org/10.21203/rs.3.rs-3193368/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 07 Apr, 2024

Read the published version in Acta Geophysica →


Real-time flood forecasting (RTFF) is crucial for early flood warnings. It relies on real-time hydrological and meteorological data. Satellite Precipitation Products (SPPs) offer real-time global precipitation estimates and have emerged as a suitable option for rainfall input in RTFF models. This study first compared the daily SPP data of Integrated Multi-Satellite Retrievals for Global Precipitation Measurement (IMERG) with observed rainfall data of the Indian Meteorological Department (IMD) from 2001 to 2009 using contingency tests. Hourly rainfall from this SPP was then used to build four RTFF models based on machine learning: feedforward neural network (FFNN), extreme learning machine (ELM), wavelet-based feedforward neural network (W-FFNN), and wavelet-based extreme learning machine (W-ELM). These models were trained and tested with the observed data, and their performance was evaluated using various statistical criteria. Results showed good agreement between IMERG and observed data, with a probability of detection (POD) of 85.42%. The developed models were used to forecast hourly water levels at the Hayaghat gauging site of the Bagmati River with lead times from 1 hour to 10 days. Overall, wavelet-based models outperformed their singular counterparts. Among the singular models, FFNN performed better than ELM, with satisfactory predictions up to a 5-day lead time. For a 7-day lead time, only W-FFNN performed well, whereas none of the models performed satisfactorily for a 10-day lead time.

RTFF

FFNN

ELM

Wavelet

SPP

IMERG

Floods pose a significant threat to life and property. This risk is increasing due to rapid population growth and climate change. To mitigate flood losses, two approaches are employed: structural and non-structural measures. Structural measures involve building reservoirs and levees and improving channels. Non-structural measures focus on developing flood forecasting systems, implementing floodplain zoning, and organizing evacuation and relocation efforts (Subramanya 2008).

The Bagmati River, which flows from Nepal to Bihar in India, is a chief cause of flooding in the area almost every year. The densely populated Bihar region in particular suffers significant loss of life and property during floods. However, the government’s proposal to construct embankments along the banks to control floods has been opposed by local people because such structures have failed in the past. They fear that an embankment failure or breach would result in loss of land and livelihood and would affect the fertile soil brought by the river. There is therefore a need for non-structural flood management measures, such as real-time flood forecasting (RTFF), in this region.

RTFF is the method of forecasting flood discharge or water level using real-time hydro-meteorological data (Todini 2005; Tshimanga et al. 2016). It involves collecting real-time data such as precipitation, water level, discharge, evapotranspiration, temperature, and humidity, which are input to rainfall-runoff models and streamflow routing programs to forecast water level or flood discharge for lead times ranging from a few hours to a few days. Lead time is the time available between the issue of a flood forecast and the anticipated occurrence of the flooding event. A longer lead time allows for better preparation, evacuation, and property protection (Jain et al. 2018). However, developing countries like India face challenges due to inefficient weather networks, varied terrain and sparsely distributed rain gauge stations, resulting in data scarcity and hindering real-time flood monitoring (Belabid et al. 2019; Yeditha et al. 2020). To overcome these issues, Satellite Precipitation Products (SPPs) have emerged as a viable alternative for obtaining precipitation data, providing extensive spatial and temporal coverage. SPPs use remote sensing and space science to estimate precipitation and provide data for ungauged basins and inaccessible remote areas. Unlike rain gauge data, SPPs are not prone to errors such as wind-induced under-catch or evaporation (Yeditha et al. 2020).

The SPPs generally used for precipitation measurement are Integrated Multi-Satellite Retrievals for Global Precipitation Measurement (IMERG) (Huffman et al. 2019), Tropical Rainfall Measuring Mission (TRMM) (Huffman et al. 2007) and Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) (Funk et al. 2015). Most of these data are free of cost and easily accessible without restriction in real-time. These SPPs integrated with various hydrological and climatological models have proven to be very effective for flood, drought, weather, soil erosion and landslide monitoring (Belabid et al. 2019; Xiao et al. 2020; Yeditha et al. 2020; Hinge et al. 2021; Yigez et al. 2022; Yeditha et al. 2022). However, before using SPP in any study, its quality should be checked and compared with observed ground-based rainfall data (Kumar et al. 2017; Prakash et al. 2018; Xiao et al. 2020; Gautam and Pandey 2022; Bhattacharyya et al. 2022; Reddy and Saravanan 2022).

There are three types of rainfall-runoff and flood forecasting models: physical, conceptual, and data-driven (Yeditha et al. 2020; Sezen and Partal 2022). Physical models, like MIKE Flood and HEC-RAS, represent detailed physical processes in a watershed but require extensive data. Conceptual models, such as SCS Curve Number and HEC-HMS, are based on mathematical equations and expert knowledge but may not capture all system complexities. Data-driven models, like Artificial Neural Networks (ANNs), use statistical and machine learning techniques and adapt to changing conditions for real-time forecasting (ASCE 2000a,b). However, they are less reliable when data are limited or of poor quality. Studies have explored conceptual and physical models with SPPs as rainfall input and achieved good results (Kumar et al. 2017; Li et al. 2018; Belabid et al. 2019; Belayneh et al. 2020; Llauca et al. 2021; Zhou et al. 2021; Soo et al. 2022; Mokhtari et al. 2022). Data-driven machine learning (ML) models, particularly those using ANNs, have emerged as reliable options for flood forecasting (Li et al. 2014; Kim et al. 2016; Nanda et al. 2016; Tripura and Roy 2018; Ghose 2018; Yeditha et al. 2022). These ML models, inspired by the human brain, can predict water level or flood discharge without relying on a basin's internal hydraulic structure. However, they are often considered black box models as they do not provide explicit information on how they arrive at their outputs (Kumar et al. 2018).

For discharge and water level forecasting, ML models like Feed Forward Neural Network (FFNN), Extreme Learning Machine (ELM), Long Short-Term Memory (LSTM), Decision Tree (DT) and Support Vector Machine (SVM) are often used (Jain et al. 2018; Piadeh et al. 2022; Yeditha et al. 2022).

Various studies have suggested that the efficiency of individual singular ML models can be further enhanced by incorporating wavelets, which are mathematical functions commonly used for non-stationary time series analysis in hydrology. In a study by Maheswaran and Khosa (2012), wavelets, viz., db1 (Haar), db2, db3, db4, Sym4, and Spline-B3 were compared for hydrologic forecasting using time series data in a Wavelet Volterra Coupled model. The results revealed that Haar wavelets performed well for time series with short memory and transient features, while db2 and spline wavelets were better suited for long-term features. However, the study emphasized the importance of selecting the appropriate wavelet based on the data through proper analysis. Other studies by Tiwari and Chatterjee 2010, Shoaib et al. 2014, Sehgal et al. 2014, Nanda et al. 2016, Yeditha et al. 2020, and Linh et al. 2021 also demonstrated the benefits of combining wavelets with ML models to improve forecasting accuracy. However, very few studies, especially in India, have explored the use of wavelets and ML models with SPP for RTFF.

Based on the previous studies, the ML models, viz., FFNN and ELM have been selected for the development of flood forecasting models. This paper has two main objectives. First, to compare the daily SPP data of IMERG with observed rain gauge data of Indian Meteorological Department (IMD) and second, to develop singular machine learning models (FFNN and ELM) and wavelet-based hybrid machine learning models (W-FFNN and W-ELM) using the hourly rainfall data of IMERG for forecasting flood water levels at Hayaghat gauging site of the Bagmati River basin. The models have been used to forecast flood levels with lead times of 1 hr, 3 hrs, 6 hrs, 12 hrs, 1 day, 3 days, 5 days, 7 days, and 10 days.

2.1 Study Area

Figure 1 shows the study area with river networks, rain gauge stations, water level gauge stations, and major districts of the Bagmati River basin in India. It also shows the Bagmati River and its tributaries: the Lalbhekya, Lakhandei, Khiroi and Adhwara group of rivers. The Bagmati River, originating from the Shivpuri range of hills in Nepal at 1500 m above mean sea level (msl), is a perennial river that flows through the districts of Sitamarhi, Sheohar, Muzaffarpur, Darbhanga, and Samastipur in Bihar State of India. The river basin lies to the north of the Ganga River. The river has a length of 394 km in Bihar and 195 km in Nepal, with a basin area of 6500 km² in Bihar and 7884 km² in Nepal. The Himalayan range lies to the north of its source at a higher elevation and drains into the neighboring Kosi and Gandak rivers. The Bagmati River meets the Kamla River at Jagmohra village of Samastipur and finally outfalls into the Kosi River near Badlaghat, Khagaria. The elevations of the rain gauge stations Benibad, Dheng, Hayaghat and Kamtaul are 54 m, 74 m, 46 m and 55 m, respectively. The basin experiences an average yearly precipitation of 1255 mm. The average annual temperature is 27°C. The soil in the basin is mainly alluvial.

2.2 Data Used

Daily rainfall data of satellite precipitation product, viz., IMERG, created by the National Aeronautics and Space Administration (NASA) and Japan Aerospace Exploration Agency (JAXA) have been used. IMERG uses low-earth orbit and geostationary satellite data, along with microwave-calibrated infrared satellite estimates to deliver rainfall estimate. The quickest version of IMERG provides rainfall data within 4 hours of the observation (Huffman et al., 2019). The "Final Run" data of IMERG with a spatial resolution and spatial coverage of 0.1°× 0.1° and 60° N − 60° S respectively were used. They provide half-hourly and daily rainfall data with coverage from 2000 to the present time. Access to these data is obtainable through the website https://giovanni.gsfc.nasa.gov/giovanni/.

For quality assessment of IMERG data, observed daily rainfall data at four rain gauge stations, viz., Benibad, Dheng, Hayaghat and Kamtaul, were collected from IMD for the years 2001 to 2009. Monthly rainfall data from 2001 to 2014 were also provided by IMD. To develop an RTFF model based on machine learning, it was necessary to collect hourly rainfall and water level data. For this, half-hourly IMERG data were downloaded and later converted to hourly rainfall data. The hourly water level gauge data at the Hayaghat gauging site for the South-West (SW) monsoon and post-monsoon periods from 2001 to 2014 were also collected.

Figure 2 depicts the flowchart illustrating the methodology employed for the research. Because hourly rainfall data were not available from IMD, the daily rainfall data from IMD and IMERG were compared at the basin level using the contingency test. Since daily rainfall is the aggregation of 24 hourly values, it was assumed that an SPP performing well at the daily level would also perform well at the hourly level. Observed mean rainfall over the Bagmati River basin was computed using the Thiessen polygon method for the rain gauge stations, while for the SPP, mean precipitation was computed using Thiessen polygons based on the grid points within the basin. After comparing the daily data, the hourly IMERG rainfall data for the South-West (SW) monsoon and post-monsoon periods from 2001 to 2014 were used to develop different machine learning-based RTFF models. The outcomes of the models were compared, and the most suitable ML model for RTFF was determined. The detailed methodology is discussed in the following sections.
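The Thiessen-weighted basin mean described above can be sketched as follows. This is a minimal illustration, not the authors' workflow: it approximates polygon areas by assigning evenly spaced points inside the basin to their nearest station, uses plain Euclidean distance, and all coordinates and rainfall values are hypothetical.

```python
import math


def thiessen_weights(stations, basin_points):
    """Approximate Thiessen weights by nearest-station assignment.

    stations: list of (x, y) station (or grid-point) coordinates.
    basin_points: list of (x, y) points sampled evenly inside the basin.
    Returns one weight per station; the weights sum to 1.
    """
    counts = [0] * len(stations)
    for px, py in basin_points:
        distances = [math.hypot(px - sx, py - sy) for sx, sy in stations]
        counts[distances.index(min(distances))] += 1
    total = len(basin_points)
    return [c / total for c in counts]


def areal_mean(rainfall, weights):
    """Weighted basin-mean rainfall for one time step."""
    return sum(r * w for r, w in zip(rainfall, weights))


# Hypothetical coordinates and rainfall values, for illustration only.
stations = [(0.0, 0.0), (1.0, 0.0)]
basin_points = [(0.1, 0.0), (0.2, 0.0), (0.9, 0.0), (0.3, 0.5)]
weights = thiessen_weights(stations, basin_points)
mean_rain = areal_mean([10.0, 20.0], weights)
print(weights, mean_rain)
```

A denser set of basin points gives a closer approximation to the true polygon areas.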

3.1 SPP Evaluation

The contingency test for SPP consists of the evaluation of four categorical metrics: Hit (H), Miss (M), False Alarm (F), and Correct Negative (Q). These metrics assess the satellite's capability to distinguish between days with rainfall and days without rainfall. Two contingency indices, Probability of Detection (POD) and False Alarm Ratio (FAR), were computed based on these metrics. POD is the fraction of observed rain events that the satellite also detects, while FAR is the fraction of satellite-detected rain events that were not observed at the gauges. A POD value of 1 signifies perfect detection of all positive events without any misses. A FAR value of 0 implies no false alarms. Studies by Hogan et al. (2010), Sireesha et al. (2020) and Navale et al. (2020) have used these categorical metrics in contingency tests. The methods of computing contingency metrics and contingency indices are presented in Table 1 and Table 2, respectively. In this study, a rainfall event is defined as a day on which rainfall exceeds a threshold of 0.1 mm/day.

Table 1

Contingency metrics.
	Rain perceived by SPP (Yes)	Rain perceived by SPP (No)
Rain perceived by rain gauge (Yes)	Hit (H)	Miss (M)
Rain perceived by rain gauge (No)	False Alarm (F)	Correct Negative (Q)

Table 2

Contingency indices.
Contingency Indices	Formula	Range	Perfect Value
POD	$POD = \frac{H}{H+M}$	0–1	1
FAR	$FAR = \frac{F}{F+H}$	0–1	0
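As a minimal sketch of Tables 1 and 2, the contingency metrics and indices can be computed from paired daily series as below. The function names and the five-day example series are hypothetical; only the 0.1 mm/day threshold and the POD and FAR formulas come from the text.

```python
def contingency_counts(gauge_mm, spp_mm, threshold=0.1):
    """Count hits, misses, false alarms and correct negatives for paired daily totals."""
    hit = miss = false_alarm = correct_neg = 0
    for g, s in zip(gauge_mm, spp_mm):
        gauge_rain = g > threshold
        spp_rain = s > threshold
        if gauge_rain and spp_rain:
            hit += 1
        elif gauge_rain:
            miss += 1
        elif spp_rain:
            false_alarm += 1
        else:
            correct_neg += 1
    return hit, miss, false_alarm, correct_neg


def pod(hit, miss):
    """Probability of Detection, H / (H + M)."""
    return hit / (hit + miss)


def far(hit, false_alarm):
    """False Alarm Ratio, F / (F + H)."""
    return false_alarm / (false_alarm + hit)


# Hypothetical five-day series (mm/day), for illustration only.
gauge = [0.0, 5.2, 0.0, 12.1, 0.05]
spp = [0.3, 4.0, 0.0, 0.0, 0.0]
h, m, f, q = contingency_counts(gauge, spp)
print(h, m, f, q, pod(h, m), far(h, f))
```

Applied to the counts reported in Table 3 (H = 879, M = 150, F = 714), these functions give a POD of about 85.42% and a FAR of about 44.82%.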

3.2 Machine Learning Models for Real-Time Flood Forecasting

Four RTFF models have been developed using MATLAB R2020a. The basic architecture and functioning of these models are discussed in the following sections.

3.2.1 Feed Forward Neural Network (FFNN)

An FFNN comprises an input layer, hidden layers, and an output layer. Data such as rainfall, water level, discharge, and various meteorological and hydrological parameters are taken as input. The hidden layers process this input data using a series of mathematical functions and nonlinear transformations. By adjusting the weights and biases in the connections between nodes, the network learns to identify patterns in the input data and to make predictions from them (ASCE 2000a,b). Finally, the output layer produces the predicted value, such as the water level or discharge. The architecture of the FFNN model for RTFF is shown in Fig. 3. Here, W is weight, b is bias, d is lag, n is the number of hours ahead, R(t) is the current-hour rainfall input, WL(t) is the current-hour water level input, R(t-1), R(t-2), …, R(t-d) are rainfall inputs lagged by 1, 2, …, d hrs, WL(t-1), WL(t-2), …, WL(t-d) are lagged water level inputs and WL_M are model outputs of water level for a lead time of n hours.

3.2.2 Extreme Learning Machine (ELM)

ELM is an ML algorithm categorized under FFNN. Differing from conventional neural networks, ELM adopts a unique approach where it trains the output weights of network in a single step, rather than iteratively updating them using backpropagation. This allows ELM to train much faster than traditional neural networks and avoid the problem of getting stuck in local minima. The input weights of the ELM network are randomly generated and fixed (Huang et al. 2006). It has a low computational cost and can handle large-scale data efficiently. However, ELM has limited interpretability due to the use of randomly generated input weights and fixed hidden layer activation functions.

The effectiveness of ELMs is mainly determined by hidden nodes, and this aspect is quantitatively described using equation (Huang et al. 2006; Yeditha et al. 2020) :

$$\sum _{i=1}^{n}{O}_{i}{t}_{i}\left({\alpha }_{i}{x}_{i}+ {\beta }_{i}\right)={M}_{i}$$

where $n$ represents the number of nodes, $i$ denotes the index of the hidden neuron, $O$ corresponds to the weight vector connecting hidden nodes to output nodes, ${t}_{i}$ denotes the target value associated with the $i$th hidden node, ${x}_{i}$ denotes the input data associated with the $i$th hidden node, ${\alpha }_{i}$ denotes the weight linking input data ${x}_{i}$ to the hidden node, ${\beta }_{i}$ denotes the bias term of the $i$th hidden node and $M$ represents the model's output.

To determine the output weights in an ELM model with P input neurons and Q hidden neurons, the Moore-Penrose pseudoinverse is utilized (Huang et al. 2006; Yeditha et al. 2020). The input-hidden weight matrix is represented as A with dimensions Q × P, where each row corresponds to the weight vector linking input neurons to hidden neuron. Let X be the input data matrix with dimensions P × L, where L is the number of training samples. The pseudoinverse of a matrix A, denoted as A⁺, is calculated using the following equation:

A⁺ = (A^TA)⁻¹A^T (2)

Where, A^T is the transpose of matrix A, and (A^TA)⁻¹ is the inverse of the product of A^T and A.

Finally, the output weights W are computed as:

W = A⁺ Y (3)

Where, Y denotes target output vector.
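The single-step training described above can be sketched with NumPy as follows. This is a hedged illustration, not the authors' MATLAB code: it assumes the pseudoinverse is applied to the hidden-layer output matrix (called `H` here, as in the usual formulation of Huang et al. 2006), uses a sine activation, and draws the fixed input weights uniformly from [-1, 1]. The function names, the seed handling and the synthetic data are hypothetical.

```python
import numpy as np


def elm_hidden(X, A, b):
    """Hidden-layer outputs with a sine activation; X is (L, P), A is (Q, P), b is (Q,)."""
    return np.sin(X @ A.T + b)


def elm_fit(X, y, n_hidden, seed=0):
    """Draw random input weights once, then solve for the output weights in one step."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(n_hidden, X.shape[1]))  # fixed, never trained
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # fixed, never trained
    H = elm_hidden(X, A, b)
    beta = np.linalg.pinv(H) @ y  # Moore-Penrose pseudoinverse solution
    return A, b, beta


def elm_predict(X, A, b, beta):
    """Model output for new inputs using the stored weights."""
    return elm_hidden(X, A, b) @ beta


# Synthetic data, for illustration only.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] ** 2
A, b, beta = elm_fit(X, y, n_hidden=10)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, A, b, beta) - y) ** 2)))
print(train_rmse)
```

Because only `beta` is solved for, training reduces to one linear least-squares problem, which is why no iterative backpropagation is needed.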

Figure 4 presents the structure of ELM model employed for RTFF. Within this framework, W denotes weights associated with the input layer, b signifies biases that are randomly generated and not subject to subsequent training, ƒ represents activation function, h_i represents hidden neurons, R_i and WL_i represent input data, β represents optimal bias values, T_i denotes target data, and WL_M corresponds to model outputs pertaining to water level.

3.2.3 Wavelet-based Hybrid Models

Wavelets are mathematical functions with finite duration and zero mean, and they play a crucial role in the analysis of non-stationary time series. In contrast to the Fourier transform, which provides frequency-domain information only, wavelet transforms use localized basis functions known as wavelets. Wavelets are localized in both the time and frequency domains. This property allows them to capture time and frequency information simultaneously, making them well suited for analyzing signals or data with localized features or time-varying patterns. It gives a more comprehensive understanding of the data and overcomes the limitations of Fourier analysis (Roshni et al. 2019). In hydrology, hydrometeorological time series serve as the input signal for wavelet analysis.

Wavelet transform represents time series using mother wavelet (detail function, $\psi$) and father wavelets (scaling function, $\phi$). Mother and father wavelets are basic waveforms used in wavelet analysis to construct a complete wavelet that can be used to analyze signals with localized features and overall shape, respectively. Mother wavelet function is used to yield the scaling function (Yeditha et al., 2020).

The admissibility conditions to consider $\psi$ as a wavelet are (Agarwal et al. 2016; Yeditha et al. 2020):

$${\int }_{-\infty }^{\infty }\psi \left(x\right)dx=0$$

$${\int }_{-\infty }^{\infty }{\left|\psi \left(x\right)\right|}^{2}dx=1$$

A basis function $f\left(x\right)$ can be scaled and translated as:

$$f\left(x\right)= {\sum }_{p}{s}_{C,p}{\phi }_{C,p}\left(x\right)+{\sum }_{p}{e}_{C,p}{\psi }_{C,p}\left(x\right)+ {\sum }_{p}{e}_{C-1,p}{\psi }_{C-1,p}\left(x\right)\dots + {\sum }_{p}{e}_{1,p}{\psi }_{1,p}\left(x\right)$$

where,

${\phi }_{C,p}\left(x\right)$ = Functions generated through the translation of father wavelet.

${\psi }_{C,p}\left(x\right)$ = Functions generated through the dilation of mother wavelet.

$C$ = Total scales for assessment

$p$ = Length of time series (1 to n)

${s}_{C,p}$ = Approximation coefficients

${e}_{C,p}\dots {e}_{1,p}$ = Wavelet transform coefficients (scales $C$ to 1)

In hydrological forecasting, the Discrete Wavelet Transform (DWT) is commonly used for signal decomposition because observed hydrometeorological time series are discrete. DWT is well suited to representing natural components and removing unnecessary noise from the signal. It does not have any redundant variables and requires less memory, resulting in a better outcome (Adamowski and Sun 2010; Roshni et al. 2019). In DWT, two sets of coefficients are generated for an input signal x: approximation coefficients (a₁) from the low-pass filter, which capture the low-frequency content, and detail coefficients (d₁) from the high-pass filter, which capture the high-frequency content. After wavelet decomposition to level n, the input has the structure [a_n, d_n, d_n−1, ..., d₂, d₁].

To develop the wavelet-based hybrid models, a mother wavelet function is selected and then a suitable level of decomposition for the rainfall input signal R(t) is determined. Based on the studies of Yang et al. (2016) and Reddy et al. (2022), the minimum (${l}_{min})$ and maximum (${l}_{max})$ levels of decomposition can be determined by ${l}_{min}=int \left[\text{log}\left(k\right)\right]$ and ${l}_{max}=\left[{log}_{2}\text{k}\right]$ respectively. Here, $k$ is the data length. The wavelet decomposition is conducted using the maximal overlap discrete wavelet transform (MODWT) function available in MATLAB R2020a. This gives the discrete wavelet coefficients of the input signal, which are used as new time-series inputs for the singular model. This combination of wavelet and singular models generates the W-FFNN and W-ELM models. The architecture of the wavelet-based hybrid model for RTFF is shown in Fig. 5.
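The level bounds and the decomposition step can be sketched as follows. This is an illustration under stated assumptions, not MATLAB's `modwt`: the logarithm in ${l}_{min}$ is taken as base 10, the brackets in ${l}_{max}$ are taken as rounding down, and the Haar MODWT is written by hand with circular boundaries, so sign and alignment conventions may differ from MATLAB. The rainfall values are hypothetical.

```python
import math


def decomposition_level_bounds(k):
    """Return (l_min, l_max) for a series of length k, assuming log in l_min is base 10."""
    l_min = int(math.log10(k))
    l_max = int(math.log2(k))
    return l_min, l_max


def haar_modwt(x, levels):
    """Haar MODWT with circular boundaries.

    Returns (details, approx): details[j - 1] holds the level-j wavelet
    coefficients and approx holds the final scaling coefficients, each
    with the same length as x.
    """
    n = len(x)
    approx = list(x)
    details = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)
        prev = approx
        approx = [0.5 * (prev[t] + prev[(t - shift) % n]) for t in range(n)]
        details.append([0.5 * (prev[t] - prev[(t - shift) % n]) for t in range(n)])
    return details, approx


print(decomposition_level_bounds(44170))

# Hypothetical hourly rainfall (mm), for illustration only.
rain = [0.0, 0.0, 2.5, 7.0, 1.2, 0.0, 0.0, 0.4]
details, approx = haar_modwt(rain, levels=2)
# With the Haar filter, the components add back to the original series.
rebuilt = [approx[t] + sum(d[t] for d in details) for t in range(len(rain))]
```

Every component keeps the length of the original series, so each can be fed to the singular model as an additional hourly input. For a series of 44170 values, the bounds under these assumptions are 4 and 15.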

3.2.4 Input Selection and Processing

A total of 44170 mean hourly rainfall values and 44170 hourly water level values at the Hayaghat gauging site were available for the years 2001 to 2014. These data were used as input for RTFF model development. Before developing the neural network model, the lags of the input data were determined using the Autocorrelation Function (ACF), Partial Autocorrelation Function (PACF), and Cross-Correlation Function (CCF) (Tiwari and Chatterjee 2010; Yeditha et al. 2022).

However, before using these input and target data for model development, normalization (Min-Max Feature Scaling) was done to scale the input features to have a consistent range between 0 and 1.
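Min-max feature scaling and its inverse can be sketched as below. The helper names and sample values are hypothetical, and the stored minimum and maximum are what make it possible to map model outputs back to metres afterwards.

```python
def minmax_fit(values):
    """Return the (minimum, maximum) pair used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def minmax_scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def minmax_inverse(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical water levels (m), for illustration only.
levels = [33.42, 41.58, 44.91, 48.76]
lo, hi = minmax_fit(levels)
scaled = minmax_scale(levels, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
print(scaled)
```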

Based on the results of the lag analysis, the input combination for the model used to forecast water levels at any lead time $x$ is as follows:

$${WL}_{M}\left(t+x\right)=Model \left[ \left\{R\left(t\right), R\left(t-1\right),\dots R\left(t-{d}_{CCF}\right), WL\left(t\right), WL\left(t-1\right),\dots WL\left(t- {d}_{ACF} or {d}_{PACF}\right)\right\}\right]$$

where ${WL}_{M}\left(t+x\right)$ represents the output forecasted by the model at a lead time of $x$ hours, $R\left(t\right)$ and $WL\left(t\right)$ denote the current rainfall and water level respectively, ${d}_{CCF}$ refers to the lag determined by CCF analysis, and ${d}_{ACF}$ and ${d}_{PACF}$ denote the lags determined by autocorrelation function and partial autocorrelation function analysis respectively. For computing the level at a specific lead time $x$, the output target for the corresponding lead time should be used, i.e., for forecasting the level at a lead time of $x$ hours, the output target should be $WL \left(t+x\right)$.
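The input combination above amounts to pairing a window of lagged values at time $t$ with the water level at $t+x$. A minimal sketch follows; the function name, argument names and toy series are hypothetical.

```python
def build_samples(rain, level, d_rain, d_level, lead):
    """Pair lagged rainfall and water level inputs at time t with the level at t + lead."""
    if len(rain) != len(level):
        raise ValueError("rain and level must have the same length")
    start = max(d_rain, d_level)
    inputs, targets = [], []
    for t in range(start, len(level) - lead):
        row = [rain[t - k] for k in range(d_rain + 1)]       # R(t), R(t-1), ..., R(t-d_rain)
        row += [level[t - k] for k in range(d_level + 1)]    # WL(t), WL(t-1), ..., WL(t-d_level)
        inputs.append(row)
        targets.append(level[t + lead])                      # WL(t + lead)
    return inputs, targets


# Toy hourly series, for illustration only.
rain = [float(i) for i in range(10)]
level = [100.0 + i for i in range(10)]
inputs, targets = build_samples(rain, level, d_rain=2, d_level=1, lead=3)
print(inputs[0], targets[0])
```

A separate set of samples, and hence a separate model, is built for each lead time by changing `lead`.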

3.2.5 Model Training and Testing

The total normalized data were divided into 70% for training and 30% for testing to develop models. For the FFNN-based model, the Levenberg-Marquardt backpropagation algorithm was used as the learning method. The FFNN-based models were configured with varying numbers of hidden neurons, ranging from 1 to 10, and employed the TANSIG transfer function. The model was trained iteratively with a maximum of 10,000 epochs, minimum gradient of 10⁻⁷, learning rate of 0.9, and maximum validation check of 1000. The stopping criterion for the network was a mean square error of 0. Both the hidden layer weights and output layer weights were trained iteratively.

For the ELM-based models, the learning method was simple linear regression, with the sine function as the activation function. The model consists of one hidden layer with randomly generated weights. Only a single pass of the entire training data through the network was used during training. The Haar wavelet was employed to develop the hybrid models (Maheswaran and Khosa 2012). This wavelet is a square-shaped pulse or step function that is zero outside its finite support interval and piecewise constant within it. The Haar wavelet has a fixed length and a fixed height and is orthogonal to its own translations and dilations. The level of decomposition was selected as 10. After model development, the outputs were denormalized to their original scale, and the model's performance was assessed using various performance criteria.

3.3 Performance Criteria for Model

The model's performance was assessed using four statistical metrics: Coefficient of Correlation (R), Nash-Sutcliffe Efficiency (NSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). R indicates the linear association between two variables and varies between −1 and +1. NSE compares the residual variance with the variance of the observed data, with values closer to 1 indicating better performance (Nash and Sutcliffe 1970). Because it is based on squared residuals, NSE penalizes both overestimation and underestimation and is sensitive to large errors. RMSE and MAE measure the error between predicted and observed values, with lower values indicating greater accuracy. MAE is less sensitive to outliers than RMSE.

If ${X}_{i}^{Obs}$ = $i$th observed value, ${X}_{i}^{Sim}$ = $i$th simulated value of the model, ${X}^{Mean}$ = mean of the observed data and $n$ = number of observations, then the performance criteria used to evaluate the models are as follows:

$$R= \frac{n\sum _{i=1}^{n}{X}_{i}^{Obs}{X}_{i}^{Sim}- \left(\sum _{i=1}^{n}{X}_{i}^{Obs}\right)\left(\sum _{i=1}^{n}{X}_{i}^{Sim}\right)}{\sqrt{\left[n{\sum }_{i=1}^{n}{\left({X}_{i}^{Obs}\right)}^{2}- {\left({\sum }_{i=1}^{n}{X}_{i}^{Obs}\right)}^{2}\right] \left[n{\sum }_{i=1}^{n}{\left({X}_{i}^{Sim}\right)}^{2}- {\left({\sum }_{i=1}^{n}{X}_{i}^{Sim}\right)}^{2}\right]}}$$

$$NSE=1 - \left[\frac{{\sum }_{i=1}^{n}{({X}_{i}^{Obs} -{X}_{i}^{Sim})}^{2}}{{\sum }_{i=1}^{n}{({X}_{i}^{Obs} -{X}^{Mean})}^{2}}\right]$$

$$RMSE= \sqrt{\frac{{\sum }_{i=1}^{n}{({X}_{i}^{Obs} -{X}_{i}^{Sim})}^{2}}{n}} \times 100 \%$$

$$MAE= \frac{{\sum }_{i=1}^{n}\left|{X}_{i}^{Obs} - {X}_{i}^{Sim}\right|}{n}$$
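The four criteria can be sketched in plain Python as below. The function names and sample values are hypothetical. The NSE denominator is assumed to use the mean of the observed data, as in Nash and Sutcliffe (1970), and RMSE is returned in the units of the data, so the percentage scaling shown in the RMSE formula is not applied here.

```python
import math


def pearson_r(obs, sim):
    """Coefficient of correlation between observed and simulated values."""
    n = len(obs)
    sum_o, sum_s = sum(obs), sum(sim)
    sum_os = sum(o * s for o, s in zip(obs, sim))
    sum_oo = sum(o * o for o in obs)
    sum_ss = sum(s * s for s in sim)
    numerator = n * sum_os - sum_o * sum_s
    denominator = math.sqrt((n * sum_oo - sum_o ** 2) * (n * sum_ss - sum_s ** 2))
    return numerator / denominator


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency, using the mean of the observed data."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def rmse(obs, sim):
    """Root mean square error in the units of the data."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error in the units of the data."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


# Hypothetical values, for illustration only: a constant bias of 0.5.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.5, 2.5, 3.5, 4.5]
print(pearson_r(obs, sim), nse(obs, sim), rmse(obs, sim), mae(obs, sim))
```

In this example the simulated series has a constant bias, so R stays at 1 while NSE drops to 0.8, which shows why R alone is not sufficient to judge a forecast.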

4.1 Comparison of IMERG and Observed Rainfall Data

The Thiessen polygons for the rain gauge stations and the IMERG grid points are shown in Fig. 6. For IMERG, 51 grid points were inside the basin. The results of the contingency test are presented in Table 3. Overall, the IMERG data showed reasonably good performance in detecting precipitation events in the basin, with a POD of 85.42%. Previous studies have indicated that an SPP with POD > 70% is acceptable for flood studies (Su et al. 2019; Yeditha et al. 2020; Yeditha et al. 2022; Weng et al. 2023). Further, the Cumulative Distribution Function (CDF) plot presented in Fig. 7 compares the mean daily observed and IMERG rainfall over the basin. The results showed a close match in the daily data across different rainfall thresholds. Notably, the IMERG data successfully detected an extreme daily rainfall magnitude of 199.44 mm as compared to the observed data. Figure 8 compares the observed and IMERG rainfall for the South-West (SW) monsoon and post-monsoon seasons, which bring most of the year's rainfall in the basin. The two datasets match well in the SW monsoon, whereas there is slight variation in the post-monsoon data. IMERG data consistently align with the reported observed rainfall magnitudes during these seasons. It can therefore be inferred that IMERG data are suitable for flood studies in the Bagmati River basin.

Table 3

Contingency metrics and contingency indices for IMERG data.
SPP	Hit (days)	Miss (days)	False Alarm (days)	Correct Negative (days)	POD (%)	FAR (%)
IMERG	879	150	714	1892	85.42	44.82

4.2 Analysis of Water Level Data

Table 4 provides information about the flood level indicator at Hayaghat gauging site, which is important for people residing nearby and authorities to prepare for potential flood management.

The descriptive statistics for water level recorded at the Hayaghat gauging site for the South-West (SW) monsoon and post-monsoon periods from 2001 to 2014 are presented in Table 5. The mean water level for a particular year was calculated by adding all hourly water levels of the SW monsoon and post-monsoon periods of that year and dividing by the total number of observations in that period. The median was determined as the middle value when all the water level data were arranged in ascending order. The standard deviation was calculated by finding the difference between each water level and the mean, squaring each difference, adding them up, dividing by the total number of water levels minus one, and then taking the square root. The mean values ranged from 41.58 m to 44.91 m, the median values from 40.47 m to 45.87 m, and the standard deviation from 1.23 m to 2.94 m. The flood level crossed the danger level in 10 of the 14 years. Between 2001 and 2014, the highest water level recorded was 48.76 m on 15/07/2004 and the lowest was 33.42 m on 05/08/2009. The water level at the gauging site therefore varied significantly over the years. Weather patterns, river morphology, land use changes, human water withdrawals and other environmental factors are the likely causes of this variability. In this study, model performance has been validated using the flood water levels of the year 2011.

Table 4

Flood level indicator for Hayaghat gauging site.
Date of HFL Attained	14/08/1987
Highest Flood Level (HFL)	48.96 m
Danger Level (DL)	45.72 m
Warning Level (WL)	44.72 m

Table 5

Descriptive statistics of water level for Hayaghat gauging site.
Year	Minimum WL (m)	Time of occurrence of minimum WL (DD-MM-YYYY HH:MM)	Maximum WL (m)	Time of occurrence of maximum WL (DD-MM-YYYY HH:MM)	Mean WL (m)	Standard deviation (m)	Median WL (m)
2001	37.71	18-06-2001 22:00	46.75	31-10-2001 18:00	44.26	2.44	45.60
2002	39.50	16-06-2002 13:00	48.39	02-08-2002 09:00	43.60	2.17	43.45
2003	39.05	17-06-2003 12:00	46.00	29-07-2003 02:00	44.09	1.81	45.03
2004	38.40	16-06-2004 09:00	48.76	15-07-2004 09:00	43.23	2.13	42.70
2005	36.68	17-06-2005 02:00	47.02	05-09-2005 02:00	42.13	2.84	41.44
2006	38.98	16-06-2006 10:00	46.34	10-10-2006 19:00	42.59	2.06	42.28
2007	38.20	16-06-2007 13:00	48.68	04-08-2007 23:00	44.91	2.44	45.87
2008	36.95	27-06-2008 14:00	45.58	29-07-2008 19:00	42.69	1.68	42.74
2009	33.42	05-08-2009 12:00	46.77	27-08-2009 23:00	41.66	2.94	40.47
2010	37.55	18-06-2010 02:00	45.15	03-09-2010 06:00	41.58	1.84	41.44
2011	38.70	19-06-2011 10:00	47.25	06-10-2011 18:00	44.08	2.25	44.82
2012	36.74	29-06-2012 02:00	45.65	23-09-2012 15:00	41.68	2.09	41.48
2013	39.25	07-06-2013 10:00	45.25	15-07-2013 18:00	41.93	1.23	41.70
2014	38.41	09-06-2014 10:00	46.16	12-08-2014 04:00	42.35	2.11	42.12
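
The count of years in which the water level crossed the danger level can be checked directly from the annual maxima in Table 5 and the danger level in Table 4. The short sketch below does this; the variable names are illustrative, and the values are transcribed from the two tables.

```python
DANGER_LEVEL_M = 45.72  # danger level from Table 4

# Annual maximum water level (m) at Hayaghat, transcribed from Table 5
annual_max_wl = {
    2001: 46.75, 2002: 48.39, 2003: 46.00, 2004: 48.76, 2005: 47.02,
    2006: 46.34, 2007: 48.68, 2008: 45.58, 2009: 46.77, 2010: 45.15,
    2011: 47.25, 2012: 45.65, 2013: 45.25, 2014: 46.16,
}

# Years in which the annual maximum exceeded the danger level
flood_years = [year for year, wl in annual_max_wl.items() if wl > DANGER_LEVEL_M]
highest_year = max(annual_max_wl, key=annual_max_wl.get)

print(len(flood_years), flood_years)
print(highest_year, annual_max_wl[highest_year])
```

The list contains 10 years, and the highest annual maximum falls in 2004, in agreement with the text.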

4.3 Determination of Lags and Wavelet Decomposition

The ACF, PACF and CCF plots are shown in Fig. 9, Fig. 10, and Fig. 11, respectively. UCB and LCB denote the 95% upper and lower confidence bounds, respectively. An estimated value that exceeds the UCB or falls below the LCB may be considered statistically significant. The lags for the dependent variable (water level) were determined using the ACF and PACF, and the lags for the independent variable (rainfall) were determined using the CCF. The ACF showed that the water level at time t was significantly correlated with the water level up to 20 hours earlier, since all ACF values up to a lag of 20 hours were higher than the UCB. This suggests a persistent pattern in the water level fluctuations: the system being modelled retains a memory of past conditions for at least 20 hours.
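A minimal sketch of this significance check is given below. It assumes the common large-sample approximation of ±1.96/√N for the 95% bounds, which may differ from the bounds used to draw Fig. 9, and it uses a synthetic persistent series in place of the observed water levels. The function names are hypothetical.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def confidence_bounds(n):
    """Approximate 95% bounds (UCB, LCB) for a series of length n (assumed formula)."""
    bound = 1.96 / np.sqrt(n)
    return bound, -bound

# Synthetic persistent series standing in for hourly water level (not study data)
rng = np.random.default_rng(0)
noise = rng.normal(size=2000)
wl = np.zeros(2000)
for t in range(1, 2000):
    wl[t] = 0.95 * wl[t - 1] + noise[t]

r = acf(wl, max_lag=20)
ucb, lcb = confidence_bounds(len(wl))
significant_lags = [k for k in range(1, 21) if r[k] > ucb or r[k] < lcb]
print(significant_lags)
```

Lags whose ACF lies outside the interval [LCB, UCB] are reported as significant, which mirrors the reading of Fig. 9 described above.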

Similarly, the PACF analysis revealed a notable spike in the PACF value at a lag of 0.2, indicating a significant direct correlation between the water level at time t and the water level 5 hours earlier, after accounting for the effects of intermediate lags. This finding suggests that short-term patterns or fluctuations in the system also influence changes in the water level. Since a real-time forecast must account for such short-term fluctuations, a 5-hour lag was adopted for the water level input in further modelling.

The CCF results indicated that rainfall and the corresponding changes in water level were correlated up to a lag of 20 hours, as all CCF values exceeded the upper confidence bound (UCB). However, adopting a 20-hour lag for model development would add unnecessary complexity and increase computational time. Furthermore, at these longer lags the CCF values ranged only between 0.015 and 0.02, which indicates a very weak correlation. Hence, a threshold of 0.02 was set for the CCF values, and the lag corresponding to a CCF value of 0.02 was found to be 5 hours. A 5-hour lag was therefore adopted for the rainfall input in the subsequent modelling.
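The threshold-based choice of the rainfall lag can be sketched as follows. The function names `ccf` and `select_lag` are hypothetical, and the CCF values passed to `select_lag` are invented to mimic a slowly decaying correlation; they are not read from Fig. 11.

```python
import numpy as np

def ccf(rain, level, max_lag):
    """Correlation between rain(t - k) and level(t) for k = 0..max_lag."""
    rain = np.asarray(rain, dtype=float)
    level = np.asarray(level, dtype=float)
    r = (rain - rain.mean()) / rain.std()
    w = (level - level.mean()) / level.std()
    n = len(r)
    return np.array([np.dot(r[: n - k], w[k:]) / n for k in range(max_lag + 1)])

def select_lag(ccf_values, threshold):
    """Largest lag whose CCF value is still at or above the threshold."""
    above = np.nonzero(np.asarray(ccf_values) >= threshold)[0]
    return int(above.max()) if above.size else 0

# Invented CCF values for lags 0..7 (illustration only)
illustrative_ccf = np.array([0.05, 0.04, 0.035, 0.03, 0.025, 0.02, 0.018, 0.015])
chosen_lag = select_lag(illustrative_ccf, threshold=0.02)

# Sanity check of ccf(): a response delayed by 3 steps should peak at lag 3
rng = np.random.default_rng(1)
rain_demo = rng.random(500)
level_demo = np.roll(rain_demo, 3)
peak_lag = int(np.argmax(ccf(rain_demo, level_demo, max_lag=10)))
print(chosen_lag, peak_lag)
```

With a threshold of 0.02, the illustrative values give a lag of 5, the same kind of cut-off described above.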

Based on these input lags, the input combinations for the FFNN-based and ELM-based models were formed as per Eq. 7. For example, for forecasting at a 3-hour lead time, the inputs were R(t), R(t-1), R(t-2), R(t-3), R(t-4), R(t-5), WL(t), WL(t-1), WL(t-2), WL(t-3), WL(t-4), WL(t-5), with WL(t + 3) as the output target.
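The input combination in this example can be assembled from two hourly series as sketched below. The function name `build_lagged_dataset` and the placeholder series are assumptions made for illustration; only the lag structure (current value plus five lags of rainfall and water level, target WL(t + lead)) follows the text.

```python
import numpy as np

def build_lagged_dataset(rain, level, n_lags=5, lead=3):
    """Rows are [R(t)..R(t-n_lags), WL(t)..WL(t-n_lags)]; the target is WL(t+lead)."""
    rain = np.asarray(rain, dtype=float)
    level = np.asarray(level, dtype=float)
    inputs, targets = [], []
    for t in range(n_lags, len(level) - lead):
        rain_part = rain[t - n_lags : t + 1][::-1]    # R(t), R(t-1), ..., R(t-n_lags)
        level_part = level[t - n_lags : t + 1][::-1]  # WL(t), WL(t-1), ..., WL(t-n_lags)
        inputs.append(np.concatenate([rain_part, level_part]))
        targets.append(level[t + lead])
    return np.array(inputs), np.array(targets)

# Placeholder series so that each entry is easy to trace (not study data)
rain = np.arange(20, dtype=float)
level = 100.0 + np.arange(20, dtype=float)
X, y = build_lagged_dataset(rain, level, n_lags=5, lead=3)
print(X.shape, y.shape)
```

Each row holds 12 inputs, and changing `lead` produces the data set for another lead time with the same inputs.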

Figure 12 shows the wavelet decomposition of the rainfall inputs at scale 10 using the Haar wavelet. The decomposition produced 10 sets of detail coefficients (d₁ to d₁₀) and one set of approximation coefficients (a₁₀). The d₁ set represents the highest-frequency components of the signal, whereas d₁₀ represents the lowest-frequency detail components. The final approximation, a₁₀, indicates the overall trend of the signal. These coefficients were used as inputs for the development of the wavelet-based hybrid models. For W-FFNN and W-ELM, the decomposed IMERG rainfall signal was used in the input combination.
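To illustrate how a 10-level Haar decomposition yields d₁ to d₁₀ and a₁₀, a minimal sketch of the decimated Haar transform is given below. This section does not state which transform variant was used, so the decimated form, the sign convention, and the function name `haar_decompose` are assumptions; a synthetic rainfall-like series stands in for the IMERG data.

```python
import numpy as np

def haar_decompose(signal, levels):
    """Decimated Haar DWT. Returns (a_levels, [d1, d2, ..., d_levels])."""
    approx = np.asarray(signal, dtype=float)
    if len(approx) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail: pairwise differences
        approx = (even + odd) / np.sqrt(2.0)         # approximation: pairwise sums
    return approx, details

# Synthetic intermittent "rainfall" of 2048 hourly values (not study data)
rng = np.random.default_rng(42)
rainfall = rng.gamma(shape=0.3, scale=5.0, size=2048)

a10, details = haar_decompose(rainfall, levels=10)
print(len(a10), [len(d) for d in details])

# The orthonormal Haar transform preserves the signal energy
energy_in = float(np.sum(rainfall ** 2))
energy_out = float(np.sum(a10 ** 2) + sum(np.sum(d ** 2) for d in details))
```

Each level halves the number of coefficients, so d₁ captures hour-to-hour changes while a₁₀ summarises the signal over blocks of 1024 hours.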

Table 6 presents the optimal number of hidden-layer neurons for the different models at different lead times. The FFNN-based models performed best with fewer neurons than the ELM-based models. However, even with fewer neurons, the computational time was higher for the FFNN-based models, which suggests a higher computational overhead due to their iterative training. The ELM model, in contrast, built the network faster because it uses a single-iteration learning procedure, which significantly reduced the computational time compared with the iterative learning process of the FFNN model.
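The single-iteration character of ELM training can be seen in the sketch below: the hidden-layer weights are drawn at random and left fixed, and only the output weights are obtained from one least-squares solve. The tanh activation, the normally distributed weights, the function names, and the toy data are assumptions made for illustration and do not describe the configuration used in this study.

```python
import numpy as np

def train_elm(X, y, n_hidden, seed=0):
    """Fit an ELM: random fixed hidden layer, output weights by pseudo-inverse."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # input weights, never updated
    b = rng.normal(size=n_hidden)                # hidden biases, never updated
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                 # single least-squares solve
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Predict with a trained ELM."""
    return np.tanh(X @ W + b) @ beta

# Toy regression problem (not study data)
x = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
target = np.sin(x).ravel()

W, b, beta = train_elm(x, target, n_hidden=20)
pred = predict_elm(x, W, b, beta)
train_rmse = float(np.sqrt(np.mean((pred - target) ** 2)))
print(train_rmse)
```

An FFNN trained by backpropagation would instead update all weights over many epochs, which is consistent with the longer computational time reported for the FFNN-based models.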

Table 6

Optimal number of neurons for different models.
Lead Time	FFNN	ELM	W-FFNN	W-ELM
1 hr	7	41	2	48
3 hrs	4	41	2	47
6 hrs	5	47	1	45
12 hrs	10	42	5	47
1 day	10	49	7	50
3 days	5	41	5	31
5 days	4	1	4	39
7 days	5	39	5	42
10 days	2	3	4	29

4.4 Real-Time Short-Range Forecasts

Table 7 presents the training and testing performance of the models for real-time short-range flood forecasting. For short-range forecasts (lead times of 1, 3, 6 and 12 hours, and 1 day), all models performed well, with high NSE and R values and low RMSE and MAE values. The FFNN-based models gave the best results at almost all lead times. The advantage of the hybrid W-FFNN and W-ELM models over the singular FFNN and ELM models appeared mainly at the longer lead times. Figure 13 shows violin plots of the observed and modelled water levels for the short-range forecasts. The shapes of the violins indicate that the outputs of all models have a distribution similar to that of the observations up to a lead time of 12 hours. The white dot represents the median of the water levels simulated by each model at each lead time; these medians range from 42.68 m to 46.71 m and are very close to the median of the observed water level, 42.68 m. The interquartile range of the observed water level is from 41.15 m to 45.12 m, and a similar interquartile range is seen in the model outputs at the different lead times. However, at a lead time of 1 day, the ELM and W-ELM outputs showed a slightly different shape.
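For reference, the four criteria reported in Table 7 can be computed as sketched below. The standard definitions of RMSE, MAE, NSE and the Pearson correlation coefficient R are used, on the assumption that they match those given in Section 3.3; the function names and the four sample values are illustrative only.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def mae(obs, sim):
    """Mean absolute error."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.mean(np.abs(sim - obs)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated values."""
    return float(np.corrcoef(obs, sim)[0, 1])

# Made-up water levels in metres, for illustration only
observed = [42.0, 43.0, 44.0, 45.0]
simulated = [42.5, 43.0, 43.5, 45.0]
scores = {
    "RMSE": rmse(observed, simulated),
    "MAE": mae(observed, simulated),
    "NSE": nse(observed, simulated),
    "R": pearson_r(observed, simulated),
}
print(scores)
```

An NSE of 1 indicates a perfect fit, while an NSE of 0 means the model is no better than predicting the observed mean.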

4.5 Real-Time Long-Range Forecasts

Table 8 presents the training and testing performance of the models for real-time long-range flood forecasting. For the longer lead times (3, 5, 7 and 10 days), the performance of all models was lower than for the short lead times, with lower NSE and R values and higher RMSE and MAE values. However, the W-FFNN and W-ELM models still outperformed the singular FFNN and ELM models. In the testing phase, the W-FFNN model achieved NSE values of 0.843, 0.697, 0.573 and 0.364 and R values of 0.919, 0.838, 0.765 and 0.636 for lead times of 3, 5, 7 and 10 days, respectively. The W-ELM model achieved NSE values of 0.781, 0.586, 0.375 and 0.035 and R values of 0.898, 0.816, 0.741 and 0.631 for the same lead times. These results show that the W-FFNN model performs better than the W-ELM model. Figure 14 presents violin plots comparing the water levels forecasted by all models with the observed water level for the long-range forecasts. As the lead time increases, the violin plots of the ELM-based models become increasingly distorted. In addition, the ELM-based models generally exhibit a smaller interquartile range but more extreme outliers. The FFNN-based models, on the other hand, show a better distribution, and the W-FFNN model maintains a comparable distribution even at higher lead times such as 7 days. However, at a lead time of 10 days, the distributions of all model outputs differ from that of the observed values.

In a nutshell, the W-FFNN model performs satisfactorily up to a lead time of 7 days, whereas the W-ELM model produces satisfactory results up to a lead time of 3 days. The singular FFNN and ELM models perform satisfactorily up to 5 days and 3 days, respectively. The performance of all models thus decreases with increasing lead time. On average over the long-range lead times, the W-FFNN model improved the NSE by 4.58% and the W-ELM model by 11.98% relative to their singular counterparts.
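One reading that reproduces the reported average improvement is the mean difference in testing-phase NSE between each hybrid model and its singular counterpart, expressed in percentage points. The sketch below applies this reading to the testing NSE values of Table 8; that the authors used exactly this calculation is an assumption. It gives about 4.6 for W-FFNN and, from the rounded table values, about 11.95 for W-ELM, slightly different from the reported 11.98.

```python
# Testing-phase NSE from Table 8 for lead times of 3, 5, 7 and 10 days
nse_test = {
    "FFNN":   [0.822, 0.660, 0.500, 0.312],
    "ELM":    [0.729, 0.499, 0.130, -0.059],
    "W-FFNN": [0.843, 0.697, 0.573, 0.364],
    "W-ELM":  [0.781, 0.586, 0.375, 0.035],
}

def mean_nse_gain(hybrid, singular):
    """Average NSE difference (hybrid minus singular) over the four lead times."""
    gains = [h - s for h, s in zip(nse_test[hybrid], nse_test[singular])]
    return sum(gains) / len(gains)

gain_ffnn = mean_nse_gain("W-FFNN", "FFNN")
gain_elm = mean_nse_gain("W-ELM", "ELM")
print(f"W-FFNN: {100 * gain_ffnn:.1f} points, W-ELM: {100 * gain_elm:.1f} points")
```

Under this reading the figures are absolute gains in NSE rather than relative percentage changes.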

Table 7

Results of short-range flood forecast in real time for training and testing.
Lead Time	Performance Parameters	Training				Testing
Lead Time	Performance Parameters	FFNN	ELM	W-FFNN	W-ELM	FFNN	ELM	W-FFNN	W-ELM
1 hr	RMSE	0.135	0.142	0.183	0.171	0.066	0.072	0.092	0.105
	NSE	0.997	0.997	0.995	0.995	0.999	0.999	0.998	0.998
	R	0.999	0.998	0.997	0.998	1.000	0.999	0.999	0.999
	MAE	0.019	0.023	0.030	0.061	0.014	0.019	0.031	0.052
3 hrs	RMSE	0.106	0.227	0.248	0.249	0.220	0.116	0.132	0.144
	NSE	0.998	0.992	0.990	0.990	0.992	0.997	0.996	0.995
	R	0.999	0.996	0.995	0.995	0.996	0.999	0.998	0.998
	MAE	0.033	0.048	0.052	0.086	0.037	0.044	0.054	0.078
6 hrs	RMSE	0.155	0.301	0.326	0.319	0.303	0.180	0.196	0.198
	NSE	0.995	0.986	0.983	0.984	0.986	0.993	0.992	0.991
	R	0.997	0.993	0.992	0.992	0.993	0.996	0.996	0.996
	MAE	0.059	0.087	0.097	0.115	0.064	0.087	0.099	0.112
12 hrs	RMSE	0.361	0.390	0.304	0.400	0.227	0.284	0.234	0.294
	NSE	0.980	0.976	0.986	0.975	0.989	0.982	0.988	0.981
	R	0.990	0.988	0.993	0.988	0.994	0.991	0.994	0.991
	MAE	0.101	0.147	0.089	0.171	0.100	0.155	0.105	0.172
1 day	RMSE	0.447	0.540	0.397	0.516	0.440	0.487	0.386	0.471
	NSE	0.969	0.953	0.976	0.957	0.958	0.946	0.967	0.949
	R	0.984	0.977	0.988	0.979	0.979	0.974	0.984	0.975
	MAE	0.164	0.264	0.165	0.257	0.227	0.294	0.221	0.291

Figure 15 presents the simulated results of all models for the 2011 flood event at different lead times. The Highest Flood Level (HFL), danger level and warning level are denoted by red, orange and yellow dashed lines, respectively. The observed water level surpassed the danger level on October 1, 2011, at 06:00 and reached its peak value of 47.52 m on October 6, 2011, at 18:00. All models reproduced the rising limb, peak and falling limb consistently up to a lead time of 12 hours, indicating a good overall fit. However, the ELM model slightly underestimated the peak and the falling limb at the 12-hour lead time. From a lead time of 1 day onwards, variations in the model outputs became apparent, with occasional over- or under-prediction of the flood water level compared with the observed data. Notably, at longer lead times such as 3, 5 and 7 days, the outputs of the singular models (FFNN and ELM) deviated more from the observed data than those of the hybrid models (W-FFNN and W-ELM). The W-FFNN model performed best, showing the highest degree of overlap between its output and the observed water level at all lead times. However, at a lead time of 10 days, all models performed poorly and failed to capture the pattern of the observed water level.

Table 8

Results of long-range flood forecast in real time for training and testing.
Lead Time	Performance Parameters	Training				Testing
Lead Time	Performance Parameters	FFNN	ELM	W-FFNN	W-ELM	FFNN	ELM	W-FFNN	W-ELM
3 days	RMSE	0.775	0.942	0.673	0.828	0.906	1.031	0.849	0.947
	NSE	0.906	0.840	0.930	0.881	0.822	0.729	0.843	0.781
	R	0.952	0.928	0.964	0.945	0.908	0.878	0.919	0.898
	MAE	0.460	0.617	0.404	0.543	0.589	0.692	0.570	0.657
5 days	RMSE	1.059	1.907	0.893	1.009	1.253	2.038	1.181	1.259
	NSE	0.826	0.673	0.876	0.811	0.660	0.499	0.697	0.586
	R	0.909	0.834	0.936	0.917	0.817	0.738	0.838	0.816
	MAE	0.712	1.360	0.607	0.719	0.874	1.504	0.847	0.917
7 days	RMSE	1.276	1.477	1.039	1.187	1.520	1.612	1.405	1.468
	NSE	0.746	0.484	0.831	0.717	0.500	0.130	0.573	0.375
	R	0.864	0.812	0.912	0.883	0.720	0.676	0.765	0.741
	MAE	0.926	1.094	0.759	0.879	1.094	1.172	1.100	1.099
10 days	RMSE	1.627	2.050	1.295	1.396	1.788	2.074	1.720	1.720
	NSE	0.585	0.287	0.738	0.559	0.312	-0.059	0.364	0.035
	R	0.765	0.662	0.859	0.834	0.584	0.508	0.636	0.631
	MAE	1.264	1.523	0.974	1.053	1.374	1.515	1.349	1.321

5. Conclusions

In this study, ML-based RTFF models have been developed for the Bagmati River basin of Bihar, India. The lack of ground-station rainfall data at fine spatial and temporal resolution in the region hinders the development of a robust RTFF model. To overcome this problem, IMERG SPP data were compared with the observed data and used for model development. The IMERG data closely matched the observed data and can be considered a substitute source of rainfall data for the Bagmati River basin. The Haar wavelet-based hybrid models performed better than their singular counterparts. The FFNN-based singular and hybrid models outperformed both ELM models in most cases. For short-term flood forecasting (lead times of 1 hour to 1 day), the singular and hybrid models performed very well, and the FFNN-based models gave the best results for almost all lead times. For long-term flood forecasting (lead times of 3 to 10 days), the performance of the singular models was not satisfactory, but the wavelet-based hybrid models performed well. The W-FFNN model demonstrated satisfactory performance, forecasting flood water levels up to a 7-day lead time. The W-ELM model also achieved satisfactory results, but only up to a 3-day lead time. Overall, the W-FFNN model was deemed a robust RTFF model owing to its ability to produce reliable forecasts at longer lead times. The flood forecasting results were promising, making IMERG a viable option for further studies on flood, drought, soil erosion, climate change and other factors in the Bagmati River basin.

Declarations

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Adamowski J, Sun K (2010) Development of a coupled wavelet transform and neural network method for flow forecasting of non-perennial rivers in semi-arid watersheds. J Hydrol 390:85–91. https://doi.org/10.1016/j.jhydrol.2010.06.033
Agarwal A, Maheswaran R, Sehgal V, et al (2016) Hydrologic regionalization using wavelet-based multiscale entropy method. J Hydrol 538:22–32. https://doi.org/10.1016/j.jhydrol.2016.03.023
ASCE Task Committee on Application of Artificial Neural Networks in Hydrology (2000a) Artificial neural networks in hydrology. I: Preliminary concepts. J Hydrol Eng 5: 115-123. https://doi.org/10.1061/(ASCE)1084-0699(2000)5:2(115)
ASCE Task Committee on Application of Artificial Neural Networks in Hydrology (2000b) Artificial Neural Networks in Hydrology. II: Hydrologic Application. J Hydrol Eng 5:124–137. https://doi.org/10.1061/(ASCE)1084-0699(2000)5:2(124)
Belabid N, Zhao F, Brocca L, et al (2019) Near-real-time flood forecasting based on satellite precipitation products. Remote Sens 11:. https://doi.org/10.3390/rs11030252
Belayneh A, Sintayehu G, Gedam K, Muluken T (2020) Evaluation of satellite precipitation products using HEC-HMS model. Model Earth Syst Environ 6:2015–2032. https://doi.org/10.1007/s40808-020-00792-z
Bhattacharyya S, Sreekesh S, King A (2022) Characteristics of extreme rainfall in different gridded datasets over India during 1983–2015. Atmos Res 267:. https://doi.org/10.1016/j.atmosres.2021.105930
Funk C, Peterson P, Landsfeld M, et al (2015) The climate hazards infrared precipitation with stations - A new environmental record for monitoring extremes. Sci Data 2:1–21. https://doi.org/10.1038/sdata.2015.66
Gautam AK, Pandey A (2022) Ground validation of GPM Day-1 IMERG and TMPA Version-7 products over different rainfall regimes in India. Theor Appl Climatol 149:931–943. https://doi.org/10.1007/s00704-022-04091-8
Ghose DK (2018) Measuring discharge using back-propagation neural network: A case study on Brahmani River Basin. Springer Singapore. https://doi.org/10.1007/978-981-10-7566-7_59
Hinge G, Mohamed MM, Long D, Hamouda MA (2021) Meta-analysis in using satellite precipitation products for drought monitoring: Lessons learnt and way forward. Remote Sens. 13. https://doi.org/10.3390/rs13214353
Hogan RJ, Ferro CAT, Jolliffe IT, Stephenson DB (2010) Equitability revisited: Why the “equitable threat score” is not equitable. Weather Forecast 25:710–726. https://doi.org/10.1175/2009WAF2222350.1
Huang G Bin, Zhu QY, Siew CK (2006) Extreme learning machine: Theory and applications. Neurocomputing 70:489–501. https://doi.org/10.1016/j.neucom.2005.12.126
Huffman GJ, Adler RF, Bolvin, DT et al (2007) The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-global, multiyear, combined-sensor precipitation estimates at fine scales. J. Hydrometeorol, 8:38–55. https://doi.org/10.1175/JHM560.1
Huffman G, Bolvin D, Braithwaite D, et al (2019) NASA Global Precipitation Measurement (GPM) Integrated Multi-satellitE Retrievals for GPM (IMERG). In: Algorithm Theoretical Basis Document (ATBD). NASA/GSFC, Greenbelt, MD, USA. https://gpm.nasa.gov/sites/default/files/document_files/IMERG_ATBD_V5.1b.pdf
Jain SK, Mani P, Jain SK, et al (2018) A Brief review of flood forecasting techniques and their applications. Int J River Basin Manag 16:329–344. https://doi.org/10.1080/15715124.2017.1411920
Kim S, Matsumi Y, Pan S, Mase H (2016) A real-time forecast model using artificial neural network for after-runner storm surges on the Tottori coast, Japan. Ocean Eng 122:44–53. https://doi.org/10.1016/j.oceaneng.2016.06.017
Kumar D, Pandey A, Sharma N, Flügel W-A (2017) Evaluation of TRMM-Precipitation with Rain-Gauge Observation Using Hydrological Model J2000. J Hydrol Eng 22:. https://doi.org/10.1061/(asce)he.1943-5584.0001317
Kumar K, Singh V, Roshni T (2018) Efficacy of neural network in rainfall-runoff modelling of Bagmati river basin. Int J Civ Eng Technol 9:37-46. https://iaeme.com/MasterAdmin/Journal_uploads/IJCIET/VOLUME_9_ISSUE_11/IJCIET_09_11_003.pdf
Li BJ, Cheng CT (2014) Monthly discharge forecasting using wavelet neural networks with extreme learning machine. Sci China Technol Sci 57:2441–2452. https://doi.org/10.1007/s11431-014-5712-0
Li D, Christakos G, Ding X, Wu J (2018) Adequacy of TRMM satellite rainfall data in driving the SWAT modeling of Tiaoxi catchment (Taihu lake basin, China). J Hydrol 556:1139–1152. https://doi.org/10.1016/j.jhydrol.2017.01.006
Linh, N. T. T., Ruigar, H., Golian, S., Bawoke, G. T., Gupta, V., Rahman, K. U., Sankaran, A., & Pham, Q. B. 2021. Flood prediction based on climatic signals using wavelet neural network. Acta Geophysica, 69(4), 1413-1426. https://doi.org/10.1007/s11600-021-00620-7
Llauca H, Lavado‐casimiro W, León K, et al (2021) Assessing near real‐time satellite precipitation products for flood simulations at sub‐daily scales in a sparsely gauged watershed in Peruvian andes. Remote Sens 13:1–18. https://doi.org/10.3390/rs13040826
Maheswaran R, Khosa R (2012) Comparative study of different wavelets for hydrologic forecasting. Comput Geosci 46:284–295. https://doi.org/10.1016/j.cageo.2011.12.015
Mokhtari, S., Sharafati, A., & Raziei, T. 2022 Satellite-based streamflow simulation using CHIRPS satellite precipitation product in Shah Bahram Basin, Iran. Acta Geophysica, 70(1), 385-398. https://doi.org/10.1007/s11600-021-00724-0
Nanda T, Sahoo B, Beria H, Chatterjee C (2016) A wavelet-based non-linear autoregressive with exogenous inputs (WNARX) dynamic neural network model for real-time flood forecasting using satellite-based rainfall products. J Hydrol 539:57–73. https://doi.org/10.1016/j.jhydrol.2016.05.014
Nash JE, Sutcliffe J V (1970) River Flow Forecasting Through Conceptual Models - Part I - A Discussion of Principles. J Hydrol 10:282–290. https://doi.org/10.1016/0022-1694(70)90255-6
Navale A, Singh C, Budakoti S, Singh SK (2020) Evaluation of season long rainfall simulated by WRF over the NWH region: KF vs. MSKF. Atmos Res 232:. https://doi.org/10.1016/j.atmosres.2019.104682
Prakash S, Mitra AK, AghaKouchak A, et al (2018) A preliminary assessment of GPM-based multi-satellite precipitation estimates over a monsoon dominated region. J Hydrol 556:865–876. https://doi.org/10.1016/j.jhydrol.2016.01.029
Piadeh F, Behzadian K, Alani AM (2022) A critical review of real-time modelling of flood forecasting in urban drainage systems. J Hydrol 607:127476. https://doi.org/10.1016/j.jhydrol.2022.127476
Reddy BSN, Pramada SK, Roshni T (2022) Selection of level and type of decomposition in predicting suspended sediment load using wavelet neural network. Acta Geophys 70:847–857. https://doi.org/10.1007/s11600-022-00761-3
Reddy NM, Saravanan S (2022) Evaluation of the accuracy of seven gridded satellite precipitation products over the Godavari River basin, India. Int J Environ Sci Technol. https://doi.org/10.1007/s13762-022-04524-x
Roshni T, Jha MK, Deo RC, Vandana A (2019) Development and Evaluation of Hybrid Artificial Neural Network Architectures for Modeling Spatio-Temporal Groundwater Fluctuations in a Complex Aquifer System. Water Resour Manag 33:2381-2397. https://doi.org/10.1007/s11269-019-02253-4
Sehgal V, Tiwari MK, Chatterjee C (2014) Wavelet Bootstrap Multiple Linear Regression Based Hybrid Modeling for Daily River Discharge Forecasting. Water Resour Manag 28:2793–2811. https://doi.org/10.1007/s11269-014-0638-7
Sezen C, Partal T (2022) New hybrid GR6J-wavelet-based genetic algorithm-artificial neural network (GR6J-WGANN) conceptual-data-driven model approaches for daily rainfall–runoff modelling. Neural Comput Appl 34:17231–17255. https://doi.org/10.1007/s00521-022-07372-5
Shoaib M, Shamseldin AY, Melville BW (2014) Comparative study of different wavelet based neural network models for rainfall-runoff modeling. J Hydrol 515:47–58. https://doi.org/10.1016/j.jhydrol.2014.04.055
Sireesha C, Roshni T, Jha MK (2020) Insight into the precipitation behavior of gridded precipitation data in the Sina basin. Environ Monit Assess 192:. https://doi.org/10.1007/s10661-020-08687-3
Soo EZX, Wan Jaafar WZ, Lai SH, et al (2022) Enhancement of Satellite Precipitation Estimations with Bias Correction and Data-Merging Schemes for Flood Forecasting. J Hydrol Eng 27:. https://doi.org/10.1061/(asce)he.1943-5584.0002190
Subramanya K (2008) Engineering hydrology. McGraw-Hill, India.
Tiwari MK, Chatterjee C (2010) Development of an accurate and reliable hourly flood forecasting model using wavelet-bootstrap-ANN (WBANN) hybrid approach. J Hydrol 394:458–470. https://doi.org/10.1016/j.jhydrol.2010.10.001
Todini E (2005) Present operational flood forecasting systems and possible improvements. Taylor and Francis Boca Raton, FL.
Tripura J, Roy P, Barbhuiya AK (2018) Application of RBFNNs incorporating MIMO processes for simultaneous river flow forecasting. J Eng Technol Sci 50:434–449. https://doi.org/10.5614/j.eng.technol.sci.2018.50.3.9
Tshimanga RM, Tshitenge JM, Kabuya P, et al (2016) A Regional Perceptive of Flood Forecasting and Disaster Management Systems for the Congo River Basin. Flood Forecast A Glob Perspect 87–124. https://doi.org/10.1016/B978-0-12-801884-2.00004-9
Xiao S, Xia J, Zou L (2020) Evaluation of multi-satellite precipitation products and their ability in capturing the characteristics of extreme climate events over the Yangtze River Basin, China. Water (Switzerland) 12:. https://doi.org/10.3390/W12041179
Yang M, Sang YF, Liu C, Wang Z (2016) Discussion on the choice of decomposition level for wavelet based hydrological time series modeling. Water (Switzerland) 8:1–11. https://doi.org/10.3390/w8050197
Yeditha PK, Kasi V, Rathinasamy M, Agarwal A (2020) Forecasting of extreme flood events using different satellite precipitation products and wavelet-based machine learning methods. Chaos 30:. https://doi.org/10.1063/5.0008195
Yeditha PK, Rathinasamy M, Neelamsetty SS, et al (2022) Investigation of satellite precipitation product driven rainfall-runoff model using deep learning approaches in two different catchments of India. J Hydroinformatics 24:16–37. https://doi.org/10.2166/HYDRO.2021.067
Yigez B, Xiong D, Belete M, et al (2022) Evaluation of multi-satellite precipitation products for soil loss and sediment export modeling over eastern regions of the Koshi River Basin, Nepal. J Soils Sediments 22:2731–2749. https://doi.org/10.1007/s11368-022-03264-2
Zhou L, Rasmy M, Takeuchi K, et al (2021) Adequacy of near real-time satellite precipitation products in driving flood discharge simulation in the fuji river basin, Japan. Appl Sci 11:1–16. https://doi.org/10.3390/app11031087

Editorial decision: Major revisions
04 Dec, 2023
Reviewers agreed at journal
06 Aug, 2023
Reviewers invited by journal
06 Aug, 2023
Editor invited by journal
30 Jul, 2023
Editor assigned by journal
26 Jul, 2023
First submitted to journal
24 Jul, 2023


Contingency Indices	Formula	Range	Perfect Value
POD	\(POD = \frac{H}{H+M}\)	0–1	1
FAR	\(FAR = \frac{F}{F+H}\)	0–1	0