A novel combined model for prediction of daily precipitation data using instantaneous frequency feature and bidirectional long short-term memory networks

Meteorological events constantly affect human life. In particular, excessive precipitation over a short time can cause severe events such as floods, whereas insufficient precipitation over a long period leads to drought. In recent years, significant changes in precipitation regimes have been observed, and these changes cause socio-economic and ecological problems. It is therefore of great importance to predict and analyze precipitation data correctly. In this study, a reliable and accurate precipitation forecasting model is proposed. To this end, three deep neural network models were applied to one-day-ahead forecasting of daily precipitation data, and their performances were compared: long short-term memory networks (LSTM), gated recurrent units (GRU), and bidirectional long short-term memory networks (biLSTM). Moreover, to improve the far-ahead forecasting performance of the biLSTM model, the instantaneous frequency (IF) feature was applied as an input parameter for the first time in the literature. A novel model combining IF and biLSTM was thus employed for one- to six-day-ahead forecasting of daily precipitation data. The performance of the proposed IF-biLSTM model was evaluated using four performance parameters: the mean absolute error (MAE), mean square error (MSE), correlation coefficient (R), and determination coefficient (R²). Spider charts were used to compare the model performances. According to the numerical results, the biLSTM model outperformed the LSTM and GRU models. Building on this result, the IF feature was applied to the biLSTM model. The resulting IF-biLSTM model gave the best forecasting performance for daily precipitation data, with R² values of 0.9983, 0.9827, 0.9092, 0.8508, 0.7827, and 0.7646 for one- to six-day-ahead forecasting, respectively.
The IF-biLSTM model forecasts better than the biLSTM model, especially at far-ahead horizons, which shows that the IF feature improves forecasting performance.


Introduction
Precipitation occurs when moisture in the atmosphere condenses and returns to the earth in different forms. Many factors affect precipitation. Atmospheric pressure, temperature, and wind, together with the effects of global change, influence the amount of precipitation over the years (Wei et al. 2005). Hydrology aims to assess water resources correctly and to design water structures properly, which requires reliable values of the parameters involved. Precipitation, which generates streamflow, is one of the most important of these parameters. Its accurate measurement is therefore essential for the management and operation of water resources (Hamill and Whitaker 2006; Price 2013; Wei et al. 2005; Wu et al. 2012). Precise estimation of precipitation is often difficult because the exact mechanisms governing its occurrence are not known. Accurate forecasting is especially important at short time scales such as daily or hourly. However, precipitation is difficult to analyze because of its complex, non-stationary, non-linear, and dynamic internal structure. Improving the precision of precipitation prediction is therefore a significant topic for hydrologists, and research on precipitation prediction methods is of increasing importance.
Studies of hydrometeorological parameter estimation were first based on linear approaches. Models such as the parametric autoregressive moving average (ARMA) and the autoregressive integrated moving average (ARIMA) have been used to analyze time-series data since the 1970s (Box et al. 2015). These techniques are linear models and assume that the time series is stationary and linear. Hydrometeorological data, however, are non-stationary and non-linear. For this reason, over the last two decades, forecasting studies of such data have increasingly been based on artificial intelligence techniques.
These methods have been found to produce satisfactory forecasts by capturing the nonlinear behavior of hydrological and meteorological processes.
However, recent studies show that hybrid models perform better than single models for forecasting precipitation. For example, Xiang et al. applied an empirical mode decomposition (EMD)-ANN-SVR model to rainfall prediction and reported that it performed better than a single ANN model (Xiang et al. 2018). In another study, the maximum overlap discrete wavelet transform (MODWT) was combined with an ANN model for rainfall forecasting, and the authors concluded that the proposed hybrid model performed well (Gomes and Blanco 2021).
In addition to traditional machine learning techniques, deep neural networks (DNNs) have recently emerged as an effective method for modeling nonlinear relationships and have been applied in many different areas (Bashar 2019). The long short-term memory network (LSTM) is a DNN architecture designed by Hochreiter and Schmidhuber (1997) to learn the long-term dependencies of time series through gate and memory units (Hochreiter et al. 1997). It is based on the recurrent neural network (RNN) architecture. LSTM was developed to overcome some disadvantages of RNNs, such as the vanishing gradient problem and limited memory capacity. LSTM has been used for river flood forecasting (Le et al. 2019), air pollution forecasting (Freeman et al. 2018; Yu et al. 2019), fog forecasting (Miao et al. 2020), wind power forecasting (Shahid et al. 2021), and modeling of rainfall-runoff processes (Schuster et al. 1997). Recently, two LSTM variants have been used to analyze long-term dependencies: the gated recurrent unit (GRU) and the bidirectional LSTM (biLSTM) (Cho et al. 2014; Schuster et al. 1997). The GRU is a recently developed gating mechanism that has been employed successfully in many fields. Examples include short-term electricity load forecasting with a multilayered self-normalizing GRU network (Kuan et al. 2017), heat load forecasting (Lu et al. 2018), and electricity generation and planning (Li et al. 2018). The biLSTM model has also been applied to time series forecasting (Siami-Namini et al. 2019a, b), financial time series (Siami-Namini et al. 2019a, b), and trading area forecasting (Kim et al. 2019). In these studies, the experimental results showed that the biLSTM model performed better than LSTM models.
Much work has been done on precipitation prediction using machine learning and LSTM models (Danandeh Mehr et al. 2018; Gomes et al. 2021; Le et al. 2020; Parmar et al. 2017; Xiang et al. 2018; Wu et al. 2021). However, no existing study has applied the biLSTM model to forecasting daily precipitation data.
The novelty of this study is the development of a biLSTM-based model for a complex natural phenomenon such as precipitation. In addition, the IF feature is applied as an input to the model alongside the precipitation data to improve the far-ahead forecasting performance of the biLSTM model.
The term IF describes how the frequency of a monocomponent signal varies over time. In the literature, the IF feature has mostly been used for feature extraction in classification studies (Yang et al. 2009; Xue et al. 2018; Khan et al. 2021). This study shows the effect of the IF feature on forecasting precipitation data, which have a nonlinear and nonstationary character.
To the best of our knowledge, no study in the literature has performed high-performance one- to six-day-ahead forecasting of daily precipitation data with an instantaneous frequency-combined biLSTM (IF-biLSTM) model such as the one proposed here.
The main contributions of this study can be summarized as follows:
1. The forecasting performance of the biLSTM model is compared with that of the GRU and LSTM models for one-day-ahead forecasting using one to three inputs. This comparison shows the advantage of the biLSTM model over the other models.
2. To evaluate the far-ahead forecasting performance of the biLSTM model, 2- to 6-day-ahead forecasting studies are also carried out.
3. The effect of the IF feature on one- to six-day-ahead forecasting performance is analyzed. For this purpose, the instantaneous frequency feature is applied as an input to the biLSTM model in addition to the precipitation data.
4. Spider (radar) charts of the performance parameters are used to compare the performance of the biLSTM and IF-biLSTM models.
5. Finally, a novel high-performance model combining IF with biLSTM (IF-biLSTM) is proposed for forecasting daily precipitation data.
The rest of the paper is organized as follows. "Study area and data" gives information about the study area and the data. "Materials and methods" provides a brief review of the LSTM, GRU, biLSTM, and instantaneous frequency methods for daily precipitation estimation. "Results and discussion" describes the prediction results obtained by the LSTM, GRU, biLSTM, and IF-biLSTM. Finally, "Conclusions" concludes the paper.
Fig. 1 illustrates how the proposed IF-biLSTM model is applied to forecasting precipitation data. The workflow of the study is described in the "Materials and methods" section.

Study area and data
The material of this study consists of daily mean areal precipitation (mm) data. These were gauged continuously over the 49-year period between 1963 and 2012, giving 17,990 daily values. The data were gauged in the Churchill River above Otter Rapids basin, located at 55°38′51.0″N, 104°44′09.0″W (latitude 55.647499, longitude −104.735832) in the Province of Saskatchewan, Canada. The drainage basin from which the data were recorded is shown in Fig. 2. The data set was obtained from the CANOPEX database (http://canopex.etsmtl.net/, Arsenault et al. 2016). The drainage area at this site is 114,248 km².
The daily precipitation data were divided in a 70:30 ratio. Seventy percent of the data were used to train the model, and the remaining 30% were used to test its effectiveness. Thus, 12,593 values were used in the training phase and the remaining 5397 values in the testing phase. The normalized training and testing data are shown in Fig. 3.
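The split described above can be sketched as follows. This is a minimal illustration, not the study's code. It assumes a chronological split (the first 70% of days for training) and z-score normalization computed from the training portion only. The source does not say which normalization was used for Fig. 3. The function names and the synthetic gamma-distributed stand-in data are hypothetical.

```python
import numpy as np

def split_and_normalize(series, train_ratio=0.7):
    """Chronological train/test split followed by z-score normalization.

    Assumption: mean and standard deviation come from the training part only,
    so no information from the test period enters the training stage.
    """
    series = np.asarray(series, dtype=float)
    n_train = int(round(train_ratio * len(series)))  # 0.7 * 17,990 -> 12,593
    train, test = series[:n_train], series[n_train:]
    mu, sigma = train.mean(), train.std()
    return (train - mu) / sigma, (test - mu) / sigma, (mu, sigma)

def denormalize(z, stats):
    """Map normalized values (e.g. forecasts) back to millimetres."""
    mu, sigma = stats
    return np.asarray(z) * sigma + mu

# Usage with a synthetic, hypothetical stand-in for the 17,990-day record
rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.5, scale=4.0, size=17990)
train_z, test_z, stats = split_and_normalize(precip)
print(len(train_z), len(test_z))  # 12593 5397
```

If min-max scaling were used instead, the same rule would apply: the scaling constants should be fitted on the training portion and then reused unchanged for the test portion.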

Long short-term memory networks
LSTM networks are a special type of recurrent neural network (RNN). They learn long-term dependencies through special hidden units called memory cells, which retain previous inputs for a long time. LSTM was first introduced by Hochreiter and Schmidhuber (1997) and was later developed into different versions (Hochreiter et al. 1997). The LSTM model can capture nonlinear trends in data and recall long-term information, so it has been applied successfully to many kinds of time series problems. As shown in Fig. 4, an LSTM unit consists of a memory cell and three gates: the input, forget, and output gates.
The main information flow of the LSTM memory cell (Fig. 4) can be expressed mathematically. In Fig. 4, the "×" and "+" symbols denote the multiplication and addition operations, and the arrows show the direction of information flow. The first layer of the memory unit decides which unnecessary information to remove from the cell state. This decision is made by the forget gate, expressed by Eq. 1. The forget gate output $f_t$ assigns each component of the previous cell state $C_{t-1}$ a value between 0 and 1. In this way, the forget gate gives a certain weight (W) to long-term memory information (Hochreiter et al. 1997):

$$ f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1} $$

Fig. 1 The proposed DNN forecasting model for daily precipitation data

The input gate layer decides which new information to store in the cell state. It receives the input and learns new information together with the information previously learned in short-term memory. It consists of two parts. The first is a sigmoid layer, the input gate $i_t$, which decides which values to update (Eq. 2). The second is a tanh layer that forms new candidate values $\tilde{C}_t$ (Eq. 3). These two outputs are combined to create the update:

$$ i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2} $$

$$ \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{3} $$

The cell state $C_t$ is then updated from the information provided by the other layers. The long-term memory is updated by adding the newly learned information to the retained part of the previous long-term memory. The update is obtained by adding the contributions of the forget gate layer and the input gate layer:

$$ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4} $$

In the last layer, as shown in Eq. 5, the inputs $(x_t, h_{t-1})$ are first passed through a sigmoid layer, the output gate $o_t$, which decides how much the cell state will affect the output. The cell state is then passed through the tanh activation function and multiplied by the output of the output gate, as shown in Eq. 6:

$$ o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{5} $$

$$ h_t = o_t \odot \tanh(C_t) \tag{6} $$

Here ⊙ denotes element-wise multiplication, σ is the sigmoid function, and $W$ and $b$ are the weight matrices and bias vectors of the respective gates.
The functions used in the LSTM unit are differentiable, as can be seen in Fig. 4: sigmoid (σ), hyperbolic tangent (tanh), product (×), and sum (+). Therefore, the weights can be updated using derivatives during backpropagation.
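To illustrate the gate operations described above, the following NumPy sketch implements one time step of a standard LSTM cell in which every gate reads the concatenation of $h_{t-1}$ and $x_t$. It is a hypothetical example under that assumption, not the software used in the study. The sizes, random weights, and function names are made up.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step in the standard formulation.

    W maps the concatenation [h_prev, x_t] to the four gate pre-activations,
    stacked as (forget, input, candidate, output); shape (4H, H + D).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:H])               # forget gate
    i_t = sigmoid(z[H:2 * H])           # input gate
    c_tilde = np.tanh(z[2 * H:3 * H])   # candidate values
    c_t = f_t * c_prev + i_t * c_tilde  # cell state update
    o_t = sigmoid(z[3 * H:4 * H])       # output gate
    h_t = o_t * np.tanh(c_t)            # hidden state / output
    return h_t, c_t

def lstm_forward(xs, W, b, hidden_size):
    """Run the cell over a sequence xs of shape (T, D); return every h_t."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(1)
D, H, T = 2, 4, 5  # hypothetical: 2 inputs, 4 hidden units, 5 time steps
W = rng.normal(scale=0.5, size=(4 * H, H + D))
b = np.zeros(4 * H)
hs = lstm_forward(rng.normal(size=(T, D)), W, b, H)
print(hs.shape)  # (5, 4)
```

Every operation in `lstm_step` is differentiable, so an automatic differentiation framework could backpropagate through it unchanged. The stacking order of the gates is an arbitrary choice here.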

Bidirectional long short-term memory networks
A bidirectional LSTM (biLSTM) is a DNN model that extends the standard LSTM model. This design contains two LSTM units. As seen in Fig. 5, information flows in one direction in an LSTM unit but in both directions in a biLSTM unit. One LSTM processes the input in the forward direction and the other in the backward direction. This increases the amount of information, or context, available to the network (Schuster et al. 1997). The biLSTM can therefore remember long-term dependencies while processing information in both directions.
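The forward-and-backward idea can be sketched by running two separately parameterized LSTMs. The first processes the sequence in time order and the second processes the reversed sequence. Their states at each time step are then concatenated. This NumPy sketch assumes concatenation as the merge rule (summation or averaging are other possible choices) and uses hypothetical sizes and random weights.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_sequence(xs, W, b, H):
    """Unidirectional LSTM over xs of shape (T, D); returns states (T, H)."""
    h, c, out = np.zeros(H), np.zeros(H), []
    for x_t in xs:
        z = W @ np.concatenate([h, x_t]) + b
        f, i, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[3 * H:])
        c = f * c + i * np.tanh(z[2 * H:3 * H])
        h = o * np.tanh(c)
        out.append(h)
    return np.stack(out)

def bilstm_sequence(xs, W_fwd, b_fwd, W_bwd, b_bwd, H):
    """Bidirectional LSTM: one pass forward in time, one pass backward.

    The backward outputs are flipped back into chronological order, so row t
    holds [forward state at t, backward state at t].
    """
    h_fwd = lstm_sequence(xs, W_fwd, b_fwd, H)
    h_bwd = lstm_sequence(xs[::-1], W_bwd, b_bwd, H)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(2)
D, H, T = 3, 4, 6  # hypothetical sizes
xs = rng.normal(size=(T, D))
W_f = rng.normal(scale=0.5, size=(4 * H, H + D))  # forward-direction weights
W_b = rng.normal(scale=0.5, size=(4 * H, H + D))  # backward-direction weights
out = bilstm_sequence(xs, W_f, np.zeros(4 * H), W_b, np.zeros(4 * H), H)
print(out.shape)  # (6, 8)
```

The output at each time step has twice as many features as one direction, because the state summarizing the past is placed next to the state summarizing the future of the sequence.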

Gated recurrent unit
Several variants of the LSTM architecture have been introduced, and the most commonly used is the gated recurrent unit (GRU). In the GRU architecture, the forget and input gates are combined, so it is less complex than the standard LSTM model. The GRU unit is shown in Fig. 6. The main difference between a GRU and an LSTM is that a GRU has two gates (reset and update gates), whereas an LSTM has three (input, forget, and output gates). The output $h_t$ of the GRU unit is defined by the following equations (Cho et al. 2014).
Let $r_t$ and $z_t$ denote the reset and update gates. The update gate acts like the forget and input gates of an LSTM: it decides what information to discard and what new information to add. The gates are computed as

$$ r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t] + b_r\right) \tag{7} $$

$$ z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t] + b_z\right) \tag{8} $$

where σ is the sigmoid function, $x_t$ and $h_t$ are the current input and the output of the GRU unit, and $h_{t-1}$ is the hidden state at time $t-1$. The candidate state and the output are

$$ h'_t = \tanh\left(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h\right) \tag{9} $$

$$ h_t = (1 - z_t) \odot h_{t-1} + z_t \odot h'_t \tag{10} $$

where $h'_t$ is the candidate state and tanh is the hyperbolic tangent function. If the reset gate $r_t$ is closed, the GRU ignores the previous hidden state $h_{t-1}$ when forming the candidate state. The candidate then depends only on the current input $x_t$. The update gate $z_t$ controls how much information from the past state $h_{t-1}$ is passed to the current state $h_t$.
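The following NumPy sketch implements one GRU step under a common formulation. The reset gate scales $h_{t-1}$ before the candidate state is computed. The new state then interpolates between $h_{t-1}$ and the candidate with weight $z_t$. References differ on which of the two terms $z_t$ multiplies. The usage lines force the reset gate closed to show that the candidate then ignores the previous hidden state. All names and sizes are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_r, W_z, W_h, b_r, b_z, b_h):
    """One GRU step; every weight matrix has shape (H, H + D)."""
    hx = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ hx + b_r)  # reset gate
    z_t = sigmoid(W_z @ hx + b_z)  # update gate
    # candidate state: the reset gate scales the previous state first
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)
    # interpolate between the previous state and the candidate
    return (1.0 - z_t) * h_prev + z_t * h_cand

rng = np.random.default_rng(3)
D, H = 2, 3  # hypothetical sizes
W_r, W_z, W_h = (rng.normal(size=(H, H + D)) for _ in range(3))
x_t = rng.normal(size=D)
# large biases force r_t ~ 0 (reset gate closed) and z_t ~ 1 (keep candidate)
closed = dict(b_r=np.full(H, -200.0), b_z=np.full(H, 200.0), b_h=np.zeros(H))
h_a = gru_step(x_t, np.zeros(H), W_r, W_z, W_h, **closed)
h_b = gru_step(x_t, np.full(H, 5.0), W_r, W_z, W_h, **closed)
print(np.allclose(h_a, h_b))  # True: the previous state has no influence
```

Compared with the LSTM step, this uses three weight matrices instead of four and has no separate cell state. This is the reduced complexity referred to in the text.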
The number of LSTM layers and the number of cells in each layer are important for both forecasting performance and computation time. During the development of the DNN architecture, different numbers and orders of layers were tried. The architecture giving the best performance with the lowest computational load was selected. The architectures implemented with LSTM, biLSTM, and GRU in this study are shown in Fig. 7.
The first layer of the architecture is a sequence input layer, which feeds the daily precipitation data into the network.
The second layer is an LSTM, biLSTM, or GRU layer, depending on the model.
In this layer, 32, 64, 128, and 256 memory units were tried during model development, and the best forecasting performance was obtained with the
The third layer of the model is the rectified linear unit (ReLU) layer, also known as the activation layer. ReLU is the most commonly used activation function in deep neural networks, and it sets negative input values to zero.
The fourth layer is a fully connected layer, which combines the outputs of the previous layers through weights. The optimal weights are found during training by minimizing a loss function. Between 10 and 100 units were tried for this layer during model construction, and 10 units gave the most reasonable result.
The fifth layer is a dropout layer, which randomly drops a fraction of neurons during training to prevent overfitting. In this study, a dropout rate of 50% was applied.
The sixth layer is a second fully connected layer with a single output.
The model ends with a regression output layer.
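The layers that follow the recurrent layer can be sketched as a plain NumPy forward pass. This is an illustrative assumption, not the study's implementation. The recurrent output size cannot be recovered from the text, so 64 units per direction (128 values for a biLSTM) is a hypothetical choice. Dropout is written in the common "inverted" form, which rescales surviving units during training and is inactive at inference.

```python
import numpy as np

def head_forward(h_last, params, rng=None, p_drop=0.5, training=False):
    """Layers 3-6 of the described stack, applied to the recurrent output.

    ReLU -> fully connected (10 units) -> dropout (50%) -> fully connected (1).
    """
    a = np.maximum(h_last, 0.0)               # layer 3: ReLU
    a = params["W1"] @ a + params["b1"]       # layer 4: fully connected, 10 units
    if training:                              # layer 5: dropout, training only
        mask = rng.random(a.shape) >= p_drop
        a = a * mask / (1.0 - p_drop)         # inverted-dropout rescaling
    return params["W2"] @ a + params["b2"]    # layer 6: fully connected, 1 output

rng = np.random.default_rng(4)
H2 = 2 * 64  # hypothetical: biLSTM with 64 units per direction
params = {
    "W1": rng.normal(scale=0.1, size=(10, H2)), "b1": np.zeros(10),
    "W2": rng.normal(scale=0.1, size=(1, 10)), "b2": np.zeros(1),
}
h_in = rng.normal(size=H2)
y = head_forward(h_in, params)
print(y.shape)  # (1,)
```

A regression output layer typically has no weights of its own. During training, it compares this single value with the target through a squared-error loss.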
In the training phase, the maximum number of epochs was 100, the initial learning rate was 0.002, the learning rate drop period was 100, and the learning rate drop factor was 0.1. These values were chosen by trial and error. The Adam optimization algorithm was used to train the network.
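The learning-rate settings can be checked with a short calculation. The sketch below assumes a piecewise (step) schedule in which the rate is multiplied by the drop factor once every full drop period, with epochs counted from 0. The function name is hypothetical.

```python
def piecewise_lr(epoch, initial_lr=0.002, drop_factor=0.1, drop_period=100):
    """Learning rate at a 0-based epoch under a step schedule.

    The rate is multiplied by drop_factor once per completed drop_period.
    """
    return initial_lr * drop_factor ** (epoch // drop_period)

max_epochs = 100
rates = [piecewise_lr(e) for e in range(max_epochs)]
print(min(rates), max(rates))  # 0.002 0.002
```

Under this assumption, a maximum of 100 epochs combined with a drop period of 100 means the rate stays at 0.002 for the whole of training. The first drop would only take effect at epoch 100 (0-based), which is never reached.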

Inputs for deep neural networks
Deep neural networks are trained with pairs of input and output data. In this study, lagged values of previous daily rainfall and the instantaneous frequency feature are used as input variables, as listed in Table 1. One- to six-day-ahead forecasts are made first with one to three precipitation inputs and then with the instantaneous frequency feature added.
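The construction of input-output pairs for an h-day-ahead forecast from n lagged precipitation values can be sketched as below. Table 1 is not reproduced in the text, so the exact alignment is an assumption. In this sketch, the inputs are the n values ending at day t and the target is the value at day t + h. An optional extra per-day feature (for example an IF series) is appended as its value at day t. The function and argument names are hypothetical.

```python
import numpy as np

def make_supervised(series, n_lags, horizon, extra=None):
    """Build (input, target) pairs for horizon-step-ahead forecasting.

    Row i of X holds the n_lags consecutive values ending at day t; the
    target is the value at day t + horizon.  If `extra` is given
    (hypothetical per-day feature), its value at day t is appended.
    """
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        row = series[t - n_lags + 1 : t + 1]
        if extra is not None:
            row = np.append(row, extra[t])
        X.append(row)
        y.append(series[t + horizon])
    return np.array(X), np.array(y)

# Two lagged inputs, three-day-ahead target, on a toy series 0, 1, ..., 9
X, y = make_supervised(np.arange(10.0), n_lags=2, horizon=3)
print(X[0], y[0])  # [0. 1.] 4.0
print(X.shape)     # (6, 2)
```

The number of usable pairs shrinks by `n_lags - 1 + horizon`. Longer horizons therefore lose a few samples at the end of each split.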

Hilbert transform and instantaneous frequency
The instantaneous frequency is crucial in numerous signal processing applications. It is one of the most important parameters in time-frequency analysis for the modeling and classification of signals. To obtain the instantaneous frequency, the Hilbert transform is applied to the real-valued signal $x(t)$. The Hilbert transform is a linear operator defined as

$$ h(t) = \mathcal{H}\{x(t)\} = \frac{1}{\pi}\,\mathrm{p.v.}\int_{-\infty}^{\infty}\frac{x(\tau)}{t-\tau}\,d\tau \tag{11} $$

According to Eq. 11, this operator is the convolution of $x(t)$ with the function $1/(\pi t)$. In the frequency domain, this convolution imparts a phase shift of ±90° to every frequency component of the signal (Johansson 1999).
The Hilbert transform makes it possible to obtain the analytic representation of a real-valued signal $x(t)$. An analytic, complex-valued signal $X(t)$ can be constructed from the real-valued input signal $x(t)$ as in Eq. 12:

$$ X(t) = x(t) + j\,h(t) = A(t)\,e^{j\phi(t)} \tag{12} $$

where $X(t)$ is the analytic signal obtained from $x(t)$ and its Hilbert transform $h(t)$. As the right-hand side of Eq. 12 shows, $X(t)$ can also be expressed in polar coordinates. Its amplitude is $A(t) = \sqrt{x^2(t) + h^2(t)}$, and its phase $\phi(t) = \arg X(t)$ is the four-quadrant arctangent of $h(t)/x(t)$. The derivative of the phase $\phi(t)$ of the analytic signal $X(t)$ is called the instantaneous frequency:

$$ \omega(t) = \frac{d\phi(t)}{dt}, \qquad f(t) = \frac{1}{2\pi}\,\frac{d\phi(t)}{dt} \tag{13} $$

where $\omega(t)$ is expressed in radians and $f(t)$ in cycles per unit time.
In this study, the instantaneous frequency feature and the two previous daily precipitation values were used as inputs for forecasting daily precipitation data.
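The following minimal NumPy sketch follows the steps above: Hilbert transform, analytic signal, phase, and its derivative. The analytic signal is built in the frequency domain by zeroing negative frequencies and doubling positive ones. This is the approach taken by, for example, `scipy.signal.hilbert`. The IF is then the finite-difference derivative of the unwrapped phase divided by 2π. The source does not state how its IF feature was computed, so this is an assumed implementation. The 20-cycle cosine test tone is hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal X(t) = x(t) + j*h(t) via the FFT method.

    Negative-frequency bins are zeroed and positive ones doubled, which is
    equivalent to adding j times the Hilbert transform h(t) of x(t).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep the DC term
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist term
        gain[1 : n // 2] = 2.0
    else:
        gain[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def instantaneous_frequency(x, fs=1.0):
    """IF in cycles per unit time (cycles/day for daily data with fs=1).

    Finite difference of the unwrapped phase, divided by 2*pi; the result
    has one sample fewer than x.
    """
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)

t = np.arange(400)
x = np.cos(2 * np.pi * 0.05 * t)  # hypothetical tone: 0.05 cycles/day, 20 cycles
f_inst = instantaneous_frequency(x)
print(round(float(np.median(f_inst)), 4))  # 0.05
```

For a pure tone with a whole number of cycles, the recovered IF is constant and equal to the tone frequency. In a forecasting setting, one caveat applies: the FFT-based transform is non-causal, so every IF value depends on the whole window, including later samples. Computing it only from data available up to the forecast origin avoids using future information.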

Performance indicators
In this study, the forecasting performance of the IF-biLSTM model was evaluated using four indicators: the mean absolute error (MAE), the mean square error (MSE), the correlation coefficient (R), and the determination coefficient (R²).

The mean absolute error
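The mean absolute error (MAE) measures the average absolute difference between the observed time series data and the data forecasted by the proposed model. MAE is defined as

$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|X_{\mathrm{observed},i} - Y_{\mathrm{estimated},i}\right| $$

where $n$ is the number of data points, $X_{\mathrm{observed},i}$ is the observed value, and $Y_{\mathrm{estimated},i}$ is the estimated value at step $i$.

The mean square error
The mean square error (MSE) measures the difference between the observed time series data and the data forecasted by the proposed model. It is computed as the average of the squared differences over the data set. MSE is defined as

$$ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(X_{\mathrm{observed},i} - Y_{\mathrm{estimated},i}\right)^2 $$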

The correlation coefficient (R)
The correlation coefficient reveals the degree and direction of the linear relationship between the observed time series data and the data forecasted by the proposed model. It is denoted by R and takes values in [−1, 1]. R is defined as

$$ R = \frac{1}{n}\sum_{i=1}^{n}\frac{\left(X_{\mathrm{observed},i} - \bar{X}\right)\left(Y_{\mathrm{estimated},i} - \bar{Y}\right)}{\sigma_X\,\sigma_Y} $$

In this equation, $X_{\mathrm{observed},i}$ is the observed time series data, $\bar{X}$ is its average, and $\sigma_X$ is its standard deviation. Likewise, $Y_{\mathrm{estimated},i}$ is the estimated data, $\bar{Y}$ is its average, and $\sigma_Y$ is its standard deviation.

The determination coefficient (R²)
R² is often used to evaluate the predictive power of hydrological models. This statistical criterion takes values between −∞ and 1. An R² value of 1 between the observed and estimated data indicates a perfect result. R² is defined as

$$ R^2 = 1 - \frac{\sum_{i=1}^{n}\left(X_{\mathrm{observed},i} - Y_{\mathrm{estimated},i}\right)^2}{\sum_{i=1}^{n}\left(X_{\mathrm{observed},i} - \bar{X}\right)^2} $$
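The four performance indicators can be computed as follows. This NumPy sketch follows the definitions in this section. It uses population standard deviations for R and the 1 − SSE/SST form for R², which is unbounded below. The function names and the four-point example are hypothetical.

```python
import numpy as np

def mae(obs, est):
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    return float(np.mean(np.abs(obs - est)))

def mse(obs, est):
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    return float(np.mean((obs - est) ** 2))

def corr_r(obs, est):
    """Pearson correlation coefficient R, in [-1, 1]."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    cov = np.mean((obs - obs.mean()) * (est - est.mean()))
    return float(cov / (obs.std() * est.std()))

def r_squared(obs, est):
    """1 - SSE/SST: equals 1 for a perfect fit, can be any negative value."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    sse = np.sum((obs - est) ** 2)
    sst = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - sse / sst)

obs = np.array([0.0, 2.0, 5.0, 1.0])  # hypothetical observed values (mm)
est = np.array([0.5, 1.5, 4.0, 1.0])  # hypothetical forecasts (mm)
print(mae(obs, est), mse(obs, est))   # 0.5 0.375
```

In this form, R² compares the forecast errors with the variance of the observations, so it is not in general the square of R. For example, a forecast that is perfectly correlated with the observations but biased by a constant still has R = 1 and R² < 1.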

Performance evaluation for LSTM, biLSTM, and GRU forecasting models
In the first stage of the study, the forecasting performance of the LSTM, biLSTM, and GRU models was evaluated. One to three previous daily precipitation values were applied to the models as input variables, and 1-day-ahead forecasts were made. As Table 2 shows, the biLSTM model gives superior forecasting performance compared with the LSTM and GRU models.
The one-input biLSTM model performs best for one-day-ahead forecasting of daily precipitation data. Observed and predicted values for the 1-day-ahead forecasting model are shown in Fig. 8. The scatter plot of the one-day-ahead forecasts against the observed data clearly shows that the linear trend of the biLSTM model is close to the line y = x.
Since the biLSTM model outperformed the other models, only the biLSTM model was used in the far-ahead (2- to 6-day-ahead) forecasting studies.

One- to six-day-ahead forecasting performance of the biLSTM model
At this stage of the study, 1- to 6-day-ahead forecasting was performed using the biLSTM model. As can be seen from Table 3, the number of inputs giving the best forecasting performance depends on the horizon:
- one input for one-day-ahead forecasting,
- two inputs for two- to four-day- and six-day-ahead forecasting,
- three inputs for five-day-ahead forecasting.

As an example, the six-day-ahead forecasts of the two-input model are shown in Fig. 9. Table 3 also shows that, as expected, the performance of the biLSTM model decreases as the number of steps ahead increases. In addition, in the 2-, 4-, and 6-day-ahead forecasting studies, applying two precipitation inputs to the model gave better forecasting performance than the other cases.

Forecasting performance of biLSTM model using instantaneous frequency
At this stage of the study, the effect of applying the instantaneous frequency feature as an input parameter was investigated, with the aim of improving the far-ahead prediction performance of the biLSTM model. The instantaneous frequency feature was therefore applied to the biLSTM model as an input in addition to the two precipitation inputs. Table 4 shows that forecasting performance improves when the instantaneous frequency feature is used as an input to the biLSTM model. In particular, the 2- to 6-day-ahead prediction performance improves significantly. As an example, six-day-ahead forecasting with the IF-biLSTM model is shown in Fig. 10. The effect of the IF feature on prediction performance is also shown with spider charts. In Fig. 11, the biLSTM and IF-biLSTM models are compared using the MSE, MAE, R, and R² values for 2- to 6-day-ahead forecasting. Fig. 11 shows that the IF-biLSTM model has lower MSE and MAE values and higher R and R² values than the biLSTM model for 2- to 6-day-ahead forecasting.

Conclusions
Forecasting daily precipitation is a challenging task because the data are nonlinear and nonstationary. Recently, the biLSTM model has been used for forecasting in different fields, such as financial time series, stock prices, and trading areas (Siami-Namini et al. 2019a, b; Kim et al. 2019; Wu et al. 2020; Lu et al. 2021).
In this study, a novel IF-biLSTM model was employed for forecasting daily precipitation data. The main motivation of this study is to develop a promising, attainable, high-performance forecasting model for daily precipitation data. The combination of the IF feature with the biLSTM model is proposed here for the first time in the literature for forecasting daily precipitation data.
To this end, the daily precipitation data were first split into training and testing data. The testing data were not used at any point during the training stage of the model.
Three DNN models, LSTM, GRU, and biLSTM, were developed for one-day-ahead forecasting of daily precipitation data using one to three precipitation inputs. The obtained performance parameters indicate that the biLSTM model forecasts much better than the LSTM and GRU models for one-day-ahead forecasting.
The performance of the biLSTM model was also analyzed for 2- to 6-day-ahead forecasting to show its far-ahead forecasting performance. According to the numerical results, the biLSTM model performs better with two inputs than with other numbers of inputs for 2-, 3-, 4-, and 6-day-ahead forecasting.
The IF plays a significant role in the definition and analysis of time-frequency representations. In this study, the IF was applied to the biLSTM model together with two precipitation inputs to analyze the effect of the instantaneous frequency feature on forecasting performance. The IF feature improves the forecasting performance of the proposed model. As seen from Table 3, the improvements are remarkable. For example, R² changes as follows (without → with the IF feature):
- one-day-ahead forecasting: 0.994 → 0.998
- two-day-ahead forecasting: 0.921 → 0.983
- three-day-ahead forecasting: 0.869 → 0.909
- four-day-ahead forecasting: 0.777 → 0.851
- five-day-ahead forecasting: 0.730 → 0.783
- six-day-ahead forecasting: 0.721 → 0.7646

Fig. 10 (a) Six-day-ahead forecasting of daily precipitation data using the two-input IF-biLSTM model. (b) Scatter plot.
Fig. 11 Spider charts of the performance parameters of the biLSTM and IF-biLSTM models: (a) two-day-ahead, (b) three-day-ahead, (c) four-day-ahead, and (e) five-day-ahead forecasting.
The spider charts also show that the IF-biLSTM model has lower MSE and MAE values and higher R and R² values than the biLSTM model for 2- to 6-day-ahead forecasting.
This study proposes a new model called IF-biLSTM, which is based on a deep neural network and the instantaneous frequency. The model achieves high forecasting performance for precipitation data and outperforms the LSTM and GRU models. Even for far-ahead forecasts (four, five, and six days), the IF-biLSTM model gives more accurate results than the biLSTM model. These results suggest that the proposed model can be used with confidence in other forecasting studies.
Author contribution Levent Latifoğlu conceived and designed the analysis, performed the analysis, and wrote the paper.
Data availability All data used are original, and the method used in the article was applied for the first time.

Declarations
Ethics approval Not applicable; no ethical approval was required.

Consent to participate This article has been approved by all authors.
Consent for publication For this article, publishing approval has been given.

Competing interests
The author declares no competing interests.