Evaluating The Performance of Several Data Preprocessing Methods Based On GRU in Forecasting Monthly Runoff Time Series

Abstract: The optimal planning and management of modern water resources depends highly on reliable and accurate runoff forecasting. Data preprocessing technology can provide new possibilities for improving the accuracy of runoff forecasting when basic physical relationships cannot be captured using a single prediction model. Yet, few studies have so far evaluated the performance of various data preprocessing technologies in predicting monthly runoff time series. In order to fill this research gap, this paper investigates the potential of five data preprocessing techniques combined with a gated recurrent unit network (GRU) model for monthly runoff prediction, namely variational mode decomposition (VMD), wavelet packet decomposition (WPD), complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), extreme-point symmetric mode decomposition (ESMD), and singular spectrum analysis (SSA). In this study, the original monthly runoff data are first decomposed into a set of subcomponents using the five data preprocessing methods; second, each component is predicted by developing an appropriate GRU model; finally, the forecasting results of the different two-stage hybrid models are obtained by aggregating the forecast results of the corresponding subcomponents. Four performance metrics are employed as the evaluation benchmarks. The experimental results from two hydropower stations in China show that all five data preprocessing techniques can attain satisfying prediction results, while the VMD and WPD methods yield better performance than CEEMDAN, ESMD and SSA in both the training and testing periods in terms of the four indexes, which is significantly important in practice.

The remainder of this paper is organized as follows. Section 2 introduces the related methodologies. The experimental results and discussions are shown in Section 3, and Section 4 concludes the paper.

The update gate determines the degree to which the information from the previous state will be passed to the future, and the reset gate helps to define what information should be ignored from the previous state.

A typical structure of the GRU is shown in Fig. 1. The specific expressions are as follows:

$$ z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \quad (1) $$

$$ r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \quad (2) $$

$$ \tilde{h}_t = \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right) \quad (3) $$

$$ h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \quad (4) $$

where $z_t$ is the output of the update gate, $\sigma(\cdot)$ is the sigmoid function, $W_z$ is the weight of the update gate, $x_t$ is the input of this hidden layer, $h_{t-1}$ is the output of the last hidden layer, $r_t$ is the output of the reset gate, $W_r$ is the weight of the reset gate, $\tilde{h}_t$ is the current memory content, $\tanh(\cdot)$ is the tanh activation function, and $h_t$ is the final memory at the current step.
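For illustration, the four gate equations above can be written as a single recurrent step. This is a minimal NumPy sketch, not the paper's implementation: the weight matrices act on the concatenation of hidden state and input, and bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step: update gate, reset gate, candidate memory, final memory."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)                  # update gate
    r_t = sigmoid(W_r @ concat)                  # reset gate
    concat_r = np.concatenate([r_t * h_prev, x_t])
    h_tilde = np.tanh(W_h @ concat_r)            # current memory content
    h_t = (1 - z_t) * h_prev + z_t * h_tilde     # final memory at this step
    return h_t
```

Because the final state is a convex combination of the previous state and a tanh-bounded candidate, the hidden state remains bounded over long sequences, which is what makes the GRU suitable for long monthly runoff records.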

Compared with EMD, VMD avoids the errors caused in the process of recursive computation and the end effect. The theory of VMD is given as follows:

$$ \min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) $$

where $f(t)$ is the input signal, $K$ is the total number of modes, $t$ is the time script, $u_k$ is the $k$th mode, $\omega_k$ is the center frequency, $\delta(t)$ is the Dirac distribution, and $*$ is the convolution operator.

To convert the optimization problem into an unconstrained one, the augmented Lagrangian function is introduced:

$$ L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle $$

where $\lambda$ and $\alpha$ are the Lagrange multiplier and the penalty factor, respectively.

WPD is an efficient signal preprocessing technology, which can split the original signal into approximation coefficients and detail coefficients. The wavelet basis function and the number of decomposition levels can significantly influence the performance of WPD.
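For intuition, a full wavelet packet tree with the Haar basis can be written in a few lines. This is an illustrative sketch only (the Haar basis is assumed here, not necessarily the basis used in the paper); the key point is that, unlike the plain DWT, both the approximation and the detail branches are split at every level.

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: approximation (low-pass) and detail (high-pass)."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_wpd(x, level):
    """Wavelet packet decomposition: split every node at every level,
    yielding 2**level terminal subbands."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nxt = []
        for node in nodes:
            a, d = haar_split(node)
            nxt.extend([a, d])
        nodes = nxt
    return nodes
```

Because the Haar transform is orthonormal, the total energy of the terminal subbands equals that of the original series, so no information is lost before the per-subband GRU models are trained.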

WPD contains DWT and CWT. The CWT of a signal $f(t)$ is shown as follows:

$$ W_\psi(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt $$

where $a$ is the scale factor, $b$ is the translation factor, and $\psi(\cdot)$ is the mother wavelet function.

The CEEMDAN procedure is as follows:

(1) A white Gaussian noise series $\omega^{i}(t)$ is added to the original signal $x(t)$, and the obtained signal can be expressed as:

$$ x^{i}(t) = x(t) + \omega^{i}(t) $$

where $i$ denotes the number of trials.

(2) The first IMF is obtained by EMD as follows:

$$ \overline{IMF_1}(t) = \frac{1}{I} \sum_{i=1}^{I} E_1\left(x^{i}(t)\right) $$

(3) Compute the first residue $Res_1(t)$ by:

$$ Res_1(t) = x(t) - \overline{IMF_1}(t) $$

(4) Compute the $k$th residue as:

$$ Res_k(t) = Res_{k-1}(t) - \overline{IMF_k}(t) $$

(5) Decompose the noise-added residue to attain the $(k+1)$th IMF by:

$$ \overline{IMF_{k+1}}(t) = \frac{1}{I} \sum_{i=1}^{I} E_1\left(Res_k(t) + E_k\left(\omega^{i}(t)\right)\right) $$

where $E_k(\cdot)$ denotes the $k$th mode obtained by EMD.
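The noise-averaging idea in steps (1) to (3) can be sketched as follows. The EMD sifting itself is abstracted behind a user-supplied function: `emd_first_mode` is a hypothetical stand-in for $E_1(\cdot)$, since a full EMD implementation is beyond this sketch.

```python
import numpy as np

def ceemdan_first_imf(x, emd_first_mode, n_trials=50, eps=0.2, seed=None):
    """Average the first EMD mode over noise realisations, then subtract
    it from the signal to obtain the first residue (steps (1)-(3))."""
    rng = np.random.default_rng(seed)
    imf1 = np.mean(
        [emd_first_mode(x + eps * rng.standard_normal(len(x)))
         for _ in range(n_trials)],
        axis=0,
    )
    res1 = x - imf1                # first residue by construction
    return imf1, res1
```

Averaging over many noise realisations is what cancels the added noise while stabilising the mode estimate, which is the "complete ensemble ... with adaptive noise" idea that distinguishes CEEMDAN from plain EMD.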

(1) The original runoff data are decomposed into several subsequences using VMD, WPD, CEEMDAN, ESMD and SSA, respectively.

(2) For each extracted sub-sequence, GRU is adopted as a forecasting tool to model the split module and to perform the corresponding forecasting for each sequence. The runoff data up to 1994 are adopted to train the model, whilst the data from 1995 to 2004 are adopted for testing.

(3) The final forecasting results of the two-stage hybrid models are obtained by aggregating the forecast results of all the subcomponents.
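The two-stage procedure above can be summarised in code. Here `decompose` and `fit_predict` are hypothetical stand-ins for the chosen preprocessing method (VMD, WPD, CEEMDAN, ESMD or SSA) and the per-subcomponent GRU model, respectively; the sketch shows only the data flow, not the paper's model configuration.

```python
import numpy as np

def make_supervised(series, lag):
    """Turn a subcomponent into (X, y) pairs: `lag` past values -> next value."""
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

def two_stage_forecast(series, decompose, fit_predict, lag=12):
    """Two-stage hybrid model: decompose the runoff series, forecast each
    subcomponent with its own model, then aggregate by summation."""
    components = decompose(series)
    preds = []
    for comp in components:
        X, y = make_supervised(comp, lag)
        preds.append(fit_predict(X, y))   # one forecasting model per subcomponent
    return np.sum(preds, axis=0)          # step (3): aggregate the forecasts
```

A lag of 12 is a natural choice for monthly data (one full seasonal cycle), although the lag actually used in the paper is not specified in this excerpt.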

In order to avoid attributes within a small range of values being dominated by those within a larger range, the runoff data are normalized before model training.

Tables 1 and 2 list the results of the six forecasting models for the Manwan and Hongjiadu reservoirs, respectively. One can see from