Streamflow forecasting based on a hybrid decomposition-ensemble model

Reliable and accurate streamflow forecasting plays a vital role in the optimal management of water resources. To improve the stability and accuracy of streamflow forecasting, a hybrid decomposition-ensemble model named VMD-LSTM-GBRT, which is sensitive to sampling, noise and long historical changes of streamflow, was established. The variational mode decomposition (VMD) algorithm was first applied to extract features, which were then learned by several long short-term memory (LSTM) networks. Simultaneously, an ensemble tree, a gradient boosting tree for regression (GBRT), was trained to model the relationships between the extracted features and the original streamflow. The outputs of these LSTMs were finally reconstructed by the GBRT model to obtain the streamflow forecasts. A historical daily streamflow series (from 1/1/1997 to 31/12/2014) for Yangxian station, Han River, China, was investigated with the proposed model. VMD-LSTM-GBRT was compared with peer models in three respects: (1) feature extraction algorithm: ensemble empirical mode decomposition (EEMD) was used; (2) feature learning techniques: deep neural networks (DNNs) and support vector machines for regression (SVRs) were exploited; (3) ensemble strategy: the summation strategy was used. The results indicate that the VMD-LSTM-GBRT model outperforms all peer models in terms of the root mean square error (RMSE = 36.3692), determination coefficient (R² = 0.9890), mean absolute error (MAE = 9.5246) and peak percentage threshold statistics (PPTS(5) = 0.0391%). The proposed approach, based on the memory of long historical changes with deep feature representations, had good stability and high prediction precision.


Streamflow forecasting is of great significance for the optimal management and effective operation of a water resources system. Therefore, it has been investigated by many researchers, and numerous forecasting models have been developed in the past decades. Among these models, forecasting techniques based on statistical modeling, i.e., data-driven models, seem to be in fashion for their simplicity and robustness (Huang et al., […]; Wang et al., 2009). However, most of the AI models, which belong to the "shallow" learning category, cannot sufficiently represent instinctual information (Bai et al., 2016). Deep learning models, e.g., the deep belief network (DBN) and recurrent neural networks (RNNs), can overcome this drawback due to their deeper representation ability (Bai et al., 2016). However, these deep learning models rely entirely on historical observed data, and some of the earlier changes of streamflow may or may not influence future streamflow. It is entirely possible for the gap between the streamflow information from further back in time and the current point where it is needed to become large. Therefore, a deep learning model that can automatically "remember" or "forget" previous information should be able to enhance the accuracy of streamflow forecasting. Fortunately, LSTM (Hochreiter and Schmidhuber, 1997), one of the deep learning models, has the ability to tackle this task. LSTM has been successfully used in some fields, e.g., accident diagnosis (Yang and […]).

The implementation process of the VMD model is summarized as Algorithm 1.

The LSTM memory block diagram is illustrated in Fig. 1, where W, U and b are the input weights, recurrent weights and biases, respectively, and the subscripts i, f and o denote the input, forget and output gates, respectively. The activation function […]
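The gate computations described above can be sketched for a single scalar cell in plain Python. This is an illustrative toy only, not the networks trained in this study: the weights W, U and biases b (for gates i, f, o plus a candidate update g) are hypothetical unit values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step for a scalar input/state (illustration only).

    W, U, b are dicts keyed by gate name: 'i' (input), 'f' (forget),
    'o' (output) and 'g' (candidate cell update).
    """
    i = sigmoid(W['i'] * x + U['i'] * h_prev + b['i'])    # input gate
    f = sigmoid(W['f'] * x + U['f'] * h_prev + b['f'])    # forget gate
    o = sigmoid(W['o'] * x + U['o'] * h_prev + b['o'])    # output gate
    g = math.tanh(W['g'] * x + U['g'] * h_prev + b['g'])  # candidate state
    c = f * c_prev + i * g    # new cell state: "remember" old + write new
    h = o * math.tanh(c)      # new hidden state exposed to the next layer
    return h, c

# toy run with unit weights and zero biases (hypothetical values)
W = {k: 1.0 for k in 'ifog'}
U = {k: 1.0 for k in 'ifog'}
b = {k: 0.0 for k in 'ifog'}
h, c = lstm_cell_step(0.5, 0.0, 0.0, W, U, b)
```

The forget gate f is what lets the cell "remember" or "forget" earlier streamflow information, which is the property motivating the use of LSTM here.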

Gradient Boosting Regression Trees (GBRTs)
Gradient boosting is a powerful machine learning strategy to efficiently produce highly robust, competitive, interpretable procedures for both regression and classification (Friedman, 2001; Friedman, 2002). The key to boosting is to combine the output of many weak prediction models […]
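A minimal sketch of gradient boosting for regression, using one-split stumps as the weak learners and assuming squared loss, so each stage fits the current residuals (the negative gradient). The data and hyperparameters are illustrative only.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on 1-D inputs (squared loss)."""
    best = None
    for s in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= s]
        right = [r for xi, r in zip(x, residuals) if xi > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi: lm if xi <= s else rm

def gbrt_fit(x, y, n_trees=20, lr=0.3):
    """Gradient boosting for regression: start from the mean, then let
    each stump correct the residuals of the current ensemble."""
    f0 = sum(y) / len(y)
    trees, pred = [], [f0] * len(y)
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]   # negative gradient
        t = fit_stump(x, resid)
        trees.append(t)
        pred = [pi + lr * t(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + lr * sum(t(xi) for t in trees)

# toy series with a step change, to show boosting picking it up
x = [0, 1, 2, 3, 4, 5]
y = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
model = gbrt_fit(x, y)
```

In practice GBRT implementations add subsampling and regularization (Friedman, 2002); this sketch keeps only the core residual-fitting loop.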

After discussing each key constituent separately, the procedure of the proposed VMD-LSTM-GBRT model can be summarized as follows and is diagrammed in Fig. 2.

Step 1. Collect raw daily streamflow data.

Step 2. Use VMD to decompose the raw series X into several components.

Step 3. Plot the partial autocorrelation coefficient (PACF) figure of each component obtained in Step 2 to select the optimal number of inputs for it. Divide each component into three subsets: the training set (80%) for training multiple LSTM structures, the development set (10%) for searching for the optimal structure, and the test set (10%) for validating the ensemble model VMD-LSTM-GBRT.

Step 4. Given the test set, predict each component based on the optimal LSTM structure of each mode obtained in Step 3.

Step 5. Build the ensemble tree model GBRT using the components obtained in Step 2 as input and the original series obtained in Step 1 as output. Use GBRT to reconstruct the predictions given by Step 4.

Step 6. Output the forecasting streamflow results and perform error analysis.
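The steps above can be wired together schematically. The sketch below assumes a chronological 80/10/10 split as in Step 3 (no shuffling, since streamflow is a time series), and represents the trained GBRT as an arbitrary callable; the summing ensemble used in the example run is a stand-in for illustration, not the fitted GBRT.

```python
def split_series(series, train=0.8, dev=0.1):
    """Chronological 80/10/10 split (Step 3): training, development
    and test sets in temporal order."""
    n = len(series)
    i, j = int(n * train), int(n * (train + dev))
    return series[:i], series[i:j], series[j:]

def reconstruct(component_preds, ensemble_model):
    """Step 5: feed the per-component LSTM predictions to the trained
    ensemble model, one feature vector per time step."""
    return [ensemble_model(feats) for feats in zip(*component_preds)]

# toy series standing in for the raw daily streamflow of Step 1
series = list(range(100))
tr, dev, te = split_series(series)

# two fake component prediction series; the lambda is a hypothetical
# ensemble (simple summation strategy), not the paper's trained GBRT
preds = reconstruct([[1, 2], [3, 4]], ensemble_model=lambda f: sum(f))
```

Passing `sum` as the ensemble reproduces the summation strategy that the paper uses as a baseline; the proposed model replaces it with the GBRT of Step 5.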
is then selected as one of the input variables under the condition that the PACF at lag k is outside the 95% confidence interval indicated by the blue lines in Fig. 7. Fig. 7 shows that almost all the PACFs of each component are outside this range. Therefore, we select 20 days of lag […] and 20-21-21-21-21-21-1 as the optimal structures for 1 to 5 hidden layers, respectively. 20-19-19-1 means that the structure has 20 input features, 1 output target, and two hidden layers with 19 hidden units each. Fig. 8(b) shows the boxplots of the optimal structure for each number of hidden layers, where the upper and lower quartiles are determined by the PPTS(5) of the training and development sets. The range between the upper and lower quartiles indicates the degree of the bias-variance tradeoff; the smaller the range, the better the tradeoff. From Fig. 8(b), one can find that the structure 20-15-1 has the best bias-variance tradeoff. Therefore, the structure 20-15-1 was selected as the optimal model to predict IMF1.
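The 20-lag input construction implied by the PACF analysis might look as follows; `make_supervised` is a hypothetical helper and the component series is synthetic.

```python
def make_supervised(series, n_lags):
    """Build (inputs, target) pairs from a component series, using the
    previous n_lags values as features, as selected via the PACF."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # the n_lags values before time t
        y.append(series[t])             # the value to predict at time t
    return X, y

# synthetic stand-in for one decomposed component (e.g., IMF1)
component = [float(i) for i in range(25)]
X, y = make_supervised(component, n_lags=20)
```

With 20 selected lags, each training sample for a component's LSTM is the previous 20 days of that component, matching the "20 input features, 1 output target" structures discussed above.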

370
To validate the optimal model for forecasting each sub-series, the predictions during the test period were renormalized to the original scale and are plotted in Fig. 9. The PPTS(5) and R² of all components during the training and development periods are listed in Table 2. From Fig. 9 and […] Fig. 10. From Fig. 10(a), one can find that the peak predictions obtained by VMD-LSTM-GBRT are closer to the original streamflow. Moreover, the scatter plot shown in Fig. 10 […]. From Fig. 11(a), the proposed model performed better for peak flow forecasting than the traditional decomposition method EEMD, which is validated by Fig. 11(c). From the scatter plot in Fig. 11(b), one can find that the predicted values of the proposed model are much more concentrated than those of the model using EEMD. Moreover, a comparison of prediction performance between the proposed feature learning model LSTM and the other two machine learning models, DNN and SVR, was conducted, as indicated in Fig. 12. From the forecasting results shown in Fig. 12(a) and the scatter plot in Fig. 12(b), one can observe that the difference between the three feature learning models is not obvious. However, we can still determine that the best feature learning model is the LSTM from the quantitative evaluations given in Table 3 and the detailed predictions shown in Fig. 12(c). As shown in Table 3 […]
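The evaluation metrics quoted in these comparisons can be computed as below. RMSE, MAE and R² follow their standard definitions; the PPTS implementation is an assumption (mean absolute relative error, in percent, over the top γ% of observed flows, i.e., an error measure focused on peaks), since the exact formula is not reproduced in this excerpt.

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def ppts(obs, pred, gamma=5):
    """Assumed PPTS(gamma): mean absolute relative error (%) over the
    top gamma% largest observed flows."""
    pairs = sorted(zip(obs, pred), key=lambda p: p[0], reverse=True)
    k = max(1, int(len(pairs) * gamma / 100))  # number of peak samples
    return 100.0 / k * sum(abs((o - p) / o) for o, p in pairs[:k])

# toy data: one large peak, underpredicted by 10%
obs = [1.0, 2.0, 3.0, 100.0]
pred = [1.0, 2.0, 3.0, 90.0]
```

Because PPTS looks only at the largest observed flows, it isolates exactly the peak-forecasting behaviour compared across models in Figs. 10 and 11.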

In this paper, a decomposition-ensemble-based multiscale feature learning approach with hybrid models was developed for forecasting daily streamflow, and the approach was evaluated on a historical streamflow dataset for Yangxian station, Han River, China. To improve the accuracy and stability of forecasting, three aspects were considered comprehensively: (1)