Streamflow prediction of the Yangtze River based on deep learning neural networks: Impact of the El Niño–Southern Oscillation

Accurate long-term streamflow and flood forecasting has always been an important research direction in hydrology. Nowadays, with climate change, floods and other anomalies are occurring more and more frequently and bringing great losses to society. The prediction of streamflow, especially flood prediction, is therefore important for disaster prevention. Current hydrological models based on physical mechanisms can give accurate predictions of streamflow, but the effective prediction period is only about one month in advance, which is too short for decision making. Previous studies have shown a link between the El Niño–Southern Oscillation (ENSO) and the streamflow of the Yangtze River. In this paper, we use ENSO and the monthly streamflow data of the Yangtze River from 1952 to 2016 to predict the monthly streamflow of the Yangtze River in two extreme flood years using deep neural networks. Three deep neural network frameworks are used: Stacked LSTM, Conv LSTM Encoder-Decoder LSTM, and Conv LSTM Encoder-Decoder GRU. Experiments show that the months of flood occurrence and the peak flows predicted by these models become more accurate after the introduction of ENSO, and the best results were obtained with the Conv LSTM Encoder-Decoder gated recurrent unit (GRU) model.


INTRODUCTION
The Yangtze River is one of the most important rivers in China, with a large, densely populated, and economically developed river basin. Flooding in the Yangtze River is of great concern to people, and China has invested heavily in flood prevention.
However, thousands of people still died in several major floods in the past three decades, and the average direct loss is more than 100 billion RMB per year [1]. Yangtze River streamflow forecasting plays an important role in flood prevention and post-disaster relief, as well as in integrated water resources development and utilization, scientific management, and optimal scheduling. Because many factors affect the streamflow of the Yangtze River [2], researchers have used various methods over the years to predict its streamflow and obtain valuable prediction data.
Runoff is a natural signal, a complex non-linear time series that is simultaneously influenced by a variety of factors such as rainfall in the basin, the degree of erosion in the basin, atmospheric circulation, and urban and rural water use. Researchers have proposed different methods for predicting runoff. These methods can be divided into short-term prediction methods, dealing with prediction times of hours [3,4] to days [5][6][7], and long-term prediction methods, dealing with scales of weeks [8], months [7,9], and even years [10]. These methods can also be divided according to the type of model employed: hydrological models based on physical mechanisms, and data-driven models based on data analysis. Hydrological models include the Soil and Water Assessment Tool (SWAT), TOPMODEL [11], and the Xinanjiang model [12]. Previous work has also linked terrestrial water storage and ENSO in the Yangtze River basin: the upstream streamflow and ENSO phases are inversely correlated, while the downstream streamflow and ENSO phases are positively correlated [30]. Furthermore, Jiang Tong et al. point out that La Niña is strongly associated with drought events and El Niño with floods in the middle and lower Yangtze River basin, while the opposite is true in the upper Yangtze River basin [31].
The types of data used for solving streamflow prediction problems with artificial intelligence include streamflow, precipitation, sea surface temperature, wetness, sea level pressure, and evaporation. For example, S. Sharma compared the differences between the adaptive neuro-fuzzy inference system and the Loading Simulation Program in C++ model using these types of data, and found that the two methods produced similar results [32]. Typically, the data used for streamflow prediction with ANNs are streamflow, evaporation, and precipitation; ENSO data has not been used [3,7,33]. To investigate whether introducing ENSO values into the streamflow prediction problem helps improve the accuracy of streamflow prediction, the present paper adds ENSO values to the training data of several better-performing and widely used ANN models.

A. LONG SHORT-TERM MEMORY
Long short-term memory (LSTM) was proposed by Sepp Hochreiter et al. in 1997 [34]. It is an algorithm based on the recurrent neural network (RNN). LSTM solves the vanishing gradient problem by introducing three thresholds and two memory states [35]. The formulas used by LSTM are Equations (1)-(6):

i_t = σ(W_i · [h_{t−1}, x_t] + b_i) (1)
f_t = σ(W_f · [h_{t−1}, x_t] + b_f) (2)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o) (3)
h_t = o_t ∘ tanh(c_t) (4)
c_t = f_t ∘ c_{t−1} + i_t ∘ c̃_t (5)
c̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c) (6)

W_i, W_f, and W_o comprise the matrices of parameters to be trained; b_i, b_f, and b_o are the biases to be trained. x_t is the entered data, and h_{t−1} is the result of the last moment of memory. h_t represents the short-term memory, and the cell state c_t represents the long-term memory.
The formula for the input gate is (1), the formula for the forget gate is (2), and the formula for the output gate is (3).
The tanh activation function limits the output to between −1 and 1, and can be replaced by other activation functions. The three gates act multiplicatively on the input data and the memory of the previous moment. Equation (4) is the formula for memory: the short-term memory h_t is the result of multiplying the output of the current output gate with the cell state passed through the tanh function, so it represents the joint action of the output gate and the long-term memory. The cell state represents the long-term memory and is calculated as in (5) by adding the cell state of the previous moment, weighted by the forget gate, to the candidate state, weighted by the input gate. The candidate state represents the information to be deposited in the cell state and is calculated as in (6); it is the result of the action of the current input data and the output data of the previous moment. Fig. 1 shows the structure of an LSTM memory unit.
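To make Equations (1)-(6) concrete, the following is a minimal NumPy sketch of a single LSTM step; the hidden size, weight shapes, and random initialization are illustrative assumptions, not the configuration used in our experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step implementing Equations (1)-(6).

    W maps gate name -> weight matrix of shape (hidden, hidden + input);
    b maps gate name -> bias vector of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    i = sigmoid(W["i"] @ z + b["i"])         # input gate, Eq. (1)
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate, Eq. (2)
    o = sigmoid(W["o"] @ z + b["o"])         # output gate, Eq. (3)
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate state, Eq. (6)
    c = f * c_prev + i * c_tilde             # cell state (long-term memory), Eq. (5)
    h = o * np.tanh(c)                       # short-term memory, Eq. (4)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 2, 4                           # e.g. streamflow + ENSO as the two inputs
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for k in "ifoc"}
b = {k: np.zeros(n_hid) for k in "ifoc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.standard_normal((18, n_in)):  # roll the cell over 18 monthly steps
    h, c = lstm_step(x_t, h, c, W, b)
```

Because h_t = o_t ∘ tanh(c_t) with o_t in (0, 1), the short-term memory stays bounded in (−1, 1) regardless of the input scale, which is the behavior the gating equations are designed to provide.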
In this paper, the LSTM model is used in the Stacked LSTM and the Conv LSTM Encoder-Decoder LSTM. Fig. 2 illustrates the Stacked LSTM used for the experiments. Stacking multiple LSTM layers deepens the network without severely affecting the prediction performance, and can even produce better results, making the Stacked LSTM a frequently used model in machine learning [37].
B. GATED RECURRENT UNIT
The gated recurrent unit (GRU) is similar in principle to the LSTM, with an update gate (7), a reset gate (8), a candidate hidden state (9), and a memory update (10):

z_t = σ(W_z · [h_{t−1}, x_t] + b_z) (7)
r_t = σ(W_r · [h_{t−1}, x_t] + b_r) (8)
h̃_t = tanh(W_h · [r_t ∘ h_{t−1}, x_t] + b_h) (9)
h_t = (1 − z_t) ∘ h_{t−1} + z_t ∘ h̃_t (10)

The reset gate determines the update of the memory by controlling the candidate state, while the update gate controls how much of the information from the previous memory is forgotten. The candidate hidden state h̃_t represents the memory formed at the current moment. Fig. 3 shows the structure of a GRU memory unit.
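A single GRU step can be sketched in NumPy in the same style; again, the sizes and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, b):
    """One GRU step: update gate z, reset gate r, candidate h_tilde."""
    v = np.concatenate([h_prev, x_t])
    z = sigmoid(W["z"] @ v + b["z"])            # update gate, Eq. (7)
    r = sigmoid(W["r"] @ v + b["r"])            # reset gate, Eq. (8)
    v_r = np.concatenate([r * h_prev, x_t])     # reset gate filters the old memory
    h_tilde = np.tanh(W["h"] @ v_r + b["h"])    # candidate hidden state, Eq. (9)
    return (1.0 - z) * h_prev + z * h_tilde     # new memory, Eq. (10)

rng = np.random.default_rng(1)
n_in, n_hid = 2, 4
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for k in "zrh"}
b = {k: np.zeros(n_hid) for k in "zrh"}
h = np.zeros(n_hid)
for x_t in rng.standard_normal((18, n_in)):
    h = gru_step(x_t, h, W, b)
```

Equation (10) is a convex combination of the old memory and the candidate, so with fewer gates and a single state the GRU is cheaper per step than the LSTM while keeping the same bounded-memory behavior.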
In the experiments set up in this paper, the convolutional LSTM encoder-decoder GRU (Conv LSTM Encoder-Decoder GRU) uses a three-layer GRU as the decoder structure to decode the encoded vectors and output them through the dense layer.

C. CONVOLUTIONAL LSTM NETWORK
The convolutional LSTM network (Conv LSTM) was proposed by Shi Xingjian et al. in 2015 [38]. In the past, LSTM was used as the encoder layer when building encoder-decoder models; however, LSTM has no special design for spatio-temporal sequences and uses full connections between layers to transform data. Conv LSTM instead uses convolution in place of the full connections. Conv LSTM has roughly the same formulas as LSTM, using Equations (11)-(16), but with * standing for convolution instead of a full-connection operation; otherwise, the meaning and function of each formula is as in the LSTM described above. According to Shi Xingjian's article, a larger kernel can perceive features with larger spatial variation in the data, while a smaller kernel can perceive features with small spatial variation. Fig. 4 shows the structure of a Conv LSTM memory unit.
c_t = f_t ∘ c_{t−1} + i_t ∘ c̃_t (15)
c̃_t = tanh(W_c * [h_{t−1}, x_t] + b_c) (16)

Conv LSTM was originally developed to process a series of radar echo images and extract the motion of clouds from the image time series, thus giving accurate short-term predictions. In this paper, the streamflow data and ENSO data are 1-dimensional data that change with time. When using Conv LSTM, the time series are first grouped according to different periods; the grouped 1-dimensional data are then treated as special 2-dimensional data, and the streamflow data and ENSO data are composed into a sequence with two channels fed into the Conv LSTM network. After this procedure, the convolutional kernel extracts the feature information from the time series as spatial features, thus increasing the accuracy of the prediction.
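The grouping described above can be sketched as a pure reshaping step; the choice of 3 groups of 6 months and the synthetic series below are illustrative assumptions, not the paper's configuration. The target layout is the 5-D tensor (samples, timesteps, rows, cols, channels) that convolutional-LSTM layers conventionally consume.

```python
import numpy as np

# Two aligned monthly 1-D series: streamflow and ENSO (synthetic here).
months = 18
streamflow = np.arange(months, dtype=float)   # placeholder values
enso = np.linspace(-1.0, 1.0, months)         # placeholder values

# Stack the two series as channels: shape (months, 2).
series = np.stack([streamflow, enso], axis=-1)

# Group the window into subsequences and treat each group as a 1-row
# "image": (samples, timesteps, rows, cols, channels). Here one sample
# split into 3 groups of 6 months each.
n_groups, group_len = 3, 6
conv_lstm_input = series.reshape(1, n_groups, 1, group_len, 2)
```

Each timestep of the Conv LSTM then sees a 1×6 two-channel "image", so the convolutional kernel slides along the within-group time axis and extracts local temporal patterns as if they were spatial features.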

D. Conv LSTM ENCODER-DECODER RNN
The encoder-decoder model was proposed for sequence-to-sequence learning with deep neural networks (DNNs) [39]. The encoder-decoder structure is shown in Fig. 5. The encoder encodes the input field into a vector, and the decoder decodes the encoded vector into the output field. Ilya Sutskever et al. found that an encoder-decoder structure constructed from LSTM models produces translation results similar to the best translation results at that time. Therefore, the encoder-decoder structure is often used to handle sequence-to-sequence problems. The encoder-decoder model has one notable feature when dealing with sequence-to-sequence problems: it is sensitive to the order of the input sequences, which suggests that it may perform well on time series problems. Since streamflow prediction, which uses time series data consisting of streamflow and ENSO values to predict future streamflow data, can also be treated as a sequence-to-sequence problem, the encoder-decoder structure is chosen for our experiments.

Figure 5. The architecture of the encoder-decoder.
Figure 6. The Conv LSTM encoder-decoder.
The Conv LSTM encoder-decoder RNN used in this paper uses the encoder-decoder as the model framework (Fig. 6). The encoder uses a Conv LSTM with 64 convolutional kernels, and the size of the convolutional kernels is adopted as (n, 3). The streamflow data used in the experiments are from the Hankou and Datong stations [40]; the floods that occurred in 1998 and 2016 caused great economic losses in the Yangtze River basin.
In this experiment, we use streamflow and ENSO data from 1952 to 2016 to train the models and predict the monthly streamflow in the two extreme flood years.
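As a sketch of how such monthly series can be framed for supervised learning, the window construction below pairs an input window with the following months of streamflow as the target; the 18-month input length, 12-month output length, and synthetic data are illustrative assumptions, not the paper's exact partitioning.

```python
import numpy as np

def make_windows(series, n_in=18, n_out=12):
    """Slide over a (months, features) array, pairing each n_in-month
    input window with the next n_out months of the first feature
    (streamflow) as the prediction target."""
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out, 0])
    return np.array(X), np.array(y)

# Synthetic stand-in for monthly streamflow + ENSO, 1952-2016 (780 months).
rng = np.random.default_rng(42)
data = rng.standard_normal((780, 2))
X, y = make_windows(data)
```

With roughly 780 months of record, this windowing yields on the order of 750 training pairs, which illustrates why the sample count noted in the discussion is small by machine-learning standards.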

D. PERFORMANCE EVALUATION
After the model has completed its predictions, outputting data in the range 0-1, the normalized data need to be restored to the original scale using Equation (18) when performing the evaluation:

x = y · (x_max − x_min) + x_min (18)

where y is the model output in the range 0-1, and x_max and x_min are the maximum and minimum of the original streamflow data.
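Assuming the 0-1 range comes from standard min-max normalization, the forward and inverse transforms of Equation (18) can be sketched as:

```python
import numpy as np

def minmax_scale(x, x_min, x_max):
    """Forward min-max normalization to the 0-1 range."""
    return (x - x_min) / (x_max - x_min)

def minmax_restore(y, x_min, x_max):
    """Inverse transform (Equation (18)): map 0-1 model outputs
    back to the original streamflow scale."""
    return y * (x_max - x_min) + x_min

flow = np.array([12000.0, 30500.0, 71100.0, 45800.0])  # illustrative values in m^3/s
lo, hi = flow.min(), flow.max()
scaled = minmax_scale(flow, lo, hi)
restored = minmax_restore(scaled, lo, hi)
```

The round trip is exact up to floating-point precision, and x_min and x_max must be the statistics of the training data so that evaluation compares predictions and observations on the same physical scale.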
To measure the difference between the true and predicted values, we used the following four statistics.
The root mean square error (RMSE) is defined as Equation (19):

RMSE = √( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² ) (19)
The RMSE is the square root of the mean square error (MSE). Taking the square root returns the result to the same scale as the original data, making it possible to compare results more intuitively. When evaluating data that are expected to follow a Gaussian distribution, the RMSE is more suitable than the MAE for reflecting model performance [43].
The coefficient of determination (R²) is defined as Equation (20):

R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)² (20)
The coefficient of determination reflects what percentage of the fluctuations in the observed values can be explained by the predicted values [44]. The coefficient of determination takes values in the range −∞ to 1. Willmott's index of agreement (WI) is as shown in Equation (21):

WI = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (|ŷ_i − ȳ| + |y_i − ȳ|)² (21)
WI is often used in the measurement of hydrological data. It is dimensionless, bounded by 0 and 1, and higher values indicate better agreement between predictions and observations [45].
The last statistic is the Legates-McCabe index (LMI). It is not oversensitive to extreme values and can reflect additive and proportional differences between model predictions and observations. The index is better suited as a complement to other assessment instruments than correlation-based measures. It is also dimensionless, bounded by 0 and 1.0, and the higher the LMI value, the better the fitting effect of the model [46].
In all the above equations, n represents the number of data pairs, y_i the observed value, ŷ_i the forecasted value, and ȳ the mean of the observed values.
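The four statistics can be computed directly from paired observed and forecasted series; the LMI function below follows the standard Legates-McCabe absolute-error form, an assumption since the paper's own equation did not survive extraction, and the sample arrays are illustrative only.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error, Eq. (19)."""
    return np.sqrt(np.mean((obs - pred) ** 2))

def r2(obs, pred):
    """Coefficient of determination, Eq. (20); range (-inf, 1]."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def willmott_index(obs, pred):
    """Willmott's index of agreement, Eq. (21); bounded by 0 and 1."""
    num = np.sum((obs - pred) ** 2)
    den = np.sum((np.abs(pred - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return 1.0 - num / den

def lmi(obs, pred):
    """Legates-McCabe index: absolute-error analogue of R^2,
    less sensitive to extreme values than squared-error scores."""
    return 1.0 - np.sum(np.abs(obs - pred)) / np.sum(np.abs(obs - obs.mean()))

obs = np.array([10.0, 20.0, 30.0, 40.0])
pred = np.array([12.0, 18.0, 33.0, 39.0])
```

For a perfect forecast all four scores reach their optimum (RMSE of 0; R², WI, and LMI of 1), which makes them easy to sanity-check before applying them to restored streamflow values.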

RESULTS
As mentioned above, the monthly streamflow forecasts of the Yangtze River were produced under several data partitioning schemes. Combining the prediction results for the two years, we can find that 18m-min-pd outperforms the other datasets in most cases and gives predictions suitable for later reference. We can conclude that the model with 18m-min-pd performs well on the streamflow + ENSO dataset. In this paper, we introduce ENSO values that are implicitly related to the streamflow data, in addition to the previous machine learning approach of using only streamflow data for training and prediction.
Through this, we can enhance the training effect by increasing the data dimensions and obtain more accurate monthly streamflow predictions, and hopefully more accurate flood predictions. The best-performing Conv LSTM encoder-decoder GRU model is used in the next experiment, with the best-performing 18m-min-pd data partitioning method, to compare the prediction results between the ENSO + streamflow dataset and the streamflow-only dataset; there is a significant improvement in the 1998 prediction and a small improvement in the 2016 prediction.
In the above comparison, adding ENSO data under the 18m-min-pd division shows that the implicit relationship between the ENSO data and the flow data enables the network structure to extract more information, thus compensating for the deficiency in time series feature extraction to some extent and greatly improving the accuracy of prediction.
We found that the neural network models can predict the streamflow of the middle and lower reaches of the Yangtze River, represented by the flows at the Hankou and Datong stations. By adding ENSO data to the streamflow data, the prediction ability of each model under different parameters is greatly improved, which reveals that there is an implicit relationship between ENSO and flow data that can be learned by the neural network. At the same time, the discrepancy between the 1998 and 2016 forecasts shows a certain difference between the predicted and actual streamflow trends in the 1998 flood season, while the predicted trend for the 2016 flood season is almost the same as the actual trend, because the streamflow of the mainstream of the Yangtze River started to be disturbed by human interference after the construction of the Three Gorges Dam and other water conservation projects. The difference between the streamflow of the Yangtze River in the last century and the streamflow changes in the current century is due to this influence, which means the prediction model cannot learn such unnatural river streamflow changes simply by adding ENSO data. We note that the number of streamflow data samples collected is only about 700, which is small for machine learning. Augmenting the model with ENSO data can be seen as augmenting the training set and compensating for this lack of data.
The variation in streamflow volume in the Yangtze River is related not only to ENSO but also to many other variables; thus, the data could be enhanced by adding more variables, which would make the prediction more accurate. Different regions in the Yangtze River basin have different relationships with climate change, and different locations along the Yangtze River have different relationships with upstream streamflow; thus, more sites could be used for joint prediction. With the rapid development of deep learning, there may be many more powerful models that would enable us to obtain better prediction results.