Monthly Streamflow Forecasting Using Convolutional Neural Network

Monthly streamflow forecasting is vital for managing water resources. Recently, numerous studies have explored and demonstrated the potential of artificial intelligence (AI) models in hydrological forecasting. In this study, the feasibility of the convolutional neural network (CNN), a deep learning method, is explored for monthly streamflow forecasting. CNN can automatically extract critical features from numerous inputs through its convolution–pooling mechanism, a distinct advantage over other AI models. Hydrological and large-scale atmospheric circulation variables, namely rainfall, streamflow, and atmospheric circulation factors, are used to establish models and forecast streamflow for Huanren Reservoir and Xiangjiaba Hydropower Station, China. An artificial neural network (ANN) and an extreme learning machine (ELM), with inputs identified through cross-correlation and mutual information analyses, are established for comparative analyses. The performances of these models are assessed with several statistical metrics and graphical evaluation methods. The results show that CNN outperforms ANN and ELM in all statistical measures. Moreover, CNN shows better stability in forecasting accuracy.


Introduction
Long-term streamflow forecasting is an essential basis for managing water conservancy and hydropower projects. This type of forecasting has a long forecast period, which gives water managers sufficient time to allocate water to different sectors. However, the complex and nonlinear nature of hydrological processes makes accurate long-term forecasting difficult. Recent studies have begun to apply CNNs to such problems; reported results indicated that 1D-CNN is superior to SVM and the multilayer perceptron. Chong et al. (2020) developed a hybrid 1D-CNN to forecast daily and monthly rainfall over the Langat River Basin in Malaysia. Hussain et al. (2020) employed 1D-CNN to forecast the streamflow of four rivers in the UK. Barino et al. (2020) investigated the applicability of 1D-CNN to multi-day-ahead river discharge forecasting. Compared with 1D-CNN, literature on the use of 2D-CNN in streamflow forecasting is limited: Huang et al. (2020) applied 2D-CNN to forecast daily streamflow, and its forecasting accuracy was much better than that of the comparative models, and Chen et al. (2021) designed a 2D-CNN model for flood process forecasting.
Motivated by its feature extracting ability, outperformance in short-term streamflow forecasting, and limited applications in long-term streamflow forecasting, we investigate the potential of the 2D-CNN to accurately forecast monthly streamflow in this study. Antecedent rainfall, streamflow, and atmospheric circulation factor (ACF) data are converted into 2D matrices and then used to drive CNNs to forecast one-month-ahead streamflow, which, to the best of our knowledge, has not been considered in existing studies. ANN and ELM are chosen as comparative methods. Two cases, Huanren Reservoir and Xiangjiaba Hydropower Station of China, are used to verify the feasibility of 2D-CNN.

Convolutional Neural Network
Generally, CNN refers to 2D-CNN, the inputs of which are 2D matrices (Hussain et al. 2020). However, there are other types of CNN, such as 1D- and three-dimensional (3D)-CNNs. All types of CNN share the same characteristics but differ in the dimension of the input matrix. The CNN employed in this study is the 2D-CNN. Two salient characteristics contribute to the uniqueness of CNNs. First, a neuron is connected only to its nearby neurons in the previous layer. Second, the pooling mechanism is introduced to significantly reduce the number of coefficients in the network (Mozo et al. 2018). A standard CNN generally comprises five types of layers: input, convolution, pooling, fully connected, and output layers (Fig. 1).
The input layer provides input information for the entire model. In this study, antecedent data with time and space dimensions are employed to forecast streamflow. Let the x- and y-axes denote the time and space dimensions of a matrix, respectively. The elements in the matrix are observations of different input variables at different times. In the time dimension, time ranges from the past to the present, and the time interval depends on the type of forecast; for one-month-ahead streamflow forecasting, the time interval is 1 month. In the space dimension, different variables are viewed as a sequence of dots. Supposing the length and width of the input matrix are m and n, respectively, the input matrix X can be written as follows:

X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}

where i denotes the timeseries index, j denotes the period, m represents the number of timeseries employed, n is the length of each timeseries, and x_{ij} represents the value of the ith timeseries at the jth period.

The convolution layer comprises several convolution kernels and learns feature representations from the input layer. Denoting the kernel weights, bias, input, and output of the jth kernel in the lth convolution layer by k_j^l, b_j^l, x^l, and o_j^l, respectively, the output of the jth kernel can be written as follows:

o_j^l = f\left( \sum_{k=1}^{M_l} x_k^l * k_j^l + b_j^l \right)

where j is the index of the convolution kernel in the convolution layer, x_k^l is the kth channel of x^l, M_l is the number of input channels of the lth layer (i.e., the number of kernels in the preceding layer), * denotes the convolution operation, f(\cdot) is the activation function, and b_j^l represents a bias term.
The pooling layer is used to decrease the size of the outputs from the convolution layers. In this study, the average-pooling operation is employed. Since only one value (the average) in each scanned region is retained, the number of CNN parameters after the pooling operation is significantly reduced. The alternation of convolution and pooling layers not only reduces the network scale of a CNN but also identifies the most prominent features of the input layer.
In the fully connected layer, the features learned by the convolution and pooling layers are converted into a dense vector. The output layer establishes the relation y = Wo + b among the dense vector o, the bias term b, and the forecasted value y. Once the parameters W and b are determined, the final forecast can be obtained for any given dense vector o.
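The convolution–pooling–dense pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the 3×3 kernel, ReLU activation, 2×2 average pooling, and random 36-month × 68-variable input matrix are chosen only for demonstration.

```python
import numpy as np

def conv2d_valid(x, kernel, bias):
    """'Valid' 2D cross-correlation of a single-channel input with one kernel,
    followed by a ReLU activation f(.)."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel) + bias
    return np.maximum(out, 0.0)

def avg_pool2d(x, size=2):
    """Non-overlapping average pooling; keeps one value per scanned region."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w].reshape(h // size, size, w // size, size)
    return x.mean(axis=(1, 3))

rng = np.random.default_rng(0)
X = rng.standard_normal((36, 68))          # 36 months x 68 candidate variables
feat = avg_pool2d(conv2d_valid(X, rng.standard_normal((3, 3)), 0.1))
dense = feat.ravel()                        # fully connected layer: dense vector o
W = rng.standard_normal(dense.size)
y_hat = W @ dense + 0.5                     # output layer: y = Wo + b
```

With a 3×3 kernel, the 36×68 input shrinks to 34×66 after convolution and to 17×33 after 2×2 average pooling, illustrating how the pooling step cuts the number of downstream parameters by a factor of four.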

Artificial Neural Network
ANN is a nonlinear dynamic system that can approximate any nonlinear input–output relation (Cichocki and Unbehauen 1993; Pashova and Popova 2011). There are many ANN variants (Yılmaz and Yuksek 2008). In this study, we employ the feedforward backpropagation (BP) network as a comparative model. The training of a BP network is an optimization process (Kuang and Xu 2018) comprising two parts: the forward and backward passes. In the forward pass, the input is processed through the network to obtain the forecast. If the deviation between the forecast and the observation is large, the backward pass modifies the parameters of each layer. After sufficient iterations of the forward and backward passes, the ANN can forecast accurately.
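The forward/backward alternation can be sketched as a minimal one-hidden-layer BP network in NumPy. The synthetic target, tanh activation, layer sizes, and learning rate are illustrative assumptions; the paper does not specify these details.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 4))            # 4 lagged predictors (synthetic)
y = np.sin(X[:, :1]) + 0.1 * X[:, 1:2]       # synthetic target

# One hidden layer (tanh) with small random initial weights
W1 = rng.standard_normal((4, 8)) * 0.5
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)) * 0.5
b2 = np.zeros(1)
lr = 0.05

losses = []
for _ in range(500):
    # forward pass: compute the forecast
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # backward pass: gradients of the MSE loss, then parameter updates
    g2 = h.T @ err / len(X)
    gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1 - h ** 2)         # backprop through tanh
    g1 = X.T @ gh / len(X)
    gb1 = gh.mean(axis=0)
    W2 -= lr * g2; b2 -= lr * gb2
    W1 -= lr * g1; b1 -= lr * gb1
```

Each iteration performs one forward pass and one backward pass; the recorded MSE shrinks as training proceeds, which is the "sufficient iterations" condition described above.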

Extreme Learning Machine
ELM is a type of single-hidden-layer feedforward neural network (Huang et al. 2006; Hadi et al. 2020). We present a brief description of ELM. Consider a set of training samples \{(X_1, y_1), (X_2, y_2), \ldots, (X_i, y_i), \ldots, (X_N, y_N)\}, where X_i \in R^l is the input vector of the ith sample, y_i is the corresponding observation, and N is the number of training samples. The ELM with P hidden neurons can be modeled as follows:

y_i = \sum_{j=1}^{P} \beta_j f(\omega_j \cdot X_i + b_j), \quad i = 1, 2, \ldots, N \quad (3)

where \omega_j is the weight vector connecting the input variables to the jth hidden neuron, \beta_j is the weight connecting the jth hidden neuron to the output neuron, and b_j is the bias of the jth hidden neuron. Equation (3) can be expressed in matrix form as Y = H\beta, where Y = [y_1, y_2, \ldots, y_N]^T, \beta = [\beta_1, \beta_2, \ldots, \beta_P]^T, and H is defined as follows:

H = \begin{bmatrix} f(\omega_1 \cdot X_1 + b_1) & \cdots & f(\omega_P \cdot X_1 + b_P) \\ \vdots & \ddots & \vdots \\ f(\omega_1 \cdot X_N + b_1) & \cdots & f(\omega_P \cdot X_N + b_P) \end{bmatrix}_{N \times P}

The solution of ELM is \hat{\beta} = H^{\dagger} Y, where H^{\dagger} = (H^T H)^{-1} H^T is the Moore–Penrose generalized inverse of H (Huang et al. 2004). Finally, the forecasted value is expressed as follows:

\hat{y} = h(X_{test}) \hat{\beta}

where X_{test} is the input vector in the testing period and h(X_{test}) = [f(\omega_1 \cdot X_{test} + b_1), \ldots, f(\omega_P \cdot X_{test} + b_P)].

Input Selection Methods for Artificial Neural Network and Extreme Learning Machine
The cross-correlation (CC) function measures how two random variables X and Y covary linearly by calculating the linear correlation coefficient r_{XY}:

r_{XY} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2}}

where x_i and y_i are sample values of X and Y, respectively; N is the sample size; and \bar{x} and \bar{y} are the sample means. r_{XY} = 1 implies a perfect linear correlation, an intermediate value corresponds to partial correlation, and r_{XY} = 0 implies X is linearly uncorrelated with Y.

MI quantifies the stochastic dependency between two random variables without assuming the nature of their relation (Babel et al. 2015). For two discrete variables X and Y, MI can be expressed as follows (Siqueira et al. 2018):

MI(X, Y) = \sum_{X} \sum_{Y} p(X, Y) \log \frac{p(X, Y)}{p(X) p(Y)}

where p(X, Y) is the joint probability density function, and p(X) and p(Y) are the marginal distribution functions of X and Y, respectively. MI values range between 0 and infinity (∞). MI(X, Y) = 0 implies that X and Y are independent of each other. If X and Y are dependent, the MI value will be greater than 0; the larger the value, the stronger the dependence.
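Both criteria can be computed in a few lines. The sketch below contrasts them on synthetic data: a quadratic relation is invisible to CC but detected by MI. The histogram plug-in MI estimator and the bin count are assumptions for illustration; the paper does not specify its estimator.

```python
import numpy as np

def cc(x, y):
    """Linear correlation coefficient r_XY."""
    xm, ym = x - x.mean(), y - y.mean()
    return float(np.sum(xm * ym) / np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2)))

def mutual_info(x, y, bins=10):
    """Histogram plug-in estimate of MI(X, Y), in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                           # joint probabilities p(X, Y)
    px = pxy.sum(axis=1, keepdims=True)        # marginal p(X)
    py = pxy.sum(axis=0, keepdims=True)        # marginal p(Y)
    nz = pxy > 0                               # skip empty cells (0 log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
y_lin = 2.0 * x + 0.1 * rng.standard_normal(5000)   # strong linear relation
y_sq = x ** 2                                        # nonlinear: CC near 0, MI high
z = rng.standard_normal(5000)                        # independent of x
```

CC scores the linear pair highly and misses the quadratic one, whereas MI flags both relations while staying near zero for independent variables; this is why the two analyses are used together for input selection.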

Model Performance Evaluation
The following three statistical indices are used to evaluate the performance of the models developed in this study (Hadi et al. 2019).

I. Mean absolute error (MAE):

MAE = \frac{1}{N} \sum_{i=1}^{N} \left| Q_i^{sim} - Q_i^{obs} \right|

II. Root-mean-square error (RMSE):

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Q_i^{sim} - Q_i^{obs} \right)^2}

III. Nash–Sutcliffe efficiency coefficient (NSE):

NSE = 1 - \frac{\sum_{i=1}^{N} \left( Q_i^{obs} - Q_i^{sim} \right)^2}{\sum_{i=1}^{N} \left( Q_i^{obs} - \bar{Q}^{obs} \right)^2}

where Q_i^{sim} and Q_i^{obs} are the forecasted and observed ith values of the streamflow, respectively; \bar{Q}^{obs} is the mean of the observed streamflow; and N is the number of observations.
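The three metrics translate directly into code. The five-month streamflow series below is made up purely to exercise the definitions; it is not data from the study.

```python
import numpy as np

def mae(obs, sim):
    """Mean absolute error."""
    return float(np.mean(np.abs(sim - obs)))

def rmse(obs, sim):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the mean predictor."""
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Illustrative observed/forecasted monthly streamflow (m^3/s)
obs = np.array([120.0, 340.0, 560.0, 410.0, 150.0])
sim = np.array([130.0, 320.0, 600.0, 390.0, 160.0])
```

A perfect forecast gives MAE = RMSE = 0 and NSE = 1; an NSE at or below 0 means the model is no better than always forecasting the observed mean, which is why negative NSE values (as reported for Huanren below) indicate poor skill.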

Description of Catchments and Data
Huanren Reservoir and Xiangjiaba Hydropower Station are taken as case studies (Fig. 2). The Huanren Reservoir basin, located in the upper reaches of the Hunjiang River in China with a drainage area of 10,400 km², is characterized by a mountainous environment with good vegetation. The average annual rainfall is approximately 860 mm, and the average annual streamflow is 142 m³/s, 70% of which occurs from May to October. The Xiangjiaba Hydropower Station, located on the Jinshajiang River, is China's third-largest hydropower station, after the Three Gorges and Xiluodu hydropower stations. The annual average streamflow in the river is approximately 3810 m³/s, and the annual average rainfall of the entire basin is approximately 756.4 mm, 90% of which is concentrated in summer. The ACF data, comprising 66 variables, are provided by the National Climate Center, China. Together with rainfall and streamflow, 68 variables are taken as candidate inputs. The dataset for Huanren covers 600 months (1967–2016) and is partitioned into three parts: the training dataset (1967–1997), the cross-validation dataset (1998–2005), and the testing dataset (2006–2016). The dataset for Xiangjiaba covers a period of 528 months (1967–2010), among which 1967–1993, 1994–2000, and 2001–2010 are the training, cross-validation, and testing periods, respectively.

Model Development
In this subsection, CNN, ANN, and ELM models are established to forecast the monthly streamflow in the flood season (May–October) for Huanren and over the entire year for Xiangjiaba.

Convolutional Neural Network Model
For CNN, 68 variables over the past 36 months are employed as predictors. The structure of the CNN used in this study includes two convolution layers, two pooling layers, and one fully connected layer. Each convolution layer has only one channel, and the hyper-parameters are determined through a stepwise tuning procedure. To avoid overfitting in CNN (and likewise in ANN and ELM), training is subject to stopping criteria: training stops when the cross-validation error increases for a specified number of iterations or when the scheduled maximum number of iterations is reached. After this tuning procedure, many feasible structures remain; among them, nine parameter combinations representing different structures (Table 1) were finally chosen for forecasting.

ANN and ELM Models
For the ANN and ELM, the inputs are determined via CC and MI analyses, and four comparative models are established: CC-ANN, CC-ELM, MI-ANN, and MI-ELM. The number of inputs for the ANN and ELM models is set to match the dimension of the dense vectors in the fully connected layers listed in Table 1.
To identify the best architecture for the ANN, we follow the approach of Zhang et al. (2015). A three-layer BP network structure is used, and the optimal number of neurons in the hidden layer is determined heuristically: numbers of neurons from 1 to 10 are each tried 20 times, and the architecture with the smallest MAE in the training period is considered optimal. For the ELM, different numbers of hidden neurons are evaluated in a similar manner.
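The heuristic search can be sketched as a double loop over hidden-layer sizes and random restarts. To keep the sketch fast and self-contained, a random-feature least-squares model stands in for the BP network, and the data are synthetic; both are assumptions, not the authors' training scheme.

```python
import numpy as np

def fit_train_mae(Xtr, ytr, n_hidden, seed):
    """Hypothetical stand-in model: random tanh hidden layer + least-squares
    output (with intercept). Returns the training-period MAE."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((Xtr.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.column_stack([np.ones(len(Xtr)), np.tanh(Xtr @ W + b)])
    beta = np.linalg.pinv(H) @ ytr
    return float(np.mean(np.abs(H @ beta - ytr)))

rng = np.random.default_rng(3)
X = rng.standard_normal((120, 4))          # synthetic predictors
y = np.sin(X[:, 0]) + 0.3 * X[:, 1]        # synthetic target

best = None
for n in range(1, 11):                     # candidate hidden-layer sizes 1..10
    for trial in range(20):                # 20 random initializations each
        score = fit_train_mae(X, y, n, trial)
        if best is None or score < best[0]:
            best = (score, n)              # keep the smallest training MAE
```

After the loops, `best` holds the winning (MAE, hidden-layer size) pair; selecting on training MAE alone can favor larger networks, which is why the stopping criteria described earlier guard against overfitting.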

Results and Discussion
The one-month-ahead streamflow forecast performances in the testing period of the three models for Huanren and Xiangjiaba are listed in Table 2. The results also showed that there was no strict increasing or decreasing trend in any performance index for any model, which implied that including more inputs did not necessarily improve the forecasts. Notably, the forecasts for Xiangjiaba were more accurate than those for Huanren, with NSE varying from 0.13 to 0.84 and from −0.55 to 0.37, respectively.

Figure 3a details the best forecast results from each model in the testing period for Xiangjiaba, including the observed and forecasted data, absolute errors (forecasted value − observed value), and relative errors (RE). The forecasts from the three models all fit the observations well, with an average RE smaller than 10%. However, CNN underestimated the streamflow peaks, whereas ANN and ELM were more likely to overestimate them. For medium flows, the forecast results of the models were close to each other and all fit the observations well. For low flows, all models underestimated the streamflow.

Figure 3b exhibits the best forecast results from each model in the testing period for Huanren. The streamflow peaks in this region were mostly overestimated by each model, except for some extremely high peaks. For the medium and low flows, the average RE for each model was approximately 50%, and the maximum RE was up to 500%. Compared with Xiangjiaba, the forecast performance of each model for Huanren was relatively poor, which might be attributed to the characteristics of the Huanren Reservoir basin. The basin is located in northern China, where the formation of rainfall in the flood season is strongly influenced by the interaction of Siberian cold air and the summer monsoon.
This interaction is complicated and varies significantly, making long-term streamflow difficult to forecast. Besides, the streamflow in Huanren exhibits greater variability, which might further increase the forecasting difficulty.
The scatter plots and linear correlation coefficients (r) for the best forecast results from each model are shown in Fig. 4. In Fig. 4a, the data points of each model were scattered, confirming that the forecast results for Huanren were relatively poor. Nevertheless, CNN still performed slightly better than ANN and ELM, with a relatively high r-value of 0.615. In Fig. 4b, the data points of each model concentrated in the vicinity of the diagonal y = x, and the correlation coefficients of all models were above 0.9. Although the r-value of CNN was the lowest (r_CNN = 0.910, r_ANN = 0.929, r_ELM = 0.938), it was only 2.04% and 2.99% lower than those of ANN and ELM, respectively. Table 3 presents the metric improvement of CNN over ANN and ELM in terms of their best forecast results. For Huanren, all metrics of CNN were better than those of ANN and ELM; for Xiangjiaba, all metrics of CNN except r outperformed those of ANN and ELM.
In the process of establishing forecast models, it is not easy to obtain the optimal input combination for a specific model. Therefore, the sensitivity of model performance to the input combination is also a key indicator worthy of attention: a lower sensitivity keeps forecasts close to the optimal value even when the optimal input combination is not employed. Figure 5 displays the overall distribution of the performance metrics. Notably, for MAE and RMSE, a smaller median and shorter box represent higher accuracy and stability, whereas for NSE, a larger median and shorter box represent more satisfying forecasts and lower sensitivity. For Huanren, ANN was highly sensitive to the input combination, with a large box height for each performance metric. CNN and ELM had nearly the same stability, both better than ANN, and CNN outperformed ELM with smaller medians of MAE and RMSE and a larger median of NSE. For Xiangjiaba, a similar result was observed.

In this study, CNN was applied to long-term streamflow forecasting for the first time. In the application to Huanren, CNN outperformed the comparative models with better forecast results and more stable forecasting ability, and similar results were obtained for Xiangjiaba. Therefore, CNN can be used for monthly streamflow forecasting, and its feature extraction ability is effective. For most data-driven models (e.g., ANN and ELM), a careful selection of the input variables is required, whereas for CNN such work is completed automatically by its feature extraction module; only arranging the candidate variables (e.g., rainfall, streamflow, and ACFs) into input matrices is required. This advantage should not be underestimated, as it saves substantial effort, especially when forecasting for numerous catchments.
Further, incorporating this advantage into other data-driven models, thereby endowing them with feature extraction ability, is a meaningful direction for future study.
Although encouraging forecast results were achieved for Huanren and Xiangjiaba, CNN has some shortcomings that require mitigation. For example, it tends to underestimate or overestimate streamflow peaks and yields relatively poor forecasts for highly irregular streamflow in some areas. In addition, its forecasting performance in small watersheds (e.g., those with an area below 200 km²) requires further verification.

Conclusions
In this article, we explored the potential of CNN for monthly streamflow forecasting. The monthly streamflow and rainfall data from Huanren Reservoir and Xiangjiaba Hydropower Station, together with ACF data from the National Climate Center, China, were employed for model training, validation, and testing. Nine cases of different CNN structures were designed, with ACFs, rainfall, and streamflow delayed by 1 to 36 months as inputs. ANN and ELM were employed as comparison models. The results demonstrated that CNN could be successfully applied to forecast monthly streamflow: CNN outperformed ANN and ELM with smaller MAE and RMSE and higher NSE. Under changes in model structure and input combination, CNN showed better stability in forecasting accuracy than ANN and ELM. The results also demonstrated that CNN, ANN, and ELM yielded satisfactory forecasts for Xiangjiaba but were unable to maintain their accuracy for Huanren. Overall, the results and analysis presented in this study demonstrate that CNN is a superior alternative to ANN and ELM in monthly streamflow forecasting.
Author Contributions YP designed the study. XS performed the research and wrote the initial draft of manuscript. WD analyzed the data and made revisions to the draft. ZW contributed to the revisions. JW contributed to the revisions. ML contributed to the revisions.

Declarations
Ethical Approval Not applicable.
Consent to Participate Not applicable.