A Hybrid Model Integrating Elman Neural Network with Variational Mode Decomposition and Box–Cox Transformation for Monthly Runoff Time Series Prediction

Precise and reliable monthly runoff prediction plays a vital role in the optimal management of water resources, but the nonstationarity and skewness of monthly runoff time series can pose major challenges for developing appropriate prediction models. To address these issues, this paper proposes a novel hybrid prediction model by introducing variational mode decomposition (VMD) and Box–Cox transformation (BC) into the Elman neural network (Elman), named the VMD-BC-Elman model. First, the observed runoff is decomposed into sub-time series using VMD for better frequency resolution. Second, the input datasets are transformed into a normal distribution using Box–Cox, and as a result, skewedness in the data is removed, and the correlation between the input and output variables is enhanced. The proposed VMD-BC preprocessing technology is expected to overcome the problems arising from nonstationary and skewed runoff data. Finally, Elman is used to simulate the respective sub-time series. The proposed model is evaluated using monthly runoff time series at Zhangjiashan, Zhuangtou and Huaxian hydrological stations in the Wei River Basin in China. The model performances are compared with those of single models (SVM, Elman), decomposition-based (VMD-SVM, VMD-Elman et al.) and BC-based models (BC-SVM and BC-Elman) by employing four metrics. The results show that the hybrid models outperform single models, and the VMD-BC-Elman model performs best in all considered hybrid models with an NSE greater than 0.95, R greater than 0.98, NMSE less than 4.7%, and PBIAS less than 0.4% in both the training and testing periods. The study indicates that the VMD-BC-Elman model is a satisfactory data-driven approach to predict nonstationary and skewed monthly runoff time series, representing an effective tool for predicting monthly runoff series.


Introduction
Accurate and reliable runoff prediction is of great significance for water resources planning and management, flood control and drought warning, which has attracted extensive attention as a vital and difficult topic in hydrology modeling (Peng et al. 2011). However, runoff time series show complex nonlinearity, nonstationarity and skewness (Barge and Sharif 2016), which are attributable to climate changes and frequent human activities, such as the variations in precipitation, conservancy projects, and urbanization. As a result, it is a great challenge to accurately simulate the intrinsic dynamic process of inconsistent runoff series (Feng et al. 2020). In recent years, a multitude of runoff prediction models have been developed to model runoff time series, and representative models generally include two categories: process-driven models and data-driven models. Process-driven models, which are based on strict physical mechanisms, can successfully characterize rainfall-runoff processes with complex mathematical formulas, accurate meteorology and hydrological knowledge (Troin and Caya 2014). However, there have been key modeling controversies about issues such as the adequacy of process parameterizations, data limitations and uncertainty, and computational constraints on model analysis (Demirel et al. 2009;Clark et al. 2017), which may lead to poor performance of process-driven models. In contrast, data-driven approaches, which are based on hydrological statistics and machine learning, can achieve satisfactory performance without too much information input. Datadriven models can be categorized into three main methods: statistical methods, artificial intelligence methods (AI) and hybrid models (Myronidis et al. 2018;Kuremoto et al. 2014;Tiwari and Adamowaki 2013). Statistical models, such as autoregressive models (AR) (Sarlak 2008), autoregressive moving average models (ARMA) (Pan et al. 2021) and so on, have been widely applied to capture the stationary and linear relationships between variables; however, they are not suitable for modeling nonstationary and nonlinear runoff time series. AI models, i.e., neural network models (Sedki et al. 2008), support vector machines (Aggarwal et al. 2012), and random forest (Bojang et al. 2020) have been extensively applied in hydrology prediction. Compared to conventional regression models, AI models have greatly improved prediction performance for highly nonlinear runoff time series. As a typical recurrent neural network model, the Elman neural network (Elman), equipped with a time delay operator, has short-term memory ability and is very suitable for time series forecasting, and parameter selection is simple (Krishnan et al. 2019). Support vector machine (SVM), based on the Vapnik-Chervonenkis dimension and structural risk minimization, can map data from input space to high-dimensional space in terms of a kernel function selected to suit the problem and attain the optimal solution for regression issues (Karamouz et al. 2009). However, standalone AI models have a limited ability to identify the intrinsic nonstationary features of the input data, resulting in problems that the Elman model is inclined to fall into a local optimum, overfitting for, and the SVM model strongly depends on the parameter selection (Wang et al. 2013). Therefore, a hybrid model that combines AI methods with data processing techniques has become the most popular option for predicting nonstationary runoff time series (Wen et al. 2019).
In recent years, signal decomposition algorithms, used as data processing techniques, have been widely developed to effectively extract the multiple frequency information hidden from complex runoff series, and the generalizability of the data-driven model can be improved (Zhang et al. 2015). Empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD) are commonly employed as time-frequency decomposition algorithms. Among them, EMD has strong adaptability due to a self-adaptive data-driven tool, but it lacks bases for rigorous mathematical theories and is prone to edge effects and ensuing modal aliasing (Huang et al. 2003;Sankaran and Reddy 2016). As an improved EMD model, the EEMD algorithm is proposed by adding white noise into a series of intrinsic mode functions (IMFs), which eliminates effective mode aliasing (Wu and Huang 2009). However, the amplitude of the added white noise in the EEMD model is determined by expert experience, and the modal components cannot be controlled adaptively, which may lead to information loss or incomplete decomposition (Yu et al. 2018). To address these issues, variational mode decomposition (VMD), which has a sound mathematical basis, strong anti-aliasing ability and good decomposition potential, is introduced to recursively decompose original nonstationary series into a group of relatively stable subseries (Drcgomiretskiy and Zosso 2014). As a new signal decomposition method, VMD has been applied to speech recognition, fault analysis and hydrology forecasting in recent years, and the models have shown satisfactory performance for signal decomposition (Deb et al. 2020;Mohanty et al. 2018;Li et al. 2020).
The above hybrid models can process the nonstationary characteristics of runoff time series using decomposition, but the influence of runoff skewness on forecasting performance has rarely been explored by researchers. Normalization transformation techniques of data processing are worthy of more attention. Box-Cox transformation can remove the skewness of runoff data and transform the data into a more normal distribution, which significantly stabilizes the variance in the runoff time series, improves the normality and linearity of the data series, reduces the probability of pseudoregression, and effectively improves the correlation between the input and output variables. Box-Cox has achieved good normality in hydrology frequency analysis and calculation (Vasiliades and Loukas 2009;Seong 2014), but it has rarely been employed in runoff prediction models.
In summary, AI models, especially the Elman neural network, have rarely been integrated with VMD and Box-Cox methods for runoff prediction. Therefore, to further improve the prediction accuracy, a novel hybrid VMD-BC-Elman model is proposed for monthly runoff prediction in this paper. The objectives of this study are as follows: (1) VMD is used to decompose the original nonstationary monthly runoff time series into several relatively stationary sub-time series, (2) the Box-Cox transformation is employed to normalize the candidate input variables for each sub-time series, (3) a novel hybrid model incorporating VMD, Box-Cox, and Elman is constructed and tested using the monthly runoff data collected from Wei River Basin; and (4) the robustness and generalizability of the proposed hybrid model is evaluated by comparing it to other combinations of models.

Variational Mode Decomposition
Variational mode decomposition (VMD), a novel nonrecursive signal processing algorithm proposed by Dragomiretskiy and Zosso, is utilized to adaptively decompose a complex nonstationary signal into a set of discrete bandwidth-limited modes in a spectral domain by Wiener filtering (Dragomiretskiy and Zosso 2014). The original runoff time series f (t)(t = 1, 2, .., n) , regarded as a nonstationary signal, is decomposed into K different modal functions k (t) (k = 1,2,…,K) with central frequency ω k . The bandwidth and central frequency of each mode can be estimated by seeking the optimums of the constrained variational modes. The objective function is that the sum of the estimated bandwidths of all subseries is minimized, while the constraint satisfies that the sum of each decomposed mode is equal to the original signal, which can be expressed as: where k is the set of modes, k is the set of central frequencies of modes, t represents the Dirac distribution, K is the number of modes, and ⊗ denotes a convolution operation.
The above-constrained problem can be transformed to an unconstrained variational problem by introducing a Lagrange multiplier and quadratic penalty to construct the augmented Lagrange term L: where α and λ denote the penalty parameter and Lagrange multiplier, respectively.
The alternating direction method of multipliers (ADMM), an effective splitting method for separable optimization problems (Feng et al. 2020), is employed to seek out the saddle points of the augmented Lagrange term, and therefore, the optimums of the decision variables k and k are presented as follows: where n is the number of iterations, is the update parameters of Lagrange multipliers, and ̂ n+1 k ( ), f ( ) , ̂ n i ( ) , and ̂ n+1 ( ) represent the fast Fourier transforms of n+1 k (t) , f(t) , n k (t) , and n (t) , respectively.

Box-Cox Transformation
The Box-Cox power transformation, proposed by Box and Cox in (1964), is applied to stabilize variance and remove skewness in time series and convert them into a more normal-like distribution (Box and Cox 1964;Hamasaki and Kim 2006). Assuming y i (i = 1, 2, …n) is a positive random variable, the Box-Cox transformation with parameter is defined as: (1) where y i is the original signal (y i > 0), y ( ) i is the transformed signal, and λ denotes the transform coefficient.
To estimate the parameters of the model, the transformed maximum likelihood function is adopted as follows (Hamasaki and Kim 2006): where L denotes the likelihood function and 2 is the variance in y ( ) i . Finally, the value of λ can be obtained by employing the optimization algorithm to solve Formula (5).

Elman Neural Network
The Elman neural network, first proposed by Elman in 1990, is a dynamic recurrent neural network with feedforward connections (Chandra 2015). The Elman network consists of four layers: an input layer, a hidden layer, a context layer, and an output layer.
Compared with feedforward neural networks, the additional context layer in the Elman neural network is employed to record the output information of the previous moment from the hidden layer and relay the former state in the current iteration, so the Elman neural network is sensitive to the historical input information and more suitable for time series forecasting (Mehrgini et al. 2019). Since the runoff time series have short-term dependency, the Elman neural network is suitable for runoff forecasting. The architecture is shown in Fig. 1.
The mathematical form can be expressed as ): Fig. 1 Architectural graph of Elman neural network where I t , H t , C t and O(t) denote the input, hidden, context and output layer vectors, respectively. ω 1 , ω 2 and ω 3 are weights connecting the other two adjacent layers. g and f are the activation functions of the hidden layer and the output layer, respectively. b 1 and b 2 represent the bias vector.

Proposed VMD-BC-Elman Model
According to the abovementioned methods, the hybrid model is proposed and diagrammed in Fig. 2. The modeling procedures are described as follows: Step 1: The original runoff data are decomposed into k sub-time series that are bandwidth-limited with a center frequency using VMD.
Step 2: Each sub-time series from step 1 is reconstructed using phase space reconstruction (PSR) to generate candidate input variable series for predicting models, and each reconstructed candidate input variable series is normalized by Box-Cox.
Step 3: For each candidate input variable series transformed by Box-Cox from step 2, LASSO is employed to rank and select the optimal number of input variables for the Elman model. Step 4: Each sub-time series decomposed by VMD is divided into a training dataset and a testing dataset. The parameters can be calibrated by inputting the selected input variables into the Elman model, and then the trained model is tested using the testing dataset.
Step 5: The final estimates of runoff time series are obtained by summing up the subtime series estimated from the predicting models.

Model Performance Evaluation
The Nash-Sutcliffe efficiency coefficient (NSE) (Nash and Sutcliffe 1970), the coefficient of correlation (R), the normalized mean square error (NMSE) (Himanshu et al. 2017), and the percent of the bias error (PBIAS) (Wen et al. 2019) are adopted as performance metrics to evaluate the prediction performance of different models. These metrics are defined as follows: where Q o (t) and Q p (t) are the observed and predicted runoff, respectively, and Q o and Q p denote the mean of the observed and predicted monthly runoff time series, respectively. N is the number of Q p (t).

Study Area and Data Description
The Wei River originates from Weiyuan County in Gansu Province, extends more than 800 km and finally converges into the Yellow River (Jiang et al. 2019). The Wei River Basin (WRB) covers an area of approximately 13.5 × 10 4 km 2 between 104°00′-110°20′E and 33°50′-37°18′N. Topographically, the elevation of the WRB ranges from 336 m to 3,929 m , and a digital elevation model (DEM) of the WRB is presented in Fig. 3. The basin is located in arid and semiarid regions in China, affected by a continental monsoon, and the climate is warm and rainy in summer and cold and dry in winter. Precipitation and runoff, exhibiting extremely high intra-annual and interannual variability, mainly occur in the wet season from June to October, accounting for almost 65% of the annual precipitation and over 50% of the annual runoff, respectively (Zou et al. 2018).
Notably, the WRB has been an important agricultural and industrial region in Shaanxi Province, and runoff has been disturbed by human activities. Thus, the WRB is a suitable candidate to verify the feasibility of the proposed model for predicting nonstationary runoff time series disturbed by climate change and human activities. The monthly runoff time series employed in this paper were observed from the Zhangjiashan, Zhuangtou and Huaxian stations, which are the main control stations in the Jing River, Beiluo River and Wei River, respectively, and data were collected from 1/1933 to 12/2016, 1/1938 to 12/2016 and 1/1954 to 12/2016 in the published Hydrological Yearbook, respectively, as shown in Fig. 4. It is worth mentioning that the data from Zhangjiashan station are first employed to develop the hybrid model, and data from Zhuangtou and Huaxian stations are utilized to further validate the performance of the model. Table 1 presents the statistical characteristics of the runoff series. It is shown that Cv values are over 0.90 at the three stations, the runoff varies within a relatively large range, and Cs values are greater than zero, indicating that the runoff series have skewness.
The monthly runoff hydrographs are plotted in Fig. 4, which illustrates drastic intraand interannual variability and high nonstationary. The cumulative departure curve in  1996, 1994 and 1993 at Zhangjiashan, Zhuangtou and Huaxian stations, respectively. Considering that the training period of the model is expected to contain sufficient information on the runoff data after the variability points, the monthly runoff dataset is partitioned into the training subset (80%, before December 2000) and the testing subsets (20%, from January 2001 to December 2016), as shown in Fig. 4. In theory, the proposed model could capture the variability of the runoff time series during the training period.

Decomposition Results with VMD
K and α, subject to the constraints of the input runoff time series, are two key parameters of the VMD algorithm. To achieve satisfactory performance, the center frequency iteration is run to determine the number of sub-time series K. The results show that a convergence of the center frequency of the high-frequency sequence is present when K = 8-10, indicating that the decomposition has been completed and K = 8 is a determined value, which is consistent with an earlier study (Huang et al. 2016). According to the multiple preexperiment, α = 1,000 is determined. The default parameters will be adopted for other parameters of the VMD model, as shown in Table 2. The sub-time series of the original runoff series at Zhangjiashan station are obtained by VMD decomposition and are shown in Fig. 6.

Reconstruction and Transformation by PSR and Box-Cox
To explore the intrinsic characteristics of the runoff variation, each one-dimensional IMF (sub-time series) is reconstructed into a high-dimensional feature space using PSR. Considering the periodic nature of runoff over 12 months, when PSR dimension d = 12 and delay time =1 (Packard et al. 1980), the t-th space vector in the reconstructed phase space is expressed as X t = x t−1 , x t−2 , ⋯ , x t−12 t = 13, 14, ⋯ , n , where n is the length of the original monthly runoff series, and x t−1 , x t−2 , ..., x t−12 , respectively, denote 12 different dimensional series, namely, 12 candidate input variable series.
To avoid the unsatisfactory prediction performance arising from the skewness of runoff data, each candidate input variable series is converted into a normal distribution using Box-Cox. The optimal transformation coefficient λ is estimated by the maximum likelihood method, and the estimated λ values of each candidate input variable series for runoff time series are shown in Table 3.
Taking the original runoff series as a case to illustrate the normalization of Box-Cox, the probability density of the untransformed and transformed candidate input variables are shown in Fig. 7. It can be seen that the candidate input variable series transformed by Box-Cox exhibits a normal distribution.  Figure 8 is used to analyze the correlation between the output variables and the 12 candidate input variables transformed by Box-Cox. The correlation coefficients of the transformed data have been improved, indicating that Box-Cox strengthens the correlation of the data.

Input Variable Selection
Least absolute shrinkage and selection operator (LASSO) is a kind of variable selection method that performs well for handling multicollinearity issues. LASSO selects significant input variables by adjusting the value of tuning parameter ρ from large to small and controlling the number of nonzero coefficients of candidate input variables (Milani et al. 2016). The importance ranking and the optimal number of selected input variables are shown in Table 4. Taking the original runoff series as an example, the optimal number of selected input variables is 4 using the trial-and-error method; thus, the input variables are x t−1 , x t−12 , x t−11 , x t−2 .

Development and Training of the Forecasting Models
To verify the performance of the VMD-BC-Elman model, single models and other combinations of models are used for comparison in Fig. 9. The training of these models is described in this section.

Group 1: Single models
The standard Elman and SVM are individually used as basic models for developing the hybrid models. The activation functions and parameters of Elman are set utilizing the training datasets with a maximum number of training iterations of 500, a target error of 0.001, and a learning coefficient of 0.01. The hidden layer activation function employs Tansig, the output layer activation function uses Purelin, and the training algorithm adopts the L-M algorithm. The number of each layer is determined using the trial-and-error method, and the trained Elman architecture is presented in Table 5. In  Table 3 The λ values of each candidate input variables series for the original runoff series and each IMF at Zhangjiashan station the SVM model, the radial basis kernel function is selected with an insensitivity loss parameter of 0.01, a penalty factor of 1 and a kernel function parameter of 2.
Group 2: Decomposition-based hybrid models Decomposition-based hybrid models are developed by coupling EMD, EEMD and VMD with Elman or SVM. The original runoff series are decomposed into 6, 11 and 8 sub-time series by EMD, EEMD and VMD, respectively. Each sub-time series is modeled using the Elman or SVM. In Elman training, the target error is set at 0.00001, and the remaining parameters and the activation functions are constants as the single model has in Group 1. The three-layer architecture of each sub-time series is determined by Fig. 7 Probability density of the candidate input variables of the original runoff time series before and after Box-Cox transformation at Zhangjiashan Station the trial-and-error method, and the optimal number of inputs, with hidden and output neurons listed in Table 5. For SVM training, the number of input variables of each subtime series is selected in Table 4, and the other parameters are the same as those of the standard SVM.

Group 3: Box-Cox-based hybrid models
To reduce the negative effect of the skewed runoff data on the modeling, BC-based models are developed by introducing Box-Cox into Elman or SVM. The input variable series are transformed into a normal distribution using Box-Cox with the parameters λ in Table 3. The model architecture and parameters are the same as those of the single models.

Fig. 8 The correlation coefficients between the output and input variables before and after Box-Cox transformation at Zhangjiashan Station
Group 4: Hybrid models-VMD-BC-Elman and VMD-BC-SVM To further improve the models' performance and sufficiently overcome the problems arising from the nonstationary and skewed runoff data, the VMD-BC-Elman and VMD-BC-SVM are proposed by introducing Box-Cox and VMD into the Elman or SVM. In this section, the VMD-BC-Elman architecture of each sub-time series is the same as those of VMD-Elman, and the VMD-BC-SVM modeling parameters are the same as those of SVM.

Forecasting Results and Discussion for Zhangjiashan Station
Based on the training models above, runoff time series are modeled using the different models. The performance statistical metrics are listed in Table 6. The hybrid models are shown to be more accurate than their standalone counterparts, and the performance of VMD-BC-Elman is the best among all models.  In the standalone models, Elman and SVM have unsatisfactory prediction results with NSE < 0.6, lower R and higher NMSE and PBIAS, as shown in Table 6. Compared to SVM, Elman provides better general results in the training period because its recurrent feedback networks with short-term memory increase the ability to process dynamic information. However, the performance of Elman is unstable, which may lead to overfitting. The standalone SVM, which is based on the risk minimization principle, has more robustness and generalizability for estimating nonstationary and skewed time series than the Elman does. The Elman model provides better prediction results in the training period, and the SVM provides better prediction results in the testing period, which is consistent with other studies (Song et al. 2020;Feng et al. 2020).
Compared with the standalone models, the metric values of decomposition-based hybrid models are satisfactory, as shown in Table 6. The six models yield R values in the range of 0.7878 to 0.9843, NSE values from 0.5754 to 0.9592, NMSE values from 1.06% to 42.24%, and PBIAS values from 1.26% to 19.26% in the testing period; thus, the decompositionbased hybrid models are superior to the counterpart single models. The metric values of VMD-Elman are the best among the six models, which indicates that VMD-Elman outperforms the other five hybrid models and that VMD has a better ability to denoise runoff data than EMD and EEMD do, which agrees with the results of other researchers who have modeled runoff time series based on EMD EEMD and VMD decomposition (He et al. 2019).
To eliminate the skewness of runoff time series, BC-based models are introduced. Table 6 shows that the statistical metrics of the BC-SVM and BC-Elman models are much better than those of the single models. Taking the NMSE in the testing period as an example, the NMSEs for the BC-Elman, BC-SVM, Elman and SVM models are 18.3%, 24.84%, 92.91% and 66.93%, respectively. This illustrates that the performance of the BC-based hybrid models is superior to that of the single models, and BC-Elman is better than BC-SVM. The excellent performance of BC-based models in this paper is similar to the study on hydrology frequency analysis, which shows that Box-Cox can provide robustness in fitting models (Seong 2014).  The statistical metrics of VMD-BC-Elman and VMD-BC-SVM are shown in Table 6. In the testing period, the R values of the VMD-BC-Elman model are 0.04%, 6.05%, and 0.14% greater than those of the VMD-Elman, BC-Elman and VMD-BC-SVM models; its NSE is 0.08%, 17.65%, and 0.39% greater, respectively; its NMSE is 1.97%, 78.25%, and 8.50% smaller, and its PBIA is 68.29%, 96.54%, and 46.58% smaller, respectively. The statistical metric values demonstrate that although VMD or Box-Cox alone can improve the ability of a single Elman, the integration of VMD and Box-Cox into Elman can promote their prediction to a new level, with VMD-BC-Elman outperforming VMD-BC-SVM. Considering these results, it is evident that the proposed VMD-BC-Elman has a more stable and consistent performance. It is affirmed that the AI models, combining VMD and Box-Cox, represent a significant improvement in runoff prediction.
While the statistical metrics can evaluate the overall performance of models, hydrographs and scatter plots can further identify the temporal correspondence. Figures 10 and 11 display the hydrographs and scatter plots of the observed and predicted runoff from the different models in the testing periods. In Figs. 10a and 11a-b, the single models cannot accurately capture the observed runoff, especially for the peak flow, and the scatter plots of the observed and estimated runoff are more dispersed than the hybrid models. Notably, in Figs. 10b-c and 11c-j, the estimated values generated by the decomposition-based and BC-based hybrid models are closer to the corresponding observed values, and the scatter plots are clustered closer to the ideal fit. In Figs. 10d and 11k-l, it is also obvious that VMD-BC-Elman and VMD-BC-SVM have greater accuracy as the scatter plots are concentrated closer to the ideal fitting line, illustrating that the VMD-BC-based hybrid models are more effective in estimating the peak values of data than other models. In particular, VMD-BC-Elman has the best capability to capture information on the overall features of runoff time series. Therefore, it is also reaffirmed that VMD and Box-Cox can effectively improve the stability and consistency of the models, and VMD-BC-Elman exhibits the strongest performance.
In Fig. 12, the boxplot diagram is designed to depict the error distribution between the estimated and observed values in the entire tested dataset to further verify the prediction performance of all the models. It can be seen from Fig. 12 that the predicted errors, including the first, median and third quartiles and minimum and maximum nonoutliers, are smallest when VMD-BC-Elman is employed compared with the compared models. This indicates that VMD and Box-Cox enhance the robustness of the models, and VMD-BC-Elman is superior to the other models, which is consistent with the above findings.
The Taylor diagram shown in Fig. 13, integrating multiple characteristics of models into a compact plot (Taylor 2001), can present the performance of the models comprehensively. It shows how close or far away the simulated values of each model are to or from the observed runoff data in the testing period, quantified via the correlation coefficient, standard deviation and root mean square error in the Taylor diagram. It is evident that VMD-BC-Elman is the closest to the observed reference point with the lowest RMSD, the highest correlation coefficient and the lowest SD, followed by the VMD-BC-based and the VMDbased models, while the single models are positioned the furthest from the reference point. Obviously, VMD-BC-Elman performs best for monthly runoff prediction, which can be explained by the fact that the provision of VMD and Box-Cox contributes to the improved stability and consistency of the models and avoids overfitting.

Forecasting Results and Discussion for Zhuangtou and Huaxian Stations
To further verify the feasibility of the VMD-BC-Elman, the runoff at the Zhuangtou and Huaxian stations is modeled. In Tables 7 and 8, the hybrid models are better than the single models in the testing period. VMD-based models are generally more accurate and stable than EMD-based and EEMD-based models. Additionally, the BC-based models outperform the single models. Obviously, VMD-BC-Elman and VMD-BC-SVM outperform the VMDbased and BC-based models, and VMD-BC-Elman is superior to VMD-BC-SVM, which is similar to the results of Zhangjiashan station, so it is safe to conclude that VMD-BC-Elman has the best forecast performance in this paper.
In Fig. 14, the proposed hybrid VMD-BC-Elman exhibits the best fit for the observed runoff compared with other models, and estimates by the VMD-BC-Elman are closest to the corresponding observed values and follow the same trend in all plots. Their scatter plots are closest to the ideal line, especially around the peak flow, which also indicates that VMD-BC-Elman has the best predictive skills. In conclusion, VMD-BC-Elman exhibits the best performance in terms of accuracy, stability and consistency, and the combination of VMD and Box-Cox has great potential in monthly runoff prediction areas.

Fig. 13
Taylor diagram depicting the predictive ability of 12 models in the testing period at Zhangjiashan station

Conclusions
To estimate nonstationary and skewed monthly runoff series, a novel VMD-BC-Elman model was developed in this paper, and runoff data from Zhangjiashan, Zhuangtou and Huaxian stations in the Wei River Basin were used. The steps of this model are as follows.
(1) The original nonstationary monthly runoff time series were decomposed into a set of relatively stationary sub-time series IMFs by VMD.
(2) Each IMF was reconstructed into a 12-dimensional phase space by PSR, and each candidate input variable series was normalized using Box-Cox. (3) LASSO was employed to determine the optimal number of input variables. (4) Each sub-time series was simulated by Elman. The final results are generated by summing the estimates of all the sub-time series.
The proposed VMD-BC-Elman exhibited the best precision, stability and consistency for predicting the nonstationary and skewed monthly runoff series. The remarkable advantages were demonstrated in the following aspects. (1) The proposed model with a perfect denoising ability can effectively distinguish the modalities of the complex nonstationary series and adaptably decompose the original series into the relative stationary sub-time series using VMD.
(2) It can remove the skewness of the original runoff series using Box-Cox and convert the skewed candidate input variables into a normal distribution, which can enhance the correlation between the input and output variables and improve the mapping ability and self-learning ability of Elman. (3) Integrating decomposition-normalization-simulationreconstruction techniques, VMD-BC-Elman can effectively identify and separate the different modal features, distinguish and construct the different distribution features of the runoff time series, and accurately summarize the optimal estimates of sub-time series to achieve better prediction for the monthly runoff. Overall, VMD-BC-Elman represents an effective tool for forecasting nonstationary and skewed monthly runoff series.
Despite the remarkable performance of VMD-BC-Elman over comparative models, the study has limitations that create opportunities for further research. First, the prediction results of high IMF are unsatisfactory, possibly due to too much noise. Second, the proposed model only considers one-step prediction, and multistep prediction has not yet been discussed. Finally, the proposed model does not consider the physical mechanism of runoff formation. The following aspects are worthy of further research. First, the high-and low-IMFs are expected to develop different models to more effectively predict the different frequency characteristics. Second, it will be useful to explore a model utilizing a multistep prediction approach. Third, a physically-based hydrology model can be introduced into the current VMD-BC-Elman, so this data-driven model can combine the strength of the selflearning ability with an enhanced physical mechanism.