Optimum Application of Hybrid Data Driven Models With Two Step Verification Method at Mangla Watershed, Pakistan

In this study, three decomposition methods (DMs), i.e., empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD) and improved complete ensemble empirical mode decomposition with additive noise (ICEEMDAN), were coupled with two artificial intelligence and machine learning (AI-ML) based methods, i.e., multilayer perceptron (MLP) and support vector regression (SVR), to develop six fundamental hybrid models that predict streamflow with a one-month lead time. The developed models were categorized into runoff models (RMs) and rainfall-runoff models (RRMs). Results indicated that (i) among the standalone models (SMs), SVR performs better than MLP, (ii) the DMs are able to improve the accuracy of the SMs, and (iii) the RRMs showed greater accuracy than the RMs throughout the investigation. To compare model performances, flow hydrographs (FHG) were generated, and five performance evaluation criteria (PEC) were used to quantify model precision. A two-step verification method, i.e., the extreme value analysis (EVA) and least value analysis (LVA) approaches, was proposed to verify the performances. Among all developed hybrid models (HMs), i.e., EMD-(MLP, SVR), EEMD-(MLP, SVR) and ICEEMDAN-(MLP, SVR), the rainfall-runoff ICEEMDAN-SVR model was selected as the optimal model for Mangla watershed, Pakistan, with MAE (59.56), RMSE (91.82), R (0.97), MAPE (8.75), and NSEC (0.97).


Accurate streamflow forecasting is vital for proper water management and for preventing long-term monetary or fiscal disruptions. As a result, streamflow forecasting has gained considerable attention. In the water research sector, three subsets of AI have been used extensively: (1) evolutionary computation, (2) fuzzy logic, and (3) machine learning methods and classifiers (Vapnik 2000). Artificial neural networks (ANNs), with their good nonlinear mapping capability, have been used effectively in various fields, including hydrology and water resource management (Huo et al. 2012). They affirm that the multilayer

However, ANNs have drawbacks such as slow learning speed, dimension shifting, and a susceptibility to local minima, and they also tend to overfit the data (Shamseldin et

However, to the best of the author's knowledge, no study has focused on streamflow prediction using data-driven and decomposition techniques in the selected study area (Mangla watershed), which is vital for planning water release patterns for sustainable agricultural development and hydropower generation. Therefore, the present study aims to develop several hybrid models by combining traditional AI-ML methods, namely MLP and SVR, with DMs, namely EMD, EEMD and ICEEMDAN, to simulate the monthly streamflow in Mangla watershed, Pakistan. For the selection of input parameters, the autocorrelation function (ACF) was used and the partial autocorrelation function

MIV. Developed models with SIV and MIV can be categorized as runoff models (RMs) and rainfall-runoff models (RRMs), respectively. Furthermore, an extension of the extreme value analysis (EVA) and least value analysis (LVA) approaches was adopted to verify the findings for each model in both the calibration and validation phases (CVP). The accuracy of the forecasting models in this study is measured using five error metrics (MAE, RMSE, MAPE, R, and NSEC).
The constructed models were verified for predicting streamflow with a 1-month time lag at one hydrological station and eleven meteorological stations over Mangla watershed, Pakistan.
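The five error metrics named above have standard textbook definitions, and the sketch below computes them for paired observed and simulated monthly flows. It assumes that R is the Pearson correlation coefficient, that MAPE is expressed in percent, and that NSEC is the Nash-Sutcliffe efficiency coefficient; the paper's own formulas are not reproduced in this section, and the function name `evaluate` is hypothetical.

```python
import math


def evaluate(observed, simulated):
    """Return MAE, RMSE, MAPE (%), R and NSEC for paired flow series.

    Assumed definitions: R = Pearson correlation, NSEC = Nash-Sutcliffe
    efficiency. MAPE assumes no observed flow is zero.
    """
    n = len(observed)
    if n == 0 or n != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    errors = [s - o for o, s in zip(observed, simulated)]
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n

    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n

    # Pearson correlation between observed and simulated flows.
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    r = cov / math.sqrt(var_obs * var_sim)

    # Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean.
    nsec = 1.0 - sse / var_obs
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R": r, "NSEC": nsec}
```

For example, `evaluate([100.0, 200.0, 300.0, 400.0], [110.0, 190.0, 310.0, 390.0])` gives MAE = RMSE = 10 and an NSEC of approximately 0.992.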

The rest of the paper is structured as follows: Section 2 presents an overview of the methodology discussed above, which includes the AI-ML based models (MLP and SVR) and the DMs (EMD, EEMD and ICEEMDAN); Section 3 explains the study area and data in detail; Section 4 presents and examines the case study outcomes; and Section 5 draws the conclusions of this study.

Owing to a lack of data in India, the research region was limited to a catchment that ran

The loss function between the actual value and the optimal output can be represented as

where  denotes the actual value, ℎ denotes the output value, and ‖·‖ denotes the distance norm.
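The loss between the actual value and the network output can be illustrated with a one-hidden-layer perceptron. This is a minimal sketch that assumes the common half-squared-Euclidean form ½‖y − ℎ‖², a sigmoid hidden layer and a linear output unit; the function names and weights are hypothetical and do not represent the paper's network configuration.

```python
import math


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer perceptron (hypothetical): sigmoid hidden units, linear output."""
    hidden = []
    for weights, bias in zip(w_hidden, b_hidden):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        hidden.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid activation
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def squared_norm_loss(actual, outputs):
    """Half the squared Euclidean distance between actual and output vectors."""
    return 0.5 * sum((y - h) ** 2 for y, h in zip(actual, outputs))
```

With zero inputs and zero hidden biases each sigmoid unit outputs 0.5, so `mlp_forward([0.0, 0.0], [[1.0, 1.0], [1.0, 1.0]], [0.0, 0.0], [1.0, 1.0], 0.0)` returns 1.0, and the loss against an actual value of 2.0 is 0.5. Training would adjust the weights to reduce this loss, for example by backpropagation.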

In the 1990s, Vapnik (Vapnik 1999) developed the support vector machine (SVM), which is primarily regarded as a classification method but has also been applied to regression. SVMs are designed for

EMD method can be expressed as follows:

where q is the original streamflow data set, qmin represents the minimum value in the original data set, and qmax represents the maximum value in the original data set.
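A normalization defined through q, qmin and qmax is most commonly the min-max form (q − qmin)/(qmax − qmin), which maps flows onto [0, 1]. Assuming that form (the equation itself is not reproduced in this section), a sketch with its inverse, used to convert model outputs back to flow units, could look like this; the function names are hypothetical.

```python
def min_max_normalize(q):
    """Scale a flow series to [0, 1] using its minimum and maximum (assumed form)."""
    q_min, q_max = min(q), max(q)
    if q_max == q_min:
        raise ValueError("series is constant; cannot normalize")
    return [(v - q_min) / (q_max - q_min) for v in q], q_min, q_max


def denormalize(scaled, q_min, q_max):
    """Map scaled model outputs back to the original flow units."""
    return [s * (q_max - q_min) + q_min for s in scaled]
```

For example, `min_max_normalize([50.0, 150.0, 250.0])` returns `([0.0, 0.5, 1.0], 50.0, 250.0)`. Taking qmin and qmax from the calibration period and reusing them for the validation period keeps validation data from influencing the scaling.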

To first deal with the input selection problem, a suitable input vector for the hybrid models is

Table 2 and Table 3, respectively.

Figure 6 (a, b) for CP and Figure 4 (a, b)
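One common way to turn the autocorrelation function into candidate input lags is to keep the lags whose sample autocorrelation exceeds the approximate 95% significance bound ±1.96/√N. The sketch below assumes that rule; the paper's actual selection criterion and its PACF step are not shown in this section, and the function names are hypothetical.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation of a series at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def significant_lags(series, max_lag):
    """Lags whose |ACF| exceeds the approximate 95% bound 1.96 / sqrt(N)."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag), start=1) if abs(r) > bound]
```

For a strongly alternating series such as `[1.0, -1.0] * 10`, the lag-1 autocorrelation is -0.95, and lags 1 to 3 all exceed the bound of about 0.44, so all three would be retained as candidate inputs.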

To address the shortcomings of standalone models such as MLP and SVR, three

Table 2.

results at VP for all developed models are shown in Figure 5 and Figure 6, respectively. RMs and RRMs at both CVP are shown in Table 3.
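The hybrid models in this study pair a decomposition step with a standalone predictor: the flow series is split into components, each component is forecast separately, and the component forecasts are summed. The sketch below shows only that structure. The decomposition is a hypothetical centered moving-average split standing in for EMD, EEMD or ICEEMDAN (it shares only the property that the components add back to the original series), and the per-component model is a naive persistence forecast standing in for MLP or SVR.

```python
def moving_average_split(series, window=3):
    """Stand-in decomposition: centered moving-average trend plus remainder.

    This is NOT EMD/EEMD/ICEEMDAN; it only shares the property that the
    components sum back to the original series.
    """
    n = len(series)
    half = window // 2
    trend = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    remainder = [v - m for v, m in zip(series, trend)]
    return [remainder, trend]


def persistence_forecast(component):
    """Stand-in one-step-ahead model: repeat the component's last value."""
    return component[-1]


def hybrid_forecast(series, decompose=moving_average_split, model=persistence_forecast):
    """Decompose the series, forecast each component, and sum the forecasts."""
    components = decompose(series)
    return sum(model(c) for c in components)
```

`hybrid_forecast([1.0, 2.0, 3.0, 4.0, 5.0])` returns 5.0, because persistence applied to components that sum to the series reproduces its last value. Swapping in a real decomposition for `decompose` and a trained regressor for `model` is what would turn this skeleton into an EMD-(MLP, SVR) style model.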