An efficient hybrid grey wolf optimization-based KELM approach for prediction of the discharge coefficient of submerged radial gates

Accurate prediction of the discharge coefficient of radial gates is a challenging hydraulic problem, particularly under highly submerged flow conditions. Taking advantage of the kernel extreme learning machine (KELM), this study proposes a grey wolf optimization-based KELM (GWO-KELM) for effective prediction of the discharge coefficient of submerged radial gates. Support vector machine (SVM) and Gaussian process regression (GPR) methods are also applied for comparison. To build the GWO-KELM, GPR, and SVM prediction models, an extensive experimental database of 2125 data samples gathered by the US Bureau of Reclamation was used. The simulation results show that the proposed GWO-KELM approach with a radial basis function (RBF) kernel, using the submergence ratio and the ratio of the downstream flow depth to the gate opening as inputs, provides the best performance, with a correlation coefficient (R) of 0.983, a determination coefficient (DC) of 0.966 and a root-mean-square error (RMSE) of 0.027. The proposed GWO-KELM with the RBF kernel gives better prediction accuracy than the GPR and SVM approaches. Furthermore, the employed kernel-based methods are capable of predicting the discharge coefficient under varied submergence conditions with a satisfactory level of accuracy. Among them, the proposed hybrid GWO-KELM method gave the most accurate results (R = 0.873, DC = 0.744, and RMSE = 0.035) for extremely highly submerged flow. Moreover, the kernel-based methods give better predictions than the previously developed dimensionless formulas.


Nomenclature
Q  Flow discharge (m³/s)
Cd  Discharge coefficient
y1  Upstream flow depth (m)
y2  Flow depth at vena contracta (m)
y3  Downstream flow depth (m)
L  Channel width (m)
w  Gate opening (m)
a  Gate trunnion height above the invert (m)
θ  Gate leaf angle from horizontal (°)
Re  Reynolds number
(y1 − y3)/w  Submergence ratio
y1/R  Ratio of upstream flow depth to hydraulic radius
y3/R  Ratio of downstream flow depth to hydraulic radius
y3/w  Ratio of downstream flow depth to gate opening
μ  Water viscosity
ρ  Water density
g  Gravitational acceleration (m/s²)

Introduction
Radial gates are among the most common water control structures in use today and can be found in most irrigation networks around the world. They are built from a curved skin plate with a structural steel frame and play an essential role in facilitating water delivery, maintaining river environments, and controlling floods. Radial gates are more cost-effective and simpler to operate and maintain than sluice gates (Clemmens et al. 2003; Shahrokhnia and Javan 2006). The performance of radial gates as check structures varies with the flow condition (i.e., free or submerged flow). Under free flow, a free hydraulic jump develops downstream of the radial gate, and the flow is affected only by the upstream flow depth. As the tail-water depth increases, the jet emanating from the gate changes to partially submerged flow with an incomplete submerged jump and then to totally submerged flow with no jump. The discharge coefficient of a gate is the basis for determining the gate opening, which is of considerable importance for accurate simulation and control of canal flow. Furthermore, accurate determination of discharge is crucial for informing water-saving policies and efficient operational management. Numerous studies have attempted to improve the discharge calibration of radial gates. An early study by Metzler (1948) developed a series of charts to calculate the discharge coefficient of submerged radial gates. Toch (1955) and Buyalski (1983) used empirical relationships to describe changes in the discharge coefficient under both free and submerged flow conditions. Clemmens et al. (2003) acknowledged the shortcomings of empirical relationships for calibrating submerged radial gates and developed the Energy-Momentum (E-M) method to calibrate the discharge through a single radial gate.
Using Buyalski's experimental data, Wahl (2005) introduced the relative gate opening to modify the correction factor. Shahrokhniya and Javan (2006) and Zahedani et al. (2012) derived equations for the submerged flow rate based on the downstream flow depth using dimensional analysis and nonlinear regression. Abdelhaleem (2016) studied discharge estimation for three submerged parallel radial gates and found that dimensional analysis with the incomplete self-similarity concept outperforms other calibration methods. In practice, however, the aforementioned calibration methods performed inadequately under highly submerged flow (Guo et al. 2020). More recently, Guo et al. (2021) proposed new criteria to subdivide submerged flow into partially and totally submerged flow. Using a new discharge calibration method, termed the identification method, they reported that accounting for this classification can substantially improve model accuracy for totally submerged flow.
Where classical methods fail to provide consistent success because of the complex and uncertain hydraulics of submerged radial gates, machine learning (ML) methods can be reliable tools for estimating the discharge of radial gates. The efficiency and speed of ML methods in modeling complex hydraulic problems have made them among the most widely used approaches for predicting the discharge coefficient of different types of weirs. Promising applications of artificial neural networks (Bilhan et al. 2011; Salmasi et al. 2013; Parsaie and Haghiabi 2015; Karami et al. 2018), support vector machines (Azamathulla et al. 2016), and adaptive-network-based fuzzy inference systems (Shamshirband et al. 2016; Haghiabi et al. 2018) have been reported in the literature. More recently, deep learning techniques have been employed for accurate prediction of the discharge coefficient of weirs (Chen et al. 2022). Deep learning is a class of neural networks that improves modeling accuracy by increasing the number of layers (Dahou et al. 2022).
The kernel extreme learning machine (KELM) (Huang et al. 2011) is a single-hidden-layer feedforward neural network based on the extreme learning machine (ELM) (Huang et al. 2006); it integrates kernel functions into ELM to improve its generalization ability in prediction problems. The excellent prediction performance of KELM has encouraged researchers to apply this kernel-based method to predict the discharge coefficient of piano key weirs (Roushangar et al. 2021a, b) and elliptical side orifices (Karbasi 2021).
On the other hand, a wide variety of meta-heuristic algorithms have been developed and used to solve constrained optimization problems. Nature-inspired algorithms such as the genetic algorithm (GA), particle swarm optimization (PSO) and ant colony optimization (ACO) are attractive because they do not rely on restrictive mathematical assumptions about the optimization problem and offer better global search capability than conventional optimization algorithms. Combining the merits of different evolutionary algorithms has also been suggested to improve computational efficiency. Outstanding performance has been reported for hybrid genetic algorithm and particle swarm optimization (Garg 2016; Moslehi et al. 2020), the fractional-order Caputo Manta-Ray Foraging Optimizer (Yousri et al. 2022), a hybrid gravitational search algorithm and genetic algorithm (Garg 2015), a modified whale optimization algorithm (Al-qaness et al. 2021), an improved teaching-learning-based optimization hybridized with Harris hawks optimization (Kundu and Garg 2021), and a modified Aquila optimizer.
With the rapid development of ML and meta-heuristic methods, many integrated intelligent systems have been proposed. Coupling ML algorithms with meta-heuristic methods is an effective way to optimize their parameters and achieve the best performance. In this context, Roushangar and Shahnazi (2019) combined KELM with the PSO algorithm to predict the bed load transport rate of gravel-bed rivers. Liao et al. (2019) proposed a prediction method based on the grey wolf optimization (GWO) algorithm and KELM, and their hybrid model achieved greater accuracy in predicting the displacement of step-like landslides. A number of other studies also describe the significant role of the GWO-KELM method in modeling various phenomena (Wang et al. 2017a, b; Luo 2019; Zhou 2021; Roushangar et al. 2022). These studies indicate that GWO-KELM is a promising, robust early-warning tool with strong predictive performance. KELM benefits substantially from GWO because of the latter's global search ability; in other words, GWO is used to determine the most suitable parameter set of KELM. In these studies, GWO-KELM required less running time than other hybrid models and showed more flexibility under different hydraulic conditions. Prediction of the discharge coefficient is required as a basis for discharge measurement in irrigation channels and plays a fundamental role in evaluating the hydraulic efficiency of radial gates. However, a review of previous research on the hydraulics of radial gates shows that ML approaches have not yet been applied to radial gates.
Moreover, previous studies confirm the complicated hydraulics of submerged radial gates, which necessitates a robust model for predicting their discharge coefficient. Taking advantage of the hybrid GWO-KELM, this study applies the method for the first time to predict the discharge coefficient of submerged radial gates. In addition, two other kernel-based methods, Gaussian process regression (GPR) and support vector machine (SVM), are applied for comparison. The study also assesses whether the hybrid GWO-KELM is an effective model for predicting the discharge coefficient of radial gates under varied submergence conditions with different submergence ratios. An extensive database of 2125 experimental data points gathered by the U.S. Bureau of Reclamation was used to feed the kernel-based models, and their prediction accuracy was assessed for various input combinations and under different submergence conditions.

Materials and methods
The experimental data provided by Buyalski (1983) were used to predict the discharge coefficient of submerged radial gates. A total of 2657 experiments were conducted with three gate types (Fig. 1). The gate lip seal was varied by modeling three commonly used designs: (1) a hard rubber bar; (2) a music note seal; and (3) no seal, resulting in a sharp-edged configuration. The sharp-edged gate lip was obtained by simply removing the gate lip seal and filling the screw holes with solder. The radial gate was installed at three trunnion height settings: 409, 461, and 511 mm. In all experiments, the gate radius was 702 mm, the gate width was 711 mm, and the downstream channel width was 762 mm. The floor of the laboratory flume was flat and level over its entire length. The experiments covered three flow conditions, namely free, submerged, and jump (assumed to be the transition zone) flow (Fig. 2). The discharge coefficient of the submerged radial gate was computed from an equation in which Q is the discharge (m³/s), w is the gate opening (m), b_g is the gate width (m), y1 and y3 are the upstream and downstream flow depths (m), respectively, y2 is the flow depth at the vena contracta (m), a is the gate trunnion height above the invert (m), θ is the gate leaf angle from horizontal (°), and g is the gravitational acceleration (m/s²). After removal of outliers, 2125 experimental data samples under submerged flow conditions were prepared to feed the employed ML methods.

Support vector machine (SVM)
In this section, the basic features of SVM are outlined, since this kernel-based technique has been applied in diverse areas of engineering. For a given dataset {(x_i, y_i)}, i = 1, …, n, SVM regression can be formulated as:

$$f(x) = W \cdot \varphi(x) + b$$

where the vector W is the weight vector, b is the bias, and φ(x) is the transfer (feature-mapping) function. The coefficients are estimated by minimizing the regularized risk function:

$$R(C) = C\sum_{i=1}^{n} L_\varepsilon(x_i, y_i) + \frac{1}{2}\lVert W\rVert^2$$

where C Σ L_ε(x_i, y_i) is the empirical risk. The minimization is carried out by introducing slack variables ξ_i and ξ_i*, which represent the upper and lower excess deviations:

$$\min_{W,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\lVert W\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)$$

subject to

$$y_i - W\cdot\varphi(x_i) - b \le \varepsilon + \xi_i, \qquad W\cdot\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0$$

where (1/2)‖W‖² is the regularization term, C is the cost factor, ε is the width of the ε-insensitive loss function L_ε, and n is the sample size.
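As a minimal sketch of the objective above (an illustration, not the authors' code), the following Python evaluates the ε-insensitive loss L_ε = max(0, |y − f(x)| − ε), the smallest slack values ξ_i and ξ_i* that satisfy the two constraints, and the regularized risk C Σ L_ε + ½‖W‖² for predictions that are assumed to be already available. It does not train an SVM, and the function names are illustrative.

```python
from typing import List, Tuple


def eps_insensitive_loss(y_true: float, y_pred: float, eps: float) -> float:
    """Vapnik's eps-insensitive loss: deviations inside the eps-tube cost nothing."""
    return max(0.0, abs(y_true - y_pred) - eps)


def minimal_slacks(y_true: float, y_pred: float, eps: float) -> Tuple[float, float]:
    """Smallest (xi, xi_star) satisfying the two SVR constraints for one sample.

    xi > 0 when the target lies above the eps-tube, xi_star > 0 when it lies below.
    """
    xi = max(0.0, y_true - y_pred - eps)
    xi_star = max(0.0, y_pred - y_true - eps)
    return xi, xi_star


def regularized_risk(w: List[float], y_true: List[float], y_pred: List[float],
                     C: float, eps: float) -> float:
    """R = C * sum_i L_eps(x_i, y_i) + 0.5 * ||W||^2."""
    empirical = sum(eps_insensitive_loss(t, p, eps) for t, p in zip(y_true, y_pred))
    return C * empirical + 0.5 * sum(wi * wi for wi in w)


# One prediction inside the tube (|1.05 - 1.0| < 0.1), one outside it.
print(regularized_risk([3.0, 4.0], [1.0, 1.0], [1.05, 1.3], C=2.0, eps=0.1))
```

For fixed W and b, the optimal slacks are exactly these minimal values, and since at most one of them is nonzero, ξ_i + ξ_i* equals L_ε for sample i. This is why the constrained problem and the regularized risk describe the same fit.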
Applying Lagrange multipliers and the optimality conditions to Eq. (2) yields the general form of the regression function:

$$f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right) K(x_i, x) + b$$

where α_i and α_i* are the Lagrange multipliers and K(x_i, x_j) = φ(x_i)·φ(x_j) is the kernel function, which computes the inner product of the two vectors x_i and x_j in the feature space φ(x_i) and φ(x_j). The kernel functions used in the present study are the linear, polynomial, radial basis function (RBF), and sigmoid kernels. Furthermore, the generalization capability of the SVM was assessed through k-fold cross-validation to prevent over-fitting during the testing phase.
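The kernel equations themselves are not reproduced in the text, so the sketch below assumes the common LIBSVM-style forms: linear x_i·x_j, polynomial (γ x_i·x_j + r)^d, RBF exp(−γ‖x_i − x_j‖²) and sigmoid tanh(γ x_i·x_j + r). The parameter names `gamma`, `coef0` (r) and `degree` (d) are assumptions, and the paper's exact parameterization may differ. The last function evaluates the dual-form prediction above for given multipliers; solving the SVR quadratic program is out of scope.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def _dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u: Vector, v: Vector) -> float:
    return _dot(u, v)


def polynomial_kernel(u: Vector, v: Vector, gamma: float = 1.0,
                      coef0: float = 1.0, degree: int = 3) -> float:
    return (gamma * _dot(u, v) + coef0) ** degree


def rbf_kernel(u: Vector, v: Vector, gamma: float = 1.0) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u: Vector, v: Vector, gamma: float = 1.0, coef0: float = 0.0) -> float:
    return math.tanh(gamma * _dot(u, v) + coef0)


def svr_predict(x: Vector, support_vectors: Sequence[Vector], dual_coefs: Sequence[float],
                bias: float, kernel: Callable[[Vector, Vector], float]) -> float:
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b, with dual_coefs[i] = alpha_i - alpha_i*."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


# Usage with hypothetical, already-trained multipliers and an RBF kernel:
f = svr_predict([0.4, 0.6], [[0.2, 0.5], [0.8, 0.3]], [0.7, -0.2], 0.05,
                lambda u, v: rbf_kernel(u, v, gamma=2.0))
```

Only samples with α_i ≠ α_i* (the support vectors) contribute to the sum, which is why prediction cost scales with the number of support vectors rather than with the full training set.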
The datasets were randomly divided into k equal-sized subsamples, each defined as a fold (F1, F2, F3, …, Fk). One subsample was used for validation and the remaining k − 1 subsamples were used to train the model. The process was repeated k times, with each of the k subsamples used exactly once for validation. The merit of this approach is that all observations are used for both training and validation, and each observation is used for validation exactly once. In the present study, k = 10. The partitioning mechanism of the k-fold cross-validation approach is illustrated in Fig. 3.

Gaussian process regression (GPR)
A Gaussian process (GP) is a collection of random variables, any finite subset of which has a joint Gaussian distribution (Rasmussen and Nickisch 2010). A GP is fully specified by its mean and covariance functions (Eq. 9). It can be viewed as a collection of many random variables with a high-dimensional joint normal distribution, and for simplicity its mean is usually taken as zero. The Gaussian process can be expressed as:

$$f(x) \sim \mathcal{GP}\left(m(x),\ k(x, x')\right)$$

where m(x) and k(x, x') are the mean and covariance functions, respectively. The expected value of the function at input x is given by the mean function m(x), as expressed in Eq. (12):

$$m(x) = \mathbb{E}\left[f(x)\right]$$

The covariance function k(x, x') measures the confidence in the mean function, as shown in Eq. (13):

$$k(x, x') = \mathbb{E}\left[\left(f(x) - m(x)\right)\left(f(x') - m(x')\right)\right]$$

The covariance function must be such that, for any set of input points, it produces a positive definite covariance matrix K.
The covariance function determines particular properties of the model, including stationarity, smoothness and periodicity. The standard and most widely used form of GPR uses a zero mean function and a covariance function. Different types of covariance (kernel) functions can be used as the core of GPR, as given in Eqs. (14-19), where r = ‖x − x'‖ is the Euclidean distance between two inputs.

Exponential:
$$k(x, x') = \sigma_f^2 \exp\left(-\frac{r}{\sigma_l}\right)$$

where σ_f and σ_l are the hyper-parameters that control the performance of the GPR model; σ_f is the signal standard deviation and σ_l is the characteristic length scale.

Squared exponential:
$$k(x, x') = \sigma_f^2 \exp\left(-\frac{r^2}{2\sigma_l^2}\right)$$

Matern 3/2:
$$k(x, x') = \sigma_f^2 \left(1 + \frac{\sqrt{3}\,r}{\sigma_l}\right)\exp\left(-\frac{\sqrt{3}\,r}{\sigma_l}\right)$$

Matern 5/2:
$$k(x, x') = \sigma_f^2 \left(1 + \frac{\sqrt{5}\,r}{\sigma_l} + \frac{5r^2}{3\sigma_l^2}\right)\exp\left(-\frac{\sqrt{5}\,r}{\sigma_l}\right)$$

Rational quadratic:
$$k(x, x') = \sigma_f^2 \left(1 + \frac{r^2}{2\alpha\sigma_l^2}\right)^{-\alpha}$$

where α is a positive scale-mixture parameter. The joint Gaussian prior distribution of the observed values y and the predictions f* at the test inputs X* is:

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\left(0,\ \begin{bmatrix} K(X,X) + \sigma_n^2 I & K(X,X_*) \\ K(X_*,X) & K(X_*,X_*) \end{bmatrix}\right)$$

where σ_n² is the noise variance. The predictive mean and covariance follow as:

$$\bar{f}_* = K(X_*,X)\left[K(X,X) + \sigma_n^2 I\right]^{-1} y$$

$$\operatorname{cov}(f_*) = K(X_*,X_*) - K(X_*,X)\left[K(X,X) + \sigma_n^2 I\right]^{-1} K(X,X_*)$$

where f̄* is the prediction mean, which provides the best approximation of y*, and cov(f*) indicates the prediction uncertainty. The prediction mean f̄* in Eq. (21) is a linear combination of the targets y, whereas cov(f*) does not depend on the targets but only on the inputs. Here, K(X, X*) is the covariance matrix between the training set X and the test set X*, and K(X*, X*) is the covariance matrix of the test set itself. The marginal likelihood in Eq. (22) is obtained by integrating over the latent function values f:

$$p(y \mid X) = \int p(y \mid f, X)\, p(f \mid X)\, df$$

Taking the logarithm of Eq. (22) gives the log marginal likelihood:

$$\log p(y \mid X, \theta) = -\frac{1}{2} y^{\top}\left[K + \sigma_n^2 I\right]^{-1} y - \frac{1}{2}\log\left|K + \sigma_n^2 I\right| - \frac{n}{2}\log 2\pi$$

where θ is the vector of hyper-parameters and K = K(X, X). The optimal hyper-parameters of the covariance function can be obtained by maximizing the log marginal likelihood or, equivalently, minimizing the negative log marginal likelihood, by taking derivatives with respect to the hyper-parameters and applying a gradient-based optimizer.
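A minimal pure-Python sketch of GPR prediction follows, under stated assumptions: the covariance functions use the standard stationary forms written in terms of σ_f and σ_l (the rational quadratic's α is an assumed extra argument), a small noise standard deviation σ_n (`sn`) is assumed because the text does not give one, and the hyper-parameters are fixed rather than optimized. The function returns the predictive mean, the latent predictive variance, and the log marginal likelihood, all computed through a Cholesky factorization of K + σ_n²I. The name `gpr_predict` is hypothetical, and the dense O(n³) linear algebra is only suitable for small illustrative datasets.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[float, float, float], float]  # k(r, sigma_f, sigma_l)


def exponential(r: float, sf: float, sl: float) -> float:
    return sf ** 2 * math.exp(-r / sl)


def squared_exponential(r: float, sf: float, sl: float) -> float:
    return sf ** 2 * math.exp(-0.5 * (r / sl) ** 2)


def matern32(r: float, sf: float, sl: float) -> float:
    s = math.sqrt(3.0) * r / sl
    return sf ** 2 * (1.0 + s) * math.exp(-s)


def matern52(r: float, sf: float, sl: float) -> float:
    s = math.sqrt(5.0) * r / sl
    return sf ** 2 * (1.0 + s + s * s / 3.0) * math.exp(-s)


def rational_quadratic(r: float, sf: float, sl: float, alpha: float = 1.0) -> float:
    return sf ** 2 * (1.0 + r * r / (2.0 * alpha * sl * sl)) ** (-alpha)


def _dist(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def _chol_solve(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gpr_predict(X, y, X_star, kernel: Kernel, sf: float, sl: float, sn: float = 1e-3):
    """Zero-mean GPR: predictive mean/variance at X_star and log marginal likelihood."""
    n = len(X)
    K = [[kernel(_dist(X[i], X[j]), sf, sl) + (sn ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = _cholesky(K)
    alpha = _chol_solve(L, y)  # (K + sn^2 I)^-1 y
    means, variances = [], []
    for xs in X_star:
        k_star = [kernel(_dist(xs, xi), sf, sl) for xi in X]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = _chol_solve(L, k_star)
        variances.append(kernel(0.0, sf, sl) - sum(k * vi for k, vi in zip(k_star, v)))
    # log p(y|X) = -1/2 y^T (K + sn^2 I)^-1 y - 1/2 log|K + sn^2 I| - n/2 log(2 pi)
    lml = (-0.5 * sum(yi * ai for yi, ai in zip(y, alpha))
           - sum(math.log(L[i][i]) for i in range(n))
           - 0.5 * n * math.log(2.0 * math.pi))
    return means, variances, lml
```

Because ½ log|K + σ_n²I| equals the sum of the logarithms of the Cholesky diagonal, the factorization used for prediction also yields the log marginal likelihood at no extra cost; an optimizer tuning σ_f and σ_l would call this repeatedly and maximize `lml`.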

Kernel extreme learning machine (KELM)
The extreme learning machine (ELM) is a fast learning algorithm with good generalization capability. One weakness of ELM is its instability, which is caused by the random assignment of the initial weights. Kernel-based ELM, an extension of ELM, addresses this problem by introducing a kernel method and regularization theory (Wang et al. 2017a, b). Compared with ELM, KELM offers greater stability and generalizability, and it has been widely applied to regression problems. The KELM model can be written as:

$$f(x) = h(x)\,\beta$$

where h(x) is the hidden-layer output (feature mapping) and β is the vector of connection weights between the hidden layer and the output layer. Next, a regularization coefficient C is introduced to improve the stability and generalization capability of the model. The least-squares solution for the output weights is then obtained from generalized inverse matrix theory:

$$\beta = H^{\top}\left(\frac{I}{C} + H H^{\top}\right)^{-1} y$$

where H is the hidden-layer output matrix and y is the desired output vector. Therefore, the output function of KELM can be expressed as:

$$f(x) = h(x)\,H^{\top}\left(\frac{I}{C} + H H^{\top}\right)^{-1} y$$

Because the feature mapping h(x) is unknown, a kernel function is embedded into the structure of KELM. The kernel matrix Ω_KELM and its elements are defined as:

$$\Omega_{\mathrm{KELM}} = H H^{\top}, \qquad \Omega_{\mathrm{KELM}\,i,j} = h(x_i)\cdot h(x_j) = K(x_i, x_j)$$

The network output of KELM can then be written as:

$$f(x) = \begin{bmatrix} K(x, x_1) & \cdots & K(x, x_N) \end{bmatrix}\left(\frac{I}{C} + \Omega_{\mathrm{KELM}}\right)^{-1} y$$

where K(x_i, x_j) is the kernel function and N is the number of training samples. To select an appropriate kernel function, several of the most frequently used kernels were considered: the linear, polynomial, radial basis function (RBF), and wavelet kernels.

Grey Wolf Optimization (GWO)
GWO is a recently introduced intelligent algorithm, developed by Mirjalili et al. (2014), that is inspired by the hunting behavior and social hierarchy of grey wolves. Based on social rank, the wolves in a pack are divided into four levels, alpha (α), beta (β), delta (δ) and omega (ω), as depicted in Fig. 4.
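The KELM equations presented before the GWO subsection (coefficients (I/C + Ω_KELM)⁻¹y and output f(x) = [K(x, x_1), …, K(x, x_N)] times those coefficients) can be sketched in pure Python as follows. This is a minimal illustration, not the authors' MATLAB implementation: it assumes an RBF kernel of the form exp(−γ‖x − x'‖²) and a Morlet-type wavelet kernel Π_k cos(1.75 d_k/a)·exp(−d_k²/(2a²)) with a hypothetical dilation `a`, since the text names these kernels without giving their parameterization. The dense Gaussian elimination is only suitable for small training sets.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def rbf_kernel(u: Vector, v: Vector, gamma: float = 1.0) -> float:
    """Assumed RBF form exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def wavelet_kernel(u: Vector, v: Vector, a: float = 1.0) -> float:
    """Assumed Morlet-type wavelet kernel: prod_k cos(1.75 d_k / a) * exp(-d_k^2 / (2 a^2))."""
    out = 1.0
    for x, z in zip(u, v):
        d = x - z
        out *= math.cos(1.75 * d / a) * math.exp(-d * d / (2.0 * a * a))
    return out


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class KELM:
    """Kernel ELM regressor: coef = (I/C + Omega)^-1 y, f(x) = [K(x, x_1..x_N)] . coef."""

    def __init__(self, C: float = 100.0,
                 kernel: Callable[[Vector, Vector], float] = rbf_kernel) -> None:
        self.C = C
        self.kernel = kernel

    def fit(self, X: List[Vector], y: List[float]) -> "KELM":
        n = len(X)
        # Omega_ij = K(x_i, x_j); the I/C ridge term on the diagonal regularizes the solve.
        omega = [[self.kernel(X[i], X[j]) + (1.0 / self.C if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        self.X_train = [list(x) for x in X]
        self.coef = solve(omega, list(y))
        return self

    def predict(self, X: List[Vector]) -> List[float]:
        return [sum(self.kernel(x, xi) * c for xi, c in zip(self.X_train, self.coef))
                for x in X]
```

Usage: `KELM(C=1e3, kernel=lambda u, v: rbf_kernel(u, v, gamma=10.0)).fit(X, y).predict(X_test)`. C and the kernel parameter are exactly the quantities that a search procedure such as GWO would tune; a large C approaches interpolation of the training targets, while a small C shrinks predictions toward zero.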
The α wolf leads the pack, and the other wolves follow its commands. In the mathematical model of GWO, the three best solutions in the pack are designated α, β and δ, respectively, and the remaining solutions are regarded as ω wolves. To implement this mechanism, the hierarchy is updated in each iteration according to the three best solutions. The position update of the grey wolves is illustrated in Fig. 5.
The GWO algorithm is based on searching for, encircling, hunting, and attacking the prey. Before hunting, the pack of grey wolves first encircles the prey. The encircling behavior is modeled as:

$$\vec{X}(t+1) = \vec{X}_P(t) - \vec{A}\cdot\vec{D}$$

where X is the position vector of a wolf, X_P(t) is the position of the prey, t and t + 1 denote the current and next iterations, A and C are coefficient vectors, and D is the distance between the prey and the wolf, determined as:

$$\vec{D} = \left|\vec{C}\cdot\vec{X}_P(t) - \vec{X}(t)\right|, \qquad \vec{A} = 2\vec{a}\cdot\vec{r}_1 - \vec{a}, \qquad \vec{C} = 2\,\vec{r}_2$$

where r1 and r2 are random vectors in [0, 1], and the components of the convergence factor a decrease linearly from 2 to 0 over the iterations. The α wolf usually guides the hunt and is closest to the prey; the β and δ wolves may also participate. To simulate the hunting behavior mathematically, the α, β and δ wolves, as the first, second and third best candidate solutions, are assumed to have the best knowledge of the potential position of the prey. Therefore, the ω wolves update their positions according to the positions of the best search agents:

$$\vec{D}_\alpha = \left|\vec{C}_1\cdot\vec{X}_\alpha - \vec{X}\right|, \quad \vec{D}_\beta = \left|\vec{C}_2\cdot\vec{X}_\beta - \vec{X}\right|, \quad \vec{D}_\delta = \left|\vec{C}_3\cdot\vec{X}_\delta - \vec{X}\right|$$

$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1\cdot\vec{D}_\alpha, \quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2\cdot\vec{D}_\beta, \quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3\cdot\vec{D}_\delta$$

where X_α, X_β and X_δ are the positions of the α, β and δ wolves, respectively, and X1, X2 and X3 are the corresponding position estimates. The updated position X(t + 1) of a wolf in the (t + 1)th generation is then obtained after each iteration as expressed in Eq. (45):

$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}$$

Modeling configurations
In this paper, the proposed hybrid GWO-KELM technique was implemented in the MATLAB® environment. The settings of the constant parameter ρ and, especially, the kernel parameters can considerably affect the modeling process. The optimal values of these parameters were obtained using the GWO algorithm; that is, GWO was embedded into the KELM method to enhance its capability for modeling the discharge coefficient of the submerged radial gate.
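The position-update rules above can be sketched as a small, self-contained optimizer. This is a minimal illustration of standard GWO, assuming box bounds on each dimension, a linearly decreasing a, and leaders taken as the three best positions found so far; the name `gwo` and its arguments are hypothetical. In a GWO-KELM setting, the objective would be the validation error of a KELM trained with a candidate (C, kernel parameter) pair, for example searched in log10 space; here a simple shifted quadratic is used so that the result is easy to check.

```python
import random
from typing import Callable, List, Sequence, Tuple


def gwo(objective: Callable[[List[float]], float],
        lower: Sequence[float], upper: Sequence[float],
        n_wolves: int = 20, n_iter: int = 100, seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective` with the standard grey wolf optimizer (needs n_wolves >= 3)."""
    rng = random.Random(seed)
    dim = len(lower)
    wolves = [[rng.uniform(lower[d], upper[d]) for d in range(dim)] for _ in range(n_wolves)]
    leaders: List[Tuple[float, List[float]]] = []  # three best (score, position) so far
    for t in range(n_iter):
        scored = [(objective(w), w[:]) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda s: s[0])[:3]
        x_alpha, x_beta, x_delta = (pos for _, pos in leaders)
        a = 2.0 - 2.0 * t / n_iter  # convergence factor: decreases from 2 toward 0
        for w in wolves:
            for d in range(dim):
                estimates = []
                for leader in (x_alpha, x_beta, x_delta):
                    A = 2.0 * a * rng.random() - a      # A = 2a*r1 - a
                    C = 2.0 * rng.random()              # C = 2*r2
                    D = abs(C * leader[d] - w[d])       # distance to this leader
                    estimates.append(leader[d] - A * D)  # component of X1, X2 or X3
                # X(t+1) = (X1 + X2 + X3) / 3, clipped to the search bounds
                w[d] = min(max(sum(estimates) / 3.0, lower[d]), upper[d])
    final = sorted(leaders + [(objective(w), w[:]) for w in wolves], key=lambda s: s[0])
    best_score, best_pos = final[0]
    return best_pos, best_score


# Usage: minimum of a shifted quadratic at (1, -2).
best, score = gwo(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2,
                  [-5.0, -5.0], [5.0, 5.0], n_wolves=30, n_iter=200, seed=1)
```

In this study the pack size was set to 300 and the number of iterations to 50 (see Results and discussions); the defaults above are arbitrary. Keeping the leaders as the best positions seen so far, rather than re-ranking only the current pack, guarantees that the returned solution never gets worse from one iteration to the next.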
The GWO algorithm is less prone to becoming trapped in local optima in complex multimodal optimization problems. This nature-inspired algorithm explores the solution space more diversely when solving complex optimization problems. Compared with existing stochastic search methods, GWO can reach better optimal solutions with a lower computational burden (Roushangar et al. 2021a, b). GWO is superior to these methods because (i) it shares information among search agents more effectively through an improved conveying mechanism; and (ii) it uses random functions and considers three candidate solutions to obtain the best performance, converging quickly by jumping from local minima toward the global minimum. The main advantage of the GWO algorithm over most widely known meta-heuristic algorithms is that it requires no algorithm-specific control parameters beyond the population size and the number of iterations. Additionally, it is straightforward and computationally simple. To divide the total data into training and testing sets, the K-fold cross-validation approach was applied with K = 10 for random selection of the data division. The data set of 2125 samples was partitioned into two subsets: 75% of the data were used for training and the remaining 25% for testing. Notably, the validity of a prediction model depends strongly on the range of data used to build it. Using unnormalized data may lead to undesirable results, so in this study all input variables were scaled to the range 0.1 to 0.9. This makes the data values uniform for the network and increases both the training speed and the model capability (Dawson and Wilby 1998).
The following equation was used to normalize the data in this study:

$$X_n = 0.1 + 0.8\,\frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

where X_n, X, X_max and X_min are, respectively, the normalized value of variable i, its original value, and the maximum and minimum values of variable i. The flowchart of the proposed hybrid model is depicted in Fig. 6.
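The data preparation steps (random 75/25 division, 10-fold partitioning, and scaling to [0.1, 0.9]) can be sketched with index lists as below. The function names and the fixed seed are illustrative; the text does not state how the 10-fold scheme and the 75/25 split were combined, so the two are shown independently. The min/max used for scaling would typically be taken from the training data and reused for the test data and for back-transforming predictions; that choice is an assumption, not something stated in the text.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def holdout_split(n: int, train_frac: float = 0.75, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into training and testing index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists; every index is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal folds F1..Fk
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def scale(values: Sequence[float], lo: float = 0.1, hi: float = 0.9) -> Tuple[List[float], float, float]:
    """Linear min-max scaling so that min -> lo and max -> hi; returns (scaled, x_min, x_max)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot scale a constant variable")
    k = (hi - lo) / (x_max - x_min)
    return [lo + k * (v - x_min) for v in values], x_min, x_max


def unscale(scaled: Sequence[float], x_min: float, x_max: float,
            lo: float = 0.1, hi: float = 0.9) -> List[float]:
    """Inverse of `scale`, e.g. to map predicted Cd back to its original range."""
    k = (x_max - x_min) / (hi - lo)
    return [x_min + k * (s - lo) for s in scaled]


train_idx, test_idx = holdout_split(2125)
print(len(train_idx), len(test_idx))  # -> 1594 531
```

Working with index lists rather than copies of the data keeps each input variable aligned with its target and makes the partition reproducible from the seed alone.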

Performance evaluation criteria
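The correlation coefficient (R), the determination coefficient (DC), and the root-mean-square error (RMSE) were used as statistical indices for evaluating the models. R indicates the linear correlation between the simulated and observed values. DC is a commonly used index in water resources modeling with a range of (−∞, 1]; a value of unity indicates a perfect match between the simulated and observed values. Finally, RMSE measures the model's estimation error.

$$R = \frac{\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i - \bar{X})^2\,\sum_{i=1}^{N}(Y_i - \bar{Y})^2}}$$

$$DC = 1 - \frac{\sum_{i=1}^{N}(X_i - Y_i)^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i - Y_i)^2}$$

where X_i and Y_i denote the observed and simulated values of the discharge coefficient, respectively, N is the number of data, and X̄ and Ȳ are the means of the observed and simulated values:

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$$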
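A direct plain-Python transcription of the R, DC and RMSE definitions, with X_i as observed and Y_i as simulated values, is sketched below; the function name `evaluate` is illustrative. The usage line shows that DC drops below zero when the predictions are worse than simply using the observed mean.

```python
import math
from typing import Sequence, Tuple


def evaluate(observed: Sequence[float], simulated: Sequence[float]) -> Tuple[float, float, float]:
    """Return (R, DC, RMSE) for observed X_i and simulated Y_i."""
    n = len(observed)
    x_bar = sum(observed) / n
    y_bar = sum(simulated) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(observed, simulated))
    sxx = sum((x - x_bar) ** 2 for x in observed)
    syy = sum((y - y_bar) ** 2 for y in simulated)
    sse = sum((x - y) ** 2 for x, y in zip(observed, simulated))
    r = sxy / math.sqrt(sxx * syy)  # linear correlation in [-1, 1]
    dc = 1.0 - sse / sxx            # 1 = perfect; < 0 is worse than the observed mean
    rmse = math.sqrt(sse / n)       # same units as the target
    return r, dc, rmse


print(evaluate([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # R = -1.0, DC = -3.0, RMSE = sqrt(8/3)
```

Note that R is undefined when either series is constant (the denominator is zero), so it is best reported alongside DC and RMSE rather than alone.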

Input parameters
Designing an input matrix that gives the best simulation of the target parameter is an important step in applying ML models. The flow through a submerged radial gate can be expressed by the following functional relationship:

$$F\left(Q, y_1, y_3, L, w, g, \mu, \rho\right) = 0$$

in which Q is the flow discharge, y1 the upstream flow depth, y3 the downstream flow depth, L the channel width, w the gate opening, g the gravitational acceleration, μ the water viscosity and ρ the water density.
Taking w, g and μ as repeating variables, the Π-theorem yields five dimensionless groups. Rearranging these groups introduces the Reynolds number (Re), and the functional relationship, Eq. (51), can be rearranged accordingly. An experimental study by Toch (1955) also demonstrated the influence of w/R, y1/R, y3/R, and a/R on the discharge coefficient of submerged radial gates. Neglecting the influence of the Reynolds number, the dimensionless groups were rearranged to express the discharge coefficient as a function of dimensionless flow and geometric parameters. Various combinations of these parameters were prepared, and the best ones were selected through a trial-and-error procedure, as listed in Table 1. Statistical information on the input and output parameters in terms of the maximum (max), minimum (min), mean and standard deviation (S.td.) values is given in Table 2.

Results and discussions
Table 3 provides the statistical indicators of the employed kernel-based models during the training and testing phases. For the implementation of GWO-KELM, the population (pack size) was set to 300 and the number of iterations to 50. Based on the statistical indices, all kernel-based models produced their most accurate results with model (4), in which (y1 − y3)/w and y3/w were used as the input parameters. The comparison revealed that the hybrid GWO-KELM with input combination (4) outperformed the standalone SVM and GPR models, with the lowest RMSE (0.027) and the highest R (0.983) and DC (0.966) values.
The results in Table 3 also show that the submergence ratio, (y1 − y3)/w, which is common to all input models, plays an essential role in the modeling process and is the most influential parameter for predicting the discharge coefficient of submerged radial gates. This is consistent with the findings of Ansar (2001), Ferro (2001), and Shahrokhnia and Javan (2006) that the discharge of a submerged radial gate can be considered a function of the differential flow depth. Comparing models (2) and (3) shows that using y3/R instead of y1/R improves the prediction accuracy of the kernel-based methods, which suggests that the flow through the radial gate is more strongly affected by the downstream flow depth. Similarly, using the ratio of the downstream flow depth to the gate opening (y3/w) significantly increases the modeling accuracy. Scatter plots of the predicted versus observed discharge coefficient (Cd) for the proposed GWO-KELM models are depicted in Fig. 7.
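To make the four input combinations of Table 1 concrete, the sketch below builds the dimensionless input vectors of models (1) to (4) from a single set of measurements. R is passed in with the meaning given in the nomenclature; the function name and the example values are hypothetical and only show the arithmetic.

```python
from typing import Dict, Tuple


def input_combinations(y1: float, y3: float, w: float, R: float) -> Dict[int, Tuple[float, ...]]:
    """Dimensionless inputs for models (1)-(4); all lengths in the same unit (e.g. m)."""
    submergence = (y1 - y3) / w  # submergence ratio (y1 - y3)/w, shared by all models
    return {
        1: (submergence,),
        2: (submergence, y1 / R),
        3: (submergence, y3 / R),
        4: (submergence, y3 / w),  # best-performing combination in this study
    }


# Hypothetical measurements: y1 = 0.60 m, y3 = 0.45 m, w = 0.10 m, R = 0.25 m.
print(input_combinations(0.60, 0.45, 0.10, 0.25)[4])
```

Because every ratio is dimensionless, the same function applies whatever length unit is used, provided all four arguments share it.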
To provide more information about the performance of the kernel-based approaches, the distributions of relative errors and their histograms are plotted for the best input combination (Fig. 8). The relative errors of the proposed GWO-KELM approximately followed a normal distribution (mean = 0.03). The relative errors were limited to the ranges −0.5 to +1.5 for GPR, −1 to +2.5 for SVM, and −0.8 to +1.6 for GWO-KELM. The error histograms show that, for GWO-KELM, the errors were symmetrically distributed and concentrated around zero, which indicates that GWO-KELM predicts Cd better than the GPR and SVM methods.
Table 1 Input combinations: (1) (y1 − y3)/w; (2) (y1 − y3)/w, y1/R; (3) (y1 − y3)/w, y3/R; (4) (y1 − y3)/w, y3/w

The choice of an appropriate kernel function depends on the distribution of the sample data and on the relationship between the sample data and the predicted value. Because different feature spaces have different data distributions, the performance of kernel-based techniques depends strongly on the choice of kernel function. Kernel functions can be classified into two major groups: global and local. Global kernel functions, such as the linear and polynomial kernels, show high generalization performance but weak learning ability, and they are affected by samples far from each other. Local kernel functions, such as the RBF and wavelet kernels, have strong learning ability but lower generalization performance, and they are affected by samples close to each other. Here, to find an appropriate kernel function, different kernel functions were used in the structure of each kernel-based technique. For the SVM method, the best values of the kernel parameters were obtained through a trial-and-error procedure. In addition, the cost factor (C) and the loss-function parameter (ε) were optimized by a grid search over the C-ε parameter space using cross-validation, in which a typical range of parameter settings was examined. First, the optimal values of C and ε were obtained in the intervals (0-15) and (0-1), respectively, for fixed kernel parameters, and then the kernel parameters were changed. SVM with the RBF kernel gave the best prediction accuracy (R = 0.972, DC = 0.944, and RMSE = 0.035), consistent with the literature (Azamathulla et al. 2016; Le et al. 2020; Roushangar and Shahnazi 2020a, b).
The RBF kernel was followed by the linear kernel (R = 0.761, DC = 0.558, and RMSE = 0.099) and the polynomial kernel (R = 0.645, DC = 0.398 and RMSE = 0.115). In modeling the present phenomenon, the linear kernel, as the simplest kernel, outperformed the polynomial kernel and required the shortest processing time. The SVM model with the sigmoid kernel performed poorly (R = 0.127, DC = −0.00043 and RMSE = 0.132). For GPR, the associated hyper-parameters were tuned with a standard gradient descent optimizer by maximizing the log marginal likelihood (i.e., minimizing its negative). Table 4 shows the performance of the different kernel functions in the GPR model. In terms of DC, only small variations were observed among the kernel functions: GPR performance ranged between 0.957 (squared exponential kernel) and 0.960 (Matern 3/2 kernel). With this optimizer, the optimal length scale (σ_l) ranged from 2.3654 to 12.2743 and the signal standard deviation (σ_f) from 0.1440 to 0.2721 across the kernels.
In SVM, model performance is substantially affected by the RBF kernel parameter ($\gamma$), which may cause underfitting and overfitting problems (Roushangar and Shahnazi 2020a, b). Figure 9 shows DC as a function of $\gamma$ for the SVM model. The accuracy of the SVM model responds differently to changes in $\gamma$ for the different developed models, and the fluctuations are most evident when model (1) is used as the input. For the best input combination (model 4), the best-fitting values are obtained for $\gamma \geq 150$. With small values of $\gamma$, the SVM model tends to overfit, or memorize, the data. The analysis gave an optimal RBF kernel parameter of $\gamma = 250$ for model (4), and the optimal SVM hyperparameters for model (4) were $C = 1$ and $\varepsilon = 0.001$.
Since the trunnion pin height is a geometric characteristic of radial gates that affects the flow rate, it is useful to check the prediction capability of the employed kernel-based methods for different trunnion pin heights. For this purpose, the developed input combinations were re-run for three trunnion pin heights, 409, 461, and 511 mm, and the results are presented in Table 5. The employed SVM, GPR, and GWO-KELM techniques performed slightly better in predicting the discharge coefficient of submerged radial gates with trunnion heights of $a = 409$ mm and $a = 511$ mm. For radial gates with $a = 409$ mm, introducing $y_1/R$ and $y_3/R$ (models 2 and 3) decreased overall modeling performance. This was more evident for SVM, where modeling accuracy in terms of DC decreased by 27% and 22%, respectively.

The performance of the employed kernel-based methods was compared with that of traditional models based on dimensional analysis. To this end, the formulas of Shahrokhniya and Javan (2005, 2006) and Zahedani et al. (2012) were used, as given in Table 6. Note that the selected dimensionless formulas were developed from experimental data under submerged flow conditions.
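The comparisons in this section are expressed through R, DC, and RMSE. The sketch below computes the three indices, assuming the common Nash-Sutcliffe-type definition $DC = 1 - \sum_i (O_i - P_i)^2 / \sum_i (O_i - \bar{O})^2$, which is consistent with the negative DC values reported for some models. The function names and the observed and predicted values are hypothetical.

```python
import math

def correlation_r(obs, pred):
    # Pearson correlation coefficient between observed and predicted values.
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def determination_coefficient(obs, pred):
    # Assumed definition: DC = 1 - SSE / SST. DC <= 1, and DC < 0 whenever the
    # model is worse than simply predicting the observed mean.
    mo = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst

def rmse(obs, pred):
    # Root-mean-squared error, in the units of Cd (dimensionless here).
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

if __name__ == "__main__":
    obs = [0.5, 0.6, 0.7, 0.8]          # hypothetical observed Cd values
    biased = [o - 0.3 for o in obs]     # hypothetical formula with a constant offset
    print(correlation_r(obs, biased), determination_coefficient(obs, biased), rmse(obs, biased))
```

A prediction shifted by a constant offset is perfectly correlated with the observations (R = 1) yet yields a strongly negative DC, which shows how a formula can correlate well with measured data while still having a negative DC.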
The results in Fig. 10 show that the employed kernel-based methods give better predictions than the selected dimensionless formulas. As shown in this figure, the approach of Shahrokhniya and Javan (2005) underestimates the discharge coefficient of submerged radial gates (R = 0.942, DC = -4.29, and RMSE = 0.348), whereas the approach of Shahrokhniya and Javan (2006) overestimates it (R = 0.969, DC = -1.46, and RMSE = 0.237). Existing equations and their constants were developed for a specific range of flow conditions, so this poor performance may be due to the limited range of validity of these methods. Nevertheless, the formulas of Shahrokhniya and Javan (2005, 2006) correlated better with the observed data because they incorporate the gate leaf angle ($\theta$), which has a significant effect on the discharge coefficient. By contrast, the dimensionless formula developed by Zahedani et al. (2012) performed extremely poorly in predicting the discharge coefficient (R = -0.233, DC = -4.29, and RMSE = 0.348). In general, the traditional methods are not flexible enough to give uniform results under different conditions, and their applicability is limited to the conditions for which they were developed. Kernel-based techniques, on the other hand, can handle high-dimensional data, fit nonlinear relationships between inputs and outputs, and incorporate structured information in flexible ways. Among them, KELM can achieve comparable or better performance with faster training and simpler implementation. GWO benefits KELM through its global search ability, which helps identify the most suitable KELM parameter set. Together, these factors make the employed hybrid method more efficient than the dimensionless formulas under different hydraulic conditions.
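The training-speed argument for KELM and the role of GWO can be illustrated with a minimal sketch. KELM output weights have the closed form $\beta = (I/C + \Omega)^{-1} T$ with $\Omega_{ij} = K(x_i, x_j)$, so training reduces to a single linear solve; GWO then searches ($\log_{10} C$, $\log_{10}\gamma$), letting the alpha, beta, and delta wolves guide the pack. This is a generic textbook version, not the authors' implementation: the RBF convention, the search bounds, the population size, the iteration count, the hold-out RMSE fitness, and the synthetic data are all assumptions, and every helper name is hypothetical.

```python
import math
import random

def rbf(x, z, gamma):
    # Assumed RBF convention: exp(-gamma * ||x - z||^2).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    # Gaussian elimination with partial pivoting for A x = b.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def kelm_fit(X, T, C, gamma):
    # KELM output weights: beta = (I / C + Omega)^-1 T, Omega_ij = K(x_i, x_j).
    n = len(X)
    omega = [[rbf(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return solve(omega, T)

def kelm_predict(X_train, beta, gamma, x):
    # f(x) = [K(x, x_1), ..., K(x, x_N)] . beta
    return sum(b * rbf(x, xi, gamma) for b, xi in zip(beta, X_train))

def make_fitness(X_tr, T_tr, X_val, T_val):
    # Hypothetical fitness: hold-out RMSE for a wolf at (log10 C, log10 gamma).
    def fitness(pos):
        C, gamma = 10.0 ** pos[0], 10.0 ** pos[1]
        beta = kelm_fit(X_tr, T_tr, C, gamma)
        errs = [t - kelm_predict(X_tr, beta, gamma, x) for x, t in zip(X_val, T_val)]
        return math.sqrt(sum(e * e for e in errs) / len(errs))
    return fitness

def gwo(fitness, bounds, n_wolves=8, n_iter=30, seed=0):
    # Grey wolf optimizer: the three best positions found so far (alpha, beta, delta)
    # pull every wolf toward them; 'a' shrinks from 2 toward 0, shifting the pack
    # from exploration to exploitation.
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []
    for t in range(n_iter):
        leaders = sorted(wolves + leaders, key=fitness)[:3]
        a = 2.0 - 2.0 * t / n_iter
        new_wolves = []
        for w in wolves:
            pos = []
            for d, (lo, hi) in enumerate(bounds):
                total = 0.0
                for lead in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    total += lead[d] - A * abs(C * lead[d] - w[d])
                pos.append(min(max(total / len(leaders), lo), hi))
            new_wolves.append(pos)
        wolves = new_wolves
    best = min(wolves + leaders, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.uniform(0.0, 1.0)] for _ in range(30)]          # hypothetical inputs
    T = [0.6 + 0.1 * math.sin(3.0 * x[0]) for x in X]        # hypothetical Cd-like target
    fit = make_fitness(X[:20], T[:20], X[20:], T[20:])
    best, score = gwo(fit, [(-1.0, 3.0), (-2.0, 2.0)])
    print("log10 C, log10 gamma:", best, "hold-out RMSE:", score)
```

Searching in log space is a design choice of this sketch: it keeps the wolves' steps proportionate across the several orders of magnitude that $C$ and $\gamma$ may span. Keeping the best three positions found so far prevents a good solution from being lost between iterations.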
A review of previous studies shows that existing calibration methods predict poorly for submerged flows (Guo et al. 2020). To assess the capability of the proposed kernel-based approaches to quantify the discharge coefficient under varied submergence conditions, the submergence ratio, defined as $(y_1 - y_3)/w$, was used as the criterion for subdividing submerged flow through a radial gate. Different intervals of this parameter were evaluated through extensive trial and error, and the generalization ability of the kernel-based approaches with the selected best input combinations was analyzed for each interval. The results, shown in Fig. 11, demonstrate a clear rising trend in the prediction accuracy of the kernel-based techniques. These findings confirm that the performance of the GWO-KELM, SVM, and GPR techniques with the specified input parameters tends to become more robust as the level of submergence decreases. According to the results, the proposed hybrid GWO-KELM method gave the most accurate results (R = 0.873, DC = 0.744, and RMSE = 0.035) for extremely highly submerged flow with $S_r < 0.05$, where $S_r = (y_1 - y_3)/w$.
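The interval analysis above can be sketched as follows: compute $S_r = (y_1 - y_3)/w$ for each sample, assign it to a submergence interval, and report the sample count and RMSE per interval. The interval edges, the sample tuples, and the function names are hypothetical; the paper selected its intervals by trial and error and also reported R and DC for each one.

```python
import math

def submergence_ratio(y1, y3, w):
    # S_r = (y1 - y3) / w, as defined in the text.
    return (y1 - y3) / w

def rmse_by_interval(samples, edges):
    """Group samples by S_r interval [lo, hi) and return {(lo, hi): (count, rmse)}.

    samples: iterable of (y1, y3, w, cd_observed, cd_predicted) tuples.
    edges: increasing interval bounds (hypothetical; chosen by the analyst).
    Samples whose S_r falls outside every interval are skipped.
    """
    groups = {}
    for y1, y3, w, obs, pred in samples:
        sr = submergence_ratio(y1, y3, w)
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= sr < hi:
                groups.setdefault((lo, hi), []).append((obs, pred))
                break
    return {
        key: (len(pairs), math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs)))
        for key, pairs in sorted(groups.items())
    }

if __name__ == "__main__":
    edges = [0.0, 0.05, 0.2, 1.0, float("inf")]   # hypothetical interval bounds
    samples = [                                    # hypothetical (y1, y3, w, Cd_obs, Cd_pred)
        (1.00, 0.98, 1.0, 0.10, 0.12),
        (1.00, 0.97, 1.0, 0.12, 0.15),
        (1.20, 1.05, 0.5, 0.30, 0.31),
        (1.50, 1.00, 0.4, 0.55, 0.55),
    ]
    for (lo, hi), (n, err) in rmse_by_interval(samples, edges).items():
        print(f"{lo} <= S_r < {hi}: n = {n}, RMSE = {err:.3f}")
```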

Conclusions
This paper implements a new hybrid grey wolf optimization-based kernel extreme learning machine (GWO-KELM), together with Gaussian process regression (GPR) and support vector machine (SVM), to model the discharge coefficient of submerged radial gates. GWO, a nature-inspired swarm intelligence algorithm, is used to tune the KELM model for this application. The results showed that the proposed GWO-KELM gives better prediction accuracy than the employed GPR and SVM approaches. Among the developed input combinations, the model with the ratio of the downstream flow depth to the gate opening ($y_3/w$) and the submergence ratio ($(y_1 - y_3)/w$) as inputs provides the best performance. The submergence ratio, which is common to all developed input models, is the most influential parameter in predicting the discharge coefficient of submerged radial gates. The identified influential parameters can be used to develop new relationships for estimating the discharge coefficient of submerged radial gates. The results also showed that, unlike other calibration methods, the employed kernel-based approaches are highly flexible in predicting the discharge coefficient under different submergence conditions, and that their prediction performance tends to become more robust as the level of submergence decreases. Because prediction accuracy varies with the submergence condition, the flow state should be determined precisely before any modeling attempt is made. The outcomes of this work are expected to support the practical application of AI methods in hydraulic engineering.
The results of this study indicate that, by knowing the hydraulic conditions of the radial gate, one can gain a relative understanding of the accuracy of discharge coefficient estimation. The lack of field data for radial gates is a limitation of the present study. Since the proposed kernel-based techniques are data-driven, further studies using data ranges beyond those of this study, as well as field data, are needed to confirm the capability of the proposed tools to model the discharge coefficient of submerged radial gates.
Funding No funding was received for conducting this study.
Availability of data and materials The data and materials that support the findings of this study are available on request from the corresponding author.

Declarations
Conflict of interest The authors declare that they have no competing interests.
Ethical approval Not applicable, because this article does not contain any studies with human or animal subjects.

Fig. 11 Results of the employed kernel-based approaches under varied submergence conditions