Estimation of Manning Roughness Coefficient in Alluvial Rivers with Bed Forms Using Soft Computing Models

Flow conditions (flow discharge, flow depth, and flow velocity) in natural streams are mainly determined via flow resistance formulas such as Manning's equation. Accurately evaluating Manning's roughness coefficient (n), especially in rivers with bed forms during floods, has therefore always been of interest to scholars. The interaction between the flow and the bed form is very complex, since the flow conditions control the bed forms and vice versa. The main goal of the present study is to predict n in rivers with bed forms using soft computing models, including the multilayer perceptron artificial neural network (MLPNN), the group method of data handling (GMDH), the support vector machine (SVM), and genetic programming (GP). To this end, the energy grade line (Sf), the flow Froude number (Fr), the relative submergence (y/d50; y = flow depth and d50 = bed sediment size), and the dimensionless bed-form parameters (Δ/d50, Δ/λ, and Δ/y; Δ = bed-form height and λ = bed-form length) were used as the input variables, and n was used as the output variable. The results showed that all the tested models have acceptable accuracy, while the SVM model showed the highest accuracy, with a coefficient of determination R² = 0.99 in the verification stage. The sensitivity analysis of the SVM and MLPNN models and the structural analysis of the GMDH and GP models indicated that the most important parameters affecting n are Fr, Sf, and Δ/λ.

List of Symbols
b: The constant in the regression function
C: A positive constant that determines the penalty when a model training error occurs
d50: The average diameter of sediment particles
f: The target function
FFNN: The feed-forward neural network model
Fr: The flow Froude number
g: Gravitational acceleration
GMDH: Group method of data handling
GP: Genetic programming model
Gs: The relative density of sediment particles
MAPE: Mean absolute percentage error
MLP-FFA: Multilayer perceptron-firefly algorithm model
MLPNN: Multilayer perceptron artificial neural network
N: The number of samples (data used for training)
n: The total Manning roughness coefficient
n″: The bed-form component of the Manning roughness coefficient
O: The observed value
Ō: The mean observed value
P: The calculated value
P̄: The mean calculated value
R: Correlation coefficient
Re: The flow Reynolds number
RMSE: Root mean square error
R²: The coefficient of determination
Sf: The energy grade line
SI: The scatter index
SVM: Support vector machine model
V: The average flow velocity
W^T: The transpose of the coefficient matrix
x: The input variables (x = x1, x2, ⋯, xm)
Xj: The value calculated for the chromosome by fitness function j
y: The average flow depth
Yj: The measured or expected value of the chromosome for fitness case j
α: The angle of the upstream side of the bed form relative to the horizon
θ: The angle of the downstream side of the bed form relative to the horizon
λ: The length of the bed form
Δ: The height of the bed form
φ: The kernel function
ρs: The specific mass of sediment particles
ρw: The specific mass of water
μ: The dynamic viscosity of water

Introduction
In natural rivers, when the flow exceeds the incipient-motion condition of the riverbed sediments, a live-bed condition exists in which the sediments are transported by the flow as "bed load" or "suspended load". In many rivers, especially sand-bed rivers, bed forms of various shapes and geometries develop as a result of bed load transport under specific flow conditions. From small to large, these forms are called Ripples, Dunes, Anti-Dunes, and Chutes and Pools. Ripples and Dunes are asymmetric triangles that develop at lower flow conditions, whilst Anti-Dunes and Chutes and Pools have larger geometries and develop at higher flow conditions. The shape, size, and spacing of the bed forms depend on the flow characteristics (flow velocity and depth) and the characteristics of the riverbed sediments (size and specific gravity). On the other hand, the flow conditions also strongly depend on the type of bed form (Julien 2010; Dey 2014). Larger bed forms can contribute to higher resistance to the flow. During a flood, variations in the flow conditions change the type of bed form; as a result, the friction factor can vary by more than a factor of ten over the course of a flood (Vanoni 2006). The Manning coefficient n and the friction factor are both flow resistance coefficients; compared to the friction factor, n is more significant and has more applications in open channels. Various factors, such as the hydraulic conditions of the flow, channel geometry, the size and shape of channel bed sediments, bed load, suspended load, and bed forms, affect the value of this coefficient. Thus, accurately determining n in channels requires knowing the effect of the aforementioned factors on it (Vatanchi and Maghrebi 2019; Bahrami Yarahmadi and Shafai Bajestan 2011; Heydari et al. 2022).
The effect of bed forms on n has been studied by a few investigators. Talebbeydokhti et al. (2006) investigated the effect of Dune geometry on flow resistance in a sand-bed channel. The effect of bed load transport on flow resistance in channels covered with the Dune bed form was studied by Omid et al. (2010). Afzalimehr et al. (2010) studied the friction factor in gravel-bed rivers with bed forms. Nasiri Dehsorkhi et al. (2011) investigated the effect of different types of bed forms and vegetated banks on flow conditions. Chegini and Pender (2012) studied the influence of flow conditions on bed load and its related bed form in channels with relatively low, mild, and steep slopes. Kabiri et al. (2014) examined the velocity and Reynolds stress distributions over gravel Dunes. Kwoll et al. (2016) investigated the effects of the Dune lee slope on flow resistance and turbulent flow structure. Brakenhoff et al. (2020) studied the effect of bed form-related roughness on hydrodynamic patterns and sediment transport using the Delft3D numerical model. Okhravi and Gohari (2020) examined the bed form friction factor in armored river beds. Dey et al. (2020) investigated the impact of flow hydrodynamics on two-dimensional Dunes. Heydari et al. (2022) conducted a laboratory study of the effect of bed forms with different sediment sizes on flow resistance and bed shear stress.
The majority of the studies reviewed above relied on experiments to obtain the required data. Since collecting laboratory data is very time-consuming and expensive, and given that field and laboratory data are now available, soft computing models offer a way to make more accurate predictions, especially since n in a channel with bed forms depends on various hydraulic, sedimentary, and geometric factors. Accordingly, a few researchers have tried to predict n using various soft computing models. For instance, Yuhong and Wenxin (2009) applied an artificial neural network (ANN) model to predict the nonlinear relationship between the friction factor and the factors affecting it. Ayhan (2011) developed an adaptive neuro-fuzzy inference system (ANFIS) that achieved a good correlation between the experimental and predicted values of n. Roushangar et al. (2018) estimated n in Dune-bed rivers using multilayer feed-forward neural network (FFNN), radial basis neural network (RBNN), and adaptive network-based fuzzy inference system (ANFIS) models; the results indicated that ANFIS outperforms RBNN but is less efficient than FFNN. The sensitivity analysis of the models also indicated that the Re and R/d50 parameters have the most significant impact on estimating n. Niazkar et al. (2019) estimated the grain and total Manning coefficients using artificial neural networks and genetic programming with higher accuracy than empirical formulas. Zanganeh and Rastegar (2020) modeled the roughness coefficient in erodible channels using ANN and ANFIS models; their sensitivity analysis showed that the shear Reynolds number has the most significant effect on predicting the roughness coefficient of erodible channels. Saghebian et al. (2020) estimated n in Dune and Ripple rivers using multilayer perceptron-firefly algorithm (MLP-FFA) and feed-forward neural network (FFNN) models. The results showed that Re is vital in modeling n in rivers with a Dune bed form, while Fr and R/d50 are vital in rivers with a Ripple bed form. Yao et al. (2023) developed an adaptive parallel genetic algorithm (APGA) to estimate n in unsteady flow simulations. They reported that the APGA with a successive-approximation-based stepwise optimizing strategy yields better solutions than the APGA alone, and that the proposed model outperforms models that do not consider variations of n with both discharge and distance.
A review of the literature shows that only a few studies have applied soft computing models to predict n in channels with bed forms. In addition, no study has yet addressed the effect of bed forms with different heights and sediment sizes on the roughness coefficient. However, as explained above, the formation of bed forms in natural rivers, especially during floods, can change the roughness and, as a result, the flow conditions. It is therefore necessary to determine the roughness coefficient while accounting for the possibility of bed form formation. For this reason, the main goal of the present study is to predict n in rivers with bed forms using soft computing models, namely the multilayer perceptron artificial neural network (MLPNN), the group method of data handling (GMDH), the support vector machine (SVM), and genetic programming (GP).

Materials and Methods
The data used to estimate n with the soft computing models were collected through experiments conducted by the present study's authors. This section describes the laboratory equipment and the experimental procedure, followed by the dimensional analysis and an overview of the soft computing models employed.

The Laboratory Equipment and Experimental Procedure
The experiments were conducted in a straight and tilting flume with a length of 12 m and a width of 0.3 m (Fig. 1a). The flume made by Armfield Company, England, had a rectangular section with glass walls and a metal bottom.
The Ripple and Dune bed forms have the shape of an asymmetric triangle, with a gentle slope on the upstream side and a steep downstream slope equal to the angle of repose of the bed sediments (≈32°) (Simons and Richardson 1966; Simons et al. 1961). Thus, each bed form was artificially made as an asymmetric triangle, and two groups of experiments were conducted. In the first group of experiments, the bed forms' length, width, and height were 20, 30, and 4 cm, respectively, the angle of the downstream side of the bed form was 32 degrees, and sediments of 0.51, 1.29, and 2.18 mm were glued on to roughen the surface of the forms. In the second group of experiments, the length and width of the forms were 25 and 30 cm, respectively, and the angle of the downstream side was again 32 degrees; bed forms with four different heights (1, 2, 3, and 4 cm) were used, and sediments with a size of 0.45 mm were glued on to roughen their surface. The sediments used in both groups of experiments were sand with a relative density of 2.65. The bed forms were attached to the bottom of the flume over a length of about 6 m.
The experiments were conducted with discharge rates of 10, 15, 20, 25, and 30 L/s and bed slopes of 0, 0.0001, 0.0005, 0.001, and 0.0015. The total number of experiments was equal to 186.
The geometric and hydraulic parameters used in this study were the bed form dimensions, flow depth, the average size of sediments, discharge rate, flow velocity, and flow Froude number, as displayed in Table 1. For example, the energy grade line range varies from 0.003 to

Dimensional Analysis
The variables affecting the Manning roughness coefficient in channels with a bed form can be expressed as follows:

n = f1 (V, y, g, ρw, μ, ρs, d50, B, Sf, λ, Δ, α, θ)    (1)

where V is the average flow velocity, y is the average flow depth, g is gravitational acceleration, ρw is the specific mass of water, μ is the dynamic viscosity of water, ρs is the specific mass of sediment particles, d50 is the average diameter of sediment particles, B is the channel width, Sf is the energy grade line, λ is the length of the bed form, Δ is the height of the bed form, α is the angle of the upstream side of the bed form relative to the horizon, and θ is the angle of the downstream side of the bed form relative to the horizon. Figure 1b shows the bed form variables.
Following Buckingham's Π-theorem and taking y, V, and ρw as the repeating variables, the following dimensionless parameters are obtained after analyzing and combining the variables:

n = f2 (Fr, Re, Sf, y/d50, Δ/y, Δ/λ, Δ/d50, Gs, α, θ)    (2)

where Δ/y is the ratio of the height of the bed form to the flow depth, Δ/λ is the ratio of the height of the bed form to its length, Fr is the flow Froude number, Re is the flow Reynolds number, and Gs is the relative density of the sediment particles. Since the sediment used in this study to roughen the surface of the bed forms was sand with a constant relative density (Gs = 2.65), this parameter is removed from the above equation. Furthermore, given the turbulent nature of the flow in all experiments, the flow Reynolds number was also removed. The θ value was likewise constant (θ = 32°), so this parameter was also removed from Eq. (2). Moreover, given that α is a function of the ratio of the height of the bed form to its length, it was removed because Δ/λ appears in the final equation. Accordingly, the dimensionless parameters affecting the Manning roughness coefficient in channels with a bed form in this study are:

n = f3 (Fr, Sf, y/d50, Δ/λ, Δ/d50, Δ/y)    (3)
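As an illustration, the dimensionless input parameters of Eq. (3) can be computed directly from the raw measured quantities. The sketch below (with hypothetical variable names) simply follows the definitions above; Sf enters the input set directly and needs no transformation.

```python
import math

def dimensionless_params(V, y, d50, delta, lam, g=9.81):
    """Compute the dimensionless groups used as model inputs.

    V: mean flow velocity (m/s), y: mean flow depth (m),
    d50: median sediment size (m), delta: bed-form height (m),
    lam: bed-form length (m).
    """
    Fr = V / math.sqrt(g * y)          # flow Froude number
    return {
        "Fr": Fr,
        "y/d50": y / d50,              # relative submergence
        "delta/d50": delta / d50,
        "delta/lam": delta / lam,      # bed-form steepness
        "delta/y": delta / y,
    }
```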

Multilayer Perceptron Neural Network
The multilayer perceptron neural network (MLPNN) was first introduced as a simple network consisting of several simple artificial neurons. Multilayer perceptron networks operate based on the parallel processing of data (Dayhoff 1990). A multilayer perceptron network is constructed by adding one or more hidden layers to a single-layer perceptron network, and such networks are among the most widely used soft computing models in water engineering (Singh et al. 2021). These networks are primarily trained using the backpropagation technique; since this method requires differentiable activation functions, functions such as the sigmoid are mostly used. A single neuron is not enough to model hydraulic phenomena, so several neurons must work in parallel. The data were divided into training and testing datasets for calibrating and verifying the multilayer perceptron neural network. Training the network is one of the vital steps in modeling and amounts to updating the weight coefficients in the hidden and output layers (Parsaie et al. 2018).
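As a minimal sketch of such a network, the snippet below trains a two-hidden-layer perceptron with a tanh (tangent sigmoid) activation using scikit-learn's MLPRegressor. The synthetic data, target, and hyperparameters are illustrative assumptions, not the authors' setup; the (6, 3) layer sizes mirror the architecture reported later in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the six dimensionless inputs and the target.
X = rng.uniform(0.1, 1.0, size=(200, 6))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.2 * X[:, 3]

# Two hidden layers with 6 and 3 neurons and tanh activation.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(6, 3), activation="tanh",
                 solver="lbfgs", max_iter=5000, random_state=0),
)
mlp.fit(X, y)
```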

Group Method of Data Handling (GMDH)
The group method of data handling (GMDH) was introduced for modeling complex systems. As a multilayer neural network model, GMDH has input, hidden, and output layers. Unlike the multilayer perceptron model, whose structure is mainly developed based on the designer's experience, the GMDH structure is determined systematically (Saberi-Movahed et al. 2020). The GMDH model is currently used successfully in many hydraulic engineering problems, such as predicting scour around hydraulic structures and the discharge coefficient of weirs. The GMDH model was developed based on the extension of partial descriptions. This idea was taken from Volterra's hypothesis, whereby a complex system can be modeled using an infinite series of polynomials. For simplicity, instead of using an infinite series of polynomials, Ivakhnenko (1968) used only a quadratic polynomial; he noted that quadratic polynomials can build up a Kolmogorov-Gabor polynomial through the layered operation of the network. The GMDH model is a feed-forward neural network in which each neuron takes two inputs. The relationship between the input and output variables in each neuron is expressed using a quadratic polynomial activation function:

y = w0 + w1 xi + w2 xj + w3 xi xj + w4 xi² + w5 xj²

where w0 to w5 are the polynomial coefficients. In the first layer, neurons are created from pairs of input parameters; in the second layer, they are connected to the superior neurons of the previous layer. Figure 2 shows an example of a GMDH network with six input parameters and three layers. As can be seen, superior neurons are differentiated from removed neurons in each layer. In this network, the output neuron, n, represents the superior neuron in terms of fitting.
Training the GMDH model means adjusting the values of the coefficients to minimize the difference between the observed values and the outputs of the neurons. Various methods have been presented for this purpose, the most common of which is the least squares method.
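The least-squares training of a single GMDH neuron can be sketched as follows; the function names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def _design(xi, xj):
    # Columns of the quadratic polynomial: 1, xi, xj, xi*xj, xi^2, xj^2
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])

def fit_gmdh_neuron(xi, xj, y):
    """Fit y ~ w0 + w1*xi + w2*xj + w3*xi*xj + w4*xi^2 + w5*xj^2
    by ordinary least squares, as in the training step described above."""
    w, *_ = np.linalg.lstsq(_design(xi, xj), y, rcond=None)
    return w

def eval_gmdh_neuron(w, xi, xj):
    return _design(xi, xj) @ w
```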

Support Vector Machine (SVM)
The support vector machine (SVM) is a classifier algorithm known as one of the best classification and prediction techniques. The SVM is a supervised learning method and has two stages, training and testing. The working basis of the SVM classifier is the linear classification of data. Applications of the SVM model in hydraulics include the prediction of scour depth, water quality, and the performance of hydraulic structures such as energy dissipation structures. The support vector machine is an efficient system based on constrained optimization theory that uses the structural risk minimization (SRM) principle and leads to a globally optimal solution (Cortes and Vapnik 1995). As in other regression problems, the relationship between the independent and dependent variables is determined by an algebraic function f(x) plus an error term (the allowable error ε).
If W^T is the transpose of the coefficient matrix, b is the constant in the regression function, and φ is the kernel mapping function, the regression function takes the form

f(x) = W^T φ(x) + b

The goal is to find a functional form for f(x) by training the support vector machine (SVM) with a set of data (the training dataset). To calculate the W and b values, the error function must be minimized subject to the constraints (the terms detailed in Eq. 7). The SVM model is trained as the following optimization problem:

Minimize (1/2)‖W‖² + C Σ(i=1..N) (ξi + ξi*)
subject to:
  yi − W^T φ(xi) − b ≤ ε + ξi
  W^T φ(xi) + b − yi ≤ ε + ξi*
  ξi, ξi* ≥ 0

where C is a positive constant that determines the penalty when a model training error occurs, φ is the kernel mapping function, N is the number of samples (data used for training), and ξi and ξi* are the slack variables (error rates). Finally, the regression SVM function can be rewritten as

f(x) = Σ(i=1..N) ai φ(xi) + b

where the ai are obtained from the Lagrange coefficients. Computing φ(xi) in the feature space may be very complex. To overcome this, the conventional procedure in the SVM regression model is to choose a kernel function

K(xi, xj) = φ(xi)·φ(xj)

Different kernel functions can be used to develop different types of SVM models. The kernel functions that can be used in the regression SVM model are the polynomial kernel, the radial basis function (RBF) kernel, the linear kernel, and the sigmoid kernel:

  Linear:     K(xi, xj) = xi·xj
  Polynomial: K(xi, xj) = (xi·xj + c)^d
  RBF:        K(xi, xj) = exp(−γ‖xi − xj‖²)
  Sigmoid:    K(xi, xj) = tanh(γ xi·xj + c)

Figure 3 shows the layout of the SVM model. Since the RBF, linear, and polynomial kernels are among the most widely used kernel functions, these kernel functions were investigated in this study (Azamathulla et al. 2016).
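A minimal sketch of kernel-based SVM regression using scikit-learn's SVR is shown below; the synthetic data and the values of C and ε are illustrative assumptions, not the tuned settings of this study.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
# Synthetic stand-in for the six dimensionless inputs and the target.
X = rng.uniform(0.0, 1.0, size=(150, 6))
y = 0.02 + 0.01 * np.sin(3.0 * X[:, 0]) + 0.005 * X[:, 1]

# Train one SVR per kernel examined in the study.
models = {}
for kernel in ("rbf", "linear", "poly", "sigmoid"):
    m = make_pipeline(StandardScaler(),
                      SVR(kernel=kernel, C=100.0, epsilon=1e-4))
    m.fit(X, y)
    models[kernel] = m
```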

The Genetic Programming (GP) Model
The genetic programming (GP) model is a subtype of intelligent search algorithms proposed by Koza (1992). The GP model uses a tree (branch) structure, in which each branch consists of a set of terminals (input variables) and a set of basic algebraic operators. To this end, building blocks are defined, including the input variables, the target function, and the functions connecting them; then suitable structures and coefficients are determined. The final model presents the relationship between operators and variables as a tree structure whose mathematical form can also be extracted. Although the model output determined in this way may seem complicated, it has an important advantage: it generalizes to a large part of the phenomenon in question. Step-by-step modeling in genetic programming starts by creating an initial population of candidate model functions. Each member of that population is then evaluated using a fitness function. In each generation, a new population is produced through steps such as the crossover, mutation, and copy operators, elitist selection of current members, reproduction, and evaluation of the new generation. The fitness of each member of the generated population is assessed using the following equation (Azamathulla and Jarrett 2013):

f = Σ(j=1..N) |Xj − Yj|

where f is the fitness (target) function, Xj is the value calculated for the chromosome by fitness function j, and Yj is the measured or expected value of the chromosome for fitness case j.
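A sketch of this fitness criterion, under the assumption that fitness is the sum of absolute deviations between calculated and measured values (the exact form used by the cited authors may differ):

```python
def gp_fitness(calculated, measured):
    """Fitness of a GP chromosome as the sum of absolute deviations
    between calculated values X_j and measured values Y_j.
    Smaller values indicate a fitter chromosome."""
    return sum(abs(x_j - y_j) for x_j, y_j in zip(calculated, measured))
```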

Modeling Strategies and Evaluation Criteria
The number of dimensionless parameters involved in determining n is six, and anywhere from one to six parameters can be used to model and estimate n. It is suggested that about 75% to 85% of the data be used for training and 15% to 25% for testing. The performance and errors of the models are assessed using the following statistical error indices: the coefficient of determination (R²), the root mean square error (RMSE), and the mean absolute percentage error (MAPE):

R² = [Σ(Oi − Ō)(Pi − P̄)]² / [Σ(Oi − Ō)² Σ(Pi − P̄)²]    (14)

RMSE = √[(1/N) Σ(Oi − Pi)²]    (15)

MAPE = (100/N) Σ |(Oi − Pi)/Oi|    (16)

where Oi is the observed value, Pi is the calculated value, Ō is the mean observed value, and P̄ is the mean calculated value. Talebbeydokhti et al. (2006) developed the following equations to determine the Manning roughness coefficient in sand channels with the Dune bed form:
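The three indices can be computed as in the sketch below, where R² is taken as the squared Pearson correlation between observed and calculated values (an assumption consistent with the use of both Ō and P̄ in the definitions):

```python
import numpy as np

def evaluate(observed, predicted):
    """Return R^2 (squared Pearson correlation), RMSE, and MAPE (percent)."""
    O = np.asarray(observed, dtype=float)
    P = np.asarray(predicted, dtype=float)
    num = np.sum((O - O.mean()) * (P - P.mean()))
    den = np.sqrt(np.sum((O - O.mean()) ** 2) * np.sum((P - P.mean()) ** 2))
    r2 = (num / den) ** 2
    rmse = np.sqrt(np.mean((O - P) ** 2))
    mape = 100.0 * np.mean(np.abs((O - P) / O))
    return r2, rmse, mape
```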

Results and Discussion
where n″ is the bed-form component of the Manning roughness coefficient and n is the total Manning roughness coefficient. The statistical error indices of the above equations were computed for the laboratory data of this study, yielding R² = 0.84 and RMSE = 0.0019. These values confirm the accuracy of the equations in estimating the Manning roughness coefficient in rivers with a bed form. However, given the importance of the Manning roughness coefficient in river engineering studies and the need for a more accurate estimation of this parameter, the present study focused on developing and assessing various soft computing models to estimate it. The correlation coefficients between the research parameters and n are presented in Table 2 and Fig. 4. As can be seen, n has a direct and significant relationship (r > 0.5) with Sf, Δ/y, and Δ/λ. It has an inverse relationship with Fr and y/d50, which is consistent with hydraulic concepts. In addition, n has no significant relationship with Δ/d50.
This section assesses the soft computing models (MLPNN, SVM, GMDH, and GP) developed to estimate n. Data preparation is the first step of soft computing modeling; therefore, the dataset was divided into training and testing groups. The share of each group is usually determined by trial and error, though the researcher's experience and the advice of other researchers are also helpful. Here, 80% of the collected data was dedicated to training (calibration) and the remaining 20% to testing (verification). As displayed in Table 3, the statistical characteristics of the dimensionless parameters in the training and testing datasets are almost similar. After data preparation, the patterns of the input and output variables in the soft computing models should be designed. The dimensional analysis showed that the six dimensionless parameters involved in determining n are Fr, Sf, y/d50, Δ/λ, Δ/d50, and Δ/y. One, two, more, or all of the independent parameters can be used to develop the pattern of input variables. In this study, the five-input pattern, which comprises C(6,5) = 6 scenarios (in each scenario, one independent parameter is omitted so that its influence can be examined), and the six-input pattern, which comprises one scenario (all input variables considered), were examined. The sensitivity analysis of the models was performed using the five-input pattern.
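The 80/20 split and the seven input-variable scenarios described above can be sketched as follows; the data here are random placeholders standing in for the 186 experiments.

```python
import numpy as np
from sklearn.model_selection import train_test_split

features = ["Fr", "Sf", "y/d50", "d/lam", "d/d50", "d/y"]
rng = np.random.default_rng(2)
X = rng.uniform(size=(186, 6))            # 186 experiments, 6 inputs
n = rng.uniform(0.01, 0.04, size=186)     # placeholder target values

# 80/20 split for calibration and verification, as in the study.
X_tr, X_te, n_tr, n_te = train_test_split(X, n, test_size=0.2,
                                          random_state=0)

# Six 5-input scenarios (drop one parameter at a time) plus the
# all-inputs scenario; each would be fed to a model in turn.
scenarios = [[f for f in features if f != dropped] for dropped in features]
scenarios.append(features)
```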

MLPNN Model Estimation Results
The MLPNN model was developed following the recommendations provided by Azamathulla et al. (2016). Accordingly, a hidden layer with as many neurons as input variables was designed first. At this stage, the performance of various activation functions was checked; the results showed that the tangent sigmoid activation function is the most accurate. Next, to increase modeling accuracy, the number of neurons or the number of hidden layers can be gradually increased. Figure 5 shows the structure of the MLPNN model based on all input variables. As can be seen, the final MLPNN model has two hidden layers, with six neurons in the first layer (in the presence of all involved variables) and three neurons in the second layer. In addition to this pattern, other patterns of input variables were examined, as displayed in Table 4.
Table 4 shows the effect of each input variable on the modeling accuracy; the performance of the MLPNN model in each pattern is given in rows one to six. The statistical indicators show that the MLPNN model has a high level of accuracy. The first row of Table 4 displays the statistical indicators of the MLPNN model in the absence of the parameter Δ/d50: the error indices in the training stage are RMSE = 0.0007 and R² = 0.981, and the corresponding values in the testing stage are RMSE = 0.0008 and R² = 0.978. Further examination of Table 4 indicates that the performance of the MLPNN model in the absence of Δ/d50 (first row), Δ/y (second row), and y/d50 (fourth row) is almost the same. The third row indicates slightly lower efficiency in the absence of Δ/λ. This finding was expected because the correlation analysis confirmed a direct relationship between this parameter and n. Table 4 also shows that in the absence of Sf and Fr, the statistical error indicators of the MLPNN model deteriorate significantly, indicating that these parameters influence n the most. Overall, Table 4 indicates that Sf, Fr, and Δ/λ are the most influential parameters on the Manning coefficient.

GMDH Model Estimation Results
As detailed in the description of the GMDH model, only a pair of input variables can be introduced to each neuron. Thus, if six input variables (all influencing parameters) are used, at most 15 neurons can be formed in the first layer. The structure of the developed GMDH model is shown in Fig. 2. As is clear from this figure, the GMDH model has two hidden layers, with four and three neurons in the first and second hidden layers, respectively. In the GMDH model, the Δ/λ parameter was not used in the formation and development of the network structure. An analysis of the network structure of the GMDH model indicates that Sf, Fr, and Δ/y contribute the most to the development of the network. Table 4 shows the assessment of the GMDH model with different input variables (rows one to seven). As evident in this table, the statistical indicators of the GMDH model based on the input variables presented in the first row are RMSE = 0.0007 and R² = 0.975 in the training phase, and RMSE = 0.0007 and R² = 0.968 in the testing phase. The evaluation of the statistical indicators of the GMDH model in the scenarios presented in rows one to four, which correspond to the absence of the parameters Δ/d50, Δ/y, Δ/λ, and y/d50, shows that the model maintained its effective performance; in other words, the statistical indicators of the GMDH model did not change significantly in the absence of these parameters. However, the model's accuracy decreased significantly in the absence of Sf and Fr. Figures 6 and 7 show the assessment of the GMDH model at different stages of development. The scatter index values of the model in the training and testing stages are 0.023 and 0.029, respectively.

SVM Model Estimation Results
Four kernel functions (RBF, sigmoid, polynomial, and linear) were evaluated for the development of the SVM model, as shown in Table 5.
As can be seen in Table 5, the RBF and sigmoid kernel functions have the best performance, with the highest R² and the lowest RMSE values. The linear kernel function had the lowest efficiency, which seems logical given the nonlinear relationship between the independent and dependent parameters. Since the sigmoid function was used in the MLPNN model, the sigmoid kernel was chosen as the final kernel function so that the performance of the SVM and MLPNN models could be compared. Figure 3 shows the layout of the SVM model. Table 4 presents the performance of the SVM model in the different scenarios. An analysis of Table 4 shows that when the parameters Δ/λ, Δ/y, Δ/d50, and y/d50 were removed from the pattern of input variables, the statistical indicators of the SVM model did not change or deteriorate significantly. By removing Fr from the input variables (the fifth row of Table 4), the model's efficiency decreased markedly: in the testing stage, the R² value dropped to 0.936 and the RMSE value rose to 0.0012. Table 4 thus indicates that the most important parameter is Fr, in agreement with the structure of the GMDH model and the results of the MLPNN model. Figures 6 and 7 show the performance of the SVM model in the training and testing stages for the scenario that considers all input parameters. The scatter index values of this model are 0.002 and 0.006 in the training and testing stages, respectively.

GP Model Estimation Results
As mentioned in the Materials and Methods section, n has a direct relationship with Sf, Δ/λ, and Δ/y, while it has an inverse relationship with Fr and y/d50; therefore, these parameters were considered in developing the GP model. The statistical error indices of the GP model in the training and testing stages were calculated under each scenario, as presented in Table 4. The statistical error indices of the GP model in the training stage are RMSE = 0.0012 and R2 = 0.928; the corresponding values in the testing stage are RMSE = 0.0013 and R2 = 0.926. Table 4 reveals that, compared with the previous models, the GP model has lower accuracy and weaker performance. An analysis of the equation derived from the GP model indicates that it was formed based on Fr and Sf. Figures 6 and 7 show the performance of the GP model in estimating n.
The statistical error indicators of the formula proposed by Talebbeydokhti et al. (2006), based on the measured laboratory data of this research, are R2 = 0.84 and RMSE = 0.0019.
However, using the equation obtained from the GP model (Eq. (21)), the values of the statistical indices were R2 = 0.926 and RMSE = 0.0013:

n = 2.184 Sf Fr + 0.006    (21)

Talebbeydokhti et al. (2006) used Δ/y as the only parameter in their equation, while in addition to this parameter, parameters such as Fr and Sf also affect n; thus, they were incorporated into the equation extracted from the GP model. Consequently, the equation derived from the GP model has a higher accuracy than the equation proposed by Talebbeydokhti et al. (2006).

Fig. 8 The values of the parameters used in the GP model
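As a quick sanity check, Eq. (21) can be evaluated for a representative flow condition; the sample values of Sf and Fr below are illustrative, not drawn from the paper's dataset:

```python
def manning_n_gp(sf, fr):
    """Manning's n from the GP-derived relation, Eq. (21): n = 2.184*Sf*Fr + 0.006."""
    return 2.184 * sf * fr + 0.006

# Illustrative values: energy slope Sf = 0.001, Froude number Fr = 0.5
n = manning_n_gp(0.001, 0.5)
print(round(n, 6))  # 0.007092
```

Note that the intercept of 0.006 acts as a floor, so the predicted n stays in a physically plausible range even as Sf*Fr becomes small.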

Comparing the Performance of Soft Computing Models
To compare the efficiency of the soft computing models, the error percentage index and the Taylor diagram in the training and testing stages were prepared. Figure 9 reveals the distribution of the error percentage against the most important common parameter, i.e., the flow Froude number, in the testing stage. As evident in this figure, the lowest error percentage range belongs to the SVM model. Examining the Taylor diagram presented in Fig. 10 shows that the performance of the SVM model closely matches the observed data in the testing stage, while the GP model has the weakest performance. In other words, the SVM and GP models have the highest and lowest efficiency in estimating n, respectively. Roushangar et al. (2018) showed that the models developed by them (i.e., FFNN, RBNN, and ANFIS) to estimate n in rivers with a dune bed form had R2 ≤ 0.9 in the verification stage. Saghebian et al. (2020) estimated n in rivers with dune and ripple bed forms using multilayer perceptron-firefly algorithm (MLP-FFA) and feed-forward neural network (FFNN) models; they reported R2 = 0.56 and RMSE = 0.0034 for the mentioned model in the best scenario. The R2 values for the MLPNN, GMDH, SVM, and GP models in the present study in the verification stage were equal to 0.982, 0.979, 0.999, and 0.926, and the RMSE values were equal to 0.0006, 0.0006, 0.0000, and 0.0013, respectively. A comparison of the statistical error indices (R2 and RMSE) for the models developed in the present study and the models developed by previous scholars (Roushangar et al. 2017, 2018; Saghebian et al. 2020) indicated that the soft computing models developed in the present study had significantly higher accuracy.

Fig. 9 Distribution of the error percentage index of the developed models on the flow Froude number in the testing stage
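The indices reported throughout can be computed as follows. This sketch assumes the common definitions: R2 as the coefficient of determination and the scatter index SI as RMSE divided by the mean of the observed values (the paper may instead use the squared correlation coefficient for R2); the arrays are illustrative, not the paper's data:

```python
import numpy as np

def error_indices(observed, predicted):
    """Return (RMSE, R2, SI) for observed vs. predicted Manning's n values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot          # coefficient of determination
    si = rmse / observed.mean()          # scatter index
    return rmse, r2, si
```

A perfect model gives RMSE = 0, R2 = 1, and SI = 0, which is why the near-zero RMSE and SI of the SVM model place it closest to the observed point on the Taylor diagram.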

Conclusion
This study addressed the development of soft computing models, including multilayer perceptron artificial neural network (MLPNN), group method of data handling (GMDH), support vector machine (SVM), and genetic programming (GP), for modeling and estimating n in alluvial rivers with ripple and dune bed forms. The models were developed using laboratory data collected by the authors. The results showed that all the soft computing models used in this study have an acceptable level of accuracy in estimating n. However, the SVM model (R2 = 0.999; RMSE = 0.0000) was more efficient than the other models. Furthermore, the sensitivity analysis showed that the flow Froude number and the energy grade line had the most significant impact in modeling and estimating n. A comparison of the performance of the models using the Taylor diagram indicated that the SVM model and the GP model had the highest and lowest levels of accuracy in estimating n, respectively. Moreover, the MLPNN and GMDH models had almost the same efficiency level; however, they were slightly less efficient than the SVM model and more efficient than the GP model in estimating n.

Fig. 10 The Taylor diagram for the soft computing models used to estimate n in the testing stage