On the Determination of Groundwater Level using Temporal and Spatial Parameters: Advanced Machine Learning Methods

Prediction of groundwater level is a useful tool for managing groundwater resources in a mining area. Water resources management requires identifying potential periods for groundwater drainage to prevent groundwater from entering the mine pit and to avoid high costs. For this purpose, four multilayer perceptron (MLP) neural network models and four cascade forward (CF) neural network models optimized with Bayesian Regularization (BR), Levenberg-Marquardt (LM), Resilient Backpropagation (RB), and Scaled Conjugate Gradient (SCG), as well as a radial basis function (RBF) neural network model and a generalized regression (GR) neural network model, were developed to predict groundwater level using 1377 data points. The data set includes 12 spatial parameters, divided into the two categories of sediments and bedrock, together with 6 time series parameters. In addition, 165 extra validation data points were used to determine the best models and to combine them. After identifying the three candidate models with the lowest average absolute relative error (AARE), a committee machine intelligence system (CMIS) model was developed. The proposed CMIS model predicts groundwater level with high accuracy, with an AARE value of less than 0.11%. The proposed model was also compared with ten other models through graphical and statistical error analysis. The results show that the developed CMIS model performs better than the other models in terms of precision and validity range. The relevancy factor indicates that the electrical resistivity of the sediments has the strongest effect on the groundwater level. Finally, the quality of the data used was investigated both statistically and graphically, and the results show satisfactory reliability of the data.

The output of an MLP model with two hidden layers, whose activation functions for these two layers are f1 and f2, can be written as

y = Σ_k w_k f2( Σ_j w_jk f1( Σ_i w_ij x_i + b_j ) + b_k ) + b

There is a direct relationship between input and output in the perceptron connection, whereas in the feedforward neural network connection the relationship between input and output is indirect, passing through a hidden layer with a nonlinear activation function (Fahlman and Lebiere 1989). If the perceptron and multilayer feedforward connection forms are combined, a cascade forward network can be formed.
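The two-hidden-layer forward pass described above can be sketched in a few lines; the layer sizes, weights, and helper names below are illustrative placeholders, not the paper's fitted network.

```python
import math

def tansig(x):
    # Hyperbolic tangent sigmoid, a common MLP activation
    return math.tanh(x)

def mlp_forward(x, W1, b1, W2, b2, W3, b3):
    """Forward pass of an MLP with two hidden layers.
    W* are weight matrices (lists of rows), b* are bias vectors.
    Shapes and names are illustrative assumptions."""
    h1 = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
          for row, b in zip(W1, b1)]
    h2 = [tansig(sum(w * hi for w, hi in zip(row, h1)) + b)
          for row, b in zip(W2, b2)]
    # Linear (purelin) output layer
    return [sum(w * hi for w, hi in zip(row, h2)) + b
            for row, b in zip(W3, b3)]
```

With all weights zero, the hidden activations vanish and the output equals the output-layer bias, which makes the nesting of the three layers easy to check by hand.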

Cascade forward neural network (CF)
where f_i is the activation function between the input layer and the output layer, w_ii is the weight between the input layer and the output layer, w_b is the weight from the bias to the output, w_jb is the weight from the bias to the hidden layer, and f_h is the activation function of each neuron in the hidden layer.

Radial basis function neural network (RBF)

In an RBF network, each neuron with a particular radius is located at a given point in space, and in each neuron the distance between the input vector and its center is calculated. The Euclidean distance is used to measure the distance between centers and inputs, which is calculated from the following equation:

d_j = ‖x − c_j‖ = ( Σ_{p=1}^{P} (x_p − c_{jp})² )^{1/2}

For a model with ten input variables, P = 10. To transfer the Euclidean distance from each neuron in the hidden layer to the output, a radial basis function is used. The most common radial basis function is the Gaussian, which is obtained from the following relation:

φ_j(x) = exp( −‖x − c_j‖² / (2σ_j²) )

so that the network output is the weighted sum

y = Σ_{j=1}^{N} ω_j φ_j(‖x − c_j‖)

where ω shows the connection weight, N is the number of neurons in the hidden layer, c denotes the center, and ‖x − c‖ is the Euclidean distance between the center of the radial function and the input data.

Generalized regression neural network (GR)

In the hidden layer, the network product function netprod multiplies the threshold b¹ and the ‖dist‖ output to get the net input n¹. The net input n¹ is passed to the transfer function radbas. For the GR model, the Gaussian function is used as the transfer function, namely

a¹_j = exp( −‖x − c_j‖² / (2σ_j²) )

In the above equation, σ_j is the smoothing factor, also called the spread parameter, which determines the shape of the RBF in the jth hidden layer.
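The Euclidean-distance and Gaussian-activation steps just described can be sketched as follows; the centers, widths, and weights are illustrative placeholders rather than values from the paper.

```python
import math

def gaussian_rbf(x, center, sigma):
    """Gaussian radial basis: exp(-||x - c||^2 / (2 * sigma^2))."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def rbf_predict(x, centers, sigmas, weights):
    # Output = weighted sum of Gaussian activations over hidden neurons
    return sum(w * gaussian_rbf(x, c, s)
               for w, c, s in zip(weights, centers, sigmas))
```

An input that coincides with a center produces an activation of exactly 1, so that neuron contributes its full connection weight to the output.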

The normalized weight product function is used as the weight function in the linear output layer, combining the previous layer's output with the weight matrix IW²¹ of this layer to give the weighted output.

The Purelin function is used as the transfer function for the weighted output. The network output is calculated from the following equation:

ŷ = ( Σ_j IW²¹_j a¹_j ) / ( Σ_j a¹_j )

For the GR model, only the spread parameter needs to be specified. Because this parameter has a significant effect on the model's performance, its optimal value must be determined.
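The normalized weighted average computed by the GR output layer can be sketched directly; the training points, targets, and spread below are illustrative, and the Gaussian kernel follows the transfer function given above.

```python
import math

def grnn_predict(x, train_x, train_y, spread):
    """GR network output: Gaussian-kernel weighted average of the
    training targets. All data values here are illustrative."""
    acts = [math.exp(-sum((xi - ti) ** 2 for xi, ti in zip(x, t))
                     / (2.0 * spread ** 2))
            for t in train_x]
    num = sum(a * y for a, y in zip(acts, train_y))
    den = sum(acts)  # normalization performed by the output layer
    return num / den
```

A query far from all but one training point essentially returns that point's target, while a query equidistant from two points returns their average, which is the smoothing behavior the spread parameter controls.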

To achieve the desired goals, different models are developed, the best models are chosen as candidates, and the other models are discarded. Under these circumstances, the cost incurred for the discarded models is wasted. To avoid this, a committee machine can be built by combining the intelligent models and using the features of each of them.

Levenberg-Marquardt algorithm (LM)

In the LM algorithm, the Hessian matrix is approximated as H = JᵀJ and the gradient is computed as g = Jᵀe, where e stands for the vector of network errors and J expresses the Jacobian matrix. In the following update relation of the LM algorithm, the mentioned approximation of the Hessian matrix is used:

x_{k+1} = x_k − [JᵀJ + ηI]⁻¹ Jᵀe

where η is a constant and x denotes the connection weights. η is increased when an experimental step enlarges the performance function.
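The LM update above can be illustrated for a one-parameter model, where JᵀJ and Jᵀe reduce to scalars and no matrix inverse is needed; the model y = a·x and all numeric values are illustrative assumptions.

```python
def lm_step(a, xs, ys, eta):
    """One Levenberg-Marquardt update for the one-parameter model y = a*x.
    With a single parameter, x_{k+1} = x_k - (J^T J + eta)^-1 J^T e
    is a scalar expression. A minimal sketch, not the paper's training code."""
    # errors e_i = a*x_i - y_i; Jacobian entries de_i/da = x_i
    e = [a * x - y for x, y in zip(xs, ys)]
    jtj = sum(x * x for x in xs)                 # scalar J^T J
    jte = sum(x * ei for x, ei in zip(xs, e))    # scalar J^T e
    return a - jte / (jtj + eta)
```

With eta = 0 the step is a pure Gauss-Newton step and solves the linear least-squares problem in one iteration; a larger eta damps the step toward gradient descent, which is exactly the trade-off the update rule in the text controls.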

One of the basic features of the backpropagation algorithm is to follow the most negative gradient: it adjusts the weights in the steepest descent direction, along which the performance function decreases most rapidly.

Resilient Backpropagation algorithm (RB)
The most widely used transfer functions in multilayer perceptron (MLP) neural networks are Sigmoid and Tansig, which compress an infinite input range into a finite output. When the steepest descent method is used to train a network with these activation functions, the slope is small when the input is large, so the gradient approaches zero and the weights change very little even when they are far from their optimal values. The resilient backpropagation algorithm eliminates this harmful effect by using only the sign of the derivative to determine the direction of the weight update; the magnitude of the update is governed by a separate step size whose adjustment parameters are expressed as:

Δ_ij(t) = η⁺ Δ_ij(t−1)   if (∂E/∂w_ij)(t) · (∂E/∂w_ij)(t−1) > 0
Δ_ij(t) = η⁻ Δ_ij(t−1)   if (∂E/∂w_ij)(t) · (∂E/∂w_ij)(t−1) < 0
Δ_ij(t) = Δ_ij(t−1)      otherwise

with 0 < η⁻ < 1 < η⁺.

Based on the information in Table 3, the multilayer perceptron (MLP) and cascade forward (CF) neural networks trained with the Levenberg-Marquardt (LM) and Bayesian Regularization (BR) optimizers are more accurate than those trained with the Resilient Backpropagation (RB) and Scaled Conjugate Gradient (SCG) optimizers. The higher accuracy of the LM and BR optimizers is possibly due to their use of nonlinear least squares to find local minima. Given the lower accuracy of the SCG and RB methods compared with the LM and BR methods, it can be concluded that these optimizers are not suitable for this regression analysis. Since the efficiency of the created models is strongly influenced by the initial biases and weights, the training of the artificial neural networks with each optimizer was executed more than 50 times by trial and error with dissimilar initial biases and weights, and the most satisfactory results were chosen. According to Table 3, the RBF model is less accurate than the other models but requires less program run time, which can be useful for models with many input data.
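The sign-based step-size adaptation of resilient backpropagation can be sketched for a single weight; the hyperparameter defaults (growth and shrink factors, step limits) follow common Rprop practice and are assumptions, not values from the paper.

```python
def rprop_step(grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_max=50.0, step_min=1e-6):
    """One resilient-backpropagation update for a single weight.
    Only the SIGN of the gradient is used; its magnitude is ignored,
    which sidesteps the vanishing slope of saturated Sigmoid/Tansig units."""
    if grad * prev_grad > 0:        # same direction: grow the step size
        step = min(step * eta_plus, step_max)
    elif grad * prev_grad < 0:      # sign flip: overshoot, shrink the step
        step = max(step * eta_minus, step_min)
    sign = (grad > 0) - (grad < 0)
    delta_w = -sign * step          # move against the gradient sign
    return delta_w, step
```

Because the update depends only on the gradient's sign, a tiny gradient from a saturated neuron produces the same step as a large one, which is the "harmful effect" the algorithm removes.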

To identify the optimal values of the spread coefficient and the number of neurons, the RBF model was implemented more than 100 times, and the best results were stored. The generalized regression neural network (GR) model, like the RBF model, has a spread coefficient parameter, which was set to 0.5 by trial and error. Based on the information in Tables 3 and 4, some models, such as the radial basis function neural network (RBF), which has the highest AARE value in Table 3 but one of the lowest error values here, are among the best models for prediction, indicating accurate learning of the input-output relationships. For the range of 130 to 150 meters, the AARE value is the lowest. Fig. 9 presents the efficiency of the developed models over dissimilar ranges of the electrical resistivity of the bedrock.
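The trial-and-error selection of the spread parameter can be sketched as a simple validation search; the candidate values, toy data, and the scalar GR predictor below are illustrative assumptions.

```python
import math

def grnn_predict(x, train_x, train_y, spread):
    # Scalar-input GR prediction: Gaussian-weighted average of targets
    acts = [math.exp(-((x - t) ** 2) / (2.0 * spread ** 2)) for t in train_x]
    return sum(a * y for a, y in zip(acts, train_y)) / sum(acts)

def tune_spread(train_x, train_y, val_x, val_y, candidates):
    """Pick the spread that minimizes AARE on held-out validation data."""
    best = None
    for s in candidates:
        preds = [grnn_predict(x, train_x, train_y, s) for x in val_x]
        err = sum(abs(p - o) / abs(o)
                  for p, o in zip(preds, val_y)) / len(val_y)
        if best is None or err < best[1]:
            best = (s, err)
    return best[0]
```

The same loop, run many times over a grid of candidates, is the mechanical form of the trial-and-error procedure described in the text.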

As can easily be seen, as in Fig. 7, the lowest AARE value here lies in the range of 21 to 30 ohm-m. Besides, the CMIS model has the lowest AARE in all four ranges. Fig. 10 depicts the AARE values of the developed models. William's plot for the results obtained by the developed committee machine is presented in Fig. 15. Some data points may be well predicted, but because of their deviation from the bulk of the data, they fall outside the acceptable range of the model.
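A minimal sketch of the quantities behind a William's plot, assuming a single-descriptor model so that the leverage simplifies to h_i = x_i²/Σx²; the threshold h* = 3(p+1)/n and the ±3 standardized-residual limit are the conventional choices, and the paper's multivariate case would instead use h_i = x_i (XᵀX)⁻¹ x_iᵀ.

```python
def leverages(x):
    """Leverage values for a one-descriptor, no-intercept model."""
    sx2 = sum(v * v for v in x)
    return [v * v / sx2 for v in x]

def warning_leverage(n_samples, n_params):
    # Conventional William's plot threshold h* = 3(p + 1) / n
    return 3.0 * (n_params + 1) / n_samples

def outliers(x, std_residuals, std_limit=3.0):
    """Indices outside the applicability domain: leverage above h*
    or |standardized residual| above the +/- 3 band."""
    h = leverages(x)
    hstar = warning_leverage(len(x), 1)
    return [i for i, (hi, r) in enumerate(zip(h, std_residuals))
            if hi > hstar or abs(r) > std_limit]
```

Points flagged by either criterion are the ones the text describes: they may be predicted well, yet their distance from the bulk of the data places them outside the model's validity range.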