Here, we have used the feed-forward back-propagation (FFBP) method, consisting of an input layer, a hidden layer, and an output layer (Fig. 2a). Each layer is characterized by one or more neurons (nodes). In this method, a weighted sum of all the incoming node values is first computed at each node of the hidden layer. Next, a bias is added to this sum, and the result is transformed through a non-linear function before being transferred to the next layer. Functions such as the hyperbolic tangent, sigmoid and linear functions can be used as the transfer function. Subsequently, the same procedure is followed at the output layer to obtain the network output. The error between the network output and the actual observation is estimated at the output layer. Finally, the desired outputs are achieved by back-propagating this error to the input layer, through the hidden layer of the network. All the synaptic weights of the network remain fixed during the forward processing, while they are adjusted according to an error-correction rule during the backward processing (Haykin, 1999). The FFBP network structure used in the present study is shown in Fig. 2b. Here, the input layer consists of xn neurons, the hidden layer of hm neurons, and the single output layer of yl neurons. The weights (w) act as the links between the nodes of one layer and those of the next: wij denotes the input-to-hidden weights and wjk the hidden-to-output weights. The output of the network, yk (Gunaydin and Gunaydin, 2008), can be written as
\({y}_{k}=\dot{f}\left[\sum _{j=0}^{m}{w}_{jk}\,f\left(\sum _{i=1}^{n}{w}_{ij}{x}_{i}+{b}_{1}\right)+{b}_{2}\right]\)   (1)
where b1 and b2 are the biases for the first and second layer, respectively. The activation functions between the input and hidden layers and between the hidden and output layers are f(.) and \(\dot{f}\)(.), respectively.
Here, functions like the tangent sigmoid (tan-sigmoid), logarithmic sigmoid (log-sigmoid) and linear activation functions have been used for both f(.) and \(\dot{f}\)(.) to obtain the best prediction performance.
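For illustration, the sketch below implements the forward computation of Eq. (1) for a single-hidden-layer network in NumPy; the function names (tansig, logsig, purelin), the dimensions and the random weights are hypothetical and serve only to show how the weighted sums, biases and transfer functions combine.

```python
import numpy as np

# Candidate transfer functions named in the text (names are illustrative)
def tansig(x):            # tangent sigmoid (hyperbolic tangent)
    return np.tanh(x)

def logsig(x):            # logarithmic sigmoid
    return 1.0 / (1.0 + np.exp(-x))

def purelin(x):           # linear activation
    return x

def ffbp_forward(x, W_ih, b1, W_ho, b2, f=tansig, f_out=purelin):
    """Forward pass of Eq. (1) for one input pattern.
    x    : (n,)   input vector x_i
    W_ih : (m, n) input-to-hidden weights w_ij
    b1   : (m,)   first-layer bias b_1
    W_ho : (l, m) hidden-to-output weights w_jk
    b2   : (l,)   second-layer bias b_2
    """
    h = f(W_ih @ x + b1)         # hidden-layer activations through f(.)
    return f_out(W_ho @ h + b2)  # network output y_k through f_dot(.)

# Hypothetical dimensions: n = 4 inputs, m = 8 hidden nodes, l = 1 output
rng = np.random.default_rng(0)
n, m, l = 4, 8, 1
y = ffbp_forward(rng.normal(size=n),
                 rng.normal(size=(m, n)), np.zeros(m),
                 rng.normal(size=(l, m)), np.zeros(l))
```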
For the learning model, the back-propagation network model (Rumelhart, 1986) is used here, which iteratively minimizes the mean sum squared error (MSE), defined by
\(E=\frac{1}{2P}\sum _{p=1}^{P}\sum _{k=1}^{l}{\left({T}_{pk}-{y}_{pk}\right)}^{2},\;\; p=1, 2, 3, \ldots, P\)   (2)
where \({T}_{pk}\) and \({y}_{pk}\) are the observed target and the predicted output at the kth output node of the pth pattern, respectively. The total number of training patterns considered here is P. At each iteration, the global error (E) is minimized by adjusting the weights in each layer of the network until convergence is achieved; otherwise, the same process is repeated for further iterations.
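As a concrete check on Eq. (2), the following sketch computes E from arrays of targets and outputs; the array shapes and sample values are hypothetical.

```python
import numpy as np

def mean_sum_squared_error(T, Y):
    """Global error E of Eq. (2).
    T : (P, l) observed targets T_pk
    Y : (P, l) predicted outputs y_pk
    """
    P = T.shape[0]
    return np.sum((T - Y) ** 2) / (2.0 * P)

# Hypothetical example: P = 3 training patterns, l = 1 output node
T = np.array([[0.20], [0.50], [0.90]])
Y = np.array([[0.25], [0.45], [0.80]])
E = mean_sum_squared_error(T, Y)   # = (0.05**2 + 0.05**2 + 0.10**2) / (2*3) = 0.0025
```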
Each step in the learning phase is defined as a learning epoch. Here, the Levenberg-Marquardt algorithm (Levenberg, 1944; Marquardt, 1963) has been used for the learning phase; it minimizes E through the weight update expressed as:
\({W}_{new}={W}_{old}-{\left[{J}^{T}J+\gamma I\right]}^{-1}{J}^{T}E({W}_{old})\)   (3)
where J is the Jacobian of the error function E, I is the identity matrix, and γ marks the iteration step value. Here, an adaptive learning rate is used that changes dynamically during the training stage between 0 and 1. The learning rate is increased by the learning-increment factor if the performance decreases toward the goal during an epoch; otherwise, it is adjusted by the learning-decrement factor when the performance increases during an epoch. A value of 0.0001 is used as the performance goal throughout all FFBP simulations. After the training phase of the network is completed successfully, a testing dataset (30% of the total data points) is used to examine the performance of the trained model.
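A minimal sketch of the weight update in Eq. (3), together with the adaptive adjustment described above, is given below, assuming NumPy; the increment and decrement factors, the flattened weight vector and the error vector are placeholders rather than the exact settings used in this study.

```python
import numpy as np

def lm_update(w, J, e, gamma):
    """One Levenberg-Marquardt step of Eq. (3).
    w     : (q,)   flattened weight vector W_old
    J     : (r, q) Jacobian of the errors with respect to the weights
    e     : (r,)   error vector evaluated at W_old
    gamma : float  iteration step (damping) value added to J^T J
    """
    step = np.linalg.solve(J.T @ J + gamma * np.eye(w.size), J.T @ e)
    return w - step

def adapt_gamma(gamma, E_new, E_old, increment=10.0, decrement=0.1):
    """Adaptive adjustment: gamma is reduced when the error decreases toward
    the goal (i.e., the effective learning rate increases) and enlarged when
    the error grows; the increment/decrement factors are placeholders."""
    return gamma * decrement if E_new < E_old else gamma * increment

GOAL = 1e-4   # performance goal used for all FFBP simulations (as stated in the text)
```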
The normalization of the training input and output datasets is done through the following equation:
\({x}_{ni}=a\cdot \frac{{x}_{i}-{x}_{min}}{{x}_{max}-{x}_{min}}+b\)   (4)
where \({x}_{i}\) is the observed value of the ith record, xni is its normalized value, and xmax and xmin are the maximum and minimum values of the dataset, respectively. Different values can be assigned to the scaling factors a and b; here, different values of a and b are chosen for the different input and output parameters to obtain the best prediction performance, as discussed in the Supplementary data. This procedure normalizes both the inputs and the output within the range [-1.5, +1.5]. Several trials were made to select the optimum number of nodes in the hidden layer, and the best performance is obtained with 8 hidden nodes. Network training is stopped after a maximum of 10,000 epochs.
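As an illustration of Eq. (4), the sketch below applies the scaling with one hypothetical choice of a and b that maps a record exactly onto [-1.5, +1.5]; the actual factors used for each input and output parameter are those listed in the Supplementary data.

```python
import numpy as np

def normalize(x, a, b):
    """Min-max scaling of Eq. (4): x_ni = a * (x_i - x_min) / (x_max - x_min) + b."""
    x = np.asarray(x, dtype=float)
    return a * (x - x.min()) / (x.max() - x.min()) + b

# Hypothetical values: a = 3.0 and b = -1.5 map the data onto [-1.5, +1.5]
x_raw = np.array([2.0, 4.5, 7.0, 9.5, 12.0])
x_norm = normalize(x_raw, a=3.0, b=-1.5)   # -> [-1.5, -0.75, 0.0, 0.75, 1.5]
```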