Peak ground acceleration prediction using artificial neural networks for Kachchh, Gujarat, India

The uncertainty in empirical ground motion prediction models (GMMs) for any region depends on several parameters. In the present work, we apply an artificial neural network (ANN) to design a GMM of peak ground acceleration (PGA) for Kachchh, Gujarat, India, utilizing independent input parameters, viz. moment magnitude, hypocentral distance, focal depth and a site proxy (the average seismic shear-wave velocity from the surface to a depth of 30 m, Vs30). The study has been performed using a PGA dataset consisting of eight engineering seismoscope records of the 2001 Mw 7.7 Bhuj earthquake and 237 strong-motion records of 32 significant Bhuj aftershocks of Mw 3.3–5.6 (during 2002–2008) with epicentral distances ranging from 1.0 to 288 km. We apply a feed-forward back-propagation ANN method with 8 hidden nodes, which is found to be optimal for the selected PGA database and input–output mapping. The standard deviation of the prediction error has been utilized to examine the performance of our model. We also test the ground motion predictability of our ANN model using real recordings of the 2001 Bhuj mainshock, two Mw 5.6 Kachchh aftershocks and the 1999 Mw 6.4 Chamoli mainshock. The standard deviation of the PGA prediction error, in log10 units, is found to be ± 0.2554. The predictability tests suggest that our ANN model provides good PGA predictions for earthquakes of Mw 5.6–7.7 occurring in Kachchh, Gujarat, India.


Introduction
In general, ground motion prediction equations (GMPEs) or attenuation relationships form a key input for the earthquake hazard assessment of any region (Joyner and Boore, 1993; Ambraseys et al. 1996; Campbell, 1997). Typically, linear regression analysis is applied to observed and/or synthetic peak ground acceleration data to determine GMPEs (Joyner and Boore, 1993; Ambraseys et al. 1996; Campbell, 1997; Boore and Atkinson, 2008; Nath et al. 2009; Anbazhagan et al. 2013). However, this method introduces errors in ground motion predictability due to the assumption of a prior functional form. Further, the availability of fewer independent input parameters in India, in comparison with the USA and Europe, has resulted in most of the available GMPEs in India being functions of magnitude and hypocentral distance only (Nath et al. 2009; Anbazhagan et al. 2013). It is a well-established fact that the inclusion of other independent input parameters (such as focal depth, Vs30, soil conditions, focal mechanisms, etc.) can lead to relatively robust prediction of ground motions (Abrahamson and Silva, 2008; Campbell and Bozorgnia, 2008; Castellaro et al. 2008; Kokusho and Sato, 2008). The CSIR-National Geophysical Research Institute (NGRI), Hyderabad, India, monitored the earthquake activity in Kachchh during 2002–2016 with a close seismic network of 8–12 three-component broadband seismographs and 10–20 three-component strong-motion accelerographs (Fig. 1), which provided an excellent dataset for the robust estimation of moment magnitudes, focal depths, hypocentral distances and focal mechanisms. Further, toward hazard assessment, several micro-zonation studies have been carried out in Kachchh, which provided Vs30 estimates for the different regions of Kachchh and enabled us to construct maps of Vs at shallow depths below the Kachchh basin (Mandal and Asano, 2009).
Therefore, the availability of PGA data and input parameters (like Mw, focal depths, Hypocentral distances, and V s30 ) for the Kachchh region has motivated us to apply an artificial neural network method to construct a ground motion model (GMM) for the PGA prediction.
Nowadays, owing to the availability of large datasets, artificial neural networks (ANNs) have been commonly used to determine GMPEs (Ahmad et al. 2008; Alavi and Gandomi, 2011; Güllü and Erçelebi, 2007; Rumelhart et al. 1986; Derras et al. 2012 and 2016; Irshad et al. 2008). By applying an ANN methodology, a simple but powerful relationship is derived with Mw, focal depth, hypocentral distance and the shear velocity in the upper 30 m (Vs30) for predicting PGA for Kachchh, Gujarat, India. ANNs have been demonstrated to be powerful tools for data fitting, pattern recognition and time series prediction in very complex situations. Massive parallel computations with a large number of variables can also be easily performed by an ANN, which can also grasp the nonlinear aspects of an earthquake dataset. Thus, a data-driven ANN can provide robust prediction of ground motions.
In the present work, we apply a feed-forward back-propagation ANN method with 8 nodes in a single hidden layer, which is found to be optimal for the selected PGA database and input–output mapping for the Kachchh region. The ANN method uses the Levenberg–Marquardt training algorithm for mapping the input–output datasets, as available within the ANN toolbox of MATLAB 2018b. Finally, the ground motion predictability of the derived GMM has been tested against recorded PGAs of different Indian earthquakes and predicted PGAs from available GMPEs for India and eastern North America.

Strong motion acceleration dataset
For developing a GMM for PGA prediction using an ANN method, strong-motion data of 237 recordings of 32 Kachchh earthquakes of Mw 3.3–5.6, recorded during 2002–2008, have been used. For our study, eight engineering seismoscope (SRR) records of the 2001 Mw 7.7 Bhuj earthquake are also utilized. The robust estimates of hypocentral parameters and moment magnitudes have been obtained by using data from 12 broadband seismographs and 10–20 strong-motion accelerographs (Fig. 1). For our ANN modelling, we used the moment magnitude (Mw), focal depth (D, m) and hypocentral distance (R, m) as independent inputs, while Vs30 (m/s) served as the site proxy. The hypocentral distances for the dataset vary from 335 to 28900 m, while the focal depths and Mw vary from 3 to 32 km and 3.3 to 7.7, respectively (Fig. 2a, b, c and d). The dataset is restricted to earthquakes in Kachchh, Gujarat, for which the fault rupture lies mainly above a depth of 40 km. The values of Vs30 have been obtained from several micro-zonation studies in Kachchh, Gujarat, which were used to construct the three-dimensional shallow (0–0.9 km) shear velocity structure below the Kachchh basin (Mandal and Asano, 2019). For peak values, we use the larger of the two horizontal components in the directions as originally recorded; others (e.g. Campbell, 1981) have used the mean of the two components. The open circles in Fig. 2a, b, c and d show the distribution of the peak acceleration data with moment magnitude, hypocentral distance, focal depth and Vs30. For the 2001 Mw 7.7 Bhuj earthquake, the sources of strong-motion data have been given in previous publications (Cramer and Kumar, 2003; Iyengar and Raghukanth, 2004). The dataset consists of a total of 245 data points. For our ANN modelling, we used 70% (i.e. 171 points) of the dataset for training, while the remaining 30% has been used for testing and validation of the model.
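The 70/15/15 partitioning described above can be sketched as follows; this is a minimal illustration in Python, and the fixed seed and NumPy usage are assumptions for reproducibility, not the paper's actual procedure:

```python
import numpy as np

# Shuffle the 245 record indices and split them 70/15/15
# (171 training, 37 validation, 37 test), as described in the text.
rng = np.random.default_rng(42)   # illustrative fixed seed
idx = rng.permutation(245)

train_idx = idx[:171]             # 70% for training
val_idx = idx[171:208]            # 15% for validation
test_idx = idx[208:]              # 15% for testing
```

Shuffling before splitting avoids any ordering bias (e.g. by event date or magnitude) leaking into a particular subset.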
Utilizing the above-discussed PGA dataset of Kachchh earthquakes, Mandal et al. (2009) derived a ground motion attenuation relationship, Eq. (1), for the Kachchh region using regression analysis following the Joyner and Boore (1993) method for a magnitude-independent shape, where Y is the peak horizontal acceleration in 'g', Mw is the moment magnitude, and rjb is the closest distance to the surface projection of the fault rupture in km; S is 0 for a soft soil site and 1 for a stiff soil site. However, Eq. (1) predicts ground motion with a standard deviation of ± 0.8243. Thus, it is apparent that Eq. (1) cannot provide a very robust ground motion prediction (its standard deviation exceeds 0.8). This observation demands a serious effort to improve the GMPE for the Kachchh region, which lies in seismic zone V (BIS, 2002).

Artificial neural network (ANN) method
A statistical learning method, known as the artificial neural network (ANN), was first developed by McCulloch and Pitts (1943) based on the concept of biological neural networks in the brain (Perlovsky, 2001). Different problems without algorithmic solutions, or with complex solutions, can be solved through an ANN by developing a predictive model for the outputs given the inputs. The neurons of each layer are fully connected to the neurons of the adjacent layers with connection weights. The connection weights come from the training process of the network, which is done here by implementing the Levenberg–Marquardt back-propagation algorithm (Marquardt, 1963). Here, we have used a multilayer perceptron (MLP) network, a kind of ANN with a feed-forward back-propagation architecture, for predicting PGA, consisting of an input layer, a hidden layer and an output layer (Cybenko, 1989; Fig. 3a). One or more neurons (nodes) characterize each layer. In this method, a weighted sum of all incoming node values is first computed at each node of the hidden layer. Next, a bias is added to this sum, which is transformed through a nonlinear function prior to being transferred to the next layer. Functions such as the hyperbolic tangent, sigmoid and linear functions can be used as transfer functions. Subsequently, the same procedure is followed in the output layer to obtain the network output. The error between the network output and the actual observation is estimated at the output layer. Finally, the desired outputs are achieved by back-propagating this error from the output layer to the input layer, through the hidden layer of the network. All the synaptic weights of the network remain fixed during the forward pass, while they are adjusted according to an error-correction rule during the backward pass (Haykin, 1999). The feed-forward back-propagation (FFBP) network structure used in the present study is shown in Fig. 3b.
Here, the input layer consists of x_n neurons, the hidden layer of h_m neurons, and the single output layer of y_l neurons. The weights (w) act as the links between the nodes of one layer and those of the next: w_ij denotes the input-to-hidden weights, while w_ik represents the hidden-to-output weights. The output of the network, y_k (Gunaydin and Gunaydin, 2008), can then be written as

y_k = ḟ( b_2 + Σ_i w_ik · f( b_1 + Σ_j w_ij · x_j ) )  (2)

where b_1 and b_2 are the biases for the first and second layers, respectively. The activation functions between the input and hidden layers and between the hidden and output layers are f(.) and ḟ(.), respectively. Functions such as the tangent sigmoid (tan-sigmoid), logarithmic sigmoid (log-sigmoid) and linear activation functions have been tried for both f(.) and ḟ(.) to obtain the best prediction performance.
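As an illustration, the forward pass of Eq. (2) can be sketched in Python. The array shapes (four inputs, eight hidden nodes) follow the model of this study, but the function name, random weights and toy input values are assumptions for demonstration only:

```python
import numpy as np

def forward(x, w_ih, b1, w_ho, b2):
    """Forward pass of a one-hidden-layer FFBP network.
    x: (n,) inputs; w_ih: (m, n) input-to-hidden weights (w_ij);
    b1: (m,) hidden biases; w_ho: (m,) hidden-to-output weights (w_ik);
    b2: scalar output bias."""
    h = np.tanh(w_ih @ x + b1)   # f(.): tan-sigmoid at the hidden layer
    return w_ho @ h + b2         # f'(.): linear at the output node

# toy call: 4 inputs (Mw, log10 D, log10 R, log10 Vs30), 8 hidden nodes
rng = np.random.default_rng(0)
x = np.array([5.6, 1.2, 1.8, 2.5])
y = forward(x, rng.normal(size=(8, 4)), rng.normal(size=8),
            rng.normal(size=8), 0.1)
```

Because tanh saturates in (−1, 1), the hidden-layer outputs stay bounded regardless of the raw input magnitudes, which is one reason inputs are normalized before training.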
For the learning model, the back-propagation network model (Rumelhart, 1986) is used here, which aims at iteratively minimizing the mean sum squared error (MSE), defined by

E = (1/P) Σ_p Σ_k (T_pk − y_pk)²  (3)

where T_pk and y_pk are the observed target and predicted output at the k-th output node of the p-th pattern, respectively, and P (p = 1, 2, 3, ..., P) is the total number of training patterns. At each iteration, the global error E is minimized by adjusting the weights in each layer of the network until convergence is achieved; otherwise, the same process is repeated for further iterations.
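A minimal sketch of the global error E, assuming it is the mean of the squared target-output differences over all patterns and output nodes as in Eq. (3); the function name is illustrative:

```python
import numpy as np

def global_error(T, Y):
    """Mean sum squared error of Eq. (3).
    T, Y: arrays of shape (P, K) - P training patterns, K output nodes."""
    return np.mean((T - Y) ** 2)
```

For the present network K = 1 (a single PGA output node), so the double sum over p and k collapses to an average over the training patterns.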
A learning epoch is defined as each step in the learning phase. Here, for the learning phase, the Levenberg–Marquardt algorithm (Levenberg, 1944; Marquardt, 1963) has been used to minimize E, with the weight update expressed as

w_(t+1) = w_t − (JᵀJ + γI)⁻¹ Jᵀ E  

where J is the Jacobian of the error function E, I is the identity matrix, and γ marks the iteration step value. Here, an adaptive learning rate is used that changes dynamically during the training stage from 0 to 1. We increase the learning rate by the learning-increment factor if performance decreases toward the goal over an epoch; otherwise, we reduce the learning rate by the learning-decrement factor when performance increases over an epoch. We use a value of 0.0001 as the performance goal throughout all FFBP simulations. After completing the training phase of the network successfully, a testing dataset (30% of the total data points) is used to examine the performance of the trained model.
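One Levenberg–Marquardt step can be sketched as follows, assuming the standard damped Gauss–Newton update with the residual vector in place of E; the variable and function names are illustrative, not MATLAB's implementation:

```python
import numpy as np

def lm_step(w, J, e, gamma):
    """One Levenberg-Marquardt update for parameter vector w.
    J: (P, n) Jacobian of residuals e w.r.t. w; e: (P,) residuals;
    gamma: damping term (large gamma -> gradient-descent-like step,
    small gamma -> Gauss-Newton-like step)."""
    n = w.size
    delta = np.linalg.solve(J.T @ J + gamma * np.eye(n), J.T @ e)
    return w - delta
```

In practice γ is adapted each epoch: it is reduced when a step lowers E (trusting the Gauss–Newton direction) and raised when a step would increase E.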
The normalization of the training input and output datasets is done through the following equation:

x_ni = a · (x_i − x_min)/(x_max − x_min) + b  (4)

where x_i is the observed value of the i-th record and x_ni is the normalized value of the i-th record; x_max and x_min are the maximum and minimum values, respectively. Different values can be assigned to the scaling factors 'a' and 'b'. Here, we have tried different values of a and b to obtain the best prediction performance, and finally different values of 'a' and 'b' are used for the different input and output parameters, as discussed in the Supplementary data. This procedure has resulted in the normalization of both inputs and output within the range [−1.5, +1.5]. Several trials were made to select the optimum number of nodes for the hidden layer; the best performance was obtained with 8 hidden nodes. The network training was stopped after a maximum of 10,000 epochs.
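Assuming Eq. (4) is a min–max mapping with scale a and offset b (e.g. a = 3 and b = −1.5 so a column spans exactly [−1.5, +1.5]), the normalization can be sketched as below; the default values of a and b are illustrative, not the paper's parameter-specific choices:

```python
import numpy as np

def normalize(x, a=3.0, b=-1.5):
    """Min-max normalization sketch of Eq. (4): map a data column
    linearly so it spans [b, a + b] (here [-1.5, +1.5])."""
    x = np.asarray(x, dtype=float)
    return a * (x - x.min()) / (x.max() - x.min()) + b
```

Each input parameter (Mw, log10 D, log10 R, log10 Vs30) and the output would be normalized column-by-column with its own a and b before training.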

Proposed ANN Model
Here, an artificial neural network (ANN) has been constructed to predict PGA wherein the site proxy log10(Vs30) has been used as an input. The other basic inputs are moment magnitude (Mw), log10(focal depth, D in m) and log10(hypocentral distance, R in m), while the output is log10(y, in m/s²). These inputs and output are normalized using Eq. (4) prior to applying the feed-forward ANN method. A feed-forward supervised-learning ANN architecture consisting of input, output and a single hidden layer is chosen in this study, as shown in Fig. 3a and b. A single hidden layer is generally acknowledged to be sufficient for describing a nonlinear physical process. The model with four input variables [moment magnitude, log10(D), log10(R), log10(Vs30)] and eight nodes in the hidden layer, which reported the least σ during the three phases, is considered as the final selected model. The activation functions selected between the input-hidden layer and the hidden-output layer are tan-sigmoid and linear, respectively. The ANN architecture of the derived model is shown in Fig. 3b.
The general structure of the i-th neuron of the hidden layer is shown in Fig. 4a, wherein each neuron receives information weighted with the corresponding connection weights, and the input to the activation function is formed by the summation of the weighted inputs and the bias b_i (Fig. 4a). The output of the neuron is then computed as

Y_i = φ( b_i + Σ_j W_ji · X_j )  (5)

where Y_i is the output of the i-th neuron in the hidden layer, W_ji is the connection weight between the j-th neuron of the input layer and the i-th neuron of the hidden layer, and b_i is the bias of the i-th hidden neuron. Here, the activation is the tan-sigmoid function

φ(x) = tanh(x) = 2/(1 + e^(−2x)) − 1  (6)

Other types of activation function can also be used for the ANN network, viz. identity, binary step, logistic, arctan, rectified linear unit and exponential linear unit. The selection of the activation function does not necessarily control the accuracy of the ANN model, but it can affect the values of the weights between different nodes. During the training process, the value of the bias (b_i) is determined. The output Y_i serves as input to the neurons of the next layer. The general structure of the output neuron is shown in Fig. 4b: the values of Y_i calculated in the hidden neurons are its inputs, and the PGA values are predicted by passing the weighted summation through the activation function. This is achieved by adopting a linear activation function that scales the output of the summation junction to the actual values of the ground motion parameters.

Attenuation Models
The above-discussed ANN model is trained and then cast into the mathematical formulation of PGA (cm/s²) as follows:

log10(PGA) = b_PGA + Σ_i v_iPGA · F_iPGA  (7)

where b_PGA represents the bias value of the output neuron of the ANN model of PGA, and v_iPGA marks the connection weight between the i-th neuron of the hidden layer and the output neuron (Fig. 4a and b). F_iPGA is computed using the following equation:

F_iPGA = tanh( b_i + Σ_j w_ji · X_j )  (8)

where X_j are the normalized inputs, w_ji the input-to-hidden connection weights and b_i the hidden-layer biases. In the analysis, the original data are rescaled to dimensionless values before being input to the networks (see the normalization of data section in the Appendix). We examined several numbers of nodes for the hidden layer and finally selected 8 nodes, which showed the smallest validation loss among the tested models. The tan-sigmoid function is used as the activation function in the hidden layer, while the linear (identity) function is used in the output layer. The connection weights and biases for our proposed ANN model are listed in Table 1. Our derived attenuation model is given in Eqs. (7) and (8). Figure 5a shows the performance of our modelling: the model was trained for 11 epochs, but training effectively converged by epoch 5 and was stopped there because the model began to overfit beyond that point. Here, we used the Levenberg–Marquardt training algorithm to train our dataset. The best validation performance is 0.08825 at epoch 5 (Fig. 5a). The validation curve of our model aligns with the best fit between epochs 5 and 11, while the training and test curves show lower values than the best-fit values between epochs 2 and 11. Figure 5b shows the error histogram for our NN modelling: most of the larger positive and negative errors lie near the zero-error line (shown by a thick yellow line), which suggests that the error distribution of our NN modelling is good. Thus, our model is well trained.
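Evaluating the trained model of Eqs. (7) and (8) amounts to one tanh hidden layer followed by a linear output neuron. A sketch is given below with placeholder weights; in an actual application the weights and biases would be the values listed in Table 1, and X would be the Eq. (4)-normalized inputs:

```python
import numpy as np

def predict_log10_pga(X, W, b_hidden, v, b_pga):
    """Evaluate Eqs. (7)-(8).
    X: (4,) normalized inputs [Mw, log10 D, log10 R, log10 Vs30];
    W: (8, 4) input-to-hidden weights w_ji; b_hidden: (8,) biases b_i;
    v: (8,) hidden-to-output weights v_iPGA; b_pga: output bias b_PGA."""
    F = np.tanh(W @ X + b_hidden)   # Eq. (8): hidden-node outputs F_iPGA
    return b_pga + v @ F            # Eq. (7): linear output neuron
```

Since each F_iPGA is bounded in (−1, 1), the predicted log10(PGA) is bounded by b_PGA ± Σ_i |v_iPGA|, a useful sanity check on any implementation of Table 1.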
Figure 6a, b, c and d shows the regression analysis of the neural network modelling for our training, validation, test and complete datasets. Our training dataset consists of 171 data points (i.e. 70% of the total), while the validation and test datasets have 37 data points (i.e. 15%) each; the complete dataset consists of a total of 245 data points. Our observed and predicted PGA values (through FFBP modelling) show a good linear correlation (Fig. 6a, b, c and d). Thus, our NN modelling has apparently trained on and predicted the data quite well. From Fig. 6a, b, c and d, we obtain four linear fit lines of the form y = a·x + b, where x and y are the observed and predicted (from the FFBP) PGA, respectively, which are mentioned below. Figure 7a and b shows good agreement between the predicted and observed PGA values for the training and test datasets, while Fig. 7c and d shows the errors in predicted PGA for the training and test datasets. A standard deviation of ± 0.2554 is obtained for the errors in PGA prediction for the training dataset, while a standard deviation of ± 0.2720 is estimated for the test data. Therefore, the PGA prediction from our NN model is found to be good (Fig. 7a, b, c and d).
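The quoted error statistics can be reproduced, for any dataset split, as the sample standard deviation of the log10(PGA) residuals; a minimal sketch with illustrative numbers (the function name and data are not the paper's):

```python
import numpy as np

def prediction_error_std(observed, predicted):
    """Sample standard deviation of log10(PGA) prediction residuals."""
    residuals = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return residuals.std(ddof=1)   # ddof=1: sample (not population) std
```

Comparing this statistic between the training split (± 0.2554) and the test split (± 0.2720) is the overfitting check used above: a test value close to the training value indicates the model generalizes.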

Results and discussions
The four linear fits are:

y = 0.83·x + 0.12  (9)
y = 0.76·x + 0.16  (10)
y = 0.84·x + 0.12  (11)
y = 0.82·x + 0.13  (12)

Fig. 7 a Predicted log10(PGA) from the NN modelling for the training dataset, b predicted log10(PGA) from the NN modelling for the testing dataset, c error in predicted log10(PGA) from the NN modelling for the training dataset, showing a standard deviation of ± 0.2554, and d error in predicted log10(PGA) from the NN modelling for the testing dataset, showing a standard deviation of ± 0.2720.

Fig. 8 The attenuation curves (solid thick red line) for the predicted PGA using the feed-forward data-driven ANN trained network for a Mw 5.6, b Mw 6.5 and c Mw 7.7 events. Observed PGA data (blue open circles) for the 2001 Bhuj mainshock, large Bhuj aftershocks of Mw 5.6 and the 1999 Chamoli earthquake have also been plotted to demonstrate the robustness of the prediction. The solid thick grey line marks the predicted attenuation curves from the ground motion relations obtained from the regression analysis of the PGA dataset of the 2001 Bhuj earthquake sequence (Mandal et al., 2009). In Fig. 8b, the black solid line marks the attenuation curve for the predicted PGA (using the ANN model) for an Mw 6.0 earthquake. In Fig. 8c, predicted curves for the ground motion attenuation models of Frankel et al. (1996), Toro et al. (1997) and Somerville et al. (2001) are also shown.

Prediction of PGA and correlation with the observed PGA from moderate to large Indian Earthquakes
The predicted PGA estimates from our trained ANN model show good agreement with the available observed PGA data for the 2001 Bhuj mainshock (derived from engineering seismoscope records) and two 2006 Bhuj aftershocks of Mw 5.6 (Fig. 8a, b and c). Since we do not have observed PGA data from Kachchh events of M 6.0–6.5, we have used, to validate the PGA predictions from our ANN model, the available observed PGA data for another Indian earthquake, the 1999 Mw 6.4 Chamoli earthquake from the Uttarakhand Himalaya (Fig. 8b). The predicted PGA values from our ANN model (thick red line in Fig. 8a) show an excellent agreement with the observed PGA data of the 7 March 2006 Mw 5.6 and 6 April 2006 Mw 5.6 Bhuj events. We also plotted the attenuation curve (thick grey line in Fig. 8a) predicted from the ground motion relation of Mandal et al. (2009), which does not correlate well with the observed PGA data of these two Mw 5.6 Bhuj events. The observed PGA data of the 1999 Mw 6.4 Chamoli earthquake fall between the predicted PGA curves for Mw 6.0 and Mw 6.5 from our ANN model (Fig. 8b); this difference in PGA values could be attributed to the different source processes and tectonics of the intraplate Kachchh and interplate Chamoli seismic zones. We also plotted the attenuation curve (thick grey line in Fig. 8b) predicted from the ground motion relation of Mandal et al. (2009). For the 2001 Mw 7.7 Bhuj mainshock (Fig. 8c), we also plotted attenuation curves for an Mw 7.7 event predicted by available ground-motion relations for eastern North America (Frankel et al. 1996; Toro et al. 1997; Somerville et al. 2001) and for Kachchh, Gujarat, India (Mandal et al. 2009). From Fig. 8c, it is quite apparent that the predicted ground motion curve for an Mw 7.7 event from our ANN model (thick red line) shows relatively better agreement with the observed data of the 2001 Bhuj earthquake than the curves predicted from the other attenuation relations.
Thus, we infer that the predictive PGA estimates from our ANN model show a fair to excellent correlation with the observed PGA values of earthquakes of Mw 5.6–7.7 occurring in Kachchh, Gujarat, India.

Conclusions
We derive a data-driven feed-forward back-propagation ANN model (with 8 hidden nodes) for predicting peak ground motions using the observed PGA dataset of earthquakes occurring in Kachchh, Gujarat. The dataset consists of thirteen engineering seismoscope (SRR) records of the 2001 Mw 7.7 Bhuj earthquake and 232 strong-motion records of 32 significant Bhuj aftershocks of Mw 3.3–5.6 (during 2002–2008), with epicentral distances ranging from 3.3 to 289 km. Our ANN-predicted PGA values are validated by correlating them with the observed PGA values of Kachchh events of Mw 5.6 and 7.7 and of the 1999 Mw 6.4 Chamoli earthquake.