Stochastic Time-Series Prediction Equation Using Wavelet Packets for Iran

In this study, a stochastic simulation model proposed by Yamamoto and Baker (Bulletin of the Seismological Society of America 103:3044–3056, 2013) is applied to the Iranian strong motion database, which comprises more than 3828 recordings for the period between 1975 and 2018. Each ground motion is decomposed into wavelet packets. Amplitudes of wavelet packets are divided into two groups, and for each group, model parameters are estimated using the maximum likelihood method. Regression coefficients are then obtained relating model parameters to seismic characteristics such as earthquake magnitude, distance, and site condition. Inter-event residuals of coefficients and correlation of total residuals of those parameters are also calculated. An inverse wavelet packet transform is used to reconstruct the amplitudes in the time domain and perform the simulation. Finally, a validation test is performed. The comparison of ground motion intensity measures for recorded and simulated time series shows acceptable conformity in the application. The estimated parameters using the simulated data agree with the real data, indicating the acceptable validity of the estimated stochastic simulation model. The obtained regression equations can generate ground motions for future earthquake scenarios in Iran.


Introduction
The Iranian Plateau is located on the Alpine-Himalayan orogenic belt, and due to the northward motion of the Arabian plate toward Eurasia is experiencing continuous deformation. This deformation occurs mainly in the Alborz, Kopeh Dagh, and Zagros Mountains (Fig. 1a). Except for the Makran subduction zone to the southeast of the Iranian Plateau, where an oceanic lithosphere subducts northward under southeastern Iran (Nilforoushan et al., 2003), a continental form of the Arabian-Eurasian plate convergence causes the present tectonics of Iran (Jackson & McKenzie, 1984)  3) earthquakes are among the most destructive to date. Active seismic sources, young tectonics, and convergence rate due to regional and global tectonics have caused major earthquakes with extensive destruction and casualties over the past 50 years in Iran, where there are many populated industrial cities with critical infrastructures. This indicates a crucial need in seismic risk assessment and its mitigation tools.
Strong ground motion recordings play an indispensable role in earthquake engineering for investigating the individual characteristics of earthquakes and the response of structures under excitation. In almost all seismic regulations, structural design spectra are formed utilizing strong ground motion intensity measures (GMIM, e.g., peak ground acceleration, PGA), and spectral quantities (e.g., spectral acceleration, SA) calculated through hazard assessments. Assessments are essentially performed utilizing ground motion prediction equations (GMPE) compatible with a specific region.
Hence, the precise calculation of GMPE is directly dependent on the number of strong motion recordings in an area. Strong motion simulation techniques are convenient ways to produce abundant ground motion as an input for structural analysis in earthquake engineering, especially for the regions that will be urbanized and that lack sufficient strong motion data. In addition, synthetic accelerograms provide demand parameters of earthquake engineers that can be extracted from them and are allowed in several design regulations for nonlinear time history analyses of structures. Generating synthetic ground motions whose features match the characteristics of the Figure 1 a Five tectonic regions of Iran (Mirzaei et al., 1998). The distribution of the b 892 strong motion stations and c 1636 earthquakes studied in this study 2662 M. Najaftomaraei et al. Pure Appl. Geophys. recorded ground motions can be a substitute approach.
There have been several works on ground motion simulation of earthquakes in Iran. Among them, Nicknam et al. (2009) present a probabilistic-based strong motion compatible with the source path and site soil conditions given the probability of exceedance for the citadel of Arg-e-Bam site bedrock. Soghrat et al. (2011), developed a GMPE for soil and rock sites in northern Iran, based on the calibrated specific barrier model (SBM) against the latest available strong motion data. Hassani et al. (2012), implemented stochastic finite-fault modeling based on dynamic corner frequency to simulate the 22 February 2005 (Mw 6. 4) Zarand earthquake. Zafarani and Soghrat (2012) developed a physically-based GMPE for soil and rock sites in the Zagros region based on the SBM. Zafarani et al. (2013) used a hybrid simulation method to calculate broadband ground motion time histories at the bedrock level for deterministic earthquake scenarios on the NTF, and used a hybrid technique to simulate the ground motion caused by an earthquake with Mw = 4, which occurred close to Tehran on 17 October 2009 and is the first local earthquake that has ever been recorded by the local strong ground motion network in Tehran. Gholami et al. (2013), applied a new analytical methodology for computing synthetic seismograms in three-dimensional (3-D) anelastic media to model the local records of the 2003 Bam Mw 6.6 earthquake, Southeastern Iran. Zafarani et al. (2015), employed a generalized inversion of S-wave amplitude spectra for deriving the site response and S-wave attenuation (quality factor) in the northwest region of Iran. Riahi et al. (2015), simulated the 2003 Bam earthquake acceleration recordings at three strong motion stations using the empirical Green's function method. Vahidifard et al. (2017), presented a simulation of three components of near-field ground shaking recorded during the mainshock at three stations on 16 September 1978 in Tabas (Mw 7.4), Iran, using a hybrid method (composed of discrete wavenumber method) and stochastic finite-fault modeling based on a dynamic corner frequency.
As mentioned earlier, some of the essential issues regarding the studies are as follows: (1) researchers performed ground motion simulations of a specific earthquake, not for a region or even for a whole country; (2) GMPEs predict a particular ground motion parameter (e.g., PGA, SA), not stories; (3) GMPEs predict ground motion at different frequencies with indeterminate time; (4) the research does not consider non-stationarity. As is known, temporal and spectral non-stationarity (changing amplitudes and changing frequency characteristics of time series, respectively) can affect nonlinear dynamic structural response.
Hence, methods that can reproduce nonstationary characteristics of ground motions are needed. Amiri et al. (2011), predicted wavelet packet coefficient amplitudes using a neural network approach. Their simulations do not consider any seismological parameters such as source, path, and site effects, so ground motions containing all available variables of possible future earthquakes generated with their models are somewhat difficult.
Discrete wavelet transform (DWT) is another approach to generating synthetic ground motions Vol. 179, (2022) Stochastic Time-Series Prediction Equation 2663 (e.g., Bai et al., 2012;Sasaki et al., 2003); however, time localization of low-frequency amplitudes is an issue in DWT. Yamamoto and Baker (2013) (hereafter, YB2013) proposed a stochastic method that uses wavelet packet transform (WPT) to describe the waveform in the time and frequency domain and finally simulates time series using inverse wavelet transform. The reasons for selecting WPT among wavelet analysis approaches are as follows: (1) Because of the nonorthogonality of the wavelets at neighbor times and frequencies, reconstructing time series using the continuous wavelet transform (CWT) is complicated.
(2) Time localization of low-frequency amplitudes is an issue in DWT, as mentioned above. (3) Controlling resolution in the time and frequency domains in WPT, a developed copy of the DWT, is easier because the basis functions are orthogonal, making it easy to reconstruct time series. (4) Simulated ground motions resulting from their proposed stochastic ground motion model are variable because they affect the variability of structural responses (e.g., Rezaeian, 2010;Rezaeian & Der Kiureghian, 2012). The YB2013 method has been successfully applied to strong motion data in various parts of the world. Among them, Huang and Wang (2015) used wavelet packets and cokriging analysis to simulate spatially distributed ground motions over a region stochastically.
Stochastic simulations, in general, require fewer parameters and are computationally less expensive than physics-based and hybrid simulations. Still, they do not consider physical phenomena because there are almost no parameters in the stochastic model to control these effects. The method proposed by YB2013 overcomes this problem. Unlike the previous works that simulated a specific earthquake, this method can be applied to a tectonic region. In addition, previous GMPEs can estimate only GMIMs, but YB2013 predicts the whole time series. Unlike the earlier studies, this approach locates the frequencies in time and considers temporal and spectral nonstationarity.
Considering the above-discussed superiority of the YB2013 over conventional stochastic methods, the YB2013 method is utilized to simulate the substantial, strong ground motion recordings in Iran. The motivation of this study comes from the fact that no work is available robustly predicting ground motion time series for Iran with an extensive strong ground motion database.
We begin this paper by introducing the general procedure of YB2013. Then the process is extended to Iranian strong ground motion data. Finally, a comparison of GMIMs of simulated versus recorded strong ground motion and SA at various periods is performed to validate the approach.

Methodology
YB2013 proposed a stochastic simulation method that uses wavelet packet transform to describe the waveform in the time and frequency domain and simulate time series using inverse wavelet transform. The significant advantage of this method is that it does not necessitate detailed information on the source rupture process, propagation path, or local site effects. The success of the physics-based and hybrid simulation techniques depends on the potential for characterizing those fundamental inputs. Fewer parameters are needed for the YB2013 stochastic simulation method. Compared to physics-based and hybrid simulations, they are computationally less expensive. Thirteen parameters are requisite in their proposed model that can be obtained by regression analysis, which comprises proper variability matching with the properties of the simulated ground motions. PGA, Arias intensity (Arias, 1970), and significant duration (Trifunac & Brady, 1976) of simulated ground motions obtained by the model and recorded ground motion were compared. The results showed a relatively appropriate agreement.
The following steps of the stochastic simulation model introduced by YB 2013 are summarized. The whole scheme of their stochastic modeling is also shown in Fig. 2.

Decomposing Time Series Using Wavelet Packet Transform
In the first step, the coefficients of the wavelet packets are obtained. To decompose a ground motion time series into wavelet packets in the time and frequency domain, WPT was employed, and it is defined as follows: where xðtÞ is the time series, c i j;k is the WPT coefficient, in which i is its location on the frequency axis, and j is the scale parameter number that shows the frequency resolution. K is its location in time and w i j;k is the wavelet packet function. n is the number of wavelet packet coefficients; T is the number of time steps and F is the number of frequency points. The mother wavelet function used here is the Meyer wavelet (Meyer, 1986). It has orthogonality characteristics and proper positioning in the time and frequency axis. It is computed from the finite-impulse response-based approximation. To minimize the residual of simulated data, wavelet packets in the lowest frequency (as low as 0.1 Hz) level had to be controlled; to this end, the dt and j were selected to be 0.02 s and 8, respectively.

Dividing Wavelet Packet Coefficients into Two Groups of Major and Minor
Amplitudes of WPT were divided into two groups, major (containing 70% of the total energy) and minor, because WPT has few coefficients with large amplitudes (less than 1% of the total number of wavelet packet coefficients). These two groups are united as follows to obtain the total wavelet packet coefficients: where c i j;k;maj and c i j;k;min are the wavelet packet coefficients in the major and minor groups, respectively.
The energy fraction criteria maximize the difference in the characteristics of the wavelet packet coefficients of the major and minor groups.

Finding Parameters for Both Major and Minor Groups
The 13 parameters that explain the time and frequency characteristics of the acceleration time series are given in Table 1. Equation 4 is obtained for all amplitudes (major and minor groups), Eqs. 5-15 are obtained for two groups separately, and Eq. 15 is obtained for just the major group. Note that the 13th parameter is the standard deviation of independent and identically distributed lognormal random variables with median 1 and logarithmic standard deviation of the residual of the wavelet packets in the minor group computed from the bivariate lognormal function) In both groups, the correlations between time and frequency are transformed into a new parameter using the cumulative density function of the standard normal distribution, called a transformed coefficient because it is more appropriate for prediction using regression analysis EðtÞ Eðf Eðf EðaÞ where E acc is the cumulative squared acceleration for the total wavelet packet coefficients, C i j;k is the wavelet packet amplitudes, EðtÞ and Eðf Þ are the center of the time (temporal centroid) and frequency (spectral centroid) location of the wavelet packet coefficients, respectively, S 2 ðtÞ is the temporal variance, S 2 ðf Þ is the spectral variance, qðt; f Þ is the correlation between the time and frequency of the wavelet packet coefficients, and EðaÞ maj is the mean of the squared amplitudes of the wavelet packet coefficients of a major group. In all equations, maj denotes the major group, and min represents the minor group.

Finding Coefficients of Regression Analysis of Those 13 Parameters (Regression Analysis)
To simulate the ground motion time series, 13 parameters of the model must be predicted as a function of the source, path, and site of future scenarios through a two-stage regression analysis approach (Joyner & Boore, 1993 with a moment magnitude (M w ), hypocentral distance (R hyp ), rupture distance (R rup ), and average shearwave velocity within 30 m depth (V S30 ) as predictor variables.
The 13 model parameters are estimated using the maximum likelihood estimation to obtain the regression coefficient. The functional form used to conduct a regression analysis is as follows: Except for the case of correlation, Y is the natural logarithm of a model parameter (for correlation, Y is the transformed coefficient), and g and d are intra and inter-event residuals, respectively. The parameters a and b i are the regression coefficients that must be predicted. Other details of the regression analysis can be found in the original paper. Through the source parameters, the logarithmic lnðMÞ term is used only for predicting EðaÞ maj and E acc because, at the large magnitude, they may confront saturation.

Major and Minor and Group of Wavelet Packets
The predicted parameters must lead to distributions in time and frequency to simulate future ground motions of a target event.
A statistical survey of wavelet packets shows that large wavelet packet coefficients differ in time and frequency distributions from those of small ones. Therefore, the WPT coefficients were divided into two groups, the major and minor groups. Observations from the major wavelet packet coefficients demonstrate that their locations depend on time and frequency but are independent of their amplitudes. To illustrate this statistically, the squared amplitudes have exponential distributions, and the locations are consistent with bivariate lognormal distributions. Parameters and characteristics of the functional form of WPT coefficients in both groups are listed in Table 2.

Inverse Wavelet Packet Transform
After obtaining wavelet packet coefficients for both groups through the above-mentioned distributions, which are acquired by estimated parameters calculated via regression equation and combining them, the inverse WPT is used to reconstruct the time series from wavelet packets as follows: where 2 N is the number of time steps in the time series. Another required case for reconstructing the time series is the sign of the amplitudes of the wavelet packet obtained here randomly because it does not have a significant effect on the ground motion characteristics. 3. Strong Motion Data and Data Processing The strong motion database of the Road, Housing, and Urban Development Research Center (BHRC) was used between 1975 and 2018. The database comprises 1636 events (Fig. 1c) with Mw between 3 and 7.5. A total of 3828 recordings of 892 stations (Fig. 1b) with hypocentral distances less than 300 km were selected. We selected those events that were recorded in at least five stations. Stations are equipped with SMA1, Guralp, or SSA2 accelerometers with a sampling rate of 200 samples per second. Stations and epicentral distribution of earthquakes are shown in Fig. 1b and c. Figure 3a and b show magnitude versus Repi and Vs 30 versus Repi, respectively. From the 3828 three-component waveforms used in this study, 1469 records were not associated with measured Vs30 data (from 504 stations), so we used the estimated Vs30 obtained by these records in Najaftomaraei et al. (2020). Their approach estimates V S30 stations, using a normal distribution and PDF and CDF of the natural logarithm of horizontal-to-vertical ratio (H/V) of PGA and PSAs at 24 periods (388 stations).
A suitable window for high-quality accelerograms must be selected for initial data processing. To choose the appropriate section of waveforms, arrival times of P and S waves were determined. The considered window starts 2 s before the P wave arrival and continues until the end of the waveform. A linear trend removal was applied to the data. Low-quality data with a signal-to-noise ratio [SNR, Eq. (17)] smaller than 3 (SNR \ 3) were eliminated. Further data with no pre-event and even those with no P wave (truncated data) were not considered. Since having information about the style of faulting is restricted to just some of the events, the effect of this parameter on regression analysis was neglected, which needs to be studied and improved.
where Y is signal amplitudes, N is noise amplitudes, n sig and n noise define the number of samples of signal and noise, respectively, and RMS is the root mean square.

Stochastic Simulation
Essentially major steps given in the methodology section were followed for the simulation. WPT was applied to the processed ground motion recordings in the first phase of the analysis. For example, time, frequency, and wavelet packet coefficients at the corresponding time and frequency of an arbitrary signal are shown in Fig. 4, and the same plot for the 2000/2/7 Mw 4.2 event recorded at Ab-bar station are shown in Fig. 5. We can see the whole wavelet packet transform amplitudes of the record in this figure; large amplitudes are concentrated at about 12.5 s and between 3 and 4 Hz, and it is clear that the number of small amplitudes is much greater than that of large ones. This fact leads to the idea of dividing the amplitudes into two groups. The time and frequency nonstationary characteristics can also be seen in this figure. Then, 13 model parameters were estimated using the observed time series. And regression coefficients were calculated utilizing two-stage regression analysis in MATLAB. For the two-stage regression analysis, we apart from the relation into two regression equations: The first one is related to intra-event effects considering distances and site conditions, and the second is associated with the inter-event impacts with the characteristics of the earthquakes (Joyner & Boore, 1993. To predict all parameters, regression analysis was carried out using the functional form of Eq. 4 for all 3828 records. Figure 6 shows the Major and Minor amplitudes of recordings given in Fig. 5. The major wavelet packet amplitudes are the largest, consisting of 70% of the total energy (less than one percent of the whole number of coefficients). The remaining packets with smaller amplitudes are placed in the minor group. The relationship between wavelet packet coefficients and some of these parameters is shown in Fig. 7. In this figure, E(t) is the temporal centroid, which is associated with the time of the decisive part of the ground motion, S 2 (t) is the temporal variance related to significant duration containing 90% of the total energy, and E(f) is the spectral centroid. It is associated with the dominant frequency, and S 2 (f) is the spectral variance related to significant bandwidth that contains 90% of the total energy. In the final phase, stochastic simulations at 3828 acceleration time series from 1636 earthquakes utilizing the calculated stochastic time-series prediction model.

Results and Discussion
The resulting regression coefficients are shown in Table 3. In Table 4, the correlations of total residuals for each pair of model parameters have been tabulated. There can be some factors that do not exist in the regression equation, which may influence the model parameters computed from observed ground motions. Thus, a correlation must be performed through the intra-and inter-event residuals.
To evaluate and compare the simulated data with the observations, the parameters related to the two categories (simulated and observed data) are plotted relative to each other and are shown in Fig. 8. These parameters are the PGA and spectral accelerations (Sa) at T = 0.01; 0.06; 0.5; 1; 5; 10 and 20 s, Arias intensity (Arias, 1970), and significant duration. As shown in Fig. 8, the estimated parameters from the simulated data are in good agreement with the real data, indicating the acceptable validity of the estimated stochastic simulation model. Figure 8a is simulated versus observed PGA, Fig. 8b is the significant duration of simulated ground motion versus observed, Fig. 8c is Arias intensity of simulated ground motion versus observed, Fig. 8d-j show simulated versus observed PSA at T = 0.01; 0.06; 0.5; 1; 5; 10 and 20 s, respectively. Generally, the model can produce the observed PGA and PSA over

Figure 7
Relationship between wavelet packet coefficients and some of the parameters used in regression analysis [EðtÞ and Eðf Þ are the center of the time (temporal centroid) and frequency (spectral centroid) location of the wavelet packet coefficients, respectively, S 2 ðtÞ is temporal variance, S 2 ðf Þ is the spectral variance] Table 3 Coefficients of the prediction equation   periods from 0.01 to 20 s, but for T = 0.01, T = 5, and T = 10 s, data distribution around the equality line (y = x) is more acceptable.
For all records, wavelet packet coefficients and resulting time series were simulated. Finally, PGA, significant duration, Arias Intensity, and SA at 7 periods were extracted for simulated and recorded ground motion. The left panels of Figs. 9, 10, 11, and 12 simulated and observed acceleration, velocity, and displacement time series are shown for the tabulated earthquake in Table 5. In addition, the Fourier and response spectra of simulated and recorded ground motions are displayed in the right panel of Figs. 9, 10, 11, and 12. These figures show the contrast between the recorded and simulated acceleration time series. A match between the observed and simulated acceleration waveforms, velocity, and displacement is acceptable. The durations of the simulated and observed time series are similar and show qualified matching. The estimated PSA is more significant at long periods than observed, and the model overestimates it. Fourier amplitude spectra matching at frequencies lower than about 8 Hz seems to be the best.
As shown in Table 5, the magnitude of these earthquakes was considered from 4.2 to 7.3 to evaluate the validity of the estimated simulation model in a broader range of magnitudes. Also, the effect of shear wave velocity in 30 m (V 30 ), which indicates the site effect below the stations, was from 296 to 700 km/s for the selected stations to evaluate the performance of the derived simulation model according to different soil effects. In addition, the epicentral distance for the mentioned stations varied from 42 to 111 km (Table 5).
Due to the limitations of the stochastic simulation method and the lack of consideration parameters such as slip distribution on the fault, 3-D velocity, and attenuation mode, the acceptable agreement between the simulated and observed acceleration, velocity and displacement can be seen in the time domain. On the other hand, as it is clear in Figs. 9,10,11,and 12, there is an almost good agreement between the Fourier and response spectra of simulated and observed time series.

Conclusion
To simulate the ground motion time series, 13 statistical model parameters extracted from wavelet packet analysis were obtained and estimated using the maximum likelihood method using the recorded ground motion from the BHRC database for Iran. These parameters were assumed to function with seismic properties such as earthquake magnitude, distance, and Vs30. So through a two-stage regression analysis, the regression coefficient was obtained. Intra-and inter-event residuals of coefficients and correlation of total residuals of those 13 parameters were also calculated. Because temporal and spectral non-stationarity can be governed by this computationally inexpensive method, and since there are some regions in Iran where mega constructions will be established in the future but which also lack sufficient recorded ground motions for engineering purposes, the time-series prediction equation coefficient was obtained to simulate ground motions, followed by a validation test for all records, comparing simulated and recorded parameters such as PGA, significant duration, Arias Intensity, and Sa at seven periods, as well as for four earthquakes. This comparison of engineering demand parameters for recorded and simulated time series showed excellent consistency. Indeed, because of our constraints on data, such as not having the style of faulting, this study can be improved by inputting some other seismological parameters in the model to make it more realistic.