The older the better? The strange case of empirical ground motion models in the near-source of moderate-to-large magnitude earthquakes

This paper aims at providing a quantitative evaluation of the performance of a set of empirical ground motion models (GMMs), by testing them in a magnitude and distance range (Mw = 5.5 ÷ 7.0 and Joyner-Boore source-to-site distance Rjb ≤ 20 km) which dominates hazard in the highest seismicity areas of Italy for the return periods of utmost interest for seismic design. To this end, we made use of the very recent release of the NESS2.0 dataset (Sgobba et al. NESS2.0: an updated version of the worldwide dataset for calibrating and adjusting ground motion models in near-source. Istituto Nazionale di Geofisica e Vulcanologia (INGV), 2021. https://doi.org/10.13127/NESS.2.0), which collects worldwide near-source strong motion records with detailed metadata. After selection of an ample set of GMMs, based either on their application in past seismic hazard assessment (SHA) studies or on their recent introduction, a quantification of between- and within-event residuals of predictions with respect to records was performed, with the final aim of shedding light on the performance of existing GMMs in the near-source of moderate-to-large earthquakes, also in view of their potential improvement by taking advantage of results from 3D physics-based numerical simulations.


Introduction
There is a general consensus that empirical ground motion models (GMMs) are bound to improve their effectiveness and predictive performance as new records become available, although it is also recognized that, in spite of increasing data and more complex parametrizations, a significant reduction in the aleatory variability cannot be achieved (Douglas and Edwards 2016).
However, the exponential increase of records has not populated the different ranges of magnitude and distance uniformly. As illustrated in Fig. 1, with reference to the availability of strong-motion records in Italy, while, up to the 1990s, records in the range 5.5 ≤ M w ≤ 7 and Joyner-Boore distance R jb ≤ 20 km (simply denoted in the following as the M-R range) accounted for about 10% of the whole dataset, they now constitute only about 1%. Although this is an obvious consequence of the seismicity recurrence laws, it is a clear reminder that it takes a long time for the calibration dataset of the empirical GMMs to be significantly enriched by new records in the near-source region of moderate-to-strong earthquakes. Besides the Italian case, as shown in Fig. 2, similar considerations apply to the calibration datasets of European and global GMMs.
Our interest towards the selected M-R range of moderate to large magnitude earthquakes in near-source conditions stems from its dominance in the seismic hazard assessment (SHA) of the highest seismicity regions of Italy (e.g., Barani et al. 2009), in view of more accurate determinations of elastic spectra for design, of reliable scenarios for seismic risk evaluations in large urban areas, as well as for critical infrastructures, and, more generally, of a proper planning of risk mitigation strategies.
Although the selected M-R range may not be strictly considered as near-source, we will check the predictions of different GMMs against the records included in NESS 2.0 (Sgobba et al. 2021, referred to in the following as NESS), a near-source dataset of worldwide records from active shallow crustal regions, with careful documentation of seismic source and site conditions metadata. Note that, in NESS, records are selected only based on M-R considerations and not on their impulsive/non-impulsive features.
Fig. 2 Magnitude-distance distribution of calibration datasets for some of the selected GMMs, namely: AMB96, BSSA14 and K20 (see Table 1 for references). Distance metric is R JB for all plots. The rectangle highlights magnitude and distance ranges used for the selection of the data in this study
For the purpose of this paper, a set of representative GMMs, listed in Table 1, was selected either because of their past use in the framework of SHA studies in Italy, or of their potential use in future studies. Therefore, this selection is neither intended to exhaustively cover the whole range of applicable GMMs nor to provide an absolute ranking for use for future SHA studies in Italy.
This paper is organized as follows. After introducing the selection of GMMs and of the records of NESS considered in this work, the performance of selected GMMs against NESS records is illustrated. Residuals, separated into their between- and within-event components, are computed with respect to each model, together with their standard deviation. Finally, general remarks on the adequacy of older and newer empirical GMMs to predict near-source ground motions of moderate-to-large earthquakes are provided, together with considerations on the possible support from 3D physics-based numerical simulations.

Selection of empirical earthquake ground motion models
Generation of GMMs, or recalibration of old ones, has nowadays become a continuous process, as new data or new regression procedures become available. In the framework of SHA studies in Italy, we selected several GMMs of interest either for their role in the derivation of the official SH maps of Italy (Ambraseys et al. 1996; Sabetta and Pugliese 1996), or for their specific calibration with near-source records (Ambraseys and Douglas 2003), or for their calibration in the Italian and European framework (Akkar and Bommer 2010; Bindi et al. 2011, 2014; Akkar et al. 2014a; Lanzano et al. 2019; Kotha et al. 2020), or for their worldwide calibration dataset and use (Boore et al. 2014; Campbell and Bozorgnia 2014; Cauzzi et al. 2015), with, in the case of Chiou and Youngs (2014), the inclusion of a near-source term in the model formulation. Table 1 gives an overview of the 13 selected models, in terms of area of calibration of their datasets, magnitude and distance ranges of applicability, distance metrics and intensity measure type. Note that the most recent models (ITA18 and K20) are based on the European Strong Motion dataset (ESM, Luzi et al. 2020), while two of them, ITA14 and ASB14, are derived from the European RESORCE dataset (Akkar et al. 2014b). Two of the selected models are derived from Italian data only (SP96 and ITA10). In any case, all selected models are suitable for use in the shallow active crustal regions that dominate the seismicity of Italy, with the few exceptions of volcanic areas and subduction zones offshore Calabria.
Given that the aim of this work is the evaluation of performance of GMMs in the magnitude and distance ranges of engineering relevance in Italy (i.e., M w from about 5.5 to 7 and distances up to 20 km), Fig. 2 highlights, for some of the selected models, the portion of records falling in that range (shown by the rectangle), with respect to the total.
It is clear that the records included in the selected M-R range are by far a minority of the total, especially for the most recent GMMs, despite the richness of their calibration dataset. Moreover, the fraction of the selected records with respect to the total decreases sharply moving towards recent years, because the number of small magnitude-large distance records increases with time much faster than the number of near-source records. Figure 4 shows the observations used in this analysis, within the same ranges of interest.
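The fraction of a calibration dataset that falls in the M-R range can be checked with a few lines of code. The following is a minimal sketch, assuming a catalogue given as (M w, R jb) pairs; the function names and the toy catalogue are hypothetical and do not reproduce any of the datasets discussed here.

```python
def in_mr_range(mw, rjb_km, mw_min=5.5, mw_max=7.0, rjb_max_km=20.0):
    """Return True if a record falls in the M-R range considered in this study."""
    return mw_min <= mw <= mw_max and rjb_km <= rjb_max_km


def mr_fraction(records):
    """Fraction of (mw, rjb_km) pairs that fall inside the M-R range."""
    if not records:
        raise ValueError("empty catalogue")
    hits = sum(1 for mw, rjb_km in records if in_mr_range(mw, rjb_km))
    return hits / len(records)


# Hypothetical toy catalogue of (Mw, Rjb in km); values are illustrative only.
toy_catalogue = [(6.0, 12.0), (4.2, 80.0), (5.8, 35.0), (3.9, 15.0), (6.9, 5.0)]
print(f"{mr_fraction(toy_catalogue):.0%} of the toy records fall in the M-R range")
```

Applied to the calibration dataset of each GMM, the same ratio would quantify the decreasing share of near-source records discussed above.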
As a last remark, we point out that, despite the continuous improvement in regression models and metadata, the aleatory variability of the predictions of the GMMs tends to increase, most probably due to the large variability of the available calibration dataset. To support this statement, Fig. 3 illustrates the total standard deviation (σ ln ) of the selected models as a function of period, showing that σ ln tends to increase from older to more recent models.

Overview of NESS near-source dataset
NESS is a dataset of high-quality metadata and intensity measures of near-source strong-motion records from active shallow crustal regions, with careful documentation of seismic source (e.g. geometries and rupture mechanisms) and site conditions. The latest version of the dataset was released in January 2021 and is available at the website http://ness.mi.ingv.it/ (Sgobba et al. 2021). As explained by Pacor et al. (2018), records were selected according to the following criteria: M w ≥ 5.5, hypocentral depth ≤ 40 km, distance from the source obtained from scaling relationships depending on magnitude, stress drop and fault dimensions. Figure 4 shows the distribution of data with respect to M w , R jb , site conditions and fault mechanism.
In the analyses of this study, only a selection of data from NESS was considered (Fig. 4, rectangle). The selection was made according to the following criteria:
• records on ground type B according to EC8 (stiff soil conditions, CEN 2004);
• records with R jb ≤ 20 km.
Selected data were classified in four M w classes, centered at 5.5, 6.0, 6.5 and 7.0 (± 0.1), respectively, as shown in Fig. 5. The class with the largest number of records is M w 6.0, with about 70 records. The same figure shows the distribution of the selected data with respect to focal depth as well. Note that most of the records are characterized by a focal depth of about 10 km.
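The selection and classification steps described above can be sketched as follows. This is only an illustrative outline: the record fields (`ec8`, `rjb_km`, `mw`) and the function names are hypothetical and do not correspond to the actual NESS flat-file column names.

```python
MW_CLASS_CENTRES = (5.5, 6.0, 6.5, 7.0)
HALF_WIDTH = 0.1  # each class spans centre +/- 0.1 magnitude units


def magnitude_class(mw, centres=MW_CLASS_CENTRES, half_width=HALF_WIDTH, tol=1e-9):
    """Return the class centre that contains mw, or None if mw is in no class.

    The small tolerance guards against floating-point round-off at class edges.
    """
    for centre in centres:
        if abs(mw - centre) <= half_width + tol:
            return centre
    return None


def select_and_classify(records):
    """Keep EC8 ground type B records with Rjb <= 20 km and group them by Mw class."""
    classes = {centre: [] for centre in MW_CLASS_CENTRES}
    for rec in records:
        if rec["ec8"] != "B" or rec["rjb_km"] > 20.0:
            continue  # fails the site or distance criterion
        centre = magnitude_class(rec["mw"])
        if centre is not None:
            classes[centre].append(rec)
    return classes


# Hypothetical records, for illustration only.
toy_records = [
    {"mw": 5.9, "rjb_km": 8.0, "ec8": "B"},   # kept, class 6.0
    {"mw": 6.2, "rjb_km": 5.0, "ec8": "B"},   # outside all classes
    {"mw": 6.5, "rjb_km": 25.0, "ec8": "B"},  # too distant
    {"mw": 7.0, "rjb_km": 3.0, "ec8": "C"},   # wrong ground type
]
counts = {c: len(v) for c, v in select_and_classify(toy_records).items()}
print(counts)
```

Records with magnitudes between classes (e.g. M w 6.2) are discarded in this sketch, consistent with the ± 0.1 class width.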

Performance of selected GMMs against near-source records
The performance of the selected GMMs was tested for the horizontal components of motions, considering spectral acceleration values at 0 s (PGA), 0.5 s (SA05), 1 s (SA1), 2 s (SA2), in order to cover the short-to-long period range. Figure 6 shows some examples of comparison, for the four spectral accelerations, between the median GMM predictions for M w 6.0 and NESS records (dots) with M w = 5.9 ÷ 6.1; while Fig. 7 shows the same comparison using the median prediction for M w 7.0 and NESS records with M w = 6.9 ÷ 7.1. In order for all the selected GMMs to have a proper basis of comparison, Figs. 6 and 7 show the predictions under the following assumptions: (i) 'unspecified' mechanism; (ii) V s,30 = 600 m/s, representative of ground type B (for the models having V s,30 as site proxy); (iii) suggested default values for any additional parameter required by the models.
In the case of SP96, whose parametrized soil conditions cannot be directly associated with ground type B, a default multiplication factor of 1.2 has been applied to the prediction on rock. For the AMB96 and AD03 models, a conversion from M s to M w was applied, in agreement with that considered for the Italian seismic hazard map (MPS Working group 2004). For SP96, as the magnitude scale adopted provides a good correspondence with M w (Idriss 1991), no conversion was applied. Finally, note that for each GMM the comparison with records has been carried out considering the corresponding type of predicted intensity measure (Table 1), with no need of conversion.
Note that, following K20, we have distinguished the following focal depth (D) classes: 'shallow' (D < 10 km), 'intermediate' (D = 10 ÷ 20 km) and 'deep' (D ≥ 20 km), although, as shown in Fig. 4, the NESS records are by far most concentrated in the 8 ÷ 10 km bin. The source-to-site distance was selected according to the specific GMM and the corresponding values provided in NESS. According to their general definition (Strasser et al. 2008), total residuals are the difference between the natural logarithm of records (y es ) and the corresponding prediction (µ es ), for a given earthquake e at station s, obtained using a specific GMM. In order to analyze the performance of models as a function of either magnitude or distance, total residuals are separated into their between- and within-event components. Following the notation proposed by several authors (e.g., Al Atik et al. 2010; Rodriguez-Marek et al. 2011), the natural logarithm of the observed ground motion y es can be expressed as in Eq. 1, where µ es is the median ground motion value predicted by a specific GMM, δB e is the between-event residual, which corresponds to the average misfit of records from one particular earthquake with respect to the median ground motion model (Eq. 2, where NS is the number of station recordings from one particular earthquake e), and δW es is the within-event residual, defined as the misfit between an individual observation at station s and the event-corrected median estimate (Eq. 3). Between- and within-event residuals are modelled using normal distributions with standard deviations τ and φ, respectively.

y_{es} = \mu_{es} + \delta B_e + \delta W_{es}    (1)

\delta B_e = \frac{1}{NS} \sum_{s=1}^{NS} \left( y_{es} - \mu_{es} \right)    (2)

\delta W_{es} = y_{es} - \left( \mu_{es} + \delta B_e \right)    (3)

[Figure caption, beginning lost in extraction] … Table 1). The only exception is in the fourth column (CB14, CY14 and CEA15), where records are shown in terms of HGM, while CB14 and CY14 predictions are in terms of RotD50.

In our analysis, we calculated the residuals by considering the proper M w , style-of-faulting, V s,30 (if not present in the database we used 600 m/s as representative of ground type B) and by taking extra parameters from the NESS database. In particular, for the GMMs requiring additional parameters, we made the following assumptions:
• for CB14, we took the hypocentral depth (Z hyp ), dip, rupture width (W) and depth-to-top rupture (Z tor ), which were available from NESS, while we evaluated the depth to the V s = 2.5 km/s horizon (Z 2.5 ) based on the correlations with V s,30 provided in the NGAW2 spreadsheet (https://peer.berkeley.edu/peer-strong-ground-motion-databases);
• for CY14, we used the dip and depth-to-top rupture (Z tor ) that were available from NESS; we used zero for the directivity effect term (ΔDPP); Rx was set negative so that no hanging wall effect was considered. Default values for the depth to V s = 1 km/s (Z 1.0 ) were calculated using Eqs. (1) and (2) of CY14;
• for BSSA14, we accounted for the specific regional anelastic attenuation parameter, although it is of minor relevance for near-source computations.

As an example, Fig. 8 shows the computed δB e and δW es for the AMB96 model, together with the 16th, 50th and 84th percentiles of their normal distribution. Note that, as δB e can identify a systematic misfit of a specific GMM with reference to a particular magnitude class, a positive δB e indicates a systematic underprediction of records, while a negative term indicates a systematic overprediction.
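The separation of total residuals into between- and within-event components, as described in words above, can be sketched in a few lines. This is a minimal illustration that takes δB e as the plain average of the total residuals of each event, as stated in the text, rather than the output of a mixed-effects regression; the function name and the toy values are hypothetical.

```python
from collections import defaultdict


def decompose_residuals(records):
    """Split total residuals into between- and within-event terms.

    records: iterable of (event_id, ln_obs, ln_pred), where ln_obs is the
    natural log of the recorded value and ln_pred the natural log of the
    median GMM prediction.
    Returns (dB, dW): dB maps event_id -> between-event residual, and dW is
    the list of within-event residuals in input order.
    """
    records = list(records)
    totals = defaultdict(list)
    for event_id, ln_obs, ln_pred in records:
        totals[event_id].append(ln_obs - ln_pred)  # total residual
    # Between-event term: average misfit over the NS records of each event.
    dB = {ev: sum(res) / len(res) for ev, res in totals.items()}
    # Within-event term: misfit with respect to the event-corrected median.
    dW = [ln_obs - (ln_pred + dB[ev]) for ev, ln_obs, ln_pred in records]
    return dB, dW


# Hypothetical values in natural-log units, for illustration only.
toy = [("A", 0.5, 0.2), ("A", 0.1, 0.2), ("B", -1.0, -0.6)]
dB, dW = decompose_residuals(toy)
print(dB, dW)
```

In this toy case event A is on average underpredicted (positive δB e) and event B overpredicted (negative δB e), while the within-event residuals of each event sum to zero by construction.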
The whole set of between-event residuals calculated for all GMMs is shown in Figs. 9 and 10, for PGA and SA1. In addition, to give a more comprehensive overview of the performance of the analyzed GMMs, Fig. 11 shows the mean computed δB e for all periods, magnitudes and GMMs. Note that the AD03 model was excluded from the calculations for M w 5.5 class, because its range of applicability is beyond M s = 5.8. Instead, the SP96 and ITA10 models were considered also for M w 7.0, because their calibration dataset includes records of the M w 6.9 1980 Irpinia earthquake, used for the validation.
In general, between-event residuals, and their corresponding scatter, show different patterns depending on the GMM under consideration. Considering for example the ITA18 model, PGA residuals suggest a systematic underprediction for almost all magnitude classes, with an overall positive mean δB e of up to about +0.6 (in ln units) for the M w 6.5 class, which corresponds to an underestimation of recorded values by a factor of about 1.8 (e^0.6 ≈ 1.82), i.e. close to 2. In some cases, such as for the previously noted PGA predictions at M w 6.5, all models tend to underestimate recorded values, which is likely related to a specific trend of NESS records for this magnitude class.
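Since residuals are expressed in natural-log units, the corresponding ratio between recorded and predicted values is obtained by exponentiation. A minimal sketch (the function name is illustrative):

```python
import math


def ln_residual_to_factor(delta_ln):
    """Observed-to-predicted ratio implied by a residual in natural-log units."""
    return math.exp(delta_ln)


# A mean between-event residual of +0.6 ln units, as quoted for ITA18 at Mw 6.5 (PGA).
print(round(ln_residual_to_factor(0.6), 2))  # 1.82
```

A residual of zero gives a ratio of 1 (unbiased median), and a negative residual gives a ratio below 1 (overprediction).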
Although the visual inspection of Fig. 11 is sufficient to provide a meaningful overview of the performance of the different GMMs, a quantitative comparison of δB e values is shown in Table 2, in terms both of their average and of their maximum absolute values.
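A summary of this kind can be computed directly from the mean δB e values of each model. The sketch below assumes that both statistics are taken over absolute values across magnitude classes and intensity measures; this is our reading of the text and not necessarily the exact definition adopted for Table 2. The input values are hypothetical.

```python
def summarize_between_event(delta_b):
    """Return (mean of |dB_e|, max of |dB_e|) for one GMM.

    delta_b: dict mapping (magnitude_class, intensity_measure) -> mean
    between-event residual in natural-log units.
    """
    abs_values = [abs(v) for v in delta_b.values()]
    if not abs_values:
        raise ValueError("no residuals provided")
    return sum(abs_values) / len(abs_values), max(abs_values)


# Hypothetical mean residuals for a single model, for illustration only.
toy_delta_b = {
    (6.0, "PGA"): 0.2,
    (6.0, "SA1"): -0.1,
    (6.5, "PGA"): 0.6,
    (6.5, "SA1"): 0.3,
}
mean_abs, max_abs = summarize_between_event(toy_delta_b)
print(f"mean |dB_e| = {mean_abs:.2f}, max |dB_e| = {max_abs:.2f}")
```

Using absolute values prevents positive and negative biases in different magnitude classes from cancelling out in the average.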

Performance in terms of standard deviation
The performance check in the previous section was limited to the median prediction values. We aim in this section at checking if the models are suitable to quantify the observed variability of near-source records.
To this end, we computed the standard deviation of the within-event residuals δW es_j of NESS records with respect to the j-th GMM, denoted by φ NESS_j , as follows: where y es is the observed value at station s from event e (of either PGA or SA1), a GMMj is the median predicted acceleration from the j-th GMM, δB e_j is the between-event residual computed with reference to the same model, for the same event and NS e is the number of recordings available for event e. The φ NESS_j values are compared in Fig. 12 with the corresponding φ j , that is the standard deviation of the within-event residuals of the j-th GMM. Note that for the older models (AMB96, SP96 and AD03) φ j was not computed, so the total standard deviation σ j was used for comparison.
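The computation of φ NESS_j can be outlined as follows. This is a sketch under stated assumptions: within-event residuals are taken as ln(y es) − ln(a GMMj) − δB e_j, δB e_j is the average total residual of each event, and the standard deviation is pooled over all records with an N − 1 denominator. The exact normalization used in the study may differ, and the function name and toy values are hypothetical.

```python
import math
from collections import defaultdict


def phi_from_records(records):
    """Standard deviation of within-event residuals for one GMM.

    records: iterable of (event_id, observed, median_predicted), with both
    amplitudes in the same linear units (e.g. PGA in cm/s^2).
    """
    records = list(records)
    totals = defaultdict(list)
    for event_id, obs, pred in records:
        totals[event_id].append(math.log(obs) - math.log(pred))
    # Between-event residual: mean total residual over the records of each event.
    dB = {ev: sum(res) / len(res) for ev, res in totals.items()}
    # Within-event residuals: total residual minus the event term.
    dW = [math.log(obs) - math.log(pred) - dB[ev] for ev, obs, pred in records]
    if len(dW) < 2:
        raise ValueError("at least two records are needed")
    # Assumed normalization: pooled over all records, N - 1 denominator.
    return math.sqrt(sum(w * w for w in dW) / (len(dW) - 1))


# Hypothetical observations built from known residuals, for illustration only.
toy = [
    ("A", 100.0 * math.exp(0.2), 100.0),
    ("A", 100.0 * math.exp(-0.2), 100.0),
    ("B", 50.0 * math.exp(0.5), 50.0),
    ("B", 50.0 * math.exp(0.3), 50.0),
]
phi_ness = phi_from_records(toy)
print(round(phi_ness, 3))
```

The resulting value would then be compared with the φ j (or σ j for the older models) published with the corresponding GMM.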
Considering that φ is generally slightly lower than σ, the performance of both AMB96 and AD03 models (first row of Fig. 12) is good, while for SP96 (R epi ) the performance is fairly good for PGA, but not for the other periods, especially at higher magnitudes. For the more recent GMMs (second and third rows of Fig. 12), there is a clear trend of φ NESS_j values to be substantially lower than φ j , suggesting that the variability implied by those GMMs tends to overestimate that observed in the near-source records, as expected since their calibration datasets are larger. This is also visually confirmed in Fig. 13, showing δW es_j for the class with the highest number of records (M w 6.0), together with the standard deviation of AMB96, AD03, CB14, ITA10, ITA18 and K20 models.

Conclusions
In this paper, we provide a quantitative evaluation of the performance of a set of GMMs in the near-source area of moderate-to-large earthquakes (M w = 5.5 ÷ 7.0 and Joyner-Boore source-to-site distance R jb ≤ 20 km), selected because it dominates hazard in the high seismicity Italian regions and, therefore, is crucial for accurate PSHA evaluations at long return periods, for reliable determination of seismic actions for design, and for a proper definition of seismic scenarios for risk analyses. GMMs were tested taking advantage of the NESS dataset, consisting of worldwide near-source records with high-quality metadata and intensity measures. Through an analysis of residuals extended to intensity measures representative of different period ranges (PGA, SA05, SA1, SA2), we investigated the misfit of the considered GMMs with respect to records and we compared the performance of the different models. The goal of this test is not to make an absolute ranking of GMMs in the selected M-R range, which would imply consideration of a much wider set of GMMs and the use of more advanced ranking approaches (e.g., Scherbaum et al. 2004; Kale and Akkar 2013), but to check the actual predictive capability of GMMs and to identify possible systematic bias with respect to the newly available NESS dataset of near-source records. Our results are somewhat unexpected, as it turned out that, according to the simple criteria considered in this paper, the AMB96 model, dating back about 25 years, and the AD03 model, dating back about 20 years, are among the best performing GMMs in the selected M-R range, both in terms of median values and standard deviation.
Fig. 11 Mean between-event residuals (δB e ) computed over the four spectral accelerations considered, for a selection of GMMs, and for the four adopted magnitude classes: M w 5.5 (M w = 5.4 ÷ 5.6), M w 6.0 (M w = 5.9 ÷ 6.1), M w 6.5 (M w = 6.4 ÷ 6.6), M w 7.0 class (M w = 6.9 ÷ 7.1)
While the excellent performance of AD03 could be predicted, although not granted, because it was specifically devised for near-source conditions and it took advantage of the important sets of records from the 1999 Turkey and Taiwan earthquakes, that of AMB96 was not obvious, also considering that most of the NESS dataset does not contain records used for its calibration.
To explain these results, it should be considered that, although the calibration datasets of newer models are growing continuously and very fast in low amplitude motions, the number of near-source high amplitude records is increasing at a much lower rate, as previously shown in Fig. 1. As a consequence, while the performance of new GMMs for low-magnitude large-distance earthquake ground motions is expected to steadily improve, a similar improvement is not granted in the near-source region of moderate-to-large earthquakes, which most often govern the seismic hazard assessment for the highest seismicity regions. Instead, the percentage of near-source records over the whole calibration sets of older GMMs is larger than it is in the newer ones, and this may be a reason for the remarkably good performance of the AMB96 and AD03 models.
Fig. 13 Within-event residuals (δW es ) for M w 6.0 class, computed using different GMMs, with 16th, 50th and 84th percentiles (with bin of distances) of the corresponding normal distribution. Solid lines represent the total standard deviation (σ), dashed lines the within-event standard deviation (φ), when available, of the corresponding GMM
Although the title of this paper may be provocative, the main intent is of course not to claim that recent GMMs should be replaced by the old ones. Rather, it is to focus on the need for improvement of the tools of ground motion prediction in near-source conditions, which are often overlooked in the development of empirical GMMs and which are instead the key, for most highly seismic regions, for a proper estimation of hazard and risk.
With this objective, it should be pointed out that the NESS dataset itself is still sparse and not sufficiently extended to cover the variety of fault slip distributions, fault geometry details and complex geological configurations that are expected to affect strong ground motion in near-source conditions. This variety is hardly sampled by the available near-source records, the number of which, as noted before, is not expected to grow significantly in the next few years. In these conditions, the most rational way to progress towards advanced modelling of near-source ground motions seems to be to take advantage of the increasingly accurate broadband results of physics-based numerical simulations, as a numerical lab providing realistic realizations of future earthquake ground motions in realistic 3D geological configurations. For this purpose, simulated broadband accelerograms may successfully complement the sparse availability of records in near-source conditions, as in the BB-SPEEDset, recently introduced by Paolucci et al. (2021) based on several validated ground motion scenarios from Italian and worldwide earthquakes, or in the subset of CyberShake ground-motion time series, for the Los Angeles area, selected by Baker et al. (2021).