Near-source magnitude scaling of spectral accelerations: analysis and update of Kotha et al. (2020) model

Ground-motion models (GMMs) are often used to predict the random distribution of Spectral accelerations (SAs\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm{SAs}$$\end{document}) at a site due to a nearby earthquake. In probabilistic seismic hazard and risk assessment, large earthquakes occurring close to a site are considered as critical scenarios. GMMs are expected to predict realistic SAs\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm{SAs}$$\end{document} with low within-model uncertainty (σμ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\upsigma }_{\upmu }$$\end{document}) for such rare scenarios. However, the datasets used to regress GMMs are usually deficient of data from critical scenarios. The (Kotha et al., A Regionally Adaptable Ground-Motion Model for Shallow Crustal Earthquakes in Europe Bulletin of Earthquake Engineering 18:4091–4125, 2020) GMM developed from the Engineering strong motion (ESM) dataset was found to predict decreasing short-period SAs\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm{SAs}$$\end{document} with increasing MW≥Mh=6.2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${M}_{W}\ge {\mathrm{M}}_{\mathrm{h}}=6.2$$\end{document}, and with large σμ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\upsigma }_{\upmu }$$\end{document} at near-source distances ≤30km\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\le 30\mathrm{km}$$\end{document}. In this study, we updated the parametrisation of the GMM based on analyses of ESM and the Near source strong motion (NESS) datasets. With Mh=5.7\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathrm{M}}_{\mathrm{h}}=5.7$$\end{document}, we could rectify the MW\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${M}_{W}$$\end{document} scaling issue, while also reducing σμ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\upsigma }_{\upmu }$$\end{document} at MW≥Mh\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${M}_{W}\ge {\mathrm{M}}_{\mathrm{h}}$$\end{document}. We then evaluated the GMM against NESS data, and found that the SAs\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm{SAs}$$\end{document} from a few large, thrust-faulting events in California, New Zealand, Japan, and Mexico are significantly higher than GMM median predictions. However, recordings from these events were mostly made on soft-soil geology, and contain anisotropic pulse-like effects. A more thorough non-ergodic treatment of NESS was not possible because most sites sampled unique events in very diverse tectonic environments. We provide an updated set of GMM coefficients,σμ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\upsigma }_{\upmu }$$\end{document}, and heteroscedastic variance models; while also cautioning against its application for MW≤4\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${M}_{W}\le 4$$\end{document} in low-moderate seismicity regions without evaluating the homogeneity of MW\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${M}_{W}$$\end{document} estimates between pan-European ESM and regional datasets.


Introduction
For engineering purposes, a ground-motion observation at a site from an earthquake is often reported as the peak acceleration response of a 5% damped Single-degree-of-freedom (SDOF) oscillator with fundamental period T when excited by the recorded accelerogram at the site. For a variety of SDOF fundamental periods, the vector of spectral accelerations [ SA(T) ] is the so-called response spectrum. Thousands of response spectra collected from several hundreds of earthquakes at several dozens of recording sites are compiled into engineering ground-motion datasets, such as the recent pan-European Engineering Strong-Motion (ESM by Bindi et al. 2018;Lanzano et al. 2018), the global Next-Generation-Attenuation West-2 (NGA-W2 by Ancheta et al. 2014), the Japanese KiK-net (Okada et al. 2004) ground-motion datasets (Bahrampouri et al. 2021;Dawood et al. 2016), and a few others. Such datasets are essential to studying the physical characteristics of SA(T) from past earthquakes and predict SA(T) of future earthquakes. For this purpose, a common practice has been to regress empirical attenuation relations connecting these observed SA(T) to the properties of earthquake ruptures (e.g. moment magnitude M W ), the local geological conditions at the effected site (e.g. time-averaged shear-wave velocity in the top 30 m of a 1D soil column V s30 ), and the rupture-to-site distances (e.g. Joyner-Boore distance R JB ). Such empirical relations are commonly referred to as Ground-Motion Models (GMMs), and are widely used in probabilistic seismic hazard assessments (PSHA). Therefore, the quality and quantity of ground-motion datasets, and the physical and statistical robustness of GMMs -especially beyond the ( M W , R JB , V s30 ) ranges of the regressed dataset -are of utmost importance in PSHA.
The most recent, and the largest yet, pan-European Engineering strong-motion dataset (ESM) was meticulously developed as part of EPOS (www. epos-eu. org), and used for the next update of European Seismic Hazard Map of the region-from ESHM13 (Woessner et al. 2015) to ESHM20 (www. efehr. org). This dataset is tenfold larger than its predecessor RESORCE (Akkar et al. 2014b), and facilitated the development of the Kotha et al. (2020) GMM, which superseded its RESORCE based predecessor Kotha et al. (2016). The new GMM also made use of the latest geological and tectonic regionalisation models (e.g. Basili et al. 2019) to predict more accurate (e.g. region-, locality-and site-specific) ground-motions in the extremely heterogeneous active shallow crustal regions of pan-Europe. Therefore, this regionally-adaptable GMM was adapted into a scaled-backbone GMM logic tree implementation (e.g. Bommer 2012;Douglas 2018) in the new ESHM20 by Weatherill et al. (2020b).
Though current ground-motion datasets, including the ESM are now an order of magnitude larger than a decade ago, a typical PSHA requires ground-motion predictions for rare scenarios (e.g.M W ≥ 5.5 & R JB ≤ 30 km) that remain very sparsely or never sampled before, but are also deemed the most critical for built environments in the region. For instance, only 2% of the ESM data qualifies the hazard critical criterion above. Of course, the other 98% data is quite useful in robustly calibrating the model for its region-, locality-, and site-specific adaptability. But in seismically active regions of Europe, such as Italy, Turkey, and Greece, it is perhaps more important to maintain GMM's physical and statistical robustness in the hazard critical scenarios. On the other hand, for low seismicity regions of Northern Europe, even at long return periods (e.g. 2475 years) moderate magnitude events (e.g.5 ≤ M W ≤ 6 ) may dominate the hazard, and a reduction in aleatory variability (from better calibrating the region-, locality-and site-specific terms) will likely pay-off more than a better constraint of the larger magnitude events.

3
Thanks to the timely feedback from the community of seismic hazard and risk modellers involved in ESHM20, we were motivated to evaluate the performance of the Kotha et al. (2020) GMM in hazard critical scenarios of Mediterranean active shallow crustal regions. Especially at short distances ( R JB ≤ 20 km), the short-period SA(T ≤ 0.2s) ground-motions predicted by the GMM for M W ≥ 6.5 are lower than those for 5.7 ≤ M W ≤ 6.2 , and similar to those for M W ≈ 5.5 . Although such behaviour reflects the modelling choice and is to an extent data-driven, it is discordant with the behaviour of other contemporary GMMs at near-source distances (at short periods and for large magnitudes); and more importantly, has a rather high within-model uncertainty due to the lack of calibration data. To resolve this issue, we propose a revision to the model, while also evaluating its behaviour against the recently published NEar Source Strong-motion datasets (NESS1 and NESS2 by Pacor et al. 2018;Sgobba et al. 2021b). Since NESS2 is an update of the NESS1 dataset, we refer to them as NESS from hereon. Figure 1 compares the ESM and NESS datasets. Since both datasets are compiled with identical data qualification procedures, they are seamlessly compatible. However, it is to be noted that while ESM contains data exclusively from shallow crustal earthquakes in the European Mediterranean region, NESS is primarily composed of data from global large earthquakes from Western United States, Japan, New Zealand, Mexico, and Taiwan. The top-left panel of Fig. 1, showing the distribution of recordings in (M W vsR JB ) ranges of the two datasets, clearly shows that NESS complements ESM with several recordings from hazard critical scenarios ( M W ≥ 5.5&R JB ≤ 30 km )-albeit from global earthquakes originating in very diverse tectonic environments, and recorded by stations in very particular geological settings. In addition, this plot also shows that 30% of the NESS recordings are characterised by pulse-like characteristics, either due to geological or finite-fault effects. In the other panels, we only compare the two datasets at M W ≥ 5.5 (data above the blue line). The top-right panel of Fig. 1 shows the scaled kernel density of recordings versus rake angles (fault mechanism) of M W ≥ 5.5 events in the two datasets. Scaled (to 1) kernel density plots are smoothed version of frequency histograms, and are more intiutive when comparing the sampling rate of continuous data between two datasets. Evidently, the ESM dataset is dominated by recordings from normal faulting ( −110 0 ≤ rake < −70 0 ) and reverse faulting ( 70 0 ≤ rake < 110 0 ) events from Italy, Turkey, and Greece. NESS on the other hand contains data from strike-slip ( 160 0 ≤ |rake| ) and reverse-oblique ( 20 0 ≤ rake < 70 0 ) faulting events from elsewhere. Similarly, the bottom-left panel shows that the NESS data (from M W ≥ 5.5 ) is predominantly recorded on soft-soils characterised by very low V WA07 s30 -which is the V s30 inferred from topographic slope using Wald and Allen (2007) model-when compared to the more uniform distribution in ESM. Finally, the bottom-right panel shows that the distribution of reported stress-drops-by various studies cited in Sgobba et al. (2021b) and Bindi and Kotha (2020)-of the sampled M W ≥ 5.5 events are comparable.
Although the two datasets are quite complementary, the NESS2 dataset of Sgobba et al. (2021b) was not available during the development of Kotha et al. (2020) GMM. Even if the smaller NESS1 dataset of Pacor et al. (2018) was available in a limited way, it was decided against integrating it into ESM. The reasons being: (1) Kotha et al. (2020) GMM was intended to be partially non-ergodic; in the sense that, events were localised into tectonic localities to capture systematic differences between earthquake generating tectonic sources, and stations were regionalised into attenuating regions to capture differences in apparent anelastic attenuation of SA(T) . The tectonic localisation and attenuation regionalisation models used therein could not be extended to contain global events and stations. The possibility of simply assigning NESS (or ESM) data to larger political boundaries was decided against because the region-toregion and locality-to-locality random variances estimated from ESM dataset were to be used in constructing the backbone GMM logic tree for ESHM20 (Weatherill et al. 2020b), and these estimates were intended to capture the diversity of active shallow crustal regions in Mediterranean region alone; and not partially global.
(2) Kotha et al. (2020) GMM was also required to function as a site-specific GMM for sites with sufficient number of recordings or otherwise, use the site-specific randomeffects to develop a large scale site-response model for use in European Seismic Risk Map (ESRM20 by Crowley et al. 2019 andWeatherill et al. 2020a). For this purpose, it was necessary that each site recorded a variety of magnitudes, and event-to-site distances and azimuths. As well that events are recorded by stations with a variety of site conditions. Most stations in NESS dataset have recorded a single large event at a close distance, and most events are recorded by sites located in very similar geological settings. In this regard, as opposed to the crossed and nested nature of ESM dataset, NESS is exclusively nested. In order to avoid any trade-offs between site-specific and event-specific (between-event) random-effects, we chose not to integrate the two datasets.
(3) As seen in top-left panel of Fig. 1, around 30% of NESS recordings show pulse-like characteristics, which are often characterised by higher long-period SAs and lower short-period SAs than average, depending on the event-to-site distance and azimuth. Instead of biasing the (isotropic) GMM median (at short distances and long periods) with pulse-like effects, we chose to reserve NESS datasets for investigation of complex near-fault phenomenon, as done by Sgobba et al. (2021a).
In this study, we discuss the re-parametrization of the Kotha et al. (2020) GMM, consequent impact on its within-model (epistemic) and aleatory uncertainties, its limitation in capturing the complex near-source finite-fault phenomenon, and the apparent (or actual) global diversity of M W ≥ 5.5 shallow crustal earthquakes.

Functional form
We briefly discuss the choice of parametrization (or functional form) of the Kotha et al. (2020) GMM. The number and combination of parameters in this GMM is far fewer than the more sophisticated NGA-W2 GMMs, such as Abrahamson et al. (2014); Campbell and Bozorgnia (2014); Chiou and Youngs (2014), and comparable to Boore et al. (2014). Of course, the sophistication of NGA-W2 GMMs comes from a more complex characterisation of the large magnitude near-fault effects, and complex site-effects such as nonlinear soil response and basin effects, relying on physics-based numerical simulations. While in application driven GMMs for Europe (e.g. Bindi et al. 2017), the decision for a more modest parametrization has been made to respect the limited metadata on earthquake rupture (event, from hereon), propagation path (or simply, path), and local geological conditions at the receiving site (site properties, in short); in seismic hazard models (e.g. ESHM13 and ESHM20), and also in the strong-motion datasets (e.g. RESORCE and ESM). In fact, the functional form had only minor changes since Bindi et al. (2014) and Kotha et al. (2016), despite the tenfold increase in data since RESORCE. A more detailed description is available in the Kotha et al. (2020) study. Here we reintroduce the functional form in Eqs. (1) through (4): (1) ln(μ) = e 1 + f R,g M W , R JB + f R,a R JB + f M M W + δL2L l + δB 0 e,l + δS2S s + ε (2) The coefficients e 1 , b 1 , b 2 , b 3 , c 1 , c 2 , c 3 are the so-called fixed-effect coefficients of a mixed-effect model; while δc 3,r , δL2L l , δB 0 e,l , δS2S s are the random-effects, and ε are the residuals. In addition, the h D (pseudo-depth, h-parameter or near-source saturation parameter) was a fixed-effect regression coefficient in Bindi et al. (2014) and Kotha et al. (2016) -making them nonlinear mixed-effect models -while Kotha et al. (2020) assigned it a priori values to keep the regression linear. The random-effects' description and their purposes are elaborated in Kotha et al. (2020), and will not be repeated here for brevity. Since this study is about revising the fixed-effect coefficients, we explain their original purpose here: • 1 is the generic offset, which is always recommended in a multivariate regressionirrespective of the choice of independent predictor variables (here, M W , R JB ). Absence of this parameter may destabilize the mixed-effects regression, and redistribute the offset among the random-effects. • 1 captures the linear increase of ground-motion with M W and is perhaps the most prevalent parameter among 50 years of GMMs (see Douglas 2010).
• 2 captures the nonlinear increase of ground-motion with M W , as in Eq. (4). Fukushima (1996) has demonstrated eloquently the necessity of this term in a GMM, especially when the Fourier source spectrum is approximated to be an ω −2 source model proposed by Brune (1970). When the concerned magnitude range ( 3.1 ≤ M W ≤ 7.4) of the dataset is wide enough that their corner-frequencies ( 0.2 ≤ 1 f C ≤ 12s , estimated by Bindi and Kotha 2020) span the range of spectral periods ( 0.3 ≤ T ≤ 8s ), the nonlinear scaling of Fourier amplitudes with M W is captured by an M 2 W term. We note that, although the period range of Kotha et al. (2020) GMM is 0.01 ≤ T ≤ 8s , since the Fourier amplitudes at frequencies f > 3Hz and SA(T) with 1 T ≥ 3Hz are not analogues (e.g. Bora et al. 2016), we limited the range to 0.3 ≤ T ≤ 8s in our argument above.
Of course, if b 2 should take a positive or negative value depends on the magnitude scale. Fukushima (1996) showed that it should be negative if the choice of magnitude scale is M W , and positive if the scale were to be M L or M JMA . In agreement, both the pan-European Bindi et al. (2014) and Kotha et al. (2016) estimates of their respective b 2 (at all periods) were negative, when regressed over the data from 4 ≤ M W ≤ 7.6 events in RESORCE dataset. The Akkar et al. (2014a) shares its quadratic form with pan-European models mentioned here, while being derived from the NGA-W2 dataset, also has small but positive M 2 W scaling coefficient. The Kotha et al. (2020) and Boore et al. (2014) both show positive M 2 W scaling at short periods, and their respective predecessors Kotha et al. (2016) and Boore and Atkinson (2008) show the contrary.
A strong decrease in short-period ground-motions towards lower magnitudes has been demonstrated both theoretically and empirically by Douglas and Jousset (2011), and was recognised to be a source effect -a parameter that attenuates high-frequency groundmotions in the ω −2 source model of Brune (1970). Theoretically, b 2 should be negative to accommodate this. However, the non-parametric regressions on ESM (see Fig. 4 of Kotha et al. 2020 and supplementary Fig.S1 here) suggested instead a positive b 2 . It is interesting to note that, in trial GMM regressions excluding the data from events with M W ≤ 4.1 , the b 2 estimates were once again negative -as was the case with RESORCE based GMMs regressed over data from 4 ≤ M W ≤ 7.6 events. Based on these trials, in our opinion, the main reason for the positive b 2 values of Kotha et al. (2020) is the introduction of more than 10,000 new records from M W ≤ 4.5 events in ESM (details in Kotha et al. 2020) with respect to RESORCE. We suspect poorly constrained M W ≤ 4.5 and is a factor, but probably not the only one, as also discussed in Kotha et al. (2016). Figure 2 shows the correlation matrix of the fixed-effects coefficients of Kotha et al. (2020) GMM. It is clear that at all periods, the coefficients b 1 and b 2 are almost perfectly correlated, which suggests one of them could be dropped out of the regression. GMM regression trials removing b 1 rendered b 2 estimates to be negative-as suggested by Fukushima (1996) -but lead to under prediction of short-period ground-motions at small magnitudes. Trials without b 2 would mean abandoning the Fukushima (1996) recommended nonlinear scaling with M 2 W . And finally, removing either b 1 or b 2 led to a substantial increase in AIC, BIC, Log-likelihood, and deviance estimates (typical metrics of the analyses of variance) with respect to the model containing both. Meaning, presence of both these fixedeffects coefficients is beneficial to model fit. Therefore, we chose not to modify this component of the functional form for time being.
• 3 is the coefficient controlling the linear scaling of ground-motion at large magnitudes M W > M h . In Kotha et al. (2020) GMM, M h is a placed at M W = 6.2 based on the non-parametric analyses presented therein. The choice of M h is largely ambiguous and is chosen iteratively. For example, Boore et al. (2014) chose a period-dependent M h ∈ [5.5, 6.2] , while Abrahamson et al. (2014) and Campbell and Bozorgnia (2014) chose to introduce multiple hinges with multiple period-independent M h . • Several GMMs use the so-called hinge-magnitude ( M h ) to disassociate the scaling of large events from that of small events at near-source distances. While b 1 and b 2 capture the nonlinear scaling of SAs with M W for small events that can be approximated as point-sources, b 3 captures the scaling for large events that cannot be approximated as point-sources. Above a given magnitude, not only the finite dimension of the rupture (event) plays a role in the scaling of SAs with M W and R JB , but even the spatial variability of kinematic properties of the rupture leaves an imprint on the SAs recorded near the fault. Since the same explanatory variables (here M W and R JB ) are used for all events -irrespective of whether they can be approximated as point-sources or notintroduction of the M h allows the model to 'break' the M W scaling and capture different M W -dependent trends on either sides of M h . Above M h = 6.2 , the limited availability of data does not allow introduction of complex scaling, therefore a simple linear scaling form with b 3 is used. We note that, although M h allows dissimilar scaling on its either sides, they are not completely independent because the overall M W scaling (Eq. 4) is a continuous function with discontinuous derivative at M h . Therefore, the choice of M h will influence scaling at both M W ≤ M h and M W > M h -as shown later in this manuscript. In summary, M h is needed to give an additional degree-of-freedom to the GMM to distinguish the M W scaling of events of point-source approximation from events that are finite-sources. Figure 2 shows that b 3 is poorly correlated to b 1 and b 2 , as intended. Intuitively, removing b 3 (and the hinge at M h ) while maintaining both b 1 and b 2 in the functional form would strongly bias their estimates towards the relatively numerous M W ≤ M h events. This would lead to a severe over-prediction at larger magnitudes, which are often of greater relevance (hazard critical) than smaller events in PSHA. In fact, the choice M h = 6.2 has led to strong over-prediction for events in the range 6.2 ≤ M W ≤ 6.5 , which is one factor motivating us to revise this part of the GMM in this study. Figure 3 compares the SA(T) magnitude-scaling of a few recent GMMs for T = 0.01, 0.1, 1, 2s at R JB = 5, 20, 50 km . Evidently, the choice of M h = 6.2 led to predictions by Kotha et al. (2020) to be oddly higher for M6.2 events than M6.5 events. Moreover, the kink at M h = 6.2 doesn't seem to reflect any physical phenomenon that would suggest M6.2 events to be often stronger than any other magnitudes. This oddity is more prevalent at short distances and short periods. In this study, we rectify this issue by revising our choice of M h . Accepting that b 3 is necessary in the GMM, we will demonstrate the impact of (apparently ambiguous) choice of M h on the GMM predictions for large magnitudes and short distances. • 1 is the fixed-effect coefficient capturing the linear decay of SA(T) with R JB , i.e. the geometrical spreading of ground-motion as a spherical wave front. In distance range 10 < R JB < 80km , analytical values of decay rate are about − 1.1 to − 1.3, but can also be strongly region dependent. For example, in the spectral decomposition of this ESM dataset using Generalized Inversion Technique (GIT, as in Castro et al. 1990, Edwards et al. 2008, and Oth et al. 2011), Bindi and Kotha (2020) showed that the slopes of non-parametric 1 Hz Fourier ground-motion amplitude decay curves at R JB < 80km are: − 1.36 in Italy, − 1.34 in regions encompassing Balkans and Romania, and -1.03 in regions covering Aegean and East Anatolian regions. Since the c 1 estimates of our model range between from − 1.34 and − 1.12 for SA(T = 0.1s − 8s) , we are not concerned with revising this part of functional form. • 2 complements c 1 in capturing the magnitude dependence of geometric spreading due to finiteness of large magnitude events. This parameter is only necessary in GMMs predicting spectral accelerations, and not as much in those predicting Fourier amplitudes (e.g. Cotton et al. 2008;Kotha et al. 2021). This parameter usually takes positive values to simulate the more gradual decay of SA(T ) in near-source distances from large events, and increases towards longer periods. However, this is simply a consequence of the functional form. For our concerns, Fig. 2 shows that c 1 and c 2 are almost uncorrelated, which means it deserves place in the functional form. • 3 captures the apparent anelastic attenuation of SA(T ) with distance. Unlike the geometric spreading of c 1 and c 2 , c 3 is intended to simulate the exponential decay of SA(T ) with distance due to intrinsic scattering and absorption of radiated seismic energy. The effect of this parameter is more pronounced at R JB > 80km , and is known to be strongly region dependent. We often find that c 1 and c 3 fixed-effects are inversely correlated because both capture the decay of ground-motion with distance; similar to correlation of ln(R JB ) and R JB . However, since we have used a dataset of recordings with 0 ≤ R JB ≤ 545km , the negative correlation between c 1 and c 3 is reasonably low at all periods. Inverse correlation of c 1 and c 3 is lower at short periods because anelastic attenuation effects high-frequency ground-motions more than low-frequency groundmotions. While at longer periods, c 3 values become close to zero and slightly more inversely correlated to c 1 . For instance, a few GMMs force c 3 = 0 at T ≥ 1s . But as long as they are non-positive-indicating an unphysical exponential increase with distancewe are not too concerned about revising this part of the model.
Essentially, we mean to say that all the fixed-effects parameters of our model are necessary, and most behave as expected. The only exception being b 3 (and b 2 to some extent), which in a way is also the most critical parameter controlling the GMM predictions for large magnitudes at short distances, and therefore the hazard and risk assessments. More specifically, our concern has been that the b 3 of Kotha et al. (2020) GMM are negative for T ≤ 0.5s , which led to decreasing median SA(T ≤ 0.5s) with increasing M W > 6.2 . This could have been avoided by constraining b 3 to be non-negative during the GMM regression, but that would have been too strong an assumption; considering that the apparent oversaturation of short-period SAs at large M W has been also observed in broadband numerical simulations (as mentioned in Boore et al. 2014). In the following sections, without changing the mixed-effects configuration of the GMM or applying additional constrains, and using the exact same dataset used in Kotha et al. (2020), we will choose a variety of M h values to understand how the b 3 estimates vary; and of course, its impact on median, aleatory variability, and within-model uncertainty of the GMM.

Results and discussion
The published fixed-effect coefficients of Kotha et al. (2020) GMM were estimates using M h = 6.2 , based on the non-parametric trends of SA(T) scaling with M W (Fig.S1). Here we repeat the regressions and estimate the fixed-and random-effects for the choices of M h = 5.5, 5.7, 6.2, 6.5 . While the Kotha et al. (2020) GMM was derived using a robust linear mixed-effects regression algorithm (robustlmm by Koller 2016), in this exercise involving several trials, we use a less computationally intense ordinary least-square mixedeffects regression algorithm (lme4 by Bates et al. 2014). Both packages are developed for use with R software (Team 2013). We have verified (in development of Kotha et al. 2020) that both the algorithms yield nearly identical estimates of fixed-effect coefficients. In fact, even the fixed-effect variance-covariance matrices differ ever so slightly between robust and ordinary least-square estimates-at least in this exercise. The biggest differences to be expected are in the random-effect and residual variances, but this will be handled in the following sections. Therefore, the inferences we make in this study-at least about the fixedeffects-will not change if we preferred either robust or ordinary least-square regressions.
Essentially, we recalibrated the GMM for PGA , PGV , and SA(T) at 34 values of T = 0.01 − 8s , while choosing M h = 5.5, 5.7, 6.2, 6.5 in Eq. (4) and keeping all other pre-sets unchanged; i.e. M ref = 4.5 , R ref = 30km and h D = 4, 8, 12km depending on the hypocentral depth of the event. We discuss the fixed-effect (median) predictions, and the (M W , R JB ) dependent within-model (epistemic) uncertainty σ μ estimated using their variance-covariance matrices. σ 2 μ is the asymptotic variance defined in Atik and Youngs (2014). Figure 4 shows the magnitude and distance scaling of the four M h -trial GMMs. Note that the predictions corresponding to M h = 6.2 in this figure are identical to those presented in the Kotha et al. (2020). The four trial GMMs have identical magnitude and distance scaling in the best sampled range 4 ≤ M W ≤ 5.5 , irrespective of location of the hinge magnitude. But beyond these ranges, particularly at M w ≥ 5.5 (and slightly at M W < 4) , the median predictions deviate significantly. The apparent oversaturation of short-period SAs for large magnitudes and short distances disappears completely when M h = 5.5, 5.7 ; while for M h = 6.2, 6.5 the GMM predicts a greater degree of oversaturation than that could be constrained by the data, and in doing so diverged from the general trend found in other GMMs (see Fig. 3).

Fixed-effects
As we can identify no clear physical explanation as to why predictions for an M7 event should be systematically lower than those of a M6.5 event, we cannot consider the apparent oversaturation to be defensible. Despite, several independent datasets we have reviewed-for example, those used in the development of Cauzzi et al. (2015) and Boore et al. (2021), near-source datasets of Pacor et al. (2018) and Sgobba et al. (2021b), etc.-showed recorded short-period SAs of M W ≥ 7 to be lower than smaller magnitude events. In fact, Cauzzi et al. (2015) provided an explicit adjustment to the magnitude scaling of their GMM if one prefers to ignore the data-driven apparent oversaturation at higher magnitudes. Even the recently published dataset of synthetic near-source ground-motions, BB-SPEEDset by Paolucci et al. (2021), shows an on-average lower short-period SAs for M W ≥ 7 . Boore et al. (2014) refer to the simulations of Schmedes and Archuleta (2008) to support the apparent oversaturation of short-period SAs, which was avoided in their earlier Boore and Atkinson (2008) GMM by constraining the relevant coefficient to be non-negative.
Here we mean to say that, oversaturation is apparent in several datasets, but the reasons as to why could be from: (1) very poor sampling at near-source distances from the rare large magnitude events, (2) finiteness of large events and location of recording stations relative to the largest asperity, i.e. stations with R JB , R rup = 0km but R epi , R hypo ≠ 0km , (3) location of recording stations in the areas receiving lower shear-wave radiation energy (due to anisotropy) or directivity effects (e.g. Dujardin et al. 2018;Kotha et al. 2019), (4) regional variability of large events due to tectonic diversity, i.e. stress loading and release rates, maturity of fault systems, etc., as reported in Radiguet et al. (2009) andChounet et al. (2018). It could also be that large and characteristic events of M W ≥ 7 are likely to occur on the same areas of the fault plane with lower fault friction, i.e. the mature sections of the fault plane, and thereby releasing lower amount of high-frequency shear-wave energy than smaller events occurring in the less mature sections. In addition, rupture directivity and Doppler's effect could have caused a stacking of high-frequency pulses into a single low-frequency pulse-like ground-motion, leading to higher low-frequency groundmotions and lower high-frequency ground-motions. These effects need to be investigated thoroughly using the recently published near-source strong motion datasets (e.g. NESS1 and NESS2) and broadband simulations (e.g. BB-SPEEDset). In-lieu of developing semiempirical near-source adjustments to our GMM, as in the NGA-W2 GMMs (Donahue and Abrahamson 2014), here we will first focus on revising the M W scaling component.
It is interesting to see (in Fig. 4) that changing the M h to lower values has curtailed the issue of apparent oversaturation; although non-parametric plots in Kotha et al. (2020) suggested it to be close to M W = 6.2 . The fact that the fixed-effect b 3 is poorly correlated (in Fig. 3) to other coefficients also helps in maintaining the distinction between scaling on either sides of M h . In Fig. 4 we also observe that the within-model uncertainty of predictions ( σ μ ) at M h ≤ M W is lower when considering M h = 5.5, 5.7 , which could be from the increased amount of data in calibrating b 3 than when M h = 6.2, 6.5. Figure 5 shows the response spectra predicted by the GMMs with four alternative M h for a set of scenarios. Although the shape of the response spectrum does not change much, one striking feature is the clear reduction in the median within-model uncertainty σ μ . In this figure, as in the previous, the ±σ μ is shown as the coloured ribbon around the median prediction (solid line). Apparently the reduction in uncertainty on b 3 has led to a substantial decrease in σ μ , to the point that the predicted median and uncertainty of response spectra for (M7.5,10 km) when setting M h = 5.5, 5.7 has no overlap with that of (M5.5, 10 km); unlike when using M h = 6.2, 6.5. Figure 6 is more illustrative of the change in within-model epistemic uncertainty of the GMM depending on the choice of M h . It is evident that σ μ is lower for hazard critical scenarios (M W ≥ 5.5&R JB ≤ 30km) when M h is lowered to 5.5 or 5.7, due to the substantial increase in number of events and recordings used to constrain b 3 . Based on all the figures referred to so far, we consider a reduction of M h to M W = 5.7 is the most reasonable revision to the fixed-effects of the GMM.

Random-effects and residuals
Assured that the fixed-effects estimates and predictions improve when shifting to Response spectra predictions including within-model epistemic uncertainty ( ± ) when M h = 5.5, 5.7, 6.2, 6.5 . Left-panel shows the spectra for M W = 3.5, 5, 7.5 at R JB = 10km , and at R JB = 120km in the right-panel Fig. 6 Within-model epistemic uncertainty σ μ over the magnitude and distance range of the ESM dataset at T = 0.01, 0.1, 1, 2s when M h = 5.5, 5.7, 6.2, 6.5 M h = 5.7 , it is necessary to verify the impact on random-effect and residual variances. A major change in these quantities would require us to re-evaluate the physical meaning of random-effects, and would also challenge the understanding that random-effects are poorly correlated to fixed-effects, as shown in von Specht and .
Firstly, in Fig. 7 we present the comparison of random-effect and residual standarddeviations (aleatory variability) of the four M h -trial GMMs. Note that the curves associated to different choices of M h overlap in this figure, indicating that the estimates are almost identical-with even a small reduction in between-locality ( τ L2L ) and betweenevent ( τ 0 ) variabilities. Since our revision is primarily concerned with magnitude scaling, we see only minor changes in these event related aleatory variabilities. Note also that, the values shown here are not comparable to those in Kotha et al. (2020) because, the (lower) estimates therein are from robust mixed-effect regressions, while the (higher) estimates in Fig. 7 here are from ordinary least-square mixed-effect regressions.
We refer to the supplementary figures, Figs.S2, S3, S4, to compare the random-effect estimates of the four M h -trial GMMs. Therein we make a one-to-one comparison of random-effect values for levels of each random-effect groups -levels being regions, localities, and sites, when groups are between-region (attenuation, δc 3,r ), between-locality (source, δL2L l ), and between-site ( δS2S s ), respectively. Except for some very minor changes in δL2L l of a couple of tectonic localities l-at long periods where τ L2L is smaller than at short periods (Fig. 7)-there is no difference between the four M h -trial GMMs. In addition, Fig.S5 and Fig.S6 show the trends of aleatory residuals ≈ N 0, Φ 0 against M W and R JB , respectively. Neither plots show any striking differences among the four M h -trial GMMs, and are not discussed further. This will also allow us to neither revise the discussions in Kotha et al. (2020) nor the GMM backbone logic tree implementation presented in the companion article of Weatherill et al. (2020b).
The most relevant changes in random-effect estimates are those associated to individual events, i.e. the inter-event or the between-event random-effect group ΔB 0 e,l ≈ N 0, 0 . Figure 7 shows that between-event standard-deviation τ 0 is almost identical among the four M h -trial GMMs, with a minor decrease at long periods. However, this does not necessarily mean that the trends of ΔB 0 e,l against M W will remain unchanged. In fact, they should necessarily change to reflect the change in the fixed-effects median predictions in Fig. 4. Figure 8 shows the ΔB 0 e,l vsM W trends for the four M h -trial GMMs at T = 0.01, 0.1, 1, 2s . Here we see that the δB 0 e,l in the range 4 ≤ M W ≤ 5.5 do not show any remarkable differences between the four regressions, reflecting the unchanged median prediction in Fig. 4. The most relevant changes in random-effect estimates are those associated to individual events, i.e. the inter-event or the between-event random-effect group ΔB 0 e,l ≈ N 0, 0 . Figure 7 shows that between-event standard-deviation τ 0 is almost identical among the four M h -trial GMMs, with a minor decrease at long periods. However, this does not necessarily mean that the trends of ΔB 0 e,l against M W will remain unchanged. In fact, they should necessarily change to reflect the change in the fixed-effects median predictions in Fig. 4. Figure 8 shows the ΔB 0 e,l vsM W trends for the four M h -trial GMMs at T = 0.01, 0.1, 1, 2s . Here we see that the δB 0 e,l in the range 4 ≤ M W ≤ 5.5 do not show any remarkable differences between the four regressions, reflecting the unchanged median prediction in Fig. 4. However, at M W < 4 , there is a stronger upward trend when M h = 5.5, 5.7 , indicating a less positive value of b 2 than when M h = 6.2, 6.5 . This means that the positive b 2 values when M h = 5.5, 5.7 deviate further from the suggestion of Fukushima (1996) that the M 2 W coefficient should be negative.
We have mentioned earlier that a trial regression removing the data from M W ≤ 4.1 events in ESM rendered b 2 values negative. Although removing data from M W ≤ 4.1 events seems a straightforward solution to render b 2 values negative, if this GMM were to be used in low-moderate seismicity regions (e.g. Northern Europe), data from the Fig. 8 Between-event random-effects versus M W of the events in ESM dataset at T = 0.01, 0.1, 1, 2s when M h = 5.5, 5.7, 6.2, 6.5 in the four M h -trial GMMs numerous small magnitude events is quite useful in estimating between-region, betweenlocality, and between-site random-effects (see Kotha et al. 2021b for more details). The EMEC M W used in regression of this GMM, especially of smaller events from the Balkans, were approximated from local-magnitude ( M L , or other region dependent magnitude scales) using empirical relationships. Therefore, the positive b 2 values could in fact be a consequence of erroneous M W of small events. Recognising the inhomogeneity of M W estimates across the highly diverse regional/local seismic networks in Europe, we suggest evaluating the compatibility of M W estimates between the pan-European ESM (and EMEC) and regional/local datasets prior to adapting the GMM for hazard and risk assessments in those regions. We are currently investigating this issue with an independent dataset containing a greater number of M W ≤ 4 events from France (Traversa et al. 2020). We consider this as a more critical issue in low-moderate seismicity regions, where smaller earthquakes also contribute to hazard. For the moment, since ESHM20 sets the lower M W limit of magnitude-frequency distributions in its seismogenic source models at M W = 4.5 , positive b 2 values do not affect their hazard and risk estimates.
In Fig. 8, the most striking differences between the four M h -trials are seen at M W > 5.5 -highlighted in red for clarity. In the lower two rows corresponding to the estimates whenM h = 6.2, 6.5 , we see that most of the δB 0 e,l values are negative, i.e. below the zero-baseline. This indicates that these two models -among which is also the Kotha et al. (2020) -are strongly overestimating the median predictions atM W > 5.5 . This aspect was overlooked in Kotha et al. (2020) because: 1) that model was derived using a robust linear mixed-effects regression, and 2) the δB 0 e,l of the largest events were more evenly distributed around zero-baseline. About point 1, robust random-effect variances are (generally) smaller than ordinary least-square estimates because the apparent outlier events are down-weighted, pushing their δB 0 e,l further from zero and bringing the rest of δB 0 e,l closer to zero. This led us to believe that the model was performing reasonably well. Regarding point 2, we still notice that when using M h = 6.2, 6.5 the δB 0 e,l of M W ≥ 7 events are almost symmetrically distributed around zero. The strong oversaturation at large magnitudes seen for these two models is supported by non-parametric trends, but at the price of over predicting ground-motions of the more frequent (and relevant to high activity regions in Europe) 6 ≤ M W ≤ 6.5 events. Revising the model with M h = 5.5, 5.7 resolves this issue. There are two events with M W ≥ 6 with abnormally large δB 0 e,l values: one of them is the M6.1 event of 10 th June, 2012 in the Dodecanese Islands, Greece, and the other is the famous M6.45 Friuli earthquake of 6 th May, 1976. The Friuli event is known to be one of the strongest ever in Italy with a very high stressdrop, and therefore, a strongly positiveδB 0 e,l . Based on the analyses shown here, we considered that the optimal revision to the Kotha et al. (2020) GMM would be to choose M h = 5.7 , while keeping the mixed-effects formulation the same. Across the trials, M h = 5.7 trial GMM has the lowest within-model uncertainty on its median, and relatively unchanged random-effect and residual standarddeviations compared to the published estimates. Therefore, we move ahead by re-deriving the GMM with M h = 5.7 . In order to derive an updated set of coefficients to replace those provided by Kotha et al. (2020), we repeat the regressions shown previously but this time adopting the same robust linear mixed-effects regression approach used by Kotha et al. (2020).

3 6 Evaluation with ness dataset
So far in this study, we have re-derived and revised the Kotha et al. (2020) GMM using the ESM dataset. The revised model is a clear improvement with regards to the scaling of SA(T) with M W of pan-European shallow crustal events; most of which originate in Italy, Turkey, and Greece. However, it still remains to be seen how this revision performs against a complementary dataset with high-quality ground-motion recordings in near-source distances of large earthquakes from across the globe. For this purpose, we make use of the recently published NEar Source Strong-motion dataset NESS.

Dataset
NESS dataset is an impressive collection of 1189 high-quality ground-motion recordings from global large magnitude events with 5.5 ≤ M W ≤ 8.1 recorded in near-source distances 0 ≤ R JB ≤ 138km . Along with various ground-motion intensity measures of engineering interest, NESS also provides advanced event metadata (e.g. centroid moment tensor solutions, fault geometry, stress-drop, etc.) and distance metrics (e.g. nucleation depth, fault parallel and perpendicular distances, etc.). These metrics are quite useful in exploring the near-source finite fault effects, such as directivity, shear-wave radiation pattern, and pulselike ground-motions; as in Sgobba et al. (2021a). Moreover, the NESS dataset is compiled using the same processing methodologies as those of ESM; and hence, complement each other seamlessly. We refer the readers to Pacor et al. (2018) and Sgobba et al. (2021b) for elaboration on the NESS1 and NESS2 datasets, respectively. Figure 1 compares the M W − R JB distribution of data between NESS and ESM datasets. It is evident that NESS has much more data from hazard critical scenarios. However, we note that there are around 177 records, and 39 events common to both the datasets. Of the 34 recordings exclusive to NESS but from events present also in ESM, 15 are from the M6.5 Norcia, Italy earthquake of 30 th October, 2016; 10 from the M7.6 Izmit, Turkey earthquake of 17 th August, 1999; and one each from 9 other events. In this study, we chose to discard all NESS records or events that are already present in ESM; amounting to 211 recordings. In addition, 8 recordings associated to the M8.1 Iquique, Chile earthquake of 1 st April, 2014 have also been removed because it might have been wrongly classified in NESS as shallow crustal event while being an subduction interface event. Following a further data screening with the same selection criteria as in Kotha et al. (2020), out of the 1189 recordings we are left to evaluate the performance of our revised GMM against 947 recordings from earthquakes originating in Taiwan (382 records), USA (242), New Zealand (204), Japan (85), Mexico (27), and Italy (7). Note that the 7 records from Italy that did not feature in ESM -which contains only data up to 23 rd December, 2016 -are from an M5.5 aftershock of the 2016 central Italy sequence occurring on 18 th January, 2017.

Random-effects and residuals
To evaluate the performance of the Kotha et al. (2020) GMM and its revision presented here, we perform a random-effect and residual analyses of the NESS dataset. Instead of the customary plots showing GMM fixed-effects (median) prediction against observations, we 1 3 choose to present and analyse the trends of random-effects and residuals against predictor variables M W and R JB . For this purpose, we perform an ordinary least-square random-effect decomposition as shown in equation below: In Eq. (5), ln Obs e,s are the natural-log of SA(T) in the NESS dataset recorded for event e by site s , and ln(μ) is the fixed-effects GMM prediction for that event at that event-to-site distance (in R JB ). Since there are two GMMs with different M h , ln(μ) will also be different: one with M h = 5.7 and other with M h = 6.2 . No region-, locality-, and site-specific adjustments are made to ln(μ) . Residuals from ln Obs e,s − ln(μ) are then split into betweenevent ( δB e ) and within-event ( δW e,s ) terms. On the right-hand side of Eq. (5), of fset(0) means the random-effects decomposition does not estimate an offset or bias term. Unlike in the GMM, here we estimate the traditional between-event random-effect δB e , with subscript e as the event index. These are different from the δB 0 e,l of the GMM, which are corrected for tectonic locality-specific δL2L l . Technically,δB e ≈ δB 0 e,l + δL2L l , but since we could not localize the events in NESS, we could only estimate the δ B e . Out of the 1049 sites in NESS, 826 do not have a measured V s30 . As most of the sites are uniquely sampled, we could not introduce a between-site random-effect δS2S s , where subscript s is the site index. This means that, all the site-effects are redistributed among between-event δB e and within-event residuals δW e,s . Now that there are two GMMs we are evaluating -revised GMM with M h = 5.7 and original GMM with M h = 6.2 -against the NESS datasets, we have two sets of δB e and δ W e,s for PGA , PGV , and SA(T = 0.01 − 8s) . Similarly, for the ESM dataset, we have two sets of δB 0 e,l and ε . Note again that the δB 0 e,l and ε are filtered for tectonic locality-, attenuation region-, and site-specific effects, while δB e and δ W e,s are not. Therefore, although these are not technically the same quantities, they are still comparable. Figure 9 plots the between-event random-effects against M W of the M W ≥ 5 events in ESM and NESS dataset, estimated with respect to the GMMs with M h = 5.7, 6.2 . Of course, the localized non-parametric trend (loess by Jacoby 2000) always stays close to zero for the ESM data because the models were calibrated on it. The NESS data, however, (5) ln Obs e,s − ln(μ) = of fset(0) + δB e + δW e,s Fig. 9 Between-event random-effects, δ B e for NESS and δ B 0 e,l for ESM, versus M W of the events in ESM and NESS dataset at T = 0.01, 0.1, 1, 2s when M h = 5.7, 6.2 (rlmm) is significantly off the zero baseline, and often positively biased. This indicates that, despite the choice of M h , both the GMMs under-predict the ground-motions of events in NESS dataset -although the unrevised Kotha et al. (2020) GMM with M h = 6.2 appears to performs slightly better with a smaller bias. The between-event standard-deviation τ 0 of the two GMMs is about 0.45 at all periods (see Fig. 7); which means, although most NESS events are within ±τ 0 at short periods, at long periods most are significantly above +τ 0 .
The misfit between both the GMMs and the NESS events is a concern. But it is important to note a few assumptions made in this comparison: 1) the δB 0 e,l of the ESM events are corrected for specificities of the tectonic locality in which they originate using the δL2L l , and of attenuating regions through which their shear-waves propagate using the δc 3,r (more details in Kotha et al. 2020). While the δB e of the NESS events are not, and may include some of these effects, 2) the δB 0 e,l of the ESM events are corrected for site-response of their recording stations, while the δB e of the NESS events are not. This means that, if these events were preferentially recorded by soft-soil sites, then the their site-amplification is likely to have crept into δ B e estimates, and possibly inflating them, and 3) ESM events are primarily recorded at R JB ≥ 30km , while the NESS events at R JB ≤ 30km . Any nearsource finite-fault effects, such as directivity and shear-wave radiation pattern, may have aggravated their recorded ground-motions and in-turn, when unfiltered, may have inflated their δB e . Figure 10 is a polar coordinate diagram showing the δ B e of NESS events and δ B 0 e,l of ESM events against their rake angle. The radius of the plot measures the δ B e and δ B 0 e,l values, and the azimuth measures the rake of the event. In this figure, we see that the loess fit for ESM events coincides with the zero baseline -indicating no bias towards any focal mechanism. While for the NESS events, although the mean trend (of loess fit) coincides with zero baseline in the range −180 0 ≤ rake ≤ 20 0 , elsewhere there's a strong positive bias. This range 20 0 < rake < 180 0 corresponds to events often identified as reverse-slip or thrust faulting mechanism. Reverse-slip events often generate stronger ground-motions on their hanging-wall side than on foot-wall side (at the same R JB ) due to wedge geometry. If the recording stations were concentrated on this side of the fault trace, then it is likely that the finite-fault effects and anisotropic radiation (along with site-effects) might have aggravated their recorded ground-motions (e.g. Spudich 2013; Donahue and Abrahamson 2014;Kamai et al. 2014;Dujardin et al. 2018;Kotha et al. 2019;Sgobba et al. 2021a;etc.) Table.1 lists the events whose between-event random-effect δB e is often larger than τ 0 of the Kotha et al. (2020) GMM at 0.01s ≤ T ≤ 8s . The data from these events show all the characteristics that we hypothesized might have aggravated their recorded ground-motions. For instance, most events are of reverse-slip rupture type, recorded at very close distances, by sites whose V s30 characterises them as soft-soils with strong amplification of long-period ground-motions; and most interestingly, a good proportion of these recordings are identified with pulse-like (finite-fault effect) features. Here we mean to say that, it is clear that our GMMs are not able to predict the recorded ground-motions of these events, and the reason is two-fold: lack of site-specific recordings to correct for site-effects, and lack of GMMs ability to predict near-source finite-fault effects. With this experience, we anticipate further sophisticating our GMM with empirical or semi-empirical finite-fault adjustments to near-source predictions. For this purpose, the NESS2 dataset of Sgobba et al. (2021b), BB-SPEEDset of Paolucci et al. (2021), and work of Donahue and Abrahamson (2014) provide exciting opportunities. Supplementary Fig.S7 shows the trend of within-event residuals δ W e,s of the NESS events plotted against M W , along with the aleatory residuals ε of the ESM events. Since these plots show no discernible trends, we have excluded it from discussion. Figure 11   shows the same residuals against R JB , with the pulse-like ground-motion residuals in black. At longer periods, i.e. T = 1, 2s here, residuals from pulse-like recordings are preferentially above zero baseline, indicating that these ground-motions are systematically stronger than the GMM predictions. Without dwelling further, we refer to a similar finding by Sgobba et al. (2021a); wherein, pulse-like recordings are characterised by a higher than average long-period SAs and lower than average short-period SAs . We recognize the need to upgrade our GMMs to reflect these finite-fault features, but perhaps not in this study. Weatherill et al. (2020b) provided an M W dependent heteroscedastic model of τ 0 for use with the implementation of (revised) Kotha et al. (2020) GMM in the regionally adaptable ESHM20 backbone logic-tree. While that heteroscedastic model of τ 0 is adapted from Al Atik (2015), which was developed from a global dataset of events, here we provide one using only the ESM events. Alongside a heteroscedastic τ 0 vsM W for the ESM events, we also examine the impact of adding NESS events to ESM events. Figure 12 shows the investigation of τ 0 vsM W for two cases: δB 0 e,l of only the ESM events, δB 0 e,l of ESM events along with δB e of NESS events. We perform a simple test for heteroscedasticity -the Breusch-Pagan test (Breusch and Pagan 1979) -programmed into an R package lmtest by Zeileis and Hothorn (2002). The Breusch-Pagan test fits a linear regression model to the residuals of a linear regression model (here δB 0 e,l vsM W ) and rejects the 'null hypothesis: presence of homoscedasticity' ( H 0 ) if too much of the variance (here τ 0 ) is explained by the additional explanatory variables (here M W ). Therefore, if H 0 has a p-value below an assigned threshold (e.g. p-value < 0.05 ) then H 0 is rejected at that significance level, and heteroscedasticity is assumed.

Heteroscedasticity
The top panel of Fig. 12 shows the p-value of H 0 atT = 0.01s − 8s . Lower p-values mean the null hypothesis assuming homoscedasticity of variability of δB 0 e,l , δB e vsM W can be rejected at higher significance level; thereby, the alternative hypothesis of Fig. 11 Within-event residuals versus R JB of the records in ESM and NESS dataset at T = 0.01, 0.1, 1, 2s when M h = 5.7, 6.2 . Black markers indicate those records with reported pulse-like effects after removing the shared events heteroscedasticity is acceptable at higher significance. Higher p-values do not necessarily mean homoscedasticity, but simply a lower significance of heteroscedasticity. In this case, we observe that the heteroscedasticity is most significant for both ESM and ESM&NESS datasets in the range0.2 < T < 2s . AtT ≤ 5s , heteroscedasticity of ESM events' δB 0 e,l is stronger than when including also the δB e of global, tectonically diverse, events of NESS; as indicated by the lower p-values. As seen in Table.1, one of the reasons a few NESS M W ≥ 6 events have large δB e -thereby increasing the scatter-is likely because of their origin in a tectonically different environment (region), and style of faulting compared to those in Italy, Greece, and Turkey. Nevertheless, this could also be due to the finite-fault and site effects mentioned earlier.
In the middle panels of Fig. 12, we show the δB 0 e,l vsM W of ESM events in grey, overlain by δB e vsM W of NESS events in black. The error-bars indicate the ±MAD (median absolute deviation) of δB 0 e,l and ( δB 0 e,l , δB e ) in magnitude bins M W ∈ [3.4, 4), [4, 5), [5, 5.5), [5.5, 6), [6, 6.5), [6.5, 7.4].At M W < 5 , the MAD is comparable to the τ 0 , while at M W ≥ 6.5 the MAD is on average 35% smaller than τ 0 for ESM events depending on the period. For the combination ESM&NESS events however, although MAD for M W ≥ 6.5 is 20% lower than that of M W < 5 at T ≤ 3s , at T > 3s the MAD for M W ≥ 6.5 is in fact 15% larger. At such long periods, we hypothesise that many more complexities are responsible for the apparently increased diversity of large event ground-motions. Therefore, we consider our heteroscedastic model for the combination ESM&NESS as purely experimental.
In the lower panels of Fig. 12, we show the estimated M W binwise MAD (markers), and the proposed piece-wise linear model of heteroscedastic between-event variability is shown in Eq. (6). For the moment, we provide a readily applicable τ 0,M W (M W ) for the ESM events, and an experimental one for ESM&NESS data: In Eq. (6) showing the formulation of the heteroscedastic between-event variability τ 0,M W as a function of M W , at each period ( T , not shown in equation), τ 0,M 1 is the betweenevent variability for events with M W < M 1 = 5 , and is set equal to τ 0 . For M W ≥ M 2 = 6.5 , τ 0,M W M W is a smaller τ 0,M 2 . For M 1 ≤ M W < M 2 , τ 0,M W M W decreases linearly from τ 0,M 1 to τ 0,M 2 . Values of τ 0,M 1 are the same for ESM and ESM&NESS datasets, but τ 0,M 2 values are different, as seen in the lower-panels of Fig. 12. All these values are provided in the electronic supplement containing the revised coefficients of the GMM.
In addition to the heteroscedasticity of between-event variability, we have examined the magnitude dependent heteroscedasticity of residual variability; i.e. the so-called withinevent δW e,s residuals of NESS data containing the various regional and site-specific effects, and the aleatory residuals of ESM data corrected for region-, locality-and site-specific effects by Kotha et al. (2020). Once again, although δW e,s and ε are not technically the same, they are both the left-overs after filtering out all possible the fixed-and randomeffects from observations. Supplementary Fig.S8 is similar to Fig. 12 shown above for between-event random-effects. Therein we observed that the δW e,s of NESS data show strong heteroscedasticity at long periods. However, unlike the decreasing variability of δB e with increasing M W , long-period δW e,s (T ≥ 1s) variability increases rapidly with increasing M W . We understand this could be due to the anisotropy of shear-wave radiation, pulselike effects, etc. in near-source ground-motions. We believe it is necessary to investigate these effects in more detail than that can be presented here. Therefore, for the moment, we will not further discuss heteroscedasticity of within-event δW e,s residuals of the NESS dataset.

Conclusions
In this study, we have revised the fixed-effect regression coefficients of the Kotha et al. (2020) GMM. We first discussed the adequacy of various fixed-effects in the GMM in capturing the primary physical phenomenon controlling the seismic ground-motion observations. An outstanding issue with the GMM was the oversaturation of short-period SAs with increasing magnitude at near-source distances. Alongside, the selection of a hinge-magnitude M h = 6.2 led to large within-model uncertainty of median predictions at M W ≥ 6.2 . Since large magnitude earthquakes occurring at short distances from the site are often the hazard critical scenarios, the GMM within-model uncertainty has a large impact on seismic hazard results. Therefore, the most optimal revision, i.e. without effecting the randomeffects and residual variances, was to revise the GMM with a different M h . We have carefully analysed the within-model uncertainties associated to the choice of M h and finally chose M h = 5.7 to curtail the oversaturation issue. A new set of fixed-effect coefficients are generated using robust linear mixed-effect regressions to replace the one provided with Kotha et al. (2020).
In fact, there are several alternative M W scaling formulations already implemented in various GMMs. The one in Kotha et al. (2020) is among the simplest (oldest) one with a linear and quadratic scaling terms for small-moderate magnitudes ( M W < M h ), and linear-only term for M W ≥ M h . Even with such simple formulation, the choice of M h is still ambiguous, and may render the GMM behaviour inexplicable (and poorly constrained) at M W > M h for some choices of M h . Nevertheless, a break in M W scaling of SAs with introduction of M h is considered necessary here because: while b 1 and b 2 capture the nonlinear scaling of SAs with M W for small events ( < M h ) that can be approximated as point-sources, b 3 captures the scaling for large events that cannot be approximated as point-sources.
Another interesting observation in this study is the scaling of SAs for small magnitudes (e.g. M W ≤ 4 ). While empirical and theoretical expectations suggest that ground-motions decay rapidly towards smaller magnitudes, non-parametric trends and data-driven GMM coefficients suggest it isn't so in ESM. More specifically, the fixed-effects coefficient of M 2 W term in the GMM is positive, which is contrary to the theoretically expected negative value. Although removing ESM data from events with M W ≤ 4.1 renders this coefficient negative, data from the numerous small magnitude events was indispensable in estimating between-region, between-locality, and between-site random-effects; which can be used in making partially non-ergodic GMM predictions in regions with data in ESM, and in adapting the GMM to low-moderate seismicity regions with no data in ESM. Newer GMMs with lower magnitude limit of applicability up to M W = 3 , despite very low within-model uncertainty, are perhaps affected by uncertainty in M W estimates. We suggest caution when applying this GMM to low-moderate seismicity regions (e.g. France and Germany), where the M W estimates of small earthquakes may be quite uncertain, if not inferred from another magnitude scale (e.g. local magnitude M L , body-wave magnitude m b , etc.). Therefore, achieving homogeneity of M W estimates across the very diverse local/regional seismic networks (and datasets) in Europe has become quite critical to extending the applicability of GMMs towards lower magnitudes.
Once the GMM coefficients were revised to have lower within-model uncertainty on large magnitude predictions at near-source distances, we evaluated its performance against a high-quality near-source strong ground-motion dataset. Unlike the regressed ESM dataset however, the NESS dataset contains ground-motion data collected from tectonically diverse, large magnitude earthquakes recorded by dense network of stations located on particularly soft-soil geology. In addition, a large fraction of NESS data are characterised by pulse-like characteristics. Owing to its peculiar sampling, NESS was decided against integration into ESM; in order to not bias the isotropic Kotha et al. (2020) predictions with anisotropic pulse-like ground-motion observations. We noticed that most of the large magnitude events in NESS have produced stronger ground-motions than that the GMM predicts. The misfit between NESS observations and GMM median predictions is larger at longer periods, where pulse-like effects and soil conditions play a major role in aggravating ground-motions. At short periods, the misfit was within one standard-deviation, and that's an improvement we could achieve by revising the model with M h = 5.7 instead of M h = 6.2 of Kotha et al. (2020). For the sake of brevity however, we have only