SegPhase: Development of Arrival Time Picking Models for Japan’s Seismic Network Using the Hierarchical Vision Transformer

doi:10.21203/rs.3.rs-4291782/v1

Research Article

This work is licensed under a CC BY 4.0 License

A seismic arrival time picking model, SegPhase, is introduced to automatically process the large amount of seismic data recorded by large, dense seismic networks with different sampling frequencies and numbers of observed components. Three models were created to address the different sampling frequencies and numbers of observed components in each network. The model uses a hierarchical Vision Transformer structure, which has not previously been used in seismic arrival time picking models and shows superior performance compared to conventional models using convolutional layers. The performance of the SegPhase models was verified in terms of the relationship between arrival time residuals and output probability values, epicentral distance, signal-to-noise ratio, and magnitude, and was compared to that of the PhaseNet models. The SegPhase models showed better picking performance and detected more earthquakes. Moreover, the SegPhase models were applied to continuous waveforms, and the relationship between the threshold of output probability values used in the analysis and the number of detections, O-C values, and hypocenter determination error was investigated. It was found that when the threshold was lowered, more arrival times were used for earthquake detection, not only those with lower output probability values but also those with higher output probability values. Therefore, lowering the threshold allows phase association to make better use of the arrival times that the model assumes to be highly accurate. Although lowering the threshold increases the error, the effect on the overall result is not significant.

Deep Learning

Phase Picking

Earthquake Detection

Manten Network

In recent years, various seismic wave arrival time picking models using deep learning have been developed (Ross et al., 2018; Zhu and Beroza, 2019; Mousavi et al., 2019; Zhu et al., 2022; Tokuda and Nagao, 2023; Sun et al., 2023). It has been reported that PhaseNet (Zhu and Beroza, 2019) outperforms traditional arrival time picking methods using AR-AIC (Sleeman and Van Eck, 1990) in accuracy. Furthermore, when comparing deep learning models, Generalized Phase Detection (GPD; Ross et al., 2018), PhaseNet and EQTransformer (Mousavi et al., 2019) performed well (Münchmeyer et al., 2022). Garcìa et al. (2022) compared the above three models. They reported that GPD has a high arrival time detection capability but many false positives, EQTransformer has a low arrival time detection capability but few false positives, and PhaseNet has an intermediate performance between GPD and EQTransformer. Garcìa et al. (2022) concluded that combining PhaseNet and phase association has the best earthquake detection performance for earthquake cataloging.

Deep learning models for arrival time picking can be divided into two categories. The first category is single-station models that input seismic waveforms observed at a single station and determine arrival times of P-waves and S-waves. This approach, used by PhaseNet and EQTransformer, focuses on data obtained from individual observation points. The second category is multi-station models that input seismic waveforms observed at multiple stations and determine the arrival times of P-waves and S-waves for each waveform. This approach, used by EdgePhase (Feng et al., 2022) and PhaseNO (Sun et al., 2023), focuses on using information from entire observation networks by integrating data from different locations. Single-station models can learn local features of seismic waveforms, while multi-station models can learn information from the wave field.

Accuracy is essential for creating earthquake catalogs using an arrival time-picking deep learning model. While it may be more effective to consider multiple stations when detecting earthquakes from continuous waveforms (Yano et al., 2021), it is important for earthquake detection using arrival times picked by a deep learning model to be able to achieve accurate picking. When creating a seismic catalog using a deep learning model, the process involves three primary steps. First, arrival times are picked from observed continuous seismic waveforms. Second, arrival times corresponding to the same earthquake are grouped together; this process is known as phase association. Finally, earthquake localization is performed. Hence, regardless of the model category, the accuracy of picking arrival times is crucial for earthquake detection using arrival time picking deep learning models.

The accuracy of arrival time picking models depends on the data used for training. It has been reported that differences between the regions used for training and application can degrade picking accuracy, and adapting models trained on three-component waveforms to only vertical-component waveforms can result in decreased accuracy (Münchmeyer et al., 2022). Moreover, deep learning models for polarity determination perform differently depending on the sampling frequency and observed location (Hara et al., 2019). Therefore, preparing models trained on seismic waveforms obtained in the application area, matching sampling frequencies, and waveform components can improve picking accuracy.

In Japan, dense seismic observation networks such as the Manten network (Miura et al., 2010; Iio, 2011; Iio et al., 2017) and the Metropolitan Seismic Observation network (Sakai and Hirata, 2009; Aoi et al., 2021), as well as temporary aftershock observation networks, exist in addition to stationary seismic observation networks such as Hi-net (Okada et al., 2004; Obara et al., 2005; Shiomi et al., 2009). Stationary seismic observation networks are composed of seismometers sampling at 100 Hz in three components. The Manten seismic observation network, composed of seismometers sampling at 250 Hz in three components, was deployed from 2009 to 2022 in the central and northern Kinki region, the western Nagano region including the source region of the 1984 western Nagano prefecture earthquake, and the San-in region including the source region of the 2000 western Tottori prefecture earthquake. The Manten network is very dense, with 80 stations in the Kinki region, 31 in Nagano, and 125 in the San-in area. Additionally, a temporary aftershock observation network called the 0.1 Manten network was installed in the source region of the 2000 western Tottori prefecture earthquake (Matsumoto et al., 2020; Hayashida et al., 2020). The 0.1 Manten network consisted of 1000 seismic stations sampled at 100 Hz with only the vertical component. In addition, temporary aftershock observations were carried out in the aftershock areas of the 2016 Central Tottori Earthquake and the 2018 Northern Osaka Earthquake using the seismic stations used in the 0.1 Manten network (Iio et al., 2021). Owing to these high-density, long-term observations, a large volume of high-quality seismic waveforms has been recorded and utilized in various studies (Aoki et al., 2016; Katoh et al., 2018; Yukutake et al., 2020; Hayashida et al., 2020; Iio et al., 2021). However, for a significant amount of these data, earthquakes have not yet been detected and arrival times have not been picked by humans. Dense and long-term seismic observation networks like the Manten observation network are globally rare and represent datasets that can contribute to understanding seismic activity and crustal structure. Therefore, comprehensively analyzing this dataset is meaningful.

This study developed a new seismic wave arrival time picking model suitable for the Manten observation network, aftershock observation networks above, and stationary observation networks such as Hi-net near the Manten and aftershock observation networks. Since the sampling frequency and the number of observation components differ in each observation network, three models were created: a model for the stationary observation network with 100 Hz sampling in three components (100 Hz model), a model for the Manten observation network with 250 Hz sampling in three components (250 Hz model), and a model for the 0.1 Manten network with only vertical components and 100 Hz sampling (M01 model).

The model structure developed in this study, called SegPhase, differs from previously reported arrival time picking model structures. The SegPhase uses a hierarchical Vision Transformer (ViT; Dosovitskiy et al., 2021) structure similar to Segformer (Xie et al., 2021). ViT applies the Transformer (Vaswani et al., 2017) used in large language models like ChatGPT (Brown et al., 2020) to image recognition models. The previous arrival time picking models primarily used convolutional layers in the encoder part for feature acquisition. On the other hand, our model structure introduces a novel approach by employing three ViT layers exclusively in the encoder for feature acquisition.

ViT is known to be more accurate than convolutional models (Dosovitskiy et al., 2021). Models using ViT and the Convolutional layer have different recognition characteristics. The convolutional layer has a narrow receptive field as it extracts features within the convolutional kernel. On the other hand, ViT has a wide receptive field as it extracts features from the entire input data. Therefore, the Convolutional layer is texture-oriented, while ViT is shape-oriented. The shape-oriented recognition characteristic is close to the human recognition characteristic, meaning ViT performs feature extraction closer to humans than the convolutional layer (Tuli et al., 2021).

We conducted a comparative analysis of picking performance using PhaseNet to evaluate SegPhase. Additionally, we utilized continuous waveform data from the aftershock observation network, which was installed in the source region of the central Tottori prefecture earthquake on October 21, 2016, to compare the number of detected earthquakes. A comprehensive comparison was made between earthquake catalogs generated by SegPhase and PhaseNet which was trained using the same dataset as SegPhase, the Horiuchi program (Horiuchi et al., 2013), and the Japan Meteorological Agency (JMA). This comparison facilitated discussions on the number of detected earthquakes and the accuracy of their hypocenter location based on the output probability threshold.

The package that we developed for building a seismic catalog with the SegPhase models can directly read WIN files, a file format widely used in Japan (Urabe, 1994). Although pre-trained deep learning arrival time picking models such as PhaseNet and EQTransformer are available, using those models to pick the arrival times of seismic waves contained in WIN files requires first converting the files to a format such as npy or npz (specific to the Python package NumPy; Harris et al., 2020). This conversion consumes time and storage space. Our package, however, reads WIN files directly, thereby saving both time and storage space.

In this study, we have developed a new seismic wave arrival time picking model, SegPhase, aimed at efficiently analyzing the vast amount of seismic waveform data obtained from the Manten network and surrounding stationary observation networks such as Hi-net. The primary objective of utilizing deep learning models for picking arrival times is to automate the picking process and the creation of earthquake catalogs. Picking arrival time is most important in seismological research. Thus, our goal is to enhance the accuracy and efficiency of arrival time analysis. Unlike conventional models that use convolution layers for feature extraction, SegPhase employs a hierarchical Vision Transformer, which performs feature extraction in a manner more akin to human recognition. Vision Transformers are more accurate than models that use convolution layers, possessing recognition characteristics closer to those of humans. Therefore, SegPhase is expected to achieve higher accuracy than traditional convolution-based arrival time picking models. We detail the design, implementation, and performance advantages of SegPhase through comparative analysis with PhaseNet.

2.1 Data

The models were trained on observed three-component waveforms sampled at 100 Hz, three-component waveforms sampled at 250 Hz, and vertical-component waveforms sampled at 100 Hz within the range of 131.5°E to 138.5°E and 34°N to 37°N, as shown in Fig. 1. The three-component 100 Hz waveforms were recorded by the stationary observation networks installed by the National Research Institute for Earth Science and Disaster Resilience (NIED), the National Institute of Advanced Industrial Science and Technology (AIST), the Japan Meteorological Agency (JMA), Kyoto University, and the University of Tokyo. The three-component 250 Hz waveforms were recorded by the Manten network, and the vertical-component 100 Hz waveforms were recorded by aftershock observation networks. The eigenfrequency of the seismometers used in the stationary observation networks is 1 Hz, except for station GS.HNO, installed by AIST, which has an eigenfrequency of 3 Hz. The eigenfrequency of the seismometers in the Manten networks is 2 Hz, and that of the seismometers in the 0.1 dense observation networks is either 2 Hz or 4.5 Hz.

The periods during which the waveform data were recorded vary across five regions: (A) San-in, (B) Northern Kinki, (C) Western Nagano, (D) Central Tottori, and (E) Northern Osaka (Fig. 1). Regions (A), (B), and (C) are where the Manten networks were installed. Regions (D) and (E) correspond to the aftershock areas of the M_J 6.6 central Tottori earthquake on October 21, 2016, and the M_J 6.1 northern Osaka earthquake on June 18, 2018, respectively (M_J is the JMA magnitude). Temporary aftershock observation networks were installed in central Tottori by the joint earthquake observation teams of Kyoto University, Kyushu University, and the Earthquake Research Institute of the University of Tokyo (Iio et al., 2020). Similarly, in northern Osaka, the installation was conducted by the joint earthquake observation teams of Kyoto University, Kyushu University, the Earthquake Research Institute of the University of Tokyo, and Kansai University (Iio et al., 2021). As part of the temporary aftershock observation networks, 21 three-component seismometers and 49 vertical-component seismometers were installed in region (D), and 90 vertical-component seismometers were installed in region (E).

The data used for model construction comprise 341,140 three-component 250 Hz waveforms, 143,540 three-component 100 Hz waveforms, and 280,370 vertical-component 100 Hz waveforms. The data consist of earthquake waveforms in the magnitude range −1.1 ≤ M_J ≤ 5.4. The observed waveforms were randomly split by region into training, validation, and test data at 80%, 10%, and 10%, respectively. Each waveform had a length of 30 seconds and was normalized for each component using Z-score normalization, without any filtering. As ground truth, Gaussian-shaped probability distributions were assigned to the P- and S-wave labels, together with a noise probability distribution (Zhu and Beroza, 2019). The Gaussian-shaped probability distribution had its mean at the human-picked arrival time and a standard deviation of 0.05 s for the P-wave and 0.1 s for the S-wave. The distribution was then normalized by its maximum so that its peak value is 1.
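The preprocessing and labeling described above can be illustrated with a minimal sketch. The function names (`zscore`, `gaussian_label`, `make_targets`) are hypothetical, and the noise label is assumed here to be the remaining probability, 1 − P − S, clipped at zero; the text only states that a noise probability distribution following Zhu and Beroza (2019) was used.

```python
import math
import statistics


def zscore(trace):
    """Z-score normalize one component: subtract the mean, divide by the std."""
    mean = statistics.fmean(trace)
    std = statistics.pstdev(trace)
    if std == 0.0:
        return [0.0 for _ in trace]  # flat trace: nothing to scale
    return [(x - mean) / std for x in trace]


def gaussian_label(n_samples, fs, pick_time, sigma):
    """Gaussian-shaped target centred on pick_time (s), normalized to a peak of 1."""
    values = [
        math.exp(-0.5 * ((i / fs - pick_time) / sigma) ** 2)
        for i in range(n_samples)
    ]
    peak = max(values)
    if peak == 0.0:
        return values  # pick lies far outside the window
    return [v / peak for v in values]


def make_targets(n_samples, fs, p_time, s_time):
    """Return P, S, and noise targets for one 30-s window."""
    p = gaussian_label(n_samples, fs, p_time, sigma=0.05)  # 0.05 s for P
    s = gaussian_label(n_samples, fs, s_time, sigma=0.10)  # 0.1 s for S
    # Assumption: noise is whatever probability is left over.
    n = [max(0.0, 1.0 - pi - si) for pi, si in zip(p, s)]
    return p, s, n
```

For a 100 Hz trace of 30 s, `make_targets(3000, 100.0, 10.0, 15.0)` gives a P target that peaks at sample 1000 and an S target that peaks at sample 1500.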

In this study, data augmentation was performed by synthesizing multiple seismic waves in the direction of the time axis to create synthetic data that contained multiple seismic waves in a single trace (Figure S1). Firstly, the signal range was extracted from the seismic wave trace in the training data set. This signal range was from a random sampling point between 0.5 and 1 second just before the P-wave arrival time to the point where the signal-to-noise ratio (SNR) reached 2. These extracted signals were randomly combined and synthesized along the time axis to generate 30 seconds of waveforms. The average number of combinations was 3, and traces containing up to 9 seismic waves were synthesized. This process tripled the number of training data.
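A rough sketch of this kind of augmentation is given below. It is not the authors' implementation: the function name `synthesize_trace`, the zero-valued background, the non-overlapping placement, the random gap between events (`max_gap`), and the uniform choice of the number of events are all assumptions made for illustration.

```python
import random


def synthesize_trace(segments, n_samples, rng, max_events=9, max_gap=200):
    """Place randomly chosen event segments one after another in one window.

    segments  : list of extracted signal segments (lists of samples)
    n_samples : length of the synthetic trace (e.g. 30 s x sampling rate)
    rng       : random.Random instance, so that results are reproducible
    Returns the synthetic trace and the start index of each placed segment.
    """
    trace = [0.0] * n_samples          # assumption: zero background
    starts = []
    cursor = rng.randint(0, max_gap)   # random offset before the first event
    for _ in range(rng.randint(1, max_events)):
        segment = rng.choice(segments)
        if cursor + len(segment) > n_samples:
            break                      # no room left in the window
        trace[cursor:cursor + len(segment)] = segment
        starts.append(cursor)
        cursor += len(segment) + rng.randint(0, max_gap)
    return trace, starts
```

The start indices let the P- and S-wave labels of each segment be shifted by the same offset when the targets for the synthetic trace are built.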

2.2 Model Structure

The model structure of SegPhase is illustrated in Fig. 2. SegPhase has an encoder-decoder architecture. The inputs are seismic waveforms, and the outputs are the probability values of each label (P, S, or N) at each sampling point. The encoder uses ViT hierarchically and downsamples the data while learning seismic wave features. The decoder uses convolutional layers and up-sampling layers; it up-samples the data and combines information based on the features obtained in the encoder. The outputs of the decoder are passed through a one-dimensional convolution and a softmax activation function to obtain the probability values of each label at each sampling point.

ViT used in the encoder consists of Overlap Patch Embedding (OPE; Xie et al., 2021) and three Transformer blocks. The Transformer block consists of Multi Head Self-Attention (MHSA; Vaswani et al., 2017) and Mix Feed Forward Network (MixFFN; Xie et al., 2021).

OPE performed linear transformation using a one-dimensional convolution with stride settings that downsample data and Position Embedding (PE; Vaswani et al., 2017). Unlike Patch Embedding used in ViT for image classification, this study used an overlapping convolution, thus maintaining local continuity in OPE's output. After linearly transforming, PE was applied to incorporate positional information into the model. PE added a tensor of the same shape with trainable parameters to the linearly transformed data. These trainable parameters were initialized with random values following a normal distribution.

MHSA used self-attention (SA) to learn attention maps, indicating essential aspects of data. While convolution captures local features, MHSA learns inter-element features, enabling the acquisition of global data characteristics. MHSA performed self-attention in parallel across multiple "heads", capturing relationships in different representational subspaces and integrating these findings for more detailed information. SA is represented as:

$$SA\left(x\right)=softmax\left(\frac{q{k}^{T}}{\sqrt{{D}_{h}}}\right)v$$

where q is the query, k is the key, and v is the value; these are obtained by applying a linear transformation and Layer Normalization (Ba et al., 2016) to the data input into MHSA. Here, ${D}_{h}=D/h$, where D is the dimensionality and h is the number of heads. SA computes the dot product of q and k, normalizes it by the factor $\sqrt{{D}_{h}}$, which depends on the dimensionality of each head, and inputs the result into the softmax function to obtain the attention map. Finally, the sum of v weighted by the attention map is calculated to obtain $SA\left(x\right)$. The computational cost of MHSA depends on the input size. To reduce this cost, we downsampled k and v using a one-dimensional convolution with a stride of 3 in the first ViT layer and a stride of 2 in the second ViT layer (Wang et al., 2021). In the third ViT layer, we did not perform downsampling.
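The SA expression and the reduction of k and v can be sketched for a single head as follows. This is an illustration only: the learned linear transformations and Layer Normalization are omitted, and the hypothetical helper `strided_mean` (a simple average over groups of samples) stands in for the learned strided one-dimensional convolution.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q, k, v):
    """q: (N, D_h); k, v: (M, D_h) as nested lists.

    Returns SA(x) with shape (N, D_h) and the attention map with shape (N, M).
    """
    scale = math.sqrt(len(q[0]))  # sqrt(D_h)
    attn = [
        softmax([sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k])
        for q_row in q
    ]
    out = [
        [sum(w * v_row[c] for w, v_row in zip(a_row, v)) for c in range(len(v[0]))]
        for a_row in attn
    ]
    return out, attn


def strided_mean(x, stride):
    """Stand-in for the strided convolution: average groups of `stride` rows."""
    reduced = []
    for start in range(0, len(x) - stride + 1, stride):
        group = x[start:start + stride]
        reduced.append([sum(col) / stride for col in zip(*group)])
    return reduced
```

With a stride of 3, the attention map has N × N/3 entries instead of N × N, which is where the cost reduction comes from; the output still has one row per input sample because q is not downsampled.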

MixFFN performs further feature extraction based on the characteristics emphasized by MHSA. MixFFN is represented as

$$MixFFN\left({x}_{in}\right)=Linear\left(GELU\left(Conv\left(Linear\left({x}_{in}\right)\right)\right)\right).$$

MixFFN consists of a linear transformation, a nonlinear transformation, and another linear transformation. We used a kernel size of 1 for the convolution in the linear transformation. The nonlinear transformation inputs the output of a depth-wise convolution with a kernel size of 3 into the GELU (Hendrycks and Gimpel, 2016) activation function. MixFFN increases the number of input channels for information expansion by four times and then reduces it back to the number of channels for information aggregation.
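The sequence of operations in MixFFN can be sketched as follows. This is a simplified illustration under stated assumptions: biases are omitted, the depth-wise convolution is zero padded, the weights are passed in explicitly, and the names `linear`, `depthwise_conv3`, and `mix_ffn` are hypothetical.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight):
    """Pointwise (kernel size 1) transform. x: (N, C_in), weight: (C_in, C_out)."""
    n_out = len(weight[0])
    return [
        [sum(xi * w_row[o] for xi, w_row in zip(row, weight)) for o in range(n_out)]
        for row in x
    ]


def depthwise_conv3(x, kernels):
    """Depth-wise convolution, kernel size 3. x: (N, C), kernels: (C, 3).

    Each channel is filtered independently; the ends are zero padded.
    """
    n, c = len(x), len(x[0])
    out = [[0.0] * c for _ in range(n)]
    for t in range(n):
        for ch in range(c):
            for j, w in enumerate(kernels[ch]):
                idx = t + j - 1
                if 0 <= idx < n:
                    out[t][ch] += w * x[idx][ch]
    return out


def mix_ffn(x, w_in, dw_kernels, w_out):
    """Linear (C -> 4C), depth-wise conv, GELU, Linear (4C -> C)."""
    h = linear(x, w_in)                 # expand the channels by four
    h = depthwise_conv3(h, dw_kernels)  # mix neighbouring samples per channel
    h = [[gelu(v) for v in row] for row in h]
    return linear(h, w_out)             # reduce back to C channels
```

The output has the same shape as the input, so the block can be stacked after MHSA inside each Transformer block.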

In the decoder, the features obtained from the encoder were used to restore the data size and learn the output representation. The decoder's outputs were the probability values of each label at each sampling point, achieved using the softmax function. The number of output channels differs based on the model, with three (P, S, N labels) for the 250 Hz and 100 Hz models and two (P, N labels) for the M01 model.

The models were trained using mini-batch training with a batch size of 32. RAdam (Liu et al., 2019) was used as the optimization algorithm. RAdam adjusts the learning rate, a hyperparameter of Adam (Kingma and Ba, 2014), according to the stage of training. Categorical Cross-Entropy Loss was used as the loss function. The number of training epochs was not predetermined; instead, early stopping was employed. Training was stopped when the Categorical Cross-Entropy Loss on the validation data did not reach a new minimum within 20 epochs, and the model achieving the minimum Categorical Cross-Entropy Loss was considered the optimal model.
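The early stopping rule can be written compactly as below. The sketch operates on a precomputed list of per-epoch validation losses rather than on a real training loop, and the function name `early_stopping` is hypothetical.

```python
def early_stopping(val_losses, patience=20):
    """Apply the stopping rule to a sequence of per-epoch validation losses.

    Returns (best_epoch, best_loss, stop_epoch): the epoch with the minimum
    validation loss, that loss, and the epoch at which training stops.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # new minimum: keep this model
        elif epoch - best_epoch >= patience:
            return best_epoch, best_loss, epoch  # no new minimum for `patience` epochs
    return best_epoch, best_loss, len(val_losses) - 1
```

For example, if the validation loss reaches its minimum at epoch 2 and never improves afterwards, training stops at epoch 22 and the weights saved at epoch 2 are used.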

We conducted a performance evaluation of SegPhase models. We used the test dataset, which was not used in the training process, for this evaluation. Arrival times picked by the model are the position of the peak of the probability distribution output by the models. The residuals between the arrival times determined by humans and the models were employed for performance evaluation. The evaluation of performance was conducted based on the mean (µ), standard deviation (σ), and mean absolute error (MAE) of the arrival time residuals, as well as examining the relationships between arrival time residuals and the model's output probability values, epicentral distance, signal-to-noise ratio (SNR), and magnitude.
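The evaluation quantities can be sketched as follows. The function names are hypothetical, and the sign convention of the residual (model pick minus human pick) is an assumption, since the text defines the residual only as the difference between the two picks.

```python
import statistics


def pick_from_probability(prob, fs):
    """Return (arrival time in s, peak probability) from one output channel.

    prob : probability value at each sampling point for the P or S label
    fs   : sampling frequency in Hz
    """
    peak_index = max(range(len(prob)), key=prob.__getitem__)
    return peak_index / fs, prob[peak_index]


def residual_stats(human_picks, model_picks):
    """Mean, standard deviation, and mean absolute error of the residuals."""
    # Assumption: residual = model pick - human pick.
    residuals = [m - h for h, m in zip(human_picks, model_picks)]
    mu = statistics.fmean(residuals)
    sigma = statistics.pstdev(residuals)
    mae = statistics.fmean(abs(r) for r in residuals)
    return mu, sigma, mae
```

Note that µ can be close to zero even when individual residuals are large, because early and late picks cancel; σ and MAE are therefore the more informative quantities for comparing models.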

PhaseNet models were used as a comparison for the SegPhase models. The data used to train the SegPhase models were used to train the PhaseNet models, and we performed arrival time picking of the same test data. The PhaseNet models were trained with Batch size = 32, RAdam as the optimization algorithm, Categorical Cross Entropy Loss as the loss function, and Early stopping as in the training of the SegPhase model. Therefore, the only difference between the SegPhase and PhaseNet models is the structure.

The original PhaseNet model structure was designed for three-component 100 Hz sampling waveforms. Thus, adjustments were needed to the model to input three-component 250 Hz sampling waveforms and vertical component 100 Hz sampling waveforms. For the three-component 250 Hz sampling waveforms, we adjusted the padding value within convolutional layers. For the vertical component 100 Hz sampling waveforms, we adjusted the number of input and output channels, setting the input channel to 1 and the output channels to 2. Meanwhile, for the three-component 100 Hz sampling waveforms, the original model structure of PhaseNet was used.

3.1 Where does the SegPhase model pay attention?

The MHSA mechanism within the SegPhase models provides attention maps that indicate the attention weight applied to the data, making it possible to visualize where the model has learned to pay attention during training. Figure 3 shows the attention maps of the 250 Hz model, with the attention weights normalized to the maximum value (attention maps of the 100 Hz and M01 models are shown in Supplementary Figs. 1 and 2). A normalized attention weight closer to 1 (red) indicates higher attention by the model.

The SegPhase model focused explicitly on the direct P- and S-waves, which are crucial for arrival time picking. Figure 3(a), (b), and (c) show high attention weights on the initial P-wave, while Fig. 3(d), (e), and (f) show high attention weights on the initial S-wave. Furthermore, Fig. 3(d), (e), and (f) show that the models focus on the same S-wave initial motion, but the focus areas are slightly different. We can see that the model pays attention from the initial S-wave to the direct S-wave in Fig. 3(d), from the initial S-wave to the first half of the direct S-wave in Fig. 3 (e), and on the second half of the direct S-wave in Fig. 3(f). These results suggest that the model changes the location of attention for the same S-wave and learns various features.

The SegPhase model also paid attention to areas other than the P- and S-waves. Figure 3(g), (h), and (l) show that the model pays attention to coda or noise areas. Figure 3(g) and (h) show that the model focuses not on the direct P- and S-waves but on the noise, the P-wave coda, and the S-wave coda. Figure 3(l) shows that the model focuses solely on the noise, without attention to the seismic waves.

The model paid attention to different places depending on heads and not only to the signal part of the P- and S- waves but also to the noise and coda wave parts. This result suggests that SegPhase can learn various features from seismic waves and more effectively distinguish the characteristics of P- and S-wave initial motions.

3.2 Results of Phase Picking

Figures 4 and 5 show the results of arrival time picking by the 250 Hz model (results for the 100 Hz and M01 models are shown in Supplementary Figs. 3 and 4). Figure 4 shows cases in which the arrival times determined by the model are close to those picked by the human, while Fig. 5 shows cases in which the arrival times picked by the model and the human differ.

Figure 4(a) shows that the output probability values for both P- and S-waves are low, below 0.2, but the model picked almost the same arrival times as the human. Figure 4(b) also shows that the model picked almost the same arrival times as the human, although the output probability value was not large, at about 0.3. Furthermore, although there was an SP-converted wave around 3 s, the model did not respond to it, indicating that it could pick the arrival time of the direct S-wave. Figure 4(c) shows that the P-wave amplitude in the UD component is almost the same as or lower than that of the high-frequency noise, but the model picked almost the same position as the human. In Fig. 4(d), high-frequency noise is dominant in all three components, obscuring the signal, and no signal can be seen in the NS component; nevertheless, the model picked almost the same position as the human. Figure 4(e) and (f) show that the signals are clear and the model picked almost the same position as the human.

Next, we show cases in which the arrival times picked by the model and by the human differ (Fig. 5). Here, ‘differ’ refers to cases where the arrival time residual between the model and the human is 0.1 s or more. Figure 5(a) shows an example in which the human and model picks differ for both the P- and S-wave arrival times: the model picked the wrong P arrival time, whereas the human picked the wrong S arrival time. Figure 5(b) shows that the model picked the S-wave arrival time later than the human. This may be because the change in frequency of the UD waveform at the S-wave arrival time affected the determination by the model. Figure 5(c) shows that the P-wave arrival time was incorrect even though the P-wave output probability was high. Noise with a slightly higher amplitude before the P arrival time picked by the human may have affected the determination by the model. Figure 5(d) shows that the model picked a more accurate S-wave arrival time than the human. Figure 5(e) shows that the waveform contains low-frequency noise and that the model picked the correct P arrival time while the human picked the wrong P arrival time. Figure 5(f) shows that the human picked the SP-converted wave as the S-wave arrival time, but the model picked the correct location of the direct S-wave.

The results in Figs. 4 and 5 show that the model picks correctly even when the data are contaminated by large noise and it can pick the direct S-wave arrival time even when SP-converted waves are included in the data. Furthermore, in some instances, accurate selections were made despite the low output probability values, and conversely, incorrect selections occurred even with high output probability values.

3.3 Performance evaluation by arrival time residuals

We evaluated the performance of the SegPhase models based on distribution of arrival time residuals. Arrival time residual is defined as the difference between arrival time picked by humans and by models. Additionally, we compared the performance of the SegPhase models with that of the PhaseNet models. Figure 6 shows the histograms of the arrival time residuals for each model. We set the bin width of the arrival time residual histograms to 0.01 s, corresponding to the minimum data point interval achievable by seismic stations with a 100 Hz sampling rate.

The mean (µ), standard deviation (σ), and mean absolute error (MAE) of the arrival time residuals indicated that the SegPhase models performed more like humans than the PhaseNet models (Table 1, Table 2). First, µ is 0.0 s for both the SegPhase and PhaseNet models. σ and MAE are generally smaller for the SegPhase models than for the PhaseNet models; the exceptions are the S-wave picks of the 100 Hz model and the P-wave picks of the M01 model, for which the values are the same. In addition, a comparison of the amount of data in the bin at 0.0 s shows that the SegPhase models have more data in this bin for all data sets. These results indicate that the SegPhase models picked arrival times that were more human-like than those of the PhaseNet models.

The histograms of the arrival time residuals are asymmetric. The kurtosis and skewness indicate that the arrival time residuals do not follow a normal distribution. The kurtosis is larger than 3 for every model, which indicates a deviation from the normal distribution. The kurtosis was larger for the SegPhase models than for the PhaseNet models, indicating that the arrival time residuals of the SegPhase models are more concentrated around the mean. This result is also confirmed by the amount of data in the bin at 0.0 s. In addition, the skewness is nonzero for all models, indicating that the distribution has a heavier or longer tail on one side. Thus, the outputs of the SegPhase models (as well as the PhaseNet models) have asymmetric errors.
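For reference, the two shape statistics can be computed from the central moments of the residuals as in the sketch below. The function name is hypothetical, and the non-excess (Pearson) definition of kurtosis is assumed because the text compares the values against 3, the kurtosis of a normal distribution.

```python
def skewness_kurtosis(values):
    """Return (skewness, kurtosis) from population central moments.

    Kurtosis uses the non-excess definition, so a normal distribution gives 3.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    m2 = sum(d ** 2 for d in deviations) / n
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

A positive skewness means the longer tail is on the side of positive residuals, and a kurtosis above 3 means that the residuals are more sharply peaked, with heavier tails, than a normal distribution of the same variance.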

The bias that causes this asymmetric error is likely to be contained in the training data. A deep learning model learns features from training data and makes predictions or classifications based on those features. Therefore, any bias contained in the training data is also reflected in the output of the model. In the histograms of arrival time residuals for the 100 Hz model and the M01 model, the amounts of data in the bins immediately to the left and right of the bin at 0.0 s differ significantly. This may be because humans tend to pick points slightly earlier than the exact onset. On the other hand, for the 250 Hz model, the amounts of data in the bins to the left and right of the bin at 0.0 s are almost equal. This may be because the 250 Hz data have a higher temporal resolution than the 100 Hz data, which makes it easier for humans to identify the onset in the waveform and thus reduces the bias.

Next, we analyzed the relationship between the arrival time residuals and the output probability value, epicentral distance, SNR, and magnitude (Fig. 7). The mean and standard deviation of the P-wave and S-wave arrival time residuals were obtained for every 0.1 for the output probability value, every 30 or 10 km for the epicentral distance, every 0.5 for the SNR (log₁₀), and every 0.5 for the magnitude, and are shown with a histogram indicating the number of data in the interval.
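The interval statistics described above amount to grouping the residuals by a second quantity and summarizing each group. A minimal sketch follows, with a hypothetical function name; the small offset added before flooring is an implementation detail of this sketch that keeps values such as 0.3 from falling into the lower bin through floating-point rounding.

```python
import math
import statistics


def binned_stats(x_values, residuals, bin_width):
    """Group residuals by x (e.g. output probability) in bins of bin_width.

    Returns {bin_index: (count, mean, standard deviation)}, where bin_index i
    covers the interval [i * bin_width, (i + 1) * bin_width).
    """
    bins = {}
    for x, r in zip(x_values, residuals):
        index = math.floor(x / bin_width + 1e-9)
        bins.setdefault(index, []).append(r)
    return {
        index: (len(rs), statistics.fmean(rs), statistics.pstdev(rs))
        for index, rs in sorted(bins.items())
    }
```

The same function applies to the other quantities by changing the arguments, for example `bin_width=0.1` for the output probability value and `bin_width=0.5` for log₁₀ SNR or magnitude. The count in each bin corresponds to the histogram drawn alongside the mean and standard deviation.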

Figures 7(a), (e), and (i) show the relationship between arrival time residuals and output probability values for the 250 Hz model, the 100 Hz model, and the M01 model, respectively. When the output probability value is larger than 0.1, the models pick arrival times close to the human picks: the standard deviations of the P-wave arrival time residuals are less than 0.1 s. In all models, the standard deviation tends to decrease as the output probability value increases. An exception is the 0.5 to 0.6 interval of the 100 Hz model, where the standard deviation of the S-wave arrival time residuals is large; the cause is unknown. Most of the data are concentrated at output probability values from 0.8 to 1.0, and the standard deviations in this range are lower than in the other ranges. This indicates that the SegPhase models predict the test data with high confidence and that their predictions are consistent with the human picks. For output probability values between 0.1 and 0.9, the error bars for P- and S-waves do not grow significantly, even at low output probability values, whereas the errors are large for values below 0.1. The prediction uncertainty is therefore roughly constant for output probability values above 0.1, which suggests that the models pick close to human judgment even at low probability values.

Next, the relationship between the arrival time residuals and epicentral distance indicates that the prediction error becomes slightly larger at longer distances. Figures 7(b) and (f) show that the error bar of the S-wave arrival time residuals increases slightly with epicentral distance. This may be because the P-wave coda is complicated by scattered waves, which makes the S-wave arrival difficult to pick. In contrast, the error bar of the P-wave arrival time residuals is almost constant up to 120 km for the 250 Hz model and the 100 Hz model (Figs. 7(b) and (f)). This may be because P-wave first arrivals, unlike S-wave arrivals, are less affected by scattered waves. In the M01 model, the error increases at distances of 40 km or more. This may be because the data set used for the M01 model was obtained from aftershock observations and therefore does not include waveforms of earthquakes that occurred at long distances.

In terms of the relationship between the arrival time residuals and SNR, Figs. 7(c), (g), and (k) show that the higher the signal-to-noise ratio, the smaller the prediction error of the P-wave arrival time for all three models. In contrast, the prediction error of the S-wave arrival time is almost constant regardless of SNR. The S-wave arrival time residual is larger in the 3.5 to 4 interval, probably because of the small number of data, but σ is still within 0.1 s. As with the dependence on epicentral distance, this S-wave behavior is thought to reflect scattered waves in the P-wave coda. Finally, the picking error was found to be constant for both P- and S-waves regardless of magnitude. However, this may reflect the fact that the small-magnitude earthquakes in the test data have an SNR high enough for humans to pick, probably because they were recorded at short hypocentral distances.

The performance of the SegPhase and PhaseNet models was compared using the arrival time residual histograms. The results suggest that the SegPhase models are closer to human picks than the PhaseNet models, particularly in terms of the µ, σ, and MAE of the arrival time residuals. The σ of the P-wave arrival time residuals from the commonly used 100 Hz PhaseNet model is 1.5 times that of the SegPhase model. The high performance of the SegPhase models was also confirmed by the relationships with output probability value, epicentral distance, SNR, and magnitude.

To apply the SegPhase models developed in this study to continuous waveforms for earthquake detection, a threshold for the output probability value must be chosen. Previous studies that built earthquake catalogs with deep learning models used thresholds chosen to maximize model performance (Liu et al., 2020; Tan et al., 2021; Park et al., 2020). However, as mentioned above, the output of a deep learning model depends on its training data. When a model picks the arrival time of seismic waves with features that are not represented in the training data, the output probability may be low even if the arrival time is picked correctly. Therefore, in this study, we investigated how the number of earthquake detections and the hypocenter accuracy are affected by the threshold used in preparing an earthquake catalog.

For validation, hypocenters were determined from continuous waveform data recorded by the aftershock observation network for the Central Tottori Earthquake, which occurred on October 21, 2016. The determinations were made at each of five thresholds of the output probability value: 0.1, 0.3, 0.5, 0.7, and 0.9. The analysis period was from October 22 to October 31, 2016. During this period, the aftershock observation network consisted of eleven stations with 3-component 250 Hz sampling, ten stations with 3-component 100 Hz sampling, and 24 stations with 1-component 100 Hz sampling (Iio et al., 2020). In addition, two stations of the stationary observation network with 3-component 100 Hz sampling were used, for a total of 47 stations with continuous waveform data in WIN files.
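The thresholding step can be illustrated with a minimal sketch that keeps only the picks at or above each of the five thresholds before they are passed on to phase association. The pick records, station names, and the function `picks_above` are hypothetical, and whether the comparison is inclusive (`>=`) is an assumption.

```python
THRESHOLDS = (0.1, 0.3, 0.5, 0.7, 0.9)


def picks_above(picks, threshold):
    """Return the picks whose output probability is at or above `threshold`."""
    return [p for p in picks if p["prob"] >= threshold]


# Hypothetical picks: station code, phase type, arrival time (s), output probability.
picks = [
    {"station": "ST01", "phase": "P", "time": 12.34, "prob": 0.95},
    {"station": "ST01", "phase": "S", "time": 15.10, "prob": 0.42},
    {"station": "ST02", "phase": "P", "time": 12.80, "prob": 0.12},
    {"station": "ST02", "phase": "S", "time": 16.02, "prob": 0.71},
    {"station": "ST03", "phase": "P", "time": 13.05, "prob": 0.08},
]

counts = {th: len(picks_above(picks, th)) for th in THRESHOLDS}
print(counts)
```

Because the picks retained at a higher threshold are always a subset of those retained at a lower one, the number of picks available for association can only grow as the threshold is lowered.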

To create an earthquake catalog, we performed phase association and hypocenter determination after phase picking by the models. When picking arrival times from observational data stored in WIN files, we divided the data into 30-second segments. Because seismic waves may straddle the boundaries of these segments, each segment overlaps its neighbors by 15 seconds, so that no part of a seismic wave is missed because of the segmentation. Next, phase association was performed using Rapid Earthquake Association and Location (REAL; Zhang et al., 2019). An earthquake was considered to have occurred if the number of P-wave arrival times was larger than 4, the number of S-wave arrival times was larger than 4, the total number of P- and S-wave arrival times was larger than 8, and the number of stations at which both P- and S-waves were picked was larger than 4. We then performed hypocenter determination using Hypomh_PS (Kawanishi et al., 2009). Hypomh_PS is an improved version of Hypomh (Hirata and Matsu'ura, 1987), the hypocenter determination method used in the WIN system. Hypomh fixes the P-wave velocity at 1.7 times the S-wave velocity, whereas Hypomh_PS can set the P-wave and S-wave velocity structures independently. We used JMA2001 (Ueno et al., 2002) as the seismic velocity structure and did not apply station corrections. The same process was applied to the PhaseNet models to generate an earthquake catalog for comparison. In addition, the JMA earthquake catalog for the same period and an earthquake catalog created with the Horiuchi program (Horiuchi et al., 2013), which uses STA/LTA and AR-AIC, were used for comparison.
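The segmentation and the detection criteria described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code of this study and not the interface of REAL: the function names are hypothetical, a trailing partial window is simply dropped, and the criteria are written as strict inequalities following the wording in the text.

```python
def overlapping_windows(n_samples, fs, window_s=30.0, overlap_s=15.0):
    """Return (start, end) sample indices of fixed-length windows with overlap.

    Assumption: a trailing window shorter than `window_s` is dropped.
    """
    win = int(round(window_s * fs))
    step = int(round((window_s - overlap_s) * fs))
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    windows = []
    start = 0
    while start + win <= n_samples:
        windows.append((start, start + win))
        start += step
    return windows


def meets_detection_criteria(p_stations, s_stations):
    """Apply the pick-count criteria for declaring an earthquake.

    `p_stations` and `s_stations` are the station codes with an associated
    P-wave or S-wave pick, one pick per station and phase (assumption).
    """
    n_p = len(set(p_stations))
    n_s = len(set(s_stations))
    n_both = len(set(p_stations) & set(s_stations))
    return n_p > 4 and n_s > 4 and (n_p + n_s) > 8 and n_both > 4


# One minute of 100 Hz data yields three 30 s windows that overlap by 15 s.
print(overlapping_windows(6000, 100.0))

# Hypothetical event with P and S picks at the same five stations.
stations = ["ST01", "ST02", "ST03", "ST04", "ST05"]
print(meets_detection_criteria(stations, stations))
```

With a 15 s overlap, every sample away from the ends of the record falls inside two windows, so an arrival near the edge of one window lies near the middle of the neighboring one.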

Figure 8(a) shows the number of earthquake detections for output probability thresholds of 0.1, 0.3, 0.5, 0.7, and 0.9. The JMA catalog contains the fewest earthquakes, and the SegPhase catalog with a threshold of 0.1 contains the most. For both the SegPhase and PhaseNet models, the number of detections decreases as the threshold increases, and the number of detections by the PhaseNet models at a threshold of 0.9 is lower than that of the Horiuchi program. In addition, the SegPhase models detect more earthquakes than the PhaseNet models at every threshold. These results indicate that the SegPhase models have a higher detection capability. The frequency-magnitude distribution for a threshold of 0.1 is shown in Fig. 8(b); it indicates that the SegPhase and PhaseNet models detected more small-magnitude earthquakes than the Horiuchi program and JMA.

Figure 9 shows the hypocenter distribution of all earthquakes detected by the SegPhase models. Earthquakes within 1 km of the AA' and BB' survey lines are projected onto two vertical cross sections. Lowering the threshold for the output probability appears to increase the scatter of the hypocenter distribution. However, the seismic clusters remain visible on the map even when the threshold is lowered, and a distribution extending toward the mainshock hypocenter can also be seen in the projection onto the BB' cross section.

Figures 8 and 9 show that lowering the threshold improves the detection capability, as the number of small-magnitude earthquakes increases, although the scatter of the hypocenter distribution also increases. It is intuitively reasonable that lowering the threshold increases the number of earthquakes detected, and similar results were reported by Kim et al. (2023). Table 3 lists the O-C (observed minus theoretical arrival time) and the hypocenter determination errors for all earthquakes at each threshold of the output probability value, so that the effect of the threshold on these errors can be compared. When the threshold changes from 0.1 to 0.9, the mean (µ), standard deviation (σ), and mean absolute error (MAE) of the O-C of the P- and S-waves change only slightly. In particular, the MAE of the P-wave decreases from 0.06 s at a threshold of 0.1 to 0.03 s at a threshold of 0.9, and the MAE of the S-wave decreases from 0.09 s to 0.05 s. While the O-C tends to decrease slightly as the threshold is raised, the hypocenter location error tends to decrease slightly as the threshold is lowered, from 0.37 ± 0.17 km at a threshold of 0.9 to 0.34 ± 0.19 km at a threshold of 0.1. This is likely due to the increase in the number of arrival times used to determine the hypocenters, as can be seen in Figs. 8 and 9. These results suggest that changing the threshold of the output probability value does not significantly affect the accuracy of the arrival time picks or the hypocenter location error.

Figures 10 and 11 show, for each threshold, the ratio of the number of arrival times associated with earthquakes in the phase association to the number of arrival times detected by the models. We call this ratio the utilization rate. When the threshold is 0.1, the utilization rate in the output probability range from 0.9 to 1.0 is about 74.7% for P-wave arrival times and about 59.5% for S-wave arrival times. When the threshold is 0.9, by contrast, the utilization rate in the same range is about 45.6% for P-wave arrival times and about 33.9% for S-wave arrival times. The increase in utilization with a lower threshold can also be seen in ranges other than 0.9-1.0. Figures 10 and 11 show that only about 60% of the arrival times with output probability values of 0.9-1.0, which the models determine confidently, are used for earthquake detection; this may be due to the inclusion of arrival times of seismic waves from outside the network, for which no hypocenter determination was performed. Lowering the threshold thus increases the use of arrival times that have high probability values. The rise in the number of detected earthquakes is therefore not only a result of including arrival times with low probability values; it also involves arrival times that the models determined with a high level of confidence. We infer that events whose arrival times had high output probability values, but which did not meet the criteria for phase association at a higher threshold, were recognized as earthquakes once the threshold was lowered.
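The utilization rate can be written out as a short sketch: for each output probability bin, it is the percentage of detected arrival times that ended up associated with an earthquake. The function name `utilization_rate` and the sample values are hypothetical and do not reproduce the numbers in Figs. 10 and 11.

```python
def utilization_rate(probabilities, associated, width=0.1):
    """Percentage of picks in each probability bin that were associated with an event.

    Returns a list with one entry per bin (None for bins without picks).
    """
    n_bins = int(round(1.0 / width))
    detected = [0] * n_bins
    used = [0] * n_bins
    for prob, is_used in zip(probabilities, associated):
        # A probability of exactly 1.0 is counted in the last bin.
        index = min(int(prob / width + 1e-9), n_bins - 1)
        detected[index] += 1
        used[index] += bool(is_used)
    return [100.0 * u / d if d else None for u, d in zip(used, detected)]


# Hypothetical picks: output probability and whether phase association used the pick.
probs = [0.95, 0.97, 0.91, 0.99, 0.35, 0.32]
assoc = [True, True, True, False, True, False]
rates = utilization_rate(probs, assoc)
print(rates[9], rates[3])
```

Comparing the value for the 0.9-1.0 bin between runs with different thresholds shows whether lowering the threshold also changes how many high-confidence picks are used.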

From the above, lowering the threshold can increase the number of earthquake detections without greatly degrading the performance of arrival time picking. Although the threshold of the output probability value could be set below 0.1, the analysis time would increase with the number of phase picks because REAL performs a grid search for phase association. Therefore, in this study, we set the lowest threshold at 0.1. The hypocenter uncertainty caused by lowering the threshold can be reduced by applying station corrections, optimizing the seismic velocity structure, or using high-precision hypocenter determination methods such as hypoDD (Waldhauser and Ellsworth, 2000). A larger number of detected earthquakes is also useful in itself: for example, it increases the number of ray paths available for arrival time tomography, which allows more detailed seismological structures to be obtained. For these reasons, lowering the threshold is helpful.

ViT exhibits scaling behavior analogous to that observed in language models, where performance increases with both model size and the amount of training data (Kaplan et al., 2020). Dosovitskiy et al. (2020) indicate that as the number of parameters in ViT increases, its ability to process and interpret complex data improves correspondingly. The scalability of ViT is evidenced by its performance on large-scale image datasets, where larger models trained with more data outperform smaller models (Steiner et al., 2021). The SegPhase models were trained with a limited amount of data, so increasing the amount of training data is expected to yield models with better performance.

In this study, we developed new models, SegPhase, for seismic arrival time picking for Manten observations, aftershock observations, and nearby stationary stations. The models employ a structure based on ViT, which has not previously been used in seismic arrival time picking models, and show superior performance compared to conventional seismic arrival time picking models. The performance of the SegPhase models was verified in terms of the relationship between arrival time residuals and output probability values, epicentral distance, SNR, and magnitude, and was compared with that of the PhaseNet models. The SegPhase models showed better picking performance and a larger number of earthquake detections. Moreover, when the SegPhase models were applied to continuous waveforms, the analysis of the number of detections, O-C values, and hypocenter determination errors showed that lowering the threshold caused more arrival times to be used for earthquake detection, not only those with lower output probability values but also those with higher output probability values. Lowering the threshold therefore allows the phase association to make better use of the arrival times that the models picked with high confidence. Lowering the threshold slightly increased the error but significantly increased the number of detected earthquakes.

The application of SegPhase is expected to lead to new advances in seismology by automating arrival time picking for large amounts of high-quality data. We hope this research opens new perspectives on methods of earthquake detection and catalog creation. However, the models are limited to arrival time picking, and further automation of seismic wave analysis requires integrating other elements, such as P-wave first-motion polarity determination. Future research will integrate these elements to develop a more comprehensive seismic wave analysis model.

GPD : Generalized Phase Detection

ViT : Vision Transformer

JMA : Japan Meteorological Agency

NIED : National Research Institute for Earth Science and Disaster Resilience

AIST : National Institute of Advanced Industrial Science and Technology

M_J : JMA magnitude

M : magnitude

SNR : signal-to-noise ratio

OPE : Overlap Patch Embedding

MHSA : Multi Head Self-Attention

MixFFN : Mix Feed Forward Network

PE : Position Embedding

SA : Self-attention

μ : mean

σ : standard deviation

MAE : mean absolute error

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Availability of data and materials

Data from the stationary observation network are available from the Data Management Centre of the National Research Institute for Earth Science and Disaster Resilience (NIED) (https://hinetwww11.bosai.go.jp/auth/?LANG=en), the Japan Meteorological Agency (JMA) (http://www.data.jma.go.jp/svd/eqev/data/bulletin/index_e.html, http://www.data.jma.go.jp/svd/eqev/data/daily_map/index.html), and the Earthquake Research Institute, the University of Tokyo (http://tkypub.eri.u-tokyo.ac.jp/harvest).

Please contact the authors regarding the use of other data. The deep learning models will be made available on the author’s GitHub (currently under preparation).

Competing interests

The authors declare that they have no competing interests.

Funding

MEXT Project for Seismology Toward Research Innovation with Data of Earthquake (STAR-E) grant JPJ010217

JSPS KAKENHI Grant-in-Aid for Research Activity Start-up No. 23K19061

Grant-in-Aid for Scientific Research (A) No. 23H00466

Grant-in-Aid for Fund for the Promotion of Joint International Research (International Collaborative Research) No. 23KK0181

Earthquake Research Institute, University of Tokyo Joint Research ERI JURP 2024-A-04, 2022-B-06 and 2024-B-01

Authors' contributions

Shinya Katoh carried out the analysis and drafted the manuscript. Yoshihisa Iio and Hiromichi Nagao helped prepare the manuscript. Yoshihisa Iio, Hiroshi Katao, Masayo Sawada, and Kazuhide Tomisaka maintained the seismic stations and supervised the data. All authors read and approved the final manuscript.

Acknowledgements

We used seismic data from the National Research Institute for Earth Science and Disaster Resilience (NIED), the National Institute of Advanced Industrial Science and Technology (AIST), the Japan Meteorological Agency (JMA), Kyoto University, Kyushu University, and the Earthquake Research Institute of the University of Tokyo. This study was supported by the MEXT Project for Seismology Toward Research Innovation with Data of Earthquake (STAR-E) grant JPJ010217. The key ideas in this study were derived from the activities of JSPS KAKENHI Grant-in-Aid for Research Activity Start-up No. 23K19061, Grant-in-Aid for Scientific Research (A) No. 23H00466, Grant-in-Aid for Fund for the Promotion of Joint International Research (International Collaborative Research) No. 23KK0181, Grant-in-Aid for Challenging Research (Exploratory) No. 20K21785, and Earthquake Research Institute, University of Tokyo Joint Research ERI JURP 2024-A-04, 2022-B-06 and 2024-B-01. Generic Mapping Tools (Wessel et al., 2019) and Matplotlib (Hunter, 2007) were used to create the figures. ObsPy (Beyreuther et al., 2010) was used for waveform processing. PyTorch (Paszke et al., 2019) was used to create the deep learning models. We would like to express our gratitude.

Aoi S, Kimura T, Ueno T, Senna S, Azuma H (2021) Multi-Data Integration System to Capture Detailed Strong Ground Motion in the Tokyo Metropolitan Area. Journal of Disaster Research 16 (4):684-699. doi:10.20965/jdr.2021.p0684
Aoki S, Iio Y, Katao H, Miura T, Yoneda I, Sawada M (2016) Three-dimensional distribution of S-wave reflectors in the northern Kinki district, southwestern Japan. Earth, Planets and Space 68 (1). doi:10.1186/s40623-016-0468-3
Beyreuther M, Barsch R, Krischer L, Megies T, Behr Y, Wassermann J (2010) ObsPy: A Python Toolbox for Seismology. Seismological Research Letters 81 (3):530-533. doi:10.1785/gssrl.81.3.530
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language Models are Few-Shot Learners. arXiv:2005.14165. doi:10.48550/arXiv.2005.14165
Chai C, Maceira M, Santos‐Villalobos HJ, Venkatakrishnan SV, Schoenball M, Zhu W, Beroza GC, Thurber C (2020) Using a Deep Neural Network and Transfer Learning to Bridge Scales for Seismic Phase Picking. Geophysical Research Letters 47 (16). doi:10.1029/2020gl088651
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2020) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929. doi:10.48550/arXiv.2010.11929
Feng T, Mohanna S, Meng L (2022) EdgePhase: A Deep Learning Model for Multi‐Station Seismic Phase Picking. Geochemistry, Geophysics, Geosystems 23 (11). doi:10.1029/2022gc010453
García JE, Fernández-Prieto LM, Villaseñor A, Sanz V, Ammirati J-B, Díaz Suárez EA, García C (2022) Performance of Deep Learning Pickers in Routine Network Processing Applications. Seismological Research Letters 93 (5):2529-2542. doi:10.1785/0220210323
Hara S, Fukahata Y, Iio Y (2019) P-wave first-motion polarity determination of waveform data in western Japan using deep learning. Earth, Planets and Space 71 (1). doi:10.1186/s40623-019-1111-x
Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith NJ, Kern R, Picus M, Hoyer S, van Kerkwijk MH, Brett M, Haldane A, Del Rio JF, Wiebe M, Peterson P, Gerard-Marchant P, Sheppard K, Reddy T, Weckesser W, Abbasi H, Gohlke C, Oliphant TE (2020) Array programming with NumPy. Nature 585 (7825):357-362. doi:10.1038/s41586-020-2649-2
Hayashida Y, Matsumoto S, Iio Y, Sakai Si, Kato A (2020) Non‐Double‐Couple Microearthquakes in the Focal Area of the 2000 Western Tottori Earthquake (M 7.3) via Hyperdense Seismic Observations. Geophysical Research Letters 47 (4). doi:10.1029/2019gl084841
Hendrycks D, Gimpel K (2016) Gaussian Error Linear Units (GELUs). arXiv:1606.08415. doi:10.48550/arXiv.1606.08415
Hirata N, Matsu'ura M (1987) Maximum-likelihood estimation of hypocenter with origin time eliminated using nonlinear inversion technique. Physics of the Earth and Planetary Interiors 47:50-61. doi:10.1016/0031-9201(87)90066-5
Horiuchi S, Horiuchi Y, Iio Y, Takada Y, Sawada Y, Sekine S, Nakayama T, Hirahara S, Kono T, Nakajima J, Okada T, Umino N, Hasegawa A, Obara K, Kato A, Nakano M, Nakamura T, Takahashi N (2013) Automatic arrival time picking compared to manual picking (5). Abst Fall Meet Seismol Soc Jpn, 2013
Hunter JD (2007) Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering 9 (3):90-95. doi:10.1109/mcse.2007.55
Iio Y (2011) Development of a Seismic Observation System in the Next Generation - to Install Ten Thousands Stations -. DPRI Annuals 54 (A):17-24
Iio Y, Matsumoto S, earthquake JaogotNOP (2021a) Relationship between the Northern Osaka Prefecture earthquake and the Arima-Takatsuki Tectonic Line estimated from dense aftershock observations. Abst Japan Geoscience Union Meeting
Iio Y, Matsumoto S, Yamashita Y, Sakai Si, Tomisaka K, Sawada M, Iidaka T, Iwasaki T, Kamizono M, Katao H, Kato A, Kurashimo E, Teguri Y, Tsuda H, Ueno T (2021b) Stress relaxation arrested the mainshock rupture of the 2016 Central Tottori earthquake. Communications Earth & Environment 2 (1). doi:10.1038/s43247-021-00231-6
Iio Y, Yoneda I, Sawada M, Ito Y, Katao H, Tomisaka K, Nagaoka A, Matsumoto S, Miyazaki M, Sakai Si, Kato A, Hayashi Y, Yamashina T, Okubo M, Noguchi T, Kagawa T (2016) Manten Seismic Observation in the Western Tottori Prefecture Region. DPRI annuals (60):382-388
Kaplan J, McCandlish S, Henighan T, Brown TB, Chess B, Child R, Gray S, Radford A, Wu J, Amodei D (2020) Scaling Laws for Neural Language Models. arXiv:2001.08361. doi:10.48550/arXiv.2001.08361
Katoh S, Iio Y, Katao H, Sawada M, Tomisaka K, Miura T, Yoneda I (2018) The relationship between S-wave reflectors and deep low-frequency earthquakes in the northern Kinki district, southwestern Japan. Earth, Planets and Space 70 (1). doi:10.1186/s40623-018-0921-6
Kawanishi R, Iio Y, Yukutake Y, Shibutani T, Katao H (2009) Local stress concentration in the seismic belt along the Japan Sea coast inferred from precise focal mechanisms: Implications for the stress accumulation process on intraplate earthquake faults. Journal of Geophysical Research: Solid Earth 114 (B1). doi:10.1029/2008jb005765
Kim A, Nakamura Y, Yukutake Y, Uematsu H, Abe Y (2023) Development of a high-performance seismic phase picker using deep learning in the Hakone volcanic area. Earth, Planets and Space 75 (1). doi:10.1186/s40623-023-01840-5
Kingma DP, Ba J (2014) Adam: A Method for Stochastic Optimization. arXiv:1412.6980. doi:10.48550/arXiv.1412.6980
Lei Ba J, Kiros JR, Hinton GE (2016) Layer Normalization. arXiv:1607.06450. doi:10.48550/arXiv.1607.06450
Liu L, Jiang H, He P, Chen W, Liu X, Gao J, Han J (2019) On the Variance of the Adaptive Learning Rate and Beyond. arXiv:1908.03265. doi:10.48550/arXiv.1908.03265
Liu M, Zhang M, Zhu W, Ellsworth WL, Li H (2020) Rapid Characterization of the July 2019 Ridgecrest, California, Earthquake Sequence From Raw Seismic Data Using Machine‐Learning Phase Picker. Geophysical Research Letters 47 (4). doi:10.1029/2019gl086189
Matsumoto S, Iio Y, Sakai Si, Kato A, Observation GfMHDS (2020) Hyper Dense Seismic Observation for Investigation on Fault Zone Development: Application to Hypocentral Area of 2000 Western Tottori Earthquake. Journal of Geography(Chigaku Zasshi) 129 (4):511-527. doi:10.5026/jgeography.129.511
Miura T, Iio Y, Katao H, Nakano S, Yoneda I, Fujita Y, Kondo K, Nishimura K, Sawada M, Tada M, Hirano N, Yamazaki N, Tomisaka K, Tatsumi K-i, Kamo M, Shibutani T, Ohmi S, Kano Y (2010) Temporary Seismic Observation in the Northern Kinki District. DPRI Annuals 53 (B):203-212
Mousavi SM, Ellsworth WL, Zhu W, Chuang LY, Beroza GC (2020) Earthquake transformer-an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat Commun 11 (1):3952. doi:10.1038/s41467-020-17591-w
Münchmeyer J, Woollam J, Rietbrock A, Tilmann F, Lange D, Bornstein T, Diehl T, Giunchi C, Haslinger F, Jozinović D, Michelini A, Saul J, Soto H (2022) Which Picker Fits My Data? A Quantitative Evaluation of Deep Learning Based Seismic Pickers. Journal of Geophysical Research: Solid Earth 127 (1). doi:10.1029/2021jb023499
Obara K, Kasahara K, Hori S, Okada Y (2005) A densely distributed high-sensitivity seismograph network in Japan: Hi-net by National Research Institute for Earth Science and Disaster Prevention. Review of Scientific Instruments 76 (2). doi:10.1063/1.1854197
Okada Y, Kasahara K, Hori S, Obara K, Sekiguchi S, Fujiwara H, Yamamoto A (2004) Recent progress of seismic observation networks in Japan - Hi-net, F-net, K-NET and KiK-net. Earth Planets and Space 56 (8):XV-XXVIII. doi:10.1186/bf03353076
Park Y, Mousavi SM, Zhu W, Ellsworth WL, Beroza GC (2020) Machine‐Learning‐Based Analysis of the Guy‐Greenbrier, Arkansas Earthquakes: A Tale of Two Sequences. Geophysical Research Letters 47 (6). doi:10.1029/2020gl087032
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv:1912.01703. doi:10.48550/arXiv.1912.01703
Ross ZE, Meier MA, Hauksson E, Heaton TH (2018) Generalized Seismic Phase Detection with Deep Learning. Bulletin of the Seismological Society of America 108 (5A):2894-2901. doi:10.1785/0120180080
Sakai Si, Hirata N (2009) Distribution of the Metropolitan Seismic Observation network. Bull Earthq Res Inst Univ Tokyo 84 (2):57-69
Shiomi K, Obara K, Haryu Y, Matsumura M (2009) Construction of NIED High Sensitivity Seismograph Network (Hi-net) and its Contribution. Zisin 61 (Supplement):1-7. doi:10.4294/zisin.61.1
Sleeman R, van Eck T (1999) Robust automatic P-phase picking: an on-line implementation in the analysis of broadband seismogram recordings. Physics of the Earth and Planetary Interiors 113 (1-4):265-275. doi:10.1016/s0031-9201(99)00007-2
Steiner A, Kolesnikov A, Zhai X, Wightman R, Uszkoreit J, Beyer L (2021) How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers. arXiv:2106.10270. doi:10.48550/arXiv.2106.10270
Sun H, Ross ZE, Zhu W, Azizzadenesheli K (2023) Phase Neural Operator for Multi‐Station Picking of Seismic Arrivals. Geophysical Research Letters 50 (24). doi:10.1029/2023gl106434
Tan YJ, Waldhauser F, Ellsworth WL, Zhang M, Zhu W, Michele M, Chiaraluce L, Beroza GC, Segou M (2021) Machine-Learning-Based High-Resolution Earthquake Catalog Reveals How Complex Fault Structures Were Activated during the 2016–2017 Central Italy Sequence. The Seismic Record 1 (1):11-19. doi:10.1785/0320210001
Tokuda T, Nagao H (2023) Seismic-phase detection using multiple deep learning models for global and local representations of waveforms. Geophysical Journal International 235 (2):1163-1182. doi:10.1093/gji/ggad270
Tuli S, Dasgupta I, Grant E, Griffiths TL (2021) Are Convolutional Neural Networks or Transformers more like human vision? arXiv:2105.07197. doi:10.48550/arXiv.2105.07197
Ueno H, Hatakeyama S, Aketagawa T, Funasaki J, Hamada N (2002) Improvement of hypocenter determination procedures in the Japan Meteorological Agency. Quarterly Journal of Seismology (65):123-134
Urabe T (1994) A common Format for Multi-Channel Earthquake Waveform Data. Abst Fall Meet Seismol Soc Jpn, 1994
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention Is All You Need. arXiv:1706.03762. doi:10.48550/arXiv.1706.03762
Waldhauser F, Ellsworth WL (2000) A double-difference earthquake location algorithm: Method and application to the northern Hayward fault, California. Bulletin of the Seismological Society of America 90 (6):1353-1368. doi:10.1785/0120000006
Wang W, Xie E, Li X, Fan D-P, Song K, Liang D, Lu T, Luo P, Shao L (2021) Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. arXiv:2102.12122. doi:10.48550/arXiv.2102.12122
Wessel P, Luis JF, Uieda L, Scharroo R, Wobbe F, Smith WHF, Tian D (2019) The Generic Mapping Tools Version 6. Geochemistry, Geophysics, Geosystems 20 (11):5556-5564. doi:10.1029/2019gc008515
Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. arXiv:2105.15203. doi:10.48550/arXiv.2105.15203
Yano K, Shiina T, Kurata S, Kato A, Komaki F, Sakai Si, Hirata N (2021) Graph‐Partitioning Based Convolutional Neural Network for Earthquake Detection Using a Seismic Array. Journal of Geophysical Research: Solid Earth 126 (5). doi:10.1029/2020jb020269
Yukutake Y, Iwata T, Iio Y (2020) Estimation of the heterogeneity of stress fields using misfit angles in focal mechanisms. Tectonophysics 790. doi:10.1016/j.tecto.2020.228553
Zhang M, Ellsworth WL, Beroza GC (2019) Rapid Earthquake Association and Location. Seismological Research Letters 90 (6):2276-2284. doi:10.1785/0220190052
Zhu W, Beroza GC (2018) PhaseNet: A Deep-Neural-Network-Based Seismic Arrival Time Picking Method. Geophysical Journal International. doi:10.1093/gji/ggy423
Zhu W, Tai KS, Mousavi SM, Bailis P, Beroza GC (2022) An End‐To‐End Earthquake Detection Method for Joint Phase Picking and Association Using Deep Learning. Journal of Geophysical Research: Solid Earth 127 (3). doi:10.1029/2021jb023283

Tables 1-3 are available in the Supplementary Files section.

Reviewers agreed at journal
05 May, 2024
Reviewers invited by journal
02 May, 2024
Editor assigned by journal
27 Apr, 2024
First submitted to journal
19 Apr, 2024
