Predicting ionospheric precursors before large earthquakes using neural network computing and the potential development of an earthquake early warning system

Total electron content (TEC) precursors of the Chi-Chi Earthquake, which occurred in Taiwan at 01:47:15 on September 21, 1999 (Taiwan Standard Time, TST), with its epicenter at 23.85° N and 120.82° E, a Richter magnitude (ML) of 7.3 (a moment magnitude (Mw) of 7.6), and a focal depth of 8.00 km, were detected 1, 3, and 4 days before the earthquake using two back-propagation neural network (BPNN) models. These results are consistent with the analyses of Liu et al. (2001) and Lin (2010). Another TEC precursor was detected on May 13, 2003 (TST), 2 days before an earthquake on May 15, 2003 (TST), with an ML of 5.21. These precursors might be induced by variations of the geomagnetic and electric fields near the hypocenters. The stability and reliability of the two BPNN models were verified by performing cross-validation and evaluating the variance in the learning process. Consequently, the analyzed method can serve as a real-time TEC prediction system that accepts any future time as input to the two BPNN models. Hence, an earthquake early warning (EEW) system is feasible.


Introduction
Many recent studies have focused on the detection of ionospheric anomalies as precursors to large earthquakes according to the Lithosphere-Atmosphere-Ionosphere Coupling (LAIC) model (Pulinets and Ouzounov 2011; Marchetti et al. 2020). In this study, Total Electron Content (TEC) is the value obtained from the vertical integration of the electron density, and it is a critical indicator of the status of the ionosphere; it is also called vertical TEC (VTEC). The unit of TEC is TECu (1 TECu = 10^16 e/m^2), where 'e' and 'm' denote electrons and meters, respectively (Mannucci et al. 1998; Cooper et al. 2019). The calculation of the slant TEC from the vertical TEC is an essential research topic (Gautam et al. 2018; Cooper et al. 2019); however, it is not within the scope of this study.
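As a concrete illustration of this definition, the vertical TEC can be obtained by numerically integrating an electron density profile over altitude. The sketch below uses a Chapman-layer profile with invented parameters (peak density, peak height, scale height), not measured values, and integrates over roughly the F-region heights mentioned later in the paper:

```python
import numpy as np

# Vertical TEC is the integral of the electron density N_e(h) along altitude.
# Toy Chapman-layer profile (illustrative parameters, not measurements).
def chapman_density(h_km, n_max=1e12, h_peak=300.0, H=60.0):
    """Electron density in e/m^3 at altitude h_km (km)."""
    z = (h_km - h_peak) / H
    return n_max * np.exp(1.0 - z - np.exp(-z))

# Integrate over ~150-800 km, matching the F-region heights of the records.
h = np.linspace(150.0, 800.0, 1000)      # altitude grid in km
ne = chapman_density(h)                  # electron density, e/m^3
tec_e_per_m2 = np.trapz(ne, h * 1000.0)  # convert km -> m for the integral
tec_u = tec_e_per_m2 / 1e16              # 1 TECu = 10^16 e/m^2

print(round(tec_u, 2))
```

For these toy parameters the integral is close to e·H·n_max, i.e., on the order of 16 TECu, which is a physically plausible mid-latitude value.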
In recent years, one concept has become particularly attractive: artificial intelligence (AI). Several researchers have already applied AI to detect TEC anomalies associated with earthquakes. Kalita et al. (2012) used a peak detection algorithm, an automatic pattern matching approach, and an artificial neural network (ANN) to process GPS data, generate the TEC record, and detect the TEC precursor of an impending earthquake in Guwahati, India, with an epicenter at 26.10° N and 91.45° E. Homam (2016) presented TEC anomalies related to earthquakes using an ANN and a GPS ionospheric scintillation and TEC monitor (GISTM) receiver from 2005 to 2010 (Coordinated Universal Time, UTC), during low-to-medium solar activity with sunspot numbers (SSNs) of 0.0-42.6. Okoh et al. (2016) also sought to reduce the prediction errors. Based on the results of previous studies, researchers have pointed to complicated computational processing involving a large number of ionospheric resources. Two factors that are not associated with earthquakes have been considered simultaneously: ionospheric anomalies, such as the equatorial ionization anomaly (EIA), and space weather information, such as seasonal variations, solar wind, diurnal variations, SSNs, and geomagnetic storms (Homam 2016; Kumar et al. 2016; Tsagouri et al. 2018; Hudson et al. 2021). Separating earthquake-associated TEC anomalies from these two factors has been an important procedure; however, it complicates the data processing and requires a long computation time. Moreover, some previous studies have simultaneously combined two different methods with a large amount of data to predict the TEC precursors, making the analysis procedure complex and the economic cost high. Liu et al. (2001) examined TEC data to detect the TEC precursors 1, 3, and 4 days before the Chi-Chi Earthquake. Liu et al. (2006) found, by statistically examining the earthquakes in Taiwan from 1988 to 2001 (UTC), that the TEC precursors of earthquakes with a Richter magnitude (ML) ≥ 5.0 were detectable, whereas those of earthquakes with an ML < 5.0 were not. Lin (2010) examined the TEC data using principal component analysis (PCA) and detected TEC precursors on September 17, 18, and 20, 1999, before the Chi-Chi Earthquake, which occurred at 01:47:15 on September 21, 1999 (Taiwan Standard Time, TST), with its epicenter at 23.85° N and 120.82° E, an ML of 7.3 (a moment magnitude (Mw) of 7.6), and a focal depth of 8.00 km. The results of these previous studies were used to verify the reliability and correctness of the method analyzed in this study.

Objectives of this study
This study mainly focused on detecting the TEC precursors of large earthquakes (ML ≥ 5.0) using two back-propagation neural network (BPNN) models trained with only a few TEC datasets. The source of the TEC datasets is introduced in the next section. The training datasets were 24-day TEC records from different seasons and months, chosen because of the seasonal and monthly variations of TEC (Ogwala et al. 2019). A 24-day TEC record is hereafter called a short-period TEC record (SPTR); the records are VTEC. Only two earthquakes with an ML ≥ 5.0 occurred in the examined periods: the Chi-Chi Earthquake on September 21, 1999 (TST), and an earthquake on May 15, 2003 (TST).
Therefore, the first objective is to detect the TEC precursors of these two earthquakes by predicting the TEC behavior with the two BPNN models; a large prediction error can serve as an indicator of a TEC precursor, following Lin et al. (2018). The second objective is to detect any TEC precursor related to earthquakes with an ML ≥ 5.0. The third objective is for the analyzed method to serve as a real-time TEC prediction system that accepts any future time as input to the two BPNN models, with the computation time controlled by appropriate computing hardware; an earthquake early warning (EEW) system would thus be feasible. Because only a few SPTRs are used as training datasets, the data processing is not complicated, which reduces the economic cost. These distinguishing advantages of the analyzed method simplify the data processing and thereby overcome the shortcomings of the methods used in previous studies.
The SPTRs are small datasets. Zakeri et al. (2020) emphasized that cross-validation is indispensable for ensuring the reliability and robustness of ANN models with small training datasets and for avoiding overfitting; many studies have focused on this issue (Pasini 2015; Bhadra et al. 2017; Baquirin and Fernandez 2018; Hemmerich et al. 2020; Seddiki et al. 2020; Zakeri et al. 2020; Jiang et al. 2021). Therefore, cross-validation is an important step for meeting the objectives of this study. The SPTRs used as training datasets were predicted to verify and validate the reliability and robustness of the two BPNN models (the inside test), and the SPTRs not used for training (the test datasets) were predicted to test the predictive ability of the two BPNN models (the outside test). Following the cross-validation, the standard deviation (STD) and mean square error (MSE) were used as statistical measures of the reliability of the two BPNN models. The predicted results were compared with those of Liu et al. (2001), Liu et al. (2006), and Lin (2010) to verify the consistency of the two BPNN models.
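The inside/outside evaluation described above can be sketched as follows. The stand-in "model" and the uniform SPTR values are hypothetical placeholders (the actual BPNN models and TEC data are introduced later); the sketch only shows how STD and MSE of the prediction errors would be computed separately for the training and test SPTRs:

```python
import numpy as np

def evaluate(model, records):
    """STD and MSE of prediction errors (TECu) over a set of SPTRs."""
    errors = np.concatenate([model(r) - r for r in records])
    return float(np.std(errors)), float(np.mean(errors ** 2))

# Toy stand-in: a "model" that reproduces each SPTR with small noise.
rng = np.random.default_rng(1)
model = lambda r: r + rng.normal(0.0, 0.3, r.shape)

# 24 days x 96 samples/day (15-min sampling), values in a plausible TECu range.
train_sptrs = [rng.uniform(5, 30, 24 * 96) for _ in range(3)]
test_sptrs = [rng.uniform(5, 30, 24 * 96) for _ in range(2)]

std_in, mse_in = evaluate(model, train_sptrs)    # inside test
std_out, mse_out = evaluate(model, test_sptrs)   # outside test
print(round(std_in, 2), round(mse_in, 2))
```

For a reasonable model, the inside-test STD/MSE should be comparable to or better than the outside-test values, which is the criterion applied in the Discussion.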

Data sources
In this study, the TEC data were organized into five SPTRs with their corresponding earthquake catalogs. The five SPTRs, covering different time periods, were selected because of the long-term seasonal variations of TEC; they could therefore serve as the training datasets for building the two suitable BPNN models. The corresponding earthquake catalogs consisted of earthquakes with an ML ≥ 4.0 (Central Weather Bureau, CWB). Three SPTRs were the training datasets, and two SPTRs were the test datasets.

The SPTRs (sampled every 15 min) were acquired from the Chung-Li GPS station (25.00° N and 121.20° E). These records correspond to the F region of the ionosphere, at heights of approximately 150-800 km. The features and scale of the examined ionospheric TEC records for Taiwan were measured at this station, called the Chung-Li Ionosonde. The Chung-Li Ionosonde monitors the TEC variations within a radius of 500 km of the Chung-Li GPS station, covering the GPS network receiver stations in the region of 15-30° N and 110-130° E, using the very-high-frequency (VHF) radar echoes emitted by the Chung-Li VHF radar station (24.91° N, 121.24° E) (refer to Fig. 1 of Liu et al. 2004). The TEC records of the Chung-Li Ionosonde can therefore reflect the TEC variations of Taiwan (Lin et al. 2016). The Chung-Li Ionosonde was built by the Center for Space and Remote Sensing Research (CSRSR), National Central University (NCU), Taiwan, and monitors the ionospheric situation in Taiwan. The five SPTRs are listed below.

Algorithm of the back-propagation neural network
Several studies and engineering applications have shown that a neural network consisting of two hidden layers with few neurons can replace a network with numerous neurons in a single hidden layer (Thomas et al. 2016). Therefore, as shown in Fig. 5, a four-layered BPNN model with two hidden layers was employed, similar to that in the study of Lin et al. (2018). This BPNN model is based on the multilayer perceptron (MLP) neural network with an error back-propagation (EBP) algorithm. The EBP algorithm is a type of supervised learning algorithm that solves several nonlinear and complex problems (Rumelhart et al. 1986; Khoshgoftaar et al. 2010). Its training datasets are called label signals and include the training input and the corresponding target output (Belo et al. 2017). A BPNN model includes an activation function and two parameters: the weight and the bias.
A modified elementary Levenberg-Marquardt algorithm (M-LMA) served as the EBP algorithm in this study (Lin et al. 2018). The M-LMA is well suited to general nonlinear problems in which rank-deficient nonlinear least squares are encountered; that is, when ill-conditioned data are encountered, such as the nonlinear TEC and seismic data, the global minimum can be reached easily after a single completed iteration. The BPNN model was constructed as in the following equation:

TEC^4_Output = f^3(W^3 · f^2(W^2 · f^1(W^1 · R_I + b^1) + b^2) + b^3),   (1)

where R_I is the training input, and W, b, and f are the weight, bias, and activation function, respectively; TEC^4_Output is the target output. According to the framework of the BPNN model in Fig. 5 and the superscripted variables in Eq. (1), layer 1, following the input layer, is the first hidden layer (Hidden Layer 1 in Fig. 5), and layer 2, following the first hidden layer, is the second hidden layer (Hidden Layer 2 in Fig. 5). At the output layer, the output for each epoch of the EBP algorithm is described as the temporary output. The subscripts J and K indicate the numbers of neurons in the two hidden layers, and the superscripts indicate the layer numbers; the target outputs were the true TEC values in this study.

Fig. 5 Framework of the four-layered BPNN model. Following the output layer, the output on the right side is described as the temporary output for each epoch of the EBP algorithm (Rumelhart et al. 1986)

The sigmoid function was used as the activation function, as shown in the following equation (Schmidhuber 2015):

f(x) = 1 / (1 + e^(-x)).   (2)

Training procedure of the two back-propagation neural network models

The training procedure of the two BPNN models was as follows: (1) Each hidden layer was given 10 neurons (J = K = 10) in Fig. 5.
(2) Three SPTRs of the same dimension (i.e., 24 days), namely the first, second, and third SPTRs, were selected as the training inputs, which form a matrix, following the concept of the algorithm for training a BPNN model (Rumelhart et al. 1986); these constitute the training input R_I of Eq. (1), given in Eq. (3). (3) The M-LMA and the sigmoid function in Eq. (2) were used as the EBP algorithm and the activation function in training a BPNN model. The training epoch was set as adaptive (Mocanu et al. 2018); after testing, performing more than 1000 training epochs yielded a similar training error, so more than 1000 epochs was meaningless and only required more computing time. (4) The learning rate was adaptive, based on the concept used by Ede and Beanland (2020). (5) The initial random weights and biases (Schmidhuber 2015; Nadia et al. 2017) were within the range of 0-1. (6) The magnitudes of all components of R_I were normalized to the range of 0-1 via feature scaling (Bo et al. 2006). (7) The training error was defined as the target output minus the temporary output (described as output in Fig. 5) for each epoch of the EBP algorithm; the unit of this error is TECu. (8) A BPNN model was built at the best learning rate of 0.11, with the minimal training error occurring after 1000 training epochs; this is called the first BPNN model. (9) The predicted error was defined as the predicted TEC value minus the target output; the unit of this error is also TECu. (10) Following the same procedure used for the first BPNN model, a second BPNN model was built using three SPTRs of the same dimension (i.e., 24 days), namely the second, third, and fifth SPTRs, as the training inputs R_I in Eq. (3). The second BPNN model was built at the best learning rate of 0.23, with the minimal training error occurring after 1000 training epochs; again, after testing, more than 1000 epochs yielded a similar training error and only required more computing time. For both BPNN models the epoch count was 1000; the similar character of the training data might be the reason. (11) The training dataset SPTRs were predicted to verify and validate the reliability and robustness of the two BPNN models (inside test), and the test dataset SPTRs were predicted to test their predictive ability (outside test); the cross-validation of the two BPNN models was thereby performed.
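The steps above can be sketched together as a single forward pass through the four-layered network. This is a minimal illustration, not the paper's implementation: the 24-value input vector (one value per day of an SPTR), the single output neuron, and the toy TEC numbers are all assumptions for the sketch, while J = K = 10, the [0, 1] initialization, the feature scaling, and the error definitions follow steps (1), (5), (6), (7), and (9):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Eq. (2): f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def feature_scale(x):
    # Step (6): min-max scaling of the input components to [0, 1].
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Step (1): J = K = 10 neurons per hidden layer. The 24-value input layout
# (one value per SPTR day) is an assumed simplification.
n_in, J, K, n_out = 24, 10, 10, 1

# Step (5): initial weights and biases drawn uniformly from [0, 1].
W1, b1 = rng.uniform(0, 1, (J, n_in)), rng.uniform(0, 1, J)
W2, b2 = rng.uniform(0, 1, (K, J)), rng.uniform(0, 1, K)
W3, b3 = rng.uniform(0, 1, (n_out, K)), rng.uniform(0, 1, n_out)

def forward(r_i):
    # Eq. (1): temporary output = f(W3 f(W2 f(W1 R_I + b1) + b2) + b3).
    a1 = sigmoid(W1 @ r_i + b1)   # Hidden Layer 1
    a2 = sigmoid(W2 @ a1 + b2)    # Hidden Layer 2
    return sigmoid(W3 @ a2 + b3)  # output layer

r_i = feature_scale(rng.uniform(5.0, 30.0, n_in))  # toy TEC values (TECu)
temporary_output = forward(r_i)

# Step (7): training error = target output - temporary output.
target = np.array([0.4])  # invented scaled "true TEC" value
training_error = target - temporary_output
print(temporary_output.shape, float(training_error[0]))
```

In training, the M-LMA would iteratively adjust W and b to shrink this training error; the sketch shows only the forward computation that the EBP algorithm differentiates.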
Following step 11, the flowchart for training the two BPNN models was created, as shown in Fig. 6.

Results
The five SPTRs were predicted using the first BPNN model, and the predicted results are shown in Fig. 7. Figure 7a shows the predicted results of the SPTR for September 1999; the results for September 16-21 are the outside test. Figures 7b-g show the predicted results of the SPTRs for January 2003, February 2003, March 2003, April 2003, May 2003, and October 2002, respectively. Among these predicted results, large predicted errors occur on September 17, 18, and 20, 1999 (Fig. 7a) and on May 13, 2003 (Fig. 7f).
The first and fourth SPTRs, for September 1999 and October 2002, were predicted using the second BPNN model. The predicted results for September 1999 and May 2003 are shown in Figs. 8 and 9. Among these predicted results, large predicted errors occur on September 17, 18, and 20, 1999 (Fig. 8) and on May 13, 2003 (Fig. 9). Referring to the captions of Figs. 7, 8, and 9, the threshold of the predicted error is 11.91 TECu, which is the maximum value of the predicted errors in Fig. 8 (the predicted errors for September 17, 18, and 20, 1999, and May 13, 2003, were excluded from this evaluation). In this study, a predicted error greater than or equal to 11.91 TECu was identified as a TEC precursor (Lin et al. 2018). The predicted results of the two BPNN models were compared with the results of the related past studies in Sect. 2 to verify the reasonableness of the selected threshold of 11.91 TECu.

Fig. 6 Flowchart for training the two BPNN models
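The thresholding rule above is simple to state in code. The 11.91 TECu threshold is the paper's; the daily error values and day labels below are invented for illustration:

```python
import numpy as np

THRESHOLD_TECU = 11.91  # maximum non-anomalous predicted error (from Fig. 8)

def flag_precursor_days(days, predicted_errors):
    """Return the days whose |predicted error| meets or exceeds the threshold."""
    errs = np.abs(np.asarray(predicted_errors, dtype=float))
    return [d for d, e in zip(days, errs) if e >= THRESHOLD_TECU]

# Hypothetical daily predicted errors (TECu) around the Chi-Chi Earthquake.
days = ["Sep 16", "Sep 17", "Sep 18", "Sep 19", "Sep 20"]
errors = [4.2, 14.8, 13.1, 6.0, 12.5]

print(flag_precursor_days(days, errors))  # -> ['Sep 17', 'Sep 18', 'Sep 20']
```

Any day whose predicted error reaches the threshold is flagged as a candidate TEC precursor, which is the comparison the Results section applies to Figs. 7-9.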

Discussion
As stated previously, the STD and MSE (unit: TECu) were used as statistical measures of the predictive ability of the two BPNN models. For an excellent and reasonable BPNN model, the accuracy of the inside test should be better than that of the outside test (Jiang et al. 2021). Combining the predicted results of the subfigures in Fig. 7, the inside-test results were evaluated with one set of STD and MSE values, and the outside-test results with another set. The training dataset SPTRs were predicted to verify and validate the reliability and robustness of the first BPNN model, with STD = 0.24 and MSE = 0.38 (inside test); the test dataset SPTRs were predicted to test its predictive ability, with STD = 0.36 and MSE = 0.49 (outside test). Judged by these STD and MSE values, the accuracy of the inside test was better than that of the outside test, and the low values confirm that the first BPNN model was an excellent and reasonable predictive model. As shown in Fig. 8, the test dataset SPTRs were predicted to test the predictive ability of the second BPNN model, with STD = 0.59 and MSE = 0.79 (outside test); in Fig. 9, the corresponding values were STD = 0.54 and MSE = 0.71 (outside test). These low STD and MSE values confirm the reliability of the second BPNN model. The results show the reasonable application of cross-validation to the two BPNN models. The TEC precursors of the Chi-Chi Earthquake were successfully predicted using the two BPNN models: as stated previously, a predicted error greater than or equal to 11.91 TECu was identified as a TEC precursor.
Therefore, the precursors of the September 21 Chi-Chi Earthquake were detectable 1, 3, and 4 days before the earthquake, that is, on September 17, 18, and 20. These results are consistent with those of Liu et al. (2001) and Lin (2010) and verify the reliability of the two BPNN models. The predicted errors were large on May 13, 2003, indicating a TEC precursor before the earthquake on May 15, 2003, with an ML of 5.21. The two BPNN models therefore had reasonable predictive ability and produced results consistent with the related past studies in Sect. 2. Thus, determining the TEC precursors via the 11.91 TECu threshold was shown to be reasonable.
As with the statistical approach of Liu et al. (2006), this study confirmed that the TEC precursors of earthquakes with an ML < 5.0 were not detectable. Therefore, the two BPNN models were reliable and suitable for predicting TEC precursors before large earthquakes. The predicted errors from September 22 to 24, 1999, were small; a possible reason is that the variations in the TEC data were not associated with the aftershocks of the Chi-Chi Earthquake. By controlling the computation time with appropriate computing hardware, the analyzed method can serve as a real-time TEC prediction system that accepts any future time as input to the two BPNN models; thus, an EEW system is feasible. At the same time, as stated previously, the data processing is not complicated, which reduces the economic cost. Admittedly, the results of this study show that the duration of the TEC precursors, defined by large TEC predicted errors, was difficult to evaluate. However, once a TEC predicted error greater than or equal to the selected threshold occurs, the EEW system begins to monitor the TEC variations, and an alarm is then broadcast. Such an EEW system would already be highly significant for Taiwan.
Finally, after performing cross-validation to avoid overfitting the two BPNN models, determining the variance in the learning process, i.e., the bias-variance tradeoff, was another important step for assessing possible overfitting or underfitting of the two BPNN models. Usually, a stable BPNN model with an appropriate bias-variance tradeoff produces predictions that closely match its training datasets; in other words, the training error should be small, and the difference between the training error and the prediction error should also be small, to avoid overfitting or underfitting (Belkin et al. 2019; Doroudi 2020). Therefore, we analyzed the accuracy of the training errors and predicted errors of the two BPNN models to validate that they were neither overfitting nor underfitting models. First, for the two training parameters, the initial weight and bias, three ranges were selected: 0-1, 1-2, and 2-3. Three pairs of BPNN models were built with initial weights and biases drawn randomly from the three ranges, using the same training datasets and procedures detailed in Sect. 6; each pair included the first and second BPNN models. The MSE values were used to evaluate the training and predicted accuracy of the three pairs of BPNN models. Figure 10 shows the MSE variations of the training errors for the three pairs of BPNN models with initial weights and biases randomly selected in the three ranges; the models with high training accuracy had low initial weights and biases. After predicting the training datasets with the three pairs of BPNN models, the MSE variations of the predicted errors were obtained, as shown in Fig. 11; the models with high predicted accuracy likewise had low initial weights and biases. Based on Figs. 10 and 11, the two BPNN models trained in Sect. 6 were stable models, because the training error was small and the difference between the training error and the prediction error was also small, as stated previously. The initial weights and biases randomly selected within the range of 0-1, which fit the value range of the sigmoid function, were at the optimal tradeoff.
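One plausible reading of why initializations in the 0-1 range "fit the value of the sigmoid function" is gradient saturation: the back-propagated gradient through a sigmoid neuron carries the factor f'(x) = f(x)(1 - f(x)), which decays toward zero for large pre-activations. The toy check below, with hypothetical layer sizes, shows this factor shrinking in the first hidden layer as the initialization range grows; it is an illustration of the mechanism, not a reproduction of the paper's experiment:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Mean sigmoid-gradient factor f'(x) = f(x)(1 - f(x)) of a hypothetical
# first hidden layer (10 neurons, 24 inputs) for three init ranges.
mean_grad = []
for lo, hi in [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]:
    rng = np.random.default_rng(42)       # same draw pattern for each range
    W1 = rng.uniform(lo, hi, (10, 24))    # initial weights in [lo, hi]
    b1 = rng.uniform(lo, hi, 10)          # initial biases in [lo, hi]
    x = rng.uniform(0.0, 1.0, 24)         # feature-scaled input, step (6)
    s = sigmoid(W1 @ x + b1)              # Hidden Layer 1 activations
    mean_grad.append(float(np.mean(s * (1.0 - s))))

print(mean_grad)  # the gradient factor shrinks as the init range grows
```

Larger initial weights and biases push the pre-activations deep into the flat tail of the sigmoid, so the EBP updates become tiny and training accuracy degrades, consistent with the trend reported in Figs. 10 and 11.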
It is noted that, although both models were stable after testing, the training and prediction errors of the second BPNN model were better than those of the first BPNN model. To examine this, the two BPNN models were built with initial weights and biases randomly selected within the range of 0-1, as in Sect. 6, but with different numbers of training epochs. The MSE variations of the training errors with different training epochs are shown in Fig. 12. The training accuracy of the second BPNN model was better than that of the first model during the training process, which reasonably verifies that the second BPNN model was more stable than the first. More training epochs increased the training accuracy; the accuracy with 1250 epochs was slightly better than that with 1000 epochs, but 1250 epochs required more computing time for the two BPNN models, as stated in Sect. 6.

Fig. 7 Predicted results of the SPTRs (TST) using the first BPNN model. The red lines indicate the tracing errors. The symbols y, yr, and error in the legend represent the SPTRs, the predicted SPTRs of the first BPNN model, and the predicted errors of the first BPNN model, respectively. a Predicted results for September 1999 (inside test)

Fig. 10 The MSE variations of the training errors with different ranges of the initial weight and bias. The initial weights and biases were randomly selected within different ranges in training the three pairs of BPNN models; each pair includes the first and second BPNN models. The ranges of the initial weight and bias are scaled on the horizontal coordinate: the scale at one indicates the range 0-1, the scale at two indicates the range 1-2, and the scale at three indicates the range 2-3

Fig. 11 The MSE variations of the predicted errors for the three pairs of BPNN models. The scale on the horizontal coordinate has the same definition as that in Fig. 10

Fig. 12 The MSE variations of the training errors with different training epochs for the two BPNN models

Regarding the mechanical principles that induce the ionospheric earthquake-associated TEC anomalies, relevant studies have described three possibilities. (1) Gravity and acoustic waves, for example, the co-seismic acoustic waves generated before earthquakes, cause ionospheric anomalies: when an earthquake occurs, fine low-frequency vibrations of the ground surface induce ionospheric TEC anomalies. (2) The crustal chemistry theory suggests that before earthquakes, the ground vibrates and radon gases are released from the crust, causing ionospheric TEC anomalies. (3) Anomalous rock and magmatic activities cause variations of the geomagnetic and electric fields near the hypocenter: rocks deform and the magma flow slows before a large earthquake, causing geomagnetic and electric field anomalies that result in ionospheric anomalies (Liu et al. 2001; Freund 2003; Pulinets 2004; Kamogawa and Kakinami 2013; Oyama et al. 2016; Pulinets et al. 2016; Fu et al. 2017). The precursors of the two earthquakes examined in this study might be caused by fluctuations of the geomagnetic and electric fields close to their hypocenters (Liu et al. 2001).

Conclusions
The first BPNN model was trained using three SPTRs, namely the first, second, and third SPTRs; the second BPNN model was trained using three SPTRs, namely the second, third, and fifth SPTRs. Cross-validation was performed to avoid overfitting the two BPNN models, because their training data (the SPTRs) were small datasets. The low STD and MSE values prove the predictive abilities of the two BPNN models. The results show that the precursors of the September 21 Chi-Chi Earthquake were detected 1, 3, and 4 days before the earthquake, that is, on September 17, 18, and 20, 1999; these results are consistent with those of Liu et al. (2001) and Lin (2010). Another TEC precursor was detected on May 13, 2003, 2 days before the earthquake on May 15, 2003, with an ML of 5.21. The precursors of the two earthquakes might be caused by variations of the geomagnetic and electric fields close to their hypocenters. No TEC precursors were detected for earthquakes with an ML < 5.0, in agreement with the statistical approach of Liu et al. (2006). After cross-validation and evaluation of the variance in the learning process, the two BPNN models were determined to be reliable and stable. Consequently, the analyzed method can serve as a real-time TEC prediction system that accepts any future time as input to the two BPNN models; hence, an EEW system is feasible. At the same time, the data processing is not complicated, which reduces the economic cost.