Elite GA-based Feature Selection of LSTM for Earthquake Prediction

Earthquake magnitude prediction is an extremely difficult task that has been studied by many machine learning researchers; redundant features and time-series properties hinder the development of prediction models. The Elite Genetic Algorithm (EGA) has advantages in searching for optimal feature subsets, while Long Short-Term Memory (LSTM) is dedicated to processing time series and complex data. We therefore propose an EGA-based feature selection of LSTM model (EGA-LSTM) for time-series earthquake prediction. First, the acoustic and electromagnetic data of the AETA system we developed are fused and preprocessed by EGA, aiming to find strongly correlated indicators. Second, LSTM is introduced to perform magnitude prediction with the selected features. Specifically, the RMSE of the LSTM and the ratio of selected features are chosen as the fitness components of EGA. Finally, we test the proposed EGA-LSTM on the AETA data of Sichuan province, including the influence of data from different periods (timePeriod) and of the fitness-function weights (ω_a and ω_F) on the prediction results.


Introduction
Earthquakes are serious, sudden-onset natural disasters. Since 1970, at least 3.3 million people have died from such natural disasters, and 226 million people are directly affected each year [1]. Earthquakes cause serious human injuries and economic losses due to the destruction of buildings and other rigid structures. Therefore, earthquake prediction is essential and necessary to reduce human casualties and economic loss.
Earthquake Moment Magnitude (MW) prediction is one of the important issues in earthquake prediction. Many studies have been carried out on finding the relationships within earthquakes and on MW prediction. However, the complex and non-linear relationships in earthquakes make MW prediction a challenging task. Machine Learning (ML) methods have been adopted for MW prediction due to their high classification accuracy. Adeli et al. [2] calculated seismic historical indicators, including Gutenberg-Richter b-values, time lag, earthquake energy, and mean magnitude, and used ML methods to predict MW. Asim et al. [3] investigated the Cyprus earthquake catalog temporally and computed sixty seismic features; these features then served as input instances for Support Vector Machines (SVM) and RF to make five-day-ahead, one-week-ahead, ten-day-ahead, and fifteen-day-ahead predictions with different MW thresholds. Crustal movement is a continuous process, which makes MW prediction a time-series issue. Hence, in [4], Berhich et al. calculated an appropriate feature from historical data to enhance the time-series task and implemented LSTM to predict MW with the enhanced features. Cai et al. [5] used three groups of real seismic data, gravity, georesistivity, and water-level datasets, to predict MW with LSTM. They considered the time series of precursors and earthquakes; however, there are irrelevant or even redundant features within the precursors, resulting in low prediction accuracy.
Feature Selection (FS) is a critical issue for improving the classification accuracy of ML methods. FS can be regarded as a combinatorial optimization problem, and EGA has advantages in searching for optimal feature subsets. Kadam et al. [6] proposed EGA for selecting features in arrhythmia classification and achieved satisfactory performance. Thus, we adopt EGA for FS in MW prediction and design a novel fitness function, which is calculated from the RMSE of LSTM and the ratio of selected features. Finally, considering the time-series effect of the electromagnetic and acoustic data from AETA and the abruptness of earthquakes, we introduce LSTM to predict magnitude with the optimal feature subset selected by EGA.
To verify the performance of the proposed EGA-LSTM, 95 features derived from the electromagnetic and acoustic data collected by AETA are adopted as our experimental dataset. After the feature selection process by EGA, the chosen features serve as the input data of the LSTM model. Besides, five Evolutionary Algorithms (EAs) and four different ML methods are adopted as our baselines: SGA, steadyGA, and three DEs on the EA side, and LR, SVR, AdaBoost, and RF on the ML side. Four evaluation indicators are adopted to assess the performance of these methods. The experimental results demonstrate that our proposed EGA-LSTM is superior to the state-of-the-art methods.
The main contributions of this study are as follows:
• The original dataset is collected by the AETA system developed by our team, which detects electromagnetic and acoustic signals to predict earthquakes in Sichuan and its surroundings.
• To eliminate redundant features from the original 95 features, we propose EGA with a novel fitness function for this specific seismic scene to find the optimal solution.
• Considering the time-series effect of earthquake magnitude and the abruptness of earthquakes, we choose LSTM as the prediction model. Other ML methods, such as LR, SVR, AdaBoost, and RF, serve as our baselines.
• Statistical measures (MAE, MSE, RMSE, R-square) are used to measure the performance of the compared ML models.
The remainder of the study is arranged as follows. Section 2 reviews the latest related works. Section 3 fully describes the method. Section 4 illustrates in detail the experiments on EGA-LSTM and the other EAs and ML methods. Section 5 presents the conclusion and future work.

Related Work
Various approaches have been implemented in earthquake prediction, including feature selection methods and prediction methods.

Seismic feature selection
Feature selection is a crucial step in building a robust model, especially for high-dimensional data [7]. Applying feature selection to earthquake prediction is a complex problem, because different techniques are sensitive to data imbalance and noise. Martínez-Álvarez et al. [8] proposed information gain to evaluate different indicators and removed the low-ranked or null-contribution seismic indicators. Roiz-Pagador et al. [9] introduced four different feature selection methods and nine different EAs within correlation-based feature selection.

Earthquake prediction methods
The main MW prediction methods comprise mathematical models, shallow ML techniques, and DL methods.
Among mathematical models, methods for dealing with uncertainty, such as probability theory and fuzzy sets, have been implemented to predict MW. For instance, Chen et al. [10] focused on the laws of the earthquake time series based on chaos theory and carried out earthquake forecast simulations through the analysis of real data; the quantitative identification of the time-series data shows that seismic time series exhibit deterministic chaotic characteristics. Cekim et al. [11] captured the dynamics of earthquake occurrence using a novel Singular Spectrum Analysis and Autoregressive Integrated Moving Average method to predict the average and maximum MW in the East Anatolian Fault of Turkey. However, the prediction results proved unsatisfactory due to the strict requirements of mathematical models on the independence and correlation of the data.
Another branch of MW prediction approaches focuses on shallow ML techniques. They are driven by data and do not require many parameters. Since there is no definite mathematical relationship between precursor indicators and location and MW, Panakkat et al. [12] introduced three different neural networks, feed-forward Levenberg-Marquardt Back Propagation, RNN, and Radial Basis Function (RBF) networks, to predict MW and location with eight indicators from the Gutenberg-Richter power law. In 2017, Asim et al. [13] proposed different ML approaches, including RF and a Linear Programming Boost ensemble of decision trees, for earthquake prediction in the region of Hindukush. These techniques show significant and encouraging results but are still not practically deployable. As Chile is rocked by intraplate, interface, and crustal events, Chanda et al. [14] proposed six different ML methods to predict the total duration and significant duration for the three types of earthquakes. Shah et al. [15] introduced an Improved Artificial Bee Colony algorithm to improve the training process of the Multilayer Perceptron. Muhammad et al. [16] studied earthquakes, Radon gas, and the ionospheric total electron content and used statistical and machine/deep learning methods to find the relationships among these three elements in the North Anatolian Fault. Yang et al. [17] proposed an automated regression pipeline approach for high-efficiency earthquake prediction, whose core prediction part trains RF with Bayesian parameter optimization. Ensemble learning has also been used in earthquake prediction: Asim et al. [18] proposed a novel earthquake predictor system combining seismic indicators with GA and AdaBoost; GA's searching capabilities and AdaBoost's boosting make the proposed method a powerful classifier. In addition, considering that the earthquake prediction process is similar to the anomaly detection process of the biological immune system, Zhou et al. [19][20][21][22] introduced the dendritic cell algorithm, an artificial macrophage classification optimization method, and an artificial antigen-presenting cell approach to earthquake prediction. However, shallow ML methods are unable to learn the complex and nonlinear relationships among earthquake features; therefore, a more powerful technique is needed.
Deep learning methods are experts at solving nonlinear problems because they have many hidden layers and densely connected neurons that preserve complex information. Moustra et al. [23] applied an Artificial Neural Network to historical MW data and seismic electric signals in Greece to predict MW, but the results were not satisfactory. In 2018, Asim et al. [24] computed sixty features by employing seismological concepts, extracted features by Maximum Relevance and Minimum Redundancy (mRMR), and predicted MW with the selected features using SVR and HNN; in addition, the HNN is supported by enhanced Particle Swarm Optimization. Jain et al. [25] proposed a DL method to predict the position and depth of earthquakes and analyzed the effects of parameters on different datasets. However, these methods did not capture time-series effects and only used DL to learn the relationship between features and magnitude. Draz et al. [26] argued that the evolution and appearance of earthquake precursors exhibit complex behavior, and therefore used deep machine learning to detect ionospheric and atmospheric precursors. Their study showcases the importance of machine learning techniques in earthquake detection and contributes to the understanding of the Lithosphere-Atmosphere-Ionosphere Coupling mechanism. Berhich et al. [27] presented LSTM for earthquake prediction to study the correlations between two groups divided by their range of MW. To find the specific pattern among magnitude, timing, and location, Berhich et al. [28] used LSTM to learn temporal relationships, with an attention mechanism extracting important patterns and information from the input features. Kavianpour et al. [29] introduced a novel prediction model based on the attention mechanism, a Convolutional Neural Network (CNN), and Bidirectional Long Short-Term Memory (BiLSTM) to capture temporal dependencies, which can predict the maximum MW and the number of earthquakes in a specified period in mainland China. Berhich et al. [4] calculated an appropriate feature to enhance the time feature and predicted MW with LSTM using this feature. Nevertheless, most of these DL methods do not consider that some precursor features are irrelevant or even redundant. Such features can hinder the learning process of DL models, reducing predictive accuracy.
Generally speaking, the MW prediction process contains two steps: finding a valid optimized precursor subset and constructing a prediction model. Before a major earthquake strikes, it is often accompanied by changes in the electromagnetic and acoustic signals recorded by AETA. Hence, we have reason to believe that electromagnetic and acoustic signals are related to MW. As far as the authors know, electromagnetic and acoustic signals have not been studied as precursor features with LSTM, so we utilize these indicators to predict MW. In addition, considering the existence of redundant features, we propose EGA-LSTM for time-series earthquake prediction based on AETA.

Methodology
This section describes the proposed EGA-LSTM model for earthquake magnitude (MW) prediction.

Model overview
A schematic presentation of the proposed model is shown in Fig. 1. First, the dataset from AETA was collected in Sichuan and its surroundings. It includes two types of signals, electromagnetic and acoustic, with a total of 95 features. AETA recorded electromagnetic and acoustic signals from January 1, 2017 to December 31, 2022, sampling every ten minutes, so there are more than 20000 records. Since predicting MW over a very short period makes little sense and human activities affect electromagnetic and acoustic signals, we merge the signals of one day into one catalog entry; the fusing details are described in section 3.2. Then, we propose EGA for feature selection.

Process of EGA-LSTM
The EGA-LSTM architecture for earthquake magnitude prediction is depicted in Fig. 2, and the pseudocode of EGA-LSTM is shown in Algorithm 1. The main idea is as follows. The AETA dataset includes 95 features: 51 for electromagnetic signals and 44 for acoustic signals. However, MW is not recorded by AETA, so the MW data are taken from the China Earthquake Networks Center. The algorithm merges the 95 features and MW according to time. Since the sampling interval is short and MW prediction over a very short period makes little sense, we select a typical value of every feature to represent a whole day: the original one-day data have size 144*96, and after the fusing phase the size becomes 1*96. In addition, because electromagnetic and acoustic signals may be affected by human activities, we need to choose a suitable timePeriod to represent the whole day. Considering the lack of nighttime activities, we choose the data from the timePeriod 0:00 to 8:00. Then, the algorithm transforms the time-series data into a supervised learning problem: the input variables are the electromagnetic and acoustic signals and the output variable is the MW of the following day, with a step size of 1, so the algorithm predicts the MW of the next day. After processing the original data, the algorithm executes feature selection. The model adopts EGA to select the optimal feature subset; each individual of EGA uses binary encoding with dimension 95. If a feature is selected, the corresponding dimension is 1, otherwise 0. With comprehensive consideration of prediction accuracy and time complexity, we propose a novel fitness function defined as Eq. (1),

Fitness = ω_a · RMSE + ω_F · (F / P),    (1)

where ω_a and ω_F are the weight factors, F is the number of selected features, and P is the number of all features. The first part of Eq. (1) represents the prediction accuracy and the second part indicates the complexity of the model. In addition, the Root Mean Squared Error (RMSE) is an evaluation indicator defined as Eq. (2),

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)² ),    (2)

where ŷ is the predicted value and y is the true value. This fitness function retains considerable accuracy while significantly reducing the number of selected features.

Algorithm 1 EGA-LSTM
Input: electromagnetic and acoustic signals from AETA
Output: MW of the next day
1: Maxiter: the maximum number of iterations
2: OFS: the selected optimal feature subset
3: Use the maximum of the data between 0:00 and 8:00 to represent the whole day
4: Transform the time series into supervised learning sequences, step size equal to 1
5: Initialize population and parameters
6: while t < Maxiter do
7:   Calculate the fitness of every individual by Eq. (1)
8:   Retain the elite individuals
9:   Apply selection, crossover, and mutation operators
10:  Generate new population
11: end while
12: OFS = best-fitness individual
13: MW = LSTM(OFS)
14: return MW
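As a plain sketch, the fitness of Eq. (1) and the RMSE of Eq. (2) can be written as follows. The function names and the pre-computed `eval_rmse` argument (a stand-in for training the sample LSTM on the selected columns) are illustrative, not the paper's code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Eq. (2): root mean squared error between predicted and true MW."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def fitness(individual, w_a, w_F, eval_rmse):
    """Eq. (1): w_a * RMSE + w_F * (F / P), lower is better.

    individual: binary vector of length P (1 = feature selected);
    eval_rmse: RMSE of the sample LSTM trained on the selected features.
    """
    individual = np.asarray(individual)
    P = individual.size                 # total number of features (95 here)
    F = int(individual.sum())           # number of selected features
    return w_a * eval_rmse + w_F * (F / P)
```

The accuracy term rewards low prediction error, while the F/P term penalizes large subsets, which is what drives the feature count down during evolution.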
The algorithm first defines the maximum number of iterations, the crossover rate, the mutation rate, and the number of individuals. The model then randomly initializes the population and begins to iterate. The first step of each iteration is calculating the fitness of every individual. The algorithm then executes the selection, crossover, and mutation operators to generate a new subpopulation. The loop ends when the number of iterations reaches the maximum. After the iteration phase, the best individual is the selected optimal feature subset. The last step of the algorithm is predicting MW. First, the selected optimal feature subsets are normalized by the Min-Max scaler, a transform calculated by Eq. (3),

x' = (x − x_min) / (x_max − x_min),    (3)

which maps every feature into the same range, between 0 and 1, so that no single feature dominates the others. The data are then divided into a 70% training set and a 30% testing set, and LSTM is trained with RMSE for error calculation and evaluation. Finally, after the model is trained, LSTM predicts MW on the testing set, and the error between the real MW and the predicted MW is calculated by different evaluation indicators. They are widely used to evaluate regression models and are applied here to earthquake prediction; the detailed evaluation indicators are provided in section 4.2.
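The normalization and chronological 70/30 split above can be sketched as follows; the helper names and shapes are assumptions, not the paper's implementation.

```python
import numpy as np

def min_max_scale(X, x_min=None, x_max=None):
    """Eq. (3): x' = (x - x_min) / (x_max - x_min), per feature column."""
    X = np.asarray(X, dtype=float)
    if x_min is None:
        x_min, x_max = X.min(axis=0), X.max(axis=0)
    rng = np.where(x_max - x_min == 0, 1.0, x_max - x_min)  # guard constant cols
    return (X - x_min) / rng, x_min, x_max

def train_test_split_70_30(X, y):
    """Chronological split: first 70% for training, last 30% for testing."""
    n_train = int(len(X) * 0.7)         # no shuffling, to keep time order
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]
```

Fitting x_min and x_max on the training portion (and reusing them on the test set) is the usual way to avoid leaking test statistics into training.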

Feature selection of EGA
GA excels at solving combinatorial optimization problems since its operators were designed for discrete encodings [30]. Feature selection is a combinatorial optimization problem and a non-deterministic polynomial (NP) problem. The traditional feature selection approach is to find features relevant to the target variables; in addition, the features within the selected subset should be weakly correlated with each other. Combining these two conditions, the model can find the relationship between features and the target variable. EGA can take the optimal genes directly into the next generation without selection. Thus, we adopt EGA to find the optimal feature subset based on GA's searching ability.
In this method, each individual is a 95-dimensional binary vector: an element equal to 1 means the corresponding feature is selected, otherwise it is not. Initially, the individuals are initialized randomly. Then the fitness of each individual is calculated by Eq. (1): to obtain the RMSE term, we train a sample LSTM on the training set with the selected features, predict MW on the testing set, and compute RMSE by Eq. (2). The first part of the fitness represents prediction accuracy and the second part represents time complexity, so the function decreases the time complexity while maintaining accuracy. After calculating the fitness, the algorithm begins to select individuals. Before every loop, EGA retains the best individuals. The selection step applies the roulette wheel to produce two new solutions: the algorithm generates a random probability value and selects the corresponding individual. The next step is the crossover operator; the chosen technique is double-point crossover, in which only the chromosome segments between the two crossover points are swapped. The mutation operator is inversion mutation: at a low mutation rate, a gene locus is randomly chosen and inverted, so a value of 1 becomes 0 and vice versa. After generating a new population, if the number of iterations reaches the maximum, the algorithm outputs the best individual; otherwise it recalculates fitness. The best individual's genes (output OFS = {of_1, of_2, ..., of_n}) constitute the optimal feature subset.
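A minimal sketch of the EGA loop just described, with a toy fitness standing in for the LSTM-based Eq. (1). The operator details (roulette weighting inverted for a minimized fitness, a single elitism slot) are our reading of the text, not the authors' code.

```python
import random

P = 95                                  # number of candidate features

def toy_fitness(ind):
    """Toy stand-in for Eq. (1): fewer selected features = lower fitness."""
    return sum(ind) / len(ind)

def roulette(pop, fits):
    # convert "lower is better" fitness into positive selection weights
    weights = [max(fits) - f + 1e-9 for f in fits]
    return random.choices(pop, weights=weights, k=1)[0]

def two_point_crossover(a, b):
    i, j = sorted(random.sample(range(len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(ind, rate=0.01):
    # inversion mutation: flip a gene locus with low probability
    return [1 - g if random.random() < rate else g for g in ind]

def ega(pop_size=10, max_iter=20):
    pop = [[random.randint(0, 1) for _ in range(P)] for _ in range(pop_size)]
    for _ in range(max_iter):
        fits = [toy_fitness(ind) for ind in pop]
        elite = pop[fits.index(min(fits))]      # elitism: best survives as-is
        children = [elite]
        while len(children) < pop_size:
            a, b = roulette(pop, fits), roulette(pop, fits)
            c1, c2 = two_point_crossover(a, b)
            children += [mutate(c1), mutate(c2)]
        pop = children[:pop_size]
    fits = [toy_fitness(ind) for ind in pop]
    return pop[fits.index(min(fits))]           # optimal feature subset
```

Because the elite is copied unchanged into every generation, the best fitness found so far never regresses, which matches the "take the optimal gene directly into the next evolution" property of EGA.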

The LSTM for MW prediction
LSTM aims to solve the problem of long-term dependence [31]. Hence, we apply LSTM to predict MW with the output OFS of the EGA feature selection phase. The diagram of LSTM for MW prediction is shown in Fig. 3. The LSTM cell is enhanced by three components called gates, the forget gate, the update (input) gate, and the output gate, and by two memory cells: the hidden state and the internal (cell) state.
Fig. 3 The LSTM for MW prediction

The selected features are processed by the Min-Max scaler, and the squashed data pass through the input gate, which takes the relevant features from the input by multiplying them with a sigmoid function. This function maps the relevant features to the range 0 to 1: if the value is 0, the network removes the feature; otherwise, the feature passes through the network. The next step is to decide how much memory to store in the cell state, where a tanh layer creates a vector of new candidate values. The cell state takes the information kept from the previous state and adds the gated input. Since the memory update is additive instead of multiplicative, LSTM mitigates the vanishing gradient problem. Moreover, the forget gate decides which state information is memorized or forgotten based on a sigmoid function. Finally, the output gate decides what the network will output, also based on a sigmoid function: the cell state is passed through a tanh layer to push the values to between -1 and 1 and multiplied by the output of the sigmoid gate, so that only the parts of interest are output. In the last layer, the output is MW. These operations are described by Eqs. (4)-(9).
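For reference, here is a single LSTM cell step implementing the standard gate formulation described above (the paper's exact Eqs. (4)-(9) may differ in notation). The weights are random placeholders purely to show the data flow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b hold parameters for the f, i, o, g maps."""
    z = {k: W[k] @ x_t + U[k] @ h_prev + b[k] for k in ("f", "i", "o", "g")}
    f_t = sigmoid(z["f"])               # forget gate: what to discard
    i_t = sigmoid(z["i"])               # input gate: what to write
    o_t = sigmoid(z["o"])               # output gate: what to expose
    g_t = np.tanh(z["g"])               # candidate cell values
    c_t = f_t * c_prev + i_t * g_t      # additive memory update
    h_t = o_t * np.tanh(c_t)            # squashed, gated hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 95, 100                   # 95 input features, 100 hidden units
W = {k: rng.normal(size=(n_hid, n_in)) * 0.1 for k in "fiog"}
U = {k: rng.normal(size=(n_hid, n_hid)) * 0.1 for k in "fiog"}
b = {k: np.zeros(n_hid) for k in "fiog"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
```

The additive form of `c_t` is what lets gradients flow through many time steps, which is why LSTM suits the daily precursor sequences used here.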

Data process
The earthquake information of the regions AETA monitors covers January 1, 2017 to December 31, 2022. AETA is a multi-component seismic monitoring system co-developed by the IMS Laboratory of Peking University and our research group at the School of Computer Science of Wuhan University for earthquake monitoring and prediction. The AETA system mainly collects two categories of data, electromagnetic and acoustic, and has the advantages of strong system stability, high environmental adaptability, and strong anti-interference ability [32]. To better verify earthquake prediction, the dataset consists of four regions: DS1, DS2, DS3, and DS4. The training data run from January 1, 2017 to February 1, 2022, and the testing data from February 1, 2022 to December 30, 2022. In AETA, electromagnetic and acoustic signals are detected every 10 minutes, so we selected more than 20000 samples over the six years at a single station. Each catalog contains a list of records: the time of the earthquake, 51 electromagnetic features, and 44 acoustic features. MW data are maintained by the China Earthquake Networks Center (CENC); their catalog is available over the internet at http://www.ceic.ac.cn/. We then merge the two tables on the time of the earthquake. Table 1 shows partial earthquake data of DS2 after merging. After merging the dataset, we fuse multiple records from one day into one record. The specific operation consists of two steps: selecting the catalogs of a given period as the representative data, and taking the maximum of each feature and of MW. In this experiment, we choose the timePeriods 0:00-8:00, 0:00-12:00, and 0:00-24:00. Then we transform the time-series data into supervised learning data based on MW, with a time step of one day. Fig. 4 shows the kurtosis of sound, one of the 95 features.
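The merge-and-fuse procedure can be sketched with pandas as follows; the column names and schema ("time", "mw", feature columns) are hypothetical, not AETA's or CENC's actual formats.

```python
import pandas as pd

def fuse_daily(aeta: pd.DataFrame, cenc: pd.DataFrame,
               start="00:00", end="08:00") -> pd.DataFrame:
    """Fuse 10-minute AETA records into one row per day and attach daily MW."""
    aeta = aeta.copy()
    aeta["time"] = pd.to_datetime(aeta["time"])
    # keep only records inside the chosen timePeriod, then take the daily max
    window = aeta.set_index("time").between_time(start, end)
    daily = window.groupby(window.index.date).max()
    daily.index.name = "date"
    daily = daily.reset_index()
    # daily maximum MW from the CENC catalog
    cenc = cenc.copy()
    cenc["date"] = pd.to_datetime(cenc["time"]).dt.date
    daily_mw = cenc.groupby("date")["mw"].max().reset_index()
    # left join keeps quiet days; fill them with MW = 0 (no event)
    merged = daily.merge(daily_mw, on="date", how="left")
    merged["mw"] = merged["mw"].fillna(0.0)
    return merged
```

A left join is used so that days without a cataloged event survive as zero-magnitude rows, which keeps the daily time series unbroken for the supervised transform.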
Fig. 5 shows the relationships between selected features. From Fig. 5(b), we find that the electromagnetic absolute mean value and the electromagnetic absolute maximum 5% position have a strong correlation. Beyond these two, many other features also correlate with each other. Hence, such strongly correlated features need to be removed to decrease time complexity.

Prediction indicators
To evaluate the performance of the proposed earthquake prediction model, we choose several regression evaluation indicators: mean absolute error (MAE), mean squared error (MSE), RMSE, and R-square (R²). These indicators are calculated as Eqs. (10)-(13).
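Written out in NumPy, the four indicators are as follows (a standard formulation; the paper's Eqs. (10)-(13) should match up to notation).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R² for a regression prediction."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                       # mean absolute error
    mse = np.mean(err ** 2)                          # mean squared error
    rmse = np.sqrt(mse)                              # root mean squared error
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}
```

Note that R² can be negative for a model worse than predicting the mean, which is relevant when reading the low R² values reported below.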

Baseline methods
The performance of the proposed EGA-LSTM is compared with four other ML approaches: RF [17], AdaBoost [18], LR [33], and SVR [24]. Moreover, we choose two types of GA, SGA and SteadyGA, and three different mutation strategies of DE: best base vector with random crossover (DE best 1 b), best base vector with linear order crossover (DE best 1 L), and random base vector with random crossover (DE rand 1 b). The six EAs adopt binary encoding and set the dimension of individuals to 95, corresponding to the features: 1 means the corresponding feature is selected, otherwise it is not. The number of individuals is set to 10 and the maximum number of iterations to 20. The stagnation threshold is set to 0.000001 and the maximum evolutionary stagnation counter to 10. In the GAs, the crossover and mutation rates are set to 0.7 and 0.01; in the DEs, the scaling factor and crossover rate are both set to 0.5.
The RF combines 100 decision trees; the other parameters are set as follows: max depth = 10, min samples split = 2, min samples leaf = 1. AdaBoost's base learner is a decision tree, the number of boosting rounds is 100, the learning rate is set to 1, and the boosting algorithm is based on the probability of prediction error. The LR's loss function adopts the least squares method. In SVR, the kernel function is the radial basis function, the kernel factor is set to the reciprocal of the product of the number of features and the variance of the feature vector, and the penalty parameter is set to 1.
The LSTM in our experiment has one hidden layer with 100 units, one output layer, and the corresponding activation functions. The learning rate is set to 0.00045, the batch size to 32, and the number of training epochs to 120. The initial weights are random values and the biases are set to 0.

Result and analysis
This section demonstrates the results in three subsections. Section 4.4.1 presents the fitness and RMSE across different timePeriods and EAs. Section 4.4.2 compares EGA-LSTM with nine other methods on four metrics. Section 4.4.3 uses two non-parametric tests to show that EGA-LSTM differs from the other nine algorithms and is superior to LSTM on the different datasets.
Table 2 shows the results for the timePeriod 0:00-8:00; it is worth noting that EGA performs better than the other EAs in most groups. In particular, in the group ω_a = 1, ω_F = 0.8, the RMSE is the lowest of all EAs, which is a desirable property in predicting MW. The differences in the indicators F, RMSE, and Fitness are marked: the gap to the second-best Fitness (DE rand 1 b) is greater than 0.03 in RMSE and greater than 0.08 in Fitness. Meanwhile, the number of selected features is only 38, far lower than 95 (no selection). Tables 3-4 describe the timePeriods 0:00-12:00 and 0:00-24:00, respectively; there, RMSE stays between 0.103 and 0.110 and F decreases to varying degrees.
Combining the three tables, we take a comprehensive view of F and RMSE: even as the number of selected features falls, RMSE still performs well. Thus, we conclude that EGA performs most stably and suitably during the period 0:00-8:00. In Fig. 5(b), we found a strong correlation between the electromagnetic absolute mean value and the electromagnetic absolute maximum 5% position; to increase prediction accuracy and decrease time complexity, one of these two features needs to be removed, and the best group's result shows this target is reached. Fig. 7 shows the trend of the RMSE and Fitness scores as ω_F increases: as ω_F grows, the penalty on F increases, so the number of selected features decreases and the Fitness trend rises, while the change in RMSE is not obvious and always fluctuates within a low range.

Comparisons among ML Methods
To verify the performance of EGA-LSTM, we compare it with some state-of-the-art ML methods: RF, AdaBoost, LR, SVR, and LSTM. Table 5 shows the comparison between EGA-LSTM and the nine other approaches at the timePeriod 0:00-8:00 in region DS1. EGA-LSTM obtains better performance on all indicators, although all approaches perform poorly on the indicator R². Overall, EGA-LSTM outperforms the other methods.
Referring to Table 6, for MAE, MSE, RMSE, and R², EGA-LSTM performs best among the predictors. For RMSE, it generates a difference greater than 0.011 units over the second-best predictor (EGA-SVR). Therefore, EGA-LSTM is an appropriate method for predicting MW. The predictor with the worst result is EGA-LR, which performs much worse on the four indicators than the other approaches; this means the time sequence between seismic and precursory features cannot be ignored. It is worth noting that EGA-LSTM's performance on DS2 is the best among the four regions.

Fig. 6 The process of decreasing in EGA searching

With reference to DS3 (see Table 7), it is worth noting that LSTM without EGA performs better than EGA-LSTM on one indicator; however, EGA-LSTM's results remain superior to the others on the remaining indicators. Intuitively, the difference between EGA-LSTM and the other algorithms is not obvious, especially compared with EGA-SVR; more statistical comparison details are given in section 4.5.
Particular results for DS4 are shown in Table 8: EGA-LSTM performs better than the other nine methods on all four metrics.
From a joint analysis of the four tables, a conclusion can easily be drawn: on the four indicators, EGA-LSTM is the best. Therefore, EGA-LSTM is the most precise and stable algorithm across all datasets.
Fig. 8 shows the fitting curves of EGA-LSTM. Since we choose suitable parameters and a suitable loss function, the model is neither overfitting nor underfitting. The real MW and the MW predicted by the different methods are shown in Fig. 9; the results demonstrate that all methods perform well in the low-MW range, but high MW can rarely be predicted accurately.

Fig. 7 The trend of the RMSE and Fitness scores as ω_F increases

Non-parametric tests
Since some of the algorithms used are stochastic, we adopt hypothesis tests. To examine the differences among multiple algorithms, we test their performance on multiple datasets. The earthquake MW datasets can be analyzed with the Kruskal-Wallis H test, a non-parametric test, because they are not assumed to follow a normal distribution. The Kruskal-Wallis H test was therefore used to determine whether EGA-LSTM's prediction accuracy differs significantly from the other algorithms on datasets DS1, DS2, DS3, and DS4. We set the confidence level α = 0.05 and assume that the distribution function of RMSE for group i follows the form F_i(RMSE) = F(RMSE − µ_i), with hypotheses

H_0: all µ_i are equal
H_1: not all µ_i are equal.

The p-value and statistic of the Kruskal-Wallis H test on the experimental results were 0.0005 and 29.30. Since the p-value is less than 0.05, we reject H_0 and accept H_1, indicating that EGA-LSTM is different from the other algorithms.
In order to verify the effectiveness of EGA-LSTM versus LSTM, another non-parametric statistical test, the Wilcoxon signed-rank test, was conducted for the four different regions between the proposed EGA-LSTM and LSTM. The hypotheses are listed as Eqs. (15)-(16). The results over 20 independent replicate experiments show that the RMSE of EGA-LSTM is better than that of its counterpart LSTM.
In sub-Table 9(b) of the Wilcoxon signed-rank test, the positive and negative ranks are presented for each region's measures corresponding to EGA-LSTM and LSTM. The Wilcoxon signed-rank test shows that the results from different regions differ. Checking the boundary table of the Wilcoxon signed-rank test, the critical value for a one-tailed test at the 0.01 confidence level is 43, and at the 0.05 confidence level it is 60. In DS1, the test statistic is 28; therefore we reject the null hypothesis with 99% certainty, concluding that EGA-LSTM is superior to LSTM.
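The test statistic reported above (e.g., 28 for DS1) is the smaller of the positive-rank and negative-rank sums of the paired differences. A minimal sketch, using hypothetical paired RMSE values and omitting tie handling:

```python
import numpy as np

def wilcoxon_W(x, y):
    """Wilcoxon signed-rank statistic: rank |differences|, then return the
    smaller of the positive-rank and negative-rank sums (no tie handling)."""
    d = np.asarray(x, float) - np.asarray(y, float)
    d = d[d != 0]                       # discard zero differences
    order = np.argsort(np.abs(d))
    ranks = np.empty(len(d))
    ranks[order] = np.arange(1, len(d) + 1)
    return min(ranks[d > 0].sum(), ranks[d < 0].sum())

# hypothetical paired RMSEs (EGA-LSTM vs. LSTM over the same runs)
W = wilcoxon_W([0.40, 0.42, 0.39, 0.45, 0.41],
               [0.55, 0.50, 0.52, 0.44, 0.60])
```

The computed W is then compared against a tabulated one-tailed critical value for the given number of pairs; a W below the critical value rejects the null hypothesis of no difference between the paired samples.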

Conclusion and Future Works
Aiming at the problem that redundant features and time-series properties hinder the development of earthquake magnitude prediction models, we propose EGA-LSTM for time-series earthquake prediction. First, the acoustic and electromagnetic data of the AETA system we developed are fused and preprocessed to find strongly correlated indicators. Second, since EGA has advantages in searching for optimal feature subsets, we adopt it to select features. Then, LSTM, which is well suited to processing time series and complex data, is implemented to execute magnitude prediction with the selected features. Specifically, we chose the RMSE of LSTM and the ratio of selected features as the fitness components of EGA. Finally, we test the proposed EGA-LSTM on the AETA data of Sichuan and Yunnan provinces. Experimental results demonstrate that all the methods achieve their best performance when timePeriod = 0:00-8:00, ω_a = 1, and ω_F = 0.8. Moreover, our proposed EGA-LSTM obtains more satisfying performance than state-of-the-art approaches on the evaluation indicators MAE, MSE, RMSE, and R².
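The exact fitness formula is not restated in this section; the sketch below shows one plausible form consistent with the stated components, assuming a weighted sum of the LSTM prediction error and the fraction of features kept, both to be minimized. The function name and the weighted-sum form are assumptions, not the paper's definition:

```python
def ega_fitness(rmse, n_selected, n_total, w_a=1.0, w_f=0.8):
    """Hypothetical EGA fitness (lower is better): trades off LSTM prediction
    error against the fraction of features retained by a chromosome."""
    return w_a * rmse + w_f * (n_selected / n_total)

# e.g., an individual achieving RMSE 0.5 while keeping 10 of 40 features
score = ega_fitness(0.5, 10, 40)
```

With such a form, the weights w_a and w_f (corresponding to ω_a and ω_F) control how strongly the search favors accuracy versus feature sparsity.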
However, because data from medium and large earthquakes constitute small samples, the model proposed in this study needs further improvement on the AETA data before it can predict medium and large earthquakes. Our future work will focus on how to process small-sample time series to obtain an effective and usable magnitude prediction model that is suitable for medium and large earthquakes. In addition, theoretical analysis of our method and more complicated earthquake prediction scenarios can also be part of future work.

Ethical Approval
Not applicable.

Competing interests
Not applicable.

Authors' contributions
Zhiwei Ye, Wuyang Lan, and Wen Zhou: wrote the main manuscript text. Qiyi He: supervision, writing (reviewing and editing), funding acquisition. Liang Hong, Xinguo Yu, and Yunxuan Gao: reviewing. All authors reviewed the manuscript.

Availability of data and materials
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
As Fig. 1 depicts, the selected features are divided into two groups: the training set and the testing set. The time sequence of the training set and the testing set does not change. In this work, EGA-LSTM has been compared with other classic ML methods, such as LR, SVR, AdaBoost, and RF. Finally, different evaluation indicators are applied to evaluate these algorithms.
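The chronological split described here (no shuffling, so the testing period strictly follows the training period) can be sketched as follows; the function name and split fraction are illustrative:

```python
def time_ordered_split(X, y, train_frac=0.8):
    """Split sequentially so the testing set follows the training set in time;
    shuffling is avoided to prevent leakage of future samples into training."""
    k = int(len(X) * train_frac)
    return X[:k], X[k:], y[:k], y[k:]

# toy sequence: the last 20% of samples become the test set
X_tr, X_te, y_tr, y_te = time_ordered_split(list(range(10)), list(range(10)))
```

Preserving order matters for time-series evaluation: a random split would let the model train on samples that occur after the test samples, inflating the apparent accuracy.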

Fig. 1 The program diagram of the proposed earthquake prediction model

Fig. 4 The wave of the kurtosis of the sound data of DS2

Fig. 5 Analysis of partial features of DS2

Fig. 8 Plot of epoch and loss of the LSTM model

Table 1 Partial earthquake data features of DS2 after merging

Table 5 Comparison of effectiveness between EGA-LSTM and other approaches in DS1

Table 6 Comparison of effectiveness between EGA-LSTM and other approaches in DS2

Table 7 Comparison of effectiveness between EGA-LSTM and other approaches in DS3

Table 8 Comparison of effectiveness between EGA-LSTM and other approaches in DS4

Table 9 Results of Wilcoxon signed-rank test in terms of statistical measures