Storm surge level prediction based on improved NARX neural network

The northern Gulf of Mexico coast is affected by the North Atlantic hurricane season, which causes storm surge disasters every year and brings serious economic losses to the southern USA; therefore, it is necessary to make an accurate advance prediction of storm surge level. In this paper, a model with simple structure, fast computation speed, and accurate prediction results has been constructed based on nonlinear auto-regressive exogenous (NARX) neural network. Five types of data collected from observation stations are selected as the input factors of the model. To improve the model's computational efficiency, a neuron pruning strategy based on sensitivity analysis is introduced. By analyzing the output weights of the neurons in the hidden layer on the sensitivity of the model prediction output, the model structure can be adjusted accordingly. Moreover, a modular prediction method is introduced based on the tide harmonic analysis data so as to make the model prediction results more accurate. At last, a complete storm surge level prediction model, pruned modular (PM)-NARX, is constructed. In this paper, the model is trained by using historical data and used for storm surge level prediction along the northern Gulf of Mexico coast in 2020. The simulation test results show that the correlation between the predicted data and the observed data is stable above 0.99 at 12 h in advance and the model is able to produce the results within one minute. The prediction speed, accuracy, and stability are higher than those of conventional models. In addition, two sets of follow-up tests show that the prediction accuracy of the model can still maintain a high level. The above can prove that the pruned modular (PM)-NARX model can effectively provide early warning before the storm surge to avoid property damage and human casualties.


Introduction
Storm surge is an abnormal sea level change phenomenon caused by strong weather systems such as cold fronts, tropical cyclones, temperate cyclones, and sudden changes in air pressure. Offshore areas are more susceptible to extreme storm surges due to the combined effects of barometric, wind, and tidal fields, affecting all aspects of the social and economic environment worldwide. Due to the unique geographical position of the Gulf of Mexico, tropical cyclones generated in the North Atlantic and the Caribbean often enter the Gulf and are gradually strengthened when entering into coastal areas, causing severe damage there [1][2][3][4].
To deal with the damage caused by storm surge disasters, a lot of work has been carried out in storm surge disaster monitoring and surveillance, and prediction forecasting in countries around the world. From the 1950s to the present, different numerical simulation models of sea level have been established to predict and simulate storm surges by using hydrodynamics and atmospheric driving forces, such as the SLOSH and ETSS (Extra-Tropical Storm Surge) models in the USA, the SEA model in the UK, the Mike21 model in Denmark, and the DSCM model in the Netherlands [5]. In addition, based on SLOSH, the P-SURGE (Probabilistic Storm Surge) model was created [6], capable of predicting storm surges from the corresponding parameters of tropical cyclones; the lead time of P-SURGE forecast guidance can be up to 60 h if the data accuracy is sufficient. The model is now one of the most dominant modeling systems which informs all NWS (National Weather Service) coastal inundation forecasts, warnings, and risk communication during tropical cyclone landfall threats. Besides, Meteorological Development Laboratory (MDL) used ensemble techniques for quantitative estimation of inundation risk based on storm surge generation and developed the P-ETSS (Probabilistic Extra-Tropical Storm Surge) model by using atmospheric inputs from 21 Global Ensemble Forecast System ensemble members [7]. Moreover, based on artificial features plus linear model fitting, a numerical prediction model has also evolved. Oliveira et al. [8] empirically predicted storm surge by determining the relationship between the wind and barometric fields relative to the storm surge and combining it with neural networks, with the experimental results showing a correlation of 0.99 for predictions 6 h in advance. However, the model is not robust, and the prediction accuracy decreases significantly after extending the lead time. Salmun et al. [9] proposed a statistical method for storm surges forecasting (STAT FCST). The method regressed the storm surge extremes in historical storms on the mean wave height during the storm duration to derive the corresponding regression equation. The prediction tests were performed for 41 storm surges from 2005 to 2008 with a confidence level of 95%. However, the same drawbacks of an excessive amount of data required and complicated calculation process still exist. Georgas et al. [10][11][12][13] designed and operated the Stevens Flood Advisory System (SFAS, http:// steve ns. edu/ SFAS) in 2015. The SFAS is an ensemble prediction system used to forecast total water levels over a broad coastal region and street-scale flood levels for several NYH (New York Harbor) critical infrastructure sites. Every six hours, the underlying H3E (Hydrologic-Hydraulic-Hydrodynamic Ensemble) modeling framework prepares, runs, data-assimilates, and integrates the results from 375 dynamic model simulations to produce actionable, probabilistic ensemble forecasts of upland and coastal (storm surge) flooding conditions with an 81-h forecast horizon. SFAS incorporates multiple models, including ECMWF, GFS, NAM, CMC, etc., and has monitoring sites in multiple locations to obtain data, it has a large investment in both hardware and software, and the results are very reliable for storm surge prediction. The accuracy of SFAS was also confirmed in a recent evaluation-based study by evaluating SFAS's prediction of storm tide and resurgence during Tropical Cyclone Isaias [14]. Xie et al. [15] developed a fast storm surge ensemble forecasting model based on probabilistic prediction and search optimization of hurricane tracks from a numerical scenario database (SONSD). Its fast and accurate characteristics in predicting storm surges (prediction correlation coefficient > 0.9 within 1-2 min) were demonstrated in historical cases. Still, the model has its limitations since requires a large amount of local historical data to build the database, and the data are sometimes not easy to be accessed. Generally, these models might be helpful for predicting storm surges, but they have obvious drawbacks. They do not allow long-term prediction of storm surge activity, the computational process is very complicated and expensive, and the calculation costs are quite high, etc. By reducing the delay and streamlining the process of storm surge prediction, and by quickly and accurately predicting some critical data for each storm surge event, the authorities can be directed to prepare for the disaster and further reduce the damage caused by the disaster.
In recent years, with the development of artificial intelligence, more and more researchers have used data-driven models that introduce the idea of artificial intelligence in predicting storm surges [16][17][18][19][20], and the prediction is accomplished by mapping input and output variables to obtain information from the collected data. Various storm parameters such as sea level height, wind, barometric pressure, and other tropical cyclone characteristics are usually used in this type of prediction method to predict storm surge levels in the future period without directly studying the natural rules that govern the storm surge mechanism. The most commonly used method includes artificial neural networks (ANN). Lee [21] proposed an ANN-based system to predict storm surges in Taiwan. Four input parameters were used to effectively predict storm surge, including wind speed, wind direction, barometric pressure, and tide harmonic analysis data, with a correlation coefficient of 0.98, the effectiveness of the ANN model was verified. Wang et al. [22] proposed an ANN-based system to predict storm surges in coastal Louisiana and achieved better results in some storm tests, but the predictions are not generalizable and still need further improvement. Lee et al. [23] proposed an ANNbased generalized regression neural network (GRNN) that can reduce the computational efforts and improve the prediction efficiency by inputting fewer meteorological parameters to derive the prediction results. However, the accuracy of the predicted data from this model is low, with a correlation coefficient of only 0.916. Sahoo et al. [24] used storm surge prediction based on ANN with storm-related parameters as input for the 1999 Orissa super cyclone with 92% accuracy. Mahmoud et al. [25] proposed an improved ANN to predict the maximum height of storm surge at a specific location, combining physics-based simulations with deep learning to extend the ANN's capabilities. Simulation of over 30,000 TC data obtained the results with centimeter-level errors and showed its potential for performing risk-informed coastal management. Nevertheless, the case data used for testing are too few to reflect the model's generalizability, and the model has a prediction time step of seconds, which is more toward real-time prediction, and longterm prediction results are needed as a basis for practical applications. Other methods include surge prediction system that combines hydrodynamic models and artificial neural networks [26], long short-term memory network (LSTM) [27,28], back propagation neural network (BPNN) [29,30], machine learning models [31], etc., which still have disadvantages such as low prediction accuracy and lack of generalizability.
When performing storm surge prediction, the observed data can be considered as time series. The structural features of NARX neural network make it have better learning efficiency for time series and higher prediction accuracy. There are several previous examples of successful applications of NARX neural networks. Han et al. [32] used a NARX neural network with genetic algorithm incorporated to predict the daily bitcoin price, yielding a mean squared error of 0.00142 for the expected data, indicating that the NARX neural network is suitable for predicting the future direction of the data. Zubier and Eyouni [33] researched the influence of meteorological elements on sea level change in the east-central Red Sea using NARX neural network and were able to effectively identify the effect of meteorological factors on the remaining sea level change, demonstrating the effectiveness of NARX neural network in meteorological prediction. In addition, previous research results have also applied the NARX model to storm surge prediction [34,35]. By inputting storm parameters (NARX-Model-A), a model for storm surge prediction was constructed for the city of Venice. Although more accurate results were obtained, not much improvements were made compared with traditional model. Additionally, this study proved the feasibility of NARX for storm surge prediction as well, but there is still more space for the improvement in the prediction accuracy. In this paper, based on the traditional NARX model, a simple storm surge prediction model, Pruned modular-nonlinear auto-regressive exogenous neural network (PM-NARX) is constructed for predicting future storm surge extremes based on the meteorological parameters obtained at the observation stations.
The rest of the paper is organized as follows. Section 2 shows an introduction to the methodology, starting with a description of the traditional NARX neural network and then presenting the improvements in the PM-NARX neural network over the traditional NARX and at last introducing two comparison models. Section 3 describes the geographical characteristics of the Gulf of Mexico and introduces the selection of site data and the storm surge prediction performance measures. Section 4 describes the simulation experiments and results analysis. The prediction results of PM-NARX and the comparison models are described, and their performance is discussed through subsequent experiments compared with other traditional and improved methods. At last, conclusions are given in Sect. 5.

Methods
This section introduces the NARX neural network structure and its improvement methods, and introduces two other similar methods for the storm surge prediction problem for comparison.

NARX neural network
The NARX neural network is a model proposed by Leontaritis et al. (1985) for describing nonlinear discrete systems [36]. It is one of the most widely used neural networks in nonlinear dynamic systems and is suitable for time series prediction. It has been validated and applied to solve nonlinear series prediction problems in a variety of fields. Its memory effect on historical data enhances its ability to handle dynamic data and has significantly improved the prediction performance of complex sequences [37][38][39]. The basic structure of the NARX model is shown in Fig. 1.
As shown in Fig. 1, the NARX neural network is divided into a three-layer structure, an input layer containing n neural nodes, a hidden layer containing l neural nodes, and an output layer containing m neural nodes. x(t) represents the external input to the neural network, y(t) represents the output of the neural network at time t, and y(t-1) represents the output of the neural network at time t-1. ij represents the association weight of the ith node in the input layer to the jth Fig. 1 The NARX neural network structure diagram node in the hidden layer, jk represents the weight of the jth node in the hidden layer to the kth node in the output layer, a i represents the bias from the input layer to the hidden layer, and b j represents the bias from the hidden layer to the output layer. The model learning rate is , the excitation function is g(x) , which takes the tanh function, and the output mapping is between (−1, 1), monotonically continuous, with a limited output range, optimally stable, and faster convergence because it is 0-mean. The output layer uses Identity as an activation function on the final output node for the regression task. tanh and Identity are expressed as follows [40].
In addition, the neural network uses a certain indicator as a clue to find the optimal weight parameter, which is called the loss function and indicates the degree of difference between the current predicted output of the neural network and the actual measured data. In this paper, MSE is chosen as the loss function, when the gradient is easy to calculate and has a more stable solution, and the calculation formula is expressed by Eq. (3).
In Eq. (3), n represents the number of data individuals, e i represents the prediction error of the ith data. Regarding the selection of training functions (The training function continuously corrects the weights and thresholds during the training process and outputs the trained network and training records) in this model, it can be determined that trainlm and trainbr functions are trained optimally after several experimental validations. The former utilizes the Levenberg-Marquardt optimization algorithm to find the extreme value by gradient, which has the advantages of both the gradient method and the Newton method. The latter utilizes Bayesian regularization, which has better generalization ability and reduces the difficulty of determining the optimal network structure. The difference in training accuracy between the two functions is not significant, but trainlm is finally determined as the training function for this model because of its shorter training time.
The single-step prediction model of the NARX neural network can be expressed as: From Eq. (4), it can be seen that the output of the NARX neural network is influenced by the input and output data of the previous moment, and similarly, the multi-step prediction model with a step ahead can be expressed as: The storm surge level prediction is a multi-step time series prediction like that shown in Eq. (5). The standard NARX neural network uses a closed-loop model in the overall structure of the multi-step prediction, and the output of the neural network is fed directly to the input, as shown in Fig. 2a. Since the prediction is used instead of the actual measured values, prediction errors are accumulated, and thus the performance of the model may deteriorate rapidly as the prediction time horizon increases. For this problem, the open-loop model of the series-parallel neural network shown in Fig. 2b is constructed. In this mode, since the expected output of the NARX neural network training is known (The storm surge observation data), the desired output is fed back to the input. During the training process, this model has two advantages. Firstly, the prediction error of NARX neural network does not accumulate, and the training effect is more accurate. Secondly, the NARX neural network is transformed into a simple forward neural network, and the modeling function of static neural network can be utilized.
In this paper, the direct multi-step prediction method is used, and due to the recursive nature of the NARX neural network itself, the final model combines the characteristics of both direct and recursive multi-step prediction methods. In addition, in order to further improve the prediction speed and accuracy, this paper improves the traditional NARX model in terms of the overall structure of the model and the prediction dataset, respectively.

Sensitivity pruning strategy
Since there are no clear regulations for the setting of the number of nodes in the hidden layer of the neural network, the commonly used methods to determine it are the direct construction method, the deletion method, and the empirical formula method. All of them need to train the model manually several times and determine the neural network structure by evaluating the effect of each training session. However, the neural network model determined by these methods has a high probability of redundant neurons and weights. In order to improve the inefficient utilization of the hidden layer nodes in the training process of feedforward neural networks, this paper analyzes the sensitivity of the model prediction output by the contribution of the output weights of the hidden layer neurons, so as to adjust the overall structure of the neural network, without depending on the input dataset, and the sensitivity is calculated by the hidden layer nodes of the neural network itself so as to complete the pruning operation, which has general applicability. The detailed operation is to remove neurons whose sensitivity is below a certain threshold, which can effectively remove redundant hidden layer neurons without sacrificing network performance, and improve computational efficiency by reducing neural network dimensionality thereby [41][42][43].
In this paper, a single hidden layer, single output neural network is constructed so that the nodes' number of the input layer is i = 1, 2, … n, the nodes' number of the hidden layer is j = 1, 2, … l, and the nodes' number of output layer is k = m = 1.
At moment t, the weight update formula from the hidden layer to the output layer is: is the input of the hidden layer: In addition, e k (t) is t he prediction er ror, is the output of the output layer: Since the number of nodes in the output layer of the model constructed in this paper is 1, the weight jk from the hidden layer to the output layer can be expressed as h , h = 1, 2, … l. Taking h as the input quantity for sensitivity analysis, its sensitivity S h to the predicted output can be expressed by Sobol Index [44] in Eq. (9): In Eq. (9), Var represents the variance, Y represents the desired output, and E represents the conditional expectation of Y when h is equal to I i . The final expression for the sensitivity of the output weights of the hidden layer of the neural network can be obtained by performing the Fourier transform on the input and output quantities (Eq. 10), while the resulting sensitivity values are normalized in order to reduce the computational error (Eq. 11): where A t and B t are represented by Eqs. (12) and (13), respectively [41]: where the range of s is [−π, π]. When the obtained s ′ h is less than the set threshold, it indicates that the current hidden layer neural node is redundant, and the algorithm will delete the neural node and all the values connected to it. The judgment of redundant neural nodes and the deletion operation are performed during the training process. When the neural network structure is determined, the parameters are then modified using the most rapid descent algorithm.
In this paper, test data were selected to verify the feasibility of the sensitivity pruning strategy. (The test data were observed at Miami Biscayne Bay, 25° 43.9′ N, 80° 9.7′ W, June 1, 2020, 0000 GMT to July 30, 2020, 2300 GMT, with an observation interval of 1 h for 1440 sets of port measured surge level, and the prediction time step of 12.) The initial hidden layer neural nodes are set to 30, S ′ h is set to 0.2, and the results are run under the condition of 10,000 training A t cos j s + B t sin j s cos j s ds A t cos j s + B t sin j s sin j s ds times as shown below, the deleted hidden layer nodes end up being 6, and the mean error of the final training result converges to about 0.08 m (Fig. 3a), and the results remained stable in the subsequent test steps, and the error was basically kept within ± 0.05 m as shown in Fig. 3b), there are still a lot of error extremes in the testing process: And as can be seen in Fig. 4a, the pruning strategy removes some of the redundant neurons, which is reflected in the reduction of the hidden layer weights, and the pruned weights are all too small and inactive values, which are marked by black arrows in the figure. In addition, for the 20 subsequent tests conducted in this paper, Fig. 4b shows the changes in the hidden layer weights in Fig. 3 The pruned NARX training and testing process the NARX neural network before and after the pruning step was performed in the form of a box line plot. It can be seen that the changes of the box before and after pruning are not significant, but the distance between the upper and lower edges of the weights array after pruning is larger and the median line is closer to the 0 value, indicating that the weights after pruning are basically maintained uniformly distributed on both sides of the 0 value, and Moreover, in order to confirm the improvement in the time spent in running the traditional NARX neural network by introducing the sensitivity pruning strategy, the 20 tests were performed on the NARX model before and after pruning, respectively, and the correlation of operation time and prediction results are shown in Table 1.
As shown in Table 1, the average correlation of the prediction results of the NARX model processed by the sensitivity pruning strategy decreased by 0.38% compared with the original model, which was basically negligible. In comparison, the average prediction time was reduced by 57% compared with the original model, and the computing speed was greatly improved. However, the test data are more periodic, the prediction difficulty is low, and other subsequent improvements need to be introduced when predicting real storm surge data. The main role of this test is to confirm the effectiveness of the pruning strategy applied to the NARX neural network.

Modular prediction
The modular prediction method is introduced to improve the prediction dataset. In the field of storm surge prediction, traditional data-driven models generally predict surge levels directly. They cannot be targeted to make accurate predictions using the effects of meteorological factors on surge levels. This paper divides the storm surge data as prediction output into an astronomical component influenced by gravity with mathematical regularity and a non-astronomical component manipulated by extreme meteorological factors. By correlating the non-astronomical component data with the astronomical tide data and found that there was a significant negative correlation between the two datasets, as shown in Fig. 5, the overall influence of astronomical tide factors on observed sea level is also weakened. Tidal harmonic analysis is introduced to calculate the former and, the proposed model is used to predict the latter.
It is the most traditional technique in tidal forecasting. Firstly, we separate out the tidal divisions (such as lunar tidal system, solar tidal system, etc.) from the measured tidal records, and find out the amplitude and phase angle of each tidal division. Then, after being revised by astronomical factors, the tidal division's summation constant is obtained, which can be used to calculate the tidal changes within a certain period and analyze the tidal properties of the area. Each tide component can be expressed by Eq. (14): In Eq. (14), h means the height of the component, R represents the amplitude of the component, q t means the angular rate of the component (which can be determined according to the component), V 0 + u denotes the phase angle of the imaginary celestial body at zero universal time at the beginning of the observation period, and K means the phase angle at which the high tide lags behind the moon culmination due to seabed friction and inertia.
The tide component expression can be further written as Eq. (15):  The improved prediction model introduces a modular approach on the basis of pruning by deleting the regular astronomical tide data in the input dataset, replacing the prediction output with the non-astronomical surge data (surge deviation), and finally combining the resulting prediction output with the astronomical tide calculated by the harmonic analysis to derive the final prediction results. It has also been demonstrated in previous studies that the modularized NARX model has better prediction results compared to the conventional model, with improved accuracy and robustness of the model, with a combined degree of improvement in about 23.3% [45].

The improved model structure
As shown in Fig. 6, a complete PM-NARX model consists of two parts, the neural network part and the modular prediction part. Firstly, in the neural network part, the input data are evaluated once at the end of each training process on the sensitivity of the output weights of the hidden layer, so as to filter out redundant neural nodes for pruning operations. The pruned neural network is then tested to check whether the test results satisfy the set conditions, and if not, the internal parameters are adjusted and retrained in the first step, and if the test results satisfy the set conditions, the number of iterations is counted once (K + 1), and the number of training iterations is further determined whether the set upper limit is satisfied. The setting conditions mentioned here are represented by the mean square error of training data (if the current mean square error is less than the set value, it will be recorded as meeting the conditions of this training). This indicator can be set with the. trainParam.goal command. When the number of passing test results satisfies the set condition (K ≥ 6) or the number of training iterations is up to the upper limit, then enter the modular prediction part, the model is brought into the meteorological factor data for predicting nonastronomical surges. Finally, the prediction output from pruned NARX is combined with the astronomical tide data calculated by the tidal harmonic analysis can produce the final storm surge prediction results.

The comparison methods
In order to highlight and verify the superiority and accuracy of the proposed model, other commonly used datadriven models for prediction, including epsilon-support vector regression (ε-SVR) [46] and auto-regressive integrated moving average model (ARIMA) [47,48], are introduced under the same dataset conditions. The predicted results are also compared and analyzed in the form of graphs and tables in Sect. 4, respectively.
The ε-SVR originated from support vector machine (SVM) and was firstly proposed by Vapnik [49] as a binary classification algorithm that supports multivariate classification problems, while being extended to be applied to regression problems as well [50]. The principle of ε-SVR is to construct one or a set of hyper plane in a high-or finite-dimensional space. The regression implemented by the hyper plane, which means to minimize the distance from this hyper plane to the farthest training data point of any class, is functionally represented as shown in Eq. (16): In Eq. (16) b represents the bias and w represents the feature weight vector. By defining the Lagrangian function, the above equation can be transformed as shown in Eq. (17): In Eq. (17), i represents the lagrangian multiplier, x i , y i represent the horizontal and vertical coordinates of the sample points, respectively. Only the samples (support vectors) that play a role in the final w, b calculation will have an impact on their functional models during the linear regression process. Finally the optimized model is derived by minimizing the total loss and maximizing the interval.
Since the traditional SVR can only solve linear regression problems, on its basis, a kernel is introduced for the nonlinear regression problem, and the vector inner product space is extended so that the nonlinear regression problem can be turned into an approximately linear regression problem after the transformation of the kernel, which can be expressed as shown in Eq. (18): In Eq. (18), k x T i x is the kernel, and in this paper, the commonly used Gaussian kernel (Radial Basis Function, RBF) is chosen, which can be expressed as: , and its width σ is the standard deviation of the Gaussian kernel. ARIMA (p, d, q) model is one of the time series prediction and analysis methods, which is extended from ARMA (p, q) model and divided into two parts: auto-regressive (AR) and moving average model (MA). p represents the number of auto-regressive orders, q represents the number of moving average terms, and d represents the number of difference orders made to make the input data a smooth In Eq. (19), y t represents the predicted value, μ represents the constant term, i and i represent the coefficients of the AR part and MA part, respectively, L represents the lag operator, L i represents the regression of the time series by i time points, t represents the error, and the value of d is taken at the data processing step to complete, so it is not mentioned in the Equation.
During the operation of ARIMA, firstly, for the input data, the non-stationary time series data are smoothed through differencing, and the differential order d is determined by the augmented dickey-fuller (ADF) test of smoothness verification. Secondly, the autocorrelation function (ACF) and the partial autocorrelation function (PACF) are used to determine the order p and q of the ARIMA model. Then, the more efficient model is selected by using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), with the following expression: where k is the number of model parameters, t is the number of samples, and L is the likelihood function. Finally, the residuals of the fitted results are checked to see if they satisfy the white noise property, and if so, the model is considered to be constructed and can be used for data prediction.

Materials
In 2020, the North Atlantic produced 31 tropical cyclones, including 13 hurricanes. There are 9 tropical cyclones that made landfall on the northern Gulf Coast, including 7 hurricanes, accounting for 53.8% of the total number of (19) hurricanes. In addition, 4 tropical cyclones are affecting the southeast coast of the USA, and the rest are tropical cyclones affecting southern Mexico (near the equator) or far from land. It can be seen that the North Atlantic hurricanes are mainly concentrated in the Gulf of Mexico, and it is more valuable to make more rapid and accurate storm surge gain prediction for this region. The named tropical cyclones that made landfall in the USA in the northern Gulf of Mexico in 2020 are shown in Table 2: As shown in Table 2, the impact of the 2020 North Atlantic hurricane season on the Gulf of Mexico is evenly distributed among the southern states of the USA. To verify the prediction effect of PM-NARX on the impact of different tropical cyclones, a number of stations in southeastern Texas, southern Louisiana, and southwestern Florida were selected to predict the storm surge at specific stations. The basic information of the stations and the actual observed storm surge data of each station are shown in Table 3 and Fig. 7: Accordingly, the storm surge time series extracted from each observation station are listed in Figs. 8,9,10 in the form of line graphs. Because the complete observation data is too long, only 1 month (720 h) of data to be predicted in these figures.
The measured data are sampled every 1 h, and the data are listed to obtain the time series of surge level for that  Concerning the selection of training data, it is a highly complex process because the causes and influencing factors of storm surges are very complicated, mainly the joint action of waves and atmosphere, where the atmospheric action is dominant and can be divided into the action of air pressure and wind stress. To this point, five nonlinear meteorological factors (wind speed, wind direction, gust speed, air temperature, and pressure) corresponding to the time of storm surge were selected for this paper, and all of the above data were queried and downloaded through the NOAA website. In this paper, the corresponding historical data are selected for each station to train the PM-NARX and the two comparison models. Initially, the meteorological observation data of 20 months before the occurrence of storm surge are extracted as the training set, and the meteorological data will have about 5% missing. The interpolation method is adopted for completing the processing in order to retain the continuous pattern of the data. In addition, during the training process, instead of following the general approach of extracting storm surge data from historical data to build a separate training dataset, the corresponding historical storm surge data (e.g., Michael, Imelda, Cristobal, etc.) and normal weather data of other time periods are mixed to build the training dataset for different stations. To enhance the model's ability to respond to unexpected conditions and the accuracy of predicting future storm surge data, and to reduce the possibility of misleading storm surge by the model.
In order to compare the prediction data more intuitively, three indicators, mean absolute error (MAE), root mean square error (RMSE), and correlation coefficient (CC), are introduced as the main basis for judging the prediction effect, and the calculation formula is shown in the equations below: where n is the number of samples, or it is interpreted as a time index in time series analysis; y k and y k represent the observed values and the average of the observed values, respectively; and ŷ k and ŷ k represent the predicted values and the average of the predicted values, respectively.
The MAE is the average of the absolute values of the deviations of all individual observations from the arithmetic mean. Making a smaller value of MAE in the measure indicates a more accurate prediction result. The introduction of this indicator can avoid the problem of positive and negative errors canceling each other out and thus can accurately reflect the magnitude of the actual prediction error.
The RMSE can effectively reflect the measurement precision. Making a smaller value of RMSE in the measure indicates a more accurate prediction result. This indicator has been squared and rooted to amplify the gap between larger errors, and if there are individual outliers with very large deviations, even if the number of outliers is very small, it makes the RMSE indicator poor, which is the main difference between RMSE and MAE.
In contrast, the CC is a statistical index reflecting the closeness of the degree of correlation between variables and is calculated according to the product difference method. Based on the deviations between two variables and their respective averages, the degree of correlation between these two variables is reflected by multiplying the two deviations. Generally, a CC above 0.7 indicates that the relationship is very close; 0.4 ~ 0.7 indicates a close relationship; and 0.2 ~ 0.4 shows that the relationship is general. The larger the CC, the better the prediction effect.

Simulation and analysis
In this section, the simulation results and error evaluation indicators of different models for storm surge level are compared and analyzed. In this paper, ARIMA and ε-SVR are selected as the control models for simultaneous prediction with PM-NARX. In addition, the prediction of the PM-NARX is also compared and analyzed with other existing storm surge prediction models. Finally, an attempt is made to reduce the training dataset of PM-NARX to determine the minimum amount of training data that enables the model to make accurate predictions with informative values.

Storm surge level prediction simulation
The parameters of each model need to be set before conducting the prediction simulation, respectively. Firstly, the PM-NARX model needs to set the number of neural nodes inside the neural network (input layer, hidden layer, and output layer), the sensitivity threshold s ′ h of the neural nodes in the hidden layer, the maximum number of training times of the model, and the TDL value. Secondly, the ARIMA model requires setting the values of the p, d, and q parameters. Finally the ε-SVR model requires setting the penalty parameter c, the kernel parameter g, and the tolerance deviation ε. The parameters of PM-NARX in the three models can be applied to all station dataset: Since the input dataset contains 5 meteorological factors and the output data is the surge level at a certain moment, the input layer and output layer neural nodes are set to 5 and 1, respectively, and the implied layer neural nodes are initially set to 30, s ′ h is set to 0.2, and the maximum training times are set to 5000. When determining the TDL, since the non-astronomical surge separated from the actual observed sea level data can be regarded as white noise with basically no autocorrelation, but considering that the predicted output will eventually be superposed with the astronomical tide, the TDL is determined to be 24 according to the tidal period. The other two models need to adjust their internal parameters according to the data of each station separately, as shown in Table 4.
From the perspective of practical application, the key time for hurricane prevention is 1 to 6 h before hurricane landfall, and all preparations should be completed within 12 h before hurricane landfall, so the advance time step a of the simulation is initially set to 12 in this paper. In this way, the preparation for storm surge prediction simulation is basically completed, and the prediction output can be obtained by bringing each station dataset into each model, as shown in the figure below (Figs. 11, 12, 13).
For S1-3, the former half of the storm surge data shows a cyclical upward trend, and the latter half shows a storm surge extreme and decreases cyclically. Since the Beta rating is TS and the wind is not strong, there is apparent periodicity of the observed data, and the prediction for it is relatively easy. From the prediction results, the prediction outputs of the three models are basically consistent with the actual observed surge data, but the error plots can be used for comparison so as to show that the prediction error of PM-NARX is the smallest and the smoothest, and the prediction error of ARIMA is the largest, with an error of more than 0.2 m even in S1, while the prediction effect of ε-SVR lies between the other two.
For S4-6, the storm surge data are relatively stable overall. Still, there is a brief period of extreme surge at each station under the direct influence of C4 Hurricane Delta, which is different from the overall data pattern and is challenging to predict by general data-driven models. All three models were able to accurately identify the moment when such anomalous data appeared in the surge level data, but the gap between the models could be seen in the accuracy of the predicted values, and the prediction results of all three models fluctuated drastically at the anomalous data moment. The ARIMA model has a prediction value that differs from the observed surge level by more than 0.5 m, and the error is more in the high prediction value that exceeds the observed value, with a difference of 0.92 m at station S6. The error of the ε-SVR model is more in the low prediction value that fails to reach the observed value, with a difference of − 0.96 m at station S4. In contrast, the PM-NARX model also has prediction fluctuation, but the extent is lower. The prediction results of S4-5 show the obvious advantage of PM-NARX, and only one error of − 1 m occurred after the anomalous data at S6, which is the same as ε-SVR. However, its prediction of surge level extreme is still the most accurate among the three models. For S7-8, although hurricane Eta is also C4, its wind has weakened when it moves near the stations, and its intensity has dropped to TS. So, the storm surge data at S7-8 are similar to S1-3, and the prediction effects of the three models are relatively close, and the error fluctuations are all basically controlled within 0.1 m. However, compared with the error plot, it can be seen that the prediction error of ARIMA has the most significant fluctuation at station S7. The prediction error of ε-SVR has the most significant fluctuation at station S8, and PM-NARX has an error of more than 0.2 m. This error has also appeared in the previous simulations, appearing a few hours after the storm surge extreme, but it has little impact on the reliability of the model prediction results. From an overall perspective, the PM-NARX model has the slightest and most stable prediction error, and its performance is still dominant.
For the prediction results of the three models 12 h ahead (prediction lead time a = 12), the prediction error evaluation indexes of the three models at the selected stations are compiled and calculated in this paper. In addition to the MAE, RMSE, and CC mentioned above, the extreme of positive Fig. 11 The comparison of predicted outputs and errors from three models at three stations (S1-3) located in Texas and negative errors and the time cost for prediction are additionally included, and are presented in Table 5: According to the comparison of prediction results and the prediction evaluation indicators in Table 6, it can be seen that all three models can produce relatively precise prediction results, but in comparison, the PM-NARX model has the most stable and accurate prediction results at all stations, with a mean CC value of 0.9939, which is higher than the same indicator of the ε-SVR and ARIMA models of 0.9863 and 0.9811, respectively. The information that is not reflected in the prediction results and error comparison figures can be seen by comparing the three evaluation indicators. The increase in MAE may be due to a rise in the number of error extrema points in the prediction error or the appearance of larger error extrema. However, the increase of RMSE may be because the overall stability of the prediction error is reduced, and the error extremes alone have little effect on this indicator, so the state of the prediction error can be judged from the magnitude of these two indicators. For example, PM-NARX has a negative error extreme of − 1 m at station S6, leading to an increase in MAE values, and because of the simultaneous presence of large positive error extremes, indicating overall fluctuations in the prediction results, leading to a simultaneous increase in RMSE values. By analyzing the error evaluation indicators, it can be seen that PM-NARX has the lowest mean values of MAE and RMSE, and is able to output accurate, reference-value prediction results consistently at all selected stations. There The reason for this situation may be the compensation mechanism limited by the structure of the NARX model itself. However, fortunately, the prediction of the timing of the extreme value of water gain is still accurate. Finally, at station S7-8, although the three models are close to each other, and even the PM-NARX model has a negative error of 0.24 m, its MAE and RMSE values are the lowest among the three models, indicating that the PM-NARX model is the best in terms of accuracy and stability of prediction results, which can also be seen from the comparison of CC values.
In addition, in terms of the time cost, PM-NARX takes the shortest time and has a high accuracy while producing faster prediction results. Overall, the average prediction time of PM-NARX model is 55.6 s per station, which is 39% and 94% better than the ε-SVR model (91.25 s) and ARIMA model (938.125 s), respectively. Moreover, in order to verify the robustness of PM-NARX, this paper conducted a further simulation test and changed the prediction conditions to 24 h and 48 h in advance (a = 24 and a = 48) for prediction, and the results of the three simulation tests were compared as a whole as follows: It can be seen from Fig. 14 that increasing a makes the prediction performance of PM-NARX slightly decreased, which is obvious in all three indicators except Time, especially in the station S4-6, which has higher prediction difficulty, the stability and accuracy of the prediction results decreased more compared with other stations due to a brief extreme surge, and the maximum error also increased, but the CC value still remains above 0.9. Specifically, increasing a will make the prediction results less relevant, stable, and accurate, making PM-NARX unable to make long-term predictions for storm surge generated by the C4 hurricanes, but the results can still be used as a basis for storm surge generated by the tropical storms. In addition, in terms of the time taken, the average prediction times of the three sets of simulation tests at each station are 55.6 s, 58 s, and 56.375 s, respectively, which are not very different, so that the change of a does not affect the prediction efficiency.

Comparison with other models
This paper also collected other researchers' published storm surge prediction models, and to ensure the fairness of the comparison, the selected models are all data-driven models Fig. 13 The comparison of predicted outputs and errors from three models at two stations (S7-8) located in Florida based on artificial neural networks and storm parameters, and the storm surge prediction results at a = 12 are chosen uniformly. The selected models include SLPVRF [28], NARX-Model-A [34], and SSM (Storm Surge Model) [51], and the prediction evaluation indicators of the optimal results derived from each of these four models are compared with PM-NARX, including MAE, RMSE, and CC. For these models, due to the literature's selected evaluation indicators are different and have certain limitations, so only two evaluation indicator parameters are collected in this paper.
From Table 6 and Fig. 15, PM-NARX is the smallest in the comparison of known values in terms of MAE and RMSE values, indicating that it has the highest prediction accuracy and the best stability. Moreover, in terms of CC values, although the prediction results of all four models are highly linearly correlated with the observed values (> 0.9), PM-NARX has the highest correlation in comparison, which laterally proves its prediction accuracy. On the whole, the prediction results of PM-NARX model perform more satisfactorily.

Testing with reduced training dataset
Previous researches have used ANN models for storm surge prediction with an extensive training dataset. This paper uses the recursive nature of the NARX neural network to attempt multiple reductions of the training dataset for storm surge prediction. Since there are too many stations and this experiment focuses on testing the characteristics of the PM-NARX model itself, which is largely independent of the data value. Therefore, in this paper, two representative stations are selected based on the output results of eight prediction stations, S2, which has the relatively best prediction effect, and S6, which has the relatively worst prediction effect, and the training dataset corresponding to the representative stations are reduced in time order to obtain four different lengths of training data: 12 months, 6 months, 3 months, and 1 month. The complete training dataset and the reduced ones are brought into the model simultaneously and the predicted storm surge level under the condition of a = 12 will be output, and the evaluation indicators of the prediction results are compared as shown in Fig. 15. From Fig. 16, at station S2, where the observed data are relatively stable, the training dataset is reduced to 6 months and PM-NARX can still predicts the storm surge data with the informative prediction results, while at station S6, the prediction error and the increasing magnitude of the error are more obvious due to the presence of brief extreme surge, so the length of the training data that can produce informative prediction results is 12 months. In terms of time cost, the model cost about the same amount of time for datasets of 20 and 12 months in length, with an average of about 50 s. However, there is a watershed at 6 months, where the time taken is around 20 s, which is a significant reduction in the prediction time compared to applying 20 and 12 months datasets. The prediction results can be obtained within 10 s when applying 3 months or less training data, but the prediction output of the model is no longer informative due to the large error. Overall, the prediction accuracy and stability of the PM-NARX model are significantly reduced after reducing the training dataset, but the prediction time is correspondingly reduced by about 50%.

Conclusion
This paper proposes a storm surge level prediction model (PM-NARX) based on an improved NARX neural network. By using meteorological parameters as input, the neural network structure is automatically determined by integrating a strategy of neuron sensitivity pruning, introducing a modular prediction method for harmonic analysis data, and dividing the observed surge level data into the astronomical tide and non-astronomical surge components to achieve a targeted prediction of storm surge level. The PM-NARX was trained and simulated on a dataset collected from eight stations along the northern coast of the Gulf of Mexico. A comparison of the MAE, RMSE, and CC values of the prediction results of PM-NARX with other common data-driven models shows that PM-NARX performs more accuracy and stability of the prediction results under the condition of 12 h advance (prediction lead time a = 12) than other models. However, PM-NARX is further demonstrated to be unable to predict storm surges caused by hurricanes beyond C4 in the long term by testing with increased a value. In addition, the test with a reduced training dataset demonstrates that the PM-NARX model can predict more accurate results even with fewer training data (at least 6 months), and the time required for prediction will be reduced by more than 50%. This can reduce the prediction cost and enhance the generalizability of the model to some extent and PM-NARX model can be very helpful for the prediction of storm surge in areas with relatively little historical data. Finally, since PM-NARX is a data-driven model and is not limited to storm surge prediction, it can even be applied to other time series multi-step prediction problems beyond storm surge prediction as well.
Teaching Guidance Sub-committee of the Teaching Guidance Committee of the Ministry of Education for Transportation Majors in Colleges and Universities (2022jzw004) and the Fundamental Research Funds for the Central Universities (3132022142) in China.
Author contributions All authors have contributed to the creation of this manuscript for important intellectual content and read and approved the final manuscript.
Funding We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work, there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled, "Storm Surge Level Prediction Based on Improved NARX Neural Network." Data availability Enquiries about data availability should be directed to the authors.