Simulation Materials. In 2020, the North Atlantic produced 31 tropical cyclones, including 13 hurricanes. Nine of these tropical cyclones made landfall on the northern Gulf Coast, including 7 hurricanes, accounting for 53.8% of all hurricanes that season. In addition, 4 tropical cyclones affected the southeast coast of the United States, while the remainder affected southern Mexico (near the equator) or stayed far from land. The North Atlantic hurricane season was therefore concentrated mainly in the Gulf of Mexico, and its impact was distributed fairly evenly among the southern states of the U.S. To verify how well PM-NARX predicts the impact of different tropical cyclones, a number of stations in southeastern Texas, southern Louisiana, and southwestern Florida were selected for storm surge prediction at specific stations. The basic information of the selected stations and the actual observed storm surge data of each station are shown in Table 1:
Table 1
Basic information about the selected stations

Station Area | Station Number | Station ID | Longitude | Latitude
Texas | S1 | 8771341 | 94°43.5′W | 29°21.4′N
Texas | S2 | 8775792 | 97°14.2′W | 27°38.0′N
Texas | S3 | 8779770 | 97°12.9′W | 26°3.7′N
Louisiana | S4 | 8761927 | 90°6.8′W | 30°1.6′N
Louisiana | S5 | 8764227 | 91°20.3′W | 29°27.0′N
Louisiana | S6 | 8768094 | 93°20.6′W | 29°46.1′N
Florida | S7 | 8725520 | 81°52.3′W | 26°38.9′N
Florida | S8 | 8726520 | 82°37.6′W | 27°45.6′N
Accordingly, the storm surge time series extracted from each observation station are plotted as line graphs in Figs. 1, 2 and 3. Because the complete observation records are long, only the one month (720 h) of data to be predicted is shown in these figures.
The measured data are sampled every 1 h and assembled into a time series of surge level for the period of interest. In this paper, each prediction covers a total of 720 data points over 30 days (Figs. 9–11); this longer time span makes it possible to analyze whether the model can accurately predict when the storm surge extremes occur and return the predicted values to the normal range in time. For stations S1–S3 in Texas, this paper studies the impact of Tropical Storm BETA on the region, so the measured storm surge data from September 1, 2020 GMT 0000 to September 30, 2020 GMT 2300 were selected as the prediction output for the simulation prediction test. For stations S4–S6 in Louisiana, this paper studies the impact of Hurricane DELTA on the region, so the observed surge data from October 1, 2020 GMT 0000 to October 30, 2020 GMT 2300 were selected. For stations S7–S8 in Florida, this paper studies the impact of Hurricane ETA on the region, so the observed surge data from November 1, 2020 GMT 0000 to November 30, 2020 GMT 2300 were selected.
Selecting the training data is a highly complex process because the causes and influencing factors of storm surges are complicated: they arise mainly from the joint action of waves and the atmosphere, with the atmospheric contribution dominant and separable into air-pressure and wind-stress effects, among others. Accordingly, five non-linear meteorological factors (wind speed, wind direction, gust speed, air temperature, and pressure) recorded at the same times as the storm surge observations were selected for this paper, and the corresponding historical data at each station are used to train PM-NARX and the two comparison models.
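As a minimal sketch, the five meteorological factors can be assembled into the exogenous input matrix fed to the network, one row per hourly sample. The variable names and random values below are illustrative placeholders, not the stations' actual records:

```python
import numpy as np

# Synthetic hourly records for one station (placeholder values only).
rng = np.random.default_rng(0)
n_hours = 720  # one month of hourly samples
met = {
    "wind_speed":     rng.uniform(0, 25, n_hours),    # m/s
    "wind_direction": rng.uniform(0, 360, n_hours),   # degrees
    "gust_speed":     rng.uniform(0, 35, n_hours),    # m/s
    "air_temp":       rng.uniform(15, 35, n_hours),   # °C
    "pressure":       rng.uniform(980, 1030, n_hours),# hPa
}

# Stack the five factors into the NARX exogenous input matrix x(t).
X = np.column_stack([met[k] for k in met])
print(X.shape)  # (720, 5)
```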
In order to compare the prediction results more intuitively, three indicators, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the Correlation Coefficient (CC), are introduced as the main basis for judging prediction performance; they are calculated by Equations (1–3):
\(MAE=\frac{1}{n}\sum _{k=1}^{n}\left|{y}_{k}-{\widehat{y}}_{k}\right|\)  (1)

\(RMSE=\sqrt{\frac{\sum _{k=1}^{n}{\left({y}_{k}-{\widehat{y}}_{k}\right)}^{2}}{n}}\)  (2)

\(CC=\frac{\sum _{k=1}^{n}\left({y}_{k}-{\stackrel{-}{y}}_{k}\right)\left({\widehat{y}}_{k}-{\stackrel{-}{\widehat{y}}}_{k}\right)}{\sqrt{\sum _{k=1}^{n}{\left({y}_{k}-{\stackrel{-}{y}}_{k}\right)}^{2}\sum _{k=1}^{n}{\left({\widehat{y}}_{k}-{\stackrel{-}{\widehat{y}}}_{k}\right)}^{2}}}\)  (3)
where n is the number of samples and k is the time index of the series; \({y}_{k}\) and \({\stackrel{-}{y}}_{k}\) represent the observed values and their average, respectively; and \({\widehat{y}}_{k}\) and \({\stackrel{-}{\widehat{y}}}_{k}\) represent the predicted values and their average, respectively.
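Equations (1–3) can be implemented directly; the sketch below uses made-up observed and predicted values and assumes nothing beyond the formulas themselves:

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error, Eq. (1)."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root Mean Square Error, Eq. (2)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def cc(y, y_hat):
    """Correlation coefficient, Eq. (3)."""
    yd, pd = y - y.mean(), y_hat - y_hat.mean()
    return np.sum(yd * pd) / np.sqrt(np.sum(yd ** 2) * np.sum(pd ** 2))

# Toy observed vs. predicted surge levels (illustrative only).
y = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([1.1, 1.9, 3.2, 3.8])
print(mae(y, p), rmse(y, p), cc(y, p))  # 0.15, ~0.158, ~0.991
```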
Simulation Method. The NARX neural network is a model proposed by Leontaritis et al. (1985) for describing nonlinear discrete systems [22]. It is one of the most widely used neural networks for non-linear dynamic systems and is well suited to time series prediction, having been validated and applied to non-linear series prediction problems in a variety of fields. Its memory of historical data enhances its ability to handle dynamic data and significantly improves prediction performance on complex sequences [23–25]. The basic structure of the NARX model is shown in Fig. 4.
As shown in Fig. 4, the standard NARX neural network has a three-layer structure: an input layer containing n neural nodes, a hidden layer containing l neural nodes, and an output layer containing m neural nodes. x(t) represents the external input to the neural network, y(t) the output of the network at time t, and y(t-1) the output at time t-1. \({\omega }_{ij}\) represents the weight connecting the i-th node of the input layer to the j-th node of the hidden layer, \({\omega }_{jk}\) the weight connecting the j-th node of the hidden layer to the k-th node of the output layer, \({a}_{j}\) the bias from the input layer to the hidden layer, and \({b}_{k}\) the bias from the hidden layer to the output layer. The model learning rate is \(\eta\). The activation function \(g\left(x\right)\) is the tanh function, which maps the output monotonically and continuously onto (-1, 1); because it is zero-mean with a bounded output range, it is stable and converges faster. The output layer uses the Identity activation function on the final output node for the regression task. tanh and Identity are expressed as follows [26]:
$$g\left(x\right)=\frac{{e}^{x}-{e}^{-x}}{{e}^{x}+{e}^{-x}}$$
(4)
$$Identity\left(x\right)=x$$
(5)
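The two activation functions of Eqs. (4) and (5) are straightforward to write out; this small sketch just evaluates them:

```python
import math

def g(x):
    """Hidden-layer activation: the tanh function of Eq. (4)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def identity(x):
    """Output-layer activation of Eq. (5), used for the regression output."""
    return x

# tanh is zero-mean and maps onto (-1, 1); Identity passes values through.
print(g(0.0), g(2.0), identity(2.5))
```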
In addition, the neural network uses a certain indicator as a guide to find the optimal weight parameters. This indicator, called the loss function, measures the difference between the network's current predicted output and the actual measured data. In this paper, MSE is chosen as the loss function because its gradient is easy to compute and it yields a more stable solution; it is expressed by Eq. (6).
$$loss=\frac{1}{n}\sum _{i=1}^{n}{\left({e}_{i}\right)}^{2}$$
(6)
In Eq. (6), n represents the number of data points and \({e}_{i}\) represents the prediction error of the i-th data point. Regarding the training functions for this model, several experimental validations showed that the trainlm and trainbr functions train best. The former uses the Levenberg-Marquardt optimization algorithm, which finds extrema by gradient and combines the advantages of the gradient method and Newton's method. The latter uses Bayesian Regularization, which generalizes better and reduces the difficulty of determining the optimal network structure. Since the difference in training accuracy between the two functions is not significant, trainlm was finally chosen as the training function for this model because of its shorter training time.
The single-step prediction model of the NARX neural network can be expressed as:
$$y\left(t\right)=f\left(y\left(t-1\right),y\left(t-2\right),\dots ,y\left(t-{n}_{y}\right),x\left(t-1\right),x\left(t-2\right),\dots ,x\left(t-{n}_{x}\right)\right)$$
(7)
From Eq. (7), it can be seen that the output of the NARX neural network is influenced by the input and output data of previous moments; similarly, the multi-step prediction model a steps ahead can be expressed as:
$$y\left(t\right)=f\left(y\left(t-a\right),y\left(t-a-1\right),\dots ,y\left(t-{n}_{y}-a+1\right),x\left(t-a\right),x\left(t-a-1\right),\dots ,x\left(t-{n}_{x}-a+1\right)\right)$$
(8)
Storm surge level prediction is a multi-step time series prediction of the kind shown in Eq. (8). However, in multi-step prediction the standard NARX neural network uses a closed-loop model with a parallel structure, as shown in Fig. 5(a), in which the output of the network is fed directly back to the input (Fig. 4). Since predictions are used in place of the actual measured values, prediction errors accumulate, and the performance of the model may deteriorate rapidly as the prediction horizon increases. To address this problem, the open-loop neural network with the series-parallel structure shown in Fig. 5(b) is constructed. In this mode, since the expected output of NARX training is known (the observed storm surge data), the desired output is fed back to the input instead.
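The difference between the two feedback structures can be seen with a toy one-step map standing in for the trained network; the map and its coefficients are illustrative assumptions, not fitted values from the paper:

```python
import numpy as np

# Toy one-step map in place of the trained NARX network. The gain of
# 1.1 (an assumption) makes any fed-back error grow step by step.
def f(y_prev, x_prev):
    return 1.1 * y_prev + 0.1 * x_prev

x = np.ones(10)                  # exogenous input series
y_meas = np.zeros(10)
y_meas[0] = 1.0
for t in range(1, 10):           # "measured" series generated by the map
    y_meas[t] = f(y_meas[t - 1], x[t - 1])

# Closed loop (parallel, Fig. 5a): predictions are fed back, so an
# initial error is re-amplified at every step.
y_cl = np.zeros(10)
y_cl[0] = 1.05                   # small initial error
for t in range(1, 10):
    y_cl[t] = f(y_cl[t - 1], x[t - 1])

# Open loop (series-parallel, Fig. 5b): the known measured output is
# fed back instead, so the error does not accumulate.
y_ol = np.zeros(10)
y_ol[0] = 1.05
for t in range(1, 10):
    y_ol[t] = f(y_meas[t - 1], x[t - 1])

print(abs(y_cl[-1] - y_meas[-1]), abs(y_ol[-1] - y_meas[-1]))
```

Here the closed-loop error grows with each recursion while the open-loop error vanishes as soon as measured values replace the fed-back predictions.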
Figure 5 Diagram of the two structures of the NARX neural network, where TDL denotes the tapped delay line, y(t) is the known desired output, and Y(t) is the predicted tide height
In this paper, the direct multi-step prediction method is used; owing to the recursive nature of the NARX neural network itself, the final model combines the characteristics of both direct and recursive multi-step prediction. In addition, to further improve prediction speed and accuracy, this paper improves the traditional NARX model in two respects: the overall structure of the model and the prediction data set.
Sensitivity pruning strategy. There are no clear rules for setting the number of nodes in the hidden layer of a neural network. To address this, this paper analyzes the sensitivity of the model's prediction output to the contribution of the output weights of the hidden-layer neurons and adjusts the overall structure of the network accordingly. Because the sensitivity is calculated from the hidden-layer nodes of the network itself, the pruning operation does not depend on the input data set and is generally applicable. In detail, neurons whose sensitivity falls below a set threshold are removed; this effectively eliminates redundant hidden-layer neurons without sacrificing network performance and improves computational efficiency by reducing the network's dimensionality [27–29].
In this paper, a neural network with a single hidden layer and a single output is constructed, so the input-layer nodes are indexed i = 1, 2, …, n, the hidden-layer nodes are indexed j = 1, 2, …, l, and the output layer has k = m = 1 node.
At moment t, the weight update formula from the hidden layer to the output layer is:
$${\omega }_{jk}^{{\prime }}\left(t\right)={\omega }_{jk}\left(t\right)+\eta {H}_{j}\left(t\right){e}_{k}\left(t\right)$$
(9)
In Eq. (9), \({H}_{j}\left(t\right)\) is the output of the hidden layer, \({H}_{j}\left(t\right)=g\left({I}_{i}\left(t\right)\right)\), where \({I}_{i}\left(t\right)\) is the input of the hidden layer:
$${I}_{i}\left(t\right)=\sum _{i=1}^{{n}_{x}}{\omega }_{ij}\left(t\right){x}_{{j}_{x}}\left(t\right)+\sum _{i=1}^{{n}_{y}}{\omega }_{ij}\left(t\right){y}_{{j}_{y}}\left(t\right)+{a}_{j}, \left\{\begin{array}{c}{j}_{x}+{j}_{y}=j\\ {n}_{x}+{n}_{y}=n\end{array}\right.$$
(10)
In addition, \({e}_{k}\left(t\right)\) is the prediction error, \({e}_{k}\left(t\right)=\left({Y}_{k}\left(t\right)-{O}_{k}\left(t\right)\right)\), where \({Y}_{k}\left(t\right)\) is the desired output and \({O}_{k}\left(t\right)\) is the output of the output layer:
$${O}_{k}\left(t\right)=\sum _{j=1}^{l}{H}_{j}\left(t\right){\omega }_{jk}\left(t\right)+{b}_{k}$$
(11)
Since the number of nodes in the output layer of the model constructed in this paper is 1, the weight \({\omega }_{jk}\) from the hidden layer to the output layer can be written as \({\omega }_{h}\), h = 1, 2, …, l. Taking \({\omega }_{h}\) as the input quantity for sensitivity analysis, its sensitivity \({S}_{h}\) to the predicted output can be expressed by Eq. (12):
$${S}_{h}=\frac{{Var}_{{\omega }_{h}}\left[E\left(\left.Y\right|{I}_{i}={\omega }_{h}\right)\right]}{{Var}\left(Y\right)}$$
(12)
In Eq. (12), Var denotes the variance, Y denotes the desired output, and \(E\left(\left.Y\right|{I}_{i}={\omega }_{h}\right)\) denotes the conditional expectation of Y when \({I}_{i}\) equals \({\omega }_{h}\). The final expression for the sensitivity of the output weights of the hidden layer is obtained by performing a Fourier transform on the input and output quantities (Eq. 13), and the resulting sensitivity values are normalized to reduce computational error (Eq. 14):
$${S}_{h}=\frac{\sum _{t=1}^{+\infty }\left({A}_{t{\omega }_{h}}^{2}+{B}_{t{\omega }_{h}}^{2}\right)}{\sum _{t=1}^{+\infty }\left({A}_{t}^{2}+{B}_{t}^{2}\right)}$$
(13)
$${S}_{h}^{{\prime }}=\frac{{S}_{h}}{\sum _{h=1}^{l}{S}_{h}}$$
(14)
where \({A}_{t}\) and \({B}_{t}\) are given by Equations (15) and (16), respectively:
$${A}_{t}=\frac{1}{2\pi }{\int }_{-\pi }^{\pi }\sum _{t=0}^{\infty }\left({A}_{t}{cos}\left({\omega }_{j}s\right)+{B}_{t}{sin}\left({\omega }_{j}s\right)\right){cos}\left({\omega }_{j}s\right)ds, -\pi \le s\le \pi$$
(15)
$${B}_{t}=\frac{1}{2\pi }{\int }_{-\pi }^{\pi }\sum _{t=0}^{\infty }\left({A}_{t}{cos}\left({\omega }_{j}s\right)+{B}_{t}{sin}\left({\omega }_{j}s\right)\right){sin}\left({\omega }_{j}s\right)ds, -\pi \le s\le \pi$$
(16)
When the computed \({S}_{h}^{{\prime }}\) is less than the set threshold, the corresponding hidden-layer node is redundant, and the algorithm deletes that node together with all weights connected to it. Redundant nodes are identified and deleted during the training process. Once the network structure is determined, the parameters are then refined using the steepest-descent algorithm.
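The pruning rule itself reduces to normalizing the sensitivities (Eq. 14) and dropping any node below the threshold. A minimal sketch, with made-up sensitivity values and weights rather than values computed from a trained network:

```python
import numpy as np

def prune_hidden_nodes(w_out, s, threshold):
    """Drop hidden nodes whose normalized sensitivity S'_h (Eq. 14)
    falls below `threshold`; return kept indices and their weights."""
    s_norm = s / s.sum()                    # Eq. (14)
    keep = np.where(s_norm >= threshold)[0]
    return keep, w_out[keep]

# Illustrative raw sensitivities for 6 hidden nodes (not real data).
s = np.array([0.40, 0.25, 0.02, 0.18, 0.01, 0.14])
w = np.linspace(0.1, 0.6, 6)                # hidden->output weights ω_h
keep, w_pruned = prune_hidden_nodes(w, s, threshold=0.05)
print(keep)  # nodes 2 and 4 are removed as redundant
```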
In this paper, a set of test data was selected to verify the feasibility of the sensitivity pruning strategy. The initial number of hidden-layer nodes is set to 30, the threshold on \({S}_{h}^{{\prime }}\) is set to 0.2, and the network is run for 10,000 training iterations. Six hidden-layer nodes end up being deleted, and the mean square error of the final training result converges to about 0.08 m; the test process and results are shown in Figs. 6 and 7:
As seen in the figures above, the pruning process removes some redundant neurons, making the whole network more compact. In addition, to verify the improvement that the sensitivity pruning strategy brings to the traditional NARX neural network, five tests were performed on the NARX model before and after pruning, and the correlation coefficients of the prediction results and the run times are shown in Table 2.
Table 2
Correlation of prediction results and time cost between the traditional NARX and the pruned NARX

Model | CC | Time/s
Traditional NARX | 0.9878 | 147
Traditional NARX | 0.9827 | 143
Traditional NARX | 0.9876 | 106
Traditional NARX | 0.9880 | 128
Traditional NARX | 0.9863 | 115
Mean value | 0.9865 | 128
Pruned-NARX | 0.9818 | 50
Pruned-NARX | 0.9827 | 56
Pruned-NARX | 0.9820 | 47
Pruned-NARX | 0.9847 | 66
Pruned-NARX | 0.9828 | 56
Mean value | 0.9828 | 55
As shown in Table 2, the average correlation coefficient of the predictions from the NARX model processed by the sensitivity pruning strategy decreased by 0.38% compared with the original model, which is essentially negligible. In comparison, the average prediction time was reduced by 57% compared with the original model, a large improvement in computing speed.
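The summary figures quoted above can be re-derived from the five trial runs in Table 2 (recomputing from the unrounded means gives a CC drop of about 0.37%; the quoted 0.38% follows from the rounded means):

```python
# Re-deriving the Table 2 summary figures from the five trial runs.
cc_trad  = [0.9878, 0.9827, 0.9876, 0.9880, 0.9863]
cc_prune = [0.9818, 0.9827, 0.9820, 0.9847, 0.9828]
t_trad   = [147, 143, 106, 128, 115]
t_prune  = [50, 56, 47, 66, 56]

def mean(xs):
    return sum(xs) / len(xs)

cc_drop  = (mean(cc_trad) - mean(cc_prune)) / mean(cc_trad) * 100
time_cut = (mean(t_trad) - mean(t_prune)) / mean(t_trad) * 100
print(round(cc_drop, 2), round(time_cut, 1))  # ≈ 0.37 and 57.0
```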
Modular prediction. The modular prediction method is introduced to improve the prediction dataset. In the field of storm surge prediction, traditional data-driven models generally predict surge levels directly and therefore cannot exploit the effects of meteorological factors on surge levels in a targeted way. This paper divides the storm surge data used as prediction output into an astronomical component, driven by gravity and mathematically regular, and a non-astronomical component driven by extreme meteorological factors. Tidal harmonic analysis is introduced to calculate the former, and the proposed model is used to predict the latter.
Tidal harmonic analysis is the most traditional technique in tide forecasting. First, the tidal constituents (such as the lunar and solar constituents) are separated out from the measured tidal records, and the amplitude and phase angle of each constituent are found. Then, after revision by astronomical factors, the harmonic constants of each constituent are obtained, which can be used to calculate the tidal changes within a certain period and to analyze the tidal properties of the area. Each tidal constituent can be expressed by Eq. 17:
$$h=R{cos}\left(qt+{V}_{0}+u-K\right)$$
(17)
In Eq. (17), h is the height of the constituent, R is its amplitude, q is its angular speed (determined by the constituent, so qt is its phase at time t), \({V}_{0}+u\) denotes the phase angle of the imaginary celestial body at zero universal time at the beginning of the observation period, and K is the phase lag of the high tide behind the moon's culmination caused by seabed friction and inertia.
The tide component expression can be further written as Eq. 18:
$$h=fH{cos}\left(qt+{V}_{0}+u-K\right)$$
(18)
In Eq. (18), \(f\), \(q\) and \({V}_{0}+u\) are all known. If the values of H and K for each constituent are obtained, that constituent's contribution can be computed; H and K are called the harmonic constants of the constituent. Once the harmonic constants are known, the tide height of each constituent can be obtained, and the future tide can be calculated by summing all the constituents.
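The summation described above can be sketched directly from Eq. (18); the two constituents below use invented constants for illustration, not real NOAA harmonic constants:

```python
import math

def constituent_height(t_hours, f, H, q_deg_per_hour, V0u_deg, K_deg):
    """Height contribution of one tidal constituent, Eq. (18):
    h = f * H * cos(q*t + (V0 + u) - K), with angles in degrees."""
    phase = math.radians(q_deg_per_hour * t_hours + V0u_deg - K_deg)
    return f * H * math.cos(phase)

# Illustrative constants for two constituents (NOT real harmonic
# constants): (f, H [m], q [deg/h], V0+u [deg], K [deg]).
constituents = [
    (1.0, 0.50, 28.984, 0.0, 30.0),   # "M2"-like
    (1.0, 0.20, 30.000, 0.0, 10.0),   # "S2"-like
]

def tide_height(t_hours):
    """Predicted astronomical tide: the sum over all constituents."""
    return sum(constituent_height(t_hours, *c) for c in constituents)

print(tide_height(0.0))
```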
The harmonic analysis data used in the model were taken from the NOAA website, and 37 tidal constituents were selected for prediction (M2, S2, N2, K1, M4, O1, M6, MK3, S4, MN4, NU2, S6, MU2, 2N2, OO1, LAM2, S1, M1, J1, MM, SSA, SA, MSF, MF, RHO, Q1, T2, R2, 2Q1, P1, 2SM2, M3, L2, 2MK3, K2, M8, MS4). The improved prediction model introduces the modular approach on top of pruning: the regular astronomical tide data are deleted from the input dataset, the prediction output is replaced with the non-astronomical surge data (the surge deviation), and the resulting prediction output is finally combined with the astronomical tide calculated by harmonic analysis to give the final prediction results. Previous studies have also demonstrated that the modularized NARX model yields better predictions than the conventional model, with improved accuracy and robustness and a combined improvement of about 23.3% [30].
The improved model structure. As shown in Fig. 8, a complete PM-NARX model consists of two parts: the neural network part and the modular prediction part. In the neural network part, at the end of each training pass the sensitivity of the hidden layer's output weights is evaluated once in order to filter out redundant neural nodes for pruning. The pruned network is then tested to check whether the results satisfy the set conditions. If not, the internal parameters are adjusted and the network is retrained from the first step; if they do, the iteration counter is incremented (K + 1) and checked against the set upper limit. When the number of passing test results satisfies the set condition (K ≥ 6) or the number of training iterations reaches the upper limit, the model enters the modular prediction part, where the meteorological factor data are fed in to predict the non-astronomical surge. Finally, the prediction output of the pruned NARX is combined with the astronomical tide calculated by tidal harmonic analysis to produce the final storm surge prediction results.
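The final recombination step can be sketched as a simple sum of the two components; both series below are synthetic stand-ins for the harmonic-analysis output and the pruned-NARX surge prediction:

```python
import numpy as np

# Synthetic stand-ins for one station: the astronomical tide from
# harmonic analysis and the non-astronomical surge predicted by the
# pruned NARX network (both invented here for illustration).
hours = np.arange(720)
astronomical = 0.5 * np.cos(2 * np.pi * hours / 12.42)    # M2-like tide
surge_pred   = 0.3 * np.exp(-((hours - 360) / 24.0) ** 2) # storm peak

# Modular prediction: the final storm surge water level is the sum of
# the astronomical and non-astronomical components.
final_prediction = astronomical + surge_pred
print(final_prediction.shape)  # (720,)
```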