In this paper, tide level data from Yorktown, USA (37°13.6'N, 76°28.7'W), are selected as samples to verify the performance of the prediction model. Yorktown is a small town in southeastern Virginia, USA, and is now part of the National Historical Park. The satellite image is shown in Figure 4.
For the model simulation, hourly tide observations at Yorktown from 00:00 GMT to 23:00 GMT on each day from June 1, 2020 to July 30, 2020 are selected. This gives a total of 1440 groups of tide level data, which form the tide level time series for this period. The tide and meteorological data used in this paper are from the website https://tidesandcurrents.noaa.gov/. The observed tide level data are shown in Figure 5.
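As a quick check on the sample count, the following minimal sketch enumerates the hourly timestamps of the observation window. The helper name `hourly_timestamps` is hypothetical and is not part of the paper's workflow.

```python
from datetime import datetime, timedelta


def hourly_timestamps(start, end):
    """Return every whole-hour timestamp from start to end, inclusive."""
    stamps = []
    current = start
    while current <= end:
        stamps.append(current)
        current += timedelta(hours=1)
    return stamps


# Observation window stated in the text:
# 00:00 GMT on June 1, 2020 to 23:00 GMT on July 30, 2020.
stamps = hourly_timestamps(datetime(2020, 6, 1, 0), datetime(2020, 7, 30, 23))

# 60 days x 24 hourly readings per day
print(len(stamps))
```

The window covers 30 days in June and 30 days in July, so 24 readings per day give the 1440 groups used in the experiments.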
3.1. Prediction Analysis of Harmonic Analysis Method
To facilitate comparison, this paper takes the last 240 groups of harmonic analysis data and compares them with the observed tide level. The results are shown in Figure 6.
It can be seen from Figure 6(a) and (b) that the prediction error of the harmonic analysis model lies in the range [0, 0.25] meters. The larger errors are mainly concentrated in the first 150 groups of data, after which the prediction error gradually stabilizes at about 0.1 meters. This result cannot satisfy the accuracy requirements.
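The error range quoted above can be computed as the absolute difference between predicted and observed tide levels over the comparison window. The sketch below assumes that definition of error; the function names and the short sample series are illustrative and are not taken from the paper's data.

```python
def absolute_errors(predicted, observed):
    """Pointwise absolute error between two equally long series (meters)."""
    if len(predicted) != len(observed):
        raise ValueError("series must have the same length")
    return [abs(p - o) for p, o in zip(predicted, observed)]


def error_summary(predicted, observed, tail=240):
    """Summarize the absolute error over the last `tail` samples."""
    errors = absolute_errors(predicted[-tail:], observed[-tail:])
    return {
        "min": min(errors),
        "max": max(errors),
        "mean": sum(errors) / len(errors),
    }


# Illustrative values only (not the Yorktown data).
predicted = [1.0, 1.5, 2.0, 1.25]
observed = [1.0, 1.25, 2.0, 1.0]
summary = error_summary(predicted, observed, tail=4)
print(summary)
```

Applied to the last 240 groups of harmonic analysis output and the matching observations, `summary["min"]` and `summary["max"]` would give the bounds of the error range.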
3.2. Prediction and Analysis of GA-BP Neural Network
To carry out the tide level prediction, the initial parameters of the GA-BP neural network must be set. These include the number of layers of the BP neural network, the number of nodes in the input, hidden and output layers, and the initial parameters of the genetic optimization algorithm. First, the topological structure of the neural network model should be determined.
In this paper, the classic three-layer BP neural network is used; its topological structure is shown in Figure 7.
The number of nodes in the input layer is determined by the number of input parameters. In this paper, the BP neural network is trained on the data obtained by the harmonic analysis method to predict the tide, so the number of nodes in the input layer is taken as 1.
The number of nodes in the hidden layer strongly affects the performance of the BP neural network. If the number of hidden layer nodes is not chosen appropriately, the trained network often fails to reach the expected prediction accuracy. To solve this problem, this paper adopts the empirical formula in Equation (4).
In Equation (4), M is the number of hidden layer nodes, m is the number of input layer nodes, n is the number of output layer nodes, and a is a natural number chosen between 0 and 10.
By combining the empirical formula with repeated trial predictions, the number of hidden layer nodes is finally set to 10.
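The candidate hidden layer sizes can be enumerated as in the sketch below. It assumes that Equation (4) takes the widely used empirical form M = sqrt(m + n) + a, which matches the symbols defined above but is an assumption here; the function name and the rounding to the nearest integer are also illustrative choices.

```python
import math


def hidden_node_candidates(m, n, a_values=range(0, 11)):
    """Candidate hidden layer sizes from M = sqrt(m + n) + a.

    Assumption: this is the form of Equation (4). Results are rounded
    to the nearest integer because a node count must be whole.
    """
    base = math.sqrt(m + n)
    return [round(base + a) for a in a_values]


# GA-BP network in the text: one input node and one output node.
candidates = hidden_node_candidates(m=1, n=1)
print(candidates)
```

Under this assumption the formula only narrows the search to a small set of sizes; the final value of 10 is then picked by comparing trial predictions, as described in the text.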
For the GA-BP neural network model used in this paper, the output is the predicted tide level at a given time, so the number of nodes in the output layer is set to 1.
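The resulting 1-10-1 topology can be sketched as a plain forward pass. The tanh hidden activation, the linear output, the random initialization range and all function names are assumptions made for illustration; the text does not specify them.

```python
import math
import random


def init_network(n_in=1, n_hidden=10, n_out=1, seed=0):
    """Randomly initialize weights and biases of a three-layer network."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w2": [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)],
        "b2": [rng.uniform(-1, 1) for _ in range(n_out)],
    }


def forward(net, x):
    """Forward pass: tanh hidden layer (assumed), linear output layer (assumed)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(net["w2"], net["b2"])
    ]


def parameter_count(net):
    """Total number of weights and biases."""
    return (
        sum(len(row) for row in net["w1"]) + len(net["b1"])
        + sum(len(row) for row in net["w2"]) + len(net["b2"])
    )


net = init_network()
print(parameter_count(net))  # 1*10 + 10 + 10*1 + 1 parameters
print(forward(net, [0.5]))   # one predicted value for one input value
```

The parameter count matters for the genetic algorithm, because each individual has to encode all weights and biases of the network.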
For the genetic algorithm, four parameters need to be set in advance.
(1) Population size, generally 20 ~ 100
If the population is too small, evolution cannot produce the number of schemata expected under the schema theorem; if the population is too large, the algorithm converges with difficulty, wastes computing resources, and becomes less robust.
(2) Mutation probability, generally 0.0001 ~ 0.1
If the mutation probability is too small, population diversity decreases too quickly, so effective genes are rapidly lost and are hard to recover. If the mutation probability is too large, population diversity is maintained, but high-order schemata become increasingly likely to be destroyed.
(3) Crossover probability, generally 0.4 ~ 0.99
If the crossover probability is too large, existing favorable schemata are easily destroyed, randomness increases, and the optimal individual is easily missed. If the crossover probability is too small, the population cannot be updated effectively.
(4) Number of generations, usually 100 ~ 500
If the number of generations is too small, the algorithm is unlikely to converge and the population does not mature; if it is too large, the algorithm has already converged, or the population has converged prematurely and cannot improve further, so continuing the evolution only wastes time and resources.
Through repeated experiments, the population size is set to 50, the number of generations (iterations) to 100, the crossover probability to 0.5, and the mutation probability to 0.005.
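The loop below is a minimal sketch of a genetic algorithm run with the four values chosen above. Tournament selection, single-point crossover, reset mutation, elitism and the placeholder fitness function are all assumptions for illustration, since the text does not specify them. In a GA-BP setting the fitness would be the training error of the BP network whose weights and biases are encoded in the chromosome; 31 genes correspond to a 1-10-1 network.

```python
import random

POP_SIZE = 50        # population size from the text
GENERATIONS = 100    # number of generations from the text
CROSSOVER_P = 0.5    # crossover probability from the text
MUTATION_P = 0.005   # per-gene mutation probability from the text
N_GENES = 31         # 1*10 + 10 + 10*1 + 1 weights and biases (1-10-1 network)


def fitness(chromosome):
    """Placeholder objective (assumption): lower is better."""
    return sum(g * g for g in chromosome)


def tournament(population, rng, k=2):
    """Pick the fitter of k randomly drawn individuals."""
    return min(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover applied with probability CROSSOVER_P."""
    if rng.random() < CROSSOVER_P:
        point = rng.randrange(1, len(a))
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]


def mutate(chromosome, rng):
    """Reset each gene to a new random value with probability MUTATION_P."""
    return [rng.uniform(-1, 1) if rng.random() < MUTATION_P else g
            for g in chromosome]


def run_ga(seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(N_GENES)]
                  for _ in range(POP_SIZE)]
    history = []
    for _ in range(GENERATIONS):
        best = min(population, key=fitness)
        history.append(fitness(best))
        next_pop = [best[:]]  # elitism: the best individual always survives
        while len(next_pop) < POP_SIZE:
            c1, c2 = crossover(tournament(population, rng),
                               tournament(population, rng), rng)
            next_pop.append(mutate(c1, rng))
            if len(next_pop) < POP_SIZE:
                next_pop.append(mutate(c2, rng))
        population = next_pop
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga()
print(history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` never gets worse from one generation to the next.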
The first 1200 groups of data were used for training, and the last 240 groups were used as the prediction output. The comparison between the predicted and actual tide levels and the correlation between the predicted and actual output data are shown in Figure 8 and Figure 9.
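The split described above keeps the time order of the series, with the final 240 hourly samples held out for prediction. A minimal sketch follows; the function name and the placeholder series are hypothetical.

```python
def split_series(series, n_test=240):
    """Chronological split: everything before the last n_test samples is
    used for training, and the last n_test samples for prediction."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


# Placeholder series of sample indices standing in for 1440 tide levels.
series = list(range(1440))
train, test = split_series(series)
print(len(train), len(test))
```

No shuffling is applied, so the model is always evaluated on samples that come after everything it was trained on.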
As the above figures show, the predictions of the GA-BP neural network are basically consistent with the actual values, but the error of a small part of the output data is still large. The overall prediction error of the GA-BP neural network is reduced to less than 0.2 meters, which is lower than that of the harmonic analysis method. However, the prediction error increases significantly before and after the peak tide level. This phenomenon is more obvious in the last 100 groups of data.
3.3. NARX Modular Neural Network Prediction Analysis
As with the GA-BP model, the initial parameters of the NARX neural network must be set before tide level prediction, including the number of nodes in the input, hidden and output layers and the delay orders of the input and output.
To account for the many nonlinear factors that affect the tide level, five input parameters are selected to predict it: wind speed, wind direction, gust speed, air temperature and air pressure. Therefore, the number of input nodes is 5 and the number of output nodes is 1; the number of hidden layer neurons is set to 10 according to the empirical Equation (4); and the default delay order of the input and output is 1:2. This means that each new output is computed from the data of the previous two time steps. The greater the delay order, the more past data are referenced in the prediction process, and the better the prediction effect. In this paper, the delay order is set to 1:20.
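The effect of the delay order on the network input can be sketched by building the lagged feature vectors explicitly. The sketch assumes that 1:20 denotes lags 1 through 20 for both the meteorological inputs and the fed-back tide level, and that observed past tide levels are used as feedback (the open-loop, or series-parallel, form). The function name and data layout are hypothetical.

```python
def narx_rows(exogenous, target, delays=range(1, 21)):
    """Build (features, label) pairs for a NARX-style model.

    exogenous: list of per-time-step lists, e.g. 5 meteorological values
    target:    list of tide levels, same length as exogenous
    delays:    lags applied to both inputs and fed-back outputs (assumed)
    """
    if len(exogenous) != len(target):
        raise ValueError("exogenous and target must have the same length")
    delays = list(delays)
    max_delay = max(delays)
    rows = []
    for t in range(max_delay, len(target)):
        features = []
        for d in delays:
            features.extend(exogenous[t - d])   # delayed inputs x(t - d)
        for d in delays:
            features.append(target[t - d])      # delayed outputs y(t - d)
        rows.append((features, target[t]))
    return rows


# Placeholder data: 30 time steps with 5 exogenous values each.
exogenous = [[float(t)] * 5 for t in range(30)]
target = [float(t) for t in range(30)]
rows = narx_rows(exogenous, target)
print(len(rows), len(rows[0][0]))
```

With 5 inputs and 20 lags, each prediction draws on 5 x 20 delayed input values plus 20 delayed tide levels, and the first 20 time steps cannot be predicted because their history is incomplete.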
The resulting network structure is shown in Figure 10.
The comparison and correlation between the predicted tide level and the observed tide level after training are shown in Figure 11 and Figure 12.
As the figures show, the prediction results agree well with the trend of the observed tide level, and the correlation between the predicted and actual output data reaches 0.988. The prediction error is relatively large only when the tide level is about to turn, and this has little impact on the overall accuracy of the data. Compared with the first two prediction methods, the NARX modular model gives better predictions. According to the specific error data, the error is essentially stable within 0.05 meters.
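A correlation value such as the one reported above can be computed as in the following sketch, which assumes that the reported figure is the Pearson correlation coefficient between predicted and observed tide levels. The function name and the sample values are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_x * norm_y)


# Illustrative values only (not the Yorktown data).
observed = [0.2, 0.5, 0.9, 0.6, 0.1]
predicted = [0.25, 0.45, 0.85, 0.65, 0.15]
print(pearson_r(predicted, observed))
```

A value close to 1 means the predictions follow the rises and falls of the observed series closely, but it does not by itself bound the absolute error, which is why the text reports the error range separately.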