Surrogate modeling of time-dependent metocean conditions during hurricanes

Metocean conditions during hurricanes are defined by multiple parameters (e.g., significant wave height and surge height) that vary in time with significant auto- and cross-correlation. In many cases, the nature of the variation of these characteristics in time is important to design and assess the risk to offshore structures, but a persistent problem is that measurements are sparse and time history simulations using metocean models are computationally onerous. Surrogate modeling is an appealing approach to ease the computational burden of metocean modeling; however, modeling the time-dependency of metocean conditions using surrogate models is challenging because the conditions at one time instant are dependent on not only the conditions at that instant but also on the conditions at previous time instances. In this paper, time-dependent surrogate modeling of significant wave height, peak wave period, peak wave direction, and storm surge is explored using a database of metocean conditions at an offshore site. Three types of surrogate models, including Kriging, multilayer perceptron (MLP), and recurrent neural network with gated recurrent unit (RNN-GRU), are evaluated, with two different time-dependent structures considered for the Kriging model and two training set sizes for the MLP model, resulting in a total of five models evaluated in this paper. The performance of the models is compared in terms of accuracy and sensitivity toward hyperparameters, and the MLP and RNN-GRU models are demonstrated to have extraordinary prediction performance in this context.


Introduction
Modeling of metocean conditions (e.g., significant wave height and peak wave period) during hurricanes is a situation well-suited for surrogate modeling, as the underlying physical processes are complex and numerical models are so computationally demanding that their potential for engineering applications is limited. The term surrogate model refers to a data-driven statistical model that approximates the behavior of a complex process characterized by input features (measurable quantities that have relevant predictive power) and output target(s). Some parametric surrogate models of metocean conditions during hurricanes (Young 1988) have been proposed, but these are limited to deep and open waters. For more general situations, such as those with limited water depth or fetch length, the conventional practice is to use numerical, time-dependent models (Tolman 2014; DHI 2014a) that account for important nonlinear phenomena such as wind-sea and wave-wave interactions and energy dissipation. Because the historical record of hurricanes is relatively short, a catalog of synthetic hurricanes representing a much longer period of time is commonly applied in engineering design and risk assessments. Numerical simulations of metocean conditions for each of the thousands of hurricanes in such a catalog are not practical, and a promising approach is to instead simulate only a small subset of these hurricanes numerically and then create a surrogate model to replace the time-consuming numerical model for simulating the remaining hurricanes.
Surrogate modeling for time-dependent processes, such as metocean conditions during hurricanes, is a challenging task. In this context, time-dependent processes refer to processes with characteristics that vary in time and are affected not only by the current conditions but also by the conditions at previous instances of time. This kind of behavior is difficult for surrogate models to represent because the correlation between the predictions at different time instances must be captured accurately. Many types of surrogate models have been explored for modeling metocean conditions, including decision trees (Mahjoobi and Etemad-Shahidi 2008), response surfaces (Taflanidis et al. 2012), Kriging (Jia and Taflanidis 2013), support vector machine (SVM), and ensemble models (O'Donncha et al. 2019). When these models are applied to a time series, the input and output data at different time instances are treated as independent of each other, and so these models cannot explicitly model the time-dependency of metocean processes. Some researchers have proposed approaches to adapt these models for time-dependent processes. For example, Jia et al. (2016) used Kriging to model storm surge with a high-dimensional vector designed to include time-dependence. In addition to the techniques above, neural networks have also been used for surrogate modeling of metocean conditions (Kim et al. 2014; Herman et al. 2009; Browne et al. 2007), where the networks used by these researchers have an architecture with one hidden layer. In recent years, neural networks have evolved alongside the rapid development of deep learning, resulting in a new family of techniques that are referred to as deep neural networks, which have network architectures with multiple hidden layers and more sophisticated layer operations, ranging from the basic operations of a multilayer perceptron (MLP) to the complex operations of a recurrent neural network (RNN).
These advances have yielded better performance for many complex tasks such as natural language processing (Collobert and Weston 2008) and computer vision (Szegedy et al. 2016), but their application in the surrogate modeling of metocean conditions is so far limited (refer to James et al. (2018) as one example).
Development of a surrogate model of time-dependent metocean conditions involves several loosely coupled choices: the size of the training dataset, the selection of the type of surrogate model (e.g., Kriging, MLP, RNN, etc.), and the design of the time-dependent structure. A model with a time-dependent structure can predict a series of targets that includes correlation in time. Some models, like RNN, have inherent time-dependent structures, while others, like Kriging and MLP, do not have such structures inherently. For the latter type of models, the effect of time-dependent processes can be modeled with ad hoc manipulation of the feature and/or target vectors of the models. In this paper, a database of metocean conditions during hurricanes for an offshore site was used to compare five combinations of surrogate model, time-dependent structure, and training dataset size. This paper has two goals: (1) to illustrate how intrinsic characteristics of the surrogate model along with details of its time-dependent structure and size of training dataset affect the overall performance of the model and (2) to demonstrate the enormous potential of deep neural networks in surrogate modeling of metocean conditions. For the neural networks considered in this paper, the influence of the network architecture (i.e., the number of hidden layers and the number of nodes within each hidden layer) was also explored. This paper is organized as follows. First, the background of the surrogate models used in this paper is provided in Sect. 2. Then, the database of metocean conditions during hurricanes, which was used for model training, is described in Sect. 3. Details of the implementation of the five models considered in this paper are presented in Sect. 4. Results are provided and discussed in Sect. 5, and conclusions are summarized in Sect. 6.

Background
The general formulation of the time-dependent surrogate modeling discussed in this paper is expressed as x(t) → y(t) , where the prediction target y is a vector of metocean conditions at a single location at a series of time instances during a hurricane and the input feature x is a vector of variables characterizing the hurricane at a series of time instances. Three types of surrogate models were considered in this paper: Kriging, the multilayer perceptron (MLP), and recurrent neural network with gated recurrent unit (RNN-GRU) (Cho et al. 2014). These three types of models were chosen to represent a relevant range of characteristics. For example, the RNN-GRU model has an inherent time-dependent structure and can naturally model time-dependent processes, while the other two require an ad hoc arrangement of the feature and target vectors to model such processes. Another important distinction is that Kriging is a memory-based approach (i.e., all training data are memorized to make predictions), while the other two are parametric models (i.e., training data are only used to determine a set of parameters of the model and are not directly involved in making predictions). Key features of these models are provided in Table 1, and the details of each model are presented separately in the remainder of this section.

Kriging model
For a training dataset $(X_{m \times n}, Y_{m \times q})$ composed of $m$ samples of $n$ feature variables and $q$ target variables, the Kriging model uses a linear combination of training targets $Y$ to provide the prediction $\hat{y}$ for any unknown feature $\tilde{x}$ as
$$\hat{y}(\tilde{x}) = c^\top Y = f(\tilde{x})^\top \beta^* + r(\tilde{x})^\top R^{-1} (Y - F \beta^*)$$
where $c^\top$ is the coefficient matrix of the linear combination, $f_{p \times 1}(x)$ is the basis function, $\beta^*_{p \times q} = (F^\top R^{-1} F)^{-1} F^\top R^{-1} Y$ is the result of the generalized least-squares method, $F_{m \times p} = [f(x_1), \ldots, f(x_m)]^\top$ is the result of the basis function evaluated at all the training features, $R_{m \times m}$ is the matrix of the semi-variogram function $R(\theta, x_i, x_j)$ evaluated for all pairs of training features, and $r(\tilde{x}) = [R(\theta, x_1, \tilde{x}), \ldots, R(\theta, x_m, \tilde{x})]^\top$.
There are two hyperparameters in the Kriging model, the basis function $f(x)$ and the semi-variogram function $R(\theta, x_i, x_j)$. Possible forms for the function $f(x)$ include a constant function $f^{(0)}(x) = 1$, a linear function $f^{(1)}(x) = [1, x_1, \ldots, x_n]^\top$, and a quadratic function that additionally includes the second-order terms of the features. In this paper, the linear basis function and the exponential semi-variogram function were used. The constant basis function and two other types of semi-variogram functions, linear and Gaussian (detailed expressions for these are provided by Lophaven et al. (2002)), were included in preliminary experiments but were found to have higher prediction errors and are not discussed here. Quadratic or higher-order basis functions were not considered because the dimension of $f(x)$ exceeded the number of training samples for some cases considered in this paper. Note that the inclusion of some quadratic terms might improve the performance of the term $f(\tilde{x})^\top \beta^*$; however, this requires domain-specific knowledge and is not common practice. Features were standardized (i.e., to have a zero mean and unit standard deviation for each feature) during the model training to improve prediction accuracy.
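The Kriging predictor described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the DACE implementation used in this paper: the function names are hypothetical, the exponential correlation is assumed to take the product form $\prod_k \exp(-\theta_k |x_{ik} - x_{jk}|)$, the parameter `theta` is fixed by hand rather than estimated, and a small diagonal term is added to $R$ for numerical stability.

```python
import numpy as np

def exp_corr(A, B, theta):
    """Assumed exponential correlation: prod_k exp(-theta_k * |a_k - b_k|)."""
    diff = np.abs(A[:, None, :] - B[None, :, :])   # shape (mA, mB, n)
    return np.exp(-(diff * theta).sum(axis=2))

def linear_basis(X):
    """Linear basis f(x) = [1, x_1, ..., x_n] evaluated row by row."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_kriging(X, Y, theta):
    """Standardize features, then compute beta* by generalized least squares."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd
    F = linear_basis(Xs)
    # Tiny diagonal term for numerical stability (an assumption of this sketch).
    R = exp_corr(Xs, Xs, theta) + 1e-10 * np.eye(len(Xs))
    Ri_F = np.linalg.solve(R, F)
    Ri_Y = np.linalg.solve(R, Y)
    beta = np.linalg.solve(F.T @ Ri_F, F.T @ Ri_Y)   # (F' R^-1 F)^-1 F' R^-1 Y
    gamma = np.linalg.solve(R, Y - F @ beta)         # R^-1 (Y - F beta*)
    return {"mu": mu, "sd": sd, "Xs": Xs, "theta": theta,
            "beta": beta, "gamma": gamma}

def predict_kriging(model, X_new):
    """Prediction f(x)' beta* + r(x)' R^-1 (Y - F beta*) for each row of X_new."""
    Xs_new = (X_new - model["mu"]) / model["sd"]
    r = exp_corr(Xs_new, model["Xs"], model["theta"])
    return linear_basis(Xs_new) @ model["beta"] + r @ model["gamma"]
```

Because every prediction needs the correlation between the new feature and all $m$ training features, the whole training set has to be kept in memory, which is the memory-based character of the Kriging model noted earlier.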

MLP
An MLP model is the most basic neural network. It includes several layers of nodes, with the first layer representing the feature vector and the last layer representing the target vector. Figure 1 shows the architecture of an MLP model with two hidden layers as an example. The operation of each hidden layer is expressed as
$$z_2 = g(W z_1 + b)$$
where for each layer of the model, $z_1$ is the input vector, $z_2$ is the output vector, $W$ is the weight matrix, $b$ is the vector representing bias, and $g$ is the activation function, which nonlinearly transforms the data elementwise.
Compared to the Kriging model, the MLP model has more hyperparameters. First, the MLP model includes l hidden layers (the number of hidden layers is referred to herein as the depth of the network) and k nodes per layer (the number of nodes per layer is referred to herein as the width of the network), which together determine the architecture of the network. The ability of the model to approximate nonlinear behavior and the complexity of the training process increase with the number of hidden layers and the number of nodes per layer. These parameters affect model performance as does the activation function g.
In this paper, the selected activation function was ReLU (Rectified Linear Unit; Nair and Hinton 2010), expressed as y = max(0, x). Three other types of activation functions, ELU (Exponential Linear Unit; Clevert et al. 2015), sigmoid (i.e., standard logistic), and hyperbolic tangent, were considered during preliminary experiments, but ReLU was found to have the best performance and so the others are not discussed here. The number of hidden layers (l = 1, 3, 5, and 7) and the number of nodes per layer (k = 16, 32, 64, 128, 256, 512, 1024, 2048, and 4096) were investigated to determine the optimal network architecture for this application. Note that the number of nodes per layer k was assumed to be constant for each hidden layer to simplify the optimization, and thus each network architecture was identified as l × k. ADAM (adaptive moment estimation) was used for model training. The training batch size was set as 128, and the learning rate was set as 0.001. No dropout or feature scaling was used.
Fig. 1 Architecture of an MLP model with two hidden layers: $[x_1, \ldots, x_n]$ is the feature vector, $[y_1, \ldots, y_q]$ is the target vector, and each instance of $[h_{l1}, \ldots, h_{lk}]$ represents a hidden layer
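To make the l × k notation concrete, the forward pass of such a network can be sketched with NumPy. This is an illustrative sketch rather than the TensorFlow implementation used in this paper: the function names, the random initialization, and the linear (activation-free) output layer are assumptions.

```python
import numpy as np

def relu(z):
    """ReLU activation, max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def init_mlp(n_features, n_targets, depth, width, seed=0):
    """Random (W, b) pairs for `depth` hidden layers of `width` nodes each."""
    rng = np.random.default_rng(seed)
    sizes = [n_features] + [width] * depth + [n_targets]
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in)),
             np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Apply z2 = g(W z1 + b) for each hidden layer, then a linear output layer."""
    z = x
    for W, b in params[:-1]:
        z = relu(W @ z + b)
    W, b = params[-1]
    return W @ z + b   # linear output layer (assumption of this sketch)

def count_parameters(params):
    """Total number of trainable weights and biases."""
    return sum(W.size + b.size for W, b in params)
```

For example, a 3 × 64 network with 117 input features and one target has 15,937 trainable parameters under these assumptions, which gives a sense of how quickly model capacity grows with depth and width.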

RNN-GRU
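An RNN-GRU model is structured similarly to the MLP model (see Fig. 1), but with the output of each layer $h_t$ expressed as
$$\begin{aligned} u_t &= \sigma(W_u x_t + U_u h_{t-1} + b_u) \\ r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\ \tilde{h}_t &= g(W_h x_t + U_h (r_t * h_{t-1}) + b_h) \\ h_t &= (1 - u_t) * h_{t-1} + u_t * \tilde{h}_t \end{aligned} \tag{6}$$
where $x_t$ is the input vector of the layer at time instance $t$, $W$ and $U$ are weight matrices, $b$ are bias vectors, $g$ is the activation function, $\sigma$ is the recurrent activation function, $*$ indicates the Hadamard product between two vectors, and $t$ indicates the time instance. For the first time instance $t = 1$, the vector $h_0$ at each layer is initialized as a zero vector and the vector $h_1$ at each layer is calculated according to Eq. (6), which is then used as the vector $h_{t-1}$ to make predictions at time instance $t = 2$, and so on. The vector $u_t$ in Eq. (6) represents the percent of the past information (i.e., $h_{t-1}$) to be updated, $r_t$ indicates the proportion of the past information to forget, and $\tilde{h}_t$ represents the memory content.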
The same 36 sets of network architecture were tested for the RNN-GRU model as for the MLP model (i.e., all combinations of four values of l and nine values of k , see Table 1). The activation function g was chosen as hyperbolic tangent and the recurrent activation function was chosen as hard sigmoid, following common practice.
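The recurrence can be illustrated with a single GRU layer written in NumPy. This is a sketch under stated assumptions, not the TensorFlow layer used in this paper: it follows the standard GRU formulation of Cho et al. (2014) with the convention $h_t = (1 - u_t) * h_{t-1} + u_t * \tilde{h}_t$, the hard sigmoid is taken as the piecewise-linear form clip(0.2z + 0.5, 0, 1), and the parameter names and random initialization are hypothetical.

```python
import numpy as np

def hard_sigmoid(z):
    """Piecewise-linear approximation of the sigmoid (assumed form)."""
    return np.clip(0.2 * z + 0.5, 0.0, 1.0)

def init_gru(n_in, n_hidden, seed=0):
    """Random weights for the update (u), reset (r), and candidate (h) parts."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "urh":
        p["W" + gate] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        p["U" + gate] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p["b" + gate] = np.zeros(n_hidden)
    return p

def gru_step(p, x_t, h_prev):
    """One time instance: gates, memory content, then the new hidden state."""
    u = hard_sigmoid(p["Wu"] @ x_t + p["Uu"] @ h_prev + p["bu"])
    r = hard_sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])
    h_tilde = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])
    return (1.0 - u) * h_prev + u * h_tilde

def gru_layer(p, X):
    """Run the layer over a sequence X of shape (T, n_in), starting from h_0 = 0."""
    h = np.zeros(p["bu"].shape[0])
    outputs = []
    for x_t in X:
        h = gru_step(p, x_t, h)
        outputs.append(h)
    return np.stack(outputs)
```

Because each hidden state is computed from the previous one, a change in the features at one hour propagates to the outputs at later hours without any concatenation of lagged features.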

Database of metocean conditions during hurricanes
The database of metocean conditions used in this paper for training the surrogate models includes conditions during a set of synthetic hurricanes. The synthetic hurricanes were selected from a catalog developed by Liu (2014), which uses historical hurricanes to characterize potential hurricane activity in the northeastern part of the Atlantic basin over a span of 100,000 years. The hurricanes are defined using the Holland model (Georgiou et al. 1983;Holland 1980) in terms of seven parameters: longitude and latitude of the hurricane eye location, the central atmospheric pressure, the radius to maximum wind speed, the translational velocity, the translational direction, and the B parameter, which describes the profile of pressure versus distance from the eye. These parameters were sampled at six-hour intervals for the entire life span of each hurricane according to the distribution of historical measurements. Linear interpolation was then performed in this paper to obtain parameters at one-hour intervals. The wind speed and atmospheric pressure fields defined by the Holland model were then used as input to a numerical metocean model Mike 21 (refer to Qiao et al. (2019) for details). The Mike 21 model couples a hydrodynamic module, which simulates two-dimensional flows based on the depth-integrated, incompressible, Reynolds-averaged Navier-Stokes equations (DHI 2014b), and a spectral wave module, which simulates the growth, propagation, and decay of wind-generated waves and swells based on wave action conservation equations (DHI 2014a), to provide predictions on metocean conditions. One offshore site near South Carolina was considered in this paper (see Fig. 2). The coordinates of this site are (78.87 °W, 33.09 °N), and the water depth is 17 m. 
The metocean conditions during all hurricanes in the catalog that have a mean wind speed of at least 33 m/s at an elevation of 10 m were simulated, resulting in a total of 5,881 simulations of hurricanes over the 100,000-year span of the catalog. Four types of metocean conditions were considered at an hourly interval: the significant wave height $H_s$, the peak wave period $T_p$, the peak wave direction, and the sea surface elevation (including both tide and storm surge).

Numerical experiments
Surrogate modeling is a mathematical mapping between feature vector $x$ and target vector $y$, i.e., $x \rightarrow y$. For the hurricane database introduced in Sect. 3, $y$ is a time-dependent vector of four metocean conditions ($H_s$, $T_p$, peak wave direction, and sea surface elevation) during each hurricane. Five models are introduced in this section, including three types of surrogate models (Kriging, MLP, and RNN-GRU), with two different time-dependent structures considered for the Kriging model and two training set sizes for the MLP model. Key information for each of the five models is presented in Table 2. There are a total of 12 combinations of the aforementioned factors (3 × 2 × 2), and for conciseness, only a subset of these 12 combinations is investigated as part of the design of these five models (see Table 3). The hyperparameters considered for each type of surrogate model are listed in Table 1. The values of these hyperparameters were determined through a series of numerical experiments to find the value that generates a local minimum of the loss function (i.e., each hyperparameter was iteratively perturbed and the value that minimizes the loss function was selected), and for conciseness, details of the results of these numerical experiments are only provided for the architecture of the neural network models, as described in Sect. 5.2.4.

Time-dependent structures
Modeling of metocean conditions during a hurricane can be abstracted as a sequence-to-sequence problem (see Fig. 3), where the time-dependent feature $x$ represents a set of parameters characterizing the hurricane and site-specific conditions, the time-dependent target $y$ is the metocean conditions of interest (e.g., $H_s$, $T_p$, peak wave direction, and sea surface elevation in this paper), and the hidden variables $H$ represent the complex interactions of metocean characteristics. The target at step $t_i$ is affected not only by $x(t_i)$, but also by $x(t_{i-1})$, $x(t_{i-2})$, etc.
For surrogate models without inherent time-dependent structures (e.g., the Kriging and MLP models), the effect of time-dependence can be included with an ad hoc arrangement of the feature and/or target vectors. Two structures were compared in this paper. The first structure is expressed as $x(t_{i-k}, \ldots, t_i) \rightarrow y(t_j, \ldots, t_{j+l})$, a structure that incorporates features at steps $t_{i-k}, \ldots, t_i$ (referred to herein as hurricane features) to form the overall feature vector (i.e., a vector that concatenates hurricane feature vectors at several time instances into a larger vector) and predicts a target vector which includes steps $t_j, \ldots, t_{j+l}$ (see Fig. 4(a)). The vector $y(t_j, \ldots, t_{j+l})$ was selected to cover the time instances corresponding to intense metocean conditions, as these conditions are most important to model accurately for engineering applications. The second structure is expressed as $x(t_{i-k}, \ldots, t_i) \rightarrow y(t_i)$, a structure that predicts metocean conditions independently for each time instance and includes the time-dependence implicitly by including hurricane features at various time instances (see Fig. 4(b)). At first glance, this structure is a special case of the first structure with $l = 0$. However, there is a philosophical difference between the two structures: the second structure trains and predicts metocean conditions for each hurricane hour, rather than for each hurricane, so each hurricane produces multiple training samples, and the duration of predictions is not constant. It is worth noting that the structure $x(t_{i-k}, \ldots, t_i) \rightarrow y(t_j, \ldots, t_{j+l})$ can also be implemented in a step-wise manner, i.e., generating multiple training samples for each hurricane. However, this results in multiple predictions for metocean conditions at the same time instance. Thus, the implementation of the first structure in this paper was based on the idea that one hurricane produces one training sample.
For both structures, the time interval to concatenate the hurricane features does not have to match the time interval of the training data. For example, the hurricane database introduced in Sect. 3 has hourly intervals, but hurricane features can be concatenated with 3-h intervals. This benefits model performance in some cases, as metocean characteristics with 3-h intervals are less autocorrelated with each other. Both structures were implemented using the Kriging model (see Model 1 and Model 2 in Table 2), while the second structure was also implemented using the MLP model (see Model 3 and Model 4 in Table 2). For the RNN-GRU model, the prediction of time-dependent processes is straightforward because it inherently models time-dependent processes. The RNN-GRU model considers hurricane features at each time instance and makes predictions with the structure $x(t_i) \rightarrow y(t_i)$. This structure is considered in Model 5 in Table 2. All three time-dependent structures are summarized in Table 4.

Design of feature vector
Two aspects were involved when designing the feature vector: the selection of hurricane features included at each time instance (i.e., the specific form of $x(t_i)$ in Fig. 4) and the time instances to concatenate these features (i.e., the selection of $t_{i-k}, \ldots, t_i$ in Fig. 4). The former affects all five models, and the latter affects only Models 1-4 (see Table 2), as the time-dependent structure of Model 5 does not require such concatenation. Because the target vector $y$ was simulated from a numerical model driven by wind and pressure fields, the seven hurricane parameters that define the wind and pressure fields and the water depth (sum of still water depth and tide level) are the most straightforward to include as hurricane features. Other features, including the maximum wind speed within the entire wind field, local wind speed at the selected site, and local wind direction, were also considered. The two circular variables, the hurricane translational direction and local wind direction, were each expressed in terms of their sine and cosine values, as is common practice. Hence, a total of 13 hurricane features were considered for each time instance.
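The count of 13 features follows from replacing each of the two direction variables with a sine and cosine pair. A minimal sketch of that bookkeeping is shown below; the function names and the ordering of the features are hypothetical, and directions are assumed to be given in degrees.

```python
import math

def encode_circular(direction_deg):
    """Represent a direction by its sine and cosine so that -180 and 180 coincide."""
    angle = math.radians(direction_deg)
    return math.sin(angle), math.cos(angle)

def hurricane_feature_vector(scalars, translation_dir_deg, wind_dir_deg):
    """Append the two encoded directions to the non-circular features.

    `scalars` holds the nine non-circular features: six Holland parameters
    (all but the translational direction), water depth, maximum wind speed,
    and local wind speed.
    """
    return (list(scalars)
            + list(encode_circular(translation_dir_deg))
            + list(encode_circular(wind_dir_deg)))
```

With nine non-circular features and two encoded directions, the resulting vector has 9 + 2 × 2 = 13 entries.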
For Models 1-4, the 13 hurricane features were concatenated at various time instances to form the feature vector. Up to 9 time instances with 3-h intervals were selected (i.e., $t_i, t_{i-3}, \ldots, t_{i-24}$). For Model 1, only one training sample was extracted for each hurricane and $t_i$ was selected as the hour of the maximum local wind speed $V$ at the selected site. The target time instances $t_j, \ldots, t_{j+l}$ included the 12 h before and after $t_i$ (i.e., 25 time instances in total). The design of the feature vector directly affects the effectiveness of the surrogate model, and much research has been devoted to optimizing the feature vector (e.g., Kira and Rendell 1992; Blum and Langley 1997). In preliminary experiments, fewer time instances and hurricane features at each time instance were tested, and the results indicated that using all 13 hurricane features at each time instance always yielded the best prediction performance for Models 1-5. For Models 1, 3, and 4, concatenating the 13 hurricane features at all 9 time instances yielded the best prediction performance, while for Model 2, using the 13 hurricane features only at $t_i$ as the feature vector yielded the best prediction performance.
Table 4 Time-dependent structures: $x(t_{i-k}, \ldots, t_i) \rightarrow y(t_j, \ldots, t_{j+l})$ (no inherent time-dependent structure, one training sample per hurricane, Model 1); $x(t_{i-k}, \ldots, t_i) \rightarrow y(t_i)$ (no inherent time-dependent structure, multiple training samples per hurricane, Models 2-4); $x(t_i) \rightarrow y(t_i)$ (inherent time-dependent structure, multiple training samples per hurricane, Model 5)
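The hour-by-hour structure with lagged features can be sketched as a windowing operation over one hurricane's hourly record. The function below is an illustrative sketch with hypothetical names; it assumes the features are stored as an array of shape (hours, features) and uses the 9 time instances at 3-h intervals described above as defaults.

```python
import numpy as np

def hourly_samples(features, targets, n_lags=9, lag_step=3):
    """Build one training sample per hurricane hour.

    features: array of shape (T, n_feat), hourly hurricane features
    targets:  array of shape (T,), hourly target (e.g., significant wave height)
    Each sample concatenates the features at t_{i-24}, ..., t_{i-3}, t_i.
    """
    span = (n_lags - 1) * lag_step   # 24 h of history with the default values
    X, y = [], []
    for i in range(span, len(features)):
        lagged = features[i - span : i + 1 : lag_step]
        X.append(lagged.ravel())
        y.append(targets[i])
    return np.array(X), np.array(y)
```

A hurricane lasting 163 h would contribute 139 samples of 117 features each under these settings, since the first 24 h lack a complete history. This is why predictions from this structure cover a slightly shorter duration than the hurricane itself.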

Model training and evaluation
The five models considered in this paper use three types of surrogate models and three time-dependent structures, see Tables 1 and 4. The training and evaluation processes were slightly different for each model. For the 5,881 hurricanes included in the database of metocean conditions, 881 were used for model evaluation (see Fig. 5), leaving 5000 hurricanes (~ 800,000 hurricane hours) for model training. The Kriging models were trained using the DACE package (Lophaven et al. 2002) in MATLAB, and the neural networks were trained using TensorFlow (Abadi et al. 2016). The training datasets are described for each prediction model as follows.
• For Model 1, all 5000 hurricanes were used for training, producing 5000 training samples.
• For Model 2, model training was based on hurricane hours rather than hurricanes, and therefore, 5000 hurricane hours were randomly selected from the 5000 hurricanes as the training dataset. According to the result of the random selection, a total of ~ 3000 hurricanes contributed to the 5000 hurricane hours. The same number of 5000 training samples was selected for Model 2 because (1) this allows a fair comparison with Model 1 to reveal the impact of the time-dependent structure on predictions and (2) it is very difficult for the Kriging model to memorize ~ 800,000 training samples formed in terms of hurricane hours (it would take ~ 5000 Gigabytes of computer memory just to construct the R matrix in MATLAB using double-precision floating-point values for ~ 800,000 training samples).
• For Model 3, 5000 training samples were used again to train the MLP model to create a fair comparison with the Kriging model in Model 2. Since there are more hyperparameters for neural networks than for Kriging models, the chance of overfitting is higher. To prevent overfitting, a common approach was employed: a validation dataset was prepared in addition to the training dataset, and optimal hyperparameters were selected based on the performance of the model for the validation dataset, rather than for the training dataset. The ratio between the training dataset and the validation dataset used here was around 70:15, resulting in a total of 1070 samples of hurricane hours for validation. The training and validation datasets were shuffled and re-divided during the training process to improve training efficiency.
• For Model 4, the ~ 800,000 hurricane hours were divided into ~ 660,000 samples for training and ~ 140,000 samples for validation according to the 70:15 ratio. The training process was the same as for Model 3.
• For Model 5, the training samples were formed in terms of hurricane hours, but the training process was implemented in terms of hurricanes, as time instances were fed to the network in sequence.
The division of the 5881 hurricanes in the catalog among training, validation, and testing datasets is illustrated in Fig. 5. All models were tested for the same 881 hurricanes.
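The memory figure quoted for the Kriging correlation matrix and the validation set sizes can be checked with simple arithmetic. The helper names below are hypothetical, and one Gigabyte is taken as 10^9 bytes.

```python
def correlation_matrix_gigabytes(n_samples, bytes_per_value=8):
    """Memory for a dense n x n matrix of double-precision values, in GB."""
    return n_samples ** 2 * bytes_per_value / 1e9

def validation_size(n_train, train_share=70, val_share=15):
    """Validation samples implied by a train:validation ratio of 70:15."""
    return round(n_train * val_share / train_share)
```

With 800,000 samples the dense matrix needs about 5120 GB, compared with 0.2 GB for 5000 samples. The 70:15 ratio gives roughly 1070 validation samples for 5000 training samples and roughly 140,000 for 660,000 training samples.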

Results and discussions
For the four types of metocean conditions ($H_s$, $T_p$, peak wave direction, and sea surface elevation), separate prediction models were trained and the results of the $H_s$ predictions were used as an example to compare the models. The overall performance of the models is presented in Sect. 5.1, and detailed comparisons are discussed in Sect. 5.2.

Overall performance
Models 1-5 were trained in different ways but tested for the same 881 testing hurricanes. The root-mean-square error (RMSE) for the testing dataset was chosen as the metric to evaluate the performance of each model, expressed as
$$\mathrm{RMSE}_{\mathrm{Test}} = \sqrt{\frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( \hat{y}_{ij} - y_{ij} \right)^2}, \quad N = \sum_{i=1}^{m} n_i$$
where $\hat{y}$ is the prediction value during a time instance of a hurricane, $y$ is the corresponding true value from the numerical time history simulation, subscript $j$ indicates the prediction time instance, subscript $i$ indicates the testing hurricane, $m$ is the total number of testing hurricanes, $n_i$ is the total number of time instances for each testing hurricane, and $N$ is the total number of time instances for all the testing hurricanes. Note that for the same hurricane, $n_i$ is different for each prediction model. For instance, Model 1 provides predictions for 25 time instances regardless of the hurricane duration, while Models 2-4 provide predictions for a duration slightly shorter than the hurricane, since the mapping of $x(t_{i-k}, \ldots, t_i) \rightarrow y(t_i)$ determines that the first available time instance for prediction is $t_{k+1}$; Model 5 provides predictions for the entire hurricane duration.
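The metric pools squared errors over all time instances of all testing hurricanes before taking the square root, so hurricanes with more predicted hours carry more weight. A minimal sketch of this pooling, with a hypothetical function name and each hurricane represented as a list of hourly values, is:

```python
import math

def pooled_rmse(predictions, truths):
    """RMSE over all time instances of all hurricanes.

    predictions, truths: lists with one sequence per hurricane; sequences may
    have different lengths n_i, and N is the sum of all n_i.
    """
    squared_sum, n_total = 0.0, 0
    for y_hat, y in zip(predictions, truths):
        if len(y_hat) != len(y):
            raise ValueError("prediction and truth lengths differ")
        squared_sum += sum((a - b) ** 2 for a, b in zip(y_hat, y))
        n_total += len(y)
    return math.sqrt(squared_sum / n_total)
```

This differs from averaging per-hurricane RMSE values, which would weight a short hurricane the same as a long one.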
Each prediction model was tested for various sets of hyperparameters (see Table 1). The best performance for predicting $H_s$ and the corresponding hyperparameters are listed in Table 5. Overall, Model 1 performed the worst, with a test RMSE for $H_s$ of 0.41 m, and Model 5 performed the best, with a test RMSE for $H_s$ of 0.05 m. Figure 6 shows the test RMSE for $H_s$ as a function of $H_s$. It is interesting to note that the test RMSE for $H_s$ for Models 1 and 2 differs by ~ 32% as listed in Table 5, … Fig. 6, and this can be considered by including more intense hurricanes in the training database and by adjusting the training process to increase the weight of high values of $H_s$ during the model training.
Prediction performance of Model 5 is provided in Table 6 for the other three metocean conditions ($T_p$, peak wave direction, and sea surface elevation). Note that for the prediction of the peak wave direction, which is a circular variable, the target variables were selected as the sine and cosine of the direction, so that values referring to the same direction (e.g., -180° and 180°) had the same representation. However, the test RMSE was evaluated based on the resulting direction errors for the range between -180° and 180°.
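One way to evaluate direction errors on the range between -180° and 180° is to recover the angle from the predicted sine and cosine and wrap the difference before squaring. The sketch below is an assumption about how this could be done, with hypothetical function names; it is not taken from the implementation used in this paper.

```python
import math

def decode_direction(sin_value, cos_value):
    """Recover a direction in degrees from its sine and cosine."""
    return math.degrees(math.atan2(sin_value, cos_value))

def wrapped_error(pred_deg, true_deg):
    """Signed direction error wrapped to the interval [-180, 180)."""
    return (pred_deg - true_deg + 180.0) % 360.0 - 180.0

def direction_rmse(pred_deg, true_deg):
    """RMSE of the wrapped direction errors, in degrees."""
    errors = [wrapped_error(p, t) for p, t in zip(pred_deg, true_deg)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Without the wrapping step, a prediction of 179° against a true value of -179° would count as a 358° error rather than the actual 2° difference.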
To illustrate the accuracy of Model 5 for various metocean conditions, prediction results for an individual hurricane that has a test RMSE similar to the overall value provided in Table 5 are presented in Fig. 7, with the RMSE for $H_s$, $T_p$, peak wave direction, and sea surface elevation equal to 0.04 m, 1.52 s, 13.5°, and 0.06 m, respectively. Note that the duration of the hurricane plotted in Fig. 7 is 163 h, but only the results for the 24 h before and after the maximum $H_s$ are shown. The prediction of the time history of $H_s$ matches the details of the simulation, while the prediction of $T_p$ captures only the overall trend of the simulation. This difference is not surprising, as $H_s$ is related to the area of a wave spectrum, while $T_p$ is related to the mode of a spectrum and is more sensitive to complex wave-wave interactions and thus harder to predict. Note that Fig. 7(c) shows a lower prediction error than the overall peak wave direction RMSE of 13.5°. This is because of some large prediction errors when the hurricane is far away from the site of interest. The opposite situation is observed in Fig. 7(d), where an overestimation at the peak surge is not reflected in the relatively low RMSE of the sea surface elevation for this hurricane.

Detailed comparisons
Some key aspects of the five prediction models are summarized here to indicate meaningful comparisons among the models.
• Model 1 and Model 2 both use the Kriging model and were trained using 5000 training samples. Model 1 was trained based on hurricanes (i.e., one training sample per hurricane), but Model 2 was trained based on hurricane hours (i.e., multiple training samples per hurricane).
• Model 2 and Model 3 use the same time-dependent structure and the same number of 5000 training samples. Model 2 uses the Kriging model, while Model 3 uses the MLP model.
• Model 3 and Model 4 both use the MLP model. Model 3 was trained using 5000 training samples, while Model 4 was trained using ~ 660,000 training samples.
• Model 4 and Model 5 were trained using the same ~ 660,000 training samples. Model 4 uses the MLP model, which cannot inherently represent time-dependent behavior and relies on concatenating hurricane features at multiple time instances to represent such behavior. Model 5 uses the RNN-GRU model, which predicts the time-dependence of metocean conditions using its inherent recurrent structure.

Effect of time-dependent structure
The Kriging model was used to implement two structures of time-dependence as represented by Models 1 and 2, which can be expressed as $x(t_{i-k}, \ldots, t_i) \rightarrow y(t_j, \ldots, t_{j+l})$ and $x(t_{i-k}, \ldots, t_i) \rightarrow y(t_i)$. As revealed in Fig. 6, these two models perform similarly for $H_s$ > 2 m. However, Model 2 outperforms Model 1 in two ways. First, Model 2 is more flexible in predicting hurricanes with varying duration, as Model 1 only provides predictions with a fixed duration. Second, Model 2 is more efficient in terms of how the simulated hurricanes are used for training. Even though the same number of training samples were used, Model 1 used all 5000 simulated hurricanes, while Model 2 used only 5000 hurricane hours (contributed by ~ 3000 hurricanes), which is a small fraction of the ~ 800,000 total hurricane hours from the 5000 hurricanes. The efficiency of these two strategies reflects the difference in what the models are learning from the training data. For Model 1, one set of features related to the time instance of maximum local wind speed is used to predict the corresponding time history target, while for Model 2, each set of features is used to predict the target at a corresponding time instance. Therefore, Model 1 learns how to represent metocean conditions during the maximum local wind input of a hurricane, while Model 2 learns to represent metocean conditions during any wind input of a hurricane.

Kriging vs. neural networks
The time-dependent structure $x_{t_{i-k},\dots,t_i} \rightarrow y_{t_i}$ is highly efficient, resulting in a large number of training samples, as the number of samples is approximately equal to the total hurricane hours instead of the number of hurricanes. Kriging models memorize the entire training dataset, and thus, the training process becomes onerous for large training datasets. There are some approaches (such as the adaptive Kriging combined with importance sampling method (Yun et al. 2018)) to improve efficiency in designing the training dataset; however, for a given training database, the comparison between Model 2 and Model 3 clearly shows how the type of surrogate model affects prediction performance. Both models were trained using the same 5000 training samples, and Model 3, which uses MLP, lowered the test error for $H_s$ by 32% compared to Model 2, which uses Kriging. The improvement is even more pronounced for high values of $H_s$, as shown in Fig. 6. The MLP model outperforms Kriging because its capacity to approximate a nonlinear process can be easily adjusted through its network architecture, while the Kriging models had only two hyperparameters (i.e., the selection of basis and semi-variogram functions) to control their ability to approximate the nonlinear behavior of a process. Another important difference between the Kriging models and the neural networks is that, for an input included in the training dataset, Kriging models always reproduce the corresponding training target exactly, much like interpolation. In contrast, neural networks make predictions with some deviation from the corresponding training target, much like regression. For predictions of a spatial variable, where Kriging models are widely used, this characteristic is attractive because the input is well defined by the spatial coordinates.
For hurricane metocean conditions, however, a regression-type behavior is more attractive, because the characteristics of a hurricane can rarely be well defined by several parameters at several time instances, i.e., the same hurricane input features can be used to describe different hurricanes, leading to variation in metocean conditions. Though there are ways to account for such variation in Kriging models (Gramacy 2020), additional hyperparameters are required.
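The interpolation-versus-regression distinction can be sketched with a one-dimensional, zero-mean Kriging predictor. This is a simplified illustration and not the Kriging formulation used in this paper: the Gaussian covariance, its length scale, the synthetic data, and the `nugget` parameter are all assumptions. A nugget term is one common way of letting a Kriging model deviate from noisy training targets; whether it is the variant the cited reference has in mind is likewise an assumption.

```python
import numpy as np

def gaussian_cov(a, b, length=0.3):
    """Gaussian (squared-exponential) covariance between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def krige_predict(x_train, y_train, x_new, nugget=0.0):
    """Zero-mean Kriging prediction; nugget > 0 adds noise variance to the diagonal."""
    K = gaussian_cov(x_train, x_train) + nugget * np.eye(len(x_train))
    weights = np.linalg.solve(K, y_train)
    return gaussian_cov(x_new, x_train) @ weights

# Noisy synthetic training data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 6)
y_train = np.sin(2.0 * np.pi * x_train) + 0.1 * rng.standard_normal(6)

# Predict at the training inputs themselves.
exact = krige_predict(x_train, y_train, x_train, nugget=0.0)
smooth = krige_predict(x_train, y_train, x_train, nugget=0.1)

print(np.max(np.abs(exact - y_train)))   # numerically zero: interpolation
print(np.max(np.abs(smooth - y_train)))  # clearly nonzero: regression-like
```

Without the nugget, the predictor passes through every training target, including its noise; with the nugget, it behaves more like a regression, at the cost of one more hyperparameter to select.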

Size of the training dataset
One advantage of neural networks compared to Kriging models is that model training is no longer constrained by the size of the training dataset. Model 4 uses the same training process as Model 3, except that all ~660,000 training samples were used. The resulting overall test error for $H_s$ is reduced by 26%, and it is reduced by almost half for high values of $H_s$ (see Fig. 6).
Training a neural network amounts to minimizing the training error. Because the loss function is non-convex, it is minimized iteratively, and the mini-batch gradient descent algorithm used in this paper is a standard approach in deep learning. At each step, this algorithm uses a small number of training samples to estimate the gradient of the loss function, which makes it more efficient and suitable when the training dataset is large (Li et al. 2014). It also allows the model to be updated easily when new training samples become available, instead of re-training a model from scratch.
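The mechanics of mini-batch gradient descent, and of updating a trained model with new samples, can be sketched as follows. To keep the example short, a linear model with a mean-squared-error loss stands in for the neural network (so the loss here is convex, unlike the one discussed above); the function name, learning rate, batch size, and synthetic data are assumptions and do not reflect the settings used in this paper.

```python
import numpy as np

def minibatch_gd(X, y, w, lr=0.05, batch_size=32, epochs=50, seed=0):
    """Minimize the mean-squared error of the linear model X @ w by mini-batch GD."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            # Gradient of the MSE estimated from the current mini-batch only.
            grad = 2.0 * X[idx].T @ residual / len(idx)
            w = w - lr * grad
    return w

# Synthetic training data from a known linear relationship plus small noise.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((1000, 3))
y = X @ w_true + 0.01 * rng.standard_normal(1000)

w_fit = minibatch_gd(X, y, w=np.zeros(3))

# When new samples arrive, training continues from the current weights
# rather than restarting from scratch.
X_new = rng.standard_normal((200, 3))
y_new = X_new @ w_true + 0.01 * rng.standard_normal(200)
w_updated = minibatch_gd(X_new, y_new, w=w_fit, epochs=5)

print(w_fit, w_updated)
```

Each update touches only `batch_size` samples, so the cost per step does not grow with the size of the training dataset, in contrast to a Kriging model that must process the full dataset at once.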

MLP versus RNN-GRU
Both Models 4 and 5 can be represented with the schematic in Fig. 1. Model 4 uses a simple operation for each hidden layer as expressed in Eq. (2) and represents time-dependent processes by including hurricane features from previous time instances in the feature layer. Hence, the network learns the relationship between metocean conditions at each time instance and hurricane features at multiple time instances. Model 5 uses a much more complex operation for hidden layers as expressed in Eqs. (3)-(6) so that each hidden layer manages its own memory vector $\tilde{h}_t$ representing the prediction history. The overall test error for $H_s$ for Model 5 is 64% lower than for Model 4, and Fig. 6 reveals that the error for Model 5 is about half that of Model 4 for $H_s > 3$ m. This suggests that a network with an intrinsic time-dependent structure performs better than one that relies on manually concatenated hurricane features. However, the better performance of Model 5 comes at a price. As shown in Fig. 8, the performance of Model 5 is more sensitive to the network architecture than that of Model 4. A U-shaped behavior is observed in Fig. 8(b) for results based on networks with constant depth and varying width. The reason for this behavior is complex and is mainly due to the non-convex loss function within neural networks. Based on a study of the training and validation details, which is not presented here, some deep and narrow networks (e.g., a 5 × 16 network) converge to a local minimum of the loss function (i.e., a global minimum exists but is missed by the training algorithm) and thus perform worse than a shallower network with the same width. For a wide network (e.g., one with 4096 nodes per hidden layer), when the number of hidden layers increases (starting from 3 hidden layers in this case), the training process becomes challenging due to the network complexity.
For intermediate widths, a relatively clear pattern is observed: the prediction performance is insensitive to the number of hidden layers if more than one hidden layer is used, and the optimal number of nodes per layer is around 128. In other words, confirming that the network architecture was optimal took significantly more effort for the RNN-GRU model than for the MLP model.
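The recurrent operation that distinguishes Model 5 from Model 4 can be sketched as a single GRU step applied hour by hour. The sketch below follows a common textbook formulation of the GRU (update gate, reset gate, candidate state); it is not guaranteed to match the notation or gate convention of Eqs. (3)-(6), and the parameter names, layer sizes, random weights, and synthetic input are assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step: combine the current features with the previous hidden state."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])  # reset gate
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate
    # Convention assumed here: z weights the candidate, (1 - z) the old state.
    return (1.0 - z) * h_prev + z * h_cand

def init_params(n_in, n_hidden, rng):
    """Small random weights for the three gates (untrained, illustration only)."""
    p = {}
    for gate in "zrh":
        p["W" + gate] = 0.1 * rng.standard_normal((n_hidden, n_in))
        p["U" + gate] = 0.1 * rng.standard_normal((n_hidden, n_hidden))
        p["b" + gate] = np.zeros(n_hidden)
    return p

rng = np.random.default_rng(0)
n_in, n_hidden, n_hours = 4, 8, 24
params = init_params(n_in, n_hidden, rng)
storm = rng.standard_normal((n_hours, n_in))  # hypothetical hourly features

# The hidden state carries information from all earlier hours forward, so
# no manual concatenation of past features is needed in the input.
h = np.zeros(n_hidden)
states = []
for x_t in storm:
    h = gru_step(x_t, h, params)
    states.append(h)
states = np.stack(states)
print(states.shape)
```

Because the same step is applied at every hour, the loop handles hurricanes of any duration, whereas an MLP with concatenated inputs sees only the fixed window of past hours placed in its feature layer.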

Conclusions
Multiple approaches to apply surrogate models to predict time-dependent metocean conditions during hurricanes were discussed and implemented in this paper. A hurricane database composed of numerical metocean simulations for synthetic hurricanes was used for training and testing the surrogate models.
Some surrogate models, such as Kriging and MLP, do not have the inherent ability to model time-dependent processes. However, by concatenating hurricane features and/or prediction targets at various time instances, these models can be adapted to model time-dependent processes. Two adaptations were evaluated in this paper: $x_{t_{i-k},\dots,t_i} \rightarrow y_{t_j,\dots,t_{j+l}}$ and $x_{t_{i-k},\dots,t_i} \rightarrow y_{t_i}$. The latter was found to use training data more efficiently, as multiple training samples are produced by each hurricane. In addition, for these adapted models, larger training datasets were found to improve prediction performance but also to increase computational demands, an issue that is especially pronounced for memory-based models such as Kriging.
Neural networks were demonstrated to have the potential for accurate time-dependent surrogate modeling of metocean conditions. They outperform the Kriging model in several respects. First, the RNN-GRU model accurately predicted the time histories of multiple metocean variables (significant wave height, peak wave period, peak wave direction, and sea surface elevation), an ability that can lower uncertainty when applying surrogate modeling in risk analysis and other tasks. Second, the complexity of neural networks can be adjusted easily through the network architecture, which enables the models to learn the complex behavior of metocean conditions appropriately and accurately. Lastly, the optimization algorithm for a neural network can consider large training datasets efficiently, facilitating model training and updating. The flexibility of neural networks, however, makes their performance sensitive to the hyperparameters, and the complex RNN-GRU model implemented in this paper was shown to be especially sensitive.
The results of this paper demonstrate the potential of neural networks to represent complex metocean conditions during hurricanes, a potential that could transform the way hurricane risk is assessed.