Global extreme wave estimates and their sensitivity to the analysed data period and data sources

In the absence of wave buoys operating over extended periods, wave hindcast data or satellite observations are indispensable for estimating global extreme wave heights. However, the results may depend on the wave data source and on the length of the analysed period. This study investigates the sensitivity of the estimated extreme significant wave heights (SWH) to the analysed data sources and periods. Global extreme wave heights are estimated using ECMWF Reanalysis v5 data (ERA5), a global wave hindcast based on Simulating WAves Nearshore forced by the Japanese 55-year Reanalysis (SWAN-JRA55), satellite altimeter observations, and long-term wave buoy measurements. Both Annual Maximum fitting to the Generalized Extreme Value Distribution (AM-GEV) and Peaks Over Threshold fitted to the Generalized Pareto Distribution (POT-GPD) models are used. The results show that the global extreme SWH estimates depend considerably on the analysed data source. The relative differences between the analysed data sources exceed 20% in large parts of the world. When the analysed data period is increased, the relative differences in extreme SWH are mostly lower, but they can reach 30% and are larger with AM-GEV. In addition, comparing the extreme values from reanalysis and hindcast wave data with those from long-term wave measurements reveals underestimations of up to 2 m for a return period of 100 years in the North-West Atlantic and North-East Pacific.

[…] wave hindcast. We divided the data in time in steps of ten years for the last three wave data sources. The shape parameters are estimated for each dataset using the MLE method, and the goodness of fit of each dataset against the two distribution models was tested using the K-S test. The K-S test results for the data with a significant fit were mapped. In addition, for each analysed dataset, the upper bounds of the confidence intervals of the estimated shape and scale parameters are computed at a confidence level of 95% and mapped. Based on the estimated distribution parameters of each dataset, the return period results obtained for different data sources and different data periods using the AM-GEV and POT-GPD distribution models were estimated, inspected, and compared through global spatial maps. The extreme waves for return periods of up to 100 years are also estimated using offshore buoy measurements and compared graphically to those obtained from the ERA5 reanalysis data and the JRA-55 wave hindcast data at the nearest location. The goodness of fit of each dataset against both distribution models (GEV and GPD) was evaluated using the K-S test, considering a confidence level of 95% for a significant fit.

As there is no theoretical approach to validate the estimated extreme wave values, we consider the most probably correct estimates to be those obtained from the buoy measurement dataset, which has a longer analysed period and a proper fit to the distribution model used.

When ξ is less than zero, the GEV is referred to as having a Type II tail (Fréchet distribution family).

When ξ is greater than zero, the GEV has a Type III tail, which matches the Weibull distribution family. To estimate these GEV distribution parameters, MLE is used. The […] (4)

When ξ=0, the GPD has a Type I tail (exponential distribution). If ξ>0, the GPD has a Type II tail, similar to the Pareto distribution. If ξ<0, the tail is Type III, a case of the beta distribution.

The maximum value of x that is expected to return at least once within the next m years can be […]

Setting the threshold is a delicate step in the POT approach. The threshold value selection must be based on the meteorological data in the study area, as advised by Mathiesen et al. (1994). To achieve a robust fit, it is important that the threshold selected for the POT-GPD model is set […] function of their corresponding threshold, and finally, the appropriate threshold is defined graphically based on the plot stability (see Coles 2001 […])

where F is the theoretical cumulative distribution function of the distribution being tested, which must be continuous (i.e., not a discrete distribution such as the binomial or Poisson) and fully specified (i.e., the location, scale, and shape parameters cannot be estimated from the data). The hypothesis regarding the distributional form is rejected if the test statistic, D, is greater than the critical value.

[…] input data for the SWAN model. The ice data have not been taken into account as inputs.
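The extreme value workflow described in this section (MLE fit, K-S check, and return level for a return period of m years) can be sketched as follows. This is a minimal illustration and not the authors' code: the text does not name the software used, so the choice of SciPy and the function names `fit_am_gev`, `gev_return_level`, `fit_pot_gpd`, and `gpd_return_level` are assumptions. The sketch also assumes that the peaks passed to the POT fit have already been declustered and that the threshold has already been selected.

```python
import numpy as np
from scipy import stats


def fit_am_gev(annual_maxima):
    """Fit a GEV to annual maxima by MLE and run a K-S check.

    Returns (shape, location, scale, ks_statistic, ks_pvalue). SciPy's shape
    parameter c is negative for Frechet-type tails and positive for
    Weibull-type tails; many textbooks use the opposite sign.
    """
    x = np.asarray(annual_maxima, dtype=float)
    c, loc, scale = stats.genextreme.fit(x)
    # The p-value is only approximate here, because the parameters were
    # estimated from the same sample that is being tested.
    ks = stats.kstest(x, "genextreme", args=(c, loc, scale))
    return c, loc, scale, ks.statistic, ks.pvalue


def gev_return_level(c, loc, scale, m_years):
    """Level exceeded on average once every m_years (annual maxima model)."""
    return stats.genextreme.ppf(1.0 - 1.0 / m_years, c, loc=loc, scale=scale)


def fit_pot_gpd(peaks, threshold):
    """Fit a GPD by MLE to the excesses of (assumed declustered) peaks."""
    peaks = np.asarray(peaks, dtype=float)
    excess = peaks[peaks > threshold] - threshold
    # The location is fixed at zero because excesses are measured
    # from the threshold.
    xi, _, sigma = stats.genpareto.fit(excess, floc=0.0)
    return xi, sigma, excess.size


def gpd_return_level(xi, sigma, threshold, peaks_per_year, m_years):
    """Level exceeded on average once every m_years under the POT-GPD model."""
    # Probability that a single peak event does not exceed the level.
    p = 1.0 - 1.0 / (peaks_per_year * m_years)
    return threshold + stats.genpareto.ppf(p, xi, loc=0.0, scale=sigma)
```

With fitted parameters, `gev_return_level(c, loc, scale, 100)` gives the 100-year SWH under the AM-GEV model, and `gpd_return_level(xi, sigma, u, rate, 100)` gives the POT-GPD counterpart for a threshold `u` and a mean number of peaks per year `rate`.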

The wave parameter output used in this study is the SWH, which is defined based on the zeroth moment of the energy density spectrum E(f), in which f is the frequency.
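The defining equation did not survive in this copy of the text. The usual spectral definition is SWH = 4·sqrt(m0), where m0 is the integral of E(f) over frequency, and the sketch below assumes that this is the definition meant here. The function name `significant_wave_height` and the trapezoidal integration are illustrative choices, not taken from the paper.

```python
import numpy as np


def significant_wave_height(freq_hz, energy_density):
    """Spectral SWH, 4*sqrt(m0), with m0 the zeroth moment of E(f).

    freq_hz        : 1-D array of frequencies in Hz, increasing
    energy_density : 1-D array of E(f) in m^2/Hz at those frequencies
    """
    f = np.asarray(freq_hz, dtype=float)
    e = np.asarray(energy_density, dtype=float)
    # Zeroth spectral moment by the trapezoidal rule.
    m0 = np.sum(0.5 * (e[1:] + e[:-1]) * np.diff(f))
    return 4.0 * np.sqrt(m0)
```

For a flat spectrum of 0.25 m^2/Hz between 0 and 1 Hz, m0 is 0.25 m^2 and the function returns an SWH of 2 m.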
The collected data were validated against offshore wave measurements to perform a careful and reliable assessment of specific potential errors related to wave hindcast accuracy. In-situ measurements are obtained from 64 buoys spread across the globe with recording times (Figure 2). We recognized that a longer measurement time might be more suitable for validation. The […]

[…] we noticed that all datasets present a significant fit to the GEV distribution at a risk level of alpha=0.05 in the majority of the global offshore seas. However, a non-significant fit is observed in near-coast areas using the wave hindcast data. A non-significant fit is also noticed in the polar regions using the satellite data and, over a larger polar area, using the reanalysis data.

The results in this location are not discussed for the hindcast data, considering that the ice information was not used during the wave hindcast simulation. It is essential to notice that the […] reanalyses. In areas of low cyclone activity, the overestimation is only observed using the dataset of a short period (10 years). The results based on ten years of data also present larger confidence intervals in the estimated shape and scale parameters (Figures 3 and 4). Based on ten years of data, the results are more acceptable when using satellite observations.

For the same data period of 20 years, the relative differences between the results obtained from satellite, SWAN-JRA55 hindcast, and ERA5 reanalysis data are computed and shown in Figure 7 (a-c). In the regions of tropical cyclone activity, the absolute relative differences can reach 80% when comparing SWAN-JRA55 results to satellite data results (Figure 7 (a)), and 100% when comparing ERA5 results to satellite data results (Figure 7 (b)). The extreme SWH results obtained using ERA5 data are overestimated compared to those obtained based on satellite data in the northern and southern tropical regions and underestimated in the tropical region. Using SWAN-JRA55, the relative changes outside the regions of tropical cyclone activity are mainly <20% compared to both ERA5 and satellite results (Figure 7 (a,c)). The extreme SWH is overestimated compared to those obtained based […] ERA5 data to those estimated based on 20 years (Figure 7 (d, f)). Using the satellite data, the relative changes as a function of the data period are mainly >20% (Figure 7 (g)). By increasing the analysed data period of ERA5 and SWAN-JRA55 to 40 years, the relative changes rarely exceed 10%, except in tropical cyclone regions (Figure 7 (g, h)).
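The relative differences discussed above can be illustrated with a short sketch. The exact formula and the choice of reference dataset are not given in this part of the text, so the signed percentage difference relative to a reference field, and the function name `relative_difference_percent`, are assumptions made for illustration only.

```python
import numpy as np


def relative_difference_percent(estimate, reference):
    """Signed relative difference, in percent, of estimate against reference.

    Both inputs are arrays of return-level SWH on the same grid. Grid points
    where the reference is zero or missing are returned as NaN.
    """
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        rel = 100.0 * (est - ref) / ref
    return np.where(np.isfinite(rel), rel, np.nan)
```

Under this assumed convention, a positive value means that the first dataset gives a higher extreme SWH than the reference at that grid point, and the absolute value corresponds to the absolute relative difference quoted in the text.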
The results […] optimized to obtain a more accurate estimation of peak events at a regional scale.

Taking into account that the AM-GEV method results do not depend on the threshold selection, […]

The results obtained at those locations may depend more on the selected threshold. According […] (Figures 8 and 14), the differences in the estimated extreme SWHs using AM-GEV and POT-GPD depend on the spatial location and the analysed data period.