Comparison of annual maximum and peaks over threshold methods with automated threshold selection in flood frequency analysis: A case study for Australia

Flood frequency analysis (FFA) enables fitting of distribution functions to observed flow data for estimation of flood quantiles. Two main approaches, annual maximum (AM) and peaks-over-threshold (POT), are adopted for FFA. The POT approach is under-employed due to its complexity and the uncertainty associated with the threshold selection and the independence criteria for selecting peak flows. This study evaluates the POT and AM approaches using data from 188 gauged stations in south-east Australia. The POT approach adopted in this study applies different average numbers of events per year, fitted with the Generalised Pareto (GP) distribution with an automated threshold detection method. The POT model extends its parametric approach to the Maximum Likelihood Estimator (MLE) and Point Moment Weighted Unbiased (PMWU) method. The Generalised Extreme Value (GEV) distribution with the L-moment estimator is used for the AM approach. It has been found that there is a large difference in design flood estimates between the AM and POT approaches for smaller average recurrence intervals (ARIs), with a median difference of 25% for the 1.01-year ARI and 5% for the 50- and 100-year ARIs.

Flood is one of the worst natural disasters. In flood risk assessment, a design flood is defined as a flood discharge associated with a given annual exceedance probability. Flood frequency analysis (FFA) is widely adopted in estimating design floods. In FFA, two approaches are frequently adopted: annual maximum (AM) and peaks over threshold (POT). The AM approach involves the selection of the maximum streamflow value from each year to form the AM flood series.

Many studies in Australia have aimed to enhance the accuracy of both at-site and regional FFA.

Peaks-over-threshold approach

The first assumption that makes the POT model viable is the confirmation of the Poisson arrival or homogeneity hypothesis. Lang, Ouarda and Bobée (1999) stated that the threshold value should be sufficiently high so that this assumption is not violated. However, the selected threshold value also needs to be low enough to retain as many peak flow values as possible to exploit the advantages of the POT approach. This study applies various values of average events per year to automate the selection of the physical threshold. With this method, the overall number of data points for the analysis can be controlled. For sites with a shorter record length, the POT approach can obtain more data points than the AM approach.

This study applies 3, 5 and 10 average events per year, denoted as POT3, POT5 and POT10, respectively. Notably, the complexity of using POT for FFA is that the approach considers two elements: magnitude and time.

Regarding time, the second flood peak is rejected if the duration between the two peaks is smaller than the calculated value. With the consideration of the magnitude, some [...]

An R program is applied in this study to extract the POT series by applying a physical threshold. The extracted series is then used for GP distribution fitting with automated statistical threshold detection based on the ND and TS approaches, denoted as POT-ND and POT-TS, respectively.
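The time criterion described above can be illustrated with a minimal Python sketch. It is not the authors' R program: the function name `independent_peaks` and the parameter `min_gap_days` are hypothetical, and the minimum separation is passed in directly because the formula for the "calculated value" is not given in this section. The magnitude criterion is omitted for the same reason.

```python
from datetime import date, timedelta


def independent_peaks(flows, threshold, min_gap_days):
    """Return (date, flow) peaks above `threshold` that satisfy the time criterion.

    flows: list of (date, discharge) tuples sorted by date.
    min_gap_days: assumed minimum separation between two retained peaks.
    """
    # Candidate peaks are local maxima that exceed the threshold.
    candidates = []
    for i, (day, q) in enumerate(flows):
        left = flows[i - 1][1] if i > 0 else float("-inf")
        right = flows[i + 1][1] if i < len(flows) - 1 else float("-inf")
        if q > threshold and q >= left and q > right:
            candidates.append((day, q))

    # Reject the second of two peaks when they are closer than min_gap_days.
    kept = []
    for day, q in candidates:
        if kept and (day - kept[-1][0]).days < min_gap_days:
            continue
        kept.append((day, q))
    return kept


# Illustrative daily series (made-up values, not data from the study).
start = date(2000, 1, 1)
values = [1, 5, 2, 6, 1, 1, 1, 1, 7, 1]
flows = [(start + timedelta(days=i), v) for i, v in enumerate(values)]
peaks = independent_peaks(flows, threshold=3, min_gap_days=3)
```

In this example the peak of 6 falls two days after the peak of 5 and is therefore rejected, even though it is larger. A common variant keeps the larger of two dependent peaks instead; the sketch follows the rule as worded in the text.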

Procedures shown in Figure 1 are followed in this study.

As mentioned previously, the complexity associated with the POT approach lies in the determination of the physical threshold. The following iterative process is implemented in R code to retain flood peaks.
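The authors' R code is not reproduced in this section. As a rough illustration of how a physical threshold can be tied to a target average number of events per year, the Python sketch below lowers the threshold in equal steps from the series maximum until enough independent peaks are retained. The function names, the index-based separation `min_gap`, and the step count `n_steps` are all assumptions made for this example.

```python
def count_independent_peaks(series, threshold, min_gap):
    """Count local maxima above `threshold`, rejecting any peak that falls
    fewer than `min_gap` time steps after the previously retained peak."""
    count, last_idx = 0, None
    for i in range(1, len(series) - 1):
        is_peak = (series[i] > threshold
                   and series[i] >= series[i - 1]
                   and series[i] > series[i + 1])
        if is_peak and (last_idx is None or i - last_idx >= min_gap):
            count += 1
            last_idx = i
    return count


def find_physical_threshold(series, n_years, events_per_year, min_gap, n_steps=200):
    """Lower the threshold step by step until the retained peaks reach
    events_per_year * n_years (e.g. events_per_year = 3 for POT3)."""
    target = events_per_year * n_years
    top, bottom = max(series), min(series)
    step = (top - bottom) / n_steps
    for k in range(1, n_steps + 1):
        threshold = top - k * step
        if count_independent_peaks(series, threshold, min_gap) >= target:
            return threshold
    return None  # the record cannot supply the requested number of peaks


# Illustrative series (made-up values): five isolated peaks over two "years".
series = [0, 5, 0, 0, 0, 9, 0, 0, 0, 3, 0, 0, 0, 7, 0, 0, 0, 2, 0]
u = find_physical_threshold(series, n_years=2, events_per_year=1.5,
                            min_gap=2, n_steps=9)
```

Because the search starts high and moves down, it returns the highest candidate threshold that still yields the target number of peaks, which is in line with the requirement that the threshold be high enough for the Poisson assumption while retaining enough data.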
The difference between individual candidate thresholds is then formulated as per Equation 5.

Overall, the POT3-ND-PWMU approach provides the best match with the AM-GEV approach. At smaller ARIs, the differences between the POT-GP and AM-GEV approaches are higher; however, the differences reduce as ARI increases.

[...] lower ARIs and is mainly in the range of 5%-10%.
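To make the comparison between POT-GP and AM-GEV quantiles concrete, the following Python sketch shows the standard quantile formulas for the two models and a percentage difference. It assumes a Poisson arrival model for the POT series and uses Langbein's relation to convert between partial-series and annual-series ARIs. The function names are hypothetical, and whether the study applies exactly this ARI conversion or this definition of the difference is not stated in this section.

```python
import math


def pot_ari_to_am_ari(ari_pot):
    """Langbein's relation: annual-series ARI equivalent to a partial-series ARI."""
    return 1.0 / (1.0 - math.exp(-1.0 / ari_pot))


def gp_quantile(threshold, scale, shape, events_per_year, ari_pot):
    """Flow exceeded on average once every `ari_pot` years under a Poisson-GP model.

    Shape follows the convention where a positive value means a heavy upper tail
    (the opposite sign to the k parameter often used with L-moments).
    """
    m = events_per_year * ari_pot  # expected number of exceedances in ari_pot years
    if abs(shape) < 1e-9:
        return threshold + scale * math.log(m)
    return threshold + scale / shape * (m ** shape - 1.0)


def gev_quantile(location, scale, shape, ari_am):
    """Flow with annual exceedance probability 1/ari_am under a GEV model."""
    y = -math.log(1.0 - 1.0 / ari_am)
    if abs(shape) < 1e-9:
        return location - scale * math.log(y)
    return location + scale / shape * (y ** (-shape) - 1.0)


def percent_difference(q_pot, q_am):
    """Difference of the POT estimate relative to the AM estimate, in percent."""
    return 100.0 * (q_pot - q_am) / q_am
```

Under these assumptions a partial-series ARI of 1 year corresponds to an annual-series ARI of about 1.58 years, which is one reason the two approaches diverge most at small ARIs and converge as ARI increases.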

Comparison between parametric and non-parametric approaches

The differences in quantile estimates between the POT-parametric and POT-non-parametric approaches are calculated. Here, the peak flow data extracted using the POT3 model are used.

As can be seen in [...] the best results compared to the POT non-parametric approach.

Tables 6(a) and 6(b) detail the statistics of the differences between the POT-parametric and [...]

[...] POT approach is suitable for lower ARIs, but it is still not error free.

The differences in quantile estimates between the AM-GEV and POT-non-parametric approaches are calculated for comparison. Figure 14 presents boxplots of the differences between the two approaches.

The POT-non-parametric approach provides satisfactory results at lower ARIs (less than 2 years), and the difference between the two approaches increases with increasing ARI.

Step 1: POT series extraction
• Incorporate the independence criteria in R code for the iterative process.
• Extract POT series with different average events per year, provided the time and magnitude criteria are met, i.e. POT3, POT5 and POT10.

Step 2: Automating statistical threshold detection for GP distribution
• Perform an iterative process for equally spaced thresholds and estimate an individual parameter set of the GP distribution for each threshold.
• Perform a normality test (POT-ND) based on the p-value to find a suitable threshold.
• Verify the candidate threshold using MRLP and parameter plots.
• Perform cubic spline fitting and find the minimum rate of change between consecutive increases of threshold (POT-TS).
• Verify the candidate threshold using MRLP and threshold stability plots.
• Estimate flood quantiles based on the selected threshold, i.e. POT3-ND-GP, POT3-TS-GP.

Step 3: Illustration and comparison of results
• Evaluate differences in flood quantiles between POT and AM approaches.
• Evaluate differences between parametric and non-parametric approaches.
• Apply a suite of statistical measures to evaluate different approaches.
• Plot statistical measures on an Australian map to assess spatial coherence.
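Step 2 estimates a GP parameter set at each candidate threshold. As a minimal sketch of one such estimator, the Python function below fits the GP distribution to exceedances using unbiased probability-weighted moments, following the closed-form expressions of Hosking and Wallis. Reading the study's PWMU/PMWU label as this estimator is an assumption, as is the function name `fit_gp_pwmu`; the authors' R implementation is not shown here.

```python
def fit_gp_pwmu(exceedances):
    """Estimate GP (scale, shape) from exceedances above a threshold.

    Uses unbiased probability-weighted moments. Shape follows the convention
    where a positive value means a heavy upper tail.
    """
    x = sorted(exceedances)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three exceedances")
    # a0 estimates E[X]; a1 is the unbiased estimate of E[X * (1 - F(X))].
    a0 = sum(x) / n
    # enumerate() is 0-based, so the value of rank i + 1 gets weight (n - i - 1) / (n - 1).
    a1 = sum((n - 1 - i) / (n - 1) * xi for i, xi in enumerate(x)) / n
    shape = 2.0 - a0 / (a0 - 2.0 * a1)
    scale = 2.0 * a0 * a1 / (a0 - 2.0 * a1)
    return scale, shape


def exceedances_above(peaks, threshold):
    """Exceedance amounts (peak minus threshold) for peaks above the threshold."""
    return [q - threshold for q in peaks if q > threshold]
```

Calling `fit_gp_pwmu(exceedances_above(peaks, u))` for each equally spaced threshold `u` gives the sequence of parameter sets that the ND and TS procedures in Step 2 then examine. The estimator has a closed form, so it avoids the convergence problems that maximum likelihood can show for small samples.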