TL-Moments Approach: Application of non-Stationary GEV Model in Flood Frequency Analysis


 The non-stationarity in hydrological records is a significant concerning area of interest within the field of flood risk management. Ignoring the non-stationary behaviour in flood series will result in a substantial bias in floods quantile. Hence, the non-stationary flood frequency analysis appeared to be an appropriate option to maintain the independent and identically distributed (IID) assumptions in sample observation. This paper utilized the Generalized Extreme Value (GEV) distribution to analyze extreme flood series. The time-varying moment technique, namely the L-moment and TL-moment methods are employed to estimate the non-stationary model (GEV 1, GEV 2, and GEV 3) in the flood series. The ADF test, Mann-Kendall trend test, and Spearman’s Rho test showed that two out of ten streamflow stations in Johor, Malaysia demonstrated a non-stationary behaviour in the annual maximum streamflow. Results from the simulation study demonstrate a consistent performance on the non-stationary model. Furthermore, the TL-moments method could efficiently predict the flood event estimated at quantiles of the higher return periods.

2000; Vogel, McMahon, and Chiew 1993). However, there is a rising concern over irregular patterns in climate change caused by increased flood risk and severe hydrological events (Burn and Hag Elnur 2002;Ishak et al. 2013 2001) anticipated that climate change attributed to the susceptibility of human activity on the ecosystems. Warmer air due to climate change may give rise to the potential for extreme rainfall, consequently the flood events. The criteria for the frequency of the data sample such as the mean and variance can be changed following the irregular pattern (Khaliq et al. 2006). Consequently, the frequency analysis of probability distributions and distribution parameters in the context of non-stationary changed throughout history. Hence stationary assumption on nonstationary sample records may no longer be valid under the possible impacts of climate change of flood events (Salas and Obeysekera 2014;Vasiliades et al. 2015). Several studies have explored risk factors associated with violations of the stationarity principle (Cunderlik and Burn 2003;Milly et al. 2008;Villarini et al. 2009a,b).
However, some essential questions remain about the role of practical non-stationary assumptions in flood events.
Alternative methods are required to estimate the time-dependent distribution parameter at a time.
The parameter estimation in non-stationary flood frequency analysis (NFFA) by using the L-moments method has received attention from some researchers (Campos-Aranda 2019; Debele et al. 2017;Gado and Nguyen 2016a,b;Kochanek et al. 2013;Sunget al. 2018). The L-moments method offers an effective way to measure a small data sample in a hydrologic extremes study (Katz et al. 2002). However, it has been reported that L-moment is too sensitive towards an outlier (Ahmad et al. 2011;Bílková 2015) which is improper for predicting large return period events of hydrologic extremes. Building on the work of Hosking (2007), Elamir & Seheult (2003) introduced the trimmed L-moments (TL-moments) method to improve the sample with a significant outlier. TL(t1,0) (t1 = 1, 2, 3, 4) for parameter estimation on a non-stationary model using the generalized extreme value (GEV) distribution. The performance of this method will be evaluated based on a comparative study of the Lmoments method for the non-stationary model using simulated data from the Monte Carlo simulation. The input from this research leads to an understanding of the use of non-stationary models by emphasizing extreme values in hydrological data.

GEV Distribution
The GEV is a commonly used distribution for flood frequency analysis of extreme events (El-Adlouni et al. 2007;El-magd 2010;Guru and Jha 2014;Rahman et al. 2013;Raynal-Villasenor 2012;Tramblay et al. 2012). The three parameters of the GEV distribution are expressed from the three types of the probability distribution function, namely Frechet, Gumbel and Weibull (Coles 2001). The cumulative distribution function (CDF) for the GEV distribution is given by: where x denotes the observed flood series, and ξ, α, and k represent the location, scale, and shape parameters.

Non-stationary Model
Conventionally, the GEV parameters model is a time-independent parameters model (Coles 2001).
According to Strupczewski et al. (2016), the distribution parameter of the non-stationary model changes over time. The non-stationary parameter model considered in the form of trend moments. When a trend is found in a series of data, specific parameters of the non-stationary model turn into time-dependent variables. Following the principle of parsimony, finite number of parameters are required to explain the non-stationary behavior. However, as the parameters for the non-stationary model vary according to the non-stationary condition, the performed estimation is likely to increase the number of parameters. Hence, transfer functions representing the location and scale parameters such as linear, logarithm, exponential, trigonometry, and parabola can be considered in the nonstationary model. These transfer functions are handy because they can explain existing trends for hydrological data. Moreover, this function satisfies the parsimony parameter principle, i.e., the minimum number of parameters required (El-Adlouni et al. 2007;Kim et al. 2017). Nevertheless, the shape parameters for non-stationary models need to be fixed due to the complexity of accurate estimation (Coles 2001). In this study, three non-stationary models with time, t as time-dependent covariates, are given in Table 1. Previous studies used an exponential function in the linear function of the scale parameter for the GEV 2 model (Gado and Nguyen 2016a,b). However, for simplicity, this study remains the original scale of parameters with the linear trend in time due to excessive values on the scale parameters if the exponential scale is used.

L-moments for the GEV distribution
As the L-moments formed from the linear combinations of probability weighted moments (PWM) (Hosking 1990), the PWM equation for the GEV distribution is defined as follows (Hosking et al. 1985): where ξ, α, and k are the location, scale and shape parameter of the GEV distribution respectively. Next, the first four L-moments can be written as follows: and the L-moments ratios is given by:

TL-moments for the GEV distribution
According to Elamir and Seheult (Elamir and Seheult 2003), the r-th trimmed TL-moment is given by: Note that TL-moments can be reduced to the L-moments when 12 0 tt . The expectation of order statistics can be written as follows (Hosking and Wallis 1997): where F(x) denotes the cumulative distribution function (CDF) for x, x(F) is the inverse CDF of x calculated at the probability, and both r and n are a non-negative integer, respectively. The expectation of order statistics can also be defined in term of r (Greenwood et al. 1979). The TL-moment ratios can be written as: The non-stationary parameter in the GEV distribution for both L-moment and TL-moment can be obtained based on the relationships between the non-stationary condition of the series and the moments of the sample time series (Gado and Nguyen 2016a). The quantile functions for GEV 1, GEV 2, and GEV 3 models can be written as: (F,t) are quantile estimation of stationary and non-stationary GEV model at T-years return period; F = 1-1/T. Assume the estimated model is GEV 2, the non-stationary procedure is specified as follows: 1. By using the time-varying moments method of the time series, the non-stationarity in the mean, μ(t) and standard deviation, σ(t) is estimated by using the linear model: 2. Estimate the non-stationary location parameter (ξ(t)) by deriving the relationships between the first moment, λ1 from Equation (3a) and the non-stationary in the mean, μ(t) obtained from Equation (8). 3. Estimate the non-stationary scale parameter (α(t)) by deriving the relationships between the second moment, λ2 from Equation (3b) and the non-stationary in the standard deviation, σ(t) modelled from Equation (8). 4. Estimate the shape parameter (k) from L-coefficient of skewness, τ3 given in Equation (4). 5. Repeat the entire procedure using TL-moments method.
Note that the selection of non-stationary criteria in location and scale parameters for GEV 1, GEV 2, and

Monte Carlo Simulation
To assess the accuracy of the non-stationary model, Monte Carlo simulation is utilized for calculating the estimation error in both TL-moments and L-moments method. The random sampling in Monte Carlo simulation is useful in creating artificial samples under identical distribution conditions (Cassey and Smith 2014).
Thus, in this study, the synthetic flood data under non-stationary condition is produced using GEV distribution with sample sizes of n = 15, 30 and 50 for return quantile periods of 20, 50, 100, 200, and 1000 years (x(F) = 0.95, 0.980, 0.990, 0.995, and 0.999). To see the affectability of the estimation methods, the shape (k) parameter of GEV distribution varies between -0.3 ≤ k ≤ 0.3 (Martins and Stedinger 2000). According to Gado & Nguyen (2016a), the coefficient of trend in the location (ξ1) parameter is drawn from the GEV distribution with the range of -0.3 ≤ ξ1 ≤ 0.3 to measure the sensitivity of the methods. Table 5 summarizes the intercept of the trend in the scale and location parameter for GEV 1, GEV 2, and GEV 3 models. This simulation is repeated 1000 times for each GEV distribution, with a different location, scale, shape, and proportion of non-stationarity. The best model estimation is measured based on the smallest value of relative root mean square error (RRMSE).

Trend and stationarity detection test
According to some studies, the record for the hydrological data are censored and non-normally distributed (Bouza-Deano et al. 2008;Sadri et al. 2016;Villarini et al. 2009b;Yue et al. 2002). The non-parametric Mann-Kendall (MK) and Spearman's Rho (SR) test have been widely used to identify the existence of a trend in environmental data series (Pohlert 2020).

Mann-Kendall (MK) test
The power of the monotonic relationship between two variables x and y is quantified tau measurement in both tests suggested by Kendall (1975). In order to estimate the tau in the test, the x variable considered as time and y variable is the annual maximum streamflow for this study (Mann 1945). The test often referred to as an MK test. This test very efficient in detecting monotonic patterns because of it is straightforward, reliable, and treats missed values below a detection threshold. (Ahmad et al. 2015;Burn and Hag Elnur 2002;Ishak et al. 2013;Ren et al. 2019;Shadmani et al. 2012;Silva 2004;Xu et al. 2003 where xi and xj are the data points at times i and j (j>i), n is the number of data points and estimated tau,  is given by: The test statistic, Z for SR test is given by Sneyers (1990): where rs is defined as follows: where R(xi) is the rank of the i-th observation xi in the sample size n.

Augmented Dickey-Fuller (ADF) test
The ADF test typically used to test the existence of unit roots in the series (difference stationary), which was first suggested by Dickey and Fuller (1979) and updated by Said and Dickey (1984). This test is performed to check if mean values and variances of a series vary with time which known as non-stationary time series.
Estimation of regression models has carried out the test with either an interrupt or a linear trend: Ordinary Least Square (OLS) (i.e., OLS calculates the variable relationship by maximizing the sum of the squares in the difference between the observed and predicted levels). The AR(1) model is considered as , where εt is an error valued sequence of independent random variables with mean zero and variance and the maximum likelihood estimator for ρ is using the OLS estimator, which is: If ρ = 1, the process {xt} is non-stationary, where it is known as a random walk process. In contrast, the process {xt} is stationary if |ρ| < 1. The hypothesis of the ADF test is: where p is the number of parameters in the model, n is the sample size, yi is i th value of the predicted variable, f(xi) is the predicted value of yi.

Data Description Study
Johor is known as one of Malaysia's rainiest states. Usually, the amount of rainfall is heavier at the end of the year, starting from May to September. The tendency of this heavy rain is known as the Southwest Monsoon season. In 2006, Malaysia experienced a hefty rainfall resulting in unexpected floods in Johor. This extreme heavy rainfall is almost certainly due to global climate change. Hence, this study systematically examines a series of annual maximum flow data of several stations in Johor to investigate the effectiveness of using the non-stationary assumptions in FFA. This study used secondary data collected from ten streamflow stations in Johor, Malaysia.
The streamflow data must be at least 15 years old or longer to generate a meaningful flood quantile estimation. Table 6 summarized the relevant information for each station used in this study. For better understanding on the location of each station, Figure 1  In order to assess the best estimation method in non-stationary assumption, RRMSE values for Lmoments and TL-moments were compared. Note that the quantile function of x(F = 0.999) is used as this quantile value is appropriate in expressing the whole quantile in describing the occurrence of a flood for the next 1000 years. Figure 2 presents the relationship between RRMSE at different level of shape parameter (k) for the GEV 1 model with a downward trend of 1  = -0.3. In Figure 2, there is a clear increasing trend in RRMSE value as the value of shape parameter increased. For the small sample cases n = 15, the TL-moments method reported significantly lower in RRMSE than the L-moments method except for negative shape values cases of k = -0.4, -0.3 and -0.2. This study indicates that the TL-moments method tends to perform better than the L-moments method for positive shape parameter (k) value cases. In general, it can be observed that the TL-moments methods perform well in capturing the non-stationary behaviour of the GEV 1 model with the downward trend, especially for positive shape parameter (k) cases. In addition, TL-moments (4,0) shows the best performance in capturing the presence of non-stationary by producing lower RRMSE.

GEV 2 model
In this section, the analysis for GEV 2 model carried out by modeling the mean and standard deviation as a linear function of time. To investigate the effect of positive and negative trends in the non-stationary series, the location parameter is set as ξ0 = 0 and ξ1 = -0.1 while the scale parameter is set as α0 = 1 and α1 = ±0.02. Figure 3 displays the relationship between RRMSE at a different shape parameter (k) for the GEV 2 model with a negative trend in scale parameter (α1 = -0.02). Focusing on the small sample size n = 15, the TLmoments methods produce smaller RRMSE than the L-moments method for every shape, k parameter. However, no significant difference between L-moments and TL-moments method as the RRMSE values are overlapped, particularly at k = -0.4, -0.3, and -0.2 for n = 100 cases. These results suggest that the L-moments method cannot capture the non-stationary behaviour in the case of negative trends in the scale parameters. Figure 4 shows the RRMSE results obtained from the L-moments and TL-moments estimation generated from GEV 2 model with a positive trend in scale parameter (α1 = +0.02). As shown in Figure 4, the L-moments method gives a poor estimation compare to other ways with the evidence of higher RRMSE values at sample size n = 15. It is apparent from this figure that the TL-moments method gives the lowest RRMSE compared to the L-moments method. For large sample size (n = 100) cases, TL-moment was the better method than L-moment in estimating the non-stationary condition, mainly when k is negative. These experiments confirmed that the TLmoments method gives better accuracy in evaluating the non-stationary series in both cases of scale parameter (α1 = ± 0.02) with evidence of small RRMSE values.

GEV 3 model
In the GEV 3 model, the quadratic characteristics in the mean are used to model the non-stationary data series. Figure 5 presents the relationship between obtained RRMSE at a different level of shape parameter, specifically for the GEV 3 model. Note that the RRMSE are produced based on the following GEV 3 parameters, ξ0 = 0, ξ1 = 0.2, and ξ2 = 0.2. As shown in Figure 5, the TL-moments(1,0) method reported significantly lower RRMSE than the other methods for every k in both sample sizes specifically, n =15 and 100. In addition, RRMSE for the L-moments method remains the highest, indicating less accuracy in estimating the quadratic nonstationary behaviour. These results suggest that the TL-moments method is found to be more stable than the Lmoments method in dealing with the non-stationary GEV 3 model properties.
In the GEV 3 model, the quadratic characteristics in the mean are used to model the non-stationary data series. Figure 5 presents the relationship between obtained RRMSE at a different level of the shape parameter, specifically for the GEV 3 (ξ0 = 0, ξ1 = 0.2, and ξ2 = 0.2) model. As shown in Figure 5, the TL-moments(1,0) method reported significantly lower RRMSE than the other methods for every k in both sample sizes, specifically, n =15 and 100. In addition, RRMSE for the L-moments method remains the highest, indicating less accuracy in estimating the quadratic non-stationary behaviour. These results suggest that the TL-moments method is more stable than the L-moments method when dealing with the non-stationary GEV 3 model properties.

Analysis of Real Data Series
In this section, the non-stationary FFA using the GEV model applied to the annual maximum flow data obtained from ten different station in Johor. In general, non-stationary FFA aims to provide an appropriate model in explaining the non-stationary behaviour in annual maximum flow.

Trend and non-stationary detection test
The first step in the non-stationary FFA process was to check the series trend behaviour and nonstationary behaviour in the data series. Thus, the trend detection test, namely the Mann-Kendall test and Spearman's Rho test, can detect a monotonic trend in the data. The advantages of these non-parametric trend detection test are that the calculation is free from distribution assumption and not affected by missing data (Hipel and McLeod 2005). While, the non-stationarity test intends to test the presence of changes in the mean values and variances of data series with respect to time. The application of the non-stationarity test in time series analysis can help for the validation of using certain models. Table 6 shows the result of Mann-Kendall, Spearman's Rho test, and ADF test for each of the studied streamflow station.  Table 7, it can be concluded that the stations that are reported with significant trends are also reported as non-stationary. The existence of a trend direction in the data series indicates that there is a non-stationary behaviour in its streamflow data station (López and Francés 2013). In subsequent analysis, only data with trends and non-stationary behaviour will be considered to highlight the importance of non-stationary models.

Selection of Best Model
The purpose of non-stationary flood frequency analysis was to improve the accuracy of flood prediction.
Hence, the selection of the best candidate models need to be undertaken. In this study, the most efficient model is determined based on the smallest values of the AIC test between all non-stationary models and stationary model. As shown here, the GEV 3 model seem to be well fitted to the data with trends and non-stationary characteristics compared to the other two models in Kahang station, except for L-moments estimation method assisted the GEV 1 model to give the smallest AIC value. However, Lenggor station chose GEV 1 model as the best model in dealing with non-stationary flood frequency analysis. Therefore, when the non-stationary characteristics presence in the data set, the non-stationary models such as GEV 1 should be applied in flood frequency analysis. In terms of application to flood series, one of the advantages of GEV 1 is that this model can be used to model the linear trend in the data series that frequently exist on flood data. This analysis showed that TL-moments estimation method gives a better estimate than the L-moment for each case data with evidence of smaller AIC values. Therefore, these results provide further support for the hypothesis that TL-moments estimation method is more efficient than L-moments in handling non-stationary flood frequency analysis for nonstationary streamflow station.

Conclusion
The GEV distribution is often used to describe the extreme event in flood frequency analysis. However, one of the issues of interest on extreme modelling is to determine the best estimation technique. Previous studies had utilized the TL-moment method to estimate the extreme quantile that minimize the error. However, the research to date had not been able to demonstrate any significant advantages of using TL-moments method in estimating the non-stationary models. Hence, this study attempts to investigate the importance of non-stationary assumptions in estimating the extreme events by utilizing the GEV distribution and TL-moments estimation approach. Monte Carlo simulation was used to compare the performance between TL-moments and the Lmoments method in estimating the non-stationary models.
To highlight the effectiveness of the TL-moments method in the context of non-stationary series behaviour, firstly, the Monte Carlo simulation analysis was conducted using the GEV distribution. The synthetic non-stationary data was generated for three types of models particularly, GEV 1 for a series with a linear function in mean, GEV 2 for the series that has the linear function of the time in the mean and the standard deviation, and GEV 3 for the series has the quadratic function of the time in the mean. Then, the optimal estimation method was determined using the RMSE values.
Moreover, the advantage of using the non-stationary model is demonstrating by using the real data sample. After identified the non-stationary behaviour in the annual maximum data, the AIC value was determined using the conventional stationary and proposed non-stationary models. The current study found that the nonstationary model was efficient for describing the non-stationary data series obtained from the two stations, namely Kahang and Lenggor, as the AIC values from the non-stationary models were found to lower than the stationary model. Interestingly, the annual maximum data series was observed to have negative trend behaviour. This decreasing trend indicates that the flood risk management strategies in Malaysia had improved. The application of the TL-moments method for non-stationary modeling using the non-stationary models is quite reasonable. This work contributes to existing knowledge of flood frequency analysis by improving the accuracy of the flood risk prediction as the comprehensive response to extreme climate events is crucial for flood risk management.   RRMSE of the 0.999 quantile estimated by L-moments and TL-moments(t1,0) methods for every shape (k) estimated from GEV 3 model.