Evaluating Financial Risk in the Transition from EONIA to ESTER: A TimeGAN Approach with Enhanced VaR Estimations

This study investigates the modeling of multivariate time series data using a Generative Adversarial Network (GAN). The main objectives are to calculate the Value at Risk (VaR) of the Euro Overnight Index Average (EONIA) over different horizons and to evaluate the financial risk consequences of the transition from EONIA to the Euro Short-Term Rate (ESTER). Using TimeGAN, a GAN variant that learns temporal and latent representations of macro-finance data, the study aims to predict short-rate risk for EONIA. The standard TimeGAN model performs poorly when estimating lower VaR and 1-day upper VaR for EONIA, but it performs well when estimating upper VaR over 10-day and 20-day horizons. The TimeGAN variant with PLS+FM, which uses Positive Label Smoothing and Feature Matching, estimates both upper and lower VaR for EONIA over 10-day and 20-day horizons very well. Despite these accurate VaR estimates, the 20-day EONIA simulations of the TimeGAN variants are less diverse than those of a one-factor Vasicek model. The study also evaluates the transition mapping from ESTER to EONIA proposed by the European Central Bank (ECB), an ESTER+8.5bps shift, using the TimeGAN with PLS+FM. The results do not refute the validity of the ECB's proposed EONIA-ESTER mapping. Additionally, the TimeGAN with PLS+FM accurately predicts 10-day and 20-day VaR for ESTER using the EONIA-ESTER mapping, whereas the one-factor Vasicek model struggles to estimate upper VaR for ESTER over the same time periods.


II. BACKGROUND: REGULAR GAN VS TIMEGAN
The investigation applies a system of two models trained simultaneously with adversarial objectives and explores GANs in the setting of the EONIA and ESTER short rates together with their risk indicators. The first model, the generator, seeks to generate data that are indistinguishable from real data. The generator is a Recurrent Neural Network (RNN) that uses a stochastic Wiener process as its noise input to generate data [2][3][4][5][6][7][8][9] [33]. The second model, the discriminator, is an RNN designed to distinguish between real and generated input. The GAN objective is built on the binary cross-entropy, which measures how accurately the discriminator classifies real and generated samples: the discriminator tries to minimize this classification loss, whereas the generator tries to maximize it. Through this mutual training, the generator improves its output until it successfully fools the discriminator. Building on the standard GAN, [1] propose the Time Series GAN (TimeGAN). This model adds an intermediate embedding space in which low-dimensional representations of real data sequences of length T are learned separately from the discriminator's classification. TimeGAN's architecture has two main components: the GAN and the Autoencoder. The Autoencoder uses an RNN as the Embedder to map real time-series data to a low-dimensional embedding space, and the Recovery network reverses this mapping, returning from the low-dimensional embedding space to the original data space. Instead of mapping random variables directly to the data space, the GAN part of TimeGAN trains the generator to map random variables to the low-dimensional embedding space. This indirect simulation process enhances the model's ability to learn representations and helps it recognize patterns in the low-dimensional manifold of short rates. The discriminator in TimeGAN also operates in the embedding space and aims to distinguish between generated and real data samples in this transformed domain.
To guarantee that temporal links are maintained in the generated data, a Supervisor network is trained on the embedding space simultaneously. TimeGAN optimization is difficult because the generator and discriminator must be trained simultaneously to reach a Nash equilibrium in a non-cooperative game. To improve optimization, the Wasserstein GAN (WGAN) and its Gradient Penalty variant (WGAN-GP), as well as label smoothing and feature matching, are considered. WGAN and WGAN-GP replace the cross-entropy loss with a loss based on the Wasserstein distance, which mitigates gradient saturation and makes optimization smoother. Feature matching uses the mean, variance, skewness, and kurtosis as matching statistics, so that the statistics of the generated data align more closely with those of the real data. Lastly, label smoothing, which modifies the discriminator's target labels, reduces mode collapse, which occurs when the generator produces only a single sample or a narrow set of samples. This combined optimization strategy lets TimeGAN learn, generate, and iterate more successfully across low-dimensional embeddings, and it improves TimeGAN's understanding and simulation of short-rate risks.
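The loss terms described above can be sketched in NumPy as follows. This is a minimal illustration, not the study's implementation: Positive Label Smoothing is assumed to mean one-sided smoothing of the real-sample target (the value 0.9 is an assumption), the generator uses the common non-saturating form of the adversarial loss, and Feature Matching is taken literally as matching the first four moments of each feature, combined with unit weight (also an assumption).

```python
import numpy as np

EPS = 1e-12  # keeps log() away from 0 and 1

def bce(targets, probs):
    """Mean binary cross-entropy between target labels and predicted probabilities."""
    probs = np.clip(np.asarray(probs, dtype=float), EPS, 1.0 - EPS)
    targets = np.asarray(targets, dtype=float)
    return float(-np.mean(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)))

def discriminator_loss(d_real, d_fake, positive_label=0.9):
    """Discriminator cross-entropy with positive (one-sided) label smoothing:
    real samples get the target `positive_label` instead of 1; fakes keep target 0."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return bce(np.full(d_real.shape, positive_label), d_real) + bce(np.zeros(d_fake.shape), d_fake)

def generator_adversarial_loss(d_fake):
    """Non-saturating generator loss: rewards fakes that the discriminator scores as real."""
    d_fake = np.asarray(d_fake, dtype=float)
    return bce(np.ones(d_fake.shape), d_fake)

def four_moments(x):
    """Mean, variance, skewness and excess kurtosis of each feature (columns of x)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    z = (x - mu) / np.sqrt(var + EPS)
    return np.stack([mu, var, (z ** 3).mean(axis=0), (z ** 4).mean(axis=0) - 3.0])

def feature_matching_loss(real, fake):
    """Squared distance between the four moments of real and generated features."""
    return float(np.sum((four_moments(real) - four_moments(fake)) ** 2))

# Usage with made-up discriminator outputs and features (shape: samples x features).
rng = np.random.default_rng(0)
d_real, d_fake = rng.uniform(0.6, 0.9, 64), rng.uniform(0.1, 0.4, 64)
real, fake = rng.normal(size=(64, 3)), 1.5 * rng.normal(size=(64, 3))
g_loss = generator_adversarial_loss(d_fake) + feature_matching_loss(real, fake)
```

Smoothing only the positive label keeps the discriminator from becoming overconfident on real data while still pushing generated samples towards 0, and the moment-matching term gives the generator a direct signal on higher-order statistics that the adversarial loss alone may not enforce.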

III. RELATED WORKS
Unlike stock prices, interest rates cannot grow without bound; a sustained rise in rates could have unintended negative effects on the economy. [10] use an Ornstein-Uhlenbeck stochastic process to capture the mean reversion of interest rates. [11] impose a no-arbitrage constraint on the yield curve, whereas [12] add a non-negativity condition in their extensions. In 1992, [13] recognized that assuming a single source of market risk was insufficient for effective risk modeling and therefore proposed a two-factor model with a stochastic mean component. [14] question the idea that unobservable latent factors alone drive short-rate evolution and suggest including macroeconomic variables in short-rate models. [15][31] [32] conclude that macroeconomic factors primarily affect the short end of the yield curve and suggest using the [16] rule to incorporate inflation and economic growth. To connect macroeconomics and market microstructure, [17] show how trading activity, liquidity, and scheduled macroeconomic announcements relate to one another. [18] claim that short-term bond liquidity is the first to react to changes in monetary policy and has a significant impact on short rates. [19] claim that liquidity drives interest rates in the Eurozone during times of market stress, but that credit quality better explains cross-sectional fluctuations in short rates. According to [20], credit risk premia and liquidity are positively correlated, which suggests that modeling liquidity implicitly accounts for excessive credit risk premia. They measure liquidity as the bid-ask spread on short-term Euribor tenors, using formulas derived from panel bank quotations.
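For reference, the mean-reverting Ornstein-Uhlenbeck dynamics mentioned above take the one-factor form below, which is also the form of the 1-factor Vasicek baseline used later in this study. The symbols κ (speed of mean reversion), θ (long-run mean) and σ (volatility) are our notation, and the Euler step with time increment Δt is the standard discretization, assumed here rather than taken from [10].

```latex
\begin{aligned}
dr_t &= \kappa\,(\theta - r_t)\,dt + \sigma\,dW_t, \qquad \kappa > 0,\\
\mathbb{E}[r_t \mid r_0] &= \theta + (r_0 - \theta)\,e^{-\kappa t},
\qquad \lim_{t\to\infty}\operatorname{Var}(r_t) = \frac{\sigma^2}{2\kappa},\\
r_{t+\Delta t} &\approx r_t + \kappa\,(\theta - r_t)\,\Delta t + \sigma\sqrt{\Delta t}\,\varepsilon_{t+\Delta t},
\qquad \varepsilon_{t+\Delta t} \sim \mathcal{N}(0,1).
\end{aligned}
```

The drift term pulls r_t towards θ at rate κ, which is the mean reversion referred to above. Because the noise is Gaussian, r_t can become negative, which is the gap that non-negativity extensions such as [12] address.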

IV. EMPIRICAL STUDY
A. Data
As of this writing, the EONIA dataset spans January 4, 1999, to March 12, 2020. The pre-ESTER data span March 15, 2017, to September 30, 2019, and the ESTER dataset spans October 1, 2019, to March 12, 2020. The data are notably noisy, and closer examination reveals outliers caused by erroneous inputs. Notwithstanding these anomalies, the data clearly show the illiquidity of the spread during the Great Financial Crisis (GFC) and its subsequent decline, although liquidity did not fully recover after the GFC. I suggest that EONIA and ESTER differ from one another in ways that extend beyond the first two statistical moments. Therefore, a nonlinear, model-free simulation technique is essential to reflect these distributional differences effectively. I base the comparison of EONIA and ESTER on industry-standard stylized facts as well as some of my own.

B. Model Analysis
In contrast to the literature reviewed above, this study combines mean-reversion components (the focus of models prior to 2000), macroeconomic risk factors (added in models after 2000), and market microstructure risk variables in a single model. The model reproduces the joint underlying process of the short rate and its explanatory factors over a horizon T. Two effective methods for producing non-linear simulations are the Generative Adversarial Network (GAN) created by [21] and its extension to multivariate time series, the TimeGAN model by [1]. TimeGAN simplifies short-rate risk modeling because it enables the simulation of future scenarios without requiring a specialized parametric model. The study therefore has three main parts: 1. Autoencoder Model: The Autoencoder model, initially developed by [22], performs nonlinear dimension reduction. This reduction is consistent with the hypothesis that a small set of risk factors drives the short rate's underlying mechanism, and it lessens the curse of dimensionality, which typically causes problems in forecasting applications. I discuss the current state of the art in autoencoders and explain the choice of network architecture. To capture the temporal dynamics, I use the LSTM network design proposed by [23]. The LSTM model is trained on the low-dimensional representation of the real data in order to forecast the low-dimensional representation of the simulations. Any differences between the simulations and the predictions show where the temporal dynamics are flawed.
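The embed-then-recover idea behind the Autoencoder can be illustrated with a linear stand-in. The study uses 1-layer LSTM Embedder and Recovery networks, which are nonlinear and temporal; the sketch below swaps in principal component analysis (PCA via SVD) purely to show dimension reduction to four latent variables and the recovery (reconstruction) loss. The six-series panel and its four hidden factors are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(x, latent_dim=4):
    """Linear stand-in for the Embedder/Recovery pair: PCA via SVD of centred data.
    x: (n_samples, n_features); returns the feature means and (latent_dim, n_features) components."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:latent_dim]

def embed(x, mean, components):
    """Map features to the low-dimensional latent space."""
    return (x - mean) @ components.T

def recover(h, mean, components):
    """Map latent values back to the original feature space."""
    return h @ components + mean

def recovery_loss(x, mean, components):
    """Mean squared reconstruction error, the analogue of the recovery loss."""
    x_hat = recover(embed(x, mean, components), mean, components)
    return float(np.mean((x - x_hat) ** 2))

# Hypothetical panel: 6 observed series driven by 4 hidden factors plus small noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 4))
x = factors @ rng.normal(size=(4, 6)) + 0.01 * rng.normal(size=(500, 6))
for k in (1, 2, 4):
    mean, comps = fit_linear_autoencoder(x, latent_dim=k)
    print(k, recovery_loss(x, mean, comps))
```

When the data really are driven by a few factors, the recovery loss drops sharply once the latent dimension reaches that number, which is the behaviour the dimension-reduction hypothesis relies on.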

C. Evaluation Metrics
Skewness (S) measures the propensity for negative returns: lower skewness indicates a higher likelihood of negative returns, and vice versa. Kurtosis (K) measures tail heaviness, with larger values denoting fatter tails in the return distribution. The Hurst exponent (H) measures persistence, or long-term memory, in a time series and thereby mean reversion: values below 0.5 indicate mean reversion, while higher values indicate weaker mean reversion and stronger persistence. Stationarity (A) is evaluated with the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root, i.e., is non-stationary. In addition to these widely recognized stylized facts, I use three more. [24]'s Seasonal and Trend decomposition using LOESS (STL) lets me evaluate the strength of different time series components: it yields the Strength of Trend (FT) and the Strength of Seasonality (FS), where higher values indicate a stronger component. The spikiness (SP) of daily returns, following [25], measures spikes in the distribution; a higher value indicates a spikier distribution. Table 2 displays these stylized facts for ESTER, the ECB's mapping, pre-ESTER, EONIA, and EONIA during the pre-ESTER period. Compared to the ECB's suggested mapping, pre-ESTER returns in particular exhibit higher skewness, higher kurtosis, and lower variance. This indicates that the ECB mapping is more closely matched to a normal distribution, whereas pre-ESTER has a greater tendency to yield positive returns. The comparatively stable Hurst exponent suggests that there has been no significant shift in mean reversion. Furthermore, pre-ESTER returns are less stationary than those of the ECB's proposed mapping. Moreover, pre-ESTER has lower Spikiness values and higher FS scores than the ECB's intended mapping.
To evaluate the T-VaR estimates, I use the TimeGAN models for EONIA and its risk components to create 10,000 simulations over T trading days, from which I calculate the T-VaR for EONIA. This procedure is repeated over 250 trading days in the test and validation datasets, and the results are backtested using the [26] Basel Committee coverage test. The train and validation datasets are combined to calibrate the TimeGAN models. The Vasicek and Variance-Covariance models, by contrast, are calibrated on the preceding 250 trading days in both the test and validation datasets. According to [27], one way to monitor overfitting and mode collapse is to compare the Nearest Neighbors (NN) of the generated samples with the actual EONIA data. Quantitatively, the diversity of TimeGAN simulations is measured by the number of distinct NNs (denoted DNN) for real EONIA data among 250 simulations. To evaluate diversity qualitatively, I also perform dimension reduction on 20-day simulations of the latent variables using t-distributed Stochastic Neighbor Embedding (t-SNE). Using the best-performing TimeGAN model, I then examine the effect of each latent variable on EONIA simulations by computing 20-day simulations while suppressing the influence of the other latent factors, in order to look for interpretable patterns. Continuing with the top-performing TimeGAN model calibrated on EONIA, I assess the ECB's suggested EONIA-ESTER mapping. To validate the mapping, the TimeGAN model's discriminator is applied to 20 trading days of the latent space of the pre-ESTER+8.5bps series and the EONIA risk variables. The discriminator's realness score predictions for every pre-ESTER trading day are averaged and compared with the stylized facts in Table 2. In addition, I use an XGBoost regression decision tree to examine the importance of the independent variables for the realness score. Lastly, I use a coverage test to assess how well an EONIA-calibrated TimeGAN model predicts T-VaR for ESTER during the pre-ESTER period.
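Several of the stylized facts above can be computed with a few lines of NumPy. This is a sketch under stated assumptions, not the code behind Table 2: skewness and kurtosis are the plain moment estimators, the Hurst exponent uses the classical rescaled-range (R/S) method (the study does not state its estimator), and FT, FS and SP are computed from decomposition components that are assumed to be given. In practice the STL components could come from `statsmodels.tsa.seasonal.STL` and the ADF test from `statsmodels.tsa.stattools.adfuller`; both are left out to keep the sketch dependency-light.

```python
import numpy as np

def skewness(x):
    """Moment-based sample skewness."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def excess_kurtosis(x):
    """Moment-based kurtosis minus 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

def hurst_rs(x, min_window=8):
    """Rescaled-range (R/S) estimate of the Hurst exponent: the slope of log(R/S)
    against log(window size) over non-overlapping windows of doubling size."""
    x = np.asarray(x, dtype=float)
    sizes, rs = [], []
    size = min_window
    while size <= len(x) // 2:
        ratios = []
        for start in range(0, len(x) - size + 1, size):
            seg = x[start:start + size]
            dev = np.cumsum(seg - seg.mean())
            if seg.std() > 0:
                ratios.append((dev.max() - dev.min()) / seg.std())
        if ratios:
            sizes.append(size)
            rs.append(np.mean(ratios))
        size *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rs), 1)
    return float(slope)

def component_strength(component, remainder):
    """Strength of trend (FT) or seasonality (FS) from a decomposition:
    max(0, 1 - Var(remainder) / Var(component + remainder))."""
    component, remainder = np.asarray(component, float), np.asarray(remainder, float)
    return float(max(0.0, 1.0 - np.var(remainder) / np.var(component + remainder)))

def spikiness(remainder):
    """Variance of the leave-one-out variances of the remainder component."""
    r = np.asarray(remainder, dtype=float)
    loo = [np.var(np.delete(r, i), ddof=1) for i in range(len(r))]
    return float(np.var(loo, ddof=1))
```

Note that the R/S estimator is biased upwards in short samples, so values slightly above 0.5 for uncorrelated returns should not by themselves be read as persistence.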

A. Model Selection
Using the EONIA training dataset, 5-fold cross-validation is used to optimize the hyperparameters. To find the best autoencoder configuration, I trained a variety of autoencoder architectures with different dropout regularization parameters, layer counts, and latent space dimensions. Fig. 2(a) displays the average recovery loss for several network designs in the hypothesis space with a dropout of 0.1. Interestingly, despite its relatively small latent space dimension, the combination of a 1-layer Embedder, a 1-layer Recovery, and four latent variables shows the minimum recovery loss. In Fig. 2(b), I examine the mean recovery loss for different numbers of Embedder and Recovery layers, conditioned on four latent variables and a dropout of 0.1. Here again, the 1-layer Embedder and 1-layer Recovery setup has the lowest recovery loss. As a result, I use four latent variables and a 1-layer LSTM for the Embedder and Recovery networks, with a dropout regularization of 0.1.
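The selection procedure above can be sketched as k-fold cross-validation over a hypothesis space of configurations. Two points are assumptions: the folds are taken as contiguous blocks (to respect time order), and `score_fn` is a hypothetical callback standing in for training the LSTM Embedder/Recovery pair and returning its recovery loss. The grid values are illustrative and only bracket the reported choice; the study's own search was conditional rather than exhaustive.

```python
import itertools
import numpy as np

def blocked_kfold(n, k=5):
    """Yield (train_idx, val_idx) for k contiguous folds over n time-ordered samples."""
    bounds = np.linspace(0, n, k + 1).astype(int)
    idx = np.arange(n)
    for i in range(k):
        val = idx[bounds[i]:bounds[i + 1]]
        train = np.concatenate([idx[:bounds[i]], idx[bounds[i + 1]:]])
        yield train, val

def select_config(grid, score_fn, n, k=5):
    """Score every configuration in `grid` and return the one with the lowest mean
    validation loss over k folds, together with that loss.
    score_fn(config, train_idx, val_idx) is a hypothetical callback that trains the
    Embedder/Recovery pair on train_idx and returns the recovery loss on val_idx."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        mean_loss = float(np.mean([score_fn(config, tr, va) for tr, va in blocked_kfold(n, k)]))
        if best is None or mean_loss < best[1]:
            best = (config, mean_loss)
    return best

# Hypothetical hypothesis space around the reported choice (1 layer, 4 latents, dropout 0.1).
grid = {"embedder_layers": [1, 2, 3], "recovery_layers": [1, 2, 3],
        "latent_dim": [2, 4, 6, 8], "dropout": [0.0, 0.1, 0.2]}
```

Averaging the validation loss over folds, rather than using a single split, reduces the chance that a configuration wins only because of one favourable period.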

Fig. 2: Recovery Loss for Different Numbers of Layers and Latent Dimensions (a) and for Different Numbers of Embedder and Recovery Layers (b)
Randomly inspecting the latent variable values in the training dataset for the optimal autoencoder (Fig. 3), I see that these variables move independently. In line with my hypothesis, each latent variable in the training set reflects distinct, unobservable characteristics. Given the learned latent space representation, I experimented with different dropout regularization parameters and numbers of layers in the Supervisor design. Fig. 4 shows the mean supervisor loss for several designs fitted on the training data; the lowest supervisor loss is obtained with a 1-layer Supervisor and a dropout regularization of 0.2. For the Generator and Discriminator, I give the Generator more capacity than the Discriminator by using stacked LSTM layers. The Discriminator uses a bidirectional LSTM, which provides more flexibility because classification does not need to follow the forward temporal order. Interestingly, the TimeGAN model's performance improves as T increases. The model's training goal, which is to create 20-day simulations for EONIA and the EONIA risk factors, explains this pattern: for all TimeGAN models, the 1-day VaR estimates perform worse than the 20-day VaR estimates. The LSTM is trained with backpropagation through time, and its gating mechanism mitigates the vanishing gradient problem; however, gradient backpropagation still becomes more difficult with longer sequences. For illustration, Fig. 5 shows the T-VaR produced by TimeGAN on arbitrary trading days in the validation dataset. Fig. 13 shows the importance of the independent variables in predicting the realness score of ESTER in the pre-ESTER period. Interestingly, inflation turns out to be the largest factor influencing the realness score. Real GDP growth, by contrast, is comparatively less important, perhaps because this relatively static variable carries little information within a 20-day simulation. A more refined real GDP growth signal, such as one obtained from Kalman filtering, might yield a higher feature importance.
Because of computational complexity, I was only able to perform manual searches; structured optimization techniques such as grid or random search were not available to me. In high-dimensional hyperparameter spaces, random search tends to find better configurations than grid search for the same number of trials. Both grid and random search are easy to parallelize, whereas Bayesian optimization, as explained by [28], typically improves model performance in fewer evaluations by modeling performance across the hyperparameter space before choosing where to explore next. Because I lacked a systematic optimization technique, suboptimal hyperparameter settings may have affected the results of this investigation. Another possible direction is disentangling the latent space produced by the Autoencoder.
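As an illustration of the random search discussed above (not something the study ran), the sketch below samples configurations at random from a hypothetical space and keeps the best one. The parameter names, their ranges, and the scoring callback are assumptions; continuous parameters such as a learning rate can be drawn from a distribution instead of a fixed grid, which is one reason random search covers high-dimensional spaces efficiently. Bayesian optimization is not shown.

```python
import random

def random_search(space, score_fn, n_trials, seed=0):
    """Sample n_trials configurations at random and return the best (config, loss).
    Each entry of `space` is either a list of options (sampled uniformly) or a
    callable taking a random.Random and returning a value (e.g. a log-uniform draw)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = {name: (spec(rng) if callable(spec) else rng.choice(spec))
                  for name, spec in space.items()}
        loss = score_fn(config)
        if best is None or loss < best[1]:
            best = (config, loss)
    return best

# Hypothetical search space; the learning rate is drawn log-uniformly in [1e-4, 1e-2].
space = {"supervisor_layers": [1, 2, 3],
         "dropout": [0.0, 0.1, 0.2, 0.3],
         "learning_rate": lambda r: 10 ** r.uniform(-4, -2)}
```

Each trial is independent, so the loop parallelizes trivially across GPUs, in line with the remark above about grid and random search.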

C. Diversity of TimeGAN Simulations
Approaches to latent-space disentanglement fall into three groups: sequential autoencoder disentanglement, statistical independence of latent variables, and information regularization [29,30]. Examining these disentanglement approaches could reveal ways to improve the model. VaR calculations benefit from including as much EONIA data as possible, yet the dynamics of EONIA change over time. An improved model should adapt to these dynamics, for example by conditioning on previous trading periods or on current market volatility. TensorFlow supports performance profiling, which could reveal information on algorithmic efficiency and runtime optimization; however, because access to profiling TimeGAN on the LISA multi-GPU cluster was restricted, I was unable to ensure the optimal efficiency and runtime of our TensorFlow implementation. Adding attention mechanisms to TimeGAN can improve the modeling of long-term dependencies: an attention model selectively focuses on important RNN hidden states to build context vectors, which enables better predictions. The field continues to evolve as researchers investigate models such as LSTNet, self-attention architectures, and CNNs combined with LSTMs and attention mechanisms. Better interpretability of LSTM networks is essential, particularly in the financial sector, and methods for visualizing and interpreting LSTM activations and gate behaviors might reveal the model's inner workings. For recurrent networks, techniques such as batch normalization and layer normalization can stabilize training and speed up convergence. Improving TimeGAN's ability to adapt to shifting dynamics may require training it on particular periods, such as stressed or non-stressed periods. Gaussian mixture models, for example, provide soft classification into such periods for conditioned training, and these
classifications would then influence the model's loss distributions. These research lines offer prospective paths for improving TimeGAN's performance, interpretability, and suitability for conditional modeling and financial risk simulation.
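The attention idea mentioned above, weighting RNN hidden states to form a context vector, can be sketched as follows. The scaled dot-product scoring used here is one common choice and is an assumption; the text does not specify an attention variant, and the hidden states and their dimensions are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attention_context(hidden_states, query):
    """Scaled dot-product attention over RNN hidden states.
    hidden_states: (T, d) array of h_1..h_T; query: (d,) vector, e.g. the last hidden state.
    Returns the context vector (d,) and the attention weights (T,)."""
    hidden_states = np.asarray(hidden_states, dtype=float)
    query = np.asarray(query, dtype=float)
    scores = hidden_states @ query / np.sqrt(hidden_states.shape[1])
    weights = softmax(scores)
    return weights @ hidden_states, weights

# Hypothetical hidden states for a 20-day window with 8 hidden units.
rng = np.random.default_rng(0)
h = rng.normal(size=(20, 8))
context, weights = attention_context(h, h[-1])
```

Because the weights sum to one, the context vector is a convex combination of the hidden states, so the model can draw on a distant trading day directly instead of relying on information surviving many recurrent steps.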

2. GAN Model: I use the GAN to simulate the Autoencoder's low-dimensional representation without specifying a model. This GAN-driven simulation learns the distribution implicitly and does not require an assumed underlying distribution. Section 3 discusses the network architecture and the state of the art in more detail.
3. Long Short-Term Memory (LSTM): The last stage aligns the temporal dynamics of the GAN simulations with those of the low-dimensional representation of the real data, similar to a non-linear version of a Vector AutoRegression (VAR) model.
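Since the LSTM stage is described as a non-linear VAR, its linear counterpart gives a compact illustration: fit a VAR(1) on real latent paths, then measure how well it predicts one step ahead on another (for example simulated) latent path. This is a sketch under the assumption that latent paths are arrays of shape (T, k); the study itself uses an LSTM, and the two-dimensional dynamics below are hypothetical.

```python
import numpy as np

def fit_var1(h):
    """OLS fit of h_t = c + A h_{t-1} + e_t for a latent sequence h of shape (T, k).
    Returns the intercept c (k,) and the coefficient matrix A (k, k)."""
    h = np.asarray(h, dtype=float)
    x = np.hstack([np.ones((len(h) - 1, 1)), h[:-1]])   # regressors [1, h_{t-1}]
    beta, *_ = np.linalg.lstsq(x, h[1:], rcond=None)    # beta has shape (k + 1, k)
    return beta[0], beta[1:].T

def one_step_mismatch(h, c, a):
    """Mean squared one-step-ahead error of the fitted dynamics on a (possibly simulated)
    latent path; larger values flag paths whose temporal dynamics differ."""
    h = np.asarray(h, dtype=float)
    pred = c + h[:-1] @ a.T
    return float(np.mean((h[1:] - pred) ** 2))

# Hypothetical 2-dimensional latent path with known linear dynamics.
rng = np.random.default_rng(0)
a_true, c_true = np.array([[0.5, 0.1], [0.0, 0.3]]), np.array([0.1, -0.2])
h = np.zeros((20000, 2))
for t in range(1, len(h)):
    h[t] = c_true + a_true @ h[t - 1] + 0.1 * rng.normal(size=2)
c_hat, a_hat = fit_var1(h)
```

A simulated path that reproduces the real dynamics should score close to the noise variance on the fitted model, whereas a path with scrambled temporal order scores noticeably higher, mirroring how prediction differences reveal flawed temporal dynamics.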

V. EXPERIMENTAL ANALYSIS
As shown in Fig. 1, I first divided the EONIA and EONIA risk factor data into train, validation, and test datasets. I then use the training dataset to select models for the Embedder, Recovery, and Supervisor networks. T-VaR estimates for EONIA and the diversity of EONIA simulations are evaluated on the validation and test datasets for T = 1, 10, and 20. To create a baseline for comparison, I carry out the same process with a variance-covariance model and a 1-factor Vasicek model.
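The data handling described above can be sketched as a chronological split plus rolling calibration windows for the baselines. The split fractions and the 5,000-day sample are hypothetical (the proportions are not stated here); the 250-day lookback follows the calibration of the Vasicek and Variance-Covariance models described earlier.

```python
def chronological_split(n, train_frac=0.6, val_frac=0.2):
    """Split indices 0..n-1 into consecutive train / validation / test blocks
    (no shuffling, so later data never leak into earlier blocks)."""
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    idx = list(range(n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def calibration_windows(eval_days, lookback=250):
    """For each evaluation day t, the baseline models are calibrated on the
    `lookback` trading days immediately before t."""
    return {t: range(t - lookback, t) for t in eval_days if t >= lookback}

# Hypothetical sample of 5,000 trading days; evaluate on the first 250 test days.
train, val, test = chronological_split(5000)
windows = calibration_windows(test[:250])
```

Keeping the blocks in time order matters for short rates, whose regime changes (for example around the GFC) would otherwise leak from the test period into training.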

Fig. 3: Values of the Latent Variables in the Training Dataset (Which Includes EONIA) Learned Over Time by the Optimal Autoencoder

Fig. 5: T-VaR for Baseline Models and TimeGAN
Compared with the Vasicek and Variance-Covariance models, the TimeGAN 1-day VaR estimates are clearly more aggressive. On closer inspection, the Variance-Covariance model's T-VaR estimates are unstable, so I exclude the Variance-Covariance model from the subsequent analyses. For more context, Fig. 6 shows examples of T-VaR produced by TimeGAN with PLS+FM on random trading days in the validation dataset. These T-VaR estimates differ from the regular TimeGAN T-VaR estimates and show a propensity towards aggressive 1-day VaR estimates.
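The T-VaR backtest can be sketched as follows: take empirical quantiles of the simulated T-day changes as lower and upper VaR, count the days on which the realized change breaches either bound, and map the count to a Basel traffic-light zone. The 99% level and the 250-observation zone thresholds (green up to 4 exceptions, yellow 5 to 9, red 10 or more) are those of the Basel Committee's backtesting framework; whether the study's coverage test [26] uses exactly this form is an assumption, and the simulated data below are placeholders.

```python
import numpy as np

def t_var(simulated_changes, alpha=0.01):
    """Lower and upper T-day VaR as the empirical alpha and (1 - alpha) quantiles
    of simulated T-day short-rate changes."""
    lower, upper = np.quantile(np.asarray(simulated_changes, dtype=float), [alpha, 1.0 - alpha])
    return float(lower), float(upper)

def count_exceptions(realized, lower, upper):
    """Days on which the realized T-day change falls below the lower VaR or above the upper VaR."""
    realized, lower, upper = (np.asarray(v, dtype=float) for v in (realized, lower, upper))
    return int(np.sum(realized < lower)), int(np.sum(realized > upper))

def basel_zone(exceptions):
    """Basel traffic-light zone for 250 observations of a 99% VaR."""
    if exceptions <= 4:
        return "green"
    if exceptions <= 9:
        return "yellow"
    return "red"

# Hypothetical backtest: 250 evaluation days, 10,000 simulated 20-day changes each.
rng = np.random.default_rng(0)
bounds = np.array([t_var(rng.normal(0.0, 0.05, 10_000)) for _ in range(250)])
realized = rng.normal(0.0, 0.05, 250)
low_exc, up_exc = count_exceptions(realized, bounds[:, 0], bounds[:, 1])
print(basel_zone(low_exc), basel_zone(up_exc))
```

Counting lower and upper exceptions separately is what allows statements such as "good upper VaR, poor lower VaR" for the same model.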

Fig. 6: T-VaR for TimeGAN PLS+FM and Baseline Models
The training dynamics of a GAN introduce variation in VaR estimates across training iterations, which makes it necessary to focus on the best training iteration. Fig. 7 visualizes the 1-day and 20-day VaR estimates for TimeGAN with PLS+FM, conditioned on the most favorable training iterations, over the validation and test periods, and reveals several insights. Notably, as T increases, the absolute difference between upper and lower T-VaR widens, accompanied by a shift in the mean. Additionally, a comparable depiction for the other TimeGAN models demonstrates.

Four nearest neighbors are shown for the TimeGAN with PLS+FM model in Fig. 8 and for the standard TimeGAN in Fig. 9. Both models demonstrate their capacity to produce varied EONIA short-rate paths, and the diversity of these paths increases as the model goes through more training steps. Fig. 10 displays the t-SNE projections onto $\mathbb{R}^2$ of the 20-day simulations of the four latent variables produced by $G_{\theta_g}(w_{1:T})$ and of the real data embeddings $E_{\theta_e}(x_{1:T})$. The partial overlap between the low-dimensional manifolds of the simulated and real data highlights certain commonalities between the two datasets.

Fig. 11: Realness Score of ESTER During the Pre-ESTER Period Based on the TimeGAN with PLS+FM Discriminator
Fig. 12 shows the correlation matrix between the pre-ESTER realness score and the stylized facts of ESTER. Because the analysis shows no significant correlation between the stylized facts and the realness score, it is difficult to determine which stylized fact primarily accounts for the realness score.
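The mapping and correlation analysis can be sketched as below. The ESTER + 8.5 bps shift is the ECB mapping used in the study; the layout of the discriminator outputs (one row per trading day, averaged across predictions) and the use of Pearson correlation for the matrix in Fig. 12 are assumptions, and the function names are hypothetical.

```python
import numpy as np

BPS = 0.01  # one basis point expressed in percentage points

def ecb_mapping(ester_pct):
    """ECB EONIA-ESTER mapping: EONIA proxy = ESTER + 8.5 bps (rates in percent)."""
    return np.asarray(ester_pct, dtype=float) + 8.5 * BPS

def daily_realness(discriminator_probs):
    """Average the discriminator's realness predictions for each trading day.
    Assumed layout: (n_days, n_predictions) probabilities."""
    return np.asarray(discriminator_probs, dtype=float).mean(axis=1)

def correlation_with_realness(facts, realness):
    """Pearson correlation of each stylized fact (columns of `facts`) with the realness score."""
    facts = np.asarray(facts, dtype=float)
    corr = np.corrcoef(np.column_stack([facts, realness]), rowvar=False)
    return corr[-1, :-1]
```

Correlations near zero for every stylized fact, as reported above, mean that no single moment or decomposition feature explains the discriminator's judgment, which motivates the nonlinear feature-importance analysis.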

Fig. 12: Correlation Matrix of the Stylized Facts of ESTER and the ESTER Realness Score

Fig. 13: Independent Variable Importance in the ESTER Realness Score During the Pre-ESTER Period

VII. DISCUSSION
The standard TimeGAN models provide reasonably accurate upper 10-day and 20-day VaR estimates for EONIA. However, their 1-day VaR estimates and lower T-VaR estimates are worse. TimeGAN with PLS+FM shows the most notable improvement: it still cannot estimate 1-day VaR for EONIA, but it generates reasonable 10-day and 20-day VaR estimates. The Variance-Covariance model produces unreliable forecasts, whereas the 1-factor Vasicek model produces adequate T-VaR estimates. Interestingly, although TimeGAN with PLS+FM accurately forecasts 10-day and 20-day interest rate risk, it cannot generate EONIA simulations as diverse as those of the 1-factor Vasicek model. Our analysis does not challenge the correctness of the ECB's proposed EONIA-ESTER mapping. Short-term and mid-term liquidity indicators, as well as inflation, are the main factors influencing the realness score. Using TimeGAN with PLS+FM calibrated on EONIA, I successfully estimate 10-day and 20-day VaR for ESTER during the pre-ESTER period, whereas the 1-factor Vasicek model has trouble projecting lower VaR for ESTER over the same horizons. TimeGAN with PLS+FM, however, performs poorly for 1-day VaR estimates for ESTER, where the 1-factor Vasicek model performs respectably.

Table 4 displays DNN for each TimeGAN model across 250 simulations in the EONIA test dataset, with DNN for the Vasicek simulations included as a baseline. Compared with the Vasicek model, DNN is lower for all T and all TimeGAN models. The Vasicek model's Euler discretization injects stochasticity at every time step. The TimeGAN model, by contrast, takes stochasticity into account through $w_{1:T}$, while the Generator network ensures that temporal links are maintained and that the generated latent representations match those of the data. As a result, the 1-factor Vasicek model produces more distinct NNs than any TimeGAN model.
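The comparison above can be reproduced in miniature: simulate Vasicek paths with an Euler scheme, then count how many distinct simulated paths are the nearest neighbour of at least one real path. This interpretation of DNN (Euclidean distance between whole paths, distinct argmin indices) is an assumption, as are the parameter values, the daily step `dt = 1/252`, and the stand-in "real" paths, which here are themselves simulated.

```python
import numpy as np

def simulate_vasicek(r0, kappa, theta, sigma, n_steps, n_paths, dt=1/252, seed=0):
    """Euler discretisation of dr = kappa (theta - r) dt + sigma dW; fresh Gaussian
    shocks enter at every time step. Returns an (n_paths, n_steps + 1) array."""
    rng = np.random.default_rng(seed)
    paths = np.empty((n_paths, n_steps + 1))
    paths[:, 0] = r0
    for t in range(n_steps):
        shock = rng.standard_normal(n_paths) * np.sqrt(dt)
        paths[:, t + 1] = paths[:, t] + kappa * (theta - paths[:, t]) * dt + sigma * shock
    return paths

def distinct_nearest_neighbours(real_paths, simulated_paths):
    """DNN: number of distinct simulated paths that are the Euclidean nearest
    neighbour of at least one real path."""
    real = np.asarray(real_paths, dtype=float)
    sim = np.asarray(simulated_paths, dtype=float)
    d2 = ((real[:, None, :] - sim[None, :, :]) ** 2).sum(axis=2)   # (n_real, n_sim)
    return int(np.unique(d2.argmin(axis=1)).size)

# Hypothetical parameters (rates in percent); 250 simulated 20-day paths.
sims = simulate_vasicek(r0=-0.45, kappa=0.5, theta=-0.45, sigma=0.05, n_steps=20, n_paths=250)
real = simulate_vasicek(r0=-0.45, kappa=0.5, theta=-0.45, sigma=0.05, n_steps=20, n_paths=30, seed=1)
print(distinct_nearest_neighbours(real, sims))
```

A generator suffering from mode collapse would produce many near-identical paths, so a few of them would be the nearest neighbour of most real paths and DNN would be small; independent shocks at every Euler step spread the Vasicek paths out and push DNN up.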