The methods used in the study, which are drawn from the literature, are described in detail under the subheadings below.
Time Series Analysis
A time series is a sequence formed by ordering the observed values of an event in time. Time series analysis is a method that aims to model the stochastic process underlying a series observed at regular time intervals and to predict future values from the observations of past periods (Box and Jenkins, 1986).
Box and Jenkins describe a time series as a set of observations generated sequentially in time (Box and Jenkins, 1976). The observed values of a time series may increase, decrease or remain constant over certain periods, and such behavior can have several causes. Investigating these changes is useful for forecasting, because a time series may show similar features in the future. These changes are the components of the time series and are examined in four groups: trend, seasonal, cyclical and random (irregular) changes.
Although it is difficult to completely eliminate seasonal, cyclical or irregular fluctuations, various methods are used to smooth the series to a certain degree. These methods will be discussed in the following sections. Within the scope of the study, the dataset was smoothed over weekly, daily and hourly time windows; since the actual data arrive every half hour, hourly smoothing was found to be the most suitable option.
Essentially, smoothing removes outliers without losing the trend of the data set. The result of the smoothing process for the specified time windows is shown in Figure 2 (smoothing was applied to all 130,000 observations; Figure 2 shows the last 500).
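The text does not state which smoothing operator was applied, so the following is only a minimal sketch under the assumption that hourly smoothing means averaging each pair of consecutive half-hourly observations. The function name `hourly_mean` and the sample readings are hypothetical and are not taken from the study's data.

```python
from statistics import mean

def hourly_mean(values, per_hour=2):
    """Average consecutive blocks of `per_hour` observations.

    With half-hourly data, per_hour=2 yields one smoothed value per hour.
    A trailing incomplete block is averaged over the values it contains.
    """
    if per_hour < 1:
        raise ValueError("per_hour must be at least 1")
    return [mean(values[i:i + per_hour])
            for i in range(0, len(values), per_hour)]

# Hypothetical half-hourly visibility readings in metres (illustration only).
half_hourly = [8000, 7000, 9000, 3000, 6000, 6000]
hourly = hourly_mean(half_hourly)
print(hourly)
```

Block averaging halves the number of observations; a centered moving average would be an alternative if the original half-hourly resolution had to be preserved.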
When a time series is stationary, that is, when the mean, variance and covariance of the process do not change over time, a suitable time series model can be applied directly. However, most time series are not stationary, because the characteristics of the underlying stochastic process change over time.
Time series are classified as stationary or non-stationary according to how they deviate from their mean. In order to apply the Box-Jenkins method to a non-stationary time series, the elements that disrupt stationarity, such as trend and seasonality, must first be removed by suitable transformations, thus making the series stationary (Pindyck and Rubinfeld, 1998).
Before performing a statistical analysis of a time series, it is necessary to investigate the stationarity of the process that generated the series. If the stochastic process is not stationary, the behavior of the series is valid only for the period under consideration, which makes it difficult to generalize to other periods. For this reason, stationarity is one of the most important properties of a time series. In addition, the classical regression model was developed for relationships between stationary variables and therefore should not be used with non-stationary series (Gujarati, 2005: 709).
If any of the explanatory variables in a regression equation is not stationary, regression theory breaks down, so stationarity is in fact a necessity. If the expected value, variance and covariance of a time series do not remain constant over time, the series is not stationary. In that case, the analysis is carried out only after the series has been made stationary using various techniques. Many economic series (especially monetary series) are not stationary.
If a series is not stationary, its expected value, its variance, or both change over time, so the estimated behavior of the series is valid only for the period under review. For this reason, it is very important to make a non-stationary time series stationary. A non-stationary time series is made stationary by differencing it one or more times.
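As an illustration of how differencing removes a trend, the sketch below applies first differencing (each value minus the previous one) one or more times. The function name `difference` and the toy series are assumptions made for this example only.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out

# A toy series with a linear trend (illustration only).
trend = [3, 5, 7, 9, 11]
once = difference(trend)         # constant increments: the trend is removed
twice = difference(trend, d=2)   # differencing again leaves zeros
print(once, twice)
```

Each round of differencing shortens the series by one observation, which is why the order of differencing is usually kept as small as possible.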
DETERMINATION OF STATIONARITY IN TIME SERIES
The methods used to determine stationarity are divided in the literature into classical and modern methods.
Classical methods rely on visual inspection of the plot of the series and of its correlogram, that is, the autocorrelation (ACF) and partial autocorrelation (PACF) plots.
Modern methods include formal statistical tests such as the Dickey-Fuller test.
DETERMINATION OF STATIONARITY WITH CLASSICAL METHODS
To assess stationarity graphically, the plots of the series in levels and in differences are examined; this shows whether the series contains a trend or seasonality and whether that component is deterministic or random.
The correlogram is obtained by plotting the autocorrelation (ACF) and partial autocorrelation (PACF) functions.
Autocorrelation Function (ACF)
The autocorrelation function (ACF) can be seen as an indicator of whether the observations in a series are independent, as it shows the correlation between observation values. Since it is not possible to fully define a stochastic process, the autocorrelation function, which partially describes the process, has an important place in model building. The autocorrelation function gives the degree of correlation between observations of the series that are separated by a given lag.
The lagged copies of the series and the original series are drawn on the same graph, and this graph shows that the lagged series have the same structure as the original series. In the next step, the autocorrelation coefficients are calculated.
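The calculation of the autocorrelation coefficients can be sketched as follows. This is a minimal implementation of the usual sample autocorrelation, in which the lag-k autocovariance is divided by the lag-0 autocovariance; the function name `acf` and the toy series are assumptions made for illustration.

```python
def acf(series, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean_value = sum(series) / n
    dev = [x - mean_value for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Toy series (illustration only).
r = acf([1, 2, 3, 4, 5], max_lag=2)
print(r)
```

The coefficient at lag 0 is always 1; for a stationary series the remaining coefficients are expected to die out quickly as the lag increases.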
Partial Autocorrelation Function (PACF)
The partial autocorrelation coefficients at all lags constitute the partial autocorrelation function. In time series analysis, the partial autocorrelation coefficients are used to determine the order of the autoregressive model. In other words, the partial autocorrelation function measures the relationship between the series and a given lag of itself after the effect of the intermediate lags has been removed.
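One common way to obtain partial autocorrelation coefficients from autocorrelation coefficients is the Durbin-Levinson recursion. The sketch below is an illustration of that recursion, not necessarily the computation used in the study; the function name `pacf_from_acf` is hypothetical, and the input is the theoretical ACF of an AR(1) process with coefficient 0.5.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations phi_11 .. phi_KK via Durbin-Levinson.

    `rho` is [rho_0, rho_1, ..., rho_K] with rho_0 == 1.
    """
    max_lag = len(rho) - 1
    pacf = []
    prev = []  # prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Theoretical ACF of an AR(1) process with coefficient 0.5: rho_k = 0.5 ** k.
partial = pacf_from_acf([1.0, 0.5, 0.25, 0.125])
print(partial)
```

For an AR(p) process the partial autocorrelations are zero beyond lag p, which is the property used when reading the order of an autoregressive model from the PACF plot.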
DETERMINATION OF STATIONARITY WITH MODERN METHODS
As can be understood intuitively from the ACF and PACF plots, the data set used in the study is already stationary without any transformation. However, modern methods, which are based on formal tests, are more reliable and have largely replaced classical methods. Accordingly, although the correlograms clearly indicate that the data set is stationary, the Dickey-Fuller (DF) test was also applied using the R programming language. With the stationarity of the data set thus established, modeling can begin.
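The idea behind the Dickey-Fuller test can be sketched as follows. The study ran the test in R; the Python code below is only an illustrative re-implementation of the simplest variant (regression with a constant and no lagged differences), and the simulated series are assumptions, not the study's data. The test regresses the first difference of the series on its lagged level and compares the t-statistic of the slope with a Dickey-Fuller critical value (approximately -2.86 at the 5% level for large samples in the constant-only case).

```python
import math
import random

def dickey_fuller_stat(y):
    """t-statistic of gamma in: dy_t = alpha + gamma * y_{t-1} + e_t."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    n = len(lagged)
    x_bar = sum(lagged) / n
    d_bar = sum(dy) / n
    sxx = sum((x - x_bar) ** 2 for x in lagged)
    sxy = sum((x - x_bar) * (d - d_bar) for x, d in zip(lagged, dy))
    gamma = sxy / sxx
    alpha = d_bar - gamma * x_bar
    rss = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err

# Simulated examples: white noise (stationary) and its cumulative sum
# (a random walk, which has a unit root).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
walk, level = [], 0.0
for shock in noise:
    level += shock
    walk.append(level)

CRITICAL_5PCT = -2.86  # approximate large-sample value, constant-only case
stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
print(stat_noise < CRITICAL_5PCT, stat_noise, stat_walk)
```

A statistic below the critical value rejects the unit-root hypothesis, which is taken as evidence of stationarity. The statistic does not follow the usual t distribution, so standard t tables must not be used.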
Time Series Models
For a time series to be stationary, its mean, variance, covariance and higher-order moments must be constant over time. If the series is not stationary, it must be made stationary (Box and Jenkins, 1976). Within the scope of the study, univariate, linear, stationary time series are discussed. Time series models can be divided into three general classes: autoregressive (AR) models were developed by Yule (1926, 1927), moving average (MA) models by Slutsky (1937) and ARMA models by Wold (1954) (Makridakis and Wheelwright, 1989). For each time series model listed below, the results for one flight selected from the diverted flight list are presented as an example.
Accuracy and model results vary for the other flights on the list; therefore, the example should not be considered representative of all flights on the list. The selected flight departed at 20:20 on 18.10.2011 and was diverted due to low visibility range. Assuming that the METAR data recorded up to the flight time are known, the visibility range is forecast 6 steps ahead of the flight time, and the results are presented graphically. The accuracy criteria and comparisons of the models fitted for all flights in the diverted flight list are given in the following sections.
Autoregressive Process Model - AR (p)
Autoregressive models (serially dependent models) are models in which the future values of a time series are estimated from its past values. Many time series follow such a process (Enders, 2004). Among the autoregressive process models, the first-order model, AR (1), was used as the starting point. As an example, the visibility range forecast for the diverted flight departing at 20:20 on 18.10.2011 is presented in Figure 3.
Figure 3 shows the AR (1) model applied to the data set of 130,000 observations, with the last 500 observations visualized. In the graph of the sample flight, the red line represents the actual data, the black line the values fitted by the model, and the blue line the forecast values. Forecasts for 6 steps ahead are shown with 85% and 95% confidence intervals. Although the AR (1) model tracks the actual data in the fitting process, its forecast follows an almost linear path that does not match the METAR data and fails to predict the sudden decrease in visibility range.
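The smooth shape of an AR(1) forecast can be illustrated with a short sketch. It assumes the model y_t = c + phi * y_{t-1} + e_t, estimates c and phi by least squares, and iterates the fitted equation forward; the function names and the toy series are hypothetical and unrelated to the study's data or to the estimation routine it used.

```python
def fit_ar1(y):
    """Least-squares estimates (c, phi) for y_t = c + phi * y_{t-1} + e_t."""
    x, z = y[:-1], y[1:]
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    phi = sum((a - x_bar) * (b - z_bar) for a, b in zip(x, z)) / sxx
    return z_bar - phi * x_bar, phi

def forecast_ar1(last, c, phi, steps=6):
    """Iterate the fitted equation forward with future errors set to zero."""
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

# Toy series generated exactly by y_t = 2 + 0.5 * y_{t-1} (illustration only).
toy = [10.0]
for _ in range(7):
    toy.append(2 + 0.5 * toy[-1])

c_hat, phi_hat = fit_ar1(toy)
ahead = forecast_ar1(toy[-1], c_hat, phi_hat, steps=6)
print(c_hat, phi_hat, ahead)
```

Because every forecast step only moves the previous value a fixed fraction of the way toward the long-run mean c / (1 - phi), the forecast path is smooth by construction. This is consistent with the observation that the AR (1) forecast cannot reproduce an abrupt drop in visibility.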
Moving Average Model - MA (q)
If the lagged error terms of a time series affect its current value, the series follows a moving average (MA) process. In other words, the value of the variable in a moving average process is related to past values of the error term (Enders, 2004). As an introduction to the moving average model, the first-order model, MA (1), was chosen. The resulting graph for the sample flight can be seen in Figure 4.
In Figure 4, the MA (1) model tracks the actual data in the fitting process; however, it fails to capture the sudden decrease in visibility range in the forecast. In contrast to the observed sudden decrease, the MA (1) model gives a much more optimistic forecast.
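For reference, the MA(1) process and its forecasts can be written in standard textbook notation as follows. The symbols are assumptions introduced here: \mu is the mean of the series, \varepsilon_t the error term, \theta_1 the moving average coefficient and T the last observed period.

```latex
\begin{align*}
y_t &= \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} \\
\hat{y}_{T+1 \mid T} &= \mu + \theta_1 \varepsilon_T \\
\hat{y}_{T+h \mid T} &= \mu \qquad (h \ge 2)
\end{align*}
```

Beyond one step ahead, the MA(1) forecast equals the mean of the series. This may help explain why a 6-step forecast from this model stays close to typical visibility values instead of following a sudden decrease.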
Autoregressive and Moving Average Model - ARMA (p, q)
Most series cannot be expressed by an AR (p) or MA (q) process alone. Such series are expressed as the sum of an autoregressive and a moving average component. If a time series has both AR and MA properties at the same time, the process is called an Autoregressive and Moving Average (ARMA) process. AR (p), MA (q) and ARMA (p, q) processes are based on the assumption that the time series is stationary. The ARMA (1,1) model results for the sample flight are presented in Figure 5.
As Figure 5 shows, the ARMA (1,1) model tracks the actual data in fitting, as do the AR (1) and MA (1) models, and it gives more realistic forecast values than either. The sudden decrease in visibility range was estimated more accurately by ARMA (1,1) than by the AR (1) and MA (1) models.
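The ARMA (p, q) process described in this subsection can be written in standard notation as the sum of an autoregressive part and a moving average part. The symbols c, \phi_i, \theta_j and \varepsilon_t are notational assumptions, since the text does not define them.

```latex
\begin{align*}
\text{ARMA}(p, q):\quad y_t &= c + \sum_{i=1}^{p} \phi_i \, y_{t-i}
      + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} \\
\text{ARMA}(1, 1):\quad y_t &= c + \phi_1 y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}
\end{align*}
```

Setting q = 0 gives the AR (p) model and setting p = 0 (with c as the mean) gives the MA (q) model, so both earlier models are special cases.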
Autoregressive Integrated Moving Average Model - ARIMA (p, d, q)
The basis of the Box-Jenkins method used in the analysis of univariate time series is to explain the value of the series in any period as a linear combination of the observed values and error terms of the same series in previous periods. The method is therefore also referred to in the literature as the Autoregressive Integrated Moving Average (ARIMA) method (Özmen, 1986).
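The "integrated" part of an ARIMA (p, d, q) model with d = 1 can be sketched as follows: the ARMA part is fitted to the first differences of the series, and the forecast differences are then accumulated back onto the last observed level. The function name `integrate_forecast` and the numbers are hypothetical, and the differenced forecasts are simply given here instead of being produced by a fitted model.

```python
def integrate_forecast(last_level, diff_forecasts):
    """Turn forecasts of first differences back into forecasts of levels."""
    levels = []
    level = last_level
    for change in diff_forecasts:
        level += change
        levels.append(level)
    return levels

# Hypothetical example: last observed visibility of 5000 m and three
# forecast changes produced by some model fitted to the differenced series.
levels = integrate_forecast(5000, [-500, -300, -100])
print(levels)
```

Because the forecast changes are accumulated, a model in differences can carry a recent downward movement forward. A model fitted to the levels of a stationary series instead reverts toward its mean.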
When applying the ARIMA model, the initial parameters were chosen as ARIMA (1,1,1). The model results are shown in Figure 6:
As the graph of the ARIMA (1,1,1) results shows, the ARIMA model tracks the actual data in the fitting process, as do the AR (1), MA (1) and ARMA (1,1) models. In addition, it estimates the sudden decrease in visibility range better than the AR, MA and ARMA models, and its forecasts tend to follow the trend of the actual data.
AutoARIMA Model
auto.arima is an R function that returns the best ARIMA model according to AIC, AICc or BIC. The function fits the best ARIMA model to a univariate time series by searching over the possible models within the specified constraints (Montgomery et al., 2008). When a modeling process yields more than one alternative model, the literature offers several model selection criteria for choosing the best one. The most common of these are the Akaike Information Criterion (AIC) and the Schwarz Criterion (SC, also known as the Bayesian Information Criterion, BIC). According to these criteria, the best model is the one with the lowest AIC or SC value (Grasa, 1989; Lutkepohl, 1991). In other words, the p, d and q values that are set manually in ARIMA models are chosen automatically by auto.arima: candidate combinations are tried in the background and the best-scoring model is returned as the output.
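The selection rule can be sketched as follows. This is not the auto.arima implementation; it only illustrates how AIC and BIC are computed from a model's maximized log-likelihood and its number of estimated parameters, and how the candidate with the lowest value is chosen. The candidate orders, log-likelihoods and parameter counts below are hypothetical and are not results from the study.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion for k estimated parameters."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Schwarz (Bayesian) criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical candidates: (p, d, q) -> (maximized log-likelihood, parameter count).
candidates = {
    (0, 1, 1): (-1250.0, 2),
    (1, 1, 1): (-1240.0, 3),
    (2, 1, 2): (-1239.5, 5),
}
n_obs = 500  # hypothetical sample size

best_by_aic = min(candidates, key=lambda order: aic(*candidates[order]))
best_by_bic = min(candidates, key=lambda order: bic(*candidates[order], n_obs))
print(best_by_aic, best_by_bic)
```

Both criteria reward a higher likelihood and penalize additional parameters; BIC penalizes them more heavily for larger samples, so the two criteria do not always select the same model.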
For the sample flight, the results of the ARIMA (5,1,6) model returned by the auto.arima function are shown in Figure 7:
As Figure 7 shows, the ARIMA (5,1,6) model selected by the auto.arima function tracks the actual data in the fitting process, as do the AR (1), MA (1), ARMA (1,1) and ARIMA (1,1,1) models. It predicts the sudden decrease in visibility range much better than those models, and both its ability to capture the behavior of the actual data and its accuracy are significantly higher.
Vector Autoregressive Model - VAR (p)
The Vector Autoregression (VAR) model is an extension of the univariate autoregressive model to multivariate time series data. In this model, all variables are treated as endogenous, and each variable is defined as a function of its own lagged values and the lagged values of the other variables in the model.
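A first-order VAR with two variables illustrates this structure: each variable's next value is a constant plus a weighted sum of the previous values of all variables. The sketch below only iterates a given VAR(1) equation forward. The coefficient values and the function name `var1_forecast` are hypothetical, and the text does not state which variables enter the study's VAR model.

```python
def var1_forecast(y_last, c, A, steps=1):
    """Iterate y_t = c + A @ y_{t-1} forward, with future errors set to zero."""
    size = len(y_last)
    y = list(y_last)
    path = []
    for _ in range(steps):
        y = [c[i] + sum(A[i][j] * y[j] for j in range(size))
             for i in range(size)]
        path.append(y)
    return path

# Hypothetical two-variable system (illustration only).
c = [1.0, 0.0]
A = [[0.5, 0.25],   # variable 1 depends on its own lag and on variable 2's lag
     [0.0, 0.5]]    # variable 2 depends only on its own lag
path = var1_forecast([4.0, 8.0], c, A, steps=2)
print(path)
```

The off-diagonal entries of A are what distinguish a VAR from a set of separate univariate AR models: they let past values of one variable inform the forecast of another.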
Figure 8 shows the VAR model applied to the data set of 130,000 observations, with the last 500 observations visualized. The red line shows the actual data, the black line the values fitted by the VAR model, and the blue line the forecasts of the VAR model. Forecasts for 6 steps ahead are shown with 85% and 95% confidence intervals. As can be seen from the chart, the VAR model tracks the actual data in the fitting process, as do the AR, MA, ARMA and ARIMA models. In forecasting, it modeled the sudden decrease in visibility range better than the AR, MA, ARMA and ARIMA models, and it captured the trend of the actual data much more successfully than the ARIMA model. However, since the results of the AutoARIMA and VAR models are close to each other, error measures were used to determine which model is more successful.