FeRM Model for Time Series Forecasting

doi:10.21203/rs.3.rs-2010849/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Abstract

Time series forecasting analyzes data points indexed in time order and fits a model to predict future values. Time series data can be hourly, daily, weekly, monthly, or annual. Time series analysis is used to understand the underlying trends and patterns over time in order to predict future outcomes accurately. Traditional models such as ARIMA and deep learning models such as RNNs and Transformers are used for long-term forecasting because they make effective use of historical information. However, there is still considerable room for improvement when the historical data available for training is limited. In this paper, we present a novel time series model that gives better predictions in cases where traditional models fall short, since not all time series problems can be solved by traditional models. Our model combines Fourier extrapolation, a Rate of change approximation and the arithmetic Mean of the data points. This combination yields better predictions and lower errors than the state-of-the-art models we compared against, with little computational overhead.

Time series analysis

Modeling techniques

Modeling and prediction

Extrapolation

1 Introduction

Time series forecasting aims to predict the values of a series in the near future and is used in a wide range of domains, including finance, business, medicine and weather. Time series analysis is used extensively in today’s data-driven world. Traditional models and advanced deep learning models achieve significant prediction accuracy, but they benefit from large amounts of data: the size of the training data has a significant impact on model performance.

The data in our use case was sparse and discontiguous (a time series whose observations are not uniformly spaced over time is described as discontiguous). Using the traditional models, we observed a cyclical pattern of behavior that moving averages did not capture, and the forecasts suffered from significant error accumulation. Motivated by this, we present FeRM, a computationally efficient model that at its core leverages:

1. Fourier extrapolation (to take into account the pattern/ trend),

2. The rate of change approximation (to consider the change in relation to the training data) and

3. The arithmetic mean of each observation (to capture the central tendency, i.e., a central value for a probability distribution), which gave the model a lot of insight for forecasting.

Taking these three factors into consideration reduces the prediction error and lets the model forecast a sequence of values close to the observed ones.

2 Preliminaries

2.1 Dataset

Python is used to carry out detailed data analysis on the dataset. We used our proprietary data, which is available on Kaggle for reference [4]. The dataset holds day-wise time series data with two columns: “ds”, the date, and “y”, the sales value. The sales values range from 0 to 10, with 402 zeros among 1094 records (3 years).

2.2 Background

Traditional models such as ARIMA and Prophet were tried but did not succeed on this data. To address this problem, we customized and tweaked various methods, which led to the development of the FeRM model.

ARIMA - The autoregressive integrated moving average is a statistical model that uses time series data to better understand the data set and predict future trends. The rule of thumb for ARIMA is that at least 50 data points are required, and preferably more than 100 observations. This is a drawback for our use case because we fall short in the number of data points. Hanke and Wichern [1] recommend a minimum of 2 × s to 6 × s observations depending on the method, where s is the seasonal period, so s = 12 for monthly data. With monthly data, 50 data points correspond to 50/12 ≈ 4.2, i.e., about 4 years of data. However, this also depends on the regularity of the data: if the seasonal pattern is regular, 3 years of data would be sufficient.

Auto ARIMA finds the best model parameters by performing a stepwise search. Python is used extensively to organize and interpret the data for further analysis. The values predicted by the Auto ARIMA model from our two-year data were confined to a narrow range (1.3, 1.35), which rounds to a constant value of one throughout the year (see Fig. 3).

Prophet - The Prophet algorithm [2] first detects the trend and seasonality in the data, then combines them to produce the forecast, taking into account the overall trend, seasonality and holiday effects. Prophet has made time series forecasting considerably easier because it works without any parameters being set explicitly. It fits additive regression models (‘curve fitting’). The values predicted by the Prophet model from our two-year data were confined to a narrow range (2.0, 3.0), as shown in Fig. 4.

The major problem we faced with these models is that the predicted values fall in a narrow range and the trends are simplistic and weak. This leads to the conclusion that the situation is stable, with a predicted value of 1 for every date throughout the year (considering the Auto ARIMA forecast). The FeRM model, by contrast, aims to produce a sequence close to the observed one: it forecasts each value by taking the trends into account, and thus its value range is comparatively wider.

2.3 Evaluation Metrics

We used Mean Squared Error (MSE), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), with reference to [5], as the key performance metrics for comparing the models.

MSE measures the average of the squares of the errors, that is, the average squared difference between the forecast values and the actual values. The ideal MSE is 0.0, which means that every predicted value matches the expected value exactly. As model error increases, the MSE increases. Because the errors are squared, MSE penalizes large errors heavily, which makes it most useful when the dataset contains unexpected values. MSE is scale-dependent, i.e., its magnitude depends on the units of the dependent variable.

\(\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\text{actual}_{i} - \text{forecast}_{i})^{2}\)

where:

n = number of items,

Σ = summation notation,

actual = original or observed y-value,

forecast = y-value from regression.

MAE is the average absolute difference between the actual values and the values predicted by the model.

\(\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |\text{actual}_{i} - \text{forecast}_{i}|\)

MAE averages this absolute error over every sample in the dataset, so each error influences MAE in direct proportion to its absolute value.

Root mean squared error (RMSE) measures the average magnitude of the error. It is the square root of the mean of the squared errors: \(\text{RMSE} = \sqrt{\text{MSE}}\).
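The three metrics can be illustrated with a minimal sketch in plain Python. The function names and the sample values below are hypothetical and chosen only for illustration; they are not taken from the dataset [4].

```python
import math


def mse(actual, forecast):
    """Mean squared error: average of the squared differences."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error: average of the absolute differences."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(actual, forecast))


# Hypothetical daily sales (actual) and model output (forecast).
actual = [0, 3, 7, 2]
forecast = [1, 1, 5, 2]
print(mse(actual, forecast), mae(actual, forecast), rmse(actual, forecast))
```

Because RMSE is the square root of MSE, it is expressed in the same units as the sales values, which makes it easier to compare with MAE.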

3 FeRM Model

3.1 Fourier Extrapolation

The Fourier transform is a powerful tool for efficiently representing and analyzing real-world signals. Since real-world data frequently exhibit slowly changing trends, the Fourier transform is well suited to analyzing them. Its basic idea is to convert a function of amplitude versus time into a function of amplitude versus frequency; in other words, it decomposes a function of time into its frequency components. The version of the Fourier transform needed for time series data is the discrete Fourier transform (DFT). It is called discrete because the input is measured at discrete intervals: our time series is not a continuous function. In mathematical terms, the operation can be expressed as:

\({X}_{k} = \sum _{n=0}^{N-1}{x}_{n}\, {e}^{-2\pi ikn/N}\)

The input to this formula is the sequence of values of the original variable. The original time-domain signal is denoted by ‘x’.

\(x_{n}\) is the time-domain data point at position n. The output has one value per frequency, whose magnitude is the amplitude of that frequency. Data points in the frequency domain are denoted by ‘X’.

\(X_{k}\) is the value at frequency k. Each value in the output series therefore gives the strength of a specific frequency.
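To make the formula concrete, the sketch below evaluates the DFT directly from its definition and reads off the strongest frequency. It is illustrative only: the helper names `dft` and `dominant_period` and the synthetic weekly series are assumptions, and in practice a fast Fourier transform routine would be used instead of this slow direct sum.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform computed directly from the definition."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def dominant_period(x):
    """Period (in samples) of the strongest non-constant frequency."""
    spectrum = dft(x)
    half = len(x) // 2
    # Skip k = 0 (the constant term). For real input the spectrum is
    # symmetric, so only k = 1 .. N/2 needs to be inspected.
    k_best = max(range(1, half + 1), key=lambda k: abs(spectrum[k]))
    return len(x) / k_best


# Hypothetical series: a weekly cycle observed over 8 weeks of daily data.
series = [math.sin(2 * math.pi * t / 7) for t in range(56)]
print(dominant_period(series))
```

A large magnitude at index k corresponds to a cycle that repeats every N/k samples, which is how a strong amplitude points to a seasonal period.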

So, we can use a Fourier transform to detect seasonality in time series data. Time series data is a function of time, and an important aspect of it is seasonality. If the amplitude at a frequency is high, the seasonality at that frequency is important in the time series. Extrapolation refers to estimating an unknown value by extending a known sequence of values (refer to Fig. 5). Fourier extrapolation simply extends the past trend and values into the future: it repeats the series with period N,

where N is the length of the time series.

This is used to extend the data outside the original window. Extrapolation methods that incorporate more data are more likely to improve forecast accuracy. For the implementation of Fourier extrapolation, we used Artem Tartakynov’s GitHub Gist [7].
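The idea can be sketched with NumPy as follows. This is a minimal illustration written for this section, not the code from [7] and not necessarily the exact procedure used in our experiments. The function name `fourier_extrapolate`, its parameters, the linear detrending step and the number of harmonics kept are all assumptions of the sketch.

```python
import numpy as np


def fourier_extrapolate(x, n_predict, n_harmonics=10):
    """Extend a series by n_predict points using its strongest harmonics."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t = np.arange(n)

    # Assumption of this sketch: remove a straight-line trend first.
    slope, intercept = np.polyfit(t, x, 1)
    detrended = x - (slope * t + intercept)

    spectrum = np.fft.rfft(detrended)  # DFT of a real-valued signal
    freqs = np.fft.rfftfreq(n)         # bin frequencies in cycles per sample

    # Keep the constant term (k = 0) plus the n_harmonics strongest bins.
    strongest = np.argsort(np.abs(spectrum[1:]))[::-1][:n_harmonics] + 1
    keep = np.concatenate(([0], strongest))

    t_ext = np.arange(n + n_predict)
    restored = np.zeros(t_ext.size)
    for k in keep:
        amplitude = np.abs(spectrum[k]) / n
        phase = np.angle(spectrum[k])
        # Bins other than k = 0 and the Nyquist bin stand for a conjugate pair.
        paired = k != 0 and not (n % 2 == 0 and k == n // 2)
        scale = 2.0 if paired else 1.0
        restored += scale * amplitude * np.cos(
            2 * np.pi * freqs[k] * t_ext + phase)

    # Put the trend back, extended over the forecast horizon.
    return restored + slope * t_ext + intercept


# Hypothetical series: an 8-sample cycle on top of a slow upward drift.
history = np.sin(2 * np.pi * np.arange(64) / 8) + 0.05 * np.arange(64)
forecast = fourier_extrapolate(history, n_predict=16, n_harmonics=5)[64:]
print(forecast.round(2))
```

Outside the original window the detrended part of the output repeats with period N, which matches the description of Fourier extrapolation given above. Keeping fewer harmonics gives a smoother forecast; keeping all of them reproduces the history exactly.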

3.2 Rate of change

We use the average rate of change, i.e., the rate at which one value of a function changes in relation to another over a specific period of time, which can reduce forecast errors substantially. Finding the average rate of change involves the following steps: 1. Identify and label the two points used to calculate the average rate. 2. Subtract the first y-value from the second y-value. 3. Subtract the first x-value from the second x-value. 4. Divide the difference of the y-values by the difference of the x-values. The resulting value, \((y_{2} - y_{1}) / (x_{2} - x_{1})\), is the average rate of change.
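The four steps above translate into a very small function. The name `average_rate_of_change` and the sample points are hypothetical; how the resulting rate is turned into a forecast component is not shown here.

```python
def average_rate_of_change(x1, y1, x2, y2):
    """Average rate of change between the points (x1, y1) and (x2, y2)."""
    if x2 == x1:
        raise ValueError("the two points must have different x-values")
    # Steps 2 to 4: difference of y-values divided by difference of x-values.
    return (y2 - y1) / (x2 - x1)


# Hypothetical example: sales rise from 2 on day 10 to 8 on day 13.
print(average_rate_of_change(10, 2, 13, 8))
```

For a daily series, the x-values can be taken as day indices, so the result reads as the change in sales per day.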

3.3 Arithmetic Mean

The arithmetic mean is obtained by summing two or more numbers and dividing by the count of numbers. We include the arithmetic mean as a component to describe the central tendency of the data set, i.e., the size of a typical value in it. The advantages of including the arithmetic mean are the following:

1. It is influenced by the value of every item in the series,

2. It is a measured value and not based on the position in the series,

3. It is least affected by fluctuations.

The mean is an important measure because it incorporates every observation in the data set. The steps for calculating it are: 1. Count the total number of observations, referred to in statistics as n. 2. Add up all the values. 3. Divide the sum by the total number of observations.

\(\text{Mean} = \frac{1}{n} \sum_{i=1}^{n} x_{i}\)

where:

Σ: A symbol that means “sum”,

\(x_{i}\): The i-th observation in a dataset,

n: The total number of observations in the dataset.

We take the arithmetic mean of these three components, and the result is the final prediction (refer to Fig. 7). The arithmetic mean gave better results than additive or multiplicative relationships. In an additive model, the four components of a time series, namely the trend component (T), seasonal component (S), cyclical component (C) and irregular component (I), are added to form the value of the time series at each time period: Y = T + S + C + I. In a multiplicative model, the original time series is expressed as the product of the trend, seasonal and irregular components.
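The combination step can be sketched as an element-wise average of the three component forecasts. The function name `combine_components` and the toy numbers are hypothetical, and the sketch deliberately takes the three components as given inputs: how each one is derived for a forecast date is an assumption left outside the code.

```python
def combine_components(fourier, rate_of_change, mean_level):
    """Element-wise arithmetic mean of three equally long component forecasts."""
    if not (len(fourier) == len(rate_of_change) == len(mean_level)):
        raise ValueError("all three components must have the same length")
    return [
        (f + r + m) / 3
        for f, r, m in zip(fourier, rate_of_change, mean_level)
    ]


# Hypothetical component values for a three-day forecast horizon.
fourier = [3.0, 6.0, 0.0]
rate_of_change = [2.0, 3.0, 1.0]
mean_level = [1.0, 3.0, 2.0]
print(combine_components(fourier, rate_of_change, mean_level))
```

Averaging rather than adding keeps the combined forecast on the same scale as each individual component, which is one plausible reason it behaves better than an additive combination on this data.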

4 Results and Comparison

FeRM is a simple model, yet it serves as a competitive baseline with strong interpretability. We compare it with the following existing methods:

Table 1

Performance comparison of three models on the dataset [4]. The best (lowest) results are in the FeRM row.
Methods	MSE	MAE	RMSE
Prophet	4.4	1.7	2.1
Auto Arima	4.1	1.5	2.0
FeRM	3.5	1.2	1.8

As Table 1 shows, the FeRM model predicts the future better than the Prophet and Auto Arima models, with lower prediction errors. Although the differences in the error values do not seem large, they are consistent with the prediction plots (Fig. 4, 5, 6 and 7), which clearly show that the value ranges of the other models are tightly packed. In particular, the Auto Arima forecast in Fig. 4 stays between 1.3 and 1.35, and yet its error metrics remain low. Thus, although a plot is not a skill metric, it is important enough to warrant discussion: it allows a quick visual assessment of the predictions and of systematic bias, and it can influence the conclusions. The plots show the advantage of the FeRM model over the state-of-the-art methods we compared. Ideally, all the predicted points should be close to the actual values (blue line). So, if the actual value is 7, the predicted value should be reasonably close to 7.

The research paper – ‘Simple versus complex forecasting: The evidence’ [6] states that complexity increases forecast error by 27 per cent on average in the 25 papers with quantitative comparisons. Eighty-three per cent of the comparisons found that forecasts from simple methods were more accurate than, or similarly accurate to, those from complex methods. On average, the errors of forecasts from complex methods were about 32 per cent greater than the errors of forecasts from simple methods in the 21 studies that provide comparisons of errors.

5 Conclusion

In this paper, we propose a novel model for time series forecasting, motivated by the particular properties of our time series data. As Fig. 7 shows, the FeRM forecast follows a zig-zag line rather than a nearly flat one, and the forecasts turned out to be more accurate. The FeRM model gives gains in forecasting and consistently outperforms the more complex models. Our results show that FeRM outperforms the existing models, at the least by a small margin. Auto Arima and Prophet give worse results and have higher time and memory complexity, owing to the additional design elements in those models.

Future Work

Our proposed model serves its purpose in places where the training data is limited and discontiguous. Consequently, we believe there is much potential for new model designs to tackle complex time series problems. An extension of this would be adding a regressor component which might further improve the prediction accuracy.

Declarations

Funding: The authors received no financial support for the research, authorship, and publication of this article.

Authors Contribution:

Author 1 - Conducted the experiments, analyzed the data and wrote the paper

Author 1 & 2 - Conception and design of the work

Author 3 - Data collection

Author 1 & 2 - Drafting the article

All authors contributed to manuscript review and revisions. All authors approved the final version of the manuscript and agree to be held accountable for the content herein.

Conflict of interest: All the authors declare that they have no conflict of interest.

References

[1] John E. Hanke, Dean W. Wichern. Business Forecasting.
[2] Sean J. Taylor, Ben Letham. Prophet: forecasting at scale. https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/
[3] Kan Nishida. An Introduction to Time Series Forecasting with Prophet in Exploratory. https://blog.exploratory.io/an-introduction-to-time-series-forecasting-with-prophet-package-in-exploratory-129ed0c12112
[4] Time Series Sales Dataset. https://www.kaggle.com/datasets/vibinvijay/time-series-sales-dataset
[5] Alexei Botchkarev. Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology.
[6] Kesten C. Green, J. Scott Armstrong. Simple versus complex forecasting: The evidence. Journal of Business Research, 2015.
[7] Artem Tartakynov. Fourier Extrapolation in Python. GitHub Gist. https://gist.github.com/tartakynov/83f3cd8f44208a1856ce
[8] S. Sharmin, F. I. Alam, A. Das and R. Uddin, "An Investigation into Crime Forecast Using Auto ARIMA and Stacked LSTM," 2022 International Conference on Innovations in Science, Engineering and Technology (ICISET), 2022, pp. 415–420, doi: 10.1109/ICISET54810.2022.9775862.
[9] H. Musbah, M. El-Hawary and H. Aly, "Identifying Seasonality in Time Series by Applying Fast Fourier Transform," 2019 IEEE Electrical Power and Energy Conference (EPEC), 2019, pp. 1–4, doi: 10.1109/EPEC47565.2019.9074776.

No competing interests reported.
