Generalized support vector machines (GSVMs) model for real-world time series forecasting

Support vector machines (SVMs) are among the most popular and widely used modeling approaches. Many kinds of SVM models have been developed in the prediction and classification literature to serve different purposes. Crisp and fuzzy support vector machines are well-known branches of this family, frequently applied to certain and uncertain modeling, respectively. However, each model can only be used efficiently in its own domain and cannot yield relevant and accurate results when the opposite situation occurs, whereas real-world systems and data sets often contain both certain and uncertain patterns that are intricately mixed and need to be modeled simultaneously. In this paper, a generalized support vector machine is proposed that can simultaneously benefit from the unique advantages of the certain and uncertain versions of traditional support vector machines in their specialized categories. In the proposed model, the underlying data set is first categorized into two classes of certain and uncertain patterns. Then, certain patterns are modeled by a support vector machine, and uncertain patterns are modeled by a fuzzy support vector machine. After that, the function of the relationship between the two components, as well as the relative importance of each, is estimated by another support vector machine. Finally, the forecasts of the proposed model are calculated. Empirical results of wind speed forecasting indicate that the proposed method not only achieves more accurate results than support vector machines and fuzzy support vector machines but also yields better forecasting performance than traditional fuzzy and nonfuzzy single models and traditional preprocessing-based hybrid SVM models.


Introduction
The effectiveness of machine learning (ML) techniques in prediction has been recognized in many application domains. For example, deep learning techniques based on convolutional and recurrent neural networks have recently been used for a variety of classification problems (Darabi et al. 2019). Theoretical analyses in the literature demonstrate the reliability and superior performance of learning machines (Xie et al. 2020). Among the various ML techniques, the literature indicates that hybrid methods outperform single models and that ensemble learning techniques are among the most accurate (Ahmadi and Khashei 2021). Such techniques decompose the predictive task into sub-tasks. In other words, hybrid techniques generalize the advantages of single models through meta-learning, so that, by using multiple experts that each work in a more homogeneous region, the performance of the hybrid model increases. This is the case, for example, for hybrids of statistical models and neural networks. Given the importance of learning machines, several studies have addressed wind power and speed forecasting, modeling different data sets with single and multiple expert models. Wang et al. (2019) used IPN-RBF for short-term wind speed forecasting and compared the proposed model with other models on statistical indices to show its accuracy. Maruliya Begam and Deepa (2019) proposed RBFNN-ENN-SNN-WNN for short-term wind speed forecasting from historical wind speed and power data; the results show the high performance of the proposed model. Jiang and Li (2018) introduced a hybrid BPNN-ENN-GRNN model for ultra-short-term wind speed forecasting, and the results show that it performs better than other models. Men et al. (2016) provided an ensemble NARX-ANN model for short-term wind power and speed forecasting.
The numerical results show that the proposed model has high accuracy in multistep-ahead prediction of wind speed and power. Among popular machine learning techniques, support vector machines (SVMs) and artificial neural networks (ANNs) are among the most widely used and important methods. Support vector machines, or support vector networks, are supervised learning approaches frequently used for prediction and classification tasks. Neural networks have been regarded in recent decades as important tools for handling complex problems and have many applications in science. However, numerous researchers have preferred support vector machines to artificial neural networks because of some specific advantages of these models. Unlike artificial neural networks, support vector machines do not require choosing the number and size of hidden layers or the activation functions in order to create a high-precision model. Furthermore, support vector machines often have better accuracy and speed in solving nonlinear problems. For these reasons, support vector machines have been widely used in wind energy forecasting in recent years. Mohandes et al. (2004) used a support vector machine model to predict wind speed. A 12-year time series, collected between 1970 and 1982 in Medina, Saudi Arabia, was used in order to obtain reliable and accurate results. The support vector machine and multilayer perceptron models were compared, and the mean squared error of the SVM was shown to be lower than that of the multilayer perceptron. Zhou et al. (2011) estimated short-term wind speed in North Dakota using the least squares support vector machine (LSSVM), based on hourly wind speed data from the year 2002.
In that study, the data set was divided into four seasonal sections, and considerable effort was devoted to tuning the LSSVM model accurately with three kernel functions, namely linear, Gaussian, and polynomial. The effectiveness of the model was examined by the mean squared error (MSE), and the comparison demonstrates its ability relative to other models. Jiang et al. (2019a) developed a hybrid model comprising variational mode decomposition, a multi-objective salp swarm algorithm, and a least squares support vector machine (LSSVM). This model was then used for wind power forecasting from wind speed data in China at 10 min, 30 min, and 60 min horizons. Comparing the multi-objective optimization models with the single-objective optimization methods, it can be concluded that the multi-objective salp swarm algorithm is considerably more accurate than the other models. Chen et al. (2019) used a hybrid model combining singular spectrum analysis (SSA) and the LSSVM model for wind speed forecasting. In that study, three data sets were collected from wind speed data at the National Wind Technology Center (NWTC) of the National Renewable Energy Laboratory (NREL). The experimental results indicate that the proposed model performs better than other conventional methods. Hong et al. (2019) developed a model combining a high-frequency morphological filter (MHF), a double similarity search (DSS) algorithm, and a least squares support vector machine (LSSVM) for short-term wind power and speed forecasting. The results demonstrate that the MHF-DSS model provides more accurate and stable forecasts than the other methods. Li and Jin (2018) used a new multi-objective ant lion algorithm (MOALO) with an LSSVM model for ultra-short-term wind speed forecasting. Numerical results show that the proposed model has higher accuracy than other hybrid models.
Chen and Jie (2014) used a hybrid model consisting of the Kalman filter and support vector regression (SVR) for short-term wind speed forecasting. The SVR-KF method was compared with artificial neural networks (ANNs), SVR, autoregressive (AR), and AR-Kalman approaches. The forecasting results indicate that the proposed method performs better than the other approaches in both one-step-ahead and multistep-ahead wind speed prediction across all locations. Jiang et al. (2018a) introduced a new hybrid method (GARCH, LSSVM) for ultra-short-term wind speed forecasting. The results show that the proposed method is more satisfactory than the others in both accuracy and stability. Guo et al. (2011) used a SARIMA-LSSVM model for ultra-short-term wind speed forecasting. To demonstrate the ability of the proposed model, monthly data from January 2001 to December 2006 for the Masong and Jiukan Mountains were used. The numerical results show that the proposed model is efficient. Yuan et al. (2017) proposed a hybrid model incorporating an autoregressive fractionally integrated moving average model and an LSSVM model for short-term wind power forecasting. The autoregressive fractionally integrated moving average model is used to predict the linear components of the wind power series, and the LSSVM model is applied to predict the nonlinear patterns. Simulation results show that the proposed hybrid model has the lowest error values among the compared models. Khosravi et al. (2018) used the SVR-RBF model for 5 min, 10 min, 30 min, and one-hour wind speed and direction forecasting. In that paper, a large set of wind speed and wind direction data from Bushehr was used in order to predict the wind speed and its direction accurately. Numerical results indicate that the proposed SVR-RBF model achieves the lowest statistical errors and the highest correlation coefficient for all time intervals.
Zafirakis et al. (2019) provided an ensemble SVR-ENN for short-term wind power and speed forecasting. The numerical results show that the proposed model has high accuracy in multistep-ahead prediction of wind speed and power. Wu and Xiao (2019) used an ensemble ANNs-SVM-MLR for ultra-short-term wind speed forecasting. The numerical results show that the proposed model is efficient. Liu et al. (2019) proposed an ensemble CNNGRU-SVR for short-term wind speed forecasting. A daily wind speed time series was used to demonstrate the efficiency of the proposed model. Jiang et al. (2019b) investigated an ensemble method (CEEMDAN-Holt's method-SVR) for ultra-short-term wind speed forecasting. The results show that the proposed method gives more acceptable results than the others. Despite the unique benefits of support vector machine models in solving nonlinear problems, these models have weaknesses in uncertain environments with complex patterns. Although fuzzy support vector machines (FSVMs) have also been developed in the literature to handle uncertainty, no existing support vector machine model handles certain and uncertain patterns simultaneously. Articles on wind power and speed forecasting based on support vector machines published in recent years are listed in Table 1. As mentioned, using different kinds of support vector machines, especially in hybrid form, increases the accuracy of forecasts in the wind power and wind speed fields. Although support vector machine models have effectively reduced the inconsistencies and weaknesses of artificial neural network models, they may not achieve the desired performance, especially on ambiguous patterns (Zendehboudi 2016). To eliminate the shortcomings of support vector machine models in modeling uncertain environments, fuzzy support vector machine (FSVM) models have been developed. Fuzzy support vector machines were first introduced by Lin and Wang (2002).
Fuzzy support vector machines are among the most widely used uncertain models for solving fuzzy nonlinear problems in the literature. Although fuzzy support vector machine models have a high ability to model uncertainty, they can only model a specific part of the patterns in a data set (i.e., the uncertain patterns) efficiently, whereas actual data sets often contain both certain and uncertain patterns (Mencar and Pedrycz 2020). A more comprehensive model is therefore needed that can model certain and uncertain patterns simultaneously. In this paper, a generalized support vector machine is presented in which support vector machines and fuzzy support vector machines are combined. In the proposed model, the raw data set is first decomposed into certain and uncertain patterns by a preprocessing model. Then, certain patterns are modeled by a certain support vector machine, and uncertain patterns are modeled by a fuzzy support vector machine. At the last stage of the proposed model, the function of the relationship between the components, as well as the relative importance of each, is estimated by another support vector machine. Finally, forecasts of the proposed model are calculated. The remainder of the paper is organized as follows. In the next section, the methodology and the models applied as components of the proposed model are briefly introduced. In Sect. 3, the proposed model is explained. In Sect. 4, the numerical results of using the proposed model for wind power and speed forecasting are reported, and the performance of the proposed model is compared with that of other models in terms of accuracy. Some conclusions are offered in the last section.

Methodology
In this section, the details of the Kalman filter, the support vector machines (SVMs), and the fuzzy support vector machines (FSVMs) are briefly presented.

Kalman filter (KF)
The Kalman filter is based on the hypothesis of non-delayed measurements (Kumar Singh 2020). It is a sequential data preprocessing technique that addresses the state estimation problem of linear dynamical systems. There are various Kalman filtering (KF) techniques, ranging from the conventional KF for linear problems to the extended KF, unscented KF, and ensemble KF for nonlinear problems (Fang et al. 2018). A linear m-dimensional state vector $x \in \mathbb{R}^{m}$ at a given time $z$ can be expressed as follows (Lynch et al. 2014):

$$x(z+1) = M\,x(z) + \eta(z) \qquad (1)$$

where $M$ is the $m \times m$ state transition matrix and $\eta(z) \in \mathbb{R}^{m}$ is the model noise, which follows a normal distribution with covariance matrix $Q$, denoted here by $\eta \sim N(0, Q)$. The observation vector $y(z)$ of dimension $p$ at a given time $z$ can be written as:

$$y(z) = H\,x(z) + \epsilon(z) \qquad (2)$$

where $H$ is the $p \times m$ observation operator that maps the model state to the p-dimensional observation space, and $\epsilon(z) \in \mathbb{R}^{p}$ is assumed to be Gaussian noise with error covariance matrix $R$. The method consists of two steps, analysis and forecast. In the analysis step (a), the Kalman filter provides an estimate of the state from the most recent observations. Then, in the forecast step (f), the previous estimate is propagated in time until new observations become available. The analysis state $x^{a}(z)$ and analysis covariance $p^{a}(z)$ in the analysis step are given by:

$$x^{a}(z) = x^{f}(z) + G\left[y(z) - H\,x^{f}(z)\right] \qquad (3)$$

$$p^{a}(z) = \left[I - G\,H\right] p^{f}(z) \qquad (4)$$

where $I$ is the $m \times m$ identity matrix and the Kalman gain matrix, $G$, is given by:

$$G = p^{f}(z)\,H^{\top}\left[H\,p^{f}(z)\,H^{\top} + R\right]^{-1} \qquad (5)$$

In the forecast step, both $x^{a}(z)$ and $p^{a}(z)$ are evolved forward to time $z + 1$ using Eqs. (6) and (7):

$$x^{f}(z+1) = M\,x^{a}(z) \qquad (6)$$

$$p^{f}(z+1) = M\,p^{a}(z)\,M^{\top} + Q \qquad (7)$$
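As a minimal sketch of the analysis and forecast steps described above, the following Python function implements the scalar (one-dimensional) case, in which the transition matrix, observation operator, and noise covariances reduce to plain numbers. The function name, the default parameter values, and the initial state `x0` and covariance `p0` are assumptions made for illustration and are not taken from the paper.

```python
def kalman_filter_1d(observations, M=1.0, H=1.0, Q=1e-3, R=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter; returns the analysis (filtered) state per observation.

    M, H, Q, R follow the notation of the text (state transition, observation
    operator, model-noise variance, observation-noise variance).
    x0 and p0 are an assumed initial forecast state and covariance.
    """
    x_f, p_f = x0, p0  # forecast state and covariance before the first observation
    analysis_states = []
    for y in observations:
        # Analysis step: correct the forecast with the new observation.
        gain = p_f * H / (H * p_f * H + R)   # Kalman gain
        x_a = x_f + gain * (y - H * x_f)     # analysis state
        p_a = (1.0 - gain * H) * p_f         # analysis covariance
        analysis_states.append(x_a)
        # Forecast step: evolve the analysis forward to the next time index.
        x_f = M * x_a
        p_f = M * p_a * M + Q
    return analysis_states
```

Calling `kalman_filter_1d(series)` on a noisy series returns a smoothed sequence of the same length; a larger `R` relative to `Q` makes the filter trust the observations less and yields a smoother output.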

Support vector machines (SVMs)
A support vector machine is a machine learning algorithm built on statistical learning theory and the principle of structural risk minimization. Support vector machines were first proposed by Cortes and Vapnik (1995). SVMs have been successfully used for various purposes, such as image retrieval, error detection, text detection, and regression problems. The main idea of this approach is to map an input space in which the problem is nonlinear into a high-dimensional feature space in which it becomes linear. In support vector regression (SVR) models, the support vector machine is used for function approximation and regression. Various kernel functions are used in support vector machine models, including the polynomial, radial basis, sigmoid, and linear functions.
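The four kernel families named above can be sketched in a few lines of plain Python. The parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions chosen to mirror common library conventions; the paper does not state which parameterization it uses.

```python
import math


def linear_kernel(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <u, v> + coef0) ** degree."""
    return (gamma * linear_kernel(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    """exp(-gamma * ||u - v||^2), the radial basis (Gaussian) kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    """tanh(gamma * <u, v> + coef0)."""
    return math.tanh(gamma * linear_kernel(u, v) + coef0)
```

Each function takes two input vectors and returns a scalar similarity, which is what the SVM uses in place of an explicit mapping to the high-dimensional feature space.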
Consider a training data set $\{(x_i, y_i)\}_{i=1}^{N}$ of input and output pairs, where $x_i \in \mathbb{R}^{q}$ is the input vector, $q$ is the dimension of the input vector, $y_i \in \mathbb{R}$ is the corresponding target value, and $N$ is the size of the training data. The regression model is computed according to Eq. (8):

$$y = W^{\top} h(x) + b \qquad (8)$$

where $W$ is the weight vector, $b$ is the bias component, and $h(x)$ is a nonlinear mapping function that maps $x$ to a higher-dimensional feature space. According to the risk minimization principle, $W$ and $b$ are obtained by solving the minimization problem in Eq. (9), where $f_i$ and $f_i^{*}$ are the positive slack variables and $C$ is the error penalty parameter. Finally, the support vector regression function is expressed in terms of the Lagrangian coefficients.
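For reference, the optimization problem and the resulting regression function can be written in the standard $\varepsilon$-insensitive form sketched below. This is the textbook formulation and is assumed, not confirmed, to match the paper's own equations; the tube width $\varepsilon$ (unrelated to the Kalman filter noise term), the multipliers $\alpha_i, \alpha_i^{*}$, and the kernel $K$ are symbols introduced for this sketch.

```latex
\begin{aligned}
\min_{W,\,b,\,f,\,f^{*}}\quad & \frac{1}{2}\lVert W\rVert^{2} + C\sum_{i=1}^{N}\left(f_i + f_i^{*}\right)\\
\text{subject to}\quad & y_i - W^{\top}h(x_i) - b \le \varepsilon + f_i,\\
& W^{\top}h(x_i) + b - y_i \le \varepsilon + f_i^{*},\\
& f_i \ge 0,\quad f_i^{*} \ge 0,\qquad i = 1,\ldots,N.
\end{aligned}

% Regression function obtained from the dual problem
\hat{y}(x) = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^{*}\right)K(x_i, x) + b,
\qquad K(x_i, x) = h(x_i)^{\top}h(x).
```

In this form, $C$ trades off the flatness of the function against the total slack, and only training points with $\alpha_i - \alpha_i^{*} \neq 0$ (the support vectors) contribute to the forecast.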

Fuzzy support vector machines (FSVMs)
In traditional support vector machine methods, all training data receive the same weight, so important data with unique properties are neglected. Therefore, in the fuzzy support vector machine method, a fuzzy membership $Z_i$ is applied, and the training feature set $S$ is introduced according to Eq. (11) (Lin and Wang 2002), where $x_i$ is the input training feature data and $y_i \in \{-1, 1\}$ is the label in the two-class classification problem. The fuzzy membership $Z_i$ is calculated by Eq. (12), where $m$ denotes the weighted components and $o_j$ $(j = 1, 2)$ are the centers of the classes. After selecting the appropriate kernel function and the parameter $c$, a minimization problem over $\alpha$ is solved according to Eq. (13). After obtaining the vector $\alpha^{*} = (\alpha_1^{*}, \alpha_2^{*}, \ldots, \alpha_L^{*})^{\top}$ and $b^{*}$, the decision function is constructed according to Eqs. (14) and (15).
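To illustrate how a fuzzy membership can down-weight atypical training points, the sketch below uses a distance-to-class-center rule, one of the membership choices discussed by Lin and Wang (2002). It is an assumption that this resembles the membership used in the paper; the function names and the small constant `delta`, which keeps every membership strictly positive, are hypothetical.

```python
import math


def class_center(points):
    """Component-wise mean of the points belonging to one class."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def distance_memberships(points, delta=1e-6):
    """Membership in (0, 1]: points far from the class center get lower weight."""
    center = class_center(points)
    dists = [math.dist(p, center) for p in points]
    radius = max(dists)  # distance of the farthest point from the center
    return [1.0 - d / (radius + delta) for d in dists]
```

For a class containing the one-dimensional points 0, 1, 2, and 9, the center is 3, so the point 9 receives a membership close to zero while the point 2 receives the largest one. In a fuzzy SVM, each membership scales the penalty applied to the corresponding training error.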

The proposed generalized support vector machines (GSVMs)
Despite all the unique features of certain and uncertain support vector machines, each can only model the certain or the uncertain part of the patterns in the underlying data, whereas real-world raw data sets contain both kinds of patterns simultaneously. Various support vector machine models have been developed in the literature to model certain and uncertain patterns separately. Although these support vector machine-based models have yielded acceptable results in terms of performance and accuracy, none can simultaneously model the certainty and uncertainty in the data sets. In general, two possible strategies can be considered for this task: (1) modeling both patterns simultaneously, and (2) decomposing the patterns, modeling them separately, and finally combining the results. In this paper, the second strategy is chosen because of its lower complexity and lower computational time and cost. Therefore, in the first stage of the proposed model, the raw data are preprocessed by the Kalman filter to decompose them into certain and uncertain patterns. After that, the decomposed certain and uncertain data, along with the lags of the raw data, are used as inputs to the support vector machine and the fuzzy support vector machine, respectively, to model the patterns separately. In this way, certain patterns, which often have lower complexity and less ambiguity, are entered into the certain support vector machine, and uncertain patterns, which often have higher complexity and ambiguity, are entered into the fuzzy support vector machine. In the last stage, the function of the relationship and the relative importance (weight) of these components are estimated by another support vector machine. Finally, the forecasts of the proposed model are calculated. The general process of the proposed method is presented in Fig. 1.
The procedure for implementing the proposed model is explained in more detail as follows:
Phase 1 (Data preprocessing step): In the first step of the proposed model, the trend patterns of the actual data are obtained by the Kalman filter.
Phase 2 (Pattern analysis step): Commonly, a filter-based decomposition technique is used to mine the regular trend patterns in the raw data. It is generally demonstrated in the literature that the first mined component (trend) of a decomposition process has the lowest complexity and uncertainty; thus, it is considered the "certain patterns." Consequently, the remaining patterns, which have higher complexity and/or uncertainty, are regarded as the "uncertain patterns." So, in the second step of the proposed model, the output of the Kalman filter is regarded as the certain patterns, and the difference between the actual data and the trend patterns is regarded as the uncertain patterns.
Phase 3 (Forecasting step): In the third step of the proposed model, the assumptions of normality and stationarity of the time series are first investigated. Since non-stationary time series are difficult to predict, it is better to eliminate the factors that make the time series non-stationary. In this paper, differencing is used to make the series stationary. After that, the certain patterns are entered into the SVM model in order to model the crisp patterns, and the uncertain patterns are entered into the FSVM model in order to model the non-crisp patterns.
Phase 4 (Integrating step): In the last step of the proposed model, the outputs of the SVM and FSVM models, along with the lags of the raw data, are integrated by another SVM model in order to model all certain and uncertain patterns in the underlying data set.
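The data flow of the four phases can be sketched in plain Python as follows. This is only an illustration of how the inputs are assembled, not the authors' implementation: the function names are hypothetical, the trend is taken as given (in the paper it comes from the Kalman filter), and the alignment of the Phase 4 inputs with the target at time t is an assumption.

```python
def decompose(actual, trend):
    """Phase 2: certain patterns are the filter trend, uncertain patterns the remainder."""
    certain = list(trend)
    uncertain = [a - t for a, t in zip(actual, trend)]
    return certain, uncertain


def difference(series):
    """First-order differencing, used in Phase 3 to move a series toward stationarity."""
    return [series[i] - series[i - 1] for i in range(1, len(series))]


def integration_inputs(csvm_pred, usvm_pred, raw, n_lags):
    """Phase 4: one row per time t = [CSVM output, USVM output, raw lags t-n_lags..t-1].

    Assumed alignment: component outputs at time t are paired with target raw[t].
    """
    rows, targets = [], []
    for t in range(n_lags, len(raw)):
        rows.append([csvm_pred[t], usvm_pred[t]] + list(raw[t - n_lags:t]))
        targets.append(raw[t])
    return rows, targets


# Toy example: the decomposed components stand in for the CSVM and USVM outputs.
actual = [10.0, 12.0, 11.0, 15.0, 14.0]
trend = [10.0, 11.0, 11.5, 13.0, 14.0]
certain, uncertain = decompose(actual, trend)
rows, targets = integration_inputs(certain, uncertain, actual, n_lags=2)
```

The rows and targets produced by `integration_inputs` are what a final regression model, such as an SVM, would be trained on to learn the relationship and relative weight of the two components.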

Numerical results of the proposed method for wind power and speed forecasting
In this section, the training data set is used to design the proposed model and the other models. After each model is designed, its performance is calculated using the evaluation criteria. Finally, a comprehensive assessment of the methods relative to each other on both the training and the test data is conducted.

Data sets and evaluation metrics (Phase 0)
In this section, two real data sets are used to evaluate the performance of the proposed model. The first data set is ultra-short-term wind power data from Sotavento Galicia (Fig. 2). This time series covers the first week of May 2015 and consists of 168 points. 85% of the data (i.e., 144 observations) is used as the training sample, and the remaining 15% (i.e., 24 observations) is used as the test sample in order to evaluate the performance of the proposed model compared with other models. The second data set is wind speed data collected in the state of Colorado on 09/02/2012 from 00:00 to 14:00, consisting of 169 points. The first 80% of the data (i.e., 133 observations) is used as the training sample, and the remaining 20% (i.e., 36 observations) is used as the test sample. These data sets are benchmark wind energy data that have been frequently used in the literature (Azimi et al. 2016). The forecasts are evaluated by the mean absolute error (MAE), the mean squared error (MSE), and the root mean squared error (RMSE):

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\lvert A_t - F_t\rvert, \qquad \mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(A_t - F_t\right)^{2}, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$

where $A_t$ is the actual value at time $t$, $F_t$ is the forecast value at time $t$, and $N$ is the number of data points.
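A minimal sketch of the chronological split and the error measures is given below, assuming the evaluation metrics are the MAE, MSE, and RMSE reported in the result tables, with A_t the actual and F_t the forecast value. The function names are hypothetical.

```python
import math


def chronological_split(series, n_train):
    """Keep time order: the first n_train points train the model, the rest test it."""
    return series[:n_train], series[n_train:]


def mae(actual, forecast):
    """Mean absolute error over paired actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean squared error over paired actual and forecast values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, the square root of the MSE."""
    return math.sqrt(mse(actual, forecast))
```

For the first data set, `chronological_split(series, 144)` yields the 144 training and 24 test observations described above. Because the series is not shuffled, the test sample always lies after the training sample in time.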

Results of the Kalman filtering (KF) (Phase 1)
According to the process of the proposed method, the Kalman filter is first used to mine the trend (certain) patterns. The Kalman filter is designed by selecting appropriate coefficients. In the first step, a sample autoregressive signal is selected at random, and completely random Gaussian noise is applied to it. The noisy signal is then analyzed, and the noise applied to the signal is removed. This operation is fully simulated in MATLAB. The final output is considered as the certain patterns, and the difference between the actual data and the trend patterns is considered as the uncertain patterns. The trend patterns obtained from the Kalman filter are plotted against the actual data in Fig. 3.

Results of the pattern analysis (Phase 2)
According to the second step of the proposed model, the output of the Kalman filter is regarded as the certain patterns. By removing the certain patterns from the actual data, the uncertain patterns are obtained, as mentioned in the previous section. The certain and uncertain patterns in the underlying data set are shown in Fig. 4.

Results of the certain and uncertain support vector machine (CSVM and USVM) (Phase 3)
In the third step of the proposed model, after decomposing the underlying data, the certain patterns in the data, along with the lags of the raw data, are entered into the SVM model. The performance indicators of the designed certain support vector machine (CSVM) model up to each hour of the test day are shown in Fig. 5 based on the AE (absolute error) and SE (squared error) criteria. It can be seen from Fig. 5 that the minimum and maximum MAE values are $5.92 \times 10^{4}$ and $2.48 \times 10^{5}$, and the minimum and maximum MSE values are $3.51 \times 10^{9}$ and $1.35 \times 10^{10}$, respectively. The performance of the CSVM model is also presented in Table 2.
Results of Table 2 indicate that the CSVM model achieves $7.87 \times 10^{4}$, $8.65 \times 10^{9}$, and $9.30 \times 10^{4}$ in MAE, MSE, and RMSE over the whole test day, respectively. The error values of the CSVM model for the training and test data sets are also shown in Fig. 6. It can be concluded from these error values that the support vector machine employed for modeling certain patterns does so appropriately and yields satisfactory results. This suggests that the data generation process of the underlying data contains certain patterns.
Similarly, the uncertain patterns in the data, along with the lags of the raw data, are entered into the fuzzy support vector machine model. The performance indicators of the designed uncertain support vector machine (USVM) model up to each hour of the test day are shown in Fig. 7. The results show that the minimum and maximum MAE values are $6.64 \times 10^{4}$ and $1.11 \times 10^{5}$, and the minimum and maximum MSE values are $6.26 \times 10^{9}$ and $1.34 \times 10^{10}$, respectively. The performance of the USVM model is also reported in Table 3.
Results of Table 3 indicate that the USVM model achieves $6.87 \times 10^{4}$, $7.15 \times 10^{9}$, and $8.64 \times 10^{4}$ in MAE, MSE, and RMSE over the whole test day, respectively. The error values of the USVM model for the training and test data sets are also shown in Fig. 8. These results suggest that the data generation process of the underlying data may include uncertain patterns as well as certain ones.
Fig. 2 The wind power data sets

Results of the proposed model (GSVM) (Phase 4)
After the certain and uncertain patterns in the underlying data are modeled, the relationship between these two types of patterns, as well as their relative importance in producing the actual values, must be estimated functionally. Thus, in the last step of the proposed model, the results of the CSVM and USVM, along with the lags of the raw data, are entered into a support vector machine. Finally, forecasts of the proposed method are calculated. These forecasting results, in terms of the evaluation metrics, are presented in Fig. 9. The actual and fitted values of the proposed model for the training and test samples are shown in Fig. 10. The evaluation metrics of the proposed model for the test day are reported in Table 4. The MAE, MSE, and RMSE of this model are $3.38 \times 10^{4}$, $1.63 \times 10^{9}$, and $4.04 \times 10^{4}$, respectively. Also, the error values of the proposed model for the training and test data sets are shown in Fig. 11.

Comparison with other models
In this section, the performance of the proposed model on the training and test samples of both data sets is compared with that of other forecasting models. These models include the autoregressive integrated moving average (ARIMA), the general regression neural networks (GRNNs), the radial basis functions (RBFs), the multilayer perceptrons (MLPs), the support vector machines (SVMs), the fuzzy autoregressive integrated moving average (FARIMA), the fuzzy multilayer perceptrons (FMLPs), the fuzzy support vector machines (FSVMs), the Kalman preprocessing-based hybrid model of ARIMA (KARIMA), the Kalman preprocessing-based hybrid model of MLP (KMLP), and the Kalman preprocessing-based hybrid model of SVM. The results for the two data sets are reported in Tables 5 and 6, respectively. In this paper, the improvement percentage (IP) of Model A against Model B is calculated from the error values of the two models.
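A common definition of the improvement percentage, assumed here because the paper's own formula is not available in this text, is the relative reduction in error of Model A with respect to Model B, expressed in percent. The function name and argument names below are hypothetical.

```python
def improvement_percentage(error_a, error_b):
    """Relative error reduction of Model A against Model B, in percent.

    Assumed definition: IP = (error_b - error_a) / error_b * 100.
    A positive value means Model A has the lower error.
    """
    if error_b == 0:
        raise ValueError("error_b must be nonzero")
    return (error_b - error_a) / error_b * 100.0


# Illustration with the test-day MAE values quoted in the text:
# proposed GSVM (3.38e4) against the CSVM component (7.87e4).
ip_mae = improvement_percentage(3.38e4, 7.87e4)
```

Under this assumed definition, the MAE values quoted in the text for the GSVM and CSVM models correspond to an improvement of roughly 57%. This figure is computed here for illustration only and is not taken from the paper's tables.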

Conclusion
The literature shows that no single model can be used globally to predict wind energy, because its patterns vary with many factors. The support vector machine is a nonlinear model that is widely used to predict and classify various data, especially wind speed and wind power. Despite the unique advantages of support vector machines in solving nonlinear problems, these models have a fundamental weakness in uncertain and ambiguous environments. Fuzzy support vector machines have been proposed to eliminate this weakness of traditional SVM models and to model uncertain patterns appropriately. Although support vector machines and fuzzy support vector machines each have a high capacity for modeling certainty and uncertainty, respectively, real-world data sets often contain certain and uncertain patterns simultaneously. Therefore, neither can be used properly on its own, and a new version of the support vector machine is needed to model the certainty and uncertainty in the data simultaneously. For these reasons, a generalized support vector machine (GSVM) model is proposed in this paper to predict certain and uncertain patterns simultaneously. After the input data are decomposed into certain and uncertain patterns, certain patterns are modeled by a support vector machine, and uncertain patterns are modeled by a fuzzy support vector machine. After that, the weight of each model is calculated by another support vector machine. Numerical results show that the proposed model achieves more accurate results than its components, base models, and other single and hybrid models.

Author contributions
Mehrnaz Ahmadi and Mehdi Khashei contributed equally to the design and implementation of the research, to the analysis of the results, and to the writing of the manuscript.