Wavelet kernel least square twin support vector regression for wind speed prediction

Wind energy is a powerful yet freely available renewable energy. It is crucial to predict the wind speed (WS) accurately to make a precise prediction of wind power at wind power generating stations. Generally, the WS data is non-stationary and wavelets have the capacity to deal with such non-stationarity in datasets. While several machine learning models have been adopted for prediction of WS, the prediction capability of primal least square support vector regression (PLSTSVR) for the same has never been tested to the best of our knowledge. Therefore, in this work, wavelet kernel–based LSTSVR models are proposed for WS prediction, namely, Morlet wavelet kernel LSTSVR and Mexican hat wavelet kernel LSTSVR. Hourly WS data is gathered from four different stations, namely, Chennai, Madurai, Salem and Tirunelveli in Tamil Nadu, India. The proposed models’ performance is assessed using root mean square, mean absolute, symmetric mean absolute percentage, mean absolute scaled error and R2. The proposed models’ results are compared to those of twin support vector regression (TSVR), PLSTSVR and large-margin distribution machine-based regression (LDMR). The performance of the proposed models is superior to other models based on the results of the performance indicators.


Introduction
Renewable energy sources are progressively made use of because of the amazingly booming contamination degrees in the air, water and soil. Wind energy has ended up being the prime focus for energy developers because of the accessibility of megawatt size wind equipment, environmental friendliness, low cost of maintenance, ease of availability, etc. Moreover, wind resource is a tidy, limitless and free resource. This resource has offered mankind for numerous centuries in driving the wind turbines and pumping water (Bakhsh et al. 1985). To utilize this resource efficiently, prediction of wind speed (WS) plays a critical role. This is necessary for site planning, performance analysis and deciding the optimal size of the wind turbine. Rehman and Halawani (1994) predicted wind speed using the autoregressive moving average method. Recently, machine learning (ML) models such as artificial neural network (ANN), support vector machine (SVM), Gaussian process regression (GPR), fuzzy logic (FL) and extreme learning machine (ELM) have been extensively used for this purpose. Selection of the most appropriate ML technique is very important for obtaining accurate results. Apart from these ML techniques, researchers are nowadays focused on the development of hybrid ML tools to achieve accurate WS forecasting (Natarajan and Nachimuthu 2019). Furthermore, hybrid ML has the advantage of both algorithms. Salcedo-Sanz et al. (2011) tried to estimate the short wind speed using an evolutionary SVR model. Wang et al. (2015) forecasted the WS using support vector regression (SVR) optimized by the cuckoo search optimization algorithm. Wang et al. (2016) utilized the SVM whose parameter search was optimized using a novel steepest descent Cuckoo search algorithm for the dataset which was pre-processed using the empirical mode decomposition. Khosravi et al. (2018) forecasted the WS and wind direction with the help of multi-layer feed-forward ANN, SVM with radial basis function (RBF) and adaptive neurofuzzy inference system (ANFIS) optimized using a swarm optimization (SO) called particle SO algorithm. Mi et al. (2019) developed a new WS multi-step forecasting framework based on the singular spectrum analysis, empirical mode decomposition and convolution SVM. Wu and Lin (2019) forecasted the WS based on LS-SVM optimized using the bat optimization algorithm. They applied variation mode decomposition to disintegrate the original WS series into separate sub-series with different frequencies. Fu et al. (2019) coupled SVM with improved chicken SO algorithm for the prediction of WS, and the results were compared with SVM-CSO. Xiang et al. (2019) forecasted WS based on the hybrid of improved empirical wavelet transform and LS-SVM. Li et al. (2020) performed wind power prediction based on SVM with improved dragonfly algorithm. Vinothkumar and Deeba (2020) used a long short-term memory network model and variants of the SVMs for forecasting the WS in the neighbourhood of windmills. Gupta et al. (2021) adopted convolution longterm short memory (LSTM) for the short-term prediction of wind power density for five stations in Tamil Nadu.
In addition to the above discussed hybrid SVM models, wavelet transform has been used in combination with SVM for the forecasting the WS of a location. There are two types of studies where the wavelet transform (WT) has been adopted. In some of the earlier studies, wavelet has been adopted for the processing of the WS data by considering the same as a signal, while in some other cases, a waveletbased kernel has been used in the SVM model. Some of the studies which fall under the former category have been performed by Liu et al. (2014), Sangita and Deshmukh (2011), Sun et al. (2013), Sivanagaraja et al. (2014) and Prasetyowati et al. (2019). While most of them have considered the only SVM for the former case, Dhiman et al. (2019) have adopted not only SVM but also its variants. They analysed the efficiency of a hybrid model consisting of WT and several variants of SVR like ɛ-SVR, least square SVR, twin SVR (TSVR) and ɛ-TSVR for the prediction of WS at a farm in Spain. Only a few studies about the latter case have been performed for WS prediction. Zeng and Qiao (2012) performed wind power prediction using a novel wavelet SVM that changes between the RBF kernel and a Mexican hat kernel. They established that the wavelet SVM outperforms the radial basis function kernel and Mexican hat kernels. He and Xu (2019) predicted the WS of Ningxia using a combination of a wavelet as well as polynomial kernel function and concluded that the combined kernel has better accuracy compared to the single kernel function. A few very recent prominent works are portrayed in Table 1.
In this study, the major contributions are as follows: 1. Motivated by the contribution of Zhang et al. (2004) and Ding et al. (2014), two novel wavelet-based models, namely, Morlet wavelet kernel LSTSVR (MKLSTSVR) and Mexican hat LSTSVR (MHKLSTSVR) models, are proposed for WS prediction. 2. Inspired by the PLSTSVR model, the optimization problem of the proposed MKLSTSVR and MHKLSTSVR is solved in primal, which always shows better performance compared to solving them in duals (Gupta 2017). 3. To expand the kernel selection range of the PLSTSVR model and to improve its generalization ability, the wavelet kernels are embedded in the PLSTSVR model.
The models have been applied to the WS data collected from four locations in the state of Tamil Nadu, India, and the performance of these models has been compared with other ML models such as TSVR, primal least square twin support vector regression (PLSTSVR) and large-margin distribution machine-based regression (LDMR). The rest of this paper is organized as follows: the "Related works" section briefly describes the related works. The "Proposed models" section describes the wavelet analysis briefly and also proposes the two novel models. Dataset description and area of study is presented in the "Dataset description" section. Experimental and numerical analyses are presented in the "Experimental setup and numerical analysis" section; statistical analyses are presented in the "Statistical analysis using Friedman test with post hoc analysis" section. The "Conclusion" section describes the conclusion.

Related works
Consider that m is the total number of samples. The total training samples can be represented as x i m i=1 ∈ R n . x i ∈ R and y i ∈ R N indicate the input training points and original output, respectively. Also, let us consider that G ∈ R m×n are the input samples. Here the ith row vector can be represented as x t i y = (y 1,... y m ) denotes the observed data points.

The TSVR model
TSVR (Peng 2010) seeks for two non-parallel functions termed as ε-insensitive down-bound f 1 (x) = K(x t , D t )w 1 + b 1 and up-bound function f 2 (x) = K(x t , D t )w 2 + b 2 , respectively. The primal problems of TSVR are expressed as follows: and where regularization parameters are C 1 , C 2 > 0 , and input parameters are 1 , 2 > 0 ; the slack variables are indicated by 1 and 2 . e is the ones vector and G ∈ R m×n are the input samples. By introducing the Lagrangian multipliers 1 , 2 > 0 and applying the K.K.T. sufficient conditions on Eq. (2), the duals of Eqs. (1) and (2) are expressed as follows: s.t., y K(G, G t . The unknowns w 1 , w 2 , b 1 , b 2 can be determined as follows: and where > 0 and I is an identity matrix.

The PLSTSVR model
To reduce the computational complexity of the TSVR model, Huang et al. (2013) suggested a novel LSTVSR solved in primal, termed as least PLSTSVR. The formulation of PLSTSVR can be expressed as follows: and where R = (Y − 1 e) and Q = (y + 2 e) Now, by placing the values of Eqs. (5) and (6), the slack vectors in the objective functions, further computation of the gradient with respect to w i and b i for i = 1, 2 , and equating to zero, we get, and (7) ] is the augmented matrix; > 0 is a small positive integer and I is an identity matrix. The final regressor of PLSTSVR for any new instance x ∈ R n can be calculated as follows:

The LDMR model
Recently, Rastogi et al. (2020) introduced a margin distribution-based LDMR which was on the spirit of the LDM model (Zhang and Zhou 2014). LDMR simultaneously minimizes the ε-insensitive loss function and the quadratic loss function. Hence, LDMR shows improved regression performance for noisy data, and it considers the minimization of scattering which is inside the ε-tube. The primal problem of LDMR can be expressed as follows: where the input parameters are , l 1 , l 2 , The dual problem of Eq. (9) may be obtained after adding the Lagrangian multiplier and applying the KKT condition as follows: After determining the solution from Eq. (10) for 1 and 2 , the unknowns w and b can be obtained as follows (Hazarika et al. 2020a): The LDMR regressor f (.) can be obtained for any new sample x ∈ R n as follows:

Proposed models
The wavelet transform (WT) is an enhanced version of the conventional Fourier transform (FT) that was introduced by Jean Morlet in 1982(Morlet et al. 1982a. Recently, the WT analysis has gained a lot of interest among the researchers as it overcomes the two key disadvantages of FT (Sifuzzaman et al. 2009;Hazarika et al. 2020a, b): 1) FT is suitable only for stationary signals, whereas WT is suitable for both stationary as well as non-stationary datasets. 2) In FT, the time information is lost while transforming the time domain to the frequency domain, whereas in WT, the time information is not lost.
Wavelet analysis was initially suggested to enhance seismic signal analysis by switching from short-term analysis of Fourier to improved algorithms to identify and analyse abrupt signal changes (Daubechies 1990(Daubechies , 1992Mallat 1999). The wavelet has the multi-resolution and localization capacity in both time-domain (TD) and frequency domain (FD). Wavelet transform's TD properties can be elaborated through the wavelet functions which are translated from a wavelet base function (Holland 1992).
The wavelet functions can be derived from the mother wavelet (MW). Let (z) be the MW function (Hazarika and Gupta 2020). The wavelet function (z) can be obtained by the temporal translation t and with dilation as follows: Equation (11) can be represented in the time domain as follows: (12) From Eqs. (11) and (12), one can observe that the local characteristics of a signal can be reflected through the wavelet transform. Hence, the wavelet transforms as an analytical technique has shown great potential (Zhang et al. 2004;Zhou and Ye 2006;Ding et al. 2016).

Translation-invariant kernel
The kernel of translation-invariant (TI) is acceptable if and only if the FT is always positive.
Lemma 1: Let (z) be an MW, and let and t denote the dilation and translation. If z, z � ∈ R N , , then the dot-product wavelet kernel may be expressed as follows: The TI kernel of Eq. (13) may be expressed as In Eq. (14), N represents the total quantity of the sample (Ding et al. 2014).
To generate the TI kernel function, this work uses two wavelet kernel functions. They are:

Proposed MKLSTSVR
MKLSTSVR indicates the PLSTSVR embedded with the Morlet wavelet kernel instead of using the traditional kernels. To be an acceptable kernel, the Morlet which is an admissible kernel (Zhang et al. 2004). Hence, it is a kernel by which the wavelet kernel trick can be used to construct MKLSTSVR. The basic goal of MKLSTSVR is to search for the optimum wavelet coefficients in feature space spanned by multidimensional wavelet basis (MWB). Thus, it obtains the optimal prediction function. Thereby, the prediction function for MKLSTSVR can be obtained as follows:

Proposed MHKLSTSVR
MHKLSTSVR indicates the PLSTSVR combined with the Mexican hat wavelet kernel rather than the traditional kernels. To be an admissible kernel function, the Mexican hat kernel should also satisfy the TI kernel theorem as shown in Eq. (14). Lemma 3: The Mexican hat wavelet kernel function (MHK) follows the TI kernel theorem as the following: which also can be considered as an admissible kernel (Zhang et al. 2004). Therefore, the Mexican hat wavelet kernel trick can be used to construct MHKLSTSVR. Like MKLSTSVR, the basic goal of MHKLSTSVR is to find the optimum wavelet coefficients in feature space spanned by MWB. Thereby, it obtains the optimum predictor. Thereby, the predictor for MHKLSTSVR can be obtained as follows:

Dataset description
The state of Tamil Nadu is situated in the southern part of India. Tamil Nadu is bordered by the Eastern Ghats in the north, Anaimalai hills and Kerala in the west, Bay of Bengal in the east, Gulf of Mannar and Palk Strait in the southeast and the Indian Ocean in the south. The climate of Tamil Nadu ranges from sub-humid to semi-arid. The sites selected for this study has been provided in Fig. 1. The geographical information such as latitude (LAT), longitude (LON) and altitude (ALT) of the chosen sites are provided in Table 2. Hourly average WS data recorded at a height of 50 m above ground level was collected from MERRA-2 analysis database (NASA) for the period of Jan 1980 to May 2018. However, in our experiments, we have considered the data from Jan 2014 to May 2018. The statistics of the WS data of the above four stations have been provided in Table 3 below.
The maximum wind speed of 22.64 m/s is noticed in Tirunelveli, and the lowest wind speed of 13.86 m/s is noticed in Madurai. The observation is similar for the mean WS too. Figure 2 shows the hourly wind speed data from January 2014 to May 2018.

Experimental setup and numerical analysis
To illustrate the prediction capability of the proposed MKLSTSVR and MHKLSTSVR models, their performance has been compared with TSVR, PLSTSVR and LDMR on a portion of the WS dataset. The performances of these implemented models are evaluated using RMSE (root mean squared error), MAE (mean absolute error), SMAPE (symmetric mean absolute percentage error), MASE (mean absolute scaled error) and R 2 (coefficient of correlation). The definitions of these evaluators can be expressed as: • RMSE: RMSE describes how close the data is near the line of best fit. To validate experimental results, RMSE is frequently applied in climatology, forecasting and regression analysis.
• MAE: The amount of error in your measurements is expressed as absolute error (AE). It represents the difference between the measured and original values. The MAE is the average of all absolute errors.
• SMAPE: The mean absolute percentage error is a popular metric for assessing forecasting performance. MAPE is asymmetric, penalising negative errors more than positive errors when forecasts are higher than originals. SMAPE is a symmetric version of MAPE that overcomes MAPE's asymmetry.
Here, o and p indicate the original and the predicted WS values, respectively. • MASE: Mean absolute scaled error (MASE) is a scalefree error metric that presents each error as a ratio to the average error of a baseline. MASE has the advantage of never returning undefined or infinite values, making it an excellent choice for intermittent-demand series. It can be applied to a single series or used to compare multiple series.
Two types of lag periods are considered: l-3 and l-5. Table 4 and Table 5 portray the results based on the various performance evaluators utilized in this study for the various cities considered for l-3 and l-5, respectively. From Table 4, the following conclusions can be derived: a) The proposed MKLSTSVR and MHKLSTSVR show the best performance based on RMSE for 0 and 2 cases, respectively. b) The proposed MKLSTSVR and MHKLSTSVR show best results based on MAE in 1 case each.
c) The proposed MKLSTSVR and MHKLSTSVR show best results based on SMAPE in 2 cases and 1 case, respectively. d) The proposed MKLSTSVR and MHKLSTSVR indicate the best performance based on MASE in 1 case each. e) The proposed MKLSTSVR and MHKLSTSVR show the best performance based on R 2 in 0 and 2, respectively. Table 5 shows ranks obtained by the reported models based on various performance evaluators. It can be observed that our proposed MKLSTSVR shows the lowest average rank followed by the MHKLSTSVR, which promulgates the efficiency of these models.
The window size is enlarged from l-3 to l-5 to further evaluate the performance of these models. In Table 6, the results are shown using a lag window of 5. From Table 6, the following conclusions can be derived: a) The proposed MKLSTSVR and MHKLSTSVR reveal the best results based on RMSE in 2 cases each.
b) The proposed MKLSTSVR and MHKLSTSVR report the best performance based on MAE in 2 cases each. c) Based on SMAPE, the proposed MKLSTSVR and MHKLSTSVR show the best performance in 3 and 1 case, respectively. d) Based on MASE, the proposed MKLSTSVR and MHKLSTSVR show the best performance in 2 cases each. e) Based on R 2 , the proposed MKLSTSVR and MHKLST-SVR show the best performance in 1 and 3 cases, respectively.  Table 7 shows ranks obtained by the reported models based on different evaluators. One can notice that our proposed MHKLSTSVR shows the lowest average rank followed by the MKLSTSVR, which indicates the efficacy of the models.
Overall, it is noticeable that the increase in the time lag period shows improved performance for the proposed MKLSTSVR and MHKLSTSVR. Figures 3, 4, 5, and 6 reveal the comparison between the original and predicted WS values of TSVR, PLSTSVR, LDMR and the proposed MKLSTSVR and MHKLSTSVR for l-5. Although all the plots seem to be similar, the overall performance of the proposed models is better compared to TSVR, PLSTSVR and LDMR models based on the evaluators.
It can also be observed from Table 4 and Table 5 that the proposed models MKLSTSVR and MHKLSTSVR take less time for computation compared to other models. Thus, it will be interesting to see how the proposed MKLSTSVR and MHKLSTSVR behave in short-term WS prediction at different sites.

Statistical analysis using Friedman test with post hoc analysis
It is noticeable that the proposed models MKLSTSVR and MHKLSTSVR do not show the best results in all the cases. To further justify the efficiency of the best proposed model, the non-parametric Friedman test (Demšar 2006) has been performed. To generate the Friedman statistics, the average ranks of the five models are considered from Table 7.
Let us assume that all the models are not significantly different, from the null hypothesis (NH). Then, we formulate NH as follows: The critical value C V for F F with (5 − 1) and (5 − 1) × (24 − 1) degrees of freedom is 2.471 for = 0.05 . As it can be noticed that F F > C V ; therefore, we can reject the NH. Furthermore, the Nemenyi test is performed with p = 0.05 as follows: For significance, the pairwise difference in average ranks should be more than CD. It is worth noting that the proposed MHKLSTSVR is significantly different compared to TSVR, PLSTSVR and LDMR. However, no significant difference can be found between MKLSTSVR and MHKLSTSVR. The Friedman test with Nemenyi Statistics can be visualized from Fig. 7. Those models that are not connected by a line are statistically significant. It can be observed that our proposed MHKLSTSVR is significantly different than TSVR, PLST-SVR and L DMR .

Conclusion
The wavelets are powerful enough to deal with the nonstationary datasets. Moreover, the PLSTSVR model shows excellent prediction performance. Hence, in this work, the wavelet kernels are embedded in the PLSTSVR models and two novel wavelets kernel-based PLSTSVR models, namely, MKLSTSVR and MHKLSTSVR are proposed for WS prediction in four different wind stations in India. The results show the proposed models to be superior when compared to the related models, i.e. TSVR, PLSTSVR and LDMR. Among the proposed MKLSTSVR and MHKLST-SVR models, the latter shows a closer relationship with the original data. Moreover, the proposed models are computationally efficient as they take less time for computation compared to the other reported models. Comprehensively, one can conclude that both MKLSTSVR CD = 2.728 √ 5 × (5 + 1) 6 × 24 = 1.2452 and MHKLSTSVR are efficient models and applicable for short-term WS prediction. These models can also be applied in the field of engineering like prediction of river suspended sediment load, rainfall forecasting, runoff prediction, etc.