Support vector machine for stock movement direction prediction with sparsity for feature selection

In the stock market, accurate prediction of the stock price movement direction can effectively increase profits for investors. However, the stock price is an extremely complex dynamic system with strong fluctuations, and proper selection of technical indicators can potentially improve the accuracy of direction prediction. We propose a novel sparse least squares support vector machine (LSSVM) that combines recursive feature elimination (RFE) and ReliefF via a weight parameter. Specifically, the benefit of this hybrid is threefold: (1) it accounts for intrinsic correlations among the features; (2) it predicts more effectively because the sparse framework can remove some "noise" features completely; and (3) it selects technical indicators according to the feature ranking while accounting for possible interactions and non-linear effects among the features. Three stock datasets from the liquor and spirits concept sector are analyzed to demonstrate the superiority of our proposed framework, which provides sparse solutions resulting in more accurate predictions and higher returns among all seven considered classifiers.


Introduction
Stock investment is one of the main ways for individuals and companies to increase profits. However, the stock price is usually influenced by uncertainties arising from economic conditions, social factors, political events, etc. Thus, highly accurate stock prediction is a very challenging task in such an uncertain stock market (Z. Li et al., 2019). Technical indicators, which provide a wealth of related information, were developed as tools of technical analysis for price prediction. Many studies on direction prediction that incorporate technical indicators into machine learning methods have now been developed (Gandhmal & Kumar, 2019; Gunduz et al., 2017; Q. Wang et al., 2018). Generally, plenty of technical indicators can be designed from different perspectives; meanwhile, any specific stock is usually sensitive to only a few of them. In this paper, we therefore focus on a sparse machine learning framework that can remove redundant indicators during model training and provide good stock movement direction predictions.
The work on stock market prediction can generally be divided into two groups: stock price prediction and stock price movement direction prediction. The first group predicts the stock price value directly by machine learning methods based on stock price series modeling. For example, P. Yu & Yan (2020) proposed to predict stock prices by combining a time series phase-space method with deep neural networks. Xiao et al. (2020) hybridized the least squares support vector machine with an auto-regressive moving average for stock forecasting. On the other hand, the stock price movement direction can benefit decision-making directly, so such direction prediction has been researched extensively. One effective modeling approach is artificial neural networks (ANNs) (Hu et al., 2018), which are usually built on novel predictors. For instance, J. Long et al. (2020) incorporated stock market information and public market information into a deep neural network to improve the prediction performance. Chen et al. (2021) integrated graph convolutional features with convolutional neural networks to increase prediction accuracy.
Similarly, Ismail et al. (2020) presented a persistent homology method to obtain invariant topological features as input variables of ANNs to improve the prediction performance. Here, we should mention that problems of local minima and overfitting during ANN training often degrade the predicting performance (W. Long et al., 2019). Different from ANN methods, the support vector machine (SVM), which rests on a very solid statistical foundation, is also very popular in stock price movement direction forecasting. For example, Hao et al. (2021) developed a fuzzy twin support vector machine, which is robust to outliers, to improve the performance of SVM. Notice that, compared with ANNs, the SVM not only can obtain the globally optimal solution but also can prevent over-fitting (L. Yu et al., 2008).
However, among SVM methods for stock price movement direction prediction, the work on variable selection (i.e., removing redundant and irrelevant features) is very limited. Therefore, to improve the performance of SVM for predicting the stock price movement direction, a feature selection method is incorporated into SVM to eliminate unimportant features. Simultaneously, due to the low computational cost of the least squares support vector machine (LSSVM) (Suykens & Vandewalle, 1999), the LSSVM is employed for our stock price movement direction prediction, and we name our new SVM framework the sparse least squares support vector machine (Sparse LSSVM) in this paper.
In the variable selection field, sparse methods can mainly be divided into filter methods, wrapper methods, and embedded methods (Zhang et al., 2019), which usually generate feature subsets of different quality. Filter methods rely on data characteristics to eliminate redundant, irrelevant, and noisy features. Many recent studies have incorporated filter methods into SVM, such as the fuzzy filter (Roy et al., 2016), information gain (Kurniawati & Pardede, 2018), and correlation-based feature selection (Khaokaew & Anusas-amornkul, 2016), to enhance its forecasting performance. It should be noted that filter methods ignore the interaction with the learning algorithm, and most of them suffer from the problem of setting a default threshold to distinguish influential features from redundant ones (Cherrington et al., 2019). Wrapper methods search for the optimal feature subset in the whole feature space based on the performance of a learning algorithm. One prominent example is the genetic algorithm-based SVM (Tao et al., 2019), whose main advantage is to use crossover and mutation to balance exploitation and exploration. However, when the data dimension is extremely high, wrapper algorithms incur an expensive computational cost. Moreover, overuse of classifier performance in feature subset selection can result in overfitting in the feature subset space (Kohavi & John, 1997). Different from the two methods above, embedded methods embed feature selection into the model training process, commonly using a penalty to select feature subsets, such as the elastic net-SVM (Lorbert & Ramadge, 2013) and the ℓ_p-norm SVM (Nie et al., 2017). The most typical embedded SVM framework is the SVM with the least absolute shrinkage and selection operator (Lasso-SVM) (K. Wang et al., 2020).
However, when there are several highly correlated variables, Lasso-SVM tends to pick only a few of them and discard the rest, which may remove important features. Moreover, the essence of a penalty-based method is to force coefficients to zero by controlling hyperparameters: either some useful features are eliminated, degrading prediction performance, or all features are retained, leaving the model's interpretability unchanged.
Later, to overcome the above shortcomings of these feature selection methods, SVM-Recursive Feature Elimination (SVM-RFE) was proposed by Guyon et al. (2002). It has three main advantages: 1) compared with wrapper methods, its computational cost is low, since it only needs to train the classifier once to obtain each feature subset, and it is more robust against overfitting (Guyon et al., 2002); 2) the less informative feature is eliminated based on changes in the loss function; and 3) it interacts with the learning algorithm without any preset threshold. However, as pointed out by Guyon et al. (2002), its feature evaluation criterion considers neither the redundancy between features nor the relevance between the features and the target variable, so its feature evaluation results can be unreliable.
Motivated by the above, to further improve stock price movement direction prediction, we present a new sparse SVM framework based on the RFE method and the ReliefF method to select the best technical indicator subsets. Furthermore, three stock datasets from the liquor and spirits concept sector are investigated to validate the performance of the proposed sparse framework. The feature evaluation criterion in the new sparse SVM framework takes into account not only changes in the loss function but also conditional dependencies between features, which enhances the reliability of the feature quality evaluation. It can therefore eliminate unimportant technical indicators more effectively and obtain an optimal technical indicator subset. In the application to the stock data of three Chinese liquor companies, the accuracy of the new framework reaches 84.71%, 85.29%, and 84.12%, with corresponding F-measures of 0.8687, 0.8663, and 0.8439, respectively.
The rest of the paper is organized as follows. Section 2 reviews the theoretical background of least squares support vector machines. Section 3 illustrates the LSSVM-RFE and ReliefF methods and describes the proposed LSSVM-RFE with ReliefF framework. Section 4 reports the case studies on three stock datasets and compares our method with other forecasting frameworks. Finally, Section 5 concludes the paper.
2 Least Squares Support Vector Machines

Support vector machines, proposed by Cortes & Vapnik (1995), seek a hyperplane with maximal margin that separates positive instances from negative instances. Given a training set {(x_1, y_1), . . . , (x_n, y_n)}, where x_i ∈ R^{p×1} is the p-dimensional input vector and y_i ∈ {−1, 1} is a label, the model can be formulated as y = sign(w^T x + b) with a scalar b and w = [w_1; . . . ; w_p] ∈ R^{p×1}, which determines the direction of the hyperplane. Based on the structural risk minimization principle, the objective function can be formulated as

$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \ i = 1, \ldots, n, \tag{1}$$

where ξ_i is the tolerable classification error of the i-th sample, and C is a penalty parameter that strikes a balance between structural risk and empirical risk. When C is large, the empirical risk is emphasized.
To speed up the training process of SVM, Suykens & Vandewalle (1999) modified Eq. (1) and proposed the LSSVM, whose solution can be obtained by solving a linear system:

$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{n}\xi_i^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) = 1 - \xi_i, \ i = 1, \ldots, n. \tag{2}$$

To solve Eq. (2), a Lagrangian function is constructed as

$$L(w, b, \xi; \alpha) = \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{n}\xi_i^2 - \sum_{i=1}^{n}\alpha_i\left[y_i(w^T x_i + b) - 1 + \xi_i\right], \tag{3}$$

with Lagrange multipliers α_i. Then, by setting the derivatives of Eq. (3) with respect to w, b, ξ_i, and α_i to zero and making simple substitutions, the solution can be obtained from the linear system

$$\begin{bmatrix} 0 & y^T \\ y & \Omega + I_n/C \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_n \end{bmatrix}, \qquad \Omega_{ij} = y_i y_j x_i^T x_j, \tag{4}$$

where 1_n = [1; . . . ; 1] ∈ R^{n×1} and w = Σ_{i=1}^{n} α_i y_i x_i.

3 Sparse LSSVM

In this section, we first review the basic framework of LSSVM-RFE by Guyon et al. (2002) and the basic idea of ReliefF (Robnik-Šikonja & Kononenko, 2003). Then, we present our new sparse LSSVM framework based on LSSVM-RFE and ReliefF.

LSSVM-RFE
LSSVM-recursive feature elimination (LSSVM-RFE) selects nested subsets of features in a sequential backward elimination manner: it starts with all the features and each time removes the one feature with the smallest ranking score (Guyon et al., 2002). It inherits the advantages of both the wrapper and filter algorithms: (i) the learning of the feature subset interacts with the learning algorithm; and (ii) the computational cost is low, since it trains the LSSVM classifier only once per elimination step.
The evaluation of feature quality in LSSVM-RFE is based on the change in the objective function caused by removing a given feature. This change can be approximated by the optimal brain damage (OBD) algorithm (LeCun et al., 1990) as

$$\Delta L(j) = \frac{\partial L}{\partial w_j}\Delta w_j + \frac{1}{2}\frac{\partial^2 L}{\partial w_j^2}(\Delta w_j)^2, \tag{5}$$

where Δw_j = w_j corresponds to removing the j-th feature, and ΔL(j) is the change in the objective function L of Eq. (3) caused by removing the j-th feature. Here, at the optimum of L, ∂L/∂w_j = 0, so Eq. (5) becomes

$$\Delta L(j) = \frac{1}{2}\frac{\partial^2 L}{\partial w_j^2}\,w_j^2 \propto w_j^2. \tag{6}$$

Therefore, the quality of the j-th feature is evaluated by w_j^2: the larger w_j^2 is, the more important the j-th feature is.
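As an illustration, the ranking criterion above can be sketched in Python: the linear LSSVM is fitted by solving the linear system of Eq. (4), and features are then eliminated one at a time by smallest w_j^2. This is a minimal sketch of the standard procedure, not the authors' code; the function names are ours.

```python
import numpy as np

def lssvm_fit(X, y, C=1.0):
    """Fit a linear LSSVM by solving the KKT linear system of Eq. (4)."""
    n = len(y)
    # Omega_ij = y_i * y_j * x_i^T x_j for a linear kernel
    Yx = y[:, None] * X
    Omega = Yx @ Yx.T
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    w = (alpha * y) @ X          # primal weights for a linear kernel
    return w, b

def rfe_rank(X, y, C=1.0):
    """Rank features by repeatedly dropping the one with smallest w_j^2."""
    remaining = list(range(X.shape[1]))
    ranking = []                 # first entry = least important feature
    while remaining:
        w, _ = lssvm_fit(X[:, remaining], y, C)
        j = int(np.argmin(w ** 2))
        ranking.append(remaining.pop(j))
    return ranking
```

The last entry of the returned ranking is the feature that survives longest, i.e., the most important one under the w_j^2 criterion.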

ReliefF
The ReliefF algorithm not only takes into account conditional dependencies between features (given the predicted value) and the correlation between features and the class, but is also more robust to noisy data than Relief (Robnik-Šikonja & Kononenko, 2003). The key idea of ReliefF is to estimate the quality of features according to how well their values distinguish between instances that are near each other. For that purpose, assume a training set {(x_1, y_1), . . . , (x_n, y_n)} in which all features have been normalized. The feature quality is evaluated as follows:

1) An instance x_i is selected from the training set;

2) The k nearest hits (nearest neighbors of x_i from the same class), collected in the set H, and the k nearest misses from each different class C, collected in the sets M(C), are found;

3) The quality estimate W_j of the j-th feature, j = 1, . . . , p, is updated by using Eq. (7) as

$$W_j \leftarrow W_j - \sum_{x_h \in H} \frac{|x_{ij} - x_{hj}|}{m\,k} + \sum_{C \neq y_i} \frac{P(C)}{1 - P(y_i)} \sum_{x_l \in M(C)} \frac{|x_{ij} - x_{lj}|}{m\,k}, \tag{7}$$

where m is the number of iterations; y_i is the class of instance x_i; C is an instance class different from the class of x_i; P(C) is the probability of an instance being from class C; and P(y_i) is the probability of an instance being from class y_i; and

4) Steps (1)-(3) are repeated m times.
Here, for big-data modeling, m instances are randomly sampled from the training set, with m a user-defined parameter (Robnik-Šikonja & Kononenko, 2003). Since the selected stock datasets in this paper are not very large, the number of iterations m is set to the training size (i.e., all instances in the training set {x_i, y_i} are selected). The specific algorithm is shown in Algorithm 2.
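The update of Eq. (7) can be sketched compactly in Python under the simplifying assumptions used in this paper: m = n (every training instance is used) and all features scaled to [0, 1]. The Manhattan distance and all function names are our choices for the sketch, not the paper's.

```python
import numpy as np

def relieff_weights(X, y, k=5):
    """Estimate ReliefF feature qualities W_j (Eq. (7)) with m = n.

    For each instance, W_j is decreased by the mean per-feature distance to
    the k nearest hits (same class) and increased by the class-prior-weighted
    mean distance to the k nearest misses of every other class.
    """
    n, p = X.shape
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    W = np.zeros(p)
    for i in range(n):
        d = np.abs(X - X[i]).sum(axis=1)   # Manhattan distance to all points
        d[i] = np.inf                      # exclude the instance itself
        same = np.where(y == y[i])[0]
        hits = same[np.argsort(d[same])][:k]
        W -= np.abs(X[hits] - X[i]).mean(axis=0) / n
        for c in classes:
            if c == y[i]:
                continue
            miss = np.where(y == c)[0]
            miss = miss[np.argsort(d[miss])][:k]
            scale = prior[c] / (1.0 - prior[y[i]])
            W += scale * np.abs(X[miss] - X[i]).mean(axis=0) / n
    return W
```

A feature whose values agree with nearby same-class instances but differ from nearby other-class instances ends up with a large positive W_j.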

LSSVM-RFE with ReliefF
Before presenting our new LSSVM-RFE with ReliefF, we should mention two points again: 1) LSSVM-RFE does not comprehensively take into account the redundancy among technical indicators or the relevance between the target variable and the technical indicators, which may not yield the optimal feature subset; and 2) the ReliefF algorithm ignores the interaction with the learning algorithm. Therefore, we incorporate the ReliefF method into LSSVM-RFE to develop our more effective sparse LSSVM.

[Algorithm 1 searches, for the i-th instance x_i of the training set {(x_1, y_1), . . . , (x_n, y_n)}, the nearest hits set H and the nearest misses set M. Algorithm 2 chooses each instance x_i, finds H and M of x_i by using Algorithm 1, and updates the estimates W_j of the qualities of the features j = 1, . . . , p.]
In our proposed LSSVM-RFE with ReliefF framework, the quality of each feature is evaluated based on Eq. (8) as

$$C_j = \alpha W_j + (1 - \alpha)\,w_j^2, \tag{8}$$

where the parameter α ∈ [0, 1] determines the trade-off between the ReliefF ranking and the LSSVM-RFE ranking; W_j is obtained by the iteration of Eq. (7), while w_j^2 is obtained by the LSSVM-RFE algorithm. Then the feature corresponding to the smallest C_j is removed, and the quality of the feature subset composed of the remaining features is evaluated by classifier performance, such as average accuracy.
Now, the procedure for obtaining the optimal feature subset in LSSVM-RFE with ReliefF is given as:

Step 1 (Initialization). Import the original feature set S = [1, . . . , p];

Step 2 (Scoring). The score W_j and the score w_j^2 of each feature j on the feature set S are calculated by the ReliefF method and the LSSVM-RFE method, respectively;

Step 3 (Evaluation). Use Eq. (8) to obtain the quality C_j of each feature j on the feature set S;

Step 4 (Removal). Find the feature j corresponding to the smallest C_j on the feature set S, and remove it, i.e., S = S \ j;

Step 5 (Subset evaluation). The quality of the reduced feature set S is evaluated by classifier performance: calculate the average accuracy A of the LSSVM trained with the features in S by using cross-validation. The average accuracy A is taken as the quality of the reduced feature set S;

Step 6 (Iteration). Repeat Steps 2-5 until the feature set S is empty; and

Step 7 (Output). The feature set S corresponding to the highest average accuracy A is output.
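The steps above amount to a single backward elimination loop, which can be sketched as follows. The three scoring routines are passed in as callables — hypothetical helpers standing in for the ReliefF scorer, the LSSVM weight solver, and the cross-validated accuracy of Step 5 — so the sketch shows only the control flow of the procedure.

```python
import numpy as np

def sparse_lssvm_select(X, y, relieff_score, rfe_weights, cv_accuracy,
                        alpha=0.5):
    """Steps 1-7 of the LSSVM-RFE with ReliefF procedure.

    relieff_score(X, y) -> ReliefF scores W_j per column;
    rfe_weights(X, y)   -> LSSVM primal weight vector w per column;
    cv_accuracy(X, y)   -> cross-validated accuracy on those columns.
    """
    S = list(range(X.shape[1]))                    # Step 1: all features
    best_acc, best_S = -np.inf, list(S)
    while S:
        W = np.asarray(relieff_score(X[:, S], y))  # Step 2: ReliefF scores
        w = np.asarray(rfe_weights(X[:, S], y))    # Step 2: RFE scores
        Cj = alpha * W + (1.0 - alpha) * w ** 2    # Step 3: Eq. (8)
        S.pop(int(np.argmin(Cj)))                  # Step 4: remove worst
        if S:
            acc = cv_accuracy(X[:, S], y)          # Step 5: evaluate subset
            if acc > best_acc:
                best_acc, best_S = acc, list(S)
    return best_S, best_acc                        # Step 7: best subset
```

Because the best subset seen at any elimination depth is remembered, the loop can safely run until S is empty (Step 6) without losing the optimum.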
Furthermore, the flowchart of our new sparse LSSVM is shown in Fig. 1.

The case study
In this section, three stock datasets are used to validate the performance of stock price movement direction forecasting with our proposed LSSVM-RFE with ReliefF. Specifically, we employ the sparse LSSVM for one-day-ahead stock price movement direction forecasting.

Data gathering and preparation
[Fig. 1 flowchart: Start → Initialization (feature set S = [1, . . . , p]) → ReliefF and LSSVM-RFE (calculate the score W_j and the score w_j^2 of each feature j on S) → LSSVM-RFE with ReliefF (evaluate the quality C_j = αW_j + (1 − α)w_j^2 of each feature j on S) → Removal (find the feature j = argmin_j C_j and set S = S \ j).]

Here, open_t represents the opening price on the t-th day.

Correlation Analysis
In this subsection, a correlation analysis on the training sets is conducted to explore the correlations among all considered variables. Here, we plot the correlations among the technical indicators and between the technical indicators and the target variable (target) in Fig. 3.
As illustrated in Fig. 3, for all stocks, the target variable is strongly correlated with MOBV, while the

The proposed LSSVM-RFE with ReliefF training
We normalize all 49 technical indicators to eliminate the impact of the scale of each indicator. Furthermore, according to the average accuracy on the validation sets, we use the 5-fold cross-validation method on the training sets to search for the two hyper-parameters α in Eq. (8) and C in Eq. (2). In particular, we consider three alternative weight parameters: 0.3, 0.5, and 0.7. Meanwhile, we first do a coarse search for C over the sequence from 2^{-8} to 2^{8} with step size 2^{0.8}; then, we do a fine search for C over the preset sequence from 2^{-4} to 2^{4} with step size 2^{0.5} (Yuanyuan et al., 2017). In addition, when the sizes of the nearest hits set H and the nearest misses set M are reasonably small, important features can be better separated from unimportant ones (Robnik-Šikonja & Kononenko, 2003); therefore, the size of both sets is set to 5. The linear kernel is chosen for our proposed sparse LSSVM. The results of our 5-fold cross-validation are detailed in Fig. 4.
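The two-stage search for C can be sketched as below. Here score(C) stands in for the cross-validated accuracy at a given C, and we read the fine search as refining around the coarse optimum on the log scale; the paper's exact fine-search range may differ, so the ranges are assumptions of this sketch.

```python
import numpy as np

def coarse_to_fine_search(score, coarse_lo=-8, coarse_hi=8, coarse_step=0.8,
                          fine_half=4, fine_step=0.5):
    """Two-stage log-scale grid search for the penalty parameter C.

    score(C) is an assumed callable returning cross-validated accuracy.
    First scan C = 2^e for exponents e in [coarse_lo, coarse_hi] with the
    coarse step, then refine around the best exponent with the fine step.
    """
    # Coarse pass: C in {2^-8, ..., 2^8} with step 2^0.8 in the exponent
    exps = np.arange(coarse_lo, coarse_hi + 1e-9, coarse_step)
    best_e = max(exps, key=lambda e: score(2.0 ** e))
    # Fine pass: step 2^0.5 in the exponent around the coarse optimum
    exps = np.arange(best_e - fine_half, best_e + fine_half + 1e-9, fine_step)
    best_e = max(exps, key=lambda e: score(2.0 ** e))
    return 2.0 ** best_e
```

Searching on the exponent keeps both grids uniform on the log scale, which is the usual convention for SVM penalty parameters.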
According to the best average accuracy in Fig. 4, we report the optimal hyper-parameter settings and corresponding accuracy and features subsets with the LSSVM-RFE with ReliefF for three investigated stocks in Tab. 2.

The experiment results
We retrain the LSSVM-RFE with ReliefF using the tuned hyper-parameters and feature subsets in Tab. 2 on the whole training sets for the three stock datasets, respectively. Then, the forecasting performances for the stock price movement direction are reported with five error indexes (Accuracy, Recall, Specificity, Precision, and F-measure).

Figure 4: The average accuracy with different pre-set parameters and feature numbers for the three stocks

The model comparison
To further demonstrate the capacity of our proposed LSSVM-RFE with ReliefF framework, we investigate

The comparison of error indexes
This subsection discusses the results of the model comparison from the following four aspects. 1) Sparse LSSVM vs. LSSVM: according to Tab. 4 and Tab. 5, MOBV is the most important technical indicator for stock price movement direction prediction. Interestingly, the CFS-based LSSVM, which excludes the MOBV indicator, is remarkably worse than the basic LSSVM without variable selection. Another point can also be found in the selected technical indicator subsets:

Model — Technical Indicator Subset
CFS-based LSSVM — VOSC, BR, D, J, TAPI, VMA
ReliefF-based LSSVM — MOBV, VRSI, QRR, RSI, WR, DPO, BIAS, TAPI, RC,

Comparison of cumulative return rate
To further explore the superiority of our proposed sparse LSSVM, we design a stock investment strategy based on all forecasting results. Generally, investors prefer to focus on the cumulative return rate of a classifier. Disregarding transaction costs such as commissions, the cumulative return rate of the investment strategy is shown in Eq. (15) as

$$R_{T+i} = \frac{1}{O_T}\sum_{t=1}^{i} P_{T+t}, \qquad P_{T+t} = \begin{cases} O_{T+t} - O_{T+t-1}, & \text{if } Y_{T+t} = 1, \\ 0, & \text{otherwise}, \end{cases} \tag{15}$$

where R_{T+i} denotes the cumulative rate of return from the T-th day to the (T+i)-th day, and O_T denotes the opening price of a stock on the T-th day. P_{T+i} denotes the profit increment: if the classifier's prediction on the (T+i)-th day is 1, i.e., Y_{T+i} = 1, the profit increment on the (T+i)-th day is O_{T+i} − O_{T+i−1}; otherwise, it is 0.
In the experiment, we buy each of these stocks at its opening price on 22 April 2020. All cumulative return rates for all investigated forecasting models are calculated from 23 April 2020 to 31 December 2020 and reported in Fig. 5.
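A minimal sketch of Eq. (15): given the opening-price series and the predicted directions, the day-by-day profits are accumulated and normalized by the buy price. The function and argument names are ours, chosen for illustration.

```python
import numpy as np

def cumulative_returns(opens, preds):
    """Cumulative return rates of the long-only strategy in Eq. (15).

    opens[t] is the opening price on day T+t (opens[0] = O_T, the buy
    price); preds[t] = 1 means the classifier predicts an up-move on day
    T+t, in which case the day's profit O_{T+t} - O_{T+t-1} is taken,
    otherwise the profit increment is 0.
    """
    opens = np.asarray(opens, dtype=float)
    profits = np.where(np.asarray(preds[1:]) == 1, np.diff(opens), 0.0)
    return np.cumsum(profits) / opens[0]   # R_{T+i} for i = 1, ..., len-1
```

For example, with opens = [10, 11, 10, 12] and preds = [0, 1, 0, 1], only the two predicted up-days contribute, giving cumulative rates 0.1, 0.1, 0.3.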

Conclusions
In this paper, a new sparse LSSVM framework, LSSVM-RFE with ReliefF, has been proposed for stock price movement direction prediction. Different from the ReliefF-based LSSVM, the learning of the feature subset interacts with the learning algorithm. Furthermore, our feature evaluation criterion is based on both the changes in the loss function and the conditional dependencies between features and the correlation between features and the class, which greatly enhances the reliability of the feature quality evaluation. Therefore, our

Declaration of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

A Appendix A
A.1 Correlation Feature Selection (CFS)-based LSSVM

The CFS-based LSSVM is composed of CFS and the LSSVM model. Its procedure has two steps: 1) CFS is used to select a feature subset; and 2) the feature subset is used as the input variables of the LSSVM model. The basic idea of CFS (J. Li et al., 2017) is to use a correlation-based heuristic to evaluate the worth of a feature subset S:

$$\text{Merit}_S = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}}, \tag{17}$$

where the CFS score shows the heuristic "merit" of the feature subset S with k features, \bar{r}_{cf} is the mean feature-class correlation, and \bar{r}_{ff} is the average feature-feature correlation. Because searching for the globally optimal subset is an NP-hard problem, best-first search is used to find a locally optimal feature subset. It starts with the empty set; then, the feature giving the highest score in Eq. (17) is added to the set one at a time. Meanwhile, to avoid searching the entire feature subspace, the method uses a stopping criterion of five consecutive fully expanded non-improving subsets.
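The merit of Eq. (17) can be computed directly; in this sketch we use the absolute Pearson correlation as the feature-class and feature-feature association measure (the original CFS uses symmetrical-uncertainty-style measures for discrete variables, so this is an assumption of the sketch).

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS heuristic merit of a feature subset (Eq. (17)).

    r_cf is the mean absolute feature-class correlation over the subset;
    r_ff is the mean absolute pairwise feature-feature correlation.
    """
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        r_ff = 0.0                 # no feature pairs in a singleton subset
    else:
        pairs = [abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                 for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean(pairs)
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```

Adding an uninformative feature lowers the merit, which is what drives the best-first search toward small, predictive, non-redundant subsets.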

A.2 Backward Elimination-LSSVM
Backward Elimination-LSSVM introduces backward elimination feature selection into the LSSVM. It starts with the full feature set; then, at each step, the feature whose removal most improves the performance of the LSSVM is greedily removed, until the performance of the LSSVM can no longer be improved.

A.3 Elastic Net-LSSVM
The Elastic Net-LSSVM is composed of the LSSVM and the elastic-net penalty, a mixture of the L_1-norm and L_2-norm penalties. The elastic-net penalty term not only performs automatic variable selection and continuous shrinkage simultaneously, but can also select groups of correlated variables (Zou & Hastie, 2005). Its objective function can be formulated as

$$\min_{w, b} \ \sum_{i=1}^{n}\Big[y_i - \big(b + \sum_{j=1}^{p} w_j x_{ij}\big)\Big]^2 + \lambda_1\|w\|_1 + \lambda_2\|w\|_2^2,$$

with two tuning parameters λ_1, λ_2 ≥ 0.