Classifying stock market data using rule-based temporal classification overlaps with many disciplines, including temporal clustering and classification, rule-based classification, optimization (differential evolution in our case), and stock market forecasting. In the sub-sections below we will briefly explain each of these subjects and critically review related studies.
2.1. Temporal Classification
Classification is a type of supervised machine learning concerned with predicting one of a set of predefined finite classes for items subject to classification [5]. Temporal and sequence classification automatically assigns one of the predefined classes to a time series or sequence input [6]. Many temporal classification methods have been introduced that adapt traditional classification algorithms using criteria and measurements crafted for temporal data.
Many temporal supervised and unsupervised algorithms use dynamic time warping (DTW) [7] to align two sequences or time series and find the distance between them. This method was originally used in speech recognition to find human speech patterns [8]. DTW uses a local cost function to compare two time series. The operation of time matching between two series using DTW is shown in Fig. 2.
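The alignment described above can be sketched with the classic dynamic-programming recurrence. The following is a minimal illustration, assuming absolute difference as the local cost function; production code would use an optimized library such as dtaidistance or tslearn.

```python
def dtw_distance(a, b):
    """Return the DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # Cumulative-cost matrix with an extra boundary row/column.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # local cost function
            # Extend the cheapest of the three admissible alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # insertion
                                     cost[i][j - 1],      # deletion
                                     cost[i - 1][j - 1])  # match
    return cost[n][m]

# A time-shifted copy of a series stays close under DTW even though
# its pointwise (Euclidean) distance would be large.
print(dtw_distance([0, 1, 2, 3, 2, 1], [1, 2, 3, 2, 1, 0]))  # 2.0
```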
Douzal-Chouakria et al. [10] used classification trees to classify time series data by introducing new splits for the tree nodes using time series proximities relying on adaptive metrics considering behaviours and values. The distance-based K-nearest neighbour classification method (KNN) is used with temporal and sequential data with Euclidean distance measure [11]. However, for complex time series, Euclidean distance is sensitive to the time fluctuation, thus DTW has been used [12]. Other methods use Support Vector Machine (SVM) as a temporal data classifier using different kernels [13]. SVM classifies items by separating each class using optimal hyperplanes between them [5].
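The distance-based classification just described reduces to a nearest-neighbour lookup once a distance measure is fixed. A minimal 1-NN sketch is shown below; the training series and labels are invented for illustration, and the Euclidean measure could be swapped for DTW to handle time fluctuation as discussed above.

```python
import math

def euclidean(a, b):
    """Pointwise Euclidean distance between equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, distance=euclidean):
    """1-NN: assign the label of the nearest training series."""
    return min(train, key=lambda item: distance(item[0], query))[1]

# Toy training set: a rising and a falling series.
train = [([1, 2, 3, 4], "up"), ([4, 3, 2, 1], "down")]
print(knn_classify(train, [2, 3, 4, 5]))  # "up"
```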
Model-based classifiers can also be used for temporal and sequential classifications like the Naive Bayes sequence classifier [14] and Hidden Markov Model [15]. In the training step, the parameters of the model are created and trained depending on some assumptions, and a set of parameters describing probability distributions. In the classification step, a new sequence is assigned to the class with the best possible similarity [16].
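The two model-based steps above (train per-class parameters, then assign a new sequence to the most similar class) can be illustrated with a deliberately simplified Naive Bayes over symbol sequences. The symbols ("u"/"d" for up/down moves) and class names are invented for the example; a real system would model richer dependencies, e.g. with an HMM.

```python
import math
from collections import Counter, defaultdict

def train_nb(sequences):
    """Estimate per-class symbol probabilities with add-one smoothing."""
    counts = defaultdict(Counter)
    alphabet = set()
    for seq, label in sequences:
        counts[label].update(seq)
        alphabet.update(seq)
    models = {}
    for label, c in counts.items():
        total = sum(c.values()) + len(alphabet)
        models[label] = {s: (c[s] + 1) / total for s in alphabet}
    return models

def classify_nb(models, seq):
    """Assign the class whose model gives the highest log-likelihood."""
    def loglik(label):
        return sum(math.log(models[label].get(s, 1e-9)) for s in seq)
    return max(models, key=loglik)

# Toy sequences of up/down price moves labelled by regime.
data = [("uuud", "bull"), ("uudu", "bull"), ("dddu", "bear"), ("dudd", "bear")]
models = train_nb(data)
print(classify_nb(models, "uud"))  # "bull"
```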
2.2. Temporal Clustering
Clustering is an unsupervised machine-learning method whose goal is to find natural groupings (clusters) of instances in data sets. All clustering methods strive to detect compact clusters by maximizing the total inter-cluster distance and minimizing the total intra-cluster distance between instances [17]. The distance can be measured using Euclidean distance, DTW distance, or any other similarity measure.
Jebara et al. [18] used a hidden Markov model (HMM) to cluster time series data, while Oates et al. [15] compared two methods for clustering time-series data sets, first using HMM alone and then using DTW with HMM. DTW returns the minimal alignment cost between two time-series variables, which can be used as a similarity measure between them. They concluded that using DTW improves the clustering of time-series data sets.
Rodrigues, Gama, and Pedroso [19] used hierarchical clustering to cluster time series data sets. A hierarchical clustering method works by grouping items into a tree of clusters. The tree can be generated in two ways: either by starting from single items and agglomerating them into higher-level structures, or by starting from the entire data set and dividing it until each branch of the tree ends up with a single item [20]. Another method used a scaled-up version of DTW [21] with hierarchical clustering, which calculates the distance between temporal variables efficiently and shows the advantage of using DTW with hierarchical clustering.
Soheily-Khah et al. [22] proposed k-means-based clustering for temporal data sets using DTW, the Dynamic Temporal Alignment Kernel, and the Global Alignment kernel. K-means clustering partitions the items of a data set by minimizing the total distance of items to cluster centres; the centres are chosen randomly at the initial stage and recalculated iteratively, and items are allocated to the nearest centroid to form clusters with minimum intra-cluster distance [5].
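The iterative assign-and-recompute loop of k-means can be sketched as follows. This is a plain Lloyd's-iteration illustration using Euclidean distance on equal-length series, not the DTW- or kernel-based variants of the cited work; the toy series are invented.

```python
import random

def kmeans(series, k, iters=20, seed=0):
    """Lloyd's k-means on equal-length series with Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(series, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in series:
            # Assign each series to its nearest centroid.
            d = [sum((x - y) ** 2 for x, y in zip(s, c)) for c in centroids]
            clusters[d.index(min(d))].append(s)
        for i, members in enumerate(clusters):
            if members:  # recompute centroid as the pointwise mean
                centroids[i] = [sum(vals) / len(members)
                                for vals in zip(*members)]
    return clusters

# Two well-separated groups of short series.
series = [[1, 1, 1], [1.2, 0.9, 1.1], [5, 5, 5], [4.8, 5.2, 5.0]]
clusters = kmeans(series, 2)
```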
2.3. Rule-Based Classification
Most rule-based classifiers use an if..else.. form to classify the underlying data, which makes them amenable to easy human comprehension [23]. The rules can be learned from examples or provided by an expert [24]. Many different data mining and analysis methods use rule-based systems for classification, as explained below.
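The if..else.. form mentioned above can be sketched as an ordered list of (condition, class) rules with a default class when no rule fires. The feature names ("change", "volume") and thresholds here are hypothetical, chosen only to illustrate the structure.

```python
# Each rule is a (condition, class) pair, evaluated in order.
rules = [
    (lambda x: x["change"] > 0.02 and x["volume"] > 1e6, "strong_buy"),
    (lambda x: x["change"] > 0.0, "buy"),
    (lambda x: x["change"] < -0.02, "sell"),
]

def classify(item, rules, default="hold"):
    """Return the class of the first rule whose condition matches."""
    for condition, label in rules:
        if condition(item):
            return label
    return default

print(classify({"change": 0.03, "volume": 2e6}, rules))  # "strong_buy"
print(classify({"change": -0.01, "volume": 5e5}, rules))  # "hold"
```

Rule order matters: the first matching condition wins, so more specific rules are listed before more general ones.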
Rule-based classification is also used in fuzzy systems: for example, Cordon et al. [25] proposed a new Fuzzy Reasoning Method (FRM) that improves system optimization without the rules losing their comprehensibility, and Ishibuchi [26] compared two kinds of voting schemes for fuzzy rule-based classification.
Experts use common sense and vague terms to solve problems and classify situations/items, while an expert system that tries to simulate human experts uses logic to conclude decisions instead of hard programmed solutions [23]. Several expert systems that rely on rule-based logic have been introduced [27].
Many other rule-based classification methods have been introduced: [28] proposed a generic classifier construction algorithm (ICCA), [29] proposed a rule-based classifier that can extract rules from uncertain data, and [30] used probability estimation for rule learning, inspired by the use of probabilities in constructing decision trees.
2.4. Differential Evolution
Differential Evolution (DE) is a heuristic search algorithm introduced by Storn et al. [31], who described it as simple and efficient. DE belongs to the family of genetic algorithms: it applies crossover and mutation to produce each new generation, in analogy with the way DNA-based natural evolution creates species optimized for their environment. The algorithm has proven successful and has been used in many different areas [32]. In this study, we use DE to optimize rules provided by human classifiers. The optimization focuses on minimizing the distance between items within classes using their temporal attributes.
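The mutation, crossover, and selection steps of DE can be sketched with the standard DE/rand/1/bin scheme. This is a generic illustration minimizing the sphere function, not the rule-optimization objective of this study; population size, F, and CR are typical textbook settings.

```python
import random

def differential_evolution(fitness, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=1):
    """Minimal DE/rand/1/bin minimizer."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: donor vector from three distinct other members.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            donor = [a[d] + F * (b[d] - c[d]) for d in range(dim)]
            # Binomial crossover with one guaranteed donor component.
            jrand = rng.randrange(dim)
            trial = [donor[d] if (rng.random() < CR or d == jrand)
                     else pop[i][d] for d in range(dim)]
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            # Greedy selection: keep the better of target and trial.
            if fitness(trial) <= fitness(pop[i]):
                pop[i] = trial
    return min(pop, key=fitness)

# Minimize the sphere function; the optimum is at the origin.
best = differential_evolution(lambda v: sum(x * x for x in v),
                              [(-5, 5)] * 3)
```

In our setting, the fitness function would instead score a candidate rule parameterization by the intra-class distances it induces on the temporal attributes.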