A comparison of the analysis of methods for feature extraction and classification by Wavelet transform in SSVEP BCIs

Brain-computer interfaces (BCIs) based on electroencephalography have a wide range of applications, and extracting the steady-state visual evoked potential (SSVEP) is regarded as one of the most useful tools in BCI systems. This study compared methods across the whole signal-processing stream: feature extraction with statistical features (Shannon entropy, skewness, kurtosis, mean, variance) and spectral methods (filter bank, narrow-band IIR filters, and wavelet transform magnitude); feature selection by decision tree, principal component analysis (PCA), t-test, Wilcoxon, and receiver operating characteristic (ROC); and classification with k-nearest neighbor (k-NN), perceptron, support vector machine (SVM), Bayesian, and multilayer perceptron (MLP) classifiers. Combining these methods gave an overview of the accuracy of classical methods. In addition, the present study relied on a rather new approach to feature selection for BCI-SSVEP systems, based on the decision tree and PCA. Finally, the accuracies were calculated for the four recorded frequencies, which represent four directions: right, left, up, and down.


Introduction
The brain-computer interface (BCI) is considered a possible means of communication and environmental control for people who are severely disabled by conditions such as amyotrophic lateral sclerosis, helping them manage their lives [1]. A BCI aims to create a path between the human brain and an external device in order to translate human intentions into control signals. A large number of studies have recently focused on electroencephalography (EEG)-based BCI systems to accomplish this communication. Signals extracted from the EEG, such as event-related potentials (ERP), event-related synchronization (ERS), and visual evoked potentials (VEP), have been used in many studies. VEP-based BCI systems have attracted particular attention because they offer a high information transfer rate (ITR) [2]. Among these types, BCIs based on the steady-state visual evoked potential (SSVEP) have received the most emphasis. As brain responses to a visual stimulus, SSVEPs form an important subset of VEP-based BCIs and offer high ITR, high signal-to-noise ratio (SNR), short training time, and stable performance [3].
Firman et al. [4] applied the minimum energy combination (MEC) method to detect SSVEP in EEG signals. The method is used when rapid and accurate recognition is necessary to attain a high SNR in BCI systems. They used short segments to remove noise from the EEG signals. In [5], a double stimulus frequency was used for visual stimulation in BCI systems, which increased system performance. Wavelet analysis has been applied to a single EEG channel to extract its features, and it has served as a conventional method for analyzing EEG in order to detect sub-band frequencies [6][7]. The Fourier transform has been applied to a single channel of EEG signals to find the phase and amplitude of the SSVEP [8,9]. The main disadvantage of the Fourier transform is that it describes the frequency content of the signal without any time information. When both the wavelet and Fourier transforms were applied, a better level of signal extraction could be reached [10].
Fig. 1 shows the block diagram of the steps applied in the four stages of the signal processing module. In this study, classical methods were compared in three parts of signal processing: feature extraction, feature selection, and classification. The study used five feature extraction methods, a spectral approach applying narrow-band IIR filters, and the wavelet transform computed at the evoked frequencies, together with six feature selection methods and five classifiers. Furthermore, the performance of each method was analyzed under feature selection by decision tree, Wilcoxon, ROC, Bhattacharyya, and PCA.
These methods were combined and applied to the same database, which allows a direct comparison. The database was built according to the experimental setup described in the next section. This study is a modest contribution toward identifying the best way to process SSVEP, from feature extraction to classification. To the best of our knowledge, no study has focused on the performance of a decision tree and PCA as feature selection approaches in SSVEP-based BCIs.

Experimental procedure
The data were recorded at the Laboratory for Advanced Brain Signal Processing, Brain Science Institute. A recording system from Biosemi Inc., built in the Netherlands, was used to record the data.
Data from four participants were recorded with 128 active electrodes. The participants were fully aware of the objectives of this project. In addition, they were checked for sensitivity to light related to epilepsy before the final data were recorded. The SSVEP was evoked by a reversing black-and-white checkerboard (6×6) on a screen. A second experiment was then performed with a small checkerboard at three stimulus frequencies (8, 14, and 28 Hz). The sampling frequency was 256 Hz, and the participants were asked to sit 90 cm from the monitor. The SSVEP stimulation started 5 seconds and ended 20 seconds after the start of each recording, which gave 15 seconds of SSVEP for each of the four participants at each of the three frequencies. Each frequency included five experiments for each participant. The total number of samples was 6370 [11,12].

Preprocessing: Filtering and segmentation; Wavelet transform
The wavelet transform is used for time-frequency analysis, especially of non-stationary signals. It uses two kinds of windows, long ones for low frequencies and short ones for high frequencies, so that both ranges are analyzed properly [11]. The wavelet transform is a reliable way to segment the raw EEG signal, and it is useful because it lets researchers analyze the signal in both time and frequency [12]. It decomposes the original signal into two parts: low-frequency (approximation) information and high-frequency (detail) information. It then splits the low-frequency information repeatedly while keeping the high-frequency information intact, which finally yields a wavelet packet tree [6]. The orthonormal basis $\psi_{j,n}^{k}(t)$, which refers to the $n$th sub-band of the wavelet packet at the $j$th scale, is calculated as $\psi_{j,n}^{k}(t) = 2^{-j/2}\,\psi_{n}(2^{-j}t - k)$, where $k$ is the shift factor [6]. The basis satisfies the two-scale relations given in [6]. The signal is decomposed into four levels, where $h_0(k)$ and $h_1(k)$ are the two filters at the fourth level, obtained as described in [6]. Decomposition continues until the required frequency resolution is reached [13,14]. Once scaling and decomposition are finished, the wavelet coefficients of the sampled sequence are calculated at the $k$th sample and $j$th level as described in [6]. Finally, the frequency range of every subspace at the $j$th level is computed from the sampling frequency $f_s$ [6]. Fig. 2 shows the filtering analysis of the wavelet decomposition in three steps.
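As an illustration of the repeated splitting of the low-frequency branch, the following minimal sketch implements a multilevel discrete wavelet decomposition and lists the nominal frequency range of each band. The Haar wavelet, the function names, and the 14 Hz test tone are assumptions made for the example; the mother wavelet used in the study is not stated in this section, and the band edges are the ideal ones, which real filters only approximate.

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: return (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)  # a trailing odd sample is dropped
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def haar_dwt(signal, levels):
    """Split the low-frequency branch repeatedly, keeping each detail band.

    Returns [A_levels, D_levels, ..., D_1].
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

def dwt_bands(fs, levels):
    """Nominal frequency range in Hz of each band, in the order of haar_dwt."""
    bands = [(0.0, fs / 2 ** (levels + 1))]
    for j in range(levels, 0, -1):
        bands.append((fs / 2 ** (j + 1), fs / 2 ** j))
    return bands

fs = 256  # sampling frequency reported for the recordings
x = [math.sin(2 * math.pi * 14 * n / fs) for n in range(2 * fs)]  # 2 s test tone
coeffs = haar_dwt(x, levels=4)
bands = dwt_bands(fs, levels=4)
for (low, high), c in zip(bands, coeffs):
    print(f"{low:6.1f}-{high:6.1f} Hz: {len(c)} coefficients")
```

With a sampling frequency of 256 Hz and four levels, the nominal bands are 0-8, 8-16, 16-32, 32-64, and 64-128 Hz, so 14 Hz and 28 Hz fall in different bands and 8 Hz lies on the edge between the two lowest ones.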

Skewness and Kurtosis
A fundamental task in many statistical analyses is to characterize the location and variability of a data set. Further characterizations of the data include skewness and kurtosis [17]. The former is a measure of symmetry, or more precisely, the lack of symmetry. A distribution or data set is symmetric if it looks the same to the left and right of the center point [17].
The latter is a measure of whether the data is heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails or outliers. Data sets with low kurtosis tend to have light tails or lack of outliers. A uniform distribution would be the extreme case [17].
Skewness quantifies the symmetry or asymmetry of a distribution function. If the distribution is symmetrical, the skewness is zero; if it is asymmetrical, the skewness is positive when the distribution is skewed toward higher values and negative when it is skewed toward lower values. Its definition is given in [17].
Kurtosis measures the sharpness of the distribution curve at its maximum relative to the normal distribution; its definition is given in [18].
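As an illustration, both measures can be computed from the central moments of a data set. The sketch below assumes the standard moment-based definitions, skewness $m_3 / m_2^{3/2}$ and kurtosis $m_4 / m_2^{2}$, without bias correction; the exact variants used in the study are those of [17] and [18], and the function names are chosen for the example only.

```python
def central_moment(data, order):
    """Central moment of the given order (population form, divides by n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)

def skewness(data):
    """Third central moment scaled by the standard deviation cubed."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Fourth central moment scaled by the variance squared."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 10]
print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed))
```

Under this definition a normal distribution has a kurtosis of 3, so values above 3 indicate heavier tails and values below 3 indicate lighter tails.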

Decision Tree
An effective way to select the best data for an analysis is to build a decision tree. Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments. These segments form an inverted decision tree that originates with a root node at the top of the tree. The tree is a set of nodes connected to each other by branches that descend from the root until they reach the leaf nodes [19]. By convention, the starting point is the root node at the top of the tree; the tests are located in the decision nodes, and each outcome is assigned to one branch. Each branch connects to another decision node or to a final leaf node [19]. Here, each branch divides into two further branches. Finally, we selected the features at the top nodes of the tree.
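One way to read "selecting the features at the top nodes" is to rank each feature by how well a single binary split on it separates the classes, since a tree places the most informative split at its root. The sketch below is written under that assumption and uses Gini impurity as the splitting criterion; the criterion, the function names, and the toy data are illustrative and are not taken from the study.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split_gain(values, labels):
    """Largest impurity reduction over all binary thresholds on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(values))[:-1]:  # keeps both sides non-empty
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def rank_features(rows, labels):
    """Feature indices ordered from most to least informative."""
    n_features = len(rows[0])
    gains = [best_split_gain([r[j] for r in rows], labels)
             for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)

# Toy data: feature 0 separates the two classes, feature 1 does not.
rows = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
labels = ["8 Hz", "8 Hz", "14 Hz", "14 Hz"]
print(rank_features(rows, labels))
```

A full decision tree repeats this split search inside each branch; the single-level ranking shown here only reproduces the choice made at the root.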

PCA (Principal Component Analysis)
PCA is a well-recognized way to reduce the dimensionality of data when a large volume of data must be classified. It obtains a linear transform of the original data set [20].
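The linear transform can be illustrated with a minimal sketch that finds the first principal component as the dominant eigenvector of the covariance matrix. Power iteration is used here only to keep the example free of external libraries; the function names and the toy data are assumptions, and the sketch says nothing about how many components were retained in the study.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix and the mean-centered data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centered

def first_component(cov, iterations=200):
    """Dominant eigenvector of the covariance matrix by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data lying close to the line y = 2x.
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]
cov, centered = covariance_matrix(rows)
pc1 = first_component(cov)
# Projecting onto pc1 reduces each two-dimensional sample to one score.
scores = [sum(c[j] * pc1[j] for j in range(len(pc1))) for c in centered]
print(pc1, scores)
```

Because the toy data vary mostly along one direction, the single score per sample keeps almost all of the variance, which is the sense in which PCA reduces the dimensions of the data.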
In addition, Bhattacharyya, t-test, ROC, and Wilcoxon were used as further feature selection methods in this study. Finally, the results were collected in a table in order to find the best feature selection method.

Classifiers
After extracting the features with the wavelet transform and selecting the best of them, we used classifiers to assign them to classes. We applied k-NN (k-nearest neighbor), perceptron, MLP, SVM (support vector machine), and Bayes classifiers.

MLP classifier
The multilayer perceptron (MLP) is a class of artificial neural network with at least three layers of nodes. Except for the input nodes, each node is a neuron that applies a nonlinear function. The MLP is trained with backpropagation, which is a supervised training method [20,21].

SVM classifier
The SVM is a supervised machine-learning method applied to the classification of data sets. The training data are divided into two groups, and the SVM builds a model that assigns each new data point to one of the two groups.
It separates the two groups as far as possible from each other, which improves the discrimination between them [23].

Results
The diagram below illustrates the arrangement of this study. First, the SSVEP signals were divided into 2-second segments in MATLAB, and the discrete wavelet transform was applied to each segment to obtain the wavelet coefficients. The stimulus frequencies were 8, 14, and 28 Hz, so the wavelet decomposition was continued until each stimulus frequency fell into a band of its own. Two levels of decomposition already bring the analysis close to the range of the EEG signal, which lies between 0 and 40 Hz; four levels were needed to place the stimulus frequencies of 8, 14, and 28 Hz in separate bands. After the decomposition, features were extracted from the decomposed signals, including Shannon entropy, skewness, kurtosis, mean, power, and variance. The features were then normalized, and the decision tree, PCA, ROC, and statistical methods were used to select the best features. The decision tree ranked the input variables (features) against the outputs (labels), and the ten highest-ranked features were selected, as shown in Table 1. According to the decision tree results presented in Tables 1-3, the Shannon entropy of the first channel was selected as the best feature. The other methods and their selected features are shown in Tables 2-5.
After all of the above methods were applied, the selected features were divided into training and testing groups in a ratio of 80% to 20%. The k-NN, MLP, SVM, and Bayes classifiers were trained on the training group and then evaluated on the testing group. The classification was a four-class problem: three classes corresponded to the frequencies of 8, 14, and 28 Hz, and the fourth was a normal class. For the k-NN classifier, k ranged from 3 to 9, and the classifier was run for each value.
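The segmentation, feature extraction, and normalization steps at the start of this section can be sketched as follows. This is a minimal illustration under stated assumptions: the windows do not overlap, Shannon entropy is estimated from an 8-bin histogram, normalization is min-max scaling, and the features are computed on the raw windows rather than on wavelet bands. None of these choices is specified in the text, skewness and kurtosis are left out for brevity, and the synthetic 8 Hz trial stands in for real data.

```python
import math
from collections import Counter

def segment(signal, fs, seconds=2):
    """Cut a signal into non-overlapping windows of the given duration."""
    size = int(fs * seconds)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def shannon_entropy(values, bins=8):
    """Shannon entropy in bits of a histogram of the values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def features(values):
    """Return [Shannon entropy, mean, power, variance] of one window."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    power = sum(v * v for v in values) / n
    return [shannon_entropy(values), mean, power, variance]

def min_max_normalize(rows):
    """Rescale every feature column to the range [0, 1]."""
    cols = list(zip(*rows))
    return [[(v - min(c)) / (max(c) - min(c)) if max(c) > min(c) else 0.0
             for v, c in zip(row, cols)]
            for row in rows]

fs = 256
n_samples = 15 * fs  # 15 s of SSVEP per trial
# Synthetic 8 Hz trial with slowly growing amplitude (illustrative only).
trial = [(1 + n / n_samples) * math.sin(2 * math.pi * 8 * n / fs)
         for n in range(n_samples)]
windows = segment(trial, fs)
table = min_max_normalize([features(w) for w in windows])
print(len(windows), len(table[0]))
```

A 15-second trial sampled at 256 Hz yields seven complete 2-second windows of 512 samples each; the last second of the trial is discarded under the non-overlapping assumption.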
The MLP classifier was tested with five and six neurons in the hidden layer. The test was performed 100 times, and the accuracy was reported accordingly. The results of all classifiers with each feature selection method are reported in Table 6, along with their accuracy.
In addition, k = 7 for k-NN and n = 6 (the number of neurons in the hidden layer) for MLP gave the best accuracy. Table 6 shows that k-NN achieved the highest accuracy among the four classifiers (91.39%), followed by SVM (89.34%), with the PCA feature selection method.
Further, the features selected by PCA yielded higher accuracy when given to the classifiers.
Furthermore, the accuracy of MLP was lower for every feature selection method. Finally, Bhattacharyya and Wilcoxon gave weaker results than the other feature selection methods.
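The evaluation protocol described above (an 80%/20% split, k-NN with k from 3 to 9, and 100 repetitions) can be sketched as follows. The sketch assumes that each repetition draws a new random split and that the reported accuracy is the mean over repetitions, which the text does not state explicitly. The data are synthetic, well-separated clusters standing in for the four classes, so the printed accuracies say nothing about the results in Table 6.

```python
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(train_x[i], query)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def holdout_accuracy(xs, ys, k, rng, test_ratio=0.2):
    """Accuracy of k-NN on one random train/test split."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_test = max(1, int(len(idx) * test_ratio))
    test, train = idx[:n_test], idx[n_test:]
    tx, ty = [xs[i] for i in train], [ys[i] for i in train]
    hits = sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in test)
    return hits / n_test

def mean_accuracy(xs, ys, k, runs=100, seed=0):
    """Mean accuracy over repeated random splits."""
    rng = random.Random(seed)
    return sum(holdout_accuracy(xs, ys, k, rng) for _ in range(runs)) / runs

# Synthetic two-feature data: one cluster per class (illustrative only).
data_rng = random.Random(1)
centers = {"8 Hz": (0, 0), "14 Hz": (5, 0), "28 Hz": (0, 5), "normal": (5, 5)}
xs, ys = [], []
for label, (cx, cy) in centers.items():
    for _ in range(25):
        xs.append((cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5)))
        ys.append(label)

results = {k: mean_accuracy(xs, ys, k) for k in range(3, 10)}
for k, acc in results.items():
    print(f"k = {k}: mean accuracy {acc:.3f}")
```

Repeating the split reduces the dependence of the reported accuracy on one particular partition of the data, which matters when the number of recordings is small.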

Discussion
Comparing the results of the present study with those of other studies is difficult because of differences in the EEG (specifically SSVEP) database, number of participants, type of wavelet transform, decomposition level, type of classifier, and feature extraction method. Nevertheless, the results were compared with those of related studies. The database, which contains pure SSVEP extracted from EEG, was used to obtain a robust SSVEP that is reliable for BCI systems and clinical applications. It also allowed the researchers to reach high levels of accuracy, efficiency, and signal-to-noise ratio. Computational complexity and long calculation times are the most frequent problems with non-linear classifiers [11], and a large number of studies have reported such difficulties together with over-fitting problems [24]. Thus, in the present study, linear classification and feature extraction were implemented to obtain the highest accuracy and efficiency. With linear methods for both classification and feature extraction, the results showed higher accuracy than several recent studies. In 2006, some studies applying SVM [25] reached an accuracy of 53.98%-56.07%, and others obtained 87.5% when they increased the decomposition level to seven [26]. An accuracy of 81.48% was obtained with k-NN when the wavelet was decomposed to five levels [27], and an accuracy of 65.90% was attained with SVM. In 2011, an accuracy of 85.4% was obtained with an SVM classifier [28]. It is worth noting that non-linear classifiers were difficult to compute in the present study because of over-fitting and calculation time [29].

Conclusion
The present study aimed to determine suitable features for classifying the frequencies of 8, 14, and 28 Hz and one normal class. Further, it addressed the question of whether the accuracy of SSVEP extraction could be improved by ranking the features and by using the PCA and decision tree methods.
The highest accuracy (91.39%) was obtained with the PCA method and the k-NN classifier. Unlike other studies, the present research compared all stages of processing, including signal extraction, feature extraction, feature selection, and classification, and reported the accuracy of each combination. In addition, PCA, decision tree, and t-test performed better than Bhattacharyya and Wilcoxon. The results of Bayes and SVM showed better performance than those of k-NN and MLP. Finally, MLP had the weakest result.
Funding information This work did not receive any grant from funding agencies in the public, commercial, or not-for-profit sectors.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the Brain Science Institute, Laboratory for Advanced Brain Signal Processing and its later amendments or comparable ethical standards.

Informed consent Informed consent was obtained from all individual participants included in the study.