An Efficient and High Accuracy P300 Detection for Brain Computer Interface System Based on Kernel Principal Component Analysis

Abstract: Human machine interaction using brain signals has been made possible by the advent of the technology popularly known as the brain computer interface (BCI). P300 is one such brain signal which is used in many BCI systems. Most of the existing P300 detection methods are time consuming and computationally complex, as they average the values obtained over multiple trials. The existing single-trial methods, on the other hand, have achieved only moderate accuracy levels. In this paper, a novel approach for achieving a high level of accuracy in single-trial P300 detection amidst noise and artifacts is proposed. In this method, features were obtained by applying the Discrete Wavelet Transform followed by a technique making use of the obtained wavelet coefficients. Kernel Principal Component Analysis (KPCA) was used for reducing the feature dimension. Classification of the P300 signal using the reduced features was done using a Support Vector Machine (SVM). The dataset used was Dataset II of the third BCI Competition. Using the proposed method, an accuracy of 98.62% was achieved for Subject S1 (signal obtained from the first person) and 99.22% for Subject S2 (signal obtained from the second person). A high level of accuracy was obtained compared to many existing techniques, and the speed of classification was improved by the use of reduced feature dimensions. The accuracy increased on applying KPCA, and the maximum accuracy was obtained at a reduced feature dimension of six for both subjects. The proposed algorithm was also tested with various other features, and the best results were obtained using the proposed features. The results obtained from the proposed method have been compared with those reported in the literature as well as with other methods we have experimented with. These comparisons show the advantages of the proposed method over existing ones.


Introduction
With the advancements in the fields of computer technology and neuroscience, it is now possible to use brain signals to operate various applications via computers or devices. Brain Computer Interface (BCI) technology is one of the most researched areas in the field of biomedical engineering. From entertainment to automation, BCI can play a major role in easing the lives of different categories of people. Various gaming applications, autonomous vehicles, devices that can help differently abled people, etc. are in the development stage. Based on the brain signal recording method, BCI systems can be classified into two categories: invasive systems and non-invasive systems. In an invasive system, BCI devices are implanted directly into the brain. Though they provide the best performance in terms of accuracy, the implantation procedure is risky as well as expensive. Non-invasive BCI devices are safer and less expensive but provide a poor signal-to-noise ratio due to attenuation of the signal by the skull. Thus this paper primarily focuses on non-invasive systems.
In BCI systems (both invasive and non-invasive), electroencephalography (EEG) is the primary sensing technique. Among the various types of EEG signals, event-related potential (ERP) signals are mainly utilized in the design of BCI applications. ERP signals are the direct result of specific sensory, cognitive or motor events. P300 is one such ERP signal which has been used extensively in research. The P300 is an ERP which occurs approximately 300 ms after a rarely presented event or stimulus. The P300 signal is one of the most popularly used signals in the design of several BCI applications such as BCI spellers, brain fingerprinting, lie detectors, home automation systems, etc. [1], [2], [3]. Given the importance of BCI systems, the development of accurate and robust detection algorithms for the P300 signal is also very important. Although a few algorithms have already been developed, efficiency and robustness are still lacking. This paper is motivated by the fact that the proper use of BCI technology requires efficient and robust algorithms which can lead to more user-friendly systems. This was the motivation for developing a novel P300 detection method. Most existing BCI systems average the values of a large number of trials in order to remove various artifacts. This process increases the signal detection accuracy but at the cost of communication rate, as averaging consumes a lot of time. Most BCI systems also use a large number of electrodes for signal acquisition, which makes the system costly. This is another limitation of existing BCI systems. Yet another challenge is the fact that most biomedical signals are highly subject-dependent, so the characteristics of the signal vary from one individual to another.

Dataset
The dataset supplied by BCI Competition III, data set II for the P300 speller paradigm, is used in this experiment [21]. This dataset consists of training samples as well as testing samples; among these, we have used only the training samples for our study. This is a standard benchmark dataset which has been used in developing many P300 detection algorithms. In order to compare the performance of our algorithm with existing ones, we have used this same dataset provided by BCI Competition III. The dataset consists of data taken from two subjects during an experiment. In this experiment, a 6x6 matrix (Fig.1) containing 36 symbols was presented in front of each subject. The matrix consists of the 26 alphabets and other useful characters. The user wore a headset consisting of 64 electrodes and was asked to concentrate on the particular symbol he wanted to communicate. The 6x6 symbol matrix consists of 6 rows and 6 columns. The backgrounds of the rows and columns were intensified one at a time in a random manner for a fixed period of 100 ms, and a blank matrix was presented for a duration of 75 ms after each intensification. A cycle of 12 intensifications covering each row and column once is called a sequence, and this was repeated 15 times. The subject stares at one of the 36 characters in the matrix at a given time and maintains a mental count. Note that in a sequence, only two intensifications (the row and the column containing the symbol) were target intensifications. Hence the target character was intensified twice per sequence and 30 times over 15 sequences. 85 symbols or characters were targeted by each subject, and the same process of 15 sequences was repeated for each targeted symbol. An ERP, the P300 potential, is generated whenever the targeted row or column (containing the targeted symbol) is intensified.
The 64-electrode cap placed over the subject's head measured the ERP at a sampling rate of 240 Hz [6], as shown in Fig.2. We have used the data from only six channels (Fz, Pz, P1, P2, POz and Oz) out of the 64 channels. The P300 signal is generated most strongly in a specific area of the scalp, the parietal lobe [21], and the above-mentioned six channels read data from this same area. Hence the electrode positions surrounding the area mentioned in [21] as having stronger P300 signals were selected based on our experimental observation.
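As an illustrative sketch (our own, not the authors' code), the single-trial epoch extraction implied by these parameters can be written as follows; the names and the placeholder recordings are assumptions for illustration:

```python
# Illustrative sketch: extracting a single-trial epoch from one channel.
# At 240 Hz, a 0-666.67 ms post-stimulus window corresponds to 160 samples.
FS = 240                                    # sampling rate in Hz
EPOCH_MS = 666.67                           # post-stimulus window in milliseconds
N_SAMPLES = round(FS * EPOCH_MS / 1000)     # 160 samples per epoch

def extract_epoch(channel_data, onset_index, n_samples=N_SAMPLES):
    """Slice one post-stimulus epoch from a single channel's sample stream."""
    return channel_data[onset_index:onset_index + n_samples]

# Example: six channels (Fz, Pz, P1, P2, POz, Oz), one stimulus onset
channels = [list(range(1000)) for _ in range(6)]   # placeholder recordings
epochs = [extract_epoch(ch, onset_index=240) for ch in channels]
```

Each of the six channels then contributes one 160-sample segment per intensification to the feature extraction stage.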

Proposed Methodology
The flow of the algorithm is shown in Figure 3. Most existing works use averaged data from a large number of trials in order to suppress noise and intensify the P300 feature present in the data, thereby improving accuracy. But the consequence of averaging is large time consumption, which decreases the speed of the system. Hence we propose a P300 signal detection algorithm based on a single trial. KPCA and SVM are used together in our method, which is uncommon in P300 detection. The proposed method uses only six electrodes for signal acquisition and processing, thereby proving to be cost effective.

Feature Extraction
One of the most suitable methods for feature extraction of the P300 signal is the wavelet transform. The wavelet transform, having a multi-resolution property, contains frequency as well as time domain information. As P300 signals are non-stationary and random in nature, we have selected the wavelet transform as the technique to extract features of the signal. The wavelet coefficients of each data sample are obtained from the wavelet-decomposed signal.
Here D(i) denotes the detail coefficients and A(i) the approximation coefficients of the decomposition. Among these, the detail coefficients (D) were used for the feature vector. The reason behind this is that the detail coefficients correspond to the sharp features which can more distinctly classify the signal as P300 or non-P300, unlike the approximation coefficients.

(Figure 3 pipeline: the signal X(n) is acquired in the duration 0 ms to 666.67 ms after the stimulus from all 6 channels; DWT is applied; the coefficients are divided into 5 parts and the maxima and minima from each are taken as the feature vector; the feature vectors from all 6 channels are concatenated; KPCA is used for reduction of the feature dimension; SVM is used for classification.)
Previous studies suggested the Daubechies wavelet 'db4' at level 2 as the most suitable wavelet for EEG signals, so it was used in the algorithm. A new technique for feature extraction is applied instead of using the wavelet coefficients directly. The wavelet coefficients (D) obtained from each signal data segment (160 samples) are divided into 5 groups (Si) of 32 samples each.
A maximum value max(Si) and a minimum value min(Si) are obtained from each group. The obtained feature vector thus consists of the maximum and minimum values (2 values per group) of each signal data segment.
The feature vectors from each of the 6 channels are concatenated to form a new feature vector F (of dimension 60, i.e. 5 groups x 2 values x 6 channels).
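The grouping step above can be sketched as follows (our illustration, not the authors' code); the detail coefficients are assumed to have already been obtained from a level-2 'db4' decomposition (e.g. via PyWavelets' pywt.wavedec), and the placeholder values stand in for real coefficients:

```python
def group_minmax_features(detail_coeffs, n_groups=5):
    """Split one channel's detail coefficients into equal groups and take
    the maximum and minimum of each group (2 values per group)."""
    group_len = len(detail_coeffs) // n_groups
    feats = []
    for i in range(n_groups):
        group = detail_coeffs[i * group_len:(i + 1) * group_len]
        feats.append(max(group))
        feats.append(min(group))
    return feats

def concatenated_features(all_channel_coeffs):
    """Concatenate the per-channel features into one vector F."""
    F = []
    for coeffs in all_channel_coeffs:
        F.extend(group_minmax_features(coeffs))
    return F

# Example with placeholder coefficients: 6 channels x 160 values each
coeffs = [[float((7 * k + c) % 13) for k in range(160)] for c in range(6)]
F = concatenated_features(coeffs)   # 5 groups x 2 values x 6 channels = 60
```

Each channel contributes 10 values (5 maxima and 5 minima), so the concatenated vector F has 60 entries.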

Feature Reduction
Principal component analysis (PCA) is one of the most widely used statistical techniques for dimensionality reduction and feature extraction. It linearly projects the data samples from a large number of correlated variables into a smaller number of uncorrelated variables. The obtained uncorrelated variables are known as principal components. The largest possible variance is associated with the first principal component, and the variance of the succeeding components keeps decreasing while maintaining orthogonality to the preceding components [13]. The data dimension is reduced by removing some of these components from the signal. Most high-dimensional datasets are observed to be non-linear in nature, and in such cases PCA cannot model the variability of the data. To address such problems, a non-linear dimensionality reduction technique known as KPCA was designed. KPCA has all the advantages of regular PCA: it gives a better recognition rate and improved performance compared with its linear counterpart, and it provides an implicit non-linear mapping to a feature space where the features representing the structure in the data may be extracted better. KPCA performs data compression by reducing the data dimension while preserving the information. Data compression can be achieved using the KPCA process as follows:

Step 1: A kernel mapping K(x, y) is chosen.
Step 2: The input data of dimension N are mapped into some non-linear feature space Φ where their mean is zero.
Step 3: The covariance matrix is obtained using the following formula:

C = (1/N) Σ_{j=1..N} Φ(x_j) Φ(x_j)^T

Step 4: The normalized kernel matrix is calculated:

K~ = K - A_N K - K A_N + A_N K A_N

where A_N is a matrix with all elements equal to 1/N.
Step 5: The eigenvectors α_i of the covariance matrix are obtained by solving the equivalent kernel eigenvalue problem:

K~ α_i = N λ_i α_i

where λ_i are the eigenvalues.
Step 6: The data with reduced dimension r are given by projecting onto the first r eigenvectors:

y_k = Σ_{j=1..N} α_k(j) K(x_j, x), k = 1, ..., r

In this method the obtained feature dimension is very large. This is likely to cause problems for the SVM classifier, as the detection accuracy decreases when the feature vector dimension nears the number of testing samples. So applying the feature reduction technique KPCA to the obtained feature vector improves the accuracy. The KPCA process also enhances the contrast between the two classes of signals, namely P300 and non-P300, thus improving the performance of the method. KPCA with a non-linear kernel function is applied to each feature vector F to form a new feature vector F' of reduced dimension six. Hence the final dimensions of the P300 signal dataset and the non-P300 signal dataset used are 2550x6 and 2465x6 respectively (each reduced dataset being of size Zx6, where Z is the total number of signal data).
KPCA works only on two-dimensional data. For offline processing, the available P300 and non-P300 data are arranged in respective rows to form two-dimensional arrays. The testing data (P300 or non-P300) are appended to the corresponding two-dimensional array (P300 or non-P300) before applying the KPCA feature reduction process. For online processing, the class of the input data is unknown, so a two-dimensional array is made out of the one-dimensional input data by using the equation provided in [25].
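A minimal sketch of the KPCA reduction step is given below (our illustration, not the authors' code). An RBF kernel and the parameter gamma are assumptions for the demonstration, since the paper's exact kernel is not reproduced here:

```python
import numpy as np

def kpca_reduce(X, r, gamma=0.1):
    """Reduce the rows of X (a Z x d feature matrix) to r dimensions with KPCA."""
    # Step 1: kernel matrix K(x_i, x_j); an RBF kernel is assumed here.
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    # Step 4: centre the kernel matrix (A_N has all elements 1/N).
    N = K.shape[0]
    A = np.full((N, N), 1.0 / N)
    Kc = K - A @ K - K @ A + A @ K @ A
    # Step 5: eigen-decomposition of the centred kernel matrix.
    vals, vecs = np.linalg.eigh(Kc)          # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:r]         # keep the top r eigenvalues
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    # Step 6: project the training data onto the first r components.
    return Kc @ alphas

# Example: 100 feature vectors of dimension 60 reduced to dimension 6
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 60))
X_reduced = kpca_reduce(X, r=6)
```

The eigenvector normalization by the square root of the eigenvalue corresponds to requiring unit-norm components in the mapped feature space.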

Classification of Signal
Studies show the SVM as one of the most suitable classifiers for P300 signal detection [26]. The SVM is a binary classifier, which is used in the proposed method to classify a signal as P300 or non-P300. Different kernel functions like 'Gaussian', 'rbf', 'poly' etc. were applied to the non-linear SVM using the same dataset [23], out of which the 'rbf' kernel gave the best performance; hence the 'rbf' kernel function is used in this paper. The RBF kernel function is defined as [27]:

K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))

where σ is the corresponding parameter, whose optimum value was obtained by performance assessment and systematic variation. The optimized value of σ was found to be 0.8. The default value of 1 was used for the regularization parameter c, which gave optimum performance. Various experiments were carried out to obtain optimum values for σ and c using the same dataset [23]. The dataset consisting of 2550 P300 signal data and 2465 non-P300 signal data was divided randomly into two equal parts each. The first halves of the P300 and non-P300 signal data were used for training and the second halves for testing. After the initial training phase of the algorithm, the classification of the testing dataset can be performed efficiently.
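The RBF kernel above can be written in a few lines (our illustration; sigma = 0.8 is the value reported in the text, and the example vectors are made up):

```python
import math

def rbf_kernel(x, y, sigma=0.8):
    """RBF kernel: K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Basic properties: K(x, x) = 1, and symmetry K(x, y) = K(y, x)
k_self = rbf_kernel([1.0, 2.0], [1.0, 2.0])
k_sym = rbf_kernel([0.0, 1.0], [1.0, 0.0]) == rbf_kernel([1.0, 0.0], [0.0, 1.0])
```

In practice the kernel is evaluated by the SVM on every pair of support vector and input feature vector, so a smaller reduced feature dimension directly speeds up classification.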

Results and Discussions
The dataset containing the P300 and non-P300 signal data was divided randomly into two equal parts each for training and testing, as mentioned above. This ensures that in each testing trial the signal data present in the training and testing datasets are different. The average values of the performance parameters over 10 such training and testing sets were taken as the result. The accuracy obtained was 98.62% for Subject S1 and 99.22% for Subject S2 for the proposed method. The accuracy increased with the reduction in feature dimension by applying KPCA, and the maximum accuracy was obtained at a reduced feature dimension of six for both subjects. The proposed algorithm was also tested with various other features; however, the best results were obtained using the proposed features. The results obtained from the proposed method have been compared with those reported in the literature as well as with other methods which we have experimented with. These comparisons show the advantages of the proposed method over existing ones.

Signal Analysis
The processing of the algorithm and various technical computations were done using MATLAB 2018, run on an Intel Core i5 processor. Many variations of the features were experimented with in order to reach the optimum solution with respect to maximum accuracy. The time domain plots of the detected P300 and non-P300 signals are shown in Fig.4 and Fig.5 respectively. These signals were obtained from the Pz electrode in a single trial. The extracted signal from the beginning of the stimulus up to 666.67 ms has been plotted in Fig.6. The positive deflection peak in the P300 signal plot is clearly distinguishable compared to the non-P300 signal plot. The peak of the P300 is observed at 560.3 ms after the visual stimulus. In practice the P300 signal usually appears anywhere between 300 ms and 700 ms and is clearly distinguished from the normal signal by a positive peak like the one observed in Fig.4.

Changes in the Detection Accuracy of P300 Signal with Feature Dimensions
During the experiments it was observed that there was a larger increase in accuracy when the feature reduction technique KPCA was applied than when the features were used directly. Table 1 shows the variation in accuracy with changes in the reduced dimension of the feature vector. Many variations of the feature vector dimension were experimented with, and it was observed that the maximum accuracy was achieved with a reduced feature vector dimension of six for both subjects S1 and S2, as shown in Fig.6 (changes in the accuracy of P300 detection with the dimension of the reduced feature vector). Note that the values given in Table 1 are the average values obtained from 10 successive trials, after randomizing the testing and training datasets before each trial as mentioned in the previous section.

Changes in the Accuracy of P300 Signal Detection with Different Features
Different types of features were extracted and classified using the SVM classifier. Table 2 compares the performance of the different features in detecting the P300 signal with respect to accuracy. Wavelet features were the least effective in detecting the P300 signal, while EMD features and peak specifications performed comparatively better. This is also evident from Fig.7, where the modified wavelet features used in the proposed method clearly dominate the other feature vectors in performance.

Performance Parameter Evaluation
The performance of the classifier is evaluated based on various statistical parameters:

Precision: the percentage of correct positive predictions (P300) out of the total positive predictions, calculated as Precision = TP / (TP + FP) x 100.

Sensitivity: the percentage of correct positive predictions out of the positively labelled (P300) instances, given by Sensitivity = TP / (TP + FN) x 100.

Accuracy: the percentage of total correct predictions (including both P300 and non-P300), calculated as Accuracy = (TP + TN) / (TP + TN + FP + FN) x 100.

Specificity: the percentage of correct negative predictions out of the negatively labelled (non-P300) instances, calculated as Specificity = TN / (TN + FP) x 100.

where: TP (true positive) is the number of P300 signals that are correctly detected; FP (false positive) is the number of non-P300 signals that are classified as P300; TN (true negative) is the number of non-P300 signals that are classified as non-P300; and FN (false negative) is the number of P300 signals that are classified as non-P300.
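These four parameters follow directly from the confusion matrix counts; a small sketch (our illustration, using made-up counts rather than the paper's confusion matrix):

```python
def classifier_metrics(tp, fp, tn, fn):
    """Compute precision, sensitivity, specificity and accuracy (in percent)
    from confusion matrix counts."""
    return {
        "precision": 100.0 * tp / (tp + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
    }

# Example with hypothetical counts: 40 true P300, 10 false alarms,
# 45 true non-P300, 5 missed P300
m = classifier_metrics(tp=40, fp=10, tn=45, fn=5)
```

For these counts, precision is 40/50 = 80.0% and accuracy is 85/100 = 85.0%.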
The performance parameter evaluation of the proposed method is given in Table 3. This is obtained based on the Confusion matrix which consists of parameters like TP, FP, TN and FN.

Comparison of performance parameters w.r.t. the various feature extraction techniques
Various feature extraction techniques are compared using the above-mentioned four performance parameters. It can be observed from Table 4 that the proposed method gives superior performance compared with the other feature extraction techniques: on average, approximately 4.18% improvement in precision, 24.70% in sensitivity, 2.9% in specificity and 13.8% in accuracy is observed across both subjects S1 and S2 compared with the other methods.

Table 4. Performance comparison of different feature extraction techniques

Comparison with existing methods
A comparison of the accuracy of the proposed method with different existing methods is shown in Table 5. All the methods considered for comparison used the same dataset provided by BCI Competition III, which is also used by the proposed method; hence the same benchmark has been maintained for comparing the various methods. The proposed method also uses the same parameters for both subjects, unlike the best existing method, where subject-specific parameters were used. Yet the proposed method has managed to achieve high accuracy. The winners of BCI Competition III, who used the same dataset, managed an accuracy of 96.5% with 15 trials and 73.5% with 5 trials. The proposed method achieved a much higher accuracy of 98.92% on average over both subjects with only a single trial.
Biomedical signals are normally very weak signals corrupted by noise and artifacts. The proposed algorithm proves very effective since it takes these factors into account and incorporates the necessary steps to enhance the signal quality. The pre-processing step removes most of the artifacts and noise content in the signal. The wavelet transform is renowned for its performance on non-stationary signals, and hence is very effective for feature extraction from the non-stationary P300 signal as well. KPCA produces the significant principal components of the features, which enables efficient classification of the signal. The SVM, which has proved to be one of the best classifiers in BCI systems, contributes to the high performance of the method.

Conclusions and Future Work
A method which makes use of modified wavelet coefficients along with KPCA and a non-linear SVM classifier has been proposed for the detection of the P300 signal. The proposed method uses the signal obtained from a single trial only, making the algorithm computationally fast. The developed method makes use of data acquired from only six electrodes, making the system cost effective, since readily available headsets with fewer electrodes, such as the Emotiv EPOC, are sufficient for signal acquisition, unlike 64-electrode headsets which are expensive and less comfortable to wear. Thus, an efficient method which is well optimized with respect to parameters like the number of electrodes, accuracy and number of trials has been developed. One limitation is that, due to the limitation of the available standard dataset, the data considered here were collected from only two subjects. However, the dataset used in this paper is a standard one which has been used by many researchers in developing P300 detection algorithms, so the same benchmark is defined for comparing the proposed method with all the existing methods. In future work we would like to test our algorithm on a larger number of subjects. Further work will also include the development of efficient BCI applications based on the proposed algorithm with higher data rate and efficiency.