A deep neural network approach for P300 detection-based BCI using single-channel EEG scalogram images

Brain–computer interfaces (BCIs) acquire electroencephalogram (EEG) signals and interpret them into commands, helping people with severe motor disabilities communicate. The goal of a BCI is to provide a prototype that supports disabled people in recovering relevant functions. Various studies in the literature have pursued a superior design using multi-channel EEG signals. This paper proposes a novel framework for an automatic P300 detection-based BCI model using a single EEG electrode. In the present study, we introduce a denoising approach using a bandpass filter, followed by transformation into scalogram images using the continuous wavelet transform. The derived images were trained and validated using a deep neural network based on the transfer learning approach. The resulting BCI model delivers higher performance in terms of classification accuracy and bitrate for disabled subjects using a single-channel EEG signal. The proposed P300-based BCI model achieves average information transfer rates of 13.23 to 26.48 bits/min for disabled subjects. The classification performance shows that a deep network based on the transfer learning approach can offer performance comparable with other state-of-the-art methods.


Introduction
A BCI is a primary communication medium that measures human brain activity to communicate with the outside environment. BCIs collect electroencephalogram (EEG) signals, analyze them, and interpret them into useful data that external devices can act upon [1]. The purpose of a BCI is to implement a model that helps disabled people reclaim valuable functions. A study based on BCI includes the estimation of brain activity, pattern analysis (based on brain activity), development of an adaptive algorithm (for interpreting brain signals for the system), and development and evaluation of a brain–machine interface system (for disabled subjects) [2]. This paper presents a non-invasive method for disabled people that analyzes brain activity using single-channel EEG signals.
Generally, a BCI can be divided into four stages of signal processing: (1) signal acquisition using a transducer, (2) preprocessing of the raw input signals, (3) classification, i.e., assigning labels to the various stimuli, and (4) transferring the predicted classes to external devices following a distinct protocol linked with all the devices. The classification of EEG involves feature extraction and the interpretation of these signals into computer instructions. A BCI model using single-channel EEG signals depends on stimuli, detection response, motor events, and slow cortical potentials.
Most studies analyzing particular EEG signals employ a pattern recognition approach. Machine learning outperforms other conventional techniques, as reported in the literature [3][4][5]. Moreover, neuroscience provides information and experience about signal acquisition and enables artificial intelligence (AI) techniques after analyzing brain signal variability. Machine learning algorithms based on neural networks [6], k-nearest neighbors (KNN) [7], hidden Markov models (HMM) [8], and support vector machines (SVM) [7] have been employed for EEG classification. Hiraiwa et al. [9] introduced a machine learning technique based on the backpropagation algorithm and proved that machine learning can be employed for EEG classification and for designing brain–computer interface models for a single channel. Liu et al. [10] developed a machine learning BCI model (detection of P300 ERP signals) based on a 1D convolutional capsule network. Two classifiers based on the convolutional capsule network were used to categorize EEG data using 64 and 8 electrodes, attaining average classification accuracies of 84.7% and 82.1%, respectively. Shukla et al. [11] proposed a CNN-based P300 speller for BCI. The study used 16 EEG electrodes acquired from 30 targets (9 subjects) and achieved a classification accuracy of 90% and a bitrate of 22.3 bits/min. Kundu and Ari [12] proposed a BCI model for character recognition using the P300 signal based on a deep learning architecture; employing deep features, it achieved an average classification accuracy of 98%.
Modern BCI models use a support vector machine (SVM) to predict P300 outcomes. Using the MNIST database, the authors in [13] achieved the best classification performance for identifying characters using convolutional neural network (CNN) models based on gradient learning approaches. The present approaches, however, have a number of drawbacks in terms of computational complexity, classification accuracy for prediction of target trials, user friendliness, and overall model cost. These limitations have motivated us to design a low-cost, user-friendly model with lower complexity using a single EEG electrode. The objective of the proposed method is to validate a novel P300-based BCI model with improved classification accuracy and higher bitrate. The study also aims at a high-speed, cost-effective, and energy-efficient model that is compatible with the traditional model. The paper is organized as follows: Section 2 presents details of the P300 signal and the database. Section 3 explains the overall system design, comprising preprocessing, the convolutional neural network, and data training. The results and discussion are elaborated in Sections 4 and 5, respectively.

The P300 component of the EEG signal
The occipital lobe principally holds the regions related to vision-related tasks [14]. Hence, in this study, we introduce a deep learning technique based on 2-D time-frequency features to efficiently predict the P300 from a single channel (Pz electrode) without the need for extensive training. The P300 is a positive deflection that appears about 0.3 s after a distinct task-relevant object is displayed in the human EEG [15]. The authors in [13] were the first to use the P300 as a signal in a brain–computer interface. They demonstrated the P300 model with locked-in patients spelling words by selecting letters of the 26-letter alphabet along with some symbols. Sellers and Donchin [13] evaluated a BCI model that works by identifying a P300 elicited by task-driven stimuli. They used a four-choice paradigm, experimented with their model on three subjects, and showed that a P300-based BCI model can help patients suffering from ALS.
Birbaumer et al. [16] were the first to employ the EEG signal to analyze disabled subjects. They reported that a BCI helps disabled users diagnosed with amyotrophic lateral sclerosis (ALS) to command spelling equipment and communicate with the outside world. Pfurtscheller and Neuper [17] designed a BCI model to analyze brain activity associated with motor imagery as a control signal. Kubler et al. [18] showed that a BCI system can assist disabled users (ALS patients) by controlling motor imagery using wired EEG. However, the system had a slow response and required many months of training, with prolonged communication times. Hill et al. [19] examined able-bodied and disabled patients using a brain–computer interface. They found that signals from healthy subjects could be classified efficiently, while prediction failed for disabled patients under similar methods. The contradictory conclusions of Kubler and Hill were due to their diverse training methods: Kubler et al. employed different training sessions, while Hill et al. employed long individual training sessions. The proposed BCI method based on a single EEG electrode can achieve a high-speed, user-friendly, and energy-efficient model based on data-driven time-frequency scalogram features, overcoming the limitations faced by existing methods.

Datasets
The dataset consists of nine subjects; five are disabled, and the remaining four are healthy. A detailed description of the datasets can be accessed from Hoffmann et al. [1] at https://www.epfl.ch/labs/mmspg/research/page-58317-en-html/bci-2/bci_datasets/. A Biosemi ActiveTwo system was used for EEG recording. The subjects observed six different objects on a computer screen, displayed in arbitrary order. Each image lasted 0.1 s followed by a 0.3 s gap, i.e., the interstimulus interval (ISI) was 0.4 s. The EEG was recorded at a sampling rate of 2.048 kHz from 34 electrodes. Figure 1 gives a detailed description of the EEG recording using six different images randomly flashed every 0.4 s. Four sessions were arranged for all the subjects: the first two sessions were conducted on one day and the remaining two on the next day. Every session consists of six runs, each corresponding to one object (image). The following protocol was implemented in each run. (1) Counting the number of times the target image flashes. (2) Displaying all six images with an alarm tone. (3) A random order of flashes started 4 s after the alarm tone, with simultaneous recording of the EEG signal. A random number between 20 and 25 was selected as the number of blocks, implying that one run consists of 22.5 P300 trials and 112.5 non-P300 trials on average. (4) In the last three sessions, target data was inferred from the EEG signal using a machine learning technique. (5) At the end of each run, subjects were asked for the counted sequence. The duration of an individual run was about 60 s, and the duration of each session was roughly 30 min (including both experimental setup and breaks).
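The average trial counts above follow directly from the block-size range; a minimal sketch of the arithmetic, under the assumption (implied by the protocol) that each block flashes all six images once:

```python
# Average trial counts per run. Assumption: each block flashes all six
# images once, and the number of blocks per run is drawn uniformly
# from 20-25 (average 22.5), as described in the text.
N_IMAGES = 6

def average_trials(block_min=20, block_max=25):
    mean_blocks = (block_min + block_max) / 2        # 22.5 blocks per run
    p300 = mean_blocks                               # one target flash per block
    non_p300 = mean_blocks * (N_IMAGES - 1)          # five non-target flashes
    return p300, non_p300

print(average_trials())  # (22.5, 112.5)
```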

Overall system design
The raw EEG signals are corrupted by biological noises such as respiration, electrodermal response, eye movement, and muscular movement [20]. Preprocessing was performed by applying a 6th-order forward-backward Butterworth bandpass filter. The filtered EEG signals were converted into scalogram images using the continuous wavelet transform (CWT). Finally, the images were trained and validated using a convolutional neural network (CNN) based on the deep learning method. In this study, we used a directed acyclic graph (DAG) network based on AlexNet. DAG networks are complex and usually include two or more parallel layers. Although deep networks are pretrained to predict other targets, they can be adjusted or remodeled for a new task via the transfer learning approach by tuning the required parameters. Several researchers have employed DAG networks such as GoogLeNet [21], SqueezeNet [22], and ResNet [23] to estimate classification performance and assess their applicability. Figure 2 shows the block diagram of the proposed model.

Preprocessing
Information about visual function corresponds to the parietal lobe of the human brain [14]. Hence, this study employed a single channel, the Pz electrode (13th channel in the database), for the analysis. The raw EEG signals are affected by biological noises such as eye movements, muscular movements, electrodermal response, and respiration, so a preprocessing step was applied to remove the undesired noises and artifacts. As most biological noises occupy low-frequency ranges (below 12 Hz), we employed a 6th-order forward-backward Butterworth bandpass filter with cutoff frequencies of 1 Hz and 12 Hz, as in [1]. Figure 3 illustrates the removal of biological noises. After denoising the EEG signals, we transformed the normalized EEG signal (Eq. 1) into scalogram images using the CWT method. EEG signals are slowly varying functions characterized by sudden transients at several scales; the CWT gives efficient time localization for high-frequency, short-term events and excellent frequency localization over prolonged durations. Hence, scalogram images can deliver efficient time-frequency features. The CWT transforms a 1-D EEG signal into a single 2-D image that holds time, frequency, and amplitude, using a filter bank approach. The filter bank defines the parameters applied to the input data; in this paper, we used the default filter-bank parameters. We applied the Morse wavelet, as employed in [24], because it provides excellent time-frequency localization for EEG signals, and set the number of voices per octave to 10. Figure 4 shows example scalogram images for the six classes derived from subject 6. The preprocessed images are used to train and validate the model using a deep network based on a deep learning algorithm.
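The denoising and scalogram steps can be sketched in Python as a minimal stand-in for the paper's pipeline. Note the assumptions: `scipy.signal.sosfiltfilt` supplies the forward-backward (zero-phase) filtering, and a hand-rolled complex Morlet wavelet with a fixed 1-s support substitutes for the Morse wavelet and default filter bank used in the paper:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 2048  # dataset sampling rate in Hz

def bandpass_eeg(x, low=1.0, high=12.0, fs=FS, order=6):
    """6th-order Butterworth bandpass applied forward-backward (zero phase)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def scalogram(x, fs=FS, freqs=np.arange(1, 13), w=6.0):
    """CWT magnitude via convolution with complex Morlet wavelets.
    Illustrative stand-in for the Morse wavelet; rows = frequencies.
    The input must be at least as long as the 1-s wavelet support."""
    t = np.arange(-0.5, 0.5, 1 / fs)                  # 1-s wavelet support
    out = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        s = w / (2 * np.pi * f)                       # Gaussian envelope width
        wav = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2 * s ** 2))
        out[i] = np.abs(np.convolve(x, wav, mode="same"))
    return out
```

Filtering a tone outside the 1–12 Hz passband (e.g., 50 Hz line noise) should strongly attenuate it, while `scalogram` returns one row per analysis frequency.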
CNN-based deep learning is a technique employed for feature recognition and classification. This paper used a pre-trained CNN model that requires 2-D data as input; hence, the 1-D EEG signal had to be transformed into data compatible with the trained network. The continuous wavelet transform provides suitable input images that deliver efficient performance during training with a deep learning algorithm. The scalogram images encode amplitude, time, and scale; this amplitude-time information at varying scales serves as the feature for analyzing the BCI model. The generated scalogram images are rescaled and normalized before training, according to the input specification of the network. Since this paper employs a pre-trained AlexNet model to analyze the efficiency of the P300 model, the input scalogram images were resized to 224×224 pixels before training.
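The rescaling step can be sketched with numpy only; nearest-neighbor resizing here is an illustrative stand-in for whatever interpolation the toolbox actually uses, and stacking three identical channels mimics the RGB input the pretrained network expects:

```python
import numpy as np

def to_input_image(scal, size=(224, 224)):
    """Normalize a scalogram to [0, 255] and resize it (nearest neighbor)
    to the network input size, returning a (224, 224, 3) uint8 image."""
    lo, hi = scal.min(), scal.max()
    img = (255 * (scal - lo) / (hi - lo + 1e-12)).astype(np.uint8)
    rows = (np.arange(size[0]) * img.shape[0] / size[0]).astype(int)
    cols = (np.arange(size[1]) * img.shape[1] / size[1]).astype(int)
    resized = img[rows][:, cols]
    return np.stack([resized] * 3, axis=-1)
```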

Convolutional neural network
A convolutional neural network is a multi-layer network based on a deep learning algorithm. It includes three fundamental layer types: convolution layers, pooling layers, and fully-connected layers, and is mainly used to analyze image data. A convolution operation is conducted in the convolutional layers using a kernel filter; a convolutional layer generates low-level features, or an activation map. Pooling layers serve to reduce the dimension of the low-level or high-level features. The fully-connected layers provide a decision by performing high-level logical operations. Figure 5 represents the architecture of the typical AlexNet model.
Fig. 3 Removal of biological noises from subject 1 (Pz electrode from session 1). a Trending and noisy EEG signal. b Preprocessed EEG signal after applying the filtering and normalization method.
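The convolution and pooling operations described above can be illustrated with a short numpy sketch (a toy illustration of the operations, not the AlexNet implementation itself):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as used in CNN layers)."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; each spatial dimension shrinks by `size`."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
print(conv2d(image, np.ones((3, 3))))  # 2x2 map of local 3x3 sums
print(max_pool(image))                 # [[ 5.  7.] [13. 15.]]
```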

Data training
Deep networks require a large number of samples for learning, whereas traditional approaches require fewer samples. A pre-trained AlexNet model based on a transfer learning approach is adopted to overcome this difficulty. Transfer learning helps in adapting to different tasks and limits the learning/training time. CNN models were originally designed to analyze 2-D images; we therefore adopt 2-D scalogram images, obtained with the CWT approach, to fit the model based on the deep learning algorithm. This study introduced a pre-trained AlexNet model to predict the classification performance of the BCI model. The final layer (fully-connected/output layer) of the training model was modified to make the network compatible with the input EEG samples and classes. We trained the model using two different learning rates (1e−3 and 1e−4) to estimate the classification performance. Training/learning uses 80% of the trial data (target and non-target trials), and validation/testing uses the remaining 20% (target and non-target). For example, subject 1 consists of 3240 trials (540 target and 2700 non-target); hence, the training/learning set consists of 2592 trials (432 target and 2160 non-target), and the remaining 648 trials (108 target and 540 non-target) are used for validation/testing. We distributed the training into 11 epochs comprising 250 iterations. The mini-batch size parameter, which defines the number of samples processed per iteration, was set to 10.
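The 80/20 split counts quoted above for subject 1 can be verified with a few lines (the helper name is illustrative, not from the paper):

```python
def split_counts(targets=540, non_targets=2700, train_frac=0.8):
    """Stratified 80/20 split of subject 1's target and non-target trials."""
    n_tr_t = round(targets * train_frac)             # training targets
    n_tr_n = round(non_targets * train_frac)         # training non-targets
    train = (n_tr_t, n_tr_n)                         # (432, 2160) -> 2592 trials
    test = (targets - n_tr_t, non_targets - n_tr_n)  # (108, 540)  ->  648 trials
    return train, test

print(split_counts())  # ((432, 2160), (108, 540))
```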

Results
The classification accuracy (based on two learning rates) and the average bitrate for the disabled and able-bodied subjects using a single-channel electrode are plotted in Figs. 6 and 7, respectively. We computed the bitrates using the definition employed by Wolpaw et al. [2]. Table 1 lists the average bitrates and average classification accuracy for all possible cases over the two learning rates. Subject 5 was missing from the database and was hence excluded from the study. The average classification accuracy for disabled subjects lies in the range from 92.69 to 96.32%, whereas the average classification accuracy for healthy subjects lies between 93.5 and 100%. The classification accuracy for subject 8 is 100%. The classification performance of disabled and healthy subjects based on maximum classification accuracy is comparable. We used the bitrate to analyze the classification performance for disabled and able-bodied subjects; healthy subjects achieved a higher average bitrate than disabled patients.
Fig. 6 Accuracy and bitrate plot (average) vs. epoch for disabled subjects. The plot displays the classification accuracy using two different learning rates based on a single-channel EEG electrode, and it also presents the average bitrate for the two learning rates for disabled subjects (Subject 1 to Subject 4).
The performance plots for the combinations of all disabled, all healthy, and all disabled plus able-bodied subjects are illustrated in Figs. 8 and 9, respectively. The 1e−4 learning rate achieved higher classification accuracy than the 1e−3 learning rate. For the combination of disabled or healthy subjects, the bitrate for the disabled is 15.409 ± 8.09 bits per minute (mean ± standard deviation) and for healthy subjects is 24.50 ± 16.15 bits per minute. The classification accuracy for the combination of all subjects is 95.70%, with a bitrate of 22.22 ± 14.76 bits per minute.
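The bitrates above follow the Wolpaw definition [2]; a minimal sketch, with the selection rate left as a parameter since it depends on paradigm timing not fully specified here:

```python
import math

def wolpaw_bits_per_selection(n_classes, p):
    """Wolpaw ITR per selection:
    B = log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1)),
    with the 0*log2(0) terms taken as 0 at P = 0 or P = 1."""
    b = math.log2(n_classes)
    if p > 0:
        b += p * math.log2(p)
    if p < 1:
        b += (1 - p) * math.log2((1 - p) / (n_classes - 1))
    return b

def bits_per_minute(n_classes, p, selections_per_min):
    return wolpaw_bits_per_selection(n_classes, p) * selections_per_min

# Perfect accuracy on the six-image paradigm yields log2(6) ~ 2.58 bits
# per selection; chance-level accuracy (1/6) yields 0 bits.
```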

Discussion
The question is whether the nature of the classification contributes to the practical application of a BCI, and it is necessary to define the limits of the proposed BCI model. The main limitation is that the proposed deep network requires longer training/learning time compared with state-of-the-art methods, but the model is fast once effectively trained. Classification with limited datasets already provides high-grade results, but a sufficient dataset further enhances the classification performance. The information transfer rate (bitrate) is an essential, time-dependent parameter in designing a brain–computer interface model. In the literature [25], the authors reported that a higher bitrate is achievable when training reaches at least six epochs.
Fig. 7 The plot displays the classification accuracy using two different learning rates based on a single-channel EEG electrode, and it also presents the average bitrate for the two learning rates for able-bodied subjects (Subject 6 to Subject 9).
To compare P300-based BCI models, different researchers use classification accuracy and bitrate as performance parameters. The authors in [26] achieved average classification accuracies of 72% for ALS patients and 85% for able-bodied subjects but did not report a bitrate. Cecotti and Graser [25] employed a CNN-based BCI model for detecting the P300 and achieved a best classification accuracy of 95.5%. Our proposed CNN model using 2-D EEG scalogram images achieved an average classification accuracy of 95.68% for healthy subjects and 94.69% for disabled patients. Piccione et al. [27] achieved a bitrate of 8 bits per minute for both disabled and healthy subjects, while our method achieved bitrates of 15.41 bits per minute for disabled patients and 24.50 bits per minute for healthy subjects.
Due to variations in paradigms, methods, and case problems, the classification performance (accuracy and bitrate) of the above two state-of-the-art methods cannot be compared directly with the proposed method. Our proposed method used a six-choice paradigm, while the authors in [26,27] used four-choice paradigms to analyze the P300-based BCI model. In this paper, the target stimulus occurred with a probability of about 16%, while in [26,27] it occurred with a probability of 25%. Cecotti and Graser estimated classification accuracy using limited datasets (two subjects), whereas our proposed method used eight different subjects to estimate the classification performance.
Hoffmann et al. [1] achieved 100% classification accuracy for all the disabled subjects, with bitrates between 8.8 and 24.7 bits/min, using all 32 electrodes, while the classification accuracy using other electrode configurations was less than 100%. For example, Hoffmann et al. analyzed subject 4 (a disabled subject) with a maximum classification accuracy of 69% and a bitrate of 3 bits/min using a single-channel EEG (Pz electrode), as shown in Fig. 10, while our method achieves considerably higher classification accuracy and bitrate using the same single electrode.

Conclusion
This paper presents a deep neural network-based BCI model for disabled subjects using the P300, but with a single electrode. The deep network model showed that time-frequency features extracted after denoising the EEG signals enhance the classification performance of the BCI model in terms of bitrate and accuracy. We have shown that a deep network based on transfer learning can achieve comparable classification accuracy with an efficient bitrate for disabled subjects. Our model achieved promising performance with minimal error. In the future, we plan to adopt a hardware-based model that can assist disabled users in concrete BCI applications. Additional work will focus on the relation between P300 detection and its effect on the image identification paradigm with respect to the number of epochs.