EEG Recording Device
Although a large body of research exists on human speech detection and recognition modeling, it remains challenging to acquire EEG signal data from the human brain with a wireless EEG device and to transmit it wirelessly to the computer interface on which a speech recognition model is built. Data acquisition from wireless devices is always difficult owing to inferior signal conditions. Wireless devices nevertheless offer benefits such as easy connection, easy data transmission, low price, and ease of mounting on the head.
In this research work, the Epoc Signal Server is used extensively to stream the raw EEG data into Simulink, where mathematical signal processing is performed for feature extraction and classification ranking. In addition, the Emotiv Control Panel is used to check the connectivity strength of the electrodes before recording and training begin. The Emotiv Testbench and the Emotiv Brain Activity Map are used for visual analysis alongside the Simulink-recorded data, providing a better strategy for analyzing the data efficiently.
The EEG device collects the data emitted by the cerebral cortex of the brain. It has 16 electrodes, which are placed according to the 10-20 system. The device communicates wirelessly with the laptop, so additional components are required to support its functions. The EEG device used is the Emotiv EPOC+ [12], as shown in Figure 3.
Data Acquisition
The whole system starts with the EEG device. There are 16 sensors on the EEG device, and their locations are fixed according to the 10-20 system. Two of the 16 sensors are used as reference points and are placed on the mastoid bones behind the ears. The sensors are located at Fp1, Fp2, F3, F4, F7, F8, C3, C4, T3, T4, P3, P4, T5, T6, O1, and O2. The data collected by each sensor is treated as a separate channel. The sensors measure the potential difference of the electrical signals fired by the neurons in the brain, in units of microvolts.
The mobile EEG device collects data at a sampling rate of 128 Hz. This sampling rate produces fewer samples per recording, so the computational power and time required to train and test the classifiers are reduced drastically. Each recorded activity contains 256 samples. Several variables and arrays were initialized, including the 14 selected EEG channels; these 14 channels were selected because the Emotiv EPOC device used in this research work reads data from 14 channels. The number of participants was then checked, and a loop opened all the raw EEG files so that the EEG signals could be analyzed channel by channel.
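The channel-selection step above can be sketched as follows (in Python rather than MATLAB, as an illustrative sketch only; the column positions of the 14 EEG channels inside the 25-column raw matrix are an assumption, since the raw file layout is not specified in the text):

```python
import numpy as np

FS = 128                  # sampling rate (Hz)
SAMPLES = 2 * FS          # 256 samples per 2-second recording
EEG_COLS = slice(3, 17)   # hypothetical positions of the 14 EEG columns
                          # inside the 25-column raw matrix

def select_eeg_channels(raw):
    """Keep only the 14 EEG channels from a 256 x 25 raw recording."""
    assert raw.shape == (SAMPLES, 25), "unexpected recording shape"
    return raw[:, EEG_COLS]           # -> shape (256, 14)
```

In practice this slicing would be applied inside the loop over each participant's raw EEG files before any channel-by-channel analysis.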
The computing device receives data from the EEG device over a Bluetooth connection. The received data has 25 channels, nine more than the sensor channels; these nine additional channels carry other data such as the timestamp, counter, marker signal, synchronization signal, and gyroscope values. A sample is collected every 0.0078125 s, based on the 128 Hz sampling rate, so at the end of a two-second recording 256 samples have been collected and tabulated in matrix form, giving a matrix of dimensions 256 × 25. A high-pass filter with a cut-off frequency of 5 Hz is then applied to reduce the effects of DC offset and to filter out low-frequency noise that may exist in the signal. The data is saved on the computing device in matrix format, allowing the files to be accessed in MATLAB during classifier training.
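The 5 Hz high-pass step can be sketched like this (a minimal Python sketch; the text specifies only the 5 Hz cut-off, so the 4th-order Butterworth design and zero-phase filtering are assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128        # sampling rate (Hz)
CUTOFF = 5.0    # high-pass cut-off frequency from the text (Hz)

def highpass(eeg, order=4):
    """Remove DC offset and low-frequency drift from each channel.

    `eeg` is a (samples, channels) array. The Butterworth order and
    the zero-phase filtfilt application are illustrative choices.
    """
    b, a = butter(order, CUTOFF / (FS / 2), btype="highpass")
    return filtfilt(b, a, eeg, axis=0)
```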
Data Acquisition Protocol
First, the participants were briefed on the experiment to be carried out and informed that the acquired data would be used purely for this research work, following the code-of-ethics guidelines of Anna University. A data acquisition protocol was developed and explained to the participants, all of whom agreed to the instructions for the recording. A total of 10 participants were involved, and each was tested separately with two different CNN models. The participants were asked to imagine 10 words in sequential order, paced by a stopwatch clock placed in front of them. Each imagined word was recorded for a duration of 3 s, followed by a 5 s gap of silence, so the 10 imagined words were recorded over a period of 75 s for each participant.
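The timing of this protocol can be checked with a short sketch (the word list is taken from the classifier outputs described later; the 3 s record / 5 s gap pattern, with no gap after the last word, reproduces the stated 75 s total):

```python
# Timing sketch of the acquisition protocol: 10 imagined words, each
# recorded for 3 s and followed by a 5 s silent gap (no gap after the last).
WORDS = ["left", "right", "up", "down", "front",
         "back", "stop", "pick", "red", "blue"]
RECORD_S, GAP_S = 3, 5

schedule = []
t = 0
for i, w in enumerate(WORDS):
    schedule.append((w, t, t + RECORD_S))          # (word, start, end) in seconds
    t += RECORD_S + (GAP_S if i < len(WORDS) - 1 else 0)

total = schedule[-1][2]                            # 75 s, matching the text
```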
Experiment
A Convolutional Neural Network (CNN) is used to predict the imagined word: the Continuous Wavelet Transform (CWT) converts the EEG signals into scalogram images, and a CNN is able to capture the time-related and spatial dependencies of an image when the relevant filters are applied. The method is shown in Figure 4 and explained in sequence as follows:
Pre-processing using Morlet Continuous Wavelet Transform
To pre-process the dataset and normalize the label values, the dataset size and format were first checked by loading the dataset. The CWT uses a window function that is shifted and scaled against the mother wavelet during the conversion. This allows windowing over a longer time interval at low frequencies and over a shorter time interval at high frequencies. Because the window size varies, the CWT provides a highly effective analysis of both the low- and high-frequency content of the non-stationary EEG signal. The spectral analysis was performed with the Morlet-wavelet CWT (MCWT), as it is well suited to non-stationary EEG signals.
The MCWT can be represented mathematically by equation 1, defined as
$${W}_{x}\left(s,{\tau }\right)=\frac{1}{\sqrt{s}}\int _{-{\infty }}^{+{\infty }}x\left(t\right){{\psi }}^{*}\left(\frac{t-{\tau }}{s}\right)dt \quad (1)$$
where
\({W}_{x}\left(s,{\tau }\right)\) = wavelet coefficient
\(x\left(t\right)\) = time-domain signal
\({{\psi }}^{*}\left(t\right)\) = complex conjugate of the wavelet function
\(s\) = scale parameter
\({\tau }\) = translation (position) parameter
The MCWT produced one scalogram per channel; since each trial has 14 channels, the 14 scalograms were combined into one image, and this combined image is used as the input to the CNN.
For example, the location [1, 1, 1:8064] represents the data of the first electrode in the first of the 40 trials, and the 8064 entries are the samples recorded, since the sampling frequency is 128 Hz and the data length is 63 seconds. After obtaining the data from one electrode, the MCWT is applied with a sampling frequency of 128 Hz to convert the data into a scalogram.
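The per-channel scalogram computation can be sketched with the PyWavelets Morlet CWT (a Python sketch; the number of scales is an illustrative choice, as the text does not state the scale range used):

```python
import numpy as np
import pywt

FS = 128  # sampling rate (Hz)

def morlet_scalogram(signal, n_scales=64):
    """Compute a Morlet-CWT scalogram (|coefficients|) for one channel.

    `signal` is a 1-D array of samples; the 64-scale range is assumed
    for illustration only.
    """
    scales = np.arange(1, n_scales + 1)
    coef, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / FS)
    return np.abs(coef)               # shape (n_scales, len(signal))
```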
The scalogram produced is shown in Figure 5; however, the generated image has a label and a white bar covering part of the image, which increases the training duration and reduces the accuracy of the CNN.
Finally, the 14 scalograms, which represent the 14 electrodes recording the EEG signal at the same instant, are combined into one image, as shown in Figure 6. This eases label matching and allows the CNN to learn the direct relationships and differences between the scalograms of the same instant when changes appear. The combined image is then saved into its designated folder.
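Combining the 14 per-channel scalograms into one image can be sketched as tiling them into a grid (the 2 × 7 layout is an assumption; the text only says the 14 scalograms are combined into a single image):

```python
import numpy as np

def combine_scalograms(scalograms, rows=2, cols=7):
    """Tile 14 per-channel scalograms into a single image.

    `scalograms` is a list of equally sized 2-D arrays; the 2 x 7 grid
    layout is an illustrative assumption.
    """
    assert len(scalograms) == rows * cols
    h, w = scalograms[0].shape
    grid = np.zeros((rows * h, cols * w))
    for i, s in enumerate(scalograms):
        r, c = divmod(i, cols)
        grid[r * h:(r + 1) * h, c * w:(c + 1) * w] = s
    return grid
```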
Normalisation
The dataset label values were normalized to 1 (High) and 0 (Low), where 0 indicates a value in the range 1-5 and 1 indicates a value in the range 6-9. This normalization is required so that the accuracy of the system can be increased by reducing the wide range of the parameter values.
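A minimal sketch of this label normalization, assuming a 1-9 rating scale with 6-9 treated as "High":

```python
def normalize_label(rating):
    """Map a 1-9 rating to a binary label: 0 (Low) for 1-5, 1 (High) for 6-9."""
    return 0 if rating <= 5 else 1
```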
Feature Extraction
Based on the literature, it was decided to extract 7 features for the first round of training the two models and then to reduce them to 4 features for the second round. The seven features extracted from each signal are:
1. Mean is defined as the average value of the frame, given as
$$\mu =\frac{1}{N}\sum _{n=1}^{N}x\left(n\right) \quad (2)$$
2. Standard deviation is defined as,
$$\sigma =\sqrt{\frac{1}{N-1}\sum _{n=1}^{N}{\left(x\left(n\right)-\mu \right)}^{2}} \quad (3)$$
3. Skewness is defined as the asymmetry of the distribution about its mean value and is given as,
$$s=\frac{1}{N{\sigma }^{3}}\sum _{n=1}^{N}{\left(x\left(n\right)-\mu \right)}^{3} \quad (4)$$
4. Kurtosis is defined as the 4th order central moment of the distribution in the given frame and is given as,
$$k=\frac{1}{N{\sigma }^{4}}\sum _{n=1}^{N}{\left(x\left(n\right)-\mu \right)}^{4} \quad (5)$$
5. Band power is defined as the average power of the signal and is given as,
$$P=\frac{1}{N}\sum _{n=1}^{N}{\left|x\left(n\right)\right|}^{2} \quad (6)$$
6. Root Mean Square is defined as,
$$RMS=\sqrt{\frac{1}{N}\sum _{n=1}^{N}{x\left(n\right)}^{2}} \quad (7)$$
7. Shannon entropy is defined as a measure of the spectral distribution of the signal, or the amount of information it carries, and is given as,
$$SE=-\sum _{n=1}^{N}x\left(n\right){log}_{2}x\left(n\right) \quad (8)$$
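The seven features above can be computed per frame with a short sketch (a Python sketch using SciPy; normalizing the squared amplitudes before the entropy computation is an implementation choice the text leaves open):

```python
import numpy as np
from scipy.stats import skew, kurtosis

def extract_features(x):
    """Compute the seven statistical features (equations 2-8) for one frame.

    `x` is a 1-D array of samples. SciPy's moment normalizations differ
    slightly from equations 4-5, so this is an approximate sketch.
    """
    mu = np.mean(x)                                   # eq. 2, mean
    sigma = np.std(x, ddof=1)                         # eq. 3, standard deviation
    s = skew(x)                                       # eq. 4, skewness
    k = kurtosis(x, fisher=False)                     # eq. 5, kurtosis
    power = np.mean(np.abs(x) ** 2)                   # eq. 6, average power
    rms = np.sqrt(np.mean(x ** 2))                    # eq. 7, root mean square
    p = np.abs(x) ** 2 / np.sum(np.abs(x) ** 2)       # normalized "probabilities"
    entropy = -np.sum(p * np.log2(p + 1e-12))         # eq. 8, Shannon entropy
    return np.array([mu, sigma, s, k, power, rms, entropy])
```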
Training of CNN
The CNN is trained using the pre-processed dataset and the normalized dataset labels. The CNN can be divided into two parts: the first is the feature-learning layer, which extracts features from the input; the second is the classification layer, where the extracted features are flattened into a column vector for the feed-forward neural network to perform training.
ReLU activation function
The Rectified Linear Unit (ReLU) was used because of its computational efficiency: it does not restrict the upper range of the activation, whose output spans 0 to infinity. It also helps avoid overfitting issues and long training durations. The ReLU function can be represented mathematically as shown in equation 9,
$$f\left(x\right)=max\left(0,x\right) \quad (9)$$
AlexNet is an 8-layer-deep convolutional neural network capable of classifying images into 1000 object categories, having been trained on more than a million images. Its fully connected layer is used to classify the input, and the number of classification outputs of both the fully connected layer and the output layer is 1000. However, the classification output required in this work is 10 categories.
The backpropagation method was used to adjust the weights and biases in every iteration over a series of epochs until the fully connected layer was able to perform the classification.
Finally, the output of the fully connected layer is sent to the softmax classification layer for the final classification into a label, and the trained CNN model is then used for testing. The number of softmax outputs equals the desired number of classes, which in this case is 10 (left, right, up, down, front, back, stop, pick, red, blue).
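The final softmax step can be sketched as follows (a NumPy sketch of the standard softmax; the label ordering is taken from the list above):

```python
import numpy as np

LABELS = ["left", "right", "up", "down", "front",
          "back", "stop", "pick", "red", "blue"]

def softmax_predict(logits):
    """Turn the fully connected layer's 10 logits into a predicted word."""
    z = logits - np.max(logits)            # subtract max for numerical stability
    probs = np.exp(z) / np.sum(np.exp(z))  # probabilities summing to 1
    return LABELS[int(np.argmax(probs))], probs
```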