A Persian Brain-Computer Interface Speller Based on P300 and Language Model Utilization


Background: Brain-computer interface (BCI) is a system in which complete control is exerted directly through brain signals. One of the most important applications of BCI is in spellers, which are designed to restore communication ability for people with severe speech and motor impairments. These people can communicate using a BCI speller by spelling their desired words. The goal of this paper is to present the achievements of a project whose ultimate purpose is to propose a Persian BCI speller that utilizes a Persian language model as an assisting tool to improve its performance.

Method: In this study, we first present our method for data collection. To collect the dataset, we designed a screen used to display the visual stimuli, and we carefully selected the words used in the data collection process. Using the designed stimuli screen and the selected words, a dataset from 5 subjects was collected. This dataset was used with the different signal processing and machine learning methods in the designed speller. To classify the P300 signal for spelling purposes, the feature vectors of the data samples were first extracted by concatenating the electroencephalography (EEG) channels, which were filtered using a moving average filter. Then, the samples were classified using a support vector machine (SVM), and the results were fused with character-level N-grams to detect the target characters.

Results: In the evaluation of the system, we achieved an average accuracy of 93.2% in character detection by utilizing a character-level 5-gram language model.

Conclusions: The assessment of the language models in the system demonstrated that using the 5-gram language model increased the accuracy by 1.9% compared to when no language model was used.


Introduction
A brain-computer interface (BCI) enables interaction with surroundings directly by translating brain activities into computer commands without using peripheral nerves and muscles [1]. In fact, BCI creates a non-muscular contact way for interpreting a user's purpose on an external device. Therefore, individuals with severe motor disorders can improve their quality of life through such an interface [2].
The major components of BCI systems are signal acquisition, signal preprocessing, feature extraction, and classification [3]. The signal acquisition step in BCI systems is usually done using invasive, semi-invasive, or non-invasive methods [4]. In an invasive method, such as electrocorticography, signals from the human brain are received by sensors that are surgically inserted into the brain or on the surface of the brain. In the semi-invasive technique, the signal is read inside the skull but outside the brain's gray matter [5]. Non-invasive methods, such as electroencephalography (EEG), follow an approach that does not require physical insertion into the brain [3]. EEG is a method for measuring the brain's electrical activity to understand cognitive processes or mental disorders. EEG sensors embedded in wearable devices make it possible to record EEG signals in the different living conditions of everyday life [6]. In the non-invasive mode, input signals are always associated with some unwanted data, i.e., noise. Electrical interference, muscle activity, and eye-related artifacts (e.g., blinking) are recorded along with the signals. These unwanted components may negatively affect the performance of non-invasive BCI systems [4].
The main purpose of signal preprocessing is to reduce the artifacts that are recorded with EEG signals. Different filters are used to fulfill this goal [7]. Moving average filters and band-pass filters help to remove noise and artifacts [3]. A moving average filter is optimized to reduce random noise and is an appropriate time-domain filter for encoded signals [8]. A band-pass filter passes frequencies within its cutoff range (e.g., 0.5-65 Hz) and attenuates signals at other frequencies [9].
In the feature extraction step, the features of the signal that encode the user's purpose are extracted. The extracted electrophysiological features have a strong connection with the user's purposes, and they can be in the time domain, the frequency domain, or both. One of the most common signal features used in current BCI systems is the latency of event-related potentials (such as the P300) [10]. The P300 signal occurs as the person's reaction to an unexpected stimulus. The onset of the P300 signal in the individual's brain is between 300 and 500 milliseconds after the stimulus [11]. The P300 signal has been used in a variety of applications, such as lie detection and spellers. The P300-based speller is one of the most commonly used applications in BCI systems; it allows users to select and display different characters on a digital screen by detecting the P300 evoked potential generated by visual stimuli [12]. BCI-based spellers use context information as a helpful aid in classifying brain signals. One of the most useful kinds of context information for spellers is the language context, which can be incorporated as a language model [13]. A language model can improve the performance, in terms of accuracy and speed, by predicting the next letter using the previous letters [14].
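As a concrete illustration of how a language model can predict the next letter from the previous ones, the following minimal sketch estimates next-character probabilities from bigram counts. The toy corpus and function names are hypothetical, for illustration only; the speller described later uses trained Persian N-grams.

```python
from collections import Counter, defaultdict

def train_char_bigram(corpus):
    """Count character bigrams to estimate P(next_char | prev_char)."""
    counts = defaultdict(Counter)
    for word in corpus:
        padded = "^" + word          # "^" marks the word start
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return characters ranked by estimated probability after `prev`."""
    total = sum(counts[prev].values())
    return [(c, n / total) for c, n in counts[prev].most_common()]

model = train_char_bigram(["ban", "bat", "bar", "cat"])
ranking = predict_next(model, "a")   # 't' ranks first: it follows 'a' twice
```

A speller can use such a ranking to boost the classifier's score for letters the language makes likely.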
The final stage of BCI, classification, is the most popular method for identifying brain activity patterns. In this method, the BCI system is considered a pattern recognition system. The performance of a pattern recognition system depends on the extracted features and the classification algorithm [15]. Machine learning techniques such as support vector machine (SVM) are employed for classification. SVM is an algorithm that uses supervised learning to discriminate two different classes of data. In this method, the discriminator is determined to have a maximum distance from the nearest training points [3]. Also, the SVM classifier has better performance than linear discriminant analysis (LDA) in the P300 evoked potentials detection [12].
In this paper, a Persian language model was used to improve the performance of the presented Persian BCI speller, using EEG signal datasets acquired from 5 subjects at Iran's National Brain Mapping Lab (NBML). To classify the P300 signal, first, the feature vectors of the data samples were extracted by concatenating the EEG channels, which were filtered using a moving average filter. Then, the samples were classified using SVM, as the SVM algorithm is well suited for the detection of P300 evoked potentials [12], and the results were fused with character-level N-grams to detect the target characters.
This paper is organized as follows: Section 2 describes some works related to BCI spellers, Section 3 presents the details of the materials and methods used in this research, Section 4 analyzes the data and results and, finally, in Section 5 the discussion and conclusion are given.

Related Works
The first BCI-based speller was introduced by Farwell and Donchin in 1988 [16]. The speller is composed of a 6x6 matrix of symbols, including letters of the English alphabet, numbers 1 to 9, and a space symbol. This matrix speller uses the P300 signal to identify the symbol chosen by the user. Following the introduction of this speller, which is widely used in BCI research, much research has been devoted to improving the accuracy and speed of spellers. Some of these studies, based on different methods of signal processing and machine learning, followed the path of increasing the accuracy and speed of the same matrix speller, while others presented generally different methods for constructing a BCI-based speller.
Some studies aimed at introducing new methods for displaying P300 stimuli, such as the use of auditory stimuli [17,18] and tactile stimuli [19], visual presentations other than the matrix [20,21], and the rapid serial visual presentation (RSVP) paradigm [22,23].
One of the most important methods of visual stimulus presentation introduced after the matrix method is the region-based presentation method, proposed by Fazel-Rezai and Abhari [20]. In this method, 49 characters, including the letters of the English alphabet, the numbers 0 to 9, and a number of symbols (such as the period, comma, etc.), were divided into 7 groups of 7 characters, located at the 6 corners and the center of a hexagon. In this method, character selection is a two-step process. In the first step, through detection of the P300, the group containing the target character is found after several intensifications. In the second step, the characters of the detected group are distributed among the seven regions and the target character is selected in the same way. This method provides the highest accuracy and speed among P300-based methods [24].
In other research in the area of BCI-based spellers, designers have turned from the P300 to other signals. Many methods have used Volitional Cortical Potentials (VCP) as input. The Hex-o-Spell method developed by the Berlin BCI team is one of the most important VCP-based speller designs. In this method, which uses a two-step selection process similar to the region-based presentation method, character selection at each level is made using motor imagery [21].
In addition to VCP, many visual spelling methods have used SSVEP. The fastest BCIs reported so far utilize the SSVEP signal [25,26,27]. Other studies have used two or more types of input signals to increase accuracy, speed, or efficiency [28,29,30,31,32,33,34].
In BCI-based spellers, some auxiliary methods have also been used to increase spelling speed and accuracy. The most important of these studies is the RSVP keyboard provided by CSL Laboratory at Northwestern University in the United States. The RSVP keyboard is a P300-based speller that uses a rapid serial visual presentation paradigm to display the stimuli. The RSVP keyboard uses the N-gram language model as an auxiliary tool to detect and classify the P300, which significantly increases the accuracy and speed of the system [14].
For the Persian language, studies on providing a BCI-based speller are limited. The only one we have found is Najafi's master's thesis [35], which provides a Persian P300-based speller. However, our presented method is different in terms of the language model, the screen design of the stimuli, and the classification method.

Materials and Methods
In this section, we present the details of our research in developing a Persian EEG-based speller. To this aim, first, the details of the data collection, including the screen design of the stimuli, word selection, and data acquisition, are given. Afterward, the proposed speller, which utilizes a language model, is presented.

Screen design of the stimuli
The screen is designed to enable the user to rely on covert attention, so that all the stimuli can be perceived and their color changes recognized by just staring at the center of the screen without moving the eyes. As a result, the placement of the stimuli (which follows the one in Hex-o-Spell [36]) can also help users who cannot move their eyes (e.g., patients with total Locked-in Syndrome [37]). In this speller, character selection is done as a two-step process: in the first step, the characters are placed into six groups in different parts of the screen, as shown in Figure 1, and the user chooses the group of characters that contains the target character. Then, in the second step, the user selects the target character from among the characters of the group chosen in the first step. Figure 1 shows the stimuli on the screen in each of the two steps of character selection and how the color of the stimuli changes to evoke the P300 signal. The stimuli screen was designed with the Psychophysics Toolbox in MATLAB [38]. The color of each stimulus must change in a semi-random order. A very short time between two color changes prevents the user from recognizing the stimulus, and if this interval is too long, the stimulus loses its surprise effect and, hence, the P300 loses amplitude. In the presented screen, the duration of each color change is 150 milliseconds, and between two successive color changes, all the stimuli remain unchanged for 50 milliseconds.
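The timing described above (150 ms color changes separated by 50 ms of no change, in semi-random order) can be sketched as a flash schedule. The round-wise shuffling policy below is our assumption about the randomization, not necessarily the exact scheme used in the experiment:

```python
import random

N_STIMULI = 6        # six stimuli on screen (groups or characters)
FLASHES_EACH = 12    # each stimulus is intensified 12 times per step
FLASH_MS, GAP_MS = 150, 50

def build_flash_schedule(seed=0):
    """Semi-random flash order: shuffle the 6 stimuli once per round so the
    user cannot anticipate the next intensification, while keeping the
    flash count per stimulus exactly FLASHES_EACH."""
    rng = random.Random(seed)
    schedule, t = [], 0
    for _ in range(FLASHES_EACH):           # 12 rounds
        order = list(range(N_STIMULI))
        rng.shuffle(order)                  # randomize within each round
        for stim in order:
            schedule.append((t, stim))      # (onset in ms, stimulus index)
            t += FLASH_MS + GAP_MS
    return schedule

sched = build_flash_schedule()
# 6 * 12 = 72 flashes, each occupying a 200 ms slot on the timeline
```

Each round contains every stimulus exactly once, so consecutive flashes of the same stimulus are at least one gap apart.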

Target words selection using the language model
Since the language model is an important part of the presented speller, the impact of the language model on Persian phrases should be assessed. For this purpose, the method presented in the work by Amini et al. [13] was used to select target words for the language model. These language models were trained using a corpus with more than two million Persian words collected from the Iranian Students' News Agency (ISNA) website [13]. For each of the words, the character-level probability was computed using equation (1):

P(w) = ∏_{i=1}^{|w|} P(c_i | c_{i−N+1} … c_{i−1})    (1)

In equation (1), P(w) is the character-level probability of the word w, and c_i represents the i-th character of w. In this way, among the pairs of words in the easy, medium, and hard categories, 3 words from the easy group, 6 words from the medium group, and 3 words from the hard group were selected as the final candidates, in a way that they could cover the maximum number of Persian characters possible. Based on the word-level bigrams with the preceding word, the word-level probability was computed using equation (2):

P(w_k | w_{k−1}) = C(w_{k−1} w_k) / C(w_{k−1})    (2)

where C(·) denotes the count of the word or word pair in the corpus. As a result, 12 words containing 29 different Persian characters were used in the data collection and in testing the system.
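A minimal sketch of how a character-level word probability in the style of equation (1) can be computed, assuming a lookup table of conditional probabilities. The `probs` table and the fallback floor for unseen events are hypothetical; the paper's actual model uses trained Persian N-grams with a proper smoothing method:

```python
import math

def char_ngram_logprob(word, probs, n=5):
    """Log of an equation-(1)-style word probability under a character-level
    n-gram: the sum of log P(c_i | previous n-1 characters).
    `probs` maps (context, char) -> probability; unseen events get a small
    floor instead of real smoothing (assumption for this sketch)."""
    logp = 0.0
    for i, ch in enumerate(word):
        context = word[max(0, i - (n - 1)):i]
        logp += math.log(probs.get((context, ch), 1e-6))
    return logp

# Hypothetical toy probabilities, just to exercise the function:
toy = {("", "a"): 0.5, ("a", "b"): 0.4, ("ab", "c"): 0.9}
lp = char_ngram_logprob("abc", toy, n=5)
```

Working in log space avoids underflow when words are long or probabilities are small.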
At the final stage, for each of these 12 selected words, a sentence was manually generated so that the word was placed in a language context. To test the system, subjects were asked to spell these 12 target words. Tables 1, 2, and 3 show the final target words that were selected. The bold words represent the selected pairs of words, and the underlined words are the ones that were spelled in the EEG data acquisition process using the stimuli screen.

Data Acquisition
At first, the selected target words were displayed to the users. Then, for each word, the characters were shown to the users one at a time, and the users were asked to select these characters using the stimuli screen in front of them. To this end, users were asked to focus on the stimuli and also to count the number of times the color of the stimuli changed. Focusing on the stimuli elicits the SSVEP, and counting the color changes excites the P300 component.
An EEG recording device made by g.tec was used to record the EEG data. This device had 63 active electrodes, which were connected to a g.HIamp amplifier. The ground electrode was placed on the forehead and the reference electrode on the right ear lobe. Following the 10-20 standard pattern, Figure 2 shows how the electrodes were located during the data acquisition process. The data was collected in a way that allows the processing and classification of the EEG signals to be done in the future. The data collection process was carried out at the National Brain Mapping Lab, located at the College of Engineering of the University of Tehran. Each subject was present in the lab for about 1.5 to 2 hours, depending on the time of the checkup before the recording process and the time it took to set up the EEG cap. All subjects participated voluntarily in the process of recording the EEG signal. To confirm their consent, the consent form prepared for the subjects was completed. A medical ethics license was received for all stages of this study, including the recording and processing of the EEG data.
EEG data were recorded from 5 subjects, including 3 females and 2 males. It should be noted that all of these subjects were physically healthy and right-handed. In order to reduce the effect of the sequence of user-spelled words, this sequence was different for each user. Table 4 shows the sequence of these words for each subject.

For each character, the color of each of the six stimuli on the screen was changed 12 times, and this was repeated for each of the two steps (group selection and target-character selection). After every 4 repetitions of changing the color of all the stimuli, a 2-second break was given. To reduce the effect of eye-movement artifacts, subjects were asked to avoid blinking while the stimuli were being shown; they could use the two-second rest period to blink and rest their eyes. Figure 3 shows the process of collecting EEG data from two subjects.

Data processing and classification
The EEG data was stored in raw (unfiltered) format at a sampling rate of 512 Hz. Before evaluating the data, it was necessary to preprocess it. For this purpose, the EEG signal was first filtered using a band-pass filter in the range of 0.5 to 45 Hz. Then, using the ICA method, eye artifacts were manually removed from the filtered signal. All these processes were performed with the EEGLAB toolbox in the MATLAB environment.
To detect a character using the P300 signal, it is first necessary to classify the EEG samples following each color change as P300 or non-P300. For this purpose, an SVM classifier was used. The dimensions of the 600-millisecond signal after each color change were reduced for all EEG channels using a moving average filter of order 8. Then the range of signal values across all channels was normalized between zero and one. Finally, by concatenating the outputs of all channels, a feature vector was extracted for each data sample. To label these samples, the ones related to a color change of a target stimulus were labeled as P300, while the others were labeled as non-P300.
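The feature extraction steps above can be sketched as follows. Interpreting the order-8 moving average as non-overlapping 8-sample means is our assumption about how the dimensionality reduction was done; the channel count and epoch length are taken from the recording setup:

```python
import numpy as np

def extract_features(epoch):
    """Feature vector for one 600 ms epoch of shape (n_channels, n_samples).
    A sketch of the paper's pipeline, under stated assumptions:
    1) order-8 moving average applied as non-overlapping 8-sample means
       (this both smooths and reduces dimensionality);
    2) min-max normalization per channel to [0, 1];
    3) concatenation of all channels into one vector."""
    n_ch, n_s = epoch.shape
    n_s -= n_s % 8                              # drop the ragged tail
    smoothed = epoch[:, :n_s].reshape(n_ch, -1, 8).mean(axis=2)
    mn = smoothed.min(axis=1, keepdims=True)
    span = smoothed.max(axis=1, keepdims=True) - mn
    normalized = (smoothed - mn) / np.where(span == 0, 1, span)
    return normalized.ravel()

# 63 channels, 600 ms at 512 Hz ≈ 307 samples (synthetic stand-in data)
epoch = np.random.default_rng(0).normal(size=(63, 307))
fv = extract_features(epoch)   # 63 channels * 38 averaged samples each
```

The resulting vector (one per color change) is what the SVM classifier receives.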
The SVM classifier was trained and tested using 9-fold cross-validation. When classifying a sample given to the SVM, in addition to determining the class of that sample (P300 or non-P300), its distance from the decision boundary was calculated. This model classifies samples with a negative distance from the decision boundary as P300 and samples with a positive distance as non-P300.
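A sketch of this training scheme with scikit-learn, on synthetic stand-in data (the real features come from the EEG epochs; the class balance, kernel, and class offset here are assumptions). Note that the sign convention of `decision_function` depends on the label encoding: with P300 encoded as 1, the target class tends toward positive distances, the opposite of the paper's convention.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in data: 540 feature vectors, ~1/6 labeled P300 (1).
rng = np.random.default_rng(0)
X = rng.normal(size=(540, 40))
y = (rng.random(540) < 1 / 6).astype(int)
X[y == 1] += 0.8                     # give the P300 class a slight offset

# 9-fold cross-validation: every sample gets a held-out signed distance.
distances = np.empty(len(y))
for train, test in StratifiedKFold(n_splits=9, shuffle=True,
                                   random_state=0).split(X, y):
    clf = SVC(kernel="linear").fit(X[train], y[train])
    distances[test] = clf.decision_function(X[test])
```

These signed distances, rather than the hard class labels, are what the later stimulus-selection step accumulates.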
Using a certain number of stimulus epochs (one epoch being a complete change of the color of all the stimuli), the distance values of all the samples pertaining to a stimulus were pooled together. The stimulus whose sum of distances was smallest (summing both negative and positive values) was chosen as the stimulus selected by the user. Assuming that d_s(i) is the distance from the SVM decision boundary of the EEG sample recorded after the i-th intensification of stimulus s, and n is the number of epochs used to detect the stimulus, the sum D(s) of the sample distance values for stimulus s is

D(s) = Σ_{i=1}^{n} d_s(i)

and the target stimulus s* was selected using equation (3):

s* = argmin_s D(s)    (3)
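The pooling-and-argmin decision can be sketched in a few lines; the distance values below are made-up numbers, chosen only so that one stimulus looks P300-like:

```python
def select_stimulus(dist_by_stim):
    """Equation-(3)-style decision: for each stimulus, sum the signed SVM
    distances of its n epochs; the stimulus with the smallest (most
    negative) total is taken as the user's choice, following the paper's
    convention that P300 samples fall on the negative side of the boundary."""
    totals = {s: sum(d) for s, d in dist_by_stim.items()}
    return min(totals, key=totals.get), totals

# Toy distances for 6 stimuli over 4 epochs (assumed numbers):
dists = {s: [0.4, 0.2, 0.5, 0.1] for s in range(6)}
dists[2] = [-0.6, -0.1, 0.3, -0.4]   # stimulus 2 looks like the target
choice, totals = select_stimulus(dists)   # choice == 2
```

Summing over epochs lets occasional misclassified samples (like the one positive value for stimulus 2) be outvoted by the rest.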

Using language model for character recognition
For character detection, the language model was applied to the result of the P300 classification. Character-level N-grams with N values between 2 and 5 were used, together with a smoothing method. To make the two scores comparable, the summed distance values used for character recognition with the P300 were converted to probabilities using the softmax function, as shown in equation (4):

P_P300(s) = exp(−D(s)) / Σ_{s′} exp(−D(s′))    (4)

In equation (4), P_P300(s) is the probability that stimulus s is the target, and D(s) is the sum of the signal sample distances from the SVM decision boundary. The negation makes stimuli with more negative summed distances (i.e., more P300-like) more probable.
To calculate the probability of choosing a stimulus using the language model, if the stimulus consists of a group of characters, the probability is calculated using equation (5), and if the stimulus contains a single character, equation (6) is used:

P_N-gram(s) = Σ_{c_j ∈ s} P(c_j | c_{k−N+1} … c_{k−1})    (5)

P_N-gram(s) = P(c_j | c_{k−N+1} … c_{k−1})    (6)

In these two equations, P_N-gram(s) indicates the probability of selecting target stimulus s using the N-gram, and P(c_j | c_{k−N+1} … c_{k−1}) is the occurrence probability of the j-th character at the k-th position of the sentence, conditioned on the N−1 previous character(s).
The selection of a character is obtained by combining the results of the P300 detection and the N-gram language model using equation (7):

P(s) = ω · P_N-gram(s) + (1 − ω) · P_P300(s)    (7)

In equation (7), the parameter ω weights the probability of the language model against the probability of the P300 classification. In this study, ω was chosen experimentally as 0.13, which yielded the best result.
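Under one plausible reading of the combination step (a weighted linear mixture of the two probabilities; the exact fusion rule, the Latin placeholder characters, and the toy numbers below are assumptions), the softmax conversion, the group-probability sum, and the fusion can be sketched together:

```python
import math

OMEGA = 0.13   # language-model weight reported in the paper

def softmax_neg(totals):
    """Equation-(4)-style conversion of summed distances D(s) to
    probabilities; negation makes more-negative totals more probable."""
    exps = {s: math.exp(-d) for s, d in totals.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}

def fuse(totals, lm_char_probs, groups):
    """Equation-(5)-style group probability (sum of the group's character
    probabilities) mixed with the P300 probability, equation-(7)-style."""
    p300 = softmax_neg(totals)
    fused = {}
    for s, chars in groups.items():
        p_lm = sum(lm_char_probs.get(c, 0.0) for c in chars)
        fused[s] = OMEGA * p_lm + (1 - OMEGA) * p300[s]
    return max(fused, key=fused.get)

# Toy example with Latin placeholders standing in for Persian characters:
totals = {0: 1.1, 1: -0.8, 2: 0.9}
groups = {0: "abc", 1: "def", 2: "ghi"}
lm = {"d": 0.30, "e": 0.20, "a": 0.05}
best = fuse(totals, lm, groups)
```

With the small ω, the EEG evidence dominates; the language model mostly breaks ties and nudges ambiguous cases.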

Evaluation Results
In this section, the results of our evaluation of the given speller are presented. In the evaluations, we used well-known evaluation criteria, such as accuracy, precision, and recall [39]. The primary results on the classification of the P300 signals are presented in Table 5. Despite the high accuracy and precision, the results in Table 5 show that the SVM classifier was not capable of achieving a good recall. This means that most of the P300 samples were mistakenly classified as non-P300. To improve the performance, the language model score was integrated with the P300 classification score to classify the characters. The results of character recognition are presented in Table 6 for different levels of the N-gram language model; the results are given for different numbers of epochs as well. The last column of this table, Avg, denotes the average accuracy of the corresponding row. The results in Table 6 show that the method used to recognize characters using the P300 achieved high accuracy, with an average of approximately 93% using four stimulus-display epochs. By applying the language model to the P300 classification, the character recognition accuracy eventually reached 100% for all subjects except subject number 3 as the number of epochs increased. We believe the reason for not reaching 100% accuracy for subject number 3 is a mistake by the subject; in other words, subject 3 probably selected a character other than the target character in one of the steps.
To study the impact of the language model on the P300-based classification of characters more precisely, the performance was calculated for each difficulty level separately. Table 7 shows the results of character recognition in each of the easy, medium, and hard groups. Each accuracy score in this table denotes the average accuracy over all 5 subjects. The results in Table 7 show that increasing the value of N in the N-gram language models (i.e., increasing the context window) has the most positive effect on the words of the medium group, even more than on the easy group. In the hard group, the 1-gram language model performs better than the 2-gram, 3-gram, and 4-gram models, while the 5-gram language model performs better than all the other models.

Discussion and Conclusions
In this work, a stimuli screen was designed based on the Hex-o-Spell model. Then, using the language model, target words with three different levels of difficulty were selected for this speller. After collecting the EEG data from 5 subjects, the P300 classification and the character detection were tested. In character recognition using the P300 signal alone, an average accuracy of 91.3% was obtained. The language model used was a combination of character-level N-grams with values of N from 1 to 5 and a word-level bigram. The comparison of the average accuracy of the P300 classification for character recognition by the different methods used in this study is summarized in Figure 4. The results of these N-grams showed that the use of the language model generally increased the accuracy of the P300 classification in character recognition. It was also concluded that increasing the value of N in the N-gram language models increased the character recognition accuracy; as a result, when using the 5-gram language model, the maximum average accuracy of 93.2% was achieved (a 1.9% improvement over using the P300 alone). As we had no access to similar data with which to compare our method, to better understand the results of this study, they are compared with the results presented in the study by Cecotti and Graser [40]. The average accuracy values achieved in character detection by the various P300 classification methods of that study (Hofftam, Zongton, Yandong, mLVQ, LDA, ESVM, CNN-1, MCNN-1), for different numbers of epochs, are given in Table 8. The dataset collected in [40] contained 29 different characters out of the 36 characters shown on the screen, which is similar to the approach of this study. The comparison of the two studies shows that this study, with 12 epochs and even without using the language model, obtained better results than the 5- and 10-epoch results of the methods mentioned in [40].
Although the results are not quantitatively comparable due to differences in the data, they give insight into the position of our work alongside similar research. Our method can be improved in various directions, including improving the language models and the smoothing methods, enriching the dataset, and benefiting from hybrid BCI techniques. Hybrid BCIs combine two or more different types of BCIs and could also improve universality, because a second approach might convey the user's intent when another approach cannot. Also, the use of multiple types of brain signals as inputs to a BCI system can increase its accuracy and speed. In the process of collecting the EEG data, the stimuli screen was designed to induce SSVEP in the subjects' brain signals for the Persian speller, but the SSVEP was not modeled in this work.

Consent for Publication: Written informed consent was obtained from the users for the publication of their data.
Availability of Data and Material: Data and material of this research are available upon request.
Competing Interests: The authors declare that there is no conflict of interest regarding the publication of this paper.
Funding: This project was funded by the Iran National Science Foundation under project number 95844527. However, the responsibility for the content of this study lies with the authors alone.

Authors' Contributions:
Hadi supervised the project, advised in developing the concept and methods, advised in designing the experiments, edited the manuscript, and is the corresponding author of the paper. Hessam developed the concept, implemented the algorithms, did the simulations, designed the data collection, collected and prepared the data, and prepared the first draft of the paper. Parisa did some of the implementation, did several evaluations, and prepared some parts of the first draft of the paper. All authors discussed the final results.