ASTERI: image-based representation of EEG signals for motor imagery classification

Electroencephalography (EEG) signals are valuable for monitoring and investigating neurological diseases and for controlling brain-machine interfaces (BCI). However, these signals are noisy, non-linear, and non-stationary. Their analysis is an expensive task and can lead to misdiagnosis. Deep learning can be used to overcome these challenges, and the most widely used deep architectures are based on convolutional neural networks (CNNs). Representing EEG signals as images makes it possible to employ CNN-based deep architectures in intelligent EEG-based systems. In this work, we propose the ASTERI method for representing EEG signals in two dimensions using the backprojection reconstruction method. To validate the proposal, experiments were performed with motor imagery EEG signals from the BCI Competition IV 2b database. To extract attributes from the EEG signal windows, we used the ASTERI representation and investigated the pre-trained VGG16 and LeNet networks. For the final classification of imagined right-hand and left-hand movements, we used Random Forests. For the original database, validation with 100 trees achieved an average accuracy of 88.97% and an average kappa index of 0.78. For databases augmented with synthetic instances, the average accuracies were 89.10% and 89.13% for increases of 50% and 150% per class, respectively. The experimental results showed that the ASTERI method, together with hybrid deep architectures based on CNNs and Random Forests, can generate competitive results for motor imagery problems. Representing signals as images can also contribute to the exploration of new deep-architecture-based solutions to applied neuroscience problems.


Context and motivation
Electroencephalography (EEG) is a technique that records brain activity over time by detecting voltage fluctuations due to the flow of ionic current between neurons. EEG signals are currently valuable in the monitoring and investigation of neurological diseases. Abnormalities such as excessive and uncontrolled brain activity, slow rhythms, or altered synchronization may be indicative of neuronal dysfunction or brain degeneration, for instance (Abiri et al. 2019; Craik et al. 2019).
In clinical practice, electroencephalography has been used in several contexts, including the diagnosis of epilepsy (Chen et al.). However, EEG recordings are contaminated by technical artifacts, such as 60 Hz artifacts resulting from interference from the electrical network, artifacts due to electrode malposition, and electrical and electromagnetic interference from other medical and hospital equipment (Rampal et al. 2018).
Analyzing EEG signals is a difficult task that requires visual inspection by a well-trained medical specialist. This process is therefore susceptible to the subjectivity of the specialist, in addition to demanding considerable time (Thomas et al. 2020; Tjepkema-Cloostermans et al. 2018). As a consequence, diagnostic errors are common and can lead to poor prognosis (Benbadis 2010). Amin and Benbadis (2019) point out that approximately 25-30% of patients diagnosed with epilepsy through EEG signals do not have epilepsy, and some patients had been misdiagnosed for more than 10 years.
Although there are other approaches to noise reduction and signal recovery, such as curve fitting and attenuation filters for noisy signals (Luccas et al. 2021), several researchers have sought to overcome these challenges through machine learning (ML) (AlAbboudi et al. 2020; Gayathri et al. 2018; Gemein et al. 2020; Ieracitano et al. 2021; Isa et al. 2019; Mardini et al. 2020; Rashid et al. 2018). Among the various areas within machine learning, deep learning (DL) algorithms have stood out in recent years. Deep learning is based on learning machines that seek to imitate the functioning of the human brain through hierarchical architectures. Unlike classical ML methods, which rely on explicit feature extraction, DL algorithms can handle data in their original or raw format, for example, images. These methods have multiple levels of representation, obtained by composing simple and non-linear modules, where each level represents the data in an increasingly abstract way; the deeper layers can thus amplify aspects of the input data that are important for classification (LeCun et al. 2015; Polat and Özerdem 2020; Silva et al. 2019). In this way, DL methods extract attributes implicitly. Among deep architectures, convolutional neural networks (CNNs) are quite popular, being able to take an input image, assign learnable weights to various aspects and objects of the image, and thus classify it (Mao et al. 2020).
EEG recordings are commonly used as one-dimensional signals in machine learning problems. In this approach, it is necessary to choose a set of representative physical and statistical attributes in the time and frequency domains to describe the signal windows. However, representing these signals in two dimensions can be useful in the context of deep learning, and several researchers have proposed ways to represent and classify EEG signals as images (Fu et al. 2014; Ieracitano et al. 2019; Lee and Choi 2018; Polat and Özerdem 2020; Srinivasan et al. 2012).
The main objective of this work is to propose the ASTERI method for representing EEG signals as images, in order to facilitate the use of CNN-based deep learning machines to solve EEG motor imagery problems. ASTERI uses the classic backprojection image reconstruction method, originally applied in the reconstruction of computed tomography images (Gordon et al. 1975; Hsieh 2003). Backprojection is a simple and fast summation method capable of obtaining images from sinograms; in this study, biosignal windows are used as pseudo-sinograms (Gomes et al. 2020a, 2021a). The method was named ASTERI, the Greek word for "star," because of the pattern generated when the signal in time is represented as an image: the backprojection algorithm produces a pattern similar to a stylized star. To validate the proposed form of representation, we used as problem of interest the EEG-based classification of motor imagery (MI) for the control of brain-machine interfaces (BCI). MI is based on intentional mental rehearsals of motor behavior without an associated movement or external stimulus. Thus, using imagination alone, the kinesthetic memory of a previously performed movement is activated, giving the impression that it is being performed again (Batula et al. 2017; Padfield et al. 2019). In this work, we use the 2b database of the BCI Competition IV (Leeb et al. 2008). This database is composed of brain signals from nine volunteers who imagined two types of movement: right-hand and left-hand movements.
After generating the image banks referring to the nine subjects' motor image signals, we tested two pre-trained deep architectures: VGG16 and LeNet. These deep neural networks were used to extract features from ASTERI images that represent EEG signal windows. Then, we applied swarm intelligence methods for attribute selection and random forests for classification. Finally, the results were evaluated according to the accuracy and kappa index, being compared with the results of the state of the art.
The structure of the subsequent sections is organized as follows: in the "Related works" section, we comment on works related to the representation of EEG signals as images and related to motor imagery classification; in the "Proposed method" section, we present the theoretical concepts necessary for a good understanding of this work, and also our proposed method; in the "Results" section, we show the results and discussion. Finally, our conclusions and future works are described in the "Conclusion" section.

Representation of signals as images for use of convolutional neural networks
Several works have proposed methods of representing signals as images aimed at the use of convolutional neural networks. Kwon et al. (2020) tested an approach to represent speech signals as spectrograms, with the general objective of recognizing emotions from these signals. Their methodology consisted of pre-processing the speech signals using an adaptive threshold and then converting them into 128 × 128 images using the short-term Fourier transform (STFT) and the fast Fourier transform (FFT). Finally, the authors designed a new CNN architecture, called deep stride CNN (DSCNN), which uses 2 × 2 strides directly in the convolution layers as downsampling. The network is formed by seven convolutional layers and two fully connected layers that feed a Softmax classifier. This model was evaluated on two databases: IEMOCAP and RAVDESS. The first contains speech signals from 10 actors, who represented seven emotions: anger, fear, happiness, disgust, sadness, excitement, and neutral. The RAVDESS dataset is composed of 24 actors enacting eight emotions (anger, calm, happiness, sadness, surprise, disgust, fear, and neutral). The proposed method obtained an average accuracy of 81.75% for the IEMOCAP dataset and 79.5% for the RAVDESS dataset.
Similarly, Khare and Bajaj (2020) sought to differentiate emotions with CNNs, in this case using electroencephalographic signals from a public database. The database consists of 24-channel recordings of 20 students who watched videos intended to arouse one of four emotions: fear, happiness, sadness, and relaxation. The work first pre-processed the signals with Butterworth bandpass filters. To represent the signals as images, the authors used the smoothed pseudo-Wigner-Ville distribution (SPWVD); they point out that the SPWVD provides better spatio-temporal resolution and a direct localization of the signal energy. The generated images served as input for three pre-trained CNNs (AlexNet, ResNet50, and VGG16) and a fourth, configurable CNN with four convolutional layers and two fully connected layers. Average classification accuracy was 90.98%, 91.91%, 92.71%, and 93.01% for AlexNet, ResNet50, VGG16, and the configurable CNN, respectively. Yildirim et al. (2019) proposed a method for automatic detection of diabetes mellitus (DM) using heart beats from electrocardiogram (ECG) signals. They also tested the pre-trained networks AlexNet, VggNet, ResNet, and DenseNet in order to overcome the challenge of a small database. To transform the one-dimensional signals into two-dimensional representations, the authors generated grayscale spectrograms of size 437 × 501. To test the method, they used a database of 30 volunteers: 15 healthy and 15 diagnosed with DM. The pre-trained DenseNet model achieved the best performance, with an average accuracy of 97.62 ± 2.30%. Taghizadegan et al. (2021) sought to predict obstructive sleep apnea (OSA) events from polysomnographic signals (EEG, ECG, respiratory signals, oximetry, and eye movements). The authors designed a semi-automatic method using CNNs and recurrence plots (RP), a tool for visually representing the dynamic recurrence of a system. Their methodology consisted of representing the signals as RPs and feeding them to pre-trained CNNs, with each type of signal feeding a different network. The outputs of these CNNs were then merged using accuracy-based weighted majority voting (WMV). This proposal was tested on two databases: the "Dublin Sleep Apnea database" and the "MIT-BIH polysomnographic database." The first consists of EEG and ECG signals from 15 volunteers; the second is composed of EEG, ECG, and respiratory signals from 16 men. Considering four scenarios (30, 60, 90, and 120 s before the OSA events), the authors observed that the best prediction occurs closer to the event, that is, at 30-s intervals. For the "Dublin Sleep Apnea" database, the ShuffleNet network achieved an accuracy of 90.45 ± 1.23%. For the MIT-BIH database, the average accuracies were 90.72 ± 1.50% and 90.23 ± 1.35% for the ShuffleNet and ResNet networks, respectively.
Another way of representing signals as images was addressed by Ieracitano et al. (2019). The researchers sought to differentiate between people with Alzheimer's disease (AD), people with mild cognitive impairment (MCI), and healthy individuals. They proposed to represent EEG signals as power spectral density (PSD) images of size 19 × F, where the first dimension corresponds to the 19 EEG channels and F is the number of discrete frequency values in the range 0.5-32 Hz. The images then served as inputs to a custom CNN that classifies them into two or three classes. To test the proposed method, the authors recruited 189 volunteers from the Centro Neurolesi Bonino-Pulejo of Messina, IRCCS, including 63 subjects with AD, 63 subjects with MCI, and 63 volunteers for the control group. The signals were pre-processed, and the CNN was trained and tested, achieving an accuracy of 83.33% in the three-class scenario. Considering binary classification, the best results were obtained when comparing AD with the control group (92.95%) and MCI with the control group (91.88%). The authors also tested the classification of PSD images using usual classifiers such as SVM, MLP, and LDA, but the results with the custom CNN remained the best.
Classification of EEG-based motor imagery for control of brain-machine interfaces

Zhang et al. (2021) tested ways to adapt CNNs to improve motor imagery classification. To overcome the small amount of EEG signals available per person, the study proposed a subject-independent classification: the network is trained using data from several subjects, with the exception of the target subject, and the pre-trained model is then adapted and fine-tuned with the data of the chosen subject. The researchers proposed different strategies for adapting the network. First, they tested optimizing only the classifier, leaving the feature-extraction parameters unchanged. Then they tested adapting the convolutional layers; in this second scenario, four different proportions were tested, varying the number of layers retrained with the subject's data. To evaluate the method, they used a dataset of EEG signals from 54 healthy subjects imagining right- and left-hand movements. The CNN used consisted of spatial and temporal filters with max-pooling, three convolutional blocks, and a fully connected layer with Softmax. Finally, the study observed that, compared to traditional methods trained with the target subject's own data, the proposed method presented an accuracy of 32.50% or higher. Among the scenarios tested, the best results were obtained by adapting three of the four convolutional layers and the Softmax layer, which indicates that only the first layer was properly trained with data from other subjects.
In another innovative approach, the authors merged several CNN models with different depths and filters in order to improve motor imagery classification results. The proposed method is formed by CNN models based on the AlexNet architecture; the attributes extracted by the different models are concatenated and serve as input to an MLP (MCNN method) or an autoencoder (CCNN method), followed by a Softmax activation function. To test this proposal, the authors experimented with four types of CNNs, ranging from one to four convolution blocks with max-pooling and filters ranging in size from 10 to 30. The CNN models were pre-trained and evaluated using the High Gamma Dataset, formed by EEG signals from 20 volunteers. In addition, the models were also tested on the BCI IV-2a (BCID) dataset, which is composed of EEG signals from nine volunteers who imagined four types of movement: right hand, left hand, foot, and tongue. The study also evaluated two training approaches: subject-specific training and training with signals from all volunteers. Considering the BCID, the tested method presented an average accuracy of 75.7% for subject-specific training; when training with data from all volunteers, the CCNN method outperformed the others, with an average accuracy of 55.34%. Dai et al. (2020) focused on two approaches to improve motor imagery classification: customizing the CNN architecture for each subject by adjusting the kernel dimensions, and augmenting the data for each subject. The authors investigated the impact of kernel size across volunteers and across imaging sessions for the same person; this analysis revealed differences of around 10% in accuracy with the variations made. Inspired by these findings, they proposed a CNN with a hybrid convolution scale (HS-CNN). In addition to the architectural adaptations, the work proposed a new data augmentation method, in which signal segments are recombined in the time and frequency domains. To validate the method, the researchers used two BCI Competition IV databases: 2a and 2b. Each is composed of nine healthy subjects, with four and two motor imagery classes, respectively. The mean classification accuracy was 87.6% for the proposed method, showing an improvement of up to 23.25% and 19.7% for databases 2a and 2b, respectively.
As in the previous work, Zhu et al. (2019) also performed experiments on BCI Competition IV dataset 2b, but tested a subject-independent approach. The main idea of the work is to train BCIs with existing data from several individuals and transfer the learning to classify the motor imagery of a new individual. The authors pre-processed the signals with bandpass filters and extracted attributes using the common spatial pattern (CSP) method; however, they abandoned the log-normalization operation commonly used in CSP, believing that it can remove temporal information important for classification. They then designed a structure capable of reducing the dimensionality of the CSP data while retaining greater discriminative capacity, called the separate channel convolutional network (SCCN). A one-dimensional CNN is used as an encoder to capture information from each channel and reduce redundant information; a recognition block then classifies the encoded CSP data, and a Softmax layer completes the classification process. The results were evaluated according to two metrics: accuracy and information transfer rate (ITR). The proposed method presented an average accuracy of 64% and an ITR of 0.83; however, simpler classifiers such as LDA achieved better results (65%). Zhao et al. (2019) proposed a new method to represent motor imagery signals and extract their attributes. The authors' strategy was to represent signal windows in three dimensions: the spatial distribution of the electrodes was maintained in a two-dimensional matrix, with positions without electrodes filled with zeros, and the 2D conformation was then expanded to 3D using the temporal information. For attribute extraction and subsequent classification, a 3D CNN called multi-branch 3D CNN was designed, with three branches formed by different CNNs: a small receptive field network (SRF), a medium receptive field network (MRF), and a large receptive field network (LRF). Each branch is made up of three convolutional layers and three dense layers, including a shared layer. The idea behind this architecture is that the filter parameters chosen for each branch should be different, targeting different receptive fields in the time domain. Finally, the three networks are combined and serve as input to a Softmax layer. This methodology was tested with data from the BCI Competition IV 2a database. To analyze the performance of the proposed CNN, the authors compared the accuracy obtained for each of the nine volunteers using the multi-branch 3D CNN, SRF only, MRF only, and LRF only. Analyzing the results of the single-branch networks, they realized that it is not possible to choose just one model for all volunteers, as each branch stood out for some subject; these observations point to the need for subject-specific networks. On the other hand, when using the multi-branch 3D CNN, the results were superior in all cases.

Proposed method
In this work, we present the ASTERI method of representing EEG signal windows in time in the form of an image, to enable the use of convolutional neural networks in applications of intelligent systems based on EEG. To validate the proposal, we applied the proposed form of representation to the solution of the motor imagery problem. Figure 1 schematically presents a summary of ASTERI. First, EEG signals are organized into classes and windowed into 1-s windows with 0.5-s overlap. Then the signals are filtered to remove artifacts and select frequency bands of interest. In this sense, different filtering approaches are tested in this work.
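For concreteness, the sketch below illustrates the windowing step (1-s windows with 0.5-s overlap) in Python/NumPy. It is only an illustration: the original pre-processing was implemented in GNU/Octave, and the 250 Hz sampling rate of the 2b recordings is assumed here.

```python
import numpy as np

def window_signal(eeg, fs, win_s=1.0, overlap_s=0.5):
    """Split a multi-channel EEG recording (channels x samples) into
    fixed-length windows with overlap, as described for ASTERI."""
    win = int(win_s * fs)
    step = int((win_s - overlap_s) * fs)
    windows = []
    for start in range(0, eeg.shape[1] - win + 1, step):
        windows.append(eeg[:, start:start + win])
    return np.stack(windows)  # (n_windows, channels, samples)

# Example with the 2b montage (3 EEG channels) and an assumed 250 Hz rate
fs = 250
eeg = np.random.randn(3, 10 * fs)   # placeholder signal, 10 s long
wins = window_signal(eeg, fs)       # -> shape (19, 3, 250)
```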
After pre-processing, the signals are represented in the form of images: each signal window, composed of all the channels used, is interpreted as a pseudo-sinogram. From these pseudo-sinograms, images are reconstructed by the classical backprojection reconstruction method. Figure 2a and b show examples of images obtained with signal windows of subjects 1 and 3.
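A minimal sketch of this image-generation step follows, using scikit-image's unfiltered backprojection. The arrangement of the pseudo-sinogram (here, each channel of the window taken as one projection, with angles spread over 180°) and the output size are assumptions of this illustration, not necessarily the exact ASTERI configuration.

```python
import numpy as np
from skimage.transform import iradon

def asteri_image(window, output_size=128):
    """Backproject one EEG window interpreted as a pseudo-sinogram.

    Assumption of this sketch: each channel of the 1-s window is one
    projection (one column of the sinogram); ASTERI's exact layout may differ.
    """
    sinogram = window.T                        # (samples, channels): columns = projections
    theta = np.linspace(0.0, 180.0, sinogram.shape[1], endpoint=False)
    return iradon(sinogram, theta=theta, output_size=output_size,
                  filter_name=None, circle=False)  # plain (unfiltered) backprojection

# window: 3 channels x 250 samples (1 s at an assumed 250 Hz)
window = np.random.randn(3, 250)
image = asteri_image(window)                   # 2D array with a star-like streak pattern
```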
The third step was to extract and select features. Two ways of extracting attributes were investigated. The first consists of extracting implicit attributes through deep networks pre-trained on other image datasets; the architectures chosen were VGG16 pre-trained on ImageNet (Hon and Khan 2017; Tammina 2019; Theckedath and Sedamkar 2020) and LeNet pre-trained on MNIST (Krishna and Kalluri 2019; Larabi et al. 2019; Sun et al. 2021; Tan et al. 2018; Zhang et al. 2019). The second way was to extract explicit attributes from each signal window, in both the time and frequency domains. The chosen attributes are described in detail in Table 1. These 34 features were empirically chosen to combine time- and frequency-based analysis using common state-of-the-art signal features (da Silva Junior et al. 2019, 2022; Espinola et al. 2021a, b; Silva Júnior et al. 2020). We also investigated the fusion of attributes, where the attributes extracted in both ways form a single database. Finally, we applied an attribute selection method based on swarm intelligence, the evolutionary search. Considering the explicit attributes, we could have used a feature selection method to retain only the most statistically relevant features, as done by Faria et al. (2021); however, we preferred to select the most relevant features at the end of the feature extraction process, including the attributes obtained from the deep neural network.
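As an illustration of the implicit feature extraction, the sketch below exposes one fully connected layer of an ImageNet pre-trained VGG16 as a fixed feature extractor. Keras is used here only for illustration; the paper performed this step through Weka, and the chosen layer and input size are assumptions of this sketch.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model

# Load VGG16 pre-trained on ImageNet and expose the "fc1" layer as the
# feature vector (the specific layer is an assumption of this sketch).
base = VGG16(weights="imagenet", include_top=True)
extractor = Model(inputs=base.input, outputs=base.get_layer("fc1").output)

def vgg16_features(images):
    """images: (n, 224, 224, 3) array of ASTERI images resized/tiled to RGB."""
    return extractor.predict(preprocess_input(images.astype("float32")), verbose=0)

feats = vgg16_features(np.random.rand(4, 224, 224, 3) * 255)  # -> (4, 4096)
```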
The fourth step is the investigation of the best classifier configuration. In the proposed hybrid deep architecture, the pre-trained deep neural network is used to extract features from the input images; the feature vector is then passed to the effective classification step, performed by a random forest classifier. To define how many trees to use in the random forest, each configuration was tested with tenfold cross-validation. Random forest configurations of 10 to 100 trees were tested, in steps of 10 trees. The training and test experiments were performed with 70% of the database instances; the remaining data were used for validation.
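A minimal sketch of this model-selection protocol (70/30 split, tenfold cross-validation, 10 to 100 trees in steps of 10) is shown below using scikit-learn; the original experiments used Weka, so implementation details such as the random seeds are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

def select_forest(X, y, seed=0):
    """Hold out 30% for validation, then pick the number of trees
    by 10-fold cross-validation on the remaining 70%."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    best_n, best_acc = None, -np.inf
    for n_trees in range(10, 101, 10):
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        acc = cross_val_score(rf, X_tr, y_tr, cv=10, scoring="accuracy").mean()
        if acc > best_acc:
            best_n, best_acc = n_trees, acc
    final = RandomForestClassifier(n_estimators=best_n, random_state=seed).fit(X_tr, y_tr)
    return final, final.score(X_val, y_val)   # validation accuracy of the chosen forest
```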
Experiments were also performed with datasets augmented with synthetic instances. The objective of this step was to analyze scenarios in which more data are collected per participant and to assess the impact of additional motor imagery sessions on classification performance. To augment the databases, we applied the SMOTE (Synthetic Minority Oversampling Technique) method to the train/test subset; the validation subset contained only instances of the original dataset, that is, collected EEG signals.

Database
The proposed method was tested in the BCI Competition IV 2b (Leeb et al. 2008) database. The database is composed of electroencephalographic signals from 9 volunteers. Each of them was evaluated during 5 sessions. At the beginning of each session, some steps were followed: (1) 2 min with eyes open, looking at a fixed point; (2) 1 min with eyes closed; and (3) 1 min with moving eyes. The purpose of these steps is to estimate the influence of electrooculography (EOG) signals on EEG signals. The electrooculogram was recorded using 3 channels.
After these 4 introductory minutes, the motor imagery runs began, with monitoring through three EEG channels (positions C3, Cz, and C4). Two classes were tested: imagination of right-hand movement and imagination of left-hand movement. In each session, 6 runs were performed, consisting of 20 trials each. Thus, the database provides 120 imagery trials for each class in each session. Figure 3 presents a diagram detailing how the acquisition of EEG signals from this database was performed.
The first two sessions were recorded without visual feedback. They consisted of the following steps: first, a fixation cross appeared on the volunteer's screen; then, an acoustic stimulus was given as an alert; finally, a visual cue indicated the class by means of an arrow, and the volunteer had to imagine the indicated movement for a period of 4 s. Figure 4 details a run without visual feedback.
In contrast, the final three sessions used visual feedback. At the start of each trial, a neutral gray smiley appeared on the screen, and the cue indicating the class was given in the period of 3.5-7.5 s. Depending on the cue, the volunteer had to move the smiley to the right or to the left. During the movement imagination, the smiley turned green or red, indicating whether the imagined movement was in the right or wrong direction, respectively. Figure 5 details how the feedback runs were performed.

Backprojection
The backprojection reconstruction algorithm was originally proposed in the context of computed tomography (CT). In CT, an X-ray source and a detector surround the body to be irradiated. The cross-section to be reconstructed can be regarded as a 2D distribution of some function, for example a function representing the linear attenuation coefficients of the structures inside the body. The goal is then to measure the sums, or line integrals, of the attenuation coefficients along different paths or at different angles. Each result obtained in one direction is called a projection (Gordon et al. 1975; Hsieh 2003).
The backprojection method is a simple and fast summation method. The projections, or Radon transforms, of a sample are line integrals at specific angles. Mathematically, the reconstruction technique can be defined as follows. Consider a function f of two polar variables r and ϕ, with its line integral [Rf](l, θ) taken along the line at a distance l from the origin and at an angle θ with the vertical axis, where R is the operator corresponding to the Radon transform. For any function p = Rf, the inversion can be written as Eq. 1 (Gordon et al. 1975; Koshev et al. 2016):

$$f(r, \phi) = \frac{1}{2\pi^2} \int_0^{\pi} \int_{-\infty}^{+\infty} \frac{p_l(l, \theta)}{r\cos(\theta - \phi) - l}\, dl\, d\theta, \qquad (1)$$

where $p_l(l,\theta)$ is the partial derivative of p with respect to l. All points (r, ϕ) lying on the line with offset l and angle θ satisfy Eq. 2:

$$l = r\cos(\theta - \phi). \qquad (2)$$

Therefore, the backprojection method is one of the numerical methods for estimating these double integrals in a two-step process. Firstly, one defines the filtered projections (Eq. 3):

$$\tilde{p}(l', \theta) = \int_{-\infty}^{+\infty} \frac{p_l(l, \theta)}{l' - l}\, dl. \qquad (3)$$

Secondly, the estimate $f^{*}$ of f is obtained by the backprojection given by Eq. 4:

$$f^{*}(r, \phi) = \frac{1}{2\pi^2} \int_0^{\pi} \tilde{p}\big(r\cos(\theta - \phi), \theta\big)\, d\theta, \qquad (4)$$

where this second phase is the more computationally costly one (Gordon et al. 1975).
There are several ways to represent these projections; sinograms are the best known and most widely used. The name sinogram comes from the fact that the Radon transform of a single point is a sinusoid. A sinogram is a collection of flat or single-slice projections arranged in a matrix, and the backprojection algorithm can be used to obtain tomographic images from sinograms (Gomes et al. 2020a; Hsieh 2003).
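For illustration, the sketch below implements a discretized version of the summation in Eq. 4 (plain, unfiltered backprojection): each pixel accumulates the projection value of the detector bin it maps to at every angle. The normalization of the detector axis to [−1, 1] and the nearest-bin interpolation are simplifying assumptions of this sketch.

```python
import numpy as np

def simple_backprojection(sinogram, thetas_deg, size):
    """Discrete, unfiltered backprojection of a sinogram (bins x angles)."""
    n_bins = sinogram.shape[0]
    xs = np.linspace(-1.0, 1.0, size)
    X, Y = np.meshgrid(xs, xs)
    image = np.zeros((size, size))
    for p, theta in zip(sinogram.T, np.deg2rad(thetas_deg)):
        # offset l = x*cos(theta) + y*sin(theta), mapped to the nearest detector bin
        l = X * np.cos(theta) + Y * np.sin(theta)
        bins = np.clip(((l + 1.0) / 2.0 * (n_bins - 1)).round().astype(int), 0, n_bins - 1)
        image += p[bins]
    return image / len(thetas_deg)
```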

Transfer learning
Training convolutional neural networks is computationally very expensive, both because of the high processing demand for adjusting the neuron weights and because of memory usage, due to the large number of layers and neurons. The greater the number of layers, the greater the number of parameters to be optimized. Furthermore, in many applications it is impractical to collect new data and rebuild models. Transfer learning can therefore be an interesting alternative. It is inspired by the human capacity to transfer knowledge between multiple activities: the main idea is that the learning acquired when training on one database can be used to solve other problems. Thus, it is possible to use deep learning without the need to perform numerous calculations (Bouti et al. 2020; Pan and Yang 2009).

Feature extraction: VGG16 and LeNet
The visual geometry group network (VGG16) is a 16-layer architecture designed for image classification. It has 13 convolutional layers, in which the convolution operation performed with 3 × 3 kernels can be described by Eq. 5:

$$P_{output} = W * P_{input} + b, \qquad (5)$$

where $P_{input}$ and $P_{output}$ are the pixels of the input and output images, respectively, and the parameters W and b are learned (Rangarajan and Purushothaman 2020). Like other deep architectures, the VGG16's convolutional layers act as feature extractors: initial layers extract simpler attributes, which are combined to form more complex attributes in deeper layers. Therefore, the network has multiple levels of representation, where each level represents the data in an increasingly abstract way, and the deeper layers can amplify aspects of the input data that are important in the classification process (LeCun et al. 2015; Polat and Özerdem 2020).
Each of the convolutional layers of the VGG16 is followed by a non-linear activation (Rectified Linear Unit, ReLU), and max-pooling layers close each convolutional block. The network ends with fully connected (FC) layers: the first FC layers have 4096 neurons each, while the last has a number of neurons corresponding to the number of classes of the problem. Finally, a Softmax layer gives the probability of each class (Özyurt 2020; Rangarajan and Purushothaman 2020).
Another deep architecture is LeNet, a simpler network than VGG16, originally developed for handwritten digit recognition. It consists of five layers: two convolutional layers and three fully connected layers (Priyadharshini et al. 2019). Each convolutional layer is followed by a pooling layer and an activation function (Kuo 2016).

Classification: decision trees and random forests
Decision trees are a type of supervised learning machine that can be used in both classification and regression problems. They analyze data through a series of attribute-related questions; each question is contained in a node, which branches into child nodes containing the possible answers. The questions thus form a hierarchy modeled as a tree: the starting point is the root node, with the highest hierarchical level, and the endpoints are the leaves, nodes with no children. The algorithm makes decisions following the tree structure. There are several types of decision trees, usually distinguished by the way the method traverses the tree (de Freitas Barbosa et al. 2021; Gomes et al. 2020b; Kingsford and Salzberg 2008). The random tree and random forest methods are among the most important tree-based models. According to Geurts et al. (2006), the random tree algorithm uses a single tree built by a stochastic process; the method considers only a few randomly selected features at each node of the tree. Random forests consist of a collection of predictive trees, where each tree depends on a group of random variables (Cutler et al. 2012). Each tree votes for a class, and the most voted class is chosen as the classifier's final prediction.

Data augmentation: SMOTE
SMOTE is a popular database augmentation method. The idea of the algorithm is to perform an interpolation between close instances, both belonging to the same class. Therefore, by introducing new examples, SMOTE aids in the generalizability of the classifier (Fernández et al. 2018). Due to its popularity, several works have applied the method, combined with other techniques and proposed modifications (Demidova and Klyueva 2017;Douzas et al. 2018;Jiang et al. 2016).
The idea behind SMOTE is as follows: the algorithm generates synthetic instances based on data similarities, rather than replicating samples. First, SMOTE selects a sample of the class to be augmented; we call this sample s₁. It then selects the k nearest neighbors of s₁ using the KNN algorithm with the Euclidean distance. One of these k neighbors is randomly selected; we call this second sample s₂. Finally, a synthetic sample s_new is generated from s₁ and s₂ according to Eq. 6:

$$s_{new} = s_1 + n \times (s_2 - s_1), \qquad n \in [0, 1], \qquad (6)$$

which means that the synthetic sample is spatially positioned between the samples that generated it (s₁ and s₂). Finally, the sample s_new is added to the database (Chawla et al. 2002).
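A short example of this augmentation step with the imbalanced-learn implementation of SMOTE is given below; the use of k = 2 neighbors matches the setting reported later, while the data shapes and requested class counts are placeholders of this sketch.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

# Placeholder data: 34 explicit features for two balanced classes.
X = np.random.randn(200, 34)
y = np.array([0] * 100 + [1] * 100)

# Request +50% instances per class (an example of a fixed percentage increase).
counts = {0: 150, 1: 150}
sm = SMOTE(k_neighbors=2, sampling_strategy=counts, random_state=0)
X_aug, y_aug = sm.fit_resample(X, y)   # synthetic rows appended to the originals
```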

Experiments setup
In this work, the codes for pre-processing the motor imagery signals were implemented in the scientific computing environment GNU/Octave. The attribute extraction/selection and classification steps were performed in the open-source Java library Weka. In the feature extraction step, the VGG16 and LeNet deep neural networks were used, pre-trained with the ImageNet and MNIST datasets, respectively. Feature extraction using VGG16 resulted in 1096 attributes, while extraction with LeNet resulted in 500 attributes.
In the best classifier investigation step, we tested different random forests. Each classifier configuration was tested under 30 runs using tenfold cross-validation on the test/train subset (70% of the instances). We also evaluated scenarios with more instances in the test/train subset. For this, we applied the SMOTE algorithm with 2 nearest neighbors, for increases of 50% and 150%. Finally, the best classifier for each train/test subset was tested in the validation subset.

Metrics
We chose two metrics to evaluate the performance of the experiments: accuracy and the kappa statistic. Accuracy is the probability that the experiment provides correct results, i.e., correctly classifies the motor imagery as right- or left-hand movement. In other words, it is the proportion of true positives and true negatives among all the results, calculated according to Eq. 7 (Gomes et al. 2021b; Santana et al. 2018):

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}, \qquad (7)$$

where TP, TN, FP, and FN are the true positives, true negatives, false positives, and false negatives, in this order. In addition, the kappa index is a measure that can also handle multiclass problems well. It is a statistical metric to assess the agreement between the obtained and expected results, defined as follows (Gomes et al. 2021b; Santana et al. 2018):

$$\kappa = \frac{\rho_o - \rho_e}{1 - \rho_e}, \qquad (8)$$

where $\rho_o$ is the observed agreement, or accuracy, and $\rho_e$ is the expected agreement, defined as follows (Gomes et al. 2021b; Santana et al. 2018):

$$\rho_e = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{(TP + TN + FP + FN)^2}. \qquad (9)$$

The kappa index can range from −1 to 1. Based on its value, the performance of the classifier can be rated between very poor and excellent, as shown in Table 2 (McHugh 2012).
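For reference, a small helper that computes both metrics from a binary confusion matrix, following Eqs. 7-9, could look like the sketch below.

```python
def accuracy_and_kappa(tp, tn, fp, fn):
    """Accuracy (Eq. 7) and kappa index (Eqs. 8-9) from a binary confusion matrix."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total                                            # observed agreement
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2  # expected agreement
    kappa = (acc - p_e) / (1 - p_e)
    return acc, kappa

print(accuracy_and_kappa(90, 85, 15, 10))  # example counts, not results from the paper
```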

Results
Initially, different signal processing and attribute extraction approaches were tested on EEG signals for classification of motor imagery using the ASTERI method. As can be seen in Table 3, regarding the pre-processing of signals, the different approaches included (or not) bandpass filters, wavelet decomposition, and statistical thresholds.
In this context, a fifth-order Butterworth bandpass filter was first used in setups 1, 3, 6, 7, 8, 9, and 10, to select the alpha and beta frequency bands (8-32 Hz). Many works with the 2b dataset include this approach (Ang et al. 2012; Hossain et al. 2016; Lee and Choi 2019), since motor movements are typically related to these frequency bands (Lee and Choi 2019).
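A possible implementation of this filtering step with SciPy is sketched below; the zero-phase (forward-backward) application and the 250 Hz sampling rate are assumptions of this sketch, since the text specifies only the filter type, order, and passband.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_8_32(eeg, fs=250, order=5):
    """Fifth-order Butterworth band-pass (8-32 Hz) applied to each channel."""
    b, a = butter(order, [8.0, 32.0], btype="bandpass", fs=fs)
    return filtfilt(b, a, eeg, axis=-1)   # zero-phase filtering (assumption)

filtered = bandpass_8_32(np.random.randn(3, 250 * 10))  # 3 channels, 10 s of placeholder data
```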
In addition to filtering, some of the signals were decomposed with Haar wavelets. Wavelet-based approaches are useful for identifying short-duration artifacts in signals such as EEG and may act in the removal of electrooculography artifacts, for instance (Bajaj et al. 2020; de Freitas Barbosa et al. 2020). The signals were decomposed into lowpass and highpass components using the wavelet function, and the lowpass components were recursively decomposed with the same function until the desired number of levels was reached. In this work, we considered decompositions into 2 or 10 levels. Setups 1, 2, 3, 8, and 9 included this type of analysis.
Among the decomposed signals, a statistical threshold was also applied to the wavelet components of setups 1, 2, 3, and 9. The idea in this case is to disregard the wavelet components that are above or below the established thresholds and then reconstruct the signals. Two threshold techniques were studied. The first, based on the standard deviation, set the threshold at ±1.5 times the standard deviation of the coefficients, as detailed in Eq. 10:

$$\lambda = \pm 1.5 \times STD(w), \qquad (10)$$

where STD means the standard deviation and w the wavelet coefficients. The second technique consisted of a smoother threshold based on the ATAR (Automatic and Tunable Artifact Removal) algorithm proposed by Bajaj et al. (2020). This algorithm provides different filtering modes and a tunable parameter, which allows for smoother filtering. The soft-thresholding function (Eq. 11) and the threshold θ_α (Eq. 12) follow the definitions given by Bajaj et al. (2020), with θ_γ = 0.8 θ_α, r the interquartile range of w, k₁ the 5% quantile of w, k₂ the 95% quantile of w, and β = 0.01.
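The sketch below illustrates the standard-deviation-based threshold of Eq. 10 with PyWavelets: the signal is decomposed with the Haar wavelet, detail coefficients beyond ±1.5 × STD are zeroed (one possible reading of "disregarded"), and the signal is reconstructed. The ATAR variant is not reproduced here; its exact thresholding functions are given in Bajaj et al. (2020).

```python
import numpy as np
import pywt

def wavelet_std_threshold(signal, wavelet="haar", level=2, k=1.5):
    """Haar decomposition followed by the +/- k x STD rule of Eq. 10."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    cleaned = [coeffs[0]]                      # keep the approximation band
    for w in coeffs[1:]:
        thr = k * np.std(w)
        cleaned.append(np.where(np.abs(w) > thr, 0.0, w))  # zero out-of-range coefficients
    return pywt.waverec(cleaned, wavelet)[: len(signal)]

denoised = wavelet_std_threshold(np.random.randn(250), level=2)
```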
As for feature extraction, two pre-trained networks were tested: VGG16 and LeNet. Exceptionally, in setup 5, we did not use deep networks for feature extraction; in this case, explicit time- and frequency-domain attributes were extracted. In setup 10, a fusion of attributes (implicit attributes extracted with VGG16 + explicit attributes) was applied.

Original dataset
To carry out the experiments with the 10 setups proposed in Table 3, one of the subjects belonging to the database was selected, in order to avoid presenting an excessive amount of experimental results. We selected subject 8 due to his good results in several studies (Dai et al. 2020; Li et al. 2017); it is therefore likely that this subject is the participant who best followed the instructions given in the motor imagery sessions and, consequently, is a good representative of the full set of subjects. The results for this participant are presented in the tables below. In these initial experiments with subject 8, we considered all signals for training and testing purposes.

Table 4: Accuracy values in the classification of motor imagery with EEG signals for subject 8. Different configurations of signal pre-processing and decision trees were used.
Table 5: Kappa index in the classification of motor imagery with EEG signals for subject 8. Different configurations of signal pre-processing and decision trees were used.

Table 4 presents the accuracy values, while Table 5 presents the kappa index values. Observing the mean values of both metrics, it is possible to notice that setup 10 outperformed the other setups, with an average accuracy of 74.53 ± 2.80 and an average kappa index of 0.49 ± 0.06. Considering the standard deviation of these values, setups 6 and 7 showed similar results. In contrast, the worst results were obtained by setups 8 and 9, with average accuracies of 63.56 ± 3.94 and 63.49 ± 3.20, respectively. It is also possible to observe an improvement in performance with the increase in the number of trees used: random forests with 100 trees reached the best classification performance.
The boxplots in Fig. 6 show a comparison of the ten configurations. In this case, only the results of the best classifier of each setup were plotted. For example, for setup 1, the boxplot refers to the random forest classifier with 100 trees, as it achieved the highest average accuracy. Again, boxplots show better performance for settings 6, 7, and 10.
Considering that setup 10 proved to be superior for subject 8, we carried out experiments with the other subjects in the database. This means that the signals from all subjects were subjected to Butterworth bandpass filtering and their attributes were extracted in two ways: explicitly (i.e., time- and frequency-domain features) and implicitly with the pre-trained VGG16 deep architecture, as pointed out earlier in Table 3.

Table 6: Classification accuracy for the nine subjects. Configuration 10 of pre-processing and attribute extraction was used. Different decision trees were tested, trained with subject-specific data.

Tables 6 and 7 show the accuracy and kappa index values found for each subject, again using random forest classifiers. These training and test experiments were performed with 70% of the original database instances, while the remaining signals were reserved for validation. For all nine subjects, the results were better when using 90 or 100 trees. In addition, both tables show that subjects 4 and 8 presented better classification performance than the other subjects, with average accuracies of 78.20 ± 3.14 and 73.40 ± 3.21, respectively. On the other hand, subjects 2 (50.54 ± 3.97) and 3 (50.49 ± 3.96) had lower results. The kappa index followed the same trend.
Table 7: Classification kappa index for the nine subjects. Configuration 10 of pre-processing and attribute extraction was used. Different decision trees were tested, trained with subject-specific data.
Fig. 7: Confusion matrices for all nine subjects, data from the original dataset. Validation was performed with a random forest classifier with 100 trees.

Finally, Fig. 7 presents validation confusion matrices for all nine subjects, considering the original dataset with only collected signals. Observing subject 4, for example, it is possible to notice that among the 260 instances referring to imagery of the right hand, 237 were correctly classified, while 23 were classified as imagery of the left hand. Considering the left-hand class for this subject, 91.4% of the instances were correctly classified. The highest rate of confusion was observed for subject 3, for whom 16.8% of the left-hand class was classified as right-hand class.

Augmented dataset with SMOTE
In the following, the same pre-processing methodology previously applied in the original database (i.e., using only collected signals) was reproduced for the augmented database, with the creation of synthetic instances using the SMOTE method (increase of 150% considering 2 neighbors). Tables 8 and 9 show the results of the classifiers considering only subject 8.
Unlike with the original database, configuration 6 stood out for the augmented database. In this case, the mean accuracy was 92.47 ± 1.19, and the mean kappa index was 0.85 ± 0.02. Considering the standard deviation, configuration 7 shows similar results. In contrast, configuration 4 presented results inferior to the others, with an average accuracy of 86.26 ± 1.39.
Again, boxplots with the best classifiers for each configuration were plotted, as shown in Fig. 8. The boxplots confirm the superior performance of configurations 6 and 7.
Finally, configuration 6 was chosen for the experiments with the other subjects. In this case, however, the datasets were augmented using two increase percentages: 50% and 150%. The augmentation was applied to each subject separately, and the percentage increase was applied to each class (right hand and left hand). Furthermore, SMOTE was applied only to the train/test subset (70% of the database). Both augmented databases were then evaluated with decision trees using tenfold cross-validation.
For the database with an increase of 150%, participants 2, 3, and 8 stood out, with accuracies around 93% and kappa indexes around 0.86, according to Tables 10 and 11. For the database with an increase of 50%, the results were intermediate: higher than those obtained with the original database, but lower than those obtained with the larger augmentation. In this case, subjects 4 and 8 showed better performance, as with the original dataset. For subject 4, the mean accuracy was 85.08 ± 2.36 (Table 12) and the mean kappa index was 0.70 ± 0.05 (Table 13). In contrast, subject 1 performed worst, with a mean accuracy of 84.35 ± 2.38 and a mean kappa index of 0.69 ± 0.05. Furthermore, Figs. 9 and 10 present validation confusion matrices for all nine subjects, considering the datasets expanded by 150% and 50%, respectively. In these cases, Random Forest with 100 trees was applied, considering its superior performance in most train/test cases.
By observing the confusion matrix from Fig. 9, we can see that all subjects had a good performance, with a hit rate of the "Right" class between 83.6% (Subject 3) and 94.4% (Subject 4). In the case of the "Left" class, the hit rate varied between 84.8% (Subject 2) and 95.9% (Subject 8).
In addition, the dataset augmented by 50% (Fig. 10) also presents great results. In this case, subject 9 had the best result in the classification of the "Right" class (94.4%), while subject 2 had the worst result for the same class (84.2%). Considering the "Left" class, the results achieved by all subjects were in the range of 85.9-96.0%. Finally, Table 14 presents a summary of all validation experiments, considering accuracy and kappa index. As detailed before, the best classifier configuration for each subject (obtained in the training/test step) was applied to the validation subset.

Table 8: Classification accuracy for subject 8. Different signal pre-processing and decision tree configurations were used for the database expanded by 150%.
Table 9: Classification kappa index for subject 8. Different signal pre-processing and decision tree configurations were used for the database expanded by 150%.
Fig. 8: Subject 8's motor imagery classification performance using the database augmented with the SMOTE method.
Table 10: Classification accuracy of the nine subjects, using the database augmented by 150%. Configuration 6 of pre-processing and attribute extraction was used. Different decision trees were trained with subject-specific data.

Discussion
According to the experimental results, configurations 10 and 6 stood out for the original database and for the databases extended with SMOTE, respectively. According to Table 3, both configurations use signal pre-processing with Butterworth bandpass filters, indicating the success of this technique in removing artifacts. On the other hand, neither configuration uses wavelet decomposition combined with statistical thresholds, which indicates that these approaches likely removed important information from the signals. Despite this, future experiments can apply a greater number of levels in the wavelet decomposition, since only 2 and 10 levels were tested; it is possible that this change will bring interesting results.
Using the SMOTE method to extend the database, further experiments were designed. However, when we look at the validation results, we obtained similar results with all three approaches (with and without expansion), as shown in Table 14. For subject 8, for instance, the method achieved classification accuracies of 93.23%, 92.86%, and 92.48% for the original dataset and the datasets expanded by 50% and 150%, respectively. With respect to the kappa index, it achieved 0.86 for both the original dataset and the 50% expansion and 0.85 for the third dataset. These results indicate that the collection of more data per participant is not necessary; that is, the data collected in database 2b of the BCI Competition IV are sufficient for training and generalization using the ASTERI method, preferably with the fusion of implicit and explicit attributes to improve results.
These results are interesting, especially in assistive technology rehabilitation scenarios, where acquiring small amounts of signals is less tiring and tedious. In addition, by increasing the collection time of the sessions, there is a risk of the volunteer becoming more dispersed and less efficient in the imagery of the indicated class. Consequently, the signal quality may also deteriorate.
Table 11: Kappa index in the classification of the nine subjects, using the database augmented by 150%. Configuration 6 of pre-processing and attribute extraction was used. Different decision trees were trained with subject-specific data.
Table 12: Accuracy in the classification of the nine subjects, using the database expanded by 50%. Configuration 6 of pre-processing and attribute extraction was used. Different decision trees were trained with subject-specific data.
Table 13: Kappa index in the classification of the nine subjects, using the database expanded by 50%. Configuration 6 of pre-processing and attribute extraction was used. Different decision trees were trained with subject-specific data.

Finally, Table 15 provides a comparison between the results presented in this work and other state-of-the-art works using the same database (BCI IV-2b). It is clear that the ASTERI method achieved results superior to those of many state-of-the-art works from the last 5 years. This result is valuable considering the hybrid nature of the ASTERI method, which combines deep networks for attribute extraction and decision trees for classification. Thus, EEG signal classification problems can be solved in a less complex way and with satisfactory results.
Furthermore, the work of Dai et al. (2020) also expanded the data, tripling their number; that is, the increase applied was even greater than in the two cases proposed in this work. The method used was not SMOTE: the idea was to recombine signal segments belonging to the same class and the same subject. Despite the relevant results, this method may have added high-frequency noise, since peaks and valleys of different signal segments may have been joined.
We also observed that many of these studies considered only accuracy as an evaluation metric (Dai et al. 2020; Li et al. 2017; Lu et al. 2016). However, the BCI competition uses the kappa index to compare results, as it is a more rigorous metric.

Conclusion
In this work, we propose a new method to represent EEG signals as images: the ASTERI method. ASTERI generates pseudo-sinograms, i.e., pre-processed signal windows on which the backprojection reconstruction method is applied, generating images. The main objective of this type of representation is to facilitate the use of deep architectures such as CNNs in applications with EEG signals.

Fig. 9: Confusion matrices for all nine subjects, data from the dataset expanded by 150%. Validation was performed with a random forest classifier with 100 trees.
To validate this approach, we applied the method to motor imagery signals from the BCI Competition IV 2b database. The signals were windowed, and then we tested different preprocessing approaches. Among the 10 approaches tested, the best results were found when using a bandpass filter (8-32 Hz) and extracting attributes with the pre-trained VGG16 network.
Thus, the average accuracy obtained among the nine volunteers was 88.97%, and the average kappa index was 0.78, using a random forest with 100 trees in the validation step. In addition to the experiments with the collected data, we also expanded the database with synthetic instances generated by the SMOTE method for each subject. For a 50% increase in the instances of each class, the average accuracy and kappa index were 89.10% and 0.78, respectively; for an increase of 150%, the average accuracy obtained was 89.13%. Thereby, we can see that the results with and without database expansion were very close with the ASTERI method. This shows that the signals collected in the motor imagery sessions of dataset 2b are sufficient for training, generalization, and subsequent control of brain-machine interfaces. This result is valuable in the sense that increasing the acquisition time can be tiring and affect the participants' concentration, deteriorating the quality of the signals.

Fig. 10: Confusion matrices for all nine subjects, data from the dataset expanded by 50%. Validation was performed with a random forest classifier with 100 trees.
Table 14: Validation results of the nine subjects, considering the databases with and without the addition of synthetic instances.
Table 15: Comparison of the proposed method with others of the state of the art.
Finally, these results indicate that ASTERI is effective in classifying motor imagery and generates results that are competitive with the state of the art. In addition, ASTERI is attractive for its low complexity, especially for the use of pre-trained deep networks combined with random forests and a single image reconstruction method for all types of EEG signals, regardless of the subject. In the future, the ASTERI method should be tested on other motor imagery databases and in other applications based on EEG signals.