Artificial Intelligence for Emotion Recognition in EEG
Artificial intelligence (AI) has made significant progress in recent years, enabling the development of systems that can accurately perform tasks such as speech and image recognition. One area of AI that has received particular attention is emotion recognition, which aims to identify and classify human emotional states from various input modalities, giving rise to a new branch of AI: affective computing1.
The first attempts in this field focused on recognizing people's emotions from their facial expressions2,3; more recently, many natural language processing (NLP) models have been developed to assess the emotional valence of words or sentences4.
Electroencephalography (EEG), a neurophysiological technique that measures the spontaneous electrical activity of the brain, has proven useful in the field of emotion recognition5 due to its sensitivity to emotional changes6. For instance, studying emotions through physiological signals can be particularly valuable for individuals who have difficulty expressing emotions through facial expressions or speech, such as people with autism spectrum disorder traits7. Thus, a tool that can translate emotions into feedback understandable by therapists or parents could be extremely helpful.
However, classifying EEG data is challenging for several reasons. The EEG signal not only contains actual brain activity but, if not properly preprocessed, is also contaminated by noise and artifacts8. Additionally, the EEG signal is non-linear9, meaning that linear models may have limited effectiveness in describing it. It is also non-stationary9, meaning that its statistical properties vary over time; this can make it difficult for models trained on temporally limited EEG data to generalize across time or across people. Finally, there is high inter-subject variability in the EEG signal, which can drastically affect the performance of a model when it is evaluated on different subjects10.
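To make the preprocessing issue concrete, the following is a minimal sketch of a standard EEG cleaning pipeline with MNE-Python; the file name, filter bands, and ICA settings are illustrative assumptions, not the pipeline used in this study.

```python
# Minimal EEG cleaning sketch with MNE-Python (illustrative parameters;
# the file name and filter settings are assumptions, not this study's pipeline).
import mne

raw = mne.io.read_raw_fif("subject01_raw.fif", preload=True)  # hypothetical file
raw.filter(l_freq=1.0, h_freq=40.0)   # band-pass: remove slow drift and high-frequency noise
raw.notch_filter(freqs=50.0)          # notch: remove power-line interference (50 Hz in Europe)

# ICA to attenuate ocular and muscular artifacts
ica = mne.preprocessing.ICA(n_components=15, random_state=0)
ica.fit(raw)
# components to exclude would normally be chosen by inspection or automated scoring
```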
Taken together, these characteristics make it difficult to obtain good results with machine learning or deep learning techniques.
One common approach to simplifying the input data is feature extraction. This involves extracting relevant features from the data, such as time-domain features like event-related potentials (ERPs)11. For example, some studies have employed the amplitude (P200, P300) and latency (N100, N200) of visual ERPs (positive and negative voltage deflections) as features linked to emotions12,13. Other commonly used features lie in the frequency domain and can be identified through power spectrum analysis, which has revealed elements useful for determining emotions from the EEG signal14. In one recent paper15, the authors extracted five frequency bands and trained a Random Forest (RF) model on those features, achieving an accuracy of 70% in classifying emotional valence. Another study, by Pachori and colleagues16, classified emotions (negative, neutral, or positive) with 94% accuracy by decomposing a multivariate emotion EEG signal into sub-band signals using the Fourier-Bessel series expansion (FBSE) based empirical wavelet transform (FBSE-EWT)17.
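As an illustration of this feature-extraction route, the sketch below computes mean band power in five standard frequency bands and feeds it to a Random Forest; band edges, sampling rate, and variable names are illustrative assumptions rather than the exact method of the cited studies.

```python
# Sketch: band-power features + Random Forest for valence classification.
# X is assumed to be (n_trials, n_channels, n_samples), sampled at fs Hz.
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_power_features(X, fs):
    freqs, psd = welch(X, fs=fs, axis=-1)            # power spectral density per channel
    feats = [psd[..., (freqs >= lo) & (freqs < hi)].mean(axis=-1)
             for lo, hi in BANDS.values()]           # mean power in each band
    return np.concatenate(feats, axis=-1)            # (n_trials, n_channels * n_bands)

# X_train, y_train would come from epoched EEG and valence labels (hypothetical names)
# clf = RandomForestClassifier(n_estimators=200).fit(band_power_features(X_train, fs=256), y_train)
```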
Deep Learning methods for Emotion Recognition
Although feature extraction can be a promising way to simplify EEG data, another, more challenging method uses the raw data as input to the model. In this case, the architecture of the model itself extracts the most important features. Convolutional neural networks (CNNs) are often used for this purpose18, as they can extract both local, low-level features and global, high-level features from raw input data19. In binary classification of emotional valence, accuracies below 70% are often considered good results with this approach18, even when feature extraction techniques are used. Another approach, less common in the literature, is to classify emotional valence with a combination of a CNN and a recurrent neural network (RNN). RNNs have an internal state that allows them to learn long-term dependencies and temporal patterns in the data20. In a recent work21, the authors used a particular type of RNN, a Long Short-Term Memory network (LSTM)22, which handles long sequences efficiently and avoids the vanishing gradient problem, i.e., the situation in which the gradients of the loss function with respect to the weights of the earlier layers become very small23. With this method, the researchers reached an accuracy of around 70% for emotional valence.
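The following PyTorch sketch shows the general shape of such a CNN + LSTM classifier on raw EEG; all layer sizes are illustrative assumptions and do not reproduce the architectures evaluated in the cited work or in this study.

```python
# Sketch of a CNN + LSTM classifier on raw EEG, in PyTorch (illustrative sizes).
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, n_channels=32, n_classes=2):
        super().__init__()
        # temporal convolutions extract local, low-level features from the raw signal
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(4),
        )
        # the recurrent layer models temporal dependencies across the feature sequence
        self.rnn = nn.LSTM(input_size=128, hidden_size=64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                  # x: (batch, channels, time)
        h = self.conv(x).transpose(1, 2)   # -> (batch, time', features)
        _, (hn, _) = self.rnn(h)           # last hidden state summarizes the sequence
        return self.fc(hn[-1])
```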
One promising approach could be a CNN + GRU (Gated Recurrent Unit), a combination that has scarcely been used in emotion classification; however, analyses of EEG sleep data24 suggest that this type of model could be particularly useful for these purposes.
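Under the same assumptions as the sketch above, a CNN + GRU variant only requires swapping the recurrent layer; note that a GRU returns a single hidden state rather than the LSTM's (hidden, cell) pair.

```python
# CNN + GRU variant: replace the LSTM layer in the sketch above with
self.rnn = nn.GRU(input_size=128, hidden_size=64, batch_first=True)
# ...and in forward(), unpack a single hidden state:  _, hn = self.rnn(h)
```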
Recently, researchers have begun to use Transformer neural networks25 for EEG analysis. Originally developed for natural language processing26, these models have also been adapted to time-series input data27 and, more recently, to EEG data. A recent study28 achieved impressive results using Transformer networks to classify mental overload and to predict age and gender from raw EEG data, with an accuracy of around 90%. However, there are currently no studies on the use of Transformers for emotion classification.
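A minimal sketch of a Transformer encoder applied to raw EEG windows is shown below; the embedding and layer sizes are illustrative assumptions, positional encodings are omitted for brevity, and the cited studies use their own architectures.

```python
# Sketch: Transformer encoder over raw EEG windows, in PyTorch (illustrative sizes;
# positional encodings omitted for brevity).
import torch
import torch.nn as nn

class EEGTransformer(nn.Module):
    def __init__(self, n_channels=32, d_model=64, n_classes=2):
        super().__init__()
        self.embed = nn.Linear(n_channels, d_model)   # per-time-step embedding
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, x):                 # x: (batch, time, channels)
        z = self.encoder(self.embed(x))   # self-attention across time steps
        return self.fc(z.mean(dim=1))     # average-pool the sequence, then classify
```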
Both feature extraction and raw-signal methods have advantages and disadvantages. Feature extraction usually yields more accurate results, but the signal is extensively processed, making it difficult to use in a future Brain Computer Interface (BCI) system. A BCI could be a great asset in particular settings, such as interventions for autistic people29, who may express their emotions in ways that are misleading for the therapist30. Using the raw signal, on the other hand, produces less accurate results, but could be more useful in this context.
Emotion Regulation
In this paper, we used an EEG emotion regulation (ER) task to classify the perceived emotional valence of emotional pictures and the ER strategy adopted. We asked participants to assess the emotional valence of 60 pictures while adopting two ER strategies (and a control condition). We adopted a novel approach aimed at identifying the best methodology for classifying EEG data after minimal preprocessing.
ER refers to the “extrinsic and intrinsic processes responsible for monitoring, evaluating, and modifying emotional reactions, especially their intensity and duration”31. It plays an important role in everyone’s life: many studies have highlighted the correlation between healthy ER strategies and social and affective adaptation32,33, how ER affects decision-making34 and coping with stress35, and its relationship with the severity of symptoms in conditions such as post-traumatic stress disorder (PTSD)36 and attention deficit hyperactivity disorder (ADHD)37. Moreover, the typical states that characterize mood and anxiety disorders often depend on emotion dysregulation38.
Two of the most studied ER strategies are cognitive reappraisal and expressive suppression39. Cognitive reappraisal, an antecedent-focused ER strategy, refers to the attempt to reinterpret an emotion-eliciting situation in a way that changes its meaning and emotional impact33,40. Expressive suppression, on the other hand, is a response-focused strategy and can be defined as the attempt to hide, inhibit, or reduce ongoing emotion-expressive behaviour (such as facial expressions, verbal utterances, and gestures)41.
Data-centric approach
Since emotion and ER classification are still challenging tasks for machine learning models, the first aim of our study was to use a data-centric approach to feed the machine learning models the best possible version of the input EEG data.
The first data-centric technique we used was confident learning, a method that identifies and corrects label errors in any dataset, using any model42. Confident learning can help models avoid learning from unreliable or inconsistent labels, which can degrade their accuracy and generalization. In the same vein, we also used curriculum learning, a training strategy that orders data samples from easy to hard based on criteria such as label noise or the ambiguity of the examples42. The concept of curriculum learning was first introduced in a data-centric framework by Bengio and colleagues43, who proposed training neural networks by presenting examples in a predefined order of difficulty. This approach was inspired by curriculum development in education, which involves designing a sequence of learning experiences that gradually build on one another. Since then, curriculum learning has been extensively studied in the machine learning literature, with researchers exploring a variety of strategies for selecting the order in which training examples are presented44. Curriculum learning can help models focus on simpler concepts first and gradually progress to more complex ones, without being overwhelmed by noise or ambiguity.
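The sketch below illustrates both ideas, assuming out-of-sample predicted probabilities (e.g., from cross-validation) and integer labels stored in NumPy arrays; the cleanlab library implements confident learning, while the curriculum criterion shown here (ordering by model confidence) is only one of the possible difficulty measures.

```python
# Sketch: confident learning with cleanlab to flag likely label errors, then a
# simple easy-to-hard curriculum (variable names are illustrative assumptions).
import numpy as np
from cleanlab.filter import find_label_issues

# pred_probs: (n_samples, n_classes) out-of-sample predicted probabilities,
# e.g., from cross-validation; labels: the given (possibly noisy) integer labels
issue_mask = find_label_issues(labels=labels, pred_probs=pred_probs,
                               return_indices_ranked_by=None)  # boolean mask

clean_idx = np.where(~issue_mask)[0]    # drop (or relabel) the flagged samples

# curriculum: easy samples first, i.e., those the model already predicts confidently
confidence = pred_probs[clean_idx, labels[clean_idx]]
curriculum_order = clean_idx[np.argsort(-confidence)]  # high confidence -> low
```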
Indeed, our assumption is that both assessing emotional valence and adopting the right ER strategy can be difficult for a naïve subject, so a promising way to approach this problem is to study not only the EEG variation but also the quality of the data and of the assigned labels.
Explainable AI (XAI)
The second aim of this study was to use explainable AI (XAI) to study how different models make classification decisions when classifying emotion and ER strategy from raw EEG data. For XAI, we employed the Integrated Gradients (IG) approach45 to study the deep learning models we trained. IG requires no modification to the original network (it is model-agnostic) and is extremely simple to implement. Its objective is to assign each input feature an attribution score indicating how positively or negatively it is related to each prediction. It is based on two axioms: (i) Sensitivity: if changing one feature changes the classification output, then that feature should receive a non-zero attribution; this makes sense because a feature that changes the output must have played a role. (ii) Implementation invariance: the result of the attribution method should not depend on the specifics of the neural network architecture45.
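Formally, the IG attribution for feature i of an input x integrates the gradients of the model output F along the straight path from a baseline input x' to x:

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha$$

In practice, IG is available off the shelf; the sketch below uses the Captum library for PyTorch, with placeholder names for the trained model, the input batch, and the target class.

```python
# Sketch: computing IG attributions with Captum for a trained PyTorch model
# (model, eeg_batch, and predicted_class are placeholder names).
import torch
from captum.attr import IntegratedGradients

ig = IntegratedGradients(model)           # no modification to the network needed
baseline = torch.zeros_like(eeg_batch)    # a common choice: all-zero baseline
attributions = ig.attribute(eeg_batch, baselines=baseline, target=predicted_class)
# attributions has the same shape as the input: one signed score per channel/time point
```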
Hypothesis
Our hypothesis is that adopting these two novel approaches to emotion recognition, namely the data-centric approach and XAI, could lead to less biased networks, improve their learning, and help us understand how their predictions are made: curriculum learning could help models focus on simpler concepts before gradually progressing to more complex ones; confident learning could help them avoid learning from unreliable or inconsistent labels; and IG could enhance our understanding of how the predictions of different models are made. In this study, we compared four different architectures: CNN, CNN + RNN, CNN + bidirectional RNN, and Transformer. In summary, we propose a novel approach that combines curriculum learning, confident learning, and IG for emotion classification, using different models, while participants adopted two ER strategies, i.e., expressive suppression and cognitive reappraisal, and a control condition that involved simply looking at the stimuli.