EEG-based Measurement System for Student Engagement Detection in Learning 4.0

A wearable system for personalized EEG-based detection of engagement in Learning 4.0 is proposed. The system can be used to make an automated teaching platform adaptable to the cognitive and emotional conditions of the user: for example, the teaching strategy could be personalized by automatically modulating the proposed contents. The system is validated by an experimental case study on twenty-one students. The experimental task consisted in learning how a specific human-machine interface works, involving both the cognitive and motor skills of the participants. De facto standard stimuli, based on a cognitive task and background music, were employed to guarantee a metrologically founded reference. The proposed signal processing pipeline (Filter Bank, Common Spatial Pattern, and Support Vector Machine), in a within-subject approach, reaches almost 77 % average accuracy in detecting both cognitive and emotional engagement with a 3 s time window.


Introduction
Man's relationship with knowledge is increasingly mediated by technology. The digital era [1], the period of pervasive use of information and communication technologies in every area of life, has heavily impacted learning since the second half of the last century. Currently, the ongoing fourth industrial revolution (Industry 4.0) expands the role of technology in learning processes even further: automated teaching platforms can adapt to the user's skills in real time, and new-generation interfaces allow multi-sensorial interactions with virtual contents. In the pedagogical domain, the concept of "Learning 4.0" is emerging, and it is not just a marketing gimmick [2]. The 4.0 technologies are strongly impacting the creation, conservation, and transmission of knowledge [3]. In particular, the new immersive eXtended Reality (XR) solutions make it possible to achieve embodied learning by restoring the role of learning catalyst to bodily activities [4]. Furthermore, wearable transducers and embedded Artificial Intelligence (AI) increase real-time adaptivity in human-machine interaction [5]. Notably, in the Learning 4.0 context, the adaptation is reciprocal: the subject learns to use the human-machine interface, but the machine also adapts to the human by learning from her/him [6].
Traditionally, learning to use a new technological interface was a once-in-a-lifetime effort faced as a child. For many people this occurred while learning to read and write. Recently, the rapidity of technological evolution has entailed the need to learn how to use several interfaces. The joy-pad, the icon, the touch/multi-touch screen, and speech and gesture recognition are examples of the evolution of interfaces in their hardware and software components.
More specifically, learning to use an interface is a hard task requiring complex cognitive-motor skills. When human beings learned to use the mouse or the touchscreen, as well as when they learned to write, read, or speak, their minds learned complex cognitive-body patterns [7,8]. With the human-machine interfaces of older generations, the user was required to autonomously explore the different available resources and learn their use. Currently, 4.0 interfaces can adapt in real time to the user, supporting the learning process.
The effectiveness of the learning process mainly depends on the engagement level of the learner [9]. Therefore, engagement monitoring is a fundamental requirement for allowing the machine to adapt to the user.
In this context, engagement stands for concentrated attention, commitment, and active involvement, in contrast to apathy, lack of interest, or superficial participation [10,11]. In the learning context, Fred Newmann, in his report "Student Engagement and Achievement in American Secondary Schools", defines engagement as "the student's psychological investment in and effort directed toward learning, understanding, or mastering the knowledge, skills, or crafts that academic work is intended to promote" [12,13]. Moreover, Fredricks defines student engagement as a meta-construct that includes behavioral, emotional, and cognitive engagement [14].
As concerns engagement measurability, evaluation grids and self-assessment questionnaires (to be filled out by an observer or by the learner autonomously) are traditionally the most used methods for behavioral, cognitive, and emotional engagement detection [15]. In recent years, measures based on biosignals have been spreading very rapidly. Furthermore, the use of physiological sensors able to detect cognitive and emotional engagement enables real-time machine adaptation strategies. Among the different physiological biosignals, the EEG appears to be one of the most promising technologies thanks to its low cost, low invasiveness, and high temporal resolution. Moreover, the EEG contains a broader range of information about the state of a subject with respect to other biosignals [16]. In 1995, the authors of [17] proposed an engagement index to decide when to use the autopilot and when to switch to manual control during a flight simulator session. The engagement index was E = β/(α + θ), where α, β, and θ are the powers in the EEG frequency bands (8-13) Hz, (13-22) Hz, and (4-8) Hz, respectively. Several studies used this index as an engagement estimator also in learning contexts [15,18,19]. However, this index does not take into account the different engagement types (i.e., cognitive, emotional, and behavioral) proposed by the theories reported above.
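The classic index E = β/(α + θ) can be computed from band-power estimates of the EEG spectrum. The following is a minimal sketch, assuming a Welch periodogram for the power spectral density; all function names and the Welch segment length are illustrative choices, not taken from [17].

```python
import numpy as np
from scipy.signal import welch

def engagement_index(eeg, fs=512):
    """Classic engagement index E = beta / (alpha + theta) from [17].

    eeg: 1-D EEG signal from a single channel; fs: sampling rate (Sa/s).
    Band limits follow the text: theta (4-8) Hz, alpha (8-13) Hz, beta (13-22) Hz.
    """
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)

    def band_power(lo, hi):
        # Sum the PSD bins falling inside the band (uniform frequency grid)
        mask = (freqs >= lo) & (freqs < hi)
        return psd[mask].sum()

    theta = band_power(4, 8)
    alpha = band_power(8, 13)
    beta = band_power(13, 22)
    return beta / (alpha + theta)

# Sanity check: a pure 10 Hz (alpha-band) oscillation should yield a low index
t = np.arange(0, 3, 1 / 512)
alpha_wave = np.sin(2 * np.pi * 10 * t)
assert engagement_index(alpha_wave) < 1.0
```

In a multi-channel setting the index is typically averaged over channels; here a single channel is shown for brevity.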
In this study, a method for EEG-based cognitive and emotional engagement detection during learning activities is proposed. High wearability is guaranteed by a low number of dry electrodes. This property allows cognitive and emotional learning engagement detection in daily-life applications. Furthermore, the proposed method can also be used in traditional school contexts. Acquiring cognitive and emotional engagement data during lessons can provide, for example, (i) real-time feedback to the teacher, for maximizing class engagement, and (ii) student engagement trends over time, for adapting the academic program to individuals or to the whole class.
This work is organized as follows: in Section 2, a background on the engagement in the learning context is reported. In Section 3, the basic ideas, the proposed method, and the processing framework are described. Then, in Section 4, the experimental protocol and results are reported and discussed.

Background
In general, learning a new interface can be traced back to a classic learning problem. In the constructivist framework, learning consists in the construction of schemes: units of knowledge, each relating to a different aspect of the world, including actions, objects, and abstract concepts [20]. When a subject learns a specific pattern, the neuroplasticity process is activated, modifying the neural structure of the brain [21]. Once the process is learned, the brain builds a myelinated axon connection system to automate it. Adjacent neurons fire in unison, and the more an experience or operation is repeated, the stronger the synaptic link between neurons becomes [22]. The automated use of all mental processes, as well as the understanding and use of new technologies, occurs through the creation of neural diagrams and maps [23,24]. During life, humans learn new skills or modify already-learned ones by enriching the existing neural maps. Therefore, the introduction of increasingly innovative technologies requires continuous brain re-adaptation to new interfaces [25]. This effort is more effective when the learner is engaged. An engaged user learns in an optimal way, avoiding distractions and increasing mental performance [26,27].
In [28], three different types of engagement are proposed: behavioral, emotional, and cognitive. Behavioral engagement focuses on the observable actions during the learning process [29,30]. Emotional engagement regards the impact of emotions on the effectiveness of the cognitive process and on the sustainability of the effort for the user [31]. Cognitive engagement refers to the amount of cognitive resources spent by the user in a specific activity [30,32].
Different methods for learning engagement detection have been proposed in the literature [19]. For behavioral engagement assessment, observation grids (used to support direct observations or video analysis) were proposed [33,34]. For cognitive and emotional engagement assessment, self-assessment questionnaires and surveys (compiled autonomously by the user) were developed [35,36]. In recent years, alternative engagement assessment methods based on physiological sensors have become established: heart-rate variability, galvanic skin response, and EEG. Among these biosignals, the most promising for engagement assessment is the EEG. As already described, learning is based on a set of neurological changes, and the EEG offers the possibility of studying these neural modifications [15,37-40]. The EEG system is low-cost and non-invasive, and provides information on brain activity within milliseconds. It is now commonly used in many applications [41,42], including cognitive and emotional engagement assessment as well as the detection of the underlying elements: emotion recognition and cognitive load assessment, respectively [43-49].
To achieve a correct metrological reference for the EEG-based cognitive and emotional engagement constructs, a reproducibility problem arises. From the emotional point of view, when eliciting a specific emotion, the same stimulus often does not induce the same emotion in different subjects. The effectiveness of the induction can be verified by means of self-assessment questionnaires or scales. The combined use of standardized stimuli and the subject's self-assessment ratings can be an effective way to build a metrological reference for reliable EEG-based emotional engagement detection [50]. From the cognitive point of view, when the subject is learning, the working memory identifies the incoming information and the long-term memory constructs and stores new schemes on the basis of past ones. While already-built schemes decrease the working memory load, the construction of new schemes increases it [16,51]. Therefore, increasing difficulty levels allows different cognitive states to be induced: the cognitive engagement level grows as the difficulty of the proposed exercise increases.

Proposal
This study proposes an EEG-based cognitive and emotional engagement detection method during a learning task. In this section the basic ideas, the architecture, and the adopted processing framework are outlined.

Basic Ideas
The proposed method is based on the following key concepts:
- EEG-based subject-adaptive system: new input channels (EEG) to Intelligent Teaching Systems enhance adaptivity to the user in the context of Learning 4.0.
- Cognitive and emotional learning engagement detection: student engagement is assessed considering both cognitive and emotional aspects, according to the Fredricks theory [14].
- Within- and cross-subject designs: both approaches are experimentally validated, in order to pursue accuracy maximization or calibration-time minimization, respectively.
- Domain Adaptation procedure in the cross-subject case: a Transfer Component Analysis (TCA) [52] allows knowledge acquired about other subjects to be used to simplify the system calibration on a new subject.
- Wearable system: an ultralight wireless EEG device with few, dry electrodes maximizes wearability.
- Multi-factorial metrological reference: the system is calibrated by using (i) standardized strategies for inducing different levels of cognitive load, and (ii) a public acoustic stimuli dataset to elicit emotions. Moreover, the metrological reference of emotional engagement was confirmed by statistical analysis on the outputs of self-assessment questionnaires.
- Narrow EEG frequency intervals: the EEG feature resolution is improved by a 12-band Filter Bank, obtained by sub-dividing the six traditional EEG bands (delta, theta, sigma, alpha, beta, and gamma).

Architecture
The architecture of the proposed system is depicted in Fig. 1. Eight Active Dry Electrodes acquire the EEG signals directly from the scalp. Each channel is differential with respect to AFz (REF) and referred to Fpz (GND), according to the 10/20 international system. After transduction, the analog signals are conditioned by the Analog Front End. Next, they are digitized by the Analog-to-Digital Converter (ADC) and submitted to an Artifact Removal block based on an ICA algorithm. The signals are then sent via wireless Bluetooth transmission to the Data Processing stage. Here, suitable features are extracted by a 12-component Filter Bank. Two Support Vector Machine (SVM) classifiers receive the feature arrays from two trained Common Spatial Pattern (CSP) algorithms, for detecting the Cognitive and the Emotional Engagement, respectively. Only in the cross-subject case, a baseline removal followed by a TCA procedure is applied during the training stage of the classifiers.

Processing Framework
In this section, (i) the feature extraction and selection, (ii) the baseline removal and Domain Adaptation, and (iii) the classification are detailed.

Feature extraction and selection
In this work, a novel Filter Bank version [42] is adopted. EEG signals are acquired by an eight-channel device with a sample rate of 512 Sa/s. The acquired signals are then filtered by a filter bank composed of 12 infinite impulse response (IIR) band-pass Chebyshev type-2 filters with 4 Hz bandwidth, equally spaced from 0.5 to 48.5 Hz. Then, epochs are extracted using a time window of 3 s with an overlap of 1.5 s.
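The filter bank and the epoching step can be sketched as follows, assuming SciPy's Chebyshev type-2 design; the filter order and stop-band attenuation are illustrative choices, as the text does not specify them.

```python
import numpy as np
from scipy.signal import cheby2, sosfiltfilt

FS = 512  # Sa/s, as in the text

def make_filter_bank(n_bands=12, low=0.5, width=4.0, fs=FS, order=4, rs=30):
    """12 band-pass Chebyshev type-2 filters of 4 Hz bandwidth, spanning 0.5-48.5 Hz.

    order and rs (stop-band attenuation, dB) are illustrative, not from the paper.
    """
    sos_bank = []
    for k in range(n_bands):
        lo, hi = low + k * width, low + (k + 1) * width
        sos_bank.append(cheby2(order, rs, [lo, hi], btype="bandpass",
                               fs=fs, output="sos"))
    return sos_bank

def epochs(signal, fs=FS, win=3.0, overlap=1.5):
    """Split a (channels, samples) array into 3 s epochs with 1.5 s overlap."""
    step, size = int((win - overlap) * fs), int(win * fs)
    n = signal.shape[-1]
    return np.stack([signal[..., s:s + size]
                     for s in range(0, n - size + 1, step)])

x = np.random.randn(8, 45 * FS)     # one 45 s trial, 8 channels
bank = make_filter_bank()
filtered = [sosfiltfilt(sos, x, axis=-1) for sos in bank]
ep = epochs(x)
print(ep.shape)                     # (29, 8, 1536): 29 epochs per 45 s trial
```

With 8 such trials per subject, 8 × 29 = 232 epochs are obtained, matching the dataset size reported later in the paper.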
Then, a Common Spatial Pattern (CSP) [16] stage is applied. In a binary problem, CSP computes the covariance matrices of the two classes and simultaneously diagonalizes them, so that the corresponding eigenvalues of the two covariance matrices sum to 1. A projection matrix is then computed to map the input into a space where the differences between the class variances are maximized. More precisely, in a binary problem, the projected components are sorted by variance in decreasing or ascending order: the former when the projection matrix is applied to inputs belonging to the first class, the latter when the inputs belong to the second class [53].
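The simultaneous diagonalization above can be obtained with a generalized eigendecomposition. The following is a minimal sketch of CSP under that formulation, with log-variance features as a common (assumed, not paper-specified) feature choice; function names are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(X1, X2, n_components=4):
    """CSP spatial filters via generalized eigendecomposition.

    X1, X2: epochs of the two classes, shape (n_epochs, n_channels, n_samples).
    Solving C1 w = lambda (C1 + C2) w yields eigenvalues in [0, 1] whose
    class-wise counterparts sum to 1, as described in the text.
    """
    C1 = np.mean([np.cov(x) for x in X1], axis=0)
    C2 = np.mean([np.cov(x) for x in X2], axis=0)
    eigvals, eigvecs = eigh(C1, C1 + C2)
    W = eigvecs[:, np.argsort(eigvals)[::-1]]   # sort by decreasing variance ratio
    half = n_components // 2
    # The extreme filters are the most discriminative for each class
    keep = np.r_[0:half, W.shape[1] - half:W.shape[1]]
    return W[:, keep].T

def csp_features(W, X):
    """Normalized log-variance of the projected epochs (a common feature choice)."""
    Z = np.einsum("fc,ecs->efs", W, X)
    var = Z.var(axis=-1)
    return np.log(var / var.sum(axis=-1, keepdims=True))

# Synthetic check: class 1 has high variance on channel 0, class 2 on channel 1
rng = np.random.default_rng(0)
X1 = rng.normal(size=(20, 4, 256)); X1[:, 0, :] *= 5.0
X2 = rng.normal(size=(20, 4, 256)); X2[:, 1, :] *= 5.0
W = csp_filters(X1, X2, n_components=2)
print(W.shape)   # (2, 4): two spatial filters over four channels
```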

Baseline removal and Domain Adaptation
A cross-subject approach has several advantages with respect to a within-subject one, such as the reduction of the initial calibration time. Unfortunately, the non-stationary nature of the EEG signal leads to greater data variability between subjects. This is a well-known problem in the literature, which makes the cross-subject approach a very challenging task [54]. Currently, Domain Adaptation methods [55] are receiving great attention from the scientific community. In this work, Transfer Component Analysis (TCA) [52] is adopted. TCA is a well-established domain adaptation technique, already used in the EEG signal classification literature with promising results [54]. In a nutshell, TCA searches for a common latent space between data sampled from two different (but related) distributions while preserving data properties. More in detail, TCA searches for a data projection φ that minimizes the Maximum Mean Discrepancy (MMD) between the two distributions, that is:

MMD = ‖ (1/n_S) Σ_{i=1}^{n_S} φ(x_{S_i}) − (1/n_T) Σ_{i=1}^{n_T} φ(x_{T_i}) ‖²

where n_S and n_T are the numbers of points in the first (source) and second (target) domain respectively, while x_{S_i} and x_{T_i} are the i-th points (epochs) in the two sets. The data projected into the new latent space are then used as input for the classification pipeline. However, TCA works with only two domains, unlike a multiple-subject environment, which can lead to a domain composed of several sub-domains generated by the different subjects or sessions. In [54], TCA was tested by considering as the first domain a subset of samples from N − 1 subjects, where N is the total number of subjects, and the data of the remaining subject as the other domain. However, this approach does not take into consideration the fact that different subjects may belong to very different domains, leading to poor results. A simple solution consists in subtracting from each subject a baseline signal recorded from the user, for example, in a rest condition.
However, this solution requires a new acquisition for each subject. Instead, in this work, the average of each subject's signals is used as the baseline, thus avoiding the need for new signal acquisitions.
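The per-subject average removal can be sketched as follows, assuming the epochs (or their feature vectors) are tagged with subject identifiers; the function name is illustrative.

```python
import numpy as np

def remove_subject_baseline(features, subject_ids):
    """Subtract each subject's average feature vector from that subject's epochs.

    features: (n_epochs, n_features); subject_ids: (n_epochs,) subject labels.
    The per-subject mean over all available epochs plays the role of the
    baseline, so no dedicated rest-state recording is needed.
    """
    out = np.asarray(features, dtype=float).copy()
    for s in np.unique(subject_ids):
        mask = subject_ids == s
        out[mask] -= out[mask].mean(axis=0)
    return out

# Two synthetic "subjects" whose features differ by a constant offset
rng = np.random.default_rng(0)
subjects = np.repeat([0, 1], 10)
feats = rng.normal(size=(20, 3)) + subjects[:, None] * 10.0
centered = remove_subject_baseline(feats, subjects)
```

After centering, the between-subject offset is gone while the within-subject variability is preserved, which is precisely what reduces the fragmentation observed in the cross-subject experiments.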

Classification
For the classification stage, Support Vector Machines (SVMs) [56] are implemented. Considering inputs as points in a vector space, an SVM is a binary classifier which discriminates data according to a decision hyperplane. Differently from other hyperplane-based classifiers, an SVM finds the hyperplane maximizing the separation between the classes, i.e., the hyperplane with the largest distance from the class margins.
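A toy illustration of the maximum-margin idea, using scikit-learn's SVM on two synthetic clusters standing in for the two engagement levels (nothing here reproduces the paper's actual configuration):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated synthetic clusters standing in for the two classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)),   # "low" class
               rng.normal(+2.0, 1.0, (50, 2))])  # "high" class
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear").fit(X, y)
# The support vectors are the points lying on (or inside) the class margins
print(len(clf.support_vectors_))
```

Only the support vectors determine the decision hyperplane; the remaining points could be removed without changing the classifier.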

Protocol
Twenty-one school-age subjects (9 males and 13 females, 23.7 ± 4.1 years) participated in the experiment. The ethical committee of the University of Naples Federico II approved the experimental protocol. All methods were performed in accordance with the relevant guidelines and regulations. Before the experiment, each subject read and signed the informed consent. No volunteers had neurological diseases. Each subject was seated in a comfortable chair at a distance of 1 m from the computer screen. The location was sanitized before and after each acquisition, as indicated in the COVID-19 academic protocols. Each subject was equipped with a mouse to carry out the experimental test. After wearing the EEG cap, the contact impedance was assessed to guarantee optimal signal-acquisition conditions. Each subject underwent an experimental session composed of 8 trials. Various stimuli inducing high and low levels of emotional and cognitive engagement were equally distributed among the trials. As the stimulus modulating the cognitive engagement level, an updated and revised Continuous Performance Test (CPT) [57] was administered. In particular, a CPT version based on a learning-by-doing activity on how an interface works was adopted. Proper background music and social feedback were instead used to modulate the emotional engagement level. More in detail, the three different stimuli are described as follows:
- Revised CPT: a red cross and a black circle were presented to the subject on the computer screen. The red cross tends to run out of the circle in random directions. The subject was asked to keep the cross inside the circle by using the mouse. For each trial, a different difficulty level was set by the experimenter by changing the cross speed. The percentage of time spent by the red cross inside the black circle, with respect to the total time, was reported to the subject at the end of the trial (Fig. 2).
- Background music: for each trial, a particular emotional engagement level was favored by proper background music. The music tracks were randomly selected from the MER [58] database, where songs are organized according to the 4 quadrants of Russell's circumplex model of emotion [59]. The songs associated with the Q1 and Q4 quadrants (cheerful music) were employed in high emotional engagement trials, and those of Q2 and Q3 (sad music) in the low ones.
- Social feedback: during each trial, the experimenters gave proper social feedback according to the emotional engagement level prescribed by the experimental protocol. The positive and negative social feedback consisted of encouraging and disheartening comments, respectively, given to the subject on his/her ongoing performance. The social feedback effectiveness was reinforced by the simultaneous background music.
A well-founded metrological reference is ensured by two assessment procedures validating the stimuli effectiveness:
- Performance index: an empirical threshold was used to confirm that an appropriate CPT stimulus response was given by the participant. The threshold changed according to the trial difficulty level.
- Self-Assessment Manikin (SAM) questionnaire: the emotional engagement level was assessed by a 9-level version of the SAM. The lowest emotional engagement level was associated with SAM score 1, the highest with score 9.
The experimental session started with the administration of the SAM, to get information about the initial emotional condition of the subject. Then, a preliminary CPT training phase was carried out to level out the participants' starting abilities. After this preliminary phase, each trial consisted of a CPT stage followed by a SAM administration.

Dataset building
The EEG signals, acquired over 45 s per trial, were labeled according to two parameters: (i) high or low emotional engagement, and (ii) high or low cognitive engagement. More in detail, regarding cognitive engagement, the trials were labeled according to the CPT speed [51,60], since the higher the speed, the more the cognitive engagement increases [16,51]. Trials with a speed lower than 150 pixels/s were labeled as low_c, whereas high_c was assigned to trials with a speed higher than 300 pixels/s. As concerns emotional engagement, the trials characterized by cheerful/sad music and positive/negative social feedback were labeled as high_e/low_e. For each trial, the SAM results (normalized to the initial pre-session value) were consistent with the proposed stimuli. In fact, a one-tailed Student's t-test revealed, in the worst case, a p-value of 0.02.

EEG Instrumentation
The AB-Medica Helmate system, Class IIA (certified according to the Regulation on Medical Devices (EU) 2017/745), is used for the EEG signal measurements [61] (Fig. 3a). The device provides 10 dry electrodes placed according to the 10/20 International Positioning System: Fp1, Fp2, Fz, Cz, C3, C4, O1, O2, AFz (reference), and Fpz (ground). The signals are differentially acquired with respect to the AFz electrode and grounded to the Fpz electrode. The electrodes (made of conductive rubber with an Ag/AgCl coating) have three different shapes, to minimize the contact impedance in each scalp area (Fig. 3b). The Helm8 AB-Medica Software Manager [61] allows the user to (i) verify the contact impedance level, and (ii) apply several digital filters for real-time visual signal analysis. The EEG signals are acquired with a 512 Sa/s sampling rate and sent via Bluetooth to a computation device.

Data Processing
An artifact removal stage preceded the feature extraction and classification stages. Independent Component Analysis (ICA) was used to filter out the artifacts from the EEG signals, using the Runica module of the EEGLab tool [62]. Then, the data were normalized by subtracting their mean and dividing by their standard deviation.

Feature extraction
EEG data were divided into epochs of 3 s, overlapping by 1.5 s. Owing to the sampling rate of 512 Sa/s, 232 epochs of 1536 samples per channel were extracted for each subject. Five different strategies were compared:
1. Butterworth - Principal Component Analysis (BPCA): data were filtered by a fourth-order band-pass Butterworth filter at (0.5-45) Hz; then, the relevant features were extracted using Principal Component Analysis (PCA) [63], selecting the components explaining 95 % of the total variance;
2. Butterworth - CSP (BCSP): data were filtered using a fourth-order band-pass Butterworth filter at (0.5-45) Hz, followed by a CSP projection stage;
3. Filter Bank - CSP (FBCSP): data were filtered through a filter bank of 12 IIR band-pass Chebyshev type-2 filters with a 4 Hz bandwidth, equally spaced from 0.5 to 48.5 Hz, followed by a CSP projection stage;
4. Domain adaptation: in the cross-subject approach only, a baseline removal and a TCA were adopted;
5. Engagement Index: to make a comparison with the classical literature approach, the engagement index proposed in [17] was extracted. Although the Engagement Index was not defined for a particular engagement type, given the experimental setup proposed in [17], it can be assumed compatible with the cognitive engagement considered in this work.
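The PCA selection step in the BPCA strategy can be sketched with scikit-learn, where a fractional `n_components` directly encodes the 95 % explained-variance criterion. The synthetic low-rank matrix below stands in for the real (flattened) filtered epochs; the exact input layout is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

# Low-rank synthetic features: 5 true underlying sources mixed into 64 dims
rng = np.random.default_rng(0)
latent = rng.normal(size=(232, 5))
mixing = rng.normal(size=(5, 64))
features = latent @ mixing + 0.01 * rng.normal(size=(232, 64))

pca = PCA(n_components=0.95)    # fraction -> explained-variance threshold
reduced = pca.fit_transform(features)
print(reduced.shape)            # far fewer than 64 columns are retained
```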

Classification
The output of the classification stage can be "high" or "low", both for cognitive and for emotional engagement. For each feature selection strategy shown in the previous subsection, four different classifiers were compared: SVM, k-Nearest Neighbors (k-NN) [64], shallow Artificial Neural Networks (ANN) [65], and Linear Discriminant Analysis (LDA) [64]. Each combination of feature selection strategy and classifier was used on both emotional and cognitive engagement. The best model was selected by a stratified leave-2-trials-out technique, in order to maintain class balance in each fold. A grid search strategy was adopted for the hyperparameter tuning of each classifier (Table 1).
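A minimal grid-search sketch on synthetic features follows. The hyperparameter grid is illustrative (the actual grids are in Table 1 of the paper), and `StratifiedKFold` is used here as a simple stand-in for the stratified leave-2-trials-out scheme.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic two-class features standing in for the FBCSP outputs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (60, 10)),
               rng.normal(+1.0, 1.0, (60, 10))])
y = np.array([0] * 60 + [1] * 60)

# Illustrative grid; cross-validated accuracy selects the best combination
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=StratifiedKFold(n_splits=5))
search.fit(X, y)
print(search.best_params_)
```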

Experimental Results
In this section, the experimental results obtained in within-and cross-subject cases are reported.

Within-subjects
Firstly, to make a comparison with the classical literature approach, the engagement index proposed in [17] was used as a feature for classifying cognitive engagement. Unfortunately, as highlighted by the results reported in Tab. 2, the accuracy performance was not optimal. In fact, this feature is mainly used in non-predictive applications (e.g., [19]). Instead, the best results on both cognitive and emotional engagement (Fig. 4) were achieved using features extracted by Filter Bank and CSP. Quantitative results related to the use of Filter Bank and CSP for each classifier can be observed in Tab. 3: among the different classifiers, the SVM stands out, reaching its best mean accuracies of 76.9 ± 10.2 % on cognitive engagement classification and 76.7 ± 10.0 % on emotional engagement. Results are computed as the average accuracy over all subjects. The results reported in Fig. 4b show that the Filter Bank improves the classification performance in a significant way. This can be due to the use of several sub-bands, which highlight the main characteristics of the signal, allowing the CSP computation to project the subject data into a more discriminative common space. In Fig. 5, BCSP and FBCSP are compared through t-SNE [66] applied to the subject data transformed with the two different methods. The figure shows that, for several subjects, CSP applied after the Filter Bank projects the data into a space where they are more easily separable than in the BCSP case.

Cross-subject approach
A t-SNE plot of the data before and after removing the average value of each subject is shown in Fig. 6. The data without per-subject average removal (Fig. 6a) are disposed in several clusters over the t-SNE space, exhibiting a fragmentation tendency. Instead, after the per-subject average removal (Fig. 6b), the data are more homogeneous, enhancing the model generalizability. A comparison using TCA with and without the per-subject average removal was made, and the resulting performances are reported in Tab. 4. The results show that removing the per-subject average boosts the performance with respect to using TCA alone (more than 3 % of improvement in almost all classifiers, especially in the Cognitive Engagement case).
Table 4: Cross-subject experimental results using FBCSP followed by TCA. Accuracies are reported with and without per-subject average removal, for cognitive and emotional engagement detection. The best performance values are highlighted in bold.

CONCLUSION
In this work, a wearable system for personalized EEG-based cognitive and emotional engagement detection is proposed. The system can be used in the context of Learning 4.0 as a new input channel of an adaptive automated teaching platform to improve the learning effectiveness. The wearability is guaranteed by a wireless cap with dry electrodes and 8 data acquisition channels.
The system was validated on students during a training stage involving cognitive and motor skills and aimed at learning how to use a human-machine interface. Standard stimuli, a performance indicator, and self-assessment questionnaires were employed to guarantee a well-founded metrological reference. The proposed method, based on Filter Bank, CSP, and SVM, experimentally showed the best performance. In particular, in the cross-subject case, average accuracies of 72.8 % and 66.2 % were reached for cognitive and emotional engagement, respectively, by using TCA and per-subject average removal. In the within-subject case, accuracies of 76.9 % and 76.7 % were reached for cognitive and emotional engagement, respectively.