Facial Expression and Electrodermal Activity Analysis for Continuous Pain Intensity Monitoring On The X-ITE Pain Database

Prior studies have shown that automatic systems enable continuous monitoring of patients’ pain intensity. Facial expression and physiological data such as electrodermal activity (EDA) are very informative for pain recognition. The features extracted from EDA indicate the stress and anxiety caused by different levels of pain. In this paper, we investigate using the EDA modality alone and fusing two modalities (frontal RGB video and EDA) for continuous pain intensity recognition with the X-ITE Pain Database. Further, we compare the performance of automated models before and after reducing the imbalance problem in heat and electrical pain datasets that include phasic (short) and tonic (long) stimuli. We use three distinct real-time methods: Random Forest (RF) baseline methods [Random Forest classifier (RFc) and Random Forest regression (RFr)], a Long-Short Term Memory network (LSTM), and an LSTM using the sample weighting method (called LSTM-SW). The experimental results (1) are the first results of continuous pain intensity recognition using EDA data on the X-ITE Pain Database, (2) show that LSTM and LSTM-SW outperform guessing and the baseline methods (RFc and RFr), (3) confirm that EDA is the best modality with most models, and (4) show the effect of fusing the outputs of two LSTM models using facial expression and EDA data (called Decision Fusion = DF). DF improves results further with some datasets (e.g. the Heat Phasic Dataset (HPD)).


Introduction
A reliable assessment of pain is necessary to determine appropriate and prompt treatment for vulnerable patients who are unable to self-report their pain, such as intensive care patients, people with dementia, or adults with cognitive impairment. An automated system is a promising way to support clinical observation because it can measure and monitor pain objectively and robustly [1]. Othman et al. [2] reported that machines are much better than human observers at recognising pain intensity from facial expression. Further, analysing electrodermal activity (EDA) data in automatic pain recognition improves pain assessment [3]. This work focuses on the EDA modality and on combining the facial expression and EDA modalities for continuously recognising pain intensity, and then compares the obtained results to the facial expression modality results presented in [4,5]. Such automated monitoring systems could be highly beneficial for reliable and economical pain intensity assessment.
In the study of pain recognition, the X-ITE Pain Database has been introduced to evaluate different proposed methods. It comprises reactions to pain stimuli in four different qualities: heat and electrical stimuli, each in a phasic (short) and a tonic (long) variant. The X-ITE Pain Database includes 134 healthy participants (subjects) between 18 and 50 years who were stimulated with heat and electricity. It involves data from different sensors such as frontal RGB camera video, audio signals, and physiological signals (ECG, EMG, and EDA); for more details on the database see Gruss et al. [6]. Conducting experiments with healthy subjects helps to develop the pain assessment technology, which can later be validated and applied with vulnerable patients who are unable to self-report. In this work, we use the 127 participants (subjects) for whom data were available from all sensors. The database is extremely imbalanced. Thus, to reduce the impact of the imbalance, we used the same 11 datasets proposed by Othman et al. [4,5]: reduced datasets were obtained with a reduction strategy (see Section 2.1) that addresses the problem of imbalanced data. In this study, we focus on facial expression and EDA data involving the phasic and tonic pain intensity during the application of the thermal and electrical pain stimuli, and no pain.
Many studies in the past years have focused on the temporal integration of frame-level features to recognise pain intensity because it describes the dynamics among neighbouring frames well. In this work, and in line with Othman et al. [4,5], who analysed facial expression to recognise continuous pain intensity, the temporal integration of EDA features is done using a time series statistics descriptor. Three different methods are applied to discriminate between no pain, three pain intensities (low, moderate, and severe), and two qualities (heat and electrical stimuli) with regard to sequence classification and regression: (1) Random Forest (RF) as baseline methods [Random Forest classifier (RFc) and Random Forest regression (RFr)], (2) Long-Short Term Memory (LSTM), and (3) LSTM using the sample weighting method (called LSTM-SW).
Othman et al. [4,5] reported that the RF, LSTM, and LSTM-SW methods, combined with a time series statistics descriptor called Facial Activity Descriptor (FAD) and the reduced datasets from the X-ITE Pain Database, mitigate the imbalanced database problem and improve the results for both classification and regression. Further, the LSTM and LSTM-SW methods were better than RF (baseline methods: RFc and RFr) at recognising continuous pain intensity. The exceptions were small datasets, for which the RFs were the best, but the performance of the models was still poor. This paper extends [4,5] by investigating the same methods with EDA data and comparing the results with those obtained using facial expression data. In addition, we present the results of fusing the outputs of models based on facial expression and EDA data with a mean-score mapping approach (called Decision Fusion = DF) to show whether the recognition performance for continuous pain intensity improves.
The current work is organised as follows. Section 1.1 provides an overview of pain recognition methods based on facial expression and EDA data and then describes their relevance to this paper, while Section 1.2 describes the contribution of this work. In Section 2 the used material and methods are presented for automatic recognition of continuous pain intensity using electrodermal activity sensors. The X-ITE Pain Database preprocessing, feature extraction preprocessing, the experimental setup, Random Forest (RF) methods, Long-Short Term Memory (LSTM), and LSTM using sample weighting (called LSTM-SW) methods for classification and regression task are described in detail. Section 3 presents results and compares with previous work, followed by a discussion in Section 4. We conclude the results and describe the potential future works in Section 5.

Related Work
Many studies in the past years have focused on facial expression, which is very informative for pain recognition [7,8]. Ekman and Friesen [9] decompose facial expression into individual facial Action Units (AUs) with the Facial Action Coding System (FACS). Several automatic systems analyse AUs and their combinations for recognising frame-level and sequence (or video)-level pain intensity. Frame-level methods such as the Prkachin and Solomon Pain Intensity (PSPI) [10] are limited in describing relevant dynamic information that is beneficial for pain intensity recognition. Thus, many recent works focus on video-level pain recognition because it is more effective in describing such information [11,12]. It often uses temporal integration of frame-level features. For example, the video content can be condensed to high-level features by using a time series statistics descriptor that consists of several statistical measures of the time series. Given the ability of Random Forest (RF) [13] to detect pain using facial expression [3,12,14], Othman et al. [2,15] introduced RFc with a time series statistics descriptor, which calculates several statistical measures of each time series and of its first and second derivatives, as a baseline method and compared its performance to proposed deep learning methods that analyse an RGB image encoding temporal information. The performance of a reduced MobileNetV2 [15] and a simple Convolutional Neural Network (CNN) [2] was better than that of RFc. CNN accuracy improved by about 1% when using the sample weighting method [2]. The sample weighting method is suggested to reduce the weight of misclassified samples by duplicating some training samples with more facial responses if their classification scores are above 0.3, which improves the pain intensity recognition performance. Further, LSTM [16] is an effective method for handling time series prediction. It was proposed to learn long-term dependencies over longer time periods by storing information from previous periods. Thus, Othman et al. [4,5] utilise Long-Short Term Memory (LSTM) and LSTM using sample weighting (LSTM-SW), which are significantly better than RF for recognising continuous pain intensity when using facial expressions with classification and regression. These results are used in this work for comparison.
A promising technology used in automatic pain recognition is electrodermal activity (EDA) sensors [17]. EDA can be measured on superficial muscles of the skin of the hands, which is controlled by the autonomic nervous system [18,19]. The sweat on the skin surface changes the electrical conductivity of the skin; for example, people sweat when they are scared, nervous, or in pain. Thus, earlier works on automatic pain recognition [2,17,20] showed that EDA carries salient information about different pain levels in people. EDA is composed of phasic and tonic signals. The phasic signal is a quick response caused by external stimuli such as pain stimuli, and the tonic signal is a slower component that includes the baseline of the signal due to unconscious activities [19]. Werner et al. [3] trained an RF using features extracted from different physiological signals [6]. They reported that the EDA sensor is the best for recognising pain levels when using the X-ITE Pain Database. However, in contrast to our work, they did not investigate continuous pain monitoring but the classification of pre-segmented time windows. Further, prior works in emotion recognition using EDA sensors, such as [19], were promising when using deep learning networks [Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs)]. The deep learning networks were utilised due to their ability to mine the sequential relationships between different periods of EDA signals. Thus, we train our RF and LSTM methods, which were proposed in [4,5], with features extracted from EDA.
In [3], the authors reported that the fusion of modalities (frontal RGB camera, audio, ECG, EMG, EDA) can improve the results: first, they individually trained an RF using the features of each modality; second, they concatenated the feature vectors of all modalities and trained and tested an RF (called feature fusion); third, they applied decision fusion by training the RF on the individual modalities and then aggregating the RF scores into the final decisions. They used two types of aggregation: a fixed mapping approach and a trained mapping approach. In this work, we use the fixed mapping approach on two modalities (facial expression and EDA), and we call it Decision Fusion (DF). For classification, DF for each trained model (RF, LSTM, and LSTM-SW) is based on calculating the mean of the output scores per class from the two modalities and selecting the class with the highest score. For regression, DF for each trained model is based on calculating the average of the predicted probabilities from the two modalities. Our main aim is to provide a highly valuable automated real-time system for clinical settings to quickly, accurately and objectively monitor the pain levels of patients.

Contributions
Werner et al. [3] and Walter et al. [21] report results for phasic (short) and tonic (long) stimulation samples of 7 seconds from frontal RGB camera, audio, and physiological data, which have been cut out from the continuous recording of the main stimulation phase in the X-ITE Pain Database. This paper goes beyond their studies, as it reports the first continuous pain monitoring results based on analysing the electrodermal activity (EDA) signal from the X-ITE Pain Database. We use the continuous recording of the EDA signal of most of the experiment, which is about one and a half hours per subject. This paper additionally includes classification with discrete output categories and compares the obtained results with the continuous-valued outputs provided by regression. We report the Decision Fusion (DF) results when using the facial expression and EDA modalities. We applied three automatic methods for classification and regression of pain intensity from EDA data sequences: (1) Random Forest classifier (RFc) and Random Forest regression (RFr) (as baseline methods), (2) Long-Short Term Memory (LSTM), and (3) LSTM using the sample weighting method (LSTM-SW). Further, we compare the model performances achieved using facial expression (the results obtained from [4,5]) with those achieved using EDA and DF. All LSTM models perform better than guessing and the baseline methods (RFc and RFr) at recognising pain intensity for the two pain stimulus types (phasic and tonic) in three pain intensities (low, moderate, and severe) and two qualities (heat and electrical stimuli). In line with our recent studies, we used the datasets proposed in [4,5] to reduce the impact of imbalanced data. We show that LSTMs using EDA perform the best with most datasets and that LSTMs using DF performed fairly with the Heat Phasic Dataset (HPD).

Materials and Methods
In this section, we introduce the system structure for automatically recognising continuous pain intensity using facial expression in video and electrodermal activity (EDA) data on the X-ITE Pain Database. The X-ITE Pain Database preprocessing is described in Section 2.1. Our prior works focused on facial expression, see [4,5]. The focus of this paper is on electrodermal activity (EDA) and Decision Fusion (DF) for recognising continuous pain intensity; for more details see Section 2.2 and Section 2.3. Figure 1 shows an overview of the automatic system using frontal facial RGB video and EDA to recognise continuous pain intensity. We performed temporal integration within a time window over the extracted facial features (FF) and EDA time series by using a time series statistics descriptor [called Facial Activity Descriptor (FAD) and Electrodermal Activity Descriptor (EDA-D)]. We moved the labels three seconds forward and then used a sliding window covering the previous ten seconds. More details about the processing of facial video and EDA data are described in Section 2.2. For recognising continuous pain intensities, we used Random Forest (RF) as an automatic baseline method and two Long-Short Term Memory (LSTM) methods (one uses the sample weighting method and the other does not). Further, we applied the Decision Fusion (DF) method, in which individual RF and LSTM models are trained for the FF and EDA modalities; for more details, see Section 2.3.
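The label shift and the sliding window mentioned above can be illustrated with a minimal sketch. It assumes one descriptor and one label per second, a delay of 3 time steps, and a window of 10 time steps, with each window paired with the label of the next time step as detailed in Section 2.2. The function names `shift_labels` and `sliding_windows` and the padding of the first seconds with a no pain label (0) are illustrative assumptions, not the implementation used in the experiments.

```python
def shift_labels(labels, delay_steps=3, fill=0):
    """Delay every label by `delay_steps` time steps.

    The first `delay_steps` positions are padded with `fill`
    (assumed here to be the no pain label) and the sequence
    keeps its original length.
    """
    if delay_steps <= 0:
        return list(labels)
    return ([fill] * delay_steps + list(labels))[:len(labels)]


def sliding_windows(descriptors, labels, window=10):
    """Pair the `window` previous descriptors with the label at time t.

    Time steps t < window are skipped because they have no
    complete history of prior observations.
    """
    samples = []
    for t in range(window, len(descriptors)):
        samples.append((descriptors[t - window:t], labels[t]))
    return samples


if __name__ == "__main__":
    raw_labels = [0, 1, 1, 0, 0, 0]
    print(shift_labels(raw_labels))  # labels delayed by 3 steps
    pairs = sliding_windows(list(range(12)), list(range(100, 112)))
    print(len(pairs), pairs[0])
```

With a stride of one time step, each recording of T seconds yields T - 10 samples under these assumptions.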

Database preprocessing
In this section, we give an overview of the multimodal Experimentally Induced Thermal and Electrical (X-ITE) Pain Database [6], which we use to validate the performance of different automatic methods for continuous pain intensity recognition. In this database, only a subset of 127 participants (subjects) has data available from all sensors (frontal RGB camera, audio, ECG: electrocardiogram, EMG: surface electromyography, EDA: electrodermal activity). Like Werner et al. [3] and Othman et al. [2,4], we use this subset and focus on analysing the facial expression and EDA data from time series involving phasic and tonic pain in 3 intensities (low, medium, and high) during the application of the thermal (Medoc PATHWAY Model ATS) and electrical (Digitimer DS7A) pain stimuli, and no pain. The 5-second phasic stimuli of each modality (heat and electrical pain) and intensity were repeated 30 times in randomised order with pauses of 8-12 seconds. The tonic stimuli were applied once for one minute per intensity and modality, each followed by a pause of five minutes. For more details about the data collection experiment, see Gruss et al. [6]. Further, we explain in this paper the steps of the X-ITE Pain Database preprocessing that reduce the impact of the imbalanced database problem. Automatic methods should be able to recognise pain intensity from video and EDA data time series. However, we noticed that the distribution of samples over the pain intensity labels is extremely unbalanced, as shown in Figure 2.
In Othman et al. [4], we proposed 11 datasets obtained from the X-ITE Pain Database to reduce the imbalanced database problem, which are also used in this work. First, we investigated the intensity of facial expressions for most samples when expressing pain intensity, and then we assigned all subjects to four categories based on how they expressed pain intensity. Second, we split the database into 80% of the data for training (100 subjects = 572696 samples), 10% for validation (13 subjects = 75537 samples), and 10% for testing (14 subjects = 79485 samples); each split contains samples from all intensity categories. We selected the subjects randomly from each category based on the proposed percentages (see Figure 3). We noticed that the distribution of samples over the pain intensity labels is extremely unbalanced, as shown in Figure 2. Third, we processed the database: (a) we excluded all sequences of samples with the labels -10 and -11 as well as the no pain sample sequences before and after these samples to simplify the problem and reduce the impact of the imbalance in the database; (b) we split the obtained dataset into 6 datasets (called Subsets) to evaluate the proposed methods; (c) we reduced each proposed subset by removing some no pain samples prior to the pain intensity frames in a time series for each subject; these datasets are called Reduced Subsets. The applied Subsets are (1) Phasic Dataset (PD): Exclude tonic samples (labeled 4, 5, 6, -4, -5, -6, -10, -11) and no-pain samples before these samples and also after samples labeled -10 and -11. Reduce the no pain frames in ETD to about 49%. Our reduction strategy removes some of the no pain samples prior to each pain intensity sequence and preserves a number of no pain samples that are directly adjacent to each pain intensity sequence. This number is set to the number of samples in the pain intensity sequence; e.g. for a sequence of phasic electrical pain intensity that contains five samples, we keep the previous five no pain samples and delete the rest before.
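The reduction strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions: labels are given as one integer per sample, the no pain label is 0, a pain sequence is a run of identical non-zero labels, and no pain samples that are not directly in front of a pain sequence (including those after the last sequence) are dropped. The function name `reduce_no_pain` is hypothetical.

```python
def reduce_no_pain(labels, no_pain=0):
    """Return the indices of the samples kept by the reduction.

    For every run of n identical pain labels, the run itself and
    at most n no pain samples directly before it are kept. All
    other no pain samples are discarded (assumption of this sketch).
    """
    n_samples = len(labels)
    keep = [False] * n_samples
    i = 0
    while i < n_samples:
        if labels[i] == no_pain:
            i += 1
            continue
        # Find the end of the current pain sequence.
        j = i
        while j < n_samples and labels[j] == labels[i]:
            j += 1
        run_length = j - i
        for idx in range(i, j):
            keep[idx] = True
        # Walk backwards over the adjacent no pain samples.
        idx, kept = i - 1, 0
        while idx >= 0 and labels[idx] == no_pain and kept < run_length:
            keep[idx] = True
            kept += 1
            idx -= 1
        i = j
    return [idx for idx, flag in enumerate(keep) if flag]


if __name__ == "__main__":
    # 8 no pain samples, a pain sequence of 5 samples, 3 no pain samples.
    example = [0] * 8 + [2] * 5 + [0] * 3
    print(reduce_no_pain(example))
```

In the example, the five pain samples and the five no pain samples directly before them are retained, which matches the five-sample case given in the text.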

Processing of facial expression and electrodermal activity (EDA) Data
After preprocessing the database as described in Section 2.1, we processed the data of both modalities (facial RGB video and EDA) to extract features for recognising continuous pain intensity. First, OpenFace [22] was used to extract Facial Features (FF) from each frame of each video (subject); the average length of the videos is about one and a half hours. OpenFace detects the face and facial landmarks, extracts Action Units (AUs), and estimates head pose. The FF we use include 21 features: 3 head pose features (yaw, pitch, and roll), AU1 (binary occurrence output), and 17 AU intensity outputs of OpenFace, which are AU1, AU2, AU4, AU5, AU6, AU7, AU9, AU10, AU12, AU14, AU15, AU17, AU20, AU23, AU25, AU26, and AU45. The FF were recorded at 25 frames per second (fps). In line with the FF extraction process, we use the EDA data at the same sampling rate (one sample every 1/25 seconds). We calculated temporal integration features from the 1-dimensional EDA time series and the 21-dimensional facial expression time series. Each time series was summarised by four statistics (minimum, maximum, mean, and standard deviation) of the time series itself and of its first and second derivatives, yielding a 12×21-dimensional descriptor for the FF and a 12×1-dimensional descriptor for the EDA features. A person-specific standardisation of the features [11] was applied to the Facial Activity Descriptor (FAD) and EDA Descriptor (EDA-D) in order to focus on the within-subject response variation rather than the differences between subjects. The labels of each subject were shifted 3 seconds later because facial pain responses are typically delayed by 2-3 seconds compared to the stimulus. Further, we applied a sliding time window with a length of ten seconds once per second, combining the FAD/EDA-D of the previous ten seconds to predict the pain intensity label of the next time step. The data for the first ten seconds are removed because there are no prior observations to use.
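As an illustration of the time series statistics descriptor, the following sketch computes the 12 values (minimum, maximum, mean, and standard deviation of a signal and of its first and second derivatives) for one 1-dimensional time series and applies a per-subject standardisation. Several details are assumptions of this sketch and are not specified in the text: derivatives are approximated by differences of consecutive samples, the population standard deviation is used, and a zero standard deviation is replaced by 1 during standardisation. The function names are hypothetical.

```python
from statistics import mean, pstdev


def diff(series):
    """Approximate the derivative by differences of consecutive samples."""
    return [b - a for a, b in zip(series, series[1:])]


def four_stats(series):
    """Minimum, maximum, mean, and (population) standard deviation."""
    return [min(series), max(series), mean(series), pstdev(series)]


def descriptor(series):
    """12-dimensional descriptor of one 1-dimensional time series."""
    if len(series) < 3:
        raise ValueError("need at least 3 samples for the second derivative")
    first = diff(series)
    second = diff(first)
    return four_stats(series) + four_stats(first) + four_stats(second)


def standardise_per_subject(rows):
    """Z-score each descriptor column over all rows of ONE subject."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column would give std 0; fall back to 1 (assumption).
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows]


if __name__ == "__main__":
    eda_window = [1.0, 2.0, 4.0, 7.0]
    print(len(descriptor(eda_window)))  # 12 values per time series
```

Applying `descriptor` to each of the 21 facial feature series would give the 12×21 values of the FAD, and applying it to the single EDA series gives the 12 values of the EDA-D.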

Classification, Regression and Fusion
The descriptor (EDA-D) is used as features for recognising continuous pain intensity (no pain, low, moderate, and severe) and modality (heat and electrical pain stimuli) using Random Forest (RF), Long-Short Term Memory (LSTM), and LSTM using sample weighting method (LSTM-SW). We presented the results of using the above methods with FAD in [4,5].
In line with Werner et al. [3] and Othman et al. [2,4,5], we trained the Random Forest classifier (RFc) and Random Forest regression (RFr) with 100 trees and a maximum depth of 10 nodes for the classification and regression tasks. Both RFc and RFr are the baseline methods against which the deep learning methods in this study are compared. Figure 4 shows the applied models. LSTM-SW was introduced in our recent studies [2,4,5]. In LSTM-SW, the LSTM is trained after some training samples have been replicated using the sample weighting method [2]. We used RF with FAD to determine samples with prediction scores higher than 0.3 in the training datasets and then replicated these samples once. LSTM-SW using CCE is called LSTM-SW-CCE and LSTM-SW using BCE is called LSTM-SW-BCE.
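The replication step of the sample weighting method can be sketched as follows. The sketch assumes that one prediction score per training sample is already available (for example from an RF trained on FAD); how that score is defined is not covered here. The function name `replicate_by_score` is hypothetical, and only the threshold of 0.3 and the single replication are taken from the text.

```python
def replicate_by_score(samples, scores, threshold=0.3):
    """Return the training set with high-scoring samples added once more.

    Every sample is kept; a sample whose score is strictly above
    `threshold` appears twice in the returned list.
    """
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    augmented = []
    for sample, score in zip(samples, scores):
        augmented.append(sample)
        if score > threshold:
            augmented.append(sample)  # replicate exactly once
    return augmented


if __name__ == "__main__":
    train = ["s0", "s1", "s2"]
    rf_scores = [0.1, 0.5, 0.3]  # hypothetical RF prediction scores
    print(replicate_by_score(train, rf_scores))
```

Duplicating a sample has the same effect on the summed training loss as giving it twice the weight, which is why the procedure is described as sample weighting.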
In this work, we applied Decision Fusion (DF) to the outputs of individually trained models that use the facial expression and EDA modalities. The classification models (RFc, LSTM, and LSTM-SW) yield a score for each possible class, and the regression models (RFr, LSTM, and LSTM-SW) predict a continuous value. We aggregated the classifier scores and the regression outputs separately into a final decision by using a fixed mapping approach. For classification, DF is implemented by calculating the mean of the output scores per class of both models using FAD and EDA-D and selecting the class with the highest score. For regression, DF averages the predictions of the FAD and EDA-D models separately for each of RFr, LSTM, and LSTM-SW.
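A minimal sketch of this fixed mapping is given below. It assumes that the two models output class scores in the same class order and, for regression, values on the same scale. The function names `fuse_classification` and `fuse_regression` and the example scores are illustrative only.

```python
def fuse_classification(scores_fad, scores_eda):
    """Mean-score fusion of two class score vectors.

    Returns the index of the class with the highest mean score
    together with the vector of mean scores.
    """
    if len(scores_fad) != len(scores_eda):
        raise ValueError("both models must score the same classes")
    mean_scores = [(a + b) / 2 for a, b in zip(scores_fad, scores_eda)]
    best_class = max(range(len(mean_scores)), key=mean_scores.__getitem__)
    return best_class, mean_scores


def fuse_regression(pred_fad, pred_eda):
    """Average of the two regression outputs for one time step."""
    return (pred_fad + pred_eda) / 2


if __name__ == "__main__":
    # Hypothetical scores for three classes from the two modalities.
    label, fused = fuse_classification([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
    print(label, fused)
    print(fuse_regression(0.2, 0.6))
```

In the example, the facial expression model alone would choose class 0, whereas the fused scores favour class 1.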

Classification
For classification, accuracy is used to measure and compare the performance of our models (see Section 2.3 and Figure 4) with the Trivial model and the baseline model (RFc) in recognising continuous pain intensity, see Table 1 and Figure 5. Most RFc models perform worse than guessing. However, the Trivial model totally fails to recognise continuous pain intensity because it always votes for the majority class (no pain in our experiment), and in this respect RFc is better. Thus, we tested whether the proposed models (LSTM-CCE and LSTM-SW-CCE) are significantly better than the baseline model (RFc) by considering the p-value of the paired t-test. Decision Fusion (DF) of 7-Class modalities using LSTM-SW-CCE with the Subsets datasets improves the performance significantly compared to the single modality or DF using RFc (78.8% for PD, 79.9% for HPD, 87.2% for EPD, and 70.9% for TD using decision fusion with mean-score aggregation). LSTM-CCE with DF yields results similar to LSTM-SW-CCE in recognising phasic heat/electrical pain intensity in the Subsets (79.9% and 87.2% for HPD and EPD, respectively). Further, LSTM-CCE with EDA-D performs the best in recognising tonic heat/electrical pain intensity in Subsets, 7-Class pain intensity in Reduced Subsets, heat & electrical pain intensity in Reduced Subsets for phasic stimuli, and electrical phasic intensity (48.4%, 82.4%, 67.3%, 68.6%, 75.8%, and 57.7% for HTD, ETD, RPD, RHPD, REPD, and RETD, respectively). LSTM-SW-CCE with EDA-D performs the best on RTD with about 42.7%. LSTM-CCE with EDA-D and DF, and LSTM-SW-CCE with FAD are the best on the ETD dataset and obtain the same performance (82.4%).
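The behaviour of the Trivial model can be made concrete with a short sketch. It assumes integer class labels with 0 denoting no pain; the function names and the toy labels are illustrative and do not reproduce any value from Table 1.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of time steps with a correct class prediction."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def trivial_predictions(y_train, n_test):
    """Always predict the majority class of the training labels."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * n_test


if __name__ == "__main__":
    y_train = [0, 0, 0, 1, 2]  # toy labels, 0 = no pain
    y_test = [0, 0, 1, 3]
    y_pred = trivial_predictions(y_train, len(y_test))
    print(accuracy(y_test, y_pred))
```

On imbalanced data such a model can reach a high accuracy although it never detects any pain sample, which is why accuracy is reported relative to this baseline.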

Classification vs Regression
This section provides the comparison between Trivial and our methods with phasic and tonic pain stimuli datasets (7-Class, see Section 3.2.1), phasic and tonic heat pain stimuli datasets (4-Class, see Section 3.2.2), and phasic and tonic electrical pain stimuli datasets (4-Class, see Section 3.2.3). Mean Squared Error (MSE) and the Intraclass Correlation Coefficient (ICC) [23] are used to measure the performance of classification models versus regression models. Table 2 shows the results of classifying all 7 available classes (datasets PD, ED, RPD, EPD) for both phasic and tonic pain, combining the pain intensity and the stimuli modality (heat and electrical pain). In regards to classification and regression, all automatic models with EDA-D and DF are superior to those with FAD. In line with Othman et al. [4,5], our models with EDA-D and DF outperform Trivial and baseline methods (RFr and RFc). In this experiment, the fusion using the average outputs from the FAD and EDA-D modalities (DF) improves performance significantly compared to the best FAD modalities as shown in Figure 6.
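For reference, the two measures can be sketched as follows. The MSE is standard. For the ICC, the variant is not stated in this section, so the sketch assumes the two-way, consistency, single-measure form, often written ICC(3,1), with the ground truth and the prediction treated as two raters; other variants give different values. The function names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def icc_3_1(y_true, y_pred):
    """ICC(3,1) for two 'raters' (assumed variant, see lead-in).

    ICC = (BMS - EMS) / (BMS + (k - 1) * EMS) with k = 2, where BMS
    is the between-target mean square and EMS the residual mean square.
    """
    n, k = len(y_true), 2
    if n < 2 or len(y_pred) != n:
        raise ValueError("need two equally long series with >= 2 samples")
    rows = list(zip(y_true, y_pred))
    grand = sum(sum(row) for row in rows) / (n * k)
    row_means = [sum(row) / k for row in rows]
    col_means = [sum(y_true) / n, sum(y_pred) / n]
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    denominator = bms + (k - 1) * ems
    if denominator == 0:
        raise ValueError("ICC is undefined for constant data")
    return (bms - ems) / denominator


if __name__ == "__main__":
    truth = [0.0, 1.0, 2.0, 3.0]
    print(mse(truth, truth), icc_3_1(truth, truth))
```

Under this assumed variant, a constant offset between prediction and ground truth does not lower the ICC but does increase the MSE, so the two measures are complementary.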

Heat Pain Intensity Recognition (4-Class)
Given the results of 7-Class pain intensity recognition, our models were trained on 4-Class classification and regression tasks to simplify the problem and increase the performance. For each pain stimulation type (phasic and tonic), we excluded electrical pain intensities; the used combinations are: BL, PH1, PH2, and PH3 (datasets HPD, RHPD), as well as BL, TH1, TH2, and TH3 (dataset HTD). The performance of our automatic models with EDA-D and DF for continuous pain recognition with both combinations (see Table 2 and Figure 7) is significantly better than that of the Trivial and baseline methods (RFr and RFc). The best model results: (1) LSTM-SW-BCE with DF (when using the HPD dataset) yields the highest ICC. Table 2 and Figure 8 show the high performance in recognising continuous pain intensity of the 4-Class classification and regression tasks. Four classes are considered for the electrical stimulation, and we excluded heat stimuli: BL, PE1, PE2, and PE3 for phasic pain stimulation (datasets EPD, REPD) and BL, TE1, TE2, and TE3 for tonic pain stimulation (datasets ETD, RETD). The electrical pain recognition models with EDA-D and DF show better performance than the Trivial and baseline methods (RFr and RFc); the same models with FAD were the best in Othman et al. [4,5]. The LSTM-SW-BCE model with EDA-D performed the best on the EPD, ETD, and REPD datasets, with the highest ICC (0.53, 0.21, and 0.88) and an MSE of 0.05, 0.07, and 0.03, respectively. The LSTM-BCE model with EDA-D yields the highest ICC (0.49) and the lowest MSE (0.10) on RETD. Further, RFc with DF performs the worst on the EPD dataset because most of the no pain samples were labeled as pain after the fusion of the FAD and EDA-D modalities.

Discussion
In this paper, we conducted experiments to compare the performance of different automatic methods in recognising continuous pain on the X-ITE Pain Database when using two modalities (frontal RGB camera and EDA) and the fusion of these two modalities, see Tables 1 and 2 and Figures 5, 6, 7, and 8. The results on both phasic and tonic datasets show that it is possible to monitor continuous pain intensity, which is in line with the results of our recent works [4,5] and many prior studies. The results of analysing the facial expression features were reported in Othman et al. [4,5] and are used here for comparison. In this work, we trained RF (RFc and RFr as baseline models) and LSTMs (LSTM-CCE, LSTM-SW-CCE, LSTM-BCE, and LSTM-SW-BCE models) with EDA-D and DF on several datasets from the X-ITE Pain Database (the used datasets and methods are described in Section 2.1 and Section 2.3). To achieve continuous pain intensity recognition, we used the sliding window strategy to obtain input samples of 10 s length. Further, we shifted the labels of each subject 3 seconds later. The results show that (1) the models using networks A and C perform better than the models using networks B and D, and (2) LSTM models using EDA-D and DF are significantly better than guessing and most baseline models (RFc and RFr). In line with the results of Werner et al. [3], using RFc with DF instead of FAD (Facial Activity Descriptor) improved the accuracy on the almost balanced Heat Tonic Dataset (HTD) (only 20% of samples experience no pain). A possible reason for the poor performance of the other models is that our datasets are highly imbalanced. However, RFc with DF increased the pain intensity recognition performance but at the same time decreased the no pain recognition performance when the input data are sequences. Thus, we improved the recognition of continuous pain intensity and its qualities (heat and electrical stimuli) using LSTM methods because they work better with imbalanced datasets (see [4,5]).
The LSTMs with DF and EDA-D outperform the LSTMs using FAD, and DF is the best in most results on the Subsets (hugely imbalanced datasets) with regard to classification (see Table 1). LSTMs with EDA-D perform the best when using the Reduced Subsets; it seems that the amount of EDA data was appropriate for the complexity of the problem and informative when using LSTMs. In the Reduced Subsets, we only preserved some adjacent no pain samples prior to each sub-sequence of pain intensity, so that the number of no pain samples is the same as the number of samples in the adjacent pain intensity sub-sequence; e.g. for a sub-sequence of tonic heat pain of low intensity that contains 5 samples, we keep the previous 5 no pain samples and delete the rest before. The Reduced Subsets were suggested to reduce the impact of hugely imbalanced datasets, which may include a lot of outlier samples. In the Reduced Subsets, the obtained results show that LSTMs were significantly better than RFc by at least 10% on the reduced phasic datasets and at least 5% on the reduced tonic datasets. In contrast, the results on most datasets in the Subsets show that LSTMs improve the accuracy by about 2.5% (maximum) compared to RFc, except for the Heat Tonic Dataset (HTD), where the accuracy improved by up to 7%.
With regard to regression, most models with EDA-D outperform the models with FAD and are better than several models with DF, see Table 2. Further, the performance of the LSTM-BCE model with EDA-D was very good for pain intensity monitoring when using the Reduced Electrical Phasic Dataset (REPD) (ICC about 0.88). This might be due to the reduced noise or outlier data and more pain intensity data. The LSTM-SW-BCE model with DF also performs fairly when using the Heat Phasic Dataset (HPD), with an ICC of about 0.33. Most of the LSTM models using EDA-D achieve the best performance; moreover, LSTM-SW-BCE with EDA-D increased the performance compared to LSTM-BCE with EDA-D for several models, and it performs the best with RPD, EPD, and ETD. This leads to the hypothesis that the success of LSTM-SW using the binary cross-entropy loss function is based on downweighting samples in the training set with less facial response using the sample weighting method [2]. These samples might negatively affect the model performance. The LSTM-CCE models with EDA-D and DF perform fairly and are the best when using the HPD and RTD, with about 0.35 and 0.31, respectively. A possible reason why the classification model performs well is that the HPD is an almost balanced dataset, and balanced data are good for classification. In accordance with Othman et al. [4,5], we confirm that regression models are superior to classification models when using hugely imbalanced datasets.
Prior works with the X-ITE Pain Database [3,21] focused on parts (time windows) of the continuous recording of the main stimulation phase. These time windows of 7 seconds have been cut out from the continuous recordings of the frontal RGB camera, audio, and physiological data. See [3] for more details about the time windows. This paper advances on these works by introducing a continuous monitoring system that uses the frontal RGB camera and EDA data for pain intensity recognition in the continuous recording of the main stimulation phase of about one and a half hours per subject.
As the promising results in the discussion section show, an automated system can be provided that is able to continuously monitor the pain intensity of patients. For classification and regression, we applied three methods [Random Forest (RF), Long-Short Term Memory (LSTM), and LSTM using the sample weighting method (called LSTM-SW)] with Facial Activity Descriptors (FAD), Electrodermal Activity Descriptors (EDA-D), and Decision Fusion (DF) for continuous pain intensity recognition on several datasets from the X-ITE Pain Database. The major difficulties in using a machine learning classifier or regressor are the huge data imbalance and outliers. Thus, we used the reduced datasets to simplify the problem and to reduce the impact of the imbalanced database by removing no pain samples. The applied strategy is reported in Section 2.1 and [4,5]. The best results were obtained with LSTM models. Generally, they outperformed the baseline models (RFc and RFr) and guessing (the majority vote = no pain). LSTM models most clearly outperform other models when using the Reduced Subsets (see Tables 1 and 2 and Figures 5, 6, 7, and 8). Most LSTM models with EDA-D are the best for continuous monitoring of pain intensity. The LSTM model with DF performed fairly when using HPD because the fusion of the facial expression and EDA modalities suited the complexity of the problem of the heat phasic dataset. Further, the results showed that regression is superior to classification for hugely imbalanced data. Classification performed better than regression with the Heat Tonic Dataset (HTD) because it is almost balanced, and the best result was obtained when using LSTM-CCE with DF. This result is consistent with the findings of Werner et al. [3], who used a balanced database for classification of pre-segmented time windows. The LSTM model with EDA-D using the Reduced Tonic Dataset (RTD) is the best regarding classification. Although the performance on the tonic datasets improved compared to Othman et al. [4,5], who used FAD only (see Tables 1 and 2), the performance is still worse compared to the phasic datasets due to the small size of the data. Thus, we assume that more data with more pain intensities will improve LSTMs or other deep learning methods. Further, finding a solution for imbalanced datasets and using other modalities (audio, EMG, and ECG) could improve the results.
Author Contributions: Methodology, software, validation, investigation, and visualization: E. Othman; formal analysis, writing-original draft preparation: E. Othman, P. Werner; data gathering: S. Gruss, S. Walter; conceptualisation, writing-review and editing: all authors; All authors have read and agreed to the published version of the manuscript. projects HuBA no. 03ZZ0470, RoboAssist no. 03ZZ0448L, and Robo-Lab no. 03ZZ04X02B within the Zwanzig20 Alliance 3Dsensation for support.
Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of the Ulm University (protocol code: 372/16, date of approval: 5 January 2017).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations:
The following abbreviations are used in this manuscript: