ReliefF Matching Feature Selection for Emotion Recognition based on EEG Signal

This paper proposes ReliefF Matching Feature Selection (RMFS) to address the problems of individual specificity and global threshold mismatch in emotion recognition. First, the EEG signal is decomposed into six emotion-related bands by wavelet packet decomposition, and EMD is then applied to extract 10 categories of features from the wavelet coefficients and the IMF components of the reconstructed signal. Second, an optimized formula for the feature-group weights is proposed, based on the feature sets selected by ReliefF. It yields the weights of the different test features, from which the globally optimal matching feature group and the corresponding matching channels are obtained, eliminating redundant information and addressing individual specificity. Finally, an SVM classifies the test feature-group data to produce the emotion recognition results. The experimental results show that the average accuracies of RMFS for two-class recognition of valence and arousal are 93.28% and 93.32%, respectively, and the four-class accuracies are higher than 83%. For a single subject, RMFS improves efficiency by 42.65% compared with the traditional ReliefF algorithm.


Introduction
The 21st century is known as the century of artificial intelligence and brain science [1]. Emotion recognition aims to identify human emotional states automatically from physiological and non-physiological signals, making human-computer interaction friendlier and more natural [2]. Multi-channel EEG signals are recorded from 16, 32, 64 or 128 electrodes distributed over the entire scalp according to the international 10-20 system. Increasing the number of electrodes sharply raises the feature dimension and leads to excessive computation. At the same time, some scholars have shown that using the data of only a few electrodes does not reduce the accuracy of emotion recognition [3][4]; for example, the Relief algorithm has been used for channel selection to obtain universal optimal channels across 16 subjects. However, because of individual differences between subjects, the same features cannot accurately reflect the information of every subject [6]. In Ref. [7], Lin used the F-score algorithm to select subject-independent features, and the emotion recognition accuracy remained basically unchanged when only half of the features were used.
Jenke et al. [8] applied the ReliefF algorithm for dimensionality reduction, using an evaluation method based on the recognition accuracy of different emotion models. In addition, Bos [9], Schmidt, Traitor [10], Zhang, Lee [11] et al. relied on the well-known "emotional valence hypothesis" (the frontal sites F3 and F4 show asymmetric activity when the brain processes negative and positive emotions) and chose to perform two-class emotion recognition in the valence dimension.
Momennezhad and A [12] used the wavelet transform for feature extraction, and the accuracy rates for two-class recognition of valence and arousal were 0.73 and 0.77, respectively; Lin Jingxin [13][14][15][16]
To solve the above problems, this paper proposes the RMFS algorithm [17][18][19]. The weighting formula is optimized to obtain a set of weights tailored to the features of each subject, which in turn optimizes the matching feature group of that subject. In this way, RMFS reduces the feature dimension, eliminates redundancy, prioritizes channels by weight, and improves the accuracy of emotion recognition.

ReliefF algorithm
The ReliefF algorithm [20][21][22] is a feature selection algorithm that assigns weights to features according to the correlation between signal features and class labels, and then discards the feature subsets whose weights indicate little impact on classification. Specifically, a sample X is selected at random and its class label C is identified; the k nearest neighbor samples H are found among the other samples of the same class as X, and the k nearest samples M are found among the samples of a different class from X. For a given signal feature, if the distance d1 between X and H on that feature is smaller than the distance d2 between X and M, the feature benefits classification and its weight W is increased; if d1 is greater than d2, the feature has a negative effect and its weight W is decreased.
Because the initially drawn samples are random and may be unrepresentative, the procedure is repeated m times, and the results are averaged to obtain the weight of each attribute (the degree to which the attribute affects classification).
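The procedure above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the function name `relieff_weights`, the range-normalized Manhattan distance, and the prior-weighted treatment of misses are assumptions taken from the common textbook form of ReliefF, and the toy data are invented for illustration.

```python
import random


def relieff_weights(X, y, n_iterations=100, k=3, seed=0):
    """Return one ReliefF weight per feature (illustrative sketch).

    X: list of samples, each a list of floats; y: list of class labels.
    """
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])

    # Feature ranges scale every per-feature difference into [0, 1].
    spans = []
    for f in range(n_features):
        column = [row[f] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(f, a, b):
        return abs(X[a][f] - X[b][f]) / spans[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(n_features))

    # Class priors weight the misses when there are more than two classes.
    priors = {c: y.count(c) / n_samples for c in set(y)}
    weights = [0.0] * n_features

    for _ in range(n_iterations):
        i = rng.randrange(n_samples)  # randomly drawn sample X
        for c in priors:
            candidates = [j for j in range(n_samples) if j != i and y[j] == c]
            neighbours = sorted(candidates, key=lambda j: distance(i, j))[:k]
            if not neighbours:
                continue
            if c == y[i]:
                scale = -1.0  # near hits H: a large difference lowers the weight
            else:
                scale = priors[c] / (1.0 - priors[y[i]])  # near misses M: raises it
            for f in range(n_features):
                mean_diff = sum(diff(f, i, j) for j in neighbours) / len(neighbours)
                weights[f] += scale * mean_diff / n_iterations
    return weights


# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
_rng = random.Random(1)
toy_X, toy_y = [], []
for label in (0, 1):
    for _ in range(20):
        toy_X.append([label + _rng.uniform(-0.2, 0.2), _rng.uniform(0.0, 1.0)])
        toy_y.append(label)

toy_weights = relieff_weights(toy_X, toy_y, n_iterations=50, k=3)
```

On data of this kind the informative feature should receive a clearly larger weight than the noise feature, which is the property the feature ranking relies on.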

ReliefF Matching Feature Selection
The RMFS algorithm [23] addresses both subject-dependent and subject-independent channel selection and consists of two parts: RMFS feature type selection and RMFS channel selection.
First, RMFS feature type selection [24] obtains the weights of all feature types for each subject. A feature group with a high recognition rate is then selected, and RMFS channel selection is performed on the basis of the weights of these features, which effectively reduces the number of channels and improves the emotion recognition rate.

RMFS matching feature type selection
We adopt the ReliefF matching feature selection algorithm to calculate the weight of each feature type: the ReliefF weights of all features in a feature subset are summed to obtain the weight of that feature type, and the contribution of each type is learned per subject as the basis for classification [25].
, we calculate the weights of the first n features according to formula (3) to obtain a larger gain.
, the (n+1)-th feature is moved to the end of the column, and the rest are moved as before. The weights of the n features are then calculated according to formula (4).
Step6: Go to Step4 and repeat the above steps until 5(3) is finally met, which completes the feature selection.
In formula (3), the weight of the n-th feature is scaled by the proportion of its weight in the cumulative weight of the matching feature set [26]. Features with larger weights therefore receive larger increments and features with smaller weights receive smaller increments, so that the features contributing more to classification play a more important role in the feature group. RMFS matching feature selection evaluates each feature in turn and adjusts the proportion of each weight in the matching feature group.
As a consequence, the preferred matching feature group is screened out, which improves the recognition accuracy and reduces the running time.
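The two quantities this section relies on, the summed weight of each feature type and the share of each weight in the cumulative weight of the first n types, can be sketched as follows. Formulas (3) and (4) are not reproduced in this text, so the sketch does not implement them; the function names, the feature type names, and the numbers are hypothetical.

```python
def rank_feature_types(feature_weights, feature_types):
    """Sum ReliefF weights per feature type and rank the types by that sum."""
    totals = {}
    for weight, ftype in zip(feature_weights, feature_types):
        totals[ftype] = totals.get(ftype, 0.0) + weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def shares_of_first_n(ranked, n):
    """Share of each of the first n types in their cumulative weight."""
    head = ranked[:n]
    cumulative = sum(weight for _, weight in head)
    if cumulative <= 0:
        raise ValueError("cumulative weight of the first n types must be positive")
    return {ftype: weight / cumulative for ftype, weight in head}


# Hypothetical per-feature ReliefF weights and the type each feature belongs to.
weights = [0.30, 0.10, 0.05, 0.15, -0.02]
types = ["type_A", "type_A", "type_B", "type_B", "type_C"]

ranked = rank_feature_types(weights, types)   # type_A first, type_C last
shares = shares_of_first_n(ranked, n=2)       # type_A about 2/3, type_B about 1/3
```

Any gain that is proportional to such a share gives larger increments to types with larger weights, which is the behaviour the text attributes to formula (3).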

RMFS channel selection
The RMFS algorithm treats each channel as a whole. After the channel weights are calculated, cross-validation is used to obtain the contribution of each channel to classification. The weights are adjusted on the basis of this contribution, and channels are removed accordingly.

RMFS channel selection
Input: Feature groups corresponding to 32 channels
Output: Matching feature set of optimal channels
Step1: Calculate the weights of the 32 channels
Step2: If a weight is negative, remove the corresponding channel and return to Step1,
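Step1 and Step2 can be sketched as a simple loop. The listing above is cut off after Step2, so the stopping rule used here (stop once no remaining channel has a negative weight) is an assumption, and `weight_fn` is a hypothetical callable standing in for the channel weight calculation; the toy weights are invented.

```python
def select_channels(channels, weight_fn):
    """Drop negative-weight channels until all remaining weights are non-negative.

    weight_fn(channels) -> dict mapping each channel name to its weight
    (hypothetical stand-in for the channel weight calculation).
    """
    remaining = list(channels)
    while remaining:
        weights = weight_fn(remaining)  # Step1: weights of the current channels
        if all(weights[ch] >= 0 for ch in remaining):
            break  # assumed stopping rule: nothing left to remove
        # Step2: remove channels with negative weight, then return to Step1.
        remaining = [ch for ch in remaining if weights[ch] >= 0]
    return remaining


# Toy weight function (invented): each channel's base score minus half the mean
# base score of the channels still in use, so weights change after each removal.
BASE = {"Fp1": 0.5, "F3": 0.2, "F4": -0.1, "Cz": 0.05}


def toy_weight_fn(channels):
    offset = 0.5 * sum(BASE[ch] for ch in channels) / len(channels)
    return {ch: BASE[ch] - offset for ch in channels}


selected = select_channels(["Fp1", "F3", "F4", "Cz"], toy_weight_fn)
# selected == ["Fp1", "F3"]
```

Recomputing the weights after every removal matters because a channel's weight can depend on which other channels are still present, as in the toy function.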

Single participant feature selection
The DEAP dataset contains EEG and peripheral physiological signals recorded from 32 subjects. In the experiments, the 560 samples were grouped in two ways, by valence labels and by arousal labels; in each case, a randomly chosen half of the samples served as the training set and the rest as the test set. Each randomized trial was repeated 10 times to reduce the effect of randomness, and the mean value was taken as the final classification accuracy. Table 1 shows the result of feature category selection for the 25th participant in the DEAP dataset, which shows an increase in classification accuracy and in the weights of the feature types.
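The evaluation protocol described above (a random half for training, the rest for testing, repeated 10 times and averaged) can be sketched as follows. The classifier is passed in as a function; the nearest-centroid classifier below is only a stand-in chosen to keep the sketch dependency-free, not the SVM used in this paper, and all names and data are illustrative.

```python
import random


def repeated_holdout_accuracy(X, y, fit_predict, repeats=10, seed=0):
    """Mean test accuracy over repeated random half/half splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    half = len(X) // 2
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(indices)
        train, test = indices[:half], indices[half:]
        predicted = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        correct = sum(p == y[i] for p, i in zip(predicted, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / repeats


def nearest_centroid(train_X, train_y, test_X):
    """Stand-in classifier (not the paper's SVM): pick the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def closest(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return [closest(x) for x in test_X]


# Invented, well-separated one-dimensional data: class 0 near 0, class 1 near 10.
demo_X = [[float(i % 2) * 10 + (i % 5) * 0.1] for i in range(20)]
demo_y = [i % 2 for i in range(20)]
mean_accuracy = repeated_holdout_accuracy(demo_X, demo_y, nearest_centroid)
```

Averaging over repeated splits reduces the dependence of the reported accuracy on any single random partition.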
When more than 4 features are used, the recognition accuracy does not change significantly, but the program run time increases significantly. In valence recognition, the accuracy based on the first four features is 95.35%. Although the accuracy and the weight sum for arousal based on the top 6 features are marked "-" in Table 1, which indicates that 5 features have weights greater than 0 and the rest are negative, the running time can still be measured. On the one hand, this increases the number of channels used and results in excessive computation; on the other hand, it affects the real-time performance of emotion recognition. The channel selection method treats each channel as a whole and deletes channels on the basis of their recognition accuracy to obtain the optimal channel subset. Fig. 1 presents the improved RMFS, which combines the advantages of both and therefore not only improves the recognition accuracy but also reduces the run time. Table 2 shows the running times of the recognition programs using the first n channels when different thresholds p (p = 0.05, 0.1, ..., 0.9) are selected for the channels used.

Impact of RMFS on feature selection
All 32 subjects in DEAP were selected as experimental subjects. To increase the sample size, a sliding time window with a 4-second overlap was used to divide each 60 s sample into 14 segments of 1024 data points each. The data were divided into two groups, and 100 random experiments were performed. Figure 6 shows the classification statistics for the recognition of valence and arousal.
Figure 6 shows that 84.375% of subjects have a recognition accuracy higher than 90% in two-class recognition of valence and arousal, of which 34% have an accuracy higher than 95%, and only 3.125% of subjects, in two-class arousal recognition, have an accuracy lower than 85%. The experimental results show that the RMFS algorithm is significantly better than the traditional ReliefF algorithm.
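The segmentation figures given at the start of this section can be checked with a short calculation. The window length and sampling rate are not stated in the text; the sketch assumes 8-second windows that overlap by 4 seconds and a sampling rate of 128 Hz (the rate of the preprocessed DEAP recordings), which is one reading consistent with 14 segments of 1024 points. The count of 40 trials per subject is likewise taken from the DEAP dataset description rather than from this text.

```python
def segment_starts(total_seconds, window_seconds, overlap_seconds):
    """Start times (in seconds) of overlapping windows that fit in the recording."""
    step = window_seconds - overlap_seconds
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    return list(range(0, total_seconds - window_seconds + 1, step))


SAMPLING_RATE_HZ = 128     # assumption: preprocessed DEAP sampling rate
WINDOW_SECONDS = 8         # assumption: gives 1024 points at 128 Hz
OVERLAP_SECONDS = 4        # the 4-second overlap stated in the text
TRIALS_PER_SUBJECT = 40    # assumption: from the DEAP dataset description

starts = segment_starts(60, WINDOW_SECONDS, OVERLAP_SECONDS)
segments_per_trial = len(starts)                            # 14
points_per_segment = WINDOW_SECONDS * SAMPLING_RATE_HZ      # 1024
samples_per_subject = TRIALS_PER_SUBJECT * segments_per_trial  # 560
```

Under these assumptions the calculation reproduces the 14 segments and 1024 data points reported here, and 40 trials of 14 segments each give the 560 samples per subject mentioned in the single-participant experiment.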

Table 4
In particular, the reason is that both matching feature sets, for valence and for arousal, are used in four-class recognition, which adds the advantage of one feature.
When emotions are divided along more dimensions, the emotion recognition rate based on the RMFS algorithm is higher. The statistical results for the four types of emotions are shown in Figure 7. Figure 7 shows that more than 75% of the subjects have an accuracy greater than 85% for three types of emotions: happiness, anger, and relaxation. In terms of average accuracy, 37.5% of the subjects have a recognition accuracy higher than 90%, and 75% of the subjects have a recognition accuracy higher than 85%. Furthermore, the overall recognition result for the four emotion categories is better.

Conclusion
This study shows that the RMFS algorithm performs significantly better than the traditional ReliefF algorithm for emotion recognition on the EEG data of the 32 subjects in the DEAP dataset. The main findings on EEG-based emotion recognition are as follows: (1) The weight formula is optimized to obtain the classification contribution of each feature group and of the subject's matching feature group, as well as its proportion in the matching feature group. In particular, the feature groups that contribute little are eliminated to improve the recognition rate and the efficiency of the algorithm. (2) To obtain higher recognition accuracy, the specific feature groups of different subjects are identified, and channels are selected on the basis of each subject's specific features. (3) Compared with the traditional ReliefF algorithm, RMFS achieves higher recognition accuracy in both two-class and four-class emotion recognition.
In summary, the RMFS algorithm proposed in this paper obtains both the global matching feature groups and the matching feature groups of individual subjects, thereby improving recognition accuracy and algorithm efficiency; the results verify the effectiveness and feasibility of the RMFS algorithm.