2.1 Data Acquisition
Numerous studies have demonstrated that mental arithmetic tasks can induce psychological stress[18-20]; such tasks were therefore used as the stressor in the current experiment. Twenty-six well-educated, physically and mentally healthy volunteers aged 20-30, none of whom abused any substances, were recruited for the study. All participants completed informed consent forms before the experiment, and the experimental protocol was in accordance with the Declaration of Helsinki.
EEG data were collected using a 14-channel wireless EMOTIV EpocX headset at a sampling frequency of 128 Hz. Figure 1 displays the electrode schematic.
ECG data were gathered with a single-channel CHERO ECG patch at a sampling frequency of 250 Hz. The experiment software was built with Python 3.9 and the PySide6 framework on the Windows 10 platform. The computer controlled the marking signals for data collection to ensure high temporal precision.
The study comprised five distinct phases: a resting phase, a practice phase, and three formal phases. First, two minutes of resting-state data were gathered from each subject, during which they were not required to perform any task but simply remained relaxed. Figure 2 shows a flow chart of the experiment for all phases except the resting phase.
Each stage comprises n subsections, in each of which participants must answer m questions within 10 seconds. Each question consists of three two-digit numbers combined by addition and subtraction. The computer generates a random result close to the correct answer, and subjects use the keyboard to indicate whether this result is greater than, less than, or equal to the correct answer. There is a 5-second rest period after each subsection and a 20-second rest period in the middle of each stage. Task difficulty increases with each stage. Ideally, the resting stage yields data with no stress, stage 1 yields light-stress data, stage 2 yields concentrated-stress data, and stage 3 yields heavy-stress data. Data from the practice stage and the rest periods were discarded. During the experiment, participants were instructed to minimize signal artifacts by refraining from large movements and pressing the keyboard keys only lightly. Failure to adhere to these guidelines resulted in the experiment being restarted or terminated, depending on the circumstances.
After assessing data quality, we excluded two of the 26 participants due to excessive noise and retained the data of the remaining 24. For each subject, we selected the central 100 seconds of the 2-minute resting-state recording and partitioned it into ten equal segments. Together with the data from each subsection of the three formal phases, this yielded 4*10 = 40 samples per subject. Each sample contains 10 seconds of EEG and ECG data: 14*1280 sampling points for EEG and 1*2500 sampling points for ECG. Across the 24 subjects this gives 24*40 = 960 samples in total.
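The segmentation of the resting recording described above can be sketched as follows; the array shapes follow the sampling rates and durations given in the text, and the synthetic signals are placeholders for the real recordings.

```python
import numpy as np

FS_EEG, FS_ECG = 128, 250  # sampling rates (Hz) of the two devices

def split_resting(sig, fs):
    """Take the central 100 s of a 2-minute resting recording and
    split it into ten equal 10-second segments."""
    total_sec = sig.shape[-1] // fs             # recording length in seconds
    start = (total_sec - 100) // 2 * fs         # skip the same margin at each end
    central = sig[..., start:start + 100 * fs]  # central 100 seconds
    return np.split(central, 10, axis=-1)       # ten 10-s segments

# Synthetic 2-minute recordings standing in for one subject's data
eeg_segs = split_resting(np.random.randn(14, 120 * FS_EEG), FS_EEG)
ecg_segs = split_resting(np.random.randn(1, 120 * FS_ECG), FS_ECG)
# Each EEG segment has shape (14, 1280); each ECG segment (1, 2500)
```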
2.2 Data Pre-processing
Both the EpocX and CHERO devices apply some on-board denoising to the signal, and the denoising method used is not disclosed. This paper therefore applies no further pre-processing and analyzes the raw data directly, so as to retain as much information as possible.
2.3 Traditional Machine Learning Method
2.3.1 Feature Extraction
EEG band power and power asymmetry between brain regions appear to correlate strongly with psychological stress[4-6, 12]. The current study extracted the band power of Delta (0.5-3.5 Hz), Theta (4-7.5 Hz), Alpha (8-13 Hz), and Beta (14-30 Hz) from the 14 EEG channels. Furthermore, Alpha-band power asymmetry was computed for seven symmetrical channel pairs (AF3-AF4, F3-F4, FC5-FC6, F7-F8, T7-T8, P7-P8, O1-O2; shown in Figure 1), as summarized in Table 1.
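The band-power extraction can be sketched with a Welch periodogram; the paper does not state its spectral estimator or asymmetry index, so Welch's method and a log-ratio asymmetry are assumptions here.

```python
import numpy as np
from scipy.signal import welch

FS = 128  # EEG sampling rate (Hz)
BANDS = {"delta": (0.5, 3.5), "theta": (4, 7.5),
         "alpha": (8, 13), "beta": (14, 30)}

def band_powers(channel, fs=FS):
    """Mean power in each frequency band, via a Welch periodogram."""
    f, pxx = welch(channel, fs=fs, nperseg=fs * 2)
    return {name: pxx[(f >= lo) & (f <= hi)].mean()
            for name, (lo, hi) in BANDS.items()}

def alpha_asymmetry(left, right, fs=FS):
    """One common asymmetry index: log(right alpha) - log(left alpha)."""
    return (np.log(band_powers(right, fs)["alpha"])
            - np.log(band_powers(left, fs)["alpha"]))

# A pure 10 Hz tone should concentrate its power in the alpha band
t = np.arange(10 * FS) / FS
alpha_wave = np.sin(2 * np.pi * 10 * t)
p = band_powers(alpha_wave)
```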
Table 1. Description and numbers of extracted EEG features (63 EEG features in total per sample).
HRV, which reflects autonomic nervous system activity, has been the primary method for analyzing ECG signals[7-9]. Table 2 lists the time-domain, frequency-domain, and statistical features extracted in this study.
Table 2. Description of extracted ECG features (12 features in total per sample).
Feature | Description
VLF | Power of very low frequency (<0.04 Hz)
LF | Power of low frequency (0.04-0.15 Hz)
HF | Power of high frequency (0.15-0.4 Hz)
LF/HF | Ratio of LF to HF
RMSSD | Root mean square of differences between successive NN intervals
SDNN | Standard deviation of NN intervals
NN_mean | Mean NN interval
NN_min | Minimum NN interval
NN_max | Maximum NN interval
SDSD | Standard deviation of differences between successive NN intervals
PNN20 | NN20 count divided by the total number of NN intervals
Triangular Index | Triangular index based on the NN interval histogram
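The time-domain and statistical entries of Table 2 can be computed directly from a sequence of NN intervals; a minimal sketch (frequency-domain features and the triangular index omitted for brevity, and the toy NN intervals below are illustrative only):

```python
import numpy as np

def hrv_time_features(nn_ms):
    """Time-domain / statistical HRV features from NN intervals in ms."""
    nn = np.asarray(nn_ms, dtype=float)
    diffs = np.diff(nn)  # differences between successive NN intervals
    return {
        "RMSSD":   np.sqrt(np.mean(diffs ** 2)),
        "SDNN":    np.std(nn, ddof=1),
        "SDSD":    np.std(diffs, ddof=1),
        "NN_mean": nn.mean(),
        "NN_min":  nn.min(),
        "NN_max":  nn.max(),
        # PNN20: fraction of successive differences exceeding 20 ms
        "PNN20":   np.mean(np.abs(diffs) > 20.0),
    }

feats = hrv_time_features([800, 810, 790, 805, 795, 820])
```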
2.3.2 Classification
Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbors (KNN), and Stochastic Gradient Descent (SGD) algorithms are used to compare classification performance. Classification is carried out in two sessions. First, the EEG and ECG features are fed separately into the classification algorithms to obtain unimodal classification results. Then, multimodal fusion classification results are obtained using feature fusion and decision fusion strategies, respectively.
In the feature fusion approach, the EEG and ECG feature vectors are concatenated and fed into a single classifier. In the decision fusion approach, the two feature sets are fed into separate classifiers of the same type, and their outputs are summed to produce the final classification result. Figure 3 gives a visual representation of these two techniques.
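The two fusion strategies can be sketched with scikit-learn; logistic regression stands in for any of the listed classifiers, and the random feature matrices are placeholders with the dimensionalities of Tables 1 and 2.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_eeg = rng.normal(size=(200, 63))   # 63 EEG features per sample (Table 1)
X_ecg = rng.normal(size=(200, 12))   # 12 ECG features per sample (Table 2)
y = rng.integers(0, 4, size=200)     # four stress levels

# Feature fusion: concatenate both feature vectors, train one classifier.
X_fused = np.hstack([X_eeg, X_ecg])
clf = LogisticRegression(max_iter=1000).fit(X_fused, y)
pred_feat = clf.predict(X_fused)

# Decision fusion: one classifier per modality, sum the class
# probabilities, and take the argmax as the final decision.
clf_eeg = LogisticRegression(max_iter=1000).fit(X_eeg, y)
clf_ecg = LogisticRegression(max_iter=1000).fit(X_ecg, y)
proba = clf_eeg.predict_proba(X_eeg) + clf_ecg.predict_proba(X_ecg)
pred_dec = proba.argmax(axis=1)
```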
2.4 Deep Learning Method
CNNs have demonstrated high effectiveness in EEG- and ECG-related tasks[10-14]. In this study, we use CNN-based models to classify the EEG and ECG signals separately, and then feed the classification outputs of both into a decision model for the final decision. Below, we present the EEG, ECG, and decision models in turn.
2.4.1 Model of EEG Classification
For EEG signals, we used the convolutional model shown in Figure 4:
The raw 14-lead EEG signal serves as the model's input, with the data assigned to the 14 input channels of the convolutional layer. Temporal features are first extracted by the Convolution1 Layer, which consists of 28 convolutional kernels of size (1,3). Next, the first feature dimension (channel) is swapped with the last (sampling point), transforming the feature shape from (28, 1, 1278) to (1278, 1, 28). The transposed feature is passed to the Convolution2 Layer, whose 28 kernels of size (1,1) further extract the features of each Convolution1 channel. The transposition is then reversed, restoring the feature to the dimensions of the Convolution1 output. This feature enters the Convolution3 Layer, which contains 56 kernels of size (1,3), to extract deep temporal features. A channel-wise max pooling layer then reduces the feature dimensions from (56, 1, 1276) to (1, 1, 1276). Finally, the feature is flattened and fed into the Full Connection Layer for four-class classification. Notably, no padding is used in the convolution process.
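A PyTorch sketch of this pipeline is given below. Kernel counts, sizes, and the resulting shapes follow the text; the transpose + (1,1)-convolution pair is implemented under one possible reading, as an equivalent 1x1 channel-mixing convolution, and the ReLU activations are assumptions (the text does not name the activation functions).

```python
import torch
import torch.nn as nn

class EEGConvNet(nn.Module):
    """Sketch of the EEG classification model described in the text."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.conv1 = nn.Conv2d(14, 28, (1, 3))  # temporal features -> (28, 1, 1278)
        self.conv2 = nn.Conv2d(28, 28, (1, 1))  # mix the 28 Conv1 feature maps
        self.conv3 = nn.Conv2d(28, 56, (1, 3))  # deep temporal features -> (56, 1, 1276)
        self.fc = nn.Linear(1276, n_classes)    # four-class output

    def forward(self, x):                # x: (batch, 14, 1, 1280), no padding anywhere
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = torch.relu(self.conv3(x))
        x = x.max(dim=1).values          # channel-wise max pooling -> (batch, 1, 1276)
        return self.fc(x.flatten(1))     # flatten and classify

logits = EEGConvNet()(torch.randn(2, 14, 1, 1280))
```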
2.4.2 Model of ECG Classification
Inspired by ST-CNN-GAP-5-Net[21], we designed a convolutional network to analyze ECG signals as shown in Figure 5.
The model comprises six convolution modules; Figure 5 shows the structure of each module, in order: Convolution Layer, Batch Normalization Layer, ReLU Layer, and Pooling Layer. The first five modules are Temporal Convolution Modules and the last is a Spatial Convolution Module. Their kernel sizes are (1,5), (1,5), (1,5), (1,5), (1,3), and (1,3), with pooling sizes of (1,2), (1,4), (1,2), (1,4), (1,2), and (1,4), respectively. The five temporal modules use 4, 8, 16, 32, and 64 filters, respectively, and the Spatial Convolution Module then extracts spatial information from the features. Next, the features are downsized through a channel convolution with 64 filters and a kernel size of (12, 1), followed by global average pooling. Finally, the extracted features are fed into the Dense Module, consisting of a Full Connection Layer, Batch Normalization Layer, ReLU Layer, and Dropout Layer (dropout rate 0.1), to generate predicted probabilities for the four classes. Padding is applied throughout the convolution process.
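A PyTorch sketch of the temporal pipeline follows. Filter counts, kernel sizes, and pooling sizes are taken from the text; the (12, 1) channel convolution, inherited from the 12-lead ST-CNN-GAP-5 design, does not apply to a single-lead input of shape (batch, 1, 1, 2500) and is omitted here, so this is an adapted sketch rather than the paper's exact network.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, kernel, pool):
    """One convolution module: Conv -> BatchNorm -> ReLU -> Pool, with padding."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel, padding="same"),
        nn.BatchNorm2d(c_out),
        nn.ReLU(),
        nn.MaxPool2d(pool),
    )

class ECGConvNet(nn.Module):
    """Adapted sketch of the ECG model for a single-lead 10 s signal."""
    def __init__(self, n_classes=4):
        super().__init__()
        chans = [1, 4, 8, 16, 32, 64]
        kernels = [(1, 5)] * 4 + [(1, 3)]
        pools = [(1, 2), (1, 4), (1, 2), (1, 4), (1, 2)]
        self.temporal = nn.Sequential(*[
            conv_block(chans[i], chans[i + 1], kernels[i], pools[i])
            for i in range(5)
        ])
        self.spatial = conv_block(64, 64, (1, 3), (1, 4))  # sixth module
        self.head = nn.Sequential(                         # Dense Module
            nn.Linear(64, n_classes),
            nn.BatchNorm1d(n_classes),
            nn.ReLU(),
            nn.Dropout(0.1),
        )

    def forward(self, x):                 # x: (batch, 1, 1, 2500)
        x = self.spatial(self.temporal(x))
        x = x.mean(dim=(2, 3))            # global average pooling -> (batch, 64)
        return self.head(x)

out = ECGConvNet().eval()(torch.randn(2, 1, 1, 2500))
```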
2.4.3 Model of Decision Fusion
The predicted probabilities of EEG and ECG for the four categories were acquired from the models described in Sections 2.4.1 and 2.4.2, respectively. The probability values of all training-set samples were then fed into the decision fusion model shown in Figure 6 for training.
First, the predicted EEG and ECG probabilities for the four pressure levels are summed element-wise to form a new probability vector. This vector is then stacked with the EEG and ECG probability vectors to create a decision matrix of size (4, 3). The matrix is fed into the Decision Convolution Model illustrated in Figure 6. The Convolution1 Layer, with 2 filters of kernel size (2,3), learns weights over the EEG, ECG, and summed probabilities automatically. The Convolution2 Layer, with 4 filters of kernel size (2,1), then extracts lateral decision features; all features are flattened and fed into the Full Connection Layer to obtain the final classification result. No padding is used in the convolution process.
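The decision-matrix construction and the small convolutional fusion model can be sketched as follows; kernel counts and sizes follow the text, while the ReLU activations are assumptions.

```python
import torch
import torch.nn as nn

class DecisionFusionNet(nn.Module):
    """Sketch of the decision fusion model. The (4, 3) decision matrix
    stacks, column-wise, the EEG probabilities, the ECG probabilities,
    and their element-wise sum for the four classes."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 2, (2, 3))  # weight the three probability columns
        self.conv2 = nn.Conv2d(2, 4, (2, 1))  # lateral decision features
        self.fc = nn.Linear(4 * 2 * 1, n_classes)

    def forward(self, p_eeg, p_ecg):          # each: (batch, 4) class probabilities
        m = torch.stack([p_eeg, p_ecg, p_eeg + p_ecg], dim=2)  # (batch, 4, 3)
        x = torch.relu(self.conv1(m.unsqueeze(1)))  # no padding -> (batch, 2, 3, 1)
        x = torch.relu(self.conv2(x))               # -> (batch, 4, 2, 1)
        return self.fc(x.flatten(1))                # final four-class output

fused = DecisionFusionNet()(torch.rand(5, 4), torch.rand(5, 4))
```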
2.5 Performance Metrics
Accuracy and the Kappa score are commonly used measures of model performance and are among the most important metrics in classification tasks; they serve as the basis for comparing model performance in this paper.
Accuracy is the ratio of correct predictions to the total number of predictions. It is computed from the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN):

Accuracy = (TP + TN) / (TP + FP + TN + FN)
The Kappa coefficient measures whether the model's prediction accuracy is balanced across classes and is determined by:

Kappa = (Pa - Pe) / (1 - Pe)
where Pa is the actual percentage of agreement, and Pe is the expected percentage chance of agreement.
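A worked computation of both metrics, with Pe estimated from the marginal label frequencies as in Cohen's kappa (the toy labels are illustrative only):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def cohen_kappa(y_true, y_pred):
    """Kappa = (Pa - Pe) / (1 - Pe), where Pa is the observed agreement
    and Pe the chance agreement from the marginal class frequencies."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    pa = np.mean(y_true == y_pred)
    pe = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    return (pa - pe) / (1 - pe)

y_true = [0, 1, 2, 3, 0, 1, 2, 3]
y_pred = [0, 1, 2, 3, 0, 1, 2, 0]   # one of eight predictions is wrong
acc = accuracy(y_true, y_pred)      # 7/8 = 0.875
kappa = cohen_kappa(y_true, y_pred)
```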
2.6 1D-Grad-CAM
Grad-CAM[22] obtains the weights of any convolutional layer with respect to the target category through backpropagation. The original method overlays the features learned by the convolutional layer on the input image; this paper instead combines the convolutional-layer weights with the input signal and displays the information learned by the convolutional layer on the 1D signal for improved interpretability. For the derivation of Grad-CAM, please refer to the original article.
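The 1D adaptation can be sketched as follows: the class-score gradients are averaged over time to weight the target layer's feature maps, and the resulting map is interpolated to the input length so it can be overlaid on the raw signal. The toy model at the bottom is purely for demonstration, not one of the networks above.

```python
import torch
import torch.nn as nn

def grad_cam_1d(model, conv_layer, x, target_class):
    """1D Grad-CAM: weight each feature map of `conv_layer` by the mean
    gradient of the target-class score, sum, ReLU, and upsample to the
    input length for overlay on the raw 1D signal."""
    feats, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    score = model(x)[0, target_class]   # scalar class score
    model.zero_grad()
    score.backward()                    # fills grads["a"] via the hook
    h1.remove(); h2.remove()
    weights = grads["a"].mean(dim=-1, keepdim=True)      # average gradients over time
    cam = torch.relu((weights * feats["a"]).sum(dim=1))  # weighted sum of feature maps
    cam = nn.functional.interpolate(cam.unsqueeze(1), size=x.shape[-1],
                                    mode="linear", align_corners=False)
    return cam.squeeze(1).detach()      # (batch, input_length)

# Toy 1D model, assumed here purely for demonstration
model = nn.Sequential(nn.Conv1d(1, 8, 3), nn.ReLU(),
                      nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 4))
cam = grad_cam_1d(model, model[0], torch.randn(1, 1, 128), target_class=2)
```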