Fusion Classification of Stroke Patients' Biosignals by Weighted Cross-Validation-based Feature Selection (W-CVFS) Method

A multi-source information fusion-based disease class classification of stroke patients was implemented to address the low classification accuracy obtained when motion signals or electromyographic signals are used alone as input. A MYO arm ring sEMG sensor and a Shimmer wearable wireless motion sensor were used as data acquisition devices. The sEMG was filtered with a Butterworth high-pass filter, and active segments were detected with the envelope thresholding method. Interference was removed from the motion signal with an FIR filter designed by the window function method. A weighted cross-validation-based feature selection (W-CVFS) method is proposed for feature fusion selection. The top 10 features selected by the W-CVFS method and all 18 features were input to a deep neural network for training and testing. The classification accuracy of the features selected by W-CVFS was 79.17%, better than that of the existing mRMR (66.67%) and ILFS (62.50%) methods. The classification accuracy of multi-source information fusion was 95.385%, higher than that obtained with motion signals or sEMG alone. The experiments showed that the proposed method retains the features that have more influence on the classification results and improves the classification accuracy of the rehabilitation model for stroke patients.


Introduction
Cerebral stroke, a cerebrovascular disease with a high fatality rate, is the main cause of disability among adults [1]. Surface electromyography (sEMG) [2] records a bioelectrical signal produced by the human body that carries meaningful information about muscle activity. It can be used to identify movement intention, evaluate the functional state of muscles, and study motor control. It also has many applications in neuromuscular physiology.
Using a genetic algorithm (GA) and a pseudo-wavelet function, the classification rate of muscle fatigue was improved by 4.45% to 14.95% (p<0.05), with an average correct classification rate of 87.90% [3]. In another study, performance was evaluated using six different sEMG signals recorded during varying arm movements and four different classifiers. The MT classifier gave better classification accuracy, a 6% improvement in discrimination over features extracted from the original sEMG signals [4]. Nonlinear and time-domain features of surface EMG signals were fused with the SVM-DS fusion algorithm, and the recognition accuracy stabilized at 95% [5]. Surface EMG features were extracted using principal component analysis (PCA), and the feature extraction methods were compared using discriminant analysis (DA). The recognition rate for six gestures reached 98.29% [6]. Upper limb movements were identified from surface EMG signals using digital signal processing, the discrete wavelet transform, and an enhanced probabilistic neural network (EPNN), with an average classification accuracy of 75.5% [7]. Six different hand motions were identified by comparing frequency-domain (FD) and time-frequency-domain (TFD) features with a k-nearest neighbor (KNN) classifier. This gave 95.5% classification accuracy for the TFD feature vector and 89% for the FD feature vector [8]. An improved deep BP (backpropagation)-LSTM network was proposed for sEMG signal classification, achieving an accuracy of 92% [9]. Individual time-frequency-domain features were compared using a support vector machine (SVM) classifier and a linear regression (LR) model. The SVM classifier outperformed the LR model, achieving a classification accuracy of 95.8% [10]. Data from surface electromyography (sEMG) and accelerometer (ACC) sensors have also been analyzed using a multilayer neural network and an adaptive neuro-fuzzy inference system [11].
Most of the above studies used a single surface EMG signal for analysis. Experiments have shown that the classification accuracy obtained with motion signal features alone or EMG features alone is low and cannot meet the requirements of clinical rehabilitation assessment. Both training and test accuracy improve when fused motion and sEMG features are used as input. Therefore, this paper fuses the biosignals of stroke patients and classifies them from combined motion features and surface EMG features. The aims are to address the insufficient performance of single-input biosignal classification, improve the accuracy of biosignal classification for stroke patients, and assist physicians in grading patients. A weighted cross-validation-based feature selection (W-CVFS) method is used for feature selection. It is validated experimentally on the collected patient data and compared with the classical mRMR and ILFS methods, and extensive experiments show that W-CVFS achieves higher classification accuracy. The proposed method selects the features that have more influence on the classification results more effectively. It thus improves the classification accuracy of the rehabilitation model for stroke patients.

Data acquisition
In this paper, we use a novel dataset collected from 30 stroke patients. Table 1 shows the general information of the patients who participated in the data collection experiment. We collect two kinds of data for every patient: signals (both motion and sEMG) and disease stage. In this section, we introduce the collection process for each type of data.
Actual disease stage of patients. To obtain the ground-truth prediction target, we collect each patient's clinical file and have clinical experts assess the patient's disease stage.
Motion and sEMG signals. Both signals are collected with wearable sensors. Concretely, we use a Shimmer device to record the motion signal at the wrist and a MYO arm ring to record the sEMG signal from the upper arm. Both signals are recorded during four selected rehabilitation movements: shoulder forward flexion, shoulder forward exhibition, shoulder 0° elbow 90° forearm pronation, and hand touching the lumbar vertebrae. The acquisition steps are as follows:
Step1. The technician helps the patient put on the Shimmer and MYO devices [12].
Step2. The patient sits in a relaxed position and adjusts the seat or bed to a comfortable height for better movement.
Step3. Under the guidance of the rehabilitation therapist, the patient practices the four selected movements several times to become familiar with them.
Step4. After hearing the technician's instruction, the patient completes the four movements in sequence, performing each movement three times with a three-second interval between repetitions.
Precautions:
1. The technician disinfected the relevant areas of the skin with medical alcohol to improve data quality.
2. To ensure the accuracy and consistency of the collected data, all motion (Shimmer) sensors were worn on the outside of the wrist, and all EMG (MYO) sensors were worn on the upper arm, up to 3 cm from the elbow joint.
3. During the collection process, we recorded the signal for each movement three times, with a 3-second interval between consecutive movements. We started recording 2 seconds before each movement and discarded movements that did not meet the rehabilitation therapist's criteria.
Figure 1 shows the wearing positions of the motion sensor and the MYO arm ring. Because the energy of the sEMG is mainly concentrated above 20 Hz, the sEMG is filtered with a Butterworth high-pass filter with a cutoff frequency of 20 Hz.
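The high-pass step can be sketched in Python with SciPy as shown below. This is an illustration under stated assumptions, not the authors' MATLAB code. The 200 Hz sampling rate, the 4th filter order, and the zero-phase `filtfilt` application are our assumptions. Only the Butterworth design and the 20 Hz cutoff come from the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def highpass_semg(emg: np.ndarray, fs: float = 200.0, cutoff: float = 20.0,
                  order: int = 4) -> np.ndarray:
    """Butterworth high-pass filter applied along axis 0 (samples x channels).

    ASSUMPTIONS: fs = 200 Hz, order = 4, and zero-phase filtering via
    filtfilt are illustrative choices; only the 20 Hz cutoff is from the text.
    """
    # Design the filter directly in Hz by passing the sampling rate.
    b, a = butter(order, cutoff, btype="highpass", fs=fs)
    # Forward-backward filtering avoids shifting the signal in time.
    return filtfilt(b, a, emg, axis=0)


# Usage: a 2 Hz drift is suppressed while a 50 Hz component passes.
t = np.arange(400) / 200.0
drift = np.sin(2 * np.pi * 2 * t)
tone = np.sin(2 * np.pi * 50 * t)
filtered = highpass_semg(np.stack([drift, tone], axis=1))
```

Zero-phase filtering is chosen here so that the later envelope and active-segment boundaries are not delayed. A causal `lfilter` would also work but would shift them.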

Active segment detection
To extract valuable information from the recorded movements, we perform active segment detection during data pre-processing. We adopt an effective and efficient method, the envelope threshold method, to detect the active segments. Figure 2 shows the filtered sEMG and its envelope. Figure 3 shows the division of the sEMG into active segments.

Fig. 2 Filtered sEMG Data and Envelope
Fig. 3 Active Segment Division for sEMG
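A minimal NumPy sketch of envelope-threshold segmentation follows. The paper does not specify its envelope or threshold, so the following are illustrative assumptions:

- a moving-RMS envelope with a 0.1 s window;
- a threshold equal to the baseline mean plus three standard deviations, estimated from the first 2 s (the acquisition protocol records 2 s before each movement);
- a 0.2 s minimum segment length.

```python
import numpy as np


def envelope(emg: np.ndarray, fs: float, win_s: float = 0.1) -> np.ndarray:
    """Moving-RMS envelope of one filtered sEMG channel (window is an assumption)."""
    w = max(1, int(round(win_s * fs)))
    kernel = np.ones(w) / w
    return np.sqrt(np.convolve(emg ** 2, kernel, mode="same"))


def active_segments(env: np.ndarray, fs: float, baseline_s: float = 2.0,
                    k: float = 3.0, min_len_s: float = 0.2):
    """Return (start, end) sample pairs where env exceeds a resting threshold.

    ASSUMPTION: threshold = mean + k * std of the first baseline_s seconds.
    """
    base = env[: int(baseline_s * fs)]
    thr = base.mean() + k * base.std()
    active = env > thr
    # Rising (+1) and falling (-1) edges of the boolean activity mask.
    edges = np.diff(active.astype(int))
    starts = list(np.flatnonzero(edges == 1) + 1)
    ends = list(np.flatnonzero(edges == -1) + 1)
    if active[0]:
        starts.insert(0, 0)
    if active[-1]:
        ends.append(len(env))
    # Drop spurious crossings shorter than the minimum segment length.
    min_len = int(min_len_s * fs)
    return [(s, e) for s, e in zip(starts, ends) if e - s >= min_len]
```

The minimum-length rule is what keeps brief noise spikes in the resting signal from being reported as movements.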

Motion signals
Because motion signals of the human body are predominantly low-frequency, we adopt FIR filtering for motion signal pre-processing. Specifically, we design the filter with the FIR1 function (window method) and set the cutoff frequency to 5.5 Hz according to the characteristics of human biomedical signals.
As FIR1 requires a normalized cutoff frequency as input, the cutoff frequency is normalized before the filter is designed. With a sampling frequency $f_s$ and a cutoff frequency $f_c$, the normalized cutoff frequency $W_n$ (relative to the Nyquist frequency $f_s/2$) is calculated as [14]
$$W_n = \frac{f_c}{f_s/2} = \frac{2 f_c}{f_s}.$$
In this study, the sampling frequency $f_s$ is
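The normalization and a window-method FIR low-pass filter can be sketched with SciPy's `firwin`. Like MATLAB's `fir1`, it uses a Hamming window by default. The 101-tap length, the zero-phase `filtfilt` application, and the 100 Hz sampling rate used in the check are hypothetical. The paper's own sampling rate and filter order are not given here.

```python
import numpy as np
from scipy.signal import filtfilt, firwin


def normalized_cutoff(fc: float, fs: float) -> float:
    """W_n = fc / (fs / 2): cutoff as a fraction of the Nyquist frequency."""
    return 2.0 * fc / fs


def lowpass_motion(x: np.ndarray, fs: float, fc: float = 5.5,
                   numtaps: int = 101) -> np.ndarray:
    """Window-method FIR low-pass filter applied along axis 0.

    ASSUMPTIONS: numtaps = 101 and zero-phase filtfilt are illustrative;
    firwin's default Hamming window mirrors MATLAB fir1's default window.
    """
    wn = normalized_cutoff(fc, fs)
    if not 0.0 < wn < 1.0:
        raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
    # By default, firwin interprets the cutoff relative to the Nyquist frequency.
    taps = firwin(numtaps, wn)
    return filtfilt(taps, [1.0], x, axis=0)
```

For example, a hypothetical $f_s = 100$ Hz gives $W_n = 2 \times 5.5 / 100 = 0.11$. The MATLAB equivalent would be `fir1(100, 0.11)`, since `fir1` takes the filter order (taps minus one).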

Feature selection
Feature selection is the process of selecting a subset of relevant features for model construction. As a dimensionality reduction technique, it can help the model save computational resources and generalize better without sacrificing prediction performance [15].

W-CVFS method
Based on the CVFS method, this paper proposes a weighted cross-validation-based feature selection (W-CVFS) method. Logistic regression is used as the classifier. It converts the classification problem into a regression problem and yields a weight coefficient for each feature. The detailed steps are as follows:
Step 1: Randomly divide the samples into K equal groups, and use multi-class logistic regression as the classification method.
Step 2: In each round of cross-validation, calculate the weight of each feature for each category, and obtain the model's classification accuracy for that round.
Step 3: Multiply the classification accuracy of the round by the weight of each feature obtained in the same round. Record the product as the feature weight for each category in that round.
Step 4: After K rounds of cross-validation, average the weight coefficients of each feature over the K rounds. Then sort the feature weights under each category in descending order to obtain the final weight ranking of all features.
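The four steps can be sketched in NumPy as follows. This is a hypothetical illustration, not the authors' MATLAB implementation. It makes the following assumptions:

- Logistic regression is fitted by plain gradient descent, and features are standardized within each fold.
- Absolute coefficient values are used as the per-category feature weights.
- The per-category weights are combined by a simple mean over categories, because the text does not say how the category rankings are merged.

```python
import numpy as np


def fit_softmax(X, y, n_classes, lr=0.1, epochs=500):
    """Multinomial logistic regression trained by batch gradient descent on the
    mean negative log-likelihood. Returns weights W (d x C) and bias b (C,)."""
    n, d = X.shape
    W, b = np.zeros((d, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                       # one-hot labels (n x C)
    for _ in range(epochs):
        Z = X @ W + b
        Z -= Z.max(axis=1, keepdims=True)          # numerically stable softmax
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                            # gradient w.r.t. the logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b


def w_cvfs_rank(X, y, k=5, seed=0):
    """Hypothetical W-CVFS sketch: rank features by accuracy-weighted,
    fold-averaged |coefficients|. y must hold integer labels 0..C-1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_classes = int(y.max()) + 1
    folds = np.array_split(rng.permutation(n), k)          # Step 1
    fold_weights = np.zeros((k, d, n_classes))
    for f, test in enumerate(folds):
        train = np.concatenate([folds[j] for j in range(k) if j != f])
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-12
        Xtr, Xte = (X[train] - mu) / sd, (X[test] - mu) / sd  # comparable scales
        W, b = fit_softmax(Xtr, y[train], n_classes)          # Step 2
        acc = np.mean(np.argmax(Xte @ W + b, axis=1) == y[test])
        fold_weights[f] = acc * np.abs(W)                     # Step 3
    per_class = fold_weights.mean(axis=0)     # Step 4: average over the K folds
    score = per_class.mean(axis=1)            # ASSUMPTION: mean over categories
    return np.argsort(-score), score          # descending-weight ranking
```

Standardizing inside each fold keeps the coefficients comparable across features measured in different units, without leaking test-fold statistics into training.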
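The specific calculation is as follows. For a sample $x$ and $C$ categories, the softmax function gives a probability distribution over all categories,
$$P(y = i \mid x; \theta) = \frac{\exp(\theta_i^{\top} x)}{\sum_{j=1}^{C} \exp(\theta_j^{\top} x)}, \qquad i = 1, \dots, C,$$
which satisfies $\sum_{i=1}^{C} P(y = i \mid x; \theta) = 1$.
For the logistic regression model, the weight vectors $\theta_i$ are chosen so that the model's outputs on the training set are as close as possible to the given labels. Maximum likelihood estimation is therefore used. For $N$ training samples $(x_n, y_n)$, the likelihood function $G$ to be maximized is
$$G(\theta) = \prod_{n=1}^{N} \prod_{i=1}^{C} P(y_n = i \mid x_n; \theta)^{\mathbb{1}\{y_n = i\}}.$$
Taking the logarithm and adding a negative sign turns the maximization into a minimization. This gives the loss function that logistic regression minimizes:
$$J(\theta) = -\log G(\theta) = -\sum_{n=1}^{N} \sum_{i=1}^{C} \mathbb{1}\{y_n = i\} \log P(y_n = i \mid x_n; \theta).$$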
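To make the minimization step concrete, the standard gradient of the softmax negative log-likelihood is given below. Here $\theta_i$ are the per-category weight vectors, $C$ the number of categories, and $N$ the number of training samples. The learning rate $\eta$ and the plain gradient-descent update are hypothetical; the paper does not state which optimizer was used.

```latex
P(y=i \mid x;\theta) = \frac{\exp(\theta_i^{\top}x)}{\sum_{j=1}^{C}\exp(\theta_j^{\top}x)},
\qquad
J(\theta) = -\sum_{n=1}^{N}\sum_{i=1}^{C}\mathbb{1}\{y_n=i\}\,\log P(y_n=i \mid x_n;\theta)

% since \log P(y=k \mid x;\theta) = \theta_k^{\top}x - \log\sum_{j=1}^{C}\exp(\theta_j^{\top}x):
\frac{\partial J}{\partial \theta_i}
  = -\sum_{n=1}^{N}\Big(\mathbb{1}\{y_n=i\} - P(y_n=i \mid x_n;\theta)\Big)\,x_n

% illustrative gradient-descent update with hypothetical learning rate \eta:
\theta_i \leftarrow \theta_i - \eta\,\frac{\partial J}{\partial \theta_i},
\qquad i = 1,\dots,C
```

The fitted $\theta_i$ supply the per-category feature weights that W-CVFS multiplies by each fold's accuracy.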

Results
We verified that the motion and sEMG features selected by the weighted cross-validation-based feature selection method have more influence on the classification results by comparing it with the mRMR and ILFS methods. In this method, a larger weight indicates a feature with more influence on the classification results and thus more value for the experiments. The comparison of feature selection methods was performed using data from 36 patients.
There were 4 patients in class III, 14 in class IV, 13 in class V, and 5 in class VI, for a total of 108 data sets. Experiments were conducted on the MATLAB R2016a platform. The mRMR method, the ILFS method, and the proposed W-CVFS method were each used to select features from the extracted motion signal features and the 18 sEMG features. Table 3 lists the signal features, and Table 4 shows the feature selection results of the three methods. As Table 5 shows, the W-CVFS method outperforms the mRMR and ILFS methods over all 18 extracted features. To further verify the advantage of the proposed method in classification, the feature weights were sorted in descending order and the top 10 features were selected, as shown in Table 4. Five-fold cross-validation is used, and the classification accuracy of each fold is included in the calculation of the feature weights to make the weights more reliable. Finally, the extracted motion features and sEMG features are fused and selected with the W-CVFS method. The final feature selection results are compared in terms of classification performance, with an SVM [20] as the classifier. With SVM classification, the W-CVFS method selected the features with a greater impact on the results. It also classified the stroke patients' disease class more accurately than the mRMR and ILFS methods. Table 6 shows the classification accuracies obtained with the different feature selection methods.
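Feature-level fusion followed by top-10 selection can be sketched as follows. This is a hypothetical illustration: the text does not specify how motion and sEMG features are fused. Standardizing each modality and concatenating the columns is our assumption. The 9 + 9 split in the usage example is likewise invented for illustration; only the 108 samples and the top-10 cut come from the text.

```python
import numpy as np


def zscore(F: np.ndarray) -> np.ndarray:
    """Standardize each feature column; constant columns become all zeros."""
    sd = F.std(axis=0)
    sd[sd == 0] = 1.0
    return (F - F.mean(axis=0)) / sd


def fuse_and_select(motion: np.ndarray, semg: np.ndarray,
                    weights, top_k: int = 10):
    """ASSUMPTION: fuse by column concatenation, then keep the top_k columns
    ranked by a precomputed per-feature weight (e.g. a W-CVFS score)."""
    fused = np.hstack([zscore(motion), zscore(semg)])
    weights = np.asarray(weights, dtype=float)
    if weights.shape != (fused.shape[1],):
        raise ValueError("need exactly one weight per fused feature column")
    keep = np.argsort(-weights)[:top_k]        # highest weights first
    return fused[:, keep], keep


# Usage with hypothetical sizes: 108 samples, 9 motion + 9 sEMG features.
rng = np.random.default_rng(0)
selected, kept = fuse_and_select(rng.normal(size=(108, 9)),
                                 rng.normal(size=(108, 9)),
                                 weights=np.arange(18))
```

The reduced matrix `selected` is what would then be passed to a classifier such as the SVM.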

The features selected by the W-CVFS method achieve a classification accuracy of 79.17% when input to the SVM. The features selected by the mRMR and ILFS methods achieve 66.67% and 62.50%, respectively. This indicates that the proposed W-CVFS method selects the more important features that have more influence on the classification results. Table 7 shows the classification accuracy of the deep neural network with three different inputs. … respectively, which are only 0.371% and 0.769% higher than the accuracy obtained with the top 10 features. This indicates that the top 10 features in the W-CVFS weight ranking achieve nearly the same classification accuracy while reducing the amount of data processing and the computational cost.
In summary, the W-CVFS method has the following advantages:
1. The entire dataset is randomly divided into K groups for cross-training and validation, so multiple combinations of the data are considered and every sample is used for both training and validation. The feature selection results obtained by cross-validation are therefore more relevant and reasonable than those from a single calculation.

Discussion
In this paper, a weighted cross-validation-based feature selection (W-CVFS) method is proposed. Its effectiveness for feature selection is verified by comparing it with the classical mRMR and ILFS methods on the extracted features. Cross-validation makes the feature selection results more relevant and reasonable. Because each round uses different training and test sets, the weights obtained for the same feature differ between rounds, which improves the generalization ability of the selected features. At the same time, incorporating the classification accuracy of each round into the final weight of each feature makes the method more effective at selecting the features that most influence the subsequent classification step. After the effectiveness of the proposed method was verified, three cases were compared: motion signal features, surface EMG features, and fused motion and EMG features. The results show that classification accuracy is higher with the W-CVFS method. The proposed method selects the features that have more influence on the classification results more effectively. This makes patient classification more accurate and facilitates objective assessment of patients' degree of rehabilitation.