Classification of Electrocardiogram Signals With Waveform Morphological Analysis and Support Vector Machines

Background: The electrocardiogram (ECG) reflects the occurrence of various cardiac diseases, and accurate classification of ECG signals is important for the automatic diagnosis of arrhythmia. Methods: This paper presents a novel classification method based on multiple features that combines waveform morphology with frequency-domain statistical analysis. It offers better classification accuracy and reduces the time spent classifying signals. Wavelet packet decomposition is applied to the denoised ECG signal, and the singular value, maximum value and standard deviation of the decomposed wavelet packet coefficients are calculated to obtain the frequency-domain feature space. The slope threshold method is applied to detect R peaks and calculate RR intervals, and the first two RR intervals are extracted as time-domain features. The fusion feature space is composed of the time-domain and frequency-domain features. Results: A support vector machine (SVM) optimised by grid search is combined with waveform morphological analysis to classify nine types of ECG signals. Computer simulations show that the accuracy of the proposed algorithm on data from multiple arrhythmia databases can reach 96.67%. Conclusions: The proposed approach classified the arrhythmias in ECG signals with promising results. The experimental results reveal that classification accuracy can reach 96.67% when the penalty factor C is 9.1896 and the kernel function parameter g is 0.10882.


Introduction
Cardiovascular diseases have high morbidity and mortality rates worldwide and seriously threaten people's health [1,2]. To reduce mortality, arrhythmia should be properly identified and patients should promptly receive appropriate treatment. The electrocardiogram (ECG) is an intuitive reflection of heart activity and contains a large amount of information about it. Therefore, the monitoring and recognition of ECG signals are important issues in biomedicine. Automatic ECG classification is highly convenient because it allows people to monitor their heart condition with wearable devices and does not require physicians to analyse the signals manually. ECG feature extraction plays a vital role in the automated classification and diagnosis of ECG signals. There are three popular types of ECG feature extraction methods: waveform morphology [3,4,5], transformation-based [6,7] and statistical [8,9] methods. Waveform morphology methods mainly extract waveform information at feature points, such as the RR intervals, PR intervals, P waves and T waves of ECG signals [10,11]. Transformation methods mainly extract decomposition coefficients as features through the Fourier transform [12], the wavelet transform [13] or empirical mode decomposition [7]. Statistical features describe the distribution of individual samples within the population and characterise the correlations between certain features. Because accurate classification is difficult to achieve with a single feature type, extracting multiple types of features to form fusion features has become increasingly common [14].
In ECG training and classification, researchers have pursued higher detection accuracy in various ways. C. Ye [14] performed a wavelet transformation and independent component analysis and extracted the RR intervals of ECG signals as features; the acquired features were then fed to a support vector machine (SVM). S. Raj et al. [15] used sparse representation to decompose ECG signals into fundamental waves, extracted the time delay, frequency, width parameters and expansion coefficients of these fundamental waves from a Gabor dictionary as feature vectors, and classified them with a least-squares twin SVM. S. U. Kumar et al. [16] constructed a novel automatic classification system for ECG signals that uses a discrete wavelet transform to extract signal features and a neighbourhood rough set classification algorithm to classify five types of ECG signals.
At present, most ECG signal classification methods use only a single classifier. However distinct the waveform characteristics of a signal are, a complete feature extraction step is still required, which adds unnecessary computation. To address this, a combination of waveform morphological analysis and an SVM optimised by the grid search (GS) method is applied to classify ECG signals. The algorithm classifies nine types of ECG signals: normal ECG (N), sinus bradycardia (SB), ventricular tachycardia (VT), ventricular premature beats (V), atrial premature beats (A), atrial fibrillation (AF), atrial tachycardia (AT), sinus arrest (SA) and sinus tachycardia (ST). Five types, namely SB, A, AT, SA and ST, are identified through waveform morphology analysis, and the other four types are recognised through the SVM.
The rest of the article is organised as follows: Sect. 2 describes and analyses the experimental results, Sect. 3 presents the conclusions, and Sect. 4 introduces the methods used in this paper and explains why these features are used.

Classification results and analysis
Database used
Data are obtained from three sources: the MIT-BIH Arrhythmia Database, the Long-Term AF Database (LTAFDB) and a Fluke physiological parameter simulator (ProSim 2). When ECG signals are collected, segments of 1,000 points are extracted, and the sampling frequency is 360 Hz. For each ECG type, 60 segments are extracted, giving a total of 540 sets of data. The specific data sources are shown in Table 1.
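As a rough illustration of the data bookkeeping described above, the sketch below cuts a record into 1,000-sample segments and shows that each segment lasts about 2.78 s at 360 Hz, with 9 × 60 = 540 segments in total. The helper name `segment_record`, the use of non-overlapping windows and the dropping of an incomplete tail are assumptions for illustration only; the text does not say how segment start points were chosen.

```python
FS = 360          # sampling frequency in Hz (from the text)
SEG_LEN = 1000    # samples per segment (from the text)


def segment_record(signal, seg_len=SEG_LEN, max_segments=None):
    """Split a record into non-overlapping segments of seg_len samples.

    Hypothetical helper: non-overlapping windows and dropping the
    incomplete tail are illustrative choices, not the paper's procedure.
    """
    n_full = len(signal) // seg_len
    if max_segments is not None:
        n_full = min(n_full, max_segments)
    return [signal[k * seg_len:(k + 1) * seg_len] for k in range(n_full)]


segment_seconds = SEG_LEN / FS        # about 2.78 s per segment
total_sets = 9 * 60                   # nine ECG types x 60 segments each

record = [0.0] * 3500                 # placeholder record of 3,500 samples
segments = segment_record(record)
print(len(segments), round(segment_seconds, 2), total_sets)  # 3 2.78 540
```

With `max_segments=60`, the same helper would cap the number of segments taken from one record.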

Waveform Morphology Analysis
The basic waveform characteristics of the nine types of ECG signals to be classified were compiled, and the resulting waveform statistics are shown in Table 2.
Judgment conditions for classifying the waveform shape can be set according to the different waveform characteristics of the arrhythmias listed in Table 2. If all nine types of ECG signals are classified with waveform morphology alone, the resulting confusion matrix is shown in Figure 1, where labels 1-9 denote N, SB, VT, V, A, AF, AT, SA and ST, respectively. The average accuracy, sensitivity and positive predictive value of the waveform morphology classification are 62.19%, 65.19% and 63.91%, respectively. Figure 1 shows that VT and V are both misclassified and that the misclassification rates of N and AF both exceed 50%, which seriously affects the recognition and diagnosis of ECG signals. For the other types, however, the classification performs better, and the accuracy for SB, SA, ST and AT reaches 100%.
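The judgment conditions of Table 2 are not reproduced in this text, so the sketch below only illustrates the general shape of a rule-based morphology classifier that works on RR intervals. The thresholds are hypothetical textbook-style values (mean heart rate below 60 bpm for SB, above 100 bpm for ST, and a pause of at least 2 s for SA), and the function name `morphology_rule` is invented for illustration. Returning `None` means the rules are not confident and the segment should be handed to the SVM.

```python
FS = 360  # sampling frequency in Hz (from the text)


def morphology_rule(rr_samples, fs=FS,
                    brady_bpm=60.0, tachy_bpm=100.0, pause_s=2.0):
    """Illustrative rule-based labelling from RR intervals given in samples.

    All thresholds are hypothetical placeholders, not the paper's Table 2.
    Returns 'SA', 'SB', 'ST', or None (defer to the SVM).
    """
    if not rr_samples:
        return None
    rr_s = [r / fs for r in rr_samples]           # RR intervals in seconds
    if max(rr_s) >= pause_s:                      # long pause -> sinus arrest
        return "SA"
    mean_hr = 60.0 / (sum(rr_s) / len(rr_s))      # mean heart rate in bpm
    if mean_hr < brady_bpm:
        return "SB"
    if mean_hr > tachy_bpm:
        return "ST"
    return None                                   # not decided by the rules


print(morphology_rule([450, 450]))   # SB (48 bpm)
print(morphology_rule([180, 180]))   # ST (120 bpm)
print(morphology_rule([300, 300]))   # None (72 bpm, left to the SVM)
```

Keeping each rule as a cheap comparison on RR intervals is what makes this stage fast; classes such as A and AT would need additional waveform checks (for example on the P wave) that are not sketched here.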

Classification results
The GS algorithm is applied to determine the best parameters of the SVM classifier. Figure 2 shows a 3D plot of the cross-validation accuracy over the SVM parameters C and g explored by the grid search. The classification accuracy of every parameter combination on the grid is calculated. In the flat region on the right of the plot, the accuracy remains essentially unchanged at its highest value. The optimised penalty factor C is 9.1896, the kernel function parameter g is 0.10882, and the cross-validation accuracy is 99.63%.
In the program design, N, VT, V and AF are detected with the SVM, whereas SB, A, AT, SA and ST are detected by time-domain waveform morphology analysis. In the actual detection, the 270 sets of data in the prediction set are used as the signals to be tested, again with 1,000 points as the input signal. Figure 3 compares the actual and predicted classification results.
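The two-stage design described above can be sketched as a simple dispatcher. The cheap morphology rules run first, and the SVM (with its full feature extraction) runs only when the rules defer. The names `hybrid_classify`, `stub_rule` and `stub_svm` are hypothetical placeholders, and the stubs stand in for a real rule set and a trained SVM. This is a minimal sketch of the control flow under those assumptions, not the paper's program.

```python
def hybrid_classify(segment, rule_fn, svm_fn):
    """Two-stage sketch: morphology rules first, SVM only as a fallback.

    rule_fn(segment) -> label or None  (cheap RR/waveform checks)
    svm_fn(segment)  -> label          (full feature extraction + SVM)
    Both callables are hypothetical placeholders supplied by the caller.
    """
    label = rule_fn(segment)
    if label is not None:
        return label, "morphology"     # SVM features are never computed
    return svm_fn(segment), "svm"


# Demo with stub callables; svm_calls records how often the SVM path runs.
svm_calls = []


def stub_rule(segment):
    return "ST" if segment["mean_hr"] > 100 else None


def stub_svm(segment):
    svm_calls.append(segment)
    return "N"


print(hybrid_classify({"mean_hr": 120}, stub_rule, stub_svm))  # ('ST', 'morphology')
print(hybrid_classify({"mean_hr": 72}, stub_rule, stub_svm))   # ('N', 'svm')
print(len(svm_calls))                                          # 1
```

The saving comes from the first branch: for segments the rules can label, the 48 frequency-domain features and the SVM prediction are skipped entirely.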
Four evaluation criteria are used to evaluate the performance of the classifier: accuracy (Acc), sensitivity (Sen), specificity (Spe) and positive predictive value (PPV). These parameters are expressed as follows:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Sen} = \frac{TP}{TP + FN}, \quad \mathrm{Spe} = \frac{TN}{TN + FP}, \quad \mathrm{PPV} = \frac{TP}{TP + FP},$$
where, for a given class, TP (true positive) is the number of beats of that class that are correctly classified, FN (false negative) is the number of beats of that class that are incorrectly classified into the other classes, FP (false positive) is the number of beats of the other classes that are incorrectly classified into that class, and TN (true negative) is the number of beats of the other classes that are not classified into that class. The statistical classification results are given in Table 3. The average accuracy, sensitivity and positive predictive value in the table all exceed 96%, and the specificity is 99.63%.

Comparison
A comparison of the proposed method with several other methods is presented in Table 4. The proposed method achieves high accuracy and requires few samples, although this comparison may be somewhat unfair because the number and types of ECG signals differ between studies. Although the method improves accuracy compared with other SVM-based classification methods, it still has limitations. First, the amount of data used in this paper is not very large, so the results may be somewhat biased. Second, only the RR interval and the PR interval are used in the waveform analysis; other useful features, such as the QRS interval, were not used. In future research, we will focus on more reasonable judgment conditions for waveform feature analysis and extraction.
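The four evaluation criteria (Acc, Sen, Spe and PPV) can be computed per class from a multiclass confusion matrix by treating each class one-vs-rest, exactly as in the TP/FN/FP/TN definitions. The sketch below does this in plain Python. The function names are hypothetical, and averaging the per-class values with an unweighted (macro) mean is an assumption about how the "average" figures in Table 3 were obtained.

```python
def per_class_metrics(cm):
    """One-vs-rest Acc, Sen, Spe and PPV for each class.

    cm[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k predicted as others
        fp = sum(cm[i][k] for i in range(n)) - tp   # others predicted as class k
        tn = total - tp - fn - fp
        results.append({
            "Acc": (tp + tn) / total,
            "Sen": tp / (tp + fn) if tp + fn else 0.0,
            "Spe": tn / (tn + fp) if tn + fp else 0.0,
            "PPV": tp / (tp + fp) if tp + fp else 0.0,
        })
    return results


def macro_average(results):
    """Unweighted mean of each metric over the classes (assumed convention)."""
    keys = results[0].keys()
    return {key: sum(r[key] for r in results) / len(results) for key in keys}


cm = [[5, 1, 0],
      [0, 4, 1],
      [1, 0, 3]]                # toy 3-class confusion matrix, not paper data
print(macro_average(per_class_metrics(cm)))
```

For class 0 of the toy matrix, TP = 5, FN = 1, FP = 1 and TN = 8, so Sen = PPV = 5/6 and Spe = 8/9.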

Conclusions
The proposed approach classified the arrhythmias in ECG signals with promising results. Frequency-domain and time-domain methods are applied to extract frequency-domain statistical features and time-domain waveform features from the preprocessed ECG signal. The two are combined into a fusion feature, which is then divided into a training set and a test set. An SVM is trained on the nine types of ECG signals in the training set, and GS is used to optimise the SVM parameters to obtain an efficient ECG classification model. Experiments revealed that the classification accuracy can reach 96.67% when the penalty factor C is 9.1896 and the kernel function parameter g is 0.10882. However, this study has some limitations. The data were obtained from databases and a physiological parameter simulator, and the algorithm has not yet been applied in actual test experiments. Moreover, these data cannot reflect individual differences in human ECG signals. In the future, a large amount of clinical data will be used to verify and improve the algorithm, and actual experiments will be conducted to test it. We will focus on ECG classification algorithms for wearable automatic detection, so that ECG signals can be autonomously detected and identified with wearable ECG equipment, providing a technical means to help prevent heart disease.

Proposed Method
The heartbeat classification scheme consists of preprocessing, feature extraction, dimension reduction and classification. This study mainly explores the ECG classification method. Classification based on waveform morphological analysis has a simple process and a fast classification speed, so it has good practical value in wearable ECG monitoring systems. However, it depends heavily on the quality of the detected ECG waveform, whereas SVM-based classification is more stable. SVM-based classification requires a model trained on the extracted features and a full feature extraction step for every signal, so it introduces a certain time delay when applied in a wearable ECG monitoring system. Weighing these trade-offs, a combination of waveform morphological analysis and SVM classification is applied. The specific flowchart is shown in Figure 4.

ECG signal preprocessing
The preprocessing algorithm proposed by Wang X. [18] is used to reduce the noise in the ECG signal. This algorithm mainly employs a lifting wavelet transform with an improved semisoft threshold. A five-level sym8 lifting wavelet decomposition is performed on the signal. The detail coefficients of the third and fourth levels (that is, high-frequency components) overlap most with the frequency range of the ECG signal, so the improved semisoft threshold function is applied to these two sets of coefficients to remove the noise they contain while preserving the ECG content. The threshold for level $i$ is
$$\lambda_i = \frac{\operatorname{median}(|d_i|)}{0.6745}\sqrt{2\log\left(\operatorname{length}(d_i)\right)}, \quad i = 3, 4,$$
where $d_i$ denotes the detail coefficients of level $i$. The processed coefficients are then used to reconstruct the ECG signal, which yields the denoised ECG signal and completes the preprocessing. The preprocessing result is shown in Figure 5.
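To make the threshold formula concrete, the sketch below computes $\lambda_i$ from a set of detail coefficients and applies the classical semisoft ("firm") shrinkage of Gao and Bruce. Coefficients below $\lambda_1$ are zeroed, coefficients above $\lambda_2$ are kept, and coefficients in between are scaled linearly. This is a swapped-in standard rule, not the improved semisoft function of [18], which is not reproduced here. The choice $\lambda_2 = 2\lambda_1$ and all function names are hypothetical, and the sym8 lifting decomposition itself is not shown.

```python
import math
from statistics import median


def universal_threshold(d):
    """lambda = median(|d|) / 0.6745 * sqrt(2 * ln(len(d)))."""
    sigma = median([abs(c) for c in d]) / 0.6745     # robust noise estimate
    return sigma * math.sqrt(2.0 * math.log(len(d)))


def firm_threshold(c, lam1, lam2):
    """Classical semisoft (firm) shrinkage of one coefficient, lam1 < lam2."""
    a = abs(c)
    if a <= lam1:
        return 0.0                                   # treated as noise
    if a <= lam2:
        return math.copysign(lam2 * (a - lam1) / (lam2 - lam1), c)
    return c                                         # large coefficient kept


def denoise_detail(d, ratio=2.0):
    """Threshold one level of detail coefficients (ratio is hypothetical)."""
    lam = universal_threshold(d)
    return [firm_threshold(c, lam, ratio * lam) for c in d]


detail = [0.1, -0.2, 3.0, -0.05, 0.15, -2.5, 0.08, -0.12]   # toy coefficients
print(denoise_detail(detail))
```

Compared with hard thresholding, the linear middle band avoids a jump at $\lambda_1$. Compared with soft thresholding, large coefficients such as R-wave energy are not shrunk.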

Time-domain feature extraction
An ECG signal is periodic, and the R peak is the point with the largest amplitude and slope in each beat, so detecting and locating it is the most critical part of ECG waveform detection. The slope threshold method is used to detect the R peaks and then extract the RR intervals. Although this method is susceptible to noise, its design is simple and its calculation is fast, so it can be used in real time. The basic idea is as follows. The slope at each point of the denoised signal is calculated, and because the slope near the R peak reaches a maximum, a slope threshold is set as the judgment condition. When points in a region satisfy the threshold condition, the point of maximum amplitude in that region is taken as the detected R peak. For example, the time-domain waveform feature extraction algorithm is applied to a 1,000-point segment of a normal ECG signal, and Figure 6(a) shows a schematic of R-peak detection for this signal. The slope at each point is compared with the threshold. If the condition is met, the point of maximum amplitude within 10 points before and after that point is taken as the R peak, and checks for missed and false detections are then performed using the RR intervals calculated from the detected R peaks. An RR interval of more than 360 points (1 s at 360 Hz) indicates that a beat was missed in that period and that the interval must be searched again. In addition, to respect the physiological refractory period, an RR interval of less than 200 ms indicates a false detection, in which case detection restarts from the previous R peak.
Before the slope threshold method is applied for time-domain feature extraction, the collected samples are denoised, and the ECG signal is squared to increase the amplitude of the R peak relative to the rest of the waveform. Figure 6(b) shows the R peaks of eight types of arrhythmias detected with the slope threshold method. The first two RR intervals are taken as the time-domain waveform features of each ECG signal, $v = [r_1, r_2]^T$, so the time-domain waveform feature space is
$$F_{540\times 2} = [v_1, v_2, \cdots, v_{540}]^T.$$
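A simplified version of the slope threshold detector can be sketched as follows. The signal is squared, the first difference serves as the slope, and each threshold crossing triggers a search for the largest sample within ±10 points. A 200 ms refractory window suppresses double detections, and RR intervals longer than 360 samples are flagged as likely missed beats. Setting the threshold to a fraction (`slope_frac`) of the maximum slope is a hypothetical choice, because the text does not state how the threshold is set. Re-searching flagged intervals is also left out, so this is a sketch rather than the paper's detector.

```python
FS = 360  # sampling frequency in Hz (from the text)


def detect_r_peaks(ecg, fs=FS, slope_frac=0.5, search=10, refractory_s=0.2):
    """Simplified slope-threshold R-peak detector (illustrative sketch).

    slope_frac is a hypothetical choice; search=10 points and the 200 ms
    refractory period follow the text.
    """
    sq = [x * x for x in ecg]                          # squaring emphasises R peaks
    slope = [sq[i + 1] - sq[i] for i in range(len(sq) - 1)]
    thr = slope_frac * max(slope)
    refractory = int(refractory_s * fs)                # 72 samples at 360 Hz
    peaks = []
    i = 0
    while i < len(slope):
        if slope[i] > thr:
            lo, hi = max(0, i - search), min(len(sq), i + search + 1)
            peak = max(range(lo, hi), key=lambda k: sq[k])
            if not peaks or peak - peaks[-1] >= refractory:
                peaks.append(peak)
                i = peak + refractory                  # skip the refractory window
                continue
        i += 1
    return peaks


def rr_features(peaks, max_rr=360):
    """First two RR intervals (in samples) and indices of suspiciously long ones."""
    rr = [b - a for a, b in zip(peaks, peaks[1:])]
    missed = [k for k, r in enumerate(rr) if r > max_rr]   # likely missed R peaks
    return rr[:2], missed


# Synthetic demo: three triangular "R waves" 300 samples (0.83 s) apart.
demo = [0.0] * 1000
for p in (100, 400, 700):
    demo[p - 1], demo[p], demo[p + 1] = 0.5, 1.0, 0.5
peaks = detect_r_peaks(demo)
print(peaks, rr_features(peaks))   # [100, 400, 700] ([300, 300], [])
```

The pair returned first by `rr_features` corresponds to $v = [r_1, r_2]^T$ in samples; dividing by 360 converts it to seconds.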

Frequency-domain feature extraction
Analysing ECG signals is difficult because they are inherently complex, even in healthy people. Wavelet packet decomposition (WPD) represents an ECG signal in two dimensions (time and frequency) and can therefore characterise its time-frequency features well. However, the choice of wavelet basis affects the decomposition result, and the decomposition level determines the feature dimension, so both must be selected appropriately. The db6 wavelet has good regularity, compact support in the time domain, a relatively smooth reconstructed signal and strong localisation ability, and it reflects the characteristics of ECG signals well. Therefore, db6 is used as the basis function of the decomposition. The main frequency band of the QRS complex is concentrated in the range 0.5-45 Hz, and the level-4 WPD coefficients cover this band, so a 4-level wavelet packet decomposition is performed on the ECG signals. At the 360 Hz sampling frequency, each of the 16 level-4 sub-bands spans 180/16 = 11.25 Hz. Figure 7 shows the frequency-domain feature extraction process, where S0-S15 are the 16 sets of wavelet packet coefficients. WPD is applied to the nine kinds of ECG signals, and statistics of the coefficients are calculated. For each segment, the singular-value feature vector is $x = [x_1, x_2, \cdots, x_{16}]$, the maximum-value feature vector is $y = [y_1, y_2, \cdots, y_{16}]$, and the standard-deviation feature vector is $z = [z_1, z_2, \cdots, z_{16}]$. The final frequency-domain statistical feature vector is $w = [x, y, z]$. A total of 540 sets of ECG data are extracted, so the final frequency-domain statistical feature space is $T_{540\times 48} = [w_1, w_2, \cdots, w_{540}]^T$.
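The 48-dimensional feature vector $w = [x, y, z]$ can be illustrated with a dependency-free sketch. To stay self-contained, it uses Haar filters in place of db6 (a deliberate swap; a real implementation would use proper db6 filters, for example from a wavelet library). It also pads odd-length nodes by repeating the last sample. For a single coefficient vector, the only singular value of the corresponding 1 × n matrix is its Euclidean norm. Computing the "singular value" that way, and using the population standard deviation, are assumptions about how the features were formed.

```python
import math
from statistics import pstdev


def haar_split(x):
    """One Haar analysis step: (approximation, detail), each half length."""
    if len(x) % 2:
        x = x + [x[-1]]                    # pad odd length by repeating last sample
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[2 * k] + x[2 * k + 1]) * s for k in range(len(x) // 2)]
    detail = [(x[2 * k] - x[2 * k + 1]) * s for k in range(len(x) // 2)]
    return approx, detail


def wavelet_packet(x, level=4):
    """Full wavelet packet tree; returns the 2**level terminal nodes."""
    nodes = [list(x)]
    for _ in range(level):
        nodes = [part for node in nodes for part in haar_split(node)]
    return nodes                           # natural (Paley) order, S0..S15


def wpd_features(x, level=4):
    """w = [x, y, z]: norms (singular values), maxima and std of each node."""
    nodes = wavelet_packet(x, level)
    sv = [math.sqrt(sum(c * c for c in n)) for n in nodes]   # singular values
    mx = [max(n) for n in nodes]                             # maximum values
    sd = [pstdev(n) for n in nodes]                          # standard deviations
    return sv + mx + sd


sig = [math.sin(2 * math.pi * 5 * n / 360) for n in range(1000)]  # toy 5 Hz signal
w = wpd_features(sig)
print(len(w))   # 48
```

Stacking `wpd_features` for all 540 segments row by row gives a 540 × 48 array matching $T_{540\times 48}$.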

Fusion feature space
The fusion feature extraction method combines frequency-domain statistical feature extraction with time-domain waveform feature extraction. The frequency-domain statistical feature space $T_{540\times 48} = [w_1, w_2, \cdots, w_{540}]^T$ obtained from the WPD statistical analysis and the time-domain waveform feature space $F_{540\times 2} = [v_1, v_2, \cdots, v_{540}]^T$ obtained from the slope threshold algorithm are concatenated into the multidomain fusion feature space $M_{540\times 50} = [F, T]$, which completes the feature extraction for all ECG signals. The fusion feature space is then divided into a training set and a test set for the subsequent ECG classification, and the classification results are analysed.
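Feature fusion amounts to a row-wise concatenation, $M = [F, T]$, followed by a train/test split. The sketch below shows both in plain Python. The 50/50 split stratified by class (30 training and 30 test segments per type, giving the 270 test sets mentioned earlier) is an assumption, since the text does not describe how the split was made. The function names and the placeholder feature values are hypothetical.

```python
import random


def fuse(time_feats, freq_feats):
    """Row-wise concatenation M = [F, T]: 2 time + 48 frequency features."""
    if len(time_feats) != len(freq_feats):
        raise ValueError("feature spaces must have the same number of rows")
    return [list(f) + list(t) for f, t in zip(time_feats, freq_feats)]


def stratified_half_split(rows, labels, seed=0):
    """Hypothetical 50/50 split per class (e.g. 30 train / 30 test of 60)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        half = len(idxs) // 2
        train += idxs[:half]
        test += idxs[half:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Placeholder features: 540 segments, 9 classes x 60 segments each.
time_feats = [[0.8, 0.8] for _ in range(540)]       # F: first two RR intervals
freq_feats = [[0.0] * 48 for _ in range(540)]       # T: WPD statistics
labels = [c for c in range(9) for _ in range(60)]
M = fuse(time_feats, freq_feats)
X_tr, y_tr, X_te, y_te = stratified_half_split(M, labels)
print(len(M), len(M[0]), len(X_tr), len(X_te))     # 540 50 270 270
```

Splitting per class keeps all nine types equally represented in both sets, which matters when each class has only 60 segments.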

SVM and optimisation algorithms of its parameters
SVM is a supervised learning algorithm that performs well on a wide range of datasets because of its underlying optimisation formulation. It constructs an optimal separating hyperplane that discriminates between the classes in the feature space. The choice of kernel function is important for achieving high performance on different datasets. Here, the radial basis function (RBF) is used as the kernel:
$$K(x_i, x_j) = \exp\left(-g\,\lVert x_i - x_j \rVert^2\right)$$
Unlike a linear kernel, the RBF kernel can separate data that are not linearly separable in the original feature space. Additionally, the RBF kernel has fewer parameters to set than a polynomial kernel. Its two main parameters, the SVM penalty factor C and the kernel function parameter g, must be set appropriately. C controls the penalty on misclassified samples during training and thus balances the misclassification rate against the complexity of the model. When C is too small, the classification accuracy becomes unsatisfactory and the model is ineffective. The parameter g strongly affects how samples are divided in the feature space and hence the classification result. The GS algorithm is used to find the optimal parameters of the RBF SVM. The steps of the GS method are as follows:
Step 1: Set the grid step of the base-2 exponents of C and g to 0.8.
Step 2: Set the initial range of C and g to $2^{-8}$ to $2^{8}$.
Step 3: Keep C fixed and traverse the values of g, computing the cross-validation accuracy for each combination.
Step 4: Change the value of C.
Step 5: Repeat Steps 3 and 4 until C reaches its maximum value, then terminate.
Step 6: Output the (C, g) combination that maximises the cross-validation accuracy; among combinations with equal accuracy, choose the one with the smallest C.
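The steps above can be sketched as a nested loop over base-2 exponents. The reported optimum sits on this grid: with exponents $-8, -7.2, \ldots, 8$, the values $2^{3.2} \approx 9.1896$ and $2^{-3.2} \approx 0.10882$ are exactly the reported C and g. Here `cv_accuracy` is a hypothetical callable that would return the k-fold cross-validation accuracy of an RBF SVM for a given (C, g). The `toy_cv_accuracy` function is a made-up stand-in used only to exercise the search, not real classifier output.

```python
import math


def log2_grid(lo=-8.0, hi=8.0, step=0.8):
    """Exponents lo, lo+step, ..., hi (Steps 1 and 2)."""
    n = round((hi - lo) / step)
    return [lo + k * step for k in range(n + 1)]


def grid_search(cv_accuracy, lo=-8.0, hi=8.0, step=0.8):
    """Return (best_accuracy, C, g); ties resolved towards the smallest C."""
    best = (-math.inf, None, None)
    exps = log2_grid(lo, hi, step)
    for ec in exps:                  # Steps 4-5: C in ascending order
        C = 2.0 ** ec
        for eg in exps:              # Step 3: traverse g with C fixed
            g = 2.0 ** eg
            acc = cv_accuracy(C, g)
            if acc > best[0]:        # strict '>' keeps the smallest C among ties
                best = (acc, C, g)
    return best                      # Step 6


def toy_cv_accuracy(C, g):
    """Toy stand-in for CV accuracy, peaked at log2 C = 3.2, log2 g = -3.2."""
    return 1.0 - 0.01 * ((math.log2(C) - 3.2) ** 2 + (math.log2(g) + 3.2) ** 2)


acc, C, g = grid_search(toy_cv_accuracy)
print(round(C, 4), round(g, 5))  # 9.1896 0.10882
```

The 21 × 21 grid costs 441 cross-validation runs. Because C is visited in ascending order and only strictly better scores replace the incumbent, the first (smallest-C) combination wins any tie, which implements Step 6.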