As illustrated in Fig. 1, the ECG data to be classified by the proposed algorithm are first read and pre-processed, a dataset is constructed from the processed data, and the classification model is then trained and used for classification. During the pre-processing stage, adaptive filtering is employed for noise reduction. After pre-processing, the resulting dataset is divided into a training set and a test set. The network model is then trained from scratch under the Caffe framework using randomly initialised weights. Finally, arrhythmias are classified using the trained model.
Pre-processing
In this study, the MIT-BIH arrhythmia database provided by the Massachusetts Institute of Technology was used to ensure data authenticity and reliability. This database was compiled from over 4,000 long-term recordings of patients and contains common ECG signals as well as rare abnormal ECG signals, and each recording was annotated and verified by expert cardiologists. Five categories of ECG signals were selected for classification, namely, the normal heartbeat, left bundle branch block beat (LBBB), right bundle branch block beat (RBBB), premature atrial contraction (PAC) and premature ventricular contraction (PVC), which account for 66.63%, 7.17%, 6.44%, 2.26% and 6.33% of the MIT-BIH arrhythmia database, respectively, while other abnormal signals each account for less than 1% [11-12].
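To make the data selection concrete, the following minimal sketch shows how beats of the five chosen classes could be pulled from the database, assuming the open-source wfdb Python package; the record name and lead choice are illustrative only.

```python
import wfdb  # open-source reader for PhysioNet/MIT-BIH records

# MIT-BIH annotation symbols for the five classes used in this study:
# N = normal, L = LBBB, R = RBBB, A = PAC, V = PVC
TARGET_SYMBOLS = {'N', 'L', 'R', 'A', 'V'}

def load_labelled_beats(record_name):
    """Return (signal, [(sample_index, symbol), ...]) for one record."""
    record = wfdb.rdrecord(record_name, pn_dir='mitdb')    # e.g. record '100'
    ann = wfdb.rdann(record_name, 'atr', pn_dir='mitdb')   # expert annotations
    signal = record.p_signal[:, 0]                         # first lead
    return signal, [(s, sym) for s, sym in zip(ann.sample, ann.symbol)
                    if sym in TARGET_SYMBOLS]

signal, beats = load_labelled_beats('100')
```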
Different acquisition equipment and methods can introduce interference into the ECG signal during recording; consequently, the acquired signal may contain noise such as power line interference and baseline wander [13]. To avoid large errors in ECG signal recognition, pre-processing was performed before signal detection to reduce the impact of this interference on the original ECG signal.
In this study, the ECG signal pre-processing stage consisted of the following steps. First, a band-pass filter with a passband of 0.05–100 Hz was applied to the original signal. Then, the dual-slope method was used to locate the R peak. Next, a low-pass filter with a cut-off frequency of 5 Hz was used to remove the interference signal. Finally, to prevent the misrecognition of noise as a QRS complex, moving window integration and an adaptive self-defined threshold were used.
Normally, the frequency content of an ECG signal lies in the range 0.05–100 Hz. To remove unnecessary information from the ECG signal, a 40th-order finite impulse response (FIR) band-pass filter with a passband of 0.05–100 Hz was designed in this study to filter the original signal. Fig. 2(a) presents the original ECG signal while Fig. 2(b) presents the ECG signal after band-pass filtering.
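As an illustration, the band-pass stage could be implemented with SciPy as below; the 360 Hz sampling rate is that of the MIT-BIH database, and the filter design call is a sketch rather than the authors' exact implementation.

```python
import numpy as np
from scipy.signal import firwin, lfilter

FS = 360.0  # MIT-BIH arrhythmia database sampling rate (Hz)

# 40th-order (41-tap) FIR band-pass with a 0.05-100 Hz passband,
# matching the filter described in the text.
bp_taps = firwin(numtaps=41, cutoff=[0.05, 100.0], pass_zero=False, fs=FS)

def bandpass(ecg):
    """Apply the FIR band-pass filter to a 1-D ECG array."""
    return lfilter(bp_taps, 1.0, np.asarray(ecg, dtype=float))
```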
After filtering the ECG signal, dual-slope pre-processing was employed to detect the peak of the QRS complex, thereby obtaining the effective waveform of the ECG signal. The fundamental principle of dual-slope pre-processing rests on a characteristic of the QRS complex, namely that the signal is steep on both sides of the R peak. Exploiting this characteristic, the method locates the point exhibiting the largest slope contrast between the intervals on its two sides. First, the maximum and minimum average slopes are calculated over intervals on both the left and right sides of each point. Then, a slope difference is obtained by subtracting the minimum average slope on one side from the maximum average slope on the other side. Finally, the two slope differences are compared and the larger one is retained. Through this dual-slope calculation, as sketched below, the peak of the QRS complex can be located. Fig. 3(a) presents the signal after band-pass filtering while Fig. 3(b) presents the signal after dual-slope pre-processing.
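The following sketch implements this calculation; the interval of 8-24 samples over which the average slopes are searched is an assumed, tunable choice, as the text does not specify it.

```python
import numpy as np

def dual_slope(ecg, d_min=8, d_max=24):
    """Dual-slope transform: for each sample, the largest difference between
    the average slopes measured on its two sides (d_min/d_max assumed)."""
    ecg = np.asarray(ecg, dtype=float)
    out = np.zeros(len(ecg))
    dists = np.arange(d_min, d_max + 1)
    for i in range(d_max, len(ecg) - d_max):
        left = (ecg[i] - ecg[i - dists]) / dists    # average slopes, left side
        right = (ecg[i] - ecg[i + dists]) / dists   # average slopes, right side
        # max average slope on one side minus min on the other; keep the larger
        out[i] = max(left.max() - right.min(), right.max() - left.min())
    return out
```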
After the dual-slope calculation, a low-pass filter with a cut-off frequency of 5 Hz was used to remove the interference signal. Figs. 2 and 3 indicate that the amplitude of the ECG signal gradually decreased after band-pass filtering and dual-slope pre-processing; however, an excessively small amplitude is not conducive to the subsequent detection and segmentation of the ECG signal.
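The low-pass stage is a straightforward FIR design; the 41-tap order below is an assumption, while the 5 Hz cut-off follows the text.

```python
from scipy.signal import firwin, lfilter

lp_taps = firwin(numtaps=41, cutoff=5.0, fs=360.0)  # 5 Hz low-pass FIR

def lowpass(sig):
    """Remove residual interference after the dual-slope transform."""
    return lfilter(lp_taps, 1.0, sig)
```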
To highlight the characteristic points of the QRS complex, moving window integration was used in this study. A window of fixed width is moved from the initial point of the signal, the signal within the window is integrated at each position, and the integrated value is used to represent the amplitude of the signal. The moving window integration algorithm magnifies the effective information in the signal, thereby increasing its absolute amplitude. In addition, moving window integration smooths the peak of the waveform and makes its slope less sharp. In the experimental tests in this study, a moving window width of 17 sampling points achieved the most desirable integration effect. Fig. 4(a) presents the low-pass filtered signal while Fig. 4(b) presents the signal with a magnified amplitude after moving window integration.
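A minimal implementation of this step is a windowed sum, sketched below with the 17-sample width reported in the text.

```python
import numpy as np

def moving_window_integration(sig, width=17):
    """Replace each sample by the sum of the signal inside a moving window,
    magnifying the effective information and raising the absolute amplitude."""
    return np.convolve(sig, np.ones(width), mode='same')
```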
The signal after moving window integration displayed prominent characteristics; hence, the location of the QRS complex could be determined according to a QRS threshold, and the ECG signal could be further segmented. In this study, a dual-threshold method was designed to locate the QRS complex in the integrated signal. When the peak amplitude of the integrated signal exceeded the lower threshold, it was compared with the higher threshold to determine whether a QRS complex had been detected [14]. Furthermore, to ensure that the thresholds could flexibly adapt to ECG signals of different forms, the higher and lower thresholds were designed to change independently according to variations in the previously detected peak amplitudes. This combination of dual and adaptive thresholds mainly aimed to reduce missed and false detections of the QRS complex. In addition, if two peaks were too close to each other, only the peak with the larger amplitude was retained as an R peak, thereby preventing the misrecognition of some types of noise.
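Since the paper does not give its exact adaptation formulas, the sketch below uses Pan-Tompkins-style running averages as an assumed stand-in for the adaptive dual threshold, including the rule that keeps the larger of two peaks that fall too close together.

```python
def detect_qrs(integrated, fs=360.0):
    """Adaptive dual-threshold QRS detection on the integrated signal.
    The 0.2 s refractory period and the update coefficients are assumptions."""
    refractory = int(0.2 * fs)      # minimum spacing between two QRS peaks
    spk = npk = 0.0                 # running signal-peak / noise-peak levels
    qrs = []
    for i in range(1, len(integrated) - 1):
        x = integrated[i]
        if x > integrated[i - 1] and x >= integrated[i + 1]:   # local peak
            high = npk + 0.25 * (spk - npk)   # higher threshold (adaptive)
            low = 0.5 * high                  # lower threshold (adaptive)
            if x > low and x > high:          # passes both thresholds -> QRS
                if qrs and i - qrs[-1] < refractory:
                    if x > integrated[qrs[-1]]:
                        qrs[-1] = i           # keep the larger of close peaks
                else:
                    qrs.append(i)
                spk = 0.125 * x + 0.875 * spk
            else:
                npk = 0.125 * x + 0.875 * npk
    return qrs
```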
After detection, the locations of the R peaks were known. Based on these locations, the input ECG signal was segmented into a series of single ECG beats, which were fed to the subsequent 1D CNN for the classification of ECG signals.
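Segmentation then reduces to cutting a fixed window around each detected R peak; the window of 100 samples before and 200 samples after the peak is an assumed choice, since the paper does not state the beat length.

```python
import numpy as np

def segment_beats(ecg, r_peaks, pre=100, post=200):
    """Cut one fixed-length single beat around each R peak (pre/post assumed)."""
    beats = [ecg[r - pre:r + post] for r in r_peaks
             if r - pre >= 0 and r + post <= len(ecg)]
    return np.stack(beats) if beats else np.empty((0, pre + post))
```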
1D CNN
A neural network is an abstract mathematical model, inspired by the human brain, that has been developed on the basis of modern neuroscience. Neural network models have wide applications and can be used for classification and prediction in problems whose rules cannot be explicitly described [15].
A CNN is a widely used neural network model that extracts local features of data to establish local connections. Each convolutional layer contains multiple filters, each of which extracts its own feature parameters. The shared weights in the convolutional layers and the pooling operation in the pooling layers reduce both the difficulty of network training and the dimensionality of the data, thus avoiding excessive computational complexity during parameter extraction. In an image, local details are connected to the global area, and combinations of low-level features form high-level feature representations. This principle also applies to ECG signal processing [16-17]. Therefore, CNNs have advantages in ECG signal processing. Because an ECG signal consists of 1D data, a 1D CNN is adopted in this study for classification.
The most commonly used CNN in image processing is the two-dimensional (2D) CNN; however, a 1D CNN is more suitable for processing time-series data derived from sensors, such as an ECG signal. A 1D CNN shares the characteristics and processing methods of a 2D CNN, but its kernel width is fixed, while its length can be set to different values according to the required processing [18]. As illustrated in Fig. 5, a 1D convolution kernel slides in a single direction, from left to right, without repeated passes, whereas a 2D kernel must return to the start of each row and slide again. As a result, during feature extraction from the ECG signal, a 1D CNN reduces redundant computation more effectively than a 2D CNN, thus greatly increasing the computation speed.
The core of a 1D CNN lies in the 1D convolutional layer. Suppose that there is an input sequence $x_i$ ($i = 1, 2, \ldots, n$) and the weights are set to $w_j$ ($j = 1, 2, \ldots, m$). The kernel (also known as a filter) in the current layer performs a convolution operation on the input signal of the previous layer. Then, the output of the current convolutional layer is as follows [19]:

$$y_i = \sum_{j=1}^{m} w_j\, x_{i+j-1}, \qquad i = 1, 2, \ldots, n - m + 1.$$
In a CNN, each neuron in the current layer forms a connection network only with neurons in the local window of the previous layer. Usually, an activation function is required for non-linear feature mapping before the output of the convolutional layer [20].
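The following toy example illustrates the convolution above followed by ReLU activation; the input and kernel values are arbitrary.

```python
import numpy as np

def conv1d(x, w):
    """Valid 1-D convolution: y_i = sum_j w_j * x_{i+j-1}."""
    n, m = len(x), len(w)
    return np.array([np.dot(w, x[i:i + m]) for i in range(n - m + 1)])

x = np.array([0.1, 0.5, 1.8, 0.4, -0.2, 0.0])   # toy input   (n = 6)
w = np.array([0.25, 0.5, 0.25])                  # toy kernel  (m = 3)
y = np.maximum(conv1d(x, w), 0.0)                # ReLU non-linear mapping
print(y)                                         # n - m + 1 = 4 outputs
```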
After feature extraction by the convolutional layer, feature selection and information filtering are performed by the pooling layer. Generally, there are two types of pooling operations, which output either the maximum value or the average value of each cluster of neurons. The task performed by the pooling layer is in fact a subsampling process that reduces the high computational complexity generated by the convolutional layer while preserving the integrity of the extracted features and preventing overfitting of the neural network output [21].
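As an illustration, non-overlapping average pooling (the variant used in the proposed network) can be written as follows.

```python
import numpy as np

def avg_pool(x, width):
    """Non-overlapping average pooling: each output is the mean of `width`
    consecutive inputs; leftover samples at the end are discarded."""
    n = (len(x) // width) * width
    return x[:n].reshape(-1, width).mean(axis=1)

print(avg_pool(np.arange(10.0), 3))   # -> [1. 4. 7.]
```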
Based on the feature data and the expected target, a five-layer 1D CNN was designed in this study and trained on the pre-processed data. This CNN has the capacity to learn useful features through the training process. As illustrated in Fig. 6, the proposed 1D CNN consisted of two convolutional layers, two pooling layers and a fully connected layer. Considering the size of the input ECG beat, the filter length in the first convolutional layer was set to 31, the number of filters was set to 4, and the rectified linear unit (ReLU) function was used as the activation function [22]. The window size in the first pooling layer was set to 5, and the average pooling method was used. The filter size in the second convolutional layer was set to 6, the number of filters was set to 8, and the ReLU function was again used as the activation function. The window size in the second pooling layer was set to 3, also with average pooling. Finally, the output obtained through the convolutional and pooling layers was sent to a fully connected layer for the final output.
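For concreteness, the sketch below expresses this architecture in PyTorch (the paper itself trained under Caffe); the input beat length of 300 samples is an assumption used only to size the fully connected layer.

```python
import torch
import torch.nn as nn

class ECGNet(nn.Module):
    """Five-layer 1D CNN as described in the text (PyTorch sketch)."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 4, kernel_size=31),   # conv1: 4 filters, length 31
            nn.ReLU(),
            nn.AvgPool1d(kernel_size=5),       # pool1: average, window 5
            nn.Conv1d(4, 8, kernel_size=6),    # conv2: 8 filters, length 6
            nn.ReLU(),
            nn.AvgPool1d(kernel_size=3),       # pool2: average, window 3
        )
        # assumed input length 300: 300 -> 270 -> 54 -> 49 -> 16 samples
        self.classifier = nn.Linear(8 * 16, n_classes)

    def forward(self, x):                      # x: (batch, 1, 300)
        return self.classifier(self.features(x).flatten(1))

model = ECGNet()
scores = model(torch.randn(2, 1, 300))         # -> (2, 5) class scores
```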
Batch normalisation (BN) was also used in the proposed network. BN is a training optimisation method proposed by Google [23-24]. Normalisation refers to data standardisation, while a batch refers to a group of data; therefore, BN refers to the standardisation of a group of data. Applying BN to the input data and to the outputs of intermediate network layers reduces the shifts produced by internal neurons and the differences between samples. Most of the data can therefore be kept in the unsaturated region of the activation function, ensuring effective gradient back-propagation and preventing the vanishing and exploding gradient problems [25].
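For reference, the standard BN transform normalises each activation over a mini-batch $B$ and then applies a learned scale and shift:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta,$$

where $\mu_B$ and $\sigma_B^2$ are the mini-batch mean and variance, $\epsilon$ is a small constant for numerical stability, and $\gamma$ and $\beta$ are learned parameters.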