ECG Heartbeat Classification Based on Signal-to-Image Transformation Using Convolutional Neural Networks


Background: Electrocardiogram (ECG) is a method of recording the electrical activity of the heart and provides a diagnostic means for heart-related diseases. An arrhythmia is any irregularity of the heartbeat that causes an abnormality in one's heart rhythm. Early detection of arrhythmia is of great importance for preventing many diseases. Manual analysis of the ECG signal is not sufficient for quickly identifying arrhythmias that may cause sudden death. Hence, many studies have been presented to develop computer-aided diagnosis (CAD) systems that automatically identify arrhythmias. Methods: This paper proposes a novel deep learning approach to identify arrhythmias in ECG signals. The signals are obtained from the MIT-BIH arrhythmia database and are categorized according to five arrhythmia types. The proposed approach identifies arrhythmia classes by using a Convolutional Neural Network (CNN) architecture trained on two-dimensional (2D) ECG beat images. The CNN architecture is selected for its high image recognition performance. ECG signals are segmented into heartbeats, then each heartbeat is transformed into a 2D grayscale image. The heartbeat images are used as input for the CNN. Results: The proposed CNN model is compared to other common CNN architectures such as LeNet and ResNet-50 to evaluate the performance of our study. Overall, the proposed study achieved 99.7% test accuracy in the classification of five different ECG arrhythmias. Conclusions: Testing results demonstrate that a CNN trained on ECG image representations provides outstanding classification performance for arrhythmic ECG signals and outperforms similar network architectures. Hence, the proposed approach provides a robust method for the classification of ECG arrhythmias.


Background
A heartbeat is an event that occurs when the heart contracts and relaxes rhythmically. The electrical system of the heart is made up of two main parts: the sinoatrial (SA) node, which is located in the right atrium of the heart, and the atrioventricular (AV) node, which is located in the middle of the heart, between the atrium and the ventricle. The electrical trigger that starts in the SA node travels through the atria to the AV node. The signal is delayed in the AV node and then spreads through the lower chambers. The ventricles contract, sending blood throughout the body, and a heartbeat occurs. ECG is a tool used to observe electrical cardiac activity. Each heartbeat has a P wave, a QRS complex, and a T wave, which represent the depolarization and repolarization of the atria and ventricles of the heart. The heart rate of a healthy person ranges from 60 to 100 beats per minute [1]. It is controlled by the sinus node, which is the natural pacemaker of the heart. The heart rate depends on one's current activity, so the heart may beat slower or faster. For example, our heart beats faster when we exercise because the frequency of the electrical impulses goes up, causing an increase in the heart rate to supply a sufficient amount of oxygen to the organs. During resting or sleeping, the heart beats slower than in active conditions. Arrhythmia is any abnormality in the cardiac cycle, which can appear as an irregular heart rate or an irregular waveform [2]. A heart that has an arrhythmic heartbeat cannot pump blood throughout the body as well as it should. This condition may damage many organs and threaten daily life. Since cardiac arrhythmias are a major threat to human health, their early and accurate detection is essential in medical sciences [3]. Manual analysis of the ECG signal is not sufficient to quickly identify abnormalities in the heart rhythm. Analyzing long-duration ECG signals is a time-consuming task for physicians and may yield inaccurate results.
Developing an automatic cardiac arrhythmia detection algorithm reduces the physician's workload, decreases arrhythmia detection time, and also improves diagnostic efficiency and accuracy. In order to accurately detect abnormalities in the ECG signal, many studies in the literature have proposed computer-aided systems that use different feature extraction and classification techniques. There have been several methods for automatically detecting arrhythmias based on signal processing, feature extraction, and learning algorithms [4,5]. Recorded ECG signals are generally contaminated by different noise types or artifacts, which may change the characteristics of the ECG signal. In the preprocessing stage, contaminants are removed from ECG signals by applying different filtering operations [6,7]. The feature extraction stage is crucial for the discrimination of arrhythmic signals from regular ones. Features are extracted from the ECG signals in various forms in the time or frequency domain [6]. Among the time-domain ECG morphology and heart rate features [8], the R-R interval and linear discriminant analysis (LDA) [9] have been widely used. In the frequency domain, features based on the Fourier transform [10], spectral correlation [11], and variational mode decomposition (VMD) [12] have been used. The preprocessing and feature extraction stages construct an analysis system for the final learning algorithms. Conventional machine learning algorithms have been utilized in previous studies for the classification of different arrhythmia types. Support Vector Machine (SVM) [13], Random Forest (RF) [14], and Artificial Neural Networks (ANN) [15] have been used. In [16] [18].
The architecture of conventional neural network algorithms contains input, output, and hidden layers.
Deep learning (DL) is a more recent approach whose networks contain more than three layers, and it has become favored over conventional techniques [19]. In DL, the feature extraction and classification parts are embedded in the model, which automatically identifies the optimal features from the input data [20]. DL has become popular in recent studies since it improves the performance of ECG arrhythmia classification. DL models may be categorized into different types, such as recurrent neural networks (RNNs), deep neural networks (DNNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks. Zhang et al. [21] proposed an RNN and clustering-based method to find patient-specific ECG classification algorithms by using the MIT-BIH arrhythmia database. Al Rahhal et al. [22] proposed a DNN-based method to classify ECG signals using three different databases. In [23], the temporal features of ECG heartbeats are detected with a DNN on the MIT-BIH database with 99.09% accuracy. Yildirim et al. designed a 16-layer deep CNN to classify 17 different arrhythmias in ECG signals taken from the MIT-BIH dataset. In [24], the authors proposed a novel 3-layer deep genetic ensemble of classifiers to detect 17 different arrhythmias, which achieved 99.37% classification accuracy.
Baloglu et al. [29] used CNN algorithms with an end-to-end structure on 12-lead ECG signals for automatic detection of myocardial infarction, and the proposed structure yielded significant performance with over 99% accuracy. Savalia et al. [30] proposed multilayer perceptron (MLP) and CNN-based methods to identify first degree AV block (FAV) and ventricular bigeminy diseases. In [31], the authors proposed an 11-layer CNN structure to detect different ECG segments with an accuracy of 92.50% using 2-second and 94.90% using 5-second ECG segments. Yao et al. [32] proposed an attention-based time-incremental convolutional neural network (ATI-CNN) that preserves the spatial and temporal characteristics of ECG signals by integrating a CNN architecture with recurrent cells. Their method attained 81.2% accuracy.
Besides the use of 1D CNNs for ECG signal classification, several methods based on 2D CNNs have been proposed for ECG image classification. The hidden structure of a CNN can extract various local features from 2D input samples. Relationships between spatially adjacent pixels can be captured by applying multiple nonlinear filters. Hence, recent state-of-the-art studies have proposed 2D CNNs for ECG arrhythmia classification problems, and these approaches have motivated researchers to implement CNNs with a 2D image-based estimation process. In these studies, ECG signals are converted into 2D images and provided as input data for the CNN [18,33,34].
The CNN architecture is considered to be suitable for the analysis and classification of 2D data. It achieves better results than other classical techniques in image processing [35]. In [18], a method is presented for heartbeat classification by using a 2D CNN and 2D grayscale ECG images that are transformed from ECG beats of the MIT-BIH database. The proposed model achieves an accuracy of 99.05%. Huang et al. [36] used time-frequency spectrograms that are obtained from five different ECG beats by short-time Fourier transform (STFT). The spectrogram images are utilized as input data to a 2D deep CNN, which yielded an average accuracy of 99.00%. Jun et al. [18] proposed a deep 2D CNN-based method to detect 7 different arrhythmias using 2D grayscale ECG beat images and compared the performance of their proposed architecture with well-known structures such as AlexNet and VGGNet.
Considering the benefits and drawbacks of the existing techniques, this paper proposes a novel DL approach for identifying different arrhythmia types in ECG signals. In this approach, a CNN is selected as the DL model, and ECG signals are converted into ECG images for use as input to the CNN model. Examining ECG images for arrhythmia detection with a CNN is similar to examination by an expert. The CNN model directly processes the input images, and noisy ECG beats do not affect the classification performance. The model automatically extracts features from the raw ECG images. These properties eliminate the noise filtering and feature extraction steps. ECG recordings are taken from the MIT-BIH database [37], which includes different arrhythmia types categorized by cardiologists according to the irregularity of the heartbeat. In this study, five different arrhythmia types are considered for classification: normal (N), left bundle branch block (LBBB), premature ventricular contraction (PVC), premature atrial beat (PAB), and right bundle branch block (RBBB). Before converting 1D ECG signals into 2D images, segmentation is applied to the ECG signals to extract the ECG beats.
Then each ECG beat is converted into a 2D grayscale image and used as input data for the CNN architecture. The classification performance of the proposed DL approach is compared to the LeNet and ResNet-50 architectures. Experimental results show that the proposed image transformation method and network architecture provided 99.7% accuracy for the classification of five different arrhythmic heartbeat types. Compared with other popular DL approaches such as ResNet, the proposed method provides superior performance for classifying ECG images with fewer layers and a less complex architecture. Thus, the aim is to create a deep learning model for real-time wearable ECG arrhythmia systems that makes quicker and more accurate predictions.

Methods
The proposed ECG arrhythmia classification algorithm based on 2D deep learning approaches consists of the following steps: obtaining ECG signals from a public database, heartbeat segmentation, image transformation, and ECG arrhythmia classification by using a 2D CNN architecture. A schematic diagram of the methodology is shown in Fig. 1. ECG arrhythmic signals are taken from the MIT-BIH arrhythmia database [37]. Five different arrhythmia types are selected from the database. In order to generate the 2D images that are applied as input to the CNN architecture, ECG signals are segmented into heartbeats and then converted into ECG heartbeat images.

ECG Database
The MIT-BIH arrhythmia database [37] includes different arrhythmic signals which are independently annotated by two or more cardiologists according to their arrhythmia types. Each record includes two-channel ECG signals, which are the modified limb lead II (MLII) and one of the modified leads V1, V2, or V5. Due to the deformation of the second channel [38], MLII lead recordings are used in this study. Each signal is 30 min long, sampled at 360 Hz, and filtered by a 0.1-100 Hz bandpass filter. The MIT-BIH database is well known to be imbalanced, with unequal numbers of ECG beats for each arrhythmia, which deteriorates the accuracy of DNN and CNN models [39]. Deep learning algorithms may become biased toward the arrhythmia classes that include a large number of samples, since the number of heartbeats is not equal for each class of arrhythmia in the dataset.
Arrhythmia classes with approximately the same number of heartbeats are utilized to train the model in order to eliminate the imbalance effect of the database. Five different arrhythmia types are selected from the database as follows: normal (N), left bundle branch block (LBBB), premature ventricular contraction (PVC), premature atrial beat (PAB), and right bundle branch block (RBBB). ECG signals collected from the database are segmented into beats to generate single heartbeat images for the CNN structure. The sample sizes of ECG beats for the considered classes are given in Table 1. Note that the numbers of beats in the arrhythmia classes are very close.

Heartbeat Segmentation
The Python programming language is utilized for analyzing and classifying ECG arrhythmic signals. In the MIT-BIH database, each heartbeat of a signal is annotated by cardiologists based on the QRS structure and the type of the heartbeat. To identify annotated heartbeats, the WFDB Toolbox for Python is applied to the ECG signals. This toolbox finds the QRS peak of each beat in the signal, separates heartbeats from the signal, and categorizes them according to their arrhythmia types. Each heartbeat is segmented according to its QRS peak location and its arrhythmia type. Examples of the segmented arrhythmic beats are demonstrated in Fig. 2. ECG records and the number of ECG beats for each arrhythmia type are shown in Table 1. After the segmentation is completed, each ECG beat is converted into an ECG image.
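The segmentation and signal-to-image steps described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the window lengths (`before`, `after`), the 64 x 64 image size, and the function names `segment_beats` and `beat_to_image` are assumptions made for illustration only. The R-peak positions and beat labels are passed in as plain lists; in practice they would come from the database annotations.

```python
from typing import List, Sequence, Tuple

Beat = Tuple[List[float], str]


def segment_beats(
    signal: Sequence[float],
    r_peaks: Sequence[int],
    labels: Sequence[str],
    before: int = 100,  # hypothetical: samples kept before the R peak
    after: int = 150,   # hypothetical: samples kept after the R peak
) -> List[Beat]:
    """Cut one fixed-length window around each annotated R peak."""
    beats: List[Beat] = []
    for peak, label in zip(r_peaks, labels):
        start, stop = peak - before, peak + after
        if start < 0 or stop > len(signal):
            continue  # skip beats cut off by the start or end of the record
        beats.append((list(signal[start:stop]), label))
    return beats


def beat_to_image(
    beat: Sequence[float], height: int = 64, width: int = 64
) -> List[List[int]]:
    """Rasterise a beat as a dark trace (0) on a white (255) background."""
    lo, hi = min(beat), max(beat)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat segment
    image = [[255] * width for _ in range(height)]
    for i, value in enumerate(beat):
        col = i * width // len(beat)                     # time -> column
        row = round((hi - value) / span * (height - 1))  # amplitude -> row (row 0 = maximum)
        image[row][col] = 0
    return image


# Toy record: a flat baseline with one spike standing in for an R peak.
signal = [0.0] * 1000
signal[300] = 1.0
beats = segment_beats(signal, r_peaks=[50, 300], labels=["N", "V"])
image = beat_to_image(beats[0][0])
print(len(beats), beats[0][1], len(image), len(image[0]))  # 1 V 64 64
```

The first peak is dropped because its window would start before the record does, so only the beat labelled "V" survives. Every sample is plotted, so narrow features such as the R peak are not lost when the beat is longer than the image is wide.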

Convolutional Neural Network
CNN is a deep learning algorithm based on artificial neural network structures. Conventional neural networks include three layers: an input layer, one hidden layer, and an output layer [41]. A deep neural network consists of more than three layers, with several hidden layers in its structure. The structure is inspired by the way the brain works and contains many artificial neurons across multiple layers. In a hidden layer, which includes many neurons, the input is transformed into something that the output layer can use. Neurons detect features in the input data. The mathematical representation of an artificial neuron can be defined as

$$f_j(x) = \varphi\left(\langle w_j, x \rangle + b_j\right) \qquad (1)$$

where $f_j$ is a function of the input $x$ weighted by a vector of connection weights $w_j$, completed by a neuron bias $b_j$, and associated with an activation function $\varphi$. The schematic diagram is shown in Fig. 3.
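The artificial neuron described above (a weighted sum of the inputs plus a bias, passed through an activation function) can be sketched in a few lines of Python. The function name `neuron`, the example weights, and the two activation functions are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import Callable, Sequence


def neuron(
    x: Sequence[float],
    w: Sequence[float],
    b: float,
    phi: Callable[[float], float],
) -> float:
    """Return phi(<w, x> + b) for one artificial neuron."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return phi(weighted_sum + b)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    return max(0.0, z)


# Hypothetical inputs: <w, x> + b = 0.5 - 0.5 + 0.1 = 0.1
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.1
print(neuron(x, w, b, relu))     # 0.1
print(neuron(x, w, b, sigmoid))  # approximately 0.525
```

Swapping `phi` changes only the nonlinearity; the weighted sum and bias are the parameters a network learns during training.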
Artificial neural networks have been used in many different areas such as computer vision, speech recognition, natural language processing, bioinformatics, drug design, and medical image analysis. CNN is a type of artificial neural network especially designed for analyzing two-dimensional data like images or videos [42]. In contrast to conventional machine learning techniques, the CNN architecture does not need hand-crafted features to be extracted from the raw data. Both the feature extraction and classification parts are embedded in the architecture, so it automatically identifies robust features from the input data [20]. A CNN has three characteristic layer types: convolution, pooling, and fully connected layers.
Convolution and pooling layers are used for feature extraction, and the fully connected layer is used as a classifier.
In the convolution layer, input samples are convolved with a specific kernel. Different features are extracted by changing the kernel size [33]. The discrete convolution can be defined as

$$(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m] \qquad (2)$$

where $f$ and $g$ are two functions. For 2D signals like images, the equation becomes

$$(K * G)(i, j) = \sum_{m} \sum_{n} K(m, n)\, G(i - m, j - n) \qquad (3)$$

where $K$ is a convolution kernel and $G$ is a 2D signal. The convolution process extracts effective features from the input. The pooling layer is applied to reduce the dimension of the input sample while keeping the optimal features. In the fully connected layer, the output of the final convolution or pooling layer is flattened, and the neurons of the previous layer are connected to the neurons of the current layer [43]. Once these layers are assembled, the CNN model is complete.
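The 2D discrete convolution described above can be written out directly as nested sums. The following pure-Python sketch computes only the "valid" part of the output (positions where the kernel fits entirely inside the input); the function name `conv2d_valid` and the toy values are assumptions for illustration. Note that many deep learning libraries skip the kernel flip and compute a cross-correlation instead, which makes no practical difference for learned kernels.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(G: Matrix, K: Matrix) -> Matrix:
    """2D discrete convolution of signal G with kernel K, 'valid' region only."""
    gh, gw = len(G), len(G[0])
    kh, kw = len(K), len(K[0])
    # Flipping the kernel in both directions turns the sliding dot product
    # below into a true convolution.
    flipped = [row[::-1] for row in K[::-1]]
    out: Matrix = []
    for i in range(gh - kh + 1):
        out_row = []
        for j in range(gw - kw + 1):
            acc = 0
            for m in range(kh):
                for n in range(kw):
                    acc += flipped[m][n] * G[i + m][j + n]
            out_row.append(acc)
        out.append(out_row)
    return out


# Toy 3 x 3 "image" and a 2 x 2 diagonal-difference kernel (illustrative values).
G = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
K = [[1, 0],
     [0, -1]]
print(conv2d_valid(G, K))  # [[4, 4], [4, 4]]
```

A 3 x 3 input and a 2 x 2 kernel give a 2 x 2 output, which shows how each convolution layer shrinks the feature map when no padding is used.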
In this study, a novel CNN model is designed, inspired by the LeNet model, which is a well-known CNN model. The novel model has three convolution layers, three pooling layers, and a fully connected layer.
Maximum pooling is implemented in the pooling layers. It selects only the maximum value within each pooling window, which reduces the number of output neurons. The model structure can be seen in Fig. 4. An activation function is used to determine the output values of each layer. In the experiment, rectified linear units (ReLU) are applied as the activation function, defined as

$$f(x) = \max(0, x) \qquad (4)$$

where $x$ is an element of the output of the convolution or pooling layer.
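As a rough illustration of the two operations described above, the sketch below applies ReLU element-wise to a small feature map and then max-pools it. The 2 x 2 window with stride 2 is an assumption (the pooling size used in the paper is not stated in this section), and the function names and feature-map values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def relu_map(feature_map: Matrix) -> Matrix:
    """Apply ReLU, max(0, x), to every element of a feature map."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling with a size x size window (assumed size 2)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, w - size + 1, size)
        ]
        for i in range(0, h - size + 1, size)
    ]


# Illustrative 4 x 4 feature map (not data from the paper).
fm = [[1, -2, 3, 0],
      [-1, 5, -6, 2],
      [0, 1, 2, -3],
      [4, -4, 1, 1]]
activated = relu_map(fm)
pooled = max_pool2d(activated)
print(pooled)  # [[5, 3], [4, 2]]
```

The 16 input values are reduced to 4 outputs, which is the reduction in output neurons that pooling provides.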

Training and Testing
Cross-validation is usually not performed for deep learning approaches because of its training cost and duration. A random split into training and test sets is used instead of cross-validation.
Because deep learning studies process large datasets, a random validation split can serve the purpose of n-fold cross-validation. The dataset is separated into two parts using a random validation split: 80% of the total data is used for training and 20% is used for testing. Hence, the test set includes 7519 ECG beat images; the label distribution of ECG beats in the test sample [...]. In the proposed algorithm, the learning rate is set to 0.001. The Adam optimizer is used for optimization, and cross-entropy is selected as the loss function. After every training epoch is completed, the proposed algorithm performs a test on the CNN model. The summary of the CNN architecture is shown in Table 2. The proposed CNN model is compared with the LeNet and ResNet-50 architectures to evaluate the accuracy rate of the study. The LeNet architecture contains two convolution layers, two pooling layers, and a fully connected layer. The summary of the LeNet architecture is demonstrated in Table 3.
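A random 80/20 split and the cross-entropy loss for a single sample can be sketched with the Python standard library as follows. The function names, the fixed seed, and the example probabilities are assumptions for illustration; the sketch does not reproduce the authors' training code or the Adam optimizer.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    samples: Sequence[T], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle sample indices once and split them into train and test parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(len(samples) * test_fraction)
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]
    return train, test


def cross_entropy(probabilities: Sequence[float], true_class: int) -> float:
    """Cross-entropy loss for one sample given predicted class probabilities."""
    return -math.log(probabilities[true_class])


samples = list(range(1000))  # stand-ins for beat images
train, test = random_split(samples)
print(len(train), len(test))  # 800 200

# Hypothetical softmax output over five classes, true class index 0.
loss = cross_entropy([0.7, 0.1, 0.1, 0.05, 0.05], true_class=0)
print(round(loss, 4))  # 0.3567
```

The loss approaches zero as the probability assigned to the true class approaches one, which is the quantity the optimizer drives down during training.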

Performance Evaluation Metrics
The performance of the proposed model is evaluated using various metrics. These metrics are accuracy, specificity, recall, precision, and F1-Score. Accuracy is the total number of correctly classified ECG beat images divided by the total number of test images. Accuracy is commonly used to evaluate machine and deep learning algorithms, but it is not sufficient when the testing and training sets have imbalanced labels. Hence, additional performance metrics are included in the evaluations. Sensitivity (recall) is the true positive rate, i.e., the proportion of actual positives that are predicted as positive, while specificity is the true negative rate, i.e., the proportion of actual negatives that are predicted as negative. Precision is the proportion of predicted positive cases that are actually positive. The F1-Score is the harmonic mean of the precision and recall values. These performance metrics are calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (5)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (6)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (7)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (8)$$

$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (9)$$

The confusion matrix is provided to express the number of true positives (TPs), true negatives (TNs), false positives (FPs), false negatives (FNs), and the test sample size. The five evaluation metrics above are computed from the confusion matrix. Here, FPs are the number of test samples that belong to other classes but are incorrectly assigned to the class under consideration, while FNs are the number of test samples that belong to the class under consideration but are incorrectly assigned to other classes.
TPs are the number of test samples that actually belong to a class and are correctly assigned to that class, and TNs are the number of test samples from other classes that the model correctly does not assign to that class.
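The per-class counts and metrics described above can be derived from a multi-class confusion matrix in a one-vs-rest fashion. The sketch below assumes that rows are actual classes and columns are predicted classes; the 3 x 3 matrix is a made-up toy example, not a result from this study, and the function name `class_metrics` is hypothetical.

```python
from typing import Dict, List


def class_metrics(cm: List[List[int]], k: int) -> Dict[str, float]:
    """One-vs-rest metrics for class k (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # actual k, predicted as another class
    fp = sum(row[k] for row in cm) - tp  # another class, predicted as k
    tn = total - tp - fn - fp

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # guard against empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall) if precision + recall else 0.0
    return {
        "accuracy": ratio(tp + tn, total),
        "specificity": ratio(tn, tn + fp),
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Toy confusion matrix for three classes (illustrative values only).
cm = [[50, 2, 1],
      [3, 45, 2],
      [0, 5, 42]]
m = class_metrics(cm, 0)
print(round(m["accuracy"], 4), round(m["specificity"], 4), round(m["recall"], 4))
# 0.96 0.9691 0.9434
```

For class 0 in this toy matrix, TP = 50, FN = 3, FP = 3, and TN = 94, so precision and recall are both 50/53 and the F1-Score equals the same value.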
The Receiver Operating Characteristic (ROC) curve is a graphical presentation of the trade-off between sensitivity (the true positive rate) and specificity (whose complement is the false positive rate). In the ROC curve, the percentage of false positives is indicated on the X-axis and the percentage of true positives is indicated on the Y-axis. Both axes range from 0 to 100%, and the ideal ROC curve passes through the point with 0% false positives and 100% true positives.
The area between the ROC curve and the X-axis is calculated as the Area Under the ROC Curve (AUC). The AUC is an appropriate metric for examining algorithm performance as it is independent of the prediction criterion selected. The performance of the proposed model has been evaluated using the metrics described above.
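A ROC curve plots the true positive rate against the false positive rate as the decision threshold is swept, and the AUC is the area under that curve. The following sketch builds the curve for a binary (one-vs-rest) problem and integrates it with the trapezoidal rule. The scores and labels are invented for illustration, the function names are hypothetical, and the code assumes both classes are present in `labels`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Sweep the threshold from high to low and record (FPR, TPR) pairs."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points: List[Point] = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Illustrative classifier scores for one class versus the rest.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(round(auc(curve), 4))  # 0.8889
```

The result, 8/9, equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, which is the usual interpretation of the AUC.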

Results
In [...] Fig. 6. Thus, we observed that the balance between high performance and structural complexity depends on selecting effective layer types in the correct order, filter dimensions, and other training parameters.
The developed automatic cardiac arrhythmia detection algorithm improves diagnostic efficiency and accuracy while reducing the training time. The innovative contributions of the proposed study can be highlighted as follows:
• This paper proposes a novel deep learning approach for identifying different arrhythmia types utilizing 2D ECG beats that are obtained from 1D ECG signals by a signal-to-image transformation procedure.
• We demonstrate the benefits of using 2D gray-scale images with the proposed CNN structure.
• We investigate the performance improvements obtained by using a 2D CNN-based model in the classification of arrhythmia types as compared to well-known CNN approaches such as LeNet and ResNet-50.

Conclusions
This paper proposed a method for accurate classification of ECG arrhythmias based on a 2D CNN architecture. CNNs are mainly used for image recognition in the literature. As such, we propose representing ECG heartbeats as 2D gray-scale images and training a CNN to classify these images.
Each ECG signal is segmented into heartbeats according to arrhythmia types. Then the heartbeats are transformed into gray-scale ECG images and used as input for the CNN. An advantage of the transformation from signal to image is that it eliminates the preprocessing needed to remove noise. The proposed novel 2D CNN architecture has achieved high performance compared with already known CNN structures like LeNet and ResNet. Furthermore, the architecture includes fewer layers than ResNet and demonstrates the highest performance for the classification of arrhythmic signals. A comparison of the arrhythmia classification results of the proposed study with previous studies may be seen in Table 5.
Traditional pattern recognition algorithms consist of preprocessing, feature extraction, feature selection, and classification steps. With a CNN architecture, the preprocessing, feature extraction, and feature selection steps may be eliminated, since these steps are embedded in the CNN model. Furthermore, deep learning approaches are well suited to finding optimal features because of the hidden layers inherent in their structure. This study demonstrated that the detection of arrhythmias using ECG time-amplitude images and a CNN architecture is a successful method and may help experts diagnose cardiovascular diseases quickly and precisely. In future studies, the proposed algorithm may be developed for home healthcare monitoring systems that combine mobile applications and portable ECG devices to automatically detect arrhythmias and report them to physicians.