Arrhythmic Heartbeat Classification Using 2D Convolutional Neural Networks

Background: The electrocardiogram (ECG) records the electrical activity of the heart and provides a diagnostic means for heart-related diseases. Arrhythmia is any irregularity of the heartbeat that causes an abnormality in the heart rhythm. Early detection of arrhythmia is of great importance in preventing many diseases. Manual analysis of ECG recordings is not practical for quickly identifying arrhythmias that may cause sudden death. Hence, many studies have developed computer-aided diagnosis (CAD) systems to identify arrhythmias automatically. Methods: This paper proposes a novel deep learning approach to identify arrhythmias in ECG signals. The proposed approach identifies arrhythmia classes using a Convolutional Neural Network (CNN) trained on two-dimensional (2D) ECG beat images. First, ECG signals covering five different arrhythmia types are segmented into heartbeats, which are transformed into 2D grayscale images. Afterward, the images are used as input for training a new CNN architecture to classify the heartbeats. Results: The experimental results show that the proposed approach reaches an overall accuracy of 99.7%, a sensitivity of 99.7%, and a specificity of 99.22% in the classification of five different ECG arrhythmias. Further, the proposed CNN architecture is compared with other popular CNN architectures such as LeNet and ResNet-50 to evaluate its performance. Conclusions: Test results demonstrate that the deep network trained on ECG images provides outstanding classification performance for arrhythmic ECG signals and outperforms similar network architectures. Moreover, the proposed method has a lower computational cost than existing methods and is more suitable for mobile device-based diagnosis systems, as it does not involve any complex preprocessing. Hence, the proposed approach provides a simple and robust automatic cardiac arrhythmia detection scheme for the classification of ECG arrhythmias.


Introduction
A heartbeat is an event that occurs when the heart contracts and relaxes rhythmically. The electrocardiogram (ECG) is a tool used for observing the electrical activity of the heart. Each heartbeat has a P wave, QRS complex, and T wave that represent the depolarization and repolarization of the atria and ventricles of the heart. The heart rate of a healthy person ranges from 60 to 100 beats per minute [1]. The heart rate depends on one's current activity: the heart beats faster during exercise and slower during rest or sleep.
Arrhythmia is any abnormality in the cardiac cycle that can be considered as an irregular heart rate or irregular waveform [2]. A heart with an arrhythmic heartbeat cannot pump blood throughout the body as well as it should. This condition may damage many organs and pose a threat to daily life.
Since cardiac arrhythmias are a major threat to human health, their early and accurate detection is essential in medical practice [3]. Manual analysis of ECG signal recordings is not an efficient way to correctly detect abnormalities in the heart rhythm [4,5]. Analysis of long-duration ECG signals by physicians is a burdensome and time-consuming task that may yield inaccurate results. Developing automatic cardiac arrhythmia detection algorithms reduces the physician's workload, decreases arrhythmia detection time, and improves diagnostic efficiency and accuracy. Many studies in the literature have presented computer-aided systems that use different feature extraction and classification techniques to accurately detect abnormalities in ECG signals.
Several methods have been proposed for automatically detecting arrhythmias based on signal processing, feature extraction, and machine learning algorithms [6]. Recorded ECG signals are generally contaminated by different noise types or artifacts, which may change the characteristics of the ECG signal. In the preprocessing stage, contaminants are removed from ECG signals by applying different filtering operations [7]. The feature extraction stage is crucial for discriminating arrhythmic signals from regular ones. Features are extracted from the ECG signals by using various methods in the time or the frequency domain [7]. Among the time-domain ECG morphology and heart rate features [8], R-R interval and linear discriminant analysis (LDA) [9] have been widely used. In the frequency domain, features based on the Fourier transform [10], spectral correlation [11], and variational mode decomposition (VMD) [12] have been used. The preprocessing and feature extraction stages construct an analysis system for the final learning algorithms. Conventional machine learning algorithms such as Support Vector Machine (SVM) [13], Random Forest (RF) [14], and Artificial Neural Networks (ANN) [15] have been utilized in previous studies for the classification of different arrhythmia types.
In [16], time-frequency (TF) analysis of ECG signals is applied to the detection of cardiac arrhythmias. The Pseudo Wigner-Ville Distribution is utilized to obtain a TF representation of ECG signals obtained from the American Heart Association (AHA) and Massachusetts Institute of Technology-Boston's Beth Israel Hospital (MIT-BIH) databases. Four different classifiers, namely Logistic Regression with L2 Regularization (L2-RLR), Adaptive Neural Network Classifier (ANNC), SVM, and Bagging, are used to classify ventricular fibrillation, ventricular tachycardia, normal sinus, and other rhythms. Although many studies have developed arrhythmia detection algorithms by using preprocessing, feature extraction, and machine learning techniques, they have limitations in accurately classifying arrhythmias. Loss of ECG beat characteristics during noise filtering, selection of non-optimal features for classification, and low classification performance are examples of these limitations, which directly affect the success of the studies [17].
The architecture of conventional neural network algorithms contains input, output, and hidden layers. Deep Learning (DL) is a neural network structure that contains more than three layers and has become more favorable in detection and classification studies compared to conventional techniques [18]. In DL, the feature extraction and classification parts are embedded in the model, which automatically identifies the optimal features from the input data [19]. DL has become very popular in recent studies since it improves the performance of ECG arrhythmia classification. DL models may be categorized into different types, such as recurrent neural networks (RNNs), deep neural networks (DNNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks.
Zhang et al. [20] proposed an RNN and clustering-based method to find patient-specific ECG classification algorithms by using the MIT-BIH arrhythmia database. Al Rahhal et al. [21] proposed a DNN based method to classify ECG signals using three different databases. In [22], the temporal features of ECG heartbeats are detected with DNN on the MIT-BIH database with 99.09% accuracy.
Yildirim et al. designed a 16-layer deep CNN to classify 17 different arrhythmias on ECG signals taken from the MIT-BIH dataset. In [23], authors proposed a novel 3-layer deep genetic ensemble of classifiers to detect 17 different arrhythmias which achieved 99.37% classification accuracy.
CNN is a popular deep learning architecture for the automatic classification of ECG signals [24].
Kiranyaz et al. [25] introduced a one-dimensional (1D) 3-layer CNN with an R-peak wave for ECG arrhythmia classification. Baloglu et al. [26] used CNN algorithms with an end-to-end structure on 12-lead ECG signals for automatic detection of myocardial infarction with over 99% accuracy. Savalia et al. [27] proposed multilayer perceptron (MLP) and CNN-based methods to identify first-degree AV block (FAV) and ventricular bigeminy diseases. In [28], the authors proposed an 11-layer CNN structure to detect different ECG segments with an accuracy of 92.50% using 2-second ECG segments and 94.90% using 5-second ECG segments. Yao et al. [29] proposed an attention-based time-incremental CNN (ATI-CNN) that preserves the spatial and temporal characteristics of ECG signals by integrating a CNN architecture with recurrent cells. Their results attained 81.2% accuracy.
Besides the use of 1D CNNs for ECG arrhythmia classification, there are several studies in the literature based on 2D CNNs. The hidden structure of a CNN can extract various local features from 2D input samples. Spatially adjacent pixels may be represented by utilizing multiple nonlinear filters. Hence, recent state-of-the-art studies have proposed 2D CNNs for ECG arrhythmia classification, and these approaches have motivated researchers to implement CNNs with 2D image-based input data. In these studies, ECG signals are converted into 2D images and provided as input to the CNN [17,30]. The CNN architecture is considered to be more suitable for the analysis and classification of 2D data. It achieves better results than other classical techniques in image processing [31]. Jun et al. [17] proposed a deep 2D CNN-based method for detecting 7 different arrhythmias using 2D grayscale ECG beat images, and compared the performance of their proposed architecture with well-known structures such as AlexNet and VGGNet. Huang et al. [32] used TF spectrograms obtained from five different ECG beat types by the short-time Fourier transform (STFT). The spectrogram images are utilized as input data to a 2D deep CNN, which yielded an average accuracy of 99.00%.
Although there are many studies for arrhythmia detection in the literature, most of them experience various problems such as excessive depth in the network, training cost, and computational complexity.
Considering the benefits and drawbacks of the existing techniques, this paper proposes a novel DL approach for identifying different arrhythmia types in ECG signals. In this approach, the CNN model is selected as the network structure, and ECG signals are converted into ECG images to be used as input to a new CNN architecture. Five different arrhythmia types are considered for classification.
Before converting 1D ECG signals into 2D images, segmentation is applied to the ECG signals to extract the ECG beats. Then each ECG beat is converted into a 2D grayscale image and used as input data for the CNN architecture. The classification performance of the proposed DL approach is compared to LeNet and ResNet-50 architectures.

Material and methods
The proposed ECG arrhythmia classification algorithm consists of the following steps: heartbeat segmentation, image transformation, and ECG arrhythmia classification by using a 2D CNN architecture. The schematic diagram of the methodology is shown in Fig. 1. The ECG signals are taken from the MIT-BIH arrhythmia database [33]. Five different arrhythmia types are selected from the database. ECG signals are segmented into heartbeats and converted into ECG heartbeat images, which are then used to train the network.

ECG database
The MIT-BIH arrhythmia database [33] includes different arrhythmic signals which are independently annotated by two or more cardiologists according to their arrhythmia types. Each record includes two-channel ECG signals, which are the modified limb lead II (MLII) and one of the modified leads V1, V2, or V5. Due to the deformation of the second channel, MLII lead recordings are used in this study.
Each signal is 30 min long, is sampled at 360 Hz, and is filtered by a 0.1-100 Hz bandpass filter. The MIT-BIH database is well known to be imbalanced, since the number of ECG beats differs between arrhythmia types, which deteriorates the accuracy of DNN and CNN models [34]. Because the number of heartbeats is not equal for each arrhythmia class in the dataset, deep learning algorithms may become biased toward the classes that include many samples. Recently, some approaches have been proposed to eliminate the imbalance effect in the MIT-BIH database. One study proposed a data augmentation technique using 15 different classes from MIT-BIH to generate a balanced database. In this approach, the proposed model was trained with the same techniques and hyperparameters on the original imbalanced database and on the balanced database created with the augmentation technique, to observe the effect of the balanced dataset. The results revealed that augmenting the imbalanced original dataset with generated heartbeats yields better arrhythmia classification performance than training the proposed techniques on the original dataset [35]. Oh et al. [36] utilized a combination of CNN and LSTM for classifying five classes of the MIT-BIH dataset, namely normal (N), left bundle branch block (LBBB), right bundle branch block (RBBB), premature ventricular contraction (PVC), and atrial premature beat (APB); a normalization technique is applied to standardize the input data. Huang et al. [32] proposed a 2D CNN that uses spectrograms of five different arrhythmia classes as input for ECG arrhythmia detection. Nearly equal numbers of beats from five different classes (N, LBBB, RBBB, PVC, and APB) are selected to balance the dataset and are classified to achieve the highest accuracy. Recent studies have revealed that data augmentation techniques and providing an equal number of beat samples among the classes of the dataset can be used as approaches to stabilize the imbalance ratio (IR) [37].
Considering the limitations of the MIT-BIH dataset and the approaches that are applied to overcome them, arrhythmia classes with an approximately equal number of heartbeats are utilized to train the model and to eliminate the imbalance effect of the database. Besides, the availability of a large amount of heartbeat data plays a vital role in DL-based approaches for successful analysis and classification. As a result, the arrhythmia types were determined based on these two criteria: five arrhythmia types that each include many heartbeats and have an approximately equal number of beat samples were chosen to provide a large and balanced dataset. Consequently, five different arrhythmia types, namely N, LBBB, RBBB, PVC, and Paced beat (PB), are selected from the database. Another advantage of selecting these five specific types of arrhythmia is that it provides a fair comparison with recent studies that classify a similar number of arrhythmia classes. The sample sizes of ECG beats for the considered classes are given in Table 1.
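The balance criterion above can be checked numerically. The following is a minimal sketch, assuming the common definition of the imbalance ratio as the size of the largest class divided by the size of the smallest class; the function name and the beat counts are hypothetical and are not the values of Table 1.

```python
def imbalance_ratio(class_counts):
    """Return the imbalance ratio: largest class size divided by smallest class size."""
    counts = list(class_counts.values())
    if not counts or min(counts) <= 0:
        raise ValueError("every class needs at least one sample")
    return max(counts) / min(counts)


# Hypothetical beat counts for illustration only (NOT the values from Table 1).
example_counts = {"N": 8000, "LBBB": 8000, "RBBB": 7200, "PVC": 7100, "PB": 7000}
ir = imbalance_ratio(example_counts)
print(f"IR = {ir:.2f}")
```

An IR close to 1 indicates that the classes are almost equally represented; a large IR signals that a classifier may become biased toward the majority class.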

Heartbeat segmentation
The Python programming language is utilized for analyzing and classifying ECG arrhythmic signals. In the MIT-BIH database, each heartbeat of a signal is annotated by cardiologists based on the QRS structure and type of the heartbeat. To identify annotated heartbeats, the WFDB Toolbox for Python is applied to the ECG signals. This toolbox finds the QRS complex of each beat on the signal, separates heartbeats from the signal, and categorizes them according to their arrhythmia types. An example drawing of the ECG signals in the MIT-BIH database and the segmented 2D heartbeat images are illustrated in Fig. 2. ECG records and the number of ECG beats for each arrhythmia type are shown in Table 1. After the segmentation is completed, each ECG beat is converted into an ECG image.
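The segmentation step can be sketched as cutting a fixed window around each annotated R-peak. This is an illustrative sketch rather than the procedure used in the study: the function `segment_beats`, the window lengths `before` and `after`, and the synthetic record are all assumptions, and in practice the peak positions and labels would come from the database annotations.

```python
def segment_beats(signal, r_peaks, labels, before=100, after=150):
    """Cut one fixed-length window around every annotated R-peak.

    `before` and `after` are hypothetical window lengths in samples.
    """
    beats = []
    for peak, label in zip(r_peaks, labels):
        start, stop = peak - before, peak + after
        if start < 0 or stop > len(signal):
            continue  # skip beats too close to the edges of the record
        beats.append((label, signal[start:stop]))
    return beats


# Synthetic stand-in for one record: a flat line with three unit spikes.
signal = [0.0] * 1000
r_peaks = [50, 400, 760]
labels = ["N", "N", "V"]
for p in r_peaks:
    signal[p] = 1.0

beats = segment_beats(signal, r_peaks, labels)
print(len(beats), [len(b) for _, b in beats])
```

The first spike is dropped because its window would start before the beginning of the record; each remaining beat has the same length, which simplifies the later conversion to fixed-size images.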

2D signal-to-image transformation
CNN is mainly used for analyzing 2D data since it automatically learns the optimal features from raw image data.  [38]. Thus, grayscale image conversion provides a decreased dimension of images and reduces the computational complexity, and the classification time.
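One possible way to turn a segmented beat into a small grayscale image is to rasterize the time-amplitude trace onto a pixel grid. The sketch below is an assumption-laden illustration, not the transformation used in the study: the function `beat_to_image`, the nearest-sample resampling, and the pixel values (0 for background, 255 for the trace) are all hypothetical choices.

```python
def beat_to_image(beat, size=64):
    """Rasterize a 1D beat into a size x size grayscale image (0 = background, 255 = trace)."""
    lo, hi = min(beat), max(beat)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat beat
    image = [[0] * size for _ in range(size)]
    prev_row = None
    for col in range(size):
        # Nearest-sample resampling of the time axis to `size` columns.
        idx = round(col * (len(beat) - 1) / (size - 1))
        # Map amplitude to a row; row 0 is the top of the image.
        row = round((hi - beat[idx]) / span * (size - 1))
        # Fill the vertical gap to the previous column so the trace stays connected.
        if prev_row is None:
            top, bottom = row, row
        else:
            top, bottom = min(row, prev_row), max(row, prev_row)
        for r in range(top, bottom + 1):
            image[r][col] = 255
        prev_row = row
    return image


# Synthetic beat: flat baseline with a single unit spike.
beat = [0.0] * 50 + [1.0] + [0.0] * 49
image = beat_to_image(beat, size=64)
print(len(image), len(image[0]))
```

Because each pixel holds a single intensity value instead of three color channels, the resulting input is smaller than an equivalent color image.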

Convolutional neural network
CNN is a deep learning algorithm based on artificial neural network structures. Conventional neural network techniques include three layers, which are the input layer, one hidden layer, and the output layer. A deep artificial neural network consists of more than three layers, with several hidden layers in its structure. The structure is inspired by the way the brain works. In a hidden layer, which includes many neurons, the input is transformed into something that the output layer can use. Neurons provide feature detection from the input data. The mathematical representation of an artificial neuron is defined as

$$y = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

where $y$ is a function of the input $x$ weighted by a vector of connection weights $w$, completed by a neuron bias $b$, and associated with an activation function $\varphi$. The schematic diagram of an artificial neuron is visualized in Fig. 3.
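The neuron model described above can be written in a few lines. This is a minimal sketch with hypothetical names (`neuron`, `sigmoid`) and made-up inputs and weights; the sigmoid is used only as an example activation function.

```python
import math


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Made-up example: the weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0.
y = neuron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
print(y)  # sigmoid(0.0) = 0.5
```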
Artificial neural networks have been used in many different areas such as computer vision, speech recognition, natural language processing, bioinformatics, drug design, and medical image analysis.
CNN is a type of artificial neural network especially designed for analyzing 2D data like images or videos. In contrast to conventional machine learning algorithms, CNN architectures do not need hand-crafted features to be extracted from the raw data. Both the feature extraction and classification parts are embedded in the architecture, which automatically identifies robust features from the input data [19].
CNN has three characteristic layers: the convolution, pooling, and fully connected layers.
In the convolution layer, input samples are convolved with a specific kernel. Many features are obtained by moving the kernel over the input [30]. The discrete convolution is defined as

$$(f * g)(n) = \sum_{m} f(m)\, g(n - m)$$

where $f$ and $g$ are two functions. For 2D signals like images, the equation becomes

$$(K * G)(i, j) = \sum_{m} \sum_{n} K(m, n)\, G(i - m, j - n)$$

where $K$ is a convolution kernel and $G$ is a 2D signal. The convolution process extracts effective features from the input. The pooling layer is used for reducing the dimension of the input sample while keeping the optimal features. In the fully connected layer, all neurons of the current layer are connected to the neurons in the next layer. As such, the results of the convolution and pooling layers are used for classification. Between the convolution and fully connected layers, there is a flatten layer where multidimensional feature maps are transformed into 1D output vectors [39]. The fully connected layer receives its data from the flatten layer, and the learning process is realized through the neural network. The SoftMax function is used in the last layer to classify each ECG arrhythmia class. When the training of the CNN is complete, the model is ready for classification.
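The three building blocks described above (2D convolution, pooling, and SoftMax) can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation used in the study: the function names, the "valid" border handling, and the toy image and kernel are hypothetical. The kernel is flipped so that the code follows the convolution sum literally; deep learning libraries commonly compute the unflipped cross-correlation instead, which is equivalent for learned kernels.

```python
import math


def conv2d_valid(G, K):
    """Discrete 2D convolution of image G with kernel K, keeping fully overlapping positions only."""
    kh, kw = len(K), len(K[0])
    oh, ow = len(G) - kh + 1, len(G[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Flipped kernel indices implement G(i - m, j - n) from the convolution sum.
            out[i][j] = sum(
                K[m][n] * G[i + kh - 1 - m][j + kw - 1 - n]
                for m in range(kh)
                for n in range(kw)
            )
    return out


def max_pool(F, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    return [
        [max(F[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(F[0]) - size + 1, size)]
        for i in range(0, len(F) - size + 1, size)
    ]


def softmax(z):
    """Numerically stable SoftMax: subtract the maximum before exponentiating."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


# Toy 4x4 image and 2x2 kernel (made-up values).
G = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
K = [[1, 0], [0, -1]]
features = conv2d_valid(G, K)   # 3x3 feature map
pooled = max_pool(features)     # 1x1 after 2x2 pooling
probs = softmax([0.0, 0.0])     # two equal scores give equal probabilities
print(features, pooled, probs)
```

For this toy input every output of the convolution equals G[i+1][j+1] - G[i][j] = 5, so the feature map is constant and pooling reduces it to a single value.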
In this study, a novel CNN model is designed, inspired by the LeNet model, which is a well-known CNN architecture. The rectified linear unit (ReLU) activation function is given by

$$f(x) = \max(0, x)$$

where $x$ is an element of the output after the convolution or pooling layer. The proposed model structure is illustrated in Fig. 4.

Training and testing
Cross-validation is usually not performed in deep learning approaches because of the training cost and duration. Because deep learning studies process very large datasets, a single validation split can serve the purpose of n-fold cross-validation. Nevertheless, the 5-fold cross-validation method was used in addition to the validation split to observe the performance of the study. In the random validation split, the dataset is separated into two parts: 80% of the total input data is used for training and the remainder is used for testing. Then, 5-fold cross-validation was applied to the part reserved for training. Hence, the test data includes 7519 ECG beat images. In the proposed algorithm, the learning rate is chosen as 0.001. The Adam optimizer is used for optimization, and cross-entropy is selected as the loss function. After all the training epochs are completed, the proposed algorithm tests the CNN model. The summary of the proposed CNN architecture with layer parameters is shown in Table 2.
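The splitting scheme described above (a random 80/20 split followed by 5-fold cross-validation on the training part) can be sketched with index lists. The function names, the seed, and the dataset size of 1000 are hypothetical and chosen only for illustration; the fold assignment by striding is one of several reasonable choices.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle the indices 0..n-1 and reserve a fraction of them for testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists; each fold is the validation set exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


# Hypothetical dataset of 1000 beat images.
train_idx, test_idx = train_test_split_indices(1000)
splits = list(k_fold(train_idx, k=5))
print(len(train_idx), len(test_idx), [len(v) for _, v in splits])
```

The test indices never enter any fold, so the held-out 20% remains untouched until the final evaluation.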
The proposed CNN model is compared with LeNet and ResNet-50 architectures to evaluate the accuracy rate of the study. LeNet architecture contains two convolution layers, two pooling layers, and a fully connected layer. As the proposed CNN architecture mimics the LeNet, the summary of LeNet architecture with layer parameters is also demonstrated in Table 3.

Performance evaluation metrics
The performance of the proposed model is evaluated utilizing various metrics, which are accuracy (ACC), specificity (SPE), recall (REC), precision (PRE), and F1-Score [40]. The ACC is the total number of correctly classified ECG beat images divided by the total number of test images. Accuracy is the usual performance measure for machine and deep learning algorithms, but it is not sufficient when the labeled training and testing sets are imbalanced. Hence, additional performance metrics are included in the evaluations. The REC metric is the true positive rate, i.e., the proportion of actual positives that are predicted as positive, while the SPE metric is the true negative rate, i.e., the proportion of actual negatives that are predicted as negative. The PRE is the proportion of predicted positive cases that are actually positive. The F1-Score is the harmonic mean of the PRE and REC values. With TP, TN, FP, and FN denoting the numbers of true positives, true negatives, false positives, and false negatives, these performance metrics are calculated as follows:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

$$SPE = \frac{TN}{TN + FP}$$

$$REC = \frac{TP}{TP + FN}$$

$$PRE = \frac{TP}{TP + FP}$$

$$F1 = \frac{2 \cdot PRE \cdot REC}{PRE + REC}$$

The Receiver Operating Characteristic (ROC) curve is a graphical presentation of the trade-off between sensitivity (the true-positive rate) and specificity (the true-negative rate). On the ROC curve, the x-axis indicates the false-positive rate (1 - specificity) and the y-axis indicates the true-positive rate (sensitivity); both range from 0 to 1 (0-100%). When the ideal values (100% sensitivity and 100% specificity) are attained, the point on the ROC curve lies at the upper left-hand corner (0,1); the closer the curve is to this corner, the better the test discriminates between cases and non-cases. The area between the ROC curve and the x-axis is the Area Under the ROC Curve (AUC). It takes a value between 0 and 1, as both the x- and y-axes range from 0 to 1. The closer the AUC is to 1, the better the overall performance of the test. The AUC is an appropriate metric for examining algorithm performance as it is independent of the selected decision threshold. Also, in order to observe the training result clearly, the average of the 5 training ACC values obtained from the 5-fold cross-validation was calculated.
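For a multiclass problem, these metrics are commonly computed per class in a one-vs-rest manner from the confusion matrix. The following is a minimal sketch under that assumption; the function names and the 3-class confusion matrix are hypothetical and do not reproduce any result of this study. The AUC is approximated with the trapezoidal rule over ROC points.

```python
def per_class_metrics(cm, k):
    """One-vs-rest metrics for class k from a square confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # class-k beats predicted as another class
    fp = sum(row[k] for row in cm) - tp  # other beats predicted as class k
    tn = total - tp - fn - fp
    rec = tp / (tp + fn)
    spe = tn / (tn + fp)
    pre = tp / (tp + fp)
    return {
        "ACC": (tp + tn) / total,
        "SPE": spe,
        "REC": rec,
        "PRE": pre,
        "F1": 2 * pre * rec / (pre + rec),
    }


def auc_trapezoid(fpr, tpr):
    """Area under a ROC curve given points sorted by increasing false-positive rate."""
    return sum(
        (fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
        for i in range(len(fpr) - 1)
    )


# Hypothetical 3-class confusion matrix (made-up counts).
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 1, 9]]
metrics = per_class_metrics(cm, 0)
print(metrics)
print(auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # ideal classifier
```

The sketch does not guard against empty classes; a class with no true or predicted samples would cause a division by zero and needs separate handling.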

Results
In this study, the Keras and TensorFlow libraries are used as the backend for the deep learning algorithms. In addition to examining the effect of the number of epochs and the batch size on the performance of the model, the effects of the input image size, the learning rate hyperparameter, and the number of model layers on the accuracy were also examined. A visualization of the effect of some parameter changes on accuracy is given in Fig. 6. Note that, to tune each parameter, the relevant parameter is varied while the other parameters are fixed at the values for which the best accuracy was obtained. In all trials, the images are square with a side length that is a power of 2. The main reason for this is to avoid padding in the convolutional and pooling layers. Thus, the image size was initially chosen as 256x256 and an accuracy of 99.7% was obtained. Afterward, the model was trained with input image sizes of 128x128, 64x64, 32x32, and 16x16, respectively. The highest accuracy was achieved with the 64x64, 128x128, and 256x256 image sizes. As with the selection of the number of epochs, the image size was chosen as 64x64 to avoid increasing the training cost. Image sizes lower than 16 were not selected, in order to keep the resolution high enough to avoid distortion of the ECG images and to avoid negative dimensions when applying pooling in the training phase. Additionally, for the ResNet-50 architecture, the smallest image size that can be used as input without any padding is 64x64. Another reason for choosing 64x64 is to compare the trained models under conditions that are as equal as possible. To observe the effect of the learning rate hyperparameter on training performance, the learning rate was set to 0.01, 0.001, and 0.0001, respectively. Experimental results show that the highest accuracy is obtained when the learning rate is 0.001.
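The tuning strategy of varying one parameter while holding the others at their best known values can be sketched as follows. This is only an illustration: `tune_one_at_a_time` and `toy_score` are hypothetical names, the batch-size candidates are assumed, and `toy_score` is a synthetic stand-in for "train the CNN and return its accuracy", so its values are not experimental results.

```python
import math


def tune_one_at_a_time(grid, start, evaluate):
    """Vary one parameter at a time, keeping the others at the best values found so far."""
    best = dict(start)
    best_score = evaluate(best)
    for name, candidates in grid.items():
        for value in candidates:
            trial = {**best, name: value}
            score = evaluate(trial)
            if score > best_score:
                best, best_score = trial, score
    return best, best_score


def toy_score(cfg):
    # Synthetic surrogate with a single optimum; NOT a measured accuracy.
    return (-abs(math.log10(cfg["learning_rate"]) + 3)
            - abs(cfg["batch_size"] - 64) / 64)


grid = {
    "learning_rate": [0.01, 0.001, 0.0001],  # values named in the text
    "batch_size": [16, 32, 64, 128],         # assumed candidates
}
start = {"learning_rate": 0.01, "batch_size": 16}
best, best_score = tune_one_at_a_time(grid, start, toy_score)
print(best)
```

This search needs far fewer training runs than a full grid search, at the price of possibly missing interactions between parameters.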
Finally, in addition to the experiments conducted to tune the number of layers in the proposed architecture, the results obtained by training the ResNet-50 (more layers) and LeNet (fewer layers) architectures were considered. The aim was to achieve higher accuracy with fewer layers (lower training cost). In the proposed model, layers containing convolution and subsequent max-pooling with 20, 50, 100, 150, and 200 kernels, respectively, were tested by freezing some layers. The highest performance was obtained with the 3-layer architecture. In the parameter tuning process, the principle of achieving the lowest cost while creating the lightest system was adopted. Finally, 150 epochs, a batch size of 64, the 3-layer architecture, a learning rate of 0.001, and an image size of 64x64 were found to be optimal for the proposed experiment. With these tuned parameters, an outstanding multiclass arrhythmia classification performance with an accuracy of 99.7% was obtained. Detailed performance evaluation metrics of the study are given in Table 4.
Table 4 Classification performances of the trained models.

Discussion
This study presents a new method for ECG arrhythmia classification using 2D ECG signal-to-image representations with a deep learning approach. In our previous study [38], five different arrhythmia types were differentiated using ECG signals and the LeNet architecture with an accuracy of 97.42%. The five different heartbeats were categorized as non-ectopic beats, ventricular ectopic beats, fusion beats, supraventricular ectopic beats, and unclassifiable beats. However, the label distribution between heartbeat categories was not balanced. In this study, almost equally distributed (IR= 1.  Table 5. Many studies in the literature achieved a lower accuracy than the proposed approach, although they involve many deep network layers. Recently, Li et al. [24] proposed ResNet-31 based ECG heartbeat classification using five classes of single-lead and 2-lead datasets and achieved 99.06% and 99.38% accuracy for training and testing, respectively. In another study [41]  The respectively. It can be observed in Fig. 5 (a) that the ResNet-50 architecture has a high validation loss fluctuation. This is thought to be caused by the shortcuts in residual networks. Besides, after the 80th epoch, the loss values converged to zero. In Fig. 5 (b), although the accuracy-loss curves obtained from the LeNet architecture seem more stable, they could not exceed a certain accuracy value. The most stable accuracy-loss graph is obtained by the proposed architecture in Fig. 5 (c). Similarly, when the ROC curves in Fig. 5 (d, e, f) are examined, it is seen that the result closest to the ideal is obtained with the proposed architecture in Fig. 5 (f). Finally, the confusion matrices in Fig. 5  These accuracy values show that ResNet and LeNet achieve lower classification success than the proposed model.
Thus, it was observed that the balance between high performance and structural complexity is related to the selection of effective layer types in the correct order, optimal filter dimensions, and other training parameters. The developed automatic cardiac arrhythmia detection algorithm improves diagnostic efficiency and accuracy while reducing the training time. The innovative contributions of the proposed study may be summarized as follows:
• This paper proposes a novel deep learning approach for identifying different arrhythmia types utilizing 2D ECG beats that are obtained from 1D ECG signals by a signal-to-image transformation procedure.
• The benefits of using 2D grayscale images with the proposed CNN structure are demonstrated.
The proposed method
• does not require any complex preprocessing of ECG signals or QRS complex determination to perform classification,
• does not contain any manual, computationally demanding feature extraction steps as in the traditional machine learning methods,
• investigates the performance improvements obtained by using a novel 2D CNN-based model in the classification of arrhythmia types compared to well-known CNN approaches such as LeNet and ResNet-50,
• increases classification performance while decreasing the computational cost compared to the well-known CNN architectures, for mobile-based decision-making systems,
• optimizes the deep network hyperparameters to yield the best classification performance.

Conclusion
This paper proposes a novel approach for the accurate classification of ECG arrhythmias based on a 2D CNN architecture. ECG heartbeats are transformed into 2D time-amplitude images to be used as input data for the CNN architecture. This study demonstrates that a combination of simple ECG time-amplitude images and the image classification capability of the CNN architecture may provide the highest classification performance while eliminating manual preprocessing steps such as noise filtering, feature extraction, and feature selection. The proposed 2D CNN architecture includes relatively fewer layers compared to other CNNs and has yielded better performance than the well-known CNN architectures. Additionally, the proposed model was trained with a balanced label distribution to minimize the effect of dataset imbalance. Considering the experimental results, the proposed method is a simple, effective, and useful approach that may be used by experts for quickly and automatically identifying cardiovascular problems in ECG signals. In future studies, the proposed algorithm may be implemented in home healthcare monitoring systems that combine mobile applications and portable ECG devices to automatically detect arrhythmias in real time and share them with physicians.