Cardiac arrhythmia detection using dual-tree wavelet transform and convolutional neural network

Non-stationary ECG signals are a key tool in screening for coronary diseases. An ECG recording reflects the activity of millions of cardiac cells, whose depolarization and repolarization proceed in a synchronized manner: the P wave occurs first, followed by the QRS complex and the T wave, and this sequence repeats in each beat. Different heart conditions alter the signal within a beat period, and these changes can be observed to diagnose the patient's heart status. Diagnosis by the naked eye alone can be misleading, so computer-assisted diagnosis (CAD) is required. In this paper, the dual-tree wavelet transform is used as a feature extraction technique along with a deep learning (DL)-based convolutional neural network (CNN) to detect abnormal heart conditions. The findings of this research and associated studies are obtained without any cumbersome artificial environments. This work investigates the viability of using deep learning-based architectures for heartbeat classification, and the results suggest that it is feasible to use 2D images to train a deep learning architecture for this task. The CNN produced the highest overall accuracy, around 99%. The proposed CAD method has high generalizability; it can help doctors identify diseases efficiently and decrease misdiagnosis.


Introduction
An electrocardiogram (ECG) is a chaotic, low-voltage signal that is acquired non-invasively (World Health Organization 2017; Martis et al. 2014). ECG recordings are noisy, and artifacts arise during acquisition, when electrodes are attached at defined locations on the body. These deviations and artifacts originate from current-carrying cables, muscle activity, body motion and poor electrode contact, as well as from electromagnetic interference. The ECG is a visual recording of the electrical potential produced by the heart's pumping action, which consists of the depolarization and repolarization of the SA node followed by the depolarization and repolarization of the AV node. Because it reveals vital clinical information, the ECG is today the most promising heart diagnostic method in the world. The rhythm episodes of the ECG are examined, and if the patient has a cardiac abnormality, the arrhythmia is categorized further. Causes of arrhythmia include heart muscle injury, diabetes, tobacco use, abnormally low or high respiration and blood pressure. Arrhythmias are either life-threatening (critical) or non-life-threatening (non-critical). Critical arrhythmias leave no time for surgery, whereas non-critical arrhythmias require special care to save life.
Diagnosis by the naked eye alone may be misleading. Therefore, computer-aided diagnosis (CAD) is required at that stage (Yu 2015). The use of computers in ECG signal interpretation has also improved the diagnosis of critical conditions (Kora and Kalva 2017). The ECG displays and records the electrical activity of the heart using electrodes attached to the skin surface. It is the depolarization and repolarization of the heart tissue that generates the potential difference to be examined. The cycle begins with the activation of the SA node, the heart's pacemaker, which produces an electrical impulse that sets off a chain of electrical events in the heart. The electrodes record these events, and variations are reflected in the wave components of the ECG signal. The common components of an ECG signal are the P wave, which indicates atrial depolarization; the QRS complex, which indicates ventricular depolarization; the ST slope, which indicates blood flow in the body; and the T wave, which indicates ventricular repolarization. Diseases are identified from the ECG signal using rhythm, measured heart rate and QRS duration. Classification of the R-wave peak is essential in automatic signal classification, especially in critical conditions and cardiac abnormalities. Many coronary heart diseases can be detected by analyzing the ECG signal. In this paper, three different diseases are identified, namely atrial fibrillation (AF), myocardial infarction (MI) and bundle branch block (BBB) (Texas Heart Institute 2016).

Atrial fibrillation
Atrial fibrillation (Selesnick et al. 2005) is the most severe cause of supraventricular tachycardia. It occurs when unregulated electrical signals travel across the atria instead of the regular signals from the sinus node. These unregulated impulses cause the muscle fibers in the atria to contract out of time, that is, to fibrillate. Some of these signals enter the ventricles and induce a fast and erratic heartbeat.
The heart does not pump normally or work as it should when it is in atrial fibrillation. Atrial fibrillation may result in a fluttering rhythm, erratic pulse, chest pain or pressure, fatigue and dizziness. Atrial fibrillation can also increase the likelihood of stroke, because blood trapped in the atria can coagulate. Such clots can break loose from the heart, travel through the bloodstream to the brain and trigger a stroke. Figure 1 shows the morphological change found in the ECG: irregular fibrillatory waves (f waves) in place of P waves.

Myocardial infarction
MI (Acharya et al. 2017; Padmavathi and Krishna 2014) is a dangerous condition that occurs when blood flow to the heart muscle is suddenly blocked, causing tissue damage. A heart attack, or myocardial infarction, is a medical emergency. An MI usually happens when a blood clot blocks the flow of blood to the heart. Deprived of nutrients and oxygen, the tissue fails. The major signs are tightness and/or discomfort in the stomach, throat, back and arms, together with exhaustion, light-headedness, irregular pulse and anxiety. Atypical signs are more common in women than in men. On the ECG, tall peaked T waves (hyperacute T waves) appear first, followed by ST-segment elevation; the T waves then become negative, and pathological Q waves gradually evolve. Two separate forms of MI can be detected from the ECG: ST elevation (Type 1) and ST depression (Type 2). Figure 2 shows a Type 2 MI signal and its morphological changes.

Bundle branch block
The electrical activity of the heart begins at the sino-atrial (SA) node (the natural pacemaker of the heart), which is located in the upper right chamber. Next, the electrical impulse moves across the left and right atria and converges at the atrioventricular (AV) node. From the AV node, the impulse travels down the bundle and divides into the right and left bundle branches. Finally, the branches spread out into a large number of Purkinje fibers, which interdigitate with individual cardiac myocytes and thus account for the rapid, synchronized depolarization of the ventricles. BBB typically widens the QRS complex and can shift the electrical axis of the heart slightly to one side. In BBB, the duration of the QRS complex on the ECG is greater than 120 ms. The ECG displays a terminal R wave in lead V1 and a slurred S wave in lead I. Left bundle branch block (LBBB) widens the entire QRS complex and, in most cases, shifts the electrical axis of the heart to one side. T-wave depression is another common finding in BBB: the T wave is deflected opposite to the terminal deflection of the QRS complex. LBBB may cause cardiac dyssynchrony. The simultaneous occurrence of left and right blocks amounts to AV block. The variations in BBB signals are depicted in Figs. 3 and 4.
The main steps in ECG signal classification are (i) preprocessing; (ii) feature extraction; and (iii) classification. Previous approaches required both a large volume of data and powerful computers. Instead of following the popular trend of creating new algorithms to solve a problem, we decided to leverage existing tools and show that they achieve high accuracy in medical tasks, outperforming the suggested solutions. The motivation of this work was to create a pipeline that detects arrhythmia using tiny datasets and limited computational resources. To support that claim, we generate class activation maps marking regions of interest (regions that contributed strongly to the final classification). The purpose of this project is to investigate the viability of using 2D images for heartbeat classification with deep learning architectures. The particular deep learning architecture used for this project is the convolutional neural network (CNN). CNNs are essentially composed of a stack of feature extractors, which makes them a good solution for classification without human intervention. Unlike the other machine learning algorithms mentioned above, which require a feature extraction phase before training and classification can take place, CNNs inherently automate feature extraction during the training phase. In theory, the features that CNNs automatically extract should be the same morphological features and sinus rhythm irregularities mentioned above that a human or other learning algorithms would use for heart arrhythmia classification. To be successful, the CNN must correctly assign heartbeat images to one of the following classes under the Association for the Advancement of Medical Instrumentation (AAMI) classification: normal beat (N), MI, BBB and AF. Only these beats are included in this project because of a lack of other data samples.

Related works
Many feature extraction methods are available in the literature, owing to the accessibility of data with a large number of variables (features) (Kim et al. 2007). Using the Physionet database, recent spectral estimation-based feature extraction approaches, such as the continuous wavelet transform (CWT), discrete wavelet transform (DWT), magnitude-squared coherence (MSC) and wavelet coherence (WTC), yielded a wide feature set. In this paper we use the DTWT as a feature extraction technique, as it offers improved performance over the standard wavelet transform for signal, image and video processing. The DTWT is realized using two different filter banks. To implement this transform, the two filters (scaling and wavelet) in the two separate trees cannot be selected arbitrarily, as shown in Fig. 6. The scaling and wavelet filters of the upper filter bank [low-pass (h0), high-pass (h1)] should form an approximate Hilbert transform pair with the filters of the lower bank (g0, g1). Hence, approximately analytic complex-valued filters (scaling and wavelet) are generated from the two trees.
Multiple transforms have been proposed for signal processing (Hramov et al. 2015; Vetterli et al. 2014). In such research, the choice of signal transform is typically attributed to certain useful characteristics that these transforms provide, including compact signal representation, reversibility, the availability of fast implementations and the capacity to interpret signals independently at each frequency, among others. The feature extraction technique most commonly used nowadays is the wavelet transform (WT), which has many applications (Xizhi 2008), such as de-noising, feature extraction and time-frequency transformation. Wavelet transforms express the features of a signal in both the frequency and the time domain. A disadvantage of the WT is that additional selection is required to identify the most important features. Feature optimization techniques include independent component analysis, principal component analysis and linear discriminant analysis. Table 1 shows, for each of these feature extraction algorithms, the details supplied to the classifier to separate the features into their distinct disease groups. Bal (2012) utilized the complex dual-tree wavelet transform to remove noise in optical microscopy images. Sudarshan et al. (2017) extracted features using the dual-tree complex wavelet transform and then classified these features using different classifiers. The accuracy of this method for detecting congestive heart failure was high. Thomas et al. (2014) used the DTWT with four other features to detect five cardiac arrhythmias, which were then classified using a neural network. The performance of the DTWT was compared with that of the WT. Mishu et al. (2014) utilized the DTWT to denoise ECG signals collected from the MIT/BIH database.

Preprocessing of ECG signal
The data were obtained from the Physionet database. Examples of sinus rhythm (N), AF, MI, LBBB and RBBB signals were taken from Physionet records 19088, 19090, 05261, 04426, 0645, 19103, 19140 and 19830; the corresponding files are listed in Table 2.

Dataset
The

Dual-tree wavelet transform
The discrete wavelet transform (DWT) (Selesnick et al. 2005) was recently modified to provide additional enhancements such as: (i) directional selectivity in higher dimensions; (ii) shift invariance; and (iii) rotational invariance.
The traditional dual-tree wavelet transform uses two real DWTs in parallel to process the input data. The upper DWT computes the real part and the lower DWT computes the imaginary part; combined, they form a complex WT. The dual-tree transform addresses the lack of shift invariance and directional selectivity of the DWT, which has proven somewhat disappointing in processing complex signals such as music, speech and radar, and signals of higher dimensions. To avoid these problems, complex wavelets were introduced to provide additional potential improvement. The dual-tree wavelet transform produces real (R) and imaginary (I) parts in six subbands oriented at ±15°, ±45° and ±75°. Here h_i(n) and g_i(n) are the filters in stage 1, where the two wavelets g_1(t) and h_1(t) form an approximate Hilbert transform pair, and likewise g_2(t) and h_2(t). That is to say, decomposing f(t) with the dual-tree WT creates six complex-valued high-pass subbands and six complex-valued low-pass subbands at every level of decomposition, as shown in Fig. 6.
Fig. 7 Dual-tree wavelet transform
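The parallel structure described above can be illustrated with a minimal 1D sketch of the first stage only. It is not the implementation used in this work: the Haar filters, the circular boundary handling and the names `analysis_stage` and `dual_tree_stage1` are assumptions made for illustration. The sketch relies on the usual first-stage convention that the lower tree uses the same filters as the upper tree, offset by one sample; later stages require specially designed filter pairs to obtain the approximate Hilbert transform relationship, and those are not reproduced here.

```python
import math

S = 1.0 / math.sqrt(2.0)
LOW = [S, S]     # Haar scaling (low-pass) filter, assumed for illustration
HIGH = [S, -S]   # Haar wavelet (high-pass) filter, assumed for illustration


def analysis_stage(signal, low, high, offset=0):
    """Two-channel filter bank followed by downsampling by 2.

    Uses circular indexing at the boundary and expects an even-length
    input. `offset` shifts the sampling grid by that many samples.
    """
    n = len(signal)
    approx, detail = [], []
    for k in range(0, n, 2):
        start = k + offset
        approx.append(sum(low[j] * signal[(start + j) % n] for j in range(len(low))))
        detail.append(sum(high[j] * signal[(start + j) % n] for j in range(len(high))))
    return approx, detail


def dual_tree_stage1(signal):
    """Run both trees and pair their detail outputs as complex coefficients."""
    a_re, d_re = analysis_stage(signal, LOW, HIGH, offset=0)  # upper tree: real part
    a_im, d_im = analysis_stage(signal, LOW, HIGH, offset=1)  # lower tree: imaginary part
    coeffs = [complex(r, i) for r, i in zip(d_re, d_im)]
    return coeffs, (a_re, a_im)


# Toy signal (hypothetical values, not ECG data)
x = [0.0, 1.0, 4.0, 2.0, -1.0, -3.0, 0.0, 1.0]
coeffs, (a_re, a_im) = dual_tree_stage1(x)
magnitudes = [abs(c) for c in coeffs]
```

The magnitudes of the complex coefficients are the quantities typically used as features; in the 2D case the same pairing is applied along rows and columns to obtain the oriented subbands.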

Convolutional neural network
A convolutional neural network consists of several convolutional layers followed by neural network layers (Zhao and Zhang 2005; Gu et al. 2015). A CNN's architecture is designed to take advantage of the structure of a 1D signal or a 2D image given as input. The main operations of a CNN are as follows. The convolution layers are made of kernels, small tensors that act like windows sliding over the input to produce the output. These operators can capture the spatial and temporal dependencies in an image and thus learn different local features such as straight lines (horizontal or vertical) and curves, while deeper (hidden) layers can detect more sophisticated information, such as rectangles or circles, based on the received input and therefore represent it better. As processed data flow to deeper layers, the network learns more abstract combinations. Consider the hypothetical image presented in Fig. 8; the role of a convolutional neural network is to reduce the image into a form that is much easier to process without losing any information crucial for obtaining a valid prediction. The convolution operation extracts valuable input features and passes this information to the next level while reducing the dimensionality.
As seen in Fig. 8, convolution operations with weights are followed by pooling, which improves the invariance of the extracted features. With fewer features, the neural network becomes simpler, which is the CNN's biggest benefit. The design of the CNN and the way its gradients are computed are discussed here.
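The convolution and pooling steps described above can be sketched in a few lines of plain Python. This is an illustrative toy, assuming a single 2x2 kernel, a ReLU nonlinearity and 2x2 max pooling; the image, the kernel values and the function names are hypothetical and are not taken from the network used in this work.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Elementwise nonlinearity: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2d(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# Toy 5x5 image with a vertical edge between columns 1 and 2
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hypothetical kernel that responds to dark-to-bright vertical edges
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

feature_map = relu(conv2d_valid(image, kernel))   # 4x4 map, strong response at the edge
pooled = max_pool2d(feature_map)                  # 2x2 map after pooling
```

The 5x5 input shrinks to a 4x4 feature map and then to a 2x2 pooled map, while the edge response survives in the first column of the pooled output.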
The purpose of an activation function is to add a nonlinear property to the function that the neural network represents. Softmax is applied as the activation function only in the last layer, and only when the network is to predict probability scores in classification tasks. Both dense layers and convolution layers are updated during backpropagation, but max-pooling layers have no weights to update. Dense layers are updated to help the neural network classify. Convolution layers are updated to let the network learn the features itself. The loss function we used was binary cross-entropy. The model was trained using the Adam optimizer for 75 epochs with a learning rate of 0.001. We have used five convolution layers, and the output
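As a rough illustration of the softmax output and the binary cross-entropy loss mentioned above, the following sketch computes both for one hypothetical beat. The logits, the one-hot target and the averaging of the loss over the four outputs are assumptions for illustration; the paper does not state how the loss was reduced across classes.

```python
import math

CLASSES = ["N", "MI", "BBB", "AF"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(target, probs, eps=1e-7):
    """Mean over outputs of -[t*log(p) + (1 - t)*log(1 - p)]."""
    losses = []
    for t, p in zip(target, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(losses) / len(losses)


logits = [2.0, 0.5, 0.1, -1.0]   # hypothetical last-layer scores for one beat
target = [1, 0, 0, 0]            # one-hot label: normal beat (N)

probs = softmax(logits)
loss = binary_cross_entropy(target, probs)
predicted = CLASSES[probs.index(max(probs))]
```

The predicted class is the one with the highest probability, and the loss shrinks toward zero as the probability assigned to the true class approaches 1.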

Results
MI data were obtained from the Physionet PTB online database, which contains data from 52 healthy persons and 148 MI patients at a sampling frequency of 1000 Hz. BBB data were compiled from the arrhythmia database, a collection of 48 half-hour, two-channel ambulatory ECG recordings from 47 persons. Three LBBB and three RBBB files with a length of 30 min at a 360 Hz sampling rate were used from this database. Figures 9 and 10 show a 10-s ECG waveform during AF along with its detail coefficients at levels D1 to D6. The sampling rate of the AF signal is 250 Hz, and 1250 samples are used in this signal. It can be clearly seen that atrial activity is evident in the wavelet domain, especially in the detail coefficients. Ten seconds of MI data are shown with all of their D1 to D6 coefficients. As the MI sampling rate is 1000 Hz, this signal contains 10,000 samples. In the WT domain, the characteristics of MI are noticeable, especially in the detail coefficients, as seen in Figures 11 and 12. Ten seconds of LBBB and RBBB data are shown with their coefficients at levels D1 to D6. The BBB sampling rate is 360 Hz, so this signal contains 3600 samples. It can be clearly seen that BBB activity is evident in the wavelet domain, especially in the detail coefficients.
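The sample counts quoted above follow from the sampling rate multiplied by the segment duration. The short check below reproduces that arithmetic; the function names are illustrative. By the same arithmetic, 1250 samples at 250 Hz span 5 s.

```python
def sample_count(fs_hz, duration_s):
    """Number of samples in a segment: sampling rate times duration."""
    return round(fs_hz * duration_s)


def duration_seconds(n_samples, fs_hz):
    """Duration covered by a given number of samples."""
    return n_samples / fs_hz


mi_samples = sample_count(1000, 10)        # MI segment: 1000 Hz for 10 s
bbb_samples = sample_count(360, 10)        # BBB segment: 360 Hz for 10 s
af_duration = duration_seconds(1250, 250)  # AF segment: 1250 samples at 250 Hz
```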
An ECG signal comprises many characteristic points. These points describe the behavior of the ECG signal. For identification and diagnosis, it is especially important to represent these points (features) with a small number of parameters. The signal is decomposed into a number of scales, each reflecting a particular coarseness of the signal, and the wavelet coefficients at each scale are the details (D). At each subsequent step, the approximation coefficients are again split into a coarser approximation (low-pass) segment and a detail (high-pass) segment. Each step of this scheme requires two digital filters and two downsamplers. The detail, D1, and the approximation, A1, are supplied by the downsampled outputs of the first high-pass and low-pass filters, respectively (Figures 9, 10, 11, 12, 13, 14, 15 and 16). The first approximation, A1, is decomposed further, and this process is repeated. Precision, sensitivity and accuracy, together with the ROC curves seen in Figs. 17, 18 and 19, are used to assess overall performance. Table 3 compares the performance of various feature extraction techniques in terms of sensitivity, specificity and accuracy.
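The filter-and-downsample cascade described above can be sketched with the Haar wavelet, which is assumed here only because its filters are short enough to write out by hand; the wavelet actually used in this work is not stated in this section. The function names and the toy signal are illustrative.

```python
import math


def haar_step(signal):
    """One analysis stage: low-pass and high-pass filtering, each downsampled by 2."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd-length input by repeating the last sample
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Return the final approximation and the details [D1, D2, ...]."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        # Only the approximation is split again at the next level
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Toy signal (hypothetical values, not ECG data)
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a2, (d1, d2) = haar_dwt(x, 2)
```

Each level halves the number of coefficients, so D1 holds the finest details and the final approximation holds the coarsest view of the signal. A six-level decomposition as used in the figures would call `haar_dwt(x, 6)` on a sufficiently long segment.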

Discussion
DT: decision tree; PNN: probabilistic neural network; RBF: radial basis function.
Lee et al. (2007) extracted features and categorized them using various classifiers to detect coronary artery diseases. To differentiate three cardiac conditions, Kim et al. (2007) used multiple discriminant analysis. Other work used SVM, DT, KNN and PNN classifiers to evaluate the efficiency of DWT and nonlinear techniques for detecting normal and coronary heart disease cases. To distinguish regular and resting ECGs, Schreck et al. (1988) used empirical mode decomposition. Lehtinen et al. (1998) used a multilayer perceptron neural network to identify coronary artery disorders and found that computer-aided diagnosis improved the precision of identification. To diagnose coronary artery disease, Lewenstein (2001) analyzed the efficiency of an RBF neural network in categorizing patients as stable or at risk. Babaoglu et al. (2010) utilized binary particle swarm optimization and a genetic algorithm as feature optimization strategies and SVM as a classification tool for identifying coronary heart diseases. To diagnose coronary artery atherosclerosis, Kaveh and Wayne (2013) used electrocardiogram exercise stress test results acquired from the Physionet database; features were extracted and optimized using DWT and PCA and then classified using SVM. Acharya et al. (2017) used higher-order statistics and spectra and categorized multiple coronary heart disorders using KNN and decision tree classifiers. To decompose ECG signals, Kumar et al. (2017) used the flexible analytic wavelet transform followed by a least squares support vector machine. The Morlet kernel has a classification performance of 99.6%, compared to 99.56% for the RBF kernel, as shown in Table 4.

Conclusion
The classification of ECG signals is useful for preventing and diagnosing cardiovascular disease, and it is a hot topic in preventive medicine. The Physionet database was used to obtain high-quality ECG signals, and the dual-tree wavelet transform was used for feature extraction. A CNN model then classifies the different arrhythmia signals. Finally, a sophisticated CNN model can automatically recognize and learn good features. In comparison to previous work, a high level of classification accuracy is achieved.
Author Contributions All authors have equally contributed and all authors have read and agreed to the published version of the manuscript.
Funding This research received no funding from any organization or individual.

Data availability Statement
Data sharing not applicable.