Deep Convolutional Bidirectional LSTM Model for Identifying Ventricular Tachyarrhythmia Using ECG Signal Variability

An electrocardiogram (ECG) signal is widely used to detect ventricular tachyarrhythmia (VTA) and to diagnose heart disease. Deep learning models and large ECG datasets have made the diagnosis of VTA an attractive task for demonstrating the power of artificial intelligence in clinical applications. One of the life-threatening complications of VTA is cardiac arrest (CA). VTA is divided into two categories: ventricular fibrillation (VF) and ventricular tachycardia (VT). Abnormal electrical activity in the ventricle causes VT, which can lead to CA, whereas chaotic electrical activity in the ventricle leads to VF. To improve clinical diagnostic systems and to help cardiologists, it is essential to identify the risk of VTA at an early stage. The goal of this paper is to develop an end-to-end (E2E) deep learning model that uses a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM) to classify VT and VF arrhythmias from multiple ECG databases. The CNN extracts features from the ECG signals, and the BiLSTM learns temporal information from them. The ECG signals are acquired from the MIT-BIH malignant ventricular arrhythmia database (VFDB) and the Creighton University ventricular tachyarrhythmia database (CUDB). These signals indicate that heart rate variability is a fast and dynamic phenomenon. Before the method is applied, the ECG signals are windowed at a fixed size according to the annotation information and then normalized within each window. In terms of accuracy and sensitivity, the proposed CNN-BiLSTM deep learning model outperforms existing state-of-the-art methods, achieving a relatively high average accuracy (AC) of 99.37%, precision (PE) of 97.12%, sensitivity (SE) of 98.15%, F-score (FS) of 98.43%, and overall accuracy of 99.07%.


Background
Many patients suffer cardiac arrest in hospitals every year, an estimated 209,000 people in the United States alone, of whom only one in four survive (Kibos et al. 2013). Cardiac arrest occurs when the heart ceases pumping blood to the brain and other vital organs (Theuns et al. 2019). Cardiac arrests in hospitalised patients are due to VF and pulseless VT, collectively called VTA, as opposed to asystole or pulseless electrical activity (Chen et al. 2020). VT and VF are fatal heart rhythms that must be corrected for the patient to survive. In VT, the ventricles beat so fast that the heart cannot effectively fill its chambers and pump blood throughout the body. In VF, the ventricles quiver rather than pump, requiring abortive treatment. If either condition persists, the brain dies from lack of blood flow and oxygen deprivation. Because cardiac disease has such a high mortality rate, detecting VTA at an early stage is a boon for patient treatment (Benjamin et al. 2017). The ECG signal, which shows the electrical activity of the heart over time using electrodes, is the most popular tool for identifying tachyarrhythmia. ECG leads, which measure the electrical potential of the heart from multiple angles and positions, are used to detect abnormalities in the signal (Kleiger et al. 1987). The characteristics of the ECG have become one of the most important diagnostic tools for predicting heart disease. It contains a wealth of information, not only about the heart's structure but also about the function of its electrical conduction system. One ECG cycle consists of five different waves. The P wave corresponds to atrial depolarization, the QRS complex results from ventricular depolarization, and the T wave corresponds to the repolarization of the ventricles. The RR intervals are measured as the time between consecutive R waves. Thus, the variability of the RR interval is very important for measuring heart function.
The analysis of variations in this R-R interval is known as HRV analysis. Many methods exist for classifying VTAs. Automated algorithms are highly desirable for more accurate determination of VT and VF states and timely diagnosis of VTA. Deep learning and machine learning models have been combined to classify VT and VF. Atienza et al. (2013) proposed a strategy for extracting and selecting the most appropriate features from the ECG signal for the identification of VT and VF. They extracted 13 ECG features that were input to an SVM to identify VT and VF rhythms, achieving a SE of 92%. Lee et al. (2013) extracted fourteen optimal features from ECG signals using a genetic algorithm. An SVM classifier was used to classify different episodes of ventricular arrhythmias based on the extracted features, with an AC of 96.3%. Panda et al. (2020) proposed a method for detecting VT and VF using an ECG database. A fixed-frequency-range empirical wavelet transform filter bank for multiscale analysis was introduced. The evaluated ECG signals were used as inputs to a CNN to detect VT and VF. The proposed CNN includes four convolution, two pooling, and four dense layers. ECG signals from various public databases were used to evaluate the CNN. The results show that the proposed approach achieved an AC of 81.25% for the classification of VT and VF classes using 8-second ECG frames with 10-fold cross-validation. Alwan et al. (2017) classified VT and VF using the leakage characteristics of the VF filter and spectral parameters. A set of high-dimensional features preserves more up-to-date information about the ECG dataset and provides a detection AC of 73.5% using a support vector classifier. Plesinger et al. (2018) extracted five features from 3-second blocks of ECG signal using analysis of ECG amplitude measurements, spectra, derivatives, and autocorrelation.
These features were fed into a logistic regression method giving the probability of VT and VF. This method was trained on CUDB, consisting of 393 blocks, and tested on VFDB, showing a SE of 95%. In this paper, a deep learning model composed of a CNN and a BiLSTM is used to identify VT and VF rhythms from the ECG signal. The ECG data have been obtained from two publicly available databases, namely CUDB and VFDB. The CNN automatically learns the features of multiple five-second ECG windows through multilayer nonlinear transformations, and it outperforms conventional feature-extraction methods. Deep learning can quickly learn from training data and automatically obtain valid features, mining the information hidden in large ECG datasets. Various hidden factors are often associated with complex non-linear relationships, but deep learning can disentangle these factors. The CNN structure makes the model highly expressive and capable of learning; it is especially good at extracting features, making VTA classification simple and effective. After features are extracted by the CNN, they are input to the BiLSTM, which maintains information on long-distance dependencies. The LSTM, a recurrent neural network designed to learn from sequential data, is well suited to this task. Because the data is structured sequentially, significant conclusions can be drawn in both the forward and backward directions. As a result, we used a BiLSTM in this study, which is suitable for processing ECG window data in both directions and classifying the windows into NSR, VT, and VF. According to our experimental results, as discussed in the results section, the BiLSTM can enhance the classification rate.

Methods
The workflow for identifying VTA and distinguishing it from normal sinus rhythm (NSR) using the E2E deep learning model based on CNN-BiLSTM is represented in Figure 1. In order to classify the NSR, VT, and VF classes, the CNN-BiLSTM model has been designed using self-learned CNN features and BiLSTM-generated classes, which eliminates the need for a manual feature extraction and selection process. The continuous ECG signal is subdivided into five-second windows, with each window being normalized using the Z-score approach. The window database has been divided into five different train-test ratios: 50%-50%, 60%-40%, 70%-30%, 80%-20%, and 90%-10%.
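The random train-test splitting step described above can be sketched as follows. This is a minimal NumPy-only illustration with a hypothetical helper name; the paper does not specify its splitting code.

```python
import numpy as np

def split_windows(X, y, train_frac, seed=0):
    """Randomly shuffle the windows, then split them into train and
    test sets at the given train fraction (e.g. 0.8 for an 80-20 split)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(len(X) * train_frac)
    tr, te = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[te], y[te]

# the five ratios evaluated in the paper
for frac in (0.5, 0.6, 0.7, 0.8, 0.9):
    Xtr, ytr, Xte, yte = split_windows(np.zeros((100, 1250)), np.zeros(100), frac)
```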

ECG Databases
All of the ECG databases used in this study were acquired from the publicly available PhysioNet bank, which includes the CUDB and VFDB databases, among others (Goldberger et al. 2000). In both databases, cardiologists have annotated each ECG record, each recorded from a different patient over time. All records from the CUDB and VFDB were evaluated and classified into two VTA classes, VT and VF, as distinct from NSR. The ECG records of both databases have been framed into fixed-size windows. In this study, we used a window length of 5 seconds (1250 samples) for each record. Each recording contains a varying portion of normal ECG signal, as well as portions that contain VT and VF signals. Table 2 shows the distribution of the number of ECG windows in each class. All ECG records have been windowed into non-overlapping events; the resulting totals for five-second windows are listed in Table 2. Each window has been labelled as VF, VT, or NSR using the database's annotation file. Each window has been normalized using the Z-score method, which eliminates the offset effect and addresses the issue of amplitude scaling before the data is fed into the CNN-BiLSTM model for the training and testing phases. Z-scores are calculated from the signal values using the following equation: z = (x − μ) / σ, where x is a sample value and μ and σ are the mean and standard deviation of the samples within the window.
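A minimal sketch of this windowing and per-window Z-score normalization, in NumPy. The sampling rate of 250 Hz follows from the stated 1250 samples per 5-second window; the function name is an assumption for illustration.

```python
import numpy as np

FS = 250                 # sampling frequency (Hz): 1250 samples / 5 s
WIN_SEC = 5              # window length in seconds
WIN_LEN = FS * WIN_SEC   # 1250 samples per window

def window_and_normalize(ecg, win_len=WIN_LEN):
    """Split a continuous ECG record into non-overlapping fixed-size
    windows, then Z-score normalize each window independently:
    z = (x - mu) / sigma, with mu and sigma computed per window."""
    n = len(ecg) // win_len                      # trailing partial window is dropped
    windows = ecg[: n * win_len].reshape(n, win_len)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    return (windows - mu) / sigma

# toy record: 4 full windows plus a leftover segment that is discarded
record = np.random.default_rng(0).standard_normal(WIN_LEN * 4 + 100)
w = window_and_normalize(record)
```

After normalization, every window has zero mean and unit standard deviation, which removes baseline offset and amplitude-scaling differences between recordings.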

CNN Model
The CNN model is composed of artificial neurons that have been designed with weights, biases, and activation functions to perform various tasks. These artificial neurons act as a bridge between the input (I/P) layer and the output (O/P) layer (Pandey et al. 2019). The CNN model is made up of five different types of layers, namely the I/P layer, convolutional layer, pooling layer, fully connected layer, and O/P layer. The ECG window data is the content of the I/P layer in the proposed CNN-BiLSTM model. This data from the I/P layer is then passed on to the next convolution layer, also known as the feature learning layer, since the features of the ECG signal are learned through convolutional operations on the data from the I/P layer. This model also includes the rectified linear unit (ReLU) function, which sets all negative values to zero. In the usual sequence, the O/P of the convolutional layer is used as I/P for the pooling layer. A pooling layer, placed after the convolutional layer, is employed to decrease the spatial volume of the I/P data; without it, running the model would be computationally expensive. In this work, max-pooling is used to decrease the spatial size of the I/P data.
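The three operations described above — convolution, ReLU, and max-pooling — can be illustrated with a minimal NumPy sketch. Note that, as in most CNN frameworks, the "convolution" here is implemented as a sliding dot product (cross-correlation); the values and function names are illustrative only.

```python
import numpy as np

def conv1d_valid(x, f):
    """1-D 'valid' convolution (sliding dot product, no padding) of
    signal x with filter f, as performed by a CNN convolution layer."""
    N = len(f)
    return np.array([np.sum(x[i:i + N] * f) for i in range(len(x) - N + 1)])

def relu(x):
    """Rectified linear unit: negative values are set to zero."""
    return np.maximum(x, 0)

def max_pool1d(x, size=2):
    """Non-overlapping max-pooling: keep the largest value in each
    block of `size` samples, shrinking the feature map."""
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)

x = np.array([1.0, -2.0, 3.0, -1.0, 2.0, 0.5])  # toy signal
f = np.array([1.0, 0.5])                         # toy learned filter
out = max_pool1d(relu(conv1d_valid(x, f)))       # conv -> ReLU -> pool
```

The output is half the length of the convolved signal, which is exactly the size reduction that makes deeper models computationally tractable.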

Bidirectional-LSTM Network
The BiLSTM is a recurrent neural network that has been specifically designed to overcome the limitations of unidirectional recurrent neural networks. The network's neural states are divided into two categories, namely forward and backward states, which results in two distinct recurrent neural networks, one forward and one backward, as shown in Figure 1. Both networks must be connected to the same output layer in order to produce an output. This structure makes it possible to evaluate both future and past context of a sequential input at every time step. Compared to the standard LSTM network, the BiLSTM architecture thus contains two separate LSTMs, one for each direction of the sequential input.
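The forward/backward structure can be illustrated with a toy single-unit recurrence in NumPy (a plain tanh RNN step rather than a full LSTM cell, to keep the sketch short; weights and names are illustrative assumptions). The key idea is that the backward pass runs over the reversed sequence and its states are re-aligned with time before concatenation.

```python
import numpy as np

def rnn_states(x_seq, w=0.5, u=0.8):
    """Minimal single-unit recurrent pass: h_t = tanh(w*x_t + u*h_{t-1})."""
    h, states = 0.0, []
    for x in x_seq:
        h = np.tanh(w * x + u * h)
        states.append(h)
    return np.array(states)

def bidirectional(x_seq):
    """Run the same recurrence forward over the sequence and over the
    reversed sequence, then concatenate the two state tracks per step."""
    fwd = rnn_states(x_seq)
    bwd = rnn_states(x_seq[::-1])[::-1]  # re-align backward states with time
    return np.stack([fwd, bwd], axis=1)  # shape (T, 2): [forward, backward]

out = bidirectional(np.array([0.1, -0.3, 0.7, 0.2]))
```

At each time step the output row combines a state that has seen the past (forward track) with one that has seen the future (backward track), which is why the bidirectional variant can exploit context in both directions.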

Proposed CNN-BiLSTM Model
The CNN-BiLSTM model consists of twelve layers, each of which is described in Table 3. It has three convolutional layers, three max-pooling layers, two BiLSTM layers, a flatten layer, and three dense layers, as illustrated in Table 3. CNNs automatically emphasize and learn the best-fit features in the input data through the convolution operation; they are particularly concerned with local features and their placement. Moreover, in a CNN the weights of neurons within the same feature map are shared, which enables parallel learning across the network and reduces training time.
The first, third, and fifth layers in this model are convolutional, with kernel sizes of 35, 25, and 10, respectively, each performing the discrete convolution given in Equation (2):

Y(i) = Σ_{n=1}^{N} x(n) · f(i − n)    (2)

Here, Y indicates the output vector, x represents the values of the signal, f denotes the filters, and N represents the number of elements.
Following each convolution layer, a max-pooling layer is applied in order to reduce the size of the feature map, and the ReLU activation function is used. The BiLSTM layers are then used to learn the temporal information in the feature maps. For this temporal analysis, the features learned from the convolution and pooling operations are broken down into sequential components, which are fed to the recurrent BiLSTM units. The O/P of the last BiLSTM step is then fed into the fully connected layers for the purpose of predicting VTA. Within the BiLSTM, the gate activation takes the form of Equation (3) (shown here as the standard LSTM forget gate, reconstructed from the variable definitions below):

f_t = α(W_f · [h_{t−1}, x_t] + b_f)    (3)

Here, x_t = input sequence, b_f = bias vector, W = weight matrix, α = activation function, h_{t−1} = output of the previous state, and C_t = previous LSTM memory.
The three dense layers contain 300, 50, and 3 output neurons, respectively. The final dense layer (layer 12) uses the SoftMax activation function to classify the output into three classes: VT, VF, and NSR. The CNN-BiLSTM model was trained using the following parameters: a learning rate of 0.001 and a regularisation of 0.01. The CNN-BiLSTM model employs a batch size of 34. The categorical cross-entropy function is used to calculate the loss. The CNN-BiLSTM model has been trained and tested iteratively, with the number of iterations set to 100. Finally, the overall performance across all records is used to calculate the statistical performance. The BiLSTM portion of the model used for the classification of VTA consists of two layers: the seventh and eighth layers are made up of BiLSTM blocks of 8 units and 16 units, respectively. A dropout rate of 0.25 is applied to the seventh and eighth layers, which helps to mitigate overfitting. After the BiLSTM layers, three dense layers of 300, 50, and 3 neuronal units are stacked, and the last dense layer uses the SoftMax activation function to automatically classify NSR, VT, and VF. A learning rate of 0.001, a total of 25 epochs, and a batch size of 32 were the additional training parameters. As a result, the model divides the ECG signal into three classes, NSR, VT, and VF, each determined by the features learned from the five-second windows.
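The architecture described above can be sketched in Keras (which the paper states was used). Kernel sizes (35, 25, 10), BiLSTM units (8, 16), dropout 0.25, dense sizes (300, 50, 3), SoftMax output, categorical cross-entropy loss, and a 0.001 learning rate follow the text; the number of filters per convolution layer is not specified in the paper, so 32 is an assumption.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn_bilstm(win_len=1250):
    """Sketch of the 12-layer CNN-BiLSTM: 3 conv, 3 max-pool, 2 BiLSTM,
    flatten, and 3 dense layers, for 5-second (1250-sample) ECG windows."""
    model = keras.Sequential([
        layers.Input(shape=(win_len, 1)),
        layers.Conv1D(32, 35, activation='relu'),   # layer 1: kernel size 35
        layers.MaxPooling1D(2),                     # layer 2
        layers.Conv1D(32, 25, activation='relu'),   # layer 3: kernel size 25
        layers.MaxPooling1D(2),                     # layer 4
        layers.Conv1D(32, 10, activation='relu'),   # layer 5: kernel size 10
        layers.MaxPooling1D(2),                     # layer 6
        layers.Bidirectional(                        # layer 7: 8 units, dropout 0.25
            layers.LSTM(8, return_sequences=True, dropout=0.25)),
        layers.Bidirectional(                        # layer 8: 16 units, dropout 0.25
            layers.LSTM(16, return_sequences=True, dropout=0.25)),
        layers.Flatten(),                            # layer 9
        layers.Dense(300, activation='relu'),        # layer 10
        layers.Dense(50, activation='relu'),         # layer 11
        layers.Dense(3, activation='softmax'),       # layer 12: NSR, VT, VF
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model

model = build_cnn_bilstm()
```

The model takes a batch of normalized five-second windows of shape (batch, 1250, 1) and outputs a three-way class probability per window.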

Results
The CNN-BiLSTM model was trained and tested on an Intel i5 PC with 8 GB of RAM. The model was written in the Python programming language using the Keras libraries. The experiment was carried out on two different sets of ECG records, namely CUDB and VFDB, and the results were compared. In accordance with the annotation file provided in each database, all records were framed into five-second windows. A total of 12355 windows were obtained from both sets of ECG records, with the specifics listed in Table 2. In order to optimize model performance, the database has been randomly divided into the following training-test ratios: 50-50, 60-40, 70-30, 80-20, and 90-10. Table 4 shows the confusion matrix of the test dataset for the 80-20 training-test ratio; a confusion matrix was built for each of the other training-test ratios in the same manner.
Table 4: The actual and predicted classes for the best result, using the 80-20 train-test ratio of the database. The AC, SE, PE, and FS of the CNN-BiLSTM model have been evaluated using Equations 2, 3, 4, and 5 given below. In Table 5, the dataset is randomly split at the 50-50 training-test ratio, the CNN-BiLSTM model is trained, and its performance is measured on the test dataset. In Table 8, the performance of the CNN-BiLSTM model is shown for the 80-20 training-test ratio; with this ratio, the CNN-BiLSTM model achieved better sensitivity and specificity than with the others. Finally, in Table 9, the dataset is divided at the 90-10 training-test ratio and the CNN-BiLSTM model is evaluated on the test dataset. On both databases used in this study (CUDB and VFDB), we applied five different training-test ratios, as shown in Tables 5, 6, 7, 8, and 9. According to these tables, the best results were achieved when 20% of the database was used for testing, as illustrated in Figure 4.
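The performance measures named above can be computed directly from a confusion matrix. The following NumPy sketch uses the standard definitions of accuracy, sensitivity (recall), precision, and F-score per class; the example matrix is a toy illustration, not data from the paper.

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class accuracy (AC), sensitivity (SE), precision (PE), and
    F-score (FS) from a confusion matrix cm, where cm[i, j] counts
    windows of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                    # correct predictions per class
    fn = cm.sum(axis=1) - tp            # missed windows of each class
    fp = cm.sum(axis=0) - tp            # windows wrongly assigned to each class
    tn = cm.sum() - (tp + fn + fp)      # everything else
    ac = (tp + tn) / cm.sum()
    se = tp / (tp + fn)                 # sensitivity / recall
    pe = tp / (tp + fp)                 # precision
    fs = 2 * pe * se / (pe + se)        # F-score
    return ac, se, pe, fs

# toy 3-class confusion matrix (rows/columns: NSR, VT, VF)
cm = [[90, 5, 5],
      [3, 45, 2],
      [2, 3, 45]]
ac, se, pe, fs = per_class_metrics(cm)
```

Averaging these per-class values gives the overall figures reported in the results tables.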

Table 10 compares the proposed CNN-BiLSTM model, which achieved its best results on the CUDB and VFDB datasets, with other existing works presented in the literature. As the table illustrates, our model has shown outstanding performance compared to the other existing models.

Discussion
Based on the results in Table 10, the CNN-BiLSTM model achieves the highest level of accuracy. We compared the results of the proposed CNN-BiLSTM model with other methods in the literature that used the same databases (CUDB and VFDB) to identify VTA rhythms through ECG signal analysis. The results of the proposed model are significant compared to these existing methods, indicating the effectiveness of the CNN-BiLSTM model in classifying NSR, VT, and VF rhythms. As can be seen in Table 4, we achieved an accuracy of 99.37% for three-class recognition, the highest accuracy reported for this task. Othman et al. (2012) used semantic mining to classify VT and VF episodes by identifying three significant parameters from the ECG signal: the damping coefficient, the input signal, and the frequency. Four-second ECG signals were analyzed using this method. The VT and VF signals were classified against NSR signals without false detections, with recognition sensitivities of 96% and 94%, respectively. Li et al. (2013) proposed a support vector classifier for classifying VF/VT. A total of 14 features were extracted from 5-second windows of the ECG signal, and the best combinations of features were then chosen using a genetic algorithm. Three annotated ECG databases (the American Heart Association database, VFDB, and CUDB) were used for training, testing, and validation. A variety of window sizes, ranging from 1 to 10 seconds, were evaluated. On the training data with a window size of 5 seconds, an AC of 98.1% and a SE of 98.4% were obtained. The test data was validated using 5-fold cross-validation, yielding an AC of 96.3% and a SE of 96.2%. Tripathi et al. (2016) proposed using the ECG signal for detecting and classifying episodes of VF. The decomposition of the ECG signal is performed using variational mode decomposition.
The energy, Rényi entropy, and permutation entropy of the first three modes were calculated and used as diagnostic features. Mutual-information-based feature assessment was used to select the optimal set of diagnostic features. A random forest classifier was used to evaluate the performance of the diagnostic features.

Conflict of interest
Author Manoj Kumar Ojha declares that he/she has no conflict of interest. Author Dr. Sulochna Wadhwani declares that he/she has no conflict of interest. Author Dr. Arun Kumar Wadhwani declares that he/she has no conflict of interest. Author Dr. Anupam Shukla declares that he/she has no conflict of interest.

Consent for publication
Informed consent was obtained from all participants involved in the study.

Ethics approval and consent to participate
This article does not contain any studies with human participants or animals performed by any of the authors.

Availability of data and materials
All datasets supporting the conclusions of this article are available from the public data repository at https://physionet.org.

Authors' contributions
Manoj Kumar Ojha designed and carried out the experiments and data analysis and drafted the manuscript. Dr. Sulochna Wadhwani, Dr. Arun Kumar Wadhwani, and Dr. Anupam Shukla conceived the study and participated in research coordination. The authors read and approved the final manuscript.