Efficient Data-Mining Classification Approach For Ecg Data In Health Care Application

Data mining enables the classification of electrocardiographic (ECG) signals for diagnosing many cardiac diseases. ECG signals often contain unwanted noise, speckles, and redundant features, which degrade signal quality and can reduce classification accuracy. To overcome these challenges, this paper introduces the Optimize Discrete Kernel Vector (ODKV) classifier together with an effective pre-processing stage. An Adaptive Notch Filter (ANF) is first applied to remove power line interference (PLI) from the ECG signal. Reducing redundant features in the ECG signal also plays a vital role in diagnosing cardiac disease, so the ODKV classifier is used both to reduce redundant features and to improve the classification accuracy of the input ECG signal. The ODKV classifier identifies the Q wave, R wave, and S wave in the input ECG signal. Finally, the performance metrics sensitivity, specificity, accuracy, and Mean Square Error (MSE) are calculated and compared with those of existing methods such as SVM-kNN, ANN-kNN, GB-SVNN, and CNN to demonstrate the improvement in classification.


Introduction
Cardiovascular diseases (CVD), including stroke and heart disease, cause almost 17 million deaths each year; the World Health Organization (WHO) reports about 16.7 million CVD deaths annually. Mortality is estimated to reach 20 million deaths annually by 2020 and 24 million by 2030, making cardiovascular disease the primary cause of death and disability worldwide. A cardiac arrhythmia is a form of abnormal heartbeat. Reducing deaths from cardiovascular disease requires early diagnosis and prognosis, which in turn demands a precise and reliable biomedical diagnostic technique [1]. The main causes of heart attacks include lack of physical activity, smoking, elevated cholesterol, hypertension, harmful use of alcohol, and an unhealthy diet [2]. An electrocardiogram (ECG) measures the heart's electrical activity by monitoring its electrical potentials [3].

Related Work
The authors in [13] examined denoising methods including anisotropic diffusion, adaptive filters, wavelet transforms, empirical mode decomposition, morphological filters, the non-local means algorithm, and total variation denoising. They concluded that multivariate wavelet denoising, or wavelet-PCA, gives the best denoising and is therefore the most suitable for real-world DVP enhancement applications. In [14], a two-dimensional Gaussian derivative (GD) function was applied to images to obtain the spatiotemporal and most important spectral characteristics. The key purpose of the two-dimensional filters was to extract the most significant characteristics from multi-scale images. Principal Component Analysis (PCA) was then used to extract GD features from the output image of the two-dimensional filter, and the efficiency of this filter-based image feature extraction method was examined on different image views [14]. However, this approach is not sufficient for reducing noise in signal processing, specifically in ECG signals. In [15], the stages of ECG signal pre-processing, including the treatment of imbalanced data, data normalization, noise filtering with a band-pass filter, and feature extraction, were defined, and a Random Forest machine learning method was used to analyze the ECG signal. The accuracy of this model is not clearly reported [15].
To improve accuracy, the authors of [16] proposed not only a technique for extracting multiple feature vectors from ultrasound images of the carotid arteries (CAs) and from the heart rate variability (HRV) of the electrocardiogram signal, but also an effective and accurate SVM-based prediction model for diagnosing cardiovascular disease (CVD); with the selected multiple feature vectors, it achieved a diagnostic accuracy of about 89.51% [16]. Even so, classification accuracy is affected by the presence of noise. In [17], ECG signals were pre-processed with a delayed error normalized LMS filter, and frequency-domain HRV features were fed to an SVM classifier to classify arrhythmic beats and separate normal from abnormal subjects. The resulting SVM-based system achieved a maximum accuracy of 96% in classifying normal subjects and subjects at risk of arrhythmia, although noise was not sufficiently reduced [17]. In [18], arrhythmia was diagnosed from ECG recordings using wrapper-based feature selection and multiclass classification. To detect the presence or absence of arrhythmias, Support Vector Machine (SVM) based approaches, namely One-Against-One (OAO), One-Against-All (OAA), and Error-Correction Code (ECC), were used for multiclass classification. The OAO method gave an accuracy of 81.11%, which is not sufficient for classification. In [19], 5-second ECG segments were classified into good-quality and bad-quality classes using a k-Nearest Neighbour algorithm and statistical features, achieving an average of 96.87%. In [20], blood pressure records obtained from ECG analysis were classified with an SVM classifier, reaching an accuracy of 98.18%.
As part of filter-based feature selection techniques, the harmony search algorithm can be modified and used in combination with other feature subset evaluators. The proposed expert systems could also be readily applied to other biomedical signals, for instance electrocardiography (ECG) and electromyography (EMG) signals, for classification tasks.

Problem Statement
The presence of PLI noise and redundant ECG signal features impairs the precise classification of ECG recordings, which could otherwise aid in the diagnosis and care of patients with heart disease. The following validation metrics are used to evaluate the classification: sensitivity, specificity, accuracy, and Mean Square Error.

Sensitivity
Sensitivity, also called the True Positive Rate (TPR), is the number of correct positive predictions divided by the total number of actual positives. The best sensitivity is 1.0, whereas the worst is 0.0.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (1)$$

where TP is the number of true positives and FN is the number of false negatives.

Specificity
Specificity, also known as the True Negative Rate (TNR), is the number of correct negative predictions divided by the total number of actual negatives. The best specificity is 1.0, while the worst is 0.0.

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (2)$$

where TN is the number of true negatives and FP is the number of false positives.

Accuracy
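Accuracy (ACC) is the number of correct predictions divided by the total number of samples in the dataset. The best accuracy is 1.0, while the worst is 0.0. Equivalently, it can be computed as 1 minus the error rate.

$$\text{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.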
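The three metrics above can be illustrated with a minimal sketch that counts the confusion-matrix entries of a binary classifier. The function name `classification_metrics` and the example labels are hypothetical and are not results from this work.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # true positive
            else:
                fp += 1  # false positive
        else:
            if truth == positive:
                fn += 1  # false negative
            else:
                tn += 1  # true negative
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return sensitivity, specificity, accuracy


# Hypothetical labels: 1 = abnormal beat, 0 = normal beat.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec, acc = classification_metrics(y_true, y_pred)
print(sens, spec, acc)
```

With these example labels there are 3 true positives, 1 false negative, 5 true negatives, and 1 false positive, so sensitivity is 3/4, specificity is 5/6, and accuracy is 8/10.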

Mean Square Error (MSE)
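Mean Square Error (MSE) measures the average squared difference between an estimator's output and the true value of the estimated quantity. MSE is determined by

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where n is the number of observations, $y_i$ are the observed values, and $\hat{y}_i$ are the predicted values.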

Image Processing
ECG recordings are typically contaminated by noise and artifacts of various kinds. The aims of the pre-processing phase are to minimize noise and artifacts so that the fiducial points can be located, and to remove amplitude and offset effects so that signals from different patients can be compared. ECG classification is difficult because the signal contains a large amount of unwanted noise. Noise of different kinds and degrees can lead a physician to an inaccurate diagnosis of cardiovascular disease, so ECG signal denoising and pre-processing have become a necessity. Common types of noise introduced during ECG recording are power line interference, baseline wander, electrode contact noise, electrode motion artifacts, muscle contractions (electromyographic noise), electrosurgical noise, and instrumentation noise. Among these, power line interference (PLI) is a narrow-band signal at 50 or 60 Hz with a bandwidth below 1 Hz. An Adaptive Notch Filter is proposed for reducing or cancelling PLI. In real-time recording, however, the interference frequency varies within a narrow band around the 50 Hz fundamental. This variation arises because power supplies comply with different requirements, so the frequency wanders between 47 and 53 Hz. The filter is therefore designed to remove the PLI within the 47-53 Hz band while leaving the useful ECG signal components at those frequencies intact, preventing degradation of the signal. The architecture of the Adaptive Notch Filter is shown in Figure 2.
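The idea of adaptively cancelling PLI can be illustrated with a minimal sketch. It assumes a classic two-weight LMS noise canceller with a fixed 50 Hz sine/cosine reference, which is one common way to build an adaptive notch; it is not necessarily the architecture of Figure 2, and the sampling rate, step size `mu`, and synthetic test signal are hypothetical choices.

```python
import math


def lms_notch(signal, fs, f0=50.0, mu=0.01):
    """Cancel a sinusoidal interferer near f0 with a two-weight LMS canceller.

    signal: list of samples, fs: sampling rate in Hz,
    f0: assumed interference frequency in Hz, mu: LMS step size.
    """
    w_cos, w_sin = 0.0, 0.0
    cleaned = []
    for n, sample in enumerate(signal):
        phase = 2.0 * math.pi * f0 * n / fs
        ref_cos, ref_sin = math.cos(phase), math.sin(phase)
        # Current estimate of the interference from the two references.
        estimate = w_cos * ref_cos + w_sin * ref_sin
        # The error is the filter output: input minus estimated interference.
        error = sample - estimate
        # LMS weight update, driving the estimate toward the interferer.
        w_cos += 2.0 * mu * error * ref_cos
        w_sin += 2.0 * mu * error * ref_sin
        cleaned.append(error)
    return cleaned


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic stand-in for an ECG: a 5 Hz sinusoid plus 50 Hz interference.
fs = 500.0
n_samples = 2500
clean = [math.sin(2 * math.pi * 5.0 * n / fs) for n in range(n_samples)]
pli = [0.5 * math.sin(2 * math.pi * 50.0 * n / fs + 0.7) for n in range(n_samples)]
noisy = [c + p for c, p in zip(clean, pli)]
filtered = lms_notch(noisy, fs)

# Compare the residual error after the filter has had time to converge.
half = n_samples // 2
err_before = rms([a - b for a, b in zip(noisy[half:], clean[half:])])
err_after = rms([a - b for a, b in zip(filtered[half:], clean[half:])])
print(err_before, err_after)
```

A fixed reference only follows interference that stays close to `f0`; handling the full 47-53 Hz wander described above would require also adapting the reference frequency, which this sketch leaves out.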

Proposed Optimize Discrete Kernel Vector Classifier
The proposed Optimize Discrete Kernel Vector classifier is the second stage of the method and is intended to improve classification accuracy. Classifying the ECG signal helps to predict cardiac disease at the level of the heartbeat. A single beat of an ECG signal consists of three main components: the P wave, the QRS complex, and the T wave. Variations in these components are linked to different heart conditions and disorders. In making an assessment, cardiologists commonly use the positions, magnitudes, and shapes of these waves, as well as derived characteristics such as the QT interval, ST segment, PR interval, and PR segment. The ECG provides important insight for the diagnosis, prevention, and care of heart diseases such as arrhythmias. Because electrocardiography examines the electrical activity of the heart, an accurate representation of the ECG signal plays a crucial role in the correct diagnosis of heart disease. Different signal features can be optimized, such as position, duration, shape, amplitude, and peak points. Feature reduction is a significant step in the classification of biomedical signals (ECG, EEG, etc.). After the pre-processing stage, the essential features must be obtained for use in the final stage. The Discriminant method is a supervised mathematical model characterised by high computational power, robust performance, and easy implementation, and it is widely used for ECG feature reduction. Its aim is to find the low-dimensional subspace in which the intra-class reconstruction scatter is minimised while the inter-class reconstruction scatter is maximised. Suppose we have the optimal projections P = {φ_1, φ_2, …, φ_d}, onto which samples can be projected so that each is best represented by its intra-class samples.
Each data point is projected into the subspace. In the subspace, the intra-class reconstruction scatter of the samples is given by equation (8), whose matrix is called the intra-class reconstruction scatter matrix.
Likewise, the inter-class reconstruction scatter of the samples in the subspace is given by equation (10), whose matrix is called the inter-class reconstruction scatter matrix.
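The principle of minimising within-class scatter while maximising between-class separation can be illustrated with a minimal sketch. It substitutes the classical two-class Fisher discriminant on two-dimensional features for the reconstruction-based scatter matrices described above, so it is a stand-in for the idea rather than the Discriminant method of this paper; the feature values and function names are hypothetical.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around mean m."""
    sxx = sxy = syy = 0.0
    for x, y in points:
        dx, dy = x - m[0], y - m[1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    return sxx, sxy, syy


def fisher_direction(class_a, class_b):
    """Direction w = Sw^{-1} (mean_b - mean_a) for two classes of 2-D points."""
    ma, mb = mean(class_a), mean(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    # Within-class scatter matrix Sw is the sum of the per-class scatters.
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Closed-form inverse of the symmetric 2x2 matrix applied to (dx, dy).
    return [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]


def project(points, w):
    """Reduce each 2-D point to a single discriminant feature."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]


# Hypothetical two-feature samples for two beat classes.
class_a = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.1)]
class_b = [(4.0, 4.5), (4.5, 4.2), (5.0, 4.8), (4.2, 4.4)]
w = fisher_direction(class_a, class_b)
proj_a = project(class_a, w)
proj_b = project(class_b, w)
print(max(proj_a), min(proj_b))
```

After projection each sample is described by one number instead of two, and the two classes occupy disjoint ranges on that axis.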
In our method, the distance from a sample x to the h-th class is defined as the error of reconstructing x from the h-th class. The features that can be obtained depend on the ranks of the intra-class and inter-class reconstruction scatter. Usually, the intra-class reconstruction scatter and the inter-class reconstruction scatter both have full rank in the high-dimensional space. Therefore, Discriminant can reduce at most n characteristics. After feature reduction, ECG classification is performed. Automatic ECG signal classification is a difficult problem for many reasons. The temporal and morphological features of the ECG waveform vary between patients. ECG waveforms can look similar for different patients with different heart rhythms, and can differ at different times for the same patient. Heart rate variability is a further concern in ECG signal classification. Heart rate depends on physiological and behavioural factors; stress, arousal, and exercise can change ECG characteristics such as the RR interval and PR interval. In addition, the lack of consistency in ECG variations, the complexity of ECG signals, the absence of optimal classification rules, and beat-to-beat variation within a single ECG are the main problems complicating ECG signal classification. Several algorithms for ECG heartbeat detection and classification have been developed. Most of these methods perform well on the training data but give poor results on the ECG signals of other patients because of the difficulties above. Owing to the lack of standardization in the development and assessment criteria of classification algorithms, the results of most of these works cannot be compared.
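As a small illustration of the RR interval mentioned above, the following sketch computes RR intervals and the mean heart rate from R-peak positions. The R-peak sample indices and the sampling rate are hypothetical, and peak detection itself is assumed to have been done already.

```python
def rr_intervals(r_peak_samples, fs):
    """RR intervals in seconds from consecutive R-peak sample indices."""
    return [(b - a) / fs for a, b in zip(r_peak_samples, r_peak_samples[1:])]


def mean_heart_rate(rr):
    """Mean heart rate in beats per minute from RR intervals in seconds."""
    return 60.0 / (sum(rr) / len(rr))


# Hypothetical R-peak positions in a recording sampled at 250 Hz.
r_peaks = [100, 350, 600, 860, 1110]
fs = 250.0
rr = rr_intervals(r_peaks, fs)
bpm = mean_heart_rate(rr)
print(rr, bpm)
```

The third interval is slightly longer than the others, which is the kind of beat-to-beat variation that makes fixed thresholds on RR-based features unreliable across patients.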
Developing scalable classifiers for data mining is the subject of many scientific studies. The ODKV classifier is an effective technique for supervised classification problems because of its generalization ability. It is a binary classifier that seeks a maximum-margin hyperplane as the decision boundary, i.e., the boundary with the greatest possible margin that still separates the two classes. This not only decreases the chance of prediction errors but also reduces the high risk of over-fitting inherent in decision boundaries with small margins. The algorithm plots each data item as a point in n-dimensional space, where n is the number of features and each feature value is a coordinate. In this paper, we use the ODKV method to classify the input ECG signal. The Optimize Discrete Kernel Vector process distinguishes the Q, R, and S waves in the input ECG signal to identify heartbeat types such as LBBB, RBBB, PVC, and PAC.
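The maximum-margin idea can be sketched with a plain linear soft-margin classifier trained by subgradient descent on the hinge loss. This is a generic stand-in chosen for brevity, not the ODKV training procedure; the learning rate `lr`, regularisation weight `lam`, and the toy two-feature data are hypothetical.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=100):
    """Fit weights w and bias b by subgradient descent on the hinge loss.

    labels must be +1 or -1. lam controls the preference for a wide margin.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                # Sample is inside the margin or misclassified: move toward it.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Sample is safely classified: only apply the regulariser.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable two-feature samples.
samples = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
print(w, b, predict(w, b, [4, 4]), predict(w, b, [-4, -1]))
```

Updates are triggered only by samples that fall inside the margin, so the final boundary is determined by the points closest to it, which is the behaviour the maximum-margin description above refers to.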
For cases lying on the borderline between two classes, the ODKV classifier uses the hyperplane that gives the greatest separation of the values computed by the decision function. For labelled data $D = \{(x_i, y_i)\}$, where $i = 1, 2, \ldots, n$ and n is the total number of samples, the ODKV classifier maps the input vectors $x_i$ to the desired values $y_i$ using a nonlinear kernel function. If the associated mapping function is $\varphi(x)$, the classification decision depends on the following equation:

$$f(x) = w^{T}\varphi(x) + b$$

Here, w is the weight vector, b is the bias, and $f(x)$ is the decision function, which produces the classification result for each input vector x by linear classification in the mapped space. The classification output is therefore:

$$y = \operatorname{sign}\left(f(x)\right)$$

The parameters w and b are determined from the training data by minimizing a cost function. The cost function involves positive-valued variables and a user-specified threshold Δ, and it is minimized under constraints that hold for $i = 1, 2, \ldots, n$; C is one of the user-specified parameters. Linear, polynomial, and sigmoid kernels are the functions commonly used in the ODKV classifier. The Optimize Discrete Kernel Vector classifier can determine heartbeat types such as PVC, PAC, LBBB, and RBBB from ECG signals. More efficient real-time discrimination may be possible once the classification steps are combined. The features of the ECG signal may directly or indirectly influence the complexity of the Optimize Discrete Kernel Vector model during training. The ODKV method identifies heart disease in the input ECG signal by classifying the heartbeat type.
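The three kernel families named above and a kernel-based decision function can be sketched as follows. The sketch assumes the standard dual form used by kernel support vector classifiers, f(x) = Σ α_i y_i K(s_i, x) + b; whether ODKV uses exactly this form is an assumption, and the kernel parameters, support vectors, and coefficients are hypothetical values chosen by hand rather than trained.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, degree=2, gamma=1.0, coef0=1.0):
    return (gamma * dot(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


def decision_function(x, support_vectors, labels, alphas, bias, kernel):
    """Dual-form decision value: sum of alpha_i * y_i * K(s_i, x) plus bias."""
    total = sum(a * y * kernel(s, x)
                for s, y, a in zip(support_vectors, labels, alphas))
    return total + bias


def classify(x, support_vectors, labels, alphas, bias, kernel):
    value = decision_function(x, support_vectors, labels, alphas, bias, kernel)
    return 1 if value >= 0 else -1


# Hypothetical hand-picked model with one support vector per class.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
sv_labels = [1, -1]
alphas = [0.5, 0.5]
bias = 0.0

f_pos = decision_function([2.0, 1.0], support_vectors, sv_labels, alphas, bias,
                          linear_kernel)
f_neg = decision_function([-1.0, -2.0], support_vectors, sv_labels, alphas, bias,
                          linear_kernel)
print(f_pos, f_neg)
```

Swapping `linear_kernel` for `polynomial_kernel` or `sigmoid_kernel` changes the shape of the decision boundary without changing the rest of the code, which is the practical appeal of the kernel formulation.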
Here, the MSE is measured as the difference between the estimates and the true values of the measured quantity. The Mean Square Error of the proposed Optimize Discrete Kernel Vector classifier is compared with that of other classification methods, namely SVM-kNN, ANN-kNN, GB-SVNN, and CNN.
Compared with other current deep learning methods, the proposed approach produces a lower MSE. The MSE is calculated by

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where n is the number of observations, $y_i$ are the observed values, and $\hat{y}_i$ are the predicted values. The Root Mean Squared Error (RMSE) is used to measure the difference between the values predicted by a model or estimator and the observed values. It is the square root of the MSE:

$$\text{RMSE} = \sqrt{\text{MSE}}$$

$R^2$ (R-squared) is a statistical measure of how close the data are to the fitted regression line. It is also called the coefficient of determination and is defined as

$$R^2 = 1 - \frac{SSE}{TSS}$$

where SSE is the sum of squared errors and TSS is the total sum of squares. SSE is the sum of the squared differences between each observation and its predicted value:

$$SSE = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

TSS is the sum of the squared differences between each observation and the overall mean of the observations:

$$TSS = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$Q^2$ (Q-squared) is based on the ratio of the MSE to the variance of the response. The Mean Absolute Error (MAE) measures the average absolute difference between prediction and actual observation over the test sample, with all individual differences weighted equally:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

The Mean Absolute Percentage Error (MAPE) expresses the magnitude of the error in percentage terms:

$$\text{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

For the proposed method, RMSE, MAPE, MAE, $R^2$, and $Q^2$ can be measured and compared with other methods, such as SVM-kNN, ANN-kNN, GB-SVNN, and CNN.
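The error measures discussed here can be computed with a short sketch that uses their standard textbook definitions (mean squared error, its square root, mean absolute error, mean absolute percentage error, and R-squared as one minus the ratio of the squared-error sum to the total sum of squares). The function name and the example values are hypothetical and are not results from this work; Q-squared is omitted because its exact formula is not given in the text.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (percent) and R-squared as a dict.

    MAPE requires every observed value in y_true to be non-zero.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)            # sum of squared errors
    mse = sse / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 / n * sum(abs(e / t) for e, t in zip(errors, y_true))
    mean_y = sum(y_true) / n
    tss = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - sse / tss
    return {"mse": mse, "rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical observed and predicted values.
y_true = [2.0, 4.0, 5.0, 8.0]
y_pred = [2.5, 3.5, 5.0, 7.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For these example values the squared errors sum to 1.5 and the total sum of squares is 18.75, so the MSE is 0.375 and R-squared is 0.92.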

Result
The MHEALTH (Mobile HEALTH) dataset is taken from the UCI Machine Learning Repository. The dataset has 120 instances and 23 attributes. The sensors were placed on the subject's chest, right wrist, and left ankle and attached with elastic straps. The ANF is used to remove the power line interference present in the input ECG signals, since noise such as power line interference degrades the performance of the classification method. From the input ECG signal, the proposed Optimize Discrete Kernel Vector classifier determines the Q, R, and S waves to characterize the heartbeat for the prediction of heart conditions including LBBB, RBBB, PVC, and PAC. The classifier reduces redundant data, which is a cause of inaccurate classification. Evaluation metrics are used to measure and summarise the performance of the trained classifier on unseen data, that is, its ability to generalize. Mean Square Error (MSE) is used to analyze inaccurate classification in terms of error reduction, i.e., redundant data reduction. Accuracy is among the criteria most commonly used by researchers to assess a classifier's ability to generalize. Sensitivity, specificity, and accuracy are therefore used to measure classification performance using equations (1), (2), and (3).

Figure 3 Comparison of accuracy
Here, the accuracy of the proposed Optimize Discrete Kernel Vector classifier is compared with that of other ECG classification methods, namely SVM-kNN, ANN-kNN, GB-SVNN, and CNN. Figure 3 shows that the highest accuracy is obtained by the proposed classifier, because the power line interference that degrades accuracy is eliminated from the ECG signal in the pre-processing stage. The specificity of the proposed Optimize Discrete Kernel Vector classifier is likewise compared with that of SVM-kNN, ANN-kNN, GB-SVNN, and CNN. Figure 4 shows that, relative to the other current methods, the proposed method produces a high specificity score.

Figure 4 Comparison of Specificity
The sensitivity of the proposed Optimize Discrete Kernel Vector method is compared with that of classification methods such as SVM-kNN, ANN-kNN, GB-SVNN, and CNN. Figure 5 shows that, compared with the other current methods, the proposed method produces a high sensitivity score.

Figure 5 Comparison of sensitivity
As discussed above, redundant data reduction is analyzed using the metrics in Table 1 for the classification methods. The presented work is compared with four other classification methods, namely SVM-kNN, ANN-kNN, GB-SVNN, and CNN. The overall performance of the proposed and existing methods is shown in Figure 6. The power line interference present in the ECG signal is removed using the Adaptive Notch Filter, and the features in the signal are reduced using the Discriminant method. Hence the Mean Square Error is reduced and the performance of the classifier is enhanced in terms of sensitivity, specificity, and accuracy.

Figure 6 Overall output of the proposed and existing system
It is evident from Figure 6 that the accuracy of our classification is improved by eliminating the power line interference from the ECG signal and by using an effective redundant-feature reduction method.

Conclusion
Electrocardiographic signals often contain unwanted speckles and noise. Various image processing filters have been used in different studies to eliminate noise. In this paper, the ANF is first used to remove the power line interference present in the input ECG signal. The Discriminant method is used to reduce the features of the input ECG signal, and the Optimize Discrete Kernel Vector classifier is used to classify the signal. The ODKV approach distinguishes the Q wave, R wave, and S wave in the input ECG signal to identify the heartbeat type for the prediction of heart conditions including LBBB, RBBB, PVC, and PAC. The performance of the proposed Optimize Discrete Kernel Vector classifier with effective noise removal is evaluated against existing approaches such as SVM-kNN, ANN-kNN, GB-SVNN, and CNN. Compared with these methods, the measured MAPE, Q², RMSE, R², and MAE of the proposed Optimize Discrete Kernel Vector classifier are low. Finally, sensitivity, specificity, and Mean Square Error (MSE) are measured to demonstrate the efficiency of the proposed ODKV classifier.

Not Applicable
Compliance with ethical standards

Conflict of interest
The authors declare that they have no conflict of interest.

Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.