In the IoT-based healthcare framework described in the previous section, data acquisition was performed. The first analysis step filters the acquired biosignals, detrending and denoising them so that they can be compared reliably against the threshold values stored in the framework. Figure 5 shows the unfiltered ECG signal. To remove baseline wander and other motion artifacts, the wavelet transform was used; it is well suited to this task because of its good localization properties in both the time and frequency domains. A normal ECG waveform decomposes into a P wave, QRS complex and T wave, which are the key features located during analysis of the detected signal. In the cardiac cycle, the P wave corresponds to atrial depolarization and the T wave to ventricular repolarization. The wavelet transform applied to remove noise and motion artifacts from the ECG signal is shown in Fig. 6.
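The denoising step above can be sketched as follows. This is a minimal illustration using a single-channel Haar wavelet decomposition with soft thresholding in NumPy; the mother wavelet, decomposition depth and threshold rule used in the actual framework may differ.

```python
import numpy as np

def haar_denoise(x, levels=3, k=3.0):
    """Soft-threshold wavelet denoising with a Haar basis.
    Signal length must be divisible by 2**levels."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx = (a[0::2] + a[1::2]) / np.sqrt(2)   # low-pass branch
        detail = (a[0::2] - a[1::2]) / np.sqrt(2)   # high-pass branch
        details.append(detail)
        a = approx
    # robust noise estimate from the finest detail band (median absolute deviation)
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = k * sigma
    details = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    # inverse transform: undo each decomposition level
    for d in reversed(details):
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a
```

Because the Haar transform is orthonormal, thresholding the detail bands suppresses broadband noise while leaving the smooth ECG morphology largely intact.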
Tables 2 and 3 show the located intervals and peak values, which are compared with the normal threshold values (stored manually) in Table 1 to generate medical alerts. The ECG data are received via a biosignal acquisition device and stored in the cloud at 512 samples per second. If the intervals detected in LabVIEW deviate substantially from these thresholds, the ECG is flagged as irregular. The R-R and QRS intervals, shown in Figs. 7 and 8, are important indicators of various heart diseases, such as arrhythmia.
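As an illustration of how R-R intervals can be derived from the 512 samples-per-second stream, the sketch below uses a naive amplitude-threshold R-peak detector; the LabVIEW detection logic in the framework is more sophisticated, and the threshold ratio and refractory period here are illustrative assumptions.

```python
import numpy as np

def rr_intervals(ecg, fs=512, thresh_ratio=0.6, refractory=0.2):
    """Naive R-peak detector: local maxima above a fraction of the global
    maximum, separated by at least a refractory period (in seconds)."""
    thresh = thresh_ratio * np.max(ecg)
    min_gap = int(refractory * fs)
    peaks = []
    for i in range(1, len(ecg) - 1):
        if ecg[i] >= thresh and ecg[i] > ecg[i - 1] and ecg[i] >= ecg[i + 1]:
            if not peaks or i - peaks[-1] > min_gap:
                peaks.append(i)
    rr = np.diff(peaks) / fs        # R-R intervals in seconds
    return np.array(peaks), rr
```

The heart rate in bpm then follows as `60 / rr.mean()`, which is the quantity compared against the Table 1 threshold.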
The next variable in our analysis is heart rate variability (HRV). HRV is the variation in time between successive heartbeats and is controlled by the autonomic nervous system (ANS). An abnormal HRV pattern can indicate life-threatening cardiac conditions such as arrhythmias.
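Two standard HRV statistics, SDNN and RMSSD, can be computed directly from the R-R series. The sketch below is a minimal version; the specific HRV measures tracked by the framework are not stated in the text, so this metric choice is ours for illustration.

```python
import numpy as np

def hrv_metrics(rr_ms):
    """Compute basic HRV statistics from successive R-R intervals (milliseconds)."""
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = np.std(rr, ddof=1)                       # overall variability
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))      # beat-to-beat variability
    return {"mean_rr": rr.mean(), "sdnn": sdnn, "rmssd": rmssd}
```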
Table 1
Normal values and their standard deviation

| Parameter | Heart Rate (bpm) | QRS Amplitude (mV) | QRS Time (s) | PR Interval (s) | QT Interval (s) | ST Level (mV) | ISO Level (mV) |
|---|---|---|---|---|---|---|---|
| Mean | 78.06 | 0.86 | 0.062 | 0.13 | 0.338 | -0.035 | -0.294 |
| Std | 0.84 | 0.022 | 0.002 | 0.009 | 0.009 | 0.027 | 0.079 |
Table 2

| Parameter | Heart Rate (bpm) | QRS Amplitude (mV) | QRS Time (s) | PR Interval (s) | QT Interval (s) | ST Level (mV) | ISO Level (mV) |
|---|---|---|---|---|---|---|---|
| Mean | 119.89 | 1.182 | 0.145 | 0.13 | 0.34 | -0.213 | 0.21 |
| Std | 0.98 | 0.011 | 0.0078 | 0.023 | 0.059 | 0.038 | 0.025 |
Table 3

| Parameter | Heart Rate (bpm) | QRS Amplitude (mV) | QRS Time (s) | PR Interval (s) | QT Interval (s) | ST Level (mV) | ISO Level (mV) |
|---|---|---|---|---|---|---|---|
| Mean | 84.07 | 0.635 | 0.056 | 0.141 | 0.339 | 0.11 | -0.326 |
| Std | 0.85 | 0.018 | 0.002 | 0.012 | 0.008 | 0.021 | 0.062 |
HRV can also be affected by other factors, including age and gender. Figure 9 shows the heart rate under normal conditions.
Another variable in our analysis is the EMG signal, which is plotted to detect abnormalities. The 0.5 Hz EMG signal shown in Fig. 10 is acquired from the biosignal sensor and stored in the cloud for further analysis and anomaly detection.
Studies have shown the value of analyzing EMG signals in the frequency domain to gain insight into muscle fiber behavior. We used the frequency spectrum, shown in Figs. 11 and 12, to derive measures associated with EMG frequency analysis. The aim is to characterize the frequency-domain behavior of the EMG signal and to determine the median, mean and mode frequencies from the power density spectrum, which carry useful information about the muscle state.
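The median, mean and mode frequencies mentioned above can be read off the power spectrum. A minimal periodogram-based sketch in NumPy follows; the windowing and PSD estimator are simplifications of what a full EMG analysis would use.

```python
import numpy as np

def spectral_features(x, fs):
    """Mean, median and mode frequency of a signal's power spectrum."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2               # periodogram-style PSD
    mean_f = np.sum(freqs * psd) / np.sum(psd)      # power-weighted average frequency
    cum = np.cumsum(psd)
    median_f = freqs[np.searchsorted(cum, cum[-1] / 2)]  # frequency splitting power in half
    mode_f = freqs[np.argmax(psd)]                  # frequency of the spectral peak
    return mean_f, median_f, mode_f
```

A downward shift of the median frequency over time is a common indicator of muscle fatigue, which is why these parameters are informative about muscle state.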
II. Anomaly Detection through Autoencoder
Another important contribution of this work is the real-time implementation of the designed framework using LabVIEW with a Python add-on, yielding a powerful real-time signal anomaly detection system. LabVIEW 2020 Community Edition is used to run the Python code (Python 3.6). In a real-time scenario, an LSTM autoencoder-based deep neural network detects anomalies in the acquired ECG signals. The model is implemented in Keras, and its summary is shown in Fig. 13.
The classifier has a simple structure: one LSTM layer with 256 units followed by a dropout layer with a rate of 0.2, then a repeat vector, a decoder LSTM layer with 256 units and a time-distributed dense layer. The Adam optimizer is used with the mean absolute error (MAE) loss. The average MAE loss achieved is 0.0072 for normal signals and 0.078 for anomalous signals. Figure 16 shows the distribution of normal-signal MAE loss, while Fig. 14 shows the distribution of anomalous-signal loss. For 98% of normal signals the loss lies below 0.05, whereas the anomalous-signal MAE loss lies above 0.07; anomalies are therefore detected accurately by setting a maximum loss threshold of 0.068. Figure 15 visualizes the model's ability to reconstruct normal and anomalous signals: by learning the dependencies present in the input, the model reconstructs normal signals with much lower loss than anomalous ones. In Fig. 16, the first row of plots compares a normal signal with its reconstruction (red), and the second row compares an anomalous signal with its reconstruction (red). The graphs show that the anomaly reconstruction error is almost 10 times greater than the normal-signal reconstruction error.
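Once the autoencoder is trained, the detection rule above reduces to a simple comparison on the per-signal reconstruction MAE. A sketch, assuming signals and reconstructions are arrays of shape [n_signals, n_samples]:

```python
import numpy as np

def detect_anomalies(signals, reconstructions, threshold=0.068):
    """Flag signals whose reconstruction MAE exceeds the learned threshold."""
    signals = np.asarray(signals, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    mae = np.mean(np.abs(signals - reconstructions), axis=1)  # per-signal MAE
    return mae, mae > threshold
```

The 0.068 threshold sits between the observed normal-loss mass (below 0.05) and the anomalous-loss mass (above 0.07), which is what makes this one-parameter rule effective.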
We have thus implemented an interactive anomaly detection approach based on an LSTM autoencoder. This deep learning approach can discover constraints involving long-term nonlinear associations among multivariate time-series records and attributes.
III. ECG Classification Using AI Techniques
The key parameters of a wavelet time scattering network are the time-invariance scale, the number of wavelet filter banks, and the number of wavelets per octave in each filter bank. In this research, two cascaded filter banks are used.
A. Input Data
The physiological signals used in this research were collected from PhysioNet [29]. Three datasets are used: the MIT-BIH Arrhythmia Database, the MIT-BIH Normal Sinus Rhythm Database and the BIDMC Congestive Heart Failure Database. The data have three classes, arrhythmia (ARR), congestive heart failure (CHF) and normal sinus rhythm (NSR), with a total of 162 records. The training data are structured as an array with two fields, data and labels. The data field is a 162 × 65536 matrix, where each row is a recording sampled at 128 Hz. A sample plot of each class is shown in Fig. 17.
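Because the classes are unbalanced (96 ARR, 30 CHF and 36 NSR records), a per-class random split keeps the class proportions comparable between training and test sets. The sketch below reproduces a 113/49 partition; the 0.7 fraction and seed are illustrative assumptions, not values stated in the text.

```python
import numpy as np

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return train/test index arrays, sampling each class proportionally."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)                       # randomize within the class
        cut = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)
```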
B. Classification Results
This work presents an extensive comparative analysis of 23 machine learning algorithms to find the best option for classifying ECG signals into three classes: ARR, CHF and NSR. Each of the 162 ECG records is 65,536 samples long. The simplest approach would be to feed the raw ECG signals directly into a machine learning classifier that outputs labels such as ARR or CHF. Unfortunately, this does not work well, probably because the signals are long and their features change rapidly over time, which the models cannot interpret. We therefore added a feature engineering step to extract features from the ECG signals and used those features to train the algorithms, which yields higher accuracy. Feature engineering has several advantages: it reduces the dimensionality of the input data, and feature selection lets us keep the features that contribute most to training while eliminating those that merely add noise to the input data set.
A wavelet scattering filter bank is constructed given the signal length, the sampling frequency, quality factors of 8 and 1 for the two filter banks, and the default invariance scale. We extracted the features using the featureMatrix function, reducing each signal to a 499 × 8 feature set, approximately a 95% reduction in size.
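To convey what scattering features represent, the sketch below implements a single first-order scattering layer in NumPy: band-pass filtering with Morlet-like wavelets, a modulus nonlinearity, and averaging over time windows. This toy version does not reproduce MATLAB's two cascaded filter banks with quality factors 8 and 1; the center frequencies and window length are illustrative assumptions.

```python
import numpy as np

def scatter1(x, fs, center_freqs, window):
    """First-order scattering sketch: band-pass each signal with a Morlet-like
    wavelet, take the modulus, then average over non-overlapping windows."""
    feats = []
    n = len(x)
    t = np.arange(-window // 2, window // 2) / fs
    for f0 in center_freqs:
        # complex exponential under a Gaussian envelope (analytic wavelet)
        morlet = np.exp(2j * np.pi * f0 * t) * np.exp(-(t * f0) ** 2 / 2)
        band = np.abs(np.convolve(x, morlet, mode="same"))
        # time-averaging gives invariance to small shifts plus heavy downsampling
        m = band[: n - n % window].reshape(-1, window).mean(axis=1)
        feats.append(m)
    return np.array(feats)      # shape: (len(center_freqs), n // window)
```

The averaging step is what produces the small number of time windows per scattering path, mirroring the 499 × 8 feature matrices described in the text.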
We randomly separated 113 ECG samples for training and held out the remaining 49 for testing, then computed the feature matrix for both sets. The training feature set has dimensions 499 × 8 × 113. The wavelet scattering transform is critically downsampled in time based on the bandwidth of the scaling function, which in this case yields 8 time windows for each of the 499 scattering paths. The data were then used to train 23 machine learning algorithms; KNN, SVM, ensemble subspace discriminant and subspace KNN achieved the highest accuracies, as shown in Table 4. When the trained models are evaluated on the test data, the predicted class is compared with the true class. We obtained classification results for different feature-set sizes. Training on all 499 features, the fine, cubic and medium Gaussian SVMs and the ensemble subspace discriminant gave the most prominent testing accuracies.
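Among the 23 algorithms, KNN is simple enough to sketch end to end. A minimal NumPy implementation with Euclidean distance and majority vote follows; MATLAB's fine/medium/coarse/weighted KNN variants add distance weighting and different neighbor counts on top of this core idea.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=3):
    """Minimal k-nearest-neighbour classifier: Euclidean distance, majority vote."""
    preds = []
    for x in test_X:
        dist = np.linalg.norm(train_X - x, axis=1)   # distance to every training point
        nearest = train_y[np.argsort(dist)[:k]]      # labels of the k closest points
        values, counts = np.unique(nearest, return_counts=True)
        preds.append(values[np.argmax(counts)])      # majority label wins
    return np.array(preds)
```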
We started with a 499 × 8 feature set for each signal, and some of these features may add noise to the data set rather than help the training process. We therefore used automatic feature selection in MATLAB to keep the prominent features and discard the unwanted ones. For feature selection, we used the fscmrmr function [30], an automatic feature selection algorithm that ranks features for classification using the minimum redundancy maximum relevance (mRMR) algorithm. It scores every feature in the training set, indicating which features have the greatest impact and which have the least.
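The greedy mRMR idea can be sketched as follows. This toy version uses absolute Pearson correlation as a stand-in for the mutual information used by fscmrmr, so its rankings will differ from MATLAB's; it only illustrates the relevance-minus-redundancy trade-off.

```python
import numpy as np

def mrmr_rank(X, y, n_select):
    """Greedy minimum-redundancy maximum-relevance feature ranking
    (absolute Pearson correlation as a proxy for mutual information)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def corr(a, b):
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a @ a) * (b @ b))
        return abs(a @ b) / denom if denom else 0.0

    relevance = np.array([corr(X[:, j], y) for j in range(X.shape[1])])
    selected = [int(np.argmax(relevance))]          # start with the most relevant
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # penalize similarity to features already chosen
            redundancy = np.mean([corr(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```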
Plotting the first 60 feature scores, as shown in Fig. 18, feature 1 has by far the highest score, followed by features 141, 395, 73 and so on. We first reduced the feature set to the 60 highest-ranked features and observed accuracy improvements for most SVM-, KNN- and ensemble-based ML algorithms. We then reduced it further to the top 20 features; with so few features the training is very quick, and testing accuracies of up to 98% were obtained.
The three classes to be distinguished are ARR, with 96 signals, CHF, with 30 signals, and NSR, with 36 signals, and each signal is roughly 65,000 samples long. The goal of this research is to find an optimal network that classifies ECG signals into these three categories most accurately. The ML techniques above reduce the dimensionality of the signal by extracting features, but some information about the signal itself is lost in the process. The main idea is to reduce dimensionality while preserving information, so deep learning algorithms were implemented so that the signal loses as little information as possible during training. Signals can be fed directly into a DL-based LSTM network; alternatively, feature extraction can be performed first and the LSTM trained on those features. The latter is useful when feeding raw data directly into LSTMs does not work, when little data are available to begin with (typically the case for many AI, machine learning or deep learning problems), or when data augmentation is very challenging. The aim of implementing a deep neural network is to take advantage of multiple layers when training on our data. The acquired ECG data cover all three classes, with 113 training records and 49 test records. The first step is to extract features automatically from the signal using the wavelet scattering network, apply the filter bank via the featureMatrix function, and obtain a set of features.
Each signal has 65,536 samples, while the automatically extracted features form a 499 × 8 matrix, roughly 4,000 values, which is almost a 95% reduction relative to the original signal. We took the full 499 × 8 matrix for every signal and trained an LSTM network on it. After hyperparameter optimization, we built a small LSTM network with 300 hidden units, an initial learning rate of 0.01, a minibatch size of 1000, a maximum of 150 epochs and "last" as the output mode, giving 100% classification accuracy. The input size is 499 because that is the number of rows per signal (each signal being 499 × 8). The LSTM network trains quickly, in roughly 45 seconds. To evaluate the model, we extracted the features of all test signals; the model again yields 100% accuracy, as shown in Fig. 19. LSTM has thus proven to be the optimal technique: it avoids the information loss and noise sensitivity caused by reducing the feature set, and it is a fast solution for raw real-time data.
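The accuracies reported here and in Table 4 follow from a confusion matrix over the three classes. A small sketch with made-up predictions (the labels below are examples, not results from the paper):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, classes):
    """Confusion matrix: rows are true classes, columns are predicted classes."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    return cm

# illustrative predictions only
y_true = ["ARR", "ARR", "CHF", "NSR", "NSR"]
y_pred = ["ARR", "CHF", "CHF", "NSR", "ARR"]
cm = confusion_matrix(y_true, y_pred, ["ARR", "CHF", "NSR"])
accuracy = np.trace(cm) / cm.sum()   # correct predictions over all predictions
```

Off-diagonal entries show exactly which classes are confused, which is more informative than the scalar accuracy alone when classes are unbalanced, as they are here.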
Table 4
Classification accuracies of ML algorithms on different feature sets

| Model | Training % (499 features) | Testing % (499) | Training % (60) | Testing % (60) | Training % (20) | Testing % (20) |
|---|---|---|---|---|---|---|
| Tree, Fine | 97.6 | 94.89 | 97.3 | 93.36 | 97.6 | 93.36 |
| Tree, Medium | 97.6 | 94.89 | 97.3 | 93.36 | 94.6 | 93.36 |
| Tree, Coarse | 89.6 | 90.81 | 89.2 | 86.48 | 89.0 | 86.48 |
| Linear Discriminant | 99.1 | 82.9 | 99.7 | 95.91 | 95.9 | 91.58 |
| Naïve Bayes, Gaussian | 83.3 | 82.3 | 86.8 | 86.98 | 86.2 | 85.96 |
| Naïve Bayes, Kernel | 93.4 | 85.2 | 97.6 | 96.17 | 98.1 | 94.64 |
| SVM, Linear | 96.8 | 96.17 | 98.8 | 97.19 | 97.9 | 92.60 |
| SVM, Quadratic | 99.8 | 94.13 | 99.8 | 96.42 | 99.9 | 97.19 |
| SVM, Cubic | 99.9 | 99.23 | 100 | 97.19 | 99.7 | 95.91 |
| SVM, Fine Gaussian | 83.1 | 63.7 | 95.5 | 67.09 | 86.2 | 72.19 |
| SVM, Medium Gaussian | 96.3 | 95.4 | 99.8 | 97.9 | 99.3 | 97.19 |
| SVM, Coarse Gaussian | 82 | 80.35 | 87.4 | 85.20 | 88.6 | 86.73 |
| KNN, Fine | 100 | 94.39 | 100 | 96.17 | 99.9 | 94.64 |
| KNN, Medium | 95.7 | 91.07 | 97.9 | 93.87 | 98.8 | 92.85 |
| KNN, Coarse | 75.9 | 78.06 | 85 | 89.28 | 91.6 | 92.60 |
| KNN, Cosine | 95.2 | 88.01 | 96.8 | 95.92 | 97.7 | 95.15 |
| KNN, Cubic | 94.5 | 88.26 | 96.9 | 92.60 | 97.7 | 89.54 |
| KNN, Weighted | 99 | 93.10 | 99.3 | 93.87 | 99.4 | 92.60 |
| Ensemble, Boosted Trees | 59.3 | 59.18 | 59.3 | 59.18 | 59.3 | 59.18 |
| Ensemble, Bagged Trees | 97.5 | 84.94 | 96.6 | 92.09 | 96.7 | 89.28 |
| Ensemble, Subspace Discriminant | 99.8 | 95.15 | 97.6 | 96.42 | 91.3 | 87.24 |
| Ensemble, Subspace KNN | 100 | 90.56 | 99.7 | 95.91 | 99.6 | 98.04 |
| Ensemble, RUSBoosted Trees | 98.3 | 94.64 | 98.1 | 95.15 | 99.0 | 95.15 |