In the IoT-based healthcare framework described in the previous section, data acquisition was performed. The first analysis step filters the acquired biosignals, detrending and denoising them so that they can be compared reliably against the threshold values stored in the framework. Figure 5 shows the unfiltered ECG signal. To remove baseline wander and other motion artifacts, the wavelet transform was used; it is well suited to this task because of its good localization properties in both the time and frequency domains. A normal ECG waveform decomposes into a P wave, QRS complex and T wave, which are the key features located during analysis of the detected signal. In the cardiac cycle, the P wave corresponds to atrial depolarization and the T wave to ventricular repolarization. The wavelet transform applied to remove noise and motion artifacts from the ECG signal is shown in Fig. 6.
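The denoising step above can be sketched as follows. This is a minimal illustration using a single-channel Haar wavelet decomposition with soft thresholding in NumPy; the mother wavelet, decomposition depth and threshold rule used in the actual framework may differ.

```python
import numpy as np

def haar_denoise(x, levels=3, k=3.0):
    """Soft-threshold wavelet denoising with a Haar basis.
    Signal length must be divisible by 2**levels."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx = (a[0::2] + a[1::2]) / np.sqrt(2)   # low-pass branch
        detail = (a[0::2] - a[1::2]) / np.sqrt(2)   # high-pass branch
        details.append(detail)
        a = approx
    # robust noise estimate from the finest detail band (median absolute deviation)
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = k * sigma
    details = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    # inverse transform: undo each decomposition level
    for d in reversed(details):
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a
```

Because the Haar transform is orthonormal, thresholding the detail bands suppresses broadband noise while leaving the smooth ECG morphology largely intact.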
Tables 2 and 3 show the located intervals and peak values, which are compared with the normal threshold values (stored manually) in Table 1 to generate medical alerts. The ECG data are received via a biosignal acquisition device and stored in the cloud at 512 samples per second. If the intervals detected in LabVIEW deviate substantially from these thresholds, the ECG is flagged as irregular. The R-R and QRS intervals, shown in Figs. 7 and 8, are important indicators of various heart diseases, such as arrhythmia.
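As an illustration of how R-R intervals can be derived from the 512 samples-per-second stream, the sketch below uses a naive amplitude-threshold R-peak detector; the LabVIEW detection logic in the framework is more sophisticated, and the threshold ratio and refractory period here are illustrative assumptions.

```python
import numpy as np

def rr_intervals(ecg, fs=512, thresh_ratio=0.6, refractory=0.2):
    """Naive R-peak detector: local maxima above a fraction of the global
    maximum, separated by at least a refractory period (in seconds)."""
    thresh = thresh_ratio * np.max(ecg)
    min_gap = int(refractory * fs)
    peaks = []
    for i in range(1, len(ecg) - 1):
        if ecg[i] >= thresh and ecg[i] > ecg[i - 1] and ecg[i] >= ecg[i + 1]:
            if not peaks or i - peaks[-1] > min_gap:
                peaks.append(i)
    rr = np.diff(peaks) / fs        # R-R intervals in seconds
    return np.array(peaks), rr
```

The heart rate in bpm then follows as `60 / rr.mean()`, which is the quantity compared against the Table 1 threshold.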
The next variable in our analysis is heart rate variability (HRV). HRV is the variation in time between successive heartbeats and is controlled by the autonomic nervous system (ANS). An abnormal HRV pattern can indicate life-threatening cardiac conditions such as arrhythmias.
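Two standard HRV statistics, SDNN and RMSSD, can be computed directly from the R-R series. The sketch below is a minimal version; the specific HRV measures tracked by the framework are not stated in the text, so this metric choice is ours for illustration.

```python
import numpy as np

def hrv_metrics(rr_ms):
    """Compute basic HRV statistics from successive R-R intervals (milliseconds)."""
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = np.std(rr, ddof=1)                       # overall variability
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))      # beat-to-beat variability
    return {"mean_rr": rr.mean(), "sdnn": sdnn, "rmssd": rmssd}
```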
Table 1
Normal values and their standard deviation

| Parameter | Heart Rate (bpm) | QRS Amplitude (mV) | QRS Time (s) | PR Interval (s) | QT Interval (s) | ST Level (mV) | ISO Level (mV) |
|---|---|---|---|---|---|---|---|
| Mean | 78.06 | 0.86 | 0.062 | 0.13 | 0.338 | -0.035 | -0.294 |
| Std | 0.84 | 0.022 | 0.002 | 0.009 | 0.009 | 0.027 | 0.079 |
Table 2

| Parameter | Heart Rate (bpm) | QRS Amplitude (mV) | QRS Time (s) | PR Interval (s) | QT Interval (s) | ST Level (mV) | ISO Level (mV) |
|---|---|---|---|---|---|---|---|
| Mean | 119.89 | 1.182 | 0.145 | 0.13 | 0.34 | -0.213 | 0.21 |
| Std | 0.98 | 0.011 | 0.0078 | 0.023 | 0.059 | 0.038 | 0.025 |
Table 3

| Parameter | Heart Rate (bpm) | QRS Amplitude (mV) | QRS Time (s) | PR Interval (s) | QT Interval (s) | ST Level (mV) | ISO Level (mV) |
|---|---|---|---|---|---|---|---|
| Mean | 84.07 | 0.635 | 0.056 | 0.141 | 0.339 | 0.11 | -0.326 |
| Std | 0.85 | 0.018 | 0.002 | 0.012 | 0.008 | 0.021 | 0.062 |
HRV can also be affected by other factors, including age and gender. Figure 9 shows the heart rate under normal conditions.
Another variable in our analysis is the EMG signal, which is plotted to detect abnormalities. The 0.5 Hz EMG signal shown in Fig. 10 is acquired from the biosignal sensor and stored in the cloud for further analysis and anomaly detection.
Studies have shown the value of analyzing EMG signals in the frequency domain to gain insight into muscle fiber behavior. We used the frequency spectrum, shown in Figs. 11 and 12, to derive measures associated with EMG frequency analysis. The aim is to characterize the frequency-domain behavior of the EMG signal and to determine the median, mean and mode frequencies from the power density spectrum, which carry useful information about the muscle state.
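The median, mean and mode frequencies mentioned above can be read off the power spectrum. A minimal periodogram-based sketch in NumPy follows; the windowing and PSD estimator are simplifications of what a full EMG analysis would use.

```python
import numpy as np

def spectral_features(x, fs):
    """Mean, median and mode frequency of a signal's power spectrum."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2               # periodogram-style PSD
    mean_f = np.sum(freqs * psd) / np.sum(psd)      # power-weighted average frequency
    cum = np.cumsum(psd)
    median_f = freqs[np.searchsorted(cum, cum[-1] / 2)]  # frequency splitting power in half
    mode_f = freqs[np.argmax(psd)]                  # frequency of the spectral peak
    return mean_f, median_f, mode_f
```

A downward shift of the median frequency over time is a common indicator of muscle fatigue, which is why these parameters are informative about muscle state.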
II. Anomaly Detection through Autoencoder
Another important contribution of this work is the real-time implementation of the designed framework using LabVIEW with a Python add-on, yielding a powerful real-time signal anomaly detection system. LabVIEW 2020 Community Edition is used to run the Python code (Python 3.6). In a real-time scenario, an LSTM autoencoder-based deep neural network detects anomalies in the acquired ECG signals. The model is implemented in Keras, and its summary is shown in Fig. 13.
The classifier has a simple structure: one LSTM layer with 256 units followed by a dropout layer with a rate of 0.2, then a repeat vector, a decoder LSTM layer with 256 units and a time-distributed dense layer. The Adam optimizer is used with the mean absolute error (MAE) loss. The average MAE loss achieved is 0.0072 for normal signals and 0.078 for anomalous signals. Figure 16 shows the distribution of normal-signal MAE loss, while Fig. 14 shows the distribution of anomalous-signal loss. For 98% of normal signals the loss lies below 0.05, whereas the anomalous-signal MAE loss lies above 0.07; anomalies are therefore detected accurately by setting a maximum loss threshold of 0.068. Figure 15 visualizes the model's ability to reconstruct normal and anomalous signals: by learning the dependencies present in the input, the model reconstructs normal signals with much lower loss than anomalous ones. In Fig. 16, the first row of plots compares a normal signal with its reconstruction (red), and the second row compares an anomalous signal with its reconstruction (red). The graphs show that the anomaly reconstruction error is almost 10 times greater than the normal-signal reconstruction error.
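Once the autoencoder is trained, the detection rule above reduces to a simple comparison on the per-signal reconstruction MAE. A sketch, assuming signals and reconstructions are arrays of shape [n_signals, n_samples]:

```python
import numpy as np

def detect_anomalies(signals, reconstructions, threshold=0.068):
    """Flag signals whose reconstruction MAE exceeds the learned threshold."""
    signals = np.asarray(signals, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    mae = np.mean(np.abs(signals - reconstructions), axis=1)  # per-signal MAE
    return mae, mae > threshold
```

The 0.068 threshold sits between the observed normal-loss mass (below 0.05) and the anomalous-loss mass (above 0.07), which is what makes this one-parameter rule effective.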
We have thus implemented an interactive anomaly detection approach based on an LSTM autoencoder. This deep learning approach can discover constraints involving long-term nonlinear associations among multivariate time-series records and attributes.
III. ECG Classification Using AI Techniques
The key parameters of a wavelet time scattering network are the time-invariance scale, the number of wavelet filter banks, and the number of wavelets per octave in each filter bank. In this research, two cascaded filter banks are used.
A. Input Data
The physiological signals used in this research were collected from PhysioNet [29]. Three datasets are used: the MIT-BIH Arrhythmia Database, the MIT-BIH Normal Sinus Rhythm Database and the BIDMC Congestive Heart Failure Database. The data have three classes, arrhythmia (ARR), congestive heart failure (CHF) and normal sinus rhythm (NSR), with a total of 162 records. The training data are structured as an array with two fields, data and labels. The data field is a 162 × 65536 matrix, where each row is a recording sampled at 128 Hz. A sample plot of each class is shown in Fig. 17.
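Because the classes are unbalanced (96 ARR, 30 CHF and 36 NSR records), a per-class random split keeps the class proportions comparable between training and test sets. The sketch below reproduces a 113/49 partition; the 0.7 fraction and seed are illustrative assumptions, not values stated in the text.

```python
import numpy as np

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return train/test index arrays, sampling each class proportionally."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)                       # randomize within the class
        cut = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)
```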
B. Classification Results
This work presents an extensive comparative analysis of 23 machine learning algorithms to find the best option for classifying ECG signals into three classes: ARR, CHF and NSR. Each of the 162 ECG records is 65,536 samples long. The simplest approach would be to feed the raw ECG signals directly into a machine learning classifier that outputs labels such as ARR or CHF. Unfortunately, this does not work well, probably because the signals are long and their features change rapidly over time, which the models cannot interpret. We therefore added a feature engineering step to extract features from the ECG signals and used those features to train the algorithms, which yields higher accuracy. Feature engineering has several advantages: it reduces the dimensionality of the input data, and feature selection lets us keep the features that contribute most to training while eliminating those that merely add noise to the input data set.
A wavelet scattering filter bank is constructed given the signal length, the sampling frequency, quality factors of 8 and 1 for the two filter banks, and the default invariance scale. We extracted the features using the featureMatrix function, reducing each signal to a 499 × 8 feature set, approximately a 95% reduction in size.
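To convey what scattering features represent, the sketch below implements a single first-order scattering layer in NumPy: band-pass filtering with Morlet-like wavelets, a modulus nonlinearity, and averaging over time windows. This toy version does not reproduce MATLAB's two cascaded filter banks with quality factors 8 and 1; the center frequencies and window length are illustrative assumptions.

```python
import numpy as np

def scatter1(x, fs, center_freqs, window):
    """First-order scattering sketch: band-pass each signal with a Morlet-like
    wavelet, take the modulus, then average over non-overlapping windows."""
    feats = []
    n = len(x)
    t = np.arange(-window // 2, window // 2) / fs
    for f0 in center_freqs:
        # complex exponential under a Gaussian envelope (analytic wavelet)
        morlet = np.exp(2j * np.pi * f0 * t) * np.exp(-(t * f0) ** 2 / 2)
        band = np.abs(np.convolve(x, morlet, mode="same"))
        # time-averaging gives invariance to small shifts plus heavy downsampling
        m = band[: n - n % window].reshape(-1, window).mean(axis=1)
        feats.append(m)
    return np.array(feats)      # shape: (len(center_freqs), n // window)
```

The averaging step is what produces the small number of time windows per scattering path, mirroring the 499 × 8 feature matrices described in the text.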
We randomly separated 113 ECG samples for training and held out the remaining 49 for testing, then computed the feature matrix for both sets. The training feature set has dimensions 499 × 8 × 113. The wavelet scattering transform is critically downsampled in time based on the bandwidth of the scaling function, which in this case yields 8 time windows for each of the 499 scattering paths. The data were then used to train 23 machine learning algorithms; KNN, SVM, ensemble subspace discriminant and subspace KNN achieved the highest accuracies, as shown in Table 4. When the trained models are evaluated on the test data, the predicted class is compared with the true class. We obtained classification results for different feature-set sizes. Training on all 499 features, the fine, cubic and medium Gaussian SVMs and the ensemble subspace discriminant gave the most prominent testing accuracies.
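Among the 23 algorithms, KNN is simple enough to sketch end to end. A minimal NumPy implementation with Euclidean distance and majority vote follows; MATLAB's fine/medium/coarse/weighted KNN variants add distance weighting and different neighbor counts on top of this core idea.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=3):
    """Minimal k-nearest-neighbour classifier: Euclidean distance, majority vote."""
    preds = []
    for x in test_X:
        dist = np.linalg.norm(train_X - x, axis=1)   # distance to every training point
        nearest = train_y[np.argsort(dist)[:k]]      # labels of the k closest points
        values, counts = np.unique(nearest, return_counts=True)
        preds.append(values[np.argmax(counts)])      # majority label wins
    return np.array(preds)
```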
We started with a 499 × 8 feature set for each signal, and some of these features may add noise to the data set rather than help the training process. We therefore used automatic feature selection in MATLAB to keep the prominent features and discard the unwanted ones. For feature selection, we used the fscmrmr function [30], an automatic feature selection algorithm that ranks features for classification using the minimum redundancy maximum relevance (mRMR) algorithm. It scores every feature in the training set, indicating which features have the greatest impact and which have the least.
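The greedy mRMR idea can be sketched as follows. This toy version uses absolute Pearson correlation as a stand-in for the mutual information used by fscmrmr, so its rankings will differ from MATLAB's; it only illustrates the relevance-minus-redundancy trade-off.

```python
import numpy as np

def mrmr_rank(X, y, n_select):
    """Greedy minimum-redundancy maximum-relevance feature ranking
    (absolute Pearson correlation as a proxy for mutual information)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def corr(a, b):
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a @ a) * (b @ b))
        return abs(a @ b) / denom if denom else 0.0

    relevance = np.array([corr(X[:, j], y) for j in range(X.shape[1])])
    selected = [int(np.argmax(relevance))]          # start with the most relevant
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # penalize similarity to features already chosen
            redundancy = np.mean([corr(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```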
Plotting the first 60 feature scores, as shown in Fig. 18, feature 1 has by far the highest score, followed by features 141, 395, 73 and so on. We first reduced the feature set to the 60 highest-ranked features and observed accuracy improvements for most SVM-, KNN- and ensemble-based ML algorithms. We then reduced it further to the top 20 features; with so few features the training is very quick, and testing accuracies of up to 98% were obtained.
The three classes to be distinguished are ARR, with 96 signals, CHF, with 30 signals, and NSR, with 36 signals, and each signal is roughly 65,000 samples long. The goal of this research is to find an optimal network that classifies ECG signals into these three categories most accurately. The ML techniques above reduce the dimensionality of the signal by extracting features, but some information about the signal itself is lost in the process. The main idea is to reduce dimensionality while preserving information, so deep learning algorithms were implemented so that the signal loses as little information as possible during training. Signals can be fed directly into a DL-based LSTM network; alternatively, feature extraction can be performed first and the LSTM trained on those features. The latter is useful when feeding raw data directly into LSTMs does not work, when little data are available to begin with (typically the case for many AI, machine learning or deep learning problems), or when data augmentation is very challenging. The aim of implementing a deep neural network is to take advantage of multiple layers when training on our data. The acquired ECG data cover all three classes, with 113 training records and 49 test records. The first step is to extract features automatically from the signal using the wavelet scattering network, apply the filter bank via the featureMatrix function, and obtain a set of features.
Each signal has 65,536 samples, while the automatically extracted features form a 499 × 8 matrix, roughly 4,000 values, which is almost a 95% reduction relative to the original signal. We took the full 499 × 8 matrix for every signal and trained an LSTM network on it. After hyperparameter optimization, we built a small LSTM network with 300 hidden units, an initial learning rate of 0.01, a minibatch size of 1000, a maximum of 150 epochs and "last" as the output mode, giving 100% classification accuracy. The input size is 499 because that is the number of rows per signal (each signal being 499 × 8). The LSTM network trains quickly, in roughly 45 seconds. To evaluate the model, we extracted the features of all test signals; the model again yields 100% accuracy, as shown in Fig. 19. LSTM has thus proven to be the optimal technique: it avoids the information loss and noise sensitivity caused by reducing the feature set, and it is a fast solution for raw real-time data.
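The accuracies reported here and in Table 4 follow from a confusion matrix over the three classes. A small sketch with made-up predictions (the labels below are examples, not results from the paper):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, classes):
    """Confusion matrix: rows are true classes, columns are predicted classes."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    return cm

# illustrative predictions only
y_true = ["ARR", "ARR", "CHF", "NSR", "NSR"]
y_pred = ["ARR", "CHF", "CHF", "NSR", "ARR"]
cm = confusion_matrix(y_true, y_pred, ["ARR", "CHF", "NSR"])
accuracy = np.trace(cm) / cm.sum()   # correct predictions over all predictions
```

Off-diagonal entries show exactly which classes are confused, which is more informative than the scalar accuracy alone when classes are unbalanced, as they are here.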
Table 4
Classification accuracies of ML algorithms on different feature sets

| Model | Training % (499 features) | Testing % (499) | Training % (60) | Testing % (60) | Training % (20) | Testing % (20) |
|---|---|---|---|---|---|---|
| Tree, Fine | 97.6 | 94.89 | 97.3 | 93.36 | 97.6 | 93.36 |
| Tree, Medium | 97.6 | 94.89 | 97.3 | 93.36 | 94.6 | 93.36 |
| Tree, Coarse | 89.6 | 90.81 | 89.2 | 86.48 | 89.0 | 86.48 |
| Linear Discriminant | 99.1 | 82.9 | 99.7 | 95.91 | 95.9 | 91.58 |
| Naïve Bayes, Gaussian | 83.3 | 82.3 | 86.8 | 86.98 | 86.2 | 85.96 |
| Naïve Bayes, Kernel | 93.4 | 85.2 | 97.6 | 96.17 | 98.1 | 94.64 |
| SVM, Linear | 96.8 | 96.17 | 98.8 | 97.19 | 97.9 | 92.60 |
| SVM, Quadratic | 99.8 | 94.13 | 99.8 | 96.42 | 99.9 | 97.19 |
| SVM, Cubic | 99.9 | 99.23 | 100 | 97.19 | 99.7 | 95.91 |
| SVM, Fine Gaussian | 83.1 | 63.7 | 95.5 | 67.09 | 86.2 | 72.19 |
| SVM, Medium Gaussian | 96.3 | 95.4 | 99.8 | 97.9 | 99.3 | 97.19 |
| SVM, Coarse Gaussian | 82 | 80.35 | 87.4 | 85.20 | 88.6 | 86.73 |
| KNN, Fine | 100 | 94.39 | 100 | 96.17 | 99.9 | 94.64 |
| KNN, Medium | 95.7 | 91.07 | 97.9 | 93.87 | 98.8 | 92.85 |
| KNN, Coarse | 75.9 | 78.06 | 85 | 89.28 | 91.6 | 92.60 |
| KNN, Cosine | 95.2 | 88.01 | 96.8 | 95.92 | 97.7 | 95.15 |
| KNN, Cubic | 94.5 | 88.26 | 96.9 | 92.60 | 97.7 | 89.54 |
| KNN, Weighted | 99 | 93.10 | 99.3 | 93.87 | 99.4 | 92.60 |
| Ensemble, Boosted Trees | 59.3 | 59.18 | 59.3 | 59.18 | 59.3 | 59.18 |
| Ensemble, Bagged Trees | 97.5 | 84.94 | 96.6 | 92.09 | 96.7 | 89.28 |
| Ensemble, Subspace Discriminant | 99.8 | 95.15 | 97.6 | 96.42 | 91.3 | 87.24 |
| Ensemble, Subspace KNN | 100 | 90.56 | 99.7 | 95.91 | 99.6 | 98.04 |
| Ensemble, RUSBoosted Trees | 98.3 | 94.64 | 98.1 | 95.15 | 99.0 | 95.15 |