Hurst Exponent-Based Nonlinear Analysis for the Identification of Arrhythmia-Affected Cardiac Systems


The stochastic nature of the human heart, a complex biological system, is evident from electrocardiogram (ECG) signals, which are weak, non-linear and non-stationary. These temporal variations of the electromagnetic pulses emanating from the heart are instrumental in indicating cardiac health. The Empirical Mode Decomposition (EMD) technique was employed to decompose a total of 64 ECG signals of arrhythmic and normal subjects, obtained from the widely used MIT-BIH databases, into a finite number of Intrinsic Mode Functions (IMFs). The rationale behind this strategy was to extract non-linear features of ECG signals that are not explicitly expressed, while keeping the original signal unaltered. Following removal of non-stationary noise from the ECG signals by the Savitzky-Golay (SG) filter, the Hurst exponent (H), a popular non-linear parameter, was estimated for every IMF by the R/S technique. A distinct difference in the H values of the 1st IMFs between normal individuals and arrhythmia-affected patients was identified. This observation was further validated through an age-based and gender-based analysis, which demonstrated a unique alteration pattern with age. The study showed a 94.92% probability of detecting arrhythmia in a patient. Adopting this EMD-based procedure for ECG data analysis and disease prediction may help reduce our dependence on intuition-based diagnosis of ECG reports by medical practitioners, and may provide novel insights into the functioning of the human heart that might help develop new biomedical strategies to combat cardiac disorders.


Introduction
Recent times have witnessed a surge in the occurrence of cardiovascular diseases. Arrhythmia is one of the notable heart ailments, with symptoms including anxiety, palpitations at rest and on exertion, reduced physical ability and breathlessness [1]. The occurrence of arrhythmia is also associated with the onset of other severe complications like stroke and heart failure. Electrocardiogram (ECG) signals are conventional and well-established tools for cardiologists to assess the health of the heart in an easy and cost-effective manner [2]. Modern techniques such as the Holter test, echocardiogram and stress tests are also used for the diagnosis of cardiac patients, but they are sophisticated and relatively expensive, which limits the access of less-privileged populations of developing nations to these techniques [3]. However, conventional ECG analysis is typically based on identification of temporal variations in the signal, such as its wavelength and amplitude. Thus, manual interpretation by cardiologists using conventional ECG-based analysis alone sometimes fails to provide certain vital information about the heart [4]. Time is often a constraint in interpreting the ECG reports of critical patients suffering from emergency medical issues, which necessitates automatic analysis of these reports [5]. Therefore, computerized interpretation of medical data such as electrocardiograms has been of interest to researchers over the last few decades. Efforts have been made so far to identify cardiac abnormalities such as premature ventricular complexes, atrial fibrillation etc. by taking into consideration different components of the ECG signals and analysing them with respect to ECG signals from normally functioning cardiac systems [6][7][8]. However, studies involving characterization of ECG signals using nonlinear approaches are relatively rare.
Diverse non-linear techniques that involve a shift to the frequency domain from the time domain, such as multifractal analysis, wavelet transform, recurrence etc. are often used in physiology and medical sciences [9,10].
Empirical Mode Decomposition (EMD), which seems to be an attractive tool for extraction of nonlinear features from biomedical signals, essentially involves decomposition of the signal into a finite number of Intrinsic Mode Functions (IMFs), keeping its original properties intact [11]. This multi-scale analysis method, which is used widely for the prediction of the signal's trend, does not rely on any prior knowledge [12,13]. A nonlinear parameter, Hurst exponent (H), which is crucial for elucidating the regularity and scaling in a time series, might serve as an effective parameter for interpretation of the underlying dynamics of the nonlinear cardiac system and for the characterization of arrhythmic patients [14,15].
PhysioNet is an open access collection of physiological signals and clinical data, along with related open-source software. In this study, we collected publicly available ECG signals and proposed a pipeline to extract the nonlinear features buried in ECG signals by emphasizing the nonlinear topographies of IMFs, for comprehensive analysis of ECG reports. We sought to distinguish arrhythmia-affected patients from normal human subjects based on a comparative study of the Hurst exponents of a particular IMF retrieved from the respective ECG signals, which can prove to be a potential strategy for identification of arrhythmia and may subsequently help establish a new paradigm in cardiology.

Data Collection and Preprocessing
Full-length ECG time series data (48 records) of arrhythmic patients were downloaded from the popular MIT-BIH Arrhythmia Database (https://physionet.org/content/mitdb/1.0.0/) [16,17]. The patients were 25 men aged 32 to 89 years, and 22 women aged 23 to 89 years; records 201 and 202 were taken from the same male subject.
Full-length ECG time series data of 18 normal subjects were collected from the MIT-BIH Normal Sinus Rhythm Database (https://www.physionet.org/content/nsrdb/1.0.0/). The normal subjects include 5 men aged 26 to 45, and 13 women aged 20 to 50, with no significant arrhythmias. Each ECG data series was recorded for a duration of 60 seconds. The Savitzky-Golay (SG) filter was employed for pre-processing of the ECG signals in order to maximize noise reduction with minimal signal distortion [18,19]. The SG filter was applied using MATLAB v.R2016a (https://in.mathworks.com/).
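As an illustration of the smoothing step, the following is a minimal Python sketch of a Savitzky-Golay filter, not the MATLAB routine used in the study. The 5-point window and second-order polynomial are assumptions chosen for brevity (the filter settings are not stated here), and the function name `savgol_smooth` is hypothetical. Library routines such as `scipy.signal.savgol_filter` provide the same operation with configurable window length and polynomial order.

```python
# 5-point, second-order Savitzky-Golay smoothing coefficients (divided by 35).
# Window length and polynomial order are illustrative assumptions.
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def savgol_smooth(signal):
    """Smooth a sequence with a 5-point, second-order Savitzky-Golay filter.

    The two samples at each end are returned unchanged, a simple edge
    policy chosen here for brevity.
    """
    x = [float(v) for v in signal]
    out = list(x)
    half = len(SG_COEFFS) // 2
    for i in range(half, len(x) - half):
        window = x[i - half:i + half + 1]
        out[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / SG_NORM
    return out


# Example on a synthetic trace (not ECG data): a constant level with
# alternating noise of amplitude 0.1.
noisy = [1.0 + 0.1 * (-1) ** i for i in range(12)]
smoothed = savgol_smooth(noisy)
```

Because the filter fits a local polynomial, it leaves any quadratic trend unchanged in the interior while attenuating sample-to-sample noise, which is the "minimal signal distortion" property referred to above.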

Empirical Mode Decomposition Analysis of the ECG Signals
The Empirical Mode Decomposition (EMD) technique was applied to each of the filtered ECG signals, decomposing them into a finite number of Intrinsic Mode Functions (IMFs) [20]. To retrieve the maximum number of IMFs, the number of iterations was set to 150. The algorithm is described as follows:
1. All the local maxima and minima of the signal $x(t)$ are identified.
2. The local maxima are connected by a cubic spline to form the upper envelope $e_{\max}(t)$, and the local minima are connected likewise to form the lower envelope $e_{\min}(t)$.
3. The mean of the two envelopes is computed as
$$m_1(t) = \frac{e_{\max}(t) + e_{\min}(t)}{2} \qquad (1)$$
4. The mean is then subtracted from the main signal $x(t)$ to get the first proto-IMF $h_1(t)$. Thus,
$$h_1(t) = x(t) - m_1(t) \qquad (2)$$
5. Due to multiple extrema present between two consecutive zero crossings, the sifting process is applied repeatedly to $h_k(t)$, the $k$th proto-IMF. Once this satisfies the IMF conditions, the first IMF $c_1(t)$ is obtained.
6. The sifting process is stopped on reaching the stopping criterion, which is characterized by the Sum of the Difference (SD):
$$SD = \sum_{t=0}^{T} \frac{\left| h_{k-1}(t) - h_k(t) \right|^2}{h_{k-1}^2(t)} \qquad (3)$$
The first IMF $c_1(t)$ is obtained when the SD is smaller than or equal to the threshold value. The typical values of the threshold lie between 0.2 and 0.3.
7. After that, the IMF is subtracted from the original signal to find the first residual signal $r_1(t)$ by
$$r_1(t) = x(t) - c_1(t) \qquad (4)$$
8. This residual signal is treated as the original signal to produce a further pair of IMF and residual signal. The procedure continues until the $N$th residue $r_N(t)$ becomes a constant, has a single extremum, or has a monotonic slope.
9. Finally, combining all the above steps, the original signal can be expressed as
$$x(t) = \sum_{i=1}^{N} c_i(t) + r_N(t) \qquad (5)$$
Each IMF should satisfy the following characteristics [21]:
A. The number of zero-crossings must be equal to, or differ by at most one from, the number of extrema (assuming it has at least two extrema).
B. Each IMF should be symmetrical with respect to the local mean.
The above-mentioned EMD procedure was performed using MATLAB v.R2016a (https://in.mathworks.com/).
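The sifting procedure can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the MATLAB implementation used in the study: the envelopes are piecewise-linear rather than cubic splines, the signal end points are used as envelope knots, and the stopping criterion uses a ratio of sums instead of a pointwise ratio to avoid division by values near zero. The function names and the default threshold of 0.3 are hypothetical choices.

```python
import math


def local_extrema(x):
    """Return indices of interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(idx, x):
    """Piecewise-linear envelope through the extrema (assumption: the
    study's cubic splines are replaced by straight segments here)."""
    knots = [0] + idx + [len(x) - 1]  # end points act as extra knots
    env = [0.0] * len(x)
    for a, b in zip(knots, knots[1:]):
        for t in range(a, b + 1):
            w = (t - a) / (b - a)
            env[t] = (1 - w) * x[a] + w * x[b]
    return env


def sift(x, sd_threshold=0.3, max_iter=150):
    """Extract one IMF by repeatedly subtracting the envelope mean."""
    h = list(x)
    for _ in range(max_iter):
        maxima, minima = local_extrema(h)
        if len(maxima) + len(minima) < 2:
            break
        upper, lower = envelope(maxima, h), envelope(minima, h)
        h_new = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
        # Ratio-of-sums variant of the SD criterion (assumption).
        sd = sum((a - b) ** 2 for a, b in zip(h, h_new)) / (
            sum(a * a for a in h) + 1e-12)
        h = h_new
        if sd <= sd_threshold:
            break
    return h


def emd(x, max_imfs=12):
    """Decompose x into IMFs plus a final residue."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) < 2:  # constant, monotonic or one extremum
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


# Example on a synthetic two-tone signal (not ECG data).
signal = [math.sin(2 * math.pi * t / 20) + 0.5 * math.sin(2 * math.pi * t / 100)
          for t in range(400)]
imfs, residue = emd(signal)
```

By construction, summing all returned IMFs and the residue recovers the input signal, which mirrors the final reconstruction step of the decomposition.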

Statistical Significance of Intrinsic Mode Functions
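The correlation coefficients between the IMFs and the original signal were calculated to identify the IMFs that represent real components of the original signal [22,23]. The correlation coefficient $\mu_i$ between the $i$th IMF $c_i(t)$ and the original signal $x(t)$ was calculated using the following equation [24,25]:
$$\mu_i = \frac{\sum_{t} \left( x(t) - \bar{x} \right)\left( c_i(t) - \bar{c}_i \right)}{\sqrt{\sum_{t} \left( x(t) - \bar{x} \right)^2 \, \sum_{t} \left( c_i(t) - \bar{c}_i \right)^2}} \qquad (6)$$
A hard threshold $\lambda$ is used to select the significant IMFs, described as
$$\lambda = \frac{\max_i(\mu_i)}{\eta} \qquad (7)$$
where $\eta$ is a ratio factor greater than 1.0 [23]. In this study, IMFs with $\mu_i \geq \lambda$ were considered to be significant.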
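The selection rule can be illustrated with a short Python sketch. It assumes the Pearson form of the correlation coefficient and a ratio factor of 10; both are assumptions made for illustration, since the value used in the study is not stated here. The function names are hypothetical.

```python
import math


def correlation(x, c):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_c = sum(x) / n, sum(c) / n
    num = sum((a - mean_x) * (b - mean_c) for a, b in zip(x, c))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                    * sum((b - mean_c) ** 2 for b in c))
    return num / den if den > 0 else 0.0


def significant_imfs(signal, imfs, eta=10.0):
    """Return indices of IMFs whose correlation meets the hard threshold.

    eta is the ratio factor (> 1.0); the default of 10 is an assumption.
    """
    mus = [correlation(signal, imf) for imf in imfs]
    lam = max(mus) / eta
    keep = [i for i, mu in enumerate(mus) if mu >= lam]
    return keep, mus, lam


# Example with synthetic components (not ECG data): two strong tones
# and one very weak tone standing in for an insignificant IMF.
N = 200
comp_a = [math.sin(2 * math.pi * 20 * t / N) for t in range(N)]
comp_b = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
comp_c = [0.001 * math.cos(2 * math.pi * 25 * t / N) for t in range(N)]
mixed = [a + b + c for a, b, c in zip(comp_a, comp_b, comp_c)]
keep, mus, lam = significant_imfs(mixed, [comp_a, comp_b, comp_c])
```

In this toy case the weak third component correlates only marginally with the mixed signal and falls below the threshold, so only the first two components are retained.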

Estimation of Hurst Exponents
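The Hurst exponent (H) was calculated for each IMF by the R/S technique, proposed by Mandelbrot and Wallis in 1969 [26]. The steps to get H using the R/S method are described below:
1. The analysis begins by dividing the IMF of length $L$ into $d$ sub-series $Z_{i,m}$ of length $n$, with $i = 1, \ldots, n$ and $m = 1, \ldots, d$.
2. For each sub-series, the mean $E_m$ and the standard deviation $S_m$ are calculated.
3. The data are normalized by subtracting the sub-series mean:
$$X_{i,m} = Z_{i,m} - E_m \quad \text{for } i = 1, \ldots, n \qquad (8)$$
4. The cumulative series is created as
$$Y_{i,m} = \sum_{j=1}^{i} X_{j,m} \quad \text{for } i = 1, \ldots, n \qquad (9)$$
5. The range is calculated from the cumulative series as
$$R_m = \max\{Y_{1,m}, \ldots, Y_{n,m}\} - \min\{Y_{1,m}, \ldots, Y_{n,m}\} \qquad (10)$$
6. The range is then re-scaled by dividing by $S_m$, giving $R_m / S_m$.
7. Finally, for all the sub-series of length $n$, the mean value of the re-scaled range is considered as follows:
$$(R/S)_n = \frac{1}{d} \sum_{m=1}^{d} \frac{R_m}{S_m} \qquad (11)$$
Since the re-scaled range scales as $(R/S)_n \sim c\, n^{H}$, H is estimated as the slope of $\log (R/S)_n$ plotted against $\log n$.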
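The R/S procedure can be sketched in Python as follows. This is a minimal illustration, not the in-house MATLAB code used in the study: the choice of sub-series lengths (powers of two starting at 8), the use of the population standard deviation, and the function names are all assumptions.

```python
import math
import random


def mean_rescaled_range(series, n):
    """Average R/S over the non-overlapping sub-series of length n."""
    ratios = []
    for m in range(len(series) // n):
        sub = series[m * n:(m + 1) * n]
        mean = sum(sub) / n
        std = math.sqrt(sum((z - mean) ** 2 for z in sub) / n)
        if std == 0.0:
            continue  # R/S is undefined for a flat sub-series
        cumulative, y_max, y_min = 0.0, float("-inf"), float("inf")
        for z in sub:
            cumulative += z - mean  # cumulative sum of mean-adjusted data
            y_max = max(y_max, cumulative)
            y_min = min(y_min, cumulative)
        ratios.append((y_max - y_min) / std)
    return sum(ratios) / len(ratios) if ratios else None


def hurst_rs(series, min_n=8):
    """Estimate H as the least-squares slope of log (R/S)_n against log n."""
    points = []
    n = min_n
    while n <= len(series) // 2:
        rs = mean_rescaled_range(series, n)
        if rs is not None and rs > 0.0:
            points.append((math.log(n), math.log(rs)))
        n *= 2  # assumed grid of sub-series lengths
    if len(points) < 2:
        raise ValueError("series too short for R/S analysis")
    mean_x = sum(p[0] for p in points) / len(points)
    mean_y = sum(p[1] for p in points) / len(points)
    num = sum((px - mean_x) * (py - mean_y) for px, py in points)
    den = sum((px - mean_x) ** 2 for px, _ in points)
    return num / den


# Example on synthetic data (not ECG data): white noise and its running sum.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
walk, total = [], 0.0
for step in noise:
    total += step
    walk.append(total)
h_noise = hurst_rs(noise)
h_walk = hurst_rs(walk)
```

Uncorrelated noise is expected to give an estimate near 0.5, while a strongly trending series such as a random walk gives a value close to 1, which is the kind of contrast the analysis relies on.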

Statistical Analysis
All the computations and statistical analyses were performed using in-house MATLAB v.R2016a (https://in.mathworks.com/) codes and GraphPad Prism v.7 (GraphPad Software, La Jolla California USA, www.graphpad.com). Graph plots were generated using MATLAB software and Adobe Illustrator. The overall methodology is summarized in the schematic diagram shown in Figure 1.

Observational results distinguish arrhythmia-affected ECG signals
The disease data and the normal data collected from the MIT-BIH databases were recorded for a time duration of 60 seconds. For the disease data series, 21600 data points were observed at a frequency of 360.01 s⁻¹, while 7680 data points were observed for the normal data series at a frequency of 128 s⁻¹. Amplitudes for such data sets

Selection of significant IMFs
As the signature of arrhythmia with respect to H was observed in the 1st IMFs, each such series was examined to determine whether it was significant, using equations 6 and 7 given in Methods. The correlation coefficients were calculated for all IMFs of ECG signals (Supplementary to be insignificant and were eliminated from further analysis.

Age and gender-based subgrouping for pattern reconfirmation
A detailed analysis of H was carried out on the basis of age and gender. The values of H for the age-based subgroups followed a trend similar to that observed for the whole data set ( respectively. Also, the p-values for the same are listed in the above-mentioned tables.

Discussion
Alteration in heart rate occurs due to systole and diastole, induced by the actions of SA node, AV node as well as the His-Purkinje system [35]. Dysregulation of these mechanisms leads to various cardiovascular diseases, including arrhythmia [36,37]. Presence of such cardiovascular diseases is usually identified by an observation of the ECG report of an individual, which is mainly the graphical representation of the electrical potential of the heart [38].
Here, Figure 2 reflects the distinction between the electrocardiograms of a normal and a diseased individual. The maximum (and minimum) amplitudes of all cycles are nearly constant throughout the measurement time for the normal data, whereas they vary rapidly in the ECG of the disease data. Therefore, from a primary observation of the ECG plot in Figure 2, one can recognize the deviation of the signal pattern of a diseased individual from that of a normal person. However, such a representation quantifies neither the seriousness nor the type of the disease. The diagnosis of most cardiac disorders therefore continues to depend largely on the intuitive analysis of ECG reports by the medical practitioner. Interpretation of such reports may vary from one physician to another and may lead to inappropriate treatment by inexperienced physicians. Thus, owing to the advances made in the domain of biophysics, the quality of diagnosis of cardiovascular diseases through ECG signals could be improved by building on non-linear statistics and machine learning approaches [40-42].
EMD is chosen over other well-known nonlinear data analysis methods because it is a data-driven mechanism, whereas other methods, like Fourier and wavelet-based methods, need predefined basis functions for representing a signal [43,44]. Here, we have evaluated the IMFs of each data series by calculating the Hurst exponent of every IMF. The components of a signal, i.e. the IMFs, can provide deep insights into the origin of the disease, which prompted us to characterise the ECG signals by critically analysing the features of the IMFs rather than the original signals. By omitting the H values for the patient IDs 19093, 19140 and 232 from our analysis, we found the H of the 1st IMFs to range from 0.7886 to 0.9297 (mean = 0.8631, median = 0.8705) for normal persons and from 0.9361 to 1.0324 (mean = 0.9972, median = 1.0025) for patients suffering from arrhythmia, which differs significantly from the normal data (Figure 5). This implies that the dynamic behaviour of the cardiac system of a normal person differs noticeably from that of an arrhythmia-affected individual. The statement holds for 56 time series, with 3 exceptions, i.e., 56 of 59 series, which corresponds to a probability of 94.92% of predicting arrhythmia in a patient by this analysis procedure.
The experimental result can also be examined with the help of gender-based and age-based grouping of the subjects. Such a distribution allows us to assess whether the H values (of the 1st IMFs only) are evenly distributed throughout the range and whether they follow a similar pattern within the subgroups. The mean H of the diseased male and female patients was significantly distinct from the mean of the normal subjects, as shown using the box and whisker plot in Figure 6B. The discrete margin between the H of normal and diseased persons is also consistent with the gender-based grouping. Moreover, Table 2 portrays the significant distinction between the H values corresponding to the different age groups of the diseased and normal subjects, which validates the assessment made earlier (Figure 6A).
All the 1st IMFs show persistent behaviour, as their H values are always greater than 0.5. However, the H values for the disease data (~1) are greater than those of the normal data. This observation suggests that the series of 1st IMFs obtained from the disease data exhibit a stronger trend than those of the normal data [39]. So, the Hurst exponent can be a potential nonlinear factor for accurate identification of arrhythmia-affected cardiac systems in humans.

Conclusion
Signal analysis by nonlinear techniques has varied applications in understanding crucial physiological processes and in devising new biomedical strategies for disease prediction and treatment. This study highlights that the EMD can be a powerful tool for extraction of nonlinear features from electrocardiographic data. Our proposed pipeline demonstrated 94.92% probability towards identification of arrhythmia-affected cardiac systems in a patient.
We showed that the Hurst exponent might serve as a potential indicator of the health of the human heart and can thereby help in the detection of cardiac arrhythmias arising from perturbations in the electrical activity of the heart.

Conflicts of interest/Competing interests:
The authors declare no conflict of interest.

Availability of data and material:
The raw data was collected from the publicly available data repository of

Figure 1
Temporal variation of ECG data: Comparative analysis of temporal variation of ECG time-series data of a normal subject (ID: 19090) (shown above) and an arrhythmic patient (ID: 100) (shown below).

Figure 2
Plots of IMFs retrieved following EMD Methodology: 12 IMFs were extracted by EMD performed on arrhythmic and normal ECG data. IMFs corresponding to the data of one arrhythmic patient (ID: 100) are shown.

Distinction in the H of 1st IMFs: The two-tailed Student's t-test between the H of all 1st IMFs corresponding to the disease data and normal data.

Distinction in the H of 1st IMFs based on age and gender: (A) A gender-based classification shows a variation in H of 1st IMFs between arrhythmic (Disease) and normal ECG signals between males and females. (B) An age-based subgrouping (< 30, 30-50, 50-70) also shows significant difference in H of 1st IMFs between arrhythmic (Disease) and normal ECG signals.