Assessment of Dual Tree Complex Wavelet Transform to improve SNR in collaboration with Neuro-Fuzzy System for Heart Sound Identification


Here we propose a novel de-noising method to improve heart sound (HS)-based identification of heart conditions. We applied the Dual Tree Complex Wavelet Transform (DTCWT) in combination with an Adaptive Neuro Fuzzy Inference System (ANFIS) classifier. The method consisted of three steps: first, pre-processing to eliminate 50 Hz noise; second, application of DTCWT to de-noise and reconstruct the time-domain HS signal; third, evaluation of ANFIS on a total of 2,735 HS recordings from an international dataset (PhysioNet Challenge 2016). The signal-to-noise ratio (SNR) with DTCWT was significantly improved (p < 0.001) compared with the original HS recordings. Quantitatively, there was an 11% increase in SNR after DTCWT. In addition, the ANFIS, using six time-domain features, achieved 55–86% precision, 51–98% recall, 53–86% F-score, and 54–86% MAcc, which we compared with other attempts on the same dataset. Therefore, DTCWT is a successful technique for de-noising signals such as HS recordings, and the adaptive property of ANFIS showed capability in classifying them.


Introduction
Healthy cardiac valves (CV) play an essential role in overall health [1]. The functional condition of the CV can be examined through the sounds of valve opening and closing during heartbeats. The heart sounds (HS) recorded by a phonocardiograph (PCG) consist of four sounds: S1, S2, S3, and S4 [1,3].
Various CV diseases can be detected from the heart sounds, mostly during S1 and S2, but the accuracy of such a diagnosis depends largely on the experience and expertise of the cardiologist [2,3].
However, the PCG is a complicated, non-stationary, nonlinear low-frequency signal that is easily corrupted by interference from surrounding signal sources [3,4].
Since the S1-S2 section is the vital portion of the HS for diagnosing CV diseases from the PCG, human errors in the diagnosis may be life-threatening [3]. To aid cardiologists in distinguishing healthy Normal Heart Sounds (NrHS) from Pathological Heart Conditions (HC), various signal processing and artificial intelligence algorithms have been applied to S1 and S2 [2]. To date, 1,347 studies have addressed algorithms for the automatic analysis and classification of HS [2]. The earliest attempt was in 1963 by Gerbarg et al., who applied signal processing analysis alone to 1,000 HS recordings [1]. Since then, hundreds of attempts have been reported that applied more advanced information technology, such as artificial neural network (ANN) classifiers and mathematical transforms including the Fourier transform (FT) and the wavelet transform (WT). Most previously applied signal processing pipelines had three steps: a de-noising process, the extraction of signal "features" to discern HC from NrHS, and the application of a "classifier" such as an ANN or Deep Learning (DL), i.e. machine learning [5]. The signal features are usually extracted from the time domain, wavelet domain, frequency domain, or morphological operations [6][7][8][9]. Many types of ANN and DL techniques have been applied to different datasets with a range of accuracies [10][11][12][13].
To facilitate research on efficient and reliable computer algorithms that support the diagnosis of heart diseases, international databases have been established containing large PCG datasets of HC and NrHS. PhysioNet Challenge 2016 is one example, established at the Massachusetts Institute of Technology (MIT) in the United States [14].
In this study, we tested a new technique to de-noise HS, the Dual Tree Complex Wavelet Transform (DTCWT), on the PhysioNet Challenge 2016 dataset. The DTCWT was derived from the discrete wavelet transform (DWT) to overcome drawbacks in the DWT sampling procedure, which lead to inadequate noise elimination, primarily because the DWT lacks the anti-aliasing and shift-invariance properties [29].
Our findings from 2,735 HS signals indicated that the signal-to-noise ratio (SNR) was improved by DTCWT. A set of time-domain HS signal features was extracted and analyzed using the Adaptive Neuro Fuzzy Inference System (ANFIS). Our study suggests that DTCWT may be an efficient and reliable tool for processing HS data ahead of classification, as compared with other studies on PhysioNet 2016.

Materials and methods
The PhysioNet Challenge is an international dataset of HS signals acquired from multiple sites around the world, covering non-clinical and clinical health organizations, professional and non-professional supervision, and various medical recording instruments. It consists of five classes, labeled A, B, C, D, and E, containing NrHS and HC signals [14]. The dataset contains a total of 3,153 HS recordings, with HC severity unevenly distributed among the classes [5]. In addition, each class contains some external noise, such as voices from an uncontrolled environment [5,14].
In this research, we downloaded the HS recordings from all the five classes in the dataset.
These HS recordings were divided into training and test sets using the 80-20% split protocol widely applied with ANN. We combined the class C data with class D because their low numbers of HS signals would not otherwise allow the 80-20% split. To meet the objectives of this paper, we discarded obviously distorted HS recordings that contained unusual external sounds, based on our experience as biomedical engineers, leaving 2,735 of the 3,153 HS recordings, approximately 87% of the entire dataset, as listed in Table 1. As the HS signal is periodic, with S1 and S2 repeating, we used a 5-second HS segment to cover S1 and S2, with the signals sampled at 2,000 Hz as per the PhysioNet specifications. Figure 1 shows the block diagram of the proposed method. It differs from our previous work [4] in that here we investigated a new de-noising technique (DTCWT) and employed ANFIS with time-domain features instead of spectral-domain features. To the best of our knowledge, the DTCWT, a tool that permits quantitative SNR analysis, has not been explored for the HS recordings in PhysioNet Challenge 2016.
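The split protocol and segment length described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the random seed, and the recording identifiers are hypothetical, and the actual per-class counts are those in Table 1.

```python
import random

FS_HZ = 2000          # sampling rate stated in the text
SEGMENT_SECONDS = 5   # segment length stated in the text


def split_80_20(items, seed=0):
    """Shuffle a list of recordings and split it 80% train / 20% test.

    The seed is a hypothetical choice made only for reproducibility.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(0.8 * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def samples_per_segment(fs_hz=FS_HZ, seconds=SEGMENT_SECONDS):
    """Number of samples in one fixed-length HS segment."""
    return fs_hz * seconds


if __name__ == "__main__":
    # Hypothetical recording identifiers for one class.
    recordings = [f"rec_{i:04d}" for i in range(100)]
    train, test = split_80_20(recordings)
    print(len(train), len(test), samples_per_segment())
```

Applying the split separately to each class (with C and D merged) keeps the class proportions the same in the training and test sets.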

Pre-processing of HS Signal
Each HS signal was pre-processed by applying a Notch filter to eliminate the primary 50-60 Hz noise.
Then, a Butterworth filter was applied with two cutoff frequencies, FC1 = 0.025 and FC2 = 0.4 [30,31]. Figure 1 shows an example of an HC signal after applying the previous steps.
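A minimal sketch of these two pre-processing steps, using SciPy, is given below. Several choices are assumptions not stated in the text: the cutoffs FC1 and FC2 are interpreted as fractions of the Nyquist frequency (25 Hz and 400 Hz at a 2,000 Hz sampling rate), the filter order is 4, the notch is centered at 50 Hz with a quality factor of 30, and zero-phase (forward-backward) filtering is used.

```python
import numpy as np
from scipy import signal

FS_HZ = 2000.0  # sampling rate stated in the text


def preprocess(x, fs=FS_HZ, mains_hz=50.0, q=30.0, fc=(0.025, 0.4), order=4):
    """Notch out mains interference, then band-pass with a Butterworth filter.

    mains_hz, q and order are assumed values; fc is taken as normalized
    cutoffs (1.0 = Nyquist), which is an interpretation of FC1 and FC2.
    """
    b, a = signal.iirnotch(mains_hz, q, fs=fs)
    x = signal.filtfilt(b, a, x)  # zero-phase notch
    sos = signal.butter(order, fc, btype="bandpass", output="sos")
    return signal.sosfiltfilt(sos, x)  # zero-phase band-pass


def tone_amplitude(x, freq_hz, fs=FS_HZ):
    """Amplitude of the FFT bin closest to freq_hz (helper for checking)."""
    spectrum = np.abs(np.fft.rfft(x)) / len(x)
    k = int(round(freq_hz * len(x) / fs))
    return 2.0 * spectrum[k]


if __name__ == "__main__":
    t = np.arange(int(5 * FS_HZ)) / FS_HZ
    x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 100 * t)
    y = preprocess(x)
    print(tone_amplitude(x, 50), tone_amplitude(y, 50))
    print(tone_amplitude(x, 100), tone_amplitude(y, 100))
```

In this synthetic check the 50 Hz tone is strongly attenuated while the 100 Hz tone, which lies inside the assumed pass band, is largely preserved.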

Dual Tree Complex Wavelet Transform
The DWT, developed to aid signal analysis, suffers from filtering difficulties during the sampling procedure, leading to inadequate noise elimination because it lacks the anti-aliasing and shift-invariance properties [29]. To overcome such limitations, the DTCWT was developed to enhance the noise removal of the DWT [31]. The DTCWT includes several sequential steps during the decomposition and reconstruction processes. Because these two processes are applied to the real and imaginary constituents of the signal, the transform is termed Dual Tree. Each tree is a one-dimensional DWT, and together the two trees represent the real and imaginary parts, hence the name Complex Wavelet Transform. As shown in Figure 2, each tree can be considered a Filter-Bank (FB) tree containing a low-pass filter (h0 or g0) and a high-pass filter (h1 or g1). During the decomposition process, the low- and high-pass filters divide the signal into Approximation Coefficients (cA) and Detail Coefficients (cD). The sizes of cA and cD are successively reduced by a factor of 2 at each decomposition level. The factor of 2 was selected because it gives the optimum DTCWT performance, as indicated in ref [29]. During the reconstruction process, the reverse procedure is applied to rebuild the signal successively with progressive de-noising, leading to optimal reconstruction with minimal aliasing and shift-variance effects [29].
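The halving of cA and cD at each level, and the reverse reconstruction, can be illustrated with a much simpler stand-in. The sketch below is a single-tree Haar DWT filter bank, not the DTCWT itself: it has one tree instead of two and uses Haar filters rather than the h0/h1 and g0/g1 filters of Figure 2. All function names are hypothetical.

```python
import numpy as np


def haar_analysis(x):
    """One analysis level: low-pass and high-pass, each downsampled by 2."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # cA
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # cD
    return approx, detail


def haar_synthesis(approx, detail):
    """One synthesis level: upsample by 2 and recombine cA and cD."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def decompose(x, levels):
    """Multi-level decomposition; len(x) must be divisible by 2**levels."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_analysis(approx)
        details.append(detail)
    return approx, details


def reconstruct(approx, details):
    """Reverse procedure: rebuild the signal from the coarsest level up."""
    for detail in reversed(details):
        approx = haar_synthesis(approx, detail)
    return approx


if __name__ == "__main__":
    x = np.random.default_rng(0).normal(size=64)
    cA, cDs = decompose(x, levels=3)
    print(len(cA), [len(d) for d in cDs])          # sizes halve per level
    print(np.allclose(reconstruct(cA, cDs), x))    # perfect reconstruction
```

A de-noising step would modify the detail coefficients between `decompose` and `reconstruct`; how the coefficients were treated in this study is not specified here, so that step is left out of the sketch.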
In particular, the DTCWT satisfies the well-known Hilbert analytic condition, illustrated at the top of Figure 1 (dashed box): the complex scaling function and the complex wavelet function together generate a Hilbert pair. Equation 1 shows its mathematical representation as previously described [29,31,32]:
$$\psi(t) = \psi_h(t) + j\,\psi_g(t) \qquad (1)$$
where $\psi_h(t)$ is real (even), $j\,\psi_g(t)$ is imaginary (odd), and $\psi(t)$ is the DTCWT signal (analytic). The best-performing DTCWT design is achieved [31,33] when the internal structure of the FBs satisfies the following three requirements:
1. Perfect reconstruction (PR), so that the reconstructed signal is identical to the original (input) signal.
2. Successful application of a half-sample shift between the low-pass filters (h0 and g0).
3. A shift of one sample at the first divergence of the dual tree, with respect to g0 and g1, respectively.
Therefore, DTCWT addresses the drawbacks of DWT by permitting q-shift and anti-aliasing during the analysis and synthesis processes, in accordance with the Hilbert pair. This suggests that DTCWT may lead to better de-noising performance. The SNR of HS signals is therefore expected to improve, allowing more precise time-domain feature extraction.
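The Hilbert-pair property can be illustrated on a generic test tone. The sketch below is not a DTCWT: it builds an analytic signal with an FFT-based Hilbert transform and checks that the real and imaginary parts form a Hilbert pair, so that the complex signal has no negative-frequency content. The function name and the test tone are illustrative assumptions.

```python
import numpy as np


def analytic_signal(x):
    """Return x + j*H{x}, where H is the Hilbert transform (FFT method)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)  # negative frequencies zeroed


if __name__ == "__main__":
    t = np.arange(1000) / 1000.0
    x = np.cos(2 * np.pi * 10 * t)        # real (even) part
    z = analytic_signal(x)
    # The imaginary part is the Hilbert transform of the cosine, i.e. a sine.
    print(np.allclose(z.imag, np.sin(2 * np.pi * 10 * t)))
    # One-sided spectrum: negative-frequency bins are (numerically) empty.
    print(np.abs(np.fft.fft(z))[501:].max())
```

In the DTCWT, the two trees are designed so that their wavelets approximate such a pair, which is what reduces aliasing and shift sensitivity relative to a single real DWT.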

SNR Calculations
HS signals frequently contain many types of ambient noise, such as electronic circuit noise and noise from the interface between electrodes and skin [5]. Often, one type of noise ("A") is masked by another ("B"), which makes it challenging to identify precisely the types of noise embedded in the HS signals [34,35]. It is therefore recommended to calculate the SNR from the accumulated residual noise. We calculated the SNR of HS using equation (2) and converted it to decibels (dB) using equation (3).
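One common way to compute a residual-based SNR is sketched below. It is an assumption, not necessarily the definition used in equations (2) and (3): the noise is taken as the residual between the original and the de-noised signal, the SNR is the ratio of signal energy to residual energy, and the decibel conversion is 10·log10 of that ratio. The function names are hypothetical.

```python
import math

import numpy as np


def snr_db(denoised, original):
    """SNR in dB, treating (original - denoised) as the accumulated noise.

    Assumed definition: 10 * log10(sum(denoised^2) / sum(residual^2)).
    """
    denoised = np.asarray(denoised, dtype=float)
    original = np.asarray(original, dtype=float)
    residual = original - denoised
    ratio = np.sum(denoised ** 2) / np.sum(residual ** 2)
    return 10.0 * math.log10(ratio)


def percent_change(before, after):
    """Relative change between two SNR values, in percent."""
    return 100.0 * (after - before) / abs(before)


if __name__ == "__main__":
    denoised = [3.0, 3.0, 3.0, 3.0]
    original = [4.0, 2.0, 4.0, 2.0]   # residual energy 4, signal energy 36
    print(snr_db(denoised, original))  # 10*log10(9), about 9.54 dB
```

A percentage difference such as the one reported for SNR can then be obtained by passing the before and after values to `percent_change`.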

Feature Extractions
From the HS signals after DTCWT, we extracted a set of seven features. The six features showing significant improvements, namely Entropy, Skewness, Kurtosis, STDev, Max, and Min, were normalized between 0 and 1 before being applied to the ANFIS using equation (5), adopted from a previous study [4]:
$$\bar{x}_{n,j} = \frac{x_{n,j} - x_j^{\min}}{x_j^{\max} - x_j^{\min}} \qquad (5)$$
where $x_{n,j}$ and $\bar{x}_{n,j}$ are the original and normalized j-th feature of the n-th sample, respectively, and $x_j^{\min}$ and $x_j^{\max}$ are the minimum and maximum of the j-th feature values calculated over all 2,735 samples. In other words, the j-th feature (j = 1 to 6) for the n samples (n = 1 to 2,735) was normalized to values between 0 and 1. Thus, the classification process, described in the following section, should not be affected by the different magnitudes of the HS signals.
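The six features and the min-max normalization can be sketched as below. The exact feature definitions are assumptions: entropy is taken as the Shannon entropy of a 32-bin amplitude histogram, and skewness and kurtosis are the standardized third and fourth moments (kurtosis without subtracting 3). All function names are hypothetical.

```python
import numpy as np


def shannon_entropy(x, bins=32):
    """Shannon entropy (bits) of the amplitude histogram; bins is assumed."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def time_domain_features(x, bins=32):
    """Return [entropy, skewness, kurtosis, std, max, min] for one signal.

    The signal must not be constant, otherwise the moments are undefined.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    skewness = np.mean(centered ** 3) / m2 ** 1.5
    kurtosis = np.mean(centered ** 4) / m2 ** 2
    return np.array([shannon_entropy(x, bins), skewness, kurtosis,
                     np.sqrt(m2), x.max(), x.min()])


def min_max_normalize(features):
    """Scale each feature column to [0, 1] over all samples."""
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (features - lo) / span


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signals = rng.normal(size=(20, 1000)) * rng.uniform(0.5, 2.0, size=(20, 1))
    table = np.vstack([time_domain_features(s) for s in signals])
    scaled = min_max_normalize(table)
    print(scaled.min(axis=0), scaled.max(axis=0))
```

Each row of `table` corresponds to one recording and each column to one feature, so the normalization runs column by column, matching the per-feature minimum and maximum described in the text.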

Adaptive Neuro Fuzzy Inference System (ANFIS)
The ANFIS is a machine-learning (ML)-based classifier algorithm that has been used for a variety of engineering problems in biomedical fields. It is a rule-based method originally developed by Jang [36].
The ANFIS combines the learning ability of ANN-based ML with a fuzzy inference system, reaching decisions through fuzzy logic that takes into account the membership degree of the input-output variables [36].
The ANFIS architecture has two fuzzy if-then rules based on the Sugeno model. It has two sets of input rules, which are applied to generate one output. The connection between the two input rules and the Sugeno fuzzy output is constructed in five layers of nodes: two layers are adaptive, while the other three are fixed.
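The five-layer, two-rule structure can be sketched as a single forward pass of a first-order Sugeno system. This is an illustration only: it uses two inputs rather than the six features of this study, Gaussian membership functions are an assumed choice, and all parameter values are hypothetical. Training (tuning the premise and consequent parameters) is not shown.

```python
import math


def gaussian_mf(value, center, sigma):
    """Gaussian membership degree in [0, 1] (assumed membership shape)."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS.

    premise:    per rule, ((center_A, sigma_A), (center_B, sigma_B))
    consequent: per rule, (p, q, r) so that f = p*x + q*y + r
    """
    firing = []
    for (a_center, a_sigma), (b_center, b_sigma) in premise:
        mu_a = gaussian_mf(x, a_center, a_sigma)   # layer 1 (adaptive)
        mu_b = gaussian_mf(y, b_center, b_sigma)
        firing.append(mu_a * mu_b)                 # layer 2 (fixed): product
    total = sum(firing)
    normalized = [w / total for w in firing]       # layer 3 (fixed)
    rule_outputs = [wn * (p * x + q * y + r)       # layer 4 (adaptive)
                    for wn, (p, q, r) in zip(normalized, consequent)]
    return sum(rule_outputs)                       # layer 5 (fixed): sum


if __name__ == "__main__":
    premise = [((0.0, 1.0), (0.0, 1.0)), ((1.0, 1.0), (1.0, 1.0))]
    consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]
    # Midway between both rule centers the rules fire equally, so the
    # output is the average of the two rule outputs (1.0 and 2.0).
    print(anfis_forward(0.5, 0.5, premise, consequent))
```

For classification, the continuous output would still need to be mapped to a class label, for example by thresholding; the rule used in this study is not specified here.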
We applied ANFIS to each class of the HS dataset. Figure 3 shows

Results
For each HS signal (normal or abnormal), the SNR was calculated twice: before and after applying DTCWT. In the second experiment, we calculated the time-domain features for each HS signal after DTCWT. The ANFIS classifier was then trained and applied to each test set. Recall, precision, and F-score were reported for each class in the dataset. The MAcc, which is often used in recent literature, was also reported. Table 3 shows that the average precision, recall, F-score, and MAcc were 0.68, 0.81, 0.74, and 75%, respectively. Figure 4 shows box plots of the SNR measurements on HS recordings for the five classes in the dataset (Table 1), indicating the increase in SNR after applying DTCWT. Finally, we compared our findings with several previous findings obtained with the wavelet transform and other approaches applied to PhysioNet Challenge 2016 (Table 4).
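The reported metrics can be computed from confusion-matrix counts as sketched below. The definition of MAcc as the mean of sensitivity (recall) and specificity is an assumption based on how the score is commonly defined for this challenge; the function names and the example counts are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, specificity, F-score and MAcc from raw counts.

    MAcc is assumed to be (sensitivity + specificity) / 2.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score(precision, recall),
        "macc": (recall + specificity) / 2.0,
    }


if __name__ == "__main__":
    # Hypothetical counts for one class.
    print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
    # The average precision and recall quoted in the text give F = 0.74.
    print(round(f_score(0.68, 0.81), 2))
```

The second call shows that the average F-score of 0.74 in Table 3 is consistent with the average precision (0.68) and recall (0.81) quoted alongside it.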

Discussion
The results in Table 2 and Figure 4  These findings are consistent with some studies suggesting that classes B and E contain the poorest-quality HS recordings [14,21]. Kay E. et al. [16] argued that Class E should be excluded from PhysioNet Challenge 2016 because it contains HS recordings that are not clinically practical, which was also claimed by Gjoreski et al. [5].
Meanwhile, the ANFIS classification performance in Table 3 shows that the ANFIS outputs on classes B and E were unsatisfactory in terms of accuracy. The ANFIS classifier is a machine-learning algorithm that can be trained to provide optimal performance, so it is unlikely to be the cause.
Therefore, the shortfall in performance can be attributed to the quality of the HS recordings, likely suggesting that classes B and E contained impractical HS signals, as claimed in references [10,16,25,26]. Potential solutions for this issue could be either to increase the number of input features to the ANFIS classifier, as in references [19,21,28,37], or to segment the S1 and S2 portions from the five-second HS recordings, as in references [17,18,20,28].
Nonetheless, our ANFIS achieved 73-86% precision, 87-98% recall, 84-86% F-score, and approximately 86% MAcc (Table 3). This was achieved on all HS signals in classes A, C, and D, which presumably contained the correct S1-S2 segments, suggesting that the adaptive property of ANFIS with only six time-domain features performs well in classifying biosignals such as HS recordings. Our outcome differs from our previous finding [4] in several respects. First, here we tested DTCWT for de-noising HS signals instead of the Fourier bispectrum. Second, we employed time-domain instead of frequency-domain features as inputs to ANFIS. Third, we employed kurtosis and skewness as inputs to ANFIS. Since we applied the new method in Figure 1 to 2,735 instead of 1,738 signals, we consider the outcomes in this paper to be more reliable. Therefore, combining the results in this paper along with our previous study [4]
It is noteworthy that the performance of DTCWT and ANFIS may be affected by several parameters. One of these is the Butterworth filter cut-off frequencies. However, we speculate that a marginal change in the cut-off frequencies would make only a marginal change in performance [30].
Other parameters likely to affect the performance are cA and cD, the approximation and detail coefficients in the DTCWT, respectively [29]. Follow-up prospective research on those two parameters may further improve the DTCWT reconstruction of the HS signal, followed by an improvement in ANFIS performance. Finally, the number of decomposition and reconstruction levels in the DTCWT in Figure 2 could also have an impact.

Conclusion
In summary, this paper evaluated de-noising of HS recordings by DTCWT, which has not been attempted before. The results showed a statistically significant SNR improvement (p < 0.001), with an 11% increase in SNR. The adaptive property of the ANFIS classifier with six time-domain features resulted in an encouraging accuracy (74.5%) on 2,735 samples. If non-clinically-accepted HS recordings were excluded, as in reports [10,16,17,20,24,38,40], the proposed approach would reach 86% accuracy, which falls within the range of accuracies in these reports, indicating the capability of DTCWT and ANFIS as a promising tool.