An Optimal Approach for Heart Sound Classification Using Artificial Neural Network

Heart sound auscultation is one of the most widely used approaches for detecting cardiovascular disorders. Diagnosing heart sound abnormalities with a stethoscope depends on the physician's skill and judgement. Several studies have shown promising results in the automatic detection of cardiovascular disorders based on heart sound signals. However, classification accuracy still needs to be improved, as automated heart sound classification aids in the early detection and prevention of the dangerous effects of cardiovascular problems. In this study, an optimal heart sound classification method based on machine learning technologies for cardiovascular disease prediction is presented. It consists of three steps: pre-processing, which segments the PhysioNet Challenge 2016 recordings into 5 s durations; feature extraction using mel-frequency cepstral coefficients (MFCC); and classification using an artificial neural network (ANN) with one hidden layer, which keeps parameter consumption low. Ten-fold cross-validation was used to evaluate the performance of the proposed method. The best model obtained 94% accuracy and a 93% AUC score, assessed on 1626 test datasets. Taken together, the results show that the proposed method achieves excellent classification performance with low parameter consumption, thereby reducing computational time and facilitating real-time implementation.


Introduction
Cardiovascular disease is one of the main causes of mortality worldwide. In 2016, an estimated 17.9 million people died prematurely from cardiovascular disease, accounting for 31% of all global deaths; heart attacks and strokes were responsible for 85% of these deaths [1]. Given this high mortality rate, cardiovascular problems should be diagnosed early to avoid long-term complications and premature cardiac death.
Electrocardiograms (ECGs) and heart sound recordings, or phonocardiograms (PCGs), are commonly used to diagnose cardiovascular disease. The PCG, a graphical representation of the heart sound signal, captures the heart valve opening time more accurately than the ECG [2]. Heart sound signals thus contain important physiological information about cardiac conditions that can be used to detect cardiac organ deformation as well as valve damage [3]. On the other hand, cardiac auscultation depends on the physician's abilities and subjective experience [4]. Therefore, an objective, automatic, computer-assisted analysis of heart sound signals is very important for the early diagnosis of cardiovascular diseases [5] and can potentially prevent premature death.
Automatic heart sound classification is currently a promising research field based on signal processing and artificial intelligence approaches. It provides a reliable way to screen or monitor for cardiac diseases in a wide range of clinical settings [6], reducing the need for costly and laborious manual examinations.
Several studies have proposed algorithms to detect cardiovascular disorders based on heart sound signals.
Previously, Rubin et al. reported 84% test accuracy using mel-frequency cepstral coefficients (MFCC) and a two-dimensional convolutional neural network (2D CNN) [7]. In the preprocessing step, 3 s segments of the heart sound signals were selected. They then extracted 13 MFCC features and converted these into 2D heat-map images as input for the 2D CNN. Nogueira et al. also used 2D heat-map images, in their case as input for a support vector machine (SVM) classifier; this method achieved an accuracy of 82.33% [8].
Meanwhile, Xiao et al. reported a validation accuracy of 93%, verified via 10-fold cross-validation, using a one-dimensional convolutional neural network (1D CNN) [9]. Their preprocessing consisted of resampling at 2000 Hz, removing noise with a band-pass filter, and applying a sliding window with 3 s patches and a 1 s stride. The authors claimed that the proposed model provided extremely low parameter consumption. However, the method was not evaluated on test datasets separate from the training and validation datasets.
Li et al. proposed a CNN as the classification method [10]. However, their method did not extract features directly with the CNN but required a separate multi-feature extraction process. The model was tested on 831 test datasets and reported an accuracy of 86.8% using 5-fold cross-validation. In a study by Khrisnan et al., 6 s heart sound recordings were used as input for a feed-forward neural network (FNN) [11]. They verified the model using 5-fold cross-validation and reported an accuracy of 85.65% on 100 test datasets.
Al-Naami et al. reported an accuracy of 89% using high-order spectral analysis and an adaptive neuro-fuzzy inference system (ANFIS) [12]. However, they used only 1837 heart sound recordings, from folders 'a', 'b', and 'e' of the PhysioNet Challenge 2016 datasets. Khan et al. reported 91.39% accuracy using MFCC for feature extraction and an LSTM as the classifier [13]. He et al. extracted 512 features using several feature extraction methods, such as the Hilbert envelope, homomorphic environment map, wavelet envelope, and power spectral density envelope, as inputs for a 1D CNN; this study reported an accuracy of 87.3% [14]. Jeong et al. used 5 s heart sound recordings and discarded datasets shorter than 5 s [15]. They applied the short-time Fourier transform (STFT) to move to the time-frequency domain and generated spectrogram images of the heart sound signals as input for a 2D CNN. This method obtained 91% accuracy on 208 test datasets.
There are several challenges in developing automated cardiac disorder detection using the PhysioNet Challenge datasets, such as the imbalance between normal and abnormal recordings and the variation in recording length caused by the different recording procedures of clinical sites around the world. The aforementioned studies reported promising results for automated heart sound detection. However, accuracy needs to be improved further, because the early diagnosis of heart abnormalities is critical for saving patients' lives. In addition to accuracy, low parameter consumption must be considered. Although deep networks provide good accuracy and can extract information directly from raw signals, their complex architectures entail large numbers of parameters and long computational times.
Considering the limitations of the previous studies, we propose an algorithm for detecting cardiac abnormalities from heart sound signals that not only improves accuracy but also keeps parameter consumption low. To achieve this, we use the mel-frequency cepstral coefficient (MFCC) as the feature extraction method, which offers many advantages, such as capturing the important information contained in the audio signal while representing it with as little data as possible. Furthermore, the proposed classification method is a simple ANN containing one hidden layer that classifies heart sound signals as normal or abnormal.

Results
We applied 10-fold cross-validation using 10968 training datasets and 3833 validation datasets to find the best model. The proposed model was then assessed using 1626 test datasets, 1236 normal and 390 abnormal, that were not part of the training and validation datasets. After several simulations, the best model was obtained from the first fold at epoch 46 with a batch size of 64, using the Adam optimizer with a learning rate of 0.001.
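The validation scheme above can be sketched with scikit-learn's stratified splitter; the feature matrix, labels, and sizes below are random stand-ins for the real MFCC data, not the actual PhysioNet datasets.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Random stand-ins for the 128-dimensional MFCC feature matrix and the
# binary labels (0 = normal, 1 = abnormal); shapes are illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))
y = rng.integers(0, 2, size=200)

# Stratified splitting keeps the normal/abnormal ratio similar in every
# fold, which matters for an imbalanced dataset like this one.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
folds = list(skf.split(X, y))
print(len(folds))  # 10 train/validation splits; the best model is then
                   # selected by validation performance across the folds
```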
Another important factor affecting system performance is the number of features used as input to the ANN. Based on the results, 128 MFCC features fed into the ANN provided the best performance, with 94% accuracy and a 93% AUC score, compared with 13, 40, and 96 MFCC features, as shown in Table 1. Precision, recall, and F1-score were also computed to evaluate the proposed model. With 128 MFCC features, the model obtained a precision of 97% and 85% for normal and abnormal conditions, respectively; recall was 95% for normal and 90% for abnormal conditions, and the F1-scores for normal and abnormal conditions were 95% and 88%, respectively. Although the proposed model still produces some false detections, as shown in Fig. 1a, most of the test datasets were successfully classified into their correct class. In addition, the receiver operating characteristic (ROC) curve plots the true positive rate (TPR) against the false positive rate (FPR), and the area under the curve (AUC), as a summary of the ROC curve, measures a classifier's ability to distinguish between classes: the higher the AUC score, the better the model separates normal from abnormal heart sound conditions. As shown in Fig. 1b, the AUC score was 93%. Therefore, we conclude that the proposed model copes well with the imbalanced datasets and generalizes well to the test datasets.
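The ROC/AUC evaluation described here can be reproduced with scikit-learn; the labels and scores below are toy values, not the paper's actual predictions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy ground-truth labels (1 = abnormal) and predicted abnormal-class
# probabilities, standing in for the real test-set outputs.
y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.3, 0.2, 0.8, 0.7, 0.9, 0.6, 0.4])

fpr, tpr, _ = roc_curve(y_true, y_prob)  # points of the ROC curve (FPR vs TPR)
auc = roc_auc_score(y_true, y_prob)      # area under that curve
print(round(auc, 4))  # 0.9375 for these toy values
```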

Discussion
In this study, an automated classification of cardiovascular disorders based on heart sound was proposed using MFCC for feature extraction and an ANN for classification. The proposed method outperformed the studies released for the PhysioNet Challenge 2016, whose best reported accuracy was 86.02% [16]. Moreover, the results obtained in this study are superior to those of the latest studies using the same dataset, as shown in Table 2. Notably, the proposed method was tested using the datasets available on the PhysioNet Challenge 2016 website. In terms of preprocessing, we examined the selection of the heart sound duration, since an optimized duration yields the best accuracy. Khrisnan et al. and Liu et al. suggested that a recording of at least 5 s is required for reliable detection of heart abnormalities. Therefore, the duration was set to 5 s, as in previous studies [12,15]. Compared with the aforementioned studies that used the same duration to detect abnormalities in heart sound signals, the proposed model obtained superior accuracy, as shown in Table 2.
For feature extraction, the number of MFCC features was selected as shown in Table 1. In most studies on heart sound classification, machine learning and deep learning algorithms share the same goals: high accuracy and low parameter consumption, which together reduce computational time. Deep learning can reliably extract heart sound characteristics automatically from the raw signals. On the other hand, its training process takes a long time and requires a larger dataset to avoid overfitting, and a complex architecture produces a larger number of parameters and a longer computation time. This study therefore proposed a simple ANN consisting of 128 input nodes, one hidden layer with 256 nodes, and two output nodes, yielding 33538 parameters, a low count for machine learning that provides fast computational time.
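The parameter count of the 128-256-2 network quoted above follows from the usual dense-layer arithmetic (weights plus biases):

```python
# Each fully connected layer has (inputs x outputs) weights plus one
# bias per output node.
hidden_params = 128 * 256 + 256  # input -> hidden: 33024 parameters
output_params = 256 * 2 + 2      # hidden -> output: 514 parameters
print(hidden_params + output_params)  # 33538, matching the count in the text
```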
The public heart sound datasets used in this study are imbalanced between normal and abnormal recordings, which makes accurate heart sound classification challenging in real clinical applications. The model was therefore evaluated using the ROC curve to ensure that it was neither underestimated nor overestimated because of dataset imbalance, since the ROC does not depend on the relative sizes of the classes.
The characteristics of normal and abnormal heart sound time series were captured by MFCC as the feature extraction method and successfully identified by the simple ANN employed in this study. The best model was selected via 10-fold cross-validation and evaluated on 1626 test datasets. The 94% test accuracy and 93% AUC score show that the proposed method can overcome the problem of an imbalanced dataset. Moreover, the low parameter consumption results in fast computational time for screening heart sound abnormalities for heart disease identification, helping cardiologists devise faster and more appropriate treatment plans for patients.

Methods
In this study, we proposed an optimal heart sound classification system using MFCC as a feature extraction method and ANN as a classification method. A general block diagram of the proposed method is presented in Fig. 2a.

Dataset
This study uses heart sound signals collected from the PhysioNet Challenge 2016 datasets, the most widely used in previous studies of heart sound analysis [17,18]. The datasets consist of six folders, 'a' to 'f', with a total of 3160 heart sound recordings, of which 2495 are normal and 665 are abnormal conditions such as mitral regurgitation, aortic stenosis, and valvular surgery. The datasets were provided by different medical institutions using different recording settings.
The duration of each heart sound recording varies from 5 to 120 s, with a 2000 Hz sampling frequency. Recent studies have indicated that 5 s is the minimum duration required to detect heart sound abnormalities. Therefore, this study divided the heart sounds into short 5 s recordings using time-series segmentation. This process also expands the heart sound datasets, generating 16251 segments in total, consisting of 12418 normal and 3833 abnormal conditions.
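A minimal sketch of this time-series segmentation step; the function name and the drop-the-incomplete-tail policy are our assumptions, as the text does not state how partial tails were handled.

```python
import numpy as np

FS = 2000          # PhysioNet sampling frequency (Hz)
SEG_LEN = 5 * FS   # one 5 s segment = 10000 samples

def segment(signal, seg_len=SEG_LEN):
    # Split a recording into non-overlapping 5 s pieces; the incomplete
    # tail is dropped (an assumption, not stated in the text).
    n = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n)]

rec = np.zeros(23 * FS)  # a 23 s recording...
segs = segment(rec)
print(len(segs))         # ...yields 4 full 5 s segments
```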

Feature extraction using MFCC
Feature extraction of heart sound using MFCC consists of several processes [8], as shown in Fig. 2b: pre-emphasis, framing, windowing, and the fast Fourier transform (FFT), which converts the heart sound signal from the time domain to the frequency domain; the result is then mapped to the mel-frequency domain through a mel filter bank. Each mel filter bank value represents the amount of energy in that filter's frequency range. A non-linear transformation is then applied by taking the natural logarithm of each mel filter output. In the last process, the discrete cosine transform (DCT) returns the signal to a time-like (cepstral) domain and produces the MFCC features used as input to the ANN. The numbers of MFCC features evaluated in this study were 13, 40, 96, and 128.
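The pipeline above can be sketched end to end in NumPy/SciPy; the frame length, hop size, filter count, and FFT size below are illustrative choices, not the paper's exact settings.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, fs=2000, n_mfcc=13, n_mels=26, frame_len=400, hop=200):
    """Sketch of the MFCC steps described above: pre-emphasis, framing,
    windowing, FFT, mel filter bank, log, DCT. Parameters are illustrative."""
    # 1) Pre-emphasis boosts high-frequency content.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2) Framing and 3) Hamming windowing.
    n_frames = 1 + (len(sig) - frame_len) // hop
    frames = np.stack([sig[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    # 4) FFT -> power spectrum of each frame.
    n_fft = 512
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 5) Triangular mel filter bank, evenly spaced on the mel scale.
    mel_pts = np.linspace(0, 2595 * np.log10(1 + (fs / 2) / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 6) Log of the filter-bank energies, then 7) DCT to decorrelate them.
    energies = np.log(power @ fbank.T + 1e-10)
    return dct(energies, type=2, axis=1, norm='ortho')[:, :n_mfcc]

# One 5 s segment at 2000 Hz -> a (frames x coefficients) feature matrix.
coeffs = mfcc(np.random.default_rng(0).normal(size=10000))
print(coeffs.shape)  # (49, 13) with these settings
```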
Classification using ANN
An ANN is a fully connected structure that mainly comprises three layers [19]: input, hidden, and output layers (Fig. 2a). The input layer acquires input from external sources; here, the MFCC feature extraction results (128 features) were assigned as input to the ANN architecture. The hidden layer processes the input from the previous layer and transfers the result to the output nodes; in this study, the ANN contained one hidden layer with 256 nodes. The ReLU activation function was applied to the hidden layer, and a sigmoid activation function was applied to the output layer, which consisted of two nodes representing normal and abnormal conditions.
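A minimal forward pass of the one-hidden-layer network described above; the weights here are random stand-ins, whereas in the actual system they are learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# 128 -> 256 -> 2 architecture as described in the text.
W1, b1 = rng.normal(scale=0.1, size=(128, 256)), np.zeros(256)
W2, b2 = rng.normal(scale=0.1, size=(256, 2)), np.zeros(2)

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)             # hidden layer, ReLU
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # output layer, sigmoid

out = forward(rng.normal(size=(1, 128)))  # one 128-feature MFCC vector
print(out.shape)  # (1, 2): scores for the normal and abnormal classes
```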

System Performance
The system performance was evaluated using a confusion matrix to obtain the accuracy, precision, recall, and F1-score. The following equations were used to calculate these metrics and measure the effectiveness of the system in diagnosing normal and abnormal heart sound conditions:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (1)
Precision = TP / (TP + FP) (2)
Recall = TP / (TP + FN) (3)
F1-score = 2 × (Precision × Recall) / (Precision + Recall) (4)

In Equations (1)–(4), TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
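A worked example of the four confusion-matrix metrics above, using illustrative counts rather than the paper's results:

```python
# Illustrative confusion-matrix counts (not the paper's results).
TP, TN, FP, FN = 90, 50, 10, 5

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# 0.903 0.9 0.947 0.923
```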