Hypertrophic Cardiomyopathy Diagnosis Based on Cardiovascular Magnetic Resonance Using Deep Learning Techniques

Hypertrophic cardiomyopathy (HCM) can lead to serious cardiac problems. HCM is often diagnosed by an expert using cardiovascular magnetic resonance (CMR) images obtained from patients. In this research, we aimed to develop a deep learning technique to automate HCM diagnosis. CMR images of 37421 healthy and 21846 HCM patients were collected over two years. Images obtained from female patients form 53% of the collected dataset. The mean and standard deviation of the patients' ages are 48.2 and 19.5 years, respectively. Three experts inspected the images and determined whether each case has HCM or not. A new data augmentation method was used to generate additional images by applying color filtering to the existing ones. To classify the augmented images, we used a deep convolutional neural network (CNN). To the best of our knowledge, this is the first time a CNN has been used for HCM diagnosis. We designed our CNN from scratch to reach acceptable diagnostic accuracy. Compared with the experts' opinions, the method achieved an accuracy of 95.23%, recall of 97.90%, and specificity of 93.06% on the original dataset. The same performance metrics on the augmented dataset were 98.53%, 98.70%, and 95.21%, respectively. We also experimented with different optimizers (e.g. Adadelta and Adagrad) and other data augmentation methods (e.g. height shift and rotation) to further evaluate the proposed method. Using our data augmentation method, an accuracy of 98.53% was achieved, which is higher than the best accuracy (95.83%) obtained by the other evaluated data augmentation methods. An upper bound on the difference between the true error rate and the empirical error rate of the proposed method is also provided to give a more complete performance analysis.
The advantages of employing the proposed method are elimination of the contrast agent and its complications, decreased CMR examination time, and lower costs for patients and cardiac imaging centers.


Introduction
In hypertrophic cardiomyopathy (HCM), the heart muscle becomes abnormally thick, as illustrated in Figure 1. The thickened muscle makes it hard for the heart to pump blood. In some HCM patients, it may lead to shortness of breath. It may also cause chest pain or problems in heart operation that lead to abnormal heart rhythms and even sudden death [1].
Figure 1. In HCM patients, the muscular heart walls (i.e. the septum) are thicker. The left panel illustrates a normal heart and the right panel shows an HCM heart.
HCM may present with one or more symptoms such as shortness of breath and chest pain (especially during exercise), fainting, heart murmur, and a sensation of rapid heartbeats. In most HCM patients, the muscular wall between the two bottom chambers of the heart becomes thicker. Consequently, this thickened wall may block blood outflow. The condition is called non-obstructive hypertrophic cardiomyopathy if the blocking is not significant. However, the left ventricle may still become stiff, which makes it hard for the heart to relax. This reduces the amount of blood in the ventricle and the amount of blood sent to the body in each heartbeat [2].
Hypertrophic cardiomyopathy patients also have an abnormal arrangement of heart muscle cells known as myofiber disarray. This may trigger arrhythmias in some patients. The disease is usually inherited. A person with a parent who has hypertrophic cardiomyopathy has a 50% chance of carrying the genetic mutation [3].
The most common complications of hypertrophic cardiomyopathy are listed below [2]:
• Atrial fibrillation: the thickened heart muscle, together with the abnormal structure of heart cells, may lead to changes in the heart's operation. Consequently, the heartbeats become fast or irregular. Atrial fibrillation increases the risk of blood clots leading to stroke or emboli in other organs or extremities.
• Blocked blood flow: in some patients, the thickened heart muscle may block the blood outflow of the heart. This leads to shortness of breath with exertion and other complications such as fainting spells, dizziness, and chest pain [3].
• Mitral valve problems: the valve between the left ventricle and left atrium does not close properly when thickened muscle blocks the blood outflow. Consequently, blood may leak backward toward the left atrium.
• Weak heart pumping: the thickened heart muscle becomes weak and ineffective, and the ventricle becomes larger, resulting in less forceful heart pumping.
• Heart failure: filling the heart with blood may be disturbed by the stiffness of the thickened heart muscle. Consequently, the heart is not able to pump enough blood.
• Sudden cardiac death: the disease may lead to heart-related sudden death in patients of all ages. As many patients do not know they have it, sudden cardiac death may be the first sign of a problem even in seemingly healthy young people or active adults [4].
To our knowledge, there is no known way to prevent HCM. However, identifying the disease as early as possible helps to direct the treatment process and prevent its complications. To identify HCM patients, doctors may recommend genetic testing. However, this test may not detect a mutation in all HCM patients. In addition, some insurance companies do not cover this test [4].
Echocardiograms can also be used for screening these patients. However, echocardiograms do not have high accuracy. Therefore, researchers are trying to find alternative methods for diagnosing HCM based on machine learning and data mining algorithms. To the best of our knowledge, there are few studies in this field [2]; they are reviewed in the next section.

Related work
The outstanding representation capability of the convolutional neural network (CNN) has made it one of the most popular machine learning methods among researchers in various fields. Regarding HCM diagnosis, WY Ko et al. [5] developed a CNN based on ECG signals. The network was trained and validated on 2448 HCM patients and 51153 healthy subjects. To evaluate the trained network, 612 HCM and 12788 healthy subjects were used. The achieved sensitivity and specificity were 93% and 87%, respectively. CNNs have also been used for automatic quantification of left ventricle (LV) mass in 1073 HCM patients [6]. The reported results demonstrated the power of CNNs in automatic segmentation of the LV in HCM patients. It was shown that there is no significant difference between automatic and manual segmentation. Similar research was carried out by Q. Tao et al. [7]. In their research, a cine MRI dataset obtained from four medical centers was used. Using this dataset, three CNNs with U-Net architecture were trained. Automatically generated results were compared with manually generated ones. A Wilcoxon test showed no significant difference between the automatically and manually generated results.
Deep learning has also been used to improve the performance of HCM mutation prediction using cardiac cine images [8]. The studied dataset was based on a non-enhanced four-chamber view of cine images and contained information on 198 HCM patients. The classification of the HCM genotypes was performed by a deep learning model. The conducted experiments yielded an AUC of 84%, sensitivity of 83.33%, and specificity of 78.26%. Another CNN-based study is reported in [9]. The objective was to measure the maximum heart wall thickness of HCM patients. The studied data consisted of 60 adult HCM patients, including those carrying HCM gene mutations. Eleven experts and a machine learning algorithm measured the maximum heart wall thickness of the patients. Compared with the experts' measurements, the precision of the machine learning algorithm was superior.
A. D. Marvao et al. [10] studied the prevalence of uncommon variants of HCM genes based on cardiac magnetic resonance imaging of participants. The segmentation of these images was performed using a deep neural network. They reported that SARC-P/LP variants were associated with an increased lifetime risk of adverse cardiovascular outcomes, due to heart failure and arrhythmia. Such variants are also associated with greater risk in HCM patient cohorts. Overall, SARC-P/LP variants have low aggregate penetrance for overt HCM but are associated with an increased risk of adverse cardiovascular outcomes and a sub-clinical cardiomyopathic phenotype.
O. Bernard et al. [11] introduced the "Automatic Cardiac Diagnosis Challenge" dataset for the purpose of cardiovascular magnetic resonance (CMR) assessment. The dataset contains 150 multi-equipment CMR recordings, which were classified by two medical experts. The objective of this work was to assess the performance of state-of-the-art deep learning methods in segmenting the myocardium and classifying pathologies.
To the best of our knowledge, no existing study has focused on HCM detection using CMR images, and our method is the first in this regard. The novelties of this research are:
• Collecting a new dataset, which has been made publicly available to the research community.
• Diagnosing HCM with a deep learning algorithm with high accuracy. To the best of our knowledge, this is done for the first time.
• Proposing a new data augmentation method based on color filtering.
• Designing a CNN architecture from scratch, customized to obtain high accuracy on HCM diagnosis, instead of using existing models.
• Analyzing the difference between the true error rate and the empirical error rate of the proposed method.
Consequently, the following advantages will be achieved:
• Elimination of the contrast agent and its complications.
• Shorter CMR examination time.
• Lower costs for patients and radiology centers.
• Better and faster services by radiology centers.
The rest of this paper is organized as follows. In section 3, the collected dataset is described. In section 4, the required background for this paper is briefly reviewed. The proposed method is explained in detail in section 5. Implementation details are given in section 6. The experimental results are presented in section 7. Finally, discussions and conclusions are given in section 8.

Dataset description
In this study, CMR images of 37421 healthy and 21846 HCM patients were obtained at Omid Hospital in Tehran. They were collected from September 2018 to September 2020 and were labeled by three cardiac imaging experts during the collection period. The mean and standard deviation of the patients' ages are 48.2 and 19.5 years, respectively. The percentage of samples collected from female patients is 53%. In parts (a) and (b) of Figure 2, five healthy and five HCM subjects from the collected dataset are illustrated. The collected dataset is publicly available in [12].
Institutional and ethical approval was granted by Omid Hospital in Tehran, Iran, so that the patient datasets can be used for research studies with diagnostic and therapeutic purposes. Approval was granted on the grounds of existing datasets. All patients who participated in the data collection process for this study completed an informed consent form. All methods were carried out in compliance with relevant guidelines and regulations.

Prerequisites
In this section, the required background is briefly reviewed. In section 4.1, HSV color space and its relation to RGB color space are described. CNN is reviewed in section 4.2.

RGB to HSV conversion
Hue, Saturation, Value (HSV) is an alternative representation of the RGB color space. The HSV color space represents color more closely to the way human vision perceives it. However, we use HSV instead of RGB for another important reason. Like RGB, HSV is made of three channels. The H (hue) channel represents the color, the S (saturation) channel is the amount of color, and the V (value) channel is the brightness or intensity of the pixel. Therefore, color filtering is done with ease in HSV color space: the desired color interval can be set directly on the H channel. In RGB space, by contrast, color filtering involves all three channels, which is more complicated. The conversion from RGB to HSV can be done as follows:
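For reference, the widely used textbook form of the conversion is reproduced here. It is assumed, not confirmed, to be the variant the authors applied; R, G, and B are taken to be scaled to [0, 1], and the helper symbols M, m, and C are introduced only for this sketch.

```latex
\[
\begin{aligned}
M &= \max(R, G, B), \qquad m = \min(R, G, B), \qquad C = M - m,\\
V &= M,\\
S &= \begin{cases} 0 & \text{if } M = 0,\\ C / M & \text{otherwise,} \end{cases}\\
H &= \begin{cases}
0 & \text{if } C = 0 \text{ (hue undefined)},\\
60^\circ \times \left(\dfrac{G - B}{C} \bmod 6\right) & \text{if } M = R,\\
60^\circ \times \left(\dfrac{B - R}{C} + 2\right) & \text{if } M = G,\\
60^\circ \times \left(\dfrac{R - G}{C} + 4\right) & \text{if } M = B.
\end{cases}
\end{aligned}
\]
```

With this definition H lies in [0, 360). Note that for 8-bit images OpenCV stores H/2, so the stored hue lies in [0, 179].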

Convolutional Neural Network
A neural network is a machine learning model inspired by how the brain works. It is made of a network of neurons, which are the learning units. The parameters of the neurons are determined such that the network input is mapped to the desired output. CNNs are a special type of neural network that operates on image data [13]. Two key concepts of CNNs are convolutional and pooling layers. The convolutional layers extract key features from the input images via repeated application of filters. The output of a convolutional layer is a set of activations called a feature map. To make the feature map less sensitive to the location of features in the input image, it is down-sampled using pooling layers. Two common pooling methods are average pooling and max pooling [14]. Combining convolution and pooling, CNNs can extract features from the input images automatically, which makes them an ideal choice for various classification and regression applications. In this paper, we use a CNN for binary classification, trained with the Binary Cross Entropy (BCE) loss:
\[ \mathrm{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log p(x_i) + (1 - y_i) \log\big(1 - p(x_i)\big) \Big], \]
where $N$ is the batch size, $x_i$ is the ith input sample, $y_i$ is the ith target label, and $p(x_i)$ is the prediction of the CNN for sample $x_i$.
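As a minimal sketch of the BCE loss described above, the following pure-Python function averages the per-sample cross entropy over a batch. The function name and the clipping constant `eps` are illustrative assumptions and are not taken from the authors' implementation (in Keras this loss is available as a built-in).

```python
import math


def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Mean binary cross entropy over a batch.

    targets: list of 0/1 labels; predictions: list of probabilities in [0, 1].
    eps is an assumed clipping constant that keeps log() away from zero.
    """
    if not targets or len(targets) != len(predictions):
        raise ValueError("targets and predictions must be non-empty and equal in length")
    total = 0.0
    for y, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)  # clip the probability for numerical safety
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(targets)


# Confident, correct predictions give a small loss; confident, wrong ones a large loss.
good = binary_cross_entropy([1, 0], [0.9, 0.1])
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
```

The loss is small when the predicted probability agrees with the label and grows without bound as a confident prediction turns out to be wrong.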

Proposed method
In deep learning, CNN is currently one of the most successful methods with outstanding performance on image classification tasks. However, a huge amount of training data is required to unleash the power of CNNs. In real-world applications, it may be challenging and costly to collect a sufficient number of training samples. It is shown that the CNN classification performance can be improved by increasing the number of training samples via data augmentation methods [13]. Common data augmentation methods include rotating images, shifting images to left or right, zooming in, and changing the brightness of images. In this paper, we propose another data augmentation approach and show experimentally that it can improve CNN performance for image classification. The proposed data augmentation approach is based on the color filtering algorithm. Depending on the characteristics of the problem at hand, filtering may exclude too bright or too dark colors in the image or it may keep a specific color spectrum while discarding the rest of the spectrum from the image.
In each color image represented in RGB color space, the three channels red (R), green (G), and blue (B) indicate the intensity of the red, green, and blue colors, respectively. When it comes to filtering a specific range of colors from an image, HSV color space yields better results than RGB [15]. To investigate this claim, we performed our color filtering in both RGB and HSV spaces and compared their performance (Table 1). As will be discussed in section 7.2, converting the data from RGB to HSV and performing the color filtering in HSV space leads to slightly better performance. That is why we chose to perform our data augmentation (based on color filtering) in HSV color space.
Given the argument above, as the first step of color filtering, the images are converted from RGB color space to HSV (section 4.1). Using the H channel of HSV, the desired color range can be specified for filtering. The portion of the image within the specified color spectrum is kept while the rest of the image is removed. The H channel can take on any integer value from 0 to 359. After several experiments, we observed that using the interval [58, 198] as the color filtering spectrum yields the best results. The color filtering spectrum is defined around the base point 128 (Figure 3). The lower and upper bounds of the spectrum are 128-70=58 and 128+70=198, respectively. Any value outside the defined spectrum ([58, 198]) is discarded during the color filtering. The lower and upper bounds of the filtering spectrum may vary depending on the problem characteristics. For example, red may be of interest if the detection of red apples is desired. Similarly, blue may be of interest if the objective is to identify rivers and lakes in satellite imagery.
After applying the color filter to the original (available) images, the filtered images are used as augmented data. Therefore, the CNN classifier is trained on the original samples as well as the augmented ones. The steps of the proposed method are depicted in Figure 4 and are summarized below:
• As the first step, the dataset of images is received.
• The proposed color filter is applied to the dataset samples.
• The dataset is augmented with the filtered images.
• As a preprocessing step, the augmented dataset samples are normalized.
• A CNN is trained on the augmented dataset.
• Classification is performed using the trained CNN, and the results are presented.
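The data-preparation steps above (filter, augment, normalize) can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' code: images are represented as nested lists of 8-bit grayscale values, `filter_fn` is a hypothetical placeholder for the color filter, and division by 255 is an assumed normalization, since the text does not specify one.

```python
def normalize(image):
    """Scale 8-bit pixel intensities to [0, 1] (assumed normalization scheme)."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def augment_with_filter(images, labels, filter_fn):
    """Append a filtered copy of every image, reusing the label of its source image."""
    filtered = [filter_fn(image) for image in images]
    return list(images) + filtered, list(labels) + list(labels)


def prepare_dataset(images, labels, filter_fn):
    """Augment with filtered copies, then normalize every sample."""
    aug_images, aug_labels = augment_with_filter(images, labels, filter_fn)
    return [normalize(image) for image in aug_images], aug_labels


# Toy usage: two 1x2 images and a stand-in filter that blanks every pixel.
toy_images = [[[0, 255]], [[128, 64]]]
toy_labels = [0, 1]
blank = lambda image: [[0 for _ in row] for row in image]
X, y = prepare_dataset(toy_images, toy_labels, blank)
```

Because each original image contributes exactly one filtered copy, the augmented set is twice the size of the original, which matches the doubling from 59267 to 118534 samples reported in the implementation details.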

Color Filtering Algorithm
The images used in this paper are grayscale, but the proposed color filtering method is also applicable to color images. The objective of color filtering is to discard unwanted details from the images based on their color. The filtering begins by first converting images from RGB to HSV. This is necessary to separate the color and intensity information of the image pixels. The desired color spectrum on Hue is defined using a base color and a threshold around it. For example, in this study, the base color is 128 and the threshold is set to 70. Therefore, the desired color spectrum will be [128-70, 128+70] = [58, 198] as shown in Figure 3. The base color and threshold parameters must be set so that the problem's objective is met. Based on the desired color spectrum, a mask can be formed. By applying the mask on the image, pixels outside the desired color spectrum are discarded while the rest of the pixels are retained. The effect of applying the color filtering algorithm is illustrated in Figure 5.
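The masking step can be illustrated with a small standard-library sketch. It is not the authors' OpenCV implementation: it uses Python's `colorsys` module, measures hue in degrees, tests only the hue channel (whether the original mask also constrains S or V is not stated), and replaces discarded pixels with black, which is an assumption. The base color 128 and threshold 70 are the values given in the text.

```python
import colorsys

BASE_HUE = 128   # base color on the hue axis, in degrees (value from the text)
THRESHOLD = 70   # half-width of the retained hue interval (value from the text)


def hue_in_range(rgb, base=BASE_HUE, threshold=THRESHOLD):
    """Return True if the hue of an 8-bit (R, G, B) pixel lies in [base - t, base + t]."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    hue_degrees = hue * 360.0
    return base - threshold <= hue_degrees <= base + threshold


def color_filter(image, base=BASE_HUE, threshold=THRESHOLD):
    """Keep pixels whose hue is inside the interval; set all others to black.

    image: list of rows, each row a list of (R, G, B) tuples.
    Note: colorsys assigns hue 0 to achromatic (gray) pixels, so this sketch
    discards them whenever 0 lies outside the interval.
    """
    black = (0, 0, 0)
    return [
        [pixel if hue_in_range(pixel, base, threshold) else black for pixel in row]
        for row in image
    ]


# Red (hue 0) and blue (hue 240) fall outside [58, 198]; green (120) and cyan (180) stay.
sample = [[(255, 0, 0), (0, 255, 0)], [(0, 255, 255), (0, 0, 255)]]
filtered = color_filter(sample)
```

When porting this to OpenCV, keep in mind that 8-bit HSV images store hue as H/2 in [0, 179], so an interval expressed in degrees has to be halved.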

Implementation details
The color filtering has been carried out using the OpenCV image processing library and the Python programming language. By default, OpenCV loads color images in BGR channel order; the images are converted to HSV before the color filter is applied.
The classification has been carried out by training a CNN. The training procedure has been implemented using Keras deep learning library with Tensorflow backend. The total number of the dataset (section 3) samples is 59267. The number of healthy subjects is 37421 while the number of HCM patients is 21846. After applying the proposed data augmentation scheme, the number of samples in the augmented dataset increased to 118534 (twice the original dataset). We used 10-Fold cross validation for training since it is the most common method to select model for high dimensional datasets [16]. One fold (11854 samples) was used as test set. The test set samples have been selected randomly from the dataset. The training set was 80% (85344 samples) of the other nine folds (106680 samples) and the remaining 20% (21336 samples) was used as validation set.
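The sample counts quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes that the test fold is rounded up to a whole number of samples and that the validation set is 20% of the remaining nine folds; the function name is illustrative.

```python
import math


def split_sizes(n_samples, n_folds=10, val_fraction=0.2):
    """Sizes of the test, training, and validation sets for one cross-validation round."""
    test = math.ceil(n_samples / n_folds)        # one fold held out (rounded up, assumed)
    remaining = n_samples - test                 # the other nine folds
    validation = round(remaining * val_fraction) # 20% of the remaining samples
    train = remaining - validation               # 80% of the remaining samples
    return {"test": test, "train": train, "validation": validation}


original = 37421 + 21846    # healthy + HCM samples
augmented = 2 * original    # one filtered copy per original image
sizes = split_sizes(augmented)
```

For the augmented dataset this yields 11854 test, 85344 training, and 21336 validation samples, in agreement with the figures given in the text.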
The architecture of the CNN is illustrated in Figure 6. As can be seen, the CNN consists of six convolution layers followed by three fully connected layers. Multiple dropout layers have been employed to avoid overfitting.

Experimental results
In this section, experimental results are presented. The upper bound on the difference between true error rate and the empirical error rate is determined in section 7.1 and the proposed data augmentation approach is evaluated in section 7.2. Finally, the effect of changing the optimizer algorithm on the quality of CNN training is investigated in section 7.3.

Performance bound analysis
In supervised learning problems, the training and test data are always limited. Therefore, classifiers are always at risk of overfitting to the available data. The true performance of a classifier after training can never be determined exactly. However, it is possible to estimate an upper bound on its deviation from the measured performance to get some insight into how well the trained classifier can perform. To this end, consider a training set $\{(x_i, y_i),\ i = 1, \dots, l\}$, where $x_i$ and $y_i$ are the input and desired output for the ith sample, respectively. The true error rate of the classifier F with parameter vector $\hat{\alpha}$ for a test sample $(x, y)$ is defined in terms of probability as [16]:
\[ R(\hat{\alpha}) = P\big(F(\hat{\alpha}, x) \neq y\big), \]
where $R(\hat{\alpha})$ is the probability that classifier F misclassifies test sample $x$. The true error rate $R(\hat{\alpha})$ can never be calculated exactly. However, it is possible to calculate an empirical error rate based on training data (as pointed out in [16]):
\[ R_{\mathrm{emp}}(\hat{\alpha}) = \frac{1}{l} \sum_{i=1}^{l} \mathbb{1}\big[F(\hat{\alpha}, x_i) \neq y_i\big], \]
where $F(\hat{\alpha}, x_i)$ is the prediction of the classifier F for input $x_i$. As discussed in [16], the difference between the true error rate and the empirical error rate can be bounded from above, with probability at least $1 - \eta$, as:
\[ R(\hat{\alpha}) - R_{\mathrm{emp}}(\hat{\alpha}) \leq \sqrt{\frac{\ln N(l, d) - \ln \eta}{2l}}, \tag{1} \]
where $\eta$ is the significance level and $N(l, d)$ is computed as [16]:
\[ N(l, d) = 2 \sum_{k=0}^{d-1} \binom{l-1}{k}, \]
where d is the dimension of input $x$. The bound in Equation (1) has been devised for linear classifiers. Our proposed method is based on a CNN, which is not a linear classifier. However, as pointed out by Gorriz et al. [16], the last layer of a deep neural network, which is a dense layer, can be considered as the linear classifier. After all, dense layers are nothing but the dot product of weight parameters and outputs from the previous layer. Therefore, in our case, parameter d is equal to the number of neurons in the last layer of the CNN in Figure 6, which is 64. Parameter l is equal to the number of training samples, which is 85344 (see section 6). Using Equation (1) and the specified parameters l and d, the upper bound for the performance of the proposed method is plotted in Figure 7.
Clearly, as the sample size (number of training samples) increases, the empirical error rate approaches the true error rate and the upper bound on their difference approaches zero. Similarly, reducing the input sample dimension lowers the upper bound.
Considering that in our case d = 64 and l > 80K, the upper bound on the difference between the empirical performance obtained in this paper (see Table 1 and Table 3) and the actual performance is at most 0.05.
Figure 7. Upper bound for the difference between true error rate and empirical error rate with 95% confidence level.
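The bound can be evaluated numerically with the following sketch. It assumes the bound has the form sqrt((ln N(l, d) - ln eta) / (2l)) with the counting function N(l, d) = 2 * sum over k = 0, ..., d-1 of C(l-1, k); this form is a reading of [16] and should be checked against that reference. The function names are illustrative.

```python
import math


def cover_count(l, d):
    """Assumed counting function N(l, d) = 2 * sum_{k=0}^{d-1} C(l - 1, k)."""
    return 2 * sum(math.comb(l - 1, k) for k in range(d))


def error_gap_bound(l, d, eta=0.05):
    """Assumed upper bound on (true error - empirical error), holding with probability 1 - eta."""
    return math.sqrt((math.log(cover_count(l, d)) - math.log(eta)) / (2 * l))


# Parameters from the text: d = 64 neurons in the last layer, l = 85344 training samples.
bound = error_gap_bound(l=85344, d=64)
```

Under these assumptions the bound for d = 64 and l = 85344 is on the order of 0.05, and it shrinks as l grows or d decreases, consistent with the trend described for Figure 7.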

Classification performance on the augmented dataset
To observe the effect of the proposed data augmentation approach, the CNN classifier was first trained on the original dataset. Next, the training was repeated on the augmented dataset. The data augmentation was performed in both RGB and HSV spaces to find out whether using HSV color space is beneficial. The performance of the CNN trained on the original data (without augmentation), on data augmented in RGB space, and on data augmented in HSV space is shown in Figure 8. The loss and accuracy plots on validation data for the original and the augmented datasets are presented in parts (a) and (b) of Figure 8. As expected, the CNN trained on the original data is outperformed by the CNNs trained on the augmented datasets. Moreover, performing augmentation in HSV space leads to better performance than augmenting data in RGB space. The effect of training with 10-fold cross validation with and without augmented data is also reported through various performance metrics in Table 1. Regardless of the chosen color space (RGB or HSV), data augmentation improves the overall classification performance of the CNN, as observed in the last two rows of Table 1. Comparing the CNNs trained on data augmented in RGB and HSV spaces reveals that HSV space gives a minor improvement over RGB space. Based on the argument provided in section 7.1, we can conclude that the performance measures reported in Table 1 deviate from their true values by at most about 0.05.
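For readers less familiar with the reported metrics, the sketch below shows how accuracy, recall (sensitivity), and specificity follow from the entries of a binary confusion matrix, with HCM taken as the positive class. The counts in the usage example are hypothetical and are not the paper's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, and specificity from confusion-matrix counts.

    tp/fn: HCM cases predicted as HCM / healthy.
    tn/fp: healthy cases predicted as healthy / HCM.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all cases classified correctly
        "recall": tp / (tp + fn),        # fraction of HCM cases that are detected
        "specificity": tn / (tn + fp),   # fraction of healthy cases that are cleared
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

Recall and specificity are reported separately because, with 37421 healthy and 21846 HCM samples, accuracy alone could hide poor performance on the smaller class.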

Comparison of applying different optimizers
In our experiments, we trained the CNN with different optimizers, namely Adadelta [17], Adagrad [18], and Adam [19]. The performance of these methods is summarized in Table 2. As can be seen, Adadelta performs better than Adagrad. The superiority of Adadelta stems from the fact that it addresses the aggressive learning rate reduction of Adagrad [17]. Among the methods listed in Table 2, Adam has the best performance, since it combines the advantages of two popular optimization methods, namely Adagrad and RMSProp [19]. The loss and accuracy plots (on validation data) during training of the CNN with the three optimizers (Figure 9) agree with the performance metrics presented in Table 2. Adam has the lowest loss and the best accuracy, Adadelta is second, and Adagrad is third.
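The remark about Adagrad's aggressive learning rate reduction can be illustrated with a simplified sketch. It compares only the gradient-scaling denominators: Adagrad accumulates all past squared gradients, whereas Adadelta (like RMSProp) uses an exponentially decaying average. The full Adadelta update also rescales by a running average of past updates, which is omitted here; the function names, `rho`, and `eps` are illustrative assumptions.

```python
import math


def adagrad_scales(grads, eps=1e-8):
    """Per-step scaling 1 / sqrt(sum of all past squared gradients + eps)."""
    total = 0.0
    scales = []
    for g in grads:
        total += g * g  # the accumulator only ever grows
        scales.append(1.0 / math.sqrt(total + eps))
    return scales


def decaying_average_scales(grads, rho=0.95, eps=1e-8):
    """Per-step scaling 1 / sqrt(decaying average of squared gradients + eps)."""
    average = 0.0
    scales = []
    for g in grads:
        average = rho * average + (1.0 - rho) * g * g  # old gradients are forgotten
        scales.append(1.0 / math.sqrt(average + eps))
    return scales


# Constant unit gradients for 100 steps.
grads = [1.0] * 100
adagrad = adagrad_scales(grads)
decayed = decaying_average_scales(grads)
```

With constant unit gradients, the Adagrad scaling after 100 steps has dropped to about one tenth of its initial value and keeps shrinking, while the decaying-average scaling levels off near one.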

Comparison of different data augmentation methods
To compare the proposed data augmentation method with existing ones, we augmented the dataset with height shift [20], rotation [21], and width shift [22]. The results of training the CNN on the data augmented with these methods are presented in Table 3. The best performance is achieved by the proposed augmentation method. Rotation augmentation is second best, and height and width shifts take third and fourth place, respectively. Based on the argument provided in section 7.1, we can conclude that the performance measures reported in Table 3 deviate from their true values by at most about 0.05. The loss and accuracy plots of the CNN trained on the dataset augmented using the methods in Table 3 are illustrated in Figure 10. Clearly, the proposed augmentation leads to better accuracy and lower loss than the other augmentation methods.

Grad-CAM results
Grad-CAM is a generalization of the Class Activation Mapping. It does not need retraining and can be applied broadly to any CNN-based architectures [23]. Some samples of CMR images and their corresponding Grad-CAM results are shown in Figure 11. Three CMR images for normal cases and the result of applying Grad-CAM on them are shown in parts (a) and (b) of Figure 11. Similarly, three CMR images for HCM cases and their Grad-CAM results are presented in parts (c) and (d) of Figure 11.
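For completeness, the Grad-CAM map for a class c is commonly defined as below, following the usual formulation of the method in [23]. Here $A^k$ denotes the k-th feature map of the chosen convolutional layer, $y^c$ the score for class c before the final activation, and $Z$ the number of spatial positions in the feature map; this notation is introduced for this sketch only.

```latex
\[
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c A^k \right).
\]
```

The weights $\alpha_k^c$ measure how strongly each feature map influences the class score, and the ReLU keeps only the regions that contribute positively to the predicted class.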

Discussions
HCM patients may have no explicit symptoms, which makes the diagnosis process challenging for experts. In this research, we proposed an automatic method for HCM diagnosis using a CNN trained on CMR images and achieved high accuracy. Widespread use of this method may help experts and patients make better treatment decisions while reducing time and cost for patients and radiology centers. However, more studies are required before this method can be used in clinical practice.
Diagnosing HCM by human experts is prone to mistakes. The ensemble of human experts and the proposed method may lead to more accurate decision-making about HCM patients and prevent severe complications of misdiagnosis. The proposed system performance could be improved provided that more training is done on newly collected images. Despite having high accuracy, the proposed system must be validated by medical practitioners.
Machine learning algorithms have shown excellent performance in the health care domain. However, to put these methods to some good use, gaining the medical experts' trust is a must.
Therefore, our final goal is to use the proposed method as a tool in routine clinical care.
In the medical field, disagreement between experts' opinions is common. Under such circumstances, the experts have no choice but to conduct more tests on patients to make more accurate decisions. More tests mean higher treatment costs and more time. However, it is possible to exploit the potential of machine learning algorithms and use them as an additional source of decision-making. Combining the opinion of human experts with the decision of the algorithms may lead to better and faster diagnosis with fewer tests.
In this research, a new CNN architecture was designed from scratch to obtain high accuracy on HCM disease diagnosis instead of using existing models. Our machine learning tool not only shortens CMR examination time but also lowers the costs for patients and radiology centers. Furthermore, the performance of our method can be improved using new training data collected over time.
To the best of our knowledge, no existing work has used CNN for HCM diagnosis using CMR images. The advantage of our method is its automation that can assist the medical experts with their decisions. It eliminates the contrast agent and its complications, shortens the examination time of CMR, and lowers the costs for patients and cardiac imaging centers. This leads to better and faster services by cardiac imaging centers.
Our proposed system has also utilized a new data augmentation approach and showed experimentally that it could improve CNN performance for image classification. Our data augmentation approach is based on the color filtering algorithm. After applying the color filter on the original images, they were used as augmented data. Therefore, the CNN classifier was trained on original samples as well as the augmented ones. For analyzing the performance of our method, we experimented with different optimizers and data augmentation methods. The results showed that our designed algorithm outperforms all evaluated data augmentation methods.
During our experiments, we observed that performing the color filtering in HSV space yields better results compared to performing the filtering in RGB space which is why we use HSV space for our data augmentation based on color filtering.
Our research has some limitations. It is not fully automated, as an expert still needs to play a quality control role. All data were collected from a limited region (one hospital).
Having more data collected from various health care centers may improve the reliability of our algorithm. Additionally, this method has not been tested with poor-quality images. Such images are produced, for example, when a patient cannot hold their breath, has an implantable device, or has atrial fibrillation.
In conclusion, we showed how a machine learning tool could be used to diagnose HCM more precisely. Delivering high accuracy is the first requirement of any machine learning method for HCM diagnosis. As the next step, the method must be validated in practice on new cases. A tool that predicts more precisely than experts is the ultimate goal of learning methods.
Widespread usage of such a tool could help to improve HCM diagnosis. Consequently, sudden cardiac death risk will be decreased, leading to higher life expectancy in HCM patients.