An exploration of ventricle region segmentation and multiclass disease detection using cardiac MRI

In the modern era, cardiac diseases are critical because they cause a high mortality rate. Early detection of cardiovascular disease (CVD) is essential to prevent and control it. Diagnosis of cardiac disease involves analyzing the left and right ventricle cavities (LV, RV) and the myocardium (MYO) in cardiac magnetic resonance (CMR) images. As deep learning architectures mature, segmenting and classifying cardiac MRI images using deep learning is gaining attention. This work aims to identify five cardiac disease subgroups, namely NOR, MINF, DCM, HCM, and ARC, by employing a new deep join attention model (DJAM) technique for segmenting the LV, MYO, and RV regions separately. The method advances prior work in that the joined attention model is combined with the pooling layers and the result is added to the convolution layers. The proposed region integrated deep residual network (RIDRN) is used to extract features from the segmented images for classification. In this process, the features of the LV, RV, and MYO are combined in different combinations, so that the overall features are obtained without omitting any strip of features from the three regions; hence, performance accuracy rises. The random forest classification method is used to classify the resulting features for cardiac disease diagnosis. The proposed work is tested on the automated cardiac diagnosis challenge (ACDC) dataset and outperforms state-of-the-art techniques.


| INTRODUCTION
According to the World Health Organization, the world's most dangerous disease is cardiovascular disease (CVD), which causes 16% of the world's deaths. In 2019, the number of deaths from this disease rose by more than 2 million, to 8.9 million. It is predicted that, by 2035, the number of people with CVDs will increase by 30%. 1 Cardiac magnetic resonance imaging (cardiac MRI or CMR) provides detailed images of the beating heart. It helps the doctor to study the structure and function of the heart. 2 Cardiac function has to be analyzed with the help of MRI images for disease detection and risk estimation. 3 With the strong discriminating power of MRI, cardiac function is analyzed by assessing stroke volume (SV), left and right ventricular ejection fractions (EF), and myocardium thickness. 4 In medical practice, semi-automatic segmentation is used because good accuracy cannot yet be achieved in fully automatic cardiac segmentation. 5 In 2015, deep learning (DL) was introduced into medical image processing, 6 opening a breakthrough in automatic analysis. The scientific community is attracted to DL owing to its high generalization capacity, high performance, and versatility. The availability of high-performance computers and large amounts of medical data has also fueled interest in DL. 7-10
Segmentation of the main structures of the heart, the left ventricle, right ventricle, and myocardium, is a crucial job. CMR segmentation has many difficulties, discussed in Reference 11: (i) poor contrast between the myocardium and its surroundings, (ii) brightness heterogeneities in the left/right ventricular cavities, and (iii) limited CMR resolution. Recently, convolutional neural network (CNN)-based techniques have attained outstanding performance in medical image segmentation but still cannot meet the strict accuracy requirements of medical applications. Accurate image segmentation remains a challenging task in medical image analysis. 12 Some researchers have tried to address this problem by using atrous convolutional layers, 13,14 image pyramids, 15 and self-attention mechanisms. 16,17 However, these techniques still have limitations in modeling long-range dependencies. The present work therefore focuses on discriminative semantic features, and the experimental results illustrate a visible improvement in classification accuracy over other state-of-the-art techniques.

| Motivations of the research
Efficient and accurate diagnosis of cardiac diseases using MRI remains a significant challenge. Existing approaches for cardiac MRI diagnosis suffer from limitations such as low accuracy, high computational complexity, and insufficient feature extraction. These limitations can lead to inaccurate diagnoses, delayed treatment, and increased healthcare costs. Therefore, more accurate and efficient approaches to diagnosing cardiac diseases using MRI are needed.
To address the limitations of existing approaches, we propose a novel deep join attention model (DJAM) for segmentation and a region integrated deep residual network (RIDRN) for feature extraction, combined with different classifiers, for diagnosing cardiac diseases using MRI. The proposed approach leverages the power of deep learning and attention mechanisms to accurately and efficiently segment cardiac MRI images and extract relevant features for classification. In addition, it uses different classifiers to improve the accuracy and reliability of the diagnosis.

| RELATED WORKS
CMR segmentation is a significant precondition in medical practice to consistently identify and treat many cardiovascular diseases. 18,19 Recently, the deep learning paradigm has driven progress in CMR segmentation. 20 Ma et al. 21 used a fully convolutional neural network that produces precise results on the UK Biobank dataset, which consists of more than 4875 cases, but this technique might not generalize well to other datasets. With the development of deep CNNs, U-Net was introduced for medical image segmentation. 22 Gomathi et al. 23 compared the U-Net, SegNet, and FCN techniques and found that U-Net performs better than the others. The U-Net architecture can be trained on small datasets to obtain rapid and accurate results. Unlike the fully convolutional network (FCN), it has a symmetric encoder-decoder with skip connections. U-Net comprises two phases, namely downsampling and upsampling: the downsampling (contracting) path extracts high-level features, and the upsampling path reconstructs the lost spatial detail. Owing to the simple structure and elegant performance of U-Net, various U-Net-like methods have emerged, such as Res-UNet, 24 U-Net++, 25 Dense-UNet, 26 UNet3+, 27 and V-Net. 28 The introduction of the self-attention mechanism into CNNs improves network performance. 29
The visual attention method mimics human vision, which is based on the brain's signal-processing mechanism: human vision locates the important object by rapidly scanning the input image and discarding ineffective information. This attention model is now broadly used in deep learning. Gomathi et al. 30 introduced channel and spatial attention into the U-Net architecture to extract the dominant features effectively. Xiaofeng et al. 31 integrated an attention model with a gradient-increasing process to improve segmentation at minimum computational cost. Seo et al. 32 established an enhanced U-Net (mU-Net) for liver tumor segmentation in CT images; this network uses a residual module with deconvolution and an activation function in the skip connection, which tackles the problem of low-resolution features in the U-Net architecture. Guo et al. 33 segmented polyps in colonoscopy images using SE-U-Net augmented by a dilation kernel. Li et al. 34 introduced a modified encoder-decoder with multiple integrated sequential dilated inception blocks. Bofeng et al. 35 introduced a model that combines a convolutional neural network and U-Net to achieve left ventricle (LV) segmentation: the ROI is located with the CNN, and U-Net segments the LV. Zheng et al. 36 developed a modified U-Net for short-axis MRI images; their recurrent interleaved attention network (RIANet) uses a recurrent feedback structure, CliqueBlock, which integrates forward and backward connections between different layers of identical resolution. The attention model has also been used in other medical image segmentation techniques and achieved better performance. For example, Junding Sun et al. 37 introduced a novel U-shaped model with complementary feature enhancement to improve the liver segmentation process. It reduces the semantic gap between encoder and decoder and increases segmentation accuracy. A cross-attention model is used, which reduces redundant information and enhances the contextual information of single sparse attention by encoding contextual information through 3 × 3 convolution. A related model by Junding Sun et al. 38 follows a similar design.
The proposed DJAM is employed for segmenting the LV and RV cavities and the myocardium in MRI images. During segmentation, the dominant and common features are extracted. The proposed RIDRN extracts the salient features for detecting cardiac disease, and the features extracted by RIDRN are classified by RF to determine the type of cardiac disease. The classifier distinguishes five classes, namely NOR, MINF, DCM, HCM, and ARC.

| DJAM-based segmentation
Figure 2A illustrates the proposed DJAM architecture. This architecture is used to extract the LV, RV, and MYO regions effectively for both diastolic and systolic phase instances of an input image. The proposed work uses an effective attention module that successively computes attention maps along different dimensions, namely channel and spatial. The feature map is adaptively refined by multiplying it with the attention maps. The attention mechanism focuses only on the target area and suppresses other useless information. A channel attention map is generated during channel attention by exploiting the inter-channel relationship of features; channel attention concentrates on 'what' is significant in the image being processed. It has been empirically shown that exploiting both max- and average-pooled features simultaneously upgrades the representational power of networks more than using either independently.
In this architecture, DJAM is employed three times, as DJAM1, DJAM2, and DJAM3. The channel features are found by applying DJAM1 to the output of Pool3 obtained from the downsampling process, and the result is multiplied with the 1 × 1 convolution carried out in Conv6. In Conv6, the outcome of the previous layer is passed through a 1 × 1 convolution for dimensionality reduction and element-wise operation.
In Figure 2B, for DJAM1 and DJAM3, the channel attention module aggregates the required information of a feature map by using average and max pooling to generate context descriptors that denote refined features. Channel attention is computed as

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))

In Figure 2B, for DJAM2, a spatial attention map is produced using the inter-spatial relationship of features. To compute the spatial attention, max and average pooling are applied along the channel axis to generate a two-dimensional (2D) map:

M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)]))

The channel feature F_c is given by

F_c = M_c(F) ⊗ F

In the same way, DJAM2 yields the spatial feature

F_s = M_s(F) ⊗ F

and DJAM3 computes channel features in the same form as DJAM1. Here σ denotes the sigmoid function, f^{7×7} a 7 × 7 convolution, and ⊗ element-wise multiplication.
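The channel and spatial attention computations above (max- and average-pooled descriptors passed through a shared MLP or a small convolution) can be sketched in NumPy. This is an illustrative sketch, not the paper's implementation: the function names, the tiny shared MLP, and the 1 × 1 channel mix standing in for the 7 × 7 convolution are all simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W0, W1):
    # F: feature map of shape (H, W, C)
    avg = F.mean(axis=(0, 1))   # (C,) average-pooled context descriptor
    mx = F.max(axis=(0, 1))     # (C,) max-pooled context descriptor
    # shared two-layer MLP applied to both descriptors, then summed
    mc = sigmoid(W1 @ np.maximum(W0 @ avg, 0) + W1 @ np.maximum(W0 @ mx, 0))
    return F * mc               # refine F by broadcasting M_c over H and W

def spatial_attention(F, kernel):
    # pool along the channel axis to form a two-channel 2D descriptor
    avg = F.mean(axis=2, keepdims=True)       # (H, W, 1)
    mx = F.max(axis=2, keepdims=True)         # (H, W, 1)
    desc = np.concatenate([avg, mx], axis=2)  # (H, W, 2)
    # a real model convolves desc with a 7x7 kernel; a 1x1 mix is used here
    ms = sigmoid(desc @ kernel)               # (H, W, 1)
    return F * ms               # refine F by broadcasting M_s over channels
```

Both functions return a refined feature map of the same shape as the input, matching the F_c = M_c(F) ⊗ F and F_s = M_s(F) ⊗ F forms above.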

| Arrangement of attention modules in the proposed work-DJAM
The proposed architecture alternates between convolution and pooling layers. The MRI image is fed into the architecture as input, and both downsampling and upsampling are used. Downsampled features are concatenated with those acquired during the upsampling process; this approach ensures that every significant feature is extracted without variation. During the downsampling phase, convolutions 1, 2, 3, and 4 are performed. To reduce the number of depth channels, the upsampling process begins with a 1 × 1 convolution of the previous layer's output, performed in Conv6. As shown in Figure 2A, the output of Pool3 is passed through the DJAM and the outcome is multiplied with the 1 × 1 convolution; the same approach is followed for the outputs of Pool2 and Pool1. The segmented results for both systole and diastole are compared with other techniques in Figure 3A,B. This join attention model exploits the inter-channel relationship of features, and with this architecture all common and dominant features are extracted elegantly from the given image.
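The concatenation of downsampled encoder features with upsampled decoder features can be illustrated with a minimal NumPy sketch; `upsample2x` and `skip_concat` are hypothetical helper names, and nearest-neighbour upsampling is an assumption made for brevity.

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour upsampling of an (H, W, C) feature map
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_concat(down_feat, up_feat):
    # concatenate encoder features with upsampled decoder features
    # along the channel axis, as in the skip connections above
    return np.concatenate([down_feat, upsample2x(up_feat)], axis=2)
```

For example, an encoder map of shape (16, 16, 8) joined with a decoder map of shape (8, 8, 16) yields a (16, 16, 24) feature map.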

| The proposed RIDRN
The regions extracted by the proposed DJAM are fed into the RIDRN for feature extraction. The RIDRN extracts the features needed for correct classification of cardiac disease.
Feature extraction brings out important information from images. 39,40 The mechanism of the proposed RIDRN is shown in Figure 4. This architecture is based on a deep residual network. 41 It works on a 50-layer residual network, follows the idea of transfer learning, and builds a global average pooling layer and a fully connected layer. Residual learning increases network performance by mitigating gradient problems: as network depth increases, gradients can explode or vanish.
The network is constructed as N(i) = F(i) + i, and fitting F(i) = 0 structures an identity map N(i) = i, where i is the previous layer's input, N(i) is the mapping from the input to the summation, and F(i) is the network mapping before the summation. Figure 4 shows the architecture of RIDRN, which is used to bring out features from the segmented image.
In the forward process, a continuous computation is carried out from layer l to layer L:

x_{l+1} = x_l + F(x_l, w_l)

Unrolling this recursion, we finally get

x_L = x_l + Σ_{i=l}^{L-1} F(x_i, w_i)

where w denotes the weights of the residual mapping. For residual elements, the subsequent input equals the input added to the output of each residual element. Next, global average pooling is employed to find the average of each feature map. In a fully connected layer, overfitting occurs because of the large number of parameters, so the number of parameters is reduced to fix the overfitting problem; in addition, the global average pooling layer is more robust to spatial translation of the input. The resulting vector is then given to the softmax activation function.
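The residual recursion above can be sketched directly in NumPy. The helper names are hypothetical, and each residual mapping F is reduced to a plain callable for illustration.

```python
import numpy as np

def residual_forward(x, residual_fns):
    # x_{l+1} = x_l + F_l(x_l); unrolled, x_L = x_l + sum of residuals
    for F in residual_fns:
        x = x + F(x)
    return x

def global_average_pool(feature_maps):
    # average each (H, W) map down to a single value per channel
    return feature_maps.mean(axis=(0, 1))
```

With a single residual mapping F(x) = 2x and input x_l = 1, the output is x_l + F(x_l) = 3, matching the summation form above.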
In this proposed work, the features from every block are integrated, as illustrated in Figure 4. In the first block, the LV features (LVF1) are concatenated with the MYO features (MYOF1), the result is concatenated with the RV features (RVF1), and finally the resultant is concatenated with LVF1 again. This integration yields salient features in the feature map. In the second block, the features are concatenated in the same manner: LVF2 with RVF2, then with MYOF2, and finally with LVF2. In the third block, LVF3 is concatenated with RVF3, then integrated with MYOF3, and the result is concatenated with LVF3. Constructing this integration yields the salient features of the three regions, namely LV, RV, and MYO, and acquiring these salient features leads to accurate classification of diseases.
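The block-wise integration order described above for the first block (LVF joined with MYOF, then RVF, then LVF again) can be sketched as channel-wise concatenation; the function and feature names are hypothetical stand-ins for the paper's feature maps.

```python
import numpy as np

def integrate_block(lvf, myof, rvf):
    # first-block ordering: LVF ⊕ MYOF, then ⊕ RVF, then ⊕ LVF again
    out = np.concatenate([lvf, myof], axis=-1)
    out = np.concatenate([out, rvf], axis=-1)
    return np.concatenate([out, lvf], axis=-1)
```

With three 4-dimensional feature vectors, the integrated result has 16 dimensions, with the LV features appearing at both ends of the concatenation.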

| The classification approach
The features extracted by the proposed RIDRN, described in the previous Section 3.3, are fed into the classification process. At present, deep learning-based classification techniques are widely used. 42 According to statistics, female breast cancer has risen to the top of the list of cancers. Zhu et al. 43 analyzed a wide range of recent papers and provided a thorough assessment of convolutional neural network-based diagnosis of breast cancer: they start with a review of several imaging techniques, and the subsequent part describes the structure of CNNs and some open-access breast cancer datasets. The proposed DSNN 44 is a model that combines the DenseNet architecture with spiking neural networks (SNNs) to enhance the understanding of decision-making processes in brain disease classification; the key contribution of that paper is the integration of attention mechanisms into DSNN, which highlight salient regions within brain images, enabling clinicians and researchers to identify the areas contributing most to the disease classification. The ROENet 45 method automatically categorizes malaria parasites in blood samples; three randomized neural networks (RNNs) are combined into one ROENet output, which improves the performance of the architecture. Initially, an attempt was made to use a deep learning classification technique in this work, but it produced poor classification accuracy. Therefore, machine learning classifiers such as SVM, 46 KNN, 47 and RF 48 are used for classification, and RF produces better classification accuracy than the other techniques.
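A minimal sketch of the final classification stage with scikit-learn's random forest follows. The synthetic cluster features are a hypothetical stand-in for RIDRN outputs (one cluster per class, mimicking NOR, MINF, DCM, HCM, and ARC); they are not the paper's data, and scikit-learn availability is assumed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in features: five well-separated Gaussian clusters
# of 8-dimensional vectors emulate the five disease classes.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 8)) for c in range(5)])
y = np.repeat(np.arange(5), 20)

# Random forest classifier, as used for the final diagnosis step
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
```

On such cleanly separated features the forest fits the training data perfectly; real RIDRN features would of course be noisier.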

| EXPERIMENTAL SETUP
The performance of the proposed DJAM is calculated using the Dice coefficient (DC) and Hausdorff distance (HD). Quantitative analysis of DJAM and RIDRN is done with precision, recall, F1-score, and accuracy. The work is implemented in Python 3.7 with the Keras (2.4.0) and TensorFlow-GPU (2.1.0) libraries, on a system with an Intel Core i7 CPU, 16 GB RAM, and an Nvidia RTX-2060 6 GB graphics card.

| Dataset
The automated cardiac diagnosis challenge (ACDC) dataset is used in this proposed work. It was created from real clinical exams acquired at the University Hospital of Dijon and covers several well-defined pathologies with a sufficient number of cases. The dataset consists of 150 patients divided into five evenly distributed subgroups of 30 patients each (one healthy subject group and four pathological subgroups): normal subjects (NOR), previous myocardial infarction (MINF), dilated cardiomyopathy (DCM), hypertrophic cardiomyopathy (HCM), and abnormal right ventricle (RV). The training portion comprises 100 patients arranged in a balanced manner (20 patients per class), so each class contributes equally to training. For these 100 patients, data are available for the diastole and systole phases of cardiac MRI; 80% of the data are randomly allotted for training and the remaining 20% for testing in the segmentation process.
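The random 80/20 allotment of patient data described above can be sketched as follows; the helper name and fixed seed are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def train_test_split_indices(n, train_frac=0.8, seed=0):
    # randomly allot train_frac of the patient indices for training,
    # and the remainder for testing
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]
```

For the 100 training patients this yields 80 training and 20 testing indices, with every patient appearing exactly once.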

| Performance measures
In this research work, two new techniques are proposed: one for segmentation and another for feature extraction. The performance of the proposed DJAM is evaluated by metrics such as DC, HD, precision, recall, and F1-score. The Dice coefficient and Hausdorff distance are given by

DC(A, B) = 2|A ∩ B| / (|A| + |B|)

HD(A, B) = max( sup_{a∈A} inf_{b∈B} d(a, b), sup_{b∈B} inf_{a∈A} d(a, b) )

Here, A and B represent the segmentations being compared for the systole and diastole datasets.
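The two segmentation metrics can be computed directly from their definitions. A minimal NumPy sketch for binary masks and point sets follows; the function names are illustrative.

```python
import numpy as np

def dice_coefficient(a, b):
    # DC = 2|A ∩ B| / (|A| + |B|) for two binary masks
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff_distance(pts_a, pts_b):
    # HD = max of the two directed distances between point sets
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

For example, masks overlapping in one of their 2 + 1 foreground pixels give DC = 2/3, and a higher DC or a lower HD indicates a segmentation closer to the ground truth.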
The classification performance is examined using the precision, recall, and F-measure metrics.

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1-score = 2 × Precision × Recall / (Precision + Recall)

Overall accuracy = (TP + TN) / (TP + TN + FP + FN), that is, the number of correctly classified images divided by the total number of images in the dataset.

Here, TP implies true positive; TN, true negative; FP, false positive; and FN, false negative.
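The four classification metrics follow directly from the confusion-matrix counts; a small sketch (the function name is illustrative):

```python
def classification_metrics(tp, tn, fp, fn):
    # standard metrics from true/false positive and negative counts
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy
```

For instance, with TP = 9, TN = 89, FP = 1, and FN = 1, precision, recall, and F1 are each 0.9 and the overall accuracy is 0.98.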

| Experimental outcomes
As discussed in the previous sections, the proposed DJAM is employed for segmentation, the proposed RIDRN is used for feature extraction, and RF is used to classify diseases. The experimental results are analyzed in this section.

| Ablation study
An ablation study was carried out to determine the impact of the model architecture. Table 1 describes the ablation experiment performed on the proposed DJAM, examining each module in the architecture. The proposed architecture employs two channel attention (CA) modules and a single spatial attention (SA) module. A comparative method uses two spatial and a single channel attention module (DJAM: SA-CA-SA). Another comparative model omits joining the resultant of the attention model with the upsampled features (DJAM without joined attention). This study provides a clear analysis of the high accuracy of DJAM and also shows the need to include the joined upsampled features in the architecture.

| Performance exploration of the proposed work using deep join attention model-based segmentation
In this proposed system, the deep join attention-based segmentation technique is employed for segmenting the LV, RV, and MYO in both the systolic and diastolic phases. Diastole and systole are the two phases of the heartbeat: the contraction of the heart to pump blood out is called systole, and the relaxation during which the chambers refill is called diastole.

TABLE 1 Ablation experiment of the proposed segmentation DJAM.

| Performance comparison of the proposed DJAM with existing techniques for segmentation
The performance of the proposed system is compared with state-of-the-art techniques and the outcomes are displayed in Table 3. The evaluation is carried out for different epochs. As inferred from Table 3, the performance of the proposed system is higher than that of the U-Net, 22 U-Net++, 49 and U-Net-CBAM 50 techniques. For epoch 120, the DC and HD are calculated for the segmented LV, RV, and MYO, and the results show that the proposed system has an overall gain of approximately 2% to 3%. The proposed deep join attention-based segmentation extracts the dominant and common features elegantly by using the attention model. The precision, recall, and F1-score of the proposed system are shown in Table 4.

| Parameters comparison with other methods
For comparison, four models (UNet, UNet++, UNet with CBAM, and detail-preserving attention UNet (DPA-UNet)) are evaluated in this study for systole and diastole separately. Table 5 compares the various methods with respect to their parameters. Compared with the typical UNet architecture, the proposed DJAM has far fewer parameters, owing to a single convolution process in each layer instead of a double convolution process. For the systole and diastole phases, the proposed DJAM has 24,12,185 parameters.
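The parameter saving from a single convolution per layer versus U-Net's usual double convolution can be illustrated with a simple count. The 3 × 3 kernels and example channel widths here are assumptions for illustration, not the paper's exact configuration.

```python
def conv2d_params(k, c_in, c_out):
    # a k x k convolution has k*k*c_in*c_out weights plus c_out biases
    return k * k * c_in * c_out + c_out

# one 3x3 convolution per layer versus U-Net's usual double convolution
single = conv2d_params(3, 64, 128)
double = conv2d_params(3, 64, 128) + conv2d_params(3, 128, 128)
```

For these example widths the single-convolution layer uses 73,856 parameters against 221,440 for the double convolution, roughly a threefold reduction at this stage.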

| Comparison of training time of the proposed DJAM with various methods
It is important to analyze the training time taken by various methods for segmentation. Table 6 shows the time
The performance of ResNet, VGG, and GoogleNet is compared with the proposed work, and the results show that the proposed work produces better results. RF also performs better than KNN and SVM because of its limited generalization error. RF produces approximately 97% precision and 96% recall and F1-score. The accuracy of the proposed system and other techniques is shown in Figure 5, from which it is clearly understood that the accuracy of the proposed work is higher than that of other existing techniques. The overall accuracy of the proposed RIDRN with an RF classifier is 96%, a gain of 4% compared with GoogleNet. The confusion matrix for the proposed work is shown in Figure 6.

| CONCLUSION
In this investigation, the proposed DJAM is developed for segmentation and the proposed RIDRN is designed for feature extraction, to achieve high classification accuracy in cardiac disease detection. The DJAM attains excellent segmentation results by extracting dominant and common features. Experimental results point out that the proposed RIDRN achieves higher results because it integrates the LV, RV, and MYO features in different combinations. Various classifiers are tested and their performances are analyzed with different performance measures; it is concluded that RF produces the best classification accuracy. The overall accuracy of the proposed work is 96%.

| PROPOSED SYSTEM'S FRAMEWORK

| Proposed system
The architecture of the proposed mechanism is illustrated in Figure 1. The major contributions of the proposed work are segmentation, feature extraction, and classification.

FIGURE 1 Visual flow of the proposed work.

FIGURE 2 (A) The architecture of the proposed deep join attention model (DJAM). (B) Channel and spatial attention carried out in DJAM.
FIGURE 3 (A) Comparative results of the segmented ventricle regions in the diastolic phase. (B) Comparative results of the segmented ventricle regions in the systolic phase.
FIGURE 4 The framework of the proposed region integrated deep residual network (RIDRN).
SUBHA ET AL.

taken by the proposed DJAM and other methods. The training time for epoch 120 is tabulated. UNet-CBAM takes 394 s for training, which is more than the traditional UNet and UNet++; in UNet-CBAM, the spatial and channel attention are carried out simultaneously, so it consumes more time. The proposed work takes 136 s for diastole and 113 s for systole, which is less than UNet-CBAM. The proposed system takes less time than the other techniques because channel attention is followed by spatial attention, and this continues in the subsequent processing, consuming less time.

| Performance analysis of the proposed RIDRN feature extraction with different classifiers for cardiac disease identification
This research work uses the proposed RIDRN for efficient feature extraction. The segmented LV, RV, and MYO are given as input to this process, and the features are extracted by the proposed RIDRN. The extracted features are then classified with various machine learning classifiers such as SVM, KNN, and RF. The performance of different networks with different classifiers is listed in Table 7.

FIGURE 5 Accuracy of the proposed work with other techniques.
FIGURE 6 The confusion matrix of the proposed work for cardiac disease detection.
This research work aims to build an automatic system to detect heart disease. For this purpose, segmentation and feature extraction techniques are proposed. The input MRI images are subjected to the DJAM architecture to generate segmentations, which localize the LV, RV, and MYO within the whole MRI images. The proposed region integrated deep residual network (RIDRN) is employed for feature extraction from the segmented image, and the random forest (RF) classifier is used to classify cardiac diseases. Five sections are discussed in this research work, starting with the introduction. Section 2 describes the literature review, the proposed work is explored in Section 3, Section 4 explains the experimental results, and Section 5 is the conclusion.
Contributions of this research include the following:
1. The proposed DJAM is used to segment the LV, RV, and MYO for both diastolic and systolic phases by extracting dominant and common features.
2. The proposed RIDRN is used to bring out features from the segmented image.
3. The RF classifier is used to classify five classes from the extracted features: normal case (NOR), dilated cardiomyopathy (DCM), heart failure with infarction (MINF), abnormal right ventricle (ARC), and hypertrophic cardiomyopathy (HCM).
4. The effectiveness of the proposed system is validated on the ACDC dataset, and the experimental results are compared with other techniques.

TABLE 2 Performance analysis of the DJAM with different epochs.
TABLE 3 Performance analysis of the proposed DJAM with other techniques.
TABLE 4 Comparison of the proposed DJAM with precision, recall, and F1-score values for both systole and diastole phases with different techniques.

The work is executed for different epochs: 20, 40, 60, 80, 100, and 120. In the systole phase, for epoch 20, the DC is 0.692, 0.661, and 0.675 for LV, RV, and MYO, respectively; for the diastole phase, the DC values are 0.823, 0.765, and 0.717. For epoch 120, the DC reaches 0.967, 0.94, and 0.946 for LV, RV, and MYO, respectively. It is inferred from Table 2 that the DC values increase as the epoch count increases for both the systole and diastole phases; a higher DC value indicates better performance of the proposed segmentation algorithm. On the other hand, the HD values obtained for epoch 40 are 12.8, 15.1, and 14.3, respectively, and the HD values decrease as the epoch count increases. A smaller HD value indicates that the predicted segmentation is closer to the ground-truth segmentation; thus, the proposed DJAM achieves better performance. The performance analysis of the proposed DJAM with other techniques is tabulated in Table 3.