DBCU-Net: deep learning approach for segmentation of coronary angiography images

Coronary angiography (CAG) is the “gold standard” for diagnosing coronary artery disease (CAD). However, due to the limitations of current imaging methods, CAG images have low resolution and poor contrast, with many artifacts and much noise, which makes blood vessel segmentation difficult. In this paper, we propose DBCU-Net for automatic segmentation of CAG images, an extension of U-Net that incorporates DenseNet-style dense connectivity and bi-directional ConvLSTM (BConvLSTM). The main contribution of our network is that, instead of plain convolutions in the feature extraction of U-Net, we incorporate dense connectivity and bi-directional ConvLSTM to highlight salient features. We conduct our experiments on our private dataset and achieve average Accuracy, Precision, Recall and F1-score for coronary artery segmentation of 0.985, 0.913, 0.847 and 0.879, respectively.


Introduction
Coronary artery disease (CAD) is a common heart disease, which is the leading cause of death worldwide according to the World Health Organization (WHO) [1]. Globally, the number of patients with CAD is expected to increase from 327.9 million in 2017 to 365.9 million in 2026. CAD poses a huge threat to human life. An accurate diagnosis of CAD is particularly important. As computer-aided diagnosis and treatment technologies continue to advance, increasingly powerful medical imaging methods, such as magnetic resonance imaging (MRI), computed tomography (CT) [2], and X-ray coronary angiography (CAG) [3], have been proposed to aid in diagnosis. As the "gold standard" for the diagnosis of CAD, CAG can precisely pinpoint the site and the degree of coronary artery stenosis, as well as the symptoms of the condition.
Accurate medical image analysis is crucial to subsequent clinical diagnosis and treatment. Diagnosis and treatment of clinical diseases mainly rely on advanced instruments and doctors' high-precision technology. However, manual segmentation of such CAG images requires a lot of medical expertise, which is time-consuming and is prone to human error. Due to the shortage of medical resources, a computer diagnosis system that can assist doctors in making better diagnoses and determining follow-up treatment plans in a shorter time is preferred.
In normal CAG image segmentation, the images are divided into two parts: vessels and background. Although traditional machine learning techniques (e.g., model-based methods and atlas-based methods [4][5][6][7][8]) have achieved good performance in vessel segmentation, deep learning methods have surpassed traditional techniques in segmentation efficiency and detection accuracy thanks to their automation and versatility.
In this paper, we propose DBCU-Net for automatic segmentation of CAG images, an extension of U-Net that combines DenseNet [9] with BConvLSTM [10, 11]. The dense connectivity strengthens feature propagation and extracts multi-level features to enhance representation. The BConvLSTM ensures that both forward and backward passes are used simultaneously. The segmentation of blood vessels is achieved by combining the features of the encoding and decoding layers in each direction.

The contributions of our work are listed as follows: (1) Other tissues and noise in the background can be accurately segmented out. (2) The mis-segmentation of vessels belonging to other branches of the heart in the background is reduced. (3) Tiny vessels at the distal ends are segmented with similarly positive results. We conduct our experiments on our private dataset and obtain competitive performance.

Related work
Since 2010, deep learning has outperformed conventional state-of-the-art approaches in visual recognition tasks such as heart MRI segmentation and lung cancer nodule segmentation. Ciresan et al. [12] used a CNN to automatically segment electron microscope images for the first time and won the electron microscopy challenge by a large margin. In 2015, Ronneberger et al. [13] proposed a groundbreaking network architecture named U-Net. U-Net is composed of an encoder-decoder structure, where the encoding path includes four down-sampling layers, for a total down-sampling factor of 16. Symmetrically, its decoding path is up-sampled four times accordingly. The high-level semantic feature map obtained from the encoding path is restored to the resolution of the original image, and feature maps at the same stage are stitched together via skip connections. Finally, a 1 × 1 convolution is used to obtain the required number of classes (e.g., vessels and background) from each component feature vector. Although U-Net has achieved good results in medical image segmentation, it also has some limitations. Firstly, such a large and deep network requires about 32 million parameters, so a large quantity of training data is needed to fit it well. Secondly, achieving higher accuracy requires adding more convolutions to the network, which forces the network to learn unnecessary information and leads to overfitting. Finally, as the network deepens, there is a risk of vanishing gradients, which poses great risks to the training of deep neural networks.
To solve the above-mentioned architectural problems, many U-Net-based variants have been proposed and have achieved good results. Ahmed et al. [14] proposed a new U-Net-based DNN structure to improve the segmentation effect. This method selects various enhancement methods (e.g., histogram equalization and Frangi multi-scale filtering) according to the background to remove noise and improve segmentation. Shi et al. [15] proposed an adaptive generative adversarial network to segment coronary blood vessels. This method uses an adaptive U-shaped network as the generator and a three-layer pyramid structure as the discriminator. Under the adversarial mechanism, the generator can extract the fine features of the coronary arteries. This method achieved significantly better accuracy and continuity in blood vessel segmentation than other segmentation methods. However, it should be noted that this U-Net-based approach requires multiple forms of prior information.
With the continuous development of deep learning in medical image segmentation, many scholars have combined U-Net with other deep neural networks and achieved competitive results. Xian et al. [16] introduced four fully convolutional neural networks based on the U-Net structure, using ResNet, DenseNet, and the Residual Attention Network as classification backbones in place of the plain U-Net encoder. While this approach significantly improves segmentation performance compared to U-Net, it still struggles to accurately segment small blood vessels. Fan et al. [17] proposed an FCN-based method for coronary artery vessel segmentation, which removes catheters and artifacts by enhancing the vascular structure of the low-contrast image, thereby improving the segmentation. Li et al. [18] proposed a U-Net-based CNN called CAU-Net. To overcome the lack of a public coronary angiography image dataset, they established a dataset containing 538 samples. Although the method has excellent denoising and recognition capabilities, it performs poorly on small blood vessels. Pearl et al. [19] designed a two-stage vessel extraction framework to segment coronary arteries and fundus vessels. This method is mainly composed of Vessel Specific Convolutional (VSC) blocks, Skip chain Convolutional (SC) layers, and feature map summations; the VSC and SC blocks enable better feature learning and propagation, while the feature map summations play an important role in extracting blood vessels. Zhou et al. [20] combined deep learning with traditional algorithms and used ResNet and U-Net to segment small datasets. This algorithm has lower complexity and fewer training computations while maintaining the segmentation effect, but it is more prone to errors near the distal RCA bifurcation. Yang et al. 
[21] proposed a U-Net-based fully convolutional neural network to segment the main blood vessels in coronary angiography images and improved the segmentation performance by introducing an improved loss function, but the network is only suitable for segmenting single blood vessels. Jun et al. [22] improved the U-Net-based codec structure. Unlike U-Net, which has only one level of skip connection between encoder and decoder blocks, T-Net arranges pooling and upsampling appropriately during encoding and decoding so that all feature maps in an encoding block can be connected to the decoding block. However, compared with its segmentation performance on the RCA, this algorithm's separation of vessels on the LAD and LCX still needs improvement. Azad et al. [23] proposed an extension of U-Net, the bi-directional ConvLSTM U-Net with densely connected convolutions, for medical image segmentation. To improve feature propagation and reuse, they utilize densely connected convolutions in the last convolutional layer of the encoding path.
Although the above methods have made great progress, there are still areas for improvement: (1) Other tissues and noise in the background cannot be accurately segmented out, and further fine segmentation is needed in post-processing. (2) Blood vessels belonging to other branches of the heart in the background are easily mis-segmented. (3) The accuracy of small-vessel segmentation near the distal endings needs to be improved.

Methodology
In this paper, we propose the DBCU-Net network for blood vessel segmentation, in response to the insufficient vessel features extracted by other networks, together with a preprocessing method that delimits the vessel region of interest, in response to the problem that other background noise is easily mis-segmented. First, the blood vessel annotation data is divided into regions of interest and a mask dataset is generated. Then, the dataset undergoes data augmentation as well as contrast enhancement using contrast limited adaptive histogram equalization (CLAHE). Densely connected blocks avoid learning redundant information, and the bi-directional ConvLSTM ensures that both forward and backward passes are used simultaneously. The features of the encoding layer and the decoding layer are combined in each direction to realize the segmentation of blood vessels.
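To illustrate the contrast-enhancement step, the following is a minimal numpy sketch of the core CLAHE idea: per-tile histogram equalization with a clipped histogram. It omits the bilinear interpolation between tiles that full CLAHE performs, and in practice a library routine (e.g., OpenCV's CLAHE implementation) would be used; all names and parameter values here are illustrative.

```python
import numpy as np

def clip_limited_equalize(tile, clip_limit=0.01, n_bins=256):
    """Histogram-equalize one tile using a clipped histogram (the CLAHE core idea)."""
    hist, _ = np.histogram(tile, bins=n_bins, range=(0, 256))
    limit = max(1, int(clip_limit * tile.size))
    excess = np.sum(np.maximum(hist - limit, 0))
    hist = np.minimum(hist, limit) + excess // n_bins  # redistribute clipped mass
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255.0
    return cdf[tile.astype(np.intp)].astype(np.uint8)

def simple_clahe(img, tiles=8):
    """Tile-wise clipped equalization (no cross-tile interpolation, unlike full CLAHE)."""
    h, w = img.shape
    out = np.empty_like(img)
    th, tw = h // tiles, w // tiles
    for i in range(tiles):
        for j in range(tiles):
            ys, xs = slice(i * th, (i + 1) * th), slice(j * tw, (j + 1) * tw)
            out[ys, xs] = clip_limited_equalize(img[ys, xs])
    return out
```

Clipping the per-tile histogram bounds the slope of the mapping, which is what keeps CLAHE from over-amplifying noise in nearly uniform background regions.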

Preprocessing of the region of interest
Many fundus vessel segmentation methods have been proposed in the research community, along with publicly available datasets. In the public fundus vessel images, the region of interest in the mask image is a circular white area that roughly contains the fundus vessels. Inspired by these mask images, this paper preprocesses the annotated images in the coronary angiography vessel dataset: from the regions of interest in which blood vessels exist, a new mask image dataset is generated. The preprocessing effects are shown in Fig. 1.

Network framework
In this paper, we present a neural network named DBCU-Net, as shown in Fig. 2. The encoding path consists of three layers, each composed of a dense connectivity module followed by 2 × 2 max pooling. The output of the first convolutional layer in each block is added to the batch-normalized output, and then the feature map size is reduced to half of the original size using 2 × 2 max pooling. Before max pooling is performed, the feature maps are combined through a skip connection via a bi-directional ConvLSTM, and the resulting feature maps are combined with the feature maps from the upsampling layer.
The decoding part of DBCU-Net also has three layers, where each layer includes an upsampling layer, a bi-directional ConvLSTM module and a convolution module. The connecting layer of the encoding and decoding paths also has three layers, each with two consecutive convolution operations. We use the 2 × 2 upsampling operation after this and use its output as the decoding path input of the first BConvLSTM.

Dense connectivity mechanism
The coding path proposed in this paper is inspired by U-Net. Each layer of the coding path in traditional U-Net consists of two consecutive 3 × 3 convolutions and one pooling operation. As the network deepens, this series of convolutional layers learns many redundant features while helping the network to learn different types of features. To solve this problem, we apply densely connected convolutions in the coding path. The feature maps of different layers in DenseNet must be kept at the same size for the feature map concatenation operation, whereas the downsampling layer reduces the feature map size. To avoid affecting the downsampling operations in the coding path, we divide DenseNet into dense blocks; the structure is shown in Fig. 3.
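The channel growth produced by dense connectivity can be sketched as follows. The `conv_stub` is a hypothetical stand-in for a learned 3 × 3 convolution (a random channel-mixing projection), used only to show how each layer consumes the concatenation of all preceding feature maps; the growth rate and layer count are illustrative, not the paper's actual hyperparameters.

```python
import numpy as np

def conv_stub(x, out_channels, rng):
    """Stand-in for conv + BN + ReLU: a random channel-mixing projection
    (hypothetical; a real dense block uses learned convolutions)."""
    w = rng.standard_normal((x.shape[-1], out_channels)) * 0.1
    return np.maximum(x @ w, 0.0)  # ReLU

def dense_block(x, growth_rate=12, n_layers=4, rng=None):
    """Each layer sees the concatenation of all preceding feature maps."""
    rng = rng or np.random.default_rng(0)
    features = [x]
    for _ in range(n_layers):
        inp = np.concatenate(features, axis=-1)   # dense connectivity
        features.append(conv_stub(inp, growth_rate, rng))
    return np.concatenate(features, axis=-1)

x = np.ones((48, 48, 16))   # H x W x C feature map
y = dense_block(x)          # output channels = 16 + 4 * 12 = 64
```

Because every layer's output is reused by all later layers, each layer only needs to add a small number of new channels (the growth rate), which is why dense connectivity limits redundant feature learning.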

Bi-directional ConvLSTM mechanism
LSTM (Long Short-Term Memory) networks were proposed to solve the problem that RNNs cannot handle long-term dependencies, by adding the cell state c. This state allows the network to preserve information over long time spans. The structure of LSTM is shown in Fig. 4.
The mathematical formulation of LSTM is as follows:

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
c̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)
c_t = f_t ∘ c_{t−1} + i_t ∘ c̃_t
h_t = o_t ∘ tanh(c_t)    (1)

where h_{t−1}, c_{t−1} and x_t are inputs: the LSTM output at the previous moment, the cell state at the previous moment, and the input value of the network at the current moment; h_t and c_t are outputs: the LSTM output and the cell state at the current moment. For the remaining notation, i_t is the input gate, f_t is the forgetting gate, o_t is the output gate, c̃_t is the candidate cell state for the current input, σ is the sigmoid function and tanh is the hyperbolic tangent function.
In the above equations, W represents the weight matrix of each gate, [h_{t−1}, x_t] denotes the two input vectors concatenated into one longer vector, b is the bias term of each gate, and ∘ represents element-wise multiplication.
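A minimal numpy implementation of one LSTM step following these gate equations; the dimensions, the packing of the four gate matrices into a single W, and the initialization are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps the concatenated [h_{t-1}, x_t] to all four gates
    stacked side by side; b is the corresponding bias vector."""
    z = np.concatenate([h_prev, x_t]) @ W + b   # all gate pre-activations at once
    H = h_prev.size
    i_t = sigmoid(z[0 * H:1 * H])        # input gate
    f_t = sigmoid(z[1 * H:2 * H])        # forget gate
    o_t = sigmoid(z[2 * H:3 * H])        # output gate
    c_hat = np.tanh(z[3 * H:4 * H])      # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat     # element-wise (Hadamard) cell update
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Example dimensions: input size 8, hidden size 16 -> W has shape (8 + 16, 4 * 16)
```

Note how the cell state c_t is updated purely by element-wise gating, which is what lets gradients flow across many time steps.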
The core of LSTM is the cell state, which preserves long-term dependencies and solves the problem that RNNs cannot handle them. However, LSTM uses full connections in the input-to-state and state-to-state transitions, which ignores spatial correlation and has limited ability to capture spatial features. ConvLSTM [24] was therefore proposed; its essence is the same as LSTM, with the addition of convolution operations. The convolution operation not only captures temporal relationships but also extracts spatial features.
In the standard U-Net, the feature maps in the encoding path are cropped and copied to the decoding path, and then simply concatenated with the output of the corresponding upsampling layer. We use a BConvLSTM instead of simple concatenation to combine the two feature maps and obtain more accurate outputs. Traditional ConvLSTM only processes information in the forward direction, yet all information in the sequence should be fully considered. We therefore use a BConvLSTM in the skip connection, which considers not only the forward dependencies but also the backward dependencies; the structure is shown in Fig. 5.
In this paper, the output features cropped and copied from the coding layer are denoted X_e ∈ R^{C_l × W_l × H_l}, and the output features from the previous convolutional layer are denoted X_d ∈ R^{C_{l+1} × W_{l+1} × H_{l+1}}, where C_l and W_l × H_l represent the number of feature channels and the feature map size at layer l, respectively. Since each convolutional layer in the coding path is immediately followed by a pooling operation, C_{l+1} = 2 × C_l, W_{l+1} = W_l / 2 and H_{l+1} = H_l / 2. As shown in Fig. 5, X_d is first passed to the upsampling convolutional layer, which applies an upsampling function followed by a 2 × 2 convolution. After this operation, the size of each feature map is doubled and the number of feature channels is halved, yielding X_d^up ∈ R^{C_l × W_l × H_l}; X_d^up then undergoes a BN operation. X_e and X_d^up are then fed into a BConvLSTM for encoding. The BConvLSTM employs two ConvLSTMs to process the input data in both the forward and backward directions, and subsequently uses the dependencies of these two directions to determine the current output. Existing literature has demonstrated that taking bi-directional relationships into account yields better experimental performance. The ConvLSTM models in the forward and backward directions can be regarded as two independent ConvLSTMs, whose mathematical formulations are shown below.
For the forward direction (the backward direction is analogous):

i_t = σ(W_xi * X_t + W_hi * H_{t−1} + b_i)
f_t = σ(W_xf * X_t + W_hf * H_{t−1} + b_f)
o_t = σ(W_xo * X_t + W_ho * H_{t−1} + b_o)
C_t = f_t ∘ C_{t−1} + i_t ∘ tanh(W_xc * X_t + W_hc * H_{t−1} + b_c)
H_t = o_t ∘ tanh(C_t)

where * represents the convolution operation and ∘ element-wise multiplication. H_t^→ and H_t^← denote the hidden state tensors for the forward and backward passes, respectively, and Y_t denotes the output after considering the bi-directional dependence. Then we can obtain:

Y_t = tanh(W_y^→ * H_t^→ + W_y^← * H_t^← + b)
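The shape bookkeeping of this skip connection can be checked with a short numpy sketch. Nearest-neighbour repetition stands in for the upsampling function, and averaging channel pairs is only a hypothetical stand-in for the learned 2 × 2 convolution that halves the channel count; the layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
C_l, W_l, H_l = 64, 32, 32
x_e = rng.standard_normal((C_l, W_l, H_l))              # encoder skip features X_e
x_d = rng.standard_normal((2 * C_l, W_l // 2, H_l // 2))  # features X_d from the layer below

# Nearest-neighbour upsampling doubles the spatial size ...
x_up = np.repeat(np.repeat(x_d, 2, axis=1), 2, axis=2)
# ... and the 2x2 conv halves the channels; pairwise channel averaging is a
# hypothetical stand-in for that learned convolution.
x_up = x_up.reshape(C_l, 2, W_l, H_l).mean(axis=1)

# X_e and X_d^up now match in shape and can be fed jointly into the BConvLSTM
# instead of being simply concatenated.
```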

Experiments and results
We train DBCU-Net on our private coronary angiography image dataset, which consists of 50 X-ray coronary angiography images along with segmentation annotations of the vessel regions. The angiographic images are grayscale images of different blood vessels with a size of 512 × 512 pixels. Segmentation labels were annotated manually by an experienced cardiovascular clinical expert. Each annotated image is a 512 × 512 binary image corresponding to an angiographic vessel image, with two categories: black represents the background and white represents the vessels. We randomly select 30 angiographic images and their corresponding annotated images as the training set and use the remaining 20 images as the testing set. Example images of an original image, its label (annotated image) and its mask are presented in Fig. 1.
Since the training portion of the coronary angiography image dataset includes only 30 images, the lack of data causes overfitting, which we initially experienced as expected. To alleviate the overfitting problem, we use data augmentation to expand the dataset so that the augmented dataset represents a more comprehensive one. To increase the robustness of our method, we apply the following data augmentation methods to the dataset: (1) Rotation: rotate the image in 45-degree steps, without needing to remove black edges; (2) Mirroring: horizontal and vertical mirroring; (3) Noise processing. The 30 angiography images of the original training dataset are increased to 120 images after this series of rotation, mirroring, and noise processing. After data augmentation, the ratio of training samples to validation samples to test samples in our experiments is 9:1:1.7. Example images of an original image, annotated image, mask, and the corresponding augmented images are presented in Fig. 6. In this experiment, we use batch training to train the network. For each angiography image after data augmentation, 3000 not-fully-overlapping 48 × 48 image patches are randomly cut out using the sliding-window method and input into the network for training. In each epoch of training, we use 90% of the image patches for training and 10% for validation.
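The patch sampling described above can be sketched as follows; the function name and defaults are illustrative, as the exact sliding-window procedure is not public.

```python
import numpy as np

def random_patches(img, label, n_patches=3000, size=48, rng=None):
    """Randomly cut not-fully-overlapping size x size patches from an image
    and its annotation (illustrative sketch of the sampling described above)."""
    rng = rng or np.random.default_rng(0)
    h, w = img.shape
    ys = rng.integers(0, h - size + 1, n_patches)  # top-left corners
    xs = rng.integers(0, w - size + 1, n_patches)
    img_p = np.stack([img[y:y + size, x:x + size] for y, x in zip(ys, xs)])
    lab_p = np.stack([label[y:y + size, x:x + size] for y, x in zip(ys, xs)])
    return img_p, lab_p
```

Sampling thousands of small patches per 512 × 512 image is itself a strong form of augmentation: it multiplies the number of training examples while keeping each network input small.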
Our solution is implemented in TensorFlow. Because the experiment involves sensitive medical data, the dataset and related code will not be made public. In our training procedure, the use of mixed precision not only speeds up training but also reduces GPU memory consumption; mixed precision requires TensorFlow 1.14 or later. Our experiments are conducted on a server equipped with 64 GB of memory, an i7-6850K CPU @ 3.6 GHz, and two NVIDIA GeForce GTX 1080 Ti graphics cards.
Our experiment is trained for 500 epochs with batch size 32 using the Adam optimizer [25] with a learning rate of 0.001. Segmentation results are given in Fig. 7.
To observe the difference between our method and the basic U-Net more intuitively, we show the original images, expert-annotated images, and segmented images in Fig. 7. Benefiting from BConvLSTM, DBCU-Net acquires a feature vector that comprises both high-level and low-level context, resulting from the nonlinear fusion of the encoding path's features and the decoding path's features. From these figures, we observe that compared with U-Net, our proposed method better segments the coronary artery main vessel and most branch vessels, and also segments the small blood vessels at the distal end to a certain degree. Meanwhile, we analyze the segmentation results quantitatively using several suitable evaluation metrics: accuracy, recall, precision, and F1-score. As shown in Table 1, compared with other methods, our network achieved better results on our private dataset: we obtained the best Accuracy, Precision, Recall and F1-score, with 0.985, 0.913, 0.847 and 0.879, respectively. Moreover, compared to U-Net, DBCU-Net achieves competitive results in many reference metrics without adding a large number of additional parameters.
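The four reported metrics are computed pixel-wise from the binary prediction and ground-truth masks; a minimal sketch:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Pixel-wise Accuracy, Precision, Recall and F1 for binary vessel masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)      # vessel pixels correctly detected
    tn = np.sum(~pred & ~gt)    # background pixels correctly rejected
    fp = np.sum(pred & ~gt)     # background predicted as vessel
    fn = np.sum(~pred & gt)     # vessel pixels missed
    acc = (tp + tn) / pred.size
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return acc, precision, recall, f1
```

Because vessel pixels are a small fraction of each image, accuracy is dominated by the background class; precision, recall and F1 are the more informative numbers for vessel segmentation.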

Discussion
In this paper, the DBCU-Net model is proposed to improve on existing methods' over-segmentation and insufficient accuracy. Data augmentation and region-of-interest segmentation are first performed on the images; vessel segmentation is then performed using the DBCU-Net model, which incorporates densely connected blocks and BConvLSTM.
From Fig. 7, the following can be concluded. (1) With the U-Net segmentation method, as shown in the red-marked part of the figure, some blood vessels are disconnected: the simple concatenation in U-Net's skip connection leads to incomplete context extraction, image details cannot be extracted well, and segmentation at vessel crossings and distal parts is poor. (2) The DenseU-Net-based method fuses shallow and deep information in the skip-connection part, but fails to increase the weight of the vessel segments and obtains insufficient effective information. (4) The ResU-Net-based segmentation method solves U-Net's disconnection problem by replacing the simple concatenation of the skip connection with residual modules. The improved methods learn target structures of different shapes and sizes and perform context extraction at coarser scales, thus solving the disconnection problem, and their segmentation of distal vessels is also better than U-Net's. However, over-segmentation occurs due to redundant learned information. (5) The DBCU-Net segmentation method uses bi-directional ConvLSTM in the skip-connection part, thereby taking into account the forward and backward dependence of the information; the vessel segmentation results are more refined, as shown in the yellow-marked part of the figure, and the over-segmentation phenomenon is also greatly reduced.
Finally, the segmentation results of our algorithm are compared and analyzed against other segmentation methods. We show through experiments that DBCU-Net reduces the over-segmentation problem of existing methods and also brings a large improvement in the various segmentation evaluation metrics. As shown in Table 1, while adding only a small number of training parameters, our method achieves the best results on metrics such as Accuracy, Precision, Recall and F1-score compared with other methods. However, due to the imbalance of positive and negative samples in the dataset, our method does not score best on the AUC evaluation, which indicates that the robustness of the proposed model still needs to be improved.
The large cost of labeling CAG images adds to the challenge of acquiring datasets. In the future, we hope to focus on unsupervised segmentation [26] to reduce the difficulty of dataset acquisition. Rather than segmenting blood vessels from a single CAG image, we may in the future use dynamic CAG video [27], to reduce the effects of vessel crossing, masking, and noise on the segmentation.

Conclusion
We proposed a new convolutional neural network architecture, referred to as DBCU-Net, which is based on U-Net and combines dense connectivity and bi-directional ConvLSTM. The proposed model addresses the over-segmentation and insufficient accuracy of current approaches. Ablation experiments show that the blood vessel segmentation results of this method are more accurate. We conduct our experiments on our private dataset and achieve average Accuracy, Precision, Recall and F1-score for coronary artery segmentation of 0.985, 0.913, 0.847 and 0.879, respectively. On the mainstream metrics, our method achieves competitive results without adding a large number of additional parameters.

Competing interests
The authors declare no competing interests.

Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work. There is no professional or other personal interest of any nature in any product, service and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.
Ethical approval
This is an observational study. The data used in this article is obtained with the permission of the patients themselves.

Consent to participate
Informed consent was obtained from all individual participants included in the study.

Consent to publish
The authors affirm that human research participants provided informed consent for publication of the images in Figs. 1, 6 and 7.