Brain tumor segmentation based on improved U-Net

doi:10.21203/rs.3.rs-1903672/v1

Download PDF

Research Article

Brain tumor segmentation based on improved U-Net

https://doi.org/10.21203/rs.3.rs-1903672/v1

This work is licensed under a CC BY 4.0 License

You are reading this latest preprint version

Background: Automatic segmentation of brain tumors using deep learning algorithms is one of the research hotspots in the field of medical image segmentation at this stage. An improved u-net network is proposed to segment brain tumors in order to improve the segmentation effect of brain tumors.

Methods: In order to solve the problems that other brain tumor segmentation models such as U-Net have insufficient ability to segment edge details, poor extraction of location information and the commonly used Binary Cross-Entropy and Dice loss are often ineffective when used as loss functions for brain tumor segmentation models, we propose a serial encoding-decoding structure, which achieves improved segmentation performance by adding Hybrid Dilated Convolution (HDC) modules and the connections between each module of the two serial networks, in addition, we propose a new loss function in order to make the model more focused on samples that are difficult to segment and classify. We compared the results of our proposed model and the commonly used segmentation models under IOU, PA, Dice, Precision, Hausdorf95, and ASD metrics.

Results: The performance of the proposed method outperforms other segmentation models in each metric. In addition, the schematic diagram of segmentation results shows that the segmentation results of our algorithm are closer to ground truth, showing more details of brain tumors, while the segmentation results of other algorithms are more smooth.

Conclusions: Our algorithm has better semantic segmentation performance, compared with other commonly used segmentation algorithms. The technology we proposed can be used in the diagnosis of brain tumors to provide better protection for patients' later treatment.

Brain Tumor

Dice loss

Encoding-decoding

HDC

Segmentation

U-Net

Tumors have always been a feared disease, and brain tumors have an incidence rate of 1.5% and an alarming 3% mortality rate in the population, and have been feared because of their extremely high incidence and mortality rate[1]. Brain tumor is a cancer of the brain tissue, which is formed when the brain tissues become cancerous or metastasize to other tissues in the skull. With the development of medical imaging technology, imaging technology has gradually been applied to tumor detection. At the beginning, Computed Tomography (CT) technology was used for detection, but with the development of magnetic resonance physics, and then combined with the theory and technology of digital image reconstruction, Magnetic Resonance Imaging (MRI), is slowly taking shape, because it has no ionizing radiation damage to the body and many imaging parameters have gradually become the mainstream of medical detection of brain tumors[2]. However, most of the current clinical diagnosis of brain tumors is based on the clinician's experience. The manual method to segment, diagnose and annotate the tumor images is inefficient and demanding for image analysts, and it is easy to miss the best treatment time of patients [3]. Therefore, how to efficiently diagnose brain tumor images and reduce the diagnostic error of images has become a research direction for many researchers. Currently, deep learning-based intelligent algorithms are widely used in brain tumor analysis tasks, and CNNs are adopted by researchers for their good segmentation performance and the convenience of feature extraction[4]. However, CNN is prone to computational redundancy when processing a large number of dense images [5]. Therefore, FCN[6], U-Net [7] and other derived algorithms based on CNN have been proposed. However, many brain tumor segmentation algorithms still have many problems, such as the segmentation accuracy and recognition accuracy of the algorithms are not high enough, and the attention to details is not enough. In this paper, we propose an improved segmentation network based on U-Net to solve these problems, using a tandem encoding-decoding model and proposing a new loss function to increase the weight of samples that are difficult to segment classify. The experimental results show that it outperforms several other commonly used derived models based on CNN in terms of segmentation performance and tumor recognition accuracy.

How to perform image segmentation is a key problem in the field of Computer Vision(CV), and image segmentation generally includes semantic segmentation and instance segmentation [8]. The brain tumor segmentation in this paper is using semantic segmentation. The evaluation of the segmentation ability of the semantic segmentation model needs to focus not only on the overall segmentation of the segmented image, but also on the edge segmentation of the segmented image. So, how to design the segmentation algorithm becomes the most important, and different researchers have proposed different methods for the research of segmentation algorithm. With the rise of neural network models and the development of deep learning, segmentation networks based on deep learning have been rapidly developed and applied. Starting from the concept of neural network proposed by Le Cun, neural networks have been developed rapidly, and various neural network structures began to emerge slowly, such as AlexNet [9], VGG[10], ResNet [11], etc. Although these networks have advantages in the field of image recognition and prediction, the advantages in accurate semantic segmentation of images are not so obvious. In order to change this situation, Shelhamer et al. proposed FCN, applying FCN to semantic segmentation of images[6]. They achieved the segmentation mainly by replacing the fully connected layers of net with convolutional layers, and the results showed that the image semantic segmentation did outperform the other Convolutional Neural Networks (CNN). The reason is that full convolutional networks (FCNs) require high data volume, and such brain tumor images are relatively few and precious in medicine. To solve this problem, Ronnerberger et al. modified the full convolutional network by adopting transposed convolution, up-sampling, and fusion of context features and detail features to form U-Net, which can obtain enough data features with few brain images, and the segmentation effect is significantly better than that of full convolutional network (FCN). But there are still problems of incomplete information and low segmentation accuracy when performing brain tumor segmentation. In order to solve the remaining problems of U-Net network, Alom proposed a recursive neural network based on U-Net and a recursive residual convolution neural network based on U-Net model.[12]. Zhang et al. Used U-Net extended path to design residual connection, and proposed a depth residual U-Net for image segmentation[13]. Milletari proposed a 3D U-Net model, which uses 3D convolution kernel to expand the original U-Net structure, and then adds residual units to further modify the original U-Net structure [14]. Salehi uses auto context algorithm to enhance U-Net to improve segmentation effect[15]. Zhou et al. Used the nesting method to replace the original connection method [16]. Wanli Chen, Yue Zhang et al. proposed a stacked U-net with a bridging structure to address the problem of increasing training difficulty as the number of layers of the network increases [17]. The above segmentations model can only segment images, but cannot grade segmented tumor. To achieve this clinical need, Mohamed A. Naser and M. Jamal Deen first used the trained segmentation model and MRI images for mask generation, and then used a densely connected neural network classifier to classify the tumor [18].

Dataset

The dataset was obtained from the Kaggle open source database "Brain Tumor MRI Image Classification", which contains three main types of brain tumors: glioma tumor, meningioma tumor, pituitary tumor. The sample size of the training set containing brain tumors is 2475 and the sample size of the test set is 289. Firstly, we screened the data set and selected sections with brain tumors in the sample as our experimental data set. Then, image enhancement was carried out and the sample size of the enhanced data set was 2624. Finally, manual labeling of the enhanced sections was completed with the help of graduate students from the Medical College. The labeled images were reassigned according to the 10:1 ratio of training set and test set.

Encoding Module And Decoding Module

The Vgg16 chosen in this paper is used as the basic framework of the encoding module. Our neural network models use maximum pooling to reduce the network volume and highlight the main features when performing feature extraction. This method may ignore the details in segmentation and loses the spatial location information of the main features. However, brain tumor cutting requires the accuracy of cutting to millimeter or even micron level, which requires us to obtain more detailed features and minimize the loss of features in training. we know that larger convolutional kernels may capture more positional information compared to smaller sized convolutional kernels, because a larger receptive field can better resolve positional information [19]. Therefore, we chose Hybrid Dilated Convolution (HDC) [20]. We replaced the ordinary convolution of each layer in the two VGG encoding modules with a group of three dilated convolutions. The dilation rates were selected as 1, 2 and 3 respectively to satisfy the formula, ${M_i}=\hbox{max} \left[ {{M_{i+1}} - 2{r_i},{M_{i+1}} - 2\left( {{M_{i+1}} - {r_i}} \right),{r_i}} \right]$, representing the maximum distance between two non-zero values. The distribution of dilation rate into sawtooth wave pattern, helping the top layer obtain more information while keeping the receiving field unchanged compared with the original configuration that dilation rate is into the same value[20]. The encoding network structure as shown in Fig. 1, which is divided into 5 layers, each layer is composed of a group of dilated convolution blocks and a maximum pooling. Each group of dilated convolution blocks is composed of three 3×3 dilated convolution (the dilation rate of dilated convolution is 1, 2, 3 respectively) and three ReLu functions. In the decoding module, transpose convolutions are used to obtain more image information.

Series Section

The complete architecture of SCU-Net is to connect two encoding-decoding modules in series and then bridge each layer for feature sharing purposes. Several existing works show that the interactions between global features or contexts help to do semantic segmentation, and we experimented with two bridging approaches, a pixel summing approach and a channel concatenation approach, respectively. Finally, we used the concatenation approach to connect the different modules.

The overall architecture of SCU-Net is shown in Fig. 2, which is composed of two encoding-decoding structures in series. Unlike stacked U-Net, SCU-Net directly uses the output of the previous encoding-decoding structure as the input of the next encoding-decoding structure, maximizing the use of the image information after the previous decoding, avoiding structural redundancy, and being more conducive to the extraction of key information and mining.

Linking the same layers of two encoding-decoding structures through concatenation can deepen the internal connection between the two encoding-decoding structures and perform secondary feature extraction of the previous information to obtain richer semantic information and segmentation effects.

Loss function

Dice Coefficient is a commonly used performance metric for segmentation tasks and thus has also been modified as a loss function to obtain higher segmentation performance, but since Dice Coefficient Loss is a non-convex function, the training process may not achieve the desired results, therefore, Shruti Jadon et al. proposed Log-Cosh Dice Loss to solve the problem of non-convex loss function by adding smoothing through Lovsz expansion[21]. The formula is as follows:

$$coshx=\frac{{{e^x}+{e^{ - x}}}}{2}$$

$${L_{lc - dce}}=log(cosh({\text{ }}DiceLoss{\text{ }}))$$

Although Log-Cosh Dice Loss takes into account the problem of non-convexity, the same loss coefficient is used for different segmentation samples. We improve Log-Cosh Dice Loss so that it can achieve adaptive changes for different samples by reducing the weights of easily segmented samples and adding the weights of difficult segmented samples, making the model focus more on the hard-to-segment samples during training. In addition, the improved Log-Cosh Dice Loss and Focal Loss are combined so that the Loss function can focus on both model classification ability and model segmentation ability. The improved Loss function is called focus-dice Loss, as shown in Formula (3). ${\omega _1}$和${\omega _2}$are used to adjust the weights between Focal Loss and the improved Log-Cosh Dice Loss.

$$\begin{gathered} Focal-Dice Loss={\omega _1}(FocalLoss)+ \hfill \\ {\omega _2}{\left( {DiceLoss} \right)^\gamma }log(cosh({\text{ }}DiceLoss{\text{ }})) \hfill \\ \end{gathered}$$

Experimental Environment

The framework used in the experiments is PyTorch, and the specifications of the machine are: graphics card: Tesla P40; video memory: 22G; CPU: Intel(R) Xeon(R) CPU E5-2680 v4; memory: 440G, cores: 56 cores. We optimize the model using the Adam optimizer and adjust the learning rate using the cosine annealing function by setting the cosine annealing function.

The experiment is divided into three parts. First, we modify the convolution part of the UNetVgg16 backbone network by replacing the original convolution block with an HDC block and compare it with the original UNetVgg16 model. Then we modify the loss function part of UNetVgg16, replace the original cross-entropy loss function with our proposed Focal-Dice Loss, and compare it with the original UNetVgg16 model. Finally, we compare the performance capability of our proposed SCU-Net with several commonly used semantic segmentation models on our brain tumor dataset.

UNetVgg16 + HDC

To verify the effectiveness of the UNetVgg16 model using the HDC module. We compared the UNetVgg16 with the HDC added with UNetVgg16. We replace the convolution in the original UNetVgg16 with the HDC module, train 25 epochs, and set the dilation rates of the three dilated convolutions of the HDC module to: r = 2, r = 3, r = 4; the padding to: padding = 2, padding = 3, padding = 4. The experimental results are shown in the table 1. As shown in Table 1, the results of HDC under each index are better than unetvgg16. The index with the most improvement is hausdorf95, an increase of 27.08 percentage points, followed by MIoU, an increase of 9.51 percentage points.

UNetVgg16 + Focal-Dice Loss

After the validity of HDC module was proved, we removed HDC module and replaced the cross entropy Loss function in UNetVgg16 with the Focal-Dice Loss proposed by us. The weight of Focal-Dice Loss was set as: 4:1. As can be seen from Table 1, the use of proposed Focal Dice Loss has greatly improved or reduced the values of most performance indicators, MIoU and Hausdorf95 change the most, with MIoU increasing by 9 percentage points, Hausdorf95 decreasing by 14.5 percentage points.

SCU-net

Finally, we used SCU-net to carry out 50 iterations, and the initial learning rate was set as 10^− 4. The weight of Focal-Dice Loss function was set as: 4:1. For UNetVgg16 and UNetResNet50, we use the weights pre-trained by Vgg16 and ResNet50 on VOC data set for migration learning. The initial learning rate is set to 10^− 4 in the freezing phase and 10^− 5 in the unfreezing phase. For U-Net, Deeplabv3RresNet50, FCN8s, we directly conduct 50 iterations for training, and the initial learning rate is set as 10^− 4. Table 2 shows the performance of SCU-NET and several commonly used segmentation algorithms under different indexes in the same tumor data set. It can be seen from the figure that all indexes of SCU-NET are ahead of other networks and have the best performance, followed by UNetVgg16 and balanced performance of all indexes, followed by FCN8s. The worst performers were U-Net and Deeplabv3ResNet50, which were not as good at segmentation or classification accuracy.

Figure 3 was the comparison of the segmentation results and Ground Truth of 6 different types of tumors by different models, in which the red region represented glioma tumor, the green region represented meningioma tumor, and the yellow region represented pituitary tumor. It can be seen from the figure that the first segmentation result of UNetVgg16 is quite different from the Ground Truth. The segmentation results of other classes by UNetVgg16 are basically consistent with the Ground Truth, but the edges are still too smooth and the segmentation details of the edges are not ideal. There were parts of glioma tumor wrongly predicted as meningioma tumor in the second segmentation figure of UNetResNet50 and the segmentation result is a triangle which is quite different from the Ground Truth. The second segmentation result graph of U-Net almost did not segment the tumor part, so the effect was poor. The last segmentation map of Deeplabv3ResNet50 has nearly 50% of the regions incorrectly classified, while the segmentation results of other segmentation maps are obviously too smooth, although the categories are correctly predicted. The effect of the second segmentation map of FCN8s is poor, which is quite different from the Ground Truth. The segmentation result of SCU-NET (Ours) is the best, there is almost no error prediction, and the edge segmentation effect is obviously better than other models. Experimental results show that SCU-NET has good robustness on the brain tumor data set presented in this paper.

UNetVgg16 + HDC

In order to verify whether unetvgg16 with HDC module can make the model capture more detailed information, we compare unetvgg16 with HDC with the original unetvgg16. It can be seen in Table 1 that HDC+UNetVgg16 has a great improvement in both segmentation ability and classification accuracy. This is because the HDC module enlarges the receptive field of the model, and solves the problem of incomplete information capture caused by the traditional expanded convolution chessboard phenomenon, so that the algorithm can better extract the characteristics of the image. Therefore, it is a good choice to use HDC in the backbone network of UNetVgg16, which can improve the model's ability in classification accuracy and segmentation accuracy.

UNetVgg16 + Focal-Dice Loss

In order to verify whether our proposed focal dice loss can specifically learn samples that are difficult to segment and classify, so as to improve the overall performance of the algorithm, we designed a comparative experiment between unetvgg16 + focal dice loss and unetvgg16, as shown in Table 1, the experimental results show that the Focal Dice Loss proposed by us contributes to the performance of segmentation network and greatly improves the tumor recognition accuracy and segmentation performance. We accelerate the training of difficult training samples by increasing the loss weight of difficult segmentation and classification, so as to improve the performance of the model, which means that we can better deal with brain tumor samples that are more difficult to train in the training set, so as to effectively solve the problems of unbalanced tumor category samples and different image quality of tumor samples.

SCU-net

Table 2 shows that our SCU net is different from u-net and the other four commonly used segmentation algorithms. It can be seen from the table that SCU net has the best performance under most indicators, and its performance is far better than that of u-net. Figure 3 shows the segmentation results of SCU net, u-net and several other commonly used segmentation algorithms. It can be seen that the effect of SCU net is closest to ground truth, and the processing of details is also better than other algorithms. According to the experiment, it can be inferred that although FCN8s adopts 8 times up-sampling, which is much better than 32 times, it still lacks details and is not enough to achieve a high segmentation effect. UNetResNet50 and Deeplabv3ResNet50 are easy to produce gradient disappearance and other problems because the backbone network model is too deep. The tumor images are composed of simple textures and shapes, and the most typical features. If ResNet50 is used, May cause redundancy of feature, which affects the final result, so the Vgg16 as SCU-Net backbone network is the best choice. Through serial operation, SCU net not only does not result in feature redundancy, but also uses two decoding networks for twice feature extraction, as well as the fusion of features and pixels, which improves the semantic segmentation performance of the network.

In this paper, we propose an improved U-Net algorithm, called SCU-Net, for segmentation of brain tumors. We operate two U-Net models with Vgg16 as the backbone in tandem and perform feature splicing and decoding module splicing at each layer, so that the two encoding-decoding blocks before and after can form a tighter connection to obtain more semantic information, reduce feature redundancy, and further improve the generalization ability of the model. Since location information is extremely important for brain tumor category classification, we add another HDC module to the U-Net encoding network in order to obtain a larger receptive field to capture more location information. In addition, the proposed Focus-dice Loss enables the model to focus not only on the accuracy of pixel classification but also on segmentation performance all the time during training, and pay more attention to the samples that are difficult to classify and divide. We compared SCU-NET with commonly used brain tumor segmentation models under 6 indicators. Experimental results show that the proposed method can significantly improve the performance of target segmentation and tumor prediction.

Acknowledgements

Not applicable.

Authors’ contributions

Ping Zheng designed the algorithm and is the main contributor to the manuscript. Xunfei Zhu participated in the analysis and discussion of the experiment and the preparation of the manuscript. Wenbo Guo participated in the preparation of the manuscript and the preliminary material preparation.

Funding

Not applicable.

Availability of data and materials

The data in this article adopts Kaggle's official open source data set. Download address: https://www.kaggle.com/datasets/iashiqul/brain-tumor-mri-image-classification.

Ethics approval and consent to participate

In this paper, all methods and data were carried out in accordance with relevant guide-lines and regulations.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Author details

¹studying for a master’s degree at Anhui University of Science and Technology, Anhui, 232001, China (e-mail: [email protected]).

² studying for a master’s degree at Yunnan University, Yunnan, 650500, China

³studying for a master’s degree at Anhui University of Science and Technology, Anhui, 232001 China.

Mayer G, Vrscay E. Self-similarity of Fourier domain MRI data. Nonlinear Anal Theory Methods Appl. 2009;71(12):e855-e864.
Mohan G, Subashini MM. MRI based medical image analysis: Survey on brain tumor grade classification. Biomed Signal Process Control. 2018;39:139–161.
Hu HX, Mao WJ, Lin ZZ, Hu Q, Zhang Y. Multimodal Brain Tumor Segmentation Based on an Intelligent UNET-LSTM Algorithm in Smart Hospitals. ACM Trans Internet Technol. 2021;21(3):14.
Hao K, Lin S, Qiao J, Tu Y. A Generalized Pooling for Brain Tumor Segmentation. IEEE Access. 2021;9:159283–159290.
Yang T, Song J, Li L, Tang Q. Improving brain tumor segmentation on MRI based on the deep U-net and residual units. J X-Ray Sci Technol. 2020;28(1):95–110.
Long J, Shelhamer E, Darrell T, Ieee: Fully Convolutional Networks for Semantic Segmentation. In: Proc IEEE Conf Comput Vision Pattern Recognit. Boston, MA: Ieee; 2015: 3431–3440.
Ronneberger O, Fischer P, Brox T: U-Net: Convolutional Networks for Biomedical Image Segmentation. In: 18th Proc Int Conf Med Image Comput Comput-Assisted Intervention. vol. 9351. Munich, GERMANY: Springer International Publishing Ag; 2015: 234–241.
Yu H, Yang Z, Tan L, Wang Y, Sun W, Sun M, Tang Y. Methods and datasets on semantic segmentation: A review. Neurocomputing. 2018;304:82–103.
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Proc Adv Neural Inf Process Syst. 2012;25.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:14091556. 2014.
He KM, Zhang XY, Ren SQ, Sun J, Ieee: Deep Residual Learning for Image Recognition. In: 2016 IEEE Conf on Comput Vision Pattern Recognit. Seattle, WA; 2016: 770–778.
Alom MZ, Hasan M, Yakopcic C, Taha TM, Asari VK. Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation. arXiv preprint arXiv:180206955. 2018.
Zhang Z, Liu Q, Wang Y. Road extraction by deep residual u-net. IEEE Geosci Remote Sens Lett. 2018;15(5):749–753.
Milletari F, Navab N, Ahmadi SA, Ieee: V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In: 4th IEEE International Conference on 3D Vision (3DV). Stanford Univ, Stanford, CA: Ieee; 2016: 565–571.
Salehi SSM, Erdogmus D, Gholipour A. Auto-context convolutional neural network (auto-net) for brain extraction in magnetic resonance imaging. IEEE Trans Med Imaging. 2017;36(11):2319–2330.
Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J. Unet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans Med Imaging. 2019;39(6):1856–1867.
Chen W, Zhang Y, He J, Qiao Y, Chen Y, Shi H, Wu EX, Tang X: Prostate segmentation using 2D bridged U-net. In: 2019 Int Jt Conf Neural Networks. IEEE; 2019: 1–7.
Naser MA, Deen MJ. Brain tumor segmentation and grading of lower-grade glioma using deep learning in MRI images. Comput Biol Med. 2020;121:103758.
Islam MA, Jia S, Bruce ND. How much position information do convolutional neural networks encode? arXiv preprint arXiv:200108248. 2020.
Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G, Ieee: Understanding Convolution for Semantic Segmentation. In: 18th IEEE Winter Conf Appl Comput Vis. Nv; 2018: 1451–1460.
Jadon S: A survey of loss functions for semantic segmentation. In: IEEE Conf on Comput Intell in Bioinf and Comput Biol IEEE; 2020: 1–7.

Tables 1 to 2 are available in the Supplementary Files section

No competing interests reported.

TABLES.docx

Download PDF

Editorial decision: Major revision
13 Sep, 2022
Reviews received at journal
01 Sep, 2022
Reviewers agreed at journal
16 Aug, 2022
Reviewers invited by journal
16 Aug, 2022
Editor assigned by journal
14 Aug, 2022
Editor invited by journal
29 Jul, 2022
Submission checks completed at journal
29 Jul, 2022
First submitted to journal
28 Jul, 2022

You are reading this latest preprint version

Brain tumor segmentation based on improved U-Net

Status:

Version 1

Abstract

Figures

Background

Method

Dataset

Encoding Module And Decoding Module

Series Section

Loss function

Experimental Environment

Result

UNetVgg16 + HDC

UNetVgg16 + Focal-Dice Loss

SCU-net

Discussion

UNetVgg16 + HDC

UNetVgg16 + Focal-Dice Loss

SCU-net

Conclusion

Declarations

References

Tables

Additional Declarations

Supplementary Files

Status:

Version 1