Segmentation of Brain Gliomas Based on a Multi-modal Multiscale Double-pathway 3D Residual CNN

Background: The automated segmentation of brain glioma regions in magnetic resonance (MR) images plays an important role in the early diagnosis, intraoperative navigation, radiotherapy planning and prognosis of brain tumors. It is very challenging to segment gliomas and intratumoral structures, since the location, size, shape, edema range and boundary of gliomas are heterogeneous, and multimodal brain glioma images (such as T1, T2, fluid-attenuated inversion recovery (FLAIR), and T1c images) are collected from multiple radiation centers.
Methods: This paper presents a multimodal, multi-scale, double-pathway, 3D residual convolutional neural network (CNN) for automatic glioma segmentation. First, a robust gray-level normalization method is proposed to solve the multicenter problem, in which intensity ranges differ widely across imaging protocols. Second, a multi-scale, double-pathway network based on the DeepMedic toolkit is trained with different combinations of multimodal MR images for glioma segmentation. Finally, a fully connected conditional random field (CRF) is used as a post-processing step to optimize the segmentation results by addressing isolated segmentations and holes.


Results: Experiments on the Multimodal Brain Tumor Segmentation (BraTS) 2017 and 2019 challenge data show that our method achieves good performance in delineating the whole tumor, with a Dice coefficient, sensitivity and positive predictive value (PPV) of 0.88, 0.89 and 0.88, respectively. For the segmentation of the tumor core and the enhancing area, the sensitivity reached 0.80.
Conclusions: The experiments show that our method can accurately segment gliomas and intratumoral structures from multimodal MR images, which is of great significance to clinical neurosurgery.

1 Background
Brain tumors, whose annual incidence is 4 per 10,000 people, are among the most lethal cancers in the world. Gliomas account for 40-50% of all brain tumors [1]. Identifying glioma structures through the precise segmentation of brain magnetic resonance (MR) images not only assists diagnosis but also plays an important role in the intraoperative navigation of tumor location, the regional planning of targeted radiotherapy, and the follow-up observation of patients after surgery [2]. Therefore, developing a reliable, accurate and fast glioma segmentation method has always been a research hotspot in the field of medical image processing. However, the locations, sizes, shapes, edema ranges and boundaries of gliomas in brain images are heterogeneous, which makes it very difficult to segment gliomas and their intratumoral structures [3].
Manual and semiautomatic segmentation of gliomas takes an experienced radiologist several hours to delineate the tumor boundary, and the results largely depend on the subjective judgment of the observer or clinician. It has been reported that gliomas segmented by the same or different doctors may differ by up to 20%±15% or 28%±12%, respectively [4].
Based on prior knowledge, traditional automatic segmentation algorithms preprocess images, compute image features (such as texture primitives or wavelet transforms), and feed them to random forest, support vector machine, self-organizing map or K-means classifiers for classification, thereby achieving the automatic segmentation of gliomas [5,6]. These traditional classifiers can process only pre-extracted image features. In recent years, deep learning algorithms inspired by the human visual system, such as convolutional neural networks (CNNs), have been able to learn feature information from images independently and achieve feature classification and target segmentation [7]. This approach greatly reduces the time that traditional algorithms spend on feature extraction.
To achieve glioma segmentation in multimodal brain MR images, some studies divide multimodal 3D MR images into groups of 2D cross-sectional slices along the vertical axis and input them into a 2D CNN for training [8]. In essence, this method is still based on a 2D CNN: the complexity of the network structure is relatively low, and the amount of computation is relatively small. Its disadvantage is that it ignores the relative relationship of voxels along the vertical axis and loses the interlayer relationships in the image information. Other studies directly build a 3D CNN to process the 3D structural features of voxels [9]. This method greatly increases the computational complexity and places higher demands on hardware performance.
To address the heterogeneity of tumors, a two-channel cascaded CNN with multi-scale sample input is an effective approach [10,11] in both 2D and 3D CNNs, since such an architecture combines the detailed features of the tumor with its larger-scale context well. Rather than altering the kernel size in each pathway, DeepMedic [12], with a deep architecture and two pathways each handling one scale of information, ranked top in the Multimodal Brain Tumor Segmentation (BraTS) 2016 challenge. In addition, traditional segmentation methods such as the conditional random field (CRF) have proved efficient as a post-processing step for deep learning algorithms, improving network performance because they take into account voxels together with their surrounding neighborhood information [13]. Following the basic idea of the multi-pathway architecture, many studies [14], and even DeepMedic itself, have extended the network to three or more pathways to deal with multi-resolution patches. At the same time, other multi-pathway architectures [15,16] focus on the information in multiple modalities; with dense connections between the convolutional layers within one pathway and between the pathways, they have shown that combining multimodal MRI in a multi-pathway network works well for infant brain segmentation and ischemic stroke lesion segmentation. Recently, Aigün et al. [17] showed that a late fusion strategy performs better than an early one, as it avoids the premature merging of the particular features of each modality.
Inspired by the above methods, we trained a multimodal, multi-scale, double-pathway, 11-layer residual CNN with a late fusion strategy for the automatic segmentation of gliomas in MR images. Since MR scans may be altered by bias field distortion and suffer from the multicenter issue, it is necessary to make the intensity of the same tissues consistent across images. In the preprocessing stage, a robust strategy for the denoising and intensity normalization of multimodal MR images is proposed to make the images more comparable. The datasets are then randomly divided into a training set, a validation set and a test set. After data augmentation, multiscale training samples of different modalities are drawn and input into the two pathways according to our modality combinations to enhance the feature information. The soft segmentation results produced by the CNN may contain isolated misclassified voxels or small clusters, sometimes in physiologically and anatomically unlikely locations. Finally, a post-processing procedure including a fully connected 3D CRF is employed to optimize the soft segmentation probability maps and obtain the final segmentation results.
This paper is organized as follows: the experimental results are summarized in section 2, the discussion and conclusion are given in sections 3 and 4, and the details of our methods are presented in section 5.

2 Results
The study is based on the DeepMedic open-source toolkit with Keras on the Theano deep learning framework [12]. The experimental environment uses the CUDA 7.5 parallel computing architecture and the cuDNN 7.5 GPU acceleration library. The GPU is an NVIDIA GeForce GTX 1080.

Network configuration
The total number of epochs in the training process was set to 35. Each epoch contains 20 iterations, and during each iteration, 10 sets of sample data are processed in parallel. Each training session consisted of 1000 samples, comprising 10 samples from healthy tissue and 10 samples from tumor regions for each of 50 patients randomly selected from the training set. After each training epoch, 5000 samples were randomly selected from the ROIs of all 40 patients in the validation set, with a ratio of healthy tissue to tumor area of 3:7, to evaluate the performance of the CNN and monitor the training state. After training, the 40 case images in the test set were input into the network for segmentation.
The dropout rates of the fully connected layers were set to {0, 0.5, 0.5}. The activation function of each layer was the PReLU function, and the classifier was the softmax function. The batch size was 10, while the standardized number of samples in each batch was 60 [20]. The activation function values were normalized to prevent the output from exploding. The initial learning rate was 10^-4 and declined by halving: whenever the validation accuracy remained stable for three consecutive epochs, the learning rate was halved to refine the learning.
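This halving rule corresponds to a standard plateau-based schedule. As a minimal Keras sketch (our own illustration of the rule described above, not the toolkit's training code; the monitored metric name is an assumption):

import keras
from keras.callbacks import ReduceLROnPlateau
from keras.optimizers import RMSprop

# Start at 1e-4 and halve the learning rate once the monitored validation
# metric has plateaued for three consecutive epochs.
lr_schedule = ReduceLROnPlateau(monitor="val_acc", factor=0.5, patience=3)

# Usage (model, x, y, xv, yv are placeholders for the actual data pipeline):
# model.compile(optimizer=RMSprop(lr=1e-4), loss="categorical_crossentropy")
# model.fit(x, y, batch_size=10, epochs=35,
#           validation_data=(xv, yv), callbacks=[lr_schedule])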

Evaluation metrics
According to the BraTS evaluation protocol, we evaluate the complete tumor, core region and enhancing region segmented manually (T) and algorithmically (P) through the Dice coefficient, positive predictive value (PPV), and sensitivity. These measures are calculated as follows:

Dice(T, P) = 2|T ∩ P| / (|T| + |P|),  PPV(T, P) = |T ∩ P| / |P|,  Sensitivity(T, P) = |T ∩ P| / |T|.

In particular, the complete tumor includes the necrotic area, edema area and enhancing core; the core region includes the necrotic area and the enhancing core; and the enhancing region includes only the enhancing core.
The Hausdorff distance is also taken into consideration; it is the maximum distance from a set to the nearest point in another set, in other words, a measure of how close the segmentation and the expected output are. Given two point sets A = {a_1, a_2, …, a_m} and B = {b_1, b_2, …, b_n} in Euclidean space, the Hausdorff distance between A and B is defined as

H(A, B) = max( max_{a∈A} min_{b∈B} ‖a − b‖, max_{b∈B} min_{a∈A} ‖b − a‖ ),

and the 95th-percentile Hausdorff distance was chosen to avoid the influence of outliers [21].
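For concreteness, the metrics above can be computed for binary masks as follows (a minimal numpy/scipy sketch of our own, not the BraTS evaluation code; voxel spacing defaults to isotropic):

import numpy as np
from scipy import ndimage

def overlap_metrics(t, p):
    """Dice, PPV and sensitivity for binary masks t (manual) and p (predicted)."""
    t, p = t.astype(bool), p.astype(bool)
    tp = np.logical_and(t, p).sum()
    dice = 2.0 * tp / (t.sum() + p.sum())
    ppv = tp / p.sum()
    sensitivity = tp / t.sum()
    return dice, ppv, sensitivity

def hd95(t, p, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile symmetric Hausdorff distance between mask surfaces."""
    t, p = t.astype(bool), p.astype(bool)
    # Surface voxels: the mask minus its morphological erosion.
    t_surf = t ^ ndimage.binary_erosion(t)
    p_surf = p ^ ndimage.binary_erosion(p)
    # The Euclidean distance transform of the complement gives, at every
    # voxel, the distance to the nearest surface voxel of the other mask.
    d_to_p = ndimage.distance_transform_edt(~p_surf, sampling=spacing)
    d_to_t = ndimage.distance_transform_edt(~t_surf, sampling=spacing)
    dists = np.concatenate([d_to_p[t_surf], d_to_t[p_surf]])
    return np.percentile(dists, 95)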

Evaluation of the intensity normalization methods
The results in Table 1 show that our robust intensity normalization outperforms the simple normalization filter in the 3D Slicer module on all the metrics. This is because the latter normalizes the image globally, so that the large number of zero-valued background voxels dominates the statistics, and without histogram matching the resulting intensity distribution remains essentially the same as that of the original images.

Evaluation of the patch sizes
Considering the performance on the enhancing core and necrotic core, the larger patch size of 31³ is much poorer: its Dice coefficient declines to 0.65 and 0.40, and its PPV declines to 0.76 and 0.52. The smaller patch size of 19³ tends to have good accuracy in segmenting the enhancing core, with a PPV of 0.84. The segmentation results are shown in Fig. 1. Compared with the results for a patch size of 25³, the blue necrotic core and the whole tumor were poorly segmented regardless of whether the patch was reduced to 19³ or enlarged to 31³. Thus, the patch size was set to 25³ in this study.

Evaluation of the modality combinations
Based on the modality combinations and training pathways defined in Table 7, Table 3 compares the segmentation performance of the different combinations. For whole tumor segmentation, all four modalities were used in the E_G and C1_G groups, and their Dice coefficients were all higher than 0.85, exceeding that of the T_G group with only two modalities. Moreover, the Dice coefficient of the E_G group was better than that of the C1_G group, with an improvement in the enhancing part. Although the sensitivity for detecting tumors in the E_G group is not as good as that of the other two groups, its accuracy is the best, with a PPV of 0.86 in whole tumor segmentation and approximately 0.80 in inner structure segmentation. In addition, the Hausdorff distance of the E_G group is the smallest among all four groups. After exchanging the training pathways of the E_G group to form the C2_G group, the Dice and PPV indicators fell sharply in both the whole tumor and core region segmentation tasks. However, for the enhancing part, the accuracy and Hausdorff distance are even better than those of the other groups. The results for the 19³ patch size (see Table 3) also show high precision but low sensitivity in the enhancing core, indicating under-segmentation. Given its much poorer Hausdorff distance, we deduce that the C1_G group has a more severe over-segmentation problem, with higher sensitivity and lower precision.
Fig. 2 shows that the T_G group was more successful than the E_G group in identifying the blue necrotic areas. However, it made mistakes more easily in the segmentation of whole tumors, especially in the peritumoral area, which was characterized by the over-segmentation of necrotic areas. Here, the necrotic areas, enhancing cores and edema areas are shown in blue, red and green, respectively. After using the information from all four modalities, the segmentation performance of the C1_G group improved, but compared with the E_G group, the accuracy of the C1_G group is still insufficient. The results for the C2_G group were worse than those for the C1_G group. Overall, grouping the modalities during training can indeed increase the segmentation accuracy.

Evaluating the effectiveness of the post-processing steps
We incorporate the fully connected 3D CRF, together with several other procedures, into the 3D CNN to achieve more structured predictions. As shown in Table 4, after the post-processing step, the Dice coefficients of the whole tumor and the enhancing core both increase by two percentage points, and the PPV improves as well. With post-processing, void areas disappear, adhering parts are separated, and the segmentation result is more consistent with the manual labeling standard; accordingly, the Hausdorff distance decreases considerably. One interesting finding is that with the post-processing step, the sensitivity of detecting the whole tumor weakens slightly; we regard this as a side effect of the CRF addressing the over-segmentation issue. The comparisons are shown in Fig. 3.
Table 5 compares our results with those of other published methods, some of which have not reported all of the metrics. Although our Hausdorff distance needs improvement, our other metrics perform well and even rank at the top. Further details are discussed in the next section.
We also tested our method on BraTS 2019, with the results shown in Table 6. The post-processing procedure is still useful for improving the performance. Because the BraTS 2019 dataset abandons the post-operative data and refines the data categorization, our method obtains a slightly better result than on the BraTS 2017 dataset, especially in the Hausdorff distance.

3 Discussion
Patch sizes and modality combinations were analyzed within the framework. In the training process, the size of the sample patches has certain limitations. When it decreases, the details in the ROI are reduced, impairing the segmentation of large lesions and causing the under-segmentation of smaller parts. In contrast, when the patch becomes larger, it contains too much irrelevant information, which hinders useful feature extraction.
By comparing the different modality combinations in the training methods, we found that modality fusion can enhance the feature information of the double pathway and improve the network performance to a certain extent. T1 and T2 images provide information for segmenting the whole tumor, whereas T1c images provide more useful information than the other modalities for segmenting the enhancing core. In this study, a double pathway with different resolutions is used to learn the complementary information in each modality. For example, T1c and FLAIR images can display detailed information such as tumor nuclei and edema areas, while T1 and T2 images provide the location of tumors relative to brain structures.
The comparison between the different methods is shown in Table 5. One of the compared methods is also based on DeepMedic but adds several pathways to it. However, that work focuses on the multi-resolution information in MRI images, and all the modalities are still sampled with early fusion before being input into the same connections that DeepMedic adopted. Compared with ours, its performance is very close on several metrics: our Dice and Hausdorff distance are better, while their method is more sensitive to lesions. Our future network will consider adding more pathways and more complicated connections, such as dense connections, for a more complex and accurate multimodal feature fusion strategy.

4 Conclusion
This paper presents a multimodal 3D residual CNN algorithm for the segmentation of gliomas and intratumoral structures.
Specifically, a multimodal, multiscale, double-pathway, 11-layer 3D residual CNN was established in this study. After normalizing all the images in the BraTS 2017 dataset by histogram matching and initializing the network, the data were divided into a training group, a validation group and a test group. This study builds and trains the segmentation network on the training group, tunes the parameters on the validation group, generates rough segmentation results on the test data, and uses a CRF to remove holes and merge isolated areas to achieve the final optimization of the segmentation results. The method makes full use of multimodal image information, such as T1, T2, T1c and FLAIR, to achieve the accurate and effective segmentation of gliomas and their internal structures. To further improve the segmentation performance for tumors and their internal structures, and ultimately achieve the end-to-end classification of LGGs and HGGs, we will continue to explore mixed modality combinations and multi-pathway CNNs to extract the structural features of tumors.

5 Methods

Image preprocessing
MR artifacts are usually caused by the bias field and by the strong influence of brain tumors on the intensity ranges. Specifically, one problem with combining images from different scanners and analyzing them with automatic tools is that the quality of MR images can depend on the scanner manufacturer, field strength, MRI protocol, and so on; this issue is called the multicenter problem and manifests especially as varied intensity ranges [26]. Traditional histogram matching techniques cannot achieve proper intensity normalization in this setting. Goetz [27] proposed a normalization method for images of any modality that subtracts the global gray-level mean of the image from each voxel and scales the standard deviation of the gray-level distribution to 1. Since gliomas often appear as high-signal areas, which bias the gray-level mean and variance of the image, the gray-level value of the highest histogram bin is taken as the mean instead (typically corresponding to the gray-level distribution of white matter [28]). The K-means clustering algorithm is then used to obtain the optimal cluster center over all the images of the same modality, that is, the target mean and variance of the distribution. This method therefore normalizes all the images by histogram matching. The procedure for each modality image of each subject is as follows:
- Reduce the noise with the SUSAN method [29] and correct the bias field using FAST [30].
- Linearly transform the intensity range of the image to [0, 255] and calculate its histogram.
- For an image I = {I_k | k = 1, 2, …, N}, where N is the number of voxels and I_k is the intensity of the k-th voxel, let Ī denote the gray value of the highest histogram bin; the robust deviation is σ = sqrt( Σ_{k=1}^{N} (Ī − I_k)² / N ). The image can then be normalized to the range [-1, 1] with its own parameters Ī and σ.
- Using these two parameters, the mean Ī and deviation σ, as coordinate axes, the mean-deviation distribution of all the images of the same modality can be obtained.
- Using the K-means clustering algorithm on this distribution, we obtain a set of optimal mean and deviation parameters, which achieves gray-level matching for all the images.
Fig. 4 shows that FLAIR images of different low-grade gliomas (LGGs) (a1-a3) and T1c images of different high-grade gliomas (HGGs) (a4-a6) are normalized to the images (b1-b3) and (b4-b6), respectively, with a more consistent gray scale. The results indicate that the robust normalization method overcomes the multicenter issue and makes the MRI scans of different patients comparable. The improvement can also be confirmed by the change in the image intensity histograms before and after normalization (Fig. 5). Fig. 5b shows that the histograms of all the FLAIR images with LGGs are more compact after normalization. It should be noted that the simple intensity normalization in 3D Slicer [31] can also match all images of the same modality to the same intensity space, but it yields distributions similar to the original ones in Fig. 5a. Therefore, the segmentation performance of a subsequent CNN on such normalized images is slightly worse than on ours (shown in Table 1).
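The procedure above can be sketched as follows (a minimal sketch under stated assumptions: the histogram binning and the use of a single K-means center are our reading of the text, and the intermediate [-1, 1] normalization and final matching are folded into one affine map):

import numpy as np
from sklearn.cluster import KMeans

def robust_params(img, n_bins=256):
    """Mean = gray value of the highest histogram bin (≈ white matter);
    sigma = robust deviation around that value, as described above."""
    # Linearly rescale intensities to [0, 255]; background zeros are
    # excluded from the histogram so they do not dominate the mode.
    img = (img - img.min()) / (img.max() - img.min()) * 255.0
    hist, edges = np.histogram(img[img > 0], bins=n_bins, range=(0, 255))
    i_bar = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
    sigma = np.sqrt(np.mean((i_bar - img) ** 2))
    return img, i_bar, sigma

def normalize_modality(images, target=None):
    """Normalize a list of 3D volumes of one modality to a common gray space.
    If no target (mean, sigma) is given, take the K-means center of the
    per-image parameter distribution, as in the text."""
    rescaled, params = [], []
    for img in images:
        img, i_bar, sigma = robust_params(img)
        rescaled.append(img)
        params.append([i_bar, sigma])
    if target is None:
        target = KMeans(n_clusters=1, n_init=10).fit(
            np.array(params)).cluster_centers_[0]
    t_mean, t_sigma = target
    # Match each image's robust moments to the common target moments.
    return [(img - i_bar) / sigma * t_sigma + t_mean
            for img, (i_bar, sigma) in zip(rescaled, params)]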

Multi-scale double-pathway 3D residual CNN
Gliomas can occur in different parts of the brain, such as the supratentorial and infratentorial regions or the brainstem. The shapes of gliomas vary, including stellate, oligo-branched, ventricular and mixed types, and the sizes of gliomas differ greatly. To successfully segment tumor tissue, a large number of multi-scale spatial features with contextual information around the tumor should be considered. Therefore, by incorporating both local and global contextual information into a 3D CNN, this study employs parallel convolutional pathways of multi-scale resolution containing the multimodal contextual information of gliomas; this method can effectively segment gliomas, as shown in Fig. 6 [32,33].
The blue pathway (normal-resolution pathway) receives sample patches of normal resolution for handling the details of the tumor appearance, while down-sampled patches that record the spatial context of the normal-resolution patches are input into the yellow pathway (low-resolution pathway) for learning high-level features, such as locations, at low resolution. Note that the image in the yellow pathway is down-sampled before training to reduce the computational complexity. Moreover, the two pathways are independent of each other, which makes it possible to extract and study their respective characteristics at multi-scale resolution. The patches in both pathways are normalized to standard patches with a mean of 0 and a standard deviation of 1 and are augmented with symmetry, reversal and duplication before convolution. Each pathway consists of 8 convolutional layers, where the last 6 layers are connected by residual blocks over every two adjacent layers to enhance robustness. After the 8-layer convolution, the feature maps from the low-resolution pathway are upsampled to match those from the normal-resolution pathway. The two groups of features are concatenated and input into the fully connected layers to integrate the trained features and location information. Finally, based on the trained features, the softmax function classifies the voxels into four rough segmentation classes: necrotic core (red), enhancing area (blue), edema area (green) and normal tissue (yellow).
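For illustration, the two-pathway design can be sketched in Keras, the framework the study builds on. This is a minimal sketch, not the DeepMedic implementation itself: the filter counts, the down-sampling factor of 3, the low-resolution patch size, and the use of 'same' padding (DeepMedic uses valid convolutions with cropping) are all assumptions.

from keras.layers import (Input, Conv3D, PReLU, add, concatenate,
                          UpSampling3D, Cropping3D, Dropout)
from keras.models import Model

N_MODALITIES = 2   # modalities per pathway under the late fusion scheme (assumed)
N_CLASSES = 4      # necrotic core, enhancing area, edema, normal tissue

def conv3(x, n_filters):
    """3x3x3 convolution, stride 1, followed by PReLU."""
    x = Conv3D(n_filters, 3, padding="same")(x)
    return PReLU(shared_axes=(1, 2, 3))(x)

def pathway(x, n_filters=30):
    """8 convolutional layers; the last 6 form 3 residual blocks of two
    adjacent layers each, as described above."""
    x = conv3(x, n_filters)
    x = conv3(x, n_filters)
    for _ in range(3):
        shortcut = x
        x = conv3(x, n_filters)
        x = conv3(x, n_filters)
        x = add([x, shortcut])  # residual connection over two layers
    return x

# Normal-resolution 25^3 patches; low-resolution patches are a 27^3 context
# down-sampled by a factor of 3 to 9^3 (assumed factor).
hi = Input((25, 25, 25, N_MODALITIES))
lo = Input((9, 9, 9, N_MODALITIES))

f_hi = pathway(hi)
f_lo = UpSampling3D(size=3)(pathway(lo))   # 9^3 -> 27^3
f_lo = Cropping3D(cropping=1)(f_lo)        # 27^3 -> 25^3, to match f_hi

# Late fusion: concatenate the two pathways, then 1x1x1 "fully connected"
# layers with dropout rates {0, 0.5, 0.5} as in the network configuration.
f = concatenate([f_hi, f_lo])
f = Dropout(0.5)(PReLU(shared_axes=(1, 2, 3))(Conv3D(150, 1)(f)))
f = Dropout(0.5)(PReLU(shared_axes=(1, 2, 3))(Conv3D(150, 1)(f)))
out = Conv3D(N_CLASSES, 1, activation="softmax")(f)

model = Model(inputs=[hi, lo], outputs=out)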

(1) Convolutional layer
The l-th (l ∈ [1, L]) convolutional layer has C_l feature maps. The value at position (x, y, z) in the i-th feature map of the l-th layer, h_{li}^{xyz}, is calculated as

h_{li}^{xyz} = f( Σ_{j=1}^{C_{l−1}} Σ_{u=0}^{U} Σ_{v=0}^{V} Σ_{w=0}^{W} k_{lij}^{uvw} · h_{(l−1)j}^{(x+u)(y+v)(z+w)} + b_{li} ),

which is the result of convolving each of the feature maps h_{(l−1)j} (j ∈ [1, C_{l−1}]) of the (l−1)-th layer with a 3D kernel k_{lij} of size (U+1) × (V+1) × (W+1), adding a learned bias b_{li}, and applying a nonlinear activation function f. Here, the kernel size is set to 3 × 3 × 3 to make the architecture deeper, since deeper networks have more discriminative power due to the additional nonlinearity and the better quality of local optima [34]. In this work, we choose a stride of 1, as larger strides down-sample the feature maps and cause inaccurate segmentation. The size of the output feature maps is calculated as (size of the input feature maps − size of the kernel) / stride + 1.
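As a concrete check of this formula (a worked example of our own), eight valid 3×3×3 convolutions with stride 1 shrink a 25³ patch by 2 voxels per layer:

def output_size(input_size, kernel=3, stride=1):
    """Valid-convolution output size: (input - kernel) / stride + 1."""
    return (input_size - kernel) // stride + 1

size = 25                      # normal-resolution input patch, 25^3
for _ in range(8):             # eight 3x3x3 convolutions with stride 1
    size = output_size(size)
print(size)                    # -> 9, i.e. a 9^3 classified region per patch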
(2) Activation function The parametric rectified linear unit (PReLU) [35] is chosen as the nonlinear activation function in the convolutional and fully connected layers instead of the rectified linear unit (ReLU), since its coefficient is self-adaptive, which helps preserve the output of each neuron during training. PReLU is defined as

f(y_i) = max(0, y_i) + a_i · min(0, y_i),

where y_i denotes the input of the activation function on the i-th channel and a_i is a learned coefficient. To counter over-fitting, dropout regularization is adopted to cut off some hidden neurons with a probability of 50% [36].
(3) Classification layer A position-wise softmax function is chosen as the classifier,

P(c | h) = exp(h_c) / Σ_{k=1}^{C} exp(h_k),

where h denotes the feature vector of the voxel to be classified, h_c is its value for the c-th class, and C is the number of classes.
(4) Optimization An optimizer that regulates the cumulative squared gradient r (an RMSProp-style scheme) is used for training; the update is calculated as

r ← ρ r + (1 − ρ) g ⊙ g,   θ ← θ − (ε / √(r + δ)) ⊙ g,

where ε is the learning rate, θ denotes the network parameters, g is the gradient, ρ is the decay coefficient, and δ is a small constant to ensure that the denominator is not zero. As the network becomes deeper, it becomes more difficult to train. One reason for this difficulty is the vanishing or exploding gradient of the activation function during propagation; the other is degradation, with amplified errors caused by over-fitting. Therefore, a residual module [37] is introduced to achieve an identity mapping. As shown in Fig. 6, the traditional mapping (Fig. 6b) is altered to the residual function h_{l+2} − h_l between two layers (Fig. 6c). When its value is 0, the block is an identity mapping. Moreover, the residual blocks focus on the very small disturbances in the network, which makes the learning of parameters relatively more sensitive.
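A minimal numpy sketch of the squared-gradient accumulation above (our own illustration; the surviving text does not name the exact optimizer variant, so the RMSProp-style form and the decay value are assumptions):

import numpy as np

def rmsprop_step(theta, grad, r, lr=1e-4, rho=0.9, delta=1e-6):
    """One update: accumulate the squared gradient r and scale the step
    by 1/sqrt(r + delta) so the denominator never reaches zero."""
    r = rho * r + (1.0 - rho) * grad ** 2
    theta = theta - lr * grad / np.sqrt(r + delta)
    return theta, r

# Usage: keep one accumulator r per parameter tensor.
theta = np.zeros(10)
r = np.zeros_like(theta)
grad = np.random.randn(10)
theta, r = rmsprop_step(theta, grad, r)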

Dense training on image patches
Patch size and balanced sampling The tumor volume varies greatly across brain images. For example, Fig. 7 shows the registered FLAIR (Fig. 7a) and T1c (Fig. 7b) images of the same patient, with a large green edema area (Fig. 7a) and a pink necrotic core (Fig. 7b) near the midsagittal section, respectively; Fig. 7b also shows multiple small lesions (indicated by the yellow arrow) in T1c. The information in the different modalities differs considerably: the green edema area is quite large in the FLAIR image, while the small pink necrotic core is hard to find in the T1c image. This phenomenon raises two problems. First, what patch size is appropriate? As shown in Fig. 7, both large (upper) and small (lower) patches may contain different amounts of normal tissue (green dots) and tumor tissue (red dots). The choice of patch size must balance whether, and to what degree, patches at the tumor boundary contain normal tissue. Large patches tend to contain too much irrelevant normal tissue when delineating the internal structure of the tumor (red cube in Fig. 7), while small patches fail to include enough detail, which makes it difficult to extract features (blue cube in Fig. 7). The second problem is related to sampling. As shown in Fig. 8a, uniform sampling can easily lead to a significant reduction in the sensitivity of tumor tissue recognition. Therefore, we sampled the tumor tissue (red dots) and the normal tissue (green dots) with equal weight (50%) (Fig. 8b); a sketch of this sampler is given below.
Modality combinations Fig. 9 shows that multimodal MRI images provide different diagnostic information owing to their different imaging parameters. Edema caused by tumors has a low signal on T1 images, as T1 images provide only the anatomical information of the brain. In T2 images, the edema area presents a high signal intensity, and it is easy to observe the uneven gray enhancement area, i.e., the relative location of the lesion. The modality combinations used in this study are defined in Table 7, which also shows the methods for training the pathways.
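The balanced sampling strategy described above can be sketched as follows (our own illustration; the label convention "any nonzero value = tumor" is an assumption):

import numpy as np

def sample_patch_centers(label_volume, n_samples, patch=25,
                         tumor_fraction=0.5, rng=np.random.default_rng()):
    """Draw patch centers with equal weight (50%) for tumor and healthy
    tissue, instead of uniform sampling over the whole volume."""
    half = patch // 2
    # Valid centers keep the whole patch inside the volume.
    valid = np.zeros_like(label_volume, dtype=bool)
    valid[half:-half, half:-half, half:-half] = True
    tumor = np.argwhere((label_volume > 0) & valid)    # any tumor class
    healthy = np.argwhere((label_volume == 0) & valid)
    n_tumor = int(n_samples * tumor_fraction)
    idx_t = rng.choice(len(tumor), n_tumor, replace=True)
    idx_h = rng.choice(len(healthy), n_samples - n_tumor, replace=True)
    return np.concatenate([tumor[idx_t], healthy[idx_h]])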

Image post-processing
The results of CNN segmentation may be affected by noise and local extrema, so isolated regions or holes can appear and need to be processed. Based on CRFs [38,39], a 3D fully connected CRF is built to refine the segmentation results. The energy function is defined as

E(x) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j),

where, for voxel i, ψ_u(x_i) = −ln P(x_i | I) and P(x_i | I) is the output of the CNN. According to Koltun's observation [40], in the multiclass segmentation of images, ψ_p(x_i, x_j) can be defined as a linear combination of Gaussian kernels that is sensitive to the image contrast:

ψ_p(x_i, x_j) = μ(x_i, x_j) [ w_1 exp(−|p_i − p_j|²/(2σ_α²) − |I_i − I_j|²/(2σ_β²)) + w_2 exp(−|p_i − p_j|²/(2σ_γ²)) ],   (7)

where μ(x_i, x_j) = 1 when x_i ≠ x_j and μ(x_i, x_j) = 0 otherwise. In Eq. (7), p and I denote the position and intensity of voxel x. The first Gaussian kernel describes the probability that two nearby voxels with similar appearance belong to the same class, using the two parameters σ_α and σ_β. The second Gaussian kernel uses the parameter σ_γ to enforce the smooth classification of voxels with homogeneous labels over a large range, avoiding local voids. Finally, the weights w_1 and w_2 adjust the relative contribution of the two terms.
After the CRF, we further improve the segmentation results with a clustering method and the following post-processing rules based on clinical experience (note that all the gray values are normalized to 0-255):
a) All clusters in the tumor mask whose mean gray values in both the FLAIR and T2 images are higher than 150 are considered image noise and removed from the tumor mask.
b) In general, tumor tissues show a high signal in at least one modality. Therefore, voxels in the tumor mask whose gray values are less than 0.95 times the average intensity of the FLAIR and T2 images, and whose gray values in the T1c image are less than 150, are excluded from the tumor mask.
c) Holes inside the necrotic area are filled, as they are very likely to be necrosis.
d) Voxels within the enhancing tumor areas whose T1c gray value is less than 110 are regarded as classification errors and reassigned to the necrotic region.
e) Voxels in clusters whose volume is smaller than 0.2 times the maximum cluster volume are considered non-tumorous; the same applies to necrotic clusters.
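A possible implementation of rules (a)-(e) with scipy.ndimage is sketched below (our own illustration, not the study's code; the BraTS-style label convention and the reading of rule (b) as "dark relative to each image's own mean" are assumptions):

import numpy as np
from scipy import ndimage

# Labels assumed: 0 background, 1 necrotic core, 2 edema, 4 enhancing core;
# flair, t2, t1c are volumes with gray values normalized to 0-255.
def clinical_postprocess(seg, flair, t2, t1c):
    labeled, n = ndimage.label(seg > 0)
    for c in range(1, n + 1):
        cluster = labeled == c
        # Rule (a): clusters bright (>150) in both FLAIR and T2 -> noise.
        if flair[cluster].mean() > 150 and t2[cluster].mean() > 150:
            seg[cluster] = 0
    # Rule (b): dark in FLAIR and T2 relative to the mean, and dark in T1c.
    low = ((flair < 0.95 * flair.mean()) & (t2 < 0.95 * t2.mean())
           & (t1c < 150))
    seg[(seg > 0) & low] = 0
    # Rule (c): holes inside the necrotic area are very likely necrosis.
    necrotic_filled = ndimage.binary_fill_holes(seg == 1)
    seg[necrotic_filled & (seg == 0)] = 1
    # Rule (d): enhancing voxels with T1c < 110 are reassigned to necrosis.
    seg[(seg == 4) & (t1c < 110)] = 1
    # Rule (e): clusters smaller than 0.2x the largest cluster are removed.
    labeled, n = ndimage.label(seg > 0)
    if n > 0:
        sizes = ndimage.sum(np.ones_like(seg), labeled, range(1, n + 1))
        for c, size in enumerate(sizes, start=1):
            if size < 0.2 * sizes.max():
                seg[labeled == c] = 0
    return seg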

Ethics Approval and Consent to Participate
All patients signed informed consent forms as approved for the BraTS dataset.

Consent for Publication
Not applicable.

Availability of Data and Materials
The BraTS data are available at https://ipp.cbica.upenn.edu.