A Novel 2-Phase U-net Algorithm Combined with Optimal Mass Transportation for 3D Brain Tumor Detection and Segmentation

Utilizing the optimal mass transportation (OMT) technique to convert an irregular 3D brain image into a cube, the input format required by the U-net algorithm, is a new idea in medical imaging research. We develop a cubic volume-measure-preserving OMT (V-OMT) model to implement this conversion. The contrast-enhanced histogram equalization grayscale of the fluid-attenuated inversion recovery (FLAIR) modality in a brain image defines the corresponding density function. We then propose an effective two-phase U-net algorithm combined with the V-OMT algorithm for training and validation. First, we use the U-net and V-OMT algorithms to precisely predict the whole tumor (WT) region. Second, we expand this predicted WT region with dilation and create a smooth function by convolving the step-like function associated with the WT region in the brain image with a 5 × 5 × 5 blur tensor. Then, a new V-OMT algorithm with mesh refinement is constructed to allow the U-net algorithm to effectively train the Net1–Net3 models. Finally, we propose ensemble voting postprocessing to validate the final labels of brain images. We randomly choose 1000 and 251 brain samples from the BraTS 2021 training dataset, which contains 1251 samples, for training and validation, respectively. The Dice scores of the WT, tumor core (TC) and enhanced tumor (ET) regions for validation computed by Net1–Net3 were 0.93705, 0.90617 and 0.87470, respectively.

Due to the rapid development of convolutional neural network (CNN) architectures, deep CNNs have become one of the most widely used AI technologies and provide powerful approaches for tasks such as object detection, feature extraction, image classification and segmentation, language translation, chemical molecular structure prediction, and other significant tasks in medical and computational sciences. Deep CNNs rely on three key components, namely, input data, computation steps and models. The last two comprise the design of the optimization algorithms and the training of the model weights, whereas the first is largely fueled by feeding large amounts of data into the CNN to develop a more capable prediction system. In the past decade, innovations in high-performance computing, such as GPU accelerators, have promoted powerful advancements in deep learning. However, with Moore's law coming to an end, the steady reduction in the cost of computing and storage may cease in the near future. Thus far, expanding models to trillions of parameters and adding large amounts of training data has been successful, and these models have obtained excellent prediction performance. As Moore's law reaches its limits, however, computing with such very large models will become extremely expensive and inefficient.
In recent years, two benchmark datasets, MSD 2018 [1, 2] and BraTS 2020 [3-5], which contain 484 and 369 labeled 3D brain image samples for brain tumor segmentation, respectively, have provided a challenging platform and have attracted enormous attention and interest from researchers in this field. The brain samples were scanned by multiparametric magnetic resonance imaging (mpMRI) with four modalities, namely, fluid-attenuated inversion recovery (FLAIR), T1-weighted (T1), T1-weighted contrast-enhanced (T1CE), and T2-weighted (T2). The challenge is to evaluate state-of-the-art methods for segmenting the whole tumor (WT, labeled as {2,1,4}), tumor core (TC, labeled as {1,4}), and enhanced tumor (ET, labeled as {4}) regions in the human brain. To address this task, in the early years, random forest algorithms and machine learning techniques were used to perform image classification [6-8] and segmentation [6, 9-11]. After that, CNN structures with two layers [12] and eight layers [13] were proposed and made good progress in brain tumor segmentation. Then, a more sophisticated multiple CNN architecture, called the U-net model, was first developed in [14] and improved in [15] by assembling two full CNNs and U-net. The merits of applying the U-net model to the MSD 2018 challenge [1, 2] were first reported in [16]. Recently, BraTS 2021 [3, 4, 17] added a large number of new brain samples to the database and provides 1251 labeled samples for training and 219 unlabeled samples for validation.
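The nesting of the three target regions can be made concrete with a small sketch. It only encodes the label sets quoted above (WT = {2, 1, 4}, TC = {1, 4}, ET = {4}); the function name `region_mask`, the flattened toy label map and the use of 0 for background are illustrative assumptions, not part of the BraTS tooling.

```python
# Label sets for the three evaluation regions, as quoted in the text.
REGION_LABELS = {
    "WT": {1, 2, 4},  # whole tumor
    "TC": {1, 4},     # tumor core
    "ET": {4},        # enhanced tumor
}


def region_mask(labels, region):
    """Return a 0/1 mask marking the voxels whose label belongs to `region`."""
    wanted = REGION_LABELS[region]
    return [1 if lab in wanted else 0 for lab in labels]


# Hypothetical flattened label map; 0 is assumed to be background.
toy_labels = [0, 2, 2, 1, 4, 4, 0, 1]
wt = region_mask(toy_labels, "WT")
tc = region_mask(toy_labels, "TC")
et = region_mask(toy_labels, "ET")

# ET is contained in TC, which is contained in WT, voxel by voxel.
nested = all(e <= t <= w for e, t, w in zip(et, tc, wt))
```

Because the label sets are nested, every ET voxel is also a TC voxel and every TC voxel is also a WT voxel, which is the inclusion that the two-phase design relies on later.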
Given these computational limits, preprocessing that effectively represents large amounts of input data for CNNs is crucial. For example, given an irregular 3D physical brain image obtained from MRI, which generally comprises approximately 1.5 million vertices, a natural way to fit the tensor input format of a U-net system is to overlay the image with several randomly selected cubes (e.g., 16 cube filters were used in [18]) that cover it seamlessly. An efficient two-stage optimal mass transportation (OMT) algorithm, newly proposed in [19], was designed to first transform an irregular 3D brain image into a unit ball and then into a cube with minimal distortion and transport costs. This strategy can greatly reduce the size of the input data, so there are more opportunities to expand various types of training data, and the existing computing resources can be used effectively to improve the expected prediction accuracy. Thus, we are motivated to consider transforming an irregular brain image into a cube. In practical applications, when using a CNN to make segmentation predictions for human brain tumors, we only need a suitable cube to represent a brain image without losing many important features or much conversion accuracy. In this way, the storage and computational costs can be greatly reduced.
OMT is a classical optimization problem raised by Monge in 1781 (see [20] for details): find an optimal solution that minimizes the transport cost and preserves the local mass ratios between two spaces. The existence and uniqueness of a solution to the OMT problem was proven by Kantorovich [20] by relaxing the probability measure with a joint probability distribution. The regularity condition for the solution of the OMT problem was first shown by Caffarelli [11], and an elegant theoretical survey, "Optimal Transport: Old and New", which summarizes the achievements of predecessors, was published by Villani [21]. For numerical methods, Brenier [10] proposed an alternative scheme for solving the OMT problem with a quadratic cost function for a special class of convex domains. Based on Brenier's approach and the variational principle [22], Su et al. [23] developed a volume-preserving parameterization from a 3-manifold $M$ with a spherical boundary to a unit ball $B^3$. Recently, Yueh et al. [24] proposed a novel algorithm to compute a volume-preserving parameterization from $M$ to $B^3$ by modifying the denominators of the coefficients of the corresponding Laplacian matrix with the local volume stretch factor at each iteration step, and adopted the projected gradient method (PGM) combined with the homotopy technique in [25] to find the OMT map between $M$ and $B^3$. In addition, a two-stage OMT (2SOMT) procedure from $M$ to $B^3$ and from $B^3$ to a cube was efficiently developed by Lin et al. [19] and applied prior to U-net training and inference in 3D brain tumor segmentation.
In this paper, we study the applicability of mapping an irregular 3D image (i.e., a human brain) to a canonical domain (i.e., a cube or a cuboid) in a way that minimizes the transport cost and preserves the local mass ratios. First, based on the homotopy technique, a direct one-stage OMT approach from a 3-manifold $M$ with a genus-zero boundary to a cube is developed for 3D U-net training and inference. This approach reduces the conversion loss incurred by 2SOMT [19], which maps $M$ to $B^3$ and then $B^3$ to a cube. Thus, we can construct a one-to-one correspondence between the input data of irregular images and the associated cubic tensors. Without any conversion loss between OMT maps, the size of the training data of the 3D U-net model is greatly reduced, and it is our belief that 3D U-net training can more easily find a local minimum and achieve better performance.
Next, we propose a two-phase U-net with OMT (2P-Unet-OMT) algorithm that utilizes the density distribution of brain tumor features and trains four related networks to detect tumor regions and segment tumor labels. Given an irregular 3D brain, in Phase I, we first construct the associated density map at each vertex according to the normalized contrast-enhanced histogram equalization (CEHE) grayscale values of the FLAIR modality of the brain MRI. Then, we compute OMT maps from brain images to cubes for the training set and train Net0 by the U-net algorithm to detect possible tumor regions. Since no prior information about the tumor location is available at the beginning, the CEHE grayscales of FLAIR, which typically reflect the distribution of WT, are an effective starting point for detecting tumor regions. Next, we expand these possible tumor regions by 5 voxels with dilation. In Phase II, because ET ⊂ TC ⊂ WT, we construct a smooth density function by convolving the step-like function, which equals exp(FLAIR) on the expanded WT region and 1.0 elsewhere, with a 5 × 5 × 5 box blur tensor. We remesh the tetrahedral mesh with finer elements in the higher-density region of the brain so that the target tumor region is enlarged in the cube by OMT and can be better viewed and learned by U-net. We then train Net1 for WT, Net2 for TC and Net3 for ET by U-net. At test time, Net0 first detects as much of the WT region as possible. As above, by expanding this region by 5 voxels with dilation and creating a similar smooth density function with finer meshes on the raw brain image, we compute the corresponding OMT map and call Net1-Net3 combined with ensemble voting postprocessing to make the final label prediction and image segmentation.
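The Phase II density construction described above (dilate the detected WT region, build a step-like function, smooth it with a box blur) can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the dilation uses a cubic neighbourhood, the box filter replicates edge values, the blur size `m` is assumed odd, and the factor `gamma` scaling the exponent is an assumed parameterization. All function names are hypothetical.

```python
import math


def dilate(mask, r):
    """Binary dilation of a 3D 0/1 mask by r voxels (cubic neighbourhood assumed)."""
    n0, n1, n2 = len(mask), len(mask[0]), len(mask[0][0])
    out = [[[0] * n2 for _ in range(n1)] for _ in range(n0)]
    for i in range(n0):
        for j in range(n1):
            for k in range(n2):
                if mask[i][j][k]:
                    for a in range(max(i - r, 0), min(i + r, n0 - 1) + 1):
                        for b in range(max(j - r, 0), min(j + r, n1 - 1) + 1):
                            for c in range(max(k - r, 0), min(k + r, n2 - 1) + 1):
                                out[a][b][c] = 1
    return out


def box_blur(vol, m):
    """Convolve with an m x m x m box filter (entries 1/m^3); edges are replicated."""
    n0, n1, n2 = len(vol), len(vol[0]), len(vol[0][0])
    h = m // 2  # m is assumed to be odd

    def clamp(x, n):
        return min(max(x, 0), n - 1)

    out = [[[0.0] * n2 for _ in range(n1)] for _ in range(n0)]
    for i in range(n0):
        for j in range(n1):
            for k in range(n2):
                s = 0.0
                for a in range(i - h, i + h + 1):
                    for b in range(j - h, j + h + 1):
                        for c in range(k - h, k + h + 1):
                            s += vol[clamp(a, n0)][clamp(b, n1)][clamp(c, n2)]
                out[i][j][k] = s / m ** 3
    return out


def phase2_density(flair, wt_mask, gamma=1.0, r=5, m=5):
    """Step-like density (exp on the dilated region, 1.0 elsewhere), then box blur."""
    region = dilate(wt_mask, r)
    n0, n1, n2 = len(flair), len(flair[0]), len(flair[0][0])
    step = [[[math.exp(gamma * flair[i][j][k]) if region[i][j][k] else 1.0
              for k in range(n2)] for j in range(n1)] for i in range(n0)]
    return box_blur(step, m)
```

On a full 240 × 240 × 155 volume these nested loops would be slow; an array library would be the practical choice. The sketch is only meant to show how the dilation radius and the blur size enter the density.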
Contribution. The 2P-Unet-OMT procedure transforms an irregular 3D brain image into a cube with density estimates and mesh refinement to fit the input format of the U-net algorithm. Unlike previous methods, 2P-Unet-OMT minimizes the transport cost and preserves the global features of the input data, and thereby aims to surpass them. The main contributions of this paper are summarized below.
1. The OMT map transforms an irregular 3D brain image into a cube that fits the input format of U-net while preserving the local mass ratios between two domains and minimizing the transport cost and the distortion. These advantages for 2SOMT were highlighted in [19]. However, 2SOMT did not make full use of an estimate of the distribution of the density function, so U-net could not infer the target object accurately. 2P-Unet-OMT fully exploits the distribution of the associated density function to create an effective OMT map from an irregular 3D domain to a cube and provides it to U-net for training a high-performance prediction network. 2P-Unet-OMT inherits the advantage of 2SOMT that only one cube is needed to represent an irregular 3D brain image without losing many important features or much conversion accuracy. In this way, the computational cost and memory footprint of U-net training are greatly reduced, and the saved resources can be used for data augmentation within the limits of the memory capacity. In addition, 2P-Unet-OMT greatly increases the prediction accuracy via a precise estimate of the density function for the tumor distribution.
2. One of the characteristics of the OMT map is that it preserves the local mass. Exploiting this property, in Phase II of 2P-Unet-OMT, we apply mesh refinement on the expanded WT region detected during Phase I. The mesh refinement technique increases the number of tetrahedrons in a specific region of the brain and enlarges the portion of the volume that this region occupies in the target domain; in effect, the U-net algorithm views the tumor region through a magnifying glass while learning to mark the segmentation labels. The numerical experiment with the trained Net1-Net3 models combined with ensemble voting shows that the validation Dice scores for WT, TC and ET reach 0.93705, 0.90617 and 0.87470, respectively; hence, this approach significantly boosts the accuracy of brain tumor detection and segmentation.
3. Because the OMT approach must convert the labels predicted by U-net on the cube back to the brain image before the Dice score can be evaluated precisely, we propose a new conversion technique with ensemble voting postprocessing. It converts the predicted labels on the cube back to each voxel of the brain by using the multiple values on the cube produced by various models to make a precise evaluation of the labels corresponding to the voxels in the brain image. The notably high Dice scores on the BraTS 2021 validation data suggest that representing an irregular 3D brain image by a cube via OMT is an innovative and streamlined approach for CNN training and prediction.
This paper is organized as follows. In "Discrete OMT Problems and Cubic OMT Maps", we introduce the discrete OMT problem and the spherical-cubic area-measure-preserving and cubic volume-measure-preserving OMT maps. In "Two-Phase U-net with OMT for Training and Validation", we propose a two-phase U-net model with OMT maps for training and validation. For the evaluation of high Dice scores, we develop an effective conversion technique to convert the predicted labels on the cube back to the brain image using all related probability information corresponding to each voxel in the brain image. In "Results and Discussions", we show the improvement in the Dice score obtained by the U-net models in Phase II with mesh refinements on the expanding WT region provided by Phase I and the ensemble voting postprocessing for the label evaluation. Finally, a concluding remark is given in "Conclusions".

Discrete OMT Problems and Cubic OMT Maps
Let $M$ be a simplicial 3-complex that describes an irregular 3D brain image with a genus-zero boundary. $M$ is generally composed of sets of vertices $V(M)$, edges $E(M)$, faces $F(M)$ and tetrahedrons $T(M)$. A discrete OMT problem is to find a bijective function that maps $M$ to a canonical simple domain with minimal distortion. The canonical shape could be a ball $B^3$ or a unit cube $C^3$. Since a tensor form is necessary for the input of the U-net algorithm, a cube or a cuboid is the target domain for $M$. In this section, we propose a one-stage OMT approach to map $M$ to $C^3$.
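Let $\rho$ be a piecewise linear density function on $M$, and let $\rho(\alpha)$ and $\rho(\tau)$ denote its values on a face $\alpha \in F(M)$ and a tetrahedron $\tau \in T(M)$, respectively, as in (1). Furthermore, we define the local area-/volume-measures (i.e., local mass) at a vertex $v$ by

$$a_\rho(v) = \frac{1}{3} \sum_{\alpha \ni v} \rho(\alpha)\,|\alpha|, \qquad m_\rho(v) = \frac{1}{4} \sum_{\tau \in N(v)} \rho(\tau)\,|\tau|, \tag{2}$$

respectively, where $|\alpha|$ and $|\tau|$ are the area and volume of $\alpha$ and $\tau$, respectively, the first sum runs over the boundary faces containing $v$, and $N(v)$ is the set of tetrahedrons containing $v$.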

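Denote

$$\mathbb{G}_\rho = \{ g : \partial M \to \partial C^3 \mid \rho(\alpha)|\alpha| = |g(\alpha)|,\ \forall \alpha \in F(\partial M) \}, \qquad \mathbb{F}_\rho = \{ f : M \to C^3 \mid \rho(\tau)|\tau| = |f(\tau)|,\ \forall \tau \in T(M) \} \tag{3}$$

as the sets of all area-/volume-measure-preserving (i.e., mass-preserving) piecewise linear maps from $\partial M$ to $\partial C^3$ and from $M$ to $C^3$, respectively, in which the bijective maps between $\alpha$ and $g(\alpha)$, as well as $\tau$ and $f(\tau)$, are determined by the barycentric coordinates on $\alpha$ and $\tau$, respectively. For given $g \in \mathbb{G}_\rho$ and $f \in \mathbb{F}_\rho$, we define the transport costs of $g$ and $f$, respectively, by

$$d_\rho(g) = \sum_{v \in V(\partial M)} \| v - g(v) \|_2^2 \, a_\rho(v), \qquad c_\rho(f) = \sum_{v \in V(M)} \| v - f(v) \|_2^2 \, m_\rho(v), \tag{4}$$

where $a_\rho(v)$ and $m_\rho(v)$ are the local area-/volume-measures at $v \in V(\partial M)$ and $v \in V(M)$, respectively, as in (2). The discrete OMT problems on $\partial M$ and $M$ with respect to $\|\cdot\|_2$ are to find $g^*_\rho \in \mathbb{G}_\rho$ and $f^*_\rho \in \mathbb{F}_\rho$ that solve the optimization problems

$$g^*_\rho = \operatorname*{argmin}_{g \in \mathbb{G}_\rho} d_\rho(g), \qquad f^*_\rho = \operatorname*{argmin}_{f \in \mathbb{F}_\rho} c_\rho(f), \tag{5}$$

where $d_\rho(g)$ and $c_\rho(f)$ are given in (4). Without loss of generality, hereafter, each simplicial 3-complex $M$ is centralized and normalized so that the center of mass is located at the origin and the mass is one.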
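To make the notion of transport cost tangible, the following sketch evaluates it on a toy mesh. It assumes the usual discrete form: the local mass of a vertex is one quarter of the density-weighted volume of its incident tetrahedrons, and the cost is the mass-weighted sum of squared Euclidean displacements. The function names, the single-tetrahedron mesh and the translation used as a stand-in map are hypothetical and chosen only for illustration.

```python
def tet_volume(p0, p1, p2, p3):
    """Volume of a tetrahedron from its four vertices (absolute determinant / 6)."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return abs(det) / 6.0


def local_mass(verts, tets, tet_density):
    """Vertex mass: a quarter of rho(tau) * |tau| from every incident tetrahedron."""
    mass = [0.0] * len(verts)
    for tet, rho in zip(tets, tet_density):
        vol = tet_volume(*(verts[i] for i in tet))
        for i in tet:
            mass[i] += rho * vol / 4.0
    return mass


def transport_cost(verts, images, mass):
    """Mass-weighted sum of squared displacements between v and its image f(v)."""
    return sum(
        m * sum((x - y) ** 2 for x, y in zip(v, w))
        for v, w, m in zip(verts, images, mass)
    )


# Toy mesh: one tetrahedron with unit density (illustrative values only).
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tets = [(0, 1, 2, 3)]
mass = local_mass(verts, tets, [1.0])

# A unit translation along x stands in for the map f.
shifted = [(x + 1.0, y, z) for x, y, z in verts]
cost = transport_cost(verts, shifted, mass)
```

The vertex masses sum to the density-weighted volume of the mesh, which is why normalizing the mesh to mass one turns them into a discrete probability measure.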
A piecewise linear function $g$ on $\partial M$ is determined, via the barycentric coordinates, by the matrix $\mathbf{g}$ whose rows are the images $g(v)$ of the vertices $v \in V(\partial M)$; $g$ is called the function induced by $\mathbf{g}$, and $\mathbf{g}$ is the inducing matrix of $g$. The area-weighted stretch energy [26] on $\partial M$ is defined through the area-weighted Laplacian matrix $L_S(g)$, whose entry for an edge $[v_i, v_j]$ depends on the two angles $\theta_{i,j}(g)$ and $\theta_{j,i}(g)$ opposite to the edge $g([v_i, v_j])$ and on the local area-measure stretch factor. To compute the cubic area-measure-preserving OMT (A-OMT) map from $\partial M$ to $\partial C^3$, we utilize the PGM proposed in [25], which efficiently computes the A-OMT maps $h^*_\rho : \partial M \to S^2$ and $h^*_1 : \partial C^3 \to S^2$ ($\rho = 1$), where $S^2$ denotes the unit sphere in $\mathbb{R}^3$. Then, the composition map $g^*_\rho = (h^*_1)^{-1} \circ h^*_\rho : \partial M \to \partial C^3$, as illustrated in Figure 1, is the desired A-OMT map. The computational procedure is summarized in Algorithm 1.
Cubic Volume-Measure-Preserving OMT Maps. In this section, we develop the OMT algorithm for computing the cubic OMT map $f^*_\rho$ in (5) from $M$ to $C^3$ directly. Let $g^*_\rho$ be the cubic A-OMT map from $\partial M$ to $\partial C^3$ computed by Algorithm 1. We construct a homotopy $g_\zeta : \partial M \to \mathbb{R}^3$ for the boundary maps, as in [24], and partition it into $p$ steps. For $k = 1, \ldots, p$, we compute the interior map by solving the linear system (9). The corresponding computational procedure is stated in Algorithm 2.
Algorithm 1 (cubic A-OMT map). Input: A genus-zero simplicial 2-complex $\partial M$ of mass one and a piecewise linear density function $\rho$ on $\partial M$.
Algorithm 2 (cubic V-OMT map). Input: A simplicial 3-complex $M$ with a genus-zero boundary of mass one and a piecewise linear density function $\rho$ on $M$.
To study the partition number $p$ of the homotopy in step (9), we define the total mass distortion $d_M(f)$ and the local mass ratio $r_f(v)$ as in (10), where $N(v)$ is the set of 1-ring neighboring tetrahedrons of $v$.

Two-Phase U-net with OMT for Training and Validation
A physical brain image is contained in $I_s$ and accounts for approximately 12%-20% of the voxels. Suppose $M \subseteq I_s$ is a simplicial 3-complex with a genus-zero boundary composed of tetrahedral meshes representing a brain image. Since $I_1$ records the adapted CEHE grayscales of FLAIR and, in general, the FLAIR modality reflects the distribution of WT = {2, 1, 4}, the adapted CEHE grayscale at the voxel $I_1(i, j, k)$ is used to define the density map $\rho_\gamma(v)$ on $V(M)$, as in (11).
Two-Phase U-Net with OMT for Training. For the given samples in the training set of 3D brain images, we propose a 2P-Unet-OMT algorithm with density function estimates to construct an effective input tensor for the U-net algorithm. In general, a real brain image contains roughly 1.5 million vertices. It is therefore reasonable to represent a brain image with $128^3 = 2{,}097{,}152$ voxels, which exceeds the number of vertices.
Phase I. We first construct training tensors by using the OMT algorithm with the density $\rho_\gamma(v)$, as in (11), and train Net0 by the U-net algorithm. Net0 is designed to detect the possible tumor region of WT and is then used to construct a new density function that enlarges the tumor region for Phase II.

Phase II. For a given training brain image, we cover the possible tumor region of WT = {2, 1, 4} by $m$ voxels with morphological dilation and denote the expanded region by $T \subseteq M$. The step-like function in (12) takes the exponential FLAIR density on $T$ and the value 1.0 elsewhere. Then, we construct a new smooth density function $\rho_\gamma(v)$ using the image filtering technique by convolving the step-like function in (12) with an $m \times m \times m$ box blur tensor, as in (13). As shown in Figure 3a, we compute the OMT map $f^*_{\rho_\gamma}$ from $M$ to a 128 × 128 × 128 cube and train Net1 for WT, Net2 for TC and Net3 for ET by the U-net algorithm.
Net0 and Net1-Net3 for Validation. Once we have trained Net0 and Net1-Net3 in Phase I and Phase II, respectively, we use Net0 to detect the possible tumor region of WT = {2, 1, 4} with the density function $\rho_\gamma(v)$ defined in (11), cover WT by $m$ voxels with dilation to obtain $T \subseteq M$, and construct a new density function $\rho_\gamma$, as in (13). The labels predicted by Net1-Net3 on the cube are then converted back to the voxels of the brain image by steps (i)-(iii).
We denote GT as the ground truth of WT ⊃ TC ⊃ ET and PD as the prediction of WT, TC and ET, respectively, by (i)-(iii). The relationship between the sets GT and PD is plotted in Figure 5. For numerical experiments, we define the Dice score as

$$\text{Dice} = \frac{2\,|\mathrm{GT} \cap \mathrm{PD}|}{|\mathrm{GT}| + |\mathrm{PD}|}. \tag{15}$$

In (ii), for each voxel $v_j$ in $M$, we utilize the multiple values $p_i^t$, $i = 1, \ldots, n(j)$, on the cube to define the most likely probability, which can be used to make a more precise evaluation of the label prediction. Furthermore, if we define $\mathrm{GT}_p$ and $\mathrm{PD}_p$ as the probability density tensors of GT and PD, respectively, we can define

$$\text{Loss function} = \text{Dice loss} + \text{Cross entropy loss}. \tag{16}$$

The Dice loss in (16) can help with checking the convergence of the training procedure for WT, TC and ET vs. epochs by the U-net algorithm.
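As a quick illustration of the Dice score, the sketch below uses the standard overlap definition, twice the intersection divided by the sum of the two set sizes, and assumes that this is the definition meant by (15). The soft Dice loss on probability values is one common variant and is an assumption; the exact form of the Dice loss in (16) is not recoverable from the text. Function names and toy voxel indices are hypothetical.

```python
def dice_score(gt, pd):
    """Dice overlap between two sets of voxel indices (standard definition)."""
    gt, pd = set(gt), set(pd)
    if not gt and not pd:
        return 1.0  # convention: two empty regions agree perfectly
    return 2.0 * len(gt & pd) / (len(gt) + len(pd))


def soft_dice_loss(gt_p, pd_p, eps=1e-8):
    """One common soft Dice loss on probabilities (assumed variant)."""
    inter = sum(g * p for g, p in zip(gt_p, pd_p))
    denom = sum(gt_p) + sum(pd_p)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


# Toy example with flattened voxel indices (illustrative values only).
gt_voxels = {1, 2, 3, 4}
pd_voxels = {3, 4, 5}
score = dice_score(gt_voxels, pd_voxels)  # 2 * 2 / (4 + 3)
```

A perfect prediction gives a score of 1 and a loss of 0, so the two quantities move in opposite directions during training.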

Improvement in Dice Scores with Mesh Refinement and Ensemble Voting Postprocessing.
In this subsection, we propose two methods to improve the Dice scores of WT, TC, and ET. One is the mesh refinement on the WT region for the OMT map, and the other is the ensemble voting postprocessing.
(a) Mesh refinement. With 2P-Unet-OMT, the regions of interest in a brain image detected in Phase I can be remeshed with finer elements and are therefore better resolved in Phase II for U-net training. One of the most important features of the OMT map is that the density can be increased in the region of interest, and the region can then be remeshed by the mesh refinement technique. In this way, due to the mass-preserving property of OMT, the region of interest is enlarged in the cube, which enables U-net to learn more efficiently and achieve high-performance prediction results.

(b) Ensemble voting postprocessing. We propose an ensemble voting postprocessing approach to determine the final labels in the brain image for validation. The main purpose of this postprocessing step is to modify the probability $p_i^t$, $t = 1, 2, 3$, in steps (i)-(iii) of paragraph "Net0 and Net1-Net3 for Validation". We first select the three best models $\{\text{Net1}_\nu, \text{Net2}_\nu, \text{Net3}_\nu\}_{\nu=1}^{3}$ for WT, TC and ET from the training procedure. For each 128 × 128 × 128 brain tensor ($R_0$) for validation, we further build four 128 × 128 × 128 tensors by a 90 degree counterclockwise rotation ($R_1$), mirroring from the left to the right ($R_2$), mirroring from the top to the bottom ($R_3$) and mirroring from the left to the right followed by a 90 degree counterclockwise rotation ($R_4$).
The transformed copies $R_1, \ldots, R_4$ of the brain tensor $R_0$ constructed above help to improve the Dice scores with the ensemble voting technique developed in (17) and (18).
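The idea of voting over transformed copies can be sketched as follows. The five transforms mirror $R_0$ to $R_4$ above; everything else is an assumption made for illustration, since formulas (17) and (18) are not reproduced here: the transforms act slice by slice in one plane, each prediction is mapped back with the inverse transform, and the probabilities are simply averaged over models and transforms before thresholding. The models are hypothetical callables that map a volume to a probability volume of the same shape.

```python
def rot90_ccw(s):
    """Rotate a square 2D slice by 90 degrees counterclockwise."""
    n = len(s)
    return [[s[j][n - 1 - i] for j in range(n)] for i in range(n)]


def rot90_cw(s):
    """Rotate a square 2D slice by 90 degrees clockwise (inverse of rot90_ccw)."""
    n = len(s)
    return [[s[n - 1 - j][i] for j in range(n)] for i in range(n)]


def flip_lr(s):
    """Mirror a slice from left to right."""
    return [row[::-1] for row in s]


def flip_ud(s):
    """Mirror a slice from top to bottom."""
    return [row[:] for row in s[::-1]]


def identity(s):
    return [row[:] for row in s]


# (forward, inverse) pairs corresponding to R0, R1, R2, R3, R4.
TRANSFORMS = [
    (identity, identity),
    (rot90_ccw, rot90_cw),
    (flip_lr, flip_lr),
    (flip_ud, flip_ud),
    (lambda s: rot90_ccw(flip_lr(s)), lambda s: flip_lr(rot90_cw(s))),
]


def per_slice(fn, volume):
    """Apply a 2D transform to every slice of a 3D volume."""
    return [fn(s) for s in volume]


def ensemble_probability(volume, models):
    """Average the back-transformed predictions over all models and transforms."""
    total = [[[0.0] * len(row) for row in s] for s in volume]
    count = 0
    for model in models:
        for fwd, inv in TRANSFORMS:
            pred = per_slice(inv, model(per_slice(fwd, volume)))
            for z, s in enumerate(pred):
                for y, row in enumerate(s):
                    for x, p in enumerate(row):
                        total[z][y][x] += p
            count += 1
    return [[[t / count for t in row] for row in s] for s in total]


def vote(prob, threshold=0.5):
    """Turn averaged probabilities into binary labels (threshold is an assumption)."""
    return [[[1 if p >= threshold else 0 for p in row] for row in s] for s in prob]
```

With three models per region and five transformed copies, each voxel receives fifteen probability estimates under this scheme; other aggregation rules, such as majority voting on hard labels, would fit the same skeleton.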

Results and Discussions
Based on the CNN technique, the U-net algorithm is designed to learn an effective network from training data through an optimization process that decreases the loss function on the training and validation sets. We adopt the U-net algorithm and set the hyperparameters as follows: encoder depth: 3, initial learning rate: $\alpha_0 = 1.0 \times 10^{-4}$, learning rate drop factor: $F = 0.95$, learning rate drop period: $P = 10$, $L_2$-regularization: $1.0 \times 10^{-4}$, mini-batch size: 8.
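The effect of the drop factor and drop period on the learning rate can be checked with a short calculation. The sketch assumes a piecewise-constant schedule in which the rate is multiplied by $F$ after every $P$ epochs, with epochs counted from 1; this is the behaviour suggested by the parameter names, but the exact epoch at which each drop takes effect is an assumption. The function name is hypothetical.

```python
def learning_rate(epoch, alpha0=1.0e-4, drop_factor=0.95, drop_period=10):
    """Piecewise-constant schedule; `epoch` is 1-indexed (assumed convention)."""
    drops = (epoch - 1) // drop_period  # number of drops applied so far
    return alpha0 * drop_factor ** drops


# Rate during the first period, right after the first drop, and at epoch 310.
lr_first = learning_rate(1)
lr_after_first_drop = learning_rate(11)
lr_final = learning_rate(310)  # 30 drops have been applied by then
```

Under these assumptions the rate at epoch 310 is $\alpha_0 \cdot 0.95^{30}$, roughly a fifth of the initial value, so the schedule decays gently over the training runs reported below.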
For the 1251 brain image samples in the BraTS 2021 challenge database [3, 4, 17], we randomly fix 1000 samples for training and 251 for validation. The training is carried out on a server equipped with four NVIDIA Tesla V100S PCIe 32 GB GPUs. All calculations are implemented in MATLAB R2020a.
Partition number p in Algorithm 2. We select BraTS0002 as an $M$ from the BraTS 2021 dataset and compute the cubic V-OMT from $M$ to $C^3$ by Algorithm 2. In Figure 6, we plot the statistical summary of the local mass distortion $\sum_{\tau \in N(v)} \big| \rho(\tau)|\tau| - |f(\tau)| \big| / 4$, as in (10), and $r_f(v)$ for all $v \in V(M)$ versus the partition number $p$ of the homotopy. In each box, the red centerline indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The dotted lines extend to the most extreme data points that are not considered outliers, and the outliers are represented separately with "+" signs. Furthermore, in Figure 7, we also plot the statistical summary of the total mass distortion $d_M(f)$ and the mean and standard deviation (SD) of $r_f(v)$ vs. the partition number $p$ for the first 1000 brain samples from the BraTS 2021 dataset. Figure 6 shows that when $p = 11$, the cubic V-OMT between BraTS0002 and $C^3$ has the smallest local mass distortion and the closest local mass ratio to one. Moreover, in Figure 7, when $p = 11$, the first 1000 brain samples of BraTS 2021 have the smallest total mass distortion and the best mean and SD of the local mass ratios. Therefore, we choose $p = 11$ in Algorithm 2.
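The per-vertex quantities summarized in the box plots can be computed as in the sketch below. The local mass distortion follows the expression quoted above, a quarter of the absolute mass mismatch summed over the tetrahedrons around a vertex. The local mass ratio is taken here as source mass over image volume in the 1-ring, which is an assumed orientation because the defining formula is not reproduced in the text. Function names and the two-tetrahedron toy data are hypothetical.

```python
from statistics import median


def local_mass_distortion(num_vertices, tets, src_mass, img_volume):
    """Per-vertex sum over incident tetrahedrons of |rho(tau)|tau| - |f(tau)|| / 4."""
    dist = [0.0] * num_vertices
    for tet, m_src, v_img in zip(tets, src_mass, img_volume):
        for v in tet:
            dist[v] += abs(m_src - v_img) / 4.0
    return dist


def local_mass_ratio(num_vertices, tets, src_mass, img_volume):
    """Per-vertex ratio of source mass to image volume in the 1-ring (assumed form)."""
    src = [0.0] * num_vertices
    img = [0.0] * num_vertices
    for tet, m_src, v_img in zip(tets, src_mass, img_volume):
        for v in tet:
            src[v] += m_src
            img[v] += v_img
    return [s / i for s, i in zip(src, img)]


# Toy data: two tetrahedrons sharing a face; src_mass holds rho(tau) * |tau|.
tets = [(0, 1, 2, 3), (1, 2, 3, 4)]
src_mass = [0.5, 0.5]
img_volume = [0.4, 0.6]

distortion = local_mass_distortion(5, tets, src_mass, img_volume)
ratio = local_mass_ratio(5, tets, src_mass, img_volume)
median_distortion = median(distortion)
```

For a perfectly mass-preserving map every distortion is zero and every ratio is one, which is the ideal that the choice of the partition number tries to approach.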
Dimension m of blur box tensor. We now discuss the dimension $m$ of the blur box tensor in (13), which is also the number of voxels by which the WT region is expanded. To choose a suitable number $m$ of covering voxels with dilation for WT, we apply the recall and precision metrics, which are defined as

$$\text{Recall} = \frac{|\mathrm{GT} \cap \mathrm{PD}|}{|\mathrm{GT}|}, \qquad \text{Precision} = \frac{|\mathrm{GT} \cap \mathrm{PD}|}{|\mathrm{PD}|}, \tag{19}$$

where GT denotes the ground truth of WT and PD denotes the prediction of {WT covered by $m$ voxels with dilation} by Net0. The recall metric in (19) indicates what fraction of the ground-truth voxels lie in the prediction, and the precision metric in (19) indicates how precise the prediction is. Thus, we want to make both recall and precision as large as possible. In Figure 8, we plot the mean, minimum, median and maximum values of the recall and precision metrics of the WT validation vs. the numbers of covering voxels with dilation. We find that $m = 5$ is a suitable number to balance the recall and precision values for the validation data.
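The trade-off behind the choice of the dilation size can be illustrated on a toy example. The sketch uses the standard definitions of recall and precision on sets of voxel coordinates and a cubic dilation neighbourhood; the neighbourhood shape, the function names and the toy regions are assumptions made for illustration.

```python
from itertools import product


def dilate(voxels, m):
    """Expand a set of (i, j, k) voxels by m voxels in every direction (cubic)."""
    offsets = list(product(range(-m, m + 1), repeat=3))
    return {(i + a, j + b, k + c) for (i, j, k) in voxels for (a, b, c) in offsets}


def recall(gt, pd):
    """Fraction of ground-truth voxels that lie inside the prediction."""
    return len(gt & pd) / len(gt)


def precision(gt, pd):
    """Fraction of predicted voxels that are ground-truth voxels."""
    return len(gt & pd) / len(pd)


# Toy ground truth: a 3 x 3 x 3 block. Toy prediction: the same block shifted by one.
gt = set(product(range(0, 3), repeat=3))
pred = {(i + 1, j, k) for (i, j, k) in gt}

before = (recall(gt, pred), precision(gt, pred))
covered = dilate(pred, 1)
after = (recall(gt, covered), precision(gt, covered))
```

In this toy case the dilation raises the recall from 2/3 to 1 while the precision falls from 2/3 to 27/125: a larger cover misses fewer tumor voxels but includes more healthy tissue, so the dilation size has to balance the two.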

For fixed m = 5, in Table 1, we list the mean, minimum, median, maximum and SD values of the transport costs, folding numbers and enlarged ratios for both the 1000 training samples and the 251 validation samples. The enlarged ratio is defined by (the ratio of WT in the raw data)/(the ratio of WT in the cube).
In Table 1, we observe that the transport costs, folding numbers and enlarged ratios for the 1000 training and 251 validation samples computed by $f^*_{\rho_1}$ in (13) are in line with what we expected.
Dice Scores and Loss Functions. We first compare 2P-Unet-OMT developed in Section "Two-Phase U-net with OMT for Training and Validation" with one-phase Unet-OMT (1P-Unet-OMT); i.e., the density functions of (11) with $\gamma = 1.0$ and $1.5$ are used for training Net1. We train Net1 by using 2P- and 1P-Unet-OMT for 310 epochs. In Figures 9a and 9b, we plot the Dice scores of WT for training and validation by 2P- and 1P-Unet-OMT, respectively. We observe that for both the training and validation scores, 2P-Unet-OMT is clearly better than 1P-Unet-OMT. Therefore, in the following numerical experiments, we adopt 2P-Unet-OMT. To expand the training data in Phase II, we use three different density functions $\rho_1(v)$, $\rho_{1.5}(v)$ and $\rho_2(v)$ for $v \in I_1(i, j, k) \subseteq T$, as in (13), to create 3000 augmented brain images for training. We now use 2P-Unet-OMT to train Net0 and Net1-Net3 on the 3000 training samples. Then, we utilize them to obtain predictions on the 251 validation samples. In Figure 10, we plot the Dice scores with blue "o" and "x" symbols and the loss functions with red "o" and "x" symbols vs. the epoch numbers for the training and validation sets of WT, TC and ET, respectively. Note that the Dice scores for WT, TC and ET are defined by (15) and the loss function is defined by (16). The predicted labels of WT, TC and ET in a brain image are evaluated by steps (i)-(iii), which are determined by the probability values $p_i^t$. We see that the training and validation Dice scores for WT, TC and ET increase very fast during the first 50 epochs but then do not increase significantly and reach (0.9720, 0.9673, 0.9330) and (0.9325, 0.8965, 0.8614), respectively, after 310 epochs. On the other hand, the training and validation loss functions for WT, TC and ET decrease very fast during the first 50 epochs and approach ($7.008 \times 10^{-2}$, $7.067 \times 10^{-2}$, $8.678 \times 10^{-2}$) and ($8.006 \times 10^{-2}$, $9.487 \times 10^{-2}$, $9.957 \times 10^{-2}$), respectively, after 310 epochs. The trends of both the Dice score and the loss function value follow the typical training and validation history. Thus, based on the clear tendency of the curves of the Dice scores and loss functions, in our experiment, we run U-net for 310 epochs.
Merit of Mesh Refinement on the WT Region. Based on the discussion of the merit of mesh refinement on the expanded WT region in the brain image, we compute OMT maps with the smooth density functions $\rho_\gamma(v)$, $\gamma = 1.0, 1.5, 1.75, 2.0$, in (13) for 1000 brain samples to obtain 4000 augmented brain cubes and use U-net to train Net1-Net3. Furthermore, for validation, we compute 2P-OMT for 251 brain samples with the density function $\rho_{1.75}(v)$ on the expanded WT region from Phase I with mesh refinement. We train U-net for 310 epochs on the 4000 augmented brain cubes. From epoch 10 to 310, for every 10 epochs, we validate the Dice scores on the 251 samples of validation data for WT, TC and ET. In Table 2, we show the top 3 validation Dice scores for WT, TC and ET by steps (i)-(iii), which occur at epochs (150, 140, 100), (170, 120, 70) and (130, 170, 80), where each triple lists the epochs for (WT, TC, ET). The corresponding training Dice scores for WT, TC and ET are listed in the first three columns of Table 2. We see that the validation Dice scores for WT, TC, and ET for the brain image reach 0.93469, 0.90251 and 0.86912, respectively, which is a satisfactory result.
Table 2. Dice scores of WT, TC and ET for the brain image with mesh refinement on the training and validation sets.
Dice Scores with Ensemble Voting Postprocessing. In this subsection, we show the improvement in Dice scores obtained with the mesh refinement and the ensemble voting postprocessing approach to determine the final labels in the brain image for validation. We first select the three best models for WT, TC and ET at epochs (150, 170, 130), (140, 120, 170) and (100, 70, 80) from the training procedure, as shown in Table 2, and call them $\text{Net1}_\nu$, $\text{Net2}_\nu$, and $\text{Net3}_\nu$ for $\nu = 1, 2, 3$.
In Figures 11a, 11b, and 11c, we plot the Dice scores with and without ensemble voting postprocessing as blue and green lines for WT, TC, and ET, respectively, vs. the epoch. Furthermore, the associated increments of the Dice scores are plotted with red lines in Figures 11a, 11b, and 11c. We see that the Dice scores for WT, TC, and ET with the ensemble voting technique are clearly better than those without voting postprocessing. In addition, the Dice score curves for WT, TC and ET have a relatively stable upward trend. In Table 3, we show the Dice, sensitivity, specificity, and 95th percentile of the Hausdorff distance (HD95) scores of the 251 validation samples for WT, TC, and ET in brain images by $\text{Net1}_\nu$-$\text{Net3}_\nu$, $\nu = 1, 2, 3$, with the ensemble voting technique. We see that $\text{Net1}_\nu$-$\text{Net3}_\nu$ with mesh refinement and ensemble voting postprocessing, as well as with the precise conversion of steps (i)-(iii), significantly boosts the Dice scores on the 251 validation samples from BraTS 2021. This result is very promising for brain tumor detection and segmentation.
Table 3. Dice, sensitivity, specificity and HD95 scores on 251 validation samples for WT, TC and ET with the mesh refinement and the ensemble voting techniques.

Conclusions
In this paper, we introduce 2P-Unet-OMT with density estimates for 3D brain tumor detection and segmentation. We first propose a cubic volume-measure-preserving OMT algorithm in Section "Discrete OMT Problems and Cubic OMT Maps" to compute an OMT map for transforming an irregular 3D brain image to a cube while preserving the local mass ratios and keeping the deformation minimal. Furthermore, the OMT map is bijective and minimizes the transport cost. The concept of expressing an irregular brain image with a cube with minimal distortion is proposed for the first time in this research field, and these cubes are well suited to the tensor input format of the U-net algorithm. Representing 3D brain images as cubes significantly reduces the effective brain images from sizes of 240 × 240 × 155 to cubes of sizes 128 × 128 × 128 and preserves the global information of tumor features. This novel OMT preprocessing technique reduces the amount of input data and the computational time for training. In addition, the ensemble voting technique proposed in (17)-(18) and the robust conversion steps (i)-(iii) of paragraph "Net0 and Net1-Net3 for Validation" from cubes (128 × 128 × 128) with predicted labels back to brain images (240 × 240 × 155) considerably increase the Dice scores for brain images compared to those for cubes on the 1251 brain image samples.
One of the characteristics of the OMT map is that it can control the densities of tumor regions in brain images; then, via mass-preserving OMT, the high-density areas are enlarged in the cube so that the U-net algorithm can learn the high-density regions more effectively. In fact, 2P-Unet-OMT in Section "Two-Phase U-net with OMT for Training and Validation" is designed for this purpose. Phase I first detects the possible region of WT and then expands this region by 5 voxels with dilation. Next, Phase II constructs new smooth density functions, as in (13), and performs mesh refinement on the region estimated by Phase I. Owing to the mass preservation of OMT, the portion of the cube occupied by the possible WT region is enlarged. Then, the U-net algorithm is utilized to train more effective Net1-Net3 models for tumor prediction and validation.
The Dice scores of WT, TC and ET by Net0 and Net1 ν -Net3 ν for ν = 1, 2, 3, with mesh refinement and ensemble voting postprocessing reach 0.93705, 0.90617 and 0.87470 for validation, respectively. 2P-Unet-OMT with mesh refinement sufficiently utilizes the mass-preserving property to significantly improve the tumor detection and segmentation accuracy.
In future work, because an irregular 3D brain image only needs to be represented by a cube in our approach, we have much room to expand the augmented data with various density settings, such as in (12), and with transformations that include rotating, mirroring, shearing and cropping, which will allow for more opportunities to boost the prediction accuracy. In addition, we believe that for 3D images provided by real 3D scanning instruments that may be developed in the future, an OMT representation of an irregular 3D object should retain the structure of the global information. This 3D OMT representation would take advantage of a precise conversion in the three directions in space and be beneficial to the input format of CNN algorithms. We believe this is a promising interdisciplinary research direction for medical images in the near future.