Automatic Segmentation of Prostate Magnetic Resonance Imaging Using Generative Adversarial Networks

Background: Automatic and detailed segmentation of the prostate using magnetic resonance imaging (MRI) plays an essential role in prostate imaging diagnosis. However, the complexity of the prostate gland hampers accurate segmentation from other tissues. Thus, we propose the automatic prostate segmentation method SegDGAN, which is based on a classic generative adversarial network (GAN) model. Methods: The proposed method comprises a fully convolutional generator network of densely connected blocks and a critic network with multi-scale feature extraction. In these computations, the objective function is optimized using mean absolute error and the Dice coefficient, leading to improved accuracy of segmentation results and correspondence with the ground truth. The common medical image segmentation networks U-Net, fully convolutional network (FCN), and SegAN were selected for qualitative and quantitative comparisons with SegDGAN using a 220-patient dataset and the publicly available dataset PROMISE12. The commonly used segmentation evaluation metrics Dice similarity coefficient (DSC), volumetric overlap error (VOE), average surface distance (ASD), and Hausdorff distance (HD) were used to compare segmentation accuracy between these methods. Results: SegDGAN achieved the highest DSC value of 91.66%, the lowest VOE value of 15.98%, and the lowest ASD value of 0.46 mm on the clinical dataset. In addition, the highest DSC value of 88.69%, the lowest VOE value of 23.47%, the lowest ASD value of 0.83 mm, and the lowest HD value of 11.40 mm were achieved on the PROMISE12 dataset. Conclusions: Our experimental results show that the SegDGAN model outperforms other segmentation methods.


Background
Prostate cancer (PCa) ranks second after lung cancer as the most common malignant tumor in men. According to global cancer statistics, 1.3 million new cases of prostate cancer and 359,000 associated deaths were recorded in 2018 [1]. Because early-stage prostate cancer can be effectively diagnosed and controlled, accurate detection of emerging prostate cancers is particularly important. Currently, MRI is the most common imaging method for diagnosing PCa [2,3]. Due to its relatively high soft-tissue resolution, MRI can clearly show the internal structure of the prostate and its surrounding tissues.

Deep learning methods have been widely applied to this segmentation task. The fully convolutional network (FCN) [12] is an end-to-end network that classifies images at the pixel level, thereby solving the problem of semantic segmentation. As an alternative, Ronneberger et al. [13] applied the U-Net network to medical image segmentation. U-Net combines a contracting path, which captures contextual information from images, with an expanding path, which accurately localizes the segmented target. To the same end, Milletari et al. [14] developed V-Net, a three-dimensional (3D) end-to-end medical segmentation method. This network introduces the Dice coefficient as a novel objective function and uses a residual learning method to accelerate convergence. Various other automatic algorithms have been proposed for the segmentation of prostate MRI data [15,16,17,18]. Among these, Zhu et al. [19] developed a deeply supervised two-dimensional (2D) U-Net model, and Milletari et al. [14] introduced V-Net as a 3D prostate image segmentation model. Further, Zhu et al. [20] proposed UR-Net, a recurrent neural network (RNN) based model that treats prostate image slices as data sequences and uses intra-slice features to improve the performance of prostate segmentation. Cheng et al. [21] presented an atlas-based active appearance model (AAM) combined with a deep learning model, and demonstrated higher precision for prostate segmentation on MRI. Vincent et al. [22] similarly developed a fully automatic MRI segmentation system, built on an AAM established from the MICCAI 2012 manual segmentation examples using minimum description length group-wise registration. In a later study, Zhu et al. [23] reported a fully automated algorithm for peripheral zone and transition zone segmentation of prostate MRI that achieved satisfactory performance. To date, these methods have been tested on clinical datasets and the public prostate dataset PROMISE12, and have achieved good performance. However, due to the complexity of MRI, these methods remain unable to adaptively fit the natural structure of the prostate gland in MRI.
To address this shortcoming, we propose SegDGAN, a new end-to-end architecture inspired by the SegAN model [24], for segmenting the prostate gland. In this study, we enhanced the performance of prostate gland segmentation by optimizing the original GAN [25] network structure and the objective function. The resulting model better accommodates the natural structure of the prostate and performs better than previously described algorithms. The contributions of this work are as follows:
• In the generator network (segmentor network), we introduce a fully convolutional network similar to the U-Net codec structure and perform end-to-end training.
• We apply a densely connected block structure to achieve direct connections between each layer and all subsequent layers, thus contributing additional supervisory information. This design reduces redundant feature learning, alleviates the vanishing gradient problem, and prevents model over-fitting.
• We develop a multi-scale feature connection in the discriminator network and add a Dice term to the objective function, leading to improved segmented binary mask images.

Methods
Our proposed prostate segmentation method SegDGAN comprises a generator network G and a discriminator network D, which are trained concomitantly. The discriminator is designed to distinguish between ground truth images and those produced by the generator. Conversely, the generator produces images that are similar to the ground truth and is hence designed to fool the discriminator. Through adversarial training, the two networks together drive the output of the generator ever closer to the ground truth of the prostate gland. The network structure is shown in Fig. 1 and Fig. 2.

Figure 1: The architecture of generator G. It comprises a down-sampling path with three dense blocks and an up-sampling path with two dense blocks; blue circles represent concatenation in the network. Blue dotted arrows indicate bypass connections of the feature maps between coupled encoders and decoders.

Figure 2: The architecture of discriminator D. All layers comprise convolutional, batch normalization (BN), and leaky ReLU layers; BN is not included in the first layer. Masked images are generated by multiplying label images with input images pixel-wise.

SegDGAN Architecture

Generator
The generator G is a segmentation network trained end-to-end. G uses the encoder-decoder structure of the previously described U-Net [13], which is based on the fully convolutional network [12], as shown in Fig. 1. This network includes a down-sampling path and an up-sampling path. The down-sampling path includes a convolutional layer with 3×3 kernels, three maximum pooling layers, and three densely connected blocks. The up-sampling path includes three deconvolution layers and three densely connected blocks, followed by a 1×1 convolution. As in U-Net, skip connections are added between the encoder and the decoder: feature maps from the down-sampling path are concatenated with those from the up-sampling path in a symmetric pattern. This design allows extraction of image features at different scales during down-sampling and restores the output to the same size as the input image during up-sampling.
The internal structure of the dense block is similar to that of DenseNet [26], as illustrated in Fig. 3; each dense block contains four layers. Each layer comprises batch normalization (BN), a rectified linear unit (ReLU), and a 3×3 convolution. Direct connections are made from each layer to all subsequent layers, providing additional supervision within the dense block structure.

Figure 3: The dense block structure adopted in generator G. Green circles represent concatenation.
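The following is a minimal PyTorch sketch of how the dense block and the encoder-decoder generator described above might be assembled. The growth rate, channel widths, and block depths are illustrative assumptions (the paper does not specify them), and for brevity the sketch follows the Fig. 1 caption in using three encoder blocks and two decoder blocks.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Four BN-ReLU-Conv(3x3) layers; each layer's input is the
    concatenation of the block input and all earlier layer outputs."""
    def __init__(self, in_channels, growth_rate=16, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1)))
            channels += growth_rate
        self.out_channels = channels

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

class Generator(nn.Module):
    """Encoder-decoder with dense blocks and U-Net-style skip
    connections; channel widths here are assumptions."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.enc1 = DenseBlock(32)
        self.enc2 = DenseBlock(self.enc1.out_channels)
        self.enc3 = DenseBlock(self.enc2.out_channels)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(self.enc3.out_channels,
                                      self.enc2.out_channels, 2, stride=2)
        self.dec2 = DenseBlock(2 * self.enc2.out_channels)
        self.up1 = nn.ConvTranspose2d(self.dec2.out_channels,
                                      self.enc1.out_channels, 2, stride=2)
        self.dec1 = DenseBlock(2 * self.enc1.out_channels)
        self.head = nn.Conv2d(self.dec1.out_channels, 1, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(self.stem(x))
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        # Skip connections: concatenate symmetric encoder features
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))
```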

Discriminator
Discriminator D is a multi-scale feature extraction network with six layers. Each layer comprises a convolution layer, BN, and a leaky ReLU activation layer, except the first layer, which omits BN and includes only the other two components. Convolution kernel sizes are 7×7, 5×5, 4×4, and 3×3. Fig. 2 shows the structural details of discriminator D with the components of each convolutional layer.
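Below is a hedged PyTorch sketch of such a discriminator. The mapping of the listed kernel sizes to individual layers, the strides, and the channel widths are assumptions, since the paper does not state them; the forward pass returns the per-layer feature maps needed for the multi-scale loss described in the next section.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Six Conv-BN-LeakyReLU layers (no BN in the first layer)
    that expose hierarchical features for a multi-scale loss."""
    def __init__(self, in_channels=1, base=32):
        super().__init__()
        cfg = [  # (out_channels, kernel, stride, use_bn) - assumed layout
            (base,     7, 2, False),
            (base * 2, 5, 2, True),
            (base * 4, 4, 2, True),
            (base * 4, 4, 2, True),
            (base * 8, 3, 2, True),
            (base * 8, 3, 1, True),
        ]
        blocks, c_in = [], in_channels
        for c_out, k, s, use_bn in cfg:
            layers = [nn.Conv2d(c_in, c_out, k, stride=s, padding=k // 2)]
            if use_bn:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            blocks.append(nn.Sequential(*layers))
            c_in = c_out
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        feats = []  # hierarchical features, one entry per layer
        for block in self.blocks:
            x = block(x)
            feats.append(x)
        return feats
```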

Optimized Objective Function
The objective function of conventional GANs [25] is defined as follows:

\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim P_z}\big[\log\big(1 - D(G(z))\big)\big]
\]

where x represents real images from the distribution P_data and z is random data introduced to determine the distribution of the generator; z usually follows a random noise distribution P_z. G(z) represents the differentiable function of the generator. Finally, D(x) represents the probability that the input image x originates from the training data rather than from the generator.
In this study, we applied the GAN network to prostate segmentation. The generator G is a mapping from original MRI images x to segmented binary mask images y. The discriminator D classifies the image, producing a binary classification {0, 1}^k for each data point {x, y}, where 1 indicates that y is a label image from the training sample, 0 indicates that y was generated by G, and k is the number of decisions. Accordingly, the objective function of the GAN for this segmentation problem is defined as follows:

\[
\min_G \max_D V(D, G) = \mathbb{E}_{x, y \sim P_{\mathrm{data}}}\big[\log D(x, y)\big] + \mathbb{E}_{x \sim P_{\mathrm{data}}}\big[\log\big(1 - D(x, G(x))\big)\big]
\]

As suggested previously [24], we added a multi-scale feature connection and a Dice coefficient control to the objective function to achieve better training results:

\[
\min_{\theta_G} \max_{\theta_D} \mathcal{L}(\theta_G, \theta_D) = \frac{1}{N} \sum_{n=1}^{N} \Big[ \ell_{\mathrm{mae}}\big(f_D(x_n \cdot G(x_n)),\, f_D(x_n \cdot y_n)\big) + \lambda\, \ell_{\mathrm{dice}}\big(G(x_n),\, y_n\big) \Big]
\]

where N is the number of training images, and x_n and y_n refer to the MRI images and the ground truth label maps, respectively. The symbol "·" in the formula represents pixel-wise (point) multiplication of matrices. The discriminator function f_D extracts hierarchical features from the input data x, and ℓ_mae is the mean absolute error (MAE) between the predicted prostate region and the real prostate region, defined as follows:

\[
\ell_{\mathrm{mae}}\big(f_D(x), f_D(x')\big) = \frac{1}{M} \sum_{i=1}^{M} \big\| f_D^{\,i}(x) - f_D^{\,i}(x') \big\|_1
\]

where M is the number of layers of the discriminator D, and f_D^i(x) is the feature map of the i-th layer of D. Here, ℓ_dice is the Dice term that characterizes the degree of similarity between the generated binary mask image and the ground truth, and λ is an adjustment coefficient that balances the weights of ℓ_mae and ℓ_dice. Accordingly, ℓ_dice is defined by the following equation:

\[
\ell_{\mathrm{dice}} = 1 - \frac{2 \sum_{i=1}^{S} x_i y_i + \varepsilon}{\sum_{i=1}^{S} x_i + \sum_{i=1}^{S} y_i + \varepsilon}
\]

where S represents the total number of pixels in the images, x_i is the predicted probability for pixel i, and y_i is the pixel value of the labeled segmentation sample, with a value in {0, 1}. Finally, ε is a stability constant that prevents the denominator from being 0.
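A minimal sketch of this objective in PyTorch follows, assuming a discriminator `d_net` that returns a list of per-layer feature maps (as in the discriminator sketch above); the default value of `lam` (λ) is an assumption.

```python
import torch

def multiscale_mae(feats_a, feats_b):
    """Mean absolute error between discriminator features of the real
    and predicted masked images, averaged over the M layers of D."""
    return sum(torch.mean(torch.abs(fa - fb))
               for fa, fb in zip(feats_a, feats_b)) / len(feats_a)

def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient over all S pixels; eps keeps the
    denominator away from zero."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def segdgan_loss(d_net, image, label, pred, lam=1.0):
    """Multi-scale MAE on D's hierarchical features of the pixel-wise
    masked images, plus lambda times the Dice term."""
    feats_real = d_net(image * label)  # x . y masking
    feats_fake = d_net(image * pred)   # x . G(x) masking
    return multiscale_mae(feats_real, feats_fake) + lam * dice_loss(pred, label)
```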

Segmentation Evaluation Metrics
We employed four widely used segmentation evaluation metrics to evaluate and compare the segmentation results of the various neural networks. The metrics Dice similarity coefficient (DSC) [27] and volumetric overlap error (VOE) [28] are both indicators of volume similarity. The other two metrics, average surface distance (ASD) [29] and Hausdorff distance (HD) [30], are distance measurements. DSC is designed to calculate similarities between pairs of contour regions and is defined as follows:

\[
\mathrm{DSC}(M, N) = \frac{2\,|M \cap N|}{|M| + |N|}
\]

where M and N denote the segmentation results and the ground truth masks, respectively. The VOE is used to calculate the ratio between the intersection and union of the two images, and is calculated as follows:

\[
\mathrm{VOE}(M, N) = 1 - \frac{|M \cap N|}{|M \cup N|}
\]

ASD is used to compute the average surface distance between the binary objects in two images. ASD is defined as follows:

\[
\mathrm{ASD}(M, N) = \frac{1}{|S(M)| + |S(N)|} \left( \sum_{m \in S(M)} d\big(m, S(N)\big) + \sum_{n \in S(N)} d\big(n, S(M)\big) \right)
\]

S(M) and S(N) denote the surface voxels of the segmentation results and the ground truth masks, and d(m, S(N)) is the Euclidean distance from a surface voxel m to the closest voxel on S(N). The HD is used to measure the distance between the two images, and HD(M, N) is calculated using the following equation:

\[
\mathrm{HD}(M, N) = \max\left\{ \max_{m \in S(M)} d\big(m, S(N)\big),\ \max_{n \in S(N)} d\big(n, S(M)\big) \right\}
\]

where d(m, S(N)) represents the Euclidean distance between m and S(N), and d(n, S(M)) represents the Euclidean distance between n and S(M).
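For reference, these four metrics can be computed from boolean masks with NumPy and SciPy, for example as follows; `spacing` carries the physical voxel size so that ASD and HD come out in millimeters.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dsc(m, n):
    """Dice similarity coefficient between two boolean masks."""
    inter = np.logical_and(m, n).sum()
    return 2.0 * inter / (m.sum() + n.sum())

def voe(m, n):
    """Volumetric overlap error: 1 - |intersection| / |union|."""
    inter = np.logical_and(m, n).sum()
    union = np.logical_or(m, n).sum()
    return 1.0 - inter / union

def _surface_distances(m, n, spacing):
    """Distances from every surface voxel of m to the surface of n."""
    surf_m = m ^ binary_erosion(m)
    surf_n = n ^ binary_erosion(n)
    # Euclidean distance from each voxel to the nearest surface voxel of n
    dist_to_n = distance_transform_edt(~surf_n, sampling=spacing)
    return dist_to_n[surf_m]

def asd(m, n, spacing=(1.0, 1.0, 1.0)):
    """Average symmetric surface distance."""
    d_mn = _surface_distances(m, n, spacing)
    d_nm = _surface_distances(n, m, spacing)
    return (d_mn.sum() + d_nm.sum()) / (len(d_mn) + len(d_nm))

def hd(m, n, spacing=(1.0, 1.0, 1.0)):
    """Hausdorff distance: the larger of the two directed maxima."""
    return max(_surface_distances(m, n, spacing).max(),
               _surface_distances(n, m, spacing).max())
```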

Dataset
The experimental data were collected during clinical prostate examinations at our hospital from July 2012 to June 2018. Data were collected for a total of 220 patients, including 121 healthy subjects and 99 prostate cancer patients, and were processed anonymously. All prostate MRI images were acquired on a 3.0-T MR imaging system (SIEMENS Verio 3.0 Tesla) running software version syngo MR B17. Transverse T2-weighted (T2-w) images were acquired using a turbo spin-echo sequence with the following parameters: repetition time, 2900-4030 ms; echo time, 96-106 ms; slice thickness, 3-4.0 mm; intersection gap, 3.6-4.8 mm; and matrix, 640×640. Prostate boundary masks were cross-labeled by two expert clinicians with more than five years of clinical experience, and the labeled images were used as the ground truth in the form of binary masks.
To verify the generalizability of the model, we used the public dataset PROMISE12, provided by the MICCAI 2012 challenge [31]. Because the images in this dataset were collected using different devices, resolutions, and scan protocols, the data offer greater diversity; none of these images were used in the training process. We considered 20 randomly selected MRI cases from this dataset suitable for verifying the generalizability of our model, making this test set the same size as the clinical test dataset.

Data preparation
To facilitate model training, we resampled the images to a fixed resolution of 0.3 mm × 0.3 mm × 2 mm. Image sizes were then fixed to 512×512, and 32 slices per patient were selected, excluding slices that did not contain the prostate. The final dataset contained 7040 selected images. We then split the dataset randomly into training, validation, and test datasets. Twenty patients with a total of 640 images were selected for testing, and the remaining data were used for validation and training. Hyperparameters were selected using the validation dataset, and the final model was trained with the training dataset.
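As an illustration, resampling to a fixed voxel spacing could be done with SimpleITK roughly as follows; the interpolator choice and the use of SimpleITK itself are assumptions, since the paper does not name its preprocessing tools.

```python
import SimpleITK as sitk

def resample_to_spacing(image, new_spacing=(0.3, 0.3, 2.0)):
    """Resample a volume to a fixed voxel spacing. Linear interpolation
    is assumed for images; use sitk.sitkNearestNeighbor for label masks."""
    old_spacing = image.GetSpacing()
    old_size = image.GetSize()
    # Keep the physical extent constant while changing the spacing
    new_size = [int(round(osz * ospc / nspc))
                for osz, ospc, nspc in zip(old_size, old_spacing, new_spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(),
                         sitk.sitkLinear, image.GetOrigin(),
                         new_spacing, image.GetDirection(), 0,
                         image.GetPixelID())
```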
Because insufficient training samples would lead to over-fitting of the model, we performed data augmentation to artificially increase the number of training samples. The augmentation transformations in our model included translation, rotation, distortion, and scaling.
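A minimal sketch of these augmentations using torchvision; the parameter ranges are assumptions, and distortion is approximated here as shear. Note that the same random transform must be applied to an image and its mask.

```python
import torchvision.transforms as T

# Rotation, translation, scaling, and (as an approximation of
# distortion) shear in one affine transform; ranges are assumptions.
augment = T.RandomAffine(degrees=10,
                         translate=(0.1, 0.1),
                         scale=(0.9, 1.1),
                         shear=5)
```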

Training
All model training was performed on a GTX 1080 Ti GPU with 64 GB RAM and CUDA 9.0. The model was implemented in the open-source deep learning framework PyTorch [32]. To expedite model training and save memory, input images were resized to 128×128. Because the dense block structure used in our model shares weights among sub-networks, it can suppress over-fitting [26]; dropout was therefore not used. In addition, we used the Adam optimizer to achieve better training performance. The initial learning rate was set to 0.003 and decayed every 40 epochs at a rate of 0.5 until it reached a floor of 1×10⁻⁷. Considering machine performance and training speed, we set the batch size to 8 for the experimental training dataset.
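Putting the pieces together, the training configuration described above might look roughly as follows in PyTorch, reusing the Generator, Discriminator, and segdgan_loss sketches from earlier sections; the epoch count and the D-then-G update order are assumptions.

```python
import torch

def train(train_loader, num_epochs=200):
    """Sketch of adversarial training: Adam with lr 0.003, halved
    every 40 epochs down to a 1e-7 floor; batch size 8 is assumed
    to be set on train_loader. num_epochs is an assumed value."""
    g_net, d_net = Generator(), Discriminator()
    opt_g = torch.optim.Adam(g_net.parameters(), lr=0.003)
    opt_d = torch.optim.Adam(d_net.parameters(), lr=0.003)
    sched_g = torch.optim.lr_scheduler.StepLR(opt_g, step_size=40, gamma=0.5)
    sched_d = torch.optim.lr_scheduler.StepLR(opt_d, step_size=40, gamma=0.5)

    for epoch in range(num_epochs):
        for image, label in train_loader:
            # D step: maximize the multi-scale feature distance
            pred = g_net(image).detach()
            loss_d = -segdgan_loss(d_net, image, label, pred)
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()
            # G step: minimize the same objective (MAE + lambda * Dice)
            loss_g = segdgan_loss(d_net, image, label, g_net(image))
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()
        sched_g.step()
        sched_d.step()
        # Clamp the learning rate at the 1e-7 floor described above
        for opt in (opt_g, opt_d):
            for group in opt.param_groups:
                group['lr'] = max(group['lr'], 1e-7)
```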

Results
To evaluate the performance of our SegDGAN model, we compared it to the common segmentation networks U-Net [13] and FCN [12]. We also compared SegDGAN with the reference model SegAN, the precursor of our model [24]. All neural network models were trained and tested on the same dataset and were compared according to the same metrics.

Qualitative Comparison
Segmentation was performed using the SegDGAN model and the other neural networks U-Net, FCN, and SegAN on the experimental data, as shown in Fig. 4. Segmentation results are shown for various prostate locations, including the center, the apex, and the base.
In Fig. 5, we present segmentation results generated using the public dataset PROMISE12.

Figure 4: Qualitative segmentation results for the prostate gland from our clinical dataset at the center (upper), apex (middle), and base (lower). The first column shows raw MRI images with the ground truth of the prostate gland. Columns 2-5 correspond to segmentation results from the U-Net, FCN, SegAN, and SegDGAN models, respectively. In the final column, five contours are superimposed on the raw MR images; the red contour denotes the ground truth, and pink, yellow, blue, and green contours denote the segmentation results from U-Net, FCN, SegAN, and SegDGAN, respectively.

Segmentation results of the SegDGAN model are shown in the fifth columns of Fig. 4 and Fig. 5. These figures indicate that our SegDGAN model outperforms the other methods at all locations. In particular, the second row and third column of Fig. 5 show no result because FCN segmentation failed entirely and no prostate gland was detected.

Quantitative Comparison
We evaluated our SegDGAN model and the U-Net, FCN, and SegAN models according to the metrics described in the Segmentation Evaluation Metrics section. All metrics for the four models were calculated using the experimental dataset. To compare and assess the generalizability of these methods, we also calculated the evaluation metrics using the PROMISE12 dataset, as shown in Table 1. These data show that SegDGAN achieved the highest DSC value, the lowest VOE value, and the lowest ASD value on the experimental dataset. Likewise, SegDGAN achieved the highest DSC value and the lowest VOE, ASD, and HD values on the public dataset PROMISE12.

Discussion
In this study, we developed an automatic segmentation algorithm for the prostate gland based on generative adversarial networks (GANs). To test our algorithm, we compared its segmentation results qualitatively and quantitatively with those of other models. Currently, the FCN [12] and U-Net [13] networks are the most widely used for medical image segmentation. Because our method builds on previously published models [24], we compared it with the established segmentation algorithms U-Net, FCN, and SegAN. Generalizability of neural networks remains a challenging problem, especially in medical imaging analyses [33]. To demonstrate the clinical usability of the four models, we tested them with our clinical dataset and with the public dataset PROMISE12. Because the PROMISE12 dataset contains images generated with differing scanning parameters and clinical conditions, it was omitted from model training, but its heterogeneity made it well suited for model testing.
The experimental results presented in Fig. 4 and Fig. 5 show that the proposed SegDGAN method effectively performs automatic segmentation of the prostate gland. Even for difficult prostate MRI images with blurred borders and heterogeneous pixel intensity distributions inside and outside the prostate, SegDGAN returned good segmentation results that were highly consistent with the ground truth. These results benefited from the improved model design and the optimized objective function. In particular, we introduced a densely connected block structure into the generator network to realize direct connections between each layer and all subsequent layers, thus contributing additional supervisory information, reducing redundant feature learning, mitigating the vanishing gradient problem, and preventing model over-fitting [26]. Multi-scale connections in the discriminator extract hierarchical features for the loss function; using these multi-layer features, the loss function can capture spatial relationships between pixels. We also added a Dice term to the objective function to improve training performance for SegDGAN. The qualitative results presented herein demonstrate that SegDGAN is more effective and robust for prostate gland segmentation than the other methods. Moreover, the segmentation results from SegDGAN were smoother and more continuous, and coincided more closely with the ground truth.
Based on the analyses of evaluation metrics shown in Table 1, SegDGAN applied to the experimental dataset achieved the highest DSC value of 91.66%, the lowest VOE value of 15.98%, and the lowest ASD value of 0.46 mm. DSC values indicate the degree of similarity between two contour regions; hence, the high DSC value for SegDGAN demonstrates greater similarity with the ground truth and greater segmentation accuracy than that achieved by the other methods. Similarly, the low VOE value for SegDGAN indicates lower volumetric overlap error between segmentation results and reference masks. Finally, the ASD value of SegDGAN was the lowest among the models, suggesting relatively low error in the prostate segmentation results. Taken together, these metrics demonstrate that the SegDGAN method significantly improves the performance of prostate segmentation compared with previously reported methods.

Primarily, this improvement reflects the use of the dense block structure, which increases network depth. Moreover, we used a GAN model with a generator and a discriminator. This design facilitates learning of the distribution characteristics of labeled images: extra supervised layers were added to the network through the discriminator, imposing strong constraints on the network and improving the accuracy and completeness of feature learning from images. Finally, the addition of MAE and Dice terms to the loss function brought the segmentation results closer to the masked label images. The metrics analysis accordingly indicated that the segmentation results of our method were closer to the ground truth and had relatively low segmentation errors, suggesting greater stability and higher repeatability. The quantitative analyses were also in good agreement with the visual observations. For images from the public dataset PROMISE12, the SegDGAN method produced the highest DSC value of 88.69%, the lowest VOE value of 23.47%, the lowest ASD value of 0.83 mm, and the lowest HD value of 11.40 mm. These data confirm that SegDGAN has improved robustness and repeatability compared with the other methods on both the experimental and public datasets.

In Fig. 6, we present the worst results of our segmentation algorithm. The image on the right of this figure shows a case with lesions: the lesion area is relatively bright and strongly influences the appearance of the surrounding tissue, which introduced error into our segmentation results. Our algorithm is also limited to two-dimensional information, and this can cause mis-segmentation, especially for tissues with structures and textures similar to those of the prostate gland, as shown in the left image of Fig. 6. In future studies, we will incorporate three-dimensional information and optimize the algorithm for lesions. Finally, our method suffers the disadvantages of longer training times and a higher number of rounds to convergence, again warranting further studies to optimize and improve future computational models.

Conclusion
In this paper, we present a novel GAN-based image segmentation network. In this SegDGAN model, we used a fully convolutional network comprising densely connected blocks to construct the generator network, and a multi-scale, multi-level convolutional network to construct the discriminator network. We then optimized the corresponding objective functions to improve segmentation performance. When applied to MRI prostate image segmentation, SegDGAN produced results close to the ground truth. Moreover, in comparisons with the established medical image segmentation networks U-Net, FCN, and SegAN, SegDGAN achieved higher prostate segmentation accuracy on our own clinical data and on the public dataset PROMISE12. In particular, analyses with the PROMISE12 dataset, which was not used during model training, indicated that SegDGAN generalizes better than the other models. These qualitative and quantitative comparisons show that the SegDGAN model is a more effective and robust algorithm for prostate segmentation, and warrant its consideration for automatic segmentation of the prostate gland. This study is highly relevant to intelligent MRI analyses and the diagnosis of prostate disease.

Author's contributions
PW and WW designed details of the research and carried out the majority of the experiments. GW, XW and LW participated in the design and coordination of the experimental work. JZ, LW, XW, XD and XC developed new algorithms. WW, GW, XD and XC contributed to data acquisition, interpretation and statistical analysis. WW and XC drafted the manuscript. JZ and PW made critical revisions for important intellectual content. PW was the project administrator and handled funding and supervision. All authors have reviewed and approved the submitted manuscript for publication.

Funding
This work was supported by the Science and Technology Commission of Shanghai Municipality (No. 17411952300).

Ethics approval and consent to participate
This study was approved by the Ethics Committee of Tongji Hospital of Tongji University. The requirement for informed patient consent was waived because this was a retrospective study without patient interaction.