A CBCT image noise reduction method based on cGAN

Background: As an imaging modality, cone beam CT (CBCT) is widely used in dentistry, where it helps dentists observe tissues such as tooth roots and jaws without overlap. CBCT has the advantages of convenience and low radiation dose; however, low contrast and heavy noise are serious problems in its images. These disadvantages make it difficult for doctors to accurately identify target tissues. Due to the differences in scanning methods and reconstruction algorithms between CBCT and multi-row detector spiral CT (MDCT), current CT noise reduction models have significant shortcomings when applied to CBCT images. Methods: In this paper, we propose an image noise reduction method based on a conditional generative adversarial network (cGAN) to improve the quality of CBCT images. Normal-dose MDCT images are used as the ground truth images to train the model to generate denoised images. Results: To increase the model's sensitivity to gradient information, a gradient loss function is included in our proposed method. Verification experiments on a simulated data set and a real data set show that our model effectively generates denoised images while preserving image quality. Conclusions: We compared the denoising effect of our model with that of other models using different loss functions. The PSNR, MS-SSIM and GMSD scores show that our model has better edge characteristics and a better denoising effect.


cGAN
The original GAN consists of a generator G and a discriminator D, both trained through an adversarial process. G receives a noise signal as input and generates samples. D receives both real samples and generated samples and tries to distinguish between them. G is trained to generate samples that are highly similar to the real ones, while D is trained to accurately identify whether a sample comes from the generated distribution or the real one. This training process is a min-max game between G and D.
However, the original GAN can suffer from problems such as mode collapse and vanishing gradients. To solve these problems, Arjovsky et al. proposed the Wasserstein GAN, which introduced a quantifiable training indicator and largely avoided mode collapse. The Wasserstein GAN is also easier to train than the original GAN.
Besides, cGAN was proposed to address the randomness of the input and output of GAN. cGAN constrains the mapping from input to output by adding a condition y to both the generator G and the discriminator D:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x}\left[\log D(x \mid y)\right] + \mathbb{E}_{z}\left[\log\left(1 - D(G(z \mid y))\right)\right] \quad (2)$$

In the noise reduction process, the condition y is the ground truth image. The noisy images are fed into the generator G, while both the noisy images and the ground truth images are fed into the discriminator D. Through adversarial training, the generator G produces images progressively closer to the ground truth images and eventually outputs the denoised images.
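As a concrete illustration, the following is a minimal PyTorch sketch of conditional adversarial losses in a denoising setting. The networks `G` and `D`, the channel-wise concatenation used for conditioning, and the binary cross-entropy formulation are assumptions for illustration only, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, noisy, clean):
    """D should score conditioned real pairs high and conditioned generated pairs low."""
    with torch.no_grad():
        fake = G(noisy)                                   # generated (denoised) image
    real_logits = D(torch.cat([noisy, clean], dim=1))     # condition on the noisy input
    fake_logits = D(torch.cat([noisy, fake], dim=1))
    loss_real = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
    loss_fake = F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    return loss_real + loss_fake

def generator_adversarial_loss(D, G, noisy):
    """Non-saturating form: G tries to make D classify its outputs as real."""
    fake_logits = D(torch.cat([noisy, G(noisy)], dim=1))
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
```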

Models
We propose an improved cGAN to solve the noise misidentification and incomplete noise reduction problems existing in current cGAN models for CBCT images.
In this paper, the generator G uses a U-Net-like structure similar to DeblurGAN [5]. Fig. 1 shows the network structure. Compared with applying convolutions directly to the original image, the Auto Encoder (AE) structure shortens the training time by performing the image transformation at a smaller size through downsampling and then restoring the original image size through deconvolution [11]. Long skip connections increase the number of feature channels, making it easier to propagate the texture information of the original image [12].

Figure 1. Generator G structure.

We use the residual blocks of the residual network (ResNet) [13] to perform the image transformation at the reduced size. Fig. 2 shows the residual block structure. The residual block helps the gradient propagate, prevents the gradient from vanishing, and avoids network degradation as the network deepens. The number of residual blocks in our model is 10. The discriminator D structure of our model is consistent with that of PatchGAN proposed by P. Isola et al. [4]. The generator's loss function is the sum of the content loss and the adversarial loss:

$$L_G = L_{content} + L_{adv}$$

The adversarial loss is the loss function of the original GAN. It aims at generating images that approximate the ground truth image with high precision.
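For concreteness, a minimal PyTorch sketch of a generator of the kind described above (a downsampling encoder, 10 residual blocks, an upsampling decoder, and a long skip connection from input to output) is given below. The channel counts, normalization layers, and kernel sizes are assumptions, since Figs. 1 and 2 are not reproduced here.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)   # identity shortcut keeps gradients flowing

class Generator(nn.Module):
    """Encoder -> 10 residual blocks -> decoder, with a long skip from input to output."""
    def __init__(self, in_ch=1, base=64, n_blocks=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, base, 7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),      # /2
            nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # /4
        )
        self.blocks = nn.Sequential(*[ResidualBlock(base * 4) for _ in range(n_blocks)])
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 4, base * 2, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base * 2, base, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base, in_ch, 7, padding=3),
        )

    def forward(self, x):
        y = self.decoder(self.blocks(self.encoder(x)))
        return x + y   # long skip connection: the network learns a residual correction
```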
The content loss has three parts: the perceptual loss, the L2 loss (mean square error, MSE), and the gradient loss.
Recent research has shown that the outputs of deeper network layers express more abstract features [14]. The perceptual loss focuses on restoring the general content of the generated image [15]. In our paper, we calculate the perceptual loss from the output of the third activation function in the third layer of VGG.
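A hedged sketch of the perceptual loss follows. Which VGG activation corresponds to "the third activation function in the third layer" is not fully specified here, so the relu3_3 activation of VGG19 (`features[:16]` in torchvision) is used as an assumption.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen VGG19 feature extractor up to relu3_3 (assumed layer choice).
vgg_features = vgg19(pretrained=True).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def perceptual_loss(generated, target):
    """MSE between VGG feature maps of the generated and ground-truth images.
    CT slices are single-channel, so they are repeated to three channels for VGG."""
    g = vgg_features(generated.repeat(1, 3, 1, 1))
    t = vgg_features(target.repeat(1, 3, 1, 1))
    return F.mse_loss(g, t)
```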
The L2 loss, a classical choice for the content loss, has obvious merits: it is smooth, its derivatives are easy to compute, and it has stable solutions. However, when it is the only optimization target, the generated images become blurred.
We propose the gradient loss, which makes the model more sensitive to image gradient and edge information. The gradient vector of an image function $f(x, y)$ at the point $(x, y)$ is defined as:

$$\nabla f(x, y) = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]^{T}$$

A digital image can be expressed as a two-dimensional discrete function, so we approximate the derivatives by differences in the horizontal and vertical directions. For the gradient of the digital image at the point $(i, j)$, we have:

$$G_x(i, j) = f(i+1, j) - f(i-1, j), \qquad G_y(i, j) = f(i, j+1) - f(i, j-1)$$

The difference between the gradient vectors measures how far the generated images are from the ground truth images. The gradient loss, defined as the mean of the squared lengths of these difference vectors, is:

$$L_{grad} = \frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H}\left[\left(G_x(i, j) - \hat{G}_x(i, j)\right)^2 + \left(G_y(i, j) - \hat{G}_y(i, j)\right)^2\right]$$

where $G_x$ and $G_y$ represent the gradients of the generated image, $\hat{G}_x$ and $\hat{G}_y$ represent the gradients of the ground truth image at point $(i, j)$, and $W$ and $H$ are the width and height of the image, respectively.
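The gradient loss above can be implemented directly with central differences; the following PyTorch sketch assumes images shaped (batch, channels, H, W) and ignores the one-pixel border where the central difference is undefined.

```python
import torch

def gradient_loss(generated, target):
    """Mean squared difference between the central-difference gradient fields."""
    def central_diff(img):
        gx = img[:, :, 2:, 1:-1] - img[:, :, :-2, 1:-1]   # vertical direction
        gy = img[:, :, 1:-1, 2:] - img[:, :, 1:-1, :-2]   # horizontal direction
        return gx, gy
    gx_g, gy_g = central_diff(generated)
    gx_t, gy_t = central_diff(target)
    return ((gx_g - gx_t) ** 2 + (gy_g - gy_t) ** 2).mean()
```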
In medical images, both edges and noise are high-frequency components, and edge detection and segmentation depend on the gradient information.
Noise greatly affects their accuracy [16]. The L2 loss only considers the per-pixel difference and cannot reflect the effect of noise on the gradient; the gradient loss solves this problem. Because the gradient loss is the mean squared length of the gradient difference vector, it is more sensitive to points with larger gradient errors, which have a greater effect on edge detection and segmentation, and less sensitive to points with small gradient errors. The drawback of the gradient loss is that it cannot perceive brightness changes of the whole image and is not sensitive to synchronous changes over continuous areas. We therefore combined the L2 loss and the gradient loss; the experiments reported in the discussion section show that the results are better than using either loss function alone.
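Combining the terms, the content loss could look like the sketch below, reusing the `perceptual_loss` and `gradient_loss` helpers sketched earlier; the relative weights are placeholders, not the values used in the paper.

```python
import torch.nn.functional as F

def content_loss(generated, target, w_perc=1.0, w_mse=1.0, w_grad=1.0):
    """Weighted sum of perceptual, L2 (MSE) and gradient losses."""
    return (w_perc * perceptual_loss(generated, target)
            + w_mse * F.mse_loss(generated, target)
            + w_grad * gradient_loss(generated, target))
```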

Results
In this section we discuss the performance of our model on noise reduction for CBCT data.
We compare the noise reduction results of our model against RED-CNN [8] and SAGAN [6] on simulated data with Poisson noise and on real data. We also discuss the hyperparameter selection and model adjustment.

Experiments environment
The experiment data included CBCT and head MDCT scans from 5 volunteers. The CBCT tube potential was 90 kV, the current was 10 mA, and the slice thickness was 0.3 mm; the MDCT tube potential was 120 kV, the current was 232 mA, and the slice thickness was 0.9 mm. Because the data came from different scanning equipment, the directions and positions of the scans deviated from each other, so we preprocessed the data first. We reconstructed the data in 3D using iso-surfaces, registered the 3D models with ICP, and resampled the data to obtain experiment data with the same direction and the same slices.
We used the CBCT image as the generator G input and the MDCT image as the ground truth image to reduce the noise in the CBCT image.
The experiment used 620 images as training data and 150 images as test data. The batch size was 1, the patch size was 256, and the number of training epochs was 300; the learning rate remained $10^{-4}$ for the first 150 epochs and decreased linearly to 0 over the last 150 epochs. The experiment system was Ubuntu 16.04, the CPU was a 20-core Intel Xeon E5-2698, and the model was trained on a Tesla V100.
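The learning-rate schedule described above (constant for 150 epochs, then linear decay to zero) can be expressed with a standard PyTorch scheduler; the Adam optimizer and the generator `G` from the earlier sketch are assumptions.

```python
import torch

optimizer = torch.optim.Adam(G.parameters(), lr=1e-4)

def lr_lambda(epoch):
    # epochs 0-149: factor 1.0; epochs 150-299: linear decay toward 0.0
    return 1.0 if epoch < 150 else max(0.0, (300 - epoch) / 150.0)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# scheduler.step() is called once per epoch after training
```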

Image quality evaluation method
We use the peak signal-to-noise ratio (PSNR), multi-scale structural similarity (MS-SSIM), and gradient magnitude similarity deviation (GMSD) to evaluate the quality of the generated images.
PSNR is a full-reference image quality evaluation standard. It is calculated as follows:

$$PSNR = 10 \log_{10}\left(\frac{(2^{n}-1)^{2}}{MSE}\right)$$

where MSE represents the mean square error between the current image $X$ and the reference image $Y$:

$$MSE = \frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H}\left(X(i, j) - Y(i, j)\right)^{2}$$

Here $W$ represents the width and $H$ the height of the image, and $n$ represents the number of bits per pixel. The unit of PSNR is dB; the larger the value, the smaller the distortion.
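A small NumPy sketch of the PSNR computation, assuming n-bit integer images:

```python
import numpy as np

def psnr(x, y, n_bits=8):
    """PSNR in dB between image x and reference y for n-bit images."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    max_val = 2 ** n_bits - 1
    return 10.0 * np.log10(max_val ** 2 / mse)
```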

MS-SSIM
MS-SSIM [17] is an improved structural similarity index proposed by Wang et al., which is closer to subjective quality evaluation results than SSIM. SSIM is calculated as:

$$SSIM(X, Y) = l(X, Y) \cdot c(X, Y) \cdot s(X, Y)$$

MS-SSIM adds scale information when calculating SSIM: it downscales the original image to several sizes and calculates the contrast and structure factors at each size:

$$MS\text{-}SSIM(X, Y) = \left[l_{M}(X, Y)\right]^{\alpha_{M}} \prod_{j=1}^{M} \left[c_{j}(X, Y)\right]^{\beta_{j}} \left[s_{j}(X, Y)\right]^{\gamma_{j}} \quad (13)$$

where $X$ is the current image, $Y$ is the reference image, $l(X, Y)$ is the brightness (luminance) contrast factor, $c(X, Y)$ is the contrast factor, $s(X, Y)$ is the structure factor, and $M$ represents the scale factor of the image reduction. $l$, $c$ and $s$ are calculated as follows:

$$l(X, Y) = \frac{2\mu_{X}\mu_{Y} + C_{1}}{\mu_{X}^{2} + \mu_{Y}^{2} + C_{1}} \quad (14)$$

$$c(X, Y) = \frac{2\sigma_{X}\sigma_{Y} + C_{2}}{\sigma_{X}^{2} + \sigma_{Y}^{2} + C_{2}} \quad (15)$$

$$s(X, Y) = \frac{\sigma_{XY} + C_{3}}{\sigma_{X}\sigma_{Y} + C_{3}} \quad (16)$$

GMSD

GMSD [18] is calculated as follows. GMS is the gradient magnitude similarity:

$$GMS(i, j) = \frac{2\, m_{X}(i, j)\, m_{Y}(i, j) + c}{m_{X}^{2}(i, j) + m_{Y}^{2}(i, j) + c}$$

and GMSM is the mean of the GMS map over the image:

$$GMSM = \frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H} GMS(i, j), \qquad GMSD = \sqrt{\frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H}\left(GMS(i, j) - GMSM\right)^{2}}$$

Here $m_{X}$ represents the gradient magnitude of the current image and $m_{Y}$ represents the gradient magnitude of the ground truth image:

$$m_{X}(i, j) = \sqrt{(X \otimes h_{x})^{2}(i, j) + (X \otimes h_{y})^{2}(i, j)}$$

where $h_{x}$ and $h_{y}$ are the Prewitt operators in the horizontal and vertical directions. The smaller the GMSD value, the closer the current image is to the ground truth image.
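A NumPy/SciPy sketch of GMSD follows; the 1/3-scaled Prewitt kernels and the stabilizing constant c = 170 (the value used for 8-bit images in the original GMSD paper) are assumptions here.

```python
import numpy as np
from scipy.ndimage import convolve

def gmsd(x, y, c=170.0):
    """Gradient Magnitude Similarity Deviation between image x and reference y."""
    hx = np.array([[1, 0, -1]] * 3, dtype=np.float64) / 3.0   # Prewitt, horizontal
    hy = hx.T                                                  # Prewitt, vertical
    def grad_mag(img):
        img = img.astype(np.float64)
        return np.sqrt(convolve(img, hx) ** 2 + convolve(img, hy) ** 2)
    mx, my = grad_mag(x), grad_mag(y)
    gms = (2.0 * mx * my + c) / (mx ** 2 + my ** 2 + c)
    return float(np.std(gms))   # standard deviation of GMS around its mean (GMSM)
```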

Simulation dataset
We added Poisson noise to the MDCT images to simulate LDCT data. The number of images, the image size, and the network parameters remained the same as for the real data set. Table 2 shows the objective evaluation scores of each model on the test data. The denoising effect of RED-CNN on the simulated data set was slightly better than that of our model. SAGAN had a lower overall score because it misidentified the low-density areas as noise, although the main tissues such as teeth remained clear. Figure 3 shows the results of each model.
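A hedged NumPy sketch of the low-dose simulation by Poisson noise; the photon-count scale and the normalization are assumptions, since the paper does not state the simulation parameters.

```python
import numpy as np

def add_poisson_noise(img, photons=1e4):
    """Simulate a noisier acquisition by Poisson-resampling the (non-negative) intensities."""
    orig_dtype = img.dtype
    x = img.astype(np.float64)
    scale = x.max() if x.max() > 0 else 1.0
    normalized = np.clip(x / scale, 0.0, 1.0)        # assume intensities are non-negative
    noisy = np.random.poisson(normalized * photons) / photons
    return (noisy * scale).astype(orig_dtype)
```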

Real dataset
The real data set used the CBCT images as the LDCT images and their corresponding MDCT images as the ground truth images. The data were divided into three groups, and we discuss the denoising results of each model on the crown, the root and the jaw bone. Figure 4 shows the results on the test set. The images produced by RED-CNN were blurred. The images produced by the SAGAN model were clearer, but the model was not sensitive enough to the noise: some noise was not removed, and part of the low-density bone was misidentified as noise. We use PSNR, MS-SSIM and GMSD as evaluation indicators. Table 3 shows the quantitative results; the scores of our model are the best. We also constructed error images as the pixel-wise average of the absolute differences between all generated images and the ground truth images in the test set.
From the error images in Figure 5, it can be seen directly that the error of our model is significantly smaller than that of the comparison models.
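The error images can be computed as sketched below: the pixel-wise mean of the absolute differences between the generated and ground-truth images over the test set (array shapes are assumptions).

```python
import numpy as np

def mean_error_image(generated_stack, truth_stack):
    """Both inputs are arrays of shape (N, H, W) holding the N test images."""
    diff = np.abs(generated_stack.astype(np.float64) - truth_stack.astype(np.float64))
    return diff.mean(axis=0)   # per-pixel average absolute error over the test set
```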

Hyperparameter selection and model adjustment
In this section we demonstrate the effectiveness of the long skip connection through contrast experiments. We also discuss the hyperparameter selection and the effect of preprocessing.

Long skip connection
We tested the quality of the images generated by the model without skip connections and by the model with one skip connection. Table 4 shows the scores, which indicate that the quality of the images generated by the model with the long skip connection is improved.

The number of Residual Blocks
The number of residual blocks determines the depth of the model. We compared models of different depths with 6, 8, 10, 12, and 14 residual blocks. Figure 6 shows the results. It can be observed that, within a certain range, the quality of the generated image improves as the depth of the model increases; but as the depth continues to increase, the quality of the generated image decreases instead. At the same time, the training time increases linearly with the model depth. Figure 7(c) shows a strong-noise area; in this area, the images denoised by our model were better than those of the other models.
In summary, our model identifies noise more accurately: unlike the other models, it does not misidentify noise as tissue, while its limitation is incomplete denoising of radial noise. The experiments also show that although RED-CNN performed well on objective evaluation indicators such as PSNR, its visual effect was poor, with more blurring than the other models. We applied Gaussian filtering and mean filtering with a window size of 3 to the MDCT images with Poisson noise and calculated the image quality of the results. Table 5 shows the experiment results: for Poisson noise, the above filtering improved the objective evaluation scores of the images to some extent. Therefore, when training the model it is necessary to avoid blurring the images while denoising.

Effectiveness of gradient loss
We discuss the effect of the generator G loss function on the generated image quality.
We used different loss functions and their combinations as the generator loss. Figure 8 shows the comparison results. Comparing the results of (2) and (3), we can see that compared with "perceptual loss combined with adversarial loss", the model using "gradient loss combined with MSE" did better on PSNR but worse on GMSD and MS-SSIM. We then combined the perceptual loss with the gradient loss and MSE, and combined the adversarial loss with the gradient loss and MSE; the results are shown in Figure 8(4) and (5), respectively. The former improved the quality of the generated image, while for the model with the adversarial loss the generated image quality was poor and fake details appeared. We also compared the "perceptual loss combined with adversarial loss and MSE" model with the "perceptual loss combined with adversarial and gradient loss" model; Figure 8(6) and (7) show the results. It can be observed that our model, which uses the combination of the four loss functions as the final loss, obtained the best results.
We compared the effect of the model without gradient loss and the model with gradient loss on the edge information of the generated images. Figure 8(1) and (6) show the quantitative results. On the test data, the model with gradient loss had better edge characteristics, as shown in Figure 9.
In Figure 10, it can be observed that the median of ΔPSNR is above 0 and the first quartile is slightly below 0, while both the median and the lower quartile of ΔMS-SSIM are above 0. It can be concluded that on the test images the gradient loss improved the quality of the generated images, although for some images the model without gradient loss denoised better. As with all deep learning based methods, the model needs to be trained for specific dose levels, window widths and window levels. Our model only performed noise reduction for high-density bones and teeth in oral CBCT data; the denoising performance on other parts and other tissues remains to be evaluated.