A Generative Adversarial Network technique for high-quality super-resolution reconstruction of cardiac magnetic resonance images

Objective: In this paper, we propose a Denoising Super-resolution Generative Adversarial Network (DnSRGAN) method for high-quality super-resolution reconstruction of noisy cardiac magnetic resonance (CMR) images. Methods: The method is based on the feed-forward denoising convolutional neural network (DnCNN) and the SRGAN architecture. First, we use a feed-forward denoising neural network to pre-denoise the CMR image so that the input is a clean image. Second, we use the gradient penalty (GP) method to address the vanishing gradient of the discriminator, which improves the convergence speed of the model. Finally, a new loss function is added to the original SRGAN loss function to monitor GAN gradient descent and achieve more stable and efficient training, thereby providing higher perceptual quality for the super-resolution of CMR images. Results: We divided the test cardiac images into 3 groups of 25 images each and calculated the Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM) between the Ground Truth (GT) and the super-resolved images to evaluate our model. Compared with the currently widely used methods (Bicubic, ESRGAN and SRGAN), our method has better reconstruction quality and higher PSNR/SSIM scores. Conclusion: By using DnCNN to denoise the CMR image and then using the improved SRGAN to perform super-resolution reconstruction of the denoised image, we can solve the problem of high noise and artifacts causing the cardiac image to be reconstructed incorrectly during super-resolution. Our method is therefore capable of high-quality reconstruction of noisy cardiac images.


Background
Cardiac magnetic resonance (CMR) imaging plays an important role in the diagnosis of heart disease, as it can be used to assess the structure and function of the heart. False triggering of ECG gating, patient arrhythmia and incomplete breath holds may cause artifacts or other noise during the acquisition of CMR images [1], which greatly affects cardiovascular image diagnosis [39]. Noise is best suppressed during the acquisition process itself, but this places very demanding requirements on the hardware and is relatively expensive. Applying deep learning to the acquired image costs much less. With the development of deep learning, image processing [36] methods such as denoising, super-resolution reconstruction and intelligent recognition of the collected images have become a focus of research attention.
Image denoising is a classic topic in computer vision and an indispensable part of practical image processing. In the past, prior-based methods such as non-local self-similarity (NSS) [2], sparse representation [3] and Markov random fields (MRF) [4] were applied to image denoising. However, prior-based methods are difficult to optimize and time-consuming. Chen et al. proposed a trainable nonlinear reaction diffusion (TNRD) model [5], which is expressed as a feed-forward deep network by unfolding a fixed number of gradient descent inference steps. With the development of machine learning and deep learning, the multi-layer perceptron (MLP) [6] was successfully used in image denoising. Yang et al. proposed a CT image denoising method based on a Generative Adversarial Network (GAN) with Wasserstein distance and perceptual similarity [7]. The perceptual similarity loss compares the perceptual features of the denoised output with those of real images in an established feature space, and uses the perceptual features of real images to suppress noise.
The main disadvantage of the above methods is that each trains a specific model for a specific noise level, which limits them when denoising images with non-specific noise. Zhang et al. proposed treating image denoising as a plain discriminative learning problem [8], that is, separating the noise from a noisy image with a feed-forward denoising convolutional neural network (DnCNN) that integrates batch normalization and residual learning to accelerate training and improve denoising performance. In this paper, we use DnCNN to denoise noisy cardiac images: we train a single DnCNN model for blind Gaussian denoising, which gives a better denoising model than one trained for a specific noise level. This improves the authenticity and quality of the cardiac images.
In practical medical image diagnostic processing [40], low-resolution (LR) images of poor quality contain too few texture details, which is detrimental to the accurate diagnosis of heart disorders. It is therefore necessary to convert an LR cardiac image into a high-quality high-resolution (HR) image. Harris and Goodman introduced super-resolution [9,10] and held that there is a mapping relationship between LR images and HR images. If a deep learning model can learn this mapping from a large number of images, then true HR images can be reconstructed from LR images. Dong et al. used a deep learning model to solve the super-resolution problem [11], using a three-layer convolutional neural network (CNN) [12] to learn the mapping between LR and HR images, with the Mean Squared Error (MSE) as the loss function to obtain high-quality images.
However, when MSE is used as the loss function and the input resolution of the image is large, the high-frequency texture details of the image are lost. Ledig et al. used a GAN for image super-resolution [13], learning the mapping between LR and HR images through the confrontation of a generator and a discriminator, and adopted a new perceptual loss function to enhance the texture details of the image. However, the training of the original GAN is unstable and easily introduces non-existent features into the generated image, which remains an open and challenging problem for GANs. Arjovsky et al. proposed WGAN to solve the instability of the original GAN training [14]. An approximately optimal discriminator is used to optimize the generator and reduce the Wasserstein distance, so that the distribution of generated images approaches the distribution of real images. Gulrajani et al. found that WGAN had shortcomings: it could only generate low-quality samples and the model was difficult to converge [15]. They therefore improved WGAN by applying a gradient penalty with a high learning rate to each sample to increase the convergence speed, and by using the Adam optimizer to improve performance.
Our main contributions are as follows. First, input images with a low PSNR [33] are not suitable for a GAN and greatly reduce its performance. Most CMR images are noisy, and their PSNR is relatively low. We therefore first apply DnCNN denoising to the CMR images to obtain high-PSNR cardiac images, which are used as SRGAN [32] training images to enlarge the CMR training set and enhance the generalization ability of the model. Second, SRGAN suffers from vanishing gradients, which makes the training unstable and introduces non-existent features into the generated image. We therefore minimize the Wasserstein distance and use the GP method to bring the distribution of the generated images close to the distribution of the real images. The Lipschitz constraint requires that the gradient of the discriminator does not exceed K, so we first compute the gradient of the discriminator and then build an L2-norm between this gradient and K as a penalty loss function, which avoids vanishing and exploding gradients during training and at the same time improves the convergence speed of the model. Third, the network we propose is based on the GAN architecture and adds a Denoiser module so that the network can perform super-resolution processing on low-resolution and noisy CMR images; we also add the WGAN loss function to the original SRGAN loss function $L^{SR}$ to increase the accuracy of image reconstruction. The experimental results show that our method achieves higher PSNR/SSIM than Bicubic [30], ESRGAN [31], SRGAN and other methods, and that the reconstruction quality is significantly improved.

Results
Our method is implemented in Python 3.6 with TensorFlow 1.6.0 and PyTorch 0.4.1. We applied the Adam optimizer [29] to train our GAN network, with the parameter β = 0.9 and a batch size of 16. The learning rate of the residual network is $10^{-4}$, and the learning rate decay is 0.1. We use Wasserstein-GP to supervise gradient descent [38] and avoid vanishing and exploding gradients. We reduced the number of training epochs from the 10,000 used in the past to 1500, because we found that the network began to converge after 1500 epochs; discarding the unnecessary epochs reduced the amount of network computation and accelerated network reconstruction.
Our data set consists of cardiac images of 64 patients acquired with a Siemens Sonata 1.5 T Syngo MR 2004A scanner with Numaris-4, serial number 21609. A total of 675 MR images were collected, of which 75 were used as test images. The test images were divided into 3 groups of 25 images, and the PSNR/SSIM was averaged for each group to verify our super-resolution reconstruction model. The remaining 600 images were used as the training set, and the data were augmented by cropping, translation and rotation, because a larger amount of data can improve the reconstruction accuracy of the model. The original SRGAN uses MSE and VGG losses as the generation loss function, which makes the generation of super-resolution images possible, but the training time and reconstruction accuracy can still be improved. We combine the WGAN loss with these two as the generation loss function of DnSRGAN. A comparison of the experimental training curves (see Fig. 1) shows that the PSNR value is effectively improved after adding the WGAN loss function, and that the improved method converges in fewer training rounds.
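The augmentation step (cropping, translation and rotation) can be illustrated with a small dependency-free sketch. The function names, the crop window, the one-pixel shift and the 90-degree rotation are illustrative assumptions; the paper does not state the augmentation parameters that were actually used.

```python
# Illustrative augmentation sketch on a 2-D image stored as a list of rows.
# All function names and parameters are hypothetical; the paper does not
# specify crop sizes, shift amounts, or rotation angles.

def crop(img, top, left, height, width):
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def translate(img, dx, fill=0):
    """Shift the image dx pixels right (left if dx < 0), padding with fill.

    Assumes abs(dx) is smaller than the image width.
    """
    width = len(img[0])
    out = []
    for row in img:
        if dx >= 0:
            shifted = [fill] * dx + row[:width - dx]
        else:
            shifted = row[-dx:] + [fill] * (-dx)
        out.append(shifted)
    return out


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original image plus one cropped, one shifted and one rotated copy."""
    h, w = len(img), len(img[0])
    return [
        img,
        crop(img, 0, 0, h - 1, w - 1),
        translate(img, 1),
        rotate90(img),
    ]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
augmented = augment(image)  # four training samples derived from one image
```

Because every copy is derived from the same frame, this kind of augmentation increases the sample count but not the anatomical variety of the training set.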
We divided the test images into 3 groups (labelled a, b and c), each consisting of 25 time-series frames, to verify our DnSRGAN model. As shown in Fig. 2, we select representative images from each group to give a visual comparison of our method with super-resolution methods such as bicubic, SRGAN and ESRGAN. It can be clearly seen that the MR image generated by our method is closer to the GT image than those of the other methods. In computer vision, numerical evaluation criteria such as PSNR and SSIM are commonly used to verify the quality of a reconstructed image. The values in Table 1 show that the Avg-PSNR and Avg-SSIM of our DnSRGAN model are clearly higher than those of the other three methods. The histograms in Fig. 3 and Fig. 4 show the advantages of our method more clearly; V_Avg-PSNR and V_Avg-SSIM are the values of Avg-PSNR and Avg-SSIM, respectively. These diagrams show that our network can better reconstruct cardiac images with complex structures.
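As a reminder of how the PSNR score and its per-group average are obtained, the following minimal sketch computes PSNR = 10 log10(MAX^2 / MSE) in plain Python. The function names, the 8-bit peak value of 255 and the tiny example images are assumptions for illustration; they are not the evaluation code or data of this paper.

```python
import math


def mse(reference, reconstructed):
    """Mean squared error between two equally sized 2-D images (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for r, s in zip(ref_row, rec_row):
            total += (r - s) ** 2
            count += 1
    return total / count


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit images."""
    err = mse(reference, reconstructed)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def average_psnr(pairs, max_value=255.0):
    """Average PSNR over (ground truth, reconstruction) pairs of one test group."""
    scores = [psnr(gt, sr, max_value) for gt, sr in pairs]
    return sum(scores) / len(scores)


# Hypothetical 2x2 example images, not data from the paper.
gt = [[52, 60], [61, 59]]
sr = [[54, 60], [61, 57]]
group = [(gt, sr), (gt, [[50, 60], [61, 59]])]
score = psnr(gt, sr)
group_avg = average_psnr(group)
```

A higher PSNR means a smaller pixel-wise error relative to the GT image; SSIM complements it by comparing local structure rather than raw pixel differences.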

Discussion
In this work, following the WGAN and WGAN-GP methods, we add the WGAN loss function to the original SRGAN loss function to monitor the GAN gradient descent, minimize the Wasserstein distance so that the distribution of the generated images approaches the distribution of the real images, and use the GP to optimize the discriminator. This achieves more stable and efficient model training and thereby provides higher perceptual quality for the super-resolution of CMR images. We also pre-denoise the CMR images and add the resulting high-PSNR images to the training of the network, which expands the training set and enhances the generalization ability and credibility of the model.
The traditional bicubic interpolation method applies averaging interpolation to the enlarged image, which smooths away contour details. Generative adversarial networks solve this problem of traditional methods and sharpen the boundary texture, but they bring new problems. The deepening of the network and the complexity of the texture of the training images make the training unstable and cause the gradient to vanish, so that the network produces unreal texture details. Our method addresses both of these problems. Table 1 shows that our DnSRGAN is significantly better than the bicubic interpolation method and the other SRGAN methods in PSNR/SSIM, and is superior in reconstructing a clear image.
Our dataset consists of many groups of cine MR images, each showing the same object in different time frames. Consequently, even though data augmentation techniques such as rotation, translation and cropping expand the data set, they cannot solve the problem of insufficient data and insufficient variety. In future research, we will work on the following four aspects. First, we will try a more accurate loss function that restores the structural information lost after image enlargement without reconstructing unreal texture features. Second, we will consider using a more complex and modular deep learning method to pre-denoise the CMR image, because the higher the quality of the input image, the better the super-resolution reconstruction. Third, we will use deep learning to batch-generate CMR images that are completely different from the existing data sets. Finally, we will pursue 3D super-resolution and consider adding the spatial features of the image sequence to the middle of the network.

Conclusion
In this paper, we developed a novel DnSRGAN model based on DnCNN and an advanced SRGAN architecture, introducing a Denoiser module to pre-denoise the CMR image. The network can perform super-resolution processing not only on the original low-resolution and noisy CMR images, but also on the original high-resolution CMR images. With the proposed technique, we can solve the problem that the artifacts and noise of CMR images lower the PSNR and thereby degrade the performance of the GAN. Experimental results show that our method achieves higher PSNR/SSIM than the Bicubic, ESRGAN and SRGAN methods, which means that it has higher image reconstruction quality and can better guide medical experts in targeted cardiac diagnosis.

Denoising Convolutional Neural Network
The goal of image denoising [34] is to remove the corrupting information superimposed on the original clean image and to reconstruct the latent clean image. Compared with noisy images, latent clean images contain more useful information. Zhang et al. proposed using a deeper DnCNN model to achieve denoising [7]. To counter the gradient dispersion caused by the deepening of the network, DnCNN does not directly learn the clean image; instead, it predicts the noise and uses the L2-norm [37] between the network output and the noise as the loss function to train the network. DnCNN can therefore be regarded as a residual learning process. The network combines batch normalization (BN) layers with residual learning to improve the performance of the model and to achieve blind image denoising that is not tied to a specific noise level.
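The residual-learning objective can be written out explicitly. The form below follows the standard DnCNN formulation rather than an equation given in this paper; the symbols $y_i$ (noisy image), $x_i$ (clean image), $\mathcal{R}(\cdot;\Theta)$ (the residual mapping with parameters $\Theta$), $N$ (number of training pairs) and $x_i^{\mathrm{den}}$ (denoised output) are notation assumed here for illustration.

```latex
\ell(\Theta) = \frac{1}{2N}\sum_{i=1}^{N}
  \left\lVert \mathcal{R}(y_i;\Theta) - (y_i - x_i) \right\rVert_2^2,
\qquad
x_i^{\mathrm{den}} = y_i - \mathcal{R}(y_i;\Theta)
```

The network output is compared with the noise $y_i - x_i$, and the denoised image is recovered by subtracting the predicted noise from the noisy input.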

Generative Adversarial Network
Goodfellow et al. first proposed generative adversarial networks [16], in which a generator network and a discriminator network compete with each other and are trained alternately so that the generated images improve. The generator network G is responsible for generating SR images close to the Ground Truth (GT) image, and the discriminator network D is responsible for distinguishing the images generated by the generator network from the GT images. Ideally, the network model is optimal when the discriminator network classifies the generated image as a GT image. However, the distribution of the images generated by the original GAN generator is random, which easily makes the gradient [17] of the generator vanish. To solve this problem, WGAN was proposed to minimize the Wasserstein distance [18] so that the distribution of the generated images becomes arbitrarily close to the distribution of the GT images. In experimental training, however, WGAN converges slowly. WGAN with Gradient Penalty (WGAN-GP) [19] therefore accelerates the convergence of WGAN by adding a GP to the discriminator and using the Adam optimizer to optimize the generator.
We cast the GAN problem as a min-max problem, as shown in Eq. 1, where G is the generator network, D is the discriminator network, $CMR^{HR}$ is the original high-resolution CMR image or the high-resolution image after denoising, $\hat{x}$ denotes a sample drawn from all CMR images, and $\lambda$ is a constant.
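For orientation, one min-max objective that is consistent with the symbols defined for Eq. 1 is the standard WGAN-GP formulation, sketched here under stated assumptions and not as the exact equation of this paper. $CMR^{HR}$ denotes the high-resolution image, $CMR^{LR}$ (a symbol assumed here) the low-resolution input, $\hat{x}$ the sampled image on which the gradient is evaluated, $\lambda$ the constant penalty weight, and $K$ the Lipschitz bound on the discriminator gradient ($K = 1$ in the original WGAN-GP formulation).

```latex
\min_{G}\max_{D}\;
\mathbb{E}_{CMR^{HR}}\!\left[ D\!\left(CMR^{HR}\right) \right]
- \mathbb{E}_{CMR^{LR}}\!\left[ D\!\left(G\!\left(CMR^{LR}\right)\right) \right]
- \lambda\,\mathbb{E}_{\hat{x}}\!\left[ \left( \left\lVert \nabla_{\hat{x}} D(\hat{x}) \right\rVert_2 - K \right)^{2} \right]
```

The first two terms estimate the Wasserstein distance between real and generated images, and the last term penalizes discriminator gradients whose norm deviates from $K$.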

Proposed DnSRGAN architecture
The network structure we use is based on the SRGAN framework, extended with a Denoiser that effectively removes noise and artifacts [20] from CMR images. The high-PSNR images obtained after denoising can be added to the GT images as training images. This enlarges the data set and helps avoid overfitting [35] of the model. At the same time, it enables our generator to perform super-resolution reconstruction of noisy CMR images. The entire network structure is shown in Fig. 5. We take the noisy low-resolution CMR image as input and enlarge it by 4x. The generator uses the enlarged image to generate an SR image. The discriminator judges whether the generated image is an original high-resolution CMR image or a high-resolution CMR image denoised by the Denoiser.
Our added Denoiser (Fig. 6a) separates the noise from noisy images based on the DnCNN network, adopts residual learning [21], and adds batch normalization to enhance denoising performance and speed up the training process. After the CMR image is denoised, the boundaries and texture details appear sharper. We convolve the noisy CMR image with 64 3×3 convolution kernels and then pass the feature maps through the ReLU function. We then use 8 residual blocks (ResBlock, Fig. 6b) to train the network. The difference from the previous residual block is that we add a BN layer and adopt a skip connection [23] between each ResBlock [22], which shortens the training time of the network and improves its denoising performance.
The generator of the network learns the residual between the original HR image and the 4x-enlarged LR image, updating its parameters by gradient descent on the basis of the low-resolution CMR image enlarged by 4x. The task of the discriminator is to determine whether the image produced by the generator is a GT image, and to feed the result back to the generator through its parameters. In Fig. 7a, the left side is the LR image magnified 4x and the right side is the generated SR image. The generator first applies 64 9×9 convolution kernels, then uses 8 ResBlocks to learn the residual mapping between LR and HR, and finally applies 256 3×3 convolution kernels to obtain the final generated SR image. In Fig. 7b, the generated SR image and the GT image are taken as input and passed through 8 convolutional layers, each followed by batch normalization (BN) [24] and LeakyReLU [25] activation; after two dense layers [26], the resulting probability is finally fed back to the generator through the Sigmoid function. Ideally, when the discriminator classifies all SR images as GT images, the model can reconstruct true HR cardiac images.
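To give a feel for the size of the convolutional layers described above, the following sketch counts the learnable parameters of a 2-D convolution. The helper names are hypothetical, and several values are assumptions that the text does not state: a single-channel (grayscale) input, 64 input channels for the final 256-kernel layer, and a bias term in every layer.

```python
# Parameter-count sketch for the convolution layers named in the text.
# Assumptions (not stated in the paper): grayscale input (1 channel),
# 64 input channels for the last layer, and a bias term per output channel.

def conv_params(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable parameters of a square 2-D convolution layer."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def upscaled_size(height, width, scale=4):
    """Spatial size of an image after enlargement by an integer factor."""
    return height * scale, width * scale


generator_first = conv_params(in_channels=1, out_channels=64, kernel_size=9)
generator_last = conv_params(in_channels=64, out_channels=256, kernel_size=3)
denoiser_first = conv_params(in_channels=1, out_channels=64, kernel_size=3)

print(generator_first)        # 64 kernels of size 9x9 on one channel, plus 64 biases
print(generator_last)         # 256 kernels of size 3x3 on 64 channels, plus 256 biases
print(denoiser_first)         # 64 kernels of size 3x3 on one channel, plus 64 biases
print(upscaled_size(64, 64))  # a hypothetical 64x64 LR image enlarged by 4x
```

Under these assumptions the 9×9 input layer is small compared with the later 3×3 layers, whose cost is dominated by the number of input and output channels.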

Perceptual loss function
GAN model training is unstable and prone to vanishing and exploding gradients. We minimize the Wasserstein distance to bring the distribution of the generated images close to the distribution of the real images, and use the gradient penalty method. The Lipschitz constraint limits the gradient of the discriminator to no more than K; we therefore first compute the gradient of the discriminator and then build an L2-norm between this gradient and K as a penalty loss function, which avoids vanishing and exploding gradients during training and at the same time improves the convergence speed of the model. The Wasserstein distance is shown in Eq. 2, where L represents the number of layers of the network, $D_i$ represents the judgment (discriminator) network of the i-th layer, the two generator terms denote the real generation network and the estimated generation network, respectively, x represents the image sample, and K is a constant that depends only on the size of the pixel space and is independent of the weights of each network layer. To give our generated CMR images better high-frequency details, our loss function $L^{SR}$ uses the combination of the MSE [27] loss and the VGG [28] loss as the content loss of the generator, while the adversarial loss uses the WGAN loss function to address vanishing gradients during training. The loss function is shown in Eq. 3, where $\phi_{i,j}$ represents the feature map obtained by the j-th convolution before the i-th pooling layer.
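The interplay of the Wasserstein terms and the gradient penalty can be illustrated with a toy example. The sketch below uses a linear critic D(x) = w·x, for which the gradient with respect to x is simply w, so no automatic differentiation is needed. Everything here is an assumption for illustration: the function names, the linear critic, sampling the penalty point on the line between a real and a generated sample (as in the original WGAN-GP method), the weight lam = 10 and the bound k = 1. It is not the training code of this paper.

```python
import math
import random


def critic(x, w, b=0.0):
    """Toy linear critic D(x) = w . x + b (stand-in for the discriminator)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def critic_gradient(x, w):
    """Gradient of the linear critic with respect to x; it equals w for every x."""
    return list(w)


def interpolate(real, fake, eps):
    """Point on the straight line between a real and a generated sample."""
    return [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]


def gradient_penalty(real_batch, fake_batch, w, k=1.0, rng=random):
    """Mean of (||grad D(x_hat)||_2 - k)^2 over sampled points x_hat."""
    total = 0.0
    for real, fake in zip(real_batch, fake_batch):
        x_hat = interpolate(real, fake, rng.random())
        grad = critic_gradient(x_hat, w)
        norm = math.sqrt(sum(g * g for g in grad))
        total += (norm - k) ** 2
    return total / len(real_batch)


def critic_loss(real_batch, fake_batch, w, lam=10.0, k=1.0):
    """Wasserstein critic loss E[D(fake)] - E[D(real)] plus the weighted penalty."""
    mean_real = sum(critic(x, w) for x in real_batch) / len(real_batch)
    mean_fake = sum(critic(x, w) for x in fake_batch) / len(fake_batch)
    return mean_fake - mean_real + lam * gradient_penalty(real_batch, fake_batch, w, k)


def generator_adversarial_loss(fake_batch, w):
    """Wasserstein generator loss -E[D(G(z))]."""
    return -sum(critic(x, w) for x in fake_batch) / len(fake_batch)


# Hypothetical two-pixel "images"; w has unit norm, so the penalty vanishes.
real = [[1.0, 0.0], [0.8, 0.2]]
fake = [[0.0, 1.0], [0.1, 0.9]]
w = [0.6, 0.8]
d_loss = critic_loss(real, fake, w)
g_loss = generator_adversarial_loss(fake, w)
```

In a real network the gradient would be obtained by automatic differentiation, and the adversarial term would be combined with the MSE and VGG content losses described in the text.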
Because the generator of the original GAN suffers from vanishing gradients, we adopt the Wasserstein distance to optimize the generator. The adversarial loss of the generator is defined as the probability that the discriminator recognizes the generated high-resolution CMR image as an original high-resolution image. The loss function of the generator is shown in Eq. 6, where $CMR^{HR}$ refers to the GT images.

List of Table Legends
Table 1. PSNR/SSIM evaluation with other super-resolution methods.