CS-GAN for High-Quality Diffusion Tensor Imaging


 Background: Compressed sensing magnetic resonance imaging (CS-MRI) is a promising technique for accelerating MRI acquisition. However, image quality in CS-MRI remains a challenging problem. In particular, there is little work on reducing aliasing artefacts in compressed sensing diffusion tensor imaging (CS-DTI), which constitute a serious obstacle to obtaining high-quality images. Method: We propose a CS-DTI de-aliasing method based on a conditional generative adversarial network (cGAN), called CS-GAN, to tackle de-aliasing problems in CS-DTI with highly undersampled k-space data. The method uses a nested-UNet based generator, a ResNet-based discriminator, and a content loss function defined in both the image domain and the frequency domain. Results and Conclusions: Compared to existing state-of-the-art de-aliasing methods based on deep learning, our method achieves superior imaging quality in terms of both diffusion-weighted (DW) image quality and DTI diffusion metrics. Moreover, even at extremely low sampling ratios and low SNR, our method can still reconstruct texture details and spatial information.


Background
Diffusion tensor magnetic resonance imaging (DTI) is a technique that measures the diffusion of water molecules in tissues along different directions. It has been used to noninvasively examine the fiber structures of ex vivo human hearts [1,2], healthy human hearts [3,4,5,6], and patients [7,8]. To achieve DTI, a relatively large number (at least six) of diffusion-weighted (DW) images have to be acquired, each corresponding to one diffusion gradient direction. Shortening the acquisition time is therefore essential if DTI is to enter clinical routine, and compressed sensing (CS) [9,10,11,12] theory appears to be the most promising way to highly undersample k-space while retaining excellent image quality. CS achieved great success in magnetic resonance imaging (MRI) reconstruction and was later applied to diffusion MRI (dMRI), including DTI. This is particularly important for reducing motion effects in in vivo cardiac DTI. However, according to CS theory, undersampling k-space causes serious aliasing artifacts in the spatial image. To remove aliasing, some methods focus on reconstructing the high-dimensional signal from the undersampled k-space signal using the CS approach, while others aim to de-alias and recover high-quality DW images; the latter is what this work focuses on.
Recently, de-aliasing for DTI based on a total variation (TV) constraint on spatial images and diffusion directions was proposed in [13]. To further promote the image sparsity required in CS-dMRI, the inter- and intra-correlations of DW images were also introduced into the sparsity prior [14]. Global low-rank [15,16], TV-constrained low-rank [17] and phase-constrained low-rank [18] models were used for CS-dMRI de-aliasing. Although the above-mentioned methods provide valuable means to recover better image quality in dMRI, their clinical implementation still faces many challenges: the nonlinear optimization they use often involves iterative computation and may result in relatively long reconstruction times, which cannot satisfy the real-time requirement of routine clinical use. More recently, with the success of deep learning, in particular convolutional neural networks (CNNs), in solving inverse problems such as denoising [19], deblurring [20,21] and super-resolution [22,23,24], researchers have begun to investigate deep learning models for CS-MRI reconstruction. For instance, Lee et al. proposed to use a UNet with deep residual learning to learn the aliasing artifacts between images reconstructed from undersampled data and the ground truth [25]. Subsequently, they used a dual-frame U-net to reconstruct both magnitude and phase images to further improve CS-MRI reconstruction accuracy [26]. Schlemper et al. used deep cascaded CNNs to accelerate dynamic MRI acquisition by learning the spatio-temporal correlations in MR images [27]. Yang et al. proposed a deep ADMM-net that realized CS-MRI reconstruction using optimized parameters learned from training data [28]. Han et al. proposed to use deep learning with domain adaptation to reconstruct CS-MRI, fine-tuning a well-trained network for CT images with a few radial k-space sampling data [29]. Eo et al.
developed two CNN networks, operating in the k-space domain and the image spatial domain respectively, to improve the performance of CS-MRI reconstruction [30]. Bao et al. presented an enhanced recursive residual network that combines residual blocks and dense connections to realize CS-MRI reconstruction for both real-valued and complex-valued data [31]. Dai et al. developed a multi-scale dilated residual convolution network to improve CS-MRI reconstruction accuracy with fewer network parameters [32]. Inspired by the successful application of GANs in generating perceptually appealing images with rich texture information, Yang et al. proposed a deep de-aliasing generative adversarial network (DAGAN) to remove aliasing artifacts in CS-MRI [33]. Quan et al. presented a RefineGAN model with a cyclic loss to improve reconstruction accuracy at extremely low sampling rates [34]. Mardani et al. used a ResNet-based generator and discriminator with a least-squares GAN loss to solve the CS-MRI de-aliasing problem [35]. Despite this research on CS-MRI using CNNs, there are very few studies on the use of deep learning for CS-DTI [36]. The interest of CS-DTI resides in the fact that k-space can be highly undersampled to substantially reduce the amount of k-space data to acquire. However, highly undersampling k-space makes it considerably more difficult to reconstruct images of good quality, especially for DTI, which places high requirements on the quality of the reconstructed diffusion-weighted (DW) images. If the quality of the DW images along the different diffusion directions cannot be guaranteed, the derived diffusion metrics will not be correct. Considering the superiority of GAN models in generating image textures, and the fact that GAN models are prone to mode collapse, in this paper we propose a CS-DTI de-aliasing method based on a modified conditional GAN model, called CSGAN. The method uses a nested-UNet [37] like model as generator and a ResNet-based [38] network as discriminator.
The long and short skip connections introduced in the generator allow us to recover more details in DW images and therefore avoid mode collapse to some extent. We used the aliased spatial images whose k-space is undersampled as the input, and the images corresponding to full k-space as the target to train CSGAN.

Methods
The architecture of the cGAN-based CS-DTI de-aliasing is illustrated in Fig 1. It consists of a generator (Fig 1(a)) and a discriminator (Fig 1(b)). The generator is a nested-UNet model in which long and short connections coexist, which not only allows concatenating the deep and shallow features belonging to the same layer of the topology, but also lets features from different layers be fused. Fig 1(a) shows how the feature map travels through the top skip pathway of the generator. Formally, it can be formulated as

x^{i,j} = S(x^{i-1,j}) if j = 0, and x^{i,j} = S(C([x^{i,0}, ..., x^{i,j-1}], L(x^{i+1,j-1}))) if j > 0,

where x^{i,j} is the feature map represented by the pellet at position (i, j), S(·) is a convolution operation followed by an activation function, L(·) denotes an up-sampling layer and C(·) the concatenation layer. Each standard unit embeds a squeeze-and-excitation (SE) module, which learns the importance of each feature channel and uses this importance to enhance useful features and suppress features that are less useful for the current task. The discriminator is a typical binary classification network based on ResNet. It consists of feature selection, feature fusion and discrimination modules. The feature selection module is formed of five convolution layers with stride 2, kernel size 4 and batch normalization. The feature fusion module comprises two convolutional layers with stride 1 and kernel size 1, and three convolutional layers with stride 1 and kernel size 3. The last module of the network consists of a fully connected layer and a sigmoid function.
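The skip-pathway rule above can be sketched in code. The helper below is hypothetical (not from the paper's implementation); it enumerates, for a node x^{i,j}, which feature maps the formula feeds into its convolution S(·):

```python
def node_inputs(i, j):
    """Inputs feeding nested-UNet node x[i][j] (illustrative sketch).

    j == 0: plain encoder node, fed by x[i-1][0] on the down-sampling path
            (the first encoder node x[0][0] takes the image itself).
    j  > 0: concatenation C(...) of all same-level predecessors
            x[i][0..j-1] plus the up-sampled node L(x[i+1][j-1]).
    """
    if j == 0:
        return [] if i == 0 else [(i - 1, 0)]
    return [(i, k) for k in range(j)] + [(i + 1, j - 1)]

# The node x[0][2] on the top skip pathway receives the dense
# same-level features x[0][0], x[0][1] and the up-sampled x[1][1].
print(node_inputs(0, 2))  # [(0, 0), (0, 1), (1, 1)]
```

The dense same-level terms are the short skip connections; the up-sampled term is the long one.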
Denoting the original image as u and letting y represent the undersampled k-space data, we have

y = R F u,

where R is the undersampling mask and F represents the Fourier transform. Given the k-space data y, the objective of CS reconstruction is to find a nonlinear function G through which the original image can be reconstructed as

û = G(y; θ).

Such a reconstructed image should be as close as possible to the original image u, which means that we should find the parameter θ of this function that minimizes the difference between u and û.
Since y is undersampled data in k-space while u is expressed in the spatial domain, we converted y into the spatial domain by inverse Fourier transforming its zero-padded version, thus obtaining a zero-filling reconstructed image u0. The optimization problem then becomes

θ̂ = arg min_θ ||u − G(u0; θ)||²₂.

The objective is to estimate the nonlinear function G using the above-described neural network.
The input of the generator is DW images whose k-space is undersampled, also called k-space undersampled DW images in what follows. The inputs of the discriminator are DW images whose k-space is fully sampled (also named ground-truth images) and the reconstructed images. The discriminator aims to distinguish the generated images from the ground-truth images and, through adversarial training, to push the generator to generate more realistic images. To achieve this, the loss function of the generator is formulated as

L_G = L_adv + α L_i + β L_f,

where the generator aims to implicitly learn the real data distribution p_data from a set of images in order to yield images drawn from the learned distribution, and L_adv denotes the traditional adversarial loss that drives the learned distribution toward p_data through min-max adversarial training. L_i measures the mean square error between the reconstructed images and the fully sampled (ground-truth) DW images in the image domain, and L_f designates the corresponding mean square error in k-space. α and β are weighting coefficients.
Theoretically, the adversarial loss L_adv is defined through the value function V(D, G):

min_G max_D V(D, G) = E_{u∼p_data}[log D(u)] + E_{u0}[log(1 − D(G(u0)))].

In practice, such a value function cannot be implemented easily. It is, however, equivalent to minimizing the cross entropy of the negative samples when training the generator:

L_adv = −E_{u0}[log D(G(u0))].

The mean square errors L_i and L_f are defined respectively by

L_i = ||u − û||²₂  and  L_f = ||v − v̂||²₂,

where v and v̂ represent respectively the Fourier transforms of the images u and û. As to the discriminator, since it aims by definition to train a binary classifier, its loss function can be defined as [39,40]:

L_D = −E_{u∼p_data}[log D(u)] − E_{u0}[log(1 − D(G(u0)))].

In order to make the proposed reconstruction network generate the desired samples, the discriminator was trained several times and then the generator was trained once. The training process is described as follows.
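The content terms and the generator objective can be sketched as follows. `content_losses` and `generator_loss` are illustrative helpers, not the paper's code, and α = β = 1 is an arbitrary choice (the actual weighting coefficients are not reproduced here):

```python
import numpy as np

def content_losses(u, u_hat):
    """Image-domain and frequency-domain MSE terms.

    L_i compares the reconstruction u_hat with the fully sampled
    image u; L_f compares their 2-D Fourier transforms.
    """
    L_i = np.mean(np.abs(u - u_hat) ** 2)
    v, v_hat = np.fft.fft2(u), np.fft.fft2(u_hat)
    L_f = np.mean(np.abs(v - v_hat) ** 2)
    return L_i, L_f

def generator_loss(d_of_g, u, u_hat, alpha=1.0, beta=1.0):
    """L_G = L_adv + alpha*L_i + beta*L_f, with the non-saturating
    adversarial term -log D(G(u0)); d_of_g is the discriminator's
    score on the generated image."""
    L_i, L_f = content_losses(u, u_hat)
    L_adv = -np.log(d_of_g + 1e-12)  # small epsilon for stability
    return L_adv + alpha * L_i + beta * L_f
```

When the reconstruction matches the ground truth and the discriminator is fully fooled (D(G(u0)) = 1), the loss vanishes.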
for number of training iterations do
  for k steps do
    Sample a minibatch of m k-space undersampled images and the corresponding real images u_1, ..., u_m from real data. Perform Adam updates for D.
  end for
  Sample a minibatch of m k-space undersampled images and the corresponding real images u_1, ..., u_m from real data. Perform Adam updates for G.
end for
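The alternating schedule described above (several discriminator updates, then one generator update) can be sketched as follows; the step functions stand in for the actual Adam updates and are purely illustrative:

```python
import random

def train(generator_step, discriminator_step, data, n_iters, k=3, m=8):
    """Alternating GAN training schedule (sketch).

    Per outer iteration, the discriminator is updated k times, then the
    generator once, matching the 'train D several times, then G once'
    scheme in the text. Returns the update order for inspection.
    """
    log = []
    for _ in range(n_iters):
        for _ in range(k):
            batch = random.sample(data, m)   # minibatch of training pairs
            discriminator_step(batch)
            log.append('D')
        generator_step(random.sample(data, m))
        log.append('G')
    return log
```

For example, with k = 3 each outer iteration produces the update order D, D, D, G.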

Dataset Description and Experimental Setup
To effectively evaluate the superiority of the proposed model, two kinds of diffusion MRI datasets were used in the present work: DW images of 11 human neonatal/infant hearts and those of 11 human adult hearts. For the neonatal/infant hearts, the DW images were acquired with a 3T clinical MRI scanner (MAGNETOM Verio, Siemens AG) using an MDDW (Multi-Directional Diffusion Weighting) sequence with the following imaging parameters: slice thickness and spacing = 1.4 mm, number of slices = 30 to 40, in-plane spatial resolution = 1.38×1.38 mm², TE = 70 ms, TR = 5500 ms to 7900 ms, matrix size = 104×104, number of diffusion gradient directions = 192, and diffusion sensitivity b-value = 700 s/mm². All the DW images were acquired six times, and averaging them allows increasing the SNR. For the adult hearts, the DW images were acquired with a 1.5T clinical MRI scanner with the following parameters: slice thickness and spacing = 5 or 2 mm, number of slices = 25 to 62, in-plane spatial resolution = 0.5×0.5 mm² or 2×2 mm², TE = 70 ms to 102 ms, TR = 5270 ms to 8600 ms, matrix size = 128×128 or 336×512, number of diffusion gradient directions = 12 or 64, and diffusion sensitivity b-value = 1000 s/mm².
In our experiments, DW images of seven neonatal/infant hearts were selected randomly for training the reconstruction model, the images of two hearts were used for validation, and the images of the remaining two hearts were taken as the testing dataset. In summary, we have 38851 samples for training, 12352 samples for validation, and 5018 samples for testing. To speed up the training, the original slices were cropped to a size of 64×64 and the region of interest (ROI) was extracted to avoid the effects of the image background. A few slices with a small ROI were disregarded. The intensity of the selected DW images was normalized to the range (−1, 1). For the human adult hearts, we fine-tuned the well-trained model obtained with the neonatal/infant hearts. Specifically, the network parameters of the well-trained model were used as initialization, and the DW images of 7 adult hearts were then used to fine-tune the parameters. The DW images of two hearts were used for validation, and those of the remaining two hearts for testing.
The proposed reconstruction method was also compared with other CS reconstruction methods, namely ADMM-net and DAGAN. For these two existing models, the hyperparameters in their loss functions were adjusted on our own dataset to achieve their best performance. To adapt to the image size while maintaining the same depth of architecture, the stride used in the first two convolutional layers of the DAGAN model was changed from 2 to 1. All the training strategies for ADMM-net [28] and DAGAN [33] were identical to those given in the original works, whose implementations can be downloaded from https://github.com/yangyan92/Deep-ADMM-Net and https://github.com/nebulaV/DAGAN. Their initial learning rate was 0.0001 and was dropped every 5 epochs with a decay rate of 0.75 to obtain the best performance; the number of epochs was set to 30-40. For our proposed CSGAN model, we used a nested-UNet as the generator instead of a UNet. The initial learning rates for both the generator and the discriminator were set to 0.0001 and dropped every 5 epochs with a decay rate of 0.85; the number of epochs was set to 20-30. ADMM-net was implemented with MATLAB 2016a; DAGAN and the proposed CSGAN were implemented with TensorFlow on an NVLink-connected V100 GPU.
The original full k-space was first sampled with a randomly perturbed golden-angle sampling mask with ratios of 50%, 40%, 30%, 20% and 10%. Then, spatial images were reconstructed using ADMM-net, DAGAN and CSGAN, respectively. The reconstruction results and quantitative analysis are given in the following subsections.
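As an illustration of k-space undersampling at a target ratio, the sketch below builds a simple 1-D variable-density Cartesian line mask. This is a stand-in for demonstration only, not the randomly perturbed golden-angle mask actually used in this work, whose construction is not reproduced here:

```python
import numpy as np

def random_line_mask(n_lines, ratio, center_frac=0.08, seed=0):
    """Boolean mask over phase-encode lines (illustrative stand-in).

    Fully samples a small central band of k-space (where most energy
    lies) and fills the rest with uniformly random lines until the
    requested sampling ratio is reached.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_lines, dtype=bool)
    c = n_lines // 2
    half = max(1, int(center_frac * n_lines) // 2)
    mask[c - half:c + half] = True                 # keep low frequencies
    n_target = int(ratio * n_lines)
    remaining = np.flatnonzero(~mask)
    extra = max(0, n_target - mask.sum())
    mask[rng.choice(remaining, size=extra, replace=False)] = True
    return mask
```

For example, `random_line_mask(128, 0.2)` keeps 20% of the 128 phase-encode lines, including the fully sampled center.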
Note that the above initial DW images are spatial magnitude data, so their corresponding k-space data are complex but Hermitian-symmetric. This Hermitian symmetry is, however, broken by the following calculation procedure: a) take the Fourier transform of the spatial magnitude DW images, yielding complex but Hermitian-symmetric k-space data; b) randomly undersample the complex k-space data, making them asymmetric; c) obtain a complex spatial DW image containing phase information after estimating the nonlinear function G using the CNN; d) take the magnitude of the complex spatial DW image as the final reconstructed DW image.
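Steps (a) and (b) of this procedure, together with the zero-filling reconstruction u0 described in Methods, can be sketched as follows (assuming a boolean mask over phase-encode lines; this is a minimal illustration, not the paper's implementation):

```python
import numpy as np

def undersample_and_zero_fill(u, mask):
    """Forward model and zero-filled input for a magnitude image u.

    (a) FFT of the real magnitude image -> complex, Hermitian-symmetric
        k-space (centered via fftshift);
    (b) retain only the masked phase-encode lines (rows), which breaks
        the symmetry;
    then inverse-transform and take the magnitude to get the aliased
    zero-filling reconstruction u0 fed to the network.
    """
    y = np.fft.fftshift(np.fft.fft2(u))
    y_u = y * mask[:, None]
    u0 = np.abs(np.fft.ifft2(np.fft.ifftshift(y_u)))
    return u0
```

With a full mask (all lines kept), u0 reproduces u exactly; with a partial mask, u0 exhibits the aliasing artifacts the network learns to remove.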
To evaluate the performance of the proposed reconstruction network, the normalized mean square error (NMSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and the deviation angle between the original and the reconstructed fiber orientations were used. The higher the SSIM and PSNR values, the better the results. The deviation angle measures the accuracy of the reconstructed fiber tracts.
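These metrics can be sketched as follows (NMSE, PSNR and the fiber-orientation deviation angle; SSIM is omitted here, and in practice a library implementation such as scikit-image's `structural_similarity` would be used):

```python
import numpy as np

def nmse(u, u_hat):
    """Normalized mean square error."""
    return np.sum((u - u_hat) ** 2) / np.sum(u ** 2)

def psnr(u, u_hat, peak=None):
    """Peak signal-to-noise ratio in dB."""
    peak = np.max(np.abs(u)) if peak is None else peak
    mse = np.mean((u - u_hat) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def deviation_angle(v1, v2):
    """Angle (degrees) between two fiber orientations. The sign of a
    tensor eigenvector is arbitrary, hence the absolute dot product."""
    c = abs(np.dot(v1, v2)) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
```

Note that antiparallel orientations give a deviation angle of 0°, as required for fiber directions.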

DTI De-aliasing Results
The DW images of both fetus and adult human hearts reconstructed with the different methods and sampling ratios are shown in Fig 2. The first column shows the ground truth (GT). The next five columns show the reconstruction results for the different undersampling ratios. The first three rows present the results of the proposed model, the middle three rows those of DAGAN, and the last three rows those of ADMM. We observe that, for the same sampling ratio, the DW images reconstructed with CSGAN preserve texture details well, whereas ADMM and DAGAN yield over-smoothed images. In particular, at sampling ratios of 30% or lower, ADMM preserves few textures while CSGAN is still able to preserve details. This observation holds for the DW images of both fetus and adult human hearts.
In CS-DTI, we should guarantee the reconstruction accuracy of diffusion properties, including the most important DTI indices such as FA, MD and fiber orientations. The reconstructed FA, MD and deviation angle (between original and reconstructed fiber orientations) maps are shown in Figs 3 to 5, respectively. In each figure except Fig 5, there are three groups of two rows, each corresponding to one method; the first row in each group shows the reconstruction results and the second row the residual map between the reconstruction and the GT. In Fig 5, each group contains three rows, with a zoomed error view added. Since FA is derived from the diffusion tensor, which is in turn calculated from the DW images, slight changes of the DW images in any direction can have a large impact on the results. It can be seen that, although the residual of the reconstructed DW images is small, the FA and MD calculated from the reconstructed DW images exhibit a bigger difference from the GT (Figs 2 and 3). Nevertheless, FA and MD are much better with CSGAN than with DAGAN or ADMM, with far fewer pixels having large errors. This implies that CSGAN reconstructed the DW images well along all the diffusion directions. Since ADMM over-smoothed the DW images, it also over-smoothed the FA and MD maps and led to greater errors in fiber orientation reconstruction (Fig 5). In contrast, with the proposed CSGAN method, the reconstructed diffusion index maps are closer to the original ones.
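For reference, FA and MD follow from the eigenvalues of the fitted diffusion tensor by their standard definitions; a minimal sketch:

```python
import numpy as np

def fa_md(tensor):
    """FA and MD from a 3x3 symmetric diffusion tensor.

    MD is the mean of the eigenvalues; FA is the normalized standard
    deviation of the eigenvalues, ranging from 0 (isotropic) to 1
    (diffusion along a single direction).
    """
    lam = np.linalg.eigvalsh(tensor)
    md = lam.mean()
    num = np.sqrt(((lam - md) ** 2).sum())
    den = np.sqrt((lam ** 2).sum())
    fa = np.sqrt(1.5) * num / den if den > 0 else 0.0
    return fa, md
```

An isotropic tensor gives FA = 0, which is why near-zero FA values indicate nearly identical DW signals along all diffusion directions.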

Quantitative Analysis
To quantitatively compare the proposed model with existing ones, the average PSNR and SSIM of the DW images are given in Table 1. PSNR and SSIM of the DW images reconstructed with our method are clearly the highest, and this for all sampling ratios. PSNR and SSIM, however, only reflect overall reconstruction quality. To gain insight into the differences in image intensity distribution, violin plots (drawn with the Python package seaborn) of the DW images, FA, MD and fiber orientation errors reconstructed with the different methods at a sampling ratio of 20% are shown in Fig 6. In a violin plot, the thick vertical black bar indicates the interquartile range. The thin black line extending from it represents the 95% confidence interval. The white point is the median. On each side of the thin black line is a kernel density profile that shows the distribution shape of the data: wider sections of the violin represent values the data takes with higher probability, skinnier sections values with lower probability. The violin plots thus give insight into the distribution of data values; the closer the distribution of the reconstruction results to that of the original data values, the better the reconstruction. Our proposed reconstruction model has the best performance in terms of the distributions of DW image intensity, FA, MD and fiber orientation deviation angle.

Noise Influence

DW images are often corrupted by noise. To evaluate the robustness of the proposed method to noise, we averaged different numbers (respectively 1, 2, 3, 4, 5 and 6) of the repeated acquisitions of the isolated infant hearts, thus yielding DW images with different noise levels (the corresponding SNRs are 17.1, 20.2, 21.9, 23.0, 23.7 and 24.3 dB), which were then fed into the trained model. The effect of the number of averaged acquisitions at a sampling ratio of 20% is plotted in Fig 7.
Globally, for all noise levels, our proposed CSGAN model always outperforms the other two models. As the number of averaged acquisitions increases, the noise decreases and the PSNR of the DW images reconstructed by all three methods clearly increases. However, the PSNR of ADMM always remains at a lower level, and that of the proposed model is always the highest. We also observe that, as the SNR increases, the superiority of our proposed model over ADMM and DAGAN becomes more obvious. In terms of deviation angle, the proposed model always induces the smallest fiber orientation errors. As for FA and MD, with our model, changes in SNR do not greatly influence the reconstructed FA and MD with respect to the ground truth. In contrast, the reconstruction results obtained with DAGAN and ADMM are much more sensitive to noise, which demonstrates the robustness of the proposed model.
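The SNR values quoted above are roughly consistent with the idealized rule that averaging n independent acquisitions improves SNR by 10·log10(n) dB (noise standard deviation falls as 1/√n); a minimal sketch, noting that the measured values follow this trend only approximately:

```python
import numpy as np

def snr_after_averaging(snr1_db, n):
    """Idealized SNR (dB) after averaging n independent acquisitions.

    Averaging n acquisitions keeps the signal fixed while reducing the
    noise standard deviation by sqrt(n), i.e. SNR gains 10*log10(n) dB.
    """
    return snr1_db + 10 * np.log10(n)
```

Starting from the single-acquisition SNR of 17.1 dB, the rule predicts about 20.1 dB for n = 2 and 23.1 dB for n = 4, close to the measured 20.2 dB and 23.0 dB.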

Model Accuracy
The proposed reconstruction method works on 2D slices. So, it would be interesting to know if its performance varies with the image content (intensity variation, anatomical structures, etc.) of the slices of a three-dimensional volume. To evaluate the accuracy of the proposed method, we plot in Fig 8 the variation of SSIM values of FA and MD as a function of slice number. The greater the SSIM value (the darker the blue color), the closer to ground-truth. In the ideal case, for a given method (a given row), the dark blue remains constant regardless of slice number. Clearly, the FA SSIM or MD SSIM values of the proposed CSGAN vary much less than those of existing methods, which implies that the proposed reconstruction method is much less sensitive to the variation of image content.
As an illustration, in Fig 9 are plotted the fiber orientation errors as a function of sampling ratio for different reconstruction methods. As expected, fiber orientation errors decrease with the increase of sampling ratio (i.e. increase of available k-space data). However, fiber orientation errors induced by the proposed reconstruction method are always smaller than those induced by existing methods.

Discussion
We have proposed a CS-DTI de-aliasing method based on a novel GAN model, which consists of a nested-UNet based generator and a ResNet-based discriminator. The results showed that the proposed model outperforms existing deep learning-based CS de-aliasing methods, especially at extremely low sampling ratios (10%). From the recovered DW images and the corresponding diffusion metric maps, we found that the performance of deep ADMM-net is poor, even though it employs a deep neural network that learned the parameters of the traditional ADMM optimization algorithm through multi-stage training with a mean square error (MSE) loss. Since the traditional ADMM algorithm is sensitive to hyperparameters and the MSE loss tends to blur the image, ADMM-net yields the worst reconstruction results. Our results demonstrated that the short connections in the proposed nested-UNet based generator yield fewer artifacts and richer texture in the recovered DW images, thus producing more accurate DTI diffusion properties such as fiber orientation, FA and MD. This is consistent with the fact that DAGAN is a typical UNet-based model capable of solving inverse problems (such as image super-resolution, compressed sensing and aliasing removal) and reconstructing high-quality images. Compared to a UNet-based structure, the nested-UNet structure includes both short and long skip connections, which enables the network to learn semantic features at multiple levels and generate more image details, thereby mitigating the mode collapse problem. This explains why its de-aliasing performance is better than that of the UNet-based model (DAGAN). Moreover, we observed that the FA and MD maps obtained with the ADMM-net or DAGAN models have many zero values. An FA of about zero means that the DW signal along the different diffusion directions is almost the same, which could be caused by the mode collapse problem in DAGAN and the smoothing effect in ADMM-net.
As observed, the SSIMs of the FA and MD reconstructed with our model are almost the same for all slices. On the contrary, DAGAN and ADMM-net produce significant variation of the FA and MD SSIMs from one slice to another, which implies that their de-aliasing performance depends largely on image content; in other words, they cannot accurately recover images with rich textures. Unlike in UNet, where the feature maps of the encoder are fed directly into the decoder, in nested-UNet the short skip connections make the encoder feature maps undergo dense convolutions across different layers before entering the decoder. Such dense convolution brings the semantic information conveyed by the encoder feature maps closer to that expected in the decoder feature maps, and the optimization problem becomes much easier when the real input and the expected input are semantically similar. Moreover, the dense convolution across different layers allows us to explore and integrate semantic features at multiple levels, which plays a significant role in enhancing image details and consequently in generating better de-aliasing results. In addition, the diffusion metrics derived from our model are more robust to noise. This can be explained by the fact that the dense convolutions along the skip pathways enlarge the receptive field, which yields more useful image features that are extremely important for the robustness of the reconstruction.
Although the proposed method achieved the best performance, it still has some limitations. First, the relationship between the images along different diffusion gradient directions was not fully considered in the network design; further modifying the reconstruction network and investigating novel loss functions suitable for CS-DTI reconstruction will be future work. Second, although the present CNN-based method focused on already acquired data, it would also be interesting to act on sequence design to reduce acquisition time.

Conclusion
We have proposed a novel GAN-based model for cardiac CS-DTI de-aliasing. It consists of a nested-UNet based generator and a ResNet-based discriminator. We compared it with the state-of-the-art models ADMM-net and DAGAN in terms of various criteria. The results showed that our method achieves much better imaging quality in terms of PSNR, SSIM, DW image residual error and diffusion metrics. This is particularly noticeable at extremely low sampling ratios and low SNRs, which suggests its interest for the clinical use of cardiac DTI.

Availability of data and materials
The data used in this study are available from the corresponding author upon reasonable request; the data will not be publicly shared.
Ethics approval and consent to participate

All clinical data used in this work do not involve the privacy of patients, and the experiments meet legal and ethical standards.

Consent for publication
All the authors listed have signed an informed consent to approve the enclosed manuscript.

Fig. 1: Architecture of the proposed CS-DTI reconstruction. In the generator (a), each pellet represents a network unit comprising a standard convolution layer with stride 1, kernel size 2 and same padding, an embedded SE module, and a convolutional layer used for down-sampling with stride 2 and kernel size 4. The discriminator (b) consists of feature selection, feature fusion and discrimination modules. The input of the CS-DTI reconstruction network is DW images whose k-space is undersampled, and its output is the generated images. GT stands for ground truth.

Fig. 6: Data distribution of the different reconstruction results on a whole heart with a sampling ratio of 20%. The horizontal axis represents the different methods and the vertical axis the values. (a) Violin plots of DW images for the fetus human heart, (b) violin plots of DW images for the adult human heart, (c) violin plots of FA for the fetus human heart, (d) violin plots of FA for the adult human heart, (e) violin plots of MD for the fetus human heart, (f) violin plots of MD for the adult human heart, (g) violin plots of fiber orientation error for the fetus human heart, (h) violin plots of fiber orientation error for the adult human heart.