Deep Robust Residual Network for Brain MRI Super-Resolution

Spatial resolution is a key factor in quantitatively evaluating the quality of magnetic resonance imaging (MRI). Super-resolution (SR) approaches improve spatial resolution by reconstructing high-resolution (HR) images from low-resolution (LR) ones to meet clinical and scientific requirements. To increase the quality of brain MRI, we propose a robust residual-learning SR network (RRLSRN) that generates a sharp HR brain image from an LR input. Because the Charbonnier loss handles outliers well and the Gradient Difference Loss (GDL) sharpens images, we combine the two to improve the robustness of the model and enhance the texture of the SR results. Two adult-brain MRI datasets, Kirby 21 and NAMIC, were used to train and verify the effectiveness of our model. To further verify its generalizability and robustness, we collected eight clinical fetal brain MRI scans for evaluation. The experimental results show that the proposed deep residual-learning network achieved superior performance and higher efficiency than the compared methods.


Introduction
Spatial resolution is a key factor in evaluating the quality of magnetic resonance imaging (MRI). Images with high spatial resolution contain rich structural details, enabling accurate image analysis 1 and providing detailed anatomical information for accurate quantitative analysis 2 . MRI is widely used to assess brain disease and development 3,4 . However, owing to hardware limitations, uncooperative patients, and other factors, improvements to MRI quality are often necessary 5,6 .
In conventional medical-image processing, bicubic or spline interpolation is usually adopted to increase resolution. However, such interpolation negatively affects image accuracy 7 . Therefore, coherently recovering the information lost during the acquisition of medical images and reconstructing a better high-resolution (HR) image is a fundamental problem in the field.
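As a concrete illustration of this conventional baseline, the following sketch simulates cubic-spline up-sampling with `scipy.ndimage.zoom` (order=3); the image and scale factor are toy stand-ins, not data from this study.

```python
import numpy as np
from scipy.ndimage import zoom

# Toy stand-in for an HR MRI slice (values in [0, 1]).
hr = np.random.default_rng(0).random((64, 64))

# Simulate acquisition at half resolution, then upsample back with
# cubic (order=3) spline interpolation -- the conventional baseline.
lr = zoom(hr, 0.5, order=3)
up = zoom(lr, 2.0, order=3)

print(up.shape)                 # back on the HR grid
print(np.abs(up - hr).mean())   # residual detail interpolation cannot recover
```

The nonzero mean absolute error shows the information irrecoverably lost by plain interpolation, which is exactly what learning-based SR methods try to restore.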
The input to the above SRCNNs must be a bicubic-upsampled low-resolution (LR) image; however, applying bicubic interpolation to the LR input is time-consuming. To reduce the computational cost, Fast SRCNN 22 uses a deconvolutional layer to reconstruct HR images directly from LR features. W. Shi et al. 23 proposed an efficient sub-pixel CNN, in which the deconvolutional layer is simplified into a sub-pixel convolution that replaces redundant nearest-neighbor interpolation and is more efficient.
Although these models demonstrated promising results, they required input images upscaled to the desired spatial resolution via bicubic interpolation before applying the network, and they did not exploit low-level feature information. To cope with these limitations, some SR algorithms have adopted residual learning 8,10-12,16,24,25 , showing effective improvements.
To address the computational cost and to avoid generating false features, we adopted a deep residual network that learns residuals in a coarse-to-fine fashion. Input images are up-scaled to the desired spatial resolution inside the network, which improves computational efficiency. We train the model with the Gradient Difference Loss (GDL) 26 , which penalizes the gradient difference between the SR result and the HR ground truth; this strategy sharpens the SR result and helps avoid false information. We also adopted the robust Charbonnier loss function, which copes with outliers and improves reconstruction accuracy. The adult-brain MRI datasets Kirby 21 and NAMIC were used to train and verify the effectiveness of our model. To further assess its generalizability and robustness, we collected eight clinical fetal-brain MRIs for evaluation.

Figure 1 shows example HR slices produced by the different algorithms (cubic spline interpolation, non-local means up-sampling (NMU) 27 , low-rank total variation (LRTV) 28 , and SRCNN 29 ) for visual inspection alongside the ground-truth and LR images: Kirby 21 in Figure 1 (a), NAMIC 30 in Figure 1 (b), and clinical fetal MR images in Figure 1 (c). Our approach recovered fine details and preserved edges. Because the SR deep-learning technique is not strongly limited by MRI acquisition parameters, it can be transferred to the fetal brain. We therefore applied our model to fetal MRIs provided by the First Affiliated Hospital of Xi'an Jiaotong University. We labeled and extracted the fetal brain on each MRI; the scans of each fetus were cut into 10-20 slices, and all slices of each fetus were tested. Figure 1 (c) shows example SR slices of the different algorithms for one subject. The MR images reconstructed by our network provided more detail than those of the other algorithms.

Experimental Results
For a quantitative comparison, the average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) 31 were used to evaluate the performance of each algorithm. Table 1 summarizes the quantitative evaluation for a scale factor of two. The results show that CNN-based approaches (e.g., SRCNN and our RRLSRN model) achieved better performance than spline interpolation, NMU, and LRTV. Our experiments also showed that residual-learning approaches were more effective than SRCNN.
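The PSNR metric used above can be computed directly from its definition; the following minimal sketch (pure NumPy, toy images) illustrates the calculation. SSIM is more involved and is typically taken from an image-processing library.

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio (dB) between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return np.inf  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 0.1            # uniform error of 0.1 -> MSE = 0.01
print(psnr(ref, noisy))      # 10 * log10(1 / 0.01) = 20.0 dB
```

Higher PSNR indicates a smaller pixel-wise error relative to the image's dynamic range (`data_range`), which is why it is reported as an average over test slices.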

Computational Efficiency
To verify the efficiency of our algorithm, we measured the test time of each method on Kirby 21, NAMIC, and the fetal MR images, and compared the running times; the results are shown in Table 2. The average speed of our model was faster than those of cubic spline, NMU, LRTV, and SRCNN on NAMIC and the eight fetal MR images. On Kirby 21, our model outperformed NMU, LRTV, and SRCNN and was very close to cubic spline.

Discussion
In this work, we proposed a network-based algorithm to learn the residual information between upsampled LR MR images and HR MR images. Our approach adopted the robust Charbonnier loss function and the GDL, both of which aid model training. To demonstrate the potential of SR methods for enhancing the quality of LR images, we presented an experiment transferring image quality from an HR experimental dataset to LR images. The results on two brain MR image datasets show that our algorithm outperforms cubic spline, NMU, LRTV, and SRCNN. The RRLSRN effectively learns the residual information between upsampled LR MRI and HR MRI; the model not only improves the accuracy of the SR results but also greatly reduces the computational cost. We then applied the model to clinical fetal MR images, where the SR results of the proposed RRLSRN were better than those of the methods listed above, with more detailed texture.
In terms of processing speed, our method ran faster than NMU, LRTV, and SRCNN at scale ×2 on both the Kirby 21 and NAMIC datasets, and was almost as fast as cubic spline on Kirby 21. Overall, our algorithm performs well in terms of speed.
Our SR method shows clear improvement over the other listed methods in terms of visualization, quantitative evaluation, and computational efficiency. Our model currently performs SR at a scale of ×2, but it can be extended to ×4 or ×8 by cascading. In future work, we will improve our residual-learning SR framework to obtain better accuracy while reducing computational complexity. In addition, we will further apply SR technology to improve the accuracy and validity of clinical diagnosis.

Methods
All methods were performed in accordance with the Declaration of Helsinki.

MR Image Super-Resolution Framework
We propose RRLSRN to generate an HR brain image from its LR input. Our network is made up of a feature-extraction part, which extracts useful representations from the LR MRI, and an image-reconstruction part, which estimates the raw HR output. We up-sampled the LR MRI and learned the residual information between the HR MRI and the up-sampled MRI. The LR MRI was derived from the HR MRI via bicubic interpolation.

The degradation and reconstruction can be written as x = κ(y) and y = u(x) + r, where x and y represent the LR and HR images, respectively, κ is the down-sampling operator, u is the up-sampling operator, and r is the residual information between the HR MRI and the bicubic-interpolated MRI. The model learns the residual and up-sampling features with normal and transposed convolutional layers. The network architecture used in this study is illustrated in Figure 2 (b). For the fetal data, we segmented and extracted the fetal brains, as shown in Figure 2 (a).
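The residual formulation above can be checked numerically; in this sketch `scipy.ndimage.zoom` stands in for the κ and u operators, and the images are toy arrays. It illustrates that a perfectly predicted residual r recovers the HR image exactly, so the network only has to learn r.

```python
import numpy as np
from scipy.ndimage import zoom

# x = kappa(y): LR input derived from the HR ground truth (toy arrays).
y = np.random.default_rng(1).random((32, 32))   # HR slice
x = zoom(y, 0.5, order=3)                        # kappa: down-sampling

# u(x): up-sampling back to the HR grid.
u_x = zoom(x, 2.0, order=3)

# r = y - u(x): the residual the network is trained to predict.
r = y - u_x

# Reconstruction from a perfect residual recovers the HR image exactly.
y_hat = u_x + r
print(np.allclose(y_hat, y))  # True
```

In practice the network outputs an estimate of r from x, so the reconstruction quality depends entirely on how well the residual is predicted.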

Figure 2. (a) Fetal-brain segmentation and extraction; (b) the network architecture used in this study.
The feature-extraction backbone of the network consisted of 13 convolutional layers, followed by two transposed convolutional layers that up-sample the extracted features by a scale of two. Because the fetal MRI slice sequence did not permit a 3D representation, we designed our model with 2D convolutions. The convolution kernels were 3 × 3 × 64, and the transposed convolutions were 4 × 4 × 1. Our model performs feature extraction at a coarse resolution and generates feature maps with finer details through the transposed convolutional layers. Compared with the listed networks, ours significantly reduces computational complexity.
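The feature-map sizes implied by these kernels can be checked with the standard shape formulas. The strides and paddings below (stride 1 / pad 1 for the 3 × 3 convolutions, stride 2 / pad 1 for a 4 × 4 transposed convolution) are assumptions for illustration only; the paper states just the kernel sizes.

```python
def conv_out(n, k=3, stride=1, pad=1):
    """Output size of a standard convolution along one axis."""
    return (n + 2 * pad - k) // stride + 1

def tconv_out(n, k=4, stride=2, pad=1):
    """Output size of a transposed convolution along one axis."""
    return (n - 1) * stride - 2 * pad + k

size = 64                      # e.g. a 64x64 LR patch (hypothetical size)
for _ in range(13):            # 13 conv layers, 3x3, 'same' padding (assumed)
    size = conv_out(size)
print(size)                    # 64: spatial size preserved through the stack
print(tconv_out(size))         # 128: a stride-2 transposed conv doubles it
```

Under these assumptions the 3 × 3 convolutions preserve the coarse resolution, and the transposed convolutions provide the ×2 up-sampling, which matches the coarse-to-fine design described above.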

Loss function
This approach learns the information lost to interpolation and also reduces computational complexity. Let x be the LR input; we denote the ground-truth HR MRI slice by y, the generated HR MRI slice by ŷ, and the residual information of the MRI by r. We optimized the network with the robust Charbonnier loss 7 instead of the l 2 loss to cope with outliers and improve the accuracy of the MRI SR results. The loss function is defined as L(ŷ, y) = ρ(y − ŷ), with ρ(z) = √(z² + ε²), where ε is a very small constant, empirically set to 1e −3 . Averaging over the samples in each training batch, the overall loss function is L = (1/N) Σ_i ρ(y_i − ŷ_i), where i indexes the training samples and N is the batch size.
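A minimal NumPy sketch of the Charbonnier loss described above, with batch averaging reduced to a simple mean; the images are toy arrays, not data from this study.

```python
import numpy as np

def charbonnier(pred, target, eps=1e-3):
    """Robust Charbonnier loss: mean of sqrt((pred - target)^2 + eps^2)."""
    diff = pred - target
    return np.mean(np.sqrt(diff * diff + eps * eps))

y = np.zeros((4, 4))               # toy ground truth
y_hat = np.full((4, 4), 0.5)       # toy prediction with uniform error 0.5
print(charbonnier(y_hat, y))       # ~0.5: behaves like |diff| when |diff| >> eps
```

For large errors the loss grows linearly like an l1 penalty (hence the robustness to outliers), while the ε term keeps it smooth near zero so it remains differentiable everywhere.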

We also combined this with the GDL, which directly penalizes differences of image gradients to sharpen the SR result. The GDL function is defined as L_gdl(ŷ, y) = Σ_{i,j} ( | |y_{i,j} − y_{i−1,j}| − |ŷ_{i,j} − ŷ_{i−1,j}| | + | |y_{i,j−1} − y_{i,j}| − |ŷ_{i,j−1} − ŷ_{i,j}| | ), where |·| denotes the absolute value. The final combined loss is then L_total = L + λ L_gdl, where λ weights the gradient term.
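The GDL and the combined loss can be sketched as follows; the weight `lam` on the gradient term is a hypothetical parameter, as the paper does not give the weighting, and the exponent `alpha` defaults to 1.

```python
import numpy as np

def gdl(pred, target, alpha=1):
    """Gradient Difference Loss: penalizes mismatched image gradients."""
    def grads(img):
        # Vertical and horizontal finite differences (absolute gradients).
        return np.abs(np.diff(img, axis=0)), np.abs(np.diff(img, axis=1))
    gt_v, gt_h = grads(target)
    pr_v, pr_h = grads(pred)
    return np.sum(np.abs(gt_v - pr_v) ** alpha) + np.sum(np.abs(gt_h - pr_h) ** alpha)

def combined_loss(pred, target, lam=1.0, eps=1e-3):
    """Charbonnier term plus lam * GDL (lam is a hypothetical weight)."""
    diff = pred - target
    charb = np.mean(np.sqrt(diff * diff + eps * eps))
    return charb + lam * gdl(pred, target)

target = np.tile(np.arange(4.0), (4, 1))  # toy image with a constant gradient
print(gdl(target, target))                # 0.0: identical gradients incur no cost
```

Because the GDL compares gradient magnitudes rather than raw intensities, it rewards predictions whose edges match the ground truth, which is why it sharpens the SR output.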

Dataset and training details
To verify its ability to reconstruct HR brain MRI slices, we applied our method to two adult-brain datasets (Kirby 21 and NAMIC) and eight clinical fetal MRIs.

Kirby 21 dataset
The Kirby 21 dataset 30 contains T1-weighted MRIs of 21 healthy volunteers with no history of neurological conditions. The data were obtained on a 3-T MRI scanner (Achieva, Philips Healthcare, Best, Netherlands) with a sagittal field of view (FoV) of 240 × 204 × 256 mm and a resolution of 1.0 × 1.0 × 1.2 mm 3 .

NAMIC Brain Multimodality dataset
The NAMIC dataset (http://hdl.handle.net/1926/1687) was acquired using a 3-T General Electric (GE) device at Brigham and Women's Hospital in Boston, MA. An eight-channel coil was employed to perform parallel imaging by using array spatial sensitivity encoding techniques 30 . The parameters of structural MRI were as follows: TR = 7.4 ms, TE = 3 ms, 25.6 cm 2 FoV, and matrix = 256 × 256.

Clinical fetal MRI dataset
The eight clinical fetal MRI scans were provided by the First Affiliated Hospital of Xi'an Jiaotong University. Images were collected continuously from September 2017 to October 2018 using a GE 3.0-T MRI scanner (Discovery 750W; GE Medical Systems, Milwaukee, WI; 240 × 204 × 256 mm FoV; 4-mm slice thickness; TE = 85 ms) for fetal-head MRI. The eight pregnant volunteers were scanned with silent sequences comprising silent T2 half-Fourier acquisition single-shot fast-spin-echo axial, sagittal, and coronal acquisitions. These women underwent MRI because of health concerns. We performed the experiments following the safety guidelines for MRI research. All patients signed informed consent forms, and the clinical protocol was approved by the Institutional Review Board of the First Affiliated Hospital of Xi'an Jiaotong University in Xi'an, Shaanxi, China, on February 25, 2019. The experimental data were completely de-identified so that no information about the subjects can be retrieved.

Training details
We chose data from KKI2009-06 to KKI2009-42 in Kirby 21 to train the model; KKI2009-01 through KKI2009-05 were used for testing. For NAMIC, we tested the model on case01011 through case01034 and used the remaining images for training. All eight fetal-brain MRIs were used for testing. LR images were generated with a scale factor of two. We initialized the network with the model of W. Lai 7 . The negative slope of the leaky rectified linear units was 0.2. We padded with zeros so that the feature-map size of each layer matched the input. We trained the model by randomly sampling 64 patches of size 128 × 128 per batch. We set the momentum parameter to 0.9 and the weight decay to 1e −4 . The learning rate was initialized to 1e −5 and halved every 50 epochs. We ran the original code of the compared methods to measure runtime on the same computer, with an Intel i7 processor (64-GB RAM) and an Nvidia Tesla V100 graphics processor (16-GB memory).
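The learning-rate schedule described above (initial rate 1e−5, halved every 50 epochs) can be written as a simple step function; this is an illustrative sketch, not the authors' training code.

```python
def learning_rate(epoch, base_lr=1e-5, drop_every=50, factor=0.5):
    """Step schedule: multiply the base rate by `factor` every `drop_every` epochs."""
    return base_lr * factor ** (epoch // drop_every)

print(learning_rate(0))    # 1e-05
print(learning_rate(50))   # 5e-06
print(learning_rate(100))  # 2.5e-06
```

Step decay of this kind lets the network make large updates early in training and progressively smaller, more stable updates as the residual predictions converge.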