Longitudinal Structural MRI Data Prediction in Nondemented and Demented Older Adults via Generative Adversarial Convolutional Network

Alzheimer’s disease (AD) is the most common cause of dementia and threatens the health of millions of people. Early-stage diagnosis of AD is critical for improving clinical outcomes, and longitudinal magnetic resonance imaging (MRI) data collection can be used to monitor the progress of each patient. However, missing data are a common problem in longitudinal AD studies, mainly due to subject dropout and failed scans. This hinders the acquisition of longitudinal sequences that consist of multi-time-point magnetic resonance (MR) images at relatively uniform intervals. In this paper, we present a generative adversarial convolutional network to predict missing structural MRI data. In particular, we treat multiple MRI scans collected at different times as a temporal sequence and determine the spatio-temporal relationship between the different scans in the proposed network. We adopt residual bottlenecks in the generator to reduce the number of parameters while deepening the network. To make full use of the longitudinal information, our discriminator distinguishes not only real MR images from generated MR images, but also real sequences from fake sequences, in which the longitudinal MR images for all earlier time points come from the dataset and only the last MR image comes from the generator. Our experimental results show that our method predicts longitudinal structural MRI data of brains afflicted with AD more accurately.


Introduction
Alzheimer's disease (AD) is a severe neurodegenerative disorder that comes with progressive impairment of learning and memory 1 . AD is the most common cause of dementia, and early diagnosis is therefore important. Longitudinal magnetic resonance imaging (MRI) data collection can be very helpful for following up the progress of AD patients. A longitudinal sequence usually consists of two or more time points [2][3][4] . Due to scan corruption, artifacts, limited available scan time, and incorrect machine settings, the data can be corrupted or incomplete. Therefore, predicting the structural MRI of AD patients is important for completing a longitudinal dataset. Moreover, diagnostic structural magnetic resonance (MR) images are sensitive to the degeneration that occurs in mild AD and mild cognitive impairment (MCI) 5 and can help monitor disease progression [6][7][8][9][10][11] . Thus, prediction of structural MR images is very useful in assessing AD risk.
Several studies have been conducted on the longitudinal prediction of missing images or structural MRI. A sparse patch-based metamorphosis model predicted the temporal evolution of anatomical structures 12 , and a regression forest was later used to predict cortical development 13 . However, both of these methods were applied to infants. For AD image regression, Niethammer 14 developed a generative model that adopted a second-order dynamic formulation of image registration. Pathan 15 presented a predictive regression model for longitudinal images with missing data based on large deformation diffeomorphic metric mapping (LDDMM). However, that method mainly focuses on capturing linear changes in the image sequence and cannot deal with missing correspondences among images. To solve this problem, we introduce a generative adversarial convolutional network (GAN) to predict the missing structural MRI data of AD subjects.
GANs 16 have been widely used for segmentation [17][18][19] , classification 20 , medical image analysis 21 , and the synthesis of medical images [22][23][24][25] . The generator is trained to produce realistic outputs that can "fool" the discriminator into classifying them as real, and the discriminator is trained to accurately distinguish between real and fake data; the framework is equivalent to a minimax game between the two parties. Recently, GANs have achieved excellent performance in generating natural images [26][27][28] . GANs have also been applied to convert MRI images to CT 22 and to reduce noise in low-dose CT 29 .
Our model can learn a non-linear mapping from the available longitudinal historical data to the missing data. The proposed network takes a longitudinal input, so the spatio-temporal relationships of MR images collected from the same subject at different times are taken into account. The sequence consisting of the longitudinal historical MR images together with the generated MR image for the last time point is called a "fake sequence", while the sequence consisting of all the real longitudinal MR images is a "real sequence". A discriminator is then used to distinguish both the fake/real sequence and the fake/real MR image. In this way, our model makes full use of the longitudinal information in the historical MR images.

Methods
In this section, we introduce a generative adversarial convolutional network for longitudinal structural MRI prediction.

Our proposed network architecture
Our network is designed based on a GAN, and we consider the relationship between MR images collected at different time points for the same subject. Let Y_predict be the prediction of the MR image generated from the input MR images at the other time points X = {X_1, ..., X_n}. In the generator, we use 18 residual blocks to deepen the model. The discriminator d takes a sequence of MR images at consecutive time intervals for each subject and is trained to predict the probability that the last MRI of the sequence was generated by g; only the last MRI is either real or generated by g, and the rest of the sequence always comes from the dataset. In addition, the discriminator distinguishes a single image from the dataset from one produced by the generator. This allows the discriminative model to make use of temporal information, so that g learns to produce sequences that are temporally coherent with its input. g is conditioned on the input MR images X = {X_1, ..., X_n}. Network details are shown in Figure 1. Our network adopts three important components. (A) Residual bottlenecks (Figure 2). Our generator contains 18 residual bottleneck blocks, which reduce the number of parameters while allowing the network to grow deeper; our discriminator contains 3 residual bottleneck blocks. (B) Fake/real longitudinal sequence discrimination. The discriminator classifies not only real and generated MR images at the last time point, but also real and fake longitudinal sequences, where in a fake sequence the MR image at the last time point comes from the generator and the historical MR images come from the dataset. (C) Global average pooling (GAP). We use GAP in the discriminator to replace the fully connected (FC) layer, because an FC layer requires training and tuning a large number of parameters; GAP reduces the spatial parameters and makes the model more robust.
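As an illustration of component (A), the sketch below shows one plausible residual bottleneck block in PyTorch. The channel width, reduction factor, and use of batch normalization are our assumptions for illustration; the paper specifies only the block counts (18 in the generator, 3 in the discriminator).

```python
import torch
import torch.nn as nn

class ResidualBottleneck(nn.Module):
    """1x1 -> 3x3 -> 1x1 bottleneck with an identity skip connection.

    Illustrative sketch only: `channels` and `reduction` are assumed
    values, not taken from the paper.
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction  # narrow middle cuts the parameter count
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # residual (identity) shortcut keeps gradients flowing in deep stacks
        return self.relu(x + self.body(x))

# 18 stacked blocks, as in the generator trunk described above
generator_trunk = nn.Sequential(*[ResidualBottleneck(64) for _ in range(18)])
```

Because each block preserves the channel count, the blocks can be stacked freely to deepen the network without changing feature-map shapes.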

Loss
We define the MR images from the dataset as (X, Y), where X denotes the historical longitudinal MR images and Y is the ground truth of Y_predict, the image generated by the generator g. We train d to classify the input (X, Y) into class 1 and the input (X, Y_predict) into class 0. In addition, we train d to classify the single input Y into class 1 and the single input Y_predict into class 0. We perform a stochastic gradient descent (SGD) iteration on d while keeping the weights of g fixed. Therefore, the loss function we use to train d is

L_adv^d = L_bce(d(X, Y), 1) + L_bce(d(X, Y_predict), 0) + L_bce(d(Y), 1) + L_bce(d(Y_predict), 0),

where L_bce is the binary cross-entropy function, defined for a prediction p and a label y as

L_bce(p, y) = -[y log p + (1 - y) log(1 - p)].

While we train g, we keep the weights of d fixed and take one SGD step on g to minimize the adversarial loss:

L_adv^g = L_bce(d(X, Y_predict), 1) + L_bce(d(Y_predict), 1).

Some studies have shown that the L_1 distance results in images that are less blurry than those produced with the L_2 distance 30. We therefore employ the L_1 loss to evaluate the distance between MR images:

L_1(Y, Y_predict) = ||Y - Y_predict||_1.

To sharpen the image prediction, it is also necessary to penalize differences in image gradients in the generative loss function. We combine the gradient difference loss (GDL) 31 with the L_1 loss and the adversarial loss:

L_gdl(Y, Y_predict) = Σ_{i,j} ( | |Y_{i,j} - Y_{i-1,j}| - |Y_predict_{i,j} - Y_predict_{i-1,j}| | + | |Y_{i,j-1} - Y_{i,j}| - |Y_predict_{i,j-1} - Y_predict_{i,j}| | ),
where | • | denotes the absolute value.
The final loss of the generator is

L(g) = λ_g L_adv^g + λ_L1 L_1 + λ_gdl L_gdl.

For the combined loss function, we empirically set λ_L1 = 1, λ_gdl = 1, and λ_g = 0.05 to balance the terms.
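The L_1 and GDL terms can be sketched in NumPy as below, using the weights λ_L1 = 1, λ_gdl = 1, λ_g = 0.05 from the text. Note the mean reduction and passing the adversarial term in as a precomputed scalar (`adv_term`) are our simplifications, since the adversarial part depends on the discriminator.

```python
import numpy as np

def l1_loss(y, y_pred):
    """Mean absolute difference between target and predicted image."""
    return np.mean(np.abs(y - y_pred))

def gdl_loss(y, y_pred):
    """Gradient difference loss: penalizes mismatched image gradients
    along rows and columns, which sharpens predictions."""
    gy_i, gy_j = np.abs(np.diff(y, axis=0)), np.abs(np.diff(y, axis=1))
    gp_i, gp_j = np.abs(np.diff(y_pred, axis=0)), np.abs(np.diff(y_pred, axis=1))
    return np.mean(np.abs(gy_i - gp_i)) + np.mean(np.abs(gy_j - gp_j))

def generator_loss(y, y_pred, adv_term, lam_l1=1.0, lam_gdl=1.0, lam_g=0.05):
    """Weighted sum used to train g, with the paper's empirical weights."""
    return lam_l1 * l1_loss(y, y_pred) + lam_gdl * gdl_loss(y, y_pred) + lam_g * adv_term
```

A constant prediction has zero GDL but a large L_1 error, which is why both terms are needed: L_1 anchors intensities while GDL matches edges.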

Experimental Setup
In this section, we describe the dataset, the preprocessing, and the training and implementation details.

Dataset
The Open Access Series of Imaging Studies (OASIS) provides free neuroimaging datasets of the brain for scientific study 32 . We used OASIS-2 to train our network. For each subject, 3 or 4 individual T1-weighted MRI scans, each obtained in a single scan session, were included. The subjects were all right-handed and included both men and women. We selected 18 subjects with 3 time points from OASIS-2.
To avoid a large age gap between subjects, we screened 18 subjects ranging in age from 75 to 88, requiring that the time between any two consecutive collections be no more than two years. The 18 subjects consisted of 7 males and 11 females; 7 were demented and 11 were nondemented. The age distribution of these subjects is shown in Figure 3.

Preprocessing
Our experiment followed the standard data preprocessing pipeline 15 . We used software from the FMRIB Software Library (FSL) 33 to register all the subjects' heads to the same size before stripping the skull. Note that skull stripping artificially improves the generator's performance numbers, because most voxels are then very close to zero and the generator can easily yield intensity values close to zero 34 . In order to extract the brain region from each sequence, we calculated the largest bounding box that can contain every brain in the whole dataset, and then cropped each sequence in every subject scan. The coronal orientation was selected because doctors are more likely to diagnose AD from coronal MRI slices. After cropping, each subject scan consisted of 434 coronal slices of size 362 for all sequences. We labeled the corresponding slices of the same subject at different time points as a sequence.
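The dataset-wide bounding-box computation described above can be sketched as follows. Registration and skull stripping themselves are done with FSL; this only illustrates the cropping step, and the mask-based formulation is our assumption.

```python
import numpy as np

def dataset_bounding_box(masks):
    """Smallest 2-D box containing every brain mask in the dataset,
    so that all subjects can be cropped to a common size."""
    union = np.zeros_like(masks[0], dtype=bool)
    for m in masks:
        union |= m > 0          # union of all brain regions
    rows = np.any(union, axis=1)
    cols = np.any(union, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return (r0, r1 + 1, c0, c1 + 1)  # half-open crop indices

def crop(image, box):
    """Apply the shared bounding box to one slice."""
    r0, r1, c0, c1 = box
    return image[r0:r1, c0:c1]
```

Using one shared box for the whole dataset, rather than a per-subject box, keeps every cropped slice the same size, which the fixed-size network input requires.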

Training and Implementation Details
To optimize the network, we used the SGD optimizer, setting the generator learning rate to 1e-4 and the discriminator learning rate to 1e-6. We used Python and implemented the architecture in PyTorch. The computing hardware consisted of an i7 CPU with 64 GB RAM and a GTX 2080 Ti 12 GB GPU. Of the 18 subjects, we used the data of 17 for training and 1 for testing.
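A minimal sketch of the alternating SGD updates with the stated learning rates (1e-4 for g, 1e-6 for d) is shown below. The tiny one-layer g and d are placeholders for illustration, not the paper's architecture, and the generator loss here keeps only the adversarial and L_1 terms.

```python
import torch
from torch import nn, optim

# Placeholder g maps two history slices (as channels) to the predicted slice;
# placeholder d scores a 3-slice sequence. Not the paper's architecture.
g = nn.Conv2d(2, 1, kernel_size=3, padding=1)
d = nn.Sequential(nn.Conv2d(3, 1, kernel_size=3, padding=1),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Sigmoid())

opt_g = optim.SGD(g.parameters(), lr=1e-4)   # generator learning rate
opt_d = optim.SGD(d.parameters(), lr=1e-6)   # discriminator learning rate
bce = nn.BCELoss()

def train_step(x, y):
    """One discriminator update, then one generator update, on a batch.
    x: historical slices (N, 2, H, W); y: target slice (N, 1, H, W)."""
    y_pred = g(x)
    # discriminator step: real sequence -> class 1, fake sequence -> class 0
    real = d(torch.cat([x, y], dim=1))
    fake = d(torch.cat([x, y_pred.detach()], dim=1))
    loss_d = bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # generator step: fool d, plus an L1 term toward the target slice
    fake = d(torch.cat([x, y_pred], dim=1))
    loss_g = 0.05 * bce(fake, torch.ones_like(fake)) + torch.mean(torch.abs(y_pred - y))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

Detaching y_pred in the discriminator step is what keeps the weights of g fixed while d is updated, mirroring the alternating scheme described in the Loss section.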

Experimental Results
In this section, we present the results of the experiments validating our method. We also conducted ablation experiments to verify the role of each component in our model.

Method Comparison
As comparison baselines, we applied U-Net 35 and ResNet 36 with 18 residual blocks to predict the MRI of a subject. For both U-Net and ResNet, the inputs required to predict the third MRI scan are the subject's first two MR images. Figure 4(a) shows the MRI results of the baseline methods and our method, and Figure 4(b) shows the gray error maps of U-Net, ResNet, and our method with respect to the target. Our method produces more realistic details that are closer to the ground-truth MR image.
Both quantitative (per-pixel synthesis error) and qualitative (simulated human perception) differences should be taken into account in assessing the quality of the predicted images. Three metrics are used to report the results: mean squared error (MSE), peak signal-to-noise ratio (PSNR), and the structural similarity index metric (SSIM). The prediction results are summarized in Table 1: the PSNR and SSIM of our model are 29.46 and 0.9977, which are 7.39 and 0.0285 higher than U-Net and 0.16 and 0.0025 higher than ResNet, respectively. Our method also has a lower MSE than U-Net and ResNet, which means that the total pixel-wise error between our results and the targets is smaller. This comprehensive evaluation shows that our algorithm performs well.
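MSE and PSNR can be computed directly as below, assuming images scaled to [0, 1]; SSIM is more involved and is typically computed with an off-the-shelf implementation such as skimage.metrics.structural_similarity.

```python
import numpy as np

def mse(y, y_pred):
    """Mean squared error over all pixels."""
    return np.mean((y - y_pred) ** 2)

def psnr(y, y_pred, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, data_range].
    Higher is better; identical images give +inf."""
    err = mse(y, y_pred)
    return np.inf if err == 0 else 10.0 * np.log10(data_range ** 2 / err)
```

Because PSNR is a log-transform of MSE, the two metrics always rank methods consistently; SSIM adds a structural comparison that plain pixel errors miss.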

Ablation Study
An ablation study was conducted to understand the effect of each network component. First, we compared our proposed model with and without adversarial learning. The average MSE, PSNR, and SSIM of the model with and without adversarial learning are shown in Figure 5, and the visualization results are shown in Figure 6. Our model with adversarial learning yields finer details closer to the target and provides more accurate predictions than the model without adversarial learning. Our model, which utilizes both GAP and residual bottlenecks, also performs better than the model using GAP or bottlenecks alone, as summarized in Table 2. We replaced the FC layer with GAP because it is a more natural way to convert the feature map to the final classification. If only bottlenecks are used, the PSNR and SSIM are 3.03 and 0.0029 lower than GAP+Bottleneck; if only GAP is employed, the PSNR and SSIM are 0.15 and 0.0003 lower than GAP+Bottleneck, respectively.
Our discriminator can distinguish not only real MR images from generated MR images at the last time point, but also real sequences from fake sequences. To verify the effect of sequence discrimination in our model, a comparative analysis was conducted; the prediction results are summarized in Table 3.
In addition, we compared the impact of the different components on the number of parameters in our model; the detailed parameter counts can be seen in Table 4. The parameter count of our model with GAP+Bottleneck is 47.4, which is 2.1 lower than that of the model without GAP (49.5) and 2.4 lower than that of the model without the bottleneck (49.8). The results show that the GAP and bottleneck components each reduce the number of model parameters by more than 4%.

Conclusion
We present a network for the prediction of missing structural MRI data via a GAN. We treat multiple MRI scans of a subject collected at different time points as a temporal sequence and determine the spatio-temporal relationship between the different scans in the proposed network. The multi-time-point-input method exploits the correlations between available sequences and generates missing structural MR images by leveraging information from a subject's existing sequence. In addition, sequence discrimination helps our model make full use of the spatio-temporal information. Our approach performs better, both qualitatively and quantitatively, than the listed methods on the OASIS-2 dataset, and the predicted missing data provide complementary visual information.
Figure 1. The structure of the whole network.
Figure 2. Details of the residual bottleneck.

Figure 3. The age of the subjects at three time points.

Figure 4. The results of different prediction methods. (a): The results of U-Net, ResNet, ours, and the target; the yellow arrows indicate areas of difference in detail. (b): The corresponding gray error maps of (a).

Figure 6. The comparison results of our method with/without adversarial learning. The overall result and the local enlargement in the yellow box.
(Table 4 parameter counts: w/o GAP 49.5; w/o Bottleneck 49.8; GAP+Bottleneck 47.4.)


Table 1. The average prediction accuracy comparison of different methods.

Table 2. The influence comparison of the different components of our method.

Table 3. The average prediction accuracy comparison of our model with/without sequence discrimination.

Table 4. Model parameter comparison of the different components of our method.