Dataset
The data analyzed in this study were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. ADNI was launched in 2003 as a public-private partnership led by Principal Investigator Michael W. Weiner, MD. The primary aim of ADNI has been to investigate whether a combination of measurements from serial MRI, PET, clinical and neuropsychological assessments, and other biological markers can be used to measure the progression of mild cognitive impairment (MCI) and early AD (for up-to-date information, see www.adni-info.org).
We downloaded 192 scan pairs of [11C]PiB PET and three-dimensional (3D) T1-weighted MR scans from the ADNI database. The PET and MR images were acquired from 93 participants, including 16 healthy controls (HC), 59 patients with MCI, and 18 patients with AD. A baseline scan plus one, two, or three follow-up scans were acquired for 43, 25, and 2 participants, respectively; the remaining 23 participants underwent only a baseline scan. No participant converted from HC to MCI or AD, or from MCI to AD, during follow-up.
For the PET input, we downloaded PET data preprocessed by coregistering each frame to the first frame and averaging the frames (four 5-min frames starting 50 min after the [11C]PiB injection; termed “Coregister, Averaged” in the ADNI database). We then smoothed the downloaded PET images with a Gaussian kernel to adjust the point spread function (PSF) so that the PET images had a similar resolution across all ADNI sites. The smoothing kernel employed in this study was the same as that used for the “post-processed” images, named “Co-reg, Avg, Std Img, and Vox Siz, Uniform resolution” in the ADNI database. The smoothed PET images had a uniform isotropic resolution of 8-mm full width at half maximum (FWHM).
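As a minimal sketch of this harmonization step, the width of the additional Gaussian kernel can be obtained by quadrature subtraction of the scanner-specific native FWHM from the 8-mm target. The function below assumes SciPy and a known native resolution; the voxel size and native FWHM shown in the usage comment are illustrative values, not taken from the ADNI protocol.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_to_uniform_fwhm(pet, voxel_size_mm, native_fwhm_mm, target_fwhm_mm=8.0):
    """Smooth a PET volume so its effective resolution becomes target_fwhm_mm.

    The additional kernel width follows from quadrature subtraction:
    fwhm_add = sqrt(target^2 - native^2).
    """
    fwhm_add = np.sqrt(max(target_fwhm_mm**2 - native_fwhm_mm**2, 0.0))
    # Convert FWHM (mm) to Gaussian sigma in voxel units for each axis.
    sigma_vox = [fwhm_add / (2.0 * np.sqrt(2.0 * np.log(2.0))) / v for v in voxel_size_mm]
    return gaussian_filter(pet, sigma=sigma_vox)

# Example with illustrative values: ~5-mm native resolution, 2-mm isotropic voxels.
# smoothed = smooth_to_uniform_fwhm(pet_array, (2.0, 2.0, 2.0), native_fwhm_mm=5.0)
```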
For the MR input, we downloaded thin-slice MR T1-weighted images from the ADNI database. The downloaded MR T1-weighted images were resampled to 256 × 256 × 256 voxels with dimensions of 1 × 1 × 1 mm3. The resampled MR images were analyzed using FreeSurfer (https://surfer.nmr.mgh.harvard.edu) to automatically label the volumes of interest (VOIs) [38, 39] for PVC and the subsequent VOI analysis. A total of 113 labeled VOIs were identified based on the Desikan/Killiany atlas [40], termed “aparc+aseg” in the FreeSurfer software. To save computation time in the PVC processes, we merged the 113 VOIs into 44 regions (22 in each hemisphere) based on the definitions used in a previous analysis by the ADNI PiB PET Core [41]. Details of the VOI merging are presented in Table S1 in the Supplementary Materials. To account for spillover to non-brain tissue and air in the PVC, we added a VOI comprising a 15-mm “shell” surrounding the outer surface of the brain. The VOI map for a representative case is shown in Figure S1. To avoid memory errors during training, the MR images were down-sampled to 128 × 128 × 128 voxels with 2 × 2 × 2 mm3 voxels before PET registration and the PVC processes described below.
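A minimal sketch of the down-sampling from 1-mm to 2-mm isotropic voxels is shown below, assuming SciPy; the interpolation orders are assumptions, as the paper does not state them.

```python
from scipy.ndimage import zoom

def downsample_to_2mm(volume, is_label_map=False):
    """Down-sample a 256x256x256, 1-mm isotropic volume to 128x128x128, 2-mm voxels.

    Intensity images use trilinear-like interpolation (order=1); label maps such as
    aparc+aseg use nearest-neighbour interpolation (order=0) so labels stay integers.
    """
    assert volume.shape == (256, 256, 256)
    return zoom(volume, zoom=0.5, order=0 if is_label_map else 1)
```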
We used maps corrected for the partial volume effect with the region-based voxelwise (RBV) method [17] as the target images for training. The PVC-optimized registration (PoR) framework [42] was applied to compensate for misregistration between the MR and PET images. Briefly, the PoR framework iteratively performed PVC and registration between the smoothed PV-corrected map and the uncorrected PET image. The final PV-corrected map was then generated by performing RBV PVC on the misregistration-compensated PET image.
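Conceptually, the RBV correction scales each voxel by the ratio of a synthetic image, built from GTM-corrected regional means, to its PSF-smoothed version. The sketch below illustrates only this final voxelwise step, assuming the GTM-corrected regional means are already available; the GTM step itself and the PoR iterations are not shown, and all variable names are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def rbv_correction(pet, voi_labels, gtm_means, sigma_vox):
    """Region-based voxelwise (RBV) correction step.

    pet        : uncorrected PET volume
    voi_labels : integer VOI map aligned with the PET volume
    gtm_means  : dict mapping VOI label -> GTM-corrected regional mean
    sigma_vox  : Gaussian sigma (in voxels) corresponding to the assumed PSF
    """
    # Piecewise-constant "synthetic" image from the corrected regional means.
    synthetic = np.zeros_like(pet, dtype=np.float64)
    for label, mean in gtm_means.items():
        synthetic[voi_labels == label] = mean
    # Voxelwise correction: scale the PET by the ratio of the synthetic image
    # to its PSF-smoothed version.
    smoothed = gaussian_filter(synthetic, sigma=sigma_vox)
    eps = 1e-6  # avoid division by zero outside the labelled regions
    return pet * synthetic / np.maximum(smoothed, eps)
```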
We used the MR and PET images acquired from 156 scan pairs as the training data. Images acquired from 36 scan pairs at sites other than those of the training dataset were used to test the trained model.
Training
The U-Net [37] used in this study is shown in Figure 1. The U-Net was trained using a training dataset containing 156 scan pairs. In brief, the U-Net consists of an encoder part and a decoder part. The encoder compresses the data to extract robust image features, while the decoder mirrors the encoder’s structure and restores the desired image from the extracted features. Each level of the encoder and decoder contains two convolutional layer blocks. Each block includes a convolutional layer, a batch normalization layer to prevent internal covariate shift (Ioffe and Szegedy, 2015), and an activation layer with rectified linear units [44]. Down- and up-sampling in the encoder and decoder parts were performed with convolutional and transposed convolutional layers, respectively, each with a stride of two. The number of channels was doubled at each down-sampling and halved at each up-sampling. We empirically set the number of down- and up-samplings to three. Skip connections were added at each level of the network to prevent loss of spatial information. Finally, the output images were recovered from the final image features using a convolutional layer with a 1 × 1 kernel.
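A minimal PyTorch sketch of the architecture described above is given below. The base channel count, 3 × 3 × 3 kernels, and two-channel (MR + PET) input are assumptions made only for illustration; the exact configuration is given in Figure 1 of the paper.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two convolutional layer blocks: Conv3d -> BatchNorm3d -> ReLU, twice."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )

class DeepPVCUNet(nn.Module):
    """3D U-Net with three down-/up-samplings, stride-2 (transposed) convolutions,
    channel doubling/halving, skip connections, and a final 1x1x1 convolution."""

    def __init__(self, in_channels=2, base_channels=16):
        super().__init__()
        c = base_channels
        self.enc1 = conv_block(in_channels, c)
        self.down1 = nn.Conv3d(c, 2 * c, kernel_size=2, stride=2)
        self.enc2 = conv_block(2 * c, 2 * c)
        self.down2 = nn.Conv3d(2 * c, 4 * c, kernel_size=2, stride=2)
        self.enc3 = conv_block(4 * c, 4 * c)
        self.down3 = nn.Conv3d(4 * c, 8 * c, kernel_size=2, stride=2)
        self.bottom = conv_block(8 * c, 8 * c)
        self.up3 = nn.ConvTranspose3d(8 * c, 4 * c, kernel_size=2, stride=2)
        self.dec3 = conv_block(8 * c, 4 * c)
        self.up2 = nn.ConvTranspose3d(4 * c, 2 * c, kernel_size=2, stride=2)
        self.dec2 = conv_block(4 * c, 2 * c)
        self.up1 = nn.ConvTranspose3d(2 * c, c, kernel_size=2, stride=2)
        self.dec1 = conv_block(2 * c, c)
        self.out = nn.Conv3d(c, 1, kernel_size=1)  # 1x1 kernel recovers the output image

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.down1(e1))
        e3 = self.enc3(self.down2(e2))
        b = self.bottom(self.down3(e3))
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))   # skip connection, level 3
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))  # skip connection, level 2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection, level 1
        return self.out(d1)

# model = DeepPVCUNet(in_channels=2)  # MR + PET as two input channels (assumption)
```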
We trained the network weights by minimizing the mean squared error between the real and predicted output images. The weights were optimized using the Adam method [45]. We used the default hyperparameter values for Adam, except for β1, which was set to 0.723. The weights were updated in batches of 16 image sets over 400 epochs. The initial learning rate was set to 0.0018 and decayed linearly from the 200th epoch to the end of training. The β1, batch size, number of epochs, and initial learning rate were determined by Bayesian optimization using the Optuna library (https://optuna.org/) [46]. Data augmentation was applied to the training data using rotation and horizontal flipping. The training was implemented using the PyTorch library (https://pytorch.org/) [47].
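The following sketch wires these reported hyperparameters into a PyTorch training loop, using the U-Net sketched above. The end value of the linear decay (zero) and the `train_loader` are assumptions; they are not specified in the text.

```python
import torch
import torch.nn as nn

# Hyperparameters reported in the text.
beta1, batch_size, n_epochs, initial_lr = 0.723, 16, 400, 0.0018

model = DeepPVCUNet(in_channels=2)          # U-Net sketch from above
criterion = nn.MSELoss()                    # mean squared error loss
optimizer = torch.optim.Adam(model.parameters(), lr=initial_lr, betas=(beta1, 0.999))

# Linear decay of the learning rate from the 200th epoch to the end of training
# (decaying to zero is an assumption).
def lr_lambda(epoch):
    return 1.0 if epoch < 200 else max(0.0, (n_epochs - epoch) / (n_epochs - 200))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

for epoch in range(n_epochs):
    for inputs, targets in train_loader:    # hypothetical DataLoader with batch_size=16
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```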
The input PET and MR images were standardized based on the average of each individual image. The output PV-corrected maps were standardized using the average of the individual PET images.
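A minimal sketch of this standardization, assuming that “standardized based on the average” means division by the image mean (the exact operation is not stated in the text):

```python
# Inputs are scaled by their own means; the target PV-corrected map is scaled by the
# mean of the corresponding uncorrected PET image (interpretation, not stated verbatim).
pet_in = pet / pet.mean()
mri_in = mri / mri.mean()
target = pv_corrected / pet.mean()
```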
Validation
To validate whether the trained deepPVC model learns features for PVC from both the MR and PET images, we performed training using only the PET images as well as using both the PET and MR images, and named the resulting models “deepPVCPET” and “deepPVCMRI+PET,” respectively. Model performance was evaluated with six-fold cross-validation: the 156 image pairs were split into six subsets, five subsets were used as training data and one as validation data, and training and evaluation were repeated six times so that every subset served as validation data. We compared the following metrics between the two deepPVC models: 1) the structural similarity index (SSIM) [48] between the real and predicted PV-corrected maps; and 2) the regional standardized uptake values (SUVs) in the VOIs on the real and predicted PV-corrected maps. The SSIM assesses the structural and perceptual similarity between two images and was calculated using the scikit-image library (https://scikit-image.org/) [49].
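A minimal sketch of the SSIM computation with scikit-image is shown below; the handling of `data_range` is an assumption, not taken from the paper.

```python
from skimage.metrics import structural_similarity

def compute_ssim(real_map, predicted_map):
    """SSIM between the real and predicted PV-corrected maps (3D arrays)."""
    data_range = float(max(real_map.max(), predicted_map.max())
                       - min(real_map.min(), predicted_map.min()))
    return structural_similarity(real_map, predicted_map, data_range=data_range)
```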
The regional SUVs on the VOIs were compared to assess the quantitative correspondence between the real and predicted PV-corrected maps. The intraclass correlation coefficient for absolute agreement of a single measure (ICC[2, 1]) between the real and predicted PV-corrected SUVs for each individual was calculated as an index for the quantitative correspondence of PV-corrected SUVs. The ICC was calculated using the pingouin library (https://pingouin-stats.org/) [50]. To demonstrate the voxel-level correspondence between the real and predicted PV-corrected maps, we constructed two-dimensional (2D) histograms between the real and predicted PV-corrected SUVs on voxels in the brain for each individual.
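A minimal sketch of the ICC(2, 1) computation with pingouin is given below; the long-format column names are illustrative.

```python
import pandas as pd
import pingouin as pg

def icc_2_1(real_suvs, pred_suvs, voi_names):
    """ICC(2,1): absolute agreement of a single measure between real and predicted
    regional PV-corrected SUVs for one participant."""
    n = len(voi_names)
    df = pd.DataFrame({
        "voi":    list(voi_names) * 2,
        "method": ["real"] * n + ["predicted"] * n,
        "suv":    list(real_suvs) + list(pred_suvs),
    })
    icc = pg.intraclass_corr(data=df, targets="voi", raters="method", ratings="suv")
    return icc.loc[icc["Type"] == "ICC2", "ICC"].iloc[0]
```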
Differences in the SSIM and ICC among the trained models were tested using pairwise t-tests with Bonferroni correction for multiple comparisons. The SSIM and ICC of the uncorrected PET images with respect to the real PV-corrected maps were used as references to compare the models and to test for differences between them.
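As a minimal sketch of this comparison (assuming SciPy and per-scan metric vectors; `n_comparisons` is the number of pairwise comparisons performed):

```python
from scipy import stats

# Paired t-test between two models' per-scan SSIM values, Bonferroni-corrected.
t_stat, p_value = stats.ttest_rel(ssim_model_a, ssim_model_b)
p_corrected = min(p_value * n_comparisons, 1.0)
```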
Test with [11C]PiB PET data
The trained deepPVC model was tested on the test data (36 scan pairs acquired at sites different from those of the training data) to assess the generalization performance of the trained model. Note that the PET scanner used for the test dataset differed from that used for the training/validation dataset, while the MR scanners were the same for both datasets. The lists of PET and MR scanners are presented in Table S2 of the Supplementary Materials. Here, we tested the model trained with all 156 scan pairs of the training/validation dataset. The SSIM and ICC for the predicted PV-corrected maps were calculated for the test data in the same manner as for the validation data. Differences in the SSIM and ICC between the validation and test data were tested using Welch’s t-test.
The computer used in the test had an Intel Xeon E5-1650 v3 3.50-GHz central processing unit (6 cores and 12 threads), four GeForce GTX TITAN X graphics processing units (GPUs) with 12 GB each, and eight 8-GB memory modules (64 GB in total). We measured the computation time with and without the GPU; for reference, the computation time required to perform RBV PVC was also measured.
Test with over-smoothed PET images
To assess the effect of PSF inaccuracy on deepPVC and to determine whether the trained deepPVC model learned PSF information, we tested the model on excessively smoothed PET images. We hypothesized that, if the trained model had learned the PSF information, a mismatch between the true PSF and the PSF assumed during training would affect the predicted PV-corrected maps, as it does in conventional MR-based PVC. We smoothed the PET images of the test data with 6.0- and 8.9-mm FWHM Gaussian kernels, resulting in final resolutions of 10- and 12-mm FWHM, respectively. We then calculated the differences between the PV-corrected SUVs predicted from the original and over-smoothed PET images using the trained deepPVCMRI+PET model. For reference, we also performed RBV PVC on the smoothed PET images and compared the differences in the PV-corrected SUVs with those of deepPVC.
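For clarity, the kernel widths follow from the same quadrature relation used earlier; the short check below reproduces the 6.0- and 8.9-mm values from the 8-mm starting resolution.

```python
import numpy as np

# Additional smoothing needed to bring 8-mm FWHM images to 10- and 12-mm FWHM:
# fwhm_add = sqrt(target^2 - current^2).
current_fwhm = 8.0
for target_fwhm in (10.0, 12.0):
    fwhm_add = np.sqrt(target_fwhm**2 - current_fwhm**2)
    print(f"{target_fwhm:.0f}-mm target -> additional {fwhm_add:.1f}-mm FWHM kernel")
# -> 6.0 mm and 8.9 mm, matching the kernels used above.
```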
Test with misaligned PET images
To assess the effect of misregistration between the PET and MR images on deepPVC, we deliberately realigned the PET images of the test dataset. We then predicted the PV-corrected maps using deepPVC with the realigned PET images and the original MR images as input. Each realignment consisted of a single shift or rotation along or about the x-, y-, or z-axis by ±4, 8, or 12 mm or degrees. We calculated the differences in the regional PV-corrected SUVs from those obtained without realignment. To compare the robustness to misregistration between conventional PVC and deepPVC, we also performed RBV PVC on the realigned PET images.
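A minimal sketch of applying such a single-axis shift or rotation is given below, assuming SciPy, 2-mm isotropic voxels, and an illustrative rotation-plane convention; the tooling actually used for the realignment is not stated in the paper.

```python
from scipy.ndimage import shift, rotate

def misalign(pet, axis, amount, mode):
    """Apply a single-axis shift (mm) or rotation (degrees) to a PET volume.

    Assumes 2-mm isotropic voxels; the mapping of axes to rotation planes is
    illustrative only.
    """
    if mode == "shift":
        offset = [0.0, 0.0, 0.0]
        offset[axis] = amount / 2.0  # convert mm to voxels
        return shift(pet, offset, order=1)
    planes = {0: (1, 2), 1: (0, 2), 2: (0, 1)}  # rotation about x, y, or z
    return rotate(pet, angle=amount, axes=planes[axis], reshape=False, order=1)

# Example: +8-mm shift along the x-axis, or a -12-degree rotation about the z-axis.
# shifted = misalign(pet_array, axis=0, amount=8.0, mode="shift")
# rotated = misalign(pet_array, axis=2, amount=-12.0, mode="rotate")
```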
Test with PET images acquired with a radiotracer other than [11C]PiB
To determine whether the trained model learned uptake patterns specific to [11C]PiB, we tested the trained model on data acquired with a tracer other than [11C]PiB. We reasoned that the trained model would successfully predict a PV-corrected map for another tracer if it had learned the pure partial volume effect on the PET images. To this end, [18F]FDG PET and MR T1-weighted images from 16 participants (three HCs, 10 with MCI, and three with AD) were downloaded from the ADNI database. The MR images were preprocessed using FreeSurfer as for the [11C]PiB data, and coregistration between the PET and MR images using the PoR method, as well as RBV PVC, was performed as for the [11C]PiB data. Prediction of the PV-corrected maps for the [18F]FDG PET data and comparison of the real and predicted maps were implemented in the same manner as in the test with [11C]PiB.