Deep learning-based brain age prediction in normal aging and dementia

Brain aging is accompanied by patterns of functional and structural change. Alzheimer’s disease (AD), a representative neurodegenerative disease, has been linked to accelerated brain aging. Here, we developed a deep learning-based brain age prediction model using a large collection of fluorodeoxyglucose positron emission tomography and structural magnetic resonance imaging scans. We then tested how the brain age gap relates to degenerative syndromes, including mild cognitive impairment, AD, frontotemporal dementia and Lewy body dementia. Occlusion analysis, performed to facilitate interpretation of the model, revealed that the model learns an age- and modality-specific pattern of brain aging. An elevated brain age gap was strongly correlated with cognitive impairment and with an AD biomarker. A higher gap was also predictive of longitudinal progression across clinical categories, including in cognitively unimpaired individuals who converted to a clinical stage. However, the regions generating the brain age gap differed for each diagnostic group, and the AD continuum showed patterns similar to those of normal aging.

The authors developed a deep learning-based model to estimate the brain age gap from metabolic and structural imaging data in cognitively normal individuals and in patients with dementia. An older brain age was associated with Alzheimer’s disease biomarkers and was predictive of future cognitive decline.

The biology of aging is complex 1 and not fully understood 2 . In general, aging is characterized by the gradual accumulation of deleterious biological changes accompanying a progressive loss of function 1 , although this is an oversimplification. The endeavor to better understand the biology of the aging brain is widely relevant because the impact of aging on the human brain and associated changes in cognitive function have implications for quality of life.
Brain aging entails both structural and functional changes. Structural magnetic resonance imaging (MRI) has shown that increased age is associated with reduction of gray matter volume, most prominently in the frontal lobes, insular cortex and hippocampus [3][4][5][6] , increased volume of the ventricular system and intracranial cerebrospinal fluid 3,4,7 and changes in white matter microstructure 7,8 . In addition, functional imaging techniques using positron emission tomography (PET) have shown that brain aging is associated with decreased global oxygen utilization, cerebral blood flow and glucose uptake, as well as regional changes in aerobic glycolysis 9,10 . Age-related decreases in glucose utilization have been found most prominently in the frontal lobes, posterior cingulate and posterior parietal lobes [11][12][13] and also in medial temporal regions, a critical area of pathology in dementia [14][15][16] . In contrast, the primary motor and occipital cortices, the cerebellum and subcortical structures, including the thalamus, putamen and pallidum, are less susceptible to metabolic changes with aging 17 .
Based on these findings, age prediction using brain imaging is an active area of neuroscience research [18][19][20][21][22] . An estimated age can be referred to as 'brain age', which may differ from the individual's chronological age 19 . Recently, growth in data availability and advances in deep learning techniques have allowed more accurate brain age estimation in the cognitively normal population through convolutional neural network (CNN) models [21][22][23][24][25] . In addition, the 'brain age gap', which is the difference between brain age and chronological age, is a promising, personalized biomarker of brain health 19 . On an individual basis, brain age gap measurements may also prove to have prognostic value, potentially predicting health outcomes by capturing individual differences in the interaction of aging and disease 19 . Several studies reported that an overestimation of an individual's age based on neuroimaging, measured as a large brain age gap, is associated with mortality 26 , neurodegenerative diseases 27 and several other clinical conditions 19,20 . Moreover, measuring the brain age gap in cases of neurodegenerative pathology may inform our understanding of disease risk, of resilience to the structural and functional insults that accumulate with aging, and of the effects of disease on the aging brain.
We aimed to develop a deep learning-based brain age prediction model using a large collection of structural brain MRI and fluorodeoxyglucose (FDG) PET scans from participants aged 26-98 years (n = 2,349 unique individuals with 4,127 scans; cognitively unimpaired controls, n = 1,805; cognitively impaired, n = 732). The model was trained on cognitively unimpaired participants aged 30-97 years to learn healthy aging trajectories. We also used occlusion sensitivity analysis to derive age- and modality-specific saliency maps of the CNN model, showing which brain regions contribute most to age prediction for each age subgroup and imaging modality. We estimated the brain age gap in patient groups with mild cognitive impairment (MCI), AD, frontotemporal dementia (FTD) and dementia with Lewy bodies (DLB). We evaluated the associations of the brain age gap with neuropsychological tests and with other imaging AD biomarkers, such as amyloid PET and tau PET, as well as its ability to predict longitudinal disease progression in dementia. Finally, a voxel-wise linear regression analysis evaluated which regional alterations contribute to a higher brain age gap in each disease group, and these were compared with normal brain aging trajectories.

After bias correction, we observed that the correlation between the corrected brain age gap and chronological age decreased, as did the mean absolute error (MAE) (Fig. 2c,f). The overall performance after bias correction across the five folds was MAE = 3.0755 ± 0.1401 years for FDG and 3.4868 ± 0.1631 years for MRI (Supplementary Table 2).
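The MAE and bias-correction steps can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the widely used linear correction, in which predicted age is regressed on chronological age in held-out cognitively unimpaired scans and the fitted slope and intercept are then inverted. The function names are hypothetical, and the exact form of the correction applied in this study is an assumption.

```python
import numpy as np


def mean_absolute_error(predicted, chronological):
    """Mean absolute error (years) between predicted and chronological age."""
    predicted = np.asarray(predicted, dtype=float)
    chronological = np.asarray(chronological, dtype=float)
    return float(np.mean(np.abs(predicted - chronological)))


def fit_bias_correction(predicted, chronological):
    """Fit predicted = slope * chronological + intercept (assumed linear age bias).

    Intended to be fitted on held-out cognitively unimpaired scans only.
    """
    slope, intercept = np.polyfit(np.asarray(chronological, dtype=float),
                                  np.asarray(predicted, dtype=float), deg=1)
    return float(slope), float(intercept)


def correct_brain_age(predicted, slope, intercept):
    """Invert the fitted linear bias to obtain a corrected brain age."""
    return (np.asarray(predicted, dtype=float) - intercept) / slope


def brain_age_gap(brain_age, chronological):
    """Brain age gap = brain age - chronological age (positive = 'older' brain)."""
    return np.asarray(brain_age, dtype=float) - np.asarray(chronological, dtype=float)


# Toy example: predictions shrink toward the mean (slope < 1), a typical age bias.
ages = np.array([40.0, 50.0, 60.0, 70.0, 80.0])
predicted = 0.8 * ages + 13.0
slope, intercept = fit_bias_correction(predicted, ages)
corrected = correct_brain_age(predicted, slope, intercept)
print(mean_absolute_error(predicted, ages), mean_absolute_error(corrected, ages))
```

The same slope and intercept are reused unchanged for patient scans, so that patient brain age gaps are expressed relative to the normative bias.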
To assess whether the trained model presents a dataset-specific bias, the model trained with the Mayo dataset was applied to an independent cohort, the Alzheimer's Disease Neuroimaging Initiative (ADNI; https://adni.loni.usc.edu/) dataset (cognitively unimpaired, n = 330; number of scans, n = 454). Performance was comparable and not statistically different from that on the Mayo dataset (MAE = 3.1398 ± 0.2013 for FDG and MAE = 3.5101 ± 0.2270 for MRI; P = 0.58 and P = 0.84 for FDG and MRI, respectively; Holm-Šídák test), suggesting that the models generalized to the independent dataset (Extended Data Fig. 1a-f and Supplementary Table 2). We also trained a model on the combined Mayo and ADNI datasets (Extended Data Fig. 1g-l). In this case, overall age prediction performance was significantly better than with the Mayo dataset alone (MAE = 2.7383 ± 0.1091 for FDG and MAE = 3.1029 ± 0.2107 for MRI; P = 0.01 and P = 0.005 for FDG and MRI, respectively; Holm-Šídák test; Extended Data Fig. 1m and Supplementary Table 2). The model's performance was also compared with two other architectures, 3D-ResNet and the simple fully convolutional network (SFCN) 31 …

Given the longitudinal nature of our dataset, many participants had serial scans acquired at different time points (mean interscan interval = 2.65 ± 1.14 years for the cognitively unimpaired cohort). The images at each time point could be considered independent data, because the interscan interval was long enough to allow for some changes in the acquired images. However, repeated scans of the same participant still had high similarity to each other. We therefore explored whether these serial images were different enough to serve as independent data points for machine learning, or whether they induce model overfitting and bias, thereby hurting the generalizability of the model.
To test whether within-participant variability affects the model's performance, we compared the prediction accuracy of several data split strategies (detailed in the Methods). As expected, overlap of the same participants between the training and validation or test datasets significantly affected the accuracy of age estimation. Here, overlap means assigning at least one scan of a participant to the training dataset and a different scan of the same participant to the validation or test dataset (two-sample Student's t-test against option 1, P < 0.001 for the validation and test MAEs of option 2 and the validation MAE of option 3; Supplementary Table 4). This pattern was similar for FDG and MRI inputs. Meanwhile, including multiple scans per participant had minimal effect on the model's performance (two-sample Student's t-test; options 4 and 5; Supplementary Table 4).
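The overlap described above can be avoided by splitting at the participant level rather than at the scan level. The sketch below is a hypothetical illustration of such a grouped five-fold split; it does not reproduce the split options compared in the Methods. Scans are assumed to be identified by (participant_id, scan_id) pairs, and all names are illustrative.

```python
import random
from collections import defaultdict


def participant_folds(scans, k=5, seed=0):
    """Split scans into k folds so that all scans of a participant share one fold.

    scans: list of (participant_id, scan_id) pairs (hypothetical format);
    scan_id values are assumed to be unique.
    """
    by_participant = defaultdict(list)
    for participant_id, scan_id in scans:
        by_participant[participant_id].append(scan_id)
    participants = sorted(by_participant)
    random.Random(seed).shuffle(participants)  # reproducible participant order
    folds = [[] for _ in range(k)]
    for index, participant_id in enumerate(participants):
        folds[index % k].extend(by_participant[participant_id])
    return folds


def leaked_participants(folds, scans):
    """Return participants whose scans appear in more than one fold."""
    owner = {scan_id: participant_id for participant_id, scan_id in scans}
    fold_of = {}
    leaked = set()
    for fold_index, fold in enumerate(folds):
        for scan_id in fold:
            participant_id = owner[scan_id]
            if fold_of.setdefault(participant_id, fold_index) != fold_index:
                leaked.add(participant_id)
    return leaked


# 20 hypothetical participants with 3 serial scans each.
scans = [(f"p{i}", f"p{i}_scan{j}") for i in range(20) for j in range(3)]
grouped = participant_folds(scans)
# Naive scan-level round-robin split for comparison.
naive = [[s for n, (_, s) in enumerate(scans) if n % 5 == f] for f in range(5)]
print(len(leaked_participants(grouped, scans)), len(leaked_participants(naive, scans)))  # prints 0 20
```

The naive split spreads the three scans of every participant across different folds. This is the kind of training/test overlap that affected accuracy in the comparison above.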
Saliency map of brain age prediction model. For interpretability of the trained models, saliency maps were estimated through occlusion sensitivity analysis. A portion of the input volume was occluded with a mask (11 × 11 × 11) by setting its voxels to zero. The relevance of that portion to the model's decisions was then estimated indirectly as the change in MAE (MAE_occlusion − MAE_original; Fig. 1b). Saliency patterns were age- and modality-specific (Fig. 3 and Extended Data Fig. 2).
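The occlusion procedure can be sketched as follows. The sketch assumes a trained model exposed as a hypothetical predict callable that maps a batch of volumes to predicted ages. The 11 × 11 × 11 patch size follows the text. The stride, and the averaging of overlapping patches, are assumptions, because neither is stated here.

```python
import numpy as np


def occlusion_saliency(predict, volumes, ages, patch=11, stride=11):
    """Occlusion sensitivity: increase in MAE when a cubic patch is set to zero.

    predict: hypothetical callable mapping an (n, X, Y, Z) array to n predicted ages.
    volumes: (n, X, Y, Z) images; ages: (n,) chronological ages.
    Returns an (X, Y, Z) map of MAE_occlusion - MAE_original, averaged over
    every patch that covers a voxel (relevant only when stride < patch).
    """
    volumes = np.asarray(volumes, dtype=float)
    ages = np.asarray(ages, dtype=float)
    mae_original = np.mean(np.abs(predict(volumes) - ages))
    shape = volumes.shape[1:]
    total = np.zeros(shape)
    count = np.zeros(shape)
    for x in range(0, shape[0], stride):
        for y in range(0, shape[1], stride):
            for z in range(0, shape[2], stride):
                box = (slice(x, x + patch), slice(y, y + patch), slice(z, z + patch))
                occluded = volumes.copy()
                occluded[(slice(None),) + box] = 0.0  # zero the patch in every subject
                mae_occluded = np.mean(np.abs(predict(occluded) - ages))
                total[box] += mae_occluded - mae_original
                count[box] += 1
    return total / np.maximum(count, 1)


def toy_predict(v):
    """Toy 'model' that only looks at the 2 x 2 x 2 corner of each volume."""
    return v[:, :2, :2, :2].mean(axis=(1, 2, 3))


ages = np.array([50.0, 70.0])
volumes = np.ones((2, 8, 8, 8)) * ages[:, None, None, None]
saliency = occlusion_saliency(toy_predict, volumes, ages, patch=4, stride=4)
```

In the toy example, only the patch covering the corner the model depends on raises the MAE, so saliency is nonzero only there. Computing the map separately within each age subgroup would yield age-specific maps like those described in the text.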
For FDG, posterior regions overall, peaking at the posterior cingulate cortex (PCC), contributed more to age prediction in the younger groups (30-40 and 40-50 years). For the 50-60, 60-70 and 70-80-year age groups, the inferior frontal regions, including the orbitofrontal cortex, gyrus rectus and middle frontal regions, showed a higher contribution than other areas. In the older groups (80-90 and 90-100 years), a global contribution, peaking around the inferior frontal cortex, basal ganglia, inferior temporal cortex and pons, was also important for age prediction. For MRI, the insular cortex contributed most to age prediction in the younger groups (30-40 and 40-50 years). From 50 to 60 years, the ventricular boundary showed a higher contribution. The cerebellomedullary cistern showed the highest saliency in the older groups (80-90 and 90-100 years). The coordinates of peak saliency in each age range are summarized in Supplementary Table 5.
Brain age gap estimation in patient groups. The brain age gap of four clinical diagnostic groups (MCI, n = 480, number of scans, n = 666; AD, n = 215, number of scans, n = 372; FTD, n = 45, number of scans, n = 69; DLB, n = 86, number of scans, n = 141) was estimated using the model trained with the normative cohorts. Brain age was corrected using the same coefficients used for bias correction in cognitively unimpaired individuals (Fig. 2). As expected, the brain age gap of all patient groups was significantly higher than that of the cognitively unimpaired group for both modalities (P < 0.001, Holm-Šídák test; Fig. 4a,c). Interestingly, the predicted brain age gap was negatively correlated with chronological age; that is, younger patients had a higher gap (Fig. 4b,d). The mean brain age gap was highest in FTD, a condition with relatively early onset, followed by AD, DLB and MCI.
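The Holm-Šídák correction used for these multiple comparisons can be sketched in a few lines. This is an illustrative step-down implementation, not the statistical software used in the study. It takes the raw p-values of the individual comparisons (for example, each patient group versus cognitively unimpaired) and returns multiplicity-adjusted p-values in the same order.

```python
def holm_sidak(p_values):
    """Holm-Šídák step-down adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending raw p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Šídák adjustment for the (m - rank) hypotheses still under test.
        candidate = 1.0 - (1.0 - p_values[i]) ** (m - rank)
        running_max = max(running_max, candidate)  # keep adjusted p-values monotone
        adjusted[i] = min(running_max, 1.0)
    return adjusted


# Hypothetical raw p-values for three group-versus-control comparisons.
print(holm_sidak([0.01, 0.04, 0.03]))
```

The smallest p-value receives the strictest adjustment (exponent m), and each later one is adjusted for the comparisons that remain. This makes the procedure uniformly more powerful than a single-step Šídák correction.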
As shown in Fig. 4e, the FDG- and MRI-based brain age gaps were significantly correlated with each other (P < 0.001, Pearson's correlation; … Fig. 3). The same models were then applied to the disease groups in the ADNI cohort (MCI, n = 647, number of scans, n = 885; AD, n = 255, number of scans, n = 283), and a similar result was observed (Extended Data Fig. 4). In both modalities, the brain age gap for the MCI and AD groups was significantly higher than that of the cognitively unimpaired group (P < 0.001, Holm-Šídák test; Extended Data Fig. 4a,c). The correlation coefficient between the FDG- and MRI-based brain age gaps in AD was significantly higher than that in the cognitively unimpaired group (P < 0.001, z-test after Fisher's r-to-z transformation; Extended Data Fig. 4e).
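The comparison of correlation coefficients between two independent groups can be sketched as a standard Fisher r-to-z test. The correlations and sample sizes in the example are hypothetical and are not values from this study.

```python
import math


def fisher_z_compare(r1, n1, r2, n2):
    """Two-sided z-test for a difference between two independent Pearson correlations.

    Each r is mapped with Fisher's r-to-z transform, z = atanh(r), whose
    sampling variance is approximately 1 / (n - 3).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided


# Hypothetical correlations in two independent groups of 50 participants each.
z, p = fisher_z_compare(0.8, 50, 0.5, 50)
```

The transform matters because the sampling distribution of r is skewed near ±1, whereas atanh(r) is approximately normal with a variance that depends only on n.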
Associations of brain age gap with demographics and AD biomarkers. A high brain age gap has been linked to greater cognitive impairment 19,20,25 . We therefore tested the association of the corrected brain age gap in the disease groups with three neuropsychological test scores: the Clinical Dementia Rating sum of boxes (CDR-SB) 32 , the Short Test of Mental Status (STMS) 33 and the Mini-Mental State Examination (MMSE) 34 . As expected, both brain age gaps showed significant correlations (P < 0.001, Pearson's correlation; … Table 6).
We then examined the association of the brain age gap with neuroimaging AD biomarkers (Fig. 5). AD is characterized by pathological aggregation of amyloid beta and neurofibrillary tangles, which can be captured by Pittsburgh Compound B (PiB) PET and tau PET, respectively. For PiB PET, only the MCI group reached statistical significance, for both FDG and MRI. However, the correlation coefficient was small and there was no obvious pattern of association in the distribution (Pearson's correlation; Fig. 5a,c). In contrast, tau PET was significantly correlated with the brain age gap in the MCI and AD groups but not in the FTD or DLB groups (Pearson's correlation; Fig. 5b,d).
In particular, the AD group showed a stronger correlation (r = 0.5110 for FDG and r = 0.6648 for MRI). The same pattern was observed in the ADNI dataset (Extended Data Fig. 6): only tau PET was significantly correlated with the brain age gap, whereas amyloid PET showed no association.
We also evaluated the association of sex with the estimated brain age gap (Extended Data Fig. 7). In cognitively unimpaired individuals, females had a significantly lower brain age gap than males in both modalities (two-sample Student's t-test, P < 0.001 and P = 0.001 for FDG and MRI, respectively). This is consistent with previous findings that the female brain has a persistently lower brain age than the male brain 18 . In contrast, in the AD group, the brain age gap was significantly higher in females than in males (two-sample Student's t-test, P = 0.009 and P = 0.005 for FDG and MRI, respectively). Females also had a significantly higher brain age gap in the DLB group, but only in the MRI-based model (two-sample Student's t-test, P = 0.0045).
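The two-sample Student's t-test behind these sex comparisons pools the variance of the two groups. The minimal sketch below computes only the t statistic and its degrees of freedom. The p-value then comes from the t distribution (for example via `scipy.stats.ttest_ind`, whose default `equal_var=True` is the Student form). The group values are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled (equal) variance.

    Returns (t, degrees_of_freedom); the p-value would come from the
    t distribution with that many degrees of freedom.
    """
    n_a, n_b = len(a), len(b)
    # statistics.variance is the sample (n - 1) variance.
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return t, n_a + n_b - 2


# Hypothetical brain age gaps (years) for two small groups.
t, df = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```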
Longitudinal predictive nature of the brain age gap. The longitudinal relevance of the brain age gap was investigated using serial scans. For this analysis, disease progression groups based on serial time points were defined as cognitively unimpaired to cognitively unimpaired, cognitively unimpaired to MCI/AD, MCI to MCI, MCI to AD, MCI to FTD, FTD to FTD, MCI to DLB and DLB to DLB, with the second category representing the most recent diagnostic group assignment. The three patients in the cognitively unimpaired to AD category were included in the cognitively unimpaired to MCI/AD group. The cognitively unimpaired to FTD (n = 1) and cognitively unimpaired to DLB (n = 0) groups were excluded due to an insufficient number of participants. For the cognitively unimpaired cohort, the brain age gap was obtained when each participant was assigned to the test dataset. First, we explored whether the brain age gap at earlier time points (that is, baseline) was associated with the progression of diagnosis at later time points. For this comparison, we included only participants who were cognitively unimpaired at baseline (cognitively unimpaired to cognitively unimpaired and cognitively unimpaired to MCI/AD) or had MCI at baseline (MCI to MCI, MCI to AD, MCI to FTD and MCI to DLB). The baseline brain age gap was then compared within the same baseline groups (Fig. 6a,c). The baseline brain age gap of the cognitively unimpaired to MCI/AD group differed significantly from that of the cognitively unimpaired to cognitively unimpaired group for both modalities (P = 0.001 and P < 0.001 for FDG and MRI, respectively; Holm-Šídák test). The comparison between the MCI to MCI and MCI to AD groups also reached significance in the MRI model (P = 0.005; Holm-Šídák test), while the difference for the FDG model approached but did not reach significance (P = 0.07; Holm-Šídák test). This observation suggests that the baseline brain age gap can predict the progression of cognitive impairment.
The comparison between the MCI to MCI and MCI to FTD groups was statistically significant only in the FDG model (P < 0.001; Holm-Šídák test), and the baseline brain age gap did not differ between the MCI to MCI and MCI to DLB groups for either imaging modality. Because the probability of disease progression increases with longer intervals between consecutive scans, the interscan interval was also compared between groups. The interval of the cognitively unimpaired to cognitively unimpaired group was significantly longer than that of the cognitively unimpaired to MCI/AD group (Extended Data Fig. 8a; Holm-Šídák test). To exclude any bias due to differences in interscan interval, we repeated the comparison of baseline brain age gap after excluding participants with an interscan interval of >2 years (Extended Data Fig. 8b). The baseline brain age gap was still predictive of disease progression at a later time point in both modalities (Holm-Šídák post hoc test; Extended Data Fig. 8c,d).
A similar result was observed in the external ADNI cohort (Extended Data Fig. 9a,c). The comparison between the cognitively unimpaired to cognitively unimpaired and cognitively unimpaired to MCI/AD groups was statistically significant in both models (P = 0.04 and P < 0.001 for FDG and MRI, respectively; two-sample Student's t-test). Only the MRI model showed a statistically significant difference between the MCI to MCI and MCI to AD groups (P = 0.03; two-sample Student's t-test).

Fig. 4 | Regression plots of a corrected brain age gap as a function of chronological age for clinical diagnostic groups. a, Violin plots of corrected brain age gap for each diagnostic group. The corrected brain age gap of disease groups was compared with cognitively unimpaired individuals using a one-way ANOVA with Holm-Šídák multiple comparisons test. Exact P values: cognitively unimpaired versus MCI, P = 1.5 × 10⁻⁹; cognitively unimpaired versus AD, P < 1 × 10⁻¹⁵; cognitively unimpaired versus FTD, P < 1 × 10⁻¹⁵; cognitively unimpaired versus DLB, P < 1 × 10⁻¹⁵; ***P < 0.001. b, FDG-based brain age gap estimation for MCI, AD, FTD and DLB, respectively. The black solid line and dotted lines in each panel represent a regression line and its 95% confidence bands, respectively. c, Violin plots of corrected brain age gap for each clinical diagnosis group. The corrected brain age gap of disease groups was compared with cognitively unimpaired individuals using a one-way ANOVA with Holm-Šídák multiple comparisons test. Exact P values: cognitively unimpaired versus MCI, P = 2.4 × 10⁻¹¹; cognitively unimpaired versus AD, P < 1 × 10⁻¹⁵; cognitively unimpaired versus FTD, P < 1 × 10⁻¹⁵; cognitively unimpaired versus DLB, P < 1 × 10⁻¹⁵; ***P < 0.001. d, MRI-based brain age gap estimation for MCI, AD, FTD and DLB, respectively. The black solid line and dotted lines in each panel represent a regression line and its 95% confidence bands, respectively. e, Relationship between FDG- and MRI-based brain age gap. The black solid line and dotted lines represent a regression line and its 95% confidence bands, respectively. r indicates Pearson's correlation coefficient.
Next, we examined how the longitudinal change in brain age differed across disease groups. For this analysis, the annual rate of change in brain age gap (Δbrain age gap per year) between consecutive scans was compared between groups. In both modalities, the MCI to AD and AD to AD groups showed a significantly higher Δbrain age gap than the cognitively unimpaired to cognitively unimpaired group (Holm-Šídák test; Fig. 6b,d). Only the FDG model showed statistical significance for the FTD to FTD group (P < 0.001, Holm-Šídák test; Fig. 6b). In the ADNI cohort, only the AD to AD group in the FDG model showed a significantly higher Δbrain age gap compared with the cognitively unimpaired to cognitively unimpaired group.

Fig. 5 | Association of brain age gap with meta-ROI PiB and tau PET SUVR. a, Scatter plots of FDG-based brain age gap versus meta-ROI PiB PET SUVR for MCI, AD, FTD and DLB, respectively. b, Scatter plots of FDG-based brain age gap versus meta-ROI tau PET SUVR. c, Scatter plots of MRI-based brain age gap versus meta-ROI PiB PET SUVR. d, Scatter plots of MRI-based brain age gap versus meta-ROI tau PET SUVR. The black solid line and dotted lines in each panel represent a regression line and its 95% confidence bands, respectively. r, Pearson's correlation coefficient; P, correlation test P value.
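The annual rate of change in brain age gap between consecutive scans can be computed as the difference in gap divided by the interscan interval. The sketch below assumes a hypothetical record format of (participant_id, scan_time_in_years, brain_age_gap) and returns one rate per consecutive pair. How the study aggregated multiple pairs per participant is not stated here, so no aggregation is applied.

```python
from collections import defaultdict


def annual_gap_change(records):
    """Δbrain age gap per year between consecutive scans of each participant.

    records: iterable of (participant_id, scan_time_years, brain_age_gap)
    (hypothetical format). Returns {participant_id: [rate, ...]}.
    """
    by_participant = defaultdict(list)
    for participant_id, scan_time, gap in records:
        by_participant[participant_id].append((scan_time, gap))
    rates = {}
    for participant_id, visits in by_participant.items():
        visits.sort()  # chronological order
        rates[participant_id] = [
            (gap2 - gap1) / (t2 - t1)
            for (t1, gap1), (t2, gap2) in zip(visits, visits[1:])
            if t2 > t1  # skip same-day duplicates to avoid division by zero
        ]
    return rates


records = [("a", 0.0, 1.0), ("a", 2.0, 3.0), ("a", 3.0, 3.5),
           ("b", 1.0, -1.0), ("b", 3.5, 0.0)]
rates = annual_gap_change(records)
```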

Brain age gap in dementia and normal aging.
A voxel-wise linear regression analysis was performed using the brain age gap as a regressor to investigate which regional brain alterations were related to a higher brain age gap in each patient group. In this analysis, chronological age was specified as a nuisance covariate because it was negatively correlated with the brain age gap. The FDG- and MRI-based brain age gaps showed different patterns across the disease groups (linear regression, false discovery rate (FDR)-corrected, q < 0.01; Fig. 7a,b). For FDG, the MCI and AD groups showed a negative correlation throughout the brain, meaning that global cortical hypometabolism was associated with a higher brain age gap, while white matter showed a positive correlation (Fig. 7a and Extended Data Fig. 10). In the AD group, the frontal, temporal and parietal regions showed a stronger negative correlation. In contrast, in the FTD group, significant hypometabolism related to the brain age gap was observed in the frontal and temporal regions. Interestingly, the occipital and precentral cortices and the thalamus showed a positive correlation in the FTD group.
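The FDR correction at q < 0.01 can be illustrated with the Benjamini-Hochberg step-up procedure. The text does not name the FDR procedure, so Benjamini-Hochberg is an assumption here, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.01):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (input order) marking tests declared significant
    while controlling the false discovery rate at level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k with p_(k) <= k * q / m; every test up to that rank is rejected.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= k_max
    return significant


# Hypothetical voxel p-values; a real map would have one entry per in-brain voxel.
voxel_p = [0.5, 0.03, 0.01, 0.02]
print(benjamini_hochberg(voxel_p, q=0.05))  # prints [False, True, True, True]
```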
The DLB group showed a significant negative correlation in the posterior and temporal regions, while the precentral cortex and thalamus showed a positive correlation with the brain age gap. MRI, however, showed a distinctly different pattern of prominent regions from FDG (Fig. 7b and Extended Data Fig. 10). In MCI and AD, sulci and white matter showed a positive correlation with the brain age gap, and regions around the gyri and ventricles showed a negative correlation. In contrast, for the FTD and DLB groups, only a marginal local negative correlation was observed around the ventricles. To compare the observed brain age gap-related changes with normal aging, a linear regression analysis was also performed for the cognitively unimpaired group using chronological age as a regressor (Fig. 7c,d). Unlike the occlusion analysis, voxel-wise regression highlights brain regions showing statistically significant associations with normal aging. The regression analysis of age in the cognitively unimpaired group differed from the saliency analysis of the same images. This is plausible, because the model may focus on specific features (even statistically nonsignificant ones), whereas the regression evaluates every voxel across the whole group. As in MCI and AD, a global cortical negative correlation and a positive correlation in white matter were observed on FDG (Fig. 7c). On MRI, a positive correlation in sulci and white matter and a negative correlation in areas around the gyri and ventricles were observed (Fig. 7d). Then, to evaluate the similarity between the brain age gap-related changes and normal aging, a voxel-wise correlation analysis between beta values was performed (Fig. 7e-h). The beta map of each patient group was strongly correlated with that of normal aging for both FDG and MRI (P < 0.001, Pearson's correlation).
The similarity was strongest for MCI, followed by AD, in both modalities (P < 0.001, z-test after Fisher's r-to-z transformation; Fig. 7f,h). The correlation coefficients of the FTD and DLB groups were lower than those of the MCI and AD groups (P < 0.001, z-test after Fisher's r-to-z transformation; Fig. 7f,h).
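The two steps of this analysis can be sketched with plain least squares. The first is a per-voxel regression with chronological age as an optional nuisance covariate. The second is a Pearson correlation between two beta maps across voxels. This is a simplified sketch: images are assumed to be already spatially normalized, masked and flattened to a (subjects, voxels) array, and the t-statistics and FDR thresholding used for Fig. 7a-d are omitted. Function names are hypothetical.

```python
import numpy as np


def voxelwise_betas(images, regressor, nuisance=None):
    """Mass-univariate OLS: for every voxel, image = b0 + b1*regressor (+ b2*nuisance).

    images: (n_subjects, n_voxels) array of masked, spatially normalized images.
    Returns the b1 map (one beta per voxel) for the regressor of interest.
    """
    images = np.asarray(images, dtype=float)
    n = images.shape[0]
    columns = [np.ones(n), np.asarray(regressor, dtype=float)]
    if nuisance is not None:
        columns.append(np.asarray(nuisance, dtype=float))  # e.g. chronological age
    design = np.column_stack(columns)
    coefficients, *_ = np.linalg.lstsq(design, images, rcond=None)
    return coefficients[1]


def beta_map_similarity(beta_a, beta_b):
    """Pearson correlation between two beta maps across voxels."""
    return float(np.corrcoef(beta_a, beta_b)[0, 1])


# Synthetic check: 40 subjects, 4 voxels with known effects of gap and age.
rng = np.random.default_rng(0)
gap, age = rng.normal(size=40), rng.normal(size=40)
true_gap_effect = np.array([1.0, -2.0, 0.5, 3.0])
images = 2.0 + np.outer(gap, true_gap_effect) + np.outer(age, np.ones(4))
gap_betas = voxelwise_betas(images, gap, nuisance=age)
```

For the cognitively unimpaired analysis, the same function would be called with chronological age as the regressor and no nuisance term. The resulting map would then be correlated with each patient group's brain age gap map.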

Discussion
We developed 3D-DenseNet models, trained on structural or metabolic brain images, that accurately estimated an individual's brain age during normal aging. An occlusion analysis revealed anatomical regions critical to model performance and demonstrated an age-dependent saliency pattern that was distinct for each input imaging modality. In cohorts with a neurological disorder, the brain age gap was larger than in cognitively unimpaired individuals and was significantly correlated with cognitive scores and an AD neuroimaging biomarker. Additionally, the brain age gap at baseline predicted the progression of cognitive impairment at a later time point. The anatomical regions most strongly associated with the brain age gap, identified by voxel-wise linear regression, differed for each diagnostic group. The patterns for the AD continuum (MCI and AD) correlated more closely with normal aging than those for FTD or DLB.
Most previous brain age studies were based on structural MRI [21][22][23][24][25][26]31 . To our knowledge, only one prior study utilized FDG PET 18 , but that study was based on a non-deep learning method and a substantially smaller cohort (n = 205) … structural MRI-based studies reported explanation maps of the CNN model 21,22,24 . The structural and functional changes that contribute to precise age prediction in the deep learning approach have yet to be fully elucidated. The longitudinal predictive nature of the brain age gap has not yet been explored in a preclinical group. Furthermore, little is known about which brain alterations and regional changes are associated with higher brain age gaps in patients, or about the relationship between expected biological senescence and pathological processes.

Fig. 7 | Voxel-wise linear regression analysis of the brain age gap. a,b, Clinical diagnosis group-specific (MCI, AD, FTD and DLB) results from voxel-wise whole-brain linear regression examining brain age gap-related changes (FDR-corrected, q < 0.01) for FDG and MRI, respectively. Chronological age was specified as a nuisance covariate. c,d, For cognitively unimpaired individuals, voxel-wise linear regression analysis was performed using chronological age as a regressor to show the age-related change for FDG (c) and MRI (d), respectively. e, Voxel-wise correlation of beta values between the clinical diagnosis group (vertical axis) and the cognitively unimpaired group (horizontal axis) for the FDG model.
Our model precisely estimated an individual's chronological age based on structural and metabolic neuroimaging data. Interestingly, FDG-based brain age prediction was slightly better than MRI-based prediction (Fig. 2 and Supplementary Table 2), suggesting that metabolic data may be more sensitive for tracking normal brain aging trajectories. One consideration is that metabolic changes detectable on PET may precede structural changes observed in AD 35 , although this has not been characterized in cognitively unimpaired individuals. Also, our FDG-based model partially incorporated structural information, because spatial normalization of each FDG scan to template space was performed using the individual's MR images. The FDG-based model therefore benefits from both functional and structural information. The FDG images are also affected by structural changes through partial volume effects. Alternatively, the lower performance of the MRI-based model relative to FDG could be a consequence of regional heterogeneity in age-related structural changes in the brain 36 .
Occlusion analysis revealed regions important for age estimation and showed a distinct age-specific saliency pattern for each input imaging modality (Fig. 3 and Extended Data Fig. 2). In the FDG-based model, a posterior-to-anterior transition was observed with increasing age. Posterior structures, especially the PCC, contributed most in the younger age groups, whereas anterior structures, including the frontotemporal lobes, were more critical in the older age groups. Age-related decline in PCC glucose metabolism has been reported 11 , and amyloid deposition and reduced glucose metabolism in the PCC have been implicated in early AD 37 . In older adults, FDG activity in the frontal regions was more salient; decline of frontal metabolism in normal aging has been consistently reported across several studies 14,38 . The MRI-based model's saliency map highlighted different critical regions than the FDG analysis. For the younger age groups, the insula, a region that undergoes gray matter volume loss with normal aging 39 , was identified as the most critical region. Additionally, the medial temporal lobe showed high saliency in the MRIs of younger (30-50-year-old) individuals; volume loss in this region has previously been described with aging as well as with AD 40 . Preservation of brain parenchyma in the insula and medial temporal lobe of younger individuals may have been a reliable feature for MRI-based age prediction. For the older age groups, the cerebellomedullary cistern and the peripheral boundaries of the ventricles were critical. This may reflect reliance of the model on the typical enlargement of the cerebrospinal fluid (CSF) spaces that occurs with age 3,4,7,41 . Interestingly, the saliency maps did not show a prominent contribution of cortical regions to age estimation, which we had expected given the typical age-dependent decrease in cortical volume seen on MRI 39,41 .
We speculate that cortical changes with age may be too heterogeneous to serve as the most reliable salient feature for the age prediction model. Changes in white matter signal characteristics are also a well-known phenomenon of aging 42 . Our occlusion analysis found no contribution of white matter, which might be a consequence of the white matter intensity normalization performed on MRI.
Interestingly, the estimated brain age gap was negatively correlated with chronological age for both MRI and FDG and was close to zero in the older age groups (Fig. 4). This suggests that the model cannot distinguish normal from diseased brains at older ages. Alternatively, attrition could explain the negative association between chronological age and brain age gap, because individuals with a more diseased brain (that is, a higher brain age gap) are less likely to survive to older ages. The brain age gap of MCI and AD showed a significant association with tau PET but not with amyloid PET (Fig. 5 and Extended Data Fig. 6). Tau is well known to be more closely related to AD severity than amyloid level 43 . In both preclinical AD and AD dementia, tau radiotracer uptake and cortical thickness are correlated with decreased cognitive task performance to a greater degree than amyloid beta radiotracer uptake 43 . Furthermore, the brain age gap predicted disease progression even at the preclinical stage in a longitudinal design (Fig. 6). One prior study reported a higher brain age gap in progressive MCI (diagnosis changing from MCI at baseline to AD at follow-up) than in stable MCI 44 . In that study, Franke and Gaser 44 showed that brain aging accelerates more in progressive MCI and AD than in cognitively unimpaired individuals and stable MCI.
A strong correlation was observed between FDG- and MRI-based brain age gaps in the cognitively unimpaired and neurodegenerative disease groups. With regard to the factors that drive the age prediction model, this suggests that the metabolic changes of normal aging, as well as of disease progression, are concurrent with structural changes. The correlation between FDG- and MRI-based brain age gaps was mildly stronger in the disease groups (r = 0.6548-0.7824) than in the cognitively unimpaired cohort (r = 0.5873). One plausible explanation for the increased correlation in the disease groups is that, in neurodegenerative pathology, hypometabolism is accompanied by structural change or atrophy to a greater extent than in normal aging. Alternatively, brain hypometabolism, which occurs in specific patterns for different categories of neurodegenerative pathology 45 , may correlate more closely with structural or volumetric changes in specific neurodegenerative disease cohorts than in normal aging.
In FTD, the frontal and anterior temporal regions, where hypometabolism is characteristic of FTD 46,47 , showed a negative correlation with the brain age gap; a positive correlation was observed in the occipital lobe, a region typically without hypometabolism in FTD 46,47 . Castelnovo et al. 48 reported occipital hypermetabolism in some cases of FTD. In DLB, the temporal, parietal and occipital regions, which frequently show hypometabolism in DLB 46 , were negatively correlated with the brain age gap. The correlation in the occipital lobe and primary visual cortex in the DLB group is notable because occipital and primary visual cortex hypometabolism distinguishes DLB from other neurodegenerative processes such as AD 46,49 . The ability of the metabolic signature to distinguish DLB from AD is a unique and important component of the clinical utility of FDG PET 50 , since abnormal amyloid PET, a defining hallmark of AD, is commonly present in DLB owing to co-occurring pathologies with advancing age 49 . On MRI, the ventricles and the boundaries between brain parenchyma and the CSF space were correlated with the brain age gap in MCI and AD. For FTD and DLB, the ventricular boundary was correlated with the brain age gap, although no correlation was seen in the CSF and cortical regions. Periventricular borders with the CSF may reflect areas of white matter volume loss, as may the gyral/sulcal interface; both changes also occur with normal aging 3,4,7 . The weaker correlation found in the FTD and DLB groups relative to the MCI and AD groups could be due to the smaller sample sizes of the FTD and DLB cohorts.
Brain age estimation has potential as a useful neuroscientific and prognostic clinical tool, although the concept of a 'brain age' has attracted criticism and debate over whether deviations result from specific pathological processes rather than from an acceleration of normal biological senescence 19 . Using both functional and structural neuroimaging, we demonstrate that the brain age gap-associated changes in MCI and AD are more similar to normal aging than those in FTD and DLB (Fig. 7). The brain age gap estimated by our model may therefore reflect accelerated aging in MCI and AD, whereas in FTD and DLB it may represent specific regional pathology. However, pathological entities and normal aging changes cannot be easily dissociated, because aging results from cumulative biological damage 51 , which suggests that biological aging and disease are intrinsically linked 19 . Relevant to this conceptual framework, we showed that the pattern in MCI correlates with normal aging more than that in AD, suggesting that the milder disease condition is more similar to biological aging; with greater severity along the AD continuum, the pattern of changes became more pathology specific. In this sense, brain age estimation may provide greater insight into the relationship between the aging process and degenerative pathology more broadly. If dementia reflects a continuum of underlying changes in brain structure and metabolism to which all individuals are inevitably susceptible at various rates, neuroimaging-based brain age prediction may yield a better understanding of different brain aging phenotypes. Alternatively, if types of dementia represent entities with mechanisms distinct from those of normal aging, markers of brain age may still prove useful in identifying individuals at greater risk of developing these conditions 52,53 .
This study has some notable limitations. In the occlusion analysis, left hemispheric dominance was observed in the contribution to brain age prediction, which could not be explained by post hoc analysis. Moreover, the occlusion-based method focuses more on the most dominant regions than other interpretation methods do 54 . We tested only neurodegenerative pathology, without evaluating chronic systemic medical diseases or vascular diseases that may have different patterns of brain aging. Previous studies 21,25 trained CNNs on larger MRI samples (approximately 10,000) by aggregating public datasets and were able to greatly improve performance. We did not include cohorts from additional public datasets in which FDG was not available, because we aimed to compare MRI- and FDG-based models to test how the two modalities behave in age prediction tasks for cognitively unimpaired and dementia groups. Although we achieved good prediction accuracy, increasing the training sample size could further improve the model's performance. In addition, excluding voxels without brain parenchyma could reduce the number of parameters and further increase training efficiency.
In summary, we showed that a 3D-DenseNet model generates accurate age predictions for cognitively unimpaired individuals, with slightly more robust performance using FDG PET input than MRI. Brain age prediction using PET imaging, which reflects metabolic function, may provide an assessment of brain health distinct from the structural information evaluated on MRI. The brain age gap from MRI or FDG data is increased in multiple types of dementia compared with cognitively unimpaired individuals; it may therefore prove useful as a composite biomarker to identify increased risk for pathology or as a marker of disease severity.

Dataset. A large number of participants (Table 1), ranging in age from 26 to 98 years, who had both MRI and FDG PET from the Mayo Clinic Study of Aging (MCSA) or the Alzheimer's Disease Research Center (ADRC) study were included (n = 2,349, number of scans = 4,127). All participants or designees provided written informed consent with the approval of the Mayo Clinic and Olmsted Medical Center institutional review boards. As described previously, the Mayo Clinic Rochester ADRC is a longitudinal cohort study that enrolls participants from the clinical practice at the Mayo Clinic in Rochester 55 . The MCSA is a population-based study of cognitive aging among Olmsted County residents 56 . Enrolled participants are adjudicated to be clinically normal or cognitively impaired by a consensus panel consisting of study coordinators, neuropsychologists and behavioral neurologists. Methods for defining clinical unimpairment, MCI and dementia in both of these studies conform to standards in the field [57][58][59] . For this analysis, participants were assigned to 6 clinical subgroups based on clinical diagnosis according to consensus criteria 49,60 , including cognitively unimpaired individuals (Table 1). For the CNN model training, only data from cognitively unimpaired individuals were used.
Some participants also underwent amyloid PET scanning with PiB (number of scans = 2,508) and tau PET scanning with flortaucipir (number of scans = 608). Clinical Dementia Rating Sum of Boxes (CDR-SB) 32 , Short Test of Mental Status (STMS) 33 and Mini-Mental State Examination (MMSE) 34 scores were available for some participants (number of scans = 1,522, 1,491 and 1,587, respectively). All cognitive tests were administered by experienced psychometrists and supervised by board-certified clinical neuropsychologists. To examine whether the trained model presented a dataset-specific bias, we also used the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (n = 1,150, number of scans = 1,622; Supplementary Table 1). The ADNI dataset included cognitively unimpaired participants (n = 330, number of scans = 454) and participants with MCI (n = 647, number of scans = 885) and dementia (n = 255, number of scans = 283). Some ADNI participants also underwent amyloid PET scanning with AV45 (number of scans = 1,464) and tau PET scanning with flortaucipir (number of scans = 283).

Image processing. T1-weighted MRI scans were acquired on 3T scanners. FDG PET imaging was performed with 18F-fluorodeoxyglucose, amyloid PET with PiB 61 and tau PET with 18F-flortaucipir (AV-1451) (ref. 62 ). FDG PET images were acquired 30–40 min, PiB PET 40–60 min and tau PET 80–100 min after injection. Computed tomography was obtained for attenuation correction. PET images were analyzed with our in-house fully automated image processing pipeline 63 . Briefly, each PET scan was co-registered to the participant's MRI from the same time point and subsequently warped to the Mayo Clinic Adult Lifespan Template (MCALT) space 64 (https://www.nitrc.org/projects/mcalt/) using the warps from SPM12 unified segmentation 65 . The corresponding MRI was corrected for intensity inhomogeneity and segmented using MCALT tissue priors and segmentation parameters. The FDG PET standardized uptake value ratio (SUVR) was calculated by dividing by the median uptake in the pons, and the SUVR images were used as input data to the CNN model. Amyloid and tau PET SUVRs were calculated by dividing by the median uptake in the cerebellar crus gray matter 66 . A meta-region of interest (ROI) PiB PET SUVR was derived from the average of the median SUVR in the prefrontal, orbitofrontal, parietal, temporal, anterior cingulate and posterior cingulate/precuneus regions 66 . A meta-ROI tau PET SUVR was formed from the average of the median uptake in the amygdala, entorhinal cortex, fusiform, parahippocampal, inferior temporal and middle temporal gyri 66 . For each MRI volume, voxel intensities were normalized by dividing by the mean intensity derived from the individualized white matter mask after spatial normalization, and the resulting image was used as input data to the CNN model 67 .
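The SUVR and intensity-normalization steps above reduce to a few array operations. The sketch below assumes the PET and MRI volumes, the reference-region and white matter masks, and an integer ROI atlas are already co-registered NumPy arrays in the same template space. The function names and the atlas label scheme are hypothetical, and the sketch is not the in-house pipeline.

```python
import numpy as np


def suvr_image(pet, reference_mask):
    """Divide every voxel by the median uptake inside a reference region.

    reference_mask: boolean array shaped like pet (pons for FDG; cerebellar
    crus gray matter for PiB and tau).
    """
    pet = np.asarray(pet, dtype=float)
    return pet / np.median(pet[reference_mask])


def meta_roi_suvr(suvr, roi_labels, roi_ids):
    """Average of the per-region median SUVR over the listed ROI ids.

    roi_labels: integer atlas image aligned with suvr (hypothetical labels).
    """
    regional_medians = [np.median(suvr[roi_labels == rid]) for rid in roi_ids]
    return float(np.mean(regional_medians))


def normalize_mri(t1, white_matter_mask):
    """Scale a spatially normalized T1 image by its mean white matter intensity."""
    t1 = np.asarray(t1, dtype=float)
    return t1 / t1[white_matter_mask].mean()
```

Because the meta-ROI value averages regional medians rather than pooling all voxels, large and small regions contribute equally to it.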

3D-DenseNet architecture and training.
A modified 3D-DenseNet model 28 was trained on FDG PET or MRI scans of the cognitively unimpaired cohort (Fig. 1a). For training, we used only scans from the first time point (number of scans = 1,805) to avoid data leakage between the training and validation/test datasets. Experiments measuring how an overlap of participants among training, validation and test datasets affected the model's results were performed separately (see the Dataset split experiment section). A schematic of the 3D-DenseNet architecture is shown in Fig. 1a. The input dimension was 121 × 145 × 121 voxels, and the output was a single scalar representing chronological age (years). The architecture consisted of a regular 5 × 5 × 5 convolutional layer followed by 4 dense blocks with 3 transition blocks between them. The 4 dense blocks consisted of 3, 6, 12 and 8 dense layers, respectively (denoted above each block). Each dense layer had a 1 × 1 × 1 bottleneck convolutional layer followed by a 3 × 3 × 3 convolutional layer. The dense layers within each block were densely interconnected in a feed-forward manner, and the growth rate (k) was 48. The flattened output of the final global average pooling layer was fully connected to a layer of 1,457 units, which was connected to the output layer.
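Dense connectivity means each layer's new feature maps are concatenated onto all earlier maps in its block, so the channel count grows by k per layer. The hypothetical helper below traces that growth for the reported block configuration (3, 6, 12, 8) and k = 48. The 96-channel stem (2k) and the 0.5 compression in each transition block are DenseNet-BC defaults assumed for illustration; this section does not report them.

```python
def densenet_channels(stem_channels=96, growth_rate=48,
                      block_layers=(3, 6, 12, 8), compression=0.5):
    """Trace feature-map channels through dense and transition blocks.

    stem_channels and compression are assumed DenseNet-BC defaults,
    not values reported for this model.
    """
    channels = stem_channels
    trace = []
    for i, n_layers in enumerate(block_layers):
        # Each dense layer concatenates growth_rate new maps onto its input.
        channels += n_layers * growth_rate
        trace.append(("dense", i + 1, channels))
        # Transition blocks sit only between dense blocks (3 for 4 blocks).
        if i < len(block_layers) - 1:
            channels = int(channels * compression)
            trace.append(("transition", i + 1, channels))
    return trace


for stage in densenet_channels():
    print(stage)
```

Under these assumptions the last dense block emits 774 channels, which global average pooling reduces to a 774-element vector ahead of the 1,457-unit fully connected layer.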
The neural network was implemented using Keras with TensorFlow 68 as the backend. Cross-validated experiments were conducted using fivefold cross-validation (60% training, 20% validation and 20% test datasets). The MAE was used as the loss function. The model was optimized using the Adam optimizer with β1 = 0.9 and β2 = 0.99 (ref. 69 ). He initialization was used for the weights 70 . The model was trained for 150 epochs. The initial learning rate was 1 × 10⁻⁴ and was decreased by a factor of 2 every 10 epochs; if the validation error did not improve for seven epochs, the learning rate was updated. Hyperparameters were optimized based on validation set performance in the tuning stage, starting from an initial grid search (batch size: 2, 4; learning rate: 1 × 10⁻¹, 1 × 10⁻², 1 × 10⁻³, 1 × 10⁻⁴, 1 × 10⁻⁵). The loss function, optimizer, learning rate scheduler and early stopping callback were fixed throughout the tuning stage. The total number of parameters was 70,183,073, of which 70,122,657 were trainable. We used a mini-batch size of four. Training and testing were performed on a Tesla P100 GPU.
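One way to realize a 60/20/20 fivefold scheme is to deal participants into five groups and rotate which group serves as the test set and which as the validation set. The sketch below assumes that rotation (the section does not say how folds were assigned) and also writes out the stated step decay, halving 1 × 10⁻⁴ every 10 epochs. `rotating_folds` and `step_decay` are hypothetical names.

```python
import random


def rotating_folds(participant_ids, n_folds=5, seed=0):
    """Yield (train, val, test) participant lists for each fold.

    Assumed scheme: shuffle once, deal into n_folds groups; fold i tests on
    group i, validates on group (i + 1) % n_folds and trains on the rest
    (60/20/20 when n_folds = 5).
    """
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    groups = [ids[g::n_folds] for g in range(n_folds)]
    for i in range(n_folds):
        val_idx = (i + 1) % n_folds
        train = [p for g, grp in enumerate(groups)
                 if g not in (i, val_idx) for p in grp]
        yield train, groups[val_idx], groups[i]


def step_decay(epoch, initial_lr=1e-4, factor=0.5, step=10):
    """Halve the learning rate every `step` epochs: 1e-4, 5e-5, 2.5e-5, ..."""
    return initial_lr * factor ** (epoch // step)
```

In Keras, such a schedule could be passed as `tf.keras.callbacks.LearningRateScheduler(lambda epoch: step_decay(epoch))`; the one-argument lambda keeps the callback from feeding the current learning rate back in as `initial_lr`.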
For comparison with DenseNet, we also used the 3D-ResNet101 (ref. 71 ) and SFCN 31 architectures. We compared performance only with other deep learning models: although non-deep learning models can offer greater interpretability, it has been well established that deep learning models perform substantially better, and feature extraction may be problematic for non-deep learning models. For ResNet training, we used the Adam optimizer with an initial learning rate of 0.001. For SFCN, we used a stochastic gradient descent optimizer with an initial learning rate of 0.01 and an L2 weight decay parameter of 0.001. The source code is available at https://github.com/Neurology-AI-Program/Brain_age_prediction.git.

Occlusion sensitivity analysis.
To facilitate interpretability, we generated brain maps of the features relevant to the age prediction model using occlusion sensitivity analysis 72 . The analysis was conducted within the test dataset. To calculate age-specific saliency maps, data were separated into 7 age subgroups by chronological age, from 30 to 100 years in 10-year intervals. Within each subgroup, the original images were occluded with zero-valued 11 × 11 × 11 voxel patches placed on an 11 × 11 × 11 grid (Fig. 1b). Because the front and rear 12 voxels along the anterior-posterior axis contain no brain tissue, they were excluded from occlusion to reduce the computational load. Age inference on the occluded images was then performed with our pretrained 3D-DenseNet model, and performance was evaluated as MAE_occlusion. The delta MAE was obtained as the difference between MAE_occlusion and MAE_original, the latter computed from the original images; iterating the occlusion over every grid position (n = 1,331) yielded an 11 × 11 × 11 delta MAE matrix. This matrix was reconstructed to the original image size (121 × 145 × 121) through cubic interpolation, with zero padding for the area excluded from occlusion, and averaged across the 5 folds. Finally, the map was normalized by dividing by its maximum value, so that the values of the final saliency map ranged from 0 to 1.
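The occlusion procedure can be sketched in NumPy as below. `predict_fn` stands in for the trained 3D-DenseNet (any callable mapping a batch of volumes to predicted ages), and the function name is hypothetical. Two simplifications are assumptions of this sketch: upsampling uses nearest-neighbour repetition instead of the cubic interpolation described above, and each occluded patch is restored in place rather than copying the whole batch for every grid position.

```python
import numpy as np


def occlusion_saliency(volumes, ages, predict_fn, patch=11, margin=12,
                       normalize=True):
    """Occlusion-sensitivity map for an age-regression model.

    volumes: (n, X, Y, Z) images; ages: (n,) chronological ages;
    predict_fn: maps an (n, X, Y, Z) array to (n,) predicted ages.
    `margin` voxels at each end of axis 1 (anterior-posterior) are skipped.
    """
    work = np.array(volumes, dtype=float)   # private copy, edited in place
    ages = np.asarray(ages, dtype=float)
    _, X, Y, Z = work.shape
    gx, gy, gz = X // patch, (Y - 2 * margin) // patch, Z // patch
    mae_original = np.mean(np.abs(predict_fn(work) - ages))

    delta = np.zeros((gx, gy, gz))
    for i in range(gx):
        for j in range(gy):
            for k in range(gz):
                region = (slice(None),
                          slice(i * patch, (i + 1) * patch),
                          slice(margin + j * patch, margin + (j + 1) * patch),
                          slice(k * patch, (k + 1) * patch))
                saved = work[region].copy()
                work[region] = 0.0                  # occlude with zeros
                mae_occ = np.mean(np.abs(predict_fn(work) - ages))
                work[region] = saved                # restore for next patch
                delta[i, j, k] = mae_occ - mae_original

    # Nearest-neighbour upsampling (assumed; the study used cubic), then
    # zero padding over the excluded anterior/posterior margins.
    up = delta.repeat(patch, 0).repeat(patch, 1).repeat(patch, 2)
    saliency = np.zeros((X, Y, Z))
    saliency[:gx * patch, margin:margin + gy * patch, :gz * patch] = up
    if normalize and saliency.max() > 0:
        saliency /= saliency.max()
    return saliency
```

To mirror the fivefold averaging, one would call this with `normalize=False` for each fold's model, average the five maps, and only then divide by the maximum.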

Dataset split experiment.
To measure how including multiple time points per participant affected brain age prediction, we tested five data split options. The main results were derived from the strictest option, option 1, which uses only a single time point per participant. Four additional options were tested: option 2 (multiple time points per participant, with overlap permitted among the training, validation and test datasets); option 3 (multiple time points per participant, with overlap permitted between the training and validation datasets); option 4 (multiple time points for the training and validation datasets and a single time point for the test dataset, with no participant overlap among the training, validation and test datasets); and option 5 (a single time point for the validation and test datasets, with no participant overlap among the training, validation and test datasets). For options 2–5, the validation and test MAE from fivefold cross-validation were compared with those of option 1 (Supplementary Table 4).
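Option 1 and the no-overlap guarantee of options 4 and 5 come down to two small operations: keeping each participant's earliest scan, and checking that no participant contributes scans to two splits. This sketch assumes scans arrive as hypothetical `(participant_id, scan_date, scan_id)` tuples with comparable dates; the function names are also hypothetical.

```python
def first_timepoint(scans):
    """Option 1: keep only the earliest scan per participant.

    scans: iterable of (participant_id, scan_date, scan_id) tuples; dates
    may be any comparable type (datetime.date, ISO-format strings, ...).
    """
    earliest = {}
    for pid, date, scan_id in scans:
        if pid not in earliest or date < earliest[pid][0]:
            earliest[pid] = (date, scan_id)
    return {pid: scan_id for pid, (_, scan_id) in earliest.items()}


def participant_overlap(split_a, split_b, scan_to_pid):
    """Participants with scans in both splits; under options 1, 4 and 5
    this should be empty for every pair of training, validation and test."""
    pids_a = {scan_to_pid[s] for s in split_a}
    pids_b = {scan_to_pid[s] for s in split_b}
    return pids_a & pids_b
```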
Statistical analysis. Brain age prediction accuracy was assessed by MAE and by Spearman's correlation between predicted and chronological age. Defining x as chronological age and y as predicted age, the brain age gap was calculated as y − x. The brain age gap is known to be correlated with chronological age, resulting in overestimation for younger individuals and underestimation for older individuals 21,31 due to regression dilution 29 . We therefore applied the linear bias correction method described in Smith et al. 30 to the brain age gap. We fitted a linear regression y = ax + b to the test dataset, and the corrected brain age gap was calculated as (y − b)/a − x. The coefficients a and b derived from the cognitively unimpaired group were applied in the same way to the other diagnostic groups. The corrected brain age gap of each disease group was compared with that of cognitively unimpaired participants using a one-way analysis of variance (ANOVA) with Holm-Šídák post hoc test. Pearson's correlation coefficient was used to test for an association between the FDG- and MRI-based brain age gaps, and these coefficients were compared between the cognitively unimpaired and disease groups using a z-test after Fisher's r-to-z transformation. Pearson's correlation coefficient was also used to test for associations of the corrected brain age gap with cognitive scores and with meta-ROI amyloid and tau PET SUVR. Pairwise comparisons of the corrected brain age gap between females and males within clinical groups were evaluated using a two-sample Student's t-test.
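The bias correction and the Fisher r-to-z comparison are short enough to write out. In the sketch below (hypothetical function names), `fit_age_bias` would be run on the cognitively unimpaired test predictions, and the same `a` and `b` then passed to `corrected_gap` for every diagnostic group, as described above. `compare_correlations` uses the usual standard error for two independent samples.

```python
import math

import numpy as np


def fit_age_bias(chron_age, pred_age):
    """Fit y = a*x + b, with x = chronological age and y = predicted age.

    Run on the reference (cognitively unimpaired) group only.
    """
    x = np.asarray(chron_age, dtype=float)
    y = np.asarray(pred_age, dtype=float)
    a, b = np.polyfit(x, y, 1)   # degree-1 fit returns [slope, intercept]
    return float(a), float(b)


def corrected_gap(chron_age, pred_age, a, b):
    """Bias-corrected brain age gap (y - b) / a - x for any group."""
    x = np.asarray(chron_age, dtype=float)
    y = np.asarray(pred_age, dtype=float)
    return (y - b) / a - x


def compare_correlations(r1, n1, r2, n2):
    """Two-sided z-test for two independent Pearson correlations.

    Each r is mapped to Fisher's z = atanh(r); the difference is divided
    by its standard error sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = math.erfc(abs(z) / math.sqrt(2))   # P(|Z| > |z|), standard normal
    return z, p
```

With illustrative sample sizes (not the study's), `compare_correlations(0.7, 103, 0.5, 103)` gives z ≈ 2.25, a two-sided p below 0.05.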
For the longitudinal analysis, disease progression groups were defined from the diagnoses at the time points of the serial scans: cognitively unimpaired to cognitively unimpaired, cognitively unimpaired to MCI/AD, MCI to MCI, MCI to AD, MCI to FTD, FTD to FTD, MCI to DLB and DLB to DLB, with the second category representing the most recent diagnostic group assignment. The baseline (that is, earlier time point) brain age gap was then compared within groups sharing the same baseline diagnosis (that is, cognitively unimpaired baseline: cognitively unimpaired to cognitively unimpaired versus cognitively unimpaired to MCI/AD; MCI baseline: MCI to MCI versus MCI to AD, MCI to FTD and MCI to DLB) using a one-way ANOVA with Holm-Šídák post hoc test. An annualized Δ brain age gap was calculated as the change in brain age gap from the baseline to the follow-up scan divided by the time between scans in years. The annualized Δ brain age gap of each group was then compared with that of the cognitively unimpaired to cognitively unimpaired group using a one-way ANOVA with Holm-Šídák post hoc test. A voxelwise regression analysis with the brain age gap as a regressor was performed to investigate which regional brain alterations were associated with the brain age gap in each patient group, with each individual's chronological age specified as a nuisance covariate. For cognitively unimpaired participants, the same analysis was performed using chronological age as the regressor. Statistical significance was corrected for multiple comparisons using a false discovery rate (FDR) 73 with a cluster size of at least 100 adjacent voxels. Three-dimensional models of the cortical surfaces were reconstructed with the standard recon-all command (FreeSurfer v7.1.1) 74 and visualized using the SUMA software (https://afni.nimh.nih.gov/Suma). The similarity of beta maps between the cognitively unimpaired and disease groups was evaluated using voxelwise Pearson's correlation analysis.
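The annualized change and the voxelwise regression can be sketched as follows, assuming images are already flattened to a subjects × voxels matrix in template space. The function names are hypothetical. The sketch fits ordinary least squares with an intercept at every voxel and returns the coefficient of the regressor of interest. It omits the FDR correction and 100-voxel cluster threshold used for inference.

```python
import numpy as np


def annualized_delta_gap(gap_baseline, gap_followup, years_between):
    """Change in brain age gap per year between baseline and follow-up."""
    base = np.asarray(gap_baseline, dtype=float)
    follow = np.asarray(gap_followup, dtype=float)
    return (follow - base) / np.asarray(years_between, dtype=float)


def voxelwise_beta(images, regressor, nuisance=None):
    """OLS at every voxel; returns the regressor's beta map.

    images: (n_subjects, n_voxels) flattened, spatially normalized scans;
    regressor: (n_subjects,) e.g. brain age gap; nuisance: optional
    (n_subjects,) covariate such as chronological age.
    """
    columns = [np.ones(len(regressor)), np.asarray(regressor, dtype=float)]
    if nuisance is not None:
        columns.append(np.asarray(nuisance, dtype=float))
    design = np.column_stack(columns)          # (n_subjects, 2 or 3)
    betas, *_ = np.linalg.lstsq(design, images, rcond=None)
    return betas[1]                            # row 0 is the intercept


def beta_map_similarity(beta_a, beta_b):
    """Voxelwise Pearson correlation between two beta maps."""
    return float(np.corrcoef(beta_a, beta_b)[0, 1])
```

Solving all voxels in one `lstsq` call works because the design matrix is shared across voxels; only the response columns differ.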
These correlation coefficients were then compared between disease groups using a z-test after Fisher's r-to-z transformation. All analyses were performed with MATLAB v.9.4 (MathWorks) and Prism v. 9.1.2 (GraphPad Software).
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The Mayo dataset that supports the findings of this study is not publicly available. Anonymized data are available from the corresponding author upon reasonable request. The MRI and PET data from ADNI are available to researchers via the data access procedure described at http://adni.loni.usc.edu/data-samples/access-data/. Source data are provided with this paper.