Datasets
We examined a publicly available MRI dataset of 148 healthy adult participants aged 18-75 years (mean age = 30.01 ± 12.76 years; 64.1% female; age-by-sex histogram shown in Figure 1) [20]. During MRI acquisition (detailed below), participants were instructed to remain completely still during one scan and to nod their head during the others. To create different levels of motion artifacts, the word “MOVE” was presented (in Hungarian) for 5 seconds, either 5 or 10 times, evenly spaced during image acquisition. Nodding was used to induce motion because it is reportedly the most prominent type of head motion and is responsible for most MRI artifacts [21]. This procedure yielded images with minimal head motion, as well as images with slight and more excessive head motion. Participants gave informed consent and reported no neurological or psychiatric diseases. Collection protocols were approved by the National Institute of Pharmacy and Nutrition in Hungary. These repeated scans, with varying levels of motion, allowed for direct evaluation of the impact of motion artifacts on MRI image quality and brain age calculations.
MRI Data Acquisition
The dataset includes whole-brain T1-weighted MRI images acquired on a Siemens Magnetom Prisma 3T MRI scanner with the following parameters: MPRAGE sequence with 2-fold in-plane GRAPPA acceleration, isotropic voxel size = 1 mm³, echo time (TE) = 3 ms, flip angle = 9°, inversion time (TI) = 900 ms, and repetition time (TR) = 2300 ms. This dataset is publicly available at: https://openneuro.org/datasets/ds004173/
Brain Age Algorithms
We deployed five brain age algorithms on this dataset: Cole et al. [14] (referred to as “brainageR”), Kaufmann et al. [15] (referred to as “XGBoost”), Bashyam et al. [16] (referred to as “DeepBrainNet”), Han et al. [22] (referred to as “ENIGMA”), and Leonardsen et al. [23] (referred to as “pyment”). We selected these algorithms based on their popularity in recent brain age publications and their open-access code. Pyment, DeepBrainNet, and brainageR operate on raw (or minimally preprocessed) T1-weighted MRI scans. In contrast, XGBoost and ENIGMA require preprocessing with Freesurfer [24], an open-source MRI processing software package. Of note, Freesurfer processing did not complete successfully for 7 of the 148 participants due to extensive noise. We provide brief summaries of the algorithms below. For detailed descriptions of model structure, please see the original papers cited here.
brainageR (Cole et al., 2018) Brain Age Algorithm
brainageR uses Gaussian Process Regression to predict brain age from unprocessed, T1-weighted MRI images [14]. Relevant code is available at: https://github.com/james-cole/brainageR. This software uses SPM12 for segmentation and normalization with custom brain templates, and loads the resulting images into R using the RNifti package. Gray matter, white matter, and CSF vectors are then used to predict a brain age value with a model previously trained with kernlab. This algorithm was trained on a sample (N = 2001) of healthy adults aged 18-90, including scans from 14 different studies. After model building and tuning, Cole et al. found a strong positive correlation between brain age and chronological age (r = 0.92).
DeepBrainNet (Bashyam et al., 2020) Brain Age Algorithm
DeepBrainNet is a 2D convolutional neural network (CNN) built on the Inception-ResNet-v2 architecture [16]. Notably, this model was initialized with random weights and trained exclusively on MRIs to create a brain-specific model. With this algorithm, raw, unprocessed, T1-weighted MR images are N4 bias-corrected, skull-stripped, and affine-registered to an MNI template. This algorithm was implemented through the ANTsRNet package, an implementation of Advanced Normalization Tools (ANTs) in the R programming language [25]. Relevant code for this algorithm is located here: https://github.com/ANTsX/brainAgeR. This algorithm was originally trained on a sample (N = 11,729) of healthy controls aged 3-95 drawn from 18 different datasets. Bashyam and colleagues found a correlation of r = 0.978 between predicted brain age and chronological age; however, those authors purposefully selected a “moderately fit” model over a loosely or tightly fit model, motivated by the belief that a moderately fit model would better reveal individual differences in pathology.
XGBoost (Kaufmann et al., 2019) Brain Age Algorithm
XGBoost uses gradient tree boosting to predict brain age from 1118 features extracted using Freesurfer [15]. These features consist of thickness, area, and volume measurements from a multimodal parcellation of the cerebral cortex, cerebellum, and subcortex. Relevant code is available at: https://github.com/tobias-kaufmann/brainage. This algorithm was trained on a large and diverse sample (N = 39,827, female = 18,990) of healthy controls aged 3-89 drawn from 42 different datasets. To account for potential sex differences, Kaufmann et al. trained separate models for male and female brain age. We deployed this algorithm by first completing standard processing in Freesurfer 7.1 (http://surfer.nmr.mgh.harvard.edu). The technical details of this software suite are described in prior publications [26–29]. Briefly, this processing includes motion correction and intensity normalization of T1-weighted images, removal of non-brain tissue [30], automated Talairach transformation, segmentation of white matter and gray matter volumetric structures, and derivation of cortical thickness. Freesurfer processing was implemented via Brainlife.io (brainlife/app-freesurfer), a free, publicly funded, cloud-computing platform for developing reproducible neuroimaging processing pipelines and sharing data [31–33].
ENIGMA (Han et al., 2021) Brain Age Algorithm
The ENIGMA algorithm uses ridge regression on Freesurfer features (Freesurfer methods are noted above in the XGBoost description). The algorithm was developed using data from N = 6989 participants (N = 4314 healthy participants; N = 2675 individuals with MDD) [22]. Structural MRI measures output by Freesurfer for the left and right hemispheres were combined, resulting in 77 brain features including subcortical volumes, cortical thickness, and surface area. Because past research suggests age-related changes in cortical thickness, surface area, and volumes, all three modalities were included and combined to improve model performance. Normative models were then estimated in a training sample of male and female controls. Further model development occurred through multivariate cross-validation aiming to minimize mean absolute error (MAE) between predicted and chronological age. Relevant code for this algorithm is located here: https://photon-ai.com/enigma_brainage.
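As a generic illustration of ridge regression evaluated by MAE (not the ENIGMA model itself), the following Python sketch fits a ridge model in closed form and scores it with mean absolute error. The two toy features, the penalty value lam, and all function names are hypothetical; the published model uses 77 Freesurfer features and the code at the address above.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(X, y, lam):
    """Ridge regression: minimize squared error + lam * ||beta||^2.

    Features and target are centered so the intercept is not penalized.
    Returns (intercept, beta).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations: (Xc^T Xc + lam * I) beta = Xc^T yc
    gram = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    beta = solve(gram, rhs)
    intercept = y_mean - sum(b * mu for b, mu in zip(beta, x_mean))
    return intercept, beta


def predict(intercept, beta, X):
    """Predicted age for each row of X."""
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]


def mae(predicted, observed):
    """Mean absolute error between predicted and chronological age."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Toy data (hypothetical): two features per participant, age as target.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
age_toy = [12.0, 13.0, 15.0, 17.0]
b0, coefs = fit_ridge(X_toy, age_toy, lam=1.0)
toy_mae = mae(predict(b0, coefs, X_toy), age_toy)
```

In a cross-validation setting, the same mae function would be computed on held-out participants for each candidate penalty, and the penalty with the lowest held-out MAE retained.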
Pyment (Leonardsen et al., 2022) Brain Age Algorithm
The pyment algorithm implements a Simple Fully Convolutional Network on raw structural magnetic resonance images. The training dataset was one of the largest and most diverse datasets assembled (N = 53,542), stratified by age and study before being split into training and test datasets [23]. Models relied on TensorFlow and CUDA, using a batch size of 14, 80/20 dataset splits for training and validation, hyperparameters chosen using heuristics (rather than full grid searches), and stochastic gradient descent with a stepwise learning rate schedule with 3 steps. MAE on the validation set was used to pick the best epoch. Relevant code for this algorithm is located here: https://github.com/estenhl/pyment-public
MRI Image Quality Assessment
While head motion varied based on instructions to participants, we also quantitatively measured image quality, an indirect assessment of head motion. Specifically, we generated a quantitative metric (“CAT12 score”) using the Computational Anatomy Toolbox 12 (CAT12). This metric considers four summary measures of image quality: noise-to-contrast ratio, coefficient of joint variation, inhomogeneity-to-contrast ratio, and root-mean-squared voxel resolution. CAT12 normalizes and combines these measures using a kappa statistic-based framework. The score ranges from 0 to 1, with 0 being the lowest quality and 1 being the highest quality. This measure was used for two purposes: 1) to confirm different levels of motion artifacts across repeated scans; and 2) to allow for a more continuous investigation of the impact of image quality (and the associated head motion) on brain age estimates.
Statistical Analyses
We first calculated bivariate correlations between algorithms for the no-motion scans, examining relations between raw brain age, brain age delta, and image quality. To assess the reliability of brain age calculation by algorithm, we used two approaches: intraclass correlation coefficient (ICC) and Bland–Altman analysis. ICC is a descriptive statistic indicating the degree of agreement between two or more sets of measurements. The statistic is similar to a bivariate correlation coefficient insofar as it has a range from 0 to 1 and higher values represent a stronger relationship. An ICC differs from a bivariate correlation in that it operates on groups of measurements and indicates the numerical cohesion across those groups [34]. We calculated ICCs in the statistical programming language R, with the icc function from the package “irr” [35]. A two-way model with absolute agreement was used to investigate the exact estimate of brain age for each repeated scan. Although there are no definitive guidelines for precise interpretation of ICCs, results have frequently been binned into four quality groups where 0.0–0.5 is “poor”, 0.50–0.75 is “moderate”, 0.75–0.9 is “good” and 0.9–1.0 is “excellent” [36].
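To make the two-way, absolute-agreement ICC concrete, the following Python sketch computes it from the standard ANOVA mean squares and maps the result onto the four quality bins. This is an illustration rather than the R code used in the analysis: it assumes the single-measurement form of the ICC, and the function names and the handling of values that fall exactly on a bin boundary are our own choices.

```python
def icc_agreement_single(ratings):
    """Two-way, absolute-agreement, single-measurement ICC.

    ratings: list of rows, one row per participant and one column per
    repeated scan (e.g. no-, low-, and high-motion brain age estimates).
    """
    n = len(ratings)       # participants
    k = len(ratings[0])    # scans per participant
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-participant mean square
    msc = ss_cols / (k - 1)               # between-scan mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_label(value):
    """Map an ICC onto the four bins; boundary values go to the higher bin (assumption)."""
    if value < 0.5:
        return "poor"
    if value < 0.75:
        return "moderate"
    if value < 0.9:
        return "good"
    return "excellent"


# Toy example (hypothetical values): a constant 5-year offset between two scans
# keeps the rank order intact but lowers absolute agreement below 1.
toy = [[20.0, 25.0], [30.0, 35.0], [40.0, 45.0], [50.0, 55.0]]
toy_icc = icc_agreement_single(toy)
toy_label = icc_label(toy_icc)
```

The toy example shows why absolute agreement was chosen: a systematic shift in predicted age between scans reduces this ICC, whereas a consistency-type ICC would ignore it.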
Additionally, Bland–Altman analyses investigate reliability by considering the differences between paired groups of measurements. Given that we had three scans per participant, we compared the following pairings of scans: 1) no versus low-motion; and 2) no versus high-motion. Each comparison yielded a “bias” score, defined as the difference divided by the mean value for a given pairing of measurements. We averaged the no versus low-motion and no versus high-motion Bland–Altman metrics. These methods were applied to raw brain age, as well as to the brain age “gap” (raw brain age minus chronological age; also referred to as brain age delta). Finally, we constructed linear mixed effect models that could accommodate repeated measures from the same individuals and examined the high-, low-, and no-motion conditions. To investigate the effect of movement condition on brain age, movement condition was entered as a fixed effect and participant ID was included as a random effect to account for the repeated measures design. We then compared differences in brain age between the no movement and low movement conditions, and between the no movement and high movement conditions. This was completed with the lme4 package in R.
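The brain age gap and the Bland–Altman bias score described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code: the function names and toy values are hypothetical, and the sign convention (motion scan minus no-motion scan) is our assumption, since the direction of the difference is not fixed by the definition.

```python
from statistics import mean


def brain_age_gap(predicted, chronological):
    """Brain age gap: predicted (raw) brain age minus chronological age."""
    return [p - c for p, c in zip(predicted, chronological)]


def relative_bias(no_motion, motion):
    """Bias score for one pairing of scans.

    For each participant, the difference between the two scans is divided
    by the mean of the pair; the result is averaged over participants.
    Sign convention (motion minus no-motion) is an assumption.
    """
    ratios = [(m - r) / ((m + r) / 2.0) for r, m in zip(no_motion, motion)]
    return mean(ratios)


# Toy brain age estimates for two participants (hypothetical values).
no_motion = [30.0, 40.0]
low_motion = [33.0, 38.0]
high_motion = [36.0, 44.0]

bias_low = relative_bias(no_motion, low_motion)
bias_high = relative_bias(no_motion, high_motion)

# The two comparisons are then averaged into a single metric per algorithm.
bias_avg = mean([bias_low, bias_high])
```

The same relative_bias function can be applied to the output of brain_age_gap in place of raw brain age.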