Examining the Reliability of Brain Age Algorithms Under Varying Degrees of Subject Motion

doi:10.21203/rs.3.rs-3331689/v1


Research Article


https://doi.org/10.21203/rs.3.rs-3331689/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 04 Apr, 2024

Read the published version in Brain Informatics →

You are reading this latest preprint version

Brain age, defined as the predicted age of an individual’s brain based on neuroimaging data, shows promise as a biomarker for healthy aging and age-related neurodegenerative conditions. However, noise and motion artifacts during MRI scanning may introduce systematic bias into brain age estimates. This study leveraged a novel dataset with repeated structural MRI scans from participants during no motion, low motion, and high motion conditions. This allowed us to evaluate the impact of motion artifacts on brain age derived from five commonly used algorithms. Intraclass correlation coefficients, Bland-Altman analyses, and linear mixed-effect models were used to assess reliability. Results demonstrated variable resilience to motion artifacts depending on the algorithm utilized. The DeepBrainNet and pyment algorithms showed the greatest invariance to motion conditions, with high intraclass correlations and minimal mean differences on Bland-Altman plots between motion and no motion scans. In contrast, the brainageR algorithm was most affected by motion, with lower intraclass correlations and a high degree of bias. Findings highlight the critical need for careful benchmarking of brain age algorithms on datasets with controlled motion artifacts in order to rigorously assess suitability for clinical deployment. Moreover, targeted efforts to improve model robustness to image quality and motion are warranted to strengthen the validity of brain age as a predictive biomarker. Overall, this study highlights open questions regarding the sensitivity of different brain age algorithms to noise and movement and motivates future optimization to derive biologically meaningful brain aging metrics.

Brain age

Neuroimaging

MRI

Motion artifacts

Image quality

Algorithm Reliability

Biomarker

Recent advances in machine learning and data science have led to a number of research projects focused on the concept of “brain age”. This novel metric uses large datasets where age and neuroimaging scans are available, and age can be estimated on new participants only using neuroimaging data (for review, see [1]). Interestingly, these neuroimaging estimates can diverge from participants' chronological age, suggesting potential alterations in an individual’s biological age. An increasing focus on brain age is in part due to its promise as a potential biomarker for neurodegeneration, cognitive decline, and multiple psychiatric issues (e.g., schizophrenia, major depression, bipolar disorder; [2]). Given that many of these conditions are highly prevalent, early detection, via brain age or other metrics, could have important public health implications.

While brain age is potentially powerful, numerous open questions exist regarding this metric, especially regarding its use in clinical settings. Of particular importance is how noise and image quality may influence the derivation of brain age. Noise and poor image quality arise, in part, from participant head movement during an MRI scan. Notably, high levels of head motion have been found in children, older adults, and clinical patient groups, as compared to young adult, non-patient samples [3–6]. High rates of head motion can lead to erroneous estimates of cortical thickness, surface area, and volume [7–9]. This has been noted in both adult [8, 10] and pediatric samples [11]. As such, head motion during imaging sessions may influence our ability to detect differences in brain age between different groups, or in relation to behavioral traits of interest [12].

Pursuant to the clinical utility of brain age, researchers have developed multiple algorithms to calculate brain age. While one can use multimodal neuroimaging [13], most algorithms use T1-weighted anatomical images, either processed in Freesurfer or in original NIfTI format, to estimate brain age [14–16]. With head motion likely to compromise measures of brain morphometry [7, 8, 17], image quality and participant motion could be introducing biases into brain age calculations. Put another way, image quality and motion could lead to lower estimates of cortical thickness, surface area, and volume, resembling the cortical atrophy associated with typical aging [18]. In this way, head motion artifacts could bias estimates of brain age, leading to accelerated brain age being inaccurately noted in high motion participants. Connected to this, work from our group has found modest correlations between raw brain age and image quality (r = -.38 to -.46), as well as between brain age gap (the difference between raw brain age and a participant’s chronological age) and image quality (max r = .36, [19]).

While notable, our past work may be underestimating the true impact of motion and noise in brain age calculation. We previously examined bivariate correlations between image quality and brain age by assessing brain age and scan quality across different individuals. As such, potential relations may be occluded because of between-person variations in brain age. To minimize such variation and strengthen causal inference, it will be important to examine intra-individual relations between image quality and brain age. Put another way, if we only look at between-person effects, people who are able to stay still more frequently and produce higher quality MRI scans may have slower brain aging, but these individuals may also have better cognitive functioning and other factors that also relate to brain aging. As such, image quality and other third factors related to brain age collide. Such factors could be both causes and effects, and we cannot statistically separate these with between-person designs. To truly understand the impact of motion and noise on brain age, it would be advantageous to examine brain age within the same participants, repeatedly scanned when they are remaining still and when they exhibit higher levels of head motion.

Motivated by this, we leveraged a public access dataset to examine the impacts of motion on brain age. This dataset had high-, low-, and no-motion scans for the same individuals [20]. Processing these datasets through multiple commonly-used brain age algorithms (2 algorithms using Freesurfer-derived outputs; 3 algorithms using less processed NIfTI images), we used intraclass correlation and Bland–Altman bias metrics to compare algorithmic performance across repeated MRI scans of the same individuals. We also constructed linear mixed effect models that could accommodate repeated measures from the same individuals and examined the high-, low-, and no-motion conditions. This would allow us to derive standardized estimates for each algorithm in relation to low- and high-motion scans. Given past work finding motion and noise related to lower gray matter levels, we predicted lower correlations between chronological age and brain age for high motion scans. We also predicted algorithms that relied on Freesurfer would have greater changes in these metrics for high motion scans, since image quality is related to variations in morphometric estimates from this software [11].

Datasets

We examined a public-access MRI dataset that included 148 healthy adult participants, ages 18-75 years (Mean Age=30.01 +/-12.76 years; 64.1% Female; age-by-sex histogram shown in Figure 1) [20]. During MRI acquisition (detailed below), participants were instructed not to move at all, and in other scans to nod their head. To create different levels of motion artifacts, the word “MOVE” was presented (in Hungarian) for 5 seconds, 5 or 10 times evenly spaced during image acquisition. Nodding was used as a motion induction since it is reportedly the most prominent type of head motion, responsible for most MRI artifacts [21]. This procedure yielded images with minimal head motion, as well as with slight and more excessive head motion. Participants gave informed consent and reported no neurological or psychiatric diseases. Collection protocols were approved at the National Institute of Pharmacy and Nutrition in Hungary. These repeated scans, with varying levels of motion, allowed for direct evaluation of the impact of motion artifacts on MRI image quality and brain age calculations.

MRI Data Acquisition

The dataset includes whole brain T1-weighted MRI images that were acquired using a Siemens Magnetom Prisma 3T MRI scanner and the following parameters: MPRAGE sequence using 2-fold in-plane GRAPPA acceleration, final slice thickness = 1 mm³ isotropic voxel size, echo time (TE) = 3 ms, flip angle = 9°, inversion time (TI) = 900 ms, repetition time = 2300 ms. This dataset is publicly available at: https://openneuro.org/datasets/ds004173/

Brain Age Algorithms

We deployed five brain age algorithms on this dataset: Cole et al. [14] (referred to as “brainageR”), Kaufmann et al. [15] (referred to as “XGBoost”), Bashyam et al. [16] (referred to as “DeepBrainNet”), Han et al. [22] (referred to as “ENIGMA”), and Leonardsen et al. [23] (referred to as “pyment”). We selected these algorithms based on popularity in recent brain age publications and their open access code. Pyment, DeepBrainNet and brainageR operate on raw (or minimally preprocessed) T1-weighted MRI scans. In contrast, XGBoost and ENIGMA require preprocessing using Freesurfer [24], an open-source MRI processing software package. Of note, Freesurfer processing was not successful for a small number of participants due to extensive noise. Specifically, processing did not complete successfully for 7 out of 148 participants. We provide brief summaries of the algorithms below. For detailed descriptions of model structure, please see the original papers cited here.

brainageR (Cole et al., 2018) Brain Age Algorithm

brainageR uses Gaussian Process Regression to predict brain age based on unprocessed, T1-weighted MRI images [14]. Relevant code is available at: https://github.com/james-cole/brainageR. This software uses SPM12 for segmentation and normalization with custom brain templates, and loads these images into R using the RNifti package. Gray matter, white matter and CSF vectors are then used to predict a brain age value with a model previously trained with kernlab. This algorithm was trained on a sample (N = 2001) of healthy adults aged 18-90, including scans from 14 different studies. After model building and tuning, Cole et al. found a strong positive correlation between brain age and chronological age (r=0.92).

DeepBrainNet (Bashyam et al., 2020) Brain Age Algorithm

DeepBrainNet is a 2D Convolutional Neural Network (CNN) built using the inception-resnetv2 framework [16]. Notably, this model was initialized with random weights and trained exclusively on MRIs to create a brain-specific model. With this algorithm, raw, unprocessed, T1-weighted MR images are N4 bias-corrected, skull-stripped, and affine registered to an MNI template. This algorithm was implemented through the ANTsRNet package, an implementation of Advanced Normalization Tools (ANTs) in the R programming language [25]. Relevant code for this algorithm is located here: https://github.com/ANTsX/brainAgeR. This algorithm was originally trained on a sample (N = 11729) of healthy controls aged 3-95 drawn from 18 different datasets. Bashyam and colleagues found a correlation of r=0.978 between predicted brain age and chronological age; however, those authors purposefully selected a “moderately fit” model over a loosely or tightly fit model. This was motivated by the belief that a moderately fit model would better reveal individual differences in pathology.

XGBoost (Kaufmann et al., 2019) Brain Age Algorithm

XGBoost uses gradient tree boosting to predict brain age based on 1118 features extracted using Freesurfer [15]. These features consist of thickness, area, and volume measurements from a multimodal parcellation of the cerebral cortex, cerebellum, and subcortex. Relevant code is available at: https://github.com/tobias-kaufmann/brainage. This algorithm was trained on a large and diverse sample (N = 39,827, female = 18,990). The sample was made up of healthy controls aged 3-89 drawn from 42 different datasets. To account for potential variation, Kaufmann et al. trained separate models for male and female brain age. We deployed this algorithm by first completing standard processing approaches in Freesurfer 7.1 (http://surfer.nmr.mgh.harvard.edu). The technical details of this software suite are described in prior publications [26–29]. Briefly, this processing includes motion correction and intensity normalization of T1-weighted images, removal of non-brain tissue [30], automated Talairach transformation, segmentation of white matter and gray matter volumetric structures, and derivation of cortical thickness. Freesurfer processing was implemented via Brainlife.io (brainlife/app-freesurfer), which is a free, publicly funded, cloud-computing platform for developing reproducible neuroimaging processing pipelines and sharing data [31–33].

ENIGMA (Han et al., 2021) Brain Age Algorithm

The ENIGMA algorithm used ridge regression based on Freesurfer features (Freesurfer methods noted above in XGBoost description). The algorithm was developed based on data from N = 6989 participants (N = 4314 healthy participants; N = 2675 individuals with MDD) [22]. Structural MRI measures output from Freesurfer, from both the left and right hemispheres, were combined. Specifically, this resulted in 77 brain features including subcortical volumes, cortical thickness and surface area. Past research suggests age-related changes in cortical thickness, surface area and volumes, so all three modalities were included, but combined to improve model performance. Normative models were then estimated in a training sample of male and female controls. Further model development occurred through multivariate cross-validation aiming to minimize mean absolute error (MAE) between predicted and chronological age. Relevant code for this algorithm is located here: https://photon-ai.com/enigma_brainage.

Pyment (Leonardsen et al., 2022) Brain Age Algorithm

The pyment algorithm implemented a Simple Fully Convolutional Network on raw structural magnetic resonance images. The training dataset was one of the largest and most diverse datasets assembled (n=53,542), stratified by age and study before splitting in training and test datasets [23]. Models relied on TensorFlow and CUDA, using a batch size of 14, 80/20 dataset splits for training and validation, optimal hyperparameters found using heuristics (rather than full grid searches), and stochastic gradient descent using a stepwise learning rate schedule with 3 steps. MAE on the validation set was used to pick the best epoch. Relevant code for this algorithm is located here: https://github.com/estenhl/pyment-public

MRI Image Quality Assessment

While head motion varied based on instructions to participants, we also quantitatively measured image quality, an indirect assessment of head motion, using the Computational Anatomy Toolbox 12 (CAT12). Specifically, we generated a quantitative metric (“CAT12 score”) that considers four summary measures of image quality: noise-to-contrast ratio, coefficient of joint variation, inhomogeneity-to-contrast ratio, and root-mean-squared voxel resolution. CAT12 normalizes and combines these measures using a kappa statistic-based framework. The score is a value from 0 to 1, with 0 being the lowest quality and 1 being the highest quality. This measure was used for two purposes: 1) to confirm different levels of motion artifacts across repeated scans; and 2) to allow for a more continuous investigation of the impact of image quality (and connected head motion) on brain age estimates.

Statistical Analyses

We first calculated bivariate correlations between algorithms for the no-motion scans, examining relations between raw brain age, brain age delta, and image quality. To assess the reliability of brain age calculation by algorithm, we used two approaches of looking at reliability: intraclass correlation coefficient (ICC) and Bland–Altman analysis. ICC is a descriptive statistic indicating the degree of agreement between two or more sets of measurements. The statistic is similar to a bivariate correlation coefficient insofar as it has a range from 0 to 1 and higher values represent a stronger relationship. An ICC differs from a bivariate correlation in that it utilizes groups of measurements and gives an indication of the numerical cohesion across the given groups [34]. We calculated ICCs using the statistical programming language R, with the icc function from the package “irr” [35]. A two-way model with absolute agreement was used to investigate the exact estimate of brain age for each repeated scan. Although there are no definitive guidelines for precise interpretation of ICCs, results have frequently been binned into four quality groups where 0.0–0.5 is “poor”, 0.50–0.75 is “moderate”, 0.75–0.9 is “good” and 0.9–1.0 is “excellent” [36].
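As a minimal sketch of the two-way, absolute-agreement ICC described above, the following pure-Python function computes the single-measurement form, often written ICC(A,1), from ANOVA mean squares. The single-measurement unit, the function name `icc_agreement`, and the toy scores are assumptions made for illustration; the analyses in this paper used the icc function from the R package “irr”, and this sketch is not that code.

```python
def icc_agreement(scores):
    """ICC(A,1): two-way model, absolute agreement, single measurement.

    scores: list of rows, one row per participant, one column per
    scan condition (e.g. no, low, high motion).
    """
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of repeated scans per participant
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for participants (rows), conditions (columns), residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Absolute agreement penalises systematic shifts between conditions
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + (k / n) * (ms_cols - ms_err)
    )


# Hypothetical brain ages (years): columns are no-motion and motion scans.
identical = [[20, 20], [30, 30], [40, 40]]
shifted = [[20, 25], [30, 35], [40, 45]]   # motion adds a constant 5 years
icc_identical = icc_agreement(identical)
icc_shifted = icc_agreement(shifted)
```

In this toy example the shifted scans preserve the rank order of participants perfectly, yet the absolute-agreement ICC falls below 1 because every motion scan is offset by the same amount. This is why an absolute-agreement model suits the question of whether the exact brain age estimate is reproduced across scans.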

Additionally, Bland–Altman analyses investigate reliability by considering the differences between paired groups of measurements. Given that we had three scans per participant, we compared the following pairing of scans: 1) no versus low-motion; and 2) no versus high-motion. Each comparison yielded a “bias” score that was the difference divided by the mean value for a given pairing of measurements. We averaged the no versus low-motion, and no versus high-motion, Bland-Altman metrics. These different methods were used for raw brain age, as well as the brain age “gap” (raw brain age - chronological age). Finally, we constructed linear mixed effect models that could accommodate repeated measures from the same individuals and examined the high-, low-, and no-motion conditions. To investigate the effect of movement condition on brain age, movement condition was input as a fixed effect and participant ID was included as a random effect to account for the repeated measures design. We then compared differences in brain age between the no movement and low movement conditions, and between the no movement and high movement conditions. This was completed with the lme4 package in R.
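One way to write down a random-intercept model of this kind is sketched below. The symbols and the dummy coding (no-motion as the reference level) are assumptions chosen for illustration, not a transcription of the fitted lme4 model.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{Low}_{ij} + \beta_2\,\mathrm{High}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here y_{ij} is the brain age (or brain age gap) for participant i on scan j, Low and High are indicator variables for the low- and high-motion conditions, and u_i is the participant-specific random intercept. Under this coding, \beta_1 and \beta_2 correspond to the no versus low-motion and no versus high-motion contrasts.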

Descriptive Statistics and Bivariate Correlations

Paralleling past reports [19, 37] and to understand relations between different brain age algorithms, we first computed bivariate correlations between each of the algorithms for raw brain age, as well as the brain age delta (i.e., raw brain age - a participant’s chronological age). When examining raw brain age, there were reasonably high correlations between the 5 different algorithms we investigated, with r’s ranging from 0.67–0.93 (as shown in Fig. 2). For the brain age gap, these correlations were lower (range of r’s = 0.37–0.78), but still statistically significant (all p’s < .005). Interestingly, correlations were low between each algorithm’s raw brain age and brain age delta (max r, within algorithm = 0.11). As noted previously, participants were instructed to lie still, move their heads slightly, or move their heads in large amounts. When comparing image quality across different motion levels, as derived from the CAT12 toolbox, we found that moving scans had lower image quality than still scans (F(2,285.6) = 253.82, p < .005, partial Eta² = 0.64; within group differences are shown in Fig. 3). This was a simple confirmation that motion during scans led to lower image quality. We also examined relations between image quality and each of the algorithms (for raw brain age and brain age delta). There were modest negative correlations, ranging from r=-0.112 to r=-0.338, between raw brain age and image quality. As such, lower quality scans were related to higher raw brain age values. Image quality was positively and negatively correlated with brain age delta, depending on the algorithm of interest (r range = -0.081 to 0.285, shown in Fig. 4).
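For readers less familiar with these quantities, the sketch below shows how a brain age delta and a bivariate (Pearson) correlation can be computed. The function names and the toy values are hypothetical and serve only as an illustration; they are not data from this study.

```python
import math


def brain_age_delta(predicted, chronological):
    """Brain age delta (gap): predicted brain age minus chronological age."""
    return [p - c for p, c in zip(predicted, chronological)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for four participants (years; quality on a 0-1 scale)
chronological = [25.0, 31.0, 42.0, 58.0]
predicted = [27.5, 30.0, 45.0, 57.0]
image_quality = [0.80, 0.86, 0.78, 0.84]

delta = brain_age_delta(predicted, chronological)
r_quality_delta = pearson_r(image_quality, delta)
```

A negative `r_quality_delta` would indicate that lower quality scans tend to have larger (older-appearing) brain age deltas, which is the direction of concern when motion mimics atrophy.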

Brain Age Reliability by Algorithm, as assessed by Intraclass Correlations

We examined reliability using intraclass correlations (ICCs) and Bland-Altman metrics for our five algorithms of interest. For raw brain age, we found good and excellent ICCs for four of the algorithms investigated when comparing no, low and high motion scans (DeepBrainNet ICC = 0.961, pyment ICC = 0.96, brainageR ICC = 0.803, ENIGMA ICC = 0.758). One algorithm showed a moderate ICC for raw brain age (XGBoost ICC = 0.713). For brain age delta, ICCs ranged from moderate to good (XGBoost delta ICC = 0.822, DeepBrainNet delta ICC = 0.809, ENIGMA delta ICC = 0.73, pyment delta ICC = 0.698, brainageR delta ICC = 0.517). A dumbbell plot shows the variations in these results (Fig. 5).
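To make the mapping from these ICC values to the quality bins in [36] explicit, the short sketch below labels each reported value. The handling of the bin edges (a value exactly at 0.5, 0.75, or 0.9 is assigned to the higher bin) and the function name `icc_label` are assumptions, since the published bins share their boundaries.

```python
def icc_label(icc):
    """Map an ICC to the four bins from [36]; edge handling is an assumption."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


# ICC values as reported in the text above
raw_icc = {
    "DeepBrainNet": 0.961,
    "pyment": 0.96,
    "brainageR": 0.803,
    "ENIGMA": 0.758,
    "XGBoost": 0.713,
}
delta_icc = {
    "XGBoost": 0.822,
    "DeepBrainNet": 0.809,
    "ENIGMA": 0.73,
    "pyment": 0.698,
    "brainageR": 0.517,
}

raw_labels = {name: icc_label(v) for name, v in raw_icc.items()}
delta_labels = {name: icc_label(v) for name, v in delta_icc.items()}
```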

Bland-Altman Metrics of Reliability by Algorithm

There was a high degree of variability when examining Bland-Altman metrics of reliability for each algorithm (as shown in Figs. 6–7). Two algorithms had a small degree of bias when looking across all subjects in the dataset (DeepBrainNet mean difference = 0.803, 95% Confidence Interval = 0.319 to 1.28; pyment mean difference = -0.631, 95% Confidence Interval = -1.20 to -0.062). Brain ages calculated by the ENIGMA and XGBoost algorithms showed greater bias (ENIGMA mean difference = 1.343, 95% Confidence Interval = 0.263 to 2.422; XGBoost mean difference = 3.137, 95% Confidence Interval = 2.31 to 3.964). The brainageR algorithm had a fair amount of bias, with variation in movement leading to a mean difference of 5.442 years (95% Confidence Interval = 4.555 to 6.33). Of note, all the algorithms had sizable ranges of differences, indicating that subject motion could significantly influence brain age calculation for any given subject (DeepBrainNet range = -10.23 to 8.277; pyment range = -11.13 to 23.59; ENIGMA range = -30.79 to 18.68; XGBoost range = -10.15 to 19.24; brainageR range = -2.024 to 23.09).
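The quantities reported in this paragraph (mean difference, its 95% confidence interval, and the range of differences) can be illustrated with a small Bland-Altman sketch. The direction of the difference (motion scan minus no-motion scan), the normal-approximation interval (z = 1.96), the function name `bland_altman`, and the toy values are all assumptions for illustration and may differ from the exact procedure used here.

```python
import math
import statistics


def bland_altman(no_motion, motion, z=1.96):
    """Summarise paired differences (motion minus no-motion) in brain age."""
    diffs = [m - s for s, m in zip(no_motion, motion)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)     # the Bland-Altman "bias"
    sd_diff = statistics.stdev(diffs)      # sample SD of the differences
    se = sd_diff / math.sqrt(n)            # standard error of the bias
    return {
        "mean_diff": mean_diff,
        "ci95": (mean_diff - z * se, mean_diff + z * se),
        "limits_of_agreement": (mean_diff - z * sd_diff,
                                mean_diff + z * sd_diff),
        "range": (min(diffs), max(diffs)),
    }


# Hypothetical brain ages (years) for four participants
no_motion = [30.0, 40.0, 50.0, 60.0]
high_motion = [32.0, 41.0, 53.0, 62.0]
summary = bland_altman(no_motion, high_motion)
```

A confidence interval for the mean difference that excludes zero indicates systematic bias across the sample, while the range and limits of agreement describe how far an individual participant's estimate can move between scans.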

Statistical Quantification of Subject Motion, as Assessed by Linear Mixed Effects Models

Subject head motion had significant, though mostly modest, impacts on raw brain age calculated by three of our algorithms, specifically DeepBrainNet, pyment, and ENIGMA. For DeepBrainNet, the omnibus effect of motion was significant (F(2,283.73) = 4.445, p = 0.01, partial Eta² = 0.03). For this algorithm, both low and high levels of motion scans had small effects on raw brain age calculation (low motion β = 0.06, high motion β = 0.06). For pyment, the omnibus effect was also significant (F(2,283.81) = 7.637, p < 0.005, partial Eta² = 0.05). For this algorithm, both low and high levels of motion scans had small negative effects on raw brain age calculation (low motion β = -0.08, high motion β = -0.03). With ENIGMA, there was an increasing but still modest effect of motion (F(2,283.37) = 8.526, p < 0.005, partial Eta² = 0.06). For this algorithm, low motion had a small effect on raw brain age calculations, while high motion had a significantly larger, albeit modest, effect (low motion β = 0.09, high motion β = 0.22). Effects of motion were more pronounced for XGBoost and brainageR. ANOVAs indicated large effects of motion for both of these algorithms (XGBoost F(2,283.99) = 58.008, p < 0.005, partial Eta² = 0.29; brainageR F(2,283.35) = 111.65, p < 0.005, partial Eta² = 0.44). For these algorithms, both low and high levels of motion scans had moderate effects on raw brain age calculation (XGBoost low motion β = 0.25, high motion β = 0.57; brainageR low motion β = 0.35, high motion β = 0.55). These effects are shown in Fig. 8.
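The effect sizes above can be cross-checked against the reported F statistics using the standard identity partial η² = (F × df1) / (F × df1 + df2). The sketch below applies it to the values in this paragraph; whether the original analysis derived partial Eta² in exactly this way is an assumption, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, numerator df, denominator df) as reported in the text above
reported_tests = {
    "DeepBrainNet": (4.445, 2, 283.73),
    "pyment": (7.637, 2, 283.81),
    "ENIGMA": (8.526, 2, 283.37),
    "XGBoost": (58.008, 2, 283.99),
    "brainageR": (111.65, 2, 283.35),
}

effect_sizes = {
    name: round(partial_eta_squared(f, df1, df2), 2)
    for name, (f, df1, df2) in reported_tests.items()
}
```

Rounded to two decimals, the recovered values agree with the partial Eta² figures reported for each algorithm.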

For brain age delta, effects were larger in magnitude, but algorithmic performance mostly mirrored results for raw brain age. Omnibus tests were nearly identical but probing of low and high levels of motion scans (compared to no motion scans) revealed some subtle differences in effects. For DeepBrainNet, both low and high levels of motion scans had small effects on brain age delta (low motion β = 0.14, high motion β = 0.14). Effects were very similar for ENIGMA brain age delta (low motion β = 0.11, high motion β = 0.24). For pyment, different levels of motion had negative effects, though with similar absolute magnitudes (low motion β=-0.23, high motion β=-0.09). Similar to raw brain age, effects were largest for low and high levels of motion for XGBoost and brainageR brain age delta (XGBoost low motion β = 0.20, high motion β = 0.43; brainageR low motion β = 0.61, high motion β = 0.94). These effects are also shown in Fig. 8.

In this study, we examined the effect of movement during MRI scans on calculations of brain age using multiple, well-validated, and commonly deployed calculation algorithms. Given that motion artifacts can skew volumetric and morphological measures used to determine brain age, it is critical to track variations in the sensitivity of different brain age algorithms to these issues. To these ends, we used two metrics of reliability (ICCs; Bland Altman metrics) to compare the different algorithms, leading to multiple important findings. First, using ICCs for brain age, four of the algorithms (DeepBrainNet, pyment, brainageR and ENIGMA) demonstrate good and excellent ICCs. For brain age delta, the more commonly used outcome variable in this type of research, ICCs were more variable. Notably, brain age deltas derived from DeepBrainNet and XGBoost demonstrated good reliability even during scans with motion. Turning to Bland Altman metrics, two algorithms, DeepBrainNet and pyment, had a small degree of bias for raw brain age. Performance was a bit poorer for the ENIGMA and XGBoost algorithms, and the brainageR algorithm showed the largest amount of bias when examining motion and non-motion scans. Critically, all the algorithms had sizable ranges of differences, indicating that subject motion could significantly influence brain age calculation for any given subject.

In addition to reliability analyses, we also constructed linear mixed-effect models to get specific statistical assessments of how much subject motion could influence brain age calculation. In these analyses, subject head motion had significant, though mostly modest, impacts on brain age calculations for three of our algorithms (specifically DeepBrainNet, pyment, and ENIGMA). Effects of motion were more pronounced for XGBoost and brainageR, with these statistical models suggesting larger effects (XGBoost partial Eta² = 0.29; brainageR partial Eta² = 0.44). Overall, our findings demonstrate that motion during MRI scanning can significantly influence brain age predictions depending on the algorithm used. However, this impact is not consistent across all methods. The brainageR algorithm may be less desirable for expanded deployment, while DeepBrainNet and pyment may have greater noise tolerance. ENIGMA and XGBoost performances were more average and should be explored in greater depth.

Thinking about our findings in relation to past reports, similar patterns have been noted individually for each algorithm regarding reliability and relations with other critical variables (i.e., age; image quality). When examining raw brain age, there were reasonably high correlations between the 5 different algorithms we investigated, with r’s ranging from 0.67–0.93. Of note, we deployed brain age algorithms that used original NIfTI files, as well as outputs from Freesurfer. Our results also clearly connect to past work finding variations in morphometric values derived from high motion scans. Such effects remain after different forms of manual and automatic correction, suggesting that in-scanner motion induces spurious effects that do not reflect a processing failure in software; rather, they reflect systematic bias (e.g., motion-induced blurring) that may appear similar to gray matter atrophy. Particularly concerning, many neuroimaging groups will visually inspect scans and include scans of “fair” or “marginal” quality. As researchers focus on different groups (e.g., children versus adolescents; clinical groups versus non-clinical groups), this potentially creates an “apples versus oranges” comparison; all scans may “pass” visual inspection, but one group has excellent image quality and clarity, while another has visible motion and is only just above these passing thresholds.

Regarding the impact of subject motion, past work may underestimate the true impact of motion and noise in brain age calculation. Work by past investigators has found between-person relations between image quality and brain age calculation. Our project, however, is the first to examine intra-individual differences. The use of repeated MRI scans from the same participants allows us to control for confounding variables due to individual differences and infer causal relationships between motion artifacts and brain age calculation. By utilizing a within-person design with multiple scans per person over time, we isolated the effects of scan quality while holding constant time-invariant factors. This strengthens causal inference, as we are separating between- and within-person sources of variation.

Connected to this, our team is particularly interested in the effects of image quality and motion on brain age calculation. To our knowledge, no brain age algorithms have integrated measures of image quality in their model training and testing. In future development of brain age algorithms, it would be interesting to examine whether measures of image quality and successful preprocessing (e.g., CAT12 grades, or Freesurfer’s Euler Number) could be used to further optimize models. Certain brain age models (e.g., pyment) have used large numbers of participants (N = 53,542) in their algorithmic development. This has meant that a large number of high motion participants have been included in training and test datasets. While this may mean less error when dealing with high-motion scans, image quality was not explicitly modeled. Given that commonly-used structural MRI measures derived from T1-weighted images are strongly related to image quality, this could be a fruitful future direction. Tackling these and other open questions related to brain age could significantly advance our understanding of healthy, as well as accelerated, aging processes.

While we believe we advanced applied understanding of brain age calculation, our work is not without limitations. First, our data are cross-sectional in nature, and it will be important to think about estimation and validation of different performance metrics in participants with repeated MRI scans separated by long periods of time. By looking longitudinally at within- and between-person change in relation to different algorithms, we may be able to derive a particularly powerful window into age-associated functional declines and disease, and different clinically relevant issues. It would be particularly powerful if there were high, low, and no motion MRI acquisitions acquired longitudinally to richly probe these questions. Second, we tested five commonly used algorithms where code was publicly shared for mass implementation of brain age calculation. There are many in-press and preprinted manuscripts engineering new calculations of brain age. Such novel algorithms may exhibit superior performance and fewer limitations than the approaches we examined here. It would be useful for novel algorithms to reuse this dataset to compare performance to what is reported here and demonstrate relative superiority. Third, we did not connect variations in brain age at different levels of movement with behavioral phenotypes of interest. In past work, we found that XGBoost brain age calculations, compared to those derived from brainageR and DeepBrainNet, were more sensitive to the detection of clinical diagnoses of cognitive impairment. We, however, did not calculate pyment or ENIGMA brain ages in that past project. It would be clinically useful to examine if brain age calculated from scans at various motion levels were sensitive to clinical characteristics commonly investigated in brain age research studies (e.g., Alzheimer's Disease; schizophrenia). The sample here was healthy, sampled in early adulthood (Mean Age = 30.01), but without dense sampling of relevant psychological or neurological variables. Examined collectively, it will be important for this subfield of neuroimaging to show that brain age algorithms are reliable, even with variable levels of motion, and that algorithms identify unique and additive variance in brain age.

Limitations notwithstanding, additional research on “brain age” is imperative. Richer information about the brain and brain aging could be important for those focused on age-related mortality and morbidity. Our comparisons indicated that DeepBrainNet and pyment were often resilient to motion across different performance metrics, while further work is needed to examine the performance of the ENIGMA and XGBoost algorithms. Evidence suggests that brainageR may have more challenges with reliability in high-motion scans and high-motion populations, but this should be independently verified. Regarding next steps, given that these algorithms are often easy to implement, it could be advantageous for the field to report group comparisons or behavioral correlations with multiple algorithms. Typically, a research group will simply deploy a single algorithm and report results; it is often unclear whether results would generalize across different algorithmic derivations of brain age and brain age delta. Thoughtful consideration of reliability and noise tolerance will therefore be critical when choosing among brain age algorithms, especially with an ever-growing landscape of potential ways to calculate this variable. Evaluating model performance on datasets with controlled motion artifacts can better establish the validity of brain age as a predictive measure in aging research. Moving forward, further optimization and validation of brain age algorithms are needed to ensure the clinical utility and reliability of this biomarker.
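As one way to make such multi-algorithm reporting concrete, the following is a minimal sketch of a Bland-Altman summary (mean difference and limits of agreement) computed per algorithm between no-motion and high-motion scans. It is an illustration under stated assumptions and not the analysis code used in this study: the function name `bland_altman`, the algorithm labels, and all numeric values are hypothetical, and the limits of agreement use the conventional bias ± 1.96 SD of the paired differences.

```python
from statistics import mean, stdev


def bland_altman(reference, comparison):
    """Return (bias, lower_loa, upper_loa) for paired measurements.

    bias is the mean of (comparison - reference); the limits of
    agreement are bias +/- 1.96 * sample SD of the paired differences.
    """
    if len(reference) != len(comparison):
        raise ValueError("paired inputs must have the same length")
    if len(reference) < 2:
        raise ValueError("need at least two pairs to estimate the SD")
    diffs = [c - r for r, c in zip(reference, comparison)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical predicted brain ages (years) for the same five participants.
# These numbers are invented for illustration and are not study results.
predictions = {
    "algorithm_a": {
        "no_motion": [28.0, 31.5, 25.2, 40.1, 33.3],
        "high_motion": [28.4, 31.0, 25.9, 40.6, 33.1],
    },
    "algorithm_b": {
        "no_motion": [29.1, 30.8, 26.0, 39.5, 34.0],
        "high_motion": [32.0, 34.9, 28.7, 44.1, 36.8],
    },
}

results = {
    name: bland_altman(scans["no_motion"], scans["high_motion"])
    for name, scans in predictions.items()
}

for name, (bias, lower, upper) in results.items():
    print(f"{name}: bias = {bias:+.2f} y, LoA = [{lower:+.2f}, {upper:+.2f}] y")
```

Reporting the same summary for every algorithm side by side shows whether a motion-related shift in brain age is specific to one model or shared across models.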

Ethical Approval

This work involved secondary analyses of publicly available data and was not considered human subject research by the University of Pittsburgh; it was classified as exempt from review by the Institutional Review Board at the University of Pittsburgh. Of note, the original investigators who collected the data (Nárai et al.) did obtain written, informed consent from all participants. That work was designed and conducted in accordance with Hungarian regulations and laws, and was approved by the National Institute of Pharmacy and Nutrition (file number: OGYÉI/70184/2017).

Competing interests

The authors have no competing interests to declare.

Authors' contributions

JLH drafted multiple portions of the manuscript and completed the majority of analyses; DJA prepared figures, reviewed the manuscript, and provided feedback on drafts; PZ drafted portions of the manuscript, completed analyses, and provided feedback on drafts.

Funding

This work was supported by funds provided to Dr. Hanson from the Learning, Research, & Development Center at the University of Pittsburgh.

Availability of data and materials

The neuroimaging dataset analyzed here is publicly available at: https://openneuro.org/datasets/ds004173/

References

Franke K, Gaser C (2019) Ten years of BrainAGE as a neuroimaging biomarker of brain aging: what insights have we gained? Front Neurol 10:789
Cole JH, Franke K (2017) Predicting age using neuroimaging: innovative brain ageing biomarkers. Trends Neurosci 40:681–690
Epstein JN, Casey B, Tonev ST, et al (2007) Assessment and prevention of head motion during imaging of patients with attention deficit hyperactivity disorder. Psychiatry Res Neuroimaging 155:75–82
Engelhardt LE, Roe MA, Juranek J, et al (2017) Children’s head motion during fMRI tasks is heritable and stable over time. Dev Cogn Neurosci 25:58–68
Makowski C, Lepage M, Evans AC (2019) Head motion: the dirty little secret of neuroimaging in psychiatry. J Psychiatry Neurosci 44:62–68
Haller S, Monsch AU, Richiardi J, et al (2014) Head motion parameters in fMRI differ between patients with mild cognitive impairment and Alzheimer disease versus elderly control subjects. Brain Topogr 27:801–807
Reuter M, Tisdall MD, Qureshi A, et al (2015) Head motion during MRI acquisition reduces gray matter volume and thickness estimates. NeuroImage 107:107–115. https://doi.org/10.1016/j.neuroimage.2014.12.006
Alexander-Bloch A, Clasen L, Stockman M, et al (2016) Subtle in-scanner motion biases automated measurement of brain anatomy from in vivo MRI. Hum Brain Mapp 37:2385–2397. https://doi.org/10.1002/hbm.23180
Pardoe HR, Kucharsky Hiess R, Kuzniecky R (2016) Motion and morphometry in clinical and nonclinical populations. NeuroImage 135:177–185. https://doi.org/10.1016/j.neuroimage.2016.05.005
Savalia NK, Agres PF, Chan MY, et al (2017) Motion-related artifacts in structural brain images revealed with independent estimates of in-scanner head motion. Hum Brain Mapp 38:472–492
Gilmore AD, Buser NJ, Hanson JL (2021) Variations in structural MRI quality significantly impact commonly used measures of brain anatomy. Brain Inform 8:1–15
Takao H, Amemiya S, Abe O, Alzheimer’s Disease Neuroimaging Initiative (2021) Reliability of changes in brain volume determined by longitudinal voxel-based morphometry. J Magn Reson Imaging 54:609–616
Jirsaraie RJ, Gorelik AJ, Gatavins MM, et al (2023) A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns 4:
Cole JH, Ritchie SJ, Bastin ME, et al (2018) Brain age predicts mortality. Mol Psychiatry 23:1385–1392
Kaufmann T, van der Meer D, Doan NT, et al (2019) Common brain disorders are associated with heritable patterns of apparent aging of the brain. Nat Neurosci 22:1617–1623
Bashyam VM, Erus G, Doshi J, et al (2020) MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain 143:2312–2324
Blumenthal JD, Zijdenbos A, Molloy E, Giedd JN (2002) Motion artifact in magnetic resonance imaging: implications for automated analysis. NeuroImage 16:89–92
Scahill RI, Frost C, Jenkins R, et al (2003) A longitudinal study of brain volume changes in normal aging using serial registered magnetic resonance imaging. Arch Neurol 60:989–994
Bacas E, Kahhalé I, Raamana PR, et al (2023) Probing multiple algorithms to calculate brain age: Examining reliability, relations with demographics, and predictive power. Hum Brain Mapp
Nárai Á, Hermann P, Auer T, et al (2022) Movement-related artefacts (MR-ART) dataset of matched motion-corrupted and clean structural MRI brain scans. Sci Data 9:630
Frew S, Samara A, Shearer H, et al (2022) Getting the nod: Pediatric head motion in a transdiagnostic sample during movie- and resting-state fMRI. PLoS One 17:e0265112
Han LK, Dinga R, Hahn T, et al (2021) Brain aging in major depressive disorder: results from the ENIGMA major depressive disorder working group. Mol Psychiatry 26:5124–5139
Leonardsen EH, Peng H, Kaufmann T, et al (2022) Deep neural networks learn general and clinically relevant representations of the ageing brain. NeuroImage 256:119210
Fischl B (2012) FreeSurfer. NeuroImage 62:774–781
Tustison NJ, Cook PA, Holbrook AJ, et al (2021) The ANTsX ecosystem for quantitative biological and medical imaging. Sci Rep 11:9068
Dale AM, Fischl B, Sereno MI (1999) Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage 9:179–194. https://doi.org/10.1006/nimg.1998.0395
Fischl B, Sereno MI, Dale AM (1999) Cortical surface-based analysis: II. Inflation, flattening, and a surface-based coordinate system. NeuroImage 9:195–207. https://doi.org/10.1006/nimg.1998.0396
Fischl B, Salat DH, Busa E, et al (2002) Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 33:341–355. https://doi.org/10.1016/S0896-6273(02)00569-X
Fischl B, Van Der Kouwe A, Destrieux C, et al (2004) Automatically parcellating the human cerebral cortex. Cereb Cortex 14:11–22. https://doi.org/10.1093/cercor/bhg087
Ségonne F, Dale AM, Busa E, et al (2004) A hybrid approach to the skull stripping problem in MRI. NeuroImage 22:1060–1075. https://doi.org/10.1016/j.neuroimage.2004.03.032
Avesani P, McPherson B, Hayashi S, et al (2019) The open diffusion data derivatives, brain data upcycling via integrated publishing of derivatives and reproducible open cloud services. Sci Data 6:1–13. https://doi.org/10.1038/s41597-019-0073-y
Pestilli F (2018) Human white matter and knowledge representation. PLoS Biol 16:e2005758. https://doi.org/10.1371/journal.pbio.2005758
Hayashi S, Caron B, Heinsfeld AS, et al (2023) brainlife.io: A decentralized and open source cloud platform to support neuroscience research. arXiv preprint arXiv:2306.02183
McGraw KO, Wong SP (1996) Forming inferences about some intraclass correlation coefficients. Psychol Methods 1:30
Gamer M, Lemon J, Gamer MM, et al (2012) Package ‘irr.’ Var Coeff Interrater Reliab Agreem 22:1–32
Koo TK, Li MY (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15:155–163
Dörfel RP, Arenas-Gomez JM, Fisher PM, et al (2023) Prediction of brain age using structural magnetic resonance imaging: A comparison of accuracy and test-retest reliability of publicly available software packages. BioRxiv 2023–01


Editorial decision: Revision requested
23 Nov, 2023
Reviews received at journal
06 Oct, 2023
Reviewers agreed at journal
01 Oct, 2023
Reviewers invited by journal
08 Sep, 2023
Editor assigned by journal
08 Sep, 2023
Submission checks completed at journal
08 Sep, 2023
First submitted to journal
06 Sep, 2023
