Neural similarity across task load relates to cognitive reserve and brain maintenance measures on the Letter Sternberg task: a longitudinal study

The aging process is characterized by change across several measures that index cognitive status and brain integrity. In the present study, 54 cognitively-healthy younger and older adults, were analyzed, longitudinally, on a verbal working memory task to investigate the effect of brain maintenance (i.e., cortical thickness) and cognitive reserve (i.e., NART IQ as proxy) factors on a derived measure of neural efficiency. Participants were scanned using fMRI while presented with the Letter Sternberg task, a verbal working memory task consisting of encoding, maintenance and retrieval phases, where cognitive load is manipulated by varying the number of presented items (i.e., between one and six letters). Via correlation analysis, we looked at region-level and whole-brain relationships between load levels within each phase and then computed a global task measure, what we term phase specificity, to analyze how similar neural responses were across load levels within each phase compared to between each phase. We found that longitudinal change in phase specificity was positively related to longitudinal change in cortical thickness, at both the whole-brain and regional level. Additionally, baseline NART IQ was positively related to longitudinal change in phase specificity over time. Furthermore, we found a longitudinal effect of sex on change in phase specificity, such that females displayed higher phase specificity over time. Cross-sectional findings aligned with longitudinal findings, with the notable exception of behavioral performance being positively linked to phase specificity cross-sectionally at baseline. Taken together, our findings suggest that phase specificity positively relates to brain maintenance and reserve factors and should be better investigated as a measure of neural efficiency.


Introduction
Studies have shown that aging is associated with declines in working memory (WM), which is defined as the temporary storage and manipulation of information (Baddeley, 2012) and which serves as the building block for several higher cognitive functions, such as language, reasoning and problem solving (Brockmole & Logie, 2013;Pliatsikas et al., 2019). Age-related declines have been shown across various aspects of WM comprising both verbal and non-verbal content (Reuter-Lorenz & Sylvester, 2005;Park et al., 2002). A meta-analysis by Bopp and Verhaeghen (2005) reported age differences across eight verbal span tasks, with agerelated dissociations between short-term memory storage span and WM span, with the latter presenting more sensitive age effects. Other WM tasks displaying age-related declines include feature binding in visual object recognition (Atkinson et al., 2018), focal attention switching (van Gerven et al., 2008), and updating (Ecker et al., 2010) to name a few. However, as WM is postulated to depend on different storage systems (Baddeley, 1992), the extent of decline may also vary as a function of task type. For instance, visuospatial WM has been shown to be more impacted than verbal WM in the aging process (Hale et al., 2011;Cansino et al., 2013), although some research has shown comparable decline (Kumar & Priyadarshi, 2013).
Neural activity in the prefrontal cortex (PFC), in regions such as the dorsolateral PFC, is considered to play a key role in WM, largely due to sustained activity witnessed during task retention delay periods (for a comprehensive review, see Funahashi, 2017). PFC involvement is considered to operate as a top-down control system crucial to modulating activity in posterior cortical and subcortical sensorimotor regions via long-range projections (for a review, see D' Esposito & Postle, 2015). In the context of healthy aging, several studies have reported age-related reductions in generally more posterior task-specific regions along with activation increases in the PFC, typically suggesting compensatory processing (Vermeij et al., 2017;Fakhri et al., 2013;Rajah & D'Esposito, 2005). The successful recruitment of additional neural resources should align with the maintenance of optimal memory function, where increased activity correlates with behavioral performance (e.g., Suzuki at al., 2018). However, in many cases, this age-related "hyperactivation" has not been linked to performance increases, instead suggesting an inefficiency of processing (Morcom & Henson, 2018).
One way of investigating neural efficiency in WM ability has been to use protocols that vary task demand, manipulating either the amount, or load, of information to be maintained or the duration of that maintenance, and to then compare intensity of activation across these task manipulations. Several studies, for instance, have reported greater activation for older versus younger adults during low-load WM tasks, with this increase sometimes even linked to poorer performance (Bauer et al., 2015;Toepper et al., 2014;Schneider-Garces et al., 2010). According to the Compensation-Related Utilization of Neural Circuits Hypothesis (CRUNCH; Reuter-Lorenz & Cappell, 2008), as load increases, cortical region recruitment also increases; as older adults reach their resource capacity limit quicker than younger adults, low and intermediate levels of task difficulty will engage greater neural resources. Additionally, from a more general neural efficiency perspective, individuals with higher versus lower levels of intelligence, for instance, typically show less cortical activation for the same level of task difficulty (Nussbaumer et al., 2015;Dunst et al., 2014) as do faster-performing versus slower-performing individuals (Rypma et al., 2006), suggesting that reduced neural recruitment may be optimal.
One paradigm to study neural efficiency is the Sternberg task (Sternberg, 1966). More specifically, the letter Sternberg task is a verbal WM task consisting of three phases; participants are first presented with a string of letters varying between one and six letters (i.e., encoding, or stimulus, phase) and are asked to retain this sequence in memory over a specific duration (i.e., maintenance, or retention, phase), after which they are presented with a single letter and asked to indicate whether that letter appeared in the original string (i.e., retrieval, or probe, phase). Zarahn and colleagues (Zarahn et al., 2007) utilized this task to investigate age-related differences in the magnitude of activation change across three load levels. Using a multivariate approach, they identified spatial network patterns that increased in expression among the older age group while behavior decreased, suggesting declines in efficiency. Other studies analyzing the effect of memory load increases across phases in a delayed match-to-sample verbal WM task have shown related activation increases in the DLPFC (Glahn et al., 2002;Bunge et al., 2001). When disambiguating between phases, evidence has shown load-related activation increases in the DLPFC to occur selectively across phases, with some evidence suggesting that these increases occur during the encoding phase (Rypma and D'Esposito, 1999), the maintenance phase (Altamura et al., 2007;Habeck et al., 2005;Zarahn et al., 2005;Rypma et al., 2002) or the retrieval phase (Cairo et al., 2004;Veltman et al., 2003). Evidence has also been shown though for activation increases in the DLPFC across all three phases (Bedwell et al., 2005). In the present study, we utilized the letter Sternberg task to analyze neural efficiency from a slightly different perspective− that of neural similarity across load levels within each of the three phases. Via correlation analysis, we looked at region-level and whole-brain relationships between load levels within each phase and then computed a global task measure, what we term phase specificity, to analyze how similar neural responses were across load levels within each phase compared to between each phase. We chose to ground phase specificity in the difference across load levels within each phase compared to between each phase because we wished to account for the general relatedness present in the brain of each individual. As previously suggested, while different phases of working memory are likely to engage distinct neural processes, overlap in processing across phases may also be present. One may speculate that neural efficiency could also manifest as whole-brain similarity across both phase and load; however, high whole-brain similarity, irrespective of task phase, could also suggest a loss of specificity of neural processing typically considered necessary to sustaining healthy functioning (e.g., dedifferentiation; see Koen & Rugg, 2019 for a comprehensive review). We attempted to circumvent this possibility by considering phase specificity as the difference metric described. Our hypothesis was that phase specificity, which should index similarity in neural resources across load levels, should be high in the efficiently functioning brain. We then analyzed the effect of cognitive reserve (i.e., NART IQ as proxy) and brain maintenance (i.e., cortical thickness) factors on this phase specificity measure, hypothesizing that higher phase specificity should positively relate to these factors.
With regard to age-related protective factors, higher NART IQ, for instance, has been associated with greater cognitive reserve capacity (Tucker & Stern, 2011), which has been proposed to mitigate age-related decline and promote efficiency, capacity, and flexibility of cognitive processing (Barulli & Stern, 2013). While cortical thickness has been shown to decline throughout the lifespan (Fjell et al., 2015), preserved cortical thickness with age has been shown to be strongly linked to higher WM performance (Burzynska et al., 2011). Another recent paper by Dominguez and colleagues (2021) looked at healthy older adults aged 70 to 100 and found that better performance on WM and other executive tasks was linked to preserved regional cortical thickness, particularly in the cingulate cortex, although whole-brain cortical thickness was an even better predictor of performance. It indeed may be the case that certain regions may be differentially impacted by the aging process. A longitudinal study by Storsve and colleagues (Storsve et al., 2014), for instance, found accelerating changes with increasing age in the temporal and occipital cortices in addition to decelerating changes in prefrontal and anterior cingulate cortices. For this reason, in the current study we use both whole-brain and regional measures of cortical thickness to investigate its effect on phase specificity. Few studies have investigated the effect of structural brain changes on neural processing in WM, with inconsistent links between activation and cortical thickness in specific (for review, see Maillet and Rajah, 2013). Furthermore, we adopt both a cross-sectional and longitudinal perspective to analyze inter-individual differences in phase specificity over time and how it relates to reserve, brain maintenance, and other individual-level factors, notable testing/controlling for age effects. Critically, we also consider how behavioral performance might be related to our phase specificity measure, as efficiency of neural processing needs be grounded in a behavioral outcome to adequately interpret the utility of certain patterns of resource allocation, be it via overall quantity of activation or, as in the present analysis, similarity across processing resources.

Participants
Analyses were conducted on participants from the Cognitive Reserve study. Longitudinal data was available for 59 participants. Two participants were eliminated due to missing behavioral data and one participant was eliminated due to below chance-level accuracy. Two additional participants were eliminated due to systematic and excessive zero values in their activation maps (> 35%), indicating potential impurities during scan acquisition. Thus, a total of 54 participants were included in the final analysis (see Table 1 for a list of participant demographics), with an age range of 23-70 years at baseline. As highlighted in Table 1, this range did not contain equal sampling at every age; instead, 13 participants were in an age bracket of 23-30 years and 41 participants in a bracket of 55-70 years. All participants were native English speaking, right-handed (Oldfield Edinburgh Handedness Inventory;Oldfield, 1971) adults who were tested at two time points-baseline and 5-year follow-up. Participants were recruited for the study via random market advertising. All participants were screened for severe medical or psychiatric conditions, head injury, hearing or vision impairments, and other impediments that could interfere with MRI acquisition. Participants were also pre-screened for dementia and mild cognitive impairment at both time points using the Dementia Rating Scale (DRS; Mattis, 1988), which has a maximum score of 144 and for which 135 was required for admission to the study (baseline: M = 139.44, SD = 2.97; follow-up: M = 139.57, SD = 3.76; max score = 144).

Procedure
The experiment was designed to acquire fMRI data from participants as they performed two computerized cognitive tasks in-scanner, one relating to working memory (i.e., Letter Sternberg (LS) task) and the other to executive control (i.e., Executive Context (EC) task). Data was acquired at two time points (baseline and 5-year follow-up). The order of task administration was counterbalanced across participants at baseline and randomized at follow-up. Tasks presented at follow-up were identical to those presented at baseline. The present analysis though focuses only on the LS task.

Letter Sternberg (LS) task
The adapted LS task for fMRI consisted of a total of three runs, with each run containing thirty trials. The thirty trials were divided among three conditions that varied in stimulus set size. Each trial consisted of three phases-stimulus, retention, and probe-followed by a 3 s inter trial interval. In the stimulus presentation phase, participants were presented with a string of either one, three, or six uppercase letters, arranged in a 2 × 3 grid for a period of 3 s. This was followed by a retention period of 7 s, in which participants were required to maintain the information in memory. Finally, in the probe phase, participants were presented with a single lowercase letter in the middle of the screen and asked to indicate, by button press, whether the letter had previously appeared in the stimulus set. See Fig. 1 for a schematic representation.

Stimulus presentation
Stimuli were back-projected onto an LCD monitor positioned at the end of the scanner bore. Participants viewed the screen via a tilted mirror system, which was mounted on the head coil. When needed, vision was corrected-to-normal using MR compatible glasses (manufactured by SafeVision, LLC. Webster Groves, MO). Responses were made on a LUMItouch response system (Photon Control Company). E-Prime v2.08, operating on PC platform, was used for stimulus delivery and data collection. Task onset was electronically synchronized with the MRI acquisition device.
fMRI processing Data acquisition Image acquisition was performed using a 3T Philips Achieva Magnet. At the onset of each session, a scout T1-weighted image was acquired to determine the participant's position. A high-resolution T1-weighted magnetization-prepared rapid gradient echo (MPRAGE) scan was collected to capture participants' brain structure, with the following parameters: TE/TR of 3/6.5 ms, flip angle of 8°, in-plane resolution of 256 × 256 voxels, field of view of 25.4 × 25.4 cm, and 180 slices in the axial direction with a slice-thickness/gap of 1/0 mm. For the EPI acquisition, the following parameters were used: TE/TR of 20/2000 ms, flip angle of 72°, in-plane resolution of 112 × 112 voxels, and a slice thickness/gap of 3/0 mm. FLAIR, DTI, ASL, and a resting BOLD scan were additionally acquired; however, for the current paper, only fMRI and the MPRAGE sequence are considered. Each participant's scan was examined by a neuroradiologist for abnormality detection, in which case findings were reported to the participant's primary care physician.
Data preprocessing FMRIB Software Library v5.0 (FSL) and custom-written Python code was used to preprocess the imaging data. In brief, the preprocessing pipeline for each participant's data set consisted of the following: histograms for noise detection (FEAT), spatial realignment to the first volume (MCFLIRT), slice-timing correction using FSL's slicetimer tool, creation of grey-matter mask from the first volume, highpass filtering using a Gaussian kernel and cut-off frequency of 0.008 Hz, and pre-whitening for attenuation of autocorrelation. The first functional volume was then co-registered to the template-aligned T1-weighted image (FLIRT) using the normalized mutual information cost function. General-Linear-Model (GLM) estimation with motion-related nuisance regressors and convolved double-gamma hemodynamic response function was performed, and finally, non-linear registration of functional to structural brain images with normalization into MNI space (FNIRT; Andersson et al., 2007).

Time-series modeling
Participant-level general linear models (GLMs) were created using event-related modeling. The time series were modeled separately for each of the three phases (i.e., stimulus, retention, and probe) and load levels (i.e., load 1, load 3 and load 6) pertaining to each trial. The Fig. 1 Schematic of LS task presentation. In the stimulus presentation phase (3 s), participants were presented with a string of letters of either 1, 3, or 6 letters in uppercase. They were asked to retain this sequence in memory during the retention phase (7 s). Finally, during the probe phase, they were presented with a single letter in lowercase and asked to indicate whether that letter was present during stimulus presentation. Responses were made by button press. Trials were interleaved with a 3-second intertrial interval (ITI). regressors for each phase were constructed by convolving the phase duration with the appropriate canonical hemodynamic response function (HRF) relative to an intrinsic baseline. The intrinsic baseline was modeled as the combination of all intertrial intervals during which no stimuli were presented on the screen. Modeling was done on correct trials only, yielding one parameter estimate (beta) map per phase and load level (nine in total), per participant. While this could, in principle, result in fewer data for older participants, regression analysis showed no significant age effect on accuracy at either time point (baseline: P = 0.54; follow-up: P = 0.156). As the analysis consisted of comparisons between baseline and follow-up beta maps, a gray matter mask was applied reflecting the overlap in meaningful voxels across the two time points, where any voxel containing a 0 value (e.g., noise contamination, weak signal) for more than 5% of participants at either time point was eliminated. This reduced the number of active voxels to 29,924. The remaining 0 values were set to NaN. However, this number was further reduced when we further imposed an overlap between voxels from the activation map and voxels comprising the structural regions on which regional cortical thickness values were calculated (see next section).

Cortical thickness measure
Utilizing each participant's T1-weighted MPRAGE image, cortical thickness measures were derived using the FreeSurfer (v5.1.0) software package (http:// surfer. nmr. mgh. harva rd. edu/). Although the estimation was automatically-generated, gray and white matter segmentation and spatial registration was manually checked based on the guidelines outlined in Fjell et al. (2009). As we were dealing with longitudinal data from two time points, images were processed using the longitudinal stream (Reuter et al., 2012). Specifically, an unbiased within-subject template space and image is created using inverse consistent registration (Reuter et al., 2010). All processing steps are then initialized utilizing common information from the within-subject template, which has been shown to increase reliability and statistical power (Reuter et al., 2012). To obtain cortical thickness estimates, the gray/white matter boundary and cortical surface were first reconstructed (Dale et al., 1999). Next, at each point across the cortical mantle, measurements were calculated as the closest distance from the gray/white matter boundary to the gray matter. Mean cortical thickness was then derived across the entire cortical surface, yielding a single value. However, we also obtained regional cortical thickness values by using a validated neuroanatomical labeling system to subdivide the cortex into 68 regions of interest (ROI), based on the Desikan-Killiany (DK) surface-based cortical atlas (Desikan et al., 2006). Mean cortical thickness was then separately calculated within each of these regions. As we ultimately wished to link cortical thickness values for each region with their corresponding task-based voxel activations, we sought the voxel overlap between the Desikan-Killiany (DK) atlas and our beta-activation maps to identify with which area each of our voxels was associated. For region-based analyses, this process further restricted our voxel set to 15,535 voxels, which spanned over 66 regions instead of 68 regions, due to the initial activation map mask having excluded voxels in two regions (i.e., bilateral frontal poles). The size of the ROI per region ranged from 27 to 918 voxels (M = 235; SD = 191).

Analytic approach
Data were analyzed using custom-written MATLAB ® codes (Mathworks, Natick, Massachussets, USA). Demographic variables including Age, NART IQ (NART), Education (Edu), and Sex were included in every regression model. Whole-brain regression models were assessed for significance using an uncorrected threshold of p < 0.05, while region-specific regression models were assessed at p < 0.005. When treating regressions of longitudinal change, factors with measurements considered at both time points were residualized based on baseline values. More specifically, for cortical thickness, behavioral performance, and neural task correlations, the change values− calculated as follow-up (FU) minus baseline (BL) − were residualized with respect to baseline measurements. This was performed to account for baseline differences while restricting the number of factors included in our models to preserve degrees of freedom due to sample size limitations.

Task-phase specificity
Operating from a principle of efficiency, we theorized that greater efficiency in task processing would manifest as increased similarity across task loads within each phase of the task. The idea is that similar neural resources should be recruited irrespective of task difficulty in the efficient brain. It is somewhat conceptually similar to efficiency measures using slope estimates across task loads, with lower slopes (i.e., lower activation increases) related to increased efficiency. Here though, voxel activation "patterns" across load levels consider the relationship between all voxels within a given ROI to produce a single value per ROI, but are not calculated at the voxel-level as is done with slope calculations. This method serves to reduce the number of comparisons when contrasted to voxel-wise analysis while still considering voxellevel information in the correlation calculation as statistical dependency is measured between vectors of all voxel activations within a given ROI; to obtain a commensurate measure, slope calculations could also render a single value per ROI by averaging slope values across voxels, but this could result in a loss of specificity. Furthermore, when considering slope estimates, necessary attempts to incorporate initial activation levels can add a layer of complexity in interpreting results given differing signs of activation; longitudinal measurements of change complicate this even further. The advantage of using simple bivariate correlation is that the magnitude of activation is inherently normalized.
Whole-brain analysis To calculate phase specificity (PS), we considered pairwise correlations of the activation values between each task condition belonging to a given phase (3) and load level (3). Our neural matrix was represented in three-dimensions, composed of voxels (= 29,924), subjects (= 54), and task (= 9), with PS computed at the single subject level. As there were nine conditions in total, single subject correlation matrices were constructed based on 36 task pairings, or the upper triangle of the 9 × 9 task pairing matrix, where each correlation value reflected the correlation across voxels between each task pairing. High PS is interpreted as high correlation between elements along the diagonal (i.e., within-phase; task activation loads belonging to the same phase) and low correlation between elements off the diagonal (i.e., between-phase; tasks activation loads belonging to different phases). The within-phase (9) and between-phase (27) pairings were then averaged, resulting in a single value for each. The larger the magnitude the difference between these two values, the higher the PS. This procedure resulted in a single vector of subject-level PS values (54 × 1 values). We performed this process twice, once for baseline data and again for follow-up data. In order to capture within-subject longitudinal changes, baseline PS was subtracted from follow-up PS, resulting in a subject-level vector of DPS. These values were ultimately submitted to multiple linear regression modeling, where one model was generated per time point and change between time points.
Region-level analysis PS was calculated in a manner similar to the one previously described, only this time across voxels belong to a given ROI and not an omnibus, whole-brain analysis. ROIs were generated based on the DK surface-based cortical atlas to allow for a direct comparison of local associations between cortical thickness and corresponding PS within a given region. Returning to our 29,924 × 54 × 9 neural matrix, subsets of voxels were selected based on their regional associations and PS calculated separately for each of the 66 ROIs. This procedure resulted in a two-dimensional matrix of region x subject PS values (66 × 54 values). Again, we performed this process separately at baseline and follow-up, and calculated the difference between the two time points, resulting in a two-dimensional matrix of DPS values. These values were ultimately submitted to multiple linear regression modeling, where region (66) by time point (3) regressions were generated.

Behavioral analysis
Behavioral performance measures were reaction time (RT) and accuracy (ACC) for the three task loads. In order to analyze both outcomes jointly as a single measure, we corrected for RT based on ACC using the linear integrated speed-accuracy score (LISAS). For each condition (i.e., task load), LISAS is defined as: RT + (SD RT /SD PE ) x PE, where RT here is the mean RT across trials, SD RT is the standard deviation of RT across all trials, SD PE is the standard deviation of the proportion of errors (PE), and PE is the proportion of errors, or accuracy. The weighting of the PE with the ratio of standard deviations is performed in order to achieve a similar weight between RT and PE, which essentially provides an estimate of the RT corrected for the number of errors. This measure has been shown to outperform other integrated measures of RT and ACC in detecting effects and accounting for variance (for a review, see Vandierendonck, 2017). The mean LISAS across task loads was included in the phase specificity regression analyses. Regression models were created at each time point (baseline (BL), follow-up (FU), and FU-BL) to test the effect of Age, NART, Sex, and Education on mean LISAS across load levels. NART, Education, and Age were considered as continuous predictors and sex as a categorical predictor."

PS regression analyses
For cross-sectional whole-brain regression analyses, where phase specificity was calculated at each time point, the basic structure included the following variables: where phase specificity (PS) was regression on cortical thickness (CT) at a given time point (t), mean LISAS (mLISAS) at time point (t), each demographic variable calculated at baseline and the interaction term CT t x NART. In the case of regional analyses, the same regression analyses were repeated for each region (66), where PS was calculated within each ROI and corresponding regional CT values utilized.
For change regression analyses, where phase specificity was calculated as the difference between time points (FU − BL), the basic structure included the following variables: where the change in phase specificity after residualizing the baseline PS effect (ΔPS resid ) is regressed on the change in cortical thickness after residualizing the baseline CT effect (ΔCT resid ), mean LISAS after residualizing the baseline LISAS effect (ΔmLISAS resid ), the demographic variables calculated at baseline, and the interaction term. Again, in the PS = CT t + mLISAS t + Age + NART + Edu + Sex + CT t * NART ΔPS resid =ΔCT resid + ΔmLISAS resid + Age + NART + Edu + Sex + ΔCT resid * NART case of regional analyses, the same regression was repeated for each region (66), where ΔPS was calculated within each ROI and corresponding regional ΔCT values utilized.

Behavioral
We first plot the behavioral performance LISAS for each time point divided into younger and older adults (see Fig. 2). As can be observed from the graph, LISAS across loads was overall higher at both baseline and follow-up for older participants compared to younger participants; this pattern was to be expected given well-documented age-related declines in processing speed and working memory (e.g., Park et al., 2002;Ebaid et al., 2017). While mean accuracy was overall higher for younger participants than older participants, interestingly, this difference was not significant (P > 0.19). Also, as to be expected, LISAS generally increased across load levels, indicating greater cognitive demand with increasing task complexity. When we tested the effect of Age, NART, Sex, and Education on mean LISAS across load levels at each time point, we found a significant positive effect of Age at both baseline (F(4, 49) = 26.517, p < 0.001, 2 = 0.35) and 5-year follow-up (F(4, 49) = 16.002, p < 0.001, 2 = 0.246).
Additionally, there was a significant effect of Sex at baseline (F(4, 49) = 5.381, p < 0.025, 2 = 0.099), where LISAS was higher for females. There was no significant effect of neither Education nor NART on LISAS at any time point. When considering the difference in LISAS between baseline and follow-up (FU-BL), there was no significant effect of any factor on this change; importantly, there was no disproportionate age-related decline in performance across time points.

Whole-brain regressions
Next, we report the results from the whole-brain analyses of PS at each time point. Figure 3 shows the whole brain correlations per task pairing, averaged across all participants. As can be observed from the figure, within-phase task correlations across load levels were generally higher compared to betweenphase task correlations, at both baseline and follow-up. This effect was most clearly observed in the stimulus and probe phases and to a lesser extent in the retention phase, indicating relatively greater variability of processing. Regression analyses considered the effects of CT, LISAS, NART, Age, Sex, Edu, and the interaction between CT and NART on PS. At baseline, there was a marginal negative trend between LISAS and PS, where higher LISAS was related to a decrease in PS (F(7, 46) = 3.281, P = 0.077, 2 = 0.067). At follow-up, there was a significant effect of Sex on PS, such that females displayed higher PS (F(7, 46) = 7.961, p = 0.007, 2 = 0.148). When investigating the change in PS from baseline to follow-up, regression analysis considered the effect of DCT, DLISAS, NART, Age, Sex, Edu, and the interaction between DCT and NART on DPS. We found a significant positive effect of DCT on DPS, such that higher CT across time predicted higher PS across time (F(7, 46) = 6.508, p = 0.014, 2 = 0.124). We also observed significantly higher PS between time points In the upper panel, from left to right, the linear integrated speedaccuracy score (LISAS), which provides a corrected measurement of RT in milliseconds, in addition to mean accuracy (i.e., mACC) across task loads, is plotted for baseline, follow-up, and the difference between them, respectively. Within each graph, LISAS is plotted for load 1 (L1), load 3 (L3), and load 6 (L6), with values depicted on the left y-axis, and the mean accuracy across loads, with values being depicted on the right y-axis. Green lines represent younger participants while magenta lines represent older participants. For mean accuracy, thick horizontal bars represent the mean of each group and the color-corresponding boxes represent their standard error of the mean (SEM).

Region-level regressions
Finally, we looked at the relationship between PS and each of our factors at level of ROI mean values. For a list of significant findings, see Table 2 and for a visualization of significant effects, see Fig. 4. Interestingly, at baseline, we found a significant negative relationship between LISAS and PS in the right banks of the superior temporal sulcus, indicating that higher PS was associated with better behavioral performance. There was also a significant positive effect of Sex in the right lateral occipital cortex, where females displayed higher PS than males. This effect was also found in the left hemisphere of the same region at follow-up, in addition to three other regions. There was also a significant positive interaction between CT and NART in the left precuneus, such that, similar to the whole brain trend, individuals with higher NART displayed significantly higher PS when presenting with greater brain integrity (see Fig. 5). The left precuneus further displayed a positive association between Sex and PS at both follow-up and when investigating the longitudinal change across time, with females continually displaying higher PS both cross-sectionally and longitudinally. At follow-up, we found significant and independent positive effects of Sex and Education in the left pericalcarine, in addition to a positive Sex effect when looking at longitudinal DPS. We also found significant positive effects of NART in the right pars triangularis, both cross-sectionally at follow-up and longitudinally, where higher NART IQ was related to higher PS. Finally, we observed a significant effect of DCT on DPS, where longitudinal maintenance of brain integrity was related to higher PS across time.

Discussion
In the current study, we aimed to investigate the effect of cognitive reserve and brain maintenance factors on a derived measure of neural efficiency (i.e., phase specificity; PS), where we considered the similarity in neural response across the three load levels within each task phase of the LS task. Brain maintenance, ideally instantiated as a preservation of brain morphology over time, is thought to contribute to the sustainment of efficiency whereas efficiency itself could potentially be a mechanism of cognitive reserve (see Stern et al., 2020). We thus started from the premise that high PS should reflect greater neural efficiency and thus positively relate to brain maintenance and reserve factors. We looked at both cross-sectional and longitudinal findings of change in PS at both the whole-brain and regional level to see if these individual-level factors were related to PS when considering inscanner behavioral task performance in regression analyses. Most notably, we did observe a positive longitudinal effect of change in CT (i.e., brain maintenance measure) on change in PS at both the whole-brain and regional level, suggesting a link between brain maintenance and neural efficiency over time, in addition to a regional effect of baseline NART (i.e., proxy for cognitive reserve) on the change in PS across time. When looking solely at the LISAS behavioral outcome, we observed that older adults were significantly slower than younger adults when considering the mean across all three load levels, at both time points, although there was no evidence for disproportionate declines in performance with age when analyzing change across time points. This finding corroborates the wealth of literature supporting cross-sectional age-related declines in WM (Cansino et al., 2013;Park et al., 2002). We also found a significant Sex effect at baseline, such that females performed worse than males. This was an interesting finding given evidence from a recent meta-analysis on sex differences in verbal working memory suggesting a slight female advantage in performance (Voyer et al., 2021). Another study though by Pliatsikas and colleagues (Pliatsikas et al., 2019) suggested that further sex differences emerge when considering its interaction with age and education, with each of these factors accounted for in our model but without moderation effects being tested. However, it is important to note that most studies considering sex effects in WM have focused on accuracy and not reaction time performance; thus, results from our LISAS analysis do not necessarily translate to findings from the literature. Furthermore, a study by Reed and colleagues (2017) investigating sex differences on verbal working performance across four different tasks, including the Letter Sternberg task, found no significant effect of sex on reaction time performance, neither when analyzing mean reaction time across loads, or any individual load. More conclusive work needs to be done to determine sex-related effects on WM performance.
We observed generally high within-phase task correlations across load levels compared to between-phase task correlations in whole-brain analysis (see Fig. 3), attesting to the greater similarity elicited by load manipulation within a component than across components of memory processing. This was not a totally unexpected finding, as distinct activation patterns have been found for each memory phase (Habeck et al., 2005), although recent studies have also argued for similar neural activity patterns across memory phases (Kragel et al., 2017) as well as network-level patterns (see Kim, 2019). The higher correlations observed here across task load levels implies more sustained and orchestrated activity across increases in task demand.
When analyzing the effects of our factors on PS at the whole-brain level, we observed an effect of Sex both crosssectionally at baseline and longitudinally, with females displaying higher PS at a single time point and preserved across time. Notably, this effect was found after adjusting for behavior (i.e., LISAS) among our other factors. Several studies have displayed significant sex differences in both behavioral and neural responses during various tasks of WM. For instance, when dividing between spatial and verbal WM tasks, some research has revealed that males perform better on spatial memory tasks whereas women perform better on verbal memory tasks (Lejbak et al., 2011). However, other studies have not found sex differences in performance, rather neurofunctional differences that might suggest an engagement of different functional networks to complete a verbal WM task (Speck et al., 2000). Interestingly, Speck and colleagues (2000) found that females more prominently utilized the left hemisphere. When we performed regression analyses at the regional level, while we found both right and left hemispheric effects, there was a slight bias crosssectionally for the left hemisphere and only a left hemispheric effect longitudinally. This effect was found in the left precuneus and left pericalcarine cortex, the former being strongly implicated in verbal episodic memory tasks (for a review, see Cavanna and Trimble, 2006) and the latter forming part of the primary visual cortex and has more recently been suggested to play a role in the capacity limits of working memory (see Bergmann et al., 2016). The longitudinal whole-brain PS analysis revealed a positive relationship between preserved cortical thickness across time and PS, revealing an effect of brain maintenance on our measure of neural efficiency. Another way of considering this finding is that decreases in cortical thickness across time were related to reductions in PS; longitudinal increases in cortical thickness in later life is most often attributed to measurement error (Sele et al., 2021) or inflammation (Fleischman et al., 2010). While our interpretation of PS was based on an assumption of higher neural similarity across load levels being a positive phenomenon, the general pattern of results aligns with such an interpretation; the effect of all factors on PS corroborated this hypothesis. At the regional level, we also found a negative cross-sectional relationship between behavioral performance (LISAS) and PS, such that reaction time increases predicted reduced PS, in the right banks of the superior temporal sulcus after controlling for all other factors. The superior temporal sulcus, among other functions, has been considered a hub for linguistic processing (Deen et al., 2015), serving as a convergence zone between written and spoken language. One speculation for its link to behavior in the present context could be the potential verbal rehearsal of information processed by this region, thus facilitating performance similarly across task load. We also observed a moderation of NART, as a proxy for cognitive reserve, on the relationship between brain structure (i.e., CT) and function (i.e., PS) in the left precuneus. That is, individuals with higher cognitive reserve displayed a greater positive link between brain maintenance and our measure of efficiency, at least cross-sectionally. In addition to its role in episodic memory, the precuneus is considered a functional core of the default mode network, an area sensitive to task-load-related modulation of activation (Sambataro et al., 2010). The fact that increases in cognitive reserve and brain maintenance were linked to greater PS in this region potentially supports a mechanism for neural efficiency and general preservation of neural processing, meriting further investigation via functional network analysis and better-defined mechanistic models.
We also observed an effect of NART both cross-sectionally and longitudinally in the right pars triangularis. Forming a portion of the inferior frontal gyrus, this region has also played a key role in diverse aspects of language processing such as speech production, comprehension and semantic memory (see Elmer, 2016). Interestingly, in a recent metaanalysis of n-back WM tasks by Wang and colleagues (2019), they also showed load-related activation increases in the left inferior frontal gyrus; the effect of NART on right-lateralized inferior frontal gyrus could potentially indicate a reserve mechanism at play to support processing efficiency. Additionally, there was a significant positive effect of change in CT on change in PS, where longitudinal maintenance of brain integrity was related to higher PS across time, specifically in the left supramarginal gyrus. Activation in the supramarginal gyrus has been linked to phonological decision-making (Hartwigsen et al., 2010). More notably however, these two regions have also comprised part of the executive components of working memory, though in the case of the pars triangularis, it is the left hemisphere that is implicated in general executive function for the maintenance of verbal content; the left supramarginal gyrus has instead been linked to attentional shifting towards relevant content (for a meta-analysis, see Nee et al., 2013). However, as suggested by a recent fMRI metaanalysis on verbal WM by Emch and colleagues (2019), it might be important to reconsider the concept of verbal WM N*CT denotes the interaction term of NART * Cortical Thickness. NOTE: All regions displayed bear positive relationship between PS and the respective factor, except LISAS, which displays a negative relationship as principally a left-lateralized process, as related activation has also been observed across bilateral fronto-parietal areas.
Rather strikingly, we did not observe an effect of any measure on PS in the DLPFC, a region considered an important neural substrate for working memory (Curtis & D'Esposito, 2003). One reason for this absence of finding could be the atlas parcellation scheme that we used in calculating regional effects. The DLPFC primarily involves parts of the middle and superior frontal gyri. The Desikan-Killiany atlas broadly distinguishes between these regions, where separable ROIs were created for the caudal and rostral middle frontal gyri and the superior frontal gyrus, the summation of voxels in these regions equaling around 1600 for each hemisphere. It could be that the combination of more specialized subregions within each gyrus more adequately reflect the functionality of the DLPFC, which was not captured by sweeping correlation analyses across voxels within these extensive regions. Another possible interpretation could be that the DLPFC is indeed a region that is sensitive to load level and not only displays load-related activation increases but also greater variability depending on load level. Such a possibility could plausibly result in low phase specificity because the same voxels are not jointly fluctuating across load. By extension, it could be that participants did not express efficiency of processing in the way it was herein operationalized. However, further speculation would necessitate isolation of the specific voxels comprising the DLPFC via a different parcellation scheme, which can be investigated in future work. Our analysis of cross-sectional and longitudinal effects of cognitive reserve and brain maintenance on a derived measure of neural efficiency also comes with caveats. For instance, while we have taken behavioral performance, which is an essential component to assessing efficiency, into consideration in all our models, we have not considered overall activation levels as a result of task load. Other research into neural efficiency though has investigated activation increases across loads, considering slope values without respective intercepts, or initial activation levels (e.g., Zarahn et al., 2007). In the present study, we were interested in determining neural similarity across load levels, under the assumption that similar neural patterns of activation, irrespective of task load, should be present in the efficiently functioning brain. While efficiency might be one plausible interpretation for similar activation patterns across task loads, it is not the only one. Another possible explanation is that similar activation patterns are elicited as a result of expertise, where certain regions high in PS could represent a network that is highly attuned to the task. In such a case, task activations may not necessarily be more "muted" with increased task loads but may represent a set of specialized regions engaged in successful task processing. We therefore do not wish to claim that our operationalization of neural efficiency should be modus operandi for its investigation; moreover, the validity of this method needs be further investigated to rectify alternative explanations. Again, we merely reasoned that efficiency might be thought of in terms of neural similarity Fig. 5 Plot of interaction effect in the left precuneus between CT and NART IQ at baseline (BL). Phase specificity (PS) is plotted on the y-axis, cortical thickness (CT) on the x-axis, and data is median split to show the moderation effect of NART on the relationship between PS and CT. CT has been meancentered. Colored error ribbons reflect the 95% confidence band across task load and if this is the case, it should relate to factors of cognitive reserve and brain maintenance. Furthermore, we did not statistically test at the group level for the brain regions definitively involved in task engagement. Theoretically, high PS can occur in regions outside of task-specific processing regions, particularly as this calculation is eventually averaged across phases, potentially suggesting more domain-general processing. However, as PS was calculated as a relative measure, with the similarity across loads being measured withinversus between-phase, a significantly higher PS should indicate that these regions are not just "silent" during task engagement but are relevant to the task and, as suggested above, may actually represent high attunement to the task. Finally, another caveat of our study is the rather small sample size. While we did adopt a longitudinal approach, which typically makes larger sample sizes more difficult to achieve, there is certainly a lack of power that prevents us from making strong claims. However, our effect sizes did indicate medium to large effects, suggesting rather higher degree of association for our sample.
A future direction of this study would be to also investigate other neural measures such as functional connectivity between regions where high PS was associated with our cognitive reserve and brain maintenance factors. Additionally, we might give more consideration to task-positive versus task-negative areas as our method of parcellation when measuring PS, as there may be differences between the two when age-related alterations (see Kennedy et al., 2017). The current study though was a first attempt at exploring the utility of conceptualizing neural efficiency in terms of within-phase similarity across task loads, and how it relates to individual factors. In light of this aim, we did show both whole-brain and regional longitudinal effects of cognitive reserve and brain maintenance on PS, warranting further investigation into its interpretability and validity as a measure of neural efficiency.

Conclusion
In the present study, we investigated the effect of brain maintenance and cognitive reserve on a measure of neural efficiency, which we term phase specificity. We found a longitudinal effect of change in CT on change in PS at the whole-brain level, suggesting a link between brain maintenance and neural efficiency over time. A longitudinal effect of NART as a proxy for cognitive reserve and brain maintenance, where higher reserve at baseline and maintenance of CT predicted higher PS over time, was also found at the regional level. Additionally, females displayed higher PS over time at the whole-brain and regional level. In sum, given the direction of effect of our findings, both cross-sectionally and longitudinally, phase specificity appears to be a useful measure of neural efficiency as predicted by cognitive reserve and brain maintenance factors.