Participants and experimental designs
A total of 208 participants (150 females, 58 males) were recruited from Zhejiang University and Hangzhou Normal University. Two participants who failed to complete all experiments were excluded. The final sample that completed the behavioral tests (seven questionnaires and fourteen computerized WM tasks) consisted of 206 participants (148 females, 58 males) aged 17 to 25 years (M = 20.4, SD = 1.4). All participants had normal color vision and normal or corrected-to-normal visual acuity. They provided written informed consent prior to the experiments and received payment or course credit for their participation. Of these 206 participants, 107 further completed MRI experiments, which included high-resolution T1-weighted scanning, resting-state fMRI scanning, and task fMRI scanning (part of the task fMRI data were reported in another study 66). For the resting-state fMRI analysis, four participants were excluded because of excessive head motion (maximum displacement > 3 mm or 3°, or more than 15% of time points with framewise displacement > 0.5 mm), resulting in 103 participants in the final fMRI sample (M = 19.5, SD = 1.3; 61 females, 42 males). For the ROI analysis, a subgroup of 47 of the 107 participants (M = 19.9, SD = 0.8; 29 females, 18 males) performed the N-back task, and an independent group of 34 participants (M = 23.3, SD = 2.6; 16 females, 18 males) was recruited to perform the change detection task (for details of the N-back and change detection tasks, please refer to Zhou et al. 66). This study was approved by the ethics committee of Zhejiang University.
The total duration of the behavioral experiments was approximately 210 minutes. To limit fatigue effects, we divided the testing into three sessions, which were conducted on three separate days. In Session 1, participants completed seven questionnaires that were unrelated to the present study and are not reported here. In Sessions 2 and 3, they completed a series of WM tasks in a dark compartment of the laboratory. Resting-state fMRI data and the N-back task fMRI data were collected after the participants had completed all behavioral tests. An independent group performed an event change detection task in the fMRI scanner (preliminary results have been reported in a previous study 66).
WM tasks
The stimuli in the 14 WM tasks were generated using MATLAB Psychophysics Toolbox and presented on a black (0, 0, 0) background. Participants sat in a dark room, 60 cm in front of a 17-inch CRT monitor with a resolution of 1024 × 768.
For storage tasks (event storage, object storage, and binding storage), the change-detection paradigm was adopted, and the number of memory items was set to two, four, and six (except for the location storage task, which was set to four, six, and eight). There were 26 trials under each memory load, and 78 trials in each task. Participants were allowed to take a break after every 26 trials. Before the formal experiment, an 8-trial practice was conducted.
Event storage tasks
Event storage tasks included biological movement (BM) and non-biological movement (NBM) tasks. Point-light displays (PLDs) and solid agents were used as two types of BM stimuli, and rectangular and circular movements were used as two types of NBM stimuli.
PLD BM. For each PLD movement, 13 light points were placed at the distinct joints of a moving human body to form a coherent and meaningful movement. We selected nine movements from the database of Vanrie and Verfaillie 88: cycling, jumping, painting, spading, walking, waving, chopping, paddling, and saluting (see Supplementary Figure S7 B). Each animation consisted of 30 distinct frames, and each frame was displayed twice in succession, yielding a 1-s PLD at a refresh rate of 60 Hz. Each stimulus subtended a visual angle of approximately 1.64° × 1.64°. The spatial locations of the PLDs were randomly selected from eight evenly distributed spots on an invisible circle with a radius of 4.88° from the screen center.
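The timing and layout described above can be checked with a short calculation. The following is a minimal sketch, not the original MATLAB stimulus code: the starting angle of the first spot is an assumption (the text does not specify it), and the degree-to-centimeter conversion assumes a stimulus centered on the line of sight at the stated 60-cm viewing distance.

```python
import math

REFRESH_HZ = 60          # monitor refresh rate stated in the text
N_FRAMES = 30            # distinct animation frames per PLD
REPEATS_PER_FRAME = 2    # each frame is shown on two consecutive refreshes
RADIUS_DEG = 4.88        # radius of the invisible circle (degrees of visual angle)
N_SPOTS = 8              # evenly distributed candidate locations
VIEW_DIST_CM = 60        # viewing distance stated in the text


def animation_duration_s():
    """Duration of one PLD animation in seconds."""
    return N_FRAMES * REPEATS_PER_FRAME / REFRESH_HZ


def spot_positions_deg(start_angle_deg=0.0):
    """(x, y) offsets from the screen center; start_angle_deg is an assumption."""
    step = 360.0 / N_SPOTS
    positions = []
    for i in range(N_SPOTS):
        angle = math.radians(start_angle_deg + i * step)
        positions.append((RADIUS_DEG * math.cos(angle), RADIUS_DEG * math.sin(angle)))
    return positions


def deg_to_cm(size_deg, distance_cm=VIEW_DIST_CM):
    """Physical extent of a centered stimulus that subtends size_deg."""
    return 2 * distance_cm * math.tan(math.radians(size_deg) / 2)
```

Under these assumptions, 30 frames × 2 refreshes at 60 Hz gives the 1-s animation reported in the text, and a 1.64° stimulus spans roughly 1.7 cm on the screen.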
The procedure of a single trial is shown in Figure S7 A. Each trial began with a display of the word "Coca-Cola" (in Chinese) for 500 milliseconds (ms), which reminded participants to repeat it aloud throughout the task. This manipulation was intended to prevent participants from verbally coding the BM. After an interval of 150–350 ms, the memory array appeared for four seconds. Participants were required to remember these stimuli. After a 900-ms retention interval, a red probe was presented at the center of the screen. Participants were required to determine within 3 seconds (s) whether the red probe had appeared in the memory array. They pressed the "J" key if it had appeared in the memory array and the "F" key if it had not. The probe BM was one of the memorized actions in 50% of the trials and a new action in the remaining 50%. After the response, there was an interval of 500–700 ms between trials.
Solid BM. The second type of BM stimuli consisted of solid agents created using the Poser software. We selected nine movements: arm, foot and hip, jumping, squatting, walking, waving, saluting, turning, and bowing (see Figure S7 C). The other settings were the same as those in the PLD BM task.
Rectangle movement (Rec Move). The movements of the rectangles were created using a 12-dotted PLD. 47 In line with the PLD BM stimuli, the dotted rectangle showed nine distinct movements (see Figure S7 D). (1) The left and right sides moved downward by 60° relative to their vertical positions and then returned. (2) The left side moved upward by 45° relative to its vertical position and then returned, while the right half moved upward by 45° relative to its vertical position and then returned. (3) The left and right halves moved downward by 90° relative to their vertical positions and then returned, while the top side rotated around the middle dot once. (4) The top side moved upward by 90° relative to its horizontal position and then returned. (5) The top half of the rectangle rotated 90° clockwise relative to its vertical position, while the left bottom half moved upward by 90°. (6) The top side rotated around the top dot on the right side by 180°, while both the left and bottom sides moved downward by 90°. (7) The right side moved downward by 45° and then returned. (8) The left side rotated 180° clockwise around the middle dot, while the right side moved downward by 90° relative to its vertical position. (9) The top side moved downward by 90° relative to its horizontal position and then returned, while the right side moved 180° clockwise relative to its horizontal position and then returned. The other settings were the same as those in the PLD BM task.
Circle movement (Circle Move). We used 13-dotted PLDs of circular movement, containing nine distinct movements 89: moving up, moving down, rolling right, rolling left, shrinking, inflating, moving diagonally, splitting horizontally, and splitting vertically (see Figure S7 E). The initial radius of each circle was approximately 1.15°. Each movement lasted for 1 s. The other settings were the same as those in the PLD BM task.
Object storage tasks
Color. We used nine distinct colors (Figure S7 F): white (255,255,255, in RGB value), yellow (255,255,0), lime (0,255,0), gray (128,128,128), light pink (255,178,193), aqua (0,255,255), blue (0,0,255), red (255,0,0), and magenta (255,0,255). The colored squares (1.1° × 1.1°) were presented at eight spots on an invisible circle with a radius of 4° from the center of the screen. After a 300–400 ms interval, the memory array was displayed for 200 ms. The other settings were the same as those in the PLD BM task.
Shape. We replaced the nine colored squares with nine distinct shapes (see Figure S7 G; 1.6° × 1.6°). The other aspects were the same as those of the color WM task.
Location. To avoid a ceiling effect, participants were required to remember the locations of four, six, or eight white squares (0.2° × 0.2°). They judged whether the probe (a red square) appeared at one of the memorized locations. The stimuli appeared inside an invisible circle with a radius of 5° from the center of the screen. The squares were evenly distributed across the four quadrants and did not overlap. A fixation (“+”) remained at the center of the screen from the blank interval after the Coca-Cola presentation until the end of the detection phase. The other settings were the same as those in the color task.
Binding tasks
Binding tasks were designed to assess WM capacity when two different features were presented within the visuo-spatial domains (color-location binding) or across the verbal and visual WM domains (color-letter binding). Participants had to hold them together in WM to respond correctly to the task. The probe could be one of the old items in the memory array in 50% of trials. In the remaining trials, the probe was formed by combining the different dimensions of the two items in the memory array.
Color-location binding. Participants were required to remember the bindings between the color and location of the 0.8° × 0.8° squares. The same colors were chosen from the color task, except that light pink was replaced with green (0,128,0). The squares appeared for 200 ms in an area of 10° × 10° in the screen center and were non-overlapping. Subsequently, a probe appeared at a certain location on the screen, and participants were required to report whether its color matched the memory item at the corresponding location. The fixation settings were the same as those used for the location task.
Color-letter binding. Participants were required to remember the binding between color and capital letters (A, B, C, D, E, H, M, J, and K). Each letter subtended a visual angle of approximately 1.3° × 1.3° and was presented on an invisible circle with a radius of 4° from the center of the screen for 200 ms. A colored capital letter then appeared at the center of the screen. Participants were asked to determine whether any memory item had matched the probe in both letter and color. No fixation was displayed during the task.
Central executive tasks
Anti-saccade (Anti). This task was adapted from the task used by Unsworth and Spillers. 90 Each trial started with a fixation point displayed for 200, 600, 1000, 1400, or 1800 ms. A cue was then displayed either to the left or right of the fixation point for 100 ms, followed by a 50-ms interval. This procedure was repeated a second time. After the disappearance of the cue, a target letter (B, P, or R) was immediately presented to the opposite side of the second cue for 100 ms, which was followed by a masking stimulus (an “H” for 50 ms and then an “8” that remained on screen until a response). Participants were required to identify the target letter (B, P, or R) by pressing a corresponding key (1, 2, or 3). There were 15 practice and 40 formal trials. The proportion of correct responses was then recorded.
N-back. The task began by showing the fixation in the center of the screen for 1000 ms. A sequence of black squares then appeared in the eight outer squares of a three-by-three grid (the middle square was not used), and participants were asked to decide whether the location of each square matched the one appearing three items (3-back) before. The presentation time was 500 ms for each square, with inter-stimulus fixations of 2000 ms. Participants pressed “F” if the locations were identical or “J” if they were not. The task included two blocks, each with 40 formal trials. We conducted eight practice trials before the formal experiment. The proportion of correct responses was then recorded.
Other WM tasks
Operation span (Ospan). Participants remembered a set of consonants (F, H, J, K, L, N, P, Q, R, S, T, and Y) while solving arithmetic problems. Three practice sessions preceded the formal trials to familiarize participants with the procedure. The first session was a simple letter span task in which participants recalled a set of consonants in the order of presentation. Each letter was presented in the center of the screen for 1000 ms. During recall, a 4 × 3 matrix of letters was displayed, and participants selected the letters in the correct order by clicking the box beside each letter. In the second session, participants solved 15 arithmetic problems (e.g., (3 × 2) + 6 = ?). The processing time for each problem was recorded, and after all problems were completed, the program calculated the mean and standard deviation (SD) of each participant’s solution times. The third session combined the simple span and arithmetic tasks: participants were first presented with a mathematical operation and then a letter. The time limit for each operation was fixed at the participant’s mean time plus 2.5 SD from the second session, so that participants had little time to rehearse the letters. After the participants completed all three practice sessions, the program proceeded to the formal trials, which were similar to the trials in the third practice session. The list length varied randomly from four to eight letters, with three trials for each set size. The dependent variable was the number of lists in which all letters were recalled in the correct order. Participants repeated the task if their overall accuracy on the math portion was lower than 80%.
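The individualized time limit and the all-or-nothing list scoring described above can be sketched as follows. This is an illustrative sketch rather than the original task code: the function names are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, because the text does not state which SD formula the program used.

```python
from statistics import mean, stdev


def operation_time_limit(practice_times, n_sd=2.5):
    """Mean practice solution time plus n_sd SDs (sample SD is an assumption)."""
    return mean(practice_times) + n_sd * stdev(practice_times)


def span_score(presented_lists, recalled_lists):
    """Number of lists in which every item was recalled in the presented order."""
    return sum(
        1
        for presented, recalled in zip(presented_lists, recalled_lists)
        if list(presented) == list(recalled)
    )


def math_accuracy_ok(n_correct, n_total, threshold=0.80):
    """False means the participant would repeat the task (accuracy below 80%)."""
    return n_correct / n_total >= threshold
```

For example, a list recalled with two letters swapped contributes nothing to the score, because only perfectly ordered recall counts.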
Symmetry span (Sspan). Participants were required to remember the location of sequences of red squares (1.1° × 1.1°) presented within a 3 × 3 matrix while performing a symmetry judgment task. In line with Ospan, the task started with three practice sessions: two trials of the simple square-span task, 15 symmetry judgment tasks, and three trials of the combined task. In the formal trials, participants first saw a black-and-white 8 × 8 grid pattern and decided whether it was symmetrical along the vertical axis. The time limit was calculated in the same manner as for Ospan. Then, participants were presented with a 4 × 4 matrix with one of the cells filled in red for 650 ms. During recall, a blank 4 × 4 matrix appeared, and participants had to recall the red squares in the same order as they were presented by clicking on the corresponding locations. There were three trials for each set size with a list length ranging from three to six. The same scoring procedure was used as with the Ospan task.
Multiple-object tracking (MOT). Twelve white disks, each subtending a visual angle of 0.25°, were presented in an area of 8.5° × 8.5° at the center of the screen. Each trial was divided into three phases: marking, tracking, and detection. The marking phase lasted for 3000 ms, during which all disks remained still. Two, four, or six disks turned solid red for 2500 ms, prompting the participants to track the positions of these disks; all disks then became hollow for 500 ms. The tracking phase lasted for 4500 ms. All disks moved in random directions at speeds ranging from −0.00058°/ms to 0.0064°/ms and did not overlap during the movement. The detection phase lasted for a maximum of 1500 ms. All disks stopped moving, and one of them turned solid green. Participants were required to determine whether this disk was one of the disks marked in the marking phase. This task did not require verbal suppression, and the other aspects were the same as those in the PLD BM task.
Behavioral analysis
For WM storage tasks, the WM capacity for each type of stimulus was estimated using Cowan’s formula 91: K = S × (H − F), where K is the WM capacity, S is the number of to-be-memorized stimuli, H is the hit rate, and F is the false alarm rate. We calculated K for each set size for each participant and took the maximum K (Kmax) across the three load conditions as that participant’s WM capacity.56,92,93 For the MOT task, we used the formula of Scholl et al. 43 to derive the effective number of objects tracked: K = S × (2P − 1), where K is the effective number of objects tracked, S is the number of targets, and P is the tracking accuracy in a given tracking-load condition. For all tasks, univariate outliers were defined as individual scores that deviated by more than 3 SDs from the respective grand mean. Of a total of 2884 observations, five met this criterion. These scores were replaced with the corresponding cut-off values (M ± 3SD).
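The capacity estimates and the outlier treatment can be expressed compactly. The sketch below is illustrative only: the function names are hypothetical, and the sample SD is assumed for the 3-SD cut-off because the text does not specify the SD formula.

```python
from statistics import mean, stdev


def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's K = S * (H - F) for a single load condition."""
    return set_size * (hit_rate - false_alarm_rate)


def k_max(rates_by_load):
    """rates_by_load maps set size -> (hit rate, false alarm rate)."""
    return max(cowan_k(s, h, f) for s, (h, f) in rates_by_load.items())


def mot_k(n_targets, accuracy):
    """Effective number of objects tracked, K = S * (2P - 1)."""
    return n_targets * (2 * accuracy - 1)


def clip_outliers(scores, n_sd=3.0):
    """Replace scores beyond M +/- n_sd * SD with the cut-off value."""
    m, sd = mean(scores), stdev(scores)
    low, high = m - n_sd * sd, m + n_sd * sd
    return [min(max(x, low), high) for x in scores]
```

As a worked case, a participant with H = 0.85 and F = 0.15 at set size four has K = 4 × 0.70 = 2.8, which becomes Kmax if the other two loads yield smaller values.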
All structural equation models were estimated using Mplus 8. 94 The robust maximum likelihood estimation method (MLM), which was developed for non-normal data, was used for model estimation. 95 The evaluation of the fit statistics was based on the criteria recommended by Kline 45 and Distefano 96. Specifically, the fit of a model was considered good (or acceptable) if normed χ2 (χ2/df) ≤ 2 (3), RMSEA ≤ 0.05 (0.08), SRMR ≤ 0.05 (0.10), and CFI ≥ 0.95 (0.90). We compared models by considering changes in CFI and AIC. A difference of 0.01 or larger in CFI was considered substantial. 97 AIC was used to compare non-nested models, with the model with the smaller AIC being preferred. 98
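The fit criteria above amount to a simple decision rule, sketched here for illustration. Two points are assumptions rather than statements from the text: that all four indices must meet a threshold jointly, and the label "poor" for models that meet neither set of thresholds. The function names are hypothetical and do not correspond to any Mplus output.

```python
def classify_fit(chi2, df, rmsea, srmr, cfi):
    """Classify model fit, assuming all four criteria must hold jointly."""
    normed_chi2 = chi2 / df
    if normed_chi2 <= 2 and rmsea <= 0.05 and srmr <= 0.05 and cfi >= 0.95:
        return "good"
    if normed_chi2 <= 3 and rmsea <= 0.08 and srmr <= 0.10 and cfi >= 0.90:
        return "acceptable"
    return "poor"  # hypothetical label for models meeting neither set


def substantial_cfi_difference(cfi_a, cfi_b, threshold=0.01):
    """True if two models differ in CFI by at least the threshold."""
    return abs(cfi_a - cfi_b) >= threshold


def prefer_lower_aic(aic_a, aic_b):
    """Return 'a' or 'b' for the non-nested model with the smaller AIC."""
    return "a" if aic_a < aic_b else "b"
```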
Brain-behavior prediction analyses
We calculated the scores of all latent variables to investigate the neural correlates underlying different WM components. We first transformed the scores of each WM task (accuracies for CE tasks and Kmax values for storage tasks) into z-scores. Following the structure of Model 14, we then averaged the z-scores of the corresponding tasks to obtain the score of each first-order factor. For example, BM = 0.5 * (zPLDBM + zSolidBM). The score of each second-order factor was obtained by averaging the scores of its first-order factors; for example, EVENT = 0.5 * (BM + NBM). Thus, we obtained scores for the EVENT, OBJECT, and CE components. Support vector regression (SVR) models were trained using the leave-one-out cross-validation (LOOCV) method to predict the scores of the CE, EVENT, and OBJECT components derived from factor analysis. The prediction procedure was performed using the MVPANI toolbox, 101 which was programmed based on the LIBSVM toolbox (http://www.csie.ntu.edu.tw/~cjlin/libsvm/). Feature ranking and selection were performed using only the training dataset. For each iteration, one participant was left out as the testing sample, whereas the remaining participants constituted the training dataset. Within the training dataset, the predictive weight of each edge was determined through SVR model training, and the absolute values of the weights were sorted in descending order. In previous studies that used SVR to reveal brain-behavior associations, prediction models with approximately 200 features achieved good performance. 33,50 Thus, five feature subsets, from the top 1‰ to the top 9‰ in steps of 2‰ (feature numbers ranging from 35 to 322), were used to predict the WM score of the testing sample in the current study; additional subsets were also tested (see Supplementary materials).
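The composite scoring and the weight-based feature ranking can be illustrated with a few lines of code. This is a sketch under stated assumptions, not the MVPANI implementation: the function names are hypothetical, z-scores use the sample SD, and the number of retained edges is truncated to an integer, none of which is specified in the text.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize task scores across participants (sample SD is an assumption)."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def average_scores(*columns):
    """Per-participant average, e.g. BM = 0.5 * (zPLDBM + zSolidBM)."""
    return [mean(vals) for vals in zip(*columns)]


def top_permille(weights, permille):
    """Indices of the edges with the largest absolute weights.

    Truncating the count to an integer is an assumption.
    """
    n_keep = int(len(weights) * permille / 1000)
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return ranked[:n_keep]
```

A second-order score follows by applying `average_scores` again, for instance to the BM and NBM columns to obtain EVENT. In a cross-validated pipeline, `top_permille` would be called on weights estimated from the training participants only, so that the left-out participant never influences feature selection.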
The correlation coefficient between the predicted and actual WM scores was calculated to quantify the prediction accuracy, and a permutation test was used to assess its significance. For each permutation, the WM scores were shuffled and the same LOOCV prediction procedure was performed to generate a correlation coefficient between the predicted and shuffled WM scores. After 1000 permutations, the p value was calculated as the proportion of permuted correlation coefficients that exceeded the coefficient obtained without shuffling the WM scores. A p value smaller than 0.01 was considered significant, controlling for multiple comparisons across the five threshold values for each WM component.
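The permutation procedure can be sketched as follows. This is a minimal illustration, not the toolbox code: `loocv_predict` is a hypothetical callable standing in for the full LOOCV SVR pipeline (it takes a list of scores and returns the cross-validated predictions), and the fixed random seed is an assumption added for reproducibility.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p(scores, loocv_predict, n_perm=1000, seed=0):
    """Return (observed r, p), where p is the share of permuted r above observed r.

    loocv_predict is a hypothetical stand-in for the LOOCV prediction pipeline.
    """
    rng = random.Random(seed)
    observed = pearson_r(loocv_predict(scores), scores)
    n_exceed = 0
    for _ in range(n_perm):
        shuffled = list(scores)
        rng.shuffle(shuffled)  # break the link between brain features and scores
        if pearson_r(loocv_predict(shuffled), shuffled) > observed:
            n_exceed += 1
    return observed, n_exceed / n_perm
```

Rerunning the entire cross-validated prediction inside each permutation, rather than only recomputing the correlation, keeps the null distribution matched to the actual procedure, including the feature selection step.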
ROI analyses of event WM tasks
An event N-back task and a change detection task were employed to investigate whether the brain regions identified as important by the prediction models participated in WM processing. Both tasks used PLD biological and non-biological movements as stimuli and consisted of two WM loads (0back and 2back for the N-back task, 2sets and 4sets for the change detection task). For the N-back task, biological movement 0back, biological movement 2back, non-biological movement 0back, and non-biological movement 2back were regarded as separate regressors in the general linear modelling (GLM) analyses. For the change detection task, biological movement 2sets, biological movement 4sets, non-biological movement 2sets, and non-biological movement 4sets regressors were constructed for both encoding and delay periods (for task and GLM details, please refer to Zhou et al. 66). Beta values were extracted from each load for each stimulus type of the event WM task. We conducted one-sample t-tests to test the significance of activation, and paired t-tests to compare activation differences between low and high loads for each stimulus type (biological and non-biological movements).
Additionally, we conducted time-series analyses for the change-detection task to investigate the BOLD signal change percentage during the task. Specifically, for each participant, we extracted the average time-course of the ROI and segmented it according to the onset time of each encoding period. We then averaged the time-course segments of each load for each stimulus type. The resultant time-courses were further converted to the percent signal change for each condition by subtracting the value of the corresponding time points of the baseline trial and then dividing by the baseline trial value. We observed two peaks in the time-series which corresponded to the encoding and probe periods. Accordingly, we selected signals between 7 and 9 s for the encoding phase, signals between 17 and 19 s for the probe phase, and signals between 13 and 15 s for the delay phase. 102 Finally, we conducted paired t-tests to test the load effects (4sets > 2sets) for each stimulus type.
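The conversion to percent signal change and the averaging within each time window can be sketched as below. The function names and the `times_s` vector of sample times are hypothetical (the sampling interval is not given here), and the factor of 100 is implied by "percent" rather than stated explicitly.

```python
def percent_signal_change(condition_tc, baseline_tc):
    """Pointwise (condition - baseline) / baseline, expressed in percent."""
    return [100.0 * (c - b) / b for c, b in zip(condition_tc, baseline_tc)]


def window_mean(timecourse, times_s, start_s, end_s):
    """Mean signal over samples whose time falls within [start_s, end_s]."""
    values = [v for v, t in zip(timecourse, times_s) if start_s <= t <= end_s]
    return sum(values) / len(values)


# Analysis windows from the text, in seconds after encoding onset.
WINDOWS_S = {"encoding": (7, 9), "delay": (13, 15), "probe": (17, 19)}
```

For each participant and stimulus type, the windowed means of the 4sets and 2sets conditions would then enter the paired t-test on the load effect.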