Data overview and task selection
We used task fMRI data, together with the corresponding event and behavior logs, from the WU-Minn Human Connectome Project (HCP) dataset (S1200 Release, February 2017) (Barch et al. 2013; Glasser, Smith, et al. 2016; Van Essen et al. 2012). Seven task fMRI experiments, completed by young adults (age: 22–35 years), are available: working memory, gambling, motor, language, social cognition, relational processing, and emotional processing.
We chose the task best suited to validating and characterizing CDE according to two criteria. First, the task must involve cognitive processes, with corresponding event/behavior logs, so that the accuracy of the estimated time series can be quantified in terms of activation amplitude and timing. The gambling, motor, and social cognition tasks did not meet this criterion. Second, the proposed CDE paradigm should demonstrate its potential to work even where conventional brain mapping is not applicable; to illustrate this potential, we sought a task in which the recruited cognitive processes were highly correlated with each other in the temporal domain. The working memory, relational processing, and emotional processing tasks were discarded on this criterion. The language task was considered optimal (see Table S1 for the full assessment): the timings of auditory presentation and button presses and the trial difficulty recorded in the log file allowed us to validate the timings and activation strength of the corresponding cognitive components, and the listening, language comprehension, and task-related cognitive processes (e.g., the calculation process in math trials) are highly correlated in the temporal domain.
fMRI data
The fMRI data were collected using a 3T scanner (Siemens Connectome) with an echo-planar imaging (EPI) sequence (32-channel head coil, repetition time [TR] = 720 ms, echo time [TE] = 33.1 ms, in-plane field-of-view [FOV] = 208 × 180 mm, 72 slices, 2.0-mm isotropic voxels, and multiband acceleration factor of 8).
The language task (block design) consisted of math and story trials. In math trials (auditory presentation), addition and subtraction problems were posed (e.g., "Fourteen plus twelve"), and participants responded to a two-alternative forced choice by pressing a button (e.g., "twenty-nine or twenty-six"). The difficulty of the questions increased after three consecutive correct answers and decreased after one failure; the same difficulty adjustment was employed in the story trials. In story trials, passages adapted from Aesop's fables (5–9 sentences) were presented, after which participants responded to a two-alternative forced choice probing comprehension (e.g., after a story about an eagle that saves a man who had done him a favor, participants were asked, "Was that about revenge or reciprocity?") (Binder et al. 2011).
Event and behavioral log
Stories and math problems were presented in auditory form, and the button presses were recorded using E-Prime. The timing of the stimulus presentation, the difficulty of each question, response timing, and accuracy of responses were obtained.
Exclusion criteria and data splitting
Participants with missing data were excluded. Additionally, participants whose task difficulty never changed across the correct trials were excluded, because we needed to evaluate whether the difficulty of the problems was reflected in the estimated time series. In total, we used data from 914 participants. The data were divided into two parts: a training set comprising participants with even ID numbers (446 subjects) and a test set comprising those with odd ID numbers (468 subjects).
Preprocessing
We used the fMRI data in MSMAll CIFTI format, in which gradient unwarping, motion correction, spin-echo fieldmap-based EPI distortion correction, brain boundary-based registration of EPI images to T1-weighted structural images, non-linear registration into MNI152 space, bias field correction, and grand-mean intensity normalization had already been performed (Glasser et al. 2013; Robinson et al. 2014; Glasser, Coalson, et al. 2016). We further performed temporal band-pass filtering (0.01–0.1 Hz) (Byrge and Kennedy 2018; Tong, Hocke, and Frederick 2019; Zhang, Gheres, and Drew 2021) with mirroring, in which a time-reversed (mirrored) copy of the time series was appended at each end of the original time series to reduce edge artifacts caused by the filtering (Cohen 2014); the appended parts were discarded after filtering. However, considerable edge effects that could distort the validation process remained at each end; therefore, two math trials and one story trial at each end were excluded.
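The mirrored band-pass step can be sketched as follows. This is a minimal illustration assuming a Butterworth filter and SciPy; the paper does not specify the filter implementation, so the filter type and order here are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_mirror(ts, tr=0.72, low=0.01, high=0.1, order=2):
    """Band-pass filter a 1-D time series after mirror-padding both ends.

    A time-reversed copy of the series is appended at each end to reduce
    filter edge artifacts; the padded segments are discarded afterwards.
    """
    n = len(ts)
    padded = np.concatenate([ts[::-1], ts, ts[::-1]])  # mirror both ends
    nyq = 0.5 / tr                                     # Nyquist frequency (Hz)
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    filtered = filtfilt(b, a, padded)                  # zero-phase filtering
    return filtered[n:2 * n]                           # drop the mirrored padding
```

With TR = 0.72 s, the 0.01–0.1 Hz passband corresponds to normalized cutoffs well inside (0, 1), and the returned series has the same length as the input.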
Cognitive processes of interest
We defined cognitive processes of interest in a data-driven manner. We first extracted repeatedly occurring spatial patterns in the training set using k-means++, an unsupervised clustering algorithm (Hutchison et al. 2013; Allen et al. 2014). The hyperparameter k was searched over the range 10–800 and set to 25, the value that maximized the number of spatial patterns shared between participants. The 25 spatial patterns were then labeled using NeuroSynth, a meta-analysis platform for spatial information, yielding 100 labels per pattern. From these 2,500 labels, duplicates and anatomical region labels (e.g., gyrus, posterior, hippocampal) were removed. We then manually clustered terms with similar meanings (e.g., hand, finger, finger movements, and index finger), resulting in 14 clusters. From these 14 clusters, we adopted the following eight cognitive processes based on whether they were putatively expected to emerge in the task (Binder et al. 2011; Iuculano et al. 2014; Harada, Bridge, and Chiao 2013) or whether their decoding performance could be quantified: listening, hand, calculation, social, language, task demands, object, and reward. Of the remaining six, eye, resting state, pain, and disorder were removed because they had no association with the language task, and retrieval and response inhibition were removed because they could not be assessed from the event and behavior logs.
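The pattern-extraction step can be sketched as below, a toy illustration with scikit-learn: the random data, matrix sizes, and k here are placeholders, not the actual vertex-wise fMRI frames or the k = 25 selected above.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = rng.normal(size=(300, 20))   # toy stand-in: 300 frames x 20 vertices

# k-means++ seeding is scikit-learn's default initialization
km = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(frames)       # cluster assignment of each frame
patterns = km.cluster_centers_        # repeatedly occurring spatial patterns
```

Each centroid plays the role of one recurring spatial pattern; in the actual analysis these centroids would then be submitted to NeuroSynth for labeling.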
Meta-analytic spatial patterns
The brain activation patterns associated with the eight cognitive processes were created with NeuroSynth, resulting in eight meta-analytic maps (termed association test maps in the original paper (Yarkoni et al. 2011)). We then converted the meta-analytic maps represented in MNI space to surface space using in-house code. All meta-analytic maps are presented in Figure S2. The meta-analytic maps included both positive and negative values. For example, larger positive values in the calculation map indicate that these vertices are more likely to be activated in studies where the term calculation appears in the abstract than in studies where it does not; negative values indicate that these vertices are less likely to be activated when the term is mentioned than when it is not. As the meta-analytic map provided by NeuroSynth denotes the probability of activation, regions with high probability are expected to exhibit stronger activation under the corresponding task. However, it does not follow that regions with low probability (i.e., negative values in the meta-analytic map) will exhibit stronger deactivation. Hence, we replaced all negative values with zero, ignoring these vertices and keeping our models as interpretable as possible. These meta-analytic spatial patterns were then used as regressors in the whole-brain CDE and as masks in the ROI CDE.
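The rectification of the maps amounts to clipping negative values at zero; a minimal sketch with toy values (the numbers are illustrative, not actual association-test statistics):

```python
import numpy as np

# toy association-test values for one meta-analytic map (illustrative)
meta_map = np.array([1.2, -0.4, 0.0, 3.1, -2.2])
rectified = np.clip(meta_map, 0.0, None)   # negative values replaced with zero
```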
Whole-brain CDE
We modeled \(Y(t)\), the spatial pattern of fMRI data at timepoint \(t\), using the meta-analytic maps as regressors with an identity link function as follows:
$$Y\left(t\right)=X\beta \left(t\right)+\epsilon \left(t\right)$$
where \(X\) is the \(V\times C\) meta-analytic spatial pattern matrix and \(\epsilon(t)\) is the residual error. \(V\) denotes the number of vertices, and \(C\) denotes the number of cognitive processes of interest (i.e., eight in this study). We estimated \(\beta(t)\) under the elastic net penalty, as follows (Zou and Hastie 2005):
$$\underset{\beta(t)}{\mathrm{argmin}}\; MSE\left(Y(t),\widehat{Y}(t)\right)+\lambda \left\{ \alpha \left\| \beta(t) \right\|_{2}^{2}+\left(1-\alpha \right)\left\| \beta(t) \right\|_{1} \right\}$$
Note that \(MSE(a,b)\) denotes the mean-squared error between \(a\) and \(b\). With this regression, we estimated a \(C\times 1\) vector \(\beta(t)\) that reflects the activity of the cognitive processes of interest. Repeating this procedure for all \(T\) time points yielded the time series of \(\beta(t)\), a \(C\times T\) matrix \({M}_{reg}\). As \({M}_{reg}\) reflects the time series of cognitive process activities, we examined how precisely the temporal information of cognitive processes had been extracted from the language fMRI data. The hyperparameters λ (0, 0.01, 0.1, 1, 10, 100, and 1000) and α (0, 0.25, 0.5, 0.75, and 1) were tuned to maximize the metrics (defined in the Evaluation metrics section) in the training set. For each metric, the values obtained under all hyperparameter sets were ranked; the ranks were then averaged across metrics to yield one averaged rank per hyperparameter set. We selected the set of hyperparameters (λ = 10, α = 0) with the highest averaged rank as optimal.
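The per-timepoint regression can be sketched as below. This is a toy illustration with scikit-learn and random data; note that scikit-learn's `alpha`/`l1_ratio` parametrization of the elastic net differs from the λ/α convention in the text, so the penalty values here are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
V, C, T = 500, 8, 30                  # vertices, processes, timepoints (toy sizes)
X = rng.normal(size=(V, C))           # stand-in for the meta-analytic pattern matrix
beta_true = rng.normal(size=(C, T))
Y = X @ beta_true + 0.1 * rng.normal(size=(V, T))  # synthetic "fMRI" frames

# one penalized regression per timepoint, collecting beta(t) column by column
model = ElasticNet(alpha=0.01, l1_ratio=0.5, fit_intercept=False, max_iter=5000)
M_reg = np.empty((C, T))
for t in range(T):
    model.fit(X, Y[:, t])
    M_reg[:, t] = model.coef_         # C x 1 vector of process activities
```

Stacking the fitted coefficient vectors over timepoints gives the \(C\times T\) matrix \({M}_{reg}\) described above.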
ROI CDE
In the ROI CDE, at each timepoint \(t\), we extracted activity from the vertices surrounding the peak vertex of each meta-analytic map. The range of included vertices was defined by a hyperparameter r (16 values, ranging from 1 to 16), which denotes the path length from the peak vertex. At timepoint \(t\), the extracted activities for each cognitive process were averaged across vertices to form a \(C\times 1\) vector \(ROI(t)\). Computing this vector for all \(T\) time points yielded the time series of \(ROI(t)\), a \(C\times T\) matrix \({M}_{ROI}\). Using this matrix, we evaluated how precisely the ROI CDE could estimate the dynamics of cognitive processes. We searched for the best value of r using the same metric-ranking procedure described in the Whole-brain CDE section; the optimal r = 7 was used to estimate the cognitive dynamics in the test set.
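Selecting vertices within path length r of the peak is a breadth-first search on the surface mesh; a minimal sketch, assuming the mesh is given as an adjacency list (the actual surface representation is not specified in the text):

```python
import numpy as np
from collections import deque

def roi_mean(activity, neighbors, peak, r):
    """Mean activity over vertices within graph path length r of the peak.

    `neighbors` is an adjacency list for the surface mesh; breadth-first
    search collects every vertex reachable within r edges of the peak.
    """
    dist = {peak: 0}
    queue = deque([peak])
    while queue:
        v = queue.popleft()
        if dist[v] == r:          # do not expand beyond the path-length limit
            continue
        for w in neighbors[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return float(np.mean(activity[sorted(dist)]))
```

On a chain-shaped toy mesh 0–1–2–3–4 with activities 1…5, a peak at vertex 2 with r = 1 averages vertices {1, 2, 3}.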
Evaluation metrics
We evaluated how precisely \({M}_{reg}\) and \({M}_{ROI}\) captured cognitive dynamics by referring to the stimulus and behavioral logs. The cognitive dynamics of the listening and hand processes are associated with externally observable variables, as the logs record the timing of sound presentation and of the participant’s button presses. Because the narration during a trial is a sustained stimulus, the decoding of listening timing was considered accurate if the local trough of the listening time series fell within a window of 7–15 volumes after the sound offset (Boynton et al. 1996; Robson, Dorosz, and Gore 1998; Watanabe et al. 2013). For each participant, we quantified the decoding performance as the proportion of successfully decoded trials among all trials, formalized as follows:
$${Timing}_{listening}^{math} ≔ \frac{\text{successfully decoded math trials}}{\text{all math trials}}$$
$${Timing}_{listening}^{story} ≔ \frac{\text{successfully decoded story trials}}{\text{all story trials}}$$
Note that we included only trials in which a participant answered correctly, because we could not assume the timings and intensities of cognitive processes for incorrect trials; the same inclusion criterion was adopted for the other metrics and for the prediction of cognitive ability. Because button presses evoke transient brain activity, the timing of the hand process was considered accurately decoded in a trial if its local peak fell within a window of 3–11 volumes after the actual button press. The same timing quantification yielded \({Timing}_{hand}^{math}\) and \({Timing}_{hand}^{story}\).
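The window test for a transient event such as a button press can be sketched as below; the peak detector is an assumption, since the exact peak-finding routine is not specified in the text.

```python
import numpy as np
from scipy.signal import find_peaks

def timing_decoded(ts, event_vol, win=(3, 11)):
    """True if any local peak of the estimated time series falls within
    `win` volumes after the event (here, the button press)."""
    peaks, _ = find_peaks(ts)
    return bool(np.any((peaks >= event_vol + win[0]) &
                       (peaks <= event_vol + win[1])))

def timing_metric(trials):
    """Proportion of successfully decoded trials, as in the Timing metrics."""
    hits = [timing_decoded(ts, ev) for ts, ev in trials]
    return sum(hits) / len(hits)
```

For the listening process, the same window logic applies with a 7–15-volume window and a local trough instead of a peak (e.g., by negating the series before peak detection).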
The language process is externally unobservable. However, the activity associated with the semantic processing of sounds must follow the offset of narration and precede the button press: semantic processing of a sound before hearing it is impossible, and correctly responding before semantic processing is improbable. Therefore, in both trial types, decoding was considered successful in a trial if the language process activity peaked between the offset of narration and the actual button press. For each participant, quantification was performed as defined above, resulting in \({Timing}_{language}^{math}\) and \({Timing}_{language}^{story}\).
Although the calculation and social processes are also unobservable, such processes must follow the onset of narration and precede the button press, because it is impossible to think before hearing a question or to correctly answer a question before thinking. Therefore, for math trials, we considered calculation timing to have been successfully decoded if the local peaks of the calculation and hand processes appeared, in that order, within the window from 3 volumes after the onset of narration to 3 volumes after the actual button press. For each participant, quantification was performed as follows:
$${Order}_{calculation}^{math} ≔ \frac{\text{successfully decoded math trials}}{\text{all math trials}}$$
The same applies to the timing of the social process:
$${Order}_{social}^{story} ≔ \frac{\text{successfully decoded story trials}}{\text{all story trials}}$$
As the difficulty of the math and story tasks varied across trials, the intensity of activity associated with the calculation and social processes should reflect this variation. We quantified this decoding performance using Spearman's rank correlation between the local peak activity of the calculation process and math trial difficulty (\({\rho }_{calculation}\)); the same procedure was used to compute \({\rho }_{social}\) for the social process in story trials. Additionally, to examine whether the task demands process reflected task difficulty, we computed Spearman's rank correlation coefficients in the same manner between the local peak activities of the task demands process and math trial difficulty (\({\rho }_{TDs}^{math}\)) and story trial difficulty (\({\rho }_{TDs}^{story}\)).
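The intensity metric reduces to a rank correlation between per-trial peak activity and logged difficulty; a minimal sketch with toy per-trial values (the numbers are illustrative, not actual data):

```python
import numpy as np
from scipy.stats import spearmanr

# toy per-trial values (illustrative): local peak activity of the
# calculation process and the logged difficulty of each math trial
peak_activity = np.array([0.8, 1.1, 1.5, 1.3, 2.0, 1.9])
difficulty = np.array([1, 2, 3, 3, 4, 4])

# rho_calculation: Spearman correlation of peak activity with difficulty
rho_calculation, p_value = spearmanr(peak_activity, difficulty)
```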
To quantify the statistical significance of all 12 metrics for the whole-brain and ROI CDE, we performed a permutation test in which the event timings (e.g., button presses) were scattered randomly while the time series of cognitive processes were left unaltered. We avoided permuting the time series because such permutations destroy the temporal autocorrelation of fMRI data and tend to yield false positives. The event-timing permutation was repeated 1,000 times for each participant to obtain 1,000 values for each metric.
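The event-timing permutation can be sketched as below; the run length, window, and metric are toy assumptions for illustration, not the actual analysis parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_p(observed, metric, n_events, n_perm=1000, t_max=300):
    """One-sided p-value from an event-timing permutation: event volumes
    are scattered uniformly over the run while the estimated time series
    (embedded in `metric`) is left unaltered."""
    null = np.array([metric(rng.integers(0, t_max, size=n_events))
                     for _ in range(n_perm)])
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

# toy usage: the metric counts the fraction of events inside a fixed window,
# mimicking a timing metric computed against a fixed decoded time series
true_events = np.array([102, 105, 108, 103])
metric = lambda ev: float(np.mean((ev >= 100) & (ev < 110)))
p = permutation_p(metric(true_events), metric, len(true_events))
```

Because the observed events all fall inside the window while random timings rarely do, the resulting p-value is small.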
Prediction of cognitive ability
We further examined whether the estimated time series of the cognitive processes reflected the participants’ cognitive ability. The participants in the training and test sets were further divided into “high” and “low” performance groups based on performance in the math and story tasks, respectively. For math trials, the top 50% of participants in terms of accuracy were assigned to the high-performance group, and the others to the low-performance group. For story trials, participants with 100% accuracy were assigned to the high-performance group, and the remainder to the low-performance group. Note that we did not use all trials but selected trials in three steps. First, we used only trials in which the subjects answered correctly because, in failed trials, the subjects might not have focused on the given task. Second, we matched the number of participants in the high- and low-performance groups as closely as possible to avoid class imbalance. Finally, we matched the distribution of trial difficulty between the two groups’ data as closely as possible, because task difficulty is correlated with trial length, which alone could make the binary classification successful.
Then, the time series of each cognitive process in each trial was fed to a linear support vector machine (SVM) to classify whether it was derived from a participant with high or low performance. A time series of 21 volumes (TRs) around the button press was epoched as input for math trials, and 38 volumes (TRs) for story trials (Boynton et al. 1996; Robson, Dorosz, and Gore 1998; Watanabe et al. 2013). The activity values at the respective timepoints were used as features. The SVM was trained on the training set, and the binary accuracy of the trained SVM was evaluated on the test set. No hyperparameter tuning was performed; the default settings of MATLAB’s fitcsvm function were used. The chance level was calculated using a label permutation, and the binary accuracies were compared using McNemar’s test.
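As a rough Python stand-in for this classification step (the original used MATLAB's fitcsvm; the synthetic epochs, effect size, and scikit-learn defaults below are illustrative only):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_epochs(n_trials, n_vols=21, effect=0.8):
    """Toy epochs around the button press: 'high' performers (label 1)
    get a slightly stronger response at every timepoint (illustrative)."""
    y = rng.integers(0, 2, size=n_trials)
    X = rng.normal(size=(n_trials, n_vols)) + effect * y[:, None]
    return X, y

X_train, y_train = make_epochs(200)     # activity at each timepoint = one feature
X_test, y_test = make_epochs(100)

clf = SVC(kernel="linear").fit(X_train, y_train)   # linear SVM, default settings
accuracy = clf.score(X_test, y_test)               # binary accuracy on held-out set
```

The train/test split mirrors the even-ID/odd-ID division of participants described in the Exclusion criteria and data splitting section.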