Twenty-eight participants were recruited, all right-handed native English speakers. All participants provided written informed consent before the experiment. Twenty-five participants were included in further analyses (14 females, ages 18-40). Three were excluded: one due to anatomical anomalies, one due to excessive motion artifacts in the T1 image, and one who fell asleep during the story. The experimental protocol was approved by the Institutional Review Board of Princeton University.
The stimulus was created by Lazaridi (“The 21st Year” -- Excerpt, copyright 2019), who has collaborated with our lab for a number of years 27. She has years of experience in developing and refining techniques for shaping the audience’s understanding, memory, and interpretation of a narrative through screenplay writing and professional screenplay development around the world 28. Compared to other types of writing, the creation of a screenplay is highly audience-driven due to the large investment (in time, collaboration, and financing) inherent in film-making. Furthermore, watching a film is a more continuous experience than reading a book, requiring the screenwriter to guide and unite the audience’s understanding and overall response to the narrative without loss of focus or inner thought digressions.
Lazaridi designed the narrative stimulus as a stand-alone fiction text that incorporated her experience-guided narrative techniques of traditional screenplay writing. The narrative consisted of 45 segments forming two seemingly unrelated storylines, A and B. A and B segments were presented in an interleaved manner for the first 30 segments (part AB). In the last 15 segments (part C), the two storylines merged into a unified narrative. Each segment lasted 41-57 TRs (mean: 46 TRs ≈ 70 s), and segments were separated by silent pauses of 3-4 TRs. The narrative was recorded by a professional actress (June Stein), a native English speaker, and directed by Lazaridi to ensure that the actress’s interpretation matched the author’s intent. The recording is 56 minutes long (see Supplementary Table 1 for the transcription of the story).
In the A and B segments, the author incorporated unique narrative motifs, i.e., specific images/situations/phrases that recurred in part C (see Fig. 1 and Fig. 4 for a sample motif and Supplementary Table 2 for a list of all the sentences containing motifs). The recurrence of motifs in part C is designed to trigger the reinstatement of specific moments from part AB, in order to evolve their meanings and to integrate the two storylines.
In total, there were 28 different narrative motifs, occurring 58 times in part AB and 36 times in part C. The same narrative motif was not always realized with the same words. The main technique Lazaridi used to make motifs memorable was to embed them in emotionally heightened narrative moments. For example, at the beginning of the narrative (part A), Clara serves chili during a party in LA, and her interactions with the dish (serving it, eating it, throwing up after eating it) map onto a series of seminal emotional moments in her personal narrative.
The recording of the narrative was presented using MATLAB 2010 (MathWorks) and Psychtoolbox-3 29 through MRI-compatible insert earphones (Sensimetrics, Model S14). MRI-safe passive noise-canceling headphones were placed over the earbuds for noise reduction and safety. To remove the initial signal drift and the common response to stimulus onset, the narrative was preceded by a 14 TR long musical stimulus, which was unrelated to the narrative and excluded from fMRI analysis. Participants filled out a questionnaire after scanning to evaluate their overall comprehension of the narrative and their ability to relate events in different parts of the story that shared the same motifs.
Subjects were scanned in a 3T full-body MRI scanner (Skyra, Siemens) with a 20-channel head coil. For functional scans, images were acquired using a T2*-weighted echo planar imaging (EPI) pulse sequence (repetition time (TR), 1500 ms; echo time (TE), 28 ms; flip angle, 64°), each volume comprising 27 slices of 4 mm thickness with 0 mm gap; slice acquisition order was interleaved. In-plane resolution was 3 × 3 mm2 (field of view (FOV), 192 × 192 mm2). Anatomical images were acquired using a T1-weighted magnetization-prepared rapid-acquisition gradient echo (MPRAGE) pulse sequence (TR, 2300 ms; TE, 3.08 ms; flip angle 9°; 0.86 x 0.86 x 0.9 mm3 resolution; FOV, 220 x 220 mm2). To minimize head movement, subjects' heads were stabilized with foam padding.
MRI data were preprocessed using FSL 5.0 (http://fsl.fmrib.ox.ac.uk/) and NeuroPipe (https://github.com/ntblab/neuropipe), including BET brain extraction, slice time correction, motion correction, high-pass filtering (140 s cutoff), and spatial smoothing (FWHM 6 mm). All data were aligned to standard 3 mm MNI space (MNI152). Only voxels covered by all participants’ image acquisition area were included for further analysis.
Following preprocessing, the first 19 TRs were cropped to remove the music preceding the narrative (14 TRs) and the time gap between scanning and narrative onset (2 TRs), and to correct for the hemodynamic delay (3 TRs). To verify the temporal alignment between the fMRI data and the stimulus, we computed the temporal correlation between the audio envelope of the stimulus (volume) and the subjects’ mean brain activation in left Heschl’s gyrus, following Honey et al. 30. The left Heschl’s gyrus mask was from the Harvard-Oxford cortical structural probabilistic atlas (thresholded at 25%). The audio envelope was calculated using a Hilbert transform and down-sampled to the 1.5 s TR. The correlations were computed at lags from -100 to 100 TRs to find the lag with the highest correlation. The average peak lag was 0.12 TR across subjects, indicating that the narrative and fMRI data were temporally well-aligned.
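The envelope extraction and lagged-correlation check can be sketched as follows. This is an illustrative numpy/scipy reimplementation, not the original analysis code; the sampling rate and array names are assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def audio_envelope(waveform, sr, tr=1.5):
    """Amplitude envelope via the Hilbert transform, averaged within each TR."""
    env = np.abs(hilbert(waveform))
    spt = int(sr * tr)                      # audio samples per TR
    n_trs = len(env) // spt
    return env[:n_trs * spt].reshape(n_trs, spt).mean(axis=1)

def best_lag(envelope, roi_ts, max_lag=100):
    """Lag (in TRs) at which the envelope correlates most with the ROI series."""
    lags = np.arange(-max_lag, max_lag + 1)
    rs = []
    for lag in lags:
        if lag >= 0:
            a, b = envelope[:len(envelope) - lag], roi_ts[lag:]
        else:
            a, b = envelope[-lag:], roi_ts[:lag]
        rs.append(np.corrcoef(a, b)[0, 1])
    best = int(np.argmax(rs))
    return int(lags[best]), rs[best]
```

A peak lag near 0 TR, as reported above, indicates good temporal alignment.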
To account for the low-level properties of the stimulus, a multiple-regression model was built for each voxel. The regressors included an intercept, the audio envelope, and the boxcar function of the between-segment pauses, convolved with the canonical hemodynamic response function and its derivatives with respect to time and dispersion, as given in SPM8 (https://www.fil.ion.ucl.ac.uk/spm/). For the effects of audio amplitude and between-segment pauses, see Supplementary Fig. 1. The residuals of the regression model were used in all subsequent analyses.
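The confound-regression step can be sketched in numpy. Note that this is a simplified illustration: it uses a rough double-gamma stand-in for SPM8's canonical HRF, omits the derivative regressors, and all function names are assumptions.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr=1.5, duration=30.0):
    """Double-gamma HRF sampled at the TR (a rough stand-in for SPM's canonical HRF)."""
    t = np.arange(0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.sum()

def regress_out(y, regressors, tr=1.5):
    """Convolve each regressor with the HRF, fit by least squares, return residuals."""
    h = canonical_hrf(tr)
    X = [np.ones(len(y))]                                  # intercept
    X += [np.convolve(r, h)[:len(y)] for r in regressors]  # convolved confounds
    X = np.column_stack(X)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta
```

The residuals are, by construction, orthogonal to the intercept and the convolved confound regressors.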
We used 238 functional regions of interest (ROIs) defined independently by Shen et al.16 based on whole-brain parcellation of resting-state fMRI data. A control anatomical ROI was also included: left Heschl’s gyrus defined using the Harvard-Oxford cortical structural probabilistic atlas, thresholded at 25%.
A bilateral anatomical hippocampal mask was obtained using the same threshold. It has been proposed that the temporal integration window of episodic memory representations varies along the hippocampal long axis 31: while the posterior hippocampus represents events at a fine scale, the middle and anterior portions can represent associations spanning multiple events. If so, it would be inappropriate to treat the hippocampus as a functionally homogeneous region. Therefore, we divided the hippocampal mask into anterior (MNI coordinate y > -19), middle (-30 < y ≤ -19), and posterior (y ≤ -30) ROIs, following Collin et al. (2015)31.
All ROIs had more than 50 voxels in our data.
Shared response model
When comparing activation patterns across subjects, the mismatch of functional topographies can decrease analysis sensitivity even after anatomical alignment 32,33. Therefore, we functionally aligned data within each ROI across subjects using the shared response model (SRM; Brain Imaging Analysis Kit, http://brainiak.org) 34. SRM projects all subjects’ data into a common low-dimensional feature space by capturing the components of the response shared across subjects. The input to SRM was a TR x voxel x subject matrix, and the output was a TR x feature x subject matrix. We used fMRI data from the whole story (z-scored over time first) to estimate an SRM with 50 features. Note that no information about storyline or motif was submitted to SRM. Therefore, while this projection inflated the overall inter-subject pattern similarity, it could not artifactually give rise to the storyline or motif effects shown here. The output of SRM was z-scored over time. Unless otherwise stated, all the pattern analyses described below were run on the resulting 50 features.
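The analyses used BrainIAK's SRM implementation; the core idea can be sketched as a minimal deterministic variant in numpy (alternating orthogonal-Procrustes updates). This is a simplified illustration, not BrainIAK's probabilistic algorithm, and the parameter values here are toy choices.

```python
import numpy as np

def fit_srm(data, k=10, n_iter=50, seed=0):
    """Deterministic SRM sketch: find orthonormal maps W_i (voxels x k) and a
    shared response S (k x TRs) minimizing sum_i ||X_i - W_i S||^2.
    data: list of (voxels x TRs) arrays, one per subject, z-scored over time."""
    rng = np.random.default_rng(seed)
    n_trs = data[0].shape[1]
    S = rng.standard_normal((k, n_trs))
    Ws = [None] * len(data)
    for _ in range(n_iter):
        for i, X in enumerate(data):            # W-step: orthogonal Procrustes
            U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
            Ws[i] = U @ Vt
        # S-step: average the back-projected data across subjects
        S = np.mean([W.T @ X for W, X in zip(Ws, data)], axis=0)
    return Ws, S
```

Each subject's data can then be projected into the shared space as `W.T @ X`, giving the feature x TR matrices used in the pattern analyses.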
We also performed the same analyses without the application of SRM. Generally speaking, a subset of the areas that were significant in the analysis with SRM were also significant in the analysis without SRM. Please see Supplementary Fig. 2 for the results.
RSA of storyline effect
To examine the storyline effect, we performed representational similarity analysis (RSA) 14,15 on brain activation patterns and tested whether the representational similarity between segments from the same storyline was higher than that between segments from different storylines. We first computed the averaged activation within each segment across TRs for each voxel. The resulting 45 values were then z-scored across segments. For each ROI, pairwise pattern similarities between the 45 activation maps were computed with the leave-one-subject-out method (Fig. 1b): the Pearson correlation coefficients between one subject’s activation patterns and the averaged patterns of the remaining subjects were computed. The output correlation coefficients (45 x 45 segments) were normalized with Fisher’s z-transformation. This procedure was repeated for each of the 25 subjects and each ROI.
We then contrasted the averaged within- and between-storyline similarities in the AB part, excluding the within-segment similarities (the diagonal of the 45 x 45 similarity matrix), to obtain 25 contrast values (Fig. 2a) for each ROI. These contrast values were compared to zero by a one-tailed one-sample t-test and thresholded at p < .05 (FWE correction for multiple comparisons).
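The leave-one-subject-out similarity computation and the within- vs. between-storyline contrast can be sketched in numpy. This is an illustrative simplification (the real analysis ran on SRM features, and all array names and shapes here are assumptions).

```python
import numpy as np

def fisher_z(r):
    """Fisher z-transformation, clipped for numerical safety."""
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

def loso_similarity(patterns):
    """patterns: (n_subj, n_seg, n_feat).  Returns (n_subj, n_seg, n_seg)
    Fisher-z correlations between each subject's segment patterns and the
    average patterns of the remaining subjects."""
    n_subj, n_seg, _ = patterns.shape
    out = np.empty((n_subj, n_seg, n_seg))
    for s in range(n_subj):
        rest = patterns[np.arange(n_subj) != s].mean(axis=0)
        # corrcoef of stacked rows: first n_seg rows = subject, last n_seg = rest
        c = np.corrcoef(patterns[s], rest)[:n_seg, n_seg:]
        out[s] = fisher_z(c)
    return out

def storyline_contrast(sim, labels):
    """Mean within-storyline minus between-storyline similarity per subject,
    excluding the diagonal.  labels: storyline label ('A'/'B') per segment."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = sim[:, same & off_diag].mean(axis=1)
    between = sim[:, ~same].mean(axis=1)
    return within - between
```

The per-subject contrast values returned by `storyline_contrast` correspond to the 25 values entered into the group t-test.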
To examine whether the storyline effect increased over time, for regions showing a significant storyline effect, we computed the storyline effect in the early (segments 1-14) and late (segments 15-30) halves of the AB part separately. Twenty-five contrast values were generated by comparing the late and early storyline effects (late (same > different storyline) > early (same > different storyline)). These contrast values were again submitted to a one-tailed one-sample t-test (p < .05, FWE). The results were projected back onto the whole-brain surface and visualized using Freesurfer v6 (http://surfer.nmr.mgh.harvard.edu/).
To test the storyline x time effect in a more graded way, we constructed a 45 x 45 time-effect matrix, populated with the average of the two segments’ ordinal positions. For example, the (4, 5) entry of this matrix is 4.5 (= (4+5)/2). Taking only the entries corresponding to within-storyline similarity and excluding the diagonal (Supplementary Fig. 5, upper panel), we computed the Pearson correlation between the time matrix and the pattern similarity matrix. The resulting R-values were entered into a one-sample one-tailed group t-test after Fisher’s z-transformation, within regions showing a significant storyline effect (N = 25, p < .05, FWE). The between-storyline dissimilarity was tested separately in a similar manner. The overlap between these two effects is shown in the lower panel of Supplementary Fig. 5.
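The graded storyline x time test can be sketched as follows, for a single subject's similarity matrix. This is an illustrative numpy sketch with assumed variable names.

```python
import numpy as np

def storyline_time_correlation(sim, labels):
    """Correlate within-storyline pattern similarity with elapsed time.
    sim: (n_seg x n_seg) similarity matrix; labels: storyline label per segment.
    The time matrix holds the mean ordinal position of each segment pair,
    e.g. entry (4, 5) -> 4.5."""
    labels = np.asarray(labels)
    pos = np.arange(1, len(labels) + 1, dtype=float)
    time_mat = (pos[:, None] + pos[None, :]) / 2.0
    # within-storyline pairs only, excluding the diagonal
    mask = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    return np.corrcoef(time_mat[mask], sim[mask])[0, 1]
```

The per-subject R-values from this function would then be Fisher z-transformed and entered into the group t-test described above.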
Temporal receptive window index
Following Yeshurun et al. (2017)5, the TRW index was generated based on an independent dataset from Lerner et al. (2011)1, which includes an intact story (“Pieman”, ~7 min long) and the same story with scrambled word order. Inter-subject correlation (ISC) between the averaged time series of each ROI was computed using the leave-one-subject-out method and normalized using Fisher’s z-transformation. The TRW index was then calculated by subtracting the ISC of the scrambled story from that of the intact story. We then examined the correlation between the TRW index and the storyline effect across regions (Fig. 2c).
Time course of the storyline effect at segment boundary
To further illustrate the time course of the storyline effect, a long-TRW ROI, the posterior cingulate cortex, was selected. We computed the pattern similarities between each of the -40 to 40 TRs around segment onsets and the typical A and B storyline patterns using a leave-one-subject-out method. For example, for the boundary between segments 1 and 2, the -40 to 40 TRs around the onset of segment 2 were extracted from one subject. The typical A storyline pattern was obtained by averaging all the A storyline TRs from the rest of the subjects, excluding the segments analyzed here (segments 1 and 2). The typical B storyline pattern was obtained in the same manner. Pearson correlations between the 81 TRs around segment 2 onset and the typical A and B patterns were calculated and normalized with Fisher’s z-transformation. The same procedure was repeated for each subject and each boundary. Fig. 2b shows the transition from B to A segments. See Supplementary Fig. 3 for the transition from A to B segments.
RSA of narrative motif effect
For each narrative motif occurrence, we obtained the corresponding activation pattern by averaging the 5 TRs immediately after its onset, based on the intuition that the motif effect is transient and lasts only for a few sentences. Pearson correlation coefficients between activation patterns of motifs in the AB part and motifs in the C part were computed with the leave-one-subject-out method and normalized with Fisher’s z-transformation. As shown in Fig. 4, pattern similarities between narrative motifs were grouped into three types: (1) same motif, (2) different motifs from the same storyline, and (3) different motifs from different storylines (unrelated). For example, pattern similarities between different occurrences of “chili” belong to (1). Similarities between “chili” and other A storyline motifs belong to (2). Similarities between “chili” and B storyline motifs belong to (3). The motif effect of each motif occurrence in part C (e.g., each “chili” token) was defined as the averaged type (1) similarity minus the averaged type (2) similarity, in order to eliminate the confound of the storyline effect.
The group motif effect was thresholded with a permutation test. For each ROI, the above procedure was repeated after shuffling the labels of motifs within storylines 10,000 times, creating a null distribution. To correct for multiple comparisons across ROIs, the largest motif effect across ROIs in each of the 10,000 iterations was extracted, resulting in a null distribution of maximum motif effects. Only ROIs with a group motif effect exceeding the 95th percentile of this null distribution were considered significant (p < .05, FWE; Fig. 5, middle).
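The within-storyline label shuffling and the max-statistic FWE correction can be sketched in numpy. This is an illustrative sketch: it assumes the per-ROI effects have already been computed, and all names are assumptions.

```python
import numpy as np

def shuffle_within_storyline(motif_labels, storylines, rng):
    """Permute motif labels separately within each storyline."""
    motif_labels = np.asarray(motif_labels).copy()
    storylines = np.asarray(storylines)
    for s in np.unique(storylines):
        idx = np.where(storylines == s)[0]
        motif_labels[idx] = motif_labels[rng.permutation(idx)]
    return motif_labels

def max_stat_fwe(observed, null_effects, alpha=0.05):
    """Max-statistic FWE correction.  observed: (n_roi,) group effects.
    null_effects: (n_perm x n_roi) effects recomputed under shuffled labels.
    Returns a boolean significance flag per ROI."""
    max_null = null_effects.max(axis=1)           # max across ROIs, per permutation
    threshold = np.quantile(max_null, 1 - alpha)  # e.g. 95th percentile of maxima
    return observed > threshold
```

Because the threshold is taken from the distribution of per-permutation maxima, any ROI exceeding it is significant at the family-wise level.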
Time course of the narrative motif effect
To further illustrate the time course of the motif effect, for each motif in C, the Pearson correlation coefficients between activation patterns of the -5 to 10 TRs around its onset and the activation patterns of motifs in the AB part were computed. Motifs whose -5 to 10 TR window overlapped with the between-segment silent pauses were excluded from this analysis. The resulting coefficients were normalized with Fisher’s z-transformation and averaged by category (same motif and same storyline, different motif but same storyline, and unrelated). For each ROI, we applied two-tailed paired t-tests to compare pattern similarities between categories at each time point (p < .05, FWE correction for time points). Fig. 5 shows the resulting pattern similarity around narrative motif onset.
Narrative motif vs. high-frequency word effect
To verify that the motif effect did not result from repeated wording or word-level semantics, we replaced the narrative motifs with storyline-specific high-frequency words and performed the same RSA. More specifically, among words that occurred only in the A and C parts and words that occurred only in the B and C parts, we chose the 28 words with the highest lemma/word-stem frequencies (Supplementary Table 4). Two of the twenty-eight narrative motifs were included in this list. Together, these words occurred 111 times in the AB part and 110 times in the C part. Among regions showing a significant motif effect, we calculated, for each subject, the difference between the real motif effect and the effect elicited by the high-frequency words. The 25 difference values were entered into a one-sample one-tailed t-test. The results were thresholded at p < .05 (FWE, Fig. 6).
Within-subject RSA of the storyline and motif effects
We used across-subject analyses to boost the signal-to-noise ratio. In a prior study (Chen et al., 2017)17, we found that neural patterns associated with the perception and retrieval of specific events in a movie are shared across subjects. This shared neural coding indicates that averaging neural patterns across subjects can boost the signal-to-noise ratio. For similar reasoning and a detailed analysis of such effects, see Simony et al. (2016)11.
Although averaging responses across subjects is expected to improve the SNR, we were also able to detect the same effects within individuals. As predicted (based on our prior work), the results of these within-subject analyses were qualitatively similar to the across-subject analyses, but somewhat weaker. We include the results of the within-subject RSA, thresholded using a group t-test as in the inter-subject RSA, in Supplementary Figs. 6-9. Considering the potential impact of temporal autocorrelation35 and low-frequency drift36 in fMRI signals on the within-subject similarity matrix, especially between neighboring segments, we also include results thresholded using a label-permutation method (Supplementary Fig. 10). For the storyline effect, we shuffled the labels of segments 1-30 10,000 times to obtain a null distribution of the group mean effect. This procedure was performed for each ROI, and the resulting p-values were corrected for multiple comparisons across ROIs (FWE). The storyline x time effect was tested within regions showing a significant storyline effect by shuffling segment labels within storylines.
Correlation between hippocampal-cortical ISFC and cortical reinstatement of storyline and motif
To examine whether the cortical reinstatement of storyline was dependent on connectivity with hippocampus, we examined the Pearson correlation between hippocampal-cortical inter-subject connectivity (ISFC)11 and the storyline effect across segments for each subject, within ROIs showing a significant storyline effect.
The ISFC was computed within the 0-40 TR time window after the onset of each segment using the leave-one-subject-out method, i.e., the correlation between one subject’s hippocampal activity and the averaged cortical activity of the other subjects. We used the preprocessed data without regressing out the effects of the between-segment pauses and the audio envelope, because the activation pulse between segments (Supplementary Fig. 1) may reflect not only the silence but also memory encoding or retrieval37. SRM was not applied because topographical alignment is not a concern when comparing averaged time series between ROIs. The hippocampus seed was defined using the Harvard-Oxford probabilistic atlas thresholded at 25%.
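The leave-one-subject-out ISFC computation can be sketched as follows, assuming the hippocampal and cortical ROI time series have already been extracted and windowed (all names and shapes are assumptions).

```python
import numpy as np

def loso_isfc(hipp, cortex):
    """Inter-subject functional connectivity: each subject's hippocampal time
    series correlated with the other subjects' mean cortical time series.
    hipp, cortex: (n_subj x n_tr) ROI-averaged time series.
    Returns one Fisher-z value per subject."""
    n_subj = hipp.shape[0]
    out = np.empty(n_subj)
    for s in range(n_subj):
        others = cortex[np.arange(n_subj) != s].mean(axis=0)
        out[s] = np.corrcoef(hipp[s], others)[0, 1]
    return np.arctanh(out)
```

Because one subject's seed is always compared against the other subjects' average, subject-specific noise does not inflate the connectivity estimate, which is the motivation for ISFC over within-subject functional connectivity.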
The storyline effect for each segment was also computed using the leave-one-subject-out method. The typical activation patterns for the A and B storylines were first estimated by averaging data from all but one subject, excluding the current segment. Pattern similarity between the resulting typical A and B patterns and the left-out subject’s activation pattern for the current segment was then computed. The storyline effect was defined as the difference between pattern similarity to the relevant storyline and similarity to the irrelevant storyline, taking the previous segment as the baseline; for example, for a B segment: (current segment’s similarity to B - similarity to A) - (previous segment’s similarity to B - similarity to A).
The correlation between ISFC and the storyline effect across segments was computed for each subject, excluding the first segment of each storyline. The R-values were entered into a one-tailed t-test after Fisher’s z-transformation. This initial analysis did not yield significant results after correction for multiple comparisons across ROIs (N = 25, p < .05, FDR correction). In exploratory follow-up analyses, we then systematically examined the influence of the time window of ISFC, the time window of the storyline effect, the hippocampus seed (whole vs. posterior: MNI y ≤ -30), and the baseline of the storyline effect. We also examined the correlation across subjects in each segment. For each combination of analysis parameters, we corrected for multiple comparisons across ROIs using the FDR method. See Supplementary Fig. 13 for the results. No significant correlation was found after simultaneous FDR correction across ROIs and the twelve sets of analysis parameters.
We examined the correlation between hippocampal ISFC and motif reinstatement in a similar manner. The motif effect was defined and computed using the RSA method described above (based on 5 TRs after motif onsets, using similarity between different motifs of the same storylines as baseline)(Fig. 4). Across motifs in the C part, correlation between the motif effect and ISFC after motif onset was then computed for each subject. We also examined the correlation across subjects for each motif and the influence of ISFC time windows and hippocampus seeds.
Data availability. The data used in this study have been publicly released as part of the "Narratives" collection. Raw MRI data are formatted according to the Brain Imaging Data Structure (BIDS) with exhaustive metadata and are publicly available on OpenNeuro: https://openneuro.org/datasets/ds002245. The data corresponding to this study are indicated using the "21styear" task label. These data can be cited using the following reference:
Nastase, S. A., Liu, Y.-F., Hillman, H., Zadbood, A., Hasenfratz, L., Keshavarzian, N., Chen, J., Honey, C. J., Yeshurun, Y., Regev, M., Nguyen, M., Chang, C. H. C., Baldassano, C. B., Lositsky, O., Simony, E., Chow, M. A., Leong, Y. C., Brooks, P. P., Micciche, E., Choe, G., Goldstein, A., Halchenko, Y. O., Norman, K. A., & Hasson, U. Narratives: fMRI data for evaluating models of naturalistic language comprehension. https://doi.org/10.18112/openneuro.ds002245.v1.0.3