Relating the Past with the Present: Information Integration and Segregation during Ongoing Narrative Processing

Abstract This study examined how the brain dynamically updates event representations by integrating new information over multiple minutes while segregating irrelevant input. A professional writer custom-designed a narrative with two independent storylines, interleaving across minute-long segments (ABAB). In the last (C) part, characters from the two storylines meet and their shared history is revealed. Part C is designed to induce the spontaneous recall of past events, upon the recurrence of narrative motifs from A/B, and to shed new light on them. Our fMRI results showed storyline-specific neural patterns, which were reinstated (i.e., became more active) during storyline transitions. This effect increased along the processing timescale hierarchy, peaking in the default mode network. Similarly, the neural reinstatement of motifs was found during Part C. Furthermore, participants showing stronger motif reinstatement performed better in integrating A/B and C events, demonstrating the role of memory reactivation in information integration over intervening irrelevant events.


INTRODUCTION
Real-life events unfold over multiple minutes. Using real-life stimuli such as stories and movies, previous studies have revealed a cortical hierarchy of timescales that synthesize information over increasing temporal receptive windows (TRWs; Baldassano et al., 2017; Yeshurun, Nguyen, & Hasson, 2017; Chen et al., 2016; Honey, Thesen, et al., 2012; Lerner, Honey, Silbert, & Hasson, 2011). To account for these findings, we proposed a process-memory model (Hasson, Chen, & Honey, 2015). Unlike classic theories of working memory, which distinguish between areas that process incoming information and working memory buffers that accumulate and protect the processed information (Cowan, 2008; Baddeley, 2003), the process-memory model posits that all cortical areas actively sustain memories while dynamically synthesizing them with newly arrived input at their preferred timescales. Namely, early sensory areas integrate information over short timescales of tens of milliseconds, coinciding with the duration of phonemes and words. Adjacent areas along the superior temporal cortex integrate information over hundreds of milliseconds, coinciding with the duration of single sentences, whereas high-order areas, which overlap with the default mode network (DMN; Buckner, Andrews-Hanna, & Schacter, 2008; Raichle et al., 2001), integrate information across paragraphs as the narrative unfolds over many minutes. This framework illustrates a simple recurrent mechanism for continuous event updating in long processing timescale areas at the top of the hierarchy. However, in real life, we often have to integrate discontinuous pieces of information to develop a full understanding of an event. This raises the question of how areas with long processing timescales integrate incoming information with relevant past events, while, at the same time, preserving the accumulated information and protecting it from being integrated with irrelevant current events.
To probe this question, we collaborated with a professional author (C. L.) to craft an original fictional story with a purposefully designed narrative structure. The first part of the narrative consisted of two seemingly unrelated storylines, A, which takes place in Los Angeles, employing one set of characters, and B, which takes place in New York and involves a distinct set of characters (Figure 1). The two storylines were presented in an interleaved fashion over 30 segments, 15 segments for each storyline (A1 B1 A2 B2 … A15 B15). In the last 15 segments (Part C), characters from the two storylines meet in New York and their shared history is revealed. In other words, Part C updates the two storylines with new information previously unknown to the audience. One of the main techniques for bridging Part C with Parts A and B was to embed specific images/situations/phrases, that is, narrative motifs, within either the A or B storylines. The recurrence of these motifs in Part C was designed to reinstate specific moments from Storylines A and B. For example, in Segment A1, the main character, Clara, makes homemade chili for her husband in Los Angeles (LA). In Part C, Clara eats and comments on a Storyline B character's (Steven's) homemade chili recipe, which is designed to reactivate the memory of Segment A1 and update it by revealing that Clara was making chili following Steven's example. To develop a full understanding of this story, the listeners need to piece together a series of clues (motifs and storylines) like a detective.
1 Princeton University, 2 Columbia University, 3 Tel Aviv University
We hypothesized that, within individual A/B segments, high-level cortical regions would integrate information over time (i.e., they would show process memory) to build up a representation of the ongoing situation. In keeping with event segmentation theory (Zacks & Swallow, 2007; Zacks et al., 2001), we further hypothesized that storyline switches would result in "flushing out" of the previous storyline's representation in high-level cortical regions (see also Chien & Honey, 2020; DuBrow, Rouhani, Niv, & Norman, 2017; Ezzyat & Davachi, 2011), making it possible for that region to start representing the features of the other storyline. We also hypothesized that episodic memory would play a key role in reinstating existing storyline representations when the narrative returned to those storylines.
These hypotheses led to the following predictions about the A/B period of the story: First, we predicted that high-level cortical regions would contain distinct neural representations for the A and B storylines, as the storylines depict different situations and (according to the event segmentation theory) these distinct situation models will be flushed out at storyline switches, minimizing "carryover" of the neural patterns. Second, we predicted that hippocampus would be engaged at storyline switches to reactivate stored episodic memories of the relevant storyline and that the degree of hippocampal engagement would predict the degree of activation of the relevant storyline representation; we measured hippocampal engagement using hippocampal-cortical intersubject functional correlation (ISFC; Simony et al., 2016). In a previous study, we observed stimulus-driven ISFC between hippocampus and DMN cortical regions when the participants resumed a movie after a 1-day break; here, we tested whether hippocampal-cortical connectivity would also facilitate the reinstatement of recent memories from minutes ago as listeners switch from one storyline to the other. Finally, we predicted that neural representations of the A and B storylines would become more distinct over the course of the narrative, reflecting the accumulation (across segments) of distinctive details pertaining to each segment.
For Part C, we hypothesized that the recurrence of motifs from the A/B storylines (e.g., the chili recipe) would induce spontaneous recall of related past events and that recalling these A/B events (in response to a motif) would help participants to understand the relationship between the A/B segments and the narrative in Part C. To test this, we quantified the "reminding" effect during Part C by measuring whether motifs triggered neural reinstatement of corresponding moments in A/B events. To look at how this affected behavior, we computed a "relation score" (using a postscan test) that specifically tracked participants' understanding of how the motifs connected A/B events with the narrative in Part C. We predicted that, across participants, the degree of neural A/B reinstatement triggered by motifs would be correlated with these behavioral relation scores.
Our study is not the first to examine the interleaving of storyline representations and the recurrence of narrative motifs with fMRI. In one relevant study, Milivojevic, Varadinov, Vicente Grabovetsky, Collin, and Doeller (2016) used an audiovisual movie with interleaved storylines ("Sliding Doors") and found coding of storyline information in the hippocampus (but not in the DMN areas that are the focus of our study). In another relevant study, Kauttonen, Hlushchuk, Jääskeläinen, and Tikka (2018) scanned participants who watched the movie "Memento," in which recurring cues were embedded to trigger memory recall, similar to the motifs in our story; they found that recurring scenes activated matching, scene-specific neural patterns in low-level sensory regions and also that there was a common (i.e., not-scene-specific) neural pattern that was evoked in DMN regions whenever participants viewed a scene for a second time.
An important difference between our study and these other studies is that they used preexisting narrative movies as stimuli, whereas we used a stimulus that was custom-crafted to address the hypotheses enumerated above. Our use of a custom-crafted stimulus allowed us to build on these prior studies while avoiding confounds that were present in them. A limitation of the design of Milivojevic et al. (2016) is that storyline differences were confounded with sensory differences (e.g., related to some locations being more prevalent in one storyline than the other). Our study resolves this confound because both storylines were presented auditorily by the same speaker and thus did not differ in low-level sensory properties. Likewise, a limitation of the design of Kauttonen et al. (2018) is that recurring scenes were repeated exactly; as such, pattern similarity between matching cues could be because of matching sensory inputs (as opposed to memory). In contrast, in our study, recurring motifs were embedded in different scenes, making it easier to distinguish between the influences of perception and memory. By eliminating these confounds while still providing an engaging stimulus, we hoped to provide a more detailed view of how incoming information is segregated from irrelevant memories and integrated with relevant memories as a narrative unfolds over time.

METHODS

Participants
Twenty-eight participants were recruited. They were all right-handed native English speakers. All participants provided written consent before the experiment. Twenty-five participants were included for further analyses (14 women, aged 18-40 years). Three were excluded: one because of anatomical anomalies, one because of excessive motion artifacts in the T1 image, and one who slept during the story. The experimental protocol was approved by the institutional review board of Princeton University.

Stimulus
The stimulus was created by Lazaridi ("The 21st Year"-Excerpt, copyright 2019), who has been in collaboration with our laboratory for a number of years (Yeshurun, Swanson, et al., 2017). She has years of experience in practicing and evolving the technique of organizing the audience's understanding, memory, and interpretation of a narrative through screenplay writing and professional screenplay development around the world (Lazaridi, 2012). Compared to other types of writing, the creation of a screenplay is highly audience-driven because of the large investment (in time, collaboration, and financing) inherent in film-making. Furthermore, watching a film is a more continuous experience than reading a book, requiring the screenwriter to guide and unite the audience's understanding and overall response to the narrative without loss of focus or inner thought digressions.
Lazaridi designed the narrative stimulus as a stand-alone fiction text that incorporated her experience-guided narrative techniques of traditional screenplay writing. The narrative consisted of 45 segments and two seemingly unrelated storylines, A and B. A and B segments were presented in an interleaved manner for the first 30 segments. In the last 15 segments (Part C), the two storylines merged into a unified narrative. Each segment lasted for 41-57 repetition times (TRs; mean: 46 TRs = 70 sec). They were separated by silent pauses of 3-4 TRs. The narrative was recorded by a professional actress (June Stein), who is a native English speaker, and directed by Lazaridi to ensure that the actress's interpretation matched the author's intent. The recording is 56 min long. In the A and B segments, the author incorporated unique narrative motifs, that is, specific images/situations/phrases that recurred in Part C (see Figure 1 for a sample motif). The transcription of the story and a list of all the motifs are available at arks.princeton.edu/ark:/88435/dsp011n79h7374. The recurrence of motifs in Part C is designed to trigger the reinstatement of specific moments from Part AB and to evolve their meanings with new information from Part C. The same narrative motif was not always realized with the same words ("throwing up" vs. "Clara feels sick, as the coffeecake rises to her throat"). In total, there were 28 different narrative motifs, occurring 58 times in Part AB and 36 times in Part C.

Procedure
The recording of the narrative was presented using MATLAB 2010 (The MathWorks, Inc.) and Psychtoolbox 3 (Brainard, 1997) through MRI-compatible insert earphones (Sensimetrics, Model S14). MRI-safe passive noise-canceling headphones were placed over the earbuds for noise reduction and safety. To remove the initial signal drift and the common response to stimulus onset, the narrative was preceded by a 14-TR-long musical stimulus, which was unrelated to the narrative and excluded from fMRI analysis. Participants filled out a questionnaire after the scanning to evaluate their overall comprehension of the narrative and their ability to relate events in different parts of the story that shared the same motifs.
After preprocessing, the first 19 TRs were cropped to remove the music preceding the narrative (14 TRs), as well as the time gap between scanning and narrative onset (2 TRs), and to correct for the hemodynamic delay (3 TRs). To verify the temporal alignment between the fMRI data and the stimulus, we computed the temporal correlation between the audio envelope of the stimulus (volume) and the participants' mean brain activation in left Heschl's gyrus following Honey, Thompson, Lerner, and Hasson (2012). The left Heschl's gyrus mask was from Harvard-Oxford cortical structural probabilistic atlases (thresholded at 25%). The audio envelope was calculated using a Hilbert transform and down-sampled to the 1.5-sec TR. The correlations were computed with a lag of −100 to 100 TRs to find the time lag that showed the highest correlation. The averaged peak time was 0.12 TR across participants, indicating that the narrative and fMRI data were temporally well aligned.
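The lagged-correlation check described above can be illustrated with a minimal sketch. The function name `best_lag`, the simulated signals, and the sign convention (positive lag means the brain signal trails the envelope) are assumptions made for illustration only; this is not the authors' code.

```python
import numpy as np

def best_lag(envelope, bold, max_lag=100):
    """Return the lag (in TRs) with the highest Pearson correlation.

    A positive lag means `bold` trails `envelope` by that many TRs.
    """
    n = len(envelope)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            x, y = envelope[:n - lag], bold[lag:]
        else:
            x, y = envelope[-lag:], bold[:n + lag]
        r[i] = np.corrcoef(x, y)[0, 1]
    return lags[np.argmax(r)], r

# Simulated example: the "BOLD" signal is the envelope delayed by 2 TRs plus noise.
rng = np.random.default_rng(0)
env = rng.standard_normal(600)
bold = np.roll(env, 2) + 0.5 * rng.standard_normal(600)
peak_lag, r = best_lag(env, bold)
```

In this toy case the peak falls at the simulated 2-TR delay; a peak near zero, as reported above, would indicate that stimulus and data are aligned after cropping.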
To account for the low-level properties of the stimulus, a multiple regression model was built for each voxel. The regressors included an intercept, the audio envelope, and the boxcar function of the between-segment pauses, convolved by the canonical hemodynamic response function and its derivatives with respect to time and dispersion as given in SPM8 (www.fil.ion.ucl.ac.uk/spm/). For the effect of audio amplitude and between-segment pause, please see Appendix A. The residuals of the regression model were used for the following analyses.
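As a rough sketch of this nuisance-regression step, the snippet below fits an ordinary least-squares model per voxel and keeps the residuals. It assumes the envelope and pause regressors have already been convolved with the hemodynamic response function (the SPM8 basis set is not reproduced here); `regress_out` and the simulated data are hypothetical.

```python
import numpy as np

def regress_out(data, regressors):
    """Residuals of an OLS fit with an intercept.

    data:       (n_trs, n_voxels) array
    regressors: (n_trs, n_regressors) array, assumed already HRF-convolved
    """
    X = np.column_stack([np.ones(len(regressors)), regressors])
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)
    return data - X @ beta

# Simulated example: every voxel carries an envelope-driven component.
rng = np.random.default_rng(0)
n_trs, n_vox = 200, 5
envelope = rng.standard_normal(n_trs)
pause = (np.arange(n_trs) % 50 < 3).astype(float)  # toy between-segment pauses
nuisance = np.column_stack([envelope, pause])
data = rng.standard_normal((n_trs, n_vox)) + 2.0 * envelope[:, None] + 1.0
residuals = regress_out(data, nuisance)
```

By construction, the residuals are orthogonal to every regressor, so any remaining pattern structure cannot be a linear function of audio amplitude or pause timing.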

ROI Masks
We used 238 functional ROIs defined independently by Shen, Tokoglu, Papademetris, and Constable (2013) based on whole-brain parcellation of resting-state fMRI data. A control anatomical ROI was also included: left Heschl's gyrus defined using the Harvard-Oxford cortical structural probabilistic atlas, thresholded at 25%.
A bilateral anatomical hippocampal mask was obtained using the same threshold. It has been proposed that the temporal integration window of episodic memory representation varies along the hippocampal long axis (Collin, Milivojevic, & Doeller, 2015). Whereas the posterior hippocampus has a small representation scale, the middle and anterior portions are able to contain the associations between more than two events. If so, it would be inappropriate to treat the hippocampus as a functionally homogeneous region. Therefore, we divided the hippocampus mask into anterior (MNI coordinate y > −19), middle (−30 < y ≤ −19), and posterior (y ≤ −30) ROIs following Collin et al. (2015).
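The long-axis split can be written down directly from the coordinate cutoffs above. The helper name `split_hippocampus` is hypothetical, and the sketch assumes voxel y coordinates are already expressed in MNI millimeters.

```python
import numpy as np

def split_hippocampus(y_mni):
    """Label voxels by MNI y coordinate using the cutoffs given in the text."""
    y = np.asarray(y_mni, dtype=float)
    return np.where(y > -19, "anterior",
                    np.where(y > -30, "middle", "posterior"))

# Boundary values fall into the more posterior bin (y = -19 is middle, y = -30 is posterior).
labels = split_hippocampus([-10, -19, -25, -30, -36])
```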
All ROIs had more than 50 voxels in our data.

Shared Response Model
When comparing activation patterns across participants, the mismatch of functional topographies could decrease analysis sensitivity even after anatomical alignment (Sabuncu et al., 2010; Brett, Johnsrude, & Owen, 2002). Therefore, we functionally aligned data within each ROI across participants using the shared response model (SRM; Brain Imaging Analysis Kit, brainiak.org; Chen et al., 2015). SRM projects all participants' data into a common low-dimensional feature space by capturing the components of the response shared across participants. The input to SRM was a TR × Voxel × Participant matrix, and the output was a TR × Feature × Participant matrix. We used fMRI data from the whole story (z scored over time first) to estimate an SRM with 50 features. Note that no information about storyline or motif was submitted to SRM. Therefore, although this projection inflated the overall interparticipant pattern similarity, it could not artifactually give rise to the storyline or motif effect shown here. The output of SRM was z scored over time.
Unless otherwise stated, all the pattern analyses described below were run based on the resulting 50 features. We also performed the same analyses without the application of SRM. Generally speaking, a subset of the areas that were significant in the analysis with SRM was also significant in the analysis without SRM. Please see Appendix B for the results.

Representational Similarity Analysis of Storyline Effect
To examine the storyline effect, we performed representational similarity analysis (RSA) (Kriegeskorte, Mur, & Bandettini, 2008) on brain activation patterns and tested whether the representational similarity between segments from the same storyline was higher than that of segments from different storylines. We first computed the averaged activation within each segment across TRs for each voxel. The resulting 45 values were then z scored across segments. For each ROI, pairwise pattern similarities between the 45 activation maps were computed with the leave-one-participant-out method (Figure 1B). Namely, the averaged activation pattern was extracted for each segment. Then, the Pearson correlation coefficients between one participant's activation patterns and the averaged patterns of the remaining participants were computed. The output correlation coefficients (45 × 45 segments) were normalized with Fisher's z transformation. This procedure was repeated for each of the 25 participants and each ROI.
We then contrasted the averaged within- and between-storyline similarities in Part AB, excluding the within-segment similarities (the diagonal of the 45 × 45 similarity matrix), to obtain 25 contrast values for each ROI. These contrast values were compared to zero by a one-tailed one-sample t test and thresholded at p < .05 (FWE correction for multiple comparisons). The results were projected back onto the whole-brain surface and visualized using Freesurfer v6 (surfer.nmr.mgh.harvard.edu/).
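The leave-one-participant-out similarity and the within- versus between-storyline contrast can be sketched as follows. This is an illustration under simplifying assumptions: only the 30 A/B segments are simulated, the z scoring across segments and the group t test are omitted, and the function names and toy data are hypothetical.

```python
import numpy as np

def loo_similarity(patterns):
    """patterns: (n_subj, n_seg, n_feat) segment-averaged activation patterns.

    Returns Fisher-z similarities of shape (n_subj, n_seg, n_seg); rows index
    the left-out participant's segments, columns the others' mean segments.
    """
    n_subj, n_seg, _ = patterns.shape
    total = patterns.sum(axis=0)
    out = np.empty((n_subj, n_seg, n_seg))
    for s in range(n_subj):
        others = (total - patterns[s]) / (n_subj - 1)
        r = np.corrcoef(patterns[s], others)[:n_seg, n_seg:]
        out[s] = np.arctanh(r)
    return out

def storyline_contrast(sim, labels):
    """Mean within-storyline minus mean between-storyline similarity per participant."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # drop within-segment entries
    within = sim[:, same & off_diag].mean(axis=1)
    between = sim[:, ~same].mean(axis=1)
    return within - between

# Simulated example: 6 participants, 30 interleaved A/B segments, 50 features.
rng = np.random.default_rng(0)
labels = np.array(["A", "B"] * 15)
templates = {"A": rng.standard_normal(50), "B": rng.standard_normal(50)}
signal = np.stack([templates[l] for l in labels])
patterns = signal + rng.standard_normal((6, 30, 50))
contrast = storyline_contrast(loo_similarity(patterns), labels)
```

The resulting per-participant contrast values are what would then be submitted to the one-sample test against zero.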
To examine whether the storyline effect increased over time, for regions showing a significant storyline effect, we computed the storyline effect in the early (Segments 1-14) and later (Segments 15-30) halves of Part A/B separately. Twenty-five contrast values were generated by comparing the late and early storyline effects (late [same > different storyline] > early [same > different storyline]). These contrast values were again submitted to a one-tailed one-sample t test (p < .05, FWE).
To test the Storyline × Time effect in a more graded way, we constructed a 45 × 45 time effect matrix, populated with the average of the time points (segment number). For example, the (4, 5) entry of this matrix is 4.5 (= [4 + 5] / 2). Taking only the entries corresponding to within-storyline similarity, excluding the diagonal elements (Appendix C, top), we computed the Pearson correlation between the time matrix and the pattern similarity matrix. The resulting r values were entered into a one-sample one-tailed group t test after Fisher's z transformation within regions showing significant storyline effect (n = 25, p < .05, FWE). The between-storyline dissimilarity was tested separately in a similar manner. The overlap between these two effects is shown at the bottom of Appendix C.
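A small sketch of the time effect matrix and its correlation with within-storyline similarity is given below. The function names are hypothetical, segment numbers are taken to be 1-based as in the example above, and the similarity matrix is simulated so that similarity grows with segment number.

```python
import numpy as np

def time_matrix(n_seg):
    """Entry (i, j), 1-based, holds the mean of segment numbers i and j."""
    idx = np.arange(1, n_seg + 1)
    return (idx[:, None] + idx[None, :]) / 2.0

def within_storyline_time_corr(sim, labels):
    """Pearson r between time matrix and similarity over within-storyline, off-diagonal entries."""
    labels = np.asarray(labels)
    n = len(labels)
    mask = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    return np.corrcoef(time_matrix(n)[mask], sim[mask])[0, 1]

entry_4_5 = time_matrix(45)[3, 4]  # the (4, 5) entry in 1-based indexing

# Simulated example: similarity increases with the mean segment number.
rng = np.random.default_rng(0)
labels = np.array(["A", "B"] * 15)
sim = 0.02 * time_matrix(30) + 0.05 * rng.standard_normal((30, 30))
r = within_storyline_time_corr(sim, labels)
```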

Time Course of the Storyline Effect at Segment Boundary
To further illustrate the time course of the storyline effect, we computed the pattern similarities between each of the −40 to 40 TRs around segment onsets and the typical A and B storyline patterns using a leave-one-participant-out method. For example, for the boundary between Segment 1 and Segment 2, −40 to 40 TRs around the onset of Segment 2 were extracted from one participant. The typical A storyline pattern was obtained by averaging all the A storyline TRs, except for the segments analyzed here, namely, Segments 1 and 2, from the rest of the participants. The typical B storyline pattern was obtained in the same manner. Pearson correlations between the 81 TRs around Segment 2 onset and the typical A and B patterns were calculated and normalized with Fisher's z transformation. The same procedure was repeated for each participant and each boundary.
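The TR-by-TR comparison with the typical storyline patterns can be sketched as below. For brevity, the sketch takes the typical A and B patterns as given and omits the leave-one-participant-out averaging and the exclusion of the segments under analysis; `boundary_timecourse` and the simulated data are hypothetical.

```python
import numpy as np

def boundary_timecourse(subj_data, template, onset, half_window=40):
    """Fisher-z correlation between each TR around `onset` and a template pattern.

    subj_data: (n_trs, n_feat) array; template: (n_feat,) array.
    Returns 2 * half_window + 1 values.
    """
    window = subj_data[onset - half_window: onset + half_window + 1]
    w = window - window.mean(axis=1, keepdims=True)
    t = template - template.mean()
    r = (w @ t) / (np.linalg.norm(w, axis=1) * np.linalg.norm(t))
    return np.arctanh(r)

# Simulated example: a B segment followed by an A segment starting at TR 60.
rng = np.random.default_rng(0)
pat_a, pat_b = rng.standard_normal(50), rng.standard_normal(50)
n_trs, onset = 120, 60
data = rng.standard_normal((n_trs, 50))
data[:onset] += pat_b
data[onset:] += pat_a
sim_a = boundary_timecourse(data, pat_a, onset)
sim_b = boundary_timecourse(data, pat_b, onset)
```

In this toy case, similarity to the A pattern rises and similarity to the B pattern falls at the boundary.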
To further illustrate the Time × Storyline effect, we applied the above analysis to the early and late segments separately. Namely, typical A and B patterns were computed using early and late segments, respectively. TRs around the boundaries between early segments were compared with the early templates, and TRs around the boundary between the late segments were compared with the late templates.

TRW Index
Following Yeshurun, Nguyen, et al. (2017), the TRW index was generated based on an independent data set from Lerner et al. (2011), which includes an intact story ("Pieman," ∼7 min long) and the same story with a scrambled word order. Interparticipant correlation between averaged time series of each ROI was computed, using the leave-one-participant-out method, and normalized using Fisher's z transformation. The TRW index was then calculated by subtracting the interparticipant correlation of the scrambled story from that of the intact story. We examined the correlation between TRW and storyline effect across regions.
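A minimal sketch of the TRW index for a single ROI follows. Averaging the Fisher-z values across participants before subtracting is an assumption made here, and the function names and simulated time series are hypothetical.

```python
import numpy as np

def loo_isc(ts):
    """Mean Fisher-z leave-one-out correlation; ts is (n_subj, n_trs)."""
    n = ts.shape[0]
    total = ts.sum(axis=0)
    z = [np.arctanh(np.corrcoef(ts[s], (total - ts[s]) / (n - 1))[0, 1])
         for s in range(n)]
    return float(np.mean(z))

def trw_index(intact, scrambled):
    """Intact-story correlation minus scrambled-story correlation for one ROI."""
    return loo_isc(intact) - loo_isc(scrambled)

# Simulated long-timescale ROI: strong shared response only for the intact story.
rng = np.random.default_rng(0)
story = rng.standard_normal(300)
intact = story + rng.standard_normal((10, 300))
scrambled = 0.2 * story + rng.standard_normal((10, 300))
index = trw_index(intact, scrambled)
```

A region that responds reliably only when long-range context is preserved yields a larger index than a region that responds equally to both versions.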

RSA of Narrative Motif Effect
For each narrative motif occurrence, we obtained the corresponding activation pattern by averaging 5 TRs immediately after its onset based on the intuition that motif effect was transient and lasted only for a few sentences. Pearson correlation coefficients between activation patterns of motifs in Part AB and motifs in Part C were computed with the leave-one-participant-out method and normalized with Fisher's z transformation. Pattern similarities between narrative motifs were grouped into three types: (1) same motif, (2) different motifs from the same storyline, and (3) different motifs from different storylines (unrelated). For example, pattern similarities between different occurrences of "chili" belong to (1). Similarities between "chili" and other A storyline motifs belong to (2). Similarities between "chili" and B storyline motifs belong to (3). Motif effect of each "chili" token in Part C was defined as the averaged Type (1) similarity minus the averaged Type (2) similarity, to eliminate the confound of storyline effect.
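The Type (1) minus Type (2) definition can be made concrete with a toy similarity matrix. In the sketch below, only "chili" is a motif named in the text; "letter" and "train", the similarity values, and the function name `motif_effect` are hypothetical.

```python
import numpy as np

def motif_effect(sim, c_motif, c_story, ab_motif, ab_story):
    """Per Part C occurrence: mean same-motif similarity minus mean similarity
    to different motifs from the same storyline.

    sim: (n_c, n_ab) Fisher-z similarities between Part C and Part AB occurrences.
    Occurrences lacking either comparison set are returned as NaN.
    """
    c_motif, c_story = np.asarray(c_motif), np.asarray(c_story)
    ab_motif, ab_story = np.asarray(ab_motif), np.asarray(ab_story)
    effects = np.full(len(c_motif), np.nan)
    for i in range(len(c_motif)):
        same_motif = ab_motif == c_motif[i]
        same_story_other = (ab_story == c_story[i]) & ~same_motif
        if same_motif.any() and same_story_other.any():
            effects[i] = sim[i, same_motif].mean() - sim[i, same_story_other].mean()
    return effects

ab_motif = ["chili", "chili", "letter", "train"]
ab_story = ["A", "A", "A", "B"]
c_motif = ["chili", "train"]
c_story = ["A", "B"]
sim = np.array([[0.5, 0.3, 0.1, 0.0],
                [0.0, 0.1, 0.0, 0.4]])
effects = motif_effect(sim, c_motif, c_story, ab_motif, ab_story)
```

Here the "chili" token scores 0.4 − 0.1 = 0.3, whereas the "train" token has no other B storyline motif to serve as a baseline and is left undefined.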
The group motif effect was thresholded with a permutation test. For each ROI, the above procedure was repeated after shuffling the labels of motifs within storylines 10000 times, creating a null distribution. To correct for multiple comparisons across ROIs, the largest motif effect across ROIs in each of the 10000 iterations was extracted, resulting in a null distribution of the maximum motif effects. Only ROIs with a group motif effect exceeding 95% of the null distribution were considered significant (p < .05, FWE).
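The two ingredients of this test, shuffling motif labels within storylines and thresholding against the distribution of the maximum statistic, can be sketched as follows. The null effects here are simulated rather than recomputed from shuffled labels, and all names and values are hypothetical.

```python
import numpy as np

def shuffle_within_groups(labels, groups, rng):
    """Permute labels separately inside each group (e.g., each storyline)."""
    labels = np.asarray(labels).copy()
    groups = np.asarray(groups)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        labels[idx] = rng.permutation(labels[idx])
    return labels

def max_stat_fwe(observed, null):
    """observed: (n_roi,); null: (n_perm, n_roi) effects under shuffled labels.

    Returns a significance mask and the 95th percentile of the per-permutation
    maximum across ROIs.
    """
    threshold = np.percentile(null.max(axis=1), 95)
    return observed > threshold, threshold

rng = np.random.default_rng(0)
motifs = np.array(["m1", "m2", "m3", "m4", "m5", "m6"])
storylines = np.array(["A", "A", "A", "B", "B", "B"])
shuffled = shuffle_within_groups(motifs, storylines, rng)

# Simulated null effects for 20 ROIs; only ROI 0 carries a real effect.
null = 0.01 * rng.standard_normal((10000, 20))
observed = np.zeros(20)
observed[0] = 0.1
significant, threshold = max_stat_fwe(observed, null)
```

Because each ROI is compared against the maximum over all ROIs, the chance of any false positive across the whole set is held at the nominal level.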

Motif-Related Events with Shared vs. Different Main Characters
To verify that the motif effect does not only reflect shared characters, the author of the story designated the main character in each motif-related event and a predicted pattern similarity matrix was generated based on whether two events shared the same character. We ran a permutation test similar to the one described above and found no significant character effect (p < .05, FWE; Appendix D).

Time Course of the Narrative Motif Effect
To further illustrate the time course of the motif effect, for each motif in C, the Pearson correlation coefficients between activation patterns of −5 to 10 TRs around its onset and the activation patterns of motifs in Part AB were computed. Motifs with a time window of −5 to 10 TRs that overlapped with the between-segment silent pauses were excluded from this analysis. The resulting coefficients were normalized with Fisher's z transformation and averaged by categories (same motif and same storyline, different motif but same storyline, and unrelated). For each ROI, we applied two-tailed paired t tests to compare pattern similarities between categories at each time point (p < .05, FWE correction for time points) and showed the resulting pattern similarity around narrative motif onset.

Narrative Motif vs. High-Frequency Word Effect
To verify that the motif effect did not result from repeated wordings or word-level semantics, we replaced the narrative motifs with storyline-specific high-frequency words and performed the same RSA. More specifically, among words that only occurred in Parts A and C and words that occurred only in Parts B and C, we chose the 28 words with the highest lemma/word stem frequencies (Appendix E). Two of the 28 narrative motifs were included in this list. Together, these words occurred 111 times in Part AB and 110 times in Part C. Among regions showing a significant motif effect, we calculated the difference between the real motif effect and the effect elicited by high-frequency words for each participant. The 25 difference values were entered into a one-sample one-tailed t test. The results were thresholded at p < .05 (FWE).

Within-Participant RSA of the Storyline and Motif Effect
On the basis of our prior work showing that between-participant analysis is able to reveal the shared coding of events across participants (Baldassano, Hasson, & Norman, 2018; Baldassano et al., 2017; Chen et al., 2017; Zadbood, Chen, Leong, Norman, & Hasson, 2017) and that it also boosts signal-to-noise ratio (SNR; Simony et al., 2016), we adopted between-participant RSA as our "default" in this study; however, we also included within-participant RSA for comparison purposes.
The results of our within-participant RSA analyses are shown in Appendices F-I. Considering the potential impact of temporal autocorrelation (Mumford, Davis, & Poldrack, 2014) and low-frequency drift (Alink, Walther, Krugliak, van den Bosch, & Kriegeskorte, 2015) in the fMRI signals on within-participant similarity matrix, especially between neighboring segments, we also included storyline analyses thresholded using the label permutation method (Appendix J). For the storyline effect, we shuffled the labels of Segments 1-30 10000 times to obtain a null distribution of the group mean effect. This procedure was performed for each ROI, and the resulting p value was corrected for multiple comparisons across ROIs (FWE). The Storyline × Time effect was tested within regions that showed a significant storyline effect by shuffling segment labels within storylines.

Correlation between Hippocampal-Cortical ISFC and Cortical Reinstatement of Storyline and Motif
To examine whether the cortical reinstatement of storyline was dependent on connectivity with the hippocampus, we examined the Pearson correlation between hippocampal-cortical ISFC and the storyline effect across segments for each participant, within ROIs showing a significant storyline effect.
The ISFC was computed within the time window of 0-40 TRs after the onset of each segment using the leave-one-participant-out method, that is, the correlation between one participant's hippocampal activity and the averaged cortical activity of the other participants. We used the preprocessed data without regressing out the effects of between-segment pause and the audio envelope because it is possible that the activation pulse between segments (Appendix A) reflects not only the silence but also memory encoding or retrieval (Ben-Yakov & Dudai, 2011). SRM was not applied because topographical alignment is not a concern when comparing the averaged time series between ROIs. The hippocampus seed was defined using the Harvard-Oxford cortical structural probabilistic atlas thresholded at 25%.
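A minimal sketch of leave-one-participant-out ISFC for one segment window is shown below. Treating the window as the 40 TRs starting at segment onset is an assumption (the text does not say whether the endpoint is inclusive), and `loo_isfc` and the simulated time series are hypothetical.

```python
import numpy as np

def loo_isfc(hipp, cortex, onset, n_trs=40):
    """Correlate each participant's hippocampal window with the mean cortical
    window of all other participants.

    hipp, cortex: (n_subj, n_total_trs) ROI-averaged time series.
    Returns one Pearson r per participant.
    """
    n_subj = hipp.shape[0]
    win = slice(onset, onset + n_trs)
    total = cortex[:, win].sum(axis=0)
    out = np.empty(n_subj)
    for s in range(n_subj):
        others = (total - cortex[s, win]) / (n_subj - 1)
        out[s] = np.corrcoef(hipp[s, win], others)[0, 1]
    return out

# Simulated example: a stimulus-locked component shared by both regions.
rng = np.random.default_rng(0)
n_subj, n_total = 8, 200
shared = rng.standard_normal(n_total)
hipp = shared + rng.standard_normal((n_subj, n_total))
cortex = shared + rng.standard_normal((n_subj, n_total))
isfc = loo_isfc(hipp, cortex, onset=50)
```

Because the two time series come from different participants, only stimulus-locked covariation can contribute to the correlation.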
The storyline effect for each segment was also computed using the leave-one-participant-out method. The typical activation patterns for A and B storylines were first estimated by averaging data from all but one participant, excluding the current segment. Pattern similarity between the resulting typical A and B patterns and the left-out participant's activation pattern for the current segment was then computed. The storyline effect was defined as the difference between pattern similarity to the relevant storyline and the similarity to the irrelevant storyline, taking the previous segment as the baseline, for example, for a B segment: (current segment's similarity to B − similarity to A) − (previous segment's similarity to B − similarity to A).
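The baseline-corrected storyline effect can be written out explicitly. The sketch below takes each segment's similarity to the typical A and B patterns as given; the function name and the toy similarity values are hypothetical.

```python
import numpy as np

def storyline_effect(sim_to_a, sim_to_b, labels):
    """(relevant − irrelevant similarity) for the current segment minus the same
    difference for the previous segment, with "relevant" defined by the current
    segment's storyline. The first segment has no baseline and is NaN.
    """
    d = np.asarray(sim_to_a, dtype=float) - np.asarray(sim_to_b, dtype=float)
    sign = np.where(np.asarray(labels) == "A", 1.0, -1.0)
    effect = np.full(len(d), np.nan)
    effect[1:] = sign[1:] * (d[1:] - d[:-1])
    return effect

labels = ["A", "B", "A", "B"]
sim_to_a = [0.4, 0.1, 0.5, 0.0]
sim_to_b = [0.1, 0.3, 0.1, 0.4]
effect = storyline_effect(sim_to_a, sim_to_b, labels)
```

For the second segment (a B segment), this gives (0.3 − 0.1) − (0.1 − 0.4) = 0.5, matching the worked formula above.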
The correlation between ISFC and the storyline effect across segments was computed for each participant, excluding the first segment of each storyline. The r values were entered into a one-tailed t test after Fisher's z transformation. This initial analysis did not yield a significant result after correction for multiple ROIs (n = 25, p < .05, false discovery rate [FDR] correction). In exploratory follow-up analyses, we then systematically examined the influence of the time window of ISFC, the time window of the storyline effect, the hippocampus seed (whole vs. posterior: MNI y ≤ −30), and the baseline of the storyline effect. We also examined the correlation across participants in each segment. For each combination of analysis parameters, we corrected for multiple comparisons across ROIs using the FDR method.
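For readers unfamiliar with FDR correction across ROIs, a common choice is the Benjamini-Hochberg step-up procedure, sketched below. The text does not state which FDR procedure was used, so this particular procedure is an assumption, and the p values are invented for illustration.

```python
import numpy as np

def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, n + 1) / n
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = np.flatnonzero(below).max()  # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject

# Rank thresholds for five tests at alpha = .05: .01, .02, .03, .04, .05.
reject = fdr_bh([0.001, 0.008, 0.039, 0.041, 0.6])
```

Only the first two tests survive here: 0.039 and 0.041 exceed their rank thresholds of .03 and .04, and no later rank meets its threshold.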
We examined the correlation between hippocampal ISFC and motif reinstatement in a similar manner. The motif effect was defined and computed using the RSA method described above (based on 5 TRs after motif onsets, using similarity between different motifs from the same storylines as a baseline). Across motifs in Part C, the correlation between the motif effect and ISFC after motif onset was then computed for each participant. We also examined the correlation across participants for each motif and the influence of ISFC time windows and hippocampus seeds.

RESULTS
fMRI data were collected from 25 participants while they listened to a structured narrative that lasted for approximately 1 hr. The narrative has two interleaved, seemingly unrelated storylines, A and B, that converge in the later C part. In the first set of analyses, we tested how ongoing information from each of the two unrelated storylines was accumulated across minute-long segments while being segregated from the parallel unrelated interleaved storyline. In the second set of analyses, we tested how events in Parts A and B were reactivated in Part C. The two storylines are connected to Part C using 28 specifically designed, recurrent narrative motifs. These motifs were planted at specific, strategic moments of the narrative by the author (58 occurrences in Parts A and B, 36 occurrences in Part C). Participants' understanding of the relations created by these motifs was assessed based on postscan questionnaires.
For the fMRI data, we first regressed out the effect of audio amplitude and between-segment pause (Appendix A) and applied the SRM to adjust for the mismatch of functional topographies across participants (please see Appendix B for results without applying SRM). Using RSA (Kriegeskorte et al., 2008) on brain activation patterns within ROIs independently defined by a whole-brain parcellation of resting-state fMRI (Shen et al., 2013), we tested whether the structure of the story induced the reinstatement of storylines and narrative motifs and whether this led to the integration of separate events.

Neural Reinstatement of Storyline
We first examined whether, and if so, where in the brain the two seemingly unrelated storylines (A and B) had distinct cortical representations. Using RSA, we compared the neural patterns within each storyline (AA and BB) to the neural patterns between the two storylines (AB). Within each ROI, we averaged over time within each segment (lasting approximately 1 min) to extract a spatial pattern of activity for that segment. We then compared pattern similarity between segments from the same storyline to pattern similarity between segments from different storylines (Figure 2A).
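A minimal sketch of this comparison is given below, assuming one segment-averaged pattern per segment within a single ROI. It is a simplification: it correlates patterns within one data set rather than between participants, and the function names and toy values are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def storyline_effect(patterns, labels):
    """Mean within-storyline similarity minus mean between-storyline similarity."""
    within, between = [], []
    for i, j in combinations(range(len(patterns)), 2):
        r = pearson(patterns[i], patterns[j])
        (within if labels[i] == labels[j] else between).append(r)
    return sum(within) / len(within) - sum(between) / len(between)

# Toy segment-averaged patterns (4 voxels each) in ABAB order; values are made up.
segments = [
    [1.0, 2.0, 3.0, 4.0],  # A
    [4.0, 3.0, 2.0, 1.0],  # B
    [1.0, 2.0, 3.0, 5.0],  # A
    [5.0, 3.0, 2.0, 1.0],  # B
]
effect = storyline_effect(segments, ["A", "B", "A", "B"])
```

A positive value indicates that segments from the same storyline evoke more similar spatial patterns than segments from different storylines.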
Higher within-storyline pattern similarity was revealed in a large set of regions, including language areas (superior/middle temporal gyrus, inferior frontal gyrus, and supplementary motor cortex), areas in the DMN (including posterior cingulate cortex [PCC], precuneus, medial prefrontal cortex, superior frontal gyrus, posterior parietal cortex, angular gyrus [AG], posterior hippocampus, and parahippocampal cortex), areas in the executive network (including anterior insula, middle temporal gyrus, middle cingulate gyrus, and supramarginal gyrus [SMG]), high-order visual areas (including cuneus and fusiform gyrus), and subcortical areas (including putamen, thalamus, and caudate). Figure 2B shows the time course of the storyline effect at the segment boundary in the region where the largest separation between the two storylines was found, that is, PCC/precuneus. For each TR from −40 to 40 around segment boundaries, we computed the pattern similarity to the typical A and B storyline patterns. At the boundary between B and A segments, the similarity to the typical B pattern rapidly dropped, whereas the similarity to the typical A pattern increased. The two waveforms crossed around the boundary. Similar results were obtained for the complementary transition from A to B segments.
It is worth noticing that, although the two curves in Figure 2B seem to be symmetrical with respect to zero, that does not mean that the two storylines had opposing activation patterns. The two patterns are forced to average to approximately zero by the need to subtract the global mean response before computing the typical A/B patterns (Garrido, Vaziri-Pashkam, Nakayama, & Wilmer, 2013;Murphy, Birn, Handwerker, Jones, & Bandettini, 2009). Therefore, the correlation values only reflect the relative, but not the absolute, difference between storylines.
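The effect of mean subtraction can be illustrated with a toy example. The sketch below assumes that the two storylines contribute equally to the global mean, so that the mean is simply the average of the two typical patterns; the values are made up and the variable names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy "typical" storyline patterns before mean removal (made-up values).
pattern_a = [2.0, 5.0, 3.0, 7.0, 4.0]
pattern_b = [3.0, 4.0, 6.0, 6.0, 1.0]

# Global mean response per voxel, assuming A and B are weighted equally.
global_mean = [(a + b) / 2 for a, b in zip(pattern_a, pattern_b)]
centered_a = [a - m for a, m in zip(pattern_a, global_mean)]
centered_b = [b - m for b, m in zip(pattern_b, global_mean)]

r_before = pearson(pattern_a, pattern_b)    # similarity of the raw patterns
r_after = pearson(centered_a, centered_b)   # similarity after mean removal
```

Under this assumption, the centered patterns are exact mirror images of each other, (A − B)/2 and (B − A)/2, regardless of how similar the raw patterns were. This is why the two curves appear symmetrical around zero without implying opposing activation.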

Stronger Neural Reinstatement of Storyline in Areas with Longer Processing Timescales
We used an independent data set (Lerner et al., 2011) to generate a TRW index for each ROI, that is, the difference in interparticipant correlation between an intact story and its scrambled version. Higher TRW indices were found in prior studies to be associated with increased capacity to accumulate information over long timescales (Yeshurun, Nguyen, et al., 2017; Lerner et al., 2011). If the storyline effect only reflected a difference in low-level properties such as wording or acoustic features (note that the same narrator read all segments), regions with low TRW, that is, regions insensitive to word scrambling, should also show a storyline effect as strong as that in high-TRW regions. On the contrary, we found a significant positive correlation between TRW index and storyline effect (Figure 2C). In other words, areas that are capable of accumulating information over long timescales had a larger difference between storylines.
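The TRW index and its relation to the storyline effect can be sketched as follows. This is an illustration under stated assumptions: the per-ROI values are invented for the example, and the function and variable names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def trw_index(isc_intact, isc_scrambled):
    """Interparticipant correlation for the intact story minus the scrambled version."""
    return isc_intact - isc_scrambled

# Made-up interparticipant correlations for four hypothetical ROIs.
isc_intact = [0.30, 0.35, 0.40, 0.45]
isc_scrambled = [0.28, 0.20, 0.15, 0.05]
trw = [trw_index(i, s) for i, s in zip(isc_intact, isc_scrambled)]

# Made-up storyline effects for the same ROIs.
storyline_effects = [0.01, 0.03, 0.06, 0.10]
r_trw_storyline = pearson(trw, storyline_effects)
```

A positive correlation across ROIs, as in this toy example, corresponds to a stronger storyline effect in regions that lose more interparticipant reliability when the words are scrambled.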

Storyline × Time Effect
We predicted that the segregation of the two storylines (A and B) should increase as the story unfolds and participants accumulate further information about the unique context of each storyline. To test this hypothesis, we examined whether the storyline effect increased over time by dividing Part AB into the early and later halves (Figure 3A). Within areas showing the separation between storylines, an increase in the separation of patterns at the later phase (leading to a significant interaction between time and storyline) was found in PCC/precuneus, left AG/inferior parietal lobule (IPL), left superior frontal gyrus, right inferior frontal gyrus, right middle temporal gyrus/middle occipital gyrus, and right superior parietal lobule. Figure 3B shows the storyline transition at segment boundaries in the early and late AB parts, respectively.
Shaded areas indicate 95% CI across participants. The vertical gray-shaded area shows the silent pause at boundary.
We compared the storyline effect in the early and late time bins to avoid imposing assumptions on the time effect (e.g., linearity). Having said this, a graded Time × Storyline effect is observed in similar regions (Appendix C).
We also examined pattern similarity in anatomically defined hippocampus ROIs and again observed the separation between storylines (Figure 4). The three-way interaction between subregions (anterior/middle/posterior), time (early/late), and storyline (

Neural Reinstatement of Narrative Motifs
Information in Part C sheds new light on both A and B events, for example, Clara learned the chili recipe from Steven; Margaret's mustard-stained blouse now reminds Steven of her death. We examined how past information from Part AB was reinstated during Part C upon the recurrence of the narrative motifs. For each occurrence of motifs in the story, we averaged the 5 TRs after its onset. Then, we correlated each reoccurrence of a narrative motif in Part C with all its occurrences in Part A or B (Figure 5). The correlation between matching motifs was computed, as well as the correlation between nonmatching motifs from the same storyline (shared storyline) and the correlation between nonmatching motifs from the competing storyline (unrelated segments).
Compared to nonmatching motifs from the same storyline, the reappearance of the narrative motifs in Part C reinstated specific neural patterns seen when the motifs were encountered during the A/B segments in PCC/precuneus, bilateral clusters in posterior temporal lobe/inferior parietal lobes/higher visual areas, bilateral lateral frontal areas, and dorsal medial prefrontal cortex (Figure 6). Furthermore, TR-by-TR analysis around the onsets of narrative motifs in Part C showed that the correlation rapidly increased after motif onset and lasted for 4-7 TRs, approximately three to six sentences (Figure 6, top and bottom). The reinstatement effect was specific to matching motifs and was not seen between nonmatching motifs, either within or across storylines.
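The motif analysis can be sketched as follows for a single ROI. Several details are assumptions made for illustration: the 5-TR window is taken to start at the onset TR, each Part C event is tagged with the storyline in which its motif originally appeared, and the correlation is computed within one data set rather than between participants. All names and toy values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_pattern(timeseries, onset, n_trs=5):
    """Voxel-wise average of n_trs consecutive TRs starting at onset (assumed window)."""
    window = timeseries[onset:onset + n_trs]
    return [sum(col) / len(col) for col in zip(*window)]

def motif_effect(c_events, ab_events):
    """Matching-motif similarity minus nonmatching same-storyline similarity."""
    matching, baseline = [], []
    for c in c_events:
        for ab in ab_events:
            if ab["storyline"] != c["storyline"]:
                continue  # other-storyline pairs are not part of this baseline
            r = pearson(c["pattern"], ab["pattern"])
            (matching if ab["motif"] == c["motif"] else baseline).append(r)
    return sum(matching) / len(matching) - sum(baseline) / len(baseline)

# Toy motif-evoked patterns (4 voxels each); motif labels and values are made up.
ab_events = [
    {"motif": "motif_1", "storyline": "A", "pattern": [1.0, 0.0, 2.0, 1.0]},
    {"motif": "motif_2", "storyline": "A", "pattern": [0.0, 2.0, 1.0, 3.0]},
]
c_events = [
    {"motif": "motif_1", "storyline": "A", "pattern": [1.1, 0.2, 1.9, 0.8]},
    {"motif": "motif_2", "storyline": "A", "pattern": [0.1, 1.8, 1.2, 2.9]},
]
effect = motif_effect(c_events, ab_events)
```

In practice, each `pattern` would be obtained by applying something like `mean_pattern` to the ROI time series at the motif onset.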
To verify that the motif effect does not only reflect shared characters, we compared entries with shared versus different main characters in the Motif × Motif Pattern similarity matrix and found no significant character effect (Appendix D).

Narrative Motifs vs. High-Frequency Word Effects
To make sure that the reinstatement of patterns after motif onsets reflects the retrieval of narrative information (as opposed to simple reactivation of word representations shared between the A/B and C segments, e.g., the representation of the word "chili"), we performed the same analysis on a set of high-frequency words that occurred in Part C and in either the A or B storyline (e.g., "watch"). We analyzed 28 high-frequency words to match the number of narrative motifs. If the neural reinstatement effect that we observed for motifs simply reflected the reactivation of word representations, the same effect should be observed when we look at the repetition of high-frequency words like "watch" that have no particular narrative significance. In all ROIs showing a significant motif effect, besides the dorsal PCC, the correlation between matching items was significantly higher for the narrative motifs compared to the high-frequency words, for which the correlation hovered around zero (p < .05, FWE corrected; Figure 7). This indicates that word repetition alone was not sufficient to drive the motif reinstatement effect we observed; rather, the words had to refer to significant narrative events (as is true for "chili" but not for "watch").

Correlation between Motif Reinstatement and the Behavioral Relation Score
To test whether participants showing stronger neural reinstatement also did a better job of integrating related events, we evaluated how well participants understood the relation between the AB and C events using 14 open questions in the postscan questionnaire. The fMRI participants' answers were evaluated by five raters who were blind to our hypothesis (three women, aged 26-31 years). The raters were asked to judge whether the fMRI participants responded by remembering only Part AB (score = 0), only Part C (score = 0), or both (score = 1). Responses indicating no memory or false memory were given a score of 0.
Below are two sample questions and real responses from the fMRI participants (the complete tables of all the answers and scores are available at arks.princeton.edu/ark:/88435/dsp011n79h7374):
The following prompts are words, sentences, or phrases that recurred in the story. Please explain their significance to the story:
Question: Homemade chili=?
Answer (average score 1): Clara makes homemade chili in the beginning and we find out that Clara learned the recipe from Steven, but Steven makes it better.
Answer (average score 0): Steven makes it.
Answer (average score 0): What Clara makes and Gary forgets to eat.
Question: Mustard stained blouse =?
Answer (average score 1): Margaret stains her blouse with mustard when meeting with Alexander, also Steven can't bear to wash it after Margaret dies.
Answer (average score 0): Margaret's blouse is stained when she meets with Alexander for the first time.
Answer (average score 0): Clara eating hotdogs in NYC.
Figure 5. Predicted motif effect. Events from the convergence part (Part C) were bridged to specific events in Part AB by motifs, for example, chili, that appeared both in Part C and in Part AB. For each motif presented in Part C, we identified time points in Part AB where the same motif was presented (indicated by red lines or red-outlined triangles) as well as time points where other motifs were presented (indicated by gray and black triangles). Focusing only on time points surrounding motif presentation (indicated in the figure by gathering all of the triangles together), we searched for regions where motif-evoked neural patterns in Part C were more similar to neural patterns evoked by the same motif in Part AB than neural patterns evoked by different motifs in Part AB.
We correlated (across participants) the size of the neural motif reinstatement effect for each participant with the participant's ability to relate Part AB of the story to Part C, as revealed by their average relation score. This analysis was run separately for each ROI that showed a significant motif reinstatement effect. Within these ROIs, we found a significant correlation between the neural reinstatement of motifs and the individual relation scores in left IPL, left SMG, left AG, and right middle frontal gyrus (MFG; one-tailed, FDR-corrected q < .05; Figure 8). In other words, participants who showed a stronger neural reinstatement of motifs were also better at reporting the narrative-related connections among separate events sharing the same motifs.
A possible alternative explanation of this result is that the relation score is tapping memory strength (not relations per se) and that the correlation between neural motif reinstatement and relation score simply reflects a shared influence of memory strength on these variables (i.e., high memory strength is associated with strong neural reinstatement and a high relation score, leading to a correlation between these two measures). To address this potential confound, we generated a separate "AB memory score" based on 14 questions in the postscan questionnaire that only required a good memory of Part AB to answer (e.g., What does Clara find out during the party?); the mean level of accuracy for these AB memory score questions was 75% (SD = 17%). We then computed the partial correlation between the relation score and the neural motif effect, controlling for the AB memory score. The partial correlation effect was significant in left AG (r = .67, p < .001). Left SMG (r = .49, p = .008), left IPL (r = .42, p = .020), and right MFG (r = .42, p = .020) also showed partial correlations that were (individually) significant, although they were not significant after FDR correction for multiple ROIs. The fact that the relation score and neural motif effect were still correlated, even after controlling for the AB memory score, indicates that this relationship cannot merely be explained in terms of memory strength.
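The partial correlation used here can be sketched with the standard first-order formula, which expresses the correlation between two variables after removing the linear influence of a third. The scores below are made up for illustration, and the function and variable names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(x, y, z):
    """Correlation between x and y after controlling for z (first-order formula)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up scores for four hypothetical participants.
relation_score = [0.2, 0.4, 0.6, 0.8]       # average relation score
neural_motif = [0.01, 0.03, 0.02, 0.04]     # neural motif effect in one ROI
ab_memory = [0.8, 0.6, 0.6, 0.8]            # AB memory score (control variable)

r_partial = partial_corr(relation_score, neural_motif, ab_memory)
```

In this toy example, the control variable is uncorrelated with both measures, so the partial correlation matches the simple correlation; when memory strength drives both measures, the partial correlation shrinks toward zero.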

Between-Participant vs. Within-Participant RSA
We used between-participant RSA for the analyses described above because our previous study found that neural patterns associated with the perception and retrieval of specific events in a movie are shared across participants (for similar findings, see Baldassano et al., 2017, 2018; Zadbood et al., 2017). To the extent that these patterns are shared, this suggests that averaging the neural patterns across participants will boost the SNR. For a similar argument and a detailed analysis of such effects, see Simony et al. (2016). For completeness, we have also included within-participant versions of our analyses. As we predicted (based on our prior work), the results of these within-participant analyses are qualitatively similar to the across-participant analyses but somewhat weaker, presumably because of a lower SNR (Appendices F-J).
Figure 7. Motif versus high-frequency word effect. Among regions showing a significant motif effect (marked by the blue outline), the effect of storyline-specific high-frequency words was computed using the same RSA method. The two effects were compared using a one-tailed one-sample t test (n = 25, p < .05, FWE). Shaded areas show the distribution across participants.

Hippocampal-Cortical ISFC and Cortical Reinstatement of Storyline and Motif
To examine whether storyline reinstatement is dependent on connectivity with the hippocampus, we examined the correlation between storyline effect and hippocampal-cortical ISFC in ROIs showing a significant storyline effect. For each A/B segment (except for the first segment of each storyline), we correlated the storyline effect with hippocampal-cortical ISFC during that segment and ran the correlation across segments (within participants) and participants (within segments). Because we did not have strong predictions about the relevant time windows for computing the storyline effect and ISFC, we ran an exploratory grid search across a range of analysis parameters. The ISFC between hippocampus and mPFC showed a strong correlation with the storyline effect (FDR-corrected q < .05 across ROIs) for multiple settings of analysis parameters (Figure 9), although the result did not survive multiple comparison correction when factoring in the full set of analysis parameters, so it should be interpreted with caution. We also examined the correlation between hippocampal ISFC and motif reinstatement in ROIs showing a significant motif effect but did not find a significant correlation.
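For readers unfamiliar with ISFC, a minimal sketch is given below. It assumes a leave-one-out formulation in which each participant's hippocampal time course is correlated with the average cortical ROI time course of all other participants; this section does not spell out the exact computation, so the formulation, the function names, and the toy time courses are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def isfc(seed_by_subject, target_by_subject):
    """Leave-one-out ISFC: each seed vs. the mean target of the other subjects."""
    n = len(seed_by_subject)
    values = []
    for s in range(n):
        others = [target_by_subject[k] for k in range(n) if k != s]
        mean_others = [sum(col) / len(col) for col in zip(*others)]
        values.append(pearson(seed_by_subject[s], mean_others))
    return values

# Toy time courses (4 TRs) for three hypothetical participants; values are made up.
hippocampus = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [1.0, 3.0, 2.0, 4.0]]
cortical_roi = [[1.0, 2.0, 3.0, 5.0], [0.0, 2.0, 2.0, 4.0], [1.0, 1.0, 3.0, 3.0]]

isfc_per_subject = isfc(hippocampus, cortical_roi)
```

Because the seed and target never come from the same participant, correlations of this kind reflect stimulus-locked coupling rather than participant-specific noise. Restricting the time courses to one segment would give a per-segment ISFC value of the kind correlated with the storyline effect above.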

DISCUSSION
For this study, we actively designed a structured narrative in collaboration with a professional author to test how related events are dynamically and flexibly integrated by the brain while being protected and segregated from intervening irrelevant events. Our results indicate that the memory traces of recent events can be reactivated as a function of current input. This is seen in Figure 2B, where the neural patterns associated with the current storyline were reactivated at segment boundaries, while the activation patterns of the irrelevant storylines subsided. This effect was stronger in areas with longer processing timescales, peaking in the DMN (Figure 2A and C). The reinstatement of relevant past events was also tested in Part C by using motifs to reactivate and update particular moments from both storylines (Figure 6). As predicted, the presentation of specific motifs in Part C triggered the reinstatement of associated A/B events. Taken together, these results revealed a dynamic shift between currently active context and latent inactive contexts, which helps to integrate information over minute-long interruptions while protecting the accumulated information from irrelevant input.
Each storyline/scene is a unique combination of multiple narrative elements, such as characters, locations, goals, and so forth. In a prior paper (Yeshurun, Nguyen, et al., 2017), we showed that local differences in narrative elements (e.g., switching two to three words in a sentence with their antonyms) are amplified in the DMN, which showed response differences that are robust to spatial and temporal blurring. Furthermore, our prior work has shown that at least 15 dimensions of information are encoded in DMN activation patterns shared across participants. Therefore, we speculate that the difference between the neural representations of the two storylines was driven by the unique combinations of narrative features and should not be attributed to any single narrative dimension in isolation.
Using an audiovisual movie with interleaved storylines, a recent study (Milivojevic et al., 2016) reported the emergence of storyline-specific patterns in the hippocampus. However, as noted earlier, storyline information in that study was confounded with sensory features. Milivojevic et al. (2016) sought to control for these differences by regressing out sensory features, but this strategy is not ideal: When storyline and sensory information are strongly correlated, regressing out sensory information can attenuate legitimate storyline effects. Our study avoided this confound by presenting both storylines auditorily by the same speaker. We not only replicated the finding of neural differentiation in the hippocampus (Figure 4) but also revealed similar patterns of results in extended cortical areas, including language areas, the DMN, the executive network, high-order visual areas, and subcortical areas. In another related study, Lahnakoski, Jääskeläinen, Sams, and Nummenmaa (2017) interleaved two independent movies and found that breaking the flow of a narrative (by interleaving) disrupts the accumulation of information relative to continuous viewing, similar to our previous studies (Honey, Thesen, et al., 2012; Lerner et al., 2011). However, unlike our study and Milivojevic et al.'s (2016) study, this study did not test for the coding of storyline-specific information and its reactivation across interruptions.
Our results also extend those of Milivojevic et al. (2016) by showing a difference along the long axis of the hippocampus in how the storyline effect developed over time. The anterior hippocampus only showed a significant storyline effect late (but not early) in Part AB, whereas the posterior hippocampus showed a significant storyline effect both early and late in Part AB. These results fit with the idea that the timescale of integration increases from posterior to anterior along the long axis in the hippocampus (Brunec et al., 2018;Duncan & Schlichting, 2018;Morton, Sherrill, & Preston, 2017;Collin et al., 2015). Specifically, the idea that the anterior hippocampus integrates slowly (i.e., over a long timescale) can explain why the storyline effect builds up relatively slowly in this region. Likewise, the idea that the posterior hippocampus integrates quickly (i.e., over a short timescale) can explain why the storyline effect appears early in this region. The results from the middle hippocampus ROI, showing a storyline effect early but not late in Part AB, are more puzzling. Speculatively, the preferred integration timescale for this ROI might be broad enough to connect a story segment to adjacent story segments (which are from the other storyline), but not broad enough to connect to more-temporally-distant segments from the same storyline; more work needs to be done to test this idea.
As noted above, one of our key hypotheses is that memory reactivation supports the integration of these reactivated memories with new inputs (for related work, see Cohn-Sheehy et al., 2020; Griffiths & Fuentemilla, 2020; Clewett, DuBrow, & Davachi, 2019; Backus, Schoffelen, Szebényi, Hanslmayr, & Doeller, 2016; Schlichting & Preston, 2015, 2016; Zeithamova, Dominick, & Preston, 2012; Zeithamova, Schlichting, & Preston, 2012; Zeithamova & Preston, 2010; Shohamy & Wagner, 2008). This integration hypothesis is supported by several aspects of our results. First, we found that participants showing stronger neural reinstatement of motifs in left SMG, AG, IPL, and right MFG also showed higher relation scores (Figure 8). This finding supports the hypothesis that reinstatement of motifs led to better integration of Parts AB and C. In addition, the observed Storyline × Time effect (Figure 3) is consistent with the hypothesis that new segments updated the representations of the two storylines and pushed them further apart. It is worth noting, however, that the Storyline × Time effect (computed using between-participant RSA) does not on its own provide definitive evidence for the within-storyline integration. An alternative explanation for this result is that, over time, individual storylines become represented in a more stable and stereotyped fashion across listeners (thus making the two storylines more discriminable across listeners), as opposed to the storylines becoming more different within listeners. In principle, the within-participant RSA results (shown in Appendix G) should be able to resolve this point, as any difference in the storyline effect over time in this analysis necessarily would reflect a within-participant effect. However, in practice, the results of this analysis are equivocal.
The pattern of results across ROIs is qualitatively similar (suggesting that the storyline representations truly are moving apart), but the results no longer pass FDR correction; it is unclear whether this simply reflects the lower SNR of within-participant (vs. between-participant) analysis or the absence of a true "storyline differentiation" effect.
Finally, having demonstrated the reinstatement of relevant past information, we next addressed the contribution of hippocampus-based episodic memory to this reinstatement. Previous work by Chen et al. (2016) showed that hippocampal-cortical interaction helped participants to integrate information across movie segments separated by a 1-day break. In our exploratory analyses, we found that storyline reinstatement increased with functional connectivity to hippocampus in mPFC both across participants and across segments (Figure 9), for several (but not all) parameter settings for the analysis. This finding fits with the idea that the hippocampus, which is known to be involved with the reinstatement of episodic memories over days, may also be involved with the reinstatement of recently accumulated memories over shorter lags (for additional evidence in support of this view, see, e.g., Goodrich, Baer, Quent, & Yonelinas, 2019;Hannula & Ranganath, 2009;Staresina & Davachi, 2009;Ezzyat & Olson, 2008;Olson, Moore, Stark, & Chatterjee, 2006). Notably, the degree of interparticipant functional connectivity between hippocampus and cortex did not reliably predict neural reinstatement triggered by motifs in Part C. One possible explanation is that the interparticipant functional connectivity method involves averaging over multiple time points; this may make it less useful for detecting brief reinstatement events triggered by motifs.
In addition to hippocampus-mediated episodic memory, there are other mechanisms that might contribute to reinstating storyline representations. For example, recent studies of working memory have shown that past information could be held in the cortex during a delay period without persistent activity (Wolff, Jochim, Akyürek, & Stokes, 2017; Sprague, Ester, & Serences, 2016; Stokes, 2015) and such inactive neural patterns can be reactivated on task demand (Watanabe & Funahashi, 2014), by probe stimuli (Wolff et al., 2017), and by TMS (Rose et al., 2016); several computational models have been built to account for the latent memory (Buonomano & Maass, 2009), and short-term synaptic plasticity in the cortex has been proposed to be the underlying mechanism (Miller, Lundqvist, & Bastos, 2018; Mongillo, Barak, & Tsodyks, 2008; Zucker & Regehr, 2002). More research is needed to directly test whether short-term plasticity within the cortex plays a role in supporting the memory of recent context.
Our design successfully induced reactivation of neural patterns associated with specific storylines and motifs, which, we believe, reflects the reinstatement of narrative information relating to past events rather than simple reactivation of word representations. In support of this view, we found a significant motif effect even when taking storyline-specific high-frequency words as the baseline (Figure 7), showing that word repetition is not sufficient to drive the effect. In addition, the same motif was not always expressed in the same words ("throwing up" vs. "Clara feels sick, as the coffeecake rises to her throat"), showing that word repetition is not necessary to drive the effect. Furthermore, as noted above, we found a correlation between the behavioral relation scores and the neural motif effects (Figure 8). In other words, the same set of narrative motifs yielded greater neural reinstatement in participants who demonstrated a better understanding of the narrative. As for the storyline effect, the strongest difference between storylines was found in regions with long TRWs, that is, regions where randomly ordered words do not elicit reliable responses (Lerner et al., 2011; Figure 2C). Furthermore, 27% of the word tokens in the early AB part are storyline specific, whereas only 24% of the word tokens in the late AB part are storyline specific. Therefore, the difference in wording cannot explain the finding of a stronger storyline effect in the later AB part (Figure 3).
In conclusion, real-life events require dynamic integration of past and present information. Our results suggest that process memory may have two states: a state in which prior events are active and influence ongoing information processing and an inactive state in which the latent memory does not interfere with the ongoing neural dynamics (Stokes, 2015). Through cross-disciplinary collaboration, this study demonstrated a way to achieve some experimental control over naturalistic stimuli, which play an ever-increasing role in revealing how the human brain works in its native environment (Lee, Bellana, & Chen, 2020; Willems, Nastase, & Milivojevic, 2020; Sonkusare, Breakspear, & Guo, 2019), and also showed how skilled storytellers leverage these mechanisms of separation and integration to bring about the desired effects in the listener's brain.

APPENDIX A
t-Value maps showing the effects of audio amplitude and between-segment pause (n = 25, p < .05, FWE corrected).

APPENDIX B