Hippocampal spatial and sequential representations of event structure scaffold precise episodic temporal order memory

The hippocampus plays an important role in representing spatial locations and sequences and in transforming representations via pattern separation and completion. How these representational structures and operations support memory for the temporal order of random items is still poorly understood. We addressed this question by leveraging the method of loci (MOL), a powerful mnemonic strategy for temporal order memory that particularly recruits hippocampus-dependent computations of spatial locations and associations. Applying representational similarity analysis to fMRI activation patterns revealed that hippocampal subfields contained representations of both temporal context and multiple features of sequence structure, including location identity, distance, and sequence boundaries. Critically, the hippocampal CA1 and CA23DG exhibited spatial and sequential pattern separation, respectively, enabling the encoding of multiple items in the same location and reducing swap errors across adjacent locations. Our results suggest that the hippocampus can flexibly reconfigure multiplexed event structure representations to support accurate temporal order memory.


Introduction
Episodic memory is inherently structured according to the temporal order of experiences 1 . As the core brain structure for spatial navigation and episodic memory, the hippocampus and surrounding areas of the medial temporal lobe (MTL) have been consistently implicated in spatial and temporal coding in rodents and humans 4,5,6 . In particular, the hippocampal-entorhinal system supports representations of temporally ordered events via sequential activation of spatially tuned cells 7 , suggesting that space and time can be unified into a common internal coding scheme.
Recent rodent research has revealed two types of temporal order representation in the hippocampal-entorhinal system: representations of stable event sequences that are enhanced in well-practiced structured events, and the temporal flow that is formed automatically with one-shot learning 8 . Human functional imaging studies have implicated the hippocampus and the anterior-lateral entorhinal cortex (alEC) in the representation of event structure 9,10 , and in the recall of the temporal order of real-life events from a movie 15 . Although temporal context binding has been consistently implicated in temporal order judgment 14,16,17,18 , little is known regarding how the learnt event structure can be used as a scaffold to support effective temporal order memory.
This question is critical for understanding human temporal order memory, because people are generally poor at one-shot, episodic-like temporal order memory for random events, perhaps because contextual drift may be rather nonlinear and unstable. By contrast, learnt event structures may be used to dramatically boost performance. In fact, ancient Greek and Roman orators had long realized that linking novel material to sequential locations along a familiar and well-organized route can significantly improve temporal order memory 19 . Since then, this mnemonic strategy, called the method of loci (MOL) or memory palace, has been commonly used by memory experts and athletes 20 . Several studies have shown that the MOL primarily engages spatial and associative processes in the MTL 20,21,22 .
Leveraging the ancient mnemonic strategy of MOL with representational similarity analysis of fMRI data, our study for the first time revealed multiplexed representations of event structure and temporal context in the human hippocampus. More critically, the event structure representations in the hippocampus exhibited spatial and sequential pattern separation, suggesting that the hippocampus may flexibly reconfigure event representations to optimize temporal order memory.

MOL quickly improved temporal order memory performance
Twenty-nine college students (10 males) participated in this experiment (Fig. 1A). On Day 1, subjects were tested for baseline performance, and then given a 2-hour video lecture on the MOL strategy, including presentation of a world map with 10 landmarks (see Methods) (Fig. 1B). Participants then practiced the MOL in three consecutive sessions (Day 2 to Day 4) before the scanning session (Day 5). During the baseline test, the two later practice sessions, and the fMRI scanning, subjects performed two encoding-retrieval runs, whereas they finished only one run on Day 2. In each run, subjects studied a list of 30 words and, after a 6-min delay (filled with a working memory task), recalled the temporal order of these words (Fig. 1C). To reduce interference and practice effects, new words were used in each run across the whole experiment.
We further examined the behavioral pattern to ensure that subjects were actually using the MOL. First, if they were relying on the locations to encode the temporal order of the studied items, we would predict recency and primacy effects based on the 10 locations. In contrast, if subjects were encoding the items serially without the MOL, we would predict that the recency and primacy effects would be based on the overall list of 30 items. Second, if the subjects used the MOL and three words were encoded for each location, we would expect their accuracy to decrease from the first set of 10 words to the second and third sets because of an increasing load at each location (i.e., the fan effect) 23 . Finally, we would also expect significant within-location swap errors (i.e., confusion of items that were 10 or 20 positions apart). The fan effect and within-location swap errors should not occur if subjects did not use the MOL.
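The predicted within-location swap errors (confusions between items 10 or 20 serial positions apart) can be counted directly from a recall protocol. The following is an illustrative sketch, not the authors' analysis code; the word labels and toy recall sequence are hypothetical:

```python
def location_of(position, n_locations=10):
    """Map a 1-based serial position to its locus: positions 1, 11, and 21
    all map to locus 1 when the 10-location route is reused cyclically."""
    return (position - 1) % n_locations + 1

def count_within_location_swaps(true_order, recalled_order):
    """Count recall errors in which an item landed at a wrong serial
    position that nevertheless maps to the same locus (i.e., the recalled
    and true positions are 10 or 20 apart)."""
    swaps = 0
    for recalled_pos, item in enumerate(recalled_order, start=1):
        true_pos = true_order.index(item) + 1
        if recalled_pos != true_pos and location_of(recalled_pos) == location_of(true_pos):
            swaps += 1
    return swaps

# Toy protocol: 30 hypothetical words; the words at positions 1 and 11
# (same locus) are swapped at recall, producing two within-locus errors.
words = [f"w{i}" for i in range(1, 31)]
recalled = words.copy()
recalled[0], recalled[10] = recalled[10], recalled[0]
print(count_within_location_swaps(words, recalled))  # -> 2
```

A serial (non-MOL) encoder would produce transpositions clustered around the true position instead, so this count discriminates the two strategies.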
A different pattern of results was found for the behavioral data during the baseline test. Specifically, we found a significant set by serial position interaction (F (3.27, 91.47) = 11.11, P < 0.0001, η 2 = 0.11), indicating only a primacy effect in set 1 (initial vs. middle/final, ts > 5.21, Ps < 0.001), but not in sets 2 and 3 (ts < 1.78, Ps > 0.05) (Fig. S2C). No significant within-location swap error was found (Fig. S2D). These results suggest that before practice the 30 words were encoded into a single list, rather than based on the 10 locations. Across the baseline and three training sessions, the location-based pattern gradually emerged (Figs. S2 to S5), showing increased ratios of within-location swap errors to all errors (Fig. S1B). Together, our behavioral data suggest that subjects effectively learnt to use the MOL to encode the word order, which not only improved the overall performance but also changed the behavioral patterns.
Hippocampal contributions to temporal order memory

The above behavioral evidence suggests that subjects indeed used the MOL strategy to aid memory encoding. We then turned to the fMRI data to examine how employment of this strategy affects activity levels and stimulus-specific representations in hippocampal subfields. The hippocampus and surrounding medial temporal lobe areas were segmented into 5 regions, including CA1, CA23DG, anterior-lateral entorhinal cortex (alEC), posterior-medial entorhinal cortex (pmEC), and parahippocampal cortex (PHC) (Fig. 3A; Methods). Univariate analyses revealed marginally significant subsequent memory effects (SME) in CA23DG (t (28) = 2.35, P = 0.026, corrected P = 0.074, Cohen's d = 0.44) and PHC (t (28) = 2.29, P = 0.030, corrected P = 0.074, Cohen's d = 0.43), with subsequently remembered items showing greater activity than subsequently forgotten items (Fig. 3B). Whole-brain analysis revealed a significant SME in the left parahippocampal gyrus, the left frontal medial cortex, and the left orbital frontal cortex (FWE-corrected for multiple comparisons), consistent with previous observations 24 (Fig. S6A; Table S1).

Hippocampal representations of structured location sequences
Having shown the involvement of the hippocampus in temporal order memory, we further examined whether specific hippocampal representations supported temporal order memory. In the MOL strategy, the spatial location was not presented to the participants during either memory encoding or retrieval. However, the well-learnt structured sequence of locations (i.e., the mental route) could be reactivated and linked with the to-be-learnt words during encoding, and again reinstated during retrieval. To probe the neurocognitive representation of this sequential location structure, we examined how hippocampal representational similarity was modulated by location identity, the distance between locations, and the boundaries of the sequence.

Hippocampal spatial pattern separation
To examine the representation of spatial locations, we compared the pattern similarity of items sharing the same location across two runs (i.e., same-location pairs) with that of items encoded at adjacent but different locations (i.e., near-distance pairs, ordinal distance = 1) (Fig. 4A). This analysis could be performed during both encoding and retrieval (according to the temporal distance during encoding). Notably, this cross-run pattern similarity should not be affected by intrinsic autocorrelations of the BOLD signal. We found that the PHC showed greater pattern similarity for same-location pairs than near-distance pairs during encoding (t (28) = 3.42, P = 0.002, corrected P = 0.010, Cohen's d = 0.64), while the reverse pattern was found during retrieval (t (24) = -2.61, P = 0.015, corrected P = 0.039, Cohen's d = 0.52; four subjects were excluded from this analysis due to fewer than 10 trials in any condition, see Table S3) (Fig. 4B). The same pattern separation effect during retrieval was found in CA1 (t (24) = 2.81, P = 0.010, corrected P = 0.039, Cohen's d = 0.56), but not in CA23DG (t (24) = 0.372, P = 0.713) (Fig. 4C), and there was no significant region (CA1 vs. CA23DG) by location identity interaction (F (1,24) = 2.011, P = 0.169). Whole-brain searchlight analysis did not reveal any representation of location identity elsewhere in the brain.
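The cross-run same-location vs. near-distance comparison can be sketched as follows. This is a schematic reconstruction with random numbers standing in for voxel patterns; the variable names and data are illustrative, not the study's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_voxels, n_loci = 30, 100, 10
loci = np.arange(n_items) % n_loci                 # locus (0-9) of each serial position
run1 = rng.standard_normal((n_items, n_voxels))    # stand-in patterns, run 1
run2 = rng.standard_normal((n_items, n_voxels))    # stand-in patterns, run 2

def pattern_similarity(a, b):
    """Pearson correlation between two voxel patterns."""
    return np.corrcoef(a, b)[0, 1]

same_loc, near_dist = [], []
for i in range(n_items):
    for j in range(n_items):
        r = pattern_similarity(run1[i], run2[j])
        d = abs(int(loci[i]) - int(loci[j]))       # ordinal distance between loci
        if d == 0:
            same_loc.append(r)     # different words, same location, across runs
        elif d == 1:
            near_dist.append(r)    # words at adjacent locations

# Pattern-separation index: near-distance minus same-location similarity
separation = np.mean(near_dist) - np.mean(same_loc)
```

Because the two runs are acquired separately, these item pairs do not share a scan, which is why the comparison is immune to within-run BOLD autocorrelation.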
Given the similar pattern separation effect in CA1 and PHC, we further examined the relationship between representational patterns in CA1 and PHC. Representational connectivity analysis (see Methods; Fig. S7) revealed that the CA1 pattern was positively correlated with the PHC pattern during both encoding (r (28) = 0.036, t (28) = 3.77, P < 0.001) and retrieval (r (24) = 0.076, t (24) = 4.00, P < 0.001). Interestingly, the CA1 encoding pattern was marginally negatively correlated with the CA1 retrieval pattern (r (24) = -0.030, t (24) = -1.81, P = 0.083). These results suggest that the CA1 representation of spatial location could be flexibly modulated via pattern separation processes, which could then modulate the PHC representations (although the direction of these effects cannot be inferred).
The spatial pattern separation could help to differentiate the temporal order of items encoded at the same location during memory retrieval. If the CA1 and PHC pattern separation indeed aided temporal order judgment, we would predict the degree of pattern separation (near-distance minus same-location pair pattern similarity) in these regions to be negatively correlated with the number of within-location swaps. Robust regression revealed a marginally significant effect in the PHC (t (27) = -1.98, P = 0.058).
Hippocampal sequential pattern separation

Second, we examined how neural pattern similarity was modulated by the distance between two locations in the well-trained sequence 9,10 . Note that distance here refers to the ordinal distance between two locations (ranging from 1 to 9) rather than their Euclidean or geodesic distance in the real world. We compared the pattern similarity of words that were encoded at near distance (i.e., ordinal distance = 1), middle distance (i.e., 2 ≤ ordinal distance ≤ 3) and far distance (i.e., ordinal distance > 3) (Fig. 4A). We found that the CA23DG region showed a main effect of location distance during encoding (F (1.80,50.49) = 5.40, P = 0.009, corrected P = 0.047, η 2 = 0.03). Post-hoc t-tests (Tukey HSD) showed higher pattern similarity for far-distance pairs than near- (t (28) = 2.93, P = 0.018, Cohen's d = 0.54) and middle-distance pairs (t (28) = 3.30, P = 0.007, Cohen's d = 0.61), but no difference between near- and middle-distance pairs (t (28) = 0.07, P = 0.998) (Table S4). No significant effect of location distance was found during retrieval.
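The three distance bins used in this analysis can be stated compactly; this helper is purely illustrative:

```python
def distance_bin(ordinal_distance):
    """Bin an ordinal location distance as in the analysis:
    near = 1, middle = 2-3, far = 4-9."""
    if ordinal_distance == 1:
        return "near"
    if 2 <= ordinal_distance <= 3:
        return "middle"
    return "far"

print([distance_bin(d) for d in range(1, 10)])
# -> ['near', 'middle', 'middle', 'far', 'far', 'far', 'far', 'far', 'far']
```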

Hippocampal sequence boundary effects during encoding
In addition to location identity and distance, the location structure also contains sequence boundaries. Unlike previous studies where the boundary was introduced by background context or different sequences 11,14 , the boundary in the current study was introduced by the repetition of the location sequence. That is, when the 11th word was encoded, subjects would return to the first location, which would break the sequence contiguity. As a result, for a given temporal distance (e.g., 2), we can construct both within-boundary pairs (e.g., location 8 and location 10) and cross-boundary pairs (e.g., location 9 and location 1).
To compare within- vs. cross-boundary pairs of matching distances, we analyzed temporal distances of 4 to 7 ordinal positions during encoding (Fig. 5A; Table S5). Following a previous study 14 , we predicted that the similarity of neural representations would be higher for within-boundary pairs than cross-boundary pairs. Indeed, this was found in both CA1 (t (28) = 3.70, P < 0.001, corrected P = 0.002, Cohen's d = 0.69) and CA23DG (t (28) = 4.37, P < 0.001, corrected P < 0.001, Cohen's d = 0.81) (Fig. 5B). This effect was specific to the hippocampus and did not occur in EC or PHC, or in any other brain region in the whole-brain analysis.
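Under the assumption of a 30-item list traversing the 10-location route three times, the within- vs. cross-boundary pair construction for matched temporal distances can be sketched as follows (an illustrative reconstruction, not the published analysis code):

```python
def lap(position, lap_length=10):
    """Which pass through the 10-location route a 1-based serial position
    belongs to (0, 1, or 2 for a 30-item list)."""
    return (position - 1) // lap_length

def boundary_pairs(n_items=30, distances=range(4, 8)):
    """Split item pairs of matched temporal distance (4-7) into
    within-boundary pairs (same lap) and cross-boundary pairs (the pair
    straddles a return to location 1)."""
    within, cross = [], []
    for td in distances:
        for i in range(1, n_items - td + 1):
            j = i + td
            (within if lap(i) == lap(j) else cross).append((i, j))
    return within, cross

within, cross = boundary_pairs()
print(len(within), len(cross))  # -> 54 44
```

Matching the temporal distances across the two pair types ensures that any similarity difference reflects the boundary itself rather than elapsed time.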
Hippocampal temporal context reinstatement during retrieval

The above analyses reveal that hippocampal representational patterns are modified according to the well-trained structured location sequence, exhibiting spatial and temporal pattern separation to aid temporal order memory. In the following analysis, we further examined whether representations in hippocampal subfields also support another type of temporal order, i.e., the episodic-like temporal context that was formed through one-shot learning and should be specific to a given event sequence. Due to the autocorrelation of the fMRI BOLD signal, we could not directly compare the representational similarity of temporally adjacent pairs with that of more distant pairs. Instead, we examined the reinstatement of temporal context during retrieval (Fig. 6). In particular, for a given temporal distance during retrieval (TDr, ranging from 1 to 29), we grouped the pairs, according to their temporal distance during encoding (TDe), into Short (TDe ≤ 3) and Long (4 ≤ TDe ≤ 6) conditions (Fig. 6A). This grouping was motivated by the small number of trials at each individual distance and by previous findings that effects of temporal context decayed quickly beyond 3 items 12 . We restricted the analysis to correct trials and TDr ≤ 20, as there were very few pairs for TDr > 20 (Table S6). We predicted that brain regions containing representations of temporal context should show higher pattern similarity for short-distance pairs than long-distance pairs. Consistent with this prediction, we found this pattern in CA1 (t (23) = 3.17, P = 0.004, corrected P = 0.021, Cohen's d = 0.65; four subjects were excluded due to fewer than 10 trials in any condition, and one subject was excluded as an outlier, i.e., 2.5 SDs above the mean, see Table S7) (Fig. 6B). No temporal context reinstatement was found in any other brain region.
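The TDe-based grouping of retrieval pairs can be sketched as follows; the thresholds follow the description above, but the function itself is an illustrative reconstruction:

```python
def group_retrieval_pairs(encoding_pos, retrieval_pos, max_tdr=20):
    """Label each item pair by its encoding distance (TDe): Short (TDe <= 3)
    or Long (4 <= TDe <= 6). Pairs with retrieval distance (TDr) above
    max_tdr, or with other TDe values, are dropped."""
    short, long_ = [], []
    for a in range(len(encoding_pos)):
        for b in range(a + 1, len(encoding_pos)):
            tdr = abs(retrieval_pos[a] - retrieval_pos[b])
            tde = abs(encoding_pos[a] - encoding_pos[b])
            if tdr > max_tdr:
                continue
            if tde <= 3:
                short.append((a, b, tdr))
            elif 4 <= tde <= 6:
                long_.append((a, b, tdr))
    return short, long_

# Toy case: six items recalled in their studied order.
short, long_ = group_retrieval_pairs([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])
print(len(short), len(long_))  # -> 12 3
```

Comparing Short vs. Long similarity at each matched TDr then tests whether the encoding-time context, not the retrieval-time proximity, drives the similarity.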

Control Analysis: The Hippocampus Did Not Represent Word Semantics
In all the above analyses, we only considered the representation of the event structure and the temporal context, but not representations of the words themselves, as previous studies did not reveal strong item representations in the hippocampus. To further examine this issue, we conducted latent semantic analysis to generate the semantic similarity matrix of the words, using a well-trained Chinese word embedding model, Directional Skip-Gram 27 (see Methods). Correlating semantic similarity with neural representational similarity (Fig. S9A) did not reveal significant semantic representation in any hippocampal subfield (Ps > 0.082) (Fig. S9B). In addition, all the above results remained unchanged after controlling for semantic similarity. Interestingly, we found significant semantic representations during encoding in the vmPFC (r (28) = 0.017, P = 0.031), and a marginally significant effect in the SPL (r (28) = 0.020, P = 0.068).
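The model-based comparison used in this control analysis amounts to correlating the off-diagonal entries of two similarity matrices. Below is a minimal sketch with simulated matrices; the real analysis used word-embedding similarities and fMRI pattern similarities, and all data here are synthetic:

```python
import numpy as np

def rdm_correlation(neural_sim, model_sim):
    """Correlate the upper-triangular (off-diagonal) entries of a neural
    pattern-similarity matrix with those of a model similarity matrix."""
    iu = np.triu_indices_from(neural_sim, k=1)
    return np.corrcoef(neural_sim[iu], model_sim[iu])[0, 1]

# Simulated 30-word example: a symmetric "semantic" matrix and a neural
# matrix that partly tracks it, plus noise.
rng = np.random.default_rng(1)
sem = rng.random((30, 30))
sem = (sem + sem.T) / 2
neural = 0.5 * sem + 0.5 * rng.random((30, 30))
r = rdm_correlation(neural, sem)   # positive if the region tracks semantics
```

Controlling for semantic similarity in the main analyses then corresponds to partialling this model matrix out of the neural similarity values.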

Discussion
Inspired by the ancient mnemonic of MOL, the current study revealed a novel neural mechanism that supports precise temporal order memory of random events. In addition to hippocampal temporal context binding, the current study found that hippocampal event structure representations could be exploited to improve the accuracy of temporal order memory. These representations were shaped by two types of pattern separation processes, acting on spatial locations and location sequences, which were associated with CA1 and CA23DG, respectively. These results emphasize the multiplexed and flexible nature of hippocampal representations in the service of precise temporal order memory.

CA1 temporal context reinstatement and temporal order memory
The current study revealed clear evidence of temporal context reinstatement in the hippocampal CA1 area, with greater pattern similarity for item pairs studied at closer as compared to more distant temporal intervals. According to the temporal context or temporal drift model 16,17 , an episodic element is 'tagged' to the random and slowly changing neuronal background activity that is present at the time of encoding 28 . This temporal context is then reinstated during recall and provides information about the temporal distance by assessing the degree of disparity between the reinstated and the present neuronal background activity 12 . Supporting the role of hippocampal temporal context in temporal order judgment, it has been shown that (a) lower hippocampal pattern similarity (i.e., higher representational distinctiveness) was associated with more accurate temporal order judgments 18 ; (b) changes in EC pattern similarity during encoding of a narrative were correlated with later duration estimates between events 29 ; and (c) manipulation of context shifts by changing background images increased subjective feelings of temporal distance 14 . Our results add to this literature by showing hippocampal temporal context representations when participants were asked to rely on existing sequence structure, suggesting that temporal context binding might be automatic.

Hippocampal representation of well-trained event structure
In addition to episodic temporal context, the hippocampal-entorhinal system can explicitly represent spatial sequences. First, the hippocampal-entorhinal system contains place cells 30 and grid cells 31 that provide two complementary representational metrics of spatial locations and distances, respectively.
Second, the hippocampus exhibits specific neural mechanisms, such as cross-frequency coupling of high-frequency bursts of activity to the phase of low-frequency oscillations, which may support the representation and pairwise binding of event sequences 32,33,34 . Third, rodent studies analyzed activity while animals traveled along well-trained spatio-temporal sequences and found that CA1 "time cells" showed context-specific activity at unique time points of an experience 35,36 . A recent study revealed both types of temporal order representations, i.e., encoding of both temporal flow and trial structure, by the same alEC neurons 8 . Interestingly, in well-trained structured experiences, the encoding of temporal flow across trials was reduced, whereas the encoding of trial structure was increased. Human neuroimaging studies have also found that both the hippocampus 9 and alEC 10 can represent well-studied event sequences or event maps. The hippocampus exhibits increased activity during events that violate an expected sequence 37,38 , shows increased within-sequence similarity and decreased between-sequence similarity 39 , as well as greater pattern similarity between adjacent items in well-learnt object sequences than in random sequences 11 . Finally, the hippocampal representation is sensitive to sequence boundaries 11,14 .
Our results significantly extend these observations. First, the current study revealed all three types of event sequence representations in one single study, including spatial location representations, location distance representations, and sequence boundary representations. Together with the representation of temporal context, our study for the first time revealed the rich multiplexing of spatial, temporal and sequential representations in the human hippocampus. Second, unlike previous studies, the location sequence in the current study was not presented during encoding, suggesting these representations could be reinstated via conscious memory retrieval, resembling the spontaneous sequence replay during awake resting state and sleep in rodents 40,41 and humans 42,43 . Third, while representations of location distance were restricted to CA23DG, we found significant temporal boundary effects in both CA1 and CA23DG. This is consistent with the proposal that both CA1 and CA3 are involved when memories for events must be held over long intervals 44 . Finally, several recent studies have implicated the alEC in temporal order memory 8,10,15 , by representing the temporal event structure. The current study mainly found sequence representations in CA1 and CA23DG. One obvious feature of the current study is that the temporal event structure was used as a scaffold for temporal order memory of words, with an emphasis on the binding of items into a structured context. Consistently, both human imaging and rodent studies have also implicated the hippocampus in context coding 45 . Future studies should further examine the role of alEC and hippocampal subfields in event structure representations and context binding.

Spatial and sequential pattern separation and temporal order memory

Strikingly, we found clear evidence for both spatial and sequential pattern separation.
Spatial pattern separation effects were found in CA1 during retrieval, revealing a pattern that is different from that in PHC during encoding. In the current study, each location was associated with three words, and consequently there were significant within-location swap errors. To achieve high accuracy, the representations thus needed to be reconfigured in order to enable the encoding of multiple items into the same location and yet keep them separate. This requirement is similar to the demands of spatial pattern separation during path disambiguation in both rodents and humans. For example, hippocampal representations of overlapping routes 46 or object sequences 11 become more distinct than those of non-overlapping routes/sequences. A previous rodent study also revealed that hippocampal neurons encoded different episodes in a task with overlapping sequences of odors 47 .
In addition to spatial pattern separation, we observed significant pattern separation for sequences in CA23DG, a region that has been implicated in temporal pattern separation before 48 . Unlike previous studies, which found greater pattern similarity for temporally closer items in both alEC and hippocampus 9,10 , items encoded at closer sequence positions were represented more distinctly in CA23DG. This pattern was different from that observed in neocortical regions, and was also different from the temporal context representation in CA1, which showed higher similarity for temporally adjacent than for more distant items. Together with the spatial pattern separation, our results suggest that hippocampal representations can be flexibly configured in order to support temporal order memory.
The flexible employment and reconfiguration of hippocampal spatial and temporal representations in support of episodic memory has been frequently reported in the literature. For example, one study found spatial pattern separation but an opposite pattern for temporal representations 49 , whereas another study found greater pattern similarity for both spatially and temporally adjacent item pairs 9 . Yet another study found that CA23DG showed pattern separation effects when both spatial and temporal information was correctly retrieved 50 . A further study reported that CA1 showed greater pattern similarity for trials that shared the same episodic context as compared to those with different episodic contexts, whereas CA23DG showed the opposite pattern 51 . Clearly, future studies are required to examine in greater detail when hippocampal representations match temporal and spatial distances and when they disentangle these distances via processes of pattern separation.

Integration of multiple representational formats supports temporal order memory
The current study revealed rich spatial and temporal sequence coding in the hippocampus within a single memory task. A further question is how these representations are integrated to support episodic-like temporal order memory. According to the cognitive map theory, the spatial representational formats of the hippocampal formation can support flexible cognition and behavior, including episodic memory 52,53 . Accordingly, it has been found that the same neurons represent information about both space and time 54 . This suggests that the two dimensions might be integrated into a common coding scheme of spatiotemporal proximity in the hippocampus, supporting the formation of hierarchical structures in a memory space 55,56,57 . Human fMRI studies found that the spatial and temporal aspects of autobiographical experiences are coded within the hippocampus across various scales of magnitude, up to one month in time and 30 km in space 13 . After learning spatio-temporal trajectories in a large-scale virtual city, subject-specific neural similarity in the hippocampus scales with the remembered proximity of events in space and time 9 . The joint coding of space and temporal context found in CA1 in the current study further suggests that spatial information can be combined with temporal contexts in order to support temporal order memory.
Beyond the representation of space and time, our study further suggests that the learnt structured event sequence may serve as a scaffold for the organization of temporal memory. In addition, a previous study showed that spatial and episodic information might be separately represented in different CA1 neurons, suggesting that in addition to place cells, other hippocampal neurons are involved in episodic context coding 58 . Finally, although the MOL might not be commonly used, it represents just one particularly prominent case of learnt structured knowledge, i.e., schemas, which have been consistently shown to be combined with learned material to facilitate episodic memory 59,60 . This could be achieved via interactions between the hippocampus and ventromedial prefrontal cortex 61,62 . Interestingly, we also found significant semantic representations of the studied words in the vmPFC during encoding. All these rich representations might be integrated with ongoing hippocampal oscillations to form an integrated sequence of neuronal activities that shapes our temporal order memory 7 . Future studies should uncover the intricate interaction of externally driven and internally organized sequences (that are influenced by long-term learning) in shaping our experiences and memories. In particular, researchers could try to translate our paradigm into rodent or primate studies to uncover the neuronal mechanisms.
To summarize, using MOL to strongly engage the hippocampus, the current study uncovered rich and multiplexed spatial, temporal and structured sequence representations in the hippocampus. Through spatial and sequential pattern separation, the hippocampus generates robust and distinctive episodic contexts that are linked to the learnt material. This provides a novel neural mechanism by which humans can achieve precise temporal order memory.

Participants
Twenty-nine college students (10 males; ages 18-24 years, mean age = 20.3 years) participated in this experiment. All of them were healthy, had normal or corrected-to-normal vision, and had no history of psychiatric or neurological diseases. None of them had experience or practice with the method of loci (MOL) before the experiment. Two additional participants were also recruited but removed from the final analysis (one did not finish the scan session and one had extremely low memory performance, with less than 2% remembered trials). Written consent was obtained from each participant after a full explanation of the study procedure. The study was approved by the Institutional Review Boards at Beijing Normal University and the Center for MRI Research at Peking University.

Materials
The stimuli consisted of 270 two-character Chinese words that were randomly divided into 9 lists of 30 words each. Sixty words were used in the baseline test (2 lists), 150 in the practice sessions (5 lists), and 60 in the scanning session (2 lists). All words describe common objects (e.g., broom), animals (e.g., crow), fruits (e.g., lemon), or vegetables (e.g., onion). The same word lists were used in each session, and the order of the words was randomized across subjects.

Experimental procedure
Subjects completed five experimental sessions across five consecutive days (Fig. 1A). On Day 1, they finished two rounds of the temporal order task without using the MOL, which served as the baseline. Immediately after the task, they were trained with the MOL mnemonic by watching a two-hour video developed by our lab (see below). On Day 2, they were asked to review the method and to apply it in the subsequent temporal order task. Only one round of the temporal order memory task was administered (Practice 1). On Days 3 and 4, they finished two rounds of the temporal order task each day (i.e., Practice 2 and Practice 3) to further improve their MOL skills. On Day 5, they completed two rounds of the temporal order task in the scanner.

Method of loci
The MOL uses visualizations of familiar spatial information to scaffold the memorization of a sequence of other information, such as words or objects. In short, it is carried out by mentally placing each to-be-remembered item into one specific familiar location, creating vivid mental associations between physical locations and the items. In the present experiment, the same locations (i.e., a world map with 10 landmarks) were introduced for all subjects so that they followed an identical memory route. During the two-hour video practice session, they were asked to remember the order of the 10 locations and to learn to vividly associate the to-be-remembered words with the locations, e.g., "the Statue of Liberty-apple". They were free to play back the videos and ask questions about the strategies. We also asked the subjects to explain the method in their own words to make sure they correctly understood it. Before each practice session and the scanning session, subjects were asked to rehearse the main points of the MOL and to write down the names of the 10 locations in the correct order. All subjects performed three practice sessions and reported that they could properly use the strategy before scanning.
Temporal order task

Each temporal order task consisted of an encoding task and a retrieval task. The same slow event-related design (14 s per trial) was used for the baseline test, the practice sessions, and the scanning session, and for both encoding and retrieval tasks. During encoding, a fixation cross was presented for 500 ms, followed by the two-character Chinese word for 2 s. Another fixation was presented for 4 s, during which subjects associated the word with the correct location along the path. After that, they performed a perceptual judgment task for 7.5 s. A Gabor image tilted 45° to the left or the right was presented on the screen, and subjects were asked to identify the orientation of the Gabor by pressing one of two buttons as quickly and accurately as possible. The next Gabor started 100 ms after subjects responded. Subjects learnt 30 words in each encoding run, which lasted 7 min.
Subjects performed a 6-min working memory task between encoding and retrieval. During the baseline test and the fMRI scan, a spatial working memory task was used. Each trial started with a 2 s encoding phase, during which 3 to 6 dots were sequentially displayed for 250 ms each at different positions of an (invisible) 5 × 5 checkerboard. After an 8 s delay, a probe was presented and subjects were asked to judge whether a dot had been presented at this location during encoding. A different working memory task (a 1-back task on one-digit numbers) was used in the three practice sessions so that subjects would not be over-trained on the spatial working memory task (which might confound the comparison of behavioral performance between baseline and fMRI scan).
During retrieval, each trial started with a fixation cross shown for 500 ms, followed by presentation of one of the studied words for 2 s. A scaled timeline was then presented on the screen for 4 s, and subjects were asked to indicate the exact temporal position of the word in the list (i.e., 1 to 30) by moving the red slider on the timeline with the "left" or "right" button. The slider kept moving as long as the subjects held the button down. The initial position of the red slider was randomly selected, but was never the correct position for that word. The scale and slider stayed on screen for the whole 4 s period, and the last position of the slider was taken as the response. Subjects could press a "don't know" button if they could not remember the word or its temporal order. Trials without a response were also categorized as "don't know". After the temporal order judgment, subjects performed the same perceptual judgment task as in the encoding task for 7.5 s. Each retrieval run also lasted 7 min. On days 1, 3, 4 and 5, subjects performed two rounds of encoding and retrieval tasks with different sets of 30 words, which were also separated by a working memory task. On day 2, they performed one round. The whole temporal order task lasted 46 min on days 1, 3, 4, and 5, and 20 min on day 2.

MRI acquisition
Imaging data were acquired on a 3.0 T Siemens Prisma MRI scanner with a 64-channel head-neck coil at the MRI Center at Peking University. High-resolution functional images were acquired using a simultaneous multi-slice EPI sequence (TR/TE/θ = 2000 ms/30 ms/90°; FOV = 198 mm × 198 mm; matrix = 124 × 124; in-plane resolution = 1.6 × 1.6 mm; slice thickness = 1.6 mm; GRAPPA factor = 2; multi-band acceleration factor = 3). Ninety contiguous axial slices parallel to the AC-PC line were obtained to cover the whole cerebrum and partial cerebellum. A high-resolution structural image was acquired for the whole brain using a 3D, T1-weighted MPRAGE sequence (TR/TE/θ = 2530 ms/2.98 ms/7°; FOV = 256 mm × 256 mm; matrix = 256 × 256; slice thickness = 1 mm; GRAPPA factor = 2). A high-resolution T2-weighted image was also acquired using a T2-SPACE sequence for hippocampus segmentation. The image plane was perpendicular to the long axis of the hippocampus and covered the whole MTL region (TR/TE/θ = 13150 ms/82 ms/150°; FOV = 220 mm × 220 mm; matrix = 512 × 512; slice thickness = 1.5 mm; 60 slices).

Image preprocessing

MRI data were first converted to Brain Imaging Data Structure (BIDS) format 63 . The first 10 volumes before the task were automatically discarded by the scanner to allow for T1 stabilization. Image preprocessing was performed using FMRIPrep v1.4.0 64 . Each T1 volume was corrected for intensity using N4BiasFieldCorrection 65 and skull-stripped using antsBrainExtraction.sh (OASIS template). Cortical surfaces were reconstructed using FreeSurfer v6.0.1 66 . The T1 volume was then normalized to the ICBM 152 Nonlinear Asymmetrical template (version 2009c) through nonlinear registration with ANTs v2.1.0 67 . Functional data were slice-time corrected using AFNI v16.2.07 68 , motion-corrected using FSL's MCFLIRT 69 , and registered to the T1 image using a boundary-based registration with nine degrees of freedom 70 .
For univariate analysis, data were spatially smoothed with a 6 mm full-width-at-half-maximum Gaussian kernel using FSL's SUSAN, filtered in the temporal domain using a nonlinear high-pass filter with a 100 s cutoff, and normalized to MNI template space. For ROI analysis and representational similarity analysis, data were aligned to each subject's T1 image and kept at native resolution. Slight spatial smoothing was applied using a 1.6 mm full-width-at-half-maximum Gaussian kernel, and data were filtered in the temporal domain using a nonlinear high-pass filter with a 100 s cutoff, in order to obtain both a high signal-to-noise ratio and anatomical specificity.

Hippocampal subfield segmentation
The hippocampus and surrounding medial temporal lobe areas were segmented into CA1, CA2, CA3, DG, EC and PHC using the automatic segmentation of hippocampal subfields (ASHS) toolbox with the UPenn atlas 25 . Anatomical masks segmented by ASHS were coregistered to the functional image for further analyses. CA2, CA3 and DG were combined (i.e., CA23DG) because they could not be unambiguously distinguished. To further examine the functional specificity of EC subregions, we segmented the EC into anterior-lateral EC (alEC) and posterior-medial EC (pmEC), using the masks from a previous publication 26 . The masks were resampled and coregistered to each subject's functional space. They were then intersected with the EC masks generated by ASHS to improve segmentation precision. Finally, our ROIs comprised CA1, CA23DG, alEC, pmEC and PHC.

General linear model (GLM) analysis

For both the encoding and retrieval tasks, two types of trials were modeled: remembered and forgotten trials. We excluded trials with a temporal error of 1 position, as it was unclear whether these errors were due to inaccurate memory or response errors. The "unsure" trials and orientation trials from both encoding and retrieval were modeled as regressors of no interest. The above regressors were convolved with a double-gamma hemodynamic response function. Six movement parameters and the frame-wise displacement (FD) were modeled as confound regressors. Additional censor regressors were included for each volume with an FD greater than 0.3 mm. Each run was modeled separately in the first-level analysis.
Cross-run averages for each contrast image were created for each subject using a fixed-effects model. These contrast images were then used for group analyses with a random-effects model. Group images were thresholded using cluster detection statistics, with a height threshold of z > 2.3 and a cluster probability of P < 0.05, corrected for whole-brain multiple comparisons using Gaussian Random Field Theory.

Region-of-Interest (ROI) analysis
We conducted percent signal change analyses within the predefined ROIs in each subject's native space. Parameter estimates (β values) of remembered and forgotten trials from the GLM were each averaged across all voxels in a given ROI for each subject. Percent signal changes were calculated using the following formula: (β / mean) × ppheight × 100%, where ppheight is the peak height of the hemodynamic response versus the baseline level of activity 71 .
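As a minimal numerical sketch of this conversion (not the exact toolbox implementation), assuming the ROI-averaged β, the run-mean signal, and the regressor peak height have already been extracted from the fitted model:

```python
def percent_signal_change(beta, run_mean, ppheight):
    """Convert a GLM parameter estimate to percent signal change.

    beta:      ROI-averaged parameter estimate for a condition
    run_mean:  mean signal intensity of the run (baseline level)
    ppheight:  peak height of the hemodynamic response regressor
    """
    return beta / run_mean * ppheight * 100.0


# hypothetical values for illustration
psc = percent_signal_change(beta=2.0, run_mean=1000.0, ppheight=0.2)
```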

Single-trial response estimations
Single-trial response estimations were done using the least-square separate method for each functional run 72 . Each trial was estimated in a separate GLM, in which the given trial was modeled as a separate regressor whereas all the remaining trials were modeled as another regressor. The six movement parameters and FD were included in each GLM as confound regressors. Additional censor regressors were included for each volume with a FD greater than 0.3 mm. This resulted in a single β image for each trial, which was used for the representational similarity analysis.
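The least-square separate logic can be sketched as follows, assuming the HRF-convolved trial regressors have already been built (the actual analysis additionally included motion, FD, and censor regressors per run; the function name and arguments are illustrative):

```python
import numpy as np

def lss_betas(Y, trial_regressors, confounds):
    """Least-square separate (LSS) single-trial estimation.

    Y:                (T, V) BOLD time series (time x voxels)
    trial_regressors: (T, n_trials) HRF-convolved regressors, one per trial
    confounds:        (T, C) nuisance regressors (may be empty)

    For each trial, fit a separate GLM with two task regressors: the
    trial of interest and the sum of all remaining trials; return one
    beta pattern per trial.
    """
    T, n_trials = trial_regressors.shape
    betas = []
    for i in range(n_trials):
        others = trial_regressors.sum(axis=1) - trial_regressors[:, i]
        X = np.column_stack([trial_regressors[:, i], others,
                             confounds, np.ones(T)])
        b, *_ = np.linalg.lstsq(X, Y, rcond=None)
        betas.append(b[0])          # beta of the trial of interest
    return np.array(betas)          # (n_trials, V)
```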
Representational similarity analysis (RSA)

RSA 73 was used to determine the neural pattern similarity between trial pairs from different conditions. The neural pattern similarities could be calculated between encoding-encoding, retrieval-retrieval, or encoding-retrieval tasks, using images from the same or different runs. The main analysis focused on examining similarities within predefined ROIs (CA1, CA23DG, alEC, pmEC and PHC). Each trial's β values from the single-trial response estimations within the ROI were extracted. The Fisher Z-transformed Pearson's correlations across trials were used as the index of neural pattern similarity. These similarities were then grouped and averaged based on the spatial location, sequence distance, and temporal distance for further statistical analysis. We used all trials for the encoding task but only included remembered trials for the retrieval task to make sure the correct context representations were reinstated. All trials during retrieval were used when correlating the representations with behavioral performance. To test group-level significance, paired t-tests or repeated measures ANOVAs were used (see below).
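A minimal sketch of the similarity index (Fisher z-transformed Pearson correlations between single-trial β patterns), assuming the ROI patterns have already been extracted into trial-by-voxel arrays:

```python
import numpy as np

def pattern_similarity(patterns_a, patterns_b):
    """Fisher z-transformed Pearson correlations between trial patterns.

    patterns_a, patterns_b: (n_trials, n_voxels) single-trial beta
    patterns from one ROI (e.g., from two runs, or encoding/retrieval).
    Returns an (n_a, n_b) similarity matrix.
    """
    za = patterns_a - patterns_a.mean(axis=1, keepdims=True)
    za /= za.std(axis=1, keepdims=True)
    zb = patterns_b - patterns_b.mean(axis=1, keepdims=True)
    zb /= zb.std(axis=1, keepdims=True)
    r = za @ zb.T / patterns_a.shape[1]   # pairwise Pearson r
    r = np.clip(r, -0.999999, 0.999999)   # guard against r = +/-1
    return np.arctanh(r)                  # Fisher z transform
```

The resulting matrix entries would then be grouped and averaged by condition (same location, near/far distance, etc.) before the group-level tests.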
In order to examine effects beyond our predefined ROIs, a whole-brain searchlight analysis was conducted 74 . Each searchlight was defined as a spherical cluster in the subject's native space with a radius of 3 voxels (4.8 mm, 123 voxels in total) surrounding a target voxel. The neural pattern similarity difference of interest was calculated within the sphere and assigned to the center voxel. The contrast maps of all subjects were transformed into MNI space and entered into a group analysis using non-parametric permutation for inference on the statistical map. Non-parametric permutations were conducted with Randomise in FSL with 5,000 permutations. The significance of the derived statistical map was determined by the threshold-free cluster enhancement (TFCE) algorithm with P < 0.05 (whole-brain FWE corrected) 75 .

Representational connectivity analysis
To examine representational connectivity, we generated the cross-run representational similarity matrix (RSM) 73 (30 × 30) for each ROI and each processing stage (encoding and retrieval) (Fig. S7). We then correlated the RSMs from different regions within each processing stage (e.g., CA1 and PHC) or from different processing stages within the same region (e.g., CA1 encoding and CA1 retrieval) 76 . To test for group-level significance, non-parametric permutations were conducted by shuffling one of the representational similarity patterns in a given RSM pair, e.g., PHC and CA1. Permutations were conducted 5,000 times and averaged to obtain the baseline correlation coefficient under the null hypothesis for each ROI pair and each subject. The p-value was obtained by paired t-tests between the original connectivity value and the baseline across subjects.
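A sketch of this permutation baseline, under the assumption that the connectivity is computed on the vectorized off-diagonal upper triangle of each RSM (the exact vectorization used is not stated in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def upper_tri(rsm):
    """Vectorize the off-diagonal upper triangle of a square RSM."""
    i, j = np.triu_indices(rsm.shape[0], k=1)
    return rsm[i, j]

def connectivity_with_baseline(rsm_a, rsm_b, n_perm=5000):
    """Correlate two RSMs; estimate a null baseline by shuffling one.

    Returns the observed correlation and the mean correlation across
    n_perm shuffles of one similarity pattern (the null baseline).
    """
    va, vb = upper_tri(rsm_a), upper_tri(rsm_b)
    observed = np.corrcoef(va, vb)[0, 1]
    null = np.empty(n_perm)
    for k in range(n_perm):
        null[k] = np.corrcoef(rng.permutation(va), vb)[0, 1]
    return observed, null.mean()
```

Per subject and ROI pair, the observed value and the baseline would then enter the paired t-test across subjects.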

The representation of semantic information in the brain
To control for the representation of word semantics, we conducted latent semantic analysis to generate a semantic similarity matrix of the words in the scanning session. For each word, its semantic features (i.e., word vectors) were extracted from a well-trained Chinese word embedding model, Directional Skip-Gram 27 . The pairwise similarities between word vectors formed a 60 × 60 semantic similarity matrix, which was correlated with the neural similarity matrix. We used a permutation test to assess group-level significance. During each permutation, we shuffled the order of the words and generated a new 60 × 60 semantic similarity matrix, which was then correlated with the neural similarity matrix. The permutation was done 5,000 times for each region and each subject, and the values across the 5,000 permutations were averaged to form a baseline. A paired t-test was then conducted for each region to determine a group-level p-value. Results are shown in the supplemental materials (Fig. S9).
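The word-order shuffle described above can be sketched as follows, assuming both matrices are square and word-by-word, and that the correlation is computed on the off-diagonal upper triangle (an assumption, as the text does not specify the vectorization):

```python
import numpy as np

def semantic_permutation_baseline(sem_rsm, neural_rsm, n_perm=5000,
                                 rng=None):
    """Null baseline for the semantic-neural RSM correlation.

    On each permutation the word order is shuffled (rows and columns of
    the semantic RSM jointly), and the upper triangles of the two RSMs
    are correlated; the mean over permutations is the baseline.
    """
    rng = rng or np.random.default_rng(0)
    n = sem_rsm.shape[0]
    iu = np.triu_indices(n, k=1)
    observed = np.corrcoef(sem_rsm[iu], neural_rsm[iu])[0, 1]
    null = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(n)
        shuffled = sem_rsm[np.ix_(p, p)]   # reorder rows and columns together
        null[k] = np.corrcoef(shuffled[iu], neural_rsm[iu])[0, 1]
    return observed, null.mean()
```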

Statistical analysis
All paired t-tests and repeated measures ANOVAs in our analysis were two-tailed and conducted in R 3.6.1 using the afex package, with type III sums of squares and Greenhouse-Geisser correction of the degrees of freedom where necessary. Error bars in figures denote within-subject errors that account for heterogeneity of variance. FDR correction was performed to correct for multiple comparisons across the multiple ROIs.
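For illustration, the Benjamini-Hochberg FDR step (applied here across ROIs) can be written as a short function; the actual analyses were run in R, so this Python sketch only mirrors the standard procedure:

```python
import numpy as np

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg FDR: boolean mask of rejected hypotheses.

    Sort p-values, compare the i-th smallest against q * i / m, and
    reject all hypotheses up to the largest rank that passes.
    """
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    thresholds = q * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = below.nonzero()[0].max()      # largest passing rank
        reject[order[:k + 1]] = True
    return reject
```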

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Code availability

Figure 1

The MOL and experimental design. (A) Subjects completed five sessions on five consecutive days. On Day 1, they conducted a baseline test (two rounds of the temporal order memory task) and studied the MOL video for two hours. On Day 2, they reviewed the MOL method and practiced one round of the temporal order memory task. On Days 3 and 4 and during the fMRI scanning, they finished two rounds of the temporal order memory task. (B) Subjects were trained to remember a sequence of locations and then asked to make associations between these locations and the to-be-learnt words. The same locations and route were used for all subjects. (C) During encoding, subjects studied a list of 30 words by mentally associating them with 10 well-trained locations, which were not presented to the subjects. Each word was presented for 2 s, followed by a 4 s fixation cross; subjects had 6 s to encode the word. During retrieval, a studied word was presented for 2 s and subjects were asked to indicate its exact temporal position in the list, by moving the red slider on the timeline with the "left" or "right" button within 4 s. Subjects could press the "don't know" button if they could not remember the word or its list position. A slow event-related design (14 s/trial) was used for the encoding and retrieval phases. To prevent subjects from further processing the word, subjects performed a 7.5-s perceptual orientation judgment task after encoding or retrieving each word. The encoding-retrieval cycle was conducted twice, with 30 different words in each run.

ROI segmentation and univariate effects of temporal memory. (A) Segmentation of the hippocampus and adjacent medial temporal lobe areas. The top two panels show an example segmentation from one subject, overlaid onto the subject's T2 image (coronal plane). The hippocampus and surrounding medial temporal lobe areas were segmented into CA1, CA2, CA3, DG, EC and PHC using ASHS 25.
CA2, CA3 and DG were combined because they could not be unambiguously distinguished. The bottom panel shows an example of the EC segmentation, overlaid onto the subject's T1 image (sagittal plane). The EC was segmented into anterior-lateral EC (alEC) and posterior-medial EC (pmEC), based on published masks 26. The masks were resampled and coregistered to each subject's native space, and were further intersected with the EC mask generated by ASHS to improve precision. (B) Subsequent memory effects (remembered > forgotten). (C) Retrieval success effects (remembered > forgotten). Each dot represents one subject and the bars represent group means. Error bars indicate averaged within-subject errors. * P < 0.05 uncorrected. ** P < 0.01 uncorrected.

Figure 4
Hippocampal representations of structured location sequences during encoding and retrieval. (A) Schematic depiction of word pairs depending on the distance of the associated MOL locations. Words in a pair are always from different runs. Same: same-location pairs; Near: near-distance pairs (ordinal distance = 1); Middle: middle-distance pairs (2 ≤ ordinal distance ≤ 3); Far: far-distance pairs (ordinal distance > 3). (B) Spatial pattern separation in PHC: higher similarity for same-location than near-distance pairs during encoding, but the reversed pattern during retrieval. (C) Spatial pattern separation in CA1, but not CA23DG. Each line represents one subject and the bars represent the group means. (D) Sequence pattern separation in CA23DG, but not CA1. The CA23DG region showed higher pattern similarity for far-distance pairs than near- and middle-distance pairs. Each dot represents one subject and the bars represent the group means. Error bars indicate averaged within-subject errors. * P < 0.05. ** P < 0.01.

Figure 5

Temporal context reinstatement during retrieval. (A) Schematic depiction of temporal distance during encoding. All pairs of trials with a given temporal distance during retrieval (TDr, ranging from 1 to 20) were grouped, based on their temporal distance during encoding (TDe), into short-distance (TDe ≤ 3) or long-distance (4 ≤ TDe ≤ 6) pairs. The pattern similarity was first averaged within each temporal distance during retrieval, and then averaged across all temporal distances. (B) Pattern similarity for short and long TDe pairs during retrieval. The CA1 area showed greater pattern similarity for short-distance pairs than long-distance pairs. Each line represents one subject and the bars represent the group means. Error bars indicate averaged within-subject errors. * P < 0.05.

Supplementary Files
This is a list of supplementary files associated with this preprint.