Population dynamics underlying associative learning in the dorsal and ventral hippocampus

Jeremy Biane, UCSD
Max Ladow, University of California San Francisco
Fabio Stefanini, Columbia University
Sayi Boddu, University of California San Francisco
Austin Fan, University of California San Francisco
Shazreh Hassan, University of California San Francisco
Naz Dundar, University of California San Francisco
Daniel Apodaca-Montano, University of California San Francisco
Nick Woods, UCSF
Mazen Kheirbek (mazen.kheirbek@ucsf.edu), University of California San Francisco, https://orcid.org/0000-0001-9157-7363


INTRODUCTION
For a child, an unexpected encounter with an ice cream truck can be a highly rewarding experience. To better predict the circumstances that led to this occurrence, the brain gathers information surrounding the incident, from broad cues associated with the availability of reward (the presence of music, the neighborhood in which the encounter occurred), to more detailed stimulus representations (the specific melody played, the precise location of the encounter), to the positive outcome from the experience of consuming ice cream. Following repeated encounters, the most predictive features are identified and used to inform behavior, such as running outside when a particular melody is heard.

This example illustrates a fundamental objective of the brain: to extract the underlying structure of the world and model its causal relationships. Moreover, the brain must be able to flexibly update these models as cue-outcome relationships change (e.g., when the melody is replaced, or the truck no longer carries your favorite ice cream). While the importance of examining the population dynamics underlying cognitive processes is becoming increasingly appreciated (

While dHPC and vHPC may encode unique features of an explored environment, it remains unknown how these areas may be differentially engaged during the encoding of associative memories. Distinct encoding properties in dHPC and vHPC could not only enrich the scope of internal models, but also facilitate learning (Collin et al., 2015; Harland et al., 2017). For instance, during goal-oriented navigation, models incorporating both small- and large-scale place fields (characteristic of dHPC and vHPC, respectively) lead to faster learning compared to models only incorporating a single scale (Llofriu et al., 2015).
During learning, detailed representations by dHPC may support the formation of associative memories based on local cues, such as the precise identity of an object in an environment, while broad vHPC representations may generalize knowledge across multiple experiences and/or attach significance to contexts in which associations occur.
Here, we used 2-photon in vivo imaging of population activity in dCA1 or vCA1 to track the activity of the same neurons across multiple stages of learning as mice learned to associate odor stimuli with appetitive or aversive outcomes. This allowed us to examine how task-related information is differentially represented across the dorsoventral hippocampal axis and how these representations evolve with learning. Further, we examined the stability of representations across training, the influence of different outcomes on these encoding properties, and how neural representations adapt when cue-outcome relationships are altered.

RESULTS

Representations of odor identity across the dorsoventral axis of CA1
We imaged odor-evoked neural responses in dCA1 and vCA1 using high-resolution 2-photon microscopy in mice expressing the calcium indicator GCaMP6f (Fig. 1A-C). In dCA1, we found that odors elicited a robust response in a subset of neurons and that odor-evoked population responses could be discriminated from inter-trial interval (ITI; baseline) activity with high accuracy using linear decoders (Fig. 1E-H and Extended Data Fig. 1E). Moreover, odor identity could be accurately decoded from dCA1 population activity (Fig. 1I,J). Surprisingly, however, odors did not elicit robust responses in vCA1 neurons, and linear decoders performed significantly worse compared to dCA1 when discriminating odor activity from baseline activity, or when reading out odor identity (Fig. 1E-J and Extended Data Fig. 1D). This suggests that, during initial exposure to odorants, odor identities are reliably represented in dCA1 but not in vCA1.
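To make the decoding logic concrete, the following is a minimal sketch of discriminating odor-period from baseline population vectors with a linear decoder and leave-one-trial-out cross-validation. It is an illustration under stated assumptions, not the analysis pipeline used in this study: the decoder is a nearest-centroid classifier standing in for whichever linear decoder was used, the data are synthetic Gaussian activity, and all function names (`centroid`, `nearest`, `loo_accuracy`, `simulate`) are hypothetical.

```python
import random

def centroid(rows):
    """Mean population vector across trials."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest(cents, x):
    """Label of the closest class centroid (a linear decision rule)."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(cents[lab], x)))

def loo_accuracy(X, y):
    """Leave-one-trial-out decoding accuracy."""
    hits = 0
    for i in range(len(X)):
        cents = {}
        for lab in sorted(set(y)):
            # Fit each class centroid without the held-out trial i.
            rows = [x for j, (x, l) in enumerate(zip(X, y)) if l == lab and j != i]
            cents[lab] = centroid(rows)
        hits += nearest(cents, X[i]) == y[i]
    return hits / len(X)

def simulate(rng, evoked, n_trials=30, n_cells=40):
    """Synthetic trials (assumption): odor adds `evoked` to half of the cells."""
    X, y = [], []
    for lab in ("baseline", "odor"):
        for _ in range(n_trials):
            shift = evoked if lab == "odor" else 0.0
            X.append([rng.gauss(shift if c < n_cells // 2 else 0.0, 1.0)
                      for c in range(n_cells)])
            y.append(lab)
    return X, y

rng = random.Random(0)
responsive = loo_accuracy(*simulate(rng, evoked=1.0))    # odor-responsive population
unresponsive = loo_accuracy(*simulate(rng, evoked=0.0))  # no odor-evoked response
print(f"responsive: {responsive:.2f}, unresponsive: {unresponsive:.2f}")
```

In this toy setting, a population with an odor-evoked response is decoded well above the level obtained from a population with no evoked response, which is the qualitative contrast described for dCA1 and vCA1 during initial odor exposure.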

Attaching behavioral significance to odors enhances their representations in vCA1
Given the role of the ventral hippocampus in emotional and motivational processes (Tannenholz et al., 2014), we reasoned that odor representations may be enhanced in vCA1 if paired with a salient outcome. To disambiguate odor representations from potential reward anticipation signals, we used a two-odor trace appetitive conditioning paradigm wherein the CS+ odor was separated from sucrose reward delivery by a 2s trace period. Following ~4 days of training, mice displayed anticipatory licks during the CS+ trace period, with minimal licking during all other task periods (Fig. 2A-D).

In vCA1, learning of the odor-reward association was accompanied by an overall increase in mean activity during the CS+ odor presentation (Fig. 2E and Extended Data Fig. 2B,C) and a heightened ability to decode CS+ odor period activity from baseline activity (Fig. 2H and Extended Data Fig. 2D), suggesting that assigning value to a stimulus leads to increased odor-evoked activity and encoding in ventral CA1. In addition, odor classification accuracy greatly increased in Late sessions after learning when compared with Pre and Early sessions (Fig. 2I,J and Extended Data Fig. 2E), indicating that CS+/CS- representations become more distinct in vCA1 with learning. This was in contrast to dCA1, where odor decoding accuracy was high prior to training and remained so with learning (Fig. 2G-J and Extended Data Fig. 2D,E).

As odor identities were reliably represented in dCA1 before training, we asked whether representations remained stable over the course of training. For this, we applied a cross-session classifier to neurons tracked across sessions (Fig. 2K), training with data from the Early session and testing classification accuracy using data from the Late session (and vice-versa). As expected from our within-session results, in vCA1 odor representations changed with learning.
Surprisingly, however, we found that odor representations in dCA1 also differed across training sessions, as cross-session decoding of CS+/CS- during the odor period was significantly worse relative to within-session decoding (Fig. 2L). Therefore, although CS+/CS- representations are highly separable both before and after training, the representational geometry of CS+/CS- odors in dCA1 is altered with learning.

We next examined representations in the trace period, during which animals that have learned the task anticipate reward availability. We found parallel changes in trace-period representations in dCA1 and vCA1 with learning (Fig. 2E-J and Extended Data Fig. 2B-E). Mean trace-period-evoked activity in both dCA1 and vCA1 markedly increased following CS+ delivery, but not CS- delivery. Correspondingly, CS+/CS- trial-type decoding accuracy during the trace period was significantly higher in both regions following learning. These trace-period representations emerged in concert with the initial signs of behavioral learning (Extended Data Fig. 2F,G), could not be explained solely by licking behavior (Extended Data Fig. 2H), and were distinct from odor-period representations (Extended Data Fig. 2I).

Together, these data suggest that a representation of odor identity is present in dCA1, independent of learning, while odor representations in vCA1 show greater dependence on learned behavioral significance. In addition, with learning both vCA1 and dCA1 are recruited during the trace period prior to reward delivery, seemingly encoding information related to the expectation of reward.

Learned representations of task elements are modality-independent and learning-dependent
We next determined whether our results in CA1 generalized to other stimulus modalities, and whether neuronal changes that emerged with training were indeed learning dependent. For this, we trained a separate cohort of mice on a more difficult auditory cue discrimination task (Fig. 3A,B), where an auditory cue (CS+) and a sucrose reward were separated by a 2s trace period, while a distinct CS- auditory stimulus was unrewarded. Reward delivery was contingent on licking during a 2-second reward availability window directly following the trace period. Unlike the odor-based task, which all animals quickly learned, mice took longer to learn this task (11.7 ± 1.9 days) with some failing to learn altogether (Fig. 3C and Extended Data Fig. 3E).

As with the olfactory-based task, CS+ activity during tone presentation was more accurately classified from baseline activity in vCA1 after learning (Fig. 3D,E and Extended Data Fig. 3C), and classification of CS+/CS- tone identities likewise improved with learning (Fig. 3F,G). This suggests that CS+ and CS- representations become more distinct in vCA1 over the course of discrimination training, regardless of the CS modality. In line with the odor-based task, we also found that decoding of CS+/CS- during the trace period was improved with learning in both regions (Fig. 3D-G).

Interestingly, although the CS+ and CS- tones could each be decoded from baseline activity with moderate accuracy in dCA1 (Fig. 3D,E), tone identity could not be decoded accurately from dCA1 population activity either before or after learning (both ~50% accuracy; Fig. 3F,G). This contrasts with odorant identities, which could be consistently decoded with high accuracy, and is likely due to the greater ethological salience of odors vs tones.
Despite this lack of strong encoding of tones in dCA1, these results are consistent with our odor task in that learning enhances the separability of cues in vCA1, but not dCA1.

A subset of vCA1 mice (n=3) failed to learn the tone-based version of the task, even after 20 days of training. In these "nonlearners" we found that classification accuracy of CS+/CS- trial type during the tone or trace periods did not change between Early and Late sessions (Extended Data Fig. 3F,G). Thus, representations that emerge with learning are not simply driven by repeated exposure to task stimuli.

Learned odor representations in vCA1 are sensitive to extinction but can be rapidly reinstated
Collectively, these results suggest that imbuing a stimulus with value enhances its representation in vCA1. How might stimulus representations change upon extinction of the odor-reward contingency? Would vCA1 continue to exhibit strong representations of odor, perhaps reflecting an enduring memory of the CS-US association, or would decoding performance fall back to baseline levels, suggesting vCA1 signals track current stimulus value?

After mice learned the odor-reward association, we extinguished it, omitting reward from CS+ trials. Mice rapidly extinguished conditioned responding early in the first session of extinction (Fig. 4A-C). In an extinction retrieval session 24 hours later, we found that odor classification accuracy resembled that of early, pre-learning sessions; that is, low in vCA1 but high in dCA1 (Fig. 4D,E and Extended Data Fig. 4A,B).

The following day we reinstated conditioned responses in a reacquisition session. Animals rapidly resumed anticipatory licking behavior during CS+ trials, indicating an intact memory of the rewarded task structure. Correspondingly, odor identity classification accuracy increased in vCA1 (Fig. 4D,E and Extended Data Fig. 4A,B).
These data indicate that the discriminability of odor representations in vCA1, but not dCA1, is sensitive to the current value associated with an odor.

To investigate whether representations were stable before and after the extinction sessions, we applied a cross-session classifier to data from cells tracked across Late and Reacquisition sessions. We found that odor- and trace-period representations remained relatively stable across extinction in both vCA1 and dCA1, as trial type could be accurately classified across sessions during both periods (Fig. 4F). This contrasts with the instability of odor representations observed during initial learning (Fig. 2L, 4G and Extended Data Fig. 4D,E) and indicates that, once learned, representations of odor and reward anticipation are to a large extent stable across days and across the degradation and reinstatement of odor-reward contingencies, suggesting that CA1 may be a storage site for these representations.

We likewise examined whether vCA1 contained more diffuse representations than dCA1 in our task as the mouse "moved" through the trial. We trained a linear classifier to discriminate trial type (CS+ vs CS-) using data from a single time bin, then tested classification accuracy on every other time bin (Fig. 5A). We found that in vCA1, but not dCA1, there was a persistent trial-type representation throughout the trial duration, and trial type could be decoded even when training and testing on time bins separated by ±5 s (Fig. 5B,C). Importantly, this was only observed for time bins within the trial duration (1s post odor onset through 4s post reward delivery), and most prominently for sessions where the CS-US contingency had been learned and was actively being rewarded.
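The cross-time-bin analysis can be sketched as follows: fit a decoder on one time bin, score it on every bin, and inspect the off-diagonal entries of the resulting accuracy matrix. This is a minimal sketch under stated assumptions. The nearest-centroid decoder is a stand-in for the linear classifier, the data are synthetic, and the "persistent" and "sequential" populations are hypothetical constructions chosen only to show how temporally broad versus rapidly changing activity patterns would appear in such a matrix.

```python
import random

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest(cents, x):
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(cents[lab], x)))

def simulate(rng, persistent, n_trials, n_bins=4, block=10):
    """Trials as (label, [one population vector per time bin]).
    On CS+ trials one block of cells is driven in each bin: the same block in
    every bin if `persistent`, otherwise a different block per bin."""
    trials = []
    for k in range(n_trials):
        label = "CS+" if k % 2 == 0 else "CS-"
        bins = []
        for b in range(n_bins):
            driven = 0 if persistent else b
            bins.append([
                rng.gauss(2.0 if label == "CS+" and c // block == driven else 0.0, 1.0)
                for c in range(n_bins * block)
            ])
        trials.append((label, bins))
    return trials

def cross_bin_accuracy(train, test, n_bins=4):
    """acc[i][j]: decoder fit on bin i of training trials, scored on bin j of held-out trials."""
    acc = []
    for i in range(n_bins):
        cents = {lab: centroid([bins[i] for l, bins in train if l == lab])
                 for lab in ("CS+", "CS-")}
        acc.append([
            sum(nearest(cents, bins[j]) == lab for lab, bins in test) / len(test)
            for j in range(n_bins)
        ])
    return acc

def off_diagonal_mean(acc):
    """Mean accuracy when training and testing bins differ."""
    vals = [a for i, row in enumerate(acc) for j, a in enumerate(row) if i != j]
    return sum(vals) / len(vals)

rng = random.Random(1)
persistent_acc = cross_bin_accuracy(simulate(rng, True, 40), simulate(rng, True, 20))
sequential_acc = cross_bin_accuracy(simulate(rng, False, 40), simulate(rng, False, 20))
print(off_diagonal_mean(persistent_acc), off_diagonal_mean(sequential_acc))
```

In this toy example the persistent population supports decoding across distant bins, whereas the sequential population decodes well only when training and testing bins coincide.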
As these prolonged representations in vCA1 may in part arise from protracted individual neuron activity in vCA1 compared to dCA1, we examined the distribution of activity in these regions during trials. Indeed, we found that vCA1 displayed broader firing within trials than dCA1 (Extended Data Fig. 5A

is being encoded during these periods, we trained mice with four odor stimuli: two that were always followed by sucrose reward (CS1+ and CS2+), and two that were followed by no outcome (CS3- and CS4-). This design allowed us to directly test the similarity of representations across trial types with distinct cue identities but the same outcome (Fig. 6A,B).

We first tested how well a linear classifier could predict each of the four trial types using population activity during the odor or trace periods (Fig. 6C). We found that, following learning, odor identity could be predicted with high accuracy during the odor-delivery period for both dCA1 and vCA1. Conversely, although individual trial types remained discriminable during the trace period, classification accuracy was lower in both regions during this period, with classifier errors predominantly occurring between trial types with the same outcome (e.g., CS1+ and CS2+). Thus, after learning, trace period activity is highly discriminable between trial type categories (CS+ vs CS-), but less so within each category.

The reduced decoding accuracy between CS1+ and CS2+ trial types during the trace period suggests a common signal across these trials. To more directly test this, we trained a linear classifier to discriminate activity between a reward-predictive trial type (e.g., CS1+) versus a non-predictive trial type (e.g., CS3-), then tested classification accuracy using data from the complementary trial types (CS2+ and CS4-), which the decoder had never seen ("outcome decoding"; see Fig. 6D).
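The outcome-decoding scheme just described can be sketched as a train-on-one-pair, test-on-the-complementary-pair loop. The code below is a minimal illustration under stated assumptions: a nearest-centroid decoder stands in for the linear classifier, and the synthetic data assume an odor-specific code in the "odor" epoch and a shared reward-related code in the "trace" epoch. The names `ODORS`, `simulate`, and `outcome_decoding` are hypothetical.

```python
import random
from itertools import product

ODORS = {"CS1+": "reward", "CS2+": "reward", "CS3-": "none", "CS4-": "none"}

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest(cents, x):
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(cents[lab], x)))

def simulate(rng, epoch, n_trials=30, block=10):
    """Return {odor: [population vectors]} for one task epoch.
    'odor' epoch: each odor drives its own block of cells (identity code).
    'trace' epoch: both rewarded odors drive one shared block (outcome code)."""
    n_cells = block * (len(ODORS) + 1)
    data = {}
    for idx, (odor, outcome) in enumerate(ODORS.items()):
        if epoch == "odor":
            active = idx
        else:
            active = len(ODORS) if outcome == "reward" else None
        data[odor] = [[rng.gauss(2.0 if c // block == active else 0.0, 1.0)
                       for c in range(n_cells)] for _ in range(n_trials)]
    return data

def outcome_decoding(data):
    """Train on one rewarded/unrewarded odor pair, test on the complementary
    pair the decoder has never seen; average over all pairings."""
    rewarded = [o for o, out in ODORS.items() if out == "reward"]
    unrewarded = [o for o, out in ODORS.items() if out == "none"]
    scores = []
    for pos, neg in product(rewarded, unrewarded):
        cents = {"reward": centroid(data[pos]), "none": centroid(data[neg])}
        held_out = [o for o in ODORS if o not in (pos, neg)]
        hits = total = 0
        for odor in held_out:
            for x in data[odor]:
                hits += nearest(cents, x) == ODORS[odor]
                total += 1
        scores.append(hits / total)
    return sum(scores) / len(scores)

rng = random.Random(2)
odor_acc = outcome_decoding(simulate(rng, "odor"))
trace_acc = outcome_decoding(simulate(rng, "trace"))
print(f"odor epoch: {odor_acc:.2f}, trace epoch: {trace_acc:.2f}")
```

Under these assumptions, generalization to the unseen odor pair is near chance when activity carries only odor identity and high when rewarded odors share a common pattern.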
Here, high decoding accuracy would indicate similar neural states across related trial types (e.g., CS1+ and CS2+).

After learning, we found high outcome decoding accuracy during the trace period in both dCA1 and vCA1 (Fig. 6D), further indicating that there exists a representation related to reward expectancy that is independent of the identity of the stimulus that precedes it. In contrast, outcome could not be decoded during the odor delivery period, suggesting that CS identity or specific CS-outcome associations are primarily represented during odor delivery. Dimensionality reduction analysis returned results that were analogous to the above (Extended Data Fig. S6). Together, these data indicate a switch from individual odor representations (during odor delivery) to representations related to expected outcome (during trace) as animals progress through a trial.

identities or perhaps information more representative of memory, such as specific cue-outcome associations. To adjudicate between these possibilities, after training mice to associate one odor with sucrose (CS+rew) and another with shock (CS+sh), we reversed the contingencies, where the previously rewarded odor was now paired with shock and vice-versa (Fig. 7K). Animals were trained on the new contingency until anticipatory licks were only observed during the CS+rew trial (Fig. 7L), and neurons were tracked across all sessions. To probe for stable encoding of odor identity across different US pairings, we decoded across reversal learning, training a linear classifier with population data from the final session prior to reversal (Late), and testing classification accuracy using data after the reversed contingencies had been learned (Late Reversal). In both dorsal and ventral CA1, we found that the neural representation for a specific odor remained intact regardless of whether the odor predicted sucrose or shock (Fig.
7M and Extended Data Fig. 7G).

Finally, we assessed whether outcome expectation signals during the trace period remained stable following reversal learning, and whether stability differed for reward vs. shock outcomes. For this, we performed the same analysis as above, but using trace period data. In both dCA1 and vCA1, the cross-session decoder performed well when discriminating reward anticipation signals preceded by different odors (Fig. 7N and Extended Data Fig. 7H), suggesting a conserved signal related to reward anticipation that is independent of the odor that precedes it. Conversely, there was not a conserved representation across reversal learning when anticipating shock. Additional analyses revealed this was due to the absence of pre-shock trace-period signals during the Late Reversal session (Fig. 7O). These results indicate that odor representations in both dCA1 and vCA1 are independent of the nature of the associated US, and that stable signals preceding reward, but not shock, emerge with learning in these regions.

reactive to odors prior to training, and odor identity was poorly decoded using population activity. Instead, odor decoding was generally contingent on salience, increasing for odors predictive of salient outcomes (e.g., reward or shock), and decreasing in the absence of these outcomes (e.g., extinction).

Although odor representations in vCA1 only emerged when an odor gained predictive value, it is important to note that vCA1 does not represent value per se, as population activity differed for distinct odors with the same associated value. We also did not find evidence that the associated outcome was represented by population activity during odor delivery, although it is possible this information was represented by a small subset of cells whose signal was too weak to appreciably influence the decoder.
Instead, our data suggest vCA1 strongly encodes the identity of stimuli that are relevant to the animal. This dependency on relevance is fundamentally different from that observed in dCA1, where the separability of stimulus representations was not dependent on their perceived relevance. Such relevance-selective processing in vCA1 may be important for conveying stimulus-value information to the frontal cortex (

extend these findings beyond the spatial domain and show that ventral CA1 also contains a dedicated signal during reward anticipation that generalizes across predictive cues and is stable across days.

Most findings observed with appetitive conditioning were likewise seen with cue-shock conditioning. Surprisingly, however, shock anticipation was only weakly encoded by both dCA1 and vCA1, despite robust activation of both regions in response to cue and shock deliveries. Moreover, shock anticipation signals were further diminished with subsequent training, eventually becoming indistinguishable from ITI activity. Although previous reports examining anticipation of aversive stimuli are mixed for dCA1 (

representations are more circumscribed and vCA1 representations generalize across large swaths of space (Jung et al., 1994; Kjelstrup et al., 2008), our cross-time-bin population analysis suggests a more rapid turnover of population activity patterns (neural states) in dCA1 compared to vCA1. Thus, the broad firing observed in vCA1 during spatial exploration may reflect a more general property of this region that extends beyond representations of physical space.
These results are also in line with human studies of memory that suggest posterior HPC is associated with recall of detailed information, such as the temporal sequence of events, while anterior HPC represents higher level information, such as the location where the collection of events occurred (Harland et

(G) Population-activity decoding accuracy for odor 1 or odor 2 from baseline (±SD). Color-coded bars above graph denote periods where accuracy is significantly greater than chance (p < 0.01, Mann-Whitney U test).
(H) Odor-period decoding. Population activity during the last second of odor delivery was used to decode odor 1 or odor 2 from baseline.
(I and J) Same as (F) and (G) above, but decoding for trial type (odor 1 or odor 2) at each time bin t.

For all figures: * p < 0.05, ** p < 0.01, *** p < 0.001. See Table S1 for all statistical analysis details.

is unstructured but becomes restricted to the time periods directly before and after reward delivery following CS+/CS- discrimination learning (Late

(C) Confusion matrices for decoding trial type from population activity. The y-axis denotes the actual trial type experienced and the x-axis indicates the proportion of trials for which each trial type was decoded. The ascending diagonal represents the proportion of trials correctly classified, while other row entries indicate the proportion of trials where the actual state was confused with the corresponding trial type. Note the high accuracy during the odor delivery period and the increased incidence of parallel trial types (e.g., CS1+ and CS2+) being confused during trace.
(D) A linear classifier was trained to discriminate activity between reward-predictive and non-predictive trial types (e.g., CS1+ vs CS3-), then tested using data from the complementary trial types (CS2+ and CS4-). The mean for all combinations of trial-type pairs is presented (±SEM, Mann-Whitney U test).
(B) Cross-validated neural activity. Each trial type (CS+ or CS-) was separated into odd and even trials, and neural activity was z-scored. For each time bin, z-scores were averaged across all trial subsets, and sorted by peak firing rate latency during odd trials. Population mean is shown directly below heatmap (±SEM).
(C) Proportion of neurons whose activity was significantly modulated during odor- or trace-period compared to pre-odor baseline (deemed "responsive cells"

(vCA1 and dCA1).
(A) Population-activity decoding accuracy for CS+ or CS- trials from baseline (±SD). Color-coded bar above shows periods where the corresponding trial type accuracy is significantly greater than the opposing trial type (p < 0.01, Mann-Whitney U test).
(B) CS+ versus baseline odor-period decoding accuracy for each learning phase (±SEM, Mann-Whitney U test). Note that decoding performance is correlated with odor value in vCA1 (left) but not dCA1 (right).
(C) Same as (B), but for trace period. Decoding performance is correlated with reward expectation in both vCA1 and dCA1.
(D) Activity during CS+ trials for neurons registered across session pairs. For each time bin, activity z-scores for each neuron were averaged across all trials within a session, and neurons were sorted by peak firing rate latency during the indicated session. Note the changing subset of task-responsive cells from Early to Late, and the relative stability following learning (Late to Reacquisition). Also note the few cells responsive to sucrose reward delivery during Early that translocate their firing to the CS and trace periods following CS-US learning.

(E) Hypothetical results for decoding CS+reward from CS+shock trials across reversal learning (for this set of results, stable encoding of US identity across reversal is assumed).
Because data classes were labeled with respect to the outcome of a trial, and not the odor identity, stable neural representations of odor identity will manifest as cross-session decoding accuracies that are below chance (middle graph).
(F) Actual results for decoding trial type across reversal learning (±SD). The below-chance decoding accuracy for CS+reward vs CS+shock during the odor period indicates representations of odor identity dominate the population activity during this time.
(G) Across-reversal odor ID decoding accuracy during the odor period (±SEM, Mann-Whitney U test).
(H) Across-reversal trial type decoding accuracy during the trace period (±SEM, Mann-Whitney U test).
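The below-chance logic in panels (E) and (F) can be illustrated with a toy cross-session decoder. The sketch below assumes a stable odor-identity code (each odor drives a fixed block of cells in both sessions), uses a nearest-centroid decoder as a stand-in for the linear classifier, and runs on synthetic data; `odor_trials` and `accuracy` are hypothetical helper names. When classes are labeled by outcome, a decoder trained before reversal is systematically wrong after reversal; relabeling the same predictions by odor identity recovers high accuracy.

```python
import random

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest(cents, x):
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(cents[lab], x)))

def odor_trials(rng, odor, n_trials=30, block=10):
    """Odor-period vectors in which each odor drives its own fixed block of
    cells, regardless of session (assumed stable identity code)."""
    active = {"A": 0, "B": 1}[odor]
    return [[rng.gauss(2.0 if c // block == active else 0.0, 1.0)
             for c in range(2 * block)] for _ in range(n_trials)]

def accuracy(cents, labeled_trials):
    """labeled_trials: list of (true_label, trials)."""
    hits = sum(nearest(cents, x) == lab for lab, trials in labeled_trials for x in trials)
    return hits / sum(len(trials) for _, trials in labeled_trials)

rng = random.Random(3)
late_A, late_B, rev_A, rev_B = (odor_trials(rng, o) for o in "ABAB")

# Late: odor A -> reward, odor B -> shock. Late Reversal: A -> shock, B -> reward.
by_outcome = accuracy({"reward": centroid(late_A), "shock": centroid(late_B)},
                      [("shock", rev_A), ("reward", rev_B)])
by_odor = accuracy({"A": centroid(late_A), "B": centroid(late_B)},
                   [("A", rev_A), ("B", rev_B)])
print(f"outcome labels: {by_outcome:.2f}, odor labels: {by_odor:.2f}")
```

For a two-class problem the two accuracies are complementary, so accuracy far below chance under outcome labels is itself evidence of a stable identity code.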

Contact for reagents and resource sharing
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Mazen Kheirbek (Mazen.Kheirbek@ucsf.edu).

Materials Availability
This study did not generate new unique reagents.

Data Availability
The datasets supporting the current study are available from the lead contact on request.

Code Availability
The analysis code supporting the current study is available from the lead contact on request.

Mice
All procedures were conducted in accordance with the NIH Guide for the Care and Use of Laboratory Animals and institutional guidelines. Adult male C57BL/6J mice were supplied by Jackson Laboratory. Mice were kept on a 12-hour light cycle, with experiments conducted during the light portion.

Surgery
Animals were 11-15 weeks postnatal at time of surgery. Mice were anesthetized with 1.5% isoflurane with an O2 flow rate of 1 L/min, and head-fixed in a stereotactic frame (David Kopf, Tujunga, CA). Eyes were lubricated with an ophthalmic ointment, and body temperature was maintained at 37°C with a warm water recirculator (Stryker, Kalamazoo, MI). The fur was shaved and incision site sterilized prior to beginning surgical procedures. Lidocaine, meloxicam, and slow-release buprenorphine were provided for analgesia.

GCaMP6f virus injection and GRIN lens implantation were conducted using methods previously described (Jimenez et al., 2018). Briefly, a craniotomy was made over the lens implantation site and dura was removed from the brain surface and cleaned with sterile saline and absorptive spears (Fine Science Tools, Foster City, CA). A nanoject syringe (Drummond Scientific, Broomall, PA) was used to deliver GCaMP6f to vCA1 or dCA1 (left hemisphere for both). vCA1 coordinates were -3.16 A/P and -3.25 M/L. 150 nl of virus was injected at each depth of -3.85, -3.55 and -3.3 (450 nl total volume) with respect to bottom of skull at the medial edge of the craniotomy. dCA1 coordinates were -2 A/P, -1.65 M/L and -2.1 A/P, -1.45 M/L at depths -1.5, -1.25 D/V with respect to bregma. The needle was held in place for > 5 minutes prior to moving to the next D/V coordinate and remained in place for 10 minutes following the final injection before being slowly withdrawn from the brain. AAV1-SYN-GCaMP6f-WPRE-Sv40 (titer: 1.97E+13) was supplied by the University of Pennsylvania viral vector core and diluted 1:3 in 1x sterile PBS before injections. For dCA1, prior to virus injection the overlying cortex was slowly aspirated until axonal fibers of the external capsule/alveus were visualized.
Following virus injection, a 0.6 mm (vCA1) or 1.0 mm (dCA1) diameter GRIN lens (Inscopix, Palo Alto, CA) was slowly lowered in 0.1 mm D/V steps and then fixed to the skull with Metabond adhesive cement (Parkell, Edgewood, NY). vCA1 lens coordinates were -3.16 A/P, -3.5 M/L and -3.5 D/V (from bottom of skull at craniotomy; Extended Data Fig. 1A, 3A). dCA1 lens coordinates were -2.05 A/P, -1.5 M/L, -0.95 D/V (from bregma; Extended Data Fig. 1B). A custom-made titanium headbar was then attached to the skull using dental cement (Dentsply Sirona, Philadelphia, PA). A baseplate and cover (Inscopix, Palo Alto, CA) were also cemented on to protect the lens.

For dCA1 animals in the tone discrimination paradigm, a 3 mm craniotomy was made, and the overlying cortex was aspirated until axonal fibers of the external capsule/alveus were visualized. The aspiration site was continuously irrigated with cold, sterile saline. Viral injections (120 nl per site) were performed at the same sites as above. A custom-made dCA1 imaging window was implanted, which consisted of a 3 mm round coverslip, #0 thickness (Warner Instruments, Hamden, CT) attached with optical adhesive (#81, Norland Products, Cranbury, NJ) to a metal cannula 1/8" in outer diameter and 1/16" in length (McMaster-Carr, Santa Fe Springs, CA). This window was carefully lowered into place, until it rested on top of the exposed tissue (Extended Data Fig. 3B). The cannula was then cemented into place with Metabond adhesive, and a custom titanium headbar was cemented in place.

Verification of imaging sites and histological analysis
Dorsal and ventral CA1 imaging sites were verified in each animal included in final analysis (Extended Data Figs. 1 and 3). After all imaging sessions were completed, mice were injected intraperitoneally with a lethal dose of 2:1 ketamine/xylazine solution. While the heart was still beating, mice were perfused transcardially using 4% PFA solution. Brains were extracted and placed in 4% PFA solution for 2-3 days to allow further fixation. After saturating with a 30% sucrose solution, coronal slices of 50-micron thickness were collected using a Leica SM2000 microtome. Slices were collected in 1x PBS solution and mounted onto glass slides, coverslipped with Fluoromount G with DAPI (Southern Biotech, Birmingham, AL).

Behavioral training
Four-to-six weeks following surgery, animals were handled and habituated to the experimenter, training environment and head fixation for one week. Following habituation, animals were water restricted to ~85-90% ad lib weight and underwent a 2-3 day pretraining period designed to introduce the sucrose delivery apparatus, with free sucrose rewards (~2 µl each) intermittently delivered upon licking (up to 80 rewards in a 20 min session). Sucrose rewards (10% sucrose, 0.03% NaCl in water) were delivered via a solenoid-gated gravity feed. Contact with a lick spout positioned in front of the mouth was measured using a capacitive touch MPR121 sensor (SparkFun, Boulder, CO). Stimulus delivery and sensor reading were controlled by an Arduino Mega with custom circuit boards (adapted from OpenMaze.org) and recorded via CoolTerm software. Once animals displayed consistent and motivated licking (80 rewards collected in a single session), lick training was complete and the pretraining odor exposure session was initiated the following day. Throughout training, animals were water restricted to ~85-90% ad lib weight. All training paradigms consisted of one training session/day, occurring at roughly the same time each day. Learning of the discrimination tasks was assessed using lick discriminability (d') for each session, which compares the rate of anticipatory licks during the trace period of CS+ trials with CS- trials:

$$d' = \frac{\text{mean CS+ licks} - \text{mean CS- licks}}{\sqrt{\left[\sigma^{2}(\text{CS+ licks}) + \sigma^{2}(\text{CS- licks})\right]/2}}$$

where σ² denotes the across-trial variance of trace-period lick counts.
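As a worked example of a lick discriminability score, the sketch below computes d' from per-trial anticipatory lick counts. It assumes the conventional form of the index, in which the difference in mean lick counts is divided by the square root of the average of the two across-trial variances; whether the study used exactly this normalization (or sample rather than population variance) is an assumption here. The function name `lick_dprime` and the lick counts are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def lick_dprime(cs_plus_licks, cs_minus_licks):
    """d' between trace-period lick counts on CS+ and CS- trials.
    Assumes the conventional pooled form: difference of means divided by
    sqrt of the mean of the two sample variances."""
    pooled_sd = sqrt((variance(cs_plus_licks) + variance(cs_minus_licks)) / 2)
    if pooled_sd == 0:
        raise ValueError("d' is undefined when both lick distributions have zero variance")
    return (mean(cs_plus_licks) - mean(cs_minus_licks)) / pooled_sd

# Hypothetical anticipatory lick counts per trial (not data from the study).
cs_plus = [4, 5, 6, 5, 4, 6]
cs_minus = [0, 1, 0, 1, 0, 1]
d = lick_dprime(cs_plus, cs_minus)
print(f"d' = {d:.2f}, criterion met: {d > 1.5}")
```

With these made-up counts the means differ by 4.5 licks and the pooled standard deviation is sqrt(0.55), so the session would exceed a d' threshold of 1.5.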

697
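The lick discriminability score can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the denominator is the square root of the mean of the two per-trial lick variances (the conventional d' form) and uses the population variance; the function name `lick_dprime` and the example lick counts are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def lick_dprime(cs_plus_licks, cs_minus_licks):
    """Lick discriminability from per-trial anticipatory lick counts.

    Assumption: denominator = sqrt of the mean of the two variances.
    """
    diff = mean(cs_plus_licks) - mean(cs_minus_licks)
    pooled_sd = sqrt((pvariance(cs_plus_licks) + pvariance(cs_minus_licks)) / 2)
    if pooled_sd == 0:
        # Degenerate case: no trial-to-trial variability in either condition
        if diff == 0:
            return 0.0
        return float("inf") if diff > 0 else float("-inf")
    return diff / pooled_sd


# Hypothetical trace-period lick counts for one session
score = lick_dprime([4, 5, 6, 5], [1, 0, 1, 2])
learned = score > 1.5  # session-level learning criterion stated in the text
```

Whether the sample or population variance is used matters little at 60 trials per cue, but it should match whatever the original analysis used.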
Learning was defined as a d' score > 1.5 for a session, with all Late session mice meeting this criterion.
Pretraining odor exposure: One day prior to conditioning, animals underwent a single session in which they were passively exposed to the neutral odors (benzaldehyde, eugenol) that would subsequently serve as CS+ and CS- odors in the 2-odor paradigm. The session consisted of 30 trials (15 of each odor) of 2-second odor presentations. No lick spout was present during this session. The inter-trial interval between successive odor deliveries was drawn from a uniform distribution between 17.5 and 27 seconds. Odors were delivered via a custom-made olfactometer equipped with a mass flow controller (Alicat Scientific, Tucson, AZ) that maintained air flow at 2 liters per minute and prevented momentary pressure changes from solenoid valve switches (Clippard, Cincinnati, OH) upstream of the controller. Odors were delivered to mice via a customized nose cone, which contained an outlet where a gentle vacuum was applied to evacuate residual odor. Additionally, a continuously running charcoal filter vacuum system (Hydrobuilders Inc.) was used to evacuate any residual odors.
2-odor paradigm: Each associative learning session consisted of 120 trials (60 CS+ and 60 CS-, pseudorandomly presented). Two neutral odors served as CS+ and CS- cues (benzaldehyde or eugenol, 2s) with cue contingencies counterbalanced across mice. Presentation of the CS+ cue was followed by a 2s trace period and subsequent reward delivery (~2 µl). No reward was available following the presentation of the CS- cue. Animals were not punished for off-target licking. The inter-trial interval between successive cues was drawn from a uniform distribution between 17.5 and 27 seconds.
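The session structure of the 2-odor paradigm can be illustrated with a short sketch. It assumes that "pseudorandomly presented" means a plain shuffle of a balanced trial list, since the text states no further ordering constraint; `make_session` and its parameters are hypothetical names.

```python
import random


def make_session(n_per_cue=60, iti_range=(17.5, 27.0), seed=None):
    """Return a list of (cue, inter_trial_interval_s) tuples for one session."""
    rng = random.Random(seed)
    cues = ["CS+"] * n_per_cue + ["CS-"] * n_per_cue
    # Assumption: unconstrained shuffle of a balanced list
    rng.shuffle(cues)
    # Each inter-trial interval is an independent draw from U(17.5, 27) seconds
    itis = [rng.uniform(*iti_range) for _ in cues]
    return list(zip(cues, itis))


session = make_session(seed=0)  # 120 trials, 60 per cue
```

Calling `make_session(n_per_cue=80)` would give the trial counts of the 2-tone paradigm under the same assumptions.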
This task structure was administered over a period of ~7 days, in which days 1 and 4 were termed "Early" and "Late" learning, respectively. If an animal did not meet the learning criterion (d' > 1.5) on day 4 (n=3 animals), training continued until this criterion was met. The two days following the Late session were extinction sessions, labeled "Ext1" and "Ext2", respectively, in which the odor-reward association was extinguished by removing the sucrose reward for CS+ trials (the lick spout remained in place). Ext2 was followed by a one-day reacquisition session, labeled "Reacquisition", in which the sucrose reward was reintroduced for CS+ trials. A total of 11 vCA1 and 5 dCA1 animals were included in the data set.
2-tone paradigm: Each associative learning session consisted of 160 trials (80 CS+ and 80 CS-, pseudorandomly presented). Two auditory tones served as CS+ and CS- cues (2.5 kHz and 13 kHz pulsing tones, 2s, 70 dB) with cue contingencies counterbalanced across mice. Presentation of the CS+ cue was followed by a 2s trace period, then a 2s reward window during which a lick was required for sucrose reward delivery (~2 µl, maximum one reward per trial). No reward was available following the presentation of the CS- cue. Animals were not punished for off-target licking. The inter-trial interval between successive cues was drawn from a uniform distribution between 17.5 and 27 seconds. A total of 7 vCA1 and 2 dCA1 animals were included for analysis (a separate cohort of mice from that used for the odor-based experiments). In a subset of animals (n = 4 vCA1 and n = 2 dCA1), multiple z-planes were imaged across sessions. Imaging planes were separated by > 60 µm to ensure that no cells overlapped across different z-planes.
(methylbutyrate, isoamyl acetate, eugenol, eucalyptol), one of which (eugenol) had been experienced previously in the 2-odor task (due to a lack of available neutral odors) and was paired with the same outcome as previously experienced. Presentation of the CS+ cue was followed by a 2s trace period and subsequent delivery of 10% sucrose solution (~2 µl). No US was presented following the presentation of the CS- cue. Animals were not punished for off-target licking. The inter-trial interval between successive cues was drawn from a uniform distribution between 17 and 23 seconds. A total of 7 vCA1 and 3 dCA1 animals were included in the data set.

2-photon imaging
Calcium imaging with the genetically encoded indicator GCaMP6f was used to assess the functional activity of individual neurons. Images were captured using an Ultima IV laser scanning microscope (Bruker Nano,
Registration of cells across sessions imaged at the same FOV used probabilistic modeling of similarities between cell pairs across sessions (CellReg; Sheintuch et al., 2017). Briefly, spatial footprint maps were generated for each session by projecting the spatial filter of each cell onto a single image. Spatial footprint images from sessions imaged at the same FOV were then aligned. The distribution of similarities between pairs of neighboring cells was subsequently modeled via centroid distance to obtain an estimate of their probability of being the same cell (Psame). Cells were then registered across sessions via a clustering procedure that uses the previously obtained probabilities, with a probability threshold of 0.8. The average Psame value for registered cells was 0.95. All putative matches were visually inspected.

Data analysis
For statistical analyses and figures, calcium event activity was separated into 1-second bins and the average activity during each bin was used. When reporting results for specific task epochs, "odor period" refers to the final 1-second bin of odor delivery (1 to 2 s post odor onset), while "trace period" refers to the final 1-second bin of the trace period prior to reward delivery (1 to 2 s post odor offset), unless otherwise noted. These time bins were chosen to ensure that odor was being experienced throughout the entire odor bin and to minimize any residual odor effects during trace period analysis. All statistical analyses were two-sided. For all figures: * p < 0.05, ** p < 0.01, *** p < 0.001. See Table S1 for all statistical analysis details.
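The binning and epoch definitions can be made concrete with a small sketch. It assumes a per-frame event trace that starts on a bin boundary and an integer number of frames per second; the frame rate and the helper names `bin_means` and `epoch_bins` are hypothetical and not taken from the text.

```python
def bin_means(values, frame_rate_hz, bin_s=1.0):
    """Average a per-frame trace into consecutive bins of bin_s seconds."""
    per_bin = int(round(frame_rate_hz * bin_s))
    n_bins = len(values) // per_bin  # any trailing partial bin is dropped
    return [
        sum(values[i * per_bin:(i + 1) * per_bin]) / per_bin
        for i in range(n_bins)
    ]


def epoch_bins(odor_onset_bin, odor_s=2, trace_s=2):
    """Indices of the 'odor period' and 'trace period' 1-second bins.

    odor_onset_bin is the index of the bin covering 0 to 1 s post odor onset.
    """
    odor_bin = odor_onset_bin + odor_s - 1             # 1 to 2 s post odor onset
    trace_bin = odor_onset_bin + odor_s + trace_s - 1  # 1 to 2 s post odor offset
    return odor_bin, trace_bin


binned = bin_means([0, 2, 4, 6], frame_rate_hz=2)
odor_bin, trace_bin = epoch_bins(odor_onset_bin=5)
```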

Population decoding
A linear decoder was used to classify activity patterns into two discrete categories (Bishop, 2006):
$$\hat{y}(t) = \mathrm{sign}\left(\vec{w} \cdot \vec{x}(t) + b\right)$$
where $\hat{y}$ is the predicted label of the population activity pattern $\vec{x}$ recorded at time $t$ and takes two values corresponding to the two classes of patterns to decode (for example, two odor identities), $\vec{w}$ is the vector of weights assigned to each cell, and $b$ is a constant bias term. Decoding parameters were obtained via a supervised learning protocol with labeled data, using a support-vector machine (SVM) with a linear kernel (python/scikit/linearSVC). Results are reported as the generalization performance of the decoder under cross-validation, a standard machine learning procedure to avoid overfitting. When multiple categories were involved, e.g., more than two trial types, multiple linear decoders were trained on pairs of discrete categories and combined using majority-based error-correction codes.
We defined the patterns of calcium activity by computing the mean event rates during one-second time bins. Pseudo-population recordings were generated by combining cells across multiple animals/FOVs. For decoding, one-half of trials were randomly selected from each class and pseudo-population activity from these trials was used to train the decoder, while the remaining held-out half was used to evaluate the decoder's generalization performance. When comparing decoding accuracy between neural populations of different size, we trained our decoder on a random subsample of cells from the more numerous population equal in size to the smaller population. We repeated the operation 100 times and then combined the cross-validated decoding accuracies of all random choices to obtain a single sample of decoding accuracies. We repeated the procedure 10 times to perform statistical comparisons across groups and against chance performance. A two-sided Mann-Whitney U test was used to compare decoding accuracies between groups, with Bonferroni correction for multiple comparisons.
For decoding against baseline, we used population activity during the 1-second time bin that began three seconds prior to CS onset as baseline data. Cross-time-bin and cross-session decoding followed the same procedure as within-session decoding. In the case of cross-session decoding, only cells registered across the compared sessions were included.
For decoding distant time bins in our cross-time-bin decoding analysis (Figs. 6C and 7H,J), we took the average of all decoding runs for each cross-time-bin comparison that was separated by 3 or more bins. Further, we only included comparisons where both train and test data occurred at least 1 second after odor onset (that is, pre-trial data was excluded).
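The split-half, size-matched decoding procedure can be sketched with scikit-learn's `LinearSVC`. This is an illustrative sketch rather than the authors' pipeline: the hyperparameters (`C`, `max_iter`), the use of the mean to combine repeats, and the function names are assumptions, and the inputs are taken to be trials-by-cells arrays of binned event rates for two trial classes.

```python
import numpy as np
from sklearn.svm import LinearSVC


def split_half_accuracy(X_a, X_b, rng):
    """Train on a random half of trials per class, score on the held-out half."""
    train_X, train_y, test_X, test_y = [], [], [], []
    for label, X in ((0, X_a), (1, X_b)):
        order = rng.permutation(len(X))
        half = len(X) // 2
        train_X.append(X[order[:half]])
        test_X.append(X[order[half:]])
        train_y += [label] * half
        test_y += [label] * (len(X) - half)
    clf = LinearSVC(C=1.0, max_iter=10000)  # assumed hyperparameters
    clf.fit(np.vstack(train_X), train_y)
    return clf.score(np.vstack(test_X), test_y)


def size_matched_accuracy(X_a, X_b, n_cells, n_repeats=100, seed=0):
    """Decode from random n_cells-sized subsamples of the pseudo-population."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_repeats):
        cells = rng.choice(X_a.shape[1], size=n_cells, replace=False)
        accuracies.append(split_half_accuracy(X_a[:, cells], X_b[:, cells], rng))
    # Assumption: repeats are combined by averaging
    return float(np.mean(accuracies))
```

Setting `n_cells` to the size of the smaller population matches the two regions in cell count before their accuracies are compared.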

Multidimensional Scaling (MDS)
We performed 2-dimensional MDS of event data using python/scikit/MDS. As with decoding, we combined all cells recorded from a particular region (e.g., vCA1) across all mice into one pseudo-population. For each trial type, 100 trials were randomly selected for analysis, and MDS was performed. The Euclidean distance was taken between trial types, and this process was repeated 100 times. Bar charts of Euclidean distance show the mean ± SD of all runs.

Single-cell responsivity
Data used for heatmaps of calcium traces or inferred events were not binned. For each cell, z-scores were computed over the entire dataset for a specific condition (e.g., CS+ trials). To identify cells whose activity was modulated during specific epochs (e.g., CS+ period, trace period, etc.), for each trial containing the specified epoch, the average event magnitude during the 1s epoch was compared to the average event magnitude during a 1s baseline period immediately prior to cue onset for that trial. P-values were determined using a two-tailed Mann-Whitney U test, and False Discovery Rate (FDR) correction was applied for multiple comparisons. Cells with an adjusted p-value < 0.05 were classified as responsive. Fisher's exact test was used to assess whether the proportion of selective cells for a specific epoch differed significantly between conditions (e.g., CS+ Early vs CS+ Late; p < 0.05).
To compare the persistence of CS+-trial-related activity in vCA1 vs dCA1 neurons, we first split CS+ trial data into odd and even trials and averaged activity for each cell across these trials. We then selected cells whose peak activity during odd trials occurred between odor onset and US onset. Average activity for these cells during even trials was then collected ±4 seconds around the time point of odd-trial peak activity, normalized to the amplitude of odd-trial peak activity, and plotted (Extended Data Fig. 5A).
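The multiple-comparison step of the responsivity analysis can be sketched as follows. The text does not name the FDR procedure, so this sketch assumes Benjamini-Hochberg; the per-cell p-values are taken as given (for example, from per-cell Mann-Whitney U tests), and the function names and example values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def responsive_cells(p_values, alpha=0.05):
    """Indices of cells whose adjusted p-value falls below alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q < alpha]


# Hypothetical per-cell p-values for one epoch
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
selected = responsive_cells([0.01, 0.04, 0.03, 0.5])
```

In this example only the first cell survives correction, even though three of the raw p-values are below 0.05.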

Lick Correlation
We determined whether the activity of dCA1 or vCA1 neurons was correlated with licking. We regressed the lick rates across the session against calcium events. We fit a linear regression model to predict lick rates and used the explained variance ($R^2$) as a measure of goodness of fit to compare the results across animals and days. We divided each analyzed session into 10 time-contiguous blocks and computed the generalization performance of the model with 10-fold cross-validation over these blocks to avoid overfitting. Regression was performed both with ordinary linear regression and with Lasso, and we verified that the results were not qualitatively different.
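Cross-validation over time-contiguous blocks can be sketched with NumPy alone. This is an illustration under stated assumptions: it uses ordinary least squares only (no Lasso), pools held-out predictions from all blocks into a single $R^2$, and the name `blocked_cv_r2` and the array shapes are hypothetical.

```python
import numpy as np


def blocked_cv_r2(events, lick_rate, n_blocks=10):
    """Cross-validated R^2 for predicting lick rate from cell activity.

    events: (n_timepoints, n_cells) array; lick_rate: (n_timepoints,) array.
    Each fold holds out one contiguous block of time points.
    """
    n = len(lick_rate)
    X = np.column_stack([events, np.ones(n)])  # append an intercept column
    predictions = np.empty(n)
    for test_idx in np.array_split(np.arange(n), n_blocks):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        coef, *_ = np.linalg.lstsq(X[train], lick_rate[train], rcond=None)
        predictions[test_idx] = X[test_idx] @ coef
    ss_res = np.sum((lick_rate - predictions) ** 2)
    ss_tot = np.sum((lick_rate - lick_rate.mean()) ** 2)
    # Assumption: one pooled R^2 over all held-out predictions
    return 1.0 - ss_res / ss_tot
```

Holding out contiguous blocks rather than shuffled samples keeps temporally adjacent, autocorrelated time points from appearing in both the training and test sets.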

Aha Analysis
We identified the first moment of differential licking behavior between CS+ and CS- trials by locating an "aha" moment for each mouse in the 2-odor paradigm. This was calculated by averaging the cumulative CS+ and CS- lick rates across every 4 trials, taking the slope of the difference in cumulative licking between these bins, and checking whether 1) the difference exceeded the previous bin's slope by >= 1 standard deviation of the difference line up to that bin, and 2) the slope increase exceeded 1/3 of the difference between the previous set of trials. The averaging and thresholding with an increased slope relative to previous trials limited detection of instances in which a short sequence of successfully discriminated trials was followed by a return to incorrect licking, which would not represent a true aha moment. For potential aha moments detected on the first day of learning, we set a threshold of a minimum of 80 licks so that only mice that demonstrated lick rates similar to or above the baseline required during lick training could be considered to have learned. All aha moments detected by this method were cross-checked against the raw licking data to ensure accuracy. Aha moments across mice spanned the first two days of learning, with 62% of mice reaching an aha moment on the first day of learning, and all mice reaching an aha moment by the end of the second day.
For aha population decoding analysis, we used 30 trials before or after the aha moment. For mice in which the aha moment was < 30 trials from the end of the first or beginning of the second day of learning, trials from both days were included in order to reach the full 30 trials, and only cells registered across both sessions were included.
For Mann-Whitney U tests and Wilcoxon tests versus chance, effect size was determined using:
$$r = \frac{Z}{\sqrt{N}}$$
where $N$ is the total sample size and
$$Z = \frac{U - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$
where $n_1$ is the sample size of sample 1, $n_2$ is the sample size of sample 2, and $U$ is the U test statistic obtained from the statistical test output.
For t-test analysis, Cohen's d was used, defined as the difference between group means divided by their pooled standard deviation.
For Fisher's exact test, the odds ratio was obtained directly from the test output.
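The effect-size calculations can be sketched as follows. The Mann-Whitney effect size follows the normal approximation to U with no tie or continuity correction and takes the total sample size as n1 + n2; Cohen's d is written in its conventional form using the pooled standard deviation. Function names and example inputs are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def mann_whitney_effect_size(u, n1, n2):
    """Effect size r = Z / sqrt(N), with Z from the normal approximation to U."""
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return z / sqrt(n1 + n2)


def cohens_d(a, b):
    """Difference between group means divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


r = mann_whitney_effect_size(u=90, n1=10, n2=10)
d = cohens_d([2, 4, 6], [1, 3, 5])
```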