4.1 Keeping a memory of past episodes
We have explained above that, to give an ecological meaning to our behavior, our direct sensorimotor capabilities (being able to orient toward an object of interest (the where question) and being able to exploit the object with the body (the how question)) are enslaved to the motivational and emotional analysis of the situation (the why and what questions). At a first level of complexity, this can be performed by subcortical structures (amygdala, PAG and hypothalamus) learning simple pavlovian associations and having strong relations with the ventral striatum.
In the simplest cases, when the goal of the behavior has been identified in the sensory region of the amygdala (BLA) and is directly available for consumption, BLA activates the amygdalar output CeA for the pavlovian response and also sends projections to the shell of NAcc for the corresponding consummatory behavior. Anatomical and functional considerations underline the similarity of these responses. There is in fact anatomical continuity between CeA and the shell of NAcc, with a proposed similar functional organization (Cassell et al., 1999) including strong dopaminergic innervation and projections to the same regions of motor output (including PAG and the lateral hypothalamus).
When the target of the behavior is not directly identified, the general class of motivation can provide information to energize a preparatory behavior that will result in selecting the target. This is allowed by projections from CeA to the core of NAcc and can result in simple autoshaping or in more complex goal-directed behavior. This view gives the ventral striatum (or NAcc) a central role at the interface between pavlovian and instrumental learning (Mannella et al., 2013). It is therefore particularly interesting to remark that, as more elaborate information was incorporated into the system over the course of evolution, beginning with birds (Striedter, 2016), episodic memory information originating from the hippocampus came to be projected to the striatum, mainly in its ventral division (Voorn et al., 2004).
In mammals, within the medial temporal lobe, generally reported as dedicated to declarative memory, the hippocampus is more precisely associated with episodic memory (Tulving, 1972), allowing specific events to be remembered in their context. Through its input structure, the entorhinal cortex, the hippocampus receives cortical information from the posterior ventral cortex related to the what and why questions (via the perirhinal cortex) and from the posterior dorsal cortex related to the where and how questions (via the postrhinal cortex, also called parahippocampal, depending on species) and aggregates them, including their organization in time (Jensen and Lisman, 2005), into an episode or event (Diana et al., 2007). This association of arbitrary information is made possible by the unique recurrent architecture of the hippocampal region CA3, which makes it work as an associative memory, learning an event very rapidly (Kassab and Alexandre, 2015). This recurrent structure appears in birds (Striedter, 2016); in reptiles, the ancestor of the hippocampus is merely a spatial memory.
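The rapid, one-shot storage and cue-based recall attributed to the recurrent CA3 network can be illustrated with a minimal autoassociative (Hopfield-style) memory. This is a toy sketch, not a biological model: the network size, the number of stored "episodes" and the binary coding are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                         # number of units
episodes = rng.choice([-1, 1], size=(3, n))     # three stored binary "episodes"

# One-shot Hebbian storage (very rapid learning, as stressed for CA3)
W = sum(np.outer(p, p) for p in episodes) / n
np.fill_diagonal(W, 0)

def recall(cue, steps=10):
    """Iterate the recurrent dynamics from a partial cue to a stored attractor."""
    s = cue.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1
    return s

# Corrupt 25% of an episode (a partial cue) and recover the full pattern
cue = episodes[0].copy()
cue[: n // 4] *= -1
restored = recall(cue)
overlap = (restored == episodes[0]).mean()      # fraction of recovered units
```

With only a few stored patterns, the corrupted cue falls back into the attractor of the original episode, mirroring the completion-from-partial-information property described in the recall process below.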
The decision to memorize an event can be made intrinsically, on the basis of its novelty, or from extrinsic afferents, particularly originating directly or indirectly from the amygdala (Paz and Paré, 2013), signaling reward prediction errors and consequently a need for more precise learning. Errors might be due to ambiguities in the conjunction of features (O’Reilly and Rudy, 2001) or in their temporal ordering, as the hippocampus is also particularly critical for sequence and delay learning (Jensen and Lisman, 2005). The dentate gyrus (DG) appears in mammals (Striedter, 2016) as an intermediate step between the entorhinal cortex and CA3, with a distinct function of pattern separation (Kassab and Alexandre, 2018) during learning, to avoid errors in recall. In the recall process, thanks to direct projections between the entorhinal cortex and CA3, the hippocampus can be activated from partial information, evoke the complete episode and facilitate the reactivation of other brain regions (Gruber and McDonald, 2012) via its output structures, CA1 and the subiculum. This has been reported, for example, as contextual signals sent to the amygdala for the extinction of pavlovian conditioning (Moustafa et al., 2013), or as predictive signals of possible paths sent to the entorhinal and prefrontal cortex, and also to the ventral striatum, during maze navigation in rats (Gruber and McDonald, 2012).
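The pattern-separation function attributed to DG can be sketched as a random expansion into a much larger layer followed by sparse competition: two very similar inputs are mapped onto much less similar sparse codes. Layer sizes, sparsity level and the winner-take-most rule are illustrative assumptions, not a model of DG circuitry.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_dg, k = 50, 1000, 20            # expansion ratio and sparse activity level
W = rng.standard_normal((n_dg, n_in))   # fixed random projection

def separate(x):
    """Expand the input, then keep only the k most active units (sparse code)."""
    h = W @ x
    code = np.zeros(n_dg)
    code[np.argsort(h)[-k:]] = 1.0
    return code

def cosine(a, b):
    return float(a @ b) / np.sqrt(float(a @ a) * float(b @ b))

x1 = rng.standard_normal(n_in)
x2 = x1.copy()
x2[:5] += rng.standard_normal(5)        # a slightly different second input

in_sim = cosine(x1, x2)                 # inputs are highly similar
out_sim = cosine(separate(x1), separate(x2))  # sparse codes are less similar
```

The drop from `in_sim` to `out_sim` is the separation effect: by decorrelating overlapping events before storage, downstream recall in CA3 is less likely to confuse them.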
Owing to its ability to store and later detect and recall complex multimodal episodes, particularly including delays between their constituents, the hippocampus provides the ventral striatum and the amygdala with more complex features than the simple sensory cues sent by the thalamus or the cortex. It is for example reported that hippocampal inputs are critical to the amygdala in pavlovian trace conditioning (Paz and Paré, 2013), when the CS and the US are separated by a delay. This also allows the creation of conditioned reinforcers in the amygdala, corresponding to sub-goals or intermediate steps in a sequence of behaviors, sent to the ventral striatum and evoking surrogates of rewards when the actual reward is distant, as is often the case in instrumental conditioning (Cardinal et al., 2002).
The distinction evoked above between the posterior ventral cortex (the what and why questions), representing perception for recognition, and the posterior dorsal cortex (the how and where questions), rather representing perception for action (Milner and Goodale, 1995), has also been clearly reported in the hippocampus (Fanselow and Dong, 2015): a dorsal region is rather involved in navigation, with neurons coding for location (place cells) or head direction, and a ventral region is rather involved in emotional aspects and projects massively to the amygdala and to the ventral striatum (mainly the shell). It must be noted that the dorsal hippocampus also projects to the core of the ventral (and the dorsomedial) striatum and to the anterior cingulate cortex (Pennartz et al., 2011), which underlines the special position of the medial prefrontal CBG loop, intermediate between pavlovian and instrumental conditioning and basically associating responses with outcomes. This will be discussed in more detail in Sect. 4.3 below.
Recent hypotheses about the interplay between the hippocampus and the entorhinal cortex, associated with the computing formalism of the successor representation (Stachenfeld et al., 2014), also propose a powerful basis for reasoning processes. In this view, CA3 is seen as a cognitive map encoding not only locations (classically associated with place cells) but also predictions of possible transitions to other locations, under a probability distribution learnt through the episodes stored in CA3. It is then proposed that this information is exploited in the entorhinal cortex, where so-called grid cells encoding various metric representations have been reported to perform path integration through a hierarchical decomposition of space. Considering now that the hippocampus also learns episodes of non-spatial concepts, the same process would yield, in the corresponding regions of the entorhinal cortex, a hierarchical decomposition such as can be observed in planning, extracting subtasks and subgoals at various levels of description (Stachenfeld et al., 2017). It has been observed for some time (Pezzulo et al., 2014) that internally generated sequences resembling such episodes are replayed at certain key moments, particularly during rest. This replay mechanism is proposed to reinstantiate retrospective memory in the posterior cortex to improve training. It could also be a basis for prospective memory in the PFC, supporting such mental processes as planning, reasoning and, more generally, thoughts, as discussed in Sect. 4.3.
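The successor representation mentioned above has a compact mathematical form: given a policy-dependent transition matrix T over states, the matrix M = (I − γT)⁻¹ gives, for each state, the discounted expected future occupancy of every other state, and values follow as V = Mr for any reward vector r. The sketch below uses a 5-state linear track with a random-walk policy; the task and parameters are illustrative assumptions.

```python
import numpy as np

n_states, gamma = 5, 0.9

# Random-walk transition matrix on a linear track (move to a neighbor)
T = np.zeros((n_states, n_states))
for s in range(n_states):
    for s2 in (s - 1, s + 1):
        if 0 <= s2 < n_states:
            T[s, s2] += 1.0
    T[s] /= T[s].sum()

# Successor representation: discounted expected future state occupancies
M = np.linalg.inv(np.eye(n_states) - gamma * T)

# Values for any reward layout are a single matrix-vector product
r = np.zeros(n_states)
r[-1] = 1.0                  # reward at the end of the track
V = M @ r
```

Because M separates the predictive map from the reward vector, revaluing an outcome only changes r, not M, which is one reason this formalism is attractive for modeling flexible, goal-directed evaluation.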
In summary, the hippocampus can represent complex events, corresponding to specific episodes, introducing rich and complex sensory information into pavlovian and instrumental conditioning. This gradient of complexity in sensory inputs, from specific cues encoded in the sensory cortex to cognitive maps and emotional episodes in the dorsal and ventral hippocampus, is very nicely illustrated in (Voorn et al., 2004), which gathers anatomical information in rats about hippocampal projections to the striatum, the amygdala and the frontal cortex, ordered along that gradient. Such complex information allows birds and mammals to learn pavlovian associations with a complex pattern in time. It is also critical in goal-directed behavior, which requires the prospective evocation of perception-response contingencies and of outcome values, as has been reported in the hippocampus and the ventral striatum (Bornstein and Daw, 2011).
Experiments in rats (Packard and Knowlton, 2002) have shown that rapid and flexible goal-directed behavior involving the hippocampus and the dorsomedial striatum can be replaced, through repetition, by a habitual behavior involving the dorsolateral striatum and corresponding to a simple stimulus-response association insensitive to reward devaluation. Since the dorsolateral striatum has no hippocampal but only cortical sensory inputs, it can be thought that slow habitual learning is constrained by the time needed to consolidate, from the hippocampus to the sensory cortex, the critical events triggering the response, as described in Sect. 4.2 below. In fact, once habits have been learned, the same experiments (Packard and Knowlton, 2002) show that goal-directed and habitual learning coexist and are in competition. A very interesting view (Penner and Mizumori, 2012) uses the actor/critic framework, in which reinforcement learning is decomposed into an actor applying the current policy (rules of behavior) and a critic learning, from prediction errors, the value of the outcomes and modifying the policy correspondingly. In that view, the dorsolateral striatum is proposed to be the actor for habitual behavior and the dorsomedial striatum the actor for goal-directed behavior. The shell, corresponding to consummatory behavior and learning explicitly the value of the outcome, is proposed to be the critic of the dorsomedial striatum, which learns explicitly the model of the world for goal-directed behavior, whereas the core, associated with preparatory behavior not specific to the outcome and learning only to associate a response with a motivational value, would be the critic of the dorsolateral striatum, which directly associates states with responses in a habitual mode.
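The actor/critic decomposition invoked above can be made concrete in a few lines: the critic updates state values from the reward prediction error delta, and the actor shifts its policy preferences by the same delta. The two-state task, learning rates and episode count below are illustrative assumptions, not claims about striatal circuitry.

```python
import math
import random

random.seed(0)
n_states, n_actions = 2, 2
V = [0.0] * n_states                                  # critic: state values
pref = [[0.0] * n_actions for _ in range(n_states)]   # actor: action preferences
alpha, beta, gamma = 0.1, 0.1, 0.9

def softmax_choice(prefs):
    """Sample an action with probability proportional to exp(preference)."""
    exps = [math.exp(p) for p in prefs]
    z = sum(exps)
    x, acc = random.random(), 0.0
    for a, e in enumerate(exps):
        acc += e / z
        if x <= acc:
            return a
    return len(prefs) - 1

def step(s, a):
    # Toy task: action 1 in state 0 leads to state 1, which always pays off
    if s == 0:
        return (1, 0.0) if a == 1 else (0, 0.0)
    return (0, 1.0)

for _ in range(2000):
    s = 0
    for _ in range(10):
        a = softmax_choice(pref[s])
        s2, rwd = step(s, a)
        delta = rwd + gamma * V[s2] - V[s]   # reward prediction error
        V[s] += alpha * delta                # critic update
        pref[s][a] += beta * delta           # actor update (policy change)
        s = s2
```

A single error signal thus trains both components, which is what makes the framework a natural reading of dopaminergic prediction errors reaching both "actor" and "critic" striatal territories.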
All these pieces of information give a very important role to the ventral striatum, at the interface between the limbic and motor systems. (Mannella et al., 2013) describes the ventral striatum as the place where motivational values are assigned to goals from their pavlovian value, given by the amygdala, and their salience and novelty, given by the hippocampus. This results in associations between outcomes and their motivational value in the shell and between responses and outcomes in the core, with the corresponding energizing effect on instrumental behavior. The dorsomedial striatum is also a key player in instrumental behavior, and its role will become clearer as more details are given about the prefrontal cortex in Sect. 4.3.
4.2 Building abstract categories
Beyond the memory of specific episodes in the hippocampus, an important innovation has been brought in mammals by the cortex, to build structured high-level information over simple signals: the elaboration of abstract categories composing a semantic memory. In the posterior cortex, such a representation is built on data flows corresponding to the sensory dimensions evoked by the four questions discussed above (cf. also Fig. 5). This results in hierarchical cortical areas with neuronal populations responding to more and more complex objects (Rousselet et al., 2004), building more and more abstract categories in the ventral information flow relating the exteroceptive and interoceptive poles, for the What and Why questions. Based on considerations of the timing of information propagation (Nowak and Bullier, 1997), the information flow is described as parallel rather than serial in the dorsal pathway, elaborating categories between the exteroceptive and motor poles, related to the questions Where and How, even if intermediate strategies are also observed in associative areas, between a purely constructivist hierarchical and a purely purposive specialized view of information processing (Norman, 2002). This intricate representation is particularly useful to account for selective attention, a function of the posterior cortex particularly critical in primates (Fix et al., 2011). Selective attention associates the selection of spatial regions and the implicit or explicit (covert or overt) involvement of body parts in the dorsal regions (for example corresponding to eye movements in the visual case) with the anticipation of the subregion of the sensory space that will become available and the focused processing of critical features in the ventral regions.
In these associative regions, one crucial (and still open) question concerns the choice of the compound objects to be represented, since the combinatorics is obviously too large for a systematic representation. This selection is made by learning and, in an ecological view, a simple (but vague) criterion is: “those which are the most useful to the organism”. A more precise specification must rely on the mechanisms triggering sensory learning in the posterior cortex (Ahissar and Hochstein, 1993), including the role of cholinergic modulation triggered by the amygdala, in case of prediction errors, to favor attentional processes in the cortex (Pauli and O’Reilly, 2008), and the role of reinstatement (or replay) of episodes in the cortex, driven by the hippocampus in the consolidation process (McClelland et al., 1995).
Another important actor in the processes described in this section is the sensory thalamus (Sherman, 2007), for the critical role of its sensory part in the activation of the posterior cortex, reconciling feed-forward sensory input and feed-back cortical expectations, and also in cortical learning of new categories, particularly those involving multimodal features. Nevertheless, it will not be described in detail in this paper. Nor will the motor thalamus be, even though it also has a critical role in the functioning of the frontal cortex presented below.
4.3 Building flexible sequences
The organization of the frontal lobe can be described in reference to the regions of the posterior cortex in which frontal regions can control transitions of states (Burnod, 1989). In the motor cortex, neurons arranged in stripes symmetrical to the somatosensory cortical area can trigger elementary motor actions modifying the body schema until the sensory goal of the action (e.g. the position of a limb, the characteristics of the sound produced by the phonatory apparatus) is reached. Motor control is also reported in the premotor cortex, with a more integrated topography (Graziano, 2006), corresponding to more ecological categories of the behavioral repertoire like climbing, reaching, etc. Similarly, in oculomotor regions like the frontal eye field FEF (Sommer and Wurtz, 2004), transitions are between initial and final targeted eye positions.
The same process of controlling a transition between a present and a targeted state can be used to describe the functions of the limbic frontal regions. In the orbitofrontal cortex, lateral and caudal regions have been described (Padoa-Schioppa, 2009) as learning the sensory features of rewarding stimuli and ordering them in a transitive way to define preferences for emotional stimuli, seen as potential goals of the behavior. This is the basis for emotional control, where the selection of a desired goal is sustained until that goal is obtained. Complementary to this consummatory behavior, built on specific sensory features, the preparatory behavior can be organized in the medial prefrontal cortex on more general properties, with the ventromedial prefrontal cortex evaluating the rewarding value of stimuli for decision making and, in the dorsal part, the anterior cingulate cortex performing motivational control (Kouneiher et al., 2009). Basically, this region, described as associating responses with outcomes, is in charge of deciding whether the energy required by the responses selected in the preparatory phase is worth the corresponding need. Accordingly, it is reported to energize the behavior, i.e. to evaluate up to which level it can be engaged and, when a strategy is selected, to maintain this selection until it is achieved (or given up). Here also, motivational control can be defined as a transition from the selection to the satisfaction of the need, with maintenance of activity while it is not achieved.
It can be remarked that the medial prefrontal cortex is structured into a ventral part, deciding on the selection of a goal from its rewarding value and current motivations, and a dorsal part, selecting the response from its cost. The dorsal part also monitors the progress of the actual behavioral sequence (Holroyd et al., 2018) and, by comparing actual and predicted costs and rewards, is able to detect errors and conflicts indicating that the current control is not adapted. This can lead to a direct adaptation of the motivational control or, when this adaptation is not trivial and requires elaborate contextual rules, the needed cognitive control recruits additional circuits in the lateral prefrontal cortex (Badre, 2008). This region, increasingly large in primates, is also distributed into ventral and dorsal regions and is reported to elaborate complex rules (useful to address a specific task), with their complexity defined as the level of sequential arrangement of responses in the dorsolateral prefrontal cortex and as the level of precision in the definition of cues in the ventrolateral prefrontal cortex (O’Reilly, 2010); both together are able to build a complex strategy, decomposing a goal and a level of engagement into subgoals and the responses to obtain them. The same principle of maintenance of activity until satisfaction (or giving up), described as a working memory process (Fuster, 1989), results in resistance to distraction, another strong characteristic of the prefrontal cortex. Altogether, it has been proposed that the anterior cingulate cortex plays here a central role (Shenhav et al., 2013) by integrating, from the orbitofrontal cortex, the ventromedial prefrontal cortex and the lateral prefrontal cortex, three different factors (respectively the value of the reward, the cost of effort and the cost of cognitive control), to determine whether, where and how much control to allocate.
Three different levels of reasoning have been proposed in mammals over the course of evolution (Koechlin, 2014), relying on different kinds of associations learned in different cortical regions. In a first stage, rodents are able to learn to select the most rewarding response from the perceived stimuli and to correct errors. In their motor loops, they learn S-R associations, called the selective model, useful in habitual behavior and for anticipating the sensory states resulting from a response. Rodents are also able, thanks to their limbic loops, to predict forthcoming outcomes and to adapt their future behavior in case of errors. These S-O associations are called the predictive model (and are similar to what is learned in pavlovian conditioning). In a second stage, thanks to their lateral PFC as evoked just above, primates can build more complex criteria to select a behavior adapted to specific contexts and to adapt their behavior to the specific situation (context) before committing errors. They accordingly build a contextual model and define rules adapted to specific contexts. In the third stage, according to (Koechlin, 2014), the frontopolar cortex allows humans to monitor several strategies in parallel and to perform hypothesis testing independently from the actual behavior, thanks to prospective mechanisms as described in Sect. 4.1.
Some generic mechanisms of the frontal cortex can now be re-interpreted. Each region of the frontal cortex has been described as preferentially linked to a posterior cortical region and composed of responses monitoring a transition from one state represented in this posterior region to another (from an initial to a final position; from a need (e.g. water deprivation) to the satisfaction of the need (satiety); etc.). This can be interpreted with the scheme S1-R-S2, where S1 is the initial condition eliciting R as a possible response (cf. (O’Regan and Noë, 2001) and the principle of affordance) and S2 is the consequence that can be anticipated if R is preactivated. Conversely, if S2 is a desired state, R is the response that has to be activated to obtain S2, which is possible if S1 is compatible with the current state. Otherwise, R can display a sustained activity, as in working memory, and remain actively waiting until S1 is satisfied. This interpretation is reminiscent of behavioral studies where the antecedents and consequences of goal-directed behaviors are seen as beliefs and desires (Balleine et al., 2009) and of more theoretical works on planning (Burnod, 1989; Pezzulo and Castelfranchi, 2009) explaining how goals (desires) can be decomposed into subgoals (S1 becomes desired) and recursively executed in such S1-R-S2 schemes. In our view, these intermediate steps with subgoals can also be provided by the hippocampus and the entorhinal cortex at various levels of description, as suggested in Sect. 4.1 for prospective memory. They are executed by cognitive control in the lPFC: the goal remains active as a working memory in the mPFC and activates subgoals and the means (which can be seen as intentions) to obtain them in the lPFC until the right conditions are met (e.g. finding the kitchen, seen as a subgoal to open the fridge), without forgetting the initial goal (of drinking a bottle of water), as ensured by the sustained activity insensitive to distraction.
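The recursive decomposition of a desired S2 into subgoals via S1-R-S2 schemes can be sketched as simple backward chaining: if the goal is not the current state, find a scheme whose consequence is the goal and treat its precondition as a new subgoal. The "drinking" scenario and all scheme names are illustrative assumptions echoing the example above.

```python
# Each scheme is a triple (S1, R, S2): precondition, response, consequence
schemes = [
    ("in_hallway", "walk_to_kitchen", "in_kitchen"),
    ("in_kitchen", "open_fridge", "fridge_open"),
    ("fridge_open", "grab_bottle", "has_water"),
]

def plan(goal, current, visited=None):
    """Return the sequence of responses achieving `goal` from `current`,
    by recursively making each precondition S1 a desired subgoal."""
    visited = visited or set()
    if goal == current or goal in visited:
        return []
    visited.add(goal)
    for s1, r, s2 in schemes:
        if s2 == goal:              # a scheme whose consequence is the goal
            return plan(s1, current, visited) + [r]
    return []

actions = plan("has_water", "in_hallway")
# actions: ["walk_to_kitchen", "open_fridge", "grab_bottle"]
```

The `visited` set plays the role of the maintained working memory of already-posed subgoals, preventing the decomposition from looping while the initial goal stays active at the top of the recursion.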
This view is very consistent with an interpretation of the role of the BG in the dynamic gating of frontal representations (O’Reilly et al., 2010), switching from the updating of the choice of the best response to be selected (from the prediction of the value of its consequence) to the maintenance of its sustained activity until this consequence is obtained. This also explains why goal-directed behavior is defined by its sensitivity to goal devaluation and to the contingencies of responses (Balleine et al., 2009): in the habitual mode, S1 directly triggers R with no consideration of S2 or of the value of the goal obtained at the end of the process, whereas in a goal-directed process, when an action is executed, its consequences are compared with its expected results and the corresponding contingencies are updated in case of a mismatch. Beyond real actions, the premotor theory of attention (Rizzolatti et al., 1987) proposes that attentional control is a weaker activation of motor control, allowing the same situations to be explored through access to the same learnt representations with no (or covert) action. The situations evoked above can consequently be examined in such a mode, corresponding to virtual thoughts instead of real actions in the world.
Globally, this heavy and structured process of the frontal (= prefrontal + premotor and motor) cortex can be summarized as follows. Exteroceptive and interoceptive stimuli can elicit response preactivations in the motor and limbic prefrontal cortex, which can also evoke the anticipated consequences in exteroceptive and interoceptive terms. In simple and stable worlds, the elaborated model of the world can attain a good predictive quality and, in the end, the initial stimuli can be sufficient to trigger responses directly, without evoking their consequences. This corresponds to the habitual mode, progressively shifting the control from the limbic to the motor loops (Hélie et al., 2015) and, in the long term, only mobilizing the motor cortex in a basic stimulus-response scheme.
Nevertheless, in the early phases of learning, or when the world is changing, or when the best behavior to be selected does not correspond to the most frequent one (for example in a specific context), a more precise analysis of the recent history of performance must be carried out, involving the limbic parts of the prefrontal cortex and of the basal ganglia. This is the reason why the dorsomedial prefrontal cortex is often reported to be involved in error detection and conflict monitoring (Rushworth et al., 2004) and the ventromedial prefrontal cortex to be sensitive to the devaluation of outcomes (Kringelbach, 2005), for example in case of reversal and extinction. The interoceptive preactivation of the limbic loops can evaluate and supervise this goal-directed learning, depending on whether gains or losses are observed between anticipated and actually obtained punishments and rewards, and results in the selection of the current goal and motivation.
In this goal-directed process, the role of the basal ganglia is prominent, as a critic in the limbic loops learning from prediction errors and as an actor explicitly triggering, step by step, the full plan of responses, as explained above. Concerning the transition between loops, note that both the ventro- and dorsomedial prefrontal cortex project to the dorsomedial striatum (Gruber and McDonald, 2012) and that the exteroceptive preactivation of the motor loop is critical to offer affordances that help select the most appropriate preparatory behavior (Pezzulo and Castelfranchi, 2009), a selection also supposed to be performed in this striatal region. The double role of dopamine (Braver and Cohen, 2000) must also be particularly emphasized here. On the one hand, dopaminergic signals carry reward prediction errors that can be used for learning, as has long been shown in reinforcement learning, with dopaminergic projections from the VTA to the ventral striatum mainly for pavlovian aspects and from the SNc to the dorsomedial striatum for the instrumental aspects (Yin et al., 2008). These pathways are also at the basis of the spiral principle by S. Haber evoked above (Haber et al., 2000), concerned with the articulation between CBG loops. On the other hand, dopaminergic projections from the VTA to the PFC participate in the modulation of performance, by acting on the gating mechanism between maintenance and updating of activity in the PFC (O’Reilly, 2006) in case of sudden changes in goal representations. This dual role of dopamine can also be seen as a dual contribution to, respectively, model-free and model-based reinforcement learning.
In a classical view (cf. for example (Dayan and Niv, 2008)), goal-directed behavior is associated with model-based reinforcement learning and habitual behavior with model-free reinforcement learning, in reference to computational learning mechanisms where the contingencies of the world useful for decision are, respectively, gathered in an explicit model of the world or cached in variables summarizing the current state. The analogy refers to the fact that cached variables offer a more compact and less expensive representation than an explicit model and are less sensitive to accumulated approximations but are, in contrast, very slow to re-evaluate and to modify when the world changes. The analogy has, however, recently been questioned (Miller et al., 2018). Model-free is not a perfect term, since a model has been built and, even if it has been compiled into cached variables, these ultimately depend on reward values; accordingly, value-free (total independence from reinforcement) would be the better term for habits. Concerning model-based learning and the manipulation of explicit knowledge, such information can come from the temporal cortex (for semantic memory) or from the hippocampus (for episodic memory). Experiments have shown that the ventromedial prefrontal cortex is a key relay for associating this information in the process of cognitive control (Vaidya and Badre, 2020).
We have evoked above the selective, predictive and contextual models learned in the motor, limbic and lateral loops of the frontal cortex by accumulating a history of occurrences of the corresponding motor, sensory and rewarding cues. Altogether, this has led to the proposal of the concept of Task Set to describe the organization of frontal regions (Sakai, 2008) and of computational mechanisms for the coordination (cognitive control) and selection (decision making) of thoughts and responses for adaptive behavior (Domenech and Koechlin, 2015). In the ideal case, a configuration of cognitive processes in these loops has been selected as the behavior adapted to the situation and will be actively maintained for subsequent task performance. In case of a problem, some processes will be adapted a posteriori and a new, more adapted configuration will be selected or possibly created. These mechanisms are reported to be compatible with the observed arrangements and activations of frontal regions (Donoso et al., 2014); they still have to be consolidated at the computational level beyond stereotyped tasks, particularly concerning the question of creativity (Domenech and Koechlin, 2015). In this view, applying the best Task Set from a previously learned repertoire suggests a strategy globally similar to model-free learning, with critical points where the strategy must be explicitly reconsidered in a model-based manner. It has to be noted that, in this insightful view, model-based and model-free approaches are cooperative rather than concurrent, thus minimizing their reported weaknesses. Such integrated architectures have already been proposed in the past in reinforcement learning (see the Dyna architecture in (Sutton, 1991)) and are presently being extended with stronger biological bases (Hassabis et al., 2017).
Taken together, we are still under the double constraint of goal-driven and stimulus-driven behaviors, with the general pre-eminence of the limbic side (O’Reilly et al., 2014) generating needs corresponding to motivations to be fulfilled. These motivations are then translated into desired goals to be obtained from the environment. Reciprocally, stimuli can pre-activate motor responses by affordance; these responses will be directly triggered in the case of habits and will otherwise simply generate predictions of what could result from the response if triggered. In a pavlovian scheme, stimuli can also preactivate anticipated rewards. From this common basis, the agranular frontal areas (motor, premotor and lateral orbitofrontal cortex) can directly make a pertinent decision if the world is stable enough. Otherwise, cognitive control is needed, with the help of the medial and lateral prefrontal cortex, inhibiting the default behavior and imposing new rules adapted to the context, by an attentional process on the posterior cortex. Within this view, behavior is seen as the control of perception, with affordances to select responses, depending on the desirability of the predicted outcomes (Pezzulo and Cisek, 2016). Interestingly, concepts like intentionality, thoughts and imagination (seen as a control of thoughts) can also be evoked in that view, as mechanisms of cognitive control; higher cognition is in fact partly elaborated on the same basic sensorimotor and motivational loops.