Error-related Signaling in Nucleus Accumbens D2 Receptor-expressing Neurons Guides Avoidance-based Goal-directed Behavior

Learnt associations between environmental cues and the outcomes they predict (cue-outcome associations) play a major role in behavioral control, guiding not only which responses we should perform, but also which we should avoid, in order to achieve a specific goal. The encoding of such cue-outcome associations, as well as the performance of cue-guided goal-directed behavior, is thought to involve dopamine D1 and D2 receptor-expressing medium spiny neurons (D1-/D2-MSNs) of the nucleus accumbens (NAc). Here, using a visual discrimination task in mice, we assessed the role of NAc D1-/D2-MSNs in cue-guided goal-directed avoidance of inappropriate responding. Cell-type specific neuronal silencing and in-vivo imaging revealed NAc D2-MSNs to selectively contribute to cue-guided avoidance behavior, with activation of NAc D2-MSNs following response error playing an important role in optimizing future goal-directed behavior. Our findings indicate that error-signaling by NAc D2-MSNs underlies the ability to use environmental cues to avoid inappropriate behavior.


Introduction
Goal-directed behavior can be guided not only by strategies based upon the acquisition of desirable outcomes, but also by those aimed at reducing undesirable outcomes. Indeed, we are often faced with situations in which the correct behavioral response to acquire a desired outcome is unknown or ambiguous, but prior experience of failure can be used to avoid inappropriate responses. The important role that negative outcomes play in guiding avoidance behavior has long been appreciated, from Thorndike's (1927) law of effect to the more recent proposal of loss aversion in the field of behavioral economics [1][2][3]. However, the neural mechanisms that underlie the use of such avoidance-based strategies for goal-directed behavior are still unclear, and studies to date have primarily focused on investigating the reinforcement of behaviors that directly result in rewarding outcomes [4][5][6].
Within a limbic cortico-basal ganglia-thalamo-cortical signalling loop, the nucleus accumbens (NAc), in particular the NAc Core subregion, is suggested to play an important role in goal-directed decision making via its role in linking information processing concerning outcome values with that related to goal selection [7][8][9][10]. NAc medium spiny projection neurons (MSNs), the major neuron type, are typically anatomically divided into two roughly equal subpopulations: dopamine D1 receptor-expressing MSNs (D1-MSNs) that project predominantly to the ventral pallidum (VP) and substantia nigra pars reticulata (SNr), and dopamine D2 receptor-expressing MSNs (D2-MSNs) that project predominantly to the VP [11,12].
While previous studies have demonstrated NAc D1-MSNs to be implicated in reward-related learning and D2-MSNs to be involved in aversion-related learning and behavioral flexibility [13][14][15], the specific roles that NAc D1- and D2-MSNs may play in avoidance-based behavior are less clear.
To isolate and measure the ability for avoidance-based goal-directed behavior, we designed a novel visual discrimination-based cue-guided avoidance learning (VD-Avoid) task in which mice were required to avoid an instrumental response at a visual cue known to be associated with reward omission, and instead respond at a random cue that had not previously been associated with any outcome in order to acquire a liquid reward. In a series of experiments, miniature microscope in vivo calcium imaging was used to investigate the precise activity patterns of D1- and D2-MSNs during the VD-Avoid task, while chemogenetic and time-specific optogenetic silencing of NAc D1-MSNs or D2-MSNs during the same task was used to establish whether inactivation of these two subpopulations impairs the utilization of avoidance-based behavioral strategies. Our findings indicate that while D1-MSNs are activated by rewarding outcomes and contribute to the ongoing performance of behaviors directly resulting in rewarding outcomes, D2-MSNs are activated by reward omission and are necessary for avoidance of responses resulting in non-rewarded outcomes.

Results
Mice Can Acquire Avoidance-based Goal-directed Behavior
To assess the ability for avoidance-based goal-directed behavior in mice, we created a novel avoidance-based visual discrimination task (VD-Avoid) in which mice used visual cues to determine which of two touchscreen response windows should be avoided in order to receive a liquid reward by responding at the alternate window (Fig. 1A). Following trial initiation, a visual cue was presented in each of the two response windows (Fig. 1B); one visual cue (correct cue) was randomly changed each trial (51 possible images) and resulted in delivery of a liquid reward (7 µl condensed milk at the reward magazine) following a touchscreen response, while the other visual cue (incorrect cue) was kept consistent during all trials and resulted in no reward and a 5-sec time-out following a touchscreen response. This design meant that mice could not form a cue-outcome association for the correct cue due to its random nature, but must instead rely on the cue-outcome association of the incorrect cue to guide appropriate behavior (avoidance of the known visual cue and a touch response at the unknown visual cue).
Within 14 days of training, all C57BL/6J mice were able to reach the task criterion of ≥80% correct responses in a 60-min session for two consecutive days, indicating that visual cues signalling incorrect responses are sufficient to guide avoidance-based goal-directed behavior (Fig. 1C, 1E and 1F). Indeed, as training progressed, the correct response latency gradually decreased, suggesting that mice were able to perform the task more efficiently following repeated training (Fig. 1D and 1G). On the other hand, there was no significant change in reward collection latency, suggesting that motivation to acquire the reward does not change across learning (Fig. 1D and 1H). Finally, as several studies have reported the phenomenon of post-error slowing, in which subjects tend to show a slower response subsequent to an error trial [16,17], we investigated whether the outcome of the previous trial affected the response latency on the subsequent trial. Mice showed a trend towards slower responses on trials that immediately followed a trial in which a response error was made (Fig. S1), suggesting that response latencies may also be modulated by recent history in the VD-Avoid task. Together, these data indicate the ability of mice to acquire a goal-directed avoidance of a visual cue signaling an incorrect response window, providing a framework for studying the neural mechanisms underlying avoidance-based goal-directed behavior.
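The task criterion described above (≥80% correct responses on two consecutive days) can be expressed as a short check over daily session scores. The following is a minimal sketch only: the function name `first_criterion_day` and its parameters are hypothetical and are not taken from the authors' analysis code.

```python
from typing import Optional, Sequence


def first_criterion_day(
    daily_pct_correct: Sequence[float],
    threshold: float = 80.0,    # assumed: percent correct required per session
    consecutive_days: int = 2,  # assumed: number of consecutive qualifying days
) -> Optional[int]:
    """Return the 1-based day on which the criterion is first met, or None."""
    run = 0  # length of the current streak of qualifying sessions
    for day, pct in enumerate(daily_pct_correct, start=1):
        run = run + 1 if pct >= threshold else 0
        if run >= consecutive_days:
            return day
    return None


if __name__ == "__main__":
    # Illustrative scores only, not data from the study.
    print(first_criterion_day([55, 70, 82, 78, 81, 85]))
```

Passing `threshold=75.0, consecutive_days=3` would give the alternative form of the criterion that is used for the optogenetic cohort.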
NAc D1- and D2-MSNs Control Different Aspects of Goal-directed Behaviors
Given the proposed importance of the NAc in goal-directed behavior and inhibitory control [10,18-20], we investigated whether NAc D1- and D2-MSNs contribute to performance in the VD-Avoid task. We bilaterally injected a Cre-dependent adeno-associated virus (AAV) expressing an inhibitory designer receptor exclusively activated by a designer drug (iDREADD, hM4Di [21]) fused to the fluorescent marker mCherry (AAV5-hSyn-DIO-hM4Di-mCherry), or the mCherry marker by itself (AAV5-hSyn-DIO-mCherry) for control animals, into the NAc of D1-Cre or D2-Cre mice (Fig. 2A). hM4Di iDREADDs have been established to robustly suppress neuronal activity when the artificial ligand clozapine-N-oxide (CNO) is administered [22], and have previously been demonstrated to be effective for inhibition of NAc activity [23]. Injection sites were targeted at the core region of the NAc due to the proposed importance of this subregion in goal-directed behavior [8,10,24,25] and were histologically confirmed by fluorescence microscopic observation of mCherry expression following the completion of experiments (Fig. 2B and S2). In D1-Cre mice, hM4Di-mCherry-positive axon terminals were observed in the ventral pallidum (VP) and substantia nigra pars reticulata (SNr), while in D2-Cre mice, hM4Di-mCherry-positive axon terminals were observed only in the VP but not the SNr (Fig. S3), consistent with the canonical targets of D1-MSNs and D2-MSNs in the NAc [11,12].
Following training to criterion levels of performance (≥80% correct on two consecutive days) in the VD-Avoid task, mice expressing iDREADDs in NAc D1- or D2-MSNs and control mice were intraperitoneally administered Vehicle or CNO across 4 test sessions (Fig. 2C). Chemogenetic inhibition of either NAc D1- or D2-MSNs was sufficient to transiently decrease performance in the VD-Avoid task (Fig. 2D, top). Indeed, averaging performance across the two Vehicle and CNO test sessions revealed performance to be similarly reduced in mice expressing iDREADDs in NAc D1- or D2-MSNs, but not in control mice (D1- and D2-Cre mice data were merged for the control group as no significant effect of genotype was observed) (Fig. 2D, bottom). While both NAc D1- and D2-MSN inhibition impaired performance, trial-by-trial analysis of the data revealed these two neuron populations to contribute to different aspects of the task. Chemogenetic inhibition of NAc D1-MSNs reduced the number of correct responses in trials immediately following a correct response (Fig. 2E), whereas inhibition of NAc D2-MSNs reduced correct responses in trials immediately following either correct or incorrect responses (Fig. 2F). In control mice, correct responses in trials immediately following either correct or incorrect responses were unaffected by CNO delivery (Fig. 2G). These findings suggest that while NAc D1-MSNs may be important for the maintenance of correct goal-directed behavioral strategies, NAc D2-MSNs may play a role in error-based learning to avoid an incorrect strategy. Considering that the NAc also plays an important role in motivational control [26-28], we additionally analyzed the effect of chemogenetic manipulation on motivation-related measures. iDREADD inhibition of NAc D1-MSNs, but not D2-MSNs, reduced the total number of earned rewards and increased the latency to collect the reward, indicating that activity in NAc D1-MSNs contributes to the motivation to perform the task (Fig. S4).
Finally, to further confirm the observed roles of NAc D1- and D2-MSNs in the VD-Avoid task, we additionally tested animals in a reversed visual discrimination task in which a consistent visual cue signaled the rewarded response window that should be attended to, while a randomly-assigned visual cue signaled an unrewarded response window (VD-Attend task) (Fig. S5A). Interestingly, although iDREADD inhibition of NAc D1-MSNs similarly decreased performance in the VD-Attend task, inhibition of NAc D2-MSNs had no effect on performance (Fig. S5B-S5F). Additionally, similar to the VD-Avoid task, the reduced performance observed in NAc D1-MSN-inhibited mice was found to be the result of reduced performance in trials immediately following a correct response in the VD-Attend task (Fig. S5D).
Collectively, our data suggest that NAc D1-MSNs contribute to attendance-based goal-directed behavior following correct responses, while NAc D2-MSNs are implicated in avoidance-based goal-directed behavior following response errors.
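The trial-by-trial analysis used in this section splits each trial by the outcome of the trial immediately before it. A minimal sketch of that split is given below, assuming a session is stored as an ordered list of booleans (True for a correct response); the function name `accuracy_by_previous_outcome` is hypothetical and the authors' actual analysis pipeline is not described in the text.

```python
from typing import Dict, List, Sequence


def accuracy_by_previous_outcome(outcomes: Sequence[bool]) -> Dict[str, float]:
    """Percent correct on trials that follow a correct vs. an error trial.

    `outcomes` is the ordered trial sequence of one session (True = correct).
    The first trial has no predecessor and is therefore not scored.
    """
    following: Dict[bool, List[bool]] = {True: [], False: []}
    for previous, current in zip(outcomes, outcomes[1:]):
        following[previous].append(current)

    def pct(trials: List[bool]) -> float:
        # NaN marks a condition that never occurred in the session.
        return 100.0 * sum(trials) / len(trials) if trials else float("nan")

    return {
        "after_correct": pct(following[True]),
        "after_error": pct(following[False]),
    }


if __name__ == "__main__":
    # Illustrative sequence only, not data from the study.
    session = [True, True, False, True, False, False, True]
    print(accuracy_by_previous_outcome(session))
```

Comparing these two values between Vehicle and CNO sessions corresponds to the kind of contrast reported for Fig. 2E-2G.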
NAc D2-MSNs Are Transiently Activated Following Response Errors, While NAc D1-MSNs Are Transiently Activated Following Correct Responses
To investigate the precise temporal window in which these NAc D1- and D2-MSNs contribute to goal-directed behavior in the VD-Avoid task, we next performed in-vivo calcium imaging of each of these neuron types at the single-cell level using a miniature microscope. An AAV expressing the fluorescent calcium indicator jGCaMP7f [29] in a Cre-dependent manner (AAV9-hSyn-FLEX-jGCaMP7f) was microinjected into the NAc Core of D1-Cre or D2-Cre mice, and a gradient-index (GRIN) lens was implanted above the viral injection site (Fig. 3A, 3B, S6 and S7). Neural activity was recorded in freely moving mice on the first and criterion sessions of the VD-Avoid task, and both D1- and D2-Cre mice were able to acquire the task within 1-3 weeks of training (Fig. S7). A constrained non-negative matrix factorization method for microendoscopic images (CNMFe [30]) was used to analyze the neural activities of individual NAc D1- or D2-MSNs during performance of the VD-Avoid task (Fig. 3C and 3D; Movies S1 and S2). In the criterion session, a total of 266 cells were identified in D1-Cre mice and 194 cells in D2-Cre mice. To determine which neurons' activities were modulated by correct or error responses, we performed hierarchical clustering and classified neurons into groups based upon their activity profiles (Fig. 3E, 3F and S8). The averaged activities of identified neurons during the ITI (-5 to 0 sec from trial initiation), Cue (0-2 sec from trial initiation), and Outcome periods (0-5 sec from a response) were then compared between correct and error trials to identify the time window during which activity was altered for each neuron type (Fig. 3F).
Moreover, the proportion of neurons activated by error responses (Type II) was significantly higher in the D2-MSN than in the D1-MSN population, while the proportions of neurons with correct preference (Type III and Type IV; neurons inhibited by error responses and those activated by correct responses, respectively) were higher in the D1-MSN than in the D2-MSN population (Fig. 3F). Collectively, neural imaging of NAc D1- and D2-MSNs during performance in the criterion session of the VD-Avoid task indicates that the majority (59.8%) of D2-MSNs and a subset (30.8%) of D1-MSNs in the NAc were activated during the Outcome period of error trials, suggesting that error-related signaling by these neurons during this time window may play an important role in guiding avoidance-based goal-directed behavior following response errors. Our data also show that a larger proportion of NAc D1-MSNs than of D2-MSNs are inhibited following error responses (Type III) or activated following correct responses (Type IV), suggesting that correct-related signaling by D1-MSNs also contributes to higher performance in the VD-Avoid task.
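The three analysis windows defined above (ITI: -5 to 0 sec from trial initiation; Cue: 0-2 sec from trial initiation; Outcome: 0-5 sec from a response) can be illustrated with a short sketch that averages one neuron's trace within each window for a single trial. This is an assumption-laden illustration: the names `WINDOWS`, `window_mean` and `trial_window_means` are hypothetical, the half-open interval convention is a choice made here, and the clustering step itself is not reproduced.

```python
from statistics import mean
from typing import Dict, Sequence

# Window name -> (reference event, start in sec, end in sec), as stated in the text.
WINDOWS = {
    "ITI": ("initiation", -5.0, 0.0),
    "Cue": ("initiation", 0.0, 2.0),
    "Outcome": ("response", 0.0, 5.0),
}


def window_mean(times: Sequence[float], activity: Sequence[float],
                event_time: float, start: float, end: float) -> float:
    """Mean activity for samples with start <= (t - event_time) < end."""
    samples = [a for t, a in zip(times, activity) if start <= t - event_time < end]
    return mean(samples) if samples else float("nan")


def trial_window_means(times: Sequence[float], activity: Sequence[float],
                       initiation_time: float, response_time: float) -> Dict[str, float]:
    """Average one neuron's trace in the ITI, Cue and Outcome windows of one trial."""
    events = {"initiation": initiation_time, "response": response_time}
    return {
        name: window_mean(times, activity, events[ref], start, end)
        for name, (ref, start, end) in WINDOWS.items()
    }


if __name__ == "__main__":
    # Synthetic 1 Hz trace that steps up at the response (t = 12 s); not real data.
    times = [float(t) for t in range(20)]
    trace = [0.0 if t < 12 else 1.0 for t in times]
    print(trial_window_means(times, trace, initiation_time=10.0, response_time=12.0))
```

Averaging these per-trial values separately over correct and error trials gives the quantities that the comparison between trial types would operate on.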

Error-induced Signaling in NAc D2-MSNs Was Increased by Learning
To investigate whether outcome-specific patterns of activation in NAc D1- and D2-MSNs are shaped by learning, we next analyzed changes in the responses of individual NAc MSNs pre- and post-acquisition of the VD-Avoid task (Fig. 4A and S7) using a previously-established cell registration method [31]. A total of 239 neurons (D1-Cre, 103 pairs; D2-Cre, 136 pairs) could be identified in both the first (novice) and criterion (expert) sessions. First, we examined whether neural dynamics in clusters with clear activation in the outcome period (Type II and IV) were shaped by learning and whether there were differences in the original activity patterns between D1-MSNs and D2-MSNs. Neurons activated by error responses (Type II) in the criterion session were largely unresponsive in error trials in the first session, indicating that error activation of NAc Type II neurons in criterion sessions emerged from a non-responding population in both D1- and D2-MSNs (Fig. 4B, 4C, 4F, and 4G). On the other hand, D1-MSNs activated by correct responses (Type IV) were largely unresponsive in correct trials in the first session (Fig. 4D and 4H), while D2-MSNs activated by correct responses (Type IV) were also activated by correct responses in the first session (Fig. 4E and 4I). Next, comparing averaged neural activity in the Outcome period for all types of D1-MSNs, we found that D1-MSNs with correct preference (Type III and IV in expert) were more activated in expert than in novice mice after correct responses (Fig. S9A), while Type III neurons became inhibited after error responses (Fig. S9B). In contrast, in D2-MSNs, neurons activated by error responses (Type II in expert) became more activated in expert than in novice mice after error responses only (Fig. S9C and S9D). Interestingly, D1-MSNs activated by error responses (Type II in expert) also became more activated in expert than in novice mice after error responses (Fig. 4F and S9B), though not as much as D2-MSNs (Fig. 4G and S9D).
Taken together, D1-MSNs with correct preference (Type III and IV) became relatively more active after correct responses through learning, as responsiveness bidirectionally changed in both correct and error trials. In contrast, D2-MSNs with error preference (Type II) showed a selective change in responsiveness to error trials.
Lastly, to investigate whether the difference in calcium activity between error and correct choices is maintained over the course of learning, we calculated the area under the receiver operating characteristic curve (auROC) for the z-scored activities of individual neurons (Fig. S10). By tracking the same neurons pre- and post-acquisition of the VD-Avoid task, we found that the activity of NAc D2-MSNs, but not D1-MSNs, in expert mice was positively correlated with that in novice mice (Fig. S10), indicating that D2-MSNs signaling the difference between correct and error outcomes were relatively maintained over learning.
Nevertheless, we detected a significant number of cells that acquired discriminability between correct and error outcomes (Fig. S10) in both cell types. These results indicate that the neuronal activation of D1-MSNs and D2-MSNs after making correct choices and error choices, respectively, depends on learning. The acquisition of responsiveness by these neurons through learning is likely to contribute to the high performance in the VD-Avoid task. Given that inhibition of D1-MSNs in the NAc decreased performance on the trial following a correct trial (Fig. 2E and S5D), it could be that neuronal activation of D1-MSNs in the NAc contributes to maintaining the same strategy with confidence.
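The auROC used above measures how well a single neuron's activity separates error from correct trials. One standard way to compute it, shown here as a sketch, is the rank-based (Mann-Whitney) form: the probability that a randomly chosen error-trial value exceeds a randomly chosen correct-trial value, with ties counted as one half. Treating error trials as the positive class is an assumption made for this example, and `auroc` is a hypothetical name rather than the authors' code.

```python
from typing import Sequence


def auroc(error_values: Sequence[float], correct_values: Sequence[float]) -> float:
    """Area under the ROC curve for separating error from correct trials.

    Returns 0.5 when the two distributions are indistinguishable, values above
    0.5 when activity is higher on error trials, and values below 0.5 when it
    is higher on correct trials.
    """
    if not error_values or not correct_values:
        raise ValueError("both trial types need at least one observation")
    score = 0.0
    for e in error_values:
        for c in correct_values:
            if e > c:
                score += 1.0
            elif e == c:
                score += 0.5  # ties contribute half a win
    return score / (len(error_values) * len(correct_values))


if __name__ == "__main__":
    # Illustrative z-scored Outcome-period values, not data from the study.
    print(auroc([2.0, 3.0, 4.0], [0.0, 1.0, 2.0]))
```

Computing this value for the same registered neuron in the novice and expert sessions yields the pair of numbers whose correlation across neurons is described for Fig. S10.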
Optogenetic Inhibition of NAc D2-MSNs During the Outcome Period of Error Trials Impairs Avoidance-based Goal-directed Behavior
Our iDREADD experiment showed that the chance of making a correct response following an error trial was reduced in D2-MSN-inhibited mice but not in D1-MSN-inhibited mice (compare Fig. 2E and 2F), suggesting a selective role of D2-MSNs in error correction. In addition, our in-vivo calcium imaging experiments showed that a larger proportion of D2-MSNs were activated during the outcome period of error trials (Fig. 3 and 4). To confirm the functional importance of the outcome period for NAc D2-MSN control of avoidance-based goal-directed behavior, we optogenetically inhibited the activity of NAc D2-MSNs in a time-locked manner. We first bilaterally injected a Cre-dependent AAV expressing the light-driven outward proton pump archaerhodopsin (ArchT) [32] fused to the fluorescent marker tdTomato (AAV5-FLEX-ArchT3.0-tdTomato), or the eYFP marker (AAV5-DIO-eYFP) for control animals, into the NAc of D2-Cre mice, then implanted optic fibers directly above the NAc (Fig. 5A, 5B, and S11). This technique has previously been established to suppress the activity of ArchT-expressing NAc D2-MSNs [33]. We separated the task into 3 time periods (ITI period: the last 5 sec of the inter-trial interval (ITI); Cue period: the time from trial initiation to the response; Outcome period: 5 sec after the response; Fig. 5C) to test whether NAc D2-MSN activities in different time periods have different effects on the performance of the VD-Avoid task. The tests were performed on animals that had reached the criterion (≥80% correct on two consecutive days or ≥75% correct on three consecutive days), and light stimulation was delivered in a random order in 50% of trials. The timing of stimulation (ITI, Cue, or Outcome) was changed for each session, and the order of stimulation timing was randomized among mice (Fig. S12).
We found that optogenetic inhibition of NAc D2-MSNs during the Outcome period, but not the ITI or Cue periods, impaired performance in the VD-Avoid task (Fig. 5D-5F). Trial-by-trial analysis of Outcome period D2-MSN inhibition sessions revealed that reduced performance was primarily caused by response errors in trials immediately following a response error, rather than response errors following a correct response, indicating a reduction in error signal-induced avoidance behavior (Fig. 5G and 5H). No changes in the percentage of correct responses immediately following correct or error trials were observed when NAc D2-MSNs were inhibited during the ITI or Cue periods (Fig. S13A-S13D). Interestingly, optogenetic inhibition of NAc D2-MSNs during the Outcome period, but not the ITI or Cue periods, sped up responses in trials immediately following a response error (Fig. S14A-S14F), suggesting that activation of NAc D2-MSNs during the Outcome period of a response error contributes to the phenomenon of post-error slowing.
Taken together, these findings indicate that activation of NAc D2-MSNs during the Outcome period following response errors causally contributes to the ability for cue-guided avoidance of inappropriate behavior.
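Post-error slowing, referred to in this section, can be quantified as the difference in mean response latency between trials that follow an error and trials that follow a correct response. The sketch below assumes paired per-trial lists of outcomes and latencies; the function name `post_error_slowing` is hypothetical and the statistics applied by the authors are not reproduced here.

```python
from statistics import mean
from typing import Sequence


def post_error_slowing(outcomes: Sequence[bool], latencies: Sequence[float]) -> float:
    """Mean latency after error trials minus mean latency after correct trials.

    `outcomes[i]` is True when trial i was correct and `latencies[i]` is the
    response latency on trial i. A positive result indicates slower responding
    after errors; NaN is returned if either condition has no trials.
    """
    if len(outcomes) != len(latencies):
        raise ValueError("outcomes and latencies must have the same length")
    after_error = [lat for prev, lat in zip(outcomes, latencies[1:]) if not prev]
    after_correct = [lat for prev, lat in zip(outcomes, latencies[1:]) if prev]
    if not after_error or not after_correct:
        return float("nan")
    return mean(after_error) - mean(after_correct)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    outcomes = [True, False, True, True, False, True]
    latencies = [1.0, 1.2, 2.0, 1.0, 1.4, 1.8]
    print(post_error_slowing(outcomes, latencies))
```

Under this measure, the speeding of post-error responses reported for Outcome-period inhibition would appear as a smaller value in light-on than in light-off conditions.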

Discussion
The NAc is a key component of the basal ganglia and is thought to contribute to reward evaluation and motivational control by integrating glutamatergic and dopaminergic inputs from the cerebral cortex and ventral tegmental area (VTA), respectively [5,10,26,34,35]. However, while the importance of the NAc has been discussed in several models of behavioral control [7,8,10,36,37], a precise understanding of the role of the NAc, and its constituent cell-types, in goal-directed behavior has remained elusive. Here we established a novel visual discrimination task in mice to assess the neural mechanisms underlying cue-guided avoidance-based goal-directed behavior without the influence of reward-associated cues and revealed that NAc D1- and D2-MSNs contribute to goal-directed behavior through dissociable mechanisms. A larger proportion of NAc D1-MSNs than of D2-MSNs were inhibited following error responses (Type III) or activated following correct responses (Type IV), and D1-MSN activity was revealed to be important for consecutively correct responding in our visual discrimination task. In contrast, the majority of NAc D2-MSNs were found to be activated by error responses, and error-related signaling in NAc D2-MSNs during the response outcome period was demonstrated to play an important role in avoidance of inappropriate behavior in the immediate future.
The NAc receives dense projections from dopamine (DA) neurons of the VTA, which are known to encode reward prediction error signals [38,39]. Upon encounter with a reward greater than that predicted from previous experience, VTA DA neuron activity, and local DA release in the NAc Core, are reported to increase, while the opposite pattern is observed when a reward smaller than that predicted is encountered [5,38,40]. Within the NAc, the excitability of D1- and D2-MSNs is likely to be bidirectionally modulated by local DA release [41,42]. Indeed, while DA binding at Gs-protein coupled D1 receptors stimulates cAMP signaling, increasing cellular excitability, binding at Gi-protein coupled D2 receptors inhibits cAMP signaling, reducing the cell's excitability [43]. Thus, the activity of NAc D1- and D2-MSNs observed in the present study corresponds with that expected according to the reward prediction error theory and the molecular properties of DA receptors, with a rewarding outcome activating NAc D1-MSNs, likely as a result of augmented local DA release, and a negative outcome activating NAc D2-MSNs, likely as a disinhibitory effect of a reduction in local DA release. These findings also support previous evidence indicating that NAc D1- and D2-MSNs play important roles in signaling reward and aversion, respectively [13,44-47]. Interestingly, studies in humans have shown that increased DA concentration in the brain following L-dopa treatment reduces the ability of participants to avoid choices that lead to negative outcomes, but does not alter the ability to learn from positive outcomes in goal-directed learning tasks [48,49].
Given that chemogenetic and optogenetic suppression of NAc D2-MSNs in our study similarly disrupted the ability of mice to avoid a behavioral response leading to a negative outcome when error signals were blocked, but not when reward signals were blocked, it could be speculated that the ability of L-dopa treatment to impair avoidance-based goal-directed behavior in humans may have been the result of DA-induced hypoactivity of NAc D2-MSNs.
The temporal localization of NAc activity to the Outcome period of behavioral responses in our study suggests that NAc D1- and D2-MSNs may be important for monitoring and updating of goal-directed behavior, rather than for action selection itself. Interestingly, previous evidence has suggested that optogenetic stimulation of the dorsomedial striatum (DMS) during Cue presentation biases action selection [50], supporting models proposing the DMS to act as an actor and the NAc to act as a critic in action selection and action evaluation, respectively [51,52]. The NAc forms part of a limbic processing loop that has been reported to converge with associative/cognitive and motor processing loops, involving the DMS and dorsolateral striatum (DLS), respectively, at the level of the SNr [53,54]. This circuitry provides a mechanism through which information about action values, provided by the limbic loop, can be integrated with information about current goals, mediated by the associative/cognitive loop, to dynamically control goal-directed behavior, as has been suggested by recent computational models [10,36].
An interesting finding of our in-vivo imaging data was that, while ensembles of NAc D2-MSNs activated by the negative outcome (reward omission) of a response error remained consistent throughout task learning, ensembles of NAc D1-MSNs activated by the rewarding outcome (reward delivery) of a correct response changed across task learning. A potential explanation for these activity patterns may be that NAc signalling does not simply encode the value of a specific outcome, but rather encodes more complex information about the value of outcomes associated with specific cues. Thus, it could be speculated that the random nature of the reward-associated cue, but not the non-reward-associated cue, in our avoidance-based visual discrimination (VD-Avoid) task resulted in the changing patterns of activated NAc D1-MSNs, but not NAc D2-MSNs. Alternatively, it is possible that rewarding outcomes are signalled by more general, summated patterns of activity than negative outcomes. Indeed, recent evidence suggests that NAc D1-MSN activity controls generalized Pavlovian learning of cue-outcome associations, while NAc D2-MSNs contribute to the ability to discriminate between Pavlovian cues [5]. A further surprising finding of our in-vivo imaging study was that, contrary to the canonical role of NAc D1-MSNs in reward signaling [13,44,45,55,56], a small subset of NAc D1-MSNs were found to be activated by reward omission. These data hint at heterogeneous functionality of NAc D1-MSN subpopulations and suggest the potential existence of NAc D1-MSNs responsive to negative outcomes, as has recently been reported in dorsal striatal D1-MSNs [57].
Future studies investigating the precise activity patterns of ensembles of NAc D1-and D2-MSNs in response to a variety of rewarding and negative stimuli will likely help to fully elucidate the roles of these neural populations in value signaling.
Finally, impairments in the ability for response control are associated with risk-taking behaviors in drug addiction and attention-deficit/hyperactivity disorder [49,58-60]. A previous study showed that D2-MSN activation in the NAc decreased risky choices in risk-seeking rats, suggesting that D2-MSN activity in the NAc is important for avoiding risk-taking behavior [61]. Our results largely support their data and extend them to show that NAc D2-MSN activity is involved in suppressing the action associated with a negative outcome as well as a risky choice. Additionally, their data showing that the probability of making a risky choice decreases on the next trial after failing to obtain a reward, and that optogenetic activation of NAc D2-MSNs reduces risky behavior, are in agreement with our findings. Repeated cocaine treatment has also been shown to reduce the frequency of miniature excitatory postsynaptic currents in D2-MSNs [62]. This study fits with a model in which increased excitability of D2-MSNs promotes a strategy of avoiding a bad option, while decreased excitability of D2-MSNs leads to an inability to avoid a bad option [49,56,63]. These bidirectional effects on strategy support our hypothesis that activation of D2-MSNs plays an important role in avoiding a bad option.
In conclusion, we provide evidence that NAc D2-MSNs contribute to the ability to use environmental cues to guide avoidance behavior. Moreover, activation of D2-MSNs in the NAc by response errors plays an important role in the execution of a strategy to avoid an undesirable response. These findings indicate that modulating the neural activity of D2-MSNs in the striatum, including the NAc, with D2 receptor-selective drugs may be beneficial for the treatment of disorders associated with an impaired ability for inhibitory control, such as drug addiction and attention-deficit/hyperactivity disorder [49,58-60]. In addition, our findings suggest that the VD-Avoid task is a useful paradigm for investigating the neural mechanisms that underlie avoidance-based goal-directed behavior.

Animals
Wild-type C57BL/6J mice (male, 8-10 weeks old) were used for validation of behavioral experiments. For DREADD, optogenetic, and calcium imaging experiments, male heterozygous D1-Cre (FK150Gsat) and D2-Cre (ER44Gsat) mice were used (DREADD, D1-Cre, n = 12, one mouse was excluded due to no viral expression; D2-Cre, n = 12, one mouse was excluded due to unstable behavior; optogenetics, D2-Cre, n = 19, 2 mice were excluded due to insufficient conditioning; calcium imaging, D1-Cre, n = 3; D2-Cre, n = 4, one mouse was excluded because of incorrect GRIN lens placement). D1-Cre and D2-Cre mice were maintained on a C57BL/6J background. Animals were housed on a 12-hour light/dark cycle. Behavioral studies were conducted during the light cycle. Mice were kept on water restriction during behavioral testing. For all behavioral experiments except the calcium imaging experiments, mice were group-housed throughout the experiments. For calcium imaging experiments, mice were singly housed after GRIN lens implantation. All experiments conformed to the guidelines of the National Institutes of Health experimental procedures, and were approved by the Animal Experimental Committee of Institute for Protein Research at Osaka University (approval ID 29-02-1).

Stereotaxic Surgery
All mice used in this study were anesthetized with ketamine (100 mg/kg) and xylazine (20 mg/kg) for surgical procedures and placed in a stereotaxic frame (Kopf Instruments).
For calcium imaging experiments, heterozygous D1-Cre or D2-Cre mice were unilaterally injected with 1200 nl of AAV9-FLEX-jGCaMP7f (9.6 × 10^12 GC/ml, Addgene) using a Nanoject III instrument (Drummond) at a rate of 100 nl/min (coordinates in mm: AP +1.20, ML ±1.25 from bregma, and DV −3.60 and −3.10 from brain surface). The injection pipette remained in place for 5-10 min to reduce backflow. After virus injection, a sterile 21-gauge needle was slowly lowered into the brain to a depth of −2.0 mm from the brain surface to aspirate brain tissue above the NAc. A GRIN lens (600 µm diameter, Inscopix) was slowly lowered into the brain to a depth of −3.20 mm from the brain surface using a GRIN lens holder (Inscopix). We secured the GRIN lens to the skull with dental cement (Superbond). A silicone elastomer (Kwik-Cast; World Precision Instruments) was applied to the top of the lens to prevent external damage. Four to six weeks after lens implantation, a baseplate (Inscopix) attached to the miniature microscope (nVista; Inscopix) was positioned above the GRIN lens. The focal plane was adjusted until blood vessels could be clearly observed. After adjustment, the baseplate was secured in place with dental cement.

Behavioral Experiments
Apparatus.
Training and testing were conducted in a Bussey-Saksida touchscreen chamber (Lafayette Instrument). A black plastic mask with 2 windows (70 × 75 mm², spaced 5 mm apart, 16 mm above the floor) was placed in front of the touchscreen. ABET II and WhiskerServer software (Lafayette) were used to control the operant system and collect data.
Pretraining.
In the first phase (3 days), mice were habituated to the chamber in 40-min sessions. Diluted condensed milk (7 µl, Morinaga Milk) was dispensed in the reward magazine every 10 sec. In the following phase (1 day), a stimulus was randomly displayed in 1 of the 2 windows. After a 30-sec stimulus presentation, the milk reward (20 µl) was delivered with a tone (3 kHz) and the inside of the magazine was illuminated. When mice collected the reward, the magazine light went out, and the next trial commenced (60 trials, or up to 60 min) with a new stimulus after a 20-sec intertrial interval (ITI). In the next phase, stimuli were randomly displayed in 1 of the 2 windows, and mice were required to touch the stimulus to receive a reward. In the final phase of pretraining, mice were punished with a 5-sec time-out when they touched a blank window. After reaching criterion (77% correct for 2 consecutive days), mice moved on to basic training.
Basic training. Mice were tested 5-6 days per week (60 trials per day, or up to 60 min). Each trial was initiated after mice nose-poked in the magazine. Visual cues were presented until mice responded at either window.
For the VD-Attend task, two visual cues (marble and a random image) were presented on the touchscreen. The random image was pseudorandomly chosen from 51 images. If the mouse responded to the correct (marble) visual cue, a milk reward (7 µl) was delivered with a tone (3 kHz) and the magazine was illuminated. When mice collected the reward, the magazine light went out, and the next trial commenced (60 trials, or up to 60 min) with a new stimulus after a 20-sec intertrial interval (ITI). If the mouse responded to the incorrect (random) visual cue, the mouse was punished with a 5-sec time-out (house light on).
For the VD-Avoid task, two visual cues (flag and a random image) were presented on the touchscreen. If the mouse responded to the correct (random) visual cue, a milk reward (7 µl) was delivered with a tone (3 kHz) and the magazine was illuminated. If the mouse responded to the incorrect (flag) visual cue, the mouse was punished with a 5-sec time-out (house light on).
A response at a random image was recorded as a correct response, while a response to visual cue "Flag" was recorded as an incorrect response.
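The response rules of the two tasks can be summarized in a short sketch. This is an illustration only, not the ABET II schedule used in the study: the names `TASKS`, `Outcome`, and `score_trial` are hypothetical, and any cue other than the fixed cue (marble or flag) is treated as the random image.

```python
from dataclasses import dataclass

TIMEOUT_S = 5   # time-out after an incorrect response (from the text)
REWARD_UL = 7   # milk reward volume in basic training (from the text)

# For each task: the fixed cue, and whether touching it is the correct response.
# The dictionary layout is a hypothetical convenience, not part of the protocol.
TASKS = {
    "VD-Attend": {"fixed_cue": "marble", "touch_fixed_is_correct": True},
    "VD-Avoid": {"fixed_cue": "flag", "touch_fixed_is_correct": False},
}


@dataclass
class Outcome:
    correct: bool
    reward_ul: int
    timeout_s: int


def score_trial(task: str, touched_cue: str) -> Outcome:
    """Score one trial; any cue other than the fixed cue counts as the random image."""
    rule = TASKS[task]
    touched_fixed = touched_cue == rule["fixed_cue"]
    correct = touched_fixed == rule["touch_fixed_is_correct"]
    if correct:
        return Outcome(correct=True, reward_ul=REWARD_UL, timeout_s=0)
    return Outcome(correct=False, reward_ul=0, timeout_s=TIMEOUT_S)
```

Written this way, the two tasks differ only in whether the fixed cue must be approached (VD-Attend) or avoided (VD-Avoid), while reward and time-out contingencies are identical.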
After reaching criterion (>80% correct for 2 consecutive days), mice moved on to the test phase (DREADD experiments) or to cable habituation (optogenetic experiments).
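As a minimal sketch, the criterion of >80% correct on 2 consecutive days can be checked as follows. The function name `reached_criterion` and its parameters are hypothetical, and the strict inequality is an assumption based on the ">" in the text.

```python
def reached_criterion(daily_percent_correct, threshold=80.0, n_days=2):
    """Return the 1-based day on which criterion is first met, or None.

    Criterion: percent correct strictly above `threshold` on `n_days`
    consecutive days (assumed reading of ">80% for 2 consecutive days").
    """
    run = 0  # current streak of above-threshold days
    for day, pct in enumerate(daily_percent_correct, start=1):
        run = run + 1 if pct > threshold else 0
        if run >= n_days:
            return day
    return None
```

For example, `reached_criterion([70, 85, 78, 82, 90])` returns 5, because the streak started on day 2 is broken on day 3.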
For DREADD experiments, vehicle on days 1 and 3, or CNO (1.0 mg/kg diluted with vehicle, Sigma Aldrich) on days 2 and 4, was intraperitoneally administered 30 min before the session.
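The alternating injection schedule and the per-animal CNO amount can be written out as a small sketch. The helper names are hypothetical, the 25 g body weight in the usage note is an illustrative value rather than a measurement from the study, and how sessions were pooled for analysis is not stated here, so the grouping below is only one possible arrangement.

```python
CNO_DOSE_MG_PER_KG = 1.0  # from the text

# Test day -> treatment, as described in the text.
SCHEDULE = {1: "vehicle", 2: "CNO", 3: "vehicle", 4: "CNO"}


def cno_mass_mg(body_weight_g: float) -> float:
    """CNO mass (mg) for one injection at 1.0 mg/kg."""
    return CNO_DOSE_MG_PER_KG * body_weight_g / 1000.0


def group_by_treatment(score_by_day):
    """Collect one mouse's session scores by treatment, in day order."""
    grouped = {"vehicle": [], "CNO": []}
    for day, score in sorted(score_by_day.items()):
        grouped[SCHEDULE[day]].append(score)
    return grouped
```

For a hypothetical 25 g mouse, `cno_mass_mg(25.0)` gives 0.025 mg per injection.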
For optogenetic inhibition experiments, once performance had stabilized (>80% correct for 2 consecutive days) with the fiber optic cables attached, optogenetic stimulation experiments commenced. LED power was set to 1-3 mW. The stimulation schedule was counterbalanced.
For calcium imaging experiments, data were acquired at 20 Hz with the LED at 0.6 mW in the first session of basic training (Novice) and in the session after reaching criterion (the criterion session; Expert). After acquisition, calcium recording files were temporally (factor of 2) and spatially (factor of 4) downsampled and motion-corrected using Inscopix Data Processing software ver 1.
Sections were mounted with antifade mounting medium with DAPI (Vectashield). Stitched images were acquired using a Keyence BZ-X800 microscope.
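The temporal (factor of 2) and spatial (factor of 4) downsampling applied to the calcium recordings can be illustrated with a NumPy sketch. This is a stand-in that assumes simple bin averaging; it is not the Inscopix Data Processing implementation, whose exact method is not described here, and `downsample_movie` is a hypothetical name.

```python
import numpy as np


def downsample_movie(movie, t_factor=2, s_factor=4):
    """Bin-average a (frames, height, width) movie in time and space."""
    t, h, w = movie.shape
    # Trim each axis so it divides evenly into bins.
    t, h, w = t - t % t_factor, h - h % s_factor, w - w % s_factor
    trimmed = movie[:t, :h, :w]
    # Split every axis into (n_bins, bin_size) and average over the bin axes.
    binned = trimmed.reshape(t // t_factor, t_factor,
                             h // s_factor, s_factor,
                             w // s_factor, s_factor)
    return binned.mean(axis=(1, 3, 5))


# Toy movie: 4 frames of 8 x 8 pixels (real recordings are far larger).
movie = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)
small = downsample_movie(movie)

acquisition_hz = 20                  # from the text
effective_hz = acquisition_hz / 2    # frame rate after temporal downsampling
```

With a temporal factor of 2, data acquired at 20 Hz have an effective frame rate of 10 Hz, and each spatial dimension shrinks by a factor of 4.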
Data are presented as mean ± SEM. Boxes and whiskers show the respective median and the 25th to 75th and 10th to 90th percentiles. Clustering types were defined by criterion session data.
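The summary statistics named above (mean ± SEM, and the median with the 25th to 75th and 10th to 90th percentiles) can be computed as in the following sketch. The function names are hypothetical, and the percentile uses linear interpolation between order statistics, which is one common convention; the statistics software used in the study may define percentiles differently.

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def percentile(values, p):
    """p-th percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    rank = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def box_summary(values):
    """Median plus the box (25th, 75th) and whisker (10th, 90th) limits."""
    return {p: percentile(values, p) for p in (10, 25, 50, 75, 90)}
```

Note that `mean_sem` needs at least two values, because the sample standard deviation is undefined for a single observation.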

Supplementary Files
This is a list of supplementary files associated with this preprint.
SupplementaryMovieS1Correct.mp4
SupplementaryMovieS2Error.mp4