Arkypallidal neurons primarily project to the DLS. To determine whether GPe arky neurons project preferentially to the DMS or the DLS, we injected an anterograde virus [pAAV1-CamKII(1.3).eYFP.WPRE.hGH] into the GPe. We then examined the distribution of synaptic terminals in various output regions using confocal imaging (Fig. 1a). Interestingly, GPe arky neurons preferentially projected to the DLS compared to the DMS (Fig. 1b). Prototypical projections were also observed in the STN and SNr. Next, we injected retrobeads into the DMS or DLS and examined the retrograde signal in the GPe and cortex (Fig. 2a). We then used the GPe cell-specific markers FOXP2 (arky) and PV (prototypic) to confirm that arky marker-labeled cells overlapped with the retrobeads in the GPe (Fig. 2b). As previously characterized, we confirmed that prefrontal cortical areas project to the DMS, with more prominent motor cortical projections to the DLS13,22,27−29. Again, we observed a predominant retrograde signal in the GPe from the retrobeads injected into the DLS. Consistent with previous estimates, most of the DLS-projecting neurons in the GPe expressed FOXP2 (~80%), but not PV15–17.
Mice exhibit goal-directed or habitual behaviors through training on random operant schedules. Based on previous findings that GPe arky neurons are important for regulating dorsal striatum-dependent behaviors20,21, we sought to determine the temporal dynamics of arky neurons during goal-directed and habitual behavior for a 20% sucrose reward using the genetically encoded calcium-sensitive fluorescent protein GCaMP630. A retrograde virus expressing GCaMP6s (AAV-hSyn1-GCaMP6s-P2A-nls-dTomato) was injected into the DLS, and we recorded the intracellular Ca2+ signal using fiber photometry during the last training sessions of the random ratio (RR; goal) and random interval (RI; habit) schedules, by which point the operant behaviors were presumably sufficiently learned (Fig. 3a-c). During magazine training (MT), both groups of mice showed reduced latency to the magazine from the first day to the last day. Both RR and RI groups showed increased nose-poke rates across sessions and an increased likelihood of choosing the active nose-poke hole over the inactive hole (Two-way RM ANOVA, p < 0.05; Fig. 3d). In the devaluation test, which compares nose poking between the valued and devalued conditions, RI-trained mice did not show a decrease in nose poking in the devalued condition, indicating habitual reward-seeking (Wilcoxon test, p > 0.05; Fig. 3e). RR-trained mice showed a reduction in nose poking in the devalued condition, indicating goal-directed reward-seeking (p < 0.05; Fig. 3e).
Arkypallidal neurons exhibit increased Ca2+ signaling during habitual (RI) reward-seeking. After confirming that mice showed goal-directed (RR) or habitual (RI) reward-seeking in the reward-devaluation task, we examined GPe arky neural activities surrounding six behavioral events: rewarded nose-poke (NP R+), unrewarded nose-poke (NP R-), rewarded magazine entry (Mentry R+), unrewarded magazine entry (Mentry R-), rewarded magazine exit (Mexit R+), and unrewarded magazine exit (Mexit R-). We aligned the Ca2+ signal data from 2 seconds before to 2 seconds after each behavioral event (4 s total, 120 frames; Fig. 4a-b). Mice showed increased GPe arky neuron activities during the RI120 task compared to the RR20 task for rewarded and unrewarded nose pokes (Two-way RM ANOVA, p < 0.05; Fig. 4c) and for rewarded magazine entry (p < 0.05; Fig. 4f). GPe arky Ca2+ signal was significantly higher in the RR20 task than in the RI120 task for rewarded magazine exit (p < 0.05; Fig. 4i). However, no effect of operant schedule was observed for unrewarded magazine entry or unrewarded magazine exit (p > 0.05; Fig. 4f, i). Specific time ranges for significant post hoc comparisons between operant schedules are presented in Supplementary Table 3. For RI mice, comparison of mean arky activities before and after each behavioral event showed a significant increase in Ca2+ signal following rewarded nose poke, rewarded magazine entry, and unrewarded magazine entry (paired t-test, p < 0.05; Fig. 4d, g). GPe arky Ca2+ signal decreased following unrewarded magazine exit (p < 0.05; Fig. 4j), with no differences for rewarded magazine exit or unrewarded nose poke (p > 0.05; Fig. 4d, g, j). For RR mice, we found a significant increase in Ca2+ signal following rewarded nose pokes (paired t-test, p < 0.05; Fig. 4d), with no changes in the other five behavioral events (p > 0.05; Fig. 4d, g, j). The degree of change was significantly higher in RI mice than in RR mice for rewarded nose poke, unrewarded nose poke, and rewarded magazine entry (unpaired t-test, p < 0.05; Fig. 4e, h).
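The peri-event alignment and the before/after comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the 30 frames/s rate is inferred from 120 frames spanning 4 s, and the function names `extract_event_windows` and `pre_post_change` are hypothetical.

```python
def extract_event_windows(signal, event_frames, fps=30, pre_s=2.0, post_s=2.0):
    """Cut one peri-event window per event from a 1-D signal.

    fps=30 is an assumption inferred from 120 frames over 4 s.
    Events whose window would run past either end of the recording are skipped.
    """
    pre = int(round(pre_s * fps))    # frames before the event (60 by default)
    post = int(round(post_s * fps))  # frames after the event (60 by default)
    windows = []
    for frame in event_frames:
        start, stop = frame - pre, frame + post
        if start < 0 or stop > len(signal):
            continue  # incomplete window, drop it
        windows.append(signal[start:stop])
    return windows


def pre_post_change(window, pre_frames):
    """Mean signal after the event minus mean signal before it."""
    before = window[:pre_frames]
    after = window[pre_frames:]
    return sum(after) / len(after) - sum(before) / len(before)
```

With the default arguments each window holds 120 samples, and `pre_post_change(window, 60)` gives a per-event "degree of change" value of the kind compared between schedules.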
Additionally, we examined whether the temporal cellular activities at the time of action-selection were stable or changed across the duration of an operant session. We used regression analysis to relate the degree of change in Ca2+ signal surrounding each behavioral event to the progression of the behavioral trial. Because the total number of behavioral events varied widely across individuals and operant schedules, we divided each trial into 10 blocks, each representing a 10% increment of that behavioral event for the session. For RI mice, the change in GPe arky Ca2+ signal progressively increased across the duration of a trial for rewarded nose poke, unrewarded nose poke, rewarded magazine entry, and unrewarded magazine entry (linear regression, p < 0.05; Supplementary Fig. 1a-d). During the RR task, arky activities increased across the trial duration only for rewarded nose poke and rewarded magazine entry (p < 0.05; Supplementary Fig. 1a, c). Overall, arky Ca2+ signal increased across trial duration at a greater rate for RI than for RR for unrewarded nose poke and rewarded magazine entry (p < 0.05; Supplementary Fig. 1b-c).
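The block normalization and regression described above can be illustrated with a short sketch. It assumes that the per-event change values are ordered by time within a session and that each block is summarized by its mean; neither detail is stated in the text, and the function names `to_blocks` and `ols_slope_intercept` are hypothetical.

```python
from statistics import mean


def to_blocks(values, n_blocks=10):
    """Average time-ordered per-event values into n_blocks consecutive blocks.

    Each block covers roughly 1/n_blocks of the events, so sessions with
    different event counts are mapped onto the same 10-point axis.
    """
    n = len(values)
    if n < n_blocks:
        raise ValueError("need at least one event per block")
    blocks = []
    for b in range(n_blocks):
        start = b * n // n_blocks
        stop = (b + 1) * n // n_blocks
        blocks.append(mean(values[start:stop]))
    return blocks


def ols_slope_intercept(y):
    """Least-squares line through the points (1, y[0]), (2, y[1]), ..."""
    x = list(range(1, len(y) + 1))
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx
```

A positive slope over the 10 blocks corresponds to a signal change that grows as the session progresses; comparing slopes between schedules would require an additional test that is not shown here.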
GPe arkypallidal neuronal activities carry sufficient information to distinguish goal-directed and habitual reward seeking. To identify whether GPe arky Ca2+ signal can distinguish which type of reward-seeking mice exhibited, we trained a support vector machine (SVM), a supervised learning model with minimal risk of overfitting and demonstrated utility in analyzing neural activity data31–34. To capture the temporal dynamics of GPe arky neurons, rather than the average value of GPe arky neuron activities, we used all the trials from the RR and RI tasks. In addition, the time window was set to 2 seconds before and after the behavioral events. Our model successfully differentiated arky neural activities between the RR and RI tasks for all behavioral events (Fig. 5a, Supplementary Table 1). Neural activities surrounding the nose-poke behavioral events were an especially strong predictor of the reward-seeking type (Post hoc Dunn’s test; Fig. 5a-b; p < 0.05 for accuracy, sensitivity, and specificity compared to other behavioral events). Together, these results indicate that GPe arky Ca2+ signal could be a predictor of action-selection underlying goal-directed and habitual reward-seeking.
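The three performance measures named above (accuracy, sensitivity, and specificity) can be computed from true and predicted schedule labels as in the sketch below. It does not train an SVM; it only shows how the measures are defined for a two-class problem. Treating RI as the positive class is an assumption, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive="RI"):
    """Accuracy, sensitivity, and specificity for a two-class prediction.

    positive="RI" is an assumption; swapping the positive class swaps
    sensitivity and specificity.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of RI trials labeled RI
        "specificity": tn / (tn + fp),  # fraction of RR trials labeled RR
    }
```

Here each element of `y_true` would be the schedule a trial came from, and each element of `y_pred` the label a classifier assigned to that trial's 4 s Ca2+ window.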
Caspase3-dependent ablation of GPe arkypallidal projections to the DLS shifts mice towards habitual behavior. To determine whether ablation of this arkyGPe→DLS circuit modulates goal-directed or habitual reward-seeking, we used a Cre-dependent caspase 3 virus, which induces cell-autonomous death with minimal toxicity to neighboring cells35–39. We bilaterally injected an mCherry-tagged retrograde virus expressing Cre recombinase (AAV-Ef1a-mCherry-IRES-Cre) into the DLS, followed by a second injection of Cre-dependent caspase-3 (pAAV5-flex-taCasp3-TEVp; or control pAAV5-Ef1a-DIO EYFP) into the GPe (Fig. 6a). We validated a significant reduction in mCherry-positive neurons in the GPe of the caspase group (unpaired t-test, p < 0.05; Fig. 7b-c). Supporting the idea that GPe arky ablation disinhibits DLS cellular activities, caspase mice had a significantly higher number of cFos-positive cells than the control group in the DLS, but not in the DMS (p < 0.05; Supplementary Fig. 2a-b).
Since previous studies have implicated GPe arky neurons in motor function20,21, we examined whether GPe arky neuron ablation resulted in motor dysfunction. In the open field test, we observed no significant differences in spontaneous locomotion or velocity, indicating that partial arkyGPe→DLS circuit ablation does not alter basic motor function (Mann-Whitney test, p > 0.05; Supplementary Fig. 2d). In the first 10 minutes, we found no significant changes in time in the center zone or in center zone entries (p > 0.05; Supplementary Fig. 2e), suggesting no observable impact on anxiety-like behavior. To assess dorsal striatum-dependent motor learning, we used an accelerated rotarod paradigm, which has previously been shown to result in DLS-dependent skill acquisition40. Both groups learned the task well, as indicated by a significant increase in latency to fall across training sessions (Two-way RM ANOVA, p < 0.05; Supplementary Fig. 2g), without significant overall group differences in latency to fall or in the change across sessions (p > 0.05; Supplementary Fig. 2g). However, for daily average latency to fall values, we found an interaction between group and day of testing (p < 0.05; Supplementary Fig. 2h). The caspase group had a shorter latency to fall on days 1 and 2 (p < 0.05; Supplementary Fig. 2h) but performed similarly on the remaining three training days, indicating that partial arkyGPe→DLS circuit ablation may slow initial motor learning without long-term effects.
During magazine training, both control and caspase mice showed reduced latency to the magazine from the first day to the last for both RR and RI training (Two-way RM ANOVA, p < 0.05; Fig. 6e), without differences between caspase and control mice in either the RR or RI group. During operant training, mice showed increased nose-poke rates on both the RR and RI schedules (p < 0.05; Fig. 6f) in both the caspase and control groups. Altogether, our results demonstrate that arkyGPe→DLS circuit ablation does not impair overall performance or acquisition rate in the operant reward-seeking task. In the devaluation test, control mice in the RR group showed a reduction in extinction-session nose pokes in the devalued state (Wilcoxon test, p < 0.05; Fig. 6g), consistent with goal-directed behavior. Interestingly, RR caspase mice exhibited habitual behavior (p > 0.05; Fig. 6g), indicating that loss of arkyGPe→DLS circuit function promotes a shift from goal-directed to habitual behavior. In contrast, RI-trained mice showed no significant differences between the valued and devalued states in either the sham control or caspase group (p > 0.05; Fig. 6h).
To determine whether this shift towards habitual behavior is specific to sucrose reward-seeking, we trained an additional set of mice with a 10% sucrose and 10% ethanol solution reward (10S10E). Similar to the sucrose reward-seeking paradigm, in the devaluation extinction test we found no difference between the valued and devalued states for caspase mice on the RR operant schedule (p > 0.05; Supplementary Fig. 3a). Likewise, neither RI caspase nor RI sham control mice showed differences between the valued and devalued states (p > 0.05; Supplementary Fig. 3b). To determine whether this shift to habitual behavior might be due to a change in motivation or valuation of the reward, or a more specific reinforcement learning process, we compared 10% ethanol preference and consumption between control and caspase mice in a 24-h continuous-access two-bottle choice paradigm. We found no difference in 10% ethanol preference or consumption (Two-way RM ANOVA, p > 0.05; Supplementary Fig. 3c) between the caspase and control mice, suggesting that habitual seeking behavior is not necessarily correlated with reward preference.
Chemogenetic activation of GPe arkypallidal neurons reduces overall seeking behaviors during valuation extinction testing. To determine whether activation of GPe arky neurons could inhibit or reverse RI habitual reward-seeking, we selectively expressed the Gq-coupled designer receptor exclusively activated by designer drugs (DREADD) in arky neurons by first injecting a retrograde virus expressing Cre recombinase into the DLS (pENN.AAV.hSyn.HI.eGFP-Cre.WPRE.SV40). Next, we injected a Cre-dependent hM3Dq DREADD virus into the GPe [pAAV5-hSyn-DIO-HM3D(Gq)-mCherry; Fig. 7a] and trained mice on an RI schedule (Fig. 7b). We confirmed DREADD expression in arky neurons via the overlap of mCherry with the FOXP2 cellular marker (Fig. 7c). During magazine training, mice showed reduced latency to the magazine from the first day to the last and increased nose-poke rates across training sessions (Two-way RM ANOVA, p < 0.05; Fig. 7d). To test the effect of arky activation on habitual behavior, C21 (1 mg/kg i.p.) was administered 30 minutes before the extinction test for the valued and devalued states. As arky neurons have been shown to inhibit both dopamine 1 receptor (D1R)- and dopamine 2 receptor (D2R)-expressing neurons in the dorsal striatum17,19−21, we sought to determine whether any behavioral changes primarily occurred via D1R- or D2R-dependent mechanisms by testing combined C21 + raclopride (D2R antagonist; 0.1 mg/kg) and C21 + SKF38393 (D1R agonist; 1.0 mg/kg) injection groups (Fig. 7b). In the devaluation test, we observed no significant differences between the valued and devalued states for the saline, C21, C21 + raclopride, or C21 + SKF injection groups (Wilcoxon test, p > 0.05; Fig. 7e). However, the C21 injection significantly reduced nose pokes and magazine entries (Dunn’s post hoc test, p < 0.05; Fig. 7f-g) compared to the saline injection. Addition of raclopride similarly reduced nose pokes, magazine entries, and magazine duration (p < 0.05; Fig. 7f-h) compared to saline injections. Interestingly, only coadministration of the D1R agonist, SKF38393, prevented C21-induced reductions in seeking behaviors (p > 0.05; Fig. 7f-h), indicating that the behavioral effects of arky activation may primarily be mediated through D1R-expressing dMSNs.