Angiotensin blockade enhances motivational reward learning via enhancing striatal prediction error signaling and frontostriatal communication

Adaptive human learning relies on reward prediction errors (RPEs) that scale the difference between expected and actual outcomes to optimize future choices. Depression has been linked with biased RPE signaling and an exaggerated impact of negative outcomes on learning, which may promote amotivation and anhedonia. The present proof-of-concept study combined computational modeling and multivariate decoding with neuroimaging to determine the influence of the selective competitive angiotensin II type 1 receptor antagonist losartan on learning from positive and negative outcomes, and the underlying neural mechanisms, in healthy humans. In a double-blind, between-subjects, placebo-controlled pharmaco-fMRI experiment, 61 healthy male participants (losartan, n = 30; placebo, n = 31) underwent a probabilistic selection reinforcement learning task incorporating a learning and a transfer phase. During learning, losartan improved choice accuracy for the hardest stimulus pair by increasing expected value sensitivity towards the rewarding stimulus relative to the placebo group. Computational modeling revealed that losartan reduced the learning rate for negative outcomes and increased exploitative choice behavior while preserving learning from positive outcomes. These behavioral patterns were paralleled on the neural level by increased RPE signaling in orbitofrontal-striatal regions and enhanced positive outcome representations in the ventral striatum (VS) following losartan. In the transfer phase, losartan shortened response times and enhanced VS functional connectivity with the left dorsolateral prefrontal cortex when participants approached maximum rewards. These findings elucidate the potential of losartan to reduce the impact of negative outcomes during learning and to subsequently facilitate motivational approach towards maximum rewards during the transfer of learning.
This may indicate a promising therapeutic mechanism to normalize distorted reward learning and fronto-striatal functioning in depression.


INTRODUCTION
Human learning is driven by reward prediction errors (RPEs) that signal the discrepancy between expected and actual outcomes. Computational approaches have closely linked RPEs to dopaminergic signaling in the midbrain-striatum circuitry and to motivation and reward seeking [1,2]. Deficits in these domains, in particular amotivation and anhedonia, represent key symptoms of unipolar depression, and dysregulated RPE signaling has been proposed as a potential underlying neurocomputational candidate mechanism [3]. Specifically, within a computational reinforcement learning (RL) framework, depressed individuals showed enhanced sensitivity to negative information while concomitantly discounting positive feedback, leading to reduced learning from positive events [4,5]. On the neural level, this learning bias was often accompanied by blunted RPE signaling in the ventral striatum (VS) and reduced fronto-striatal connectivity during reward feedback [6,7]. These neural dysregulations have been associated with depressive symptom load, specifically anhedonia and persistent negative mood [8], and predict antidepressive treatment response [9]. As such, distorted learning from negative and positive outcomes may play a key role in the pathophysiology of depression and may represent a promising target for novel antidepressive treatments.
Accumulating evidence suggests that the renin-angiotensin system (RAS) plays a key role in learning. Preclinical studies in rodents and pharmacological studies in humans have utilized the selective competitive angiotensin II type 1 receptor (AT1R) antagonist losartan (LT), an approved treatment for hypertension with an excellent safety record [10,11], to modulate learning from negative or positive events [12,13]. Recent human studies have demonstrated that a single dose of LT selectively suppressed memory encoding of threatening materials [14] and accelerated threat extinction learning [15,16]. Moreover, LT specifically affected probabilistic learning from negative outcomes by reducing the degree to which participants learned from loss feedback, while leaving learning from positive outcomes unaffected [12]. An initial neuroimaging study moreover reported modulatory effects of LT on mesocorticolimbic functional connectivity during social reward and punishment processing [17].
Given the pivotal role of dopamine (DA) in RPE signaling and in the modulation of mesocorticolimbic circuits [18], these results may indicate a downstream effect of LT on DA signaling and in turn on learning from positive and negative outcomes. Support for a potential DA-mediated mechanism of action is provided by studies suggesting an important role of the AT1R in regulating central dopaminergic neurotransmission [19], high co-expression of AT1R and DA receptors [20] and evidence for functionally relevant interactions between AT1R and DA receptors in the striatum [21].
Against this background, the present proof-of-concept study combined computational modeling and functional magnetic resonance imaging (fMRI) with a preregistered, between-subjects, randomized, double-blind, placebo-controlled pharmaco-fMRI design in n = 61 healthy male participants to determine modulatory effects of LT-induced AT1R blockade on RL model parameters and the underlying neural mechanisms. We utilized a validated probabilistic selection RL paradigm with two stages: a learning phase during which participants learned to make better choices for fixed pairs of stimuli according to reward or loss feedback, and a subsequent transfer phase during which participants applied the learned optimal choice behavior to novel combinations of stimuli without feedback. Behavioral responses during learning were fit using a computational RL model to describe the dynamic learning process. Effects of LT on learning were examined by comparing RL model parameters and neural activity related to model-derived estimates of RPE. Effects of LT on learning transfer were examined by comparing choices and functional connectivity in cortico-striatal pathways when participants approached the best or avoided the worst stimulus. Based on previous literature [12,14,16,17], we predicted that LT would: 1) reduce learning from negative outcomes and increase RPE-associated signaling in the VS and its neural expression for positive outcomes during the learning phase, and 2) increase selection of the best stimulus in the context of increased fronto-striatal coupling during learning transfer.

METHODS
Participants
Seventy right-handed healthy male participants were screened according to previously evaluated enrollment criteria (see Supplementary Methods; sample size based on previous studies [15][16][17]). The study focused on male individuals to control for sex differences in response to RAS blockade [22] and menstrual cycle-dependent variations in reward processing [23]. Nine participants were excluded due to (a) poor learning (LT, n = 4; Placebo, n = 4, Fig. 1), i.e., failing to reach the choice accuracy criteria (choosing A in ≥65% of AB trials, C in ≥55% of CD trials and E in ≥50% of EF trials, in line with Frank et al., 2007), or (b) excessive head movement (LT, n = 1), leading to a final sample of n = 61 (mean ± SD, age = 20.89 ± 2.32 years).
All participants provided written informed consent. Protocols were preregistered on ClinicalTrials.gov (https://clinicaltrials.gov/ct2/show/NCT04604938), approved by the ethics committee at the University of Electronic Science and Technology of China (Approval 355) and in line with the latest Declaration of Helsinki.
Using a double-blind, randomized, placebo-controlled, between-subjects pharmacological fMRI design, participants were administered either a single oral dose of LT (50 mg) or placebo (PLC) packed in identical capsules. Capsules were dispensed by an independent researcher based on a computer-generated randomization sequence to ensure double-blinding. The reinforcement learning paradigm consisted of two subsequent phases. During the learning phase, participants were presented with one of three different pairs of six stimuli (denoted as AB, CD and EF) on each trial in a randomized order and were instructed to learn to choose the better option within each stimulus pair based on the feedback presented ('correct' or 'wrong' presented as text, indicating that 0.5 RMB or nothing was added to the total payment). To avoid choice preferences or reward associations with one particular stimulus, stimulus pairs were presented in a counterbalanced order across subjects. During the transfer phase, participants were presented with all permutations of combinations with A and B, corresponding to the stimuli with the highest and lowest reward probability, respectively, and were instructed to choose the better option according to their previous learning experience.
Fig. 1 legend (c): The probabilities of acquiring reward for pairs AB, CD and EF were 80%:20%, 70%:30% and 60%:40%, respectively, during the learning phase. The AB pair was therefore the easiest condition, while the EF pair was the hardest to learn because of the relatively equivalent reward probabilities of its two stimuli. A performance criterion (choosing A in ≥65% of AB trials, C in ≥55% of CD trials and E in ≥50% of EF trials, a similar approach to that used by Frank et al., 2007) was used to ensure successful learning; n = 4 subjects in each treatment group did not fulfill this criterion and were excluded from further analysis. The analysis of the transfer phase was conducted on trials in which A was correctly chosen or B was avoided when paired with another stimulus. BP blood pressure, HR heart rate, LT losartan, PLC placebo, fMRI functional magnetic resonance imaging, RL reinforcement learning. License: the image in 1a was designed by DinosoftLabs and obtained from Flaticon.com under the free license with attribution.
Investigators involved in data acquisition and analyses were blinded for group allocation. Prior studies suggest that single-dose LT effects on cardiovascular activity in healthy subjects manifest after 3 h [24]. Effects of the blood-brain-barrier-permeable LT on brain activity and cognitive domains were observed in the timeframe of 1.5-2.5 h after single-dose administration (which overlaps with peak plasma levels 90 min after administration and an elimination half-life of 1.5-2.5 h [25][26][27]). Consistent with this, LT was administered 90 min before fMRI acquisition. Participants first performed a reinforcement learning task (duration 30 min) followed by an emotional memory task (reported in Xu et al., 2022 [14]). To control for nonspecific effects of LT, assessments of mood, attention and memory were incorporated at baseline and after the experiment, while cardiovascular activity (i.e., blood pressure, heart rate) was measured at baseline, after drug administration and after the experiment (Fig. 1a and Supplementary Methods). To ensure double-blinding, participants were asked to guess which treatment they had received after the experiment (treatment guess χ2 = 0.40, p = 0.53, confirming successful double-blinding).

Experimental design
Probabilistic selection reinforcement learning paradigm. A validated probabilistic selection reinforcement learning paradigm was employed [28][29][30]. The paradigm consisted of two stages: an initial reinforcement learning phase and a subsequent transfer phase. During the learning phase, participants were presented with one of three different pairs of six shape stimuli (denoted as AB, CD and EF, Fig. 1c) on each trial in a randomized order. Participants were instructed to learn to choose the better option of each stimulus pair based on feedback (Fig. 1b). Learning difficulty varied across the stimulus pairs in terms of reward contingency (80%:20%, 70%:30% or 60%:40% for AB, CD and EF, respectively). A total of 240 trials, dispersed across two fMRI runs with 120 trials each (40 trials per stimulus pair; mean trial duration 4 s), was presented during the learning phase. Each trial began with a fixation cross presented for a jittered interval of 0 ms, 500 ms, 1000 ms or 1500 ms (Fig. 1b), followed by the presentation of two shapes displayed to the left and right of the fixation cross (side counterbalanced). Stimuli were presented until participants made a response or for a maximum of 1700 ms. The choice was visually confirmed by highlighting the chosen shape in yellow for 300 ms, followed by 400 ms of feedback presentation ('correct' or 'wrong'). Then, the fixation cross was displayed again until the whole trial duration was reached. In addition, 12 null trials without stimulus presentation and of the same duration were randomly interspersed in each fMRI run to improve model fitting of the rapid event-related fMRI design.
For the transfer phase, the six shape stimuli were recombined to constitute fifteen stimulus pairs. Each stimulus pair was presented 8 times (side was counterbalanced) leading to 120 trials in the transfer phase, also with 12 null trials interspersed in a random order. Duration of each trial was 1700 ms and no feedback was provided (Fig. 1b). Participants were told to choose the better option in each stimulus pair according to what they had learned in the learning phase.
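The trial and feedback structure described above can be sketched as a small simulator (a minimal illustration of the stated reward contingencies and trial counts; function and variable names are ours, not from the study's code):

```python
import random

# Reward probability of the *better* option in each fixed pair (learning phase)
PAIRS = {"AB": 0.80, "CD": 0.70, "EF": 0.60}

def feedback(pair, chose_better, rng=random):
    """Return 1 ('correct', 0.5 RMB added) or 0 ('wrong') for one choice.

    The better option is rewarded with the pair's probability; choosing
    the worse option is rewarded with the complementary probability.
    """
    p = PAIRS[pair] if chose_better else 1.0 - PAIRS[pair]
    return 1 if rng.random() < p else 0

def learning_run(rng=random):
    """One fMRI run of the learning phase: 40 trials per pair, shuffled (120 trials)."""
    trials = [pair for pair in PAIRS for _ in range(40)]
    rng.shuffle(trials)
    return trials
```

Under this schedule, consistently choosing the better option of the EF pair is only rewarded on 60% of trials, which is why EF is the hardest pair to learn.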
Computational modeling of learning behavior. We modeled trial-by-trial choice behavior using a Q-learning algorithm [31]. The Q-learning algorithm has been widely employed to model learning behavior and describes the change in choice behavior based on trial-by-trial updates of the expected value of the choice options [30,32,33]. The corresponding model contains three free parameters: learning rates for positive (αGain) and negative (αLoss) RPEs and an estimate of the explore-exploit tendency (β). For details of the modeling procedures, model evaluation and comparison see Supplementary Methods and Supplementary Fig. S1.
Statistical analyses on the behavioral level. All analyses were performed in R (R Core Team, 2017). For the learning phase, we employed a Bayesian linear multilevel model to analyze trial-by-trial choice behavior using the brms package in R (version 2.16.1) [34]. Main effects of treatment, stimulus pair and fMRI run, as well as the interaction of stimulus pair and treatment, on choice accuracy (the proportion of choosing the better stimulus within a stimulus pair, e.g., choosing A in the AB pair) were considered credibly different when more than 95% of the posterior distribution was above/below zero. Treatment effects on computational modeling indices of choice behavior (learning rates, explore-exploit tendency) were examined using two-sample t tests via the stats package (version 4.0.5) in the R environment.
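The core of the three-parameter Q-learning model (αGain, αLoss, β) can be sketched in a few lines (a minimal illustration; variable names are ours, not from the original model code):

```python
import math

def choice_prob(q_chosen, q_other, beta):
    """Softmax probability of picking the 'chosen' option. beta is the
    explore-exploit (inverse temperature) parameter: higher beta yields
    more exploitative, value-consistent choices."""
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))

def update_q(q, reward, alpha_gain, alpha_loss):
    """One Q-learning update with outcome-dependent learning rates.

    The reward prediction error (RPE) is reward - q. Positive RPEs are
    weighted by alpha_gain and negative RPEs by alpha_loss, so the two
    rates separately govern learning from positive and negative outcomes.
    Returns the updated expected value and the RPE.
    """
    rpe = reward - q
    alpha = alpha_gain if rpe > 0 else alpha_loss
    return q + alpha * rpe, rpe
```

In this formulation, the reported LT effect of a reduced αLoss corresponds to smaller value updates after negative feedback, while updates after positive feedback are unchanged.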
For the transfer phase, we performed similar analyses for trials including the stimuli with the highest (A, 80%) and lowest (B, 20%, Fig. 1c) reward probability to examine LT effects on choosing the best (A) and avoiding the worst (B) option. An exploratory model additionally examined effects of LT on choice times, with choice (choose A, avoid B) and treatment (LT, PLC) as fixed factors and subject as random factor. Main effects of treatment and choice behavior and their interaction were considered significant using the same 95% posterior distribution criterion (for details see Supplementary Methods). We additionally explored the effects of treatment and reward probability (value) difference (e.g., AB-60, AD-50, AF-40) on choice accuracy and reaction time during learning transfer (details in Supplementary Methods) to control for a potential confounding impact of probability difference levels on treatment effects. We found no significant main effect of treatment and no interaction between treatment and reward probability difference (Supplementary Fig. S2, Supplementary Tables S1 and S2), suggesting rather specific treatment effects on the easily learned stimuli (i.e., A, B) during learning transfer.
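The 95% posterior criterion used above reduces to checking how much posterior mass lies on one side of zero; a minimal sketch (posterior samples would come from the fitted Bayesian model):

```python
def prob_positive(posterior_samples):
    """Proportion of posterior samples above zero for an effect of interest."""
    return sum(s > 0 for s in posterior_samples) / len(posterior_samples)

def is_credible(posterior_samples, level=0.95):
    """Directional criterion described in the text: an effect is treated as
    credible when more than `level` of the posterior lies above zero
    (positive effect) or below zero (negative effect)."""
    p = prob_positive(posterior_samples)
    return p > level or p < 1 - level
```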
MRI acquisition, preprocessing and first level analysis. MRI data were acquired on a 3.0-T GE Discovery MR system (General Electric Medical System, Milwaukee, WI, USA) and preprocessed using standard procedures in SPM 12 (Statistical Parametric Mapping; http://www.fil.ion.ucl.ac.uk/spm/; Wellcome Trust Centre for Neuroimaging) (see Supplementary Methods). Separate general linear models (GLM) were designed for the learning and transfer phase.
For the learning phase, we established a GLM that incorporated separate outcome onsets for positive and negative feedback, each modulated by the corresponding RPE estimated from the computational model. The highlight period and six head motion parameters were included as covariates of no interest. Given that choice accuracy in both treatment groups rapidly reached ceiling (irrespective of treatment, participants rapidly showed optimal choice accuracy, i.e., >85.30%, across stimulus pairs with different reward probabilities in the second run, Supplementary Fig. S3), the first and second fMRI runs were modeled separately, and analyses focused on the first run to increase sensitivity for learning-associated treatment effects.
During the transfer phase, approach A and avoid B choices were modeled as separate conditions, and the six head motion parameters were included as nuisance regressors.
Examining neural effects of LT on RPE signaling during early learning. To examine the effects of LT on RPE signaling during early learning, the corresponding first level contrasts (i.e., Positive+Negative RPE) were subjected to voxel-wise two sample t tests. Whole brain analyses thresholded at cluster level family-wise error (FWE) corrected p < 0.05 were employed (initial cluster threshold, p < 0.001 uncorrected; see recommendations in Slotnick, 2017 [35]).
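The construction of an RPE-modulated regressor for such a GLM can be sketched as follows (a schematic with a simplified double-gamma HRF, not the SPM implementation; timing values are arbitrary):

```python
import math

def hrf(t):
    """Simplified canonical double-gamma HRF (peak around 5 s, late undershoot)."""
    if t <= 0:
        return 0.0
    g = lambda x, a: x ** (a - 1) * math.exp(-x) / math.gamma(a)
    return g(t, 6.0) - g(t, 16.0) / 6.0

def rpe_regressor(onsets, rpes, n_scans, tr=2.0):
    """Parametric modulator: mean-centered RPE amplitudes placed at outcome
    onsets, convolved with the HRF and sampled once per TR."""
    mean_rpe = sum(rpes) / len(rpes)
    regressor = [0.0] * n_scans
    for onset, rpe in zip(onsets, rpes):
        amplitude = rpe - mean_rpe  # mean-centering, as for parametric modulators
        for scan in range(n_scans):
            regressor[scan] += amplitude * hrf(scan * tr - onset)
    return regressor
```

Voxels whose time series load positively on this regressor are those whose response scales with the trial-by-trial RPE, which is the quantity compared between treatment groups.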
Effects of LT on feedback-sensitive neural expressions in the VS during early learning. Given the higher sensitivity of multivariate neurofunctional representations of a given mental process, including reward and RPEs in the VS [36,37], a multi-voxel pattern analysis (MVPA) was employed (see Supplementary Methods). We initially developed a decoder on the whole-brain neural pattern that differentiated positive versus negative outcomes during early learning and tested it in an independent sample to validate the brain systems most strongly involved in differentiating reward versus loss. Next, treatment effects on the corresponding expression in the VS were examined (details in Supplementary Methods). The VS region of interest (ROI) included the ventral caudate and nucleus accumbens as defined by the Brainnetome atlas [38] and functionally validated in our previous work [39,40].
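The decoding step can be illustrated with a toy stand-in for the actual decoder (here a nearest-centroid classifier with leave-one-out cross-validation on simulated voxel patterns; the study's decoder, features and validation scheme differ):

```python
import random

def loocv_accuracy(patterns, labels):
    """Leave-one-out cross-validated decoding accuracy using a
    nearest-centroid classifier over multi-voxel patterns."""
    n = len(patterns)
    correct = 0
    for i in range(n):
        train = [(p, l) for j, (p, l) in enumerate(zip(patterns, labels)) if j != i]
        centroids = {}
        for lab in set(labels):
            members = [p for p, l in train if l == lab]
            centroids[lab] = [sum(vox) / len(members) for vox in zip(*members)]
        dists = {lab: sum((a - b) ** 2 for a, b in zip(patterns[i], c))
                 for lab, c in centroids.items()}
        correct += min(dists, key=dists.get) == labels[i]
    return correct / n

# Simulated 20-voxel patterns: 'positive outcome' trials shifted up, 'negative' down
rng = random.Random(42)
patterns = [[m + rng.gauss(0.0, 0.5) for _ in range(20)]
            for m in [1.0] * 15 + [-1.0] * 15]
labels = ["pos"] * 15 + ["neg"] * 15
accuracy = loocv_accuracy(patterns, labels)
```

Above-chance cross-validated accuracy is the evidence that a region (e.g., the VS) carries a distinct multivariate representation of positive versus negative outcomes.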
Examining neural effects of LT on optimal choice behavior during learning transfer. Effects of LT on the transfer of optimal choice behavior were examined by means of separate voxel-wise two sample t-tests for choosing A (best option) or avoiding B (worst option), respectively. We employed whole brain analyses thresholded at cluster level FWE corrected p < 0.05 (initial cluster threshold, p < 0.001 uncorrected).
Functional connectivity analysis during learning transfer. Given that animal and human studies indicate that reinforcement learning is critically mediated by functional communication between the VS and frontal regions [41,42], we explored whether LT treatment would affect frontal-VS functional connectivity when participants approached the maximum reward (approach A) or avoided the lowest reward (avoid B) during the transfer phase. Treatment effects on frontal-VS functional networks were determined by performing voxel-wise two-sample t tests on approach-A and avoid-B events. Within the Brainnetome-atlas-defined prefrontal cortex, results were thresholded at p < 0.05 FWE corrected at peak level with small volume correction (SVC). In addition, we explored the role of the VS in perceiving reward probability (value) differences in an exploratory voxel-wise two-sample t test with treatment as independent variable and VS activation for each value difference level as dependent variable.
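At its core, a seed-based connectivity contrast rests on the correlation between the seed (VS) time series and a target time series, typically Fisher-z transformed before group comparison; a minimal sketch (toy data, not the actual condition-specific connectivity pipeline):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def fisher_z(r):
    """Fisher r-to-z transform; the z values are what get compared between
    treatment groups (e.g., LT vs PLC) with a two-sample t test."""
    return 0.5 * math.log((1 + r) / (1 - r))
```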

RESULTS
Demographics and potential confounders
The LT (n = 30) and PLC (n = 31) groups were comparable with respect to sociodemographics and mood and cardiovascular indices arguing against nonspecific treatment effects (Table 1; all ps > 0.10).
LT increases choice accuracy for the hardest stimulus pair by increasing value sensitivity during early learning
Choice accuracy indicates the proportion of trials on which subjects chose the option with the higher reward probability within a stimulus pair (e.g., choosing A in AB). We observed a significant main effect of stimulus pair (β = 0.19, 95% highest-density interval (HDI) [0.16, 0.22], Supplementary Fig. S4a), such that participants exhibited the highest choice accuracy for the easiest stimulus pair.
The main effect of treatment did not reach significance (β = 0.02, 95% HDI [−0.02, 0.06], Supplementary Fig. S4b), but the main effect of fMRI run (β = 0.12, 95% HDI [0.11, 0.13], Supplementary Fig. S4c) was significant. Further inspection revealed that participants in both treatment groups showed an increasing learning trend with high choice accuracy (all >85.30%) across stimulus pairs with different reward probabilities in the second run (Supplementary Fig. S3). This may indicate that factors not related to probabilistic learning per se, such as an understanding of the task structure or the reward probabilities, led to improved performance in the second run irrespective of treatment. Such a ceiling effect can lead to biased estimates and critically reduce the sensitivity to detect treatment effects on the trial-wise feedback-dependent learning process under examination (see also ref. [43] for treatment effects on between-run learning performance). The treatment effect was confirmed by a robust treatment × stimulus pair interaction in the first run (Supplementary Fig. S5). To increase the sensitivity to determine learning-related treatment effects, all subsequent behavioral and neural analyses consequently focused on the early learning phase (run 1).
LT reduces the learning rate for negative outcomes during early learning
In line with our hypothesis, LT significantly reduced the learning rate for negative outcomes relative to PLC, while the learning rate for positive outcomes did not differ between groups. Moreover, losartan enhanced exploitative decisions in comparison to the placebo group. (Figure legend: error bars denote standard error of the mean; n.s. non-significant, *p < 0.05, **p < 0.01. PLC placebo, LT losartan.)

LT increases RPE signaling during early learning
We initially examined brain regions that scaled with positive and negative RPEs independent of treatment. A corresponding voxel-wise one-sample t test confirmed previous studies suggesting that activity in striatal and frontal regions linearly increases with the strength of the RPEs (Supplementary Fig. S6). Comparing the treatment groups revealed that LT increased RPE signaling in orbitofrontal-striatal regions (Fig. 4a). Examination of extracted parameter estimates (spherical masks, radius 6 mm) revealed that these regions signaled positive but not negative RPEs under PLC, whereas LT further enhanced positive RPE signaling and induced negative RPE signaling in these regions (Fig. 4b). To further explore whether treatment differentially affected RPEs in the VS, we employed an independent mask from the Brainnetome atlas [38]. The use of an independent mask alleviates potential bias in post hoc statistics [44], and the VS was chosen due to its critical involvement in RPEs. Analyzing extracted estimates from the VS further confirmed that losartan increased VS activation for both positive and negative RPEs (details in Supplementary Fig. S7).
LT sharpens differential neural representations for positive vs negative outcomes in the VS during early learning
We initially established an accurate whole-brain multivariate predictive pattern for classifying positive and negative outcomes (accuracy 89.34%; sensitivity and specificity 88.52% and 90.16%, respectively; Fig. 5b). Applying thresholding (10,000 bootstrap samples) and multiple comparisons correction (false discovery rate [FDR] corrected, p < 0.001) revealed that a network including the VS, ventromedial prefrontal cortex, dorsomedial prefrontal cortex and middle frontal gyrus strongly contributed to the prediction of positive or negative outcomes during early learning (Fig. 5a; for validation in an independent dataset see Supplementary Fig. S8). Based on our a priori regional hypothesis about the crucial role of the VS in reward learning, we examined effects of LT on VS neural representations of positive outcomes. Only following LT, but not PLC, did the VS expression accurately differentiate positive from negative outcomes (LT: accuracy = 78.33%, p < 0.001, sensitivity = 0.83, specificity = 0.73; PLC: accuracy = 56.45%, p = 0.37, sensitivity = 0.55, specificity = 0.58), with a direct comparison between the treatment groups indicating that LT specifically enhanced the VS representation of positive outcomes.
LT facilitates approach of the maximum reward during learning transfer
During the transfer phase, the LT group responded faster than the PLC group when choosing the best option A (Fig. 6a), reflecting facilitated approach of the previously learned best option following LT. Furthermore, we explored how previous learning experience supported optimal choice behavior by conducting correlation analyses between learning rates and choice behavior during transfer. The learning rate for negative outcomes was not significantly correlated with choice accuracy or response times for choosing A or avoiding B across treatment groups.
However, in the LT group, subjects with a higher learning rate for positive outcomes during initial learning showed faster responses when approaching the maximum reward (A) and when avoiding the worst stimulus (B) during learning transfer (Fig. 6b).
On the neural activation level, we observed no treatment effects of LT on transfer-related regional activation, nor on VS activity for the individual value difference levels (all ps > 0.05, Supplementary Fig. S9). On the level of functional connectivity, however, LT increased functional connectivity between the VS and the left dorsolateral prefrontal cortex (dlPFC; peak MNI x, y, z = −48, 22, 28; t(59) = 5.15, k = 197, pSVC-FWE-peak = 0.01, Fig. 6c), reflecting an LT-induced enhancement of VS-dlPFC communication when approaching maximum rewards during learning transfer.

DISCUSSION
The present pharmacological study utilized computational modeling in combination with fMRI to examine the effects of transient LT-induced AT1R blockade on reinforcement learning and its underlying neural mechanisms in healthy individuals. On the behavioral level, LT facilitated choice accuracy in the most difficult reward condition while it specifically reduced learning from negative outcomes and enhanced exploitative choice behavior. On the neural level, the behavioral effects were paralleled by region-specific effects on the ventral striatal-orbitofrontal reward systems, such that LT increased RPE signaling in these regions and sharpened the fine-grained neurofunctional distinction between positive and negative outcomes in the VS. During learning transfer, LT facilitated approach of the maximally rewarding option and enhanced VS-dlPFC functional connectivity. Overall, these findings indicate that LT attenuated learning from negative feedback in the context of preserved positive outcome learning and increased motivation to obtain maximum rewards during subsequent learning transfer, which on the neural level was accompanied by enhanced RPE signaling and functional communication in fronto-striatal circuits.
We found that LT specifically enhanced choice accuracy for the most difficult condition, suggesting that LT specifically improved learning under a low reinforcement probability. Computational modeling additionally allowed a more fine-grained examination of the behavioral effects by fitting trial-by-trial learning behavior and revealed that LT specifically reduced the learning rate for negative outcomes and enhanced exploitative choices. Optimal learning ability can only be understood when the learning rates and the other free parameters of the RL model (e.g., the explore-exploit tendency) are considered simultaneously together with the reward schedule [45]. Effects of LT on the learning rate were outcome dependent, such that LT specifically decreased learning from negative outcomes, indicating an attenuated influence of negative information on reinforcement learning. Within the context of a stable reinforcement schedule, it is adaptive for an agent to ignore relatively rare and potentially misleading negative feedback, given that oversensitivity to negative outcomes would cause suboptimal choice behavior. A decreased negative learning rate may therefore signal a relatively strong approach tendency towards positive outcomes under a stable reward contingency, which in turn may facilitate an exploitative choice tendency in terms of consistent decisions for options with a higher expected reward [33]. Previous studies demonstrated that enhancing central dopaminergic activity increases choices towards monetary gains [1,46]. The current pattern of results may reflect modulatory effects of RAS blockade on dopaminergic neurotransmission, given that LT has been shown to induce stronger D1 receptor expression [47], which has been associated with better reward-associative learning [48].
These findings resonate with recent studies reporting an LT-induced enhancement of learning from positive relative to negative events [12] as well as an LT-induced shift from preferential social punishment towards social reward processing [17]. Together, this pattern of effects suggests that LT can attenuate the impact of negative information, thus promoting motivation to select rewarding options.
On the neural level, LT increased orbitofronto-striatal RPE signaling and induced a more distinct neural expression of positive outcomes in the VS. VS dopamine neurons are critically involved in RPE signaling and reward seeking [49], while the OFC is strongly implicated in the computation of expected reward values and RPEs [50]. An LT-induced enhancement of the neural RPE signal and of the representation of rewarding outcomes may reflect the potential of the RAS to modulate central dopaminergic neurotransmission during reinforcement learning. The AT1R is densely expressed in dopamine-rich brain areas [20], particularly in the striatum [51], and plays a key role in dopaminergic function [52]. Administration of an AT1R antagonist modulates D1 receptor expression in the striatum [47] and modulates the functional response of the D2 receptor [21]; both dopaminergic receptors exhibit dense expression in ventral striatal and prefrontal regions crucially involved in reward learning [53,54]. This may indicate a potential downstream effect of LT-induced AT1R blockade on DA signaling, in turn modulating reward learning within orbitofronto-striatal circuits and thus enhancing RPE encoding and reward representation in these regions.
Fig. 6 Behavioral and neural effects of losartan during the transfer phase. a In the transfer phase, all participants responded quickly when choosing stimulus A or avoiding stimulus B, and relative to the placebo group, the losartan group exhibited faster responses for approaching stimulus A in a novel environment. b Moreover, for losartan-treated participants, higher learning rates for positive outcomes in the initial learning phase were related to faster responses towards the maximum reward (A) or when avoiding the worst option (B) in the transfer phase, while this association was absent in the placebo group.
c For illustration purposes, parameter estimates were extracted from a spherical (radius 6 mm) ROI in the identified left dorsolateral prefrontal cortex (dlPFC) region. Losartan increased functional coupling between the ventral striatum (VS) and the left dlPFC when participants chose the A stimulus in the transfer phase. The statistical map of the left dlPFC was thresholded at p < 0.001 uncorrected (whole-brain level) for display purposes. The error bars denote standard error of the mean. *p < 0.05, ***p < 0.001. LT losartan, PLC placebo, dlPFC dorsolateral prefrontal cortex.
During subsequent learning transfer, LT facilitated approach towards the maximum reward, as reflected in accelerated decisions. This effect was not linked with the role of the VS in value difference processing but was observed in the context of enhanced functional coupling of the VS with the left dlPFC. Faster decisions when choosing the best options following LT may reflect an increased motivation to focus on maximizing rewards after reinforcement learning. These findings partly align with early research on dopaminergic modulation of reinforcement learning, which reported improved motivation for the highest-rewarding option during transfer, an effect that was explained as a dopamine-dependent enhancement of learning signals [55]. The important role of fronto-striatal connectivity in reinforcement learning has been extensively documented [41], indicating that reward associations initially formed in the striatum are subsequently used to guide learning and decisions engaging the dlPFC [56]. The dlPFC plays an important role in integrating and transmitting reward representations to the mesolimbic and mesocortical dopamine systems to initiate reward-motivated behaviors [57]. Reduced striato-dlPFC connectivity has been observed in disorders characterized by a dysfunctional dopamine system [58] and has been linked with impaired reinforcement learning [59]. As such, the present finding of an LT-induced increase in VS-dlPFC connectivity when approaching rewards might reflect a modulatory role of angiotensin signaling in fronto-striatal communication via effects on dopaminergic circuits.
Given the repeatedly observed hypersensitivity to negative information and its increased impact on learning in depression [60], the current pattern of behavioral effects may reflect a potential of LT to normalize biased processing in depression and in turn improve motivational deficits. This therapeutic potential is further supported by early animal models suggesting a crucial role of the RAS in depression [61, 62] and documenting potential antidepressant behavioral effects of LT [63, 64]. Initial studies aimed at remediating reward processing and reinforcement learning impairments in depression by directly targeting the dopaminergic system [65, 66]. These studies revealed initially promising evidence for a therapeutic potential of DA agonists in depression, including normalized neural functioning in fronto-striatal reward systems [65, 66] and improvement of anhedonia [5]. However, effects on impaired reward learning were not observed, and the clinical utility of DA agonists is limited by adverse effects such as impulsive behavior [67] and abuse [68]. The current pattern of results suggests that LT may represent a safe and potentially behaviorally relevant option to modulate deficient reward learning and fronto-striatal functioning in depression.
While the current study found some evidence for a novel pathway to modulate reward learning, future studies are required to: (1) determine the potential of LT to influence reward learning and associated fronto-striatal deficits in depression, and (2) uncover the detailed interaction mechanisms between the RAS and DA systems during reward learning, for example by incorporating receptor maps in combination with molecular imaging. In addition, future studies are required to determine whether the observed effects generalize to women.
Taken together, we demonstrated that AT1R blockade via LT decreased the negative learning rate while leaving learning from positive outcomes unaffected, and increased RPE signaling in orbitofronto-striatal regions as well as the neural expression of positive outcomes in the VS. During the subsequent transfer, LT accelerated choices for maximizing rewards and increased VS-dlPFC functional coupling. This pattern may reflect a promising candidate mechanism of LT as a potential treatment to normalize impaired reward learning and fronto-striatal functioning in depression.

DATA AVAILABILITY
Unthresholded group-level statistical maps are available on NeuroVault (https://neurovault.org/collections/12001/). Additional data and code related to the study are available from the corresponding author upon reasonable request. A preprint of the manuscript was archived in the bioRxiv repository (https://doi.org/10.1101/2022.03.14.484364). The present study was pre-registered on ClinicalTrials.gov (Trial name: The effects of losartan on reward reinforcement learning; Registration number: NCT04604938).