Outcome context-dependence is not WEIRD: Comparing reinforcement- and description-based economic preferences worldwide

Recent evidence indicates that reward value encoding in humans is highly context-dependent, leading to suboptimal decisions in some cases. Whether this computational constraint on valuation is a shared feature of human cognition, however, remains unknown. To address this question, we studied the behaviour of individuals from 11 countries of markedly different socioeconomic and cultural makeup, using an experimental approach that reliably captures context effects in reinforcement learning. All samples showed evidence of similar sensitivity to context. Crucially, suboptimal decisions generated by the context manipulation were not explained by risk aversion, as estimated through a separate description-based choice task (i.e., lotteries) consisting of matched decision offers. Conversely, risk aversion differed significantly across countries. Overall, our findings suggest that context-dependent reward value encoding is a hardcoded feature of human cognition, while description-based decision-making is substantially sensitive to cultural factors.

Among several features characterizing human RL, the notion of outcome (or reward) context-dependence has recently risen to prominence 16 . More specifically, a series of studies conducted mostly with Western, Educated, Industrialized, Rich and Democratic (WEIRD) populations 20 have shown that in many RL tasks, participants encode outcomes (i.e., rewards and punishments) in a context-dependent manner 21,22,23,24 . While there is not yet a consensus concerning the exact functional form of this context-dependence, the available findings overwhelmingly favour the idea that subjective outcomes are computed relatively, following some form of range normalization 25,26,27 . Such context-dependence-induced rescaling of subjective outcomes is often interpreted as a consequence of efficient information coding in the human brain 28,29 . According to this view, the rescaling stems from neurocomputational constraints akin to those observable in perceptual decision-making 30,31,32 . In accordance with this proposal of outcome context-dependence in RL as a form of efficient coding, multiple studies using similar tasks in different species have consistently found evidence of range-value adaptation, suggesting that this may be an evolutionarily stable, "hard coded" principle of brain functioning 37,38 .
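The range-normalization idea referenced above can be made concrete with a minimal sketch. This is illustrative only: the cited studies differ on the exact functional form, and the code below is not any specific author's model.

```python
# Illustrative sketch of range normalization: an outcome is rescaled
# relative to the minimum and maximum outcomes of its local context.
def range_normalize(outcome, context_min, context_max):
    """Map an objective outcome onto [0, 1] within its decision context."""
    return (outcome - context_min) / (context_max - context_min)

# A 10-point win in a 10/0 context and a 1-point win in a 1/0 context
# receive the same subjective value despite a tenfold objective gap.
big_context = range_normalize(10, 0, 10)    # 1.0
small_context = range_normalize(1, 0, 1)    # 1.0
```

Under such an encoding, the best option of a poor context can acquire a higher learned value than a mediocre option of a rich context, which is how context-dependence can produce reward-minimizing choices.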

One well-known consequence of context-dependence in RL is that, in some cases, it can induce suboptimal decisions 25,26,27 . In particular learning contexts, individuals mistakenly attribute higher subjective values to objectively worse options because of how these options are appraised relative to the local reward distribution, resulting in choices that fail to maximize reward. If there indeed exists such a fundamental computational constraint in the human brain, the behavioural signatures of context-dependence should be a stable feature of decision-making, and thus persist across different populations and cultures. In the present work, we set out to test this hypothesis by leveraging a task capable of eliciting context-dependent RL behaviours, administered to samples from 11 countries of markedly different socioeconomic and cultural makeup. This allowed us to test the cross-cultural stability of context-dependent value encoding in human RL, and thus assess for the first time its putative role as a core computational process of experience-based decision-making.

In addition, we administered to our participants a description-based decision-making task that included the same decision contexts as the RL task. The rationale behind this second task was two-fold. First, it allowed us to determine to what extent choice behaviour measured in the RL task can be explained by risk aversion, using standard procedures in behavioural economics. Second, it gave us the opportunity to compare for the first time the variability of experience-based and description-based decision-making processes across countries.

Our results indicate a remarkable similarity across countries in how context effects manifest in decisions from experience and in suboptimal choice, consistent with the idea that outcome representation in human RL may reflect conserved constraints on cognition. Our results also showed that risk aversion inferred from the description-based lottery task could not account for these effects. Interestingly, description-based decisions were also found to be highly variable across countries, further confirming the functional dissociation between the behaviours elicited by the two modalities 6,7,33 . Exploratory analyses using independent socio-economic, cultural and cognitive measures taken from our samples further showed that the origin of cross-country differences in description-based decisions is multifactorial, as previously found for risk and other cognitive domains 5,34,35 . Overall, our results suggest that reinforcement (experience-based) decision processes are much more culturally stable than description-based ones, and have important implications for theories of bounded rationality 18,19 . We conclude this work by discussing the possible implications of these results for the current implementation of policies and interventions aimed at countering the burden of biased decision-making.

Behavioural protocol
Our behavioural protocol consisted of a reinforcement learning (RL; i.e., experience-based) task, in the form of a previously validated two-armed bandit task 26 , followed by a description-based decision-making task consisting of choices between lotteries (Fig. 1A). In the RL task, participants were presented with eight abstract icon cues, each representing a lottery of non-disclosed expected value, paired in four stable decision contexts. In the Learning phase, each decision context featured only two possible outcomes: either 10/0 points or 1/0 points. The outcomes were probabilistic (75% or 25%). For convenience, contexts were labelled by the difference in expected value between the most and the least rewarding option, i.e. the expected value-maximizing ("correct") and the expected value-minimizing ("wrong") options (Fig. 1B). In the ensuing Transfer phase, these same eight lotteries were rearranged into new decision contexts [as previously done in similar designs for humans and birds 22,26,36,37,38 ]. In addition to the change in decision contexts, the key difference between the Learning and the Transfer phases was that, while during the former participants were presented with complete feedback, in the latter no feedback was provided, so that choices could only be based on values learned during the Learning phase (Fig. 1B). Finally, we conducted an additional task, which we identified as the Lottery task (Fig. 1C). There, the values (magnitudes and probabilities) of the options were explicitly disclosed. The Lottery task featured the same decision contexts used in the Transfer phase, plus four additional contexts designed to better assess risk preferences. These last contexts consisted of choices comparing varying probabilities of winning 10 points (100%, 75%, 50%, 25%) against the certainty of winning 1 point (Fig. 1D).
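The ∆EV labels used to identify decision contexts follow directly from the stated magnitudes and probabilities. The following sketch reproduces them; note that the specific ∆EV = 1.75 pairing is inferred from the possible expected values rather than stated explicitly in this excerpt.

```python
def ev(magnitude, p_win):
    """Expected value of a lottery paying `magnitude` with probability
    `p_win`, and 0 otherwise."""
    return magnitude * p_win

# Learning-phase context with 10-point outcomes: 10 @ 75% vs 10 @ 25%.
delta_learning = ev(10, 0.75) - ev(10, 0.25)   # 5.0
# Transfer context pairing 10 @ 25% against 1 @ 75% (inferred pairing):
delta_transfer = ev(10, 0.25) - ev(1, 0.75)    # 1.75
# Lottery benchmark contexts: 10 points at probability p vs a sure 1 point.
deltas_lottery = [ev(10, p) - ev(1, 1.0) for p in (1.0, 0.75, 0.5, 0.25)]
# [9.0, 6.5, 4.0, 1.5], matching the values reported in the Methods
```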
Country selection was aimed at portraying a gradual spread across the United Nations' Human Development Index 39 . This index aggregates several metrics, such as GDP, industrialization, mean education level, income inequality, and liberty indexes (Fig. 1E, left). We additionally used Muthukrishna and colleagues' cultural distance metric 40 to estimate the cultural difference between each of the selected countries with respect to the USA and India, which represented the highest and lowest HDI values in our sample, respectively (Fig. 1E, right).

In order to ensure that our samples would adequately represent the culture of the country to which they belonged, inclusion criteria required that participants: (1) had the target country nationality, (2) resided in the target country, (3) had completed at least the full basic education cycle in the target country, and (4) spoke the country's official language as their native language.

These criteria were assessed for each participant during a video meeting prior to launching the experiment. The meeting, task instructions, and questionnaires were delivered in each country's official language, by local researchers (Table S1).

Reinforcement learning task (experience-based)
We first looked at performance in both RL phases. We focused on correct responses (i.e., the probability of picking the expected value-maximizing option) as the behavioural dependent variable. Correct response rate was analysed separately in each RL phase (i.e. Learning and Transfer), as a function of decision context (within-subjects variable) and country (between-subjects variable). We also compared the correct response rate against chance level (0.5) to assess learning and preferences. As in previous studies using the same or similar designs 22,26 , of particular relevance for the demonstration of outcome context-dependence were: i) the […].

Results showed that the average correct response rate for the Learning phase was significantly different from chance level (0.5) for all countries and decision contexts (Fig. 2A), which confirmed that learning had occurred (pooled sample: ∆EV = 5, 0.8 ± 0.2, t(560) = 42, p < .0001, d(95% CI) = […]; see Table S3 for model selection and Table S4 for full regression results). While we found significant differences in aggregate performance between countries (Country main effect: χ2 = 58, DF = 10, p < .0001), learning and above-chance performance levels were observable in all samples and contexts (Fig. S2). Importantly, we did not find evidence for any magnitude effects in any of the country samples ([…] to predict accuracy as the same model without it).

We then turned to the analysis of the Transfer phase (Fig. 2B). In this case, correct choice rates […] (Table S5). Once again, while significant differences in aggregate performance between samples were found (Country main effect: χ2 = 19, DF = 10, p = .04), the evidence did […] (see Table S6 for post-hoc pairwise contrasts).

Lottery task (description-based)
We then analysed participants' preferences in the description-based Lottery task (Figs. 2C, 2D).
We first considered choices in the decision problems aimed at benchmarking risk preferences, where a sure small payoff (1 pt) was presented against risky options with varying probabilities of delivering a bigger payoff (10 pts). These four decision problems allowed us to estimate risk […] (see Table S5 for per-country t-test analyses). This indicated that preferences expressed in the description-based task were not cross-culturally stable, unlike behaviour observed in the RL task.
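The chance-level comparisons used throughout the Results (e.g., the per-country t-tests referenced in Table S5) amount to one-sample tests of per-participant choice rates against 0.5. A minimal standard-library sketch follows; this is not the authors' full mixed-model pipeline, and the example rates are made up.

```python
import math
import statistics

def ttest_vs_chance(rates, chance=0.5):
    """One-sample t-test of per-participant correct-response rates
    against chance level. Returns the t statistic and the degrees of
    freedom; a sketch, not the analysis code used in the paper."""
    n = len(rates)
    mean = statistics.fmean(rates)
    sd = statistics.stdev(rates)
    t = (mean - chance) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical per-participant correct rates for one decision context:
t, df = ttest_vs_chance([0.9, 0.8, 0.85, 0.7, 0.95, 0.75])
```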

After assessing the detectability of risk aversion in the benchmark decision contexts of the Lottery task, we analysed preferences in the decision contexts homologous to those of the Transfer phase in RL (Fig. 2D). This allowed us to directly compare experience-based and description-based preferences. We focused mainly on behaviour in the ∆EV = 1.75 decision context, where a significant tendency to choose the suboptimal option can be interpreted as a sign of context dependence in the RL task. Crucially, and contrary to RL behaviour, results showed that in all countries the correct choice rate was significantly above chance for this decision context […] (see Table S6 for post-hoc pairwise contrasts). In order to directly compare descriptive and experiential choices at the ∆EV = 1.75 context, we modelled preferences in this decision context by including an additional regressor (Decision Modality; levels: RL, Lottery). Results indicated a significant Decision Modality effect (χ2 = 216, DF = 1, p < .0001) that confirmed the difference between the two tasks.

Overall, results from the Lottery task illustrated two important points. First, we were able to detect significant across-country behavioural differences in our sample. The absence of such an effect in the RL task can thus not be ascribed to a general inability of our protocol to detect behavioural differences. Second, these findings showed that risk aversion, as inferred from preferences expressed in the Lottery task, could not account for preferences in the RL task. This was specifically true for the key ∆EV = 1.75 decision context, where we observed a clear case of preference reversal when comparing the two decision modalities 45 .

To quantify the observed decision-making strategies in a systematic manner that encompassed all decision contexts across all tasks, we formalized choice behaviour using simple models built around the notion of subjective outcome scaling. This choice was motivated by the fact that this outcome scaling process, described below, could satisfactorily and parsimoniously capture behaviour in both the RL and the Lottery decision contexts, by fitting task-specific scaling parameters (νRL and νLOT, respectively). We made sure that our fitting procedure allowed us to correctly recover the parameters in simulated datasets, as well as produce simulations that closely replicated the observed behavioural data (see Supplementary Materials for the procedure and results of simulations and parameter recovery).

Utilizing the same scaling parameter ν in both models was a crucial step in the formalization, as it allowed us to compare experiential and descriptive adaptation mechanisms in the same terms, while integrating all the possible decision contexts. We expected νRL to reflect context-dependent range-value adaptation in the RL task, and νLOT to capture marginally decreasing utility (and therefore risk aversion) in the Lottery task. It follows that νRL was expected to remain invariant across country samples, confirming that relative value-encoding occurred universally and independently of risk preferences. Conversely, we expected νLOT to differ significantly between countries, in line with the observed risk aversion behaviours in each country sample, and to be uncorrelated with νRL.
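The model equations are not reproduced in this excerpt, so as an illustration only we assume a power-law scaling u(x) = x^ν, a standard behavioural-economics choice for capturing marginally decreasing utility; with ν < 1 it produces exactly the risk-averse preference reversal described for the Lottery task.

```python
# Illustration only: power-law outcome scaling, u(x) = x ** nu.
# The paper's exact functional forms for nuRL and nuLOT are not shown
# in this excerpt; nu < 1 yields a concave utility (risk aversion).
def subjective_ev(magnitude, p_win, nu):
    """Probability-weighted subjective value of a simple lottery."""
    return p_win * magnitude ** nu

nu = 0.3
risky = subjective_ev(10, 0.25, nu)   # 0.25 * 10**0.3, about 0.50
sure = subjective_ev(1, 1.00, nu)     # 1.0
# Objectively the risky option is better (EV 2.5 vs 1), yet under the
# concave scaling the sure 1 point is subjectively preferred.
```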

As shown in Fig. 3A, scaling patterns conformed to these hypotheses. First, we found minimal evidence for differences between countries in νRL (νRL ~ Country; SS = 0.98, DF = 10, p = 0.07). We […] (Table S7).

In sum, our computational approach confirmed strong evidence for stable cross-country outcome context-dependence in the RL task, using a compact computational measure. A similar analysis performed in the Lottery task confirmed that preferences in the RL task could not be accounted for by risk aversion inferred from the Lottery task. Crucially, these results also confirmed a difference in the stability of experience- and description-based processes across countries.

In order to discard the possibility that the differences found in scaling between phases could be confounded by […]. Nonetheless, it should be noted that cultural metrics generally predicted changes in νLOT, but not in νRL, which is consistent with the robustness of RL biases to cultural factors, as well as with the gap between experiential and descriptive choices found in our main results.

In the present work, we sought to assess the cross-cultural stability of a recently discovered but […]. In addition to our RL task, we also administered a description-based task featuring the same decision contexts. This allowed us to demonstrate for the first time that risk aversion (as standardly inferred in behavioural economics from lottery tasks) could not account for behavioural signatures of context-dependence in the RL task (especially suboptimal preferences).

Further, we showed that while experience-based processes and preferences were remarkably stable across the included countries, description-based processes were not.

By replicating the finding of value context-dependence outside the WEIRD space, our work shows that this cognitive process is not likely to be a simple cultural artefact 50,51 . Of course, we acknowledge that our current sample is not diverse enough to argue for a definitive universality of contextual value encoding in RL. We also acknowledge that our samples may be neglecting […].

These differences between the RL and Lottery tasks, concerning both subjective outcome encoding and cross-cultural stability, were well recapitulated by our modelling approach. We devised a simple, parsimonious outcome-scaling process that fitted both experiential and […].

Behavioural task: there were two behavioural tasks, the Reinforcement Learning (RL) task and the Lottery task (Fig. 1A). The RL task was a direct reproduction of the probabilistic instrumental […]. In the RL task, the lotteries for each decision context were represented by abstract stimuli (cues) taken from randomly generated identicons. Identicons were generated so that hue and saturation had similar values within the HSLUV colour scheme (www.hsluv.org). In the Lottery task, cue cards displaying the reward and probability values for each option were used instead.

For all tasks, each decision context was formed by two cues, one at each side of the screen, equidistant from the screen centre. Each trial consisted of a single decision context. Stimulus location was pseudo-randomized, so that every cue would appear an equal number of times on each side of the screen.
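The side pseudo-randomization described above can be implemented in several ways. One minimal sketch follows; the function and the implementation are assumptions, with only the counterbalancing constraint taken from the text.

```python
import random

def balanced_sides(n_trials, seed=None):
    """Return a left/right assignment for one cue of a pair such that it
    appears on each side of the screen equally often. A sketch of the
    counterbalancing described above; n_trials must be even."""
    assert n_trials % 2 == 0, "need an even number of trials per context"
    sides = ["left"] * (n_trials // 2) + ["right"] * (n_trials // 2)
    random.Random(seed).shuffle(sides)
    return sides  # the paired cue takes the opposite side on each trial

# 30 presentations of one decision context (the per-context trial count
# reported in the Methods):
sides = balanced_sides(30, seed=0)
```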

In the RL task, participants had to complete a Learning phase, and then a Transfer phase 16,21,22,23,24,25,26,49 . In the Learning phase (Fig. 1B, upper), […] (Fig. 1B, lower). In the Lottery task (Fig. 1C), participants had to choose between explicit cue cards, which were paired reproducing the 4 decision contexts of the Transfer phase, and another 4 decision contexts comparing varying probabilities of winning 10 points (100%, 75%, 50%, 25%) versus the certainty of winning 1 point (∆EV = 9, ∆EV = 6.5, ∆EV = 4, ∆EV = 1.5). Neither the Transfer phase nor the Lottery task presented any post-choice feedback: choices were followed by a fixed 500 ms delay interval, after which "???" cue cards were displayed for 1000 ms. Each decision context of the RL task (4 in the Learning phase, 4 in the Transfer phase) was presented 30 times, for a total of 240 trials. Decision contexts of the Lottery task (4 reproducing Transfer, 4 benchmarking risk aversion) were presented 4 times each, for a total of 32 trials. Presentation order of decision contexts was pseudo-randomized within each phase, so that all trials of a given decision context would be clustered (i.e., "blocked" stimulus presentation) […] (Table S2).

Instructions were followed by a short training session of 12 trials, designed to familiarize participants with the response modality. Participants could decide to repeat the training session up to two times prior to starting the actual experiment. After finishing the training session, participants had to complete the RL task (Learning and Transfer), the Lottery task and the sociocultural questionnaires, in that order. The existence of the Transfer phase was not disclosed until the end of the Learning phase, to prevent the use of alternative strategies.
Crucially, before starting the Transfer phase, participants were made explicitly aware that they would be presented with the same cues they had seen during the Learning phase, but combined in different pairs. Before starting the Lottery task, participants were shown an example of a cue card with its explicit reward probability and magnitude written on it, and were again instructed to choose the option that they thought would maximize the overall point reward. Following […]