Distinct computational mechanisms underlying cognitive flexibility deficits in impulsivity and compulsivity


Cognitive flexibility, the ability to quickly adapt to changing environmental demands, is a hallmark of human behaviour and is impaired across multiple psychiatric disorders. Compulsivity and impulsivity disorders in particular have been linked to impaired adaptive learning and flexibility. Initial computational investigations suggested that these distinct psychiatric dimensions suffer from the same underlying neurocognitive impairment, related to stochasticity during choice. However, a recent advance in computational neuroscience has demonstrated that imprecision in the learning process itself can account for a large portion of the behavioural variability traditionally attributed to choice stochasticity. Here, in a series of large-scale experiments using both lab-designed and gamified citizen-science tasks, we show that distinct computational markers are affected in compulsivity and impulsivity. Whilst impulsivity is tied to an imprecision in learning across valence domains, (hygiene-related) compulsivity is linked to choice stochasticity. This double dissociation demonstrates that distinct neurocomputational mechanisms can drive seemingly similar behavioural deficits that are only dissociable using targeted computational approaches.


Introduction
The ability to learn and adjust to changes in action-outcome contingencies lies at the core of flexible behaviour and is critical for survival in dynamic environments. Imbalances in cognitive flexibility are a hallmark of "impulsive-compulsive" disorders such as obsessive-compulsive disorder (OCD), attention-deficit/hyperactivity disorder (ADHD), and substance abuse [1][2][3][4][5] .
Despite the apparent phenomenological opposition between compulsivity and impulsivity, behavioural and computational analyses of cognitive flexibility have so far found rather similar impairments, with deficits located in the choice rather than the learning phase of adaptive (reversal) learning paradigms. Compulsivity has often been linked to increased choice switching [12][13][14][15][16][17][18] , similarly to impulsivity, which has been linked to increased switching, often interpreted as exaggerated exploration [19][20][21][22] .
However, recent advances in the neurocomputational understanding of perceptual and value-guided decision-making have called previous modelling attempts into question 23,24 by identifying a new source of behavioural variability that had previously not been accounted for, and which stems from imprecision in the inference process itself 23,24 . In reward-guided learning 24 , this inference imprecision translates into imprecision in the learning process itself, i.e., random variability in feedback processing. These learning-driven imprecisions account for over 2/3 of the behavioural variability that had previously been attributed solely to choice variability 24 . They also demand a reconsideration of several behavioural phenomena, such as choice hysteresis (the repetition of previous choices, as proposed in compulsivity) and the presence of what was typically assumed to be exploratory decisions (as proposed in impulsivity) 24,25 .
In this series of large-scale computational studies, we thus examined whether and how distinct computational mechanisms can account for the cognitive flexibility impairments seen in impulsivity and compulsivity. In the first dimensional online study, we used this recently developed task and model 24 to demonstrate that impulsivity is linked to an exaggerated learning imprecision but not choice deficits, whilst compulsivity (primarily washing compulsions) is linked to imbalances in choice stochasticity but not learning. In a second experiment, we demonstrate that this double-dissociation holds true for novel forms of data collection using crowd-sourced, citizen-science smartphone apps. Our findings thus not only provide a re-interpretation of the cognitive flexibility deficit mechanisms, but also reveal that seemingly similar impairments can have distinct computational origins, which provide us with critical new information about their underlying neural mechanisms 15,20,[26][27][28][29] .

Learning imprecisions account for choice deficits in adaptive learning
To assess cognitive flexibility in compulsivity and impulsivity, we collected a large non-clinical sample (N=392; age 32.6±12.4 (mean±s.d.), range 18-72, 204F) via the Prolific online workers platform. Subjects played a restless 2-arm bandit reward learning task online, similar to the one we previously used to examine learning noise imprecisions 24 (Fig.1A-B, Methods, STable1).
Overall, subjects accumulated more rewards than chance, demonstrating they adapted their behaviour based on the received information (t(391)=36.29, p<0.0001), and their performance was comparable to the performance obtained in the previous laboratory study (N=30) 24 (Fig.1D) confirming that careful online experimentation can elicit similar data as lab-based studies 30 .
Recent work has highlighted the importance of accounting for learning errors when modelling behavioural variability in reward-guided decisions under uncertainty 24,25 .
To probe whether this also held true for our online sample, we ran a model comparison and found that indeed a 'noisy' model with learning imprecision better explained subjects' behaviour than a traditional 'exact' reinforcement learning (RL) model without any learning noise (Fig.1E; posterior model exceedance probability Pexc>0.99).
Importantly, this winning model allowed for two sources of variability: at the moment of choice (through a softmax inverse temperature parameter) and at the moment of action value update through learning imprecision (a learning noise parameter that scaled with the absolute magnitude of prediction error on each trial). This 'noisy' model also fit subjects' choices better than a model with just learning imprecisions (i.e., noisy 'argmax' model) (Fig.1E), suggesting that both sources of variability are needed to best explain behaviour.
Finally, we examined to what extent the learning noise explained variability previously attributed to choice stochasticity 31 . We found that learning noise explained 73.9±1.5% (mean±s.e.m.) of the total behavioural variability in our data (Fig.1E), in line with previous findings 24 .
This suggests that previous accounts treating choice stochasticity as the primary source of inter-individual differences may be inadequate, as much of this variability can be explained through other mechanisms.
To make sure that these associations were not driven by other covariates, we additionally controlled for age, gender and other computational model parameters, and replicated the above effects. We did not observe any relationship between the learning rate for the chosen option and either of the two psychiatric traits (both p-values>0.17), suggesting that inter-individual differences in compulsivity and impulsivity did not depend on the exact learning process.

Distinct associations of impulsivity and compulsivity when accounting for learning imprecisions
We next investigated the model parameters of the better fitting 'noisy' model, which additionally captures learning imprecisions. Interestingly, when learning noise was taken into account, choice stochasticity was no longer linked to impulsivity. We again observed no relationship between the learning rates and either psychiatric trait, suggesting that impulsivity is not associated with poorer or exaggerated learning per se but rather with more inconsistent learning over time 22,46 .

Associations with specific compulsivity and impulsivity components
Impulsivity and compulsivity as captured here are multifactorial constructs that aggregate across what are believed to be separable components of compulsivity and impulsivity 1,10,50,51 . Previous factor analyses of the BIS and OCI-R revealed multiple subscales, which characterize different facets of these endophenotypes [37][38][39] and contribute to different behavioural deficits 1,8,52,53 .
We thus assessed which components of compulsivity and impulsivity were most strongly linked to the model parameters observed above. As these factors are correlated with each other (R-values>0.3, SFig.6C-D), we conducted separate regression analyses and then corrected for multiple comparisons.
When investigating choice stochasticity, we found that this parameter was primarily linked to the compulsivity subscales of "washing" and "hoarding" (Fig.3A; STable2). Next, we investigated learning noise as the parameter primarily linked to impulsivity. We found that learning noise was specifically associated with the motor impulsiveness subscale (Fig.3B; β = 0.17 (s.e. = 0.03), p < 0.0001, αcorrected=0.004), capturing the propensity to act prematurely without foresight 37,39 (all other subscales p-values>0.05, STable2).

Cognitive flexibility in crowd-sourced smartphone app data
Online experiments, as used above, have been highly successful for studying inter-individual differences in psychiatric traits and transdiagnostic approaches 34,54,55 . Nevertheless, using online worker platforms has several limitations 30,56 . First, the participant pool is limited to registered workers whose primary interest is most often financial. Moreover, the platforms are restricted in age (adults only), language (mostly English speakers) and most often confined to a specific geographic location (here: the United Kingdom). More importantly, using laboratory tasks online does not answer the question of whether the obtained results generalize across different contexts and have sufficient ecological validity for a potential translation into clinical applications 57,58 . Thus, several research groups have turned towards studying cognition through gamified tasks and more ecologically valid settings using crowd-sourced game platforms, such as smartphone app games 30,[59][60][61][62] .
The goal of this second experiment was thus not only to replicate our findings in a more ecologically realistic sample, but also examine whether similar results could be obtained using a substantially shorter task in a less well controlled environment which is likely to be more reflective of any potential future clinical translation 57,58 .
We designed a short, gamified version of the 2-arm bandit task from experiment 1 (Fig.1C) and used it as part of the smartphone-based Brain Explorer research app (www.brainexplorer.net).
We collected an independent large convenience sample of app users (N=2610, age 41.8±15.9y (mean±s.d.), range 18-85y, 1499F; STable1) who played the game voluntarily and without reimbursement (for a comparison of the samples, cf. SFig.3-5). Subjects played a "Milky Way" game with the goal of accumulating as much space milk as possible from two space cows (i.e., bandits with drifting reward magnitudes, Fig.1C). Performance in the two experiments was comparable, but slightly lower in the smartphone sample (Fig.1D). This difference is expected, as the latter task is substantially shorter and has a much briefer training and no pre-task quiz (as used in experiment 1; also see 30 ). However, in the second sample more than 2/3 of behavioural variability was still attributable to learning noise (Fig.1F). This highlights that learning noise is of particular importance when assessing learning using smartphone apps like the one used here.

Associations with impulsivity and compulsivity
All users also completed the BIS and OCI-R questionnaires, with scores comparable to those in experiment 1 (STable1). As in experiment 1, impulsivity was associated with learning noise rather than choice stochasticity in the noisy model (Fig.2D-F). These findings confirm our previous observation that general impulsivity is exclusively linked to learning imprecisions when accounting for this mechanism.
Interestingly, unlike in the first experiment, general compulsivity was no longer linked to choice stochasticity (Fig.2).

Compulsivity and impulsivity subscales reliably linked to behaviour
In the first experiment, we identified that specific subscales were more closely associated with biases in both choice stochasticity and learning imprecisions. As these showed stronger associations with the parameters, we specifically tested whether we could find the same associations in the smartphone sample. We thus investigated whether motor impulsiveness was linked to learning noise, and whether hoarding and/or washing compulsions were linked to choice stochasticity.
We indeed confirmed the specific association between motor impulsiveness and learning noise (Fig.4B; STable3). Next, even though we did not replicate the association between choice stochasticity and overall compulsivity, we were able to replicate the association between choice stochasticity and washing compulsivity (Fig.4A; STable3). Our findings thus suggest that the specific link between choice stochasticity and compulsive washing was strong enough to remain significant in this somewhat noisier smartphone-based data set.

(Motor) impulsiveness drives learning imprecision across valence domains
Lastly, we tested whether the learning imprecision effects observed in impulsivity would generalize across valence domains, i.e., whether they would also be present in learning to avoid punishment. This is particularly relevant for impulsivity as previous studies linked impulsivity to altered cognitive flexibility and learning from punishment [63][64][65][66] .
We thus implemented a version of the same smartphone game in the loss domain. Instead of collecting points, users were instructed to learn how to avoid losing the points already won (Fig.5A). We verified that the learning noise model also better accounted for this task (Pexc=0.99) (Supplementary Information), with comparable parameters across both games (SFig.7A-D).
As in the reward domain, we replicated the positive association between general impulsivity and learning noise (Fig.5C-D, STable4, Supplementary Information). We further found that motor impulsivity was specifically linked to learning noise, suggesting that these associations are present across both valence domains.

Discussion
Reduced cognitive flexibility and adaptive learning are hallmarks of many impulsive and compulsive disorders 4 .
Noradrenaline is believed to play a key role in learning variability [70][71][72] . Previous work found that learning imprecisions were linked to indirect markers 24 of noradrenaline functioning [73][74][75] , such as activity in the dorsal anterior cingulate cortex and pupil size fluctuations. This suggestion also aligns well with the assumption that many impulsivity disorders, such as ADHD, are linked to impaired noradrenaline functioning 19,26,76,77 . However, our study cannot rule out that an increased exploration during choice - as proposed in previous studies 19,22,36,45 - is still present in impulsivity. Studies that investigate exploration in the absence of any learning 20,36,71 suggest that increased exploration might still be linked to impulsivity and reflect noradrenaline functioning 71 . Interestingly, preliminary pharmacological studies suggest that noradrenaline is linked to increased learning imprecisions rather than choice stochasticity 25,78 .
Impulsivity represents a heterogeneous symptom dimension which shows substantial variability 10,50,68,79 . Here, we observed a particularly strong association between motor impulsiveness (i.e., acting without thinking and the inability to withhold a response) and learning noise. Previous work showed that motor impulsiveness contributed to disinhibition, possibly via the noradrenaline system 52,80 , suggesting a potentially more specific link.
In contrast to impulsivity, our computational analysis of compulsivity - more specifically washing compulsivity - confirmed a reduced precision during choice, even when accounting for the learning noise. Importantly, the introduction of learning imprecisions also allowed us to reconsider other computational phenomena that have previously been attributed to compulsivity, such as choice hysteresis (or 'stickiness') 47,81,82 . Whilst choice hysteresis improved model fitting in the noise-free model and correlated positively with compulsivity, in the better-fitting noisy learning model this parameter no longer improved model fits, thus inviting a reconsideration of the previous findings [14][15][16][17] .
Our study was also a testbed for new technologies and the potential for translation and generalisation of such computational tasks. In the first study, we used a well-established paid worker platform, whilst in the second we analysed an unpaid, heterogeneous sample of smartphone users who played a short, gamified version of the same experiment. Importantly, we replicated most of the key associations in both samples, which is critical for the findings' robustness and the potential for using such tasks in clinics 57,58 . However, it should also be noted that the association between choice stochasticity and compulsivity was more variable and only robust for washing compulsions. We believe that this is due to a generally increased noisiness in the shorter smartphone-based task, which may render the choice stochasticity parameter less sensitive. Such effects on signal quality are important to consider when building cognitive probes and could arise from differences in the samples (semi-professional vs lay participants), in study length, instructions, or design. Crowd-sourced smartphone-based experiments provide unprecedented access to populations that are difficult to reach in laboratory settings (e.g., children and the elderly), but one should be mindful of the larger variability and elevated measurement noise leading to lower effect sizes 30,58 .
In this study, we demonstrate that by taking into account recent developments in understanding multiple sources of behavioural variability, we can successfully dissociate differences in cognitive flexibility linked to compulsivity and impulsivity. By testing and replicating these effects across different sample and contexts, we demonstrate how advanced computational modelling can help pinpoint distinct neurocognitive mechanisms underlying seemingly similar deficits.

Figure 1. Task structure and contributions of learning noise to behavioural variability
A. Trial structure and design of the two-armed bandit task in experiment 1. On every trial, participants made a choice between two bandits represented by coloured shapes on the screen and observed the outcome (between 1 and 99 points) for the chosen option. B. Example of reward magnitudes for the two bandits for one session in experiment 1. Reward magnitudes changed across trials and were sampled from two probability distributions with independently drifting means. Thick lines represent the drifting means of the two distributions; dotted lines represent the reward outcomes that could have been observed by the subject conditioned on the choice. These observable reward magnitudes were sampled from the distribution mean with added Gaussian sampling noise to ensure continuous learning in the task. C. Task structure of the reward learning two-armed bandit task in experiment 2 (Milky Way game in the Brain Explorer app). On every trial, participants made a choice between two space cows and then observed how much space milk they had collected in the form of points (from 1 to 99). The underlying distributions of drifting mean rewards were the same as in experiment 1, but the task was substantially shorter. D.
To analyse the subjects' overall performance in both experiments, we computed the difference between the total average reward won by each subject and the foregone reward (to account for the individually generated reward trajectories). A positive difference means that subjects were performing better than chance level. Bottom panels: fraction of behavioural variability attributed to learning noise based on the winning model in experiment 1 (left panel) and experiment 2 (right panel). In both experiments, more than 2/3 of the total decision variability was due to learning noise rather than choice stochasticity. Black dots represent the results obtained in the previous laboratory study 24 .

Figure 2. Contributions of impulsivity and obsessive-compulsive traits to choice stochasticity and learning noise.
A. In the exact model (no learning noise added), both impulsivity traits measured with the BIS total score and OC traits measured with the OCI-R total score showed a significant association with choice stochasticity (softmax parameter). B. Impulsivity traits were no longer associated with choice stochasticity when learning noise was added to the model. C. Impulsivity traits contributed to learning noise rather than choice stochasticity in the learning noise model. D-F. Same analyses as in A-C for the Brain Explorer (experiment 2) data. All regressions included age, gender, IQ (experiment 1), mental health status (experiment 2), and other model parameters as covariates. Dark bar colours indicate the replication tests based on the results from experiment 1. Error bars are standard errors, *p < 0.05, **p < 0.01, ***p < 0.001.

Figure 3. Contributions of different subscales of impulsivity and obsessive-compulsive traits to choice stochasticity and learning noise in experiment 1 (N = 392).
A. Analysis of impulsivity and compulsivity subscales shows that hoarding and washing compulsions (right panel) were most closely associated with choice stochasticity in the noisy model, whilst there was no association with any of the impulsivity subscores (left panel). B. In contrast, learning noise was exclusively associated with motor impulsiveness and no other impulsivity or compulsivity subscore. Error bars are standard errors, *p < 0.05; stars represent test results Bonferroni corrected for multiple comparisons across regression models.

Figure 4. Contributions of different subscales of impulsivity and obsessive-compulsive traits to choice stochasticity and learning noise in experiment 2 (N = 2610).
A. Choice stochasticity from the noisy model was again associated with washing compulsions (replicating the first experiment), but not with hoarding or any other subscore. B. Learning noise was again primarily associated with motor impulsiveness, as in experiment 1. No compulsivity subscale was associated with learning noise. Error bars are standard errors, *p < 0.05, **p < 0.01, ***p < 0.001. Dark bar colours indicate the replication tests based on the results from experiment 1.

Figure 5. Learning noise also linked to (motor) impulsivity in punishment learning (N = 670).
A. Trial structure for the Pirate Market game in the Brain Explorer app - a version of the "Milky Way" game in the loss domain, framed as choosing the one of two pirates who takes away less milk (hence 'Pirate Market'). On every trial, users are endowed with 100 points (a full bucket of milk) and must choose the pirate that will steal less milk from them. The amount of stolen milk is presented in points (range from -99 to -1). The game setup and the outcome sequences were analogous to the Milky Way game but in the loss domain. B. Relative average rewards (chosen - unchosen) accumulated by the same sample of users (N = 670) in the reward learning game (Milky Way, left) and in the punishment learning game (Pirate Market, right). White dots are medians; error bars are the 25th and 75th percentiles of the performance distributions. As in the reward learning context, participants performed significantly better than chance (t(669)=38.8, p<0.0001), demonstrating they understood the task and learned to choose the less punishing bandit. Performance in the two domains was positively associated (r=0.33, p<0.0001), meaning that those who performed better in the reward version also performed better in the punishment version, even though they won slightly less in the punishment version (t(669)=3.00, p=0.003, two-tailed). C. The BIS total score was associated with choice stochasticity in the exact, but not in the noisy learning model. D. In the better fitting noisy model, the BIS total score was associated with learning noise. Amongst the subscales, motor impulsiveness again showed the strongest association. *p < 0.05, **p < 0.01, ***p < 0.001.

Experiment 1.
Participants played a restless, two-armed bandit game where their goal was to maximize the number of points won that were translated into a monetary bonus at the end of the game. On each trial, participants chose one of the two coloured shapes presented to the left and to the right of the fixation point on the screen and observed an outcome (Fig.1A). Choices were made in a self-paced manner with no time restrictions, but participants were instructed to complete the task within one hour. The task was divided into two sessions of 72 trials each. Each session involved a separate pair of coloured shapes.
All participants completed a training session and had to pass the quiz prior to starting the game.
The task included continuous payoffs ranging from 1 to 99 points, which were sampled independently for each bandit from probability distributions whose properties were validated in the previous experiment 24 (Fig.1B). The mean payoff m̂_t on trial t followed a random walk process and was sampled from a beta distribution with shape parameters a = 1 + m̂_{t-1}·ν_v and b = 1 + (1 - m̂_{t-1})·ν_v. This parameterization corresponds to a mode equal to m̂_{t-1} and a spread varying monotonically with ν_v, which was fixed to 3.0. Participants did not observe these mean payoffs directly but were instead shown payoffs, rounded to the nearest integer, that were sampled from another beta distribution with shape parameters a = 1 + m̂_t·ν_s and b = 1 + (1 - m̂_t)·ν_s. The mode of this distribution on trial t corresponded to m̂_t, with a spread varying monotonically with ν_s, which was fixed to 1.5. While the parameter ν_v controlled the overall volatility of the environment (with on average one reversal every 16 trials), the parameter ν_s controlled additional instantaneous fluctuations around the mean payoff that were introduced to ensure a sufficient level of uncertainty and continuous learning throughout the session.
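For concreteness, one bandit's payoff trajectory can be simulated as follows. This is a minimal sketch: the mode-based beta parameterization and the spread values 3.0 and 1.5 follow the description above, but the symbols (nu_v, nu_s), the session length default, and the mapping of the sampled value onto the 1-99 point range are assumptions for illustration.

```python
import numpy as np

def sample_payoffs(n_trials=72, nu_v=3.0, nu_s=1.5, m0=0.5, rng=None):
    """Sketch of one bandit's payoff trajectory: a beta random walk on the
    latent mean payoff plus beta-distributed sampling noise around it.
    The shape-parameter form a = 1 + mode * nu follows the text; symbol
    names and the 1-99 point mapping are assumptions."""
    if rng is None:
        rng = np.random.default_rng()
    m = m0
    means, payoffs = [], []
    for _ in range(n_trials):
        # drift the latent mean: beta distribution with mode at previous mean
        m = rng.beta(1.0 + m * nu_v, 1.0 + (1.0 - m) * nu_v)
        # sample the observable payoff around the current mean
        p = rng.beta(1.0 + m * nu_s, 1.0 + (1.0 - m) * nu_s)
        means.append(m)
        payoffs.append(int(round(1 + 98 * p)))  # map onto 1..99 points
    return np.array(means), np.array(payoffs)
```

Because nu_s is smaller than nu_v, the trial-wise sampling noise is broader than the step-to-step drift of the mean, which is what forces continuous learning.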
To analyse the subjects' overall performance in the task, we computed the difference between the total average reward won by each subject and the foregone reward (to account for the individually generated reward trajectories). A positive difference means that subjects were performing better than choosing at chance (Fig.1D).

Experiment 2
Milky Way game. The game was a gamified version of the laboratory task used in experiment 1. As before, participants played a restless, two-armed bandit game, but instead of coloured shapes the two bandits were depicted as brown and black-and-white space cows (Fig.1C). The goal of the game was to accumulate as many points ('space milk') as possible.

Experiment 3
Pirate Market game. The game was similar to the Milky Way game, but instead of space cows that distributed "space milk", participants encountered space pirates that were stealing the milk from them. The goal of the game was to preserve as much milk as possible by avoiding the most ravenous pirate (Fig.5A). On every trial, participants were endowed with 100 points and made a choice between two pirates; they next observed how many points (between 1 and 99) the chosen pirate had stolen, and the remaining points were added to their cumulative score. For both games, the payoffs were sampled from beta distributions with the same shape parameters as in experiment 1 (Fig.1B). We verified that participants in this version also performed significantly better than chance (t(669)=38.8, p<0.0001), demonstrating they understood the task and learned to choose the less punishing bandit. Interestingly, performance in the two domains was positively associated (r=0.33, p<0.0001), meaning that those who performed better in the reward version also performed better in the punishment version, even though they won slightly less in the punishment version (Fig.5B; t(669)=3.00, p=0.003).

Computational model
To model participants' choice behaviour, we used the same models that were previously developed and validated 24 . Choice behaviour was modelled using two versions of a reinforcement learning algorithm. In the first model, we deployed a traditional exact Rescorla-Wagner 83 learning rule to update the state-action Q-values on each trial t following the action taken on trial t-1 and the obtained reward r:

Q_t = Q_{t-1} + α (r_{t-1} - Q_{t-1}),

where α is the learning rate that scales the prediction error (PE) between the obtained reward r_{t-1} and the expected reward Q_{t-1}. In this formulation, the update of the values is deterministic and depends only on the learning rate.
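The exact update rule described above amounts to a one-line function; the values in the usage comment are arbitrary illustrative numbers, not task data.

```python
def rw_update(q, r, alpha):
    """Exact Rescorla-Wagner update: move the value estimate q toward the
    obtained reward r by a fixed fraction alpha of the prediction error."""
    pe = r - q            # prediction error
    return q + alpha * pe

# e.g. q=50, r=80, alpha=0.3 -> 50 + 0.3 * 30 = 59.0
```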
In our previous work 24 we introduced a "noisy" formulation of this model, which assumes stochasticity in the update rule. On each trial, the updated value is corrupted by additive random noise ε_t:

Q_t = Q_{t-1} + α (r_{t-1} - Q_{t-1}) + ε_t,

where ε_t is drawn from a normal distribution with zero mean and standard deviation σ_t equal to a constant fraction ζ of the magnitude of the PE: σ_t = ζ |r_{t-1} - Q_{t-1}|. In this formulation of the noisy learning model, the noise added at each update scales with the prediction error, similarly to Weber's law. Previous studies have demonstrated its better performance and advantage over models with purely random noise (e.g., where the standard deviation does not scale with the prediction error) 24,84 .
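A minimal sketch of this Weber-like noisy update, assuming zeta as the symbol for the learning-noise parameter:

```python
import numpy as np

def noisy_rw_update(q, r, alpha, zeta, rng):
    """Noisy Rescorla-Wagner update: the deterministic update is corrupted
    by zero-mean Gaussian noise whose s.d. scales with the magnitude of the
    prediction error (Weber-like scaling)."""
    pe = r - q
    eps = rng.normal(0.0, zeta * abs(pe))  # noise s.d. proportional to |PE|
    return q + alpha * pe + eps
```

With zeta = 0 this reduces exactly to the deterministic rule, which is why the exact model is nested within the noisy one.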
The learning noise was applied to the update of both the chosen and unchosen option values. While in the complete feedback condition (see Supplementary Information) the update of both options was based on the observed outcomes, in the partial feedback condition (as described in the main manuscript) we assumed a "decaying" learning rule for the unchosen option, where the reward was replaced by the average payoff (equal to 50 in our task), similarly to the previous experiment 24 .
In both models the choice process was modelled as a stochastic 'softmax' action selection policy, controlled by an inverse temperature β and an optional 'choice repetition bias' κ:

P(a_t = A) = 1 / (1 + exp(-β (Q_{A,t} - Q_{B,t}) - κ c_{t-1})),

where Q_{A,t} and Q_{B,t} are the values associated with options A and B at time point t, and c_{t-1} indicates whether option A was chosen on the previous trial (c_{t-1} = +1) or not (c_{t-1} = -1). This stochastic action selection policy reduces to a purely greedy (value-maximizing) 'argmax' policy when β → ∞.
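The softmax policy can be sketched as follows; beta and kappa are assumed symbols for the temperature and repetition-bias parameters.

```python
import numpy as np

def p_choose_A(qA, qB, beta, kappa=0.0, prev=0):
    """Probability of choosing option A under a two-option softmax with
    inverse temperature beta and optional repetition bias kappa;
    prev = +1 if A was chosen on the previous trial, -1 if B, 0 if none."""
    return 1.0 / (1.0 + np.exp(-(beta * (qA - qB) + kappa * prev)))
```

As beta grows the policy approaches the greedy argmax rule, while kappa > 0 pulls choice probability toward the previously chosen option regardless of its value.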

Model fitting procedure
Model fitting was conducted using Monte Carlo methods 85 .

Model comparison
We performed a series of Bayesian Model Selection (BMS) analyses to test whether learning imprecisions help account for choice data across the experiments. In all model comparisons, we used model evidence conditioned on human decisions obtained from the particle MCMC fitting procedures (Methods, 24 ). We used a random-effects model selection procedure in which models are treated as random effects that can differ between subjects and whose prior frequencies are drawn from a Dirichlet distribution. Model posterior frequencies and exceedance probabilities (i.e., how likely it is that a given model is more frequent in the population than the other models in the set) were obtained through the BMS algorithm implemented in SPM12 90,91 .
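The random-effects scheme can be sketched as follows. This is a simplified re-implementation of the general variational procedure behind SPM's BMS routine (subject-wise model posteriors combined under a Dirichlet prior, with exceedance probabilities estimated by Monte Carlo sampling), not the SPM12 code used in the study; the iteration count and tolerances are arbitrary.

```python
import numpy as np

def _digamma(x):
    """Digamma via the recurrence psi(x) = psi(x+1) - 1/x and an asymptotic
    series for large arguments (ample accuracy for this use)."""
    x = np.atleast_1d(np.asarray(x, dtype=float)).copy()
    res = np.zeros_like(x)
    while np.any(x < 6.0):
        mask = x < 6.0
        res[mask] -= 1.0 / x[mask]
        x[mask] += 1.0
    inv2 = 1.0 / (x * x)
    return res + np.log(x) - 0.5 / x - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

def bms(log_evidence, n_samples=100_000, seed=0):
    """Random-effects Bayesian model selection sketch.
    log_evidence: (n_subjects, n_models) array of per-subject log model
    evidences. Returns expected model frequencies and exceedance
    probabilities (estimated by sampling the posterior Dirichlet)."""
    n_sub, n_mod = log_evidence.shape
    alpha = np.ones(n_mod)
    for _ in range(500):  # fixed-point iterations of the variational scheme
        log_u = log_evidence + _digamma(alpha) - _digamma(alpha.sum())
        log_u -= log_u.max(axis=1, keepdims=True)   # numerical stability
        g = np.exp(log_u)
        g /= g.sum(axis=1, keepdims=True)           # subject-wise model posteriors
        alpha_new = 1.0 + g.sum(axis=0)
        if np.max(np.abs(alpha_new - alpha)) < 1e-10:
            alpha = alpha_new
            break
        alpha = alpha_new
    freq = alpha / alpha.sum()                      # expected model frequencies
    r = np.random.default_rng(seed).dirichlet(alpha, size=n_samples)
    pexc = np.bincount(r.argmax(axis=1), minlength=n_mod) / n_samples
    return freq, pexc
```

When one model has a consistently higher evidence across subjects, its exceedance probability approaches 1, matching the Pexc > 0.99 style of result reported above.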
To identify whether learning imprecision is an important source of behavioural variability, we compared the exact and noisy learning models based on their model evidence values.
Additionally, we tested an alternative model formulation to investigate whether an additional choice hysteresis parameter is necessary to improve model fit. We compared models with and without the repetition bias separately for the exact and noisy learning formulations in experiment 1 (SFig.1C).

Relationships with choice stochasticity and learning noise parameter
To examine the relationship between model parameters and impulsive and compulsive traits, we conducted regression analyses including age, gender, IQ (experiment 1), mental health status (experiment 2), and other model parameters as covariates. Additionally, we computed pair-wise correlations between model parameters, scales, and demographic variables: age, gender, IQ (experiment 1) and mental health status (experiment 2) (SFig.6).
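A generic version of such a regression can be sketched in a few lines. This illustrates the type of analysis (standardized beta for a trait score with covariates partialled out), not the authors' actual code; z-scoring all regressors is an assumption.

```python
import numpy as np

def standardized_beta(y, x, covariates):
    """OLS regression of a z-scored model parameter y on a z-scored trait
    score x, controlling for z-scored covariates; returns the standardized
    beta for the trait and its standard error."""
    z = lambda v: (np.asarray(v, float) - np.mean(v)) / np.std(v)
    X = np.column_stack([np.ones(len(y)), z(x)] + [z(c) for c in covariates])
    yz = z(y)
    coef, *_ = np.linalg.lstsq(X, yz, rcond=None)
    resid = yz - X @ coef
    dof = len(yz) - X.shape[1]
    se = np.sqrt(resid @ resid / dof * np.linalg.inv(X.T @ X).diagonal())
    return coef[1], se[1]   # coefficient and s.e. for the trait regressor
```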

Analysis of contribution of learning noise to behavioural variability
To estimate the fraction of behavioural variability explained by the learning noise (Fig.1E-F), we characterized the contribution of learning noise to exploration: we estimated the fraction of non-greedy decisions that could be caused by learning noise in experiment 1 and experiment 2.
First, we labelled decisions as non-greedy based on the fits from the exact model.
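The labelling step can be sketched as follows; this is only the first stage of the analysis (ties between Q-values and the subsequent model-simulation step attributing non-greedy choices to learning noise are omitted).

```python
import numpy as np

def fraction_non_greedy(choices, q_values):
    """Fraction of decisions labelled non-greedy: the chosen option is not
    the one with the highest fitted Q-value at the moment of choice.
    choices: array of 0/1 chosen-option indices; q_values: (n_trials, 2)
    fitted values from the exact model."""
    greedy = np.asarray(q_values).argmax(axis=1)
    return float(np.mean(np.asarray(choices) != greedy))
```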

Code availability
The games from experiment 2 and 3 are freely playable on the Brain Explorer app for Android and Apple handheld devices (access via www.brainexplorer.net or the respective app stores).
The code and data to reproduce the main analyses are freely available in an Open Science Framework repository.

Figure 1
Task structure and contributions of learning noise to behavioural variability.
A. Trial structure and design of the two-armed bandit task in experiment 1. On every trial, participants chose between two bandits represented by coloured shapes on the screen and observed the outcome (between 1 and 99 points) for the chosen option. B. Example reward magnitudes for the two bandits in one session of experiment 1. Reward magnitudes changed across trials and were sampled from two probability distributions with independently drifting means. Thick lines represent the drifting means of the two distributions; dotted lines represent the reward outcomes that could have been observed by the subject conditioned on the choice. These observable reward magnitudes were sampled from the distribution means with added Gaussian sampling noise to ensure continuous learning in the task. C. Task structure for the reward learning two-armed bandit task in experiment 2 (Milky Way game in the Brain Explorer app). On every trial, participants chose between two space cows and then observed how much space milk they had collected, in the form of points (from 1 to 99). The underlying distributions of drifting mean rewards were the same as in experiment 1, but the task was substantially shorter. D. To analyse the subjects' overall performance in both experiments, we computed the difference between the total average reward won by each subject and the foregone reward (to account for the individually generated reward trajectories). A positive difference means that subjects performed better than chance level. Bottom panels: fraction of behavioural variability attributed to learning noise based on the winning model in experiment 1 (left panel) and experiment 2 (right panel).
In both experiments, more than two-thirds of the total variability in decisions was due to learning noise rather than choice stochasticity. Black dots represent the results obtained in the previous laboratory study 24 .

Figure 2
Contributions of impulsivity and obsessive-compulsive traits to choice stochasticity and learning noise.
A. In the exact model (no learning noise added), both impulsivity traits, measured with the BIS total score, and OC traits, measured with the OCI-R total score, showed a significant association with choice stochasticity (softmax parameter). B. Impulsivity traits were no longer associated with choice stochasticity when learning noise was added to the model. C. Impulsivity traits contributed to learning noise rather than choice stochasticity in the learning noise model. D-F. Same analysis as in A-C for the Brain Explorer (experiment 2) data. All regressions included age, gender, IQ (experiment 1), mental health status (experiment 2), and other model parameters as covariates. Dark bar colours indicate the replication tests based on the results from experiment 1. Error bars are standard errors; * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 3
Contributions of different subscales of impulsivity and obsessive-compulsive traits to choice stochasticity and learning noise in experiment 1 (N = 392).
A. Analysis of impulsivity and compulsivity sub-scales shows that hoarding and washing compulsions (right panel) were most closely associated with choice stochasticity in the noisy model, whilst there was no association with any of the impulsivity subscores (left panel). B. In contrast, learning noise was exclusively associated with motor impulsiveness, but with no other impulsivity or compulsivity subscore. Error bars are standard errors; * p < 0.05; stars represent test results Bonferroni-corrected for multiple comparisons across regression models.

Figure 4
Contributions of different subscales of impulsivity and obsessive-compulsive traits to choice stochasticity and learning noise in experiment 2 (N = 2610).
A. Choice stochasticity from the noisy model was again associated with washing compulsions (replicating the first experiment), but not with hoarding or any other subscore. B. Learning noise was again primarily associated with motor impulsiveness, as in experiment 1. No compulsivity subscale was associated with learning noise. Error bars are standard errors; * p < 0.05, ** p < 0.01, *** p < 0.001. Dark bar colours indicate the replication tests based on the results from experiment 1.

Figure 5
Learning noise also linked to (motor) impulsivity in punishment learning (N = 670).
A. Trial structure for the Pirate Market game in the Brain Explorer app, a version of the "Milky Way" game in the loss domain. The task was framed as choosing the one of two pirates who takes away less milk (hence the name 'Pirate Market'). On every trial, users are endowed with 100 points (a full bucket of milk) and must choose the pirate that will steal less milk from them. The amount of stolen milk is presented in points (range from -99 to -1). The game setup and the outcome sequences were analogous to the Milky Way game but in the loss domain. B. Relative average rewards (chosen - unchosen) accumulated by the same sample of users (N = 670) in the reward learning game (Milky Way, left) and in the punishment learning game (Pirate Market, right). White dots are medians; error bars are the 25th and 75th percentiles of the performance distributions. As in the reward learning context, participants performed significantly better than chance (t(669) = 38.8, p < 0.0001), demonstrating that they understood the task and learned to choose the less punishing bandit. Performance in the two domains was positively associated (r = 0.33, p < 0.0001), meaning that those who performed better in the reward version also performed better in the punishment version, even though they won slightly less in the punishment version (t(669) = 3.00, p = 0.003, two-tailed). C. BIS total score was associated with choice stochasticity in the exact, but not in the noisy learning model. D. In the better-fitting noisy model, BIS total score was associated with learning noise.

Supplementary Files
This is a list of supplementary files associated with this preprint: SkvortsovaHauserSuppInformation.docx