## 4.2 Procedure, measures and tasks

*Procedure.* Participants took part in two sessions in the lab, and between those sessions a week of ecological momentary assessment. The effect of reward and punishment on PM-related brain activity was assessed during a monetary incentive flanker task using EEG during the second session. The EEG session took place a minimum of 8 days after the first session. During both sessions, participants performed other tasks, which are not part of this report.

*Monetary Incentive Flanker Task.* A modified version of the arrow-version of the Eriksen flanker task, with a potential gain (PG) and a loss avoidance (LA) incentive context, was employed [60, 61]. Participants had to respond as quickly and correctly as possible using a left or right button, according to the direction of the centrally presented target arrow. The target stimulus appeared for 30 ms, with a delay of 100 ms relative to the onset of the surrounding flanker arrows. In 50% of the trials, the target stimulus pointed in the opposite direction (incongruent) as the flanker arrows. In the other 50% of the trials, the target pointed in the same direction (congruent). Each trial started with an incentive cue, signaling potential gain or loss of 40 points in the current trial. The incentive cue, a red (LA) or green (PG) frame surrounding a fixation cross, was presented for 500 ms. The frame remained visible for the duration of the trial. After response, feedback on task performance was presented: negative feedback was presented for incorrect and the slowest 20% of the correct responses and positive feedback for the remaining correct responses. In the LA context (50% of the trials) negative feedback was associated with a loss of 40 points and positive feedback with loss omission (0 points). In the PG context positive feedback resulted in a gain of 40 points and negative feedback in reward omission (0 points). An adaptive deadline based on individual performance and response time determined the response deadline, in order to approach a rate of 20% negative feedback for each context (PG context: M = 18.6%, SD = 2.3%; LA context: M = 19.6%, SD = 2.3%). After a response interval of 900 ms following target onset or 600 ms after response, performance feedback was presented for 800 ms. Participants could earn a bonus of up to 5 EUR, depending on task performance and points earned. The task included 640 trials of 2.53 to 2.75 sec duration and an additional training block of 40 trials. Task duration was approximately 25 min. After task completion, participants were asked, on a visual analogue scale, how important it was to them to respond accurately, how important it was for them to respond fast, and how much attention they paid to the red and the green frame, respectively. The task was presented using Presentation 19.0 (Neurobehavioral Systems Inc., Berkeley, CA, USA). See Fig. 1a for a schematic depiction of the task.

## 4.4 Data analysis

**Behavioral data.** We employed two multiple robust regression models onto each participant’s single-trial accuracy and log-transformed reaction time (RT) to determine predictors of influence [64]. The RT model was specified as follows: *log*(*RT) = b**0* *+ congruency x b**1* *+ previous accuracy x b**2* *+ context x b**3* *+ accuracy x b**4* *+ trial x xb**5* *+ e*. The logistic accuracy model was specified by: *accuracy = b**0* *+ congruency x b**1* *+ previous accuracy x b**2* *+ context x b**3* *+ trial x b**4* *+ e*. The predictors are: *congruency* (congruency between flanker and target; -1 = congruent, 1 = incongruent), *previous accuracy* (accuracy of the immediately preceding trial; -1 = correct, 1 = error), *context* (incentive context of the current trial; -1 = LA, 1 = PG), *accuracy* (of the current trial; -1 = correct, 1 = error) and *trial* (trial number, reflecting the time on the task). *Trial* mainly served to control for unspecific effects of task duration (e.g. adjustments of speed-accuracy trade-offs over the task or blocks, or fatigue). Individual *t*-values per regressor were tested via two-sided *t*-tests against zero, on group level, all *p*-values were Bonferroni corrected. Follow-up analyses are shown in respective figures (Fig. 1).

Because the self-report data were not normally distributed as determined by the Shapiro-Wilk test of normality (of the differences), we analyzed differences between behavioral measures using the Wilcoxon signed rank test [95, 96]. *P* values were adjusted for multiple comparisons using the procedure proposed by Benjamini and Yekutieli [97].

**Drift-diffusion modelling**. Features of the multi-stage sequential sampling models and general fitting procedures were exactly as described by Fischer, et al. [64] and Kirschner, et al. [66]. The description is adapted from therein. We used a multi-stage sequential sampling model to simulate participants' RT and accuracy distributions and the decision process giving rise to these. The model approximates the decision process as a single decisions variable which reflects both decision options. We previously demonstrated that this approximation is valid and compatible with an independently measured neural decision signal (i.e., lateralized movement selective beta power (BPL)) [64, 66]. The discrete diffusion model simulates decisions as a Wiener process with stepwise increments according to a Gaussian distribution with mean **v** (called drift rate) and variance **s** on every trial. Here, **v** reflects the speed of evidence accumulation and **s** the system’s noise, which scales all other parameters and is usually fixed (here to 0.1). The step size for all models was set to 1ms. Responses are triggered when the diffusion reaches a criterion (boundary, determined by the free parameter ± **a**). We assumed that there was no bias in response selection over the task, because we had an equal number of left and right responses. However, individual trials were allowed to start with a random bias towards one response (start-point variability, parameter **sz**). We used symmetrical boundaries which were defined as left-hand responses when the positive boundary was reached first, and as right-hand responses when the negative boundary was reached first (see Fig. 2a for more details). The non-decision time was modelled as another free parameter **T****er**. The stages of the model per trial were defined as a zero-mean baseline until flanker onset, a noisy diffusion with v = 0 during the non-decision time, a diffusion driven by the flanker direction between flanker and target onset (100 ms) with drift = vt × ft, and the target phase thereafter with drift = **vt** and the direction of the target. Single-trial values **vt** and **ft** were determined according to Gaussian variance parameters **sv** and **sf**. For display purposes, in Fig. 2a, we modelled a consecutive return of the decision variable to baseline like an Ornstein-Uhlenbeck process. To speed up the model fitting procedure, we did neither simulate baseline periods nor return to baseline during the fitting because these have no effect on model predictions. We compared four different variants of the DDM by fitting their parameters to RT and accuracy data observed in the group of 130 participants using quantile maximum likelihood statistics [98] and differential evolution algorithms [99]. Additionally, we used a mixture model assuming 2% outliers that were distributed uniformly over the full range of RTs in correct and error responses. This downweighs the impact of possible outliers on model parameters. DDM model 1 used the same drift rate during flanker and target processing, thus not allowing for suppression of distractors (parameter **f** fixed to 1). DDM model 2 fixed parameter **f** which suppressed flanker processing when below the value of 1. DDM model 3 furthermore included trial-by-trial variance in **f** modelled as a zero-mean Gaussian distribution with variance **sf**. The fourth DDM model extended DDM model 3 to allow for a dynamic decision boundary collapse according to a Weibull distribution scaled by the free parameter **k**. Here, the dynamic boundary **u** at time **t** was calculated as follows:

$${u}_{t}=a-\left(1-{\text{exp}-\left(\frac{t}{k}\right)}^{s}\right)*\frac{a}{2}$$

In this equation, \({a}\) represents the initial boundary value and \({k}\) scales the Weibull distribution. The shape parameter \({s}\) was fixed at 3, to impose a “late collapse” decision strategy.

For model fitting in all iterations, we applied the following hard priors which can be seen as boundary parameters: **v** [0.01–8.5], **sv** [0–1.5], **a** [0.01–0.45], **sz** [0.05–0.3], **T****er** [0.1–0.4], **st** [0–2], **f** [0.1–1.5], **sf** [0–1.5], **k** [10e-4–3].

For model comparison, we first computed approximated BIC values (aBIC; akin to White, et al. [100]). Next, we used these individual aBIC values to compute protected exceedance probabilities, which gives the probability that one model was more likely than any other model of the model space Rigoux, et al. [101]. As DDM4 provided the best fit to the data, we used this model to investigate model parameters when this model was fit separately to post-error and post-correct trials with positive and negative feedback, respectively, across all contexts and additionally within the PG and LA context. For this analysis, we fixed the variance parameters (sv, sz, st, sf) to the group mean to facilitate convergence.

To simulate the temporal evolution of the modelled diffusion signal depicted in Fig. 2a we used the mean maximum likelihood parameters from the group fit obtained for DDM 4. For this simulation, we computed 5000 simulated trials.

**EEG analyses.** Multiple robust regression [102] was employed to build a general linear model (GLM) and regress behavioral and task parameters on single-trial EEG activity at each electrode and time point. The resulting individual *b* values were standardized by their SDs before averaging across subjects, to penalize the regression model in case of multicollinearity of predictors and ensure comparability between predictors.

*Stimulus-locked analyses.* The model for target stimulus-locked EEG included only correct trials and included a regressor coding the incentive context of the current trial (loss/gain), the interaction between context and congruency of the current trial, the congruency (congruent/incongruent) of the current trial, the accuracy of the previous trial (correct/error) and log scaled RT.

*Response-locked analyses.* We built a model for response-locked EEG, which only included incongruent trials. The model included a regressor coding the accuracy of the current trial (correct/error), incentive context of the current trial (loss/gain) and the interaction between accuracy and context of the current trial. Log scaled RT of the current trial and response hand (left/right) were included as additional regressors.

*Feedback-locked analyses.* We focused our data analysis on feedback after correct responses, as only feedback after correct responses carried new information and erroneous responses deterministically elicited negative feedback. The first feedback model included the regressors coding the incentive context of the current trial (loss/gain), the feedback valence (negative/positive), and the interaction between feedback and context. To follow-up the interaction, we built two additional models for correct trials with corresponding positive feedback and for correct trials with negative feedback, respectively. Both models included the incentive context of the current trial (loss/gain) as the only regressor. See the Supplement for an analysis of error trials and corresponding negative feedback, a cue-locked analysis, as well as the effect of feedback type within the respective context.

The standardized *b* values resulting from the models were tested using two-tailed one-sample t-tests, for FCz and Pz. The tests were done for each data point across subjects. The resulting *p* values were adjusted for multiple comparisons using the false discovery rate (FDR) procedure proposed by Benjamini and Yekutieli [97]. We set a criterion value of 0.001 to minimize the chance of false positive results. The FDR procedure by Benjamini and Yekutieli [97] has been found to facilitate good control of the family-wise error rate (FWER) in EEG data [103]. As FDR does not provide sound local control of the FWER, FDR was applied per model and electrode. H0 was rejected for *p* < 2.2654e− 4 (FCz) and *p* < 3.2189e− 4 (Pz) in the stimulus-locked, for *p* < 2.8827e− 4 (FCz) and *p* < 3.4804e− 4 (Pz) in the response-locked model, for *p* < 3.7708e− 4 (FCz) and *p* < 2.7407e− 4 (Pz) in the first feedback-locked model, for *p* < 1.9288e− 4 (FCz) and *p* < 4.2192e− 4 (Pz) in the feedback-locked model for correct trials with corresponding positive feedback and for *p* < 2.2664e− 4 (FCz) and *p* < 2.1000e− 5 (Pz) for the feedback-locked model for correct trials with negative feedback.

*Time-frequency analysis of feedback.* We also analyzed feedback-locked activity in the time-frequency domain, to investigate whether possible differences between contexts were due to theta- or delta-band activity. Analyses were conducted according to recommendations by Cohen [104]. Data were epoched around the feedback onset (-1000 to 2000 ms). The interval between − 500 to -200 ms was used for baseline correction. Time-frequency decomposition was done separately for each combination of feedback type and context. EEG time series in each epoch were convolved with a set of complex Morlet wavelets, defined as a Gaussian-windowed complex sine wave: \({\text{e}}^{-\text{i}2{\pi }\text{f}\text{t}}{e}^{ \frac{-4{\text{l}\text{o}\text{g}\left(2\right)t}^{2}}{{h}^{2}}}\), where t is time, f is frequency, which increased from 1 to 20 Hz in 20 logarithmically spaced steps, and h defines the full-width at half-maximum, which ranged from 600 to 300 ms with 20 logarithmically spaced steps [105]. Power was normalized by conversion to a decibel (dB) scale (10∗ log10[power(t)/power(baseline)]), to allow direct comparison of effects across frequency bands. Each epoch was then cut to -200 to 600 ms. Epochs were then averaged for each context, feedback type and participant. The differences between the PG and LA contexts were tested using two-tailed one-sample t-tests, for FCz and Pz. The tests were done for each data point across subjects, within a time window from 0 to 600 ms and frequencies between 1 and 8, to include delta and theta bands. Differences between contexts were tested for negative and positive feedback separately. The resulting *p* values were adjusted for multiple comparisons using the FDR procedure proposed by Benjamini and Yekutieli [97]. We set a criterion value of 0.001 to minimize the chance of false positive results. H0 was rejected for *p* < 4.6630e− 4 (FCz) and *p* < 1.1885e− 7 (Pz) for the comparison of contexts in negative feedback, and for *p* < 1.9453e− 4 (FCz) and *p* < 6.6510e− 5 (Pz) for the comparison of contexts in positive feedback.

Statistical analysis of EEG and behavioral data was performed in MATLAB 2018b [93] and R 4.2.0 [106].