Participants
Participants included 22 bipolar patients (mean age: 29.1 years [SEM = 1.67], 17 females; Table 1), and 27 healthy participants (27.5 years [SEM = 1.18], 15 females). Bipolar participants were assessed by a consultant psychiatrist who confirmed the diagnosis of BD (I or II) using the structured clinical interview for the International Statistical Classification of Diseases and Related Health Problems (ICD11)[49]. Patients included in the study were euthymic for at least 2.5 months before recruitment. Additional inclusion criteria were: most recent episode being depression, aged 18–50 years, absence of symptoms from other mental health conditions beyond BD, and no history of substance abuse. We assessed residual mood symptoms and cognitive performance using validated scales on mania, anxiety, depression and tasks on executive and general cognitive performance. See Table 1 for further details, and Supplemental Material for sample size estimates (minimum sample size estimated was 20 per group, to identify volatility effects at 0.8 power). The study was approved by the Institutional Review Board of National Research University Higher School of Economics and the Local Ethical Committee of the First Moscow State Medical University. All participants provided written informed consent.
Table 1
Group demographics and variables describing the measures of affective state, cognitive and executive function in euthymic bipolar disorder patients (BD, N = 22) and healthy control participants (HC, N = 27).

Bipolar group
(N = 22)

Healthy control group
(N = 27)

Statistics (Permutation test pvalue and Bayes factor)

Age (years), M (SEM)

29.1 (SEM 1.67)

27.5 (SEM 1.18)

P = 0.4921, BF10 = 0.3472

Sex (females), n (%)

17 (77.2%)

15 (56%)

χ2 = 1.6559, df = 1, P = 0.1849; BF10 = 1.102

Duration of illness (years), M (SEM)

7.4 (SEM 1.04)

NA

NA

Last episode polarity, (%)

depression (100%)

NA

NA

Time in euthymia (months), M (SEM)

4.9 (0.68)

NA

NA

Bipolar II, n (%)

17 (77.3%)

NA

NA

Bipolar I, n (%)

5 (22.7%)

NA

NA

Antidepressant, n (%)

10 (45.5%)

NA

NA

Antipsychotic, n (%)

14 (63.6%)

NA

NA

Mood stabiliser, n (%)

16 (81.8%)

NA

NA

Summary: Taking psychiatric medication, n (%)

19 (86.4%)

NA

NA

HADS anxiety, M (SEM)

5.5 (0.94)

5 (0.64)

PFDR = 0.64, BF10 = 0.31

HADS depression, M (SEM)

4 (0.69)

3.30 (0.57)

PFDR = 0.4003, BF10 = 0.3824

Beck Depression Inventory, M (SEM)

8.3 (1.75)

5 (1)

PFDR = 0.1038, BF10 = 0.8668

Altman's Mania Scale, M (SEM)

1.7 (0.50)

2.6 (0.50)

PFDR = 0.2040, BF10 = 0.5652

StateTrait Anxiety Inventory (state subscale on MEG day)1, M (SEM)

35.5 (1.72)

33.7 (1.90)

PFDR = 0.4571, BF10 = 0.3833

Trail Making Test (Part 1), M (SEM)

23.3 (1.21)

21.2 (1.22)

PFDR = 0.2216, BF10 = 0.5296

Trail Making Test (Part 2), M (SEM)

87.8 (8.5)

55.7 (5.6)

PFDR = 0.0022*, BF10 = 14.907

Wisconsin Card Sorting Test, M (SEM)

29.3 (0.92)

31.6 (0.67)

PFDR = 0.0364, BF10 = 1.6560

MiniMental State Examination, M (SEM)

29.5 (0.18)

29.5 (0.17)

PFDR = 0.8678, BF10 = 0.2875

For MEG analysis, data were available for 27 HCs and 21 BDs. Values are provided as mean and SEM (M, SEM) or as count and percentage (n, %). Betweengroup differences in scale values were assessed using permutation tests. Our alpha significance level was set at 0.05, and we controlled the false discovery rate (FDR) at q = 0.05 for multiple tests. Permutation test pvalues were complemented with Bayes factor analysis to assess the amount of evidence in favour of H1 or H0. Age and sex distribution were comparable across groups (Age: PFDR = 0.4921, nosignificant differences; BF10 = 0.3472, providing anecdotal evidence for H0; Sex: Chisquared statistic, χ2 = 1.6559, df = 1, P = 0.1849, no significant; BF10 = 1.102, no evidence for H0 or H1). There was substantial and anecdotal evidence supporting no differences between BD and HC groups in anxiety, depression, mania, and general cognitive scores (Bayes factor, BF10, in the range 1/3–1: anecdotal evidence; 1/10–1/3: substantial evidence). Betweengroup differences after FDR control were observed exclusively for Part 2 of the Trail Making Test (denoted by *), assessing task switching, based on strong evidence. There was anecdotal evidence for differences in Wisconsin Card Sorting Test scores, assessing executive functions. However, this effect was not significant after FDR control. 1State anxiety scores on the day of the experimental session were available for 22 bipolar patients and 18 healthy participants. Abbreviations: HADS, Hospital Anxiety and Depression Scale. The type of mood stabiliser taken by bipolar participants was mainly Lamotrigine (n = 11), but a few patients were taking Carbamazepine (n = 2), Valproic acid (n = 2), or Lithium (n = 1). See Supplementary Materials for further details on the scales and the corresponding references.
Rewardbased motor sequence learning task
Participants underwent an initial fine motor control assessment, then completed a validated motorbased decisionmaking paradigm[48] (Fig. 1A), which combines probabilistic binary rewardbased learning within a volatile setting (reminiscent of reversal learning) with the execution of motor sequences to express decisions. Participants learned two sequences of four finger presses, followed by a 320trial test phase. In each trial, they were required to choose and perform one of the sequences to potentially earn a reward (5 points; Fig. 1A). Reward probabilities for sequences were reciprocal (p, 1p) and changed pseudorandomly (Fig. 1B). The aim was to infer the reward probability associated with each sequence ('actionoutcome' contingencies henceforth) and adjust their choices considering changing contingencies. Accumulated points translated to monetary rewards. See timeline in Fig. 1C. The task, programmed in MATLAB using Psychtoolbox, recorded participants' keypress timings to evaluate reaction time (RT) and performance tempo (Fig. 1D). See Supplemental Material.
MEG recording and preprocessing
MEG was performed using a 306channel system (Elekta Neuromag VectorView), with head movements tracked by a head position indicator with four coils. Concurrently, ECG and EOG were recorded for MEG artefact rejection. Recordings were sampled at 1000 Hz and filtered between 0.1–330 Hz. MEG preprocessing involved head movement correction, noise reduction, and channel selection using standard methods ([50]; Elekta Maxfilter software; Supplemental Material). MEG data was further processed using MNEpython[51] (Python version 3.11.5) and custom Python scripts, lowpass filtered at 125 Hz, downsampled to 250 Hz, with a notch filter applied at 50 and 100 Hz. Independent component analysis (FastICA) removed eye and heart artifacts (3.3. ICs on average per participant).
General task performance
General task performance was assessed using the win rate (rewarded trials), average tempo (mean interkey press interval in a sequence, mIKI), and RT (time from stimulus to first key press).
To assess practice effects in performance tempo and RT over trials as a function of the group, we employed Bayesian multilevel regression modelling (BML) with the logarithm of trialwise RT or mIKI (in milliseconds) as the dependent variables. BML was performed with the brms R package[52, 53]. For each dependent variable, models of decreasing complexity were constructed (Table S1). Details on model comparison can be found in the Supplemental Material.
Modelling decisionmaking behaviour using hierarchical Gaussian filters
To assess probabilistic learning in our task we used a validated hierarchical Bayesian model, the 3level perceptual HGF for binary categorical inputs[23, 24](Fig. 2a). This model described how participants infer hidden states about the tendency of the actionoutcome contingencies on trial k, x2(k)(level 2), and the rate of change in that tendency (logvolatility), x3(k). Level 1 represents the binary reward input. Gaussian belief distributions on levels 2 and 3 are represented by their posterior mean (µ2(k), µ3(k)) and posterior variance (uncertainty: σ2, σ3). Precision is the inverse variance or uncertainty, πi (i = 2, 3). Belief updating on each level i and trial k is driven by prediction errors, and modulated by precision ratios, weighting the influence of precision or uncertainty in the current level and the level below. This is termed precisionweighted PE, pwPEs. For level 2, belief updating takes the simple form:
$$\varDelta {\mu }_{2}^{k}={\mu }_{2}^{\left(k\right)}{\mu }_{2}^{\left(k1\right)}={\sigma }_{2}^{k}{\delta }_{1}^{\left(k\right)}$$
1
Thus, updating beliefs about the tendency of the actionoutcome contingencies is proportional to the PE about action outcomes, δ1(k), weighted with the estimation uncertainty on that level, σ2(k). Here, pwPE is equal to σ2(k)δ1(k). See general equation, representing updates on level 3, in Supplementary Material, and [24].
States x2, x3 evolve as random Gaussian walks, with volatility states x3 directly influencing the time evolution of x2 through its variance (conditional on past values):
\({x}_{2}^{k}\sim N\left({x}_{2}^{k1},{f}_{2}\left({x}_{3}\right)\right)\)
(2)
with (dropping k for simplicity)
$${f}_{2}\left({x}_{3}\right)\stackrel{\scriptscriptstyle\text{def}}{=}exp\left(\kappa {x}_{3}+{\omega }_{2}\right)$$
3
In (3), ω2 represents the invariant (tonic) portion of the log conditional variance of x2, while κ is a coupling constant that modulates the influence of phasic volatility, x3, on the size of actionoutcome belief updates. The step size on level 3 is modulated by ω3, a highlevel tonic volatility parameter. See Supplemental Material for modelling details.
To assess how beliefs mapped to decisions, we coupled this perceptual model to response models previously used in similar tasks[24, 37, 38]. First, we considered a unitsquare sigmoid response model where choice probability is shaped by a free fixed (timeinvariant) parameter ζ, interpreted as inverse decision noise: the sigmoid approaches a step function as ζ tends to infinity. This constituted our model M1. Model M2 was similar but employed a twolevel HGF with constant volatility. M3 combined the 3level HGF with a response model where the sigmoid function depends on the trialwise prediction of logvolatility, \(\zeta ={e}^{{\mu }_{3}^{\left(k1\right)}}\) [22](Fig. 2a). In this model, higher estimates of volatility lead to a more stochastic mapping from beliefs to decisions. As a result, there is an increased likelihood of choosing responses that deviate from predictions, consistent with increased exploration (exploring whether the contingency has changed). In models M1 and M3, parameters ω2 and ω3 were free; ω2 was also free in M2. Additionally, ζ was free in M1 and M2, while initial values µ3(0) and σ3(0) were free in M3. A fourth model, M4, was constructed similarly to M3 but replaced the free parameter ω2 with κ[28].
Models were fit to individual behavioural data (series of chosen and rewarded sequences: responses and observed outcomes), using prior values described in Table S2. Log model evidence was used for model comparison using randomeffects Bayesian model selection (Supplemental Materials). As in previous work, simulations were conducted to quantify how well free model parameters could be estimated[27, 37]. Relevant belief and uncertainty trajectories were used subsequently for our MEG analysis (Fig. 2b; see Results). The models were implemented as a part of the TAPAS toolbox[54]. We used the HGF release v7.1 in Matlab R2020b, and functions ‘tapas_ehgf_binary’.
Assessing motor invigoration
We additionally investigated whether trialbytrial predictions about the probabilistic actionoutcome mapping, \({\stackrel{\prime }{\mu }}_{2}^{\left(k\right)}\), differentially influenced motor performance in our groups. Following Tecilla et al.[48], given the sign of \({\stackrel{\prime }{\mu }}_{2}^{\left(k\right)}\) is arbitrary, we used as predictor for Bayesian multilevel modelling the absolute value,\(\left{\stackrel{\prime }{\mu }}_{2}^{\left(k\right)}\right\), representing the strength of those predictions, and assessed its association with performance variables (logmIKI, logRT). We hypothesised a negative correlation, suggesting that stronger expectation about reward contingencies speeds performance. We also hypothesised a greater sensitivity to these predictions (steeper slope) in BD compared to HC. Details of the models are in Table S3.
Source reconstruction of MEG signals
We reconstructed MEG signals using Linearly Constrained Minimum Variance beamforming (LCMV[55]) in MNEPython and individual T1MRI images for cortical divisions with Freesurfer 6.0[56, 57], http://surfer.nmr.mgh.harvard.edu/). We aligned MRI and MEG coordinate systems, selected the DesikanKilliany–Tourville atlas for cortical parcellation (DKT[58]), and performed forward modelling with boundary element models[37].
We focused on alpha, and beta frequency bands, bandpass filtering signals between 1–40 Hz before LCMV beamforming. Thetaband activity was also examined given its robust association with feedback processing[34, 59], relevant for win/lose outcomes in our task. Gamma frequency analysis followed a similar process (30–124 Hz bandpass filter). Time courses were extracted for regions of interest (ROIs) associated with decisionmaking under uncertainty and reward processing[35, 60–65], and linked with impairments in frontostriatal reward circuitry in [5, 10, 66, 67]. These included the (1) ACC, (2) OFC, including the ventromedial PFC, (3) dorsomedial PFC (dmPFC), (4) dorsolateral PFC (dlPFC). We also included the (5) primary motor cortex (M1) and (6) premotor cortex (PMC), to assess motor activity during decisionmaking[68].
Our study's ROIs comprised 16 bilateral labels in eight areas from the DKT atlas: (1) rostral and caudal ACC, (2) lateral and medial OFC (including vmPFC), (3) superior frontal gyrus (dmPFC, and supplementary motor area, SMA), (4) rostral middle frontal gyrus (rMFG), (5) precentral gyrus (M1), and (6) caudal MFG. Time series extraction utilised the PCA flip method in MNEPython. Although the 'flip' operator was not relevant for our timefrequency analysis, it was essential for preparing the sourcereconstructed time series for subsequent connectivity analysis. See anatomical label references in Supplemental Material.
Convolution modelling of timefrequency responses during outcome processing
We used a validated convolutionmodelling approach to analyse frequencydomain amplitude changes related to belief updating and uncertainty following outcome presentation[36, 38, 69]. Building on previous work[37], this frequencydomain general linear model (GLM) included as parametric regressors the unsigned pwPE updating beliefs on level 2 (representing precisionweighted Bayesian surprise; the absolute value is preferred for the binary HGF where sign on level 2 is arbitrary[70, 71]), and uncertainty measures (σ2, σ3). It also included discrete regressors for win/lose outcomes and error trials. To avoid regressor collinearity and potential GLM misspecification, we excluded the level 3 pwPE[37, 38], due to its high linear correlation with the unsigned pwPE on level 2 (Supplementary Materials).
The GLM was applied to concatenated epochs of sourcereconstructed data in our ROIs, using Morlet wavelets for timefrequency (TF) analysis in 4–100Hz and within − 0.5–1.8 s (Figure S1). We hypothesised betweengroup differences in gamma and concomitant alpha/beta activity during pwPE processing, and in alpha/beta activity during uncertainty encoding. Theta 4–7 Hz activity was assessed for the win/lose regressors[34, 59].
We conducted this analysis using SPM12 software (http://www.fil.ion.ucl.ac.uk/spm/), adapting original code by Spitzer et al.[72], as used in Hein et al.[37], with additional details available in the Supplemental Materials.
Frequencyresolved functional connectivity
To analyse directed functional connectivity between frequencyresolved activity in our ROIs, we employed timereversed Granger causality (TRGC[73]; Supplemental Materials) as a robust metric for directed information flow[74]. Following the recommendations of Pellegrini et al.[74], we applied TRGC in the frequency domain to LCMVbased sourcereconstructed time series from our 16 ROIs after the PCA flip transformation.
Our analysis focused on betweengroup differences in the directionality of information flow within the 8–30 Hz range during the 0.5–1 s interval of outcome processing for trials with large unsigned pwPEs updating beliefs at level 2. We employed a median split of unsigned pwPE values, yielding approximately 160 highpwPE trials per participant. This frequency range was selected based on evidence that betaband functional connectivity from the PFC effectively differentiates levels of predictability, exhibiting reduced values during unpredictable trials[34]. By examining TRGC in trials with high unsigned pwPE values, we anticipated a general decrease in betaband TRGC in HC, in parallel with alpha/beta amplitude suppression during belief updating. We hypothesised that this pattern would be disrupted in BD. The TRGC analysis was conducted using the ROIconnect plugin for EEGLAB[74], adapted for our MNEpython LCMV outputs. See Supplemental Materials.
Statistical analysis
Betweengroup analyses of behavioural, computational, and TRGCderived variables used independentsample permutation tests (5000 permutations) in MATLAB®. Withinsubject analyses used paired permutation tests. We maintained an alpha significance level at 0.05 and controlled false discovery rates (FDR) at q = 0.05 for multiple tests. Nonparametric effect sizes are reported as probability of superiority[75, 76] (Δ). Nonsignificant effects were further evaluated using Bayes Factors (BF10), interpreted following Wetzels and Wagenmakers[77].
Statistical analysis of sourcelevel timefrequency images used clusterbased permutation testing in the FieldTrip Toolbox[78, 79] (1000 permutations). We averaged TF activity across frequency bins within each band (theta, alpha, beta; 60–100 Hz for gamma[37]). Temporal intervals of interest for statistical analyses were selected based on previous research[37, 38, 59]: 0.5–1.8 s for parametric regressors, 0.2–1 s for win/lose regressors. We controlled the familywise error rate (FWER) at 0.025. See Supplemental Materials.