To investigate the multi-brain mechanism for observational learning, we developed a sequential MEG imaging protocol 34. First, we filmed demonstrators and recorded MEG while participants were performing a Pavlovian threat conditioning task (N = 3, Pavlovian learning), where one conditioned stimulus, CS+, but not the other, CS-, was probabilistically (62.5%) followed by electrical nerve stimulation (i.e., unconditioned stimulus, US). The demonstrators were regular, naïve participants with naturalistic reactions (i.e., not actors; see an example video online). All demonstrators showed successful learning during acquisition in the Pavlovian learning (Fig. S1). Second, we showed these video recordings to naïve observers (N = 60, observational learning) during MEG scanning (Figs. 1A-B).
Before observational learning, social status was manipulated via performance payoffs in a math competition task 62: half of the observers were instructed to believe they were observing a higher-status (HS) demonstrator (N = 30), and the other half, a lower-status (LS) demonstrator (N = 30, Fig. 1C). Because the influence of the status manipulation on subsequent observational learning could be contaminated by task familiarity, we made the two tasks dissimilar (i.e., math competition vs. threat learning). Using a task that is not directly related also provides a stronger test of the hypothesis that exogenously induced social status would affect learning. Similar to previous experiments 62,63, this manipulation successfully induced feelings of social rank (see Methods). After observational learning, observers performed a direct test in the absence of the demonstrator to evaluate their learning outcomes [i.e., CS+/CS- differentiation on skin conductance response (SCR) and pupillometry]. Demonstrator and observer visits were on separate days (see Fig. 1D for the experimental protocol and Methods for more details). Each demonstrator was paired with about 20 observers, of whom 10 were randomly assigned to the LS group and the other 10 to the HS group.
2.1. Successful observational learning in the sequential MEG imaging paradigm
A linear mixed-effects model confirmed successful learning in naïve observers: observers showed significantly higher SCR amplitude for CS+ compared to CS- trials (β = 0.39, SE = 0.07, t = 5.97, P < 0.001; Fig. 2A, left panel). A similar pattern was observed for CS+/CS- differentiation on pupil response (β = 0.45, SE = 0.09, t = 4.85, P < 0.001; Fig. 2A, right panel). No significant main effect of social status or interaction was detected (all t < 0.66, all P > 0.51).
To further uncover the computational basis for observational threat learning, we then used a simple Rescorla-Wagner reinforcement learning model (winning model after model comparison, see Methods) to quantify the SCR patterns 64. We focused on the model-derived trial-by-trial initial value (V0), which accounted for initial shock prediction during direct test, and the learning rate (α), which captured the weight of reward prediction error in value (shock) update during observational learning. Modeling results on initial value revealed main effects of CS type (β = 0.14, SE = 0.002, t = 73.93, P < 0.001) and social status (β = 0.03, SE = 0.01, t = 4.27, P < 0.001), as well as their interaction (β = -0.04, SE = 0.003, t = -15.76, P < 0.001), indicating a significantly larger V0 for “CS+LS > CS-LS” compared to “CS+HS > CS-HS” (Fig. 2B, left panel). The learning rate during observational learning showed main effects of CS type (β = 0.10, SE = 0.005, t = 19.76, P < 0.001) and social status (β = -0.08, SE = 0.01, t = -9.34, P < 0.001), and importantly, the interaction between them (β = -0.20, SE = 0.01, t = -27.90, P < 0.001). Post-hoc comparisons indicated that learning rate was significantly higher for CS+ than CS- in the LS group, whereas this pattern was reversed in the HS group (Fig. 2B, right panel). Taken together, our physiological results demonstrated that naïve observers consistently acquired threat information irrespective of social status. Computational modeling further uncovered that status did, however, bias the initial value during direct test, as well as the learning rate governing the extent of value (shock) update during observational learning.
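For readers less familiar with the model class, the textbook Rescorla-Wagner update can be sketched as follows. This is a minimal illustration, not the fitted model: the exact parameterization and fitting procedure are described in Methods, and the function name, the reinforcement schedule, and the parameter values below are hypothetical.

```python
def rescorla_wagner(outcomes, alpha, v0=0.0):
    """Return the expected value V held at the start of each trial.

    outcomes: sequence of 0/1 (US absent / US present), one per trial.
    alpha:    learning rate in [0, 1], the weight given to the prediction error.
    v0:       initial value before the first trial.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    values = []
    v = v0
    for outcome in outcomes:
        values.append(v)                  # prediction when the CS appears
        prediction_error = outcome - v    # outcome minus expectation
        v = v + alpha * prediction_error  # value update
    return values


# Hypothetical schedules, for illustration only (not the study's trial order).
cs_plus = [1, 0, 1, 1, 0, 1, 1, 0]
cs_minus = [0] * 8
v_plus = rescorla_wagner(cs_plus, alpha=0.5, v0=0.0)
v_minus = rescorla_wagner(cs_minus, alpha=0.5, v0=0.0)
```

In this form, a larger α makes the expected value track recent outcomes more closely, while V0 sets the expectation before any outcome has been observed.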
2.2. Observational learning decreased widespread low-frequency neural response in observers
To investigate neural activity supporting observational threat learning, we first performed a sensor-level time-frequency analysis on observers’ MEG data (a parallel analysis on demonstrators’ brain data can be found in Fig. S2, although no statistical comparisons were applied owing to the limited sample size of demonstrators). The non-parametric cluster-based permutation tests showed a significant difference between CS+ and CS- (P = 0.001), reflected in a negative cluster spanning 1–8 Hz and 0.6–5.5 s post stimulus onset at the fronto-temporo-parietal channels (Fig. 3A-B). Source-level power mapping, using dynamic imaging of coherent sources (DICS), indicated that the CS effect (i.e., the 1–8 Hz response from 0.6–5.5 s) involved decreased power in the ventromedial prefrontal cortex (vmPFC), occipital cortex (OCC), dorsolateral prefrontal cortex (DLPFC), postcentral gyrus (PoG), and posterior superior temporal sulcus (pSTS) in low-frequency bands (including delta and theta bands, Fig. 3C).
In sum, our results describe a neural signature of observational learning that includes low-frequency neural oscillations. Similar low-frequency activity has been demonstrated in aversive learning and empathy 53–57. Time-frequency analyses did not reveal any significant clusters for the main effect of social status or the interaction (all P > 0.13).
2.3. Shared neural responses between demonstrators and observers
We next examined whether shared neural responses between demonstrators and observers could reflect social status and predict learning outcomes. We used the well-established Circular Correlation Coefficient (CCorr; see Methods) to measure shared neural responses (i.e., BtBC) between demonstrators and observers. We focused on low frequencies of interest associated with observational threat learning (as detected by the time-frequency analyses in the previous section) divided into the canonical delta (1–3 Hz) and theta bands (4–8 Hz).
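As an illustration of the coupling measure, the circular correlation coefficient between two phase series can be written in a few lines. This is a sketch of the standard definition only. It assumes that instantaneous phases (in radians) have already been extracted from band-limited signals, and the function names and example values are hypothetical; the actual pipeline is given in Methods.

```python
import math


def circular_mean(angles):
    """Mean direction of a list of angles (radians)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def circular_correlation(phase_a, phase_b):
    """Circular correlation coefficient between two phase series.

    Each phase is expressed as a sine deviation from its own circular mean,
    and the deviations are correlated. The result lies in [-1, 1].
    """
    if len(phase_a) != len(phase_b) or not phase_a:
        raise ValueError("phase series must be non-empty and of equal length")
    mean_a = circular_mean(phase_a)
    mean_b = circular_mean(phase_b)
    dev_a = [math.sin(a - mean_a) for a in phase_a]
    dev_b = [math.sin(b - mean_b) for b in phase_b]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denominator == 0.0:
        raise ValueError("a phase series has no variability around its mean")
    return numerator / denominator


# Hypothetical phases: one demonstrator channel and one observer channel.
demonstrator_phase = [0.1, 0.4, -0.3, 0.8, -0.6]
observer_phase = [p + 0.5 for p in demonstrator_phase]  # constant phase lag
coupling = circular_correlation(demonstrator_phase, observer_phase)
```

Because deviations are taken around each series' own mean direction, a constant phase lag between the two signals does not reduce the coefficient.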
In the first step of the analysis, we compared BtBC during observational learning with BtBC during visual baseline. This analysis filtered out channel combinations (i.e., demonstrator channel paired with observer channel) where BtBC might simply emerge due to common visual inputs and/or environment. We identified 6281 (Fig. 4A) and 4330 (Fig. S3A) channel combinations where BtBC during observational learning significantly exceeded that during visual baseline in the delta and theta bands, respectively (all PFDR < 0.05).
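The screening logic of this first step, many channel-combination tests followed by false discovery rate control, can be sketched as below. The sketch assumes the Benjamini-Hochberg step-up procedure, which the text above does not name explicitly; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where a test survives FDR control at level q.

    Step-up rule: find the largest rank k (1-based, ascending p-values) such
    that p_(k) <= (k / m) * q, then reject every hypothesis ranked k or lower.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            survives[idx] = True
    return survives


# Hypothetical p-values, one per channel combination
# (learning vs. visual baseline).
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
keep = benjamini_hochberg(p_vals, q=0.05)
```

Only the combinations flagged True would be carried forward to the second-step models.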
In the second step, we fitted a series of linear mixed-effects models to the channel combinations that survived the first step. For the delta band, the analysis revealed main effects of CS type (2959 channel combinations, all PFDR < 0.05) and social status (8 channel combinations, all Puncorrected < 0.001; Fig. 4B), and notably, their interaction (13 channel combinations, all Puncorrected < 0.001; Fig. 4C, left panel). Further analyses on the channel combinations that showed interaction effects revealed that CS+LS elicited significantly stronger mean BtBC relative to CS-LS (CS type × social status: β = -0.61, SE = 0.13, t = -4.83, P < 0.001; Fig. 4C, right panel). For the theta band, we found main effects of CS type (630 channel combinations, all PFDR < 0.05), indicating that mean BtBC was significantly larger for CS+ than CS- trials (β = 0.16, SE = 0.02, t = 9.85, P < 0.001; Fig. S3B). No other effects were observed. As a control, we conducted parallel analyses in the alpha (9–12 Hz) and beta bands (13–30 Hz), neither of which was among the a priori (low-frequency) bands. No significant results were detected in these bands when we compared BtBC across CS type and social status after correction for multiple comparisons. For the remainder of the paper, we restricted our analyses to the low-frequency bands.
2.4. Shared neural responses selectively predicted observational learning
To further examine the interaction effect in the delta band and to test the relationships of BtBC patterns for each social status condition with learning, we conducted a nonnegative matrix factorization (NMF) analysis on BtBC (Fig. 5A). NMF clustered the BtBC patterns that describe the unique demonstrator-observer brain network during observational learning 32. Previous research has shown that responses to CS+ during observational learning can successfully predict the expression of learning at a subsequent time, as indicated by converging strands of physiological 41,65,66 and neural evidence 67. In line with these findings, we found that responses to the CS+LS condition in NMF-derived cluster 1 correlated positively with differential SCR (r = 0.44, P = 0.02) and differential learning rate (r = 0.47, P = 0.01), indicating that an increase in BtBC as represented by cluster 1 predicted better learning (Figs. 5B-C). Clusters 2 and 3 derived from NMF were not correlated with differential SCR (all |r| < 0.10, P > 0.63) or differential learning rate (all |r| < 0.04, P > 0.85). In the CS+HS condition, clusters 1–3 were not correlated with differential SCR (all |r| < 0.25, P > 0.21) or differential learning rate (all |r| < 0.18, P > 0.36).
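The factorization itself can be illustrated with the classic multiplicative-update scheme for NMF, which approximates a nonnegative matrix V by a product W H of two nonnegative factors. The sketch below is generic: the text above does not state which solver was used or how the BtBC matrix was arranged (see Methods), and the toy matrix, the rank, and all names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize nonnegative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy nonnegative matrix (rows and columns are placeholders, not real BtBC data).
toy_v = [[1, 0, 2, 0], [2, 0, 4, 0], [0, 3, 0, 1],
         [0, 6, 0, 2], [1, 3, 2, 1], [3, 0, 6, 0]]
w_fit, h_fit = nmf(toy_v, rank=2)
```

Each row of H can then be read as one cluster's loading pattern, and each column of W as the weight of that cluster per row of V.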
2.5. Source-level shared neural responses associated with social status predicted learning
To examine the neural substrates of the multi-brain mechanism underlying observational threat learning, we next sought to uncover shared neural responses in brain areas associated with learning and social status. We determined our regions of interest (ROI) based on DICS power mapping in this study and previous work 3. Following this, we selected ROIs, including insula (INS), anterior cingulate cortex (ACC), vmPFC, DLPFC, PoG, pSTS, and OCC, in both hemispheres. MEG time series in source space over these ROIs were extracted and submitted to the BtBC analyses.
We conducted the BtBC analyses in the delta band. For ROI combinations where BtBC during observational learning was significantly stronger than that during visual baseline (Fig. 6A), linear mixed-effects models were fitted for each combination and revealed main effects of CS type (56 ROI combinations, all PFDR < 0.05; Fig. 6B, left panel), indicating that there was significantly stronger mean BtBC in CS+ compared to CS- trials (β = 0.13, SE = 0.02, t = 5.46, P < 0.001; Fig. 6B, middle panel). A planned contrast on mean BtBC revealed a marginally significant difference between “CS+HS > CS-HS” and “CS+LS > CS-LS” (independent t-test, t = 1.98, P = 0.052; Fig. 6B, right panel). No main effect of social status or interaction was detected after FDR correction for any ROI combination. Corresponding analyses were also conducted in the theta band (see Fig. S4), with no significant results involving social status or CS type.
We then carried out a series of Pearson correlations to examine in which ROIs BtBC significantly predicted observers’ learning outcome. In the CS+LS condition, we observed that BtBC at lINS_rINS (i.e., demonstrator’s left insula and observer’s right insula), lvmPFC_rINS, rvmPFC_rINS, and lDLPFC_rINS during observational learning consistently predicted differential pupil responses in observers in the subsequent direct test (all r > 0.56, PFDR < 0.05; Fig. 6C). No significant relationships between BtBC and differential SCR were detected (all PFDR > 0.63). In the CS+HS condition, no significant correlations between BtBC and learning were found after FDR correction (all PFDR > 0.35). Furthermore, BtBC at lDLPFC_rINS also correlated with observers’ self-reported empathic concern (measured by the Interpersonal Reactivity Index 68) in the LS group (r = 0.41, P = 0.03), but not in the HS group (r = -0.01, P = 0.95).
2.6. Shared attention and emotion as likely sources for BtBC predicting learning outcome
Having established that BtBC in the fronto-limbic circuit can predict learning outcome (as measured by differential pupil response, see the previous section), we further investigated how early the learning outcome at direct test could be decoded from source-level BtBC during observational learning. To this end, we conducted time-varying support vector regressions (SVR). Indeed, both empirical 69,70 and theoretical 71 accounts of threat learning have described its temporal dependency in relationship to predictive events. Therefore, capitalizing on the uniquely high temporal resolution of MEG, rather than averaging BtBC across time, we repeatedly predicted trial-averaged learning outcome (i.e., pupil response) in the direct phase based on cumulative BtBC in the CS+LS condition for each time point in the observational learning phase. This analysis indicates whether BtBC can predict learning outcome shortly after CS onset or when a threat is imminent (i.e., close to the US). Prediction performance improved over time and reached significance starting at about 4.7 s after CS onset, immediately preceding the US (note that a US was expected to come 5.5 s post CS onset; all PFDR < 0.05, Fig. 7A). The prediction accuracy of the model was expressed by the Pearson correlation coefficient between the actual and predicted values 72,73. The mean absolute errors (MAEs) were also reported. A better fit of the prediction model is characterized by a higher correlation coefficient and a lower MAE. The correlation coefficients between predicted and actual values were larger than 0.49, with MAEs smaller than 0.28. We performed a parallel analysis in the CS+HS condition. No significant predictions were found after FDR correction (all PFDR > 0.16). These findings indicate that BtBC data can predict learning outcome for the LS condition (but not HS) when a threat is imminent to the demonstrator.
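The two fit measures used to score the regression, the Pearson correlation between actual and predicted values and the mean absolute error, can be computed as in the following sketch. The function names and the example numbers are hypothetical and are not taken from the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0.0:
        raise ValueError("a sequence has zero variance")
    return sum(a * b for a, b in zip(dx, dy)) / denominator


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual vs. predicted differential pupil responses.
actual = [0.2, 0.5, 0.9, 0.4, 0.7]
predicted = [0.3, 0.4, 0.8, 0.5, 0.6]
r_fit = pearson_r(actual, predicted)
mae_fit = mean_absolute_error(actual, predicted)
```

The two measures are complementary: the correlation is insensitive to a constant offset in the predictions, whereas the MAE penalizes it.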
To better understand the functional meaning of the BtBC for the prediction of learning outcome in the LS group, we carried out two complementary analyses. First, to ascertain how the observers allocated their attentional focus, we parsed the fixation proportion (i.e., fixation time at each area of interest normalized to the total fixation time) during observational learning. A linear mixed-effects model demonstrated that the observers predominantly paid attention to the area including the demonstrator’s right hand, which was attached to the stimulator that administered the electric stimulation, during CS+ vs. CS- trials (β = 0.48, SE = 0.05, t = 10.52, P < 0.001; Fig. 7B), in the time window of 4.7–5.5 s post CS onset. Other areas of interest did not show this attentional bias (Face: β = -0.04, SE = 0.02, t = -1.87, P = 0.07; Stimulus: β = -0.003, SE = 0.006, t = -0.43, P = 0.67; Fig. 7B). Second, we computed eye-to-eye coupling (a corollary of shared attention 74 and emotion 31) between demonstrator and observer in the same time window for both CS+ and CS- trials. Here, eye-to-eye coupling was defined based on the similarity between the demonstrator’s and observer’s pupil dilation, which is thought to typically track attentional effort 75 and to be sensitive to emotional peaks 31. Notably, eye-to-eye coupling was significantly stronger in the CS+ compared to CS- trials (β = 0.54, SE = 0.15, t = 3.83, P < 0.001; Fig. 7C). These exploratory analyses indicate a higher level of shared attentional effort and emotional response between observers and demonstrators on CS+ vs. CS- trials when observers believed they were watching an LS demonstrator.
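The fixation proportion defined above (fixation time per area of interest divided by total fixation time) amounts to a simple normalization, sketched below. The area-of-interest labels and durations are hypothetical placeholders, and the function name is an assumption of this sketch.

```python
def fixation_proportions(fixation_ms):
    """Normalize fixation time per area of interest to the total fixation time.

    fixation_ms: dict mapping an area-of-interest label to fixation time (ms).
    Returns a dict with the same keys whose values sum to 1.
    """
    if any(t < 0 for t in fixation_ms.values()):
        raise ValueError("fixation times must be non-negative")
    total = sum(fixation_ms.values())
    if total <= 0:
        raise ValueError("total fixation time must be positive")
    return {aoi: t / total for aoi, t in fixation_ms.items()}


# Hypothetical fixation times within the 4.7-5.5 s window of one trial.
trial = {"hand": 600, "face": 300, "stimulus": 100}
proportions = fixation_proportions(trial)
```

Normalizing within each trial makes the values comparable across observers who differ in total fixation time.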