2.1 Participants
A sample size of 24 was calculated using G*Power 3.1 (Faul et al., 2007) with the following settings: F tests > ANOVA: repeated measures, within factors; effect size f = .25; α error probability = .05; correlation among repeated measures = .5; power (1 − β error probability) = .8; number of groups = 1; number of measurements = 5; and nonsphericity correction ε = 1. Thirty participants were recruited from Liaoning Normal University. All participants had normal or corrected-to-normal vision and no history of neurological or psychological disorders. Participants whose head movements exceeded 3 mm during the experiment and those whose accuracy rates were below 50% were excluded. A total of six participants were excluded, leaving 24 participants (22 female; mean age = 21.5 years, SD = 1.72; all right-handed) for data analysis. Ethics approval was provided by the Research Centre for Brain and Cognitive Neuroscience of Liaoning Normal University, and all participants gave written informed consent before taking part in the study.
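For readers who wish to verify the power calculation, the repeated-measures power computation can be approximated with the noncentral F distribution. The sketch below follows the usual G*Power parameterization for within-factor designs (λ = f² · N · m / (1 − ρ), scaled by ε); it is an illustrative approximation, not the G*Power implementation itself:

```python
from scipy import stats

def rm_anova_power(n, m=5, f=0.25, alpha=0.05, rho=0.5, eps=1.0):
    """Approximate power for a repeated-measures (within-factor) ANOVA,
    following the G*Power parameterization: n subjects, m measurements,
    effect size f, correlation rho among measures, nonsphericity eps."""
    lam = f**2 * n * m / (1 - rho) * eps       # noncentrality parameter
    df1 = (m - 1) * eps                        # numerator df
    df2 = (n - 1) * (m - 1) * eps              # denominator df
    f_crit = stats.f.ppf(1 - alpha, df1, df2)  # critical F under H0
    return 1 - stats.ncf.cdf(f_crit, df1, df2, lam)

p24 = rm_anova_power(24)
print(f"approximate power at n = 24: {p24:.3f}")
```

Under these assumptions, power increases monotonically with n, consistent with n = 24 being the smallest sample reaching the .8 target.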
Table 1 shows the mean age of L2 (English) acquisition and self-ratings of L1 (Chinese) and L2 proficiency. Participants rated their language proficiency on a 6-point scale, where 1 was "not proficient" and 6 was "completely proficient." Paired-samples t-tests showed that participants were more proficient in their L1 than in their L2 in listening (t(23) = −8.741, p < .001), speaking (t(23) = −8.108, p < .001), reading (t(23) = −6.409, p < .001), and writing (t(23) = −8.113, p < .001). These results indicate that participants had intermediate proficiency in their L2 (see also Liu et al., 2022, 2023 for a similar sample).
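A paired-samples t-test of this kind can be reproduced as follows; the ratings below are hypothetical values generated to resemble the Table 1 means, not the study data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 24
# Hypothetical 6-point self-ratings (illustrative only, not the study data):
# L1 listening centered near 5.4, L2 near 3.3, as in Table 1.
l1_listening = np.clip(np.round(rng.normal(5.4, 0.9, n)), 1, 6)
l2_listening = np.clip(np.round(rng.normal(3.3, 0.9, n)), 1, 6)

# Paired-samples t-test comparing L2 against L1 proficiency within subjects
t, p = stats.ttest_rel(l2_listening, l1_listening)
print(f"t({n - 1}) = {t:.3f}, p = {p:.4f}")
```

A negative t here reflects the L2 − L1 ordering of the comparison, matching the sign convention of the reported statistics.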
Table 1
Participants’ age of language acquisition and self-ratings of proficiency.
| Measure | L1 | L2 |
| --- | --- | --- |
| Age of acquisition (years) | — | 7.38 ± 2.00 |
| Listening | 5.42 ± 0.88 | 3.29 ± 0.86 |
| Speaking | 4.79 ± 0.78 | 3.25 ± 0.73 |
| Reading | 4.50 ± 1.18 | 2.83 ± 1.01 |
| Writing | 4.88 ± 0.95 | 3.25 ± 1.15 |

Note: values are mean ± SD.
2.2 Materials
We employed a picture-word matching task to investigate how participants overcome the disturbance caused by confusing feedback and how they establish mappings between word forms and meanings. The experimental materials consisted of 128 images that represented combinations of 16 shapes (pentagram, square, triangle, circle, pentagon, rhombus, arc, trapezoid, ring, ellipse, hexagon, parallelogram, cross, rectangle, semicircle, sector) and 16 colors (deep red, dark brown, light yellow, grass green, dark blue, dark purple, sky blue, light orange, light brown, dark grey, ochre, light pink, beige, black, dark green, cyan).
The names of the colors and shapes were monosyllabic pseudowords (e.g., "sa" for yellow, "da" for pentagram). To balance color and shape, half of the images were named in color-before-shape order (e.g., sada: "sa" for yellow, "da" for pentagram) and the other half in shape-before-color order (e.g., dasa: "da" for pentagram, "sa" for yellow). Because humans are believed to perceive color cues before shape cues in visual perception (Gong et al., 2016), we designed four experimental blocks, with each block containing 32 pictures (16 named color + shape and 16 named shape + color).
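The order-balanced compound naming scheme can be sketched as follows; the pseudoword lists beyond "sa" and "da" are hypothetical placeholders, since the full inventories are not listed here:

```python
import itertools
import random

# Hypothetical monosyllabic pseudoword inventories (illustrative; only
# "sa" = yellow and "da" = pentagram are given in the text).
color_names = ["sa", "mo", "ki", "fu"]
shape_names = ["da", "pe", "lu", "ni"]

def compound_name(color, shape, color_first):
    """Build a compound word in color+shape or shape+color order."""
    return color + shape if color_first else shape + color

random.seed(1)
pairs = list(itertools.product(color_names, shape_names))
random.shuffle(pairs)
# Half of the compounds are named color-first, half shape-first
half = len(pairs) // 2
items = [compound_name(c, s, i < half) for i, (c, s) in enumerate(pairs)]
print(items[:4])
```

With the full 16-item inventories, the same scheme yields the order-balanced naming described above.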
In the disturbance condition, we set different rewards according to different feedback probabilities (see Fig. 1a). We considered feedback to contain a disturbance if it was misleading. For instance, in a learning trial with disturbance, a correct response had only a 70% chance of receiving 9 points (high reward) and a 30% chance of receiving 1 point (low reward), while a wrong response had the opposite reward probabilities. Without disturbance, feedback was deterministic: a correct response always received 9 points and a wrong response always received 1 point.
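The probabilistic reward scheme can be expressed as a small simulation; this is a sketch of the stated probabilities, not the experiment code:

```python
import random

def feedback_points(response_correct, disturbance, rng):
    """Return reward points for the virtual learner's response.
    Under disturbance, a correct response earns 9 points with p = .7
    and 1 point with p = .3 (and the reverse for a wrong response);
    without disturbance, feedback is deterministic."""
    if disturbance:
        p_high = 0.7 if response_correct else 0.3
    else:
        p_high = 1.0 if response_correct else 0.0
    return 9 if rng.random() < p_high else 1

rng = random.Random(0)
# Long-run check: correct responses under disturbance earn 9 points ~70% of the time
draws = [feedback_points(True, True, rng) for _ in range(10_000)]
print(sum(d == 9 for d in draws) / len(draws))
```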
2.3 Procedure
To ensure that participants were familiar with the procedure, we asked them to practice four to six trials before entering the scanner. Participants were told that on each trial, they would observe a virtual learner's judgement about a target word (i.e., whether it was presented as color + shape vs. shape + color) and would then see feedback about that choice. Following this, participants were asked the same question and were required to make their own judgement based on the virtual learner's feedback to maximize their own reward. Participants did not receive feedback about their own judgements. As learning time increased and participants gained more experience with the lexical form-semantic rules, they became 'expert learners.' This allowed us to compare their performance and brain activity as naïve learners (i.e., in the first and second blocks) and as expert learners (i.e., in the third and fourth blocks).
We used a within-subject experimental task with a 2 (disturbance type: non-disturbance vs. disturbance) × 2 (learning experience: naïve vs. expert) design. The design included four experimental blocks: two containing feedback without disturbance and two containing feedback with disturbance. Each block contained 32 trials, in which 16 compound stimuli of different colors and shapes were randomly presented. Each block lasted 8 minutes and 10 seconds, and the whole experiment lasted approximately 33 minutes. The order of the four blocks was counterbalanced across participants. As shown in Fig. 1b, a trial started with a fixation point for 500 ms, followed by the stimulus image and its name for 3000 ms. After this, participants observed a question about the target and viewed a response made by a virtual learner (i.e., the computer) for 2000 ms. They then saw feedback about the virtual learner's choice for 2000 ms. The question then appeared again for 3000 ms, and participants made their own choice based on the virtual learner's feedback. If participants responded within this 3000 ms window, the remaining time was filled with a blank screen. Finally, a jittered inter-trial interval of 1000–4000 ms followed.
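The trial timeline described above can be sketched as an event list; the event labels are illustrative, with the durations taken from the text:

```python
import random

# Fixed event durations from the trial structure (ms); the ITI is jittered.
EVENTS = [("fixation", 500), ("stimulus", 3000), ("virtual_choice", 2000),
          ("feedback", 2000), ("own_choice", 3000)]

def make_trial(rng):
    """Return the ordered (event, duration_ms) list for one trial,
    appending a jittered 1000-4000 ms inter-trial interval."""
    iti = rng.randint(1000, 4000)
    return EVENTS + [("iti", iti)]

rng = random.Random(42)
trial = make_trial(rng)
fixed_ms = sum(d for name, d in trial if name != "iti")
print(fixed_ms)  # 10500 ms of fixed events per trial
```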
2.4 fMRI data acquisition
In this study, a GE Discovery MR750 3-T scanner was used to acquire functional and structural brain images. Participants' heads were immobilized during scanning to prevent artifacts caused by head movement from interfering with the experiment. Each functional volume consisted of 33 axial slices (voxel size: 3.5 × 3.5 × 4.2 mm; slice thickness: 2 mm) acquired with a T2*-weighted gradient-echo planar imaging (EPI) sequence. The functional scan parameters were as follows: repetition time (TR) = 2000 ms; echo time (TE) = 30 ms; flip angle = 90°; image matrix = 64 × 64; field of view (FOV) = 224 × 224 mm. There were four runs in total, and each functional run contained 245 time points. A structural image was acquired using a T1-weighted 3D MPRAGE sequence (192 sequentially acquired slices; slice thickness = 1 mm; slice spacing = 1 mm; TR = 6.652 ms; TE = 2.928 ms; flip angle = 12°; image matrix = 256 × 256; FOV = 256 × 256 mm; voxel size = 1 × 1 × 1 mm).
2.5 Behavioral data analyses
To investigate how disturbance in feedback affects lexical form-meaning mapping, we analyzed participants' accuracy with a generalized linear mixed-effects model, using the lme4 package with accuracy as the dependent variable and learning experience and disturbance type as fixed effects. Trials in which participants did not respond were excluded from the analyses. We constructed mixed-effects models with different random-effects structures and compared them using the Bayesian Information Criterion (BIC), selecting the model with the lowest BIC. The final model was:

    model <- glmer(rate ~ learning_experience * disturbance + (1 | subject),
                   data = data, family = "binomial",
                   control = glmerControl(optimizer = "bobyqa",
                                          optCtrl = list(maxfun = 20000)))
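The BIC-based model comparison rests on the generic formula BIC = k · ln(n) − 2 · ln(L̂), with lower values preferred. A minimal illustration with hypothetical log-likelihoods (not the study's fitted models):

```python
import math

def bic(log_likelihood, k_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L-hat)."""
    return k_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: richer random-effects structures add parameters,
# so a small likelihood gain may not justify the added complexity.
candidates = {
    "(1 | subject)":               bic(-1500.0, 5, 3000),
    "(1 + disturbance | subject)": bic(-1498.5, 7, 3000),
}
best = min(candidates, key=candidates.get)
print(best)
```

In this toy comparison the intercept-only random-effects model wins, mirroring the selection logic described above.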
2.6 fMRI data preprocessing
We preprocessed the fMRI data using DPABI (Yan et al., 2016), a toolkit for preprocessing and analyzing brain imaging data. First, the EPI DICOM data were converted to NIfTI format, and the first 10 volumes of each run were discarded due to T1 relaxation artifacts. Slice-timing correction was then performed using the middle slice of each volume as the reference, and the images were realigned to correct for head movement. Next, each participant's functional images were coregistered to their structural image, and the DARTEL tool (Ashburner, 2007) was used to normalize individual structural images into standard MNI space, improving the accuracy of normalization. Finally, all voxels were resampled to 3 × 3 × 3 mm, and the functional images were smoothed with a 6 mm FWHM isotropic Gaussian kernel.
2.7 Full factorial analyses
Using SPM12 in MATLAB R2014b (Wellcome Department of Cognitive Neurology, London, UK), we performed a general linear model (GLM) analysis. At the first level, to examine the influence of disturbance on word learning, we divided each trial into disturbance and learning phases and constructed an event-related GLM. Task regressors were convolved with the canonical hemodynamic response function (HRF) to account for the delay and shape of the blood-oxygen-level-dependent (BOLD) signal. Because head movements can interfere with the analysis of brain images, the six head-motion parameters estimated for each participant were included as nuisance regressors, reducing motion-related noise and allowing brain activity to be estimated more accurately. Contrasts for the four conditions (disturbance-naïve, disturbance-expert, non-disturbance-naïve, and non-disturbance-expert) were then defined in the Contrast Manager.
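The HRF convolution step for task regressors can be illustrated as follows; the double-gamma parameters are the commonly used SPM-style defaults, assumed here for illustration:

```python
import numpy as np
from scipy.stats import gamma

TR = 2.0  # repetition time (s), matching the acquisition parameters

def canonical_hrf(tr, duration=32.0):
    """Double-gamma canonical HRF (SPM-style parameters), sampled at TR."""
    t = np.arange(0, duration, tr)
    peak = gamma.pdf(t, 6)           # positive response peaking ~5 s
    undershoot = gamma.pdf(t, 16)    # late undershoot
    h = peak - undershoot / 6.0
    return h / h.sum()

def task_regressor(onsets_s, n_scans, tr=TR):
    """Convolve a stick function of event onsets with the canonical HRF."""
    sticks = np.zeros(n_scans)
    for onset in onsets_s:
        sticks[int(round(onset / tr))] = 1.0
    return np.convolve(sticks, canonical_hrf(tr))[:n_scans]

reg = task_regressor([10, 40, 70], n_scans=100)
print(reg.shape)
```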
The first-level GLM results for each participant were entered into a second-level group analysis, using a full factorial design to test for significant effects. In this whole-brain group analysis, we examined the main effects of, and the interaction between, disturbance and learning experience. Finally, Gaussian random field (GRF) theory was used to threshold the statistical maps. GRF correction is a family-wise error correction method (Tillikainen et al., 2006; Woo et al., 2014); we applied a voxel-level threshold of p < .001, a cluster-level threshold of p < .05, and a cluster-size threshold of > 20 voxels.
2.8 gPPI analysis
To further investigate the neural circuitry elicited by the disturbance vs. non-disturbance conditions, we used generalized psychophysiological interaction (gPPI) analysis to assess functional networks in the brain and reveal functional connectivity between BOLD signals in regions of interest (ROIs). Specifically, for each participant, we used the gPPI toolbox (Cisler et al., 2013; McLaren et al., 2012) in SPM8 to extract the deconvolved time series from each seed region as the physiological variable, and convolved each experimental condition (disturbance-naïve, disturbance-expert, non-disturbance-naïve, and non-disturbance-expert) and parametric modulator with the canonical hemodynamic response function as psychological regressors. PPI terms were then created by multiplying the physiological time series with the psychological regressors. Next, a GLM was estimated separately for each participant in SPM8, and the resulting contrast images from all 24 participants were entered into a group-level random-effects analysis (one-sample t-tests). Finally, statistical significance was assessed with GRF correction (voxel-level threshold p < .001, cluster-level threshold p < .05, cluster size > 20 voxels) to evaluate functional connectivity across brain regions.
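The core of the PPI term construction, multiplying the seed time series by a condition regressor, can be sketched as below; this toy version omits the deconvolution and HRF convolution steps that the gPPI toolbox handles:

```python
import numpy as np

def ppi_term(physio_ts, condition_onoff):
    """Form a PPI regressor as the product of the mean-centered
    seed-region time series and the centered psychological regressor."""
    physio = physio_ts - physio_ts.mean()
    psych = condition_onoff - condition_onoff.mean()
    return physio * psych

rng = np.random.default_rng(0)
seed_ts = rng.standard_normal(200)             # toy seed-region time series
condition = np.tile([1.0] * 10 + [0.0] * 10, 10)  # boxcar for one condition
ppi = ppi_term(seed_ts, condition)
print(ppi.shape)
```

In the full gPPI model, one such interaction term is built per condition and entered into the GLM alongside the physiological and psychological regressors.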