Trial settings
This study examined the effect of ER on EFL learning by monitoring BA in the prefrontal cortex (PFC). The experiment was conducted from April 2018 to February 2019 at Gunma University, Maebashi City, Gunma Prefecture, Japan. The purpose and procedure of the experiment were explained to the participants, and their written informed consent was obtained. The participants received monetary compensation for each experiment and for providing their personal records of test scores and ER. This study was approved by the Ethics Committee of the Faculty of International Studies of Bunkyo University and conducted in accordance with the principles of the Declaration of Helsinki (1991). To observe the continued effect of training, three experiments (pre, post 1, and post 2) were conducted.
Participants
Forty-three first-year students at a national university took part in this project. They were drawn from two classes of a compulsory subject, English I. They had learned EFL through at least six years of formal English education in Japanese junior high and high schools. Their English proficiency was measured using the TOEIC Listening and Reading test in the Institutional Program (IP). All participants belonged to the same science and engineering department at the same university and took the same compulsory English classes, which suggests that environmental and ability differences among them were small. However, because this study was conducted within a compulsory subject, a control group that received no intervention could not be formed.
Intervention
In the ER training, the participants were required to read 150,000 words during the 15-week semester as one of the requirements for course credit. They could choose from about 100,000 books in the library according to their interests and proficiency. The participants were required to keep a record on Moodle, take 10 comprehension quizzes after reading each book, and get 60% or more correct answers. The instructors checked their records. The same amount of ER and the same procedure were repeated in the fall semester. As a result, the participants read approximately 300,000 words in total. Measurements were conducted at three time points: the beginning of the spring semester in April, the end of the spring semester in July, and the end of the fall semester in January. The participants had no experience of ER before entering the university, which provided a baseline for examining the effect of ER.
English proficiency
The TOEIC Listening and Reading test was used to objectively assess the participants’ proficiency. The TOEIC test was developed by the Educational Testing Service (ETS) in the United States (US) and consists of a 45-minute listening section (100 questions) and a 75-minute reading section (100 questions). It is widely used to measure English proficiency in Asia, particularly in Japan and South Korea. In 2018, 2,456,000 students took the test, and the average score of first-year college students was 440 out of 990. The mean score of the participants was 431.43 (SD = 19.16), which is considered to indicate the average proficiency of Japanese college students.
Listening task for functional near-infrared spectroscopy (fNIRS) experiment
The listening task was run in E-Prime 2.0. The listening materials were extracted from an EFL textbook, “Timed Reading for Fluency (Book 1)” [35]. All the texts were 200 words each, and the readability was carefully controlled (written with 800 headwords). The participants started each passage by pressing the Enter key, listened to the whole text through headphones, and then answered seven True/False questions about its content. The RTs to the listening comprehension questions and the ARs were calculated. A fixation point was shown at the center of the laptop computer monitor. The sound was played from the computer at an appropriate volume for each participant. The topics of the texts that the participants read and listened to were counterbalanced among the participants. No participant received the same stimuli in more than one of the pre, post 1, and post 2 experiments.
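As a minimal sketch of how the two behavioral measures could be derived for one passage, the Python snippet below computes AR as the percentage of correct True/False answers and RT as the summed answering time over the seven questions. The function name, the data layout, and the example values are hypothetical and only illustrate the calculation; the study collected these data through E-Prime 2.0.

```python
from typing import List, Tuple


def score_passage(responses: List[bool], answer_key: List[bool],
                  times_ms: List[float]) -> Tuple[float, float]:
    """Return (accuracy rate in %, total response time in ms) for one passage.

    Hypothetical helper: names and data layout are illustrative assumptions.
    """
    if not (len(responses) == len(answer_key) == len(times_ms)):
        raise ValueError("responses, answer_key, and times_ms must have equal length")
    if not responses:
        raise ValueError("at least one question is required")
    # Count the answers that match the key.
    n_correct = sum(r == k for r, k in zip(responses, answer_key))
    accuracy_rate = 100.0 * n_correct / len(answer_key)
    # RT is taken here as the total time spent answering all questions.
    total_rt_ms = sum(times_ms)
    return accuracy_rate, total_rt_ms


# Made-up example: seven True/False questions, four answered correctly.
key = [True, False, True, True, False, False, True]
given = [True, False, False, True, True, False, False]
times = [1000.0, 1200.0, 900.0, 1100.0, 1300.0, 950.0, 1050.0]
ar, rt = score_passage(given, key, times)
```

With seven questions per passage, AR can only take multiples of 100/7 (about 14.3 percentage points) for a single passage, so averaged ARs are needed to detect small changes.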
Trial session
A trial session was conducted just before the main experiments so that the participants could become familiar with the measurement task. The procedure was the same as in the main experiments, and two items were given to each participant in this trial session. These items did not overlap with those in the main experimental sessions. The trial session was intended to let all participants take part in the experiments, including post 1, after familiarizing themselves with the task itself. Therefore, we assumed that this trial session would prevent a practice or repetition effect on the main experimental tasks across pre, post 1, and post 2 [36].
Data exclusion
Participants who did not take part in all three experiments or whose data were not correctly measured were excluded from the analysis. The data of 24 participants (20 male and 4 female; mean age = 18.13 years, SD = 0.448) were analyzed (Table 1). The main reason for the decrease in the number of participants was failed data collection at the post 1 experiment, not a loss of motivation to continue ER.
Table 1
Participants’ demographic data
Participants (N = 24) | Mean | SD |
Age (years) | 18.13 | 0.448 |
Sex | Male 20, Female 4 | |
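The exclusion rule described under Data exclusion amounts to a complete-case filter over the three experiments. The sketch below illustrates it in Python; the participant IDs, the dictionary layout, and the use of None for a failed measurement are assumptions made for illustration, not the authors' actual data format.

```python
from typing import Dict, Optional

SESSIONS = ("pre", "post1", "post2")

Record = Dict[str, Optional[float]]


def complete_cases(records: Dict[str, Record]) -> Dict[str, Record]:
    """Keep only participants with a valid measurement in all three sessions."""
    return {
        pid: sessions
        for pid, sessions in records.items()
        # A session counts as valid only if it is present and not None.
        if all(sessions.get(s) is not None for s in SESSIONS)
    }


# Made-up records: None marks a failed measurement, a missing key an absence.
records = {
    "P01": {"pre": 0.01, "post1": -0.03, "post2": -0.10},
    "P02": {"pre": 0.02, "post1": None, "post2": -0.05},  # failed at post 1
    "P03": {"pre": -0.01, "post2": -0.08},                # absent at post 1
}
kept = complete_cases(records)
```

In this toy input only P01 survives the filter, mirroring how failed data collection at a single time point removes a participant from the whole longitudinal analysis.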
fNIRS data
During the trials, prefrontal BA was measured using a two-channel near-infrared spectroscopy (NIRS) device (HOT-1000, NeU Corp.; wavelength 810 nm, sampling frequency 10 Hz). The HOT-1000 has two dual source-detector optode sets and measures the concentration change of total hemoglobin (total Hb) for each source-detector set [37]. It was placed on the forehead over the frontal pole of the DLPFC (Brodmann area 10) of both the left and right hemispheres. The NIRS signals were collected while the participants listened to the stimulus. All data were baseline-corrected by subtracting the average NIRS signal of the rest periods preceding and following the task (15 s each, 30 s in total). We then averaged the NIRS signals within each experiment: pre, post 1, and post 2.
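The baseline correction described above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the authors' processing code: the function name and the sample values are hypothetical, and the baseline is taken as the mean of the pooled samples from both rest periods (which equals the average of the two period means when the periods have equal length, as here).

```python
from statistics import fmean
from typing import List, Sequence

SAMPLING_HZ = 10    # sampling frequency reported for the HOT-1000
REST_SECONDS = 15   # length of each rest period
N_REST = SAMPLING_HZ * REST_SECONDS  # 150 samples per rest period


def baseline_correct(pre_rest: Sequence[float], task: Sequence[float],
                     post_rest: Sequence[float]) -> List[float]:
    """Subtract the mean of both rest periods from every task sample."""
    baseline = fmean(list(pre_rest) + list(post_rest))
    return [x - baseline for x in task]


# Made-up total-Hb samples for one channel (arbitrary illustrative values).
pre_rest = [0.02] * N_REST
post_rest = [0.04] * N_REST
task = [0.05, 0.03, 0.01]

corrected = baseline_correct(pre_rest, task, post_rest)
# Averaging the corrected samples gives one value per channel and experiment.
session_mean = fmean(corrected)
```

Because the baseline is a single constant per recording, the correction shifts the task signal without changing its shape, so a negative session mean indicates total Hb below the resting level.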
Analysis
We tested the expectation that BA would change as listening comprehension of the target language (English) improved and as the participants accumulated words read during the ER training. Familiarity with and mastery of a skill have been found to decrease the load on BA [38]. To analyze the longitudinal data, a generalized linear mixed-effects model (GLMM ANOVA) was used. GLMMs are fundamentally flexible because they model both the mean and covariance structures [39]. We investigated the effects of the fixed factors (the three time points and the left and right sides of the brain) and their interaction on the behavioral and BA data. Post-hoc tests with the Bonferroni correction were conducted to identify differences between pairs of time points. Missing values were excluded from the analysis. Statistical significance was set at p < 0.05, and all analyses were conducted using SPSS 25 (IBM SPSS 2017, Chicago, IL, US).
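The Bonferroni method used for the post-hoc tests can be illustrated with a short Python sketch. With three time points there are three pairwise comparisons, so each unadjusted p-value is multiplied by three and capped at 1. The function name and the unadjusted p-values below are made up for illustration; the analyses in this study were run in SPSS, not with this code.

```python
from typing import List, Sequence


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up unadjusted p-values for pre vs. post 1, pre vs. post 2,
# and post 1 vs. post 2.
unadjusted = [0.0003, 0.0004, 0.5]
adjusted = bonferroni_adjust(unadjusted)

ALPHA = 0.05  # significance level used in this study
significant = [p < ALPHA for p in adjusted]
```

The cap at 1 is consistent with adjusted values of p = 1.000 appearing for comparisons whose unadjusted p-value is large.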
Results
Participant description
There were 24 participants (20 male and 4 female participants), aged 18 to 20 years (Mean = 18.13, SD = 0.448; Table 1).
Pre-test
The behavioral results are presented in Table 2. The participants took the TOEIC test before beginning ER. The mean total score was 414.79 (SD = 132.33), the reading section score was 181.67 (SD = 75.18), and the listening section score was 233.13 (SD = 67.14). The total score was slightly lower than the average score of Japanese first-year college students (440 in 2018). In the listening task, the AR of the comprehension tests was measured, and the total time taken to answer the comprehension questions was calculated as RT. The AR was 54.69% (SD = 8.82), and RT was 7560.04 ms (SD = 2094.11).
Table 2
Descriptive statistics of behavioral data
Criteria | Pre (n = 24) | | Post 1 (n = 24) | | Post 2 (n = 24) | |
| Mean | SD | Mean | SD | Mean | SD |
TOEIC Total | 414.79 | 132.33 | 465.00 | 141.56 | 466.46 | 155.15 |
TOEIC Reading | 181.67 | 75.18 | 198.54 | 72.42 | 197.08 | 76.07 |
TOEIC Listening | 233.13 | 67.14 | 266.46 | 76.34 | 269.38 | 85.43 |
Listening Accuracy Rate (%) | 54.69 | 8.82 | 54.58 | 7.82 | 56.47 | 6.63 |
Listening Response Time (ms) | 7560.04 | 2094.11 | 6080.22 | 2079.29 | 6194.27 | 2228.17 |
Post 1
After 15 weeks of ER, the post 1 tests were carried out. The total score on the TOEIC test was 465.00 (SD = 141.56), the reading section score was 198.54 (SD = 72.42), and the listening section score was 266.46 (SD = 76.34). The listening test AR was 54.58% (SD = 7.82), and RT was 6080.22 ms (SD = 2079.29) (Table 2).
Post 2
At the end of another 15 weeks of ER, the total score on the TOEIC test was 466.46 (SD = 155.15), the reading section score was 197.08 (SD = 76.07), and the listening section score was 269.38 (SD = 85.43). The listening test AR was 56.47% (SD = 6.63), and RT was 6194.27 ms (SD = 2228.17) (Table 2).
GLMM analysis
Number of words read in ER training
The cumulative number of words read in ER increased significantly across the three time points, from baseline to post 1 (148,583 words, SD = 36,739) and post 2 (289,040 words, SD = 85,809) (F(2,46) = 216.14, p < .001). Bonferroni post-hoc tests were significant for every pairwise comparison: pre vs. post 1 (p < .001, 95% CI = [-183,130, -114,040]), pre vs. post 2 (p < .001, 95% CI = [-323,590, -254,490]), and post 1 vs. post 2 (p < .001, 95% CI = [-175,010, -105,910]).
Listening task results
The RT in the listening task at the three time points was pre: 7560.04 ms (SD = 2094.11); post 1: 6080.22 ms (SD = 2079.29); and post 2: 6194.27 ms (SD = 2228.17), with a significant difference among them (F(2,46) = 10.32, p < .001). In the Bonferroni post-hoc tests, RT was significantly shorter at post 1 (p = .001, 95% CI = [579.096, 2380.556]) and at post 2 (p = .001, 95% CI = [465.041, 2266.501]) than at pre. There was no significant difference between post 1 and post 2 (p = 1.000, 95% CI = [-1014.79, 786.674]) (Fig. 3).
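For a symmetric confidence interval, the midpoint equals the estimated mean difference, so the intervals reported for RT can be cross-checked against the descriptive means. The Python sketch below performs this arithmetic with the RT values reported above; the helper name and the dictionary layout are hypothetical.

```python
def ci_midpoint(lower: float, upper: float) -> float:
    """Midpoint of a symmetric confidence interval (the point estimate)."""
    return (lower + upper) / 2.0


# Mean RTs (ms) at the three time points, as reported in Table 2.
pre_rt, post1_rt, post2_rt = 7560.04, 6080.22, 6194.27

# Each entry pairs a difference of means with the midpoint of its 95% CI.
checks = {
    "pre - post 1": (pre_rt - post1_rt, ci_midpoint(579.096, 2380.556)),
    "pre - post 2": (pre_rt - post2_rt, ci_midpoint(465.041, 2266.501)),
    "post 1 - post 2": (post1_rt - post2_rt, ci_midpoint(-1014.79, 786.674)),
}

# Largest disagreement between a mean difference and its CI midpoint (ms).
max_gap = max(abs(diff - mid) for diff, mid in checks.values())
```

All three differences agree with their interval midpoints to within 0.01 ms, which is the level of rounding in the reported values.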
The AR of the listening task at the three time points was pre: 54.69% (SD = 8.82); post 1: 54.58% (SD = 7.82); and post 2: 56.47% (SD = 6.63). There were no significant differences (F(2,46) = 7.28, p = .002) (Fig. 2).
BA results
BA during the listening task differed significantly among the three time points: pre: left t-Hb (0.0111, SD = 0.11), right t-Hb (-0.0104, SD = 0.12); post 1: left t-Hb (-0.0375, SD = 0.13), right t-Hb (-0.0295, SD = 0.13); and post 2: left t-Hb (-0.1101, SD = 0.22), right t-Hb (-0.0851, SD = 0.12); (F(2,46) = 10.32, p < .001). The main effect of time point on BA was a significant decrease (F(2,115) = 7.362, p = .001). In the Bonferroni post-hoc tests, BA at post 2 was significantly lower than at pre (p = .001, 95% CI = [0.035, 0.161]) and at post 1 (p = .045, 95% CI = [0.001, 0.127]). There was no significant difference between pre and post 1 (p = .584, 95% CI = [-0.029, 0.097]) (Table 3).
Table 3
Blood flow change (mM·mm) in listening tasks
Task | Condition | Mean | SD | Range |
Listening | Pre_Left | 0.0111 | 0.1097 | -0.3120 to 0.2072 |
| Post1_Left | -0.0375 | 0.1322 | -0.2203 to 0.1886 |
| Post2_Left | -0.1101 | 0.2154 | -0.9756 to 0.0950 |
| Pre_Right | -0.0104 | 0.1157 | -0.3952 to 0.1366 |
| Post1_Right | -0.0295 | 0.1329 | -0.2962 to 0.3374 |
| Post2_Right | -0.0851 | 0.1174 | -0.3934 to 0.0486 |
However, there was no main effect of left or right hemisphere on BA (F(2,115) = 0.033, p = .867). In addition, there was no interaction between the three time points and left-right BA (F = 0.413, p = .663).
TOEIC results
The total score on the TOEIC test, which measures overall English proficiency, differed significantly among the three time points: pre (414.79, SD = 132.33), post 1 (465.00, SD = 141.56), and post 2 (466.46, SD = 155.15); (F(2,46) = 7.28, p = .002). In the Bonferroni post-hoc tests, the total score was significantly higher at post 1 (p = .006, 95% CI = [-88.528, -11.888]) and at post 2 (p = .005, 95% CI = [-89.987, -13.347]) than at pre. There was no significant difference between post 1 and post 2 (p = 1.000, 95% CI = [-39.778, 36.862]) (Fig. 1).